Richard Smith, a prominent statistician, has recently taken an interest in multiproxy reconstructions, publishing a comment on Li, Nychka and Ammann 2010 (JASA) here and submitting another article here. I’ll try to comment more on another occasion.
Today I want to express my frustration at the amount of ingenuity expended by academics on more and more complicated multivariate methods, without any attention or consideration to the characteristics of the actual data. For example, Smith (2010) attempts to analyze the MBH98 North American tree ring network without even once mentioning bristlecones.
Smith starts off as follows:
In this discussion, we use principal components analysis, regression, and time series analysis, to reconstruct the temperature signal since 1400 based on tree rings data. Although the “hockey stick” shape is less clear cut than in the original analysis of Mann, Bradley, and Hughes (1998, 1999), there is still substantial evidence that recent decades are among the warmest of the past 600 years.
Smith refers to MM2003, MM2005a and MM2005b, describing only one of a number of issues raised in those articles – Mannian principal components. Smith describes the network as follows:
The basic dataset consists of reconstructed temperatures from 70 trees for 1400–1980, in the North American International Tree Ring Data Base (ITRDB).
The dataset is located at Nychka’s website here and is a mirror image of the MBH tree ring network that we archived in connection with MM2005a. 20 of these are Graybill strip bark chronologies – the ones that were left out in the CENSORED directory.
Pause for a minute here. Leaving aside the quibble that we are talking about tree ring chronologies rather than “70 trees”, Smith has, without reflection, taken for granted that the 70 tree ring chronologies are 70 examples of “reconstructed temperature”. They aren’t. They are indices of tree growth at these 70 sites, which, in many cases, are more responsive to precipitation than temperature. Academics in this field are far too quick to assume that things are “proxies” when this is something that has to be shown.
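The point that proxy status “has to be shown” suggests an elementary screening check: correlate each chronology with local instrumental temperature and with precipitation over the calibration period, and see which dominates. A hypothetical sketch on toy synthetic data (the function and variable names are mine, not anything from MBH98 or Smith):

```python
import numpy as np

def classify_proxies(proxies, temp, precip):
    """For each chronology (column of proxies), report whether it correlates
    more strongly (in absolute value) with temperature or precipitation."""
    labels = []
    for j in range(proxies.shape[1]):
        r_t = np.corrcoef(proxies[:, j], temp)[0, 1]
        r_p = np.corrcoef(proxies[:, j], precip)[0, 1]
        labels.append("temperature" if abs(r_t) >= abs(r_p) else "precipitation")
    return labels

# toy data: half the series track temperature, half track precipitation
rng = np.random.default_rng(7)
temp = rng.normal(size=80)
precip = rng.normal(size=80)
proxies = np.column_stack(
    [temp + 0.5 * rng.normal(size=80) for _ in range(5)]
    + [precip + 0.5 * rng.normal(size=80) for _ in range(5)]
)
labels = classify_proxies(proxies, temp, precip)
```

A real screening exercise would of course use gridded instrumental records near each site, but even this toy version makes the point: the label is an empirical finding, not an assumption.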
The underlying question in this field is whether Graybill strip bark bristlecone chronologies have a unique capability of measuring world temperature. We discussed this in MM2005b as follows:
While our attention was drawn to bristlecone pines (and to Gaspé cedars) by methodological artifices in MBH98, ultimately, the more important issue is the validity of the proxies themselves. This applies particularly for the 1000–1399 extension of MBH98 contained in Mann et al. In this case, because of the reduction in the number of sites, the majority of sites in the AD1000 network end up being bristlecone pine sites, which dominate the PC1 in Mann et al. simply because of their longevity, not through a mathematical artifice (as in MBH98).
Given the pivotal dependence of MBH98 results on bristlecone pines and Gaspé cedars, one would have thought that there would be copious literature proving the validity of these indicators as temperature proxies. Instead the specialist literature only raises questions about each indicator which need to be resolved prior to using them as temperature proxies at all, let alone considering them as uniquely accurate stenographs of the world’s temperature history.
Most “practical” readers of this blog have no difficulty in understanding this point, whereas academics in this field prefer to consider the matter via abstract policies on PC retention, with Smith being no exception.
Smith’s approach was to regress world temperature against principal components of the MBH tree ring network (with all 20 Graybill chronologies), varying the number of retained principal components and examining the fit. Smith described the problem as an inverse regression, i.e. regressing the “cause” (world temperature y) against the “effects” – the proxies, denoted x.
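In outline, the inverse-regression step is simple: project the proxy matrix onto its leading principal components and regress the temperature series on them by OLS. A hedged sketch on synthetic data (my own notation and code, not Smith’s):

```python
import numpy as np

def inverse_regression(X, y, K):
    """Regress a temperature series y on the first K principal components
    of a proxy matrix X (rows = years, columns = chronologies).

    Returns fitted values and the R^2 of the calibration fit."""
    Xc = X - X.mean(axis=0)                    # center each chronology
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = U[:, :K] * s[:K]                     # principal component scores
    A = np.column_stack([np.ones(len(y)), pcs])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    yhat = A @ beta
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return yhat, 1.0 - ss_res / ss_tot

# toy illustration: 70 "proxy" series over 100 years, y driven by one column
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 70))
y = X[:, 0] + 0.5 * rng.normal(size=100)
yhat, r2 = inverse_regression(X, y, K=8)
```

Note that because the models are nested, the calibration R² can only rise as K increases, which is exactly the overfitting temptation discussed below.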
While Smith says that this is a “natural” way to look at the data, I don’t think that OLS regression of cause against a very large number of series is “natural” at all. On the contrary, if one looks at this methodology even with relatively simple pseudoproxies, it is a very poor method. (There’s a 2006 CA post on these issues that IMO is a very good treatment.)
In my opinion, if the tree ring series truly contain a “signal”, a much more “natural” approach is to calculate an average – an alternative that is seldom considered by academics in this field.
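If the 70 series really did share a common temperature signal, the payoff from simple averaging is easy to demonstrate: the mean of N standardized series suppresses independent noise by a factor of roughly 1/√N. A toy sketch on synthetic data (my illustration, not anything from MBH98 or Smith):

```python
import numpy as np

rng = np.random.default_rng(1)
n_years, n_series = 500, 70
signal = np.cumsum(rng.normal(size=n_years)) * 0.1   # hypothetical common signal
noise = rng.normal(size=(n_years, n_series))
proxies = signal[:, None] + noise                    # each series = signal + noise

# standardize each chronology, then take the plain average
std = (proxies - proxies.mean(axis=0)) / proxies.std(axis=0)
composite = std.mean(axis=1)

# the composite tracks the signal far better than any single series
corr_single = np.corrcoef(proxies[:, 0], signal)[0, 1]
corr_mean = np.corrcoef(composite, signal)[0, 1]
```

The averaging route has no free parameters to tune, which is precisely why it makes a useful benchmark against multivariate machinery.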
Reducing the number of proxy series in the X matrix makes the problem of OLS regression less bad. Smith characterizes the OLS problem as one of overfitting and says that a “standard method for dealing with this problem” is to transform into principal components. Smith then turns to the problem of how many principal components to retain.
I don’t think that one can assume that principal components applied to the NOAMER tree ring network will automatically lead to good results.
Preisendorfer, a leading authority on principal components who was cited in MBH98, provided the following advice in his text – advice quoted at CA here:
The null hypothesis of a dominant variance selection rule [such as Rule N] says that Z is generated by a random process of some specified form, for example a random process that generates equal eigenvalues of the associated scatter [covariance] matrix S…
One may only view the rejection of a null hypothesis as an attention getter, a ringing bell, that says: you may have a non-random process generating your data set Z. The rejection is a signal to look deeper, to test further. One looks deeper, for example, by drawing on one’s knowledge and experience of how the map of e[i] looks under known real-life synoptic situations or through exhaustive case studies of e[i]’s appearance under carefully controlled artificial data set experiments. There is no royal road to the successful interpretation of selected eigenmaps e[i] or principal time series a[j] for physical meaning or for clues to the type of physical process underlying the data set Z. The learning process of interpreting [eigenvectors] e[i] and principal components a[j] is not unlike that of the intern doctor who eventually learns to diagnose a disease from the appearance of the vital signs of his patient. Rule N in this sense is, for example, analogous to the blood pressure reading in medicine. The doctor, observing a significantly high blood pressure, would be remiss if he stops his diagnosis at this point of his patient’s examination. … (Page 269.)
A ringing bell.
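For readers who want to see what Rule N actually does, it is straightforward to simulate: compare the observed eigenvalue spectrum against spectra generated from pure noise of the same shape, and flag eigenvalues that exceed the noise benchmark. A sketch (my own illustration, not Preisendorfer’s code):

```python
import numpy as np

def rule_n(X, n_sim=200, seed=0):
    """Preisendorfer-style Rule N: flag eigenvalues of the correlation
    matrix of X that exceed the 95th percentile of the corresponding
    eigenvalues from uncorrelated Gaussian noise of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    eig = np.sort(np.linalg.eigvalsh(np.corrcoef(Xs, rowvar=False)))[::-1]
    null = np.empty((n_sim, p))
    for i in range(n_sim):
        Z = rng.normal(size=(n, p))
        null[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    thresh = np.percentile(null, 95, axis=0)
    return eig > thresh          # boolean mask: "bells" that ring

# pure noise should ring few bells; a shared signal rings the first one
rng = np.random.default_rng(42)
noise_flags = rule_n(rng.normal(size=(120, 20)))
signal = np.cumsum(rng.normal(size=120))[:, None] * 0.3
signal_flags = rule_n(signal + rng.normal(size=(120, 20)))
```

The simulation makes Preisendorfer’s caution concrete: the rule can tell you that *something* non-random is present, but it says nothing about whether that something is temperature.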
Applying Preisendorfer’s advice, the next scientific task is to determine whether Graybill bristlecone chronologies truly have a unique ability to measure world temperatures and, if so, why – a step urged on the field in MM2005b.
Instead of grasping this nettle – one that has been outstanding for a long time – Smith, like Mann and Wahl and Ammann before him, purported to argue that inclusion of bristlecones could be mandated “statistically” without the need to examine whether the proxies had any merit or not.
Smith’s approach was a little different from the similar arguments by Mann and by Wahl and Ammann. Smith did a series of such regressions varying K, calculating the Akaike Information Criterion and other similar criteria for each regression, ultimately recommending 8 PCs, still giving a HS, though one that is not as bent as the original. Smith begged off consideration of the bristlecones as follows:
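Smith’s selection procedure can be sketched in a few lines: for each K, fit the inverse regression on the first K principal components and record AIC, then pick the K that minimizes it. The Gaussian-OLS form of AIC is used below; Smith’s exact implementation may differ, and the data here are synthetic:

```python
import numpy as np

def aic_by_k(X, y, k_max=20):
    """For each K = 1..k_max, regress y on the first K PCs of X and
    record AIC = n*log(RSS/n) + 2*(K+1) (Gaussian OLS form)."""
    n = len(y)
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    aics = []
    for K in range(1, k_max + 1):
        A = np.column_stack([np.ones(n), U[:, :K] * s[:K]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = np.sum((y - A @ beta) ** 2)
        aics.append(n * np.log(rss / n) + 2 * (K + 1))
    return np.array(aics)

# toy run on synthetic data
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 70))
y = X @ rng.normal(size=70) * 0.1 + rng.normal(size=150)
aic = aic_by_k(X, y)
best_k = int(np.argmin(aic)) + 1
```

The penalty term 2*(K+1) is what counteracts the ever-rising calibration fit, but notice that nothing in the criterion asks whether any retained PC reflects temperature.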
I have confined this discussion to statistical aspects of the reconstruction, not touching on the question of selecting trees for the proxy series (extensively discussed by M&M, Wegman, Scott, and Said and Ammann/Wahl) nor the apparent recent “divergence” of the relationship between tree ring reconstructions and measured temperatures (see, e.g., NRC 2006, pp. 48–52). I regard these as part of the wider scientific debate about dendroclimatology but not strictly part of the statistical discussion, though it would be possible to apply the same methods as have been given here to examine the sensitivity of the analysis to different constructions of the proxy series or to different specifications of the starting and ending points of the analysis.
I strongly disagree with Smith’s acquiescence in failing to grasp the nettle of the Graybill chronologies. The non-robustness of results to the presence/absence of bristlecones should have been clearly reported and discussed.
Ron Broberg commented on Smith (2010) here. Broberg referred to a number of my posts on principal components and commented acidly on my failure to propose a “good rule for the retention of PCs”:
I’m listing some of Steve McIntyre’s posts on the number of PCs to retain. If, after reading these, you still don’t know what McIntyre believes to be a good rule for the retention of PCs, then at least I know I’m not alone. If I have missed something, please let me know.
While Broberg may be frustrated, our original approach to the problem was one of auditing and verification, i.e. beginning with the examination of MBH policy for retention of principal components. We tried strenuously to figure out what Mann had done and were unable to do so; Mann’s criteria for PC retention remain unexplained and unknown to this day. I can assure readers that I am far more frustrated than Broberg that this important step in MBH remains unexplained.
In the case at hand, until one resolves whether Graybill bristlecone chronologies are a valid temperature proxy, I don’t see the point of trying to opine on the “right” number of retained principal components. It seems to me that Smith begged the question with his initial statement that the 70 series in the NOAMER network were “reconstructed temperatures”. Maybe they are, maybe they aren’t. Surely that needs to be demonstrated scientifically, rather than assumed.