One of Hansen’s pit bulls, Tamino, has re-visited Mannian principal components. Tamino is a bright person, whose remarks are all too often marred by the pit bull persona that he has unfortunately chosen to adopt. His choice of topic is a curious one, as he ends up re-opening a few old scabs, none of which seem likely to me to improve things for the Team, but some of which are loose ends, and so I’m happy to re-visit them and perhaps resolve some of them. Tamino’s post and associated comments are marred by a number of untrue assertions and allegations about our work, a practice that is unfortunately all too common within the adversarial climate science community. When Tamino criticizes our work, he never quotes from it directly – a practice that seems to have originated at realclimate and to have been adopted by other climate scientists. Thus his post ends up being replete with inaccuracies whenever it comes to our work.
Tamino’s main effort in this post is an attempt to claim that Mannian non-centered principal components methodology is a legitimate methodological choice. This would seem to be an uphill fight given the positions taken by the NAS Panel and the Wegman Panel – see also this account of a 2006 American Statistical Association session. However, Tamino makes a try, claiming that Mannian methodology is within an accepted literature, that it has desirable properties for climate reconstructions and that there were good reasons for its selection by MBH. I do not believe that he established any of these claims; I’ll do a post on this topic.
In the course of this exposition, Tamino discusses the vexed issue of PC retention for tree ring networks, claiming that MBH employed “objective criteria” for retaining PCs. I am unaware of any methodology which can successfully replicate the actual pattern of retained PCs in MBH98, a point that I made in an early CA post here. This inconsistency has annoyed me for some time and I’m glad that Tamino has revived the issue as perhaps he can finally explain how one actually derives MBH retention for any case other than the AD1400 NOAMER network. I’ll do a detailed post on this as well.
Whenever there is any discussion of principal components or some such multivariate methodology, readers should keep one thought firmly in their minds: at the end of the day – after the principal components, after the regression, after the re-scaling, after the expansion to gridcells and calculation of NH temperature – the entire procedure simply results in the assignment of weights to each proxy. This is a point that I chose to highlight and spend some time on in my Georgia Tech presentation. The results of any particular procedural option can be illustrated with the sort of map shown here – in which the weight of each site is indicated by the area of the dot on a world map. Variant MBH results largely depend on the weight assigned to Graybill bristlecone chronologies – which themselves have problems (e.g. Ababneh.)
Figure 1. Each dot shows weights of AD1400 MBH98 proxies with area proportional to weight. Major weights are assigned to the NOAMER PC1 (bristlecones), Gaspe, Tornetrask and Tasmania. Issues have arisen in updating each of the first three sites.
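To make the "it all reduces to weights" point concrete, here is a minimal sketch in Python. It assumes every step in the chain is linear; the loadings, regression coefficients, re-scaling factor and proxy values are all invented for illustration and are not MBH98 values.

```python
# Toy illustration: a chain of linear steps collapses to one weight per proxy.
# Every number below is hypothetical; none is taken from MBH98.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def vecmat(vector, matrix):
    """Multiply a row vector by a matrix (list of rows)."""
    n_cols = len(matrix[0])
    return [sum(vector[i] * matrix[i][j] for i in range(len(vector)))
            for j in range(n_cols)]

# Step 1 (hypothetical): two PCs, each a linear combination of three proxies.
pc_loadings = [[0.6, 0.3, 0.1],
               [0.1, -0.2, 0.7]]
# Step 2 (hypothetical): regression coefficients applied to the two PCs.
regression_coefs = [1.5, -0.5]
# Step 3 (hypothetical): a final re-scaling factor.
rescale = 0.8

# Collapse the three steps into a single weight per proxy.
proxy_weights = [rescale * w for w in vecmat(regression_coefs, pc_loadings)]

def reconstruct_stepwise(proxies):
    """Apply PCs, then regression, then re-scaling, one step at a time."""
    pcs = matvec(pc_loadings, proxies)
    fitted = sum(c * p for c, p in zip(regression_coefs, pcs))
    return rescale * fitted

def reconstruct_weighted(proxies):
    """Apply the collapsed per-proxy weights directly."""
    return sum(w * x for w, x in zip(proxy_weights, proxies))

one_year = [0.2, -1.0, 0.5]  # hypothetical proxy values for a single year
```

Calling `reconstruct_stepwise(one_year)` and `reconstruct_weighted(one_year)` gives the same number up to rounding, so any change to a procedural option in such a chain shows up simply as a change in `proxy_weights`.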
Readers also need to keep in mind that two quite distinct principal component operations are carried out in MBH98 – one on gridded temperature data from CRU; and one on tree ring networks from ITRDB and elsewhere. The characteristics of the networks are very different. The gridded temperature networks result in a matrix of data with some effort at geographic organization, while there is no such attempt in the tree ring networks. The tree ring networks are more like the GHCN station networks than like a gridded network. For comparison, imagine a PC calculation on station data in the GHCN network going prior to 1930 with no attempt at geographic organization or balance. Followers of this discussion realize that long station data is overwhelmingly dominated by U.S. (USHCN) station information. Actually, the tree ring networks are even more disparate than GHCN station networks. Many of the tree ring sites are limited by precipitation. Thus, PC on the tree ring networks is more like doing a PC analysis on a pseudo-GHCN network consisting mostly of precipitation measurements with some blends of temperature and precipitation, with a predominance of stations from the southwest U.S. There are obviously many issues in trying to transpose Preisendorfer methodology developed for very specific circumstances to this sort of information.
Before getting to either of the above larger topics, I wish to lead in with a discussion of the PC calculations in the gridded temperature network – something that hasn’t been on our radar screens very much, but which may shed some light on the thorny tree ring calculations. Here there is some new information since early 2005 – the source code provided to the House Energy and Commerce Committee shows the “Rule N” calculation for gridded temperature networks (though not for tree ring networks) and, in the absence of other information, can help illuminate obscure Mannian calculations.
In the course of this review, I also reconsidered Preisendorfer 1988, Mann’s cited authority for PCA. Preisendorfer (1988), entitled “Principal Component Analysis in Meteorology and Oceanography”, is an interesting and impressive opus, with many interesting by-ways and asides. Preisendorfer is clearly an experienced mathematician and his results are framed in mathematical terms, rather than ad hoc recipes. Many of his comments remain relevant to the present debate.
The underlying model for Preisendorfer’s entire book is a data network obtained from a spatial field over time. His matrices are indexed over time and over space; they are not abstract indexes. He discusses gridded networks of temperature or sea level pressure or composites, but they are never geographically inhomogeneous ragbags, like inhomogeneous collections of site chronologies from ITRDB. Preisendorfer also presumes that the gridded data is produced from the operation of a physical system governed by equations – again something that hardly applies to tree ring networks with no geographic homogeneity or indexing. The premise of a physical system is intimately related to his use of PCA and methods for retaining PC series:
Various data sets generated by solutions of any of a large class of linear ordinary or of linear partial differential equations exhibit the PCA property in the limit of large sample sizes n. When this is the case, the eigenvectors of the data sets resemble the theoretical orthogonal spatial eigenmodes of the solutions. In this way, “empirical orthogonal functions” arise with definite physical meaning.
As observed previously at CA here, the first step in Preisendorfer’s methodology is “t-centering”, i.e. removing the mean of each time series:
The first step in the PCA of [data set] Z is to center the values z[t,x] on their averages over the t series… Using these t-centered values z[t,x], we form a new n x p matrix. …
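As a minimal sketch of t-centering, assuming rows index time t and columns index location x (the function name `t_center` and the toy matrix are mine, not Preisendorfer's):

```python
def t_center(z):
    """Subtract from each column (one location x) its mean over all rows (times t)."""
    n = len(z)
    p = len(z[0])
    col_means = [sum(z[t][x] for t in range(n)) / n for x in range(p)]
    return [[z[t][x] - col_means[x] for x in range(p)] for t in range(n)]

# Toy n x p data matrix: n = 4 times (rows), p = 2 locations (columns).
z = [[1.0, 10.0],
     [2.0, 12.0],
     [3.0, 14.0],
     [6.0, 16.0]]
z_centered = t_center(z)
```

After centering, every column of `z_centered` sums to zero. The mean is taken over the full length of each series, not over a sub-period.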
While the purpose of this post is not to discuss short-centering, I’ll note that Preisendorfer explicitly stated that non-centered singular value decompositions of data matrices are not principal components analyses.
If Z in (2.56) is not rendered into t-centered form, then the result is analogous to non-centered covariance matrices and is denoted by S’. The statistical, physical and geometric properties of S’ and S [the covariance matrix] are quite distinct. PCA, by definition, works with variances i.e. squared anomalies about a mean.
This doesn’t mean that one cannot propose some sort of rationale for weights derived from non-centered analyses. However the rationale for such weighting must be demonstrated; in addition, rules derived from Preisendorfer may not readily transpose.
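The difference between the two matrices can be checked numerically. The sketch below assumes that S is the unnormalized scatter matrix of the t-centered data and that S’ is the same product formed from the uncentered data; the toy matrix and all variable names are mine. Under those assumptions S’ equals S plus n times the outer product of the column means, so the non-centered decomposition mixes information about the means into what would otherwise be a pure variance calculation.

```python
def scatter(z):
    """Return Z^T Z for a data matrix given as a list of rows."""
    p = len(z[0])
    return [[sum(row[i] * row[j] for row in z) for j in range(p)]
            for i in range(p)]

# Toy n x p data matrix (hypothetical values).
z = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [6.0, 16.0]]
n = len(z)
p = len(z[0])
means = [sum(row[x] for row in z) / n for x in range(p)]
z_centered = [[row[x] - means[x] for x in range(p)] for row in z]

s_centered = scatter(z_centered)  # S: built from anomalies about the mean
s_raw = scatter(z)                # S': built from the uncentered values

# The gap between the two is n * (mean vector)(mean vector)^T.
mean_term = [[n * means[i] * means[j] for j in range(p)] for i in range(p)]
```

Each entry of `s_raw` equals the matching entry of `s_centered` plus the matching entry of `mean_term`. When the column means are large relative to the anomalies, the mean term dominates S’.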
In addition, if the spatial field is scalar and of one type (e.g. only temperature or only sea level pressure), Preisendorfer’s base case uses the covariance matrix. If the data is a composite of two different spatial fields (e.g. one field of temperature, one of sea level pressure), Preisendorfer recommends that the data be standardized by dividing by its standard deviation (page 41).
In his chapter 5, Preisendorfer discusses various rules for deciding how many PCs to retain in a representation of such a gridded system, including rules based on “dominant variance” (such as Rule N). One section of chapter 5 is entitled “Dynamical Origins of the Dominant Variance Selection Rules” and commences:
To provide a simple physical basis for the dominant variance selection rules, we consider the dynamical model (3.12) ….
Retention rules within the Preisendorfer corpus are thus linked to the physical assumptions. The Rule N test is described in Chapter 5 and describes a simulation procedure in which 100 random matrices are generated. For each random matrix, eigenvalues are calculated. The 95th percentile eigenvalue is determined for each eigenvalue and a curve drawn. This is compared to the actual eigenvalues, with data eigenvalues larger than the curve from random eigenvalues being described as “hopefully signal” (p 192).
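A rough sketch of this kind of simulation follows, using NumPy. It is not Preisendorfer's exact recipe and not the MBH code. I am assuming Gaussian white noise matrices of the same dimensions as the data, eigenvalues expressed as fractions of total variance, and retention of only the leading run of eigenvalues above the curve. The function names and the toy data are mine.

```python
import numpy as np

def normalized_eigenvalues(data):
    """Eigenvalues of the t-centered data's scatter matrix, as fractions of their sum."""
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values ** 2
    return eigenvalues / eigenvalues.sum()

def rule_n(data, n_sim=100, percentile=95.0, seed=0):
    """Count leading eigenvalues that exceed a simulated white-noise curve."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # One row of normalized eigenvalues per simulated random matrix.
    simulated = np.array([
        normalized_eigenvalues(rng.standard_normal((n, p)))
        for _ in range(n_sim)
    ])
    # 95th percentile taken separately for each eigenvalue rank.
    threshold = np.percentile(simulated, percentile, axis=0)
    observed = normalized_eigenvalues(data)
    exceeds = observed > threshold
    # Assumption: keep only the leading run of eigenvalues above the curve.
    n_retained = len(exceeds) if exceeds.all() else int(np.argmin(exceeds))
    return n_retained, observed, threshold

# Toy data (hypothetical): one strong common signal shared by 10 noisy series.
rng = np.random.default_rng(1)
signal = rng.standard_normal((200, 1)) @ np.ones((1, 10))
toy_data = 2.0 * signal + rng.standard_normal((200, 10))
n_retained, observed, threshold = rule_n(toy_data)
```

With one dominant common signal, only the first eigenvalue should clear the curve. The null model here is white noise; a red noise null would give a different curve, which is one reason the choice of null matters.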
Necessary but not Sufficient
Preisendorfer himself nowhere asserts that passing Rule N or any other rule demonstrates statistical significance. His entire approach is using PCA as exploratory – something entirely consistent with practice in social sciences. Preisendorfer (269ff):
The null hypothesis of a dominant variance rule [e.g. Rule N] says that the [data matrix] Z is generated by a random process of some specified form, for example a random process that generates equal eigenvalues of the associated scatter matrix S. Hence if one can reject this hypothesis (say via Rule N) then there is reason to believe that Z has been generated by a process unlike that used in the null hypothesis. This rejection does not automatically fix the process generating Z as non-random for there may be alternate random processes generating Z besides that used in the null hypothesis. One may only view the rejection of a null hypothesis as an attention getter, a ringing bell that says: you may have a non-random process generating your data set Z. The rejection is a signal to look deeper, to test further.
Preisendorfer goes on to say (270):
There is no royal road to the successful interpretation of selected … principal time series for physical meaning or for clues to the type of physical process underlying the data set Z. The learning process of interpreting [eigenvectors and principal components] is not unlike that of the intern doctor who eventually learns to diagnose a disease from the appearance of the vital signs of his patient. Rule N in this sense is, for example, analogous to the blood pressure reading in medicine. The doctor, observing a significantly high blood pressure, would be remiss if he stops his diagnosis at this point of his patient’s examination.
All of Preisendorfer’s comments on 269-271 are worth considering. He completes the section by describing PCA as a “means rather than an end”:
The novice practitioner of PCA may well fix at the outset the proper place of PCA in his studies of the atmosphere-ocean system: PCA is a probing tool; it is a preliminary testing device; it is a technique to be used at the outset of a search for the physical basis of a data set; it is some initial ground on which to rest diagnoses, model building and predictions. In sum, PCA is not an end in itself but a means toward an end.
Previously Overland and Preisendorfer had clearly stated that being significant under Rule N was only necessary for significance; they did not argue that it was sufficient. The nuanced approach of Preisendorfer’s actual text is also evident in ecological articles using PCA (articles that are arguably far more relevant to the analyses of tree ring networks than gridded meteorological data). For example, Franklin et al. stated:
In the final analysis, the retained components must make good scientific sense (Frane & Hill 1976; Legendre & Legendre 1983; Pielou 1984; Zwick & Velicer 1986; Ludwig & Reynolds 1988; Palmer 1993).
The take-home point here – as so often in these reviews – is that there is no magic formula, including Rule N, by which climate scientists can deduce from principal components applied to a tree ring network that bristlecones measure world temperature or are appropriate to include in an inverse linear regression.
MBH98 on Gridded Temperature Networks
Let’s start by reviewing what MBH98 actually said about their PC analysis of gridded temperature information and compare this procedure to procedures recommended in Preisendorfer (1988):
For each grid-point, the mean was removed, and the series was normalized by its standard deviation. A standardized data matrix T of the data is formed by weighting each grid-point by the cosine of its central latitude to ensure areally proportional contributed variance, and a conventional Principal Component Analysis (PCA) is performed … An objective criterion was used to determine the particular set of eigenvectors which should be used in the calibration as follows. Preisendorfer’s selection rule ‘rule N’ was applied to the multiproxy network to determine the approximate number Neofs of significant independent climate patterns that are resolved by the network, taking into account the spatial correlation within the multiproxy data set.
A couple of points to note here. First, the subtraction of the mean of each gridded series is described and is in accordance with the t-centering of Preisendorfer. No short centering here. Second, although Preisendorfer recommended standardization (division by standard deviation) for series not expressed in common units, he did not explicitly recommend this for spatial fields of one type. To my knowledge, not much turns on this particular decision for gridcell temperatures in terms of final weights for individual proxies; I merely note that this particular step does not appear mandatory or even preferred within the Preisendorfer opus. Third, as noted by von Storch, to accomplish areal proportion, weighting should have been done by the square root of the cosine of latitude, since variance scales with the square of the weight. Again not much turns on this particular error.
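The arithmetic behind the square-root point is easy to check. In this small sketch the series and the latitude are hypothetical: multiplying a standardized series by a weight w multiplies its variance by w squared, so a cosine weight yields contributed variance proportional to the squared cosine, while a square-root-of-cosine weight yields variance proportional to the cosine itself.

```python
import math

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# A standardized toy series (mean 0, variance 1) at a hypothetical latitude.
series = [-1.0, 1.0, -1.0, 1.0]
latitude_deg = 60.0
cos_lat = math.cos(math.radians(latitude_deg))

# Weight by cos(latitude): contributed variance goes as cos(latitude) squared.
var_cos_weight = variance([cos_lat * v for v in series])
# Weight by sqrt(cos(latitude)): contributed variance goes as cos(latitude).
var_sqrt_weight = variance([math.sqrt(cos_lat) * v for v in series])
```

At 60 degrees latitude the cosine is about 0.5, so the cosine weighting contributes roughly a quarter of the variance of an equatorial gridcell, where areal proportion would call for roughly a half.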
The portion of MBH98 source code archived in summer 2005 in response to a request from the House Energy and Commerce contains the following comments pertaining to the use of Rule N for gridded networks:
now determine suggested number of EOFs in training based on rule N applied to the proxy data alone during the interval t > iproxmin (the minimum year by which each proxy is required to have started, note that default is iproxmin = 1820 if variable proxy network is allowed (latest begin date in network) . We seek the n first eigenvectors whose eigenvalues exceed 1/nproxy’. nproxy’ is the effective climatic spatial degrees of freedom spanned by the proxy network (typically an appropriate estimate is 20-40)
The source code then carries out an SVD on a short-scaled matrix of proxies available in a step (e.g. 22 proxies in the AD1400 step) and calculates the sum of the eigenvalues squared (sumtot) and divides the square of each eigenvalue by the sum of the squared eigenvalues.
If M is the number of proxies in the network (22 in the AD1400 network), the number of retained temperature PCs is set equal to the number of these normalized squared eigenvalues greater than 2/M (.0909 for the AD1400 network) – see the description of “RuleN3” in the source code. In passing, note that the distribution of eigenvalues is affected by scaling, with more weight loaded on to the PC1 in Mannian short-segment scaling.
So the actual retention criterion (regardless of what is stated in MBH) is based on a rule of thumb related to the number of proxies. The hurdle for significance is that the relative contribution from the eigenvalue exceeds twice the contribution based on equal contributions from all series. I am unaware of this rule of thumb occurring within Preisendorfer 1988 and thus it is not a direct application of Rule N. It seems possible that this rule of thumb can emerge from AR1 red noise models with AR1 values at levels of around 0.3, in the range that Mann likes. So it’s possible that there’s been some sort of offline calculation developing this rule of thumb. I’ll experiment a little with this and report on this today or tomorrow.
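The rule of thumb as I read it can be sketched in a few lines of Python. This is my paraphrase of the logic described above, not a transcription of the Fortran. Only `sumtot` is a name taken from the source code; the function name and the input values are invented for illustration. The inputs are the values returned by the SVD, which the discussion above calls eigenvalues.

```python
def retained_count(svd_values, n_proxies):
    """Count components whose share of the summed squares exceeds 2/M."""
    squares = [s ** 2 for s in svd_values]
    sumtot = sum(squares)                 # sum of the squared SVD values
    fractions = [sq / sumtot for sq in squares]
    threshold = 2.0 / n_proxies           # twice an equal share of 1/M
    count = sum(1 for f in fractions if f > threshold)
    return count, threshold

# Hypothetical SVD values for a 22-proxy network (M = 22, as in the AD1400 step).
toy_values = [10.0, 6.0, 5.0] + [1.0] * 19
count, threshold = retained_count(toy_values, n_proxies=22)
```

For these made-up values the threshold is 2/22, about 0.0909, and three components clear it. Because the fractions depend on how the proxy matrix was scaled before the SVD, the count can change with the scaling convention.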
As a closing thought, I remind readers once again that, at the end of the day, all that is determined in these Wizard of Oz calculations is the weight that should be applied to individual proxies and, in particular, to bristlecones. Given the NAS Panel statement that strip bark chronologies should be “avoided”, no one should accept the validity of any mechanistic rule supposedly showing the mandatory inclusion of the PC4 (bristlecones in a covariance PC analysis).
I notice that some commenters at Tamino’s claim that discussion of the effect of bristlecones is somehow “moving the goalposts”. For such readers, I refer to our March 2004 submission to Nature where we certainly argued that the MBH principal components method was flawed, but we also focused on the operational impact of the flawed methodology – to overweight the Graybill bristlecone chronologies in the reconstruction – an overweighting that was highly questionable given the prior concerns expressed by specialists over these series.
We do assert, based on the above considerations, that the distinctive hockey-stick shape of the MBH98 temperature reconstruction is primarily due to the Graybill cambial dieback and similar tree ring sites that exhibit non-linear or non temperature-related 20th century growth and to a questionable step in their principal components algorithm that overweights these series. Without these problematic series and without their biased principal component analysis, Mann et al. are not entitled to conclude that the 20th century was uniquely warm based on their data and methods.
The arguments in MM2005 (EE, GRL) are consistent with this. While there’s been a development in some aspects of my understanding of the statistical issues over time, bristlecones have been an issue since March 2004 and are not new goalposts. Since then Mann et al have proposed various strategies purporting to yield similar results to MBH98 (without ever confronting either the failed verification r2 results or the lack of claimed “robustness” to all dendroclimatic indicators). We discussed such salvage proposals in MM2005 (EE), which typically develop different strategies for including a heavy weight to Graybill bristlecone chronologies. Some strategies pertain to policies on PC retention in tree ring networks, which I’ll discuss tomorrow. Typically such strategies sacrifice some important consideration (e.g. geographic balance), but all such strategies with the MBH network require the bristlecones. More on this tomorrow.