How many principal components to retain? Recent readers of Climate Audit may not realize that this was an absolute battleground issue in the MBH and Wahl and Ammann disputes. In one sense, it was never resolved with MBH back in 2003-2005, but that was before blogs made it possible to focus attention on such problems. So I’m quite fascinated to see this play out under slightly different circumstances.
In the Steig et al case, as shown in the figure below, if one PC is used, there is a negative trend, and as more PCs are added, the trend appears to converge to zero. (I haven’t explored the nuances of this calculation and merely present this graphic. Retaining 3 PCs, by a fortunate coincidence, happens to maximize the trend.) [Note: These were done using a technique similar to, but not identical to, RegEM TTLS – a sort of regularized truncated SVD, which converges a lot faster. Jeff C has now run RegEM with higher regpar, though not as high as shown here, and got the following trends (these are 10 times my graphic, being per decade rather than per year): regpar 1: −0.07; 2: +0.15; 3: +0.15; 4: +0.16; 5: +0.12; 6: +0.10; 7: +0.11; 8: +0.11; 9: −0.08; 10: +0.05; 11: −0.02; 12: −0.05. So the maximum is not precisely at regpar=3, and there’s no need to raise eyebrows at 3 rather than 4. However, the trend does not “converge” to 0.15; it goes negative at higher regpars.]

Fig 1. Trends by retained PCs. (Using truncated SVD to approximate RegEM – this will be discussed some time.)
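For readers who want to poke at this themselves, here is a toy sketch of the general idea (in Python, on synthetic data – not the script used for the figure, and not Steig’s RegEM): infill the missing values with a rank-k truncated SVD approximation, iterate to convergence, and watch how the trend of the spatial mean changes with k. Every name and number below is illustrative.

```python
import numpy as np

def infill_truncated_svd(X, k, n_iter=100, tol=1e-6):
    """EM-style infilling: replace missing values with a rank-k
    truncated-SVD approximation and iterate. In the spirit of (but
    not identical to) RegEM TTLS; a toy, not production code."""
    mask = np.isnan(X)
    Xf = np.where(mask, np.nanmean(X, axis=0), X)   # crude starting guess
    for _ in range(n_iter):
        mu = Xf.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xf - mu, full_matrices=False)
        approx = (U[:, :k] * s[:k]) @ Vt[:k] + mu
        Xnew = np.where(mask, approx, X)            # keep observed values
        if np.abs(Xnew - Xf).max() < tol:
            return Xnew
        Xf = Xnew
    return Xf

# Trend (per decade) of the spatial mean versus retained PCs, on a
# synthetic station matrix with 30% of the values knocked out.
rng = np.random.default_rng(0)
years = np.arange(1957, 2007)
X = 0.05 * rng.standard_normal((years.size, 30)).cumsum(axis=0)
X[rng.random(X.shape) < 0.3] = np.nan
for k in range(1, 8):
    slope = np.polyfit(years, infill_truncated_svd(X, k).mean(axis=1), 1)[0]
    print(k, round(10 * slope, 3))
```

The only point of the exercise is that the answer moves around with k – the same sensitivity shown in the figure above.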
As reported elsewhere, the eigenvectors corresponding to these PCs match very closely what one would expect from spatially autocorrelated series on an Antarctic-shaped disk – a phenomenon that was known in the literature a generation ago. The “physical” explanations provided by Steig et al appear to be flights of fancy – “castles in the clouds” was a phrase used by Buell in the 1970s to describe attempts at that time to attribute meaning to eigenvector patterns generated merely by the geometry. But that’s a different issue than PC retention.
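The phenomenon is easy to reproduce. Here is a minimal sketch (Python; all parameters are my own choices) that builds an exponentially decaying spatial correlation matrix for random points on a disk and takes its eigenvectors: out come a single-signed “monopole” EOF1 and a pair of orthogonal “dipole” EOF2/EOF3, from geometry alone.

```python
import numpy as np

# Random points on a unit disk (a crude Antarctica), with exponentially
# decaying spatial correlation. The decorrelation length 0.5 is arbitrary.
rng = np.random.default_rng(1)
npts = 400
r, th = np.sqrt(rng.random(npts)), 2 * np.pi * rng.random(npts)
xy = np.column_stack([r * np.cos(th), r * np.sin(th)])
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
C = np.exp(-d / 0.5)

# Eigenvectors of the correlation matrix are the expected EOF patterns.
evals, evecs = np.linalg.eigh(C)
evals, evecs = evals[::-1], evecs[:, ::-1]      # descending order
print(np.round(evals[:4] / evals.sum(), 3))     # leading variance fractions
# EOF1 is single-signed (a monopole); EOF2/EOF3 split the disk into
# dipoles -- Buell's geometry-driven patterns, no physics required.
print(bool(np.all(evecs[:, 0] > 0) or np.all(evecs[:, 0] < 0)))
```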
Now let’s turn back the clock a little. As many readers know, some time ago, coauthor Bradley credited Mann with “originating new mathematical approaches that were crucial to identifying strong trends”. One of the “new mathematical approaches” was Mannian principal components (though it wasn’t really “principal components”). It had the unique ability to extract hockey stick shaped series even from random data – it was that effective at identifying hockey sticks. Mannian principal components were criticized by Wegman and even by the NAS panel. However, in what I suppose is a show of solidarity against such intermeddling, third party paleoclimate use of Mann’s PC1 has actually increased since these reports (Hegerl, Juckes, Osborn and Briffa); Mann’s PC1 even occurs in one of the IPCC AR4 spaghetti graphs.
In the case of the critical North American tree ring network, Mann’s powerful data mining method worked a little differently than with random red noise: it found the Graybill bristlecone data and formed the PC1 from it. In the red noise case, the method aligned and inverted series to get to a HS-shaped PC1; in the NOAMER tree ring case, all it had to do was line up the Graybill bristlecones.
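For newer readers: the “uniqueness” comes from centering each series on the calibration-period mean rather than the full-period mean. A toy sketch below (Python; the AR(1) coefficient, series lengths and calibration window are illustrative, not MBH’s exact values) shows the effect on red noise.

```python
import numpy as np

# Generate AR(1) "red noise" proxies, then compare conventional centering
# with "short" centering on the final calibration window.
rng = np.random.default_rng(2)
n_years, n_series, rho = 581, 70, 0.9          # AD1400-1980, say
X = np.zeros((n_years, n_series))
eps = rng.standard_normal((n_years, n_series))
for t in range(1, n_years):
    X[t] = rho * X[t - 1] + eps[t]

cal = slice(n_years - 79, n_years)             # last 79 "years"
X_short = X - X[cal].mean(axis=0)              # Mannian short centering
X_full = X - X.mean(axis=0)                    # conventional centering

pc1_short = np.linalg.svd(X_short, full_matrices=False)[0][:, 0]
pc1_full = np.linalg.svd(X_full, full_matrices=False)[0][:, 0]

# Hockey-stick index: offset of the calibration-period mean from the
# earlier mean. The short-centered PC1 shows a pronounced offset.
for name, pc in [("short", pc1_short), ("full", pc1_full)]:
    print(name, round(abs(pc[cal].mean() - pc[:cal.start].mean()), 3))
```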
In MBH98, Mann retained 2 PCs for the AD1400 North American network in dispute. In 2003, no one even knew how many PCs Mann retained for the various network/step combinations and Mann refused to say. Trying to guess was complicated by untrue information (e.g. Mann’s “159 series”). In our 2003 Materials Complaint, we asked for a listing of retained PCs and this was provided in the July 2004 Corrigendum Supplementary Information. The only published principle for retaining tree ring PCs was this:
Certain densely sampled regional dendroclimatic data sets have been represented in the network by a smaller number of leading principal components (typically 3–11 depending on the spatial extent and size of the data set). This form of representation ensures a reasonably homogeneous spatial sampling in the multiproxy network (112 indicators back to 1820).
This makes no mention of Preisendorfer’s Rule N – a rule mentioned in connection with temperature principal components. Code archived for the House Committee in 2005 evidences use of Preisendorfer’s Rule N in connection with temperature PCs – but no code evidencing its use in connection with tree ring PCs was provided then. Nor, given the observed retentions (to be discussed below), does it seem possible that this rule was actually used.
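For reference, here is what Rule N is generally understood to mean – my sketch of the rule, not Mann’s implementation, since no tree ring Rule N code was ever archived: compare each eigenvalue of the actual network against the distribution of the corresponding eigenvalue from Monte Carlo noise matrices of the same dimensions.

```python
import numpy as np

def rule_n(X, n_sim=500, q=0.95, seed=0):
    """Preisendorfer's Rule N, as generally understood: retain PC k if
    its normalized eigenvalue exceeds the q-quantile of the k-th
    normalized eigenvalue from white-noise matrices of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    def norm_eigs(M):
        s = np.linalg.svd(M - M.mean(axis=0), compute_uv=False) ** 2
        return s / s.sum()

    obs = norm_eigs(X)
    sims = np.array([norm_eigs(rng.standard_normal((n, p)))
                     for _ in range(n_sim)])
    keep = obs > np.quantile(sims, q, axis=0)
    # count the leading run of "significant" eigenvalues
    return int(np.argmin(keep)) if not keep.all() else keep.size
```

The test reported in the posts linked below was essentially this: run the rule on each network/step combination and compare the answer to the Corrigendum SI retention counts.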
In the absence of any reported principle for deciding how many tree ring PCs to retain, for our emulation of MBH98 we guessed as best we could prior to the Corrigendum SI, using strands of information from here and there, and after July 2004 we used the retentions in the Corrigendum SI. For the AD1400 NOAMER network, we had used 2 PCs (the number used in MBH98 for this network/step combination) right from the outset. However, if you used the default settings of a standard principal components algorithm (covariance matrix), the bristlecones got demoted to the PC4. Instead of contributing 38% of the variance, they yielded less than 8% of the variance. (Using a correlation matrix, they only got demoted to the PC2 – something that we reported in passing in MM2005 (EE), but which others paid a lot of attention to later in the piece.)
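The distinction matters because a covariance PCA weights each series by its variance, while a correlation PCA weights all series equally, so where a coherent group lands in the ordering depends on the choice. A toy sketch (Python; the group sizes, variances and shapes are all invented, and the toy only loosely mirrors the real network’s PC4-versus-PC2 outcome):

```python
import numpy as np

# 60 "other" series carrying three regional signals, plus a coherent but
# low-variance 10-series block standing in for the bristlecones ("bcp").
rng = np.random.default_rng(3)
n = 400
t = np.linspace(0, 1, n)
signals = np.column_stack([np.sin(2 * np.pi * k * t) for k in (1, 2, 3)])
other = 2.0 * signals[:, rng.integers(0, 3, 60)] + rng.standard_normal((n, 60))
bcp_shape = np.where(t > 0.8, (t - 0.8) * 5, 0.0)   # late-period ramp
bcp = 0.4 * (bcp_shape[:, None] + 0.3 * rng.standard_normal((n, 10)))
X = np.hstack([other, bcp])

def bcp_pc(M):
    """0-based index (0 = PC1) of the PC loading most heavily on the
    last 10 (bcp) columns."""
    M = M - M.mean(axis=0)
    Vt = np.linalg.svd(M, full_matrices=False)[2]
    return int(np.argmax((Vt[:, -10:] ** 2).sum(axis=1)))

print("covariance ordering :", bcp_pc(X))                  # far down the list
print("correlation ordering:", bcp_pc(X / X.std(axis=0)))  # near the top
```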
Using the retention schedule of the Corrigendum SI (2 PCs), this meant that two NOAMER covariance PCs were retained – neither of which was imprinted by the bristlecones. So in the subsequent regression phase, there wasn’t a HS to grab, and the results were very different from MBH. Mann quickly noticed that the bristlecones were in the PC4 and mentioned this in his Nature reply and in his December 2004 realclimate post trying to preempt our still unpublished 2005 articles (where we specifically report this phenomenon). We cited a realclimate post in MM2005 (EE), by the way.
In the regression phase as carried out in MBH98, it didn’t matter whether the bristlecones got in through the PC4 or the PC1, as long as they got in. In his 2nd Nature reply, Mann argued that application of Preisendorfer’s Rule N to the NOAMER AD1400 network entitled him to revise the number of retained PCs – the “right” number of PCs to retain was now said to be 5. The argument originally presented in his 2nd Nature reply became a realclimate post in late 2004.
In a couple of the earliest CA posts, “Was Preisendorfer’s Rule N Used in MBH98 Tree Ring Networks?” (see also here), I replicated Mann’s calculation for the North American AD1400 network and then tested other network/calculation step combinations to see if the observed PC retention counts could be generated by Rule N applied to that network. It was impossible to replicate the observed counts using this alleged rule. Some differences were extreme – there is no way that Rule N could result in 9 retained PCs for the AD1750 Stahle/SWM network and only 1 retained PC for the AD1450 Vaganov network.
Not that the “community” had the faintest interest in whether any descriptions of MBH methodology were true or not. However, my guess is that some present readers who are scratching their heads at Gavin’s “explanations” of retained PCs in Steig et al will be able to appreciate the absurdity of Mann’s claim to have made a retention schedule using Rule N. I have no idea how it was actually made – but however it was done, it wasn’t done using the procedure that supposedly rationalized going down to the PC4 in the NOAMER network.
Some of Gavin’s new comments seem to be only loosely connected with the actual record. He said:
Schneider et al (2004) looked much more closely at how many eigenmodes can be usefully extracted from the data and how much of the variance they explain. Their answer was 3 or possibly 4. That’s just how it works out.
Does Gavin think that nobody’s going to check? Schneider et al 2004 is online here. Schneider et al reports on a PC analysis of the T_IR data, stating:
Applying PCA to the covariance matrix of monthly TIR anomalies covering the Antarctic continent results in two modes with distinct eigenvalues that meet the separation criteria of North et al. (1982). The leading mode explains 52% of the variance in TIR, while the second mode accounts for 9% of the variance. The first EOF, shown in Fig. 1a as a regression of TIR anomaly data onto the first normalized principal component (TIR-PC1, Fig. 1b) is associated most strongly with the high plateau of East Antarctica. Locally, high correlations in East Antarctica indicate that up to 80% of the variance in TIR can be explained by this first mode, as determined by r2 values. More moderate correlation of the same sign occurs over West Antarctica.
The second EOF (Fig. 1c) is centered on the Ross Ice Shelf and on the Marie Byrd Land region of the continent, where 40-60% of the TIR variance is explained. Most of West Antarctica is of the same sign, but the pattern changes sign over the Ronne-Filchner ice shelf (at 60°W) and most of East Antarctica. Some coastal areas near 120°E have the same sign as West Antarctica. Only a small fraction of the variance in East Antarctic temperatures can be explained by mode 2.
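For what it’s worth, the separation criterion quoted there is a rule of thumb: the sampling error of eigenvalue λ_k is roughly λ_k·sqrt(2/N) for N effectively independent samples, and a mode is “well separated” only if the gap to its neighbours exceeds that error. A sketch (Python; the tail eigenvalues and effective sample size are illustrative guesses, not Schneider et al’s numbers):

```python
import numpy as np

def north_separated(eigvals, n_eff):
    """North et al. (1982) rule of thumb: eigenvalue k is well separated
    if its sampling error, err_k ~ lam_k * sqrt(2 / n_eff), is smaller
    than the gap to its nearest neighbour."""
    lam = np.asarray(eigvals, dtype=float)
    err = lam * np.sqrt(2.0 / n_eff)
    gaps = np.minimum(np.abs(np.diff(lam, prepend=np.inf)),
                      np.abs(np.diff(lam, append=-np.inf)))
    return err < gaps

# Variance fractions loosely like Schneider et al's (52%, 9%, then a
# crowded tail); n_eff = 300 monthly anomalies is a guess.
print(north_separated([0.52, 0.09, 0.042, 0.040, 0.038], n_eff=300))
# -> [ True  True False False False]: only the first two modes pass.
```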
The two EOFs are illustrated in Schneider et al Figure 1 and look virtually identical to the eigenvectors that I plotted (from the PC analysis of the AVHRR data) as shown below (you need to mentally change blue to red for the PC1 – the sign doesn’t “matter”).

From Schneider et al 2004 Figure 1.
“Two modes with distinct eigenvalues that meet the separation criteria of North et al. (1982)” doesn’t mean the same thing to me as
3 or possibly 4. That’s just how it works out
I guess you have to be a real climate scientist to understand this equivalence. Gavin also says in reply to Ryan O:
Since we are interested in the robust features of the spatial correlation, you don’t want to include too many PCs or eigenmodes (each with ever more localised structures) since you will be including features that are very dependent on individual (and possibly suspect) records
Unless, of course, they are bristlecones.
Lest we forget, Wahl and Ammann had their own rationalization for the bristlecones. They argued that if you keep adding PCs until you include the bristlecones, the results “converge” – an interesting argument to keep in mind, given the apparent “convergence” of Steig results to no trend as more PCs are added. Wahl and Ammann:
When two or three PCs are used, the resulting reconstructions (represented by scenario 5d, the pink (1400–1449) and green (1450–1499) curve in Figure 3) are highly similar (supplemental information). As reported below, these reconstructions are functionally equivalent to reconstructions in which the bristlecone/foxtail pine records are directly excluded (cf. pink/blue curve for scenarios 6a/b in Figure 4). When four or five PCs are used, the resulting reconstructions (represented by scenario 5c, within the thick blue range in Figure 3) are virtually indistinguishable (supplemental information) and are very similar to scenario 5b. The convergence of results obtained using four or five PCs, coupled with the closeness of 5c to 5b, indicates that information relevant to the global eigenvector patterns being reconstructed is no longer added by higher-order PCs beyond the level necessary to capture the temporal information structure of the data (four PCs using unstandardized data, or two PCs using standardized data).
The Wahl and Ammann strategy was condemned by Wegman as “having no statistical integrity”. Wegman:
Wahl and Ammann [argue] that if one adds enough principal components back into the proxy, one obtains the hockey stick shape again. This is precisely the point of contention…
A cardinal rule of statistical inference is that the method of analysis must be decided before looking at the data. The rules and strategy of analysis cannot be changed in order to obtain the desired result. Such a strategy carries no statistical integrity and cannot be used as a basis for drawing sound inferential conclusions.
Again, this proved not to be an incidental issue. Wahl and Ammann’s summary of the procedure – the one said by Wegman to have “no statistical integrity” – said:
“when the full information in the proxy data is represented by the PC series [i.e. enough to get the bristlecones in], the impact of PC calculation methods on climate reconstruction in the MBH method is extremely small… a slight modification to the original Mann et al. reconstruction is justifiable for the first half of the 15th century (∼+0.05–0.10°), which leaves entirely unaltered the primary conclusion of Mann et al.”…
It was this conclusion that was adopted by the IPCC as the last word on the entire episode:
The McIntyre and McKitrick 2005a,b criticism [relating to the extraction of the dominant modes of variability present in a network of western North American tree ring chronologies, using Principal Components Analysis] may have some theoretical foundation, but Wahl and Ammann (2006) also show that the impact on the amplitude of the final reconstruction is very small (~0.05°C).
So however trivial these matters may seem, there’s lots of fairly intricate debate in the background. I, for one, welcome the entry of another data set.
The real answer is, of course, that you don’t decide whether or not to use bristlecones by Preisendorfer’s Rule N. (See http://www.climateaudit.org/?p=296 or http://www.climateaudit.org/?p=2844 .) But it’s sort of fun seeing if they can keep their stories straight.