In his RC post yesterday – also see here – Steig used North et al (1982) as supposed authority for retaining three PCs, a reference unfortunately omitted from the original article. Steig also linked to an earlier RC post on principal components retention, which advocated a completely different “standard approach” to determining which PCs to retain. In yesterday’s post, Steig said:
The standard approach to determining which PCs to retain is to use the criterion of North et al. (1982), which provides an estimate of the uncertainty in the eigenvalues of the linear decomposition of the data into its time varying PCs and spatial eigenvectors (or EOFs).
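For readers who want to experiment, the North et al. (1982) criterion reduces to a simple rule of thumb: the sampling error of an eigenvalue λ is roughly λ·√(2/n), where n is the number of effectively independent samples in time (less than the nominal sample count for autocorrelated data). A minimal sketch, my own illustration rather than Steig's code, with a made-up spectrum and made-up n:

```python
import numpy as np

def north_error_bars(eigenvalues, n_samples):
    """North et al. (1982) rule of thumb: the sampling error of an
    eigenvalue is roughly lambda * sqrt(2/n), where n is the number of
    effectively independent samples in time."""
    lam = np.asarray(eigenvalues, dtype=float)
    return lam * np.sqrt(2.0 / n_samples)

# Made-up eigenvalue spectrum with n = 300 samples, purely illustrative
lam = np.array([10.0, 4.0, 3.8, 1.5, 1.4, 1.35, 0.9])
delta = north_error_bars(lam, 300)
for k, (l, d) in enumerate(zip(lam, delta), start=1):
    print(f"PC{k}: {l:.2f} +/- {d:.3f}")
```

Eigenvalues whose error bars overlap (here, PCs 2–3 and PCs 5–6–7) are "effectively degenerate" in North's sense.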
I’ve revised this thread in light of a plausible interpretation of Steig’s methodology provided by a reader below – an interpretation that eluded me in my first go at this topic. Whatever the merits of this supposedly “standard approach”, it is used neither in MBH98 nor in Wahl and Ammann.
Steig provided the figure shown below with the following comments:
The figure shows the eigenvalue spectrum — including the uncertainties — for both the satellite data from the main temperature reconstruction and the occupied weather station data used in Steig et al., 2009. It’s apparent that in the satellite data (our predictand data set), there are three eigenvalues that lie well above the rest. One could argue for retaining #4 as well, though it does slightly overlap with #5. Retaining more than 4 requires retaining at least 6, and at most 7, to avoid having to retain all the rest (due to their overlapping error bars). With the weather station data (our predictor data set), one could justify choosing to retain 4 by the same criteria, or at most 7. Together, this suggests that in the combined data sets, a maximum of 7 PCs should be retained, and as few as 3. Retaining just 3 is a very reasonable choice, given the significant drop off in variance explained in the satellite data after this point: remember, we are trying to avoid including PCs that simply represent noise.
Figure 1. Eigenvalues for AVHRR data.
As I observed earlier this year here, if you calculate eigenvalues from spatially autocorrelated random data on a geometric shape, you get eigenvectors occurring in multiplets (corresponding to related Chladni patterns/eigenfunctions). The figure below shows the eigenvalues for spatially autocorrelated data on a circular disk. Notice that, after PC1, the lower eigenvalues occur in pairs or multiplets. In particular, notice that PC2 and PC3 have equal eigenvalues.
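The disk multiplets are easy to reproduce numerically: sample points on a disk, build a negative-exponential covariance between them, and eigendecompose. A sketch under assumed parameters (the point count and decorrelation length are arbitrary choices for illustration); the rotational symmetry of the disk forces the second and third eigenvalues into a near-degenerate pair:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points uniformly on the unit disk (rejection from the square)
pts = rng.uniform(-1.0, 1.0, size=(4000, 2))
pts = pts[np.hypot(pts[:, 0], pts[:, 1]) <= 1.0][:500]

# Negative-exponential spatial covariance: C_ij = exp(-|x_i - x_j| / L)
L = 0.3  # assumed decorrelation length (arbitrary for illustration)
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
C = np.exp(-d / L)

# Eigenvalues in descending order; the disk's rotational symmetry makes
# eigenvalues 2 and 3 a near-degenerate pair (a Chladni-style multiplet)
ev = np.linalg.eigvalsh(C)[::-1]
print(ev[:6])
```

Any rotation of the sampled points leaves the covariance structure unchanged, which is exactly the symmetry that produces the degenerate pair.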
North et al (1982) observed that PCs with “close” but not identical eigenvalues could end up being “statistically inseparable”, just as PCs with identical eigenvalues are. North:
An obvious difficulty arises in physically interpreting an EOF if it is not even well-defined intrinsically. This can happen for instance if two or more EOFs have the same eigenvalue. It is easily demonstrated that any linear combination of the members of the degenerate multiplet is also an EOF with the same eigenvalue. Hence in the case of a degenerate multiplet one can choose a range of linear combinations which … are indistinguishable in terms of their contribution to the average variance… Such degeneracies often arise from a symmetry in the problem but they can be present for no apparent reason (accidental degeneracy).
As the reader observes, if each group of PCs with overlapping confidence intervals must be retained or discarded together, then only a few cut-off points are available. You can cut off after 3 PCs or after 7 PCs – as Steig observes in his discussion. Not mentioned by Steig are the other cut-off points permitted by North’s criterion: after 1, 9, 10 and 13 PCs, the last being, by coincidence, the number in Ryan’s preferred version. North’s criterion gives no guidance for choosing among these alternatives.
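The reader’s point can be made mechanical: under North’s criterion, a cut after PC k is defensible only if the error bars of eigenvalues k and k+1 do not overlap, so that no effectively degenerate multiplet is split. A sketch with a made-up spectrum (not the actual AVHRR eigenvalues):

```python
import numpy as np

def admissible_cuts(eigenvalues, n_samples):
    """Positions k after which the spectrum may be cut under North et
    al.'s criterion: the error bars (lambda * sqrt(2/n)) of eigenvalues
    k and k+1 must not overlap, so no effectively degenerate multiplet
    is split. (Retaining the whole spectrum is always allowed and is
    not listed.)"""
    lam = np.asarray(eigenvalues, dtype=float)
    err = lam * np.sqrt(2.0 / n_samples)
    lower, upper = lam - err, lam + err
    return [k + 1 for k in range(len(lam) - 1) if lower[k] > upper[k + 1]]

# Made-up spectrum with multiplets at PCs 2-3 and 5-7 (illustrative only)
lam = [12.0, 5.0, 4.8, 3.0, 1.6, 1.5, 1.45, 0.8]
print(admissible_cuts(lam, 300))  # -> [1, 3, 4, 7]
```

The criterion rules certain cuts out; it says nothing about which of the surviving cuts to prefer, which is the point at issue.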
In addition, if you assume negative exponential decorrelation, then the eigenvalues from a Chladni situation on an Antarctic-shaped disk are as follows – with the first few eigenvalues separated purely by the Chladni patterns, rather than by anything physical. This is a point made on earlier occasions, where we cited Buell’s cautions in the 1970s against practitioners using poorly understood methodology to build “castles in the clouds” (or perhaps, in this case, in the snow).
As an experiment, I plotted the first 200 eigenvalues from the AVHRR anomaly matrix against the first 200 eigenvalues computed assuming negative exponential spatial decorrelation between randomly chosen gridcells. (I don’t have a big enough computer to do this for all 5509 gridcells, and I’m satisfied that the sampling is a sensible way of doing the comparison.) A comparison of observed eigenvalues to those from random matrices is the sort of thing recommended by Preisendorfer.
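Preisendorfer’s “Rule N” comparison is usually done against white-noise null spectra; the comparison above uses spatially autocorrelated noise instead, which is a stiffer benchmark. A minimal sketch of the white-noise version, with toy dimensions as placeholders rather than the full 5509-gridcell AVHRR matrix:

```python
import numpy as np

rng = np.random.default_rng(2)

def rule_n_null(n_time, n_space, n_trials=100, quantile=0.95):
    """Preisendorfer-style "Rule N" null: eigenvalue spectra (as
    fractions of total variance) from white-noise data of the same
    dimensions; the chosen quantile across trials gives a curve that
    observed eigenvalues must exceed to be deemed non-noise."""
    spectra = np.empty((n_trials, min(n_time, n_space)))
    for t in range(n_trials):
        X = rng.standard_normal((n_time, n_space))
        s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
        lam = s ** 2
        spectra[t] = lam / lam.sum()
    return np.quantile(spectra, quantile, axis=0)

# Toy dimensions as placeholders (not the actual AVHRR matrix)
null95 = rule_n_null(n_time=120, n_space=50)
print(null95[:5])
```

Observed eigenvalue fractions would then be overplotted on this null curve; only those lying above it are retained under Rule N.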
This is actually a rather remarkable pattern. It indicates that the AVHRR data has less loading in the first few PCs than spatially autocorrelated data from an Antarctic-shaped disk, and more loading in the lower-order PCs.