One of the most ridiculous aspects and most misleading aspects of MBH (and efforts to rehabilitate it) is the assumption that principal components applied to geographically heterogeneous networks necessarily yield time series of climatic interest.
Preisendorfer (and others) state explicitly that principal components should be used as an exploratory method – and disavowed any notion that merely passing a Rule N test imbued a time series with magical properties.
In primitive societies, there is a “correct” order for magical incantations and woe betide anyone who dares to question the authority of the witch doctors. We see this with the recent discussion at Tamino’s of Rule N: total disregard for any need to establish the scientific veracity of a relationship between ring widths and temperature and replacement of such investigation by incantations that certain relatively arbitrary methods are “correct” or “proper”, using the language of ritual or magic rather than science.
There is much that needs to be said on this topic and I will do so next week.
Today, I want to discuss something very interesting that arose in the discussion of the Stahle SWM network – an “uncontroversial” network, but which may have great utility in understanding the properties of principal components applied to geographically heterogeneous tree ring networks.
I’d plotted up barplots of the Stahle SWM network eigenvectors and, at the suggestion of a reader, eigenvector contour maps. These plots had several extremely interesting properties which are connected with known properties of Toeplitz matrices – hence the title – which, in turn, give some considerable insight into the meaning of principal components applied to such networks – a topic conspicuously lacking in all literature to date..
The sites in the Stahle/SWM network span a north-south range in Texas-Mexico (See location map here). The network contains both earlywood and latewood sites.
Prior to making a barplot of the eigenvector weights, I arranged the series by Earlywood-Latewood and then in a N-S order. The PC1 was more or less a type of weighted average; while the PC2 more or less yielded a “gradient” in eigenvector weights in each case, as shown below, first in barplot form and second in contour map format. Note that, for each of the EW and LW sequences, there is one change in sign for the eigenvector coefficients. This is important.
The next PC showed a contrast between earlywood and latewood; the PC4 showed a type of “upside-down quadratic” or “horseshoe” shape. In this case, for both the EW and LW, there are 2 changes in sign with negative weights at the N and S ends of the range and positive weights in the middle.
Recent Mathematical Literature on “Horseshoes”
I ran across two interesting articles that describe eigenvector patterns that are strongly reminiscent of this particular pattern (I’ll explain how I ran across these articles in the conclusion.)
Diaconis et al 2008, HORSESHOES IN MULTIDIMENSIONAL SCALING AND LOCAL KERNEL METHODS, discusses the provenance of “horseshoe” patterns in principal component representations arising out of ecological ordinations, using the lively example of roll call voting patterns in the U.S. House of Representatives. Despite the lively example, the math is not easy.
Another slightly easier survey is by De Leuuw here where De Leuuw observes of Toeplitz matrices A:
Suppose
is the eigenvector of A corresponding with eigenvalue
the kth largest eigenvalue. We can plot the n elements of
against 1; 2;…;n and connect successive points. The zero-crossings of the resulting piecewise linear function are called the nodes of the eigenvector.
Eigenvector
has exactly k-1 sign changes. The nodes of successive eigenvectors interlace,
Notice the similarity in pattern to the Stahle SWM patterns shown above: the Earlywood PC2 has exactly one sign change; the next PC has two sign changes and so on,… The eigenvector patterns have the same form as the patterns for this specialized matrix form (Toeplitz matrix.)
If you’re familiar with AR1 autocorrelation matrices, you’re familiar with a type example of a Toeplitz matrix, which is defined as a matrix where the value of an entry A[j;i] is given by the difference in indices i.e. For the autocorrelation matrix,
.
Applying to the Stahle SWM Network
Because the eigenvector patterns in the Stahle SWM network are so distinct, one is therefore led to examine the underlying data for evidence of a Toeplitz structure, which in this case would have a very easy interpretation: spatial autocorrelation.
If the site chronologies are spatially autocorrelated with negative exponential (or other monotonic) decorrelation, then this would go a considerable way towards yielding the desired structure. Below is a plot of EW, LW and EWxLW correlations by intersite distance. A spatial autocorrelation model has an extremely good fit to the EW correlations (r2=0.90!) and almost as good to the LW correlations (r2=0.83).
Next I did a one-parameter model of the EW correlation, estimating the coefficient for decorrelation by distance. I then created a matrix (Toeplitz) with the modeled intersite correlations and calculated the eigenvalues, comparing these eigenvalues to the observed eigenvalues, as shown below. The match is fantastically good, compared to anything that I’ve ever seen in dendro or proxy studies. I’m sure that there will be some statistic that measures this correspondence, but for now, the visual match is obviously almost exact.
There is one conclusion that can be drawn from this: everything here turns on spatial autocorrelation and the geometry of the sites (i.e. the intersite distances.) If you have another geometric arrangement of sites, then you’ll get different eigenvalue patterns.
How does this tie back to Toeplitz matrices? In this particular example, it seems to be possible to assign a linear order to the sites (mostly N-S, but the Texas sites are ordered a little “north” of all the Mexican sites, slightly against the latitude by itself). Using this linear order, the correlation matrix can be approximated by a Toeplitz matrix.
The r squared for a fit of EW correlation against abs(i-j) where i and j are simply the matrix index is 0.70! So the approximation by a Toeplitz matrix (a one-dimensional ordering) is itself very good. Because the approximation is as good as it is, the various properties of the eigenvector coefficients deriving from Toeplitz matrices can be observed in the eigenvector patterns of the Stahle SWM network.
Instead of this network containing a huge amount of “information”, what we have is a network that is approximated closely by a spatially autocorrelated sites arranged in a straight line.
Comments
I take a couple of thoughts away from this exercise – aside from it being interesting in itself.
First, the spatial autocorrelation in the site chronologies is exceptionally strong.
Second, the eigenvectors (and eigenvalues) in such situations will be affected by geographic inhomogeneity and it should be possible to say something about this. In the Stahle SWM case, the sites are not random but there is a clump of sites in the north and a clump in the south. If one is seeking a measurement of the “field” over the geographic space, then you can’t get there through PC analysis on its own. The PC1 gets overweighted by the clump of NW sites, where the three nearby sites are spatially autocorrelated and then overweighted in the PC analysis relative to their geographic significance. To make a proper estimate over the geographic region would require kriging or some other utilization of the geographic information. (This is one of many noticeable problems with Mannian methods – geographic information is never used and it doesn’t even “matter” if it is wrong – Mann stubbornly refused to correct geographic errors in Mann et al 2007.)
Third, Rule N, uninformed by the geography and information on spatial autocorrelation, is not very helpful. Supposedly using Rule N (though this is probably untrue), Mann deduced that there were 9 significant PCs in the Stahle SWM network. It’s hard to believe that there is more than 1. And it seems clear that even the PC1, as an effective index of whatever “climate field” is at work, is biased by geographical inhomogeneity and that a more useful index could be constructed by kriging.
Interpretation of the SWM network is simplified by its quasi-linearity in geographic structure. The 2-D situation in larger networks complicates the interpretation, but the salient aspects should carry over. Quite a bit is known about “block Toeplitz” matrices arising out of 2-D spatial autocorrelation, which can perhaps be usefully applied if one feels determined to derive something from tree ring networks.
The very strong spatial autocorrelation observed here seems relevant for assessing data quality in other situations. For example, Graybill’s Niwot Ridge limber pine chronology looks nothing like Woodhouse’s nearby Niwot Ridge chronology. Yeah, yeah, there are microclimate differences, but the inconsistency points to something going well beyond climatic differences. But even if the differences are climatic, this needs to be ironed out. These sites are a half hour drive from UCAR world headquarters and a reconciliation is long overdue.
Postscript:
How did I encounter these interesting articles? In his effort to rehabilitate Mann, while evading the explicit statement of Mann’s citation (Preisendorfer) that PC methods were necessarily centered because they were an analysis of variance, Tamino found some literature mentions of uncentered SVD of data matrices (loosely called “uncentered PC analysis” on some occasions), which he claimed as some sort of justification for Mannian short-centering – again without confronting the problematic qualities of Mannian short-centered in time series analysis. The references by Tamino and his readers were primarily to literature in ecological ordination, where ter Braak has used uncentered PC analysis.
Exactly how usage in ecological ordination can be levered into justification of a quite different use in time series analysis mystifies me, but it sounds good for people that believe in magic.
Anyway, ter Braak and ecological ordination literature are cited by Diaconis et al and de Leuuw. I noticed these papers through googling ter braak and other combinations.
26 Comments
Steve, I cannot compliment you enough on the quality of your work and your presentation of same. I look forward every day to your latest effort.
Congratulations! This looks like work of fundamental importance and value to the field. I hope the recognized experts acknowledge your contribution and don’t dismiss it because of your past critiques of the hockey stick.
It is very interesting to discover that at least one tree ring network does contain useful information after all. It was starting to look like there was a lot of noise in them, and nothing really simple that one could learn from them.
The result that the patterns reveal spatial correlation is very intuitive and should lead to whole new direction of research in the subject. Perhaps something about climate might be extracted in the end but it is now clear that the strong geographic component of the signal needs to be controlled for first before one can start to look for other systematic influences.
Steve–
Have you considered doing your own 2k year temperature reconstruction? (Maybe you have?)
With all of the things that you’ve learned Mann has done wrong, could you do them right? It would be interesting to see another high quality reconstruction in a peer-reviewed journal that one could use as a counterweight to the hickey stick graph.
Thanks,
tim
I’d like to second what Tim G said, Steve. What with documenting the flaws in MBH’s hockey stick and after seeing the criticisms Craig Loehle received for his reconstruction, you must know where the pitfalls are and how it should be done correctly. Nobody else appears willing to buck the consensus or to have a better grasp of the science than you, so why not assemble a reconstruction the right way and let the results speak? If it ends up looking more like a hockey stick than a sine wave, so be it. It’d be a significant contribution to science.
Tim G:
If it were that easy, then Steve might have done one by now. Check out the Loehle 2007 category for an example that avoids the intrinsic noisiness of tree rings.
Loehle’s method would seem to be a reasonable way to approach tree-ring temperature reconstructions as well as non-tree-ring studies.
For each tree-ring, compare it to the 20th century temperature record. Reject all the poor correlations, calibrate the others. Now they are all direct temperature proxies. The only weighting to be done would be geographic.
Drat, left out “nearest” in the second clause of the second sentence.
Well done, Steve. One of your best posts!
Sweet.
#3 & #4
Plant response to moisture is an upside down U, as is the response to temperature and insolation. Thus a particular tree-ring width is only due to a unique temperature at the peak of the U where moisture and insolation are constant. All other tree-ring widths correspond to two temperatures. Variation of moisture and insolation means tree-ring widths correspond to a range of possible combinations of temperature, moisture level and insolation. Pestilence and disease add noise to the combination. It’s hard to imagine a more difficult proxy for temperature to choose. I suspect that Steve has other interesting problems to solve and discoveries to make than attempting to conquer Mount Improbable 🙂
Steve, although I don’t have a link to the paper, isn’t this specific problem addressed by Karl et al 1982.
Karl, T. R., A. J. Koscielny, and H. F. Diaz, 1982: Potential errors in the application of principal component (eigenvector) analysis to geophysical data. J. Appl. Meteor., 21, 1183–1186. [Steve: http://ams.allenpress.com/archive/1520-0450/21/8/pdf/i1520-0450-21-8-1183.pdf ]
Maybe, if Tamino wants some help defending the use of PCA in climate related sciences he should peruse this paper and this followup.
I mean it’s what some in NASA think and those guy’s are always right, aren’t they.
re 11
From the second paper:
Peer review???
sorry for the many typos. Here it is with fewer
From the second paper:
Peer review???
What a neat exposition, Steve!
As if by accident, one of my first CA posts suggested more use of geostatistics (and by inference, kriging) for mixed spatial & temporal data.
Without wanting to take from your serious work, as a reward for effort I offer the following lighthearted comment from
http://www.independentteacher.org/vol5/5.1-5.html
Interestingly the Wikipedia article on kriging mentions Bre-X:
#15. Kriging wasn’t the real issue at Busang; the real issue was they salted the assays. I did some posts on Bre-X early in CA, which may interest some readers.
To be honest, having inbibed a lot of statistical theory almost against my will by adminning this site for so long, I’m still in awe that something as noisy as tree ring growth can be reduced to such a simple modelling analysis using Toeplitz Matrices to a result of spacial autocorrelation.
It leads me to a separate question regarding the gridding of temperature proxies – clearly if a tree ring growth represents some function of a series of climatic variables + noise, then talking about that function as having meaning even 150km away is pushing the envelope of significance even if the proxies happen to share a climatic region as these have.
Could anyone give a simple explanation in a few lines of what this means? Is it something like that the readings are more correlated with geographical position than anything else…?
This is why, if there is any hope of using proxies, it must be done by reconstructing local temperatures and then combining them. Averaging z-scores (tree rings or Thompson’s ice cores) or doing a global/regional PCA or something simply does not make sense.
#17. My conclusion is that the spatial correlation in the Stahle network is quite strong and suggests that one can use this to advantage in checking for outliers i.e. are there subsets that do not cohere to the spatial correlation pattern e.g. perhaps the Graybill sites.
If the NOAMER network is also held accountable to having spatial autocorrelation, the improbability of the Graybill chronologies increases even more. What are the odds that the top 15 HS-shaped sites (in terms of 20th century difference of mean) drawn from a network with high spatial autocorrelation should all be drawn by the same guy? Does Graybill have a magic flute or a flawed instrument?
If these were satellite measurements and one class of instrument had a major drift relative to other instruments, the instrument class would show up in a Mannian PC analysis – but that wouldn’t show that the instrument was valid. I’m going to do a post using this illustration.
Although my mind numbed a bit reading this post (it’s been a long time since college) it appears that there was a lot of eye-balling and intuiting in data selection, and the Taminoesque attempts at justification are justy baffle-gab – blinding with science.
You’re a patient man Steve.
BTW Apparently, sea temperatures and atmospheric temperatures are not increasing.
Very interesting post.
Btw the author of A HORSESHOE FOR MULTIDIMENSIONAL SCALING is Jan de Leeuw (not Leuuw).
Steve, have you tried any non-linear PCA? There are some free Matlab codes for kernel PCA available from the internet.
Kernel PCA is available in the freely available Matlab SPRT (Statistical Pattern Recognition Toolbox).
Also, there are also more references and links to free software (R, Matlab, C, Java …) related to kernel methods found in the following link. The link contains the software for the book itself but the site lists other softwares as well.
– Kernel Methods for Pattern Analysis
“If the facts don’t fit the theory, change the facts.”
–Albert Einstein
Repost, different wording. I think there is a typo in the post.The author of HORSESHOE FOR MULTIDIMENSIONAL SCALING is de Leeuw (not de Leuuw as is mentioned 3 times).
One Trackback
[…] is far more knowledgeable, and has posted about the topic frequently on CA. See, for starters, “Toeplitz Matrices and the Stahle Treering Network”, 3/22/08, “Antarctic Spatial Autocorrelation #1″, 2/20/09, “Steig Eigenvectors and […]