## Toeplitz Matrices and the Stahle Tree Ring Network

One of the most ridiculous aspects and most misleading aspects of MBH (and efforts to rehabilitate it) is the assumption that principal components applied to geographically heterogeneous networks necessarily yield time series of climatic interest.

Preisendorfer (and others) state explicitly that principal components should be used as an exploratory method – and disavowed any notion that merely passing a Rule N test imbued a time series with magical properties.

In primitive societies, there is a “correct” order for magical incantations and woe betide anyone who dares to question the authority of the witch doctors. We see this with the recent discussion at Tamino’s of Rule N: total disregard for any need to establish the scientific veracity of a relationship between ring widths and temperature and replacement of such investigation by incantations that certain relatively arbitrary methods are “correct” or “proper”, using the language of ritual or magic rather than science.

There is much that needs to be said on this topic and I will do so next week.

Today, I want to discuss something very interesting that arose in the discussion of the Stahle SWM network – an “uncontroversial” network, but which may have great utility in understanding the properties of principal components applied to geographically heterogeneous tree ring networks.

I’d plotted up barplots of the Stahle SWM network eigenvectors and, at the suggestion of a reader, eigenvector contour maps. These plots had several extremely interesting properties which are connected with known properties of Toeplitz matrices – hence the title – which, in turn, give some considerable insight into the meaning of principal components applied to such networks – a topic conspicuously lacking in all literature to date..

The sites in the Stahle/SWM network span a north-south range in Texas-Mexico (See location map here). The network contains both earlywood and latewood sites.

Prior to making a barplot of the eigenvector weights, I arranged the series by Earlywood-Latewood and then in a N-S order. The PC1 was more or less a type of weighted average; while the PC2 more or less yielded a “gradient” in eigenvector weights in each case, as shown below, first in barplot form and second in contour map format. Note that, for each of the EW and LW sequences, there is one change in sign for the eigenvector coefficients. This is important.

The next PC showed a contrast between earlywood and latewood; the PC4 showed a type of “upside-down quadratic” or “horseshoe” shape. In this case, for both the EW and LW, there are 2 changes in sign with negative weights at the N and S ends of the range and positive weights in the middle.

Recent Mathematical Literature on “Horseshoes”
I ran across two interesting articles that describe eigenvector patterns that are strongly reminiscent of this particular pattern (I’ll explain how I ran across these articles in the conclusion.)

Diaconis et al 2008, HORSESHOES IN MULTIDIMENSIONAL SCALING AND LOCAL KERNEL METHODS, discusses the provenance of “horseshoe” patterns in principal component representations arising out of ecological ordinations, using the lively example of roll call voting patterns in the U.S. House of Representatives. Despite the lively example, the math is not easy.

Another slightly easier survey is by De Leuuw here where De Leuuw observes of Toeplitz matrices A:

Suppose $x_k$ is the eigenvector of A corresponding with eigenvalue $\lambda_k$ the kth largest eigenvalue. We can plot the n elements of $x_k$ against 1; 2;…;n and connect successive points. The zero-crossings of the resulting piecewise linear function are called the nodes of the eigenvector.

Eigenvector $x_k$ has exactly k-1 sign changes. The nodes of successive eigenvectors interlace,

Notice the similarity in pattern to the Stahle SWM patterns shown above: the Earlywood PC2 has exactly one sign change; the next PC has two sign changes and so on,… The eigenvector patterns have the same form as the patterns for this specialized matrix form (Toeplitz matrix.)

If you’re familiar with AR1 autocorrelation matrices, you’re familiar with a type example of a Toeplitz matrix, which is defined as a matrix where the value of an entry A[j;i] is given by the difference in indices i.e. $A[j;i]=t_{j-i}.$ For the autocorrelation matrix, $t_{j-i}=\rho^{j-i}$.

Applying to the Stahle SWM Network
Because the eigenvector patterns in the Stahle SWM network are so distinct, one is therefore led to examine the underlying data for evidence of a Toeplitz structure, which in this case would have a very easy interpretation: spatial autocorrelation.

If the site chronologies are spatially autocorrelated with negative exponential (or other monotonic) decorrelation, then this would go a considerable way towards yielding the desired structure. Below is a plot of EW, LW and EWxLW correlations by intersite distance. A spatial autocorrelation model has an extremely good fit to the EW correlations (r2=0.90!) and almost as good to the LW correlations (r2=0.83).

Next I did a one-parameter model of the EW correlation, estimating the coefficient for decorrelation by distance. I then created a matrix (Toeplitz) with the modeled intersite correlations and calculated the eigenvalues, comparing these eigenvalues to the observed eigenvalues, as shown below. The match is fantastically good, compared to anything that I’ve ever seen in dendro or proxy studies. I’m sure that there will be some statistic that measures this correspondence, but for now, the visual match is obviously almost exact.

There is one conclusion that can be drawn from this: everything here turns on spatial autocorrelation and the geometry of the sites (i.e. the intersite distances.) If you have another geometric arrangement of sites, then you’ll get different eigenvalue patterns.

How does this tie back to Toeplitz matrices? In this particular example, it seems to be possible to assign a linear order to the sites (mostly N-S, but the Texas sites are ordered a little “north” of all the Mexican sites, slightly against the latitude by itself). Using this linear order, the correlation matrix can be approximated by a Toeplitz matrix.

The r squared for a fit of EW correlation against abs(i-j) where i and j are simply the matrix index is 0.70! So the approximation by a Toeplitz matrix (a one-dimensional ordering) is itself very good. Because the approximation is as good as it is, the various properties of the eigenvector coefficients deriving from Toeplitz matrices can be observed in the eigenvector patterns of the Stahle SWM network.

Instead of this network containing a huge amount of “information”, what we have is a network that is approximated closely by a spatially autocorrelated sites arranged in a straight line.

I take a couple of thoughts away from this exercise – aside from it being interesting in itself.

First, the spatial autocorrelation in the site chronologies is exceptionally strong.

Second, the eigenvectors (and eigenvalues) in such situations will be affected by geographic inhomogeneity and it should be possible to say something about this. In the Stahle SWM case, the sites are not random but there is a clump of sites in the north and a clump in the south. If one is seeking a measurement of the “field” over the geographic space, then you can’t get there through PC analysis on its own. The PC1 gets overweighted by the clump of NW sites, where the three nearby sites are spatially autocorrelated and then overweighted in the PC analysis relative to their geographic significance. To make a proper estimate over the geographic region would require kriging or some other utilization of the geographic information. (This is one of many noticeable problems with Mannian methods – geographic information is never used and it doesn’t even “matter” if it is wrong – Mann stubbornly refused to correct geographic errors in Mann et al 2007.)

Third, Rule N, uninformed by the geography and information on spatial autocorrelation, is not very helpful. Supposedly using Rule N (though this is probably untrue), Mann deduced that there were 9 significant PCs in the Stahle SWM network. It’s hard to believe that there is more than 1. And it seems clear that even the PC1, as an effective index of whatever “climate field” is at work, is biased by geographical inhomogeneity and that a more useful index could be constructed by kriging.

Interpretation of the SWM network is simplified by its quasi-linearity in geographic structure. The 2-D situation in larger networks complicates the interpretation, but the salient aspects should carry over. Quite a bit is known about “block Toeplitz” matrices arising out of 2-D spatial autocorrelation, which can perhaps be usefully applied if one feels determined to derive something from tree ring networks.

The very strong spatial autocorrelation observed here seems relevant for assessing data quality in other situations. For example, Graybill’s Niwot Ridge limber pine chronology looks nothing like Woodhouse’s nearby Niwot Ridge chronology. Yeah, yeah, there are microclimate differences, but the inconsistency points to something going well beyond climatic differences. But even if the differences are climatic, this needs to be ironed out. These sites are a half hour drive from UCAR world headquarters and a reconciliation is long overdue.

Postscript:

How did I encounter these interesting articles? In his effort to rehabilitate Mann, while evading the explicit statement of Mann’s citation (Preisendorfer) that PC methods were necessarily centered because they were an analysis of variance, Tamino found some literature mentions of uncentered SVD of data matrices (loosely called “uncentered PC analysis” on some occasions), which he claimed as some sort of justification for Mannian short-centering – again without confronting the problematic qualities of Mannian short-centered in time series analysis. The references by Tamino and his readers were primarily to literature in ecological ordination, where ter Braak has used uncentered PC analysis.

Exactly how usage in ecological ordination can be levered into justification of a quite different use in time series analysis mystifies me, but it sounds good for people that believe in magic.

Anyway, ter Braak and ecological ordination literature are cited by Diaconis et al and de Leuuw. I noticed these papers through googling ter braak and other combinations.

1. Posted Mar 22, 2008 at 8:16 AM | Permalink

Steve, I cannot compliment you enough on the quality of your work and your presentation of same. I look forward every day to your latest effort.

2. Peter Hartley
Posted Mar 22, 2008 at 9:42 AM | Permalink

Congratulations! This looks like work of fundamental importance and value to the field. I hope the recognized experts acknowledge your contribution and don’t dismiss it because of your past critiques of the hockey stick.

It is very interesting to discover that at least one tree ring network does contain useful information after all. It was starting to look like there was a lot of noise in them, and nothing really simple that one could learn from them.

The result that the patterns reveal spatial correlation is very intuitive and should lead to whole new direction of research in the subject. Perhaps something about climate might be extracted in the end but it is now clear that the strong geographic component of the signal needs to be controlled for first before one can start to look for other systematic influences.

3. Tim G
Posted Mar 22, 2008 at 9:50 AM | Permalink

Steve–

Have you considered doing your own 2k year temperature reconstruction? (Maybe you have?)

With all of the things that you’ve learned Mann has done wrong, could you do them right? It would be interesting to see another high quality reconstruction in a peer-reviewed journal that one could use as a counterweight to the hickey stick graph.

Thanks,
tim

4. Posted Mar 22, 2008 at 10:44 AM | Permalink

I’d like to second what Tim G said, Steve. What with documenting the flaws in MBH’s hockey stick and after seeing the criticisms Craig Loehle received for his reconstruction, you must know where the pitfalls are and how it should be done correctly. Nobody else appears willing to buck the consensus or to have a better grasp of the science than you, so why not assemble a reconstruction the right way and let the results speak? If it ends up looking more like a hockey stick than a sine wave, so be it. It’d be a significant contribution to science.

5. John A
Posted Mar 22, 2008 at 10:49 AM | Permalink

Tim G:

If it were that easy, then Steve might have done one by now. Check out the Loehle 2007 category for an example that avoids the intrinsic noisiness of tree rings.

6. Alan S. Blue
Posted Mar 22, 2008 at 11:29 AM | Permalink

Loehle’s method would seem to be a reasonable way to approach tree-ring temperature reconstructions as well as non-tree-ring studies.

For each tree-ring, compare it to the 20th century temperature record. Reject all the poor correlations, calibrate the others. Now they are all direct temperature proxies. The only weighting to be done would be geographic.

7. Alan S. Blue
Posted Mar 22, 2008 at 11:31 AM | Permalink

Drat, left out “nearest” in the second clause of the second sentence.

8. AlanB
Posted Mar 22, 2008 at 12:35 PM | Permalink

Well done, Steve. One of your best posts!

9. Sam Urbinto
Posted Mar 22, 2008 at 2:01 PM | Permalink

Sweet.

10. Posted Mar 22, 2008 at 2:45 PM | Permalink

#3 & #4

Plant response to moisture is an upside down U, as is the response to temperature and insolation. Thus a particular tree-ring width is only due to a unique temperature at the peak of the U where moisture and insolation are constant. All other tree-ring widths correspond to two temperatures. Variation of moisture and insolation means tree-ring widths correspond to a range of possible combinations of temperature, moisture level and insolation. Pestilence and disease add noise to the combination. It’s hard to imagine a more difficult proxy for temperature to choose. I suspect that Steve has other interesting problems to solve and discoveries to make than attempting to conquer Mount Improbable 🙂

11. Ellis
Posted Mar 22, 2008 at 3:21 PM | Permalink

Second, the eigenvectors (and eigenvalues) in such situations will be affected by geographic inhomogeneity and it should be possible to say something about this.

Steve, although I don’t have a link to the paper, isn’t this specific problem addressed by Karl et al 1982.
Karl, T. R., A. J. Koscielny, and H. F. Diaz, 1982: Potential errors in the application of principal component (eigenvector) analysis to geophysical data. J. Appl. Meteor., 21, 1183–1186. [Steve: http://ams.allenpress.com/archive/1520-0450/21/8/pdf/i1520-0450-21-8-1183.pdf ]

Maybe, if Tamino wants some help defending the use of PCA in climate related sciences he should peruse this paper and this followup.

I mean it’s what some in NASA think and those guy’s are always right, aren’t they.

12. Stan Palmer
Posted Mar 22, 2008 at 4:50 PM | Permalink

re 11

From the second paper:

… The spatial orhaogonality of the eigenvectors is not well-adapted to all problmes. As a consequence, many EOF solutionss dipaly an artificial alternating sign strcuture. Horel (1981) quotes many cases in the literature of this artficial feature, for example for the sea lvel pressure in restricted regions such as Australia or the Artic. …

Peer review???

13. Stan Palmer
Posted Mar 22, 2008 at 4:55 PM | Permalink

sorry for the many typos. Here it is with fewer

From the second paper:

… The spatial orthogonality of the eigenvectors is not well-adapted to all problems. As a consequence, many EOF solutions display an artificial alternating sign structure. Horel (1981) quotes many cases in the literature of this artificial feature, for example for the sea lvel pressure in restricted regions such as Australia or the Artic.

Peer review???

14. Geoff Sherrington
Posted Mar 22, 2008 at 6:11 PM | Permalink

What a neat exposition, Steve!

As if by accident, one of my first CA posts suggested more use of geostatistics (and by inference, kriging) for mixed spatial & temporal data.

Without wanting to take from your serious work, as a reward for effort I offer the following lighthearted comment from

http://www.independentteacher.org/vol5/5.1-5.html

Typical of perspectives shared by Mozart and his libretto collaborators would be the symbolism of the Temple of Reason (standing alongside the Temple of Wisdom and the Temple of Nature) in The Magic Flute and the importance of self-restraint in the initiation rituals Tamino must undergo in that opera. … A representative passage from Jane Austen’s Persuasion, for example, reflects similar premises: all “qualities of the mind,” observes Austen’s protagonist, Anne Elliot, “should have [their] proportions and limits.”

15. John A
Posted Mar 22, 2008 at 7:43 PM | Permalink

Interestingly the Wikipedia article on kriging mentions Bre-X:

Controversy in mineral exploration and mining

The question of whether spatial dependence may be assumed or ought to be verified by applying Fisher’s F-test to the variance of a set of measured values and the first variance term of the ordered set prior to interpolation by kriging is of particular relevance in mineral exploration and mining. For example, Clark’s hypothetical uranium data in Practical Geostatistics do not display a significant degree of spatial dependence but the author reports a kriged estimate for some selected coordinates within this sample space anyway. The practice of kriging lends itself to abuse, particularly when applied to a model ore distribution based on the assumption that ore concentrations display a significant degree of spatial dependency in the sample space under examination. Spatial dependence between borehole grades was assumed at Bre-X’s Busang property, Hecla’s Grouse Creek mine and scores of others where gold grades turned out to be lower than predicted. A significant degree of spatial dependence is required to justify interpolation between measured values in ordered sets. Failing to pass a test for spatial dependence would indicate that a constant model cannot be distinguished from a kriging model without further information or knowledge.

16. Steve McIntyre
Posted Mar 22, 2008 at 7:57 PM | Permalink

#15. Kriging wasn’t the real issue at Busang; the real issue was they salted the assays. I did some posts on Bre-X early in CA, which may interest some readers.

17. John A
Posted Mar 23, 2008 at 3:30 AM | Permalink

To be honest, having inbibed a lot of statistical theory almost against my will by adminning this site for so long, I’m still in awe that something as noisy as tree ring growth can be reduced to such a simple modelling analysis using Toeplitz Matrices to a result of spacial autocorrelation.

It leads me to a separate question regarding the gridding of temperature proxies – clearly if a tree ring growth represents some function of a series of climatic variables + noise, then talking about that function as having meaning even 150km away is pushing the envelope of significance even if the proxies happen to share a climatic region as these have.

18. fred
Posted Mar 23, 2008 at 5:54 AM | Permalink

Could anyone give a simple explanation in a few lines of what this means? Is it something like that the readings are more correlated with geographical position than anything else…?

19. Craig Loehle
Posted Mar 23, 2008 at 6:52 AM | Permalink

This is why, if there is any hope of using proxies, it must be done by reconstructing local temperatures and then combining them. Averaging z-scores (tree rings or Thompson’s ice cores) or doing a global/regional PCA or something simply does not make sense.

20. Steve McIntyre
Posted Mar 23, 2008 at 7:01 AM | Permalink

#17. My conclusion is that the spatial correlation in the Stahle network is quite strong and suggests that one can use this to advantage in checking for outliers i.e. are there subsets that do not cohere to the spatial correlation pattern e.g. perhaps the Graybill sites.

If the NOAMER network is also held accountable to having spatial autocorrelation, the improbability of the Graybill chronologies increases even more. What are the odds that the top 15 HS-shaped sites (in terms of 20th century difference of mean) drawn from a network with high spatial autocorrelation should all be drawn by the same guy? Does Graybill have a magic flute or a flawed instrument?

If these were satellite measurements and one class of instrument had a major drift relative to other instruments, the instrument class would show up in a Mannian PC analysis – but that wouldn’t show that the instrument was valid. I’m going to do a post using this illustration.

21. Robert Wood
Posted Mar 23, 2008 at 4:16 PM | Permalink

Although my mind numbed a bit reading this post (it’s been a long time since college) it appears that there was a lot of eye-balling and intuiting in data selection, and the Taminoesque attempts at justification are justy baffle-gab – blinding with science.

You’re a patient man Steve.

BTW Apparently, sea temperatures and atmospheric temperatures are not increasing.

22. harold
Posted Mar 23, 2008 at 4:55 PM | Permalink

Very interesting post.
Btw the author of A HORSESHOE FOR MULTIDIMENSIONAL SCALING is Jan de Leeuw (not Leuuw).

23. Falafulu Fisi
Posted Mar 24, 2008 at 8:46 AM | Permalink

Steve, have you tried any non-linear PCA? There are some free Matlab codes for kernel PCA available from the internet.

24. Falafulu Fisi
Posted Mar 24, 2008 at 8:57 AM | Permalink

Kernel PCA is available in the freely available Matlab SPRT (Statistical Pattern Recognition Toolbox).

Also, there are also more references and links to free software (R, Matlab, C, Java …) related to kernel methods found in the following link. The link contains the software for the book itself but the site lists other softwares as well.

25. Patrick Henry
Posted Mar 24, 2008 at 10:25 AM | Permalink

“If the facts don’t fit the theory, change the facts.”
–Albert Einstein

26. harold
Posted Mar 26, 2008 at 5:53 PM | Permalink

Repost, different wording. I think there is a typo in the post.The author of HORSESHOE FOR MULTIDIMENSIONAL SCALING is de Leeuw (not de Leuuw as is mentioned 3 times).