I’ve had a number of requests to explain some statistical topics and tests of significance. I’d rather not get involved in explaining general statistical concepts, which are perfectly well covered in many other places. However, I am going to post some notes on “spurious significance”, which, after all, was part of the title of our GRL article "Hockey Sticks, Principal Components and Spurious Significance", although most of the attention has gone to principal components.
"Spurious significance" is a term in statistics for a situation in which a statistic returns a value that is "statistically significant" even though no real relationship can exist. It’s a topic that sounds easy but quickly gets difficult. I’m going to reference Phillips frequently because his original 1986 article on the topic was a remarkable tour de force, which framed this entire matter in a very sophisticated way. Phillips introduces the topic as follows (anyone who doubts the quick descent into complexity need only look at the article itself):
Spurious regressions or nonsense correlations as they were originally called have a long history in statistics, dating back at least to Yule. Textbooks and the literature of statistics and econometrics abound with interesting examples, many of them quite humorous. One is the high correlation between the number of ordained ministers and the rate of alcoholism in Britain in the nineteenth century. Another is that of Yule reporting a correlation of 0.95 between the proportion of Church of England marriages to all marriages and the mortality rate over the period 1866-1911. Yet another is the econometric example of alchemy reported by Hendry between the price level and cumulative rainfall in the U.K. The latter “relation” proved resilient to many econometric diagnostic tests and was humorously advanced by its author as a new “theory” of inflation. With so many well known examples like these, the pitfalls of regression and correlation studies are now common knowledge, even to nonspecialists. The situation is especially difficult in cases where the data are trending, as indeed they are in the examples above, because “third” factors that drive the trends come into play into the behaviour of the regression, although these factors may not be at all evident in the data.
Another set of examples is here including a regression of:
Egyptian infant mortality rate (Y), 1971-1990, annual data, on Gross aggregate income of American farmers (I) and Total Honduran money supply (M), where the values of the key statistics are: R² = 0.918, F = 95.17.
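The way unrelated trending series produce "significant" regressions can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: the series are simulated Gaussian random walks (not any of the data sets mentioned above), the helper names `ols_slope_t` and `rejection_rate` are made up for this example, and the 1.96 cutoff is the usual large-sample approximation to a two-sided 5% test.

```python
import numpy as np

def ols_slope_t(y, x):
    """t-statistic of the slope in y = a + b*x + e, using the textbook
    OLS formula that assumes independent errors."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # OLS covariance of coefficients
    return beta[1] / np.sqrt(cov[1, 1])

def rejection_rate(n_obs, n_sims, random_walk, seed=0):
    """Fraction of simulations in which two INDEPENDENT series yield a
    slope that looks significant at the nominal 5% level."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        x = rng.standard_normal(n_obs)
        y = rng.standard_normal(n_obs)
        if random_walk:
            # Cumulative sums turn white noise into trending random walks.
            x, y = np.cumsum(x), np.cumsum(y)
        if abs(ols_slope_t(y, x)) > 1.96:
            hits += 1
    return hits / n_sims

white = rejection_rate(100, 500, random_walk=False)
walk = rejection_rate(100, 500, random_walk=True)
print(f"nominal 5% test, white noise:  {white:.2f}")
print(f"nominal 5% test, random walks: {walk:.2f}")
```

For independent white noise the rejection rate should sit near the nominal 5%; for independent random walks it is typically far higher, even though the two series are unrelated by construction.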
Ultimately, where I’m going with this is a consideration of a couple of different situations, one with which I’m very familiar and one which is new to me. The old one is the regression of MBH98 proxies in the MBH98 calibration step against MBH98 temperature PCs, and, in particular, our two old favorites: the Gaspé tree ring series and the NOAMER PC1 against the temperature PC1. The other one is going to be the regression of the satellite GLB monthly series against a time trend.
I’m writing this little guide through the technical literature mostly for my own reference. None of our published results rely on understanding this literature. However, my intuition is that it’s relevant to some of the issues that are worrying me, so I’m trying to master it. I’ll try to keep the writing relatively accessible, mostly so that I’m sure I understand things myself, but I make no promises about sugarcoating it.
The discussions will refer to linear regression. Recently, there has been rather a fashion in multiproxy studies to propose “scaling” proxy series to the mean and variance of the target series in the calibration period as an alternative to regression (e.g. Esper et al [GRL 2005] and references). However, it seems intuitively clear to me that, whatever the merits of this approach, it does not circumvent issues of spurious relationships. This can be seen by recognizing that “scaling” as practiced by paleoclimatologists is simply “constrained” regression, i.e. regression with a restriction on the coefficients. The equivalence can be demonstrated with a trivial Lagrange multiplier argument, which I’m thinking of submitting somewhere. For now, interested parties should bear with me and accept that scaling as practiced in paleoclimate is a form of constrained regression and not a magic bullet for avoiding problems of spurious significance.
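As a rough numerical illustration (not the Lagrange multiplier argument itself), the sketch below compares the two fits on simulated data. It assumes one particular reading of “scaling”: the proxy is shifted and rescaled so that its mean and standard deviation match the target over the calibration period. The series `target` and `proxy` are hypothetical, generated only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
target = rng.standard_normal(n)                 # hypothetical target series
proxy = 0.5 * target + rng.standard_normal(n)   # hypothetical noisy proxy

r = np.corrcoef(proxy, target)[0, 1]
sd_ratio = target.std(ddof=1) / proxy.std(ddof=1)

# OLS of target on proxy: slope = r * sd(target) / sd(proxy)
ols_slope = np.polyfit(proxy, target, 1)[0]

# Variance-matching "scaling": the slope is fixed by the constraint that
# the fitted series has the same variance as the target.
scaling_slope = np.sign(r) * sd_ratio

print(f"correlation r:  {r:.3f}")
print(f"OLS slope:      {ols_slope:.3f}")
print(f"scaling slope:  {scaling_slope:.3f}")
print(f"ratio OLS/scal: {ols_slope / scaling_slope:.3f}  (equals |r|)")
```

Under this reading, both methods produce a linear function of the same proxy and differ only in the slope, by a factor of |r|. Nothing in the rescaling step tests whether the underlying relationship is real.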
These notes will focus heavily on autocorrelated series. Statistics as presented in curricula always starts from the concept of independent draws, but the behaviour of even simple statistics like the mean and variance in highly autocorrelated series is quite different. It turns out that these issues are intimately involved with spurious regression: one of the main reasons for spurious statistics is a massive under-estimation of the standard deviation (variance) in autocorrelated series when the standard ordinary-least-squares (OLS) estimate is used. A variety of techniques have been proposed in econometrics for dealing with this problem, but the issue doesn’t seem to have surfaced in paleoclimatology.
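The under-estimation is easy to see for the simplest statistic, the sample mean. The sketch below is a minimal simulation under assumed settings: a stationary AR(1) process with coefficient `phi = 0.9` and series length 200, both chosen purely for illustration. It compares the usual independent-draws standard error, s/sqrt(n), with the actual spread of the sample mean across many simulated series.

```python
import numpy as np

def ar1(n, phi, rng):
    """Simulate a stationary AR(1) series x[t] = phi * x[t-1] + e[t]."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1 - phi**2)   # start from the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

rng = np.random.default_rng(2)
n, phi, n_sims = 200, 0.9, 2000

means = np.empty(n_sims)
naive_se = np.empty(n_sims)
for i in range(n_sims):
    x = ar1(n, phi, rng)
    means[i] = x.mean()
    # Standard error that would be correct for independent draws.
    naive_se[i] = x.std(ddof=1) / np.sqrt(n)

actual_se = means.std(ddof=1)      # true sampling spread of the mean
avg_naive_se = naive_se.mean()
ratio = actual_se / avg_naive_se

print(f"average naive standard error: {avg_naive_se:.3f}")
print(f"actual standard error:        {actual_se:.3f}")
print(f"under-estimation factor:      {ratio:.2f}")
# Large-sample approximation for AR(1): sqrt((1 + phi) / (1 - phi))
print(f"approximate theory:           {np.sqrt((1 + phi) / (1 - phi)):.2f}")
```

With strong positive autocorrelation the naive standard error is too small by a factor of several, so any test built on it will report significance far too often.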
Phillips, P. New Tools for Understanding Spurious Regressions, Econometrica, 66, 1299-1325. http://cowles.econ.yale.edu/P/cp/p09b/p0966.pdf
Phillips, P. "Understanding Spurious Regressions in Econometrics." Journal of Econometrics, 33, 1986. http://cowles.econ.yale.edu/P/cp/p06b/p0667.pdf
Esper J, Frank DC, Wilson RJS, Briffa KR (2005) Effect of scaling and regression on reconstructed temperature amplitude for the past millennium. Geophysical Research Letters 32, doi: 10.1029/2004GL021236. http://www.wsl.ch/staff/jan.esper/publications/GRL_Esper_2005.pdf