One of the papers that has most informed my views on multiproxy studies (and I’ve mentioned it from time to time) is Ferson et al., Understanding Spurious Regressions in Financial Economics, which I read a couple of years ago. "Spurious regression" here is a false relationship between series, frequently observed with highly autocorrelated series. Random walks are the classic example of Granger and Newbold, but the effect is also observable in finite samples of high-AR1 series. These very high AR1 coefficients are characteristic of both proxies and temperature PC series. Some of the phrases from Ferson should send chills up the spine of anyone relying on multiproxy studies:
Data mining for predictor variables [proxies] interacts with spurious regression bias. The two effects reinforce each other because more highly persistent series are more likely to be found significant in the search for predictor variables. Our simulations suggest that many of the regressions in the literature, based on individual predictor variables, may be spurious…
If the expected return accounts for 1 per cent of the stock return variance, mining among 5 to 10 instruments has as much impact as 50 to 100 instruments with no spurious regression. Assuming we sift through only 10 instruments, all of the regressions from the previous studies in Table I appear consistent with a spurious mining process. The pattern of evidence in the instruments in the literature is similar to what is expected under a spurious mining process with an underlying persistent expected return. In this case, we would expect instruments to arise, then fail to work out of sample.
Readers of this blog are familiar with the fact that the classic proxies are hugely autocorrelated, especially the NOAMER PC1 (whose autocorrelation is off the chart, as is that of the Gaspé tree ring series). I’ve been aware of these big autocorrelations for a long time, although I’m only now getting into a position where I can discuss these issues from a more theoretical perspective. There are some interesting connections of Ferson et al. with ARMA(1,1) processes, an observation made in Deng. These connections sparked my binge of posts in August on ARMA(1,1) processes, although the connection would not have been apparent and I didn’t mention it at the time. It’s possible that some portion of the autocorrelation in the bristlecones is exacerbated by a non-stationary trend (fertilization). Trends are hard to separate out from high autocorrelation. The difference is not material to any points made here.
Obviously, we’re getting into very high AR1 coefficients in both proxies and temperature PC1s. Modeled as ARMA(1,1) processes, the AR1 coefficient is typically even higher than in a pure AR1 model, intensifying the problem in any situations of interest here. Anytime you see AR1 coefficients greater than 0.9 and especially greater than 0.95, warning labels need to be attached to the regressions as you get into spurious regression territory in finite samples.
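To make the finite-sample point concrete, here is a minimal simulation sketch. It is not taken from Ferson et al.; the series length of 79, the critical value of 1.99 and the function names are illustrative assumptions. It regresses one AR1 series on a second, completely independent AR1 series and counts how often the naive t-test calls the slope "significant" at the nominal 5% level.

```python
import numpy as np


def ar1(n, phi, rng):
    """Simulate a stationary AR(1) series of length n (requires |phi| < 1)."""
    x = np.empty(n)
    # Start from the stationary distribution, whose variance is 1 / (1 - phi^2).
    x[0] = rng.normal() / np.sqrt(1.0 - phi**2)
    eps = rng.normal(size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x


def slope_t_stat(y, x):
    """Naive OLS t-statistic for the slope of y on x (intercept included)."""
    n = len(y)
    xc = x - x.mean()
    yc = y - y.mean()
    beta = (xc @ yc) / (xc @ xc)
    resid = yc - beta * xc
    se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
    return beta / se


def false_positive_rate(phi, n=79, trials=2000, seed=0):
    """Share of trials in which two independent AR(1) series look 'related'."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        y = ar1(n, phi, rng)
        x = ar1(n, phi, rng)  # independent of y by construction
        # 1.99 is roughly the two-sided 5% critical value for 77 degrees of freedom.
        if abs(slope_t_stat(y, x)) > 1.99:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for phi in (0.0, 0.5, 0.9, 0.95):
        print(phi, false_positive_rate(phi))
```

With phi = 0 the rejection rate should sit near the nominal 5%. As phi moves above 0.9 it should climb far above that, even though the two series are unrelated by construction.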
Here is a more extended excerpt (I’ll try to do a more detailed commentary some time):
Abstract: Even though stock returns are not highly autocorrelated, there is a spurious regression bias in predictive regressions for stock returns related to the classic studies of Yule and Granger and Newbold. Data mining for predictor variables interacts with spurious regression bias. The two effects reinforce each other because more highly persistent series are more likely to be found significant in the search for predictor variables. Our simulations suggest that many of the regressions in the literature, based on individual predictor variables, may be spurious.
“Unlike the regressions in those papers [Yule 1926 and Granger and Newbold 1974], asset pricing regressions use rates of return, which are not highly persistent, as the dependent variables. However, asset returns are the expected returns plus unpredictable noise. If the expected returns are persistent, then there is a risk of finding a spurious relation between the return and an independent, highly autocorrelated lagged variable.
Where there is no persistence in the true expected return, the spurious regression phenomenon is not a concern. This is true even when the measured regressor is highly persistent. …
Given persistent expected returns, we find that spurious regression can be a serious concern. The problem for stock returns gets worse as the autocorrelation in the expected returns increases and as the fraction of the stock return variance attributed to the conditional mean increases. ..we find that 7 of the 17 statistics that would be considered significant using traditional standards are no longer significant in view of the spurious regression bias. We therefore call into question the validity of specific instruments identified in the literature, such as the term spread, book-to-market ratio and dividend yield.
Data mining, in the form of a search through the data for high-R2 predictors, results in regressions whose apparent explanatory power occurs by chance. (p. 1410). ..In the presence of spurious regression, persistent variables are likely to be mined and the two effects reinforce each other. …If the expected return accounts for 1 per cent of the stock return variance, mining among 5 to 10 instruments has as much impact as 50 to 100 instruments with no spurious regression. Assuming we sift through only 10 instruments, all of the regressions from the previous studies in Table I appear consistent with a spurious mining process.
The pattern of evidence in the instruments in the literature is similar to what is expected under a spurious mining process with an underlying persistent expected return. In this case, we would expect instruments to arise, then fail to work out of sample. With fresh data, new instruments would arise then fail; the dividend yield rose to prominence in the 1980s, but fails to work in post-1990 data. The book-to-market ratio seems to have weakened in recent data. With fresh data, new instruments seem to work. There are two implications. First we should be concerned that these new instruments are likely to fail out of sample. Second, any stylized facts based on empirically motivated instruments and asset pricing tests based on such tests should be viewed with scepticism.
We see plenty of examples in climate science that correspond to Ferson’s hypothetical analyst. Consider Jacoby searching through 36 sites to locate the 10 most "temperature sensitive". Consider Briffa (or Esper) searching through hundreds of Schweingruber (and other) series. No one knows how the series in Crowley and Lowery were selected, but one presumes that a little "sifting" might have taken place.
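The kind of sifting described here can be mimicked with a toy experiment. This is only a sketch under stated assumptions, not a reconstruction of anyone's actual procedure: the "sites" and the "temperature" target are all independent AR1 series with a hypothetical coefficient of 0.9, the 36 and 10 echo the Jacoby example, and the 79 and 48 period lengths are borrowed purely for illustration. The 10 series best correlated with the target in the calibration period are averaged into a composite, and the composite is then checked against the target outside that period.

```python
import numpy as np


def ar1_matrix(n, k, phi, rng):
    """Return an (n, k) array whose columns are independent stationary AR(1) series."""
    x = np.empty((n, k))
    x[0] = rng.normal(size=k) / np.sqrt(1.0 - phi**2)
    eps = rng.normal(size=(n, k))
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def screening_experiment(phi=0.9, n_cal=79, n_ver=48, n_sites=36,
                         n_keep=10, trials=500, seed=0):
    """Average calibration and verification correlation of a screened composite."""
    rng = np.random.default_rng(seed)
    n = n_cal + n_ver
    cal_r, ver_r = [], []
    for _ in range(trials):
        target = ar1_matrix(n, 1, phi, rng)[:, 0]   # persistent "temperature"
        sites = ar1_matrix(n, n_sites, phi, rng)    # unrelated "proxies"
        # Screen on calibration-period correlation with the target only.
        r = np.array([corr(sites[:n_cal, j], target[:n_cal])
                      for j in range(n_sites)])
        keep = np.argsort(r)[-n_keep:]              # the n_keep "most sensitive" sites
        composite = sites[:, keep].mean(axis=1)     # weight 1/N on kept, 0 on the rest
        cal_r.append(corr(composite[:n_cal], target[:n_cal]))
        ver_r.append(corr(composite[n_cal:], target[n_cal:]))
    return float(np.mean(cal_r)), float(np.mean(ver_r))


if __name__ == "__main__":
    cal, ver = screening_experiment()
    print("mean calibration r:", cal)
    print("mean verification r:", ver)
```

In this setup the screened composite should show a strong average calibration correlation even though no series carries any signal, while the average correlation outside the screening period should be much weaker. That is the "arise, then fail out of sample" pattern in miniature.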
Consider Mann’s PC methodology, which, in a sense, automates Jacoby’s procedure (this is an equivalence that I’ve tested through simulations). Whereas Jacoby’s mining is accomplished by a weighting of 1/N on included series and 0 on excluded series, Mann’s PC1 puts weightings close to 1/N on mined series (bristlecones) and close to 0 on excluded series. You have to think about it a little, but it’s not hard once you see it. I’m inching towards a little more subtle explanation of Mann’s regression module, but it looks like this is a form of data mining as well, with weightings of the 112 variables being high or low according to the data-mining criterion. (With 112 variables in a regression-type model for a series of length 79, one would expect some decent calibration R2's.) If the selected predictors are persistent (per Ferson), you will get high REs against a short (48-year) verification period, but poor out-of-sample R2's. Sound familiar?
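The remark about calibration R2 with many predictors and few observations can be checked with a plain least-squares toy. To be clear, this is ordinary (minimum-norm) least squares on white noise, not Mann's regression module, and the function names are hypothetical. It uses white noise rather than persistent series so that the effect of the predictor count is isolated from the persistence effect discussed above.

```python
import numpy as np


def r_squared(y, fitted):
    """Coefficient of determination of fitted values against y."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def calibration_vs_verification(n_predictors, n_cal=79, n_ver=48, seed=0):
    """Fit noise on noise in the calibration period, then score both periods."""
    rng = np.random.default_rng(seed)
    n = n_cal + n_ver
    X = rng.normal(size=(n, n_predictors))  # predictors with no information
    y = rng.normal(size=n)                  # target unrelated to X
    A = np.column_stack([np.ones(n), X])    # add an intercept column
    # lstsq returns the minimum-norm solution when predictors outnumber observations.
    coef, *_ = np.linalg.lstsq(A[:n_cal], y[:n_cal], rcond=None)
    fitted = A @ coef
    cal_r2 = r_squared(y[:n_cal], fitted[:n_cal])
    ver_r2 = r_squared(y[n_cal:], fitted[n_cal:])
    return cal_r2, ver_r2


if __name__ == "__main__":
    for k in (10, 40, 112):
        print(k, calibration_vs_verification(k))
```

With 112 predictors and 79 calibration observations the in-sample fit is exact, so calibration R2 is 1 to numerical precision, and it is already substantial at 40 predictors. The verification R2 should come out negative in both cases, since nothing real was fitted.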
Ferson’s comments about out-of-sample breakdown very much influence my expectations about the post-1980 behavior of bristlecones, Gaspé, TTHH etc. We’re already seeing complaints by Jacoby that TTHH has broken down as a linear proxy; Briffa complains about some "unknown anthropogenic" factor lowering late 20th century RW and MXD series. Unreported Gaspé re-sampling did not show a hockeystick. If Hughes’ 2002 sampling at Sheep Mountain had shown big ring widths following temperature, we’d have heard about it. The dog didn’t bark.
Just like Ferson’s scenario of new proxies emerging as the old ones fail out-of-sample, isn’t that what we’re seeing now? All of a sudden we’re seeing offshore Oman coldwater diatoms and Briffa-adjusted Yamal tree rings. At any given time, there will be some "proxies" with hockey stick shapes. But do they work out-of-sample? Is this science "advancing" or simply more interaction of data mining and spurious regression?
References: Ferson, W., S. Sarkissian and T. Simin, 2003. Spurious regressions in financial economics, Journal of Finance, 58(4), 1393-1413; http://www.cass.city.ac.uk/faculty/g.urga/files/FersonEtAl2003.pdf
Ferson, W., S. Sarkissian and T. Simin, 2003b. Is stock return predictability spurious?, J of Inv Management 4(3), 1-10.
Deng, Ai, 2005. Understanding spurious regression in financial economics. http://econ.bu.edu/perron/seminar-papers/spurious_1.pdf