When one looks at the plots of the various Juckes proxies against gridcell temperature, the possibility of spurious regression must come to mind.
“Spurious regression” has been discussed on this blog from time to time. The literature on it tries to provide a statistical framework for seemingly high correlations between unrelated series – things like Honduran births and Australian wine exports. The original article on the topic, by Yule in 1926, observed a correlation of about 0.95 between mortality and Church of England marriages. A prominent econometrician (Hendry) observed in 1980 that rainfall provided an excellent statistical explanation of inflation in an interesting article “Econometrics – Alchemy or Science?” (url).
The same question – Alchemy or Science? – is surely applicable to proxymetrics. In 1974, Granger and Newbold, the former a Nobel Prize winning economist, wrote an influential article on Spurious Regression, posted up here, which I discussed last year here.
Granger and Newbold observed that, although the classic spurious regressions (see Spurious #1) had very high correlation (R^2 statistics), they had very low (under 1.5) Durbin-Watson (DW) statistics. (The DW statistic measures first-order autocorrelation in the residuals: it is near 2 when there is none and falls toward 0 as positive autocorrelation gets stronger.)
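For readers who want to see what the DW statistic actually computes, here is a minimal sketch in plain Python. The function names are illustrative only and are not taken from any package or from the Juckes archive. It also shows the usual rule of thumb that DW is approximately 2(1 − r1), where r1 is the lag-1 autocorrelation of the residuals.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    if den == 0:
        raise ValueError("residuals are all zero")
    return num / den


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation, assuming the residuals have mean zero
    (true of OLS residuals when the regression includes an intercept)."""
    den = sum(e * e for e in residuals)
    if den == 0:
        raise ValueError("residuals are all zero")
    num = sum(residuals[t] * residuals[t - 1]
              for t in range(1, len(residuals)))
    return num / den


if __name__ == "__main__":
    # Made-up residuals, purely to exercise the functions.
    demo = [0.3, 0.4, 0.2, -0.1, -0.3, -0.4, -0.2, 0.1]
    dw = durbin_watson(demo)
    r1 = lag1_autocorrelation(demo)
    print("DW =", round(dw, 3), " 2*(1 - r1) =", round(2 * (1 - r1), 3))
```

The two printed numbers differ only by an end-point term, (e_1^2 + e_n^2) divided by the sum of squared residuals, which becomes negligible for long series.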
Granger and Newbold:
It is very common to see reported in applied econometric literature time series regression equations with an apparently high degree of fit, as measured by the coefficient of multiple correlation R^2 or the corrected coefficient R^2, but with an extremely low value for the Durbin-Watson statistic. We find it very curious that whereas virtually every textbook on econometric methodology contains explicit warnings of the dangers of autocorrelated errors, this phenomenon crops up so frequently in well-respected applied work. Numerous examples could be cited, but doubtless the reader has met sufficient cases to accept our point. It would, for example, be easy to quote published equations for which R^2 = 0.997 and the Durbin-Watson statistic (d) is 0.53. The most extreme example we have met is an equation for which R^2 = 0.99 and d = 0.093.
Granger and Newbold found that regressions between random walks consistently had “statistically significant” F-statistics (or, equivalently, statistically significant correlation r statistics) – they didn’t mention whether they were “99.98% significant”, but some would have been. Granger and Newbold suggested that the Durbin-Watson (DW) statistic did a good job of identifying problems. They didn’t argue that a failed DW statistic was a necessary condition of a spurious regression, but they certainly argued that a failed DW statistic was sufficient for a failed model.
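The random-walk experiment is easy to reproduce in outline. The following is a rough sketch in plain Python, not Granger and Newbold's actual design: the series length, number of trials, seed and the |t| > 2 cutoff are all assumptions chosen for illustration. It regresses one independent random walk on another and records how often the slope looks “significant” and what the DW statistic looks like.

```python
import math
import random


def random_walk(n, rng):
    """Cumulative sum of n independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def ols_diagnostics(x, y):
    """OLS of y on x with intercept; returns (R^2, slope t-statistic, DW)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    rss = sum(e * e for e in resid)
    tss = sum((b - my) ** 2 for b in y)
    r2 = 1.0 - rss / tss
    t_stat = slope / math.sqrt(rss / (n - 2) / sxx)
    dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / rss
    return r2, t_stat, dw


def simulate(trials=500, n=100, seed=0):
    """Share of trials with |t| > 2 and the mean DW across trials."""
    rng = random.Random(seed)
    results = [ols_diagnostics(random_walk(n, rng), random_walk(n, rng))
               for _ in range(trials)]
    share_significant = sum(abs(t) > 2.0 for _, t, _ in results) / trials
    mean_dw = sum(dw for _, _, dw in results) / trials
    return share_significant, mean_dw


if __name__ == "__main__":
    share, mean_dw = simulate()
    print("share with |t| > 2:", share)
    print("mean Durbin-Watson:", round(mean_dw, 3))
```

If the series were genuinely unrelated and well-behaved, only about 5% of trials should clear the |t| > 2 bar. With random walks the share is far higher, while the DW statistic sits well below 2, which is the pattern Granger and Newbold describe.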
Granger and Newbold:
It has been well known for some time now that if one performs a regression and finds the residual series is strongly autocorrelated, then there are serious problems in interpreting the coefficients of the equation. Despite this, many papers still appear with equations having such symptoms and these equations are presented as though they have some worth. It is possible that earlier warnings have been stated insufficiently strongly. From our own studies we would conclude that if a regression equation relating economic variables is found to have strongly autocorrelated residuals, equivalent to a low Durbin-Watson value, the only conclusion that can be reached is that the equation is mis-specified, whatever the value of R^2 observed. …
It is not our intention in this paper to go deeply into the problem of how one should estimate equations in econometrics, but rather to point out the difficulties involved. In our opinion the econometrician can no longer ignore the time series properties of the variables with which he is concerned – except at his peril. The fact that many economic “levels” are near random walks or integrated processes means that considerable care has to be taken in specifying one’s equations… One cannot propose universal rules about how to analyse a group of time series as it is virtually always possible to find examples that could occur for which the rule would not apply.
Let us now visit the residuals of Juckes’ Union reconstruction. Here is a plot of residuals between the Union reconstruction and the archived instrumental temperature.
Figure 1. Residuals from Juckes “Union” reconstruction and archived instrumental temperature.
Plotting of residuals is a standard operation in applied statistics. A simple inspection of the plot strongly suggests the possibility of autocorrelated errors. A Durbin-Watson test (which is a very elementary test) returns a value of 1.09, far below the required minimum of 1.5. The p-value for the test, under the null hypothesis of no autocorrelation in the residuals, is 1.018e-07 – less than one in a millll-yun.
Following Granger and Newbold, the following can be asserted of the supposed relationship between temperature and the Juckes Union reconstruction:
“the only conclusion that can be reached is that the equation is mis-specified, whatever the value of R^2 observed”
If you do an elementary arima fit on the residuals, you get highly significant ARMA(1,1) coefficients relative to the standard errors of the estimates:
         ar1      ma1  intercept
      0.9505  -0.7545    -0.0104
s.e.  0.0406   0.0941     0.0440
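To get a feel for what these coefficients imply, here is a short sketch in plain Python that computes the theoretical autocorrelations of an ARMA(1,1) process from its AR and MA coefficients. It assumes the sign convention x_t = phi*x_{t-1} + e_t + theta*e_{t-1}, and the function names are purely illustrative. The DW conversion uses the rule of thumb DW ≈ 2(1 − r1), so it is only an approximation.

```python
def arma11_acf(phi, theta, max_lag=10):
    """Theoretical autocorrelations rho_1 .. rho_max_lag of a stationary
    ARMA(1,1) process x_t = phi*x_{t-1} + e_t + theta*e_{t-1}."""
    if abs(phi) >= 1:
        raise ValueError("process is not stationary")
    rho1 = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)
    acf = [rho1]
    for _ in range(max_lag - 1):
        acf.append(phi * acf[-1])  # beyond lag 1 the ACF decays by phi per lag
    return acf


def approx_dw(rho1):
    """Rule-of-thumb Durbin-Watson value implied by a lag-1 autocorrelation."""
    return 2.0 * (1.0 - rho1)


if __name__ == "__main__":
    PHI, THETA = 0.9505, -0.7545  # ar1 and ma1 estimates quoted above
    acf = arma11_acf(PHI, THETA, max_lag=5)
    print("autocorrelations, lags 1-5:", [round(r, 3) for r in acf])
    print("approximate implied DW:", round(approx_dw(acf[0]), 2))
```

With the coefficients quoted above, the lag-1 autocorrelation comes out at roughly 0.41 and then decays slowly, by a factor of about 0.95 per lag. That is broadly in line with a Durbin-Watson statistic well under 2.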
It is difficult (rather, impossible) to contemplate an undergraduate econometrics student presenting a univariate relationship between two variables, with correlation being the only statistical test performed. It is impossible to contemplate a student failing to carry out a Durbin-Watson (or equivalent) test. It is hard to imagine the response of statistical reviewers to a team of professors presenting an econometrics paper based on a single univariate equation with a Durbin-Watson statistic of 1.09. (It is taking all my will-power to avoid making a snarky comment.)
Now think about the reviewing by paleoclimatologists at Climate of the Past. No one has raised this topic. I suppose that either I or someone else will wander over to CPD and make this and other observations – but it would be nice to see some adequate reviewing within the discipline.