Granger and Newbold provided examples of spurious significance in a random walk context, and various authors have since extended this to a number of other persistent processes. Granger and Newbold suggested that the Durbin-Watson (DW) statistic could be used to test for autocorrelation in the residuals, giving a test that can be applied in a relatively unsupervised way to check for spurious relationships. Here are some examples from cases familiar to readers: the Gaspé cedars, the MBH98 NOAMER PC1, the MBH98 reconstruction and satellite temperature trends.
First, here is a graphic showing autocorrelation functions for some of the series that have interested us: the Gaspé cedar series, the NOAMER PC1, the temperature PC1 and, for comparison, the Central England series. The lesser autocorrelation in the CEngland series is quite dramatic. The autocorrelations of the Gaspé cedar series, NOAMER PC1 and temperature PC1 are all high enough that regressions involving them are in a red zone.
Figure 1. Selected Autocorrelation Functions.
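For readers who want to reproduce this sort of plot on their own series, here is a minimal sketch of a sample autocorrelation function. The proxy data are not included here, so the `ar1` helper and its coefficient of 0.9 are purely illustrative assumptions standing in for a persistent series; they are not estimates for any of the series discussed.

```python
import random


def acf(series, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def ar1(n, phi, rng):
    """Simulate x[t] = phi * x[t-1] + white noise (illustrative only)."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    persistent = ar1(2000, 0.9, rng)  # hypothetical persistent series
    white = ar1(2000, 0.0, rng)       # white noise for comparison
    print("persistent :", [round(r, 2) for r in acf(persistent, 5)])
    print("white noise:", [round(r, 2) for r in acf(white, 5)])
```

A slowly decaying ACF like that of the simulated persistent series is the pattern that puts a regression in the red zone; a series whose ACF drops off quickly is much less of a concern.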
In Spurious Significance #2 […], I quoted the following conclusion from Granger and Newbold:
It has been well known for some time now that if one performs a regression and finds the residual series is strongly autocorrelated, then there are serious problems in interpreting the coefficients of the equation. Despite this, many papers still appear with equations having such symptoms and these equations are presented as though they have some worth. It is possible that earlier warnings have been stated insufficiently strongly. From our own studies we would conclude that if a regression equation relating economic variables is found to have strongly autocorrelated residuals, equivalent to a low Durbin-Watson value, the only conclusion that can be reached is that the equation is mis-specified, whatever the value of R2 observed.
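To make the point concrete, here is a rough sketch of the kind of experiment Granger and Newbold describe: regress one random walk on another, independent, random walk and look at the R2 and DW values. The sample size, number of simulations and function names are my own assumptions for illustration, not their actual design.

```python
import random


def ols_simple(x, y):
    """Fit y = a + b*x by least squares; return (a, b, residuals, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    r2 = 1.0 - sum(e * e for e in resid) / syy
    return a, b, resid, r2


def durbin_watson(resid):
    """DW = sum of squared first differences / sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)


def make_series(n, rng, random_walk):
    """Random walk (cumulated shocks) or plain white noise."""
    level, out = 0.0, []
    for _ in range(n):
        shock = rng.gauss(0.0, 1.0)
        level = level + shock if random_walk else shock
        out.append(level)
    return out


def simulate(random_walk, n_obs=100, n_sims=200, seed=0):
    """Average R2 and DW over regressions of independent series."""
    rng = random.Random(seed)
    r2_total = dw_total = 0.0
    for _ in range(n_sims):
        x = make_series(n_obs, rng, random_walk)
        y = make_series(n_obs, rng, random_walk)  # independent of x
        _, _, resid, r2 = ols_simple(x, y)
        r2_total += r2
        dw_total += durbin_watson(resid)
    return r2_total / n_sims, dw_total / n_sims


if __name__ == "__main__":
    print("random walks (mean R2, mean DW):", simulate(True))
    print("white noise  (mean R2, mean DW):", simulate(False))
```

With independent random walks, the average R2 is well above what chance would suggest for unrelated series while the average DW is far below 2; with white noise, R2 is near zero and DW is near 2. The low DW is the warning flag.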
Here are some practical examples.
Gaspé Cedar Series
In our EE article, we discussed the Gaspé cedar series at considerable length. Together with the NOAMER PC1, it is one of two hockey stick shaped series that imprint the 15th century MBH98 results. We’ve shown that this series was specifically edited by MBH to insert it into the AD1400 calculations, where it lowered 15th century values, and that this editing was not reported at the time. We showed that the series had many quality control problems, that later versions do not have a hockey stick shape, and that the proponents have refused to identify the location of the site for re-sampling, among other things. Here we look at a much simpler question: what are the results of the DW test, and should they have raised concerns that a mis-specified relationship was affecting the Gaspé tree rings?
It turns out that this is a classic case of a relationship failing the Granger and Newbold criteria. The series has a very high correlation (r = 0.59; r2 = 0.35) to the temperature PC1, which is why it gets very highly weighted in the regression phase of the reconstruction. However, the DW statistic is 1.08; the p-value for this statistic is 4.925e-06, which mandates the conclusion that the relationship is mis-specified. No econometrician would present a calculation which depended on a step in which the DW statistic was 1.08. The idea of specifically editing a data set so that a series with a DW statistic of 1.08 can be inserted into a relationship and affect final results would be incomprehensible to any modern statistician.
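For a rough sense of what a DW statistic of 1.08 means, recall the standard definition in terms of the regression residuals and the usual large-sample approximation relating it to the lag-1 autocorrelation of the residuals. The notation (e_t for residuals, \hat\rho for their lag-1 autocorrelation) is mine, and the approximation is only a back-of-envelope guide:

```latex
DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
   \approx 2\,(1 - \hat\rho)
\quad\Longrightarrow\quad
\hat\rho \approx 1 - \frac{DW}{2} = 1 - \frac{1.08}{2} = 0.46
```

A value near 2 corresponds to uncorrelated residuals; 1.08 corresponds to residuals with a lag-1 autocorrelation of roughly 0.46.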
One of the defences of MBH98 has been that, in a multiproxy method, errors or mis-specification in individual proxies will get washed out. One of the fundamental points in both our articles is that this claim is just arm-waving and has not been proven. In fact, the results are highly dependent on individual series (the bristlecones and the Gaspé cedar series), and mis-specifications do not get washed out.
The MBH98 NOAMER PC1
The second key calibration is of course between the temperature PC1 and the MBH98 North American tree ring PC1 (which is essentially the bristlecones), which is the PC4 in a centered calculation. Relative to the Gaspé series, this has a slightly lower correlation to the temperature PC1 (r = 0.46; r2 = 0.22). We discussed the relationship between bristlecone growth and temperature at length in our EE article and it appears highly probable that the relatively high correlation between the NOAMER PC1 and the temperature PC1 is spurious. In this case, the DW statistic is right at the edge of rejection: DW = 1.5668 (p-value = 0.02064). As Ross mentioned in a post yesterday, the DW statistic only measures AR1 serial correlation. Unsupervised statistics are not a magic bullet; here the DW statistic is very much in a danger zone and careful analysis of this critical relationship should have been carried out.
MBH98 NH Temperature Reconstructions
The DW statistics for the MBH98 temperature reconstructions are a little further away from the red zone. MBH have never reported (and still refuse to provide) a digital version of their AD1400 step. The DW statistic that I obtained for the AD1400 result from my Wahl-Ammann run-through (see […]) was DW = 1.6335 (p-value = 0.04965). For the AD1820 MBH98 step (which is archived), DW = 1.7468 (p-value = 0.1285).
Satellite Temperature Trends
I’ve posted up a graphic showing the “trend” in satellite temperatures. Formally, this “trend” is generated by a regression of the data against time. Here the DW statistic is DW = 0.4445 (p-value < 2.2e-16), clearly failing the test for autocorrelated residuals.
I’m interested in the reasons why the DW statistic moves out of the red zone going from the NOAMER PC1/Gaspé to the MBH98 reconstruction. My understanding is that the other proxies essentially add white noise as "ripples" on the wave. The addition of the ripples takes the DW statistic out of the red zone without any change in the spurious relationship. This is easy to say and not even that hard to picture once you get there. I hope to show how this type of effect is consistent with Ferson et al and Deng, which follow on from Phillips, who gives some motivation for this.
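Here is a toy illustration of the ripple effect. It is not the MBH98 procedure: I simply take a persistent AR1 "wave" (standing in for autocorrelated residuals), add white noise of increasing size, and compute the DW statistic of the mixture. The AR1 coefficient of 0.6, the noise levels and the function names are all assumptions chosen for illustration.

```python
import random


def durbin_watson(resid):
    """DW = sum of squared first differences / sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)


def ar1(n, rho, rng):
    """Persistent 'wave': x[t] = rho * x[t-1] + unit-variance shock."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def dw_with_ripples(noise_sd, n=5000, rho=0.6, seed=1):
    """DW of the wave after adding white-noise ripples of a given size."""
    rng = random.Random(seed)
    wave = ar1(n, rho, rng)  # same wave for every noise level (same seed)
    mixed = [w + rng.gauss(0.0, noise_sd) for w in wave]
    mean = sum(mixed) / n
    return durbin_watson([m - mean for m in mixed])


if __name__ == "__main__":
    for sd in (0.0, 1.0, 2.0):
        print("ripple sd =", sd, " DW =", round(dw_with_ripples(sd), 2))
```

The underlying wave is identical in every run; only the size of the ripples changes. As the ripples get larger, the DW statistic drifts up towards 2, even though the persistent component is still there underneath. The sketch only shows the DW statistic moving; it says nothing by itself about whether the fitted relationship is spurious.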
Tomorrow, I’ll review our reconstruction of NH temperature using dot.com stock prices to show another example of spurious significance, this time with RE statistics, and, in a few days, start in with Phillips, who provides a theoretical framework for spurious t-statistics. Ross also pointed out that the DW test only tests for AR1 relationships in the residuals and suggested the LM test; I’ll try that on the examples shown here as well.