"Spurious significance" was a phrase used in the title of our GRL article. We regarded this as perhaps the most essential point of the article, but it seems to have gotten lost. This is the second of a planned series of notes on spurious significance, to give a sense of the statistical background. Granger and Newbold [1974], posted up here, is an extremely famous article that started the modern discussion of the problem of spurious regression. Granger is a recent Nobel laureate in economics.
Granger and Newbold observed that, although the classic spurious regressions (see Spurious #1) had very high R2 statistics, they had very low (under 1.5) Durbin-Watson (DW) statistics. (The DW statistic measures autocorrelation in the residuals.) Granger and Newbold:
It is very common to see reported in applied econometric literature time series regression equations with an apparently high degree of fit, as measured by the coefficient of multiple correlation R2 or the corrected coefficient R2, but with an extremely low value for the Durbin-Watson statistic. We find it very curious that whereas virtually every textbook on econometric methodology contains explicit warnings of the dangers of autocorrelated errors, this phenomenon crops up so frequently in well-respected applied work. Numerous examples could be cited, but doubtless the reader has met sufficient cases to accept our point. It would, for example, be easy to quote published equations for which R2 = 0.997 and the Durbin-Watson statistic (d) is 0.53. The most extreme example we have met is an equation for which R2 = 0.99 and d = 0.093.
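For readers who have not met it, the DW statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. It is roughly 2(1 - r), where r is the lag-one autocorrelation of the residuals, so values near 2 indicate little autocorrelation and values near 0 indicate strong positive autocorrelation. Here is a minimal sketch in Python using numpy; the function name is my own and is not taken from Granger and Newbold.

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic d for a sequence of regression residuals.

    d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2
    """
    resid = np.asarray(resid, dtype=float)
    # Numerator: squared successive differences of the residuals.
    # Denominator: sum of squared residuals.
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
```

Applied to the residuals of a fitted regression, values like the 0.53 and 0.093 quoted above sit far below 2.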
Granger and Newbold moved beyond the framework of curious examples by doing simulations in which they generated series of random walks, regressing one against another. They found that these regressions consistently had “statistically significant” F-statistics (the F-statistic is related to the R2 statistic) and suggested that the Durbin-Watson (DW) statistic did a good job of identifying problems. They didn’t argue that a failed DW statistic was a necessary condition of a spurious regression, but they certainly argued that a failed DW statistic was sufficient to show a failed model. Granger and Newbold:
It has been well known for some time now that if one performs a regression and finds the residual series is strongly autocorrelated, then there are serious problems in interpreting the coefficients of the equation. Despite this, many papers still appear with equations having such symptoms and these equations are presented as though they have some worth. It is possible that earlier warnings have been stated insufficiently strongly. From our own studies we would conclude that if a regression equation relating economic variables is found to have strongly autocorrelated residuals, equivalent to a low Durbin-Watson value, the only conclusion that can be reached is that the equation is mis-specified, whatever the value of R2 observed.
It is not our intention in this paper to go deeply into the problem of how one should estimate equations in econometrics, but rather to point out the difficulties involved. In our opinion the econometrician can no longer ignore the time series properties of the variables with which he is concerned – except at his peril. The fact that many economic “levels” are near random walks or integrated processes means that considerable care has to be taken in specifying one’s equations.
One cannot propose universal rules about how to analyse a group of time series as it is virtually always possible to find examples that could occur for which the rule would not apply.
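The random-walk experiment that Granger and Newbold describe is easy to sketch. The Python code below generates pairs of independent random walks, regresses one on the other by ordinary least squares, and records the slope t-statistic, the R2 and the DW statistic of the residuals. The series length, number of runs, seed, function names and the rough |t| > 2 cutoff for "significance" are my own assumptions for illustration; this is not a reproduction of their exact design.

```python
import numpy as np

def durbin_watson(resid):
    # Sum of squared successive differences over sum of squared residuals.
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def regress_random_walks(n_obs, rng):
    """Regress one random walk on another, independent, random walk."""
    x = np.cumsum(rng.standard_normal(n_obs))
    y = np.cumsum(rng.standard_normal(n_obs))
    X = np.column_stack([np.ones(n_obs), x])      # intercept + slope
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS fit
    resid = y - X @ beta
    # Conventional OLS standard error, which wrongly assumes
    # uncorrelated residuals; that is the point of the exercise.
    sigma2 = resid @ resid / (n_obs - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_slope = beta[1] / np.sqrt(cov[1, 1])
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return t_slope, r2, durbin_watson(resid)

def simulate(n_sims=1000, n_obs=50, seed=0):
    """Summarize many spurious regressions (all sizes are illustrative)."""
    rng = np.random.default_rng(seed)
    runs = np.array([regress_random_walks(n_obs, rng) for _ in range(n_sims)])
    t, r2, dw = runs.T
    return {
        # |t| > 2 is only an approximate 5% two-sided cutoff.
        "share_abs_t_gt_2": float(np.mean(np.abs(t) > 2)),
        "median_r2": float(np.median(r2)),
        "median_dw": float(np.median(dw)),
    }

if __name__ == "__main__":
    print(simulate())
```

Since the two series are unrelated by construction, a properly sized test should flag a "significant" slope in only about 5% of runs. Comparing that with the share actually reported, alongside the median DW, shows the pattern Granger and Newbold warn about.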
In this first systematic article on spurious regression, you can see what appears to me to be the overriding goal of theoreticians: to find a statistic or statistics which can identify spurious relationships in an unsupervised way, i.e. as some functional of the data and the residuals.
While Granger and Newbold did not propose the DW statistic as a magic bullet for testing spurious regressions, failing to report a DW statistic for a regression relating highly autocorrelated series would be inconceivable for any econometrician after 1974. I’ve seen occasional use of DW statistics in paleoclimate articles, but only rarely. Given the remarkable autocorrelations in paleoclimate series, you would think that it would be a very standard test. It’s almost as though paleoclimatologists are afraid to use this test.
I’ll give some examples tomorrow in series that we’ve discussed in the past.