## Spurious Significance #2 : Granger and Newbold 1974

"Spurious significance" was a phrase used in the title of our GRL article. We regarded this as perhaps the most essential point of the article, but it seems to have gotten lost. This is the second of a planned series of notes on spurious significance, to give a sense of the statistical background. Granger and Newbold  posted up here is an extremely famous article, which starts off the modern discussion of the problem of spurious regression. Granger is a recent Nobel laureate in economics.

Granger and Newbold observed that, although the classic spurious regressions (see Spurious #1) had very high R2 statistics, they had very low (under 1.5) Durbin-Watson (DW) statistics. (The DW statistic measures autocorrelation in the residuals.) Granger and Newbold:

It is very common to see reported in applied econometric literature time series regression equations with an apparently high degree of fit, as measured by the coefficient of multiple correlation R2 or the corrected coefficient R̄2, but with an extremely low value for the Durbin-Watson statistic. We find it very curious that whereas virtually every textbook on econometric methodology contains explicit warnings of the dangers of autocorrelated errors, this phenomenon crops up so frequently in well-respected applied work. Numerous examples could be cited, but doubtless the reader has met sufficient cases to accept our point. It would, for example, be easy to quote published equations for which R2 = 0.997 and the Durbin-Watson statistic (d) is 0.53. The most extreme example we have met is an equation for which R2 = 0.99 and d = 0.093.
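For concreteness, the DW statistic is just the ratio of the sum of squared first differences of the residuals to their sum of squares: a value near 2 indicates no first-order autocorrelation, while a value near 0 indicates strong positive autocorrelation. A minimal sketch in Python (using numpy; the variable names are mine):

```python
import numpy as np

def durbin_watson(resid):
    # DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(1)
dw_wn = durbin_watson(rng.standard_normal(1000))             # white noise: near 2
dw_rw = durbin_watson(np.cumsum(rng.standard_normal(1000)))  # random walk: near 0
print(dw_wn, dw_rw)
```

The second call feeds a random walk straight in as "residuals", which is exactly the pathology Granger and Newbold are describing.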

Granger and Newbold moved beyond the framework of curious examples by doing simulations in which they generated series of random walks, regressing one against another. They found that these regressions consistently had “statistically significant” F-statistics (the F-statistic is related to the R2 statistic) and suggested that the Durbin-Watson (DW) statistic did a good job of identifying problems. They didn’t argue that a failed DW statistic was a necessary condition for a spurious regression, but they certainly argued that a failed DW statistic was sufficient evidence of a failed model. Granger and Newbold:

It has been well known for some time now that if one performs a regression and finds the residual series is strongly autocorrelated, then there are serious problems in interpreting the coefficients of the equation. Despite this, many papers still appear with equations having such symptoms and these equations are presented as though they have some worth. It is possible that earlier warnings have been stated insufficiently strongly. From our own studies we would conclude that if a regression equation relating economic variables is found to have strongly autocorrelated residuals, equivalent to a low Durbin-Watson value, the only conclusion that can be reached is that the equation is mis-specified, whatever the value of R2 observed.

It is not our intention in this paper to go deeply into the problem of how one should estimate equations in econometrics, but rather to point out the difficulties involved. In our opinion the econometrician can no longer ignore the time series properties of the variables with which he is concerned – except at his peril. The fact that many economic ‘levels’ are near random walks or integrated processes means that considerable care has to be taken in specifying one’s equations…
One cannot propose universal rules about how to analyse a group of time series as it is virtually always possible to find examples that could occur for which the rule would not apply.
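Granger and Newbold’s simulation experiment is easy to reproduce. The sketch below (in Python, with illustrative sample sizes and replication counts of my own choosing, not their exact design) regresses pairs of independent random walks on each other and tallies how often the t-statistic on the slope clears the usual 5% critical value; nominally this should happen about 5% of the time.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 500                  # illustrative series length and replication count
rejections, dws = 0, []
for _ in range(reps):
    x = np.cumsum(rng.standard_normal(n))   # random walk "regressor"
    y = np.cumsum(rng.standard_normal(n))   # independent random walk "regressand"
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    if abs(beta[1] / se) > 1.98:            # ~5% two-sided t critical value
        rejections += 1
    dws.append(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

reject_rate = rejections / reps
mean_dw = float(np.mean(dws))
print(reject_rate, mean_dw)   # rejection rate far above 0.05; mean DW far below 2
```

The nominal t test rejects in well over half the replications even though the two series are independent by construction, while the DW statistic sits near zero: spurious significance, flagged.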

In this first systematic article on spurious regression, you can see what appears to me to be the overriding goal of theoreticians: to find a statistic or statistics that can identify spurious relationships in an unsupervised way, i.e., as some functional of the data and the residuals.

While Granger and Newbold did not propose the DW statistic as a magic bullet for testing spurious regressions, not performing a DW statistic on a regression relating highly autocorrelated series would be inconceivable for any econometrician after 1974. I’ve seen occasional use of DW statistics in paleoclimate articles, but very few. Given the remarkable autocorrelations in paleoclimate series, you would think that it would be a very standard test. It’s almost as though paleoclimatologists are afraid to use this test.

I’ll give some examples tomorrow in series that we’ve discussed in the past.

1. Ross McKitrick
Posted Aug 22, 2005 at 11:46 PM | Permalink

Durbin-Watson is part of the standard introductory treatment of econometrics and has been for decades, because it comes up a lot and autocorrelation matters a lot. However DW has a couple of limitations. It’s got an obscure distribution (but so what, there are tables and Shazam can compute the exact p-value), it’s not valid if there are lagged dependent variables, and it only tests for AR1. There’s another test that has a fancy-sounding name and is easy to do (2 big advantages, in my view), called the LM test, which is more general and which is steadily getting into the texts. Or there’s the brute force method of estimating models with ARMA residuals and testing lags for insignificance.
The connection to the term “spurious” is building across these notes, but already a key point is worth stressing. When you do a regression the package mechanically computes the ratio of the estimated parameter to the estimated standard error and sticks it in a column under the heading “t-statistic”. But that is no guarantee the number therein came from a data generating process that follows a t-distribution. You have to be able to rule out some influential model misspecification problems. Otherwise you might be comparing your “t-statistic” to the wrong critical values. In the case of Granger and Newbold they looked at regressing random walks on each other. In that case a “t-stat” of, say, 4.0 does not mean the relationship is significant since the ratio in question doesn’t follow a t distribution. “Spurious significance” in this sense means comparing your test statistic to the wrong benchmark and concluding you have significance when in reality you do not.
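The LM test Ross mentions (presumably the Breusch-Godfrey test, which is the usual LM test for residual autocorrelation) can be hand-rolled in a few lines. This is a bare-bones sketch, not a substitute for a proper econometrics package:

```python
import numpy as np

def breusch_godfrey_lm(y, X, p=1):
    # Regress the OLS residuals on the original regressors plus p of
    # their own lags; LM = n * R^2 of that auxiliary regression, which is
    # asymptotically chi-squared with p degrees of freedom under the null
    # of no residual autocorrelation.
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    lags = np.column_stack([np.concatenate([np.zeros(k), e[:-k]])
                            for k in range(1, p + 1)])
    Z = np.column_stack([X, lags])
    gamma, *_ = np.linalg.lstsq(Z, e, rcond=None)
    u = e - Z @ gamma
    return n * (1.0 - (u @ u) / (e @ e))

# Two independent random walks: the residuals are strongly autocorrelated,
# so the LM statistic dwarfs the chi2(1) 5% critical value of 3.84.
rng = np.random.default_rng(4)
n = 200
x = np.cumsum(rng.standard_normal(n))
y = np.cumsum(rng.standard_normal(n))
lm = breusch_godfrey_lm(y, np.column_stack([np.ones(n), x]), p=1)
print(lm)
```

Unlike DW, this extends directly to higher-order autocorrelation by raising `p`, and it remains valid with lagged dependent variables.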

2. Paul
Posted Aug 23, 2005 at 3:33 AM | Permalink

I would go further than this:

not performing a DW statistic on a regression relating highly autocorrelated series would be inconceivable for any econometrician after 1974

Any reasonably knowledgeable econometrician would perform such tests on ALL the time series in use PRIOR to any regression, in order to determine the existence of unit roots and hence understand the nature of the data being analysed and the potential statistical pitfalls that could result.

That has always been one of the glaring omissions in all the time series work on temperature or proxies – some basic unit root test on the data and a description of whether they are stationary or not.
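The sort of preliminary check Paul describes can be sketched as a bare Dickey-Fuller regression (without the lag augmentation of the full ADF test; a real application would use a standard implementation with proper critical values):

```python
import numpy as np

def df_tstat(y):
    # Regress Delta y_t on a constant and y_{t-1}; return the t-statistic
    # on the y_{t-1} coefficient. Compare it to Dickey-Fuller critical
    # values (about -2.86 at 5% with a constant), NOT ordinary t tables.
    y = np.asarray(y, dtype=float)
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones(len(ylag)), ylag])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(5)
t_rw = df_tstat(np.cumsum(rng.standard_normal(500)))  # unit root: fails to reject
t_wn = df_tstat(rng.standard_normal(500))             # stationary: rejects decisively
print(t_rw, t_wn)
```

Failing to reject here is the warning flag: it says the series may be integrated, and that any subsequent regression on it needs the Granger-Newbold caveats.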

3. Steve McIntyre
Posted Aug 23, 2005 at 7:39 AM | Permalink

Some of the articles that have interested me the most pertain to situations where the DW statistic does not work. That accounts for much of my present interest in ARMA(1,1) statistics, where Feng appears to have used “almost integrated, almost white” ARMA(1,1) processes to explain spurious regressions in Ferson et al. I’m hoping to get there in these notes without stumbling too much.

4. Ross McKitrick
Posted Aug 23, 2005 at 4:13 PM | Permalink

Re #2: Individual series need to be tested for unit roots, but this is different from applying a DW test on a regression model. And there certainly are papers that examine geophysical data for nonstationarity prior to proceeding with trend modeling or other analysis. But the result is a body of literature that is divided on whether temperature is nonstationary or highly autocorrelated; either way it means the data need to be handled and interpreted carefully.

5. Steve McIntyre
Posted Aug 23, 2005 at 6:06 PM | Permalink

Paul, one of the interesting features of the temperature series is that, modeled as ARMA(1,1), their AR1 coefficients are >>0.9 (together with a negative MA1 coefficient), quite a bit higher than when modeled as ARMA(1,0), which is the more usual comparison. I don’t entirely know where this leads, but I’m going to get to some econometric models raising real issues about spurious significance in this type of context.
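To illustrate the “almost integrated, almost white” shape, here is a quick simulation with illustrative coefficients of my own choosing (AR = 0.95, MA = -0.8; these are not fitted to any actual temperature series). The lag-1 autocorrelation of such a process is modest, so a naive AR(1) fit recovers a persistence parameter far below the true AR root:

```python
import numpy as np

# ARMA(1,1) with AR = 0.95 and MA = -0.8: "almost integrated, almost white".
rng = np.random.default_rng(3)
n, phi, theta = 20000, 0.95, -0.8
eps = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t] + theta * eps[t - 1]

# Lag-1 autocorrelation: roughly what a naive AR(1) fit would recover.
yc = y - y.mean()
rho1 = float((yc[1:] @ yc[:-1]) / (yc @ yc))
print(rho1)   # near 0.3, far below the true AR root of 0.95
```

That gap between the apparent AR(1) persistence and the true AR root is exactly why the choice between ARMA(1,0) and ARMA(1,1) matters so much for significance calculations.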

6. Paul
Posted Aug 24, 2005 at 5:18 AM | Permalink

#5.

Indeed. Especially for forecasting purposes! (multiple equation models employing lagged endogenous variables exhibiting autocorrelation and all that)