One of the examples of spurious regression mentioned in Phillips 1998, quoted by Eduardo Zorita at CPD and previously here by me, was taken from Hendry 1980, from an article entitled “Econometrics – Alchemy or Science?”. Hendry is a very eminent professor of economics and the article proved to be as interesting as its title.
Before I present Hendry’s example of spurious regression, here are some extended quotes from Hendry’s criticisms of econometrics, which are very reminiscent of views expressed here about multiproxy studies. The resemblance is not accidental, as this is the framework from which I approach the topic. One of my regrets about the NAS panel was that they saw fit not to include a statistician qualified on these issues.
Hendry introduces his comments by an extended summary of Keynes’ 1940 critique of Tinbergen’s econometric modeling, an article worth reading in its own right, as Keynes is perhaps the leading 20th century economist (who plied statistical waters in his early days – I noticed a 1908 article coauthored by him and Yule of spurious regression renown.) Hendry:
Despite its obvious potential, econometrics has not had an easy time from many who have made major contributions to the development of economics, beginning from Keynes’ famous review in 1939 of Tinbergen’s book Statistical testing of Business Cycle Theories. In an oft quoted passage in his Comment (1940, 156), Keynes accepts that Tinbergen’s approach is objective but continues
No one could be more frank, more painstaking, more free from subjective bias or parti pris than Professor Tinbergen. There is no one therefore so far as human qualities go, whom it would be safer to trust with black magic. That there is anyone I would trust with it at this present stage or that this brand of statistical alchemy is ripe to become a branch of science, I am not yet persuaded.
It’s interesting to note Keynes’ praise for Tinbergen’s personal qualities. Anonymous Referee #2 obviously expresses these selfless qualities of “frankness”, “freedom from subjective bias” and lack of parti pris. Hendry summarizes Keynes’ critique as follows – an almost perfect list of the multiproxy problems:
His objections make an excellent list of what might be called “problems of the linear regression model”, namely (in modern parlance): using an incomplete set of determining factors (omitted variables bias); building models with unobservable variables (such as expectations), estimated from badly measured data based on index numbers (Keynes calls this “the frightful inadequacy of most of the statistics”); obtaining “spurious” correlations from the use of “proxy” variables and simultaneity as well as (and I quote) “the mine [Mr Yule] sprang under the contraptions of optimistic statisticians”; being unable to separate the distinct effects of multicollinear variables; assuming linear functional forms not knowing the appropriate dimensions of the regressors; mis-specifying the dynamic reactions and lag lengths; incorrectly pre-filtering the data; invalidly inferring “causes” from correlations; predicting inaccurately (non-constant parameters); confusing statistical with economic “significance” of trends and failing to relate economic theory to econometrics. To Keynes’ list of problems, I would add stochastic misspecification, incorrect exogeneity assumptions, inadequate sample sizes, aggregation, lack of structural identification, and an inability to refer back uniquely from observed empirical results to any given initial theory.
Hendry reports that these issues recur in the 1970s in a variety of articles by leading economists (including Leontief):
An echo of this debate recurs in the early 1970s. For example, following a sharp critique of mathematical economics as having “no links with concrete facts”, Worswick (1972) suggests that some econometricians are not “engaged in forging tools to arrange and measure actual facts, so much as making a marvellous array of pretend tools”. In the same issue of the Economic Journal, Phelps Brown (1972) concludes against econometrics, commenting that “running regressions between time series is only likely to deceive”. Added to these innuendoes of “alchemical” practices, Leontief (1971) has characterized econometrics as an “attempt to compensate for the glaring weakness of the data base available to us by the widest possible use of more and more sophisticated statistical techniques”. To quote Hicks, “the relevance of these methods to economics should not be taken for granted; ... Keynes would not have been surprised to find that econometrics is now in some disarray” (1979, xi).
Hendry starts into his own example with the following wonderful phrase:
Econometricians have their Philosophers’ Stone; it is called regression analysis and is used for transforming data into “significant” results! Deception is easily practised from false recipes intended to simulate useful findings and these are derogatively referred to by the profession as “nonsense regressions”.
Now for the example quoted by Phillips, an equally or even better known econometrician. Hendry first shows graphs and models relating price levels to money supply, an important economic theory. After presenting these results, he then presents an alternative theory with a number of illustrations, two of which are shown below:
A second example will clarify this issue. Hendry’s theory of inflation is that a certain variable (of great interest in this country) is the “real cause” of rising prices. I am certain that the variable (denoted C) is exogenous, that causality is from C to P only and (so far as I am aware) C is outside government control although data are readily available in government publications.
Of the relationship in the above figure, Hendry reports:
there is a “good fit”, the coefficients are “significant”, but autocorrelation remains and the equation predicts badly. However assuming a first order autoregressive error process at last produces the results I anticipated; the fit is spectacular, the parameters are “highly significant”, there is no obvious residual autocorrelation (on an “eyeball” test) and the predictive test does not reject the model [see the Figure below].
Hendry then explains how he was able to improve on the monetary theory of inflation:
My theory performs decidedly better than the naïve version of the monetary one, but alas the whole exercise is futile as well as deceitful since C is simply cumulative rainfall in the UK. It is meaningless to talk about “confirming theories” when spurious results are so easily obtained.
Doubtless some equations extant in econometric folklore are little less spurious than those I have presented. Before you despair at this hopeless subject, the statistical problem just illustrated [was analysed] in one of its manifestations by Yule in 1926 and has been re-emphasized many times since (see in particular Granger and Newbold 1974).
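The effect Hendry describes is easy to imitate by simulation. The following is a minimal sketch, not Hendry's data or calculation: it regresses one simulated series on another, completely independent one and counts how often the slope looks "significant" at the nominal 5% level. The function names, the sample length of 100 and the 500 replications are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random

def ols_t_stat(x, y):
    """Slope t-statistic and Durbin-Watson statistic for y = a + b*x + e (plain OLS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    se_b = math.sqrt(rss / (n - 2) / sxx)  # conventional OLS standard error of the slope
    dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / rss
    return b / se_b, dw

def white_noise(n, rng):
    """Independent standard normal draws (no trend, no persistence)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def random_walk(n, rng):
    """Cumulative sum of independent standard normal shocks (a cumulated series)."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def rejection_rate(make_series, n_obs=100, n_sims=500, seed=0):
    """Share of simulations with |t| > 1.96 when x and y are INDEPENDENT by construction."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = make_series(n_obs, rng)
        y = make_series(n_obs, rng)
        t_stat, _ = ols_t_stat(x, y)
        hits += int(abs(t_stat) > 1.96)
    return hits / n_sims

if __name__ == "__main__":
    print("white noise :", rejection_rate(white_noise))
    print("random walks:", rejection_rate(random_walk))
```

For independent white noise the rejection rate should sit near the nominal 5%; for independent cumulated series it is far higher, which is the Granger and Newbold point. The Durbin-Watson value returned alongside the t-statistic is the kind of residual check that exposes such a regression.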
In any of these guises, the relationships shown above will have a massively significant RE statistic. The lesson to be learned from this is surprisingly easy, and, as I currently formulate it, I wonder why it’s taken me so long to pose it as I am going to do today. Before doing so, I’ll mention that Hendry says that there are adequate tests for spurious relationships in this univariate setting:
We understand this problem and have many tests for the validity of empirical models (those just quoted fail two such tests). [The two chi-squared values in Figure 8 are a (likelihood ratio) test for a common factor and a Box-Pierce test for residual autocorrelation respectively – see Sargan 1975, Mizon and Hendry 1980, Pierce 1971 and Breusch and Pagan 1980 – both of which “reject” the model specification.]
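For readers unfamiliar with the Box-Pierce test, here is a minimal sketch of the statistic, Q = n times the sum of the first m squared residual autocorrelations, which is compared with a chi-squared distribution (m degrees of freedom for a raw series, fewer when the residuals come from a fitted ARMA model). The function names, the choice of 10 lags and the simulated AR(1) residuals are assumptions for illustration; this does not reproduce Hendry's Figure 8.

```python
import random

CHI2_95_DF10 = 18.31  # 5% critical value of chi-squared with 10 degrees of freedom

def box_pierce_q(resid, max_lag=10):
    """Box-Pierce statistic: n * sum of squared sample autocorrelations at lags 1..max_lag."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        q += r_k * r_k
    return n * q

def ar1_residuals(n, rho, seed=0):
    """Hypothetical residual series: e[t] = rho * e[t-1] + shock (rho = 0 gives white noise)."""
    rng = random.Random(seed)
    e, out = 0.0, []
    for _ in range(n):
        e = rho * e + rng.gauss(0.0, 1.0)
        out.append(e)
    return out

if __name__ == "__main__":
    for rho in (0.0, 0.8):
        q = box_pierce_q(ar1_residuals(200, rho))
        print(f"rho={rho}: Q={q:.1f}, exceeds 5% critical value: {q > CHI2_95_DF10}")
```

Strongly autocorrelated residuals push Q far beyond the critical value, which is what "rejecting" the model specification means here.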
The lesson for the RE test is surprisingly simple under all the hyperventilating: statistical tests all have a purpose, to test against null hypotheses. But watch what the econometricians do – they have different tests for different purposes. Let’s apply this to some of the recent examples.
Rutherford et al 2005 (and Wahl and Ammann 2006) argue that the verification r2 test is a poor test because (using more precise statistical terminology not used by them) it has little power against a jump-shift in mean of a series, while still preserving high-frequency relationships. I agree that the statistic has little power (i.e. ability to discriminate) against this event, but this is not a situation that has any practical relevance. While the verification r2 test would fail to identify this unlikely event, other tests would (and the RE test is useful for that.)
Mann points out that the RE test works well against a null of the proxy reconstruction being AR1 red noise – fair enough. The only problem is that nobody is suggesting that the proxy reconstruction is a simple AR1 red noise series.
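To make the contrast concrete, here is a small sketch using the usual textbook definitions, which are assumed here rather than taken from the papers cited: verification r2 is the squared Pearson correlation between observed and reconstructed values in the verification period, and RE is one minus the ratio of squared reconstruction errors to squared departures from the calibration-period mean. The numbers are invented toy values, and the calibration mean of 0.0 is an assumption.

```python
import math

def verification_r2(obs, recon):
    """Squared Pearson correlation between observations and reconstruction."""
    n = len(obs)
    mo, mr = sum(obs) / n, sum(recon) / n
    cov = sum((o - mo) * (r - mr) for o, r in zip(obs, recon))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_r = sum((r - mr) ** 2 for r in recon)
    return (cov / math.sqrt(var_o * var_r)) ** 2

def re_statistic(obs, recon, calibration_mean):
    """Reduction of error: 1 - SSE(reconstruction) / SSE(calibration-mean benchmark)."""
    sse = sum((o - r) ** 2 for o, r in zip(obs, recon))
    sse_benchmark = sum((o - calibration_mean) ** 2 for o in obs)
    return 1.0 - sse / sse_benchmark

# Toy verification-period observations; the calibration-period mean is assumed to be 0.0.
obs = [-0.5, -0.3, -0.6, -0.2, -0.4]
cal_mean = 0.0

# Case A: every wiggle is right but the level is wrong by a constant 0.5.
recon_offset = [o + 0.5 for o in obs]

# Case B: the level is right (mean -0.4) but the wiggles are unrelated to the observations.
recon_level = [-0.3, -0.3, -0.4, -0.4, -0.6]

if __name__ == "__main__":
    for name, recon in (("offset", recon_offset), ("level only", recon_level)):
        print(name, round(verification_r2(obs, recon), 3),
              round(re_statistic(obs, recon, cal_mean), 3))
```

Case A is the jump-shift situation: r2 is perfect while RE is negative. Case B is the reverse: RE is high purely because the mean level matches, while r2 is essentially zero. Each statistic is sensitive to a different kind of failure.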
The nulls that need to be tested for multiproxy studies are completely different:
(1) a spurious trend (e.g. CO2 fertilization in bristlecones), a more or less classic “nonsense” regression, but complicated by the multivariate setting, where less is known about spurious regression effects;
(2) biased cherrypicking of red noise series – a different but potentially interrelated mechanism.
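The second mechanism can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of any actual proxy network: it generates a pool of independent AR1 red noise series, keeps only those that correlate best with a rising target over a final "calibration" window, and averages the survivors. The pool size, persistence of 0.9, window length and linear target are all hypothetical choices.

```python
import math
import random

def ar1_series(n, rho, rng):
    """AR(1) red noise: x[t] = rho * x[t-1] + e[t], with standard normal e[t]."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def cherry_picked_composite(n_series=200, n_obs=300, n_cal=50, keep=20, rho=0.9, seed=0):
    """Average the `keep` noise series that best match a rising target in the last n_cal steps."""
    rng = random.Random(seed)
    target = list(range(n_cal))  # hypothetical rising "temperature" in the calibration window
    pool = [ar1_series(n_obs, rho, rng) for _ in range(n_series)]
    ranked = sorted(pool, key=lambda s: pearson_r(s[-n_cal:], target), reverse=True)
    chosen = ranked[:keep]
    composite = [sum(vals) / keep for vals in zip(*chosen)]
    return composite, pearson_r(composite[-n_cal:], target)

if __name__ == "__main__":
    composite, r_cal = cherry_picked_composite()
    print("calibration-window correlation of composite with target:", round(r_cal, 2))
```

Although every input series is pure noise, the selected composite tracks the target closely in the calibration window, because selection aligns the noise there while averaging flattens it everywhere else.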
Thinking about how to test against these nulls is what needs to be done. The RE statistic is completely useless against these nulls. Expressed this way, one sees what idle puffery is contained in recent ruminations by Mann and his disciples about why one should ONLY look at the RE statistic – a conclusion NOT adopted by the NAS Panel, although they were pretty inarticulate about their reasons. (As I noted above, wouldn’t it have been nice if someone of Hendry’s background had been invited to be on the panel instead of even one of Bloomfield or Nychka.)
Hendry, D.F. 1980. Econometrics – Alchemy or Science? Economica 47, 387-406.