Cohn and Lins [GRL 2005], engagingly titled “Nature’s Style: Naturally Trendy”, questions whether recent trends in temperature can be classified as statistically significant when considered from a more general perspective that includes stochastic processes other than white noise. Some of the issues will be familiar to readers of this blog, although the treatment in Cohn and Lins is obviously different. Cohn and Lins has prompted a reaction from Rasmus Benestad, a prominent proponent of the bizarre assumption of independent and identically distributed (i.i.d.) data in climate statistical testing, who accuses Cohn and Lins at realclimate here of attempting to "pitch statistics against physics". Amusingly, Benestad argues that "fairly stable" climate is evidence against Cohn and Lins, with the URL supposedly showing "fairly stable" climate being – wait for this – MBH.
Perhaps it’s helpful to refresh some background discussion on the statistical significance of trends. It’s not an issue that I’ve fully come to grips with; I think that I’ve noted up some of the recent econometric considerations, which build on problems arising out of spurious regression. However, there are some points that obviously raise one’s eyebrows.
Earlier this year, Benestad discussed here record-breaking statistics for recent temperatures using the assumption of i.i.d. statistical distributions:
the likelihood of a record-breaking event taking place in a stable system is remarkably simple (Benestad, 2003, 2004). In fact, the simplicity and the nature of the theory for the null-hypothesis (for an stable behaviour/stationary statistics for a set of unrelated observations, referred to as independent and identically distributed data, or ‘iid’, in statistics) makes it possible to test whether the occurrence of record-events is consistent with the null-hypothesis (iid)
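For readers who want to see what the i.i.d. null hypothesis implies: for n independent draws from any continuous distribution, the chance that the latest value is a record high is 1/n, so the expected number of records is the harmonic sum 1 + 1/2 + ... + 1/n. The sketch below is my own illustration in Python (not Benestad's code; the function names are made up for this example) and checks that result by simulation.

```python
import numpy as np

def expected_records(n):
    """Expected number of record highs among n i.i.d. continuous draws: sum of 1/k."""
    return float(np.sum(1.0 / np.arange(1, n + 1)))

def count_records(x):
    """Count values that exceed every earlier value (the first value always counts)."""
    running_max = np.maximum.accumulate(x)
    return 1 + int(np.sum(x[1:] > running_max[:-1]))

def simulate_mean_records(n, trials=2000, seed=0):
    """Average record count over many simulated i.i.d. normal series of length n."""
    rng = np.random.default_rng(seed)
    counts = [count_records(rng.standard_normal(n)) for _ in range(trials)]
    return float(np.mean(counts))

if __name__ == "__main__":
    n = 100
    print("theory    :", expected_records(n))
    print("simulation:", simulate_mean_records(n))
```

The point of the exercise is that the 1/n rule holds only when the observations really are independent; it says nothing about what to expect from an autocorrelated series.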
At the beginning of August, Benestad again discussed here the significance of trends on the basis that the residuals were independent and identically distributed (i.i.d.), and actually normal as well: “white noise”. Oddly, the illustration in this article, reproduced below, is taken from William Connolley, also of realclimate, who has elsewhere proclaimed that he is innocent of any actual statistical expertise.
Figure 1. From realclimate, taken from William Connolley.
The look of the residuals under such circumstances is visually completely different from the look of the observed residuals from fitting a trend to actual temperature records. I discussed this in the context of the satellite records here, which have a very different look to them than the normal i.i.d. noise of Connolley and Benestad. An ARMA(1,1) model fitted the residuals very well. I also showed a figure here from an economics text with a variety of residuals from different noise structures.
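To give a feel for the difference, here is a minimal sketch that simulates ARMA(1,1) noise, x[t] = phi*x[t-1] + e[t] + theta*e[t-1], and compares its lag-1 autocorrelation with that of white noise. The coefficients phi=0.9 and theta=-0.3 are illustrative assumptions only; they are not the values fitted to the satellite records.

```python
import numpy as np

def simulate_arma11(n, phi, theta, seed=0, burn=500):
    """Simulate ARMA(1,1) noise; the first `burn` values are discarded as spin-up."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + e[t] + theta * e[t - 1]
    return x[burn:]

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation after removing the mean."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

if __name__ == "__main__":
    white = np.random.default_rng(1).standard_normal(5000)
    arma = simulate_arma11(5000, phi=0.9, theta=-0.3)  # hypothetical coefficients
    print("white noise lag-1 autocorrelation:", lag1_autocorr(white))
    print("ARMA(1,1)   lag-1 autocorrelation:", lag1_autocorr(arma))
```

Plotting the two series side by side shows the same thing visually: the ARMA(1,1) series wanders in long excursions, while the white noise series does not.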
Cohn and Lins consider the issue of trend significance not just from the point of view of AR or ARMA(1,1) models, but from a more general type of noise model – fractionally-differenced noise (FARIMA). We used this type of model in MM05a and are familiar with it. Fractionally-differenced time series have become somewhat familiar through their association with fractals, popularized by Benoit Mandelbrot.
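For anyone unfamiliar with fractional differencing, a FARIMA(0,d,0) series with 0 < d < 0.5 can be written as an infinite moving average of white noise with slowly decaying weights psi[0] = 1, psi[k] = psi[k-1]*(k-1+d)/k. The sketch below truncates that sum at a finite number of weights, which is a simplification I am assuming for illustration; it is not necessarily how Cohn and Lins or MM05a generated their series, and the function names are mine.

```python
import numpy as np

def frac_noise_weights(d, n_weights):
    """Moving-average weights of (1 - B)^(-d): psi[0]=1, psi[k]=psi[k-1]*(k-1+d)/k."""
    psi = np.empty(n_weights)
    psi[0] = 1.0
    for k in range(1, n_weights):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return psi

def simulate_frac_noise(n, d, seed=0, n_weights=2000):
    """Approximate FARIMA(0,d,0) noise by a truncated moving average of white noise."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n + n_weights - 1)
    psi = frac_noise_weights(d, n_weights)
    # 'valid' keeps only outputs that use the full set of weights, giving n values
    return np.convolve(e, psi, mode="valid")

def lag_autocorr(x, lag):
    """Sample autocorrelation at the given lag after removing the mean."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

if __name__ == "__main__":
    x = simulate_frac_noise(5000, d=0.3)  # d=0.3 is an arbitrary illustrative choice
    for lag in (1, 10, 50):
        print(lag, lag_autocorr(x, lag))
```

The weights decay like a power law rather than geometrically, which is why the influence of past shocks dies out so much more slowly than in an AR(1) process.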
Cohn and Lins say that:
The statistical significance, or p-value, associated with an observed trend, however, is more difficult to assess because it depends on subjective assumptions about the underlying stochastic process
The question remains whether natural [hydroclimatological] processes in fact possess [long-term persistence]. The idea was introduced more than 50 years ago by Hurst, and has been debated ever since [Mandelbrot and Wallis, 1968; Klemeš, 1974; Potter and Walker, 1981; Hosking, 1984; Loucks et al., 1981; Koutsoyiannis, 2000, 2003]. Hurst’s fundamental finding has neither been discredited nor universally embraced, but persuasive arguments have been presented (for discussion and additional references, see Koutsoyiannis). Given the LTP-like patterns we see in longer [hydroclimatological] records, however, such as the periods of multidecadal drought that occurred during the past millennium and our planet’s geologic history of ice ages and sea level changes, it might be prudent to assume that [hydroclimatological] processes could possess LTP.
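The first quoted point, that the p-value of a trend depends on the assumed stochastic process, is easy to check with a small Monte Carlo experiment. The sketch below generates trendless series, applies an ordinary least squares trend test at a nominal 5% level, and counts how often a "significant" trend is reported. I use AR(1) noise with phi=0.9 as a simple stand-in for persistence; that choice, the series length and the critical value 1.98 are my assumptions, not anything taken from Cohn and Lins.

```python
import numpy as np

def trend_t_stat(y):
    """t-statistic of the OLS slope of y against time, assuming i.i.d. residuals."""
    n = len(y)
    t = np.arange(n) - (n - 1) / 2.0          # centred time index
    slope = np.dot(t, y) / np.dot(t, t)
    resid = y - y.mean() - slope * t
    se = np.sqrt(np.dot(resid, resid) / (n - 2) / np.dot(t, t))
    return slope / se

def ar1_series(n, phi, rng):
    """Trendless AR(1) series started from its stationary distribution."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1.0 - phi ** 2)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + e[i]
    return x

def false_positive_rate(phi, n=100, trials=2000, crit=1.98, seed=0):
    """Fraction of trendless series whose |t| exceeds the nominal 5% critical value."""
    rng = np.random.default_rng(seed)
    hits = sum(abs(trend_t_stat(ar1_series(n, phi, rng))) > crit for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    print("white noise (phi=0.0):", false_positive_rate(0.0))
    print("persistent  (phi=0.9):", false_positive_rate(0.9))
```

With white noise the test rejects at roughly its nominal rate; with persistent noise it reports "significant" trends in a large fraction of series that have no trend at all.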
Happily, I have already posted up on a couple of these references (all of which are interesting). I discussed Mandelbrot and Wallis briefly here, which interested readers might consult again. The phenomenon of long-term persistence in geophysical series was originally raised by Hurst in connection with Nile River levels. Remarkably and interestingly, the data set for Nile River levels (which is very long) persists in use in mathematical literature to illustrate fractional processes and is probably more familiar there than in climatological literature. Mandelbrot and Wallis followed Hurst in looking for very long records among the fossil weather data exemplified by varve thickness and tree ring indices. They calculated Hurst indices and 3rd and 4th moments for 12 varve series, 27 tree ring series from the western U.S. (no bristlecones), 9 precipitation series, 1 earthquake frequency series, 11 river series and 3 Paleozoic sediment series. Some of the tree ring series are precursors of series used in the North American tree ring data set (a number of precursors to Stahle, who interpreted the series as being ENSO affected).
Klemeš, also cited in Cohn and Lins, was discussed here. Klemeš contested the long-memory interpretation of fractional processes and argued that indistinguishable time series properties could be produced by semi-infinite storage properties of water:
An exceptionally fruitful concept for the mathematical modeling of hydrological processes is the so-called semi-infinite storage reservoir, especially the type with a fixed bottom and no fixed maximum [Klemeš, 1970, 1971, 1973]. It adequately describes the basic mechanism common to such different water reservoirs as, for instance, a lake, a single dew droplet, a glacier, a groundwater basin and a man-made reservoir operated for flood control or hydroelectric generation. Their common property is on the one hand, the possibility of running dry and the other, the fact that they have no fixed limit of storage capacity (water level in a dam can rise to any elevation above the dam crest, as is demonstrated in the history of dam failures, and a glacier can cover whole continents as is documented in geological history.)
Even a very simple model of this type can reveal very disturbing properties to be expected in hydrologic processes. For instance, a single non-linear reservoir fed with white noise will produce output that is nonstationary, a first-order Markov chain with time variant serial correlation and random component [Klemeš, 1973]…
Most geophysical processes involve strong cumulative effects: they themselves represent processes of storage fluctuations.
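A toy version of the storage idea can be written in a few lines. The sketch below is my own simplification and not Klemeš' actual model: storage has a fixed bottom at zero and no upper limit, the inflow is white noise, and the release is a quadratic function of storage. The release rule, the coefficient k and the inflow mean are all assumptions chosen purely for illustration. It only shows the most basic effect, namely that serially uncorrelated input comes out serially correlated.

```python
import numpy as np

def reservoir_outflow(inflow, k=0.05):
    """Route an inflow series through a toy non-linear reservoir (hypothetical rule)."""
    storage = 0.0
    out = np.empty(len(inflow))
    for t, q in enumerate(inflow):
        storage = max(storage + q, 0.0)                # fixed bottom: can run dry
        release = min(k * storage * storage, storage)  # non-linear release, no upper cap on storage
        storage -= release
        out[t] = release
    return out

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation after removing the mean."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inflow = rng.normal(1.0, 1.0, 5000)   # white-noise net input (assumed mean and spread)
    outflow = reservoir_outflow(inflow)
    print("inflow  lag-1 autocorrelation:", lag1_autocorr(inflow))
    print("outflow lag-1 autocorrelation:", lag1_autocorr(outflow))
```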
I find Klemeš consistently interesting and think that his concepts deserve very careful consideration. Fluctuating storage is a very pretty and very difficult mathematical concept. In a way, I can even see how you can apply this type of model to El Nino phenomena, where you have accumulations of warm water (both in energy and even elevation) in the Pacific Warm Pool driven by trade winds, with intermittent "avalanches" in which the accumulation dissipates. (However, nothing here turns on whether this image has any validity.) I previously posted up some of Klemeš' criticism of “boastful claims of assorted “modellers” about all kinds of climate-change effects, motivated more by politics than by science and reflecting prejudices rather than fact”.
So Klemeš, at any rate, has a physical image of how hydrological accumulations (lakes, glaciers, clouds, etc.) can lead to complicated stochastic processes.
Back to Benestad and his argument that Cohn and Lins "pitch statistics against physics". Benestad states of the ARMA, ARIMA, FARIMA models etc.:
these models are not necessarily representative of nature – just convenient models which to some degree mimic the empirical data. In fact, I would argue that all these models are far inferior compared to the general circulation models (GCMs) for the study of our climate, and that the most appropriate null-distributions are derived from long control simulations performed with such GCMs
Benestad goes on:
statistics is a powerful tool, but blind statistics is likely to lead one astray. Statistics does not usually incorporate physically-based information, but derives an answer from a set of given assumptions and mathematical logic. It is important to combine physics with statistics in order to obtain true answers.
Wait a minute – isn’t this the same guy who wrote about record-breaking events on the basis of i.i.d.? Why didn’t we hear about "pitching statistics against physics" at that time? Benestad then goes on to the following amusing argument:
One difficulty with the notion that the global mean temperature behaves like a random walk is that it then would imply a more unstable system with similar hikes as we now observe throughout our history. However, the indications are that the historical climate has been fairly stable.
First, Cohn and Lins never use the term "random walk", which has a very specific technical meaning. A FARIMA process is not the same as a "random walk" – why twist words when you don’t need to?
Second, and this is fun – here is the image at the URL illustrating the "fairly stable" historical climate:
Needless to say, this is MBH – how can this be used as proof of a fairly stable climate when it is itself in question? Maybe this isn’t "tuning" GCMs, but it’s sure tuning discourse. The other millennial citations are Jones et al, Mann and Jones, etc.
As to the existence of a "fairly stable" climate: I presume that Benestad includes the development of continental-scale glaciers over Toronto within "fairly stable". I recently read an article by Nicola Scafetta in which stochastic processes (very long-tailed) were identified in solar behavior; he discerned similar patterns in earth temperatures. Without opining on the substance of any of the conclusions, the identification of stochastic behavior in the sun (or on the earth) is hardly inconsistent with the laws of physics. But not according to Benestad:
An even more serious problem with Cohn and Lins’ paper as well as the random walk notion is that a hike in the global surface temperature would have physical implications – be it energetic (Stefan-Boltzmann, heat budget) or dynamic (vertical stability, circulation). In fact, one may wonder if an underlying assumption of stochastic behaviour is representative, since after all, the laws of physics seem to rule our universe…
And, to re-iterate on the issues I began with: It’s natural for molecules under Brownian motion to go on a hike through their random walks (this is known as diffusion), however, it’s quite a different matter if such behaviour was found for the global planetary temperature, as this would have profound physical implications. The nature is not trendy in our case, by the way – because of the laws of physics.
As to Benestad’s claim that:
statistics is a powerful tool, but blind statistics is likely to lead one astray.
it is hard to find a better example than the Hockey Team itself. The "confidence intervals" in Hockey Team reconstructions are a confidence game. As I’ve discussed elsewhere (most recently AGU PPT), their confidence intervals are calculated from residuals from an over-fitted and mis-specified model in the calibration period, rather than from the verification period. Since the verification period R2 values are ~0, the standard errors are the same as natural variability, whatever that is, and no confidence whatever can be attached to the reconstruction.
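The general statistical point about calibration versus verification residuals can be illustrated with a toy regression. The sketch below is emphatically not the MBH procedure: it regresses a white-noise "temperature" on 30 white-noise "proxies" that are unrelated to it by construction, fits over a calibration period, and then compares the residual size in the calibration and verification periods. The period lengths, the number of proxies and all names are assumptions made up for the example.

```python
import numpy as np

def calibration_vs_verification(n_cal=80, n_ver=50, n_proxies=30, seed=0):
    """Fit OLS on a calibration period and compare errors in and out of sample."""
    rng = np.random.default_rng(seed)
    n = n_cal + n_ver
    target = rng.standard_normal(n)                   # stand-in "temperature"
    proxies = rng.standard_normal((n, n_proxies))     # unrelated to target by construction
    X = np.column_stack([np.ones(n), proxies])
    beta, *_ = np.linalg.lstsq(X[:n_cal], target[:n_cal], rcond=None)
    fitted = X @ beta
    cal, ver = slice(0, n_cal), slice(n_cal, n)

    def rms(e):
        return float(np.sqrt(np.mean(e ** 2)))

    def r2(a, b):
        return float(np.corrcoef(a, b)[0, 1] ** 2)

    return {
        "cal_r2": r2(fitted[cal], target[cal]),
        "ver_r2": r2(fitted[ver], target[ver]),
        "cal_rmse": rms(target[cal] - fitted[cal]),
        "ver_rmse": rms(target[ver] - fitted[ver]),
        "ver_target_sd": float(target[ver].std()),    # the "natural variability" benchmark
    }

if __name__ == "__main__":
    for name, value in calibration_vs_verification().items():
        print(f"{name:14s} {value:.3f}")
```

In this toy setup the calibration residuals look respectably small only because the model is over-fitted; out of sample, the errors are at least as large as the variability of the target itself, which is the sense in which calibration-period confidence intervals are uninformative.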
Cohn, T. A., and H. F. Lins (2005), Nature’s style: Naturally trendy, Geophys. Res. Lett., 32, L23402, doi:10.1029/2005GL024476. SI here