One of the Kevins has drawn Appendix A “Statistical Issues Regarding Trends” in the recent USCCSP report "Temperature Trends in the Lower Atmosphere" to my attention. The appendix is coauthored by the omnipresent Wigley.
It’s quite amazing and, from where I sit professionally, very disturbing. Re-inventing long-established methods (sometimes getting it wrong), strange terminology, blatant errors of omission and commission (MBH have lots of company) etc. all point to a divorce between the climate science community and the mainstream statistical community, as Wegman noted…
Even if there were no potential human cost to not doing things properly, I must say it irks me to see folks doing things that would land me in the street…and becoming celebrities in the process to boot!
Aside from the juvenile tone, what is wrong with it? As a start, the handling of autocorrelation. Readers of this site – or readers of Koutsoyiannis or David Stockwell – know that AR1 is not a suitable model for a climate series null process.
I don’t mean to imply that there’s some great gotcha staring everyone in the face. It’s just that it’s a very bad piece of work. I don’t have time to fully discuss it, but perhaps others will.
Update: Against my better judgement, I’ve spent some time looking at the references for their AR1 autocorrelation model. Santer et al (Science 2000) discusses autocorrelation issues as follows in the legend to Figure 1:
Confidence intervals are adjusted to account for temporal autocorrelation in the data (21).
Footnote 21 says:
The method for assessing statistical significance of trends and trend differences is described by B. D. Santer et al. (J. Geophys. Res., in press). It involves the standard parametric test of the null hypothesis of zero trend, modified to account for lag-1 autocorrelation of the regression residuals [see J. M. Mitchell Jr. et al., Climatic Change, World Meteorological Organization Tech. Note 79 (World Meteorological Organization, Geneva, 1966)]. The adjustments for autocorrelation effects are made both in computation of the standard error and in indexing of the critical t value.
Santer et al (JGR in press) turns up in JGR 105. It justifies the AR1 model as follows:
The model that we use here is simple and has considerable empirical justification based on results from extensive stochastic simulations (D. Nychka et al., manuscript in preparation, 2000).
I have been unable to locate any publication by Nychka et al. which fits the bill. If there was no subsequent publication, this is academic check kiting worthy of Ammann and Wahl. They acknowledged Nychka as a consultant in Wahl and Ammann 2006. Perhaps that was one of the things that they consulted Nychka on. More to the point, surely Santer et al could have located some third-party statistical reference.
Santer et al (JGR 2000) state:
There are various ways of accounting for temporal autocorrelation in e(t) [see, e.g., Wigley and Jones, 1981; Bloomfield and Nychka, 1992; Wilks, 1995; Ebisuzaki, 1997; Bretherton et al., 1999]. The simplest way [Bartlett, 1935; Mitchell et al., 1966] uses an effective sample size based on r1, the lag-1 autocorrelation coefficient of e(t):
n_eff ≈ n (1 − r1)/(1 + r1)
By substituting the estimated effective sample size n_eff for n in (4), one obtains “adjusted” estimates of the standard deviation of regression residuals and hence of the standard error and t ratio.
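For readers who want to see what this adjustment actually does, here is a minimal Python sketch. It assumes the usual OLS trend formulas, with n_eff = n(1 − r1)/(1 + r1) substituted for n in the residual-variance denominator and in the degrees of freedom; the exact form of their equation (4) isn't reproduced above, so that substitution is my assumption. The function names are hypothetical, not anything from Santer et al.

```python
import math


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation coefficient r1 of a series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


def ar1_adjusted_trend_test(y):
    """OLS trend on t = 0..n-1 with an AR1 effective-sample-size adjustment.

    Assumption (hypothetical reconstruction): n_eff = n (1 - r1) / (1 + r1)
    replaces n in the residual variance and in the degrees of freedom.
    """
    n = len(y)
    t = list(range(n))
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    stt = sum((ti - t_bar) ** 2 for ti in t)
    # Ordinary least-squares slope and intercept.
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / stt
    a = y_bar - b * t_bar
    resid = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    # Lag-1 autocorrelation of the regression residuals e(t).
    r1 = lag1_autocorrelation(resid)
    n_eff = n * (1 - r1) / (1 + r1)
    if n_eff <= 2:
        # As r1 approaches 1, n_eff collapses and the test is undefined.
        raise ValueError("n_eff <= 2: adjusted test undefined")
    # "Adjusted" residual standard deviation, standard error and t ratio.
    s_e = math.sqrt(sum(e * e for e in resid) / (n_eff - 2))
    s_b = s_e / math.sqrt(stt)
    return {"slope": b, "r1": r1, "n_eff": n_eff,
            "t_ratio": b / s_b, "dof": n_eff - 2}
```

The critical value would then be read from Student's t with n_eff − 2 degrees of freedom (for example `scipy.stats.t.ppf(0.975, res["dof"])`), which is the "indexing of the critical t value" mentioned in the Science footnote. Note that the whole adjustment rests on the residuals being AR1; nothing in the calculation checks that assumption.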
Bartlett is a famous statistician, although 1935 was early in his career, and one would like to see a more up-to-date statistical authority. Bartlett 1935 does not support the citation and arguably says the opposite:
First, there is no objection to our using the usual statistical tests as a preliminary measure. If coefficients are quite insignificant on these tests, there does not seem to be much point considering them further. Secondly, if a correlation coefficient appears significant, the extent to which the necessary conditions for a valid test appear to be fulfilled in the problem under consideration should be clearly stated. It should be noted that the complete independence of observations of one series is sufficient for a test to be valid… If neither series is random, no valid test can be recommended for it is not likely that the dependence of the observations can be specified in any satisfactory statistical way.
So I guess the authority for this procedure is a WMO technical report.