Remember the argument of Gavin and Rasmus, made against Cohn and Lins, that you shouldn’t use empirical temperature data to calibrate the ARIMA models that provide your null distributions. I’m not sure that the argument is valid, but if it is, then they should not have deployed it selectively against Cohn and Lins, since Mann does exactly the same thing in MBH98.
Here’s what Rasmus said in his post:
When ARIMA-type models are calibrated on empirical data to provide a null-distribution which is used to test the same data, then the design of the test is likely to be seriously flawed. To re-iterate, since the question is whether the observed trend is significant or not, we cannot derive a null-distribution using statistical models trained on the same data that contain the trend we want to assess.
He reiterated the point in a number of comments as follows:
I think that the use of ARIMA-type models, calibrated on the trended series itself, will not provide a good test for the null-hypothesis, since you a priori do not know whether the process you examine is a ‘null-process’. The risk is that the test is already biased by using the empirical data both for testing as well as tuning the statistical models used to represent the null-process…. ARIMA-type models do not contain any physics, one never knows if the ARIMA-type models really are representative or just seeming to be so, but I agree that they are convenient tools when we have nothing else. The question is not about using ARIMA-type models or not, but what conclusions you really can infer from them. link
I do not believe that statistical models are appropriate because (i) they are used to test a null-hypothesis where no anthropogenic forcing (or just solar/volcanic forcing) is assumed, and (ii) they are trained on empirical data subject to forcings (be it anthropogenic as well as solar/volcanic). link
If you use ARIMA-type models tuned to mimic the past, then the effect of changes in forcing is part of the null-process. link
Gavin chipped in with a very similar comment:
The ‘problem’ such as it is with Cohn and Lins conclusions (not their methodology) is the idea that you can derive the LTP behaviour of the unforced system purely from the data. This is not the case, since the observed data clearly contain signals of both natural and anthropogenic forcings. Those forcings inter alia impart LTP into the data. The models’ attribution of the trends to the forcings depends not on the observed LTP, but on the ‘background’ LTP (in the unforced system). Rasmus’ point is that the best estimate of that is probably from a physically-based model – which nonetheless needs to be validated. That validation can come from comparing the LTP behaviour in models with forcing and the observations. Judging from preliminary analyses of the IPCC AR4 models (Stone et al, op cit), the data and models seem to have similar power law behaviour, but obviously more work is needed to assess that in greater detail. What is not a good idea is to use the observed data (with the 20th Century trends) to estimate the natural LTP and then calculate the likelihood of the observed trend with the null hypothesis of that LTP structure. This ‘purely statistical’ approach is somewhat circular. link
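To see the circularity concretely, here is a minimal sketch (Python, with made-up numbers of my own, not anyone’s actual data) of the effect Rasmus and Gavin are describing: a series that is nothing but a deterministic trend plus white noise has no genuine persistence, yet the AR1 coefficient fitted to the trended series comes out large, so a null distribution calibrated this way already contains the very trend it is supposed to test.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy series: deterministic trend plus iid noise -- no genuine persistence.
n = 150
t = np.arange(n)
series = 0.01 * t + rng.normal(0, 0.5, n)

def lag1_autocorr(x):
    """Sample lag-one autocorrelation."""
    x = x - x.mean()
    return np.sum(x[1:] * x[:-1]) / np.sum(x * x)

# Fitted on the trended series, the AR1 coefficient is inflated;
# fitted on the noise alone (trend removed), it is near zero.
print(lag1_autocorr(series))             # roughly 0.4 here
print(lag1_autocorr(series - 0.01 * t))  # roughly 0.0
```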
Amusingly, in MBH98 Mann used an ARMA model (AR1) to do exactly what Rasmus and Gavin object to in Cohn and Lins: he estimated AR1 coefficients (an ARMA(1,0) model) from the empirical data and applied them to estimate null distributions. Here’s how MBH98 described the calculation:
We test the significance of the correlation coefficients (r) relative to a null hypothesis of random correlation arising from natural climate variability, taking into account the reduced degrees of freedom in the correlations owing to substantial trends and low frequency variability in the NH series. The reduced degrees of freedom are modelled in terms of first-order markovian “red noise” correlation structure of the data series, described by the lag-one autocorrelation coefficient ρ during a 200-year window… We use Monte Carlo simulations to estimate the likelihood of chance spurious correlations of such serially correlated noise with each of the three actual forcing series…
Significance levels for β were estimated by Monte Carlo simulations, also taking serial correlation into account. Serial correlation is assumed to follow from the null model of AR(1) red noise, and degrees of freedom are estimated based on the lag-one autocorrelation coefficients (r) for the two series being compared.
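For concreteness, here is a minimal sketch of the kind of calculation the quoted passages describe, assuming nothing beyond what they state: estimate a lag-one autocorrelation from the empirical series itself, simulate AR1 “red noise” surrogates with that coefficient, and read off a Monte Carlo null distribution for the correlation with a forcing series. The toy `nh` and `forcing` series and all the function names are mine for illustration, not MBH98’s.

```python
import numpy as np

rng = np.random.default_rng(0)

def lag1_autocorr(x):
    """Sample lag-one autocorrelation."""
    x = x - x.mean()
    return np.sum(x[1:] * x[:-1]) / np.sum(x * x)

def ar1_surrogate(rho, n, rng):
    """Simulate an AR(1) 'red noise' series with lag-one coefficient rho."""
    eps = rng.normal(0, 1, n)
    x = np.empty(n)
    x[0] = eps[0]
    for i in range(1, n):
        x[i] = rho * x[i - 1] + eps[i]
    return x

def mc_correlation_null(series, forcing, n_sims=1000, rng=rng):
    """Null distribution of r(series, forcing) when the series is replaced
    by AR(1) noise whose rho is estimated from the series itself -- the
    calibrate-on-the-data step that Rasmus and Gavin object to."""
    n = len(series)
    rho = lag1_autocorr(series)
    null_r = np.array([
        np.corrcoef(ar1_surrogate(rho, n, rng), forcing)[0, 1]
        for _ in range(n_sims)
    ])
    return rho, null_r

# Toy stand-ins for the NH temperature series and one forcing series.
nh = np.cumsum(rng.normal(0, 0.1, 200)) + 0.005 * np.arange(200)
forcing = 0.005 * np.arange(200) + rng.normal(0, 0.2, 200)

rho, null_r = mc_correlation_null(nh, forcing)
r_obs = np.corrcoef(nh, forcing)[0, 1]
p = np.mean(np.abs(null_r) >= abs(r_obs))
print(f"rho_hat={rho:.2f}  r_obs={r_obs:.2f}  MC p-value={p:.3f}")
```

Whatever the merits of the procedure, note that the `rho` fed into the surrogates comes straight from the observed (trended) series, which is precisely the circularity complained of above.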
I’m not saying that the method is or isn’t any good – only that, if using the empirical data to estimate a null distribution is no good for Cohn and Lins, it’s no good for MBH98 (and there’s no a priori reason why an AR1 model is OK and a more sophisticated model is not). As so often, the Hockey Team is not merely sucking and blowing, but sucking and blowing out of every major orifice simultaneously – sort of a one-mann band.