Re: Geoff Sherrington (#24), Okay, point taken. We need *better* data.

It seems to me important that a valid and reasonably accurate method be used to estimate confidence intervals for trends in time series of temperature and similar variables, and I very much support and applaud Steve’s efforts in this posting. Even where OLS confidence intervals are adjusted for lag-1 autocorrelation – which Santer did in his 2008 paper – they may well be inadequate, at least where there is long term memory in the time series (that is, it has a significantly positive fractional integration parameter d).

I had been working mainly on Steig’s Antarctic temperature reconstruction, for which he quotes a continent-wide trend from 1957 to 2006 of 0.12 +/- 0.07 degrees/decade. Running Steve’s script on the continent-wide average of Steig’s main reconstruction gives the following result:

model     coef    logLik     AIC       BIC       cil95    ciu95   cil90   ciu90   ci95
ols       0.1178  -942.1814  1886.363  1890.760   0.0532  0.1824  0.0636  0.1720  0.0646
ar1       0.1156  -910.1672  1822.334  1826.731   0.0253  0.2058  0.0400  0.1911  0.0902
arm11_1   0.1150  -909.2241  1820.448  1824.845   0.0174  0.2119  0.0336  0.1960  0.0972
fracdiff  0.1160  -910.0012  1822.003  1826.399  -0.0219  0.2507  0.0039  0.2258  0.1363

As pointed out by Hu McCulloch in his 26 February 2009 post, Steig failed to correct his confidence intervals for serial correlation. In this case, there is significant long term memory (d = 0.149 per fracdiff, and AR1 = 0.157), as is quite commonly the case for natural phenomena, and using the results given by fracdiff rather than ar1 seems appropriate. On that basis, the true 95% confidence interval seems to be about double that given by Steig (+/- 0.1363 rather than +/- 0.07 degrees/decade), which implies that the trend in his reconstruction is, contrary to his claims (and to the result from correcting for AR1 alone), not significantly different from zero at a 95% confidence level.
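The comparison can be checked directly from the table. The following is a minimal sketch in Python (not the R of Steve’s script); the interval limits are copied from the table above, and the variable names are only illustrative.

```python
# 95% confidence limits for the trend (degrees/decade), copied from the table above.
intervals = {
    "ols":      (0.0532, 0.1824),
    "ar1":      (0.0253, 0.2058),
    "arm11_1":  (0.0174, 0.2119),
    "fracdiff": (-0.0219, 0.2507),
}
STEIG_HALF_WIDTH = 0.07  # the +/- value Steig quotes

summary = {}
for model, (lower, upper) in intervals.items():
    half_width = (upper - lower) / 2        # corresponds to the ci95 column
    ratio = half_width / STEIG_HALF_WIDTH   # how much wider than Steig's quoted value
    excludes_zero = lower > 0 or upper < 0  # "significant at 95%" in this sense
    summary[model] = (half_width, ratio, excludes_zero)
    print(f"{model:9s} half-width={half_width:.4f} "
          f"ratio_to_Steig={ratio:.2f} excludes_zero={excludes_zero}")
```

Only the fracdiff interval straddles zero; the other three lie wholly above it.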

Whilst I would not claim to be a statistics expert, I would like to query the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) figures that Steve’s script produces. These are criteria for choosing (by minimisation thereof) between different models with different numbers of parameters. AIC and BIC both equal -2*log(maximized likelihood) plus a penalty depending on the number k of free parameters to be estimated (2k for AIC, k*log(no. of observations) for BIC). AIC and BIC as produced by Steve’s script evidently all use k=1, corresponding to the only free parameter being the slope that is being estimated. However, the various models used have differing numbers of additional free parameters beyond the slope: ols has zero; ar1 has one; arm11_1 has two; and fracdiff has two (as it has been set to use 1 AR and no MA parameters). That makes k equal to 1, 2, 3 and 3 respectively. I have ignored the constraining of the intercept to be zero (by centering the data), since that has the same effect for all models.
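As a check on the k=1 reading, here is a minimal Python sketch that recomputes AIC and BIC from the logLik column of the Steig table above. The number of observations is an assumption: 600, i.e. monthly values for 1957 to 2006, which is not stated in the script output itself.

```python
import math

def aic(log_lik, k):
    """AIC = -2*logLik + 2k."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n_obs):
    """BIC = -2*logLik + k*log(n_obs)."""
    return -2.0 * log_lik + k * math.log(n_obs)

N_OBS = 600  # ASSUMPTION: monthly data, 1957-2006 (50 years x 12 months)

# model: (logLik, AIC, BIC) as printed by the script for the Steig reconstruction
script_output = {
    "ols":      (-942.1814, 1886.363, 1890.760),
    "ar1":      (-910.1672, 1822.334, 1826.731),
    "arm11_1":  (-909.2241, 1820.448, 1824.845),
    "fracdiff": (-910.0012, 1822.003, 1826.399),
}

reproduced = {}
for model, (log_lik, aic_printed, bic_printed) in script_output.items():
    # Use k=1 for every model, which is what the script appears to do.
    reproduced[model] = (aic(log_lik, 1), bic(log_lik, 1, N_OBS))
    print(f"{model:9s} AIC {reproduced[model][0]:.3f} (printed {aic_printed}) "
          f"BIC {reproduced[model][1]:.3f} (printed {bic_printed})")
```

If the assumed 600 observations is right, k=1 reproduces every printed AIC and BIC to within rounding, which is consistent with the script applying the same penalty to all four models.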

For MSU T2LT (which has 365 observations) Steve’s R script gives me respectively AIC and BIC values for the four models as follows: ols 109.10 and 113.00; ar1 -398.49 and -394.59; arm11_1 -399.12 and -395.22; and fracdiff -398.01 and -394.11. Corrected for the varying numbers of free parameters, the AIC and BIC figures are: ols 109.10 and 113.00; ar1 -396.49 and -388.69; arm11_1 -395.12 and -383.42; and fracdiff -394.01 and -382.31. So ar1 actually fits the data best, not arm11_1 as per the raw AICs and BICs, with ols being by far the worst fit. That would make sense, as the MSU T2LT series has very high lag 1 autocorrelation (ar1=0.87), but almost no long term memory (d=0.000046 per fracdiff with nar=1, nma=0).
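The correction amounts to adding 2 to the AIC and log(365) to the BIC for each parameter beyond the slope. A minimal Python sketch, using the script values quoted above and the parameter counts given earlier in this comment:

```python
import math

N_OBS = 365  # observations in the MSU T2LT series, as stated above

# model: (AIC from script, BIC from script, k)
# k = slope plus the extra parameters counted earlier in this comment.
models = {
    "ols":      (109.10, 113.00, 1),
    "ar1":      (-398.49, -394.59, 2),
    "arm11_1":  (-399.12, -395.22, 3),
    "fracdiff": (-398.01, -394.11, 3),
}

corrected = {}
for name, (aic_k1, bic_k1, k) in models.items():
    extra = k - 1  # parameters the script's k=1 penalty leaves out
    corrected[name] = (aic_k1 + 2 * extra, bic_k1 + extra * math.log(N_OBS))
    print(f"{name:9s} AIC {corrected[name][0]:8.2f}  BIC {corrected[name][1]:8.2f}")

# The preferred model is the one with the smallest criterion value.
best_by_aic = min(corrected, key=lambda m: corrected[m][0])
best_by_bic = min(corrected, key=lambda m: corrected[m][1])
print("best by AIC:", best_by_aic, "| best by BIC:", best_by_bic)
```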

For some reason the values of all model outputs I obtain for MSU T2LT are a bit different from Steve’s; marginally so except for fracdiff, where the profile likelihood slope estimate is 0.045 whereas ar1 and arm11_1 both give 0.052 and ols gives 0.053. It makes no sense to me for fracdiff to give a different slope estimate when d is so close to zero. I have replicated this problem with simulated data sets, and as I am also getting lots of fracdiff warning messages “unable to compute correlation matrix” I am wondering if there might be some problem with the Windows version of fracdiff. More probably, I am doing something wrong. May I ask if anyone else has experienced this problem?

Re: Andrew (#23),

Suggested rephrase is “collect more accurate data”. More data, by itself, brings diminishing returns and after a while provides negligible gain unless it uncovers a new mechanism.

Re: lucia (#22), I knew there was a stats term I was looking for. So basically, at this point there is nothing for it but to collect more data. Disappointing, perhaps, but I guess that’s what you get.

Re: Andrew (#4),

Ok. Yes, when the CIs are wide, failure to reject a null often means very little. If you select some other rival hypothesis, you can compute the statistical power. If that power is low, failure to reject the null means you just don’t have enough data to select among the possible rival hypotheses.
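To make the power calculation concrete, here is a minimal Python sketch. It assumes the trend estimate is approximately normal with a known standard error, which is a simplification, and the standard error and rival trends below are hypothetical numbers chosen only for illustration.

```python
from statistics import NormalDist

def power_two_sided(true_trend, std_error, alpha=0.05):
    """Power of a two-sided z-test of 'trend = 0' when the real trend is true_trend.

    ASSUMPTION: the trend estimate is normally distributed with the given
    standard error.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = true_trend / std_error     # rival trend in standard-error units
    # Probability that the estimate falls in either rejection tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

STD_ERROR = 0.07  # hypothetical standard error, degrees/decade
for rival in (0.0, 0.1, 0.2, 0.4):  # hypothetical rival trends
    print(f"rival trend {rival:.1f}: power = {power_two_sided(rival, STD_ERROR):.3f}")
```

With a rival trend of zero the "power" is just the test's size (0.05); it rises toward 1 as the rival trend grows relative to the standard error.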

Re: David Wright (#15),

You’re absolutely right. The autocorrelation structure of the time series determines the significance of the LLS trend. As one gets further and further away from the Dirac delta structure of white noise, the whole concept of a constant trend gets wobbly.

In time series with strong oscillatory components, such as most geophysical series including temperature, there simply is no constant trend. Instead, one gets an oscillatory metric, strongly dependent on the time-length of the LLS calculation. In fact, computed on a moving fixed-length basis, the “trend” proves to be a crudely low-passed version of the local slope of the data, i.e., a band-pass filter with suboptimal frequency response characteristics. That’s why LLS “trend analysis” finds no place in bona fide time-series analysis when the acf cannot be properly specified by a simple model. The idea of projecting such trends into the future–the stock in trade of alarmists–is plainly ludicrous.
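The filter interpretation can be illustrated with a minimal Python sketch. For equally spaced data, the least-squares slope over a window is a fixed weighted sum of the points in it, so sliding the window applies a linear filter to the series. The test series here is hypothetical: a pure sine wave with no trend at all.

```python
import math

def slope_weights(window):
    """Weights w_j such that the least-squares slope over `window` equally
    spaced points equals sum(w_j * y_j)."""
    mid = (window - 1) / 2
    denom = sum((j - mid) ** 2 for j in range(window))
    return [(j - mid) / denom for j in range(window)]

def moving_trend(series, window):
    """Least-squares slope in every window position (a moving linear filter)."""
    w = slope_weights(window)
    return [sum(wj * series[i + j] for j, wj in enumerate(w))
            for i in range(len(series) - window + 1)]

# Hypothetical series: an oscillation with a 60-sample period and no trend.
PERIOD = 60
series = [math.sin(2 * math.pi * t / PERIOD) for t in range(240)]

for window in (20, 60, 90):
    trends = moving_trend(series, window)
    print(f"window {window:3d}: fitted 'trend' ranges from "
          f"{min(trends):+.4f} to {max(trends):+.4f} per sample")
```

The fitted slope swings between positive and negative values as the window moves, and its amplitude depends on the window length, even though the series contains no trend.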

I do have a question. I have noticed in calculating time series trends that the serial correlation of the regressed residuals can have a relatively large AR1, but one that is not statistically significant. Should one make a correction for an AR1, or ARn or ARMA for that matter, when the values cannot be shown to be significant? Or does one look at the models with and without correction and determine if the difference is significant?
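The situation described can be put in numbers with a minimal Python sketch. It does not settle the question. It only shows how an AR1 can sit inside the usual white-noise band and still imply a sizeable change in the trend’s standard error. Both formulas are standard approximations used here as assumptions: a 95% band of about 1.96/sqrt(n) for the lag-1 autocorrelation, and the effective-sample-size adjustment n_eff = n(1 - r1)/(1 + r1). The values r1 = 0.3 and n = 30 are hypothetical.

```python
import math

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence of residuals."""
    mean = sum(x) / len(x)
    dev = [v - mean for v in x]
    return sum(a * b for a, b in zip(dev[:-1], dev[1:])) / sum(d * d for d in dev)

def ar1_summary(r1, n):
    """Return (significant, band, se_inflation) for a lag-1 autocorrelation r1.

    ASSUMPTIONS: approximate 95% white-noise band of 1.96/sqrt(n), and the
    effective-sample-size adjustment n_eff = n * (1 - r1) / (1 + r1).
    """
    band = 1.96 / math.sqrt(n)
    se_inflation = math.sqrt((1 + r1) / (1 - r1))  # equals sqrt(n / n_eff)
    return abs(r1) > band, band, se_inflation

# Hypothetical case: a short series with a fairly large AR1.
significant, band, inflation = ar1_summary(r1=0.3, n=30)
print(f"band = +/-{band:.3f}, significant = {significant}, "
      f"standard error multiplied by {inflation:.2f}")
```

In this hypothetical case the AR1 is not significant by the band test, yet applying the adjustment would widen the trend’s standard error by roughly a third.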

Re: Hu McCulloch (#14),

Hu, thanks for the comments. It looks like it’s pretty easy to do this from first principles rather than through the prism of somebody else’s wrapper function.

If Model = mle2(ols.lik, msu[,"Trpcs"]), I’ve confirmed that, as you surmised, the profile column of profile(Model) is the square root of half the abs(log likelihood - optimum(logLik)). This is probably not an unreasonable device for keeping the plots comparable. Parabolizing all these plots would not make them more readable. Bolker’s function has figured out reasonable bounds for the profile – which saves figuring it out for the graphic.

I very much appreciate your comment here, as the plots were yielding useful looking results, but I hadn’t quite figured out why – and your comment resolved that.

Re: Hu McCulloch (#16),

I took a look at your linked paper. I haven’t had a chance to go through it carefully yet, but it certainly looks interesting. Basically, it looks like your underlying model is a straight-line increase, with superimposed noise that is serially correlated, rather than Gaussian and independent (which would lead to a standard linear chi-squared fitting rule). You are able to derive an estimator of the slope of the underlying model under this assumption. (It looks like you don’t even require any assumptions about the distribution of the noise, which surprises me. I’ll read more carefully with an eye to trying to understand that.)

What I would point out in the context of my remarks above is that other models are possible. For example, one might pick as an alternative model a biased random walk. Such a model naturally produces serial correlations in the deviation from the underlying trendline, and thus would produce output superficially similar to the output of yours. Nonetheless it would be a distinct model (at least as far as I know, in the absence of a proof that they are equivalent), and there would be no mathematical guarantee that its best-fit trend parameter would equal the best-fit trend parameter of your model.
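A minimal Python sketch of that last point, under simplifying assumptions: the straight-line model is fitted by ordinary least squares (ignoring the serial correlation treated in the paper), and the biased random walk is taken to have Gaussian steps, so its best-fit drift is the mean first difference. The data values are hypothetical.

```python
def ols_slope(y):
    """Least-squares slope of y against time index 0, 1, 2, ...
    (the trend parameter of a straight line plus independent noise)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def random_walk_drift(y):
    """Drift of a biased random walk with Gaussian steps: the mean first
    difference, which depends only on the first and last points."""
    return (y[-1] - y[0]) / (len(y) - 1)

# Hypothetical data, chosen only to show that the two estimates differ.
series = [0.0, 0.3, 0.1, 0.6, 0.4, 0.9, 0.5, 1.2]

print(f"straight-line trend : {ols_slope(series):.4f} per step")
print(f"random-walk drift   : {random_walk_drift(series):.4f} per step")
```

On an exactly linear series the two estimates coincide, but on noisy data they generally do not, since one uses every point and the other only the endpoints.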

We could compare these models based on the quality of the fit, or make hand-waving arguments about which one is closer to the underlying physics, but there is no a priori statistical reason to believe that one’s trend parameter measures the “true trend” while the other does not. What we really want, though, is a model derived from the underlying physics. Such a model would have all sorts of physical parameters, but no “temperature trend” parameter at all — the temperature evolution would simply arise dynamically from the physics, given those parameter values.
