## Trends using Maximum Likelihood

For a seemingly simple topic, the calculation of trends and their confidence intervals has provoked a lot of commentary (see any number of threads at Lucia’s). Today, I’m presenting what I think is an interesting approach to the problem using maximum likelihood. In a way, the approach builds on work that I did last year implementing Brown and Sundberg profile likelihood methods for proxy calibration. The techniques are very concise and, once stated, very obvious. It seems unlikely that there’s anything original in the approach, but, despite all the commentary on trends, I haven’t seen anything analyzing trends in quite the way that I’ve done here (but I’m probably re-tracing steps done elsewhere).

The following figure is one that I’ve used for looking at various time series, shown here for the UAH tropical T2LT series.

There’s quite a bit of information in this graphic. Four different autocorrelation structures are shown here: none (OLS), AR1, ARMA(1,1) and fracdiff. For each autocorrelation structure, the log likelihood is shown for a profile of different slopes.

The maximum likelihood (minimum log likelihood) is similar for each of the 4 autocorrelation models (though OLS is slightly higher than the others), but the confidence intervals are much wider for AR1 and other schemes. In this example, there isn’t much difference between AR1 and more complicated autocorrelation structures (but there are other cases in which the differences are very noticeable.) I’ve also plotted the Santer T2LT ensemble mean trend, which is well outside the 95% confidence interval (using maximum likelihood) – something that we had previously reported applying the (1-r)/(1+r) degrees of freedom adjustment used in Santer et al. (but they only reported results for obsolete data ending in 1999). Figure 1. Log likelihood for MSU Tropical T2LT, with various autocorrelation models. Note: y-axis should be labelled as |z|.

Here is a corresponding plot for the RSS T2 data. In this case, observations are about halfway between zero and the ensemble mean trend. If someone wishes to argue (a la Santer, Schmidt) that there is no statistically significant difference between observations and the ensemble mean, then they are also obliged to concede that there is no statistically significant difference between observations and zero trend even for RSS TMT. Note: y-axis should be labelled as |z|.

[Added May 7] Here’s a similar plot for annual CRU NH 1850-2008, showing more texture in the differences between the various autocorrelation models. Zero trend (under fracdiff autocorrelation) is not precluded at a 95% level. However, these graphs show more clearly than the bare statistic that this is a two-edged sword: double the observed trend is likewise not precluded at a 85% level. Note: y-axis should be labelled as |z|.

[end update]
Here’s how these plots are done (I’ve placed the method below in a wrapper function). The mle2 function used here is from Ben Bolker’s bbmle package (I suspect that the method could be modified to use the more common mle function).

First, get the MSU tropical T2LT series and then for convenience, center the data and the time series (dividing by 10 to yield changes per decade):

source(“http://data.climateaudit.org/scripts/spaghetti/msu.glb.txt&#8221;)) #returns msu.glb
x=msu[,”Trpcs”]
x=x-mean(x);year=c(time(x))/10;Year=year-mean(year)

The mle2 function works by calculating profile likelihoods; to calculate the log-likelihood of a slope, it requires a function yielding the log-likelihood of the slope given the time series. Here I applied a useful attribute of the arima function in R – that the log-likelihood is one of its attributes. I defined a function ar1.lik as follows (note the minus sign):

ar1.lik< -function(b,y=x,t=Year) -arima(y-b*t, order=c(1,0,0),method="ML")\$loglik

To facilitate comparison with OLS methods, I also used the arima function with order (0,0,0) to reproduce OLS results (double-checking that this surmise was correct). The following calculates the maximum likelihood slope:

Model.ols = mle2(ols.lik, start = list(b=.01),data=list(y=x,t=Year) );
coef(Model.ols)
# 0.05363612

This is precisely the same as OLS results:

lm( x~Year)\$coef
#0.05363613

The bbmle package has a profile function that applies to mle2 output, which has a special-purpose plot attached to it (the style of which I applied in the above graph).

Model.ar1 = mle2(ar1.lik, start = list(b=.01),data=list(y=x,t=Year) );
Profile.ar1=profile(Model.ar1)
plot(Profile.ar1)

In the above graphic combining 4 autocorrelation schemes, I used 4 different likelihood schemes as follows (fracdiff requiring the fracdiff package):

ols.lik< -function(b,y=x,t=Year ) -arima(y-b*t, order=c(0,0,0),method="ML")\$loglik
ar1.lik<-function(b,y=x,t=Year) -arima(y-b*t, order=c(1,0,0),method="ML")\$loglik
arma1_1.lik<-function(b,y=x,t=Year) – arima(y-b*t, order=c(1,0,1),method="ML")\$loglik
fracdiff.lik<-function(b,y=x,t=Year) -fracdiff(y-b*t, nar=1,nma=0)\$log.likelihood

For convenience in making the above graphics, I collected the relevant outputs from the mle2 stages as follows:

profile.lik=function(x,method.lik=ar1.lik) { #x is a time series
x=x-mean(x);year=c(time(x))/10;Year=year-mean(year)
fm=lm(x~Year);
Model = mle2(method.lik, start = list(b= fm\$coef),data=list(y=x,t=Year));
Profile=profile(Model);
profile.lik=list(model=Model,profile=Profile)
profile.lik
}

The models and profiles for the 4 different autocorrelation structures were then collated into one list as follows:

x=msu[,”Trpcs”]
Model=list()
Model\$ols=profile.lik(x,method.lik=ols.lik)
Model\$ar1=profile.lik(x,method.lik=ar1.lik)
Model\$arm11_1=profile.lik(x,method.lik=arma1_1.lik)
Model\$fracdiff=profile.lik(x,method.lik=fracdiff.lik)

I then collected relevant information about the various models into an Info table (which is applied in the plot):

Info=data.frame( sapply (Model,function(A) coef(A\$model) ) );names(Info)=”coef”
row.names(Info)=names(Model)
Info\$logLik= sapply(Model,function(A) logLik(A\$model) )
Info\$AIC= sapply(Model,function(A) AIC(A\$model) )
Info\$BIC= sapply(Model,function(A) BIC(A\$model) )
Info\$cil95=NA;info\$ciu95=NA;
Info[,c(“cil95″,”ciu95”)]= t( sapply(Model, function(A) confint(A\$profile) ) )
Info\$cil90=NA;info\$ciu90=NA;
Info[,c(“cil90″,”ciu90”)]= t( sapply(Model, function(A) confint(A\$profile,level=.9) ) )
Info\$ci95= (Info\$ciu95-Info\$cil95)/2
round(Info,4)
# coef logLik cil95 ciu95 cil90 ciu90 ci95
#ols 0.0531 -54.1221 0.0200 0.0863 0.0253 0.0810 0.0332
#ar1 0.0459 198.5281 -0.0804 0.1698 -0.0577 0.1478 0.1251
#arm11_1 0.0453 198.8339 -0.0867 0.1743 -0.0626 0.1512 0.1305
#fracdiff 0.0430 198.2359 -0.0956 0.1773 -0.0700 0.1531 0.1364

I needed a little information Santer trends (already collated):

names(santer)=c(“item”,”layer”,”trend”,”se”,”sd”,”r1″,”neff”)
row.names(santer)=paste(santer[,1],santer[,2],sep=”_”)
santer=santer[,3:ncol(santer)]

The plot was done easily given the above information. I made a short plot function to simplify repetition with other series:

plotf1=function(Working,info=Info,v0=NA) {
layout(1);par(mar=c(3,4,2,1))
Data=list()
for(i in 1:4) Data[[i]]=as.data.frame(Model[[i]]\$profile) #profile.tmt\$mle.profile[[i]])
xlim0=1.1*range(Data[]\$b)
plot(abs(z)~b,data=Data[],type=”n”,col=4,xlab=””,ylab=””,xlim=xlim0,ylim=c(0,3.5),yaxs=”i”)
col0=c(1,3,5,4)
for (i in 4:1) {
lines(Data[[i]]\$b,abs(Data[[i]]\$z),col=col0[i]);
points(Data[[i]]\$b,abs(Data[[i]]\$z),col=col0[i],pch=19,cex=.7);
f1=approxfun(Data[[i]]\$b,abs(Data[[i]]\$z))
x0= info[i,c(“cil95″,”ciu95″)]; y0=f1(x0)
if(i==2) lines(xy.coords(x0,y0),lty=3,col=”magenta”,lwd=1)
if(i==1) lines(xy.coords(x0,y0),lty=1,col=”magenta”,lwd=2)
x0= info[i,c(“cil90″,”ciu90”)]; y0=f1(x0)
if(i==2) lines(xy.coords(x0,y0),lty=3,col=2,lwd=1)}
if(i==1) lines(xy.coords(x0,y0),lty=1,col=2,lwd=2)
legend(“bottomleft”,fill=c(1,3,5,4,2,”magenta”),legend=c(names(Working),”90% CI”,”95% CI”),cex=.8)
mtext(side=2,line=2,font=2,”Log Likelihood”)
abline(v=0,lty=2)
abline(v=v0,lty=3,col=2) #Santer
}

The first graphic was rendered by the following command:

plotf1(Working=Model,info=Info,v0=santer\$trend)
title(“MSU T2LT 1979-2009″)
text(santer\$trend,.3,pos=1,font=2,”Model Mean Trend”,col=2,cex=0.8)
points( santer\$trend,0,pch=17,cex=1.6,col=2 )

I think that this is a useful way of looking at information on trends. (I doubt that 30 years is a relevant period to test for long-term persistence under fracdiff – for example, there’s been pretty much one PDO for the entire period, but have rendered it here as well for comparison.)

As noted above, the application of likelihood methods to assess trend confidence intervals under different autocorrelation structures seems so obvious and is so easy to implement that one would expect it to be almost routine. And perhaps it is in some circles, but, if it is, it hasn’t been mentioned in any of the recent climate debate on trends.

1. Andrew
Posted May 6, 2009 at 6:25 PM | Permalink

My instinct is that CI’s that embrace zero are meaningless. The make it uncomfortably hard to reject hypotheses.

2. ThomasL
Posted May 6, 2009 at 7:24 PM | Permalink

Nice! It took my a moment to figure out exactly how to read them, but after I did, the information really jumps out.

3. lucia
Posted May 6, 2009 at 9:32 PM | Permalink

Andrew–Why are CI’s that embrace zero necessarily meaningless? There is nothing special about zero in hypothesis test. Of course you can’t reject the hypothesis the trend is zero when the hypothesis test embraces zero, but you can reject hypothesis the trend is outside the CI’s to whatever confidence level computed.

In climate science circles, the ‘consensus’ hypothesis is not zero, it’s ‘about 2 C/century’. If that trends turns out to be wrong, and 0C/century is right, the CI’s will end up embracing zero as we get more and more data. If that happens we would not 0C/century; but we could easily reject 2C/century, provided the confidence intervals don’t embrace 2C/century.

(FWIW: I don’t think the CI’s will end up embracing zero as we collect lots of data. But that doesn’t transform ‘not embracing zero’ into a criteria for meaningful CIs.)

4. Andrew
Posted May 6, 2009 at 9:58 PM | Permalink

My point I guess is that if the CI’s are wide enough to contain trends so clearly distant from the null, then a failur to reject the null is not very impressive. At this point, the null hypothesis in the models is .2 degrees per decade in the tropical mid troposphere, and the 95% interval around RSS’s trend is so wide it permits us to not only retain the null but an obviously contradictory alternative-zilch warming. So “consistency” with models is asininely meaningless, as far as I can see.

• lucia
Posted May 8, 2009 at 1:35 PM | Permalink

Re: Andrew (#4),
Ok. Yes, when the CI’s are wide, faiure to reject a null often means very little. If you select some other rival hypothesis, you can compute the statistical power. If that power is low, failure to reject the null means you just don’t have enough data to select among the possible rival hypotheses.

• Andrew
Posted May 8, 2009 at 2:36 PM | Permalink

Re: lucia (#22), I knew there was a stats term I was looking for. So basically, at this point there is nothing for it but to collect more data. Disappointing, perhaps, but I guess that’s what you get.

• Geoff Sherrington
Posted May 9, 2009 at 2:44 AM | Permalink

Re: Andrew (#23),

Suggested rephrase is “collect more accurate data”. More data, by iself, is a diminishing return and after a while provides negligible gain unless it uncovers a new mechanism.

• Andrew
Posted May 9, 2009 at 1:51 PM | Permalink

Re: Geoff Sherrington (#24), Okay, point taken. We need better data.

5. Ben
Posted May 6, 2009 at 10:28 PM | Permalink

Bravo! This layman was able to follow although i had to dust off some lesser used cells.
Its not clear to me why there is a need to auto-correlate the source data signal at all? I know this is probably stats for nerds 101 question, but i thought if the sample was large enough there is no need to worry about the outliers?

Regards,
Ben

• Steve McIntyre
Posted May 6, 2009 at 11:14 PM | Permalink

Re: Ben (#5),

the sample isn’t large enough to ignore autocorrelation. With things like oceans and glaciers, you have physical things that could theoretically cause autocorrelation on all sorts of scales.

6. John Davis
Posted May 7, 2009 at 2:27 AM | Permalink

I’m just wondering why the “model mean trend” line seems a bit different on the two charts?

Steve: there are different model emsemble means for T2LT and T2 (TMT) levels.

7. TAC
Posted May 7, 2009 at 5:00 AM | Permalink

It is interesting to note how much uncertainty there is the trend estimates. Presumably there is also uncertainty in model estimates — how much? Who knows? 😉 — so perhaps the discrepancy between models and data is in fact statistically insignificant. Does it matter? Experience has taught me to assume models are wrong until proven otherwise, and nothing I’ve seen so far (technical aspects of the models; time-domain and frequency-domain characteristics; calibration issues) has instilled confidence in the models.

In any case, excellent analysis and graphics, Steve. Thanks!

• Steve McIntyre
Posted May 7, 2009 at 6:38 AM | Permalink

Re: TAC (#8),

Most of this particular approoach evolved from a consideration of Cohn and Lins (2005) that I partly carried out about a year ago, but did not post on. .

8. Steve McIntyre
Posted May 7, 2009 at 7:23 AM | Permalink

I’ve added a similar plot for annual CRU NH 1850-2008. It shows more texture in the differences between the various autocorrelation models. Zero trend (under fracdiff autocorrelation) is not precluded at a 95% level. However, these graphs show more clearly than the bare statistic that this is a two-edged sword: double the observed trend is likewise not precluded at a 85% level.

ON many occasions, I’ve observed that ARMA(1,1) is a more plausible model for annual data than AR1 and the log-likelihood of ARMA(1,1) on this data set is noticeable better than AR1. It’s also better than fracdiff, but I’m wary of relying on a comparison fracdiff log-likelihoods and arima log-likelihoods without parsing the respective definitions as these can vary. However, the comparison of AR1 and ARMA(1,1) log-likelihoods are both from the same function and will be reliable.

9. Hu McCulloch
Posted May 7, 2009 at 8:58 AM | Permalink

This is very interesting, but I’m puzzled by your plots — In almost any model, log likelihood is approximately quadratic, with a max (or min of -log L) and zero first derivative at the estimated parameters. Inference is then based on Rao’s observation that the expectation of the Score (the gradient of logL wrt the parameters, evaluated at the true paramters) is zero, while the variance of the score is the Information (minus the expectation of the Hessian or second derivative matrix of logL). The Lagrange Multiplier (aka Rao Score) statistic, Likelihood Ratio stat, and Wald stat are then all asymptotically equivalent, since the second derivatives are nearly constant near the true parameters in large samples, so that a quadratic approximation to logL is valid. A Gaussian or similarly differentiable density guarantees that the derivatives are all continuous as required by the theory.

Yet your functions all come to a sharp point at the estimated slope parameters, so that the derivative is discontinuous and the second derivative undefined at the min.

The only model with a non-continuously-differentiable logL that comes to mind is Least Absolute Deviations, which is equivalent to ML with a Laplace distribution (which behaves like exp(-abs((x-mu))). This produces a discontinuity of the derivative of the density and therefore the log density and logL at mu. This causes problems for the standard ML theory, but even then, when you add over the sample, the kinks occur at different places. The logL is actually piecewise linear with the max occurring at a kink, but if you squint, it looks quadratic, so that most packages for LAD just smooth logL with some kind of filter in order to pretend that it has a second derviative at the max of logL, and thus be able to compute standard errors and do inference. But your logL doesn’t look even like this.

I also think there’s a semantic error in how you phrased the problem:

The maximum likelihood (minimum log likelihood)

The likelihood itself and the log likelihood both have their max at the same parameter values, since the log is a montonic increasing function. The likelihood itself can easily be an underflow or overflow, plus the Rao ML theory is based on logL anyway, so in practice the log likelihood is always used in place of the likelihood itself. But in a Gaussian model, the logL usually ends up being a constant minus a sum of squared errors, so that maximizing logL is equivalent to minimizing the relevant (sometimes transformed) sum of squared errors, turning a max problem into a min problem. I assume from your graphs that what you are plotting is really -logL, not logL itself, and then relative to its value at the min so the min always occurs at 0.

Steve: on your latter point, the plots are minus logL as you observe.

• Steve McIntyre
Posted May 7, 2009 at 9:09 AM | Permalink

Hu, I’m cannibalizing profile output from the bbmle package. I’ll pass the question on to Ben Bolker, the author of the package, who would probably be interested in this.

• Steve McIntyre
Posted May 7, 2009 at 9:26 AM | Permalink

Hmmm, although all the coefficients make sense – and I exactly replicate the OLS results using this approach, I also get some warning diagnostics. I’ll need to run these to ground in case they are contributing to what’s going on here.

Note that the profile here yields |z| – which looks like a possible contributor to a kink.

10. Hu McCulloch
Posted May 7, 2009 at 10:42 AM | Permalink

RE #13,
If you’re plotting “|z|” instead of logL, that would explain my problem.

The likelihood ratio statistic, LR = 2*ΔlogL, has an approximately χ-squared(1) distribution, so that if you plot the actual logL profile, the 95% critical value is at (1/2)(1.96)^2 = 1.92, which I assumed was where your top red lines were.

LR behaves asymptotically like z^2, with the standard error for “z” computed from either the second derivative of logL at the common point estimate (the Wald stat case), or at each null hypothesis (the LM or Rao Score case). So sqrt(LR) looks much like |z|, though because the curvature isn’t exactly constant, it won’t exactly equal either.

But if you are in fact plotting the profile of sqrt(LR), this behaves like |z| and should have the kink you show, with a 95% critical value of 1.96 rather than 1.92 as I had assumed. But then your vertical axis is not (-)logL or even ΔlogL, but rather sqrt(LR) = sqrt(2*ΔlogL). It gives the same info as the actual logL profile, but just needs a different label.

• Kenneth Fritsch
Posted May 7, 2009 at 4:39 PM | Permalink

But if you are in fact plotting the profile of sqrt(LR), this behaves like |z| and should have the kink you show, with a 95% critical value of 1.96 rather than 1.92 as I had assumed. But then your vertical axis is not (-)logL or even ΔlogL, but rather sqrt(LR) = sqrt(2*ΔlogL). It gives the same info as the actual logL profile, but just needs a different label.

Relabeling the y axis could save some future confusions.

• Steve McIntyre
Posted May 8, 2009 at 7:13 AM | Permalink

Hu, thanks for the comments. It looks like it’s pretty easy to do this from first principles rather than through the prism of somebody else’s wrapper function.

If Model= mle2(ols.lik, msu[,”Trpcs”]), I’ve confirmed that, as you surmised, the profile column of profile(Model) is the square root of half the abs (log likelihood – optimum(logLik) ). This is probably not an unreasonable device for keeping the plots comparable. PArabolizing all these plots would not make them more readable. Bolker’s function has figured out reasonable bounds for the profile – which saves figuring it out for the graphic.

I very much appreciate your comment here, as the plots were yielding useful looking results, but I hadn’t quite figured out why – and your comment resolved that.

11. David Wright
Posted May 7, 2009 at 3:13 PM | Permalink

When one gets caught up in methodological esoterica, it’s often helpful to return to statistical first principals. The relevant principal here is: to do a fit, you need a model.

Without a model, there is only one quantitiy that could possibly be deserving of being called “the trend”: the last value, minus the first value, divided by the timespan.

“But wait!” you cry. “What about the slope of a linear fit, or the linear coefficient of a higher order polynomial fit, or the trend after applying a Savitzky-Golay smoothing filter, or, or… And what about the various ways to average spatial data into a time-slice value: straight-up averaging, area-weighted averaging, various 2D interpolations followed by analytic averaging, and, and…”

And that is exactly the point. All of those are effectively different models of the data, and they are all extremely poor ones, because they are not based on the actual underlying processes, but are instead attempts to be ready-made “models” for any and all data, without having to know anything about the underlying processes.

A decent model for temperatures would predict values for temperature as a function of time and location for the relevant measurement area, based on a handfull of underlying parameters. That model, coupled with a model for noise and/or measurement error, would have a straightforward and unambiguous maximum likelyhood fitting procedure yielding best values and uncertainties for its underlying parameters. A different model would likely have a different number of parameters with different meanings. One might sensibly compare the quality of the fit of the two models, but no sensible persion would expect a parameter value from one model to agree with a parameter value from the other model just because the parameters happend to have been given the same name. And none of the parameters of either model need have any intrepretation as what a person might think of as “the trend”.

And when someone hears all this and still responds “yes, but I have this data and I need you to do a complex statistical analysis that will tell me the trend” one should simply take the last value, subtract the first value, divide by the timespan, and send him a big bill.

• John S.
Posted May 8, 2009 at 11:01 AM | Permalink

You’re absolutely right. The autocorrelation structure of the time series determines the significance of the LLS trend. As one gets further and further away from the Dirac delta structure of white noise, the whole concept of a constant trend gets wobbly.

In time series with strong oscillatory components, such as most geophysical series incl. temperature, there simply is no constant trend. Instead, one gets an oscillatory metric, strongly dependent on the time-length of the LLS calculation. In fact, computed on a moving fixed-length basis, the “trend” proves to be a crudely low-passed version of the local slope of the data, i.e., a band-pass filter with suboptimal frequency response characteristics. That’s why LLS “trend analysis’ finds no place in bona fide time-series analysis when the acf cannot be properly specified by a simple model. The idea of projecting such trends into the future–the stock in trade of alarmists–is plainly ludicrous.

12. Hu McCulloch
Posted May 7, 2009 at 3:57 PM | Permalink

RE David Wright, #15,
You make some valid points. For one thing, no one should really believe in a linear time trend model, since on the one hand it predicts unlikely boiling oceans if extrapolated sufficiently far in one direction, and impossible negative Kelvin temperatures in the other direction.

However, if you’re just asking whether temperatures have gotten higher on average or lower over some time period, a linear trend often makes more sense (for that limited period) than a step function that has a discrete break half way through.

For a pure random walk, your measure of the last value minus the first, divided by the time span, is optimal. However, even if the series is non-stationary (“I(1)”), if there is observational error, your measure is no longer optimal. A least squares trend regression isn’t optimal either, but it at least it is probably more robust to specification errors, since it looks at all the data, and not just 2 points.

In fact, the LS trend line slope is just a weighted average of two-point slopes like yours, taking the first and last, the second and next-to-last, etc, and then weighting each pair’s slope by the square of its time span. This gives your pair the biggest weight, but robustly looks at the other pairs as well.

My current preference, in a new paper at http://www.econ.ohio-state.edu/jhm/papers/MomentRatioEstimator.pdf is to estimate a trend robustly by OLS, and then try to correct its standard error for serial correlation. In Steve’s post above, the point estimate of the slope varies a little with the model, so the weights are instead being altered a little to optimally take into account the measured error structure. In the limit of a pure random walk (AR(1) with a unit AR coefficient), his slopes would therefore correspond to yours.

• David Wright
Posted May 8, 2009 at 1:51 AM | Permalink

I took a look at your linked paper. I haven’t had a chance to go through it carefully yet, but it certainly looks interesting. Basically, it looks like you your underlying model is a straight-line increase, with superimposed noise that — instead of being Gaussian and independent, which would lead to a standard linear chi-squared fitting rule — is serially correlated. You are able to derive an estimator of the slope of the underlying model under this assumption. (It looks like you don’t even require any assumptions about the distribution of the noise, which surprises me. I’ll read more carefully with an eye to trying to understand that.)

What I would point out in the context of my remarks above is that other models are possible. For example, one might pick as an alternative model a biased random walk. Such a model natrually produces serial correlations in the deviation from the underlying trendline, and thus would produce output superficially similiar to the output of yours. Nonetheless it would be a distinct model — at least as far as in know, in the absense of a proof that they are equivilent — and there would be no mathematical guarantee that its best-fit trend parameter would equal the best-fit trend parameter of your model.

We could compare these models based on the quality of the fit, or make hand-waving arguments about which one is closer to the underlying physics, but there is no a priori statistical reason to believe that one’s trend parameter measures the “true trend” while the other does not. What we really want, though, is a model derived from the underlying physics. Such a model would have all sorts of physical parameters, but no “temperature trend” parameter at all — the temperature evolution would simply arise dynamically from the physics, given those parameter values.

13. Kenneth Fritsch
Posted May 8, 2009 at 8:16 AM | Permalink

I want to thank both Steve M and Hu M for their efforts on this thread to make the issue of auto correlation and corrections for it more clear to this layperson — and more available through the provided R scripts. It will perhaps make me feel less vulnerable when I make layperson attempts at measuring and evaluating trends. I will nevertheless continue to require the layperson proviso for myself as, at least, a potential statistics abuser.

I do have a question. I have noticed in calculating time series trends that the serial correlation of the regressed residuals can have a relatively large AR1, but one that is not statistically significant. Should one make a correction for an AR1, or ARn or ARMA for that matter, when the values cannot be shown to be significant. Or does one look at the with and wihout correction models and determine if the difference is significant?

14. Nic L
Posted May 9, 2009 at 12:30 PM | Permalink

I had independently been working over the last few days on using fracdiff and profile log likelihoods to estimate slopes and confidence intervals for temperature time series, and then I saw this thread. Needless to say, Steve has written a far more elegant and capable R script than I had produced, so I will bin mine.

It seems to me important that a valid and reasonably accurate method be used to estimate confidence intervals for trends in temperature etc. time series, and I very much support and applaud Steve’s efforts in this posting. Even where OLS confidence intervals are adjusted for lag-1 autocorrelation – which Santer did in his 2008 paper – they may well be inadequate, at least where there is long term memory in the time series (that is, it has a significantly positive fractional integration parameter d).

I had been working mainly on Steig’s antarctic temperature reconstruction, for which he quotes a continent-wide trend from 1957 to 2006 of 0.12 +/- 0.07 degrees/decade. Running Steve’s script on the continent-wide average of Steig’s main reconstruction gives the following result:

………… coef ….. logLik … AIC …… BIC ……. cil95 .. ciu95 . cil90 .. ciu90 . ci95
ols…….. 0.1178 -942.1814 1886.363 1890.760 0.0532 0.1824 0.0636 0.1720 0.0646
ar1……. 0.1156 -910.1672 1822.334 1826.731 0.0253 0.2058 0.0400 0.1911 0.0902
arm11_1 0.1150 -909.2241 1820.448 1824.845 0.0174 0.2119 0.0336 0.1960 0.0972
fracdiff.. 0.1160 -910.0012 1822.003 1826.399 -0.0219 0.2507 0.0039 0.2258 0.1363

As pointed out by Hu McCulloch in his 26 February 2009 post, Steig failed to correct his confidence intervals for serial correlation. In this case, there is significant long term memory (d= 0.149 per fracdiff, and AR1=0.157), as is quite commonly the case for natural phenomena, and using the results given by fracdiff rather than ar1 seems appropriate. On that basis, the true 95% confidence interval seems to be about double that given by Steig, and has the implication that the trend in his reconstruction is, contrary to his claims (and to the result just correcting for AR1), not significantly different from zero at a 95% confidence level.

Whilst I would not claim to be a statistics expert, I would like to query the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) figures that Steve’s script produces. These are criteria for choosing (by minimisation thereof) between different models with different numbers of parameters. AIC and BIC both equal -2*log(maximized likelihood) plus a penalty depending on the number k of free parameters to be estimated (2k for AIC, k*log(no. of observations) for BIC). AIC and BIC as produced by Steve’s script evidently all use k=1, corresponding to the only free parameter being the slope that is being estimated. However, the various models used have differing numbers of free parameters: ols has zero; ar1 has one; arm11_1 has two; and fracdiff has two (as it has been set to use 1 AR and no MA parameters). I have ignored the constraining of the intercept to be zero (by centering the data), since that has the same effect for all models.

For MSU T2LT (which has 365 observations) the Steve’s R script gives me respectively AIC and BIC values for the four models as follows: ols 109.10 and 113.00; ar1 -398.49 and -394.59; arm11_1 -399.12 and -395.22; and fracdiff -398.01 and -394.11. Corrected for the varying numbers of free parameters, the AIC and BIC figures are: ols 109.10 and 113.00; ar1 -396.49 and -388.69; arm11_1 -395.12 and -383.42; and fracdiff -394.01 and -382.31. So ar1 actually fits the data best, not ar11_1 as per the raw AICs and BICs, with ols being by far the worst fit. That would make sense, as the MSU T2LT series has very high lag 1 autocorrelation (ar1=0.87), but almost no long term memory (d=0.000046 per fracdiff with nar=1, nma=0).

For some reason the values of all model outputs I obtain for MSU T2LT are a bit different from Steve’s; marginally so except for fracdiff, where the profile likelihood slope estimate is 0.045 whereas ar1 and ar11_1 both give 0.052 and ols gives 0.053. It makes no sense to me for fracdiff to give a different slope estimate when d is so close to zero. I have replicated this problem with simulated data sets, and as I am also getting lots of fracdiff warning messages “unable to compute correlation matrix” I am wondering if there might be some problem with the Windows version of fracdiff. More probably, I am doing something wrong. May I ask if anyone else has experienced this problem?

15. Ben Bolker
Posted Mar 8, 2010 at 2:22 PM | Permalink

Just a quick response, don’t know if I ever responded to e-mail … the sqrt(delta-deviance) plotting convention is taken from the original mle(), which in turn inherits it from a lot of other likelihood-profile plotting conventions elsewhere in R (don’t know who started it). The reason that you don’t *want* to “parabolize” the plots is that it is a lot easier to detect deviation from linearity by eye than it is to detect deviation from quadraticity …