Briffa et al [Holocene 2002] describes a procedure for estimating confidence intervals that is applied in Jones et al [2001] as well. Bloomfield (pers. comm.) is cited as a source, although it’s inconceivable that he was aware of the Briffa truncation. I don’t see how you can claim precise confidence intervals when your post-1960 completely falls apart. But here is Briffa’s argument (any comments welcome; Im going to post up a puzzling excerpt from MBH99 as well):

**Briffa et al [Holocene 2002]
Appendix: computing uncertainty ranges for reconstructions**

The estimates of regional growing-season temperature constructed by simple linear regression (shown in Figures 11 and 12) include uncertainty ranges in terms of +- 1 and +- 2 standard errors about each estimated value. Each error range is based upon the uncertainty in the regression coefficients (the intercept, a, and the gradient, b) and the residual temperature variance (s^2) that is not captured by the regression-based reconstruction. The latter is typically the largest source of error in the reconstructions presented here, though the other terms can be non-negligible. The standard errors of the regression coefficients (SEa and SEb) are readily obtained (see, e.g., equations 1.4.5 and 1.4.2 of Draper and Smith, 1981). Given a time series of the predictor, x(t), then the predicted temperature is simply

yË’€ (t) = a + x(t) * b (A1)

where the Ë’€ indicates that this is a prediction of the actual temperature, y. When the residuals from the regression are not autocorrelated, then the standard error of this prediction, SEy, is given by:

[SEy(t)]^2 = s^2 + SEa^2+ [x(t) - mean(x)]^2 * SEb^2 (A2)

where mean(x) is the mean of the predictors over the calibration period.

The uncertainty ranges are in fact timescale-dependent, and therefore need to be computed for each timescale at which the results are presented. Equation A2 is valid for the raw reconstruction, but in general a reconstruction might be presented after applying a smoothing function, replacing the value at time t by àŽ⡨i) w[i]*y^[t-i] where the weights, w[i], sum to one. In Figures 11 and 12, we use a Gaussian-shaped smoothing operator. For the smoothed reconstruction, the standard error must be modified (Peter Bloomfield, personal communication) to give

[SEy(t)]^2 = s^2 * àŽ⡠w[i] ^2 + SEb^2 * { àŽ⡠ w[i] * (x[t-i]-mean(x)) }^2 (A3)

(comparing equations A2 and A3, note that the predictor anomaly is replaced by its smoothed value in the final term, but the more important change is that the first term is now scaled by the sum of squared weights, which is always 1 and the uncertainty range is increased. For the case of a smoothed reconstruction, the first term on the right-hand side of equation A3 is multiplied by the same factor.

## 4 Comments

I am presuming that all the equations are quoted stuff. (Is the article available online?)

The forecast error (equation A2) looks funny. I haven’t done the algebra to see if it is right, but here is what the simple two variable linear regression forecast error should be:

Var(Y(T+1)) = Var(e) + Var(a^)

+ 2*X(T+1)*Cov(a^*b^)

+ Var(b^)*X(T+1)^2

which simplifies to

= Var(e)*{1+1/T+(X(T+1)-X_)^2/Sum(X(t)-X_)^2 }

where T+1 is the forecast period, e is the error term of the equation, a^ and b^ are the estimators of the intercept and slope respectively, and X_ is the mean of X.

Note: this presumes the X’s are known for the forecast period. There is another term when they have to be predicted (which makes sense since there is then an error in the X(T+1)^).

The basic idea behind A3 is correct. The smoothed estimate, provided that the smoothing is a linear operator, should give a kind of weighted average of parts of A2. But again I haven’t done the expectation algebra.

Finally, for the last point, I am perplexed. Normally one thinks of forecasting with serially correlated errors as one of few areas where the serial correlation helps. Think about it for a second. We have an estimate of the errors in the sample and an estimate of the serial correlation. Thus, we use those two parts to make an estimate of the error terms in the forecast period and consequently reduce the forecast error. (However, I am sympathetic to the simplification because figuring out the distribution of Y(T+1)^ with an ARMA model should be a mess. Maybe Bloomfield can cite it from memory.)

Anyway the whole forecast error needs to done with a measurement error in temperature (unless it is the dependent) and some kind of estimate of the effect of estimation when the omitted variables are relatively unimportant but forecasting when they presumably are important. (The proxies are picked for working well, which means that the omitted variables in the relationship didn’t mess things up, but that assumptions can’t be made for the forecast/backcast period. That effect is a function of the ratio of the variance of the omitted in the estimation period to that in forecast period… I think.)

Query, what is meant by confidence intervals falling apart? Does that mean getting wider?

I had some of the same questions Martin raised. Also — based on the words but not the equations — it sounds as though Briffa is computing the “standard error of prediction” assuming that the errors are in measurements rather than associated with variability in the underlying system. If so, why are the errors autocorrelated? If not, what does it mean to compute a confidence interval around an accurately measured observation?

The autocorrelation modification factor àŽⱠtakes into account uncertainty due to short term persistance (STP) but we can also do the same thing using long term persistance (LTP). The difference can be seen intuitively by looking at the profile of the ACFs. If all of the autocorrelation structure was explained by the lag 1 correlation, such as a Markov, the ACF would decay exponentially at greater lags. But the ACF of natural series decay more slowly, more like a power curve, leaving significant correlations at greater lags. This is because there is autocorrelation at all time scales, not just the arbitrary scale chosen for the measurements.

A simple modification factor approach to incorporate LTP in estimating mean values uses the Hurst exponent H in the usual equation for the standard error (SE) which for IID errors is the standard deviation divided by the square root of the number of observations n. Instead of SE(IID)=sd/sqrt(n) we write SE(LTP)=sd/n^(1-H). This makes sense as for a series with IID errors, as H=0.5 and SE(LTP) becomes SE(IID). Also, in the case of a random walk, H=1.0 so SE(LTP)=sd for all n. This makes sense too, as given a random walk is not stationary, and has no mean value, the standard deviation of the calculated mean is constant no-matter how many observations you make. Real series are somewhere between these two extremes, as increasing the number of observations does reduce the sd of the mean, but less slowly than if they were IID.

At http://landscape.sdsc.edu/~davids/enm/?p=32 I go into this and give the references to the work of Demetris Koutsoyiannis on which this is based. There I worked out H for a number of reconstructions supplied by Steve. Using the realclimate approved range method, and acknowledging the difficulty of estimating H accurately, the H ranges from 0.84 to 0.97, and using other results from there, this gives a modification factor àŽⱠof 3 to 5 times. That is, the confidence intervals would be 3 to 5 times greater than under IID errors. Comparing this with the modification factor of Bloomfield, the lag 1 ACF of the series (also on that page) is between 0.5 and almost 1. This gives a àŽⱠbetween 3 and a large number by his equation. Either way its a large contribution to uncertainty.

I really appreciate the comments here. Thanks everyone. (David, I apologize for not participating so far at your blog but I will start doing so soon.)

I’ve got a pdf of the Briffa article which I’ll send to anyone interested (I’ve already sent to Martin.) I transliterated the equations a little since I don’t know how to do subscripts. I just thought of something – I’ll clip the equations as gif,s from the article and insert them. That will be better. The article itself is 2.8 MB and provides little relevant past the appendix.

If you think back to Granger and Newbold [1974], they interpret autocorrelated residuals as a sign of model mis-specification and spurious regression. This approach obviously isn’t followed in paleoclimate. For example, in MBH99 where I posted up the relevant section, Mann reports “low-frequency” in the residuals – which is simply an alternative description of autocorrelated residuals. However,instead of concluding misspecification, he purports to adjust for this. There is not statistical reference in MBH99 for this procedure and I am totally baffled as to what he did. I asked Terrence Mills for an opinion and he was baffled as well. I must remember to ask Nychka or Bloomfield for an explanation as they are both very experienced in frequency domain stuff.

I’m really suspicious of the conclusion that the low-frequency error is hugely diminished from the high-frequency error for related reasons to David. Think back to Percival’s article that I cied last summer [insert link] on estimating standard deviations in autocorrelated series. All Briffa is doing is synchronizing portions of two autocorrelated series by, in effect, re-scaling each of them. If the residuals are autocorrelated, I find it hard to see how the results are worth anything. If you then smooth the series, you quickly end up with about only 5-6 “effective” measurements and confidence intervals as big as a room.