I’ve posted in the past on the mystery of MBH confidence interval calculations, especially the mysterious MBH99 confidence intervals (another Caramilk secret). In our NAS panel presentation and perhaps before, I’d speculated that MBH98 confidence intervals, rotundly described in MBH98 as “self-consistently estimated” were nothing other than twice the standard error of the (overfitted) calibration period. Reader Jean S, a post-doc in statistics, has sent in a very pretty proof of this.
Based on this proof and a couple of other comments from Jean S, we’ve corresponded back and forth on the MBH99 confidence interval mystery and have reduced the mystery to a few elements, where we invite new ideas.
I’ll review the bidding first.
“The reconstructions have been demonstrated to be unbiased back in time, as the uncalibrated variance during the 20th century calibration period was shown to be consistent with a normal distribution (Figure 5) and with a white noise spectrum. Unbiased self-consistent estimates of the uncertainties in the reconstructions were consequently available based on the residual variance uncalibrated by increasingly sparse multiproxy networks back in time [this was shown to hold up for reconstructions back to about 1600].”
In contrast to MBH98 where uncertainties were self-consistently estimated based on the observation of Gaussian residuals, we here take account of the spectrum of unresolved variance, separately treating unresolved components of variance in the secular (longer than the 79 year calibration interval in this case) and higher-frequency bands. To be conservative, we take into account the slight, though statistically insignificant inflation of unresolved secular variance for the post-AD 1600 reconstructions. This procedure yields composite uncertainties that are moderately larger than those estimated by MBH98, though none of the primary conclusions therein are altered.
Poor Jean S almost gagged on Mannian prose. (Interestingly, when the MBH99 "correction" to the confidence intervals was done, Mann did not notify Nature and issue a corrigendum at Nature. In fact, if you go to the 2004 Corrigendum, you will see that the MBH98 confidence intervals are re-iterated even though they were supposedly re-calculated in MBH99).
Last year, I showed the difference between the estimates in the two attempts in the following graphic.
Original Caption: Figure 1. MBH98 and MBH99 one-sigma by calculation step. Cyan - MBH98; salmon - MBH99. Solid black - CRU std dev; dashed red -"sparse" std. dev.
At the time, I presumed that the difference was connected to this sentence about "separately treating unresolved components of variance in the secular (longer than the 79 year calibration interval in this case) and higher-frequency bands", but was then (and still am) unable to decode the rotund and uninformative language.
I re-visited the topic in December when I noted a similar phrase in Rutherford et al 2005 and surveyed some rotund Mannian literature such as Mann and Lees. This post is a handy reference for original quotations. We referred to confidence interval issues at length in our NAS panel presentation as follows:
Confidence intervals in MBH98 (to which the term "self-consistent" is applied) are, as we understand it, calculated simply as twice the standard error from calibration period residuals. If there is overfitting (or spurious regression) in the calibration period, as appears almost certain, then calibration period residuals are likely to provide an extremely biased and over-confident estimate of confidence intervals.
For a sui generis procedure with little knowledge of its statistical properties, at a minimum, it seems to us that confidence intervals should be calculated from the verification period residuals, a procedure which is used in Mann and Rutherford. In this case, given that the verification r2 for the early steps is ~0, this procedure would, of course, have led to very wide confidence intervals and little to no reduction from natural variability, hence a complete inability to assess the statistical significance of warmth in the 1990s.
MBH99 acknowledged that there was significant low-frequency content in the spectrum of residuals i.e. highly autocorrelated residuals. Since at least Granger and Newbold, econometricians have interpreted autocorrelated residuals as evidence of a misspecification. Instead, MBH99 purported to adjust the confidence interval calculations. However, no statistical reference is provided for this calculation. Neither we nor a time series specialist who we consulted on this matter have been able to figure out how this calculation was done. The use of calibration period residuals to estimate confidence intervals is followed in other multiproxy studies. In all cases, we see evidence of spurious relationships in the calibration period with serious out-of-sample behavior, raising in every case the spectre of over-optimistic estimation of the success of the reconstruction.
Jean S. re-opened the matter by sending me the following graph (slightly redrawn here by me) showing a link between MBH98 confidence intervals in each step and the calibration r^2 statistic (described by Mann as the calibration beta statistic). Jean S estimated the calibration sigma from the archived calibration r^2 statistics using the formula:
sigma.hat = sqrt( (1 - r^2[calibration]) * var(instrumental) )
MBH confidence interval – black – archived; red- emulated from archived R2 statistic.
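The relationship between the calibration r^2 and the sigma can be sketched in a few lines of Python. This is only an illustration of the formula as quoted above, not the actual MBH98 or Jean S code; the function names and the numeric inputs are hypothetical and are not archived values.

```python
import math


def calibration_sigma(r2_calibration, instrumental_variance):
    """sigma.hat = sqrt((1 - r^2[calibration]) * var(instrumental))."""
    return math.sqrt((1.0 - r2_calibration) * instrumental_variance)


def two_sigma_halfwidth(r2_calibration, instrumental_variance):
    """Half-width of a 'twice the standard error' confidence band."""
    return 2.0 * calibration_sigma(r2_calibration, instrumental_variance)


# Hypothetical inputs for illustration only (not archived MBH98 values).
example_r2 = 0.75
example_variance = 0.04
example_sigma = calibration_sigma(example_r2, example_variance)
example_band = two_sigma_halfwidth(example_r2, example_variance)
```

A higher calibration r^2 mechanically gives a narrower band, which is why an overfitted calibration period yields over-confident intervals under this formula.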
The instrumental, MBH98 reconstruction and MBH98 sigmas can be located in the following data set. Link (mirrored at WDCP and Nature). The r^2 [calibration] can be picked up here (formerly at the Nature SI, but now deleted there), where it is described as a “calibration beta” statistic. Now I’d figured out that this was a calibration r^2 statistic quite a while ago, but Jean S had a number of expletives for Mannian terminology and had to do his own detective work in the matter. I’ve cited a couple of his references below.
Since calibration residuals are used by Mann both to calculate calibration r^2 and 2-sigma confidence intervals, the connection between the two measures is what you expect. Since there is limited available (unspliced) detailed information on the individual MBH98 steps (the stepwise reconstructions still unarchived after all this commotion!!), each little bit of information on the steps is interesting and this was a nice use of the calibration r2 statistic.
The discrepancies in the graph are intriguing. Why is there a step at 1650 in the CI data set but not the r2 dataset? Does this pertain to an unreported AD1650 step? There’s other evidence of a 1650 step – one of the archived Reconstructed Principal Components (spliced) starts in 1650. So it’s quite possible that there’s an undocumented step. Does the archived information reflect results from two different runs – one with a 1650 step and one without a 1650 step? This also looks likely. Or maybe the reporting of one result was inaccurate. Hey, it’s the Hockey Team.
A similar situation arises with the period from 1750-1800. The r2 information shows 3 steps in this period, but the CI information shows one step. Did the CI calculation not use all the actual steps? Or were there different runs? Again, it’s impossible to tell. It’s the Hockey Team. There’s an odd little wrinkle in the 15th century, with an extra little unexpected bump as well.
A point that I made before, but which is still unresolved, is: why do the confidence intervals INCREASE at certain steps with the addition of more proxies? Doesn’t that indicate that the new proxies have negative information? This would affect the AD1450 step, where there’s a slight increase, and both the 1700 and 1750 steps.
With MBH98, at least it was possible to guess what they were doing. Now to MBH99 and another Caramilk secret. Aside from any details, the whole MBH99 confidence interval estimation process seems nutty. Autocorrelated residuals in econometrics are a sign of mis-specification. Mann uses the same information (which he calls low-frequency) to bump the confidence intervals up. While the calculation of the bump remains obscure, the point and validity of such a process is also far from obvious. No statistical reference is given in MBH99 for the procedure; I’ve looked diligently and have been unable to find anything remotely close. Suggestions are welcomed!!
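For readers who want to check residuals for autocorrelation themselves, here is a minimal Python sketch of two conventional diagnostics, the Durbin-Watson statistic and the lag-1 autocorrelation. These are standard textbook measures that I am assuming are a reasonable starting point; nothing here is taken from MBH99, and the function names are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 for uncorrelated residuals,
    well below 2 for positively autocorrelated residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of the mean-centered residuals."""
    mean = sum(residuals) / len(residuals)
    centered = [e - mean for e in residuals]
    numerator = sum(
        centered[t] * centered[t - 1] for t in range(1, len(centered))
    )
    denominator = sum(c * c for c in centered)
    return numerator / denominator
```

In the Granger and Newbold line of argument, a low Durbin-Watson value is a warning of possible misspecification, not a quantity to be folded into wider error bars.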
You can download the MBH99 reconstruction with confidence interval data here. Two columns are labelled “ignore”. So let’s start with them. Remember how interesting Mann’s CENSORED files are.
First, if you take the ratio of the column MBH99$ignore2 to the MBH98$sigma (confidence interval version), it ranges only between 0.8123908 and 0.8123998. So these two are directly related. Why this ratio? Who knows? Jean S observes that this is close to sqrt(0.66) if that’s of any help.
If you take the ratio of MBH99$ignore2 to the MBH98$sigma (r2 version), you get a much wider range, from 0.7435683 to 0.8814485. So MBH99$ignore2 is obtained from the MBH98 sigma somehow. Using this ratio and working backwards, we can derive the unreported calibration r^2 for the MBH99 first step at 0.39: is this significant? Well, if you use 12 regressors to predict a series 79 years long with autocorrelation, I doubt it (but that’s a story for another day). This is NOT the verification r^2, which will probably be about 0 for the AD1000 step as with the other steps, but I haven’t done the MBH99 calculations yet.
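The two calculations just described (the range of the ratio, and backing out a calibration r^2 from an ignore2 value) can be sketched in Python as follows. The function names are hypothetical, the columns are assumed to have been read into plain lists aligned by year, and the r^2 inversion simply assumes the sigma formula quoted earlier in this post.

```python
import math

# Jean S's observation: the ratio is close to sqrt(0.66).
SQRT_066 = math.sqrt(0.66)


def ratio_range(numerator, denominator):
    """Return (min, max) of the element-wise ratio of two aligned series."""
    ratios = [n / d for n, d in zip(numerator, denominator)]
    return min(ratios), max(ratios)


def implied_calibration_r2(ignore2_value, ratio, instrumental_variance):
    """Work backwards: ignore2 -> MBH98-style sigma -> calibration r^2.

    Assumes sigma = ignore2 / ratio and
    sigma^2 = (1 - r^2) * var(instrumental).
    """
    sigma98 = ignore2_value / ratio
    return 1.0 - sigma98 ** 2 / instrumental_variance
```

A near-constant ratio (min and max agreeing to five decimals) is what indicates that one column is a fixed multiple of the other.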
So MBH99$ignore2 relates to MBH98 – what are the other columns? If you take the ratio of the MBH99$sigma to the MBH98$sigma, then there are only two “adjustments” – one for the period from 1400-1600 and one after 1600. The ratios are 1.187 and 1.643 respectively. Where do these come from? Who knows? I did this originally for the MBH98 comparison; Jean S responded that you could apply the above relationship between MBH98$sigma and the MBH99$ignore2 to extend this back to the first step. Using the constant, we get a ratio of 1.58 for the 1000-1399 step. These three values have something to do with the spectrum calculations of Mann and Lees 1995, but what?
Top: black- MBH99 sigma; red – MBH98 sigma; bottom ratio of MBH99 sigma to MBH98 sigma (using the ignore2 to extend to 1000-1399).
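Finding the distinct "adjustment" factors amounts to taking the year-by-year ratio of the two sigma series and collapsing runs of equal values. A small Python sketch, assuming the two series are aligned by year and using a hypothetical function name:

```python
def step_ratios(years, sigma99, sigma98, digits=3):
    """Collapse the year-by-year ratio sigma99/sigma98 into segments.

    Returns a list of (first_year, last_year, ratio) tuples, one per run
    of consecutive years sharing the same ratio (rounded to `digits`).
    """
    segments = []
    for year, s99, s98 in zip(years, sigma99, sigma98):
        ratio = round(s99 / s98, digits)
        if segments and segments[-1][2] == ratio:
            # Same ratio as the previous year: extend the current segment.
            segments[-1] = (segments[-1][0], year, ratio)
        else:
            segments.append((year, year, ratio))
    return segments
```

If the adjustment really is a single multiplier per period, the output should contain only a handful of segments.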
A few comments from Jean S, as a statistics post-doc:
“these “climate scientists” seem to be a light year behind from my field in terms of understanding and using statistics, and their terminology is weird…
“what they are doing just does not make too much real sense (in the meaning of mathematics or statistics)… nor do I approve the thing.
So I guess the “mask” is a complete ad-hoc, which is of course impossible to figure out. See what they say in MBH99, they don’t give any hint how they “take into account” different things (this usually means that procedure is completely ad-hoc). Also they say “to be conservative” which usually refers to some kind of ad-hoc number selection.
By the way, you should show all statisticians you happen to talk to Mann’s phrase “robustly estimated median” from caption~2 in MBH99. It must be one of the most unprecedentedly ;) stupid phrases ever published in a scientific journal. Exactly this type of phrases I see from our under-grads with great ego but little understanding.
Update April 27: Jean S has emailed me to point out that the following holds exactly:
sum(MBH99$ignore1 ^2) + sum (MBH99$ignore2^2) = sum(MBH99$sigma^2)
So we have an orthogonal decomposition. Jean S proposes that this has something to do with the distinctive Mannian notion of “secular” frequencies. MBH99$ignore2^2 is almost exactly equal to (2/3) * MBH98$sigma^2 in the overlap (the ratio of 0.8124 noted above squares to about 0.66) and is thus the variance of the residuals weighted by 2/3 (or some high-frequency subset). So it looks like some other series is weighted by 1/3 to get MBH99$ignore1. Ideas welcome. (End update).
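The identity above is easy to check numerically once the three columns are loaded. A minimal Python sketch, with hypothetical function names and the columns assumed to be plain lists of equal length:

```python
import math


def sum_of_squares(values):
    """Sum of squared entries of a series."""
    return sum(v * v for v in values)


def decomposition_gap(ignore1, ignore2, sigma):
    """Absolute difference between sum(ignore1^2) + sum(ignore2^2)
    and sum(sigma^2); zero means the identity holds."""
    return abs(
        sum_of_squares(ignore1) + sum_of_squares(ignore2) - sum_of_squares(sigma)
    )


def variance_share(component, total):
    """Fraction of the total squared sigma carried by one component."""
    return sum_of_squares(component) / sum_of_squares(total)
```

Computing variance_share for each "ignore" column over the same years shows how the total is split between the two components.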
Update: October 5, 2006
Jean S has observed that the MBH99 preprint (but not the final version) contained a graphic of residuals said to be from the AD1820 and AD1000 networks (though as noted below, this may not be correct). Jean S’ digitization of the residuals is here AD1000 AD1820 . He used the Matlab routine for calculating MTM spectra script here , digital versions of spectra here AD1000 AD1820 . His emulation of MBH99 Figure 2 is here.
For comparison, the corresponding figure from MBH99 preprint is shown here.
Some new references: