A few days ago, I showed that a trivial variation on the Moberg CVM reconstruction led to a very different medieval-modern relationship. Juckes has reported that the Moberg CVM reconstruction is "99.98% significant" – not quite the most significant in a milllll-yun years, but VERY, VERY significant. I thought it would be interesting to see if my variation was also significant and, if so, to ponder what that meant for these calculations.
Let’s review what Juckes did. His significance test more or less follows the methodology of a certain Michael Mann in MBH98 by performing a univariate test without considering prior selections or processing. Juckes:
The significance of the correlations between these six proxy data samples and the instrumental temperature data during the calibration period (1856–1980) has been evaluated using a Monte-Carlo simulation with (1) a first order Markov model (e.g. Grinstead and Snell, 1997) with the same 1-year lag correlation as the data samples and (2) random time series which reproduces the lag correlation structure of the data samples (see Appendix B2)….
Table 3. R values for six CVM composites evaluated using the Northern Hemisphere mean temperature (1856 to 1980).
Columns 2 and 3 show R values for the 95% significance levels, evaluated using a Monte Carlo simulation with 10 000 realisations. In columns 2, 7 and 8 the full lag-correlation structure of the data is used, in column 3 a first order Markov approach is used, based on the lag one auto-correlation. Column 4 shows the R value obtained from the data and column 5 shows the same using detrended data. Column 6 shows the standard error (root-mean-square residual) from the calibration period. Columns 7 and 8 show significance levels, estimated using Monte Carlo simulations as in column 2, for the full and detrended R values. The HCA detrended significance is low because the proxies have been smoothed, removing high frequency information.
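For readers who want to see what a benchmark of this type involves, here is a minimal Python sketch of the first-order Markov (AR1) variant described in the quote. It is not Juckes' code: the function names, the one-sided quantile and the synthetic target series are all my assumptions, and the full lag-correlation variant from his Appendix B2 is not attempted.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-one autocorrelation of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def ar1_series(n, phi, rng):
    """One realisation of a stationary AR(1) process (requires |phi| < 1)."""
    eps = rng.standard_normal(n)
    y = np.empty(n)
    y[0] = eps[0] / np.sqrt(1.0 - phi ** 2)  # draw the start from the stationary distribution
    for t in range(1, n):
        y[t] = phi * y[t - 1] + eps[t]
    return y

def mc_r_threshold(phi, target, n_sims=10_000, level=0.95, seed=0):
    """R value exceeded by only (1 - level) of random AR(1) series."""
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    r = np.empty(n_sims)
    for i in range(n_sims):
        r[i] = np.corrcoef(ar1_series(target.size, phi, rng), target)[0, 1]
    return float(np.quantile(r, level))

# Synthetic stand-in for the 1856-1980 instrumental series (NOT real data).
years = np.arange(1856, 1981)
rng = np.random.default_rng(1)
target = 0.005 * (years - years[0]) + 0.1 * rng.standard_normal(years.size)

# In practice phi would come from lag1_autocorr(proxy_series).
print(mc_r_threshold(0.0, target, n_sims=2000))  # white-noise benchmark
print(mc_r_threshold(0.9, target, n_sims=2000))  # strongly autocorrelated benchmark
```

Note what a benchmark like this does not simulate: any prior selection or processing of the proxies. Each random series is tested on its own, which is the univariate setup discussed above.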
Now let’s consider the variation of the Moberg CVM obtained by using Yakutia/Indigirka instead of the stereotyped Yamal series, and the Sargasso Sea SST reconstruction (an actual SST proxy) instead of a proxy for Arabian Sea coldwater upwelling. (BTW, this proxy probably got into circulation in Team-world through a graphic by coauthor Overpeck overlaying the Arabian G. bulloides against the HS.) As you can see, the two reconstructions are pretty similar in the 1856-1980 instrumental period, but their medieval-modern relationships are quite different. The correlation for one series is 0.58 and for the other 0.57. So both reconstructions are "99.98% significant" under Juckes’ test, but they are different – and materially different in the MWP.
Left – Juckes CVM; right – CVM with Sargasso Sea and Yakutia/Indigirka instead of Arabian Sea G. bulloides and Yamal.
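To make the mechanics of such a swap concrete, here is a rough Python sketch of a composite-plus-variance-matching calculation applied to two proxy sets that differ in two series. Everything in it is illustrative: the data are random numbers, the year span and the number of proxies are hypothetical, and standardising on the calibration window is my assumption rather than a documented step of the Juckes procedure.

```python
import numpy as np

def cvm(proxies, target_cal, cal):
    """Composite plus variance matching (sketch).

    proxies    : array (n_years, n_proxies)
    target_cal : instrumental target for the calibration years only
    cal        : boolean mask selecting the calibration years
    """
    proxies = np.asarray(proxies, dtype=float)
    target_cal = np.asarray(target_cal, dtype=float)
    # Standardise each proxy on the calibration window (an assumption).
    z = (proxies - proxies[cal].mean(axis=0)) / proxies[cal].std(axis=0)
    composite = z.mean(axis=1)  # simple unweighted average
    # Rescale so the composite matches the target's calibration mean and variance.
    scale = target_cal.std() / composite[cal].std()
    return target_cal.mean() + scale * (composite - composite[cal].mean())

years = np.arange(1000, 1981)   # hypothetical span
cal = years >= 1856             # calibration window 1856-1980
rng = np.random.default_rng(0)
target_cal = rng.standard_normal(cal.sum())             # synthetic target
base = rng.standard_normal((years.size, 6))             # synthetic proxy set
variant = base.copy()
variant[:, :2] = rng.standard_normal((years.size, 2))   # swap in two other series

rec_a = cvm(base, target_cal, cal)
rec_b = cvm(variant, target_cal, cal)
```

By construction both outputs have exactly the target's mean and variance over the calibration years, whatever the proxies do before 1856. Matching in the calibration window therefore says nothing by itself about which pre-instrumental history is right.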
How can they both be 99.98% significant? It doesn’t seem to me that they can. So whatever Juckes has done to benchmark significance, the calculations can’t be right. One of Wegman’s recommendations to climate scientists was that they involve statisticians. Nanne Weber said that Juckes was their statistician; a quick look at Juckes’ publications shows that he has very substantial background and experience in atmospheric turbulence.
How should one establish significance for these reconstructions? Given that two equally plausible reconstructions diverge substantially, I’m not sure that you can. The NAS Panel concluded that you couldn’t establish statistical significance for these things. They also recommended that you use verification period residuals for estimating confidence intervals.
Speaking of which, let’s note that Juckes did NOT reserve a verification period: no RE, no verification r2. In my own experiments with reconstructions similar to the Juckes CVMs (see, for example, my AGU05 presentation), the reconstructions have high calibration r2 but negligible verification r2. Juckes used only an 1856-1980 calibration period. It’s hard to imagine that he didn’t start with a Mannian 1902-1980 calibration period and 1856-1901 verification period. My guess is that his CVMs fail such tests. I’ll take a look at that some time.
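For reference, the two verification statistics mentioned above can be computed along the following lines. This is a sketch under my own assumptions: the function name is made up, the series are random numbers rather than any actual reconstruction, and RE is taken in its usual form, with the calibration-period mean of the observations as the no-skill reference.

```python
import numpy as np

def verification_stats(obs, rec, cal, ver):
    """RE and squared correlation over the verification years.

    obs, rec : observed and reconstructed series on the same years
    cal, ver : boolean masks for calibration and verification years
    """
    obs = np.asarray(obs, dtype=float)
    rec = np.asarray(rec, dtype=float)
    sse = np.sum((obs[ver] - rec[ver]) ** 2)
    # Reference model: always predict the calibration-period mean.
    ref = np.sum((obs[ver] - obs[cal].mean()) ** 2)
    r = np.corrcoef(obs[ver], rec[ver])[0, 1]
    return {"RE": float(1.0 - sse / ref), "r2": float(r ** 2)}

years = np.arange(1856, 1981)
cal = years >= 1902   # Mannian calibration period, 1902-1980
ver = years <= 1901   # verification period, 1856-1901

rng = np.random.default_rng(0)
obs = rng.standard_normal(years.size)        # synthetic "temperature"
rec = obs + rng.standard_normal(years.size)  # synthetic "reconstruction"
print(verification_stats(obs, rec, cal, ver))
```

In a real test the reconstruction would be scaled using the 1902-1980 years only, so that the 1856-1901 values are genuinely out of sample.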