The first step in the J98 procedure is to standardize all series on a 1901-1950 base period and then take an average. I’m not a big fan of short-period standard deviations (not just me, but see also Trenberth). All of these series are at least 300 years long, so there’s no need for 50-year standard deviations. Nonetheless, for replication purposes, this was done. Visually, the two versions seem to match pretty much exactly.
Figure 1. Left – Figure 2 from Jones et al.; right – emulation using the reconstructed data set.
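For concreteness, here is a minimal sketch (in Python) of the standardize-and-average step as I've described it. The only detail taken from J98 is the 1901-1950 reference window; the function and variable names, and the treatment of missing proxies, are my own framing.

```python
import numpy as np

def standardize_and_average(proxies, years, ref_start=1901, ref_end=1950):
    """Standardize each proxy on a reference period, then take a simple average.

    proxies : (n_years, n_proxies) array, NaN where a proxy is absent
    years   : (n_years,) array of calendar years matching the rows of `proxies`
    """
    ref = (years >= ref_start) & (years <= ref_end)
    mu = np.nanmean(proxies[ref], axis=0)            # 1901-1950 mean per proxy
    sd = np.nanstd(proxies[ref], axis=0, ddof=1)     # 1901-1950 s.d. per proxy
    z = (proxies - mu) / sd
    return np.nanmean(z, axis=1)                     # unweighted average of whatever is present
```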
However, when one compares this simple average to the archived values (WDCP/jones98 archive), the details differ: the variance in the early portion of the simple average is noticeably greater.
Figure 2. Difference between simple average and archived version (WDCP/jones98). Top – smoothed versions; bottom – difference.
The differences are, in part, due to a variance re-scaling procedure based on Briffa and Osborn [Dendrochronologia, 1999]. I have tried all sorts of ways to replicate these calculations, without any luck. I requested clarification from Jones, but was refused.
One of the problems with reconstructions in which there are varying numbers of proxies and a weak "signal" is that the variance of the reconstruction is more volatile in periods with fewer proxies, even though there is no intrinsic change in variability. This is mostly because of the elementary statistical fact that the variance of an average declines with the number of samples. For example, consider a situation in which there is no signal at all, merely noise. If you take the average of N series of noise, each with variance σ^2, then the variance of the average is σ^2/N. The principal reason for the increase in variance in the early portion of the J98 reconstruction is simply that N has gone down from 10 to 4 (and even 3).
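A quick simulation with pure white noise (nothing to do with the actual proxies) illustrates the point; the counts 10, 4 and 3 are the proxy counts mentioned above, everything else is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n_years = 1.0, 100_000

# variance of the average of n independent noise series tracks sigma^2 / n
for n in (10, 4, 3):
    avg = rng.normal(0.0, sigma, size=(n_years, n)).mean(axis=1)
    print(n, round(avg.var(ddof=1), 4), round(sigma**2 / n, 4))
```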
Because there is little reason to believe that the annual variance in the early period was substantially greater than at present, Briffa and Osborn proposed a variance adjustment methodology (applied here) as follows. They calculate rbar – the average interseries correlation. They assert that the "effective" number n’ of independent series obtained from n series with average intercorrelation rbar is given by:
n’ = n/(1+(n-1)*rbar)
They then calculate the "effective" number of independent series based on the average intercorrelation rbar when the maximum number of series (presumably 10 here for the NH) is present, in order to get an adjustment factor. The raw average is then deflated by the adjustment factor. The methodology seems a bit weird to me. I find it very off-putting when dendrochronologists/paleoclimatologists use their own little statistical recipes and tweaks, the properties of which have not been verified by proper statisticians. Later, Jones purports to calculate confidence intervals based on other statistical methods. But what does this little tweak do to the confidence interval calculations? They don’t discuss the matter; it must do something.
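For what it’s worth, here is my reading of the constant-rbar recipe in code. The square-root scaling is my own interpretation of "deflated by the adjustment factor", and, as discussed below, it does not reproduce the archived series exactly.

```python
import numpy as np

def bo99_adjust(raw_avg, counts, rbar, n_max=10):
    """One reading of the Briffa-Osborn variance adjustment (constant rbar).

    raw_avg : raw (unadjusted) mean series
    counts  : number of proxies contributing in each year
    rbar    : assumed-constant average interseries correlation
    """
    n_eff = counts / (1.0 + (counts - 1.0) * rbar)      # n' for each year
    n_eff_max = n_max / (1.0 + (n_max - 1.0) * rbar)    # n' at full coverage
    # scale each year so the variance of the mean matches the full-coverage case
    return raw_avg * np.sqrt(n_eff / n_eff_max)
```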
I’m also not sure what happens to the above calculation when rbar approaches 0, as it does here. It’s possible that there is some rounding in the above ad hoc formula and that second-order terms may be required. In the Jones NH dataset, the rbar for the period 1000-1700 is 0.014, and for the period 1900-1991 (for the non-instrumental series) it is only 0.019. These are extraordinarily low values; I haven’t checked whether such values are significant relative to red noise. The rbar for 1800-1990 is 0.13. This is low in signal terms, but the difference between the rbar in this period and the rbar over the entire period looks quite material. The difference in rbar certainly suggests to me a serious risk that the series have been cherry-picked for a difference between 19th and 20th century means – why else would there be a difference in rbar?
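The rbar values quoted above come from a calculation of roughly this form (again my own framing, not necessarily exactly how Jones did it):

```python
import numpy as np

def rbar(proxies, years, start, end):
    """Average pairwise correlation over a window, using proxies with full coverage."""
    window = proxies[(years >= start) & (years <= end)]
    full = np.isfinite(window).all(axis=0)            # proxies present throughout the window
    r = np.corrcoef(window[:, full], rowvar=False)    # pairwise correlation matrix
    upper = r[np.triu_indices_from(r, k=1)]           # off-diagonal entries only
    return upper.mean()

# e.g. rbar(proxies, years, 1000, 1700) or rbar(proxies, years, 1800, 1990)
```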
Briffa and Osborn set out several different recipes for applying the variance adjustment – constant rbar; time-varying rbar. I’ve experimented with different variations without success: the observed variance reduction in J98 is greater than I’ve been able to replicate. From a strictly replication point of view, in the period 1659-1974 all 10 series are present throughout, so one would expect the "variance adjustment" procedure not to affect the average calculation. However, even this cannot be replicated exactly. The correlation is very close (>0.99), but there is still a maximum absolute difference of over 0.16 in 1740. I can’t figure out why even a simple average can’t be replicated and would welcome any suggestions. If the Jasper version is different, maybe that would account for it. (This difference, when converted to deg C, would be about 1 standard error.)
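The comparison statistics quoted here and below (correlation, maximum absolute difference, and the year in which it occurs) are of this general form:

```python
import numpy as np

def compare(emulation, archived, years):
    """Summary statistics for comparing an emulation to the archived series."""
    diff = emulation - archived
    i = int(np.argmax(np.abs(diff)))
    return {
        "correlation": np.corrcoef(emulation, archived)[0, 1],
        "max_abs_diff": float(np.abs(diff).max()),
        "year_of_max": int(years[i]),
    }
```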
The next figure shows an implementation of the variance re-scaling step with a constant rbar. The replication is closer, but still not exact; the maximum absolute difference amounts to over 0.46 in 1176. It looks like the variance is being deflated more in the archived version than in my emulation of their variance re-scaling. I’ve experimented with changing the variance re-scaling depending on the series represented (also consistent with Briffa and Osborn, 1999), but this doesn’t exactly work either.
Figure 3. Comparison of archived NH J98 series to emulation using the Briffa-Osborn variance adjustment (constant rbar).
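For completeness, here is one guess at the variant in which the re-scaling depends on the series represented: rbar is recomputed each year from whichever proxies are present. This is my own construction from the Briffa-Osborn description, not a statement of what J98 actually did, and it still does not match the archive.

```python
import numpy as np

def bo99_adjust_varying(raw_avg, proxies, n_max=10):
    """A guess at the 'time-varying' recipe: rbar recomputed from the proxies
    present in each year, rather than held constant."""
    # pairwise correlations computed once, ignoring missing values
    corr = np.ma.corrcoef(np.ma.masked_invalid(proxies), rowvar=False)
    adjusted = raw_avg.copy()
    for t in range(len(raw_avg)):
        present = np.where(np.isfinite(proxies[t]))[0]
        n_t = len(present)
        if n_t < 2:
            continue
        sub = corr[np.ix_(present, present)]
        rbar_t = sub[np.triu_indices(n_t, k=1)].mean()
        n_eff = n_t / (1.0 + (n_t - 1.0) * rbar_t)
        n_eff_max = n_max / (1.0 + (n_max - 1.0) * rbar_t)
        adjusted[t] = raw_avg[t] * np.sqrt(n_eff / n_eff_max)
    return adjusted
```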
As with MBH98, the method can be replicated in its major features, but the details remain imponderable to me (and I’ve tried hard, and I’m pretty good at this sort of thing). For further analysis, I will apply the best replication that I’ve been able to develop.