The first step in the J98 procedure is to standardize all series on the 1901-1950 period and then take an average. I’m not a big fan of short-period standard deviations (nor am I alone; see also Trenberth [1984]). All of these series are at least 300 years long, so there’s no need for 50-year standard deviations. Nonetheless, for replication purposes, this was done. Visually, the results seem to match almost exactly.
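A minimal sketch of this first step, assuming the proxies sit in a NumPy array with one column per series and NaN wherever a series has no data. The function name `standardize_and_average` is hypothetical. J98 does not say whether the sample or population standard deviation was used, so `ddof=1` is an assumption.

```python
import numpy as np

def standardize_and_average(years, proxies, base=(1901, 1950)):
    """Standardize each series on a base period, then average across series.

    years   : 1-D array of calendar years, length T
    proxies : 2-D array, shape (T, n_series), NaN where a series is absent
    base    : inclusive (start, end) years of the standardization period
    """
    years = np.asarray(years)
    proxies = np.asarray(proxies, dtype=float)
    in_base = (years >= base[0]) & (years <= base[1])
    # Mean and sample standard deviation of each series over the base period only
    mu = np.nanmean(proxies[in_base], axis=0)
    sd = np.nanstd(proxies[in_base], axis=0, ddof=1)
    z = (proxies - mu) / sd
    # Simple unweighted average of whichever series are present in each year
    return np.nanmean(z, axis=1)
```

Because each series is scaled by its own 1901-1950 standard deviation, a series whose 50-year base happens to be unusually quiet is inflated over its entire length; that is the practical objection to a short base period when 300+ years are available.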

Figure 1. Left – Figure 2 from Jones et al [1998]; right – emulation using reconstructed data set.

However, when one compares this simple average to the archived values (WDCP/jones98 archive), the details differ: the variance in the early portion of the simple average is observably a bit greater.

Figure 2. Difference between simple average and archived version (WDCP/jones98). Top – smoothed versions; bottom – difference.

The differences are due, in part, to a variance re-scaling procedure based on Briffa and Osborn [Dendrochronologia, 1999]. I have tried all sorts of ways to replicate these calculations without any luck. I requested clarification from Jones, but was refused.

One of the problems with reconstructions that have varying numbers of proxies and a weak "signal" is that the variance of the reconstruction ends up being more volatile in periods with fewer proxies, even though there is no intrinsic change in variability. This is mostly because of the elementary statistical fact that the variance of an average declines with the number of samples. For example, consider a situation in which there is no signal at all, merely noise. When you take the average of N independent series of noise (let’s assume that they all have variance σ^2), the variance of the average is σ^2/N. The principal reason for the increase in variance in the early portion of the J98 reconstruction is simply that N has gone down from 10 to 4 (and even 3).
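A quick Monte Carlo check of the σ^2/N point, assuming pure, independent white noise with no signal at all (σ = 1 by default). The function name and the number of simulated years are arbitrary choices for illustration.

```python
import numpy as np

def variance_of_average(n_series, sigma=1.0, n_years=20000, seed=0):
    """Empirical variance of the average of n independent white-noise series."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_years, n_series))
    # Average across series in each "year", then take the variance over years
    return noise.mean(axis=1).var()

for n in (10, 4, 3):
    # Simulated variance next to the theoretical value sigma^2 / n
    print(n, round(variance_of_average(n), 3), round(1.0 / n, 3))
```

Going from 10 series to 4 multiplies the variance of the average by 10/4 = 2.5, i.e. the standard deviation by about 1.58, with no change whatsoever in the underlying noise.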

Because there is little reason to believe that the annual variance in the early period was substantially greater than at present, Briffa and Osborn [1999] proposed a variance adjustment methodology (applied here) as follows. They calculate rbar – the average interseries correlation. They assert that the "effective" number n’ of independent series from n series with average intercorrelation of rbar is given by:

n’ = n/(1+(n-1)*rbar)
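Where the formula comes from, assuming the n series have been standardized to a common variance σ^2, with r_ij the correlation between series i and j and rbar (written \bar{r}) the mean of the n(n-1) off-diagonal r_ij:

$$
\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
= \frac{\sigma^2}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} r_{ij}
= \frac{\sigma^2}{n^2}\left[n + n(n-1)\,\bar{r}\right]
= \frac{\sigma^2\left[1 + (n-1)\,\bar{r}\right]}{n}
= \frac{\sigma^2}{n'},
\qquad
n' = \frac{n}{1 + (n-1)\,\bar{r}}
$$

So n' is the number of independent series that would give the average the same variance. At rbar = 0 it reduces to n, and at rbar = 1 to 1. The equal-variance assumption is only guaranteed over the standardization period itself.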

They then calculate the "effective" number of independent series based on the average intercorrelation rbar when the maximum number of series (presumably 10 here for the NH) is present, in order to get an adjustment factor. The raw average is then deflated by the adjustment factor. The methodology seems a bit weird to me. I find it very off-putting when dendrochronologists/paleoclimatologists use their own little statistical recipes and tweaks whose properties have not been verified by proper statisticians. Later, Jones purports to calculate confidence intervals based on other statistical methods. But what does this little tweak do to the confidence interval calculations? They don’t discuss the matter, but it must do something.
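A sketch of the constant-rbar adjustment as I read the description above. The exact factor J98 applied is not documented, so the square-root form is an assumption: since the variance of an average of n standardized series with mean intercorrelation rbar is σ^2/n', multiplying the average in a year with n(t) series by sqrt(n'(t)/n'_max) brings its expected variance down to that of the full network. Function names are hypothetical; passing an array for rbar gives a time-varying-rbar variant.

```python
import numpy as np

def effective_n(n, rbar):
    """Effective number of independent series: n' = n / (1 + (n - 1) * rbar)."""
    n = np.asarray(n, dtype=float)
    return n / (1.0 + (n - 1.0) * rbar)

def variance_adjust(raw_mean, n_present, rbar, n_max=10):
    """Deflate a raw average by sqrt(n'(t) / n'_max)  (assumed form).

    raw_mean  : average of the standardized series in each year
    n_present : number of series present in each year
    rbar      : mean inter-series correlation; a scalar for the constant-rbar
                recipe, or an array of the same length for a time-varying one
    n_max     : number of series in the complete network (assumed 10 here)
    """
    n_eff = effective_n(n_present, rbar)
    n_eff_max = effective_n(n_max, rbar)
    # For 0 <= rbar < 1, n' increases with n, so the factor is <= 1 when n_present <= n_max
    factor = np.sqrt(n_eff / n_eff_max)
    return np.asarray(raw_mean, dtype=float) * factor
```

With rbar = 0.014, a year with 4 series is scaled by about 0.66, barely different from sqrt(4/10) ≈ 0.63 for pure noise; at rbar values this low the recipe is close to a plain sqrt(n/n_max) rescaling.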

I’m also not sure what happens to the above calculation when rbar approaches 0, as it does here. It’s possible that there is some rounding in the above ad hoc formula and that second-order effects may be required. In the Jones NH dataset, for the period 1000-1700 the rbar is 0.014, and for the period 1900-1991 (non-instrumental series only) it is only 0.019. These are extraordinarily low values; I haven’t checked whether such values are significant relative to red noise. The rbar for 1800-1990 is 0.13. This is low in signal terms, but the difference between the rbar in this period and the rbar over the entire period looks quite material. The difference in rbar certainly suggests to me a serious risk that the series have been cherry-picked for a difference between 19th- and 20th-century means – why else would there be a difference in rbar?
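A minimal sketch of computing rbar as the mean of the off-diagonal inter-series correlations, and of one way a red-noise check could be run: simulate independent AR(1) series and see how often their rbar reaches the observed value. The function names, the AR(1) null and the lag-1 coefficient are assumptions for illustration; nothing here has been run on the J98 data.

```python
import numpy as np

def mean_intercorrelation(X):
    """rbar: mean of the off-diagonal correlations between the columns of X."""
    C = np.corrcoef(X, rowvar=False)
    n = C.shape[0]
    return (C.sum() - np.trace(C)) / (n * (n - 1))

def ar1(n_years, n_series, phi, rng):
    """Independent AR(1) ("red noise") series with lag-1 coefficient phi."""
    x = np.empty((n_years, n_series))
    # Draw the first value from the stationary distribution, variance 1 / (1 - phi^2)
    x[0] = rng.standard_normal(n_series) / np.sqrt(1.0 - phi**2)
    for t in range(1, n_years):
        x[t] = phi * x[t - 1] + rng.standard_normal(n_series)
    return x

def red_noise_rbar(n_years, n_series, phi, n_sims=1000, seed=0):
    """Simulated null distribution of rbar for independent red-noise series."""
    rng = np.random.default_rng(seed)
    return np.array([mean_intercorrelation(ar1(n_years, n_series, phi, rng))
                     for _ in range(n_sims)])
```

For example, `null = red_noise_rbar(700, n_series, phi)` followed by `(null >= observed_rbar).mean()` gives a one-sided Monte Carlo p-value, where `n_series`, `phi` (ideally estimated from the proxies’ own lag-1 autocorrelations) and `observed_rbar` are placeholders to fill in.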

Briffa and Osborn [1999] set out several different recipes for applying variance adjustment: constant rbar and time-varying rbar. I’ve experimented with different variations without success. The observed variance reduction in J98 is greater than I’ve been able to replicate. From a strict replication point of view, all 10 series are present throughout the period 1659-1974, so one would expect the "variance adjustment" procedure not to affect the average in that period. However, even this cannot be replicated exactly. The correlation is very high (>0.99), but that still permits a maximum absolute difference of over 0.16 in 1740. I can’t figure out why even a simple average can’t be replicated and would welcome any suggestions. If the Jasper version is different, maybe that would account for it. This difference, converted to deg C, would be about 1 standard error.
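To put numbers on "close but not exact", it helps to report both the correlation and the largest absolute difference, because a high correlation says little about the worst single year. A minimal sketch with a hypothetical function name, using synthetic data (not the J98 series) for the illustration.

```python
import numpy as np

def compare_series(years, ours, archived):
    """Correlation, maximum absolute difference, and the year in which it occurs."""
    ours = np.asarray(ours, dtype=float)
    archived = np.asarray(archived, dtype=float)
    r = np.corrcoef(ours, archived)[0, 1]
    diff = np.abs(ours - archived)
    i = int(np.argmax(diff))
    return r, diff[i], int(years[i])

# Synthetic illustration: two series identical except in one year
# still correlate at better than 0.99.
years = np.arange(1659, 1975)
rng = np.random.default_rng(0)
a = rng.standard_normal(years.size)
b = a.copy()
b[years == 1740] += 0.5
print(compare_series(years, a, b))
```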

The next figure shows an implementation of the variance re-scaling step with a constant rbar. The replication is closer, but still not exact. The maximum absolute difference amounts to over 0.46 in 1176. It looks like the variance is being deflated more in the archived version than according to my emulation of their variance re-scaling. I’ve experimented with changing the variance re-scaling depending on the series represented (also consistent with Briffa and Osborn, 1999) but this doesn’t exactly work either.

Figure 3. Comparison of Archived NH J98 series to emulation using Briffa-Osborn [1999] variance adjustment (constant rbar).

As with MBH98, the method can be replicated in its major features, but the details remain imponderable to me (and I’ve tried hard, and I’m pretty good at this sort of stuff). For further analysis, I will use the best replication that I’ve been able to develop.

## 10 Comments

A. What is the mathematical equivalent of “close match visually”? Did you do a subtraction for this case?

B. Not sure what rbar is.

C. It seems (intuitive hunch here, since I don’t follow all the math/background) that J is confusing natural variability with measurement error. For instance, as the Kerry/Bush campaign winds down, there is some variation (swing) in the electorate’s choice. But there are also poll errors, mostly related to how expensive a survey you do. So if you plot a graph over time and some periods were not well surveyed (few people polled, or by analogy few proxies), there is large measurement error, even though actual variability has not changed. But that does not mean that you can wave a wand and re-constrain the variability. It’s just that some of the variability is instrument error.

D. It seems from the comment about “number of independent series” that they anticipated point C. But it’s not clear to me what they are doing, or why they don’t address the number of independent series all along. Or why their construction is not understood by Box, Hunter, Hunter or by some pure stats guy.

E. But conversely, if they really do have something, it ought to be published and pushed to other fields that use statistics. Not just the tree-holer dungeon.

F. Wait, if there is 0 interseries correlation, then the series are completely independent and Nprime reduces to n. That makes sense, no?

G. BTW, it seems that what they are doing would be to reduce the number of effective independent series, no? That’s the direction you should go if some series are duplicates, no? But how does that play into the transform of variability? In your initial discussion, variability is decreased with added sampling, not because of any math done but because that is how sampling works. What does this have to do with some overt adjustment? If anything, they would want to blow up the variability if there is duplication of series but not duplication of all independent series, no? Since there would be false agreement?

H. Also, how do they tell the difference between series that correlate because they are both good measurements and ones that have unwanted duplication?

I. Not clear to me how to tell what a significantly low rbar is. What would you expect from white noise series? And if the rbar is low, does that mean that some of the proxies must be bad (since they are not correlated to each other, how can they be correlated to the truth)? Same issue as H.

J. “The difference in rbar certainly suggests to me a serious risk that the series have been cherry-picked for a difference between 19th and 20th century means – why else would there be a difference in rbar?” Interesting. Have you found a cherrypick detector? ;-)

K. On the last figure, this goes to a couple points:

a. It’s not possible to replicate the published reports because the methodology is not spelled out. (Imagine if someone published a math theorem and left out steps… I think this work, which is fundamentally statistical, should include all the details of the math.)

b. It’s likely that, since they don’t have to share all their details, there is some kludging and fudging. Not always of a biased nature; sometimes it’s more a matter of sloppily changing methods halfway through. I’ve seen this tendency in work that is not well reviewed, for instance patent examples (which can be very cherry-picky).

P.S. Impressed that someone actually posted to this?

TCO,

It seems that the more technically difficult a posting is, the fewer comments it gets. Which means someone here could use the comment # as a proxy for technical difficulty and rate Steve’s posts for degree of difficulty. OTOH, maybe it’s all just red noise.

They can be a bit dense at times, too. It still just seems weird that there are so many damn different analyses and so few papers.

Haven’t you done any actual science, TCO? There are always tons of notebooks with preliminary experiments, then the production of chemicals or lemmas or unique instruments. Then there’s the actual data gathering, analysis, redoing parts of it, producing intermediate results to present in a department seminar and then finally the production of actual papers on the subject. Steve’s doing all this but we just get to be one of the notebooks. So enjoy your paper’s-eye view.

It’s because I have done science, published several papers in a new little micro-field, and plotted my little campaign for how I would do so (and had it change, but a war plan is still useful even if it changes during the war) that I know from experience and gut feeling that lots of these analyses should be papers.

Yeah, it’s nice that you read it. As Dave Dardinger says, these are my notebooks on the air. It’s actually a little handy sometimes to write these up a little more formally, so I don’t lose track of them. Part of the problem with trying to piece data and methods together is that you have to put stuff down when you get stuck and then come back if you get a little more info. It’s very inefficient, but there’s not much choice when you’re dealing with the Hockey Team.

As to your questions, I’ll only pick up on a couple now.

The subtraction is in the second figure.

rbar is the mean interseries correlation. If all the “proxies” are essentially orthogonal, you really begin to wonder how good they are as “proxies.”

I really dislike these little statistical fixes by tree ringers quoting one another. If it ain’t in a text by a real statistician who knows about confidence intervals, don’t use it. Guys like Cook, Briffa and Osborn should give up trying to be theoretical statisticians.

My cherrypick detector is this: if you’ve got covariance in the calibration period but not in the historical period, this suggests to me that you’ve cherrypicked series with a common trend.

ahem…

Yes, TCO? You have permission to speak. -gd&rbTsmaibowtp-

I want more of Steve’s attention…

Waaaaah!