UC inquired about the variance adjustment in Osborn et al (Dendrochronologia 1998), which is used in many Team publications. The number of series in many reconstructions declines as you go back in time. If you take an average of standardized series (the CVM method), the variance over an early time interval will be larger than the variance in a later time period. The BO variance adjustment was originally used in proxy reconstructions, but this procedure or a variant seems to have been introduced into some of the CRU temperature gridcell series as well. The adjustment is described as follows:

Each regional mean thus obtained tended to have greater variance during years when few chronologies were available to contribute to the average; this effect was corrected for by scaling by the square root of the effective number of independent samples available in each year.

First they state:

Let’s assume that one starts with a set of series all standardized to 0 mean and sd 1. Then if Xbar is the average of n series with a mean inter-series correlation rbar,

(1) Var(Xbar) = (1 + (n − 1)·rbar)/n

If the series are uncorrelated (rbar = 0), the variance goes down to 1/n:

(2) Var(Xbar) = 1/n

whereas if the series are perfectly correlated (rbar = 1), the variance stays at 1. They assert:

“an artificial signal will be introduced into the variance of Xbar if the sample size varies through time.”

Comparing (1) and (2), they define the “effective independent sample size” as follows:

(4) n’ = n/(1 + (n − 1)·rbar)

They express (1) as follows:

(5) Var(Xbar) = 1/n’

When one thinks about this, this is a very odd terminology. This is measuring not so much the “independent sample size” as the relative lack of coherence in the sample – but let’s proceed, holding this thought. They make the unsurprising observation: “If rbar is low, variance will increase strongly as n falls below 10.” They observe that rbar is about 0.6 in western U.S. conifers; about 0.3 in eastern U.S. deciduous hardwoods and as low as 0.2 in deciduous European sites; and from 0.28 to 0.71/0.74 for Siberian RW and MXD sites. They illustrate (Fig 2a) the average of 8 sites in southern Europe where variance increases pre-1750 as n decreases (rbar is only 0.07).
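The effect of equation (4) on these reported rbar values can be checked numerically; a minimal sketch (the sample sizes here are chosen for illustration, not taken from the paper):

```python
def n_eff(n, rbar):
    """Effective independent sample size from equation (4): n' = n / (1 + (n-1)*rbar)."""
    return n / (1 + (n - 1) * rbar)

# Western US conifers (rbar ~ 0.6): even 50 cores give under 2 effective samples
print(round(n_eff(50, 0.6), 3))   # 1.645
# A low-coherence network like their Fig 2a example (8 sites, rbar = 0.07)
print(round(n_eff(8, 0.07), 3))   # 5.369
```

Note that n’ saturates quickly when rbar is high: adding more cores at a coherent site buys almost no additional “independence” under this definition.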

They go on to say:

The method presented here is theoretically based …Equation 4 provides the time-dependent effective sample size if supplied with the time-dependent available sample size. We would then expect the variance of the mean timeseries to vary according to equation (5). If we adjust the mean timeseries by

(6) Y = Xbar · sqrt(n’)

then we would expect the variance Var(Y) to be independent of sample size (but would still have any real variance signals that are present in the data)”

They go on to discuss a couple of variations, where rbar varies with time, but the idea is the same. Briffa and Osborn do not provide any third-party statistical references for this procedure.
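As a check on equations (5) and (6), one can simulate equicorrelated unit-variance series and apply the scaling; this is my own sketch of the arithmetic, not the authors’ code:

```python
import numpy as np

rng = np.random.default_rng(0)
rbar, T = 0.3, 200_000   # mean inter-series correlation; series length

def mean_series(n):
    # n equicorrelated unit-variance series: shared component + idiosyncratic noise
    common = rng.standard_normal(T) * np.sqrt(rbar)
    idio = rng.standard_normal((n, T)) * np.sqrt(1 - rbar)
    return (common + idio).mean(axis=0)

for n in (2, 5, 10):
    xbar = mean_series(n)
    n_eff = n / (1 + (n - 1) * rbar)   # equation (4)
    y = xbar * np.sqrt(n_eff)          # equation (6)
    print(n, round(xbar.var(), 2), round(y.var(), 2))
```

The raw variances track (1 + (n − 1)·rbar)/n, i.e. they inflate as n falls, while the scaled variances all collapse to roughly 1 regardless of n.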

Here is an R function to implement the BO adjustment. rbar0 can be a time series or a constant, and count is the number of series available in each year.

#rbar0 is a vector of length of the total series, calculated in various ways externally
#count is the number of series available in each year
bo.adjust <- function(js.mean, rbar0, count) {
  count.eff <- count/(1 + (count - 1)*rbar0)
  NN <- max(count, na.rm = TRUE)
  count.eff.max <- NN/(1 + (NN - 1)*rbar0)
  var.adj <- sqrt(count.eff/count.eff.max)  #equation 7 of Osborn et al.
  js.mean * var.adj
}
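As a cross-check on the scaling factor, here is the same computation sketched in Python (my own translation of the R function above, not the archived script; the example values are invented):

```python
import numpy as np

def bo_adjust(js_mean, rbar0, count):
    """Scale a mean series by sqrt(n'_t / n'_max) -- equation (7) of Osborn et al."""
    js_mean = np.asarray(js_mean, dtype=float)
    count = np.asarray(count, dtype=float)
    count_eff = count / (1 + (count - 1) * rbar0)   # effective n per year
    nn = np.nanmax(count)
    count_eff_max = nn / (1 + (nn - 1) * rbar0)
    var_adj = np.sqrt(count_eff / count_eff_max)    # < 1 in thin years
    return js_mean * var_adj

# years with only 2 series are shrunk relative to the full-sample (8-series) years
adj = bo_adjust([1.0, 1.0, 1.0], rbar0=0.3, count=[2, 4, 8])
print(np.round(adj, 3))   # roughly [0.772, 0.903, 1.0]
```

The adjustment is multiplicative on the mean itself, so any genuine excursion in a sparse early period is scaled down along with the noise.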

I’ve included a script here illustrating the use of this method in attempting to replicate the archived version of Jones et al 1998. I can more or less replicate a smoothed version of the archived reconstruction, but the difference between my attempt to replicate Jones et al 1998 and the archived version can be up to 0.5 deg C in individual years. (And we’re told that these reconstructions are accurate to within a couple of tenths of a degree or so.)

Top – comparison of emulation to archived as smoothed; bottom – difference between emulation and archived version.

As to the Briffa-Osborn adjustment itself, if you have series with relatively little inter-series correlation, one expects the variance of the mean to increase as the sample size falls, simply from the behavior of Var(Xbar) in equation (1). Does the Briffa-Osborn adjustment do anything other than disguise this? I think that someone on the Team needs to prove the validity of the methodology statistically. Of course no one on the Team bothers. They just advocate a recipe and then assert it.

## 16 Comments

I assume that in the formula for calculation of effective n there should be a second close paren after “rbar”?

The link to /scripts/proxy/briffa,osborn.adjustment.txt does not work: spot the comma where there should be a dot

Steve,

This is the best blog on the net. But why is MM03 under links and not articles?

Thanks Steve! Let’s see, I have a simple question.

I don’t have the original Osborn paper, but Frank et al seems to explain the method sufficiently:

IOW, a measurement contains a wanted part (signal) and an unwanted part (noise). The noise term can be reduced by averaging (I wouldn’t use the term ‘eliminate’ here). Measurements share a common signal, so averaging does not affect the signal part.

This is obvious. If we use only one tree, the variance of the measurement x = s + e over time is

Var(x) = Var(s) + Var(e),

assuming uncorrelated s and e. If there are more trees, we take the average. Averaging doesn’t affect the signal part, but it reduces the power of the noise. The efficiency of this reduction depends on how correlated the noise term is between the trees. For a given year, the average is

Xbar = s + (1/n)·sum(e_i)

The expectation value is (given s)

E[Xbar | s] = s

Looks good to me. If you scale Y, you’ll obtain a biased estimate of s, right? Now, where do I go wrong?
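The bias UC describes can be checked by simulation; a sketch under an equicorrelated signal-plus-noise model (my own construction, with an invented signal s = 1):

```python
import numpy as np

rng = np.random.default_rng(1)
s, rbar, trials = 1.0, 0.3, 200_000

def scaled_mean(n):
    # n trees sharing signal s, with noise equicorrelated at mean correlation rbar
    common = rng.standard_normal((trials, 1)) * np.sqrt(rbar)
    idio = rng.standard_normal((trials, n)) * np.sqrt(1 - rbar)
    xbar = (s + common + idio).mean(axis=1)
    n_eff = n / (1 + (n - 1) * rbar)
    return (xbar * np.sqrt(n_eff)).mean()  # E[Y] = s*sqrt(n'), not s

# the scaled mean overshoots the true signal s = 1, by a factor that
# differs with sample size
print(scaled_mean(2), scaled_mean(10))
```

Since the expectation of the scaled series is s·sqrt(n’), the amplitude of the recovered “signal” depends on how many trees happened to be available, which is exactly the distortion being debated here.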

I am not sure I buy the independence of signal (s) and noise (n), where the signal is a pure temperature signal. If y = f(t, p, tp, x), where t is the temperature signal, p the precipitation, tp an interaction term, and x is all other factors including random noise, then if s = f(t) and n = f(p, tp, x), s and n can clearly be correlated. Can you really separate strong interaction effects between t and any other factors, with p being the obvious one, by this approach? Does this make sense? For example, and perhaps simple-mindedly, if a given ring width is produced by average temperatures and average precipitation or by above-average temperatures and above-average precipitation, how do you separate temperature and precipitation? But I assume that this is so obvious a point that dendrochronologists must have addressed it, would they not? It sounds like, for example, they choose sites based on some assumptions that attempt to control for other factors like precipitation – but frankly the logic of assuming constant that which inherently fluctuates is very puzzling. This is, I assume, part of the argument for up-to-date records so that this assumption of independence can be tested.

Steve,

Am I reading this correctly: “…this effect was corrected for by scaling by the square root of the effective number of independent samples available in each year”?

If one assumes that each sample contains a signal plus noise, doesn’t different scaling for different years distort the signal? After averaging chronologies of different length, the result must be the signal plus different amount of noise for different time periods, depending on the number of effective samples in each time period. Isn’t the effect of the correction that you lower the signal amplitude for periods where you have less data!? Instead of increasing the error bars!!

I think Martin is right: the difference in variance is due to different amounts of noise cancellation, correct? If so, you just have to live with the higher noise when you have fewer samples, and represent this as larger error bars. Scaling will affect both the signal and the noise, thus masking the potential signal in the periods where there are fewer samples. I don’t see how this can possibly be justified. Why is this procedure being used repeatedly if it hasn’t been shown to be a valid statistical technique?
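The error-bar alternative can be made concrete; a sketch under an equicorrelated unit-variance noise model (my construction, not from any of the papers): the standard error of the unadjusted yearly mean is 1/sqrt(n’_t), so thin early networks simply get wider bars rather than a rescaled mean.

```python
import numpy as np

def mean_se(count, rbar):
    """Standard error of the yearly mean under equicorrelated unit-variance noise:
    Var(xbar_t) = 1/n'_t, so the error bar widens as the sample thins out."""
    count = np.asarray(count, dtype=float)
    n_eff = count / (1 + (count - 1) * rbar)
    return 1 / np.sqrt(n_eff)

# error bars for a network thinning from 8 series back to 2
print(np.round(mean_se([8, 4, 2], rbar=0.3), 3))   # roughly [0.622, 0.689, 0.806]
```

This leaves the mean itself untouched; only the stated uncertainty changes with the sample size.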

#6

Yes. This adjustment leads to a biased estimate.

Yes. In the past, we have sparser data. Past variations will be scaled towards zero. Increasing the error bars is not a legal move in climate science. Those bars might reach the current temperature levels, and that won’t do.

#7

Because it makes the results look nice.

Ok, so that means that they are doing the opposite to what I said earlier, in fact amplifying the mean where you have fewer effective samples.

What I don’t understand is how this can cause “…the variance Var(Y) to be independent of sample size” if the scaling multiplies the signal component as well as the noise.

Whichever way you cut it, the error term is correlated with one or more of the independent variables and you have a big problem.

UC, I posted on this topic a long time ago here http://www.climateaudit.org/?p=418

#5:

Amazingly, I don’t think the dendrochronologists HAVE addressed this fundamental and extremely important issue. They seem to avoid the question like the plague. Tree rings are often very good proxies for moisture. I don’t think they are generally valuable as “thermometers” for many reasons.

Steve, Martin

Frank et al:

Seems that Eq (6) is not correct, X and Y mixed (??)

Steve wrote:

Because Briffa and Osborn have never heard of filtering theory (specifically, the problem of estimating the state of a stochastic dynamical system from noisy observations), they decided to go the easy way and just scale the observations so that the result looks good.

RE: #12 – I would however concede that some species in Marine West Coast climates, and in the wetter coastal portions at the northern margins of Mediterranean climates, may be local temperature proxies. But how many such places are there on earth, and how few of the overall claimed set of global tree ring proxies are actually found in such places?

This would have made sense:

A site mean with fewer effective samples might be expected to have a lower (temperature) signal-to-noise ratio. Then the signal part has a lower amplitude after normalization than in a mean from a site with more effective samples. If all site means were adjusted to compensate for this before the total mean is calculated, the signal part in the total mean would be independent of which sites are included at a specific time.

How this compensation should look I don’t know, but multiplying with n’ is probably in the right direction.

But this compensation should be done uniformly over time and differently on different site means before calculating the total mean. I can’t see that this is what they do.

A correction to my previous post.

I wrote that multiplying by n’ is probably in the right direction. I meant that dividing by the square root of n’ is probably in the right direction.