Noise in Multiproxy Studies

Someone asked what the graphs in Noise in Jones 1998 would look like for the other multiproxy studies. I speculated that they would probably look similar. In fact, they vary quite a bit. I’ve done plots for Mann and Jones [2003], Esper et al [2002], Crowley and Lowery [2000], Moberg et al [2005] and MBH99. In some cases, I’ve got accurate proxy data; in other cases, I’ve worked with what I have or with versions that I’ve reconstructed. For amusement, I’ve posted them up without identifying them. You should be able to guess some of them. (I’ll edit in a couple of days and insert labels.)

Caveats: I’ve used the Esper et al [2002] chronologies as I graphed them up a few days ago; I don’t have all the sites and the Tirol site must differ somehow. For Crowley, I have only the smoothed and transformed version. The transformed version is in [0,1] so I did a qnorm transformation, jittering the 0 and 1 values. The MBH99 data used is the proxy roster from the MWP step.

I’ve shown the "studentized G" measure of coherency of annual changes, which is my modification of the Gleichläufigkeit G statistic, as used by Esper and other dendrochronologists as a measure of signal. The G statistic is the larger of the number of tree rings with an increase in width and the number with a decrease in width, divided by the number of samples. In the unstudentized version, 2 out of 2 counts the same as 23 out of 23. As I pointed out here, this statistic is readily "studentized" (or normalized) by defining:

studentized G (n,N) = the proportion of cases in a 50-50 binomial draw of length N which are less informative than n out of N.

For example, if there is a 50-50 split between + and – in a given year, the studentized G would be 0 since you get at least that good a match 100% of the time. If you get 2 out of 2 going the same way, the studentized G is only 0.5, since you get a less "informative" result (1 up, 1 down) 50% of the time. On the other hand, if you have 23 of 23 cores going the same way, the studentized G is nearly 1. This is a trivial modification but obviously a far more sensible measure of signal strength. The bottom chart shows the values for the Polar Urals site, which demonstrates some values in a site where there is some common signal (leaving aside the 11th century issue).
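As a minimal sketch of the definition above (not the code behind the figures), the calculation can be written in Python. The function names raw_g and studentized_g are illustrative. The sketch assumes that "less informative" means the larger of the up/down counts in the random draw is strictly smaller than the observed n, and that cores with no change are left out of N.

```python
from math import comb


def raw_g(n_up, n_down):
    """Unstudentized G: share of cores moving in the majority direction."""
    total = n_up + n_down
    if total == 0:
        raise ValueError("need at least one core")
    return max(n_up, n_down) / total


def studentized_g(n_up, n_down):
    """Proportion of 50-50 binomial draws of length N that are less
    informative than the observed majority count n (assumed reading)."""
    total = n_up + n_down
    if total == 0:
        raise ValueError("need at least one core")
    n = max(n_up, n_down)
    # Count the equally likely up/down outcomes whose majority count
    # is strictly below the observed majority count n.
    less_informative = sum(
        comb(total, k)
        for k in range(total + 1)
        if max(k, total - k) < n
    )
    return less_informative / 2 ** total


if __name__ == "__main__":
    for ups, downs in [(1, 1), (2, 0), (23, 0)]:
        print(ups, downs, raw_g(ups, downs), studentized_g(ups, downs))
```

Under these assumptions, 1 up and 1 down gives 0, 2 of 2 gives 0.5 and 23 of 23 gives 1 - 2^-22, which matches the examples in the text, while the raw G is 1.0 for both 2 of 2 and 23 of 23.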

Figure 1. G and Studentized G for Polar Urals. Studentized G is consistently above 0.67 after the 11th century.

In the graphs below, I’ve put a horizontal line to show a benchmark of 2/3; maybe I should have put it at 0.5, maybe higher.

Study #1

Study #2

Study #3

Study #4

Study #5

An important point in these graphs pertains to the MWP. The Hockey Team theory is that the proxies are accurately measuring regional temperature and that the variability in the MWP proxies is evidence of a highly variable regional pattern, differing in nature from the 20th century warming. This is usually shown through spaghetti graphs showing a mess in the MWP. I’m interested in a couple of 20th century features: (1) how messy it is, given that the proxies have been selected on the basis of being "temperature sensitive" – which may mean little more than a trend; (2) the impact of non-normal distributions.

The studentized G values appear remarkably low for these studies.

More on this another day – for now, which is which?

This entry was written by Stephen McIntyre, posted on Sep 26, 2005 at 8:19 AM, filed under Crowley, Esper et al 2002, Jones et al 1998, Moberg [2005], Multiproxy Studies.

10 Comments

Dave Dardinger

Posted Sep 26, 2005 at 8:30 AM

Well, I don’t know about the rest but surely #4 is MBH99. The Hockey Stick is obvious a mile away.
Steve McIntyre

Posted Sep 26, 2005 at 8:32 AM

Re #1. No. The MBH99 data here are the 14 proxies used in the MWP reconstruction carried through. I should have made that clear (I’m editing now).
TCO

Posted Sep 26, 2005 at 12:40 PM

1. Dat somebedy were? 😉

2. Why do blue envelopes in these series not open up as you go back, as in Jones?

3. Why the level set not at 1.96 (same as Jones)?
Steve McIntyre

Posted Sep 26, 2005 at 1:31 PM

The blue envelopes reflect the square root of the number of series. Jones decreases to only 3 series as you go back, and the variance in the mean of 3 series is bigger than the variance in the mean of, say, 10 series. The other data sets tend to have the same number of series throughout (Mann is stepwise, but I’ve kept the proxy set frozen).
TCO

Posted Sep 26, 2005 at 2:08 PM

What do you get if you consolidate them (either brute force or the independent parts) and then do some exercise to look at how well they correlate to instrumented changes during the instrumented period and then, based on the efficacy of that relationship, extrapolate backwards what happened during pre-instrument days?

Oh..wait…that’s what MBH did. It sounds pretty logical when you phrase it that way…
Steve McIntyre

Posted Sep 26, 2005 at 2:14 PM

No, what MBH did was to look for the series that had a trend in the 20th century and assign them big weights. I’ve drafted a note showing how this ties into a multiple regression of a series with 79 measurements on 22-112 predictors i.e. you can achieve a representation, but no confidence.
TCO

Posted Sep 26, 2005 at 2:21 PM

He looked at the match of overall trends (proxy to instrument) versus the year to year matching? Low freq versus high freq?

Regardless, what would one get from doing it my way?
Steve McIntyre

Posted Sep 26, 2005 at 2:29 PM

The whole multiproxy project, from at least Hughes and Diaz 1994 and Bradley and Jones 1993, has been that they can achieve “high-resolution” results. That’s how they sold tree rings in the first place.

If there is no high-frequency relation between the “proxies” and temperature, then you’re into very precarious calibration for low frequency. You might have as few as 3 degrees of freedom for decadal averages.
TCO

Posted Sep 26, 2005 at 2:31 PM

well…if I buy some of your blabla about autocorrelation and ARMA and such, shouldn’t that explain some of the failure of the high freq comparisons? Maybe looking at low freq is a way to deal with that effect (of course if significance is lowered thereby, need to square up on that). Are you trying to have it both ways? 😉
TCO

Posted Sep 26, 2005 at 3:10 PM

1 is moberg? Isn’t that the one that shows the most variation, least stickishness?

P.s. Where is the comment from someone else’s blog about “mann, you lose” because of him refusing to divulge his algorithm?

Climate Audit