## Willis on Hegerl

Willis writes: A couple of things.

First, I’ve digitized all of the Hegerl proxy data, and placed it here. I sampled it at ~three year intervals, and interpolated the actual years.

Second, I took a look at their reconstruction method. They say:

The first step of the reconstruction technique is to scale
the individual proxy records to unit standard deviation, weigh them by their correlation
with decadal NH 30-90°N temperature (land or land and ocean, depending on the target
of reconstruction) during the period 1880 to 1960, and then average them.
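The quoted procedure (scale each proxy to unit standard deviation, weight by correlation with the calibration target, then average) is simple enough to sketch. A minimal, hypothetical Python version — the function name, array shapes, and data are my own assumptions, not from the paper:

```python
import numpy as np

def ch_blend(proxies, target):
    """Sketch of the weighting step quoted above: scale each proxy to unit
    standard deviation, weight it by its correlation with the decadal NH
    temperature target over the calibration period, then average.
    `proxies` is a 2-D array (n_proxies x n_decades); `target` is the
    matching decadal temperature series.  Illustrative only."""
    weights = []
    scaled = []
    for p in proxies:
        p_std = (p - p.mean()) / p.std(ddof=1)    # unit standard deviation
        r = np.corrcoef(p_std, target)[0, 1]      # correlation with target
        weights.append(r)
        scaled.append(p_std)
    weights = np.array(weights)
    scaled = np.array(scaled)
    # correlation-weighted average across proxies
    return (weights[:, None] * scaled).sum(axis=0) / weights.sum()
```

Note that with a 1880–1960 calibration window of decadal averages, `target` has only eight values per proxy — which is the point Willis makes next.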

Now, except for one proxy, they are using decadally smoothed data. But for the reconstruction procedure, they have averaged their data to the decadal level. Thus, they are basing their entire reconstruction on how well each proxy fits just eight data points … it seems like this alone should put their error levels from “floor to ceiling”. Let me go see …

Yes, there is not a single squared correlation (r^2) in the lot that is statistically significant. In fact, the series are so short (only 8 data points = 6 degrees of freedom) and the autocorrelation is so strong that the p-value for the r^2 can only be calculated for four of them. The best of these four is w. Greenland, p = 0.23. Statistically meaningless. The other ten, once they are adjusted for autocorrelation, have less than one degree of freedom, so their p-values cannot even be calculated.
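For reference, one standard way to make the autocorrelation adjustment described here is to shrink the sample size to an "effective" n before computing the p-value. A sketch, assuming the common lag-1 recipe n_eff = n(1 − r1x·r1y)/(1 + r1x·r1y) — this is a textbook formula, not necessarily the exact adjustment used in the post:

```python
import numpy as np
from scipy import stats

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

def corr_pvalue_adjusted(x, y):
    """Two-sided p-value for the correlation of x and y, with the sample
    size reduced for autocorrelation via the common lag-1 adjustment
    n_eff = n * (1 - r1x*r1y) / (1 + r1x*r1y).  One standard recipe only;
    the post may have used a different adjustment."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    phi = lag1_autocorr(x) * lag1_autocorr(y)
    n_eff = n * (1 - phi) / (1 + phi)
    df = n_eff - 2
    if df < 1:
        return r, None            # too few effective degrees of freedom
    t = r * np.sqrt(df / (1 - r**2))
    p = 2 * stats.t.sf(abs(t), df)
    return r, p
```

With only eight decadal points and the strong autocorrelation of decadally smoothed series, the effective degrees of freedom collapse toward zero — which is exactly why the p-value cannot be computed for most of these proxies.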

Thus, none of the r^2 values is statistically different from zero, and their method falls apart.

Here’s a spaghetti graph of the contestants …

Only 4 of the 14 have an r^2 that is better than a straight line with regard to the Jones data …

The result of the correlation weighting procedure is a dataset that is more autocorrelated than most of the individual datasets … so we can’t calculate its p-value either.

A bozo test of the value of their method, I suppose, would be to compare individual correlations with the first four decades of the Jones data, and then see how well they do in the next four decades. A little bit of “out-of-sample” test … I’ll do this using the smoothed data, rather than the decadally averaged data as they have done, to get a more accurate result. Hang on a few minutes … OK, thanks for waiting. Here’s the results …

YIKES … they have almost no correlation at all with the earlier four decades of Jones data, only with the later decades. The overall correlations don’t have any relationship with either half, and the two halves have no correlation with each other … and these are the proxies that we’re going to depend on for temperatures a thousand years ago?!?

Can you say “fails the out-of-sample test”? … I knew you could.

w.

PS – How can the correlations be so different for the different periods? Easy. Here’s Jones versus some selected datasets. You can see why the correlations are so radically different during different time frames.

1. Steve McIntyre
Posted Oct 25, 2006 at 10:12 PM | Permalink

Willis, can you give a URL for the 30-90N series that you used (or did you do the calculation yourself?)

2. Sara Chan
Posted Oct 25, 2006 at 10:58 PM | Permalink

weigh them by their correlation with decadal NH 30-90°N temperature …, and then average them

Shouldn’t Hegerl et al. be weighting them with the (signed) square of the correlation? (That’s how much of the variance is explained; also, variances are linear quantities.)
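The difference between the two weighting schemes is easy to see side by side. A purely illustrative sketch (function name and inputs are my own, and it assumes positive correlations, as is the case for these proxies):

```python
import numpy as np

def weights_r_vs_r2(correlations):
    """Compare the two weighting schemes discussed above: raw correlation
    r (as Hegerl et al. describe) versus the signed square r*|r| (i.e.
    variance explained).  Returns normalized weights for each scheme."""
    r = np.asarray(correlations, dtype=float)
    w_r = r / r.sum()                              # weight by r
    signed_sq = r * np.abs(r)                      # weight by signed r^2
    w_r2 = signed_sq / signed_sq.sum()
    return w_r, w_r2
```

Squaring punishes weak proxies much harder: a proxy with r = 0.04 (the Mackenzie Delta figure quoted in the paper) gets an already tiny weight under r-weighting and an essentially zero weight under r²-weighting.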

3. Willis Eschenbach
Posted Oct 26, 2006 at 1:22 AM | Permalink

I’m reposting this here, from the “Hegerl Proxies #1:” thread.

Steve M., I used the Jones database available here for the temperature data. Don’t know if that’s what she used, but it gives correlations very close to those reported in the Hegerl paper.

w.

—————————————- repost —————————-

Man, this sucker is bogus. Where do I start?

They say:

Appendix A: Records used for the new 1500 yr reconstruction
The CH-blend reconstruction is composed of records from twelve sites, some of which
contain multiple records (Figure 1 shows their locations).

They go on to explain the 12 sites, kinda, including sites whose locations they don’t even know, such as Boniface … they say:

Examination of the NGDC data base indicates that the original
Esper et al. reconstruction appears to be from the Boniface site. A record from
nearby St. Anne also shows many similarities to Boniface (r=0.66), extends closer in
time to the present, but is also slightly shorter (the Boniface/St. Anne correlation is
0.70). Although the Boniface/St Anne composite has a very high correlation with
the 30-90N (land) record (0.88), inspection of shorter records from Fort Chimo and
No Name Lake showed a different 20th century response — earlier warming and late
cooling. In order to preclude a Quebec composite from indicating a potentially
unrealistic magnitude of late 20th century warmth for the whole region, we created a
shorter composite of the four sites that averages records from a Fort Chino and No
Name Lake composite after 1806.

Since they got a whole bunch of their data from Esper … couldn’t they have just asked him where the Boniface site was? But I digress …

Then they show this graphic:

Now, I see 14 records in this graphic, all of which they have used. The reason there are fourteen is that they use a short and a long series from western Siberia, and a short and a long series from the western US (“w. US. Hughes” and “w. US composite” respectively). The US short series is described as:

western US: this time series uses an RCS processed treering composite used in
Mann et al. (1999), and kindly provided by Malcolm Hughes, and two sites
generated by Lloyd and Graumlich (1997), analyzed by Esper et al. (Boreal and
Upper Wright), and provided by E. Cook. The Esper analyses were first averaged.
Although there are a number of broad similarities between the Esper and Hughes
reconstructions, the correlation is only 0.66. The two composites were averaged.

Generated by Lloyd and Graumlich … analyzed by Esper … provided by Cook … man, we’re a ways down the food chain. In any case, I’m sure this will be seen as an “independent” verification of the Hockeystick …

Then, there’s Mongolia, described as:

Mongolia: this is from the D’Arrigo et al. (2001) study. However, the full
composite illustrated in this paper is not available.

Not available? What’s up with that? Then we have:

e.Siberia: the Esper et al. (2002) composite used the Zhaschiviresk time series from
Schweingruber. However, this composite only went to 1708. We combined it with a
ring width (by Schweingruber, available NGDC) series from the nearby Ayandina
River site after removing the obvious growth overprint in the early part of the
younger record.

Gotta love how they mess with the data … “obvious growth overprint”?

Of course, you can’t have Siberia without Yamal …

w. Siberia: in order to avoid any heavy biases of the mean composite by a number
of sites from one region, the west Siberia time series is a composite of three/four
time series from this region: two “polar Urals” records east of the Urals — Yamal
(Briffa et al. 1995) and Mangazeja (Hantemirov and Shiyatov 2002 – both by way of
Esper et al.) and two records from west of the Urals (Hantemirov and Shiyatov
2002). The records from each side of the Urals were first averaged and then
combined for the w.Siberia.short composite; the w.Siberia.long composite involved
Yamal and the west.Urals composite.

Then we have the mysterious:

European historical: this composite was kindly provided by J. Luterbacher et al.
(2004).

which doesn’t tell us a lot. But further research reveals, you’re gonna love this, folks, that the Luterbacher “European Historical” proxy list includes … wait for it … Yamal in Siberia. Which means that Yamal is in this paper, not once, but twice.

I also suspect that this European composite may contain a common core with the Greenland series described next. Luterbacher says that he uses “1st PC of winter δ18O from Greenland” as a proxy. The west Greenland series, on the other hand, is described as:

west Greenland: this composite is from Fisher et al. (1996).

D. A. Fisher et al., 1996: Intercomparison of ice core δ18O and
precipitation records from sites in Canada and Greenland over the last 3500 years and
over the last few centuries in detail using EOF techniques. Climatic Variations and
Forcing Mechanisms of the Last 2000 Years, P.D. Jones et al., Eds., Springer-Verlag,
Berlin, pp. 297-328.

The curious thing is, Fisher is using the cores not as temperature proxies, but as precipitation proxies … wonder how Hegerl/Luterbacher removed the confounding variable …

Of course, we have to have the obligatory no-show:

Mackenzie Delta: The original time series (Szeicz and MacDonald 1995) provided
by Esper et al. only had a 0.04 correlation with the 1880-1960 decadal average of
NH temperature, which yields a very small weight if used for the hemispheric
composite. We experimented with various other data from the National Geophysical
Data Center (NGDC, http://www.ncdc.noaa.gov/paleo/ftp-search.html) to determine
if other reconstructions for that area would yield more information for a hemispheric
reconstruction. We found generally that proxy data for that region show little
correlation with hemispheric mean temperature. We nevertheless included this site
for the sake of completeness and in order to include as many long sites as possible.

Since the correlation is so low, 0.04, this proxy disappears in the “blend”, with a weighting of 0.2% of the total … but they still put it in. Why?

And how about the US long series? Well, that’s described as … as … well, in fact, it’s not described at all. Nowhere. Nothing. Not a single thing about it … searched the paper for “composite”, “US” … nada.

Next, we have the errors. Curiosities abound here. One expects error bars to get larger the further back we go, since there are fewer series (not even counting the disappearing Mackenzie River series). But the weird part is, the 95% confidence intervals in the middle section (~1000–1600 AD) are smaller than the errors in the oldest section (~600–1000 AD), and these, in turn, are smaller than the errors in the newest section (~1600–1890 or so). Here’s the data:

| Era    | minus     | plus      |
|--------|-----------|-----------|
| Recent | 1.6 – 2.2 | 0.6 – 1.2 |
| Middle | 0.8 – 1.4 | 0.4 – 1.0 |
| Oldest | 1.2 – 1.5 | 0.6 – 1.0 |

How many proxies are we talking about over time? They say:

The reconstruction consists of three individual segments: A baseline reconstruction uses 12 decadal records and covers the period to 1505. One longer, less densely sampled reconstruction, which we call CH-blend (long), is based on 7 records back to AD 946, and CH-blend (Dark Ages) consists of 5 records back to AD 558.

I have verified this by looking at the data. As a first approximation, the errors should be proportional to one over the square root of the number of proxies. Thus, these errors should increase as 1 : 1.3 : 1.6 … but they don’t.
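The 1/sqrt(n) expectation is easy to check for the three segment sizes:

```python
import numpy as np

# If the error of an n-proxy average scales as 1/sqrt(n), the three
# CH-blend segments (12, 7, and 5 records) should have errors in the
# proportions sqrt(12/12) : sqrt(12/7) : sqrt(12/5), i.e. about
# 1 : 1.31 : 1.55 -- the 1 : 1.3 : 1.6 quoted above, roughly.
n_records = np.array([12.0, 7.0, 5.0])
ratios = np.sqrt(n_records[0] / n_records)
```

The observed error bars, as the table above shows, do not follow this ordering at all: the middle segment has the smallest errors despite having fewer records than the baseline.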

They also say:

Note that our approach assumes that the errors $\varepsilon_{inst}$ and $\varepsilon_{pal}$ are uncorrelated and normally distributed. The first assumption appears reasonable since red noise possibly present in the individual records should have been filtered in $\varepsilon_{pal}$ (note the small changes in the correlation to instrumental temperature between detrended or nondetrended data), and the second is justified for hemispheric means by the central limit theorem.

Hmmm … why would the red noise be “filtered” by correlation weighted averaging?

In any case, I can’t make heads or tails of how they claim to “scale” the proxy average. I’ve tried and tried, but I just can’t see how they can estimate the average in the paleo reconstruction from this as they claim. They’re using the eight points of the weighted reconstruction, and the eight points of the data, to estimate the paleo error … how? As you can see, and as I showed before, the correlation with the first four decades is horrible … how can they get useful information out of that? Here’s the data they’re using …

w.

4. Willis Eschenbach
Posted Oct 26, 2006 at 1:49 AM | Permalink

Re #2, Sara, thank you for your question. That’s what I thought too … but I did it the way they said, and came up with their results. Signing the r^2 isn’t necessary, by the way, as all of the correlations are positive.

Doesn’t make much difference in any case, since the average of the unweighted proxies is very close to the average of the weighted proxies.

w.

5. Sara Chan
Posted Oct 26, 2006 at 3:31 AM | Permalink

Re #4, Willis, Yes, I’d understood that the inaccuracy was the authors’, not yours.

This shows, I think, that the authors’ lack of statistical competence should be evident to anyone with an appropriate statistical background without even seeing the data. So this is further evidence that the peer reviewers didn’t have an appropriate statistical background to review the paper, or they didn’t read the paper, or they didn’t care that it was wrong.

6. Steve McIntyre
Posted Oct 26, 2006 at 7:14 AM | Permalink

Willis, if you look at my post on Mann’s PC1 in Hegerl, you’ll see that there is a description of their western US long series, albeit incorrect. The shorter US series is the average of two cherry-picked Graumlich foxtail sites, incorrectly attributed to Lloyd and Graumlich 1997; they are referred to in Bunn et al 2005, about which I’ve posted on this site, and were archived in May 2006 after much fighting with Science.

The longer US series is Mann’s PC1 from Mann and Jones 2003, incorrectly described as an RCS-processed result "kindly provided by Hughes". So the long series is actually an old friend, even if Hegerl et al are too embarrassed to admit that they used Mann’s PC1.

7. Concerned Climate Scientist
Posted Oct 26, 2006 at 7:25 AM | Permalink

Hey you guys. Can’t you just lay off all of this statistical bs! Clearly you just don’t understand climate science. Come on over to RC and discuss these issues there if you really want to address these climate science issues!

What is really p…..g me off is that these silly questions you guys are asking are making it harder for me to get my grants. How would you feel if the income you need to feed your wife and kids is threatened? Hey?

8. John Lish
Posted Oct 26, 2006 at 8:13 AM | Permalink

#7 – is this a wind-up? I suspect so.

9. bender
Posted Oct 26, 2006 at 8:16 AM | Permalink

Self-Concerned Climate Scientist: where exactly is the “statistical bs” that we should lay off of?

10. bender
Posted Oct 26, 2006 at 8:42 AM | Permalink

Re #7

Come on over to RC and discuss these issues there

My place or yours?

Jokes aside. I’ve posted two good questions in the past at RC (on the statistics of hurricane trends and on the constrainedness of GCM parametrization) and gotten next to nothing in reply. What little I got indicated a heads-in-the-sand/maintain-the-consensus attitude towards climate science. At CA you can actually generate a discussion and learn something. Example: Dr. Judith Curry and Dr. Isaac Held at CA provided answers to my questions that could not be gotten over at Consensus-Keepers.

Consensus-keeping is inherently anti-scientific. But as you point out: it’s good for Climate Science Inc.

Stick around and ask a question, or comment on a post. You might actually learn something. Or maybe you have something to teach us? What’s your take on bcp’s, Yamal, extrapolative RegEM, confidence interval estimation, and the like?

11. Mark T
Posted Oct 26, 2006 at 4:16 PM | Permalink

Self-Concerned Climate Scientist: where exactly is the “statistical bs” that we should lay off of?

I think that post was intentionally facetious.

Mark

12. Ken Fritsch
Posted Oct 26, 2006 at 4:52 PM | Permalink

Examination of the NGDC data base indicates that the original
Esper et al. reconstruction appears to be from the Boniface site. A record from
nearby St. Anne also shows many similarities to Boniface (r=0.66), extends closer in time to the present, but is also slightly shorter (the Boniface/St. Anne correlation is 0.70).

Willis E, thanks much for your comments on the quoted parts of the paper. When I first read Steve M’s quotes from this paper, I was thinking: is it me, or does the approach in this paper just seem a bit loosey-goosey? When I read the above excerpt I was getting confused, thinking Steve M had put some of his own comments in quotes.

Mackenzie Delta: The original time series (Szeicz and MacDonald 1995) provided by Esper et al. only had a 0.04 correlation with the 1880-1960 decadal average of NH temperature…We nevertheless included this site for the sake of completeness and in order to include as many long sites as possible.

That one really bothered me.

13. Tim Ball
Posted Oct 26, 2006 at 9:12 PM | Permalink

#5: It appears this is another example where the 43 authors Wegman identified as co-authoring together are also peer-reviewing each other’s papers. Sadly we will not know, because reviewers are not identified – a practice that has to change. Good work Willis.

14. A. Fritz
Posted Oct 27, 2006 at 11:50 AM | Permalink

I know this has been talked about before – the idea of having a “statistics reviewer” on every paper. If you guys had to design a way for that to work at any given journal, how would you set it up?

Also, a very well known scientist in the mesoscale meteorology world (at least, I respect him and his science very much) has written his own “review” of the peer review process here, I think some of you might be interested to read it.