Hegerl Proxies: #1 – Mann PC1

The Hegerl et al 2006 climate reconstruction is finally online here. I’m going to go through the proxies individually before talking about method. Obviously the first one to look for is Mann’s North American PC1. Although they say that they’ve “moved on”, Mann’s PC1 was used in Osborn and Briffa 2006 and was one of my predictions when I tried to guess what proxies were used in Hegerl et al.

There is no mention of principal components whatever in Hegerl et al. They state that they used a western U.S. tree ring series, but clearly avoid any reference to principal components or bristlecones. They state:

western U.S.: this time series uses an RCS processed treering composite used in Mann et al. (1999), and kindly provided by Malcolm Hughes, and two sites generated by Lloyd and Graumlich (1997), analyzed by Esper et al. (Boreal and Upper Wright), and provided by E. Cook. The Esper analyses were first averaged. Although there are a number of broad similarities between the Esper and Hughes reconstructions, the correlation is only 0.66. The two composites were averaged.

Their Figure A1 shows that this proxy is available in 500, well before the start of MBH99 in 1000. MBH99 made no mention of the use of “an RCS processed treering composite”. Hegerl et al make no mention of Mann’s PC1 – and after all the publicity, you’d think that they’d make sure to mention any use of this controversial proxy. Here’s their series labeled as “w U.S.A. – Hughes”.

Fig 1. Excerpt from Hegerl et al 2006 Figure A1

Now for comparison, here is a smoothed version of Mann’s PC1 from Mann and Jones 2003. Look familiar?

Fig 2. Re-plot of Mann and Jones PC1, with 21 year gaussian smooth.

Finally, for good order’s sake, here’s a plot of the Western U.S. series from Osborn and Briffa 2006, using their smooth as archived.

Fig 3. Plot of western U.S. series from Osborn and Briffa 2006.

I think that we can safely conclude that Hegerl et al 2006 used Mann’s PC1 and have incorrectly described what they used. How on earth could they have accidentally mis-described this series? It’s not as though Gabi Hegerl is unaware of the issues pertaining to Mann’s PC1. She was at the NAS Panel for example.

In terms of my predictions of what Hegerl et al 2006 used, I’m scoring this 1 for 1 so far. I’ll keep going through my predictions individually over the next few days.

This entry was written by Stephen McIntyre, posted on Oct 24, 2006 at 11:27 PM, filed under Briffa, Hegerl 2006, Mann PC1 and tagged hegerl.

85 Comments

Nicholas

Posted Oct 25, 2006 at 2:02 AM | Permalink

Yet more independent confirmation of the hockey stick!

Perhaps somebody should buy these multiproxy study authors a dictionary, highlight the definition of “independent” and place a bookmark at that page. Maybe they would get a hint. To paraphrase a movie:

“Independent. You keep using that word. I do not think it means what you think it means.”
John A

Posted Oct 25, 2006 at 3:44 AM | Permalink

Wasn’t it Hegerl who said that if the verification r2 was practically zero, the confidence limits would be from “floor to ceiling”?
Gary

Posted Oct 25, 2006 at 7:13 AM | Permalink

I think that we can safely conclude that Hegerl et al 2006 used Mann’s PC1 and have incorrectly described what they used. How on earth could they have accidentally mis-described this series?

this time series uses an RCS processed treering composite used in Mann et al. (1999), and kindly provided by Malcolm Hughes,

Did Malcolm Hughes just mis-label the series?
John A

Posted Oct 25, 2006 at 8:03 AM | Permalink

Did Malcolm Hughes just mis-label the series?

Can you think of an innocent reason why he would do that?
A. Fritz

Posted Oct 25, 2006 at 8:55 AM | Permalink

Can you think of an innocent reason why he would do that?

Maybe Steve should contact him and ask him. Is that ever an option on this blog? Or is it more exciting to just run with assumptions…
Steve McIntyre

Posted Oct 25, 2006 at 9:25 AM | Permalink

The motives for the mis-labeling are irrelevant – it doesn’t matter whether the mis-labeling was innocent or not. Attributing the series to Hughes rather than Mann is certainly a sign of cuteness. However, the most generous interpretation – and in the absence of evidence to the contrary, unlike John A, I am prepared to grant the most generous interpretation – is that it was negligence, rather than misconduct. Given all the publicity about Mann’s PC methodology and bristlecones, doncha think that they had an obligation to state that they were using Mann’s PC1, and that it’s a strange place to be negligent?

For the purposes of checking these studies, again, it doesn’t matter whether the mis-labeling was intentional or unintentional. It was done.
Dave Dardinger

Posted Oct 25, 2006 at 9:29 AM | Permalink

re: #5

I’d suggest you go back and read a bunch of the earlier blog entries by Steve if you want to see the great efforts Steve has made to get information from the various MBH and other players in the multiproxy game.
John A

Posted Oct 25, 2006 at 9:29 AM | Permalink

…the most generous interpretation – and in the absence of evidence to the contrary, unlike John A, I am prepared to grant the most generous interpretation – is that it was negligence, rather than misconduct.

Your generosity knows no bounds. If Hughes was negligent, then perhaps you should advise Hegerl that she should issue a Corrigendum before you have to request one from the journal. Again.
Ken Fritsch

Posted Oct 25, 2006 at 9:39 AM | Permalink

For the purposes of checking these studies, again, it doesn’t matter whether the mis-labeling was intentional or unintentional. It was done.

Does it say anything about the “blind eye” of peer review and the reviewers’ turnings thereof? Something to consider, A. Fritz — and without the necessity of contacting the author(s).
Steve McIntyre

Posted Oct 25, 2006 at 10:07 AM | Permalink

When you think about it, I wonder what the peer reviewers did. How could a peer reviewer of a HS article fail to ask about Mann’s principal components and whether they used Mann’s PC1? Why wouldn’t Andrew Weaver, the editor of Journal of Climate, ask? How could the matter NOT come up at some point?
bender

Posted Oct 25, 2006 at 10:17 AM | Permalink

Re #10
If you knew who the reviewers were, you could ask them. But they’re an ivory tower secret.
Gerald Machnee

Posted Oct 25, 2006 at 11:14 AM | Permalink

Some have still not figured out that results can be checked, even if you do not post original documentation. If it quacks like a duck, then it must be a duck!!
Great work Steve M and company!
sonicfrog

Posted Oct 25, 2006 at 11:18 AM | Permalink

Steve said:

When you think about it, I wonder what the peer reviewers did. How could a peer reviewer of a HS article fail to ask about Mann’s principal components and whether they used Mann’s PC1?

It seems they still don’t think of it as a problem.
sonicfrog

Posted Oct 25, 2006 at 11:18 AM | Permalink

That should have been “RealProblem”.
A. Fritz

Posted Oct 25, 2006 at 11:49 AM | Permalink

Re 6: Well, not the motives, but a clarification of what was used. Ken, I agree that a clarification should have been called for in peer review, but sometimes (from what I’ve experienced) it gets explained to the reviewers but is not necessarily included in the article. I’m not agreeing with this method, simply pointing it out from experience. Seeing as this is in press for JoC, I’m interested to see the results of this inquiry. However, I haven’t read the entire paper, so I may be missing something.
John A

Posted Oct 25, 2006 at 11:50 AM | Permalink

When you think about it, I wonder what the peer reviewers did. How could a peer reviewer of a HS article fail to ask about Mann’s principal components and whether they used Mann’s PC1? Why wouldn’t Andrew Weaver, the editor of Journal of Climate, ask? How could the matter NOT come up at some point?

If I was to be generous, then you’d have to say that they were gullible fools, or seriously negligent. You don’t want to hear the less generous interpretations.
Steve McIntyre

Posted Oct 25, 2006 at 11:57 AM | Permalink

#15. A. Fritz, it’s irritating that you should say that we try to “run with assumptions”. First, I’ve tried on several previous occasions to obtain exact information on the sites used in this study. I asked Hegerl in my capacity as an IPCC 4AR reviewer and she refused. I asked IPCC TSU and they said that IPCC reviewers were not entitled to such information other than as it may be made available through journals; that was the compass of journal review. I asked Ralph Cicerone of NAS to obtain the information since the NAS panel had used this study; he refused to ask. I asked Gerry North; he sent an inquiry to her several weeks ago and I’ve had no feedback. So I’ve tried to get accurate information about the sites.

I only got some information on Esper’s sites after bludgeoning Science at the blog on their non-compliance and Esper’s non-compliance.

I’m familiar enough with the data that I am 100% sure that the series in question is Mann’s PC1 from Mann and Jones 2003. Will I ask Hegerl to confirm this? Yes, but I expect to have other inquiries and don’t want to send them in piecemeal.
bender

Posted Oct 25, 2006 at 12:09 PM | Permalink

Re #15
You haven’t read the entire blog either. I used to be neutral, like you. Now that I have read the blog over, I see how the Team operates (forever “moving on”) with their false claims of “independence”, their inept, deceptive, non-transparent handling of data – I am suspicious of their work and, yes, now even their motives. And, yes, I’m breaking a blog rule in saying so. I believe the Team has lost objectivity in pursuit of a political agenda. After all that has transpired, what other conclusion is there?
A. Fritz

Posted Oct 25, 2006 at 12:28 PM | Permalink

Re: bender
I understand what you’re saying. I’ve read the blog a few times and am concerned about the dataset; however, I’m still hesitant to jump to conclusions without the voice of a reviewer or author. Unfortunately, if this information is being withheld, as in Steve’s case, then it’s an issue that CA is going to have to sit on for a while.
Steve McIntyre

Posted Oct 25, 2006 at 12:46 PM | Permalink

#19. Look, I’ve looked at hundreds of data sets. The shape of the series is simply too close for the Hegerl series to be anything other than Mann’s PC1. This has nothing to do with information being withheld or not withheld – you can tell this from existing information. The problem discussed here is that one of the most controversial series in paleoclimate was incorrectly described.
A. Fritz

Posted Oct 25, 2006 at 1:00 PM | Permalink

Steve, I’m not attacking you. I’m agreeing with your frustration and am looking forward to the series being clarified.
bender

Posted Oct 25, 2006 at 1:07 PM | Permalink

I’m still hesitant to jump to conclusions without the voice of a reviewer or author.

Dear lamb, with all due respect … look at the graphs! Use your head! Sit on a fence if you like. Better yet, why not try contacting the authors yourself? Why should Steve be writing all the letters all the time? He’s doing science a great public service, yet he’s rebuked for not doing enough?!

At some point the non-climatological scientists out there are going to have to start thinking for themselves and show some leadership. The Team’s behavior is an embarrassment to the scientific establishment. Reform them from within or be tarred by the same brush.
bender

Posted Oct 25, 2006 at 1:20 PM | Permalink

I’ve read the blog a few times

Or have you just visited the site a few times? The blog is huge! What do you make of Yamal, or the bcps, the Mannomatic, extrapolative RegEM, the estimation of confidence intervals on multiproxy reconstructions, the hurricane data fiasco, or the borehole mess, or the statistical robustness of the claim that temperatures are “unprecedented” in a millll-yun years? (No make that 12000,2000,1000,600,400, … years.)

backpedaling team, moving on, declaring independence, but always keeping the consensus pure
Mark T

Posted Oct 25, 2006 at 1:29 PM | Permalink

You are correct bender, your overall demeanor has shifted since you first posted. I do not blame you, however.

Mark
bender

Posted Oct 25, 2006 at 1:38 PM | Permalink

Re #24
My apologies for the devolution. But it’s maddening. How Steve keeps his head I have no idea. I can revert … but I want those who love the science to see what these folks are getting away with.
Mark T

Posted Oct 25, 2006 at 2:09 PM | Permalink

You’re not the one that need apologize. Many that come in here, as evidenced by the back and forth with Dr. Curry, do a quick scan and assume everyone is just a diehard, relying on preconceived notions to formulate opinions. They, hypocritically no less, take such a position without looking into the history behind such opinions. Maddening is a polite term for what has transpired (or should it be conspired?).

It seems A. Fritz was kind enough to back off of the original comment, at least in the way it originally appeared.

It is not difficult to become defensive, ala Steve M., when you are repeatedly told you are wrong because you’re a) Canadian (South Park joke in there), b) an economist and c) paid by big oil.

I’m not as generous as Steve M., either. These are smart people. They are fully aware of what they are doing when they mislabel a well-known, and discredited, series that is part of their analyses.

Mark
bender

Posted Oct 25, 2006 at 2:33 PM | Permalink

A. Fritz was kind enough to back off

Sure, I agree. But what I want is for folks like A. Fritz to read those posts mentioned in #23 and to respond to them. It’s a lot to ask, I know.

All Steve M is asking is that (1) the proxies be updated, and that the science be (2) documented (analytical components in turnkey scripts) and (3) transparent (open source). Doesn’t matter who’s asking or what his background is – these are darn good ideas that best serve the public’s interest. And they’re not even his ideas – they’re the stated values of granting bodies (NSF), governing bodies (NAS), and even the journals themselves (Nature, Science, etc.). If they all lived up to their stated values and archived data and code the way they’re supposed to, we would not be in this mess.
Mark T

Posted Oct 25, 2006 at 3:14 PM | Permalink

They’ve let it go on so long (standards bodies, granting bodies, governing bodies, journals) that if they started now, there’d be too much egg on their faces to continue doing business. Corruption has passed the point of fixing, IMO.

Mark
Dave Dardinger

Posted Oct 25, 2006 at 4:09 PM | Permalink

re: #28

I don’t think so, Mark. Just look at the agility shown by the NAS panel. They essentially agreed to everything Steve M said while still claiming Mann was ‘plausible’. The rulers that be could easily continue to use such fancy footwork to demand everyone adhere to the established rules from here on out and still not find any past fault.

We who know what’s been going on would still know, but that doesn’t mean that the mainstream media would ever have to admit it.
Mark T

Posted Oct 25, 2006 at 4:21 PM | Permalink

I think that’s my point, Dave. NAS “essentially” agreed with Steve M., while at the same time “essentially” accepting such procedures knowing they are a farce. We shouldn’t have to read between the lines to find out what they were really stating. The rulers that be can’t admit to themselves that the rules need to be followed, let alone the general public.

The mainstream media is without a conscience, so they will simply report, and exaggerate, whatever the latest findings are, regardless of whether or not rules were followed. They don’t have to admit to anything anyway, so there’s no loss for them not admitting to wrong-doing.

Mark
John A

Posted Oct 25, 2006 at 6:09 PM | Permalink

It is not difficult to become defensive, ala Steve M., when you are repeatedly told you are wrong because you’re a) Canadian (South Park joke in there), b) an economist and c) paid by big oil.

In the case of Steve, only one out of three.

I’m not as generous as Steve M., either. These are smart people. They are fully aware of what they are doing when they mislabel a well-known, and discredited, series that is part of their analyses.

Steve and I had a conversation about this very fact: that before Steve had done his first analysis of MBH98, no-one used the PC1. After M&M2003, the PC1 started appearing as if it were another proxy like any other, and despite what is known about it over time, it seems to have become more popular with Hockey Team members – like an initiation rite or a badge of honor.

What these people are doing is reconstructing history, and in their synthesis, they are using a source which they know is corrupt, unscientific and wholly misleading. They claim to be producing a global picture using global proxies but their results are weighted and dominated by the Mann PC1.

Nobody appears to be able to stop this perversion of climate history and the scientific method, and for reasons I can’t explain, our taxes are used to produce this statistical propaganda. Editors of journals simply wave it through without a moment’s hesitation. News organizations parrot uncritically the highly emotionalized and strident rhetoric of the press releases as if from the word of God Almighty.
Willis Eschenbach

Posted Oct 25, 2006 at 9:57 PM | Permalink

A couple of things.

First, I’ve digitized all of the Hegerl proxy data, and placed it here. I sampled it at ~three year intervals, and interpolated the actual years.
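For anyone repeating the digitization, the step from irregular ~three-year samples to whole years might look something like the following. This is a minimal sketch only: the sample points are made up for illustration, and linear interpolation via `np.interp` is an assumption, not necessarily the scheme used for the archived file.

```python
import numpy as np

# Hypothetical digitized points read off a figure: (fractional year, proxy value).
digitized_years = np.array([1500.4, 1503.1, 1506.2, 1509.0, 1512.3])
digitized_values = np.array([0.10, 0.25, 0.05, -0.10, 0.00])

# Target grid: every whole year covered by the digitized points.
annual_years = np.arange(np.ceil(digitized_years[0]),
                         np.floor(digitized_years[-1]) + 1)

# Linear interpolation onto whole years (an assumption; other schemes are possible).
annual_values = np.interp(annual_years, digitized_years, digitized_values)
```

Because the figure itself shows smoothed curves, interpolation of this kind adds no information; it only puts the digitized points on a regular time axis.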

Second, I took a look at their reconstruction method. They say:

The first step of the reconstruction technique is to scale
the individual proxy records to unit standard deviation, weigh them by their correlation
with decadal NH 30-90°N temperature (land or land and ocean, depending on the target
of reconstruction) during the period 1880 to 1960, and then average them.
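Read literally, the quoted recipe has three steps: scale, weight by calibration-period correlation, average. The sketch below is one possible reading of it in Python, not the authors’ code. The function name, the use of the full-record standard deviation, and the plain mean of the weighted records are all assumptions, and the data are synthetic.

```python
import numpy as np

def correlation_weighted_composite(proxies, target, calib):
    """proxies: (n_proxies, n_time) array; target: (n_time,) array;
    calib: boolean mask of length n_time marking the calibration decades."""
    proxies = np.asarray(proxies, dtype=float)
    target = np.asarray(target, dtype=float)
    # Step 1: scale each proxy record to unit standard deviation.
    scaled = proxies / proxies.std(axis=1, ddof=1, keepdims=True)
    # Step 2: weight = correlation with the target over the calibration period.
    weights = np.array([np.corrcoef(p[calib], target[calib])[0, 1] for p in scaled])
    # Step 3: average the weighted records (plain mean; an assumption).
    composite = (weights[:, None] * scaled).mean(axis=0)
    return composite, weights

# Synthetic illustration: 20 "decades", the last 8 used for calibration.
rng = np.random.default_rng(0)
n_time = 20
target = np.linspace(-0.3, 0.3, n_time)
calib = np.zeros(n_time, dtype=bool)
calib[-8:] = True
proxies = np.vstack([target + rng.normal(0, s, n_time) for s in (0.05, 0.2, 0.5)])
composite, weights = correlation_weighted_composite(proxies, target, calib)
```

Note that every weight here rests on only eight calibration points, which is the issue taken up next.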

Now, except for one proxy, they are using decadally smoothed data. But for the reconstruction procedure, they have averaged out their data to the decadal level. Thus, they are basing their entire reconstruction on how well it fits vs. eight data points … it seems like this alone should put their error levels from “floor to ceiling”. Let me go see …

Yes, there is not a single correlation coefficient (r^2) in the lot that is significant. In fact, the series are so short (only 8 data points = 6 degrees of freedom) and the autocorrelation is so strong that the p-value for the r^2 can only be calculated for four of them. The best of these four is w. Greenland, p = 0.23. Statistically meaningless. The other ten, once they are adjusted for autocorrelation, have less than one degree of freedom, so their p value cannot even be calculated.

Thus, none of the r^2 values is statistically different from zero, and their method falls apart.
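The adjustment used for the numbers above is not spelled out, so the following is only a sketch of one common approach: shrink the sample size by the lag-1 autocorrelations of the two series, n_eff = n(1 - r1x·r1y)/(1 + r1x·r1y), then test the correlation with n_eff - 2 degrees of freedom. The function names and the example series are hypothetical.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

def effective_dof(x, y):
    """Degrees of freedom for a correlation test after a lag-1 adjustment
    (one common formula; an assumption, not necessarily the one used above)."""
    n = len(x)
    r1x, r1y = lag1_autocorr(x), lag1_autocorr(y)
    n_eff = n * (1 - r1x * r1y) / (1 + r1x * r1y)
    return n_eff - 2

def t_statistic(r, dof):
    """t statistic for correlation r; None when dof < 1 (no meaningful p-value)."""
    if dof < 1:
        return None
    return r * np.sqrt(dof / (1 - r * r))

# Hypothetical example: eight decadal values that are both pure trends.
decades = np.arange(8.0)
dof = effective_dof(decades, decades)
```

With eight points the unadjusted figure is 6 degrees of freedom; for two smoothly trending series the adjusted figure drops to well under 2, and for more strongly autocorrelated series it can fall below 1, at which point no p-value can be computed.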

Here’s a spaghetti graph of the contestants …

Only 4 of the 14 have an r^2 that is better than a straight line with regards to the Jones data …

The result of the correlation weighting procedure is a dataset that is more autocorrelated than most of the individual datasets … so we can’t calculate the p value of it either.

A bozo test of the value of their method, I suppose, would be to compare individual correlations with the first four decades of the Jones data, and then see how well they do in the next four decades. A little bit of “out-of-sample” test … I’ll do this using the smoothed data, rather than the decadally averaged data as they have done, to get a more accurate result. Hang on a few minutes … OK, thanks for waiting. Here’s the results …

YIKES … they have almost no correlation at all with the earlier four decades of Jones data, only with the later decades. The overall correlations don’t have any relationship with either half, and the two halves have no correlation with each other … and these are the proxies that we’re going to depend on for temperatures a thousand years ago?!?

Can you say “fails the out-of-sample test”? … I knew you could.
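The split-period check described above is easy to reproduce in outline. The sketch below uses made-up numbers, not the Hegerl or Jones data, and the function name is hypothetical; it simply correlates a proxy with the target over each half of the calibration window and over the whole window.

```python
import numpy as np

def split_sample_correlations(proxy, target):
    """Correlation of proxy with target over the first half, the second half,
    and the full period."""
    proxy = np.asarray(proxy, dtype=float)
    target = np.asarray(target, dtype=float)
    half = len(target) // 2
    r_early = np.corrcoef(proxy[:half], target[:half])[0, 1]
    r_late = np.corrcoef(proxy[half:], target[half:])[0, 1]
    r_full = np.corrcoef(proxy, target)[0, 1]
    return r_early, r_late, r_full

# Made-up target, and a proxy that tracks it only in the second half.
target = np.array([0.0, 0.1, -0.1, 0.05, 0.2, 0.4, 0.3, 0.6])
proxy = np.concatenate([-target[:4], target[4:]])
r_early, r_late, r_full = split_sample_correlations(proxy, target)
```

A proxy like this one can show a respectable full-period correlation even though it is anti-correlated with the target in the first half, which is why the split matters.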

w.

PS – How can the correlations be so different for the different periods? Easy. Here’s Jones versus some selected datasets. You can see why the correlations are so radically different during different time frames.
Willis Eschenbach

Posted Oct 26, 2006 at 1:16 AM | Permalink

Man, this sucker is bogus. Where do I start?

They say:

Appendix A: Records used for the new 1500 yr reconstruction
The CH-blend reconstruction is composed of records from twelve sites, some of which
contain multiple records (Figure 1 shows their locations).

They go on to explain the 12 sites, kinda, with sites that they don’t even know where they’re from, such as Boniface … they say:

Examination of the NGDC data base indicates that the original
Esper et al. reconstruction appears to be from the Boniface site. A record from
nearby St. Anne also shows many similarities to Boniface (r=0.66), extends closer in
time to the present, but is also slightly shorter (the Boniface/St. Anne correlation is
0.70). Although the Boniface/St Anne composite has a very high correlation with
the 30-90N (land) record (0.88), inspection of shorter records from Fort Chimo and
No Name Lake showed a different 20th century response — earlier warming and late
cooling. In order to preclude a Quebec composite from indicating a potentially
unrealistic magnitude of late 20th century warmth for the whole region, we created a
shorter composite of the four sites that averages records from a Fort Chino and No
Name Lake composite after 1806.

Since they got a whole bunch of their data from Esper … couldn’t they have just asked him where the Boniface site was? But I digress …

Then they show this graphic:

Now, I see 14 records in this graphic … which they have used. The reason there are fourteen is that they use a short and long series from western Siberia, and a short and long series from the western US (“w. US. Hughes” and “w. US composite” respectively). The U.S. short series is described as:

western US: this time series uses an RCS processed treering composite used in
Mann et al. (1999), and kindly provided by Malcolm Hughes, and two sites
generated by Lloyd and Graumlich (1997), analyzed by Esper et al. (Boreal and
Upper Wright), and provided by E. Cook. The Esper analyses were first averaged.
Although there are a number of broad similarities between the Esper and Hughes
reconstructions, the correlation is only 0.66. The two composites were averaged.

Generated by Lloyd and Graumlich … analyzed by Esper … provided by Cook … man, we’re a ways down the food chain. In any case, I’m sure this will be seen as an “independent” verification of the Hockeystick …

Then, there’s Mongolia, described as:

Mongolia: this is from the D’Arrigo et al. (2001) study. However, the full
composite illustrated in this paper is not available.

Not available? What’s up with that? Then we have:

e.Siberia: the Esper et al. (2002) composite used the Zhaschiviresk time series from
Schweingruber. However, this composite only went to 1708. We combined it with a
ring width (by Schweingruber, available NGDC) series from the nearby Ayandina
River site after removing the obvious growth overprint in the early part of the
younger record.

Gotta love how they mess with the data … “obvious growth overprint”?

Of course, you can’t have Siberia without Yamal …

w. Siberia: in order to avoid any heavy biases of the mean composite by a number
of sites from one region, the west Siberia time series is a composite of three/four
time series from this region: two “polar Urals” records east of the Urals — Yamal
(Briffa et al. 1995) and Mangazeja (Hantemirov and Shiyatov 2002 – both by way of
Esper et al.) and two records from west of the Urals (Hantemirov and Shiyatov
2002). The records from each side of the Urals were first averaged and then
combined for the w.Siberia.short composite; the w.Siberia.long composite involved
Yamal and the west.Urals composite.

Then we have the mysterious:

European historical: this composite was kindly provided by J. Luterbacher et al.
(2004).

which doesn’t tell us a lot. But further research reveals, you’re gonna love this, folks, that the Luterbacher “European Historical” proxy list includes … wait for it … Yamal in Siberia. Which means that Yamal is in this paper, not once, but twice.

I also suspect that this European composite may contain a common core with the Greenland series described next. Luterbacher says that he uses “1st PC of winter δ18O from Greenland” as a proxy. The west Greenland series, on the other hand, is described as:

west Greenland: this composite is from Fisher et al. (1996).

A curious thing about this Greenland composite is that it comes from:

D. A. Fisher et al., 1996: Intercomparison of ice core δ18O and
precipitation records from sites in Canada and Greenland over the last 3500 years and
over the last few centuries in detail using EOF techniques. Climatic Variations and
Forcing Mechanisms of the Last 2000 Years, P.D. Jones et al., Eds., Springer-Verlag,
Berlin, pp. 297-328.

The curious thing is, Fisher is using the cores, not as temperature proxies, but as precipitation proxies … wonder how Hegerl/Luterbacher removed the confounding variable …

Of course, we have to have the obligatory no-show:

Mackenzie Delta: The original time series (Szeicz and MacDonald 1995) provided
by Esper et al. only had a 0.04 correlation with the 1880-1960 decadal average of
NH temperature, which yields a very small weight if used for the hemispheric
composite. We experimented with various other data from the National Geophysical
Data Center (NGDC, http://www.ncdc.noaa.gov/paleo/ftp-search.html) to determine
if other reconstructions for that area would yield more information for a hemispheric
reconstruction. We found generally that proxy data for that region show little
correlation with hemispheric mean temperature. We nevertheless included this site
for the sake of completeness and in order to include as many long sites as possible.

Since the correlation is so low, 0.04, this proxy disappears in the “blend”, with a weighting of 0.2% of the total … but they still put it in. Why?

And how about the US long series? Well, that’s described as … as … well, in fact, it’s not described at all. Nowhere. Nothing. Not a single thing about it … searched the paper for “composite”, “US” … nada.

Next, we have the errors. Curiosities abound here. One expects error bars to get larger the further back we go, since there are fewer series (not even counting the disappearing Mackenzie River series). But the weird part is, the 95% confidence intervals in the middle section (~1000-1600AD) are smaller than the errors in the oldest section (~600-1000AD), and these, in turn, are smaller than the errors in the newest section (~1600 – 1890 or so). Here’s the data:

Era       minus        plus
Recent    1.6 – 2.2    0.6 – 1.2
Middle    0.8 – 1.4    0.4 – 1.0
Oldest    1.2 – 1.5    0.6 – 1.0

How many proxies are we talking about over time? They say:

The reconstruction consists of three individual segments: A baseline reconstruction uses 12 decadal records and covers the period to 1505. One longer, less densely sampled reconstruction, which we call CH-blend (long), is based on 7 records back to AD 946, and CH-blend (Dark Ages) consists of 5 records back to AD 558.

I have verified this by looking at the data. As a first approximation, the errors should be proportional to one over the square root of the number of proxies. Thus, these errors should increase as 1 : 1.3 : 1.6 … but they don’t.
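The arithmetic behind that ratio, as a quick sketch: it assumes the proxy errors are independent and of equal size, so the error of an average of N records scales as 1/sqrt(N). The record counts are the ones quoted above; the dictionary keys are just labels.

```python
import math

# Number of records in each segment, as quoted from the paper.
counts = {"baseline": 12, "long": 7, "Dark Ages": 5}

# Error of each segment relative to the 12-record baseline,
# assuming independent, equal-variance errors: sqrt(12 / N).
base = counts["baseline"]
relative_error = {name: math.sqrt(base / n) for name, n in counts.items()}
```

Under those assumptions the relative errors come out at roughly 1 : 1.31 : 1.55.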

They also say:

Note that our approach assumes that the errors $\varepsilon_{inst}$ and $\varepsilon_{pal}$ are uncorrelated and normally distributed. The first assumption appears reasonable since red noise possibly present in the individual records should have been filtered in $\varepsilon_{pal}$ (note the small changes in the correlation to instrumental temperature between detrended or nondetrended data), and the second is justified for hemispheric means by the central limit theorem.

Hmmm … why would the red noise be “filtered” by correlation weighted averaging?
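One way to probe the question is a small simulation. The sketch below rests on strong assumptions that are not taken from the paper: the noise in each record is independent AR(1) with the same coefficient, and the records are averaged with equal weights. Under those assumptions, averaging shrinks the noise variance but leaves its lag-1 autocorrelation essentially unchanged.

```python
import numpy as np

def ar1(n, phi, rng):
    """Simulate an AR(1) (red noise) series of length n with coefficient phi."""
    x = np.zeros(n)
    eps = rng.normal(size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a series."""
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

rng = np.random.default_rng(42)
phi, n_records, n_time = 0.7, 12, 5000   # hypothetical values
records = np.array([ar1(n_time, phi, rng) for _ in range(n_records)])
average = records.mean(axis=0)

single_r1 = lag1_autocorr(records[0])     # redness of one record
average_r1 = lag1_autocorr(average)       # redness of the 12-record average
variance_ratio = average.var() / records[0].var()
```

In this toy setting the average is about as red as any single record, only smaller in amplitude, so averaging by itself does not filter red noise.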

In any case, I can’t make heads or tails of how they claim to “scale” the proxy average. I’ve tried and tried, but I just can’t see how they can estimate the average in the paleo reconstruction from this as they claim. They’re using the eight points of the weighted reconstruction, and the eight points of the data, to estimate the paleo error … how? As you can see, and as I showed before, the correlation with the first four decades is horrible … how can they get useful information out of that? Here’s the data they’re using …

w.
eduardo zorita

Posted Oct 26, 2006 at 3:01 AM | Permalink

This reconstruction cannot be considered a confirmation of the Hockey-Stick, and actually is much more similar to the Moberg reconstruction. One may not agree with Moberg’s as well, but it is not accurate to say that Hegerl et al confirm the Hockey-Stick.

It is true that the number of degrees of freedom is very small, and the authors are aware of this limitation. The rationale is to focus on the low-frequency variability: if you calibrate with interannual values, the model may not be valid for longer time-scales; if you calibrate with decadal values, the number of degrees of freedom is low. So there is a trade-off here. For this reason, the method (which is actually very simple) was tested with pseudo-proxies in the climate simulation with ECHO-G. Again, one can argue that the pseudo-proxies used here are too tame (just white noise). I would also agree with this, and more complex noise models could be used. We come again to the question of the bad apples, but this is in progress, as I am not aware of a valid and tested noise model for pseudo-proxies yet.
TAC

Posted Oct 26, 2006 at 5:22 AM | Permalink

#34 Eduardo,

I am not aware of a valid and tested noise model for pseudo-proxies yet

points to a fundamental problem with any statistical inference based on pseudo-proxies. In order to calculate a meaningful significance level for a statistical test, one needs a null hypothesis that recognizes the characteristics of the noise. Every competent statistician knows this. Is it really the case that this point escaped notice in climate science?
eduardo zorita

Posted Oct 26, 2006 at 6:32 AM | Permalink

#35
No, it has not escaped attention.
This is a limitation that is present in many other sciences; climate science is not an exception.
Just an example, and please correct me if I am wrong: economic theory has for a long time used the assumption that market agents have perfect information and take rational decisions. Is this a valid assumption? I do not think so, but it makes many problems tractable.
In climatology we are not so smart, and have to start with really simple assumptions, and are always happy to get input from others 🙂
The problem is when those assumptions are forgotten on the way.
Willis Eschenbach

Posted Oct 26, 2006 at 7:04 AM | Permalink

Eduardo, thank you for your comments. You say the number of degrees of freedom is “very small”. But in fact, it seems to be smaller than you realize. While you talk about the effects of autocorrelation on your “fingerprint detection”, you do not mention it when calculating the correlation for your initial correlation weighted reconstruction.

There, you are using decadal data for the period 1880-1960 (8 decades), so you only start out with six degrees of freedom. These are reduced by autocorrelation to the point where not one of your proxies has a correlation with the data which is significantly different from zero …

I would be very interested in your comments on this problem. It is exacerbated by the huge difference in the correlation between the proxies and the first four decades of the instrumental data, versus the correlation with last four decades.

Since the correlation of any given proxy in half the instrumental data is not a predictor of the correlation in the other half of the instrumental data, why should we believe that the proxy is a predictor of temperatures a thousand years ago?

My thanks for your answer,

w.

PS – It is very good that you don’t see this work as a confirmation of the HockeyStick … but you can be sure that the claim will be made by others. In addition, since the choice to use Mann’s bristlecone pine series is unaccountably not disclosed in the paper, it will be claimed as an “independent confirmation” of the HockeyStick.
Steve McIntyre

Posted Oct 26, 2006 at 7:20 AM | Permalink

#37. Eduardo, actually the situation with respect to Mann’s PC1 is not just that its use is not disclosed, the paper misrepresents the situation. You should try to fix this even at this late stage even if it means pulling the paper out of production queue. Right now it’s embarrassing.

If you do this, you should also provide correct descriptions of all the proxy series instead of the inaccurate information now available. I’d be happy to review the descriptions on a non-partisan basis so that any future readers can make sense of it and you can avoid the severe criticism which you’re going to receive.
Steve McIntyre

Posted Oct 26, 2006 at 8:30 AM | Permalink

Here is a plot comparing Willis’ digitization of the "wU.S – Hughes composite" and the Mann and Jones 2003 PC1, smoothed with a 21-year gaussian filter with end-period mean padding. The correlation is 0.9875, further confirming my surmise that the "RCS processed" series “kindly provided by Hughes” is simply Mann’s PC1.
mikep

Posted Oct 26, 2006 at 4:30 PM | Permalink

Re 36, While economists do tend to assume rational agents – in a well defined sense – these days there is plenty of work that dispenses with perfect information. Nobel prizes have been won for work on asymmetric information, where some agents know things that others do not. And perhaps it is no great surprise that some of the simple conclusions that you get if you assume perfect information no longer hold when the assumption is dropped. Asymmetric information can mean, for example, that markets for second hand goods whose quality is not readily observable may fail to develop because potential buyers fear that the only things that people want to sell are those that are defective. Simplifying assumptions help get models and thinking going. But you do need to be careful about over-generalising the resulting conclusions.
eduardo zorita

Posted Oct 27, 2006 at 2:20 AM | Permalink

I will try to answer your questions, but first of all, I should say that proxies are not my
area of expertise, and that what I say may not be completely correct

#37 Willis,
I agree that the number of degrees of freedom is small, but I would not
place much weight on the correlation calculations. Essentially the reconstruction
is proportional to the average over all proxies. As you already stated it does not make
a large difference whether this average is weighted or not, so that the question is
essentially to find a proportionality factor for the dimensionless proxy-average.
This factor can be estimated by simple variance matching to the NHT or, as in Hegerl et al,
by a total least square fit. In any case, the number of samples is small so that you cannot
really make a proper statistical validation. What was done here is to recreate the methodology
in a climate simulation, constructing pseudo proxies, and see if the method can retrieve
the true simulated NHT. In this framework, i.e. with the simple pseudoproxy structure,
it works, but I agree that this is not a hard proof, just a plausibility proof.

Concerning the “confirmation” of the HS, it depends on what one understands by HS.
The HS team asserts that it has been confirmed by other reconstructions. However,
the authors of some of those reconstructions would certainly not agree. The situation is much
further from consensus than one may think.
Actually the abstract reads that the reconstruction displays substantial temperature variability
and is consistent with borehole temperature profiles: exactly the opposite of the HS.
I think that nowhere in the paper is the HS reconstruction supported.

#37 Steve, it seems from your plot that one of the proxies is indeed the MBH PC1.. I will read
all criticism expressed here and I will try to learn from it

#40 ‘But you do need to be careful about over-generalising the resulting conclusions. ‘
Exactly. But reading the conclusions again, I could not find any over-generalization or bombastic
assertions.
Willis Eschenbach

Posted Oct 27, 2006 at 5:14 AM | Permalink

Re #41, Eduardo, thank you kindly for your answer. You say:

#37 Willis,
I agree that the number of degrees of freedom is small, but I would not
place much weight on the correlation calculations. Essentially the reconstruction
is proportional to the average over all proxies. As you already stated it does not make
a large difference whether this average is weighted or not, so that the question is
essentially to find a proportionality factor for the dimensionless proxy-average.

Actually, my concern was not whether the correlation weighting was correct. As I showed, the weighted average is about the same, so it’s not important for that issue. My concern is much more fundamental.

Here are the facts about your paper, and a question that I hope you can answer.

1) None of the reconstructions have a significant correlation with the decadal averaged instrumental record (only 4 had enough degrees of freedom to even determine if they were significant. None of the four that could be measured was significant at p $latex 0.10)

7) Under the same conditions, the r^2 of the proxy average with the second half of the record is r^2 = 0.50, p = 0.18, not significant.

Now, given that lack of correlation between the proxies and the full instrumental record, and given the abysmal correlation of the proxies with the first half of the instrumental record, and given that there is not one single significant correlation between either proxies or average proxies and the instrumental record in all of the tests that I have done … given all that, my question is … why on earth should we believe that any of these are valid proxies for temperature? They don’t match the present, so why should we trust them for the past?

w.
Willis Eschenbach

Posted Oct 27, 2006 at 6:10 AM | Permalink

Eduardo, lest you think I’m picking on your proxies, they may just be a symptom of a larger problem. This is that any single temperature record may not be an adequate proxy for hemispheric temperatures.

To investigate this, I decided to see how well the Central England Temperature correlates with the Jones 30°-90°N hemispheric temperature. The results look like this:

Correlation with first four decades (1880-1920) : r^2 = 0.08, p =0.09, not significant
Correlation with second four decades (1920-1960) : r^2 = 0.27, p =0.002, significant
Correlation with full record (1880-1960) : r^2 = 0.27, p =0.01, significant

Although the correlation with the full record is significant, it is small. In addition, the total lack of correlation with the first half of the test data indicates to me that we can’t trust the CET to give us a good idea of what the hemisphere is doing.
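The split-half check described above can be sketched in a few lines. This is only an illustration, not the calculation behind the figures quoted: the numbers are invented (they are not the CET or Jones series), the helper names `pearson_r` and `split_half_r2` are hypothetical, and no p-value or autocorrelation adjustment is computed.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_r2(proxy, target):
    """Return r^2 for the first half, the second half, and the full record."""
    half = len(proxy) // 2
    first = pearson_r(proxy[:half], target[:half]) ** 2
    second = pearson_r(proxy[half:], target[half:]) ** 2
    full = pearson_r(proxy, target) ** 2
    return first, second, full

# Invented illustrative values, NOT real temperature or proxy data.
target = [0.0, 0.1, 0.0, 0.2, 0.3, 0.5, 0.6, 0.8]
proxy = [0.2, 0.1, 0.1, 0.2, 0.2, 0.6, 0.5, 0.9]

first, second, full = split_half_r2(proxy, target)
print(f"first half r^2 = {first:.2f}")
print(f"second half r^2 = {second:.2f}")
print(f"full record r^2 = {full:.2f}")
```

A large gap between the two halves, as in this toy series, is the pattern being flagged. Note that r^2 discards the sign of the correlation, so it is worth inspecting r as well.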

w.
eduardo zorita

Posted Oct 27, 2006 at 6:28 AM | Permalink

#43 Willis,

Yes, I think that no single record can replicate the NHT. This is why one may take an average of different records, and test whether this average can do the job. In the world of the model, they can, but as I said this depends on assumptions about the noise present in the records.

With just a few samples in the record, as I said, I would not place emphasis on high or low correlations.

I understand your concerns that the sample size is limited, but I also tried to explain the pros and cons of using interannual or decadally averaged records.
Willis Eschenbach

Posted Oct 27, 2006 at 5:08 PM | Permalink

Eduardo, thank you for your response. You say:

Yes, I think that no single record can replicate the NHT. This is why one may take an average of different records, and test whether this average can do the job. In the world of the model, they can, but as I said this depends on assumptions about the noise present in the records.

If we lived in “the world of the model”, that would be fine, but we don’t. The only way to test the proxies is to see if they are correlated with the instrumental record. Models are useless for this, because we don’t know if they can reconstruct the past … that’s why we’re looking at the proxies, after all, because we don’t have data from that earlier time.

Please re-read what I wrote above. The average does no better than the individual proxies at replicating the instrumental temperature. THERE IS NO SIGNIFICANT CORRELATION ANYWHERE!. Not with the individual proxies, not with either of the proxy averages, not with the decadal instrumental averages, not with the annual instrumental data, not with the first half of the instrumental data or the last half of the data. Nowhere. Most of them don’t do any better than a straight line.

Are you willing to use a straight line as your proxy for the past? It does better than your averages, after all. So again I ask the question you have not yet answered.

Since there is no correlation with the instrumental data anywhere, why do you believe that these proxies can tell us anything about the historical record?

w.

PS – a final question, my curiosity only. Why do you believe that the models can successfully reconstruct the past? Do the models show the MWP and the LIA?
Eduardo Zorita

Posted Oct 27, 2006 at 6:12 PM | Permalink

#45
Willis,
In the model we cannot test the proxies. We can only test if 12 artificial
proxies can retrieve the modelled NHT when the same procedure is applied as in the real world. In this case, as with the assumed artificial proxies, the methodology can do it. Of course, we cannot know for sure that it will also work in the real world. As I said, it is plausible.

Concerning the values of the correlations that you calculated (although I repeat again that I am not defending 8 samples as enough) they are not the same as those quoted in Hegerl et al. They are considerably different. In the text I read, e.g. that the correlation between the proxy average and the NH land temperature is 0.97 (0.82 after detrending). You state that you have taken into account autocorrelation to estimate the cross-correlation. Do you think that this could explain these large differences?

On your last questions, yes, definitely, the model ECHO-G simulates a very nice MWP and LIA (see Storch et al 2004). Whether they can accurately simulate the past is a much more difficult question to answer, because even if the models were perfect, which they are not, we do not know the external forcing accurately enough.

What can be done? I see three possibilities:
-claim that we can simulate and predict everything ..
-claim that we cannot simulate nor predict anything..
-try to advance step by step, usually by small steps, and not claim anything, at least not very loudly.

I guess you support option 2. I would support option 3.

How can I include a figure here?
Willis Eschenbach

Posted Oct 27, 2006 at 8:40 PM | Permalink

Eduardo, once again, thank you for your comments.

You must always take autocorrelation into account whenever you are working with series that contain autocorrelation. Otherwise, you can identify trends as being significant when they are not.

The effect of the autocorrelation is to reduce the degrees of freedom, since one data point depends on another. The amount of the reduction depends on the number of points and the amount of autocorrelation.

In your case, you start with 8 points, or 6 degrees of freedom. On ten of the fourteen proxies, autocorrelation reduces this to less than one degree of freedom, so we cannot even calculate the significance. On the other four, the significance is greatly reduced.
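The reduction in degrees of freedom can be sketched as follows. The exact adjustment used for the figures in this comment is not stated, so this assumes one common choice for two AR(1)-like series: n_eff = n (1 - r1 r2) / (1 + r1 r2), where r1 and r2 are the lag-1 autocorrelations. The function names and the eight decadal values are invented for illustration and are not the Hegerl et al. data.

```python
def lag1_autocorr(x):
    """Lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def effective_n(x, y):
    """Effective sample size for correlating x with y.

    Assumed adjustment (not taken from the thread):
    n_eff = n * (1 - r1*r2) / (1 + r1*r2).
    """
    n = len(x)
    prod = lag1_autocorr(x) * lag1_autocorr(y)
    prod = max(prod, 0.0)  # assumption: never let the adjustment inflate n
    return n * (1.0 - prod) / (1.0 + prod)

# Invented decadal values (8 points), for illustration only.
proxy = [0.1, 0.3, 0.2, 0.5, 0.6, 0.9, 1.0, 1.2]
temps = [0.0, 0.1, 0.0, 0.2, 0.3, 0.5, 0.6, 0.8]

n_eff = effective_n(proxy, temps)
dof = n_eff - 2  # two parameters are used up by the correlation
print(f"nominal n = {len(proxy)}, effective n = {n_eff:.2f}, dof = {dof:.2f}")
```

With strongly trending series the effective degrees of freedom fall well below the nominal six, which is the effect described above. Other adjustments exist and can give somewhat different numbers.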

I, like you, support option 3. The way to proceed with 3 is to not accept anything as a proxy until it is thoroughly tested, to choose your proxies according to rules, to be very careful with your processing, and to archive everything in a public archive. Here are some ways to do that.

• In addition to correlation (with the significance adjusted, of course, for autocorrelation), another test we can use is to compare the proxy correlations with the first half and the second half of the instrumental record separately. All of your proxies, as well as both averages, perform poorly on the second half, and abysmally on the first half.

Since this huge difference between the two halves is not affected by autocorrelation, this should be a tip-off that the correlations are random.

• Another test is to compare the correlation of the proxies/observations with the correlation of a straight linear trend line/observations. Only four of your proxies outperformed a straight line … another danger signal.

• Another necessary step is to have clear, a priori rules for proxy selection. Make up these rules before you select the proxies, and show which proxies you considered, and why you have included/excluded any given proxy.

• An additional task is to audit the proxies, identify them clearly, and consider their context. For example, you are using the δ18O proxy for temperature from Fisher et al., who used it for precipitation. Which one is it? And how have you removed the effects of the confounding variable?

• You need to make sure you have independent proxies. You have used the Yamal proxy (which has known problems) twice in your paper, and perhaps the Greenland ice core proxy twice as well.

• At several places in the paper, you say that you have "averaged" proxies. I assume that this is a straight average. From a signal processing standpoint, averaging can introduce spurious "beat" frequencies in the average, particularly if the two series are not precisely aligned. Here’s an example:

While this can occur from misdating, it can also occur from signals that have different power in different frequencies.

• Avoid using proxies that have been "processed" by other people. You have no idea what they have done, whether they have used the right data, or if there are errors in their processing.

• Avoid using proxies that only exist in "gray" form. One of your proxies, according to the paper, is "not available" … if it’s "not available", then don’t use it.

• Identify your proxies clearly and unambiguously. Calling Mann’s PC1 a "RCS processed" series “kindly provided by Hughes” is … well … I’ll let you supply the adjective, but it’s not pretty. Let me just say that, given the knowledge and experience of your co-authors, this description is clearly not accidental.

• Finally, and perhaps most importantly, you should include your proxy data, your instrumental data, and your processing code as Supplemental Online Material, or archive them in a permanent way.
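The point in the list above about "beat" frequencies in an average can be illustrated with two idealized cycles. This is a sketch under stated assumptions: the two series are pure sinusoids with invented periods of 60 and 70 years, and the function names are hypothetical. It relies on the identity sin A + sin B = 2 sin((A+B)/2) cos((A-B)/2), so the average is a carrier whose amplitude swells and fades at the difference frequency.

```python
import math

def average_of_two_cycles(t, period_a, period_b):
    """Straight average of two unit sinusoids with different periods."""
    a = math.sin(2 * math.pi * t / period_a)
    b = math.sin(2 * math.pi * t / period_b)
    return 0.5 * (a + b)

def carrier_times_envelope(t, period_a, period_b):
    """The same average written as a fast carrier times a slow envelope."""
    fa, fb = 1.0 / period_a, 1.0 / period_b
    carrier = math.sin(math.pi * (fa + fb) * t)
    envelope = math.cos(math.pi * (fa - fb) * t)
    return carrier * envelope

PERIOD_A, PERIOD_B = 60.0, 70.0  # invented periods, in years
beat_period = 1.0 / abs(1.0 / PERIOD_A - 1.0 / PERIOD_B)

# Largest disagreement between the two forms over 1000 years.
max_diff = max(
    abs(average_of_two_cycles(t, PERIOD_A, PERIOD_B)
        - carrier_times_envelope(t, PERIOD_A, PERIOD_B))
    for t in range(1000)
)
print(f"beat period = {beat_period:.1f} years, max difference = {max_diff:.2e}")
```

The slow modulation can look like a long cycle that neither input series contains, even though the average is built only from the two original cycles. Real proxies are not pure sinusoids, so this shows the mechanism only.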

This is not an exhaustive list, but it’s a start. You say that you support Option 3 above … but the fact that none of these tests, safeguards, or transparency standards have been used in the paper with your name on it supports Option 1. While it may be painful, my only suggestion is that you either:

1) Convince your co-authors to withdraw the paper until the problems are solved, and the proxies can pass site selection and other statistical tests, or

2) Withdraw your name from the paper, or

3) Watch your reputation suffer because of bad choices that have been made by your co-authors.

I know it’s an ugly choice, and I’m glad I’m not facing it. You have my support whatever choice you make.

All the best to you,

w.

PS – To include an image, it needs to be on the web. Once it is on a web site, include a line exactly like this:

[ tex][ /tex] – there is a space added after the [ in each case so that it will show here.

Replace the part in quotes with your URL to the image, and it will show at that point. I have put the "greater than" and "less than" symbols at the ends of the line in LaTex to prevent it from trying to find an image, and make it simply print the text on the page. You may need to replace them with the actual symbols if a direct copy and paste doesn’t work.

To automate the process, I use an Excel worksheet that I wrote, with macros that allow me to select a document with the usual "open file" dialog box, and then formats it as above, adding the address of my pictures folder on the web and the rest of the line. I have good success with JPEG images, haven’t tried anything else, other formats might work or not.

PPS – I get about the same correlations you get. At times, I report these using R^2, which is smaller than the coefficient of correlation. But the problem generally is not the amount of correlation … it is that the correlation is not statistically significant. Since the p-value of your best proxy is about 0.25, that means one random autocorrelated series in four will produce that value, and thus, you cannot say that the proxy is a valid proxy.

w.
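The statement that "one random autocorrelated series in four will produce that value" describes a Monte Carlo style null test. A minimal sketch of such a test follows, assuming AR(1) noise with a chosen coefficient `phi`. The thread does not say which noise model or procedure was actually used, so the model, the function names, and the target values are all assumptions made for illustration.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ar1_series(n, phi, rng):
    """One AR(1) series of length n, started from its stationary distribution."""
    x = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def monte_carlo_p(target, observed_r, phi, trials=2000, seed=0):
    """Fraction of random AR(1) series correlating with target at least as
    strongly (in absolute value) as observed_r."""
    rng = random.Random(seed)
    n = len(target)
    hits = 0
    for _ in range(trials):
        r = pearson_r(ar1_series(n, phi, rng), target)
        if abs(r) >= abs(observed_r):
            hits += 1
    return hits / trials

# Invented decadal target (8 points), for illustration only.
target = [0.0, 0.1, 0.0, 0.2, 0.3, 0.5, 0.6, 0.8]
p_value = monte_carlo_p(target, observed_r=0.8, phi=0.8)
print(f"estimated p-value = {p_value:.3f}")
```

The answer depends strongly on the assumed `phi` and on the noise model itself, which is the same caveat raised earlier in the thread about pseudo-proxy noise.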
Willis Eschenbach

Posted Oct 27, 2006 at 8:46 PM | Permalink

Blast, Eduardo, the LaTex didn’t work. The line should look like the following, with a $\text{[math]}$ at the end:

img src="http://somesite.com/subfolder/yourpicture.jpg" style="width: 800px; height: 600px;" /

Replace the part in quotes with your URL, put a $\text{[math]}$ at the end, and it should work.

w.
Willis Eschenbach

Posted Oct 27, 2006 at 8:51 PM | Permalink

Don’t know why the LaTex is off the rails. Put a “less than” symbol at the start of the line (shift-comma on a US keyboard) and a “greater than” symbol at the end (shift-period on a US keyboard) of the line. There is a space after the first symbol at the start, and before the second symbol at the end.

w.
Eduardo Zorita

Posted Oct 28, 2006 at 2:52 PM | Permalink

# 48

Willis, thank you for your comments, although I do not agree with many of them. Let us try not to enter discussions about “reputations”; I think this would be a recipe to embroil an otherwise useful discussion.

Some of your comments, although perhaps correct, do not apply here. For instance, the 20th century trends have not been tested for significance.

You wrote:
Avoid using proxies that have been “processed” by other people. You have no idea what they have done, whether they have used the right data, or if there are errors in their processing.

• Avoid using proxies that only exist in “gray” form. One of your proxies, according to the paper, is “not available” … if it’s “not available”, then don’t use it.

Some of your advice is simply not workable. I am not willing to fly myself to Mongolia and cut down trees, measure the tree-rings, get back to my office, run the climate model and see if they agree.. There will always be some pre-processing before you get hold of the data, even if it is simply measuring the tree-rings.

I do think that the proxies do contain information about the climate. I would like to illustrate this point. I have just plotted the annual Northern Hemisphere average temperature from the Climate Research Unit and a simple average of all (standardized) proxies used by Mann et al (1998), 112 in total in the period 1820-1980. This is the result:

I have not selected proxies, nor is there any other preprocessing. Note that I do not claim that this is a valid method to estimate temperatures in the past centuries or that this is a perfect match; there would be other problems to solve. This is just an illustration that my belief is at least reasonable. On top of this you have a myriad of other questions: divergence after 1980, proxy network decimation before 1820, and so forth.. but I think it would not be fair to say that this set of proxies is useless. There is obviously some information in there. The question is to refine this information, try to avoid errors, overinterpretations and so on. One can be successful or maybe not, but one should try.
By the way.. this proxy set is available to anyone. Do you want to try?

In the Hegerl paper there are some more things than just the aspect that you have described. When one estimates the borehole temperature profile that should be observed assuming this past temperature evolution is indeed correct, it matches the observed mean borehole temperature profile. Ok, this is just for the last 500 years that can be resolved in the borehole profiles, but this is another hint that perhaps the reconstruction is not that bad. The borehole data are truly independent from the proxies.

As I said, the methodology that you criticize works fine in the world of a climate model, where a LIA and MWP are simulated and which matches quite a few other regional reconstructions.

So I do think that there are at least some positive aspects in this paper, and I am happy with that. Every paper can be improved, and I just hope that the next one will be a little bit better, of course heeding some of your advice.
Willis Eschenbach

Posted Oct 28, 2006 at 4:35 PM | Permalink

Eduardo, your words are always welcome.

I had said:

• Avoid using proxies that have been “processed” by other people. You have no idea what they have done, whether they have used the right data, or if there are errors in their processing.

• Avoid using proxies that only exist in “gray” form. One of your proxies, according to the paper, is “not available” … if it’s “not available”, then don’t use it.

You replied:

Some of your advice is simply not workable. I am not willing to fly myself to Mongolia and cut down trees, measure the tree-rings, get back to my office, run the climate model and see if they agree.. There will always be some pre-processing before you get hold of the data, even if it is simply measuring the tree-rings.

My apologies for my lack of clarity. “Processed” does not mean that you should fly to Mongolia. That is “generating” the data, not “processing” the data. What I meant is that data attributions like this:

two sites generated by Lloyd and Graumlich (1997), analyzed by Esper et al. (Boreal and Upper Wright), and provided by E. Cook.

and this:

an RCS processed treering composite used in Mann et al. (1999), and kindly provided by Malcolm Hughes

mean that you are exposed to the consequences of any errors that the people who “processed” the data might have made along the way.

Thank you for your graph of the “MBH98 proxy average” vs the CRU temperature, which I reproduce here for discussion.

There are three very large problems with your graph. One is that you show CRU temperatures extending back to 1847, whereas the CRU dataset only extends back to 1850. This is one of those errors that in itself may mean little, but may also indicate serious hidden problems.

The second is that you describe this as:

a simple average of all (standardized) proxies used by Mann et al (1998), 112 in total in the period 1820-1980. … I have not selected proxies, nor is there any other preprocessing.

However, although Mann used 112 datasets during the period, only 81 of these were proxies. The other 31 were Principal Components. So there is a huge amount of “other preprocessing” in the data you are using. See here for details.

Third, although as you say you have not selected proxies, Mann et al. selected proxies.

I would be interested in seeing a corrected graph containing only proxies, but the fact that Mann very carefully selected those proxies renders the exercise useless. Mann picked those proxies for his particular ends, excluding proxies that did not give the desired result. This is why, as I stated above, it is critical to have a priori rules for proxy selection.

None of this means, however, that I do not believe that proxies have value. They do. However, each individual proxy needs to be carefully examined on its own merits, using clearly defined a priori rules for proxy selection. Statistical tests need to be applied, including adjustments for autocorrelation, to determine whether the correlation of each proxy with the observations is significantly different from zero.

For example, before reading your most recent post, I got to thinking last night … why did you only use the 1880-1960 temperature dataset for your reconstruction, when the data goes back to 1850? So I took a look at the results of using all available instrumental data for verification, rather than the shortened set that you used. Here are the results:

A couple things of note.

First, I tried three averages, using raw, R^2 weighting, and correlation weighting. These made very little difference to the results.

Second, I used the more modern (HadCRUT3) temperature data. The correlation of Jones and HadCRUT3 decadally averaged temperature data from 30°-90°N is 0.99 during the overlap period.

Third, the correlation of the averages with the observations drops from 0.96 using the 1880- period, to 0.85 using the 1850- period. This drop in correlation as soon as we move out-of-sample is a huge red flag. This is why calculating the significance of the results is so important, because of spurious correlation. Although the correlations are good, the odds of them being this good by chance are huge, about one in three (p = 0.30, a long ways from statistical significance) … and the out-of-sample results indicate that this well may be happening.

My best to you,

w.
cytochrome_sea

Posted Oct 28, 2006 at 5:34 PM | Permalink

Another small nitpick, Willis, are you sure about the graph showing CRU back to 1847? It looks like it begins ~1855 or 1856?
Steve McIntyre

Posted Oct 28, 2006 at 8:01 PM | Permalink

Here’s a plot of the scaled average of all 415 MBH proxies in the instrumental period

Eduardo, I doubt that it’s your view that the average of these scaled proxies contains information about 20th century climate.
Willis Eschenbach

Posted Oct 28, 2006 at 8:39 PM | Permalink

Cytochrome Sea, you’re right. I was reading the midpoint as 1850, when it’s actually 1860.

w.
eduardo zorita

Posted Oct 30, 2006 at 5:33 AM | Permalink

Willis, Steve,

One can reasonably argue that some of the proxies are precipitation proxies, so that in an all-average they should not be included (probably not in regression either..). You know the individual proxies much better than I do, so I have plotted the following (without cherry picking, trust me..).
From the list containing 112 indicators I have excluded precipitation proxies, all PCs, all tree lines. I have kept proxies 1-32 and 52-62: individually standardized, averaged, rescaled to the NHT variance in 1900-1980.

Willis, thank you for pointing to the error in my previous figure. It was just a minor plotting error, as I typed that the series should start in 1800 instead of 1820, so that they were both somewhat stretched.

To this figure (I hope the attachment worked properly and I did not make any error, please check): again the match is perhaps not perfect, there are some differences from the previous 112-average, and also the network is greatly diminished back in time and the method would perhaps not work for longer periods with these proxies, but don’t you think there is a skill here that could somehow be exploited?
eduardo
BradH

Posted Oct 30, 2006 at 7:28 AM | Permalink

Re: #55

Eduardo,

Before you engage any further on this issue, I think there is a more compelling post which requires your attention:-

Potential academic misconduct by the euro team

Unless Steve has made a mistake, or you have a doppelganger, you are listed as a co-author on the paper discussed.
UC

Posted Oct 31, 2006 at 2:28 AM | Permalink

#55

I have kept proxies 1-32 and 52-62: individually standardized, averaged, rescaled to the NHT variance in 1900-1980.

Individually rescaled to NHT variance, or rescaled after averaging?

but don’t you think there is a skill here that could somehow be exploited?

Maybe. But as the proxies don’t tell what happened during 1980-present, I find it very doubtful. Tried this with Osborn 2006 proxies:

1) Rescale after averaging: Amazing match 1902..1980, then diverges.

2) Rescale before averaging: No match at all.

Maybe OT, but I think originally the term spurious correlation is from Pearson’s On a form of spurious correlation which may arise when indices are used in the measurement of organs. Proceedings of the Royal Society of London, 60, 489-497. (Available at http://www.pubs.royalsoc.ac.uk/)

By index he means:

If the ratio of two absolute measurements on the same or different organs be taken it is convenient to term this ratio as index

Well, this doesn’t directly apply to this case, but I think scaling after averaging can also bring up spurious correlations. Maybe someone more statistically oriented can tell if it is true or not.
Willis Eschenbach

Posted Oct 31, 2006 at 3:40 AM | Permalink

Re 55, Eduardo, thank you for your graph.

Is there useful information in these proxies? Possibly … possibly not. As UC notes, it may make a difference whether you standardized before or after averaging.

The main difficulty I see is that, although the match during the 60 years from 1920-1980 is passable, the match during the previous 70 years, from 1850-1920 is not only bad, it runs in the opposite direction to the instrumental data. Temperatures rose from 1850-1920, while proxy temperatures dropped during the same period. When the proxies say decreasing temperatures for 70 years during which the temperature increased, how much trust can we place in them?

This same pattern (correlations are much better post 1920 than pre 1920) occurs in your study as well. Doesn’t this bother you at all?

Finally, while I do trust you when you say you are not doing any cherry picking, you have ignored my comment above that there’s no need for you to do any cherry picking, because Mann has already done the cherry picking for you.

Is there useful information in proxies? Quite possibly … but until you follow all of the precautions I listed above regarding data integrity, a priori selection rules, and statistical checking, we won’t know, because it’s just cherry picking.

I would think that the very first step in trying to use proxies to recreate some given temperature (like say temperatures from 30N-90N, as in your case), would be to look at the air temperatures of the world, to see how well they correlate with 30-90N temperatures. Here’s the correlation:

Note that, for the majority of the locations of the proxies you used, the correlation is less than 0.5. In fact, you would have done just as well to use proxies from Central Africa or Brazil … the reality seems to be that the 30°-90°N temperature is not well correlated with much of anything. So you have an additional hurdle to pass, showing that your proxies can replicate the 30-90 temperatures better than proxies from Central Africa, Brazil, and Antarctica …

All the best,

w.

PS – the allegation against Steve M. in your paper is unscientific, unprofessional, unpleasant, and more to the point, untrue. To unjustly accuse him of not making his code available, when the same accusations are true about some of your co-authors but are not made in your paper, is … well, I’ll let you fill in the adjective. I invite and strongly encourage you to make your views known about this matter to this forum, as your silence will certainly be interpreted (rightly or not) as complicity in the false accusation.
UC

Posted Oct 31, 2006 at 12:53 PM | Permalink

Simple averages of Osborn et al 06 are in here. Crude and simple method, but the results are interesting. However, I would need a model for the proxy-temperature relation before I can conclude anything from those.
Eduardo Zorita

Posted Oct 31, 2006 at 1:41 PM | Permalink

#58,59

I do not fully understand what do you mean by “scaling after averaging”.

Do you mean by that “calculating the average of the raw proxy timeseries, some of them expressed in mm tree-ring width, others in delta O18, others in Ca/Mg ratios, and then scaling the result to the NHT?” If this is correct, it would not make sense: the result would be dominated by the proxy which happened by chance to have the largest variance because of the units used to store them.

What does scaling before averaging mean? Scaling each proxy to the NHT temperature and then average? This would not make sense either, since the variance of the result would be dependent on the number of proxies used.

I think the right procedure is to make the proxies dimensionless first, i.e. standardize them to unit variance individually, then average and then re-scale to NHT.

But perhaps I misunderstood something.
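A minimal sketch of the standardize, average, then re-scale procedure described above, in plain Python. The function names and the toy numbers are illustrative assumptions, not data or code from any of the papers discussed, and the whole series is used as the calibration period for simplicity.

```python
from statistics import mean, pstdev

def standardize(series):
    """Return the series with zero mean and unit variance (dimensionless)."""
    m, s = mean(series), pstdev(series)
    return [(x - m) / s for x in series]

def composite_then_rescale(proxies, target):
    """Standardize each proxy, average them year by year,
    then match the mean and variance of the target series."""
    z = [standardize(p) for p in proxies]
    comp = [mean(vals) for vals in zip(*z)]
    cm, cs = mean(comp), pstdev(comp)
    tm, ts = mean(target), pstdev(target)
    return [tm + (c - cm) * ts / cs for c in comp]

# Toy proxies stored in very different units (hypothetical values).
ring_width_mm = [0.8, 1.1, 0.9, 1.4, 1.3]
delta_o18 = [-30.2, -29.5, -30.0, -28.9, -29.1]
nht = [-0.3, 0.0, -0.2, 0.3, 0.2]

recon = composite_then_rescale([ring_width_mm, delta_o18], nht)

# Storing ring widths in micrometres instead of millimetres changes nothing,
# because each proxy is made dimensionless before averaging.
recon_um = composite_then_rescale([[1000 * x for x in ring_width_mm], delta_o18], nht)
```

Because each series is standardized first, the result does not depend on the units a proxy happens to be stored in, which is the objection raised against averaging the raw series.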

#58,
Willis, I think your argument that the low correlation between NHT and local records hinders a reconstruction of the NHT is not correct. Consider the following example. The local correlation, according to you, is low everywhere. However, if I had proxies covering the whole grid, each of them representing local temperature 1 to 1, I could retrieve the NHT perfectly, since NHT would be just the average of those proxies. This contradicts your argument. The question is how much is lost when the number is reduced from 112 and when each proxy does not represent local temperature 1 to 1. It is not a substantial question, as you presented it, it is a quantitative question.
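The thought experiment above can be sketched with synthetic data. Everything here is an assumption made for illustration: 400 gridcells, 150 years, a weak shared signal (standard deviation 0.2) buried in strong local variability (standard deviation 1.0), and 14 as the size of the sparse network.

```python
import random
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cell_average(cells, k):
    """Year-by-year average of the first k gridcell series."""
    return [sum(col) / k for col in zip(*cells[:k])]

rng = random.Random(0)
n_years, n_cells = 150, 400

# Each gridcell = weak common signal + strong independent local noise.
common = [rng.gauss(0.0, 0.2) for _ in range(n_years)]
cells = [[c + rng.gauss(0.0, 1.0) for c in common] for _ in range(n_cells)]

# The "hemispheric" temperature is, by definition, the mean of all cells.
nht = cell_average(cells, n_cells)

mean_local_r = sum(pearson(c, nht) for c in cells) / n_cells
r_all = pearson(cell_average(cells, n_cells), nht)  # perfect 1:1 coverage
r_14 = pearson(cell_average(cells, 14), nht)        # sparse network

print(f"mean local r = {mean_local_r:.2f}, r(14 cells) = {r_14:.2f}, r(all) = {r_all:.2f}")
```

In this toy setup every individual cell correlates poorly with the mean, yet the full set recovers it exactly; what a sparse subset loses is a matter of degree, which is the quantitative question posed above.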
UC

Posted Oct 31, 2006 at 2:05 PM | Permalink

Do you mean by that “calculating the average of the raw proxy timeseries, some of them expressed in mm tree-ring width, others in delta O18, others in Ca/Mg ratios, and then scaling the result to the NHT?”

No. I mean scaling each proxy to the NHT and then average.

This would not make sense either, since the variance of the result would be dependent on the number of proxies used.

That’s why I was looking for proxy-temperature relation. In which case is re-scaling after averaging OK?

Let’s assume proxy represents local temperature 1 to 1. In this case fewer proxies, larger variance. Variance is dependent on the number of proxies. Kind of sampling. I wouldn’t try to re-scale.
Paul Penrose

Posted Oct 31, 2006 at 2:13 PM | Permalink

Eduardo,
Thanks for taking the time to post here, I really appreciate it. But I must say that I’m shocked that you put so much emphasis on correlation alone. Of course it’s axiomatic that correlation does not prove causation, and in fact it’s a long road to causation, but it feels like you and your colleagues are trying to take a short-cut. There are many time-proven protocols that can help prevent spurious correlation, and many proven statistical tests to identify the ones that slip through. Yet it appears that very little time and effort has been expended in these areas. Indeed, novel statistical approaches seem to be generated ad hoc with no vetting at statistical journals or even review by qualified statisticians. None of this inspires confidence and nothing you have said so far changes this impression in my mind.

Please don’t take this as an attack or an insult, but as something to think about. There are many branches of science that must analyze time-series data and the potential pitfalls are many and well known. I keep wondering why, with all this expertise out there, do climatologists (especially the paleo type) insist on doing it all themselves? Why don’t they consult with some of these other experts? I’ve even seen quotes from some of the more famous among them that their field is somehow special and that they can violate some of the fundamental rules, like cherry-picking of data. Clearly this is fallacious, the source of the data does not matter, but when I see this kind of stuff I have to think that critical analysis from statistical experts is sorely needed.
Eduardo Zorita

Posted Oct 31, 2006 at 3:03 PM | Permalink

#61

UC,

I think your approach (first scale to NHT and then average) is not consistent. Why should a single proxy be rescaled to the NHT temperature? It would be more consistent to rescale the proxy to its local temperature and then average. A single proxy cannot represent the NHT. It can only represent, at best, the local temperature, unless there were “spooky actions at a distance”.

#62. Paul, I agree completely with you. There has been a focus on correlations, probably for historical reasons and probably lack of competence in my case. I feel there is a problem here but it can only be solved slowly. I am not trying to show that we (climatologists) are superior in any sense (I would read RC in that case). I am trying to illustrate that, in my opinion, a negative, pessimistic view reigns here, which I do not share.
Are there bad papers? Sure. Do people make unsupported claims? Of course. Should these be disclosed? No doubt. This being said, it is also true that it is usually much easier to demolish than to construct. I would be very interested to see how a team of professional timeseries analysts tackle some of the questions in paleoclimate and find a solution.
Dane

Posted Oct 31, 2006 at 3:29 PM | Permalink

Eduardo,

I did a little time series work as an undergrad about 10 yrs ago. I recall reading several papers dealing with paleoclimate and the Earth’s orbital parameters. These papers dealt with time series of paleoclimate data, and then the resulting waveforms were analyzed using spectral analysis in order to “pull” out the strongest signals. You may be able to find a few by googling for it; if not, let me know and I will try to find them, as I find the Earth’s orbital parameters fascinating.
welikerocks

Posted Oct 31, 2006 at 6:05 PM | Permalink

RE: # 64

Here’s a paper:
http://tinyurl.com/yclaem
Journal of Climate
Article: pp. 2369–2375
An Orbitally Driven Tropical Source for Abrupt Climate Change*
Amy C. Clement, Mark A. Cane, and Richard Seager
Lamont–Doherty Earth Observatory, Palisades, New York

Paleoclimatic data are increasingly showing that abrupt change is present in wide regions of the globe. Here a mechanism for abrupt climate change with global implications is presented. Results from a tropical coupled ocean–atmosphere model show that, under certain orbital configurations of the past, variability associated with El Niño–Southern Oscillation (ENSO) physics can abruptly lock to the seasonal cycle for several centuries, producing a mean sea surface temperature (SST) change in the tropical Pacific that resembles a La Niña. It is suggested that this change in SST would have a global impact and that abrupt events such as the Younger Dryas may be the outcome of orbitally driven changes in the tropical Pacific.
Willis Eschenbach

Posted Oct 31, 2006 at 10:08 PM | Permalink

Re #60, Eduardo, you say:

Willis, I think your argument that the low correlation between NHT and local records hinders a reconstruction of the NHT is not correct. Consider the following example. The local correlation, according to you, is low everywhere. However, if I had proxies covering the whole grid, each of them representing local temperature 1 to 1, I could retrieve the NHT perfectly, since NHT would be just the average of those proxies. This contradicts your argument. The question is how much is lost when the number is reduced from 112 and when each proxy does not represent local temperature 1 to 1. It is not a substantial question, as you presented it, it is a quantitative question.

Pardon my lack of clarity in writing. I meant that it was a quantitative question. I was hoping that there would be some areas of the planet that had a high correlation between the local and global temperature, so we could look for proxies there.

However, there doesn’t appear to be such an area. As you point out, if we had a 1:1 proxy for each gridcell used from 30-90N, we could reconstruct the temperature perfectly. The problem is two-fold. One is that there are 1,728 gridcells that make up the 60-90N temperature. The other is that proxies tend to have fairly low correlations with the local temperature … after all, trees are not thermometers.

If we have a proxy with a 0.2 correlation with a local temperature, and the local temperature in turn has a 0.4 correlation with global 30-90° temperature, we’re starting from a long ways back. At that point, any high correlation between the proxy and the global temperature has to be viewed with great suspicion. After all, if the local air temperature only has a correlation of 0.4 with the global temperature, why should the proxy be better?
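As a rough worked version of the arithmetic above, assume (this is an illustrative assumption, not something established in the thread) that the proxy P responds only to local temperature L, i.e. P = aL + ε with a > 0 and ε uncorrelated with both L and the 30-90° mean G. Then the two correlations multiply:

```latex
\rho_{PG}
  = \frac{\operatorname{cov}(aL+\varepsilon,\,G)}{\sigma_P\,\sigma_G}
  = \frac{a\,\sigma_L}{\sigma_P}\cdot\frac{\operatorname{cov}(L,G)}{\sigma_L\,\sigma_G}
  = \rho_{PL}\,\rho_{LG}
  = 0.2 \times 0.4 = 0.08
```

Under that assumption, a proxy-to-global correlation well above 0.08 would have to come from something other than the local temperature link.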

That is why we need to look at in-sample and out-of-sample correlations, as well as sub-sample correlations, before we conclude that a high correlation is meaningful. It is also why examining the significance of the correlation is so critical.
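A minimal sketch of such a sub-sample check, assuming nothing more than two equal-length annual series. The numbers are made up for illustration: the toy proxy tracks the target only through a shared late trend.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_correlations(proxy, target):
    """Correlation over the full period and over each half separately."""
    half = len(target) // 2
    return {
        "full": pearson(proxy, target),
        "early": pearson(proxy[:half], target[:half]),
        "late": pearson(proxy[half:], target[half:]),
    }

# Hypothetical series: no relationship early, shared upward trend late.
target = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
proxy = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 1.3]

result = split_correlations(proxy, target)
print({k: round(v, 2) for k, v in result.items()})
```

Here the full-period correlation is high only because both series trend upward in the second half; the first half on its own shows no positive relationship, which is the kind of pattern the sub-sample test is meant to expose.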

Now since:

• none of your proxies, nor either of the averages of the proxies, has a significant correlation with the 30-90° instrumental temperature, and

• none of your proxies, nor either of the averages of the proxies, has a correlation greater than 0.05 with the first half of the 30-90° temperature, and

• all of your proxies, and both of the averages of the proxies, do worse when we look at 1850-1960 rather than 1880-1960, and

• you have not provided any data to establish that the proxies are correlated with local temperatures, and

• 10 of the 14 proxies do not have a correlation greater than a simple straight line with the 30-90° temperatures …

… why should we believe that we are looking at anything other than random correlations here?

w.
UC

Posted Nov 1, 2006 at 2:10 AM | Permalink

#63

A single proxy cannot represent the NHT. It can only represent, at best, the local temperature, unless there were “spooky actions at a distance”.

You mean teleconnections?
Paul Penrose

Posted Nov 1, 2006 at 9:14 AM | Permalink

Eduardo,
Your comments continue to astonish me; your honesty is truly breathtaking compared to some of your peers. I agree that it can be easier to find flaws in someone else’s work than do original work yourself. Nonetheless, this is a crucial aspect of the scientific method and I tend to agree with Feynman in this case: a scientist must make every effort himself to disprove his own work before publishing, and to disclose anything which might cast doubt on his conclusions when he does. The scientific method when practiced correctly is like a crucible where all the irrelevancies, guesses, and unsupportable assumptions are burned away to hopefully reveal the truth. This can be an ego-bruising process, but if one is going to engage in science one should learn to disentangle their ego from their work.

So my view is that dendroclimatologists have gotten so far off the path that it’s going to take more than a gentle nudge to get them back. It’s my opinion that as a group they will have to be dragged kicking and screaming, otherwise they will just continue to drift off into denser jungle. This is especially important considering the enormous policy implications of their work.
UC

Posted Nov 1, 2006 at 11:52 AM | Permalink

#63

I think your approach (first scale to NHT and then average) is not consistent.

Yes, it didn’t work. I assumed that those proxies measure Global Temperature plus low noise multiplied by some unknown scale factors.

It would be more consistent to rescale the proxy to its local temperature and then average.

Yes! And no scaling after that. But that is something you didn’t do in #55, right? And your approach in #55 (average standardized proxies and then rescale to global temperature) didn’t work either, as the 1980-present reconstruction went wrong. So, there must be something wrong with the scale-after-averaging method. 0-0. My guess: if you have low SNR, variance matching just matches the noise to the calibration signal. And there goes the signal part of the reconstruction.
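The guess about low SNR can be illustrated with a toy simulation. The setup is an assumption made for illustration: a single composite equal to 0.3 times the true temperature plus unit-variance white noise, variance-matched to the temperature over the whole record.

```python
import random
from statistics import mean, pstdev

def slope(y, x):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)
n = 2000
snr = 0.3  # assumed ratio of signal amplitude to noise standard deviation

temp = [rng.gauss(0.0, 1.0) for _ in range(n)]
proxy = [snr * t + rng.gauss(0.0, 1.0) for t in temp]

# Variance matching: rescale the composite so its variance equals the target's.
p_mean, t_mean = mean(proxy), mean(temp)
scale = pstdev(temp) / pstdev(proxy)
recon = [t_mean + scale * (p - p_mean) for p in proxy]

# How much of a 1-degree temperature change survives in the reconstruction,
# and what fraction of the reconstruction's variance is not temperature.
signal_gain = slope(recon, temp)
noise_share = 1.0 - signal_gain ** 2

print(f"signal gain = {signal_gain:.2f}, noise share of variance = {noise_share:.2f}")
```

The reconstruction has exactly the target variance by construction, but in this sketch most of that variance is rescaled noise, and the temperature signal comes through at only a fraction of its true amplitude.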

…This being said, it is also true that it is usually much easier to demolish than to construct.

If nothing ever is demolished we will drown in BS.

I would be very interested to see how a team of professional timeseries analysts tackle some of the questions in paleoclimate and find a solution.

That would be interesting. Seminar on applied statistics or something, I would buy the proceedings!
epica

Posted Nov 1, 2006 at 5:12 PM | Permalink

#63 Eduardo, surely you are joking. Climatologists probably are not perfect, but I haven’t seen a more arrogant, more biased bunch of Schweinchen schlau (cuchinillos listos) than the majority posting here.
Steve Sadlov

Posted Nov 1, 2006 at 5:18 PM | Permalink

RE: #70 – TROLL ALERT!
Eduardo Zorita

Posted Nov 4, 2006 at 10:52 AM | Permalink

#69

No, dear Epica, I do not think that “the others are even worse” (which may be true or not) justifies anything.

Quien esté libre de pecado que tire la primera piedra… (Let whoever is free of sin cast the first stone.)

For those interested in this thread

http://news.bbc.co.uk/1/hi/sci/tech/6115644.stm
John A

Posted Nov 4, 2006 at 11:35 AM | Permalink

Here’s a blast from the past.

On the “Stop Climate Chaos” website (they’re having a demonstration today which looks a lot like the CND “mass die-ins” of the 1980s) we have this as fact #1

The 1990s was the warmest decade, and 1998 the warmest year on global record (Intergovernmental Panel on Climate Change, IPCC).

So when we are told that the Hockey Stick “doesn’t matter” and that “the science is settled/done/unquestioned/over”, we should keep reminding alarmists that despite all that has transpired in the last five years, the claims of the Hockey Stick are still reported as unquestioned fact.

Noticeably Mike Hulme (in the article that Eduardo linked to) appears to believe that current climate change is rapid and unprecedented, which is yet another reference to the stable climate myth promoted by the Hockey Stick. When was climate change ever NOT happening?

I think Hulme doth protest too much about exaggeration.
MarkR

Posted Nov 4, 2006 at 11:35 AM | Permalink

Re#72 Interesting that these comments come from Professor Hulme of Environmental Sciences at the University of East Anglia, also the home of Briffa and Jones.

Perhaps Hulme can see that some have gone much too far out on a limb? “When the bough breaks the cradle will fall.”

Link
TAC

Posted Nov 4, 2006 at 11:58 AM | Permalink

#72 Eduardo,

First, thank you for the link. I find the politics and communications elements of the global warming issue nearly as fascinating as the science.

Along these lines, I appreciate, particularly in our post-modern world, your citing scripture (“El que este libre de pecado que tire la primera piedra”) in response to criticism of your work. Brilliant!

Which leads to some questions: What is it going to take to get the climate science community to recognize the significance of SteveM’s criticisms of existing climate reconstructions? Post-NRC, post-Wegman, why has no one produced an unbiased and reproducible (in every sense of the word) climate reconstruction with statistically defensible uncertainty estimates?
Eduardo Zorita

Posted Nov 4, 2006 at 6:36 PM | Permalink

#75
Let us assume for the moment that all the criticism raised by McIntyre is justified. The name of this site is climateaudit, right? Would you expect that the Association of Industrial Manufacturers would award a medal to Standard & Poor for downgrading the credit rating of the whole industry, even provided they have done an excellent job? No, they would award them a medal when they set up a factory and produce better products. This is the continuous tension of this site.

How long will it take? It will surely take longer than 3 months. Consider that in some journals, e.g. Journal of Climate, the delay between submission and eventual publication may amount to 2 years. Internet journals such as Climate of the Past are a new experiment, the outcome of which is still uncertain.

Do you know why Steve seems to know everything?
http://en.wikipedia.org/wiki/McIntyre
Armand MacMurray

Posted Nov 4, 2006 at 7:16 PM | Permalink

Would you expect that the Association of Industrial Manufacturers would award a medal to Standard & Poor for downgrading the credit rating of the whole industry

Perhaps not. Unfortunately, by not doing so they expose themselves to future anger and mistrust from the general population once the true situation becomes clear. Not every company is an Enron or Computer Associates, but their industries have suffered because of those fiascos, and suspicions linger that the other companies are similarly up to no good.
TAC

Posted Nov 4, 2006 at 8:02 PM | Permalink

#76 Eduardo, it is good to hear from you and I thank you for replying.

I have to admit that I am having trouble interpreting your metaphor:
Would you expect that the Association of Industrial Manufacturers would award a medal to Standard & Poor for downgrading the credit rating of the whole industry, even provided they have done an excellent job?
My first reaction is: What? Steve is a complete outsider; he bears no resemblance to Standard & Poor’s.

However, to be constructive, I would like to offer an alternative metaphor (forgive me, Steve): Steve is the boy in the “Emperor’s New Clothes,” who sees clearly and has the courage (foolhardiness?) to speak out. I think there may be an important message in that tale for us now: To prevent further harm and humiliation to our Emperor (the climate science that we are all a part of — we’re in this together), we’d better start listening carefully to what “the boy” has to say.
Ken Fritsch

Posted Nov 4, 2006 at 8:17 PM | Permalink

Re: #76

McIntyre, or MacIntyre, is a Scottish surname derived from the Gaelic Mac an t-Saoir literally meaning “Son of the Carpenter”.

I think Steve M generally prohibits religious references on his blog, but maybe he’ll let this one pass.
bender

Posted Nov 4, 2006 at 11:07 PM | Permalink

Do you know why Steve seems to know everything?

Or maybe it’s his keen eye, good memory for shapes, good head for numbers, his objectivity, and his bravery in putting his credibility on the line while communicating his unpublished findings to us.
Jean S

Posted Feb 14, 2007 at 9:03 AM | Permalink

Hegerl et al has been published with a scary last page number 😉
Gerald Machnee

Posted Feb 14, 2007 at 9:46 AM | Permalink

The study is available here:
http://www.nicholas.duke.edu/people/faculty/hegerl/hegerletal_scaling_inpress.pdf
We await an audit.
Steve McIntyre

Posted Feb 14, 2007 at 10:30 AM | Permalink

Before working on these data sets, I like to try to approximate the data as used. As noted before, I could pretty much guess the series used by Hegerl (and made rather a good prediction). There are two series where Hegerl et al created novel versions of well-worn series for Mongolia and Yamal/Urals, based on the description in the JClim article. I originally tried to get data from Hegerl et al in fall 2005 as part of the IPCC review process, but was refused at the time both by Hegerl and IPCC. In summer 2006, I asked for the assistance of Ralph Cicerone, since the NAS panel had used the results. He refused. I asked Gerry North to request this and other data; he said that he would, but I never heard anything more from him.

On Dec 5, 2006, I contacted Hegerl again, this time specifically mentioning the Mongolia/Urals versions.

Dear Dr Hegerl, in your recent article, you describe the collation of various sites to obtain new versions for the Urals area and for Mongolia. Could you please provide me with exact citations for the data used to construct these versions. Could you also provide me with the versions of the 14 series in your Table 1 as used. Thanks, Steve McIntyre

She promptly replied that the “references should all be in the paper” and provided smoothed and transformed versions of the series up to 1960 as done by Crowley, pleasantly saying “Please let me know if there are any problems.”

Unfortunately this did not respond to my request and so I replied:

The original paper does not provide an exact data citation for the Urals and Mongolia series that I asked about. It says:

“Mongolia: this is from the D’Arrigo et al. (2001) study. However, the full composite illustrated in this paper is not available. We reconstructed the composite from nine records from tree ring sites sent to the NGDC sites. The early growth part of the treering series from overlapping records was removed without further removal of low-frequency variability.

w. Siberia: in order to avoid any heavy biases of the mean composite by a number of sites from one region, the west Siberia time series is a composite of three/four time series from this region: two “polar Urals” records east of the Urals – Yamal (Briffa et al. 1995) and Mangazeja (Hantemirov and Shiyatov 2002 – both by way of Esper et al.) and two records from west of the Urals (Hantemirov and Shiyatov 2002). The records from each side of the Urals were first averaged and then combined for the w.Siberia.short composite; the w.Siberia.long composite involved Yamal and the west.Urals composite. The sites from Esper have been RCS processed.”

I can’t tell what you used from this information or what you did. Can you provide details on these series – preferably in the form of exact digital citations (e.g. according to AGU data citation policies). The data that you sent has been smoothed and truncated. Do you have the unsmoothed untruncated versions that preceded these? Thanks, Steve McIntyre

Hegerl replied saying that Tom might be able to help but that he had had a disk failure lately so it might not be instantaneous.

As it happened, I had also had a disk failure that weekend so I was not unsympathetic to the problem. I could also assess exactly what was involved in disk recovery – it took me one day in the shop and $70 to recover my disk. On Dec 21, I sent a pleasant reminder which was unacknowledged.

A couple of days ago, I sent another reminder, saying that I would seek the intervention of the journals otherwise. Hegerl replied that she had sent the “reconstruction and uncertainty ranges, as well as the individual records used and that the method the individual records are composed by are described in the paper”, and that she had asked Tom to provide additional information, concluding that “It would be a bit of a stretch though to claim that we have not provided data.”

Later, Crowley said that he would have to “dig into the files” and could not do so until Feb 23, due to preparing for two proposals and for a trip.

So I’ve had to pick up this file about 15 times and still am waiting for the simple information on the Mongolia and Urals series. I never did get the Mongolia version used by Esper.
Ken Fritsch

Posted Feb 14, 2007 at 10:45 AM | Permalink

Re: #82

I always have a chuckle when I see the rush to the obligatory deference to AGW in these papers and particularly so when the results reported in the paper might be taken as evidence against or for a lesser case of AGW. We see this early in this paper in the abstract and a page or two into the main body of it:

High variability in reconstructions does not hamper the detection of greenhouse gas induced climate change, since a substantial fraction of the variance in these constructions from the beginning of the analysis in the late 13th century to the end of the records can be attributed to external forcing.

..Since the 20th century trend stands out less from trends in previous centuries in reconstructions with higher variance, high variance reconstructions have sometimes been used to question the importance of anthropogenic forcing. However, natural influences on climate, such as changes in volcanism and possibly solar radiation, are responsible for a substantial fraction of past climate variations (e.g., Robock and Free 1995; 2000; Hegerl et al. 2003; Bertrand et al. 2002; Weber 2005, Stendel et al., 2005). By quantifying the influence of external forcing on these high variance reconstructions, we find that high variability does not prevent confident detection and attribution of anthropogenic climate change.
Gerald Machnee

Posted Feb 14, 2007 at 10:45 AM | Permalink

So this study has been printed without proper archiving?
Here we go again.