Just When You Thought

Just when you thought that there was nothing left to say about MBH98, look at this MBH98 "flavor" that the cat dragged in today. This is a MBH-flavor reconstruction using MBH methodology and MBH proxies, following one of the NAS panel suggestions!! What did I do?

Figure 1. A New MBH98 Flavor

Before I say what I did, let me explain why I’m doing another MBH flavor at all. Surprisingly, it seems that the Wegman Report and NAS Panel have not driven a silver spike through MBH and devout climate scientists still believe in it. Here’s where the wiggle room is left.

The Wegman Report did not discuss Wahl and Ammann; Jay Gulledge of the Pew Center, who was one of the panelists with me at the second House E&C hearing, strongly criticized them for that, believing that Wahl and Ammann had somehow bailed out MBH – by arguing that the PC error didn’t "matter". Of course, he didn’t criticize the NAS panel for not trying to replicate Wahl and Ammann.

The NAS panel was simply schizophrenic. As we noted before, they condemned the use of substandard materials in bridge construction, but then cited designs using substandard materials. They cited Wahl and Ammann as somehow showing that Mann’s PC errors did not "matter" without reflecting that the entire rationale of Wahl and Ammann was how to include substandard materials in a reconstruction (bristlecones).

One of the Wahl and Ammann arguments which has got a lot of traction in the climate science community is that they can "get" a HS without using PC methods by using all the proxies. We had previously discussed this result in MM05 (E&E 2005b) – it was already in the air at realclimate. We pointed out that all this means is that the reconstruction has abandoned any pretence to geographic balance and that the Mannian regression phase picks up bristlecones. Of course, Wahl and Ammann did not cite this prior discussion nor did the NAS Panel (which did not even include MM05 (EE) in its bibliography). It really pisses me off that the NAS panel could, on the one hand, cite this particular Wahl and Ammann finding approvingly although it relies on bristlecones, while on the other hand condemning the use of bristlecones. So I guess that we’ll have to reply to Wahl and Ammann after all.

The NAS panel mentioned in passing that the average of tree ring networks might be more sensible than PC methods – a possibility mentioned in Huybers 2005. Here’s what they said:

Huybers (2005) and Bürger and Cubasch (2005) raise an additional concern that must be considered carefully in future research: There are many choices to be made in the statistical analysis of proxy data, and these choices influence the conclusions. Huybers (2005) recommends that to avoid ambiguity, simple averages should be used rather than principal components when estimating spatial means.

While I doubt that there’s any sensible way of extracting useful information from the bilge of MBH tree ring networks, I thought that it would be an interesting exercise to do what they recommend here – calculate the average of the 6 tree ring networks, both with and without dividing by standard deviations, and then include these 6 proxies in the MBH reconstruction. Here are the averages of the 6 networks (the versions with division by standard deviation.) Obviously no big Hockey Sticks here.

Figure 2. Averages of Six MBH98 Tree Ring Networks
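For readers who want to see what "average of a network, with and without dividing by standard deviations" amounts to, here is a minimal sketch in Python. It is not the code used for the figures: the function name `network_average` and the toy numbers are made up for illustration, and it assumes that every series covers the same years and is standardized over its full length (the post does not say which period was used).

```python
from statistics import mean, pstdev


def network_average(series, standardize=True):
    """Average a list of equal-length series year by year.

    If standardize is True, each series is first centered on its own mean
    and divided by its own standard deviation (assumption: computed over
    the full length of the series).
    """
    if standardize:
        scaled = []
        for s in series:
            m, sd = mean(s), pstdev(s)
            scaled.append([(x - m) / sd for x in s])
        series = scaled
    # zip(*series) groups the values of all series for the same year
    return [mean(year_values) for year_values in zip(*series)]


# Toy network: three made-up "chronologies" over five years.
toy_network = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [10.0, 10.0, 20.0, 20.0, 15.0],
    [0.5, 0.4, 0.6, 0.5, 0.5],
]
raw_avg = network_average(toy_network, standardize=False)
std_avg = network_average(toy_network, standardize=True)
```

In the unstandardized version, the series with the largest raw values dominates the average. In the standardized version, every series gets the same nominal say, whatever its units.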

But when you insert these 6 series into the MBH network, leaving the 81 non-PC proxies unchanged, you get the result shown above in Figure 1, which I’ve shown again below with the averages over each calculation step in red. There are dramatic downward steps occurring in the 1750 and 1760 steps and back up in the 1780 step.

Figure 3. MBH98 Reconstruction using tree ring network averages. Step average in red.

What happens in these steps? Two things – the number of temperature eigenvectors used in the reconstruction changes. If you recall my posts on the linear algebra of MBH, I pointed out that the NH reconstruction was a linear combination of the RPCs (and these coefficients remain unchanged in the steps regardless of the number of RPCs used). The figure below shows the weights of each RPC in the NH average. An amusing result of this linear algebra is that you can calculate the NH average from the RPCs without having to do the matrix expansion to individual gridcells, since the algebra cancels. (I’ve quadruple-checked this both in the algebra and reconciling against Wahl-Ammann results.) The most important weight comes from the RPC1, which is hardly a surprise. In the 1750 step, RPC7 and 9 are added in; 1750 and then RPC9 in 1760 and then RPC14 and 16 in 1780. (Actually MBH substitutes RPC6 and RPC8 in the 1750 step – the only step in which these two RPCs are used. I’ve used RPC7 and 9 which are used in all later steps.) No one has any idea how the selection of RPCs was made; MBH failed to archive this code in response to the House Energy and Commerce Committee request.

Figure 4. Weights of Reconstructed Temperature PCs to MBH98 NH Reconstruction
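The point that the NH average can be computed straight from the RPCs can be checked on toy numbers. The sketch below is only an illustration, not MBH or Wahl-Ammann code. It assumes the gridcell field is rebuilt as RPC times a scale factor times an EOF pattern, and that the NH average is an area-weighted mean over cells. All names and values are hypothetical, and the per-gridcell rescaling used in the real reconstruction is left out (it would simply be folded into the weights).

```python
def nh_from_gridcells(rpcs, scales, eofs, cell_weights):
    """Long route: expand RPCs to gridcells, then take the weighted mean."""
    total = sum(cell_weights)
    out = []
    for row in rpcs:  # one row of RPC values per year
        cells = [
            sum(row[k] * scales[k] * eofs[k][c] for k in range(len(row)))
            for c in range(len(cell_weights))
        ]
        out.append(sum(a * x for a, x in zip(cell_weights, cells)) / total)
    return out


def rpc_weights(scales, eofs, cell_weights):
    """One fixed weight per RPC: scale factor times the weighted mean EOF."""
    total = sum(cell_weights)
    return [
        scales[k] * sum(a * e for a, e in zip(cell_weights, eofs[k])) / total
        for k in range(len(scales))
    ]


def nh_from_rpcs(rpcs, weights):
    """Short route: NH average as a linear combination of the RPCs."""
    return [sum(u * w for u, w in zip(row, weights)) for row in rpcs]


# Toy example: 3 years, 2 RPCs, 3 gridcells (all values made up).
rpcs = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.1]]
scales = [3.0, 1.5]
eofs = [[0.5, 0.5, 0.7], [0.6, -0.6, 0.1]]  # one row per EOF
cell_weights = [1.0, 0.8, 0.5]              # e.g. cos(latitude)

weights = rpc_weights(scales, eofs, cell_weights)
long_route = nh_from_gridcells(rpcs, scales, eofs, cell_weights)
short_route = nh_from_rpcs(rpcs, weights)
```

The two routes agree to rounding error because the spatial mean is linear, so it can be applied to the EOF patterns once instead of to the expanded field in every year. The weight for a given RPC depends only on its own scale factor and EOF, which is why it does not change when other RPCs are added or dropped.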

While the RPC selections may contribute to the strange effect shown above, I don’t think that this is what’s involved. Here are the RPC1 contributions to the NH reconstruction for the four century steps: 1400, 1750, 1760 and 1780. Obviously, the RPC1 changes dramatically with the roster changes. (The 1730 step is at a similar level to the 1400 step.)

Figure 5. RPC1 Contribution to NH reconstruction for four steps: black – 1780; red – 1760; blue – 1750; green – 1400

What is it that causes these changes? I’ll try to get to that over the week-end. I haven’t got that far right now. It sure is a bizarre result as it stands. It’s possible that there’s some artifact that I inadvertently introduced in modifying code for this case where no tree ring PCs are used. These calculations are very fresh and it’s possible to get blindsided whenever you tweak code. However, I don’t think so and I suspect that there would still be an interesting result regardless. I think that the situation occurs because the proxies are so poor, causing the algorithm to select slightly different noise patterns in different steps, leading to very unstable results.

So there’s still some juice in this particular lemon for statisticians to explain.

One other prediction that may result from this calculation. In statistical literature, you see certain data sets from the distant past used over and over again to illustrate new methods – as benchmarks. I think that there’s a decent chance that the MBH98 data set will become a statistical classic, although perhaps not in the way intended by the authors. I’ve sometimes said that you could do a yearlong seminar on all the statistical problems of MBH98. I think that there’s a decent chance that the MBH98 will come into increasing use as a classic benchmark in multivariate statistical studies as multivariate statisticians come to understand its many and interesting perversions.

UPDATE (a couple of hours later):

Here is a barplot of the weights (implicitly) assigned by the MBH algorithm to the proxies for the 4 steps illustrated above from the MBH data set with 6 tree ring network averages. As you can see, the weights assigned to the individual proxies are very unstable. (BTW, recall that both Mann and VZ said that you could not allocate the contributions of individual proxies!) Below the figure, I’ll discuss which proxies are contributing heavily to the reconstruction step.

Figure 6. Weights in four MBH steps. The bold number shows the average value in the step (as shown in red in Figure 3).

In the bottom panel (AD1400 step), the positive values are from Stahle’s Georgia precipitation reconstruction, then Stahle’s South Carolina precipitation reconstruction; the largest negative value is from the Svalbard melt series.

In the second panel from the bottom (AD1750 step), the two large positive weights are from the precipitation series from gridcell 42.5N, 2.5E (which, by recollection, is Marseilles instrumental precipitation) and the Stahle South Carolina precipitation reconstruction; the two negative weights are from Svalbard melt and the Yakutia temperature reconstruction – note the negative weight to the temperature reconstruction.

In the second panel from the top (AD1760 – the most negative step), the largest positive weights are from the two Quelccaya dO18 records; instrumental precipitation from gridcell 42.5N 2.5E, then the CEng and CEur instrumental temperature series. The Quelccaya dO18 go back to AD1400; I don’t know why the coefficients jump around so much. The negative values are from Svalbard melt, Yakutia T-reconstruction, Galapagos dO18 and instrumental temperature 57.5N, 32.5E (Moscow). Again, note that instrumental temperature has a negative contribution to the MBH-style reconstruction; this also occurs for some gridcell temperature series in the MBH reconstruction itself.

In the top panel (AD1780), the largest positive values are from Dunde dO18, Central Europe instrumental, Quelccaya 1 dO18, Central England instrumental and Leningrad instrumental; the largest negative contributions are from Galapagos dO18, the Yakutia T-reconstruction, New Caledonia dO18, the NOAMER tree ring average and the Stahle SWM tree ring average.
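To see how a regression-based reconstruction implicitly assigns a weight to each proxy, and how a weight can come out negative, here is a heavily simplified sketch for the case of a single retained RPC. It is an illustration under stated assumptions, not the MBH algorithm: each proxy is regressed on the temperature PC in the calibration period, the RPC is then estimated from the proxies by least squares, and the NH value is the RPC times a fixed weight. The function names and all numbers are hypothetical, and the inputs are assumed to be already centered.

```python
def calibration_coefficients(pc, proxies):
    """Regress each proxy on the temperature PC over the calibration period.

    Returns one slope per proxy (no intercept: inputs are assumed centered).
    """
    denom = sum(u * u for u in pc)
    return [sum(u * y for u, y in zip(pc, proxy)) / denom for proxy in proxies]


def proxy_weights(g, rpc_weight):
    """Weights such that NH = sum_j weights[j] * proxy_j for any year."""
    gg = sum(x * x for x in g)
    return [rpc_weight * x / gg for x in g]


def reconstruct_nh(proxy_values, g, rpc_weight):
    """Two-step route: least-squares estimate of the RPC, then scale to NH."""
    gg = sum(x * x for x in g)
    rpc_hat = sum(y * x for y, x in zip(proxy_values, g)) / gg
    return rpc_weight * rpc_hat


# Toy calibration period of four years (all values made up).
pc = [-1.0, 0.0, 1.0, 0.0]
proxies = [
    [-2.0, 0.0, 2.0, 0.0],    # moves with the PC
    [1.0, 0.5, -1.0, -0.5],   # moves against the PC
    [0.1, -0.1, 0.1, -0.1],   # unrelated to the PC
]
g = calibration_coefficients(pc, proxies)
weights = proxy_weights(g, rpc_weight=0.5)

# One reconstruction year, computed both ways.
year_values = [1.0, 1.0, 3.0]
nh_two_step = reconstruct_nh(year_values, g, rpc_weight=0.5)
nh_weighted = sum(w * y for w, y in zip(weights, year_values))
```

In this toy setup the sign of each weight is set entirely by the sign of the calibration-period fit, with no regard to what the proxy is supposed to measure. A series that happens to move against the PC in the calibration period gets a negative weight, whether it is a precipitation record or a thermometer.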

What a pile of garbage this stuff is.

This entry was written by Stephen McIntyre, posted on Aug 12, 2006 at 7:57 AM, filed under MBH98 and tagged house, MBH98.

21 Comments

bender

Posted Aug 12, 2006 at 8:54 AM | Permalink

TXOK in Fig 2 is a *reverse* hockey stick.
Sara Chan

Posted Aug 12, 2006 at 9:01 AM | Permalink

I think that there’s a decent chance that the MBH98 will come into increasing use as a classic benchmark in multivariate statistical studies …

You blithely refer to “a decent chance”. No quantification. No analysis to indicate what the standard deviation might be. Nothing. No wonder Mann thinks he outdoes you in statistics.
Louis Hissink

Posted Aug 12, 2006 at 9:08 AM | Permalink

One fact is clear – I haven’t quite figured out what Steve is on about here but from experience I know he has identified another issue.

Gee, gosh, darn, we mining types are slowawl?
Kevin

Posted Aug 12, 2006 at 9:21 AM | Permalink

Steve, it might be helpful for occasional or new visitors if you could provide a simple step-by-step summary of the MBH procedure to the best you’ve been able to reconstruct it. The details are here but I haven’t seen this in any one section of the blog.
Steve McIntyre

Posted Aug 12, 2006 at 9:43 AM | Permalink

#4. I’m probably going to submit something like this for publication. I’ve got working notes which are quite lengthy, but I’ll write up a short summary for readers. For now, if you browse back the MBH98 category to the discussions of the linear algebra, those may help some readers.
Steve McIntyre

Posted Aug 12, 2006 at 10:03 AM | Permalink

I’ve added in a barplot showing the contributions of individual proxies in the different weird steps. It really is frightful garbage and gets worse as you turn over more stones and see the grubs and maggots underneath each stone.
beng

Posted Aug 12, 2006 at 10:56 AM | Permalink

Precip proxies have large positive weights, & thermometer records negative weights for a temp reconstruction. Well that certainly makes sense, if you’re a kook.

I think the Mannites’ methods (& perhaps the IPCC) have little if any credibility left — at least in the US (unless they’re willing to disclose data/methods & allow replication — little chance). Canada is still signed on to Kyoto…
Lubo Motl

Posted Aug 12, 2006 at 11:33 AM | Permalink

Figure 1 is not a hockey stick graph but a police car is going nearby graph. 😉
Ken Fritsch

Posted Aug 12, 2006 at 12:00 PM | Permalink

Huybers (2005) and Bürger and Cubasch (2005) raise an additional concern that must be considered carefully in future research: There are many choices to be made in the statistical analysis of proxy data, and these choices influence the conclusions. Huybers (2005) recommends that to avoid ambiguity, simple averages should be used rather than principal components when estimating spatial means.

When I read into the full impact of your quote from the NAS report above, I see them politely saying that the methods used by Mann et al. are/were very susceptible to data mining (many choices) and that the conclusions must, as a result, become very suspect.

Not shown here, but I suspect that Mann somewhere along the way has made this clear, or could make it clear, by stating that he may have cherry picked (or constructed a cherry picking algorithm) but that he did it only to improve the RE statistic which in the exceptional case for his work is not considered data mining.

Mann can also note that NAS used the language “an additional concern that must be considered carefully in future research” which can be taken to mean that they were referencing future work and not past work such as his and that the concern be considered which, in effect, is not saying the method is wrong but simply that it needs more “consideration”.

Such are the alternative ways of interpreting these kinds of reports.
Steve McIntyre

Posted Aug 12, 2006 at 12:03 PM | Permalink

Even for me, I was in per’s words, “gobsmacked” when this popped up. It really is a neverending laboratory of horrors. As I mentioned, rather than MBH98 being lost to the sands of time, I am beginning to believe that it is going to become a statistical classic cited by statisticians many years from now.
Steve McIntyre

Posted Aug 12, 2006 at 12:10 PM | Permalink

#9. Ken, there’s an important and interesting point in data mining for RE statistics and I’ve perhaps under-estimated how ingrained data mining is in climate science.

When I re-read Wahl and Ammann, their premise seems to be that it doesn’t matter if their method throws up all sorts of different results (they concede that the MM05 variation is an MBH98-variation), but that no “climatologist” would have put forward a reconstruction with a goofy RE statistic. One of my communications difficulties with climate scientists is that they all seem to think that it’s OK to mine for RE statistics with a goofy method.

Burger and Cubasch caught this issue really well – pointing out that if you use RE statistics to pick a version, then it’s part of your calibration and not a verification test.

The NAS panel completely cocked up this point – they said that it was OK to pick models by mining for RE statistics, using Burger and Cubasch as authorities. Another cock-up by Bloomfield and Nychka, both in terms of the citation and the point itself.
fFreddy

Posted Aug 12, 2006 at 12:23 PM | Permalink

Re #10, Steve McIntyre

…it is going to become a statistical classic cited by statisticians many years from now.

Apropos of which, any idea what happened with Dr Wegman’s session at the American Statistical Association meeting last week ?
Armand MacMurray

Posted Aug 12, 2006 at 2:23 PM | Permalink

Now all you need is an adjunct appointment somewhere like UofT or Guelph to set up a class for climatology grad students on the perils of bad stats.
Jeff Weffer

Posted Aug 12, 2006 at 3:44 PM | Permalink

Assuming you haven’t made any errors in your analysis, it really is a “pile of garbage.”

If a student submitted this in any statistical class, he/she wouldn’t even get an F. The prof would likely call them in to the office to enquire whether they were having stress-related problems or something.

But then it is not the first time someone reached a conclusion based on the RE stat when the underlying hypothesis and model was totally illogical.
jc

Posted Aug 12, 2006 at 5:18 PM | Permalink

I have a couple of questions:

First, what does it mean to give a negative weight to a dataset? Does that mean that the data is being not just thrown out, but actually inverted and added to the conclusion? (Sorry, I’m not up on my stats).

Also, in the second house subcommittee meeting (which I listened to on RealAudio), Mann seemed very vehement in pointing out the difference between a reconstruction and a model. Why was this so important?
Nicholas

Posted Aug 12, 2006 at 9:01 PM | Permalink

Mr. McIntyre:

Why are the components combined for the reconstruction all of different averages? I would understand if you didn’t want to center them at 0 if they had any signal. But since they’re clearly mostly noise, wouldn’t it make sense for their averages to be at least similar? At least, within 1 std. dev. of each other’s averages?

I understand you’re only trying to reproduce this “frankenstein” method of MBH98 without the PCs, but I’m wondering why the average of the steps up to 1740 are ~0.75 and the average of the next step is negative.

I’m thinking it’s because of the data you plotted in the first graph. Those sets don’t seem to average 0. Therefore when you splice them together they don’t match up. But, isn’t this a significant problem before you ever get to looking at the final reconstruction? Isn’t this in itself a fatal error, a place to stop and think what’s gone wrong?

Again, I would understand this if the data sets actually had a signal which had an amplitude close to variation of the differences between the means, but it seems to be an order of magnitude or two smaller…

JC, you are correct in thinking a negative weight on the eigenvalue means the series is inverted before being added in to the result.
Nicholas

Posted Aug 12, 2006 at 9:03 PM | Permalink

JC, I should add, the MBH98 network includes series which quite clearly should have a negative weight – they represent proxies which go down when temperature goes up, such as ice accumulation. (Well, whether that is a proxy for temperature at all is debatable, but if it is, it should have a negative weight).

The laughable thing is that nobody seems to have looked at which series are getting positive and negative weights. Then again, with the “divergence problem”, I’m not sure how you can tell which way it should be aligned anyway. The whole thing is a dog’s breakfast.
Paul Penrose

Posted Aug 13, 2006 at 10:51 PM | Permalink

Looking at this it’s sure clear that Dr. Mann was telling the truth about at least one thing: he is no statistician!
JMS

Posted Aug 15, 2006 at 10:14 PM | Permalink

Steve, I don’t think that this was one of the flavors suggested by the NAS. My guess is that the recommended technique would be:

Use the proper centering
Not use BCP strip bark samples.

The second criterion does not mean throwing out all the BCP samples or any of the foxtail samples. Note that strip bark is a particular form of BCP — not all BCP’s are strip barks. Foxtails can grow in a strip bark form, but that is caused by lightning strikes. BCP specimens can grow in a strip bark form w/o external disturbance. Answer these two questions and you might have something worth talking about.
Steve McIntyre

Posted Aug 15, 2006 at 10:52 PM | Permalink

Huybers (2005) and Bürger and Cubasch (2005) raise an additional concern that must be considered carefully in future research: There are many choices to be made in the statistical analysis of proxy data, and these choices influence the conclusions. Huybers (2005) recommends that to avoid ambiguity, simple averages should be used rather than principal components when estimating spatial means.
JMS

Posted Aug 15, 2006 at 11:02 PM | Permalink

Well, because of the obviously silly results you got maybe Huybers is wrong on this point. Or perhaps you misunderstood his point, or maybe your methods were incorrect. Clearly your results are nonsense and this sort of post does not help your credibility.

Why don’t you try my way?