In a previous post, I discussed, in general terms, the issue of the conflict of interest between being an IPCC reviewer and being an active protagonist in the field. Here I illustrate the problems with specific reference to MBH98, arguing that the running text of IPCC TAR made misleading claims about the MBH98 hockey stick and that the underlying conflict of interest appears to have contributed to the making of these misleading claims. This example is obviously highly relevant to the current controversy.
Here is exactly what the running text of IPCC TAR says about MBH98:
Mann et al. (1998) reconstructed global patterns of annual surface temperature several centuries back in time. They calibrated a combined terrestrial (tree ring, ice core and historical documentary indicator) and marine (coral) multi-proxy climate network against dominant patterns of 20th century global surface temperature. Averaging the reconstructed temperature patterns over the far more data-rich Northern Hemisphere half of the global domain, they estimated the Northern Hemisphere mean temperature back to AD 1400, a reconstruction which **had significant skill in independent cross-validation tests**. Self-consistent estimates were also made of the uncertainties. This work has now been extended back to AD 1000 (Figure 2.20, based on Mann et al., 1999). The uncertainties (the shaded region in Figure 2.20) expand considerably in earlier centuries because of the sparse network of proxy data. Taking into account these substantial uncertainties, Mann et al. (1999) concluded that the 1990s were likely to have been the warmest decade, and 1998 the warmest year, of the past millennium for at least the Northern Hemisphere. [my bold]
I’ve bolded something different from the usual cut-phrases in the last sentence, which, of course, ended up in press releases and has been endlessly repeated by the Canadian government; my interest here is a little different. One of the reasons for the widespread adoption of MBH98-99 was its claims of statistical “skill”, “robustness”, careful proxy selection and relatively even geographical and proxy balance, as well as its appearance of statistical sophistication to the statistically unsophisticated paleoclimate community (an appearance enhanced by rather inflated language describing even simple statistical tasks).
Anyone who’s read Mann’s texts as closely as I have can hardly doubt that the paragraph comes directly from Mann himself. Some of the terms do not occur in exactly that form in the underlying article (e.g. the idiosyncratic term “self-consistent estimates” as applied to MBH98 uncertainty does not occur in MBH98 itself); an independent review author making a précis of MBH98-99 would have been very unlikely to write the above paragraph. In our GRL article, we reported that our emulation of MBH98 indicated that the cross-validation R2 for the 15th century proxy roster (like every other cross-validation statistic except the RE statistic) was statistically insignificant, and we argued that the high RE statistic was spurious. In fact, the R2 cross-validation failure was massive, with an R2 of ~0.0. We pointed out that this adverse R2 statistic was not reported in MBH98 and, in our EE article, we sharply criticized MBH for withholding this adverse information. Aside from our own calculations, there is substantial circumstantial evidence of a catastrophic failure of R2 cross-validation for the 15th century proxies.
First, realclimate.org has not explicitly denied it. Instead, they’ve tried to argue that the RE statistic is the only one that should be looked at. Second, Wahl and Ammann, whose emulation of MBH98 is virtually identical to ours, have reported only an RE statistic. Although they purport to replicate MBH98 and refute us, they have notably withheld the R2 statistic in their website presentation. A logical question for Wahl and Ammann is:
Have you obtained highly significant R2 and other cross-validation statistics for the MBH98 15th century network as claimed in IPCC TAR?
So let’s assume that the MBH98 cross-validation R2 statistic for the 15th century proxy network is ~0.0 i.e. massively insignificant. How can this be reconciled with the following claim by the IPCC [not simply Mann et al.]?
The [MBH98] reconstruction “… had significant skill in independent cross-validation tests.”
Personally, I don’t think that it is possible. To the limited extent that Mann et al. (or Wahl and Ammann) have faced up to the catastrophic failure of the cross-validation R2 statistic, they have tried to argue that the RE statistic is the cross-validation statistic “preferred” by paleoclimatologists and blustered that McKitrick and I were simply too stupid to know this. I don’t think that this argument holds any water in the context of publication in Nature, and it certainly doesn’t work in the context of IPCC TAR. It is impossible to construe the sentence highlighted above as anything other than a misrepresentation of the failure of R2 cross-validation for the 15th century network. Did this claim “matter”?
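As an aside for readers unfamiliar with the two statistics: RE and R2 answer different questions, and it is easy to construct a series where RE is high while the cross-validation R2 is ~0. Here is a toy sketch (all numbers invented for illustration; this is not MBH98’s actual data), in which a “reconstruction” gets the mean level of the verification period right but tracks none of the year-to-year variation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # hypothetical verification-period length

# Invented observed anomalies: a level ~0.4 below the calibration-period
# mean (taken as 0), plus interannual wiggles.
obs = -0.4 + 0.2 * rng.standard_normal(n)

# A "reconstruction" that captures the mean offset but whose year-to-year
# variation is pure noise, uncorrelated with the observations.
pred = -0.4 + 0.05 * rng.standard_normal(n)

# RE (reduction of error) benchmarks the reconstruction against the
# calibration-period mean, so merely getting the level right scores well.
re = 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - 0.0) ** 2)

# Cross-validation R2 is the squared Pearson correlation, which rewards
# only tracking the year-to-year variation.
r2 = np.corrcoef(obs, pred)[0, 1] ** 2

print(f"RE  = {re:.2f}")   # high: the mean level is right
print(f"R2  = {r2:.3f}")   # near zero: the wiggles are noise
```

The point of the sketch is simply that a high RE is compatible with a complete failure of R2 cross-validation, which is why reporting only the RE statistic conceals rather than answers the question.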
All one needs to do is consider what happens if the IPCC review author charged with considering MBH98 had written the following sentence:
The MBH98 reconstruction failed the R2 and other cross-validation tests using the 15th century network.
What happens to the hockey stick graph? Does it get inhaled into the Summary for Policymakers as a key promotional graphic? Would anyone rely on its ability to make confident assertions about whether the 1990s were the warmest decade or 1998 the warmest year? Does the hockey stick get used in Kyoto promotions all over the world? Even to pose the question is to answer it. If the failure of the 15th century cross-validation R2 statistic had been disclosed in IPCC TAR, the hockey stick simply would not exist as an icon.
Some people might argue that even an arms-length IPCC review author might not have been able to pierce the prior misrepresentations in Nature. In legal proceedings involving conflict of interest (as I understand it), courts seldom engage in speculation about what a non-conflicted party would or would not have done; they generally assume near-perfect behavior by a non-conflicted party. Thus, in my opinion, given the conflict of interest, it is irrelevant to speculate whether an independent review author might or might not have identified the withholding of cross-validation statistics in MBH98 and pierced through to the underlying problem in the Nature article. We are entitled to assume that a competent and independent IPCC review author would have noticed the withholding by MBH98 of the cross-validation R2 statistic, would have sought this information directly from Mann et al., and would have reported the massive failure of that statistic, with a completely different outcome for the use of the hockey stick in IPCC promotions.
So here is a very specific example of why it is a good idea to avoid conflicts of interest (as acknowledged by Vranes, Pielke and von Storch), specifically showing how the conflict of interest inherent in an IPCC lead author reviewing his own material ended up having a material effect on IPCC TAR.