In the Muir Russell report, Richard Horton observed that orthodox medicine “mostly rejects” papers that invoke invisible pathways (meridians of qi):
For example, the world of complementary and alternative medicine (CAM) divides the medical community. Orthodox medicine mostly rejects papers about reflexology, iridology, and acupuncture treatment that invokes invisible pathways (meridians) of qi. CAM is served by a separate class of journals that have little overlap with the more mainstream medical literature. In this instance, ideas are incommensurable.
Unfortunately, the climate science community has been far more accommodating, admitting the paleoclimate equivalent of alternative statistics into orthodox journals. The wider climate science “community” is placed in the awkward position of trying to reassure the public that other parts of their field are, in fact, based on science, while, at the same time, not only not disavowing, but actively defending paleo-phrenologists and the meridians of qi converging on bristlecones in California and the magic larches in Yamal. Given that strip bark bulges, which are most likely merely mechanical, are interpreted as expressions not just of local temperature and precipitation but of world “instrumental training patterns” or “climate fields”, phrenology is a surprisingly apt term.
The problems of strip bark standardization were being discussed in the thread where Climategate was first mentioned – a thread which contains relevant illustrations of the problems in trying to fit strip bark bulges into any statistical framework – let alone the statistical framework stated to underpin MBH98. (The picture below shows the sort of phrenological bulge that underpins the strip bark Hockey Stick – see here for further discussion.) There is convincing evidence that such bulges are present in strip bark chronologies – one of the reasons why the NAS panel said that they should be avoided in temperature reconstructions. Gavin and Tamino can huff and puff all they want about the 4th principal component, but this is the sort of data that they are importing under the guise of the “right” number of principal components.
All their talk about the “right” number of principal components is simply sleight-of-hand to confuse you – when you watch the pea, the entire purpose of the high-falutin talk about principal components is to “get” strip bark bulges into the reconstruction.
This is an old debate, but the only thing that the Team moves is the pea under the thimble.
Implicit in our approach are at least three fundamental assumptions. (1) The indicators in our multiproxy trainee network are linearly related to one or more of the instrumental training patterns. In the relatively unlikely event that a proxy indicator represents a truly local climate phenomenon which is uncorrelated with larger scale climate variations, or represents a highly nonlinear response to climate variations, this assumption will not be satisfied.
This, of course, is a large part of the problem with strip bark bristlecones (and YAD061 and its cousins). Actually, the problem with strip bark trees looks even worse – it seems very possible, even likely, that the 6-sigma bulges in strip bark widths are purely mechanical, arising from the formation of strip bark itself. However, these 6-sigma bulges become proof in the hands of paleo-phrenologists using their own alternative statistics.
The failure of the most critical MBH proxies – strip-bark bristlecones – to meet the assumptions of their statistical model was stated as early as our 2004 Nature submission (there is compelling evidence that Jones was the third and very antagonistic reviewer), where we stated:
The NOAMER PC1 thus gets its hockey stick shape from the Graybill-Idso sites, which exhibit a nonclimatic response and/or a nonlinear response to 20th century temperature. Since MBH98 states (p. 780) that their method requires the assumption that proxies exhibit a linear response to temperature, the Graybill-Idso sites, explicitly acknowledged as problematic in Mann et al (1999) (ref. ), should have been disqualified as contributors to the NOAMER PC1 in MBH98, let alone as the main determinants of its shape.
Much effort has been spent by paleo-phrenologists to frame the issue as the “right” number of principal components to retain – as opposed to the underlying issue as we had framed it – whether the assumptions of the underlying statistical model had been satisfied. Indeed, we noted that MBH99 had even acknowledged the failure of stripbark bristlecones to satisfy the assumptions of their model:
Mann et al. (1999) themselves pointed out, with reference to these proxies: “A number of the highest elevation chronologies in the western U.S. do appear, however, to have exhibited long-term growth increases that are more dramatic than can be explained by instrumental temperature trends in these regions.”
With the inconsistency that so characterizes the field, after conceding that bristlecones do not meet the assumptions of their statistical model, Mann proceeded to use them anyway. (Despite statements in MBH98 that the reconstruction was “robust” to the presence/absence of all dendro proxies, MBH98 was not “robust” to the presence/absence of bristlecones.) Thus, instead of not using bristlecones because they failed to satisfy the assumption of the statistical model, Mann purported to “adjust” the strip bark bristlecone chronologies – an adjustment convincingly criticized by Jean S last year. Mann’s methodology, here as elsewhere, belongs to what can only be described as alternative statistics, a discipline that, as noted above, has found a home in the climate science sections of otherwise orthodox journals.
Mann responded to our observation that Graybill strip bark bristlecones did not meet the fundamental assumption of his methodology by invoking a supposed relationship to “instrumental training patterns” as opposed to local temperature and precipitation:
MM04 demonstrate their failure to understand our methods by claiming that we required that “proxies follow a linear temperature response”. In fact we specified (MBH98) that indicators should be “linearly related to one or more of the instrumental training patterns”, not local temperatures.
(Update, Jul 25-6: the criticism of Mannian teleconnections is not refuted by pointing to ENSO. Individual trees respond to local temperature and precipitation etc; they do not respond to abstractions like a PC3. Further, the problematic 6-sigma strip bark bulges that characterize Team reconstructions are not a linear response to climate at all.) Roman Mureika expresses the point in a comment as follows:
What the climate scientists don’t seem to understand is that for teleconnections to be usable in a scientific fashion, there must be a specific real identifiable physical effect which operates at the proxy location. This effect is clearly not local temperature since the proxy has not responded to that. To further assume that this unidentified effect is related in an appropriate equivalent quantitative form to the proxy measurements is a fiction which lends itself to the cherry picking of spuriously correlated series.
[end – update]
In my opinion, if climate scientists in other parts of the community took pains to disavow paleoclimate meridians of qi and alternative statistical methods used to buttress them – which, after all, are an important part of the public face of climate science – there would have been less fall-out for the rest of the discipline in the wake of Climategate.
When the NAS panel said that strip bark bristlecones should not be used in temperature reconstructions, this should have put an end to the use of Graybill bristlecones in temperature reconstructions. However, this didn’t happen. Wahl and Ammann totally ignored the recommendations of the NAS panel, even though their article wasn’t finally published until a year after the NAS panel; the companion paper, Ammann and Wahl 2007, wasn’t even submitted until after the NAS panel. Other members of the Team also continued the use of strip bark after the NAS panel e.g. Hegerl et al 2007, Juckes et al 2007, Mann et al 2008.
Wahl and Ammann, as discussed in past CA posts, is a sustained exercise in Texas sharpshooting. Their efforts to benchmark RE significance were, of course, a singular contribution to Texas sharpshooting literature. But most of the rest of their article is variations on the theme.
Even the longstanding issue of 2 or 5 PCs comes down to Texas sharpshooting. As Jean S reminded readers at Bishop Hill, there was no evidence that Mann used Preisendorfer’s Rule N in determining the number of retained PCs in MBH98. Indeed, the explicit language of the article indicates another rule. Mann has refused to provide source code evidencing the use of this rule in MBH98. Using this rule after the act is simply one more example of Texas sharpshooting – what Wegman called “no statistical integrity”.
Gavin Schmidt’s inline responses to Judy Curry here rely heavily on Wahl and Ammann 2004 2005 2006 2007 and include a complaint that we haven’t published a rebuttal of Wahl and Ammann in the peer-reviewed litchurchur.
Obviously I’ve commented on Wahl and Ammann at length at Climate Audit. I recognize that these comments haven’t been peer reviewed by Jones, Santer, Mann and their associates, but they are still comments that I believe to be thoughtful and ones that are worth reading by someone interested in the topic. There is a separate left-frame category for Wahl and Ammann.
Second, it is very much my belief that, if the points made in these threads and elsewhere are correct (and I believe them to be), then these are the sorts of things that specialists in the field, employed to do these sorts of studies, should be responsible for knowing, whether or not I’d written the threads. That I’ve commented should be an assistance to them, but surely not a prerequisite.
Third, although Schmidt complains that we haven’t rebutted Wahl and Ammann in the litchurchur, this is not entirely true. McIntyre and McKitrick (E&E 2005) rebutted many, if not most, of the points at issue in Wahl and Ammann. This may seem a little surprising given that MM2005 (EE) was published prior to Wahl and Ammann. Nonetheless, it is so.
All the key arguments of Wahl and Ammann 2007 – bristlecones in a lower-order PC4, 2 versus 5 PCs, Mannian inverse regression without PCs – were first put forward in the Mann response to our re-submission, which I’ve placed online here – see, in particular, Mann’s cover letter.
These arguments from Mann’s 2004 response to our Nature submission featured prominently in multiple threads in the opening of realclimate. (It was these pre-emptive attacks on us that led to the opening of climateaudit as a blog in late January 2005, thanks to the suggestion and initiative of John A.)
Although Wahl and Ammann did not cite either the Mann submission to Nature or the realclimate posts (and conspicuously do not even acknowledge Mann), virtually all the main arguments in Wahl and Ammann derive from these prior publications by Mann. Isn’t the failure to acknowledge such priority a form of plagiarism?
In MM2005 (EE), we reported on our examination of the various permutations and combinations of correlation and covariance PCs, the impact of 2 or 5 PCs, etc, that had been previously raised, plus a few others. If you go through the salient cases of Wahl and Ammann, you’ll find that they are already considered in MM2005 (EE). Of course, this isn’t reported either. (Wahl’s awareness of this priority is demonstrated in his Climategate correspondence with Briffa.)
Doubtless it would have made things easier for people if we’d responded to Wahl and Ammann/Ammann and Wahl (the SI to which only became available in summer 2008) and it’s on my list of things to do. But the fact that I haven’t attempted to run the gauntlet of Team reviewers in the litchurchur doesn’t mean that I haven’t responded to Wahl and Ammann. The points have been responded to at considerable length.
Schmidt also grasped the verification r2 nettle – a nettle that he would have been better off leaving ungrasped. This was a battleground issue in 2005. Judy Curry had written:
just because no single significance test is objectively the best in all circumstances does not mean that you can cherry pick significance tests until you find one you like and ignore R2.
Gavin Schmidt replied:
[Response: This is simply insulting. You have absolutely no evidence that this was the case. The RE/CE statistics are perfectly fine at describing what the authors thought were relevant and have a long history in that field (Fritts, 1976) and as we have seen the PCA issue is moot. The idea that people went looking for ‘bad statistics’ to fix their results is without merit whatsoever. Please withdraw that claim.]
Well, it may be insulting, but the evidence is what it is.
First, Fritts 1976 does not stand as an authority for not using verification r2, as it is a test that Fritts recommends prior to doing the RE test. Secondly, Schmidt’s claim that Mann reported an RE/CE pair is untrue. Mann did not report CE results for MBH98. They were first reported in MM2005 (GRL), where we observed that the AD1400 step failed the CE test, as it had the verification r2 test.
However, the most compelling evidence of Mann reporting a verification r2 in a step where it was favorable was, of course, in MBH98 itself, where Figure 3b is clearly labeled “verification r2” – see below:
While the verification r2 is illustrated geographically in the above graphic, and MBH98 stated that they considered r2 statistics, the SI to MBH98 showed only the RE results and not verification r2 statistics. Mann’s source code, archived in response to the House committee, showed that he calculated verification r2 in the same step as verification RE, a point made at the time and later presented to the NAS panel.
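For readers who have not worked with these statistics, the following is a minimal sketch in Python of how RE, CE and verification r2 can be computed side by side for a verification period, using the textbook definitions. It is not Mann's code or ours: the function name `verification_stats` and the toy numbers are assumptions made purely for illustration. RE benchmarks the reconstruction against the calibration-period mean, CE against the verification-period mean, and r2 is the squared correlation between observed and estimated values.

```python
def verification_stats(observed, estimated, calibration_mean):
    """Return (RE, CE, r2) for a verification period.

    observed, estimated: equal-length sequences of verification-period values.
    calibration_mean: mean of the observations over the calibration period.
    Hypothetical helper for illustration; not taken from any archived code.
    """
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")

    n = len(observed)
    obs_mean = sum(observed) / n
    est_mean = sum(estimated) / n

    # Sum of squared reconstruction errors over the verification period.
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    # Benchmark for RE: squared deviations from the calibration-period mean.
    ss_cal = sum((o - calibration_mean) ** 2 for o in observed)
    # Benchmark for CE: squared deviations from the verification-period mean.
    ss_ver = sum((o - obs_mean) ** 2 for o in observed)
    # Pieces of the Pearson correlation.
    ss_est = sum((e - est_mean) ** 2 for e in estimated)
    cross = sum((o - obs_mean) * (e - est_mean)
                for o, e in zip(observed, estimated))

    re_stat = 1.0 - sse / ss_cal
    ce_stat = 1.0 - sse / ss_ver
    r2 = cross ** 2 / (ss_ver * ss_est)  # undefined if either series is constant
    return re_stat, ce_stat, r2


# Toy verification period: the estimate gets the mean offset from the
# calibration period roughly right but tracks none of the year-to-year changes.
observed = [-0.5, -0.3, -0.4, -0.6, -0.2]
estimated = [-0.4, -0.5, -0.3, -0.4, -0.4]
re_stat, ce_stat, r2 = verification_stats(observed, estimated, calibration_mean=0.0)
print(f"RE={re_stat:.3f} CE={ce_stat:.3f} r2={r2:.3f}")
```

In this toy case RE comes out strongly positive while CE is negative and r2 is close to zero. A difference in mean between the calibration and verification periods is enough to produce a seemingly healthy RE, which is why reporting RE alone, without CE and verification r2, tells the reader so little.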
The original Wahl and Ammann submission likewise did not include verification r2 results (even though they had issued a press release that our results were “unfounded”). Our codes and the Wahl-Ammann code reconciled – Wegman waggishly observed that it was more correct to say that Wahl and Ammann replicated our results, than Mann’s. As a reviewer of Wahl and Ammann, I asked that they include verification r2 results. They refused, citing their GRL article as authority (without disclosing to Schneider that their GRL article had already been rejected).
In December 2005, I suggested to Ammann that we write a joint paper clearly summarizing points of agreement and disagreement. He refused, saying that it would be “bad for his career”. This has led to a great deal of wasted time on everybody’s part. Again, he refused to report verification r2 results. These were reported only after an academic misconduct complaint was filed against Ammann. Needless to say, Ammann got the same negligible verification r2 results that we had.
The NAS offered to examine the verification r2 issue, but Cicerone removed it from the terms of reference of the NAS panel. Nonetheless, panelist Christy asked Mann whether he had calculated verification r2 for the 1400 step and what the result was. Mann denied calculating the verification r2, saying that this would be a “foolish and incorrect” thing to do. Of course, it was known at the time that he had calculated verification r2 statistics, since it was in his code and illustrated for the AD1820 step.
The “dirty laundry” email (in which Mann sent to Briffa and Osborn the residuals that he later refused to send me) had not been available to the NAS panel. With these residuals in hand (or even if the actual reconstruction steps had been made available), it was child’s play to see the failed verification r2, CE and other results.
As it was, the NAS panel was seemingly dumbfounded by Mann’s bald-faced answer and did not follow up. There was supposed to be an opportunity for public discussion after presentations. However, Mann fled the room before anyone from the public e.g. me had an opportunity to ask. I sharply criticized the NAS panel for sitting like bumps on a log and not following up. Nychka came up to me afterwards and said that, just because they didn’t say anything didn’t mean that they didn’t notice. They didn’t say anything in their report either on the topic so they might as well not have noticed.
The Wahl and Ammann attempt to justify the failed verification r2 test was itself one more instance of Texas sharpshooting. Once the failed verification r2 was exposed (and only after its exposure by third parties), they attempted to re-frame the question by now arguing that verification r2 wasn’t a relevant statistic – notwithstanding their use of the statistic in the illustration when it was to their advantage. Schmidt may find this impolite, but facts are sometimes stubborn.
The failure of the MBH verification r2 results was not as small a result at the time as now portrayed by the Team. Eduardo Zorita told me that his view on MBH changed once he knew of the failed verification r2. If Mann wanted to argue that the failed verification r2 didn’t “matter”, the failed results should have been reported and discussed in the original article.
Thus, while reconstructions relying on strip bark bulges of California bristlecones and magic Yamal larches have been published in orthodox scientific journals, this does not change the fact that the underlying analyses do not rise above phrenology.