The comments by James Annan and his commenters here on McKitrick et al (2010) demonstrate very nicely how the literature gets distorted by the rejection of a simple comment showing that the application of Santer’s own method to updated data resulted in failure on key statistics. Annan and his commenters worry about the novelty of the method and accuse us of being subject to the same criticisms that Santer made of Douglass.
The statistical apparatus of MMH10 is used in econometrics and is not “novel” in that sense. But it is unfamiliar to climate science readers and it’s entirely reasonable for them to wonder whether there is some catch to the new method. It’s a question that I would ask in their shoes.
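For readers who want a concrete sense of what that kind of apparatus looks like, here is a minimal sketch (in Python) of a trend-equality test of the sort that is routine in econometrics: stack two series, let a trend-by-series interaction pick up the difference in trends, and use autocorrelation-robust (HAC) standard errors instead of the naive OLS ones. The data are synthetic and the code is purely illustrative; it is not the MMH10 method, code or data, which use a more elaborate panel and multivariate setup, but the ingredients (trend coefficients, restrictions on them, robust covariance estimation) are the standard toolkit.

```python
# Minimal illustration of an econometric-style trend-equality test.
# Synthetic data; NOT the MMH10 method, code or data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 360                      # e.g. 30 years of monthly anomalies
t = np.arange(n) / 120.0     # time measured in decades

def ar1_noise(n, rho=0.6, sigma=0.1):
    """AR(1) noise, since monthly temperature series are serially correlated."""
    e = np.zeros(n)
    shocks = rng.normal(0.0, sigma, n)
    for i in range(1, n):
        e[i] = rho * e[i - 1] + shocks[i]
    return e

obs = 0.10 * t + ar1_noise(n)    # "observed" series, trend 0.10/decade
mod = 0.25 * t + ar1_noise(n)    # "modelled" series, trend 0.25/decade

# Stack the series; the coefficient on (trend x series dummy) is the
# difference in trends, and HAC (Newey-West) errors account for the
# serial correlation that a naive t-test would ignore.
y = np.concatenate([obs, mod])
dum = np.concatenate([np.zeros(n), np.ones(n)])
tt = np.concatenate([t, t])
X = sm.add_constant(np.column_stack([tt, dum, tt * dum]))
fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 24})

print("trend difference (per decade): %.3f" % fit.params[3])
print("HAC t-statistic, H0 equal trends: %.2f" % fit.tvalues[3])
```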
That Annan and his commenters should be in a position to make such a comment shows how the IJC reviewers and editor Glenn McGregor have succeeded in poisoning the well, by rejecting a simple comment showing that key Santer results fail with updated data (our comment is on arXiv here).
In our rejected IJC comment, we used Santer’s exact methodology. Nonetheless, Annan makes the following accusation:
Tuesday, August 10, 2010
How not to compare models to data part eleventy-nine

Not to beat the old dark smear in the road where the horse used to be, but…
A commenter pointed me towards this which has apparently been accepted for publication in ASL. It’s the same sorry old tale of someone comparing an ensemble of models to data, but doing so by checking whether the observations match the ensemble mean.
Well, duh. Of course the obs don’t match the ensemble mean. Even the models don’t match the ensemble mean – and this difference will frequently be statistically significant (depending on how much data you use). Is anyone seriously going to argue on the basis of this that the models don’t predict their own behaviour? If not, why on Earth should it be considered a meaningful test of how well the models simulate reality?
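For readers trying to follow the statistical point being argued here, the behaviour Annan describes can be reproduced in a toy simulation (made-up numbers, unrelated to any of the papers): as the number of runs in an ensemble grows, the standard error of the ensemble mean shrinks, so a naive consistency test against that mean will eventually reject even a run drawn from the same family of models. Whether that criticism actually applies to the test in our comment is, of course, exactly what is at issue below.

```python
# Toy illustration of the statistical point at issue: testing a single run
# against an ensemble mean using only the standard error of that mean.
# Made-up numbers; not the data or the test from any of the papers discussed.
import numpy as np

rng = np.random.default_rng(1)
true_trend, run_spread = 0.2, 0.1   # runs share a trend, differ by internal variability

for n_runs in (5, 20, 100):
    runs = rng.normal(true_trend, run_spread, n_runs)   # per-run trend estimates
    extra = rng.normal(true_trend, run_spread)          # one more run, same family
    se_mean = runs.std(ddof=1) / np.sqrt(n_runs)        # SE of the ensemble mean
    z = (extra - runs.mean()) / se_mean
    print(f"runs={n_runs:3d}  SE(mean)={se_mean:.3f}  z for a same-family run: {z:+.1f}")
```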
Later a commenter writes:
1) Didn’t Santer address this point? e.g. “application of an inappropriate statistical ‘consistency test’.” Perhaps you’re right that by adding all the extra bits to the paper, they made it so that an idiot might not realize the elementary nature of the most important error, and we need to keep in mind that there are many idiots out there, but…
To which Annan responds:
I haven’t got Santer to hand (and am about to go travelling, so am not going to go looking for it) so I will take your word for it. In which case this new paper is pretty ridiculous. Well, it’s ridiculous anyway.
The observation that key Santer results do not hold up with more recent data is not “ridiculous”: it follows from applying Santer’s own methodology to the updated data. The only reason that this information is not in the “literature” is that IJC editor Glenn McGregor did not feel that, as an IJC journal editor, he had any responsibility to place a rebuttal of Santer results in the literature, and he appears to have permitted reviewers with conflicts of interest to determine the outcome of the rebuttal.
But now the debate is muddied because it is entangled with understanding a different methodology.
The people to blame for the muddying of the debate are McGregor and the IJC reviewers who rejected our simple comment.
If, as seems likely, the most adverse reviewer was Santer coauthor Peter Thorne of the UK Met Office, then Thorne would be the person most responsible for Annan and his readers being unaware of this result. Thorne wrote to Phil Jones on May 12, 2009 (there had been no CA discussion to that point, and the decision had been issued on May 1, 2009) as follows:
Mr. Fraudit never goes away does he? How often has he been told that we don’t have permission? Ho hum. Oh, I heard that fraudit’s Santer et al comment got rejected. That’ll brighten your day at least a teensy bit?
This represents the attitude of the climate science peer reviewers who tied up our Santer comment at IJC.
Our comment didn’t do anything novel or fancy. It simply applied Santer’s methodology to updated data and showed that key results no longer held up. As noted yesterday, the paper was rejected. One reviewer’s principal complaint proved to be not with our results, but with Santer’s methodology itself. (It looks like this reviewer was Peter Thorne, who, ironically, was one of the Santer coauthors.)
The authors should read Santer et al. 2005 and utilise this diagnostic. It is a pity that Douglass et al took us down this interesting cul-de-sac and that Santer et al 2008 did not address it but rather chose to perpetuate it. The authors could reverse this descent away to meaningless arguments very simply by noting that the constrained aspect within all of the models is the ratio of changes and that therefore it is this aspect of real-world behaviour that we should be investigating, and then performing the analysis based upon these ratios in the models and the observations.
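For concreteness, here is a rough sketch of the kind of trend comparison at issue: fit an OLS trend to each series, inflate its standard error for lag-1 autocorrelation via an effective sample size, and compare observed and modelled trends with a normalised difference statistic. This is a simplified reading of the Santer-style approach, not the actual code, data or results of Santer et al or of our comment; “updating” simply means re-running the same calculation on series extended with the more recent data. The reviewer’s suggested alternative would run essentially the same comparison on ratios of changes rather than on the individual trends.

```python
# Simplified sketch of a Santer-style trend comparison: OLS trends with
# standard errors adjusted for lag-1 autocorrelation (effective sample size),
# then a normalised trend-difference statistic.  Synthetic data; not the
# actual Santer et al. or McKitrick et al. code, data or results.
import numpy as np

def trend_with_ar1_se(y, t):
    """OLS trend and its SE, inflated for AR(1) autocorrelation in residuals."""
    X = np.column_stack([np.ones_like(t), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]   # lag-1 autocorrelation
    n_eff = n * (1 - r1) / (1 + r1)                 # effective sample size
    s2 = resid @ resid / (n_eff - 2)
    se = np.sqrt(s2 / np.sum((t - t.mean()) ** 2))
    return beta[1], se

rng = np.random.default_rng(2)
n = 384                        # say, 32 years of monthly data
t = np.arange(n) / 120.0       # decades

def ar1_series(trend, rho=0.7, sigma=0.12):
    """A trend plus AR(1) noise, standing in for a temperature anomaly series."""
    e = np.zeros(n)
    shocks = rng.normal(0.0, sigma, n)
    for i in range(1, n):
        e[i] = rho * e[i - 1] + shocks[i]
    return trend * t + e

b_obs, se_obs = trend_with_ar1_se(ar1_series(0.10), t)
b_mod, se_mod = trend_with_ar1_se(ar1_series(0.25), t)

# Normalised trend difference; large |d| indicates a significant discrepancy.
d = (b_mod - b_obs) / np.sqrt(se_mod**2 + se_obs**2)
print(f"obs trend {b_obs:.3f}  model trend {b_mod:.3f}  d = {d:.2f}")
```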