In many interesting comments, beaker, a welcome Bayesian commenter, has endorsed the Santer criticism of Douglass et al purporting to demonstrate inconsistency between models and data for tropical troposphere trends. (Prior post in sequence here) Santer et al proposed revised significance tests which, contrary to the Douglass results, did not yield results with statistical “significance”, which they interpreted as evidence that all was well, as for example, Gavin Schmidt here:
But it is a demonstration that there is no clear model-data discrepancy in tropical tropospheric trends once you take the systematic uncertainties in data and models seriously. Funnily enough, this is exactly the conclusion reached by a much better paper by P. Thorne and colleagues. Douglass et al’s claim to the contrary is simply unsupportable.
In passing, beaker mentioned that he was re-reading Jaynes (1976), Confidence intervals vs. Bayesian intervals. I took a look at this article by a Bayesian pioneer which proved to contain many interesting dicta, many of which were directed at ineffective use of significance tests resulting in the failure to extract useful statistical conclusions available within the data, many dicta resonating, at least to me, in the present situation. The opening motto for the Jaynes article reads:
Significance tests, in their usual form, are not compatible with a Bayesian attitude.
This motto seems strikingly at odds with beaker’s incarnation as a guardian of the purity of significance tests. Jaynes described methods whereby a practical analyst could extract useful results from the available data. Jaynes looked askance at unsophisticated arguments that results from elementary significance tests were the end of the story. In this respect, it’s surprising that we haven’t heard anything of this sort from beaker.
Not that I disagree with the criticisms of the Douglass et al tests. If you’re using significance tests, it’s important to do them correctly. The need to allow for autocorrelation in estimating the uncertainty in trends was a point made here long before the publication of Santer et al 2008 and was one that I agreed with in my prior post. But in a practical sense, there does appear to be a “discrepancy” between UAH data and model data (this is not just me saying this; the CCSP certainly acknowledges a “discrepancy”). It seems to me that it should be possible to say something about this data, and that’s the more interesting topic that I’ve been trying to focus on. So far I am unconvinced by the arguments of Santer, Schmidt and coauthors purporting to show that you can’t say anything meaningful about the seeming discrepancy between the UAH tropical troposphere data and the model data. These arguments seem all too reminiscent of the attitudes criticized by Jaynes.
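To fix ideas on the autocorrelation point: the standard adjustment inflates the standard error of an OLS trend using a lag-1 effective sample size, n_eff = n(1 − r1)/(1 + r1). The sketch below (a toy illustration with synthetic data, not Santer et al’s actual code or data) shows how much the adjustment can matter when residuals are strongly autocorrelated:

```python
import numpy as np

def trend_with_ar1_adjustment(y):
    """OLS trend with an AR(1)-adjusted standard error via the usual
    effective-sample-size correction (a sketch of the idea only)."""
    n = len(y)
    t = np.arange(n, dtype=float)
    X = np.column_stack([np.ones(n), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # naive standard error of the slope
    s2 = resid @ resid / (n - 2)
    se_naive = np.sqrt(s2 / np.sum((t - t.mean()) ** 2))
    # lag-1 autocorrelation of the residuals
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    # effective sample size and inflated standard error
    n_eff = n * (1 - r1) / (1 + r1)
    se_adj = se_naive * np.sqrt((n - 2) / max(n_eff - 2, 1.0))
    return beta[1], se_naive, se_adj

# synthetic AR(1) noise around a trend, purely for illustration
rng = np.random.default_rng(0)
n = 240
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.7 * noise[i - 1] + rng.normal()
y = 0.02 * np.arange(n) + noise

slope, se, se_adj = trend_with_ar1_adjustment(y)
print(slope, se, se_adj)  # the adjusted s.e. is substantially larger
```

With positively autocorrelated residuals the adjusted standard error can easily be double the naive one, which is why the Douglass-style test overstated significance.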
The Jaynes article, recommended by beaker, was highly critical of statisticians who were unable to derive useful information from data that seemed as plain as the nose on their face, because their “significance tests” were poorly designed for the matter at hand. As a programme, Jaynes’ bayesianism is an effort to extract every squeak of useful information out of the matter at hand by avoiding simplistic use of “significance tests”. This is not to justify incorrect use of significance tests – but merely to opine that the job of a bayesian statistician, according to Jaynes, is to derive useful quantitative results from the available information.
Interestingly, the Jaynes reference begins with an example analysing the difference in means – taking an entirely different approach than the Santer et al t-test. Here’s how Jaynes formulates the example:
Jaynes’ conclusion is certainly one which resonates with me:
Now any statistical procedure which fails to extract evidence that is already clear to our unaided common sense is certainly not for me!
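The flavor of Jaynes’ difference-of-means example can be conveyed with toy numbers (hypothetical data of my own, not Jaynes’ actual example): two small samples whose difference looks real to the eye, where the pooled two-sample t statistic falls short of the 5% two-sided critical value, yet a flat-prior Bayesian calculation assigns high probability to the second mean being larger.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical lifetimes for old and new components (illustrative only)
a = np.array([34.0, 46.0, 30.0, 54.0, 45.0, 38.0, 33.0, 51.0, 47.0])  # mean 42
b = np.array([44.0, 52.0, 48.0, 56.0])                                # mean 50

# frequentist side: pooled two-sample t statistic
na, nb = len(a), len(b)
sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
tstat = (b.mean() - a.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))
# two-sided 5% critical value for 11 df is about 2.201; tstat falls short

# Bayesian side: P(mean_b > mean_a) under flat priors, using normal
# approximations to the posteriors of the two means (a crude sketch)
draws_a = rng.normal(a.mean(), a.std(ddof=1) / np.sqrt(na), 100_000)
draws_b = rng.normal(b.mean(), b.std(ddof=1) / np.sqrt(nb), 100_000)
p_b_gt_a = (draws_b > draws_a).mean()
print(tstat, p_b_gt_a)
```

The “significant/not significant” dichotomy throws away the quantitative answer – here, roughly 98% posterior probability that the new components last longer – that the data plainly contain.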
Now I must admit that my eyes ordinarily tend to glaze over when I read disputes between Bayesians and frequentists. However, as someone whose interests tend to be practical (and I think of my activities here more as those of a “data analyst” than a “statistician”), I like the sound of what Jaynes is saying. In addition, our approaches here to the statistical bases of reconstructions have been “bayesian” in flavor (Brown and Sundberg are squarely in that camp, and my own experiments with profile likelihood results are, I suppose, somewhat bayesian in approach, though I don’t pretend to have all the lingo). I also don’t have to unlearn a lot of the baggage that bayesians spend so much time criticising, as my own introduction to statistics in the 1960s was from a very theoretical and apparently (unusually for the time) Bayesian viewpoint (transformation groups, orbits), with surprisingly little in retrospect on the mechanics of significance tests.
I’ve done some more calculations in which I’ve converted profile likelihoods to a bayesian-style distribution of trends, given observations up to 1999 (Santer), 2004 (Douglass) and 2008 (as I presume a bayesian would do). They are pretty interesting. I’ll post these up tomorrow. I realize that the Santer crowd have excuses for not using up-to-date data – their reason is that the models don’t come up to 2008. But it is my understanding that bayesians try to extract every squeak of usable information, and it appears to me that Jaynes would castigate any analyst who failed to extract whatever can be used from up-to-date information. Nevertheless, beaker criticized Douglass et al for doing exactly that.
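For readers curious about the mechanics, the conversion is conceptually simple. The sketch below (my own illustration on synthetic data, not the actual calculation I’ll post) profiles the intercept and error variance out of a Gaussian likelihood for the trend, then normalizes the profiled likelihood over a grid – which, under a flat prior on the trend, is a bayesian-style distribution from which credible intervals follow directly:

```python
import numpy as np

def trend_posterior(y, bgrid):
    """Normalized profiled likelihood for the trend b (flat prior on b;
    intercept and sigma^2 profiled out) - a sketch of the idea only."""
    n = len(y)
    t = np.arange(n, dtype=float)
    tc = t - t.mean()
    loglik = np.empty_like(bgrid)
    for i, b in enumerate(bgrid):
        resid = (y - y.mean()) - b * tc   # optimal intercept substituted
        s2 = resid @ resid / n            # optimal sigma^2 substituted
        loglik[i] = -0.5 * n * np.log(s2)
    w = np.exp(loglik - loglik.max())
    return w / w.sum()

# synthetic series with a known trend, for illustration
rng = np.random.default_rng(2)
n = 120
y = 0.015 * np.arange(n) + rng.normal(size=n)

bgrid = np.linspace(-0.05, 0.08, 801)
post = trend_posterior(y, bgrid)
mode = bgrid[post.argmax()]
# equal-tail 95% interval read off the cumulative weights
cdf = post.cumsum()
lo = bgrid[np.searchsorted(cdf, 0.025)]
hi = bgrid[np.searchsorted(cdf, 0.975)]
print(mode, lo, hi)
```

The appeal of this form is that adding the 2004–2008 observations just updates the same distribution – there is no excuse for leaving usable data on the table.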
Perhaps beaker is a closet frequentist.
Jaynes, E. T. 1976. Confidence intervals vs. Bayesian intervals. Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science 2: 175-257. http://bayes.wustl.edu/etj/articles/confidence.pdf