In today’s post, I think that I’ve developed an interesting approach to the Santer problem, which represents a substantial improvement to the analyses of either the Santer or Douglas posses.
I think that the approach proposed here is virtually identical to Jaynes’ approach to analyzing the difference between two means, as set out in the article recommended by beaker. As it happens, I’d done all of the calculations shown today prior to reading this article. While my own calculations were motivated primarily by trying to make sense of the data rather than anything philosophical, academics like to pigeonhole approaches and, to that extent, the approach shown below would perhaps qualify as a “bayesian” approach to the Santer problem, as opposed to the “frequentist” approach used both by Santer and Douglass. I had the post below pretty much in hand, when I was teasing beaker about being a closet frequentist.
The first problem discussed in Jaynes 1976 was the following question about the difference of two means (and keep in mind that the issue in Santer v Douglass is the difference between two trends):
The form of conclusion that I’m going to arrive at today is going to be identical – ironically even the number is identical: there is a 92% probability that a model trend will exceed the true observed trend. Now as a caveat, my own terminology here may be a little homemade and/or idiosyncratic; I place more weight on the structure of the calculations which are objective. I think that the calculations below are identical to the following equation in Jaynes 1976 (though I don’t swear to this and, as noted before, I’d done the calculations below before I became aware of Jaynes 1976).
Let me start with the following IMO instructive diagram, which represents my effort to calculate a probability distribution of the slope of the trend, given the observations to 1999 (Santer), 2004 (Douglass) and up to 2008 (most of which were available to Santer et al 2008). It looks like this corresponds to Jaynes , i.e. what bayesians call a “posterior distribution”, but I’m just learning the lingo. The dotted vertical lines represent 95% confidence intervals from the profile likelihood calculations shown in a prior post, color coded by endpoint. The colored dots represent 95% confidence intervals calculated using the Neff rule of thumb for AR1 autocorrelation in Santer. The black triangle at the bottom shows ensemble mean trend (0.214 deg C/decade).
A couple of obvious comments. The 95% confidence intervals derived from profile likelihoods are pretty much identical to the 95% confidence intervals derived from the AR1 rule of thumb – that’s reassuring in a couple of ways. It reassured me at least that my experimental calculations had not gone totally off the rails. Second, the illustration of probability distributions shown here is vastly more informative than anything in either Santer or Douglass both for the individual years and showing the impact of over 100 more months of data in reducing the CI of the trend due to AR1 autocorrelation. (A note here: AR1 autocorrelation wears off fairly quickly – if LTP is really in play, then the story would be different.)
Thirdly and this is perhaps a little unexpected, inclusion of data from 1999 to 2008 has negligible impact on the maximum likelihood trend of the MSU tropical data; the most distinct impact is on the confidence intervals, which narrow a great deal. The CIs inclusive of 2008 data are about 50% narrower than the ones using data only up to 1999. Whether this is apples-and-apples or the extent to which this is apples and apples, I’ll defer for now.
Fourth and this may prove important, although the ensemble mean triangle relative to 95% CI intervals is unaffected by the inclusion of data up to 2004, it is not unaffected by the inclusion of data up to 2008. With 2008 data in, the CI is narrowed such that ensemble mean is outside the 95% CI interval of the observations – something that seems intuitively relevant to an analyst. I feel somewhat emboldened by Jaynes 1976’s strong advocacy of pointing out things that are plain to analysts, if not frequentists.
My procedure for calculating the distribution was, I think, interesting, even if a little homemade. The likelihood diagrams that I derived a couple of days ago had a function that yielded confidence intervals (this is what yielded the 95% CI line segment.) I repeated the calculation for confidence levels ranging from 0 to 99%, which with a little manipulation gave a cumulative distribution function at irregular intervals. I fitted a spline function to these irregular intervals and obtained values at regular intervals and from these obtained the probability density functions shown here. The idea behind the likelihood diagrams was “profile likelihood” along the lines of Brown and Sundberg – assuming a slope and then calculating the likelihood of the best AR1 fit to the residuals. I don’t know how this fits into Bayesian protocols, but it definitely yielded results, that seem to be far more “interesting” than the arid table of Santer et al.
My next diagram is a simple histogram of LT2T trends derived from data in Douglass et al 2007 Table 1, which, for the moment, I’m assuming is an accurate collation of Santer’s 20CEN results. In Douglass’ table, he averaged results within a given model; I’ll revisit the analysis shown here if and when I get digital versions of the 49 runs from Santer (Santer et al did not archive digital information for their 49 runs; I’ve requested the digital data from Santer – see comments below). [Note - beaker in a comment below states that this comparison is pointless. I remain baffled by his arguments as to why this sort of comparison is pointless and have requested a reference illuminating the statistical philosophy of taking a GCM ensemble and will undertake to revisit the matter if and when such a reference arrives.]
I derived LT2T trends by applying weights for each level (sent to me by John Christy) to the trends at each level in the Douglass table. In the diagram, I’ve carried forward the 95% CI intervals from the observations for location and scale comparison, as well as the black triangle for the ensemble mean. The $64 dollar question is then the one that Jaynes asks. [Note - I think that this needs to be re-worked a bit more as a "posterior" distribution to make it more like Figure 1; I'm doing that this afternoon - Oct 21 - and will report on this.]
My next diagram is the proportion of models with LT2T trends that exceed a given x-value. I think that this corresponds to Jaynes . The dotted lines are as before; the solid color-coded vertical lines are the maximum likelihood trend estimates for the respective periods. Again for an analyst, the models seem to the the right of these estimates.
I then did a form of integration which I think is along the lines of the Jaynes recipe. I made a data frame with x-values in 0.01 increments (Data[[i]]$a, with i subscripting the three periods 1999, 2004 and 2008. For each value of x, I made a column of values representing the results of Figure 3:
for(i in 1:3) Data[[i]]$fail=sapply(Data[[i]]$a, function(x) sum(douglass$trend_LT>=x )/22)
Having already calculated the distribution in column Data[[i]]$d, I calculated the product as follows:
for(i in 1:3) Data[[i]]$density=Data[[i]]$d*Data[[i]]$fail
Then simply add up the columns to get the integral (which should be an answer to Jaynes question):
# 1999 2004 2008
#0.8638952 0.8530540 0.9164450
On this basis, there is a 91.6% probability that the model trend will exceed the observed trend.
I leave it to climate philosophers to decide whether this is “significant” or not.
Going back to Jaynes 1976 – I think that this calculation turns out to be an uncannily exact implementation of how Jaynes approached the problem. In that respect, it was very timely for me to read this reference while the calculation was fresh in my mind. It was also reassuring that, merely by implementing practical analysis methods, I’d in effect got to the same answer as Jaynes had long ago.