Using Santer’s Method

Using Santer’s own methodology with up-to-date observations, here are results comparing observations to the ensemble mean of Chad’s collation of 57 A1B models, with data to 2009. In each case, the d1* statistic calculated Santer-style has moved into very extreme percentiles.

The results from Ross’ more advanced methodology are not in any sense “inconsistent” with the application of Santer’s own methods to up-to-date data.

| Tropo      | Sat | Obs Trend | Ensemble | Santer d1* (1999) | d1* (2009) | Percentile |
|------------|-----|-----------|----------|-------------------|------------|------------|
| Lapse_T2LT | rss | -0.033    | -0.079   | -0.67             | -2.819     | 0.003      |
| Lapse_T2LT | uah | 0.048     | -0.079   | -3.5              | -7.395     | 0          |
| Lapse_T2   | rss | 0.005     | -0.069   | NA                | -4.212     | 0          |
| Lapse_T2   | uah | 0.084     | -0.069   | NA                | -8.518     | 0          |
| T2LT       | rss | 0.159     | 0.272    | 0.37              | 1.69       | 0.948      |
| T2LT       | uah | 0.075     | 0.272    | 1.11              | 2.862      | 0.996      |
| T2         | rss | 0.121     | 0.262    | 0.44              | 2.196      | 0.981      |
| T2         | uah | 0.04      | 0.262    | 1.19              | 3.449      | 0.999      |
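For readers following along, here is a rough sketch of the d1* statistic as described in Santer et al. (2008) — my paraphrase, not Steve’s script. The AR1 adjustment via an effective sample size n_eff = n(1 − r1)/(1 + r1) follows the usual convention; treat the details as an assumption about the implementation:

```python
import math

def ar1_adjusted_trend_se(y):
    """OLS trend and its standard error, inflated for lag-1 autocorrelation
    of the residuals via the effective sample size n_eff = n*(1-r1)/(1+r1)."""
    n = len(y)
    t = list(range(n))
    tbar = (n - 1) / 2.0
    ybar = sum(y) / n
    sxx = sum((ti - tbar) ** 2 for ti in t)
    b = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y)) / sxx
    a = ybar - b * tbar
    e = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    r1 = sum(e[i] * e[i + 1] for i in range(n - 1)) / sum(ei ** 2 for ei in e)
    neff = max(3.0, n * (1 - r1) / (1 + r1))   # guard against tiny/negative df
    s2 = sum(ei ** 2 for ei in e) / (neff - 2)
    return b, math.sqrt(s2 / sxx)

def d1_star(model_trends, obs_trend, obs_trend_se):
    """Santer-style d1*: (ensemble-mean trend minus observed trend), scaled by
    the combined uncertainty (inter-model SE of the mean plus the AR1-adjusted
    SE of the observed trend)."""
    m = len(model_trends)
    bbar = sum(model_trends) / m
    s2_mean = sum((b - bbar) ** 2 for b in model_trends) / (m * (m - 1))
    return (bbar - obs_trend) / math.sqrt(s2_mean + obs_trend_se ** 2)
```

The percentile column in the table would then come from comparing d1* against its reference distribution; that step is omitted here.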

1. Posted Aug 13, 2010 at 12:39 PM | Permalink

I’m guessing the final sentence should say “up-to-date data”?

Steve
; fixed

2. Steve McIntyre
Posted Aug 13, 2010 at 12:44 PM | Permalink

Script for this – which collects quite a bit of other information – is at
http://www.climateaudit.info/scripts/models/santer/script_comment_short.txt

• pete
Posted Aug 13, 2010 at 9:39 PM | Permalink

There’s a bug in make.table4 — it’s trying to read from your d:\ drive instead of downloading from the website.

3. nono
Posted Aug 13, 2010 at 2:00 PM | Permalink

Steve, I’m looking at T2LT, rss.

The quantity in the numerator of d1*, (ensemble – obs trend), does not seem to be very different from that of Santer08.

I then assume that the difference in d1* comes from the denominator. So which term(s) of the denominator changes a lot between Santer08 and your estimate? Is it the (inter-model) variance of mean trends, or the variance of the observed trend?

Steve: the change results simply from more observations, which yield more degrees of freedom and thus narrower CIs in the trend estimation allowing for AR1 autocorrelation.
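Steve’s point about degrees of freedom can be illustrated numerically (a sketch with illustrative σ and r1 values, not the actual RSS/UAH numbers): the AR1 allowance shrinks the sample size to n_eff = n(1 − r1)/(1 + r1), but lengthening the record still narrows the trend CI rapidly, since the trend SE falls roughly like n^(−3/2):

```python
import math

def trend_se(n, sigma=0.1, r1=0.85):
    """Approximate AR1-adjusted standard error of an OLS trend over n equally
    spaced points with residual sd sigma and lag-1 autocorrelation r1.
    (Illustrative approximation: residual variance inflated by n/neff.)"""
    neff = n * (1 - r1) / (1 + r1)      # effective sample size
    sxx = n * (n * n - 1) / 12.0        # sum of squared time deviations
    return math.sqrt((sigma ** 2) * (n / neff) / sxx)

# 1979-1999 (~252 months) vs 1979-2009 (~372 months):
se_1999 = trend_se(252)
se_2009 = trend_se(372)
```

Because the AR1 inflation factor n/n_eff is the same for both record lengths here, the extra decade cuts the trend SE to about (252/372)^1.5 ≈ 0.56 of its 1999 value — much more than a naive 1/√n intuition would suggest.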

• nono
Posted Aug 14, 2010 at 2:53 PM | Permalink

Okay, then the larger d1* comes almost exclusively from a smaller variance of the observed trend, s(b0)^2 in the formula.

s(b0)^2 is inversely proportional to the number of observations. Therefore, intuitively, adding 10 years of data (that is, +50% of data) should reduce s(b0)^2 to 2/3 of its previous (Santer08) value.

The denominator of d1* is the square root of the inter-model variance plus s(b0)^2, so such a reduction of s(b0)^2 should yield a denominator reduced to sqrt(2/3) ≈ 0.8 times its Santer08 value. And that’s a lower bound, obtained by neglecting the inter-model variance.

So this gives an upper bound for d1* of 1.25 times its Santer08 value. Yet you find roughly a 4.5× increase (1.69 vs 0.37).

What have I done wrong? Is s(b0) reduced by more than that? Is the inter-model variance reduced as well?
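One possible resolution of the arithmetic above (a sketch, assuming ordinary OLS trend formulas): the variance of a trend estimate is σ²/Σ(t − t̄)², and Σ(t − t̄)² = n(n² − 1)/12 grows like n³ — so the trend variance falls like 1/n³, not 1/n as it would for a simple mean:

```python
def trend_var_factor(n):
    """Sum of squared time deviations for n equally spaced points:
    n*(n^2 - 1)/12.  var(trend) = sigma^2 / this, so it scales ~ 1/n^3."""
    return n * (n * n - 1) / 12.0

# +50% more data: trend variance shrinks to roughly (2/3)^3 ~ 0.30 of its
# old value, far more than the 2/3 that 1/n scaling would give.
ratio = trend_var_factor(100) / trend_var_factor(150)
```

On this scaling, ten extra years shrink s(b0)^2 by roughly a factor of three, which goes a long way toward the ×4-plus change in d1* without any change in the inter-model variance.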

4. Kenneth Fritsch
Posted Aug 13, 2010 at 3:02 PM | Permalink

I assume that Lapse_T2LT and Lapse_T2 refer to the difference series between the troposphere and surface temperature anomalies.

5. RomanM
Posted Aug 13, 2010 at 7:34 PM | Permalink

Santerizing the models is not now nor ever will be a legitimate statistical procedure for the simple reason that statisticians do not consider the mere failure to reject the null hypothesis as evidence to support that hypothesis.

The problem of trying to show that the null hypotheses could be true also exists in other scientific areas. In particular, in bioavailability tests, a pharmaceutical company manufacturing a generic version of a drug tries to demonstrate that their product will be absorbed by the body in a manner equivalent to the original preparation. The testing required from the drug manufacturer must show that the null hypothesis of no difference in the mean absorption is not rejected.

However, in order to guarantee that the result is not due to high variability in the sample or to insufficient information due to an inadequate sample size, they must also show that if the difference between the two formulations were greater than a specified amount, the procedure would reject the null hypothesis at a predetermined significance level (in technical terms, the power of the test would be sufficient to distinguish differences of the given magnitude). This portion is completely lacking in the procedure used by Santer, rendering the test useless.
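RomanM’s point about power can be made concrete with a small Monte Carlo (a generic two-sample illustration, not Santer’s actual trend test; the 1.96 critical value is a normal approximation): estimate the probability that the test rejects when the true difference equals some minimum effect of interest. If that probability is low, “fail to reject” carries almost no information.

```python
import math, random

def welch_t(x, y):
    """Welch's t statistic for two independent samples."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((xi - mx) ** 2 for xi in x) / (nx - 1)
    vy = sum((yi - my) ** 2 for yi in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

def power_estimate(n, true_diff, sigma=1.0, crit=1.96, trials=2000, seed=1):
    """Monte Carlo power: fraction of simulated experiments where |t| exceeds
    the critical value when the true mean difference is true_diff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        y = [rng.gauss(true_diff, sigma) for _ in range(n)]
        if abs(welch_t(x, y)) > crit:
            hits += 1
    return hits / trials
```

With n = 10 per group and a true difference of half a standard deviation, power is well under 50%, so failing to reject says little; with n = 100 it exceeds 80%. This is exactly the demonstration the bioequivalence regulators require and, per RomanM, the Santer-style consistency test omits.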

Dr. Pielke demonstrated in his presentation how the latter procedure works: More garbage in … Santerized models out.

• Posted Aug 14, 2010 at 11:55 AM | Permalink

Re: RomanM (Aug 13 19:34),
Yep. We never see information on the power of the test when we get a “fail to reject”. Power (one minus the type II error rate) was discussed in my sophomore-year statistics class, so it’s odd not to see it. There also never seems to be any suggestion that if we are testing a hypothesis H0 and specify our assumptions about the process, we should, when possible, pick a method with greater power. (So, for example, if a time series really IS AR(1), and we have a choice between using monthly data and annual average data, we should generally prefer the method that gives us more power. Admittedly, if the higher-power method requires a supercomputer to implement and the poorer one can be done on a spreadsheet, one might go for the lower-power method for that reason. But all things being equal, the higher-power method is preferred.)

6. Lewis
Posted Aug 14, 2010 at 11:50 AM | Permalink

Santerizing the models is not now nor ever will be a legitimate statistical procedure for the simple reason that statisticians do not consider the mere failure to reject the null hypothesis as evidence to support that hypothesis.

Like you said!

7. Lewis
Posted Aug 14, 2010 at 11:56 AM | Permalink

Also, there’s a known rule of common sense – keep pharmacists away from statistics! How many press releases of ‘statistical significance’ are reported as red letter news! And yet, with climate science, we give them a pass!

8. PaulM
Posted Aug 14, 2010 at 12:07 PM | Permalink

Roman, you are muddying the waters a bit. Steve’s point is that the null is rejected, with all the data!