okay, then the larger d1* comes almost exclusively from a smaller variance of observed trend, s(b0)^2 in the formula.

s(b0)^2 is inversely proportional to the number of observations. Intuitively, then, adding 10 years of data (that is, 50% more data) should reduce s(b0)^2 to 2/3 of its previous (Santer08) value.

The denominator of d1* being the square root of the inter-model variance + s(b0)^2, such a reduction of s(b0)^2 should yield a denominator reduced to sqrt(2/3) ≈ 0.82 times its Santer08 value. And that’s a lower bound, obtained by neglecting the inter-model variance.

So this gives an upper bound for d1* of about 1.22 times its Santer08 value. You find more than a x4 increase (1.69 vs 0.37).
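To make the back-of-envelope bound easy to check, here is a minimal Python sketch. It rests on the assumption stated in this comment, namely that s(b0)^2 simply scales with 1/(number of observations), and it treats the denominator of d1* as sqrt(inter-model variance + s(b0)^2). The function name `d1_ratio` and the variance values are hypothetical, chosen only for illustration; they are not taken from Santer08 or from the script.

```python
import math

def d1_ratio(var_obs_old, var_model, shrink=2 / 3):
    """Ratio new d1* / old d1* when only s(b0)^2 shrinks by the factor `shrink`.

    The numerator of d1* is assumed unchanged, so the ratio of the two
    statistics is the inverse ratio of the two denominators.
    """
    old_denominator = math.sqrt(var_model + var_obs_old)
    new_denominator = math.sqrt(var_model + shrink * var_obs_old)
    return old_denominator / new_denominator

# Upper bound: neglect the inter-model variance entirely.
upper_bound = d1_ratio(var_obs_old=1.0, var_model=0.0)

# Illustrative (made-up) case where the inter-model variance is half of s(b0)^2.
with_model_var = d1_ratio(var_obs_old=1.0, var_model=0.5)

print(round(upper_bound, 3))     # 1.225
print(round(with_model_var, 3))  # 1.134
print(round(1.69 / 0.37, 2))     # 4.57, the increase actually reported
```

Any nonzero inter-model variance only pulls the ratio further below 1.22, so under the 1/n assumption the reported increase cannot be reproduced.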

What have I done wrong? Is s(b0) reduced by more than that? Is the inter-model variance reduced as well?

]]>Re: RomanM (Aug 13 19:34),

Yep. We never see information on the power of the test when we get “fail to reject”. Power (or type II error) was discussed in my sophomore-year statistics class, so it’s odd not to see it. There also never seems to be any suggestion that if we are testing a hypothesis H0 and specify our assumptions about the process, we should, when possible, pick a method with greater power. (So for example, if a time series really IS AR(1), and we have a choice between using monthly data vs. annual average data, we should generally prefer the method that gives us more power. Admittedly, if the higher-power method requires a super-computer to implement and the poorer one can be done on a spreadsheet, one might go for the lower-power method for that reason. But all things being equal, the higher-power method is preferred.)

Like you said!

]]>There’s a bug in `make.table4`: it’s trying to read from your `d:\` drive instead of downloading from the website.

The problem of trying to show that the null hypothesis could be true also exists in other scientific areas. In particular, in bioavailability tests, a pharmaceutical company manufacturing a generic version of a drug tries to demonstrate that its product will be absorbed by the body in a manner equivalent to the original preparation. The testing required from the drug manufacturer must show that the null hypothesis of no difference in the mean absorption is not rejected.

However, in order to guarantee that the result is not due to high variability in the sample or to insufficient information from an inadequate sample size, they *must also show* that if the difference between the two formulations were greater than a specified amount, the procedure would reject the null hypothesis at a predetermined significance level (in technical terms, the power of the test would be sufficient to distinguish differences of the given magnitude). This portion is completely lacking in the procedure used by Santer, rendering the test useless.
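As a rough illustration of what “sufficient power” means here, the following Python sketch computes the power of a two-sided z-test to detect a given true difference. It assumes a normal approximation with a known standard error; the function name `z_test_power` and the numbers are hypothetical and are not drawn from any bioavailability guideline or from Santer’s procedure.

```python
from statistics import NormalDist

def z_test_power(delta, std_error, alpha=0.05):
    """Power of a two-sided z-test of 'no difference' when the true
    difference is `delta` and the estimate has standard error `std_error`.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = delta / std_error             # true difference in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return norm.cdf(-z_crit + shift) + norm.cdf(-z_crit - shift)

# With no true difference, the rejection rate is just the significance level.
size = z_test_power(delta=0.0, std_error=1.0)

# A noisy estimate has little chance of detecting the same difference...
low_power = z_test_power(delta=1.0, std_error=1.0)

# ...while a tighter standard error (e.g. from more data) detects it far more often.
high_power = z_test_power(delta=1.0, std_error=0.25)

print(round(size, 3), round(low_power, 3), round(high_power, 3))
```

A “fail to reject” from the low-power case says very little about whether the difference is really zero, which is the point of requiring the power demonstration.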

Dr. Pielke demonstrated in his presentation how the latter procedure works: More garbage in … Santerized models out.

]]>The quantity in the numerator of d1*, (ensemble – obs trend), does not seem to be very different from that of Santer08.

I then assume that the difference in d1* comes from the denominator. So which term(s) of the denominator changes a lot between Santer08 and your estimate? Is it the (inter-model) variance of mean trends, or the variance of the observed trend?

**Steve: the change results simply from more observations, which yields more degrees of freedom and thus narrower CIs in the trend estimation allowing for AR1 autocorrelation.**

http://www.climateaudit.info/scripts/models/santer/script_comment_short.txt ]]>