Thanks, Hu. I also find Lucia’s comment at Pielke Jr’s former blog to be very helpful in indicating how, in some of the very heated debates, people may be talking past each other and not really addressing the same issue at times:

One commenter, perhaps slightly off topic, suggested winnowing out the models consistently producing the trashier forecasts, but what kind of heresy is that? That’s a direct attack on the alarmism! Consistency has to be defined in a way that carries all manner of garbage along!

I don’t have the raw data. Perhaps you can get it from lucia. Have you visited her blog? It’s very good. Just post a request for the data and I bet she will post it or email it to you.

http://rankexploits.com/musings/

Thank you for your response Snowrunner. The example we did in class was modeling of a power system grid.

That was complicated but the results were straightforward. This climate modeling seems to be so much different.

I’ll just have to study this more.

Borrowing from John V:

It’s an important statistical subtlety, but the two sets of trend estimates should not be expected to have the same variance.

If we were comparing the heights of Swedish men (per lucia’s analog), this would be the difference:

The 5 observations are 5 measurements of *one* Swedish man’s height. The 55 models are single measurements of each of 55 different Swedish men. Both sets of data attempt to determine the average height of Swedish men. The variance is not the same.

So the appropriate scaling for the difference is sqrt(s_obs^2/5 + s_model^2/55).
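The Swedish-men analogy can be made concrete with a short simulation. This is a sketch with made-up numbers (180 cm mean, 2 cm measurement error, 7 cm person-to-person spread), not the actual trend data from the thread; it only illustrates why the two sample means have very different variances even though they estimate the same quantity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers, for illustration only
true_mean, meas_sd, between_sd = 180.0, 2.0, 7.0

# 5 repeated measurements of ONE man's height:
# variance of each measurement ~ meas_sd**2
one_man = true_mean + rng.normal(0, meas_sd, size=5)

# 55 single measurements of 55 DIFFERENT men:
# variance of each measurement ~ between_sd**2 + meas_sd**2
many_men = (true_mean
            + rng.normal(0, between_sd, size=55)
            + rng.normal(0, meas_sd, size=55))

# Both means estimate the average height, but with very different
# variances, so the difference of means is scaled by
# sqrt(s1^2/n1 + s2^2/n2) rather than a pooled SD.
se_diff = np.sqrt(one_man.var(ddof=1) / 5 + many_men.var(ddof=1) / 55)
print(one_man.mean(), many_men.mean(), se_diff)
```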

Also, having outlier models allows the scary scenarios to remain firmly in place as part of a “not rejected” ensemble. This is also preposterous when considering how these models are used for long-term future projections.

But of course the SEs are much smaller than the SDs (by a factor of sqrt(N)), so equality of the means is a very easy reject.

It does appear that Megan’s internet test is based on the equality of the two SDs, while in fact they are obviously very different. There is a long literature on the “Fisher-Behrens” problem of testing the equality of two means when the variances are unequal, e.g. http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aoms/1177697509 . It seems that there is no exact test, but a lot of good approximate tests. In the present case, however, it’s not even a close call, so any reasonable statistic would give an easy reject.
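One standard approximate solution to the Fisher–Behrens problem is Welch’s t-test, which scipy implements via `equal_var=False`. The sketch below uses fabricated trend numbers (they are not the actual figures discussed in the thread) and checks the library statistic against the hand-computed Welch standard error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Fabricated, illustrative "trends": a tight set of 5 observational
# estimates and a wide set of 55 single-run model estimates.
obs_trends = rng.normal(0.11, 0.02, size=5)
model_trends = rng.normal(0.22, 0.10, size=55)

# Welch's t-test: equality of means WITHOUT assuming equal variances
t, p = stats.ttest_ind(obs_trends, model_trends, equal_var=False)
print(t, p)

# The same statistic by hand, using the Welch standard error
se = np.sqrt(obs_trends.var(ddof=1) / 5 + model_trends.var(ddof=1) / 55)
t_manual = (obs_trends.mean() - model_trends.mean()) / se
```

Because the standard error shrinks like 1/sqrt(N) while the SD does not, even modest mean differences become easy rejects at these sample sizes.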

To be clear, I have suggested that the individual models give too little variability, a point on which everyone seems to agree. Clearly a stochastic component needs to be added.

Yes, I am suggesting the **inter-model** variability is too high. As discussed here, we can make it as high or low as we want by adding or subtracting models, bogus or otherwise. In addition, as I explained, they should all be designed to produce future histories — that is, they are trying to do the same thing. If they differ wildly as to future histories, compare them to the data and pick the best model.

As my example above shows, it seems suspiciously like the ensemble approach is designed to “not reject” the models, while also not providing the inherent stochastic variability, so that the 100-year projections essentially just have a drift term.

It looks like another bogus statistical approach from the climate scientists.
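The “compare them to the data and pick the best model” idea above can be sketched as a simple scoring exercise. Everything here is a toy: the model names, trends, and noise levels are invented, and RMSE against the observed record is just one plausible scoring rule.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "observations": a 0.15/decade trend plus noise over 30 years
years = np.arange(30)
obs = 0.15 * years / 10 + rng.normal(0, 0.1, size=years.size)

# Hypothetical model hindcasts with differing trends (per decade)
models = {
    f"model_{i}": b * years / 10 + rng.normal(0, 0.1, size=years.size)
    for i, b in enumerate([0.05, 0.14, 0.16, 0.30, 0.45])
}

# Score each model by RMSE against the observed record, keep the best
rmse = {name: np.sqrt(np.mean((run - obs) ** 2))
        for name, run in models.items()}
best = min(rmse, key=rmse.get)
print(sorted(rmse.items(), key=lambda kv: kv[1]))
```

Models with wildly different future histories produce very different hindcast errors, so a scoring rule like this separates them quickly.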

Orkneygal: Thought it wouldn’t help. For your example of lapse rate, I don’t know what this is, but if it is an output of the model, then a single run gives only a single point value. Since this is of no use to anyone, there has to be some attempt to quantify the uncertainty in some other way. One way is to run the model with different starting values, which will result in different point values for the lapse rate. The distribution of these can be used as an estimate of the range of possible true values. It is obvious, however, that a single model, even with different starting values, will not give enough variability, which is why a range of models is used.
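The initial-condition-ensemble idea above can be illustrated with a deliberately simple chaotic stand-in. The logistic map here is not a climate model, just a toy whose sensitivity to starting values produces a spread of outputs from tiny perturbations.

```python
import numpy as np

rng = np.random.default_rng(3)

def toy_model(x0, steps=100, r=3.7):
    """A chaotic toy stand-in for a model run: the logistic map."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

# Run the SAME model from slightly perturbed starting values
starts = 0.5 + rng.normal(0, 1e-6, size=20)
outputs = np.array([toy_model(x0) for x0 in starts])

# The spread of these outputs is one (incomplete) estimate of the
# uncertainty in the single point value a lone run would give.
print(outputs.mean(), outputs.std(ddof=1))
```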

A credibility (or “credible”) interval is the Bayesian equivalent of a confidence interval. From the posterior distribution of a parameter (i.e. the distribution given the data and the prior), take the interval that includes 95% of this distribution.
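The recipe above — take the interval containing 95% of the posterior — is a two-line computation once you have posterior samples. The normal distribution here is only a stand-in for samples an MCMC run would produce.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for posterior samples of a parameter (e.g. from MCMC);
# a plain normal distribution is used purely for illustration.
posterior_samples = rng.normal(loc=0.2, scale=0.05, size=10_000)

# 95% equal-tailed credible interval: the central 95% of the posterior
lo, hi = np.percentile(posterior_samples, [2.5, 97.5])
print(lo, hi)
```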
