David Douglass writes in:
I wish to make a few comments/clarifications on the Douglass et al. paper. We took the model data from the LLNL archive. We computed the tropical zonal averages and trends from 1979 to 1999 at the various pressure levels for each realization of each model, and averaged the realizations for each model to simulate removal of the El Nino-Southern Oscillation (ENSO) effect; these models cannot reproduce the observed time sequence of El Nino and La Nina events, except by chance, as Santer et al. have pointed out.
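The trend-then-average step described above can be sketched as follows. This is a minimal illustration, not the authors' actual code: the function name, the synthetic data, and the assumed input (one anomaly time series per realization at a single pressure level) are all my own assumptions.

```python
import numpy as np

def model_trend(realizations, years):
    """Fit a least-squares linear trend to each realization at one
    pressure level, then average the per-realization slopes; averaging
    suppresses ENSO-like variability that is uncorrelated across
    realizations. Returns the mean trend in K/year."""
    slopes = [np.polyfit(years, r, 1)[0] for r in realizations]
    return float(np.mean(slopes))

# Synthetic example (illustrative only): 3 realizations, 1979-1999,
# with a true trend of 0.015 K/year plus random year-to-year noise.
years = np.arange(1979, 2000)
rng = np.random.default_rng(0)
signal = 0.015 * (years - years[0])
realizations = signal + 0.1 * rng.standard_normal((3, years.size))
trend_per_decade = 10 * model_trend(realizations, years)
```

With more realizations the averaged trend converges toward the forced signal, which is the point of the averaging.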
We offered no opinion as to the validity of any model.
This is table II of our paper. [Note: Some of the values in the table for the models at 1000 hPa are not consistent with the surface value. This is probably because some model values at p = 1000 hPa are unrealistic; they may be below the surface. This may be why the modelers compute the surface value separately; those separately computed values are the ones we use in the plot.]
How to interpret model data in Table II?
We noted that previous papers and IPCC reports had introduced the concept of averaging over models. There is no scientific justification for doing so. The models are not independent samples from a parent population because the models are, in fact, different from each other in many ways.
Nevertheless, we used this same concept to describe the results of our paper and computed statistical quantities: averages, standard deviations, and standard errors of the mean. This may not have been the best decision. Perhaps a better way would have been to forget statistics altogether and just look at the plot of the model results from the table, as Willis Eschenbach has suggested [comment #144]. I show this plot below. Note that the models are not normalized to their surface value; normalizing hides some of the discrepancies.
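For concreteness, the three statistics mentioned above are computed like this. The trend values here are hypothetical placeholders, not the numbers from Table II.

```python
import statistics

# Hypothetical per-model trends (K/decade) at one pressure level;
# illustrative values only, not Table II entries.
model_trends = [0.10, 0.22, 0.31, 0.18, 0.27]

mean = statistics.mean(model_trends)
stdev = statistics.stdev(model_trends)     # sample standard deviation
sem = stdev / len(model_trends) ** 0.5     # standard error of the mean
```

The standard error shrinks as the number of models grows, which is exactly why the statistic is questionable here: it presumes the models are independent samples from a common population.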
The observations are not plotted. The main observational results are:
1. The surface value is about 0.125 K/decade, which is close to the 22-model average.
2. The trend values of the 4 data sets generally decrease with altitude. [Note: Christy has explained in these comments the choice of RAOBCORE 1.2. We submitted this to the IJC Journal on Jan 3, 2008 as an addendum. It is not yet published, but I will send a copy of the addendum to anyone who requests it.]
The plot of the values from the table indicates that most of the models show an increase with altitude – opposite to what the observations show.
Which ones?
We would like to make a list of which models do not agree with the observations. Better yet, which ones are not excluded?
Rather than argue pointlessly about the meaning of the standard deviation and standard error of the mean of the 22 models, let’s do something really simple: compare each model individually against the two observational results above.
As a somewhat arbitrary criterion, take test 1 to be: models whose trend at higher altitudes [between 500 and 200 hPa] exceeds 0.2 K/decade disagree with the observations. This test rejects all models except 2, 8, 12, and 22. [These are labeled in the figure.]
Test 2: reject models whose surface trend is less than 0.05 K/decade. Model 22 from the list in test 1 is now eliminated. Out of the 22, only models 2, 8, and 12 remain viable. Further tests, or different tests, can change this list. Invent your own tests or criteria; the main conclusion will be the same – only a few models can be reconciled with the observations.
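The two screening tests above amount to a simple filter. The per-model trend values below are hypothetical placeholders (K/decade), not Table II entries; substitute the actual surface and 500–200 hPa trends to reproduce the screening.

```python
# Hypothetical per-model trends in K/decade (illustrative only).
models = {
    "model_A": {"surface": 0.12, "upper": 0.15},  # passes both tests
    "model_B": {"surface": 0.03, "upper": 0.10},  # fails test 2 (surface)
    "model_C": {"surface": 0.20, "upper": 0.35},  # fails test 1 (upper air)
}

viable = [name for name, t in models.items()
          if t["upper"] <= 0.2       # test 1: 500-200 hPa trend limit
          and t["surface"] >= 0.05]  # test 2: minimum surface trend
```

Swapping in different thresholds changes which models survive, but with observation-based cutoffs only a handful remain.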
I want to make a general comment.
Each modeling group may find comparing its results to those of other groups interesting, but a much more important comparison is to the observations. I would expect each group, following the scientific method, to do this. A corollary is that each group would then have to say that some, perhaps most, models are wrong. It is in the interest of every group to do this.