I posted up on Kaufmann and Stern on GCMs a few days ago. Kaufmann subsequently posted up at realclimate here about this, with a detailed reply from Gavin. The exchange is interesting on a number of levels – an interesting statistical point is raised. In addition, you will notice how quick Gavin is to try to move what was quickly becoming a "serious" discussion offline, perhaps so that the hoi polloi aren’t involved.
Kaufmann wrote in as follows:
I would like to pick up on a comment made by per (#58) about testing GCM’s against real-world data. As an outsider to the GCM community, I did such an analysis by testing whether the exogenous inputs to GCM (radiative forcing of greenhouse gases and anthropogenic sulfur emissions) have explanatory power about observed temperature relative to the temperature forecast generated by the GCM. In summary, I found that the data used to simulate the model have information about observed temperature beyond the temperature data generated by the GCM. This implies that the GCM’s tested do not incorporate all of the explanatory power in the radiative forcing data in the temperature forecast. If you would like to see the paper, it is titled "A statistical evaluation of GCM’s: Modeling the temporal relation between radiative forcing and global surface temperature" and is available from my website
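Kaufmann’s test is, in econometric terms, an encompassing test: regress observed temperature on the GCM forecast, then ask whether adding the forcing series improves the fit. A minimal sketch on synthetic data (the series, coefficients and noise levels below are illustrative assumptions of mine, not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120  # number of annual observations (illustrative)

# Synthetic "radiative forcing" and observed temperature (assumed, not real data)
forcing = np.linspace(0.0, 2.0, n) + 0.05 * rng.standard_normal(n)
observed = 0.8 * forcing + 0.05 * rng.standard_normal(n)

# A single GCM run: the same forced signal plus large unforced "weather" noise
gcm_run = 0.8 * forcing + 0.25 * rng.standard_normal(n)

def ssr(y, cols):
    """Sum of squared residuals from an OLS fit of y on cols (with intercept)."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Restricted model: observed ~ GCM forecast only
ssr_r = ssr(observed, [gcm_run])
# Unrestricted model: observed ~ GCM forecast + forcing
ssr_u = ssr(observed, [gcm_run, forcing])

# F-test for whether the forcing adds explanatory power (one restriction)
k_u = 3  # intercept + two regressors
F = (ssr_r - ssr_u) / (ssr_u / (n - k_u))
print(f"F statistic for adding the forcing: {F:.1f}")
```

A large F says the forcing carries information about observed temperature beyond what the GCM forecast contains – which is the finding Kaufmann describes.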
Needless to say, this paper was not received well by some GCM modelers. The paper would usually have two good reviews and one review that wanted more changes. Together with my co-author, we made the requested changes (including adding an "errors-in-variables" approach). The back and forth was so time consuming that in the most recent review, one reviewer now argues that we have to analyze the newest set of GCM runs – the runs from 2001 are too old. The reviewer did not state what the "current generation" of GCM forecasts are! Nor would the editor really push the reviewer to clarify which GCM experiments would satisfy him/her. I therefore ask readers what are the most recent set of GCM runs that simulate global temperature based on the historical change in radiative forcing and where I could obtain these data?
Many of the runs have many more forcings than you considered in your paper which definitely improve the match to the obs. However, I am a little puzzled by one aspect of your work – you state correctly that the realisation of the weather ‘noise’ in the simulations means that the output from any one GCM run will not match the data as well as a statistical model based purely on the forcings (at least for the global mean temperature). This makes a lot of sense and seems to me to be equivalent to the well-known result that the ensemble mean of the simulations is a better predictor than any individual simulation (specifically because it averages over the non-forced noise). I think this is well accepted in the GCM community at least for the global mean SAT. That is why simple EBMs (such as Crowley (2000)) do as good a job for this as GCMs. The resistance to your work probably stems from a feeling that you are extrapolating that conclusion to all other metrics, which doesn’t follow at all. As I’ve said in other threads, the ‘cutting-edge’ for GCM evaluation is at the regional scale and for other fields such as precipitation, the global mean SAT is pretty much a ‘done deal’ – it reflects the global mean forcings (as you show). I’d be happy to discuss this some more, so email me if you are interested [my bold].
First, in terms of Kaufmann’s problems with GCM reviewers: it seems to me that he is perfectly entitled to comment on the GCM models used in IPCC TAR, which have been published, documented to some extent and used in policy. If global mean SAT is a "done deal", as Gavin says, then these earlier models should hold up to Kaufmann’s evaluation. I think that he should stick to his guns and not get into trying to sort out 14 new models.
Second, with respect to the newer models themselves, Gavin argues that, since they have more forcings, they will "improve the match to the obs" in the GCM. However, it’s pretty obvious that using more forcings will also improve the match in a simple linear model between the forcings and the global mean temperature. So Gavin’s first sentence proves nothing.
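This in-sample point is mechanical: adding a regressor can never increase the residual sum of squares of a least-squares fit, whether the extra series is a genuine forcing or pure noise. A toy demonstration (all data synthetic and hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
y = rng.standard_normal(n)    # stand-in for observed temperature
x1 = rng.standard_normal(n)   # stand-in for one "forcing"
x2 = rng.standard_normal(n)   # a second, entirely irrelevant "forcing"

def ssr(y, *cols):
    """Sum of squared residuals of an OLS fit of y on cols (with intercept)."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

ssr_one = ssr(y, x1)
ssr_two = ssr(y, x1, x2)
print(f"SSR with one forcing:  {ssr_one:.3f}")
print(f"SSR with two forcings: {ssr_two:.3f}")
```

Since the fit improves even when `x2` is noise, "more forcings gives a better match" is equally true of the simple statistical model and so does not favour the GCMs.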
Third, look again at this extraordinary sentence of Gavin’s:
you state correctly that the realisation of the weather ‘noise’ in the simulations means that the output from any one GCM run will not match the data as well as a statistical model based purely on the forcings (at least for the global mean temperature).
Well, if this point of view is "correct", where have you ever seen it in print? Other than Kaufmann and Stern, has anyone ever seen a study comparing the results of a GCM to a simple linear model? Did Gavin ever say about his own GCM that, for estimating global temperature, we would have had more accurate results from a simple linear regression against the forcings? I concede that I haven’t read the GCM literature and perhaps Gavin has said this somewhere, but I’d be extremely surprised. So Gavin’s condescending remark here is unjustified: Kaufmann and Stern’s point is interesting and not at all obvious. While "correctly" is correct, the observation is more than "correct"; it’s also interesting and provocative.
Fourth, look at Gavin’s next sentence, which is no better:
This makes a lot of sense and seems to me to be equivalent to the well-known result that the ensemble mean of the simulations is a better predictor than any individual simulation (specifically because it averages over the non-forced noise).
Well, it isn’t an equivalent result at all. Indeed, I think that it only reinforces Kaufmann’s point. Kaufmann said that a simple linear model out-performed several prominent and high-powered GCMs that took days of supercomputer time to run. Gavin counters that an ensemble of GCMs will out-perform any individual GCM. But that’s a quite different point. If the net output of an ensemble of (10?) GCMs for global temperature maybe gets you back to the performance level of a simple linear model, then what’s the purpose of running the 10 GCMs if you’re concerned about global temperature? Gavin tries to fend this off by arguing that the community has “moved on” to regional issues, because the temperature issues are a “done deal”. But the big issue is temperature – if the GCMs individually (and perhaps even collectively – we don’t know) cannot match a simple linear model, then why are they being touted as the way to study the problem and to generate null distributions – which is where we got into this via Cohn and Lins?
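The relationship between the two claims can be seen in a toy simulation (all numbers here are my own illustrative assumptions, not output from any actual GCM): each "run" is the same forced signal plus independent weather noise, so averaging runs shrinks the noise, while a regression directly on the forcing skips the noise altogether.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 120, 10  # years and ensemble members (illustrative sizes)

forcing = np.linspace(0.0, 2.0, n)
signal = 0.8 * forcing                       # forced response (assumed linear)
observed = signal + 0.05 * rng.standard_normal(n)

# Ten "GCM runs": same forced signal, independent unforced weather noise
runs = signal + 0.25 * rng.standard_normal((m, n))

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rmse_runs = [rmse(r, observed) for r in runs]
rmse_mean = rmse(runs.mean(axis=0), observed)

# Simple statistical model: OLS of observed temperature on the forcing
X = np.column_stack([np.ones(n), forcing])
beta, *_ = np.linalg.lstsq(X, observed, rcond=None)
rmse_linear = rmse(X @ beta, observed)

print(f"individual runs, RMSE range: {min(rmse_runs):.3f} to {max(rmse_runs):.3f}")
print(f"ensemble mean,   RMSE:       {rmse_mean:.3f}")
print(f"linear model,    RMSE:       {rmse_linear:.3f}")
```

In this toy setup the ensemble mean beats every individual run, but the simple regression on the forcing beats the ensemble mean – which is why "the ensemble out-performs any one run" does not answer "a linear model out-performs the GCMs".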
Gavin argues that the resistance to Kaufmann and Stern comes from a view that they are trying to go a bridge further and extrapolate to the performance of other metrics. Given that there is no discussion of other metrics in their article, there is no basis for this surmise.
Gavin then closes off:
I’d be happy to discuss this some more, so email me if you are interested.
I find this the most offensive statement of all. Why not do this online? I once questioned realclimate’s commitment to their stated policy that "serious rebuttals and discussions are welcomed" in the context that they devoted a post to criticizing Ross and me and then refused to post serious responses. In this case, they couldn’t get away with censoring Kaufmann, but it’s pretty clear that they didn’t want to have a "serious" discussion online. So Gavin asked Kaufmann to email him. I think that Kaufmann’s questions and points are good ones and Gavin’s rush offline is all too characteristic.