I have much unfinished business with multiproxy studies, but am getting dragged into discussing GCMs, where I wish to make clear that I am not familiar with the literature and am merely commenting on individual articles as I read them in the context of current discussion. If I miss some nuance, I apologize and will try to correct it. Rasmus observed here that:
the most appropriate null-distributions are derived from long control simulations performed with such GCMs. The GCMs embody much more physically-based information, and do provide a physically consistent representation of the radiative balance, energy distribution and dynamical processes in our climate system.
Kaufmann and Stern, in A Statistical Evaluation of GCMs: Modeling the Temporal Relation between Radiative Forcing and Global Surface Temperature (here), consider the interesting question of whether GCMs out-perform elementary statistical models given the same inputs. They come to very negative conclusions about GCMs and even question whether GCMs are an appropriate tool for assessing global temperature change (as opposed to regional impacts). They say:
none of the GCM’s have explanatory power for observed temperature additional to that provided by the radiative forcing variables that are used to simulate the GCM…
Curiously, Kaufmann weighed in at realclimate in a thread which was specifically discussing the utility of GCMs for making null distributions, but did not mention this article, instead mentioning some of his other work purporting to show cointegration between CO2 and global temperature change. In passing, I might observe once again that simulating 25 model years with a GCM takes about 1 calendar day of computation. Obtaining a null distribution of 1000 runs of 25,000 years each (still less than one obliquity cycle) is well beyond the range of present computing power. So the "null distributions" that Rasmus is talking about are not "null distributions" as they are understood in statistics, where 1000 realizations would be a bare minimum. Anyway, on to some excerpts from Kaufmann and Stern.
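To put the arithmetic in perspective, here is a back-of-envelope calculation under the stated assumption of roughly 25 model years per calendar day of computing; the figures are illustrative only.

```python
# Back-of-envelope estimate of the computing time for a GCM-based null
# distribution, assuming (as stated above) ~25 model years per calendar day.
MODEL_YEARS_PER_DAY = 25       # assumed GCM throughput
RUNS = 1000                    # bare-minimum ensemble size for a null distribution
YEARS_PER_RUN = 25_000         # length of each control run (< 1 obliquity cycle)

total_model_years = RUNS * YEARS_PER_RUN
compute_days = total_model_years / MODEL_YEARS_PER_DAY
print(f"{total_model_years:,} model years -> {compute_days:,.0f} days "
      f"(~{compute_days / 365.25:,.0f} years of computing)")
# 25,000,000 model years -> 1,000,000 days (~2,738 years of computing)
```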
The Abstract is as follows:
Abstract: We evaluate the statistical adequacy of three general circulation models (GCMs) by testing three aspects of a GCM’s ability to reconstruct the historical record for global surface temperature: (1) how well the GCMs track observed temperature; (2) are the residuals from GCM simulations random (white noise) or are they systematic (red noise or a stochastic trend); (3) what is the explanatory power of the GCMs compared to a simple alternative model, which assumes that temperature is a linear function of radiative forcing. The results indicate that three of the eight experiments considered fail to reconstruct temperature accurately; the GCM errors are either red noise processes or contain a systematic error, and the radiative forcing variable used to simulate the GCM’s have considerable explanatory power relative to GCM simulations of global temperature. The GFDL model is superior to the other models considered. Three out of four Hadley Centre experiments also pass all the tests but show a poorer goodness of fit. The Max Planck model appears to perform poorly relative to the other two models.
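As an aside, test (2) in the abstract amounts to checking whether the simulation residuals are white noise or instead show autocorrelation or a stochastic trend. A minimal illustration of that kind of check (not the authors' actual procedure; the residual series here is synthetic) might look like this:

```python
# Illustrative check of whether simulation residuals are white noise:
# Ljung-Box test for autocorrelation and an augmented Dickey-Fuller test
# for a stochastic trend. The residual series is synthetic, not the paper's.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
residuals = rng.normal(0, 0.1, 130)          # stand-in for observed minus simulated temperature

lb = acorr_ljungbox(residuals, lags=[10])    # H0: no autocorrelation (white noise)
adf_stat, adf_p, *_ = adfuller(residuals)    # H0: unit root (stochastic trend)
print("Ljung-Box p-value:", float(lb["lb_pvalue"].iloc[0]))
print("ADF p-value:", adf_p)
```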
The running text contains the following observations:
These results indicate that the GCM temperature reconstruction does not add significantly to the explanatory power provided by the radiative forcing aggregate that is used to simulate the GCM (Table 4). Conversely, we strongly reject (p< .01) restrictions that eliminate X and/or DX for all eight experiments. This indicates that the radiative forcing variables used to simulate the GCM have explanatory information about observed surface temperature that is not present in the GCM simulation for global surface temperature.
As described in section 4.3, none of the GCM’s have explanatory power for observed temperature additional to that provided by the radiative forcing variables that are used to simulate the GCM… we cannot make any precise determinations as to the cause for the explanatory power of the model inputs relative to the GCM output.
Conclusions about the effect of human activity on surface temperature are based in large part on comparisons of observed temperature and GCM simulations (Mitchell et al., 2001). But this may not be most effective means for attribution: the noise in even the best simulation (in this case the GFDL simulation) increases the uncertainty involved in attributing and predicting climate change. This uncertainty could be reduced by using appropriately specified and estimated statistical models to simulate the relation between human activity and observed temperature (Kaufmann and Stern, 1997, 2002; Stern and Kaufmann, 2000; Stern, 2004). Initial results with a new generation of time series models that are constrained by recently available observations on ocean heat content and energy balance considerations are very promising, yielding realistic rates of adjustment of atmospheric temperature to long-run equilibrium (Stern, 2004).
I’ve not attempted to parse through all the reasons for their findings and do not specifically endorse them; I’m merely noting them. If their conclusions are correct, then the GCMs do not actually assist in the calculation of global temperature, as compared to linear models built from the input forcing series themselves.
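For readers who want a feel for the kind of test behind these statements, here is a minimal sketch, not Kaufmann and Stern's actual code and with placeholder series, of a nested-regression (encompassing-style) comparison: regress observed temperature on the GCM simulation alone, then add the radiative forcing variable, and F-test whether the forcing has explanatory power beyond the GCM output.

```python
# Illustrative nested-regression test: does radiative forcing add explanatory
# power for observed temperature beyond the GCM simulation? Series names
# (obs_temp, gcm_temp, forcing) are placeholders, not the authors' data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 130  # e.g. annual values over the instrumental period
forcing = np.cumsum(rng.normal(0.02, 0.05, n))        # stand-in forcing series
gcm_temp = 0.5 * forcing + rng.normal(0, 0.1, n)      # stand-in GCM output
obs_temp = 0.6 * forcing + rng.normal(0, 0.1, n)      # stand-in observations

df = pd.DataFrame({"obs": obs_temp, "gcm": gcm_temp, "forcing": forcing})

restricted = sm.OLS(df["obs"], sm.add_constant(df[["gcm"]])).fit()
unrestricted = sm.OLS(df["obs"], sm.add_constant(df[["gcm", "forcing"]])).fit()

# F-test of the restriction that the forcing coefficient is zero; a small
# p-value means forcing explains temperature beyond what the GCM output does.
f_stat, p_value, _ = unrestricted.compare_f_test(restricted)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

In the paper's terms, rejecting the restriction that the forcing variables can be dropped corresponds to the finding that they contain information about observed temperature that is not present in the GCM simulation.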
Reference: Kaufmann and Stern, A Statistical Evaluation of GCMs: Modeling the Temporal Relation between Radiative Forcing and Global Surface Temperature. http://www.bu.edu/cees/people/faculty/kaufmann/documents/Model-temporal-relation.pdf