Can anyone on the Team actually hit a target?
A couple of days ago, I reported that Santer’s own method yielded failed t-tests on UAH when data up to 2008 (or even 2007) was used. I also reported that their SI (carried out in 2008) included a sensitivity test on their H1 hypothesis up to 2006, but they neglected to either carry out or report the corresponding test on the H2 hypothesis (the results reported in Table III).
The reason the H2 test failed using up-to-date data is not that the trend changed materially from the 1979-99 trend, but that the AR1-style uncertainty decreased a lot. I had calculated the changes in SE (observed trend) in order to make these calculations, but I didn’t discuss them. For reference, the SE (observed trend) for 1979-99 (Santer Table 1, which I’ve replicated using their methods) was 0.138 deg C/decade. Comparing apples to apples, using information up to 2007 (available to Santer and his Team as of the submission of their article), the SE (observed trend) had decreased to 0.067 deg C/decade.
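For readers who want to see why a longer record shrinks an AR1-style standard error, here is a minimal sketch. It assumes a lag-1 effective-sample-size adjustment, n_eff = n(1 - r1)/(1 + r1), applied to the OLS trend residuals; this is my reading of what an "AR1-style" adjustment generally means, not Santer's code. The data are synthetic (the function synthetic_anomalies and its parameters are made up for illustration), so the printed numbers are not the UAH values quoted above.

```python
import math
import random


def trend_with_ar1_se(y):
    """OLS trend per time step, lag-1 residual autocorrelation,
    and the naive and lag-1-adjusted standard errors of the trend."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in range(n))
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in enumerate(y))
    slope = sxy / sxx
    resid = [yi - y_mean - slope * (ti - t_mean) for ti, yi in enumerate(y)]
    sse = sum(e * e for e in resid)
    # Lag-1 autocorrelation of the regression residuals.
    r1 = sum(a * b for a, b in zip(resid, resid[1:])) / sse
    # Assumed adjustment: shrink the sample size to an "effective" one.
    n_eff = n * (1 - r1) / (1 + r1)
    if n_eff <= 2:
        raise ValueError("effective sample size too small for this adjustment")
    se_naive = math.sqrt(sse / (n - 2) / sxx)
    se_adjusted = math.sqrt(sse / (n_eff - 2) / sxx)
    return slope, r1, se_naive, se_adjusted


def synthetic_anomalies(n_months, phi=0.6, sigma=0.15, trend=0.001, seed=0):
    """Hypothetical monthly series: linear trend plus AR(1) noise."""
    rng = random.Random(seed)
    noise, out = 0.0, []
    for i in range(n_months):
        noise = phi * noise + rng.gauss(0.0, sigma)
        out.append(trend * i + noise)
    return out


series = synthetic_anomalies(348)  # 29 years of months, like 1979-2007
for label, months in [("first 21 years", 252), ("all 29 years", 348)]:
    slope, r1, se_naive, se_adj = trend_with_ar1_se(series[:months])
    # Multiply by 120 to convert per-month values to per-decade.
    print(f"{label}: r1={r1:.2f}, naive SE={se_naive * 120:.4f}, "
          f"adjusted SE={se_adj * 120:.4f} per decade")
```

The adjusted SE is larger than the naive one whenever r1 is positive, and it falls as months are added, because the spread of the time axis grows quickly with record length.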
The “correct” CI, according to the actual methods of Santer et al, is based on the Pythagorean sum of the SE (observed trend) (0.138 in their analysis) and the much decried SE of the inter-model mean (0.092/sqrt(19-1) = 0.0217), which comes to 0.14 deg C/decade.
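The arithmetic is easy to check. The sketch below uses only the numbers quoted above (0.138, 0.067, 0.092 and 19 models); the function names are mine, chosen for illustration.

```python
import math


def se_of_model_mean(intermodel_sd, n_models):
    """SE of the inter-model mean, using the sqrt(n - 1) divisor quoted in the text."""
    return intermodel_sd / math.sqrt(n_models - 1)


def pythagorean_sum(*standard_errors):
    """Square root of the sum of squares of independent standard errors."""
    return math.sqrt(sum(s * s for s in standard_errors))


se_models = se_of_model_mean(0.092, 19)
print(f"SE of inter-model mean: {se_models:.4f}")

# SE (observed trend) for 1979-99 and for data through 2007, deg C/decade.
for label, se_obs in [("1979-99", 0.138), ("through 2007", 0.067)]:
    combined = pythagorean_sum(se_obs, se_models)
    print(f"{label}: combined SE = {combined:.4f} deg C/decade")
```

With the 1979-99 figure the combined value rounds to 0.14; with the figure through 2007 it is about 0.07, roughly half as large, which is why the same trend difference can pass one test and fail the other.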
Santer et al. state that Douglass et al had ignored uncertainty in the observations:
DCPS07 ignore the pronounced influence of interannual variability on the observed trend (see Figure 2A). They make the implicit (and incorrect) assumption that the externally forced component in the observations is perfectly known (i.e. the observed record consists only of φ_o(t), and η_o(t) = 0).
Most readers have taken this for granted. But let’s see what Douglass et al themselves say that they did:
Agreement means that an observed value’s stated uncertainty overlaps the 2σSE uncertainty of the models.
In examples in the running text of Douglass et al, one can see applications of this procedure. For example:
Given the trend uncertainty of 0.04 °C/decade quoted by Free et al. (2005) and the estimate of ±0.07 for HadAT2, the uncertainties do not overlap according to the 2σSE test and are thus in disagreement.
So it’s a bit premature to take at face value Santer’s allegation that Douglass et al “ignored” trend uncertainty in the observations. Douglass et al say that they considered trend uncertainty in the observations (I haven’t parsed this in their article yet.) For now, let’s compare the uncertainties that Douglass said that they used, as compared to uncertainties calculated according to the method Santer said should be used. Here’s what Douglass et al said about trend uncertainty for the UAH T2LT that we’ve been looking at:
For T2LT, Christy et al. (2007) give a tropical precision of ±0.07 °C/decade, based on internal data-processing choices and external comparison with six additional datasets, which all agreed with UAH to within ±0.04. Mears and Wentz (2005) estimate the tropical RSS T2LT error range as ±0.09.
Do you recognize any numbers here? Look at the trend uncertainty for UAH T2LT up to the end of 2007 – calculated according to Santer’s own methods – 0.067 deg C/decade. Compare to the Douglass number – ±0.07 °C/decade.
So if you use up-to-date information in calculating the trend uncertainty according to the Santer method, you get essentially the same value (actually a titch less) as the one that Douglass et al say that they used (0.07). I suppose that Santer might then try to argue that the Douglass et al estimate of 0.07 deg C/decade covers different uncertainties than their own 0.07 deg C/decade, but what’s the evidence? And since this is a critical point, shouldn’t Santer have proved the point with some sort of exegesis?
[Update – Oct 26] John Christy writes in with the following exegesis of the topic:
In our paper, we addressed “measurement” uncertainty – i.e. how precise the trend really is given the measurement problems with the systems (I introduced the separation of these two types of uncertainty in the IPCC TAR text in the upper air temperature section.) We wanted to know how accurate the observational trend estimates were.
The subtle point here is that “temporal” or “statistical” uncertainty is not needed because of the precondition that the surface trends of the observations and the model simulations be the same. Sure, you can create a surface trend of +0.13 C/decade from scrambling the El Ninos and volcanoes any way you like, but when the surface trend comes out to +0.13 it will be accompanied by (in the models) a specific tropospheric trend – no need for the “temporal” uncertainty. When you scramble the interannual variations and get a trend different from +0.13, then we don’t want to use it in our specific experimental design.
This notion of the precondition seems lost in the discussion, but when understood, the arguments of Santer17 become irrelevant.
On this information, I’m not quite sure what the ±0.07 °C/decade in Douglass et al would represent in Santer terms. At this point, it appears that the Santer accusation that Douglass et al had “ignored” this form of uncertainty is incorrect. However, the way that they handled it is, as Christy says, sufficiently “subtle” that, before expressing an opinion on whether the Douglass method is adequate in the context of Santer’s data, I’d need to emulate exactly what was done on Santer’s data and see how it works.
At present, I’ve requested the 49 Santer runs as used, but have received no response from Santer other than that he is at a workshop. The Santer coauthor with whom I’ve cordially corresponded did not personally ever receive a copy of the data and indicates that Santer will probably take the position that I be obliged to run the irrelevant gauntlet of trying to reconstruct his dataset from first principles, even to do simple statistical tests – the sort of petty silliness that is incomprehensible to the public. [End update]
PS. There’s one other issue raised in Santer (citing Lanzante 2005). Lanzante 2005 observed that the denominator when combining different uncertainties is a Pythagorean sum (sqrt(ssq)) and that a t-test using this gives different results than a visual comparison of overlap of uncertainties. If one uncertainty is a lot larger than the other, the Pythagorean sum ends up being dominated by the larger uncertainty.
For example, in the Santer Table III example sqrt(.138^2+.0217^2)= .1397, only about 1% larger than the greater uncertainty by itself. A visual comparison in which the overlap of separate 2-sigma intervals is inspected implies a somewhat more onerous test than using t=2 against the Pythagorean sum, because separate intervals stop overlapping only when the difference exceeds twice the plain sum of the two uncertainties. Lanzante 2005 criticized these sorts of visual comparisons in IPCC TAR and many climate articles, but the practice continues.
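To make the Lanzante point concrete, here is a small sketch comparing the two criteria. The standard errors are the Table III values quoted above; the trend difference of 0.30 deg C/decade is a hypothetical number picked to fall between the two thresholds, not a value from either paper, and the function names are mine.

```python
import math


def intervals_overlap(x1, se1, x2, se2, k=2.0):
    """Visual check: do the separate k-sigma intervals overlap?"""
    return abs(x1 - x2) <= k * (se1 + se2)


def differs_by_t_test(x1, se1, x2, se2, k=2.0):
    """t-style check: does the difference exceed k times the Pythagorean sum?"""
    return abs(x1 - x2) > k * math.hypot(se1, se2)


se_obs, se_models = 0.138, 0.0217
print(f"overlap threshold: {2 * (se_obs + se_models):.4f}")
print(f"t-test threshold:  {2 * math.hypot(se_obs, se_models):.4f}")

# Hypothetical trend difference lying between the two thresholds.
difference = 0.30
print("intervals overlap:", intervals_overlap(0.0, se_obs, difference, se_models))
print("t-test rejects:   ", differs_by_t_test(0.0, se_obs, difference, se_models))
```

The two thresholds differ by only about 0.04 deg C/decade here because one uncertainty dominates the other, so the two criteria disagree only for differences in that narrow band.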
While the point is valid enough against Douglass, it is equally so against dozens of other studies; however, in the matter at hand, the issue does not appear to be material in any event.