Last year, I reported that Santer’s claim that none of the satellite data sets showed a “statistically significant” difference in trend from the model ensemble, after allowing for the effect of AR1 autocorrelation on confidence intervals, did not hold up using up-to-date data. With up-to-date data, the claim was untrue for the UAH data sets and was on the verge of being untrue for RSS_T2. Ross and I submitted a comment on this topic to the International Journal of Climatology, which we’ve also posted on arxiv.org. I’m not going to comment right now on the status of this submission.
However, I re-visited some aspects of Santer et al 2008 that I’d not considered previously and found that it was worse than we thought.
In particular, I examined the following assertions that there was no statistically significant difference in lapse rate trends between observations and the ensemble mean. Santer observed that taking the lapse rate (the surface-minus-troposphere difference) eliminated a considerable amount of common variability, reducing the AR1 autocorrelation and thus narrowing the confidence intervals (which is a fair enough comment):
Tests involving trends in the surface-minus-T2LT difference series are more stringent than tests of trend differences in TL+O, TSST , or T2LT alone. This is because differencing removes much of the common variability in surface and tropospheric temperatures, thus decreasing both the variance and lag-1 autocorrelation of the regression residuals (Wigley, 2006). In turn, these twin effects increase the effective sample size and decrease the adjusted standard error of the trend, making it easier to identify significant trend differences between models and observations.
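The effect described in this quote can be sketched numerically. The following is a minimal illustration on synthetic data, not Santer's code or mine: it assumes the usual AR1 adjustment in which the effective sample size is n(1 - r1)/(1 + r1), with r1 the lag-1 autocorrelation of the regression residuals. The function name, the series and all numbers are hypothetical.

```python
import numpy as np

def ar1_adjusted_trend(y):
    """OLS trend per time step with an AR1-adjusted standard error.

    Assumed form of the adjustment: n_eff = n * (1 - r1) / (1 + r1),
    where r1 is the lag-1 autocorrelation of the regression residuals.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)
    # Lag-1 autocorrelation of the residuals
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    # Effective sample size after allowing for AR1 autocorrelation
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    # Residual variance computed with n_eff - 2 degrees of freedom
    s2 = np.sum(resid ** 2) / (n_eff - 2.0)
    se = np.sqrt(s2 / np.sum((t - t.mean()) ** 2))
    return slope, se, r1, n_eff

# Synthetic (hypothetical) monthly series sharing a common AR1 component
rng = np.random.default_rng(0)
n = 348
common = np.zeros(n)
for i in range(1, n):
    common[i] = 0.9 * common[i - 1] + rng.normal(scale=0.1)
t = np.arange(n)
surface = 0.0015 * t + common + rng.normal(scale=0.05, size=n)
tropo = 0.0010 * t + common + rng.normal(scale=0.05, size=n)

res_surface = ar1_adjusted_trend(surface)
res_diff = ar1_adjusted_trend(surface - tropo)  # "lapse rate" style difference series
print("surface: se=%.2e r1=%.2f n_eff=%.0f" % res_surface[1:])
print("diff:    se=%.2e r1=%.2f n_eff=%.0f" % res_diff[1:])
```

In this toy setup the difference series has a much lower residual autocorrelation than either series alone, so its effective sample size is larger and its adjusted standard error smaller, which is the point made in the quote.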
Santer noted that there were significant differences in lapse rate for the UAH data set but not for the RSS data, about which he said:
there is no case in which the model-average signal trend differs significantly from the four pairs of observed surface-minus-T2LT trends calculated with RSS T2LT data (Table VI).
If RSS T2LT data are used for computing lapse-rate trends, the warming aloft is larger than at the surface (consistent with model results)… When the d∗1 test is applied, there is no case in which hypothesis H2 can be rejected at the nominal 5% level (Table VI)
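For readers who want to experiment, here is a hedged sketch of a statistic of the d∗1 type. It assumes the statistic is the difference between the model-average trend and the observed trend, divided by the square root of the sum of two squared standard errors: the inter-model standard deviation scaled by the square root of the number of models, and the AR1-adjusted standard error of the observed trend. The function name and all input numbers are hypothetical, and this is an illustration of that assumed form rather than Santer's implementation.

```python
import math

def d1_star(b_obs, se_obs, model_trends):
    """Assumed form of a d1*-type statistic.

    b_obs        : observed trend (e.g. of a surface-minus-T2LT series)
    se_obs       : AR1-adjusted standard error of the observed trend
    model_trends : one trend per model (hypothetical values here)
    """
    n_models = len(model_trends)
    mean_model = sum(model_trends) / n_models
    # Sample variance of the trends across models
    var_model = sum((b - mean_model) ** 2 for b in model_trends) / (n_models - 1)
    # Standard error of the model-average trend
    se_mean = math.sqrt(var_model / n_models)
    return (mean_model - b_obs) / math.sqrt(se_mean ** 2 + se_obs ** 2)

# Hypothetical inputs, for illustration only
d = d1_star(b_obs=0.0, se_obs=0.05, model_trends=[0.1, 0.2, 0.3])
print(round(d, 3))
```

The resulting value is then compared against whatever critical value one adopts for the nominal 5% level.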
In the tropospheric data, we’d noticed that the trend in RSS T2 data was now very close to statistical significance and so it would be worthwhile checking these claims using up-to-date data. (Note that Santer did not mention T2 lapse rates in this summary, though T2 data is referred to in various tables elsewhere.)
As CA readers know, there are three major temperature indices that one encounters in public debate: HadCRU, GISS and NOAA. These were used for comparing troposphere and surface trends in the CCSP report that is a reference point for both Douglass and Santer, as shown in their Table 1 below.
As a first exercise, I calculated the lapse rate trend between these three surface indices (collated into TRP averages) and RSS T2LT (and T2), as well as the corresponding UAH results.
Using 2007 data (then available), 4 of the 6 comparisons of RSS data to surface data were statistically significant even with Santer’s autocorrelation adjustment: GISS relative to both T2LT and T2; NOAA and CRU relative to T2. Using data currently available, 5 (and perhaps 6) comparisons are now statistically significant. (The t-statistic for the 6th, NOAA vs T2LT, is at 1.924.)
Even using obsolete data (ending in 1999, as Santer did), the two GISS comparisons were “statistically significant”. The t-stats were 2.7 (T2LT) and nearly 3 (T2) and weren’t even borderline. So what was the basis of Santer’s claim that “none” of the RSS lapse rate trends relative to surface temperatures was “statistically significant”?
Santer didn’t include GISS and NOAA in his comparison!
He deleted the GISS and NOAA composites used by IPCC and others and replaced them by three SST series: ERSST v2, ERSST v3 and HadISST. The ERSST versions had lower trends and these trends were enough lower than NOAA and GISS that the discrepancy between the ERSST versions and RSS was no longer significant. Santer proclaimed victory.
In summary, considerable scientific progress has been made since the first report of the U.S. Climate Change [Science Program … there is no longer a] serious and fundamental discrepancy between modelled and observed trends in tropical lapse rates, despite DCPS07’s incorrect claim to the contrary. Progress has been achieved by the development of new T_SST , T_L+O and T2LT datasets, …
I have several problems with this supposed reconciliation. First, in my opinion, Santer should have reported the GISS and NOAA results. He could also have reported the ERSST results, but to simply not report the significant GISS results is hard to endorse, particularly when Gavin Schmidt was a coauthor and familiar with GISS.
Secondly, if one sticks to the actual indices in common use, the discrepancy in lapse rate is just as real as ever. If the ERSST versions aren’t actually used in GISS, NOAA or HadCRU, what exactly is accomplished by showing that there is supposedly no statistically significant discrepancy between RSS and a surface version that isn’t used in the composites?
Third, Bob Tisdale observed that the first version of ERSST v3 (back in the old days of October 2008 before they “moved on”) incorporated satellite data in their estimates of SST. If so, then it seems relatively unsurprising that adjusting SST with satellite data reduces the discrepancy between SST and satellite data, but this hardly resolves the situation for data that hasn’t been adjusted by satellites or, for that matter, the unadjusted difference in the original ERSST3.
Fourth, there’s a nice bit of further irony. The low trends of ERSST v3 apparently aroused protests within the community and the ink was barely dry on the publication of ERSST v3 before they moved on. Bob Tisdale reported that, in November 2008, ERSST v3 (now ERSST v3A) was withdrawn and replaced by a new ERSST v3, not using satellites, with a higher trend. The “old” ERSST v3 TRP trend up to end 1999 was 0.076 deg C/decade (I calculated this from a vintage gridded version of ERSST v3 that I located at another site); this was less than half the corresponding GISS trend and a little more than 60% of the CRU trend. But the trend of the new and improved ERSST v3 TRP version presently online at the ERSST v3 website (zonal) is 0.126 deg C/decade, a little higher than CRU. If they don’t adjust SST using satellites (and this adjustment seems to have been withdrawn after protests), the Santer reconciliation no longer works.
In summary, contrary to Santer’s claim that none of the lapse rate trends are significant, all of them (or all but one of them) are significant relative to the three major indices using up-to-date data. As I said above, Santer et al 2008 is “worse than we thought”.
ERSST v3 originally used satellite data