In a story featured on the cover of Nature, Eric J. Steig, David P. Schneider, Scott D. Rutherford, Michael E. Mann, Josefino C. Comiso and Drew T. Shindell report to have found “significant warming” that “extends well beyond the Antarctic Peninsula to cover most of West Antarctica, an area of warming much larger than previously reported.” (“Warming of the Antarctic ice-sheet surface since the 1957 International Geophysical Year”, Nature, Jan 22, 2009).
Specifically, they state that
We find that West Antarctica warmed between 1957 and 2006 at a rate of 0.17 ± 0.06°C per decade (95% confidence interval). Thus, the area of warming is much larger than the region of the Antarctic Peninsula. The peninsula warming averages 0.11 ± 0.04°C per decade. We also find significant warming in East Antarctica at 0.10 ± 0.07°C (1957-2006). The continent-wide trend is 0.12 ± 0.07°C per decade. (p. 460)
However, in another recent paper, Santer et al. (2008) point out that “In the case of most atmospheric temperature series, the regression residuals … are not statistically independent…. This persistence reduces the number of statistically independent time samples.” Such a reduction of the effective sample size can cause Ordinary Least Squares (OLS) standard errors and confidence intervals to be too small, and the significance of coefficients to be overstated.
Steig commendably provides a detailed table of the paper’s Thermal Infrared (TIR)-based temperature reconstruction on his University of Washington webpage, that allows these trends to be re-estimated and checked for serial correlation. In fact, there is substantial AR(1) serial correlation, but the authors have made no correction for it, despite their claim to the contrary in their online Supplementary Information (SI).
When the standard errors reported by Steig et al. are corrected for serial correlation, using either the standard method or a simplified method used by Santer et al., the reported trends remain statistically significant, though not at as high a level as reported in the paper.
When the 600 monthly values for 1957-2006 are averaged over the 5509 locations in the file, the estimated continent-wide average temperature anomaly illustrated in Figure 1 below, does indeed show an upward trend of 0.1178°C per decade, which is within rounding error of the 0.12°C per decade reported by Steig et al:
Steig et al. report a 95% CI of ± 0.07 °C/decade for this trend coefficient. Assuming a t critical value of approximately 2, this means that they are estimating the se to be about 0.035. Using the above data, the OLS standard error (“seOLS”) is 0.0330. Twice this value is within rounding error of 0.07. It indicates a very high level of significance (p = .0004).
However, the first order serial correlation of the residuals, r1, is 0.318, and is highly significant (p = .0000). With this serial correlation, the standard AR(1)-corrected standard error (“seAR”) is 0.0458, i.e. 44% larger. This leaves the slope still highly significant (p = .0103), but not nearly as significant as is implied by the published CI. The simpler approximation presented by Santer et al., which is based on a 1952 formula by Quenouille and therefore identified here as “seQ” (see below for details), gives the same numerical value as seAR, namely 0.0458. Using either seAR or seQ, the corrected 95% CI is ± 1.964(0.0458) = 0.0900, which is clear inconsistent with the 0.07 reported by Steig et al.
Table 1 below reports similar figures for all the 1957-2006 TIR reconstruction trends for which Steig et al. report numerical values. Implicit Steig standard errors, obtained by dividing their reported 95% CI’s by 2, are given in square brackets [ ]. p-values are given in parentheses below the respective standard errors. Slopes and standard errors are all in °C/decade.
In all four cases, Steig et al. are clearly using seOLS, not seQ or seAR. In the caption to Figure S4 of their online SI, the authors state that
Black lines separate areas of significant vs. insignificant trends (>95% confidence based on two-tailed t-test with number of degrees of freedom adjusted for temporal autocorrelation). (SI p. 4)
The reference to adjusting the number of degrees of freedom would imply that they are using the Santer/Quenouille seQ adjustment. However, the confidence intervals they report in fact show no correction for serial correlation, by either method. The article’s Figure 3, showing regions of significant and insignificant trends, is therefore erroneous.
When the implied standard errors are appropriately corrected, many of the reported trends in Table 1 do remain significant, but only at a greatly reduced level.
The famous cover photo on the 22 Jan 2009 issue of Nature, shown above, appears to be based on Figure 4e of the text, which shows mean annual trends for the shorter period 1979-2003. Steig et al. give no numerical values for this period, but they are easily computed using the online TIR reconstruction file. Table 2 below gives the same statistics as Table 1 above, but for this shorter, 300-month period.
Thus, although the trends for the Peninsula and West Antarctica shown on the cover are both significant, the trends for the continent as a whole and for East Antarctica are not significantly different from zero, with or even without a correction for serial correlation. It is curious that the trend for West Antarctica is even stronger than that for the Peninsula, even though the authors admit that the limited surface data for West Antarctica shows less warming than Peninsula stations do.
The slight differences that sometimes arise between my point estimates of the trend in Table 1 and those given by Steig et al. for the Peninsula and West Antarctica may be simply due to the manner in which ties induced by rounding error at the boundaries of the regions were resolved: On p. 461 of their text, they define W. Antarctica as 72 °-90° S, 60°-180° W. I interpreted this to include all 3 boundaries plus the pole. They define East Antarctica as 65°-90°S, 300-180°E. I took this to include 65° S, but to exclude the two longitude boundaries, which I included in the other two regions. They define the Peninsula to have the same westerly longitude boundaries as W. Antarctica, and to be north of 72° S. I took this to exclude 72° S, which I had included in W. Antarctica, but to include the two longitude boundaries, as with W. Antarctica.
In fact, Steig et al.‘s regional definitions unambiguously place a sliver of the Peninsula in East Ant., place one pixel of Thurston Island in the Peninsula, and leave out the very tip of the Peninsula altogether. They may have modified their definitions manually to correct these anomalies, but I just took their definitions as published.
In this note, I am just taking the TIR recon as given. Until Steig posts the Matlab code he used to apply RegEM to the satellite TIR data, it remains uncertain how valid that reconstruction is and therefore how valid the derived trends are. It is also unclear why Campbell Island (52°S) and 4 other temperate-zone oceanic weather stations were included in the 42 occupied weather stations used to measure Antarctic ice-sheet temperature, as listed in Table S2 of the SI. Amsterdam, at 52°N, would surely not be included in a study of Arctic temperatures. Without knowing the details of how RegEM was applied to the satellite data, there is no way to check how robust the reconstruction is to the exclusion of these 5 stations.
In a regression of the form
y = X β + ε,
where X is an nXk matrix of regressors, the OLS estimate of β is
β-hat = (X’ X)-1 X’ y
When Cov(ε) = σ2 I,
Cov(β-hat) = σ2 (X’ X)-1.
This may be estimated by
CovOLS = s2(X’ X)-1, where
s2 = e’ e / (n-k) and
e = y – X β-hat
is the vector of OLS residuals. The standard errors are then
seOLS = sqrt(diag(CovOLS)).
If instead there is first order Autoregressive (AR(1)) serial correlation of the form
εt = ρ εt-1 + vt,
where the vt are white noise, then
Cov(β-hat) = σ2 (X’ X) -1 X’ Γ X (X’ X)-1,
where Cov(ε) = σ2Γ with
Γ = (ρ|i – j|).
The standard estimator of this is
CovAR = s2 (X’ X)-1 X’ G X (X’ X)-1,
G = (r1|i – j|),
r1 = ∑(etet-1)/(n-k-1) / s2.
In the early days of concern about serial correlation, before modern computers were available to crunch through the above matrices, Maurice H. Quenouille (Associated Measurements, 1952, p. 168) gave a formula for adjusting the standard test for the significance of a correlation between two variables (or equivalently for the significance of the slope in a simple regression of one on the other) for the loss in effective sample size when there is serial correlation of a general form in both variables. In the special case of a trendline regression (so that the low-order correlations of the regressor are virtually one) with AR(1) errors, and not too large a value of ρ, it can be shown that Quenouille’s formula implies, in the large sample limit,
CovAR ≈ CovOLS / Q,
Q = (1 – r1) / (1 + r1),
is the factor by which serial correlation reduces the effective sample size.
This factor is used by Santer et al. (2008, p. 1708), and may be seen to give almost identical results to the more computation-intensive CovAR in the present application. It is not clear to me who first proposed this special case of Quenouille’s more general formula. Quenouille himself attributed the formula identified with him to an even earlier article by M.S. Bartlett (J. Royal Statist. Soc. 1935, 98).
(The Quenouille adjustment is specific to the problem of the significance of a linear trend coefficient. For more general regressors, seAR is more appropriate.)
It should be noted that in the presence of serial correlation, s2 as calculated above is in fact a downward-biased estimate of the variance of εt. Furthermore, r1 as computed above is a downward-biased estimate of the true ρ. The standard CovAR and therefore CovQ are therefore both downward-biased estimates of the true sampling variance of β-hat. I am working on a new estimator of ρ which should correct both these deficiencies. However, although this improved estimator can make a big difference as ρ approaches 1, preliminary calculations indicate that this improved estimator makes very little difference for the present data, in which rho is never as high as 0.4.
Another important model of serial correlation, which UC has commented on here, but which goes beyond the scope of the present post, is that of long-memory or Fractionally Integrated errors.
J. Huston McCulloch
Ohio State University
The above Nature cover image comes to you directly from http://faculty.washington.edu/steig/nature09data/cover_nature.jpg