One more bit of review before we get to Ammann’s answer. As an excuse for not answering the request of the House Energy and Commerce Committee about the R2 statistic, Mann told them that his "colleagues and [himself] did not rely on this statistic" in the following terms:
The Committee inquires about the calculation of the R2 statistic for temperature reconstruction, especially for the 15th Century proxy calculations. In order to answer this question it is important to clarify that I assume that what is meant by the “R2” statistic is the squared Pearson dot-moment correlation, or r2 (i.e., the square of the simple linear correlation coefficient between two time series) over the 1856-1901 “verification” interval for our reconstruction. My colleagues and I did not rely on this statistic in our assessments of “skill” (i.e., the reliability of a statistical model, based on the ability of a statistical model to match data not used in constructing the model) because, in our view, and in the view of other reputable scientists in the field, it is not an adequate measure of “skill.” The statistic used by Mann et al. 1998, the reduction of error, or “RE” statistic, is generally favored by scientists in the field.
This sounds plausible, but it’s usually worthwhile checking what the Hockey Team says. MBH98 itself describes how they went about "not relying on" the r2 statistic as follows:
β [RE] is a quite rigorous measure of the similarity between two variables, measuring their correspondence not only in terms of the relative departures from mean values (as does the correlation coefficient r) but also in terms of the means and absolute variance of the two series. For comparison, correlation (r) and squared-correlation (r2) statistics are also determined. Significance levels were determined for r2 from standard one-sided tables, accounting for decreased degrees of freedom owing to serial correlation
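For readers who want to see how the two verification statistics in this passage can part company, here is a minimal Python sketch. It assumes the usual textbook definition of RE (one minus the ratio of the reconstruction's squared error to the squared error of simply using the calibration-period mean); the function names and the toy numbers are mine, purely for illustration, and are not taken from MBH98.

```python
import numpy as np

def verification_r2(obs, rec):
    """Squared Pearson correlation between observed and reconstructed series."""
    r = np.corrcoef(obs, rec)[0, 1]
    return r * r

def reduction_of_error(obs, rec, calibration_mean):
    """RE = 1 - SSE(reconstruction) / SSE(calibration-mean 'no knowledge' guess).

    Assumed (standard) definition; unlike r2 it rewards getting the
    verification-period mean right, not just the year-to-year wiggles.
    """
    obs = np.asarray(obs, dtype=float)
    rec = np.asarray(rec, dtype=float)
    sse_rec = np.sum((obs - rec) ** 2)
    sse_ref = np.sum((obs - calibration_mean) ** 2)
    return 1.0 - sse_rec / sse_ref

# Hypothetical verification period: both series sit 0.3 below the
# calibration mean (taken as 0), but their year-to-year departures
# are constructed to be exactly uncorrelated.
obs = -0.3 + 0.1 * np.tile([1, -1, 1, -1], 2)
rec = -0.3 + 0.1 * np.tile([1, 1, -1, -1], 2)

r2 = verification_r2(obs, rec)
re = reduction_of_error(obs, rec, calibration_mean=0.0)
print(f"verification r2 = {r2:.3f}, RE = {re:.3f}")
```

With these made-up numbers, r2 is essentially zero while RE comes out at 0.8: the toy reconstruction gets the shift in the verification-period mean right and nothing else. That is the sense in which the two statistics measure different things, and why it matters which one is reported.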
They made their lack of reliance on the verification r2 statistic very clear through the following illustration, showing the verification r2 statistics for each gridcell in the AD1820 step (which had 112 "proxies", including 12 actual temperature series used as so-called "proxies" for temperature):
Original Caption: Figure 3 Spatial patterns of reconstruction statistics. … bottom, verification r2 (also based on 1854–1901 data). … For the r2 statistic, statistically insignificant values (or any gridpoints with unphysical values of correlation r < 0) are indicated in grey. The colour scale indicates values significant at the 90% (yellow), 99% (light red) and 99.9% (dark red) levels (these significance levels are slightly higher for the calibration statistics which are based on a longer period of time). A description of significance level estimation is provided in the Methods section.
Elsewhere in MBH98, they repeatedly demonstrated that they were "not relying" either on the correlation r or the correlation squared r2 through the following statements:
Time-dependent correlations of the reconstructions with time-series records representing changes in greenhouse-gas concentrations, solar irradiance, and volcanic aerosols suggest that each of these factors has contributed to the climate variability of the past 400 years, with greenhouse gases emerging as the dominant forcing during the twentieth century….
Figure 3 shows the spatial patterns of calibration β, and verification β and the squared correlation statistic r2, demonstrating highly significant reconstructive skill over widespread regions of the reconstructed spatial domain. Although a verification NINO3 index is not available from 1854 to 1901, correlation of the reconstructed NINO3 index with the available Southern Oscillation index (SOI) data from 1865 to 1901 of r = -0.38 (r2 = 0.14) compares reasonably with its target value given by the correlation between the actual instrumental NINO3 and SOI index from 1902 to 1980 (r =-0.72)
Original Caption: Figure 7 Relationships of Northern Hemisphere mean (NH) temperature with three candidate forcings between 1610 and 1995. … Bottom panel, evolving multivariate correlation of NH series with the three forcings NH, Solar, log CO2. The time axis denotes the centre of a 200-year moving window. One-sided (positive) 90%, 95%, 99% significance levels (see text) for correlations with CO2 and solar irradiance are shown by horizontal dashed lines, while the one-sided (negative) 90% significance threshold for correlations with the DVI series is shown by the horizontal dotted line. The grey bars indicate two different 200-year windows of data, with the long-dashed vertical lines indicating the centre of the corresponding window….
We estimate the response of the climate to the three forcings based on an evolving multivariate regression method (Fig. 7). This time-dependent correlation approach generalizes on previous studies of (fixed) correlations between long-term Northern Hemisphere temperature records and possible forcing agents. Normalized regression (that is, correlation) coefficients r are simultaneously estimated between each of the three forcing series and the NH series from 1610 to 1995 in a 200-year moving window. The first calculated value centred at 1710 is based on data from 1610 to 1809, and the last value, centred at 1895, is based on data from 1796 to 1995 – that is, the most recent 200 years. A window width of 200 yr was chosen to ensure that any given window contains enough samples to provide good signal-to-noise ratios in correlation estimates. Nonetheless, all of the important conclusions drawn below are robust to choosing other reasonable (for example, 100-year) window widths….
We test the significance of the correlation coefficients (r) relative to a null hypothesis of random correlation arising from natural climate variability, taking into account the reduced degrees of freedom in the correlations owing to substantial trends and low frequency variability in the NH series.
We use Monte Carlo simulations to estimate the likelihood of chance spurious correlations of such serially correlated noise with each of the three actual forcing series. For (positive) correlations with both CO2 and solar irradiance, the confidence levels are both approximately 0.24 (90%), 0.31 (95%), 0.41 (99%), while for the ‘whiter’, relatively trendless, DVI index, the confidence levels for (negative) correlations are somewhat lower (-0.16, -0.20, -0.27 respectively). A one-sided significance test is used in each case because the physical nature of the forcing dictates a unique expected sign to the correlations (positive for CO2 and solar irradiance variations, negative for the DVI fluctuations).
The correlation statistics indicate highly significant detection of solar irradiance forcing in the NH series during the ‘Maunder Minimum’ of solar activity from the mid-seventeenth to early eighteenth century which corresponds to an especially cold period. In turn, the steady increase in solar irradiance from the early nineteenth century through to the mid-twentieth century coincides with the general warming over the period, showing peak correlation during the mid-nineteenth century. Greenhouse forcing, on the other hand, shows no sign of significance until a large positive correlation sharply emerges as the moving window slides into the twentieth century. The partial correlation with CO2 indeed dominates over that of solar irradiance for the most recent 200-year interval, as increases in temperature and CO2 simultaneously accelerate through to the end of 1995, while solar irradiance levels off after the mid-twentieth century.
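The kind of Monte Carlo significance test described in the passage quoted above can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: it uses a simple pairwise correlation rather than MBH98's multivariate regression, models the "serially correlated noise" as an AR(1) process, and the persistence value `phi`, the linear-trend "forcing" and the number of simulations are all hypothetical choices, not values from the paper.

```python
import numpy as np

def ar1_surrogates(n_sims, n, phi, rng):
    """Generate n_sims AR(1) ('red noise') series of length n with unit variance."""
    noise = rng.standard_normal((n_sims, n))
    x = np.empty((n_sims, n))
    x[:, 0] = noise[:, 0]
    scale = np.sqrt(1.0 - phi ** 2)  # keeps the stationary variance at 1
    for t in range(1, n):
        x[:, t] = phi * x[:, t - 1] + scale * noise[:, t]
    return x

def one_sided_thresholds(forcing, phi, n_sims=2000, levels=(90, 95, 99), seed=0):
    """Percentiles of the correlation between a fixed forcing series and
    red-noise surrogates, used as one-sided (positive) significance thresholds."""
    rng = np.random.default_rng(seed)
    forcing = np.asarray(forcing, dtype=float)
    sims = ar1_surrogates(n_sims, forcing.size, phi, rng)
    # Standardize both sides so the mean product is the Pearson correlation.
    f = (forcing - forcing.mean()) / forcing.std()
    s = (sims - sims.mean(axis=1, keepdims=True)) / sims.std(axis=1, keepdims=True)
    r = s @ f / forcing.size
    return np.percentile(r, levels)

# Hypothetical 200-year window containing a steadily trending forcing series.
forcing = np.linspace(0.0, 1.0, 200)
white = one_sided_thresholds(forcing, phi=0.0)  # no serial correlation
red = one_sided_thresholds(forcing, phi=0.9)    # strongly persistent noise
print("white-noise thresholds (90/95/99%):", np.round(white, 2))
print("red-noise thresholds   (90/95/99%):", np.round(red, 2))
```

The point of the exercise is that the bar for "significant" correlation rises once persistence in the noise is allowed for, which is why the thresholds quoted above sit well above what standard tables would give for 200 independent observations.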
Their "not relying" on correlation or correlation squared is further demonstrated in Mann et al. , which re-capitulates the two figures shown above with only slightly different legends.
Original Caption: Figure 4. Spatial patterns of (top) calibration beta, (middle) verification beta, and (bottom) r-squared statistics for annual-mean reconstructions. The calibration statistics are based on the 1902–80 data, while the verification statistics are based on the sparser 1854–1901 instrumental data (see Figure 2) withheld from calibration… For the r-squared statistic, statistically insignificant values (or any grid points with unphysical negative values of correlation) are indicated in gray. The color scale indicates values significant at the 90% (yellow), 99% (light red), and 99.9% (dark red) levels (these significance levels are slightly higher for the calibration statistics that are based on a longer period of time). More details regarding significance level estimation are provided in Mann et al. (Mann et al, 1998). [Reprinted with permission from Mann et al. (Mann et al., 1998).]….
Our winter Nino-3 reconstruction exhibits a highly significant correlation with largely independent reconstruction of the winter (Dec–Jan–Feb) SOI of Stahle et al. (Stahle et al., 1998). The two reconstructions are correlated at r = 0.63 over the full period of overlap (1705–1976) and r = 0.60 during the precalibration interval (1705–1901). This is nearly as high as the observed correlation (r = 0.7) between the instrumental SOI and Nino-3 series during the twentieth century….
Original Caption: Figure 17. Relationship of annual-mean NH mean temperature reconstruction to estimates of three candidate forcings (see Mann et al., 1998) between 1610 and 1995. ….. (e) Evolving multivariate correlation of NH series with the three forcings (a, b, and c). The time axis denotes the center of a 200-yr moving correlation window. Significance levels are based on the null hypothesis that the surface temperature series is a realization of natural variability represented as represented by a red noise process with the persistence structure of the observed NH series (see Mann et al. 1998 for details). One-sided significance levels for correlations with the different forcing agents are shown, under the assumption that only positive relationships with GHG and CO2, and negative relationships with DVI, are physically meaningful. These confidence levels are approximately constant over time and are thus represented by their average values over time for simplicity (although the number of degrees of freedom in the CO2 series is somewhat decreased prior to 1800 when the series is essentially flat, so that the confidence intervals are slightly too liberal in this case). Significance levels for correlations of temperature with CO2 and solar irradiance are nearly identical, and the 90%, 95%, and 99% (positive) significance levels are shown by the horizontal dashed lines. The 95% (negative) significance level for DVI is shown by a horizontal dotted line. The lower dotted line indicates the 99% significance level for correlation with GHG if a two-sided hypothesis test is invoked (this is only added to emphasize that the seemingly spurious negative correlation of NH with GHG apparent during the late eighteenth–early nineteenth century is in fact not statistically significant if the a priori physical requirement of a positive relationship between CO2 and temperature is not taken into account in hypothesis testing). 
The gray bars indicate two different 200-yr windows of data in the moving correlation, with the long-dashed vertical lines indicating the center of the corresponding windows…
For lags of 10–15 yr the relationship between greenhouse gas (GHG) increases in recent decades and increasing temperatures is considerably more significant, while the relationship with solar irradiance is considerably less significant. For the shorter (100 yr) window there are few enough degrees of freedom in the temperature and forcing series that the statistics are not as stable (i.e., the results are much “noisier”). In particular, larger negative correlations with GHGs are achieved prior to 1800 in this case, although these are not significant taking into account the decreased degrees of freedom in the series. Nonetheless, even with the large sampling variations that arise in the 100-yr window case, the relationship between recent warming and increasing greenhouse gas concentrations is the dominant statistical feature. It is evident that the inclusion of a representation of the lagged response of temperatures to forcing heightens the evidence for a recent anthropogenic impact on twentieth century climate beyond that presented in Mann et al. (Mann et al., 1998)
Just to make it totally clear that they did "not rely" on correlation statistics, the SI to the MBH98 Corrigendum in July 2004 stated the following for "each" of the 11 steps in the stepwise reconstruction:
4. Statistical Verification
An essential step in the procedure of Mann et al (1998), as described therein, was the use of conventional verification procedures to establish the level of skill in the proxy-based surface temperature reconstructions. Verification estimates based on correlation and Reduction of Error (‘RE’ or, ‘beta’ in the language of Mann et al, 1998) were established for each of the 11 separate procedures contributing to the stepwise reconstruction procedure, based on comparison of the proxy reconstructions…It should be stressed that reconstructions that did not pass statistical cross-validation (i.e., yielded negative RE scores) were deemed unreliable.
I was about to editorialize a little on this, but words escape me. If this is "not relying" on correlation, I’d hate to think what would happen if they actually relied on it. You’d think that any of Mann’s friends who had read his draft evidence to Barton (or his lawyer for that matter) would have said: Uh, Mike, maybe you’d be better off just answering the question.