I originally drafted the article because it seemed to me that the then-new James Ross Island isotope series exemplified many features of a “good” proxy according to ex ante criteria that I had loosely formulated from time to time in critiquing “bad” proxies, but never really codified (in large part because it’s not easy to codify criteria except through handling data).

Although this series is in the Kaufman 60-90S reconstruction, its appearance is quite different from the final 60-90S reconstruction: indeed, it has a very negative correlation (-0.61) with Kaufman’s final CPS reconstruction. I’ll discuss that in a different article.

Following are mostly 2014 notes, with some minor updating for context.

**“Good” Proxies**

A principle that I’ve articulated with increasing clarity over the years (though present in early work as well) is that one needs to work outward from proxies that are “good” according to some ex ante criteria, rather than place hope in a complicated multivariate algorithm applied to inconsistent and noisy data, not all of which are “proxies” for the item being reconstructed. This is based on principles that I’ve observed geophysicists and geologists use to combine “good” (high-resolution) data with lower-quality data.

While I haven’t attempted to reduce my concept of a “good” proxy to a formal definition, over the years I have developed some criteria that I find useful in appraising proposed proxies. I will briefly assess the James Ross Island isotope series against these criteria.

First, for a class of proxy to be useful in a reconstruction network, it ought to have been applied to many locations, rather than being an ad hoc singleton. For example, one of the few PAGES2K (South America) series reaching back to the medieval period was an index from Lago Aculeo pigments. This was a novel and then-unique proxy class. Even if pigment proxies ultimately prove to be ideal, there was only one example available to PAGES2K. Without many examples, we simply don’t know how the proxy might be confounded. In contrast, there are dozens of polar ice core isotope series, qualifying this proxy class on this count.

Second, a good proxy needs to be both high resolution and well-dated. Ice cores are dated with high accuracy. (Ocean sediments are not as well dated). “High” resolution depends on the context: I much prefer proxies that have at least 10-year resolution. The resolution of the James Ross ice core deteriorates with age, but it performs well on these counts as compared to (say) most ocean sediments.

Third, I place particular value on high-resolution proxies that extend through the Holocene. The changes from the LGM to the mid-Holocene to the present are sufficiently great that one can benchmark which way is up for the proxy class. It sounds obvious, but I think that it’s more important than generally acknowledged. Antarctic (and Greenland) ice cores are a very important example of such proxies.

Fourth, for Holocene-scale reconstructions, one needs proxies that are responsive to millennial scale changes. Esper et al 2012, an excellent article, observed that very long tree ring chronologies extending back to the Mid-Holocene (a few Scandinavian series) lacked the millennial-scale variation that “ought” to be observable given the dramatic changes in NH high-latitude summer insolation over this period (which is much, much greater in comparison to CO2 forcing than most people would expect.) In the Arctic, there is evidence that “small” ice caps may provide more nuanced information on Holocene changes than the Greenland summit. Gifford Miller’s work on Baffin Island is one example (covered in several CA posts http://www.climateaudit.org/tag/miller); there is also interesting work on proglacial lakes adjacent to small Greenland ice caps (Bregne, Istorvet, Renland) which show mid-Holocene warmth more clearly than ice core proxies from the summit of the main Greenland ice sheet.

Fifth, one needs proxies that are relatively free of secular drift which distorts the relationship between the proxy and temperature. In ice cores, this can happen in several ways. For example, because glaciers flow, the source ice from deeper sections of an ice core can come from higher elevations, imparting an extreme bias to the series. Masson (2000) contained some striking examples. The earliest Agassiz cores (1977; 1979) also had this problem (but were uncritically used in some multiproxy articles nonetheless.) Specialists attempt to mitigate this problem through locating cores at summits, but these attempts are not always successful. Ice core series which suffer from this problem ought not to be used (though they sometimes are).

Sixth, for ice core proxies, changes in elevation of the Greenland and Antarctic ice sheets through the Holocene (mostly lowering) result in an important long-term drift in the association between isotopes (d18O, d2H) and temperature. Vinther (2009) proposed (in my opinion) a really elegant solution to this problem: he noticed that elevation changes over the Holocene in adjacent small ice caps (Agassiz on Ellesmere Island, Renland in east Greenland) were MUCH less than at the top of the Greenland ice sheet. The isotope series from these small ice caps had a close relationship with the more famous series from the top of the Greenland ice sheet (GISP, GRIP), but there was a noticeable increasing difference in levels between the series over the long Holocene. Vinther plausibly attributed the long-term change to elevation decrease of the Greenland ice sheet through the mostly warm Holocene and used the difference to estimate the elevation change. Vinther also observed that the decreasing isotope trend in the Renland and Agassiz series was very coherent with the decline in summer NH insolation (the driving force in Milankovitch theory), while the GISP2 series (hugely popular with “skeptics”) was very flat, too flat. Under Vinther’s analysis, after allowing for elevation change, the GISP2 series is no longer flat, but declining through the Holocene, just like the Renland and Agassiz series.

While the authors of the James Ross Island dataset didn’t connect their site location to Vinther’s conceptual model, James Ross Island, which is separate from the Antarctic Ice Sheet and very small in comparison, also has the key features of the Renland and Agassiz series that Vinther used for his benchmark, thereby offering the possibility of a similar benchmark for series from locations on the Antarctic ice sheet where large changes in elevation are known to have occurred e.g. Law Dome.

It is the first ice core from the Antarctic Peninsula which extends to the start of the Holocene and earlier. There have been several previous ice cores on the Peninsula, but none even reached 1500 years. The nearest ice cores reaching into the Holocene and LGM are approximately 60 degrees of longitude away in each direction: Dronning Maud at 3E and the new WAIS core at 112W, both much further to the south on the Antarctic continent.

This is not to say that the James Ross Island isotope series will not prove to have warts of its own, but it has many features that make it more interesting ex ante than yet another ex post screened tree ring chronology.

In passing, I find speleothem isotope series, though mostly given little attention in multiproxy articles, to be very interesting proxies, as they are (1) highly replicated; (2) consistent between nearby sites; (3) well-dated; and (4) reach back through the Holocene to the LGM and come forward to the very present. Chinese data appear particularly thorough.

**The James Ross Island dD Series**

The James Ross Island isotope series was originally published as Mulvaney et al (Nature 2012), “Recent Antarctic Peninsula warming relative to Holocene climate and ice-shelf history”. Another publication by the same authors is limited to the period from AD 1000 on, which is highlighted in light yellow below.

*Figure 1. James Ross Island dD, showing post-1000 AD values in light yellow margin. The LGM ice (separation denoted by //) is NOT dated.*

The main features of the James Ross Island data are obvious.

The LGM (**not** dated here) is very cold. The highest values of the series are in the Early Holocene (12.5-10 ka BP). Values from ~9000 BP to 3000 BP fluctuated within a relatively narrow range before declining in the late Holocene (after ~4000 BP). The lowest values were reached about 500 BP, more or less contemporary with the NH Little Ice Age. Values in the 20th century were higher than in the LIA, but are still lower than values through most of the Holocene and considerably lower than the highs in the Early Holocene.

Notwithstanding its rather deceptive title (Recent Antarctic Peninsula warming relative to Holocene climate and ice-shelf history), Mulvaney et al recognized these features in an extended and interesting discussion of the data, including the following (and other similar comments):

The Holocene temperature history from the JRI ice core is characterized by an early-Holocene climatic optimum that was 1.3 ± 0.3 °C warmer than present (Fig. 3). The magnitude and progression of this early-Holocene optimum is similar to that observed in ice-core records from the main Antarctic continent [16 – Masson-Delmotte 2011]…

Likewise, marine temperatures on the western side of the Antarctic Peninsula [17-Shevenell 2011] declined to reach, by ~8,000 yr BP, a long-term mean that was close to present-day values…

Various proxy evidence exists for a mid-Holocene warm period on the Antarctic Peninsula [7- Bentley 2009], although the lack of a consensus on its timing in this region may be explained by the small magnitude of this feature in the JRI temperature record compared with the well defined mid-Holocene climate optimum in continental Antarctic icecore records [16 – Masson Delmotte]…

Marine sediments indicate that a permanent ice shelf was established there [Prince Gustav] only after ~1,500 yr BP and that the maximum ice-shelf extent may have been reached as recently as a few centuries ago [3 –Pudsey 2006].

Despite these sensible comments, the abstract to Mulvaney et al 2012, in apparent genuflection to the “consensus”, began with the words “rapid warming over the past 50 years” and observed that “the high rate of warming over the past century is unusual (but not unprecedented) in the context of natural climate variability over the past two millennia”.

**Conclusion**

To the extent that proxies and proxy reconstructions have broader significance in the climate debate, their interest largely arises from the unprecedentedness (or lack thereof) of late 20th century/early 21st century data relative to the past. When IPCC was founded, as much interest attached to the comparison of the modern warm period to the “Holocene Optimum” (or “Holocene Thermal Maximum”) as to the corresponding comparison to the medieval warm period. In the 1990s, and especially since the IPCC Third Assessment (2001) promoted the Mann hockey stick, far more attention has been paid to the medieval comparison, but there is increasing interest in the longer Holocene perspective (Marcott et al 2013; Kaufman 12K, 2020).


One of the signature findings of IPCC AR5 WG2 has been that climate change has already had a negative impact on crop yields, especially wheat and maize. These findings are prominent in the WG2 Summary for Policy Makers and were featured in WG2 press coverage. The topic of crop yields is a specialty of WG2 Co-Chair Christopher Field. Field’s frequent co-author, David Lobell, was a Lead Author of the chapter on Food (chapter 7), which in turn cited and relied on a series of Lobell articles, in particular Lobell et al (Science 2011, Climate Trends and Global Crop Production Since 1980), which was a statistical analysis of crop yields from 1980 to 2008 (or to 2002 in some analyses) for four major crops (wheat, maize, rice, soy) for 185 countries.

In the period 1980-2008, both crop yields and temperatures have positive trends (notwithstanding the pause/hiatus in the 21st century). Because both series have positive trends, there is therefore a positive correlation between crop yields and temperatures for the vast majority of crop-country combinations.

Given that both series are going up, it is an entirely valid question to wonder how Lobell and coauthors arrived at their signature negative impact merely by applying elementary statistical methods to annual data on yields, temperature and precipitation. I’ll look at this question in today’s post.

**Data**

In 2011, I obtained the data for Lobell et al 2011 from lead author Lobell (who undertook at the time to place both data and code online, neither of which appears to have been done). I had asked Lobell to archive code, because it wasn’t entirely clear what he had done. Lobell collated temperature and precipitation data from both UDel and CRU. (For the latter, Lobell used the CRU TS data made famous by the Harry Readme.) In the figure below, I’ve plotted Lobell’s yield and temperature data for the China-wheat combination (both standardised to SD units), as an example of both series going up.

Lobell regressed Yield (actually log Yield) against time, temperature and precipitation variables, describing the procedure as follows:

Translating these climate trends into potential yield impacts required models of yield response. We used regression analysis of historical data to relate past yield outcomes to weather realizations. All of the resulting models include T and P, their squares, country-specific intercepts to account for spatial variations in crop management and soil quality, and country-specific time trends to account for yield growth due to technology gains (6).

The precipitation and quadratic terms don’t appear to affect the regression very much, i.e. the main effects are delivered by the model in which Yield is regressed against time and temperature as follows:

(1) Yield ~ Year + Temperature

Using conventional regression nomenclature, the regression coefficient b is given by the formula

(2) b = (X^T X)^{-1} X^T y

where the X matrix of independent variables is [Year, Temperature] and y is the Yield vector.

For convenience (and this is irrelevant to the point that I’m working towards), normalize all the data to zero mean and unit variance.

X^T y is simply the vector of correlations of Yield to Time (the normalized trend) and Temperature.

(X^T X) is nothing more than the correlation matrix between Year and Temperature, i.e. the off-diagonal element r is the temperature trend (in normalized units):

| 1  r |
| r  1 |

The calculation of the OLS regression coefficients uses the inverse of this matrix,

1/(1-r^2) *
| 1  -r |
| -r  1 |

The negative off-diagonal term means that the OLS coefficient for Temperature in the regression of Yield onto Year and Temperature is a function of the correlation between yield and temperature, the trend in yield and the trend in temperature, as follows:

b_temperature = 1/(1-r^2) * (cor_yield_temp - r*trend_yield)

In other words, if the correlation between yield and temperature is less than the product of the trend in yields and the trend in temperature (both normalized), then the regression coefficient on temperature is negative. This has nothing to do with yields or temperatures per se, but is a trivial property of the matrix algebra.

As an example, for the Chinese wheat series shown above, although there is a positive correlation between yield and temperature (0.5096), the OLS regression of Yield against Year and Temperature results in a negative temperature coefficient. Applying the above formula, the normalized trends (correlations between year and item) for yield and temperature are 0.984 and 0.548, yielding 0.5096 − 0.984×0.548 < 0.
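
The algebra above can be checked numerically. The sketch below uses entirely synthetic data (not Lobell’s actual series): a yield series and a temperature series are constructed so that both trend upward, giving a positive simple correlation, yet the OLS coefficient on temperature in Yield ~ Year + Temperature comes out negative.

```python
import numpy as np

rng = np.random.default_rng(0)
year = np.arange(1980, 2009).astype(float)   # 1980-2008, as in Lobell et al
t = year - year.mean()

# Temperature: upward trend plus "weather" noise
temp = 0.03 * t + rng.normal(0, 0.2, t.size)

# Yield: strong technology trend, but responding NEGATIVELY to the
# detrended (weather) component of temperature
temp_resid = temp - 0.03 * t
yield_ = 0.05 * t - 0.3 * temp_resid + rng.normal(0, 0.05, t.size)

# Simple correlation is positive: both series trend upward
print(np.corrcoef(yield_, temp)[0, 1])       # positive

# OLS of Yield ~ Year + Temperature (intercept, year, temperature)
X = np.column_stack([np.ones_like(t), t, temp])
b, *_ = np.linalg.lstsq(X, yield_, rcond=None)
print(b[2])                                   # negative (close to -0.3)
```

The multiple regression recovers the negative weather response even though the trend-driven simple correlation is positive, which is exactly the distinction the matrix algebra above formalizes.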

**Introduction**

The recent open-access paper Gregory et al 2019 “How accurately can the climate sensitivity to CO2 be estimated from historical climate change?” discusses, *inter alia*, the use of regression to estimate historical climate feedback. As I wrote in a previous article, Gregory et al. consider a regression in the form *R* = α *T*, where *T* is the change in global-mean surface temperature with respect to an unperturbed (i.e. preindustrial) equilibrium and *R* is the radiative response of the climate system to the change in *T,* however caused; *α* is thus the applicable climate feedback parameter for that cause. The corresponding effective climate sensitivity (EffCS) is then *F*_{2xCO2}/*α* where *F*_{2xCO2} is the effective radiative forcing (ERF) for a doubling of preindustrial atmospheric carbon dioxide concentration. It should be noted that the climate system response to ERF, in GCMs and/or the real world, may vary between forcing agents: their (equilibrium) efficacies[1] may differ or, equivalently, different *α* values may apply to them. This is thought to be a major issue in relation to volcanic forcing, at least.[2][3]
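
As a trivial numerical illustration of the EffCS relationship (using the commonly quoted value of roughly 3.7 Wm⁻² for *F*_{2xCO2} and a purely illustrative *α*; neither number is taken from Gregory et al.):

```python
# Effective climate sensitivity from a feedback estimate: EffCS = F_2xCO2 / alpha
F_2xCO2 = 3.7    # Wm-2: commonly used ERF for doubled CO2 (illustrative assumption)
alpha = 1.45     # Wm-2 K-1: illustrative climate feedback parameter
EffCS = F_2xCO2 / alpha
print(round(EffCS, 2))   # about 2.55 K
```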

Gregory et al. go on to consider time-variation of climate feedback in the standard CMIP5 climate model *historical* experiment simulations spanning 1850-2005 featured in AR5. They attempt to do so for the AR5 CMIP5 models generally. This is a very difficult task, since ERF was not generally diagnosed for CMIP5 *historical* simulations, and evidently varies very substantially between models. An accurate ERF time series is needed in order to derive one for *R*, using *R* = ERF – *N* (where *N* is the Earth’s downwards top-of-atmosphere (TOA) radiative imbalance), since *R* cannot be measured directly in models or in the real world. As a surrogate for evolving CMIP5-mean *historical* simulation ERF, Gregory et al. use IPCC AR5 estimated total ERF, with the anthropogenic aerosol component multiplied by 1.5 and the volcanic component multiplied by 0.8 to compensate for estimated differences between CMIP5 and AR5 historical ERF. My investigations suggest that doing so, while undoubtedly better than using unadjusted AR5 ERF, does not provide a sufficiently accurate match to the time profile of CMIP5-mean *historical* simulation ERF evolution. I do not think that there is any fully satisfactory way of resolving this problem.[4]

Gregory et al. analyse the data using ordinary least squares (OLS) regression of *R* against *T* during the historical period over sliding 30-year windows, and conclude that the level of climate feedback (*α*: the slope of this regression) varies substantially in AOGCM historical simulations on multidecadal timescales, by a factor of up to two. Figure 1 reproduces the relevant figure in their paper. The solid black line shows how estimated *α* varies when regressing CMIP5 multimodel-mean *historical* simulation data, using their surrogate ERF time series. The dotted black line is the mean of *α* estimates for individual CMIP5 model-ensembles (each comprising from one to ten runs), with the shaded area showing ±1 standard deviation bounds. As Gregory et al. correctly point out, due to “regression dilution” use of OLS regression biases *α* estimates downwards in the presence of noise in the explanatory variable, *T*. Such noise is much greater for individual models than for the CMIP5 multimodel-mean, where it averages out much more.

Figure 1. A reproduction of Fig. 5(a) in Gregory et al (2019). Time-dependent climate feedback parameter for the multimodel mean of the CMIP5 historical experiment (labelled “CMIP5 E”) compared with the mean *α* of individual CMIP5 models (labelled “CMIP5 I”), and the corresponding ensemble-mean and individual-run estimates from the MPI-ESM1.1 ensemble. The lightly coloured regions around the CMIP5 I lines are ±1 standard error.


I do not suggest here that anything in this part of the new paper is wrong. Moreover, Gregory et al. correctly state a key point, that the time-variation of *α *estimated for the CMIP5 mean can be explained mainly by the varying importance of greenhouse gas and volcanic forcing. However, I think that climate feedback estimates derived from 30 years of historical period data, with no adjustment in relation to volcanic forcing, are not the best method of analysis. Such regressions reflect noise from internal climate system variability and the low efficacy (or, equivalently, the high *α*) of volcanic forcing. But, even assuming accurate ERF estimates are used, the 30-year sliding regression analysis does not reveal what, if any, non-noise caused variability in *α* remains after adjustment is made for the low efficacy of volcanism. I focus here on the use of an alternative method of analysis that I believe provides more insight into climate model behaviour during *historical* simulations.

**Analysing climate feedback in MPI-ESM1.1**

In addition to looking at AR5 CMIP5 models, Gregory et al. sensibly investigated the ensemble of CMIP5-style *historical* simulations carried out by the Max Planck Institute in Hamburg using their more recent MPI-ESM1.1 AOGCM. That ensemble is ideal for investigating *α* estimates, both because it has far more members (100) than other ensembles of multiple *historical* simulations and, even more so, because related diagnostic simulations that enable accurate quantification of the ERF applying in each year of the *historical* simulation have also been carried out with this model. As the solid orange line in Figure 1 shows, on Gregory et al.’s chosen method of 30-year regression analysis *α* varied even more over the historical period for the MPI-ESM1.1 ensemble mean than for the CMIP5 multimodel-mean.

The paper states that the distribution of estimated climate feedback obtained by OLS regression of *R* against *T* over the 100 individual MPI-ESM1.1 full *historical* simulation runs is 1.38±0.08 Wm^{−2}K^{−1} (mean and standard deviation), and that this is consistent with the median of 1.43 Wm^{−2}K^{−1} estimated by Dessler et al. (2018)[5] from the same dataset using differences between the means of the last and the first decades.

However, the diagnosed forcings for 1851 and, particularly, 1850, were affected by spin-up issues (Thorsten Mauritsen, oral personal communication 8 August 2018).[6] When omitting these years from the period analysed, the median *α* estimates from differencing averages of the last and first decades and from OLS regression are closer, at 1.34 and 1.37 Wm^{−2}K^{−1} respectively. However, the regression estimate will inevitably be affected by the response to episodic volcanic forcing differing from that to other forcings, as will the differencing method in this case since volcanic forcing differed materially between 1852-1861 and 1996-2005. While – as the paper says – due to using more data regression gives a smaller slope uncertainty than the differences-of-means method, this advantage becomes smaller if longer averaging periods are used. Differencing between averages over 20 years reduces the standard deviation of the slope (*α*) estimate to under 0.10 Wm^{−2}K^{−1}, little more than the 0.08 Wm^{−2}K^{−1} for regression.
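
The two estimators being compared here can be sketched on synthetic data (the series, noise levels and the illustrative *α* of 1.38 Wm⁻²K⁻¹ are my own stand-ins, not the MPI-ESM1.1 diagnostics):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha_true = 1.38                       # Wm-2 K-1, illustrative
years = np.arange(1852, 2006)
n = years.size

# Synthetic warming trend plus internal variability, and radiative response
T = 0.005 * (years - years[0]) + rng.normal(0, 0.05, n)
R = alpha_true * T + rng.normal(0, 0.10, n)

# Method 1: OLS regression of R on T (through the origin, since R = alpha*T)
alpha_ols = (T @ R) / (T @ T)

# Method 2: ratio of differences between last-decade and first-decade means
alpha_diff = (R[-10:].mean() - R[:10].mean()) / (T[-10:].mean() - T[:10].mean())

print(alpha_ols, alpha_diff)            # both close to 1.38
```

Consistent with the discussion above, the differencing estimate is noisier than the regression estimate because it uses only the endpoint decades; lengthening the averaging windows narrows that gap.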

Unfortunately, the raw diagnosed MPI-ESM1.1 *historical* simulation ERF estimates are biased low. That is because they are derived from the changes in TOA radiative imbalance in simulations by the model’s atmospheric component (ECHAM6.3) with the same changes in atmospheric composition and land use as in the *historical* simulation but fixed sea surface temperatures (SST). The land surface warms in these fixed SST simulations, but no correction was made here for the resulting increase in *R*, resulting in underestimation of the *historical* simulation ERF. This is a well known issue, but it is usually ignored.[7] Scaling the unadjusted historical ERF by a factor of 1.07 provides an appropriate correction for land surface warming.[8] When this correction is made, and additionally the model’s response to volcanic forcing is allowed to differ from that to other forcings, the 1.37 Wm^{−2}K^{−1} *α* regression estimate from ensemble-mean 1852-2005 *historical* simulation data becomes 1.45±0.035 Wm^{−2}K^{−1}.[9]

The method I use to adjust for the different efficacy of volcanic forcing is to include the AR5 volcanic forcing series as a separate regressor (explanatory variable); its estimated regression coefficient is 0.15±0.02. Since the AOGCM’s diagnosed volcanic ERF is only about 80% as high as volcanic forcing per AR5,[10] that implies that the equilibrium efficacy of volcanic ERF in MPI-ESM1.1 is approximately (0.8 – 0.15) / 0.8, or 0.8. Equivalently, one could say that *α* for volcanic ERF is some 1.25x that for CO_{2} and other historical forcing agents. This method introduces one additional free parameter, which due to the episodic nature of volcanism can be well estimated. The single *α* estimate produced by this approach applies to the sum of non-volcanic ERF and efficacy-scaled volcanic ERF, not to their unweighted sum.
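
The structure of this adjustment can be illustrated schematically. In the sketch below everything is synthetic; only the coefficient values (1.45 and 0.15) mirror the numbers quoted above. *R* is generated as α·*T* + γ·*F*_volc plus noise, and the two coefficients are then recovered by multiple regression, which works well because the episodic volcanic spikes are nearly orthogonal to the warming trend.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 154                                          # 1852-2005
t = np.arange(n, dtype=float)

alpha_true, gamma_true = 1.45, 0.15              # illustrative values

# Synthetic temperature: trend + smooth variability + noise
T = 0.005 * t + 0.1 * np.sin(t / 8.0) + rng.normal(0, 0.03, n)

# Episodic volcanic forcing: a few large negative spikes (made-up eruption years)
F_volc = np.zeros(n)
F_volc[[10, 31, 60, 105, 131, 139]] = [-3.0, -2.0, -2.5, -1.5, -2.0, -3.0]

R = alpha_true * T + gamma_true * F_volc + rng.normal(0, 0.1, n)

# Regress R on T and the volcanic forcing series (plus an intercept)
X = np.column_stack([np.ones(n), T, F_volc])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)
alpha_hat, gamma_hat = coef[1], coef[2]
print(alpha_hat, gamma_hat)                      # recover roughly 1.45 and 0.15
```

Because volcanism is episodic, the extra free parameter is well constrained, as noted in the text.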

When further adjusting the diagnosed historical ERF for the low efficacy of volcanic forcing, by deducting AR5 volcanic forcing scaled by the 0.15 regression coefficient estimate, *α* estimates from taking the ratio of differences in mean *R* and *T* between averages for the last and first one, two or three decades are 1.44–1.45 Wm^{−2}K^{−1}, essentially identical to the regression derived estimate.

Importantly, when the low efficacy of volcanic forcing is allowed for by including AR5 volcanic forcing as well as *T* as a regressor when regressing MPI-ESM1.1 historical simulation annual ensemble mean land-surface warming adjusted *R* over 1852-2005, there is no evidence of any time variation in *α* whatsoever. Figure 2 shows the fit between the diagnosed historical ERF, corrected for land surface warming and adjusted for the low equilibrium efficacy of volcanic forcing, and ERF estimated from ensemble-mean changes in *T* and in TOA radiative imbalance *N* in the MPI-ESM1.1 historical simulations, based on the *α* estimate of 1.45 Wm^{−2}K^{−1}.[11] The fit is excellent, with a regression R^{2} of 0.93 (0.96 correlation). Note that the residuals will reflect any time variation in the efficacy of volcanic ERF as well as any time variation in *α*.

Figure 2. Diagnosed 1852-2005 *historical* ERF for MPI-ESM1.1, when land-surface warming and the low efficacy of volcanic forcing are adjusted for, and that estimated from the model’s ensemble-mean historical simulations response using a fixed 1.45 Wm^{−2}K^{−1} climate feedback (*α*) estimate.


The residuals from the fit (Figure 3) appear random and have low autocorrelation.

Figure 3. Residuals when regressing MPI-ESM1.1 ensemble-mean historical simulation Δ*R* against Δ*T *when land-surface warming and the low efficacy of volcanic forcing have been adjusted for.


Nor is there any evidence for the residuals being non-normal. The Shapiro-Wilk normality test gives a *p*-value of 0.96 (a value of ≤ 0.05 would usually be taken to reject normality). And a QQ plot of the quantiles of the residuals against those expected if their distribution were normal, with the same standard deviation as the residuals, is very close to a 1-1 line (Figure 4).

Figure 4. QQ plot for the residuals shown in Figure 3. If the residuals were exactly normally distributed the green line would coincide with the black line, given enough data points.

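
The normality checks described above can be reproduced with standard tools. The sketch below applies them to stand-in residuals (seeded Gaussian noise with the 0.11 Wm⁻² standard deviation mentioned later, since the actual regression residuals are not reproduced here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
residuals = rng.normal(0, 0.11, 154)     # stand-in for the regression residuals

# Shapiro-Wilk test: a p-value <= 0.05 would usually be taken to reject normality
W, p = stats.shapiro(residuals)
print(W, p)

# QQ check: ordered residuals vs theoretical normal quantiles should be ~linear
osm, osr = stats.probplot(residuals, dist="norm", fit=False)
print(np.corrcoef(osm, osr)[0, 1])       # close to 1 for normal residuals
```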

Moreover, the residuals in *R* from this regression have a standard deviation of only 0.11 Wm^{−2}, barely more than the sum in quadrature of the mean estimated standard deviations of the ensemble-mean *R* time series implied by the diagnosed ERF time series and of the ensemble-mean *N* time series from the 100 historical simulations, both caused by internal variability. So the contribution from any underlying variation in *α* must be negligible.

Results when pentadal rather than annual mean data are regressed (starting in 1851 to maximise the number of pentads) are consistent with the conclusion that the residuals are almost entirely random noise. The regression coefficients are almost identical to those for annual regression and the fit is almost perfect, with an R^{2} of 0.99 and a residual error standard deviation of 0.04 Wm^{−2}. Regressing pentadal rather than annual mean data often improves noise suppression and leads to more reliable slope estimates. Gregory et al. show (their Fig.4) that when the MPI-ESM1.1 ensemble mean *T* is regressed against *R* (which is much noisier than *T*) rather than *vice versa*, the *α* estimate (the reciprocal of the fitted *T* on *R* regression slope) is noticeably higher. When regressing annual mean data the *α* estimate increases by close to 10%, from 1.45 to 1.58 Wm^{−2}K^{−1}, if *T* is regressed on *R*,[12] reflecting the large regression dilution in that case. But when regressing *T* on *R* using pentadal mean data, the *α* estimate only increases by ~1%.
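
The regression-dilution effect discussed here can be demonstrated on synthetic data (the series and noise levels below are my own assumptions; only the illustrative *α* of 1.45 echoes the text): with noise in both variables, OLS of *R* on *T* biases the slope low, the reciprocal of the *T*-on-*R* slope overshoots, and pentadal averaging brings the two estimates closer together.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = 1.45                                   # illustrative "true" feedback
t = np.arange(154, dtype=float)                # 154 years, as in 1852-2005
T_true = 0.005 * t + 0.1 * np.sin(t / 8.0)     # forced warming + smooth variability

T = T_true + rng.normal(0, 0.12, t.size)       # observed T includes noise
R = alpha * T_true + rng.normal(0, 0.15, t.size)

def slope(x, y):
    """OLS slope of y regressed on x (with intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

a_RonT = slope(T, R)        # biased LOW by noise in T ("regression dilution")
a_TonR = 1.0 / slope(R, T)  # biased HIGH by noise in R
print(a_RonT, a_TonR)       # the two estimates bracket the true 1.45

# 5-year (pentadal) means suppress the noise, so the two estimates agree more closely
T5 = T[:150].reshape(30, 5).mean(axis=1)
R5 = R[:150].reshape(30, 5).mean(axis=1)
print(slope(T5, R5), 1.0 / slope(R5, T5))
```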

My regression results, along with the plot of the fit achieved, show that there is essentially no time-variation in *α* during the MPI-ESM1.1 historical simulation, provided that the low equilibrium efficacy (or, equivalently, larger climate feedback) for volcanic forcing is allowed for. It would not be possible to tell that from the solid orange line in Figure 1.

Importantly, the *α* of 1.45 Wm^{−2}K^{−1} estimated over the historical period is very closely in line with the *α* of 1.43 – 1.44 Wm^{−2}K^{−1} estimated from the model’s abrupt4xCO2 simulation over appropriate periods,[13] and with the *α* of 1.47 Wm^{−2}K^{−1} estimated from the model’s 1pctCO2 simulation (in which the CO_{2} concentration is increased by 1% a year compound). Using that simulation provides an even closer proxy than the abrupt4xCO2 simulation for historical period CO_{2}-only *α*.[14]

Although I estimated the required adjustment for volcanic forcing by regressing mean values from the 100-member ensemble, regressing instead values from individual runs also works well. The resulting *α* estimates have a mean (and median) of 1.46 Wm^{−2}K^{−1}, almost identical to *α* estimated from regressing ensemble-mean values. Their standard deviation is 0.08 Wm^{−2}K^{−1}, the same as when AR5 volcanic forcing is not included as a regressor.

I have shown that there is no evidence whatsoever that *α* actually varied over the course of the MPI-ESM1.1 historical simulation (provided diagnosed ERF is adjusted for land-surface warming and the low efficacy of volcanic forcing is accounted for). Accordingly, the large fluctuations of the solid orange line in Figure 1 must be regarded as due simply to the influence of low-efficacy volcanic forcing, with contributions from ERF not having been adjusted for land-surface warming and from random variability (principally in *R*).

**Other evidence: GISS-E2-R**

Although it is impossible to prove that the same is true for the CMIP5 multimodel-mean given that no accurate estimate of CMIP5 mean *historical* simulation ERF is available, a historical forcing time series has been diagnosed for one CMIP5 model (GISS-E2-R). Unfortunately, this is for instantaneous radiative forcing, not for ERF, so there is some uncertainty involved.[15] For the GISS-E2-R historical simulation annual ensemble-mean, a linear fit of *R* on *T* and AR5 volcanic forcing gives an R^{2} of 0.94 (a correlation of 0.97). The residuals are slightly larger than for the MPI-ESM1.1 ensemble-mean, as averaging is over only 6 rather than 100 runs, but they show no evidence of non-normality or significant autocorrelation. The excellent fit and satisfactory residuals strongly suggest that *α* was stable during the GISS-E2-R historical simulation. Moreover, the *α* estimate of 2.08 Wm^{−2}K^{−1} is closely in line with *α* estimated by regression over appropriate periods of data from the model’s idealised CO_{2}-forced simulations.[16]

Thus, for both MPI-ESM1.1 and GISS-E2-R, *α* estimated over the historical period is, provided an efficacy adjustment is made for volcanic forcing (e.g., by estimating it simultaneously with *α*), both stable and closely in line with its value as estimated from idealised CO_{2}-only forcing simulations over periods providing a comparable forcing duration. There is therefore no evidence here for a different *α* value applying to aerosol forcing (as suggested by some authors)[17][18] or to any other significant component of historical forcing apart from volcanism[19], or for any underlying variation in climate feedback strength in AOGCMs during their *historical* simulations.

**Summary and Conclusions**

Volcanic ERF has a low equilibrium efficacy, or equivalently a higher *α* value applies to it, relative to historical period non-volcanic ERF. This is in agreement with what Gregory et al. (2019) conclude.

Provided that the low efficacy of volcanic forcing is adjusted for:

- there is no evidence of any forced variation of *α* during the historical period in MPI-ESM1.1, the only model for which it is possible to determine this accurately
- nor is there any such evidence in GISS-E2-R, the only CMIP5 model with diagnosed forcing
- in both cases estimated *α* is very close to *α* estimated from comparable CO_{2}-only forced abrupt4xCO2 and 1pctCO2 simulation data, implying that greenhouse gas and non-volcanic non-greenhouse gas *historical* simulation ERF has a very similar equilibrium efficacy (or, equivalently, *α* is very similar for both).

While these findings go beyond those in Gregory et al. (2019), they are not inconsistent with anything in the paper.

While there is no reason to believe that the behaviour of other CMIP5 models in their *historical* simulations, in aggregate or individually, is different, it is impossible to confirm or disprove that because of the inability to form a good estimate of the evolution of ERF in those simulations, either in aggregate or individually.

Estimating *α* during the historical period by OLS regression over sliding 30-year periods, without adjusting for the low efficacy of volcanic forcing, is not the best way of examining whether and how climate feedback varied over the historical period, with any underlying variation in *α* being obscured by fluctuations resulting from the low efficacy of volcanic forcing and here also (to a lesser extent) from inaccurately estimated historical ERF.

Nicholas Lewis 31 October 2019


[1] The equilibrium efficacy of a forcing agent is the ratio of α when changes are driven by an increase in CO_{2} forcing to α when changes are driven by an increase in forcing by that agent (Marvel et al 2016). It can be viewed as the ratio, for that agent, of the ultimate effective radiative forcing to the chosen forcing measure. For a few forcing agents, the conventional fixed-SST measure of ERF does not result in a unit equilibrium efficacy.

[2] Lewis, N. and Curry, J.A., 2015. The implications for climate sensitivity of AR5 forcing and heat uptake estimates. *Climate Dynamics*, 45(3-4), pp.1009-1023.

[3] Gregory, J.M., Andrews, T., Good, P., Mauritsen, T. and Forster, P.M., 2016. Small global-mean cooling due to volcanic radiative forcing. *Climate Dynamics*, *47*(12), pp.3979-3991.

[4] I tried using regression to estimate what scaling of IPCC AR5 aerosol and volcanic and solar forcing results in the best fit of Δ*R* to Δ*T* for CMIP5-mean *historical* simulation data. However, the resulting residuals were dominated by spikes around volcanic episodes. It appears that the requisite scaling of AR5 volcanic forcing varies between the various eruptions, and that the time-profile of volcanic episodes may not quite match between CMIP5 and AR5. Moreover, the residuals show apparent trends over differing sub-periods, suggesting that the time-profile and not just the scaling of aerosol forcing differ between AR5 and the CMIP5-mean.

[5] Dessler AE, Mauritsen T, Stevens B (2018) The influence of internal variability on Earth’s energy balance framework and implications for estimating climate sensitivity. Atmos Chem Phys 18:5147–5155. https ://doi.org/10.5194/acp-18-5147-2018

[6] That would account for diagnosed forcing in 1850 and 1851 being lower than those for all other non-volcanic years in the 1850s despite solar forcing being high in those years.

[7] However, GISS do correct for land surface warming in their fixed-SST estimates of ERF.

[8] The 7% upwards adjustment to ERF is based on the 0.052 KW^{−1}m^{2} ratio of the changes in mean near-surface air temperature and in TOA radiative imbalance between 1859-82 and 1999-2008, being periods near the start and at the end of the diagnostic fixed-SST simulations that are not influenced by volcanic eruptions, multiplied by 1.35 Wm^{−2}K^{−1}, being *α* estimated by regression over years 1–150 of the MPI-ESM1.1 abrupt4xCO2 simulation. The value of *α* applicable to land warming in the fixed-SST, historical forcing simulations is unknown but seems unlikely to differ greatly from its thus-estimated value.

[9] All stated uncertainties are 1*σ* (one standard deviation) unless otherwise indicated.

[10] The ratio of the mean volcanic forcing excursions as diagnosed in MPI-ESM1.1, scaled by 1.07x, and per AR5 is ~0.85 if averaged over all six major eruptions since 1850. But for the two most recent, best observed eruptions (Pinatubo and El Chichón) the ratio is ~0.77. These values are consistent with the 0.78 ratio estimated for ECHAM6.3, the atmospheric component of MPI-ESM1.1, in Gregory et al (2019).

[11] The 1852-2005 anomalies shown are relative to means over 1852–1881.

[12] With volcanic forcing included as a separate regressor in both cases.

[13] By regression over years 2-20 and 2–50 of abrupt4xCO2, which are suitable proxies for what *α* would have been over the historical period if forcing had been purely from a rising CO_{2} concentration.

[14] Obtaining *α* from the ratio of mean Δ*R* to mean Δ*T* over years 60–80 of the ensemble-mean from the MPI-ESM1.1 1pctCO2 simulations (in which the CO_{2} concentration reaches twice its original level during year 70). The value of Δ*T* provides an accurate estimate of the model’s transient climate response, and by adopting an estimate of the ERF for a doubling of preindustrial CO_{2} concentration (*F*_{2×CO2}) a value for *α* can also be obtained. I use an estimate of 3.95 Wm^{−2} for *F*_{2×CO2}, which produces an *α* estimate of 1.47 Wm^{−2}K^{−1}. Averaging over years 50–90 gives the same *α* estimate. Estimating *α* instead by regression of Δ*R* on Δ*T* over years 1–100 of the 1pctCO2 simulations gives an *α* estimate of 1.47 Wm^{−2}K^{−1}, whether or not an intercept term is estimated (as the Δ*R* and Δ*T* values are anomalies relative to preindustrial control simulation means, in principle no intercept should be necessary).

[15] What was diagnosed for the GISS-E2-R *historical* simulation was instantaneous radiative forcing at the tropopause (IRF), not ERF. I multiplied the diagnosed time series by 0.97 to convert it to estimated ERF, that being the ratio of ERF of 4.35 Wm^{−2} to IRF of 4.5 Wm^{−2} for a doubled CO_{2} concentration, although it is possible this ratio is inaccurate for the composite historical forcing.

[16] Regression over years 2–50 of abrupt4xCO2 gives an *α* estimate of 2.09 Wm^{−2}K^{−1}. Estimating *α* instead by regressing Δ*R* on Δ*T* over years 1–100 of the 1pctCO2 simulation gives a value of 2.01 Wm^{−2}K^{−1}; regressing instead over years 1–70 or 1–140 gives virtually identical estimates. So does regressing with no intercept.

[17] Shindell, D.T., 2014: Inhomogeneous forcing and transient climate sensitivity. *Nature Climate Change*, **4**, 274–277.

[18] Rotstayn, L. D., M. A. Collier, D. T. Shindell, and O. Boucher, 2015: Why does aerosol forcing control historical global-mean surface temperature change in CMIP5 models?, *J. Clim.*, **28**, 6608–6625, doi:10.1175/jcli-d-14-00712.1.

[19] Gregory et al’s findings point in the same direction, although they only go as far as saying “This explanation does not require EffCS for anthropogenic aerosol to differ substantially from the CO2 EffCS.”

The recently published open-access paper “How accurately can the climate sensitivity to CO2 be estimated from historical climate change?” by Gregory et al.[i] makes a number of assertions, many uncontentious but others in my view unjustified, misleading or definitely incorrect. Perhaps most importantly, they say in the Abstract that “The real-world variations mean that historical EffCS [effective climate sensitivity] underestimates CO_{2} EffCS by 30% when considering the entire historical period.” But they do not indicate that this finding relates only to effective climate sensitivity in GCMs, and then only to when they are driven by one particular observational sea surface temperature dataset.

However, in this article I will focus on one particular statistical issue, where the claim made in the paper can readily be proven wrong without needing to delve into the details of GCM simulations.

Gregory et al. consider a regression in the form *R* = *α* *T*, where *T* is the change in global-mean surface temperature with respect to an unperturbed (i.e. preindustrial) equilibrium, and *R* is the radiative response of the climate system to change in *T*. *α* is thus the climate feedback parameter, and *F*_{2xCO2} / *α* is the EffCS estimate, *F*_{2xCO2} being the effective radiative forcing for a doubling of preindustrial atmospheric carbon dioxide concentration.

The paper states that “estimates of historical α made by OLS [ordinary least squares] regression from real-world *R* and *T* are biased low”. OLS regression estimates *α* as the slope of a straight-line fit between the *R* and *T* data points (usually with an intercept term, since the unperturbed equilibrium climate state is not known exactly), by minimising the sum of the squared errors in *R*. Random errors in *R* do not cause a bias in the OLS slope estimate. Thus in the below chart, with *R* plotted on the y-axis and *T* on the x-axis, OLS finds the red line that minimises the sum of the squares of the lengths of the vertical lines.

[Chart: scatter plot of *R* against *T*, with the red OLS fit line minimising the vertical distances to the data points]

However, some of the variability in measured *T* may not produce a proportionate response in *R*. That would occur if, for example, *T* is measured with error, which happens in the real world. It is well known that in such an “error in the explanatory variable” case, the OLS slope estimate is (on average) biased towards zero. This issue has been called “regression dilution”.
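The attenuation can be quantified with the standard errors-in-variables result (a large-sample approximation; for the skewed finite-sample distribution discussed below, the mean and median attenuate slightly differently):

```latex
\hat{m}_{\mathrm{OLS}} \;\xrightarrow{\;p\;}\; m\,\lambda,
\qquad
\lambda \;=\; \frac{\sigma^2_{x,\mathrm{true}}}{\sigma^2_{x,\mathrm{true}} + \sigma^2_{\varepsilon_x}}
```

Applied to the simulation described later in this post (linear trend 0 to 100 in unit steps, variance (101² − 1)/12 = 850; uniform “weather” noise on ±30, variance 300; measurement-error standard deviation 20, variance 400), this gives λ = 1150/1550 ≈ 0.74, consistent with the 26% downward bias reported there.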

Regression dilution is one reason why estimates of climate feedback and climate sensitivity derived from warming over the historical period often instead use the “difference method”.[ii] [iii] [iv] [v] The difference method involves taking the ratio of differences, Δ*T* and Δ*R*, between *T* and *R* values late and early in the period. In practice Δ*T* and Δ*R* are usually based on differencing averages over at least a decade, so as to reduce noise.

I will note at this point that when a slope parameter is estimated for the relationship between two variables, both of which are affected by random noise, the probability distribution for the estimate will be skewed rather than symmetric. When deriving a best estimate by taking many samples from the error distributions of each variable, or (if feasible) by measuring them each on many differing occasions, the appropriate central measure to use is the sample median not the sample mean. Physicists want measures that are invariant under reparameterization[vi], which is a property of the median of a probability distribution for a parameter but not, when the distribution is skewed, of its mean. Regression dilution affects both the mean and the median estimates of a parameter, although to a somewhat different extent.

So far I agree with what is said by Gregory et al. However, the paper goes on to state that “The bias [in *α* estimation] affects the difference method as well as OLS regression (Appendix D.1).” This assertion is wrong. If true, this would imply that observationally-based estimates using the difference method would be biased slightly low for climate feedback, and hence biased slightly high for climate sensitivity. However, the claim is *not* true.

The statistical analyses in Appendix D consider estimation by OLS regression of the slope *m* in the linear relationship *y*(*t*) = *m x*(*t*), where *x* and *y* are time series the available data values of which are affected by random noise. Appendix D.1 considers using the difference between the last and first single time periods (here, it appears, of a year), not of averages over a decade or more, and it assumes for convenience that both *x* and *y* are recentered to have zero mean, but neither of these affects the point of principle at issue.

Appendix D.1 shows, correctly, that when only the endpoints of the (noisy) *x* and *y* data are used in an OLS regression, the slope estimate for *m* is Δ*y*/Δ*x*, the same as the slope estimate from the difference method. It goes on to claim that taking the slope between the *x* and *y* data endpoints is a special case of OLS regression, and that the fact that an OLS regression slope estimate is biased towards zero when there is uncorrelated noise in the *x* variable implies that the difference method slope estimate is similarly biased.

However, that is incorrect. The median slope estimate is not biased as a result of errors in the *x* variable when the slope is estimated by the difference method, nor when there are only two data points in an OLS regression. And although the mean slope estimate is biased, the bias is high, not low. Rather than going into a detailed theoretical analysis of why that is the case, I will show that it is by numerical simulation. I will also explain in simple terms how regression dilution can be viewed as arising, and why it does not arise when only two data points are used.

The numerical simulations that I carried out are as follows. For simplicity I took the true slope *m* as 1, so that the true relationship is *y* = *x*, and the true value of each *x* point as the sum of a linearly trending element running from 0 to 100 in steps of 1 and a random element uniformly distributed in the range −30 to +30, which can be interpreted as a simulation of a trending “climate” portion and a non-trending “weather” portion.[vii] I took both *x* and *y* data (measured) values as subject to zero-mean independent normally distributed measurement errors with a standard deviation of 20. I took 10,000 samples of randomly drawn (as to the true values of *x* and measurement errors in both *x* and *y*) sets of 101 *x* and 101 *y* values.

Both the median and the mean of the resulting 10,000 slope estimates from regressing *y* on *x* using OLS were 0.74 – a 26% downward bias in the slope estimator due to regression dilution.

The median slope estimate based on taking differences between the averages for the first ten and the last ten *x* and *y* data points was 1.00, while the mean slope estimate was 1.01. When the averaging period was increased to 25 data points the median bias remained zero while the already tiny mean bias halved.

When differences between just the first and last measured values of *x *and *y* were taken,[viii] the median slope estimate was again 1.00 but the mean slope estimate was 1.26.

Thus, the slope estimate from using the difference method was median-unbiased, unlike for OLS regression, whether based on averages over points at each end of the series or just the first and last points.
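These simulation results can be reproduced with a short script. This is my own sketch of the setup described above (the seed is arbitrary), and it reports medians only, since the single-endpoint mean is unstable without the negative-slope correction described in footnote [viii]:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n = 10_000, 101
trend = np.arange(n, dtype=float)  # true "climate" signal: 0, 1, ..., 100

ols, diff10, diff1 = [], [], []
for _ in range(n_samples):
    x_true = trend + rng.uniform(-30, 30, n)   # add non-trending "weather"
    x = x_true + rng.normal(0, 20, n)          # measurement error in x
    y = x_true + rng.normal(0, 20, n)          # true slope is 1, plus error in y

    ols.append(np.polyfit(x, y, 1)[0])                    # OLS with intercept
    diff10.append((y[-10:].mean() - y[:10].mean())
                  / (x[-10:].mean() - x[:10].mean()))     # 10-point end averages
    diff1.append((y[-1] - y[0]) / (x[-1] - x[0]))         # single endpoints

print(np.median(ols))     # ~0.74: regression dilution
print(np.median(diff10))  # ~1.00: difference method, median-unbiased
print(np.median(diff1))   # ~1.00: median-unbiased even from two points
```

The OLS median matches the classical attenuation factor 1150/1550 ≈ 0.74 for these variances, while both difference-method variants are median-unbiased.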

The reason for the upwards mean bias when using the difference method can be illustrated simply, if errors in *y* (which on average have no effect on the slope estimate) are ignored. Suppose the true Δ*x *value is 100, so that Δ*y* is 100, and that two *x *samples are subject to errors of respectively +20 and –20. Then the two slope estimates will be 100/120 and 100/80, or 0.833 and 1.25, the mean of which is 1.04, in excess of the true slope of 1.

The picture remains the same even when (fractional) errors in *x* are smaller than those in *y*. On reducing the error standard deviation for *x* to 15 while increasing it to 30 for *y*, the median and mean slope estimates using OLS regression were both 0.84. The median slope estimates using the difference method were again unbiased whether using 1, 10 or 25 data points at the start and end, while the mean biases remained under 0.01 when using 10 or 25 data point averages and reduced to 0.16 when using single data points.

In fact, a moment’s thought shows that the slope estimate from 2-point OLS regression cannot be biased towards zero. Since both variables are affected by error, if OLS regression gives rise to a low bias in the slope estimate when *x* is regressed on *y*, it must also give rise to a low bias in the slope estimate when *y* is regressed on *x*. If the slope of the true relationship between *y* and *x* is *m*, that between *x* and *y* is 1/*m*. It follows that if regressing *x* on *y* gives a biased-low slope estimate, taking the reciprocal of that slope estimate will provide an estimate of the slope of the true relationship between *y* and *x* that is biased high. However, when there are 2 data points the OLS slope estimate from regressing *y* on *x* and that from regressing *x* on *y* and taking its reciprocal are identical (since the fit line will go through the 2 data points in both cases). If the *y*-against-*x* and *x*-against-*y* OLS regression slope estimates were both biased low, that could not be so.
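The identity underlying this argument is easy to check numerically (the two points below are arbitrary):

```python
import numpy as np

# Two arbitrary data points: the fitted line passes through both exactly.
x = np.array([10.0, 110.0])
y = np.array([5.0, 95.0])

slope_yx = np.polyfit(x, y, 1)[0]   # regress y on x: (95-5)/(110-10) = 0.9
slope_xy = np.polyfit(y, x, 1)[0]   # regress x on y: (110-10)/(95-5)

print(np.isclose(slope_yx, 1.0 / slope_xy))  # True: the two fits coincide
```

With more than two points the two regressions generally give different lines, which is precisely where the asymmetric attenuation enters.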

As for how and why errors in the *x* (explanatory) variable cause the slope estimate in OLS regression to be biased towards zero (provided there are more than two data points), but errors in the *y* (dependent) variable do not, the way I look at it is this. For simplicity, I take centered (zero-mean) *x* and *y* values. The OLS slope estimate is then Σ*xy* / Σ*xx*, that is to say the weighted sum of the *y* data values divided by the weighted sum of the *x* data values, the weights being the *x* data values. An error that moves a measured *x* value further from the mean of zero not only reduces the slope *y*/*x* for that data point, but also increases the weight given to that data point when forming the OLS slope estimate. Hence such points are given more influence when determining the slope estimate. On the other hand, an error in *x* that moves the measured value nearer to the zero mean of the *x* values, increasing the *y*/*x* slope for that data point, reduces the weight given to that data point, so that it is less influential in determining the slope estimate. The net result is a bias towards a smaller slope estimate.

However, for a two-point regression this effect does not occur, because whatever the signs of the errors affecting the *x*-values of the two points, both *x*-values will always be equidistant from their mean, and so both data points will have equal influence on the slope estimate whether the errors increase or decrease the *x*-values. As a result, the median slope estimate is unbiased in this case. Whatever the number of data points, errors in the *y* data values do not affect the weights given to the data points and will on average cancel out when forming the OLS slope estimate Σ*xy* / Σ*xx*.

So why is the proof in Gregory et al. Appendix D.1, supposedly showing that OLS regression with 2 data points produces a low bias in the slope estimate when there are errors in the explanatory (*x*) data points, invalid? The answer is simple. The Appendix D.1 proof relies on the proof of low bias in the slope estimate in Appendix D.3, which is expressed to apply to OLS regression with any number of data points. But if one works through the equations in Appendix D.3, one finds that in the case of only 2 data points no low bias arises – the expected value of the OLS slope estimate equals the true slope.

It is a little depressing that after many years of being criticised for their insufficiently good understanding of statistics and lack of close engagement with the statistical community, the climate science community appears still not to have solved this issue.

*Update 29 October 2019*

Just to clarify, the final paragraph is a general remark about the handling of statistical issues in climate science research, not a particular remark about this new paper (where the statistical mistake made does not in any case affect any of the results).

*Update 23 January 2020*

Typo in 3rd paragraph fixed (*R* = *α* *T* corrected to *R* in the 2nd line).


[i] Gregory, J.M., Andrews, T., Ceppi, P., Mauritsen, T. and Webb, M.J., 2019. How accurately can the climate sensitivity to CO₂ be estimated from historical climate change?. Climate Dynamics.

[ii] Gregory JM, Stouffer RJ, Raper SCB, Stott PA, Rayner NA (2002) An observationally based estimate of the climate sensitivity. J Clim 15:3117–3121.

[iii] Otto A, Otto FEL, Boucher O, Church J, Hegerl G, Forster PM, Gillett NP, Gregory J, Johnson GC, Knutti R, Lewis N, Lohmann U, Marotzke J, Myhre G, Shindell D, Stevens B, Allen MR (2013) Energy budget constraints on climate response. Nature Geosci 6:415–416

[iv] Lewis, N. and Curry, J.A., 2015. The implications for climate sensitivity of AR5 forcing and heat uptake estimates. Climate Dynamics, 45(3-4), pp.1009-1023.

[v] Lewis, N. and Curry, J., 2018. The impact of recent forcing and ocean heat uptake data on estimates of climate sensitivity. Journal of Climate, 31(15), pp.6051-6071.

[vi] So that, for example, the median estimate for the reciprocal of a parameter is the reciprocal of the median estimate for the parameter. This is not generally true for the mean estimate. This issue is particularly relevant here since climate sensitivity is reciprocally related to climate feedback.

[vii] There was an underlying trend in T over the historical period, and taking it to be linear means that, in the absence of noise, linear slope estimated by regression and by the difference method would be identical.

[viii] Correcting to a positive value the small number of negative slope estimates arising when the *x* difference was negative but the *y* difference was positive (see, e.g., Otto et al. 2013). Before that correction the median slope estimate had a 1% low bias. The positive value chosen (here the absolute value of the negative slope estimate involved) has no effect on the median slope estimate provided it exceeds the median value of the remaining slope estimates, but does materially affect the mean slope estimate.

One of the longest-standing Climate Audit controversies has been about the bias introduced into reconstructions that use ex post screening/correlation. In today’s post, I’ll report on a little noticed* Climategate-2 email in which a member of the paleoclimatology guild (though then junior) reported to other members of the guild that he had carried out simulations to test “the phenomenon that Macintyre has been going on about”, finding that the results from his simulations from white noise “clearly show a ‘hockey-stick’ trend”, a result that he described as “certainly worrying”. (*: WUWT article here h/t Brandon).

A more senior member of the guild dismissed the results out of hand: “Controversy about which bull caused mess not relevent.” Members of the guild have continued to merrily ex post screen to this day without cavil or caveat.

The bias, introduced by ex post screening of a large network of proxies by correlation against increasing temperatures, has been noticed and commented on (more or less independently) by myself, David Stockwell, Jeff Id, Lucia and Lubos Motl. It is trivial to demonstrate through simulations, as each of us has done in our own slightly different ways.
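A minimal version of such a simulation is sketched below. This is my own sketch, not any of the cited authors’ code; the trending “target” series and the screening threshold are illustrative choices (cf. the “90% C.L.” screen in the email quoted later):

```python
import numpy as np

rng = np.random.default_rng(1)
n_series, n_years = 1000, 150

# A trending screening target (an illustrative "NH temperature" stand-in)
target = np.linspace(-0.3, 0.7, n_years) + 0.15 * rng.standard_normal(n_years)

proxies = rng.standard_normal((n_series, n_years))  # pure white noise

# Screen: keep series whose correlation with the target clears an
# approximate one-sided 95% threshold
r = np.array([np.corrcoef(p, target)[0, 1] for p in proxies])
passed = proxies[r > 1.645 / np.sqrt(n_years)]

composite = passed.mean(axis=0)  # "reconstruction" = mean of screened noise
print(len(passed))                           # roughly 5% pass by chance
print(np.corrcoef(composite, target)[0, 1])  # strong spurious trend match
```

Averaging the screened series cancels the noise orthogonal to the target while retaining the selected-for trend component, so the composite tracks the target closely even though no series contains any signal.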

In my case, I had directed the criticism of ex post screening particularly at practices of D’Arrigo and Jacoby in their original studies: see, for example, one of the earliest Climate Audit posts (Feb 2005) where I wrote:

Jacoby and d’Arrigo [1989] states on page 44 that they sampled 36 northern boreal forest sites within the preceding decade, of which the ten “judged to provide the best record of temperature-influenced tree growth” were selected. No criteria for this judgement are described, and one presumes that they probably picked the 10 most hockey-stick shaped series. I have done simulations, which indicate that merely selecting the 10 most hockey stick shaped series from 36 red noise series and then averaging them will result in a hockey stick shaped composite, which is more so than the individual series.

The issue of cherry picking arose forcefully at the NAS Panel on paleoclimate reconstructions on March 2, 2006, when D’Arrigo told a surprised panel that you had to pick cherries if you wanted to make “cherry pie”, an incident that I reported in a blog post a few days later on March 7 (after my return to Toronto.)

Ironically, on the same day, Rob Wilson, then an itinerant and very junior academic, wrote a thus far unnoticed CG2 email (4241. 2006-03-07) which reported on simulations that convincingly supported my concerns about ex post screening. Wilson’s email was addressed to most of the leading dendroclimatologists of the day: Ed Cook, Rosanne D’Arrigo, Gordon Jacoby, Jan Esper, Tim Osborn, Keith Briffa, Ulf Buentgen, David Frank, Brian Luckman and Emma Watson, as well as Philip Brohan of the Met Office. Wilson wrote:

Greetings All,

I thought you might be interested in these results. The wonderful thing about being paid properly (i. e. not by the hour) is that I have time to play.

The whole Macintyre issue got me thinking about over-fitting and the potential bias of screening against the target climate parameter. Therefore, I thought I’d play around with some randomly generated time-series and see if I could ‘reconstruct’ northern hemisphere temperatures.

I first generated 1000 random time-series in Excel – I did not try and approximate the persistence structure in tree-ring data. The autocorrelation therefore of the time-series was close to zero, although it did vary between each time-series. Playing around therefore with the AR persistent structure of these time-series would make a difference. However, as these series are generally random white noise processes, I thought this would be a conservative test of any potential bias.

I then screened the time-series against NH mean annual temperatures and retained those series that correlated at the 90% C. L. 48 series passed this screening process.

Using three different methods, I developed a NH temperature reconstruction from these data:

- simple mean of all 48 series after they had been normalised to their common period
- Stepwise multiple regression
- Principle component regression using a stepwise selection process.

The results are attached. Interestingly, the averaging method produced the best results, although for each method there is a linear trend in the model residuals – perhaps an end-effect problem of over-fitting.

The reconstructions clearly show a ‘hockey-stick’ trend. I guess this is precisely the phenomenon that Macintyre has been going on about. **It is certainly worrying** [SM bold], but I do not think that it is a problem so long as one screens against LOCAL temperature data and not large scale temperature where trend dominates the correlation. I guess this over-fitting issue will be relevant to studies that rely more on trend coherence rather than inter-annual coherence. It would be interesting to do a similar analysis against the NAO or PDO indices. However, I should work on other things.

Thought you’d might find it interesting though. comments welcome

Rob

Wilson’s sensible observations, which surely ought to have caused some reflection within the guild, were peremptorily dismissed about 15 minutes later by the more senior Ed Cook as nothing more than “which bull caused which mess”:

You are a masochist. Maybe Tom Melvin has it right: “Controversy about which bull caused mess not relevent. The possibility that the results in all cases were heap of dung has been missed by commentators.”

Cook’s summary and contemptuous dismissal seems to have persuaded the other correspondents and the issue receded from the consciousness of the dendroclimatology guild.

Looking back at the contemporary history, it is interesting to note that the issue of the “divergence problem” embroiled the dendro guild the following day (March 8) when Richard Alley, who had been in attendance on March 2, wrote to IPCC Coordinating Lead Author Overpeck “doubt[ing] that the NRC panel can now return any strong endorsement of the hockey stick, or of any other reconstruction of the last millennium”: see 1055. 2006-03-11 (embedded in which is Alley’s opening March 8 email to Overpeck). In a series of interesting emails (e.g. CG2 1983. 2006-03-08; 1336. 2006-03-09; 3234. 2006-03-10; 1055. 2006-03-11), Alley and others discussed the apparent concerns of the NAS panel about the divergence problem, e.g. Alley:

As I noted, my observations of the NRC committee members suggest rather strongly to me that they now have serious doubts about tree-rings as paleothermometers (and I do, too… at least until someone shows me why this divergence problem really doesn’t matter). —

In the end, after considerable pressure from paleoclimatologists, the NAS Panel more or less evaded the divergence problem (but that’s another story, discussed here from time to time.)

Notwithstanding Wilson’s “worry” about the results of his simulations, ex post screening continued to be standard practice within the paleoclimate guild. Ex post screening was used, for example, in the Mann et al (2008) CPS reconstruction. Ross and I commented on the bias in a comment published by PNAS in 2009 as follows:

Their CPS reconstruction screens proxies by calibration-period correlation, a procedure known to generate ‘‘hockey sticks’’ from red noise (4 – Stockwell, AIG News, 2006).

In their reply in PNAS, Mann et al dismissed the existence of ex post screening bias, claiming that we showed “unfamiliarity with the concept of screening regression/validation”:

McIntyre and McKitrick’s claim that the common procedure (6) of screening proxy data (used in some of our reconstructions) generates “hockey sticks” is unsupported in peer reviewed literature and reflects an unfamiliarity with the concept of screening regression/validation.

CA readers will remember that the issue arose once again in Gergis et al 2012, who had claimed to have carried out detrended screening, but had not. CA readers will also recall that Mann and Schmidt both intervened in the fray, arguing in favor of ex post screening as a valid procedure.


The connection of CG email nomenclature to Unix timestamps was observed as early as Dec 7, 2009 by WUWT commenter crosspatch (here), who similarly noticed discrepancies between nomenclature and email times, but concluded that they showed that the hacker used a computer set to Eastern North American time (-05:00 Standard).

I pointed the error out on Twitter with a technical analysis. I also linked Ostanin to the original WUWT comment making a similar point.

Ostanin responded by claiming that my (correct) replication of CG1 nomenclature was “needlessly complicated” and doubled down with his incorrect assertion that “time seen in hacked email headers is 5 hours behind – to the second – of the time in the decoded email file names”.

Ostanin challenged everyone “to try to see for themselves” – pointing to an internet utility.

After I re-iterated my technical criticism, Iggy stated that he wasn’t “sure if either of [me or Charles Wood] ever came across a Kremlin narrative they didn’t endorse”. Then, in true Mannian (and Eliot Higgins) style, Ostanin blocked me on Twitter.

While it’s a bit absurd to waste time on this trivia, Iggy’s falsehoods remain in circulation. He hasn’t conceded anything. Nor have Revkin, Harrabin, Rice or other re-tweeters conceded that Iggy’s analysis was nonsensical.

In my tweets, I observed that Iggy’s analysis was based on an email sent from GMT timezone and that the 5-hour difference between nomenclature and email time only held for emails from that time zone. What any competent analyst (and we may safely exclude Iggy from that category) would have done is to compare email timestamp to nomenclature across multiple timezones and Daylight/Standard times. I’ve done so in the table below.

Nomenclature for GMT-timezone emails is 5 hours ahead in winter, but only 4 hours ahead in summer. This should have caused Iggy to pause. Nomenclature for emails sent from the Eastern timezone exactly matched the email time – both in Standard (winter) and Daylight (summer) time. Nomenclature for emails sent from Mountain time (two hours behind Eastern) was −2 hours, in both winter and summer.

Ironically, the very first email in the Climategate dossier was sent from Iggy’s Ekaterinburg (+05:00). But instead of the nomenclature exactly matching the email time, the nomenclature was 10 hours ahead.
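The arithmetic is easy to verify. The sketch below (Python standard library only, with made-up header dates rather than actual Climategate emails, and assuming – per crosspatch’s inference – that the collator’s machine was set to Eastern time) computes the gap between an email header’s local time and its Unix-timestamp nomenclature rendered in Eastern time:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from zoneinfo import ZoneInfo

# Presumed timezone of the machine that named the files (crosspatch's inference).
EASTERN = ZoneInfo("America/New_York")

def nomenclature_offset(header_date: str) -> float:
    """Hours by which the header's local time leads the Unix-timestamp
    nomenclature when that timestamp is rendered in Eastern time."""
    dt = parsedate_to_datetime(header_date)        # aware datetime from the header
    epoch = dt.timestamp()                         # Unix timestamp = the file name
    rendered = datetime.fromtimestamp(epoch, EASTERN)
    delta = dt.replace(tzinfo=None) - rendered.replace(tzinfo=None)
    return delta.total_seconds() / 3600

# Illustrative header dates (NOT actual Climategate emails):
print(nomenclature_offset("Mon, 06 Jan 1997 10:00:00 +0000"))  # GMT, winter     -> 5.0
print(nomenclature_offset("Mon, 07 Jul 1997 10:00:00 +0000"))  # GMT, summer     -> 4.0
print(nomenclature_offset("Mon, 06 Jan 1997 10:00:00 -0500"))  # Eastern, winter -> 0.0
print(nomenclature_offset("Mon, 06 Jan 1997 10:00:00 -0700"))  # Mountain, winter-> -2.0
print(nomenclature_offset("Mon, 06 Jan 1997 10:00:00 +0500"))  # Ekaterinburg    -> 10.0
```

The gap is simply the header’s UTC offset minus the Eastern offset on that date – which is why it varies by timezone and season, and is zero only for Eastern-zone emails.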

In other words, Ostanin got everything pretty much backwards and upside down. It’s about as bad a bit of analysis as it is possible to imagine. And, instead of simply conceding that he’d made a mistake (which is easy enough to do), Ostanin got belligerent and shut his ears. Unfortunately, Ostanin’s falsehoods are now in circulation and, like Mann’s, will probably fester forever.


Antarctic d18O is one of the few proxies which can be accurately dated both in very recent measurements and in the Holocene and deep time. However, rather against message, Antarctic d18O over the past two millennia (as, for example, in the PAGES2K 2013 compilation) has mostly gone the “wrong” way, somewhat diluting the IPCC message – to borrow a phrase.

PAGES2017 relaxed the PAGES2K ex ante quality control criteria to include 15 additional series (most of which are not *new*), but these, if anything, reinforce the earlier message of gradual decline over the past two millennia.

PAGES2K (2017) also added two borehole inversion series, which were given a sort of special exemption from PAGES2K quality control standards on resolution and dating. I suspect that readers already know why these series were given special exemption: one of them has a very pronounced blade. Long-time readers may vaguely recall that an (unpublished) Antarctic borehole inversion series also played an important role in the conclusions of the NAS 2006 report. I tried at the time to get underlying measurement data, but was unsuccessful. A few years ago, when the PAGES2017 borehole inversion series was published, I managed (through an intermediary) to obtain much of the underlying data and even some source code for the borehole inversion. I’ve revisited the topic and conclude today’s post with a couple of teasers from an interesting analysis in the works.

Here is a plot of the PAGES2K Antarctic temperature reconstruction. It shows a long decline from the mid-first millennium, with nearly all 19th and 20th century values – and even early 21st century values – below the long-term mean.

This series was used in IPCC AR4 (see below). Though its most recent portion is rather muddy in the IPCC diagram, the lack of any 20th century blade is clear.

PAGES2K authors used 11 datasets in their temperature reconstruction. According to their statement of methods, they applied sensible ex ante quality control procedures by aiming at use of “longest, highest resolution and best synchronized” of available records.

Data for the Antarctic reconstruction were selected based on a restrictive approach aimed at using the longest, highest resolution and best synchronized of available records. All records were water isotope (d18O or dD) series from ice cores. The project aimed to maximize coherence by using records that could be synchronized through either high-resolution layer counting or alignment of volcanic sulfate records.

I very much endorse this sort of ex ante quality control, which is the opposite of the far-too-common practice of ex post selection of a subset of proxies. The 11 isotope series used by PAGES2K (2013) are shown below in a gif together with the reconstruction. The series, examined individually, also show the non-HS decline illustrated in the reconstruction composite.

Several of the high-resolution PAGES2K series extending back to the MWP were first archived as part of PAGES2K, including Law Dome (DSS) and Plateau Remote, both of which I had long and unsuccessfully sought from Tas van Ommen and Ellen Mosley-Thompson.

Earlier versions of Law Dome had been used in Jones et al 1998 and Mann and Jones 2004, the latter including an illustration showing a high MWP. As an IPCC reviewer of AR4, I had asked that Law Dome d18O be included in their figure showing high-resolution Southern Hemisphere proxies. Climategate emails (see CA discussion) show that IPCC authors snickered at this request, knowing that I had asked that they show a proxy with high medieval values. There was no way that they were going to show the Law Dome series. Despite sneering at my request, they recognized that they had to cooper up their rationale for not showing such an important series and inserted the excuse that there was inconsistency between the isotope data and the reconstruction from inversion of subsurface temperatures.

Contrasting evidence of past temperature variations at Law Dome, Antarctica has been derived from ice core isotope measurements and from the inversion of a subsurface temperature profile (Dahl-Jensen et al., 1999; Goosse et al., 2004; Jones and Mann, 2004). The borehole analysis indicates colder intervals at around 1250 and 1850, followed by a gradual warming of 0.7°C to the present. The isotope record indicates a relatively cold 20th century and warmer conditions throughout the period 1000 to 1750.

I mention this incident and excuse because the inconsistency between isotope data and borehole inversions re-appears in PAGES2017.

Stenni et al 2017 (pdf; CA discussion) presented a much expanded database of high-resolution Antarctic isotope data in response to PAGES2K. They presented 112 records (94 d18O; 18 dD), many of which were short (36 limited to the last 50 years or less). 15 records went back to AD1000; 9 went back to AD0. However, 4 of the additional series did not come up to the present or even to AD1950. Four series (TALDICE, DML07, DML17 and Berkner Island) dated from the 1990s; the reason for their exclusion from PAGES2013 is unclear. The database included a much lengthened version of WDC06A, a companion hole to WAIS WDC05A. If a site had both d18O and dD records, they used the d18O record and did not double up. There was only one new long series: Roosevelt Island. It showed the long gradual two-millennium decline evident in other records.

Stenni et al produced a reconstruction, which, as pointed out at CA previously, used ex post screening to select series that had positive correlation with upward trending instrumental temperature data:

Even with this bias, their temperature reconstruction had a pronounced downward trend over the past two millennia – entirely consistent with the Law Dome d18O that IPCC had refused to show in AR4 a decade ago.

The PAGES2K (2017) dataset consisted of 27 series. They used 10 of the 11 PAGES2K series (of which one series was updated), added 15 isotope series and two VERY poorly resolved borehole temperature reconstructions. 13 (of 15) new isotope series had been previously used in Stenni et al 2017; the other two series were dD series at sites where d18O series had already been used. The earlier compilations had avoided such duplication.

PAGES (2017) said that their standards for Antarctic ice core isotope series had been relaxed to include “shorter and decadal-scale-resolution” records:

for some proxy types, the standards in this version were broadened compared to the criteria used previously by PAGES2k regional groups. In most regions, records have been added that have become available since the publication of PAGES2k-2013, or that were not used in the continental-scale reconstructions because they are not annually resolved and therefore did not conform to the reconstruction method used by a particular regional group.

In Antarctica, for example, PAGES2k-2013 included only the longest annually resolved ice cores, whereas the present version includes shorter and decadal-scale-resolution records.

Of the 15 new isotope series, 5 begin after the medieval period; 5 end before 1940; 4 have decadal resolution or coarser. **None** of the new isotope series meets all three criteria of beginning prior to AD1000, ending after AD1950 and having better than decadal resolution. Three series which begin at exactly AD1000 meet the other two criteria. Of these three series, two (DML07, DML17) are from the same campaign and author as the 2013 series DML05 and add little new information. I mentioned the other series, an isotope series from Berkner Island, five and seven years ago in connection with the SH network of Neukom, Gergis and Karoly (see here, here). The new isotope data show the same two-millennium decline as PAGES2K and Stenni et al 2017.

The two borehole series invert downhole thermometer temperatures to supposedly estimate past temperature. These inversions use **extremely ill-conditioned** matrices – an issue that doesn’t seem to be clearly understood by proponents – with resolution far lower than PAGES2K standards. (PAGES2017 falsely asserts that one of the two series has annual resolution, and that the other has 100-year resolution.)
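To see what “extremely ill-conditioned” means in practice, here is a toy sketch (my own construction, not Orsi’s actual forward model): a matrix in which each depth observes the surface temperature history through a Gaussian smoother whose width grows with depth, mimicking heat diffusion. The condition number is astronomical, so perturbations at measurement precision blow up in the inversion:

```python
import numpy as np

n = 50                              # time steps in surface history = depths observed
t = np.arange(n)
A = np.empty((n, n))
for i in range(n):
    width = 1.0 + 0.5 * i           # smoothing width grows with depth
    row = np.exp(-0.5 * ((t - i) / width) ** 2)
    A[i] = row / row.sum()          # each depth is a weighted average of the history

print(f"condition number: {np.linalg.cond(A):.2e}")

# A perturbation at measurement precision produces a wildly different "history":
truth = np.sin(t / 8.0)
obs = A @ truth
noisy = obs + 1e-6 * np.random.default_rng(0).standard_normal(n)
recon = np.linalg.solve(A, noisy)
print(f"max reconstruction error: {np.abs(recon - truth).max():.2e}")
```

The deeper rows become nearly indistinguishable averages of the whole history, which is exactly what drives the condition number up: the data simply do not constrain the old part of the history.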

PAGES (2017) acknowledged that the resolution of borehole inversions was “less straightforward” than that of other proxies – an understatement – but nonetheless asserted, waving their arms wildly, that the records were “appropriate for examining decadal to multi-centennial variability”:

PAGES2K scientific questions focus on centennial and finer time scales. Terrestrial and lacustrine records were included with average sample resolution of 50 years or finer. However, such records are rare from marine sediments, and thus a minimum average sample resolution of 200 years was accepted for this database. We also included 4 borehole records, although quantifying median resolution is less straightforward in boreholes than in other archives. The borehole records in the database are appropriate for examining decadal to multi-centennial scale variability, depending on the timeframe of interest [21 – Orsi et al, Little Ice Age cold interval in West Antarctica: Evidence from borehole temperature at the West Antarctic Ice Sheet (WAIS) Divide. Geophysical Research Letters 39, L09710 (2012). pdf]

There is, of course, a different and real reason for PAGES (2017) insertion of borehole records which didn’t meet PAGES2K ex ante quality standards: the borehole inversions, especially at WAIS Divide (shown in the gif below) have a pronounced 20th century blade, which is absent in the Antarctic isotope data. Cynical readers might reasonably conclude that this had something to do with the PAGES2K decision to abandon its quality control standards for these records.

**Discussion of Antarctic Borehole Data**

I’m going to write a detailed analysis of the WAIS Divide borehole inversion in a separate post. Antarctica played a surprisingly prominent role in the conclusions of the 2006 NAS paleoclimate report, but NAS provided no citations for their assertions about Antarctica. I challenged their assertions and, in a surprise appearance in Climate Audit comments, Eric Steig agreed with my criticisms (while slagging me either for making the criticisms or, more likely, for existing.) I was later able to determine from a NAS panelist that their assertions about Antarctica were based on unpublished borehole inversion data. I tried to get the underlying data (measured in 1994-95) from USGS, but the data could not be provided to me because it lacked “official USGS approval”, which had thus far not been obtained due to other pressing obligations. (Twelve years later, the data remains unarchived.) In 2009, I looked at inversion techniques for downhole temperatures in “boreholes” in rock. (These almost entirely come from mineral exploration.) I noted that the techniques required inversion of **very** ill-conditioned matrices and that some properties looked like Chladni-type artifacts.

When Orsi et al published their borehole inversion in 2012, I asked an associate to request the data and code (figuring that it would be pointless to request the data myself.) Orsi courteously sent both data and code to the associate, who sent them to me. Much of the code had been written in 1990 in an antique Fortran; the rest was in Matlab. I spent some time in 2012 trying to figure it out, but put it to one side after a while. I’ve re-visited the topic with some interesting results which I’ll write up at length, but, for now, give two teasers.

First, the downhole temperature curve was both **convex** and smooth. (Convex means that there were no changes in the direction of curvature.) However, the reconstruction had three major changes in curvature direction and, in detail, many small changes. Mathematically, this is very unsettling: without some very peculiar conditions, the inverse of a convex and smooth curve ought to be convex (or concave) and smooth as well. So how do the changes in curvature in the reconstruction arise? Are they real or an artifact? (In some prior CA posts, I’ve discussed changes in curvature in connection with Chladni patterns arising from principal components on tree ring networks – so there are some interesting connections to a long-standing mathematical interest.) However, it’s a little long and detailed for this post.
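The curvature question is easy to pose numerically. A sketch with generic stand-in curves (not the WAIS data): count the sign changes in the second difference of a sampled profile. A smooth convex curve has none; a wiggly reconstruction has several:

```python
import numpy as np

def curvature_sign_changes(y):
    """Count changes in the direction of curvature of a sampled curve."""
    d2 = np.diff(y, 2)                     # discrete second derivative
    s = np.sign(d2[np.abs(d2) > 1e-12])    # drop numerically-zero curvature
    return int((s[:-1] != s[1:]).sum())

z = np.linspace(0, 10, 200)
convex_profile = (z - 4) ** 2   # smooth and convex: curvature never changes sign
wiggly_recon = np.sin(z)        # curvature reverses at each inflection point

print(curvature_sign_changes(convex_profile))  # -> 0
print(curvature_sign_changes(wiggly_recon))    # -> 3 (inflections near pi, 2*pi, 3*pi)
```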

While I was trying to figure out the code, I noticed the authors had excluded the top 15 meters of their data “because of the influence of the weather on surface measurements”. This raises an obvious question: what did the excluded data look like?

Orsi’s unpublished data package didn’t include a file named “WAIStemp2009c.txt”, but did include a file entitled “WDC05A_BoreholeTemp_300m_2009.txt” (“% as measured in January 2009”), which contained downhole temperature measurements taken in January 2009, including six excluded measurements between 8 and 15 m. The excluded data is shown (in red) in the figure below: it continued upward a little further, then declined, retracing about half the increase. Given that the overarching conclusion of the article was rapid recent increase in temperature, it seemed unsettling that they had deleted the most recent data (which went down). The text of the article also cited 2008 measurements, which had not been included in the data package. They turned out to be online at USAP here and are plotted in the right panel: ice sheet temperature in the topmost 2 meters reversed the decline, increasing by more than 16 deg C – an effect that was clearly “weather”, not “climate”.

Van Ommen et al (1999) contained an informative graphic (replicated below) which showed the dramatic annual variation in near-surface ice sheet temperature: in the top meter or so, temperatures ranged from ~-30 deg C in winter to ~-13 deg C in January, with the variation largely attenuated by ~15 meters depth. The shape of the temperature profile in the top 15 meters is distinctly of the form of a damped sinusoid: one can reasonably also see a damped sinusoid in the top few meters of the WAIS data as well.

The problem with the top ~15 meters or so is the effect of ordinary (average) *annual* variations, not “weather”. One can see how eliminating the top ~15 meters of data sidesteps the thorny problem of disentangling these annual variations, but this surely comes at a heavy cost. Ice cores can be accurately dated by layer counting (based on visual appearance and annual d18O cycles). Layers at 15-18 meters date back to the 1960s. Orsi et al purport to reconstruct temperature up to 2007, but they do so **without** using any data from ice dated ~1965 to 2007. The calculation is done entirely from ice core layers dated **prior to** the 1960s.
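The attenuation can be estimated from the textbook damped-sinusoid solution for a periodic surface temperature diffusing into a half-space, T(z,t) = A·exp(−z/d)·cos(ωt − z/d), with skin depth d = √(2κ/ω). The sketch below assumes a generic thermal diffusivity for ice/firn (~1.2e-6 m²/s) and an annual amplitude taken loosely from the Van Ommen graphic, not site-specific WAIS values:

```python
import numpy as np

# Textbook damped-sinusoid solution for an annual cycle diffusing into ice:
# T(z,t) = A * exp(-z/d) * cos(w*t - z/d), with skin depth d = sqrt(2*kappa/omega).
kappa = 1.2e-6                        # m^2/s, assumed thermal diffusivity of ice/firn
omega = 2 * np.pi / (365.25 * 86400)  # rad/s, annual cycle
d = np.sqrt(2 * kappa / omega)        # skin depth in metres
amp_surface = 8.5                     # C, half of a ~17 C annual range (assumed)

print(f"skin depth d = {d:.2f} m")
for z in [0, 1, 5, 10, 15]:
    print(f"z = {z:2d} m: annual amplitude = {amp_surface * np.exp(-z / d):6.3f} C")
```

With these assumptions, the skin depth is about 3.5 m and the annual cycle is attenuated to roughly 1% of its surface amplitude by 15 m – consistent with the Van Ommen profile.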

**Conclusion**

I plan a separate post on the curvature issues, which are of mathematical interest (to me at least). I’m very dubious of these borehole inversions in general and am extra dubious of this borehole inversion in particular. From the perspective of PAGES2K (2017 version), it seems transparent that they plan to include even questionable borehole inversions in their composite in an effort to goose the inconveniently declining isotope data into a Hockey Stick.


- even though PAGES (2013) was held out as the product of superb quality control, more than 80% of the North American tree ring proxies of PAGES (2013) were rejected in 2017, replaced by an almost exactly equal number of tree ring series, the majority of which date back to the early 1990s and which would have been available not just to PAGES (2013), but to Mann et al 2008 and even Mann et al 1998;
- the one constant in these large networks is the stripbark bristlecone/foxtail chronologies criticized at Climate Audit since its inception.
**All 20(!)** stripbark chronologies isolated by Mann’s CENSORED directory re-appear not only in Mann et al (2008), but in PAGES (2013). In effect, the paleoclimate community, in apparent solidarity with Mann, ostentatiously flouted the 2006 NAS Panel recommendation to “avoid” stripbark chronologies in temperature reconstructions. In both PAGES (2013) and PAGES (2017), despite ferocious data mining, just as in Mann et al 1998, there is no Hockey Stick shape without the series in Mann’s CENSORED directory.

PAGES2K references: PAGES (2013) 2013 article and PAGES (2017) url; (Supplementary Information).

**Background: Stripbark Bristlecones and Mann’s CENSORED Directory**

In our 2005 articles, Ross and I pointed out that Mann’s hockey stick is merely an alter ego for Graybill’s stripbark bristlecone chronologies and that the contribution from **all** other proxies was nothing more than whitish noise. We noted that Graybill himself had attributed the marked increase in late 19th and 20th century bristlecone growth to CO2 fertilization, not temperature – a theory which was arguably a harbinger of the massive and widespread world greening, especially in dry areas, over the 30 years since Graybill et al (1985).

In a CA blogpost here, I further illustrated the unique contribution of bristlecones by segregating the additive contribution to the MBH98 reconstruction of bristlecones (red) and other proxy classes (e.g. ice cores, non-bristlecone North American tree rings, South American proxies, etc. in blue, green, yellow). This clearly showed that (1) the distinctive MBH98 Hockey Stick shape arose entirely from bristlecones and that (2) all other proxy classes contributed nothing more than whitish noise – with their combined contribution diminishing in accordance with the Central Limit Theorem of statistics.

Mann had, of course, done a principal components analysis of his North American tree ring network *without* stripbark bristlecones – an analysis not reported in his articles, but which could be established through reverse engineering of his now notorious CENSORED directory (see CA post here). These nondescript PCs further illustrate the non-HSness of the Mann et al 1998 North American tree ring network *without* stripbark bristlecones.

Figure 2. Plot of five principal components in the MBH98 CENSORED directory, i.e. without the Graybill stripbark chronologies (mostly bristlecones, plus a couple of limber pines).

The 2006 NAS panel stated that stripbark chronologies (i.e. the Graybill bristlecone chronologies) should be “avoided” in temperature reconstructions. Although Mann et al 2008 stated that it was compliant with NAS recommendations, Mann flouted this most essential recommendation by including all 20 stripbark series isolated from the CENSORED analysis.

Because of persistent criticism over the impact of these flawed proxies, Mann et al (2008) made the grandiose assertion that he could get a hockey stick without tree rings (and thus, a fortiori, without stripbark bristlecones) – a claim credulously promoted by Gavin Schmidt at Real Climate. However, it was almost immediately pointed out at Climate Audit (here) that Mann’s non-bristlecone hockey stick critically depended on a Finnish lake sediment “proxy”, the modern portion of which (its blade) had been contaminated by modern agriculture and road construction and which had been used upside-down relative to its interpretation as a temperature proxy in pre-modern times. Mann was aware of the contamination of lake sediments, but argued that his use of contaminated (and upside down) data was legitimate because he could get a HS without them – in a calculation which used stripbark bristlecones. When challenged to show results without either stripbark bristlecones or upside-down mud, Mann (and Gavin Schmidt) stuck their fingers in their ears, with the larger climate community obtusely refusing to understand a criticism that was obvious to any analyst not subservient to the cause.

In the weeks prior to Climategate, I used increasingly harsher terms for the addiction of the paleoclimate community to the data-snooped stripbark chronologies, describing them as “heroin for paleoclimatologists”, with Briffa’s spurious Yamal chronology as “cocaine” (e.g. here, here), occasioning much pearl-clutching within the hockey stick “community”.

**PAGES 2013**

To the accompaniment of claims of quality control, PAGES (2013) dramatically culled the population of the Mann et al 2008 North American tree ring network.

The predecessor network used 790 North American tree ring chronologies: 696 individually identified series plus 94 Schweingruber density (MXD) series that contributed to 37 gridded MXD series. (The fudging of these 37 gridded series is an interesting and under-appreciated chapter in *hide the decline*, in which Mann chopped off post-1960 declining values and replaced them with instrumental data – see here.)

The new PAGES (2013) network was reduced to 146 series, i.e. 81% (644 series) of the Mann et al (2008) network was discarded as presumably not meeting PAGES (2013) quality control criteria. Approximately 45% (66) of the retained series were reported in PAGES (2013) as having a positive relationship to temperature according to their criterion, with 55% (80) having a negative relationship.

Despite the 81% cull, **every** one of the 20 Graybill stripbark chronologies of the MBH98 CENSORED directory (each of which had subsequently been used in Mann et al 2008) was used once again in the PAGES 2013 North American network. In this new network, just as in Mann et al 1998, the non-stripbark series – even when opportunistically oriented after the fact according to PAGES (2013) procedure – do not have a Hockey Stick shape. The next diagram compares network averages of scaled chronologies (left – stripbark; right – all other chronologies after orientation), also showing network counts in the lower panels. The scale in the top panel is identical for both series, but there are far more series in the right diagram.

Figure 3. Top left: average of the 20 stripbark bristlecone chronologies common to Mann et al 1998 and PAGES (2013), standardized to standard deviation units; top right – same for the other 126 tree ring chronologies in the PAGES 2013 tree ring network. Bottom: left – count of number of sites included in the stripbark network (maximum of 20); right – same for other 126 chronologies. Note that scale in bottom panel differs between two sides. PAGES(2013) truncated series to 1200-1987 (with many further truncated to 1500-1980). For this diagram, original chronologies from NOAA archive were used.

The simple average of the PAGES 2013 stripbark chronologies has a shape very similar to the distinctive MBH98 Hockey Stick shape (the MBH98 shape is somewhat more pronounced due to extra weighting of more extreme blades in its PC calculation.) The combination is precisely identical to the pattern which I had observed in the MBH98 networks years ago: the Graybill stripbark chronologies contribute the Hockey Stick; the vast majority of other series are nothing more than whitish/reddish noise and have no overall climate signal whatever.

PAGES (2013) determined orientation of each series ex post through temperature correlation in the 20th century – a practice that I’ve criticized from my beginning in this field. My position has been that, if, for example, high-altitude or high-latitude black spruce are believed to be temperature proxies, then you have to use all sites in a consistent ex ante orientation, rather than opportunistically flipping series ex post simply because they go down. While the network is subject to this criticism, there is so much noise in the network shown in the right panel that there is no HS even with opportunistic ex post orientation.

Notice that the amplitude of fluctuations of the much larger network on the right (126 versus 20 series) is considerably less than that of the smaller network on the left: this is a trivial result of the Central Limit Theorem of statistics – the standard deviation of an average of n independent noise series decreases as 1/√n as the dataset gets larger.
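A two-line simulation illustrates the point; the series counts (20 and 126) match the two networks, but the noise is generic white noise, not tree ring data:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 500  # "years" in each pseudo-proxy

# The sd of the average of n independent noise series shrinks as 1/sqrt(n),
# so a 126-series composite is inherently flatter than a 20-series composite.
sds = {}
for n in [20, 126]:
    composite = rng.standard_normal((n, T)).mean(axis=0)
    sds[n] = composite.std()
    print(f"n={n:3d}: composite sd = {sds[n]:.3f}   (1/sqrt(n) = {1/np.sqrt(n):.3f})")
```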

The apparent spike in 2002 non-stripbark ring widths (right) has a neat explanation. For some reason, ring widths in 2002 were exceptionally **low**. Examined in detail (and I looked at the underlying rwl measurement data), many trees at these sites (fewer in number in 2002 than a decade earlier) had *negligible*, even *zero*, growth in 2002. Because so many such series had been assigned negative orientation in PAGES 2013, these very low ring width values resulted in very “high” values in the composite.

There are other peculiarities in the PAGES 2013 network. Regardless of the length of the chronologies available to them, the series were truncated into two separate subsets: a short subset truncated to 1500-1980 and a “long” subset truncated to 1200-1987. One dataset was archived as original chronologies; the other was standardized to SD units. Some series were included in both datasets; other series, which on their face qualified for both datasets, were not, for no obvious reason.

The only representation of the North American tree ring reconstruction in the PAGES (2013) article was the color bar (middle panel below) – a style which, either despite or because of its lower information content, has become popular among climate activists. It turns out to be a representation of a 30-year averaged series (bottom panel) which was archived in the Supplementary Information. The 30-year version appears to have been derived from the 10-year average version associated with it in the Supplementary Information.

Figure 4. PAGES (2013) North American reconstruction from tree ring network: middle panel – excerpt from figure in original article; bottom panel – plot of data from SI showing 30-year version of PAGES2K North American tree ring network; top panel – plot of data from SI which, for other regions shows annual data, but for North American tree rings, shows 10-year data.

**PAGES (2017)**

In PAGES (2013), as noted above, the NOAMER tree ring network contained both positive- and negative-oriented chronologies, the sign being assigned ex post based on the correlation of the chronology with temperature. PAGES (2017), in the supposed cause of “more stringent criteria”, introduced the restriction that the tree ring proxies (in all networks) be restricted to proxies which had a significant *positive* correlation to temperature:

more stringent criteria resulted in the exclusion of some records. .. In most regions, some records were excluded because they did not meet the stricter standards for the minimum length or temporal resolution (criteria detailed above), or because of ambiguities related to the temperature sensitivity of the proxy, or because they have been superseded by higher-quality records from the same site… To be included in the current database,

tree-ring data were required to correlate positively (P<0.05) with local or regional temperature (averaged over the entire year or over the growing season). Trees whose growth increases with temperature (e.g., direct effect of temperature on physiological processes and photosynthetic rates) are more likely to produce a reliable expression of past temperature variability compared to trees that respond inversely to temperature, for which the proximal control on growth is moisture stress (e.g., evapotranspiration demand)

They reported that the new positive orientation criterion resulted in the exclusion of **124** tree ring series from the PAGES (2013) network:

Of the 641 records that together comprise the previously published PAGES2k datasets, 177 are now excluded, of which 124 are tree-ring-width series that are inversely related to temperature.

Relative to a supposed worldwide total of 124 series excluded through negative correlation, no fewer than **123(!)** such series can be identified in the North American tree ring network alone. Previously, I’d noticed 3 such exclusions in the South American network. In a quick check, there were zero in the Asia network. PAGES (2017) did not explain (or even observe) the unique impact of this criterion on the North American network, but it’s an interesting question. Only 23 North American tree ring series were carried forward from PAGES (2013) to PAGES (2017). The devastation of the PAGES 2013 network can be seen in the itemization excerpted from the Supplementary Information shown below:

As usual, there is an additional irony and puzzle when the screening is examined in detail: of the 123 NOAMER tree ring series excluded due to their “negative” relation to temperature,** 29(!)** had been assigned a **positive** sign in PAGES (2013). This apparent inconsistency was not explained (or even reported) by the PAGES (2017) authors.

There are 126 “new” tree ring series in the PAGES 2017 North American tree ring network, but the majority of these series date back to the mid-1990s and even the early 1980s, as shown in the chart at left. Many of the numerous series from the early 1980s and 1990s are from the Schweingruber collection from which the Briffa reconstruction (with its notorious *decline*) was calculated. These series had presumably been previously considered in Mann et al 1998, Mann et al 2008 and PAGES (2013), but, for some reason, qualified in PAGES 2017 for the first time.

PAGES (2017) retained (only) 23 series from PAGES (2013). The number retained from Mann et al 2008 via PAGES2K was only 10, the majority of which were classic stripbark bristlecone chronologies, including Graybill chronologies from Timber Gap Upper, Flower Lake, Cirque Peak, Pearl Peak, Mount Washington, San Francisco Peaks and, of course, Sheep Mountain. The PAGES (2017) network added two “classic” stripbark chronologies, which had not been used in PAGES (2013), but which had been a staple of many multiproxy studies: Graumlich’s Boreal Plateau and Upper Wright Lakes stripbark foxtail chronologies from the early 1990s, previously used in Esper et al 2002, Briffa and Osborn 2006, Hegerl et al 2007 and others. (Discussed on numerous occasions at CA, including here, here).

It also added a composite (Salzer et al 2013) which updated three Graybill sites (Pearl Peak, Mount Washington, Sheep Mountain), each of which is thus included in both versions. The PAGES2017 version of the Salzer composite continues to 2009 – three years later than the series in the original publication or in the archived ring width data. The provenance of this extra data was not reported. The extension is shown at right (green for 1980-1990; red for 1991-2009). The stripbark bristlecone data reached its peak in the late 1970s, exactly when Mann terminated his bristlecone-based reconstruction. Since then, bristlecone widths at these three sites have gone down despite increasing temperatures over the past 40 years, though they remain at historically elevated levels. In our 2005 criticism of Mann et al 1998, we had speculated that bristlecone ring widths would not continue to increase with higher temperatures and, indeed, they have not done so.

The stripbark chronologies, though reduced somewhat in number from the PAGES 2013 network, continue to play a unique role in the North American tree ring network. The diagram below compares the stripbark series in PAGES (2017) to non-stripbark chronologies in the same style as Figure 3 above. Despite industrial-scale ex post screening, the non-bristlecone network (140 series – right panel) shows only a very slight increase at the start of the 20th century, **no** increase in the second half of the 20th century, and a possible reversion towards the mean in the sparser recent data. This pattern seems just as likely, or more likely, to be nothing more than what can be expected from ex post screening of reddish noise, and obviously does not capture the expected temperature “signal”. Nor do the bristlecones perform much better.

Figure 5. In same style as Figure 3, but for PAGES (2017). The stripbark network consists of the seven series from the CENSORED directory carried forward into PAGES 2017 plus two stripbark foxtail chronologies (Graumlich) re-introduced in PAGES (2017).

In passing, I noticed some frustrating technical misinformation that I might as well document. Although we’ve already seen that the PAGES 2017 technical spreadsheets explain exclusion of North American tree ring networks as due to “negative” correlation to temperature, elsewhere PAGES (2017) stated that **many** exclusions were due to other technical reasons: use of a *reconstruction* rather than a *chronology* (measurement data) in the earlier PAGES 2013 network, including reconstructions that made use of *principal components* – a topic not unfamiliar to readers of Climate Audit:

Unlike [3 – PAGES 2013], in the present version, tree-ring records include only ring-width or density measurements rather than the reconstructions derived from them. Therefore, many of the North American dendroclimatological records used in [3] are no longer employed. Also, in the North American component of [3], unlike the current version, tree-ring data were screened and incorporated into the North American temperature reconstructions as the leading principal components of the tree-ring chronologies utilized. The rationale, methodological detail, and associated reconstruction performance metrics for that usage are described in the supplemental information in [3] (cf. section 4a).

While this explanation seems superficially plausible, **none(!)** of the 146 records in the PAGES (2013) North American tree ring network were (temperature) *reconstructions*, let alone reconstructions calculated with the use of principal components. **Every** series in the PAGES (2013) North American network was either an ITRDB *chronology* truncated to 1500-1980 or an ITRDB chronology standardized to SD units (after truncation to 1200-1987).

**North American Tree Ring Chronologies in PAGES Arctic Network**

The PAGES2K (2013) Arctic network contained four North American tree ring series, while the PAGES (2017) Arctic tree ring network contained three North American tree ring series.

Three of the PAGES (2013) records were *regional chronologies* from D’Arrigo 2006: Central NWT, Seward and Yukon. The fourth was Wilson’s Gulf of Alaska/Coastal Alaska temperature reconstruction – the **only** temperature reconstruction in PAGES 2013 from North American tree rings, which, for good measure, was also used in a duplicate copy in the North American network.

As long-time Climate Audit readers are aware, Jacoby and D’Arrigo withheld supplementary information for almost 10 years. The eventual archive, published shortly before Jacoby’s death, remains incomplete and frustrating. In 2016, I wrote a very detailed examination (Cherry Picking By D’Arrigo) of the Central NWT regional chronology of D’Arrigo 2006 (and now PAGES 2013), as it represented many of the worst practices of the paleoclimate community. It annoys me to re-read the article. The Central NWT chronology built on the earlier Jacoby chronologies at Coppermine River and Hornby Cabin, which were used in Mann et al 1998.

The PAGES (2017) Arctic network contained three North American tree ring chronologies. It replaced Wilson’s Gulf of Alaska temperature reconstruction with a Gulf of Alaska temperature reconstruction (Wiles et al 2014) with more elevated closing values. I discussed these two datasets quite critically in a 2016 post entitled Picking Cherries in the Gulf of Alaska; those comments carry forward to the similar replacement in PAGES 2017. Ironically, although PAGES 2017 purported to replace reconstructions with original chronologies, the only PAGES 2013 North American tree ring series which actually was a *reconstruction* (Wilson’s Gulf of Alaska) was replaced with a series which was also a reconstruction.

As the present post is already long, I’ll visit this topic on another occasion.

**Conclusions**

- ex post screening based on recent proxy trends necessarily biases the resulting data towards a Hockey Stick shape – a criticism made over and over here and at other “skeptic” blogs, but not understood by Michael (“I am not a statistician”) Mann and the IPCC paleoclimate “community”;
- the PAGES 2017 North American tree ring network has been severely screened ex post from a much larger candidate population: over the years, approximately 983 different North American tree ring chronologies have been used in MBH98, Mann et al 2008, PAGES 2013 or PAGES 2017, i.e. only ~15% of the underlying population was selected ex post – a procedure which, even with random data, would impart Hockey Stick-ness to any resulting composite;
- despite this severe ex post screening (in both PAGES 2013 and PAGES 2017), the composite of all data other than stripbark bristlecones had no noticeable Hockey Stick-ness and does not resemble a *temperature* proxy;
- PAGES 2013 and PAGES 2017 perpetuate the use of Graybill stripbark chronologies – despite the recommendation of the 2006 NAS Panel that these problematic series be “avoided” in future reconstructions. PAGES 2013 (like Mann et al 2008) used all **20(!)** stripbark chronologies, the effect of which had been analysed in Mann’s CENSORED directory. PAGES 2017 continued the use of the most HS stripbark chronologies (Sheep Mt etc.) in both the original Graybill version and a more recent composite (Salzer et al 2014), while adding two stripbark chronologies used in Esper et al 2002 and other IPCC multiproxy studies.
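The screening bias described in the first two bullets is easy to demonstrate with a small simulation: screening pure red noise ex post for correlation with a rising “instrumental” target, and retaining only the best ~15%, yields a composite with a Hockey Stick uptick despite containing no temperature signal at all. All parameters below (AR(1) coefficient, series length, calibration window) are illustrative assumptions for the sketch, not values taken from PAGES itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, n_years, keep_frac = 983, 300, 0.15   # ~983 candidates, ~15% retained
phi = 0.7                                        # AR(1) coefficient ("reddish" noise)

# simulate AR(1) red-noise "proxies" containing no climate signal whatsoever
innovations = rng.standard_normal((n_series, n_years))
proxies = np.zeros_like(innovations)
for t in range(1, n_years):
    proxies[:, t] = phi * proxies[:, t - 1] + innovations[:, t]

# ex post screening: retain the series best correlated with a rising
# stand-in "instrumental" target over the final 50 years
target = np.arange(50, dtype=float)
r = np.array([np.corrcoef(p[-50:], target)[0, 1] for p in proxies])
keep = np.argsort(r)[-int(keep_frac * n_series):]

composite = proxies[keep].mean(axis=0)
# despite being pure noise, the screened composite ends with an uptick (blade)
blade = composite[-10:].mean() - composite[:-50].mean()
```

Because correlation screening selects for an upward slope in the calibration window regardless of what the series did earlier, the mean of the survivors is flat noise followed by a blade – precisely the shape at issue.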

In the past, I described Mannian paleoclimatologists as *addicted* to Graybill stripbark bristlecone chronologies – which I labeled as “heroin for paleoclimatologists” (also describing Briffa’s former Yamal chronology as “cocaine for paleoclimatologists”). Unfortunately, rather than confronting their addiction, Gavin Schmidt and others responded with haughty pearl-clutching indignation, while, behind the scenes, the PAGES consortium doubled down by perpetuating use of these problematic proxies into PAGES 2013 and PAGES 2017.

On this day in 2009, a few weeks before Climategate, I suggested appropriate theme music by Eric Clapton and Velvet Underground. Still apt nine years later.

In keeping with my preference to look at regions and proxy types before worrying too much about aggregates, I looked at their South American network, which is an update of the South American network of PAGES2K (2013), which I discussed a few days after publication here. There were major changes between the 2013 and 2017 networks, which were not elucidated in the later study, but which will be discussed in today’s article. The changes illustrate the profound problems with the tree ring chronologies and lake sediment series which make up the vast majority of data in PAGES 2017 and similar studies.

**The PAGES2K (2013) South American Network**

The PAGES2K (2013) South American network consisted of 23 proxies:

- two ice core proxies from a single site (Quelccaya, Peru);
- one lake sediment proxy (a reflectance indicator from Aculeo, Chile);
- one ocean sediment proxy (Mg/Ca from the Cariaco Basin, offshore Venezuela);
- four **instrumental** series as supposed *proxies* for instrumental temperature;
- 15 tree ring series. These 15 series were screened from the larger tree ring network of *Neukom and Gergis (2012)*, which had **63** series, which, in turn, had been selected from a larger roster of unknown size using unknown procedures.

I discussed this network a few days after publication, pointing out some serious problems which had been overlooked in the hasty review of PAGES2K (2013) by Nature after it had been rejected by Science. The hasty review was required because IPCC AR5 had cited PAGES2K while it was still in review, not anticipating that it would be rejected. Ironically, Michael Mann was one of the reviewers who recommended rejection.

- I observed that the PAGES2K use of the very standard Quelccaya d18O series (used in most multiproxy series since Jones et al 1998) was **upside-down** relative to its use by all other authors – an error that ought to have been picked up and corrected before publication;
- I criticized the use of the four instrumental records as supposed *proxies* for temperature, observing that this “seems to be peeking at the answer if the ‘skill’ of the early portion of the reconstruction is in any way assessed on the ability of the network (including instrumental) to estimate instrumental temperature”. This seems so obvious that it is hard to imagine any serious climate scientist using instrumental temperature data in a **proxy** network, except that the practice has been encountered much too often, including in Mann et al 1998;
- I observed that “one-third of the tree ring series are inverted” and asked whether this was “an ex ante relationship or mere ex post correlation?”. Perhaps the longest-standing dispute between Climate Audit and authors relied upon by IPCC is over ex post screening and ex post orientation – both practices condemned at Climate Audit since its earliest days.

I’ve also long spoken against the use of *singleton* proxies in multiproxy studies intended for policy reliance, on the grounds that replicability across multiple sites ought to be insisted on before inclusion in a multiproxy study. The Laguna Aculeo indicator – relative absorption band depth (RABD) centred in 660-670 nm, said to measure “total sedimentary chlorin” – was then essentially unique (a rare example from a marine sediment is discussed here). Values of the index were not even reported in its data archive – only the temperature reconstruction.

**The PAGES 2017 Network**

Eighteen of the 23 series in the 2013 network were rejected in 2017; only *five* were retained. Of these five, one series (Quelccaya d18O) was used in the opposite orientation to the 2013 network. Needless to say, the PAGES2K 2017 authors did not disclose that they reversed the orientation of the series from the earlier study. This was the second PAGES2K 2013 series where the authors recognized that their original use was upside down: I had also criticized their upside-down use of the Hvitarvatn, Iceland series, which they grudgingly corrected in a later publication and even more grudgingly (after some sneering on my part) and much later issued a corrigendum.

The disposition of the 2013 network is shown in the table below.

- the second Quelccaya series (accumulation) – which had also been used in Mann et al 1998 – was rejected as being a “hydroclimate proxy”. They did not explain how it had passed the supposedly rigorous protocols of PAGES 2013.
- the Cariaco ocean sediment series was exported to their Ocean proxy network. (Fair enough).
- the four instrumental series were rejected as proxies with the laconic explanation that they were “instrumental data” (thus complying with one of my 2013 criticisms);
- they rejected the five tree ring series which had been assigned (ex post) negative orientations. From a statistical perspective, ex post screening of series (which met ex ante criteria) on grounds of negative correlation is just as pernicious as ex post orientation. This is no real improvement;
- they rejected one tree ring series due to its failure to meet an internal consistency statistic (EPS). It is unclear why this wasn’t picked up in 2013;
- they rejected six tree ring series as being too short (less than 300 years). I agree with this policy: if one’s objective is to compare modern temperatures to (say) medieval temperatures, introduction of such short proxies results in inhomogeneity which ought to be avoided. (This sensible 300-year policy was unfortunately ignored in PAGES2017 Ocean network.)


**New Proxies**

There were three “new” proxies: one tree ring series and two lake sediment series. In addition, two tree ring series were updated.

The “new” tree ring series (CAN Composite 15) had, like the other series, been in the Neukom and Gergis 2012 network. For some reason, it had been screened out of the PAGES 2013 network, but was now determined to meet the PAGES2K criteria after all. Of the original **63(!)** tree ring chronologies in Neukom and Gergis 2012, only **four(!)** made their way into the PAGES2017 network. I do not believe for a minute that these four tree ring chronologies are unique thermometers. A more likely interpretation is that their satisfaction of proxy criteria was fortuitous and that they are no more trustworthy as thermometers than the excluded chronologies. Nor did any of these four chronologies reach back to the medieval period: their start dates ranged from 1435 to 1636, long after the medieval period.

Interestingly, the fresh data in the two updated tree ring series further illustrates the ineffectiveness of these South American tree ring chronologies as temperature proxies, as shown in the plots of Central Andes 6 (CAN 6) and Central Andes 9 (CAN 9) below.

CAN9, which is barely over 300 years long, has high values in the mid-20th century, but declines in the last half of the 20th century despite increasing temperatures. Its late 20th century decline continues into the 21st century, where values have reverted to the long-term mean. Similarly, CAN6 shows little long-term change apart from a late 20th century spike, and has since regressed to low values.

A more plausible interpretation of the data is that these four series were selected **ex post** because their 20th century values were somewhat higher than values in earlier centuries, but are not magic thermometers.

Only one of the two new lake sediment series purports to show elevated and increasing 20th century levels: the Laguna Chepical (de Jong et al 2013) series. But closer examination of the data shows that the modern portion of this lake sediment series, like the notoriously contaminated Korttajarvi series of Mann et al 2008 and the equally contaminated (but less notorious) Igaliku series of PAGES2K (2013), is also compromised by man-made construction. The original authors (de Jong et al) argued that man-made construction did not compromise lake sediment reflectance as a climate proxy, but, when held up to sunshine, their argument is flimsy.

Laguna Chepical is located in central Chile (32S) at high altitude (3055 m), approximately 130 km north of Santiago. The authors measured reflectance at relatively high resolution, from which they selected the ratio of reflectance at 570 nm to reflectance at 630 nm (R570/R630), interpreted as indicative of the clay mineral content in the lake sediments. They observed a strong decrease in this ratio during the 20th century (for which instrumental temperature data was available). Summer temperatures increased during this period. A simple correlation calculation was said to show that R570/R630 was “strongly and significantly negatively correlated with summer temperatures.” The authors proposed the following explanation:

We reason that cool summers, associated with late lake ice break-up and hence relatively long periods of ice cover, favor the settling of very fine particles in the lake, which leads to increased clay contents in the sediments.

But there’s a catch: around 1885, just prior to the calibration period, there was a **ten-fold (!)** increase in sediment accumulation rate. This can be seen in a comparison of the two plots shown below: top – R570/R630 versus depth; bottom – “temperature”, a linear transformation of R570/R630, versus year. The two red arrows show two pairs of matching points. The layer at ~20 cm of core (right arrow) is dated to ~1885 AD, while the layer at ~41 cm (left arrow) is dated to ~440 AD. In other words, the top ~20 cm of core accumulated in ~115 years, while it had taken ~1445 years to accumulate the prior ~20 cm of core. The rate of modern accumulation **is more than ten(!) times greater** than the rate of accumulation in the previous 15 centuries.
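The arithmetic behind the ten-fold claim is easy to check. The depths and dates below are approximate readings from the plots, as described above, so the result is an estimate:

```python
# ~20 cm of core accumulated in the ~115 years after ~1885,
# versus ~20 cm over the preceding ~1445 years (AD ~440 to ~1885)
modern_rate = 20 / 115     # cm per year, post-dam
prior_rate = 20 / 1445     # cm per year, previous ~15 centuries
ratio = modern_rate / prior_rate
# ratio works out to roughly 12.6 – "more than ten times greater"
```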

It is more or less certain that an order-of-magnitude increase in sediment accumulation rate in the modern period is due to some sort of man-made land disturbance, rather than climate. For example, modern period increases in sediment accumulation at Korttajarvi and Igaliku were due to local land disturbance (construction and agriculture), not climate. Failing to recognize this led to embarrassing mistakes in Mann et al 2008 and PAGES2K (2013) respectively.

When one re-examines the original publication (de Jong et al 2013), one finds that they reported a man-made intervention at the precise time when sedimentation rates increased so dramatically:

A small creek with episodic flow enters the lake in the northwestern side and has formed a small, shallow delta. Additional sediment inflow likely occurs during snow melting from the surrounding slopes to the N, E and W. … An outflow is located in the SW.

**Since ca. AD 1885, this outflow was dammed and regulated** (A. Espinoza, personal communication, 2006). [my bold]

The order-of-magnitude increase in sediment accumulation in the core clearly results from the dam, rather than an increase in temperature (the sediment accumulation increase is a local phenomenon). Within this enormous increase in sedimentation rate, there is a noticeable increase in clay mineral content (measured by the fall in R570/R630 reflectance values from ~0.90 to ~0.82) to levels which were essentially unprecedented in the previous three millennia. It seems logical that the increase in clay mineral content is a by-product of this dramatic increase in sedimentation, as opposed to the speculative connection to cool summers and late ice break-up proposed by the authors.

The authors purported to dismiss any connection between the construction of the earth dam in 1885 and the subsequent increase in clay mineral content in lake sediments as follows:

An additional, potentially important environmental variable was the construction of the earth dam in AD 1885. However, as indicated by cluster analyses, **the construction of a low (ca. 2 m) earth dam and the subsequent relatively small increase in maximum lake depth did not significantly affect most of the sediment properties measured with VIS-RS scanning** and had no influence on the R570/R630 values. Therefore, the reconstruction of summer temperatures based on calibration-in-time, which was developed for the period after dam building, is also valid back in time. [my bold]

Unfortunately, the authors failed to provide any statistics or other supporting data for this assertion. I don’t know how “cluster analyses” could possibly show that the construction of the dam in 1885 had “no influence on the R570/R630 values”, which, after all, fell to unprecedented levels following dam construction and subsequent ten-fold increase in sediment accumulation rate. I don’t believe that it is possible to draw such a conclusion from “cluster analyses”. Also, speaking strongly against the assumption of non-impact of dam construction is the following statement in Meyer et al 2017:

The main prerequisite for its [VIS‐RS scanning] successful application is that temporal variation in lake hydrology over the period of interest has not appreciably affected sedimentation dynamics at the core site, since major changes in sediment texture and organic content are likely to create confounding effects in the VIS‐RS signature.

That condition was obviously not met at Laguna Chepical.

The other new South American proxy in PAGES2017 is from Laguna Escondida in northern Patagonia (45S), from Elbert et al 2013. It is a temperature estimate from *biogenic silica flux (mg/cm^2/yr)*. Biogenic silica percentage and/or flux is measured quite commonly in paleoclimate lake sediment studies, but is not commonly used as a temperature proxy; it measures the productivity of diatoms. BSi was used in one other PAGES2017 proxy, Hallett Lake, Alaska (as percentage, rather than flux). The Hallett Lake series had been previously discussed at Climate Audit, where I noted that its very elevated early values had been chopped off for no apparent reason other than that they were elevated. Another location with BSi measurements is Hvitarvatn, Iceland, a site discussed on several occasions at Climate Audit; its varve thickness measurements were used in PAGES2K. The Laguna Escondida series had high medieval values, with a decreasing trend to the modern period, and lacks the strong HS-blade of Laguna Chepical discussed above.

**Summary**

The eight PAGES2017 series are summarized in a consistent panel plot below for the period from AD 1000 on.

The tree ring component of this network is, more or less, a reductio ad absurdum of tree ring chronologies as useful temperature proxies: only four of the 63 original tree ring chronologies have sufficient Hockey Stick-ness to be retained in the network, with even these poor remnants reverting to the mean in the 21st century updates. There is negligible similarity between the three lake sediment series, each of which uses a different indicator, though similar measurements appear to have been taken at all three sites. The only series with a meaningful HS (Chepical) appears to result from construction of a dam in AD 1885, rather than from increased temperature. This leaves the Quelccaya ice core series – a staple of temperature reconstructions as early as 1998 which, ironically, was used upside down in PAGES2K (2013) and corrected in PAGES 2017 without disclosure or admission of the earlier error.

All in all, a rather pathetic show by PAGES2K.

The first article linking the DNC hack to APT28 spearphishing was published by SecureWorks (here) in June 2016. SecureWorks had been tracking APT28 spearphishing for some time through their bitly links. They provided two examples linking respectively to the malicious domains accounts-google[.]com and googlesetting[.]com. I’ve looked at both, but will discuss only the latter in this note. These domains were previously discussed at CA here.

The SecureWorks article showed the following syntax for the hyperlink to googlesetting[.]com.

The string ZGlm… expresses the target email (difeitalia.canberra[@]gmail.com) in base64 (see https://www.base64decode.org for conversions). According to public IP records, on April 29, 2015 (the relevant date), googlesetting[.]com resolved to 37.221.165.244, an IP address in Romania. The domain is associated with APT28 by, inter alia, its registrant: Andre Roy, email address ///, a registrant discussed at CA here. In early 2015, the domain also sometimes resolved to a US IP address (173.194.121.36).
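The base64 claim is easy to verify in Python (dropping the defanging brackets around the @ for the encoding): encoding the target address does produce a string beginning “ZGlm”, and decoding recovers the address, as a service like base64decode.org would show.

```python
import base64

target = "difeitalia.canberra@gmail.com"
encoded = base64.b64encode(target.encode()).decode()
# the encoded string begins with "ZGlm", matching the SecureWorks example
decoded = base64.b64decode(encoded).decode()
# decoding recovers the original target address
```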

The googlesetting[.]com domain had quite a few contemporary attestations, in particular, inquiries to PhishTank by Ukrainian activists associated with Informnapalm. The earliest attestation that I’ve located occurred on 2014-07-23 in a phishing email to anna.prokaeva[@]gmail.com: see below. At the time, the domain similarly resolved to IP address 37.221.165.244 in Romania:

On 2018-05-26, a spearphishing email with IDENTICAL syntax to the 2014 spearphishing email was reported by VirusTotal: see below. The target (omaralshater[@]gmail.com) is, of course, different.

In late May 2018, the domain resolved to IP address 199.59.242.150, hosted by Bodis LLC in New York City: see here; here.

What does this mean? Dunno. But it sure seems odd to see the re-appearance in 2018 of a domain characteristic of the APT28 spearphishing campaign, this time in New York City.

**Update (July 20):** A commenter observed that Bodis LLC parks hundreds of thousands of unused domains, so the appearance of this domain in May 2018 doesn’t, in itself, mean anything. Thinking further on other possibilities, it seems possible that someone, in the course of re-investigating spearphishing events, might have done a search at VirusTotal or another anti-virus service on a string from a 2015 phishing attempt. If such a search was done in May 2018, VirusTotal would only know the date of the inquiry, not the date of the phishing attempt. At the end of the day, there doesn’t seem to be anything here. I don’t wish to contribute to any additional inaccuracy on this murky topic and will consider deleting this post.
