Jones et al. [1998] adopt the reasonable policy that proxies should be validated against gridcell temperatures as evidence that they are temperature proxies, noting that this is not always done. This policy is endorsed in Jones and Mann [2004], who note that not all multiproxy studies had observed it, presumably including MBH98, which included precipitation measurements in its dataset.
As an exercise, I attempted to replicate their results, as reported in their Table 4, against the present HadCRU2 gridcell temperatures. The two replication tables (one for the annual series and one for the decadal averages) illustrate a few points. First, many of their reported results cannot be replicated, including the correlations for some well-known proxies. Second, I’ve calculated a t-statistic, from which significance is easier to assess than from a correlation coefficient. For independent residuals, a t-statistic of about 2 is significant. For autocorrelated series and for cherrypicked series, the required value is higher, but I’ve not done those calculations here, contenting myself with illustrating OLS significance. I’ve marked in bold red the cases where the t-statistic is insignificant or where the replicated correlation is much lower than the reported one. I’ve also calculated the Durbin-Watson (DW) statistic, which is one measure of autocorrelation in the residuals and is used to test for spurious regression. Granger and Newbold [1974], discussed here, took the position that any model with DW < 1.5 is misspecified. These cases are also marked in bold red. Not much survives.
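For readers who want to check DW values themselves, the statistic is simple to compute from regression residuals: the sum of squared differences between successive residuals divided by the residual sum of squares. A minimal Python sketch (the function name and the toy residuals are illustrative, not taken from the script mentioned in this post):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals.

    DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2
    Near 2: little first-order autocorrelation.
    Well below 2: positive autocorrelation. Above 2: negative autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


DW_CUTOFF = 1.5  # cutoff attributed to Granger and Newbold [1974] in this post

# Slowly drifting residuals: strong positive autocorrelation.
drifting = [-3, -2, -1, 1, 2, 3]
print(durbin_watson(drifting))              # 8/28, about 0.29
print(durbin_watson(drifting) < DW_CUTOFF)  # True: flagged

# Alternating residuals: strong negative autocorrelation.
print(durbin_watson([1, -1, 1, -1, 1, -1]))  # 20/6, about 3.33
```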
This is not exactly the same dataset, so to that extent the verification exercise is not a direct audit of their calculation against the temperature dataset they used (which is not necessarily available anyway). (I’ve experimented with the vintage temperature dataset archived with the MBH98 Corrigendum and can’t replicate results with that dataset, but that’s a different topic.) Script here.
Jones et al [1998] show correlations for the 1881-1980 period for both annual and decadally averaged data (their Table 4). Here I’ve extended the comparison to 1856-1994 for annual data and 1860-1989 for decadally averaged data, using all available data. It is standard that the absolute value of the correlation coefficient equals the square root of the R² statistic of the corresponding simple linear regression. (I’ve attached a script for all calculations and benchmarked it for the specific functions that I used.) The advantage of the linear regression approach is that it yields standard errors and a t-statistic, from which statistical significance can be determined. This is particularly relevant for the decadal correlations, where there are only about 10 measurements. I’ve not considered autocorrelation in any detail, although it is always an issue. The statistics shown here are OLS statistics, together with the Durbin-Watson statistic for each fit.
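The equivalence between the correlation and the regression can be checked directly. The Python sketch below assumes both series are standardized before fitting (my assumption about the setup; it is consistent with the coef.OLS column tracking the reported correlations). Under that assumption the intercept is 0, the OLS slope equals Pearson's r, R² equals r², and the slope's standard error and t-statistic come out of the fit. The function names and the input numbers are made up for illustration.

```python
import math


def standardize(xs):
    """Rescale a series to mean 0 and (population) standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def ols_fit(proxy, temp):
    """OLS regression of standardized temperature on a standardized proxy.

    With both series standardized, the intercept is 0, the slope equals the
    Pearson correlation r, and R^2 equals r^2. Returns the slope, its
    standard error, the t-statistic and R^2.
    """
    n = len(proxy)
    if n != len(temp) or n < 3:
        raise ValueError("need two equal-length series with at least 3 values")
    x, y = standardize(proxy), standardize(temp)
    sxx = sum(a * a for a in x)
    slope = sum(a * b for a, b in zip(x, y)) / sxx
    residuals = [b - slope * a for a, b in zip(x, y)]
    rss = sum(e * e for e in residuals)
    r2 = 1 - rss / sum(b * b for b in y)
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance has n - 2 df
    return {"coef": slope, "se": se, "t": slope / se, "r2": r2}


# Illustrative made-up numbers, not data from Jones et al [1998].
proxy = [0.2, 0.5, 0.1, 0.9, 0.7, 1.1, 0.8, 1.4, 1.2, 1.6]
temp = [-0.3, -0.1, -0.4, 0.2, -0.2, 0.3, 0.4, 0.5, 0.2, 0.6]
fit = ols_fit(proxy, temp)
print(round(fit["coef"], 2), round(fit["t"], 2), round(math.sqrt(fit["r2"]), 2))
```

The first and last printed numbers agree (slope and square root of R²), which is the equivalence described above; the middle number is the t-statistic the correlation alone does not give you.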
Table 1 compares the reported results with my verifications. In 7 cases, the verification correlation was substantially lower than reported, including two Briffa series (Urals and Western US density), the Galapagos coral and two Villalba South American tree chronologies (Lenca, Alerce). Seven series had insignificant t-statistics: 5 of the 7 SH series and 2 NH series (Kameda Greenland melt % and Jacoby treeline). Eight series had Durbin-Watson statistics below 1.5, indicating autocorrelation in the residuals and hence, per Granger and Newbold [1974], an unusable model: Tornetrask, New Caledonia, Greenland O18, Svalbard melt %, Galapagos, Lenca, Kameda melt and Jacoby treeline.
The only proxies that replicated reasonably, had a significant t-statistic and passed the DW test were the C Europe and C England "proxies" and the Jasper and Tasmania tree ring reconstructions.
Table 1. Correlation Statistics for Jones et al [1998] Proxies
| Proxy | Reported | coef.OLS | se.OLS | t.OLS | Pr.OLS | DW |
|---|---|---|---|---|---|---|
| C Europe | 0.9 | 0.93 | 0.03 | 28.59 | 0 | 1.83 |
| C England | 0.84 | 0.75 | 0.06 | 12.96 | 0 | 1.94 |
| Tornetrask | 0.79 | 0.7 | 0.06 | 10.78 | 0 | 1.48 |
| Jasper | 0.48 | 0.47 | 0.08 | 6.14 | 0 | 1.55 |
| Urals | 0.83 | 0.46 | 0.08 | 5.62 | 0 | 1.56 |
| Tasmania.92 | 0.42 | 0.41 | 0.08 | 5.1 | 0 | 1.57 |
| New.Caledonia | 0.41 | 0.41 | 0.09 | 4.49 | 0 | 0.68 |
| Crete.O18 | 0.30 | 0.47 | 0.11 | 4.36 | 0 | 1.38 |
| Briffa.WUSA | 0.60 | 0.33 | 0.08 | 4.01 | 0 | 2.01 |
| Svalbard | 0.08 | 0.26 | 0.09 | 2.79 | 0.01 | 0.77 |
| GBR.5 | 0.18 | 0.18 | 0.1 | 1.81 | 0.07 | 1.77 |
| Galapagos | 0.39 | 0.15 | 0.1 | 1.55 | 0.12 | 1.28 |
| Lenca | 0.36 | 0.11 | 0.1 | 1.02 | 0.31 | 1.49 |
| Kameda.melt | 0.17 | 0.09 | 0.09 | 0.98 | 0.33 | 0.39 |
| Alerce | 0.35 | 0.07 | 0.11 | 0.64 | 0.53 | 1.56 |
| Jacoby treeline | 0.34 | 0.04 | 0.08 | 0.52 | 0.61 | 0.70 |
| Law.Dome | 0.26 | 0.08 | 0.19 | 0.43 | 0.67 | 1.97 |
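The screening described for Table 1 can be reproduced mechanically from the table’s own numbers. In the sketch below the t and DW cutoffs are the ones stated in the post, but the post gives no numeric threshold for a "big decline" from the reported correlation, so `MAX_DROP = 0.2` is my own hypothetical choice, as are the names `TABLE_1` and `screen`.

```python
# Table 1 values: (reported r, replicated coef.OLS, t.OLS, DW)
TABLE_1 = {
    "C Europe": (0.90, 0.93, 28.59, 1.83),
    "C England": (0.84, 0.75, 12.96, 1.94),
    "Tornetrask": (0.79, 0.70, 10.78, 1.48),
    "Jasper": (0.48, 0.47, 6.14, 1.55),
    "Urals": (0.83, 0.46, 5.62, 1.56),
    "Tasmania.92": (0.42, 0.41, 5.10, 1.57),
    "New.Caledonia": (0.41, 0.41, 4.49, 0.68),
    "Crete.O18": (0.30, 0.47, 4.36, 1.38),
    "Briffa.WUSA": (0.60, 0.33, 4.01, 2.01),
    "Svalbard": (0.08, 0.26, 2.79, 0.77),
    "GBR.5": (0.18, 0.18, 1.81, 1.77),
    "Galapagos": (0.39, 0.15, 1.55, 1.28),
    "Lenca": (0.36, 0.11, 1.02, 1.49),
    "Kameda.melt": (0.17, 0.09, 0.98, 0.39),
    "Alerce": (0.35, 0.07, 0.64, 1.56),
    "Jacoby treeline": (0.34, 0.04, 0.52, 0.70),
    "Law.Dome": (0.26, 0.08, 0.43, 1.97),
}

T_MIN = 2.0     # rule-of-thumb t for independent residuals (OLS only)
DW_MIN = 1.5    # Durbin-Watson cutoff used in the post
MAX_DROP = 0.2  # HYPOTHETICAL: how far below the reported r counts as a "big decline"


def screen(reported, replicated, t, dw):
    """Return the list of tests a proxy fails (empty list = passes all three)."""
    failures = []
    if reported - replicated > MAX_DROP:
        failures.append("replication")
    if abs(t) < T_MIN:
        failures.append("t")
    if dw < DW_MIN:
        failures.append("DW")
    return failures


passed = [name for name, row in TABLE_1.items() if not screen(*row)]
print(passed)  # ['C Europe', 'C England', 'Jasper', 'Tasmania.92']
```

With this choice the four survivors are the same four named in the text; Urals and Briffa.WUSA pass the t and DW tests on their own and drop out only on the replication criterion.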
Table 2 shows the same information for the decadally averaged versions. Paleoclimatologists are making an ongoing effort to extract "low-frequency" information from proxies. There’s nothing wrong with this, but the smoothing reduces the degrees of freedom so much that it is important to use more sophisticated statistics than a correlation coefficient to see what’s going on. The methodology is the same as for Table 1.
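To see how much decadal averaging shrinks the sample, here is a minimal Python sketch that collapses an annual series into non-overlapping ten-year means over 1860-1989. The function name is hypothetical, and the rule that a decade is kept only when all ten years are present is my assumption; the post does not say how incomplete decades were handled.

```python
def decadal_means(years, values, start=1860, end=1989):
    """Average an annual series into non-overlapping ten-year blocks.

    Blocks are start..start+9, start+10..start+19, ... with the last block
    ending no later than `end`. Assumption (not stated in the post): a block
    is kept only if all ten years are present.
    """
    by_year = dict(zip(years, values))
    means = []
    for d0 in range(start, end - 8, 10):  # last block start d0 satisfies d0 + 9 <= end
        block = [by_year.get(y) for y in range(d0, d0 + 10)]
        if any(v is None for v in block):
            continue
        means.append(sum(block) / 10)
    return means


years = list(range(1856, 1995))               # 139 annual values, 1856-1994
values = [0.01 * (y - 1856) for y in years]   # a made-up linear trend
dec = decadal_means(years, values)
print(len(years), len(dec))  # 139 13
```

A regression on 13 decadal values leaves n - 2 = 11 residual degrees of freedom, against 137 for the annual data, before any allowance for autocorrelation.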
A number of series had substantially lower decadal correlations than reported; the most notable was the Polar Urals dataset, which went from a reported 0.92 to minus 0.2 (!). Four series had negative correlations and 8 series had insignificant t-statistics. Here the effect of the small number of measurements is felt: for example, the Greenland (Crete) O18 series has a reported correlation of 0.49 (0.41 in the replication), which looks good, but the t-statistic is only 1.58. Six series had DW statistics below 1.5; 4 had DW statistics greater than 2, indicating negative autocorrelation (an over-correction), although there are so few values that none of these numbers means a whole lot. Again, only 4 series passed both the t- and DW-tests.
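The reason a respectable-looking correlation can become insignificant after decadal averaging is visible in the t-statistic implied by a correlation r on n observations, t = r·sqrt(n − 2)/sqrt(1 − r²). A minimal Python sketch holding r fixed at the replicated Crete.O18 decadal value and varying only n. This is an idealization: it ignores autocorrelation, and the exact decade count per series may differ from 13, so it will not reproduce Table 2’s 1.58 exactly.

```python
import math


def t_from_r(r, n):
    """OLS slope t-statistic for a simple regression with correlation r on n
    observations: t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


r = 0.41  # the replicated decadal Crete.O18 correlation in Table 2
for n in (139, 13):  # annual 1856-1994 vs decades 1860-1989
    print(n, round(t_from_r(r, n), 2))
# 139 5.26
# 13 1.49
```

The same correlation that would be comfortably significant with 139 annual values falls below the rule-of-thumb value of 2 with 13 decadal values.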
Table 2. Decadal Correlation Statistics for Jones et al [1998] Proxies
| Proxy | Reported | coef.OLS | se.OLS | t.OLS | Pr.OLS | DW |
|---|---|---|---|---|---|---|
| CEur | 0.83 | 0.86 | 0.15 | 5.94 | 0 | 0.93 |
| CEng | 0.8 | 0.86 | 0.15 | 5.6 | 0 | 2.52 |
| Tasmania.92 | 0.58 | 0.73 | 0.2 | 3.59 | 0 | 1.82 |
| Svalbard | 0.38 | 0.74 | 0.21 | 3.5 | 0.01 | 1.53 |
| Fenno | 0.8 | 0.68 | 0.22 | 3.06 | 0.01 | 1.8 |
| New.Caledonia | 0.48 | 0.62 | 0.24 | 2.61 | 0.02 | 0.74 |
| Briffa.WUSA | 0.79 | 0.6 | 0.24 | 2.47 | 0.03 | 1.9 |
| Jacoby.treeline | 0.87 | 0.46 | 0.19 | 2.36 | 0.08 | 1.22 |
| GBR.5 | 0.52 | 0.57 | 0.25 | 2.28 | 0.04 | 1.34 |
| Crete.O18 | 0.49 | 0.41 | 0.26 | 1.58 | 0.15 | 1.93 |
| Law.Dome | 0.98 | 0.37 | 0.24 | 1.53 | 0.27 | 2.34 |
| Jasper | 0.45 | 0.24 | 0.25 | 0.96 | 0.36 | 0.92 |
| Alerce | 0.16 | 0.07 | 0.3 | 0.23 | 0.82 | 2.66 |
| Lenca | 0.55 | -0.05 | 0.31 | -0.16 | 0.88 | 2.32 |
| Galapagos | 0.16 | -0.05 | 0.3 | -0.16 | 0.88 | 0.79 |
| Urals | 0.92 | -0.2 | 0.26 | -0.75 | 0.47 | 1.75 |
| Kameda.melt | -0.28 | -0.23 | 0.22 | -1.06 | 0.31 | 1.61 |
It should also be remembered that these series have been picked from a much larger population of series. The very act of selection means that the t-statistic required for significance is much greater than the rule-of-thumb value of 2 [Ferson et al, 2003].
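The selection effect can be illustrated with a small Monte Carlo simulation. This is my own illustration, not the analysis in Ferson et al [2003]: generate pure-noise candidate "proxies", keep the one best correlated with a pure-noise "temperature" series, and count how often its OLS t-statistic clears the rule-of-thumb value of 2. The function names and parameter values (50 observations, 200 trials, 20 candidates, the seed) are arbitrary assumptions.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """OLS slope t-statistic implied by correlation r on n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def selected_hit_rate(n_candidates, n_obs=50, n_trials=200, seed=1):
    """Share of trials in which the best of n_candidates pure-noise 'proxies'
    reaches |t| >= 2 against a pure-noise 'temperature' series."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        target = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = 0.0
        for _ in range(n_candidates):
            candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
            best = max(best, abs(t_from_r(correlation(candidate, target), n_obs)))
        if best >= 2.0:
            hits += 1
    return hits / n_trials


print(selected_hit_rate(1))   # no selection: near the nominal 5%
print(selected_hit_rate(20))  # best of 20: far above 5%
```

With no selection, |t| ≥ 2 turns up about as often as the nominal 5% level implies; picking the best of 20 unrelated series makes a "significant" result the usual outcome, which is why the threshold has to rise when series are screened before being reported.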
5 Comments
A summary of the significance of your findings for us lurking laymen please!
Thank you.
Steve,
That does not surprise me at all – there seems to be a general inability to replicate much in climate studies.
If results cannot be replicated, i.e. here statistically, as you have done, then their results are specious.
In this case the raw data has to be released for independent audit which so far has been next to impossible to achieve.
I downloaded Jones’s absolute temperature values used to compute the reference global mean temperature, which on inspection turned out to be gridcell aggregates, not what I had hoped were raw station monthly means. Warwick Hughes comments on his site on the problems associated with these gridcells.
nice analysis.
Steve
Isn’t this just a case of: if you do enough different tests, sooner or later you are bound to find one that is (not) significant?
Excerpt from David Brooks “On Paradise Drive” – this must be treated – by those “adjusting” the surface record, by those doing reconstructions, and by those doing modeling:
“The Exurbs
“Now we are out in the outer suburbs, the great sprawling expanse of subdevelopments, glass-cube office parks, big-box malls, and townhome communities. This new form of human habitation spreads out into the desert or the countryside, or it snakes between valleys, or it creeps up along highways and in between rail lines. This kind of development seems less like a product of human will than an organism. And you can’t really tell where one town ends and the other begins, except when, as Tom Wolfe observed, you begin to see a new round of 7-Elevens, CVS’s, Sheetzes, and Burger Kings.
“We don’t even have words to describe these places. Over the past few decades, dozens of scholars have studied places like Arapahoe County, Colorado; Gwinnett County, Georgia; Ocean County, New Jersey; Chester County, Pennsylvania; Anoka County, Minnesota; and Placer County, California. They’ve coined terms to capture the polymorphous living arrangements found in these fast growing regions: edgeless city, major diversified center, multicentered net, ruraburbia, boomburg, spread city, technoburb, suburban growth corridor, sprinkler cities. None of these names has caught on, in part because scholars are bad at coming up with catchy phrases, but in part because these new places are hard to define.
“You can’t even sensibly draw a map because you don’t know where to center it. Demographer Robert Lang tried to draw a map of a zone north of Fort Lauderdale, Florida. He located all the roads and office parks and arbitrarily drew the borders. If he’d slid his map north, south, east, or west, some roads and buildings would have disappeared, and others would have appeared. But there would have been no noticeable change in density, no new and definable feature, just another few miles of suburban continuum.
“And yet people flock here. Seventy-three million Americans moved across state lines in the 1990s, and these places – across Florida, north of Atlanta, shooting out beyond Las Vegas, Phoenix, Denver, and so on – drew them in. You fly over the desert in the Southwest or above some urban fringe, and you notice that the developers build the sewers, roads, and cul-de-sacs before they put up the houses, so naked cul-de-sacs to nowhere spread out beneath you. One day I stood and watched a crew carve a golf course out of the desert near Henderson, Nevada, one of the fastest-growing cities in America. A year later, and fifty thousand people are living where there was nothing.”
[emphasis added by me – S. Salov]