Juckes has finally written a response to the various comments – see url here. Today I’ve posted up Willis’ Comments and interleaved Juckes’ Reply in block-quote to make it easier to compare the Comment and Reply – something that I often do for my own purposes. Willis submitted thoughtful comments, to which Juckes was completely unresponsive (as he was to my comments).
ABSTRACT: The MITRIE paper presents a comparison and evaluation of proxy-based
temperature reconstructions. It also introduces its own reconstruction called the Union
Reconstruction (UR), based on the CVM method. The UR is argued to be superior
to previous reconstructions but, in fact, suffers from exactly the same problems that
affect previous reconstructions. As such it is hard to recommend the publication of this
paper, presenting as it does, yet another minor variation on existing reconstructions
that remains compromised by the same problems (spurious correlation with temperature
being the most significant) as the earlier reconstructions. There is evidence
that the CVM measure generates spurious results (Granger 1974). One reason for
the spurious results is that there are substantial grounds to question the validity of
particular records used as temperature proxies in the paper. In addition, there are
problems with the models and methods used in the reconstruction. More detailed
comments expanding on these themes follow in this and succeeding reviews. Supplementary
Online Material (SOM), including figures, proofs, and references, is available
A) STATISTICAL SIGNIFICANCE AND SPURIOUS CORRELATION: The paper says
“The composite tracks the changes in northern hemisphere temperature well, capturing
the steep rise between 1910 and 1950 and much of the decadal scale variability. This
is reflected in the significance scores (Table 3) which are high both for the full series
and for the detrended series.”
The correlation of the Union Reconstruction (UR) with the NH data, while high, is not
significant because of the autocorrelation of the two series. There are at least three
methods available to establish the significance of correlation between two series each
of which has significant autocorrelation. 1) The most common method is the Durbin-
Watson Test, which examines the autocorrelation of the residuals between the two
series. A Durbin-Watson statistic of less than 1.5 indicates that there is significant
autocorrelation in the residuals, indicating that the model is mis-fit to the data. The Durbin-Watson
test statistic for the correlation of the UR and the NH data is only 1.4, marking it
as not significant. (Shaw 1985, Draper 1998) 2) A second method is that of Quenouille
(Quenouille 1952), which gives an effective number of degrees of freedom for two
autocorrelated series. This reduced number of degrees of freedom is used in the
normal way to calculate a standard “p” value for the significance of the correlation.
The Quenouille method for the UR/NH data correlation gives a result of p=0.11, again
showing the correlation is not significant.
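The first two checks can be sketched in a few lines of Python. This is only an illustration, not the reviewer's calculation: the two series are synthetic random walks standing in for the UR and NH data, the function names are invented here, and the effective-sample-size formula is one common form attributed to Quenouille, truncated at an assumed maximum lag of 2.

```python
import random


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def ols_residuals(x, y):
    """Residuals of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
    a = my - b * mx
    return [q - (a + b * p) for p, q in zip(x, y)]


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)


def effective_n(x, y, max_lag=2):
    """Assumed form: n / (1 + 2 * sum over lags k of r_x(k) * r_y(k))."""
    s = sum(autocorr(x, k) * autocorr(y, k) for k in range(1, max_lag + 1))
    return len(x) / (1.0 + 2.0 * s)


# Synthetic stand-ins (NOT the UR or NH series): two independent random walks.
rng = random.Random(42)
walk_a, walk_b = [0.0], [0.0]
for _ in range(124):
    walk_a.append(walk_a[-1] + rng.gauss(0.0, 1.0))
    walk_b.append(walk_b[-1] + rng.gauss(0.0, 1.0))

dw = durbin_watson(ols_residuals(walk_a, walk_b))
n_eff = effective_n(walk_a, walk_b)
print(f"Durbin-Watson: {dw:.2f} (values near 2 mean little residual autocorrelation)")
print(f"Effective N: {n_eff:.1f} of {len(walk_a)} points")
```

With the reduced N in hand, the usual statistic t = r * sqrt((N_eff - 2) / (1 - r^2)) would be compared against Student's t with N_eff - 2 degrees of freedom to obtain the "p" value.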
3) Finally, there is the Monte Carlo method, which compares the results of random “red
noise” realizations with the UR results. In order for this test to be valid, the red noise
proxies must be used in the same way as the proxies in the CVM method. For the
CVM method to work, one requirement is that the proxy results that have negative
correlations with the NH data must be “flipped” so that they have a positive correlation
with the NH data. The reason that the UR CVM proxy has the calculated correlation
with the NH data is that the Chesapeake proxy (which has a negative correlation with
temperature) is flipped before being averaged. This, of course, is the correct procedure
(although it is not mentioned in the text and has not yet been found in the code), and
contributes significantly to the correlation seen in the reported results. However, when
a Monte Carlo analysis is performed to determine the significance of the correlation,
the exact same procedure must be followed – the random red-noise series that have
a negative correlation with temperature must be flipped before they are used in the
calculation. If this is not done, as in this paper, then the Monte Carlo simulation is not
following the same procedure as the CVM method used for the UR, and thus the results
of the test are not representative of the UR. When the flipping is applied, there are a number of
red noise proxies that outperform the UR. An R script for a red noise (random walk)
process that outperforms the UR is available at http://tinyurl.com/ylk4sq
(It is worth noting that a Monte Carlo method cannot prove that a result is significant,
only that it is not significant. If a given method outperforms the same method used with
a given red-noise proxy, all the Monte Carlo test proves is that the reconstruction has
outperformed a given form of red noise. It does not show that it outperforms all types
of red noise. On the other hand, if a given form of red noise outperforms the proxies,
this shows that the correlation may well be random, since we cannot reject the null
hypothesis. This is particularly true if the red noise is simple, such as a random walk
in this case.)
Thus, based on the results of three separate evaluation methods (Durbin-Watson, Quenouille,
and Monte Carlo) we cannot reject the null hypothesis that the correlation of the
UR with the NH data is random. This, of course, means that we can place no reliance
on the UR as a reconstruction of historical temperatures.
(A) The Durbin-Watson test deals with the autocorrelation of residuals not the significance
of the correlation. The switch in sign of the Chesapeake bay data was an error.
The statement of significance is using standard statistical terminology.
B) BRISTLECONE/STRIPBARK PROXIES: The NAS panel was quite clear about not
using Bristlecone/Stripbark tree ring proxies in historical temperature reconstructions.
Despite the NAS recommendations, and other information indicating problems with
bristlecones (e.g. Graybill and Idso, Rob Wilson, Wegman Report) that supports
the NAS Panel’s recommendation, the UR contains no less than four such proxies:
Methuselah’s Walk, Indian Garden, CA Bristlecone δC13, and Boreal.
The NAS recommendation is not an isolated example. Biondi et al. (including
MBH author Hughes) wrote: “The average of those sites [a network of high-elevation
temperature-sensitive tree-ring sites in the Great Basin and Sierra Nevada of Hughes
and Funkhouser, unpublished], plotted in Figure 5, is based on many ring-width series,
each one being 500 years or longer, without individual growth surges or suppressions
and from “strip-bark” five-needle upper forest border pines of great age. Such a record
is not a reliable temperature proxy for the last 150 years as it shows an increasing trend
in about 1850 that has been attributed to atmospheric CO2 fertilization [Graybill and
The problem with bristlecones is that since “Such a record is not a reliable temperature
proxy for the last 150 years”, it cannot be used as a temperature proxy at all, because
the record is not reliable during the calibration period. Thus, there is no way to calibrate
the earlier period of the proxy record.
While there is no requirement that the MITRIE study follow the recommendation of the
NAS Panel or of Biondi et al., there definitely is an obligation to justify the action if the
recommendation is not followed. To do so, it is necessary to explain how to calibrate a
proxy that has unreliable data during the calibration period.
(B) As the reviewer notes, the Biondi et al. paper introduces no new evidence for CO2
fertilization of pine. Recent literature suggests that this is probably a minor issue.
SM: The reviewer did not “note” that the “Biondi et al paper introduces no new evidence for CO2 fertilization”. The reviewer quoted a statement that the bristlecone record is not “reliable”, the lack of reliability being hypothesized to be due to CO2 fertilization, but the attribution of the unreliability is a different issue than the existence of the unreliability. Juckes et al do not cite any “recent literature” suggesting that this is “probably a minor issue”; the NAS Panel reviewed recent literature and concluded that bristlecones should be avoided.
C) PROXY SELECTION AND PROCESSING: It is vital to have clear proxy selection
rules. This prevents later accusations of “cherry picking” only those proxies which
support a desired conclusion, and makes the conclusions of the study more robust.
The statements about proxy selection in the paper are: “Here, we will restrict attention
to records which span the entire reconstruction period from AD 1000 to AD
1980 (with some series ending slightly earlier, as discussed below).”, and “These series
have been chosen on the basis that they extend to 1980 (the HCA composites and
the French tree ring series end earlier), the southern hemisphere series have been
omitted apart from the Quelcaya glacier data, Peru, which are included to ensure adequate
representation of tropical temperatures. The MBH1999 North American PCs
have been omitted in favour of individual series used in other studies. Finally, the
Polar Urals data of ECS2002, MBH1999 and the Tornetraesk data of MSH2005 have
been omitted in favour of data from the same sites used by JBB1998 and ECS2002,
respectively (i.e. taking the first used series in each case).”
The implied a priori selection rules are 1) proxies must span the period AD 1000 to AD
1980 (with exceptions); 2) individual series are used instead of proxy compilations; 3)
older data are used in preference to newer data, when both exist for a given site, and 4)
proxies will be from the Northern Hemisphere (with exceptions). There is no rule in the
MITRIE paper about archived vs. un-archived proxies. However, Dr. Juckes later said
that there was a rule that the data be “published data”. (See http://tinyurl.com/yforq7
for a discussion of this issue). There are a number of problems with the rules as stated.
(These and further issues will be discussed in Multidisciplinary Review 2 and then 3.)
Vagueness of A Priori Proxy Rules and Criteria: A priori rules must have some logical
reason for their inclusion and must be applied consistently. However, in the instant
case, some explanation is needed. 1) The first rule says “. . . AD 1000 to AD 1980
(with some series ending slightly earlier, as discussed below).”, but does not discuss
either which proxies it refers to, or the justification for including them while other proxies
are omitted based on the same rule. 2) Rule two implies using individual series rather
than proxy compilations, such as MBH98 or Yang E. China. Why was Yang included
and not the other compilations? 3) Rule three, in contrast to the overwhelming majority
of studies, specifies older data in preference to newer data without a specific reason
for the choice. What is the justification for using older data? 4) Rule four specifies
using northern hemisphere proxies with one unsupported exception. How is the southern
hemisphere gridcell connected to northern hemisphere average temperature and
how does one southern hemisphere proxy adequately represent NH tropical temperatures?
5) The unstated archive rule is used to exclude Indigirka, but fails to exclude
un-archived Yang and the use of other series that do not match archived data. 6) There
is no rule about the geographical spacing of proxies, or the use of several proxies from
a single location or temperature gridcell. Why are proxy geographic locations and distributions
Proxy selections not following a priori rules: The following proxies (including proxies
used in the Yang Composite) do not meet the stated rules for proxy selection: 1) Guliya:
differs from archived version, bad dating; 2) Dunde: differs from archived version; 3)
Dulan: unarchived; 4) S Tibet 1-12 (12 separate proxies): starts 1100, ends 1950,
unarchived; 5) East China: unarchived; 6) Great Ghost Lake: unarchived; 7) Jiaming:
ends 1960, unarchived; 8) Jinchuan: ends 1950, unarchived; 9) Japan: ends 1950,
unarchived; 10) Tornetrask: differs from archived version; 11) Taimyr: differs from
archived version; 12) Methuselah Walk: ends 1979; 13) Indigirka: Meets all criteria,
but not used.
Dual-Use Proxies: Some of the proxies have been used in previous studies as proxies
for other climate variables. 1) Greenland δO18 from Fisher et al.: This proxy was
originally used by Fisher as a proxy for precipitation. (Fisher 1996). 2) Arabian Sea
G. bulloides: This proxy is used as a precipitation proxy in Treydte et al (Nature 2006),
a temperature proxy in Moberg et al (Nature 2005), and a wind-speed proxy in Anderson
et al. (Anderson 2002). Also, David Black, a published specialist (Science) in G.
bulloides, has pointedly disavowed the use of G. bulloides off Venezuela as a proxy for
temperature, as he considers it a proxy for trade wind strength. Since these proxies
were originally treated as precipitation or wind proxies, a) what reason do we have to
believe that they are also temperature proxies, and b) what procedure was used in the
UR to remove the effects of the confounding variables?
One Proxy Used Twice: Polar Urals (Briffa MXD version, Yamal Briffa 2000). These
two proxies have the same location, and differ only by the substitution of one proxy in
Geographical Locations: These are shown in SOM Figure 1. There are several areas
with more than one UR proxy (two in Northern Fennoscandia, two in Quelccaya, four
in western US). As there is no a priori rule for the spacing of the proxies, this opens
the door for speculation about the basis of the selection because the immediate impact is
to skew the reconstruction toward the densely represented sites. At a minimum, the
proxies in the same temperature gridcell should be averaged to provide no more than
one value per temperature gridcell. (Mann 2003)
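A rough Python sketch of that averaging step follows. The 5-degree cell size, the coordinates, and the short series are all hypothetical values chosen for illustration, and the sketch assumes the series have already been put on a common scale before averaging.

```python
import math
from collections import defaultdict


def gridcell(lat, lon, size=5.0):
    """Index of the size-degree cell containing (lat, lon)."""
    return (math.floor(lat / size), math.floor(lon / size))


def average_by_gridcell(proxies, size=5.0):
    """proxies: list of (lat, lon, series). Returns {cell index: mean series}."""
    groups = defaultdict(list)
    for lat, lon, series in proxies:
        groups[gridcell(lat, lon, size)].append(series)
    return {
        cell: [sum(vals) / len(vals) for vals in zip(*members)]
        for cell, members in groups.items()
    }


# Hypothetical inputs: two nearby sites that share a cell, plus one distant site.
example = [
    (37.4, -118.2, [1.0, 2.0]),
    (37.8, -118.5, [3.0, 4.0]),
    (68.0, 20.0, [5.0, 6.0]),
]
for cell, series in average_by_gridcell(example).items():
    print(cell, series)
```

After this step each gridcell contributes a single series to the composite, so a cluster of sites in one cell carries the same weight as a lone site elsewhere.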
Tree-Ring Processing: The validity of any given proxy depends on the processing that
the proxy has undergone. The investigators of the Yamal Proxy (Hantemirov 2002) say
it should not be used in multicentennial reconstructions such as the UR. Justification
for its use is needed.
(C) We will explicitly state that we are using publically available data in the revision. The date selection rule will be modified to be only proxies extending from AD1000 to 1980. As noted above, previous use of proxies does not rule out using them again.
D) PROBLEMS WITH METHODS. Before using a method such as CVM, we first need
to provide a theoretical and practical foundation for the procedure. For example, it
would be very useful to take the gridcell temperatures for the locations of the proxies
and, using CVM, see how well the actual temperatures do at recreating the actual NH
data record. This should be done with a calibration and a validation period, to see how
well the CVM method is able to predict out-of-sample results. There is no indication in
the MITRIE paper that this has been done.
Assumption of Stationarity: The CVM method assumes that the variance in the climate
is stationary. Eduardo Zorita comments that “In the case of a stationary signal and
stationary noise: once the variance is matched in a calibration period, under stationarity
assumptions, its is matched in all periods.” http://tinyurl.com/yd8pmv
However, the Union Reconstruction is used to show that the recent temperature is
significantly higher than the historical temperature, viz: “The reconstructions evaluated
in this study show considerable disagreement during the 16th century. The new 18
proxy reconstruction implies 21-year mean temperatures close to 0.6 K below the AD
1866 to 1970 mean.” This, of course, means that the assumption of stationarity is
unfounded. In fact, there is a large signal in the variance of the Union Reconstruction
itself, which is shown in SOM Figure 2. As this figure shows, the standard deviation of
the UR varies by a factor of two over the time period, as well as containing an overall
trend. Thus, the assumption of stationarity is not supported by the data.
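A check of this kind can be sketched as a rolling standard deviation. The Python below is an assumed illustration, not the calculation behind SOM Figure 2: the input is synthetic noise whose spread doubles halfway through, and the 101-point window is an arbitrary choice.

```python
import random
import statistics


def rolling_std(series, window):
    """Population standard deviation over each full sliding window."""
    return [
        statistics.pstdev(series[i:i + window])
        for i in range(len(series) - window + 1)
    ]


# Synthetic example (NOT the UR): noise whose spread doubles halfway through.
rng = random.Random(7)
series = [rng.gauss(0.0, 1.0) for _ in range(500)]
series += [rng.gauss(0.0, 2.0) for _ in range(500)]

stds = rolling_std(series, window=101)
print(f"min rolling std: {min(stds):.2f}, max rolling std: {max(stds):.2f}")
```

For a series with stationary variance the rolling values should scatter around a single level; a sustained change or trend in them is the kind of signal described above.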
Error Estimation: Error is estimated using the MBH98 style “confidence interval estimation”,
and the paper makes claims based on these estimates. The Wegman report explicitly stated in
relation to MBH that these types of claims were unsupported by MBH98. The paper
says “A reconstruction using 18 proxy records extending back to AD 1000 shows a
maximum pre-industrial temperature of 0.25 K (relative to the 1866 to 1970 mean).
The standard error on this estimate, based on the residual in the calibration period is
0.149 K.” and “A new reconstruction made with a composite of 18 proxies extending
back to AD 1000 fits the instrumental record to within a standard error of 0.15 K. This
reconstruction gives a maximum pre-industrial temperature of 0.25 K in AD 1091 relative
to the AD 1866 to 1970 mean. The maximum temperature from the instrumental
record is 0.84 K in AD 1998, over 4 standard errors larger.” The reason why this error
estimate is incorrect is detailed in the SOM.
Assumption of Stable Global Temperature Field: One of the underlying, and untested,
assumptions of the CVM method is that the global temperature field is stable over
time. That is to say, will a CVM average that works in one century necessarily work in
another? As noted in Review 2, the assumption of stationarity in the variance is not
supported by the data, so there is no a priori reason for assuming that a CVM average
will work over a multi-century period. This assumption needs justification.
Lack of Correlation with Gridcell Temperature: Overall, the average of the absolute correlation
of the individual UR proxies with the NH data is passable (0.31 ± 0.09 [95%CI]).
But the average of the absolute correlation of the UR proxies with local gridcell temperature
is much lower, 0.20 ± 0.08 (95%CI). In addition, there is a negative correlation
between how well the proxies compare with the local gridcell, and how well they correlate
with the Northern Hemisphere. In other words, on average, the worse a proxy
does at correlating with the local temperature, the better the correlation it has with the
NH data. This is particularly evident in the four proxies that are in one gridcell, Upper
Wright, Methuselah Walk, Boreal, and Indian Garden (USA). They have a statistically
very strong (p = 0.003) inverse relationship between local and NH correlation. This
leaves us with an interesting problem. If the proxies are not well correlated with local
temperatures, by what mechanism can they be better correlated with the NH data?
Certainly, there are “teleconnections” between climate patterns in widely separated
parts of the globe. But what is the possible mechanism whereby the NH data can
affect a proxy without affecting the local temperature?
Lack of a Validation Period, Calibration Period Only: The problem of poor performance
“out of sample” in any type of reconstruction method (e.g. OLR, CVM) is widely recognized,
and is taught in undergraduate statistics. The way to test for this is to divide the
NH data into a “calibration” period and a “validation” period. The proxy data is first calibrated
against one period, and then validated against the other. Then the two periods
are reversed, the calibration period becoming the validation period and vice versa. Tree
ring data are particularly well suited for this purpose. (Rutherford and Mann 2004) This
allows us to determine how well the reconstruction performs “out of sample”. Since the
historical period is entirely “out of sample”, this is the only way we have to determine
how well the proposed reconstruction will perform during the historical “out of sample”
period. This is such elementary and standard practice that any deviation from the
practice requires a very strong theoretical reason for its omission. No such reason is
provided in the paper.
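The split-sample procedure can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the MITRIE implementation: the "composite" and "target" are synthetic, variance matching is reduced to a scale and offset fitted over the calibration indices, the composite is assumed to be positively correlated with the target, and RMSE is used as the only score.

```python
import random


def mean(x):
    return sum(x) / len(x)


def std(x):
    m = mean(x)
    return (sum((v - m) ** 2 for v in x) / len(x)) ** 0.5


def variance_match(composite, target, idx):
    """Scale and offset giving the composite the target's mean and std over idx."""
    c = [composite[i] for i in idx]
    t = [target[i] for i in idx]
    scale = std(t) / std(c)
    return scale, mean(t) - scale * mean(c)


def rmse(pred, obs, idx):
    """Root-mean-square error over the given indices."""
    return (sum((pred[i] - obs[i]) ** 2 for i in idx) / len(idx)) ** 0.5


def split_sample(composite, target, split):
    """Calibrate on one part of the record, score on the other, then swap."""
    early, late = range(0, split), range(split, len(target))
    scores = {}
    for label, cal, val in (("early", early, late), ("late", late, early)):
        scale, offset = variance_match(composite, target, cal)
        recon = [scale * v + offset for v in composite]
        scores[label] = (rmse(recon, target, cal), rmse(recon, target, val))
    return scores


# Synthetic stand-ins: a trending "instrumental" series and a noisy copy of it.
rng = random.Random(3)
target = [0.005 * i + rng.gauss(0.0, 0.1) for i in range(120)]
composite = [t + rng.gauss(0.0, 0.15) for t in target]

for label, (cal_err, val_err) in split_sample(composite, target, split=60).items():
    print(f"calibrated on {label} half: calibration RMSE {cal_err:.3f}, "
          f"validation RMSE {val_err:.3f}")
```

A validation error much larger than the calibration error in either direction is the out-of-sample warning sign that the procedure is designed to expose.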
No Subsampling: Subsampling is the routine practice of dividing the proxies into different
groups, either by type or randomly, to see how well they perform. (e.g. Xiong 2001,
St. George 2000). With the exception of rudimentary subsampling done by removing
one proxy at a time, this has not been done with the UR. Figure 3 in the SOM shows
the results of one such test, showing the proxies divided into ice core data, tree ring
data, and “other”. Note that during the calibration period, there is good agreement between
the three different groups of proxies, and all of them agree in general with the
NH data. However, when we look at the full period 1000-1850 shown in Figure 4, the
situation is quite different. All three of the proxy groups, which correlated well with the
NH data during calibration, do not agree with each other at all during the 1000-1850
period. This points to a fundamental flaw in the argument that we can just average
them and get a useful or accurate reconstruction.
Problems with Autocorrelation: According to the MITRIE paper, the CVM method depends
on normalization of the individual proxies. To do this requires an accurate estimator
of the variance of the proxies over the calibration and verification periods. The
usual method for estimating variance in the presence of autocorrelation is to calculate
an “effective N”, a reduced number of degrees of freedom due to autocorrelation. This
effective N is then used to estimate the variance. The MITRIE paper has two problems
in this regard: 1) No adjustment is made for autocorrelation, and 2) Some of the proxies
are so highly autocorrelated that it is not possible to calculate the variance. The paper
contains no acknowledgement of these problems regarding CVM, nor any proposals of
how to deal with the problems.
(D) We are not supporting the terminology used by Mann et al. The conclusion describes
the relation between estimated temperature anomalies and the standard error
of the fit to calibration data. We assume that the sensitivity of the proxy composite is stable, not that the
temperature itself is stable. The correlation of individual proxies with local temperature
are clearly compromised by the fact that individual proxies have a signal to noise ratio
less than unity. The choice of normalisation would ideally be determined by estimates
of the signal to noise ratio of the individual proxies and of the noise autocorrelation
function (which is clearly not the same as the time series auto-correlation function).
Since these things are not known, the best choice is to normalise all proxies to unit
E) PROBLEMS WITH THE GROWTH MODEL: The paper assumes that the growth
response model for tree rings is linear in average annual T, of the form G = T + e where
G is the growth (tree ring width), T is the annual temperature and e is the error. In
fact, the growth function is a complex non-linear function, where G is the integral of
some function of the daytime conditions, f(T, M, C) + e, over the growing season, where T
is daytime temperature, M is moisture, and C is CO2. Dr. Juckes has stated that this
is not a problem because “The data used in our study are selected from sites where
temperature is expected to be a growth limiting factor.” However, the paper does not
provide any verification of this claim.
Underlying Form of the Tree Ring Response: Plants do not have a linear or even a
quasi-linear growth response to temperatures. Instead, they have an upside-down “U”
shaped response to temperature. They grow fastest at an optimum temperature, and
grow more poorly if the temperature is either higher or lower than that temperature.
(Pisek 1973) Tree ring proxy analyses assume a linear response to temperature, with
wider rings correlating to higher temperatures and narrower rings correlating to lower
temperatures. The effect of this is to reduce the high-temperature peaks in the proxy
response that correspond to high temperatures. When the temperature is too hot and
the rings are correspondingly narrow, this is incorrectly interpreted as a cooler temperature.
This is one of the major unsolved problems with tree ring proxies, which is that they
identify higher than optimum temperatures as lower than optimum temperatures. While
the problem is unsolved, it should not be ignored, as at a minimum it should be reflected
in increased error estimations on the warm peaks.
The location of the optimum temperature for any plant depends on the available moisture.
Without adequate water, a plant will wilt and stop growing at a temperature at
which it will grow strongly with adequate water. Thus, the problem of the “U” shaped
response curve cannot be separated from the previous problem of the number of variables
in the growth response curve. In particular, in order to use tree ring proxies for
temperature reconstruction, it is necessary to show that the growth function for the selected
tree proxies is invertible, or that the limitation of the non-linear response curve
can be addressed in some other manner. Although this question of non-linear response
has received some interest in the specialty literature (e.g. Fritts 2003), it has not been
addressed in long-term paleoclimatology reconstructions.
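The invertibility problem can be made concrete with a toy example. The Python below uses a purely hypothetical parabolic response with an assumed optimum of 15 degrees and half-width of 10 degrees; it is not a fitted growth model for any species, and it ignores moisture and CO2 entirely.

```python
def growth(temp, optimum=15.0, width=10.0):
    """Hypothetical inverted-U response: 1.0 at the optimum, 0 beyond optimum +/- width."""
    return max(0.0, 1.0 - ((temp - optimum) / width) ** 2)


def temperatures_for(g, optimum=15.0, width=10.0):
    """Both temperatures consistent with a growth index g (0 <= g <= 1)."""
    d = width * (1.0 - g) ** 0.5
    return optimum - d, optimum + d


# A cool year and a hot year produce the same ring-width index.
print(growth(10.0), growth(20.0))
# Inverting that index returns two candidate temperatures, not one.
print(temperatures_for(growth(10.0)))
```

A linear calibration fitted on the rising limb of such a curve would assign the cooler of the two candidate temperatures to a narrow ring, which is the bias on warm peaks described above.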
(E) The model assumes there is a linear dependence in the composite, not in the
F) PRIOR RELEVANT INVESTIGATIONS: It is normal in scientific investigations to refer
to previous studies of the same subject. Regarding historical reconstructions of
climate, two of the most important reviews of the subject are the reviews done by the
NAS Panel (NAS 2006), and the Wegman Report (Wegman 2006). The MITRIE paper
has managed to totally avoid any comment on these two very important documents. In
addition, the paper has not addressed the issues unresolved by the Nature Corrigendum
(Nature Corrigendum 2004, McIntyre 2004).
(F) We refer to peer reviewed material published by independent journals, we do not
aim to review all “important documents”.
Overall, the MITRIE paper needs extensive work on both the “Intercomparison and
Evaluation” and the “Union Reconstruction” aspects before it is ready for publication.