Picking Cherries in the Gulf of Alaska

The bias arising from ex post selection of sites for regional tree ring chronologies has been a long-standing issue at Climate Audit, especially in connection with Briffa’s chronologies for Yamal and Polar Urals (see tag). I discussed it most recently in connection with the Central Northwest Territories (CNWT) regional chronology of D’Arrigo et al 2006, in which I showed a remarkable example of ex post selection.

In today’s post, I’ll show a third vivid example of the impact of ex post site selection on the divergence problem, this time in Gulf of Alaska regional chronologies. I did not pick this chronology as a particularly lurid example after examining multiple sites: it is the first column in the Wilson et al 2016 N-TREND spreadsheet and was the first site in that collection that I examined closely. It is also a site for which most (but not all) of the relevant data has been archived and which can therefore be examined. Unfortunately, data for many of the Wilson et al 2016 sites has not been archived and, if past experience is any guide, it might take another decade to become available (by which time we will all have “moved on”).

The 2006 and 2014 Chronologies

The Gulf of Alaska chronology of D’Arrigo et al 2006 was the first long chronology using mountain hemlocks (TSME) from the Gulf of Alaska coast. It had a pronounced divergence problem (Figure 1, top panel) and was never reported in a technical publication. In 2007, Wilson et al published a second long chronology, which purported to somewhat mitigate the divergence problem (see Postscript). In 2014, Wiles et al published a third long Gulf of Alaska TSME chronology (later used in Wilson et al 2016), which was virtually identical to the 2006 version through its early history up to about the 18th century, but which goes up in the 20th century, seemingly avoiding the divergence problem of the earlier series:


Figure 1. Gulf of Alaska TSME regional chronologies: top – D’Arrigo et al 2006; bottom – Wiles et al 2014, as used in Wilson et al 2016. 

Effect of Site Selection

Both Gulf of Alaska chronologies (D’Arrigo et al 2006 and Wiles et al 2014) used the same two subfossil data sets, both on the coast of Prince William Sound on the left side of the location map shown below as Figure 2 (large red-pink icons). The use of identical subfossil data explains the remarkable similarity of the two versions of the chronology up to about the 18th century: they are similar because they used the same data in this period.

However, the modern portions of the chronologies differ: the D’Arrigo et al 2006 version has a divergence problem, whereas the Wiles et al 2014 version does not. Both D’Arrigo et al 2006 and Wiles et al 2014 used RCS variations, but Wiles et al used only three (yellow) of the ten D06 sites; they discarded seven sites used in D’Arrigo et al 2006 (red below) and added five sites not used in D’Arrigo et al (green). The D06 sites were first listed in the D’Arrigo et al 2006 Supplementary Information in 2012, over seven years after the article was cited by IPCC.

Remarkably, nearly all of the modern sites discarded by Wiles et al (red pins) are located close to, and even almost contiguous with, the two subfossil sites (both near the coast of Prince William Sound), while the five sites added by Wiles et al are all located about 800 km away near Juneau.


Figure 2. Location map comparing sites in D’Arrigo et al 2006 and Wiles et al 2014. Large red-pink – two subfossil sites used in both studies; red – seven modern sites used only in D’Arrigo et al 2006; yellow – three modern sites used in both studies; green – five modern sites used only in Wiles et al 2014. 


The only information in D’Arrigo et al 2006 on the provenance of their Gulf of Alaska data was that they used 820 cores and that its reference was “Wiles et al., Tree-ring evidence for a medieval warm period along the southern coast of Alaska, manuscript in preparation, 2005.”   Unfortunately, this article never appeared and, to my knowledge, there was never any technical publication of the D’Arrigo et al 2006 Gulf of Alaska series.  In 2012, an amendment to the D’Arrigo et al 2006 Supplementary Information finally listed the sites used in the D06 Gulf of Alaska regional chronology (used in the above location map.)

Wiles et al did not reconcile their sites against the sites previously used in D’Arrigo et al and, based on the location map, it is very difficult to contemplate a plausible ex ante rationale for their selection. Indeed, it is hard to think of any rationale for the 800 km migration other than an intent by Wiles et al to “partially circumvent” the divergence problem by using only modern sites that went up, a program described in D’Arrigo et al 2009 (quoted in the previous post) as follows:

The divergence problem can be partially circumvented by utilizing tree-ring data for dendroclimatic reconstructions from sites where divergence is either absent or minimal. (Wilson et al., 2007; Buntgen et al., in press; Youngblut and Luckman, in press).

And, indeed, the divergence problem was definitely on the minds of Wiles et al. In their abstract, they stated that the modern sites in their network showed no “evidence of the so-called divergence effect”. They attributed this to the “moderate elevation” of the sites in their selection of sites:

The moderate elevation at the tree-ring sites has allowed these trees to retain their temperature signal without evidence of the so-called divergence effect, or underestimation of tree-ring inferred temperature trends, which is observed at many northern latitude forest locations.

Later, in the running text, they explained that they “target[ed]” sites where the “trees appear to still be responding positively to temperature” to avoid “bias[ing]” their results:

Here, we use tree-ring records from living hemlock at mid-elevation GOA sites where such trees appear to still be responding positively to temperature as in the past. Targeting such sites, we minimize divergence in the recent period that might bias our results and thus provide a more accurate assessment of contemporary warming relative to previous centuries.


It was either cheeky or ignorant on their part to characterize such blatant cherrypicking as a technique to avoid “bias[ing] their results”.   That such strategies are accepted without qualm both by referees and other specialists in the field speaks volumes.
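Why selecting on the modern trend matters can be illustrated with a toy simulation. The sketch below is only that: it assumes 40 hypothetical “sites” that are pure persistent noise (AR(1) with an assumed coefficient of 0.7) and share no temperature signal at all, treats the last 50 values as the “modern” period, and keeps any site whose modern slope is positive. It is not a reconstruction of what Wiles et al did, just an illustration of how a rule of “use sites where divergence is absent” builds the absence of divergence into the composite.

```python
import numpy as np

def simulate_sites(n_sites=40, n_years=300, phi=0.7, seed=0):
    """Hypothetical pure-noise AR(1) 'site chronologies' with no climate signal."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_sites, n_years))
    for t in range(1, n_years):
        x[:, t] += phi * x[:, t - 1]   # x_t = e_t + phi * x_{t-1}
    return x

def recent_slope(y, window):
    """OLS slope of each row of y over its last `window` values."""
    rows = np.atleast_2d(y)[:, -window:]
    return np.polyfit(np.arange(window), rows.T, 1)[0]

sites = simulate_sites()
window = 50                                   # hypothetical "modern" period
keep = recent_slope(sites, window) > 0        # ex post rule: keep sites that go up
slope_all = recent_slope(sites.mean(axis=0), window)[0]
slope_kept = recent_slope(sites[keep].mean(axis=0), window)[0]
print(f"kept {keep.sum()} of {len(keep)} sites")
print(f"modern slope, all sites:  {slope_all:+.4f}")
print(f"modern slope, kept sites: {slope_kept:+.4f}")
```

Because an OLS slope is linear in the data, the composite’s modern slope is simply the average of the kept sites’ slopes, each of which is positive by construction; the uptick comes from the selection rule, not from any signal in the data.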

A Replication Puzzle

Even spotting Wiles et al their modern sites, I do not believe that it is possible to replicate their non-declining chronology based on available data.

Wiles et al used 8 modern sites and two subfossil sites (listed in their Table 1).  Measurement data for the two subfossil sites and six of eight modern sites appears to be fully archived at NOAA, but one data set (Wright Mountain) is completely unarchived and an unarchived (and expanded) second version of Eyak Mountain appears to have been used in Wiles et al 2014.  Ironically, Wiles et al 2014 Table 1 specifically (but incorrectly) stated that the Wright Mountain data had been archived at ITRDB.

Nonetheless, the archived data for the two subfossil sites and 6.5 (of 8) modern sites permits calculation of an RCS chronology that one would expect to be quite similar to the chronology reported in Wiles et al 2014. Using the available data, I therefore calculated an RCS chronology (Figure 3, bottom panel) using a one-size-fits-all standardization curve, the RCS variant that the running text of Wiles et al 2014 says was used. The correspondence between the Wiles chronology and my emulation is very close up to the 18th century, but I was unable to replicate the closing uptick of the Wiles et al 2014 reconstruction, obtaining instead the closing decline also seen in the D’Arrigo et al 2006 version.


Figure 3. Top – Wiles et al 2014 reconstruction re-scaled to match chronology scale; bottom – emulated RCS chronology using available ITRDB data for sites listed in Wiles et al Table 1.
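For readers unfamiliar with RCS, the one-size-fits-all variant works roughly as follows: pool every core by cambial age, average them into a single regional curve, divide each ring width by the curve value for its age, and average the resulting indices by calendar year. The Python sketch below is a simplified illustration of that idea, not the code used for Figure 3 or by Wiles et al; it assumes the first measured ring is the pith (no pith-offset estimates) and substitutes an optional moving average for the usual spline smoothing of the regional curve.

```python
import numpy as np

def rcs_chronology(series, smooth=0):
    """One-size-fits-all RCS sketch.

    series: list of (first_year, ring_widths) pairs; first_year is treated as
    cambial age 0 (a simplification: no pith-offset estimates).
    smooth: optional moving-average width for the regional curve, a stand-in
    for the spline smoothing normally used.
    Returns (years, chronology); years with no cores are NaN.
    """
    max_age = max(len(w) for _, w in series)
    sums, counts = np.zeros(max_age), np.zeros(max_age)
    for _, w in series:                       # pool ALL cores by cambial age
        sums[: len(w)] += w
        counts[: len(w)] += 1
    curve = sums / counts                     # the single regional curve
    if smooth > 1:
        pad = (smooth // 2, smooth - 1 - smooth // 2)
        curve = np.convolve(np.pad(curve, pad, mode="edge"),
                            np.ones(smooth) / smooth, mode="valid")

    first = min(y0 for y0, _ in series)
    last = max(y0 + len(w) - 1 for y0, w in series)
    years = np.arange(first, last + 1)
    total, n = np.zeros(len(years)), np.zeros(len(years))
    for y0, w in series:                      # index = width / expected width at that age
        i = y0 - first
        total[i : i + len(w)] += w / curve[: len(w)]
        n[i : i + len(w)] += 1
    with np.errstate(divide="ignore", invalid="ignore"):
        return years, total / n               # mean index per calendar year

# Usage with a hypothetical age-related growth curve shared by all cores.
ages = np.arange(200)
growth = 1.5 * np.exp(-ages / 60) + 0.3
cores = [(1700, growth), (1750, growth[:150]), (1800, growth[:100])]
years, chron = rcs_chronology(cores)
print(years[0], years[-1], np.allclose(chron, 1.0))   # 1700 1899 True
```

In the usage lines every core follows the same hypothetical growth curve, so the regional curve recovers it exactly and the chronology is flat at 1.0; any common year-to-year or century-scale variation added to the cores would show up as departures from 1.0.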

In the next figure, I’ve tried to highlight the 20th century difference between the two versions by zooming in. At high frequency, the Wiles et al version and the emulation are very similar, but the emulation (red) shows the characteristic decline (divergence problem), while the Wiles version goes up slightly in the 20th century, with most of the increase due to higher post-1975 values in the Wiles reconstruction.


Figure 4. Detail of chronologies shown in Figure 3.

It is possible that inclusion of the unarchived data from Wright Mountain and Eyak Mountain will reconcile the differences; if so, there is considerable irony in the proposed mitigation of the divergence problem depending on only two sites, neither of which has been archived. It is also possible that the difference arises from different implementations of poorly described RCS protocols: maybe the chronologies were estimated site by site and averaged, rather than with one size fits all. There is one final possibility that I would never have postulated prior to my recent reconciliation of the D’Arrigo et al Central Northwest Territories regional chronology: in that case, D’Arrigo selectively included cores from a site that went up, while selectively excluding cores from a site that went down. Without a complete measurement archive, there is little point reflecting further on such matters.
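To make the site-by-site possibility concrete, here is a hedged sketch comparing one pooled regional curve with separate curves per site on a deliberately artificial network: a hypothetical fast-growing site sampled early and a slow-growing site sampled late, with no climate signal in either. The helper names are illustrative, and the same simplifications apply (first ring taken as the pith, no curve smoothing). Nothing here shows which variant Wiles et al used; it only shows that the choice alone can change the closing portion of a chronology.

```python
import numpy as np

def regional_curve(cores):
    """Mean ring width by cambial age (first ring treated as age 0)."""
    n = max(len(w) for _, w in cores)
    s, c = np.zeros(n), np.zeros(n)
    for _, w in cores:
        s[: len(w)] += w
        c[: len(w)] += 1
    return s / c

def mean_index(cores, curve, years):
    """Average of width/curve indices for each calendar year in `years`."""
    tot, n = np.zeros(len(years)), np.zeros(len(years))
    for y0, w in cores:
        i = y0 - years[0]
        tot[i : i + len(w)] += w / curve[: len(w)]
        n[i : i + len(w)] += 1
    with np.errstate(divide="ignore", invalid="ignore"):
        return tot / n

def pooled_rcs(sites):
    """One-size-fits-all: a single curve from every core at every site."""
    cores = [c for site in sites for c in site]
    years = np.arange(min(y for y, _ in cores), max(y + len(w) for y, w in cores))
    return years, mean_index(cores, regional_curve(cores), years)

def site_by_site_rcs(sites):
    """A separate curve per site; the site chronologies are then averaged."""
    years, _ = pooled_rcs(sites)
    chrons = np.array([mean_index(s, regional_curve(s), years) for s in sites])
    return years, np.nanmean(chrons, axis=0)

# Hypothetical network: a fast-growing site sampled early, a slow one sampled late.
base = np.exp(-np.arange(150) / 50) + 0.5
site_a = [(1600 + 10 * k, 2.0 * base) for k in range(5)]
site_b = [(1750 + 10 * k, 1.0 * base) for k in range(5)]
years, pooled = pooled_rcs([site_a, site_b])
_, by_site = site_by_site_rcs([site_a, site_b])
print(pooled[0], pooled[-1])        # about 1.333 early, 0.667 late
print(np.allclose(by_site, 1.0))    # True: no trend at all
```

With one pooled curve, the early, fast-growing cores sit above the curve and the late, slow-growing cores sit below it, producing a spurious decline; with a curve per site, each site is indexed against itself and the chronology stays flat.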


My underlying issue with “regional chronologies” is that the 20th century shape of the chronologies can be dramatically affected by ex post selection of modern data. I originally raised the question of ex post data collection in the earliest days of Climate Audit in connection with the NH reconstruction of Jacoby and D’Arrigo 1989. I wrote many posts on this issue in connection with Briffa’s Yamal and Polar Urals chronologies, where site selection clearly affected the shape of the chronology (see my many earlier posts on these chronologies). This was a large controversy leading into Climategate.

In a recent post, I showed that D’Arrigo consciously attempted to “circumvent” the divergence problem by ex post selection of sites that went up, with a surprisingly blunt implementation of this questionable strategy in the CNWT regional chronology of D’Arrigo et al 2006. In today’s post, I showed that the Gulf of Alaska regional chronology is one more example where the shape of the regional chronology has been affected by ex post site selection: in this case, through the selective use of sites over 800 km distant from the target subfossil sites.

Some time ago, Gavin Schmidt observed of a chronology of which he disapproved (his objections not actually being valid, but that’s another story):

if any actual scientist had produced such a poorly explained, unvalidated, uncalibrated, reconstruction with no error bars or bootstrapping or demonstrations of common signals etc., McIntyre would have been (rightly) scornful.

Even though the most recent Gulf of Alaska chronology amply meets Schmidt’s criteria of being a “poorly explained, unvalidated, uncalibrated, reconstruction with no error bars or bootstrapping or demonstrations of common signals”, I will content myself with mild (Canadian) disapproval, though I would not strongly argue with Schmidt if he wrote a more severely “scornful” review.




Postscript – Wilson et al 2007

In 2007, Wilson et al published a second regional chronology using Gulf of Alaska TSME sites. While this chronology was not used in the Wilson et al 2016 composite, the Supplementary Information of D’Arrigo et al 2006 stated that the cores used in D’Arrigo et al 2006 were identical to the cores used in Wilson et al 2007. I have concluded that this information is false, but it took me quite a bit of time to be confident of this conclusion and I wish to document my reasoning while it is fresh in my mind.

Wilson et al 2007 had been discussed at Climate Audit soon after publication (and also in a post on its varimax rotation). Needless to say, the measurement data required for analysis was not available at the time of publication. The comments thread contained a lively exchange between Willis Eschenbach and Rob Wilson about archiving: Eschenbach sharply criticized Wilson and coauthors for failing to archive data concurrent with publication; Wilson attempted to deflect the criticism as overwrought on the grounds that the archiving delay, while regrettable, would be slight. As it turned out, the majority of the missing data wasn’t archived for another five years (2012) and a little is still unarchived, a delay which, in my opinion, more than vindicates Eschenbach’s side of the dispute.

In fall 2009, Kaufman et al (2009) published a multi-proxy Arctic reconstruction, one item in which was a Gulf of Alaska temperature reconstruction attributed to D’Arrigo et al 2006 (which had produced an RCS chronology but not a temperature reconstruction.)  In December 2009, the Supplementary Information to D’Arrigo et al 2006 was amended, including the archiving of the Gulf of Alaska temperature reconstruction used in the recently published Kaufman et al. (All other D06 chronologies remained unarchived until 2012!!)

The 2009 SI amendment stated that the D06 Gulf of Alaska chronology had used the same 820 cores as the Wilson et al 2007 reconstruction:

Wilson et al. 2007 produced a Gulf of Alaska reconstruction based on an STD chronology derived from the same 820 ringwidth series….

820 individual series that were published in the two articles listed above. The Standard Chronology (ak096.crn) was used for the reconstruction by Wilson et al. 2007. The RCS chronology (ak096c.crn) was used in the D’Arrigo et al. 2006 reconstruction.

At this time, two chronologies (ak096.crn and ak096c.crn) and one measurement dataset (ak096.rwl) were contributed to the ITRDB data bank.

However, Wilson et al 2007 (of which Wiles was a coauthor) described an entirely different network than the one illustrated in my Figure 2 (which is based on my reconciliation of the core numbers of ak096.rwl). Wilson et al listed an initial network of 31 sites in their Table 1. Wilson et al appear to have calculated site STD chronologies on a site-by-site basis for all 31 sites, which were then screened for correlation to instrumental data, resulting in nine sites being discarded. The 31 Wilson et al 2007 sites were shown in a location map in the original article, reproduced and annotated below, showing a stretch of the Alaska coastline almost 1000 km long, reaching from the Juneau area on the right to Kodiak Island on the left:



Figure 5. Location map from Wilson et al 2007, showing the 31 sites (22 used sites in solid colors), overprinting the D06 sites (magenta +). 

In 2012, more major changes were made to the SI to D’Arrigo et al 2006.  Seven years after my request to IPCC, the 19 regional STD and RCS chronologies were finally archived.  While the STD and RCS chronologies archived in 2012 for Gulf of Alaska matched the two ak096 chronologies archived in 2009, the chronologies for most of the sites appeared in 2012 for the first time.  New 2012 commentary on the Gulf of Alaska chronologies stated that the D06 chronology had been developed from 10 modern sites:

Coastal Alaska
10 Living chronologies:

Data with ITRDB code:
Ellsworth Glacier, Alaska (EL)	ITRDB AK015
Rock Glacier (RG)		ITRDB AK024
Water Supply (WS)		ITRDB AK029
Wolverine Glacier (WV)		ITRDB AK030
Tebenkof Glacier (TB)		ITRDB AK025
Miners Well (MW)		ITRDB AK021
Nichawak Mountain (NK)		ITRDB AK022
Cordova Eyak Mountain (CV)	ITRDB AK020
Massive Rock near Cordova (MR)	ITRDB AK090
Rock Tor (RT)			ITRDB AK091

Sub-fossil material: Data not archived and continually being updated.
Relevant contact is Greg Wiles (gwiles@xxx) – primary generator 
of the data - and Rob Wilson (rjsw@xxx) who has original 
2006 version used.

I’ve marked the location of these sites used in D’Arrigo et al 2006 with a magenta + sign.  Nearly all of the 10 come from the Prince William Sound area (top towards the left), whereas the W2007 sites stretch for about 1000 km along the coast.  The two subfossil sites (used in all long chronologies) both come from the Prince William Sound area (marked with solid magenta dots).  Ironically, although the 2012 SI amendment said that the subfossil data was not archived, it had actually been archived in 2009 (as part of ak096.rwl).

Clearly, the 10 modern sites used in D’Arrigo et al (2006) do not match the 22 modern sites used in Wilson et al 2007. Only nine sites are common to both. Thirteen sites used in Wilson et al 2007 were not used in D’Arrigo et al 2006, while one site used in D’Arrigo et al (Tebenkof Glacier) was not used in Wilson et al 2007. It is therefore impossible for the 820 cores used in D’Arrigo et al 2006 to be identical to the cores used in Wilson et al 2007, unless the descriptions in Wilson et al 2007 are completely incorrect.

It is also instructive to review the multivariate methodology of Wilson et al 2007 as a potential contributor to their “circumventing” the divergence problem. After they had screened their original network from 31 to 22 sites (ex post screening of the type long criticized at Climate Audit), they carried out principal components analysis on the 22 site chronologies (each calculated as a site STD chronology). They retained four principal components, which were then subjected to varimax rotation. They then calculated a temperature reconstruction by regressing instrumental temperature onto the four (rotated) principal components in a calibration period. The resulting temperature reconstruction (not shown in this post, but similar in shape to the Wiles et al 2014 reconstruction shown above) did not have the 20th century decline that characterized the D’Arrigo et al (2006) reconstruction.
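The screening step (31 sites down to 22) can be viewed as a weighted average in which each chronology gets a weight of either 0 or 1/22. The sketch below, under assumed conditions (pure-noise chronologies, a hypothetical 60-year calibration target, 200 simulated trials, retention of the 22 best-correlated series), shows that such screening raises the calibration correlation even when there is no signal to find. The exact screening criterion of Wilson et al 2007 is not reproduced here.

```python
import numpy as np

def screen(sites, target, n_keep):
    """Keep the n_keep series best correlated with target.

    Returns the weight vector: 1/n_keep for survivors and 0 for the rest, so
    the screened composite is simply weights @ sites.
    """
    r = np.array([np.corrcoef(s, target)[0, 1] for s in sites])
    w = np.zeros(len(sites))
    w[np.argsort(r)[-n_keep:]] = 1.0 / n_keep
    return w

rng = np.random.default_rng(1)
gains = []
for _ in range(200):                          # Monte Carlo trials
    temp = rng.normal(size=60)                # hypothetical 60-year calibration target
    sites = rng.normal(size=(31, 60))         # 31 chronologies of pure noise
    w = screen(sites, temp, n_keep=22)        # 31 -> 22, as in the text
    r_all = np.corrcoef(sites.mean(axis=0), temp)[0, 1]
    r_kept = np.corrcoef(w @ sites, temp)[0, 1]
    gains.append(r_kept - r_all)
print(f"average calibration-r gain from screening noise: {np.mean(gains):+.2f}")
```

The gain is an artefact of choosing the survivors with the same data used to assess the fit, which is why calibration statistics computed after screening cannot be taken at face value.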

In a recent discussion at Bishop Hill,  Rob Wilson likened the improvement in recent regional chronologies to the improvement from a Trabant to a 2016 BMW Series 1:

Of course there are older versions, but only a fool would use an old version with less data or that had calibration issues etc. Would you rather drive a Trabant or a 2016 BMW series 1. Duh!

Each of the multivariate operations in their PC methodology is linear, and thus the temperature reconstruction is necessarily a linear function of the underlying 22 chronologies. However, the technique of Wilson et al (2007) does not constrain the coefficients to remain positive. Their method can result in negative coefficients, i.e. flipping series upside down (an issue that Jeff Id and I have discussed on many occasions in the context of Mannian methodology). Even if it is possible to extract information on regional temperature from the tree ring data, in my opinion complicated multivariate methods like that of Wilson et al 2007 are a retrogression from simpler regional averages rather than an improvement (let alone an improvement on the order of a Trabant to a BMW Series 1), unless one were attempting to quantify “improvements” in the technology of “data torture”.
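The linearity point can be checked numerically. The sketch below runs a PC, varimax and regression chain on hypothetical data (100 random “years” for 22 random “chronologies”, four retained components, an assumed 50-year calibration period and a random target), using a textbook Kaiser varimax rather than whatever implementation Wilson et al used. It then collapses the chain into a single weight per chronology and confirms that the reconstruction equals an intercept plus a weighted sum of the chronologies, with nothing preventing some of those weights from being negative.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Kaiser varimax rotation; returns (rotated loadings, orthogonal rotation R)."""
    p, k = loadings.shape
    R, d = np.eye(k), 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L**3 - L @ np.diag(np.sum(L**2, axis=0)) / p))
        R = u @ vt
        if np.sum(s) < d * (1 + tol):
            break
        d = np.sum(s)
    return loadings @ R, R

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 22))               # hypothetical: 100 years x 22 chronologies
Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize each chronology
_, _, vt = np.linalg.svd(Z, full_matrices=False)
rotated, R = varimax(vt[:4].T)               # keep 4 PCs, then varimax-rotate them
scores = Z @ rotated                         # rotated component scores
temp = rng.normal(size=100)                  # hypothetical instrumental series
cal = slice(50, 100)                         # hypothetical calibration period
A = np.column_stack([np.ones(50), scores[cal]])
beta = np.linalg.lstsq(A, temp[cal], rcond=None)[0]
recon = beta[0] + scores @ beta[1:]

# Collapse the whole chain into one weight per chronology.
site_weights = rotated @ beta[1:]
print(np.allclose(recon, beta[0] + Z @ site_weights))   # True: linear in the inputs
print("chronologies entering with negative weight:", np.sum(site_weights < 0))
```

Strictly, the result is affine (there is an intercept), and it stays so only because the standardization uses fixed means and standard deviations; the collapsed weights, not the PCs themselves, determine whether any chronology enters upside down.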


  1. Craig Loehle
    Posted Feb 2, 2016 at 5:18 PM

    A double face-palm doesn’t do this justice. Just the sloppiness of data handling and archiving by the dendros is pretty appalling, but if so many sites have to be tossed as “divergent” then what in the world is the basis for the claim that tree rings are thermometers?
    If a tree becomes crowded or sick or hit by lightning over time, it will in fact show reduced growth compared to an expected age curve (divergence). If another tree has neighbors die, it will grow faster in recent years. If the true limiting factor is rainfall or snowfall, then there is no actual relationship to temperature. And then there is the tree that actually is temperature limited. All of these trees are thrown into the hopper and the ones that correlate to temperature are selected by the sausage maker. I do not approve.

  2. Follow the Money
    Posted Feb 2, 2016 at 8:14 PM

    The moderate elevation at the tree-ring sites has allowed these trees to retain their temperature signal without evidence of the so-called divergence effect, or underestimation of tree-ring inferred temperature trends, which is observed at many northern latitude forest locations.

    Although logically delusional, perhaps we can take from this a tacit burial of the treeline treemometer kindergarten of thought. I.e., the divergence problem is big for latitudinal treeline rings, and probably the altitudinal ones too. Their solution: 1. find some trees that fit, 2. assess their elevation, 3. vacuously assert said non-treeline elevation produces better treemometers. (Gaspe…”moderate” elevation too??)

    we minimize divergence in the recent period that might bias our results


    • Craig Loehle
      Posted Feb 3, 2016 at 10:18 AM

      So, bristlecones are out then? Someone better send out a memo…

  3. Michael Jankowski
    Posted Feb 2, 2016 at 9:27 PM

    “…Targeting such sites, we minimize divergence in the recent period that might bias our results and thus provide a more accurate assessment of contemporary warming relative to previous centuries…”

    Priceless. Divergence is a bias, but selection bias is not a bias?

  4. Michael Jankowski
    Posted Feb 2, 2016 at 9:35 PM

    “…It is possible that inclusion of the unarchived data from Wright Mountain and Eyak Mountain will reconcile the differences…”

    Yeah, “possible.” You’re too kind. The shape of Wiles et al 2014 and your emulation match so very well. It would seem remarkable that the inclusion of unarchived data would retain almost the identical shape and yet shift the composite data to match your emulation. At face-value, it seems more like some undescribed processing or “poorly described RCS protocols” were performed.

    But even if it were just a case of incorporating unarchived data, you’re right – how ironic. I guess it just goes to show how “robust” (I am so tired of reading that word describing garbage) or not the results really are.

    Steve: prior to determining that D’Arrigo had cherrypicked individual trees in the CNWT regional composite, I would have considered that cherrypicking was limited to the selection of sites and did not extend to the tree level. Right now, it’s hard for me to see how Wiles could have got his 2014 version without cherrypicking at a tree level.

    • Clark
      Posted Feb 3, 2016 at 12:25 PM

      ” Right now, it’s hard for me to see how Wiles could have got his 2014 version without cherrypicking at a tree level.”

      Wouldn’t that be the logical extension of the original ex post screening philosophy? If a whole site is “unsuitable” because of divergence, why wouldn’t an individual tree also be considered by the same criterion?

  5. Posted Feb 3, 2016 at 1:33 AM

    But how are the trees doing as thermometers, has anybody ever compared regional tree rings with regional temperatures?

  6. Posted Feb 3, 2016 at 7:18 AM

    For those interested – 2007 paper here:

    In the 2007 study, RCS was not used to detrend the data. It states quite clearly:

    “To remove non-climatic biological age-related
    trends (Fritts 1976), the individual raw ring-width series
    were detrended using negative exponential functions
    or regression lines of negative/zero slope (Cook
    and Kairiukstis 1990). For 12 of the chronologies (DM,
    WP, RG, EX, LL, MR, NK, TM, MW, AP, MT and
    KI), the Cook and Peters (1997) power transform was
    used to reduce end effect inflation of resultant indices
    in some select series.”

    The focus of the study was on multi-decadal variability so the processing of the data was quite different to the regional GOA RCS chronology used in DWJ06. The full GOA recon shown in Figure 9a does not express any longer term secular variability as that has been removed via detrending.

    Yes – I did screen the data against the large regional temperature series. Agreed that this could lead to an inflated r2 value – easy to test. However, one thing you have not stated is that the sub-fossil chronology extension IS a simple mean and that it compares rather favorably with the PC regression living reconstruction – see Figure 7. I would see that as a form of validation.


    Steve: Thanks for the comment. I changed a few words in the relevant sentence of the postscript to note that the 22 screened chronologies used STD methods. Thanks for observing this. This point about Wilson et al 2007 does not affect the cherrypicking argument, nor does it affect the observations of the multivariate method of Wilson et al 2007. I value the ability to both agree and disagree with Rob civilly.
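The difference between per-series detrending and RCS that Rob Wilson describes can be illustrated with a small sketch. It fits a negative exponential plus a constant to each hypothetical series separately (grid search for the decay rate, least squares for the rest; a simplification of real detrending software, and without the straight-line fallback mentioned in the quoted Methods) and compares the result with dividing by a single shared curve. The two synthetic series have identical age trends, but the later one grows 30% faster throughout, standing in for a warmer period.

```python
import numpy as np

def fit_neg_exp(widths, b_grid=np.arange(1, 501) / 1000):
    """Least-squares fit of a*exp(-b*t) + k to one ring-width series.

    b is found by grid search and (a, k) by linear least squares; real
    detrending software is more elaborate and is not reproduced here.
    """
    t = np.arange(len(widths))
    best_sse, best_fit = np.inf, None
    for b in b_grid:
        A = np.column_stack([np.exp(-b * t), np.ones(len(t))])
        coef = np.linalg.lstsq(A, widths, rcond=None)[0]
        fit = A @ coef
        sse = np.sum((widths - fit) ** 2)
        if sse < best_sse:
            best_sse, best_fit = sse, fit
    return best_fit

def std_index(widths):
    """STD-style index: each series divided by its OWN fitted curve."""
    return widths / fit_neg_exp(widths)

t = np.arange(120)
base = 1.2 * np.exp(-0.05 * t) + 0.4        # hypothetical age-growth curve
early, late = base, 1.3 * base               # later cohort grows 30% faster throughout
print(std_index(early).mean(), std_index(late).mean())   # both 1.0 (to rounding)
pooled = (early + late) / 2                  # a single RCS-style curve (same ages)
print((early / pooled).mean(), (late / pooled).mean())   # ~0.870 and ~1.130
```

Per-series detrending returns an index of about 1.0 for both series, so the level difference between them, which is exactly the kind of long-term variability at issue, is absorbed into the fitted curves; the shared curve keeps it.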

    • davideisenstadt
      Posted Feb 3, 2016 at 7:30 AM

      Rob: I’ve started to digest the article to which you were kind enough to provide a link.
      One quick question: I noticed that you relied on a few data sets that were published by D’Arrigo… any idea if she employed the type of post hoc selection of the members of those data sets that she has endorsed?

    • Jeff Id
      Posted Feb 3, 2016 at 9:29 AM

      “Agreed that this could lead to an inflated r2 value”

      Is there some possibility that it would not lead to inflated agreement? My answer is no, so a test is not required.

      What this screening is, is an ad-hoc form of regression utilizing binary (1 or 0) weighting. All of these regression methods do the same thing; they just have different weightings.

      What many of us are stating is that the practice is completely invalid, what I would like to have is an explanation of how we are wrong. We have all been looking for a rational explanation for years, and cannot get a serious answer.

      • Steve McIntyre
        Posted Feb 3, 2016 at 9:53 AM

        What this screening is, is an ad-hoc form of regression utilizing binary (1 or 0) weighting. All of these regression methods do the same thing; they just have different weightings.

        I absolutely and totally agree not just with the overall point, but with this way of expressing the point.

        • davideisenstadt
          Posted Feb 3, 2016 at 1:21 PM

          Yet Rob, in his article, explicitly states that he does this.
          How can it be that so many in this subdiscipline are so bereft of even the most cursory and superficial understanding of the foundations of statistical analysis?
          His own efforts entail “calibrating” his data to the instrumental record, and discarding those time series that don’t correlate well with it, with no real rationale other than that they somehow aren’t magic enough to convey some temperature signal…

    • Posted Feb 3, 2016 at 11:26 AM

      Prof. Wilson:

      How did you control for the wide range in local precipitation across the Gulf of Alaska? Your paper discusses temperature correlations with the PDO and NPI but I could find no discussion of precipitation other than a suggestion that several of the discarded sites were potentially in “drier” locations.

      For those who are unfamiliar with the region, micro-climates abound. Annual precipitation among locations varies by several meters (yes, meters). As a result, using records from, say, Sitka or Juneau would not necessarily reflect the local precipitation (or temperature?) at tree-ring sites.

      Specifically, your reference to drier locations notes that some discarded chronologies:

      …are located at more ‘interior’ sites around the extreme western end of Prince William Sound and are likely protected from the influence of the North Pacific resulting in drier site conditions.

      If one assumes, as implied by the quote, that “influence” from the North Pacific results in increased precipitation, how did you control for this factor when attempting to extract the similarly influenced temperature signal?

      And thank you for providing the link to your paper and your willingness to engage in the discussion.


  7. kenfritsch
    Posted Feb 3, 2016 at 10:14 AM

    It would appear that SteveM’s examples of selecting proxies after the fact, and his explanations of the problems that this incorrect procedure causes, will continue to fall on deaf ears in the dendro community. Wouldn’t it be great to hear a member of that community delve into the basic statistical issues and biases that post selection creates? Too many who discuss these issues, from all sides of the validity question, seem at some point to get distracted by other details of dendro investigations and papers that deal with temperature reconstructions. Put simply, without a valid prior selection process for proxies the reconstructions are not valid, even if they are accepted as such within the community.

    Even if one wants to ignore that basic selection problem and point to correlation coefficients between temperature and tree rings or tree rings reactions to temperature pulses such as from volcanic eruptions there remain statistical problems there.

    Tree rings obviously react to temperature and, unfortunately, to a number of other variables that can be climate or non-climate related. If, and this is a mighty big if, those other variables were occurring over time in a random fashion, it would be expected that in a sufficiently large sample those variables would cancel out. But by that reasoning all prior selected samples would have to be used, across the full range of temperature responses that would be expected. An after the fact selection would ruin the cancellation process.

    It is sometimes shown that tree rings react to volcanic pulses, and this should not be unexpected, as a sudden large change in temperature would tend to dominate over the other variables. The problem here is that the amplitudes of those responses are not necessarily in proportion to the expected temperature change. This leads to another issue: the high and low frequency responses of tree rings to temperature. A low frequency response from tree rings is probably what is needed to validate the tree ring as a reasonable thermometer for climate science, as it is the trend in temperature that is the important feature. The problem with that relationship between temperature and tree response is that, with two time series with large autocorrelations, an artificial and false correlation can be found with reasonably high probability, depending on the amount of autocorrelation. On the other hand, qualifying a tree ring proxy by its high frequency response to temperature can lead to a reasonable correlation while the trends actually diverge, which brings us back to the case of the tree ring responding in direction to temperature but not in proportion to the change in temperature.

    If SteveM were to present a quiz here on all these issues I would predict that no dendros would be takers and many of those who might call themselves skeptical of the validity of tree ring temperature reconstructions would fail or do poorly.
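The autocorrelation point can be checked with a short simulation. Under assumed conditions (pairs of independent AR(1) series, 100 “years” each, 2000 simulated pairs, and |r| > 0.2 as a rough two-sided 5% threshold for 100 independent values), the sketch below counts how often two unrelated series look “significantly” correlated as persistence increases.

```python
import numpy as np

def share_spurious(phi, n=100, trials=2000, seed=3, r_crit=0.2):
    """Fraction of pairs of INDEPENDENT AR(1) series with |r| > r_crit.

    r_crit = 0.2 is roughly the two-sided 5% threshold for n = 100 values
    that are genuinely independent from year to year.
    """
    rng = np.random.default_rng(seed)
    e = rng.normal(size=(trials, 2, n))
    x = np.empty_like(e)
    x[..., 0] = e[..., 0]
    for t in range(1, n):
        x[..., t] = phi * x[..., t - 1] + e[..., t]   # AR(1) recursion
    a = x[:, 0] - x[:, 0].mean(axis=1, keepdims=True)
    b = x[:, 1] - x[:, 1].mean(axis=1, keepdims=True)
    r = (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))
    return np.mean(np.abs(r) > r_crit)

for phi in (0.0, 0.5, 0.9, 0.98):
    print(f"phi = {phi:.2f}: {share_spurious(phi):.0%} of unrelated pairs look 'significant'")
```

For white noise (phi = 0) roughly 5% of pairs exceed the threshold, as they should; as phi approaches 1 the share grows substantially, which is why naive significance tests on smooth, trend-dominated series are unreliable.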

    • davideisenstadt
      Posted Feb 3, 2016 at 1:27 PM

      Thanks for reiterating this point.
      If one cannot, before the act, articulate a process and a rationale for discarding data, then what is happening is a post hoc selection of data that agree with one’s hypothesis…nothing more nothing less.

    • Steven Mosher
      Posted Feb 4, 2016 at 9:07 PM

      Ok Kenneth

      here is a question for you, david, craig, mc.

      prior to test I establish my selection criteria (environmental, etc.) and
      I include in this selection criteria RW correlation to local temperature, even the most crude type of correlation metric.

      When it comes to reconstructing, however, I use density.

      Steve Mc: I don’t have a clue what your question is here. Look, a proxy for temperature is supposed to have a correlation to temperature. Thermometers work because their physical properties correlate to temperature. The point is that – as I’ve discussed over and over – if you believe that white spruce chronologies at treeline (or whatever) are temperature proxies and you go and collect 31 of them, then you have to use all 31 in your study. You can’t de-select “divergent” chronologies ex post. Again, consider the situation of a portfolio manager. He can’t redo his portfolio afterwards. Surely this isn’t what you’re talking about.

      • davideisenstadt
        Posted Feb 5, 2016 at 1:27 AM | Permalink | Reply

        since you posted this and addressed it to me, among others, i will take a swing at responding to your post:

        “here is a question for you, david, craig, mc.

        prior to the test I establish my selection criteria (environmental, etc.) and I include in these selection criteria RW correlation to local temperature, even the most crude type of correlation metric.

        When it comes to reconstructing, however, I use density.”

        Is there a question in your comment?
        I don’t see one.

        I’m pretty certain that disregarding any data in your sample after you collected it because it didn’t correlate with local temperature is a post hoc selection of data, even if one uses density for the reconstruction… is this not true?
        In the end, aren’t you going to use density because it…correlates with local temperature?
        Are you implying that tree density is somehow correlated to something other than local temperature?
        Why the “Tennessee Two Step”?
        You may understand programming in R, but this key concept (post hoc selection) seems to escape you.

        And, BTW, what would be your concept of “the most crude type of correlation metric”?
        Isn’t this a situation with one dependent and one independent variable?
        Is there a more simple scenario in which one would wish to compute a correlation coefficient between two variables?

        The only rational reading of your selection criteria is that you establish that the only tree rings you will include are those that correlate to local temperature…
        you are then going to use these data to show what?
        that they are correlated to local temperature?
        geez mosh.

        • HAS
          Posted Feb 5, 2016 at 4:06 AM | Permalink

          I think the problem is that referring to ex-post suggests the problem arises because you are discarding data after having eyeballed the results. This is of course a problem, but you will have potential problems with inference from a sample that is systematically biased by any selection criteria.

          Take an analogy. We know the weight over time of members of a community and for a period we know the general level of deprivation faced by it and we want to estimate this in times gone by. Ex-ante we specify that we will only use those members of the community where there is 95% significant correlation between the deprivation and weight measure (or if we’re Steven Mosher we say height and deprivation). No ex-post funny business.

          But the problem is that we want to get an estimate of the deprivation the community faced, not the deprivation these individuals faced. This subsample is very likely not representative within sample, let alone useful for drawing inferences out of sample.

          An example is that individual weight vulnerability/sensitivity to deprivation at a given time is likely to be a function of a wide range of other factors. Self-sacrifice by older community members would be an example. Assuming the same relationship applied when they were much younger would be quite wrong.

          So it isn’t just ex-post selection, it is the risk of bias from any selection process that isn’t random within the population of interest.

        • Patrick M.
          Posted Feb 5, 2016 at 7:31 AM | Permalink

          “i include in this selection criteria RW correlation to local temperature,”

          Substitute “the price of gold” for “local temperature.”

          It seems to me that if your selection criteria is correlated to what you are looking for then you are going to find it, right?

        • Patrick M.
          Posted Feb 5, 2016 at 7:50 AM | Permalink

          Okay I think I see what Steven Mosher is doing. Correlate to ring WIDTH and then reconstruct against ring DENSITY, implying that ring width is independent of ring density, or that it is just as independent of ring density as perhaps elevation or tree species or other selection criteria used prior to reconstruction.

          Steve: perhaps, but it still doesn’t make sense. Both density and ring width are believed by dendros to be correlated to temperature, so density cannot be “independent” of ring width.

          But the main defect is that averaging is a time tested and well understood way of getting the central limit theorem to work. There is ZERO purpose in trying to figure out some complicated way of doing things worse – though that seems to be one of the principal preoccupations of paleoclimatologists these days.

        • davideisenstadt
          Posted Feb 5, 2016 at 8:21 AM | Permalink

          Either way, what Mosh is attempting to suggest (I think) is that if one establishes before the fact that one intends to screen after collecting the data, this is an acceptable practice…
          In other words, premeditated post hoc screening doesn’t count.

        • Patrick M.
          Posted Feb 6, 2016 at 12:18 PM | Permalink

          Steve Mc replied:
          “Steve: perhaps, but it still doesn’t make sense. Both density and ring width are believed by dendros to be correlated to temperature, so density cannot be “independent” of ring width.

          But the main defect is that averaging is a time tested and well understood way of getting the central limit theorem to work. There is ZERO purpose in trying to figure out some complicated way of doing things worse – though that seems to be one of the principal preoccupations of paleoclimatologists these days.”

          I’m guessing that Mosher’s point is that choosing a tree based on elevation because trees at a certain elevation are known to correlate to temperature is not much different than choosing them by correlation to ring width. Both criteria are based on trying to find trees that do correlate to temperature.

          BUT I feel that this still allows, (actually encourages), cherry picking. Does it make any sense to go half way with a process like:

          1. Collect ALL available data that has both ring width and ring density data.
          2. Screen the data based on correlation to ring width, BUT the correlation will be based on a randomly selected sample of the tree’s lifetime. (Example: A tree has 200 years of data. Dividing the total years by 2 gives 100 years. We select a random 100-year interval in that tree’s data to correlate ring width for that tree. Repeat this process for each tree.)
          3. Now take a simple average of ring density for the trees that passed the screening.

          This way Mosher gets to screen based on ring width, and yet because the screening sample was drawn from a random part of each tree’s lifetime, it becomes harder to cherry pick hockey sticks. Because the screening is objective, there is no need for any human pre-screening, so ALL tree data can be used regardless of elevation or species, etc. Using simple averaging eliminates the possibility of eliminating trees through weighting.

          I would leave the correlation parameters up to statistical experts like you. Since I know very little about statistics or trees I will just go back to lurking. :)
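Patrick's three-step procedure can be sketched in Python as below. This is one hypothetical reading: since a correlation with temperature needs instrumental data, the random half-length window is drawn within the years where each tree overlaps the temperature series, and the 0.3 correlation threshold (`min_r`) is an arbitrary placeholder. Tree records are assumed to be dicts with year-aligned `rw` (ring width) and `mxd` (density) lists; none of these names come from any published code.

```python
import math
import random


def pearson_r(a, b):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def passes_random_window_screen(rw, temp, rng, min_r=0.3):
    """Step 2: correlate ring width with temperature over a randomly placed
    window covering half of the overlapping years (hypothetical reading)."""
    n = min(len(rw), len(temp))
    width = n // 2
    start = rng.randrange(0, n - width + 1)
    window = slice(start, start + width)
    return pearson_r(rw[window], temp[window]) >= min_r


def screened_density_mean(trees, temp, min_r=0.3, seed=0):
    """Step 3: unweighted mean of density for the trees that pass step 2.

    Each tree is a dict with 'rw' and 'mxd' lists aligned year by year with
    `temp` (an assumption of this sketch). Returns (mean_series, n_kept).
    """
    rng = random.Random(seed)
    kept = [t for t in trees
            if passes_random_window_screen(t["rw"], temp, rng, min_r)]
    if not kept:
        return [], 0
    years = len(kept[0]["mxd"])
    mean = [sum(t["mxd"][i] for t in kept) / len(kept) for i in range(years)]
    return mean, len(kept)
```

As Patrick notes, this still involves screening against temperature; the random window only changes which years each tree is judged on.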

        • Steven Mosher
          Posted Feb 8, 2016 at 3:01 PM | Permalink

          The question is: do rw mxd and blue intensity all necessarily correlate? For example, does a divergence in rw necessarily imply a divergence in blue intensity? And do they all reconstruct the same season? I think it’s a bit trickier than the simple screening fallacy. Of course, if rw and density were strictly linearly related, it would be easy to answer that screening on rw after the test and then just switching to density was a ploy.

        • Steve McIntyre
          Posted Feb 8, 2016 at 6:55 PM | Permalink

          rw mxd and blue intensity all necessarily correlate

          MXD and RW have weak correlation, sometimes Mannian weak. Which makes it rather hard for both of them to be “proxies” for temperature since a proxy by definition has to have a linear relationship to temperature.

          If you looked at plots of data – as Willis does- you would not be quite so quick to assume that there’s a meaning to the squiggles.

          You’ve spent so much time on thermometers – which actually do measure something and it’s only a matter of teasing out biases – that I think that you’re falling into the trap of assuming that “proxies” are a sort of noisy thermometer – the Phil Jones problem – and that the problems can be cured by math. But if the proxies do not have a consistent relationship to temperature, the problem is completely different than the one that you’re used to.

        • kenfritsch
          Posted Feb 8, 2016 at 7:37 PM | Permalink

          Steve Mosher and Patrick M, if the dendro is confident in what criteria to use ex post facto for selecting proxies that respond reasonably well to temperature changes, that proposition could be tested by selecting on those criteria a priori and then using all of the selected data for correlation to the instrumental data. To avoid snooping at data already obtained, the dendro would have to go to the field for new data for testing against the a priori criteria.

          Why is this not a dendro project? Maybe they are like those Mosher calls skeptics and accuses of this: they would rather talk about it and conjecture than do the hard work, or maybe they would rather not know in such a conclusive manner. An alternative is that they just might not know any better or might not want to admit that there is a problem here.

      • jferguson
        Posted Feb 8, 2016 at 11:14 PM | Permalink | Reply

        Must a proxy have a linear relationship? Wouldn’t a known relationship do the job? logarithmic? Or is linear built into the biology?

        Steve: I think that monotonic is the better word.

        • HAS
          Posted Feb 9, 2016 at 12:25 AM | Permalink

          I would have thought that as long as every proxy value corresponds to just one temp, that is probably good enough to work with.
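The monotonic (rather than strictly linear) condition, and HAS's one-temperature-per-proxy-value version of it, can be sketched as a check on a tabulated calibration curve: a strictly monotonic curve can be inverted, here by linear interpolation, while a curve that turns over cannot. The tabulated curve and the interpolation scheme are assumptions for illustration, and the sketch ignores the noise that real proxies carry.

```python
from bisect import bisect_left


def is_strictly_monotonic(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(b > a for a, b in pairs) or all(b < a for a, b in pairs)


def invert_calibration(temps, proxy_vals, p):
    """Map a proxy value back to temperature using a tabulated curve.

    Uses linear interpolation between tabulated points. Refuses to invert
    a non-monotonic curve, because then one proxy value can correspond to
    more than one temperature.
    """
    if not is_strictly_monotonic(proxy_vals):
        raise ValueError("calibration curve is not monotonic; inverse is ambiguous")
    pairs = sorted(zip(proxy_vals, temps))
    ps = [q for q, _ in pairs]
    ts = [t for _, t in pairs]
    if not ps[0] <= p <= ps[-1]:
        raise ValueError("proxy value lies outside the calibration range")
    i = bisect_left(ps, p)
    if ps[i] == p:
        return ts[i]
    # Linear interpolation between the two bracketing tabulated points.
    p0, p1 = ps[i - 1], ps[i]
    t0, t1 = ts[i - 1], ts[i]
    return t0 + (t1 - t0) * (p - p0) / (p1 - p0)
```

For example, with temps [0, 1, 2, 3, 4] and proxy values [0, 2, 4, 6, 8], a proxy reading of 5 maps back to 2.5, while a humped curve such as [0, 1, 2, 1, 0] is rejected with a ValueError.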

        • davideisenstadt
          Posted Feb 9, 2016 at 4:27 AM | Permalink

          yes…but tree growth is dependent on a myriad of factors, and the trees’ responses aren’t monotonic in any of them. This fact makes TRs great candidates for an accurate, precise proxy capable of giving high resolution data (from a temporal perspective).
          what could go wrong?

  8. Posted Feb 3, 2016 at 12:07 PM | Permalink | Reply

    Correct me if I am wrong, but your assumption is that all the trees we have sampled are behaving in the same way w.r.t. their response to temperature and other factors.
    That is of course not the case.

    We sample many trees per site to derive a mean chronology which maximises the common response.
    Site selection will help ensure that common response is related to a climate variable that we would like to reconstruct.

    So a basic rule is that high latitude/elevation tree-line will be controlled predominantly by temperature and likewise low latitude/elevation tree-line will be controlled predominantly by moisture availability.

    The 31 chronology network for the Gulf of Alaska is a rather mixed network of sites from low to high elevations and different species. We cannot expect them all to respond similarly to climate and as stated other factors may influence growth. Not all of these sites were sampled specifically for dendroclimate analyses. Greg Wiles is a glaciologist and some sites were developed purely for his dendrogeomorph dating etc.

    So – screening is one often used method to identify the sites that best express the “desired” signal.

    So – in my 2007 paper, I screened the 31 sites and 22 expressed a significant correlation with Jan-Sep temperatures. I used those for further analysis. But also look at Figure 2 – PC1 and 3 represent trees with quite different responses. This is not a simple issue – in fact this is a typical situation of working with ring-width. It is almost always simpler when using density based variables.

    Anyway – the resulting 22-site chronology PC regression analysis returns an overall ar2 value of 0.44, with a Durbin-Watson value of 1.87 for the residuals (no linear trend in residuals).

    If I had used all 31 chronologies, the results actually would be better with an ar2 of 0.49 (DW = 1.98).

    If I create a simple mean of the 31 sites, the r2 value is only 0.21 (DW = 1.54). Steve will likely say that this is the correct approach and this is the actual amount of variance explained by such trees in this region, but I would argue that that is nonsense as it does not take into account that some sites are more optimally located than others w.r.t. temperatures response.

    The good news is that we’re busy measuring Blue Intensity in this region and this should improve the calibrations substantially and reduce this ambiguity that is keeping Steve up at night.
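For reference, the Durbin-Watson statistic quoted above is computed from regression residuals e_t as DW = sum((e_t - e_{t-1})^2) / sum(e_t^2). Values near 2 indicate little lag-1 autocorrelation in the residuals, and values well below 2 indicate positive autocorrelation (roughly DW = 2(1 - r1), where r1 is the lag-1 autocorrelation). A minimal Python sketch, with simulated AR(1) residuals standing in for the real ones, which are not available here:

```python
import random


def durbin_watson(residuals):
    """Durbin-Watson statistic of a sequence of regression residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


def simulated_residuals(n, phi, seed=0):
    """AR(1) residuals with lag-1 coefficient phi (a stand-in for real residuals)."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        e.append(phi * e[-1] + rng.gauss(0.0, 1.0))
    return e


if __name__ == "__main__":
    # For long series DW should land near 2 * (1 - phi).
    for phi in (0.0, 0.5, 0.8):
        print(phi, round(durbin_watson(simulated_residuals(2000, phi)), 2))
```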

    • Salamano
      Posted Feb 3, 2016 at 12:54 PM | Permalink | Reply

      “If I create a simple mean of the 31 sites, the r2 value is only 0.21 (DW = 1.54). Steve will likely say that this is the correct approach and this is the actual amount of variance explained by such trees in this region, but I would argue that that is nonsense as it does not take into account that some sites are more optimally located than others w.r.t. temperatures response.”

      Isn’t it possible, though, that no matter how restrictively you define ‘optimally located’, you can still get trees that you must ‘screen out’ after the fact using observed temperature? Would this not tilt the scale closer to what Steve is saying about true tree variance, or are you just that confident that all of the rejected trees have a definite locational basis to be excluded?

    • HAS
      Posted Feb 3, 2016 at 1:37 PM | Permalink | Reply

      What I don’t understand is why you don’t incorporate the latitude/elevation model into your temp model and fit the whole lot.

      • Posted Feb 3, 2016 at 3:37 PM | Permalink | Reply

        Judging from the Lat-Long and elevation data of Wilson’s Table 1, the location of the Kenai Lowlands site (“LL”) must be upstream from the head of Tutka Bay on the Kenai Peninsula. Table 1 reported Lat-Long of 59.41–151.25 and elevation of 20 meters.

        However, his screening process rejected the nearby Grewingk Glacier site (“GW”) with a reported Lat-Long of 59.37–151.09 and the same reported elevation of 20 meters. It seemed odd to me that two sites in the same area with the same elevation would produce different results in his screening process and I thought perhaps it was due to glacial influence on local temperature. However, when I tried to find the approximate GW location on a topo map it appeared to be in an area of greater than 2000 feet in elevation. It seems that the location, elevation or both might be incorrectly reported.

        This makes me wonder whether any quality control review was conducted on the reported locations and elevations of the 31 sites shown in Table 1.

    • Jeff Id
      Posted Feb 3, 2016 at 2:28 PM | Permalink | Reply

      I’m sorry, is this in response to me? It is out of sequence so I don’t want to assume.

    • davideisenstadt
      Posted Feb 3, 2016 at 5:26 PM | Permalink | Reply

      The whole point is that if you can’t articulate why one site is better than another BEFORE you analyze the data, you aren’t really finding anything but spurious correlations.
      That you first collect the data, run your regressions on it, and discard those time series that don’t conform to your hypothesis means that you’re not testing any hypothesis.
      The proper procedure, one that is familiar to even the most naive researchers, is to articulate your data collection procedure (in your case, a method for identifying the most promising areas), collect your data, and then let the chips fall wherever they may. If your data is noisy and doesn’t conform to your expectations, that means you haven’t figured out what makes a site promising.

      “…..screening is one often used method to identify the sites that best express the “desired” signal.”

      This is precisely the problem…the proper way to screen for good sites is to articulate beforehand just which sites you think may be good, and then test your hypothesis.
      You simply can’t go and sample sites, look for a correlation, throw out the data that don’t conform to your hypothesis, and then declare that those remaining sites express the desired signal.
      They merely appear to express the desired signal.
      Look, if I give you ten thousand time series, all red noise, all generated by a quasi-random process, and then mine them for correlations to any time series you can think of, I will find some that actually appear to correlate very well. If I then discard the rest, I have “shown” that whatever time series I’m investigating is actually correlated to red noise. But we know this isn’t the case.
      Steve has made this very point ad nauseam.
      Just because by employing post hoc screening you have created the appearance of a correlation doesn’t mean there is a meaningful correlation.
      You and your community would be well served to do a little reading regarding what the guys who invented this type of analysis, econometricians, have to say about what you are doing.
      They’ve been doing this since your parents were kids, and were fooled at one time, just as you are today.
      That’s how the term “spurious correlation” was first coined.
      The only difference is: they were fooled two or three generations ago; there’s no excuse for this practice today.
      Here is a suggestion: print this article, and the comment thread, make an appointment with a statistics professor at a university near to you, and ask her to explain this to you.
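David's ten-thousand-series thought experiment is easy to try at a smaller scale. The sketch below correlates 1,000 independent AR(1) red-noise series with a red-noise "target" and counts how many clear the naive 5% cutoff; because every series is unrelated to the target, each survivor is spurious. The pool size, series length of 100 and AR coefficient of 0.8 are assumptions chosen to keep it quick.

```python
import math
import random


def ar1_series(n, phi, rng):
    """Stationary AR(1) series x[t] = phi * x[t-1] + e[t], with e ~ N(0, 1)."""
    x = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def pearson_r(a, b):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mine_red_noise(n_series=1000, n=100, phi=0.8, seed=7):
    """Correlate unrelated red-noise series with a red-noise target.

    Returns (how many pass the naive 5% cutoff, largest |r| found).
    Every correlation here is spurious: all series are independent.
    """
    rng = random.Random(seed)
    target = ar1_series(n, phi, rng)
    cutoff = 1.96 / math.sqrt(n)
    rs = [pearson_r(ar1_series(n, phi, rng), target) for _ in range(n_series)]
    n_pass = sum(1 for r in rs if abs(r) > cutoff)
    return n_pass, max(abs(r) for r in rs)


if __name__ == "__main__":
    n_pass, best = mine_red_noise()
    print(f"{n_pass} of 1000 unrelated series 'pass'; best |r| = {best:.2f}")
```

With independent white noise about 50 of 1,000 would pass by chance; with strongly autocorrelated series many more should, and the best of them can look like a respectable proxy.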

      • Posted Feb 4, 2016 at 3:03 AM | Permalink | Reply

        but we don’t sample randomly – we have made this point ad nauseam as well

        • davideisenstadt
          Posted Feb 4, 2016 at 8:08 AM | Permalink

          you make my point for me.
          One of the fundamental underlying assumptions of statistical analysis is that once you set your criteria before you sample (for example: species of tree; location of the trees to be sampled in terms of latitude, longitude and altitude; relationship to existing or historic tree line; orientation, i.e. southwest exposure), you then sample randomly, or if possible universally (that is, survey the entire population).
          You don’t first sample, then look for trees that respond to temperature in your calibration period and discard those trees which provide you with an inconvenient signal.
          Your response is, in a nutshell, what is wrong.
          I can only ask you again to employ an individual with a robust background in applied statistics, and pay them for an hour or two to explain to you just why your procedure is guaranteed to produce spurious correlations.
          Perhaps if you pay for someone’s time, you may take their advice.

        • sue
          Posted Feb 5, 2016 at 2:37 AM | Permalink

          Rob, your field is not the only one falling into this problem. Protocols need to be established beforehand and ALL data shown, even if it’s to explain why some data was dropped. http://www.buzzfeed.com/tomchivers/how-science-journals-are-hiding-bad-results#.osMyKK2r1o

        • sue
          Posted Feb 5, 2016 at 2:41 AM | Permalink

          Another interesting article, yet another field: http://edge.org/conversation/richard_nisbett-the-crusade-against-multiple-regression-analysis

      • Craig Loehle
        Posted Feb 4, 2016 at 9:53 AM | Permalink | Reply

        To reinforce David’s point, with R2 values around .2 to .5 (what I would consider very weak relationship) it is almost guaranteed with many series that you can get spurious correlations.
        The other problem is that in an exploratory mode, where you find that recent growth seems related to June precip and winter temperature (for example), you have formed a hypothesis. Note that you have not yet proven anything. But extending the relationship back 500 or 1000 years to derive a temperature timeseries you have added implicit ad hoc hypotheses that the relationship is stable over time and that the recent correlation is the true one, and not spurious. BUT you have not shown either of these to be true. It is castles in the air.

        • davideisenstadt
          Posted Feb 4, 2016 at 11:22 AM | Permalink

          Thank you for articulating this point far better than I am capable of doing.
          It is so tempting to mine a data set for correlations, especially now that there is no real cost( financial, or in terms of time or effort) associated with doing so…
          After all, what can be wrong with testing every possible relationship in the world to see if one can get an R-squared of 0.5 or so for a particular set of independent and dependent variables?
          There are so many examples of malfeasance associated with ignoring the basic underlying assumptions of statistical analysis that one can only observe that academia isn’t doing its job of insisting that people who use these tools actually understand the reasons they are either appropriate or inappropriate.
          Is it so wrong to expect researchers to first identify the relationship they hypothesize, then articulate a protocol for collecting data and, in the event of encountering outlying data, a procedure for discarding that data, BEFORE mining it?
          How can it be that the entire dendroclimatological community so misapprehends the most basic concepts integral to statistical analysis?
          An in-law of mine, a world-renowned geneticist with tens of thousands of citations to his research, a person who travels the world to educate others, routinely employs statisticians to assist him.
          Are these climatologists better equipped to do this on their own?
          Wegman identified the teams’ reluctance to seek advice and counsel elsewhere a decade or so ago.
          Nothing has changed.
          Here we have a coauthor who by his comments shows himself to be abjectly ignorant of fundamental assumptions related to statistical analysis…what to do?
          It’s not as if the editors of our renowned peer-reviewed journals seem to be better acquainted with these concepts than Rob.
          BTW: I’m betting that he doesn’t have any more than the traditional two-semester intro sequence in statistics…you know…descriptive statistics and an intro to analysis of variance.
          It’s pretty clear to me.
          However, the unwillingness to read about the history of applied statistics, and an unwillingness to learn from the mistakes of others is simply unforgivable.

      • S. Geiger
        Posted Feb 4, 2016 at 10:55 AM | Permalink | Reply

        “That you first collect the data, run your regressions on it, and discard those time series that dont conform to your hypothesis means that youre not testing for any hypothesis.”

        – isn’t this the big disconnect? It seems they are already convinced the ‘hypothesis’ is correct (that trees can tell temps), and is not at issue anymore. They have moved on to finding those trees that demonstrably DO tell temps (well, in a couple of limited periods), and use them to come up with an out of sample temp history.

    • Don Keiller
      Posted Feb 3, 2016 at 6:14 PM | Permalink | Reply

      Rob, wouldn’t it be a lot less verbose and much clearer if you just said “We picked the data that showed what we wanted”.

      There, fixed.

      My pleasure, no charge.

      Steve: Rosanne D’Arrigo already said that as clearly as one could want.

    • mpainter
      Posted Feb 3, 2016 at 8:34 PM | Permalink | Reply


      A plausible justification for your methodology is what you offer. But it appears as weakly supported optimism, not old-fashioned scientific rigor.

    • Jeff Id
      Posted Feb 3, 2016 at 9:17 PM | Permalink | Reply

      Rob Wilson, after work today I took more time to reread your comment and believe I should have understood your response was to me – sorry about that.
      “Correct me if I am wrong, but your assumption is that all the trees we have sampled are behaving in the same way w.r.t. their response to temperature and other factors.”

      No, I completely understand that data is taken for different reasons in Paleoclimate. I also know it is used rather randomly by many in the paleo community despite the collection intent. If paleoclimate data were collected for an intent and then used without sorting based on agreement with the predictand, there would be no issue.

      “If I create a simple mean of the 31 sites, the r2 value is only 0.21 (DW = 1.54). Steve will likely say that this is the correct approach and this is the actual amount of variance explained by such trees in this region, but I would argue that that is nonsense as it does not take into account that some sites are more optimally located than others w.r.t. temperatures response.”

      Calling a simple mean of your data “nonsense” is a rather unique statement in science. I’m a little sorry for the bold, but it is an unusual moment. Yes, I understand that you get a better answer if you sort out (regress away) the less agreeable data, but we can all do that in every field. In the business world, it would be unusually exciting.

      I also understand that some tree sites are more optimally positioned to respond to temperature. What doesn’t make any sense whatsoever is the choice of preferred tree information by only support of correlation with the predictand. If you had some independent characteristic of thermometer-trees such as altitude, dryness, color, height, etc. by which they could be pre-sorted and a simple mean used, there would be no issue.

      Ok, there is an error factor in all regressions: independent noise in the data. Rejection of any data based on correlation has an error component (noise) which will cause the algorithm to reject a percentage of otherwise good data and accept a percentage of bad. This noise component is very significant in tree rings: moisture, CO2, bugs, frost, measurement angle, etc. By definition, the noise is randomly correlated to temp, so over a million subfossil trees you would get a very flat arithmetic mean. If this noise plus temperature is sorted in recent years ONLY for correlation to temperature (whichever regression form you like), you get a blade. And the unsorted bit will average to zero, creating the handle.

      Guaranteed hockeystick as long as the input data is noisy enough. No matter which regression method is used. My point is not nonsense, or we wouldn’t be able to prove it with random data having the same autocorrelation as tree rings. Again, please tell me where I’m wrong.

      What would be ok or even wonderful is a set of criteria which would identify thermometer-trees and which is independent of the measured characteristics. Tallness, shortness, greenness, dryness, altitude, soil, bugs, age, or any combination you can imagine. That would be a true dendro-revolution. Science done right is exciting!
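Jeff's blade-and-handle argument can be checked with a small simulation. In the sketch below the "proxies" are pure AR(1) noise with no temperature signal at all; the "temperature" is a linear ramp over the final 100 years, which serve as the calibration period. Proxies are kept only if their calibration-period correlation with the ramp exceeds 0.3, and the kept proxies are averaged. Series length, number of proxies, AR coefficient and threshold are illustrative assumptions.

```python
import math
import random


def ar1_series(n, phi, rng):
    """Stationary AR(1) series x[t] = phi * x[t-1] + e[t], with e ~ N(0, 1)."""
    x = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def pearson_r(a, b):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def screened_composite(n_proxies=1000, n_years=600, n_cal=100, phi=0.9,
                       min_r=0.3, seed=3):
    """Average of pure-noise 'proxies' that pass a correlation screen.

    'Temperature' is a linear ramp over the last n_cal years (the
    calibration period). Returns (composite, number_kept).
    """
    rng = random.Random(seed)
    ramp = [float(i) for i in range(n_cal)]
    kept = []
    for _ in range(n_proxies):
        p = ar1_series(n_years, phi, rng)
        # Screen only on the calibration period, as described in the thread.
        if pearson_r(p[-n_cal:], ramp) > min_r:
            kept.append(p)
    if not kept:
        return [], 0
    composite = [sum(col) / len(kept) for col in zip(*kept)]
    return composite, len(kept)


if __name__ == "__main__":
    comp, k = screened_composite()
    handle = sum(comp[:400]) / 400   # long pre-calibration stretch
    blade = sum(comp[-20:]) / 20     # end of the calibration period
    print(f"{k} proxies kept; handle mean {handle:.2f}, blade mean {blade:.2f}")
```

The composite should sit near zero before the calibration period and rise through it, even though no proxy contains any signal; the size of the blade depends on the assumed autocorrelation and threshold.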

      • Jeff Id
        Posted Feb 3, 2016 at 9:36 PM | Permalink | Reply

        Rob W,

        Also, sorting (data plus noise) by correlation or other regression methods produces a guaranteed variance increase in the sorting period relative to history. I’m not stating that it sometimes does, or that it needs to be tested, it absolutely 100% guarantees increased variance in the screened time period.

        A fun link from my now distant past: https://noconsensus.wordpress.com/2008/10/11/will-the-real-hockey-stick-please-stand-up/
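Jeff's variance point can be illustrated the same way, this time with a trendless target so that any change comes from the screening alone. Here both the "temperature" (calibration period only) and the "proxies" are independent white noise; proxies whose calibration-period correlation exceeds 0.2 are averaged, and the variance of the composite inside the calibration period is compared with its variance in the earlier years. The setup and threshold are assumptions for illustration; the sketch demonstrates the effect in one simulated case rather than proving it in general.

```python
import math
import random


def pearson_r(a, b):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def variance(xs):
    """Sample variance with the n - 1 denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def variance_ratio(n_proxies=2000, n_years=300, n_cal=100, min_r=0.2, seed=5):
    """Calibration-period variance of a screened composite over earlier variance.

    Target and proxies are independent white noise (no signal, no trend).
    Returns (variance ratio, number of proxies kept).
    """
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_cal)]
    kept = []
    for _ in range(n_proxies):
        p = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
        if pearson_r(p[-n_cal:], target) > min_r:
            kept.append(p)
    if not kept:
        raise ValueError("no proxies passed the screen; lower min_r or add proxies")
    composite = [sum(col) / len(kept) for col in zip(*kept)]
    early, late = composite[:-n_cal], composite[-n_cal:]
    return variance(late) / variance(early), len(kept)


if __name__ == "__main__":
    ratio, k = variance_ratio()
    print(f"{k} proxies kept; calibration/earlier variance ratio {ratio:.2f}")
```

Before the calibration period the kept series are just averaged noise; inside it they share a component correlated with the target, so the composite's variance there should come out noticeably larger.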

        • davideisenstadt
          Posted Feb 4, 2016 at 12:07 AM | Permalink

          I’m afraid that your point will fall upon deaf ears.
          The entire enterprise is built on a foundation of post hoc selection.
          Without the ability to “calibrate” to the instrumental record, and discard data that don’t correlate with it, what one has is a noisy, relatively poorly correlated data set.
          Why can’t you see the utility of choosing which magic trees to use?

      • Posted Feb 4, 2016 at 3:23 AM | Permalink | Reply

        I stated above that whether one uses the whole data set or a screened subset, a PC regression approach will result in reasonable calibrations. These are not random data and most of these sites have been sampled from stands where growth should be limited by temperature variability. Some are not, however.

        But let’s keep this realistic, the calibrations are modest at best around 40%. I also agree that there is potential for inflation of the r2 value and that is why independent period validation is important to identify whether there is any over-fitting.
        So – there are two aspects to this work that you (CA readers) are missing:
        1. true independent validation of the reconstruction outside the screening period – Figure 3a – against early instrumental data.
        2. the sub-fossil chronology (Figure 7c) is a simple mean of the available RW data from the relic samples. This time-series clearly coheres well with the PC reg nested reconstruction – at least when replication is high.

        so finally – in the spirit of moving on – as I discussed at the end of the N-TREND paper, using ring-width data alone generally only leads to mediocre calibration. The measurement of tree ring density (or related variables) to complement the RW data will improve the fidelity of the GOA reconstruction substantially.

        • Nathan Kurz
          Posted Feb 4, 2016 at 4:30 AM | Permalink

          Hi Rob. I love that you’re here and willing to engage with this critical audience. Thanks for sticking around! But I don’t know if you understand yet the level of uneasiness that some of us feel about selecting sites based on correlation to instrumental data. From the focus of your clarifications, I think you still must be underestimating how viscerally your approach strikes some of us as perilous.

          I’m fighting to find the right analogy. Maybe it’s as if we heard a friend saying “Yes, I’ve had a few beers, but actually I drive better with some alcohol in my system”. Maybe what you are doing is safe in this particular case, but it scares us that it’s probably not and that you might not understand the consequences. Adding variables to improve the fidelity would be great, but only if it’s done by choosing the sites based on prespecified characteristics rather than “peeking” at the data.

          I thought David’s advice above was great: “print this article, and the comment thread, make an appointment with a statistics professor at a university near to you”. Do it privately and non-confrontationally, ideally with someone who isn’t already familiar with your field. Present it as “I think these people are overreacting because they don’t understand what I’m actually doing”. Maybe you are right, and your approach is justified in this case, but from the outside anything that involves choosing only “verified” sites feels really dangerous.

        • davideisenstadt
          Posted Feb 4, 2016 at 8:11 AM | Permalink

          Regressing on density doesn’t in any way solve the problem created by your sampling, calibration and screening protocol.
          The problem is with post hoc screening. Whether one is measuring tree ring width, density, or any other factor, the fact that you calibrate and then discard is the problem.
          Willful obtuseness isn’t a good quality for any adult to possess.
          I’m sorry to ask this question, but your response basically demands it:
          Do you employ a professional statistician, someone with a degree in statistics, to oversee and advise you on your application of these tools?

        • Layman Lurker
          Posted Feb 4, 2016 at 10:18 AM | Permalink


          “These are not random data and most of these sites have been sampled from stands where growth should be limited by temperature variability.”

          Not correct. Each sampled TR series (even one meeting the most impeccable ex ante criteria) is composed of signal plus noise. I suggest subtracting each TR series from the known ‘signal’ temperature in the calibration period, then running these ‘noise’ components through your calibration and validation regressions. If you are ‘catching’ any of this ‘noise’ in your method, then you will know you have produced a biased result. The ‘selection bias’ in this case is a result of selecting ‘noise’ which correlates with temperature. Simple averaging (no data screening) allows the unbiased composite noise (given sufficient replication) to cancel to a slope of 0. Since calibrated noise does not cancel to a zero slope, it forcibly attenuates the ‘signal’ expressed in the reconstruction.
          HT RomanM: https://climateaudit.org/2012/06/17/screening-proxies-is-it-just-a-lot-of-noise/
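A minimal sketch of the test described above, under assumptions that are purely illustrative and not the procedure of any paper discussed here: a made-up "true" temperature with a +0.5 warm episode and a modern 0 to 1 trend, pseudo-proxies built as that signal plus AR(1) red noise, and a hypothetical screening threshold of r > 0.3. Screening keeps proxies whose noise happens to rise with the trend, which inflates the calibration slope of the composite and so shrinks the reconstructed past anomaly; the unscreened average largely avoids this.

```python
import random
import statistics

def ar1(n, phi, sd, rng):
    """AR(1) red noise: x[t] = phi * x[t-1] + e[t], e ~ N(0, sd)."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sd)
        out.append(x)
    return out

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def warm_amplitude(proxies, temp, cal, warm, base):
    """Average the proxies, regress temperature on the composite over `cal`,
    and return the reconstructed warm-minus-base anomaly."""
    comp = [statistics.fmean(col) for col in zip(*proxies)]
    x = [comp[t] for t in cal]
    y = [temp[t] for t in cal]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    recon = [my + b * (c - mx) for c in comp]
    return (statistics.fmean([recon[t] for t in warm])
            - statistics.fmean([recon[t] for t in base]))

rng = random.Random(42)
n_years, n_proxies = 400, 500
cal, warm = range(300, 400), range(100, 200)
base = [t for t in range(300) if t not in warm]
# Hypothetical "true" temperature: a +0.5 warm episode, then a 0 -> 1 trend.
temp = [0.5 if t in warm else 0.0 for t in range(300)] + [k / 99 for k in range(100)]
# Each pseudo-proxy is the true signal plus its own red noise.
proxies = [[s + e for s, e in zip(temp, ar1(n_years, 0.5, 2.0, rng))]
           for _ in range(n_proxies)]

t_cal = [temp[t] for t in cal]
screened = [p for p in proxies if corr([p[t] for t in cal], t_cal) > 0.3]

amp_all = warm_amplitude(proxies, temp, cal, warm, base)
amp_screened = warm_amplitude(screened, temp, cal, warm, base)
print(f"{len(screened)} of {n_proxies} proxies pass screening")
print(f"true warm anomaly 0.50 | all proxies {amp_all:.2f} | screened {amp_screened:.2f}")
```

The noise level, autocorrelation and threshold are arbitrary; changing them changes the size of the attenuation, but the direction comes from the screening step itself.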

        • Jeff Id
          Posted Feb 4, 2016 at 11:00 AM | Permalink


          “But let’s keep this realistic, the calibrations are modest at best around 40%. I also agree that there is potential for inflation of the r2 value and that is why independent period validation is important to identify whether there is any over-fitting.”

          The problem isn’t a ‘potential’ for inflation of R2, it is a mathematically guaranteed inflation 100% of the time.
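The inflation point can be checked with pseudo-proxies whose true correlation with temperature is known in advance. This sketch is an illustration, not anyone's published method: it assumes white-noise proxies with a population correlation of 0.3 over 50 calibration years and a hypothetical screening threshold of r > 0.28 (about the two-sided 5% level for 50 independent years).

```python
import random
import statistics

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(7)
n_years, n_proxies, rho = 50, 2000, 0.3
noise_sd = (1 / rho**2 - 1) ** 0.5   # makes the population corr(proxy, temp) equal rho

raw = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
mu, sd = statistics.fmean(raw), statistics.pstdev(raw)
temp = [(x - mu) / sd for x in raw]  # unit-variance "instrumental" record

# Every proxy has the same true skill: signal plus independent white noise.
rs = [corr([t + rng.gauss(0.0, noise_sd) for t in temp], temp) for _ in range(n_proxies)]

threshold = 0.28  # about the two-sided 5% level for 50 independent years
screened = [r for r in rs if r > threshold]
mean_all, mean_screened = statistics.fmean(rs), statistics.fmean(screened)
print(f"true r {rho:.2f} | mean r, all proxies {mean_all:.2f} | mean r, screened {mean_screened:.2f}")
```

Since every proxy has identical true skill, the higher average correlation of the screened subset is produced entirely by the selection step.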

          The validation you refer to is often only a validation that you have accurately sorted an upsloping dataset. You could argue that you have selected truly temperature-sensitive trees, but the validation and calibration windows are often too close together to be truly independent data. And even if we assume that you have correctly identified thermometer trees from the noise in the calibration period, you have amplified the agreeable noise portion of these trees (often dramatically) by mathematics that are not justified or vetted. Even a perfectly done validation process does not change this fact in any manner whatsoever.

          In my experience you can even pass some of the “validation” tests using random data simply due to the autocorrelation. M08 had a cute method in it but it is outside of the scope here.

          If we assume that the amplification of the good noise were minimal (which tests on random autocorrelated data show it typically is not), we then have to remember that no real attempt to sort older subfossil samples is made (or is even possible), and that is the #1 point I am making. The noise that your well-chosen cores in the calibration range all exhibit is averaged out in these older (typically more numerous) cores. By treating the two parts of the series in a statistically different fashion, you have literally guaranteed a higher-variance signal in recent years with a proportionally flatter handle. This is true even in highly temperature-correlated datasets when multiple series are individually scaled to temperature.
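The "flatter handle" mechanism can be illustrated with series that contain no temperature signal at all. This sketch assumes AR(1) noise with lag-1 autocorrelation 0.7, a 100-year calibration window whose target is a simple linear trend, and a screening threshold of r > 0.3; all of these are hypothetical choices. The screened series agree with one another only inside the calibration window, so their average shows a coherent "blade" there and a low-variance "handle" everywhere else.

```python
import random
import statistics

def ar1(n, phi, rng):
    """AR(1) red noise with unit-variance innovations."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

rng = random.Random(2016)
n_years, n_series, cal_len = 500, 600, 100
cal = slice(n_years - cal_len, n_years)
trend = list(range(cal_len))  # stand-in for a warming instrumental record

# Pure noise: no series contains any temperature signal.
series = [ar1(n_years, 0.7, rng) for _ in range(n_series)]
screened = [s for s in series if corr(s[cal], trend) > 0.3]
composite = [statistics.fmean(col) for col in zip(*screened)]

handle, blade = composite[:n_years - cal_len], composite[cal]
print(f"{len(screened)} of {n_series} noise series pass screening")
print(f"handle sd {statistics.pstdev(handle):.2f} | blade sd {statistics.pstdev(blade):.2f} | "
      f"blade-trend r {corr(blade, trend):.2f}")
```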

          Any regression method you choose creates the same problem. That is why a simple average of all data is really the best you can mathematically do until some other physical feature of the thermometer trees allows you to pick them equally through the entire length of the series.

          Even in manually choosing not to use sites which were expected to be temperature sensitive but for some reason exhibited ‘divergence’ you are creating this same issue and that particular sorting is the most difficult to quantify – so NO attempts at this that I know of are being made in the field. I have a very hard time understanding how a scientist cannot easily see the problem, make tests on random but autocorrelated data and step back to consider the ramifications in this field.

          Seeing 54 series averaged together, most of them having these horrific statistical issues and with thermometer information regressed into the data at the end, was very disappointing from my perspective. Everyone should be very interested in understanding the magnitude of the problem, and I expect a recognition that a simple average is the best we currently have available. Use PC1 if you want to get fancy, but it makes little difference.

          Do you want to see how well a perfectly random autocorrelated dataset will overlay on Wilson2016 when sorted by some form of regression? It will be a nearly perfect match. The only defense would be “well, you rejected X percent of the data and we rejected far less, so there is some signal”, but due to the nature of site pre-selection in the field (to avoid “divergence”), we really don’t know how many tree series were actually rejected. Other papers have attempted this argument but failed miserably on closer examination, indicating that even their pre-selected series had no statistically verifiable signal. That was a surprising result to me at the time, because I’ve seen very high correlation MXD datasets and we know trees respond to temperature, etc.

          Dr. W, I do thank you for your time and hope you will continue to stop by once in a while, however I am completely unconvinced that your field has an understanding of the seriousness of this problem.

        • kenfritsch
          Posted Feb 4, 2016 at 3:07 PM | Permalink

          “1. true independent validation of the reconstruction outside the screening period – Figure 3a – against early instrumental data.”

          A truly out-of-sample test of a theory developed with in-sample data is always a good practice but unfortunately difficult to execute in practice.

          There is also the problem of selecting the overall time period in which the agreement between tree rings and temperature is reasonable. The out-of-sample data is then tainted by ex post facto selection too, just as it would be if one peeked at the out-of-sample data before committing to the in-sample period, or simply declined to use data that fails out-of-sample. It is often telling to compare in-sample and out-of-sample correlations.

        • foias
          Posted Feb 4, 2016 at 3:28 PM | Permalink

          Dendrob writes of “aspects to this work that you (CA readers) are missing:
          1. true independent validation of the reconstruction outside the screening period – .. – against early instrumental data.”

          This is a very good point in principle. Any amount of statistical skullduggery can be negated by rigorous cross-validation of an ‘algorithm’.

          Unfortunately, this requires enough good-quality data, and tests with high power, for the cross-validation to be meaningful. Roughly 40 years of autocorrelated data is just not going to work for standard statistical methods, as distinct from low-power, dodgy tests that exploit the way testing is biased towards the null hypothesis (here, “validation”). This is the problem in proxyland: this way out seems closed by data availability. Consequently, proxy selection has to be done without data mining etc etc.
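To put a rough number on the point about 40 years of autocorrelated data: one common approximation for the effective number of independent pairs when correlating two AR(1) series of length n, with lag-1 autocorrelations r1 and r2, is n(1 - r1·r2)/(1 + r1·r2). With 40 years and a lag-1 autocorrelation of 0.7 in both series, that is about 14 effective years. The helper below is a sketch under those AR(1) assumptions, and the Fisher-z critical value it uses is itself an approximation.

```python
import math

def effective_n(n, r1a, r1b):
    """Approximate number of independent pairs when correlating two AR(1)
    series of length n with lag-1 autocorrelations r1a and r1b."""
    return n * (1 - r1a * r1b) / (1 + r1a * r1b)

def critical_r(n_eff, z=1.96):
    """Approximate two-sided 5% critical correlation via the Fisher z-transform."""
    return math.tanh(z / math.sqrt(n_eff - 3))

# Hypothetical case: 40 calibration years, both series equally autocorrelated.
for phi in (0.0, 0.5, 0.7):
    ne = effective_n(40, phi, phi)
    print(f"lag-1 autocorrelation {phi}: n_eff = {ne:.1f}, critical r = {critical_r(ne):.2f}")
```

Under these assumptions, the correlation needed to look significant rises from about 0.31 with independent years to about 0.54 at a lag-1 autocorrelation of 0.7. Any validation window carved out of those years has even less to work with.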

        • Posted Feb 4, 2016 at 8:37 PM | Permalink


          Validation —

          Then there are those who select a time window at the early part of the temperature record and a time window at the end of it for “calibration”, and then look at the unmolested middle to note, surprisingly, an upslope.

          Points for those who find the correlation. Additional points for those who don’t.

        • Don Keiller
          Posted Feb 5, 2016 at 3:18 PM | Permalink

          Which makes the vast leap of dendroclimatologist faith – which you force upon us – that your carefully selected “treemometers”, which show a correlation with local temperatures (or in Mann’s case “teleconnect” with the world temperature field), continue to show this relationship with temperature outside the calibration period.

          One of the first things I teach my students is the danger of “spurious correlation”.
          Please explain why this is not the case with your carefully post-hoc-selected trees.

    • Steve McIntyre
      Posted Feb 4, 2016 at 12:32 AM | Permalink | Reply

      Rob, the primary issue in this note was Wiles et al, 2014 (of which you were a coauthor), not Wilson et al 2007, which was mentioned in a postscript and largely because of my frustration with false information about D’Arrigo et al 2006. In terms of the main issues of the post – at least as I intended it – could you comment on three points:

      1. can you confirm that the decision to replace D’Arrigo et al sites with the Wiles et al 2014 was an ex post decision?
      2. will you archive the missing data related to Wiles et al 2014? Was all the listed data used in the RCS calculation? Was there anything odd in the calculation methodology?
      3. Do you agree that the cores used in D’Arrigo et al 2006 (the 820 cores) were a different dataset than the cores used in Wilson et al 2007? Will you amend the SI to D06?
      Regards, Steve Mc

      • Posted Feb 4, 2016 at 3:46 AM | Permalink | Reply


        1. can you confirm that the decision to replace D’Arrigo et al sites with the Wiles et al 2014 was an ex post decision?
        2. will you archive the missing data related to Wiles et al 2014? Was all the listed data used in the RCS calculation? Was there anything odd in the calculation methodology?
        3. Do you agree that the cores used in D’Arrigo et al 2006 (the 820 cores) were a different dataset than the cores used in Wilson et al 2007? Will you amend the SI to D06?

        Ellsworth Glacier (EL)
        Rock Glacier (RG)
        Water Supply (WS)
        Wolverine Glacier (WV)
        Tebenkof Glacier (TB)
        Miners Well (MW)
        Nichawak Mountain (NK)
        Cordova Eyak Mountain (CV)
        Massive Rock near Cordova (MR)
        Rock Tor (RT)




        Steve: Rob, I know that these sites were used in D’Arrigo et al and figured out that the statements in the D’Arrigo et al 2006 SI were incorrect in saying that the same cores were used in Wilson et al 2007. My suggestion on this point was that you correct the incorrect statements in the D06 SI for any future readers, on the off chance that someone other than me might actually read the SI.

        • Steve McIntyre
          Posted Feb 4, 2016 at 2:33 PM | Permalink

          Rob says:


          One of the problems with saying this is that it took more than 8 years to archive the measurement data and chronologies for D’Arrigo et al 2006, so that it was impossible to comment on it at the time.

          Because most topics have a sequential development, it’s better to approach the new study with a thorough understanding of the original study, to see what’s the same and what’s different. If you wanted timely discussion of D’Arrigo et al 2006, then the authors should have made data available in 2005 when I originally requested it. (I know that you did not have control of this personally and do not fault you personally, but as a field, you cannot reasonably complain.)

          Further, a considerable amount of measurement data for the new study is unavailable. I haven’t itemized the missing data, but, for example, I haven’t located anything very much for the Cook Asian series. It would be best if your SI for Wilson et al included URLs to the original data for each measurement data set.

          Hopefully, the new measurement data for the new study will not take another 10 years to archive. By which time, the rinse cycle will no doubt repeat.

    • Michael Jankowski
      Posted Feb 6, 2016 at 3:15 PM | Permalink | Reply

      “…So – in my 2007 paper, I screened the 31 sites and 22 expressed a significant correlation with Jan-Sep temperatures…If I had used all 31 chronologies, the results actually would be better with an ar2 of 0.49 (DW = 1.98)…”

      This has to be one of the most amazing things I have ever read.

      You took 31 sites and screened based on correlation with temperature, and ended up with 22 sites. Ok.

      So you ended-up with 22 essentially “thermometer” sites to generate results from and 9 essentially “non-thermometer” sites that were excluded.

      And now you’re telling us that you actually get “better results” if the 9 non-thermometer sites are added to the 22 thermometer sites rather than just using the 22 thermometer sites?

      That should scream that either (1) you’ve made a mistake, (2) your screening methods were garbage, (3) your processing methods were garbage, (4) the results are garbage, or (5) some combination thereof.

      • Ogden Wernstrom
        Posted Feb 6, 2016 at 6:12 PM | Permalink | Reply

        I think this is an unfair reading of what dendrob said. I believe the process he is describing is not cherry picking in the sense of picking only the trees within a location data set that correlate, or rejecting sites completely based on correlation versus physical properties. I think this is more or less what he was trying to communicate.

        1.) Find all data sets in a region.
        2.) Discard entire location data sets if they don’t meet physical criteria (i.e., we expect temperature sensitivity when near tree line, so completely reject locations that are not near the tree line).
        2.) Perform statistical analysis on remaining location data sets.
        3.) Maybe include previously rejected data sets and re-run analysis for comparison.
        4.) Note as a point of interest that some verification stats improve with addition of “noise” location data sets into the process.
        5.) Look for hints as to how to improve collection and selection in the future, consider why effectively adding noise had this effect.
        6.) Write about it.

        I don’t think anything about that was particularly unreasonable. We’ve certainly seen where noise can cause spurious correlations, and you have to start with some assumptions in order to structure an experiment and analyze data. If one of those assumptions is “completely reject data sets from inappropriate locations based on known physical properties of that location” prior to starting analysis, then that seems reasonable. “Discard known problem species” is an acceptable selector too, I’d say, as long as you apply such criteria on a consistent basis and perform your data collection phase before looking for correlations. As long as it’s a physical process that defines in advance which data sets make it into the meta-analysis, it seems fine to me. Otherwise, in doing a meta-analysis you’d be forced to accept obviously bad or weak data sets such as “Sample size is only 6 and was collected by groping in the dark.”

        My expectation is that the process he is describing isn’t this:

        1.) Check data set for location for correlation with measured temps.
        2.) Reject badly correlated locations
        3.) Perform etc etc

        Even more problematic:

        1.) Check individual trees in each location set based on correlation with measured temps
        2.) Reject uncorrelated trees.
        3.) etc.

        I think dendrob is describing the first set of operations and not the latter two.

        • Michael Jankowski
          Posted Feb 6, 2016 at 10:07 PM | Permalink

          “Unfair reading?” How? He didn’t say anything about discarding sets based on physical criteria in that example. He distinctly stated, “I screened the 31 sites and 22 expressed a significant correlation with Jan-Sep temperatures. I used those for further analysis.” So the screening was based solely on expressing “a significant correlation with Jan-Sep temperatures.” Clearly what he said is that the 9 rejected sites did not “express a significant correlation with Jan-Sep temperatures” and were not “used for further analysis.” So among the items you think he said, your items 2(a?), 2(b?), and 3 look like garbage, and 4-6 are guesses on your part.

          On the other hand, items 1-3 of what you “expect” he “isn’t” describing are literally exactly what he described!

          But let’s go back to 2007 as well. You can visit the actual publication here http://www.geos.ed.ac.uk/homes/rwilson6/Publications/Wilsonetal2007a.pdf . It clearly states, “Having identified this optimum season, the final reconstruction was developed using only those chronologies that correlated (1899–1985, the common period of the tree-ring and instrumental data) with this season at the 95% confidence limit at either lags T and/or T + 1—the latter taking into account the effect of previous year’s climate upon growth (Fritts 1976). Table 1 lists the correlations of each of the chronologies with January– September GOA mean temperatures for both lags, and highlights those series that were utilized in the development of the reconstruction. Nine chronologies were excluded from further analysis, as they showed no significant correlation with this season.”

          So again, even in the actual paper, Wilson describes correlation as the sole reason for selecting the 22 series as the only series used for further analysis and discarding 9 series that “were excluded from further analysis” (i.e., not “maybe included previously rejected data sets and re-run for comparison” – they were tossed-aside and that was that). He does give physical reasoning for why he thinks they may not have correlated, but clearly identifying some sort of physical criteria was not a reason for exclusion.

          There’s absolutely no substantiation for arguing, “I think dendrob is describing the first set of operations and not the latter two” with respect to this publication, and there are in fact TWO explicit first-hand descriptions that say otherwise.

          So now that it has been established that my reading was completely “fair” while yours was imaginary, let’s go back to your point about how “some verification stats improve with addition of ‘noise’ location data sets into the process.” You have 31 data sets and determine 22 to be “temperature-sensitive” and 9 to be “temperature-insensitive” by comparison. If including the 9 “temperature-insensitive” series improves the results over using the 22 “temperature-sensitive” alone, you don’t see that as raising some major red flags? These 9 sites aren’t just “red noise,” either. They were explicitly excluded for statistical reasons.

        • mpainter
          Posted Feb 6, 2016 at 10:30 PM | Permalink

          It can be concluded from dendrob’s statement that his selection criteria were faulty; that is, his screening assumptions were not borne out by the results. I think that he himself needs to clarify his meaning before we can say that this conclusion does not hold.

  9. kenfritsch
    Posted Feb 3, 2016 at 2:02 PM | Permalink | Reply

    I am afraid that I would have to give failing grades to the last 3 posts – for missing the point about ex post facto selection.

    • HAS
      Posted Feb 3, 2016 at 2:24 PM | Permalink | Reply

      If you don’t assume anything about the relationship with latitude/elevation but allow for it as a vble in your model in some form, it isn’t clear how that fails on ex post facto selection.

      • HAS
        Posted Feb 3, 2016 at 2:26 PM | Permalink | Reply

        “fit the whole lot” = use all the observations

    • davideisenstadt
      Posted Feb 3, 2016 at 2:28 PM | Permalink | Reply

      I have commented on this previously…
      The fact that Rob “calibrates” his time series against the very instrumental record he wishes to emulate, and discards the data that don’t conform during this calibration period, is in and of itself a form of ex post selection of data.
      There exists no better method to uncover spurious correlations than the one Rob employs, with absolutely no shame.
      That these guys don’t even go to the statistics departments where they teach for a second opinion is even worse.

  10. MikeN
    Posted Feb 3, 2016 at 2:47 PM | Permalink | Reply

    Is there an alternative defense given as to why skeptics are misunderstanding the quote from D’Arrigo about picking cherries, as is given about this quote from Esper?

    “However as we mentioned earlier on the subject of biological growth populations, this does not mean that one could not improve a chronology by reducing the number of series used if the purpose of removing samples is to enhance a desired signal. The ability to pick and choose which samples to use is an advantage unique to dendroclimatology.”

  11. kenfritsch
    Posted Feb 3, 2016 at 4:57 PM | Permalink | Reply

    I did not want to post the names with those failing grades on the bulletin board, and thus I will say that I give a failing grade to those who continue to talk about an ex post facto selection process and how one after-the-fact selection method might be better than another.

    • davideisenstadt
      Posted Feb 3, 2016 at 5:06 PM | Permalink | Reply

      you are far too harsh…
      if you want a tasty pie, you have to find the best method to identify and discard the sour cherries.
      Really now.
      “The ability to pick and choose which samples to use is an advantage unique to dendroclimatology”
      One should pick and choose before one mines the data, not after.
      Other than that, everything is quite fine.

  12. Geoff Sherrington
    Posted Feb 3, 2016 at 11:24 PM | Permalink | Reply

    Using simple graphics for the period 1999 to the end of the graph shown, the red curve can be given a small vertical stretch and lift to overlay the black curve in blue, with some minor squiggles remaining.

    This invites the interpretation that a simple mathematical operation was used on their near-final data to reduce the droop, rather than a recombination of data from different selections of trees. (I have time constraints, otherwise I would do this digitally.)

  13. Greg Wiles
    Posted Feb 4, 2016 at 8:09 AM | Permalink | Reply

    Thank you for the discussion. I have sent the Mount Wright living ring-width chronology to the ITRDB; thanks for pointing out that we had not yet archived it. Our work in the Glacier Bay/Juneau Alaska region that was published in Jarvis et al. 2013 (listed below) was designed to examine how trees responded to temperature with elevation. In this study we learned more about how forests there are responding, and whether, if we use them in temperature reconstructions, we are overestimating or underestimating changes. Thus the Wiles et al (2014) paper was informed by this work.

    Jarvis, S. K., Wiles, G.C., Appleton, S.N., D’Arrigo, R.D. and Lawson, D.E., 2013, A warming-induced
    biome shift detected in tree growth of Mountain Hemlock (Tsuga mertensiana (Bong.) Carrière)
    along the Gulf of Alaska. Arctic, Antarctic and Alpine Research 45, DOI 10.1657/1938-4246-

    • miker613
      Posted Feb 4, 2016 at 9:35 AM | Permalink | Reply

      I appreciate you folks showing up here to discuss this.

      • Posted Feb 4, 2016 at 1:41 PM | Permalink | Reply

        Ditto, much appreciated. There are many readers with science backgrounds who don’t comment here but who do follow the discussions. Comments by the authors are very helpful.

    • Steve McIntyre
      Posted Feb 4, 2016 at 2:22 PM | Permalink | Reply

      Greg, thanks for the cordial response. While you’re housekeeping, you should also archive the Eyak Mountain updated measurement data.

    • Steve McIntyre
      Posted Feb 4, 2016 at 2:41 PM | Permalink | Reply

      Greg or Rob, can you put Jarvis et al 2013 online or email me a copy? Thanks, Steve

  14. Mark Gilbert
    Posted Feb 4, 2016 at 10:37 AM | Permalink | Reply

    Much respect for authors in the discussion. Adult and cogent debate is greatly appreciated and so rare. Real science is a beautiful thing and should be encouraged.

  15. Bill
    Posted Feb 4, 2016 at 4:29 PM | Permalink | Reply

    Look, I am far from a scientist or a statistician, but if I understand the basic issue, how are any of these dendro reconstructions that toss out trees solely for not matching the instrumental temperature record valid?

    I mean, it seems you need a hypothesis, say “All blue rocks at exactly 2000 feet altitude contain rings that when analyzed match the temperature for the last 1000 years.” Then, you go and sample 1000 defined blue rocks. 100 of these samples get tossed because they are corrupted for some reason. No problem there. Then you run the other 900 and find that they do not correlate with temperature very well. There is a lot of noise.

    So then, you look at the individual rocks, and find 100 that do not match recorded temperature and toss them. After you do that, you run the test again and get a good correlation with temperature. How is the correlation valid? You tossed 100 rocks for no good reason other than they did not match temperature. How can you do that? Have I missed the explanation somewhere as to why otherwise valid samples are excluded simply because they do not match the expected result?

    What should be said is “We tried our hypothesis but it failed. We will try again. Blue rocks may be a pretty good marker for temperature but at this time we have no way of knowing a priori which rocks to use, therefore we have no statistically significant results to report.”

    If I have it right, that just blows my mind.

    • davideisenstadt
      Posted Feb 4, 2016 at 5:56 PM | Permalink | Reply

      Bill: I only taught statistics for a decade or so, so I might not count, but yeah, that’s pretty much what passes for modern statistics in the “dendro” community.
      When D’Arrigo says “if you want to make a cherry pie, you have to pick cherries”, that’s kinda the practice she endorses.
      Crazy, eh?

  16. GD Holcombe
    Posted Feb 4, 2016 at 4:30 PM | Permalink | Reply

    I am also here as someone without a science background (an attorney) who has done a lot of reading on climate and has tried to become as informed as reasonably possible on the paleoclimate reconstruction disputes. A few thoughts:

    –The level of knowledge and analysis on this subject on this forum, starting with Steve, but including a great many of the commenters, never ceases to amaze me. I learn something every time I get on this page.

    –As someone who has (a) been smeared in my local paper a few times by Michael Mann (who apparently has a Google alert for every published criticism in every corner of the world), and (b) been put off by the constant sarcasm and “thin-skinnedness” on display over at places like “realclimate” (especially by Mann and Gavin Schmidt), I have to say that I’ve been very, very impressed by the willingness of Rob Wilson (and Tim Osborn and Greg Wiles) to engage on this site. Yes, I am more convinced by the skeptics’ arguments, but I learn something every time anyone with a solid scientific or statistical background takes the time to engage, whether I agree with him or not.

    Thank you, Steve, for the tremendous service you’ve provided since taking up the subject of climate, and thanks to all who participate in an informed and civil manner on this page.

  17. Don B
    Posted Feb 4, 2016 at 6:55 PM | Permalink | Reply

    From 2012, a haiku from Lucia:

    Screening fallacy:
    If you sieve for hockeysticks
    that’s just what you’ll get.

    and a cartoon from Josh:


    • mpainter
      Posted Feb 4, 2016 at 8:29 PM | Permalink | Reply

      “A few good trees”

  18. Green Sand
    Posted Feb 4, 2016 at 7:20 PM | Permalink | Reply

    I am not a scientist, even less a statistician but early in life I came to appreciate the involvement of critics/auditors/inspectors.

    We had waited months for one of Her Majesty’s factory inspectors to reply about incorporating further safety kit/practices while carrying out the relocation of a concast plant. Late, very late, within days of starting production, we received a compulsory notice of improvement. This delayed a major investment, and it irked, as there was no statutory requirement behind our original decision to involve the Inspectorate.

    I took the flak from all sides. The plant started 2 weeks late; 2 days later a set of circumstances led to an inexplicable decision and a considerable explosion. Lots of damage, an uncontrolled combination of molten metal and water – big bang!

    The consequences – one twisted ankle! What a result and all down to an inspector, auditor, who during his research found instances of similar situations in Europe some of which had resulted in mortality.

    I am convinced that without the involvement of a self-imposed auditor I would have had to explain to at least two families why their fathers would not be coming home that day.

    Later in a mechanical engineering design life I again found my greatest friend/ally was my auditor, the third party inspector etc. In life hazard design you welcome their insight and you chase down their fears, because like me you may remain forever grateful for their insight.

  19. Craig Loehle
    Posted Feb 4, 2016 at 7:22 PM | Permalink | Reply

    Let me try to capture the logic of the dendros. Some trees respond to temperature, just like some rocks are magnetic. You use a magnet to find the magnetic rocks, and temperature correlations to find the responder trees. A magnetic rock is always magnetic and a responder tree by virtue of site conditions is (and was) always a temperature responder. While this seems sensible, it is not, because a rock is inanimate and a tree is living, so the analogy is only that.
    This ignores the factors I mentioned above that can cause a tree to SEEM to respond to temperature recently (because a neighbor tree died, letting it grow faster, or it is strip bark, causing compensatory growth) but it was not so in the past. This is even more true when the “signal” has R2 of .2 to .4.
    Science is full of cases where the logical, sensible thing is not how nature actually behaves, and assuming that trees that respond to temperature now always did so is one of those things. The proof of this is how wildly different the curves in spaghetti graphs are (see latest and previous IPCC reports).

    • Posted Feb 5, 2016 at 3:47 AM | Permalink | Reply

      Craig, we are in full agreement. Really – but the difference is that the dendroclimate community is fully aware of the issues and is trying to move forward and address them, while many sceptics see only problems and want to debunk tree-rings as a useless climate proxy. WRONG!

      Ring-width alone is really a poor proxy of past temperatures. At best RW never explains more than 40% of the local temperature variance. MXD, whether used alone or combined with RW, always improves the calibration substantially. However, there are only a few labs in the world with the facilities to measure this variable, and that is why I have pushed the development of the related Blue Intensity parameter.

      Enough time spent on this. I end with this paragraph from the conclusion of N-TREND2015 paper. Stay open minded my friends.


      How well should a TR reconstruction calibrate locally before it can be considered for inclusion in such a large-scale data-set? Wilson et al. (2007) specifically used a minimum correlation of 0.4 (only 16% explained variance) against local gridded temperature data, but this value is still rather low, and for realistic reconstructions of local, regional, and hemisphere-scale temperatures, greater fidelity should be required, especially when the number of input series is modest. Of the records used herein, local scale calibration r2 values range from ~10-70% explained variance (Figure 1) with all RW based records explaining less than 40% variance. Considering as well, the well-known lagged high frequency biases of RW data due to biological persistence (Krakauer and Randerson, 2003; Frank et al., 2007b; Anchukaitis et al. 2012; Esper et al. 2015), it could be argued that RW derived reconstructions should not be deemed robust estimates of local temperatures without the inclusion of (or replacement with) MXD or BI data. This is a rather contentious statement but it is more defensible if the study focus is on addressing past climate response to volcanic forcing. At the very least, whether RW is used alone or in combination with MXD/BI data, local calibration needs to express “reasonable” fidelity and multiple statistical metrics exist to assess reconstruction quality (e.g. r2, RE, CE and residual analysis – Cook et al. 1994; Wilson et al. 2006; Macias-Fauria et al. 2012). As a minimum, local climate calibration for any single TR parameter or combination of multiple TR parameters must be statistically significant with no significant long-term trends within the calibration residuals. Any TR record expressing local based divergence (D’Arrigo et al. 2008) should not be considered in a large composite database unless the cause of the divergence can be truly identified as unique to the recent period (Wilson et al. 2007).
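RE and CE, two of the metrics named above, are both verification skill scores computed over years withheld from calibration; they differ only in their benchmark. A minimal sketch of the usual definitions, with made-up numbers: RE compares the squared error to that of simply predicting the calibration-period mean, while CE compares it to predicting the verification-period mean itself.

```python
def sse(obs, pred):
    """Sum of squared errors between observations and predictions."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred))

def reduction_of_error(obs_val, pred_val, cal_mean):
    """RE: skill relative to predicting the calibration-period mean."""
    return 1 - sse(obs_val, pred_val) / sum((o - cal_mean) ** 2 for o in obs_val)

def coefficient_of_efficiency(obs_val, pred_val):
    """CE: skill relative to predicting the verification-period mean."""
    val_mean = sum(obs_val) / len(obs_val)
    return 1 - sse(obs_val, pred_val) / sum((o - val_mean) ** 2 for o in obs_val)

# Hypothetical verification-period observations and reconstruction values.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
print(reduction_of_error(obs, pred, cal_mean=0.0))  # about 0.983
print(coefficient_of_efficiency(obs, pred))         # 0.9
```

Because the RE denominator equals the CE denominator plus a term for the shift in mean between the two periods, CE can never exceed RE. A reconstruction can therefore earn a positive RE largely by getting that shift in mean right, which is one reason RE alone is a weak test.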

      • HAS
        Posted Feb 5, 2016 at 4:26 AM

        What I don’t understand is what would go wrong if you went to an area, cored a random sample of trees and took metrics from them along with environmental measurements for the instrumental period; then, assuming there are reasonably robust growth models for the species that relate these variables, tested the fit of the model over the period for which information is available (holding out some data), tested the resulting model out of sample, and only then moved on to any reconstruction.

        If the species model doesn’t fit, or the out-of-sample test fails, or there are too many unknown independent variables to do a reconstruction, or the relationship with temperature is not present or significant, try ice cores instead.

      • Salamano
        Posted Feb 5, 2016 at 7:01 AM

        The best places to go to understand climate science (in my opinion) are those where specialists openly engage and where differences of opinion can be expressed smartly. It probably takes a lot of moderating to make that happen, but also willing parties. The alternative is an echo chamber that denigrates/straw-mans alternative views, or perhaps a place where no dialogue is allowed at all.

        To that end, I must say that in a small (small) way, the ‘ambiguity’ DOES keep me up at night:-)

        I see this:

        —“So a basic rule is that high latitude/elevation tree-line will be controlled predominantly by temperature and likewise low latitude/elevation tree-line will be controlled predominantly by moisture availability.

        The 31 chronology network for the Gulf of Alaska is a rather mixed network of sites from low to high elevations and different species. We cannot expect them all to respond similarly to climate and as stated other factors may influence growth. Not all of these sites were sampled specifically for dendroclimate analyses. Greg Wiles is a glaciologist and some sites were developed purely for his dendrogeomorph dating etc.

        So – screening is one often used method to identify the sites that best express the “desired” signal.”—

        And come away with a few things:

        1. Specialists agree that trees do indeed have a thermometer-like quality, currently understood in terms of site characteristics such as latitude and elevation.
        2. But, at the same time, the raw data obtained seem to come from all over the place, and it is simply not knowable which specific cores would have been deemed in advance to be correctly located, only that they ‘generally’ come from an area that is ‘generally’ accepted to feature temperature-sensitive trees.

        3. Taking both together, a screening process (comparing cores to observed temperatures after 1850) will winnow out those that diverge… BUT the argument then is that the remaining cores must be the ones at the right elevation/latitude/species, based on point #1, even though this is not known for certain. Yes?

        That, to me, is a legitimate conundrum, but one that is solvable with more precise coring adventures that would eliminate the need for screening. It certainly sounds reasonable to accept the specialists’ views on what qualities make a tree properly sensitive to temperature. And the agreement of those cores with the co-located sub-fossil record from earlier dates is a good point in favor.

        It just seems like there’s something of a knuckleball going on, though: if a tree passes screening, it MUST be because it exhibits any/all of the qualities known to make a good tree-mometer, and if it doesn’t, it MUST be because it lacks them, yet the precise reality is not actually known for each individual tree in some of these samples. I guess in some respects it’s the best we’ve got for now, but given the stakes it should be enough to keep anyone up at night.

        Hopefully these new “Blue Intensity” measurements will indeed reduce this ambiguity, and perhaps there will also be some new trips to collect cores where 100% of the samples are targeted using current dendro understanding of temperature sensitivity, so we can see how the chips fall without the need to screen.

        • davideisenstadt
          Posted Feb 5, 2016 at 7:12 AM

          If some of the chronologies were developed with purposes other than temperature reconstructions, why not prescreen for altitude, location, orientation, species, and whatever other characteristics are thought to be indicative of a tree ring series reflecting “predominantly” variance in temperature?
          The act of screening via regression on the instrumental record is a questionable enterprise.
          I would gladly accept a specialist’s opinion on what makes a tree predominantly sensitive to temperature, but not the application of post hoc screening by a specialist, or any other person.
          Why should one of two similarly situated trees convey an accurate temperature signal, and the other not?
          Until one can answer that question, the use of post hoc screened time series should be discouraged.
          The fact that similarly situated trees do not exhibit similar behavior indicates (to me) that they are poor proxies for a precise, accurate temperature reconstruction.

      • Craig Loehle
        Posted Feb 5, 2016 at 9:31 AM

        While I appreciate your reply, let me note that neither you nor any dendro has answered the complaint that the assumption of stationarity has never been tested. For the Alaska data, we have reason to believe that Pacific Ocean circulation has a big impact, but its pattern has changed over the centuries. Also in Alaska, peat can build up on certain sites, decreasing tree growth. Conversely, recently exposed soils (from receding glaciers) can become more suitable for trees over centuries. Permafrost may have thawed at various points over the past 1000 years, changing drainage. In high elevation/latitude locations, drainage from glaciers may provide moisture for the trees, but this will depend on the growth/retreat of the glacier. Thus there are many, many reasons to believe that tree growth vs. temperature might not have a constant relationship.
        As to “moving forward”, there are times to move forward and times to admit your data are confounded. Just moving forward is admirable for a snowplow but not for a scientist. Testing new metrics like density or the blue something you are working on is great, but until you can get true validation data (i.e., tree growth vs. temperature 400 years ago), you are left with huge uncertainty in historical reconstructions, and when you start with correlations of only 0.4+ and different reconstructions create a spaghetti graph with wildly different histories, you are still in the exploratory phase.

        • Posted Feb 5, 2016 at 12:32 PM

          Craig Loehle:

          You raise many valid points that question anyone’s ability to produce a valid temperature record from tree rings. For the Gulf of Alaska, in particular, I started jotting down my similar questions as I hiked through the various papers.

          The reconstruction period in Wilson et al. (2007) is January–September. The number of daylight hours in Seward, Alaska ranges from about 6 hours in January to almost 19 hours in July. However, the number of “full sun” hours is only about one quarter of the “daylight” hours on any given day. Site-specific shading/exposure would further complicate matters. Tree-line max-min temperatures in winter (even in the relatively temperate climate of the Gulf of Alaska) hover at or below freezing. Snow cover periods are highly variable. In the maritime climate of the Gulf of Alaska, elevation largely controls the date of first snow as well as snowmelt dates. So I’m at a loss for a biological explanation of how including winter months improves your high-altitude tree-ring temperature signal. Regardless, Wilson et al. (2007) found the “best” correlations by extending their reconstruction into January:

          Correlation analysis and PCR calibration trials between monthly temperature series and the individual chronologies identified the optimum season for reconstruction as January–September.

          Temperature estimates are derived from regional weather stations. In the case of the Gulf of Alaska, temperature records are taken much closer to sea level than most of the tree sites. This relies upon the assumption that temperature trends at higher elevations match those at lower elevations.

          Snowfall (and therefore total precipitation) varies greatly across the Gulf of Alaska — although you might not know that from measurements taken only at port towns (which cluster at the lower end of snowfall ranges). However, in defense of temperature reconstructions, precipitation is not likely to be a decadal-scale limiting factor at most sites in the moist Gulf of Alaska region.

          Nevertheless, climate variables diverge rapidly in this region based upon elevation and mountain sheltering effects. Tree site microclimate effects (such as cold air moving downslope, wind exposure, snowpack depth and melting date, etc.) typically are not directly controlled for. 

          Trees are living organisms that respond to stresses and stimuli on a daily basis. Precipitation, growing-range temperatures, predation, light and root competition, micronutrient availability, disease, etc. all have an impact. Efforts to account for these interacting/conflicting effects seem to rely heavily on screening for (presumed local) temperature correlations. I’m not yet convinced that produces satisfactory results.

          Perhaps a minor concern, but it is also unclear to me how dendro studies account for volcanic influences in this region. For example, there is a demonstrated suppression of tree-ring width immediately following an eruption, typically followed by a “rebound” effect of more rapid growth for up to a decade. Wilson et al. (2007) note that their reconstruction only weakly correlates with the mid-1920s “climate shift” in the Gulf of Alaska. Perhaps the tailing off of rebound effects from the 1912 Novarupta eruption had a role to play? However, given that they focus on multi-decadal trends, perhaps volcanic ash effects come out in the wash, so to speak.

          As fascinating as all of these questions may be, I, like you, have little confidence in the asserted value of tree-temp proxy studies — particularly when they discuss tenths of a degree differences in temperature trends.


      • Steve McIntyre
        Posted Feb 5, 2016 at 9:41 AM

        Rob says:

        the difference between the dendroclimate community is that we are fully aware of the issues and are trying to move forward and address them, while many sceptics see only problems and want to debunk tree-rings as a useless climate proxy. WRONG!

        At this site, this is a bit of a cheap shot. On numerous occasions, I’ve discussed what I believe to be the underlying statistical issues of making a chronology – which I interpret, in statistical terms known to the outside world, as being a very complicated random effects problem. I’ve used random effects techniques to calculate “chronologies” in what appears to me to be a more sophisticated and more intellectually supportable method. I have not detected a shred of interest in the dendro community in such issues and dispute the claim that it is “fully aware” of such issues.

        Secondly, I do not believe that the dendro community fully understands the problems of ex post selection and ad hoc data manipulation, or it wouldn’t condone them. Can you seriously read the Appendices to Briffa et al 2015 on Yamal/Polar Urals and not wince?

        Thirdly, there is a considerable amount of generalized complaining by some commenters about tree rings in general. However, I regularly discourage such commentary, frequently delete overly complaining comments and ask commenters to stick to narrow issues. I can understand that, from your perspective (and from mine too), I leave too many merely complaining comments, but I do make an attempt. In this thread, there are numerous substantive comments that rise well above complaining, including comments from people highly familiar with the statistical issues.

        Thanks for speaking your mind, Steve

        • Posted Feb 5, 2016 at 10:44 AM

          I didn’t want to answer his comment because, while I do recognize that trees MUST react to temperature, I am not certain that the growth data being collected can be used as a temperature proxy. There are some really nice correlations in certain MXD datasets.

        • HAS
          Posted Feb 7, 2016 at 4:45 AM

          Jeff Id: “I am not certain that the growth data being collected can be used as a temperature proxy.”

          I think this draws attention to a possible confusion in some of the comments here. It seems to me there are two essential sequential steps in any analysis of what a population of trees is telling us about local temps/climate and what it can tell us about temps/climates past.

          The first is to establish that the population at large at that location in fact conforms to the growth models of the species (and to calibrate, if necessary, parameters in those models). This requires looking at the whole population, and if the growth models are any good they will include structures that adjust for characteristics that counteract or diminish the impact of temp etc in individual trees.

          Having established that the population at large is well behaved and having estimated the parameters, you are then in a position to turn your attention to using those models to estimate temp/climate, first in sample to check any simplifications in moving from models of growth to models that use growth to estimate temp/climate, and then out of sample to derive estimates of the past.

          Whether or not this allows you ex-ante to put aside trees without (significant) bias basically depends on what the growth model says, not some informal assessment of the trees that look like they fit (noting that any inference drawn from the second phase of analysis is contingent on the growth model).

          I must say that having accepted the discipline of sampling the whole population to validate and calibrate the growth model, it is hard to see what benefits dropping trees brings. Trees will only influence the estimates of temp/climate derived from the sample to the extent that the growth model dictates.

        • barn E. rubble
          Posted Feb 10, 2016 at 11:13 AM

          RE: Jeff Id
          Posted Feb 5, 2016 at 10:44 AM
          “. . . I do recognize that trees MUST react to temperature, I am not certain that the growth data being collected can be used as a temperature proxy.. .”

          I was under the impression that tree growth is determined more by what a tree lacks than by what it receives, i.e., a lack of moisture would show up more than higher temperatures, or less sunlight would show up more (in growth) than ideal temperatures. Or am I wrong? I.e., would temperatures (ideal for growth) matter more or less than, say, needed moisture and/or CO2 fertilization?

          I understand the thinking behind trees being (or supposedly being) a warehouse of info/data about the conditions they grow in. I also understand the thinking behind ‘dig where the gold is’.

          And again, I ask those who may know better: does isotope ratio analysis of tree cores not reveal temperature proxies better than ring width or density?

        • mpainter
          Posted Feb 10, 2016 at 12:11 PM

          “what it lacks than what it receives”


          Yes, a well-established principle: Liebig’s law. This law holds that growth rate is determined by the resource in shortest supply. I’m not sure how temperature figures into plant growth, but it is a known factor in plant reproduction. For example, cherry does not fruit unless it receives a minimum period of cold (that is, below a certain temperature). Hence cherry orchards are located in cooler climes, but not too cool.

          A good example of how limiting factors influence growth is found in arid environments such as the Sahel. Here water is obviously the resource in shortest supply, hence the “limiting resource” for growth. But the Sahel has greened in recent years, not because of increased water but because of increased atmospheric CO2, which allows more efficient use of soil moisture because plants have responded with fewer stomata. It seems that plants have universally responded to increased CO2.
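Liebig’s law can be made concrete with a toy calculation. In the minimal Python sketch below, which is an illustrative simplification and not a tree-growth model used in dendroclimatology, each resource’s supply is divided by a hypothetical requirement, capped at 1, and the scarcest resource sets the growth rate. The function name, resource names, units and numbers are all invented for illustration.

```python
def liebig_growth(max_rate, supply, requirement):
    """Growth under Liebig's law of the minimum (illustrative units only).

    supply and requirement map resource names to amounts. Each resource's
    sufficiency is supply / requirement, capped at 1.0, and the scarcest one
    sets the growth rate: a surplus of one resource cannot offset a deficit
    in another.
    """
    sufficiency = min(min(1.0, supply[k] / requirement[k]) for k in requirement)
    return max_rate * sufficiency


need = {"water": 1.0, "warmth": 1.0, "light": 1.0}
dry_site = {"water": 0.3, "warmth": 0.8, "light": 0.9}
warmer_dry_site = {"water": 0.3, "warmth": 1.0, "light": 0.9}

print(liebig_growth(2.0, dry_site, need))         # water-limited: 0.6
print(liebig_growth(2.0, warmer_dry_site, need))  # warming changes nothing: 0.6
```

In this toy model, warming a water-limited site leaves growth unchanged; only where warmth is the limiting resource does growth track temperature, which is the premise behind sampling temperature-limited tree-line sites.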

        • barn E. rubble
          Posted Feb 10, 2016 at 1:03 PM

          RE: mpainter
          Posted Feb 10, 2016 at 12:11 PM
          “. . .This law holds that growth rate is determined by the resource in shortest supply.. . .”

          I was under the impression that oxygen isotope ratio analysis would be independent of ‘other’ factors, making it a better temperature proxy. I understand how/why sub-fossil tree samples may not be suitable for isotope analysis. However, could both approaches not be used (for suitable samples) to gain a better understanding of ‘other’ factors? Without doubt trees, and in particular those that live for centuries, hold many answers . . . I suspect getting those answers is far more complicated with a single approach (i.e., ring width/density) because of ‘other’ factors.

          The other thing I found troubling concerns tree selection with respect to altitude. As a scout leader on many a hike, I’ve seen entire islands or patches of trees slide down a steep slope from top to bottom and continue growing as if the slide hadn’t occurred. Over the course of 20-30 years, evidence of that slide would all but disappear (more so, I suspect, beyond 50 years), meaning some of those trees spent most of their lives at the top. (The other thing I’ve noticed over the last 15 years or so is that Scout camps all seem to have active railway lines bordering them, i.e., whistles/horns between 3 and 5 am. I’m not sure if active railways are a universal requirement for Scout campgrounds or just a regional anomaly.)

      • MikeN
        Posted Feb 5, 2016 at 2:08 PM

        Why does the number of trees matter? Having a large sample would increase the likelihood of getting a hockey stick from random data.

        Steve: Only if biased methods are used e.g. ex post screening or Mannian principal components.
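Steve’s point, that a large sample of random series only produces a hockey stick when a biased method such as ex post screening is applied, can be checked with a small simulation. The Python sketch below assumes AR(1) red-noise pseudo-proxies (phi = 0.95) containing no temperature signal at all, a hypothetical rising 50-year “instrumental” target, and an arbitrary screening threshold of r > 0.5. It is a generic illustration, not a reproduction of any published chronology or of Steve’s own code, and the helper names are hypothetical.

```python
import numpy as np


def ar1_series(rng, n_series, length, phi):
    """Red-noise (AR(1)) pseudo-proxies containing no climate signal at all."""
    eps = rng.standard_normal((n_series, length))
    x = np.empty((n_series, length))
    x[:, 0] = eps[:, 0] / np.sqrt(1.0 - phi ** 2)  # start in the stationary distribution
    for t in range(1, length):
        x[:, t] = phi * x[:, t - 1] + eps[:, t]
    return x


def screen(proxies, target, threshold):
    """Keep series whose correlation with target over the final window exceeds threshold."""
    cal = proxies[:, -len(target):]
    cal_c = cal - cal.mean(axis=1, keepdims=True)
    tgt_c = target - target.mean()
    r = (cal_c @ tgt_c) / (np.linalg.norm(cal_c, axis=1) * np.linalg.norm(tgt_c))
    return proxies[r > threshold]


def blade(composite, n_cal=50, n_end=10):
    """Mean of the last n_end values minus the mean of everything before the calibration window."""
    return composite[-n_end:].mean() - composite[:-n_cal].mean()


rng = np.random.default_rng(42)
proxies = ar1_series(rng, n_series=1000, length=600, phi=0.95)
target = np.linspace(0.0, 1.0, 50)  # hypothetical rising 50-year "instrumental" record

kept = screen(proxies, target, threshold=0.5)
all_mean = proxies.mean(axis=0)
screened_mean = kept.mean(axis=0)

print(len(kept), "of", len(proxies), "pure-noise series pass screening")
print("blade, all series:     ", round(blade(all_mean), 2))
print("blade, screened series:", round(blade(screened_mean), 2))
```

Averaging all 1,000 series gives a roughly flat composite; averaging only the survivors gives a flat shaft with an upturned blade at the end of the calibration window, even though no series contains any signal.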

        • MikeN
          Posted Feb 7, 2016 at 2:44 AM

          Well yes. I was responding to Rob Wilson’s defense of ex post screening.

      • Frank
        Posted Feb 5, 2016 at 5:05 PM

        Rob and Greg: Thanks for being open-minded enough to comment here. I’ve limited experience in drug development and I’d like to point out some parallels:

        Some trees respond to temperature. Some patients respond to a new drug. (Due to variations in metabolism, the standard dose of many drugs provides different exposure in different patients.) In NEITHER case can a scientist QUANTIFY that response after ex post selection. In both cases, quantification must start by PRE-DEFINING a population of trees or patients expected to respond to temperature or drug.

        Here is what would happen if ex post selection were allowed in drug development: No company invests big money in human clinical trials without abundant evidence from animal and in vitro studies that a new drug does something expected to be beneficial. [There is abundant evidence that trees at some sites respond to local temperature during the instrumental period.] Every year, the WSJ contains an opinion piece about a small biotech drug company that is about to go under because the FDA won’t approve its “efficacious” new drug. Rejection usually occurs because clear evidence for efficacy was found in only a subset of the patients who were given the drug: perhaps only one sex, perhaps only the healthier patients, perhaps those with a particular genetic marker, etc. Sometimes rejection occurs because the company wants to define efficacy using a metric different from the one agreed upon with the FDA before the clinical trial began. And the company always has a good scientific rationale explaining why only some of the patients responded or why the new metric is superior. [For TRW, this may be altitude, precipitation, rising CO2, PDO, MXD, standardization methodologies (RCS), etc.] The FDA responds: “If your scientific rationale is so strong, why did the treatment group include some patients you knew were less likely to respond? Why did you agree to using an inferior metric for determining the efficacy of your drug?” [The analogous question for dendrochronologists is: “If you really understand why trees at some sites don’t respond to temperature, why did you pick such sites to study? If you really understood how measurement and analysis methodology would impact your results, why couldn’t you decide ahead of time what method to use?”]
Then the FDA tells the sponsoring company: Now that you know which patient sub-population responds best to your drug and/or the best way to measure your drug’s efficacy, run another very expensive clinical trial and demonstrate statistically significant efficacy. The WSJ moans about government regulation destroying another company and its product, as if other pharmaceutical companies wouldn’t buy or partner with a company that actually had a really promising drug! Experience has shown that, most of the time, a second clinical trial using only the cherry-picked patient population and/or efficacy metric from the first trial fails to demonstrate efficacy! That is what usually happens when you cherry-pick or “data-mine”. [You collect a new set of cores from places like Sheep Mountain and Yamal and new sites with similar characteristics, where trees are “known” to respond to temperature. That is the scientific way to demonstrate what you really “know” after preliminary studies relying on ex post selection.] The Japanese were not as rigorous in their approval process, and as a consequence dozens of locally developed drugs are sold there that are not recognized as efficacious in the USA or Europe.
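The “second clinical trial” point has a direct analogue in split-sample testing. The Python sketch below selects pseudo-proxies on their correlation over the first half of a record and then checks the same series over the withheld second half. Everything in it is hypothetical: white-noise pseudo-proxies with no relation to the target, an arbitrary selection threshold of r > 0.3, and a helper name invented for illustration.

```python
import numpy as np


def corr_rows(x, target):
    """Pearson correlation of each row of x with a single target series."""
    xc = x - x.mean(axis=1, keepdims=True)
    tc = target - target.mean()
    return (xc @ tc) / (np.linalg.norm(xc, axis=1) * np.linalg.norm(tc))


rng = np.random.default_rng(7)
n_series, n_years = 2000, 100
temperature = rng.standard_normal(n_years)          # hypothetical target record
proxies = rng.standard_normal((n_series, n_years))  # pseudo-proxies unrelated to it

first, second = slice(0, 50), slice(50, 100)        # "trial 1" and withheld "trial 2"
r1 = corr_rows(proxies[:, first], temperature[first])
picked = r1 > 0.3                                   # ex post subgroup chosen on trial 1
r2 = corr_rows(proxies[picked][:, second], temperature[second])

print(int(picked.sum()), "of", n_series, "series selected")
print("mean r, selection half:", round(r1[picked].mean(), 2))
print("mean r, withheld half: ", round(r2.mean(), 2))
```

The selected series look responsive in the period used to select them, but their mean correlation in the withheld half falls back toward zero, which is what one expects when the original selection captured only chance agreement.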

        Drug companies and the FDA negotiate what data will be collected and how it will be analyzed BEFORE any clinical trial is begun. The FDA and the sponsoring company independently calculate efficacy. Companies are required to present ALL of their data to the FDA when they submit a new drug application, but they often didn’t publish papers on trials that failed to show efficacy before or after approval. This bias against publishing negative results is now known to produce distortion. [See Ababneh.] In response, almost everyone running a clinical trial TODAY is required to open files in a database (equivalent to the ITRDB) for all the data they intend to collect, update it as patients begin treatment (cores are collected), and post anticipated data deposit dates. Data is private until automatically released a year after it is deposited, whether or not publications are complete. (If a scientist is government-funded, is that really “his” or “her” data? Even drug companies are now required to disclose proprietary data they paid to collect.)

        Do paleoclimatologists need to be as rigorous as companies that want to make billions selling drugs to people? Why not? You are asking the world to spend trillions to avoid catastrophic climate change? Lack of scientific and statistical rigor is hurting your cause! There is more at stake than your next publication or grant. Possibly Lysenkoism?

        Get some professional advice from outside the paleoclimatology/CAGW community. Steve McIntyre comes at these same issues from the perspective of mining, where fraud is common. Statisticians (like those at the FDA) make a living telling others how not to fool themselves with data. As Feynman put it in Cargo Cult Science, the easiest person to fool is yourself. Some trees definitely respond to temperature, but that doesn’t mean you can reconstruct past temperature with reliable confidence intervals from TRWs selected ex post.

        • davideisenstadt
          Posted Feb 5, 2016 at 8:26 PM

          I believe that if the dendroclimatological community wished to have real statisticians assist them, they know where to go to ask for advice; my supposition is that they don’t want to be made explicitly aware of just how flawed their approach is.

        • mpainter
          Posted Feb 6, 2016 at 5:28 AM

          Excellent disquisition by Frank, who frames the issue in terms of what rigorous scientific method is and how dendroclimatology compares. It has always been my impression that dendroclimatology has been practiced in a sort of lax, wishful manner. D’Arrigo is exhibit #1.

        • Posted Feb 6, 2016 at 6:45 AM

          Perhaps the best statisticians to ask about the appropriateness of these “standard” dendroclimatology reconstructions would be epidemiologists: they are well used to working with noisy biological data, controlling for extraneous factors and pulling meaning from the initial “mess”. They are also a rather conservative lot, at least in terms of making definitive pronouncements. I think the field (dendroclimatology) would benefit enormously from their thoughts.

        • Spence_UK
          Posted Feb 6, 2016 at 11:19 AM

          Frank, there is a lot that can be learned from how scientific experiments are conducted in the medical community and I have raised before simple examples such as the difference between single blind and double blind studies. The point you raise about defining how the study will be conducted up front is equally important (as well as definitions of how and when the study might be aborted – this would be less of an issue for the dendro community though since patients are not so much at risk!)

          The important thing to learn is that the people doing single blind experiments are not necessarily deliberately subverting the study; it just happens because it is human nature. But this human nature introduces massive biases, which, at least in medicine, people have gone to great lengths to prevent. It frustrates me enormously that, in these examples, the dendro community seems so naive about the problems this can cause.

          Interestingly, science journalist Ben Goldacre commented on this (with respect to medical science) in Nature just four days ago. The article is well worth a read – and the recommendations would be just as apropos to the climate community as they would to the medical community. Now Ben is a fully paid up member of the alarmist side on climate, so I don’t know for sure he would endorse this on the climate side (even though his recommendations are in fact universal for good scientific practice). There is a great quote in the article that I’m sure Steve would enjoy as well:

          “Audit and accountability are the bread and butter of good medicine, and good science.”

          Link to article: Make journals report clinical trials properly

          Who was it that said auditing had nothing to do with good science recently? For some reason I think it was ATTP but I may be mistaken. Seems Ben doesn’t agree!

        • mpainter
          Posted Feb 6, 2016 at 11:29 AM

          Yup, it was Ken Rice a few weeks ago. Richard Drake copied his remark onto the post that was current at the time. Great amusement.

        • Posted Feb 6, 2016 at 12:15 PM

          I second the comments by Frank and Spence. Climate science and medical research have important points in common, especially the large role of government funding and the high degree of public policy interest in their findings.

          We have decades of experience in medical research about ways to increase the reliability of findings beyond that of typical peer-reviewed research: such as double-blind testing, review of results by multi-disciplinary teams of experts, and transparency of data and methods. Yet recent studies about replication of results show that much more needs to be done.

          Climate scientists show little awareness of this work, and when it is pointed out to them they usually react with disdain or hostility. As Professor Curry has said, probably only outside pressure will change this.

          For example, debate in blogs only creates suspicions among skeptics and those who listen to them, while dendrochronologists react with denial. Given the role of dendrochronology in paleoclimate research, a review by an “outside” team of biologists and statisticians is warranted. This could resolve these issues. But probably only Congress has the power and interest to force this to happen.

          So five years from now the dendritic debate probably will continue, with the public policy debate still deadlocked.

        • Frank
          Posted Feb 6, 2016 at 3:45 PM

          David wrote above: “I believe that if the dendroclimatological community wished to have real statisticians assist them, they know where to go to ask for advice; my supposition is that they dont want to be made explicitly aware of just how flawed their approach is.”

          Frank replies: This is a bit excessive. Many climate skeptics come from “hard” sciences and engineering, areas where one has the luxury of designing and running carefully controlled, reproducible experiments with rigorous statistical methods. Dendroclimatologists are trying to extract a temperature signal from limited data that is heavily contaminated with other signals and noise. Publications discarding sites without correlation to local temperature are implicitly telling us that scientists can’t produce a useful reconstruction if they don’t use ex post selection. In that case, they don’t have publishable results. The statisticians McShane and Wyner were willing to analyze a large collection of proxy data without any selection. Their work was published in a statistics journal. It doesn’t seem unreasonable to expect that the correlation between tree growth and instrumental temperature found at some sites continued into the past. The problem arises from the inappropriate confidence intervals derived for NH temperature reconstructions and possibly from the suppression of reconstructed variability first described by von Storch.

          A pharmaceutical company can and will spend large amounts of money running a follow-up clinical trial on a cherry-picked patient sub-population that appears to respond to its drug candidate. (They don’t often do so, because they took great care in selecting the target patient population most likely to respond in the first clinical trial.) Assuming dendrochronologists now know how to pick sites (and elevations) where trees will show a strong response to temperature, the dendrochronology community lacks the resources to collect cores from 10-20? new ideal sites and perform a new reconstruction without any selection. Since they haven’t even done so using ex ante selection criteria, the logical assumption is that they still don’t understand what factors reliably differentiate a site that has been somewhat temperature-responsive during the instrumental period from a site that is not. Their methodology doesn’t appear to be capable of rigorously meeting society’s need for reliable information, a problem that plagues AOGCMs and many other areas of climate science. Unfortunately, the IPCC publicizes the resulting “cherry-pies” as being the type of rigorous science that is practiced in other fields. (Publishable studies in other fields only contain conclusions that are “extremely likely” or “virtually certain”. Five sigma was the criterion for “discovering” the Higgs boson.)

        • Follow the Money
          Posted Feb 6, 2016 at 4:31 PM | Permalink

          I’ve limited experience in drug development and I’d like to point out some parallels: Some trees respond to temperature. Some patients respond to a new drug.

          The more apt analogy would be between the drug industry and the forestry industry. I have not seen in any forestry text the idea that temperature anomalies can be wheedled out of tree rings. There is no forestry text supporting the dendrochronologists’ Mannian temperature fad of the last twenty years. The treemometer is a delusion. Sometimes they are tricky and implicitly treat “climatic” information in tree rings and “temperature” as one and the same. They are not. Precipitation, or growing-season precipitation, is not a firm proxy for temperature. In some places more water tends to indicate coldness; in others, the opposite. Then there is fog in the NE Pacific, the elevation of fog, and of course growing-season length, seasonal cloudiness, etc.

        • Posted Feb 8, 2016 at 12:11 PM | Permalink


          Not only do I think that this field needs to engage the applied statistics community, I think it needs to engage with the physiology community. I have some background in this. I have been involved in pharmaceutical post-marketing studies. The good ones always had a pre-determined protocol that the voluntary collectors, such as myself, were to follow. Only demonstrated failure to follow that protocol was grounds for having my data excluded, if I am remembering correctly.

          I lurk here and rarely comment. Nevertheless I cringe often when I read the procedures discussed here. I’ve had to present data in a Grand Rounds fashion and subject my presentation to the kinds of insight shown here.

          Thank you, Steve, for your tireless work.

        • mpainter
          Posted Feb 8, 2016 at 1:05 PM | Permalink

          “..I think that this field needs to engage..”

          Yes, but if no problem is admitted, then no remedy is applied. It is discouraging that no one in the field has criticized the methods employed by some of their colleagues, AFAIK. I believe that this lack of critical examination by the investigators in this field foretells the future. It’s as if none dare to improve the science, and I’m pessimistic that any will ever admit the need for improvements.

  20. Geoff Sherrington
    Posted Feb 5, 2016 at 4:47 AM | Permalink | Reply

    Thank you for the response.
    It does represent an advance if you have done enough work to show that MXD is better than ring width. That is hard to do with correlation coefficients around 0.4, especially when establishing a method. For some types of earth science work, that figure would be adequate grounds to dismiss the project as not viable.
    There are other matters aplenty, like the postulated inverted-U response of growth on either side of an optimum.
    There is a lot of dancing around the ring properties in relation to the “decline”. It sounds like few readers here accept that the solution to the decline is a cheery cherry pick in the hope that people will not mention it. Do you have a better method?

    By coincidence, I was looking for a climategate quote yesterday.
    Here is another. The question is, what has improved since then, especially given the McIntyre essay above?

    From: Phil Jones To: Tom Wigley
    Subject: Re: [geo] Re: CCNet: A Scientific Scandal Unfolds
    Date: Mon Oct 5 10:03:02 2009
    Thanks for trying to clear the air with a few people. Keith is still working on a response. Having to contact the Russians to get some more site details takes time. Several things in all this are ludicrous as you point out. Yamal is one site and isn’t in most of the millennial reconstructions. It isn’t in MBH, Crowley, Moberg etc. Also picking trees for a temperature response is not done either.
    The other odd thing is that they seem to think that you can reconstruct the last millennium from a few proxies, yet you can’t do this from a few instrumental series for last 150 years! Instrumental data are perfect proxies, after all.
    This one is wrong as well. IPCC (1995) didn’t use that silly curve that Chris Folland o Geoff Jenkins put together.

    At 02:59 05/10/2009, you wrote: David,
    This is entirely off the record, and I do not want this shared with anyone. I hope you will respect this. This issue is not my problem, and I await further developments.
    However, Keith Briffa is in the Climatic Research Unit (CRU), and I was Director of CRU for many years so I am quite familiar with Keith and with his work. I have also done a lots of hands on tree ring work, both in the field and in developing and applying computer programs for climate reconstruction from tree rings. On the other hand, I have not been involved in any of this work since I left CRU in 1993 to move to NCAR. But I do think I can speak with some modicum of authority.
    You say, re dendoclimatologists, “they rely on recent temperature data by which to *select* recent tree data” (my emphasis). I don’t know where you get this idea, but I can assure you that it is entirely wrong.
    Further, I do not know the basis for your claim that “Dendrochonology is a bankrupt approach”. It is one of the few proxy data areas where rigorous multivariate statistical tools are used and where reconstructions are carefully tested on independent data.
    Finally, the fact that scientists (in any field) do not willingly share their hard-earned primary data implies that they have something to hide has no logical basis. Tom.

    David Schnare wrote: Tom:
    Briffa has already made a preliminary response and he failed to explain his selection (etc)….

  21. Phil Howerton
    Posted Feb 5, 2016 at 11:52 AM | Permalink | Reply

    What do “blue” studies refer to? What are they, and how do they differ from what has been produced so far, i.e., in the paper under discussion?


    Steve: “blue intensity” is a new tree-ring proxy that Rob is very encouraged by. One of the tremendous advantages of tree-ring data is that they are very, very well dated, something that is an issue with sediment studies.

    • Steve S
      Posted Feb 5, 2016 at 12:15 PM | Permalink | Reply

      It doesn’t matter right now what a “blue” study is… even if a new paper is released based on “blue”, you won’t see any data archived for 9 years. So, if a new “blue” paper comes out today, the rigors of science can’t be applied for at least 9 years, when the data are finally released. Then, when the data are released, Rob Wilson will say once again,



      I’m not good at math or statistics, but I’m coming to the conclusion that the statute of limitations for applying scientific rigor to dendro papers is exactly equal to the time delay in releasing the data necessary to perform such analysis.

      Thanks Mac for all the good work, the Whack-a-mole must be frustrating.

      • GD Holcombe
        Posted Feb 5, 2016 at 1:06 PM | Permalink | Reply

        Your statute of limitations rule is priceless, and sadly apt. Well done.

    • davideisenstadt
      Posted Feb 5, 2016 at 8:30 PM | Permalink | Reply

      But Steve:
      Tree growth exhibits a strong degree of autocorrelation….while the dates may be certain, what causes a particular tree to grow in one season is often dependent on conditions that existed and events that occurred in previous years.
      Sorry if this is a redundant post; please feel free to delete it.

    • TimTheToolMan
      Posted Feb 8, 2016 at 5:29 AM | Permalink | Reply

      Steve: “blue intensity” is a new treering proxy that Rob is very encouraged by.

      If it continues to involve sample selection based on temperature correlation, then it’s lipstick on a pig.

      • mpainter
        Posted Feb 8, 2016 at 9:33 AM | Permalink | Reply

        Such a procedure, if eliminated from the science, would leave the field bereft of most of its important works, if not all. Surely you do not expect those engaged in these studies to renounce one of their fundamental tools?

  22. Posted Feb 5, 2016 at 1:51 PM | Permalink | Reply

    One thing (among many) that I’m unclear about is the pre-instrumental statistical behavior of the “tree thermometer sites” that are ex post selected.

    Those selected “tree thermometer sites” correlate with the local instrumental temperature record during the instrumental period – that’s why they were selected. The selected sites’ relationship with each other during the instrumental temperature period should be describable in statistical terms (std deviation, range, etc).

    Do those “tree thermometer sites” have the same statistical relationships with each other in the pre-instrumental period? That’s what I’m unclear about.

    If they statistically behave one way during the instrumental period but behave another way prior to the instrumental period then it’s hard to believe that they were “tree thermometers” in the pre-instrumental period (or instrumental period for that matter).

    Seems like a group of true tree thermometers would behave the same over time. Sorry if the question is ill-posed or if the answer is obvious to more-literate folks.

    • Sven
      Posted Feb 5, 2016 at 2:09 PM | Permalink | Reply

      Interesting. I’ve been thinking about the same thing. Seems to be a logical thing, at least to a layperson. Maybe it’s done, but I’ve not seen anything on that.

      • Sven
        Posted Feb 5, 2016 at 2:15 PM | Permalink | Reply

        And as a test, does the said correlation differ in any significant way from the correlation between the thermometer and non-thermometer trees?

    • Pat Frank
      Posted Feb 5, 2016 at 2:51 PM | Permalink | Reply

      That’s been mentioned in the proxy literature, David. One can read there that constant tree response is assumed. Trees that are “good responders,” i.e., that correlate well with temperature now, are assumed to correlate with temperature throughout their past.

      This assumption is never tested, and indeed cannot be tested vs. temperature. However, the biological literature of tree genetics and tree environmental response does not support that assumption.

      Notice also that Rob’s determinations of temperature-limited growth are qualitative judgments. From these he proposes to use statistics to extract quantitative data (temperature).

      Statistics is no substitute for physics. The most sophisticated statistical algorithms cannot ever convert qualitative biological inferences into quantitative physical data. Such “paleo-temperatures” have no physical meaning. I’ve discussed these points here; see also here.

      • Sven
        Posted Feb 5, 2016 at 3:09 PM | Permalink | Reply

        Thank you, Pat. It’s true that historical correlation with temperature cannot be tested, and so unprovable assumptions are made. But we are talking about historical correlations between the treemometers themselves. That can be tested, as can the difference between treemometer-treemometer and treemometer-nontreemometer correlations. I would assume that the existence of such correlation would still be no proof that these trees are valid thermometers, but a lack of correlation would be quite good proof that the trees were not historical thermometers.
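Sven’s proposed check can be written down concretely. The sketch below is an illustration only, assuming each site is a plain list of annual ring values aligned on the same years; the function names `pearson`, `mean_within` and `mean_between` are hypothetical and not taken from any dendro package. It computes the average pairwise correlation among the selected “treemometer” sites over a chosen window of years, and the average correlation between the selected and rejected sites.

```python
from itertools import combinations
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_within(sites, start, stop):
    """Mean pairwise correlation among `sites` over year indices [start, stop)."""
    return fmean(pearson(a[start:stop], b[start:stop])
                 for a, b in combinations(sites, 2))


def mean_between(group_a, group_b, start, stop):
    """Mean correlation of every site in group_a with every site in group_b."""
    return fmean(pearson(a[start:stop], b[start:stop])
                 for a in group_a for b in group_b)


# Hypothetical toy data: two "selected" sites that agree in the later
# (calibration) window but move in opposite directions before it.
site1 = [1, 2, 3, 4, 1, 2, 3, 4]
site2 = [4, 3, 2, 1, 2, 4, 6, 8]
print(mean_within([site1, site2], 0, 4))  # -1.0 (early window)
print(mean_within([site1, site2], 4, 8))  # 1.0 (calibration window)
```

If the selected sites hang together in the calibration window but not before it, or outside that window correlate with each other no better than with the rejected sites, that would be the kind of evidence Sven describes. High inter-site correlation on its own would still not show that the common signal is temperature.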

        • barn E. rubble
          Posted Feb 6, 2016 at 9:34 AM | Permalink

          RE: Sven
          Posted Feb 5, 2016 at 3:09 PM

          ” . . .the possible lack of correlation would be quite a good proof of the trees not being historical thermometers”

          At what age are trees no longer considered suitable for isotope ratio analysis? I would assume that cores from living trees would be suitable and understand that isotope ratios are ‘good’ temperature proxies, or am I mistaken?

        • Pat Frank
          Posted Feb 7, 2016 at 5:30 PM | Permalink

          You’re right, Sven.

          But if history is any guide (recall the divergence problem), dendro-thermometrists could just argue that the trees that don’t correlate with temperature, despite adjoining trees that do so correlate, just have some biological issue that allows them to be rejected.

          The field lives off tendentious argument, and the problem you raise would just provide another opportunity for one.

        • Pat Frank
          Posted Feb 8, 2016 at 12:18 PM | Permalink

          The problem is not “that dendroclimatologists don’t know how to pick a site where trees will be reliably temperature-responsive,” Frank. The primary problem is that dendroclimatologists have neither theory nor method for extracting a temperature from a tree ring at all.

          It’s not that they can’t pick a site. It’s that they can’t pick a tree — a far more fundamental disability.

          Absent any physical theory of tree growth, temperature dependence cannot be objectively determined. Nor can a temperature itself be extracted from any tree-ring metric.

          My comment was not at all misleading, but is rather obviously true. Perhaps you misunderstood the point.

        • Frank
          Posted Feb 9, 2016 at 3:44 PM | Permalink

          Pat Frank wrote: “Absent any physical theory of tree-growth, temperature-dependence can not be objectively determined. Nor can a temperature itself be extracted from any tree-ring-metric.”

          In the 19th century, we didn’t have a physical theory for why mercury in thermometers expanded with rising temperature. Nevertheless, we were able to use thermometers to learn about conservation of energy (temperature is internal energy) and about thermodynamics (TdS). We may not fully understand the temperature dependence of electrical resistance even today. For centuries, we successfully used temperature-dependent phenomena to measure temperature without fully understanding them.

          The temperature dependence of tree growth is far more complex than the temperature dependence of electrical resistance or of the density of mercury. Tree growth is a series of [catalyzed] chemical reactions whose rates increase with temperature in proportion to exp(-E/RT), where E is the activation energy, R the gas constant and T the absolute temperature. The product of those reactions is the cellulose quantified in the early and late wood of tree rings. In plant photosynthesis, one of these steps – the initial reaction with CO2 – is so slow that 25-50% of the protein in leaves is the enzyme that catalyzes this step (RuBisCO). It is frequently the rate-limiting step in plant growth, and it is temperature-dependent. Under some circumstances, another biochemical reaction might be rate-limiting, with a different temperature dependence. At times, too little light, water, nitrogen or phosphorus can also limit the rate of plant growth in a non-temperature-dependent manner. (Clouds/light are obviously critical, but globally the planet’s albedo remains fairly constant, so reconstructed NH temperature may be more accurate than the confidence intervals for individual sites would suggest.)
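To put a number on the exp(-E/RT) dependence: the ratio of rates at two absolute temperatures T1 and T2 is exp((E/R)(1/T1 - 1/T2)). The sketch below assumes an activation energy of 50 kJ/mol purely for illustration; it is not a measured value for RuBisCO or for tree growth, and `arrhenius_ratio` is a hypothetical helper. Under that assumption, a 1 °C warming at around 10 °C raises the rate by roughly 8%.

```python
import math

R = 8.314  # molar gas constant, J/(mol K)


def arrhenius_ratio(t1_c, t2_c, e_a):
    """Rate at t2_c divided by rate at t1_c for a rate proportional to exp(-E/RT).

    Temperatures are in deg C; e_a is an activation energy in J/mol.
    """
    t1, t2 = t1_c + 273.15, t2_c + 273.15
    return math.exp(e_a / R * (1 / t1 - 1 / t2))


# Hypothetical activation energy of 50 kJ/mol, for illustration only.
print(round(arrhenius_ratio(10.0, 11.0, 50_000), 3))  # 1.078
```

This only shows that a single rate-limited step gives a modest, smooth sensitivity; any of the other limits listed above (light, water, nutrients) can override it.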

          There are also regulatory systems that prevent trees from trying to grow under unfavorable or dangerous conditions.

          The question is not whether tree growth should reliably contain a temperature-dependent signal. It will and does under some circumstances and not under others. Validation shows that one can use the temperature dependence from part of the instrumental record to reconstruct the temperature during the rest of the record. The issue is the reliability (confidence interval) of the extracted signal, and that can’t be properly calculated if you cherry-pick. If 50% of “ideal” sites are responding during any one century, discarding the non-responders when analyzing the 20th century creates too much confidence in your reconstruction of earlier centuries.
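The point about discarding non-responders can be checked with a toy Monte Carlo. This is a sketch under loudly stated assumptions, not a model of any real network: temperature is white noise; each hypothetical site independently responds to it with probability `p_respond` in each of two centuries; non-responders are pure noise; and sites are screened on their correlation with temperature in the later (calibration) century. `screening_demo` and all of its parameter values are invented for the illustration.

```python
import random
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(y, x):
    """Least-squares slope of y regressed on x."""
    mx, my = fmean(x), fmean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def screening_demo(n_sites=200, block=100, p_respond=0.5,
                   signal=0.5, threshold=0.3, seed=1):
    rng = random.Random(seed)
    # Hypothetical "true" temperature: white noise over an early century
    # followed by a calibration century.
    temp = [rng.gauss(0, 1) for _ in range(2 * block)]
    sites = []
    for _ in range(n_sites):
        series = []
        for start in (0, block):
            # Each site independently responds to temperature in each
            # century with probability p_respond; otherwise it is pure noise.
            responds = rng.random() < p_respond
            for t in range(start, start + block):
                noise = rng.gauss(0, 1)
                series.append(signal * temp[t] + noise if responds else noise)
        sites.append(series)
    early, calib = slice(0, block), slice(block, 2 * block)
    # Ex post screening: keep only sites that correlate with temperature
    # in the calibration century.
    kept = [s for s in sites if pearson(s[calib], temp[calib]) > threshold]
    composite = [fmean(s[t] for s in kept) for t in range(2 * block)]
    return {
        "kept": len(kept),
        "slope_calib": ols_slope(composite[calib], temp[calib]),
        "slope_early": ols_slope(composite[early], temp[early]),
    }


print(screening_demo())
```

Because only about half of the kept sites still respond in the earlier century, the expected early-century slope of the composite on temperature is about half the calibration slope. A scaling fitted in the calibration century would therefore understate earlier variability, while the calibration statistics look excellent.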

          None of this means that the temperature difference between the CWP, LIA and MWP has been or ever can be reconstructed with useful confidence intervals if you don’t cherry-pick. The ubiquity of cherry-picking suggests it hasn’t been possible in the past. TRWs contain relatively little temperature signal; MXD is better. See Esper (2015) Figure 6.

        • davideisenstadt
          Posted Feb 9, 2016 at 5:03 PM | Permalink

          It’s quotes like this that many find maddening:

          “The question is not whether tree growth should reliably contain a temperature-depend signal. It will and does under some circumstances and not under others”

          I think it stands on its own, and requires no commentary.

        • Pat Frank
          Posted Feb 12, 2016 at 11:46 PM | Permalink

          So your position is that exploiting a well-known co-relation between an independent variable and a dependent variable is identical to quantitatively extracting one variable from among many correlated and coupled variables in a complex multi-variate system about which little is known and less can be predicted.

          Great thinking, Frank.

          The rest of your essay can be condensed to “Validation shows that one can use the temperature dependence from part of the instrumental record to reconstruct the temperature during the rest of the record,” which is the standard correlation = causation mistake rife in paleo-thermometry; paleo-thermummery is more apt.

          Correlation = causation was wrong when Pearson supposed it, it’s wrong now, and it will never be correct.

          Your ideas are thoroughly unscientific.

      • Frank
        Posted Feb 8, 2016 at 1:39 AM | Permalink | Reply

        Pat Frank wrote: “Trees that are “good responders,” i.e., that correlate well with temperature now, are assumed to correlate with temperature throughout their past. This assumption is never tested, and indeed cannot be tested vs. temperature.”

        This is misleading. The problem is that dendroclimatologists don’t know how to pick a site where trees will be reliably temperature-responsive. Suppose they could pick sites 90% of the time where they could reliably reconstruct the full dynamic range of instrument temperature in a range of settings. In that case, we might be able to assume that 90% of the same sites in the past showed similar temperature responsiveness before the instrumental period – within the range of temperature experienced during the instrumental period.

        Look at Figure 6 of Esper (2015), cited elsewhere in the comments, and see how badly TRW reconstruct isolated temperature extremes during the instrumental period. The real problem with TRW is that they are lousy temperature proxies – even at sites that show the highest correlation with local temperature. And such sites show this degree of temperature responsiveness perhaps half of the time if they are picked using ex ante criteria. In that case, it would be reasonable to assume that a varying fraction, in the vicinity of half of the sites, were non-responsive in the past.

        The biochemical processes that limit tree growth at low temperature haven’t changed over the past few millennia. (Evolution is slow, and improved mutants in a single species would quickly dominate.) The local factors that complicate this relationship do change, but we account for them by looking at many trees at many sites. They contribute noise and uncertainty, but they don’t invalidate the methodology. Rising CO2 (“fertilization”) is a systematic perturbation in the late 20th century.

  23. David
    Posted Feb 5, 2016 at 5:11 PM | Permalink | Reply

    Steve M.

    I believe there is an error in the 4th paragraph:

    Both Gulf of Alaska chronologies (D’Arrigo et al 2006 and Wiles et al 2006) used….

    should be:

    Both Gulf of Alaska chronologies (D’Arrigo et al 2006 and Wiles et al 2014) used…

    No need to post this.

  24. kenfritsch
    Posted Feb 6, 2016 at 10:07 AM | Permalink | Reply

    It bothers me that the true value and validity of proxy responses to temperature will not be realized until those working in this area are much more introspective in pointing to basic problems/issues/weaknesses involved with currently constructed temperature reconstructions. It appears to me that in this area of climate science – as well as in other areas dealing with AGW – there is a hesitancy to admit any weaknesses for fear that the entire edifice of the consensus might fall. In my mind the 97% who agree that at least 50% of the warming during the instrumental period is anthropogenically caused would not be affected by any admission of weaknesses in any of these areas. I think the mindset of those working in these areas about admitting to weaknesses and uncertainties comes from the close link between science and advocacy.

    Temperature reconstructions that are properly validated and use a priori, physically based criteria could in turn go a long way toward validating climate models and understanding our current and future climate. It may well be found that some proxies cannot be used in temperature reconstructions, but that information is valuable, because it means we move on to testing other proxies rather than being confused by invalid ones. The instrumental temperature record available for attempts to validate climate models is short, and particularly short is the period in which GHG levels have increased at a higher rate. I think SteveM and posters here have presented the weaknesses in current methods of temperature reconstruction that are apparent to outsiders who have done their own analyses. Those weaknesses and basic problems are not limited to one or a few aspects of reconstructions but are rather comprehensive – as noted best by a post above from Jeff ID.

    Without those aspects being critically investigated by the workers in this area I see little true progress being made in this field. And as SteveM noted those advances will not come in inventing new ways to “torture” the data in post facto selection of proxies.

  25. Frank
    Posted Feb 6, 2016 at 4:15 PM | Permalink | Reply

    When discussing biological persistence and lagged TRW response to temperature, Rob mentioned Esper et al, Dendrochronologia 35 (2015) 62–70:


    The main focus is the extremely limited ability of TRW, and the superior ability of MXD, to reconstruct the cooling following volcanic eruptions. Buried at the end of the paper and not mentioned in the abstract is Figure 6, which shows the reconstruction of the 15 unusually warm and cool years (JJA, locally) at 11 NH sites during the instrumental period. (15 sites had both kinds of data, but 4 with little correlation were omitted.) The warm and cool years averaged +1.5 degC and -1.2 degC compared with the surrounding 10 years. TRW reconstructions capture only 1/3 of this known variability, and the warm signal – but not the cold! – persisted for half a decade. MXD did a better job of reconstructing this observed high-frequency variability, capturing 80% of the cooling and 50% of the warming. The response at individual sites is all over the map.
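“Fraction captured” figures of this kind can be computed along the following lines. This sketch assumes that “compared with the surrounding 10 years” means the event year minus the mean of the 5 years before and the 5 years after it. Esper et al may define the window differently, and the function names are hypothetical. Series are plain lists of annual values, and event years must lie at least `half_window` years from either end.

```python
from statistics import fmean


def event_anomaly(series, i, half_window=5):
    """Value in year i minus the mean of the half_window years on each side (year i excluded)."""
    if i < half_window or i + half_window >= len(series):
        raise ValueError("event year too close to the ends of the series")
    neighbours = list(series[i - half_window:i]) + list(series[i + 1:i + 1 + half_window])
    return series[i] - fmean(neighbours)


def captured_fraction(instrumental, reconstruction, event_years, half_window=5):
    """Mean reconstructed event anomaly as a fraction of the mean instrumental one."""
    observed = fmean(event_anomaly(instrumental, i, half_window) for i in event_years)
    reconstructed = fmean(event_anomaly(reconstruction, i, half_window) for i in event_years)
    return reconstructed / observed


# Hypothetical toy example: a +1.5 degC warm year that the proxy records as only +0.5 degC.
instrumental = [0.0] * 21
reconstruction = [0.0] * 21
instrumental[10], reconstruction[10] = 1.5, 0.5
print(captured_fraction(instrumental, reconstruction, [10]))  # about 0.333
```

Computing the fraction separately for warm and cool event years would separate the asymmetric TRW response described above.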

    1998 was the warmest year in the previous millennium?

    • Follow the Money
      Posted Feb 6, 2016 at 5:12 PM | Permalink | Reply

      Frank, on your recommendation I read it. I repeat the first sentences of the paper:

      Volcanic eruptions have been identified as a major natural forcing of the climate system (Oppenheimer, 2011). The aerosols released by large, explosive eruptions tend to cool the earth’s surface, but warm the lower stratosphere. Surface cooling results from scattering of incoming solar radiation, i.e. less radiation reaches the ground. Stratospheric warming is triggered by increased absorption of radiation, i.e. more radiation is transferred into sensible heat in 10+ km above ground (Robock, 2000) [my bolds]

      I’m surprised this is laid out so clearly from the start. Where later in this paper is there any discussion showing that the botanic response they are recording is not due to less solar radiation hitting trees, that is, less photosynthesis? Trees are plants, after all. This paper does discuss how a degree or so of temperature anomaly in the ambient atmosphere is guiding ring density, rather than the anomalies in solar radiation hitting leaves. Sure, cooling may correlate with less solar energy, and thereby less plant growth, but is it the cause? I’ve looked and never found any of these papers spelling out this issue, or even mentioning it. Maybe I’m mistaken in assuming dendrochronology is an arm of botany. And your points too.

      • Posted Feb 7, 2016 at 4:03 AM | Permalink | Reply

        try this:
        Stine, A., Huybers, P., 2014. Arctic tree rings as recorders of variations in light availability. Nat. Commun. 5, 1-8. http://dx.doi.org/10.1038/ncomms4836.

        A contentious paper, but it has bubbled up some debate. One has to focus at least on the Euro records and on how the TR data model the cool summer of 1816.
        Carbon isotopes should help in this debate.

        another paper of possible interest:
        D’Arrigo R, Wilson R and Anchukaitis K. 2013. Volcanic Cooling Signal in Tree-Ring Temperature Records for the Past Millennium. JGR-Atmospheres. Published online. DOI: 10.1002/jgrd.50692

        • Frank
          Posted Feb 8, 2016 at 12:59 AM | Permalink

          Dendrob: Thanks for the references. To understand what is going on, I prefer to decouple the multiple factors associated with volcanic eruptions (multi-year cooling, modest decrease in SWR reaching the surface, aerosol deposition?, ?) from simple temperature change. Figure 6 in Esper 2015 provides an analysis of simple, high-frequency temperature variability (isolated relatively warm or cool years) during the instrumental period compared with the TRW and MXD reconstructed temperature. According to this information, the “memory effect” from a single cool year doesn’t persist in either TRW or MXD, but there is a memory effect from a single warm year in TRW, though not in MXD. If Esper 2015 had more than 11 sites, perhaps he could have gone on to analyze how these relatively simple responses vary with latitude/altitude, average sunshine at the site, actual (!) sunshine and precipitation from reanalysis, etc. Then you might be in a better position to understand what happens in a more complicated situation like a volcanic eruption. Or the depths of the LIA and peaks of the MWP, however big they might be.

          This reflects my prejudices from drug discovery, which has left the Dark Ages of simply dosing animals with “medicinal plants” or new molecules and seeing what happens. Now we measure inhibition/occupancy for target and off-target enzymes/receptors, activity in cell culture, and free and serum-bound drug levels, plus therapeutic activity in animal models. That isn’t cheap and can’t be done in an academic setting, but it keeps you from wasting money in clinical trials. The first thing I’d like to do is look at the rate of 14CO2 uptake in a representative dwarf tree in the laboratory under different temperatures and illumination. There might be plenty of signal in one leaf/needle after 15 minutes. Then it would be useful to consider what fraction of new carbohydrate ends up in cellulose and other plant material (your TRW), what fraction is consumed by respiration (and what fraction is stored for next spring?). If only 10% goes into cellulose, a 5% reduction in CO2 uptake would be significant. Trying to learn about the importance of many factors in uncontrolled settings is challenging, but it does leave you with another publishable reconstruction to add to the “spaghetti” graph, whose confidence limits and dynamic range will be challenged by SteveM. FWIW, MXD looks like a big advance over TRW.
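The “10% into cellulose” step rests on an unstated assumption: that the carbon used by respiration and other sinks does not fall when uptake falls. Under that assumption (made here for illustration; `surplus_change` is a hypothetical helper), the arithmetic works out as follows.

```python
def surplus_change(uptake_drop=0.05, allocated_fraction=0.10):
    """Fractional change in the carbon left over for wood when uptake falls.

    Assumes (for illustration only) that all other carbon demands, such as
    respiration, stay fixed at their pre-drop level.
    """
    uptake = 1.0
    fixed_demand = uptake * (1 - allocated_fraction)  # assumed unchanged
    before = uptake - fixed_demand
    after = uptake * (1 - uptake_drop) - fixed_demand
    return (after - before) / before


print(surplus_change())  # about -0.5, i.e. the leftover is halved
```

With the default numbers, a 5% drop in uptake halves the carbon left over for wood. If respiration instead scaled down in proportion to uptake, the leftover would fall by only 5%, so the conclusion hinges on that assumption.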

      • Frank
        Posted Feb 7, 2016 at 4:53 AM | Permalink | Reply

        Follow the Money: In biochemical terms, there is probably a temperature-dependent, rate-limiting step in the conversion of CO2 into the cellulose in tree rings and/or late wood (MXD). For crops, I’ve read that the intensity of sunlight during daytime is generally not a limiting factor. So a small reduction in sunlight reaching the surface after a volcanic eruption (Pinatubo was only a -3 W/m2 forcing at peak, a 2% reduction in radiation reaching the surface) probably has a trivial effect on tree growth compared with the reduction of growth associated with temperature.

        In many plants, the rate-limiting step in the conversion of CO2 to carbohydrates is catalyzed by the enzyme RuBisCO. It is a target of many attempts to genetically engineer more efficient crops. Compared with many enzymes, its turnover number is low – fixing a maximum of about 10 CO2 molecules per enzyme per second. I have no idea whether dendrochronologists understand the biochemistry that produces the temperature dependence of tree growth.

        I’d like to again draw attention to Figure 6, which shows the inability of TRW to record high frequency (annual) variation in temperature, even at sites where there is strong correlation between TRW and temperature.

        • davideisenstadt
          Posted Feb 7, 2016 at 7:11 AM | Permalink

          Just because light isn’t a limiting factor doesn’t necessarily imply that a reduction in light won’t reduce plant growth.
          Your assertion that a reduction in sunlight received by plants won’t decrease their growth rates flies in the face of working experience of foresters and commercial farmers everywhere.
          Do you seriously maintain that generally, shaded trees do not grow more slowly than the same species of tree that experiences full sun?
          Perhaps for shade loving understory ornamentals like dogwoods…but for the vast majority of trees this is not the case.

          this sentence:
          ” So a small reduction in sunlight reaching the surface after a volcanic (Pinatubo was only a -3 W/m2 forcing at peak, a 2% reduction in radiation reaching the surface) probably has a trivial effect on tree growth compared with the reduction of growth associated with temperature.”

          is a keeper.

          The change in a tree’s growth rate caused by a change in temperature may be positive or negative, depending on the direction of the change (cooling or warming), the starting temperature and the magnitude of the change.
          Trees’ responses to changes in temperature are neither uniform nor linear; that’s one of the basic problems in attempting to use trees’ growth as a proxy for temperature.
          Thus, warming may have either positive or negative impacts on a tree’s growth rate, depending not only on what temperature the tree was experiencing in the first place, but also on the magnitude of the warming.
          Ceteris paribus, reduction in sunlight received will generally have a negative impact on trees’ growth rate.
          Your point regarding the resolution of temperature information gleaned from tree rings is an important one, one often overlooked by advocates of the use and utility of these proxies.

        • Frank
          Posted Feb 7, 2016 at 5:15 PM | Permalink

          David: I said that for [agricultural] crops, light is generally not a limiting factor for growth. If you want to compare harvesting solar energy with biomass to harvesting it with solar panels, you’ll find that plants are inefficient because they can’t use all of the photons they receive in full sunlight. You are obviously correct that light is a limiting factor in dense forests. I don’t know what happens near the tree line, where one might expect trees to be further apart.

          There is obviously a lot of interest in improving the efficiency of growth of agricultural crops (particularly by engineering the enzyme RuBisCO) and in increasing their tolerance to heat and drought. Information on the important biochemical limitations of tree growth at extreme altitude or latitude, or of any plants at low temperature, isn’t easy to find. Do dendrochronologists treat trees as “black boxes”, or has the biochemical nature of their temperature dependence been studied? Or is growth actually controlled in a different manner: by membrane fluidity, or by the number of “growing” days between the last frost (or first warm spell triggering growth) in the spring and the first frost (or first cold spell terminating growth) in the fall? In the latter case, we might expect TRW to vary more with local spring and fall temperature and less with summer temperature.
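The “growing days” alternative could be quantified from daily station data. The sketch below assumes daily minimum temperatures for one Northern Hemisphere year, a frost threshold of 0 °C, and a split at day 182 to separate spring frosts from autumn frosts. All three choices, and the function name, are illustrative assumptions rather than an established dendro metric.

```python
def growing_season_length(daily_min_c, frost_c=0.0, midyear=182):
    """Days strictly between the last spring frost and the first autumn frost.

    daily_min_c: one year of daily minimum temperatures (index 0 = 1 January).
    A "frost" is any day with a minimum at or below frost_c. Days before
    `midyear` count as spring, the rest as autumn (Northern Hemisphere).
    """
    spring = [d for d in range(midyear) if daily_min_c[d] <= frost_c]
    autumn = [d for d in range(midyear, len(daily_min_c)) if daily_min_c[d] <= frost_c]
    if not spring or not autumn:
        raise ValueError("need at least one frost day on each side of midyear")
    return autumn[0] - spring[-1] - 1


# Hypothetical year: frosts through day 99, frost-free days 100-279, frosts again from day 280.
year = [-5.0] * 100 + [10.0] * 180 + [-5.0] * 85
print(growing_season_length(year))  # 180
```

Regressing TRW on a series of such season lengths, alongside summer means, would be one way to test whether spring and fall temperatures carry more of the signal.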

        • Follow the Money
          Posted Feb 8, 2016 at 4:30 PM | Permalink

          In biochemical terms, there is probably a temperature-dependent, rate-limiting step in the conversion of CO2 into the cellulose in tree rings and/or late wood (MXD).

          I don’t know how to take “probably,” “rate-limiting,” or “step,” but there is certainly a relationship between light availability and the conversion of CO2 into cellulose, ‘certainly’ derived from published science for ages. Relative lack of light, e.g., from interannual variability in cloudiness, is indeed a factor.

          Another point to be made research-wise, appropriate here, is that if one focuses on the fact that CO2 is necessary for cellulose growth (don’t say duh! out there), shouldn’t research be directed at whether increased CO2 causes increased growth? (I suspect the existence of the “divergence issue” is anthropogenic evidence the answer is something close to “no.”)

          For crops, I’ve read that the intensity of sunlight during daytime is generally not a limiting factor.

          The availability of light is a significant factor (I don’t like “limit”) on plants of which I am aware, especially outwardly noticeable in grasses.

          -3 W/m2 forcing at peak, a 2% reduction in radiation

          It looks like you’re talking about infra-red “radiation,” heat, here. I would rather see spectrometer data about visible and UV light after Pinatubo and the other vulcanism incidents at issue.

      • Posted Feb 7, 2016 at 2:37 PM | Permalink | Reply

        “Volcanic eruptions have been identified as a major natural forcing of the climate system (Oppenheimer, 2011).”

        Willis Eschenbach: “So I’ll say, as I’ve said before, that while volcanoes can certainly affect local areas, rumors of the power of volcanoes to affect global average temperatures have been greatly exaggerated.”

        Includes links to a series of volcano posts.


  26. EdeF
    Posted Feb 7, 2016 at 8:37 AM | Permalink | Reply

    Excellent post, Steve. With this species of tree I would cut out the ex post screening,
    plot each tree core as is (except for obvious mechanical tree damage), but then look
    for the temperature signal as the optimum tree response moves north or south. I would
    plot the changes in the optimum tree response as variations in latitude, knowing this
    may be confounded by rainfall, competing tree die-off, insect infestation, etc.

  27. Michael Jankowski
    Posted Feb 7, 2016 at 12:13 PM | Permalink | Reply

    Seems to me that older pubs need to be checked and possibly re-analyzed. With recent temperatures being revised upwards and the past being cooled, the screening correlations and then the calibrations and verifications of the used series are going to change. And any divergence problem is likely to be “worse than we thought.”

    • Geoff Sherrington
      Posted Feb 7, 2016 at 7:24 PM | Permalink | Reply

      If there has indeed been an 18 year “pause” in global temperatures, locations with similar patterns will cause a calibration problem. How do you calibrate against an input that is constant?
      OTOH, here is an opportunity. Dendro responses that show systematic variation during a steady temperature input would seem to provide means to quantify other variables that affect growth.

      • Michael Jankowski
        Posted Feb 8, 2016 at 10:21 AM | Permalink | Reply

        Most of the “recent” studies calibrate against so-called regional temps from CRUTEM, and often seasonal instead of yearly. So it’s not necessarily pause-material. In any case, there seems to be a retroactive cooling of the earlier part of the 20th century when adjustments are made, and a warming of more recent decades/years. So those will affect calibration.

    • Posted Feb 7, 2016 at 11:35 PM | Permalink | Reply

      Trees do not respond to global temperatures, they respond to local temperatures and local environmental factors.

      If the actual, local environmental factors are poorly documented in a given tree ring chronology, claims of “correlation” with presumed local temperature averages should be strongly discounted.

      • Geoff Sherrington
        Posted Feb 8, 2016 at 3:06 AM | Permalink | Reply

        Yes, as I noted. When global T is constant, there is more likelihood of local areas that are also constant. Not firmly so, but a rude expectation. Steve has asked that we not get too general, so I’ll just note that you and I are in agreement on this.

      • Michael Jankowski
        Posted Feb 8, 2016 at 5:29 PM | Permalink | Reply

        It would be important to examine correlations with other local factors as well, such as precipitation, aerosols, etc, and allegedly well-mixed global factors such as CO2 levels. Granted, some such data may only cover a portion of the calibration period, but it should be examined. I would argue that correlations to temperatures in other regions should be examined as well to help substantiate that local temperatures are the driving force and that it’s not coincidental (although Mann and his teleconnection garbage would possibly argue to use whichever gridcell correlates best).

  28. Anto
    Posted Feb 8, 2016 at 5:57 AM | Permalink | Reply

    Amazing what you can get away with in climate “science”. Just imagine if you had tried to publish the results of a medical trial, where you took the trial population, excluded any who had adverse reactions, then excluded those who had insufficiently favourable reactions, then published unreservedly positive results from your trial, based on that small population of positive responders. Where else would this process escape censure?

    • davideisenstadt
      Posted Feb 8, 2016 at 6:57 AM | Permalink | Reply

      well, there’s AA…they use this type of analysis.
      AA works for 98% of less than 3% of the population….

      • Posted Feb 8, 2016 at 9:18 AM | Permalink | Reply

        Automobile Association, Alcoholics Anonymous or other?

        • davideisenstadt
          Posted Feb 8, 2016 at 9:24 AM | Permalink

          alcoholics anonymous.
          their use of statistics would make a dendroclimatologist blush.

        • Posted Feb 8, 2016 at 9:38 AM | Permalink

          “Whoever saves one life saves the world entire.”

          Perhaps they’re both in the same business therefore, leading to some looseness in the stats:)

          (Interestingly, in googling those words one at once finds criticism of Spielberg for having got the original Talmud quote wrong. Surely not the film-maker, but Thomas Keneally and his Jewish informant, right-hand man to Oskar Schindler? Talk about pedantic quibbling.)

          Your comment jumped out at me because alongside my 22-year-old niece being baptised yesterday were a number of recovering addicts, one of whom mentioned the benefit of the twelve-step program in his testimony beforehand. Sample of one but a moving one.

      • mrmethane
        Posted Feb 8, 2016 at 10:36 AM | Permalink | Reply

        I think you’ll find that AA publish very few stats and make no claims based on same. For-profit rehab places, not so reserved. You’ll also find that AA take no large donations, no government grants, and do not engage in or comment upon related scientific research. Beyond the occasional Public Service Announcement, you’ll find no media hype from AA. Oh, that academics had the same level of ethics.

        • Phil Howerton
          Posted Feb 8, 2016 at 2:40 PM | Permalink

          Mrmethane: You are exactly right. AA does not produce stats. Nor does it accept large donations. I have been a member now for twenty-six years. My usual donation is a dollar. It saved my life and has done the same for millions of others. I am afraid Mr. Eisenstadt, on this subject at least, doesn’t know what the hell he is talking about.

    • Michael Jankowski
      Posted Feb 8, 2016 at 10:26 AM | Permalink | Reply

      If you want to use that analogy, I would say it’s more akin to a medical trial whereby the assignment of who received a drug and who received a placebo was done retroactively and determined based on the results.

  29. Michael Jankowski
    Posted Feb 8, 2016 at 8:56 PM | Permalink | Reply

    Looked at Fig 5 and re-read Wilson’s 2007 description of the sites that were discarded because of their poor correlations. His logical explanation as to possibly why:

    “…Nine chronologies were
    excluded from further analysis, as they showed no
    significant correlation with this season. Of these nine
    chronologies, FT, LT, PT, GW, WC and TB are located
    at more ‘interior’ sites around the extreme western
    end of Prince William Sound and are likely
    protected from the influence of the North Pacific
    resulting in drier site conditions…”

    Hmmm…drier site conditions. Brilliant. If only someone kept data on stuff like precipitation, which can be an indicator of how dry or wet a site gets.

    Here’s the result of 10 seconds of extremely difficult googling:

    I’ll go as far as to say that there’s FAR more detailed precipitation data out there, and that it’s much more up-to-date. For some grant funding, I’ll provide it to you, oh world of dendros.

    (Number of times that “rain,” “snow,” or “precipitation” appears in the publication, aside from titles in the bibliography: ZERO…seriously?)

  30. Craig Loehle
    Posted Feb 9, 2016 at 11:17 AM | Permalink | Reply

    Trees clearly respond to precipitation as well as temperature. Even cold sites can get too much or too little snow. It is well known that there have been floods and droughts throughout history almost everywhere, and these wet/dry periods have sometimes lasted hundreds of years. This confounds using trees as thermometers for 1000-year reconstructions. I challenge any dendro to show me why I am wrong. To do so they would need to prove that precipitation has never affected the trees they selected, or has affected them only as noise.
    Likewise, changes in tree health or competition can change for periods of hundreds of years–please show how this is irrelevant.
    No, instead they just ignore these 2 key things and pretend they don’t happen.

    • Michael Jankowski
      Posted Feb 9, 2016 at 11:29 AM | Permalink | Reply

      Well, I can’t say they always ignore such factors…especially when using trees as precipitation proxies 😉

    • Follow the Money
      Posted Feb 9, 2016 at 4:10 PM | Permalink | Reply

      Warmer weather = more precipitation.

      I believe that is the IPCC-influenced dendro firm faith. Jus’ because, and everywhere.

      Especially Australia.

      • Michael Jankowskim
        Posted Feb 9, 2016 at 8:25 PM | Permalink | Reply

        Nah, sometimes you need to find drought histories, too!

    • Posted Feb 10, 2016 at 7:08 AM | Permalink | Reply

      Craig – we don’t ignore – decades of work exploring these issues – you’re just ignoring the literature.

      If you want to see non-linear modelling based approaches, you cant do much worse than start here:

      • davideisenstadt
        Posted Feb 10, 2016 at 8:23 AM | Permalink | Reply

        Dr Wilson:
        With all due respect, your comment reveals that you either don’t get the point Craig is making, or you just don’t want to get it.
        Craig’s observation deals with the assumptions necessary to extend observed relationships into the far past.
        Whether the relationship between the proxies you investigate and temperature is linear or nonlinear, a necessary assumption is that the posited relationship holds relatively constant over time. (or changes at a known rate throughout the time period examined….)
        Since, by their nature, trees respond to literally a myriad of environmental factors, and their environment is likely to change over a centennial scale, it’s not unreasonable to question these assumptions.
        IOW, just because a tree yields a time series (TR chronology) that shows a correlation to the instrumental record doesn’t in any way imply that other factors did not influence the particular tree’s growth in the centuries before the tree was cored, and examined.
        Conflating concerns like those Craig articulated with the nature of whatever relationship one posits exists is, in my view, mistaken.

        Also this sentence:

        “If you want to see non-linear modelling based approaches, you cant do much worse than start here:”

        Perhaps it should read something like “…you could do much worse” or “can’t do much better”.

        • Gd Holcombe
          Posted Feb 10, 2016 at 12:17 PM | Permalink

          Mr. Eisenstadt:

          I’ve enjoyed reading your arguments, and I agree with the case you make. Your econometric background gives you some important and useful things to say. Conversely, I disagree with Dr. Wilson. That said, I don’t think it’s especially useful to suggest that he “[doesn’t] want to get it.” Sure, it’s frustrating to have sensible arguments fall on seemingly deaf ears, but Wilson is at least willing to engage here, and to defend his arguments in a very hostile forum–a rare thing where he comes from. The back and forth between Wilson and you and Steve and many other very bright people on this forum is very helpful to many of us. So why question motives and give him a reason to perhaps withdraw from the discussion?

          Same thought with your comment at the end. It’s obvious he misstated himself there, but what’s the purpose of pointing it out when we all know what he meant?

          Again, thanks for the very useful discussion points and arguments you’ve made. I almost always learn something from reading you. But, frustrated as you may be with his arguments, please consider easing up on the motive-questioning, at least with Wilson (Mann and Schmidt and others deserve the questioning). I’d like to continue to watch Wilson defend his arguments here and see you and others respond to them.


        • bernie1815
          Posted Feb 10, 2016 at 1:27 PM | Permalink

          FWIW, I strongly endorse GD Holcombe’s position on both the value of the discussion and the importance of being temperate and substantive when responding to Rob Wilson and any other well-mannered climate scientists who participate.

        • davideisenstadt
          Posted Feb 10, 2016 at 2:13 PM | Permalink

          Gd Holcombe:
          Thank you for your kind and temperate words.
          I find that C. Loehle’s points are generally carefully worded, and precise.
          For one to basically ignore the obvious plain reading of his comment struck me as…unusual.
          I find Dr Wilson’s response to border on the willfully obtuse.
          However, I should strive to be more considered in my written remarks.
          My apologies to Dr Wilson and to all of you.

        • Frank
          Posted Feb 12, 2016 at 12:18 PM | Permalink

          Davideisenstadt: Another way to approach the question of whether a tree site was temperature sensitive in the past is to hypothesize that certain types of sites are temperature sensitive on average 50%, 80%, or 95% of the time, both today and in the past. If dendrochronologists get good enough at identifying the characteristics of sites that are temperature-sensitive today (there being nothing special about today), then it is likely that a collection of such sites will (on average) be equally temperature sensitive in the past. Unfortunately, dendroclimatologists aren’t particularly effective at selecting temperature-sensitive sites right now, so they remove the insensitive ones by ex post screening for correlation and thereby improve their signal in the instrumental period. Because of this ex post screening, we have no way of knowing how temperature sensitive the retained sites were in the past. If dendroclimatologists could pick temperature-sensitive sites with high reliability without relying on screening, your confidence in their reliability in the past would increase, and the confidence interval without selection would improve.

          There are certainly times during the instrumental period when individual sites fail to show the expected temperature sensitivity for a few years or perhaps decades (or when noise dominates a weak temperature signal). That doesn’t stop a collection of sites from being able to reconstruct the instrumental period with some skill.
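The point about ex post screening can be illustrated with a toy simulation. This is a sketch under stated assumptions, not any author's actual procedure: every "chronology" is pure AR(1) red noise with no temperature signal at all, the "instrumental record" is a hypothetical linear warming trend over the final 100 years, and the AR coefficient (0.5), correlation threshold (0.2) and series counts are arbitrary choices. Screening the series against the trend and averaging the survivors yields a composite that tracks the trend in the calibration window while averaging out to roughly zero before it.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ar1_series(n, phi, rng):
    """Red noise with NO climate signal: x[t] = phi * x[t-1] + e[t]."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


def screened_composite(n_series=500, n_years=600, n_calib=100,
                       phi=0.5, threshold=0.2, seed=1):
    """Screen signal-free series against a warming trend; average the survivors."""
    rng = random.Random(seed)
    # Hypothetical "instrumental" target: steady warming over the last n_calib years.
    target = list(range(n_calib))
    kept = []
    for _ in range(n_series):
        s = ar1_series(n_years, phi, rng)
        # Ex post screening: keep only series that happen to correlate with the target.
        if pearson(s[-n_calib:], target) > threshold:
            kept.append(s)
    if not kept:
        raise ValueError("no series passed screening")
    # Average the retained series year by year.
    composite = [sum(col) / len(kept) for col in zip(*kept)]
    return composite, target, len(kept)


composite, target, n_kept = screened_composite()
n_calib = len(target)
r_calib = pearson(composite[-n_calib:], target)
pre = composite[:-n_calib]
pre_mean = sum(pre) / len(pre)
print(f"kept {n_kept} of 500 series")
print(f"calibration-period r = {r_calib:.2f}")
print(f"pre-calibration mean = {pre_mean:.3f}")
```

Because the retained series were chosen for their agreement with the target, the calibration-period fit says nothing about how they behaved earlier; the flat pre-calibration "shaft" here comes purely from averaging noise.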

      • michael hart
        Posted Feb 10, 2016 at 10:06 AM | Permalink | Reply

        dendrob, on the Bishop Hill thread indicated above, after several people asked, Tim Osborn said this regarding CO2 fertilisation effects.

        “We don’t identify or remove such effects. The empirical evidence for sustained effects (i.e. over decades) on trees in cool, moist locations over long periods of time is scarce.”

        That seems a bit like ignoring an effect because it is probably not possible to collect such data if recent CO2 levels are considered exceptional for current living trees. Given that CO2 affects efficiency of water usage, and thus maybe temperature effects too, I would have thought that this was a good reason to specifically exclude data from the most recent decades.

        • Posted Feb 10, 2016 at 3:16 PM | Permalink

          Re: Osborn comment — The empirical evidence for sustained temperature correspondence by tree ring width over long periods is also scarce.

          The studies that I have seen testing woody plants’ responses to increased CO2 examined differences that are significantly larger than the current increase over pre-industrial atmospheric concentrations. However, if the response is linear, one could at least begin to estimate the response over time to an increase from roughly 280 ppm to over 400 ppm. Several big “ifs” in that effort, of course.
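The linear scaling suggested above amounts to simple arithmetic: convert an experimental growth response into a per-ppm rate, then multiply by the historical rise. The sketch below assumes strict linearity, which is exactly one of the big "ifs", and the experimental numbers in the example (+20% growth for a 370 to 570 ppm enrichment) are hypothetical placeholders, not results from any study.

```python
def scaled_co2_response(exp_response_pct, exp_ambient_ppm, exp_elevated_ppm,
                        preindustrial_ppm=280.0, current_ppm=400.0):
    """Linearly rescale an experimental CO2 growth response to the historical rise.

    Assumes the growth response is strictly proportional to the CO2 increment.
    """
    per_ppm = exp_response_pct / (exp_elevated_ppm - exp_ambient_ppm)
    return per_ppm * (current_ppm - preindustrial_ppm)


# Hypothetical experiment: +20% growth for a 370 -> 570 ppm enrichment.
print(scaled_co2_response(20.0, 370.0, 570.0))
```

Any curvature in the real response (e.g. saturation at higher concentrations) would make this linear estimate biased, so it is at most a first-order starting point.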

      • Craig Loehle
        Posted Feb 10, 2016 at 2:06 PM | Permalink | Reply

        Robert: The paper you cite is a FORWARD looking model. I don’t claim that you can’t do a reasonable job of projecting tree growth responses to FUTURE climate, my argument is that the claim that this particular tree that is a good responder now was always a good responder is unproven (plus my other concerns about nonstationarity of climate, growing conditions). Are we to seriously believe that growing conditions around particular trees have been the same and the PDO pattern (etc) the same going back 800 or 1000 years in regions with glaciers and permafrost? Over 1000 years some coastal Alaska sites have risen many feet due to tectonics and isostatic rebound, potentially affecting drainage etc.
        No one has ever shown a proof of the stationarity assumption. I’m not “ignoring” anything.

        • Posted Feb 10, 2016 at 3:01 PM | Permalink

          Craig – very quick – on phone and typing tricky
          VS-Lite has been used many times to test the mixed climate variable response of trees.
          In environments where the PREDOMINANT limiter to growth is either temperature (upper tree-line) or moisture (lower tree-line), VS-Lite has validated that hypothesis quite well. In such situations, the simple traditional regression model can then be used to model/reconstruct that climate variable.
          Finally, for the GOA and Icefield reconstructions, the reconstructed cool periods coincided VERY well with the glacial record, at least providing validation of the reconstructed cold periods. Apologies for spelling: fat thumbs and a bit of wine.
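The "predominant limiter" idea can be sketched with the principle of limiting factors as VS-Lite (a simplified Vaganov-Shashkin forward model) formulates it: temperature and soil moisture each pass through a piecewise-linear ramp between two thresholds, and growth follows whichever response is smaller. This is an illustrative sketch, not the VS-Lite code: the threshold values are hypothetical placeholders, and VS-Lite's insolation scaling and soil-moisture bucket model are omitted.

```python
def ramp(x, lo, hi):
    """Piecewise-linear response: 0 at or below lo, 1 at or above hi, linear between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def monthly_growth(temp_c, soil_moisture, t1=4.0, t2=17.0, m1=0.02, m2=0.25):
    """Growth rate as the more limiting of the temperature and moisture responses.

    Threshold defaults (t1, t2, m1, m2) are hypothetical, not calibrated values.
    Returns (growth, limiting_factor); ties are labelled "moisture".
    """
    g_t = ramp(temp_c, t1, t2)
    g_m = ramp(soil_moisture, m1, m2)
    limiter = "temperature" if g_t < g_m else "moisture"
    return min(g_t, g_m), limiter


# Cool, wet month (upper tree-line-like): temperature is the binding limit.
print(monthly_growth(8.0, 0.35))
# Warm, dry month (lower tree-line-like): moisture is the binding limit.
print(monthly_growth(20.0, 0.10))
```

Where one factor is consistently the binding minimum, growth varies with that factor alone, which is the situation in which a single-variable regression is being argued to suffice.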

        • Craig Loehle
          Posted Feb 10, 2016 at 3:44 PM | Permalink

          A comparison to historical periods is what I was looking for. Good. However:
          1) Such comparisons have been done in very few places where tree data have been taken
          2) This is only a qualitative comparison.
          3) Since low growth can occur when too hot or too cold (upside-down parabola), the inverse problem is undefined (no unique solution). It is not just that it is nonlinear but that it is dual-valued (as I think you know).
          So testing is good, but so far does not alleviate my concerns re tree rings overall.
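Point 3 can be made concrete with a toy inverted-parabola growth response. The optimum, peak and curvature values below are hypothetical. Inverting the curve for temperature gives two roots symmetric about the optimum, so a given ring width cannot by itself say whether the year was too cold or too warm.

```python
import math


def growth(temp, t_opt=12.0, g_max=1.0, a=0.01):
    """Hypothetical inverted-parabola growth response peaking at t_opt."""
    return g_max - a * (temp - t_opt) ** 2


def invert_growth(g, t_opt=12.0, g_max=1.0, a=0.01):
    """Return every temperature consistent with an observed growth value g."""
    if g > g_max:
        return []  # no temperature produces growth above the peak
    d = math.sqrt((g_max - g) / a)
    # Two roots in general; they collapse to a single root only at the peak.
    return sorted({t_opt - d, t_opt + d})


print(invert_growth(growth(7.0)))   # a cold year...
print(invert_growth(growth(17.0)))  # ...and a warm year give the same ring
```

Both calls return the same pair of candidate temperatures; without outside information there is no way to choose between them.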

  31. mpainter
    Posted Feb 10, 2016 at 3:11 PM | Permalink | Reply

    Rob, just inland from the Gulf of Alaska are plenty of glaciers. Assuming that δ18O ice core analysis has been performed in the area, and that regional climate reconstructions based on ice cores have been produced, how well do the ice cores corroborate your own tree ring work?

    Alternatively, if no ice core δ18O analysis is available, do you not think that it would be a sensible step to undertake one, given the possibility of obtaining independent confirmation of your tree ring work? After all, if δ18O analysis gives you corroboration, that should settle the question of the value of tree ring paleoclimate reconstruction.

3 Trackbacks

  1. […] done so that the result will fit the prevailing picture? I don’t know. The discussion of hockey sticks and tree rings is once again topical at Climate Audit; is there anyone who can take a closer look at the material from […]

  2. […]  Starting there,  Steve McIntyre just destroys a tree ring climate paper in this blog entry: Picking Cherries in the Gulf of Alaska.   I mean he methodically eviscerates the […]

  3. By New Light on Gulf of Alaska « Climate Audit on Feb 14, 2016 at 2:47 PM

    […] week, I posted on the effect of ex post site selection on the Gulf of Alaska tree ring chronology used in Wilson […]
