CG2 and Ex Post Picking

Jul 31, 2019: Noticed this as an unpublished draft from 2014. Not sure why I didn’t publish at the time. Neukom, lead author of PAGES (2019), was a coauthor of Gergis’ papers.

One of the longest-standing Climate Audit controversies has been about the bias introduced into reconstructions that use ex post screening/correlation. In today’s post, I’ll report on a little-noticed* Climategate-2 email in which a member of the paleoclimatology guild (though then junior) reported to other members of the guild that he had carried out simulations to test “the phenomenon that Macintyre has been going on about”, finding that the results of his white-noise simulations “clearly show a ‘hockey-stick’ trend”, a result that he described as “certainly worrying”. (*: WUWT article here h/t Brandon).

A more senior member of the guild dismissed the results out of hand:  “Controversy about which bull caused mess not relevent.”  Members of the guild have continued to merrily ex post screen to this day without cavil or caveat.

The bias, introduced by ex post screening of a large network of proxies by correlation against increasing temperatures, has been noticed and commented on (more or less independently) by myself, David Stockwell, Jeff Id, Lucia and Lubos Motl.  It is trivial to demonstrate through simulations, as each of us has done in our own slightly different ways.

In my case, I had directed the criticism of ex post screening particularly at practices of D’Arrigo and Jacoby in their original studies: see, for example, one of the earliest Climate Audit posts (Feb 2005) where I wrote:

Jacoby and d’Arrigo [1989] states on page 44 that they sampled 36 northern boreal forest sites within the preceding decade, of which the ten “judged to provide the best record of temperature-influenced tree growth” were selected. No criteria for this judgement are described, and one presumes that they probably picked the 10 most hockey-stick shaped series.  I have done simulations, which indicate that merely selecting the 10 most hockey stick shaped series from 36 red noise series and then averaging them will result in a hockey stick shaped composite, which is more so than the individual series.
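
A minimal sketch of that kind of simulation, in Python rather than whatever I used in 2005. The AR(1) coefficient of 0.5, the 400-year length, the 100-year "blade" and the hockey-stick index itself are all assumptions chosen for illustration; none of them comes from the original post or from Jacoby and d’Arrigo.

```python
import numpy as np

def red_noise(n_series, n_years, phi, rng):
    """AR(1) 'red noise': x[t] = phi * x[t-1] + e[t], with e ~ N(0, 1)."""
    e = rng.standard_normal((n_series, n_years))
    x = np.zeros_like(e)
    x[:, 0] = e[:, 0]
    for t in range(1, n_years):
        x[:, t] = phi * x[:, t - 1] + e[:, t]
    return x

def hockey_stick_index(x, blade):
    """Mean of the final `blade` years minus the full-period mean,
    in units of the full-period standard deviation (illustrative metric)."""
    return (x[..., -blade:].mean(axis=-1) - x.mean(axis=-1)) / x.std(axis=-1)

rng = np.random.default_rng(0)
blade = 100                                    # assumed length of the "blade"
series = red_noise(36, 400, 0.5, rng)          # 36 signal-free "sites"

# Standardize each series to zero mean and unit variance.
z = (series - series.mean(axis=1, keepdims=True)) / series.std(axis=1, keepdims=True)

hsi = hockey_stick_index(z, blade)
picked = np.argsort(hsi)[-10:]                 # the 10 most hockey-stick shaped
composite = z[picked].mean(axis=0)             # simple average of the picks
composite_hsi = hockey_stick_index(composite, blade)

print("mean index, all 36 series:   ", round(float(hsi.mean()), 2))
print("mean index, 10 picked series:", round(float(hsi[picked].mean()), 2))
print("index of the composite:      ", round(float(composite_hsi), 2))
```

Averaging leaves the closing uptick of the picked series intact but shrinks the rest of the variance, which is why the composite comes out more hockey-stick shaped than the series that went into it.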

The issue of cherry picking arose forcefully at the NAS Panel on paleoclimate reconstructions on March 2, 2006, when D’Arrigo told a surprised panel that you had to pick cherries if you wanted to make “cherry pie”, an incident that I reported in a blog post a few days later, on March 7 (after my return to Toronto).

Ironically, on the same day, Rob Wilson, then an itinerant and very junior academic, wrote a thus far unnoticed CG2 email (4241. 2006-03-07) which reported on simulations that convincingly supported my concerns about ex post screening. Wilson’s email was addressed to most of the leading dendroclimatologists of the day:  Ed Cook, Rosanne D’Arrigo, Gordon Jacoby, Jan Esper, Tim Osborn, Keith Briffa, Ulf Buentgen, David Frank,  Brian Luckman and Emma Watson, as well as Philip Brohan of the Met Office. Wilson wrote:

Greetings All,

I thought you might be interested in these results. The wonderful thing about being paid properly (i.e. not by the hour) is that I have time to play.

The whole Macintyre issue got me thinking about over-fitting and the potential bias of screening against the target climate parameter.  Therefore, I thought I’d play around with some randomly generated time-series and see if I could ‘reconstruct’ northern hemisphere temperatures.

I first generated 1000 random time-series in Excel – I did not try and approximate the persistence structure in tree-ring data. The autocorrelation therefore of the time-series was close to zero, although it did vary between each time-series. Playing around therefore with the AR persistent structure of these time-series would make a difference. However, as these series are generally random white noise processes, I thought this would be a conservative test of any potential bias.

I then screened the time-series against NH mean annual temperatures and retained those series that correlated at the 90% C.L. 48 series passed this screening process.

Using three different methods, I developed a NH temperature reconstruction from these data:

  1. simple mean of all 48 series after they had been normalised to their common period
  2. Stepwise multiple regression
  3. Principle component regression using a stepwise selection process.

The results are attached.  Interestingly, the averaging method produced the best results, although for each method there is a linear trend in the model residuals – perhaps an end-effect problem of over-fitting.

The reconstructions clearly show a ‘hockey-stick’ trend. I guess this is precisely the phenomenon that Macintyre has been going on about. [SM bold]

It is certainly worrying, but I do not think that it is a problem so long as one screens against LOCAL temperature data and not large scale temperature where trend dominates the correlation. I guess this over-fitting issue will be relevant to studies that rely more on trend coherence rather than inter-annual coherence. It would be interesting to do a similar analysis against the NAO or PDO indices. However, I should work on other things.

Thought you’d might find it interesting though. comments welcome

Rob
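
Wilson's experiment is easy to approximate. The sketch below is not his Excel workbook: the target is a synthetic trend-plus-noise series standing in for NH temperature (an assumption, not real data), the 150-year calibration window is assumed, and "90% C.L." is read here as keeping positive correlations beyond the one-sided 5% critical value, which passes roughly the same share of series as his 48 out of 1000. Only his first method, the simple mean, is shown.

```python
import numpy as np

rng = np.random.default_rng(1)
n_series, n_years, n_cal = 1000, 1000, 150     # assumed dimensions

# White-noise "proxies": they contain no temperature signal at all.
proxies = rng.standard_normal((n_series, n_years))

# Synthetic stand-in for the instrumental target over the last n_cal years:
# a linear warming trend plus noise. This is NOT real NH temperature data.
target = np.linspace(0.0, 0.8, n_cal) + 0.15 * rng.standard_normal(n_cal)

# Pearson correlation of every proxy with the target in the calibration window.
cal = proxies[:, -n_cal:]
cal_z = (cal - cal.mean(axis=1, keepdims=True)) / cal.std(axis=1, keepdims=True)
target_z = (target - target.mean()) / target.std()
r = cal_z @ target_z / n_cal

# Critical r for a one-sided 5% test, using the large-sample value t = 1.645.
t_crit = 1.645
r_crit = t_crit / np.sqrt(n_cal - 2 + t_crit**2)
passed = r > r_crit

# Normalise the retained series over their common period and average them.
kept = proxies[passed]
kept_z = (kept - kept.mean(axis=1, keepdims=True)) / kept.std(axis=1, keepdims=True)
recon = kept_z.mean(axis=0)

recon_r = np.corrcoef(recon[-n_cal:], target)[0, 1]
print("series passing the screen:", int(passed.sum()))
print("calibration r of the 'reconstruction':", round(float(recon_r), 2))
print("mean of first 850 years:", round(float(recon[:-n_cal].mean()), 2))
print("mean of last 50 years:  ", round(float(recon[-50:].mean()), 2))
```

Every retained series correlates positively with the target by construction, so the average tracks the target inside the calibration window while the earlier years, where nothing was selected for, average out towards a flat shaft.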

Wilson’s sensible observations, which surely ought to have caused some reflection within the guild, were peremptorily dismissed about 15 minutes later by the more senior Ed Cook as nothing more than a controversy about “which bull caused mess”:

You are a masochist. Maybe Tom Melvin has it right:  “Controversy about which bull caused mess not relevent. The possibility that the results in all cases were heap of dung has been missed by commentators.”

Cook’s summary and contemptuous dismissal seem to have persuaded the other correspondents, and the issue receded from the consciousness of the dendroclimatology guild.

Looking back at the contemporary history, it is interesting to note that the issue of the “divergence problem” embroiled the dendro guild the following day (March 8) when Richard Alley, who had been in attendance on March 2, wrote to IPCC Coordinating Lead Author Overpeck “doubt[ing] that the NRC panel can now return any strong endorsement of the hockey stick, or of any other reconstruction of the last millennium”: see 1055. 2006-03-11 (embedded in which is Alley’s opening March 8 email to Overpeck). In a series of interesting emails (e.g. CG2 1983. 2006-03-08;  1336. 2006-03-09; 3234. 2006-03-10; 1055. 2006-03-11), Alley and others discussed the apparent concerns of the NAS panel about the divergence problem, e.g. Alley:

As I noted, my observations of the NRC committee members suggest rather strongly to me that they now have serious doubts about tree-rings as paleothermometers (and I do, too… at least until someone shows me why this divergence problem really doesn’t matter).

In the end, after considerable pressure from paleoclimatologists, the NAS Panel more or less evaded the divergence problem (but that’s another story, discussed here from time to time).

Notwithstanding Wilson’s “worry” about the results of his simulations, ex post screening continued to be standard practice within the paleoclimate guild. Ex post screening was used, for example, in the Mann et al (2008) CPS reconstruction. Ross and I pointed out the bias in a comment published by PNAS in 2009:

Their CPS reconstruction screens proxies by calibration-period correlation, a procedure known to generate ‘‘hockey sticks’’ from red noise (4 – Stockwell, AIG News, 2006).

In their reply in PNAS, Mann et al dismissed the existence of ex post screening bias, claiming that we showed  “unfamiliarity with the concept of screening regression/validation”:

McIntyre and McKitrick’s claim that the common procedure (6) of screening proxy data (used in some of our reconstructions) generates “hockey sticks” is unsupported in peer reviewed literature and reflects an unfamiliarity with the concept of screening regression/validation.

CA readers will remember that the issue arose once again in Gergis et al 2012, who had claimed to have carried out detrended screening, but had not.  CA readers will also recall that Mann and Schmidt both intervened in the fray, arguing in favor of ex post screening as a valid procedure.
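
For readers unfamiliar with the distinction, here is a minimal sketch of raw versus detrended screening. It is an illustration only, not the Gergis et al procedure: the two series, their length and their trend are invented, and "detrended" is taken to mean removing a least-squares linear trend from both series before correlating.

```python
import numpy as np

def detrend(y):
    """Remove the least-squares linear trend from a 1-D series."""
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)

def screening_correlations(proxy, target):
    """Return (raw r, detrended r) between a proxy and a target series."""
    raw = np.corrcoef(proxy, target)[0, 1]
    detrended = np.corrcoef(detrend(proxy), detrend(target))[0, 1]
    return raw, detrended

# Two invented series that share a trend but have unrelated year-to-year noise.
rng = np.random.default_rng(2)
t = np.arange(100)
proxy = 0.05 * t + rng.standard_normal(100)
target = 0.05 * t + rng.standard_normal(100)

raw_r, detrended_r = screening_correlations(proxy, target)
print("raw r:      ", round(float(raw_r), 2))
print("detrended r:", round(float(detrended_r), 2))
```

A proxy like this would pass a screen on raw correlation on the strength of the shared trend alone, while showing little or no interannual relationship with the target once the trends are removed.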

28 Comments

  1. kim
    Posted Jul 31, 2019 at 6:55 PM

    The easiest people to fool were themselves.
    ===========================

  2. Posted Jul 31, 2019 at 8:20 PM

    Jul 31, 2019: Noticed this as an unpublished draft from 2014….
    Ironically, on the same day, Rob Wilson, then an itinerant and very junior academic, wrote a thus far unnoticed CG2 email (4241. 2006-03-07) which reported on simulations that convincingly supported my concerns about ex post screening.

    Climategate 2 email – Rob Wilson replicates McIntyre & McKitrick – produces hockey sticks out of noise
    Anthony Watts / November 27, 2011

    https://wattsupwiththat.com/2011/11/27/climategate-2-email-briffa-replicates-mcintyre-mckitrick-produces-hockey-sticks-out-of-noise/

    • Stephen McIntyre
      Posted Jul 31, 2019 at 10:53 PM

      good catch. I’ll note that up in text tomorrow. It’s a while since I looked at this material.

      • R.S. Brown
        Posted Aug 1, 2019 at 1:02 AM

        Steve,

        My congratulations on your use of the non-judgmental term “guild”.

        Keeping the tone of the descriptive narrative to neutral values can only
        encourage rational discussion.

  3. Zagzigger
    Posted Aug 1, 2019 at 5:30 AM

    Am I misreading this? Surely the comment made by Ed Cook means they are all BS – but nobody noticed.

    “Controversy about which bull caused mess not relevent. The possibility that the results in all cases were heap of dung has been missed by commentators.”

    • slgeiger
      Posted Aug 1, 2019 at 1:09 PM

      Read that the same way….

    • Anteros
      Posted Aug 2, 2019 at 11:37 AM

      I also read it that way

      • Matt Skaggs
        Posted Aug 3, 2019 at 10:06 AM

        Cook is indeed calling the reconstructions “BS.”

        Cook became disaffected with Michael Mann’s bullying, and the result was a shot across the bow:

        https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2001GL014580

        Even though Cook never bought into the smoothing-out of the MWP, he brusquely dismissed Steve’s request for data, so Steve has been on the attack ever since.

        • Jeff Alberts
          Posted Aug 3, 2019 at 10:59 AM

          Indeed, Cook seems only able to criticize the consensus in private. In public he’s a staunch defender.

  4. Posted Aug 2, 2019 at 9:58 AM

    I don’t really want to beat on Rob:

    “It is certainly worrying, but I do not think that it is a problem so long as one screens against LOCAL temperature data and not large scale temperature where trend dominates the correlation.”

    I have no idea why he would automatically conclude that local temperatures would resolve the issue. Any admission by a scientist that the tree ring data contained even a small signal from non-temperature influences (and there is a lot of poorly correlated data) results in the over-selection of the favorable portion of the non-temperature signal and rejection of the unfavorable portion. This hand-waving argument is non-trivial and unscientific.

    • EdeF
      Posted Aug 2, 2019 at 8:29 PM

      Tree ring growth may respond to factors other than temperature, such as rainfall, rainfall timing, cloud cover %, soil type and nutrient availability, plant disease, human activity, changes in atmospheric carbon levels and changes in other gases, air pressure, winds, tree site location (shading effects, etc), avalanches, to name a few.

      • Posted Aug 6, 2019 at 6:29 AM

        Oh they truly do. The noise in the data is tremendous which is why the field has replaced what should have been simple data averaging methods with preferential data sorting. I wish all science worked like that because we could get so much more done!! All drugs would work, stocks would soar, anything we wanted could be done.

    • TimTheToolMan
      Posted Aug 17, 2019 at 4:57 AM

      Jeff Id writes “Any admission by a scientist that the tree ring data contained even a small signal from non-temperature influences (and there is a lot of poorly correlated data) results in the over-selection of the favorable portion of the non-temperature signal and rejection of the unfavorable portion.”

      regarding: “Controversy about which bull caused mess not relevent. The possibility that the results in all cases were heap of dung has been missed by commentators.”

      No it wasn’t, Tom. You were either listening to the wrong commentators – the same ones who deny there is issue with dendroclimatology …or had your fingers in your ears going LA, LA, LA, LA.

  5. Phoenix44
    Posted Aug 6, 2019 at 8:33 AM

    I find the whole thing deeply puzzling. How can you prove your reconstruction by picking only the proxies that prove your reconstruction? It’s absurd, akin to a drug company simply ignoring any participants in a drug trial who didn’t get better. If Big Pharma did that, people would be outraged, yet here it happens and people shrug or claim it’s fine.

    • Adam Gallon
      Posted Aug 6, 2019 at 3:33 PM

      Happens in Big Pharma too.
      Trials with a negative outcome, don’t get published. An “It didn’t work” outcome, won’t get the researchers any kudos or citations.
      The pharma companies aren’t interested in publishing negative outcomes either.

      • Posted Aug 6, 2019 at 4:51 PM

        Does not happen in big pharma. Negative data would never be deleted. You know not what you profess. Also, journals would never accept a paper stating a drug didn’t work.

        • Posted Aug 7, 2019 at 8:06 PM

          Repeating the words “It can’t happen here” is always a comfort, and most people not involved in the area in question will agree with you. If the assumption that whistleblowers take no risks is valid I would agree as well. But if the truth is they get no awards or industry praise but instead risk their careers then we need to have auditors, investigative reporting and open minds.

        • Adam Gallon
          Posted Aug 8, 2019 at 1:53 AM

          Not deleted, simply if something didn’t work, then it won’t get published.

        • MikeN
          Posted Aug 14, 2019 at 6:48 PM

          If journals are not accepting a paper that a drug didn’t work, then this is deletion of negative data.
          I’m not so sure journals would not publish this.

      • Michael Jankowski
        Posted Aug 12, 2019 at 5:45 PM

        Drug trial failures beyond the investigatory phase are at least REPORTED with regularity. Clinicaltrials_dot_gov, for example, has a database of trials that include suspended, terminated, withdrawn, etc.

  6. JAMES SMYTH
    Posted Aug 7, 2019 at 9:44 PM

    The point is that in a medical trial, negative outcomes aren’t REMOVED from the data, in order to get positive outcomes.

  7. MrPete
    Posted Aug 8, 2019 at 11:52 AM

    Big Pharma is quite upset about “tricks” like this.

    There are major efforts under way to improve the situation. A few pertinent references:

    * What Makes Science True?: A 15 min PBS video on the reproducibility crisis. Includes a good bibliography under “Editor’s note”.

    * Reproducible Research: One solution that’s helping. Studies following this paradigm require a package of data and processing instructions that automatically produce the analytical results. Easy for anybody to check the assumptions, etc.

    * Registered Reports aka Preregistered Research: The study plan is peer reviewed, and a publication commitment is made, before any data is gathered. Thus, even negative results are published!

    To me, these are wonderful advances.

  8. Posted Aug 9, 2019 at 11:13 PM

    Reblogged this on Climate Collections.

  9. Tom T
    Posted Aug 9, 2019 at 11:49 PM

    Cook’s dismissal sounds like the reasoning used in a poker game. The term is pot committed. You have bet so much in a pot that even though you have been trapped and you realize you have been trapped it doesn’t matter. You have to see the hand to the end. Statistically you have higher odds of playing the hand and hoping to catch an out and win the pot and eventually the game than you do of winning the game should you fold and lose all the chips you have put into the pot.

  10. EdeF
    Posted Aug 10, 2019 at 8:49 AM

    This got me thinking, do they also use ex post screening on instrumented weather station data?

  11. I_am_not_a_robot
    Posted Aug 16, 2019 at 1:49 AM

    The idea that some individual trees are as honest as the millennium is long based on an approximate correlation with the (purported) thermometer record and others are inveterate liars is preposterous.

    • patfrank01
      Posted Aug 18, 2019 at 1:53 PM

      That’s the argument, though, Ianar. The literature talks about trees with constant response over their lifetime.

      I discussed that assumption in my “Negligence…” paper.
