McShane and Wyner 2010

A reader (h/t ACT) draws attention to an important study on proxy reconstructions (McShane and Wyner 2010) in the Annals of Applied Statistics (one of the top statistical journals):
A Statistical Analysis of Multiple Temperature Proxies: Are Reconstructions of Surface Temperatures Over the Last 1000 Years Reliable?

It states in its abstract:

We find that the proxies do not predict temperature significantly better than random series generated independently of temperature. Furthermore, various model specifications that perform similarly at predicting temperature produce extremely different historical backcasts. Finally, the proxies seem unable to forecast the high levels of and sharp run-up in temperature in the 1990s either in-sample or from contiguous holdout blocks, thus casting doubt on their ability to predict such phenomena if in fact they occurred several hundred years ago.
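
For readers who want the mechanics: the test amounts to fitting a regression of temperature on the proxies over part of the instrumental period, scoring it on a held-out block, and asking whether predictor series generated independently of temperature do any worse. A toy sketch in Python, with synthetic data and a single 30-year holdout block (purely illustrative; the paper itself uses the Mann et al. 2008 proxy network, lasso regression, and several null models):

    import numpy as np

    rng = np.random.default_rng(0)
    n_years, n_proxies = 149, 20                      # an 1850-1998-style calibration window
    temp = np.cumsum(rng.normal(0.0, 0.1, n_years))   # toy "instrumental" temperature
    proxies = 0.5 * temp[:, None] + rng.normal(0.0, 1.0, (n_years, n_proxies))
    fakes = np.cumsum(rng.normal(0.0, 1.0, (n_years, n_proxies)), axis=0)  # independent of temp

    def holdout_rmse(X, y, lo=60, hi=90):
        # Fit OLS outside a contiguous holdout block, report RMSE inside it.
        train = np.r_[0:lo, hi:len(y)]
        Xd = np.c_[np.ones(len(y)), X]                # prepend an intercept column
        beta, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)
        test = np.arange(lo, hi)
        return np.sqrt(np.mean((y[test] - Xd[test] @ beta) ** 2))

    # In this toy the proxies carry real signal, so they win; the paper's finding
    # is that on the real data the independent series do roughly as well.
    print("proxy RMSE:", holdout_rmse(proxies, temp))
    print("fake  RMSE:", holdout_rmse(fakes, temp))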

They cite the various MM articles.

356 Comments

  1. Lewis
    Posted Aug 14, 2010 at 12:06 PM | Permalink

    Can I say ‘verified’ and ‘Q.E.D.’!?

    • richyRich
      Posted Aug 19, 2010 at 1:15 AM | Permalink

      @Lewis: “Can I say ‘verified’ and ‘Q.E.D.’!?”

      well… can ya? BTW, how’s the audit of this paper coming along?

  2. Benjamin
    Posted Aug 14, 2010 at 12:19 PM | Permalink

    “On the one hand, we conclude unequivocally that the evidence for a ‘long-handled’ hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data.”

    Ouch, that hurts.

  3. David
    Posted Aug 14, 2010 at 12:19 PM | Permalink

    Everyone knows you can’t get statistical “skill” without the bristlecones – did they consider that? Haha!

    • EdeF
      Posted Aug 15, 2010 at 9:15 PM | Permalink

      I would like to see the authors address the assumption of the linearity of the proxies, maybe in a sequel to this well-written report. I have been unable to see much linear behavior in comparing BCP ring widths with local temperature data over the instrumental time frame. As I have said in other posts on this site, the plots look like what you would see if someone were checking out a new bird gun. From a logical point of view, if proxies do not behave linearly with some variable of interest (summer temperature, average yearly temperature of the preceding year, rainfall, etc.), or with some combination, then it is pointless to construct and evaluate 1000-year-old reconstructions. I am still going to re-read this lucid report and await any updates.
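
      A check of this kind is easy to sketch. A minimal version, with hypothetical ring-width and local-temperature arrays standing in for real measurements:

          import numpy as np

          rng = np.random.default_rng(1)
          temp = rng.normal(14.0, 0.6, 120)                            # toy local summer means, deg C
          ring_width = 0.8 + 0.05 * temp + rng.normal(0.0, 0.3, 120)   # toy proxy response

          slope, intercept = np.polyfit(temp, ring_width, 1)           # least-squares line
          r = np.corrcoef(temp, ring_width)[0, 1]
          print(f"slope={slope:.3f}, r^2={r * r:.3f}")
          # A shotgun-pattern scatter (r^2 near zero) over the instrumental overlap
          # is exactly the "bird gun" picture: no evident linear proxy-temperature relation.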

  4. PaulM
    Posted Aug 14, 2010 at 12:35 PM | Permalink

    Wow. Great. This is really important. These are serious statisticians and it’s a good journal. Amusing to see they are from Pennsylvania!

  5. Hoi Polloi
    Posted Aug 14, 2010 at 12:38 PM | Permalink

    To me climatology is “we know AGW exists, now find the matching models”.

  6. RayG
    Posted Aug 14, 2010 at 12:55 PM | Permalink

    Wow!!! I particularly like their comment in the Conclusions: “Climate scientists have greatly underestimated the uncertainty of proxy-based reconstructions and hence have been overconfident in their models.” The last sentence in the paper describes all that is wrong with the current state of climatology as practiced by “The Team”: “Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.”

  7. Pat Frank
    Posted Aug 14, 2010 at 1:20 PM | Permalink

    Halle-bloody-lujah. It’s about time statisticians took a detailed interest in the scientific kludge that is proxy thermometry.

    Steve has been the lonely voice in this field for years, has single-handedly carried the fight for honest audits to the very core of the field, and has bravely withstood the resulting vicious opprobrium that has disgraced science.

    So, hats off to Blakeley McShane and Abraham Wyner for standing up with Steve and taking up the rescue of scientific integrity. Until now, apart from Edward Wegman, it has been sorely neglected by their colleagues.

    • Posted Aug 14, 2010 at 3:13 PM | Permalink

      Agreed Pat. All I’d add is that Steve’s voice has been made less lonely by the terrific community here at Climate Audit. Never has something as humble as WordPress been used for something so significant. Every constructive critic and online supporter should consider some of the glory from the Annals of Applied Statistics as duly reflected on them today. It’s telling that Steve only knew of this from one of CA’s followers. I salute every one of you, friends.

    • Posted Aug 14, 2010 at 7:26 PM | Permalink

      Re: Pat Frank (Aug 14 13:20), I must have fallen asleep twenty times before I could finish the paper. But I was sooooooo determined to read it all because, even while understanding little of the details (so I probably make enthusiastic but stupid remarks here), I too feel

      “Halle-bloody-lujah”

    • dougie
      Posted Aug 14, 2010 at 7:51 PM | Permalink

      Amen to that Pat.

      about time they/somebody of their calibre chipped in.

    • Faustino
      Posted Aug 17, 2010 at 2:36 AM | Permalink

      Pat, about ten years ago, former Australian Statistician and past president of the international statisticians’ union Ian Castles discredited the economic modelling on which the IPCC’s scenarios are based, working with former OECD Chief Economist David Henderson. Some of the IPCC staff initially welcomed the critique as helping them get to the truth, but higher levels then pulled down the shutters, and the discredited modelling was never revisited.

      While head of the ABS, Castles was a major player in getting international agreement on appropriate national income statistics, which reflect those developed in Australia. International comparisons are based on “purchasing power parity” – adjusting monetary values in terms of what can be purchased to enable international comparisons. The IPCC’s modellers – who are not specialists in this field – did not use PPP. The models had some ludicrous results – e.g., in some scenarios, South Africa’s national income in 2100 exceeded world income in 1990, the start date. After growth was (incorrectly) modelled, assumptions were made about the emissions intensity of growth. These assumptions have proved to be far too high, but have never been corrected by the IPCC.

      • Faustino
        Posted Aug 17, 2010 at 2:58 AM | Permalink

        I was going to send the paper to Castles, I’ve just discovered he died two weeks ago. A sad loss.

  8. David L. Hagen
    Posted Aug 14, 2010 at 1:24 PM | Permalink

    Finally scientists brave enough to state the unknowns rather than showing hubris:

    Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades let alone over many centuries.

  9. Justin
    Posted Aug 14, 2010 at 1:41 PM | Permalink

    “It is not necessary to know very much about the underlying methods to see that graphs such as Figure 1 are problematic as descriptive devices. First, the superposition of the instrumental record (red) creates a strong but entirely misleading contrast.”

    Oh snap.

  10. Posted Aug 14, 2010 at 1:44 PM | Permalink

    Acute constipation is always best worked out with a pencil.

  11. Benjamin
    Posted Aug 14, 2010 at 1:46 PM | Permalink

    “The proxy record has to be evaluated in terms of its innate ability to reconstruct historical temperatures (i.e., as opposed to its ability to ‘mimic’ the local time dependence structure of the temperature series).”

  12. Posted Aug 14, 2010 at 1:51 PM | Permalink

    This is a beautiful paper. Simply beautiful.

  13. stephen richards
    Posted Aug 14, 2010 at 1:54 PM | Permalink

    VINDICATION !!!!!!

  14. stephen richards
    Posted Aug 14, 2010 at 1:55 PM | Permalink

    This is not harshly worded but it is very harsh.

  15. TimG
    Posted Aug 14, 2010 at 1:59 PM | Permalink

    The paper is listed as submitted. Does that mean anything more than the paper is done and the authors are hoping it will pass peer review?

    • John M
      Posted Aug 14, 2010 at 2:18 PM | Permalink

      TimG

      A reasonable question. It does appear that it’s at least been through the review process.

      Acknowledgements. We thank Editor Michael Stein, two anonymous referees, and Tilmann Gneiting for their helpful suggestions on our manuscript.

      • bob sykes
        Posted Aug 14, 2010 at 3:17 PM | Permalink

        No. “Submitted” is professor talk for “has been mailed to the journal for review.” There is no implication that the paper is or will be reviewed or will be accepted.

        I was a P&T chair at a major university for 10 years or so, and junior, untenured faculty were enamored of this word. It does indicate effort, but it does not indicate achievement.

        Nonetheless, a very interesting paper.

        • G E Lambert
          Posted Aug 14, 2010 at 3:40 PM | Permalink

          When I go to the web site of the “Annals of Applied Statistics” and look under the category of “next issues”, this paper shows up. I am not familiar with AOAS. Does “next issues” merely mean submitted but not necessarily accepted papers?

        • John M
          Posted Aug 14, 2010 at 3:58 PM | Permalink

          “There is no implication that the paper is or will be reviewed”

          So what is it about “two anonymous referees” that is confusing me?

        • Paul Dennis
          Posted Aug 14, 2010 at 4:09 PM | Permalink

          John, the paper is downloaded from the website of the Annals of Applied Statistics. Their web page indicates it is to appear in an upcoming issue. The implication is the paper has now been reviewed and accepted for publication.

        • John M
          Posted Aug 14, 2010 at 5:03 PM | Permalink

          Thanks Paul. I was just tweaking Bob Sykes a little.

          Great to see a comment from you.

          It’d be nice to see more of them, since they’re always highly informative, though I guess isotopes haven’t come up for a while.

      • Faustino
        Posted Aug 17, 2010 at 2:39 AM | Permalink

        McShane’s website shows that the paper has been accepted for publication, following which he posted the current draft on 16/8/2010 (or 8/16/2010 for US readers).

  16. SOI
    Posted Aug 14, 2010 at 2:17 PM | Permalink

    This is very interesting. A paper by two heavy-duty statisticians, both at top-5 business schools, in a highly respected mainstream peer-reviewed statistical journal. The paper appears to be a vindication of MM and a complete repudiation of Wahl and Ammann (and to a fair extent of Mann). To my mind, the money quote is:

    “…The major difference between our model and those of climate scientists, however, can be seen in the large width of our uncertainty bands. Because they are pathwise and account for the uncertainty in the parameters (as outlined in Section 5.3), they are much larger than those provided by climate scientists. In fact, our uncertainty bands are so wide that they envelop all of the other backcasts in the literature. Given their ample width, it is difficult to say that recent warming is an extraordinary event compared to the last 1,000 years. For example, according to our uncertainty bands, it is possible that it was as warm in the year 1200 AD as it is today. In contrast, the reconstructions produced in Mann et al. (2008) are completely pointwise…”

    It will be interesting to see how the team reacts to this, but it is a harsh blow to the hockey stick.
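
    The pointwise/pathwise distinction is worth spelling out: a pointwise band need only cover the true value year by year, while a pathwise band must contain the entire temperature trajectory at once, so it is necessarily wider. A toy sketch with simulated posterior draws (illustrative only, not the paper’s actual Bayesian model):

        import numpy as np

        rng = np.random.default_rng(2)
        n_draws, n_years = 2000, 100
        paths = np.cumsum(rng.normal(0.0, 0.05, (n_draws, n_years)), axis=1)  # fake posterior paths

        # Pointwise 95% band: quantiles computed separately at each year.
        lo_pt, hi_pt = np.quantile(paths, [0.025, 0.975], axis=0)

        # Pathwise 95% band: widen the year-by-year quantiles until 95% of
        # whole paths lie inside the band at every year simultaneously.
        for q in np.linspace(0.025, 0.0005, 200):
            lo, hi = np.quantile(paths, [q, 1.0 - q], axis=0)
            if np.all((paths >= lo) & (paths <= hi), axis=1).mean() >= 0.95:
                break

        print("mean pointwise width:", (hi_pt - lo_pt).mean())
        print("mean pathwise width: ", (hi - lo).mean())  # noticeably wider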

    • Posted Aug 14, 2010 at 4:15 PM | Permalink

      Re: SOI (Aug 14 14:17),
      A paper by two heavy duty statisticians, both at top 5 business schools
      Well, not exactly. The lead author says in his CV that he expects to get his PhD in May 2010. Wyner is his supervisor.

      Curiously, for his thesis he lists not just a thesis advisor, but also a “marketing advisor”.

      • Posted Aug 21, 2010 at 7:34 PM | Permalink

        I missed this post. Your link is outdated, as he received his Ph.D. in May, per his current CV.

        Yes, they are two heavy-duty statisticians:

        Blakeley B. McShane, B.S. Economics summa cum laude, University of Pennsylvania (2003), B.A. Mathematics summa cum laude, University of Pennsylvania (2003), M.A. Mathematics, University of Pennsylvania (2003), Studies in Philosophy, University of Oxford (2004-2005), M.A. Statistics, University of Pennsylvania (2010), Ph.D. Statistics, University of Pennsylvania (2010), Donald P. Jacobs Scholar; Assistant Professor of Marketing, Northwestern University (2010-Present)

        Abraham J. Wyner, B.S. Mathematics magna cum laude, Yale University (1988), Ph.D. Statistics, Stanford University (1993), National Science Foundation Fellowship (1989-1991), Acting Assistant Professor of Statistics, Stanford University (1993-1995), National Science Foundation Post-Doctoral Fellowship in the Mathematical Sciences (1995-1998), Visiting Assistant Professor of Statistics, University of California at Berkeley (1995-1998), Assistant Professor of Statistics, University of Pennsylvania (1998-2005), Associate Professor of Statistics, University of Pennsylvania (2005-Present)

  17. Fred Harwood
    Posted Aug 14, 2010 at 2:17 PM | Permalink

    The journal website lists it as appearing in the next issue. Sounds like accepted.

    • RomanM
      Posted Aug 14, 2010 at 2:21 PM | Permalink

      Re: Fred Harwood (Aug 14 14:17),

      The web page referred to can be found here.

      The journal is one of a number published by the Institute of Mathematical Statistics (IMS), a high-class outfit. It would have been peer reviewed in a meaningful fashion.

  18. Dean P
    Posted Aug 14, 2010 at 2:18 PM | Permalink

    [sarcasm]
    But but but…

    They’re not climatologists!!! And don’t they know that if they’re going to criticize proxies they first have to make up their own proxy?

    Sheesh… amateurs!
    [/sarcasm]

  19. thechuckr
    Posted Aug 14, 2010 at 2:37 PM | Permalink

    Although most of the math is over my head, the conclusion is breathtaking and dare I say, “robust.”

  20. Hector M.
    Posted Aug 14, 2010 at 2:38 PM | Permalink

    Any chance of some Team member still “going to town” to keep this from being actually published?

    • CRS, DrPH
      Posted Aug 15, 2010 at 2:28 PM | Permalink

      I doubt it! From the Climategate emails, we know that they have had undue influence in journals particular to their own branch of “science.” However, I really doubt that they will have any influence over this particular journal.

      The Oxburgh report found this:

      “The panel found that the statistical tools that CRU scientists employed were not always the most cutting-edge, or most appropriate. ‘We cannot help remarking that it is very surprising that research in an area that depends so heavily on statistical methods has not been carried out in close collaboration with professional statisticians,’ reads the inquiry’s conclusions.

      However, ‘it is not clear that better methods would have produced significantly different results,’ the panel adds.”

      http://www.newscientist.com/article/dn18776-climategate-scientists-chastised-over-statistics.html

      So, how can the Hockey Team possibly object? *heh!*

      • Posted Aug 15, 2010 at 5:41 PM | Permalink

        Re: CRS, DrPH (Aug 15 14:28), Thanks Doc, for the New Scientist link. I think it contains some very pertinent information:

        David Hand, president of the UK Royal Statistical Society and a member of Oxburgh’s panel, said the work of climate scientists is a “particularly challenging statistics exercise because the data are incredibly messy”… He said the strongest example he had found of imperfect statistics in the work of the CRU and collaborators elsewhere was the iconic “hockey stick” graph, produced by Michael Mann… Hand pointed out that the statistical tool Mann used… produced an “exaggerated” rise in temperatures over the 20th century, relative to pre-industrial temperatures.

        That point was initially made by climate sceptic and independent mathematician Stephen McIntyre. The upwards incline on later versions of the graph has been corrected to be shorter and less exaggerated (for the full story of the hockey stick controversy, see Climate: The great hockey stick debate, and Climate myths: The ‘hockey stick’ graph has been proven wrong).

        Hand said he was “impressed” by McIntyre’s statistical work. But whereas McIntyre claims that Mann’s methods have “created” the hockey stick from data that does not contain it, Hand agrees with Mann: he too says that the hockey stick – showing an above-average rise in temperatures during the 20th century – is there. The upward incline is just shorter than Mann’s original graphic suggests…

  21. theduke
    Posted Aug 14, 2010 at 2:40 PM | Permalink

    “They cite the various MM articles.”

    They not only cite them; you and Ross get more citations in the references than anyone but Mann et al.

    Congratulations to Steve and Ross.

  22. Michael Jankowski
    Posted Aug 14, 2010 at 2:41 PM | Permalink

    Probability that this article is referenced in the next IPCC report: <0.1%

  23. Martin A
    Posted Aug 14, 2010 at 3:06 PM | Permalink

    It’s a very clearly written article. My experience is that:

    – Clearly written papers usually convey significant information and are free from major errors.

    – Difficult-to-read articles often contain major misconceptions or significant errors.

    I found the original Hockey Stick paper extremely difficult to read.

    • Posted Aug 14, 2010 at 3:23 PM | Permalink

      This useful wisdom reminds me of another in my own field of software – Tony Hoare in his Turing Award speech:

      There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

      Thank God the deficiencies in MBH98 and all that followed it have now been made obvious, in a most fitting place.

  24. MarkB
    Posted Aug 14, 2010 at 3:15 PM | Permalink

    Regardless of the content of the paper, it’s nice to see academic statisticians paying attention to the topic.

  25. John M
    Posted Aug 14, 2010 at 3:22 PM | Permalink

    Interestingly, you can hunt down the first author’s CV on Google Scholar.

    Check out the first hit and the hosting web site.

    Small world. 🙂

    http://scholar.google.com/scholar?q=BLAKELEY%20B.%20MCSHANE&rls=com.microsoft:en-us:IE-SearchBox&oe=UTF-8&rlz=1I7HPND_en&um=1&ie=UTF-8&sa=N&hl=en&tab=ws

    • Posted Aug 14, 2010 at 3:24 PM | Permalink

      Let’s hear it for Penn State!

      • John Baltutis
        Posted Aug 14, 2010 at 5:34 PM | Permalink

        Let’s hear it for Penn State!

        Did I miss something? Blakeley B. McShane hails from Northwestern University and Abraham J. Wyner from the University of Pennsylvania, a completely different institution and not one to be confused with Mann’s employer.

        • John M
          Posted Aug 14, 2010 at 5:45 PM | Permalink

          John,

          We’re referring to the site that has his CV. See the link I posted.

        • Posted Aug 14, 2010 at 5:49 PM | Permalink

          Snap!

        • John Baltutis
          Posted Aug 14, 2010 at 6:05 PM | Permalink

          That’s what I missed; that Penn St. hosted his out-of-date CV. According to the manuscript’s title page, Blakeley is now at Northwestern.

        • Posted Aug 14, 2010 at 5:48 PM | Permalink

          The ‘hosting website’ of the first hit for McShane on Google Scholar – his CV – is Penn State, the host being citeseerx.ist.psu.edu. CiteSeerX looks like a pretty general service. A coincidence that may not signify an imminent reversal of the institution’s defence of M. Mann, I grant you. But I agree with John M that it’s a nice touch.

  26. ZT
    Posted Aug 14, 2010 at 4:25 PM | Permalink

    Nice paper. It will be interesting to see what Gavin, Tamino, and the UEA inquiry keystone kops can come up with to attempt to deal with what appears to be a turning tide. Second careers in the exacting world of Feng Shui consulting are beckoning.

    • TimG
      Posted Aug 14, 2010 at 4:33 PM | Permalink

      It will be ignored because it is not in an ‘official’ climate journal. We will be treated to lectures on why only climate scientists are qualified to determine what statistical analyses are valid for paleo data. We will also be told that, since they did not show the MWP exists, it does nothing to refute the claim that the MWP did not exist.

      • ZT
        Posted Aug 14, 2010 at 6:06 PM | Permalink

        I’m sure that some will say that only ‘official’ climate journals count, and the authors will be criticized, but it seems to me that some nasty, cliquey pseudoscience has been afoot; happily, that is now slipping into the past. Climatology will be the better for the experience – even if climatology is forced to admit that statistics is a superior science.

    • thechuckr
      Posted Aug 14, 2010 at 4:51 PM | Permalink

      I posted on this paper at Tamino and Romm. Both posts disappeared within an hour.

      • Hoi Polloi
        Posted Aug 14, 2010 at 5:13 PM | Permalink

        I disapprove of what you say, but I will defend to the death your right to say it. ~ Voltaire

        • Robinson
          Posted Aug 16, 2010 at 7:42 PM | Permalink

          But Voltaire also said, “a witty saying proves nothing” :p.

    • Posted Aug 14, 2010 at 7:52 PM | Permalink

      Feng Shui consultant, equally useless but pays more.

  27. Bernie
    Posted Aug 14, 2010 at 4:30 PM | Permalink

    The footnote on page 35 will leave a mark:

    12. On the other hand, perhaps our model is unable to detect the high level of and sharp run-up in recent temperatures because anthropogenic factors have, for example, caused a regime change in the relation between temperatures and proxies. While this is certainly a consistent line of reasoning, it is also fraught with peril for, once one admits the possibility of regime changes in the instrumental period, it raises the question of whether such changes exist elsewhere over the past 1,000 years. Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors, thus undermining paleoclimatology as a statistical enterprise.

  28. Bernie
    Posted Aug 14, 2010 at 4:43 PM | Permalink

    I suspect that Ammann will have something to say, since he appears to have the strongest links/capabilities to respond to the approaches taken by McShane and Wyner.

    McShane is a brand-new PhD!!

    What struck me in looking at the paper is that these guys had thought very hard about the statistical counter arguments and seemed to have addressed them.

    The one thing I am less sure about is whether their apparent undifferentiated treatment of all the proxies is a problem. They included for example Tiljander, presumably as oriented by Mann.

    • Jonathan
      Posted Aug 14, 2010 at 5:54 PM | Permalink

      Re: Bernie (Aug 14 16:43), “The one thing I am less sure about is whether their apparent undifferentiated treatment of all the proxies is a problem. They included for example Tiljander, presumably as oriented by Mann.”

      The authors are very clear on their approach to this point:

      All three of these datasets have been substantially processed including smoothing and imputation of missing data (Mann et al., 2008). While these present interesting problems, they are not the focus of our inquiry. We assume that the data selection, collection, and processing performed by climate scientists meets the standards of their discipline. Without taking a position on these data quality issues, we thus take the dataset as given.

      This is a perfectly reasonable position for statisticians to take, especially when they are savaging the statistics.

      • Posted Aug 14, 2010 at 6:05 PM | Permalink

        They always say a savaging is best served cold.

      • ZT
        Posted Aug 14, 2010 at 6:59 PM | Permalink

        “We assume that the data selection, collection, and processing performed by climate scientists meets the standards of their discipline.”

        Never let it be said that the scientifically and statistically inclined lack a sense of humor.

        • Bernie
          Posted Aug 14, 2010 at 8:07 PM | Permalink

          I understand their assumptions, but that does not mean that this assumption will not be used to limit the import of their conclusions.

        • Posted Aug 15, 2010 at 3:03 PM | Permalink

          It would be difficult to challenge the paper’s stated assumptions without undermining the team’s own work in so doing, such is the positioning of Wyner & McShane’s paper. I do agree that this is a likely route for them to try, though.

          But different eyes are on the stick this time, and I don’t anticipate that Mann’s creative/inventive statistical techniques will charm this audience.

        • Invariant
          Posted Aug 15, 2010 at 4:47 PM | Permalink

          Yes – read Bishop Hill!

          http://bishophill.squarespace.com/blog/2010/8/15/here-come-the-cavalry.html

          “They take the Mann proxy data set as a given. They do not try to assess the quality or defects of individual proxies. They therefore leave the Tiljander series as defined by Mann plus problematic Cedars and BCPs. Theirs is an analysis to define better statistical procedures when there are too many variables for the number of data points and autocorrelation issues.”

    • Posted Aug 14, 2010 at 6:07 PM | Permalink

      Re: Bernie (Aug 14 16:43),

      McShane is a brand new PhD!!

      A fitting anti-Mann.

  29. jv
    Posted Aug 14, 2010 at 4:44 PM | Permalink

    >”Any chance of some Team member still “going to town” to keep this from being actually published?”

    Forecast calls for a 100% chance that the thumb screws are being turned as we speak. The question is: will they be successful? This might be a case where being successful might be more damaging to their cause than failing. You don’t just smack down top-end talent with a track record without raising some eyebrows. If the paper gets flushed without an actual serious flaw being found, it will definitely pop up somewhere else. And the brouhaha will make people even more curious.

    • Mike Jowsey
      Posted Aug 15, 2010 at 5:49 AM | Permalink

      Won’t happen – different discipline, and one with an obvious distaste for amateur maths.

  30. Oldjim
    Posted Aug 14, 2010 at 4:56 PM | Permalink

    From the end of the paper, which may suggest it was peer reviewed:

    Acknowledgements. We thank Editor Michael Stein, two anonymous referees, and Tilmann Gneiting for their helpful suggestions on our manuscript. We also thank our colleagues Larry Brown and Dean Foster for many helpful conversations.

  31. Menns
    Posted Aug 14, 2010 at 4:58 PM | Permalink

    I’m not able to find the supplementary material at http://www.imstat.org/aoas/supplements/default.htm. Has someone else found it or is it only supplied when the paper is published?

  32. Tom C
    Posted Aug 14, 2010 at 5:07 PM | Permalink

    We all know the reaction will be “they are not climate scientists”. I do think, though, that this marks a turning, oops “tipping” point.

  33. chris y
    Posted Aug 14, 2010 at 5:13 PM | Permalink

    I didn’t see an acknowledgement of funding for this paper. This is normally included on the first page or just before the references list. It would be interesting to see if this was funded by an NSF grant…

  34. Posted Aug 14, 2010 at 5:23 PM | Permalink

    The fact that McShane only got his PhD this year is interesting, as that means he will have been taught the very latest thinking in statistics.

    tonyb

    • Dave L.
      Posted Aug 14, 2010 at 6:25 PM | Permalink

      The number two author has credentials:

      Abraham Wyner, Associate Professor, Department of Statistics, The Wharton School, University of Pennsylvania.

      http://statistics.wharton.upenn.edu/people/faculty.cfm?id=594

      PhD in Statistics from Stanford, followed by an NSF Post-Doc Fellowship in Mathematical Sciences at the University of California, Berkeley, from 1995-1998.
      “Professor Wyner is an expert at Probability Models and Statistics.”

  35. dukeofurl
    Posted Aug 14, 2010 at 5:43 PM | Permalink

    Yes, they are not climate scientists, but they have used the climate scientists’ carefully assembled data (which is what climate scientists do) and then focussed on the statistical techniques (which is what climate scientists do not do).

  36. John Whitman
    Posted Aug 14, 2010 at 5:58 PM | Permalink

    In McShane and Wyner 2010, in order to “focus on the substantive modeling problems encountered in this setting”, they make two “substantial” assumptions:

    “We assume that the data selection, collection, and processing performed by climate scientists meets the standards of their discipline.”

    “We further make the assumptions of linearity and stationarity of the relationship between temperature and proxies, an assumption employed throughout the climate science literature (NRC, 2006) noting that ‘the stationarity of the relationship does not require stationarity of the series themselves’ (NRC, 2006).”

    Basically, these two “substantial” assumptions will likely be topics for future papers by other statisticians on the multiple temperature proxies.

    John

    • John Whitman
      Posted Aug 14, 2010 at 8:01 PM | Permalink

      And in the last sentence of their conclusions:

      “Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.”

      “Free and open” and “replication”, yes.

      John

  37. Benjamin
    Posted Aug 14, 2010 at 6:05 PM | Permalink

    **Blakeley B. McShane**

    Ph.D. in Statistics
    University of Pennsylvania Philadelphia, PA
    The Wharton School
    Thesis: Integrating Machine Learning Methods with Hidden Markov Models: A New Approach to Categorical Time Series Analysis with Application to Sleep Data

    **Abraham J. Wyner**
    http://www.wharton.upenn.edu/faculty/wyner.cfm
    PhD in Statistics, Stanford University, 1993;
    BS in Mathematics, Yale University, 1988
    Research Areas :
    Probabilistic modeling; information theory; entropy; data compression; estimation

    Finally, statisticians are starting to work on it!

  38. GrantB
    Posted Aug 14, 2010 at 6:55 PM | Permalink

    Nick Stokes will debunk all this in a matter of seconds.

    • Posted Aug 14, 2010 at 7:33 PM | Permalink

      Re: GrantB (Aug 14 18:55),
      Well, it won’t appear in a matter of seconds. My 4.15pm post above is still in moderation at 7.30pm.

      Actually, I think it seems to be a good paper. Certainly well-written. Not sure everything’s right.

  39. Vorlath
    Posted Aug 14, 2010 at 7:14 PM | Permalink

    I read the whole thing. Am I to understand correctly that their conclusion is basically that random noise is a better predictor than the actual proxy data? They sure repeated it enough times that the error margins completely engulf all reconstructions done by climate scientists and that it doesn’t preclude MWP or widely diverging scenarios.

    When I first saw those reconstructions way back when and noticed the very small changes in temperature compared to the precision of the proxy data, I didn’t understand how they could filter out the noise and be able to get any reliable data that wouldn’t be engulfed in the error bands. Seems like what I remember from stats class wasn’t far off, even though there’s no way I could do the analysis found in the linked paper. The uncertainties in the IPCC graphs always looked like pointwise confidence intervals to me, not uncertainty margins. In any case, I found that a lot of the stuff in there (while beyond my grasp in the analysis) goes overboard to show the obvious.

    “In other words, our model performs better when using highly autocorrelated noise rather than proxies to ‘predict’ temperature. The real proxies are less predictive than our ‘fake’ data.”

    This is with the 30-year holdout blocks. Is there something I’m missing? Is it that bad?
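
    For anyone wanting to see what the “fake” data look like: the benchmark predictors are simple stochastic series, for instance AR(1) noise, generated with no knowledge of temperature. A minimal sketch (phi = 0.9 is illustrative; the paper also fits empirical AR(1) parameters and uses Brownian-motion nulls):

        import numpy as np

        def ar1_series(n, phi=0.9, sigma=1.0, rng=None):
            # x[t] = phi * x[t-1] + noise: highly autocorrelated when phi is near 1.
            if rng is None:
                rng = np.random.default_rng()
            x = np.zeros(n)
            eps = rng.normal(0.0, sigma, n)
            for t in range(1, n):
                x[t] = phi * x[t - 1] + eps[t]
            return x

        rng = np.random.default_rng(3)
        fake_proxies = np.column_stack([ar1_series(149, rng=rng) for _ in range(20)])
        # Feed these into the same 30-year-holdout comparison sketched at the top of
        # the thread: if they "predict" held-out temperature about as well as the
        # real proxies do, the proxies carry little demonstrable signal.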

  40. Lance
    Posted Aug 14, 2010 at 7:37 PM | Permalink

    You can almost hear the smear campaign being formulated.

    “This wasn’t published in a scientific journal.”

    “The authors weren’t scientists, let alone climate scientists.”

    They will be subjected to all of the same irrelevant and underhanded attacks as Wegman.

    It won’t budge the faithful, but it will make a difference in scientific circles that acknowledge the dependence of scientific inquiry on legitimate statistical analysis.

    Whether it will find traction in the popular media is another matter entirely.

    • Tom C
      Posted Aug 14, 2010 at 9:35 PM | Permalink

      We have been through these smear campaigns so often that they can nearly be charted in advance:

      1) Deep Climate will go looking for text fragments that have appeared in any publication in the Library of Congress. He will breathlessly report that the words “Bayesian Validation” appear in six textbooks!
      2) Eli Rabett will chime in with a series of weird insults involving nicknames, animals, animals with nicknames, etc. It will be incomprehensible to everyone except him, but the amen corner will be ecstatic.
      3) Hard to know at this point who will get the marquee “guest post” at RC to offer the rebuttal, but expect sentences with wild permutations of statistical terms that sound really deep but are nonsensical. – mike will appear only in the comments to offer some slurs based on geography.
      4) When all is lost they will turn to Annan as final arbitrator and he will offer disdainful comments all around without really engaging the issues at hand. Eli will declare victory with the usual abstruse references to unknown people and animals.
      5) Rinse – Repeat

    • Posted Aug 15, 2010 at 2:22 AM | Permalink

      Re: Lance (Aug 14 19:37), I think this one goes deeper and they’ll look deeper for rebuttals. My thought is, watch Gerry North and his merry men.

  41. JCM
    Posted Aug 14, 2010 at 7:47 PM | Permalink

    Balloon prickers.
    This should be fun to watch.

  42. geo
    Posted Aug 14, 2010 at 7:56 PM | Permalink

    This feels very much to me like the coup de grace to the Hockey Stick.

    I hope Steve and Ross won’t take that as in some way disrespectful to their previous work, as it certainly isn’t intended that way.

    But one of the things that really needed to happen is that the larger professional academic statistical community needed to engage here and refute the point, long held by the AGWers, that the M&M critiques were somehow just a matter of individual personalities. The broader these critiques become in the academic statistical community, the more untenable that position becomes.

    Said another way, M&M, having taken the slings and arrows of outrageous fortune from the climatologists for years, have finally energized the general professional statistical community into dipping their oar into the situation, and this is a Very Good Thing for all of us in the long run, even the climate modellers, however loath they may be to admit it in the shorter run.

    • Benjamin
      Posted Aug 14, 2010 at 8:16 PM | Permalink

      I think Steve & Ross have been waiting quite a while for other skilled statisticians to replicate their work.
      Now here you have it.

    • pete
      Posted Aug 14, 2010 at 9:16 PM | Permalink

      From the article:

      our model offers support to the conclusion that the 1990s were the warmest decade of the last millennium

      Not quite a coup-de-grace then.

      Also check out page 5. They’ve completely mangled the PCA argument.

      It gets better later on — some good points about confidence curves for sample paths and the limitations of cross-validation.

      • Benjamin
        Posted Aug 14, 2010 at 9:23 PM | Permalink

        Yeah, but that’s with Tiljander & bristlecones.

      • geo
        Posted Aug 14, 2010 at 10:01 PM | Permalink

        The hockey stick is about a lot more than whether the 1990s were the hottest decade of the last millennium.

        It’s the AGWers’ own brand of denialism to tell themselves that the vast majority of skeptics believe that CO2 plays *no* role in warming. The truth is that’s a minority position even in the skeptic community.

        • steven Mosher
          Posted Aug 15, 2010 at 3:38 PM | Permalink

          Bingo.

          The more skeptics tackle the REAL issue and stop denying the basic physics, the better.

      • geo
        Posted Aug 14, 2010 at 10:21 PM | Permalink

        And btw, “offers support” is awfully weak tea. What’s the confidence level on that?

        As the climategate emails show, even Briffa was very hesitant about pushing paleo claims to reliable decadal granularity.

        Personally, I think a combination of insufficient granularity and dating uncertainty across proxies goes a long way towards an artificial smoothing of the paleo reconstruction temperature record – but that belief does not require the 1990s to not have been the hottest in the last millennium.

        • bender
          Posted Aug 17, 2010 at 9:17 AM | Permalink

          “offers support” is awfully weak tea

          yes. damning by faint praise.

      • Tesseract
        Posted Aug 14, 2010 at 11:32 PM | Permalink

        Hey Pete =)

        “our model offers support to the conclusion that the 1990s were the warmest decade of the last millennium”

        “our model does not pass ‘statistical significance’ thresholds against savvy null models. Ultimately, what these tests essentially show is that the 1,000 year old proxy record has little power given the limited temperature record” (p. 41)

        I think this sounds a lot like what Steve often says, something like, ‘just because I’m correcting errors in the models, doesn’t mean the corrected models mean anything.’

  43. Tom C
    Posted Aug 14, 2010 at 8:18 PM | Permalink

    I know that Gavin does not read Climate Audit (except at midnight on Easter), but if he has been alerted to this you can imagine the team is already scheming a counter-attack. Since this paper is devastating, their response will be over-the-top aggressive. They have been able to get away with this in the past, but it will backfire post-Climategate.

    • damian
      Posted Aug 15, 2010 at 6:34 AM | Permalink

      Ideally, if they react aggressively against this paper, the statistics community will take it personally and fight back … which would be good, as the statisticians are completely independent of the climate scientists …

    • steven Mosher
      Posted Aug 15, 2010 at 3:40 PM | Permalink

      They are not planning a response in a stats journal, THAT is certain.

      • Ron Cram
        Posted Aug 15, 2010 at 3:45 PM | Permalink

        Funny, mosh! Totally true!

  44. Tom C
    Posted Aug 14, 2010 at 8:25 PM | Permalink

    It was gratifying to see how they took the time to clearly describe the graphical chicanery of the original hockey stick.

  45. Abaraham Wyner
    Posted Aug 14, 2010 at 8:34 PM | Permalink

    Thanks for the welcome response. For the record, Blakeley just graduated with a PhD in Statistics under my supervision from the University of Pennsylvania (not Penn State!).

    The paper has been accepted, but publication is still a bit into the future as it is likely to be accompanied by invited discussants and comment. Stay tuned…

    • Bernie
      Posted Aug 14, 2010 at 8:49 PM | Permalink

      Congratulations on an exceptionally thorough paper. It would be interesting to hear what the reviewers said and whether you plan on any follow up.

    • Chuck L
      Posted Aug 14, 2010 at 8:57 PM | Permalink

      I am a Penn alumnus and am proud and gratified to see such a cogent and well-written paper emanating from my university. Unfortunately, you and Blakeley should now be prepared for much unpleasantness from the creators and worshippers of the hockey stick orthodoxy.

      • Derek H
        Posted Aug 18, 2010 at 2:47 PM | Permalink

        I too am a Penn alumnus and am gratified to see a cogent paper coming from the university, especially in the wake of Dr. Guttman’s Hackneyed (pun intended) revival of political correctness there. I just wish it had come from the SEAS instead of Wharton … 😉

    • pete
      Posted Aug 14, 2010 at 9:47 PM | Permalink

      Since this has already been accepted, is it too late to make corrections? Page 5 is a bit of a mess, and I’m sure the commenters here could help you fix some of the errors there.

      • Posted Aug 15, 2010 at 4:46 AM | Permalink

        Since this has already been accepted, is it too late to make corrections?

        This highlights for me two key decisions made by the authors. One: assume that climate scientists have been scrupulous in collecting and selecting data. Two: have no contact with McIntyre and McKitrick prior to publishing.

        In the normal case these two combined would lead one to assume bad news. In this case, genius. Complete independence. Same result.

        The IPCC and its fellow-travellers made a terrible error of judgement not acknowledging the statistical deficiencies of the hockey stick when first exposed. This is precisely what was needed as a remedy.

        • pete
          Posted Aug 15, 2010 at 5:02 AM | Permalink

          The problem with “complete independence” is it leads to numerous easily avoided errors. Based on section 3.3 it looks like they haven’t even read some of the papers they’ve cited.

        • Posted Aug 15, 2010 at 5:18 AM | Permalink

          Talking of careful reading, Professor Wyner just told us:

          The paper has been accepted, but publication is still a bit into the future as it is likely to be accompanied by invited discussants and comment. Stay tuned…

          That was the third decision if you like (with help from the Annals of Applied Statistics).

          All looking good to me.

        • MarkJ
          Posted Aug 15, 2010 at 7:22 AM | Permalink

          Discussants and comment imply accompanying commentary from climate scientists. Perhaps stressing (or overstating) areas of agreement.

        • Posted Aug 15, 2010 at 8:17 AM | Permalink

          It may imply climate scientists. It may imply other statisticians. It may even imply a certain ‘citizen scientist’ and his economist friend, a number of whose publications the paper cites. I don’t know and I’m not quite sure how you do.

        • MarkJ
          Posted Aug 15, 2010 at 8:28 AM | Permalink

          Fair enough, I don’t; I’m just speculating that, as their work is based on Mann, invitations for comment will have been sent in that direction. That would be collegiate and would promote further debate.

        • Posted Aug 15, 2010 at 9:07 AM | Permalink

          Yep, would make sense.

    • Posted Aug 14, 2010 at 10:41 PM | Permalink

      Re: Abaraham Wyner (Aug 14 20:34),
      Yes, many comments have UPenn and Penn State mixed.

      • Posted Aug 15, 2010 at 4:18 AM | Permalink

        In fact, none of the comments have UPenn and Penn State mixed. There was simply a little joke made by John M about Penn State hosting Blakely’s old CV, which happened to be the first hit on Google Scholar. And that’s it. Time to er, move on 🙂

    • Michael Jankowski
      Posted Aug 14, 2010 at 10:42 PM | Permalink

      Go Quakers!!!

    • Thomas L
      Posted Aug 15, 2010 at 2:15 AM | Permalink

      If, as some models suggest (primarily those based on solar/sunspot variability in global average temperature), we have several years of cooler temperatures, how sensitive is holdout RMSE to this? It seems, from quick calculations, that it would make Mann-like hockey-stick models appear even more like seeing signal where the data is indistinguishable from noise.

    • Posted Aug 15, 2010 at 2:35 AM | Permalink

      Re: Abaraham Wyner (Aug 14 20:34), Thank you with all my heart for this important work.

      If it’s not too late for minor corrections, this sentence flashed out at me from page 2 of your Introduction:

      On the one hand, this is peculiar since paleoclimatological reconstructions can provide evidence only for the detection of AGW and even then they constitute only one such source of evidence.

      as I personally would prefer to see the acronym “AGW” replaced by the phrase “global warming”.

    • John Whitman
      Posted Aug 15, 2010 at 7:06 AM | Permalink

      Congratulations to McShane and you on the paper.

      Your timing of publishing seems good.

      John

    • bender
      Posted Aug 16, 2010 at 1:16 PM | Permalink

      Dear Dr. Wyner,
      Do you care to comment on what appears to be an inconsistency in statements made in the paper vis-à-vis one’s ability to make confident statements about temperature in the 1990s versus other decades of the last millennium? Your text seems a bit schizoid – asserting that such statements cannot be made with much confidence, yet proceeding to do exactly that. How do you explain the inconsistency?

    • bender
      Posted Aug 16, 2010 at 1:20 PM | Permalink

      I’m referring specifically to these two statements, A & B, made in different parts of the text:

      A: “our model offers support to the conclusion that the 1990s were the warmest decade of the last millennium”

      B: “temperature derivatives encountered over recent history are unprecedented in the millennium. While this does seem alarming, we should temper our alarm somewhat by considering again Figure 15 and the fact that the proxies seem unable to capture the sharp run-up in temperature of the 1990s”

      What sort of “tempering” do you think is warranted?

  46. Jason
    Posted Aug 14, 2010 at 8:36 PM | Permalink

    I thought that this was a very even handed treatment of the subject.

    Later in the paper McShane and Wyner develop a Bayesian model which (assuming that no bias has been introduced into the Mann ’08 data) establishes an 80% probability that the decade 1997-2006 is the warmest in the past 1000 years.

    So they aren’t so much rebutting the hockey stick as they are the estimated error. This is no different than what McIntyre has written (but, IMNSHO, a great deal more precise).

    I would predict that the team reacts to this paper by ignoring it. If they say anything at all, they will probably note (correctly) that all the existing reconstructions considered by this new paper are consistent with the reconstruction (or model in this paper’s terminology) presented in the paper.
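
    The 80% figure is a statement about posterior sample paths, and computing such a probability from draws is mechanical. A toy sketch, with simulated draws standing in for the paper’s fitted Bayesian model:

        import numpy as np

        rng = np.random.default_rng(4)
        n_draws, n_years = 5000, 1000                   # a millennium of toy posterior paths
        paths = np.cumsum(rng.normal(0.0, 0.02, (n_draws, n_years)), axis=1)

        # Decadal means for each draw: reshape to (draws, decades, 10 years).
        decades = paths.reshape(n_draws, n_years // 10, 10).mean(axis=2)

        # Posterior probability that the final decade is the warmest of all.
        p = (decades.argmax(axis=1) == decades.shape[1] - 1).mean()
        print(f"P(final decade warmest) = {p:.2f}")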

    • Benjamin
      Posted Aug 14, 2010 at 9:16 PM | Permalink

      Well it depends if they just look at the graphs, or if they read the text.

      “While our results agree with the climate scientists’ findings in some respects, our methods of estimating model uncertainty and accuracy are in sharp disagreement […] we conclude unequivocally that the evidence for a ‘long-handled’ hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data […] the long flat handle of the hockey stick is best understood to be a feature of regression and less a reflection of our knowledge of the truth.”

    • Tesseract
      Posted Aug 14, 2010 at 11:39 PM | Permalink

      Jason, they are saying that’s what their MODEL predicts. However, they also say that their model is not statistically significant and that the proxy record has little power to predict past temperature (regardless of the model) given the limited temperature record (p. 41).

      Sounds like something Steve McIntyre might say 😉

    • Michael Jankowski
      Posted Aug 15, 2010 at 8:16 AM | Permalink

      They won’t ignore it. They’ll latch-on to statements along the lines of those you noted (e.g., 80% probability of the hottest decade in the past 1000 yrs) out of the paper, suggesting it “validates” Mann’s publications and conclusions. They will try to dismiss and ignore the rest.

  47. Jason
    Posted Aug 14, 2010 at 8:52 PM | Permalink

    Professor Wyner,

    Thank you for commenting here.

    It is my understanding from reading your paper that the data you used includes both the tree rings and the Tiljander data from Mann ’08.

    As you are likely aware, various issues have been raised with this data, and Mann himself has released updated figures based on its removal.

    I would be VERY curious to learn how your figure 16 is impacted by the removal of this data (both the red line and the error bounds).

    Thanks again for commenting here.

    • Benjamin
      Posted Aug 14, 2010 at 9:08 PM | Permalink

      Good point.

      • Paul_K
        Posted Aug 15, 2010 at 3:27 AM | Permalink

        I would also like to see a formal re-analysis excluding BCPs and Tiljander series, although I suspect it will just show a LOT more grey in Figure 16.
        My congrats to McShane and Wyner on an excellent paper.

  48. Mike
    Posted Aug 14, 2010 at 9:39 PM | Permalink

    “…our model offers support to the conclusion that the 1990s were the warmest decade of the last millennium,…”

    • Bernie
      Posted Aug 14, 2010 at 9:47 PM | Permalink

      45 pages of evisceration and you come up with that?

    • Tesseract
      Posted Aug 14, 2010 at 11:43 PM | Permalink

      “our model offers support to the conclusion that the 1990s were the warmest decade of the last millennium”

      “our model does not pass ‘statistical significance’ thresholds against savvy null models. Ultimately, what these tests essentially show is that the 1,000 year old proxy record has little power given the limited temperature record” (p. 41)

      (sorry for the repeated response, but it’s to a repeated point! =)

  49. pete
    Posted Aug 14, 2010 at 10:03 PM | Permalink

    Interestingly, the authors assume that holdout-RMSE (i.e. RE) is the natural way to measure skill.

    The superiority of RMSE to r^2 in this context is obvious to anyone with statistics training. No need to check Draper and Smith for something this simple!

    • bobdenton
      Posted Aug 15, 2010 at 5:11 AM | Permalink

      The point that has been made is more subtle. Where r2 was significant it was reported, and where it wasn’t it wasn’t reported, leaving the impression that r2 was relied on and was significant in all cases.

      You can argue about the best metric, but you can’t rely on a metric where it’s significant and set it aside where it isn’t.
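
      The two metrics really can disagree, which is why selective reporting matters. A toy verification example: a prediction can track every wiggle of the observations (r2 near 1) while being badly offset (large RMSE), and a shapeless prediction can do the reverse:

          import numpy as np

          rng = np.random.default_rng(5)
          truth = np.sin(np.linspace(0.0, 6.0, 50)) + rng.normal(0.0, 0.05, 50)

          biased = truth + 1.0                                  # right shape, wrong level
          shapeless = truth.mean() + rng.normal(0.0, 0.3, 50)   # no shape at all

          def r2(pred, obs):
              return np.corrcoef(pred, obs)[0, 1] ** 2

          def rmse(pred, obs):
              return np.sqrt(np.mean((pred - obs) ** 2))

          print(f"biased:    r2={r2(biased, truth):.2f}  rmse={rmse(biased, truth):.2f}")
          print(f"shapeless: r2={r2(shapeless, truth):.2f}  rmse={rmse(shapeless, truth):.2f}")
          # r2 ranks the offset prediction best; RMSE ranks the shapeless one best.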

  50. Margaret
    Posted Aug 14, 2010 at 11:00 PM | Permalink

    In anticipating the reaction you have missed the role of

    Big Oil

    I am sure these two have had their handsome payouts….

    • Ed Snack
      Posted Aug 15, 2010 at 1:17 AM | Permalink

      Hey, nice ad hom, Margaret! If that’s all you’ve got, you’re in more trouble than I thought.

      You know though, we’re ALL still waiting for our cheques, the ones big oil promised us for being “skeptical” and making a lot of trouble for those nice scientists who are just barely getting by on a crust while alerting the world to the oncoming doom ! Ooops, hope I haven’t let the cat out of the bag, but you knew about those promises anyway, didn’t you… ? BTW, you don’t think that Big Oil would welsh on the money, I mean, they promised, they promised they’d give us as much as they gave the “other side”, so we’re all really hopeful of getting millions upon millions, really !

      • geronimo
        Posted Aug 15, 2010 at 4:01 AM | Permalink

        Ed, I think you’ll find Margaret is indulging in “irony” and isn’t seriously suggesting they’re funded by big oil.

        • Margaret
          Posted Aug 15, 2010 at 5:01 AM | Permalink

          Sure was !!!

          It was written in response to Tom C at 9.35 outlining the probable response of realclimate — which at the time was close to the most recent post. I didn’t notice the “reply” button else I would have attached my post to it.

    • GrantB
      Posted Aug 15, 2010 at 5:51 AM | Permalink

      Blakeley McShane is from the Kellogg School of Management and is obviously funded by big corn.

  51. Geoff Sherrington
    Posted Aug 14, 2010 at 11:11 PM | Permalink

    In the category of the mild “told you so” the authors note –

    “Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades let alone over many centuries.”

    c.f. a comment of mine on tAV July 29 at http://noconsensus.wordpress.com/2010/07/27/bo-christiansen-variance-loss/

    “31 Jeff. Agreed. My thoughts are drifting towards doubts about any proxies being useful. Part of it stems from an uneasy feeling that the errors of measurement at all stages are much, much greater than commonly expressed. It’s like the model ensemble case, when the error is assessed between a number of submitted comparisons. I’ve long argued that the errors should be calculated by including all of the model runs from all modellers in the round robin, apart from those runs that were rejected for obvious reasons like transcription errors.”

    It’s easy to arm wave like I did, but it’s comforting to see quantitative conclusions of greater skill from McShane & Wyner.

    Geoff McSherrington.

  52. bender
    Posted Aug 14, 2010 at 11:28 PM | Permalink

    I’m shocked. Not.

  53. Paul
    Posted Aug 14, 2010 at 11:49 PM | Permalink

    URGENT

    Steve: If this is the wrong place then please remove to a more appropriate thread but this needs to be made known.

    Go to Stuff.co.nz, where a story has just broken that the New Zealand Climate Science Coalition and the ACT Political Party have lodged papers with the High Court of New Zealand challenging the accuracy of the temperature records of NIWA.
    They claim the adjustments made to the temperature records are unjustified and seek to have them removed.

  54. steven Mosher
    Posted Aug 15, 2010 at 3:10 AM | Permalink

    northwestern. go cats

  55. John Ritson
    Posted Aug 15, 2010 at 3:40 AM | Permalink

    woo hoo!

  56. stephen richards
    Posted Aug 15, 2010 at 3:50 AM | Permalink

    Key issues:

    Data are a given with all their faults, manipulations and adjustments. This therefore eliminates the necessity to cross over to the climate side.

    Focus is purely on the stats. Criticisms therefore can only be made by well-qualified statisticians. The likes of computer-games writers, tree-ring manipulators and Nick Stokes PROBABLY have no value here.

  57. Manfred
    Posted Aug 15, 2010 at 3:57 AM | Permalink

    Am I right that these rather devastating (though not surprising) results were obtained without even considering the warming biases in the proxy selections and the temperature records?

    • Latimer Alder
      Posted Aug 15, 2010 at 6:13 AM | Permalink

      Yes.

      As they so beautifully put it

      ‘We assume that the data selection, collection, and
      processing performed by climate scientists meets the standards of their discipline’.

      Certainly Sir Humphrey!

      With such a brilliant putdown, I can only imagine that they have British heritage.

      • Posted Aug 15, 2010 at 10:17 AM | Permalink

        Absolutely.

        I confess that I grinned from ear to ear when I read that particular gem.

        • Latimer Alder
          Posted Aug 15, 2010 at 11:35 AM | Permalink

          The other pleasing thing is (to me at least) that it demonstrates that they are not frightened by The Mann .. and are prepared to have a fight. This barbed remark is almost a direct challenge to him.

  58. Invariant
    Posted Aug 15, 2010 at 4:01 AM | Permalink

    1. Natural temperature variability may be large.
    2. It’s not the sun (thanks Leif!).
    3. Increased CO2 increases temperature.
    Is it possible to tell the magnitude of 3 given 1?

  59. AdderW
    Posted Aug 15, 2010 at 6:11 AM | Permalink

    http://www.stuff.co.nz/the-press/news/4026335/Niwas-data-accuracy-challenged

  60. Bernie
    Posted Aug 15, 2010 at 6:35 AM | Permalink

    Does anyone else detect the overall skepticism of the authors about the statistical feasibility of the entire proxy effort? Footnote 12 is all about stationarity, which is another key assumption.

  61. two moon
    Posted Aug 15, 2010 at 6:58 AM | Permalink

    I mentioned the M&W paper at RealClimate on a thread devoted to “expert credibility.” Reply was essentially an advisement to wait and see how the discussion plays out. I can confirm that RC is cognizant of the paper.

    • stephen richards
      Posted Aug 15, 2010 at 7:08 AM | Permalink

      I don’t see how they will be able to make any valid criticisms. They are not, after all, statisticians; mind you, that hasn’t stopped them in the past. The paper is focused entirely on the statistical validation of their work. I look forward to their response. Should be ‘interesting’.

    • MarkJ
      Posted Aug 15, 2010 at 7:37 AM | Permalink

      Yes, it is obvious which quotes will be picked and trotted out; reading the whole paper leaves a different impression, and it is an easy read even for ‘civilians’.

    • SOI
      Posted Aug 15, 2010 at 9:06 AM | Permalink

      It is really funny to see Gavin’s response. From a paper scathing about paleo reconstructions, he actually cherry picks one sentence out of context to support the IPCC conclusions! You have to admire his tenacity.

  62. Posted Aug 15, 2010 at 7:20 AM | Permalink

    About time! I am still puzzled why common sense never prevailed in this debate.

  63. bender
    Posted Aug 15, 2010 at 7:49 AM | Permalink

    “our model offers support to the conclusion that the 1990s were the warmest decade of the last millennium”

    • Sylvain
      Posted Aug 15, 2010 at 8:08 AM | Permalink

      Here is to complete the rest of your quote:

      “While,…, it does not predict temperature as well as expected even in sample.
      The model does much worse on contiguous thirty year time intervals.
      Thus, we remark in conclusion that natural proxies are severely
      limited in their ability to predict average temperatures and temperature
      gradients.”
      snip

      • bender
        Posted Aug 16, 2010 at 9:27 AM | Permalink

        Sure, and the entire article provides the fullest possible context. Would you have me quote that from start to finish?
        .
        I believe you are missing the point. Predicting past temperatures with high accuracy and determining whether CWP is warmer than MWP are two very different (albeit related) questions. You may never be able to resolve small temperature differences (less than a degree, say). But the substantive issue is whether you can resolve large differences (of more than a few degrees, say).
        .
        Quote whatever passage you like. The fact is this paper is pessimistic on the former, but optimistic on the latter.

        • Stan Plamer
          Posted Aug 16, 2010 at 9:59 AM | Permalink

          Optimistic?

          My impression was completely different. Figure 14 shows completely different reconstructions from the same proxy set based on different model building methods. The issue of proxy quality was not addressed. I do not see the paper as optimistic that these issues can be resolved with current techniques.

        • bender
          Posted Aug 16, 2010 at 10:08 AM | Permalink

          “These issues?”

          Here, you pluralize and conflate exactly where I attempt to separate and to clarify. Address the quote, not the stuff around it.

        • Stan Plamer
          Posted Aug 16, 2010 at 10:43 AM | Permalink

          Your comment does not take into account that one of the reconstructions in Figure 14 shows a constant rise in temperature into the past while another shows a flat-shafted hockey stick. M&W are not optimistic that any model building technique will be able to find a valid reconstruction. Proxy noise is part of the reason they identify for this, and proxy selection is an issue that will affect it as well. If Tiljander is included in the proxy set, how will this affect the building of the model?

        • bender
          Posted Aug 16, 2010 at 10:56 AM | Permalink

          Again, you conflate.

          You say: “M&W are not optimistic that any model building technique will be able to find a valid reconstruction.”

          A “valid reconstruction” can mean a lot of things. It means different things to different people. Allow me to deconflate.
          .
          If the issue is comparing the decade of the 1990s versus all decades of the past millennium, then the authors are clearly optimistic that a confident statement can be made. Because they made one.

        • bender
          Posted Aug 16, 2010 at 10:58 AM | Permalink

          Should I assume I’m talking with “Stan Palmer” and not “Stan Plamer”?

        • bender
          Posted Aug 16, 2010 at 10:15 AM | Permalink

          “The major difference between our model and those of climate scientists, however, can be seen in the large width of our uncertainty bands. Because they are pathwise and account for the uncertainty in the parameters (as outlined in Section 5.3), they are much larger than those provided by climate scientists. In fact, our uncertainty bands are so wide that they envelop all of the other backcasts in the literature. Given their ample width, IT IS DIFFICULT TO SAY THAT RECENT WARMING IS AN EXTRAORDINARY EVENT compared to the last 1,000 years. For example, according to our uncertainty bands, it is possible
          that it was as warm in the year 1200 AD as it is today.”
          .
          Is this paper internally consistent in its statements?

        • TimG
          Posted Aug 16, 2010 at 10:19 AM | Permalink

          When they say “our model” are they always talking about the same model or do they develop different models to illustrate different points? I got the impression there is more than one model being proposed.

        • bender
          Posted Aug 16, 2010 at 10:49 AM | Permalink

          By “our/their” model they are drawing a distinction between statistical models that use “pathwise” versus “pointwise” estimation of uncertainty bands. Theirs is pathwise. Others are pointwise.
          .
          According to them, the pathwise uncertainty bands are so wide as to preclude a wide range of common categorical statements about past climate vis-à-vis present day climate.
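          .
          A minimal numerical sketch of that pointwise/pathwise distinction (Python, with made-up AR(1) posterior draws standing in for a reconstruction; not the authors’ data or code): the pointwise band covers 95% of draws separately at each year, while the pathwise band is inflated until it contains 95% of entire simulated paths, and so ends up noticeably wider.

            import numpy as np

            rng = np.random.default_rng(0)
            n_draws, n_years = 2000, 1000
            # made-up posterior draws of a reconstruction: AR(1) noise paths
            paths = np.zeros((n_draws, n_years))
            for t in range(1, n_years):
                paths[:, t] = 0.9 * paths[:, t - 1] + rng.normal(0, 0.1, n_draws)

            mean, sd = paths.mean(axis=0), paths.std(axis=0)

            # pointwise 95% band: year-by-year quantiles of the draws
            lo, hi = np.percentile(paths, [2.5, 97.5], axis=0)

            # pathwise 95% band: inflate mean +/- k*sd until 95% of whole paths fit
            k = 1.0
            while np.mean(np.all(np.abs(paths - mean) <= k * sd, axis=1)) < 0.95:
                k += 0.05

            print("mean pointwise width:", (hi - lo).mean())
            print("mean pathwise width :", (2 * k * sd).mean())  # noticeably wider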

        • Duke C.
          Posted Aug 17, 2010 at 11:32 AM | Permalink

          Re: bender (Aug 16 10:15),

          Aug 16, 2010 at 8:23pm EDT – Note on “A Statistical Analysis of Multiple Temperature Proxies: Are Reconstructions of Surface Temperatures Over the Last 1000 Years Reliable?” by Blakeley B. McShane and Abraham J. Wyner:

          “The paper has been accepted at the Annals of Applied Statistics and a draft version is posted on the journal’s website in the forthcoming section. The posted draft was submitted for referee and editor comments and is not yet in “final” form. Likewise, some have obtained the code and data which was intended for the referees and editors as part of the review process. This code and data is not yet in final form nor is the documentation complete. The final draft of the paper and the code and data bank will be posted at the journal’s website come publication.”

          http://www.blakemcshane.com/

          “How long should papers be?

          Most published papers will not exceed 20 pages (About 500 words per page, with figures typically taking 1/3 page and displayed equations 30 words each), anything longer requiring unusually compelling subject matter. Papers fewer than 12 printed pages may receive expedited review.”

          http://www.imstat.org/aoas/mansub.html

          According to the authors, the copy of MW2010 currently making the rounds is a DRAFT VERSION, and AOAS has a 20-page limit. It seems that there is quite a bit of editing that still needs to occur.

        • SOI
          Posted Aug 16, 2010 at 10:23 AM | Permalink

          Who are you, and what have you done with bender?? I can’t possibly see how you think the paper is optimistic on CWP versus MWP. Yes, they mention that their model shows an 80% chance of 1997-2006 being the warmest on record, but they proceed to expound on a number of reasons why their model produces false confidence. The context they give includes:

          “While this (high CWP) does seem alarming, we should temper our alarm somewhat by considering again Figure 15 and the fact that the proxies seem unable to capture the sharp run-up in temperature of the 1990s.”

          “Still, it seems there is simply not enough signal in the proxies to detect either the high levels of or the sharp run-up in temperature seen in the 1990s. This is disturbing: if a model cannot predict the occurrence of a sharp run-up in an out-of-sample block which is contiguous with the in-sample training set, then it seems highly unlikely that it has power to detect such levels or run-ups in the more distant past.”

          And this is even with assuming the proxy data is perfect!

          When you account for the limitations of the model they identified and the likelihood of flaws in the proxy data, we are clearly not at 80%, or anything even close. I can’t see how you can read the paper any other way.

        • bender
          Posted Aug 16, 2010 at 10:51 AM | Permalink

          You say: “I can’t possibly see how you think the paper is optimistic on CWP versus MWP.”

          Read the thread. Read the opening quote:
          “our model offers support to the conclusion that the 1990s were the warmest decade of the last millennium”

        • bender
          Posted Aug 16, 2010 at 11:08 AM | Permalink

          So far, only Ken Fritsch has abstracted the salient counter to the alarmist comment:

          “temperature derivatives encountered over recent history are unprecedented in the millennium. While this does seem alarming, we should temper our alarm somewhat by considering again Figure 15 and the fact that the proxies seem unable to capture the sharp run-up in temperature of the 1990s”

          stephen richards appears to have picked up a scent of internal inconsistency, but he doesn’t provide a specific instance to substantiate his remark.

          So this is really all about “the divergence problem”.

        • SOI
          Posted Aug 16, 2010 at 11:17 AM | Permalink

          Um, bender, I quoted the same paragraph as Ken Fritsch.

          Not like you to be this sloppy.

        • bender
          Posted Aug 16, 2010 at 11:43 AM | Permalink

          Right on both counts.

        • bender
          Posted Aug 16, 2010 at 12:57 PM | Permalink

          SOI, do you recognize these two statements as being internally inconsistent? And if so, how do you reconcile the inconsistency?
          .
          Steve M has previously described what seems to be a “need to genuflect” for skeptical papers hoping to get published in the mainstream literature. stephen richards has hinted at a possibility of this.
          .
          Would love to hear from any of the authors of this paper.

        • Bernie
          Posted Aug 16, 2010 at 1:00 PM | Permalink

          Bender:
          Adi Wyner, during his too brief visit, noted above that there was going to be a special issue with a response and comments, so I think the pressure was on to be somewhat conciliatory – though the footnotes tend to tell another tale.

        • bender
          Posted Aug 16, 2010 at 1:10 PM | Permalink

          “The footnotes tend to tell another tale?” Care to expand?

        • Mike B
          Posted Aug 16, 2010 at 1:23 PM | Permalink

          Bender, my take is that this paper has something for everyone, although for some there is a little more than others.

          I think there is a deft genuflect, but then I also think this was a wise move to get their work into the peer-reviewed literature so that they can expand on it.

          There are many issues they didn’t touch on (filtering, in-filling, proxy selection, etc.), but that was mainly to avoid controversy about those issues.

          For me, the primary issue remains robustness (as has been discussed by others in this thread), and hopefully these authors will take up that issue later.

        • bender
          Posted Aug 16, 2010 at 5:40 PM | Permalink

          “Something for everyone” is politics, not science. Do you think the authors would accept your suggestion that they are playing politics?

        • Mike B
          Posted Aug 17, 2010 at 8:54 AM | Permalink

          Bender, if they presented *all* the results, carefully and completely, I don’t see how that’s “politics”. They were thorough.

        • bender
          Posted Aug 17, 2010 at 8:57 AM | Permalink

          When you make logically conflicting statements just to appease opposing groups, this is pure politics, the antithesis of science.

        • bender
          Posted Aug 17, 2010 at 9:09 AM | Permalink

          My point is that I don’t think there is “something for everyone” here. Wyner was quite clear that he believed the “alarmist” conclusion must be “tempered” in some way. Granted, this is an ambiguous position (since he didn’t state what kind of tempering is warranted). But it is A position. It is a counter-alarmist position. Net result is there is nothing here for the extreme alarmists. Yes, there are quotes they *could* use if they dared to take them out of context. That would be an incorrect and foolish thing to do.

        • Mike B
          Posted Aug 17, 2010 at 9:17 AM | Permalink

          To your first point, in full context, I don’t think they made logically conflicting statements. To your second point, if pulled out of context they could seem to be conflicting, and we can fully expect the alarmists to do that. Because as has been clear for over a decade, they are engaged in politics.

        • SOI
          Posted Aug 16, 2010 at 2:56 PM | Permalink

          bender,

          There does seem to be some degree of inconsistency. They say, as you point out, that “our model offers support to the conclusion that the 1990s were the warmest decade of the last millennium.” However, if you read their analysis, they clearly think support is weak. I also think it telling that they did not include this observation in the summary.

        • bender
          Posted Aug 16, 2010 at 5:34 PM | Permalink

          As a fairly direct reply to MBH, does this paper represent a strengthening or a weakening of support for the argument? If the latter then these authors can be fairly accused of wordplay. And if that’s the case, then for what purpose? Why the ambiguity?

        • SOI
          Posted Aug 16, 2010 at 5:55 PM | Permalink

          bender,

          It is a weakening of MBH. The conclusion is strongly worded and clear: “On the one hand, we conclude unequivocally that the evidence for a ”long-handled” hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data.”

    • TAG
      Posted Aug 15, 2010 at 8:35 AM | Permalink

      This was not an exercise to develop a new reconstruction but an examination of the techniques used in conventional reconstruction work.

      The authors state that they accepted the Mann proxies and the CRU temperature record as is. They have not addressed the proxy issues, which are SMc’s main concern. I presume that the bristlecones, Tiljander etc. are in the proxy set used.

      Some of the salient points that I saw in the paper are:

      a) the confidence intervals provided in previous work are too narrow and cannot be justified
      b) uncertainties in parameter values were not estimated and this leads to too narrow confidence intervals
      c) use of proxies prevents the detection of rapid temperature rises

      • Bernie
        Posted Aug 15, 2010 at 9:05 AM | Permalink

        Those are the polite points. There are a number of less polite ones.

        • TAG
          Posted Aug 15, 2010 at 9:41 AM | Permalink

          Maybe we could start a list of the points made in the paper. Another one was pointed out above:

          “Natural climate variability is not well understood and is probably quite large. It is not clear that the proxies currently used to predict temperature are even predictive of it at the scale of several decades let alone over many centuries.”

        • Michael Jankowski
          Posted Aug 15, 2010 at 11:24 AM | Permalink

          Less polite would be confirming that the proxies are about as useful for predicting and hindcasting temperatures as randomly-generated series.

    • Dave
      Posted Aug 16, 2010 at 2:39 PM | Permalink

      Bender>

      I think you’ve totally missed the point of the paragraph you quoted, which is that whilst their “model offers support to the conclusion that the 1990s were the warmest decade of the last millennium”, it is, in toto, a negative thing for the value of the proxies that it does so. The model is, after all, random data.

      If a study shows that random choices of stocks made by a monkey throwing darts are as good as the choices of expert stock-pickers, does that mean the stock-pickers are vindicated?

  64. Eric (skeptic)
    Posted Aug 15, 2010 at 8:45 AM | Permalink

    Perhaps Mike will retract this article:
    http://www.realclimate.org/index.php/archives/2005/01/on-yet-another-false-claim-by-mcintyre-and-mckitrick/

    • Posted Aug 17, 2010 at 10:11 AM | Permalink

      Are you high???? They’re busy trying to show how it reinforces it!!! 🙂

  65. AdderW
    Posted Aug 15, 2010 at 9:54 AM | Permalink

    At best, and that isn’t much, proxies might only be useful at the site from which they originated. A tree-ring proxy from a tree at an obscure location in Russia might say something about the growing conditions for that single location and that single tree, nothing else.

  66. Patrick Hadley
    Posted Aug 15, 2010 at 10:16 AM | Permalink

    Professor Wyner https://climateaudit.org/2010/08/14/mcshane-and-wyner-2010/#comment-239212 tells us that The paper has been accepted, but publication is still a bit into the future as it is likely to be accompanied by invited discussants and comment.

    It seems likely that Michael Mann would be one of the invited discussants, and hence that the Hockey Team have been well aware of this paper for some time. If that is the case then one can understand why Gavin et al have been so uninterested in discussions about the proxies recently, and have been playing down the importance of the hockey stick.

    • TomRude
      Posted Aug 15, 2010 at 11:37 AM | Permalink

      Excellent observation…

  67. stephen richards
    Posted Aug 15, 2010 at 10:49 AM | Permalink

    I remember in my research days having to spend a great deal of effort rewriting a paper that was extremely critical of the party line in such a way that it appeared to both support and denigrate the original thesis. This is such a paper. I can imagine well the discussions and changes before it was allowed to see the light of day. he he 🙂

    • j ferguson
      Posted Aug 15, 2010 at 10:51 AM | Permalink

      The most effective slam is one in which the slamee suspects he’s been slammed but cannot quite figure out how it was done.

  68. Kenneth Fritsch
    Posted Aug 15, 2010 at 11:39 AM | Permalink

    The authors here, very politely, condition their analyses results based on reliability of the data of those climate scientists who labored years to assemble it. They do not even scratch the surface of the selection of proxies, as Steve M has, and do not claim to.

    “Our work stands entirely on the shoulders of those environmental scientists who labored untold years to assemble the vast network of natural
    proxies. Although we assume the reliability of their data for our purposes
    here, there still remains a considerable number of outstanding questions
    that can only be answered with a free and open inquiry and a great deal of
    replication.”

    Here the authors reference what I view as fundamental to their criticism of the hockey stick: that, given the assumption of “good” data, reconstructions cannot get the sharp run-up in the 1990s right. The question then becomes, as it does with “hide the decline”, i.e. divergence, whether that failure means that sharp run-ups in the past were missed by the reconstruction or, alternatively, whether the recent run-up is so unique that the proxies are “topping out”. That the authors in the introduction noted that the HS graph with the instrumental record tacked on was misleading was not in my view just a passing remark but part of their criticism later of the reconstruction failures.

    “On the one hand, we conclude unequivocally that the evidence for a
    ”long-handled” hockey stick (where the shaft of the hockey stick extends
    to the year 1000 AD) is lacking in the data. The fundamental problem is
    that there is a limited amount of proxy data which dates back to 1000 AD;
    what is available is weakly predictive of global annual temperature. Our
    backcasting methods, which track quite closely the methods applied most
    recently in Mann (2008) to the same data, are unable to catch the sharp run
    up in temperatures recorded in the 1990s, even in-sample. As can be seen
    in Figure 15, our estimate of the run up in temperature in the 1990s has
    a much smaller slope than the actual temperature series.”

    The authors nicely cover the run up question and the alternative here:

    “On the other hand, perhaps our model is unable to detect the high level of and sharp run-up in recent temperatures because anthropogenic factors have, for example, caused a regime change in the relation between temperatures and proxies. While this is certainly a consistent line of reasoning, it is also fraught with peril for, once one admits the possibility of regime changes in the instrumental period, it raises the question of whether such changes exist elsewhere over the past 1,000 years. Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors, thus undermining
    paleoclimatology as a statistical enterprise.”

    Below the authors condition the conclusions about the warmest decade in the millennium with again a reference to the failure of the proxies to capture the recent run up in temperatures.

    “This suggests that the temperature derivatives encountered
    over recent history are unprecedented in the millennium. While this
    does seem alarming, we should temper our alarm somewhat by considering
    again Figure 15 and the fact that the proxies seem unable to capture the
    sharp run-up in temperature of the 1990s. That is, our posterior probabilities are based on derivatives from our model’s proxy-based reconstructions and we are comparing these derivatives to derivatives of the actual temperature series; insofar as the proxies cannot capture sharp run-ups, our model’s reconstructions will not be able to either and therefore will tend to understate the probability of such run-ups.”

    Overall this is not a paper that can be judged by taking comments out of context, but unfortunately I think there will be those who will attempt to do it that way. I found lots of good comments and analyses in this paper that have been subjects of discussion here at CA and other so-called skeptic blogs.

    As an aside, I recall that one of the recent Mann et al. papers talks about, not only the better known tree ring divergence, but divergence of other non-tree ring proxies in recent times. I am too lazy at the moment to look it up but I will in the near future. To me the divergence problem could almost be considered a failure of out-of-sample testing. I was very pleased that authors mentioned in-sample and out-of-sample testing and even the dangers of data snooping.

  69. CRS, DrPH
    Posted Aug 15, 2010 at 2:37 PM | Permalink

    RealClimate folks are already aware of this one:

    [Response: The M&W paper will likely take some time to look through (especially since it isn’t fully published and the SI does not seem to be available yet), but I’m sure people will indeed be looking. I note that one of their conclusions “If we consider rolling decades, 1997-2006 is the warmest on record; our model gives an 80% chance that it was the warmest in the past thousand years” is completely in line with the analogous IPCC AR4 statement. But this isn’t the thread for this, so let’s leave discussion for when there is a fuller appreciation for what’s been done. – gavin]

    BTW, McShane’s personal website is here: http://www.blakemcshane.com

    A well-qualified, serious individual on the faculty of one of the leading business schools in the US. This paper will sting.

    • two moon
      Posted Aug 15, 2010 at 3:44 PM | Permalink

      I guess that Gavin did not see the appropriateness of posting to a thread concerning expert credibility. I thought it was a good fit.

    • Benjamin
      Posted Aug 15, 2010 at 4:34 PM | Permalink

      Loool this was soooo expected from gavin.
      Cherry picking the quotes, nicely done !

      Let’s keep reading “While this
      does seem alarming, we should temper our alarm somewhat by considering
      again Figure 15 and the fact that the proxies seem unable to capture the
      sharp run-up in temperature of the 1990s. That is, our posterior probabilities
      are based on derivatives from our model’s proxy-based reconstructions
      and we are comparing these derivatives to derivatives of the actual temperature
      series; insofar as the proxies cannot capture sharp run-ups, our
      model’s reconstructions will not be able to either and therefore will tend to
      understate the probability of such run-ups.”

  70. BDAABAT
    Posted Aug 15, 2010 at 2:48 PM | Permalink

    Would expect the team to respond in a couple different ways…they won’t be able to ignore this paper. They will likely continue as they have with past challenges to the hockeystick and vigorously defend Mann et al. Would bet that folks at RC are now scouring the paper for ANY errors, omissions, corporate funding issues, typos, referencing issues or whatever with the hope of finding SOME little nugget to offer up to their readers as clear evidence of error or wrongdoing.

    Would also expect that there will be a collective groan from RC about the stats “cowboys” using an inferior model to “Lasso” the paper’s results….that the authors’ decision to use this method of course results in extremely wide confidence intervals, but that nothing in the paper invalidates their results.

    BTW: What nice juxtaposition. Mann created his original hockey stick as a newly minted PhD. Mann is now (further) being taken down by another young gun.

    BTW part II: I’ve gotta say, what a fabulous project for a PhD student. The analysis has immediate impact, utility and notoriety. The statistical issues are pretty clear to those trained in statistics…. just need to show the issues formally. It’s also basically free to do! The data is largely already in the public domain (with a lot of work done by others to help describe and document what actually occurred in the previous papers), which means the student can really just work with their adviser on the analysis. Can’t help but wonder if Steve or Roman or someone else might have placed a bug in the ear of a colleague or stats acquaintance about this. If so, nicely done! Would also expect there are many other juicy projects that the budding stats grad student might be able to dive into based on the work of the climate community.

    Bruce

  71. eddieo
    Posted Aug 15, 2010 at 4:06 PM | Permalink

    Steve and Ross McK have had to shoulder a huge burden over the past few years with the support of the readers of this blog. Its great to see their work being vindicated.

    However, why has it taken so long for the statistics community to turn their attention to the Hockey Stick?
    snip – policy

    • eddieo
      Posted Aug 15, 2010 at 4:08 PM | Permalink

      Obviously that should have been “It’s” not “Its”

  72. Steve Fitzpatrick
    Posted Aug 15, 2010 at 5:23 PM | Permalink

    Elegant, thoughtful, well written. A real contribution. Thanks.

  73. Chris Watkins
    Posted Aug 15, 2010 at 5:44 PM | Permalink

    The new McShane and Wyner paper due to appear in Ann. Stats. is clearly going to be much discussed, so I thought I would get in with a few comments, after scanning it briefly.

    Let me say first that it is great news that some stats journals are taking a look at climate reconstructions. Unfortunately the first half of this paper is very silly, and the second half is slightly more sensible, and the most plausible reconstruction they produce…..looks rather like the hockey-stick.

    In the first half, they take 1200 temperature proxy series (treated as independent variables) and fit them to 119 temperature measurements (keeping overlapping holdout sequences of 30 yearly temperature measurements). Fitting 1200 coefficients to 119 data points is of course hopeless without further assumptions. Instead of doing some form of thoughtful data reduction, they employ the lasso to do the regression directly, with strong sparsity constraints.

    They justify their choice of the lasso by saying:
    “…the Lasso has been used successfully in a variety of p >> n contexts and because we repeated
    the analyses in this section using modeling strategies other than the
    Lasso and obtained the same general results.”
    Both parts of this statement are wrong, and the first part is a MORONIC thing for statisticians to say. They give absolutely no reasons to suppose that the Lasso — a method that makes _very_strong_ implicit assumptions about the data — is in any way appropriate for this problem.

    The Lasso _is_ appropriate in certain cases where you believe that only a small subset of your variables are relevant. To use it as a substitute for any data reduction with 1200 variables and 119 data points, when _all_ the temperature proxy series are presumed to be relevant to some degree, and all are thought to be noisy, is simply stupid.

    Not surprisingly, they find they can’t predict anything at all using the Lasso. (It is a completely inappropriate technique for the problem.)

    In the second half of the paper, they do something which is almost sensible (but less sensible than what the climate modellers do). They take 93 proxy series that go back a thousand years, and do OLS regression on various numbers of principal components of these series. Regressing on just one PC gives more or less Mann’s curve (ironically this is probably the most defensible prediction from all the ones they try); when they regress on 10, they back-cast historical upward trends. If they were being agnostic statisticians, then I suspect that from the cross-validations they show, the most conservative model they could choose would be a model predicting on one or a very few principal components.

    Hey presto, they’ve recovered Mann’s hockey stick as a most plausible estimate. As Garfield would say, Big Hairy Do.

    That’s the bulk of the paper. Some but not all of the points they make about over-tight confidence intervals in the previous literature seem valid.

    In my opinion they do not introduce any useful new techniques into palaeoclimate reconstruction: their main contribution is to show that using the Lasso with no prior dimension reduction is as useless an idea as any sensible person would expect it to be.

    This paper shows, if proof were needed, that it is possible to get ill-considered papers into good peer reviewed journals, especially if they are on hot topics.

    • Ron Cram
      Posted Aug 15, 2010 at 6:38 PM | Permalink

      Chris,
      I’m sorry but just claiming Lasso is a stupid idea doesn’t prove anything. You are going to need more than arm-waving to dismiss this paper. You certainly have not shown anything of substance in this comment.

      • Dave Dardinger
        Posted Aug 15, 2010 at 6:57 PM | Permalink

        Re: Ron Cram (Aug 15 18:38),

        I found at least one Chris Watkins who should know what he’s talking about, whether he’s correct or not. He’s a professor / PhD in England who’s into artificial intelligence and presumably knows statistics. One thing he says is of interest:

        The Lasso _is_ appropriate in certain cases where you believe that only a small subset of your variables are relevant.

        As I recall, the whole purpose of PCA is to eliminate unneeded (i.e. irrelevant) proxies. Or if he’s referring to climate variables like humidity, rainfall, temperature, wind vector, etc., then if we can’t assume only a small subset are relevant, the whole proxy idea is clearly worthless.

        • Ron Cram
          Posted Aug 15, 2010 at 7:07 PM | Permalink

          Dave,
          He may well know what he is talking about, but his comment did not prove anything to me. It is easy to make claims about Lasso (which may or may not apply to PCA as well), but a claim does not prove anything. If Lasso is what he says it is, he can provide a citation to show it.

      • Posted Aug 15, 2010 at 7:08 PM | Permalink

        Re: Ron Cram (Aug 15 18:38),
        Chris at least knows something about the Lasso, and has explained it well. It is a regularisation technique, which has the characteristic that it takes a problem for which you don’t have enough information to find a solution, and you choose a solution on a restricted subspace. What you get depends entirely on what restriction you apply. In effect, you arbitrarily supply the extra information. Their restriction is that they do an OLS minimisation with a restriction on the L1 norm of the slopes (sum of absolute values). Do you have any thoughts on why that is a good idea? Or any substantive comments?

        Let’s hear it for the Lasso!
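
        For anyone who wants to poke at that L1 restriction directly, here is a minimal sketch in the p >> n regime (Python/scikit-learn, fabricated stand-in data; illustrative only, not the paper’s code). The fitted objective is the residual sum of squares plus alpha times the sum of absolute coefficients, which is what forces most coefficients to exactly zero:

          import numpy as np
          from sklearn.linear_model import Lasso

          rng = np.random.default_rng(1)
          n, p = 119, 1200                      # ~instrumental years vs. proxy count
          X = rng.normal(size=(n, p))           # stand-in standardized "proxies"
          beta = np.zeros(p)
          beta[:5] = 1.0                        # pretend only 5 columns carry signal
          y = X @ beta + rng.normal(0, 1, n)

          # minimizes ||y - X b||^2 / (2n) + alpha * sum(|b_j|)
          fit = Lasso(alpha=0.1).fit(X, y)
          print("nonzero coefficients:", int(np.count_nonzero(fit.coef_)), "of", p)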

        • Ron Cram
          Posted Aug 15, 2010 at 7:29 PM | Permalink

          I have no idea if the Lasso is a good idea or not, but whenever I make a statement like that I try to provide some support for the claim. Chris’s post is nearly void of support.

          The authors chose the Lasso for a reason. Why was their reasoning lacking? Just saying it is a stupid idea is not compelling. Is there a statistical textbook that says the Lasso is a bad idea in this circumstance? If so, how did the paper get past the reviewers?

        • pete
          Posted Aug 15, 2010 at 7:35 PM | Permalink

          I thought Chris’s objection to the Lasso was clear enough.

          It would be fair to ask him to simplify his comment for this audience, but your accusation that his objection was “nearly void of support” is false.

        • Ron Cram
          Posted Aug 15, 2010 at 7:48 PM | Permalink

          Pete,
          Actually, I think Chris’s clarification below is presented much better and his conclusion is much toned down. So, evidently even he did not think his conclusions were well supported.

      • pete
        Posted Aug 15, 2010 at 7:17 PM | Permalink

        The Lasso with L1 penalty is a stupid idea because it basically selects a small subset of the proxies rather than trying to extract a common signal.

        This would be a good idea if you think there are a small number of strong proxies hidden amongst a set of not-actually-proxies.

        Terrible idea if you have a set of weak proxies.
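
        A toy simulation of exactly that contrast (Python/scikit-learn, invented data; a sketch of the intuition, not a claim about the real proxies):

          import numpy as np
          from sklearn.linear_model import Lasso
          from sklearn.model_selection import cross_val_score

          rng = np.random.default_rng(2)
          n, p = 119, 200
          signal = rng.normal(size=n)                 # the "temperature"

          # case A: three strong proxies buried among pure-noise columns
          Xa = rng.normal(size=(n, p))
          Xa[:, :3] = signal[:, None] + rng.normal(0, 0.3, (n, 3))

          # case B: every column is a weak, noisy copy of the signal
          Xb = signal[:, None] + rng.normal(0, 3.0, (n, p))

          for name, X in [("few strong", Xa), ("all weak", Xb)]:
              r2 = cross_val_score(Lasso(alpha=0.1), X, signal, cv=5).mean()
              print(name, "cross-validated R^2:", round(float(r2), 2))
          # case B is better served by averaging all columns to cancel noise,
          # which a sparse L1 penalty refuses to do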

        • hengav
          Posted Aug 16, 2010 at 12:13 AM | Permalink

          Are you suggesting that they “cherry picked” the Lasso, or are you implying that the proxies are weak and the Lasso was not appropriate?

    • Kenneth Fritsch
      Posted Aug 15, 2010 at 6:59 PM | Permalink

      I guess we are simply to accept that the methods of the authors are moronic and silly. How about bizarre?

      The authors say the following about the Lasso method:

      “We chose the Lasso because it is a reasonable procedure that has proven powerful, fast, and popular, and it performs comparably well in a p ≫ n context. Thus, we believe it should provide predictions which are as good or better than other methods that we have tried (evidence for this is presented
      in Figure 12). Furthermore, we are as much interested in how the proxies fare as predictors when varying the holdout block and null distribution (see Sections 3.3 and 3.4) as we are in performance. In fact, all analyses in this section have been repeated using modeling procedures other than the Lasso and qualitatively all results remain more or less the same.

      Due to the L1 penalty, the Lasso tends to choose a sparse β̂_Lasso, thus serving as a variable selection methodology and alleviating the p ≫ n problem. Furthermore, since the Lasso tends to select only a few
      of a set of correlated predictors, it also helps reduce the problem of spatial correlation amongst the proxies.”

      Actually the main models analyzed by the authors got most of the instrumental period right (up to the 1990s), but none (even the best, the Bayesian model) could get the run-up to the 1990s correct.

      Also interesting to note was that the hold outs of the early and late part of the instrumental period gave some rather unique results and, of course, those are the hold outs frequently used by climate scientists.

      Mann et al. in the original HS paper used one PC, but I believe a later paper needed to use 4 PCs to get the HS. The authors here note that one gets the HS regressing on PC1, but using PCs 1-10 and a two-stage model featuring one local temperature principal component and ten proxy principal components did not. Of course, the question becomes what the selection criteria for PCs are and whether using only one uniquely gives a HS.

      • Kenneth Fritsch
        Posted Aug 15, 2010 at 7:00 PM | Permalink

        That should be got it right up to the 1990s run-up.

        • Chris Watkins
          Posted Aug 15, 2010 at 7:15 PM | Permalink

          I guess I should learn to pause and use MUCH more measured and polite language before hitting the “post” button late at night to post a message irrevocably into the blogosphere. 🙂 Sorry about that. My bad. This’ll teach me.

          Nevertheless, I think my main point stands. What perhaps I should have said was:

          What the Lasso does — intuitively put — is to force quite a lot of the regression coefficients to be zero. This means that if you have (as in this case) about 120 data points, and you have a much larger number of proxy series (over a thousand), then _if_ you believe that there is a linear combination of just a small number of the proxy series that can correctly fit the data, then a Lasso is a good technique to try. In other words, use it if you believe that there may be a good prediction rule based on a small number of the series.

          Now, is this reasonable for climate proxies? Well … perhaps … but common sense might indicate not. After all, all the proxy series are thought to be noisy, and it would seem reasonable (if they can be used at all to predict temperatures) that you would want a rule that combined a lot of them so as to average out errors in any individual series.

          Hence, for this type of data, it does not seem plausible that taking over 1000 series and selecting a few of them to fit the (short) temperature series is going to produce a good predictor.

          Well, this paper seems to show that, indeed, using the Lasso doesn’t work well for this problem.

          That’s a valuable thing to show. It’s reasonable to try the Lasso — but I don’t think we should be surprised that it didn’t work here. Other methods which make different assumptions about the data might work much better.

        • Stan Plamer
          Posted Aug 15, 2010 at 7:44 PM | Permalink

          The authors indicate that they tried methods other than the Lasso. In Figure 12, they show the results of other methods in comparison to the Lasso. The authors indicate that the Lasso performed as well as these other methods, which include ones used in previous reconstructions.

        • Stan Plamer
          Posted Aug 15, 2010 at 7:48 PM | Permalink

          From the paper:

          We plot our results in figure 13 and again include the boxplot for ten principal components from figure 11 for easy reference. Again, there is simply not much variation in holdout RMSE across various model specifications. No method is a clear winner.

        • Ron Cram
          Posted Aug 15, 2010 at 7:52 PM | Permalink

          Stan,
          Thank you for this contribution. This raises a good question for Chris, here.

          Chris,
          What method do you think should have been used given the statistical properties of the data and the goal of temp reconstruction? Is there a method you would intuitively look at first? Is this method one of the ones the authors tested?

        • Stan Plamer
          Posted Aug 15, 2010 at 8:17 PM | Permalink

          In Figure 14, the authors present backcasts for various methods of model building, including the Lasso. One of the main points that the authors are trying to make is shown there. There are three distinctly different backcasts created by techniques that are equivalent in the validation period. This is a characteristic of the problem.

          So those people who are saying that the Lasso is a bad thing are misunderstanding the authors. It is very difficult to know which method is a “good thing” if they are all roughly equivalent in the validation period due to the noise in the proxies and the brevity of the calibration period.

          The authors’ point seems to be going right over the heads of some of their critics.

        • Stan Plamer
          Posted Aug 15, 2010 at 8:20 PM | Permalink

          To be clear, the backcasts in figure 14 range from a hockey stick to one with continuously rising temperature.

          As the carneys say, “You pays yer money. You takes yer choice.”

        • Chris Watkins
          Posted Aug 15, 2010 at 8:40 PM | Permalink

          From section 4 onwards, and in figure 14, they consider only 93 proxy series. Using the Lasso with 93 series and 150 data points is far more reasonable than with 1000+ series: in this case, the variable selection does not have to be nearly so extreme as with 1000+ series — but they don’t seem to show the Lasso backcast! Or am I wrong?

        • Stan Plamer
          Posted Aug 15, 2010 at 9:04 PM | Permalink

          They are not making a case for the Lasso. Their point is the difficulty of selecting a modelling technique given the parameters of the problem

          From the paper:

          Throughout this section we assess the strength of the proxy signal by building models using the lasso (Tibshirani, 1996)


        • Ron Cram
          Posted Aug 15, 2010 at 7:44 PM | Permalink

          Chris,
          I think I am able to follow this comment a little better than the previous one. However, I still do not understand. The authors of the paper agree with you that the number of the proxies is much higher than the number of “target data points.” But I do not understand the meaning of that term. I would think there would be at least 1,000 data points in the reconstruction since it goes back at least 1,000 years. How does “target data points” differ from the data points I have in mind?

        • Stan Plamer
          Posted Aug 15, 2010 at 7:56 PM | Permalink

          The data points in question are the data points in the calibration period which are used for the creation of the model

        • Ron Cram
          Posted Aug 15, 2010 at 8:03 PM | Permalink

          Stan,
          Thank you again. Makes sense now.

        • Chris Watkins
          Posted Aug 15, 2010 at 8:12 PM | Permalink

          My reading of the paper is as follows (please correct me if I am wrong).

          There are only (approx) 150 years of data of measured temperatures. There were nearly 1200 “proxy series”, which are numbers that are related to temperatures (tree-ring widths and other things). To estimate temperatures far in the past from the proxies, you need to find the relationship between the proxies and the 150 years of measured temperatures.

          What they did was to perform a regression to find a rule for predicting temperatures from proxies. This rule is of the form “to predict the temperature in a particular year, multiply each proxy by a certain weighting factor (aka regression coefficient), and then add up all these terms to get the estimate of the temperature”. The weighting factors for recent years are then used to estimate temperatures for past years before regular measurements were made.

          But how to check the validity of their estimates? What they did was to “hold out” 30 years of data from the measured record, and find the prediction rule from the 120 or so (119?) remaining years. Then they can test the prediction rule by seeing how well it predicts the temperatures in the years they “held out”. This is a standard statistical technique for assessing the accuracy of a prediction rule: you construct the rule from part of your data (120 years) and then test it on some other data that you “held out”, and you see how accurate it is. (They did this for all the possible periods of 30 years they could hold out).

          Now, since there are only 120 or so “known” temperatures, and 1000 proxy numbers for each year, you could fit the 120 observed temperatures exactly with many different choices of 1000 weighting factors.

          The Lasso is a technique for searching for rules that have most of the weighting factors equal to zero, so that you only consider temperature prediction rules based on the values from a small number of the proxies. If there were a good prediction rule based on a small number of the proxy series — some weight for tree13, lake22, ice core12, etc, and which ignored all the other proxy series, then the Lasso might find it.

          For some problems where this is a reasonable assumption, the Lasso does great.

          However, for climate proxies, this assumption doesn’t seem reasonable. A more reasonable approach, for example, might be to find a rule based on, say, averaging the proxies of each different type. A better idea might be to take principal components, which (crudely) is a way of extracting the “main variation” from a set of proxies (I haven’t described that very well — but the idea is that principal components preserve the most important types of variation in a set of proxies, while getting rid of individual glitches.)

          Well, they found the Lasso was barely able to predict the holdout sets at all. Perhaps this might have been expected — but perhaps it was also worth a try, and at least we now know that it didn’t work too well.

          Principal components of the 93 proxies used later in the paper seem (according to their graphs) to predict the holdout sets better.

          Then they apply the rules for the 93 proxies to the whole 1000 years. Using a rule based on one PC of the 93 proxies, they get the hockey-stick. As they include more PCs, they get different curves.
          Figuring out which is the most plausible curve, and what the plausible range of uncertainty is, is quite subtle, and I feel that a more careful discussion would be needed in this part of the paper. In particular, there are many more proxy series that go back a few hundred years, and the estimates from these could be used to choose the appropriate number of PCs to use for the 93 series.
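
          A toy version of the mechanics described above (Python/scikit-learn, fabricated series; a sketch under those assumptions, not the authors’ code): regress on a handful of principal components of the proxies and score every contiguous 30-year holdout block.

            import numpy as np
            from sklearn.decomposition import PCA
            from sklearn.linear_model import LinearRegression

            rng = np.random.default_rng(3)
            n_years, n_proxies, n_pcs = 150, 93, 5
            temp = np.cumsum(rng.normal(0, 0.1, n_years))          # fake temperatures
            proxies = temp[:, None] + rng.normal(0, 1.0, (n_years, n_proxies))

            # PCs computed once on all proxies, for simplicity; a careful analysis
            # would refit them on the training rows of each split
            pcs = PCA(n_components=n_pcs).fit_transform(proxies)

            rmses = []
            for start in range(n_years - 30 + 1):                  # every 30-year block
                test = np.arange(start, start + 30)
                train = np.setdiff1d(np.arange(n_years), test)
                model = LinearRegression().fit(pcs[train], temp[train])
                err = model.predict(pcs[test]) - temp[test]
                rmses.append(np.sqrt(np.mean(err ** 2)))
            print("mean holdout RMSE:", round(float(np.mean(rmses)), 3))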

        • Stan Plamer
          Posted Aug 15, 2010 at 8:23 PM | Permalink

          See figures 13 and 14. Which rule will you pick. How do you know it is a good rule?

        • Chris Watkins
          Posted Aug 15, 2010 at 9:04 PM | Permalink

          A hard question — but small differences in cross-validation error is not the only criterion.

          A natural thing to do would be to try to use (hopefully) more accurate back-casts for a few hundred years to distinguish between alternative models for even longer back-casts.

          For example, if you can estimate the temperature back to, say, 1700 with reasonable accuracy, then you have some basis for choosing between different models based on just the 93 series which go back much further. But this is beyond the scope of the paper.

          Section 4 of the paper seems just based on the 93 proxy series that go back 1000 years.

          If you frame the problem in terms of _just_ these 93 proxy series, as if that was all the information that you have, then uncertainties will of course be larger. Their Bayesian uncertainty estimate might be reasonable — but I don’t know because it is discussed rather briefly.

          It would be nice to see uncertainty estimates for models with smaller numbers of PCs also – 10 PCs seems a lot to me, and there seems no attempt to pick a ‘correct’ number of PCs other than by cross-validation in the regression.

        • Steve Fitzpatrick
          Posted Aug 15, 2010 at 9:29 PM | Permalink

          Chris,

          You said: “For example, if you can estimate the temperature back to, say, 1700 with reasonable accuracy, then you have some basis for choosing between different models based on just the 93 series which go back much further.”

          As I am sure you know, there is no instrumental record back to 1700, so I am really not sure how you would declare a reconstruction to 1700 ‘reasonably accurate’. Please describe how you propose to do this.

          One of the principal conclusions of the paper is that an accurate reconstruction may just be impossible:
          “since the data is not easily modeled by a simple autoregressive process it follows that the number of truly independent observations (i.e., the effective sample size) may be just
          too small for accurate reconstruction.”

          This is a subtle and elegant paper. It is not at all clear to me why you appear to believe it can be dismissed, especially since (it seems) you have given little or no thought to what the authors actually concluded. As I said before, if I were you, I would be cautious in my critiques of this paper.

        • Mike Hollinshead
          Posted Aug 16, 2010 at 1:18 PM | Permalink

          The Central England temperature series goes back to 1659.

          See Manley’s 1974 QJRMS paper: qj74manley.pdf

        • Chris Watkins
          Posted Aug 15, 2010 at 9:07 PM | Permalink

          If I were pushed to make a choice, I’d choose a really simple rule with reasonable cross-validation accuracy. Fewer than 5 PCs. (But you need to look at the PC values etc. )

        • Bernie
          Posted Aug 16, 2010 at 6:39 AM | Permalink

          Chris:
          I think you have helpfully forced a deeper thinking about the paper. I had a slightly different read of the point of the entire exercise, namely, that one way or the other legitimate confidence intervals need to be constructed around the proxies IF you are going to try to use the proxies to construct a backcast. This they did – though I would be interested in hearing your assessment of the reasonableness of these confidence intervals.
          As for the data reduction bit – there I think that you cannot possibly and legitimately do it without (a) assessing the individual proxies, (b) ensuring that the PCs are themselves robust, and (c) explaining the nature of the signal in each of the PCs.

        • RomanM
          Posted Aug 15, 2010 at 9:06 PM | Permalink

          I think that you may misunderstand the role of the lasso methodology in this paper. They are not claiming that it is the best method to use for the reconstruction.

          From the Page 8:

          First, we endeavor to judge regression-based methods for the specific task of predicting blocks of temperatures in the instrumental period. Second, we study specifically how the determination of statistical significance varies under different specifications of the null distribution.

          The intent from the lead-in summary on page 1 is to show:

          Furthermore, various model specifications that perform similarly at predicting temperature produce extremely different historical backcasts.

          Mann et al. first select proxies based on correlation with the calibration temperatures and then do CPS and/or EIV regression to construct their series. This method also selects proxies based on correlation at the same time as it does a least squares technique to combine the results. Although I am not that familiar with the lasso methodology, Tibshirani’s glowing description in his original paper would not seem to rule out its use in this case:

          The ordinary least squares (OLS) estimates are obtained by minimizing the residual squared error. There are two reasons why the data analyst is often not satisfied with the OLS estimates. The first is prediction accuracy: the OLS estimates often have low bias but large variance; prediction accuracy can sometimes be improved by shrinking or setting to 0 some coefficients. By doing so we sacrifice a little bias to reduce the variance of the predicted values and hence may improve the overall prediction accuracy. The second reason is interpretation. With a large number of predictors, we often would like to determine a smaller subset that exhibits the strongest effects. The two standard techniques for improving the OLS estimates, subset selection and ridge regression, both have drawbacks. Subset selection provides interpretable models but can be extremely variable because it is a discrete process – regressors are either retained or dropped from the model. Small changes in the data can result in very different models being selected and this can reduce its prediction accuracy. Ridge regression is a continuous process that shrinks coefficients and hence is more stable; however, it does not set any coefficients to 0 and hence does not give an easily interpretable model.

          We propose a new technique, called the lasso, for ‘least absolute shrinkage and selection operator’. It shrinks some coefficients and sets others to 0, and hence tries to retain the good features of both subset selection and ridge regression.

          I agree with you that it is likely not the best approach and the authors do not claim it to be, but it does seem to serve the purpose for which it is intended.
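
          The retained-or-dropped contrast in that passage is easy to see numerically. A minimal sketch (Python/scikit-learn, invented data) counting how many coefficients each estimator sets exactly to zero:

            import numpy as np
            from sklearn.linear_model import Lasso, Ridge

            rng = np.random.default_rng(4)
            n, p = 100, 40
            X = rng.normal(size=(n, p))
            y = 2.0 * X[:, 0] + rng.normal(0, 1, n)   # one genuine predictor

            lasso = Lasso(alpha=0.2).fit(X, y)
            ridge = Ridge(alpha=10.0).fit(X, y)
            print("lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)), "of", p)
            print("ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)), "of", p)
            # the lasso drops most regressors outright; ridge only shrinks them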

        • Bman
          Posted Aug 16, 2010 at 12:28 PM | Permalink

          Tibshirani’s original lasso paper included several examples with 50 observations and 1 predictor variable, and another with 100 observations and 40 predictor variables. In comparison, McShane and Wyner’s paper has 119 observations and 1200 predictor variables. Look at how the number of predictor variables is many times larger than the number of observations in this last case. To be honest, I’m not sure whether this is more than the lasso can handle or not. But it is something worth looking into. Does anyone know of a published paper using the lasso in which the number of predictor variables was as large as this, relative to the number of observations?

          In a 2009 paper appearing in the Annals of Statistics (a different but highly regarded journal), “Lasso-type recovery of sparse representations for high-dimensional data” by Nicolai Meinshausen and Bin Yu, the authors note that the lasso can fail (‘fail’ is an oversimplification, but the idea is that the lasso can run into serious problems) in the presence of highly correlated variables. My understanding is that ‘highly correlated variables’ does describe the situation in this case. So does this matter? Again, I’m not sure.

          I look forward to the discussion of this paper in the AofAS when it finally appears, as I trust some knowledgeable lasso experts will weigh in on this and shed some light on the use of the methodology in this paper.
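
          One way to see the Meinshausen and Yu concern in miniature (Python/scikit-learn, invented near-duplicate predictors rather than the proxy data; just a sketch): refit the Lasso on bootstrap resamples and watch the selected subset jump around.

            import numpy as np
            from sklearn.linear_model import Lasso

            rng = np.random.default_rng(5)
            n, p = 119, 50
            z = rng.normal(size=n)
            X = z[:, None] + rng.normal(0, 0.1, (n, p))   # 50 near-duplicate columns
            y = z + rng.normal(0, 0.5, n)

            supports = []
            for _ in range(20):                           # bootstrap refits
                idx = rng.integers(0, n, n)
                coef = Lasso(alpha=0.1).fit(X[idx], y[idx]).coef_
                supports.append(frozenset(np.flatnonzero(coef).tolist()))
            print("distinct selected subsets over 20 refits:", len(set(supports)))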

        • Michael Jankowski
          Posted Aug 16, 2010 at 5:22 PM | Permalink

          ‘least absolute shrinkage and selection operator’

          Better not put it in the pool if you want to minimize shrinkage!

        • Posted Aug 15, 2010 at 9:07 PM | Permalink

          Re: Chris Watkins (Aug 15 20:12),
          One thing that puzzled me – you may have a view. The proxies are a very disparate group, with different scalings, different units. The regression coefficients will have those scalings inverted. Yet the Lasso just seems to sum the abs coefs, in different units.

        • Chris Watkins
          Posted Aug 15, 2010 at 9:16 PM | Permalink

          It would be natural to rescale the proxies so that they had the same variances so that regression coefficients of similar importance would be of comparable size.
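          A two-line sketch of that rescaling in R, where X stands for a hypothetical matrix with one column per proxy:

            Xs <- scale(X)    # center each proxy and divide by its standard deviation
            apply(Xs, 2, sd)  # every column now has unit variance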

        • Posted Aug 15, 2010 at 9:55 PM | Permalink

          Re: Chris Watkins (Aug 15 21:16),
          Thanks – they describe X as a scaled matrix, so I guess that is what they mean.

        • Kenneth Fritsch
          Posted Aug 15, 2010 at 7:52 PM | Permalink

          It has been implied that the Mann HS depends on a few proxies, like bristlecone tree rings (unfortunately, bristlecones are reputed to be a poor proxy for temperature), and that many of the other proxies are merely white noise (or weak indicators, if you like). I am not sure that “all weak-signal proxies” is an apt description of the proxies “selected” for reconstruction.

          Does using PC1 only, even though it may have no physical meaning, make intuitive sense if we judge that the reconstruction contains many weak signal proxies? Would PC1 somehow zero in on the signal in many weak proxies?

        • Steve Fitzpatrick
          Posted Aug 15, 2010 at 7:52 PM | Permalink

          I would urge a measure of caution here. Whatever misgivings you might have about the use of the Lasso, these authors appear to have given the problem considerable thought and analysis. There is every reason to believe that the uncertainty limits routinely used in past climate reconstructions are artificially small.

          I would also remind you that much of the hockey-stick shape in the reconstruction of Mann 08 (and earlier reconstructions) is the result of a small group of proxies which are of questionable validity (bristle-cone pines and Tiljander lake varves). Absent these few proxies, even the methodology of Mann 08 shows a MWP temperature comparable to present day temperatures, very much like what these authors have calculated with their 10-PC reconstruction.

        • Posted Aug 15, 2010 at 10:30 PM | Permalink

          Re: Steve Fitzpatrick (Aug 15 19:52),

          > …much of the hockey-stick shape in the reconstruction of Mann 08… is the result of a small group of proxies…

          In my opinion, the effect of the Tiljander proxies on the Mann08 CPS and EIV reconstructions has to be considered “uncertain” at this time.

          The problem is that Mann08 did not perform adequate sensitivity tests of the Tiljander (or any other) proxies. (To be fair, Gavin Schmidt and others have argued that the twice-revised/corrected SI Fig. 8a and a Mann09 SI figure are good sensitivity tests, but I cannot agree–there is no clear-cut way to interpret those figures, in my opinion.)

          Thus, it takes running the Mann08 code and doing some tinkering to address this issue. But to my knowledge, few people have done so.

          Steve McI has emulated Mann08 code in “R”; I believe that he has found that Tiljander is a major contributor. However some RealClimate bloggers (including Gavin, IIRC) have contested the adequacy of his emulations.

          Jeff Id has also emulated Mann08 code in “R”. He produced a sensitivity diagram that showed very modest contributions of Tiljander; unsurprisingly, they were strongest in the earliest years. But Jeff has cautioned that this work is not definitive.

          RomanM has also successfully emulated Mann08 code. I don’t know his view of the matter.

          The coauthors of Mann08 have not offered any comments on this issue, to my knowledge.

          As far as I know, no pro-AGW Consensus researchers have reported the emulation of Mann08 code (MatLab or “R”), so I don’t know what their take might be.

          I would (obviously) be very interested in any informed opinion on this point.

          Steve: The impact of Tiljander on CPS verification statistics reported in my previous posts is certain at this point. I’ve precisely replicated Mann’s CPS and there is no basis for contesting these results.

        • Steve Fitzpatrick
          Posted Aug 15, 2010 at 10:39 PM | Permalink

          Mann 08’s twice revised SI graphic shows a reconstruction with neither Tiljander varves nor bristlecone pines. That graph shows a substantially warmer MWP, certainly comparable to the mid-late 20th century.

        • Stan Plamer
          Posted Aug 16, 2010 at 6:39 AM | Permalink

          Doesn’t this support the McShane et al conclusions? The parameters of the reconstruction problem (noisy proxies, brief instrumental period) are such that the shape of any proposed reconstruction depends more on the technique chosen than on information contained in the proxies.

        • Steve Fitzpatrick
          Posted Aug 16, 2010 at 7:10 AM | Permalink

          Fair enough, but this certainly doesn’t refute McShane et al either. The point I was trying to make was that you get a much warmer MWP in the reconstruction from Mann 08 methodology when a small number of questionable proxies are removed. The McShane et al method with more principal components included also yields a much warmer MWP, in spite of including the same questionable proxies in the analysis. Does the inclusion of more principal components effectively de-weight the questionable proxies that generate the cooler MWP and HS shape of Mann 08? Sure sounds like it.

        • Posted Aug 15, 2010 at 11:35 PM | Permalink

          Re: AMac (Aug 15 22:30),

          Thanks, Steve. A question that is different from the Tiljander proxies’ effects on verification statistics is their effects on the shape of the reconstruction spaghetti curve. Any thoughts on that related issue?

        • Posted Aug 17, 2010 at 1:12 AM | Permalink

          Re: AMac (Aug 15 22:30),

          I’ve done a more careful compilation of the Tiljander proxies, and put up some pretty pictures as a blog post, The Tiljander Data Series: Data and Graphs. Mainly Excel charts, to give people (incl. me) a better chance to visualize what these numbers look like.

        • scientist
          Posted Aug 15, 2010 at 7:58 PM | Permalink

          This would seem to argue against the recent Frank, Wilson, Zorita opinion piece arguing that expert-selected smaller sets of good proxies would give a better recon than the Mann-proxyhopper way. Not to say that the Mann-hopper works either; we could have an insoluble problem, for instance. Just that if the Frank et al approach was the way to go, why didn’t the Lasso show it?

        • Szerb fan
          Posted Aug 17, 2010 at 8:37 AM | Permalink

          “What the Lasso does — intuitively put — is to force quite a lot of the regression coefficients to be zero. This means that if you have (as in this case) about 120 data points, and you have a much larger number of proxy series (over a thousand), then _if_ you believe that there is a linear combination of just a small number of the proxy series that can correctly fit the data, then a Lasso is a good technique to try.”

          Is this via coefficients between zero and one? Or can some of the coefficients be set negative, if that produces a better fit?

  74. Stephen Parrish
    Posted Aug 15, 2010 at 6:35 PM | Permalink

    The more recent availability of Gavin and the less than steadfast support of Mann (I recall here someone even saying he was a known and discounted commodity in the community!) that produced so many posts in the last week or so seem well timed with this paper.

    Walk back?

  75. stephan
    Posted Aug 15, 2010 at 8:54 PM | Permalink

    SM, you are far too modest. A lot of this is due to you. Hopefully one day you will be nominated for the Nobel Prize for your work.

  76. PJP
    Posted Aug 16, 2010 at 8:38 AM | Permalink

    A very interesting paper. I just wish my statistics/mathematics was up to fully understanding it. However, the writing style makes it possible to follow even without that specialized knowledge – congratulations to the authors on achieving that!

    I foresee two outcomes directly affecting the AGW proponents, one negative (from their point of view) and one positive.

    Negatively, I think that this puts the final nail in the coffin of dendroclimatology. By taking what is supposed to be the preeminent data set exemplifying the hockey-stick effect and showing that when subject to rigorous analysis a temperature signal can not be discerned with any reasonable level of confidence, this particular track appears dead.

    Positively, much will be made of the claim that rigorous analysis of the data set shows that the last decade has an 80% probability of being the hottest ever (not really ever, but that is what they will say).

    However, this paper concentrated on the statistical analysis used to generate the hockey stick, not on the data, which is the correct way to proceed. Dissection of the data-set can proceed independently, probably using the techniques from this paper.

    • Ron Cram
      Posted Aug 16, 2010 at 8:46 AM | Permalink

      PJP,
      I agree. The authors had to analyze the data used by Mann in order to get it published in a statistics journal. But a follow-on paper should still focus on the statistics but could conceivably use the same data minus the strip bark trees and making the Tiljander series right side up. In other words, just making the changes the NAS has supported and fixing obvious errors. It would be interesting to see what they conclude.

    • two moon
      Posted Aug 16, 2010 at 10:30 AM | Permalink

      Yup. The point is not that one reconstruction is better than another, but that all reconstructions using these proxies are castles in the air.

  77. John Schadenfreude Archer
    Posted Aug 16, 2010 at 12:43 PM | Permalink

    Ha ha!

    — Nelson Muntz

  78. Bernie
    Posted Aug 16, 2010 at 12:46 PM | Permalink

    For those of you who do not frequent Bishop Hill’s admirable site, here is my take.

    The McShane and Wyner paper reminds me of the Maine (USA) joke relayed by Marshall Dodge of Bert and I fame. (The CDs are available at Amazon.)

    It goes something like this.

    A New York tourist in a big swanky convertible, pulls up in front of Bert, our hero, while he is relaxing in his rocking chair on his porch. (The sound effects of a powerful engine and screeching brakes add to the story.)
    The New Yorker abruptly asks “Which way to Millinocket?” (Lots of towns in Maine have names derived from the Indians – and many New Yorkers are nothing if not abrupt.)
    Bert, after thinking awhile, says to the tourist. “Let me see. Go west about 3 miles and take a left. Hold on. No, that not right. Let me see. Go east about 2 miles and take a left. No, that’s not right either. Come to think of it, you can’t get there from here.”

    Thus ends the HS.

  79. Bob Hamilton
    Posted Aug 16, 2010 at 1:18 PM | Permalink

    From the paper:
    “All data and code used in this paper are available at the Annals of Applied Statistics supplementary materials website: http://www.imstat.org/aoas/supplements/default.htm”

    Thank you, Steve. I think your efforts are bearing fruit.

  80. MikeN
    Posted Aug 16, 2010 at 1:20 PM | Permalink

    The new result still looks like a hockey stick.

    • Ron Cram
      Posted Aug 16, 2010 at 1:27 PM | Permalink

      No, it doesn’t. There is an uptick in the 20th century, as would be expected. But it is still lower temps than the MWP. That is the key issue. Today’s temps are not outside natural climate variation.

      • Mark F
        Posted Aug 16, 2010 at 1:45 PM | Permalink

        Which wouldn’t be a big deal without the acrobatics of the “team” in trying to suppress evidence of the MWP etc. etc. All well-documented, diarized and debunked. From the looks of things, it’s time to compile data on species evolution and extinction before history is again subject to rewrite by the “movement”. Sigh….

        • robert
          Posted Aug 16, 2010 at 2:08 PM | Permalink

          A conclusion of the paper is that the warmest 10-year period on record is 1997-2006, and they say this has an 80% probability…

        • stephen richards
          Posted Aug 16, 2010 at 2:30 PM | Permalink

          using the data supplied by the team.

        • SOI
          Posted Aug 16, 2010 at 2:38 PM | Permalink

          Robert,

          Actually no. It is not a “conclusion” of the paper that there is an 80% probability that the warmest 10-year period on record is 1997-2006. I really hope people (and that includes Gavin) stop saying this as it is disingenuous. What the paper says is that their model gives an 80% chance that 1997-2006 was the warmest in the past thousand years. They strongly caveat this finding by saying that this depends on:
          – perfect proxy data (no data quality issues)
          – linearity and stationarity of the relationship between temperature and proxies
          – proxies being able to capture sharp run-ups in temperature

          They say that the first two assumptions are “substantial” and that evidence shows that the third condition is not being met, giving false confidence to the probabilities. Add these uncertainties to the mix, and you have a probability well south of 80%.

        • Manfred
          Posted Aug 16, 2010 at 2:59 PM | Permalink

          “While this does seem alarming, we should temper our alarm somewhat by considering again Figure 15 and the fact that the proxies seem unable to capture the sharp run-up in temperature of the 1990s. That is, our posterior probabilities are based on derivatives from our model’s proxy-based reconstructions and we are comparing these derivatives to derivatives of the actual temperature series; insofar as the proxies cannot capture sharp run-ups, our model’s reconstructions will not be able to either and therefore will tend to understate the probability of such run-ups.”

          Their true result is that the Mannian and Ammannian results are wrong and that the proxies are pretty poor for drawing any conclusions about the past.

  81. Mike B
    Posted Aug 16, 2010 at 1:55 PM | Permalink

    Would be very curious to get comments from UC, Jean S., and Hu on how this approach compares to the calibration approach they’ve proposed.

    I’m not convinced this new paper formulates the problem better than they did, McShane and Wyner just got published in a higher visibility journal.

  82. Posted Aug 16, 2010 at 3:02 PM | Permalink

    From Deltoid:

    The funny thing is that this paper actually replicates Mann et al. 2008 without even noticing it…

    To partake in this dirty little secret, see their Figure 14 on page 30: the blue curve is wiggle-identical and practically a photocopy of Mann’s corresponding EIV NH land curve. As it should be. The higher (green) curve they canonize and which is shown above is the result of an error: they calibrate their proxies against hemispherical mean temperature, which is a poor measure of forced variability. The instrumental PC1 which the blue curve is based on, is a much better measure; its EOF contains the polar amplification effect. What it means is that high-latitude proxies, in order to be made representative for global temperatures, should be downweighted. The green curve fails to do this. Thus, high latitudes are overrepresented in this reconstruction, which is why the “shaft” is at such an angle, due to the Earth axis’s changing tilt effect on the latitudinal temperature dependence described in Kaufman et al. 2009.

    The authors have no way of detecting such an error as their RMSE goodness-of-fit seems to be also based around the hemispherical average…

    http://scienceblogs.com/deltoid/2010/08/a_new_hockey_stick_mcshane_and.php#comment-2729979

    And a similar point:

    About the Bayesian thingy, yes that looks interesting… actually the result is not so very different from the Mann curve, which is pointed out. BTW the differences between the curves in Figure 14 look suspiciously like the signature of the Earth axis tilt change over time, cf. Kaufman et al. I suspect this means something…
    http://shewonk.wordpress.com/2010/08/15/the-eternal-return/#comment-2758

  83. Posted Aug 16, 2010 at 3:39 PM | Permalink

    Author Bios:

    Blakeley B. McShane, B.S. Economics Summa Cum Laude, University of Pennsylvania (2003), B.A. Mathematics Summa Cum Laude, University of Pennsylvania (2003), M.A. Mathematics, University of Pennsylvania (2003), Studies in Philosophy, University of Oxford (2004-2005), M.A. Statistics, University of Pennsylvania (2010), Ph.D. Statistics, University of Pennsylvania (2010), Donald P. Jacobs Scholar; Assistant Professor of Marketing, Northwestern University (2010-Present)

    Abraham J. Wyner, B.S. Mathematics Magna Cum Laude, Yale University (1988), Ph.D. Statistics, Stanford University (1993), National Science Foundation Fellowship (1989-1991), Acting Assistant Professor of Statistics, Stanford University (1993-1995), National Science Foundation Post-Doctoral Fellowship in the Mathematical Sciences (1995-1998), Visiting Assistant Professor of Statistics, University of California at Berkeley (1995-1998), Assistant Professor of Statistics, University of Pennsylvania (1998-2005), Associate Professor of Statistics, University of Pennsylvania (2005-Present)

    It is always good to have these handy.

  84. Kenneth Fritsch
    Posted Aug 16, 2010 at 7:04 PM | Permalink

    Please note Figure 2 in the paper titled “Proxy-based reconstructions of hemispheric and global surface temperature variations over the past two millennia”, by Michael E. Mann, Zhihua Zhang, Malcolm K. Hughes, Raymond S. Bradley, Sonya K. Miller, Scott Rutherford, and Fenbiao Ni.

    In the graphs in this figure the reconstructions, including the non-dendro ones, appear not to match the sharp run-up to the 1990s in the instrumental period. The paper is linked below:

    Click to access MannetalPNAS08.pdf

    When the authors of McShane and Wyner 2010 show their models not matching the sharp run-up late in the instrumental period, should we be surprised? Is this a feature of Mann et al. 2008 that was sufficiently nuanced, and not discussed, that it went unnoticed until McShane and Wyner showed their model failures?

    • Kenneth Fritsch
      Posted Aug 17, 2010 at 10:01 AM | Permalink

      The SI to the paper linked in my post above is linked here:

      http://www.meteo.psu.edu/~mann/supplements/MultiproxyMeans07/

      Please note the corrections in the SI and the very frustrating (for me, that is) practice of hooking the CRU instrumental data onto the end of the reconstructions. It certainly draws attention away from the problems with divergence in the reconstructions, both dendro and non-dendro.

  85. Max_OK
    Posted Aug 16, 2010 at 8:03 PM | Permalink

    Posted Aug 16, 2010 at 2:38 PM | Permalink
    Robert,

    Actually no. It is not a “conclusion” of the paper that there is an 80% probability that the warmest 10-year period on record is 1997-2006……
    ====
    I’m not so sure. Read the Conclusions on page 37.

    • John M
      Posted Aug 16, 2010 at 8:25 PM | Permalink

      One would be wise to read all of Section 5.4.

  86. Posted Aug 16, 2010 at 8:23 PM | Permalink

    The global temperature record has long flat stretches (80 out of ~120 years). It is obvious that the best fit to this will be noise on a flat line. As is obvious, when M&S fit proxies and noise to the global record they find that noise is a better fit than the proxies. That should have been a warning to them but they charged ahead. A bit more at Eli’s.

    The proxies are affected by local temperature (and precip, etc). The local temperatures vary more than the global ones, thus M&S get proxy sensitivities that are much smaller than they should be and noise that is much larger, esp extrapolated out to the year dot.

    Some of the other procedures may be worthwhile once the calibration about global warming (CAGW) problem is fixed. As their Fig. 16 and 17 show, the end result will be a pretty hockeyish stick. Doing it right will reduce the error bands.

    • John M
      Posted Aug 16, 2010 at 8:29 PM | Permalink

      Figure 16 looks more like a boomerang than a hockey stick. Figure 17 is only a hockey stick because of the instrumental record, which isn’t confirmed by the proxy data.

    • Stan Plamer
      Posted Aug 16, 2010 at 8:50 PM | Permalink

      From the paper

      Climate scientists have greatly underestimated the uncertainty of proxy-based reconstructions and hence have been overconfident in their models …

      …Furthermore, even proxy-based models with approximately the same amount of reconstructive skill (Figures 11, 12 and 13) produce strikingly dissimilar historical backcasts; some of these look like hockey sticks but most do not (Figure 14)

      It appears to me that the above comment is typical of the attitude of overconfidence.

    • Posted Aug 16, 2010 at 8:52 PM | Permalink

      There is no global temperature, just an ensemble of local temps.

    • Bernie
      Posted Aug 16, 2010 at 9:25 PM | Permalink

      I don’t get it. Everything you say would already apply to the calibration of the proxies, whether it be against local or global temperatures. How come we are hearing these issues now rather than when Mann published his papers?
      What logical or mathematical basis do you have for suggesting that the error bands will get narrower? For sure it looks like they cannot get much wider, until of course we start looking at some of the proxies like BCPs, Yamal, Tiljander and Gaspe.

      • Steven Mosher
        Posted Aug 17, 2010 at 3:18 AM | Permalink

        Elis memory lost its teleconnection

      • John F. Pittman
        Posted Aug 17, 2010 at 6:34 AM | Permalink

        Re: Bernie (Aug 16 21:25), I agree. One of the problems that has been repeatedly pointed out here and in HSI is the different standards as applied to the Team versus skeptics, with explanations that they knew this after the fact. The example I like to use is when Briffa was quoted last year stating that paleos “are” working to solve divergence problems, versus the misleading graph indicating that they had solved the issue but just failed to mention it, except in other places. Perhaps our host would compare Eli’s to Mann’s MBH98 defense, where he would deny M&M and then admit it obliquely later, incorporate it later, or come up with a rule he didn’t follow as one that he actually did, though of course he didn’t. All of these are well documented, especially Mann in the HSI.

    • Tom C
      Posted Aug 16, 2010 at 10:07 PM | Permalink

      Please Eli, I outlined your role many posts back. I’ll reproduce it for your benefit:

      2) Eli Rabbett will chime in with a series of weird insults involving nicknames, animals, animals with nicknames, etc. It will be incomprehensible to everyone except him but the amen corner will be ecstatic.

      This is all you are qualified to do, so please stick to the script.

    • Steven Mosher
      Posted Aug 17, 2010 at 3:17 AM | Permalink

      I’m waiting for Eli to do it right. That’s the stock response to any criticism of a Mann paper, so it’s a fitting response to Eli’s criticism of M&S. Steve is not allowed to criticize Mann or Jones without doing his own; therefore, Eli is not allowed to criticize M&S without doing his own.

      See how silly that sounds.

      • John F. Pittman
        Posted Aug 17, 2010 at 6:51 AM | Permalink

        Re: Steven Mosher (Aug 17 03:17), I was just thinking about what Eli said. I think if you accept what he says, the conclusion makes the Mc&W10 paper pale in comparison. The problem is divergence, especially divergence in the polar region. For Eli to be correct, one would need to see the same 3+/- sigma in ring widths around the start or end of the MWP at the poles, or conclude that the data cannot be used. In other words, these “flat” periods are not just flat, they are non-stationary. And because of polar amplification and divergence, the area where we need the most accuracy and stationarity is where we have the least. I think the Deltoid comment has similar problems.

        The instrumental PC1 which the blue curve is based on, is a much better measure; its EOF contains the polar amplification effect.

        Doesn’t an EOF have to be demonstrated to be accepted? Has anyone demonstrated that the instrumental PC1 contains the polar amplification effect? And if it does, why would one need to down-weight high latitude proxies? It is as though the paper has two problems: if it doesn’t weight the polar proxies it is physically wrong, and if it does, it is physically wrong!?!

      • Posted Aug 17, 2010 at 6:58 AM | Permalink

        Re: Steven Mosher (Aug 17 03:17),
        nitpick, but please don’t pick up Eli’s bad habit of naming the paper M&S.

        It’s M&W. Or at least, McS&W.

    • bender
      Posted Aug 17, 2010 at 8:42 AM | Permalink

      Doing it right will reduce the error bands.

      One of the main points of this paper is that the more right you “do it” (estimate confidence by including key uncertainties), the WIDER the bands get. Which is the exact OPPOSITE of what Eli prays for.

    • Ryan O
      Posted Aug 17, 2010 at 8:46 AM | Permalink

      Do you actually believe any of what you say?

      • bender
        Posted Aug 17, 2010 at 8:52 AM | Permalink

        Maybe he believes it when he says it. But it seems that a lot of what he says is so reactionary that it is likely to not be entirely grounded in fact. I have pointed out two cases here.

    • Spence_UK
      Posted Aug 17, 2010 at 2:05 PM | Permalink

      Have I got this right?

      Did Eli Rabett just say – in essence – that the reason random proxies match well is that the instrumental record is not significantly different to trendless noise?

      Nah. The Eli Rabett I know would *never* claim that. Must be a fake.

  87. Max_OK
    Posted Aug 16, 2010 at 10:02 PM | Permalink

    Max_OK
    Posted Aug 16, 2010 at 8:03 PM | Permalink
    Posted Aug 16, 2010 at 2:38 PM | Permalink
    Robert,

    Actually no. It is not a “conclusion” of the paper that there is an 80% probability that the warmest 10-year period on record is 1997-2006……
    ====
    I’m not so sure. Read the Conclusions on page 37.

    John M
    Posted Aug 16, 2010 at 8:25 PM | Permalink
    One would be wise to read all of Section 5.4
    ====
    Well sure, but a conclusion is supposed to be a conclusion. I didn’t write it, they did.

    • John M
      Posted Aug 16, 2010 at 10:12 PM | Permalink

      Holy crap man.

      If you want a “conclusion”, read the friggin Conclusion (Section 6).

  88. Max_OK
    Posted Aug 17, 2010 at 12:28 AM | Permalink

    John M
    Posted Aug 16, 2010 at 10:12 PM | Permalink
    Holy crap man.

    If you want a “conclusion”, read the friggin Conclusion (Section 6).
    ——————-

    Section 6 Conclusions is the section I’m talking about.

    It starts with the following sentence:

    ‘On the one hand, we conclude unequivocally that the evidence for a “long-handled” hockey stick (where the shaft of the hockey stick extends to the year 1000 AD) is lacking in the data.’

    The paragraph ends with the following sentence, which I take to be the “on the other hand”:

    ‘Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand year temperature curves sampled from the posterior distribution of our model.’

    So it looks like they aren’t sure about the handle, but suspect there may be a blade.

    • Posted Aug 17, 2010 at 3:27 AM | Permalink

      Re: Max_OK (Aug 17 00:28),
      They say in the abstract and at the end of sec 5.4 that since the proxies don’t capture the late 20th century rise they could well have missed earlier similar events as well – a point made regularly by skeptics. And in the conclusions they say the long flat handle is a feature of the regression.

      My guess is that the “Nevertheless” sentence tagged on to the end of the paragraph may have been added later to satisfy an unhappy reviewer.

      • simpleseekeraftertruth
        Posted Aug 17, 2010 at 7:32 AM | Permalink

        The ‘nevertheless’ comment may be there to satisfy an unhappy reviewer; nevertheless, they specifically disown any validation of the data themselves – they were only concerned with the statistical analysis. Any ‘shape’, however it appeared in graphed results, would be a function of that data. If the paper passes peer review (and it might have) then the next focus has to be on the data quality, with all that that implies.

        • Posted Aug 17, 2010 at 12:31 PM | Permalink

          Re: simpleseekeraftertruth (Aug 17 07:32),
          I was wrong about that. McShane’s website clarifies that the version we are reading is BEFORE refereeing, even though it is hosted on the journal’s website (which is not usual journal practice). The paper has been accepted, but the final version will be different – it will be interesting to see in what way it is different.

        • simplesekeraftertruth
          Posted Aug 18, 2010 at 10:41 AM | Permalink

          I agree, it will be very interesting if the following is retained as both pertain to data, not statistics;

          From the conclusion: “The final point is particularly troublesome: since the data is not easily modeled by a simple autoregressive process it follows that the number of truly independent observations (i.e., the effective sample size) may be just too small for accurate reconstruction.”

          Ultimate sentence: “Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.”

    • SOI
      Posted Aug 17, 2010 at 7:46 AM | Permalink

      Max_OK

      The “80%” is not a conclusion. It was a calculation from a model that they acknowledge (and demonstrate) has significant weaknesses that create false confidence. As John M points out, the 80% is not in the conclusion (nor is it in the opening summary). It is disingenuous for anyone to claim that the 80% likelihood was a finding of the paper.

      • Max_OK
        Posted Aug 17, 2010 at 11:19 PM | Permalink

        True, the “80% ” is not in the conclusion section, but something similar is:

        “Nevertheless, the temperatures of the last few decades have been relatively warm compared to many of the thousand year temperature curves sampled from the posterior distribution of our model.”

        And there is no getting around that.

        • EJD
          Posted Aug 19, 2010 at 11:14 AM | Permalink

          Which is their model made from white noise…

    • geo
      Posted Aug 17, 2010 at 7:49 AM | Permalink

      The power of the hockey stick was always in the handle/shaft. If that disappears, then the AGWers can no longer airily dismiss natural variability as a contributor (note, this does not require it to be the *only* contributor) to modern warming.

      Few skeptics dismiss CO2 entirely. The real question is what percentage natural variability and UHI/land use changes play. The models, by assuming it is all CO2 up to now, drive their forward predictions likewise. If it turns out that recent past CO2 warming was overstated by assigning NV- and UHI-driven warming to CO2, then forward-predicted CO2 warming is overstated as well.

      These are multi-trillion dollar questions – even if amelioration is still required in a more realistic assessment of CO2’s contribution to warming, it could be much less expensive and taken over a longer time frame.

    • bender
      Posted Aug 17, 2010 at 8:38 AM | Permalink

      they aren’t sure about the handle, but suspect there may be a blade

      If there is an instrumental blade today that the proxies don’t capture, then there is a good chance there were similar blades in the past that were also not captured. Or did you not read the paper?

      • bender
        Posted Aug 17, 2010 at 8:45 AM | Permalink

        I see PaulM already pointed this out. But I note Max_OK hasn’t replied. This is your cue, Max_OK.

  89. Geoff Sherrington
    Posted Aug 17, 2010 at 3:11 AM | Permalink

    Eli Rabett

    In your hyperlink there is a statement about global temperatures “If you look at Tamino’s figure, for about 80 of ~120 years (M&S only go to 2000, there ain’t a lot of proxies that go to 2010), a flat line is about the best description of what happened. This covers the period from ~1880 – 1920 and ~ 1940 – 1980. In such a situation, random noise is the best description of the variation.”

    This is not correct. You have to explain, for example, why the hot year 1998 was hot. Remember that it showed hot globally, that it was hot on thermometers and that it was hot on satellite records. That’s not noise, that’s a mechanism at work. Having accepted a mechanism for 1998, which other years do you ascribe to mechanisms, before you finally get to noise? What is more, by using 1998 in your calibration period, you bias your hindcasts because you have no idea if the mechanism of 1998 was repeated in the pre-instrument reconstruction period, how often, when or to what degree.

    • Jim Crimmins
      Posted Aug 17, 2010 at 5:10 AM | Permalink

      Right, what Eli R says about flatness isn’t right, unless you believe that the proxies work better with auto-correlated temperature patterns (trends). If so, please explain the physics behind that. In other words, would the proxies have a higher correlation with an annual temperature series of (+1,-1,+1,-1) or (1,1,1,1)? Why? What are the physics behind that? It does make sense that the proxies should work better with *larger* temperature movements, as that should drown out the other confounding factors, but as we have seen during the late 20th century they don’t.

      In terms of the local vs global calibration issue, it would be interesting to see things done both ways.

    • bender
      Posted Aug 17, 2010 at 8:34 AM | Permalink

      a flat line is about the best description of what happened. In such a situation, random noise is the best description of the variation

      What he should have said is that in such a situation a *stationary process* is the best description of the variation.

      In fact, random noise is actually the WORST possible stationary model to describe the variation! The amount of variation described by this model, in the long run, is ZERO. Hard to imagine how you could invent a worse model.

  90. AdderW
    Posted Aug 17, 2010 at 4:40 AM | Permalink

    Is anyone actually going to post an analysis of the paper here??

    • Skip Smith
      Posted Aug 17, 2010 at 6:05 AM | Permalink

      Maybe as soon as someone shows up who is actually capable of understanding the paper. Most of what we’re getting right now is the usual partisan spin from both sides.

      • Kenneth Fritsch
        Posted Aug 17, 2010 at 10:07 AM | Permalink

        I would disagree. There will always be partisan comments, but we have had some thoughtful comments that come down on both sides of the reconstruction controversy and some putting others’ comments into context. At some point readers here have to draw their own conclusions.

        Of course, I am biased in favor of doing paper reviews and comments on blogs such as CA.

  91. Faustino
    Posted Aug 17, 2010 at 5:00 AM | Permalink

    I mentioned above that former Australian Statistician Ian Castles, who took on the IPCC several years ago, died two weeks ago. Here’s a link to a paper on Castles’ stoush with the IPCC, which will not surprise readers of CA but shows that their malpractice extends beyond what is discussed here.

    http://www.lavoisier.com.au/articles/climate-policy/economics/cas

  92. Joe Born
    Posted Aug 17, 2010 at 6:04 AM | Permalink

    Does any one else have trouble understanding what the authors mean by “ten repetitions of five-fold cross validation,” which they use to find their Lasso lambda parameter?

    • Jonathan Baxter
      Posted Aug 17, 2010 at 7:50 AM | Permalink

      I believe they mean they selected 10 values of lambda, tested each with 5-fold cross-validation (ie 5 disjoint 80/20 splits of the data, train on the 80, test on the 20), and kept the value of lambda that yielded the lowest test error.

      • Joe Born
        Posted Aug 17, 2010 at 10:40 AM | Permalink

        Thank you for the response. That certainly sounds right as to the “five-fold” part. The “ten repetitions” part sounds plausible, too, although if that’s what they meant it seems slightly odd that they didn’t give the list of (somewhat arbitrarily selected) lambdas from among which that procedure chose.

        • Jonathan Baxter
          Posted Aug 17, 2010 at 3:35 PM | Permalink

          They should have mentioned the lambda values and also shown the impact of the choice of lambda on test error. That may be in the supplementary material. Typically the test error is not particularly sensitive to the choice of lambda: e.g., if the best value is 1 then 1/2 or 2 do just as well.
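          For what it is worth, another common reading of “ten repetitions of five-fold cross validation” is that the five folds are redrawn at random ten times and the cross-validation error curve is averaged over the repetitions before choosing lambda. A sketch of that reading in R, assuming glmnet and a hypothetical proxy matrix x with temperature vector y:

            library(glmnet)

            grid <- 10^seq(1, -3, length.out = 100)  # illustrative grid of lambdas

            # Ten random redraws of the five folds; $cvm is the CV error per lambda
            cvm <- replicate(10, cv.glmnet(x, y, alpha = 1, nfolds = 5,
                                           lambda = grid)$cvm)

            lambda_best <- grid[which.min(rowMeans(cvm))]  # minimize averaged error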

  93. Hu McCulloch
    Posted Aug 17, 2010 at 3:38 PM | Permalink

    McShane and Wyner (MW 2010) do a good job of showing that the proxies used in Mann et al 2008 (M08) do little better at explaining the instrumental record than random noise. However, I don’t think their paper should be held up as an example of how proxies should be calibrated to temperature.

    First, their basic regression is one in which temperature is the dependent variable and the proxies (or proxy PCs) are taken as independent variables. This would be appropriate if treerings etc were the dog that caused global temperatures to wag, or if one had a strong prior that the calibration temperatures represent the distribution from which the temperatures to be reconstructed are drawn.

    But if temperatures are the dog that causes proxies to wag, and/or if one does not wish to presuppose that pre-instrumental temperatures look a lot like the instrumental ones, it is instead appropriate to use what CA’s UC calls “Classical Calibration Estimation,” or CCE. In the univariate case, this amounts to regressing the proxy on temperature and then inverting the OLS estimates to obtain the ML estimate of temperature given the proxy. The MW opposite procedure is what UC calls Inverse Calibration Estimation, or ICE, since it runs the regression backwards from the natural way.

    Computing confidence intervals for the CCE estimates, and merging multiple proxies into one estimate is a complicated issue that I have written on at: http://econ.ohio-state.edu/jhm/AGW/Thompson6/Thompson6Calib.pdf , and don’t wish to go into here.

    However, in the single-proxy case, ICE point estimates are just attenuated versions of the CCE estimates, by a factor of R^2. In the multi-proxy case, one would expect the ICE estimates to be similarly attenuated versions of the Brown multi-proxy CCE estimator. This over-attenuation of ICE estimates may in part explain why MW find that the proxies have a hard time explaining the recent 30-year runup of instrumental temperatures. It also means that even their backcast model (Fig. 16) may actually have too little amplitude.

    Second, I am not familiar with the “Lasso” procedure for dealing with the situation of more proxies than observations and so am wary of it. I am more comfortable with PCA as a method of dimensionality reduction. Even there, however, there should be a careful sequence of data-mining-adjusted t and/or F tests to determine how many PCs should be retained — just applying stepwise regression to the first 20 PCs or whatever sounds like data mining to me.

    And third, I don’t see that the “Bayesian” model of section 5 adds anything to a naive ICE regression of temperature on the proxies, and may even raise additional questions. MW’s essentially uniform prior on beta just gives OLS ICE point estimates back again as the posterior mean, and so makes no difference. A potential problem, however, is that MW’s essentially uniform prior on sigma favors higher values of sigma than the standard uniform prior for log(sigma) (advocated e.g. by Zellner, Intro to Bayesian Econometrics 1971). With a sufficiently large sample, the prior eventually gets dominated by the data, but with a limited sample, this can make the CI’s too wide. The standard uniform prior on log(sigma), on the other hand, will just give the standard Student t distribution back as the posterior distribution. I only use a Bayesian approach in my paper because I can’t figure out how to get CI’s otherwise.

    Of course, the big problem with M08 is their proxy set, which is dominated by spuriously HS series like Tiljander. MW just take this set as given (after reducing three highly correlated Tiljander series to just one to avoid computational problems), but find plenty of purely statistical problems even without questioning the data series themselves.
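    The univariate attenuation Hu describes is easy to check numerically. A minimal sketch in R with simulated data (not the Mann08 proxies): the product of the two fitted slopes equals R^2 exactly, so the ICE point estimate is the CCE estimate shrunk by a factor of R^2.

      set.seed(3)
      temp  <- rnorm(200)               # "true" temperature
      proxy <- 0.8 * temp + rnorm(200)  # proxy = signal plus noise

      ice <- lm(temp ~ proxy)  # ICE: regress temperature on the proxy
      cce <- lm(proxy ~ temp)  # CCE: regress proxy on temperature, then invert

      b_ice <- coef(ice)["proxy"]     # ICE slope
      b_cce <- 1 / coef(cce)["temp"]  # inverted CCE slope

      b_ice / b_cce       # equals R^2: the ICE estimate is attenuated
      cor(temp, proxy)^2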

    • Kenneth Fritsch
      Posted Aug 17, 2010 at 4:59 PM | Permalink

      “..just applying stepwise regression to the first 20 PCs or whatever sounds like data mining to me.”

      Shouldn’t that be data snooping?

      • Hu McCulloch
        Posted Sep 14, 2010 at 12:05 PM | Permalink

        I tend to lump the two together, but I see that there is a big literature on “data mining” as contrasted with “data snooping”. Hopefully the former isn’t just a high tech, white collar crime version of the latter!

    • Mike B
      Posted Aug 17, 2010 at 9:40 PM | Permalink

      Thanks Hu, for walking us through CCE and ICE again. It’s unfortunate, but to get published in these statistical journals I guess something has to be done that is novel methodologically, even if it isn’t optimal.

      But like you stated, at the end of the day, MW found some serious deficiencies in the approaches used by the climatologists without even touching on the most glaring errors.

      • DanQ
        Posted Aug 30, 2010 at 11:33 AM | Permalink

        You “guess something has to be done that is novel methodologically, even if it isn’t optimal”

        Please, we have had enough of guesswork in this field. The Stat. journals are open to your perusal if you so desire.

    • Carl Gullans
      Posted Aug 18, 2010 at 7:43 AM | Permalink

      Hu, you should (if you haven’t already) e-mail this to McShane and Wyner, since I believe their paper apparently hasn’t been finalized yet (although it will be published soon in some form). Perhaps they could address these comments in some way, if it isn’t too late.

    • UC
      Posted Aug 21, 2010 at 5:13 AM | Permalink

      But if temperatures are the dog that causes proxies to wag, and/or if one does not wish to presuppose that pre-instrumental temperatures look a lot like the instrumental ones, it is instead appropriate to use what CA’s UC calls “Classical Calibration Estimation,” or CCE.

      I think I got “CCE” from Williams 69, also called indirect regression (Sundberg 99), inverse regression (Juckes 06), controlled calibration. ICE is from Krutchkoff 67/Williams 69, direct regression (Sundberg 99), natural calibration.

      Multivariate Calibration

      Williams 69: Regression methods in calibration problems. Bull. ISI., 43, 17-28

      Krutchkoff 67: Classical and inverse regression methods of calibration. Technometrics, 9, 425-439

      Sundberg 99: Multivariate Calibration – Direct and Indirect Regression Methodology
      ( http://www.math.su.se/~rolfs/Publications.html )

      Juckes 06: Millennial temperature reconstruction intercomparison and evaluation

      ( http://www.cosis.net/members/journals/df/article.php?a_id=4661 )

  94. QBeamus
    Posted Aug 17, 2010 at 3:49 PM | Permalink

    I’ve been thinking about the assertion that random, non-climate related numbers are as good or better at predicting – more specifically, what it would mean if random numbers were actually more predictive. Would this imply that the proxies were anti-correlated with post-98 temperatures, at least within the existing models? More pointedly, would this suggest that the models were designed for a purpose inconsistent with predicting temperatures? Or would this simply suggest a complete lack of correlation, combined with a bit of bad luck to have lost, in effect, a coin toss against McShane’s random series?

    • Bernie
      Posted Aug 18, 2010 at 6:44 AM | Permalink

      Hu above has the best explanation. Beyond that, a random walk is a random walk.

      • QBeamus
        Posted Aug 18, 2010 at 1:08 PM | Permalink

        Thanks, but I’m afraid I don’t follow your point. Perhaps I should try to express my question differently.

        How is it even possible for something to be less predictive than an unrelated random number series, which, I presume, has, by definition, zero predictive power. I don’t understand the concept of negative predictive power. A stopped clock is right twice a day, but is the least useful measure of time that exists (tied for worst with random numbers). A clock that’s 12 hours off (or six hours, in the U.S.) is the most “wrong” it’s possible to be, but is actually an outstanding way to tell time, once you detect the anti-correlation (or, if you prefer, properly calibrate).

          So was McShane just being verbally sloppy when he identified a possibility that the proxies might be less predictive than random numbers, or is there some technical meaning that I don’t appreciate?

        • Bernie
          Posted Aug 18, 2010 at 2:59 PM | Permalink

          QBeamus:
          Essentially they are saying that there is close to zero correlation between the metrics derived from the proxies and the temperature record, which is exactly what you would expect from a run of random numbers. The MW research is calling into question the supposed link between the proxies and temperature over a long period. A broken clock would also have a very poor correlation with the correct time.

        • mpaul
          Posted Aug 18, 2010 at 3:13 PM | Permalink

          McShane says ‘less predictive’; this is not the same as having negative predictive power. Random noise will result in the model having some positive non-zero predictive power. Of course this is a bit of an artificial concept, but it is a lot like saying that the probability that a single monkey could randomly type characters in a pattern that exactly matches Hamlet is >0. Random noise results in the model having a predictive power of ‘x’, where x is greater than zero. Mann’s proxies result in the model having predictive power ‘y’ where y>0. x>y, that’s all.

        • Neil Fisher
          Posted Aug 18, 2010 at 7:45 PM | Permalink

          Re: mpaul (Aug 18 15:13),
          Assuming that a suite of random series were tested and not just one (ie, this is a “robust” result), what does this mean? Does it mean that:

          * these proxies cannot be used to make even vague predictions; or

          * that if we consider a different model, we may be able to extract predictive power from these proxies; or

          * that the “signal” is well in the “noise” and at this scale, we cannot extract anything meaningful (yet).

          It seems to me that any significant difference from the random series indicates that there is some data that we can extract about climate, but we just don’t have the right model (yet).

        • pete
          Posted Aug 18, 2010 at 8:19 PM | Permalink

          Yes, they use a suite of random series (the grey dots in Figure 10).

          It seems to me that any significant difference from the random series indicates that there is some data that we can extract about climate, but we just don’t have the right model (yet).

          Bingo. Although “we” in this context should read “M&W”.

        • Neil Fisher
          Posted Aug 19, 2010 at 5:34 PM | Permalink

          Re: pete (Aug 18 20:19),

          Bingo. Although “we” in this context should read “M&W”.

          Given the lack of predictive power, I’d suggest “we” means everyone!

        • pete
          Posted Aug 19, 2010 at 6:58 PM | Permalink

          The lack of predictive power is specific to the lasso. It doesn’t generalise to other methods.

        • QBeamus
          Posted Aug 24, 2010 at 1:18 PM | Permalink

          I believe that is mistaken. M&W find no model that has meaningful predictive power.

        • Stan Plamer
          Posted Aug 24, 2010 at 1:30 PM | Permalink

          M&W test a variety of model-building techniques and a) find that they all have about the same ability to predict the data in the holdout period and b) produce quite different backcasts.

        • QBeamus
          Posted Aug 19, 2010 at 3:54 PM | Permalink

          Firstly, thank you – this is helpful. I wanted to get that in because I don’t want my follow-up to seem like tedious obtuseness. (Well, it may be tedious, but at least it’s not willful.)

          So it’s my premise that’s false: “random” doesn’t mean “zero predictive power.” That leads me to believe that “predictive power” doesn’t mean the same thing to physicists (which I am) and statisticians (which I am not). Yes, I would expect a series of random numbers to have some non-zero correlation with any measured variable, say, global temperature means. But that result follows from the density of numbers, more than anything – another way of making your monkey-and-typewriter point. But it should have zero predictive power, because that non-zero correlation is (by definition) false correlation. Any resemblance to past observations will vanish during the next round of observations.

          Unless, I suppose, there is auto-correlation in both the independent variable (plausible) and the random series. And perhaps there’s the point of confusion, because while I can understand the concept of a “random series” that is auto-correlated, it’s not what I’d normally think of or mean by that term, but perhaps mine is an unrealistically narrow definition.

        • Alan D McIntire
          Posted Aug 25, 2010 at 1:38 PM | Permalink

          QBeamus – I think you’re right: there’s either auto-correlation in both the independent and random series, or else both are non-stationary. When I saw that Figure 4 graph in the MW paper, I was astonished. I generated a few non-stationary random walks of about 25 values, ran correlations, and got unbelievably high numbers.
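          Alan’s experiment is easy to reproduce. A short sketch in R: two *independent* random walks routinely show correlations that would look significant if the series were treated as independent draws.

            set.seed(4)

            # Correlation of two independent length-25 random walks, 1000 times
            r <- replicate(1000, cor(cumsum(rnorm(25)), cumsum(rnorm(25))))

            mean(abs(r) > 0.5)  # a sizable fraction of spuriously "strong" values
            hist(r, breaks = 40)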

  95. Max_OK
    Posted Aug 17, 2010 at 11:43 PM | Permalink

    bender
    Posted Aug 17, 2010 at 8:38 AM | Permalink
    they aren’t sure about the handle, but suspect there may be a blade

    If there is an instrumental blade today that the proxies don’t capture, then there is a good chance there were similar blades in the past that were also not captured. Or did you not read the paper?

    bender
    Posted Aug 17, 2010 at 8:45 AM | Permalink
    I see PaulM already pointed this out. But I note Max_OK hasn’t replied. This is your cue, Max_OK.

    —–
    A good chance means “maybe”, not “for sure.” That’s why I said they aren’t sure about the handle.

    • sleeper
      Posted Aug 18, 2010 at 5:14 AM | Permalink

      A good chance means “maybe”, not “for sure.” That’s why I said they aren’t sure about the handle.

      Unlike some people, M&W have an appreciation for uncertainty.

  96. simplesekeraftertruth
    Posted Aug 18, 2010 at 12:02 PM | Permalink

    McShane & Wyner are not too happy with the data either?

    Note 12: “Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors ….”

    And from the conclusion: “The final point is particularly troublesome: since the data is not easily modeled by a simple autoregressive process it follows that the number of truly independent observations (i.e., the effective sample size) may be just too small for accurate reconstruction.”

    Plus ultimate sentence: “Although we assume the reliability of their data for our purposes here, there still remains a considerable number of outstanding questions that can only be answered with a free and open inquiry and a great deal of replication.”

  97. pete
    Posted Aug 18, 2010 at 8:23 PM | Permalink

    Blatant error in Figure 17: M&W compare annual errors from their backcast to multi-decadal errors in the Mann08 reconstruction. Of course they’re bigger!

    • Patrick Hadley
      Posted Aug 19, 2010 at 12:32 PM | Permalink

      Can anyone explain why it was valid for Mann08 to smooth the errors into a multi-decadal melange? Of course the error bars for a multi-decadal average are going to be narrower than annual, but why should annual proxy information give a 40 year smoothed graph?
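      The narrowing itself is straightforward arithmetic, sketched below in R with illustrative numbers only. If annual errors with standard deviation sigma were independent, a 40-year mean would have standard error sigma/sqrt(40); autocorrelation shrinks the effective sample size and widens this again, which is part of M&W’s complaint.

        sigma <- 1  # annual reconstruction error s.d. (illustrative units)

        sigma / sqrt(40)  # s.e. of a 40-year mean if errors were independent

        # With AR(1) autocorrelation rho, the effective sample size is smaller:
        rho   <- 0.5  # assumed lag-1 autocorrelation, purely for illustration
        n_eff <- 40 * (1 - rho) / (1 + rho)
        sigma / sqrt(n_eff)  # noticeably wider than the naive value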

  98. Salamano
    Posted Aug 19, 2010 at 12:06 PM | Permalink

    Here comes the meat..!

    http://climateprogress.org/2010/08/19/i-went-to-a-fight-and-a-hockey-stick-broke-out/#comment-291966

    So…what would the statisticians in the room say to these comments issued by DeepClimate?

  99. klee12
    Posted Aug 19, 2010 at 8:41 PM | Permalink

    There is a discussion of the McShane and Wyner 2010 paper at

    http://klimazwiebel.blogspot.com/2010/08/mcshane-and-wyner-on-climate.html

    Would anyone like to comment on that discussion?

    klee12

  100. Posted Aug 22, 2010 at 8:17 AM | Permalink

    On Aug. 20, RealClimate.org published some brief remarks on M&W10 in “Doing it yourselves”. Intriguingly, the RC authors chose to focus on the Tiljander proxies, showing M&W10-style “Lasso” reconstructions that included and excluded the Lake Korttajarvi varve series data.

    The RC figure is based on M&W10’s Figure 14. As best I can tell, the three traces in Fig. 14 are built upon all of the proxy series in Mann08 — not only on the Non-Dendro proxies.

    If so, the RealClimate figure is not informative.

    I submitted a comment on these issues to Realclimate.org last night, when the thread’s count was up to #41. So far this morning, 10 new comments have been released from moderation in three batches, but mine is not among them. I’ve made a copy into a post at my blog, “A comment on M+W10 submitted to RealClimate.org”.

    • Layman Lurker
      Posted Aug 22, 2010 at 12:28 PM | Permalink

      Thanks for the update AMac. On one hand, RC mocks people “obsessing” about Tiljander, and on the other they throw up a meaningless strawman graph and nix your comment. IMO not real smart when still in the aftermath of Gavin’s admission, Mann’s SI, and recent blog discussions.

      Mosh’s 2+2=4 post now needs to be updated.

    • Posted Aug 22, 2010 at 2:25 PM | Permalink

      Re: AMac (Aug 22 08:17),

      Within the hour, my comment passed moderation, and was slotted into position #42 (the comment count is currently at 60). Not surprisingly, Gavin Schmidt has provided inline commentary and rebuttal.

  101. Stan Plamer
    Posted Aug 22, 2010 at 11:59 AM | Permalink

    M&W point out that they tried multiple model-building methods and all of them produced results that were just about as good as each other. These models then produced quite dissimilar reconstructions. They identified the issue as the brief period available for calibration.

    Would it be possible to improve their method by selecting techniques which build good models against regional proxies for areas in which the temperature record is known qualitatively? So if the proxies in western and northern Europe are used to build models, and some of these models produce reconstructions that yield the MWP and the LIA, would this not provide support for the reconstructions produced by this method for the entire northern hemisphere? Conversely, if the proxies cannot produce a MWP for Europe, which is known from historical records, then would this not indicate a failure in the proxies?

  102. Posted Aug 24, 2010 at 3:53 PM | Permalink

    Eduardo Zorita has a number of criticisms of the McShane and Wyner draft here: http://klimazwiebel.blogspot.com/2010/08/mcshane-and-wyner-on-climate.html. Zorita thinks they need some professional climatology input. There are some interesting comments at Klimazwiebel. I hope some of his criticisms will be addressed when the paper is actually published.

  103. Posted Nov 2, 2011 at 1:43 AM | Permalink

    I really don’t know why linear trends are so often assumed when curved ones often fit better.

    Trenberth himself plotted this nice curve of sea surface temperatures, now declining even more when recent data is added. See top of page http://climate-change-theory.com

    By the way, does anyone know why NASA stopped adding sea surface data after October 3rd at http://discover.itsc.uah.edu/amsutemps/

    Steve: satellite failure, I think.

13 Trackbacks

  1. […] submitted into the Annals of Applied Statistics and is listed to be published in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious […]

  2. […] submitted into the Annals of Applied Statistics and is listed to be published in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to […]

  3. […] McShane and Wyner 2010 […]

  4. By The Eternal Return « The Policy Lass on Aug 15, 2010 at 11:34 AM

    […] submitted into the Annals of Applied Statistics and is listed to be published in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to […]

  5. […] which track quite closely the methods applied most recently in Mann (2008) to the same data, are unable to catch the sharp run up in temperatures recorded in the 1990s, even […]

  6. By Top Posts — WordPress.com on Aug 15, 2010 at 7:06 PM

    […] McShane and Wyner 2010 A reader (h/t ACT) draws attention to an important study on proxy reconstructions (McShane and Wyner 2010) in the […] […]

  7. […] submitted into the Annals of Applied Statistics and is listed to be published in the next issue. According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to […]

  8. […] reported over at Climate Audit, an important study on proxy reconstructions (McShane and Wyner 2010) is to be published in the […]

  9. […] in lots and lots—of folks wrote in and asked me to review the McShane and Wyner paper. […]

  10. […] The paper has been accepted, but publication is still a bit into the future as it is likely to be accompanied by invited discussants and comment. — Abraham Wyner, comment at Climate Audit […]

  11. […] Air Vent: MW10 – Some thoughts Climate Audit: McShane and Wyner 2010 William M. Briggs: The McShane and Wyner Gordie Howe Treatment Of Mann Deep Climate: McShane and […]

  12. […] proxy reconstructions (McShane and Wyner 2010) published in the Annals of Applied Statistics  According to Steve McIntyre, this is one of the “top statistical journals”. This paper is a direct and serious rebuttal to […]

  13. […] Statistics to rebut criticism of their previous work by McShane et al (discussed on CA starting here) used pseudo-proxies generated from modeled temperature series with various forms or noise. The […]