Replicating McShane and Wyner

The R blogger “mind of a Markov chain” has replicated portions of the M&W work.

They write:

There are a bunch of “hockey sticks” that estimate past global temperatures from proxies when instrumental data are absent.

There is a new one out there by McShane and Wyner (2010) that’s creating quite a stir in the blogosphere (here, here, here, here). The main takeaway is that the uncertainty is too great for the proxies to be any good.

Here’s an output from the replication:

More, including R code, here:


  1. Edouard
    Posted Aug 23, 2010 at 11:20 PM | Permalink

    Have you seen this?

    “In summary, admittedly climate scientist have produced in the past bad papers for not consulting professional statisticians. The McShane and Wyner paper is an example of the reverse situation. What we need is an open and honest collaboration between both groups.” (Eduardo Zorita)

    • RomanM
      Posted Aug 24, 2010 at 7:15 AM | Permalink

      In summary, admittedly climate scientist have produced in the past bad papers for not consulting professional statisticians. The McShane and Wyner paper is an example of the reverse situation.

      I don’t think that this criticism is completely justified.

      Yes, statisticians would be foolish to go out and collect the proxy data themselves without the guidance of competent paleoclimate scientists. For this, experience and background scientific knowledge will be essential. However, McShane and Wyner did not do this. They accepted the Mann et al proxy set as is. Although the presence of questionable proxies could seriously affect the validity of their own reconstruction, it does not directly invalidate the earlier results in this paper.

      Once the proxy set has been determined, the problem shifts almost completely into the statistical forum. The choice of statistical procedures requires a deeper understanding of how these procedures work and whether the proxies satisfy the assumptions needed for the application of the procedures. Only the latter part of this may require input from the climate scientist. A good statistician would know which questions to ask in order to make the decisions regarding the analysis of the data.

      In order to criticize the paper, one would need to show that either the procedure selections made by M and W were incapable of extracting the necessary information from the data (requiring a full understanding of how the mathematics in those procedures works) or the proxy data did not sufficiently satisfy the necessary assumptions for the procedures to be applied in this specific case (requiring sufficient knowledge of the properties of the proxies themselves).

      Dr. Zorita’s evaluation can be summarized in his statement

      They test and criticize a statistical method which, to my knowledge, has not been used for climate reconstructions, and in contrasts they barely mention the methods that are indeed used. If they had analyzed the latter methods, climatologist would have benefited much more strongly from this study.

      It appears that much of his criticism is that they didn’t write the paper that he would have liked to have seen written rather than analyzing the content of the one that they actually wrote.

      • bill
        Posted Aug 28, 2010 at 8:59 AM | Permalink

        He sure seems to be saying that there isn’t much point in commenting on the content of their work, since they have the wrong content. From his point of view, why should he comment on methods that should not have been used?

        • RomanM
          Posted Aug 28, 2010 at 11:29 AM | Permalink

          Re: bill (Aug 28 08:59),

          In my book, “has not been used” and “should not have been used” do not have the same meaning. New ways of looking at things can often provide better insight than arguing the old methods.

        • bill
          Posted Aug 28, 2010 at 5:34 PM | Permalink

          RomanM, I certainly agree. However, what you are describing sounds like a paper all by itself. Unless I missed something, this was not a paper that proposed to introduce a new method and compared and contrasted this method with other methods, arguing why this method should be the new norm. If I’m wrong, I’ll certainly admit that that is the case.

          On this blog, Steve has had many posts arguing against the way Mann introduces new methods without spending an appropriate amount of time explaining why the new method is better than any other method. If you are arguing that their goal was to introduce a new method, then it seems like they went about it the wrong way.

        • RomanM
          Posted Aug 28, 2010 at 7:22 PM | Permalink

          This is a lengthy manuscript and it has several different aspects to it. The authors state:

          In this paper, we assess the reliability of such reconstructions and their statistical significance against various null models.

          We propose our own reconstruction of Northern Hemisphere average annual land temperature over the last millennium, assess its reliability, and compare it to those from the climate science literature.

          I have not had the time to look at the results in sufficient detail to pass judgment on the appropriateness of their approach, but, despite Steve’s (and my) aversion to Mann’s proclivity to devise new methods, would you not think that statisticians writing in a statistics journal might not be held to quite the same limits? From what I have seen it is not as much “new” methodology as opposed to previously developed statistical methods being applied in a new environment.

          I think it will be interesting to see what commentary will appear with the article and who that commentary will come from.

    • MikeC
      Posted Aug 25, 2010 at 4:31 PM | Permalink

      With the inclusion of Mann’s gi-normous error bars, there is nothing wrong with his reconstruction. When scaled, the GISP cores, the MW reconstruction… and probably someone going out and guessing the temperature every day… match perfectly.
      The problem with the Mann reconstruction is that, lately, the error bars keep disappearing.

  2. Edouard
    Posted Aug 23, 2010 at 11:21 PM | Permalink

    Hmm, sorry, the link is there 😉

  3. Venter
    Posted Aug 24, 2010 at 3:10 AM | Permalink

    Except that the paper was not about climate science. It was about statistics. The authors used their speciality, statistics, to show that Mann’s own data shows nothing like a hockey stick and does not stand up to proper statistical analysis.

    Neither Mann nor any of the climate scientists are statisticians. Mann used dubious statistics to arrive at his hockey stick. Now, professionals have stepped in to show that his work has no worth.

  4. Kilted mushroom
    Posted Aug 24, 2010 at 4:43 AM | Permalink

    Edouard, then Venter, and the whole post is done in a nutshell.

  5. stephen richards
    Posted Aug 24, 2010 at 5:46 AM | Permalink

    The Markov analysis is very straightforward and ‘honest’. As he points out, he is doing the analysis for himself and publishing it for others. But OH OH OH, TCO has to pipe in to ask AMac whose side he is on.

    Good analysis, clear and concise. Well worth the time taken to read.

  6. Geoff Sherrington
    Posted Aug 24, 2010 at 6:07 AM | Permalink

    There is that worrying period from about 1945 to 1970 when some versions of NH temp show a steep slope down while others are almost level. This is not new. See the discussion by Frank Lansner in
    under the heading “Comparing Hansen’s graphs over time.”

    The question is whether the error spread shown in the associated graph forms part of the error estimates in M&W10. One alternative is that the differences have an adequate physical/mathematical explanation; another explanation is that numbers can be thrown around without due respect for the original versions; and a third conclusion is that the errors are even worse than M&W10 calculate.

    It seems probable that the M&W10 conclusions would be strengthened if there was indeed a sharp drop in this 1945-70 interval, as shown in early versions with least “adjustment”.

    Finally, if the instrumental temperature record cannot be extracted from the noise, how can proxy temperature experiments be calibrated? Should we now expect a rash of corrigenda and revised papers about proxies?

  7. Kenneth Fritsch
    Posted Aug 24, 2010 at 10:54 AM | Permalink

    Not a scientific view, but I see in some of the reviews of MW 2010 what I have noticed before in these situations. There would appear to be some ownership issues, as in who was first to criticize Mann reconstructions and what the “correct” criticism of these reconstructions is, and finally some circling of the wagons in the climate science community, in that criticism is better coming from the inside than the outside.

    Some reviews tend to get into points of contention that have little to do with the main thesis of the paper and the points that the authors are attempting to make.

    We certainly know that Mann et al. in one of the reconstruction papers used 4 PCs and needed the fourth to obtain the hockey stick.

    From the link, we have:

    “The report found that MBH method creates a PC1 statistic dominated by bristlecone and foxtail pine tree ring series (closely related species). However there is evidence in the literature, that the use of the bristlecone pine series as a temperature proxy may not be valid (suppressing “warm period” in the hockey stick handle); and that bristlecones do exhibit CO2-fertilized growth over the last 150 years (enhancing warming in the hockey stick blade).”

    At least as far as I recollect MW 2010 did not do any detrending.

    • Dave Dardinger
      Posted Aug 24, 2010 at 6:54 PM | Permalink

      Re: Kenneth Fritsch (Aug 24 10:54),

      and that bristlecones do exhibit CO2-fertilized growth over the last 150 years

      Except that that doesn’t seem to be the case. The problem with the Bristlecones is that the ones collected were largely ones with strip bark and the strip bark has different growth properties than regular tree growth. Idso thought it would have CO2 fertilized growth enhancement, but it’s never proven out. Of course, in either case bristlecones are still not usable as a temperature proxy.

  8. robert
    Posted Aug 24, 2010 at 12:48 PM | Permalink

    M and W will see comments and we will undoubtedly see whether their analysis is correct or not afterwards. I do find it dubious that they calibrate against global temperatures instead of local temperatures. Anyone with any sense would know that the proxies were chosen because they have reasonably good relationships with local grid-cell temperatures, not global ones. You can’t take proxies that were meant to be indicative of local temperature and say that because they don’t predict global temperature well they are useless.

    • Dave Andrews
      Posted Aug 24, 2010 at 2:21 PM | Permalink

      Surely the whole point of MBH98/99 was that they used BCPs as proxies for global temperatures!

    • Steven Mosher
      Posted Aug 24, 2010 at 6:28 PM | Permalink

      the target for calibration is not as well defined as you imagine

      people try… seasonal temps, annual temps, in the local grid, pick two grids, hemisphere. when that fails they have even been known to do their own temp series

      it’s a treasure hunt

      • EdeF
        Posted Aug 25, 2010 at 9:04 AM | Permalink

        We could use a thread on that topic… pick 5-10 proxies and show their correlation to the local temperature during the calibration period, as reported in the literature. People would be surprised.

        • AMac
          Posted Aug 25, 2010 at 10:43 AM | Permalink

          Re: EdeF (Aug 25 09:04),

          I graphed “Proxy vs. temperature anomaly” (CRUTEM3v gridcell 1850-1995) for the four Tiljander data series used in Mann08, here.

          This is “for what it is worth,” as these are “nonsense correlations” per the terminology of G.U. Yule (1926, PDF).
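          Yule’s point is easy to reproduce numerically: two independent random walks, which share no causal connection, show large sample correlations far more often than independent white noise does. A minimal sketch with purely synthetic series (no real proxy data involved, and all numbers illustrative):

```python
import numpy as np

# Yule-style "nonsense correlations": independent random walks routinely
# show large sample correlations, unlike independent white noise.
rng = np.random.default_rng(42)
n_years, n_trials = 146, 1000  # ~length of the 1850-1995 window

def spurious_rate(make_series):
    """Fraction of trials where two independent series show |r| > 0.5."""
    hits = 0
    for _ in range(n_trials):
        x, y = make_series(), make_series()
        if abs(np.corrcoef(x, y)[0, 1]) > 0.5:
            hits += 1
    return hits / n_trials

walk = lambda: np.cumsum(rng.normal(size=n_years))  # trending, autocorrelated
noise = lambda: rng.normal(size=n_years)            # white noise

rate_walk = spurious_rate(walk)
rate_noise = spurious_rate(noise)
print("P(|r| > 0.5), random walks:", rate_walk)
print("P(|r| > 0.5), white noise: ", rate_noise)
```

          In runs of this sketch the random-walk rate is substantial while the white-noise rate is essentially zero, which is why a raw proxy-vs-temperature correlation over trending series proves little by itself.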

          Still, three of these data series were accepted as valid proxies to the local temperature record for Mann08’s analysis, so they may be instructive.

          Mann08 reported “r” for these data series, while Excel yielded “R^2”. The values of Mann08’s “r” and my “sqrt(R^2)” are somewhat discordant, evidence of the use of different methods.

          Darksum (passed Mann08’s screen): r=0.3066; sqrt(R^2)=0.23
          Lightsum (passed Mann08’s screen): r=0.2714; sqrt(R^2)=0.18
          Thickness (passed Mann08’s screen): r=0.2987; sqrt(R^2)=0.21
          XRD (failed Mann08’s screen): r=0.1232; sqrt(R^2)=0.02
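          If both statistics came from the same simple one-predictor least-squares fit on the same data, they would have to agree exactly: in that setting the trendline R^2 is just r squared, so sqrt(R^2) = |r|. A quick sketch on synthetic data (the series and numbers below are made up, not Mann08’s):

```python
import numpy as np

# In simple (one-predictor) least-squares regression, the trendline R^2
# equals the squared Pearson correlation, so sqrt(R^2) = |r| exactly.
rng = np.random.default_rng(7)
temp = rng.normal(size=146)                 # stand-in temperature anomalies
proxy = 0.3 * temp + rng.normal(size=146)   # weakly correlated "proxy"

r = np.corrcoef(proxy, temp)[0, 1]          # Pearson correlation

# R^2 as a spreadsheet trendline computes it: 1 - SS_resid / SS_total
slope, intercept = np.polyfit(temp, proxy, 1)
resid = proxy - (slope * temp + intercept)
r_squared = 1 - resid.var() / proxy.var()

print(f"r = {r:.4f}, sqrt(R^2) = {np.sqrt(r_squared):.4f}")
```

          Since the two cannot differ under simple regression on identical data, the discordance above implies the underlying computations differed — e.g. different time windows, smoothing, or a multivariate fit.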

        • EdeF
          Posted Aug 26, 2010 at 12:36 PM | Permalink

          Amac, thanks for the response and the data. My plots of BCPs in the White Mtns look very similar to your plots of varve density vs temperature in the validation period. To me it looks like a bird-gun shot pattern and I didn’t even bother to try to fit a curve to the data. Looks like nonsense to me and I can’t believe that these poor correlations show that the proxies are linear.

        • AMac
          Posted Aug 26, 2010 at 1:13 PM | Permalink

          Re: EdeF (Aug 26 12:36),

          At the Air Vent, Jeff Id has used pseudoproxies with low S/N ratios to show convincingly that it is possible to extract signal in the presence of a lot of noise. The AGW Consensus paleoclimate reconstructors shouldn’t be faulted for working with low-signal proxies… since that’s the only type that exists.
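          That point is easy to illustrate with made-up pseudoproxies: each series alone is mostly noise, but averaging many of them shrinks the noise by roughly 1/sqrt(n), and the stack tracks the signal. Everything below is synthetic and only a sketch of the idea, not Jeff Id’s actual procedure:

```python
import numpy as np

# Low-S/N pseudoproxies: one series barely correlates with the signal,
# but the mean of 50 such series recovers it clearly.
rng = np.random.default_rng(0)
years = np.arange(1000, 2000)
signal = 0.001 * (years - 1000)      # faint 1-degree trend over the millennium

n_proxies, noise_sd = 50, 1.0        # per-series S/N well below 1
proxies = signal + rng.normal(scale=noise_sd, size=(n_proxies, years.size))

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

single = proxies[0]                  # one noisy pseudoproxy
stacked = proxies.mean(axis=0)       # noise sd shrinks to noise_sd/sqrt(50)

print("one proxy vs signal:   r =", round(corr(single, signal), 2))
print("stack of 50 vs signal: r =", round(corr(stacked, signal), 2))
```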

          Mann08’s screening was set up in a way that allowed the Tiljander series (plural) to pass. This illustrates some of the flaws in the AGW Consensus’ implementation of the “proxy” idea, which I have come to think of as the Proxyhopper approach — “if they look plausible, toss ’em in, the algorithm will take care of any difficulties.”

          The pitfalls of setting out to look for the kind of data that you’d like to find have been discussed at length and are well understood in many fields, e.g. in clinical trial design. That Mann08’s employment of the Tiljander series continues to be defended by AGW Consensus scientists is evidence that this basic point is still not grasped.

        • EdeF
          Posted Aug 28, 2010 at 2:38 PM | Permalink

          Here is Steve from 2008 on MBH proxy non-calibrations:

    • Posted Aug 24, 2010 at 9:34 PM | Permalink

      Again, what global temperature?? How is global temperature defined except as a bunch of local temps mashed together?

    • suyts
      Posted Aug 24, 2010 at 10:10 PM | Permalink

      Robert, it isn’t as if they hadn’t considered the locality issue; in fact, they ran tests to that end. From M&W: “3.6. Proxies and Local Temperatures. We performed an additional test which accounts for the fact that proxies are local in nature (e.g., tree rings in Montana) and therefore might be better predictors of local temperatures…”

  9. S. Geiger
    Posted Aug 24, 2010 at 12:59 PM | Permalink

    “You can’t take proxies that were meant to be indicative of local temperature and say because they don’t predict global temperature well they are useless.”

    – That’s crazy talk. Haven’t you heard of teleconnections?

  10. stephen richards
    Posted Aug 24, 2010 at 3:15 PM | Permalink

    There are people who still cannot come to terms with the fact that MW10 was not about climatology but about statistics. Stop going on about temps, local or otherwise; it doesn’t matter to this paper. Did MW make a mess of their stats? Was their model wrong? Forget the data; it is not important. Where does the paper go wrong in terms of their methods? Answer that and that’s all. Roman and Venter, as always, have got it.

    • Geoff Sherrington
      Posted Aug 24, 2010 at 11:53 PM | Permalink

      stephen richards – I went on a bit about local temperatures because they are part of what needs eventual consideration in the error analysis. The authors chose the HadCRU NH data as a starting point, but there are other versions that can be quite different and so affect the totality of error calculations.

      • stephen richards
        Posted Aug 25, 2010 at 8:11 AM | Permalink

        Absolutely, and I meant no criticism of your good self. My point was aimed at RC contributors who persist in focusing their attacks on MW’s data, which of course is entirely inappropriate.

        I had understood from the paper that they used Mann’s data as is. I missed the HadCru stuff, my bad. However, I am sure that the intended focus of the paper was method and not data.

  11. Posted Aug 26, 2010 at 7:06 AM | Permalink

    “Mind of a Markov Chain” blogger apeescape has posted Global Temperature Proxy Reconstructions ~ now with CO2 forcing. Now the Bayesian reconstructions go back to 1000. Apeescape finds that adding CO2 forcing to the model changes the reconstructions, improving them by allowing them to predict recent warming even when the last three decades of instrumental data are withheld.

    The various partisans are sure to like some of these efforts better than others, depending on what the results imply about paleotemperatures. Even I can see reasons to be skeptical of the actual spaghetti traces (as with Mann08, M&W10, and the others). Aside from that, apeescape’s approach seems innovative. Perhaps it will lead to greater analytical rigor in the field.

    • stephen richards
      Posted Aug 26, 2010 at 9:24 AM | Permalink

      Amac said “improving them by allowing them to predict recent warming”

      Are you making that judgement (the improvement) against Mann and Hansen’s manipulated data or something else? If it’s Mann et al, then I would expect the answer to be as you have said; after all, MW10 was based on Mann’s data. If this model is something else, well, that could change everything.

      • AMac
        Posted Aug 26, 2010 at 10:41 AM | Permalink

        Re: stephen richards (Aug 26 09:24),

        > Are you making that judgement…

        I’m summarizing what I think apeescape accomplished, given the terms of the analysis as defined. The linked post may prove interesting to many technically informed readers. I can’t say whether apeescape’s analysis is “valid.” I agree with Kenneth Fritsch (9:28 AM) that assumptions and uncertainties have to be scrutinized.

    • Tom Gray
      Posted Aug 26, 2010 at 1:21 PM | Permalink

      Since when is CO2 forcing considered to be a proxy for temperature? Why not put in the Dow, the number of Anglican curates, etc.?

    • Tom Gray
      Posted Aug 26, 2010 at 1:34 PM | Permalink

      Isn’t there some circularity here? The purpose of the proxy reconstructions is to determine past temperatures. This climate history will then be used with climate models to determine CO2 sensitivity.

      So this example seems to be deriving CO2 sensitivity not from temperature but from CO2 concentration. So CO2 derives CO2. Isn’t the example just predicting CO2 concentration from CO2 concentration plus some noise? In regard to temperature, isn’t the example just assuming the conclusion that CO2 concentration is a linear predictor of temperature?

  12. Kenneth Fritsch
    Posted Aug 26, 2010 at 9:28 AM | Permalink

    Would not that be an oxymoron: Proxy reconstructions with CO2 forcing? Interesting stuff though and with large CI ranges.

    I think the partisans on all sides of this issue go wrong when they fail to closely scrutinize all the assumptions and uncertainties behind the rather tentative conclusions in these exercises. Both sides are much too quick to see a result as conclusive, and that would appear to be the result of advocacy overwhelming the science.

2 Trackbacks

  1. By Top Posts — on Aug 24, 2010 at 7:15 PM

    […] Replicating McShane and Wyner R coder mind of a Markov chain has replicated portions of the M&W work. They write: There are a bunch of “hockey […]

  2. […] Replicating McShane and Wyner (climate audit) […]
