Intelligible Code

Just a short note today on the code for McShane and Wyner and for the comments on it. The discussion of McShane and Wyner has been greatly enhanced, first by their archiving of code and secondly by simplifications of the code by both the online community and the discussants.

Most of the McShane and Wyner analysis is in R, making this part of their analysis quickly accessible to a wide community. (Their Bayesian analysis is not in R – it seems to be in WinBUGS.) There's a lot of self-teaching in R programming and I nearly always learn something from reading other people's code. Sometimes you also see things that you think the other guy could learn. For example, I find it handy to organize R data sets as time series ("ts") objects – not used by Mc-W. I like to have objects in use available on the internet for download using download.file(..., mode="wb"), rather than packing things up in zip-files. It's a small stylistic preference.
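
For concreteness, a minimal sketch of both habits (the URL and file name below are hypothetical, not Mc-W's):

    # download a binary-safe copy of a data set, then store it as a "ts" object
    download.file("http://www.example.org/proxy.tab", "proxy.tab", mode = "wb")
    proxy <- read.table("proxy.tab")                    # column 1 = year
    proxy.ts <- ts(proxy[, 2], start = proxy[1, 1], frequency = 1)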

I was able to get the R portion of their code to operate quickly and easily, but I’m not familiar with WinBUGS and wasn’t sure how much overhead would be involved in figuring it out.

Meanwhile, I spent some time looking at the code for Gavin Schmidt's (Schmidt, Mann and Rutherford) comment on McShane-Wyner. The code is night-and-day different from the code of Mann et al 2008, which is ponderous and opaque to the point of virtual unintelligibility. While the entire point of the Schmidt comment is nothing more than preserving bristlecone impact without mentioning bristlecones, the code of their comment is clean and easy to follow.

One useful improvement to the McShane-Wyner code is the collection of their code into a well-organized master script, much improving the ease of replicating results. Their master script makes extensive use of the source command. Occasionally they have second-tier source commands (called from a source script) rather than keeping all the source commands in the master script, but this is a stylistic nit.
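
As an illustration of the pattern (all script names here are hypothetical):

    # master.R - a sketch of the master-script pattern
    source("functions.R")      # shared utilities
    source("read_proxies.R")   # collate the proxy network
    source("calibration.R")    # fit the models
    source("figure14.R")       # draw the figures
    # the stylistic nit: all source() calls are kept here, rather than having
    # figure14.R itself call source("read_proxies.R") as a second tier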

Their other major improvement is applying the port of the Bayesian portion of McShane-Wyner to R, originally developed at a blog here. This port requires the R package rjags AND the separate installation of JAGS 2.1.0. It took me a while to figure out that JAGS 2.1 needed to be installed separately, but once that was done, I was able to proceed uneventfully.
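
For anyone else setting this up, here is a minimal rjags round trip on a toy model (not the Mc-W Bayesian model), just to confirm that rjags can find the separately installed JAGS:

    library(rjags)   # requires JAGS to be installed separately

    model_string <- "
    model {
      for (i in 1:N) { y[i] ~ dnorm(mu, tau) }
      mu  ~ dnorm(0, 1.0E-4)
      tau ~ dgamma(0.001, 0.001)
    }"

    dat <- list(y = rnorm(100, mean = 1), N = 100)
    m   <- jags.model(textConnection(model_string), data = dat, n.chains = 2)
    update(m, 1000)                                   # burn-in
    out <- coda.samples(m, c("mu", "tau"), n.iter = 5000)
    summary(out)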

For the most part, the reconstructions illustrated in Mc-W Figure 14 are based on a variation of MBH98-99 methodology (and not the lasso methodology used in the red noise analyses), i.e., they did a principal components analysis of proxies and gridded temperatures (with the M08 network) and reported various permutations, yielding a lot of different squiggles. The big issue in the various permutations, the extremely stale question of the presence/absence of strip bark bristlecones, is not discussed by Mc-W or (especially) by Schmidt.

I mentioned previously that the 93-series network used by McShane-Wyner included 24 strip bark chronologies and 2 Korttajarvi sediment series. Because there are so many strip bark chronologies and no attempt is made to downweight all these nearby chronologies, they dominate the PC1 in the AD1000 network even without Mannian centering – a point originally made in McIntyre and McKitrick 2005 (EE), but mostly ignored. It can therefore hardly be a surprise that the "red" hockey-stick-shaped reconstruction in Mc-W Figure 14 (see below) looks like MBH99, because it is simply one more avatar of MBH99 bristlecones.
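
The effect is easy to reproduce in a toy example (all data simulated here; a sketch, not the actual Mc-W network):

    # 24 mutually correlated "hockey stick" series plus 69 independent noise
    # series; centered PCA still loads PC1 almost entirely on the block
    set.seed(42)
    n      <- 1000
    signal <- cumsum(rnorm(n, sd = 0.1))
    block  <- replicate(24, signal + rnorm(n))
    noise  <- replicate(69, rnorm(n))
    pc     <- prcomp(cbind(block, noise), center = TRUE, scale. = TRUE)
    w1     <- pc$rotation[, 1]
    sum(abs(w1[1:24])) / sum(abs(w1))   # share of PC1 weight on the 24 series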

In the following graphic, I've indicated the weights of individual gridcells by the size of the filled circle – a technique used here on a number of occasions. After all the huffing and puffing of megaflops of multivariate analysis, it all comes down to a vector of linear weights of the original proxies (in calculating the weights, I've used linear algebra shown at CA in the past). Other permutations are less weighted towards bristlecones.


Figure 2. Weights by Gridcell of Proxies in Mc-W 93 Proxy PC1 (radius is square root of weight)
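
The plotting technique in the figure above is itself simple. Here is a sketch with made-up gridcell locations and weights (the actual figure overlays a world map):

    # radius = sqrt(weight), so circle area is proportional to weight;
    # negative weights drawn in blue (locations and weights made up)
    set.seed(1)
    lon  <- runif(30, -180, 180)
    lat  <- runif(30, -60, 80)
    w    <- rnorm(30)
    cols <- ifelse(w < 0, "blue", "red")
    plot(lon, lat, type = "n", xlab = "Longitude", ylab = "Latitude")
    symbols(lon, lat, circles = sqrt(abs(w)), inches = 0.15,
            fg = cols, bg = cols, add = TRUE)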

Schmidt et al contested the Mc-W 93-series network as not being "properly" selected. They reduced the network to 55 proxies, excluding the two Tiljander series retained in the Mc-W 93 and tree ring series with fewer than 8 cores. Bristlecones remain well represented, needless to say. I'll discuss their version on another occasion, together with a no-bristlecone version.

The availability of code stimulates discussion, since you can see what the authors actually did; it hugely mitigates the arid controversies, all too typical of climate science, about whether one has correctly guessed at some bizarre methodological quirk.

30 Comments

  1. R DeWitt
    Posted Oct 5, 2010 at 12:50 PM | Permalink

    Is the weight proportional to the area or to the radius of the circles in the graphic?

    Steve- Good point. To the radius. But the area is more appropriate and what I’ve used in similar graphics in the past. I’ll reissue using the area.

    • Pat Frank
      Posted Oct 5, 2010 at 5:44 PM | Permalink

      circle area = pi x r^2, of course, so the weight is proportional to both. 🙂

      • Posted Oct 6, 2010 at 5:30 AM | Permalink

        It’s that kind of math expertise and attention to detail that keeps us coming back to CA.

  2. Hu McCulloch
    Posted Oct 5, 2010 at 1:05 PM | Permalink

    One useful improvement to the McShane-Wyner code is the collection of their code into a well-organized master script, much improving the ease of replicating results. Their master script makes extensive use of the source command. Occasionally they have second-tier source commands (called from a source script) rather than keeping all the source commands in the master script, but this is a stylistic nit.

    This is one thing I find really annoying about Matlab — specialized functions that are only going to be called by the current main program may not be in the same file as the main program. This means that a single project may end up in several files, making it a nuisance to keep track of. I don’t know about R, but GAUSS, for example, doesn’t have this restriction.

    Oddly, though, Matlab function files may contain specialized subfunctions, though these will not be recognized outside the file in question.

  3. ZT
    Posted Oct 5, 2010 at 1:29 PM | Permalink

    What a nice clear graphic of the skewed weights. Perhaps something for the next IPCC report cover?

  4. stephen richards
    Posted Oct 5, 2010 at 1:38 PM | Permalink

    Steve Mc

    I just love your work. It is clear, clean, concise and accurate. No controversy. Thanks.

  5. Steve McIntyre
    Posted Oct 5, 2010 at 2:23 PM | Permalink

    When one does a similar graphic for the “MWP” sort of curve, it is dominated by the Yucatan sediment series. I don’t know much about this data, but there’s no reason to believe that it is magic mud any more than the bristlecones are magic trees. All of this needs to be taken with many grains of salt.

  6. Steve McIntyre
    Posted Oct 5, 2010 at 2:46 PM | Permalink

    I replaced the weights diagram with one where the radius is proportional to the sqrt of the weight, so that the area is proportional to the weight. The dominance of the bristlecones is very clear. The Tiljander sediments are the dot in Finland.

    The blue weights show negative coefficients. The bristlecones will sometimes cause series with "warm" early portions to be flipped over in a PC analysis – a point that we mentioned in McMc 2005 (EE). It is very pronounced in Mannian PCs, but occurs even in centered PCs, as here.

    Some of the Mc-W choices seem overly deferential to paleo practices, e.g. their apparent assumption that PCs have been validated as a methodology for tree ring networks.

    • Shallow Climate
      Posted Oct 5, 2010 at 3:35 PM | Permalink

      “Overly deferential”: Well, yes. I surmised from their paper that their thrust was to take the data and methodologies of the Climate Science Community as a given, casting no critical eye on that aspect. After all, M and W are statisticians (only); i.e., only half of what you get with SM. Actually, “surmised” is too weak a word: They actually state it. They leave in the bristlecone chronologies and include the upside-down Tiljander series as is. In my perhaps feeble mind, that is a strength of the MW paper: “Even with your best-case scenario, we will show you that you are all wet.”

  7. ben
    Posted Oct 5, 2010 at 4:29 PM | Permalink

    Isn’t this how climate science should be done? Shame on Mann et al for their needless, wasteful, prolonged obfuscation.

  8. ZT
    Posted Oct 5, 2010 at 5:23 PM | Permalink

    It is harder to hide the 55 peas under all those thimbles when you’re publishing intelligible code.

  9. Posted Oct 5, 2010 at 9:40 PM | Permalink

    Ok, two years of paleoclimate science and I’m ready to switch majors.

    [self snip again, sorry]

    • Posted Oct 5, 2010 at 9:43 PM | Permalink

      Really, I did snip the comment. Let's not forget that none of the scribbles is conclusively demonstrated to be in any way related to temperature.

  10. Sylvain
    Posted Oct 5, 2010 at 11:49 PM | Permalink

    I have 2 questions:

    1. Does the M&W paper stand up to scrutiny?

    2. Is the Schmidt paper valid in its critique?

  11. John Murphy
    Posted Oct 6, 2010 at 4:26 AM | Permalink

    I understand that the proper application of statistical method is vital, but how are any of these tree-ring proxy studies useful anyway?

    From reading Fritts it is obvious that precipitation is a very important determinant of ring width. It is perfectly possible, indeed certain, that temperature and ring-width will be confounded. It is also certain that, for example, ring-width and temperature can be negatively correlated, depending on a host of other variables. The only way around the problem is to have a large and varied sample for each species in each area – which Fritts demonstrates, but which is not available in most of the "hockey stick" type studies. Then there is a chance that something like PC analysis can separate out the various factors at work.

    • Posted Oct 6, 2010 at 5:39 AM | Permalink

      Apropos of which, I was struck by this great ending to geologist Joe Brannan’s riposte to Bob Ward’s critique of his review of The Hockey Stick Illusion in a letter to the Geological Society (spotted a couple of days ago courtesy of Bishop Hill):

      Bob notes that I am sympathetic to the sceptics. I am, but in a very specific sense; I believe that critics such as McIntyre have raised legitimate questions about the robustness of the Hockey Stick. They deserve a straight answer. Judith Curry, the only mainstream climatologist to review the book (as far as I am aware) agrees with me.

      The hypothesis that recent global temperatures are unprecedented in 1000 years is profoundly important. We should be gathering data by the bucket-load to test it, instead of endlessly debating the meaning of the inadequate data-sets we have amassed to date. I have no doubt that the hypothesis can be conclusively tested, and I look forward to the result – regardless of whether it turns out to be true or false.

      It surely can be done, if we gather data by the bucket-load. Just not that data, presumably. (CA readers can fill in the blanks with their favourite examples.)

      Meanwhile, even before getting into all that, I applaud Steve once again for pointing to the strengths as well as the weaknesses in his ‘enemies’. Open source really works, at so many levels, not least this ability for unknown third parties to edit code and thus clarify it for everyone that follows. That message seems finally to be getting through.

  12. Don McIlvin
    Posted Oct 6, 2010 at 10:02 AM | Permalink

    Agree or not with the conclusions, it seems the McShane Wyner paper has raised the bar in setting a benchmark of acceptability in the availability of code and source data. Perhaps papers past, present, and future should be considered against this standard of source data and code availability, i.e., against a standard of scientific "review-ability" rather than the simple publisher-conducted "selective peer review".

    Steve: the first papers to apply this sort of standard in the field were McIntyre and McKitrick 2003, 2005ab. I've also done this in many Climate Audit posts and there is a very large inventory of scripts at climateaudit.info. Wahl and Ammann placed code online for much of their response to us, and Mann et al 2008 placed code online – not very good code, but nonetheless code. They've done a very good job of archiving, but the standard had already been set.

    • Don McIlvin
      Posted Oct 7, 2010 at 10:23 PM | Permalink

      Well then, the MM standard it is.

  13. Anthony Watts
    Posted Oct 6, 2010 at 11:00 AM | Permalink

    Forgive my ignorance, but I can find no reference in text nor graphics as to what the blue and green plots represent. I know what the red plot is, but a color key would be helpful.

    • TAG
      Posted Oct 6, 2010 at 3:27 PM | Permalink

      Green and blue are reconstructions using other model building methods that have verification metrics just as good as the famous red one from Mann et al. I thought that this was M&W's main point. They showed that, because of the short instrumental period and noisy proxies, different methods produced wildly different reconstructions. There was no way to choose between these reconstructions even though they had very disparate properties.

      They are criticized for using the lasso, but their point was that the lasso worked as well as any of the other methods, so why not.

      • Anthony Watts
        Posted Oct 6, 2010 at 7:57 PM | Permalink

        And I repeat the question: the “other model building methods” are…?

        • TAG
          Posted Oct 6, 2010 at 8:30 PM | Permalink

          They used Mann's method to build the model for the red. They used many other methods found in the climate science literature for the others. They all had equivalent verification statistics, so one reconstruction is as good as another even though they generate quite different reconstructions. They tested many more than the three modelling methods shown in the graph. This is all explained in the early portions of the paper.

          The obvious conclusion is that with the current proxies and the number of model building methods, one can quite easily tune the results of a reconstruction

        • TAG
          Posted Oct 6, 2010 at 8:40 PM | Permalink

          The proxies are calibrated to temperature in the calibration or instrumental period. So proxy values are calibrated to temperature readings. This creates a model which links proxy values to temperature. This model is then used to generate the estimate of temperature outside of the calibration period. This is why SMc says that the various methods of reconstruction all produce a “weighted average”. This is the model. So Mann will do everything he can to retain the bristlecones in the weighted average or model, while SMc shows why this is not a good thing.

          To estimate the quality of the model, some of the instrumental data is held out from the calibration; the model is then used to generate an estimate for this period, and the estimate is compared to the measured values.

          M&W showed that various methods used in the literature created models with equivalent verification statistics but quite different reconstructions. They said that this was because the instrumental period was too short and the proxies too noisy to create reconstructions that went so far into the past.
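
          A toy sketch of that calibrate/verify split (everything simulated; this is not any published method):

              set.seed(7)
              temp  <- cumsum(rnorm(150, sd = 0.05))  # "instrumental" temperature
              proxy <- temp + rnorm(150, sd = 0.3)    # one noisy proxy
              calib <- 1:100                          # calibration period
              verif <- 101:150                        # held-out verification period
              fit   <- lm(temp[calib] ~ proxy[calib])
              est   <- coef(fit)[1] + coef(fit)[2] * proxy[verif]
              sqrt(mean((est - temp[verif])^2))       # verification RMSE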

        • Tim Osborne
          Posted Oct 7, 2010 at 3:55 AM | Permalink

          I have been closely following climate science for some time now from a lay person perspective (so my only excuse is stupidity) but I am clearly some way behind everyone else. If the red includes contested sources of data such as bristlecones (which Steve M has previously gone into in detail) a) do the blue and green exclude bristlecones but b) use other sources, which may or may not be more reliable proxies?

        • TAG
          Posted Oct 7, 2010 at 4:23 AM | Permalink

          The same proxies are used for all. M&W's point is that the choice of method can greatly affect the reconstruction generated, since the proxies are noisy and the instrumental period is so short.

  14. kuhnkat
    Posted Oct 6, 2010 at 11:11 AM | Permalink

    Along with the more egregious distortions, it would appear that the term multivariate hides the fact that there is still no reconstruction with enough data to do a reasonable job of showing us what is claimed.

    My ideal would be multivariate in each location, with the locations covering most major regions and full time of the reconstruction. How much less can be provided and still have a useful work?

  15. TGSG
    Posted Oct 6, 2010 at 2:51 PM | Permalink

    Blue diamonds, red diamonds, blue and red dots = lucky charms! Which ones are the magical charms that get us to the flat blade and the curved blade? Only in “climate science” can we know.

  16. AMac
    Posted Oct 6, 2010 at 3:53 PM | Permalink

    For readers new to the story, here’s a synopsis of the data series from Lake Korttajarvi in Finland, collected by Tiljander et al., and mentioned by Steve in the post (“Korttajarvi” and “Tiljander” refer to the same four (three) series).

    Mann08 (on which MW10 is based) screened four Tiljander series. There are actually only three Tiljander series, “Lightsum”, “Darksum”, and “XRD.” The fourth putative series, “Thickness,” contains no additional information, as it is simply “Lightsum” and “Darksum” added together. Lightsum, Darksum, and Thickness passed Mann08’s screen; XRD did not.

    Tiljander and her coauthors are the sole published authorities on the interpretation of these proxies. They proposed particular temperature signals for Darksum, Lightsum, and XRD in the pre-1720 era. In my opinion, the evidence for clear temperature signals in those proxies is weak (discussion).

    Tiljander noted that the natural signals in these data series are progressively overwhelmed post-1720, especially in the 19th and 20th centuries. Local land use factors that come to dominate the series include roadbuilding, farming, peat cutting, bridge reconstruction, and lake eutrophication. Thus, these data series are wholly unusable as temperature proxies by the methods used in Mann08, which require direct calibration of each proxy to the instrumental temperature calculated by CRUTEM3v for a local gridcell, 1850-1995. These points become clear upon inspection of graphs of the data series.

    Notwithstanding the above: Mann08’s authors implicitly accepted Tiljander’s temperature signal for Darksum, implicitly employed an opposite (“upside-down”) signal for Lightsum and for XRD, and proposed a novel signal for Thickness. Mann08’s authors offered no justifications for these assignments.

    Until recently, these points were vigorously contested by pro-AGW Consensus paleoclimatologists. In the last few months, Gavin Schmidt and others have quietly revised their position. They now assert that nobody can tell whether or not these data series are suitable for use as temperature proxies, and that, in any case, “it doesn’t matter.” Those claims are rebutted by Steve in the CA post The No-Dendro Illusion.

    We now return you to your regularly scheduled program.

  17. Spence_UK
    Posted Oct 7, 2010 at 4:21 PM | Permalink

    This reminds me a lot of the analysis Jean S did on the putative CO2 growth-spurt correction Mann applied to the bristlecone pines in MBH99. While the correction’s direct effect was in the late 19th/20th century, the consequence was to modulate the shaft of the stick exactly as shown in McS+W fig 14.

    So many ways of tuning the stick to get the result just so.

  18. Posted Oct 8, 2010 at 1:53 PM | Permalink

    If this is not a stupid or irrelevant question – I would like to see every single proxy as a graph (with a few notes) – just as one can look up GISS graphs. Even if there are issues of logarithmic scaling and normalization, these can surely be discussed. It’s the shapes I want to mine for (yes, mine for similarity / dissimilarity of shapes to each other, not just for shapes that match the instrumental period).

    I don’t have the skill to turn data into graphs and lack the time to learn. I’m sure a lot of others like me would really appreciate seeing the individual proxy graphs; moreover, I could then use them to do photoshop work to great visual effect, yielding scientific evidence and insights, as I did with Yamal.

    Can anyone point me to a source of such material?