A Surprising Result

Yesterday, I observed some seemingly bizarre patterns in Mannian splicing, which at first looked like an inexplicable stupid pet trick. In this case, however, the apparent stupid pet trick turns out to implement an intentional Mannian procedure referred to in passing in the SI, but not shown in the online source code. The net result is really quite astonishing: it appears that none of the 422 proxies that started after AD1700 contribute to any of the reported CPS reconstructions (EIV is a different can of worms). Here’s how I arrived at this startling conclusion.

Here is how I was first introduced to the mysteries of Mannian splicing (at least in Mann et al 2008; MBH98 had inexplicable splicing patterns in its principal component retention that no one has ever figured out). On the left is a comparison of the archived iHAD reconstruction and the version resulting from splicing the steps from my emulation in 100-year intervals – the system shown in the source code. The latter matched the results from UC’s Matlab run to negligible difference. As you can see, the match up to AD1800 was virtually exact, but not after 1800. With some experimenting – and this sort of experimenting can take hours before you figure out what the issue is – I found that by excluding the AD1800 step, I could get a virtually identical match, as shown on the right (which continues the AD1700 network to the present).
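
To make the experiment concrete, here is a minimal sketch of the splice-and-compare exercise. This is my notation throughout, not Mann’s: splice_steps is a hypothetical helper that splices the listed step reconstructions together, recon_steps holds the emulated step reconstructions, and archived is the archived series.

% Hypothetical sketch of the splice-and-compare experiment (my notation).
% recon_steps(:,k): emulated reconstruction for the k-th century step;
% splice_steps: assumed helper that splices the listed steps together.
spliced_all = splice_steps(recon_steps, 500:100:1800);  % every century step
spliced_1700 = splice_steps(recon_steps, 500:100:1700); % AD1700 net to present
err_all = max(abs(spliced_all - archived));             % diverges after AD1800
err_1700 = max(abs(spliced_1700 - archived));           % ~0: virtually exact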

Readers may perhaps note that the version that Mann used contains a little extra 20th century trend, relative to the discarded version – the sort of choice that is not entirely unfamiliar to CA readers.

[Figures: left, the archived iHAD NH reconstruction versus the 100-year-step splice of my emulation; right, the same comparison with the AD1800 step excluded.]

My first instinct, as reported yesterday, was that this was a stupid pet trick – a careless bit of programming of the sort that we’ve seen elsewhere; in this case, the accidental omission of the last series. However, when I tested this on other “targets”, a different and seemingly bewildering pattern emerged.

On the left is a similar comparison for the CRU NH “target”. In this case, there were matches in some centuries but not in others – specifically, in the 700s, 1400s and 1700s. And this series proved to have steps at AD700, AD1400 and AD1700, rather than at every century.

On the right is the same exercise for the iHAD SH “target”, where the pattern is different again. Here there is a match from about AD1000–1200 and not thereafter. This network appears to have used the AD1000 roster through to the present.

[Figures: left, the same comparison for the CRU NH “target”; right, for the iHAD SH “target”.]

How Does Mannian Splicing Occur?
Clearly, whatever is going on here is not simply a stupid pet trick, in the sense of double-counting proxies by erroneous programming of what Nicholas aptly refers to as the “fence post” problem.

I carefully examined the code for gridboxcps for clues, but there wasn’t a whiff of anything that remotely resembled code implementing an irregular splicing procedure. There was the following code implementing a splice at regular century steps, but nothing to show irregular splicing:

%splice and save
gbcps=NaN([nyears_ptot,1]);
for n=1:19
gbcps((n-1)*100+1:n*100)=NHmean((n-1)*100+1:n*100,20-n);
end
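
Nothing resembling the following appears in the archive; this is my own sketch of what an irregular splice would have to look like. Here steps, recon and years are hypothetical names for the retained step-years, the per-step reconstructions and the calendar years.

% Hypothetical irregular splice (no such code is in the archived source).
% steps: start-years of the retained rosters; recon(:,k): reconstruction
% from the k-th retained roster; years: calendar year of each row.
steps = [700 1400 1700];              % e.g. the CRU NH steps found above
spliced = NaN(size(recon,1), 1);
for k = 1:numel(steps)
    if k < numel(steps)
        idx = years >= steps(k) & years < steps(k+1);
    else
        idx = years >= steps(k);      % the last roster runs to the present
    end
    spliced(idx) = recon(idx, k);
end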

However, the SI contained the following gem:

As a safeguard against the inclusion of redundant statistical predictors and statistical overfitting, the set of additional series that become available at the start of each subsequent century (e.g., A.D. 500, A.D. 600, etc.) were admitted into the network only when their use results in an increase in scores for both skill metrics relative to the set used for the previous century.

The “skill metrics” are the RE and CE statistics – see main article which states:

So-called “reduction of error” (RE) and “coefficient of efficiency” (CE) skill scores for the decadal reconstructions were used as metrics of validation skill as in past work (20, 32). Because of its established deficiencies as a diagnostic of reconstruction skill (32, 42), the squared correlation coefficient r2 was not used for skill evaluation.

The statistical “authorities” (32,42) 🙂 are Mann et al 2007 and Wahl and Ammann 2008. Given the supposed “deficiencies” of the squared correlation coefficient, it is perhaps a little surprising that the unsquared correlation coefficient r is used both to screen proxies for inclusion in the Mannian network and to invert the sign of proxies, but, hey….
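
For reference, here are the standard definitions of the two skill scores – my own code, not from the archive. The only difference between RE and CE is the benchmark mean.

% Standard RE and CE skill scores (my code, not Mann's).
% obs, est: observed and reconstructed values over the verification period;
% calmean: mean of the observed series over the calibration period.
function [RE, CE] = skill_scores(obs, est, calmean)
sse = sum((obs - est).^2);                  % verification error sum of squares
RE = 1 - sse / sum((obs - calmean).^2);     % benchmark: calibration-period mean
CE = 1 - sse / sum((obs - mean(obs)).^2);   % benchmark: verification-period mean
end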

The “decadal” RE statistic essentially measures trend. So here’s what I think is going on: if new proxies “enhance” the trend, they are added to the network; if they don’t, they aren’t. So here’s one more way in which the results have been biased a little. And, of course, every time you do one of these sorts of comparisons and make one of these choices, there’s a price for it in reduced significance – but nowhere, as far as I can tell, is there any attempt to render an accounting of the impact of this procedure. It would take some effort to figure out the impact; all I can say right now is that there is a price – what it is, I don’t know.
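
Here is a sketch of the admission rule as I read the SI – my own reconstruction of the logic, not Mann’s code. score_roster is a hypothetical stand-in for fitting the network available at a given step-year and returning its validation scores.

% Hypothetical sketch of the SI admission rule (my reconstruction).
% score_roster is an assumed helper: fit the network available at a
% step-year, validate it, and return its RE and CE scores.
steps = 500:100:1800;                 % candidate century steps
kept = steps(1);                      % the earliest roster is always used
[RE0, CE0] = score_roster(steps(1));
for s = steps(2:end)
    [RE1, CE1] = score_roster(s);
    if RE1 > RE0 && CE1 > CE0         % both metrics must improve
        kept(end+1) = s;              % admit the new roster
        RE0 = RE1; CE0 = CE1;
    end                               % otherwise carry the previous roster forward
end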

The AD1800 Step
Aside from the cost in statistical degrees of freedom, I noticed something quite odd in how this picking procedure worked out empirically.

For all 8 “targets” (NH and SH, by CRU, iCRU, HAD and iHAD), the final “step” was AD1700 or earlier. In no case was the AD1800 roster retained. Out of the total roster of 1209 proxies, over one-third (415) began in AD1700 or later. As I currently understand the splicing, none of these proxies contributed to any of the reported reconstructions.

So let’s go back to the statement in the SI:

As a safeguard against the inclusion of redundant statistical predictors and statistical overfitting, the set of additional series that become available at the start of each subsequent century (e.g., A.D. 500, A.D. 600, etc.) were admitted into the network only when their use results in an increase in scores for both skill metrics relative to the set used for the previous century.

Maybe this statement passes muster with climate scientists, but do we really know that this procedure actually provides a “safeguard” against “overfitting”? Perhaps it is a recipe for additional overfitting. This is a statistical claim, and a statistical authority should be provided for it.


The Southern Hemisphere
The situation is even more extreme in the SH. As I presently understand Mannian splicing – and none of this is explicitly reported – none of the proxies that start after AD1500 contribute to the CRU reconstruction, and no proxies that start after AD1100 contribute to the iHAD reconstruction!!

There are a total of 173 SH proxies shown in the network, of which 139 start after AD1500. Under the RE improvement criterion, these additions are rejected!

It appears that the iHAD reconstruction for the SH uses only 6 of the 173 proxies: Cook’s Tasmania reconstruction and Thompson’s Quelccaya O18 (both staples of prior recons), Cook’s Oroko NZ reconstruction (rejected in Mann and Jones 2003), Holmgren’s South Africa Cold Air Cave speleothem (O18 and C13), and a FICU series from Argentina (Arge091). The other 167 proxies fail to “improve” the results and are excluded.

As noted above, as far as I can tell, none of these manipulations are shown in the archived source code – so, to that extent, the source code, as provided, does not show the actual calculations.

16 Comments

  1. Soronel Haetir
    Posted Nov 9, 2008 at 10:27 AM | Permalink

    Am I missing something, or wouldn’t many of the best series used for the instrumental validation start after 1700? Your description makes it sound like the reconstruction tosses away the very data that would best represent a temperature-to-physical-process relationship.

  2. AndyL
    Posted Nov 9, 2008 at 10:32 AM | Permalink

    Jeff ID has apparently reverse-engineered which proxies are used in the reconstructions with weightings. As a check, do his calculations back up the finding that the proxies that start after 1700 are not used?

  3. Steve McIntyre
    Posted Nov 9, 2008 at 10:46 AM | Permalink

    Jean S and UC also did some reverse engineering and I’ve done so as well. UC posted AD1000 results on another thread and I used this information in my “engineering”.

    Essentially you end up regressing the reconstruction against the candidate series over 100-year intervals. There are some statistical limitations to the reverse engineering, as the archived series are smoothed and there remains some indeterminacy in the results – some coefficients look a little off.

    In fact, now that we have a little more insight into the splicing, the reverse engineering can be improved – we can now specify steps and take longer intervals for the reverse engineering. Here are the steps as I presently understand the situation:

    NH:
    CRU – 500, 600, 700, 1400, 1700
    iCRU – 600, 800, 1500, 1600, 1700
    HAD – 1500, 1600, 1700
    iHAD – 1500, 1600, 1700

    SH:
    CRU – 1000, 1100, 1200, 1400, 1500
    iCRU – 500, 700, 1000, 1100, 1200, 1400, 1500
    HAD – NA
    iHAD – 1000, 1100

    If these longer intervals are used, the reverse engineering can be done better.

    Now that I’ve got the actual engineering replicated, I’ll be able to calculate the weights directly in fairly short order.
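
    For anyone following along, the core of the reverse engineering is just a least-squares fit over a step interval. A minimal sketch – my own notation; years, proxies and archived are hypothetical names, not from the archive:

    % Sketch of the reverse engineering (my notation, not Mann's code):
    % over one step interval, regress the archived reconstruction on the
    % candidate proxies to recover the implied weights.
    idx = years >= 1400 & years < 1700;       % e.g. the NH CRU AD1400 step
    X = [ones(sum(idx),1) proxies(idx,:)];    % intercept plus candidate proxies
    b = X \ archived(idx);                    % least-squares weights
    res = archived(idx) - X*b;                % near-zero if the roster is right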

  4. Steve McIntyre
    Posted Nov 9, 2008 at 10:52 AM | Permalink

    #1. Just because something is called a “proxy” doesn’t mean that it is one. As Rob Wilson and Mike Pisaric observed, just because a series is in the ITRDB data bank doesn’t mean that it is a temperature or even a climate proxy.

  5. vivendi
    Posted Nov 9, 2008 at 12:34 PM | Permalink

    I have a degree in engineering, but no profound knowledge of statistics or climatology. But after reading this, I must express my admiration for Mann’s work. He must have spent month after month finding the right combination of inclusions/exclusions, programming and mathematical tweaking to finally reach the desired result. How can he keep track of what he has done and undone?
    Unfortunately for him, he didn’t expect anybody to dig so deeply and to finally find the worms…

    • Craig Loehle
      Posted Nov 9, 2008 at 3:16 PM | Permalink

      Re: vivendi (#5), Perhaps fortunately for me, I am nowhere near smart enough to do things in such a convoluted way without confusing myself. KISS is my motto.

  6. Soronel Haetir
    Posted Nov 9, 2008 at 1:01 PM | Permalink

    If Mann didn’t expect the CA crowd to dissect this paper then he is more blind than I.

  7. pjm
    Posted Nov 9, 2008 at 4:12 PM | Permalink

    Dr Mann’s algorithms are known to produce hockey sticks from the available data. Is there a small change that could be made that would produce a decreasing hockey stick? I don’t mean an obvious blunder such as putting in a – sign, but something that could “reasonably” (in some sense of the word) be generated from the data.

  8. Steve McIntyre
    Posted Nov 9, 2008 at 4:36 PM | Permalink

    #8. In other data sets, small (and plausible) variations yield different MWP-modern relationships e.g. Ababneh’s bristlecone chronology instead of Graybill’s; the Polar Urals update instead of Yamal; Grudd’s Tornetrask instead of Briffa’s. In every recon so far, slight variations change the results. And surprise, surprise, the Team always chooses variations so that the modern period is slightly in the black. This is what makes the accounting such an issue.

    There are a lot of new proxies in Mann et al 2008 and to be fair to it, one has to look at the new proxies, just in case there is more to this one than the others. So far it seems unlikely.

  9. Kenneth Fritsch
    Posted Nov 9, 2008 at 6:24 PM | Permalink

    The statistical “authorities” (32,42) are Mann et al 2007 and Wahl and Ammann 2008. Given the supposed “deficiencies” of the squared correlation coefficient, it is perhaps a little surprising that the unsquared correlation coefficient r is used both to screen proxies for inclusion in the Mannian network and to invert the sign of proxies, but, hey….

    I was under the impression that the screening in Mann et al. was done using p (the probability that the correlation was different than zero) and not r.

  10. jae
    Posted Nov 9, 2008 at 6:42 PM | Permalink

    I’m amused, but I don’t see how Steve Mc can be after all that sleuthing. This “story” gets weirder and weirder.

    • andy
      Posted Nov 10, 2008 at 4:47 AM | Permalink

      Re: jae (#11), What I find weird is that although my background is in engineering, not climate, I can read and somehow understand, e.g., the doctoral theses of Ababneh or Grudd. I can read the Mann article as well, but from the article I really have no clue what he is actually doing, though some of the major gaps I can identify myself. Even weirder will be whether the Mann article is cited in the next IPCC report; there will be counter-arguments to its use, but they will be turned down on the basis of either “those arguments are not published in a peer-reviewed journal” or “these issues are covered in an article just approved for publication”.

  11. Posted Nov 10, 2008 at 4:39 AM | Permalink

    Steve,

    I hope you’d indulge me by answering a couple of questions that nag me.

    1. As you remember, Wegman noted in the Reply to Stupak that

    A cardinal rule of statistical inference is that the method of analysis must be decided before looking at the data. The rules and strategy of analysis cannot be changed in order to obtain the desired result. Such a strategy carries no statistical integrity and cannot be used as a basis for drawing sound inferential conclusions.

    …which raises the question of what Mann is doing sifting through proxies, accepting or rejecting them according to whether the chosen statistical metric (RE in this case) is “enhanced” or not. Can a method like this have any statistical integrity, or is it more like a self-constructed delusion?

    2. As Bishop Hill noted, RE is a measure only for linear sequences, yet clearly Mann 2008 isn’t linear. What is Mann enhancing? I must admit I’m more than usually baffled by a methodology that uses the wrong metric to produce a “signal” that scores strongly with that wrong metric. Am I wrong to be baffled?

    • jae
      Posted Nov 10, 2008 at 8:22 AM | Permalink

      Re: John A (#12),

      I must admit I’m more than usually baffled by a methodology that uses the wrong metric to produce a “signal” that scores strongly with that wrong metric. Am I wrong to be baffled?

      It’s simple. You use whatever promotes your theory. You use R2 when it is convenient, and you ignore it when it’s not, for example.

  12. KevinUK
    Posted Nov 10, 2008 at 12:28 PM | Permalink

    Steve

    As best I can gather from trying to follow this series of threads, in which your good self, Jean S, UC, Jeff ID and others attempt to audit Mann et al 2008: despite the NAS Panel report, despite the Wegman report and despite the congressional hearing, our esteemed friend, who struggles to get babysitters, has carried on with business as usual almost as if these events had never happened?

    Correct me if I’m mistaken, but to summarise what you have laid out so far: Mann et al 2008 looks like it’s just a re-hash of MBH 98/99, but with more proxies (particularly non-tree ring proxies) and with slightly different (but equally flawed) methods for picking the same cherries to make pretty much the same cherry pie (i.e. a hockey stick)? It would seem, however, that at least the handle has now changed, but that the blade section spliced to the instrumental temperature record nonetheless stays the same?

    No doubt in concocting his new statistical methodology this time round he’s done his best to make sure that the MWP and LIA aren’t completely filtered out, as this is after all what caused you to start digging into MBH 98/99 in the first place? It is a matter of some speculation, but may I ask: if the original TAR poster-child hockey stick had actually shown a prominent MWP and LIA in the handle (albeit with a significantly lower MWP-versus-modern ratio), would you have still gone to the trouble you have to audit it?

    Regards

    KevinUK

  13. Clark
    Posted Nov 10, 2008 at 3:31 PM | Permalink

    With so few proxies actually having data that extend into recent decades (and with several of those having their recent data discarded and replaced by fake data), and with so many proxies rejected by the technique you describe above, how many proxies containing data beyond 1970 contribute to the final temperature reconstruction? Are there any?