Mann and Perfect Reconstructions

I finally turned over a few stones in Mann’s EIV reconstructions, little suspecting the perfection that awaited me in the cave using the simple and unassuming alter ego shglfulihad_smxx.

The figure below compares the SH reconstruction to the smoothed (SH) iHAD instrumental version. From the frail instruments of speleothems, bristlecone ring widths and upside-down sediments, Mann has achieved something divine – a “perfect” reconstruction.

Presumably out of an excess of modesty, Mann did not claim the perfect verification statistics to which he was ‘entitled’, claiming only an “average” verification r2 of 0.28. But let’s not hide this light under a bushel. [Note: Jean S observes below that the calculations are done separately for calibration and verification periods. As I interpret his comment (together with my own take on Mannian RegEM), the method yields essentially ‘perfect’ reconstructions during the calibration period – no overfitting involved, of course – with verification stats dropping sharply in the verification period due, of course, to overfitting: a phenomenon that we discussed in a May 2006 post on VZ pseudoproxies.]

This was the first stone that I turned over in the EIV cave. Perhaps more perfection awaits us deeper within the cave? Perhaps even richer treasures.

If you wish to confirm the perfection for yourself, here are scripts that download perfection from Mann’s FTP site. First download the shglfulihad reconstruction:


Next the instrumental target:


Now do Mannian smoothing on the instrumental target:

cutfreq=0.1; ipts=10 # ipts set as 10 in Mann lowpass
bf=butter(ipts,2*cutfreq,"low"); npad=1/(2*cutfreq)
smooth=mannsmooth(target,M=npad,bwf=bf)
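
For readers without Mann’s mannsmooth() to hand, here is a rough Python analogue of the lowpass step above – a sketch only: it uses scipy’s zero-phase filtering rather than Mann’s particular boundary padding, so the endpoints will differ from Mannian smoothing.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_smooth(x, cutfreq=0.1, order=10):
    # order-10 Butterworth at normalized cutoff 2*cutfreq, mirroring the
    # R snippet above; second-order sections keep the high order stable
    sos = butter(order, 2 * cutfreq, btype="low", output="sos")
    # zero-phase forward-backward filtering (NOT Mann's boundary handling)
    return sosfiltfilt(sos, x)
```

Applied to ~150 years of noisy annual data, this passes the slow variation and strips the year-to-year noise, which is all the comparison in the figure requires.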

Now plot perfection:

legend("topleft",fill=c("grey80",1,2),legend=c("Instrumental","Instr Smooth","'Reconstruction'"),cex=.8)

Oh yes, here’s the rest of the ‘perfect’ reconstruction.

Figure 2 – this is the same data as above, but plotted over the entire record.


  1. Posted Nov 28, 2008 at 12:12 PM | Permalink

    How’s the NH equivalent? Surely it must be better because there’s more data and all those highly accurate bristlecones and Finnish lakes to use 😉

  2. anonymous
    Posted Nov 28, 2008 at 12:38 PM | Permalink

    It’s almost as if somewhere in the incomprehensible barely commented source code someone smoothed the instrumental record, misnamed it something sufficiently abbreviated like “tmp” and later continued from there on the assumption it’s the proxy record…

  3. Steve McIntyre
    Posted Nov 28, 2008 at 12:39 PM | Permalink

    #1. Yep, it’s ‘perfect’ as well. So is the NH_iCRU.

  4. Steve McIntyre
    Posted Nov 28, 2008 at 12:41 PM | Permalink

    #2. Michael Mann:

    No researchers in this field have ever, to our knowledge, “grafted the thermometer record onto” any reconstruction. It is somewhat disappointing to find this specious claim (which we usually find originating from industry-funded climate disinformation websites) appearing in this forum.

  5. jae
    Posted Nov 28, 2008 at 12:56 PM | Permalink

    Wow. Can’t get much better than r2=1! Just what did he do here?

  6. jae
    Posted Nov 28, 2008 at 1:01 PM | Permalink

    Oh, I get it. Splice.

  7. Jean S
    Posted Nov 28, 2008 at 1:24 PM | Permalink

    To be honest, I really do not understand the point in this post. The fact that the instrumental target is the same as the “reconstruction” during the calibration period is a feature of any (Reg)EM-based reconstruction. It’s due to the very way these reconstructions are constructed: one is “infilling” the target values in missing time slots using the proxy series. Naturally there is nothing to infill during the calibration period.
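
A toy sketch (in Python, with made-up data – nothing from Mann’s archive) of the pass-through Jean S describes: an EM-style infill only replaces missing values, so over the calibration period the “reconstruction” is the instrumental target itself.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1800, 1996)
truth = 0.005 * (years - 1900) + rng.normal(0, 0.2, years.size)
target = np.where(years >= 1850, truth, np.nan)  # instrumental from 1850 only
proxy = truth + rng.normal(0, 0.3, years.size)   # noisy proxy, full length

# "Infill" the missing pre-1850 slots from the proxy; the observed
# calibration-period values pass through untouched -- that is the splice
recon = np.where(np.isnan(target), proxy, target)

calib = years >= 1850
# calibration-period r^2 is identically 1 -- construction, not skill
r2_calib = np.corrcoef(recon[calib], target[calib])[0, 1] ** 2
```

Any verification exercise on the infilled portion will of course score worse, since that is the only part the proxies actually produce.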

  8. Steve McIntyre
    Posted Nov 28, 2008 at 1:47 PM | Permalink

    #7. Jean S, I think that there’s a point. I approached this from the verification statistics. The verification stats are not perfect. So something else must have been used to calculate the verification statistics – something that is not perfect.

    I agree that Mannian RegEM is a vast over-fitting exercise but this is a slightly different issue. The fit’s too perfect and looks like splicing rather than overfitting.

    • RomanM
      Posted Nov 28, 2008 at 1:55 PM | Permalink

      Re: Steve McIntyre (#8),
      That was my sense as well when I read Jean’s post. What did Mann use as a validation exercise?

    • Jean S
      Posted Nov 28, 2008 at 2:03 PM | Permalink

      Re: Steve McIntyre (#8), no. The “reconstruction” is always perfect during the calibration period, nothing to do with over-fitting. This is the way it’s done. There is really no “reconstruction” during the calibration, only the target.

      In this EIV reconstruction, the calibration is 1850-1995. In calculating verification stats, the calibration interval is shorter and the reconstruction values are actually infilled during the verification period, which is intentionally left “blank” for the purpose. The fact that those stats are so poor tells that the algorithm is not actually working, but that’s another issue.

  9. Steve McIntyre
    Posted Nov 28, 2008 at 2:16 PM | Permalink

    #10. OK, that mostly makes sense.

    But in that case, why is this not an illustration of overfitting?

    Going back to the EIV method (and I haven’t got much of a foothold on replicating this yet), my sizeup is that Mannian RegEM is essentially a multiple regression of temperature against dozens to hundreds of proxies with rather poorly controlled ‘regularization’. So the fits during the calibration period are going to be pretty much perfect, which would then be what we’re seeing here – only over an extended 1850-1995 calibration period.

    I’d noticed that Mann had failed to report his calibration period statistics – an interesting “oversight” given that these stats for the EIV reconstructions appear to be “perfect”.

    • Jean S
      Posted Nov 28, 2008 at 2:27 PM | Permalink

      Re: Steve McIntyre (#11),
      It’s not overfitting, since during the calibration period they are not really calculating any (pointwise, linear) relationship between proxies and the target. The calibration period is a kind of “training interval”. If some specific name really needs to be assigned for this part of “reconstruction”, IMO “splice” is ok.

  10. RomanM
    Posted Nov 28, 2008 at 2:40 PM | Permalink

    OK, I think I see the approach. According to the Mann paper:

    Reconstructions were based on calibration over the full 146-year interval 1850–1995. Statistical validation was achieved by using a split calibration/verification procedure wherein data were calibrated alternatively on both the most recent (1896–1995) and oldest (1850–1949) 100-year subintervals, whereas the remaining 46 years were used to validate the reconstruction. Results from the early and late validation experiments were then averaged for the purpose of estimating skill metrics and uncertainties.

    It seems that the years from 1896 to 1949 were common to both the “early” and “late” periods so, for that period, the reconstruction would be the grafted temperature data. Unlike the main reconstruction, it seems to me that the two validation recons would have generated values for the years 1850 – 1895 and 1950 – 1995. Are those values somewhere in the SI? The file eiv-validation in the recons-eiv directory seems to only reference the years from 1850 back, so it is apparent that such values (if they exist) were not used in the validation. It would be interesting to plot them on Steve’s graph.
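
Worked out explicitly, the windows in the quoted passage look like this (illustrative Python; the RE numbers are hypothetical – only the window arithmetic and the averaging step come from the quoted procedure):

```python
# The two split-calibration windows from the quoted Mann passage
late_cal = set(range(1896, 1996))    # "late" calibration, 1896-1995
early_cal = set(range(1850, 1950))   # "early" calibration, 1850-1949

# years calibrated in BOTH experiments -- never infilled in either run
overlap = late_cal & early_cal       # 1896-1949

# skill metrics from the two experiments are simply averaged
re_late, re_early = 0.80, -0.01      # hypothetical scores for illustration
re_avg = (re_late + re_early) / 2
```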

    • Jean S
      Posted Nov 28, 2008 at 3:12 PM | Permalink

      Re: RomanM (#13),

      Unlike the main reconstruction, it seems to me that the two validation recons would have generated values for the years 1850 – 1895 and 1950 – 1995. Are those values somewhere in the SI?

      These also depend on the proxy network used. Some of them are plotted in Figure 2 (main text) and in Figure S4 (SI). Almost all of these graphs seem to indicate rapid cooling lately…

  11. anonymous
    Posted Nov 28, 2008 at 2:48 PM | Permalink

    Ok, it would be interesting to know how well the resulting 1850-1930 period matched the temperature record if the calibration was done over 1930-1995 period instead, or even vice versa etc…

  12. Steve McIntyre
    Posted Nov 28, 2008 at 2:52 PM | Permalink

    I haven’t been able to locate any archived reconstruction that actually ties together with any of the archived reconstruction statistics even though there are thousands of reconstruction statistics and dozens of reconstructions.

    As far as I can decode things, all the archived reconstructions are (1) splices of different steps; (2) full-period calibrations (and thus not related to the split-period calibration stats.)

    In the EIV case, no one’s been able to get the program to start due to some missing programs/files. So it’s hard to test interpretations.

    In the CPS case, UC’s been able to get parts of the program to run, but not through to the verification statistics. I can emulate the program up to the point where UC’s been able to get it to run, but am stuck right now at the verification stats.

    • Jean S
      Posted Nov 28, 2008 at 3:47 PM | Permalink

      Re: Steve McIntyre (#15),

      In the EIV case, no one’s been able to get the program to start due to some missing programs/files.

      Yes. Unlike the Rutherford et al 2005 code, the main code this time actually seems easily runnable. However, RegEM is again run for “high” and “low” splits of data, and the code for doing the instrumental split is missing. Until Mann releases clihybrid.m (UC’s plea didn’t seem to help), I think it’s next to impossible to replicate the EIV reconstructions (although it’s pretty easy to guess the main things (normalization + Mannian filtering) done in clihybrid.m, but with Mann that’s never enough).

  13. Craig Loehle
    Posted Nov 28, 2008 at 2:57 PM | Permalink

    Either the “reconstruction” is something else, or it is grossly overfit. Anyone ever hear of “adjusted R^2”? Fitting 100 or 1000 data sets to a 100-year time series won’t look so good once you adjust for all the lost degrees of freedom.
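
For reference, the adjusted R^2 Craig mentions charges the fit for its predictor count: with p predictors over n observations, an impressive raw fit can collapse once the lost degrees of freedom are paid for (a textbook formula, not anything from Mann’s code):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a fit with p predictors and n observations.

    Penalizes the raw R^2 by the degrees of freedom consumed:
    1 - (1 - R^2) * (n - 1) / (n - p - 1).
    """
    if n - p - 1 <= 0:
        raise ValueError("more predictors than degrees of freedom allow")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

For example, a raw R^2 of 0.95 from 90 predictors over 100 observations adjusts down to 0.45 – exactly the “won’t look so good” effect.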

  14. Carl
    Posted Nov 28, 2008 at 3:13 PM | Permalink

    It does not ever make any sense to overlap calibration and verification periods, as this does nothing but overstate verification period skill. They should have calibrated on those two 100 year periods and verified on the remaining 50.

    • Jean S
      Posted Nov 28, 2008 at 3:17 PM | Permalink

      Re: Carl (#18),
      They did use 100-year periods. In other words, they ran “verification” twice (late/early) and then averaged the results.

  15. Steve McIntyre
    Posted Nov 28, 2008 at 3:42 PM | Permalink

    The averaging of the two verification stats is something that needs to be pondered. For example, in some cases, Mann gets an RE of 0.8 in one exercise and -0.01 in the other exercise and declares that the average is 99% significant (as opposed to the obvious alternative interpretation that one RE stat is spurious and there is no valid model). Jean S (or UC or Roman), have you ever seen this sort of procedure in recognized statistics?

    • RomanM
      Posted Nov 28, 2008 at 4:49 PM | Permalink

      Re: Steve McIntyre (#20),

      Not like this. But then we all know the climate science theorem that averaging things always gives a more accurate result. Anyway, I’m still trying to get my head around the fact that in climate paleo recons everything gets smoothed beforehand and THEN they do the tests on the validity of the reconstruction basing critical values not on established theory, but on spurious model runs! I must have led a sheltered professional existence…

  16. Steve McIntyre
    Posted Nov 28, 2008 at 4:22 PM | Permalink

    My guess is that clihybrid.m will look like the corresponding proxy program, only applied to temperatures – a whole lot of Mannian smoothing to get “low frequency”. Wouldn’t this be possible to emulate?

    • Jean S
      Posted Nov 28, 2008 at 4:53 PM | Permalink

      Re: Steve McIntyre (#22),
      Yes, but it’s pretty time consuming to say the least. What exactly is the normalization? Is it done after or before filtering, or both? Is filtering done with frequency=0.05/0.10/? Etc. In order to test the combinations, one needs to run the rest of the algorithm in order to see if the “parameters” were picked correctly… Anyhow, if someone is willing to give it a try, the corresponding proxy program is proxyhybrid.m (which contains lots of commented-out code, meaning a lot was tried before the “final version”).

  17. Steve McIntyre
    Posted Nov 28, 2008 at 4:56 PM | Permalink

    #23. There are some new features in Mann et al 2008 that don’t necessarily characterize other reconstructions. Indeed, the fantastic amount of smoothing is about as distinctive in Mann et al 2008 as PCs were in MBH98.

    It’s sort of interesting to have worked through Santer almost simultaneously because the money argument in Santer (coauthored by Gavin Schmidt) is that the CIs from models with autocorrelated residuals are large and large enough to incorporate present trends.

    Although Mann claims over 100 degrees of freedom (“modest autocorrelation”), by the time that his model compares Mannian smoothed proxy recons to Mannian smoothed instrumental records, I’d be surprised if there were even 6 degrees of freedom under the Nychka formula.

  18. Steve McIntyre
    Posted Nov 28, 2008 at 5:16 PM | Permalink

    #25. If one did a regression model between the unrescaled SH reconstruction and the smoothed iHAD (that I’ve been looking at), the AR1 coefficient of the residuals is 0.928; giving 5.4 df with (1-r)/(1+r) and df=1.41 using the Nychka formula. In the latter case, the t-value (.975) is 6.585.

    Mann’s “confidence intervals” are 2 times something – with the 2 presumably being the usual rule of thumb (which is based on the 97.5% t-percentile being about 2 for 20 or more degrees of freedom: e.g. qt(.975,df=20) # 2.086). If there are 1.41 degrees of freedom (in a Mann-smoothed world), then the CIs are not 2 times something but 6.5 times something – as in UC’s pretty cartoon showing floor-to-ceiling CIs.
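
The arithmetic in the comment above can be checked directly (a Python sketch; scipy’s t quantile matches R’s qt, and the n and r values are the ones quoted in the comment):

```python
from scipy import stats

n, r = 146, 0.928                     # calibration years; AR1 of residuals

# classical AR1 adjustment to the effective sample size: n*(1-r)/(1+r)
n_eff = n * (1 - r) / (1 + r)         # ~5.4 effective df

# the "2 sigma" rule of thumb: t-percentile at 20 df is about 2
t20 = stats.t.ppf(0.975, 20)          # ~2.086

# at the Nychka-reduced 1.41 df, the multiplier balloons
t_nychka = stats.t.ppf(0.975, 1.41)   # ~6.6, not ~2
```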

    • John A
      Posted Nov 28, 2008 at 6:44 PM | Permalink

      Re: Steve McIntyre (#26),

      …giving 5.4 df with (1-r)/(1+r) and df=1.41 using the Nychka formula. In the latter case, the t-value (.975) is 6.585

      If there is a Statistical Olympics then this should have set Olympic records for overfitting and insignificance. I’m 97.5% confident.

      • Craig Loehle
        Posted Nov 28, 2008 at 7:28 PM | Permalink

        Re: John A (#31), I’m 100% confident that reviewers would never let ME get away with such monkey business…

        • Posted Nov 28, 2008 at 11:32 PM | Permalink

          Re: Craig Loehle (#32),

          No chance unless you find the same shape curve as M08.

          other comments:
          The thing that Mann’s group and other AGW people miss is that if they did find the same shape curves as M08 and used reasonable methods and data they would receive our support. Of course they would have to disclose their methods. They don’t understand that people like me would love to support their work, it is not my style to disagree first.

          I would feel much more relaxed with a clear, concise answer on historic temps; this paper is so over-complicated for its flawed result that it is difficult to express. After 4 months of climatology experience it has been entertaining, simply because the math isn’t too difficult and there are constant rat’s nests to dig into.

        • Patrick M.
          Posted Nov 30, 2008 at 8:12 AM | Permalink

          Re: Craig Loehle (#32),

          Interesting! What if you DID try the exact same monkey business, (and used Mann’s methods to prove that the MWP was higher than current temps)? What would the reviewers do? Would they take you to task knowing that you could then turn around and complain about Mann? Or would they look the other way as they appear to do with Mann? Of course you would have plenty of Mann citations listed in your paper.

        • masmit
          Posted Nov 30, 2008 at 11:12 AM | Permalink

          Re: Patrick M. (#41),

          It’d also be interesting to see if you’d be taken to task for misapplication of Mann’s methods, since it seems likely that few reviewers would know what those methods are…

  19. Carl
    Posted Nov 28, 2008 at 5:37 PM | Permalink

    #19: Ok, I see that I misread something… so did he then re-calibrate the model on all 150 years (after achieving sufficient model verification, in his mind) to predict the earlier period? Or did he use one model or the other for the rest of his paper?

    • RomanM
      Posted Nov 28, 2008 at 6:05 PM | Permalink

      Re: Carl (#27),

      From the quote from the paper in RomanM (#13), it certainly looks like the EIV reconstruction was calibrated on the full temperature record. Then, two more reconstructions were done on two overlapping 100-year sub-periods. Evaluation statistics were then calculated only to compare the two validation reconstructions for years prior to 1850 (ignoring the ability to compare the measured-temperature era to other estimates from those validation sequences). If you look at the graphs referred to by Jean S (#17), you will understand why.

  20. Posted Nov 28, 2008 at 5:47 PM | Permalink

    Absolutely incredible, stunning. I haven’t had much time to work on this paper over the last several weeks (did a little R today) but the SH reconstruction is… unbelievably good.

    I need to spend some time looking at the amount of infilling on the data. In the meantime, like the rest of your readers, I’m impressed. Very nice.

  21. Geoff Sherrington
    Posted Nov 28, 2008 at 6:31 PM | Permalink

    Why are there 11 easily-visible and near-regular peaks/troughs in the period 1850 to 2005? That’s one every 14 years and 14 does not ring a bell.

    BTW, I had this unintelligent thought that if you smooth data in a way to produce good correlation coefficients, then calculate correlation coefficients, they will be good.

    Steve, that’s a bit rough – putting in figure 2.

    • Geoff Sherrington
      Posted Nov 29, 2008 at 5:01 AM | Permalink

      Re: Geoff Sherrington (#30),

      Sorry, forgot the Biblical 7 years famine, repeated.

      • John S.
        Posted Nov 29, 2008 at 11:01 AM | Permalink

        Re: Geoff Sherrington (#38),
        14 is also the number to the dozen that the most-ambitious oyster bars would serve in the good old days. That’s one better than a baker’s dozen!

        On a more serious note, all the questions about apparent cycles and the coherence between instrument and proxy series would be readily answered by cross-spectrum analysis. I’d be happy to run such an analysis on any pair of string-formatted (i.e., date and data value) series that Steve Mc would send me.

  22. John Norris
    Posted Nov 28, 2008 at 9:12 PM | Permalink

    re clihybrid.m

    CLI is sometimes used to refer to a command line interpreter, a kind of old-fashioned command line input for programming in today’s GUI world. Is it possible that this function pulls in operator-entered data, or data from a text file that might look like operator-entered data?

  23. Steve McIntyre
    Posted Nov 28, 2008 at 10:29 PM | Permalink

    This post is worth re-reading in the context of this method. My sense is that Mannian RegEM functions more like OLS than a PLS-type method and isn’t necessarily a methodological improvement – quite aside from the booby traps of upside-down muds and bristlecones.

  24. Pierre Gosselin
    Posted Nov 29, 2008 at 4:43 AM | Permalink

    Mann ought to start supplying the NHL. God knows he makes the best.
    Has anyone ever found a defect from Mann Mfg? You know, one that’s missing a blade?
    Give him credit for consistent quality.

  25. John A
    Posted Nov 30, 2008 at 1:30 AM | Permalink

    Steve, why does Mann “only [claim] an “average” verification r2 of 0.28” when he very specifically states in his article that he did not calculate the verification R2 statistics (referring to his own tantrum article on the manifest failures of the R2 metric)?

  26. John Norris
    Posted Nov 30, 2008 at 7:20 PM | Permalink

    Incidentally, looks like clihybrid.m has been added to the folder, and the addition is noted in the updated Readme.txt file

  27. Posted Dec 11, 2008 at 1:07 PM | Permalink

    r-squared seems to be the statistic of choice for climatologists, and for those who would aspire to be one.

    I would really like to see the probability values associated with the statistic, but it seldom or never seems to appear.

    Is there a good reason for this?


  28. Mark T.
    Posted Dec 11, 2008 at 2:41 PM | Permalink

    Only when r2 is high, Robin, and it is “silly” to calculate it when it is near zero./snark


  29. Posted May 19, 2009 at 12:30 PM | Permalink

    Now we know why the reconstruction was perfect.

  30. Posted May 19, 2009 at 12:31 PM | Permalink

    Now we know why the reconstruction was perfect. RegEM doesn’t replace the original series data. I’ll do the EIV recon tonight.

2 Trackbacks

  1. […] Mann and Perfect Reconstructions by Steve McIntyre on November 28th, 2008 […]

  2. […] work continues, attempting to make sense of Mann’s hodge-podge of code and data (see this and this).  Hence the “peer review” failed. Their review might as well consisted […]
