Not A Solution to the Caramilk Secret

Update: the following does not explain the Caramilk secret of MBH99 confidence intervals, which remains unexplained and mysterious. End Update.

OK, Mann starts with a sigma obtained from the standard errors in the calibration period from his hugely overfitted model. He uses this in MBH98. In MBH99, recognizing the autocorrelation in the residuals, he adjusts the confidence intervals through a mysterious procedure. The adjustments as shown this morning in the figure below are done in bulk.


Top: black – MBH99 sigma; red – MBH98 sigma. Bottom: ratio of MBH99 sigma to MBH98 sigma (using the ignore2 to extend to 1000-1399).

The adjustment that we’re looking for is about 1.6, derived somehow. Now look at the following figure from MBH99 posted up previously here:


Original Caption. Figure 2. Spectrum of NH series calibration residuals from 1902-1980 for post-AD 1820 (solid) and AD 1000 (dotted) reconstructions (scaled by their mean white noise levels). Median and 90%,95%,and 99% significance levels (dashed lines) are shown.

This is derived from a spectrum but it is, as noted in the caption, "scaled by mean white noise levels". If you squint at the y-axis, you can perhaps persuade yourself that the value at the y-axis is about 1.6, which is the value in the "adjustment". So maybe what they do is use this y-axis value as an "adjustment" to the standard deviation and thus to the confidence interval. [Update: You cannot reasonably persuade yourself that it is 1.6; it reads as 1.4. I was trying too hard and this possibility – which doesn’t make any sense anyway – is ruled out.]

Jean S sent me a note saying that he thinks he’s figured it out but was too busy to write it down till the end of the week. Jean S:

I kind of figured out the procedure from Mann and Lees. I don’t know what to say anymore, I’m kind of sad this type of papers do get published… Mann&Lees is sad.

[Update: I guess we’ll have to wait for Jean S. ]

Jean S has suggested a look at Stoica and Moses, Intro to Spectral Analysis, p. 37, section 2.4, "Properties of the Periodogram Method", so I’ll check that out.

I wonder what an econometrician working with spectral methods, someone like Granger or Peter Robinson, would say about this. For that matter, Bloomfield and Nychka of the NAS panel are both specialists in the frequency domain. Now however odd Mann’s result is, bear in mind that Esper, Briffa, Jones and D’Arrigo are all even worse. At least Mann has indirectly considered the possibility of autocorrelated residuals and tried to allow for them – even if his method was weird. The other folks just ignore the problem and use the same approach as MBH98 for estimating residuals: fit a model in a calibration period and then use the standard errors to calculate confidence intervals with no allowance for autocorrelation. But hey, they’re the Hockey Team.

[Update: The relevant section of Stoica says that the estimates of the Fourier coefficients through the periodogram are inconsistent and do not converge with N, but behave as a random variable. This doesn’t sound promising as a method of estimating an adjustment, that’s for sure. Now Mann estimates his spectrum using Thomson’s multitaper method – Stoica did not discuss this methodology; the purpose of the multitaper method is to reduce the variance of the estimate, so probably it’s less bad than the periodogram, but still the whole procedure, whatever it is, doesn’t sound promising. Having said that, doing nothing is not an alternative either. ]
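
The inconsistency that Stoica describes can be illustrated with a minimal sketch, assuming unit-variance Gaussian white noise. This is a generic illustration, not anything taken from MBH99 or Mann and Lees, and the helper names `periodogram_ordinate` and `coefficient_of_variation` are made up for the example. For white noise, a single periodogram ordinate has a standard deviation roughly equal to its mean, however long the series is:

```python
import cmath
import math
import random

def periodogram_ordinate(x, k):
    """Periodogram value |X_k|^2 / n at Fourier index k."""
    n = len(x)
    s = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
    return abs(s) ** 2 / n

def coefficient_of_variation(n, trials=400, seed=0):
    """Std/mean of one periodogram ordinate over repeated white-noise series."""
    rng = random.Random(seed)
    k = n // 4  # a fixed interior frequency (a quarter cycle per time step)
    vals = [periodogram_ordinate([rng.gauss(0.0, 1.0) for _ in range(n)], k)
            for _ in range(trials)]
    mean = sum(vals) / trials
    var = sum((v - mean) ** 2 for v in vals) / trials
    return math.sqrt(var) / mean

# The spread relative to the mean does not shrink as the series gets longer.
for n in (32, 512):
    print(n, round(coefficient_of_variation(n), 2))
```

Both ratios should come out near 1: a longer series gives more frequencies, not a tighter estimate at each one. Averaging several approximately independent estimates, which is the idea behind the multitaper method, is what brings the spread down.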

46 Comments

  1. Hans Erren
    Posted Apr 25, 2006 at 3:11 PM | Permalink

    your first figure caption doesn’t make sense

  2. Hans Erren
    Posted Apr 25, 2006 at 3:24 PM | Permalink

    now it does 😉

  3. Hans Erren
    Posted Apr 25, 2006 at 3:48 PM | Permalink

    May I conclude that Michael Mann still has some undisclosed methods/steps? And you still have to guess how to exactly replicate the hockeystick? What’s this, hide and seek?

    Is there an overview of what is still missing?

  4. John A
    Posted Apr 25, 2006 at 3:53 PM | Permalink

    Is there an overview of what is still missing?

    Yes. Any scientific credibility.

  5. jae
    Posted Apr 25, 2006 at 4:13 PM | Permalink

    Steve (or someone): I’m trying to understand the statistics better. During the “calibration” or “training” period, is a simple linear regression model used? If so, why is overfitting a problem?

  6. Paul
    Posted Apr 25, 2006 at 4:20 PM | Permalink

    I thought that the secret was something like this.

  7. Steve McIntyre
    Posted Apr 25, 2006 at 4:21 PM | Permalink

    It’s not simple linear regression, but a sui generis procedure. Look at my posts on the Linear Algebra. The closest analog in statistics that you can look at in a book is partial least squares – although you’d never know this from the descriptions. This is an article in itself. Since the proxies are close to being orthogonal (some signal!), the partial least squares coefficients are a slight transformation of the multiple regression coefficients of each temperature PC on the entire proxy network, i.e. in a period of 79 years, a regression against 22-112 predictors. No wonder they get pretty good calibration r2 statistics. It might be one of the most laughable procedures in the history of statistics – and this is over and above the PC nonsense.
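
To see why a good calibration r2 means little with this many predictors, here is a minimal sketch using ordinary least squares on pure noise. It is not the MBH98 procedure (which, as noted above, is closer to partial least squares); it only shows what 22 to 112 unrelated predictors do in a 79-point calibration period. The function name `calibration_r2` and the Gram-Schmidt implementation are choices made for the example.

```python
import random

def calibration_r2(n_years, n_predictors, seed=0):
    """R^2 of a least-squares fit (with intercept) of noise on noise predictors."""
    rng = random.Random(seed)

    def centered_noise():
        v = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
        m = sum(v) / n_years
        return [a - m for a in v]  # centering stands in for the intercept

    y = centered_noise()
    total_ss = sum(a * a for a in y)
    resid = y[:]
    basis = []  # orthonormal basis of the predictor space (Gram-Schmidt)
    for _ in range(n_predictors):
        x = centered_noise()
        for q in basis:
            c = sum(a * b for a, b in zip(x, q))
            x = [a - c * b for a, b in zip(x, q)]
        norm = sum(a * a for a in x) ** 0.5
        if norm < 1e-8:  # predictor adds no new direction, so skip it
            continue
        q = [a / norm for a in x]
        basis.append(q)
        c = sum(a * b for a, b in zip(resid, q))
        resid = [a - c * b for a, b in zip(resid, q)]
    return 1.0 - sum(a * a for a in resid) / total_ss

for p in (22, 112):
    print(p, round(calibration_r2(79, p), 3))
```

With 79 “years”, 22 noise predictors typically give an r2 of around 0.3 (the expected value for pure noise is 22/78), and 112 predictors fit the calibration data exactly, even though nothing here has any signal.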

    Now Mann says that they’ve moved on but still get the “same” answer. RegEM is another procedure that’s idiosyncratic and not known to (say) Draper and Smith so you’re dependent on Hockey Team descriptions of what they did. My guess is that the linear algebra will boil down in regression terms to something like a multiple (inverse) regression on the proxies.

    Now recall that regressions are supposed to proceed from cause to effect – temperature presumably causes tree rings, not tree rings causing temperature. Thus “inverse”.

  8. jae
    Posted Apr 25, 2006 at 5:03 PM | Permalink

    Since the proxies are close to being orthogonal (some signal!), the partial least squares coefficients are a slight transformation of the multiple regression coefficients of each temperature PC on the entire proxy network i.e. in a period of 79 years, a regression against 22-112 predictors.

    I guess this is really over my head. So there are essentially 22-112 “variables” in the multiple regression? And the “variables” are temperature PCs?

  9. jae
    Posted Apr 25, 2006 at 5:13 PM | Permalink

    re #6. Like I said some time ago, reviewers are often very busy people, and sometimes they barely scan an article. I am positive that sometimes they don’t even look at it. I have seen several cases in my field where I am fairly certain I was the only reviewer that read the paper. Once, I reviewed a paper that was complete garbage, and I said so, but it was published anyway, because the other reviewers did not submit any negative comments. This problem is probably much more prevalent in situations where the author is “famous,” and some reviewers just blindly accept what is said.

  10. Paul Penrose
    Posted Apr 25, 2006 at 8:12 PM | Permalink

    JAE,
    Sometimes I think that the reviewers assume that if they can’t make any sense of a paper it’s because the author is brilliant and they just are not smart enough. Who wants to take the chance that someone will call you “stupid” or “dim-witted” if you call them out and are wrong?

  11. TCO
    Posted Apr 25, 2006 at 10:03 PM | Permalink

    Didn’t MBH claim in 98 that they had considered autocorrelation and that it did not occur? That they have some vague (non-numeric, qualitative) evaluation that it was “fairly white”? (BTW, these types of non-numeric puffy justifications are pretty common, as Burger and Cubasch noted and tweaked Mann for.) If they do adjust for autocorr in 99, what does that say about their earlier 98 claim? Did they even check in 98? Do a DW test?
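
For reference, the Durbin-Watson statistic mentioned in #11 is easy to compute from a residual series: d = sum((e_t - e_{t-1})^2) / sum(e_t^2), which is near 2 for uncorrelated residuals and roughly 2(1 - rho) for AR(1) residuals. A minimal sketch on simulated residuals follows; these are not the MBH98 residuals, and `ar1_series` is a helper made up for the example.

```python
import random

def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); d near 2 means no lag-one autocorrelation."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

def ar1_series(n, rho, seed=0):
    """Simulated series e_t = rho * e_{t-1} + white noise (rho = 0 gives white noise)."""
    rng = random.Random(seed)
    out, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out

print(durbin_watson(ar1_series(500, 0.0)))  # white residuals: close to 2
print(durbin_watson(ar1_series(500, 0.8)))  # strongly autocorrelated: well below 2
```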

  12. Lee
    Posted Apr 26, 2006 at 10:45 AM | Permalink

    How do you get “If you squint at the y-axis, you can perhaps persuade yourself that the value at the y-axis is about 1.6 …” from the graph? It’s a logarithmic scale, with the 0.1 unit ticks clearly visible on the Y axis. The upper (1000-1980) curve starts at the 1.4 tick, and at no other place is it higher than that. The entire visible scale only goes to about 1.7.

  13. Lee
    Posted Apr 26, 2006 at 10:55 AM | Permalink

    ummm… except there aren’t enough ticks between zero and one. And the spacing (by mark-1 eyeball) looks off a bit at the top of the interval between 0 and 1; it looks like they indicated a single point between 0.7 and 1, instead of 0.8 and 0.9?
    I still don’t see how one can arrive, even by squinting, at a value of 0.6 from that, though. The tick-marks, even with the missing mark, are clearly not linear.

  14. Lee
    Posted Apr 26, 2006 at 10:57 AM | Permalink

    oh, brain farts. Never mind. This is what happens when one tries to think on 4 hours sleep.

  15. TCO
    Posted Apr 26, 2006 at 11:06 AM | Permalink

    I had same question about the squinting. Is it a decibel scale? And did they put a zero on a log scale?

  16. Lee
    Posted Apr 26, 2006 at 11:30 AM | Permalink

    Thus my gibberish above; in my sleep deprived state, I was **reading** a log scale starting at 0.

    Putting a zero at the origin on a log scale is unfortunately very easy to do; I’ve worked with statistical graphing apps (in the late 90s, as a matter of fact) that put a zero at the origin by default.

  17. TCO
    Posted Apr 26, 2006 at 11:35 AM | Permalink

    Yeah…I agree, it’s a pedantic point. I had the same questions about how Steve “eyeballs” 1.6. It’s like the stuff with the omitted part of the spaghetti graph, which Steve has written about on and on but never bothered to produce a graphic that clearly shows the problem.

  18. TCO
    Posted Apr 26, 2006 at 11:37 AM | Permalink

    It looks like 1.4 to me, just counting tick marks.

  19. Lee
    Posted Apr 26, 2006 at 11:54 AM | Permalink

    It is about 1.4 only if those are linear tick-marks (starting at zero) placed on logarithmic intervals with the log scale restarting at units… which is the embarrassing misreading I made with that zero in there. On a logarithmic scale, that upper curve starts at about 4.

  20. Spence_UK
    Posted Apr 26, 2006 at 12:13 PM | Permalink

    I vote the upper curve starts at 5

    Any more takers 😉

  21. Lee
    Posted Apr 26, 2006 at 12:31 PM | Permalink

    Spence – damn, I can’t count either. I’m gonna shut up now and stop embarrassing myself – I need much more sleep. It’s absurd what a 4am conference call, after getting home at midnight, does to my ability to handle even elementary ideas.

  22. jae
    Posted Apr 26, 2006 at 12:58 PM | Permalink

    Good grief, you guys! It is clear that the upper curve meets the y-axis at 5, and the lower curve meets it at approximately 2.2.

  23. Paul K
    Posted Apr 26, 2006 at 12:59 PM | Permalink

    I see a log scale there as well. As it happens, the natural log of 5 is … 1.6?

  24. TCO
    Posted Apr 26, 2006 at 1:29 PM | Permalink

    I see the 5 and the 2.2 now. The zero is really 0.1, right?

  25. Bruce
    Posted Apr 26, 2006 at 2:07 PM | Permalink

    Clearly MBH have a good grasp of log scales by the way they have labelled the Y-axis of this graph!

    Actually, on the information provided, it is not possible to say what the Y-Axis scale is. I agree that we could “guess” (why is it that we have to “guess” a lot to try and figure out what MBH and the Hockey Team are on about??) that each tick below the 1 is 0.1 and each tick above 1 is 1. If that is the case, then the 0 at the origin should be 0.1 and the top of the graph is 8 (and the upper curve starts at 5 as several posters suggest).

    However, it is possible with log scales that the ticks could represent other values. For example, the tick above 1 COULD (perhaps unlikely but…) actually be 10, which would give a different set of values below 1, and make the origin a much smaller number (too hard to work it out). Or it could be a 5 or any other number.

    The point is that professionalism requires people like MBH to get this stuff right if they wish to be taken seriously. Putting 0 at the origin of what is clearly a logarithmic scale is just a basic, elementary mistake that throws doubt on their work in the same way that poor spelling (“there” for “their” for example) would.

  26. John A
    Posted Apr 26, 2006 at 2:22 PM | Permalink

    The point is that professionalism requires people like MBH to get this stuff right if they wish to be taken seriously.

    But they’ve “moved on” and they clearly didn’t expect anyone to go through this with as fine a tooth comb as Steve has.

  27. Steve McIntyre
    Posted Apr 26, 2006 at 2:31 PM | Permalink

    Lee and TCO, I agree that it’s 1.4. I was trying too hard to find 1.6 and was tired and frustrated with this piece of crap statistics and was adding in the dashed lines. So it’s 1.4 and 1.6 remains unexplained – not that this would have been an actual explanation.

    I will edit to reflect this.

  28. Steve McIntyre
    Posted Apr 26, 2006 at 2:43 PM | Permalink

    Following added above: The relevant section of Stoica says that the estimates of the Fourier coefficients through the periodogram are inconsistent and do not converge with N, but behave as a random variable. This doesn’t sound promising as a method of estimating an adjustment, that’s for sure. Now Mann estimates his spectrum using Thomson’s multitaper method – Stoica did not discuss this methodology; the purpose of the multitaper method is to reduce the variance of the estimate, so probably it’s less bad than the periodogram, but still the whole procedure, whatever it is, doesn’t sound promising. Having said that, doing nothing is not an alternative either.

  29. Louis Hissink
    Posted Apr 26, 2006 at 2:53 PM | Permalink

    # 25

    If Bruce is correct and it’s a log scale, then the next increment above 1 would be 2. That said, Steve, Lee and TCO would also be right if the next increment is 1.1 blah blah. Still makes it a log scale for Y but why plot it as such?

    Crap statistics? Stronger words come to mind for this dodgy plot labelling!

    One thing is becoming clear from all these statistics – the theory sure isn’t self evident, for if it was, there would be no need for all this statistical gobbledygook.

    In other words – AGW was, is and will forever be a crock.

  30. jae
    Posted Apr 26, 2006 at 3:08 PM | Permalink

    It sure is strange (in a sad and disgusting sort of way) that Mann will not just answer these simple questions.

  31. John A
    Posted Apr 26, 2006 at 5:03 PM | Permalink

    It’s clear to me that Mann was over his head with the statistics. No wonder he won’t respond.

  32. Mark
    Posted Apr 26, 2006 at 5:17 PM | Permalink

    It is my opinion that he was implementing something someone else suggested, without actually ever trying to understand the underlying theory. Bad idea with something this public as, obviously, somebody will eventually uncover the sleight of hand.

    Mark

  33. jae
    Posted Apr 26, 2006 at 5:54 PM | Permalink

    His silence is deafening. What is the poor guy to do now?

  34. TCO
    Posted Apr 26, 2006 at 9:26 PM | Permalink

    I’m in the “5” camp now. BTW, why do we only care about the top curve for purpose of the correction factor? What is the implication of each curve in general?

  35. Spence_UK
    Posted Apr 27, 2006 at 2:32 AM | Permalink

    I had a look at the Mann and Lees paper a while ago, but quickly got bogged down, and decided I needed to get up to speed on the MTM stuff. So I looked at a paper by Michael Ghil on the topic… all seemed fairly reasonable, perhaps of limited practical value, but in certain circumstances should marginally help detectability of oscillations in noisy data. Although reading his work is a bit like swimming through treacle – these guys like to wax lyrical in their scientific scribblings.

    Anyway, I thought, all very interesting, what has this got to do with confidence intervals? Then, last night, I briefly re-scanned Mann and Lees and realised I missed a chapter right at the end about confidence intervals. D’oh.

    I only scanned it briefly, but a pair of words leapt out at me. “Locally white”. I haven’t read it carefully enough to be sure (too busy – honest!) but I wonder if they are splitting their power spectrum up into (say) n sub-spectra, working out their contribution to the error, then reconstituting this into a confidence interval?

    It strikes me that this probably wouldn’t give a terrible answer for n-1 of the sub-spectra. But if you applied this at the very low frequency end, you’ll run into big trouble. Anything in the calibration period much below a 79 year cycle will be suppressed by the regression step. Even then, no matter how good MTM is, I doubt it would be able to detect much below (say) 158 year cycles anyway – the median filter begins to break down at the edge of the spectrum, so it becomes less accurate there.

    If you assume that the residual power spectrum is 1/f, the noise power at (say) 790 year cycles will be ten times greater than that at 79 year cycles. If Mann has assumed the power spectrum is “locally white” they will be applying a noise power at 790 years of one times that at 79 years.
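
The arithmetic in the paragraph above can be checked with a small sketch, assuming a pure power-law spectrum S(f) proportional to 1/f^a (a = 1 for the 1/f case, a = 0 for white noise). The function name `power_ratio` is made up for the example.

```python
def power_ratio(period_long, period_short, exponent=1.0):
    """Ratio S(f_long) / S(f_short) for a spectrum S(f) proportional to 1 / f**exponent."""
    f_long, f_short = 1.0 / period_long, 1.0 / period_short
    return (f_short / f_long) ** exponent

print(power_ratio(790, 79))       # 1/f noise: about 10 times the power at 790 years
print(power_ratio(790, 79, 0.0))  # white noise: the same power at both periods
```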

    Caveat as ever, I haven’t read Mann and Lees carefully enough to be sure this is what they are doing. It is just what the expression “locally white” says to me. I could be very, very wrong 😉

  36. Jean S
    Posted Apr 27, 2006 at 4:33 AM | Permalink

    Steve, I didn’t mean that the Stoica reference was the answer to the question. What I meant was that it explains why you probably should not proceed as they do in Mann&Lees.

    Anyhow, I was able to figure out the MBH99 secret, it is actually (once again) a very simple thing. Luckily, Mann included those “ignore these” columns, otherwise it would have been impossible to figure out. The answer has nothing to do with Mann&Lees (as I originally thought), but Mann&Lees turned out to be a “nice” introduction to Mannian jargon and thinking. I’ll e-mail the solution to Steve today, so he can check it. For those not wanting to wait until Steve reveals the secret, I’ll just say that the key word in MBH99 is “composite” and its relation to those “ignore these” columns…

    re #35: The Mann&Lees procedure is actually very simple. It’s just buried deep in Mannian jargon. I invite anyone with a decent knowledge of spectral techniques to take a look at the paper for the purpose of actually seeing how these guys write papers… The procedure is roughly the following (excuse the improper terminology, this is not exactly my field):
    1) The idea is to measure the red noise level in the observed signal. Why they had to invent their own method for doing that, when the literature is full of such methods, I don’t know. I guess you can find a partial answer in the paragraph on p. 431.
    2) First they calculate the spectrum estimate (Thomson’s method).
    3) Then they apply a running median filter to (smooth) this estimate (this is why the procedure is called “robust” everywhere!) to get what they call the “spectrum background”. I guess the idea here is that median filtering is supposed to remove the contribution of the actual (“climatic” in their case) signal, or something.
    4) Then they numerically (!!!) minimize the squared error between the “background spectrum estimate” and the theoretical spectrum of AR(1) noise for the parameters of interest, i.e. for the noise variance \sigma^2 and lag-one coefficient \rho.
    That’s it!
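
The steps listed in #36 can be sketched roughly as follows. This is only an illustration of the general idea and makes several assumptions that are not from Mann&Lees: a raw periodogram stands in for Thomson’s multitaper estimate, a grid search over \rho stands in for their numerical minimiser, the AR(1) spectrum is written in the textbook form \sigma^2 / (1 - 2\rho cos(2\pi f) + \rho^2) with f in cycles per time step, and the window width and all function names are made up.

```python
import cmath
import math
import random
import statistics

def periodogram(x):
    """Raw periodogram at Fourier frequencies k/n, k = 1 .. n//2 - 1."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2):
        s = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))
        freqs.append(k / n)
        power.append(abs(s) ** 2 / n)
    return freqs, power

def running_median(values, half_width=5):
    """Median smoother; the window shrinks near the ends of the series."""
    return [statistics.median(values[max(0, i - half_width): i + half_width + 1])
            for i in range(len(values))]

def fit_ar1_spectrum(freqs, background):
    """Least-squares fit of sigma^2 / (1 - 2 rho cos(2 pi f) + rho^2) to the background.

    rho is found by grid search; for each rho the best sigma^2 has a closed form.
    """
    best = None
    for step in range(100):
        rho = step / 100
        shape = [1.0 / (1.0 - 2.0 * rho * math.cos(2.0 * math.pi * f) + rho * rho)
                 for f in freqs]
        sigma2 = sum(b * g for b, g in zip(background, shape)) / sum(g * g for g in shape)
        err = sum((b - sigma2 * g) ** 2 for b, g in zip(background, shape))
        if best is None or err < best[0]:
            best = (err, rho, sigma2)
    return best[1], best[2]

def simulate_ar1(n, rho, seed=0):
    """Simulated red noise x_t = rho * x_{t-1} + unit-variance white noise."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x

freqs, power = periodogram(simulate_ar1(1024, 0.6))
rho_hat, sigma2_hat = fit_ar1_spectrum(freqs, running_median(power))
print(rho_hat, sigma2_hat)
```

On simulated red noise with \rho = 0.6 the fitted \rho should land in the right neighbourhood. The fitted \sigma^2 will tend to come out low, because the median of a periodogram ordinate sits below its mean.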

  37. Louis Hissink
    Posted Apr 27, 2006 at 4:57 AM | Permalink

    Steve should answer this but writing as a physical scientist, graphs or plots are supposed to be physically meaningful – ie plot concentration of gold as Y axis and distance from source on X axis, and you have a plain simple presentation of some spatially complex facts. A no brainer, as they say.

    But when PCA and other statistical derivations are plotted, one quickly loses sight of the physical connection of the data and becomes embroiled in the, admittedly really interesting, mathematical properties of the derived data. I experienced that when Geostatistics started to become significant in mining and mineral exploration. Except that in the case I know about personally, a start from first principles was deemed mandatory. Once that was done, geostatistics became rather mundane and overhyped.

    So in mineral exploration we cannot allow ourselves such intellectual luxuries of arguing over statistical minutiae but apparently in academia, these days, it is quite normal, since operating profitably is an unknown experience. (Well, no, since academics, when doing outside consulting, sure know how to charge, so the profit and loss concept is not unknown to them).

    We (mining types) would call it another “flight from reality” instance.

    This seems to have occurred in astronomy where the maths are more important than the observations – black holes, for example, which were initially inferred from the maths (a black hole is essentially a point in 3-D space which has no volume but infinite mass), and as the maths so spake, so did the astronomers so search.

    It never occurred to them that the maths might be in error.

    It never occurs to the faithful that their beliefs might be misplaced.

    And the rest of us have to put up with the crap that this blind adherence to dogma, whether scientific or theological, generates.

    Hence this blog which Steve runs.

  38. John A
    Posted Apr 27, 2006 at 5:55 AM | Permalink

    (a black hole is essentially a point in 3-D space which has no volume but infinite mass)

    No it isn’t. Black holes have a definite finite mass.

  39. Louis Hissink
    Posted Apr 27, 2006 at 6:02 AM | Permalink

    Ahem,

    A black hole is derived from D = M/V where V ==> 0, ok, you are right, a finite mass in no volume.

    Nyyaaaaa!

  40. Louis Hissink
    Posted Apr 27, 2006 at 6:03 AM | Permalink

    🙂 🙂

  41. John A
    Posted Apr 27, 2006 at 6:31 AM | Permalink

    Stay away from astrophysics, Louis, you’re out of your depth.

  42. Spence_UK
    Posted Apr 27, 2006 at 6:31 AM | Permalink

    Re #36

    Thanks for the info Jean, sounds like I was a bit wide of the mark – I was assuming they had crudely mapped a red noise distribution into a short series of white noise distributions, but from your description their methods are not what I had assumed.

    The continual guesswork required to figure out the obscure methods they apply when there are perfectly good “off-the-shelf” methods is hugely frustrating.

  43. Louis Hissink
    Posted Apr 27, 2006 at 7:09 AM | Permalink

    What?

    By depth/zero John ?

    🙂

  44. Jean S
    Posted Apr 27, 2006 at 7:23 AM | Permalink

    re #42: No problem, at least you don’t have to spend as many hours with the damn paper as I did. BTW, read the “Acknowledgements”: they had one(!) reviewer. Also one of the names listed really pops up. The hockey team truly seems to be one small happy family!

  45. Terry
    Posted Apr 28, 2006 at 9:07 PM | Permalink

    Fascinating thread folks. Thanks.

  46. nanny_govt_sucks
    Posted Apr 29, 2006 at 2:04 AM | Permalink

    Maybe someone from Northern California can ask Mann himself about this secret formula. He’ll be speaking to a bunch of tie-dyed-shirt-wearing vegans at eco-friendly UC Santa Cruz on May 10th. I’m sure that amongst all the glowing praise from his supporters he’d appreciate a chance to clarify the confidence intervals question.

    http://currents.ucsc.edu/05-06/05-01/brief-mann.asp