First, on page 15, they say:
This result (as well as the separate study by Wahl and Ammann [2007]) thus refutes the previously made claim by MM05 that the features of the MBH98 reconstruction are somehow an artifact arising from the use of PC summaries to represent proxy data.
Well, if we had argued that the only problem in MBH98/99 is the PC error, and that the PC error alone produces the MBH hockey stick, then this paper and its triumphant conclusion might count for something. But we argued something a tiny bit more complex (though not much). The PCs were done wrong, and this had two effects. (1) It overstated the importance of the bristlecones in the NOAMER network, justifying keeping them in even though they're controversial for the purpose and their exclusion overturns the results. (2) It overstated the explanatory power of the model when checked against a null RE score (RE = 0), since red noise fed into Mann's PC algorithm yields a much higher null value (RE > 0.5), because the erroneous PC algorithm bends the PC1 to fit the temperature data. Mann's new paper doesn't mention the reliance on bristlecones. Nobody questions that you can get hockey sticks even if you fix the PC algorithm, as long as you keep the bristlecones. But if you leave them out, you don't get a hockey stick, no matter what method you use. Nor, as far as I can tell, does this paper argue that MBH98 actually is significant, and certainly the Wahl & Ammann recalculations (http://www.climateaudit.org/?p=564) should put that hope to rest.
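To make the benchmarking point concrete, here is a minimal sketch (my own illustration, not the MBH code, and without the decentred PC step) of how a null distribution for RE is built by feeding pure red noise through the calibration step. The relevant benchmark is the upper tail of this distribution, not RE = 0; the "temperature" series here is just an AR(1) stand-in.

```python
# Sketch: Monte Carlo null benchmark for the RE score using red-noise pseudo-proxies.
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, rho=0.9):
    """Generate an AR(1) 'red noise' series."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.standard_normal()
    return x

def re_score(obs, pred, calib_mean):
    """Reduction of error: 1 - SSE(pred) / SSE(calibration-mean benchmark)."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - calib_mean) ** 2)

n, n_cal = 150, 80                       # calibration = first 80, verification = last 70
temp = ar1(n, rho=0.7)                   # stand-in "temperature" target, not real data

null_res = []
for _ in range(1000):
    proxy = ar1(n)                       # pure red noise, no climate signal at all
    b, a = np.polyfit(proxy[:n_cal], temp[:n_cal], 1)   # calibrate on the early window
    pred = a + b * proxy[n_cal:]
    null_res.append(re_score(temp[n_cal:], pred, temp[:n_cal].mean()))

print("95th percentile of RE under the red-noise null:", np.quantile(null_res, 0.95))
```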
Second, maybe I'm missing a nuance, but the diatribe against r2 (and by extension, Steve et moi, para 64) is misguided on two counts. They say that it doesn't reward predicting out-of-sample changes in mean and variance (paragraph 39). Where I work, we don't talk about test statistics rewarding estimations; instead we talk about them penalizing specific failures. The RE and CE scores penalize failures that r2 ignores. We argued that r2 should be viewed as a minimum test, not the only test. You can get a good r2 score but fail the RE and CE tests, and as the NRC concluded, this means your model is unreliable. But if you fail the r2 test and pass the RE test, that suggests you've got a specification that artificially imposes some structure on the out-of-sample portion that conveniently follows the target data, even though the model has no real explanatory power. Another thing that's misguided is their use of the term nonstationary. They say that RE is much better because it takes account of the nonstationarity of the data. Again, where I work, if you told a group of econometricians you have nonstationary data, then proceeded to regress the series on each other in levels (rather than first differences) and brag about your narrow confidence intervals, nobody would stick around for the remainder of your talk. And you'd hear very loud laughter as soon as the elevator doors closed up the hall.
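For readers who want the definitions spelled out, here is a toy numpy sketch (my own made-up series, standard formulas for the three scores) showing both failure patterns: a prediction that tracks every wiggle but misses the level passes r2 perfectly while failing RE and CE, and a prediction with no wiggle skill but the right level fails r2 while posting a positive RE.

```python
# Sketch: r2, RE and CE on two deliberately defective "predictions".
import numpy as np

def verification_scores(obs, pred, calib_mean):
    """Standard r2 / RE / CE verification statistics."""
    sse = np.sum((obs - pred) ** 2)
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2
    re = 1.0 - sse / np.sum((obs - calib_mean) ** 2)    # benchmark: calibration-period mean
    ce = 1.0 - sse / np.sum((obs - obs.mean()) ** 2)    # benchmark: verification-period mean
    return round(r2, 2), round(re, 2), round(ce, 2)

rng = np.random.default_rng(1)
wiggle = 0.3 * np.sin(np.linspace(0, 12, 60))
obs = 1.0 + wiggle + 0.05 * rng.standard_normal(60)     # verification mean ~1; calibration mean taken as 0

# Case 1: perfect wiggle-tracking but a constant offset -> r2 = 1, RE and CE strongly negative
print(verification_scores(obs, obs + 2.0, calib_mean=0.0))

# Case 2: right level, zero wiggle skill -> r2 ~ 0, RE comfortably positive, CE negative
print(verification_scores(obs, 1.0 + 0.3 * rng.standard_normal(60), calib_mean=0.0))
```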
Third, what's missing here is any serious thought about the statistical modeling. I get the idea that they came across an algorithm used to infill missing data, and somebody thought, whoa, that could be used for the proxy reconstruction problem, and thus was born RegEM. Chances are (just a guess on my part) the people developing computer algorithms to fill in random holes in data matrices weren't thinking about tree rings and climate when they developed the recursive data algorithm. When you apply an algorithm developed for problem A to a totally different problem B, you need to be careful that the special features of B don't affect how you interpret the output of the algorithm.
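For concreteness, here is a cartoon of the generic "infill by iterated regression" idea being described (my own sketch, not the actual RegEM code; it drops the covariance correction a full EM step carries, and the small `lam` ridge term only marks where a regularization knob would enter):

```python
# Cartoon of EM-style infilling of missing entries under a multivariate-normal model.
import numpy as np

def em_impute(X, n_iter=50, lam=1e-3):
    """Fill NaN holes: E-step replaces each hole with its conditional expectation
    given the observed entries; M-step re-estimates the mean and covariance."""
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])      # start from column means
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        S = np.cov(X, rowvar=False)
        for i in range(X.shape[0]):
            m = miss[i]
            if m.any() and (~m).any():
                o = ~m
                Soo = S[np.ix_(o, o)] + lam * np.eye(o.sum())   # tiny ridge for stability
                X[i, m] = mu[m] + S[np.ix_(m, o)] @ np.linalg.solve(Soo, X[i, o] - mu[o])
    return X

# Toy usage: knock 10% of entries out of a correlated Gaussian matrix and refill them.
rng = np.random.default_rng(0)
Z = rng.multivariate_normal([0, 0, 0], [[1, .8, .5], [.8, 1, .6], [.5, .6, 1]], size=200)
Z[rng.random(Z.shape) < 0.1] = np.nan
filled = em_impute(Z)
```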
For example, in the case of proxies and temperature, it is obvious that there is a direction of causality: tree growth doesn't drive the climate. In statistical modeling, the distinction between exogenous and endogenous variables matters acutely. If the model fails to keep the two apart, so that endogenous variables appear directly or indirectly on the right-hand side of the equals sign, you violate the assumptions under which the identification of structural parameters and the distribution of the test statistics are derived. Of the specification errors in statistical modeling, failure to handle endogeneity is among the worst, because it leads to both bias and inconsistency. A comment like (para 13) "An important feature of RegEM in the context of proxy-based CFR is that variance estimates are derived in addition to expected values" would raise alarm bells in econometrics. It sounds like that guy in Spinal Tap who thinks his amp is better than the others because the dial on his goes up to 11. So, the stats package spits out a column of numbers called variances. My new program is better than the old one because the dial goes up to variances.
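A small simulation (my own, purely illustrative) shows why this particular specification error is so damaging: when the regressor is correlated with the error term, OLS misses the true coefficient by roughly the same margin no matter how much data you pile on.

```python
# Sketch: bias AND inconsistency of OLS under endogeneity, versus an exogenous regressor.
import numpy as np

rng = np.random.default_rng(2)
beta = 1.0                                       # true structural coefficient

for n in (100, 10_000, 1_000_000):
    u = rng.standard_normal(n)                   # structural error term
    x_exog = rng.standard_normal(n)              # exogenous regressor: independent of u
    x_endog = rng.standard_normal(n) + 0.8 * u   # endogenous regressor: correlated with u
    y_exog = beta * x_exog + u
    y_endog = beta * x_endog + u
    b_exog = np.polyfit(x_exog, y_exog, 1)[0]
    b_endog = np.polyfit(x_endog, y_endog, 1)[0]
    # the endogenous estimate sticks near ~1.49 even as n grows: inconsistency
    print(f"n={n:>9,}  exogenous OLS: {b_exog:.3f}   endogenous OLS: {b_endog:.3f}")
```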
It's just a computer program, and it won't tell you whether those numbers are gibberish in the context of your model and data (as they would be if you have nonstationary data). You need a statistical model to interpret the numbers and to show to what extent the moments and test statistics approximate the quantities you are interested in.
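The classic illustration of that last parenthesis is the spurious-regression effect: regress two independent random walks on each other in levels and the "significant" t-statistics arrive on cue. A toy version (mine, not from the paper):

```python
# Sketch: spurious regression between two independent random walks in levels.
import numpy as np

rng = np.random.default_rng(3)
n, n_sims = 500, 200
t_stats = []
for _ in range(n_sims):
    x = np.cumsum(rng.standard_normal(n))        # two independent random walks
    y = np.cumsum(rng.standard_normal(n))
    xc, yc = x - x.mean(), y - y.mean()
    b = (xc @ yc) / (xc @ xc)                    # OLS slope of y on x, in levels
    resid = yc - b * xc
    se = np.sqrt(resid @ resid / (n - 2) / (xc @ xc))
    t_stats.append(abs(b / se))

print("share of |t| > 2 between unrelated series:", np.mean(np.array(t_stats) > 2))
# First-differencing the series (making them stationary) removes the effect.
```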
6 Comments
My alarm bells went wild after reading this (my bold):
I think what they’re calling “regularization” is usually referred to as ridge regression (though maybe there’s a difference I didn’t pick up on). Ridge regression introduces a bias in the slope estimator as a tradeoff for a reduction in the trace of the variance matrix, which makes the standard errors smaller. But you have to make a case why the tradeoff is valid since the ridge parameter can be arbitrary. As far as I know it’s usually associated with collinearity problems, and if you use it you’re expected to show that the size of the introduced bias is small.
The claim that RegEM’s properties are “demonstrably optimal in the limit of no regularization” amounts to saying that ridge regression has the advantage that if you don’t do ridge regression it reduces to OLS, and OLS is optimal, in those cases where OLS is the optimal estimator. In other words, if we weren’t doing what we are doing, we’d be doing something else, which might be optimal if what we were doing happened to fit the optimality conditions.
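A small numpy sketch (mine, not theirs) of both points above: with nearly collinear regressors, ridge shrinkage buys a much smaller coefficient variance at the price of bias, and setting the ridge parameter to zero collapses it back to plain OLS.

```python
# Sketch: ridge vs OLS under near-collinearity, true coefficients [1, 1].
import numpy as np

rng = np.random.default_rng(4)
beta = np.array([1.0, 1.0])

def ridge(X, y, lam):
    """Ridge estimator; lam = 0 is exactly OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

ests = {0.0: [], 10.0: []}
for _ in range(2000):
    x1 = rng.standard_normal(50)
    x2 = x1 + 0.05 * rng.standard_normal(50)     # nearly collinear regressors
    X = np.column_stack([x1, x2])
    y = X @ beta + rng.standard_normal(50)
    for lam in ests:
        ests[lam].append(ridge(X, y, lam))

for lam, b in ests.items():
    b = np.array(b)
    # lam = 0: roughly unbiased but wildly variable; lam = 10: shrunk (biased) but stable
    print(f"lam={lam:>4}: mean estimate {b.mean(axis=0).round(2)}, variance {b.var(axis=0).round(2)}")
```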
There could be feedback but, yes, it’s unlikely that tree ring data would represent a cause in Pearl’s sense.
One of the pitfalls with the EM algorithm: its results are best used when the data omissions are not meaningful, i.e., caused by random events such as coding errors rather than, say, data collection ceasing upon the subject’s death. A second pitfall is failing to realize that it only produces expected results by filling in the most likely value. This is useful, say, when training a Bayes net, but I agree it’s of questionable value for learning something new. One exception, though, might be learning the values of a hidden variable.
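A tiny simulation of that first pitfall (my own numbers): infilling is harmless when the holes are random, but if the omissions are meaningful, say the largest values are the ones that drop out, the infilled series is biased regardless of how the holes are filled.

```python
# Sketch: infilling under random vs meaningful (value-dependent) omissions.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=10.0, scale=2.0, size=10_000)

random_miss = rng.random(x.size) < 0.3           # missing completely at random
informative_miss = x > np.quantile(x, 0.7)       # missing because the value is large

for label, miss in [("random omissions", random_miss),
                    ("meaningful omissions", informative_miss)]:
    filled = x.copy()
    filled[miss] = x[~miss].mean()               # infill with the observed mean
    print(f"{label:>22}: true mean {x.mean():.2f}, infilled mean {filled.mean():.2f}")
```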
Speaking of omission, I admit I haven’t read the paper yet. I’ll have to correct that soon. It’s a bit hard to imagine a cause of meaningful omissions in tree ring data.
Yeah. That’s bizarre. If anything, the missing value is replaced with an estimate biased by the other data.
… which is nice since this purely hypothetical advantage is not offered by other current CFR methods (and also we might bull a reviewer or two with these fancy sounding sentences) 🙂
Ross, I hope you and Steve write up your criticisms of Mann et al (2007) and submit them to JGR. I think your findings, and Steve’s remarks on their idiotic (and repeated) scrambling of their data set, should be put into the formal literature.
[shakes head in disbelief]
Peter D. Tillman
Consulting Geologist, Arizona and New Mexico (USA)
Interesting comments about filling in gaps in time series. As a hardware EE, I have found the best implementation, from both a cost and effect perspective, is just to use random noise or to repeat the previous data packet(s). These had the least effect upon the spectral characteristics.
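For what it's worth, here is a rough sketch (my own, with made-up packet sizes and a hypothetical 50 Hz test signal) of how one might check that claim: patch a dropped packet either by repeating the previous packet or with variance-matched noise, then compare the power spectrum of the patched stream against the original.

```python
# Sketch: spectral impact of two simple gap-filling strategies for a dropped packet.
import numpy as np

rng = np.random.default_rng(6)
fs, n, pkt = 1000, 4096, 128                     # sample rate, record length, packet size
t = np.arange(n) / fs
sig = np.sin(2 * np.pi * 50 * t) + 0.2 * rng.standard_normal(n)

lost = slice(2048, 2048 + pkt)                   # one dropped packet

patched_repeat = sig.copy()
patched_repeat[lost] = sig[2048 - pkt:2048]      # repeat the previous packet

patched_noise = sig.copy()
patched_noise[lost] = sig.std() * rng.standard_normal(pkt)   # variance-matched noise

def spectrum(x):
    """Windowed power spectrum."""
    return np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2

for name, x in [("repeat previous packet", patched_repeat), ("random noise", patched_noise)]:
    err = np.abs(spectrum(x) - spectrum(sig)).sum() / spectrum(sig).sum()
    print(name, "-> relative spectral change:", round(err, 3))
```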