First, on page 15, they say:

This result (as well as the separate study by Wahl and Ammann [2007]) thus refutes the previously made claim by MM05 that the features of the MBH98 reconstruction are somehow an artifact arising from the use of PC summaries to represent proxy data.

Well, if we had argued that the only problem in MBH98/99 is the PC error, and that the PC error alone produces the MBH hockey stick, then this paper and its triumphant conclusion might count for something. But we argued something a tiny bit more complex (though not much). The PCs were done wrong, and this had two effects. (1) It overstated the importance of the bristlecones in the NOAMER network, justifying keeping them in even though they’re controversial for the purpose and their exclusion overturned the results. (2) It overstated the explanatory power of the model when checked against a null RE score (RE=0), since red noise fed into Mann’s PC algorithm yielded a much higher null value (RE>0.5), because the erroneous PC algorithm bends the PC1 to fit the temperature data. Mann’s new paper doesn’t mention the reliance on bristlecones. Nobody questions that you can get hockey sticks even if you fix the PC algorithm, as long as you keep the bristlecones. But if you leave them out, you don’t get a hockey stick, no matter what method you use. Nor, as far as I can tell, does this paper argue that MBH98 actually is significant, and certainly the Wahl & Ammann recalculations (http://www.climateaudit.org/?p=564) should put that hope to rest.
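To make the PC point concrete, here is a rough sketch of how centering on the calibration period alone can bend PC1 of pure red noise. Everything in it is an illustrative assumption rather than the MBH98 code or data: the AR(1) coefficient, the 581-year by 70-series network, the 79-year calibration window, and the helper names are all hypothetical, and the sketch omits the standardization steps of the original method.

```python
import numpy as np

def ar1_noise(n_years, n_series, rho, rng):
    """Independent AR(1) 'red noise' series, one per column, with no signal."""
    shocks = rng.standard_normal((n_years, n_series))
    x = np.zeros((n_years, n_series))
    x[0] = shocks[0] / np.sqrt(1.0 - rho**2)  # start in the stationary distribution
    for t in range(1, n_years):
        x[t] = rho * x[t - 1] + shocks[t]
    return x

def pc1(data, center_rows):
    """First principal component after subtracting column means
    computed over center_rows only."""
    centered = data - data[center_rows].mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def hockey_stick_index(series, n_cal):
    """|mean of the last n_cal values - overall mean| / overall std."""
    return abs(series[-n_cal:].mean() - series.mean()) / series.std()

# Assumed, illustrative dimensions (not taken from the post).
N_YEARS, N_SERIES, N_CAL, RHO = 581, 70, 79, 0.9
rng = np.random.default_rng(0)

hsi_full, hsi_short = [], []
for _ in range(20):
    noise = ar1_noise(N_YEARS, N_SERIES, RHO, rng)
    # Conventional PCA: center each series on its full-period mean.
    hsi_full.append(hockey_stick_index(pc1(noise, slice(None)), N_CAL))
    # Short-centering: center each series on the calibration window only.
    hsi_short.append(hockey_stick_index(pc1(noise, slice(-N_CAL, None)), N_CAL))

mean_hsi_full = float(np.mean(hsi_full))
mean_hsi_short = float(np.mean(hsi_short))
print("mean |HSI|, full centering: ", mean_hsi_full)
print("mean |HSI|, short centering:", mean_hsi_short)
```

The absolute value is used because the sign of a principal component is arbitrary. The comparison of interest is only the relative size of the two averages; the sketch says nothing about the bristlecone issue, which is a separate point.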

Second, maybe I’m missing a nuance, but the diatribe against r2 (and by extension, Steve et moi, para 64) is misguided on two counts. They say that it doesn’t reward predicting out-of-sample changes in mean and variance (paragraph 39). Where I work, we don’t talk about test statistics “rewarding” estimations; instead we talk about them penalizing specific failures. The RE and CE scores penalize failures that r2 ignores. We argued that r2 should be viewed as a minimum test, not the only test. You can get a good r2 score but fail the RE and CE tests, and as the NRC concluded, this means your model is unreliable. But if you fail the r2 test, and you pass the RE test, that suggests you’ve got a specification that artificially imposes some structure on the out-of-sample portion that conveniently follows the target data, even though the model has no real explanatory power. Another thing that’s misguided is their use of the term “nonstationary”. They say that RE is much better because it takes account of the nonstationarity of the data. Again, where I work, if you told a group of econometricians you have nonstationary data, then proceeded to regress the series on each other in levels (rather than first differences) and brag about your narrow confidence intervals, nobody would stick around for the remainder of your talk. And you’d hear very loud laughter as soon as the elevator doors closed up the hall.
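The difference between what r2, RE and CE each penalize can be shown with a toy verification period. This is a sketch using the usual textbook definitions (RE benchmarks against the calibration-period mean, CE against the verification-period mean, r2 is the squared correlation); the numbers and the function name are made up for illustration and are not from any reconstruction.

```python
import numpy as np

def verification_stats(observed, predicted, calibration_mean):
    """Return r2, RE and CE for a verification period."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse = np.sum((observed - predicted) ** 2)
    r2 = np.corrcoef(observed, predicted)[0, 1] ** 2
    # RE: skill relative to always predicting the calibration-period mean.
    re = 1.0 - sse / np.sum((observed - calibration_mean) ** 2)
    # CE: skill relative to always predicting the verification-period mean.
    ce = 1.0 - sse / np.sum((observed - observed.mean()) ** 2)
    return {"r2": float(r2), "RE": float(re), "CE": float(ce)}

# Hypothetical verification data whose mean (0.0) differs from the
# calibration mean (1.0).
observed = np.array([0.3, -0.3] * 4)
calibration_mean = 1.0

# Case A: follows every wiggle but gets the level and amplitude wrong.
tracks_wiggles = 2.0 * observed + 1.5
# Case B: sits at the right level but its wiggles are unrelated to the target.
tracks_level = np.array([0.1, 0.1, -0.1, -0.1] * 2)

stats_wiggles = verification_stats(observed, tracks_wiggles, calibration_mean)
stats_level = verification_stats(observed, tracks_level, calibration_mean)
print("case A:", stats_wiggles)
print("case B:", stats_level)
```

Case A is the "good r2, fails RE and CE" situation. Case B is the reverse: a high RE that comes entirely from landing on the right mean level, with essentially zero r2 and a negative CE.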

Third, what’s missing here is any serious thought about the statistical modeling. I get the idea that they came across an algorithm used to infill missing data, and somebody thought “whoa, that could be used for the proxy reconstruction problem”, and thus was born RegEM. Chances are (just a guess on my part) people developing computer algorithms to fill in random holes in data matrices weren’t thinking about tree rings and climate when they developed the recursive data algorithm. You need to be careful when applying an algorithm developed for problem A to a totally different problem B, that the special features of B don’t affect how you interpret the output of the algorithm.

For example, in the case of proxies and temperature, it is obvious that there is a direction of causality: tree growth doesn’t drive the climate. In statistical modeling, the distinction between exogenous and endogenous variables matters acutely. If the model fails to keep the two apart, such that endogenous variables appear directly or indirectly on the right-hand side of the equals sign, you violate the assumptions on which the identification of structural parameters and the distribution of the test statistics are derived. Of the specification errors in statistical modeling, failure to handle endogeneity is among the worst because it leads to both bias and inconsistency. A comment like (para 13) “An important feature of RegEM in the context of proxy-based CFR is that variance estimates are derived in addition to expected values” would raise alarm bells in econometrics. It sounds like that guy in Spinal Tap who thinks his amp is better than the others because the dial on his goes up to 11. So, the stats package spits out a column of numbers called variances. “My new program is better than the old one because the dial goes up to variances.”

It’s just a formula in a computer program, and it won’t tell you if those numbers are gibberish in the context of your model and data (as they would be if you have nonstationary data). You need a statistical model to interpret the numbers and show to what extent the moments and test statistics approximate the scores you are interested in.
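The standard illustration of why the printed numbers can be gibberish with nonstationary data is the spurious regression. The sketch below is a generic simulation, not anything taken from the paper under discussion: it regresses pairs of independent random walks on each other, first in levels and then in first differences, and counts how often the usual t-test on the slope "finds" a relationship at the nominal 5% level. The sample size, number of simulations and function names are arbitrary choices.

```python
import numpy as np

def slope_t_stat(x, y):
    """Textbook OLS t-statistic for the slope in y = a + b*x + e."""
    x_c = x - x.mean()
    slope = (x_c @ (y - y.mean())) / (x_c @ x_c)
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    s2 = (resid @ resid) / (len(x) - 2)
    return slope / np.sqrt(s2 / (x_c @ x_c))

def rejection_rate(n_sims, n_obs, use_differences, rng):
    """Share of simulations in which |t| > 1.96 for two unrelated random walks."""
    rejections = 0
    for _ in range(n_sims):
        x = np.cumsum(rng.standard_normal(n_obs))
        y = np.cumsum(rng.standard_normal(n_obs))
        if use_differences:
            x, y = np.diff(x), np.diff(y)
        if abs(slope_t_stat(x, y)) > 1.96:
            rejections += 1
    return rejections / n_sims

rng = np.random.default_rng(0)
rate_levels = rejection_rate(500, 200, use_differences=False, rng=rng)
rate_differences = rejection_rate(500, 200, use_differences=True, rng=rng)
print("rejection rate, levels:           ", rate_levels)
print("rejection rate, first differences:", rate_differences)
```

The series are unrelated by construction, so a test with correct size should reject about 5% of the time. The program reports standard errors and t-statistics in both cases without complaint; only the statistical model tells you which set to believe.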

## 6 Comments

My alarm bells went wild after reading this (my bold):

I think what they’re calling “regularization” is usually referred to as ridge regression (though maybe there’s a difference I didn’t pick up on). Ridge regression introduces a bias in the slope estimator as a tradeoff for a reduction in the trace of the variance matrix, which makes the standard errors smaller. But you have to make a case why the tradeoff is valid since the ridge parameter can be arbitrary. As far as I know it’s usually associated with collinearity problems, and if you use it you’re expected to show that the size of the introduced bias is small.

The claim that RegEM’s properties are “demonstrably optimal in the limit of no regularization” amounts to saying that ridge regression has the advantage that if you don’t do ridge regression it reduces to OLS, and OLS is optimal, in those cases where OLS is the optimal estimator. In other words, if we weren’t doing what we are doing, we’d be doing something else, which might be optimal if what we were doing happened to fit the optimality conditions.
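For readers who want to see the ridge tradeoff in numbers, here is a small sketch of plain textbook ridge regression, which is assumed here only as a stand-in for “regularization”; it is not RegEM itself. The collinear design, the true coefficients and the ridge parameters are all invented for the illustration. It uses the closed forms for a fixed design: bias = (X'X + λI)⁻¹X'Xβ − β and variance = σ²(X'X + λI)⁻¹X'X(X'X + λI)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
common = rng.standard_normal(n)
# Two nearly collinear regressors (hypothetical design).
X = np.column_stack([common + 0.05 * rng.standard_normal(n),
                     common + 0.05 * rng.standard_normal(n)])
beta_true = np.array([1.0, -1.0])
sigma2 = 1.0
y = X @ beta_true + rng.standard_normal(n)

def ridge_operator(X, lam):
    """Return (X'X + lam*I)^{-1} X', the matrix mapping y to the ridge estimate."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

def ridge_bias_and_variance(X, beta, sigma2, lam):
    """Exact bias vector and trace of the variance matrix for a fixed design."""
    A = ridge_operator(X, lam)
    bias = A @ X @ beta - beta
    variance_trace = np.trace(sigma2 * A @ A.T)
    return bias, variance_trace

# With lam = 0 the ridge estimate is just OLS.
beta_ridge0 = ridge_operator(X, 0.0) @ y
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

lambdas = [0.0, 1.0, 10.0, 100.0]
bias_norms, variance_traces = [], []
for lam in lambdas:
    bias, tr = ridge_bias_and_variance(X, beta_true, sigma2, lam)
    bias_norms.append(float(np.linalg.norm(bias)))
    variance_traces.append(float(tr))
    print(f"lambda={lam:6.1f}  |bias|={bias_norms[-1]:.4f}  trace(var)={variance_traces[-1]:.4f}")
```

The standard errors shrink as the ridge parameter grows, but the bias grows with it, and nothing in the output says which value of the parameter is defensible. That case has to be made separately.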

There could be feedback but, yes, it’s unlikely that tree ring data would represent a cause in Pearl’s sense.

One of the pitfalls with the EM algorithm is that its results are best used when the data omissions are not meaningful, i.e., caused by random events such as coding error vs., say, data collection ceasing upon subject’s death. A second pitfall is failing to realize that it only produces *expected* results by filling in the most likely value. This is useful, say, when training a Bayes Net, but I agree it’s of questionable value for learning something new. One exception, though, might be learning the values of a hidden variable.

Speaking of omission, I admit I haven’t read the paper yet. I’ll have to correct that soon. It’s a bit hard to imagine a cause of meaningful omissions in tree ring data.
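The “expected value” point can be sketched with a single conditional-mean fill-in step. This is not the full EM algorithm and not RegEM, just one regression-based imputation on made-up bivariate data with values missing completely at random; the correlation, sample size and missing fraction are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
# y is correlated with x (about 0.6) and has variance close to 1.
y = 0.6 * x + 0.8 * rng.standard_normal(n)

# Knock out roughly 30% of y completely at random.
missing = rng.random(n) < 0.3

# Fit y on x using the complete cases, then fill each gap with E[y | x].
slope, intercept = np.polyfit(x[~missing], y[~missing], 1)
y_filled = y.copy()
y_filled[missing] = intercept + slope * x[missing]

var_true_missing = float(y[missing].var())
var_imputed = float(y_filled[missing].var())
print("variance of the true (hidden) values:", var_true_missing)
print("variance of the filled-in values:    ", var_imputed)
```

The filled-in values lie exactly on the fitted line, so they carry only the part of the variability that the other variable explains. Whatever was in the gaps that x could not predict is gone from the infilled series.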

Yeah. That’s bizarre. If anything, the missing value is replaced with an estimate biased by the other data.

… which is nice since this purely hypothetical advantage is not offered by other current CFR methods (and also we might bull a reviewer or two with these fancy sounding sentences) :)

Ross, I hope you and Steve write up your criticisms of Mann et al (2007) and submit them to JGR. I think your findings, and Steve’s remarks on their idiotic (and repeated) scrambling of their data set, should be put into the formal literature.

[shakes head in disbelief]

Peter D. Tillman

Consulting Geologist, Arizona and New Mexico (USA)

Interesting comments about filling in missing gaps of time series. As a hardware EE, I have found the best implementation, from both a cost and effect perspective, is just to use random noise or repeat the previous data packet(s). These had the least effect upon the spectral characteristics