First, on page 15, they say:
This result (as well as the separate study by Wahl and Ammann) thus refutes the previously made claim by MM05 that the features of the MBH98 reconstruction are somehow an artifact arising from the use of PC summaries to represent proxy data.
Well, if we had argued that the only problem in MBH98/99 is the PC error, and that the PC error alone produces the MBH hockey stick, then this paper and its triumphant conclusion might count for something. But we argued something a tiny bit more complex (though not much). The PCs were done wrong, and this had two effects. (1) It overstated the importance of the bristlecones in the NOAMER network, justifying keeping them in even though they're controversial for the purpose and their exclusion overturned the results. (2) It overstated the explanatory power of the model when checked against a null RE score (RE=0), since red noise fed into Mann's PC algorithm yielded a much higher null value (RE>0.5), because the erroneous PC algorithm bends the PC1 to fit the temperature data. Mann's new paper doesn't mention the reliance on bristlecones. Nobody questions that you can get hockey sticks even if you fix the PC algorithm, as long as you keep the bristlecones. But if you leave them out, you don't get a hockey stick, no matter what method you use. Nor, as far as I can tell, does this paper argue that MBH98 actually is significant, and certainly the Wahl & Ammann recalculations (http://www.climateaudit.org/?p=564) should put that hope to rest.
Second, maybe I'm missing a nuance, but the diatribe against r2 (and by extension, Steve et moi, para 64) is misguided on two counts. They say that it doesn't "reward" predicting out-of-sample changes in mean and variance (paragraph 39). Where I work, we don't talk about test statistics "rewarding" estimations; instead we talk about them penalizing specific failures. The RE and CE scores penalize failures that r2 ignores. We argued that r2 should be viewed as a minimum test, not the only test. You can get a good r2 score but fail the RE and CE tests, and as the NRC concluded, this means your model is unreliable. But if you fail the r2 test and pass the RE test, that suggests you've got a specification that artificially imposes some structure on the out-of-sample portion that conveniently follows the target data, even though the model has no real explanatory power. Another thing that's misguided is their use of the term "nonstationary". They say that RE is much better because it takes account of the nonstationarity of the data. Again, where I work, if you told a group of econometricians you have nonstationary data, then proceeded to regress the series on each other in levels (rather than first differences) and bragged about your narrow confidence intervals, nobody would stick around for the remainder of your talk. And you'd hear very loud laughter as soon as the elevator doors closed up the hall.
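To make the r2-versus-RE point concrete, here is a minimal sketch using the usual textbook definitions of the three verification scores. The numbers are purely synthetic (not MBH98 data), and the function name `verification_stats` and the toy series are my own illustrative assumptions. The "reconstruction" gets the shift in mean between calibration and verification periods right but tracks none of the year-to-year variation.

```python
import numpy as np

def verification_stats(obs, pred, calib_mean):
    """Return r2, RE and CE for a verification period.

    RE benchmarks the prediction errors against the calibration-period mean;
    CE benchmarks them against the verification-period mean;
    r2 is the squared Pearson correlation between obs and pred.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    sse = np.sum((obs - pred) ** 2)
    re = 1.0 - sse / np.sum((obs - calib_mean) ** 2)
    ce = 1.0 - sse / np.sum((obs - obs.mean()) ** 2)
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2
    return {"r2": r2, "RE": re, "CE": ce}

# Synthetic verification period: the mean sits 1.0 above the calibration
# mean (taken to be 0.0), with a small cycle on top.
t = np.arange(40)
obs = 1.0 + 0.3 * np.sin(2 * np.pi * t / 8)

# Toy "reconstruction": right mean level, but its wiggles are a cosine,
# which is orthogonal to the observed sine over five full cycles.
pred = 1.0 + 0.3 * np.cos(2 * np.pi * t / 8)

stats = verification_stats(obs, pred, calib_mean=0.0)
print(stats)
```

In this toy case RE comes out above 0.9 while r2 is essentially zero and CE is negative: the high RE is earned entirely by the shift in mean, which is the pattern described above of passing RE while failing r2.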
Third, what's missing here is any serious thought about the statistical modeling. I get the idea that they came across an algorithm used to infill missing data, and somebody thought "whoa, that could be used for the proxy reconstruction problem", and thus was born RegEM. Chances are (just a guess on my part) people developing computer algorithms to fill in random holes in data matrices weren't thinking about tree rings and climate when they developed the recursive data algorithm. You need to be careful when applying an algorithm developed for problem A to a totally different problem B, that the special features of B don't affect how you interpret the output of the algorithm.
For example, in the case of proxies and temperature, it is obvious that there is a direction of causality: tree growth doesn't drive the climate. In statistical modeling, the distinction between exogenous and endogenous variables matters acutely. If the model fails to keep the two apart, such that endogenous variables appear directly or indirectly on the right-hand side of the equals sign, you violate the assumptions on which the identification of structural parameters and the distribution of the test statistics are derived. Among the specification errors in statistical modeling, failure to handle endogeneity is among the worst because it leads to both bias and inconsistency. A comment like (para 13) "An important feature of RegEM in the context of proxy-based CFR is that variance estimates are derived in addition to expected values." would raise alarm bells in econometrics. It sounds like that guy in Spinal Tap who thinks his amp is better than the others because the dial on his goes up to 11. So, the stats package spits out a column of numbers called "variances". My new program is better than the old one because the dial goes up to "variances".
It's just a formula in a computer program, and it won't tell you if those numbers are gibberish in the context of your model and data (as they would be if you have nonstationary data). You need a statistical model to interpret the numbers and show to what extent the moments and test statistics approximate the scores you are interested in.
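For readers who haven't seen why variances and t-statistics from nonstationary data can be gibberish, here is a small simulation sketch of the classic spurious regression problem. Everything in it is an assumption for illustration: the data are independent simulated random walks (nothing to do with proxies or RegEM), and the helper names `slope_t_stat` and `rejection_rates` are my own. It regresses one random walk on another, first in levels and then in first differences, and counts how often the slope looks "significant" at the nominal 5% level.

```python
import numpy as np

def slope_t_stat(y, x):
    """t-statistic on the slope from OLS of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof            # conventional OLS error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # conventional OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

def rejection_rates(n_obs=200, n_reps=500, seed=0):
    """Share of replications with |t| > 1.96, in levels and in differences."""
    rng = np.random.default_rng(seed)
    reject_levels = 0
    reject_diffs = 0
    for _ in range(n_reps):
        # Two independent random walks: by construction there is no relationship.
        x = np.cumsum(rng.standard_normal(n_obs))
        y = np.cumsum(rng.standard_normal(n_obs))
        reject_levels += int(abs(slope_t_stat(y, x)) > 1.96)
        reject_diffs += int(abs(slope_t_stat(np.diff(y), np.diff(x))) > 1.96)
    return reject_levels / n_reps, reject_diffs / n_reps

levels_rate, diffs_rate = rejection_rates()
print(f"levels: {levels_rate:.2f}, first differences: {diffs_rate:.2f}")
```

The levels regression should reject the true null of no relationship far more often than the nominal 5%, while the first-difference regression should stay close to it. The software reports a standard error either way; only the statistical model tells you which one means anything.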