Rutherford et al 2005 (the et al being half the Hockey Team: Mann, Bradley, Hughes, Briffa, Jones, Osborn) is a re-statement of the MBH98 network (flawed PCs and all) and the Briffa et al 2001 network using RegEM. I haven’t figured out exactly what the properties of the RegEM method are as compared to other multivariate methods, but that’s a story for another day. This is the article where they first put forward the idea that the verification r2 is a "flawed" statistic.
While one could seek to estimate verification skill with the square of the Pearson correlation measure (r2), this metric can be misleading when, as is the case in paleoclimate reconstructions of past centuries, changes are likely in mean or variance outside the calibration period. To aid the reader in interpreting the verification diagnostics, and to illustrate the shortcomings of r2 as a diagnostic of reconstructive skill, we provide some synthetic examples that show three possible reconstructions of a series and the RE, CE, and r2 scores for each (supplementary material available online at http://fox.rwu.edu/~rutherfo/supplements/jclim2003a)
In our discussion of verification statistics, we’ve not argued that a verification r2 is sufficient for model success, only that it’s necessary. So their illustration has nothing to do with any actual argument that we’ve ever made. But, hey, they’re the Hockey Team. Their illustration of the above paragraph showed the following "synthetic" example where there is high verification r2 with poor model behavior.
Figure 1. verifexample.pdf top panel from Rutherford et al 2005 SI
It seems hard to imagine a real-world model where you would actually get an r2 of 1 and lose track of the mean level so badly. Actual statisticians (as opposed to “I am not a statistician” statisticians) use other methods to test for situations like this – a Durbin-Watson statistic would have picked up this sort of situation effortlessly. There’s no real need for the Hockey Team to re-invent time series statistics. If they think that they’ve proved something about the r2 statistic, they should submit it to a real statistical journal and not just push it by Andrew Weaver at Journal of Climate. It’s embarrassing that the Journal of Climate, which has published much sophisticated and interesting material in the past, should, under Andrew Weaver’s watch, publish such a juvenile sketch.
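For readers who want to see the arithmetic, here is a minimal sketch in Python of the kind of synthetic case at issue: a "reconstruction" that tracks every wiggle of the target but sits at the wrong mean level. The series, the offset of 2.0, the calibration mean of 0 and all function names are my own assumptions for illustration; none of this is taken from the Rutherford et al SI. RE and CE use the usual definitions (one minus the ratio of squared error to squared deviation from the calibration mean and verification mean respectively), and the Durbin-Watson statistic is computed on the verification residuals.

```python
import math

def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def skill(obs, rec, ref_mean):
    """1 - SSE / sum of squared deviations of obs from ref_mean.
    ref_mean = calibration mean gives RE; verification mean gives CE."""
    sse = sum((o - r) ** 2 for o, r in zip(obs, rec))
    ref = sum((o - ref_mean) ** 2 for o in obs)
    return 1.0 - sse / ref

def durbin_watson(resid):
    """Durbin-Watson statistic: near 2 for uncorrelated residuals,
    near 0 for strongly positively autocorrelated residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Hypothetical verification-period target (assumption: a plain sine wave).
obs = [math.sin(0.3 * t) for t in range(50)]
# Hypothetical reconstruction: identical wiggles, mean level off by 2.0.
rec = [o + 2.0 for o in obs]

calib_mean = 0.0                    # assumed calibration-period mean
verif_mean = sum(obs) / len(obs)    # verification-period mean
resid = [o - r for o, r in zip(obs, rec)]

r2 = pearson_r2(obs, rec)
re_score = skill(obs, rec, calib_mean)
ce_score = skill(obs, rec, verif_mean)
dw = durbin_watson(resid)

print(f"r2={r2:.3f}  RE={re_score:.3f}  CE={ce_score:.3f}  DW={dw:.3g}")
```

In this toy case the r2 is essentially 1 while RE and CE are both negative, and the Durbin-Watson statistic is essentially 0 because the residuals are a constant. In other words, a standard residual diagnostic flags the problem at once, and the high r2 tells you only that a necessary condition has been met.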
However, for today’s little irony, they really didn’t need to invent a synthetic example. I’ll rotate this example, just to get your eye in (although the comparison I’m about to give is pretty obvious). Here you see a case where the divergence is upwards.
Figure 1 rotated 180 degrees
Now here is Figure xx from Rutherford et al, showing one of their reconstructions which has MXD data in it and the resultant "divergence problem". There seems to be some high-frequency coherence which would help the r2 (but the r2 is not JUST a high-frequency statistic, as the diverging trend will penalize it). A Durbin-Watson statistic would pick up the divergence effortlessly – or simply looking at the plot wouldn’t do any harm. So Rutherford, Mann et al. didn’t need to invent a synthetic example; they could have just used their own reconstruction with MXD data.
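The point about the trend penalizing the r2 can also be checked with a toy calculation. The sketch below is an assumption-laden illustration, not the Rutherford et al data: two hypothetical series share identical high-frequency wiggles, but one trends up and the other trends down. The slopes, amplitudes and names are invented for the example.

```python
import math

def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def durbin_watson(resid):
    """Durbin-Watson statistic on a residual series (near 0 means
    strong positive autocorrelation, near 2 means none)."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

years = range(50)                                  # hypothetical time axis
wiggles = [0.3 * math.sin(2.0 * t) for t in years]  # shared high-frequency part

# Hypothetical target trends upward; hypothetical proxy reconstruction
# has the same wiggles but trends downward (a stylized "divergence").
obs = [0.02 * t + w for t, w in zip(years, wiggles)]
rec = [-0.02 * t + w for t, w in zip(years, wiggles)]

r2_wiggles = pearson_r2(wiggles, wiggles)  # high-frequency part on its own
r2_full = pearson_r2(obs, rec)             # with the diverging trends included
dw_full = durbin_watson([o - r for o, r in zip(obs, rec)])

print(f"r2 (wiggles only)={r2_wiggles:.3f}  r2 (with divergence)={r2_full:.3f}  DW={dw_full:.4f}")
```

With these made-up numbers the high-frequency agreement is perfect, yet the r2 of the full series is low because the opposing trends dominate the covariance, and the Durbin-Watson statistic on the residuals is close to 0.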
Figure 3. From Rutherford et al 2005.
If the point of their synthetic example was to say that such cases are flawed, then surely they had an obvious example right at hand. They could have said – here’s the MXD data, it demonstrates what happens with flawed models.
But this is the Hockey Team, so they handle it differently. Remember the Briffa MXD reconstruction, which had the same divergence problem. They truncated it in 1960 and snipped off the embarrassing bits at the end. This was done for the first time in IPCC TAR (I reported this last May: you had to blow up the graphic to see the truncation). In the article cited by IPCC (Briffa 2000), there was no truncation. It occurred in print in a later article (Briffa et al JGR 2001, not cited in TAR).
Rutherford has archived a number of reconstructions from this article both at his website (where I’m blocked) and at WDCP. If you examine them, you’ll see that he’s done the same trick. The digital data is truncated. None of the series contain digital data for the series illustrated above with the closing downtrend; they are all truncated, nearly all of them to 1960.
Has an MXD-based reconstruction shown any ability to measure warm periods? Who knows? They sure haven’t provided any evidence so far.
Now let’s suppose hypothetically that IPCC 4 AR had a spaghetti graph and that both the Briffa et al 2001 reconstructions and a Rutherford et al 2005 reconstruction were in it. Do you suppose that they would show the entire series, complete with "divergence" in the late 20th century? Or do you suppose that they would obtain "consensus", so to speak, by censoring the post-1960 values of these series so that the reconstructions all appear to go up in the late 20th century? A hypothetical question, of course.