Yesterday in Errors Matter #1, I argued that any new reconstruction now proposed by Mann et al. as a means of salvaging MBH98-type results must also meet the representations and warranties of MBH98 that were used to induce widespread acceptance. I showed that the no-PC reconstruction recently proposed by Mann et al. as a way of salvaging MBH98-type results did not meet those standards. Today, I show that the salvage proposal of Rutherford et al. [2005] (whose co-authors include many Hockey Team members: Mann, Bradley, Hughes, Jones, Briffa and Osborn) also fails these standards.
To recap from yesterday: Rutherford et al. [2005] is the "completely different methodology" referred to in the following statement from the Hockey Team:
We quickly recap the points for readers who do not want to wade through the details: i) the MBH98 results do not depend on what kind of PCA is used, as long as all significant PCs are included, ii) the results are insensitive to whether PCA is used at all (or whether all proxies are included directly), and iii) the results are replicated using a completely different methodology (Rutherford et al, 2005).
Rutherford et al. [2005] is available in pre-publication form here. As I’ll show in detail, the "completely different methodology" of Rutherford et al. is completely irrelevant to the issues that we have raised.
The differences in methodology are all downstream of the construction of the proxy network and pertain only to differences in how this data is massaged. Rutherford et al. [2005] uses a procedure called "RegEM", whereas MBH98 used calibration-estimation through linear regression. It is my contention that the flaws in the network will be carried forward: replacing a downstream linear regression module with a downstream RegEM module is completely and astonishingly irrelevant.
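For readers unfamiliar with RegEM: it is a regularized expectation-maximization algorithm (Schneider, 2001) that treats the pre-instrumental temperatures as missing values in a combined proxy/temperature data matrix and infills them by iterated ridge regression. The sketch below is my own schematic illustration in Python, not the Rutherford et al. code; the function name, the fixed ridge parameter and the convergence test are all assumptions of mine (the real algorithm chooses the regularization adaptively). The point of showing it is that, whatever its internal elegance, the algorithm is entirely downstream of whatever data matrix it is handed.

```python
import numpy as np

def regem_sketch(X, ridge=1e-2, n_iter=50, tol=1e-6):
    """Minimal sketch of a RegEM-style imputation (after Schneider, 2001).

    X: (n_years, n_series) array with NaNs marking missing values
    (e.g. instrumental temperatures before the calibration period).
    Schematic only: it iterates ridge-regression infills of the missing
    entries, re-estimating the mean and covariance each pass.
    """
    X = X.copy()
    miss = np.isnan(X)
    col_mean = np.nanmean(X, axis=0)
    X[miss] = np.take(col_mean, np.where(miss)[1])  # initial infill with column means

    for _ in range(n_iter):
        X_old = X.copy()
        mu = X.mean(axis=0)
        C = np.cov(X - mu, rowvar=False)
        for i in range(X.shape[0]):       # one record (year) at a time
            m = miss[i]
            if not m.any():
                continue
            a = ~m                        # variables available this year
            # ridge-regularized regression of missing on available variables
            Caa = C[np.ix_(a, a)] + ridge * np.eye(a.sum())
            Cma = C[np.ix_(m, a)]
            B = Cma @ np.linalg.inv(Caa)
            X[i, m] = mu[m] + B @ (X[i, a] - mu[a])
        if np.sqrt(np.mean((X - X_old) ** 2)) < tol:
            break
    return X
```

Whatever one thinks of the imputation machinery, nothing in this loop inspects or repairs the proxy series it is given; a flawed network goes in, and a reconstruction inheriting those flaws comes out.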
Rutherford et al. [2005] applies the RegEM procedure to two networks relevant to the present discussion. Unfortunately, both networks are flawed:
1. the first network is the original MBH98 network with flawed PC calculations. It seems pretty cheeky to carry out calculations using the very same flawed PC series and then argue that such calculations have anything to say about the effect of the flawed PC calculations.
2. the second is the no-PC network discussed yesterday in Errors Matter #1, in which all semblance of reasonably even spatial sampling was abandoned in favor of a network dominated (80 of 95) by American tree rings and, in particular, by bristlecone pines.
The Flawed MBH98 Network
It may seem impossible to believe that calculations using the original flawed PC series of MBH98 are now presented as evidence that the calculation errors do not "matter", but the evidence is incontrovertible once you wade through Rutherford et al. [2005]. The data set (as used in Rutherford et al. [2005]) is not archived, so one has to rely on the descriptions in the text. The article itself states that 112 indicators were used in the AD1820 step and 22 indicators in the AD1400 step. These are exactly the same number of indicators as used in MBH98. There is no statement in Rutherford et al. [2005] that the PC series were re-calculated to correct the erroneous PC method. Thus, at this point, the evidence suggests that the Rutherford et al. [2005] dataset is identical to the MBH98 dataset. I’ll check this if and when the Rutherford et al. dataset as used is archived, but there’s not a shred of doubt in my mind on this point.
The problem with the PC calculations was known to the authors long before publication, but they did not discuss the matter in the article. Although the authors were familiar with the problem, I suspect that the referees were not. Applying the standards of a business prospectus, it would be the responsibility of the authors to discuss this problem in the article and to notify the referees of it. That does not seem to have been done here. After the Rutherford et al. article was brought to our attention, we pointed out the inadequate disclosure of these matters to the editor of Journal of Climate prior to publication and suggested that the matter be brought to the attention of the referees. Instead of doing so, my understanding of the subsequent correspondence is that the editor contented himself with the assurance of the authors that they stood behind their calculations. Given that the RegEM calculations have been carried out on the same flawed network, one would expect the same problems as in MBH98: in particular, a lack of robustness to the presence/absence of bristlecone pines in the 15th century calculations. The onus is surely on Rutherford et al. to disprove this; they not only fail to do so, they do not even consider the matter.
Similarly, one expects the same problems as in MBH98 for verification statistics: a spurious RE statistic (as discussed in our GRL article) and non-significant values of the other statistics. As with MBH98, Rutherford et al. [2005] fail to provide a full suite of verification statistics (e.g. the suite listed in Cook et al. [1994], of which Briffa and Jones were coauthors). Table 2 of Rutherford et al. [2005] shows an RE statistic against annual temperatures of 0.46 for their "20-year hybrid method" (0.40 for the non-hybrid method). MBH98 itself reported an RE statistic of 0.51 against annual temperatures. As we reported in our GRL paper, our emulations of their method have only been able to achieve an RE of 0.46 (and this only by a variance scaling step not mentioned in MBH98). These numbers are all quite similar.
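For reference, the statistics at issue are simple to state. Over a verification period withheld from fitting, RE benchmarks the reconstruction against the calibration-period mean, CE benchmarks it against the verification-period mean, and R2 is the squared Pearson correlation. A reconstruction can post a healthy-looking RE while its R2 is near zero, simply by tracking the offset between the calibration and verification means. A minimal sketch (my own illustration; the function and argument names are assumptions, not anything from the Rutherford et al. code):

```python
import numpy as np

def verification_stats(obs_ver, rec_ver, calib_mean):
    """Compute RE, CE and r2 over a verification period.

    obs_ver    -- observed temperatures in the verification period
    rec_ver    -- reconstructed temperatures in the same period
    calib_mean -- mean observed temperature in the calibration period
    """
    sse = np.sum((obs_ver - rec_ver) ** 2)
    re = 1.0 - sse / np.sum((obs_ver - calib_mean) ** 2)        # vs calibration mean
    ce = 1.0 - sse / np.sum((obs_ver - np.mean(obs_ver)) ** 2)  # vs verification mean
    r2 = np.corrcoef(obs_ver, rec_ver)[0, 1] ** 2               # squared correlation
    return re, ce, r2
```

Since CE and R2 strip out the mean-offset credit that RE awards, a full suite of the three statistics is far more informative than RE alone, which is precisely why its absence matters.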
All of the criticisms in MM05 (GRL) regarding the lack of statistical significance in MBH98 apply equally to the RegEM method:
1. the RE benchmark in MBH98 is flawed. Our simulations show that 99% RE significance (against "reconstructions" from simulated PC1s) is only reached at 0.56 (as compared with the 0.0 benchmark asserted in MBH98); a toy version of this kind of simulation is sketched after this list. Against this more accurate benchmark, the RegEM reconstruction does not have 99% (or even 95%) significance;
2. we surmise that the RegEM reconstruction fails other statistical verification tests. If the reconstruction is as close to MBH98 as it seems (and since it uses the same flawed network, this is likely), we expect the R2 statistic to be near 0.0, which is obviously insignificant. Circumstantial evidence that the R2 statistic is very low under this method is provided by the simple fact that it is not disclosed in the paper itself; it is not reported online; and, instead of providing the statistic, the authors have provided an online supplement complaining about the R2 statistic. I will discuss this curious contribution on another occasion; for now, I merely observe that, if the R2 statistic had been high, I feel confident that the authors would have reported it without editorializing. The authors should report the R2 statistic and let readers decide in that context. However, an R2 of 0.0 does tend to scare off many readers, and perhaps that's why it's left unmentioned.
3. Rutherford et al. promise to provide a CE statistic in an online supplement (which will be interesting), but have not done so yet. We do not know why they didn't report it in the paper itself. If the CE statistic were significant, I would have expected the R2 statistic to be significant as well. Conversely, since it appears that the R2 statistic is insignificant, I expect that the CE statistic, when reported, will be insignificant too.
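To make point 1 concrete, here is a toy version of the kind of Monte Carlo benchmarking involved. Our GRL simulations used "reconstructions" from simulated PC1s; the sketch below substitutes plain AR(1) red noise and a simple variance-matching fit, so its numbers will differ from the 0.56 figure, and the rho and n_sim parameters are assumptions of mine. The principle is the same: fit noise in the calibration period, score it by RE in the verification period, and take a high percentile of the null distribution as the significance benchmark, rather than the 0.0 asserted in MBH98.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, rho, rng):
    """AR(1) red-noise series of length n with lag-one autocorrelation rho."""
    x = np.zeros(n)
    e = rng.standard_normal(n) * np.sqrt(1 - rho ** 2)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + e[t]
    return x

def re_null_benchmark(temp, calib, verif, rho=0.9, n_sim=1000):
    """Sketch of a Monte Carlo RE benchmark: fit each red-noise
    'reconstruction' to temperature over the calibration indices,
    then score it by RE over the verification indices."""
    n = len(temp)
    calib_mean = temp[calib].mean()
    res = []
    for _ in range(n_sim):
        x = ar1(n, rho, rng)
        # scale the noise series to the calibration-period mean and variance
        rec = (x - x[calib].mean()) / x[calib].std() * temp[calib].std() + calib_mean
        sse = np.sum((temp[verif] - rec[verif]) ** 2)
        res.append(1 - sse / np.sum((temp[verif] - calib_mean) ** 2))
    # the 99th percentile of the null REs is the significance benchmark
    return np.percentile(res, 99)
```

The essential point is that autocorrelated noise fitted to a short calibration period routinely achieves RE values well above 0.0, so an RE of 0.46 against such a null distribution is nothing to write home about.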
The No-PC RegEM Calculation
Rutherford et al. also report on the application of RegEM to a no-PC network. I will discuss this only briefly since all of the pertinent issues have already been dealt with. Yesterday, I showed that the no-PC network under MBH98 regression methods was an abandonment of any attempt at reasonably even spatial sampling, was not robust to the presence/absence of bristlecone pines and lacked statistical skill.
Obviously, the no-PC network with a RegEM back end still abandons any attempt at reasonably even spatial sampling and this single failure is really all that matters – 80 of 95 proxies are North American tree rings and bristlecone pines dominate. Again, Rutherford et al. fail to show that this method is robust to the presence/absence of bristlecone pines. I have been unable to locate any statistics on this reconstruction — not even an RE statistic. Instead, they provide a figure (Figure 2) which supposedly illustrates that the calculation is “nearly indistinguishable” from MBH98. The illustration is quite strange since the no-PC reconstruction (“allproxy”) is only shown up to the mid-19th century and thus provides no assistance towards even a visual estimate of the performance in the calibration and verification period. Our surmise is that the RE statistic will be similar to the RE statistic of the no-PC reconstruction discussed above and that the R2 statistic will be insignificant.
Thus, neither of the reconstructions in Rutherford et al. [2005] meets the standards and representations of even spatial sampling, robustness and statistical skill that induced the widespread acceptance of MBH98.
In passing, I'm pleased that Rutherford et al. have promised to archive Matlab source code for this article, although the code is not yet archived. (I may even take some small credit for drawing the attention of the paleoclimate community to the need to improve this type of audit trail.) However, the willingness of the Hockey Team to archive source code for Rutherford et al. [2005] makes their obdurate refusal to archive code for MBH98 all the more puzzling.
2 Comments
I understood that some of the MBH98 dataset had been lost. Wouldn't that make it impossible for it to be reused in the 2005 study? It would then be somewhat strange that the number of indicators in the two steps remains the same, wouldn't it?
Steve: No, I’m satisfied that the MBH98 data is adequately archived. The dispute is over their source code, as no one (including me) has been able to replicate their results. If you look at the various Replication posts (see the Category in the right frame), I’ve reported on many replication problems. You may be thinking of the Crowley dataset which has been “misplaced”. I’ve posted some comments on the 2005 study – see Rutherford et al.
What happened next?