I’ve got something that’s a little bit amusing today. In MM03, we pointed out collation errors in pcproxy.txt (which I’ve recently hypothesized was used in the version of Rutherford, Mann et al submitted in July 2003 and was laundered after MM03). We pointed out that the PC series all seemed to start one year too early (1799 instead of 1800) and that the missing 1980 values (the last year in the matrix) were then filled from left to right, so that, in some cases, 8 series had identical 1980 values. Rutherford, Mann et al said that these criticisms applied only to the "wrong" data set (while not mentioning that they themselves had directed us to this dataset at Mann’s FTP site). After MM03, they fixed the problems in their collation of the PC series for Rutherford, Mann et al (without thanking us). But they made the very same goof in their collation of the instrumental series into their composite matrix. It’s too funny for words.
I’ve just started looking at Rutherford’s RegEM method and noticed another goof right away, which I will describe first. If you go to the script mbhstepwiselowrecon.m in directory http://fox.rwu.edu/~rutherfo/supplements/jclim2003a/regem/multiproxypc-scripts/, towards the bottom of page 1 in a Notepad rendering of the script, you see programming steps which combine MBH98 proxy networks and 1008 temperature gridcell series (the proxies are placed in columns 1009 on). In the 2nd step (i=10), representing the AD1450 step in MBH98, Rutherford has incorrectly set the number of proxies (nproxies) as 93, when it should be 25. It’s correct in the companion file mbhstepwisehighrecon.m. As a result of the error, the composite matrix is made with too many columns and he tries to fill columns 1009:1111 with a matrix that has only 25 columns. A question for Matlab users: in such a mismatch, does Matlab leave the extra columns empty, does it recycle the smaller matrix to fill them, or does the instruction simply fail (as it would in R), leaving the matrix "composite" at whatever value it had before?
I notice that there’s a later comment in his script that:
% there are typically a few gridpoints in the infilled instrumental data without
% any values because they couldn’t be inflled. Remove them here.
% note this also removes empty proxy columns
Depending on how Matlab handles the above programming error, this might very well cooper up the prior error. It raises another question: why would some of the infilled instrumental data not have any values? Did anyone notice any mention of this in the Rutherford text? Some people may remember that in MM03 we pointed out that some of the MBH98 cells with "nearly continuous" histories had 0 values in HadCRU2. This would be an interesting topic for someone to look at. Maybe someone could write to Phil Jones and ask him.
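To make the point about the clean-up step concrete, here is a minimal sketch in Python (not Matlab, and not Rutherford’s code) of what "remove columns without any values" does. The function name, the toy matrix and the use of NaN to mark an empty cell are all my assumptions for illustration. If the surplus columns created by the wrong nproxies ended up empty, a step like this would quietly drop them along with the empty gridcells.

```python
import math


def drop_empty_columns(matrix):
    """Return (kept_matrix, kept_indices), dropping all-NaN columns.

    Hypothetical helper: a column counts as "empty" only if every
    entry in it is NaN.
    """
    if not matrix:
        return [], []
    ncols = len(matrix[0])
    keep = [
        j for j in range(ncols)
        if not all(math.isnan(row[j]) for row in matrix)
    ]
    return [[row[j] for j in keep] for row in matrix], keep


nan = float("nan")

# Toy stand-in for the composite matrix: 3 rows, 5 columns.
# Columns 3 and 4 were never filled (assumed to be NaN).
composite = [
    [0.1, nan, 0.3, nan, nan],
    [0.2, 0.5, nan, nan, nan],
    [nan, 0.6, 0.4, nan, nan],
]

cleaned, kept = drop_empty_columns(composite)
print(kept)  # [0, 1, 2]
```

Columns with only some values missing survive. Only the entirely empty ones disappear, which is why a step like this could hide a column-count error without anyone noticing.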
Anyway, just above this step, there is the following instruction:
On the face of it, there doesn’t seem to be anything objectionable about this. However, the devil is in the details and I’ll bet that there is a real clanger here. The gridcell temperature dataset archived here goes from 1854-1993 (and is the one used in MBH98). They’ve archived a different temperature dataset than the one that they actually used. The later HadCRU datasets start in 1856 and my guess is that they were using a temperature dataset that went from 1856 to 1998. Since the combined dataset starts in 1400 (not 1401), the correct collation instruction to insert a dataset going from 1856-1998 into a matrix starting in 1400 should have been:
If they used a HadCRU dataset starting in 1856, as appears almost certain (I’m unaware of any versions starting in 1855, although who knows what may materialize out of the blue), they’ve collated the instrumental data incorrectly [insert smiley]: it’s all one year too early. This is precisely the kind of collation error that we observed in MM03 in the principal component series in pcproxy.txt, where they had spliced in the PC series one year too early. In Rutherford, Mann et al., they said that some of the MM03 criticisms applied to the "wrong" data set (not mentioning that it was the data set at Mann’s FTP site to which they had directed us).
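The row arithmetic is easy to check. Here is a minimal sketch in Python, using Matlab-style 1-based row numbers. The function name is hypothetical, and the 1856-1998 span is my guess at the dataset actually used, not something confirmed by the archive.

```python
def year_to_row(year, first_year=1400):
    """1-based row of `year` in a matrix whose first row is `first_year`."""
    if year < first_year:
        raise ValueError("year precedes the start of the matrix")
    return year - first_year + 1


# Matrix starting in 1400: where a series spanning 1856-1998 belongs.
start_row = year_to_row(1856)
end_row = year_to_row(1998)
print(start_row, end_row)  # 457 599

# Treating the matrix as if it started in 1401 shifts everything up a row,
# i.e. the data land one year too early in a matrix that starts in 1400.
early_start = year_to_row(1856, first_year=1401)
early_end = year_to_row(1998, first_year=1401)
print(early_start, early_end)  # 456 598
```

Both ranges contain 143 rows, so the assignment would go through without complaint either way. Nothing in the dimensions flags the one-year shift.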
Astonishingly, the only errors described in MM03 that MBH98 might have avoided (the collation errors; see MM03 Scorecard) appear to have been made in Rutherford, Mann et al, but in the instrumental series rather than the PC series. (All the other data problems of MM03, including incorrect PC calculations, carry forward pari passu to RM05.) Does this "matter"? In one sense, nothing "matters" to MBH98 except bristlecones (and you know how they squeal if someone tries to examine results without bristlecones). But geez, after all the commotion about collation errors, you’d have thought that even the gang that can’t shoot straight would make sure that this study didn’t have any clanging collation errors. And you’ve got almost the entire Hockey Team on the masthead of this study: Rutherford, Mann, Bradley, Hughes, Jones, Briffa, Osborn. What a tangled web indeed. I can almost hear the smoke rising from Mount Mann.