Ammann made a presentation at the same AGU session as me, spending a considerable amount of time criticizing us — though with nothing new to say that we haven’t already rebutted here and in print. There was time for one question (AGU is fanatical about schedules) and I was recognized. So here’s my question to you: if you had one question to ask Mann or Ammann on such an occasion, what would you ask? Think about it before reading my choice. (Interestingly, Ross, under different circumstances, independently made the same suggestion.)
Here was my choice: what is the cross-validation R2 statistic for the 15th century MBH98 reconstruction? It’s not that I don’t know the answer. The answer is ZERO (well, 0.02). I just wanted him to say it out loud, in front of his professional colleagues.
You’re going to have to wait to see what his answer was, because I’m first going to take you through the history of previous requests for this information (or equivalent information from which it can be obtained, i.e., the residual series for the 15th century reconstruction). When I started writing this post, I thought that this would be a short detour, but the detour has taken on a life of its own, as the request has been made to Mann himself directly, to N.S.F., through Climatic Change in 2004 in connection with reviewing a submission by MBH, to Nature, by Natuurwetenschap & Techniek to Mann, by the House Energy and Commerce Committee to Mann, and by Climatic Change in 2005 to Ammann. It’s an interesting chronology and raises many interesting questions about journal practices. This history runs in parallel to the requests for source code, but it raises many of the same issues in even sharper focus, as the (weak) arguments for non-disclosure of source code cannot be extended to the withholding of cross-validation statistics and/or the residual series from which they are generated.
I also invited Ammann out to lunch at AGU after the session and I’m going to eventually describe an interesting offer that I made to him after I get through the R2 request (and its seeming outcome).
Mann and N.S.F. – 2003
I’d lost track of how long I’ve been trying to get the Hockey Team to produce the residual series from which the cross-validation statistics could be calculated, and just how many evasions have taken place. So before getting to Ammann’s most recent evasion, you’ll have to indulge a review of the bidding. I’ll move it along a little faster after today’s account of 2003 events.
We began our request for residual series as early as December 2003. These requests were distinct from concurrent requests for source code and other issues, which were then in controversy. So to put the residual requests in context, I also have to note the source code debate. The next stages won’t be as prolix.
Mann had claimed remarkable statistical “skill” for his reconstruction — claims that were not made for other multiproxy reconstructions and which undoubtedly led to the wide acceptance of Mann’s work. The IPCC even said
Averaging the reconstructed temperature patterns over the far more data-rich Northern Hemisphere half of the global domain, they [MBH98] estimated the Northern Hemisphere mean temperature back to AD 1400, a reconstruction which had significant skill in independent cross-validation tests. Self-consistent estimates were also made of the uncertainties. [my bold]
Despite the boldness of these claims, we made no attempt in our 2003 paper to test them. One reason was simply that the focus of our 2003 paper was fairly narrow. First, we wanted to demonstrate the remarkable lack of data quality control in MBH98, including the use of obsolete data. Most of the issues raised there still remain unanswered, such as, for example, the amusing use of precipitation statistics from Paris, France in the New England gridcell (an issue dodged in the Corrigendum).
Second, we wanted to show the problems in the principal component calculations. At the time, we were unable to identify exactly what was wrong with the PC series in the data set at Mann’s FTP site — other than that there was obviously something seriously wrong. We had identified the problem simply by trying to verify the PC calculations: we re-collated hundreds of tree ring series from WDCP, made fresh PC calculations and compared the explained variance of the MBH series to that of the fresh calculations, showing that the MBH explained variance was much lower. Later we were able to determine that there were three distinct problems: they used a previously undisclosed stepwise method, and in the archived data set, values from different steps were spliced — a nonsensical procedure (the stepwise procedure itself was not initially reported and is by no means a common, proven or well-understood statistical procedure); in addition to the incorrect splicing, the series were collated incorrectly, which we’d noticed from the bizarre identity of 1980 values; finally, they used the bizarre and undisclosed short-segment centering. Interestingly, the illustration in our 2003 article was actually from the Australia network (where stepwise issues were not a factor), and not the North American network about which so much has been written. Mann claimed that the splicing occurred only in the collation archived at his FTP site and not in the MBH98 calculations themselves. He blamed this incorrect collation on our supposed request for an Excel spreadsheet (which we had not requested) and claimed that the incorrect collation had been prepared especially for us (it had not, as it was dated much earlier on the FTP site; Mann deleted the file and with it this date evidence). The source code archived last summer indicates that MBH98 itself probably did not use incorrectly collated or spliced PC series; however, the use of incorrectly centered series is definitely proven.
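For readers unfamiliar with the short-segment centering issue, here is a minimal sketch of what it means mechanically. This is my own illustrative code on synthetic data, not Mann’s code or the actual proxy network; the function name `leading_pc` and the AR(1) noise parameters are my assumptions for illustration. The only point it shows is that centering each series on a short end segment (analogous to the 1902–1980 calibration period) rather than on the full period changes the leading principal component:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic "proxy" network: 50 series of mildly persistent
# (red) noise over 581 "years" (analogous to 1400-1980).
n_years, n_series = 581, 50
noise = rng.standard_normal((n_years, n_series))
for t in range(1, n_years):
    noise[t] += 0.3 * noise[t - 1]  # mild AR(1) persistence

def leading_pc(data, center_rows):
    """First principal component after centering each series on the mean
    of only the rows selected by `center_rows`."""
    centered = data - data[center_rows].mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

pc_full = leading_pc(noise, slice(None))        # conventional: full-period mean
pc_short = leading_pc(noise, slice(-79, None))  # short-segment: last 79 "years" only
```

With persistent noise, the short-segment variant tends to load the first PC on series whose end-segment mean happens to depart from their long-term mean — which is why the centering convention is not a harmless implementation detail.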
Thirdly, in our 2003 article, we wanted to show the impact of freshly calculated PC series and non-obsolete data (and, as it turned out, non-manipulated data) on the final NH temperature reconstruction. Now we had thought that we had made it completely clear that we had not offered an “alternative” reconstruction.
Without endorsing the MBH98 methodology or choice of source data, we were able to apply the MBH98 methodology to a database with improved quality control and found that their own method, carefully applied to their own intended source data, yielded a Northern Hemisphere temperature index in which the late 20th century is unexceptional compared to the preceding centuries, displaying neither unusually high mean values nor variability. More generally, the extent of errors and defects in the MBH98 data means that the indexes computed from it are unreliable and cannot be used for comparisons between the current climate and that of past centuries, including claims like “temperatures in the latter half of the 20th century were unprecedented,” and “even the warmer intervals in the reconstruction pale in comparison with mid-to-late 20th-century temperatures” (see press release accompanying Mann et al 1999) or that the 1990s was “likely the warmest decade” and 1998 the “warmest year” of the millennium (IPCC 2001).
Since we did not present an alternative reconstruction, but simply an implementation with freshly calculated PC series and updated data versions, it did not occur to us that cross-validation statistics were relevant to what was essentially an argument demonstrating that MBH98 results were not robust. In retrospect, Bürger and Cubasch, who refer approvingly to our work, have substantially expanded this approach. They point out that there are many methodological choices available within an MBH98-type algorithm (they describe them as “flavours”, identifying 64 different ones), with results differing between “flavours”. In fact, the list of methodological issues canvassed by Bürger and Cubasch does not include most of the methodological issues discussed in MM03, and the number of flavours increases exponentially. Bürger and Cubasch make the neat point that, if the RE statistic is used to choose between “flavours”, it cannot also be used as cross-validation. It’s an argument that should be kept in mind as we review the subsequent history.
After our first E&E article in October 2003, Mann immediately made public, through David Appell, the claim that we’d used the “wrong data”. The David Appell website no longer exists, but the claims are archived here. Suddenly there materialized at Mann’s FTP site a previously private data directory that differed from the data set to which we’d previously been directed and of which Rutherford had been unaware. (For the lugubrious early story, see the correspondence with Mann prior to MM03 here, and our contemporary assessment of this dispute here.)
Concurrently, Mann made a less absurd response on the internet here. He claimed that we had incorrectly implemented his algorithm (not mentioning that he’d refused to respond to previous requests for a more adequate methodological description.)
This article also discusses cross-validation statistics (I’ll return to this shortly and get this narrative off the ground a little better). But our first reaction was not on the cross-validation front; it was simply to try to figure out what Mann was doing based on these comments. For example, he said that our emulation was flawed because we didn’t use 159 series — yet 159 series were never mentioned anywhere in MBH98 or elsewhere. So we asked him for a listing of the 159 series and concurrently asked for a copy of the source code, pointing out in quite reasonable and civil terms that we had no interest in pointless controversy over methodological details and that this seemed like an effective way to reconcile any such methodological discrepancies. We made the requests here, reiterated here. Mann refused.
Not having any luck with Mann, we tried Bradley here, also without success.
Having no luck with either, we tried NSF on source code here and filed a Materials Complaint with Nature on a variety of questions. (We never did get any answers on these matters from Nature. We later reiterated the request to Nature in August 2004, more on which tomorrow.)
The source code dispute has received a lot of public attention. N.S.F. refused immediately on the basis that it was Mann’s personal property; Nature also refused. Mann told the Wall Street Journal that he would not be “intimidated” into disclosing his algorithm, but later produced source code for the House Energy and Commerce Committee that was inoperable with any existing data sets. Again this story has been discussed elsewhere.
Anyway, in December 2003, we turned our attention to the matter of the residuals, as a result of certain claims made in Mann’s internet response here, where MBH countered that their reconstruction had significant skill (which they measured by an RE statistic) while “ours” didn’t, as follows:
MBH98 employed the standard statistical tool of cross-validation to verify the skill of their reconstructions. MM describe no such tests. Since increasingly sparse networks are used progressively farther back in time, a series of cross-validation experiments have to be performed to estimate the skill for different time intervals. For the AD 1400-1500 period, this involves, in MBH98, performing the reconstruction over the interval 1400-1901 based on calibration against the instrumental record over the interval 1902-1980, using the specific network of proxy indicators available for the AD 1400-1500 period. The reconstruction is then independently compared against the instrumental record over the interval (1854-1901) not used for calibration. The skill can be described (see MBH98) by a ‘Reduction of Error’ statistic (RE), which is bounded by negative infinity and positive one, with substantially positive numbers indicative of predictive skill. The mean expected value for a random estimate is -1.
For the reconstruction with the data eliminated in a manner similar to that implicit in the MM approach, the RE score (-6.6) is far worse than even a typical random estimate, and such a result would have been discarded as unreliable based on the cross-validation protocol used by MBH98. The anomalous warm values during the 15th century are the artifact of an entirely unreliable statistical estimate. By contrast, the MBH98 reconstruction indicates an RE of 0.42 for the 1400-1500 interval, indicative of significant predictive skill during that time interval.
This is one of the earliest attributions of an “alternative” reconstruction to us – something that is done both by right-wing commentators and by Mannians, for different purposes – a somewhat unholy alliance. We had thought that our disclaimer in the text had been very clear that we were not proffering an “alternative” reconstruction. However, for greater certainty, we provided the following additional statement in the FAQ to MM03:
Your graph seems to show that the 15th Century was warmer than today’s climate: is this what you are claiming?
No. We’re saying that Mann et al., based on their methodology and corrected data, cannot claim that the 20th century is warmer than the 15th century — the nuance is a little different. To make a positive claim that the 15th century was warmer than the late 20th century would require an endorsement of both the methodology and the common interpretation of the results which we are neither qualified nor inclined to offer.
You’d have thought that these two statements were pretty clear, and we’ve continued to make similar statements. Most recently, after the Wall Street Journal editorial page incorrectly attributed an alternative reconstruction to us, Ross published a letter in the WSJ disassociating us from any alternative reconstruction. It’s one thing to have journalists misinterpret your statements, but it is far more pernicious when academic commentators like Mann or Ammann, who actually know better, nevertheless perpetuate the canard that we have made an “alternative” reconstruction, presumably because it provides a completely diversionary tactic. More on this later.
We started our consideration of verification statistics in December 2003 with a request to Mann for the residual series in the 15th century (and other steps), which had not been archived even at the new FTP directory. (The residual series, aside from being helpful in benchmarking replication, are obviously essential for doing statistical tests and necessary to validate any claims of statistical “skill”.) Now Mann had already refused to provide source code or to identify the 159 series, so I did not actually expect him to provide the residual series — but unless you ask, they can always use the absence of a request as an excuse. Dan Verardo of NSF (from whom I’d sought assistance on source code) was copied.
Before Mann himself refused (which I expected but I still had to ask), Verardo wrote that Mann was not obligated to provide this information on the grounds that I was “free to [my] analysis of climate data and [Mann] is free to his.” Needless to say, Mann did not send the requested information.
So as early as December 2003, Mann and NSF had refused requests for the residual series, necessary for verifying cross-validation statistics. I’ll proceed to discuss subsequent events with Nature, Climatic Change, Natuurwetenschap & Techniek and the House Energy and Commerce Committee with respect to Mann, and then Climatic Change and AGU with respect to Ammann. I’ll try to move the segments along a little faster, but no promises. (The downside of blogging is that these notes are not highly edited.)