Bruce McCullough and Ross McKitrick today published an interesting article under the auspices of the Fraser Institute entitled Check the Numbers: The Case for Due Diligence in Policy Formation.
Their abstract states:
Empirical research in academic journals is often cited as the basis for public policy decisions, in part because people think that the journals have checked the accuracy of the research. Yet such work is rarely subjected to independent checks for accuracy during the peer review process, and the data and computational methods are so seldom disclosed that post-publication verification is equally rare. This study argues that researchers and journals have allowed habits of secrecy to persist that severely inhibit independent replication. Non-disclosure of essential research materials may have deleterious scientific consequences, but our concern herein is something different: the possible negative effects on public policy formation. When a piece of academic research takes on a public role, such as becoming the basis for public policy decisions, practices that obstruct independent replication, such as refusal to disclose data, or the concealment of details about computational methods, prevent the proper functioning of the scientific process and can lead to poor public decision making. This study shows that such practices are surprisingly common, and that researchers, users of research, and the public need to consider ways to address the situation. We offer suggestions that journals, funding agencies, and policy makers can implement to improve the transparency of the publication process and enhance the replicability of the research that is published.
They canvass an interesting selection of cases from different fields (and I alert readers that I don’t have the faintest interest in debating the pros or cons of the issues in these other studies at this blog, and do not wish readers to debate them here). They report quantitative results from McCullough’s replication work in economics, but most readers will probably take the most interest in their accounts of several high-profile studies – the Boston Fed study, the Bellesiles affair and the Hockey Stick.
The “Boston Fed” study was apparently related to policy changes on subprime mortgages. (Again, I don’t want people to debate the rights or wrongs of the policy, only the replicability issue.) It appears that data requests were refused and, reminiscent of our dealings with Santer, Jones, etc., authors interested in testing the results finally resorted to FOI requests. (Climate scientists resent this, but there are precedents.) MM report the denouement as follows:
Day and Liebowitz (1998) filed a Freedom of Information Act request to obtain identifiers for these observations so they could re-run the analysis without them. They also noted that the Boston Fed authors (Munnell et al., 1992) did not use the applicant’s credit score as generated by the bank, but had replaced it with three alternate indicators they themselves constructed, which Day and Liebowitz found had omitted many standard indicators of creditworthiness. Day and Liebowitz showed that simply reverting to the bank’s own credit score and correcting the 26 misclassified observations caused the discrimination coefficient to drop to zero.
Harrison (1998) noted that the Boston Fed data set included many more variables than the authors had actually used. These included measures such as marital status, age, and whether the application contained information the bank was unable to verify. These variables were significant when added back in, and their inclusion caused the discrimination effects to drop to zero even without correcting the data errors noted by Day and Liebowitz.
Thus, the original Boston Fed conclusions were eventually shown to be wholly insupportable. But due to various delays these studies were not published until 1998 in Economic Inquiry, six years after the original study’s release …
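The mechanism both critiques relied on is ordinary omitted-variable bias: a regressor correlated with an omitted determinant of the outcome picks up that determinant’s effect. As a purely illustrative aside (this is not the Day and Liebowitz analysis or the Boston Fed data; all variable names and numbers are invented), a toy simulation shows how an apparent “discrimination” coefficient can collapse to zero once an omitted creditworthiness measure is put back in:

```python
# Illustrative sketch only: a toy simulation of omitted-variable bias,
# not the actual Boston Fed data or the Day and Liebowitz regressions.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: minority status is correlated with a creditworthiness
# score, and the (simulated) lending decision depends only on creditworthiness.
minority = rng.binomial(1, 0.3, n)
credit_score = rng.normal(0, 1, n) - 0.8 * minority      # correlated with minority
denied = ((-credit_score + rng.normal(0, 1, n)) > 0).astype(float)

def ols_coefs(y, *cols):
    """Return OLS coefficients (intercept first) of y on the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Short regression: omit the credit score -> minority picks up a sizeable
# spurious "discrimination" coefficient.
print("without credit score:", ols_coefs(denied, minority)[1])

# Long regression: add the credit score back in -> the minority coefficient
# collapses toward zero.
print("with credit score:   ", ols_coefs(denied, minority, credit_score)[1])
```

None of this can be checked against a published study, of course, unless the underlying data and variable definitions are available to re-run.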
The Bellesiles story is also very interesting for blog readers. Clayton Cramer, Bellesiles’ nemesis, was a software engineer – he profiles very much like a typical Climate Audit reader. Cramer eventually published in the journal Shotgun News, which, according to recent statistics, has an impact factor lower than either Science or Nature.
Despite the political importance of the topic, professional historians did not actively scrutinize Bellesiles’ thesis. Instead it was non-historians who began the process of due diligence. Stephen Halbrook, a lawyer, checked the probate records for Thomas Jefferson’s three estates (Halbrook, 2000). He found no record of any firearm, despite the fact that Jefferson is known to have been a lifelong owner of firearms, putting into question the usefulness of probate records for the purpose. Soon after, a software engineer named Clayton Cramer began checking Bellesiles’ sources. Cramer, who has a master’s degree in history, found dates changed and quotations substantively altered. However, Cramer was unable to get academic journals to publish his findings. Instead he began sending articles to magazines such as the National Review Online and Shotgun News. He compiled an extensive list of errors, numbering in the hundreds, and went so far as to scan original documents and post them on his website so historians would check the original documents against the text of Bellesiles’ book (Cramer, 2006)
Bellesiles claimed to have examined hundreds of San Francisco probate records from the 1850s. When confronted with the fact that all the San Francisco probate records had been destroyed in the 1906 earthquake, Bellesiles claimed that he obtained them from the Contra Costa County Historical Society. But the Society stated that it did not possess the requisite records. Bellesiles soon resorted to ad hominem, claiming that the amateur critics could not be trusted because they lack credentials. Referring to Clayton Cramer, Bellesiles said, “It is not my intention to give an introductory history lesson, but as a non-historian, Mr. Cramer may not appreciate that historians do not just chronicle the past, but attempt to analyze events and ideas while providing contexts for documents” (Bellesiles, 2001). Note that Bellesiles could have, at any time, ended the controversy by simply supplying his data to his critics, something he refused to do.
Ultimately, Bellesiles appears to have been brought down by the black-and-white fact that he could not have consulted the records he claimed to have consulted, because they didn’t exist. Anyone remember the claims in Jones et al 1990 to have consulted Chinese station histories that don’t exist, and the absurd claims of the coauthors to have lost records that had supposedly been faithfully preserved through World War II and the Cultural Revolution…? But it’s climate, and Doug Keenan’s effort to pursue the matter got nowhere.
I raised one beef today with coauthor McK. The term “due diligence” is used to frame the discussion – as it usefully puts “(journal) peer review” in a more general context. However, Mc and Mc do not identify the first academic article to use this term in this context (though that article was cited in passing on another matter). The first such usage, to my knowledge, was, of course, McIntyre and McKitrick (2005 EE), which ended as follows:
We are also struck by the extremely limited extent of due diligence involved in peer review as carried out by paleoclimate journals, as compared with the level of due diligence involved in auditing financial statements or carrying out a feasibility study in mineral development. For example, “peer review” in even the most eminent paleoclimate publications, as presently practiced, does not typically involve any examination of data, replication of calculations or ensuring that data and computational procedures are archived. We are not suggesting peer reviewers should be auditors. Referees are not compensated for their efforts and journals would not be able to get unpaid peer reviewers to carry out thorough audits. We ourselves do not have explicit recommendations on resolving this problem, although ensuring the archiving of code and data as used is an obvious and inexpensive way of mitigating the problem.
But it seems self-evident to us that, recognizing the limited due diligence of paleoclimate journal peer review, it would have been prudent for someone to have actually checked MBH98 data and methods against original data before adopting MBH98 results in the main IPCC promotional graphics.
The issues raised in McCullough and McKitrick are important ones, presented in an engaging fashion (though I’m obviously a fellow traveller on these matters). Ross has already been on one radio show and the host, like any member of the public (as I once was), was dumbfounded at the lack of due diligence in the chain.
A simple and virtually zero-cost improvement in the system would be one that we’ve long supported: require the archiving of data and code. The purpose of this requirement has been totally misrepresented by Gavin Schmidt – it’s not to parse for code errors, but to put yourself in a position where you can quickly analyse sensitivities or the impact of new data, without having to run the gauntlet of doing everything from scratch.
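To make that concrete, here is a minimal, purely hypothetical sketch (the file, column and series names are all invented for illustration) of the kind of re-analysis that an archived dataset and script make trivial: with the materials in hand, dropping a disputed series or swapping in updated data is a one-line change rather than a reconstruction from scratch.

```python
# Hypothetical sketch only: file and column names are invented for illustration,
# not taken from any actual archive.
import numpy as np
import pandas as pd

def fit_trend(df, drop_series=None):
    """Refit a toy version of a published analysis, optionally excluding one input series."""
    if drop_series is not None:
        df = df.drop(columns=[drop_series])
    composite = df.mean(axis=1)                   # stand-in for the published composite
    years = composite.index.to_numpy(dtype=float)
    slope = np.polyfit(years, composite.to_numpy(), 1)[0]
    return slope

# With archived data and code, sensitivity checks become one-liners:
data = pd.read_csv("archived_proxies.csv", index_col="year")   # hypothetical archived data file
print("as published:      ", fit_trend(data))
print("without series_07: ", fit_trend(data, drop_series="series_07"))
```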
As I’ve said on numerous occasions, I do not think that the issue is primarily inadequate peer review, though, in my opinion, journal peer review all too easily lapses into POV gatekeeping in the style of Burger and Cubasch Referee #2, and academicians are far too quick to shrug that off. Journal peer review is what it is – a cursory form of due diligence. The issue is that “buyers” assume that it’s something that it isn’t and fail to exercise caveat emptor.