Here’s a discussion of replication policy posted in the relatively early days of the blog, which I’ve re-posted in light of NASA spokesman Gavin Schmidt’s attempts to justify Hansen’s refusal to provide the source code used in his temperature calculations. It seems that these calculations are important enough to prompt a concern over the “destruction of Creation” but apparently only the elect will be permitted to see these calculations. The discussion of replication is based on experience in economics and social science unrelated to the present controversy but fully applicable to it.
We get considerable criticism from paleoclimate scientists that complying with requests for data and methods sufficient to permit replication is much too onerous and distracts them from "real work". However, the problem is not our request, but that any request should be necessary in the first place. In my opinion, a replication package should have been archived at the time of original publication so that any subsequent researcher can replicate the results without needing to contact the original author. From my personal experience, non-academics typically assume that there are adequate due diligence packages and find it difficult to believe otherwise. It appears that significant academic experience is necessary to instill a belief that a due diligence package is an imposition.
Right now, it is obviously practical and feasible to create replication archives at the time of publication and this is a mandatory requirement in some fields (econometrics). In business, adequate compliance with regulations is often based on available practices. So I’ve tended to think that if this is feasible in empirical econometrics, it is feasible in paleoclimate science (where the structure of the datasets is surprisingly similar to empirical econometrics). During all of this, I have remained firmly convinced that climate scientists will not be able to avoid complying with proper standards for archiving data and methods much longer.
When I first ventured into climate replication, my framework was one of business audits and feasibility and engineering studies, a framework which has been ridiculed by some academics (inappropriately, in my opinion). I’ve often described myself as feeling like an anthropologist in studying the behavior of climate scientists, because their standards of replication and audit (or lack of them) seem as foreign to me as tribal customs must have seemed to early 20th-century anthropologists in the South Sea Islands. During this journey, I have encountered some commentary on replication from within the academic community (though not the climate science community), especially from Dewald et al., McCullough and Vinod and Gary King, which captures almost exactly what I had in mind. We referred briefly to this in our E&E article and I’ll expound a little more today on this.
In our E&E article we said:
The ability of later researchers to carry out independent due diligence in paleoclimate is severely limited by the lack of journal policies or traditions requiring contributors to promptly archive data and methods. King has excellent comments on replication. In this respect, paleoclimate journal editors should consider changes taking place at some prominent economics journals. For example the American Economic Review now requires, as a precondition of publication, archiving data and computational code at the journal. This is a response to the critique of McCullough and Vinod, and earlier work by Dewald et al. The files associated with paleoclimate studies are trivial to archive. In our view, if the public archive does not permit the replication of a multiproxy study, then it should be proscribed for use in policy formation [McCullough and Vinod, 2003].
In March 2005, I reported with some satisfaction that we had been quoted in Anderson et al., The Role of Data & Program Code Archives in the Future of Economic Research, available here, which pointed out:
For all but the simplest applications, a published article cannot describe every step by which the data were filtered and all the implementation details of the estimation methods employed. Without knowledge of these details, results frequently cannot be replicated or, at times, even fully understood. Recognizing this fact, it is apparent that much of the discussion on replication has been misguided because it treats the article itself as if it were the sole contribution to scholarship – it is not. We assert that Jon Claerbout’s insight for computer science, slightly modified, also applies to the field of economics: An applied economics article is only the advertising for the data and code that produced the published results.
Our E&E article especially referred to Gary King and McCullough and Vinod. Gary King has a website discussing replication here. His concept of a replication package included, among other things, data as used and source code, so that there was a permanent archive not simply for present reviewers, but for future reviewers. King’s concern for future readers is significant – one of the arguments sometimes heard against examining MBH98 is that it has been superseded and that it is no longer pertinent to look at it. There are many arguments against the view that it is no longer being used, but here I point out King’s concern that published research be available to readers in the future.
"The replication standard holds that sufficient information exists with which to understand, evaluate, and build upon a prior work if a third party can replicate the results without any additional information from the author." This was proposed for political science, along with policy suggestions for teachers, students, dissertation writers, graduate programs, authors, reviewers, funding agencies, and journal and book editors, in Gary King “Replication, Replication,” [pdf, or HTML] PS: Political Science and Politics, with comments from nineteen authors and a response, “A Revised Proposal, Proposal,” Vol. XXVIII, No. 3 (September, 1995): Pp. 443-499. Authors normally follow the standard by making available a "replication data set" to accompany each publication that includes the data, as well as details about the procedures to be followed to trace the chain of evidence from the world to the data and to the tables and figures in the publication. (Putting all this information in the article itself is preferable, but normally infeasible.)
King’s 1995 paper is online here and is worth reading in its entirety. Here is a replication standard proposed in this 1995 paper:
The first step in implementing the replication standard is to create a replication data set. Replication data sets include all information necessary to replicate empirical results. For quantitative researchers, these might include original data, specialized computer programs, sets of computer program recodes, extracts of existing publicly available data (or very clear directions for how to obtain exactly the same ones you used), and an explanatory note (usually in the form of a "read-me" file) that describes what is included and explains how to reproduce the numerical results in the article….
Possibly the simplest approach is to require authors to add a footnote to each publication indicating in which public archive they will deposit the information necessary to replicate their numerical results, and the date when it will be available. This policy is very easy to implement, because editors or their staffs would be responsible only for the existence of the footnote, not for confirming that the data set has been submitted nor for checking whether the results actually can be replicated. Any verification or confirmation of replication claims can and should be left to future researchers. For the convenience of editors and editorial boards considering adopting a policy like this, the following is a sample text for such a policy:
Authors of quantitative articles in this journal [or books at this press] must indicate in their first footnote in which public archive they will deposit the information necessary to replicate their numerical results, and the date when it will be submitted. The information deposited should include items such as original data, specialized computer programs, lists of computer program recodes, extracts of existing data files, and an explanatory file that describes what is included and explains how to reproduce the exact numerical results in the published work. Authors may find the "Social Science Research Archive" of the Public Affairs Video Archive (PAVA) at Purdue University or the "Publications-Related Archive" of the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan convenient places to deposit their data. Statements explaining the inappropriateness of sharing data for a specific work (or of indeterminate periods of embargo of the data or portions of it) may fulfill the requirement. Peer reviewers will be asked to assess this statement as part of the general evaluative process, and to advise the editor accordingly. Authors of works relying upon qualitative data are encouraged (but not required) to submit a comparable footnote that would facilitate replication where feasible. As always, authors are advised to remove information from their data sets that must remain confidential, such as the names of survey respondents.
McCullough and Vinod have attempted to replicate empirical econometrics results. Their experience was pretty reminiscent of my experience with paleoclimate scientists, as will be seen below. The Hockey Team purports to excuse their non-production of data and methods to me on the basis that they would readily do so to "peers", but feel no obligation to provide information on data and methods to any Canadian businessman who happens to take an interest in their work. It seems irrefutable that their supposed willingness to produce detailed information on data and methods to "peers" has not been put to the test by any actual requests from "peers", else Crowley could hardly have "misplaced" his data and Mann would have known the URL of his MBH98 data (and the "peers" would presumably have found the problems that I encountered). As I mention repeatedly, I have never asked for anything that I did not believe was part of a due diligence package. The McCullough and Vinod experience shows that the problems are systemic to empirical science. Their analysis of why the problems exist is thoughtful, as are their recommendations.
I’ve posted McCullough and Vinod online here. The first part is related to issues of nonlinear problem solvers, which are not necessary for replication issues. If you go to Part IV, you’ll find a terrific discussion of the problems of replication. McC and V:
we found that the lesson of William G. Dewald et al. (1986) has not been well-learned: the results of much research cannot be replicated. Many authors do not even honor this journal’s replication policy, let alone ensure that their work is replicable. Gary King (1995, p. 445) posed the relevant questions: [I]f the empirical basis for an article or book cannot be reproduced, of what use to the discipline are its conclusions? What purpose does an article like this serve?…
Though the policy of the AER requires that “Details of computations sufficient to permit replication must be provided,” we found that fully half of the authors would not honor the replication policy. Perhaps this should not be surprising: Susan Feigenbaum and David Levy (1993) have clearly elucidated the disincentives for researchers to participate in the replication of their work, and our experience buttresses their contentions. Two authors provided neither data nor code: in one case the author said he had already lost all the files; in another case, the author initially said it would be “next semester” before he would have time to honor our request, after which he ceased replying to our phone calls, e-mails, and letters. A third author, after several months and numerous requests, finally supplied us with six diskettes containing over 400 files, and no README file. Reminiscent of the attorney who responds to a subpoena with truckloads of documents, we count this author as completely noncompliant. A fourth author provided us with numerous data files that would not run with his code. We exchanged several e-mails with the author as we attempted to ascertain how to use the data with the code. Initially, the author replied promptly, but soon the amount of time between our question and his response grew. Finally, the author informed us that we were taking up too much of his time; we had not even managed to organize a useable data set, let alone run his data with his code, let alone determine whether his data and code would replicate his published results.
Replication is the cornerstone of science. Research that cannot be replicated is not science, and cannot be trusted either as part of the profession’s accumulated body of knowledge or as a basis for policy. Authors may think they have written perfect code for their bug-free software package and correctly transcribed each data point, but readers cannot safely assume that these error-prone activities have been executed flawlessly until the authors’ efforts have been independently verified. A researcher who does not openly allow independent verification of his results puts those results in the same class as the results of a researcher who does share his data and code but whose results cannot be replicated: the class of results that cannot be verified, i.e., the class of results that cannot be trusted. A researcher can claim that his results are correct and replicable, but before these claims can be accepted they must be substantiated.
This journal recognized as much when, in response to Dewald et al. (1986), it adopted the aforementioned replication policy. If journal editors want researchers and policy makers to believe that the articles they publish are credible, then those articles should be subject, at least in principle, to the type of verification that a replication policy affords. Therefore, having a replication policy makes sense, because a journal’s primary responsibility is to publish credible research, and the simple fact is that “research” that cannot be replicated lacks credibility…..
We chose recent issues of JIE and IJIO, and made modest attempts to solicit the data and code: given the existence of the World Wide Web, we do not believe that obtaining the data and code should require much more effort than a few mouse clicks. We sent either e-mails or, if an e-mail address could not be obtained, a letter, to the first author of each empirical article, requesting data and code; for IJIO there were three such articles, and for JIE there were four. Only two of the seven authors sent us both data and code…
As solutions to these problems, as part of a symposium on the topic of replication, King (1995) discussed both the replication standard, which requires that a third party could replicate the results without any additional information from the author, and the replication data set, which includes all information necessary to effect such a replication. Naturally, this includes the specific version of the software, as well as the specific version of the operating system. This should also include a copy of the output produced by the author’s combination of data/code/software version/operating system. In the field of political science, many journals have required a replication data set as a condition of publication.
Some economics journals have archives; often they are not mandatory or, as in the case of the Journal of Applied Econometrics, only data is mandatory, while code is optional. A “data-only” requirement is insufficient, though, as Jeff Racine (2001) discovered when conducting a replication study. As shown by Dewald et al. (1986), researchers cannot be trusted to produce replicable research. We have shown that the replication policies designed to correct this problem do not work. The only prospect for ensuring that authors produce credible, replicable research is a mandatory data/code archive, and we can only hope that more journals recognize this fact.
To the best of our knowledge the only economics journals that have such a policy are the Federal Reserve Bank of St. Louis Review, the Journal of Money, Credit, and Banking, and Macroeconomic Dynamics. The cost of maintaining such an archive is low: it is a simple matter to upload code and (copyright permitting) data to a web site. The benefits of an archive are great. First, there would be more replication (Richard G. Anderson and Dewald, 1994). Second, as we recently argued (McCullough and Vinod, 1999, p. 661), more replication would lead to better software, since more bugs would be uncovered. Researchers wishing to avoid software-dependent results will take Stokes’ (2003) advice and use more than one package to solve their problems; this will also lead to more bugs being uncovered. Finally, the quality of research would improve: knowing that eager assistant professors and hungry graduate students will scour their data and code looking for errors, prospective authors would spend more time ensuring the accuracy, reliability, and replicability of their reported results.
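To make concrete how little work the archive described above involves, here is a minimal sketch in Python of a manifest for a replication archive: it lists every file with a checksum, records the software and operating system versions, and notes whether a README is present. This is my own illustration, not anything proposed by King or by McCullough and Vinod; the function names `sha256_of` and `build_manifest` and the manifest field names are hypothetical.

```python
import hashlib
import platform
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> dict:
    """Describe a replication archive (hypothetical layout).

    Lists every file under archive_dir with its size and checksum,
    and records the interpreter and operating system versions.
    """
    files = []
    for path in sorted(archive_dir.rglob("*")):
        if path.is_file():
            files.append({
                "path": path.relative_to(archive_dir).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return {
        "python_version": platform.python_version(),
        "operating_system": platform.platform(),
        # True only if a file whose name starts with "readme" sits at the top level.
        "has_readme": any(f["path"].lower().startswith("readme") for f in files),
        "files": files,
    }
```

A later researcher could rebuild the manifest from the downloaded archive and compare checksums to confirm that the files are the ones the author actually used.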
As a result of McCullough and Vinod, the American Economic Review has adopted a policy requiring a replication package, including both code and data, to be archived as a condition of publication.
It is the policy of the American Economic Review to publish papers only if the data used in the analysis are clearly and precisely documented and are readily available to any researcher for purposes of replication. Authors of accepted papers that contain empirical work, simulations, or experimental work must provide to the Review, prior to publication, the data, programs, and other details of the computations sufficient to permit replication. These will be posted on the AER Web site.
It’s not just paleoclimate scientists that don’t want people to check their empirical work. The time to deal with replication packages is at the point of publication – not in rear-guard actions after the fact, which require authors to try to remember what they did.
In some cases, journals have adequate policies on data in the narrowest sense (i.e. the numbers and not the code); see, e.g., our recent discussion of Science. The main problem appears to be administration. As King points out, this need not be onerous. One simple way of ensuring much improved compliance would be to add a form at the time of submission – say like Nature’s declaration of financial interests – in which the authors provide a link to a replication archive (which might be a private archive at the time of submission) with a warranty that they will transfer the replication archive to a permanent archive like WDCP between acceptance and publication. They would have to submit a second online confirmation verifying that the archive had been transferred to a permanent archive.
AGU has perfectly good data citation policies, essentially prohibiting the use of "grey" data. Unfortunately these are not followed at AGU publications, like GRL or JGR. This is a slightly different issue than the replication archive, as a full data citation includes the URL for a digital source version – citation of print publications for digital sources is not adequate under AGU policies for obvious reasons, but is still usual paleoclimate practice.