Archiving Standards at the Journal of Political Economy

There is an interesting controversy at Nature and Science about peer review in the context of Hwang’s stem cell research (google for links). I’m going to post a comment about this in light of my own experience with both. First, I want to post some information (courtesy of a reader here) about archiving policies at the Journal of Political Economy. For paleoclimate studies, there is absolutely nothing intrinsic to the subject which prevents the implementation of similar policies by Nature or Science as a “best practices” standard. In my view, the focus of investigation by the two journals in respect to Hwang is not about “peer review” in the abstract, but whether the journal policies meet “best practices” standards.

I’ve previously pointed out the requirement for archiving data and source code by the American Economic Review (in response to Bruce McCullough’s work). See, for example, CA posts here and here.

The Journal of Political Economy has similar comprehensive policies requiring archiving of data as used and source code as a condition of publication. They state here:

It is the policy of the Journal of Political Economy to publish papers only if the data used in the analysis are clearly and precisely documented and are readily available to any researcher for purposes of replication. Authors of accepted papers that contain empirical work, simulations, or experimental work must provide to the Journal, prior to publication, the data, programs, and other details of the computations sufficient to permit replication. These will be posted on the JPE Web site. The Editor should be notified at the time of submission if the data used in a paper are proprietary or if, for some other reason, the requirements above cannot be met. Details of this policy can be found at http://www.journals.uchicago.edu/JPE/datapolicy.html.

The latter link specifies, inter alia:

For econometric and simulation papers, the minimum requirement should include the data set(s) and programs used to run the final models, plus a description of how previous intermediate data sets and programs were employed to create the final data set(s). Authors are invited to submit these intermediate data files and programs as an option; if they are not provided, authors must fully cooperate with investigators seeking to conduct a replication who request them. The data files and programs can be provided in any format using any statistical package or software, but a Readme PDF file documenting the purpose and format of each file provided, and instructing a user on how replication can be conducted, should also be provided.

If the paper is accepted by the JPE, the appendices containing instructions, the computer programs, configuration files, or scripts used to run the experiment and/or analyze the data, and the raw data will normally be archived on the JPE Web site when the paper appears.

This is obviously not an unattainable or utopian goal for paleoclimate (or for IPCC), since it is a standard in effect for highly similar datasets. Here I emphasize the great formal similarity in size and computer programming between applied economics and paleoclimate: in both fields one is dealing with relatively small data sets; the data are usually autocorrelated (although that’s not relevant to the archiving issues); versions of the data sets may differ, so exact citation is important; and it is impractical to explain methodologies in words exhaustively enough to permit complete replication.

This entry was written by Stephen McIntyre, posted on Dec 26, 2005 at 9:57 AM, filed under Archiving, Disclosure and Diligence.

11 Comments

Dave Dardinger

Posted Dec 26, 2005 at 11:14 AM

The Editor should be notified at the time of submission if the data used in a paper are proprietary or if, for some other reason, the requirements above cannot be met.

The question I have about the above which doesn’t seem to be covered in the quotes is whether this means that a researcher, say Mann, need merely say that his data / methods are proprietary to escape the necessity to archive them? If so that’d be rather a large loophole. What would the journal do if faced with such a claim?
Armand MacMurray

Posted Dec 26, 2005 at 1:57 PM

I would hope that the “notification” would be a rare exception, used perhaps to negotiate an access method to proprietary data for research use only, rather than a loophole. As with most organizations, what really matters is to have an Editor who will strictly enforce the archiving requirements.
Ray Soper

Posted Dec 26, 2005 at 2:04 PM

In the commercial world, in the area of due diligence for example, the onus is on the proponent to find a way to demonstrate that the statements being made are in fact true.

Presumably then, the Journal of Political Economy will work with the author to find a way to demonstrate the truth of the conclusions drawn without breaching the proprietary nature of the data and/or methodologies. One way to do that might be to appoint an independent and suitably qualified expert to examine the data and methodologies under a confidentiality agreement. If the expert auditor finds that he is able to replicate the results, and that the data handling, programs, and methodologies are properly applied, then he would provide an independent report stating that. The report of the expert auditor MIGHT be sufficient to convince the journal that the paper is of a standard deserving of publication, despite the fact that the proprietary data and methodologies can’t be published for replication.

The auditor might also examine the reasons why the claim is being made to keep the data and methodologies proprietary, and offer an opinion regarding that matter as well.

These approaches would equally apply in any area of science. I will be interested to learn what solution The Journal of Political Economy does use to deal with these issues.

It is interesting to see some scientific journals adopting what is clearly proper practice, whereas some journals publishing papers in the area of climate science do not seem to see that their credibility ultimately depends on the practices that they adopt in this area.

Finally, another interesting aspect is how authors might argue that data/methodologies developed using public funds are proprietary in the first place. Presumably the custodians of the funds advanced also have an interest in ensuring that published work resulting from such exercises is properly audited.
John A

Posted Dec 26, 2005 at 4:08 PM

Re: #3

I’d like to add: what on Earth is proprietary about tree ring records, coral samples and a program that is supposed to do standard statistical methodologies? Can these things be used by terrorists? Is the Holy Grail encoded in them?

What are the publishers of Science (the AAAS) and Nature thinking of? How many more scandals will it take before they implement proper procedures for data archiving and methodology transparency to permit proper replication? What are the excuses now?

In 2002 both journals were embroiled in another embarrassing scientific scandal involving fraudster Jan Hendrik Schön. Subsequent to the investigation (by Bell Labs, Jan Hendrik Schön’s employer), Science had to withdraw 8 papers and Nature had to withdraw 7.

I keep getting invitations to join the AAAS so that I can subscribe to Science online. I’m just not convinced that Science is any better at distinguishing truth from fakery than the Weekly World News.
Paul

Posted Dec 26, 2005 at 4:31 PM

If the researcher was funded through public funds, then, IMHO, there is no getting around this requirement. Public funds means that the researcher doesn’t “own” the research–the taxpayers do. But Dave does point out that there’s a rather large loophole available.
John S

Posted Dec 26, 2005 at 4:57 PM

Proprietary data sets and datasets containing confidential information occur frequently enough. For example, unit record data from census have confidentiality restrictions on them which mean that researchers can’t provide access to the data directly – it has to be obtained from the Census Bureau. In these cases, the programs used can be provided but access to the data has to be obtained independently through the originating organisation. This is usually not much of a problem (but can be a problem for non-citizens, for example).

Sometimes private companies may provide access to their own data to a particular researcher. For example, State Street has data on all international transactions it has been involved in (most of them). Some researchers have gained access to this data and used it for academic research. Access to the raw data is unavailable generally and I am not certain what procedures were followed by the journal which published it to ensure integrity. However, the issues it raises should make it clear that easy answers are not always available – a private company shouldn’t have to provide access to all and sundry just because it does so once, and research would be detrimentally affected if such research was prohibited entirely (as would happen if you required that any research published must have publicly available data in it). At the least, claims for restrictions on access to the data should not be frivolously made and not granted easily.
Steve McIntyre

Posted Dec 26, 2005 at 8:33 PM

One of Bruce McCullough’s interesting arguments for archiving of data and methods is that, by eliminating an important barrier to verification, it reduces the cost of replication and would contribute to more such studies being done. Obviously one of the obstacles to replicating multiproxy studies is the simple unavailability of data and methods. Merely improving disclosure would help avoid such problems. One of the other requirements of the JPE is a statement of selection protocols. This is an important issue where I’ve been completely stonewalled by Nature in respect to Mann’s refusal in this area – I know it’s hard to keep track of the miscellaneous issues, but that’s just because there are a lot of them.
TCO

Posted Dec 26, 2005 at 11:16 PM

The bottom line is that written explanation of methods is not adequate. Both by common sense and by example. Would you let a math paper skip the steps in a theorem?
John A

Posted Dec 27, 2005 at 5:18 AM

Re #8

“Then a miracle occurs…”
David Pannell

Posted Jan 29, 2006 at 9:37 PM

As a journal editor I had to seek out the following information. I thought it made an interesting supplement to this discussion. It summarises the policies of three other leading economics journals re lodging data and software, particularly where there are reasons for wishing to withhold some data.

Econometrica
http://www.econometricsociety.org/submissions.asp
“the Journal understands that there may be some practical difficulties, such as in the case of proprietary datasets with limited access as well as public use data sets that require consent forms to be signed before use. In these cases the editors require that detailed data description and the programs used to generate the estimation data sets are deposited, as well as information of the source of the data so that researchers who do obtain access may be able to replicate the results. This exemption is offered on the understanding that the authors made reasonable effort to obtain permission to make available the final data used in estimation, but were not granted permission.”

American Economic Review
http://www.aeaweb.org/aer/data_availability_policy.html
“If some or all of the data are proprietary and an exemption from this requirement has been approved by the Editor, authors must still provide a copy of the programs used to create the final results. We require this because the criterion for exemption from the data availability policy is that other investigators can, in principle, obtain the data independently. These authors must also provide in their Readme PDF file details of how the proprietary data can be obtained by others.”

Review of Economic Studies
http://www.blackwellpublishing.com/submit.asp?ref=0034-6527
“The Review of Economic Studies will publish papers only if the data used in the analysis are clearly and precisely documented and are readily available to any researcher for purposes of replication. Authors of accepted papers which contain empirical work, simulations, or experimental work, must provide to the Review, prior to publication, the data, programs, and other details of the computations sufficient to permit replication. These will be posted on the Review of Economic Studies web site. We reserve the right to refuse publication of papers whose authors do not comply with these requirements.”
Steve McIntyre

Posted Jan 29, 2006 at 9:50 PM

It is remarkable that supposedly authoritative journals such as Science and Nature have lower standards than economic journals.