Shub Niggurath on Archiving Code

Excellent post by Shub Niggurath at his blog here discussing replication problems. It’s interesting to see how the same excuses play themselves out in different fields. Statisticians criticize authors for non-replicability of their results. The authors complain that the statisticians failed to replicate a previously unreported (and usually questionable) methodological procedure. We’ve seen this movie before.

Shub reports that Hothorn and Leisch, in ‘Case studies in reproducibility’ (Briefings in Bioinformatics), noted that one of our papers (MM 2005, EE) even included code in the running text to clarify certain points:

Acknowledging the many subtle choices that have to be made and that never appear in a ‘Methods’ section in papers, McIntyre and McKitrick go as far as printing the main steps of their analysis in the paper (as R code).
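
To give a flavour of what that looks like in practice, here is a hypothetical sketch (not the actual MM 2005 listing): the idea is that each methodological step appears in the running text as executable R rather than as an approximate verbal description.

    # Hypothetical sketch only -- not the MM 2005 code.
    # Each step of the analysis is stated as executable R in the text.
    proxies <- read.table("proxies.txt", header = TRUE)         # hypothetical input file
    proxies <- scale(proxies, center = TRUE, scale = TRUE)      # standardize each series
    pc      <- prcomp(proxies, center = FALSE, scale. = FALSE)  # principal components
    pc1     <- pc$x[, 1]                                        # PC1 carried forward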

22 Comments

  1. SM
    Posted Feb 27, 2011 at 2:52 PM | Permalink

    …”McIntyre and McKitrick go as far as printing the main steps of their analysis in the paper (as R code)…”

    Yep. M&M do set a quality & integrity standard that’s hard to beat.

  2. Posted Feb 27, 2011 at 3:41 PM | Permalink

    And that is what science should be. People write it in blogland all the time, but if you fear the release of your code then you fear the truth of your result. With the capacity for information storage available today, there is little room for people to claim that code should not be disclosed. I throw nothing away. Not one thing. Data storage is cheap to the point of being a non-issue entirely.

    Any questions about disclosure of paper-related code are moot unless the code is intended for sale. As is often the case in changing times, policies lag reality.

    • Jeremy
      Posted Mar 1, 2011 at 10:04 AM | Permalink

      Code is usually smaller than data anyway.

      It is strange to see this all take place. I am specifically referring to the various fields of science reaching an understanding that software is method, and that if you’re not sharing your methods for getting your answer, you are not showing your work. I am a bit of an open-source software fanboy. The ideas that created the notion of intellectual property never held much water with me. Copyrights and Patents were intended to give original authors time to profit from their creativity, not to allow corporations to hold a population hostage forever through licensing restrictions and legal threats. It is amazing to see the pollution of ideas from the closed software world used to justify creating “invulnerable” results in science.

      It is science that should have been leading the way on the issue of fully open code sharing.

  3. PJB
    Posted Feb 27, 2011 at 4:46 PM | Permalink

    Why would a reputable journal not insist on independent verification of code function BEFORE publishing?

    If thermometers were known to be uncalibrated or otherwise inaccurate in the hands of practitioners, would they not…..errrrr speaking of proxies….never mind.

  4. RBerteig
    Posted Feb 27, 2011 at 5:55 PM | Permalink

    I particularly enjoyed the description of an erroneous result that depended not only on the software used, but on the particular *version* of the software used. A change in the default values of unspecified parameters changed the results.

    That shows that they had not fully understood the implications of the default parameters of the algorithm used, let alone studied the sensitivity of their results to those parameters.
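
    To illustrate the kind of trap I mean, here is a made-up R example (not the code from the article): the same call gives different answers once a package revision changes the default of an argument the author never pinned down.

      # Hypothetical illustration: two "versions" of the same smoothing routine,
      # differing only in the default value of an unspecified argument.
      smooth_v1 <- function(x, y, span = 0.75) lowess(x, y, f = span)$y  # "version 1" default
      smooth_v2 <- function(x, y, span = 0.30) lowess(x, y, f = span)$y  # "version 2" default

      set.seed(1)
      x <- 1:100
      y <- sin(x / 10) + rnorm(100, sd = 0.3)

      range(smooth_v1(x, y) - smooth_v2(x, y))            # same call, different results
      isTRUE(all.equal(smooth_v1(x, y, span = 0.5),
                       smooth_v2(x, y, span = 0.5)))      # pinning the argument restores agreement

    Specifying every argument explicitly, and recording the software version used, removes that ambiguity.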

  5. MrPete
    Posted Feb 27, 2011 at 7:58 PM | Permalink

    With scientists writing and using code for crucial results, careful code review, QA testing and more become increasingly important.

    While the presence of significant errors may well be rare, the consequences of such errors can be huge… and go undetected for quite a long time, as nicely illustrated by the sign-error example in the linked articles.

    I have my own example from a number of years back: I discovered an error in the core math library of a well-known software provider. The error was sufficiently subtle that it only affected about one in a thousand physical computers (it was random, based on certain hardware aspects!). Because the vendor could not reproduce it quickly in their lab, they refused my bug report for more than five years… even though I could provide full proof of the issue and a working fix.

    All it takes is for a subtle error to be present in the wrong place at the wrong time… and a huge amount of fallout can result.

    • Eric Anderson
      Posted Feb 28, 2011 at 12:17 AM | Permalink

      Sheesh, that is an interesting, and scary, example. With all the complexity of computational systems (even the vendor lagged in your example), it is all the more important that others can reproduce results on their own machines.

      As to the comments in the article, yes, the code scientists write can be ugly and “bad,” but that certainly doesn’t mean it should be withheld — indeed, quite the opposite.

  6. Al Gored
    Posted Feb 28, 2011 at 3:22 AM | Permalink

    Ditto to what SM said.

    Hopefully the standard you set will become the standard. It should.

  7. Craig Loehle
    Posted Feb 28, 2011 at 10:57 AM | Permalink

    I have found bugs in computations in Excel, Fortran, and Mathematica, with the last of these being quick to fix them.
    Also, Mathematica is a nice platform for documenting a full analysis. One can include not only the code and data (if not too large), but also the graphics and results, all in a single file.

  8. j ferguson
    Posted Feb 28, 2011 at 12:04 PM | Permalink

    In his very excellent article, Shub reports the following recommendation by Victoria Stodden:

    “We propose that high-quality journals such as Nature not only have editors and reviewers that focus on the prose of a manuscript but also “computational editors” that look over computer codes and verify results.”

    Does anyone reading this think it would be practical to expect a reliable code review by pre-pub reviewers?

    An absolute requirement that code be available without restraint concurrent with publication might be more effective. The hidden anonymous peer reviewers seem to have done the science little good, although possibly without them things could have been worse.

    • Posted Feb 28, 2011 at 3:34 PM | Permalink

      My impression of the point Stodden is trying to make is that simple reconstructibility of the code-derived results and graphs in a paper should be a prerequisite quality check before publishing, much as peer review is.

  9. Richard
    Posted Mar 1, 2011 at 11:40 AM | Permalink

    This is a very telling quote from Shub’s fine blog:

    “In his paper on reproducible research in 2006, Randall LeVeque wrote in the journal Proceedings of the International Congress of Mathematicians:

    ‘Within the world of science, computation is now rightly seen as a third vertex of a triangle complementing experiment and theory. However, as it is now often practiced, one can make a good case that computing is the last refuge of the scientific scoundrel. Of course not all computational scientists are scoundrels, any more than all patriots are, but those inclined to be sloppy in their work currently find themselves too much at home in the computational sciences.'”

  10. j ferguson
    Posted Mar 1, 2011 at 11:50 AM | Permalink

    Shub,
    thank you again. But let me put it a different way.

    Can any reader who is familiar with both the peer-review process and the character of the code we are contemplating comment on whether peer reviewers can be expected to perform any sort of useful check, given the way these reviews are presently done? Is it too much free work? Would such reviews really detect the aberrations?

    Shub, I thought understood what Dr. Stodden was getting at. I just wondered if it was practical.

    • Craig Loehle
      Posted Mar 1, 2011 at 1:54 PM | Permalink

      Reviewers cannot be “expected to”, based on most reviews I have gotten (some quite superficial), but some will. I know a chemist who actually duplicates experiments as part of his review.

    • Posted Mar 2, 2011 at 8:21 AM | Permalink

      ferguson,
      Sorry I did not fully address your question.

      There are two things here obviously. Is it practical? Is it desirable?

      I think Stodden’s point is that, among all the checks a reviewer can perform on whether a paper’s basic claims are reproducible (which I know for a fact they don’t do anyway), checking whether the code works is the easiest. So why not get it done?

      But in reality, many a time, reviewers don’t know how to code, can’t be bothered to put in the hard work of offering genuine criticism (why should I do the authors’ work?), let their eyes glaze over code and data, just want to offer generalized criticism, or, more commonly, can recognize bad code when they see it but don’t want to go there too much, because their own coding is exactly as bad. Therefore, putting this additional burden on reviewers won’t be practical, especially for journals like Nature, which want ‘efficient’ reviewers rather than nosy people who ask too many questions.

      Secondly, journals might rightly see this as grunt work that is properly the realm of the researchers, not theirs, and therefore do not want to take on the additional responsibility. In turn, many strongly feel that this should not be the case, given all the money journals make. See this interview with a researcher at Harvard about the dysfunctional scientific journal market (here).

      The other reason I can think of as to why getting journals to run code verification is not a good idea is that it artificially adds to the imprimatur of credibility that scientific findings already seem to gain just by publication.

      ‘Published in Nature? That means the code must have been checked. No need to look there then.’ or even worse, ‘We hold Nature partially responsible for this error. They provided a statement of code verification.’

      • Steve McIntyre
        Posted Mar 2, 2011 at 9:03 AM | Permalink

        I, for one, never suggested that reviewers be obligated to check code. The advantage of archiving code is that it documents all the methodological decisions, so that someone seeking to replicate the results can reconcile more efficiently. From my own experience, archiving code at the time of publishing an article is helpful to the author, since it’s all too easy to forget precisely what you did, and all too easy to overwrite the version of the code that you used to get your results.

        In the first code that we archived, I didn’t try to have the code do more than act as documentation. However, it later became clear to me that code could easily be archived in fully turnkey form, that readers were interested in being able to generate the results for themselves, and that this helped them understand the article.
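
        For concreteness, a “turnkey” archive can be as simple as a single script that a reader sources from top to bottom. The sketch below is hypothetical (placeholder URL and column names), not one of my actual archives.

          # Hypothetical turnkey sketch: source() this one file to regenerate a figure.
          url <- "http://www.example.org/data/network.csv"                      # placeholder URL
          dat <- read.csv(url)                                                  # fetch the data as used
          dat$anomaly <- dat$value - mean(dat$value[dat$year %in% 1902:1980])   # explicit base period
          plot(dat$year, dat$anomaly, type = "l",
               xlab = "Year", ylab = "Anomaly")                                 # regenerate the figure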

      • HaroldW
        Posted Mar 2, 2011 at 10:44 AM | Permalink

        I have to agree with Shub that the reviewers who are most likely to understand the thrust of the paper, and the relationships with other publications, are not likely to be expert in software. The purpose of publishing the code is to show the steps actually taken, rather than a description which is only approximate. Our host can expound at length on algorithmic descriptions which are kinda-sorta correct, but which overlook key implementation details (or which were implemented incorrectly). See http://climatecode.org/blog/2011/03/why-publish-code-a-case-study/ for an example of why kinda-sorta descriptions don’t cut it.

        And lest one think that the benefit of code publication is just for those who wish to replicate results, I have to second Steve’s observation that clear, well-organized and well-commented code not only serves the reader, but also the author. It’s common, probably universal, that one doesn’t remember the exact steps even in one’s own program, when one returns to it after an interval of a few months. A lesson I learned long ago.

      • Posted Mar 2, 2011 at 12:52 PM | Permalink

        “Is it practical?”

        Generally speaking? No.

        There are two aspects to checking that software is correct in this context (and in most others):

        1) Whatever it is I’m trying to achieve, is it the right thing to attempt in the first place? (are my requirements understood, is my choice of algorithm suitable?)

        2) When I try to achieve it, is my implementation correct and free from errors and unwanted side-effects?

        Both are colossally difficult to achieve even for people whose business is software development; a visiting programmer or dilettante is unlikely even to know what they don’t know.

        The *best* you can do in this scenario, IMO, is to set standards for development in general when code is part of a submitted paper: formatting, naming, scoping, documentation, methodology, test scripts and results, and so on (a minimal example of the last of these appears after the list below).

        If the included code does not adhere to the accepted best practice of the language at hand, the paper should be rejected out of hand until the software *does* pass at least this shallow hurdle for quality.

        This has a twofold effect:

        1) it forces some degree of reflection and adherence to practices that are likely to avoid the most obvious errors

        2) it makes the job of anyone deciding to attempt any in-depth analysis for more subtle errors that much easier, by virtue of excluding the donkey work of parsing random awful code.
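
        By way of illustration, even a trivial test script of the sort I have in mind (a hypothetical toy routine, not anyone’s actual code) states an expected answer for each routine the paper relies on and fails loudly if it is not met:

          # Hypothetical test script accompanying a submission.
          wmean <- function(x, w) sum(x * w) / sum(w)             # toy routine under test

          stopifnot(
            isTRUE(all.equal(wmean(c(1, 2, 3), c(1, 1, 1)), 2)),  # hand-checked case
            isTRUE(all.equal(wmean(c(10, 20), c(3, 1)), 12.5))    # weighted case: (30 + 20) / 4
          )
          cat("all checks passed\n")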

  11. j ferguson
    Posted Mar 1, 2011 at 11:51 AM | Permalink

    add “I” between thought and understood in the comment above.

  12. MikeN
    Posted Mar 1, 2011 at 4:32 PM | Permalink

    This is a horrible idea. Now if an error is made people are less likely to catch it, as they will just be blindly reusing the same code. Having people write their own code based on the algorithm given is a better idea.
    If skeptics aren’t able to reproduce results, it just means they aren’t qualified to speak on the science.

    • John M
      Posted Mar 1, 2011 at 6:26 PM | Permalink

      If skeptics aren’t able to reproduce results, it just means they aren’t qualified to speak on the science.

      Yup, just like all those skeptics that couldn’t reproduce cold fusion.

      BTW, how’d that argument work for Eric Steig and Michael Mann, the “we’re not statisticians” duo?

  13. Gilbert K. Arnold
    Posted Mar 1, 2011 at 6:04 PM | Permalink

    Whenever I write code (not very often), I keep in mind the simple phrase:

    “MEATBALL LOGIC, GENERATES SPAGHETTI CODE”… words to live by…

    Perhaps a few climate researchers should have this tattooed on their foreheads in such a manner that it is readable in the mirror.