Nature’s Statistical Checklist for Authors

Nature’s Guide to Authors includes an excellent statistical checklist which authors are asked to comply with to "ensure statistical adequacy". I’ve reproduced the checklist below, bolding a couple of interesting criteria. Readers of this blog can readily imagine how this checklist would apply to MBH98 or, for that matter, to Moberg et al. [2005].

One wonders sometimes if the left hand knows what the right hand is doing at the big science journals. Nature’s handling of statistics reminds me of Science’s handling of data archiving. In both cases, the policy is terrific, but neither journal seems to have any procedures for implementing the policy for paleoclimate articles. Maybe they are better on medical and biological topics.

As you see below, Nature has a policy requiring that "Any data transformations are clearly described and justified". Whatever else one may think of our criticism of Mann’s PC method, it remains unarguable that the PC methodology used in MBH98 was not "conventional" and that the data transformation was not "clearly described and justified." Obviously the editors and reviewers were unaware of this at the time of the original article. But what about at the time of the Corrigendum? At this time, Nature editors were clearly aware of the data transformation prior to the PC calculations. Let’s say that they, in good faith, felt that our own submission had not demonstrated that the data transformation "mattered" in terms of its ultimate effect. That does not excuse their failure to insist on a proper description and justification of the data transformation in the Corrigendum.

One could go on and on. I think that I’ve pointed out the small size of the Moberg data set as well as the extraordinary non-normality of key data sets. It is inconceivable to me that Moberg et al. could have considered the Nature statistical checklist below and reported that they were in compliance with it. So one presumes that, at no point in the Nature editorial process, did they ever ask the authors to confirm that they had carried out the statistical checks listed in Nature’s policy or check to see that they had. Just imagine what a questionnaire on MBH98 would look like.

The nice thing about policy statements like this is that they give objective standards for evaluating articles like Moberg et al 2005 or even MBH98. I think that I’ll submit Nature’s checklist to the NAS panel.

Type and applicability of test used
· Comparisons of interest are clearly defined
· Name of tests applied are clearly stated
· All statistical methods identified unambiguously
· Justification for use of test is given
· Data meets all assumptions of tests applied (with particular attention paid to non-normal data sets or small sample sizes, which should be identified in the text as such)
· Adjustments made for multiple testing are explained
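
As a rough illustration of what an “adjustment for multiple testing” can look like, here is a minimal Python sketch of two standard corrections, Bonferroni and Holm. The function names and the example p-values are made up for illustration; the checklist itself does not prescribe any particular method.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03]  # hypothetical p-values from three tests
    print(bonferroni_adjust(raw))
    print(holm_adjust(raw))
```

Either way, a reader can only check the adjustment if the paper says how many comparisons were made and which correction was applied.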

Details about the test

· n is reported at the start of the study and for each analysis thereafter
· Sample size calculation (or justification) is given
· Unit of analysis is given for all comparisons
· Alpha level is given for all statistical tests
· Tests are clearly identified as one or two-tailed
· Actual P values are given for primary analyses

Descriptive statistics summary

· n for each data set is clearly stated
· A clearly labeled measure of center (e.g. mean or median) is given
· A clearly labeled measure of variability (e.g. standard deviation or range) is given
· All numbers following a ± sign are identified as standard errors (s.e.m.) or standard deviations (s.d.)
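
The distinction in the last item matters because the two numbers can differ a great deal: the standard error of the mean is the standard deviation divided by the square root of n. A minimal sketch using Python’s standard library follows; the `summarize` helper and the sample values are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Return n, center and variability measures for one data set."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": sd,
        "sem": sd / math.sqrt(n),  # standard error of the mean
    }


if __name__ == "__main__":
    summary = summarize([2, 4, 4, 4, 5, 5, 7, 9])  # made-up sample
    print(f"{summary['mean']} ± {summary['sd']:.3f} (s.d.)")
    print(f"{summary['mean']} ± {summary['sem']:.3f} (s.e.m.)")
```

With n = 8 the s.e.m. is already less than half the s.d., so an unlabeled ± figure is ambiguous by a large factor.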

Anomalies

· Any unusual or complex statistical methods are clearly defined and explained for Nature’s wide readership. (Authors are encouraged to use Supplementary Information for long explanations).
· Any data exclusions are stated and explained
· Any discrepancies in the value of n between analyses are clearly explained and justified
· Any method of treatment assignment (randomization etc) is explained and justified
· Any data transformations are clearly described and justified

Within individual graphs:

Distortions

· Any distorted effect sizes (e.g. by truncation of y axis) are clearly labeled and justified

Clear labelling

· Error bars are present on all graphs, where applicable.
· All error bars are clearly labeled

Many statistical analyses published in Nature are highly sophisticated and outside the scope of this checklist, particularly in the case of some studies in physical sciences disciplines. Authors and referees who have specific suggestions for additional entries to this list are encouraged to send them by email to authors@nature.com or referees@nature.com. Nature will update this checklist at intervals in an effort to ensure that papers published are statistically robust.

This entry was written by Stephen McIntyre, posted on Apr 1, 2006 at 8:01 AM, filed under Disclosure and Diligence, Nature, Statistics and tagged nature, naturemag, Statistics.

37 Comments

kim

Posted Apr 1, 2006 at 8:47 AM | Permalink

It is more than a little disconcerting to read how poorly the policies are followed. Hoi polloi understand the evil in that because who hasn’t, in their workplace, heard or said “Let’s check policy”. Are not the editorial offices a workplace?
=============================
J Demesure

Posted Apr 1, 2006 at 12:12 PM | Permalink

I tried to check Mann et al’s rebuttal to YOUR rebuttal on realclimate.com, and I find it’s just rhetoric to try to drown the readers in a deluge of technicalities and to divert your objection.

When I read his “simple explanation for the readers’ mothers”, it makes me think of a famous French humorist who says: this is someone who is bright. So bright that when he finishes his reply, you forget what your question was.

Have you submitted this problem to some mathematicians or statisticians, so they can add some simple arguments to object to Mann et al.?
Pat Frank

Posted Apr 1, 2006 at 1:20 PM | Permalink

If Nature has such standards, and clearly they do, and yet do not abide by them, then the trust that is placed in their reported science is misplaced. Nature, by carelessness, or incompetence, or by design, is committing fraud.

Here’s another requirement you should have bolded, Steve, applicable specifically to dendroclimatology: “Any data exclusions are stated and explained.” Even following Rob Wilson’s reply, the exclusion of non-responder series is no more than tendentious.

“Preview” is now working again, thanks.
Steve McIntyre

Posted Apr 1, 2006 at 1:44 PM | Permalink

It’s pretty amazing to see these policies, which I wasn’t aware of.

Take a look at this post http://www.climateaudit.org/?p=346 – scroll down to the bottom and see the qqnorm plots for the 11 low-frequency series. Could you imagine someone looking at this data and warranting that it met standards of normality? The non-normality is not incidental, because it is the most non-normal series that are both not calibrated to temperature and which drive the very slight modern-MWP differential in the proxy portion of Moberg.
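
For readers who have not used a normal Q-Q plot, the idea can be sketched in a few lines of Python: sort the data, pair each value with the standard-normal quantile at the same rank, and see how close the pairs lie to a straight line. The function names, the (i + 0.5)/n plotting positions and the use of a correlation coefficient as a “straightness” score are my assumptions for illustration; this is not the code behind the plots in the linked post, and the Moberg series are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted value with a standard-normal quantile."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(values)
    # Plotting positions (i + 0.5) / n: one common convention, assumed here.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def qq_correlation(values):
    """Pearson correlation of the Q-Q pairs; near 1 means near-normal shape.

    Assumes the values are not all identical.
    """
    pairs = normal_qq_points(values)
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    skewed = [2.0 ** i for i in range(20)]  # made-up, strongly skewed data
    print(round(qq_correlation(skewed), 3))
```

A score well below 1 is a warning sign rather than a formal test, but it is the kind of check an author could run and report in a few minutes.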

This has always bothered me, but it’s one thing to object to the non-normality of the data in abstract terms and quite another to object to a Nature study failing to comply with policies on statistical testing for normality.

There’s another Nature policy, which I’ll add to the note, stating that referees will be asked to review compliance with statistical standards. I’ll bet that the paleoclimate referees for Moberg were not provided with this “statistical checklist” and did not apply it. But in administrative terms, the onus should not be on the referees to pick up non-compliance; the onus should be on the authors seeking publication to warrant compliance through some sort of form and then to provide data and code in a format that lets any referee or reader verify the normality or non-normality without spending 2 years in quasi-litigation (which may or may not prove successful).
John A

Posted Apr 1, 2006 at 3:12 PM | Permalink

It has these requirements for authors, but what about the “peer” reviewers?
Pat Frank

Posted Apr 1, 2006 at 3:18 PM | Permalink

#4 That’s an amazing display, Steve. Of the 5 series with the poorest normalization — Agassiz, Conroy, Tsuolbmajavri, Beijing, and Arabian Sea — three of them (Ag, Beij, and Arab) have very pronounced positive anomalies toward modern times. Especially Agassiz, which also displays the widest divergence from normal. As a scientist, I plain don’t understand the disparity in data outcomes. I’m no expert in those normalization methods, and so can’t say whether the method was applied but didn’t work properly in those cases. I also can’t say whether the lack of proper normalization would have been easily overlooked, or whether the normalization procedure doesn’t work well with terminally non-linear data. That all said, it’s a mystery to me how someone expert in the method could have missed these critical assessments of the different series before going on to submit the results.

One other point: The onus is on the authors to see that all the methods and assays were properly done before submission. However, prior to acceptance and publication, the onus is on the journal editor to verify that all the policy requirements have been met. If there is a huge error in a published paper, the ultimate responsibility for the fact of publication is the journal editor’s. In the case of Hwang, for example, Kennedy had an ethical obligation to accept responsibility and resign (which he failed to do). So did the associate editor in charge of that paper. Likewise, I think heads should roll over the scandal of dendroclimatology. You have opened a very necessary can of worms, Steve. The outcome will be an ethics test for the journals and their editorial boards. We’ll all get to see the results, and grade them accordingly.
jae

Posted Apr 3, 2006 at 1:54 PM | Permalink

Off thread, but I just did some time at RC, reading a post and some 50-odd comments that attempt to trash a recent article by George Will on GW (which I thought was a good editorial). Wow, not one comment defending anything Will had to say, absolutely no real discussion, no science, only “preaching to the choir.” It’s good to get back here where science is being discussed and I can learn something.
Paul

Posted Apr 3, 2006 at 3:59 PM | Permalink

RE #4 & #6:

Maybe it’s my lack of experience with some of these things, but I’d like to try to understand something. It appears, from modern readings of paleoclimate “proxies”, that there is less “divergence” than we have in the modern period. Why would there not have been similar “divergence” in the past? Could not the effects of time (physical pressures, time passing, etc.) “squash” the “divergent” signals into what appears to be a coherent single “converged” signal? Could this explain why, as we go farther back in time, things get “simplified”? An example: if we were to listen to a symphony and then listen again with only one instrument playing its part, we would have something very different from the ensemble. We couldn’t say that one part was the symphony. Similarly, with climate proxy data, aren’t we only looking at a few “instruments”? The whole “symphony” that makes climate is missing.
Doug L

Posted Apr 3, 2006 at 4:15 PM | Permalink

Re #7

Good place to put this, the choir is unlikely to look on this thread. 🙂

I suspect Will is aware of what the NAS panel heard, and will have more to say later. They seem to be having an awfully strong reaction to what a die hard conservative has to say.

I wonder if anyone has ever mentioned the NAS probe over there?

No doubt anyone referring to it as a “probe” over there would be branded for life!
John A

Posted Apr 3, 2006 at 5:13 PM | Permalink

Re #9

I’d describe it as a cocktail party interrupted at regular intervals by discussions about climate science.

No, I’m not optimistic that the NAS Panel will produce any substance because frankly, they didn’t go for substance.
kim

Posted Apr 4, 2006 at 4:49 AM | Permalink

Still off thread, but I comment on Tom Maguire’s JustOneMinute blog and when someone brought up Will’s column on climate there were immediately 5-6 posters who seem to know what is happening over here. There was the obligatory argument to authority from someone who claimed that the Real Climate site was where real climate scientists post, but the poster stepped in it by claiming that Mann had answered your criticism, corrected his error, and the results were the same.

So carry on; the intelligentsia is catching on.
=========================
kim

Posted Apr 4, 2006 at 5:00 AM | Permalink

A poster over there brings up a point I’m sure can be most easily answered here. Is it true that most of the climate models being used to terrify us are based off the same bunch of complex, but antique, Fortran code?
===============================
Dave Dardinger

Posted Apr 4, 2006 at 7:33 AM | Permalink

re: #12

Could be worse, it could be antique Cobol. But seriously, Fortran is much loved of scientists and besides all computer languages are isomorphic under the hood.
George

Posted Apr 4, 2006 at 8:15 AM | Permalink

Re #12. I must defend Fortran, which continues to defy rumors of its impending death. It’s gotten updated significantly and is now quite effective. According to Dr. John Prentice of Quetzal Associates:

“The main reason C++ has attracted the attention it has in the scientific community is because Fortran 77 was a terribly outdated language. The many weaknesses of Fortran 77 were solved with Fortran 90 however. Fortran 90 has every feature in C that is important to scientific programming and most of the features of an object oriented language (it lacks only inheritance and that is likely going to be added in Fortran 2000). However, unlike C and C++, Fortran 90 is designed to generate executable codes that are highly optimized and thus run extremely fast.”

Fortran is also well-suited for parallel processing (as in the use of armies of smaller computers to run climate models).

In addition, Fortran is written in a plain-English style that makes it relatively easy to understand what the programmer was doing (unlike the really cryptic languages).

Fortran is actually neither complex nor antique. But it is stable, fast, and effective. Can the same be said of climate models? Well, that’s another story…
TCO

Posted Apr 4, 2006 at 8:57 AM | Permalink

I like Basic actually. My uncle used to do some interesting thermal/mechanical modeling (NOT FEA) for steel plant roller designs. His opinion was that it’s really all in the algorithm (equation) anyhow. That’s where the thinking/problem dependence is. All that a computer language does is looping and if-then-else. He would tell his pointy-haired boss that he was using C++, and then run all the stuff in Basic.
jae

Posted Apr 4, 2006 at 9:25 AM | Permalink

Yeah, Fortran has come a long way. My first computing experience was in Fortran II, in the ’60s, complete with boxes of punched cards and enormous computers that filled rooms.
Greg F

Posted Apr 4, 2006 at 9:57 AM | Permalink

Fortran 90 has every feature in C that is important to scientific programming and most of the features of an object oriented language (it lacks only inheritance and that is likely going to be added in Fortran 2000).

No, Fortran 90 also lacks the Class and function template feature.

Certain useful non-OOP features are found exclusively in both languages. However, all of the additional features of Fortran 90 can be readily included in C++ by the development of class libraries; that is, without any language extensions. In contrast, the inclusion of the important features exclusive to C++ in Fortran 90 will require new language syntax. The most critical feature missing in Fortran 90 is the template, which allows C++ programmers to build portable, highly reusable libraries and to dramatically improve the efficiency of the evaluation of expressions involving user-defined types.

===============================================

However, unlike C and C++, Fortran 90 is designed to generate executable codes that are highly optimized and thus run extremely fast.”

Simply not true.

Nevertheless, for a wide variety of computational kernels running on several different computing platforms, one finds that there is little or no inherent performance advantage acquired by sticking to a procedural language such as F77 or C vs. an object-oriented language, or by choosing Fortran 90 over C++. To verify this statement, we have thoroughly examined the Livermore Fortran Kernels (LFK). The LFK is a well-known set of computational kernels that exhibit a wide range of performance.

The main advantage Fortran has over C++ is that the learning curve is not as steep. In programming efficiency (time it takes to write), with a competent programmer, C++ wins hands down. I suspect one of the reasons Fortran is still around is that the science professors don’t know or don’t want to learn C++. Object-oriented languages like C++ require a different programming mindset. It is not just a matter of learning the syntax. The only other advantage that I can think of for Fortran is that there is a boatload of public domain code. The reality is that every update of Fortran is moving closer to the much more powerful object-oriented languages like C++.
Mats Holmstrom

Posted Apr 4, 2006 at 11:05 AM | Permalink

Re 17.
Fortran is still alive and doing well for numerically intensive high performance computer applications.
The main advantage of Fortran in scientific computing in my opinion is that multidimensional arrays are built-in types. That makes code easily reusable and portable.
Object orientation is not as important for these applications since the hard parts are the numerical algorithms used.
There are actually also some things that make Fortran code easier to optimize than C or C++ code (e.g., aliasing among array parameters is forbidden) but that is not so important I think.
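
The aliasing point can be illustrated without any compiler at all. The sketch below is Python, not Fortran or C, and the function name is hypothetical; it only shows why a loop means something different when the output array and the input array are the same storage, which is the situation a C compiler generally has to allow for unless told otherwise.

```python
def double_from_left(dst, src):
    """Set dst[i] = 2 * src[i - 1] for i >= 1, one element at a time."""
    for i in range(1, len(src)):
        dst[i] = 2 * src[i - 1]
    return dst


if __name__ == "__main__":
    # Separate input and output: every iteration is independent,
    # so the iterations could run in any order or all at once.
    print(double_from_left([0, 0, 0, 0], [1, 1, 1, 1]))  # [0, 2, 2, 2]

    # Same list passed as both arguments: each iteration now reads
    # what the previous one wrote, so the order matters.
    shared = [1, 1, 1, 1]
    print(double_from_left(shared, shared))  # [1, 2, 4, 8]
```

If the language guarantees that the first case is the only one that can occur, the optimizer is free to reorder or vectorize the loop.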
Mark

Posted Apr 4, 2006 at 11:23 AM | Permalink

Yah to #18. This is the same reason standard (ANSI) C still rules the embedded marketplace for applications like the kinds I do (DSP). I don’t know anyone that prefers C++ to structured C. For hardcore apps, I actually get into the assembly output and modify it directly. Compilers just aren’t up to the task of real optimization (not yet).

Mark
JerryB

Posted Apr 4, 2006 at 12:10 PM | Permalink

Some additional comments regarding statistics are in section 5.6 of http://www.nature.com/nature/authors/gta/index.html .

ROTFL may be in order when you read the sentence:

“Authors should be aware that all referees are asked to review any statistical analysis present and to ensure that it is sound and that it conforms to the journal’s guidelines.”
Paul

Posted Apr 4, 2006 at 12:21 PM | Permalink

RE #20 –

Or put your head in your hands and sob…
jae

Posted Apr 4, 2006 at 12:38 PM | Permalink

Here’s the joke of the day from RealClimate, mentioned during a “lecture” about how much concern the AGW-is-a-fact scientists have for the Earth and my grandchildren (very touching, indeed):

Readers of this site know that we are very happy to discuss every piece of evidence publically, critically and in great detail – that’s what this site is for.

When have they discussed any of the issues brought up on this blog, such as statistical problems, cherry picking, nonlinearity, etc.? LOL.
Brooks Hurd

Posted Apr 4, 2006 at 1:44 PM | Permalink

Based on my own personal experience in trying to post controversy on RC, this statement makes me ROFLOL.

RC does let token controversy on the site from time to time, but only if the choir is present in sufficient numbers to smother any controversial tidbit.

If anyone takes the time to respond to their minions with a well thought out reply, RC redacts it to the point that it becomes nearly nonsensical. On the rare occasion that a comment makes it onto the site, the RC folk condescendingly tell the writer that it is boring since they have already answered (that is, ignored) the question.

We heretics must be overwhelmed by the cheering RC choir.
fFreddy

Posted Apr 4, 2006 at 1:53 PM | Permalink

Ref C++ vs Fortran, etc. – surely there is a large element of what do you want to use it for ?
I would imagine that most C++ class libraries these days are wrap-arounds for big complicated interface objects like windows and things. Most Fortran work will be taking data from one text file, number-crunching, and squirting results out to another. You don’t need to faff around coddling vb.net to do that.
Ref ANSI C – I’m with Mark. VBA is pretty good, but I do miss function pointers.
Mark

Posted Apr 4, 2006 at 3:19 PM | Permalink

fFreddy: yes. Very application specific. Writing GUI apps or otherwise OS-based apps lends itself neatly to C++ and other object-oriented languages. Writing a pile of statistical analysis that has to be completed within some real-time constraint lends itself better to ANSI C type code (FORTRAN, too). The latter is what I do, and consequently why I prefer structured C.

Mark
Nicholas

Posted Apr 4, 2006 at 5:17 PM | Permalink

I have a friend who is a mathematician and supercomputer operator. He says the only reason he sees Fortran as being viable is the massive scientific libraries available for it. Other than that, C/C++ can do pretty much everything better. He would rather people use C (he has to support their code and the machines which run it) although I think he understands that they won’t switch if they lose access to useful libraries and such.
Greg F

Posted Apr 4, 2006 at 5:29 PM | Permalink

The main advantage of Fortran in scientific computing in my opinion is that multidimensional arrays are built-in types.

Google Blitz++.

Re: 18

Embedded programming is a different animal. Unlike “scientific computing” embedded programming is intimately tied to the hardware. I wouldn’t use C++ for an embedded application either, but assembly … well … I love writing assembly.

Re: 24

As fFreddy points out, “what do you want to use it for ?” If I was writing a large “scientific” application I might opt for C++ if speed or multi-threading was important. Climate models would fall in this category. Climate models are the exception, not the rule. I would argue that most scientific computing is not computationally time sensitive. It makes little difference if the code takes 20 times longer to run if you’re talking 1 second as opposed to 20 seconds. There are a variety of time-saving computational tools such as R and Matlab that are more than adequate for everyday scientific use (and quicker to program). For example, the use of Fortran by Mann is far less efficient than R, as Steve M has shown. I don’t doubt that the Fortran will run faster than the R code. At the same time I don’t doubt that difference would be more than made up in the time it would take to write the code. What tool you use always involves tradeoffs. My main objection is the argument that Fortran is faster than C++. That used to be true; it isn’t anymore.
Nicholas

Posted Apr 4, 2006 at 5:46 PM | Permalink

Actually, I suppose Fortran might be easier for mathematicians to learn. But I remember my friend saying he ended up converting some Fortran programs some mathematicians had written into C for various reasons (possibly performance-related). So it would obviously be less of a pain for him if they could write them in C in the first place 🙂
Nicholas

Posted Apr 4, 2006 at 5:51 PM | Permalink

“It makes little difference if the code takes 20 times longer to run if you’re talking 1 second as opposed to 20 seconds”

True, with big caveats. As I was pointing out, my friend operates supercomputers (and clusters), and the run time of software is typically on the order of days or weeks, during which time it might hog hundreds of nodes and/or some big machines. There’s often a queue of programs waiting to run. So, performance can be an issue 🙂

As you point out, climate models can be of even greater magnitude. There can be a whole cluster or clusters dedicated to a single model for long run periods. Additionally, the models are not all that mathematically complex (as I understand it), but rather deal with a large grid of data. I suspect in that circumstance C will be best, as it’s good for light things with many repetitions. Of course, some of my assumptions about how the modeling works could be wrong.
kim

Posted Apr 5, 2006 at 11:45 AM | Permalink

Much gracious, all, I’m edified. Thanks also, DD, for the clue about CO2 and infrared several weeks ago. That helped.
=============================
kim

Posted Apr 5, 2006 at 6:24 PM | Permalink

Must’ve been a magnetic flux.
==================
Tim

Posted Apr 7, 2006 at 8:57 AM | Permalink

http://www.userfriendly.org/

Check out the April 7th cartoon. It’s like déjà vu all over again 🙂
Gerald Machnee

Posted Apr 7, 2006 at 12:09 PM | Permalink

There was an article in the news here about March 16 about ethics in medical research. I have enclosed the site and the start of the article.

http://www.canada.com/national/nationalpost/news/cnspolitics/story.html?id=a320fffa-9ff7-48b6-b975-86cdef0b4228

Here is the start of the article:

Medical researchers caught faking it
Federal grant recipients

Margaret Munro, CanWest News Service
Published: Thursday, March 16, 2006
More than a dozen scientists and doctors, several of them recipients of sizable federal grants, have been faking research, destroying data, plagiarizing or conducting experiments on people without necessary ethics approvals, the country’s lead research agencies report.

One medical researcher, who was awarded $1,347,445 for various projects, fabricated and falsified data and was permanently barred last year from receiving more federal money, according to documents obtained by CanWest News Service.
Troy Baer

Posted Apr 7, 2006 at 2:31 PM | Permalink

Re: #28

I’ve worked in user support at a supercomputer center for a little over 8 years, and IMHO many scientists who switch from Fortran to C probably shouldn’t. There’s an old saw about how a good Fortran programmer is able to write Fortran in any language; unfortunately, this also holds true for virtually all naive and/or bad Fortran programmers as well, and C isn’t nearly as forgiving as Fortran. Working with multidimensional arrays in C can be actively painful, and the ways to shoot yourself in the foot with malloc() are many and varied.

A humorous anecdote about C++ vs. Fortran: One user I worked with had a huge molecular dynamics program that he’d written in C++… but the routine that was the innermost loop of the calculation, the one where most of the time was spent, was written in Fortran. I once asked him why he didn’t implement that routine in C++ too, because he went to heroic efforts to link this one Fortran routine with his C++ app. His response was something to the effect of “I tried that once, and the code slowed down by a factor of two.”
TCO

Posted Apr 7, 2006 at 2:40 PM | Permalink

Basic8 is the shiznet!
fFreddy

Posted Apr 8, 2006 at 5:04 AM | Permalink

What has everyone got against C for multidimensional arrays ? The C compiler I was using nearly twenty years ago let you declare them just fine, and it would have been implementing them with direct pointer arithmetic, so I can’t imagine they were slow.
Sure, if you wanted to redimension them at run-time, you would have to write some get and put functions, and they would not be so fast, but I can’t see how any other language would do it better.
What can Fortran do with multidimensional arrays that C can’t ?
Troy Baer

Posted Apr 8, 2006 at 6:22 AM | Permalink

What can Fortran do with multidimensional arrays that C can’t ?

Three things I can think of off the top of my head:

ALLOCATE(foo(imax,jmax,kmax))

a=b+c ! where a, b, and c are arrays of the same dimensionality and size

a=MATMUL(b,x)+c

These are all things built into the Fortran 90 language, BTW. Admittedly most of these are just conveniences, but being able to do multidimensional array operations without having to think too hard about “Did I vectorize that correctly?” is a big win IMHO.
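
To make the “conveniences” concrete, here is a rough Python sketch of the explicit loops that those three one-liners stand for in a language without built-in array arithmetic. The helper names are hypothetical, nested lists stand in for arrays (0-based, unlike Fortran’s default 1-based indexing), and the MATMUL case assumes b is a matrix and x a vector, which the snippet above does not specify.

```python
def allocate3(imax, jmax, kmax, fill=0.0):
    """Rough analogue of ALLOCATE(foo(imax,jmax,kmax)): a fresh 3-D block."""
    return [[[fill for _ in range(kmax)] for _ in range(jmax)]
            for _ in range(imax)]


def add_arrays(b, c):
    """Rough analogue of a = b + c for nested lists of matching shape."""
    if isinstance(b, list):
        if len(b) != len(c):
            raise ValueError("arrays must have the same shape")
        return [add_arrays(bi, ci) for bi, ci in zip(b, c)]
    return b + c  # innermost level: plain numbers


def matmul_vec(b, x):
    """Rough analogue of MATMUL(b, x) for a matrix b and a vector x."""
    if any(len(row) != len(x) for row in b):
        raise ValueError("row length of b must match length of x")
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in b]


if __name__ == "__main__":
    foo = allocate3(2, 3, 4)
    b = [[1.0, 2.0], [3.0, 4.0]]
    x = [1.0, 1.0]
    c = [1.0, 1.0]
    a = add_arrays(matmul_vec(b, x), c)  # a = MATMUL(b, x) + c
    print(len(foo), len(foo[0]), len(foo[0][0]), a)
```

Every loop and shape check written out here is something the Fortran 90 programmer gets from the language itself.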