About 10 days ago, we discussed the PNAS reviews of the recent submission by Richard Lindzen, a member of the National Academy of Sciences with a distinguished publication record.
A few days ago, PNAS published Kemp et al 2011, a submission by one of Mann’s graduate students [from the University of Pennsylvania]. While, in this case, we do not have access to the reviews, it is possible to draw conclusions about the review process for the Mann article, both from the limited information provided in the article and from the article itself.
The Kemp article states in its masthead:
Edited* by Anny Cazenave, Centre National d’Etudes Spatiales (CNES), Toulouse Cedex 9, France, and approved March 25, 2011 (received for review October 29, 2010)
The asterisk says:
*This Direct Submission article had a prearranged editor.
It was certainly generous of PNAS to give a “prearranged editor” to a submission by a graduate student at the University of Pennsylvania. I’m sure that Lindzen, an actual NAS member, would have appreciated a similar courtesy. It was particularly nice of PNAS to allow the Team to “prearrange” an editor who had been a collaborator with a coauthor within the past 4 years – Cazenave was a coauthor with Rahmstorf on Rahmstorf et al (Science, 2007), “Recent climate observations compared to projections” (accepted Jan 25, 2007; published Feb 1, 2007). In contrast, PNAS objected to Lindzen’s submission being reviewed by Chou, who had coauthored with him in 2001.
In the previous discussion of the Lindzen reviews, some defenders of the PNAS reviews argued that the comments were justified.
My own issue with PNAS and other review processes is not that any given criticism cannot be justified, but rather the hypocrisy of seemingly inconsistent standards for Team critics and Team members. This hypocrisy is nicely illustrated by the contrasting standards of replicability required of Lindzen and of Mann et al. For example, Reviewer 2 of Lindzen’s submission stated:
The description of the procedures is long on philosophical discussion, but rather too spare in describing exactly what was done. Sufficient description is necessary so that another experimenter could reproduce the analysis exactly. I don’t think I could reproduce the analysis based on the description given. For example, exactly how were the intervals chosen? Was there any subjectivity introduced?
If this criticism of Lindzen’s submission is valid, I, of all people, can hardly take issue with it. Lindzen disputed the criticism in his reply, arguing that his results were replicable. I’m not familiar enough with the data to have an opinion of my own on who’s right. For present purposes, the point is that the PNAS reviewer applied this standard to Lindzen. If PNAS is to be consistent, the same standard should apply to Kemp et al.
However, statements in the article itself make clear that the reviewers paid no comparable attention to replication or subjectivity in a Team submission.
Jeff Id almost immediately noticed unsupported statements in the article and SI, in particular their “discussion” of weights. I urge readers to search both the article and the SI for the word “weight”; excerpts are provided below. The statements highlighted below were not found with any sort of fine-tooth comb. Rather, they stick out like a sore thumb – to mix metaphors.
The term “weight” is used only once in the main article, in the caption to Figure 4, which states:
Fig 4. Salt-marsh proxy data used in Bayesian update were down-weighted by a factor of 10 and used only after AD 1000.
Given that the article claims to “present new sea-level reconstructions for the past 2100 y based on salt-marsh sedimentary sequences from the US Atlantic coast”, it is puzzling, to say the least, that the actual salt-marsh proxy data in Figure 4 was downweighted by a factor of 10 and not used at all before AD 1000. These puzzling and shall-we-say “subjective” decisions are not discussed in the article itself.
Unfortunately, as often happens with the Team, instead of discussing and explaining the decision, the SI does little more than re-assert the point. For example, they state:
The result of the Bayesian prediction is somewhat dependent on the choice of weighting for the sea-level proxy data; it is necessary to downweight them (or inflate their assumed variance) to take into account that they are subject to strong serial correlation. An appropriate choice for this factor would be 10. With this choice, we find it is not possible to obtain a reasonable a posteriori result for the entire data period: it is necessary to exclude the sea-level data before AD 1000 from the fit.
Later they say:
Weighting and fit for the early period (AD 500–1100). To fit the sea-level proxy data back to AD 500 required down-weighting of the data and generated an inadequate fit with broad uncertainty bands, suggesting that the data is not compatible. Restricting the Bayesian update to only post-AD 1000 sea-level data markedly improved the fit (Fig. 3D), but increased divergence between sea-level proxy data and sea-level predicted prior to AD 1000. There is independent evidence (21, 22) that the steep sea-level rise predicted from temperatures between AD 500 and 1000 is unphysical, and thus that the sea-level proxy data from North Carolina for this period are more realistic.
Obviously, none of the above is a standard statistical procedure. If PNAS review standards condemn the supposed “subjectivity” of the Lindzen submission, how could PNAS permit the authors simply to assert that “an appropriate choice for this [downweighting] factor would be 10”? This far exceeds the alleged “subjectivity” of the Lindzen submission. Why didn’t PNAS reviewers and editors object?
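To make concrete what is being asserted, here is a minimal sketch, in Python, of what “downweighting by a factor of 10” amounts to in a simple conjugate Bayesian update. This is not Kemp et al’s code (none is published); every variable name and number below is an illustrative assumption on my part. Downweighting by 10 is equivalent to inflating the assumed data variance tenfold, which drags the posterior most of the way back toward the prior:

```python
def bayes_update(prior_mean, prior_var, obs, obs_var, inflation=1.0):
    """Conjugate normal update of a scalar state from one observation.

    `inflation` multiplies the assumed observation variance. Setting it
    to 10 mimics the kind of "down-weighting by a factor of 10" asserted
    in the Kemp et al SI (illustrative only; their actual model and code
    are not published).
    """
    v = obs_var * inflation                  # inflated data variance
    w = prior_var / (prior_var + v)          # weight given to the observation
    post_mean = prior_mean + w * (obs - prior_mean)
    post_var = prior_var * v / (prior_var + v)
    return post_mean, post_var

# Hypothetical numbers: prior sea-level anomaly 0 +/- 5 cm; proxy says 8 +/- 3 cm.
print(bayes_update(0.0, 25.0, 8.0, 9.0))                  # full weight:  (~5.88, ~6.62)
print(bayes_update(0.0, 25.0, 8.0, 9.0, inflation=10.0))  # downweighted: (~1.74, ~19.6)

# A *principled* inflation factor could at least be tied to the serial
# correlation: for an AR(1) process, the effective-sample-size correction
# inflates the variance by (1 + rho) / (1 - rho).
rho = 0.8                                    # assumed lag-1 autocorrelation
print((1 + rho) / (1 - rho))                 # -> 9.0, in the neighborhood of "10"
```

The sketch shows why the choice matters: a factor of 10 pulls the posterior most of the way back to the prior. A defensible factor could at least be derived from the serial correlation itself, as in the AR(1) adjustment at the end; the SI offers no derivation at all, only the assertion that 10 is “appropriate”.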
And where were the reviewers when the authors perpetuated the known problem of upside-down and contaminated sediments?
Lindzen’s reviewer 2 also emphasized replicability. Again, what was sauce for the goose wasn’t sauce for the gander. PNAS reviewers didn’t pay even the slightest lip-service to ensuring that the underlying data for Kemp et al was available.
For example, the authors say:
We developed transfer functions using a modern dataset of foraminifera (193 samples) from 10 salt marshes in North Carolina, USA (7).
This is a pretty fundamental calibration. Is the data in the SI, at the WDC for Paleoclimatology archive, or at Mann’s website? Nope. They continue:
The transfer functions were applied to foraminiferal assemblages preserved in 1 cm thick samples from two cores of salt-marsh sediment (Sand Point and Tump Point, North Carolina; Fig. 1) to estimate paleomarsh elevation (PME), which is the tidal elevation at which a sample formed with respect to its contemporary sea level (9).
OK, where’s the data for the reconstruction, i.e. the 1-cm foraminiferal assemblages? Nowhere in sight. Where, for that matter, are the estimates of “paleomarsh elevation”? Illustrated only in a scrunched figure, and not archived.
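For readers unfamiliar with the jargon, a “transfer function” here is a regression-style calibration from modern foraminiferal assemblages to tidal elevation, subsequently applied downcore. The sketch below shows one common variant (simple weighted averaging, without the deshrinking step of a full WA model); I am not claiming this is the variant Kemp et al used, and the tiny arrays are fabricated placeholders for the unarchived data:

```python
import numpy as np

# Fabricated stand-in for the unarchived modern training set:
# rows = modern samples, columns = foraminiferal taxa.
modern_counts = np.array([[40.0, 10.0,  0.0],
                          [20.0, 30.0,  5.0],
                          [ 5.0, 25.0, 40.0]])
modern_elev = np.array([1.2, 0.8, 0.3])   # standardized tidal elevations

# Weighted-averaging calibration: each taxon's elevation "optimum" is the
# abundance-weighted mean of the elevations at which it occurs.
rel = modern_counts / modern_counts.sum(axis=1, keepdims=True)
optima = (rel * modern_elev[:, None]).sum(axis=0) / rel.sum(axis=0)

def predict_pme(core_counts):
    """Estimate paleomarsh elevation (PME) for one 1-cm downcore sample."""
    p = core_counts / core_counts.sum()
    return float((p * optima).sum())      # abundance-weighted mean of optima

# One hypothetical downcore assemblage:
print(predict_pme(np.array([10.0, 20.0, 15.0])))
```

A real transfer function would also involve deshrinking regression and cross-validated error estimates. The point is precisely that none of this can be checked or rerun without the 193 modern samples and the downcore counts.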
In Lindzen’s case, reviewers said that PNAS standards required the justification of “subjective” decisions and replicability. I endorse these principles. As noted above, I am not familiar enough with the data sets for the Lindzen submission to comment on the validity of these criticisms as applied to that article (and do not have the time or energy at present to run those issues to ground).
However, I am familiar with paleo datasets and methods. I can say categorically that Kemp et al 2011 contains bizarre “subjective” decisions, that the underlying data is unarchived, and that the methodology is not described to the standard required by Lindzen’s reviewer 2. (I doubt that the methodology could be adequately described in this case without supplying source code, a practice increasingly accepted by journals.)
Once again, the issue is hypocrisy. What’s sauce for the goose should be sauce for the gander. If PNAS standards require replicability and non-subjectivity of an NAS member, then PNAS is flagrantly hypocritical in not enforcing the same standards on a submission by a graduate student. Dare one observe that PNAS’ hypocrisy, and its failure to ensure compliance with the standards required of Lindzen, was perhaps facilitated by the use of a “prearranged” editor who had collaborated with the authors within the 4-year prohibited window?