Another Inch at Sciencemag

Update: Continued here

I just heard back from Science on the continuing and frustrating effort to obtain data from Esper et al. [2002] and Osborn and Briffa [2006], last discussed here. I got interesting but incomplete information in February and March. The latest installment is, by comparison, very disappointing, even though, in my opinion, Science is making a bona fide effort to get data from the Hockey Team. (The problem originates with previous inadequate administration of their data archiving policies.) Here’s the status.

My efforts to examine the Esper data began on May 3, 2004, when I sent an e-mail to Jan Esper politely requesting the data used in Esper et al. [2002], and have been going on for nearly two years. I sent follow-ups on Nov. 16, 2004; Dec. 29, 2004; and May 30, 2005. I contacted Science on Sept. 5, 2005, as mentioned here. On Oct. 3, 2005, the deputy editor of Science replied that he was in contact with Esper. On Feb. 2, 2006, with no progress having occurred, I reiterated my request re Esper, together with a request re Osborn and Briffa. On Feb. 21, 2006, I received a first incomplete installment re Esper, discussed here. It’s hard not to think that the Hwang fiasco finally got the matter off the dime. On Feb. 22, 2006, Peiser and others sent an Open Letter to Science here.

On Feb. 23, 2006, I sent a letter back to Science itemizing gaps. On March 17, 2006, Science sent another incomplete installment, and on March 22, 2006, I sent a letter back to Science itemizing continuing shortfalls.

First, here is the most recent reply from Science, which I’m reproducing here because I see nothing discreditable to Science in it and it shows a genuine effort on their part. In a couple of paragraphs, it’s unclear whether Sciencemag is talking or Esper is talking. I will then collate these answers against the outstanding requests in the March 17 letter. The Sciencemag letter:

here is some additional information.

With respect to the Esper paper here is some additional information from the authors:

As described, in some of the sites we did not use all data. We did not remove single measurements, but clusters of series that had either significantly differing growth rates or differing age-related shapes, indicating that these trees represent a different population, and that combining these data in a single RCS run will result in a biased chronology. By the way, we excluded other sites because growth was too rapid, for example.

The split into linear and non-linear ring width series is shown in a supplementary figure accompanying the Science paper. The methods of this widely accepted, approach are described in the paper cited below and in the Science paper. It is possible to make this an operational approach, for example, by fitting growth curves to the single measurement series (e.g. straight line and negative exponential fits) and group the data accordingly. We didn’t do this in the Science paper, but rather investigated the data with respect to the meta information (i.e. for a particular site; data from living trees, and clusters of sub-fossil data), which I believe is a much stronger approach. This, however, requires experience with dendrochronological samplings and chronology development.

Esper J, Cook ER, Krusic PJ, Peters K, Schweingruber FH (2003) Tests of the RCS method for preserving low-frequency variability in long tree-ring chronologies. Tree-Ring Research 59, 81-98.

I’ve attached the pol file (the first version was corrupted in the transfer so I had to get a second). I’m tracking down the other files from the original authors. WRT the mongolian data, this is the information I have:

There should be a site called Solongotyn Davaa (SolDav) in Mongolia going back to 900 AD.

These data were used by Esper et al. 2002. It was submitted years ago but I just checked and do not see it. I will check with the person who submitted and see what they called it. SolDav is an extension of the site listed below.

Tarvagatay Pass (MNG; 48N,98E; Jacoby,G.C.;D’Arrigo,R.D.;Buckley,B.;Pederson,N.)

I’m still waiting to hear on the other two sites.

Sincerely,

Brooks Hanson

Collated Against March 17, 2006 Request

1. In February, Esper provided 13 of 14 site chronologies, omitting the Mongolia series for some reason. Could you please provide the 14th chronology.

No response.

2. In your March email, Esper, through your assistance, provided 10 of 14 site measurement (rwl) files, omitting Mongolia, Polar Urals, Boreal and Upperwright. Boreal and Upperwright are definitely not at ITRDB. The Polar Urals data set at ITRDB does not have 157 radii (as indicated in the chronology summary). I don’t understand why these were not provided concurrently with the other 10, but could you please provide the remaining 4 rwl files.

I’ve attached the pol file (the first version was corrupted in the transfer so I had to get a second). I’m tracking down the other files from the original authors. WRT the mongolian data, this is the information I have:

There should be a site called Solongotyn Davaa (SolDav) in Mongolia going back to 900 AD.

These data were used by Esper et al. 2002. It was submitted years ago but I just checked and do not see it. I will check with the person who submitted and see what they called it. SolDav is an extension of the site listed below.

Tarvagatay Pass (MNG; 48N,98E; Jacoby,G.C.;D’Arrigo,R.D.;Buckley,B.;Pederson,N.)

I’m still waiting to hear on the other two sites.

3. In 4 cases, the Osborn site chronology differs from the Esper site chronology, although in the other cases the versions are identical. In some cases, the date ranges do not match. I do not believe that it is possible to replicate the Osborn version from the Esper measurement data in these 4 cases and surmise that Osborn used a different measurement data set. I therefore request measurement data used by Osborn for the following sites: Polar Urals, Tornetrask, Taymir and Athabaska.

Esper et al. was not the source for the four series in question, as stated in the SOM of the paper (see paragraphs c and d). The Athabasca series was replaced with a series from Luckman and Wilson. The other three series contain some non-identical tree-ring series derived from the same sites; thus the series they used can not be reproduced using the Esper et al. data; there are fewer tree cores in the Esper et al. data. The source for these three series is Briffa (2000). Osborn and Briffa did not use raw tree-core measurements, only chronologies that had previously been assembled by others, and these have been deposited. You may want to contact those original authors or those publications if you require their raw data.

4. In 4 cases (Athabaska, Jaemtland, Quebec, Zhaschiviersk), Esper’s site chronology says that not all of the data in the data set is used. This is not mentioned in the original article. What is the basis for de-selection of individual cores?

As described, in some of the sites we did not use all data. We did not remove single measurements, but clusters of series that had either significantly differing growth rates or differing age-related shapes, indicating that these trees represent a different population, and that combining these data in a single RCS run will result in a biased chronology. By the way, we excluded other sites because growth was too rapid, for example.

5. Esper et al. [2002] do not provide a clear and operational definition distinguishing “linear” and “nonlinear” trees. As previously requested, could you please provide an operational definition of what they did, preferably with source code showing any differences in methodology. (I’ve enclosed a copy of a policy on source code which has been implemented at the American Economic Review by Ben Bernanke, its former editor who is now Chairman of the Federal Reserve Board. Several other economics journals have adopted similar policies. I urge Science to consider a comparable policy.)

The split into linear and non-linear ring width series is shown in a supplementary figure accompanying the Science paper. The methods of this widely accepted, approach are described in the paper cited below and in the Science paper. It is possible to make this an operational approach, for example, by fitting growth curves to the single measurement series (e.g. straight line and negative exponential fits) and group the data accordingly. We didn’t do this in the Science paper, but rather investigated the data with respect to the meta information (i.e. for a particular site; data from living trees, and clusters of sub-fossil data), which I believe is a much stronger approach. This, however, requires experience with dendrochronological samplings and chronology development.

Esper J, Cook ER, Krusic PJ, Peters K, Schweingruber FH (2003) Tests of the RCS method for preserving low-frequency variability in long tree-ring chronologies. Tree-Ring Research 59, 81-98.
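
To give readers a concrete idea of what an operational definition along these lines might look like, here is a minimal sketch in Python – my own illustration, not Esper’s code or method. Following the description in the reply, it fits a straight line and a negative exponential to each ring-width series against ring age and classifies the series by which model is preferred; the synthetic series, the starting values and the AIC criterion are all my assumptions.

# Sketch of one way to make the "linear"/"nonlinear" split operational
# (illustration only - not the code actually used in Esper et al. 2002).
import numpy as np
from scipy.optimize import curve_fit

def neg_exp(age, a, b, c):
    # Age-related growth decline: rapid juvenile growth decaying toward a constant.
    return a * np.exp(-b * age) + c

def aic(rss, n, k):
    # Akaike information criterion for a least-squares fit with k parameters.
    return n * np.log(rss / n) + 2 * k

def classify_series(age, width):
    """Label one ring-width series 'linear' or 'nonlinear' by which growth curve fits better."""
    n = len(width)

    # Straight-line fit.
    slope, intercept = np.polyfit(age, width, 1)
    rss_lin = np.sum((width - (slope * age + intercept)) ** 2)

    # Negative-exponential fit; if it will not converge, fall back to 'linear'.
    try:
        params, _ = curve_fit(neg_exp, age, width,
                              p0=(width.max(), 0.02, width.min()), maxfev=10000)
        rss_exp = np.sum((width - neg_exp(age, *params)) ** 2)
    except RuntimeError:
        return "linear"

    # Penalize the extra parameter so a straight line is not rejected merely
    # because the exponential has one more degree of freedom.
    return "nonlinear" if aic(rss_exp, n, 3) < aic(rss_lin, n, 2) else "linear"

# Hypothetical synthetic series, one of each type.
rng = np.random.default_rng(0)
age = np.arange(1.0, 201.0)
declining = 2.0 * np.exp(-0.03 * age) + 0.4 + rng.normal(0, 0.05, age.size)
flat = 0.8 - 0.001 * age + rng.normal(0, 0.05, age.size)
print(classify_series(age, declining))  # typically "nonlinear"
print(classify_series(age, flat))       # typically "linear"

Something along these lines – a stated rule plus the code that applies it – is exactly the sort of operational definition the request asks for; whether the grouping is done by curve fits or by site meta-information, the criterion should be written down so that someone else can apply it.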

6. I acknowledge receipt of a temperature data set from 1888-1990. The HadCRU2 data set contains temperature data for the gridcell 37.5N, 117.5W commencing in 1870. However, the gridcell information provided by Osborn commenced only in 1888 and the differences are material to the final result (0.045 versus 0.18 reported). What is the reason for commencing this comparison in 1888 rather than the available 1870? Why is there no notice of this in the SI? Since there is a material difference in this example, could you please provide the gridcell temperature sets in a comparable format for the other 13 Osborn and Briffa series.

I’m still waiting for the authors to compile the requested additional temperature data as the files were not readily available.
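
For concreteness, here is a small sketch of the kind of calculation at issue in point 6 – correlating a proxy series against gridcell temperature over 1870-1990 versus 1888-1990. The arrays below are random placeholders, not the actual Osborn gridcell or HadCRU2 data; the sketch only illustrates that the chosen start year enters the correlation directly, which is why an 1888 start rather than 1870 can matter.

# Placeholder sketch: the arrays here are synthetic stand-ins, not the real
# gridcell temperature or proxy data discussed above.
import numpy as np

def correlation_over(years, proxy, temp, start, end):
    # Pearson correlation of two annual series restricted to [start, end].
    mask = (years >= start) & (years <= end)
    return np.corrcoef(proxy[mask], temp[mask])[0, 1]

years = np.arange(1870, 1991)                           # 1870-1990 inclusive
rng = np.random.default_rng(1)
temp = rng.normal(0.0, 0.5, years.size)                 # stand-in gridcell anomalies
proxy = 0.3 * temp + rng.normal(0.0, 0.5, years.size)   # stand-in proxy series

r_full = correlation_over(years, proxy, temp, 1870, 1990)
r_trunc = correlation_over(years, proxy, temp, 1888, 1990)
print("1870-1990: r = %.3f   1888-1990: r = %.3f" % (r_full, r_trunc))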

7. The long-standing request for a complete archive of Thompson data, especially Dunde and Guliya ice cores, including both isotope and chemical data, remains outstanding.

No response.

Comments

On balance, precious little progress.

1. Why can’t they get the one remaining chronology?

2. Similarly, if they can get 10 (now 11) of the rwl files, what’s the problem with the other 3 sites – why weren’t they provided? Yes, the Tarvagatay Pass data is at WDCP, but the WDCP versions do not coincide with the Esper versions (as we’ve seen in other cases). The two other missing sites are Graumlich’s foxtail sites. Now I’ve been told through back channels that Graumlich lost the foxtail measurement data when she moved to Montana a number of years ago. I thought that Esper/Cook might have had a duplicate version that my source was unaware of. The delays make me wonder, although on balance I expect something to materialize – but why the problem?

3. The response for Osborn and Briffa measurement data is irritating. Science says that Osborn and Briffa [2006] did not use measurement data (on the basis that this article used chronologies calculated from the measurement data) and suggested that I contact the original authors. In the same breath, they say that the chronologies were calculated in Briffa [2000] – excuse me, is this a different Briffa than Osborn and Briffa? If Briffa published the chronologies in a journal which has lower standards than Science, does that mean that Science allows them to kite the cheque?

4. “As described, in some of the sites we did not use all data.” Wait a minute, where is that “described” in Esper et al. [2002]? Is Esper’s explanation intelligible to any of you? Shouldn’t Hanson have told Esper to stop wasting everyone’s time and produce an intelligible explanation?

5. This answer is even worse than the one to item 4. Does it make sense to any of you? If I were Hanson, I’d have been embarrassed to send this on.

6. No answer is given to the question of the favorable start date for the comparison. Also, wouldn’t they have produced these files when they did the calculations? What’s the problem?

7. I originally requested data from Thompson in 2003 before anyone had ever heard of me.

23 Comments

  1. john lichtenstein
    Posted Apr 21, 2006 at 11:15 PM | Permalink

    Wow.

    The folks at the Stanford Exploration Project have some great thoughts on really reproducible research. Maybe they oversolve the problem with make, but it’s a good reference point. The motivation is interesting.

    In the mid 1980’s, we noticed that a few months after completing a project, the researchers at our laboratory were usually unable to reproduce their own computational work without considerable agony. In 1991, we solved this problem by developing a concept of electronic documents that makes scientific computations reproducible. Since then, electronic reproducible documents have become our principal means of technology transfer of scientific computational research. A small set of standard commands makes a document’s results and their reproduction readily accessible to any reader. To implement reproducible computational research the author must use makefiles, adhere to a community’s naming conventions, and reuse (include) the community’s common building and cleaning rules. Since electronic reproducible documents are reservoirs of easily maintained, reusable software, not only the reader but also the author benefits from reproducible documents.

  2. Pat Frank
    Posted Apr 22, 2006 at 12:16 AM | Permalink

    Steve, would you like a letter sent to Brooks Hanson on your behalf? Are any academic scientists writing to Science or Nature supporting your requests for data? I’m not faculty, but could write a polite letter of support on letterhead.

    The fact that you’ve been negotiating for almost two years to get data, which are recently published and so should be readily to hand, is insupportable.

  3. John A
    Posted Apr 22, 2006 at 2:56 AM | Permalink

    As described, in some of the sites we did not use all data. We did not remove single measurements, but clusters of series that had either significantly differing growth rates or differing age-related shapes, indicating that these trees represent a different population, and that combining these data in a single RCS run will result in a biased chronology. By the way, we excluded other sites because growth was too rapid, for example.

    In other words we excluded data where we knew the answer was wrong.

  4. TCO
    Posted Apr 22, 2006 at 6:17 AM | Permalink

    A. I was thinking along similar lines to Pat: If you can more DIRECTLY connect the outsider science supporters to your quest to get data, that would be helpful. The letter by the various scientists was theoretical. If you could get them to weigh in on exactly this issue, it would be helpful.

    B. If Science is having a hard time getting data from various climate authors, at some point an editorial discussing the problem would be desirable. So would a tightening up on these authors in future submissions.

  5. TCO
    Posted Apr 22, 2006 at 6:27 AM | Permalink

    JohnA,

    In theory, the data exclusion could be reasonable – for instance, based on Fritts’s “Principle” that you pick trees that are limited by the parameter in question and not others. The concern is that I really doubt that the same tree is always limited by the same variable. I don’t have a problem with an argued exclusion from the analysis. But given the really early fundamental understanding of the science and the intrinsic variability associated with live specimens and field work, they should keep ALL the data possible and show it in the SI.

    To me, the field is bizarrely close to Cold Fusion. Only an aficionado can get the field results needed, and they can’t describe the methodology sufficiently for someone outside the club to replicate. And (like lousy electrochemists moving into physics) you’ve got a group of people that isn’t the brightest. That is nice and skis and listens to NPR and such. But we’re not talking 1600 SATs here. Mann is probably the best of the bunch in math knowledge. And he’s a refugee from physics (couldn’t get it done there). There’s also this bizarre German green cultural trend going on here. VS really called the ball with his comments on the growth of the field and tendencies in German culture. Once again we’ve got a little country trying to distinguish itself in science (like the Koreans). They should just know that they will never live up to perfidious Albion and its f***ing demon-spawn of a real-man culture: red-state ‘Muricans!

  6. Douglas Hoyt
    Posted Apr 22, 2006 at 6:39 AM | Permalink

    They shouldn’t be excluding any data. They should add a descriptive file that tells why they think some of the data is bad and is excluded from their analysis. Other researchers may not agree with their reasons and may want to include some of the censored data. Every researcher should be able to see all the data and make their own decisions.

  7. TCO
    Posted Apr 22, 2006 at 6:46 AM | Permalink

    Absolutely. This is Science methods 101. Lots of people don’t know or follow Science 101. That’s why Feynman had to b******* NASA…

    John: Blame me for blatantly censoring a word that we don’t want Google to index us on. Knock it off, TCO

  8. Brooks Hurd
    Posted Apr 22, 2006 at 6:52 AM | Permalink

    Steve,

    You deserve high praise for your continuous efforts to obtain data.

    I never cease to be amazed at how authors of recent papers cannot find the data which formed the basis of their work. Clearly, Esper et al. did not do their calculations using pencil, paper, abacus and slide rule. Esper and his associates used computers, and consequently they have files with their data and their analyses. I find it hard to believe that scientists who write peer-reviewed papers are so disorganized that they cannot find the data which they used in this decade.

  9. Steve McIntyre
    Posted Apr 22, 2006 at 6:52 AM | Permalink

    #2, 4. It’s been disappointing to me that virtually no scientists have stepped up and expressed their views to the journals. I think that the Peiser letter to Science was helpful. I get the impression that Hanson is really making an effort to get the data problem to go away. In his shoes, I’d be a lot tougher with the Hockey Team, but he’s probably inexperienced at dealing with systematic obfuscation.

    So yes, letters would be helpful. One angle that needs to be nipped in the bud is the cheque kiting between journals. In the particular case, Briffa [2000] published 3 chronology results in QSR relying then on measurement data without archiving the measurement data. Now Osborn and Briffa [2006] seek to avoid archiving measurement data on the grounds that they used the chronologies from Briffa [2000] rather than the measurement data and that jurisdiction over access to the measurement data lies with QSR. Science should cut through this jurisdictional issue and say – we’re the big dogs; if there’s anything grey in your chain of evidence, clean it up if you want to publish with us.

    If I were Hanson, I would really lay down the line with these guys. I’d tell them that I have other things to do besides babysit the production of data. I’d say – here’s the list, here’s a deadline; if you don’t have it done by then, we’re retracting the paper. We’re in a post-Hwang era; get with the program.

  10. Jim Erlandson
    Posted Apr 22, 2006 at 7:21 AM | Permalink

    The institutions these people work for have an interest in resolving this.

  11. TCO
    Posted Apr 22, 2006 at 7:26 AM | Permalink

    We’re not in the post-Hwang era. Just being realistic.

  12. jae
    Posted Apr 22, 2006 at 8:47 AM | Permalink

    In theory, the data exclusion could be reasonable – for instance, based on Fritts’s “Principle” that you pick trees that are limited by the parameter in question and not others. The concern is that I really doubt that the same tree is always limited by the same variable. I don’t have a problem with an argued exclusion from the analysis. But given the really early fundamental understanding of the science and the intrinsic variability associated with live specimens and field work, they should keep ALL the data possible and show it in the SI.

    It is still very unclear to me how this principle can work in the case of temperature. IMO it only works if you have a historical record to work with. There is simply no way to know, for sure, which variable was limiting growth in North America more than, say, 150-200 years ago. These guys are simply unabashedly cherry-picking whatever trees, series, portions of series, etc. fit their preconceived theory. I don’t think they are following any Fritts principle.

  13. Dave B
    Posted Apr 22, 2006 at 11:27 AM | Permalink

    #6… agree COMPLETELY. In medical research these are known as EXCLUSION CRITERIA, and are (or should be) clearly pointed out in Materials and Methods.

  14. Posted Apr 23, 2006 at 11:51 AM | Permalink

    Brooks,

    “I find it hard to believe that scientists who write peer-reviewed papers are so disorganized that they cannot find the data which they used in this decade.”

    I must confess that in computer science departments I do not see especially good archival of data. It is something each person does on their honor, and there simply is no one interested in cracking down on people who do it poorly.

    It is not Science 101, but it actually does all work out okay in the end. The ins and outs of it are rather complicated, and between-scientist politics is a big part of it, but in the end, computer scientists of today know a lot more than they did a decade or three ago.

    Incidentally, here’s a better way (IMHO) to think about the situation. The goal is not to mark who is a Good Scientist and who is a Bad Scientist, and neither is the goal to keep some kind of score in a great big scientist competition. The goal is to find the truth. If someone’s data is questionable, then their argument falls apart and their reputation suffers in the future — nothing more nor less.

  15. John A
    Posted Apr 23, 2006 at 1:13 PM | Permalink

    Re #9

    If I were Hanson, I would really lay down the line with these guys. I’d tell them that I have other things to do besides babysit the production of data. I’d say – here’s the list, here’s a deadline; if you don’t have it done by then, we’re retracting the paper. We’re in a post-Hwang era; get with the program.

    The one thing that the editors of Science and Nature have learned is that they have learned nothing from the Hwang Woo-suk affair. Precisely nothing has changed.

  16. Brooks Hurd
    Posted Apr 23, 2006 at 1:35 PM | Permalink

    Re: 14 Lex,

    If someone’s data is questionable, then their argument falls apart and their reputation suffers in the future — nothing more nor less.

    I agree.

    The problem is to get the data from people who:
    1. lost it
    2. misplaced it
    3. are too busy to bother
    4. do not want to give you their data.

    This intransigence makes it very difficult to evaluate whether the data is questionable.

  17. John Lish
    Posted Apr 23, 2006 at 1:36 PM | Permalink

    #14 Lex, I love the sentiment of your last paragraph, and in the long run it will be proved so. However, in the meantime, business will continue to be done with as much hair-pulling and name-calling as possible.

  18. Steve McIntyre
    Posted Apr 23, 2006 at 1:53 PM | Permalink

    #15. John A, I’m not sure you’re right. Prior to Hwang, I got nowhere with Science. Now they’re actually asking the authors for data. I wish they’d take a harder line, but that’s wishful thinking. Right now, they seem to be at least trying, at the cost of more time than they want to spend on it.

  19. John A
    Posted Apr 23, 2006 at 2:02 PM | Permalink

    Re: #18

    Steve,

    If they had learned anything from the Hwang affair, it’s that the data should be archived as a precondition of acceptance for publication, not as a request after the fact some time later by which time the authors are on their next project, their next funding meeting, their next academic sinecure. The authors have no particular compulsion to respond to requests for data after publication, unless and until the journals start taking their archival policies seriously.

    Have they actively started to do this? No. The last thing we heard from Kennedy was “Scientific fraud happens and there’s nothing we can do to detect it”

  20. jim karlock
    Posted Apr 27, 2006 at 7:34 AM | Permalink

    How can peer-review be accomplished without the reviewers looking at the original data?

    Or am I completely misunderstanding the process? (I am an electrical engineer.)

    Thanks
    JK

  21. TCO
    Posted Apr 27, 2006 at 7:57 AM | Permalink

    One of the things that you learn as you read the literature is how to look at a paper and tell whether it is a good paper or not. If you can do it with a published paper, you can do it with a submitted paper.

  22. Dave Dardinger
    Posted Apr 27, 2006 at 8:22 AM | Permalink

    TCO,

    One of the things you learn as you read this blog is that, while you might have thought you could tell a good paper from a bad one just by reading it, that turns out not to be the case.

    Merely because a paper looks good, follows the rules for good papers and agrees with what you thought was the truth where you can check without difficulty does not mean that what’s new about the paper you’re reading is correct.

    OTOH, I’m not sure that peer review is supposed to do so. That’s supposed to be done by the people who read the paper and respond.

    Perhaps they should call it pier review since it’s so a dhoc.

  23. John A
    Posted Apr 27, 2006 at 10:04 AM | Permalink

    How can peer-review be accomplished without the reviewers looking at the original data?

    …is a question we’d like to know the answer to. The answer is that peer review cannot bear the weight of the due diligence that people expect of it when they hear about it.

    Peer review can catch obviously wrong statements that contradict known scientific principles. It cannot catch data fraud. That can only be done by people replicating and/or auditing the work.