Jacoby #1: A "Few Good" Series

Jacoby is on the Hockey Team. His treeline temperature reconstruction was made by picking the 10 most "temperature-influenced" of 36 sites studied. Only these 10 sites were archived. I sought information on the other 26 through Climatic Change, the publishing journal. Jacoby refused, stating:

The inquiry is not asking for the data used in the paper (which is available), they are asking for the data that we did not use.

Imagine this argument in the hands of a drug trial. Let’s suppose that they studied 36 patients and picked the patients with the 10 "best" responses, and then refused to produce data on the other 26 patients on the grounds that they didn’t discuss these other patients in their study. It’s too ridiculous for words. Yet Climatic Change saw no problem with this refusal. Jacoby went on to say that his research was "mission-oriented" and that:

As an ex-marine I refer to the concept of a few good men. A lesser amount of good data is better without a copious amount of poor data stirred in.

Imagine ex-marines with this philosophy in charge of drug trials. Maybe they already are.


Jacoby Response to Data Request

Jacoby and d’Arrigo [Clim. Chg. 1989], together with d’Arrigo and Jacoby [1992], present a temperature reconstruction that is used in many multiproxy studies (e.g. Jones et al [1998]; MBH98, where 11 series are used individually; MBH99, where it serves as an "adjustment" to the North American PC1; Jones and Mann [2004]). Jacoby is a member of the Hockey Team.

Jacoby and d’Arrigo [1989] states on page 44 that they sampled 36 northern boreal forest sites within the preceding decade, of which the ten "judged to provide the best record of temperature-influenced tree growth" were selected. No criteria for this judgement are described, and one presumes that they probably picked the 10 most hockey-stick shaped series.

I have done simulations which indicate that merely selecting the 10 most hockey-stick shaped series from 36 red noise series and then averaging them yields a composite that is more hockey-stick shaped than the individual series. The process is not dissimilar to what happens in the MBH98 PC1. In the MBH98 PC1, the 14 most hockey-stick shaped series account for over 93% of the variance. There is very little difference in appearance between a simple average of these 14 series and the EOF-weighted composite (PC1).
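A minimal sketch of this kind of screening experiment follows. It is not the simulation code referred to above, and every specific in it is an assumption made for illustration: the AR(1) model for red noise, the coefficient `rho=0.9`, the 400-year length, the 80-year "blade", and the "hockey stick index" (mean of the closing period minus the long-term mean, in units of the series' standard deviation).

```python
import numpy as np


def ar1_series(n_series, n_years, rho, rng):
    """Red noise as an AR(1) process: x[t] = rho * x[t-1] + e[t]."""
    noise = rng.standard_normal((n_series, n_years))
    x = np.zeros_like(noise)
    x[:, 0] = noise[:, 0]
    for t in range(1, n_years):
        x[:, t] = rho * x[:, t - 1] + noise[:, t]
    return x


def standardize(x):
    """Scale each row to zero mean and unit standard deviation."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


def hockey_stick_index(x, blade):
    """(Mean of the last `blade` values - full mean) / full std, per row.

    This index is an illustrative assumption, not a published definition.
    """
    x = np.atleast_2d(x)
    return (x[:, -blade:].mean(axis=1) - x.mean(axis=1)) / x.std(axis=1)


def screening_experiment(n_series=36, n_keep=10, n_years=400,
                         blade=80, rho=0.9, seed=0):
    """Pick the n_keep most hockey-stick shaped of n_series red noise series."""
    rng = np.random.default_rng(seed)
    series = standardize(ar1_series(n_series, n_years, rho, rng))
    hsi = hockey_stick_index(series, blade)

    keep = np.argsort(hsi)[-n_keep:]          # indices of the largest indices
    composite_selected = series[keep].mean(axis=0)
    composite_all = series.mean(axis=0)

    return {
        "mean_hsi_of_selected": hsi[keep].mean(),
        "hsi_selected_composite": hockey_stick_index(composite_selected, blade)[0],
        "hsi_all_composite": hockey_stick_index(composite_all, blade)[0],
        "blade_selected": composite_selected[-blade:].mean(),
        "blade_all": composite_all[-blade:].mean(),
    }


if __name__ == "__main__":
    for name, value in screening_experiment().items():
        print(f"{name}: {value:.3f}")
```

Because each series is standardized first, the closing-period mean of the screened composite equals the average index of the selected series, so it cannot be lower than that of the all-36 composite. Averaging also shrinks the composite's standard deviation below 1 wherever the noise does not line up, so the composite's own index comes out at least as large as the average index of the series that went into it (whenever that average is positive). None of this involves any temperature signal; the inputs are pure noise.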

I wanted to test whether Jacoby’s selection of the 10 most "temperature-influenced" series imparted a bias, that is, whether the result had any significance relative to a comparable selection from red noise series. To do so, I looked for the archived data for all 36 sites. I had previously located the 10 selected sites in the WDCP archives (and have elsewhere discussed inconsistencies between this archive and the versions used in MBH98), but was unable to locate archived versions of the other 26 series.

As a result of some previous exchanges with Climatic Change (which I will probably discuss on another occasion), the journal adopted a policy requiring authors to provide supporting data, but decided not to require authors to provide source code. Under this policy, I asked the journal to obtain the other 26 series from Jacoby (since I had had no success directly).

Jacoby refused to provide the 26 series and I found his reasoning as set out to Climatic Change quite interesting (my bolds).

The inquiry is not asking for the data used in the paper (which is available), they are asking for the data that we did not use. We have received several requests of this sort and I guess it is time to provide a full explanation of our operating system to try to bring the question to closure.

Speaking for myself and immediate colleagues who have been involved with my research: Most of our research has been mission-oriented, dendroclimatic research. That means to find climatically-sensitive, old-aged trees and sample them in order to extend the quantitative record of climatic variations. Also, to relate these records to the real world and investigate the climate system and its functioning.

The first part produces absolutely-dated time series of tree-ring variations. We try to sample trees at sites where there is likely to be a strong climatic signal, usually temperature or precipitation. Sometimes we are successful, sometimes we are not. We compare the tree-ring series to climate records to test what the climate signal is. We sample latitudinal treeline and elevational treeline looking for temperature-sensitive trees with both a high-frequency and low-frequency response to temperature. A high-frequency temperature response to summer is most frequently found at these extreme locations. However, trees have much more information if one finds trees with a good communal high and low frequency variations that correspond or correlate to local or regional temperatures for longer seasons. There is abundant information to explain the physiological processes in cooler seasons and why trees can respond to more than just summer season. The sampling and development of a tree-ring chronology is an investment of research energy, time, and money.

The best efforts in site selection and sampling do not always produce a good chronology. It is only as the samples are processed and analyzed that the quality, or lack thereof becomes evident. First is the dating: this is enabled by high-frequency common variation among the trees. The dating is achieved and tested by various methods. Then the chronology is developed from the correctly dated ring-width measurements and evaluated. Testing: Is there a common low-frequency signal among the trees? At a good temperature-sensitive site with good trees, there is. We conduct common period analyses of the low-frequency variation within the cores samples from a site.

Sometimes, even with our best efforts in the field, there may not be a common low-frequency variation among the cores or trees at a site. This result would mean that the trees are influenced by other factors that interfere with the climate response. There can be fire, insect infestation, wind, or ice storm etc. that disturb the trees. Or there can be ecological factors that influence growth. We try to avoid the problems but sometimes cannot and it is in data processing that the non-climatic disturbances are revealed.

We strive to develop and use the best data possible. The criteria are good common low and high-frequency variation, absence of evidence of disturbance (either observed at the site or in the data), and correspondence or correlation with local or regional temperature. If a chronology does not satisfy these criteria, we do not use it. The quality can be evaluated at various steps in the development process. As we are mission oriented, we do not waste time on further analyses if it is apparent that the resulting chronology would be of inferior quality.

If we get a good climatic story from a chronology, we write a paper using it. That is our funded mission. It does not make sense to expend efforts on marginal or poor data and it is a waste of funding agency and taxpayer dollars. The rejected data are set aside and not archived.

As we progress through the years from one computer medium to another, the unused data may be neglected. Some [researchers] feel that if you gather enough data and n approaches infinity, all noise will cancel out and a true signal will come through. That is not true. I maintain that one should not add data without signal. It only increases error bars and obscures signal.

As an ex-marine I refer to the concept of a few good men.

A lesser amount of good data is better without a copious amount of poor data stirred in. Those who feel that somewhere we have the dead sea scrolls or an apocrypha of good dendroclimatic data that they can discover are doomed to disappointment. There is none. Fifteen years is not a delay. It is a time for poorer quality data to be neglected and not archived. Fortunately our improved skills and experience have brought us to a better recent record than the 10 out of 36. I firmly believe we serve funding agencies and taxpayers better by concentrating on analyses and archiving of good data rather than preservation of poor data.

I guess I won’t be getting the data. It would be my position that, if they picked 10 of 36 sites, they used all 36 sites in their study. Imagine this argument in the hands of a drug trial. Let’s suppose that they studied 36 patients and picked the patients with the 10 best responses, and then refused to produce data on the other 26 patients on the grounds that they didn’t discuss these other patients in their study. It’s too ridiculous.

8 Comments

  1. per
    Posted Feb 6, 2005 at 5:47 PM | Permalink

    hi there
    the link to the complete letter doesn’t work
    yours, etc

  2. Steve McIntyre
    Posted Feb 8, 2005 at 8:55 AM | Permalink

    This has been fixed. Steve

  3. fFreddy
    Posted Jul 2, 2005 at 9:24 AM | Permalink

    You have to wonder what he perceived as his mission. The strong implication was that it was to achieve the hockey-stick result. His colonel should explain to him that a scientist’s task is to find the truth, whatever it may be.
    As of 2 July, the link is not working again.

  4. fFreddy
    Posted Jul 2, 2005 at 9:30 AM | Permalink

    Sorry, that refers to the “Hockey Team” link at the top.

  5. TCO
    Posted Aug 9, 2005 at 8:47 AM | Permalink

    A few good men is a rational policy if the reasons for data exclusion make sense. If you exclude data because you can prove a confounding factor, then that is reasonable (although including it and doing multiple regression might still be superior). However, if you exclude the data because they don’t fit the story, then that is not reasonable. Of course if that is happening, one may never see it (for instance if there is no throwaway comment about the 36 sites looked at). In this case, the only way to go after it is to replicate the experimental work.

  6. Steve McIntyre
    Posted Aug 9, 2005 at 8:57 AM | Permalink

    The only reason that has been provided is whether the data has an upward 20th century trend. But if you have highly autocorrelated data, a certain percentage will have trends just from the statistics. The issue becomes statistical and you need to see all the data – just as mining promoters have to report bad drill holes and drug companies have to report cases where the drug didn’t work. There is NO justification for Jacoby’s behavior and the acquiescence of all levels of climate scientists (journals, NSF and peers) is a disgrace. He’s refused to even identify the location of the Gaspe site, when I wanted to arrange for re-sampling.

  7. TCO
    Posted Aug 9, 2005 at 9:12 AM | Permalink

    If you really want to resample, pick your own site. Have a person in the field (presumably the one who will do the sampling) advise on relevant conditions (to make it meaningful). If your results differ, you won’t be able to determine what mistake was made in the earlier work. But you will be able to show that a separate (good) study came up with a different answer.

  8. TCO
    Posted Sep 11, 2005 at 2:46 PM | Permalink

    The jarhead’s paper should not have been accepted. The reviewers should have asked the obvious question you are asking. (Doesn’t take reworking the problem).

    But really the answer is to resample. And don’t give me any crap about not knowing his locations. Sample new trees. This is not just about exposing individual’s dissembling, but about showing that the as-reported science is incorrect.
