Bürger Comment on Osborn and Briffa 2006

Gerd Bürger published an interesting comment in Science on cherry-picking in Osborn and Briffa 2006. A few CA readers have noticed the exchange and brought it to my attention. Eduardo Zorita (who I was glad to hear from after our little dust-up at the Nature blog) sent me the info, as did Geoff Smith. I started on a summary yesterday, but quickly got distracted into one of the many, many possible thickets. So here’s Geoff’s summary:

There’s a pretty hot exchange (at least for CA readers) in last Friday’s Science magazine. Gerd Bürger (lead chapter author and contributor for the TAR) writes about Osborn and Briffa’s 2006 hockey stick (“The Spatial Extent of 20th-Century Warmth in the Context of the Past 1200 Years”) commenting critically on site selection and statistics. He writes “…given the large number of candidate proxies and the relatively short temporal overlap with instrumental temperature records, statistical testing of the reported correlations is mandatory. Moreover, the reported anomalous warmth of the 20th century is at least partly based on a circularity of the method, and similar results could be obtained for any proxies, even random-based proxies. This is not reflected in the reported significance levels”.

In commenting on the proxies (most of them well known to CA readers) he says that this “method of selecting proxies by screening a potentially large number of candidates for positive correlations runs the danger of choosing a proxy by chance. This is aggravated if the time series show persistence, which reduces the degrees of freedom for calculating correlations (6) and, accordingly, enhances random fluctuations of the estimates. Persistence, in the form of strong trends, is seen in almost all temperature and many proxy time series of the instrumental period. Therefore, there is a considerable likelihood of a type I error, that is, of incorrectly accepting a proxy as being temperature sensitive”.
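
The persistence point can be made concrete with a rough calculation. The sketch below is an illustration only, not Bürger's own procedure: it assumes the common AR(1) approximation N_eff = N(1 - r1*r2)/(1 + r1*r2) for the effective sample size of a correlation between two autocorrelated series, and a normal-approximation critical value of 1.96 for a two-sided 5% test. The function names and the 100-year overlap are hypothetical.

```python
import math

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def effective_n(n, r1_a, r1_b):
    """Effective sample size under the AR(1) approximation (an assumption)."""
    return n * (1 - r1_a * r1_b) / (1 + r1_a * r1_b)

def critical_r(n_eff, t_crit=1.96):
    """Correlation needed for significance, from t = r*sqrt((n-2)/(1-r^2))."""
    return t_crit / math.sqrt(n_eff - 2 + t_crit ** 2)

if __name__ == "__main__":
    n = 100  # hypothetical length of the instrumental overlap, in years
    for phi in (0.0, 0.5, 0.7, 0.9):
        n_eff = effective_n(n, phi, phi)
        print(f"lag-1 autocorrelation {phi}: N_eff = {n_eff:.1f}, "
              f"critical r = {critical_r(n_eff):.2f}")
```

Under these assumptions, with 100 years of overlap and a lag-1 autocorrelation of 0.7 in both series, the effective sample size drops to about 34 and the correlation needed for nominal 5% significance rises from about 0.19 to about 0.33.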

He goes on to say “This effect can only be avoided, or at least mitigated, if the proxies undergo stringent significance testing before selection. Osborn and Briffa did not apply such criteria”.

Bürger indicates the more serious problem is the series screening process, which only looked at proxies with positive correlations. “The majority of those random series would not even have been considered, having failed the initial screening for positive temperature correlations. Taking this effect into account, the independence of the series shrinks for the instrumental period”. This means in Bürger’s opinion that the “results described by Osborn and Briffa are therefore at least partly an effect of the screening, and the significance levels depicted in figure 3 in (1) have to be adjusted accordingly”.

Bürger repeats the analysis with the appropriate adjustments for temperature sensitivity, and finds as a result that “the ‘highly significant’ occurrences of positive anomalies during the 20th century disappear. The 99th percentile is almost never exceeded, except for the very last years for θ = 1, 2. The 95th percentile is exceeded mostly in the early 20th century, but also about the year 1000”.

There is a reply by Osborn and Briffa, which gives a number of justifications of their procedures (which some will find unconvincing) but concludes “we agree with Bürger that the selection process should be simulated as part of the significance testing process in this and related work and that this is an interesting new avenue that has not been given sufficient attention until now”.


Refs: 1) Gerd Bürger, Comment on “The Spatial Extent of 20th-Century Warmth in the Context of the Past 1200 Years”, Science, 29 June 2007: Vol. 316, no. 5833, p. 1844. DOI: 10.1126/science.1140982 (available to subscribers only)

2) Timothy J. Osborn and Keith R. Briffa, Response to Comment on “The Spatial Extent of 20th-Century Warmth in the Context of the Past 1200 Years”, Science, 29 June 2007: Vol. 316, no. 5833, p. 1844b. DOI: 10.1126/science.1141446

3) Timothy J. Osborn and Keith R. Briffa, “The Spatial Extent of 20th-Century Warmth in the Context of the Past 1200 Years”, Science, 10 February 2006: Vol. 311, no. 5762, pp. 841–844. DOI: 10.1126/science.1120514

A couple of quick points. The inter-relationship of persistence and spurious correlation was discussed in an econometrics context by Ferson et al 2003 (which I’ve discussed here) and has been a concept that has animated much of my thinking. David Stockwell has also been very attentive to the effects of picking from red noise based on correlations. One of the early exercises that I did was to see what happened if, like Jacoby, you picked the 10 “most temperature-sensitive” chronologies from synthetic red noise series with more than AR1 persistence and averaged them. Like Bürger, I did the exercise with persistent series and found that the Jacoby HS was not exceptional relative to red noise selections. (Jacoby only archived the 10 “most temperature-sensitive” series and failed to archive the others. He also refused to provide me the rejected series referring, “as an ex-marine” to a “few good men”).

The Jacoby case was one of the few cases where one could quantify the picking activity and benchmark it against biased selection from red noise.
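
A minimal sketch of this kind of benchmark is below. It is not the original exercise: it assumes plain AR(1) red noise (the exercise described above used series with more than AR1 persistence), a linear trend as a stand-in for the instrumental temperature record, and hypothetical values for the pool size, series length and calibration window.

```python
import random

def ar1_series(n, phi, rng):
    """Generate n values of AR(1) red noise with coefficient phi."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def screen_and_average(n_series=200, length=600, calib=100, keep=10,
                       phi=0.9, seed=0):
    """Pick the `keep` noise series best correlated with a trend, then average."""
    rng = random.Random(seed)
    target = [float(t) for t in range(calib)]  # stand-in warming trend (assumption)
    pool = [ar1_series(length, phi, rng) for _ in range(n_series)]
    # Rank by correlation with the target over the final `calib` steps only.
    ranked = sorted(pool, key=lambda s: pearson(s[-calib:], target), reverse=True)
    picked = ranked[:keep]
    composite = [sum(s[t] for s in picked) / keep for t in range(length)]
    return composite, pearson(composite[-calib:], target)

if __name__ == "__main__":
    composite, r = screen_and_average()
    print(f"calibration-period correlation of the screened composite: {r:.2f}")
```

The composite contains no temperature information by construction, so whatever calibration-period correlation it shows comes from the screening step alone. That is the benchmark a real "most temperature-sensitive" selection has to beat.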

Geoff spent more space on the Bürger comment than the Osborn and Briffa reply. Its main response is that the picking-by-correlation had a relatively minor impact on the selections at their stage because they picked 14 from a universe of supposedly only 16 series available from Mann et al 2003 (EOS), Esper et al 2002 and Mann and Jones 2003. They said:

The 14 series used in (2 – Osborn and Briffa 2006) were selected from three previous studies (3—5: Mann et al EOS 2003; Esper et al 2002; Mann and Jones 2003), although this set also encompasses almost all the proxies with high temporal resolution used in the other Northern Hemisphere temperature reconstructions cited in (2) [MBH98, MBH99, Jones et al 1998, Crowley and Lowery 2000, Briffa 2000, Briffa et al 2001, Esper et al 2002, Mann et al 2003, Mann and Jones 2003, Moberg et al 2005, Rutherford et al 2005].

This statement is untrue even if Osborn and Briffa are granted the one stated qualifier and another unstated qualifier. The “high temporal” resolution qualifier is not defined; this qualifier excludes several series from Crowley and Lowery and Moberg et al 2005 and prefers tree rings. The second (unstated) qualifier is that the series go back to 1000. This excludes the majority of the series. (However, Briffa et al 2001 has a very large population of series and a serious “divergence problem”. The Briffa et al 2001 network is one of two networks used in Rutherford et al 2005. The above statement is obviously false in respect to this population.) It is also false even with the long series used in these studies. There are many more than 14 series that cumulatively occur in these studies: there are Moroccan series used in MBH99, a number of oddball series in Crowley and Lowery 2000. I’m in the process of making a definitive count but it is far more than 14.

Osborn and Briffa also gloss over the impact of cumulative data snooping, by which biased selections accumulate within the literature. CA readers are familiar with this. For example, consider Briffa’s substitution of Yamal for Polar Urals. Updated results for Polar Urals show a very elevated MWP. Even though Briffa had made his name in Nature (1995) for showing a cold MWP in the Polar Urals series, he did not publish the updated information and, in Briffa 2000, substituted Yamal (with a HS) for the Polar Urals series. This substitution was followed in all subsequent Team studies except, surprisingly, Esper et al 2002. The Polar Urals update is excluded from Osborn and Briffa 2006 on some pretext, even though they use both a foxtail and a bristlecone series (Mann’s PC1) from sites about 30 miles apart – closer than Polar Urals and Yamal. These individual substitutions are not trivial, as this one substitution affects medieval-modern levels in several studies.

Osborn and Briffa observe that

“it is difficult to quantify exactly the size of the pool of potential records from which the 14 series used in (2) were selected, because there is implicit and explicit selection at various stages, from the decision to publish original data to the decision to include data in large-scale climate reconstructions.”

Quite so.

They go on to say:

in our study (2), only two series were excluded on the basis of negative correlations with their local temperature, and no further series had been explicitly excluded by the three studies from which we obtained our data. We cannot be certain that prior knowledge of temperature correlations did not influence previous selection decisions, and there are more levels in the hierarchy of work upon which our study depends at which some selection decisions may have been made on the basis of correlations between proxy records and their local temperature. However, the degree of selectivity is unlikely to be much greater than that for which we have explicit information. Simply, there is not a large number of records of millennial length that have relatively high temporal resolution and an a priori expectation of a dominant temperature signal.

They argue that Bürger has created too large a universe for comparison and that the appropriate simulation is to check cherry-picking of 14 out of 16 – and, surprise, surprise, they emerge with seemingly significant results:

The assessment of the statistical significance of the results of (2) is modified so that, rather than comparing the real proxy results with a similar analysis based on 14 random synthetic proxy series, we now generate 16 synthetic series and select for analysis the 14 that exhibit the strongest correlations with their local temperature records.
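
How much the assumed pool size matters can be checked with a small simulation. The sketch below is hedged: it uses AR(1) noise and a linear trend as stand-ins (Osborn and Briffa's reply works with their own synthetic proxies and local temperature records), and the pool sizes of 48 and 160 are hypothetical comparison points. The pools are nested, so each larger pool contains the smaller one.

```python
import random

def ar1_series(n, phi, rng):
    """Generate n values of AR(1) red noise with coefficient phi."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def mean_of_top(corrs, keep):
    """Mean correlation of the `keep` best-correlated series."""
    top = sorted(corrs, reverse=True)[:keep]
    return sum(top) / keep

def nested_pool_means(pool_sizes=(16, 48, 160), keep=14, calib=100,
                      phi=0.7, seed=1):
    """Mean screened correlation when picking `keep` series from nested pools."""
    rng = random.Random(seed)
    target = [float(t) for t in range(calib)]  # stand-in trend (assumption)
    corrs = [pearson(ar1_series(calib, phi, rng), target)
             for _ in range(max(pool_sizes))]
    return {m: mean_of_top(corrs[:m], keep) for m in pool_sizes}

if __name__ == "__main__":
    for size, mean_r in nested_pool_means().items():
        print(f"best 14 of {size:3d} noise series: mean r = {mean_r:.2f}")
```

Because the pools are nested, the mean correlation of the best 14 can only stay the same or rise as the pool grows. Picking 14 of 16 therefore gives a lower bound on the selection effect, not an estimate of it.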

Nowhere in either article are bristlecones mentioned and yet they feature prominently in the differing results. Osborn and Briffa use Mann’s discredited PC1 as one proxy and nearby foxtails as another – two out of 14 in a “random” sample! As observed here previously, these do not have a significant correlation with temperature. Under Bürger’s slightly more stringent hurdle, series 1 and 3 are excluded as having too low a correlation (these are the PC1 and foxtails, both of which are HS shaped and important to elevated 20th century results.) Osborn and Briffa say that they use a low correlation hurdle for the following reason:

Our decision to use a weak criterion for selecting proxy records was intended to reduce the probability of erroneous exclusion of good proxies.

Well, one of the “good proxies” that they are working hard not to exclude is Mann’s PC1. 😈 In addition, it is obviously ludicrous that the Team should continue to keep presenting permutations of bristlecones and foxtails as new studies, like the dead parrot in Monty Python. If, in addition, they have to lower the hurdle to get these series in, then don’t lower the hurdle. If the results are any good, they should survive the presence/absence of bristlecones/foxtails.

Their modeling of the cherry-picking process is ludicrous. It’s not even true that they only excluded 2 series. What about the Polar Urals update that was in Esper et al 2002 (uniquely)? Why wasn’t that used? Well, they had a pretext for using Yamal instead – yeah, there’s always a pretext. Briffa knows all about this substitution – he was the one that originally did it back in Briffa 2000. Instead of reporting the updated Polar Urals results (with a high MWP), as even a mining promoter would have had to do, Briffa substituted his own version of the Yamal series (which is now often attributed in Team articles to Hantemirov even though Hantemirov’s reconstruction is different than Briffa’s). This substitution has a major impact on a couple of reconstructions – altering the medieval-modern level in Briffa 2000 and D’Arrigo et al 2006. So there’s at least one more series that they excluded. The pretext is that they’ve already got a series from that area (Yamal), but then what about the doubling up of bristlecone/foxtail series?

Osborn and Briffa falsely claim that the 14 series selected constitute “almost all the proxies with high temporal resolution” used in a range of Team studies:

The 14 series used in (2) were selected from three previous studies (3—5), although this set also encompasses almost all the proxies with high temporal resolution used in the other Northern Hemisphere temperature reconstructions cited in (2).

This claim is simply false as any competent reviewer would have pointed out – jeez, any reader of CA could have pointed this out. The studies cited are: MBH98, MBH99, Jones et al 1998, Crowley and Lowery 2000, Esper et al 2002, Briffa 2000, Briffa et al 2001, Rutherford et al 2005, Mann and Jones 2003, Mann et al 2003.

Briffa et al 2001 contains hundreds of series and has a big “divergence problem” not shared by the cherry-picked series. Indeed, the bias of the picking is proven by the lack of divergence. Their retort would be that they meant to limit the matter to series that go back to AD1000. Fine, but then they should say that.

Second, it’s not true even for the series that go back to 1000. I need to do a count, but at a minimum there are double that number within the listed studies: there are a couple of Morocco series in MBH99, a French tree ring series, several oddball series in Crowley and Lowery 2000. By the time we get to Osborn and Briffa, there has already been a lot of data snooping.

Beyond that, there is obviously data snooping before this. For example, the Mount Logan dO18 series goes back to AD1000 and has the same sort of resolution as other ice core series. Why isn’t it used? Well, it has a depressed 20th century which is attributed to wind circulation. But then how can you say that Dunde and Guliya aren’t affected by circulation as well, yielding different results? The use of Dunde (via the Yang composite) and not Mount Logan is classic cherry-picking that Osborn and Briffa have totally ignored. And BTW, another year has gone by without Lonnie Thompson reporting the Bona Churchill dO18 results. I’m standing by my prediction of last year that, if and when these results are ever published, they will not have elevated 20th century dO18. (And another year has gone by without Hughes reporting the Sheep Mountain update. What a swamp this is.)

And what about series like Rein’s offshore Peru lithics with a strong MWP anomaly? Is this excluded because it is not of sufficiently high resolution? Well, the Chesapeake Mg-Ca series has resolution of no more than 10 years in the MWP portion and has a couple of weird splices. And what of Mangini’s speleothems with high MWP? Or Biondi’s Idaho tree ring reconstruction? BTW: if the temporal resolution of the Chesapeake Mg-Ca series is used as a benchmark, there are quite a few ocean sediment series that qualify (e.g. Julie Richey’s series).

I need to make a systematic catalog of series going back to the MWP with resolution at least as high as the Chesapeake Mg-Ca series, but off the cuff, I’d say that there are at least 50 series, probably more.

So when Osborn and Briffa say that the universe from which they’ve selected can be represented by selecting 14 of 16, this is completely absurd. There has probably been cherrypicking from at least 3 times that population. But aside from all that, the active ingredients in the 20th century anomaly remain the same old whores: bristlecones, foxtails, Yamal. They keep trotting them out in new costumes, but really it’s time to get them off the street.


  1. Posted Jun 30, 2007 at 12:43 PM | Permalink

    Thanks for the mention Steve, some FYI citeable references resulting from the blog discussions with many on CA below. Glad this is finally being discussed.

    Reconstruction of past climate using series with red noise, 2006. AIG News, 83, pp14, March.

    Blog article to above.

    And Chapter 11 titled “Circularity” in my book
    Niche Modeling: Predictions from statistical distributions, 2006, Chapman & Hall/CRC, 201 pages.

  2. John A
    Posted Jun 30, 2007 at 12:50 PM | Permalink

    Maybe the recycling of tree ring proxies makes them carbon neutral. Who knows?

    Dave Stockwell: did you manage to locate the R-scripts for that original study?

  3. Posted Jun 30, 2007 at 1:06 PM | Permalink

    John, the script for that figure is part of an R package that regenerates the whole book using Sweave. I have to spend a bit of time extracting the parts so it will run as a stand-alone script (but you have now embarrassed me into getting more motivated).

  4. Posted Jun 30, 2007 at 2:02 PM | Permalink

    Sure there is a lot of snooping going on. Amazingly O&B contradict themselves. But the following view by Bürger, that even more snooping is needed, probably won’t work either:

    He goes on to say “This effect can only be avoided, or at least mitigated, if the proxies undergo stringent significance testing before selection. Osborn and Briffa did not apply such criteria”.

    More stringent criteria will simply result in a smaller number of series, and the proportion of temperature sensitive to random will be unknown.

    It seems to me there are two ways out of the dilemma. The first is to use a completely independent determination of temperature sensitivity, not based on calibration, such as well established chemistry for example. The second is to simulate the selection of proxies with persistent series and generate a synthetic reconstruction “null-hypothesis”, and only treat deviation from that shape as significant. When you do that, unfortunately the Mannian hockey-stick shape comes out as the null-hypothesis. In other words, it seems to contain no significant information other than what might be expected from the selection of proxies.

  5. Posted Jun 30, 2007 at 2:36 PM | Permalink

    The revised method in Osborn and Briffa’s reply generates meaningful confidence intervals that account for selection bias (at least amongst proxies included in the analysis). Certainly, there are problems with pre-selection in proxies, but a reply in Science is not the place to fully explore this issue (having just had to write one, I know just how rigorously they enforce word limits).

    And be fair, Julie Richey’s excellent data from Pygmy Basin wasn’t published until more than a year after Osborn and Briffa’s paper. More generally, if the work relied on radiocarbon dated proxies, even with more precise Pb210 dating for the calibration period, chronological uncertainties would make it almost impossible to determine global synchroneity or otherwise of climate over the last 1000 years.

  6. John F. Pittman
    Posted Jun 30, 2007 at 2:41 PM | Permalink

    It seems to me that all the data and proxies should be reviewed for mistakes, inconsistencies, and problems. There is an unacknowledged acceptance of cherry picking by biologists. It is known that Gregor Mendel’s study of peas was cherry picked. Replication is almost always doomed to have less than conclusive correlations for Mendel’s work. It does not surprise me (I have a Biology degree as one of my degrees) that the dendros have engaged in cherry picking. It does not mean they are wrong…it means that more study is necessary. Especially the studies that show whether they have a chance of being right or not. It is more than somewhat troublesome they are not more open to scientific criticism considering that cherry picking is an accepted way to get the right result in biology.

  7. per
    Posted Jun 30, 2007 at 2:46 PM | Permalink

    …remain the same old whores: bristlecones, foxtails, Yamal. They keep trotting them out in new costumes, but really it’s time to get them off the street.

    a choice phrase !
    However, it is good to see such a pointed and justified criticism getting acknowledged as such in Science. It will be that much harder for future Team-ers just to ignore selection bias.


  8. Philip B
    Posted Jun 30, 2007 at 4:35 PM | Permalink

    Experimenter’s bias is the phenomenon in experimental science by which the outcome of an experiment tends to be biased towards a result expected by the human experimenter. The inability of a human being to remain completely objective is the ultimate source of this bias. It occurs more often in sociological and medical sciences, for which reason double blind techniques are often employed to combat the bias.

    My emphasis.


  9. John Nicklin
    Posted Jun 30, 2007 at 5:13 PM | Permalink

    #8: Thanks for the definition. Recognizing the problem of bias is the first step to overcoming it. When researchers fail to admit that they have a bias, problems mount and we see things like withholding data or code and claiming certainty where none exists. By not recognizing our own bias we tend to move ourselves down a blind alley becoming evermore unaware that we are off track. Like the old adage, we don’t know where we’re going, but we’re making really good progress.

    I think that double blind studies were developed to address two important issues: one is bias (a researcher could be persuaded by money to find a result) and, more importantly, the stakes are very high. Poor research could mean, and has meant, death, so we need to ensure complete objectivity. As the stakes rise with climate research, it may be time to call for double or triple blind studies before we commit the entire domestic product of every nation to fighting phantoms.

  10. Posted Jun 30, 2007 at 5:40 PM | Permalink

    Hey, these are double-blind techniques. The Climate Modelers are blind to their bias and statistical deficiencies, and we’re blind via concealed methodologies to prove their mistakes.

  11. bernie
    Posted Jun 30, 2007 at 6:32 PM | Permalink

    Can someone indicate how potent Bürger’s comments are likely to be? Is this real progress? Is it simply confirmation for those who are already persuaded that the HS team is fiddling and diddling with the data or is it likely to lead to some major requirement to post data, methods and reassess same?

  12. Posted Jun 30, 2007 at 7:36 PM | Permalink

    #11 Good question. I’ll read it in more detail tonight, though if someone can explain what the main measure in Fig “Difference between the fraction of records that have smoothed and normalized proxy anomalies >0” means it would help.

    #5 Sorry I don’t see how “The revised method in Osborn and Briffa’s reply generates meaningful confidence intervals that account for selection bias (at least amongst proxies included in the analysis).” For that to be the case, there would have to exist in the world a population of only 16 temperature proxies. As the population of temperature proxies (i.e. trees or whatever) is essentially infinite, the simulated selection pool has to be large, like 10,000. There is nothing meaningful about pulling 14 black balls out of a bag of 16 black balls and saying all balls in the world are black, if you get my drift.

  13. John F. Pittman
    Posted Jun 30, 2007 at 7:52 PM | Permalink

    Thomas Jefferson and Special Awards
    • Holm Awards Year Length of Service Awards
    • 75-100 Year Institutional Length of Service Awards
    • 50 Year Length of Service Awards
    • 45 Year Length of Service Awards
    • 40 Year Length of Service Awards
    • 35 Year Length of Service Awards
    • 30 Year Length of Service Awards
    • 25 Year Length of Service Awards
    • 20 Year Length of Service Awards
    • 15 Year Length of Service Awards
    • 5-10 Year Length of Service Awards.

    The above awards are linked to the WWW and are part of the public domain.
    Senator Graham, and Jean Carter Johnson, FOIA officer of NOAA, please note that NOAA has refused to answer FOIA requests from several FOIA submitters (legitimate requests as far as I can determine) on the grounds that NOAA sites and volunteer observers are claimed to be private, even though their names and accomplishments have been made public (see above; details are available on climateaudit.org). Senator Graham, the emails to and from Jean.Carter.Johnson@noaa.gov and other NOAA officials are easily obtainable by you or your staff. Up until approximately a week ago, these names and locations could be accessed on the internet. Positions and names were directly obtainable, accessible and in the public domain. Now there is the claim that they are private, and this information has been removed from the web. There is no such thing as “a little bit pregnant”. Senator Graham, as far as I can determine, US law and NOAA FOIA guidelines have been violated. Could you and your office help me? Senator Graham, for your information, on a blog, I will ask all who have been refused cooperation as required by law to contact you. Please, I hope you and your staff don’t mind that a citizen of SC for all of his life of almost 54 years asks for this to be resolved according to US law and precedent.

    Sorry, this is not meant to be spam. I need some help from those that have used FOIA.

  14. Posted Jul 1, 2007 at 3:21 AM | Permalink


    if someone can explain what the main measure in Fig “Difference between the fraction of records that have smoothed and normalized proxy anomalies >0” means it would help.

    This methodology is explained more fully in the original paper. The procedure is to
    1) smooth each proxy
    2) standardise each to unit variance and zero mean
    3) at each point in time, count the number of proxies that are more than a threshold away from the mean
    4) subtract the number of proxies that exceed with negative values, from those that exceed with positive values.
    This procedure is very simple, and quite an elegant way of summarising palaeodata. I have written some R code (just a few lines) that implements this.
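
    The four steps can be sketched in a few lines of Python. This is an illustration only, not the R code mentioned above or the authors' implementation: it assumes a centred moving average as the smoother, equal-length series with no missing values, and it reports the difference as a fraction of records rather than a raw count. All function names are hypothetical.

```python
def moving_average(x, window):
    """Centred moving average; the window shrinks at the ends of the record."""
    half = window // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def standardise(x):
    """Rescale a series to zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    return [(v - mean) / sd for v in x]

def exceedance_difference(proxies, threshold=0.0, window=1):
    """Fraction of records above +threshold minus fraction below -threshold."""
    z = [standardise(moving_average(p, window)) for p in proxies]  # steps 1-2
    n = len(z)
    diff = []
    for t in range(len(z[0])):
        pos = sum(1 for s in z if s[t] > threshold)    # step 3, positive side
        neg = sum(1 for s in z if s[t] < -threshold)   # step 3, negative side
        diff.append((pos - neg) / n)                   # step 4
    return diff

if __name__ == "__main__":
    toy = [[1, 2, 3], [1, 2, 3], [3, 2, 1]]  # made-up records, not real proxies
    print(exceedance_difference(toy))
```

    For the toy input, two records rise and one falls, so the measure runs from -1/3 at the first time step to +1/3 at the last.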

    The pool of simulated series used for estimating confidence intervals is large. Each proxy is timeshifted to a new random starting position. Any values that move beyond the date at which the record ends are moved to the beginning of the record. The procedure above is then carried out on these simulated series. In the reply this is done for all the proxies, not just those that correlated with local temperatures, and a selection step is now included, by rejecting the records with the worst correlation to temperature.
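
    The time-shifting step amounts to a circular shift of each record. A minimal sketch, assuming each record is a plain list of values at regular time steps (the function names are hypothetical, and this is not the authors' code):

```python
import random

def circular_shift(series, offset):
    """Shift a record forward by `offset` steps, wrapping the tail to the start."""
    k = offset % len(series)
    return list(series[-k:]) + list(series[:-k]) if k else list(series)

def random_surrogate(series, rng):
    """One surrogate record: the original values at a random starting position."""
    return circular_shift(series, rng.randrange(len(series)))

if __name__ == "__main__":
    rng = random.Random(0)
    record = list(range(10))  # made-up record, not a real proxy
    print(random_surrogate(record, rng))
```

    A surrogate built this way keeps the values and the autocorrelation structure of the original record (apart from the single join where the record wraps), while breaking its alignment in time with the temperature record.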

    The selection step is now explicit in the Monte Carlo significance levels. Now the methodology is there, someone can try to look at the impact of pre-selection, the rejection of proxies before the analysis.

  15. bernie
    Posted Jul 1, 2007 at 6:16 AM | Permalink

    Doesn’t this procedure simply eliminate all proxies that would produce outliers? When I am analyzing data, I pay close attention to outliers, since until you can explain why these are outliers you have almost by definition an under-specified model? Isn’t it similar to the divergence problem, a kind of ex ante divergence problem?

  16. William Jackson
    Posted Jul 1, 2007 at 7:30 AM | Permalink

    Some advice to those who write Senators and Congress: Do not appear self-effacing or humble–and especially not apologetic. Take an “in your face” attitude and push hard. These people only react to a forceful approach. If they sense a genuine threat to their own position of power, they will blink.

    Go at them hard.

  17. Posted Jul 1, 2007 at 7:42 PM | Permalink

    Thanks RichartT, thankfully the original is free too.

    someone can try to look at the impact of pre-selection, the rejection of proxies before the analysis.

    #14 It’s an interesting question. It’s not clear that selection of proxies by correlation necessarily improves ‘signal to noise ratio’. While a higher threshold might increase quality of proxies, it reduces the number of them, so it’s a tradeoff. As Steve has put it, it’s simply a weighting problem, so the solution depends on the particular distribution of ‘signal and noise’ among the proxies. So I imagine there would be a study in the effects of different distributions.

    It seems like the exchange doesn’t really address the main issue.

  18. Kenneth Fritsch
    Posted Jul 2, 2007 at 8:05 AM | Permalink

    When Osborn and Briffa say:

    “it is difficult to quantify exactly the size of the pool of potential records from which the 14 series used in (2) were selected, because there is implicit and explicit selection at various stages, from the decision to publish original data to the decision to include data in large-scale climate reconstructions.”

    They have essentially said it all in my book. Once the selection process is embarked upon, keeping track of the total pool of selections used in order to make a Bonferroni-like statistical adjustment is nigh on impossible. When I once saw the potential numbers revealed to a data snooping developer of investment strategies by an economist, the numbers were staggeringly large and beyond the realm of belief by those doing the strategizing. There is nothing new here either in the reaction of those snooping/selecting the data. If one goes back to the individual trees used for the TR/MXD data, we know that they are part of a selection process, also.
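
    For reference, the Bonferroni adjustment itself is simple arithmetic; the hard part is knowing the number of candidates. A minimal sketch, with hypothetical function names and with the family-wise error computed under the assumption that the tests are independent:

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise level at alpha."""
    return alpha / n_tests

def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

if __name__ == "__main__":
    for pool in (14, 16, 50, 1000):  # hypothetical pool sizes
        print(f"pool of {pool:4d}: per-test alpha = {bonferroni_alpha(0.05, pool):.5f}, "
              f"uncorrected family-wise error = {familywise_error(0.05, pool):.2f}")
```

    With 14 independent tests at the 5% level, the chance of at least one spurious acceptance is already about 51%, which is why the size of the pool matters so much.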

    I am wondering whether Briffa and Osborn had any reservations about pre-selection/snooping before the fact of having the dangers pointed out to them.

  19. Posted Jul 2, 2007 at 10:08 AM | Permalink

    Seems like both authors are more intent on scoring points about whether the 20th century is anomalous or not than on trying to honestly determine the conditions under which you could reliably make such a determination. You need at least to do as Steve has done and collect and review all the available proxy data and look at the assumptions of methodologies and a range of realistic possible distributions in the population. I don’t believe, as Bürger suggests, that the proxies are drawn from a random population, but neither are all the proxies as reliable as O&B seem to assume.

  20. Joe Ellebracht
    Posted Jul 3, 2007 at 6:34 PM | Permalink

    I am not sure about ice cores and sediments, but it always seemed to me that tree-ring paleodendroclimatologists were looking for faint signals from the past, based upon weak statistical correlations in the present and a lot of judgement and selection. As a way to tease out some slight insight about past climates this is inexact but fun science. Adding statistical tests ruins the fun and lets people make fun of the scientists (as we have at CA), and then of the science. Instead of being like a pediatrician who notices that several kids have leukemia in town and mentions it to the mayor, you become a public health officer who has to take into account naturally occurring random concentrations of diseases, of immigrants, of potential causes, and many other factors too difficult for the local doctor to calculate or perhaps even accept. Very soon you have to employ professional statisticians, and good ones at that. The public health officer will be much less certain about the meaning of the leukemia cases than the pediatrician.

    I am not a professional statistician, as the following segment will make clear, but I know that the assumption of an unvarying relationship between tree rings and temperature (or precipitation) over very long periods of time for a tree or a stand of trees is incorrect. The local environment of a tree is not stationary. They are not planets orbiting through empty space subject to only a few physical forces.

    Anyone who has constructed a model based upon correlation over time is aware of this problem. But some faint insight into how fast the correlation relationship breaks down over time can be acquired by splitting the sample.

    How much does projecting recent correlations backward or forward reduce their utility? This reduction can easily (and roughly) be modeled by dividing the calibration period in half, using the latest half to backcast the earlier half, and calculating the backcast errors compared to the calibration half errors. Assuming the average prediction error is higher in the backcast set, then a simple linear time dependent prediction error model can be made. Or even a fancy model if you want.

    My model is perhaps too simple, but here it is.

    So if the present calibration period is 60 years, dividing it in half gives two periods of 30 years. A new calibration is made using only the latest 30 years. If the average temperature prediction error of the latest 30 years (the half used for the new calibration) is 0.5 degrees, and the mean prediction error of the earlier, backcast 30 years is 0.6 degrees, then you have an error drift of 0.1 degree per 30 years; 30 years is the difference between the mean dates of the two halves of the data. Dividing 0.1 degree by 30 years gives .0033 degrees per year. Now you are ready to estimate prediction errors for older data. Data 1015 years before the endpoint of the latest 30 year calibration period is 1000 years older than the midpoint of that period. 1000 times .0033 is 3.33 degrees as the rough projection of the estimating error increase. 3.33 degrees plus the original 0.5 degree error from the calibration period is 3.83 degrees for the average error of the estimate.

    So the error bars get wider as the backcast gets farther into the past.
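
    The arithmetic of this squarish wheel can be sketched in a few lines of Python. This is only an illustration of the linear drift idea described above; the function name and its parameters are made up for the example, and the 0.5 and 0.6 degree errors are the illustrative figures from the comment, not measured values.

```python
def projected_error(years_back, calib_error=0.5, backcast_error=0.6, half_length=30):
    """Linear error-drift model (illustrative only).

    years_back     -- years between the midpoint of the recent calibration
                      half and the date being estimated
    calib_error    -- mean prediction error in the recent (calibration) half
    backcast_error -- mean prediction error in the earlier (backcast) half
    half_length    -- years between the mean dates of the two halves
    """
    drift_per_year = (backcast_error - calib_error) / half_length
    return calib_error + drift_per_year * years_back


# Error drift per year: 0.1 degree spread over 30 years.
print(round((0.6 - 0.5) / 30, 4))        # 0.0033

# Rough average error for data 1000 years older than the calibration midpoint.
print(round(projected_error(1000), 2))   # 3.83
```

    A straight line through two points is about the crudest model possible; with only two halves there is no way to check whether the drift is really linear.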

    I apologize to those who know more statistics than I do for reinventing the wheel, and I realize my wheel is squarish.

    Now for data snooping, there are accepted corrections. Just count every tree core taken and considered as potential evidence for publication in the scientific literature on your subject (OK, limit it to PhDs and candidates), and consider that number of tree cores as the population snooped among to come up with the published results. Adjust the significance levels of the reported correlations accordingly. Yes, this means your results are not statistically significant. Maybe, though, you picked up a faint signal about past climates and had some scientific fun.
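
    One standard correction of this kind is the Bonferroni adjustment, which divides the significance level by the number of candidates examined. The sketch below assumes that correction (the comment does not name a specific one), and the count of 1000 candidate cores is a hypothetical number chosen only for illustration.

```python
def bonferroni_alpha(alpha, n_candidates):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_candidates


def survives_snooping(p_value, n_candidates, alpha=0.05):
    """True if a reported p-value is still significant once every
    candidate that was screened is counted as a separate test."""
    return p_value < bonferroni_alpha(alpha, n_candidates)


n_cores = 1000  # hypothetical number of cores considered

# A correlation reported at p = 0.01 no longer passes.
print(survives_snooping(0.01, n_cores))      # False

# Only a very small p-value would survive the adjustment.
print(survives_snooping(0.00001, n_cores))   # True
```

    Bonferroni is conservative, and it does not account for persistence in the series, so this shows the direction of the adjustment rather than a complete test.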

  21. Scott-in-WA
    Posted Jul 4, 2007 at 12:06 PM | Permalink


    Without having had satellites continuously in orbit for the last 2000 years watching the entire globe and collecting a near-continuous stream of data throughout that period, any notion that we can accurately determine average grid cell temperatures even as close as 1 degree C going back two millennia is preposterous—regardless of the methods and the techniques now being attempted.

    OK, does this make any difference to the dedicated AGW hucksters? Not really, there is just too much personal profit at stake.

    The GCM research community is in the business of manufacturing climate data to support their AGW promotional products. (The notion of “climate signal” is one such product.) A coating which has the look and feel of the scientific method is painted on to these AGW promotional products so as to make them more marketable—first to the policy makers in government who use AGW as a means of gaining power and influence, and next to the Media who likewise make use of AGW for their own purposes; i.e., the selling of newspapers, magazines, air time, and cyberspace billboards.

    OK, Joe, let’s have you join the ranks of the professional AGW hucksters (temporarily) and have you step back and take a hard look at this new process for manufacturing climate data. Let’s call it “Joe’s CLI-matic Data Dispenser, Version 1.0.”

    Ask yourself some important questions about Joe’s CLI-matic Data Dispenser. From a marketing perspective, does the new technique have enough of the look-and-feel of both hard science and statistical theory so as to be useful in promoting the data it generates? But even more importantly, does the new process generate the right climate data for your customer base?

    If not, revise your new technique again; and if the process looks good to you, revise your advertising material as well, once you are sure this latest version is sellable. Make sure too that the latest advertising is peer reviewed by the Marketing Group to be sure it is in line with the latest AGW marketing trends.

    Once this has been done, take some time to gather feedback as to how widely your new data manufacturing process is being accepted by your customers, and as to how well sales are improving. If your market share within the full universe of AGW data manufacturing products is growing, and if profits flow in ever-greater amounts, then it’s time to move on and start another development phase in the AGW product lifecycle—Joe’s CLI-matic Data Dispenser, Version 2.0.

    At any rate, that’s where the true fun and reward of being an AGW huckster lies; in playing many creative games with AGW data in the pursuit of money, power, and influence.

  22. RomanM
    Posted Jul 5, 2007 at 8:34 AM | Permalink

    #20, 21

    The reconstructions of the climate by Prof. Mann and his colleagues are clearly incredible! The ability to take some tree rings, ice cores, and other assorted information and to turn them into an extremely precise picture of the climate during previous centuries or even millennia boggles the mind. How precise is that? To answer this question, you just need to examine the standard errors of the estimated temperature anomalies in their reconstructions.

    As an example, we can look at the two papers that the Mann team published in 1998 and 1999. Although the actual values of the standard errors do not seem to be given in the body of the papers, from Figure 3a in the 1999 paper, we can approximate that the standard error for the estimated anomaly for a given year is less than .125 C for years back to 1800 and less than .250 C for years dating back to 1000. Since the methodology for calculating these values is not explained in the papers, finding the flaws in their calculation becomes a bit harder. However, it is instructive to make some comparisons with other results.

    According to the web site, http://www.hadobs.org , the Hadley Climate Research Centre calculates the “annual mean global temperature” using over 3000 monthly station temperature series (which in turn are calculated from daily temperature series using accurate thermometers). According to the site,

    Annual values are approximately accurate to +/- 0.05°C (two standard errors) for the period since 1951. They are about four times as uncertain during the 1850s, with the accuracy improving gradually between 1860 and 1950 except for temporary deteriorations during data-sparse, wartime intervals.

    This implies that the standard error for their estimates is about .025 C for a given year in the latter half of the 20th century.

    How does this compare with the standard error of the reconstruction estimates? Suppose that we were to randomly select 120 of the stations and use their results to estimate the global temperature anomaly. This would result in an estimate with a standard error of about .125 C (actually, slightly higher, since the station values are not equally weighted). From 1800 to the present, Prof. Mann can calculate an estimate of the global temperature as precise as (or more precise than!) you could by observing daily readings from modern weather stations placed at each location from which the proxies were selected! Of course, prior to 1800, the situation is not quite as impressive. Their precision is equivalent to obtaining the daily readings from only 30 randomly selected stations for a period of 800 years.
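
    The station counts above follow from the usual rule that the standard error of a mean scales as one over the square root of the sample size. A rough sketch, assuming independent and equally weighted stations (which, as noted, is not quite true) and taking "over 3000" as exactly 3000; the function names are made up for the example:

```python
import math


def se_for_subset(se_full, n_full, n_subset):
    """Standard error when only n_subset of n_full stations are used,
    assuming independent, equally weighted stations."""
    return se_full * math.sqrt(n_full / n_subset)


def stations_needed(se_full, n_full, se_target):
    """Number of stations that would give the target standard error."""
    return n_full * (se_full / se_target) ** 2


SE_FULL = 0.025   # one standard error, from the quoted +/- 0.05 C (two s.e.)
N_FULL = 3000     # assumed station count

print(round(se_for_subset(SE_FULL, N_FULL, 120), 3))    # 0.125
print(round(stations_needed(SE_FULL, N_FULL, 0.125)))   # 120
print(round(stations_needed(SE_FULL, N_FULL, 0.25)))    # 30
```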

    What are these guys smoking??? Clearly, they don’t understand the implications of their results. So, why are their errors understated? What virtually all paleoclimatologists are ignoring is the fact that they are doing cluster sampling with their proxies. Trees at a given site can only tell you at most about what is happening at that site (disregarding for the moment other documented problems with their methodology). If I have nine trees at one site and one tree at another, it is inappropriate to treat all ten trees as equal information (which is generally what the result of their principal components calculations more or less does). The correct thing to do is to first combine the results at each site and then continue with the estimation procedure by combining the results from the different locations (clusters). Since much of the variation in the estimate of a “global temperature anomaly” stems from the fact that there is a huge variation in the patterns at various locations (some show increasing trends, many others show decreasing trends or none at all), the fact that this is ignored in most paleoclimatology reconstructions leads to drastic underestimation of the uncertainty of the results. Interestingly, this “between cluster” variability can be estimated independently of the proxies themselves from the actual observed temperature record. To extend this to the past does require some assumptions about the stationarity of global weather patterns.
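
    The nine-trees-and-one-tree example can be made concrete with a toy calculation. The site names and values below are invented purely for illustration and are not taken from any proxy data set; the sketch only contrasts pooling all trees with averaging within sites first.

```python
from statistics import mean


def pooled_mean(sites):
    """Treat every tree as an independent, equally informative observation."""
    return mean(value for trees in sites.values() for value in trees)


def cluster_mean(sites):
    """Combine the trees at each site first, then combine the site results."""
    return mean(mean(trees) for trees in sites.values())


# Hypothetical anomalies: nine trees at one site, one tree at another.
sites = {
    "site_A": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9],
    "site_B": [-1.0],
}

print(round(pooled_mean(sites), 2))    # 0.8, dominated by site_A
print(round(cluster_mean(sites), 2))   # 0.0, each site counted once
print(len(sites))                      # 2 locations, not 10 observations
```

    The pooled figure looks as if it rests on ten observations, but the information about spatial variation comes from only two locations, and it is that between-site spread that should drive the error estimate.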

    In another thread on CA, it was suggested that the IPCC was backing away from the hockey stick. My suspicion is that they are coming to the realization of just how flawed and unbelievable these results are, not only for the hockey stick, but for such reconstructions in general.


  23. SteveSadlov
    Posted Sep 26, 2007 at 4:38 PM | Permalink



  24. SteveSadlov
    Posted Sep 26, 2007 at 4:39 PM | Permalink

    A key extract from the above blog:

    Bürger repeated all analyses with the appropriate adjustments and concluded “As a result, the ‘highly significant’ occurrences of positive anomalies during the 20th century disappear.” Further, he reports that “The 95th percentile is exceeded mostly in the early 20th century, but also about the year 1000.”

  25. Sam Urbinto
    Posted Sep 26, 2007 at 4:56 PM | Permalink

    What about what Mackay found!

  26. SteveSadlov
    Posted Sep 26, 2007 at 5:21 PM | Permalink

    RE: #25


    “Lake Baikal therefore, contains a potential uninterrupted paleoclimate archive consisting of over 7500 m of sedimentary deposits, extending back more than 20 million years.” If that is not perfect enough, the Lake “is perhaps best well known for its high degree of biodiversity; over 2500 plant and animal species have been documented in Baikal, most of which are believed to be endemic.” The Lake is a long way from the moderating effects of any ocean, and therefore, the Lake should experience large climatic fluctuations over long and short periods of time.


    Even in a setting such as Sargasso, there is great value to the sedimentary column. Notice how the Team steer well clear of use of these sorts of proxies. They are not stupid. They know very well about such proxies. That makes it all the more cynical. Bristlecones …. bah!…… humbug!

  27. Pat Keating
    Posted May 9, 2008 at 8:04 AM | Permalink

    Here’s an interesting item from a Science mailer. These authors appear to believe that bristlecone rings are probably more dependent on humidity than temperature:

    The annual rings of the bristlecone pine trees in eastern California, extending back in live trees about 8000 years, are a lynchpin for calibrating radiocarbon dates. They have also provided important local climate records, as ring width is thought to reflect temperature and moisture; the length of the record is particularly useful in applying the data toward long reconstructions of global climate.
    Berkelhammer and Stott have now obtained an annual resolved record, extending back to the year 1700, of the oxygen isotope composition of cellulose from the rings of two California bristlecone pines. Interpreting the record is somewhat complicated, as isotopes are fractionated within the tree and by transpiration from the leaves as relative humidity varies, but large changes in values probably reflect variation in the main sources of storms in the region. The record shows small oscillations every 20 years or so that probably reflect Pacific climate variations. More dramatic is a steep change in the mid-1800s that indicates a change from southern-derived moisture during the time of the Little Ice Age to more northern winter storms since. The effect of such changes on the overall growth of the trees, and thus the ring width used in longer reconstructions, need further study. — BH

    Geochem. Geophys. Geosys. 9, 10.1029/2007GC001803 (2008).

