New Data from Hantemirov

Yesterday, I received updated Yamal data (to 2005) from Rashit Hantemirov, together with a cordial cover note. As CA and other readers know, Hantemirov had also promptly sent me data for Hantemirov and Shiyatov 2002. There are 120 cores in the data set, which comes up to 2005. I’ve calculated a chronology from this information – see below.

In the wake of the 2009 Yamal discussion, CRU had contacted Hantemirov for additional data. In CG2-1025 on Oct 5, 2009, Hantemirov wrote Melvin, sending additional data as follows:

Dear Tom,
files with living trees data attached, that I use to update Yamal chronology (these data have been used among many others in Esper et al. 2009: [1]
First letters in ID means river (valley):
TNL – Tanlova-yakha;
HDT, M, X – Khadyta-yakha;
POR – Porza-yakha;
all others – Yadayakhodyyakha

This email caught my attention because the CRU reconstruction using “all the data” in October 2009 did not contain any cores with prefix TNL or HDT. (Note that the HDT, M and X prefixes denote cores from Khadyta River – which Mann and Real Climate declared to be “inappropriate” for use in a reconstruction.)

In Table S2 to Esper et al 2009 (mentioned by Hantemirov), site 25 was located in Yamal (70E, 67.5N) and had 120 cores.

I did an RCS-style calculation on the Hantemirov data set and repeated the comparison illustrated in my original Climate Audit post of Sep 27, 2009 that initiated the present controversy, as shown below:

Figure 1. Yamal Chronologies. Green – from Hantemirov _liv.rwl dataset; red- from Briffa et al 2008.

Now here is the corresponding graphic from the Yamal post in September 2009:

Figure 2. Yamal chronologies from original post. Green – Yamal data from Briffa 2000 plus Schweingruber Khadyta River; red- from Briffa et al 2008.

Here is the calculation from Hantemirov data compared to the Sept 2009 green chronology plotted together:

Figure 3 – From Hantemirov data versus Sept 2009.

I think even my severest critic might observe that the chronology using the Hantemirov data set is remarkably similar to the green chronology of September 2009. The closeness of the chronologies is partly coincidence – the September calculation was only intended to show the potential impact of additional data on the very small Briffa data set. But coincidence or not, the likeness really is remarkable. The results through the 1990s and early 2000s are interesting as well: elevated ring widths, but not the multi-sigma of Briffa’s Yamal chronology.

While the form of my RCS calculation differs slightly from CRU’s style in Briffa et al 2008 (which is not archived), I am 99.9% sure that the difference is not material to the major point of the result and that an RCS calculation on the Hantemirov data in current CRU style would yield a chronology indistinguishable from the one that I’ve presented (yes, an “insta-chronology”, but they really only take an instant to calculate).
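For readers who haven’t seen an RCS calculation, the basic recipe can be sketched in a few lines. The toy version below is an illustration only (my actual script is the R code at the bottom of the post, and CRU’s exact variant is not archived; real implementations also fit a smooth curve to the age-aligned means rather than using them raw): align all cores by cambial age, pool them into one regional growth curve, divide each ring width by the curve value at its age, and average the resulting indices by calendar year.

```python
from collections import defaultdict

# Toy illustration of the RCS idea (hypothetical code, not the R script below
# and not CRU's unarchived method). Each core is (first_year, ring_widths);
# ring i of a core has cambial age i and calendar year first_year + i.
def rcs_chronology(cores):
    # 1. Regional curve: mean ring width at each cambial age, pooled over all cores
    by_age = defaultdict(list)
    for _, widths in cores:
        for age, w in enumerate(widths):
            by_age[age].append(w)
    curve = {age: sum(ws) / len(ws) for age, ws in by_age.items()}
    # 2. Index each ring: ring width divided by the regional-curve value at its age
    by_year = defaultdict(list)
    for first_year, widths in cores:
        for age, w in enumerate(widths):
            by_year[first_year + age].append(w / curve[age])
    # 3. Chronology: mean index per calendar year
    return {yr: sum(v) / len(v) for yr, v in sorted(by_year.items())}
```

The point of the age-alignment is that the standardization curve is independent of calendar year, so a common low-frequency signal is not divided away, which is also why core selection matters so much to the result.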

In contrast, CRU’s 2009 calculation, said to use “all the data”, showed negligible change from the chronology of Briffa et al 2008, as shown below (on the same scale):

Figure 4. Yamal chronologies of Briffa et al 2008 and CRU 2009.

The difference between the two results is the selection of data. The 2009 CRU calculation only included a fraction of the Hantemirov dataset (while increasing populations at the YAD, POR and JAH sites).

Esper et al 2009 has an almost unintelligible graphic, and nothing is archived. (It was received on Jan 28, 2009 and accepted on Jan 30, 2009.)

It discusses West Siberia and includes the Yamal and Urals areas. Here is an excerpt from its Figure 6. The red is the ring width chronology and, if you squint, it has points of similarity to the green curve in Figure 1 – it definitely does not show any support for Briffa-style multi-sigma growth.

Figure 5. Excerpt from Esper et al 2009 Figure 6 (resized to correspond). “Middle panel shows the WSIB and WSIBnew tree growth data scaled over the 1881–1990 (WSIB) and 1881–2000 (WSIBnew) periods to regional JJA temperatures”

The chronology here was calculated as follows (you can’t replicate this because I haven’t placed the Hantemirov data online), but it shows the form of the calculation and shows that the data set is the same as the one used in Esper et al 2009. This should generate all figures turnkey (May 16, 2012).

	trim=function(x) window(x, start=min(time(x)[!is.na(x)]), end=max(time(x)[!is.na(x)]) ) # trim leading/trailing NA

#Hantemirov Living Data
  #living data from Hantemirov sent by email
 	#range(tree$year) # [1] 1580 2005
  	#length(unique(tree$id)) # [1] 120
	#X=data.frame(year=c(time(chron.hant)), chron=chron.hant)
	#write.csv(X, file="d:/climate/data/yamal/chron.hant.csv", row.names=FALSE)
	X= read.csv("")

#Yamal Chronologies
  #Briffa 2008 Chronology
	loc= ""
	  download.file(loc, "temp",mode="wb");  load("temp")
	tsp(yamal08) #-202 1996
#Figure 1: compare Hantemirov Living RCS to CRU 2008
	legend("topleft",fill=3:2,legend=c("Hantemirov Live","CRU 2008") )
	title("Yamal Chronologies from 1850")

#Figure 2: this is from a September 2009 post 
    #script is at

#Figure 3: compare Sept 2009 to Present graphic 
	  tsp(crn2) #  -202 1990 
	legend("topleft",fill=c(3,1),legend=c("Hantemirov Data","Sept 2009") )
 	title("Yamal Chronologies from 1850")

#Figure 4: compare CRU 2008 to CRU 2009
 	loc= ""
	 download.file(loc, "temp",mode="wb");  load("temp")
	  tsp(yamal09) # -202 1996 1

	legend("topleft",fill=c("red4","red"),legend=c("CRU 2009","CRU 2008") )
	title("Yamal (CRU) Chronologies from 1850")


  1. Spence_UK
    Posted May 15, 2012 at 7:42 AM | Permalink

    Yes, but the ability to pick and choose the trees is an advantage unique to dendroclimatology.


    Esper et al 2003 (see here)

    However as we mentioned earlier on the subject of biological growth populations, this does not mean that one could not improve a chronology by reducing the number of series used if the purpose of removing samples is to enhance a desired signal. The ability to pick and choose which samples to use is an advantage unique to dendroclimatology.

    • Jeremy
      Posted May 15, 2012 at 10:13 AM | Permalink

      That really cracks me up every time I read it.

      Yes, yes, the ability to pick and choose which data from telescopes best fits the standard model is unique to astrophysics.

      It just sounds more absurd every time.

      • Gord Richens
        Posted May 15, 2012 at 2:28 PM | Permalink

        One could go all day…

        The ability to pick and choose which patient to treat is unique to medicine.

        • Dave Brick
          Posted May 15, 2012 at 4:50 PM | Permalink

          Even more apt:

          The ability to pick and choose which patients respond favorably to a medication to enhance a desired signal is an advantage unique to medicine.

  2. Alberto
    Posted May 15, 2012 at 7:46 AM | Permalink

    “This email caught my because the CRU reconstruction”


  3. KNR
    Posted May 15, 2012 at 7:55 AM | Permalink

    This is the reason why the standard approach is, or rather should be, that if you leave data out, you must provide information on why you did so and on what it was you left out. Once again, it’s normal practice for an undergraduate to do this, but abnormal for the ‘best’, as they call themselves, in climate science. You really are left with a feeling that climate science professorships are awarded if you collect enough bottle tops.

  4. bernie1815
    Posted May 15, 2012 at 8:44 AM | Permalink

    When you say update, are you saying that Hantemirov or someone on his/her team went back and recored the same 120 trees?

    • Steve McIntyre
      Posted May 15, 2012 at 9:00 AM | Permalink

      Re: bernie1815 (May 15 08:44),
      Probably not the same trees. Just a fresh sample (incorporating the 17 live trees of Hantemirov and Shiyatov 2002, but much expanded)

  5. Steve McIntyre
    Posted May 15, 2012 at 9:03 AM | Permalink

    I’ve added in a graphic showing the much derided Sept 2009 sensitivity example (which was not presented as anything other than a sensitivity study) to the chron from the present data. They really are “remarkably similar”.

  6. morebrocato
    Posted May 15, 2012 at 9:29 AM | Permalink

    It is utterly fascinating to me to see that Steve McIntyre and the folks at RealClimate have essentially the same rundown of events, yet in the way it’s presented and framed, you’d think they have nothing in common.

    You state:

    “A URALS regional chronology had been calculated as of April 2006. This was a version of the regional chronology which remained unchanged for many years” and then he ‘concludes’: “The regional chronology has not been a “work in progress” for years.”

    But the reply is:

    This is a very clear statement of what he thinks (or rather what he thinks he knows). But the reality of science is that finished products do not simply spring out of the first calculation one does.

    So it’s absolutely true that this whole ‘late-night-at-the-office’ thing was indeed had by the Briffa et al researchers when the new data came in, and it could be assumed that they did (as you say, “99.9%”) similar calculations (the differences are meaningless) that perhaps showed identical results to your charts posted here and earlier regarding the wider regional Urals-Yamal data set.

    So then, when Steve McIntyre sees the results of the ‘insta-reconstruction’ he immediately throws it out there… (one camp says this is the ‘a-ha’ moment of voluminous data, the other says ‘not-so fast’).

    People generally try something, find something wrong, try something else, fix one problem, test something else, deal with whatever comes up next, examine the sensitivities, compare with other methods etc. etc. All of those steps contribute to the final product, and it is clear that the work on this reconstruction is indeed ongoing.

    So the question then becomes… What gave the original researchers the idea that there’s something wrong with the data, rather than thinking this new data instead challenged their original findings? I suppose we’ll see the flags that were raised when the actual paper comes out in October (which will be a fascinating thing itself), but it could boil down to simply the thought that the presently measured temperature record (and its recent HS shape) should either be matched in the cores, or steps may then need to be taken to refine the sample in an Esper-ian Mann-er.

    In my head, isn’t that the only way they could come up with the idea that it’s going to take ‘too much time’ to go through the data? Otherwise, why do the initial ‘insta-reconstruction’ in the first place if you know in advance that the large number of samples is going to need to be filtered?

    When it finally comes out, it will be interesting to see if the same methodologies described in that paper were applied to the smaller Yamal area/cores. Perhaps they won’t be, because of an ascribed anomalously high value of the site itself in supplying unvarnished windows into regional temperature. But, whatever that site selection methodology is, it still would then have to be applied to the other sites in the regional chronology (though it is on record in at least one place that on site-selection alone the Khadyta River passes muster).

    To continue…

    For an analogous example, the idea that the first simulation from a climate model would be a finished product is laughable – regardless of the existence of that original output file. It would obviously be part of the work in progress. Although science is always in a work in progress in some sense, it is punctuated by milestones related to the papers that get published. They stand as the marker of whether a stage has been reached where something can be considered finished (though of course, it is always subject to revision).

    My thought here (which I’ve been having a lot lately) is that when new science revises and/or corrects old science, there should be some sort of acknowledgement, included in the new work, of an incorrect or unadvisable procedure from a previous paper that henceforth should be avoided, no? It could/should be easy to say that the original MBH paper relied on substandard data and/or methodologies, particularly when corrections come out in future ‘milestone’ publications, regardless of whether they ‘confirm’ the original. It would be great for climate science communication if this happened, but unfortunately there’s too much poison in the well, because only folks like Steve McIntyre figured out ‘publicly’ what all the climate scientists were often conversing about (in the Climategate emails). The same thing could be said about the early Yamal papers.

    I guess scientists have at least some right to hold onto their own data until they’re ready to publish it, and Gavin may be right about the ‘insta-reconstruction’ not constituting ‘adverse results’ that went unreported, but that depends on what comes out as the grand dendro methodology we’re all waiting for. But, in all this, it begs the question of why bother publishing the 2008/9 paper on Yamal at all? Even the researchers themselves would have known that that paper was near irrelevant compared to what the larger regional chronology would say when they ever got it done. For all the talk that NW Siberian dendrochronologies are such minor players in modern climate science, there certainly seems to be quite an appetite for re-hashing that data occasionally while the Big One is tinkered with back at the lab.

    In summary, McIntyre is wrong in his premise, wrong in his interpretation, and wrong in his accusations of malfeasance. – gavin]

    It’s like there’s a “Connect the dots” game going on, but at the same time, it’s an M.C. Escher drawing or some optical device…
    “A ha! I have found a rabbit! No, you idiot… You’re staring right at a duck”.

    To Gavin’s credit, in situations like these it’s best to award the benefit of the doubt to the scientists themselves who are describing their own work/motives. However, they do have a high burden of explanation for their methodology.

    • Steve McIntyre
      Posted May 15, 2012 at 9:42 AM | Permalink

      Re: morebrocato (May 15 09:29),

      thoughtful comment.

      A related point that I’m probably going to work up into a post. My original question – why didn’t Briffa do the regional chronology at Yamal in the same way as he’d done it at Taimyr? – is a question that a journal peer reviewer might have asked (if he’d been aware of what Briffa had done). The poor documentation of the article makes the question hard to ask, but that’s different.

      Asked by a journal peer reviewer, Briffa would have given a measured answer. I’m not sure what the answer would have been, but it wouldn’t have been along the lines of the RC Hey Ya-mal post, or the Yamalian yawns post. If he said that he hadn’t completed the Yamal analysis, the reviewer would undoubtedly have said: well, finish it.

      Let’s suppose that Briffa said that they’d done some experiments and they hadn’t worked. If the reviewer said – well, show me the results of the experiment that you’ve discarded – then he’d be entitled to see it. If Briffa refused, he’d recommend rejection of the article.

      • morebrocato
        Posted May 15, 2012 at 10:17 AM | Permalink

        I don’t know why the Rabbit/Duck image isn’t showing up there…

        If you wanted to see it 😉 (but I bet some of you already know what I’m talking about).

        No journal (or reviewer) is going to know what researchers aren’t presenting to them or have on the desk still to be worked on, and probably don’t have a claim to declare what they should be doing. However, at some point it should be declared that these 2008/9 papers were a little weak in how ‘new/interesting/progressing’ the work is (especially when it’s applied to numerous other papers in rejecting them).

        To me, it’s like what happens when an author writes a really bad novel after a few good ones… Now the publicist and related companies are going to be more careful on the advance and the initial run so they don’t get hosed. But in this case, these folks may be the only authors available for this data.

        But surely, since the data is available, and since whatever methodology is applied to it in this forthcoming paper would necessarily have to be backed up in the literature elsewhere, theoretically one of the folks reading these blogs can take it upon themselves to do the work and publish it, no?

        Is this something you can do Steve?

        • Kenneth Fritsch
          Posted May 15, 2012 at 4:52 PM | Permalink

          “No journal (or reviewer) is going to know what researchers aren’t presenting to them or have on the desk still to be worked on, and probably don’t have a claim to declare what they should be doing. However, at some point it should be declared that these 2008/9 papers were a little weak in how ‘new/interesting/progressing’ the work is (especially when it’s applied to numerous other papers in rejecting them).”

          There is much that the reviewer will not know about the origins and selection of data, and even deeper potential mysteries, going into a paper submitted for publication. That should not be a point of discussion here, given that the practice in publishing papers about proxies/reconstructions should include, of the authors’ own free will and interest in science (and, in this case, good statistics), an impeccable list of physically based and reasoned criteria used for the selection of proxies, together with when those rules were applied and what exclusions were made on an a posteriori basis and why. If an author was not forthcoming in this process, they could well be embarrassed or even suffer a worse fate from future revelations.

          In the meantime, the judgments on the validity of the temperature proxy and reconstruction publications to date have to be based on what was revealed by the authors in efforts to avoid a real or perceived selection bias. Whether the authors (and the journals and reviewers) were aware or unaware of the consequences of a poorly documented and/or performed selection process really matters little in these judgments.

          Further, it should be noted that even given proper documentation of the selection process, a sensitivity test is always in order. Best that it be done by the author, but even a citizen-scientist doing it on a blog is a valuable addition to the discussion.

    • theduke
      Posted May 15, 2012 at 10:54 AM | Permalink

      But, in all this, it begs the question of why bother publishing the 2008/9 paper on Yamal at all? Even the researchers themselves would have known that that paper was near irrelevant compared to what the larger regional chronology would say when they ever got it done. For all the talk that NW Siberian dendrochronologies are such minor players in modern climate science, there certainly seems to be quite an appetite for re-hashing that data occasionally while the Big One is tinkered with back at the lab.

      The simple answer is they thought they could get away with it and were intent on filling the airwaves with re-affirmation studies. But then this happened:

    • MikeN
      Posted May 15, 2012 at 6:22 PM | Permalink

      morebrocato, in the Climategate files there is an e-mail where a scientist expresses surprise that their hockey stick paper was published. They created a hockey stick by counting, for each time period, the number of proxies that surpassed a certain threshold. That is what they were graphing: the number of proxies, out of (I think) 14, that satisfied a criterion.

  7. Nosmo King
    Posted May 15, 2012 at 9:33 AM | Permalink

    It must be really humiliating to “The Team” that they, with their grants and tenured positions, are getting eaten alive by Steve and a few others — the real scientists in the discussion — who work for the love of the truth and not much else.

    Keep up the amazing work, Steve! You may not think of it in these terms, but you are doing a huge service to millions of people who, without your noble efforts, might fall victim to the tyranny of what it is the warmists are truly trying to achieve.

  8. daved46
    Posted May 15, 2012 at 9:33 AM | Permalink

    Steve, did the data include the raw measurements or just the chronologies after growth-curve fitting? Also, is there any metadata which would allow you to compare them on the basis of altitude or nearness to the rivers? Also, I’m assuming (without having time to go look at earlier posts here) that the CRU 2008 set was all or mostly dead trees. What would be the likely effect on measured growth (while alive) for trees which later died? Presumably those which lived would have been ones in better circumstances, but it’s not obvious to me whether these would be trees which are now growing faster or slower since the climate is warmer.

    • Steve McIntyre
      Posted May 15, 2012 at 9:51 AM | Permalink

      Re: daved46 (May 15 09:33),
      calculated from measurement data. I’ve added script to bottom of post.

      One of the advantages of accumulating R tools to study tree ring data sets is that I can do this sort of analysis in a couple of lines.

      • TerryMN
        Posted May 15, 2012 at 9:55 AM | Permalink

        Typo in the script comment, 1580 sb 1850. (Pedantic, but bad doco has bitten me before).

        Steve: script comment is right. I showed the 1850 on portion to compare to the Sept graphic.

        • TerryMN
          Posted May 15, 2012 at 10:11 AM | Permalink

          Ah – should have phrased it as a question…. 🙂

      • Posted May 15, 2012 at 1:29 PM | Permalink

        Steve, have you seen the R package ‘dplR’? I don’t know much about dendrochronology, but it has the nicest wavelet-analysis display I’ve seen in any package. Try out its ‘morlet’ and ‘waveplot’ functions.

        Steve: I heard about this recently and haven’t parsed it. A number of its functions are utilities that I had independently written.

      • daved46
        Posted May 15, 2012 at 5:55 PM | Permalink

        Re: Steve McIntyre (May 15 09:51),

        I’m not quite sure if you mean you plotted it from the measurement data, though that would seem to be the case since you talk about the advantages of R. If so, could you make a grass plot of the data? They’re really handy for understanding the data at a glance.

  9. Anthony Watts
    Posted May 15, 2012 at 10:32 AM | Permalink

    This post has some interesting points about the ability of Bristlecone pines to record temperature extremes.

    Why Are Dendro Shafts So Straight?


    We see that we have no lower bounds (or upper, for that matter) on the regional temps. So, the sensitivity to temps is constrained within this narrow margin of time and temps. Even if all of the other factors going into tree growth were quantified to such an exacting purpose as to be able to pick up on a few 1/10ths of a degree (they are not), the physical limitations of growth mean we would see a flattening in the plotting of temperatures. No extremes could be plotted because the trees are incapable of divining such a signal.

    I thought it worth bringing to your attention.

    • Willis Eschenbach
      Posted May 15, 2012 at 12:54 PM | Permalink

      Anthony, I have a different (or perhaps an additional) explanation.

      The results like the Hockeystick have a straight shaft because they have been selected by comparing them to the modern (rising) temperatures. Those with rising temperatures in the modern era are kept, and the rest are discarded for not being “sensitive to temperature” or some such rubric.

      As a result, they all agree in the modern time period (rising), but have random variations in the earlier time periods.

      Now, consider what happens when you average them. The recent rising part is common to all of them and is retained in the average, while in the earlier sections, the random variations tend to all cancel each other out.

      Result? Hockeystick, straight shaft, rising blade …
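The mechanism Willis describes is easy to demonstrate with a toy simulation (a sketch of the selection effect, not anyone’s actual code): start from pure white noise with no signal at all, keep only the series whose “modern” segment happens to run high, and average the survivors.

```python
import random
import statistics

random.seed(0)

def screened_average(n_series=200, n_early=100, n_modern=30, keep_frac=0.1):
    # Pseudo-proxies: pure white noise, no climate signal anywhere.
    series = [[random.gauss(0, 1) for _ in range(n_early + n_modern)]
              for _ in range(n_series)]
    # "Screen" on the mean of the modern segment (a crude stand-in for
    # correlating against rising instrumental temperature).
    series.sort(key=lambda s: statistics.mean(s[n_early:]), reverse=True)
    kept = series[:int(n_series * keep_frac)]
    # Average the survivors year by year.
    return [statistics.mean(s[i] for s in kept)
            for i in range(n_early + n_modern)]

avg = screened_average()
shaft = statistics.mean(avg[:100])   # unscreened period: averages toward zero
blade = statistics.mean(avg[100:])   # screened period: spuriously elevated
```

The screened period inherits the selection bias while the earlier random variation cancels in the average: a flat shaft and a rising blade, from data containing no signal whatsoever.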


      • Steven Mosher
        Posted May 15, 2012 at 1:10 PM | Permalink

        Yup. It must of necessity reduce the variance in the shaft

        • POUNCER
          Posted May 15, 2012 at 8:58 PM | Permalink

          snip – blog policies discourage trying to solve the big picture in one paragraph

        • Posted May 16, 2012 at 5:22 PM | Permalink

          Hi Willis, Steve. This is exactly the same point I made in a lecture on AGW last fall. Can’t remember if I came up with it myself or stole it from someone. But it is pretty hard to miss once you see what was done. You simply can’t select data based on matching the modern data and expect all the previous 900 years (or whatever) to reflect reality.

        • Steven Mosher
          Posted May 17, 2012 at 1:02 AM | Permalink


          I would not say that it doesn’t reflect reality. Rather, you have a reduced-variance version of reality. This is widely, but not loudly, acknowledged in the field. Personally, I don’t see why folks don’t run the numbers for all cores (see what you get) and then run the numbers for cores based on selection criteria.

          The choice of which cores to use seems to me to be the number one sensitivity question you would ask.

          If one finds that a huge portion of cores do not “express” a signal, then you have some arguments to make:

          1. maybe dendro isn’t all it’s cracked up to be
          2. maybe you can’t select trees that are temperature sensitive merely by inspection on site

          But as long as the practice is to screen cores based on correlation, this fundamental question doesn’t get asked. I like questions. Questions are good. Run the test with and without screening, and then you get to ask a fun question. And you have numbers.

          Its not like somebody is going to go blind if you do the test.

        • bernie1815
          Posted May 17, 2012 at 6:35 AM | Permalink

          Steve, I can see the shaft being flat, but why a reduced variance? If they are random, then wouldn’t the variance actually increase dramatically relative to the blade portion of the stick?

      • Posted May 15, 2012 at 2:37 PM | Permalink

        Recently I’ve been graphing the CRUTEM NH and SH data and I’ve noticed the same thing. The Annual graphs look like temperature globally rarely varies by more than .5C from year to year, while temperatures in the 1800s fluctuate by 1C or more in a year or two. And looking at individual months shows even wilder fluctuations.

        Maybe climate was more chaotic in the past. Or maybe modern temperatures are in synch because the thermometers are all in airports and cities with giant UHI bubbles that keep temperatures from fluctuating too much.


        • Steven Mosher
          Posted Jun 10, 2012 at 12:23 PM | Permalink

          1. You need to compare the actual number of stations in your comparison. Prior to 1900 the SH is lightly sampled in CRU datasets. A smaller sample will lead to larger swings on a monthly and annual basis. You are seeing an artifact of sampling rather than the climate. It’s easy to prove this to yourself with some simple simulations.

          2. The number of stations in large cities is rather small.

          3. You cannot just average CRU3 data. You have to take account of the common anomaly period and do spatial averaging.
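The sampling artifact in point 1 can indeed be shown with a few lines (a sketch under the simplifying assumption of independent station anomalies, which understates real spatial correlation):

```python
import random
import statistics

random.seed(1)

def annual_swing_sd(n_stations, n_years=500):
    # Each simulated "year" is the average anomaly over n independent stations.
    yearly = [statistics.mean(random.gauss(0, 1) for _ in range(n_stations))
              for _ in range(n_years)]
    return statistics.stdev(yearly)

sparse = annual_swing_sd(5)     # lightly sampled network (early SH)
dense = annual_swing_sd(100)    # densely sampled network (modern)
# The sparse network's year-to-year swings are several times larger,
# even though the underlying "climate" is identical in both runs.
```

The standard error of an n-station mean scales as 1/sqrt(n), so a 5-station network swings roughly 4-5 times more than a 100-station one with no change in the climate at all.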

      • Steve John
        Posted May 15, 2012 at 3:16 PM | Permalink

        Lucia did an interesting analysis of this phenomenon in 2009 in a post titled “Tricking yourself into cherry picking”:

        Steve: that’s a good exposition. The same phenomenon has been reported (more or less independently) by Jeff Id, Lubos Motl, David Stockwell, myself. It’s instantly understood on the critical blogs, but seems to baffle climate scientists.

        • Kenneth Fritsch
          Posted May 15, 2012 at 5:12 PM | Permalink

          That cherry picking in this manner can baffle otherwise intelligent people became very clear to me when I used to participate in blogs on investing and investing strategies. At one blog we had an economist who was an excellent teacher (he went by the name Datasnooper), pointing out in great detail and by example why an investment strategy that was devised with in-sample data could well fail with out-of-sample data (i.e. when real money was ventured on it). Some of the strategies being proposed with in-sample data even had obviously nonsensical criteria, but since they performed well on past data, some very intelligent people would question Datasnooper’s teachings and even attempt to refute them. They did not even bother to understand that, of those investment strategies devised from in-sample data, only the best would survive to be talked about and proposed for use, and that we had lost all those losing strategies needed to do the proper statistics.

          Some of those otherwise intelligent people were scientists.
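Datasnooper’s lesson can be reproduced in a few lines (a sketch, assuming “strategies” that are pure noise with zero true edge): select the best in-sample performer out of many, and its out-of-sample performance collapses back toward zero.

```python
import random
import statistics

random.seed(2)

def snoop(n_strategies=500, n_in=50, n_out=50):
    # Each "strategy" has zero true edge: its daily returns are pure noise.
    returns = [[random.gauss(0, 1) for _ in range(n_in + n_out)]
               for _ in range(n_strategies)]
    # Pick the strategy that looks best in-sample...
    best = max(returns, key=lambda r: statistics.mean(r[:n_in]))
    # ...and compare its in-sample and out-of-sample means.
    return statistics.mean(best[:n_in]), statistics.mean(best[n_in:])

in_sample, out_of_sample = snoop()
```

The in-sample winner looks impressive only because 499 losing draws were silently discarded; out of sample it reverts to its true (zero) edge, which is exactly the statistic the discarded strategies were needed to compute.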

        • Jeff Alberts
          Posted May 16, 2012 at 9:58 AM | Permalink

          You guys are assuming it was actually bafflement, and not purposeful. The latter seems more and more likely.

      • AJ
        Posted Jul 7, 2013 at 9:16 AM | Permalink


        “Result? Hockeystick, straight shaft, rising blade …”

        Maybe that explains the “Little Bit Pregnant” timeseries?

  10. Kenneth Fritsch
    Posted May 15, 2012 at 10:34 AM | Permalink

    “To Gavin’s credit, in situations like these it’s best to award the benefit of the doubt to the scientists themselves who are describing their own work/motives. However, they do have a high burden of explanation for their methodology.”

    I think if you have looked at enough of the individual proxy series that go into the numerous temperature reconstructions (and without the spaghetti of many proxies and the instrumental record tacked onto the end), one is not surprised by what SteveM revealed in his initial sensitivity test on the Briffa Yamal results or by his most recent test revealed on this thread. Further, putting those observations together with the failure of those doing reconstructions to reveal in any detail their a priori criteria for selection of proxies and proxy elements, one tends to discount the potential hurt feelings of the scientists involved and wants nothing more and nothing less than a reasonable conversation about the selection process.

    Gavin’s attempts to emphasize the personalities and peripheral issues appear to the serious citizen-scientist as a diversion from the real issues.

    • pdtillman
      Posted May 15, 2012 at 11:15 AM | Permalink

      “Smokescreen” is the term I’d use.

  11. Posted May 15, 2012 at 11:40 AM | Permalink

    What’s intriguing to me about un-cherry-picking:
    The results seem to show the temperature sticking
    To values somewhat less than decades before
    Just as in the US and 1934

    It may be that, once you stop throwing out trees
    That catastrophe evaporates by degrees
    As the “unprecedented” temps come back around
    Showing cherry-tree-picking to be less than sound

    But moreover it points to thermometer tweaks
    Which are hard to examine unless data sneaks
    Out from under agreements quite proprietary
    Why does giving up data strike the Team as so scary?

    ===|==============/ Keith DeHavelle

  12. Posted May 15, 2012 at 11:58 AM | Permalink

    A question;
    What happens to the new plots if the ‘XX’ stay-thirsty tree is excluded?

    Is it still disproportionately represented? I guess its lusty voice would be muted in the larger sample.

    Maybe that is the news: to plot Briffa with/without ‘#061’ and the Hantemirov group with/without it…
    Talk is cheap, analysis takes time.
    This would help me (at least) integrate the various aspects of this epic saga.

    Side note; how can we reinforce this exemplary behavior by Hantemirov?
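The with/without comparison suggested above is mechanically simple. A minimal sketch with purely synthetic numbers (106 unremarkable cores plus one late-growth outlier standing in for a YAD061-like series; nothing here is real Yamal data), showing why one anomalous core dominates a 5-core chronology but barely moves a 107-core one:

```python
import numpy as np

rng = np.random.default_rng(5)
n_years = 100
normal_cores = rng.normal(1.0, 0.1, (106, n_years))   # unremarkable ring-width indices
outlier = np.ones(n_years)
outlier[-20:] = np.linspace(1.0, 5.0, 20)             # one fast-growing late outlier

def chronology(cores):
    # simple mean-value chronology across cores, year by year
    return np.mean(cores, axis=0)

# impact of the outlier on the final chronology value, large vs small sample
big = np.vstack([normal_cores, outlier])              # 107 cores
small = np.vstack([normal_cores[:4], outlier])        # 5 cores
impact_big = chronology(big)[-1] - chronology(normal_cores)[-1]
impact_small = chronology(small)[-1] - chronology(normal_cores[:4])[-1]
```

With 107 cores the outlier shifts the final value by roughly 4/107; with 5 cores, by roughly 4/5 of its excursion.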


    • Steve McIntyre
      Posted May 15, 2012 at 12:04 PM | Permalink

      Re: almostcertainly (May 15 11:58),

      YAD061 is included, but it (and a few others) no longer overwhelm. There are 107 cores in 1996 instead of 5 (or 10 in CRU’s “all the data” version).

      • Jeff Alberts
        Posted May 16, 2012 at 1:41 PM | Permalink

        If I understand things correctly, the problem wasn’t that YAD061 was included in the first place, but as Steve indicates, it was given undue influence. None of the other cores in that set had an apparent HS shape, as I recall. So, when proper methods are applied, the one outlier won’t monumentally skew the results.

        Same goes with Mann’s bristlecones, as far as weighting goes. But BCs shouldn’t have been included at all due to growth problems.

  13. J Solters
    Posted May 15, 2012 at 12:31 PM | Permalink

    Morebrocado is extremely difficult to follow because of sentence length and the parentheticals interjected into his sentences; that writing technique never goes unnoticed. Substantively, he concludes that current peer review is OK and that Gavin et al should be given the benefit of the doubt because they are in the business as a career. He is wrong on both counts.

    Simply put, if the studies under review can’t be replicated because data or code are unavailable, the reviewer should pursue that issue to final resolution, or reject the paper. If the publisher nevertheless proceeds, the fact that the results can’t be replicated should appear at the very top of the publication.

    No party to climate science (or any other science) should get a ‘free benefit of the doubt’ pass from anybody. Proponents of AGW and those in opposition should be treated exactly the same with respect to motives, volition, career goals or any other tangible or intangible measure in any scientific debate, especially where public policy applications appear. If alleged facts can’t be proven or conclusions supported without complete access to relevant data, no public policy application can be considered. Gavin et al cannot be given the benefit of the doubt in this vital arena. Waving the Peer Review flag is meaningless here.

  14. Andy
    Posted May 15, 2012 at 1:20 PM | Permalink

    It is still amazing that Gavin and co have all the grants, all the money and all the time and still look like chumps.

    Their behaviour suggests to me that any paper with their names on it goes in the bin.

  15. Tom Anderson
    Posted May 15, 2012 at 1:21 PM | Permalink

    It may be clear to others, but is this new data the same data that was sent to CRU (Melvin) back in October 2009? And is this expanded Yamal data different from what Briffa would have had prior to Briffa et al 2008? If yes, how is it different? Or maybe you can’t answer that, since you don’t know (yet) exactly what Briffa had. Thanks in advance for your reply.

    Steve: Hantemirov sent this to them in Oct 2009. To my knowledge, they didn’t have it before. It is a big expansion of the Yamal “living” dataset in B2000 and B2008. It presently appears to me that CRU’s data expansion in October 2009 was primarily from data that they already had on hand from the 1990s (JAH, POR), plus their condensing of the Schweingruber KHAD data. Hantemirov also sent them a few additional YAD trees, which they added. I’ll try to summarize this some time.

  16. Posted May 15, 2012 at 2:11 PM | Permalink

    Given that the 2008 CRU archive is known, and the inputs are known, is it possible to reverse engineer the Briffa procedure (using a brute-force search working through all permutations, or some kind of Monte Carlo search)? Is the search space too large?

    Steve: the Briffa procedure is closely approximated by a simple negative exponential fit. They’ve done work on trying to estimate temperature and growth effects concurrently, but the effect on a chronology is generally not that large relative to fitting growth first. It’s an interesting statistical problem and not all that easy. I think that the problem could be better approached with completely different tools, but for these sorts of analyses, that introduces an irrelevant difference.
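Since the post's "RCS-style calculation" is the recurring method here, a minimal sketch of the basic RCS idea may help: pool all cores by cambial age to form one regional growth curve, divide each core by it, and average the resulting dimensionless indices by calendar year. All data and parameters below are synthetic and illustrative, not the Hantemirov or Briffa data or CRU's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 synthetic cores, each following a shared
# negative-exponential juvenile growth trend plus noise, germinating
# in random calendar years.
n_cores, span = 30, 150
ages = np.arange(1, span + 1)
growth = 1.2 * np.exp(-0.02 * ages) + 0.3             # biological growth trend
starts = rng.integers(1700, 1850, n_cores)            # germination years
cores = [(s, growth + rng.normal(0, 0.05, span)) for s in starts]

# RCS step 1: a single "regional curve" -- the mean ring width at each
# cambial age, pooled across all cores
regional = np.mean([w for _, w in cores], axis=0)

# RCS step 2: divide each core by the regional curve (a dimensionless
# index), then average the available indices in each calendar year
years = np.arange(starts.min(), starts.max() + span)
total = np.zeros(years.size)
count = np.zeros(years.size)
for s, w in cores:
    pos = (s - years[0]) + np.arange(span)
    total[pos] += w / regional
    count[pos] += 1
chronology = total / np.maximum(count, 1)
```

With no climate signal in the synthetic data, well-replicated years come out near an index of 1, which is the point: departures from 1 in real data are what the chronology is meant to capture.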

    • Posted May 16, 2012 at 11:00 AM | Permalink

      It’s in fact a very difficult statistical problem. I have been working on it for two years and have developed an algorithm that will separate environmental and non-environmental signals during the detrending process. It will take some time before it appears in the literature though.

      • RomanM
        Posted May 16, 2012 at 11:12 AM | Permalink

        What exactly do you mean by a non-environmental signal? Can you provide some examples of what you mean by each of these types?

        • Posted May 16, 2012 at 11:42 AM | Permalink

          The non-environmental signal is that caused by long-term changes in tree age or size. The environmental signal is typically climatic, but it doesn’t have to be; it could also be biogeochemical, e.g. from N or CO2 fertilization, or, more likely, their combined effects with climate.

      • Steven Mosher
        Posted May 16, 2012 at 12:18 PM | Permalink


        Was this the poster from AGU? I can’t recall if you were an author on it. I did pass it on to Roman and Craig.

        The link used to be live but is dead now.

        • Posted May 16, 2012 at 5:11 PM | Permalink

          Yes, that was mine. I guess the AGU has disabled links to the e-posters unfortunately. However I’ve revised the poster a lot since then anyway.

        • Steven Mosher
          Posted May 16, 2012 at 8:00 PM | Permalink

          Nice, Jim. As I recall it was a thought-provoking poster, and Craig Loehle also found it interesting. Looking forward to your next version.

        • Posted May 18, 2012 at 10:27 AM | Permalink

          Thanks for the compliments. I’ll put it up on my web site even if it is out of date.

        • Posted May 21, 2012 at 12:23 PM | Permalink

          Steven, see here:

          Click to access AGU2011.pdf

        • Steven Mosher
          Posted May 21, 2012 at 12:56 PM | Permalink

          Thanks, Jim.

          Are you at UC Davis? I help (as best I am able, mostly testing) Dr. Hijmans with his spatial package ‘raster’; he’s at Davis. I’m currently working with a couple of researchers to give them tools to do their own temperature reconstructions for paleo studies, basically so that they don’t have to rely on CRU gridded temps. There are some areas where there is more local data than CRU uses, and Berkeley/Nick Stokes least-squares methods can get you a better local temperature reconstruction than a simple 5×5 grid. Let me know if you’re interested; I have a couple of projects needing beta testers.

  17. Bebben
    Posted May 15, 2012 at 3:01 PM | Permalink

    “Esper et al 2009 has almost unintelligible graphic and nothing is archived. (It was received on Jan 28, 2009 and accepted on Jan 30, 2009.)”

    Does that mean that non-replicable + non-insta reconstruction = insta-peer-review?


    • Jimmy Haigh
      Posted May 15, 2012 at 11:16 PM | Permalink

      Jan 28th was a Wednesday. I wondered what took them so long to accept it but then I realised that they probably didn’t want to spoil their weekend.

  18. Barclay E MacDonald
    Posted May 15, 2012 at 3:26 PM | Permalink

    It’s like there’s a “Connect the dots” game going on, but at the same time, it’s an M.C. Escher drawing or some optical device…
    “A ha! I have found a rabbit! No, you idiot… You’re staring right at a duck”.

    But this “game” is no game: one side has consistently, and over a long period of time, refused to disclose its underlying data and analysis to the other, and the game is being played out against very serious consequences and costs.

  19. Posted May 15, 2012 at 5:10 PM | Permalink

    I wondered what else might correlate with tree ring growth. So I took the Esper 2009 plot at the bottom of Steve’s post and overlaid detrended sunspot number averaged over the solar cycle length against it.

    The result is here

  20. Biddyb
    Posted May 15, 2012 at 5:40 PM | Permalink

    Is it just me, or why is it that Steve can knock out an analysis in short order upon receipt of the data, yet Briffa didn’t have time to do it?

    What was it?
    Or was it that the result was inconvenient?

    • John M
      Posted May 15, 2012 at 6:05 PM | Permalink


      I guess you missed the Gavscuse…it takes time to get the “right” answer.

      Remember Orson Welles?

      We will sell no line before its time.

  21. LearDog
    Posted May 15, 2012 at 6:19 PM | Permalink

    I am just sitting and reading and almost giggling to myself. How on earth are they ever going to get out of a discussion of a) suppression of adverse results and b) cherry picking, Gavin notwithstanding?

    The journal editors ought to be asking some embarrassing questions just about now – the whole thing is going to unravel.

    Well done, sir! What a tenacious scientist you are. Incredible.

  22. DocMartyn
    Posted May 15, 2012 at 7:41 PM | Permalink

    Steve, this is a paper on the temperature, precipitation, snowfall and much else for the Yamalo-Nenets AO, Siberia.

    It was written for:

    “The IPY project EALAT Research (Reindeer Herders Vulnerability Network Study: Reindeer pastoralism in a changing climate), funded by The Research Council of Norway and coordinated by Sami University College, and the Arctic Council project EALAT Information, funded by The Nordic Council of Ministers and coordinated by the International Centre for Reindeer Husbandry.”

    Click to access Ealat_Yamal_climaterep_dvs-1.pdf

    The figures are mind blowing, especially Figure 7.

    • David Anderson
      Posted May 15, 2012 at 10:35 PM | Permalink

      Thanks, good find. I created a graphic containing the map and temperature/rain data for each of the 4 stations. From memory, the Yamal site sits just east of Salekhard and NW of Nadym, within a river system at the bottom of the peninsula on its west side.

      They all show a temperature dip around 1970 with peaks/rises on either side. Present temperatures are about the same as in 1940.

    • bernie1815
      Posted May 16, 2012 at 9:11 AM | Permalink

      Excellent find. The paper is very interesting even if it is not peer reviewed!!!
      It suggests that any tree ring calibration effort could be sensitive to the weather station location(s) and time period chosen.
      It also clearly suggests that temperature increases in the Arctic region are not dramatically different from elsewhere.

  23. Craig Loehle
    Posted May 15, 2012 at 7:52 PM | Permalink

    The process of analyzing tree ring data has been compared here and elsewhere to iterative experimental studies. Let’s say one is trying to synthesize some compound. In this process, mistakes can be made, experiments can be contaminated, etc. One keeps trying things until either the compound is synthesized or one gives up and concludes that maybe it isn’t possible. But what in the world does it mean to tinker with dendro data? Where is one justified in rejecting any of the data (except in the case of stripbark trees which are clearly physically damaged, but which they won’t throw out)? The only sign of a mistake or problem is that you don’t get the answer you like!! I have read lots of this stuff and never have seen an objective reason given for keeping or rejecting any set of trees. How about: “I’m rejecting this set of patients because they did not respond properly to the medicine”? I hope no one thinks that is ok.

    • Posted May 15, 2012 at 10:06 PM | Permalink

      Exactly. If morebrocato’s quote from RC is accurate, then they really are clueless about what they are saying and how science should work. Comparing the analysis of real data to tinkering with and improving a climate model over time is not at all “analogous.” With real data you decide up front what your rejection criteria are and then live with the answer you get. Or you have an extremely good reason, explained in careful detail without any obfuscation, why you incorporated new rejection criteria after the fact.

    • Brandon Shollenberger
      Posted May 16, 2012 at 4:34 AM | Permalink

      Craig Loehle, I’ve never seen it done, but I can think of objective ways of determining what data to use. To me, the first step should be to examine the records of any given area/proxy type that would normally get combined, to see how well they correlate with each other.* If one “record” shows something the other eleven don’t, that one record gets tossed out. It’s possible only one record would pick up some signal, but it’s far more likely to be spurious.

      Once you’ve applied that filter, you’d then look at the remaining data in that set and see what signal, or signals, were found.* Records with similar signals would be grouped together and combined (probably by linear averaging). Each signal would then be pulled out as a separate series, at which point a similar process could be applied over a larger region, repeating until you’ve reached the global level.

      Some details still need to be worked out, such as how to handle series which contribute to multiple signals (enough double-counted records combined could create spurious correlation between their respective series) and what sort of spatial weighting may be needed, but it would give a fairly simple method for looking for patterns without being biased toward an expected result.

      I don’t think you’ll see anyone push for that sort of approach though. It’d require a better understanding of what’s being done than is common with temperature reconstructions, and it would most likely show little to no significant results.

      *Of course, there would still be some subjectivity in things like what level of correlation is required, how long a period correlation needs to be found over, etc. However, each decision regarding those could be explained. They could even be tested, as one could rerun the process, changing any decisions one is uncertain of.
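The first filtering step described above, screening each record by its agreement with the other records in the same group before ever looking at temperature, can be sketched directly. This is only an illustration of the idea, not anyone's published procedure, and the `min_r = 0.3` threshold is an arbitrary choice of exactly the kind footnoted as needing justification:

```python
import numpy as np

def screen_records(records, min_r=0.3):
    """Drop any record whose mean correlation with the other records
    in its group falls below min_r. (min_r = 0.3 is an illustrative
    threshold, not a value from any published study.)"""
    records = np.asarray(records)
    R = np.corrcoef(records)                        # pairwise correlations
    mean_r = (R.sum(axis=1) - 1.0) / (len(records) - 1)
    keep = mean_r >= min_r
    return records[keep], keep

# 11 synthetic records sharing a common signal, plus one unrelated record
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
common = np.sin(2 * np.pi * t)
group = [common + rng.normal(0, 0.5, t.size) for _ in range(11)]
group.append(rng.normal(0, 1.0, t.size))            # noise-only "record"
kept, mask = screen_records(group)
```

The lone record that shows something the other eleven don't is the one dropped; note that temperature never enters the screen.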

      • Kenneth Fritsch
        Posted May 16, 2012 at 8:49 AM | Permalink

        Brandon, I think you are talking about something different from what the primary objections entail: objections to a selection of proxies that can lead to selection bias. When Craig says, “The only sign of a mistake or problem is that you don’t get the answer you like!!”, I think he hits the difference squarely on the head. You appear to me to be talking about strategies for relating, as an example, tree ring growth to temperature, whereas Craig is pointing to the fact that that relationship is not well understood and that the selection process gets biased by peeking at the final result.

        The obvious best way to proceed with trees as thermometers would be to understand the physical aspects of the tree response and all interferences of that response. Testing that hypothesis would fall under the tests you describe, if I understand them correctly. The approach at that stage of the study that would not be proper is to assume that there has to be a discernible signal of temperature in a proxy type provided the proper individual proxies are found through an exhaustive search and testing procedure. The Bonferroni adjustment required under those conditions would no doubt be either huge or impossible to track.

        I would suppose that, with sufficient data and testing, if one could demonstrate that a given tree species, at a given location or locations and during given months of the year, responded to measured temperatures in a predictable manner over enough years to also determine the tree’s responses to other climate variables, one might be able to propose a tree thermometer without understanding in detail the physical mechanisms of the response. Even that process is fraught with selection biases unless the necessary precautions are taken and, of course, it would be rendered useless if, at the next stage of producing reconstructions, the use of a particular proxy were based on peeking at how it responded to temperature.

        • Brandon Shollenberger
          Posted May 16, 2012 at 9:13 AM | Permalink

          Kenneth Fritsch, I am discussing something different. I was responding to him saying he had “never have seen an objective reason given for keeping or rejecting any set of trees” by showing a way one could give objective reasons. Of course, since what I proposed is basically the exact opposite of the current methods used in multiproxy studies, I’m discussing something different than the normal criticisms.

          My proposal is to examine the data without any expectations, see if a signal is found, and if so, compare that to the modern temperature record. This frees us from needing to focus on all the physical aspects of the records. For any records where no signal is found, we don’t need to know anything. For the rest, we can always examine them in more detail once they’ve been picked out.

          Mind you, I assume we’d only look at data which has some (plausible) relation to temperature, so it’s not like random data sets would be used. This means if we do find a signal which correlates with temperature, we can be reasonably confident that’s what it actually tracks.

          We’d still need to account for the possibility of spurious correlations, but that’s a fairly minor issue. Remember, to get used, a signal would have to be found across multiple records from the same area. The odds of that happening spuriously, then the resulting average spuriously correlating with another signal, are not high.

        • daved46
          Posted May 16, 2012 at 11:22 AM | Permalink

          Re: Kenneth Fritsch (May 16 08:49),

          One thing we sometimes forget is that there was actually an objective criterion postulated at the very beginning of the dendro-thermometry tale. It was recognized that two of the primary causes of tree-ring width are precipitation and temperature. So scientists looked for trees which would not be limited in their growth by precipitation, and an obvious choice was trees growing at the timberline in mountainous areas. This is what led to looking at bristlecone pines. Unfortunately the best such records were the Graybill bristlecone piness. I say unfortunate since Graybill and Idso were looking for trees which might show a CO2 fertilization effect. They reasoned that stripbark trees would be good choices, and since most very old bristlecones are stripbarked do to the vicissitudes of time, that’s what the collected. But a stripbarked pine will have different characteristics than a non-stripbarked one. Mann et al. should have known better than to use them, but were too excited by their showing so clearly the sort of temperature “signal” they thought should be there that they let desire overcome science. The rest is history.

          BTW, I mention this only because some reading this thread may not be aware of some of the back-story and think that the team didn’t have ANY selection criteria rather than that they circumvented their original selection criteria.

          Dave Dardinger (hate having to use my wordpress handle rather than my name, but I don’t know how to circumvent it.)

        • daved46
          Posted May 16, 2012 at 11:38 AM | Permalink

          Re: Kenneth Fritsch (May 16 08:49),

          Tsk, tsk! I usually pride myself for avoiding typos, but I see at least 3 in a quick reread. Piness for pines, do for due and the for they.

        • Kenneth Fritsch
          Posted May 16, 2012 at 2:28 PM | Permalink

          “My proposal is to examine the data without any expectations, see if a signal is found, and if so, compare that to the modern temperature record. This frees us from needing to focus on all the physical aspects of the records. For any records where no signal is found, we don’t need to know anything. For the rest, we can always examine them in more detail once they’ve been picked out.”

          Brandon, when you say you would use only proxies that had a signal for temperature, how would you determine/define what a signal, or an adequate signal, was? I am sure that with sufficient data I could find a proxy that responded to temperature with a correlation of 0.10, with a probability of 0.001 or smaller of that occurring by chance. Actually Mann (08) used a screening value of r=0.10 that he upped to r=0.13 because of the degrees of freedom lost to autocorrelation. But a correlation of 0.10 means that only 1% of the variation of the proxy response is due to temperature. How well would that proxy response transport to another region or another time with all that interfering noise?

          Further you surely could not look for a single individual proxy that gave what you thought was an acceptable response and use it without having to show that that response could be expected with other individual proxies and with good confidence.
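The arithmetic in Kenneth's point is worth making concrete: a correlation of r = 0.10 leaves r² = 1% of the proxy's variance attributable to temperature. A minimal simulation (all values illustrative, not Mann's actual screening code):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Construct a synthetic "proxy" whose true correlation with a
# standardized temperature series is r = 0.10
r = 0.10
temp = rng.normal(size=n)
proxy = r * temp + np.sqrt(1.0 - r**2) * rng.normal(size=n)

obs_r = np.corrcoef(temp, proxy)[0, 1]
explained = obs_r**2    # r^2: fraction of proxy variance tied to temperature
```

With a large enough sample, an r of 0.10 is easily "statistically significant" while still leaving 99% of the proxy's variation unexplained.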

        • Kenneth Fritsch
          Posted May 16, 2012 at 2:34 PM | Permalink

          Dave46, it is always good to remind readers of the history of the issues. I suspect that dendros know their field well, but I have the view that others use their data with much less caution because, like you said, it fits their purposes.

        • Brandon Shollenberger
          Posted May 16, 2012 at 4:14 PM | Permalink

          Kenneth Fritsch, suppose you had three synthetic data sets (representing proxies), each with ten records. The first has a single record with a hockey stick, the second has three, the third has 10. For the first set, my approach would say no similar signals were found across records, so no information is had. For the second set, it’d find a signal across three records and average those three records. For the third, all ten records get averaged.

          These three “proxies” are then compared to each other. The first says nothing and gets discarded. The second has a noisy hockey stick, and that gets compared to the third’s less noisy hockey stick (averaging multiple records with the same signal attenuates noise). Those two would get grouped together. To test if the hockey stick signal tracked temperatures, a combination of them would be compared to the temperature record.

          That’s the basic idea of my approach. I can’t give many details about what correlation values to use, how to calculate significance levels, or things like that off the top of my head, but those are things one can figure out (one can also test out various values/approaches and compare them).

          To get a spurious correlation, one would need a signal spuriously correlated across multiple records, the average of which would need to be spuriously correlated with a signal found in other proxies. And the combination of those would need to be spuriously correlated to the temperature record.

          As an added benefit of my approach, any signals found in records/proxies/whatever would be readily available for viewing if one wanted to check where they came from (such as for questions of data issues), and it’d be easy to test what impact they have. Imagine how convenient it’d be if you saw a temperature reconstruction, and you could “click a button” to see a list of all proxies (and from there, all records) which contribute to its shape.

          Steve: Brandon, if you have one HS series (bristlecones) and a network of nondescript low-order red noise, and apply CPS re-scaling to the instrumental record in the 20th century, you get a HS back. The white-noise/low-order red-noise series cancel out due to the Central Limit Theorem. Even if you had a fairly large network with a real non-HS signal, Mannian PC methodology would, under common circumstances, return the single HS as the PC1.
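The Central Limit Theorem effect Steve describes can be illustrated with a numerically simple cousin of the CPS situation: screen pure AR(1) red-noise series against a rising 20th-century calibration target, average the survivors, and a hockey stick emerges even though no series contains any signal. The blade comes from the selection step, while the unscreened pre-calibration portions cancel toward a flat shaft. A sketch (the threshold and AR parameter are arbitrary choices for illustration, not anyone's published method):

```python
import numpy as np

rng = np.random.default_rng(3)
n_series, n_years, n_cal = 1000, 600, 100
cal = np.arange(n_years) >= n_years - n_cal           # "instrumental" era
target = np.linspace(0.0, 1.0, n_cal)                 # rising calibration target

# low-order red noise: AR(1) with phi = 0.5, no climate signal at all
shocks = rng.normal(size=(n_series, n_years))
series = np.empty_like(shocks)
series[:, 0] = shocks[:, 0]
for t in range(1, n_years):
    series[:, t] = 0.5 * series[:, t - 1] + shocks[:, t]

# keep only the series that happen to match the calibration trend
r = np.array([np.corrcoef(s[cal], target)[0, 1] for s in series])
composite = series[r > 0.3].mean(axis=0)

# the composite now rises inside the calibration window (a "blade") and
# averages toward a flat "shaft" before it, despite every input being noise
```

The same cancellation is why a single genuine HS series plus many noise series, once re-scaled against the instrumental record, hands the HS shape back.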

        • Brandon Shollenberger
          Posted May 16, 2012 at 6:39 PM | Permalink

          Steve, I agree. Part of the reason I structured that approach the way I did is to avoid the issue you describe. A single series with a hockey stick wouldn’t even be considered because it doesn’t match any other series. If multiple series had a hockey stick, they would get combined, attenuating the noise, but not by anywhere near as much as with Mann’s methods.

          Also, since the series would be compared to each other first, and the temperature record last, you’d avoid the problem of looking for anything with a blade. You’d get just as much noise pulled through for things with a backwards hockey stick (as in, rising data in the ~1200s) as you would for the regular hockey stick.

          Put simply, I’d examine the data to see what signals are in it, then look for an explanation for any signal I found. This is basically the opposite of Mann’s methodology.

        • Kenneth Fritsch
          Posted May 16, 2012 at 6:39 PM | Permalink

          “These three “proxies” are then compared to each other. The first says nothing and gets discarded. The second has a noisy hockey stick, and that gets compared to the third’s less noisy hockey stick (averaging multiple records with the same signal attenuates noise). Those two would get grouped together. To test if the hockey stick signal tracked temperatures, a combination of them would be compared to the temperature record.”

          Brandon, I think you have it backwards here, as surely you first have to look at the proxy’s correlation with temperature in the instrumental period, not look for HS shapes in the proxy series (which, of course, you would need to define in quantitative terms if you were using them as a test criterion). The HS is a term used in a general description of the entire proxy series, and you would never properly anticipate what the proxy would reveal historically until you had calibrated it in the instrumental period.

          So your description of one set with one hockey stick, another with 3 and the other with 10 is misleading, since a proper evaluation would deal with the correlation of the proxy response to temperature, and that would never be a pass/fail; rather, your 10 records would have a range of correlation values. But it is even more complicated than that, since a reasonable correlation over the entire instrumental period could be obtained for a proxy that flew up at the end, like the Yamal example; or for the many proxies that dive down at the end in a case of divergence; or for something like the North American TR PC in Mann (99), which had a spurt of growth before the end of the series, then diverged, and was truncated and refit at the series end. A correlation responding to the higher-frequency parts of the series used in calibration could be reasonably high and still miss the mismatched trends in the proxy series and the temperature record.

          After you have looked at a large number of proxies that have been used in published temperature reconstructions, and looked at those series without other series on the same graph interfering with your view and without the instrumental record attached at the end to confuse you about how the proxy performs at the end of the series, you will see that most proxies yield rather unspectacular series that meander their way through the series time period, much like what SteveM has shown here. Some others can show upward trends at the series end, while others show divergence at the end.

          One of the most unsatisfying aspects of some proxy series is that the time period of the series ends substantially before the instrumental record does. When an author tacks the instrumental record onto the end of the series, unknowing observers can confuse one with the other, or think that the instrumental record and proxy are equally valid thermometers and that the proxy, had it existed all the way to the end of the instrumental record, would have followed the same path.

          I think being aware of the problems that current proxies are viewed as having could help one who was interested in doing a proper job of selecting proxies for reconstructions and properly presenting the results.

        • Brandon Shollenberger
          Posted May 16, 2012 at 7:01 PM | Permalink

          Kenneth Fritsch, I suspect you may be getting confused because the approach I describe is so different from what is normally done. For example, you say I “first have to look at the proxy correlation with temperature in the instrumental period.” That couldn’t be further from the truth. What you describe is the sort of approach used by Mann. It takes a known signal and looks for that in the data. Put bluntly, it is data mining.

          What I describe is the inverse of that. First, I’d look for signals in the data. It may be a signal I find is a hockey stick signal, but it could just as easily be some other signal (I’d actually expect to find many different signals). Then, I’d compare the “list” of signals I had found to other things.

          If I were working at a regional level, I’d compare the signals found in one proxy to the signals found in other proxies in that region. Signals which aren’t found across multiple proxies would be dismissed as local signals/noise. Signals which are found in multiple proxies would be considered as regional signals (taking into consideration the possibility of spurious correlations, of course).

          Once I had a set of regional signals, I’d then compare those to the temperature record. If there was good correlation, I’d conclude I had found a regional temperature signal (again, with reasonable caveats). If there wasn’t, I’d say I found no information about the regional temperatures.

          To put it simply, I’d look for a signal, then see if that matched the temperature record. That avoids biasing the results by having preconceived expectations.

        • MrPete
          Posted May 16, 2012 at 7:24 PM | Permalink

          Brandon, it will pay to reflect on what Kenneth said:

          The obvious best way to proceed with trees as thermometers would be to understand the physical aspects of the tree response and all interferences of that response.

          Without a physical basis, there’s no objective reason to claim that “good” proxy data is actually a temp proxy.

          Go review the Proxy->Almagre category here on CA. We found by inspection what ought to be obvious: a strip-bark BCP (Bristle Cone Pine) will have a recent growth pulse… simply because it has been fighting for its life for the last 150 years or so. The ones that don’t make it are not sampled because they’re dead. The ones not stripped were ignored by dendros because they had no “interesting” signal.

        • Brandon Shollenberger
          Posted May 16, 2012 at 8:12 PM | Permalink

          MrPete, I’m well aware of that issue. Part of the benefit of my approach is it clearly delineates what signals are found in what data, as well as how much impact those signals have. This means if one wants to examine the data for issues like you describe, it’s easy to do. It’s also easy to see what would happen if you exclude particular data.

        • MrPete
          Posted May 16, 2012 at 8:52 PM | Permalink

          Permit me to disambiguate a bit. Brandon said:

          Suppose you had three synthetic data sets (representing proxies), each with ten records. The first has a single record with a hockey stick, the second has three, the third has 10. For the first set, my approach would say no similar signals were found across records, so no information is had. For the second set, it’d find a signal across three records and average those three records. For the third, all ten records get averaged.

          and also said

          Put simply, I’d examine the data to see what signals are in it, then look for an explanation for any signal I found. This is basically the opposite of Mann’s methodology.

          I think that saying “hockey stick” in the example is a distraction. If I understand correctly, what Brandon is proposing is a search for correlated “wave forms” across multiple proxies, that are statistically different from noise, no matter what their shape. By this logic, a flat line can be “signal.”

          This is an entirely fair procedure. In fact, it is used in a variety of real-world applications.

          For example, the GPS unit in your cell phone attains its incredible sensitivity by taking ~1000 samples of what could easily be an almost-all-noise environment (such as in a parking garage or urban canyon — where traditional GPS does not work.) Sophisticated signal processing finds correlations and removes noise. What is left is most likely a real signal. Further processing proves whether or not that’s the case.

          The point is: the data samples can be processed to find correlations (signal) and noise removed, without regard to the nature of the signal.

          Another example: temporal comparison of subsequent video frames enables emphasis of signal (correlation) and removal of noise. See here, and the whole example page here. Finally here is a brief tutorial showing how this works in real life.
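
The correlation-screening idea in the two comments above can be sketched in a few lines. This is a minimal illustration with synthetic data, not anyone's actual method; the 0.1 threshold and the record counts are arbitrary assumptions for the example:

```python
import numpy as np

def screen_and_composite(records, threshold=0.1):
    """Keep only records whose mean correlation with the others exceeds
    a threshold (an arbitrary choice here), then average the keepers.
    Returns (composite, keep_mask); composite is None if no coherent
    group is found."""
    n = len(records)
    R = np.corrcoef(records)                  # pairwise correlations
    mean_r = (R.sum(axis=1) - 1.0) / (n - 1)  # mean corr with the others
    keep = mean_r > threshold
    if keep.sum() < 2:
        return None, keep                     # no shared signal found
    return records[keep].mean(axis=0), keep

# Synthetic illustration: a shared waveform present in 3 of 10 noisy records.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
signal = np.sin(t)
records = rng.normal(0, 1, (10, t.size))
records[:3] += 3 * signal                     # only these carry the signal

composite, keep = screen_and_composite(records)
```

As in Brandon's three-dataset example, the screen finds the records sharing a signal without caring what shape the signal is; a flat line would pass just as well.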

        • Brandon Shollenberger
          Posted May 16, 2012 at 9:51 PM | Permalink

          MrPete, you’re right about that being a distraction. I needed an example of a pattern one could find in a signal, and that was the first that came to mind. If I had thought about it, I’d have probably used a different example to try to avoid confusion.

        • Kenneth Fritsch
          Posted May 17, 2012 at 10:21 AM | Permalink

          “The point is: the data samples can be processed to find correlations (signal) and noise removed, without regard to the nature of the signal.”

          Let us take this example of a signal in a proxy series over a long time period that correlates with signals in other proxy series. I have viewed a number of proxy series of the same type that show some coherence over part of the series. The problem is in determining whether that signal is temperature, something else, or a combination of temperature and something else. Another problem I have seen with these proxy “signals” is that while several proxies will respond to an event (sometimes an event, such as a volcanic eruption, that is well dated from history), i.e. you will see the same blip in all proxies at the same time, the magnitude of those blips (the intensity of the response, if you will) is often very different. That would appear to be a scaling problem with temperature, if indeed it was only temperature that the proxy blips were responding to.

          I think when a cell phone signal is being sampled one knows the origin of that signal while with the proxy signal (temperature) that cannot be assumed.

          Finally, while I am surprised that SteveM has not yet cut off this discussion, I do think that such a discussion can show how complicated selecting proper temperature proxies can be. I should also state here and now that I am not close to being an expert in these matters; but, as with good writing, I know it when I see it without being good at it myself.

          My overall view of a number of published temperature reconstructions is that not much thought went into choosing the proxies that went into the reconstructions, and as a result the authors had to scramble for selection justifications after the fact – and continue to scramble, with some even changing the subject.

        • Brandon Shollenberger
          Posted May 17, 2012 at 1:48 PM | Permalink

          I typed up a response, but it got lost in the aether. Short version: I’m aware of how many confounding factors there are, and it’s because of them that I wouldn’t expect my approach to give much in the way of answers. That’s not a bad sign, as it’s an issue with the data, not the approach. Temperature reconstructions claim to find the right answer, but generally, their conclusions are unjustifiable.

          As for scaling, one expects different magnitudes for signals. I think the best approach would be to scale records/proxies based on the size of the signal, relative to the size of the signal in the records/proxies they’re combined with. Mind you, I haven’t given this issue much thought, so there could be problems with that I haven’t considered.

          And yeah, this is rather off-topic. I didn’t think it’d matter because I expected to just make one comment on the issue, but after this many…

        • Kenneth Fritsch
          Posted May 17, 2012 at 6:36 PM | Permalink

          “As for scaling, one expects different magnitudes for signals. I think the best approach would be to scale records/proxies based on the size of the signal, relative to the size of the signal in the records/proxies they’re combined with. Mind you, I haven’t given this issue much thought, so there could be problems with that I haven’t considered.”

          Brandon, I believe subtracting the mean from the series and dividing by the standard deviation, as in Composite Plus Scaling (CPS), is a common practice in proxy series presentations. The scaling problem to which I refer above remains after the operations I mentioned. My point was that one can find a signal and a reasonable correlation between series, as you suggest looking for, but when one sees an event that all the proxies have obviously responded to (as in a known volcanic eruption) and, after the CPS operation, we still see large variations in the magnitude of responses, there is a problem in calibrating temperature to proxy response – if indeed the signal is mostly temperature response. If you go to the most current thread at CA here you will see a post that includes a comment from Cook noting that he is primarily interested in the differences in proxy responses. My guess is that he is referring to the problems I noted here.

          I do not believe that your reply addresses this issue.
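
The CPS operation Kenneth refers to can be sketched as follows; this is a minimal illustration with synthetic series, not any published implementation, and the amplitudes and noise levels are invented for the example:

```python
import numpy as np

def cps(proxies, target):
    """Composite Plus Scaling sketch: z-score each proxy series,
    average the z-scores, then scale the composite to the mean and
    standard deviation of the target (e.g. instrumental) record."""
    z = np.array([(p - p.mean()) / p.std() for p in proxies])
    comp = z.mean(axis=0)
    comp = (comp - comp.mean()) / comp.std()   # restandardize the composite
    return comp * target.std() + target.mean()

# Synthetic illustration: five proxies responding to one target with
# different amplitudes plus noise -- exactly the magnitude differences
# Kenneth describes, which the z-scoring alone does not reconcile.
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
target = 0.5 * t + rng.normal(0, 0.1, t.size)
proxies = [a * target + rng.normal(0, 1, t.size) for a in (1, 2, 3, 4, 5)]
recon = cps(proxies, target)
```

Note that the composite matches the target's mean and variance by construction; that says nothing about whether the underlying signal was actually temperature, which is Kenneth's point.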

        • Kenneth Fritsch
          Posted May 17, 2012 at 6:46 PM | Permalink

          Brandon here is the exchange to which I referred above:

          I’m sure you agree–the Mann/Jones GRL paper was truly pathetic and should never have been published. I don’t want to be associated with that 2000 year “reconstruction”.
          I am afraid that Mike is defending something that increasingly cannot be defended. He is investing too much personal stuff in this and not letting the science move ahead.”

          Cook goes on to propose a single unifying GRAND climate reconstruction paper by the entire team of “climate scientists” and then states how little he thinks they will learn from it.

          “…Without trying to prejudice this work, but also because of what I almost think I know to be the case, the results of this study will show that we can probably say a fair bit about <100 year extra-tropical NH temperature variability (at least as far as we believe the proxy estimates), but honestly know fuck-all about what the >100 year variability was like with any certainty (i.e. we know with certainty that we know fuck-all).

          Of course, none of what I have proposed has addressed the issue of
          seasonality of response. So what I am suggesting is strictly an
          empirical comparison of published 1000 year NH reconstructions
          because many of the same tree-ring proxies get used in both seasonal
          and annual recons anyway. So all I care about is how the recons
          differ and where they differ most in frequency and time without any
          direct consideration of their TRUE association with observed

        • Brandon Shollenberger
          Posted May 17, 2012 at 11:28 PM | Permalink

          Kenneth Fritsch, I knew exactly what you meant. I was referring to the same issue. You do have to scale proxies to some sort of standardized units before anything else, but that’s not what I was talking about. I was talking about scaling them “relative to the size of the signal” after that, as in a second scaling step.

          Any signal you find will not have the same strength in all records which show it. To address that, my idea is to scale each record with the same signal so the relative amplitudes of their responses would be the same, making them directly comparable. And since the noise would also be rescaled, it wouldn’t introduce biases.

          The idea is that as long as the difference in the series is limited to the amplitude of the signal, you can treat it as a difference in noise. Do that, and there’s no difficulty in handling it. The only time a difference in responses is difficult to handle is if the difference is on the temporal level.
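
A minimal sketch of the amplitude-equalizing step Brandon describes. He does not specify how the amplitude would be estimated; using the regression slope against a first-pass composite is my assumption for the illustration:

```python
import numpy as np

def equalize_amplitudes(records):
    """Estimate each record's amplitude of the shared signal as its
    regression slope against a first-pass composite, then divide that
    amplitude out so the records become directly comparable.  The
    noise in each record is rescaled by the same factor."""
    first_pass = records.mean(axis=0)
    slopes = [np.polyfit(first_pass, r, 1)[0] for r in records]
    return np.array([r / s for r, s in zip(records, slopes)])

# Synthetic illustration: one signal at three different amplitudes.
rng = np.random.default_rng(2)
t = np.linspace(0, 10, 500)
signal = np.sin(t)
records = np.array([a * signal + rng.normal(0, 0.05, t.size)
                    for a in (1.0, 2.0, 4.0)])
out = equalize_amplitudes(records)
```

After the division, the three records have essentially the same amplitude; what it cannot fix, as the comment says, is a response difference that varies over time.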

        • Kenneth Fritsch
          Posted May 18, 2012 at 10:12 AM | Permalink

          “Any signal you find will not have the same strength in all records which show it. To address that, my idea is to scale each record with the same signal so the relative amplitudes of their responses would be the same, making them directly comparable. And since the noise would also be rescaled, it wouldn’t introduce biases.”

          I cannot visualize what this process gains information-wise or, ultimately, how it would quantitatively relate proxy response to temperature.

          You have proposed finding a signal in the proxy response but “a” signal could be partially due to temperature and to something else(s) or entirely to something else. Once you find a signal you are obligated to show that it is a temperature response or at least some part temperature response. How would that be accomplished?

          I visualize several series of proxies that show different amplitudes at various times throughout the common time period. How would you make the relative responses the same across the entire time period? I believe some makers of reconstructions attempt to live with these variations in amplitude without dealing directly with them by simply taking an average of all the standardized proxy series with the assumption that the noise is cancelling out and a signal (temperature??) remains.

        • Brandon Shollenberger
          Posted May 18, 2012 at 1:56 PM | Permalink

          Kenneth Fritsch, what you say you cannot visualize is something the step I describe doesn’t even attempt to do. It is merely a processing step used to normalize the records so they’re directly comparable. It is not designed to affect the level of information, nor is it designed to have anything to do with the temperature record.

          The entire point of it is once it’s done, you’re left with just a group of similar signals with different amounts of noise. From there, it’s a simple matter to check correlation with any other record, such as the temperature record. But comparing things is a different step than just normalizing them.

          As for what you are visualizing, if the difference in responses is non-uniform over time, they’re different across the temporal level, which I specifically said would be difficult to address. There are potentially ways of handling that, but that’s getting into far more detail than I’d care for. If nothing else, one could always just not use data which has too dissimilar of responses. In that case, you’d wind up only using data with uniform responses, completely avoiding the problem you describe.

          Anyway, I’ve pretty much given an outline for the entire process I’ve proposed, which I didn’t intend or expect to do, especially not when it’s mostly off-topic. So while there are plenty more details I could discuss, I don’t think it’s worth continuing the discussion here unless someone thinks there is a flaw in something I’ve described (missing steps aren’t flaws, they’re simply undescribed).

          Mind you, if there is an appropriate place for the topic, I’d be happy to discuss it there (even if it is just through e-mails). I honestly believe a systematic analysis of proxies like I describe is the best way for multiproxy studies to be done. Since I cannot do it, I’d be happy to promote the idea however I can.

        • Kenneth Fritsch
          Posted May 18, 2012 at 2:54 PM | Permalink

          In the most current CA thread, in the quote from Jacoby, you can see what Jacoby thinks of the methodology I referred to, where the makers of reconstructions throw together proxies hoping the noise cancels on averaging and the signal prevails. While I would agree that that approach appears to be brute force and avoids the question of differing proxy responses, Jacoby’s counter-approach, as SteveM notes, is a horror story for anyone who knows anything about selection bias.

          “As we progress through the years from one computer medium to another, the unused data may be neglected. Some [researchers] feel that if you gather enough data and n approaches infinity, all noise will cancel out and a true signal will come through. That is not true. I maintain that one should not add data without signal. It only increases error bars and obscures signal.”

        • Brandon Shollenberger
          Posted May 18, 2012 at 6:12 PM | Permalink

          I remember being completely shocked when Steve McIntyre first posted about that comment. I couldn’t believe a scientist would say something like that. The idea was ridiculous enough, but that he’d say it as though it was normal was dumbfounding. You obviously don’t like to add noise without signal, but you can’t use that as an excuse to not archive data. If you do, why should anyone believe it is just noise?

    • DocMartyn
      Posted May 17, 2012 at 5:08 PM | Permalink

      Introduced species have changed North American forests beyond measure. The biggest changes were the introductions of the European earthworm, the European rabbit, fallow and red deer, the wild boar, and the (Californian) wild goat.

  24. Mike Lewis
    Posted May 15, 2012 at 8:33 PM | Permalink

    Thank you for taking time to perform this analysis and presenting it here. I applaud you for showing restraint in responding to your critics, who are now accusing you of slander, defamation, and deliberately attempting to deceive the public.

    snip – overeditorializing

  25. mpaul
    Posted May 15, 2012 at 9:22 PM | Permalink

    Surely there can no longer be any doubt that these proxy reconstructions are extremely sensitive to data selection. On top of that, the data selection criteria used in dendrochronology are ex post facto. So there is an ever-present danger that bias can enter the process. The only way to guard against this is to have totally transparent methods. Everything needs to be in the open and subject to inspection by anyone at any time. As Ross has said on many occasions, calculating climate indexes should be like calculating the consumer price index — totally open and transparent. Instead, we have UEA fighting disclosure every step of the way.

  26. ferd berple
    Posted May 15, 2012 at 9:26 PM | Permalink

    real climate Borehole

    ferd berple says:
    15 May 2012 at 9:12 PM

    Scientific American
    An Epidemic of False Claims


    “The best way to ensure that test results are verified would be for scientists to register their detailed experimental protocols before starting their research and disclose full results and data when the research is done.”

  27. ferd berple
    Posted May 15, 2012 at 9:27 PM | Permalink

    More from the RC borehole
    ferd berple says:
    15 May 2012 at 8:04 PM

    People generally try something, find something wrong, try something else, fix one problem, test something else, deal with whatever comes up next, examine the sensitivities, compare with other methods etc. etc.
    There is a basic rule in statistics that you never do this. You choose your method ahead of time; otherwise the temptation is to simply cherry-pick the methodology until you get the answer you are looking for. No matter how unbiased the researcher, our subconscious directs us toward the results we expect, unless we are very careful in the design of our analysis.

    • Brandon Shollenberger
      Posted May 15, 2012 at 11:19 PM | Permalink

      ferd berple, this isn’t true, at all. Many things in statistics would be impossible to do if you had to decide on your approach beforehand. For a real-life example, I’ve recently been doing stuff with LOWESS smooths (on lucia’s blog). One of the big issues with that topic is picking the smoothing window one will use. If I had to pick one without looking at the data, with no ability to revise a wrong guess, I’d never be able to accomplish anything.

      And what if I later decide LOESS would work better so I could use a polynomial instead of linear fit? Does the fact I first used LOWESS (a simpler method) mean I can’t change to a more effective approach? Of course not!

      One of the largest components of statistics is validation testing, and that’s because it’s so important to test out different approaches. The way you avoid cherry-picking isn’t by arbitrarily limiting yourself to your first guess. It’s by trying multiple approaches, using validation testing and disclosing the impact of your choices.

      As long as you offer the appropriate caveats to whatever you do, you won’t cherry pick by trying multiple approaches. In fact, you’re more likely to cherry-pick if you force yourself to stick with your first guess, since then you may not be able to find errors or alternative approaches.

      So no, the comment you quoted is absolutely right about this issue.

      • HAS
        Posted May 15, 2012 at 11:43 PM | Permalink

        The question is how do you adjust for the fact that you’ve used the data once to pick your methodology and again to run your experiment?

        You should more properly hold out data to keep your induction and deduction separate else you stuff the assumptions behind any statistical testing you do.

        • Brandon Shollenberger
          Posted May 16, 2012 at 12:23 AM | Permalink

          HAS, that depends on what you’re doing. In some cases, it is enough to just calculate something like confidence intervals. If what you choose turns out to be significantly wrong by those, you know not to use it.

          In other cases, you might do something like bootstrapping. In bootstrapping, you randomly select a number of points from your data equal to the total amount of points, allowing repetition. This causes some points to be picked multiple times, and other points not to be picked at all. You then rerun your analysis on this new selection (possibly even adding noise of different types to it) and see what your results are. You repeat that process a number of times, perhaps even in the hundreds or thousands, and see what sort of variation you get.
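
The bootstrap just described fits in a few lines. A minimal sketch; the statistic, sample sizes, and replicate count are arbitrary choices for the illustration:

```python
import numpy as np

def bootstrap_ci(data, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Resample the data with replacement (same size as the original),
    recompute the statistic each time, and take percentiles of the
    resampled statistics as a rough confidence interval."""
    rng = np.random.default_rng(seed)
    boots = [stat(rng.choice(data, size=data.size, replace=True))
             for _ in range(n_boot)]
    return np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Illustration: interval for the mean of 100 noisy observations.
data = np.random.default_rng(42).normal(10, 2, 100)
lo, hi = bootstrap_ci(data)
```

Each replicate picks some points several times and drops others entirely, which is what gives the spread of the resampled statistics its meaning.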

          You can also do in-sample testing where you apply your analysis to portions of your data (with all sorts of options on how you pick/modify those portions) and see what results you get. Again, this will give you information about what your analysis says about your data.

          Another thing you can do is try out different methodologies/values and see what results they give. If you’re picking a value for a parameter between one and zero, and you try a dozen different choices, you can get a pretty good idea what effects your choice has on your results. If any of those choices give results notably different than the ones you want to go with (and can’t be proven to be wrong), you know you cannot give your preferred results without discussing what else you’ve found.
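
The parameter-sweep idea in the paragraph above can be sketched as follows. A synthetic illustration (the trimmed-mean estimator and the outlier setup are my own choices, not anything from the discussion): sweep a dozen values of a tuning parameter and look at how much the answer moves, rather than silently committing to one value:

```python
import numpy as np

def trimmed_mean(data, p):
    """Mean after dropping a fraction p of points from each tail."""
    s = np.sort(data)
    k = int(p * s.size)
    return s[k:s.size - k].mean()

rng = np.random.default_rng(5)
data = rng.normal(0, 1, 200)
data[:5] += 15                      # a few gross outliers

# A dozen choices of the parameter between 0 and 0.25; the spread of
# the estimates shows how sensitive the result is to the choice.
ps = np.linspace(0.0, 0.25, 12)
estimates = np.array([trimmed_mean(data, p) for p in ps])
```

Here the untrimmed estimate is pulled well away from zero by the outliers while the heavily trimmed ones are not; reporting that spread is exactly the disclosure Brandon describes.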

          There is nothing wrong with using hold-out data, but it is not something you should necessarily do in all cases (it isn’t even possible in some cases). The correct way to avoid cherry-picking is to test your conclusions. If you don’t test alternatives, which you can’t if you prevent reanalysis of the same data, you will actually wind up cherry-picking.

        • HAS
          Posted May 16, 2012 at 3:38 AM | Permalink

          It is an interesting methodological question as to what you have demonstrated if you run 10 different window widths (say) in a LOWESS analysis of a dataset to get a desired balance between apparent bias and variance, and then move to make statistical inferences about this dataset on the basis of that analysis.

        • Brandon Shollenberger
          Posted May 16, 2012 at 4:15 AM | Permalink

          HAS, you seem to be misunderstanding what I said. I never said one would try out a variety of values to get a desired result, then go and make inferences about the dataset based on that “analysis.” There is no step between trying the variety of values and making inferences. The inferences you make are based on the testing you do.

          For example, suppose you have a hundred evenly spaced data points taken from sin(x), from 0 to 10. Now suppose you wanted to examine the data by using a LOWESS smooth. The first thing you try is a smooth with R’s default window (.67). When you plot it, you see there is obviously a signal you’re not representing. To verify that, you check the residuals (and possibly run various tests to see what signals may be present).

          You then try f = 0.5. This gives a better representation but still fails to capture clear parts of the signal (even missing one major downturn entirely). So next you try 0.35. Now you have a clear signal, and you could intuitively guess it was a simple sine wave. However, you still test your generated line.

          When you look at the residuals, you can see a small, but clear, sinusoidal pattern. Given the primary pattern is sinusoidal as well, with the same period, you can conclude you’re underfitting the data (if both conditions weren’t met, you’d know either the “noise” had a sinusoidal pattern, or your fit was wrong).

          You repeat that process until the residuals don’t show a clear signal anymore (what value that’s at would depend on how much noise is in the data). Having reached that point, you make a note of the value, and then you keep using smaller and smaller values until you see signs of overfitting (such as too many points of inflection). Once you do, you take note of that value.

          With those two values in hand, you can then conclude the “right” value must lie somewhere in-between them. For a reasonable amount of noise, you might conclude the smoothing window must be somewhere between .2 and .5. That range of values would be directly determined by the tests you run, and you wouldn’t be able to pick a single value within it as “right.”

          You could possibly do more tests once you’ve determined that range, hoping to narrow it further, but at no point would you be making inferences based upon anything you desired. It would all be directly taken from your calculations, calculations you could make available for anyone to replicate, even if they have completely different desires than you.
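
The residual diagnostic in this walkthrough can be sketched without LOWESS itself (which would need a library such as statsmodels); a plain running mean stands in for it here, and the window sizes are arbitrary stand-ins for Brandon's f values:

```python
import numpy as np

def smooth(y, window):
    """Plain running-mean smoother (stand-in for LOWESS; edge effects
    from zero padding are ignored for this illustration)."""
    return np.convolve(y, np.ones(window) / window, mode="same")

def resid_lag1(y, window):
    """Lag-1 autocorrelation of the residuals: high values mean the
    smoother is leaving structure behind (underfitting)."""
    r = y - smooth(y, window)
    r = r - r.mean()
    return np.corrcoef(r[:-1], r[1:])[0, 1]

# 100 points of sin(x) on [0, 10] plus noise, as in the example above.
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 100)
y = np.sin(x) + rng.normal(0, 0.2, x.size)

underfit = resid_lag1(y, 51)   # too-wide window leaves the sine in residuals
better = resid_lag1(y, 7)      # narrow window: residuals close to white
```

The wide window's residuals are strongly autocorrelated (the sine is still in them); the narrow window's are close to white noise, which is the stopping signal described above.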

        • Steve McIntyre
          Posted May 16, 2012 at 9:16 AM | Permalink

          Please discuss LOWESS at Lucia’s thread.

        • Brandon Shollenberger
          Posted May 16, 2012 at 4:40 AM | Permalink

          HAS, everything I said was based on the presumption the tester didn’t know that. If he already knew that, there’d be no reason to try smoothing the data to figure out what it shows!

          If you’d like a practical example, you’re welcome to come up with an equation, a schema for generating points on (or near it if you’d like me to implement noise) the line for that equation, and I’ll show how I’d test what parameters to use with a smooth. Or, if you’re willing to put in more work, you could generate a data set and ask me to try the same thing “blind” to the actual equation used.

          I’d actually prefer the latter, as I’ve been wanting to try out some tests I don’t have much experience with, and it’d be good practice. However, I know that’s more work, so I’m willing to do either.

        • HAS
          Posted May 16, 2012 at 5:27 AM | Permalink

          I hope you don’t think I’m being difficult here, but I think the problem is that data analysis in this context is purposeful. We want to explain the physical world. Call me old-fashioned, but either we are at the stage where we are trying to develop hypotheses about what those physical relationships are, or we have developed the hypotheses and are testing their validity.

          Now if we are in the situation where we don’t know what the relationships are and are “smoothing the data to figure out what it shows” we can fit nonlinear models to it and assess the fit. And we can go trying different models in an attempt to get better fits. But in the end this doesn’t help us ascribe any apparent relationships that arise out of this analysis to the physical world. To do that we need to formally test this, and that subsequent testing needs to be independent of the prior analysis (even in the rather loose living Bayesian world as I understand it – but my mother told me to be very careful with those Baysians so I don’t frequent there often).

          Anyway the serious point remains. You might do data mining as you and RC described it, but you don’t start to show anything about the real world until you stop doing it and get formal (as ferd berple suggests).

        • Brandon Shollenberger
          Posted May 16, 2012 at 6:40 AM | Permalink

          HAS, I don’t think you’re being difficult. I understand why you’re hesitant on the matter. Looking for patterns in data then trying to find a physical explanation is always fraught with difficulty.

          Unfortunately, I’m having trouble figuring out where you disagree with me. You say the approach I’ve described for looking for a signal “doesn’t help us ascribe any apparent relationships… to the physical world.” You then say to do that, we have to test our conclusions independently. This is exactly what I would say.

          After the steps I describe, if some signal was found, I would then compare it to the modern temperature record (or whatever other record I was interested in). If the two had strong correlation, that would give good reason to believe I had found a temperature signal (spurious correlation is possible, but unlikely). It would also be the exact independent test you say is necessary. Finding signals by examining the data itself ensures I don’t find signals based upon what I “want” to find.

          And even then, what ferd berple describes isn’t true. What if when comparing the two things, I realize the two match well, but with a lag? If I didn’t think of that possibility before comparing them, should I not be allowed to adjust for it? What if I find an exact, physical reason for the lag? Or, what if I find the two were inversely proportional? Should I just dismiss the relationship because I thought more warmth would lead to more growth, not considering that after a point more warmth would harm trees?

          Of course not. You can make decisions at any point in an analysis. You just need to be able to justify them and explain what happens if you make an alternative decision. The key to avoiding cherry-picking is not to place arbitrary limitations on yourself, but to be thorough, open and honest.

        • HAS
          Posted May 16, 2012 at 3:06 PM | Permalink

          Can I first just note, for our host’s benefit, that my comments relate to the methodological issues, with LOWESS simply an example.

          The problem I am dealing with is that of using the same information twice. First to create your hypothesis and second when you come to test it. If you do that you have most likely violated the assumptions of the statistical tests you are using (among other things).

          You mention lags as an example. A recent case of the above problem was in a paper where, I think, Tamino et al. regressed lagged SOI against temperature. First they ran regressions against every possible lag to determine where the lag correlation was highest, and then included that lag in a subsequent linear model with other variables and reported statistical tests from that model. I trust you can see the problem.

          A useful way to think about this is to remember there is only a finite amount of information contained in a dataset. If you use information up in data mining you have lost it for validation of any subsequent model. You can only get over that barrier by introducing further information external to it, including perhaps by way of assumptions (explicit or implicit). But obviously in those cases any results are conditional on those assumptions.
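
The lag-scanning problem HAS describes can be demonstrated with a small Monte Carlo (series length, lag range, and trial count are arbitrary): even for two fully independent noise series, picking the best lag first inflates the apparent correlation well above what a single pre-specified test would see:

```python
import numpy as np

def best_lag_corr(x, y, max_lag):
    """Scan lags 0..max_lag and keep the largest absolute correlation,
    i.e. the ex post selection HAS warns about."""
    m = len(x)
    return max(abs(np.corrcoef(x[:m - k], y[k:])[0, 1])
               for k in range(max_lag + 1))

rng = np.random.default_rng(7)
n, max_lag, trials = 120, 12, 500

# Two INDEPENDENT white-noise series each trial: any correlation is spurious.
picked = [best_lag_corr(rng.normal(size=n), rng.normal(size=n), max_lag)
          for _ in range(trials)]
single = [abs(np.corrcoef(rng.normal(size=n), rng.normal(size=n))[0, 1])
          for _ in range(trials)]
```

Any significance test applied at the selected lag as if it had been chosen in advance would therefore overstate the evidence, which is the double-use of information HAS is pointing at.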

        • Brandon Shollenberger
          Posted May 16, 2012 at 4:37 PM | Permalink

          HAS, why would you have a problem with using the same information twice? That’s exactly what you’d expect to do. When you examine any data set, you’re trying to reduce a large amount of information into a smaller amount of relevant information. If you couldn’t use the extracted information, what would be the point of any analysis?

          For example, suppose you have a noisy data set. To reduce the noise, you examine the data, concluding a low-pass filter would be best. You’ve now used the information in the data to pick your filter, so when you then look at the cleaner form of the data, you’re using that information a second time. There’s nothing wrong with that.

        • Steve McIntyre
          Posted May 16, 2012 at 5:02 PM | Permalink

          Brandon and HAS – none of this has anything to do with the Hantemirov data.

        • Brandon Shollenberger
          Posted May 16, 2012 at 5:45 PM | Permalink

          Steve McIntyre, that’s true. It’s a worthwhile matter to discuss, but it is pretty off-topic. I’ll leave it be.

          Though HAS, if you’d like to continue the discussion, I’m happy to somewhere else/in e-mails.

      • Willis Eschenbach
        Posted May 16, 2012 at 12:07 AM | Permalink

        Brandon, you are talking about choosing methods. The RC folks are justifying picking which data to analyze. They are very different situations.


        • Brandon Shollenberger
          Posted May 16, 2012 at 12:26 AM | Permalink

          Willis Eschenbach, that’s not true. The comment ferd berple quoted from includes things like, “compare with other methods.” Selecting data may be included in the process they describe, but it is obviously not the only thing being referred to.

        • Willis Eschenbach
          Posted May 16, 2012 at 1:53 AM | Permalink

          Brandon, re-read what I wrote. I said nothing about ferd berple, I was talking about the RC folks and the Hantemirov data …


        • Brandon Shollenberger
          Posted May 16, 2012 at 2:12 AM | Permalink

          ferd berple discussed something Gavin had said about the Yamal issue over at RealClimate. I responded to him, disputing what he said. You responded to my disputation, talking about what the people at RealClimate do. Since you responded to my comment about what someone at RealClimate said, referring to what people at RealClimate are doing, I assumed you were responding to what I actually said.

          I apologize for not realizing you responded to my comment to discuss something not discussed in my comment. I apologize for not realizing when you referred to a group of people, you didn’t intend to make any connection to my discussion of a comment by a member of that group. I further apologize for not realizing the comment I discussed, which dealt with the Yamal issue, was not what you were discussing as you were apparently referring to something which had to do with the Yamal issue…

          And I apologize to our host for my sarcasm.

        • Willis Eschenbach
          Posted May 16, 2012 at 11:26 AM | Permalink

          Thanks, Brandon. Now that you have that off your chest, do you have any comments on the subject of the thread, which is the criteria (or the lack thereof) for choosing or not choosing data?


  28. Geoff Sherrington
    Posted May 16, 2012 at 5:57 AM | Permalink

    In reading this matter from start to finish on CA, and noting the “received 28 Jan 2009, accepted 30 Jan 2009”, pressure has to go again onto the reviewers of the formal papers. In my past fields of science, the reviewers usually knew more about the subject than the authors; that was a criterion for selecting reviewers. Mistakes or misconstructions of the magnitude reported here would simply have led to rejection of the papers and possibly an invitation to avoid submitting anything more in the future.
    In my later career, time was spent in a framework similar to that of a reviewer. Scientist employees would present proposals for in-house funding; a couple of colleagues and I would audit the proposals and give pass, fail or revise/resubmit. In that occupation, the growth of the enterprise was highly dependent on the choice of pass or fail. Hundreds of millions of $ at a time could be involved in a bad decision, and there were dozens of decisions a year. The point is, as quasi-reviewers, we needed to weed out the errors, if any, and provide the means to progress.
    It is difficult to comprehend that reviewers of this dendro work were so slack that they did not see the deficiencies. You say conspiracy, I simply say that they could not care less. Scientific ignorance and indifference, coupled with over-abundant funding and little accountability – a recipe for the present dreadful mess.

  29. Rashit Hantemirov
    Posted May 16, 2012 at 9:01 AM | Permalink

    Steve, I’m horrified by your slipshod work. You did not define what you compare, what dataset used in each case, how data were processed, and what was the reason for that, what limitation there are, what kind of additional information you need to know. Why didn’t you ask me for all the details? You even aren’t ashamed of using information from stolen letters.
    Do carelessness, grubbiness, dishonourableness are the necessary concomitants of your job?
    With disrespect…

    Steve: all graphics and results in these posts have been supported by turnkey code, showing the precise calculations for an interested reader (other than the calculation from your living data set, for which I showed the calculation method). Some of the steps have been shown in recent or linked posts and the present post is not self-contained, but the steps are all shown.

    As to the CRU emails, I do not know that they were “stolen”. Many people believe that they were released by someone within the University. Nor was any disrespect shown to you in the quotation from the email, which showed you in a professional light.

    I totally agree with standards requiring disclosure of “what dataset used in each case, how data were processed, and what was the reason for that, what limitation there are, what kind of additional information you need to know”. That’s why I provide turnkey code as much as possible. Much of my frustration in this field has arisen because authors do not do this and are unhelpful to inquiries. I’ve placed source code to generate the graphics at the bottom of the article.

    I regret that you feel this way. If I can provide specific clarification on data sets, processing steps, etc, perhaps through reference to prior posts and scripts that are familiar to regular readers or through any other way, please advise me.

    • MikeP
      Posted May 16, 2012 at 9:12 AM | Permalink

      I wonder who spoofed the name? Peter Gleick? Gavin visiting in disguise?

      Steve: the comment appears to be genuine.

      • MikeP
        Posted May 16, 2012 at 10:33 AM | Permalink

        Mea culpa. It just didn’t feel like the way a true scientist would write.

        • Steve McIntyre
          Posted May 16, 2012 at 10:54 AM | Permalink

          He has to coexist with Briffa, Schmidt and those guys. I suspect that he’s received criticism for providing me with data. I didn’t do anything complicated in the calculation, so I’m not sure what his specific problem is.

          In addition, CRU has told Muir Russell and the public that the “purpose” of Briffa 2000 and Briffa et al 2008 was to do an RCS-style calculation on a Hantemirov data set. What is the objection to doing a similar calculation on his 120-core living data set, even if there are other worthy data sets?

        • TerryMN
          Posted May 16, 2012 at 11:50 AM | Permalink

          I suspect that he’s received criticism for providing me with data.

          This was my immediate thought. Which is sad, IMO…

        • Posted May 16, 2012 at 1:04 PM | Permalink

          I was going to suggest that you write up a joint paper with him, but it doesn’t look like he would be inclined!

        • MikeN
          Posted May 16, 2012 at 3:08 PM | Permalink

          English may not be his primary language.
          I suspect the “ask me for details” is a reference to Yamal and Briffa 2000 and how Steve had the data all along.

          Steve: I’ve discussed this over and over. I never had data for Taimyr and Tornetrask. I did not know that Briffa 2000 used the dataset of Hantemirov and Shiyatov 2002. Nor could Hantemirov provide me with that confirmation. If Briffa had simply said that he used the same data set as was subsequently described in Hantemirov and Shiyatov 2002, that would have resolved any question about Yamal, but left Taimyr unresolved. The Yamal dispute did not arise from Yamal in isolation, but from the comparison of the methodology at Taimyr to the methodology at Yamal. I understand that this issue continues to be raised as a debating point. However, it really sounds like pettifogging to anyone with experience of due diligence in real life.

        • J Bowers
          Posted May 17, 2012 at 3:14 AM | Permalink

          “He has to coexist with Briffa, Schmidt and those guys.”

          Why? Surely Hantemirov holds all the cards. According to the CG emails, by 1998 he was already highly regarded in the dendro field, and is described as “world class” as far back as 2000.

      • Posted May 16, 2012 at 1:06 PM | Permalink

        It would be worthwhile checking the email header carefully – it is easy to ‘spoof’ the return path in an email message. Please snip if this is already done or you don’t want to clutter up the thread. In the message header you’ll see the path whereby the message got to you in terms of ip addresses in ‘Received:’ lines.
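        For readers unfamiliar with message headers, the check described above can be sketched with the Python standard library. This is a minimal illustration only — the hostnames and IP addresses below are made up, and a real header chain may include IPv6 addresses and other formats this simple regex does not handle:

        ```python
        # Sketch: pull the relay IP addresses out of an email's "Received:" chain.
        # Headers are listed top (closest to recipient) to bottom (closest to sender),
        # so the last entries are nearest the true origin.
        import re
        from email import message_from_string

        # Hypothetical raw message for illustration; not a real header chain.
        RAW = """\
        Received: from mail.example.ru (mail.example.ru [94.75.37.10])
        \tby mx.recipient.org with ESMTP; Wed, 16 May 2012 12:55:01 -0400
        Received: from [10.0.0.5] (unknown [10.0.0.5])
        \tby mail.example.ru with SMTP; Wed, 16 May 2012 20:54:40 +0400
        From: someone@example.ru
        Subject: test

        body
        """

        def received_ips(raw_message: str):
            """Return the IPv4 addresses found in each Received: header, in order."""
            msg = message_from_string(raw_message)
            ip_pat = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
            return [ip_pat.findall(h) for h in msg.get_all("Received", [])]

        print(received_ips(RAW))
        ```

        Of course, as the comment notes, the earliest "Received:" lines are added by the sender's own side and can themselves be forged; only the hops added by servers you trust are reliable.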

        • Posted May 16, 2012 at 1:09 PM | Permalink

          Sorry, forget that – that’s just relevant for email – you should have an IP address for a comment.

      • Anthony Watts
        Posted May 18, 2012 at 5:25 PM | Permalink

        The comment “appears to be genuine”. I’m not sure of this.

        Easy to find Hantemirov’s email address via Google

        A continuous multimillennial ring-width chronology in Yamal, northwestern Siberia

        Rashit M. Hantemirov
        Institute of Plant and Animal Ecology, Ural Division of the Russian Academy of Sciences, 8 Marta Street 202, Ekaterinburg 620144, Russia; rashit “at” ipae – uran – ru (obfuscated by Anthony to prevent spambot harvest)

        However, using my administrator privilege at CA I’ve looked at the IP address where this comment originated, and it doesn’t make sense.

        Geolocation Information
        Country: Russian Federation ru flag
        State/Region: Krasnoyarsk
        City: Ural
        Latitude: 55.9052
        Longitude: 94.7537

        It also says: Confirmed proxy server

        That’s 2100 km from Ekaterinburg, where Hantemirov is supposed to work. That, and the language, combined with the proxy server notice, makes me think this might be a doppelganger. Why would he need to use a proxy server to drop a comment? And the email used in the comment is different from his academy email. It sources to a service which appears to offer a Gmail-like email alternative. We see plenty of examples of setting up dead-drop Gmail accounts to bomb somebody… Gleick, for example.

        • theduke
          Posted May 18, 2012 at 5:50 PM | Permalink

          Good work, Anthony. But he needs to confirm it wasn’t him. Seems he would have heard by now if someone had impersonated him.

          Maybe someone needs to call him.

        • theduke
          Posted May 18, 2012 at 5:54 PM | Permalink

          I also thought the language was suspicious. In particular the use of the word “grubbiness.” There’s been a discussion about the misuse of a derivative of that word lately . . .

        • Nick Stokes
          Posted May 18, 2012 at 11:10 PM | Permalink

          Krasnoyarsk has a dendroecology department at the Sukachev Institute of Forest Research.

          His co-authors Vaganov and Shiskov are there.

        • Steven Mosher
          Posted May 18, 2012 at 11:17 PM | Permalink

          shoot me the URL and I will check it out at various hang outs.

        • Steven Mosher
          Posted May 18, 2012 at 11:17 PM | Permalink

          I mean IP

        • seanbrady
          Posted May 23, 2012 at 9:30 AM | Permalink

          “But he needs to confirm it wasn’t him. Seems he would have heard by now if someone had impersonated him.”

          If he’s not a regular CA reader he may not know about the impersonation. And if Steve replied directly to the email address from which the suspicious email was sent (which seems likely, given that Steve thought the email was genuine) then there’s no reason he would know about it.

          Best would be for Steve to email him at the address from which the original email, containing the data, was sent.

    • mpaul
      Posted May 16, 2012 at 9:18 AM | Permalink

      You did not define what you compare, what dataset used in each case, how data were processed, and what was the reason for that, what limitation there are, what kind of additional information you need to know.

      For years, we have been arguing that such a standard should be applied to the published work of the Team.

    • KNR
      Posted May 16, 2012 at 9:46 AM | Permalink

      Such views could be taken more seriously if they were applied universally, as they should be. Yet despite their self-given status as the ‘best in their area’, the Team have often failed to meet the standards expected of an undergraduate writing an essay. Why this causes Rashit no concern is a very good question; perhaps they feel scientific standards should be flexible depending on the scientist’s position on ‘the cause’.

    • Salamano
      Posted May 16, 2012 at 9:47 AM | Permalink


      With your comments, I hope you haven’t just taken yourself off the list of possible peer-reviewers for the forthcoming regional Ural/Yamal chronology that’s been talked about.

      The more methodological/selection clarifying elements within this coming paper, the better.

      Surely one can’t just declare all cores that ‘show’ a chronology equal to the observed temperature record as ‘accurate’, and all others ‘flawed’ and rejectable… otherwise, this would be a circular argument/proof.

    • Steve McIntyre
      Posted May 16, 2012 at 10:20 AM | Permalink

      Re: Rashit Hantemirov (May 16 09:01),
      I’ve added source code to the bottom of the article to generate all graphics and annotate where each data set and graphic comes from. As CA readers know, this is my common practice, particularly with posts of interest. It would have been better if I’d had the script up concurrently with the post; I regret any inconvenience caused by posting it up today rather than yesterday.

      I believe that the source code shows the provenance of each series. If there are any further questions, I’d be happy to respond.

      • bernie1815
        Posted May 16, 2012 at 10:50 AM | Permalink

        I am not sure how you could have handled Rashit’s comment better. Do you have a sense of what he is referring to when he says: “Why didn’t you ask me for all the details?” Were there specific details that you did not have that would have impacted your analysis?

        • bernie1815
          Posted May 16, 2012 at 10:51 AM | Permalink

          Also, have you tried off-line communication with Rashit?

          Steve: I sent him an email this morning in a pleasant tone. I thought that I’d sent him one notifying him of the post as follows:

          Thanks again. I’ve done a post at Climate Audit on this . If I’ve made any errors of interpretation, I’d be happy to correct. Regards, Steve Mc

          However, when I looked in my Sent folder, I hadn’t sent it, it was stuck in Drafts. Too bad, but I don’t think that it would have made any difference.

          I sent the following this morning:

          I’m sorry about your reaction to my post. It’s my usual practice to place turnkey scripts online to access all data and generate all scripts, even for a blog article. I regret that this was not up concurrently with the post, but it is up now (at the bottom of the article) and I hope that it clarifies the graphics.

          In relation to the liv.rwl file, I did a RCS style fit to your living data, using an algorithm in R to emulate the style that Briffa has used in many of his articles.

          I apologize for any misunderstanding,
          Steve McIntyre

        • pete m
          Posted May 22, 2012 at 9:57 PM | Permalink

          Steve, have you received a reply to your nicely worded emails?

      • Don Monfort
        Posted May 16, 2012 at 10:54 AM | Permalink

        Your problem is that you don’t follow proper climate science protocol, Steve. You gave up way too easily. When ‘harassed’ for code and that stuff, you are supposed to stonewall them forever, even at the cost of your credibility.

    • theduke
      Posted May 16, 2012 at 12:44 PM | Permalink

      Mr. Hantemirov: re your criticism of Steve for quoting “stolen letters,” you should know that Gavin Schmidt has done the same thing re the Osborn email in one of his replies to Steve at RealClimate. Those emails are now in the public domain and, rightly or wrongly, are useful in a discussion of problems that afflict climate science.

      Regardless, now that Steve McIntyre has posted his source code, I would hope you would return and engage him on those matters that concern you. Too often those who have disagreements with Steve post something harshly critical and insulting and then disappear.

      If you do return, I hope you do it without “disrespect.” These are, after all, scientific discussions.

    • Kenneth Fritsch
      Posted May 16, 2012 at 2:42 PM | Permalink

      I think my first response is along the lines of theduke’s: would not the curious scientist first want to ask for the information he thought was missing? If he had processed the same data, I would think he would be eager to share his results.

    • seanbrady
      Posted May 17, 2012 at 10:01 AM | Permalink

      The most curious thing is that Mr. Hantemirov accuses Steve of things he would have known or expected before he sent the data. Anyone who knows anything about Steve would know that he runs statistical analyses on raw data, and would have expected exactly the sort of analysis found in this post. Also, it’s hard to imagine that someone named in the Climategate emails didn’t already know that Steve has discussed those emails on this site.

      Perhaps Mr. Hantemirov never intended to send Steve the data in the first place. He may have been asked to provide data for a work in progress (which Gavin may also know about, hence his curious claim that Steve claims there is no such work in progress), may have had correspondence in his email from someone working on that paper asking for the data, and also an email from Steve asking for the same data. An assistant could easily have sent the data to the wrong person. It’s happened to me before!

      • theduke
        Posted May 17, 2012 at 10:53 AM | Permalink

        Given his circumstances, Mr. Hantemirov may be subjected to forces we know nothing about. As Counselor MacDonald says, best to leave it at that and move on.

    • Jeff Condon
      Posted May 17, 2012 at 6:23 PM | Permalink


      I do take issue with your critique of Steve not disclosing ‘how to process’ the data. First, he has always disclosed how the data was processed whenever asked. ONE HUNDRED PERCENT of the time. Second, I have seen, and replicated, Steve’s RCS calculations myself so many times that I did NOT need to ask to see the code in order to replicate the above post.

      He has done the same thing so many times that technical readers aren’t required to ask. Of course an obviously NEW reader here like yourself, of some technical skill, could have easily asked for a clarification. Instead, we get a rant.

      I am of the opinion that you have been pressured by your colleagues for releasing their secret. Why you wish to protect them is beyond me.

      The scientists doth protest too much, methinks!

    • polysci
      Posted May 20, 2012 at 8:33 PM | Permalink

      The comment from Hantemirov is very reasonable and common.

      Steve has a bad, bad, BAD habit of writing blog posts that do not precisely explain what the heck he is doing. And it is not some nitpick…it is honestly obscure. He does not label axes. Does not CITE papers (not a link, not read some previous post…but EFFING CITE like a nuke).

      Of course the yuck yuck cheering choir, cheers him on. But I have pushed some stuff for clarification and found Steve was wrong or at least was giving a wrong impression by selective analysis and mistaken “gotcha” implications.

      Always remember that this is Steve’s SCRATCHPAD. He stands behind nothing. It is just trial analysis BS. That he publicizes and makes big hay like an Anthony Watts moron.

  30. Barclay E MacDonald
    Posted May 16, 2012 at 2:50 PM | Permalink

    An argument can be made that Rashit Hantemirov said what he had to say for additional reasons not identified here. Leave his statement at that and move on.

    • theduke
      Posted May 16, 2012 at 5:44 PM | Permalink

      Yes. Some of those possible reasons have occurred to me since I wrote my post.

  31. Gerald Machnee
    Posted May 16, 2012 at 4:04 PM | Permalink

    Steve, there was no similar response when you previously received and used data from him.
    I think there was interference.

    Steve: our previous exchange was in 2004. Climate Audit didn’t exist. I was mainly working on MBH98 at the time. He did communicate my inquiry back to Briffa. When Briffa told me in May 2006 that he would seek permission from the Russians (not just Hantemirov but from others), I did not take his reply as expressing a genuine intent to try to make the datasets available. (I’d been told that Briffa had spoken to other dendros in Beijing, opposing giving me data.) I don’t believe for a minute that Briffa emailed the holders of the data at the time seeking permission.

    • bernie1815
      Posted May 16, 2012 at 5:11 PM | Permalink

      Surely we would have seen one or more of those emails or a trace of them?

  32. R.S.Brown
    Posted May 16, 2012 at 7:54 PM | Permalink

    A slightly off-topic aside:

    Steve McIntyre/Rashit Hantemirov,

    With respect… we seem to be seeing one of the “tiger traps” hidden within the use of computer email systems that cause confusion, irritation and at times even outrage on the part of the sender or the receiver.

    Steve, sometimes even if you actually hit “send” on an email final draft to a prewritten address or an aliased address list, or even if you’re using “respond”, which uses the original author’s email addy, things can add up to where the email essentially goes down a wire into the ground and nowhere else.

    If you used a “confirm receipt” tag on your email to Rashit, then moved on to other items as various stuff came into your “in” box, the lack of a confirmation receipt wouldn’t be noticed. Depending on how many server hops are between you and the supposed recipient, there can sometimes be a serious lag time between the “send” action and the receipt confirmation.

    It can get worse if you import a text (or data) file into your pre-addressed email as the body of the message without further editing within the email program. Hit “send” in this instance and you’ll get “confirmation” of it going out through your “sent” box, and you may even get receipt confirmation back… except you sent an almost blank body.

    One sure way to avoid the “But I thought I sent you…” situation is to always blind carbon copy yourself on anything you consider important. These BCCs show up almost instantly as an incoming mail announcement. You’ll just have to transfer these out to another file later, or delete them when the real receipt confirmation comes in.

    The McIntyre/Hantemirov messages above are one more example of why good folks yell at their computers, “OMG, you did exactly what I told you to do!”

    Computer messaging always involves impersonal relations.

    Been there, did that… almost at the expense of my old job a few years ago.

  33. Posted May 17, 2012 at 11:07 AM | Permalink

    I just thought I’d mention that Rashit Hantemirov has read Climate Audit in the past: ‘I had a look to McIntyres post and can superficially comment…’ So he has a sense of the open and detailed nature of Steve’s posts. The message 4981.txt which shows this also shows Rashit and the CRU perfecting their public messaging to inquiring journalists, and Melvin conditioning Rashit in climate reconstruction dangers: “If you are reconstructing climate using the new data set there are a few problems (RCS bias) that could cause problems. If you wish for any suggestions or help from us please ask.” And is ‘slipshod’ a high entropy word?!

    • seanbrady
      Posted May 17, 2012 at 3:39 PM | Permalink

      I had a similar thought. Hantemirov could have expected exactly what Steve did with the data, and further would likely have known that Steve (like nearly everyone else) feels comfortable discussing the released Climategate emails.

      Which made me think it’s unlikely he sent the data and was shocked that Steve analyzed it and also shocked, shocked that Steve mentioned a Climategate email in the post.

      Possibly, he sent the data to Steve by accident, either by replying to the wrong email or by asking an assistant to send it and the assistant replied to the wrong data request.

    • Martin A
      Posted May 20, 2012 at 3:55 AM | Permalink

      Is “dishonourableness” high entropy? [English (almost) spelling, rather than Merkan, as someone on BH pointed out.]

  34. Sera
    Posted May 19, 2012 at 11:42 PM | Permalink

    I find it unusual that a Russian would use words like ‘slipshod’ and ‘concomitants’ – not exactly Russian verbiage, IMHO.

    • Jon Grove
      Posted May 21, 2012 at 4:39 AM | Permalink

      Go on, put us all out of our misery. It’s a lovely conundrum: was it Hantemirov or not? Not all the comments that have made the cut here can possibly be right — but at the moment things all seem a bit unresolved and post-modern. It’s not only a question of truth, but also epistemology. With the results in, it should be possible to calibrate the utility of arm-chair pontification on English language usage (here apparently on the part of an educated user more familiar than most native users with high-register and highly specialised technical language — but still not a native for all that). This kind of linguistic debate has become a delightful new element in the climate wars over the past few months. What’s the result of the final audit?


      Heinrich Schliemann

    • Jon Grove
      Posted May 23, 2012 at 3:23 AM | Permalink

      Please can you confirm whether or not you now know whether it was Hantemirov who contacted you in the comments above? You should have further information by now, after having contacted him directly on the matter. A clear statement would cut through the ill-founded speculation which has accumulated here. At the moment the matter has been left hanging in a way that simply advances the power of rumour and muddle.

      • kim
        Posted May 23, 2012 at 8:13 AM | Permalink

        No news, is bad news.

  35. Alexej Buergin
    Posted May 21, 2012 at 5:39 PM | Permalink

    May we read Hantemirov’s mail to McI from 2002 and 2012?

    • Steve McIntyre
      Posted May 21, 2012 at 9:48 PM | Permalink

      Re: Alexej Buergin (May 21 17:39),

      The 2004 email is in the Climategate emails. Hantemirov notified Briffa about my request.

      • Alexej Buergin
        Posted May 22, 2012 at 11:09 AM | Permalink

        I do not doubt that these emails exist. But I wonder if the cordial tone of the cover note suggests that Hantemirov made his “mistake” on purpose and with intent (as I wonder if Wagner did the same), and now pretends it was just something like a Freudian slip.
        As a real scientist he would want his data to be treated with respect, that is, mathematically correctly, and not manipulated.

        • Ed_B
          Posted May 25, 2012 at 7:47 PM | Permalink

          Was Hantemirov trying to use sarcastic humour? Seems very Russian to me!

      • Jon Grove
        Posted May 28, 2012 at 5:07 PM | Permalink

        Once again: please can you state whether or not you know whether the comment attributed here to Hantemirov was authentic? It’s most perplexing that you haven’t yet clarified the matter.

        Steve: haven’t heard from Hantemirov.

9 Trackbacks

  1. […] Steve McIntyre’s latest here […]

  2. By The Climate Change Debate Thread - Page 1265 on May 15, 2012 at 11:07 AM

    […] […]

  3. […] wars between and, where Steve McIntyre has just delivered two more blows to the ‘Team’ treering chronology-temperature reconstructions. These have been used to […]

  4. […] New Data from Hantemirov May 15, 2012 – 7:31 AM […]

  5. […] Steve McIntyre recently published a new graph on his website Climate Audit. […]

  6. […] have been reading the back and forth between Climate Audit and Real Climate. Steve McIntyre obtained some data from Rashit Hantemirov and posted a […]

  7. […] of which is remarkably similar to the calculations in my posts of September 2009 here and May 2011 here, both of which were reviled by Real Climate at the […]

  8. […] of which is remarkably similar to the calculations in my posts of September 2009 here and May 2012 here, both of which were reviled by Real Climate at the […]

  9. […] Scenario A in any presentation or blog article, other than an incidental use in a May 15, 2012 post where I ironically observed that a Yamal chronology incorporating fresh data from Hantemirov was […]
