RCS Homogeneity- Esper in Jaemtland

Starting with the first of my recent posts on Yamal, I raised the issue of whether the CRU 12 actually came from a population homogeneous with the subfossil population. This issue is related to the surprisingly small sample size of the supposedly “highly replicated” Yamal chronology, but is distinct. In his online response to the Yamal posts, Briffa stated that they have “stressed” potential problems arising from “inhomogeneous sources” in their “published work”.

Indeed, we have said so before and stressed in our published work that possible chronology biases can come about when the data used to build a regional chronology originate from inhomogeneous sources (i.e. sources that would indicate different growth levels under the same climate forcing).

Whether two populations are homogeneous is ultimately a statistical question. Consideration of this question as a statistical question has been blurred to date by the lack of understanding within the dendro community of the statistical issues involved in making a chronology (or within the statistical community of the difficult and interesting applied problems that dendros are trying to solve.) The disconnect is neatly illustrated by the references to Briffa and Melvin 2008, a highly technical article on standardization which nonetheless doesn’t include a single reference to an article or text by a non-dendro and which shows no awareness of how making a “chronology” ties into statistical literature on random effects (or how testing for population homogeneity relates to the statistical concept of “exchangeability”).

While Briffa’s articles occasionally contain caveats on population requirements for RCS standardization, I have been unable to locate any Briffa article in which he actually sets out procedures for testing populations for homogeneity or reports the results of such tests, a lacuna that seems somewhat inconsistent with the idea that CRU has stressed this issue. In this instance, one wonders whether there is any practical difference between CRU stressing an issue and ignoring it.

While there doesn’t appear to be any relevant procedure in the Briffa corpus, Esper et al 2003 does provide an on-point qualitative discussion of when two different populations can be combined in an RCS population (Tom P noted this in an earlier thread). It isn’t a formal test procedure, but it does provide a useful example (Jaemtland) where two populations were considered sufficiently distinct that they were not combined, and it does provide a basis for analysing Yamal homogeneity in a framework recognized by dendros.

In this post, I’ll provide a detailed description (and emulation) of the methodology described in Esper et al 2003, which, in another post, I will then attempt to apply to the “extended” Yamal data set recently presented by Briffa.

Esper’s example (see his Figure 8) considered populations from two distinct sites: Jaemtland (swed023) and Trondelag (norw002), further distinguishing the Jaemtland site into two species, PISY and PCAB. One attractive feature of this example as a precedent for analysing Yamal is that the Jaemtland population ends in the early 19th century, while the Trondelag population begins in the late 18th/early 19th century, so that the techniques bear directly on the issue of population homogeneity between the Yamal subfossil and the “expanded” YAD-POR sites recently proposed by Briffa. To further accentuate the similarity, the Trondelag data set is a Schweingruber data set where each tree has two cores, while the Jaemtland data set is a non-Schweingruber data set with one core per tree.

Esper’s first step was to calculate age-dependence curves for each of the populations in question, plotting the resulting age dependence curves on the same quadrant. The following graphic shows the original Esper graphic, together with my emulation from available information (note Esper’s use of a biweight mean – which requires 4 samples). Esper observed that the (red) Trondelag (PISYsk) had a “significantly higher” growth rate and has a “rather different” slope than (black) Jaemtland. Esper’s comparison is qualitative, but nonetheless there is an obvious difference between the two age-dependence curves. One feels that this particular comparison could be worked up into a quantitative test (relatively easily if the comparison is between two negative exponential fits.)
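As a rough sketch of how such a quantitative test might look, the following R fragment fits a negative exponential A + B*exp(-C*age) to two populations separately and to the pooled data, then compares the fits with an extra-sum-of-squares F test. Everything here is an assumption for illustration: the data are synthetic, the parameter values are hypothetical rather than estimated from the Jaemtland or Trondelag measurements, and the F test is one reasonable choice rather than anything proposed in Esper et al 2003.

```r
## Sketch: is one negative exponential curve adequate for two populations?
## Synthetic data only; parameter values below are hypothetical.
set.seed(1)
negexp <- function(age, A, B, C) A + B * exp(-C * age)

# Two populations with different growth levels and slopes, ages 1..200
age <- rep(1:200, times = 2)
pop <- rep(c("slow", "fast"), each = 200)
truth <- ifelse(pop == "slow",
                negexp(age, 30, 120, 0.03),
                negexp(age, 50, 200, 0.02))
d <- data.frame(age = age, pop = pop,
                rw = truth + rnorm(length(age), sd = 10))

# Same starting values for all three fits
start <- list(A = 40, B = 150, C = 0.025)
fit_pooled <- nls(rw ~ A + B * exp(-C * age), data = d, start = start)
fit_slow   <- nls(rw ~ A + B * exp(-C * age),
                  data = d[d$pop == "slow", ], start = start)
fit_fast   <- nls(rw ~ A + B * exp(-C * age),
                  data = d[d$pop == "fast", ], start = start)

# Extra-sum-of-squares F test: pooled model (3 parameters)
# against separate models (6 parameters)
rss_pooled <- sum(resid(fit_pooled)^2)
rss_sep    <- sum(resid(fit_slow)^2) + sum(resid(fit_fast)^2)
df_num <- 3
df_den <- nrow(d) - 6
F_stat  <- ((rss_pooled - rss_sep) / df_num) / (rss_sep / df_den)
p_value <- pf(F_stat, df_num, df_den, lower.tail = FALSE)

print(c(F = F_stat, p = p_value))
```

A small p-value says that a single curve is inadequate for both populations. With real ring widths the residuals within a core are autocorrelated, so a naive F test like this will overstate significance; a random effects formulation would be a more defensible version of the same idea.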

Figure 1. Esper et al 2003 Figure 8B showing age dependence curves from two nearby sites. Left – original; right – emulation from Esper data obtained from Sciencemag. The core counts for Trondelag and Jaemtland match the reported core counts in E2003, but there is no available metadata showing which cores in swed023 are PISY and which are PCAB, the two species having been mixed in the archive. The Trondelag (PISYsk) curves should match exactly; while they are close, they don’t quite. Esper uses a biweight mean instead of a simple mean or median (this is OK.)
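For readers unfamiliar with the biweight mean, here is a minimal sketch of a one-step Tukey biweight robust mean in base R. The tuning constant of 9 and the scaling by the unscaled median absolute deviation are common conventions that I am assuming here; I have not confirmed that they match Esper’s exact settings. The minimum of 4 samples follows the requirement noted above.

```r
## Sketch: one-step Tukey biweight robust mean (assumed conventions: C = 9,
## scale = unscaled median absolute deviation, at least 4 samples required)
biweight_mean <- function(x, C = 9, min_n = 4) {
  x <- x[!is.na(x)]
  if (length(x) < min_n) return(NA_real_)   # too few samples for a robust mean
  m <- median(x)
  s <- median(abs(x - m))                   # median absolute deviation
  if (s == 0) return(m)                     # degenerate case: no spread around the median
  u <- (x - m) / (C * s)
  w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0)   # outliers beyond C * s get zero weight
  sum(w * x) / sum(w)
}

# Ring widths at one cambial age, with one wild value
x <- c(10, 11, 12, 13, 1000)
print(c(mean = mean(x), biweight = biweight_mean(x)))
```

The outlier dominates the simple mean but receives zero weight in the biweight mean, which is why the choice matters little for well-replicated ages and a good deal for thinly replicated ones.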

Esper demonstrated the impact of two non-homogeneous populations on an RCS chronology in his Figure 8D shown below at left versus my emulation shown at right, together with the core counts for Jaemtland (light gray) and Trondelag (dark gray). These show first of all that I’ve accurately emulated the key features of Esper’s methodology for this particular analysis. (Further comments below graphic.)

Figure 2. Left: Esper 2003 Figure 8D; right – Jaemtland chronology as used in Esper et al 2002, together with core counts. Light grey – Jaemtland; dark grey – Trondelag.

Look first at the early 19th century transition from Jaemtland to Trondelag data and the impact on the RCS chronology. Application of a one-size-fits-all RCS method to this inhomogeneous population results in an inhomogeneity in the RCS chronology at the hinge point linking the two populations. Esper characterized the inhomogeneity as resulting in an “artificially shifted” mean in the early 1800s.

Esper stated that the inhomogeneity is caused by the

“rapidly growing [Trondelag] samples forcing the RCS spline [SM: would be the same for negative exponential] too high for the older, slower growing samples. This feature, namely the significantly deviating growth rates and growth decreases with aging of the Trondelag samples, indicates the existence of a different population in the sense introduced earlier. It also demonstrates the biasing effects of different populations and the fundamental requirement of the RCS method: sample homogeneity”.
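The mechanism Esper describes can be reproduced with a toy simulation. The sketch below builds two synthetic populations with no climate signal at all: an older, slower growing one and a later, faster growing one that barely overlap in time. It then applies a single pooled regional curve (a simple mean by cambial age, standing in for the spline or negative exponential) and compares it with curves estimated separately for each population. All names and parameter values are hypothetical and chosen only to make the effect visible; this is not the Jaemtland data.

```r
## Sketch: hinge-point bias from pooling two inhomogeneous populations in RCS
## Synthetic data, no climate signal: a correct chronology should hover near 1.
set.seed(2)
negexp <- function(age, A, B, C) A + B * exp(-C * age)

make_trees <- function(n, first_years, life, A, B, C, label) {
  do.call(rbind, lapply(seq_len(n), function(i) {
    age <- seq_len(life)
    data.frame(tree = paste0(label, i), pop = label,
               year = first_years[i] + age - 1, age = age,
               rw = negexp(age, A, B, C) * exp(rnorm(life, sd = 0.2)),
               stringsAsFactors = FALSE)
  }))
}

# Older, slow growing population and later, fast growing population
old <- make_trees(30, sample(1500:1650, 30, replace = TRUE), 150, 30, 120, 0.03, "old")
new <- make_trees(30, sample(1780:1830, 30, replace = TRUE), 150, 50, 200, 0.02, "new")
d <- rbind(old, new)

# One-size-fits-all RCS: a single regional curve for everything
rc_pooled <- tapply(d$rw, d$age, mean)
d$index_pooled <- d$rw / as.numeric(rc_pooled[as.character(d$age)])

# Stratified RCS: a separate regional curve for each population
rc_by_pop <- tapply(d$rw, list(d$pop, d$age), mean)
d$index_strat <- d$rw / rc_by_pop[cbind(d$pop, as.character(d$age))]

chron_pooled <- tapply(d$index_pooled, d$year, mean)
chron_strat  <- tapply(d$index_strat, d$year, mean)
years <- as.numeric(names(chron_pooled))

# Mean chronology level before and after the hinge
early_pooled <- mean(chron_pooled[years < 1780])
late_pooled  <- mean(chron_pooled[years >= 1800])
early_strat  <- mean(chron_strat[years < 1780])
late_strat   <- mean(chron_strat[years >= 1800])

print(round(c(early_pooled = early_pooled, late_pooled = late_pooled,
              early_strat = early_strat, late_strat = late_strat), 2))
```

In the pooled version the fast growing trees pull the regional curve up, so the older trees index below 1 and the later trees above 1, producing a step at the hinge with no climate input whatsoever. The stratified version removes the step, although with real data stratification also discards any genuine difference in level between the two periods, which is the trade-off that makes a homogeneity test necessary in the first place.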

There’s nothing in this paragraph that ought to be objectionable to any CA reader – it’s the sort of thing that we’re regularly concerned with. And it’s precisely the sort of thing that I’m wondering about at Yamal.

There are some other interesting features to this graphic. In the 13th century, the sample replication goes down to one core from Tornetrask, inserted into the data set as a bridge. The breakpoint in the RCS time series is pretty obvious. At the start of the record, replication falls below benchmark standards and again there is an obvious breakpoint.

In this particular analysis, Esper is at least attempting to analyse population homogeneity. In contrast, despite its length, the analysis in Briffa and Melvin 2008 fails to deal in any relevant way with the problem of population inhomogeneity (and other Briffa articles are even less helpful.)

When you have inhomogeneous populations, you can’t just add everything together. This is well understood in social science, where analysts have to take care to separate the effects of different factors. Esper’s Jaemtland-Trondelag example provides a foothold for trying to quantify population inhomogeneity within frameworks familiar to and accepted by dendros.

In a forthcoming post, I’ll apply these methods to both the Briffa 2000 Yamal data set and the “moved on” data set with some interesting and perhaps surprising results.

Script: multiproxy/esper/jaemtland.txt


  1. jeff id
    Posted Nov 9, 2009 at 4:45 PM | Permalink

    Region by region the trees are more homogeneous but it still didn’t look very good. Taking the new Briffa data by region the HS was reduced considerably even though the new data had the same 12 series as the original Yamal.

    The black line is the original Yamal- clipped off the top end of the graph. The red curve is all the data in the sensitivity study corrected by region.

    When the whole series is RCS’d together, the curve looks very different. We’ve been wrestling a bit on this issue with delayed.oscillator at tAV. He didn’t like my point that the variance in the 20th century was mostly due to problems with homogeneity and probably not the fact that it got referred to as hockeystickization. It’s something I need to nail down better but from the early results like this figure and several hundred other plots, it didn’t seem unreasonable.

    I’ll look forward to Steve’s next step.

  2. Geo
    Posted Nov 9, 2009 at 4:58 PM | Permalink

    “Consideration of this question as a statistical question has been blurred to date by the lack of understanding within the dendro community of the statistical issues involved in making a chronology (or within the statistical community of the difficult and interesting applied problems that dendros are trying to solve.)”

    Change the parenthetical “or” to an “and”. . . then we can all start a round of “Kumbaya”.

    Yes, I know. . . Steve, as avatar of the statisticians, clearly has been trying desperately hard to understand “the difficult and interesting applied problems” of the dendros, and mostly turned away as not a lodge member for his efforts.

  3. bernie
    Posted Nov 9, 2009 at 5:01 PM | Permalink

    Is this a pretty central issue for building any proxy from multiple trees covering different time periods?
    Wouldn’t any test for homogeneity require a pretty large overlap in the years covered and a sufficient sample size of cores for both sets of data for the common period in order to be able to say that the two measures for the common years are statistically identical? Wouldn’t this type of homogeneity testing already need to be done in order to construct a proxy for a single site from multiple trees covering multiple periods, let alone different species from different locations? Aren’t the problems with stripbarks examples of what happens when homogeneity assumptions are made without testing?

  4. Peter Dunford
    Posted Nov 9, 2009 at 5:46 PM | Permalink

    I searched for a while on-line looking for homogeneity requirements in RCS, to try and understand the restrictions. In Briffa et al 1996 they write that RCS:

    requires large amounts of tree growth data representing a wide range of different tree ages, each distributed widely through time and all drawn from a single species in a relatively small region.

    (Although this quote comes from Briffa 2001 in explaining why they did not use that technique in the paper.)

    This is presumably to ensure the homogeneity that Steve is investigating.

    “Large amounts” and “relatively small” seem to be hurdles that some recent RCS work falls at. No doubt RCS analysis techniques or definitions have improved (moved on?) since 2001.

    I’m still struggling with the idea that there are magic trees that ignore the local growing conditions (light, temperature, nutrients, crowding out, rainfall etc.) and match themselves to somebody’s chosen global climate signal established from broken temperature reconstructions extrapolated and derived from something called the “instrument record” and manipulated the hell out of to show the required trend and then peer-reviewed to death (not). Bright trees. Clever trees. Dead trees. Signally trees. Noisy trees. But always Magic trees.

    • Steve McIntyre
      Posted Nov 9, 2009 at 5:52 PM | Permalink

      Re: Peter Dunford (#4),

      Although this quote comes from Briffa 2001 in explaining why they did not use that technique in the paper.

      In Melvin and Briffa’s more recent articles, they observe that “Age Banded Decomposition” is actually an alter ego for RCS (with very slight variation). Presumably the data met the criteria for neither method.

    • Steve McIntyre
      Posted Nov 9, 2009 at 6:09 PM | Permalink

      Re: Peter Dunford (#4),

      No doubt RCS analysis techniques or definitions have improved (moved on?) since 2001.

      I have seen no evidence of any improvement to (or even consideration of) population homogeneity methodology since Esper et al 2003. Melvin worries a lot about spline shapes, but IMO this is hugely secondary or tertiary to population homogeneity.

  5. Steve McIntyre
    Posted Nov 9, 2009 at 6:03 PM | Permalink

    The script is at http://www.climateaudit.org/scripts/esper/jaemtland.txt

    ## Esper et al 2003
    ## Emulate Jaemtland Comparison

    source("http://data.climateaudit.org/scripts/utilities.txt")
    f=function(x) filter.combine.pad(x)[,2]


    restate=function(tree,info) {
    tree$tree= info$tree[match(tree$id,info$id)]
    tree
    }

    #the counts match reported counts in Esper 2003
    download.file("http://data.climateaudit.org/data/esper/jae.rwl.tab","temp.dat",mode="wb");load("temp.dat")
    # 25249 4
    tapply(info$core,info$site,function(x) length(unique(x) ))
    #545 940 TOR
    # 2 10 1
    #matches Esper 2003: jae: 133= 119 PISYjae + 14 PCAB jae; sk: 24 PISYsk; Tornetrask – 1
    temp= info$site=="940"
    nrow(info) # 158

    ##Stratify the population by making Id list

    strata=list()
    strata[[1]]= info$id[info$site=="940"] #swed023
    strata[[2]]= info$id[info$site=="545"] #norw002
    strata[[3]]= info$id[info$site=="TOR"] #tornetrask


    chron.jae= make.chron(jae,strata[1:2]) #this collects chron for each stratum into a list
    sapply(strata, function(A) max(tree$age[!is.na(match(tree$id,A))]) );
    # swed023 norw002 tornetrask
    # 276 203 201

    data.frame(t(round(sapply(chron.jae$strata, function(A) A$coefficients) ,4) ) )
    # A B C
    #swed023 33.1308 122.7716 0.0340
    #norw002 -74.6444 242.6716 0.0038

    ## PLOTS

    ### COUNTS

    # GDD(file="d:/climate/images/2009/esper/jaemtland_count.gif", type="gif", h=360, w=420)
    title ("Esper JAE Core Counts")
    # dev.off()

    ### Age Dependence Curve of Subpopulations

    # GDD(file="d:/climate/images/2009/esper/jaemtland_dependence_rev.gif", type="gif", h=360, w=360)
    plot.rcs(chron.jae,ylim0=c(0,250),method="biweight" )

    # dev.off()

    # GDD(file="d:/climate/images/2009/esper/jaemtland_composite.gif", type="gif", h=320, w=420)
    details=Details=list(at=seq(25,75,25),ylim=c(0,150),col=rep(c("grey40","grey80"),4) )
    abline(v=c(1789,1828),col="purple",lty=3) # 20%,80% quantiles
    title("Jaemtland Populations")

    # dev.off()

    download.file("http://data.climateaudit.org/data/esper/esper.archived.tab","temp.dat",mode="wb");load("temp.dat")
    range( time(esper.archived)[!is.na(esper.archived[,"Jae"])] )
    #1107 1804

  6. JS
    Posted Nov 9, 2009 at 6:17 PM | Permalink

    I’m quite busy at the moment but might be interested to see what happens if I apply my random effects with lots of dummies method to this data and see a) what the resulting chronology looks like and b) what the standard errors look like c) What the results of a test for population homogeneity give. Is there a nice text file version of the data sitting around somewhere?

    (I wonder if your answers on the CRU 12 will be the same as mine using my non-dendro-approved-but-well-known-to-statisticians technique… I await your posting with bated breath.)

    • Steve McIntyre
      Posted Nov 9, 2009 at 6:57 PM | Permalink

      Re: JS (#8),

      I’ll do up my random effects version as well. I’d like to compare notes.

  7. steven mosher
    Posted Nov 10, 2009 at 12:15 AM | Permalink

    The term “large” used by Briffa begs for some kind of quantification. Some idea of the variances involved and the confidence interval size desired should drive the sampling requirements. Isn’t this just design of experiments 101? Or am I missing something again?

  8. Ron Cram
    Posted Nov 10, 2009 at 12:30 AM | Permalink

    Steve, I love this line:

    In this instance, one wonders whether there is any practical difference between CRU ”stressing” an issue and ”ignoring” it.

    This was the same idea I was thinking but you express it so well!

  9. Chas
    Posted Nov 10, 2009 at 3:48 PM | Permalink

    One feels that this particular comparison could be worked up into a quantitative test (relatively easily if the comparison is between two negative exponential fits.)

    Trondelag doesn’t look to be simple negative exponential growth. Forestry scientists seem to have used a Weibull distribution for growth curve fitting:

    300_colbert.pdf

    So might the Weibull shape parameter work as an index of exponential-ness?

    • JS
      Posted Nov 10, 2009 at 4:39 PM | Permalink

      Re: Chas (#14). With enough data you can fit it without any functional form restrictions at all. Once you’ve done that you can also test for the appropriateness of any given functional form restriction. I’ll bet that both Weibull and exponential are rejected for most data sets due to site-to-site idiosyncrasy even though they might be reasonable approximations of a generic and highly stylised ‘representative tree’.

    • Geoff Sherrington
      Posted Nov 11, 2009 at 2:42 AM | Permalink

      Re: Chas (#14),

      I agree with earlier CA writers, that a continuous down-dip curve like a negative exponential eliminates the case of a tree that started to grow larger tree rings in later life. A more compound type of curve, with more inflexions seems needed – Weibull might be a candidate. The trouble is, from whence does one derive the detail that indicates which style of curve is appropriate?

      Might I venture the answer? From simple experiments that should have been done before many complex papers were published.

  10. MarkB
    Posted Nov 10, 2009 at 4:51 PM | Permalink

    Briffa and Melvin 2008, a highly technical article on standardization which nonetheless doesn’t include a single reference to an article or text by a non-dendro and which shows no awareness of how making a “chronology” ties into statistical literature on random effects (or how testing for population homogeneity relates to the statistical concept of “exchangeability”).

    I’m also suspicious on general principles when authors cite their own work to justify themselves. Such papers are echo chambers.

  11. Rattus Norvegicus
    Posted Nov 10, 2009 at 8:27 PM | Permalink

    Well, Briffa and Melvin 2008 is a chapter from a textbook. I don’t find his modest citing of his own work that odd since he *has* done a large amount of the work developing the RCS method. And of course, Steve is a bit misleading here since the meat of the article is about the possible biases which can be introduced into a chronology.

    Steve: I was quite precise. Do you have trouble reading? Briffa and Melvin did not use the term “inhomogeneity” or the word population in a relevant sense. They discussed a variety of biases in artisanal terms that did not connect to any statistical literature within known statistical canons or cite any text or reference by a statistician.

    • bender
      Posted Nov 10, 2009 at 8:47 PM | Permalink

      Re: Rattus Norvegicus (#17),
      That ain’t the point. They don’t consult the mainstream statistical literature. That’s the point. And that’s a fact. A fact that did not escape notice of Wegman. Nobody learns in Team town. But hey, the science is self-correcting.

  12. Håkan B
    Posted Nov 11, 2009 at 9:27 AM | Permalink

    Where can I find the exact location of the Jaemtland proxy?

  13. Chas
    Posted Nov 11, 2009 at 1:49 PM | Permalink

    Re (JS#15) I was primarily suggesting it as a number that could be used when checking for homogeneity – though I suspect that the parameters from exponential fits could pick out most groupings of tree growth, in say, the Yamal collection .
    On the other hand, if the trees from Trondelag (above) had grown rather more slowly, one might find it hard to pick them out from the Jaemtland trees, in a mixture, using only exponential parameters, and then one is left in the position of saying that it became cold every time one of these trees started to grow.
    -I am guessing that the Weibull shape parameter would separate them.
    Re (Geoff Sherrington #19), After locating/creating the subgroups it perhaps isn’t necessary to use any type of prescriptive curve form and so upticks would tend to sort themselves out. However the boundaries between groups might be very blurred, i.e. does one deal with X13 (in the Yamal ragbag) on its own?

    • Steve McIntyre
      Posted Nov 11, 2009 at 4:25 PM | Permalink

      Re: Chas (#21),

      We’ve seen “Hugershoff” functions mentioned in connection with tree rings and in particular for the Schweingruber data. Hugershoff and Weibull are the same thing.

  14. Håkan B
    Posted Nov 11, 2009 at 4:15 PM | Permalink

    Okay, in Steve’s data I found it to be 63.05 north, well within my guess, but Jaemtland at that latitude stretches from about 12 to 15 in latitude, not much for a Canadian but quite a lot for a European. So does anyone have an idea about the location? I suppose Esper himself didn’t do the sampling.

    • Steve McIntyre
      Posted Nov 11, 2009 at 4:27 PM | Permalink

      Re: Håkan B (#22),
      63 30N; 15 30E according to NCDC information from SChweingruber. This is not always reliable.

  15. Håkan B
    Posted Nov 11, 2009 at 4:30 PM | Permalink

    Håkan B #22 latitude should of course be longitude.

  16. Håkan B
    Posted Nov 11, 2009 at 4:39 PM | Permalink

    Steve # 24
    Quite right not always reliable, this one in the middle of a lake, but thanks anyway!
    But while I have you at it who collected the data originally, if you know?

    • MrPete
      Posted Nov 12, 2009 at 6:32 AM | Permalink

      Re: Håkan B (#26),
      Note that by providing precision only to one minute, the search box for the site covers several km.
      Depending on whether the value was truncated or rounded, you can have fun searching 🙂

      Here’s a link (and the second if you need it), showing the two search areas, assuming the data is correct.

      (The fact that minutes are 30 for both makes me uneasy. It *could* be correct, but we’ve seen huge errors in the past, even off-by 100km.)

      • ianl8888
        Posted Nov 12, 2009 at 10:39 PM | Permalink

        Re: MrPete (#27),

        Why the #%%$## don’t people just use the UTM co-ords?

        Depending on the individual satellite photo strip, I’ve easily been able to zoom Google Earth to exact Siberian drillhole sites off recorded UTM co-ords where you can count the boulders in the stream nearby (only summer photos, of course)

        • Geoff Sherrington
          Posted Nov 13, 2009 at 2:25 AM | Permalink

          Re: ianl8888 (#29),

          There is a 1996 Australian paper that goes into fair depth about why UTM was not used from the start. Countries sometimes chose their own datum points and with later GPS they all had to be tied in together.

          tech_doc.pdf

          It’s not the only useful paper but it should do the job. There can be no single, static projection that satisfies all needs for all time. The earth is plastic and the divergence of some tectonic plates is 100mm a year.

          Other posts on CA have shown how, particularly with satellites, small projection/geodesy errors can have significant effects in climate science.

          In our exploration work, mineral leases that had been pegged along magnetic N-S fencelines in the 1870s were now quite away from the fenceline. “Only the stars never change”.

          It’s probably another discipline like statistics where climate scientists should have their work vetted by experts pre-publication. I do not know how many climate scientists would peg out grids that fail to account for curvature, but unless they had been exposed to the topic it would be easy to go astray.

        • ianl8888
          Posted Nov 13, 2009 at 5:31 PM | Permalink

          Re: Geoff Sherrington (#30),

          Thanks Geoff

          I’ve had an enormous amount of quite frustrating experience trying to collate survey for mapping across about half the world. I agree that compensating for the earth’s curvature is a constant issue. However, UTM gridding has proved the most useful, albeit that the surveyors constantly change the origin

          There is no absolute answer, of course, just more or less useful ones. UTM has worked for me much better than other choices but it ain’t perfect.

        • ianl8888
          Posted Nov 13, 2009 at 5:38 PM | Permalink

          Re: ianl8888 (#32),

          I meant to add that it obviously works least well nearer the poles. I’ve tried it in both far northern Siberia and western Antarctica – the disconnect between adjacent UTM zones due to curvature is just too much

          However, the frustration of people trying to locate the dendro sample sites on Google is clear

  17. Geoff Sherrington
    Posted Nov 12, 2009 at 7:28 PM | Permalink

    It looks like the region has been heavily logged and replanted at different times. Anyone disagree? I wonder when? It’s not uncommon to aerial fertilize plantation trees and sometimes there’s drift outside the target area and there’s stream runoff. Conclusion obvious.

    Also, some of the reforest areas seem quite old. It might be complicated if plantation trees were sampled for dendro, especially if their sources (seed/cutting/sub-species?) were from elsewhere.

    • Håkan B
      Posted Nov 13, 2009 at 11:07 AM | Permalink

      Re: Geoff Sherrington (#28),
      I don’t think there was much logging before the 19th century, as there was no way to get the timber down to the coast. Why it was so you can read here:
      Döda fallet

      • Geoff Sherrington
        Posted Nov 13, 2009 at 11:36 PM | Permalink

        Re: Håkan B (#31),

        Thank you for this info. In this distant part of the world I guess not many know of the lake burst and we are the wiser for your account. Historical accounts like yours help keep the blog interesting and often contain lessons – e.g. be careful of alternative solutions to problems posed by Nature unless you are good at it.

        Re: ianl8888 (#33),

        Did you see a spreading chasm in NE Africa nicknamed “Al Gorge” on WUWT a few days ago? Shows the problems in coordinate mapping a changing feature on the ground as compared with making a global grid reference system.

        We once flew an aircraft briefly over the border from Iran to Afghanistan because of a mapping error. It excited the launch of a MiG fighter.

        • Håkan B
          Posted Nov 14, 2009 at 4:51 AM | Permalink

          Re: Geoff Sherrington (#34),
          I just noted a funny twist on units, measures and translations in the Wikipedia article I linked to.
          It states the lake was 4 km (2.5 mi) long, while the Swedish version states 2.5 mil, but a Swedish mil, as used in spoken language, is 10 km. So the lake was about 25 km long, although probably not in a straight line.
          Here’s an article about Scandinavian miles, but note that it was even worse than described in it:
          Scandinavian mile

        • ianl8888
          Posted Nov 14, 2009 at 2:52 PM | Permalink

          Re: Geoff Sherrington (#34),

          Can’t match that, Geoff – too much 🙂

          The silliest I’ve seen was an entire drilling program on someone else’s lease due to a surveyor disliking working in a hot climate (preferred the air conditioned vehicle). The recipient of this unexpected bounty of information was quite grateful, really

  18. Scott Brim
    Posted Nov 15, 2009 at 3:58 PM | Permalink

    After reading Steve’s exposition at the start of this thread, I went out on Google Main this afternoon with the intention of learning more about the general topic of population inhomogeneity, as it affects statistical analysis, using the following search criteria:

    “population inhomogeneity” statistics definition

    Up pops this very thread at the top of Google’s output list.
    So either a lot of people are reading this thread on the Internet, or perhaps this thread is considered the best real-world example of population inhomogeneity now available in Cyberspace.
