## Briffa et al 2008

Briffa et al (Phil Trans Roy Soc London 2008) is a relatively new emanation from the Team, not previously discussed here, and another example of the discrepancy between what the Team professes in its PR Challenge and what it actually does.

While AGU journals (for example) have a category for “data” papers in which data sets are published, this cannot be said to be a data paper, since no data is archived. It discusses 5 data sets: 3 familiar to connoisseurs of Team multiproxy studies – Tornetrask, Yamal and Taimyr – and two not “traditionally” incorporated into Team multiproxy studies – Finnish Lapland (Helama et al 2002) and Bol’shoi Avam (Sidorova et al 2007). None of these 5 data sets are archived at ITRDB (or, to my knowledge, elsewhere), other than a very small subset of Tornetrask archived by Schweingruber some years ago. I’ve expressed my frustration with the unavailability of measurement data used over and over again and have tried to get journals to require Briffa to disclose the data used in his papers, but so far he’s resolutely refused. (Briffa’s typical excuse is that the data belongs to the Russians, or some other excuse; as far as I’m concerned, who cares? The journal needs to ensure that he has permission to archive the data before they permit him to publish.)

The criteria for data selection – as so often in Team publications – are nowhere stated. His section 3 is entitled “Selected Eurasian Tree Ring Chronologies” and merely notes that the paper is about “selected” long chronologies. Why, for example, is the Bol’shoi Avam data set merged with the Taimyr data set, while the Polar Urals update is not merged with the Yamal data set? (Readers of CA know that the Polar Urals update had a very elevated medieval period.) It would be one thing if Briffa presented arguments as to why the Polar Urals data was no good, but he doesn’t. So what were Briffa’s selection criteria?

One only has to think of prior statements by Briffa and his PR Challenge coauthors to motivate a little concern. PR Challenge coauthor D’Arrigo told an astonished NAS panel that you have to pick cherries if you want to make cherry pie. PR Challenge coauthor Esper wrote (in Esper et al 2003):

this does not mean that one could not improve a chronology by reducing the number of series used if the purpose of removing samples is to enhance a desired signal. The ability to pick and choose which samples to use is an advantage unique to dendroclimatology.

Indeed. But I, for one, find these little cherry picking exercises increasingly absurd. So why isn’t the Polar Urals update used? For that matter, why isn’t the Indigirka River chronology used? Could its elevated MWP be a factor?

There are typically frustrating inconsistencies between the information on the data sets in the citations and in Briffa’s compilation. For example, citing the very recent Sidorova et al (Russ J Ecol 2007), Briffa reports that Bol’shoi Avam had 178 samples yielding a chronology from 851 to 2003. On the other hand, Sidorova et al 2007 states that “a total of 118 samples” were taken, of which 81 were cross-dated, yielding a chronology from 886 to 2003. Is there a typo in Briffa et al, or did they use a version different from the one reported in Sidorova et al 2007? Who knows.

While Briffa does not even archive digital versions of his three regional chronologies, his Figure 3 shows that the only series with a strikingly anomalous 20th century is Yamal – which we’ve discussed over and over at this site.

Original Caption: Briffa et al 2008 Figure 3. Regional curve standardized (RCS) chronologies (thin lines) and smoothed chronologies, the sum of the first three components of singular spectrum analysis of each RCS chronology (thick lines), for the regions: (a) Fennoscandia, (b) Yamal and (c) Avam–Taimyr. The grey shading represents the changing number of samples that go to make up the chronology through time.

While it becomes difficult to make definitive statements in the absence of proper data archiving, Briffa et al 2008 (which includes Grudd as a coauthor) has a different appearance from Grudd’s 2007–2008 update of Tornetrask, previously discussed at CA here, where Grudd contrasted his present chronology with the Grudd et al 2002 version in the figure shown below.

Original Caption: Grudd 2008 Figure 11. In the lower panel (b), Reconstruction IV is compared with two previously published temperature reconstructions based on tree-ring data from Tornetrask: The thin curve is from Briffa et al. (1992) and based on TRW and MXD. The hatched curve is from Grudd et al. (2002) and based on TRW. All three reconstructions have been smoothed with a 100-year spline filter and have a common base period: AD 1951–1970.

The present Briffa version merges Tornetrask with Helama’s (unarchived) Lapland data – is it the merging that causes the difference, or is it different handling of the Tornetrask measurement data by Briffa as compared to Grudd? Who knows. No data is archived.

It’s also hard to reconcile Briffa’s Taymyr–Avam composite with prior images. Here is a smoothed version of Briffa’s Taymyr series from Briffa (2000). Briffa et al 2008 shows a relatively more elevated 20th century; Sidorova et al 2007 doesn’t show an elevated 20th century. Did Briffa re-calculate the results from Sidorova et al 2007 using his own method of adjusting tree rings? Who knows. No data is archived.

Taymyr chronology re-calculated in Briffa 2000 (smoothed).

Briffa et al 2008 introduces methodological variations in its handling of data that differ somewhat from prior dendro articles. For example, to supposedly provide an “objective picture” of long time-scale variations, they filtered the three RCS regional chronologies using singular spectrum analysis (SSA). Later, “in order to assess the changing nature of large-scale average tree-growth variability”, they analyze trends and means from 101-year time windows. Then, “in order to quantify the degree of correspondence in tree-growth trend changes on time scales ranging from multi-decadal to centennial, we have compared the temporal growth patterns across all RCS chronologies using the Kendall’s (1975) concordance coefficient, applied over different moving time windows”, etc. If this is a “correct” way to analyze tree ring data, then shouldn’t they publish some sort of methodological package at http://www.r-project.org or the equivalent, or otherwise show that these methodological variations have some validity relative to other plausible choices?
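For readers unfamiliar with SSA smoothing, here is a minimal sketch of the generic textbook technique (embed into a trajectory matrix, SVD, sum the leading components, diagonally average). This is my own illustration in Python on a toy series, not Briffa’s actual code – which, of course, is not archived:

```python
import numpy as np

def ssa_smooth(x, window, k=3):
    """Smooth a series by summing its first k singular-spectrum components."""
    n = len(x)
    L, K = window, n - window + 1
    # trajectory (Hankel) matrix of lagged copies of the series
    X = np.column_stack([x[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # rank-k reconstruction from the leading components
    Xk = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(k))
    # diagonal averaging maps the matrix back to a series
    rec, counts = np.zeros(n), np.zeros(n)
    for j in range(K):
        rec[j:j + L] += Xk[:, j]
        counts[j:j + L] += 1
    return rec / counts

# toy "chronology": trend + multidecadal cycle + noise
rng = np.random.default_rng(0)
t = np.arange(500)
x = 0.002 * t + np.sin(2 * np.pi * t / 70) + rng.normal(scale=0.5, size=500)
smooth = ssa_smooth(x, window=100, k=3)
```

The caption to their Figure 3 says the thick lines are “the sum of the first three components of singular spectrum analysis of each RCS chronology”; the embedding window length is one of the choices left undisclosed.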

### “Unprecedented”
There is an interesting bit of sleight-of-hand on the “unprecedented” front. The article abstract states:

Using Kendall’s concordance, we quantify the time-dependent relationship between growth trends of the long chronologies as a group. This provides strong evidence that the extent of recent widespread warming across northwest Eurasia, with respect to 100- to 200-year trends, is unprecedented in the last 2000 years.

But it is followed by this odd sentence in the abstract:

An equivalent analysis of simulated temperatures using the HadCM3 model fails to show a similar increase in concordance expected as a consequence of anthropogenic forcing.

Here’s what’s going on. The middle panel of Briffa’s Figure 8 shows the “concordance coefficient” reaching supposedly “unprecedented” levels. Note that this figure, which supposedly motivates the claim, does not show “unprecedented” ring widths, but supposedly “unprecedented” concordance coefficients – not something that we’ve heard about previously as a “fingerprint”. This is described as follows:

In the unsmoothed concordance series, except for the shortest (51 years) window results which clearly show high concordance approximately 900, there is evidence of rising and unprecedented similarity in tree growth across northwest Eurasia in the most recent century. This is accentuated in the smoothed series for 101- and 201-year window lengths.

Briffa et al Figure 8. (b) Kendall’s concordance coefficients … for the unfiltered RCS chronologies calculated for moving windows of 51, 101 and 201 years, and the same data smoothed using the negative-exponential weighted least-squares method;
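The statistic itself is not hard to implement. The sketch below (my own, in Python, using the no-ties textbook formula – not Briffa’s code, since none is archived) computes Kendall’s W over moving windows in the manner described:

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(series):
    """Kendall's W for an (m, n) array: m chronologies over n years (no ties)."""
    m, n = series.shape
    ranks = np.vstack([rankdata(row) for row in series])
    s = ((ranks.sum(axis=0) - m * (n + 1) / 2) ** 2).sum()
    return 12 * s / (m ** 2 * (n ** 3 - n))

def moving_window_w(series, window):
    """Kendall's W in each moving window along the time axis."""
    n = series.shape[1]
    return np.array([kendalls_w(series[:, i:i + window])
                     for i in range(n - window + 1)])

# toy example: three synthetic "chronologies" sharing a common trend
rng = np.random.default_rng(0)
t = np.arange(300)
chron = np.vstack([0.01 * t + rng.normal(size=300) for _ in range(3)])
w101 = moving_window_w(chron, 101)
```

W runs from 0 (no agreement among the series’ rankings) to 1 (identical rankings), which is why the sampling distribution, not the bare value, is what matters for an “unprecedented” claim.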

Briffa then discusses whether such concordance results are model “fingerprints”, reporting that their HadCM3 experiment driven with anthropogenic forcing failed to yield “unprecedented” concordance. Indeed, it barely reached “significant” concordance.

a simple analysis of one such experiment, under natural and GHG forcing for the last 250 years, while showing consistently increasing concordance between simulated temperatures in the regions of our chronologies, failed to produce results that could be distinguished from the results of a similar experiment driven only with natural (i.e. non-anthropogenic) forcings.

The concordance values clearly increase steadily throughout the duration of the all forcings simulation, but the magnitude of the values is low, even by the end of the experiment. Indeed even the maximum concordance values calculated for the series 101-year windows reach only just above 0.3, barely significant, while values approaching 0.4 occur in the naturally forced experiment. These results imply either that an interpretation of strong external forcing of recent widespread high warmth over northern Eurasia, perhaps the consequence of increased atmospheric GHGs, cannot be supported or, alternatively, that this particular GCM simulation of the last 250 years is not consistent with the observational temperature and dendroclimatically implied evidence of unusual warming that has been experienced in the real world.

a simple analysis of one such experiment, under natural and GHG forcing for the last 250 years, while showing consistently increasing concordance between simulated temperatures in the regions of our chronologies, failed to produce results that could be distinguished from the results of a similar experiment driven only with natural (i.e. non-anthropogenic) forcings

How did they go from this statement to the statement in the abstract that:

Kendall’s concordance … provides strong evidence that the extent of recent widespread warming across northwest Eurasia, with respect to 100- to 200-year trends, is unprecedented in the last 2000 years.

It makes no sense whatever. Of course, we’ve already seen Briffa’s Cargo Cult explanation of divergence, so why wouldn’t we expect another cargo-cult jump in logic?

Briffa et al. (1998b) discuss various causes for this decline in tree growth parameters, and Vaganov et al. (1999) suggest a role for increasing winter snowfall. In the absence of a substantiated explanation for the decline, we make the assumption that it is likely to be a response to some kind of recent anthropogenic forcing. On the basis of this assumption, the pre-twentieth century part of the reconstructions can be considered to be free from similar events and thus accurately represent past temperature variability. [Briffa et al. 2002]

As the poets say:

Cherry trees have tasty fruit;
And pickers need dexterity;
But not as much as paleos,
Who claim unprecedentity.

Reference:
Briffa, K.R., T.M. Melvin, E. A. Vaganov, et al. 2008. Trends in recent temperature and radial tree growth spanning 2000 years across northwest Eurasia. Philos Trans Roy Soc Lond B.

1. Stan Palmer
Posted Jul 14, 2008 at 4:52 PM | Permalink

Has anyone ever done a study in which they normalized the extracted temperature to the non-anthropogenic forcings? The quotations from the Briffa paper sound like they did something similar to this. I wonder what the extracted temperature of the Mann hockey stick would look like if it were normalized to non-human forcings – e.g. historic volcanic eruptions.

2. Posted Jul 14, 2008 at 5:21 PM | Permalink

Steve:

For example, to supposedly provide an “objective picture” of lone time-scale variations, they filtered the three RCS regional chronologies using singular spectrum analysis (SSA) filtering. Later, “in order to assess the changing nature of large-scale average tree-growth variability”, they analyze trends and means from 101-year time windows. Then “in order to quantify the degree of correspondence in tree-growth trend changes on time scales ranging from multi-decadal to centennial, we have compared the temporal growth patterns across all RCS chronologies using the Kendall’s (1975) concordance coefficient, applied over different moving time windows”

Woah! Has Keith B been playing with the options in the stats package? Does any of this make sense to a statistician?

Steve: I can’t see why the smoothing choice would matter much. As to Kendall’s coefficient, I guess my issue is whether “novel” statistical methods should be introduced in an empirical study where they conclude that something “unprecedented” happened. My preference would be that they cite the use of Kendall’s coefficient for this or a similar purpose in a standard study, then show that the method works on data sets with known properties and then show its results on an empirical test in a follow-up article. Every article seems to have a different poorly documented method, not to mention different unarchived data.
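To illustrate the kind of benchmark meant here, a sketch (my own, in Python) of what Kendall’s W does on mutually independent red-noise series with known properties:

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(series):
    """Kendall's W for an (m, n) array of m series over n years (no ties)."""
    m, n = series.shape
    ranks = np.vstack([rankdata(row) for row in series])
    s = ((ranks.sum(axis=0) - m * (n + 1) / 2) ** 2).sum()
    return 12 * s / (m ** 2 * (n ** 3 - n))

def ar1(n, phi, rng):
    """AR(1) red noise with lag-one coefficient phi."""
    x = np.zeros(n)
    e = rng.normal(size=n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + e[i]
    return x

# null distribution of W: 5 independent AR(1) series, 101-"year" windows
rng = np.random.default_rng(42)
null_w = np.array([kendalls_w(np.vstack([ar1(101, 0.7, rng)
                                         for _ in range(5)]))
                   for _ in range(500)])
null_mean, null_q95 = null_w.mean(), np.quantile(null_w, 0.95)
```

Under independence the expected W is 1/m (0.2 for five series), and autocorrelation widens the spread; a baseline like this is what one would want to see before any observed value is called “significant”, let alone “unprecedented”.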

3. Scott-in-WA
Posted Jul 14, 2008 at 5:55 PM | Permalink

John A: Woah! Has Keith B been playing with the options in the stats package? Does any of this make sense to a statistician?

Well, it would certainly make sense to an accomplished dendro data chiropractor.

On the other hand, even if you are a reputable statistician and you can get a look at the actual data, you still can’t replicate the statistical analysis, regardless of your experience and qualifications.

4. George M
Posted Jul 14, 2008 at 5:59 PM | Permalink

It becomes more difficult, with each additional paper, to continue to regard these dendros as scientists. There must be another category somewhere they fit into. Some kind of lumberjacks?

5. jae
Posted Jul 14, 2008 at 6:04 PM | Permalink

After so much has been said about archiving, it is very disconcerting to see yet another paper ignore it. And I’m still as shocked as ever to see research scientists believe it is acceptable to “cherry-pick” samples that exhibit the very patterns that they are supposedly trying to “discover.”

6. Dave Dardinger
Posted Jul 14, 2008 at 6:04 PM | Permalink

George,

They’re the dendrojacks and they’re ok.

7. Craig Loehle
Posted Jul 14, 2008 at 6:17 PM | Permalink

Am I correct in interpreting the concordance as showing that the trees across the region showed more similar growth to one another? This would not necessarily imply warming, but merely that the climate across the region was more uniform–it could even be uniformly better.

8. Bernie
Posted Jul 14, 2008 at 6:28 PM | Permalink

Craig:
Is not concordance simply a measure of association for non-continuous or ordinal variables? Does not the latter suggest that other measures have to be in line with this assumption as to the type of metrics being used?

9. Colin Davidson
Posted Jul 14, 2008 at 6:43 PM | Permalink

Thank you Steve, for your hard work in bringing this lousy work to attention.

Where the results cannot be replicated (and that is always the case when either the method or the data is not made available) then they cannot be falsified. If the results cannot be falsified then they cannot be “Science” nor can the authors be considered to be “scientists”. Rather, non-replicable, non-falsifiable work should be classified as “art(ifice)” and the authors as “artists”.

In this case neither the methods nor the data are available. The work is non-replicable and non-falsifiable. The authors, having produced art, are artists, not scientists.

10. Steve McIntyre
Posted Jul 14, 2008 at 6:53 PM | Permalink

Folks, there are lots of things wrong with this sort of stuff, but no need to go a bridge too far in editorializing. The facts speak loudly without a lot of piling on.

Having said that, you really do wonder what the “point” of an article is which neither discloses a data set nor provides a clear methodology, other than to moralize a little about something being “unprecedented”.

11. tom s
Posted Jul 14, 2008 at 7:26 PM | Permalink

Briffa et al. (1998b) discuss various causes for this decline in tree growth parameters, and Vaganov et al. (1999) suggest a role for increasing winter snowfall. In the absence of a substantiated explanation for the decline, we make the assumption that it is likely to be a response to some kind of recent anthropogenic forcing. On the basis of this assumption, the pre-twentieth century part of the reconstructions can be considered to be free from similar events and thus accurately represent past temperature variability. [Briffa et al. 2002]

NOW THAT’S SCIENCE FOLKS! Sheesh. How do they get away with this? I mean, really now.

12. Ian McLeod
Posted Jul 14, 2008 at 7:28 PM | Permalink

It’s curious how a group of professionals like Briffa et al in their 2008 paper can say one thing in the abstract—a summary that is widely read by the dendro community—and then contradict the abstract in the central portion of their paper as they describe the empirical evidence.

In engineering, this type of error is referred to as an attribution error. Your thinking is guided by a prototype or model. You fail to consider other possibilities that may contradict the prototype, and by doing so attribute indicators to the wrong cause. This is why it’s dangerous to go with your gut when you are attempting to do good science, as I’m sure Briffa et al were attempting to do. That said, it does not explain the disconnect between what was summarized in the abstract and the explanation in the main body of their paper. A misanthropist might think there was an agenda.

13. John Lang
Posted Jul 14, 2008 at 7:44 PM | Permalink

I don’t understand why they keep going back to the tree-ring trough again and again to show the past 50 years of warming are some kind of significant aberration – the global warming “proof”.

We know there has been some warming in the last 50 years – the last 150 years. We know there are cycles in the climate. We have the ice core δ18O data to rely on for the past cycles of the climate, including the past 1,000 years.

The only rationale for going back to the empty trough again and again is to somehow resurrect Mann’s reputation or to keep trying to do away with the Medieval Warm Period and the Little Ice Age.

Judging by how often the Little Ice Age continues to be mentioned in the academic community and in the media, they have failed and they should just give up.

14. Geoff Sherrington
Posted Jul 14, 2008 at 8:09 PM | Permalink

Steve, magnificent forensic work again by you, another poor attempt at science revealed. These guys need more than professional statistical input as Dr Wegman suggested, they need professional input from plant nutritionists, arborists, methodological scientists and others.

There is a danger of oversaturating CA readers with example after example of underperformance in AGW. For myself, it is becoming weary to study and absorb the finer points; weary because one cannot offer a suggestion for improvement when lacking the data base to so do.

There will come a time when the dendro boil needs to be lanced again (I thought you had done it a couple of times). Is there scope for a questionnaire to members of dendro societies, pointing out fundamental problems and asking if they agree/disagree? It’s not conclusive, but it might guide your feeling that some younger dendros are alert. You might catalyse the emergence of two camps, one of which finds it inappropriate to select samples that suit.

15. Andrew
Posted Jul 14, 2008 at 9:14 PM | Permalink

Something is “unprecedented”, even if we can’t find it, so we’ll say “it’s unprecedented”, not elaborate, then bury the fact that models do not predict such unprecedented-ness in the paper…

My word, people seem to be losing their heads.

16. jae
Posted Jul 14, 2008 at 9:22 PM | Permalink

Look on the bright side: this saga will eventually be used as a prime example of several pitfalls in modern science: 1) use of poor scientific methodology; 2) some important pitfalls associated with peer review by your own colleagues (a la Wegman); 3) political influence on science and scientists; 4) the hopelessness of “stonewalling” against the inevitable truth; 5) the unnecessary grief caused by failures to archive data and methods. Wow.

17. Jeff A
Posted Jul 14, 2008 at 10:25 PM | Permalink

This will be yet another paper which is considered “proof” of AGW, with no one on the AGW side bothering to acknowledge its uselessness. But let there be a single paper which points to non-AGW climate effects, and they’re all over it for every single minuscule thing that even MIGHT be wrong.

18. Posted Jul 14, 2008 at 10:26 PM | Permalink

The Royal Society’s policy on data is:

“As a condition of acceptance authors agree to honour any reasonable request by other researchers for materials, methods, or data necessary to verify the conclusion of the article.”

I assume that Steve has been through the loops of requesting data on previous Royal Society papers?

19. Steve McIntyre
Posted Jul 14, 2008 at 10:29 PM | Permalink

Haven’t been refused by the Royal Society yet – just by Nature, Science, Proc Nat Acad Sci, Clim Chg, Holocene, JGR, … Maybe they’re different. I’ll try.

20. Willis Eschenbach
Posted Jul 14, 2008 at 10:51 PM | Permalink

The Kendall Coefficient of Concordance is the average of the (pairwise) Spearman Rank coefficients of the datasets. As such, it must have a standard error of the mean (SEM) of (presumably) sigma over sqrt(N-1), to put error bars around it (independently of the error bars in the original calculation of the Spearman R coefficients). At least that’s how I understand it. (N in this case is 10, one for each of the pairwise combinations of the 5 series.)

Of course, in the best tradition of modern “climate science”, they have neglected to show us these error bars. I gotta confess, I’m tired of claims of “unprecedented” which don’t have error bars.

The other oddity regarding Fig. 8 is that the concordance windows of 51, 101, and 201 years all peak at about 1910–1920, with both the 51- and 101-year concordances dropping after that. Exactly how does a 1910 peak followed by a steady decrease in concordance translate into “evidence of rising and unprecedented similarity in tree growth across northwest Eurasia in the most recent century”?

w.

PS – in the R documentation, I note the following:

A test for the significance of Kendall’s W [Concordance] is only valid for large samples.

Since the sample size in this case is only 5 different series, I suspect that this may not be a “large sample”.
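A footnote on the first paragraph above: strictly speaking, Kendall’s W is a linear rescaling of the average pairwise Spearman coefficient rather than the average itself; with m series and no ties, the mean pairwise Spearman equals (mW - 1)/(m - 1). A quick numerical check (my own sketch, in Python):

```python
import numpy as np
from itertools import combinations
from scipy.stats import rankdata, spearmanr

def kendalls_w(series):
    """Kendall's W for an (m, n) array of m series over n items (no ties)."""
    m, n = series.shape
    ranks = np.vstack([rankdata(row) for row in series])
    s = ((ranks.sum(axis=0) - m * (n + 1) / 2) ** 2).sum()
    return 12 * s / (m ** 2 * (n ** 3 - n))

rng = np.random.default_rng(1)
m, n = 5, 200
data = rng.normal(size=(m, n))    # continuous draws, so no ties

w = kendalls_w(data)
rho_bar = np.mean([spearmanr(data[i], data[j])[0]
                   for i, j in combinations(range(m), 2)])

# identity (no ties): mean pairwise Spearman = (m*W - 1)/(m - 1)
assert abs(rho_bar - (m * w - 1) / (m - 1)) < 1e-8
```

The rescaling doesn’t change the substantive point: W inherits sampling error from the pairwise rank correlations, so error bars are mandatory.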

21. MartinGAtkins
Posted Jul 14, 2008 at 10:56 PM | Permalink

[Ancient tree rings contain a year-by-year record of past 14C abundance. During the period AD 1650-1750, 14C content is anomalously high in tree rings. This is called the DeVries effect, and it is exactly what would be expected during a long sunspot minimum.](http://academic.emporia.edu/aberjame/ice/lec20/lec20.htm)

Briffa et al. (1998b) discuss various causes for this decline in tree growth parameters, and Vaganov et al. (1999) suggest a role for increasing winter snowfall. In the absence of a substantiated explanation for the decline, we make the assumption that it is likely to be a response to some kind of recent anthropogenic forcing.

This is appalling methodology.

When Briffa et al have finished massaging numbers and graphs perhaps they could get down to some serious science.

22. old construction worker
Posted Jul 14, 2008 at 11:13 PM | Permalink

“That said, it does not explain the disconnect between what was summarized in the abstract and the explanation in the main body of their paper.”
LOL Isn’t that the way science is done now? Write the summary first.

23. Posted Jul 15, 2008 at 1:37 AM | Permalink

Bender was right. A Journal of Statistical Climatology would fix things like this.

24. Posted Jul 15, 2008 at 2:13 AM | Permalink

So, global warming? Happening? Not? Scam? There’s definitely climate change out there, but is it going to wreak havoc? I saw a press release last week that claimed global warming would increase the incidence of kidney stones in the USA… now that’s really stretching a medical grant application as far as it could possibly go, isn’t it?

25. Pierre Gosselin
Posted Jul 15, 2008 at 3:30 AM | Permalink

The publication of science has reached new lows. Is it really too much to ask that publications apply at least a few basic standards and requirements before they publish a paper? At university, students used to be given an “F” for this kind of work. And as Steve wishes, I’ll refrain from expanding on Dave Dardinger’s post, where I’d use the term NUMBERJACK.

What’s left to do other than editorialize? Could someone at least provide the e-mail address of the publication so that I can at least express my displeasure?

26. Steve McIntyre
Posted Jul 15, 2008 at 5:34 AM | Permalink

Please stop describing this article as a “new low”. It isn’t. It’s a routine Briffa article. All of them have pretty much the same pattern.

I guess what annoyed me was the hypocrisy of the recent PR Challenge announcement and the actuality of no data archived together with idiosyncratic statistical methods neither established in statistical literature nor benchmarked in the article, culminating in yet another proclamation of unprecedentity.

27. Craig Loehle
Posted Jul 15, 2008 at 5:57 AM | Permalink

The concordance of growth across the region and negative growth = widespread divergence (lower growth than predicted by the treemometer), as also indicated by the lack of concordance with the climate model results. But it seems they violated several of the statistical rules (sample size, proper tests, confidence intervals).

28. Dave Dardinger
Posted Jul 15, 2008 at 7:03 AM | Permalink

Steve,

Can’t we at least call it an “unprecedented” approach to scientific legerdemain?

Steve: Nope. It’s hardly “unprecedented”.

29. Tom
Posted Jul 15, 2008 at 8:29 AM | Permalink

What is the issue with the tree rings?

Were tree rings used by the team to reconstruct the MWP temperatures?

Are these tree rings validated by comparing them to current temperatures? (Thus validating their use in reconstructing past temps.)

30. bender
Posted Jul 15, 2008 at 8:41 AM | Permalink

you really do wonder what the “point” of an article is which neither discloses a data set nor provides a clear methodology, other than to moralize a little about something being “unprecedented”.

If the extent of warming in Eurasia was “unprecedented”, this would be a significant observation. (Briffa et al are obsessed with the spatial extent of the current warming trend as an index of GHG AGW. The belief is that the extent of warming during the MWP was less than global.)

But look at the concordance coefficient curve in the original post, and note how it peaks during the MWP at a level roughly similar to that observed today. Given that these “selected” chronologies are a sample from a population, the concordance coefficient must have a statistical distribution. So where are the confidence intervals on those concordance curves? Who reviewed this paper? Not Wegman and not bender, that’s who.

IMO the observed level of concordance is not significantly different than during the MWP, if you were to account for random sample error (not to mention selection bias). i.e. It is probably not “unprecedented”.

31. Clark
Posted Jul 15, 2008 at 9:05 AM | Permalink

My favorite aspect of dendrology is when they decide to ignore many tree rings in the second half of the 20th century because the results do not match their model. Just because.

Just jaw-dropping.

32. NeedleFactory
Posted Jul 15, 2008 at 9:18 AM | Permalink

Re Colin (#9)

If the results cannot be falsified then they cannot be “Science” nor can the authors be considered to be “scientists”. Rather, non-replicable, non-falsifiable work should be classified as “art(ifice)” and the authors as “artists”.

Your devastating remark is insightful and amusing — a great comment. Thanks! But I have a question.  By my understanding, Karl Popper introduced falsifiability to define a demarcation between science and religion, not between science and art.  I have no quibble with the latter, but am curious about it and wonder if the pairing is your own or if you can refer me to that distinction elsewhere.

Using Popper’s original formulation, the authors are not scientists, but rather “believers”.

33. jae
Posted Jul 15, 2008 at 9:25 AM | Permalink

Are these tree rings validated by comparing them to current temperatures? (Thus validating their use in reconstructing past temps.)

No, it would be silly to do that (inside joke).:) Seriously, this is one of the primary problems with the “science.” In the few cases that the tree-ring proxies have been brought up to date, they are showing a divergence from the “theory.” There’s a lot of info. on this in various threads here.

34. Posted Jul 15, 2008 at 9:53 AM | Permalink

Please stop describing this article as a “new low”. It isn’t. It’s a routine Briffa article. All of them have pretty much the same pattern.

Ouch!

35. Craig Loehle
Posted Jul 15, 2008 at 10:21 AM | Permalink

NeedleFactory: Popper was distinguishing between science and pseudoscience (e.g. astrology, Freudianism) where “experts” claim all sorts of ability or results but all post hoc (explaining but never predicting).

36. Kenneth Fritsch
Posted Jul 15, 2008 at 10:48 AM | Permalink

Folks, there are lots of things wrong with this sort of stuff, but no need to go a bridge too far in editorializing. The facts speak loudly without a lot of piling on.

Having said that, you really do wonder what the “point” of an article is which neither discloses a data set nor provides a clear methodology, other than to moralize a little about something being “unprecedented”.

I would agree that a sufficient take on climate science articles such as this one is what one takes away from doing some detailed analyses on it such as you have completed here. This article under proper scrutiny, in my view at least, does little to support the HSish temperature reconstructions and in the end can only make a generalized conjecture that modern day concordance is “unprecedented”.

I think I see a trend in these papers whereby, when the authors have published something that could be construed as somewhat contradictory evidence, or at least evidence invoking more uncertainty, for the prevailing POV on these climate issues, they feel obligated to wave a subjective “unprecedented” or a concession to the big A in AGW about as a gesture indicating that their personal views remain unchanged. I think once one understands these characteristics, one can learn from what the articles contain and what they do not (but should).

37. Steve McIntyre
Posted Jul 15, 2008 at 11:08 AM | Permalink

#36. Kenneth, I partly agree with you. When I read Chu (1973), and mentioned this in a post, I was struck by his genuflections towards Chairman Mao in an otherwise technical article on historical climate change; there’s some of that going on here with genuflections to “unprecedented AGW” instead of Chairman Mao.

I wouldn’t mind the mandatory salutation if the article published some data that could be incorporated into scientific literature or published a methodological package that showed its applicability and could perhaps be used on other sites – something that one could use. But what’s usable in this article except the abstract? There isn’t any data published here nor any useful methodology. So what exactly is there left except the genuflection?

38. bmcburney
Posted Jul 15, 2008 at 11:50 AM | Permalink

Someone please correct me if I am wrong, but if there were an anthropogenic component to increased “concordance coefficients”, isn’t it more reasonable and logical to ascribe the forcing to increased CO2 levels than to temp change? As I understand it, greater efficiency in water use is one of the observed effects of CO2 fertilization in trees. If increased CO2 levels increased growth of water-stressed trees without increasing the growth of trees with an ample water supply, this should increase the “concordance coefficients” by reducing one source of variability. It would also tend to explain the increased “concordance” during the MWP, since increased warmth during the MWP should have increased atmospheric CO2.

I have always been told that AGW will increase climate variability (i.e., more floods and more droughts), not reduce it.

39. Willis Eschenbach
Posted Jul 15, 2008 at 2:11 PM | Permalink

It also strikes me that as we look further and further into the past, the mis-datings between the tree ring records will gradually increase.

As a result, we would expect that a windowed Kendall’s Concordance Coefficient would show an increase over time. And in fact, this is generally what we see in their Figure 8.

Also, I suspect that the Kendall Coefficient will prove to have a very wide range, given (a) autocorrelation, and (b) a tiny sample size. I’ll do some Monte Carlo simulations in this regard when I get a chance.

w.

40. RomanM
Posted Jul 15, 2008 at 2:27 PM | Permalink

CRU seems to be misdirecting with regard to this paper as well! :)

I went looking for it at the site:

http://www.cru.uea.ac.uk/cru/pubs/byauthor/briffa_kr.htm

When you click the link for the paper, you get something quite different.

41. bernie
Posted Jul 15, 2008 at 2:52 PM | Permalink

I think it is a poorly defined link. If you search for Briffa on the standard search area for the page it will show up – but it will cost you an arm and a leg for the article.

42. Willis Eschenbach
Posted Jul 15, 2008 at 3:11 PM | Permalink

OK, some preliminary results. Here’s the program I used:

library(irr)  # provides kendall()

myn=2000
x=c(1:(5*myn))
dim(x)=c(myn,5)
mytot=0
mysig=0
myar=.9
myma=-.2

for (i in 1:100){
  x[,1]=arima.sim(n=myn, list(ar=c(myar), ma=c(myma)))
  x[,2]=arima.sim(n=myn, list(ar=c(myar), ma=c(myma)))
  x[,3]=arima.sim(n=myn, list(ar=c(myar), ma=c(myma)))
  x[,4]=arima.sim(n=myn, list(ar=c(myar), ma=c(myma)))
  x[,5]=arima.sim(n=myn, list(ar=c(myar), ma=c(myma)))
  if (i==1) {
    mytot=kendall(x)$value
    mysig=kendall(x)$p.value
  } else {
    mytot=c(mytot, kendall(x)$value)
    mysig=c(mysig, kendall(x)$p.value)
  }
}
average(mytot)     # average() is a user-defined wrapper for mean()
sum(mysig < .05)   # how many of the 100 runs are "significant" at 0.05

“kendall” is the Kendall Concordance, in the R package “irr”. What I am doing is generating 5 random pseudo-proxies which are 2000 “years” long, then calculating their Kendall Concordance and its significance.

If I set the AR coefficient to 0.5 or less, I get about the expected amount of false significance at the (p less than 0.05) level: one in 20, or about 5%, what we’d expect.

On the other hand, setting the AR coefficient up to 0.9 jacks the false significance at the 0.05 level up to about 25% – 30% … in other words, autocorrelation does exaggerate the significance levels greatly.

In all cases, the Kendall Concordance for the ARMA models is about 0.2 … and curiously, the Kendall Concordance for five perfectly random series (using “rnorm”, N=2000) is about the same.

Since their results (see Fig. 8 above) often go below 0.2, this means that there is often less concordance between their series than between purely random series …

If we had real data I could give a more accurate answer, but sadly …
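For anyone following along without R, here is a rough Python sketch of the same check (the helper names and the chi-square significance approximation are mine, not necessarily what the irr package does internally):

```python
import numpy as np
from scipy import stats

def kendalls_w(x):
    """Kendall's W for an (n, m) array: n time steps, m series.
    Significance via the usual chi-square approximation (no tie correction)."""
    n, m = x.shape
    ranks = np.apply_along_axis(stats.rankdata, 0, x)  # rank each series over time
    s = ranks.sum(axis=1)                              # rank sums per time step
    w = 12.0 * np.sum((s - s.mean()) ** 2) / (m ** 2 * (n ** 3 - n))
    p = stats.chi2.sf(m * (n - 1) * w, n - 1)
    return w, p

def ar1(n, phi, rng):
    """A simple AR(1) pseudo-proxy with unit-variance innovations."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

rng = np.random.default_rng(0)
n, m, reps = 500, 5, 200
false_pos = sum(
    kendalls_w(np.column_stack([ar1(n, 0.9, rng) for _ in range(m)]))[1] < 0.05
    for _ in range(reps)
)
print(false_pos / reps)  # empirically well above the nominal 5% when phi is large
```

The counts won’t match the ARMA(1,1) runs above exactly, but the qualitative point – autocorrelation inflating apparent significance – survives the translation.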

w.

… If we had ham we could have ham and eggs for breakfast … if we only had eggs …

43. bernie
Posted Jul 15, 2008 at 3:55 PM | Permalink

Willis:
This looks pretty elegant. Given the extended debate over auto-correlation in these types of time series, why wouldn’t they have investigated this?
I looked briefly at Briffa’s co-author Thomas Melvin’s PhD thesis – Historical Growth Rates and Changing Climatic Sensitivity of Boreal Conifers – and there are a number of discussions of how he and Briffa, his advisor, presumably think about auto-correlations. One other thing that surprised me is that I found reference to only a single pure statistics article or book – Mendenhall et al, 1990. I may have missed others, but given the substance of the topic I expected many more references. The way Mendenhall was cited also seemed odd to me – but then I am technically out of my depth.

44. Colin Davidson
Posted Jul 15, 2008 at 3:58 PM | Permalink

Re: Needlefactory #32

All my own thoughts, but I would not have thought all that original. It seems a shame that work done by able people is devalued by their own poor practice in failing to archive the data or disclose the method.

45. Kenneth Fritsch
Posted Jul 15, 2008 at 6:45 PM | Permalink

Re: #37

I wouldn’t mind the mandatory salutation if the article published some data that could be incorporated into scientific literature or published a methodological package that showed its applicability and could perhaps be used on other sites – something that one could use. But what’s usable in this article except the abstract? There isn’t any data published here nor any useful methodology. So what exactly is there left except the genuflection?

In my mind, the genuflection, in effect, indicates that the authors have a point to make and surely would make every effort to present available evidence to substantiate that point. When they fail in the manner of Briffa et al. 2008, one can arrive at one of two available conclusions: one about the evidence and one (not so flattering) about the authors.

46. Willis Eschenbach
Posted Jul 15, 2008 at 8:44 PM | Permalink

Well, while waiting for Briffa to release his data, I thought I’d take a look at my speculation that the trend in the concordance line (in Fig. 8 above) could be caused by dating errors.

Accordingly, I made up five 2000-year pseudo-proxies which are related (concordance ~0.6). Then I added two one-year dating errors in each of the following intervals

Proxy 1, Years 1 – 350
Proxy 2, Years 1 – 700
Proxy 3, Years 1 – 1050
Proxy 4, Years 1 – 1400
Proxy 5, Years 1 – 1650

Note that this only represents an error rate of two errors for every 2000 data points, or a 0.2% error rate. This is likely low by real world standards. In addition, I have limited the error to being exactly one year. In the real world, multi-year errors would not be uncommon.

Thus each pseudo-proxy contained two one-year dating errors, with more errors in the earlier stages of the record. Here is the result:

Figure 1. Kendall Concordance for 5 random pseudo-proxies (blue line), with 0.05 significance level shown in red. 51-year moving window.

This gives us “unprecedented” concordance in recent times. As few as two single-year dating errors per pseudo-proxy has produced a distinct slope to the trend line, a slope that is very similar to their Figure 8. Accordingly, when using real data, I would expect the concordance to increase as the year gets closer to the present time. In addition, the autocorrelation (AR = 0.9, MA = -0.2) causes a very large number of false positives in the significance test.
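The mechanism is easy to see in isolation. A minimal Python illustration (not the R code used here; the split point and AR parameter are arbitrary choices of mine): shifting the early part of an autocorrelated series by a single year visibly lowers its rank correlation with the unshifted original, while the correctly dated later part stays perfect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, phi, split = 2000, 0.9, 300

# an AR(1) pseudo-proxy standing in for one tree ring series
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

# a single one-year dating error confined to the first `split` years
y = x.copy()
y[:split - 1] = y[1:split]

rho_early, _ = stats.spearmanr(x[:split], y[:split])  # misdated segment
rho_late, _ = stats.spearmanr(x[split:], y[split:])   # correctly dated segment
print(rho_early, rho_late)  # rho_early < rho_late = 1
```

Since the simulated errors above are concentrated in the early record, the windowed concordance is depressed early and rises toward the present, producing the trend.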

However, without data, we can’t say much more than that …

w.

More code, same variables as before, yes, I know, it’s kind of kludgy, but …

expand=3
wind=51     # 51-year moving window, as in Figure 1

x2=x

# five related pseudo-proxies: one common ARMA series plus noise
x2[,1]=arima.sim(n=myn, list(ar=c(myar), ma=c(myma)))
x2[,2]=x2[,1]+rnorm(myn)*expand
x2[,3]=x2[,1]+rnorm(myn)*expand
x2[,4]=x2[,1]+rnorm(myn)*expand
x2[,5]=x2[,1]+rnorm(myn)*expand

x=x2

# two one-year dating errors per proxy, confined to the early part of each record
for (i in 1:5){
  split=as.integer(runif(1,1,i*350))
  x[1:(split-1),i]=x[2:split,i]
  split=as.integer(runif(1,1,i*350))
  x[1:(split-1),i]=x[2:split,i]
}

# windowed Kendall concordance and significance
for (i in 1:(myn-wind)){
  if (i==1) {
    mytot=kendall(x[i:(i+wind-1),])$value
    mysig=kendall(x[i:(i+wind-1),])$p.value
  } else {
    mytot=c(mytot, kendall(x[i:(i+wind-1),])$value)
    mysig=c(mysig, kendall(x[i:(i+wind-1),])$p.value)
  }
}

plot(mytot, type="l", main="Kendall Concordance Monte Carlo", ylim=c(.1,.7),
  xlab="year", ylab="concordance", las=1, col="Blue")

zmore=mysig > .05
zless=mysig < .05

47. bender
Posted Jul 15, 2008 at 9:51 PM | Permalink

the trend in the concordance line (in Fig. 8 above) could be caused by dating errors

It is unlikely that there are dating errors of that frequency or magnitude.
The real issue here is hypocrisy over stated and actual policies on data documentation and methodological transparency and replicability. (Plus the weak statistics of unprecedentedness.)

48. Willis Eschenbach
Posted Jul 15, 2008 at 11:06 PM | Permalink

bender, your comment is excellent as always. However, see my #46. A couple of one year errors in each of the five proxy datasets easily introduces a trend into the results. These are two very minor errors per set, and are all in the same direction. Despite the small size and the common direction, they still create the trend. In a real dataset, I would expect a couple of dating errors in two thousand years of overlapping trees. I would also expect those errors to be more common the further back that you go.

Of course, the real issues are, as you say, accountability, replicability, and transparency.

However, without even statistical error bars on the Kendall Concordance numbers, much less actual error bars, we can’t address the question of the weak statistics. Even if they didn’t archive the data, they should have put on the error bars.

w.

49. Posted Jul 16, 2008 at 12:19 AM | Permalink

@Willis #42 and 46,

OT

I am learning R in order to follow some of the issues and comments raised here. I entered your scripts into my copy running on a Windows machine. It is barfing on the function average(). Apparently this function isn’t included in the base installation. I have read Steve’s R page, the help files included with R, and some of the resources linked by Steve on the R page without success. My Google-fu is usually quite good, but this time the results were not very helpful. I can substitute the mean() function for your average() function if this is the typical usage of average.

50. Willis Eschenbach
Posted Jul 16, 2008 at 2:39 AM | Permalink

cdquarles, I learned R at the urging of Steve M, and it has been very valuable. I’m still a beginner.

One of the beauties of R is that you can define your own functions. I have mine all in one file, there’s maybe 50 of them. Some of them are just aliases, because I’m used to using say “stdev” (from Excel) rather than sd (from R).

I also use them for simplicity. The “mean” function chokes on missing data (NA). I rarely want that behavior, so I’ve defined a function:

average = function (x) mean(x,na.rm=T)

Otherwise, I always forget to include the “na.rm=T” that makes it ignore missing values.

So, my bad, used a user-defined function without thinking about it.

All the best,

w.

51. Frank Upton
Posted Jul 16, 2008 at 3:48 AM | Permalink

Doesn’t an increase in atmospheric carbon dioxide make trees grow faster by itself, without requiring any increase in temperature?

52. Pete
Posted Jul 16, 2008 at 6:34 AM | Permalink

#51. That’s what I understand, but you probably need to evaluate precipitation/sunniness/CO2 response for a given tree species to see if CO2 alone or some CO2/precipitation/sunniness variable correlates best to tree rings.

I would think someone has done this already or determined it’s too hard.

53. Timo Hämeranta
Posted Jul 16, 2008 at 12:47 PM | Permalink

Steve et al., actually the study was published four days ago:

Briffa, Keith R., et al., 2008. Trends in recent temperature and radial tree growth spanning 2000 years across northwest Eurasia. Philosophical Transactions of the Royal Society of London B Vol. 363, No 1501, pp. 2271-2284, July 12, 2008

54. Steve McIntyre
Posted Jul 16, 2008 at 12:50 PM | Permalink

#53. Well then, I was very prompt in my report. I’ll change the heading.

55. K. Hamed
Posted Jul 17, 2008 at 2:35 AM | Permalink

Steve, Willis (21, 39, and 42), Craig (27), Bernie (43)

The effect of persistence on Kendall’s test has been reported a long time ago (Cox and Stuart, 1955), yet it continues to be ignored in many recent studies. Please take a look at:

Cox, D.R., Stuart, A., (1955). Some quick sign tests for trend in location and dispersion, Biometrika, 42, 80-95.

Hamed, K.H., and Rao, A.R., (1998). A modified Mann-Kendall trend test for autocorrelated data. J. Hydrol., 204, 182-196.

Hamed, K.H., (2008). Trend detection in hydrologic data: The Mann-Kendall trend test under the scaling hypothesis. Journal of Hydrology, Volume 349, Issue 3-4, February 2008, Pages 350-363

56. K. Hamed
Posted Jul 17, 2008 at 2:44 AM | Permalink

More references on the subject:

Lettenmaier, D.P., (1976). Detection of trends in water quality data from records with dependent observations. Water Resour. Res., 12(5), 1037-1046.

Hirsch R.M., Slack, J.R., (1984). Non-parametric trend test for seasonal data with serial dependence. Water Resour. Res., 20(6), 727-732.

Koutsoyiannis, D., (2003). Climatic change, the Hurst phenomenon, and hydrological statistics. Hydrol. Sci. J., 48(1), 3-24.

Koutsoyiannis, D., (2006). Nonstationarity versus scaling in Hydrology. J. Hydrol., 324, 239-254.

Matalas, N.C., and Sankarasubramanian, A., (2003). Effect of persistence on trend detection via regression. Water Resour. Res., 39(12) 1342, doi: 10.1029/2003WR002292.

Yue, S., Pilon, P., Phinney, R., and Cavadias, G., (2002). The influence of autocorrelation on the ability to detect trend in hydrological series. Hydrol. Process. 16, 1807-1829, doi: 10.1002/hyp.1095.

57. Willis Eschenbach
Posted Jul 17, 2008 at 3:42 AM | Permalink

K. Hamed, thank you kindly for the references, you clearly understand the issues.

The test in question in Briffa 2008 is Kendall’s Concordance, also called Kendall’s “W”. This is different from the Mann-Kendall test I see in your reference list. I have not seen anything about the effect of autocorrelation/persistence/Hurst phenomenon on Kendall’s W, but I am only a beginner in statistics. Koutsoyiannis is my guide, so I suspected the effect might be there. That’s why I tested for it. I generally like to do that in any case, just to get a sense of the statistical measure.

All the best,

w.

58. bernie
Posted Jul 17, 2008 at 5:57 AM | Permalink

K. Hamed:
Given your apparent use of this test for what I assume are similar types of time series data, how would you characterize its use in this context? Dou you agree with the conclusions that Willis draws based on his simulation?

Many thanks for the references.

59. K. Hamed
Posted Jul 17, 2008 at 6:41 AM | Permalink

Bernie and Willis

I apologize for not reading carefully. I was referring to Kendall’s Tau, another measure of concordance used more commonly in the field of hydrology. Kendall’s Tau can be used to test for trends by assessing the concordance between the ranks of observations in a time series and their time order, which is commonly known as the Mann-Kendall test. However, I would think (as demonstrated by Willis above) that persistence would have the same effect on W, since it also involves comparisons between ranks of observations. However, it may not be as simple to derive the theoretical results for multiple samples. By the way, even parametric tests have the same problem, as shown by Matalas and Sankarasubramanian (2003).

60. Steve McIntyre
Posted Jul 17, 2008 at 7:38 PM | Permalink

#18. Bishop Hill, thanks for the link. I’ve sent the following letter to the editors of the journal, cc Keith Briffa.

Dear Sirs,
Your policy on data availability as stated at: http://publishing.royalsociety.org/index.cfm?page=1684#question10 states:

“As a condition of acceptance authors agree to honour any reasonable request by other researchers for materials, methods, or data necessary to verify the conclusion of the article.

Supplementary data up to 10Mb is placed on the Society’s website free of charge and is publicly accessible. Large datasets must be deposited in a recognised public domain database by the author prior to submission. The accession number should be provided for inclusion in the published article.”

Briffa et al failed to comply with your requirement that “large datasets must be deposited in a recognised public domain database by the author prior to submission” and your editorial staff and reviewers failed to ensure that the article included an accession number for such deposit.

In particular, Briffa et al. 2008 discussed the following tree ring measurement data sets which have not been archived at the International Tree Ring Data Bank or other public domain data base (other than a small subset of the Tornetrask data set). Would you therefore please provide me with either a URL or the complete tree ring measurement data sets in digital form for all data sets discussed in Briffa et al 2008, including Yamal, Tornetrask, Taymyr, Bolshoi Avam and Finnish Lapland, together with digital versions of the individual reconstructions referred to in Briffa et al 2008, including, without limitation, the reconstructions for each of the above sites and the composite regional reconstructions referred to in the article. This information is necessary to “verify the conclusion of the article”.

Yours truly,
Stephen McIntyre

61. Willis Eschenbach
Posted Jul 17, 2008 at 11:42 PM | Permalink

K. Hamed, thanks for the post.

The Kendall W is defined as

$W=\frac{(k-1)R+1}{k}$

where R is the average of the pairwise Spearman’s Rank coefficients between all possible pairs of the k (in this case 5) different datasets. So in addition to the significance testing, there is another uncertainty in the calculation of W, which is the standard error of the mean of the (in this case 10) different pairwise rank coefficients.

Without the data, however, there’s no way to tell how large that uncertainty is.

w.

PS – It is of interest that the relation can be rewritten as

$\lim_{k\rightarrow\infty}\left(\frac{(k-1)R }{k}+\frac{1}{k}\right)=R$

In this form it is clear that the limit of W as k -> infinity is R. So with a very large number of datasets, W = average R.

At the other end of the spectrum, when k=2, W = R/2+.5. This merely transforms R from a range of -1 to 1, to a range from 0 to 1.
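The algebra is easy to confirm numerically. A small Python check (W computed directly from rank sums, R as the mean of the 10 pairwise Spearman coefficients; note the identity is exact only in the absence of ties, and the helper name is mine):

```python
import numpy as np
from itertools import combinations
from scipy import stats

def kendalls_w(x):
    """Kendall's W from rank sums for an (n, m) array (no tie correction)."""
    n, m = x.shape
    ranks = np.apply_along_axis(stats.rankdata, 0, x)
    s = ranks.sum(axis=1)
    return 12.0 * np.sum((s - s.mean()) ** 2) / (m ** 2 * (n ** 3 - n))

rng = np.random.default_rng(0)
x = rng.standard_normal((200, 5))  # 5 tie-free series, 200 "years"
k = x.shape[1]

# R = average of the k*(k-1)/2 = 10 pairwise Spearman rank correlations
R = np.mean([stats.spearmanr(x[:, i], x[:, j])[0]
             for i, j in combinations(range(k), 2)])

print(kendalls_w(x), ((k - 1) * R + 1) / k)  # the two agree to machine precision
```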

62. Posted Jul 18, 2008 at 12:22 AM | Permalink

Steve #60

You’re welcome. Just don’t go holding your breath, OK?

63. Ulises
Posted Jul 18, 2008 at 9:32 AM | Permalink

# 59 K. Hamed :

I was referring to Kendall’s Tau, another measure of concordance….

In fact, it’s a measure of correlation, let’s not confound the concepts. Imagine you have the series x = (1,2,3,4) and y = (1,2,3,4). Then you have perfect correlation and perfect concordance (agreement between series). With y = (4,3,2,1) instead, you still have perfect (negative) correlation, but zero concordance.

# 61 Willis :

The Kendall W is defined as….

BTW, the original definition is a different one. The computation through the average rank correlations is just one way. Conover (“Practical Nonparametric Statistics”) explains the interrelationships between Kendall’s W, the Friedman Test (a rank-based analysis of variance) and average Spearman’s rho.

…there is another uncertainty in the calculation of W, which is the standard error of the mean of the… pairwise rank coefficients.
Without the data, however, there’s no way to tell how large that uncertainty is.

What would you do with it if you knew it ? With more discrepancies between series, R is low and W drops accordingly. That’s the logic of the test, no matter how R is composed.

64. K. Hamed
Posted Jul 18, 2008 at 12:47 PM | Permalink

Ulises #63

Thanks for the explanation. However, in his book “Rank Correlation Methods (1948)”, Kendall writes about tau (section 1.13): “…… and thus has evident recommendations as a measure of the concordance between two rankings.”
There should be no problem in calling perfect negative correlation between two rankings “discordance” or “perfect disagreement” which is also used by Kendall in section 1.8 in his book. Of course for three or more rankings we can only talk about concordance or discordance.

65. bernie
Posted Jul 18, 2008 at 3:45 PM | Permalink

K. Hamed:
I would still be interested on your take on how Briffa used Kendall W both specifically with respect to the results (How should he have stated his results?) and from a pure measurement point of view ( Would you have used this measure?).

66. Willis Eschenbach
Posted Jul 18, 2008 at 3:47 PM | Permalink

Ulises, thanks for your thoughts and clarifications. Among them, you say:

What would you do with it if you knew it ? With more discrepancies between series, R is low and W drops accordingly. That’s the logic of the test, no matter how R is composed.

I fear you misunderstand my point. Inherently, there is nothing to distinguish between a Kendall W composed of six identical Spearman Rank correlations between k=4 datasets (let’s say all of the R’s are 0.5), and a Kendall W composed of six Spearman Rank correlations of R = (0, 0.25, 0.5, 0.5, 0.75, 1). Because the averages are equal, both give a W statistic of (3*R+1)/4 = 0.625

I hold that in the second case, the Kendall W will have a greater inherent inaccuracy than in the first, and that the way to deal with this is to put error bars on the Concordance figure. This is particularly true when a claim is made (as in Briffa 2008) that a slight change in the W statistic has some larger significance. If the slight change is less than the sum in quadrature of the relevant errors, then it has no statistical significance.
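To make the point concrete, here are the two hypothetical cases in a few lines of Python (the pairwise numbers are from the example above; the naive standard-error calculation is only a sketch of what an error bar might look like):

```python
import numpy as np

def w_from_pairwise(r, k=4):
    """Kendall's W from the average of the pairwise Spearman coefficients
    among k series (k=4 gives 6 pairs)."""
    return ((k - 1) * np.mean(r) + 1) / k

tight = [0.5] * 6                          # six identical pairwise correlations
loose = [0.0, 0.25, 0.5, 0.5, 0.75, 1.0]   # same mean, very different spread

print(w_from_pairwise(tight), w_from_pairwise(loose))  # both 0.625

# a naive standard error of the mean pairwise correlation: 0 vs ~0.14
print(np.std(tight, ddof=1) / np.sqrt(6), np.std(loose, ddof=1) / np.sqrt(6))
```

The W statistic alone cannot distinguish the two situations, which is the argument for attaching error bars to the concordance lines in Figure 8.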

w.

67. Willis Eschenbach
Posted Jul 18, 2008 at 3:53 PM | Permalink

Further to my last post, error bars are also useful when comparing different length records. Briffa 2008 shows us Kendall W figures for data lengths of 51, 101, and 201 years. Surely the accuracy of the statistic improves with increasing length of the datasets considered.

Gotta run, more later,

w.

68. K. Hamed
Posted Jul 19, 2008 at 4:31 AM | Permalink

Bernie #65

In my opinion, the test is appropriate for this kind of investigation. My only concern is the effect of autocorrelation. Judging by the simulations supplied by Willis in #42 and my experience with tau, the existence of autocorrelation in each series increases the variance of the test statistic, resulting in more rejections (or equivalently, highly significant test statistic values) than when each series is random.