Recently, after the posting of the Phil Trans B archive on Sept 8, 2009, I determined that the Yamal data set as used by Briffa is not more “highly replicated” than the Polar Urals data set, and thus that there is no basis for the preferential selection of the Yamal chronology over the Polar Urals chronology in Team multiproxy studies. The seemingly biased selection of Yamal over Polar Urals has been a longstanding concern of mine and was the theme of many of my AR4 Review Comments, all of which were repudiated by Briffa, the IPCC author responsible for this section. In light of the abysmal modern replication of the Yamal chronology, the rejection of these comments seems highly questionable.
However, the main response of critics of this site over the past few days (e.g. Tim Lambert, David Appell, Deep Climate, the latter now linked by Andrew Revkin) has not been reflection on the poor replication of the Yamal series or on the impact of the now-established bias in the selection of Yamal over Polar Urals. Instead, it has been vituperative criticism of me for not having determined the poor replication and provenance of the Briffa data set earlier, using the materials available before Sept 2009. Those materials included Hantemirov’s low-replication corridor standardization data set from Hantemirov and Shiyatov 2002, which I had obtained in early 2004, well before I began examining the Yamal data set in more detail following the publication of Osborn and Briffa 2006 and D’Arrigo et al 2006 in Feb 2006.
No opprobrium for the many climate scientists who used Briffa’s abysmally low replication chronology without inquiring into its replication. No opprobrium for those climate scientists to whom precisely the same materials were available and who had also failed to identify the defects in the Briffa data set prior to Sept 2009. No opprobrium for Briffa who had failed to report core counts or provide the data when requested, not just by me but by the authors of D’Arrigo et al 2006. Instead, the criticism was leveled at the first person to actually figure out the poor replication of the Briffa data set because, in their opinion, I should have been able to figure it out earlier.
They seem to think that I should have been able to deduce that the low-replication Hantemirov data set used for corridor standardization was the same version that Briffa had used for RCS standardization. There were a couple of reasons why I did not presume that they were the same. One important reason was that the authors of D’Arrigo et al believed that the Briffa data set was more “highly replicated” than the Polar Urals data set which had 57 cores in 1990. If the Briffa data set was more highly replicated than Polar Urals, then it had to be a different and larger version than the Hantemirov version, which only had 10 cores in 1990.
In addition, the population requirements for corridor standardization are very different from the population requirements for RCS standardization. Briffa was a leading proponent of RCS standardization, and his writings all stated the need for large populations very clearly. So it seemed inconceivable that the Hantemirov data set would be the same data set as the one that Briffa used.
Some people have argued that I should have been able to figure out that Briffa had used the Hantemirov data set from information available to me prior to Sept 2009 or at least been able to figure out the low replication of the Briffa data set.
This question can be interpreted as an interesting mathematical inverse problem: can you deduce the number of cores in a measurement set, given (1) a chronology and (2) a measurement set whose image under your RCS emulation is “close” to the given chronology? I’m not convinced that the inverse problem is as easy as my critics suggest.
The RCS algorithm “mapping” measurement matrices X onto chronology vectors y is a “projection”; it is not one-to-one and no inverse function is defined. Many measurement matrices project close to one another in chronology space under an RCS map, as discussed below. If you don’t have a precise operating definition of the mapping function, but merely an emulation that is “close” to the underlying algorithm, then the inverse problem seems intractable to me.
In the case at hand, contrary to what many people think, there was no published software for Briffa’s RCS methodology (there was published software for ARSTAN, but that is a different methodology). As described in the literature, RCS is not difficult mathematically (a one-size-fits-all age dependence curve is used for standardization). However, there are a number of options and alternatives that make it impossible to be sure that you have got Briffa’s precise algorithm, particularly when, as was the case here, there are no benchmark data sets containing both the measurement data set and the chronology vector. Different forms of curve specification for age dependence are possible (and referred to in various Briffa articles), including negative exponential, Hugershoff and splines of varying stiffness. (See Melvin and Osborn 2008 for examples of the substantial impact from different specifications.) Different specifications of the form of the age-dependence curve can change the RCS mapping function. In addition to the form of the age-dependence curves, there are other known variations in RCS methodology, e.g. a stratification between “linear” and “nonlinear” trees (Esper) or a stratification by site (Wilson).
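To make the ambiguity concrete, here is a minimal sketch of RCS on entirely synthetic data. Everything here is an assumption for illustration: the invented age trend and “climate” signal, and the two stand-in regional curve specifications (mean width at each cambial age vs. a negative exponential fitted to log widths). Neither is Briffa’s actual implementation; the point is only that the choice of curve specification changes the resulting chronology.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 cores of 150 rings each, staggered over 400 years.
# Each ring width = age trend * shared "climate" signal * lognormal noise.
T, L, n = 400, 150, 50
signal = 1.0 + 0.3 * np.sin(np.arange(T) / 25.0)   # invented shared signal
ages = np.arange(1, L + 1)
true_trend = 2.0 * np.exp(-ages / 50.0) + 0.3      # invented age trend
starts = rng.integers(0, T - L, n)
widths = true_trend * signal[starts[:, None] + np.arange(L)] \
                    * rng.lognormal(0.0, 0.1, (n, L))

def chronology(widths, starts, regional):
    """Divide every measurement by the one-size-fits-all age curve,
    then average the resulting indices by calendar year."""
    index = widths / regional
    total, count = np.zeros(T), np.zeros(T)
    for i, s in enumerate(starts):
        total[s:s + L] += index[i]
        count[s:s + L] += 1
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)

# Specification 1: regional curve = mean ring width at each cambial age.
mean_curve = widths.mean(axis=0)

# Specification 2: negative exponential fitted to mean log ring width.
slope, intercept = np.polyfit(ages, np.log(widths).mean(axis=0), 1)
negexp_curve = np.exp(intercept + slope * ages)

chron_mean = chronology(widths, starts, mean_curve)
chron_negexp = chronology(widths, starts, negexp_curve)
```

Both chronologies track the underlying signal, but they disagree by several percent in places, because the two regional curves diverge at young and old cambial ages. An emulator who guessed the wrong specification would see exactly this kind of unexplained residual.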
It was possible for me to determine that my emulation function applied to the Hantemirov data set yielded a chronology vector that was “close” to the archived results from application of the “true” function to the Briffa data set. But I did not know (and do not see how I could have known) whether the differences were due to a projection from a different (and perhaps much more highly replicated) data set or to differences between my emulation of Briffa’s algorithm and Briffa’s own implementation of his algorithm.
We know from a variety of examples that RCS maps from different but related data sets can produce “remarkably similar” chronology vectors. Reader Tom P has argued that one can exclude all young trees from the measurement data set and still obtain a chronology that is “remarkably similar” to the original chronology. But stop for a minute. This means that the RCS algorithm applied to a data set and to a truncated version of it can yield chronology vectors that are “remarkably similar”, which I take to mean that the norm of the residual vector is small. Considering the inverse problem now: because the chronology vectors from the original and truncated measurement data sets are so close, you can’t determine which measurement data set originated the chronology vector if your function is only known to be “close” to the Briffa function.
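The same kind of synthetic setup illustrates this many-to-one behaviour (again, invented data and a stand-in RCS variant, not any actual Yamal calculation): dropping half the cores leaves the chronology almost unchanged, so a merely “close” match cannot tell you which measurement set produced the archived vector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic cores as before: age trend * shared signal * lognormal noise.
T, L, n = 400, 150, 60
signal = 1.0 + 0.3 * np.sin(np.arange(T) / 25.0)
ages = np.arange(1, L + 1)
trend = 2.0 * np.exp(-ages / 50.0) + 0.3
starts = rng.integers(0, T - L, n)
widths = trend * signal[starts[:, None] + np.arange(L)] \
               * rng.lognormal(0.0, 0.1, (n, L))

def rcs(widths, starts):
    """RCS sketch: standardize by the mean age curve, average by year."""
    index = widths / widths.mean(axis=0)
    total, count = np.zeros(T), np.zeros(T)
    for i, s in enumerate(starts):
        total[s:s + L] += index[i]
        count[s:s + L] += 1
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)

full = rcs(widths, starts)               # all 60 cores
subset = rcs(widths[::2], starts[::2])   # half the cores dropped

# Compare the two chronologies over the years both cover.
both = ~np.isnan(full) & ~np.isnan(subset)
rel = np.linalg.norm(full[both] - subset[both]) / np.linalg.norm(full[both])
print(f"relative difference between chronologies: {rel:.3f}")
```

On data like this the relative difference comes out at only a few percent, comfortably inside the tolerance of any emulation whose algorithm is itself only approximately known. That is the inverse problem in miniature: two measurement sets of very different size, one chronology.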
This is the problem in trying to make deductions given only the chronology vector y and the low-replication Hantemirov data set. If the emulated chronology nailed the archived chronology to six nines, then you could reasonably conclude that you had identified the originating data set and the function (at least in that range). But if all you have is an emulated chronology that is “close” in some sense to the archived chronology, but not exact, and an RCS emulation that is “close” to Briffa’s algorithm but not necessarily the same, then I, for one, do not see how you could rule out the possibility that the archived chronology vector y was generated from a much larger measurement data set, one of a size appropriate to RCS standardization.
In my opinion, it is wishful thinking to suppose that sufficiently sophisticated reverse engineering could have identified the low replication of the Briffa data set. I don’t rely on this point. Maybe someone can demonstrate that reverse engineering was possible in the case at hand and that, with a little more insight and better reverse engineering, I (and others) could have figured out the poor Briffa replication. If so, so be it. Without the actual Briffa data set, I wasn’t able to figure out that Briffa had used the Hantemirov corridor data set for RCS standardization. Nor was anyone else, including any of the authors who used the poorly replicated Briffa data in important multiproxy reconstructions, apparently without inquiring into its replication.
But surely my reverse engineering ability (or the reverse engineering ability of others) isn’t the real issue. If Team climate science is dependent in any measure on my skill in reverse engineering to identify egregious problems like the low replication of the Briffa data set, then surely it is time to examine other methods of improving Team quality control procedures so that such problems need not occur in the future.
I can think of three major checkpoints where chances of earlier identification of the defect were missed.
The first and most obvious checkpoint occurs in the original publication, where Briffa failed to provide core counts. Had this been done, it would have been easy to determine that the Hantemirov version was the one used by Briffa. The authors of D’Arrigo et al 2006 (and perhaps even the authors of other multiproxy studies) would have been under no illusions about the replication of the Briffa data set. They would have been on notice, and they would have had only themselves to blame for using it. Why were core counts not reported in the seminal introduction of this important proxy? In my opinion, it was because the Briffa RCS reconstruction for Yamal was never published in an appropriate peer reviewed article. It was introduced in passing in Briffa 2000, which was a bird’s-eye overview of long reconstructions. Had Yamal ever been presented in a proper peer reviewed article, any reviewer would have insisted on the presentation of core counts. But this was never done and it fell between stools – not unlike the Mann PC1, which likewise was never presented in a technical article.
The second checkpoint was when multiproxy authors used the proxy. It appears that all of these authors, other than D’Arrigo et al 2006, failed to notice that no core counts were available for the Briffa data set, and used the chronology anyway, without knowing what its replication was. The authors of D’Arrigo et al 2006 requested the data set from Briffa, but Briffa refused to provide it to them. Had they been provided the data set – as ought to have happened – there is no doubt in my mind that they would have done the core count calculations and identified the poor replication of the Briffa data set in 2005. The unavoidable point is that Briffa’s withholding of the measurement data from D’Arrigo delayed identification of the poor replication by at least four years – from 2005 to 2009.
A third checkpoint was Science’s refusal to require Briffa to provide the data in 2006. In my opinion, a “senior” journal like Science should require a proper chain of custody for data used in its articles. Where authors like Osborn and Briffa rely on results from an earlier paper (i.e. Briffa 2000) published in a journal with a less adequate data archiving policy than Science, Science should not merely pass the buck. Science should have taken charge of the situation and required Briffa to provide the requested measurement data. Their failure to act also delayed the identification of the poor replication by three years, from 2006 to 2009.
In Briffa’s last communication with me, he said that he would refer the request to the Russians. By that, I assumed that he was seeking their consent to provide his version of the data to me. I knew at the time that Briffa had also refused to provide his data to the authors of D’Arrigo et al, and so I didn’t hold out much hope that I would have more luck with him than they had. And again, at this time, I was influenced by the D’Arrigo et al authors’ belief that Briffa was using a much bigger data set than Polar Urals (i.e. not the small Hantemirov corridor data set). Had I followed up, I might have discovered that there wasn’t some mysterious larger data set and that the Russians had only sent the low-replication Hantemirov corridor version that I already had. But I didn’t follow up at the time. Soon after this correspondence, the NAS and Wegman reports came out, then there were the House Energy and Commerce Committee hearings, so I didn’t pursue the matter further. Nor did D’Arrigo et al, with all of us continuing to assume that the Yamal data set was “more highly replicated” than the Polar Urals data set – an assumption that was later proved incorrect.
When Briffa et al 2008 was published, a Climate Audit reader drew my attention to the fact that Phil Trans B had more stringent data archiving policies than other journals and that it might be possible to finally pin down the precise version used by Briffa. The discovery that Briffa had used the small Hantemirov data set designed for corridor standardization came as a considerable surprise.
Perhaps I should have been able to determine much earlier that the low-replication Hantemirov corridor standardization data set was the one used by Briffa for RCS standardization, but the fact of the matter is that I didn’t. Until I was able to inspect a data set that I knew for certain had been used in Briffa 2000, I did not know of the low replication of this data set. Nor, to my knowledge, did any other specialists in the field, including the authors of D’Arrigo et al 2006.
Obviously it shouldn’t have taken nine years from the publication of Briffa 2000 to establish the inadequate replication of Briffa’s Yamal chronology, or to establish that it was not “more highly replicated” than Polar Urals, as the authors of D’Arrigo et al 2006 thought. The root problem lay in Briffa’s failure to formally present the Yamal RCS chronology in a peer reviewed publication where core counts would have been required. Second, prior to using the Russian data in a publication, Briffa should have obtained any required consents from the Russian originators of the data so that he could respond to requests for data himself. Third, Science should have insisted on Briffa providing the data when it was at issue in 2006, instead of relying on a third-party journal to ensure data archiving compliance, especially when the authors were the same. Briffa’s delays in complying with Phil Trans B instructions cost a year as well. Given that the Briffa version proved identical to the Hantemirov version, it could have been archived in a few minutes: why did it take a year?
Up to Sept 2009, none of the users of Briffa’s Yamal data set had discovered its inadequate replication. The first person to determine this was me and only after Briffa finally archived the data as used. And critics are angry at me for not figuring it out earlier. Climate science.