Colorado Springs Fire

My sister sent me the following picture of the Colorado Springs fire. It’s about 10 miles from their house. CA commenter Pete H (also author of the handy WordPress unthreading plug-in and one of the major behind-the-scenes technical helpers to the blog) also lives in Colorado Springs.

Nic Lewis on Forest et al 2006

Myles Allen recently asked that more attention be paid by critics to work on climate sensitivity, rather than paleoclimate. Nic Lewis, a coauthor of O’Donnell et al 2010, has been parsing climate sensitivity calculations for some time and with considerable frustration. Nic Lewis has a very important article at Judy Curry’s here.

One of the seminal sensitivity estimates is Forest et al 2006. Nic reports that he tried for over a year to get data for this study with Forest finally saying that the raw data was now “lost”.

I have been trying for over a year, without success, to obtain from Dr Forest the data used in Forest 2006…. Unfortunately, Dr Forest reports that the raw model data is now lost.

Nic was able to get data for two predecessor studies and has concluded that the calculations in Forest et al 2006 were done erroneously:

If I am right, then correct processing of the data used in Forest 2006 would lead to the conclusion that equilibrium climate sensitivity (to a doubling of CO2 in the atmosphere) is close to 1°C, not 3°C, implying that likely future warming has been grossly overestimated by the IPCC.

This is important stuff. Nic is very sharp, Forest et al is an important paper and Nic’s conclusions are damning. It’s frustrating that, after all the controversy, climate journals don’t require authors to archive data and that IPCC authors continue to “lose” data.

Royal Society Report on Data Sharing

Although Geoffrey Boulton was the lead author, the Royal Society report on data sharing published today was surprisingly even-handed (h/t Bishop Hill).

Climate Audit and McIntyre S receive a cameo mention on page 40:

At the other extreme, there is a small, but increasingly numerous body of engaged “citizen scientists” that wish to dig deeply into the scientific data relating to a particular issue. They are developing an increasingly powerful “digital voice,” though many lack formal training in their area of interest…. Some ask tough and illuminating questions, exposing important errors and elisions.102 (102 McIntyre S (2012). Climate Audit. Available at: http://www.climateaudit.org/)

The term “citizen scientist” is not one that I use or like. In addition, most of the core Climate Audit commenters not only have formal training in statistics, but their formal training generally substantially exceeds that of the authors being criticized. The dispute is between formally trained statisticians and statistically amateur and sometimes incompetent real_climate_scientists.

The Report refers to FOI events more accurately than either Nature or the Muir Russell report:

The potential loss of trust in the scientific enterprise through failure to recognise the legitimate public interest in scientific information was painfully exemplified in the furore surrounding the improper release of emails from the University of East Anglia.99 These emails suggested systematic attempts to prevent access to data about one of the great global issues of the day – climate change. The researchers had failed to respond to repeated requests for sight of the data underpinning their publications, so that those seeking data had no recourse other than to use the Freedom of Information Act (FoIA) to request that the data be released.

The need to invoke FoIA reflects a failure to observe what this report regards as should be a crucial tenet for science, that of openness in providing data on which published claims are based.

Nature and the climate community have been wilfully obtuse to CRU’s obstruction leading up to FOI requests.

The report contains many comments about adverse results that I endorse.

There is perhaps even a comment that applies to the screening fallacy as applied and/or endorsed by real_climate_scientists:

Good science, simply put, “looks at all the evidence (rather than cherry picking only favourable evidence), uses controls for variables so we can identify what is actually working, uses blind observations so as to minimise the effects of bias, and uses internally consistent logic.”103 To ignore this kind of rigour is at the least poor practice.

All the evidence. Not just the evidence selected by ex post correlation screening.

Screening Proxies: Is It Just a Lot of Noise?

There has been a great deal of discussion on a recent CA thread on the efficacy of screening proxies for use in reconstructions by selecting on the size of the correlation between the proxy and the temperature during the calibration time period. During the discussion I asked Nick Stokes the following questions in a comment:

Do you think that it would be appropriate to use correlation to screen tree rings in a particular site or region when doing a temperature reconstruction for that site? Would you not be concerned that the process could bias the result even if the trees did contain actual information about the temperature?

His reply was

Roman, let me counter with a question,

“bias the result”

Bias from what? You’re using the language of population statistics. But what is the population? And why do you want to know its statistics? What is it biased from?

But to answer yours, yes I do. The proxy result is going to track instrumental in the training period. That’s intended, and means it isn’t independent information. But how does it bias what you want, which is the pre-training signal?
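The exchange above can be made concrete with a small simulation. The sketch below is a minimal illustration of the question, not anyone's actual reconstruction code: 100 synthetic proxies that all contain a genuine common signal are screened on their correlation with a noisy calibration-period target, and the fidelity of the screened average is compared inside and before the calibration window. The series lengths, noise levels and the r > 0.2 cutoff are illustrative assumptions only.

set.seed(123)
n     <- 1000                      # "years" 1..1000; the last 70 form the calibration window
calib <- 931:1000
pre   <- 1:930
signal <- as.numeric(arima.sim(list(ar = 0.9), n))    # a smooth "true temperature"
signal <- signal / sd(signal)

nprox   <- 100                                        # every proxy genuinely contains the signal
proxies <- sapply(1:nprox, function(i)
  0.2 * signal + as.numeric(arima.sim(list(ar = 0.5), n)))

target <- signal[calib] + rnorm(length(calib), sd = 0.3)   # noisy "instrumental" target

# the step at issue: keep only proxies that correlate with the target during calibration
calib_r <- apply(proxies[calib, ], 2, cor, y = target)
keep    <- calib_r > 0.2

comp_all      <- rowMeans(proxies)
comp_screened <- rowMeans(proxies[, keep, drop = FALSE])

# fidelity to the true signal, inside versus before the calibration window
round(c(screened_calib   = cor(comp_screened[calib], signal[calib]),
        screened_pre     = cor(comp_screened[pre],   signal[pre]),
        unscreened_calib = cor(comp_all[calib],      signal[calib]),
        unscreened_pre   = cor(comp_all[pre],        signal[pre])), 2)

In typical runs of this toy setup, the screened composite looks better than the unscreened one over the calibration window but worse before it: the selection step preferentially keeps proxies whose noise happened to line up with the target during calibration, which is the sense in which the question above speaks of bias even when the proxies carry real information.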


An Unpublished Law Dome Series

Oxygen isotope series are the backbone of deep-time paleoclimate. The canonical 800,000 year comparison of CO2 and temperature uses O18 values from Vostok, Antarctica to estimate temperature. In deep time, O18 values are a real success story: they clearly show changes from the LGM to the Holocene that cohere with glacial moraines.

On its face, Law Dome, which was screened out by Gergis and Karoly, is an extraordinarily important Holocene site: it is, to my knowledge, the highest-accumulation Holocene site yet known, with accumulation almost 10 times greater than the canonical Vostok site. (Accumulation is directly related to resolution: high accumulation enables high resolution.) The graphic below compares layer thickness for some prominent sites for three periods: 1500-2000, 1000-1500 and 0-1000. Its resolution in the past two millennia is nearly double the resolution of the Greenland GRIP and NGRIP sites that have been the topic of intensive study and publication.

More on Screening in Gergis et al 2012

First, let’s give Gergis, Karoly and coauthors some props for conceding that there was a problem with their article and trying to fix it. Think of the things that they didn’t do. They didn’t arrange for a realclimate hit piece, sneering at the critics and saying Nyah, nyah,

what about the hockey stick that Oerlemans derived from glacier retreat since 1600?… How about Osborn and Briffa’s results which were robust even when you removed any three of the records?

Karoly recognized that the invocation of other Hockey Sticks was irrelevant to the specific criticism of his paper and did not bother with the realclimate juvenilia that has done so much to erode the public reputation of climate scientists. Good for him.

Nor did he simply deny the obvious, as Mann, Gavin Schmidt and so many others have done with something as simple as Mann’s use of the contaminated portion of Tiljander sediments according to “objective criteria”. The upside-down Tiljander controversy lingers on, tarnishing the reputation of the community that seems unequal to the challenge of a point that a high school student can understand.

Nor did they assert the errors didn’t “matter” and challenge the critics to produce their own results (while simultaneously withholding data.) Karoly properly recognized that the re-calculation obligations rested with the proponents, not the critics.

I do not believe that they “independently” discovered their error or that they properly acknowledged Climate Audit in their public statements or even in Karoly’s email. But even though Karoly’s email was half-hearted, he was courteous enough to notify me of events. Good for him. I suspect that some people on the Team would have opposed even this.

The Screening Irony
The irony in Gergis’ situation is that they tried to avoid an erroneous statistical procedure that is well-known under a variety of names in other fields (I used the term Screening Fallacy), but which is not merely condoned, but embraced, by the climate science community. In the last few days, readers have drawn attention to relevant articles discussing closely related statistical errors under terms like “selecting on the dependent variable” or “double dipping – the use of the same data set for selection and selective analysis”.

I’ll review a few of these articles and then return to Gergis. Shub Niggurath listed a number of interesting articles at Bishop Hill here.

Kriegeskorte et al 2009 (Nature Neuroscience), in an article entitled “Circular analysis in systems neuroscience: the dangers of double dipping”, discuss the same issue, commenting as follows:

In particular, “double dipping” – the use of the same data set for selection and selective analysis – will give distorted descriptive statistics and invalid statistical inference whenever the results statistics are not inherently independent of the selection criteria under the null hypothesis.

Nonindependent selective analysis is incorrect and should not be acceptable in neuroscientific publications….

If circularity consistently caused only slight distortions, one could argue that it is a statistical quibble. However, the distortions can be very large (Example 1, below) or smaller, but significant (Example 2); and they can affect the qualitative results of significance tests…

Distortions arising from selection tend to make results look more consistent with the selection criteria, which often reflect the hypothesis being tested. Circularity therefore is the error that beautifies results – rendering them more attractive to authors, reviewers, and editors, and thus more competitive for publication. These implicit incentives may create a preference for circular practices, as long as the community condones them.

A similar article by Kriegeskorte here entitled “Everything you never wanted to know about circular analysis, but were afraid to ask” uses similar language:

An analysis is circular (or nonindependent) if it is based on data that were selected for showing the effect of interest or a related effect

Vul and Kanwisher, in an article here entitled “Begging the Question: The Non-Independence Error in fMRI Data Analysis”, make similar observations, including:

In general, plotting non-independent data is misleading, because the selection criteria conflate any effects that may be present in the data from those effects that could be produced by selecting noise with particular characteristics….

Public broadcast of tainted experiments jeopardizes the reputation of cognitive neuroscience. Acceptance of spurious results wastes researchers’ time and government funds while people chase unsubstantiated claims. Publication of faulty methods spreads the error to new scientists.

Reader fred berple reports a related discussion in political science here: “How the Cases You Choose Affect the Answers You Get: Selection Bias in Comparative Politics”. Geddes observes:

Most graduate students learn in the statistics courses forced upon them that selection on the dependent variable is forbidden, but few remember why, or what the implications of violating this taboo are for their own work.

John Quiggin, a seemingly unlikely ally in criticism of methods used by Gergis and Karoly, has written a number of blog posts that are critical of studies that selected on the dependent variable.

Screening and Hockey Sticks
Both I and other bloggers (see links surveyed here) have observed that the common “community” practice of screening proxies for the “most temperature sensitive” or equivalent imparts a bias towards Hockey Sticks. This bias has commonly been demonstrated by producing a Stick from red noise.
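As a minimal sketch of that demonstration (a toy illustration, not any published methodology; the 1,000 series, the AR(1) coefficient of 0.7 and the r > 0.3 cutoff are arbitrary choices), one can screen pure red noise against a trending calibration target and average the survivors:

set.seed(42)
nyears <- 1000
calib  <- 931:1000
target <- seq(0, 1, length.out = length(calib))        # a simple warming "instrumental" trend

noise <- replicate(1000, as.numeric(arima.sim(list(ar = 0.7), nyears)))   # no climate signal at all

r         <- apply(noise[calib, ], 2, cor, y = target)
survivors <- noise[, r > 0.3, drop = FALSE]            # the "screened network"

stick <- rowMeans(survivors)
plot(stick, type = "l", xlab = "year", ylab = "composite",
     main = sprintf("Mean of %d screened red-noise series", ncol(survivors)))

The composite is essentially flat over years 1-930, where the uncorrelated noise averages out, and turns upward over the calibration window – a Stick produced entirely by the selection step.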

In the terminology of the above articles, screening a data set according to temperature correlations and then using the subset for temperature reconstruction quite clearly qualifies as Kriegeskorte’s “double dipping” – the use of the same data set for selection and selective analysis. Proxies are screened depending on correlation to temperature (either locally or teleconnected) and then the subset is used to reconstruct temperature. It’s hard to think of a clearer example than paleoclimate practice.

As Kriegeskorte observed, this double use “will give distorted descriptive statistics and invalid statistical inference whenever the results statistics are not inherently independent of the selection criteria under the null hypothesis.” This is an almost identical line of reasoning to many Climate Audit posts.

Gergis et al, at least on its face, attempted to mitigate this problem by screening on detrended data:

For predictor selection, both proxy climate and instrumental data were linearly detrended over the 1921–1990 period to avoid inflating the correlation coefficient due to the presence of the global warming signal present in the observed temperature record. Only records that were significantly (p < 0.05) correlated with the detrended instrumental target over the 1921–1990 period were selected for analysis.

This is hardly ideal statistical practice, but it avoids the most grotesque form of the error. However, as it turned out, they didn’t implement this procedure, instead falling back into the common (but erroneous) Screening Fallacy.
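For concreteness, here is a sketch (not the authors' code; the function and variable names are my own) of the detrended screening test as described in the quoted passage: detrend both the proxy and the instrumental series over 1921–1990 and retain the proxy only if the detrended correlation is significant at p < 0.05.

detrend <- function(x) residuals(lm(x ~ seq_along(x)))

passes_detrended_screen <- function(proxy, instrumental, alpha = 0.05) {
  # proxy, instrumental: numeric vectors aligned on 1921-1990 (70 annual values)
  cor.test(detrend(proxy), detrend(instrumental))$p.value < alpha
}

# With 70 values and no allowance for autocorrelation, two-sided p < 0.05
# corresponds to roughly |r| > 0.235:
n <- 70
qt(0.975, n - 2) / sqrt(qt(0.975, n - 2)^2 + (n - 2))

Autocorrelation in both the proxies and the instrumental series reduces the effective sample size, so a defensible test would need a stiffer cutoff than 0.235 – one reason detrended screening is, as noted above, hardly ideal even when implemented as described.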

The first line of defence – from, for example, comments from Jim Bouldin and Nick Stokes – has been to argue that there’s nothing wrong with using the same data set for selection and selective analysis and that Gergis’ attempted precautions were unnecessary. I have no doubt that, had Gergis never bothered with statistical precaution and simply done a standard (but erroneous) double dip/selection on the dependent variable, no “community” reviewer would have raised the slightest objection. If anything, their instinct is to insist on an erroneous procedure, as we’ve seen in opening defences.

Looking ahead, the easiest way for Gergis et al to paper over their present embarrassment will be to argue (1) that the error was only in the description of their methodology and (2) that using detrended correlations was, on reflection, not mandatory. This tactic could be implemented by making only the following changes (deleted text shown struck through):

~~For predictor selection, both proxy climate and instrumental data were linearly detrended over the 1921–1990 period to avoid inflating the correlation coefficient due to the presence of the global warming signal present in the observed temperature record.~~ Only records that were significantly (p<0.05) correlated with the ~~detrended~~ instrumental target over the 1921–1990 period were selected for analysis.

Had they done this in the first place and had it later come to my attention, I would have objected that they were committing a screening fallacy (as I had originally done), but no one on the Team or in the community would have cared. Nor would the IPCC.

So my guess is that they’ll resubmit on these lines and just tough it out. If the community is unoffended by upside-down Mann or Gleick’s forgery, then they won’t be offended by Gergis and Karoly “using the same data for selection and selective analysis”.

Postscript: As Kriegeskorte observed, the specific impact of an erroneous method on a practical data set is hard to predict. In our case, it does not mean that a given reconstruction is necessarily an “artifact” of red noise, since a biased procedure will also produce a Stick from an actual Stick signal. (If the “signal” is a Stick, the biased procedure will typically enhance the Stick.) The problem is that a biased method can produce a Stick from red noise as well, and therefore not much significance can be placed on a Stick obtained from a flawed method.

If the “true” signal is a Stick, then it should emerge without resorting to flawed methodology. In practical situations with inconsistent proxies, biased methods will typically place heavy weights on a few series (bristlecones in a notorious example) and the validity of the reconstruction then depends on whether these few individual proxies have a unique and even magical ability to measure worldwide temperature – a debate that obviously continues.
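To make the postscript concrete, the sketch below (again a toy illustration with arbitrary settings, not any paper's method) applies the same screen-and-average recipe to two sets of synthetic proxies: pure red noise, and the same noise with a genuine Stick-shaped signal added to every series.

set.seed(7)
nyears <- 1000
calib  <- 931:1000
stick_signal <- c(rep(0, 930), seq(0, 1, length.out = 70))   # a "true" Stick
target <- stick_signal[calib] + rnorm(length(calib), sd = 0.2)

screen_and_average <- function(mat) {
  r <- apply(mat[calib, ], 2, cor, y = target)
  rowMeans(mat[, r > 0.3, drop = FALSE])
}

noise_only  <- replicate(500, as.numeric(arima.sim(list(ar = 0.7), nyears)))
with_signal <- noise_only + stick_signal        # every column now carries the Stick

matplot(cbind(screen_and_average(noise_only), screen_and_average(with_signal)),
        type = "l", lty = 1, xlab = "year", ylab = "composite")

Both composites turn upward over the calibration window: the recipe reproduces a Stick when one is genuinely present, but it also produces one from noise alone, which is why a Stick-shaped output from this procedure carries little evidential weight on its own.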

Neukom’s South American Network

In a figure that took considerable work, IPCC AR5 (First Draft) compared 5 regional proxy reconstructions to model output. In Australia, they used the Gergis (Neukom) et al 2012 reconstruction; in South America, they used a Neukom et al 2011 (Clim Dyn) reconstruction. In 2011, Neukom refused to provide me with the data versions used in this article (many of which are not public). I recently wrote to the editor of Climate Dynamics, but have received no acknowledgement. Their Table 2 lists 19 “proxies” used in their winter temperature reconstructions – one of which is Law Dome, in a remarkable, highly truncated version.

Gergis et al “Put on Hold”

A few days ago, Joelle Gergis closed her letter refusing data by stating:

We will not be entertaining any further correspondence on the matter.

Gergis’ statement seems to have been premature.  David Karoly, the senior author, who had been copied on Gergis’ surly email and who is also known as one of the originators of the “death threat” story, wrote today:

Dear Stephen,

I am contacting you on behalf of all the authors of the Gergis et al (2012) study ‘Evidence of unusual late 20th century warming from an Australasian temperature reconstruction spanning the last millennium’

An issue has been identified in the processing of the data used in the study, which may affect the results. While the paper states that “both proxy climate and instrumental data were linearly detrended over the 1921–1990 period”, we discovered on Tuesday 5 June that the records used in the final analysis were not detrended for proxy selection, making this statement incorrect. Although this is an unfortunate data processing issue, it is likely to have implications for the results reported in the study. The journal has been contacted and the publication of the study has been put on hold.

This is a normal part of science. The testing of scientific studies through independent analysis of data and methods strengthens the conclusions. In this study, an issue has been identified and the results are being re-checked.

We would be grateful if you would post the notice below on your ClimateAudit web site.

We would like to thank you and the participants at the ClimateAudit blog for your scrutiny of our study, which also identified this data processing issue.

Thanks, David Karoly

Print publication of scientific study put on hold

An issue has been identified in the processing of the data used in the study, “Evidence of unusual late 20th century warming from an Australasian temperature reconstruction spanning the last millennium” by Joelle Gergis, Raphael Neukom, Stephen Phipps, Ailie Gallant and David Karoly, accepted for publication in the Journal of Climate.

We are currently reviewing the data and results.

The inconsistency between replicated correlations and Gergis’ claims was first pointed out by Jean S here on June 5 at 4:42 pm blog time. As readers have noted in comments, it’s interesting that Karoly says that they had independently discovered this issue on June 5 – a claim that is distinctly, shall we say, Gavinesque (see the Feb 2009 posts on the Mystery Man).

I urge readers not to get too wound up about this, as there are a couple of potential fallback positions. They might still claim to “get” a Stick using the reduced population of proxies that pass their professed test. Alternatively, they might now say that the “right” way of screening is to do so without detrending and “get” a Stick that way. However, they then have to face up to the “Screening Fallacy”. As noted in my earlier post, while this fallacy is understood on critical blogs, it is not understood by real_climate_scientists, and I would not be surprised if Gergis et al attempt to revive their article on that basis.

One thing we do know. In my first post on Gergis et al on May 31, I had referred to the Screening Fallacy. The following day (June 1), the issue of screening on de-trended series was discussed in comments. I added the following comment to the main post (responding to comments by Jim Bouldin and others):

Gergis et al 2012 say that their screening is done on de-trended series. This measure might mitigate the screening fallacy – but this is something that would need to be checked carefully. I haven’t yet checked on the other papers in this series.

There was a similar discussion at Bishop Hill. What the present concession means is that my own concession was premature and that the screening actually done by Gergis et al was within the four corners of the Screening Fallacy. However, no concessions have been made on this point.

Gergis “Significance”

Jean S observed in comments to another thread that he was unable to replicate the claimed “significant” correlations for many, if not most, of the 27 Gergis “significant” proxies. See his comments here and here.

Jean S had observed:

Steve, Roman, or somebody 😉 , what am I doing wrong here? I tried to check the screening correlations of Gergis et al, and I’m getting such low values for a few proxies that there is no way that those can pass any test. I understood from the text that they used correlation on period 1921-1990 after detrending (both the instrumental and proxies), and that the instrumental was the actual target series (and not the against individual grid series). Simple R-code and data here.
http://www.filehosting.org/file/details/349765/Gergis2012.zip

I’ve re-checked his results from scratch and can confirm them.

Law Dome in Mann et al 2008

As mentioned yesterday, the Law Dome series has been used from time to time in IPCC multiproxy studies, with the most remarkable use occurring, needless to say, in Mann et al 2008. As noted yesterday, despite Law Dome being very high resolution (indeed, as far as I know, the highest-resolution ice core available), with the DSS core completed in 1993 and (shallow) updates in 1997 and 1999, there hasn’t yet been a formal technical publication.

To give a rough idea of Law Dome resolution, its layer thickness between AD1000 and AD1500 averages 0.45 m, as compared to 0.034 m at Vostok, 0.07 m at EPICA Dome C, 0.1 m at Siple Dome, 0.028 m at Dunde and 0.18 m at the NGRIP site, which is regarded as very high resolution. This is a very high accumulation and very high resolution site. There were some technical difficulties with the original DSS core in the 19th century and the upper portion of the “stack” presently reported relies on the DSS97 and DSS99 cores, crossdated to the long DSS core.
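As a rough back-of-envelope on what those layer thicknesses mean for resolution, the snippet below converts them into the time spanned by a fixed sampling interval; the 5 cm sample length is an arbitrary illustrative choice, not a value from any of the studies.

layer_m <- c(LawDome = 0.45, Vostok = 0.034, EPICA_DomeC = 0.07,
             SipleDome = 0.10, Dunde = 0.028, NGRIP = 0.18)   # metres of ice per year, as quoted above
sample_m <- 0.05                                              # hypothetical 5 cm sampling interval
round(sample_m / layer_m, 2)                                  # years spanned by one sample at each site

A 5 cm sample at Law Dome spans roughly a tenth of a year, versus about a year and a half at Vostok – the sense in which high accumulation buys high resolution.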