Connolley co-author: “Unfortunately we have deleted all the NetCDF files…”

Recently, I made a request to Thomas Bracegirdle, junior partner of Connolley and Bracegirdle, for the model data used in two recent articles: Bracegirdle and Connolley (GRL 2007) about 20th century Antarctic models; and Bracegirdle, Connolley and Turner (JGR 2008) about 21st century Antarctic models. William Connolley is well known in the blogosphere, especially for his zeal in extinguishing heresy at Wikipedia.

The request for collated model data was very similar in form to my request last fall to Benjamin Santer, senior partner of Santer and Associates LLP, for model data used in Santer et al 2008.

Readers may recall Santer’s rude refusal to provide the data, culminating in the cordial Team salutation “Please do not communicate with me any further” – I guess this means “Hasta la vista, baby” in Team dialect.

I recounted this refusal and the progress of several FOI requests in several contemporary posts here, here, here and here.

After all of this, Santer’s boss, David Bader, sent me an email purporting to “clarify several mis-impressions” – saying that, aw shucks, they had planned to put the data online all along and that my various FOI requests had nothing to do with it. See here.

Connolley and Bracegirdle (GRL 2007) had said that they had assessed “19 coupled models from the IPCC fourth assessment report archive from the simulation of the 20th century,” from the “Coupled Model Intercomparison Project phase 3 (CMIP3) multi-model dataset at https://esg.llnl.gov:8443/ “. They report collecting several variables, one of which was sea ice, a variable of considerable recent interest. For sea ice, they said that they used data from “15 coupled models”, considering “sea ice fraction (the proportion of grid covered by sea ice and not leads) rather than ice extent (the proportion of grid covered by ice of fraction at least 15%)”.
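The fraction-versus-extent distinction matters for comparisons like theirs. As a minimal sketch of the two measures (the grid values, cell areas and variable names below are made up for illustration; real model grids have varying cell areas):

```python
import numpy as np

# Hypothetical sea-ice fraction field (0..1) on a small grid, with uniform
# cell areas in km^2 -- purely illustrative values, not model output.
frac = np.array([[0.95, 0.80, 0.10, 0.00],
                 [0.90, 0.60, 0.20, 0.05],
                 [0.50, 0.30, 0.12, 0.00],
                 [0.20, 0.14, 0.00, 0.00]])
cell_area = np.full(frac.shape, 1000.0)  # each cell 1000 km^2

# Ice area: fraction-weighted sum over all cells.
ice_area = np.sum(frac * cell_area)

# Ice extent: total area of cells with at least 15% ice fraction.
ice_extent = np.sum(cell_area[frac >= 0.15])
```

On this toy grid the extent (8,000 km²) exceeds the fraction-weighted area (4,860 km²), since every cell at or above the 15% threshold counts in full.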

They noted that

… except for CSIRO the models have essentially zero skill. This is because, apart from CSIRO, all the models have months in which their total area falls well outside the observed range compared to satellite observations from Comiso [1999] using the bootstrap method, which verify best against other observations in Antarctica [Connolley, 2005]. All models produce a seasonal cycle with a peak in approximately the right season, though HadCM3 is a month late and NCAR CCSM two months early. IAP FGOALS has vastly overextensive ice, extending to South America.

Their table showing “essentially zero skill” is as follows:

Bracegirdle et al (JGR 2008) examined 13 models for the 21st (rather than 20th) century, reporting that:

Projections of total sea-ice area show a decrease of 2.6 ± 0.73 × 106 km2 (33%).

On Mar 30, 2009, I wrote Bracegirdle via the online submission form at the British Antarctic Survey, requesting the collated monthly data used in the articles. The underlying data is online at PCMDI, but to extract the monthly data would require downloading terabytes of data and then figuring out how the monthly composites were made – time-consuming clerical operations, with an attendant risk of error, that are irrelevant to the statistical analysis.

For the purpose of statistical analysis, I was prepared to use the Bracegirdle-Connolley collation – only verifying the collation if issues arose. Here is my initial request:

Dear Dr Bracegirdle,

I read your interesting articles on AR4 models and would appreciate a digital version of the collation of Antarctic sea ice model projections as used in your most recent articles.

Regards, Steve McIntyre

Bracegirdle replied promptly but not responsively, sending me a PDF of his article rather than the requested data. I replied:

I already had a copy of the article. My interest was in the DATA: the collation of Antarctic sea ice model projections.

Thanks, Steve McIntyre

Bracegirdle cheerfully replied:

I’d be happy to supply the data. Would NetCDF format be ok?

I thought to myself that the Team seemed to have learned something from the Santer episode. (They had, but not in the way that I had optimistically thought when I got the above email.) A week or so later, Bracegirdle emailed:

I have attached NetCDF files of the data that were used for Fig. 8 and described in Bracegirdle et al. (2008). I had to convert to NetCDF from PP format (which is what we and the UK Met Office use). Therefore some metadata, such as variable name (look for ‘unspecified’), does not appear in the NetCDF files. Hopefully these are the data that you were referring to.

Cheers,

These turned out to be tiny files. They did not contain the collated monthly sea ice data used for the calculations, but were instead small files of about 80 numbers showing 21st century sea ice concentration change (the difference between the 2080-2099 mean and the 2004-2023 mean) for (a) DJF, (b) MAM, (c) JJA and (d) SON.
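For what it’s worth, the reduction behind those 80 numbers – a seasonal mean over 2080-2099 minus the same seasonal mean over 2004-2023 – is a few lines of code once the monthly collation exists. The series and names below are hypothetical (synthetic data for a single grid point), and December is assigned to its own calendar year rather than to the following DJF, a simplification that real seasonal analyses usually avoid:

```python
import numpy as np

# Synthetic monthly series, Jan 2004 .. Dec 2099, one grid point.
years = np.arange(2004, 2100)
n_months = years.size * 12
rng = np.random.default_rng(0)
monthly = rng.normal(size=n_months)  # stand-in for ice concentration
month_of_year = np.tile(np.arange(1, 13), years.size)
year_of_month = np.repeat(years, 12)

seasons = {"DJF": (12, 1, 2), "MAM": (3, 4, 5),
           "JJA": (6, 7, 8), "SON": (9, 10, 11)}

def seasonal_mean(y0, y1, months):
    """Mean over the given calendar months within years y0..y1 inclusive."""
    sel = (np.isin(month_of_year, months)
           & (year_of_month >= y0) & (year_of_month <= y1))
    return monthly[sel].mean()

# One change number per season: late-century mean minus early-century mean.
change = {s: seasonal_mean(2080, 2099, m) - seasonal_mean(2004, 2023, m)
          for s, m in seasons.items()}
```

The actual Bracegirdle et al. files presumably carry one such number per grid point per season; the hard part, as discussed below, is the monthly collation itself, not this final subtraction.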

Once again, it wasn’t responsive to the request. So one more request:

This is not at all what I was looking for. You sent me data sets that are only 30K. You state in the article “Model data were retrieved from the data portal at https://esg.llnl.gov:8443/ , from which 19 of the 24 available models were found to have the data required for this assessment.”

This is the data that I was looking for.

Regards, Steve McIntyre

Bracegirdle responded that he couldn’t supply the data. After initially asking me “would NetCDF format be ok?” and my answering yes, he now said that they had deleted the NetCDF data and that it “would take some time (more time than I have spare!) to retrieve the data again or convert them back to NetCDF”, his answer ultimately being the same as Santer’s.

Hi Steve

Unfortunately we have deleted all the NetCDF files that we downloaded after converting them to PP format. It would take some time (more time than I have spare!) to retrieve the data again or convert them back to NetCDF. However, all the data are freely available at https://esg.llnl.gov:8443/home/publicHomePage.do – you just need to register.

I’ve responded as follows:

Dear Tom,

I find it hard to believe that the British Antarctic Survey would permit the deletion of relevant files for two recent publications, or that there aren’t any backups for the deleted data on institutional servers. Would you mind inquiring for me? In the meantime, would you please send me the PP format files that you refer to here for the monthly sea ice data for the 20th century models discussed in your GRL article and the 21st century models referred to in your JGR article?

Regards, Steve McIntyre

We’ll see where this goes.


57 Comments

  1. Smokey
    Posted Apr 18, 2009 at 9:34 PM | Permalink

    So, they deleted the relevant data. Gee, how unfortunate.

    -snip –

  2. Posted Apr 18, 2009 at 9:58 PM | Permalink

    Bracegirdle is temporizingly evasive, instead of Trouet’s “temporally pervasive.”

  3. Posted Apr 18, 2009 at 9:59 PM | Permalink

    I hope the casual reader understands this: scientists I’ve worked with delete NOTHING. Nada. Go ahead and try to delete the data in their engineering papers, but you’d better bring a gun or a bat or something. Erasers are not permitted, and the whole discussion of applying an eraser to NON FAULTY lab data is idiotic. It costs pennies to store a megabyte. To even imagine in passing that a recent paper’s data are deleted is beyond reason.

    Put me down as a data deletion skeptic, data deletion naturalist or even a data deletion denier.

    Sorry Steve, not much more to add. Snip it if you’re in the mood.

    • John Norris
      Posted Apr 19, 2009 at 9:27 AM | Permalink

      Re: Jeff Id (#3),

      Put me down as a data deletion skeptic, data deletion naturalist or even a data deletion denier.

      I second the motion. -snip- Please show me where data supporting research at the professional level has been deleted with anything more than a freak occurrence, with anything less than a most humble apology from the offender, and recognition from the researcher that the research is now at risk of not being supported.

      • Ryan O
        Posted Apr 19, 2009 at 9:47 AM | Permalink

        Re: John Norris (#26), I don’t know about that. I can see downloading something, converting it to a usable format, and then deleting the original download. I think that is unwise, but not gross incompetence or malfeasance. What I wonder is why the files were initially offered to Steve in the NetCDF format if that format was the intermediate format. What should have been offered to Steve was the PP formatted files. Steve shouldn’t have had to ask explicitly for the PP format ones; they should have been offered.
        .
        Now, if they deleted those, then I would agree.

      • Clif C
        Posted Dec 13, 2009 at 2:38 PM | Permalink

        Kenneth J. Stewart and Stephen B. Reed, “CPI research series using current methods, 1978-98”, Monthly Labor Review, June 1999, p. 29:
        “No attempt was made, however, to recompute the [indices] by applying hedonic regression analysis to the individual [index] prices collected for the CPI during the 1978–98 period. Such an effort would not have been feasible, in part because the early price data are no longer available.”

  4. MWalsh
    Posted Apr 18, 2009 at 10:22 PM | Permalink

    Meh. Might be stonewalling, might not be.

    To me it reads that they obtained the original data from another source, converted it, and kept the conversion.
    Then they deleted the original, since it’s readily available and wasn’t in the format they used anyway.
    That would make sense if they didn’t actually use it in that format, since it sounds like they still have the data they actually did use.
    Although, like Jeff, I think I’d be inclined to keep all of it, personally.

    Now, the claim of not “having time” to reconvert for you might be stonewalling….

    Perhaps you should request the PP format data and convert on your end.

  5. Jason
    Posted Apr 18, 2009 at 10:23 PM | Permalink

    My first thought upon reading the exchange is that you could have been a little more patient.

    Certainly the Team’s history of avoiding open data access is well documented and undeniable.

    But Bracegirdle has no such history (yet).

    As I understand Bracegirdle’s take on the situation:

    1. The files in question were downloaded from a publicly accessible site, to which he referred you.
    2. The files were deleted after being converted from a format which Bracegirdle’s group does not use.

    I really don’t think that #2 is unreasonable.

    Jeff’s comparison is not on point, because these files were NOT “lab data”. They are something downloaded off the internet from another group. Perhaps the nature of the specific files used would provide some useful information. But this is a far cry from deleting data that was measured in a lab or even data that was generated by a simulation or analysis in the lab.

    It also seems to me (from the very limited amount written here) that the paper in question does consensus climate models no favors.

    Given your history of not getting the information you request, I don’t think your anger is unreasonable, but I do think it is a little premature.

    • Neil Fisher
      Posted Apr 18, 2009 at 11:41 PM | Permalink

      Re: Jason (#5),

      Jeff’s comparison is not on point, because these files were NOT “lab data”. They are something downloaded off the internet from another group. Perhaps the nature of the specific files used would provide some useful information. But this is a far cry from deleting data that was measured in a lab or even data that was generated by a simulation or analysis in the lab.

      Let’s see, Jason – they downloaded gigabytes or perhaps even terabytes of model data, then presumably ran some sort of program to summarise it into a form they found useful. They published a paper based on this collated/summarised data. They then deleted the summary data. As we have seen before many times, data referenced on the web can and does change, and not everyone keeps their old data when they make the changes (and given the volume of data that comes out of GCMs, it’s likely that this particular data is not archived for eternity), nor do they always even keep a record of what was changed and when (a change log). If someone wants to check the paper or some aspect of it, or even try a variation they think will provide some useful metric not reported in the original paper, it can’t be done.

      This is, at best, bad practice – even sloppy. It’s somewhat surprising to me that such actions would not result in a withdrawal of the paper, either by the authors themselves with an apology, or by the publisher with a note about the journal’s archiving policies.

      snip

  6. Soronel Haetir
    Posted Apr 18, 2009 at 10:26 PM | Permalink

    The one thing I would say about these episodes, your initial requests strike me as much too short.

    Unlike a FOIA request, where you have to be careful to include every possible item of interest, these requests seem like they would be better served by a far more pin-point request.

    Steve: The problem isn’t my request; it’s that he deleted the data.

  7. Posted Apr 18, 2009 at 10:42 PM | Permalink

    Steve, I have downloaded and fiddled around with some of the AR4 ensemble model data from the CMIP3 portal a while back. As I recall, a century of monthly means is about 1 GB per variable per model run. So, you are in the market for about 20 GB of data for the sea-ice fraction stuff. With a bit more information, i.e. which models and time periods, I can easily link that data to this thread from my server. The native format is netCDF.

  8. Steve McIntyre
    Posted Apr 18, 2009 at 10:44 PM | Permalink

    #5. Where did my language indicate that I was “angry”? I discourage readers from getting “angry” and I try not to get “angry” myself. Here I’m simply documenting another incident. As noted below, the “data” is a work product from gigabytes if not terabytes of data and is not simply downloaded from another site as is.

    #4. As to your suggestion that I request the PP data, did you not read my closing letter in which I ask for precisely that data.

    As to deleting the NetCDF files because they are in a format that they do not use – why put them in this format if they didn’t use it? And having prepared these data sets, like Jeff Id, I find it hard to understand the mentality of deleting them. I’ve got a number of old GISS scrapes and I don’t throw them out. I’m working on a home computer not a public sector workstation.

    Anyway, presumably he has source code for the operation in which he converted PP data to NetCDF. How long would it take him to run the conversion? Negligible. Or did he delete that as well?

    As to the suggestion that I re-collate from PCMDI, this is what Santer argued (but they eventually archived after FOI, as I observed above). The point, which some have missed, is that PCMDI doesn’t have the monthly data just sitting there. If it did, Bracegirdle could just point me to it. But Bracegirdle has collated the monthly versions from terabytes of model data. It’s not a small job doing this from scratch. I have two objections. Any re-collation introduces pointless risk of error that is irrelevant to statistical analysis. Plus both Santer and Bracegirdle were paid by public institutions to do this collation; I have no interest in doing it over again.

  9. Steve McIntyre
    Posted Apr 18, 2009 at 10:55 PM | Permalink

    William Connolley has a webpage here on converting from PP files
    http://www.antarctica.ac.uk/met/wmc/misc/ancil/

  10. Posted Apr 18, 2009 at 10:58 PM | Permalink

    Note: relevant because the paper describes some other ways to measure sea-ice extent (e.g. Comiso and Nishio, 2008), as well as referencing Connolley and Bracegirdle (2007), and includes T. Bracegirdle as an author.
    The British Antarctic Survey is apparently publishing a paper based upon recent observations of sea-ice around Antarctica. This dataset

    Ice core drilling in the fast ice off Australia’s Davis Station in East Antarctica by the Antarctic Climate and Ecosystems Co-Operative Research Centre shows that last year, the ice had a maximum thickness of 1.89m, its densest in 10 years. The average thickness of the ice at Davis since the 1950s is 1.67m.

    A paper to be published soon by the British Antarctic Survey in the journal Geophysical Research Letters is expected to confirm that over the past 30 years, the area of sea ice around the continent has expanded.

    I found the paper in the GRL papers-in-press bin, downloaded, and attached the abstract:

    Turner, J., J. C. Comiso, G. J. Marshall, T. A. Lachlan-Cope, T. Bracegirdle, T. Maksym, M. P. Meredith, Z. Wang, and A. Orr (2009),

    Non-annular atmospheric circulation change induced by stratospheric ozone depletion and its role in the recent increase of Antarctic sea ice extent

    Abstract:

    Based on a new analysis of passive microwave satellite data, we demonstrate that the annual mean extent of Antarctic sea ice has increased at a statistically significant rate of 0.97% dec⁻¹ since the late 1970s. The largest increase has been in autumn, when there has been a dipole of significant positive and negative trends in the Ross and Amundsen-Bellingshausen Seas respectively. The autumn increase in the Ross Sea sector is primarily a result of stronger cyclonic atmospheric flow over the Amundsen Sea. Model experiments suggest that the trend towards stronger cyclonic circulation is mainly a result of stratospheric ozone depletion, which has strengthened autumn wind speeds around the continent, deepening the Amundsen Sea Low through flow separation around the high coastal orography. However, statistics derived from a climate model control run suggest that the observed sea ice increase might still be within the range of natural climate variability.

  11. Posted Apr 18, 2009 at 11:28 PM | Permalink

    $100 to CA that says that Steve will never get anything out of William Connolley. Since Stoat doesn’t work for BAS any more he shouldn’t be able to block anything (in a rational world of course).

  12. Edouard
    Posted Apr 19, 2009 at 12:53 AM | Permalink

    April is not yet over ;-)))

  13. RW
    Posted Apr 19, 2009 at 1:48 AM | Permalink

    “This is not at all what I was looking for. You sent me data sets that are only 30K…”

    I think that if you’re asking for a favour, this is entirely the wrong tone to adopt. I am a physicist, and if I received a query like this regarding one of my papers, I’d be very inclined to ignore it, especially if the data being requested was freely available.

  14. Posted Apr 19, 2009 at 3:35 AM | Permalink

    Can I ask for the record: what is the difference between Bracegirdle and Connolley deleting the datafiles for recent papers in which they made substantive claims, and Bracegirdle and Connolley simply making substantive claims without bothering with any actual data?

    If Bracegirdle cannot produce the files, then Steve should request that the papers be withdrawn by the journals. “The dog ate my homework” shouldn’t be any more acceptable an excuse in academic journals than it is in junior high.

  15. Posted Apr 19, 2009 at 5:10 AM | Permalink

    Steve, you do come across as a bit abrupt in some of those emails, which might be taken as angry/rude.

    It shouldn’t matter of course, a request for data is a request for data; I’ll disagree with RW that it should be seen as a favour. It’s part of the job.

    (Dammit, Steve would you mind deleting my previous comment, I’ve incompetently put an email format in the website box!)

  16. Lawrence Beatty
    Posted Apr 19, 2009 at 5:15 AM | Permalink

    Well, for my tuppence worth, it is nigh on impossible that any organisation today – whether a small glazing business or a government department – does not have 24-hour backup facilities for its data.
    Yet here we have vitally important data, inasmuch as it goes towards shaping decisions like those all western governments are now making, being accidentally erased.

    That is absolutely outrageous and heads should roll for such poor sloppy practice.

    -snip

  17. per
    Posted Apr 19, 2009 at 5:18 AM | Permalink

    the BAS is funded by NERC:
    http://www.antarctica.ac.uk/about_bas/our_organisation/who_we_are.php

    NERC has a policy on data holding:
    http://www.nerc.ac.uk/research/sites/data/policy.asp

    “It follows that scientists in the NERC environmental science community will often be holding
    data owned by NERC (or some other body), or in which NERC as a funding body has an interest.
    There are consequent obligations on the holders (and their management) to look after the
    data responsibly, so that these interests are not compromised, and indeed to be aware of the
    issues that are important in this context.”

    have fun
    per

  18. dearieme
    Posted Apr 19, 2009 at 6:32 AM | Permalink

    I was parked outside the BAS recently. It was a mild Spring day; people were strolling past in shirtsleeves. Suddenly a woman appeared dressed in a woolly hat, a thick scarf and an anorak worn over a bulky sweater. She walked into the BAS building. Scary, eh?

  19. Craig Loehle
    Posted Apr 19, 2009 at 7:03 AM | Permalink

    Why might the raw data from the web not be useful (as steve noted)? Problems you can encounter: duplicate measurements on the same date (delete one? average them?), missing data, outliers. How you handle these makes a difference. Then there is the question of gridding the data, which has numerous pitfalls.

    Valid reasons for “data is gone”: you no longer work at the place where the data was collected (hardly likely for a 2008 paper, eh?), the person who collected/stored the data kept it in a format no one can now read (it’s happened to me), and catastrophic computer crash (which means you don’t back up files often enough).

    Responses I’ve gotten to data requests: right away, and never. About 50-50. And this is with much politeness and without the CA reputation.

  20. stan
    Posted Apr 19, 2009 at 7:10 AM | Permalink

    “The dog ate my homework”.

    I wonder if it’s the same dog that ate Jones’ homework.

    The are two possible explanations. -snip

  21. Snowmaneasy
    Posted Apr 19, 2009 at 7:45 AM | Permalink

    My first Post…a little off topic but maybe someone can answer it… of interest to me is the problem of a continuing CO2 buildup (levels on top of Mauna Loa are still increasing at exactly the same rate as of early April, 2009…) despite the severe global economic recession. One would expect to see a modest drop in the CO2 levels…very odd, or is there a severe time lag? One would imagine our current recession to be a dream come true to those amongst us who would have us sitting and freezing in the dark. I have seen nothing anywhere on this topic!

    Steve: unfortunately, this blog cannot be all things to all people. I realize that most people want to discuss the “big picture”. However, editorially, I’ve taken the position that such threads turn into the same points over and over. I’ve therefore taken the position that this blog will deal with finite issues, focussing on verification and statistical analysis, and do not permit this sort of “big question” to enter specific threads, while having sympathy for people who wonder about it. I wonder about the “big questions” myself and regret that IPCC has not provided an engineering-quality A-to-B exposition of how doubled CO2 leads to 3 deg C. When we see such a document, we’ll discuss it here, but until then, we’ll stick to narrower issues where I think that we can improve our collective understanding.

  22. Steve McIntyre
    Posted Apr 19, 2009 at 7:50 AM | Permalink

    I got stonewalled before CA existed. It is sometimes used as an excuse for not providing data; however, I do not personally believe that it is the reason for not providing data.

    Consider Jacoby’s response before CA. He archived a lot of data, but failed to archive a lot of data. It turned out that he hadn’t given Mann his Mongolia data (Mann digitized it from a squiggle) and said so in also refusing to give it to me. As a primary data collector, it seemed to me that he was annoyed at Mann for getting more publicity than Jacoby without ever doing any data gathering himself.

    Added to this was the bizarre “few good men” theory for archiving data that supported your story and destroying the data that didn’t.

    Also Team members don’t always share. Apparently Esper can’t get data from Hughes.

    For me, I don’t much care about motives. Mining promoters have to report all their drill results. So should climate scientists.

  23. Ryan O
    Posted Apr 19, 2009 at 8:43 AM | Permalink

    Steve, is there a reason why he couldn’t simply supply you with the PP formatted data? I imagine converting from PP to whatever other format you wanted wouldn’t be too trying. After all, what you really want is the monthly composites – that’s where all the work would be. Did he give a reason for not wanting to supply the PP format?

    • Steve McIntyre
      Posted Apr 19, 2009 at 9:51 AM | Permalink

      Re: Ryan O (#24),

      Steve, is there a reason why he couldn’t simply supply you with the PP formatted data?

      I can’t think of any reasonable excuse. He asked me if NetCDF would be all right; I said yes. Then he said he couldn’t supply it because he had deleted the NetCDF. I’ve asked for the PP and we’ll see what happens. In his shoes, I would have spent the half hour (or day) recovering the NetCDF files and cursed myself for deleting the public format data. Or as a very poor fallback, I would have sent the PP data with abject apologies and saying that I would arrange conversion help if it proved problematic. I don’t see how he imagined that any good would come out of simply providing nothing.

      #26 and others, I have many data sets in R binary form and it would take a little while to make them into ASCII. If I’d used such a data set in a publication, if someone were not satisfied with an R binary format, I’d view myself as obligated to provide an ASCII version or equivalent. ( I don’t view myself as being under equivalent obligation for blog posts, where I think that readers of this blog should be able to handle R binary data.)

      Folks, no editorializing on motives. I’ve snipped some such commentary as unproductive.

  24. Posted Apr 19, 2009 at 8:53 AM | Permalink

    per makes an important point.

    The NERC data policy details our commitment to support the long-term management of our data. It also outlines the roles and responsibilities of NERC funded scientists, the NERC data centres and NERC management in ensuring that data that are collected using NERC funds are available for the long-term. The main agents for supporting the data policy are the network of NERC data centres.

    Reading the policy, it makes it quite clear that it applies to the outputs of computer models as well as field data.

  25. MWalsh
    Posted Apr 19, 2009 at 11:01 AM | Permalink

    #8 Steve McIntyre As to your suggestion that I request the PP data, did you not read my closing letter in which I ask for precisely that data.

    Nope, I did not notice that last night….my apologies for suggesting a fait accompli.
    That’ll teach me to not post when I’m tired and half-snapped. 🙂
    It will be interesting to see the response, which should answer the stonewalling/not question.

    If I may, there seems to be a series of assumptions being made on the part of a number of the posters which are contrary to my own after reading the emails.

    Perhaps it’s my lack of knowledge on the formats in question (or perhaps I’m missing something else in the emails)….so a question:
    Why is the assumption being made by several posters that there has been data loss from a change in file format?
    Does the format conversion somehow imply that?

    My assumption (that I described poorly last night) would lead to a chain like this:

    1/ Request data from third party. Receive data from third party in NetCDF file format.
    1/ alternate 1/ Download directly from web-portal. Software at hand collects in NetCDF format.
    1/ alternate 2/ Download directly from web-portal. Software converts to PP format on-the-fly. End of steps, NetCDF is never actually saved.

    2/ Convert all received data from NetCDF to PP.

    3/ Trash NetCDF files, keep PP files and work with those.

    Under this assumption, there is no data loss/deletion/canine consumption….all data is retained, but in the PP file format.
    The only thing not being retained is the original file format, which is, after all, readily available per re-request.

    As I stated last night, I, personally, would be inclined to keep the original received file….but I’m a pack-rat, and there is always a possibility of that just-in-case moment actually occurring. Keeping it saves me having to re-request. And, yes, agreed, in the circumstances not keeping the original would be sloppy.
    But, if the data quantity is massive, I certainly would think twice about keeping it.
    This, BTW, may answer your question Anyway he presumably he has source code for the operation in which he converted PP data to NetCDF. How long would it take him to run the conversion?. Conversion of a bunch of 30k files is trivial time-wise….conversion of 20, 50, 100, 1000 gigs of data, not so much.

    I guess it needs to be said, I do have issues with the Santer mess, as well as Jones and Steig…..but my issue mostly comes from the belligerent tone and attitude of “nobody should be checking my work”, as well as non-compliance with scientific norms of data provision for review. These are primary issues that led to me being skeptical (and by saying skeptical, I’m being kind here) of their claims in the first place.

    I’m afraid I’m not seeing that tone and attitude in the current set of emails….at least, not yet.

    • Steve McIntyre
      Posted Apr 19, 2009 at 12:21 PM | Permalink

      Re: MWalsh (#29), the PCMDI data comes in gigantic files, which are processed to yield the monthly sea ice data in question as a work product which is a fraction of the original data size. Maybe 1 MB per model tops.

      Yes, it’s possible to re-do the work but, as repeatedly observed, such re-collation introduces pointless risk of error. Plus the work has already been used for a publication and should be available (as even Santer’s boss agreed.)

      The tone of these emails is more pleasant, but the result was the same. As compared with Santer, Phil Jones was always very pleasant but nothing happened without resorting to FOI.

      • Andrew
        Posted Apr 19, 2009 at 2:11 PM | Permalink

        Re: Steve McIntyre (#30),

        Phil Jones was always very pleasant

        Unless your name happens to be Warwick Hughes…

        Steve: That brush-off was not a 9 on a scale of 1-to-Santer. 🙂

  26. Cris
    Posted Apr 19, 2009 at 1:58 PM | Permalink

    Mwalsh asks

    Why is the assumption being made by several posters that there has been data loss from a change in file format?
    Does the format conversion somehow imply that?

    One would typically assume that a format conversion will not insert errors, given clean input data fed into a correctly written program running on a stable system. But given the possibility for problems in any of those areas (see Saltzer, Reed, and Clark, End-to-end arguments in system design), why wouldn’t you retain the intermediate data? It seems to be an integral step in the data chain, as worthy of archiving as any other data.

  27. AnonyMoose
    Posted Apr 19, 2009 at 9:39 PM | Permalink

    In 1997 Connolley didn’t publish some data but did have it separately available. What has he learned since then?
    http://ams.allenpress.com/perlserv/?request=get-document&doi=10.1175%2F1520-0442(2000)013%3C1351%3ADOAARC%3E2.0.CO%3B2&ct=1

  28. Plimple
    Posted Apr 20, 2009 at 4:45 PM | Permalink

    “I thought to myself that the Team seemed to have learned something from the Santer episode.”

    You think Bracegirdle and Santer are working together on this work? Or is it that everyone you deal with in climate science is part of the Team? Gasp.

    “Once again, it wasn’t responsive to the request.”

    I’m sure Tom Bracegirdle enjoys being referred to as “it”.

    Paranoia and rudeness. I’m sure that having read this Tom is really eager to help you Steve.

    • Steve McIntyre
      Posted Apr 20, 2009 at 6:56 PM | Permalink

      Re: Plimple (#35),

      You think Bracegirdle and Santer are working together on this work?

      No, I didn’t.

      “Once again, it wasn’t responsive to the request.”

      I’m sure Tom Bracegirdle enjoys being referred to as “it”.

      No such reference. “It” in this context, obviously refers to the email.

      I’m sure that having read this Tom is really eager to help you Steve.

      I only ask that he fulfil his obligations as part of publishing an article and in accordance with being a recipient of public funds. I think that he should appreciate the interest, but whether he is “eager” to fulfil his obligations is immaterial to me.

    • Willis Eschenbach
      Posted Apr 20, 2009 at 7:31 PM | Permalink

      Re: Plimple (#35), you seem to be advancing the “But he didn’t ask me nicely” defense as though it were actually rational and reasonable.
      .
      Asking a scientist to release his data and methods should never have to be done. The data and methods need to be, and should be, made available as part of the usual process of publication. Replication is the central part of the scientific method, and it can’t be done when people hide how they’ve gotten their results.
      .
      As such, there is no defense for not handing over your data and methods. You can’t just go “He didn’t say ‘Mother May I’ first! He was mean to me.” You can’t just say (as Phil Jones did) that you refuse because they’ll (gasp) look for errors in your results.
      .
      So what if someone didn’t ask in the world’s nicest way? Who cares? If you want to be an obstructive jerk and refuse to show your data and methods, you deserve all that you get, and likely more.
      .
      So whether Tom is “eager to help” or not is totally meaningless. If he wants to be a real scientist and not just play a scientist on TV, he has to follow the rules. His state of mind as he does so means nothing. If you want to play, you have to pay.
      .
      Having said that, I’ve never known Steve to be anything but polite and cordial in his requests for information, as you can see in the record of his correspondence above.
      .
      w.

  29. jay alt
    Posted Apr 23, 2009 at 8:41 PM | Permalink

    And when can we expect that you will be publishing your findings?
    Yeah, I thought so . . .

  30. Datastream
    Posted Apr 24, 2009 at 2:41 AM | Permalink

    Re #37

    Asking a scientist to release his data and methods should never have to be done. The data and methods need to be, and should be, made available as part of the usual process of publication. Replication is the central part of the scientific method, and it can’t be done when people hide how they’ve gotten their results.

    This assertion is fine and would receive little argument from most practicing scientists like myself. The problem is how far it is taken and how it is interpreted. Steve takes a pretty hard line that many on the receiving end feel pushes the envelope, exhibits a sense of entitlement, and is delivered brusquely, even discourteously.

    Providing data is an obligation that most of us meet – methods *should* be described in publications, but I will admit that sometimes this is weaker than we should aspire to: reviewers and journals have a way to go. Where I part company with Steve is in the extent to which I should make available intermediate tools.
    The handle-cranking and head-scratching that we see on this website (at its rare best) is intellectual property, as is the code that captures it. Sure, it must be described fully, but if “replicators” need to use their own intellectual horsepower to replicate with somewhat different tools, then that becomes a far better test. Not that I often mind sharing tools anyway, but that is a somewhat separate issue, and should be something for which the recipient is grateful. I have asked colleagues, on occasion, for code that they worked long and hard to develop. I have been grateful that they shared it rather than simply referring me back to the differential equations and physics that would have undoubtedly strengthened my understanding – I was taking shortcuts and their courtesy helped me to a speedy, if lazy and less principle-based, result. This forum and similar commentators often comment adversely on the “black box” garbage-in/garbage-out problem, but sharing code simply exacerbates this, in my experience.

    Finally, there is a real logical problem in this publicly-funded (therefore all ingredients accessible) mentality – do I need to allow an auditor to visit my lab to check my gas-chromatograph, or mass-spectrometer, or whatever? Indeed, should an independent investigator be present on my next field trip gathering pollen, ocean sediments, tree corings or whatever? At some point, tools and methods must be left to careful description. If the result is important enough and the concerns significant, then replication will surely be driven by these questions – see, for example, the replication of the deep Greenland ice cores, GISP and GRIP.

    • Craig Loehle
      Posted Apr 24, 2009 at 6:45 AM | Permalink

      Re: Datastream (#41), Okay, let’s consider replication of the Greenland ice cores. Replication in this case cost $millions, paid for by taxpayers. The data are publicly available, but let’s say the authors would NOT release them. Would that be ok with you? The two key world temperature reconstructions, CRU and GISS, can’t be audited. CRU won’t release anything, and GISS finally released its code but no one can make it work (last I heard). Is that ok? These are more than just individual research projects – they are government-funded labs creating critical datasets. Last October Phil Jones actually published a paper showing a huge urban heat island effect in China, contra past claims of only a 0.05 deg C effect. Did they redo the CRU analysis? Why not? I bet they find the same thing in the great red spot of Siberia.

      • bender
        Posted Apr 24, 2009 at 8:31 AM | Permalink

        Re: Craig Loehle (#42),
        Ross McKitrick’s paper suggests half of the current land surface warming trend may be due to under-estimated UHI effects. This factor clearly needs to be re-assessed – especially in light of the Chinese data. IPCC: this means YOU.

        I bet they find the same thing in the great red spot of Siberia.

        I agree. At the same time I am willing to bet there is a large residual warming effect attributable to changes (natural? anthropogenic?) in the Arctic ocean.

      • Kenneth Fritsch
        Posted Apr 24, 2009 at 11:02 AM | Permalink

        Re: Craig Loehle (#42),

        Last October Phil Jones actually published a paper showing a huge urban heat island effect in China, contra claims in the past of 0.05deg C effect only.

        Don’t we have to be careful in interpreting the magnitude of UHI found at some point in time versus a changing UHI that would be critical to measuring temperature anomalies over time? I personally feel that microclimate conditions and differences over time due to non-climate changes at the measuring stations have a larger effect than UHI by itself, but all that needs to be determined – as the Watts team has been attempting to do in the US.

        Re: Datastream (#41),

        If a scientist has his own peculiar reasons for not releasing (intermediate) data that would make replication much less difficult, I say so be it. My point, and I think perhaps that of others here, is that this should be clearly stated when reporting results in reviews that are precursors for policy decisions, i.e. use these data and conclusions at your own risk, as replication may not be performed readily or in a timely manner. I do not think you can have this one both ways.

    • Ryan O
      Posted Apr 24, 2009 at 10:49 AM | Permalink

      Re: Datastream (#41),
      I understand your concerns, and I agree with most of them in whole (especially the last two paragraphs) and all of them in part. I think a line would need to be drawn between what is sufficient for science and what is sufficient for policy.
      .
      Though this might be at odds a bit with many here, for scientific purposes, I do not see it as critical that a researcher must provide the specific code used to arrive at a result. I also agree that, in many cases, better science and more learning results from others attempting to replicate the results using physical principles and descriptions of methods (though sometimes the descriptions provided are terrible – a practice that definitely needs improvement). To me, that would be sufficient for science.
      .
      When it comes to evaluating science for policy purposes – such as the EPA reviewing and using published climate work to justify marching down the path to regulating greenhouse gasses – the primary purpose is not to learn, it is to assess the certainty of the result. It is an audit function, not a scientific one.
      .
      For an audit to be conducted properly, the data as used needs to be provided. I might even go so far as to say that the calibration of your gas chromatograph needs to be checked. Code needs to be provided and be audited for mistakes and buggy behavior. An audit is not a time to independently replicate results from first principles.
      .
      While I understand that many would object to this, when (quite literally) trillions of dollars might be spent, the government has a responsibility to take an engineering approach to the issue that necessarily involves auditing the data, code, and methods as they were actually used.

      • Kenneth Fritsch
        Posted Apr 24, 2009 at 11:11 AM | Permalink

        Re: Ryan O (#44),

        Ryan, I did not read your post before making mine, but I think we are saying the same thing, and I think it is an important point. It would also put more pressure on those scientists, and the groups supporting those scientists, who want to influence policy. I think those posters who sometimes comment here and imply that we should take those good scientists’ word for it need to remember “trust but verify” and all its implications.

  31. clivere
    Posted Apr 24, 2009 at 2:40 PM | Permalink

    Ryan O – unless a paper is written specifically to inform policy or as a result of a policy initiative, I don’t see how an author will know that this greater standard must be imposed for their specific paper. Do we have reason to believe the papers under discussion here should require such a standard?

    Ultimately the requirement must be to define the minimum standard for data and process description in order to support either review or replication. Once that minimum standard is defined, authors and journals should be encouraged to at least meet and ideally exceed that standard. Steve so far has only described the loss of intermediate files that will make review/replication more difficult, but has not demonstrated that it will make review or replication impossible.

    With regard to this specific incident I share some people’s unease with how Steve has handled this request. I try to look at this from the perspective of all parties and ask myself what is reasonable.

    1. Is Steve’s request reasonable? In my opinion yes, because he has an interest in the data and it would be helpful to him to have it in that form.

    2. Were the first two responses by Bracegirdle reasonable? Yes, but clearly he had misunderstood what Steve wanted, and that confusion was not helpful.

    3. Were the third and fourth responses by Bracegirdle reasonable? In my opinion yes, because he thought he was providing Steve with what he wanted and did so in a timely manner. I also note that the intermediate files were in the NetCDF format promised.

    4. Was the last refusal email by Bracegirdle reasonable? In my opinion, maybe. If the data is unavailable due to deletion, then it all hangs on whether anything material is lost by the deletion, other than creating extra work for Steve. If equivalent data is made available in the alternative PP format, then the response may be reasonable.

    5. Up to this point I don’t have any issue with Steve’s requests. However, Steve has then chosen to make this blog post. In my opinion, making the post is premature and an unreasonable escalation, at least in the form it has been posted. Steve should have waited until the response to his last request for the files in PP format was received. If the request was refused, or if Steve can demonstrate that this prevents reasonable review/replication, then escalation via this blog would be a reasonable tactic.

    6. The post also emphasises William Connolley, who was not involved in the email chain and whose input was apparently not sought. It is reasonable to mention he was a co-author, but it is not reasonable to place this level of emphasis on him at this point and to try to make him appear guilty by association. Steve may be having a bit of inter-blog sport by making the snark (I enjoy amusing and appropriate snark), but in these circumstances I feel the snark has an edge to it.

    7. Relating this to the very unpleasant Santer refusal is not justified. Steve had three main grounds for complaint against Santer, i.e. the refusal to provide data which was actually available, the appalling tone of the refusal, and the subsequent handling of the FOI. None of these apply in this instance. The data is not available due to deletion, which is a different issue entirely.

    • Ryan O
      Posted Apr 24, 2009 at 7:27 PM | Permalink

      Re: clivere (#47), My point is not to impose a greater standard for certain papers. My point is that before policymakers take action that a proper engineering study (which includes detailed auditing) be undertaken on the science upon which the proposed policy is based. To impose those kinds of standards on science in general would be, in my opinion, stifling.
      The burden of demonstrating the certainty of results is on the policymakers.

      • Kenneth Fritsch
        Posted Apr 30, 2009 at 9:14 AM | Permalink

        Re: Ryan O (#50),

        The burden of demonstrating the certainty of results is on the policymakers.

        I think in the ideal case the burden of establishing CIs would be on the policymakers – if policymakers were ideally and simply interested in establishing uncertainties. The policymakers, with whom I have experience in the US, are much more oriented towards marketing a political solution that often is preconceived.

        The thinking person, in my mind, has to take the available evidence into hand while at the same time recognizing the limitations of the systems involved and attempting to take those limitations into account. Blogging (and here one needs to be also cautious of looking only at an ideal version and not recognizing agendas) appears to me to be a potential tool better suited for that process than faithfully and single-mindedly heeding the policymakers, the peer-reviewed literature and the reviewers of that literature that might have demonstrated agendas.

        • Ryan O
          Posted Apr 30, 2009 at 9:36 AM | Permalink

          Re: Kenneth Fritsch (#53), Agree entirely. Ideally, we could depend on our government agencies and policymakers to serve this function, as that is what they are supposed to do. What actually happens would indicate the situation is far from ideal.

        • bender
          Posted May 1, 2009 at 4:56 AM | Permalink

          Re: Kenneth Fritsch (#53),

          The policymakers, with whom I have experience in the US, are much more oriented towards marketing a political solution that often is preconceived.

          If their preconceived political solutions do not allow for the possibility of uncertainty or error, then it is not much of a “political solution”, is it?

        • Kenneth Fritsch
          Posted May 1, 2009 at 9:45 AM | Permalink

          Re: bender (#56),

          If their preconceived political solutions do not allow for the possibility of uncertainty or error, then it is not much of a “political solution”, is it?

          Frequently, as it turns out, in the end it is not a solution, but I think the politicians/policymakers, in the US anyway, have a storehouse of general solutions waiting for problems, or better yet emergencies, to come along to which they can apply a “solution”.

          And, of course, the politician/policymaker has to make a go at establishing the possibility of a calamity as part of this process.

    • B.D.
      Posted Apr 25, 2009 at 11:27 AM | Permalink

      Re: clivere (#47),

      Your point #4:

      PP is a proprietary format, so the data are not accessible to Steve. Bracegirdle deleted the publicly-accessible data. That is the problem and is what makes Bracegirdle’s last refusal unreasonable. Bracegirdle should convert the PP data to netCDF for Steve and save it somewhere, and not make the same mistake again in the future.

  32. per
    Posted Apr 24, 2009 at 2:47 PM | Permalink

    Datastream

    do I need to allow an auditor to visit my lab to check my gas-chromatograph, or mass-spectrometer, or whatever?

    you should be aware that for much research that the government relies upon, that is the standard that the government mandates. Look up Good Laboratory Practice.

    It is also worth noting that this is an area where paper after paper has been found to have severe problems; look at MBH’98, where the original methods were completely inadequate, and even the corrected version is not enough to allow people to replicate the work.

    And isn’t that the point of science? You put enough in the paper so people can replicate. When people cannot replicate (and that is what has happened many times here), then you have to start asking why. You don’t just put your hands up and say, “Oh, I am doing something else now, but the original paper stands even if others cannot replicate it”.

    And the point is, this is not just trivial stuff. These papers are being used to justify multi-billion dollar decisions, so it is a reasonable request to find out if the basics can be replicated.

    per

    • John M
      Posted Apr 24, 2009 at 4:13 PM | Permalink

      Re: per (#47),

      Datastream

      do I need to allow an auditor to visit my lab to check my gas-chromatograph, or mass-spectrometer, or whatever?

      you should be aware that for much research that the government relies upon, that is the standard that the government mandates. Look up Good Laboratory Practice.

      And since the EPA is now involved, you might want to Google:

      EPA protocols audit.

      If research is simply to publish papers, such physical audits would be silly. As pointed out earlier, if it’s important enough to influence policy, the comfy world of the ivory tower no longer exists.

  33. Christopher
    Posted Apr 29, 2009 at 2:00 PM | Permalink

    Well, I think this is overblown. Just register for the site and get the data. I do a fair amount of poking around in climate-related databases in my research; this is the norm. It’s not like he deleted all records of his data, right? That would “deserve” some of the comments in this thread, but otherwise this is a non-issue.

  34. Steve McIntyre
    Posted Apr 30, 2009 at 12:46 PM | Permalink

    On April 22, Thomas Bracegirdle sent me the following explanation of the situation – I apologize for not posting this earlier, as I have been busy on non-blog matters for the past week, something that I mentioned elsewhere. Bracegirdle writes:

    I would just like to clarify the steps we took in acquiring and processing the CMIP3 model data here at BAS. The monthly mean NetCDF data fields were downloaded and immediately converted to PP format and archived. PP is the UK Met Office proprietary format and is used in all our work in the Climate group at BAS. The conversion to PP does not affect the data. The NetCDF files were then deleted, as we could not justify storing many terabytes of the same data in two different formats (multi-level, multi-variable ocean and atmosphere fields for many models and scenarios occupy large amounts of disk space). In addition the original files remain freely available from the primary source at Lawrence Livermore.

    In terms of sea ice concentration (SIC) data, we have monthly mean PP fields for individual CMIP3 models archived. I should stress that these are identical data to the monthly mean NetCDF files available at the CMIP3 data portal, which I assume would be easier for you to handle than PP format. While we downloaded multi-variable, multi-level files (for use in a variety of investigations), it is perfectly straightforward to download individual NetCDF files containing monthly-mean SIC for a specific model and scenario. These files occupy about 0.5GB per scenario per model (as of course do the PP fields that we have stored locally).

    The multi-model average results shown in our papers were calculated in the following way. For time slice difference plots we calculated the 20-year time slice averages from each individual model before combining these to give the multi-model average. The data from various models were re-gridded to the HadCM3 grid before combining (as weighted and unweighted averages) to form the multi-model average. If you would rather not perform these calculations independently, we can provide the time-slice multi-model averages (i.e. in the 21st century paper this represents four time-slice fields: early+late 21st century unweighted and weighted averages).

    So, in summary, we have archived data identical to that available directly from the CMIP3 portal, but in a different format. Our calculations here at BAS generated only a small number of additional files containing multi-model average time-slice data. We are happy to provide you with these multi-model average files (either in PP format or converted to NetCDF). Additionally, we could provide you with the individual-model, monthly-mean data files used to construct these ensembles, but only in PP format. However, if you are interested in using this level of data, we would recommend that you obtain the (identical) data in NetCDF format directly from Lawrence Livermore.

    Hopefully now I have provided a clearer picture of what CMIP3 data are available both here at BAS and from the primary source at Lawrence Livermore. I am happy to provide you with the data fields that we used and generated for these papers. If you are still interested in receiving some or all of these data then let me know precisely what you would like.
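    For readers who want to follow the arithmetic, the time-slice averaging that Bracegirdle describes can be sketched as below. This is a toy illustration with synthetic data: the re-gridding to the HadCM3 grid is omitted, the grid dimensions are arbitrary, and the weights are purely illustrative rather than those used in the papers.

```python
import numpy as np

def time_slice_mean(monthly, start_year, n_years=20):
    """Mean of a (months, lat, lon) field over an n_years time slice."""
    i0 = start_year * 12
    return monthly[i0:i0 + n_years * 12].mean(axis=0)

# Three toy "models": 100 years of monthly fields, already on a common grid.
rng = np.random.default_rng(1)
models = [rng.random((100 * 12, 10, 20)) for _ in range(3)]

# Per-model 20-year time-slice averages first, then combine across models.
slice_means = np.stack([time_slice_mean(m, start_year=80) for m in models])
unweighted = slice_means.mean(axis=0)

# Weighted multi-model average (these weights are hypothetical).
weights = np.array([0.5, 0.3, 0.2])
weighted = np.tensordot(weights, slice_means, axes=1)

assert unweighted.shape == weighted.shape == (10, 20)
```

    The point of computing per-model slice means before combining is that each model contributes equally (or per its weight) regardless of grid resolution or ensemble size; it is this small set of derived fields, not the terabytes of raw CMIP3 downloads, that Bracegirdle offers to provide.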

5 Trackbacks

  1. […] or that there aren’t any backups for the deleted data on institutional servers.  Read more here. […]

  2. […] Steve McIntyre never tires of holding warmers feet to the fire, and they don’t enjoy it much.  Coincidence then that the data he needs was deleted? […]

  3. By Lack of disclosure continues « CO2 Realist on May 16, 2009 at 1:14 PM

    […] Steve McIntyre at Climate Audit has been one of the major thorns in prodding those in the climate science community to come clean and shows us what they have.  In a recent post, McIntyre discusses yet another adventure into attempting to procure the data that was used for a study.  Read his post entitled Connolley co-author: “Unfortunately we have deleted all the NetCDF files…” […]

  4. By New Math « Awed Manor on Nov 22, 2009 at 10:15 PM

    […] deleting files”, Examiner with a deeper look, The Air Vent with a real deep dissection, and Climate Audit on deleted […]

  5. […] Connolley co-author: “Unfortunately we have deleted all the NetCDF files…” […]