In May 2008, I collated correspondence requesting CRU station data from various parties, commencing with Warwick Hughes’ correspondence in July 2004. See here. On a couple of occasions, I’d referred to some correspondence with Phil Jones in pre-Hockey Stick days (fall 2002). At that time, I was surprised by the promptness of the response and the extra effort that Jones had put into it. (I think that I noted Jones’ courtesy as a correspondent from time to time in the first years of the blog.)
The correspondence is interesting to re-read in light of subsequent developments and subsequent positions. (It also shows some development in my own technical skills. In fall 2002, I hadn’t started using R and, like many of you, looked at things in Excel.)
On 8 Sep 2002, I sent the following to CRU Information; I think that this letter to the CRU contact marks my very first climate inquiry:
In Journal of Climate 7 (1994), Prof. Jones references 1088 new stations added to the 1873 stations referred to in Jones 1986. Can you refer me to a listing of these stations and an FTP reference to the underlying data? Thanks, Steve McIntyre
This was referred to Jones, who provided the following helpful reply on Sep. 12, 2002. Note that Jones said that he would be “putting the station temperature and all the gridded databases onto our web site” once the paper was published (which occurred in early 2003). In fact this seems to have happened, as station data for 5070 stations (presumably some duplicates were eliminated between Sept 2002 and Feb 2003) was placed on their website in Feb 2003, where it remained until July 31, 2009. (However, no link was placed to the station data, nor was the name of the file ever published. To find the file without email advice, you’d have to look behind their subsequent data refusals and parse candidate files one by one, something that I did recently after taking a renewed interest in this file in June 2009.)
Sent: Thursday, September 12, 2002 12:36 PM
Subject: Fwd: Jones 1994 Data Set
You are looking into station lists from papers in the early 1990s and 1980s. These are now out of date. There will be a new paper coming out in J. Climate (probably early next year). I’m attaching the station list (5159 stations) from that paper. In this file the first number is the WMO number ( or an approximation to it or just a number – large number of US stations at the end). Official WMO numbers are those here divided by 10. The first station (Jan Mayen has a WMO number of 01001, but in our list it is 10010).
Latitude (degrees*10 so 589 is 58.9 N, -ve will be S)
Longutude ( similar to latitude with E -ve)
Height (m , with missing of -999)
Country (this field isn’t always there and doesn’t always take into account changes of the last 20 years. We don’t use this field, so don’t bother keeping up to date with it.
Also names are common English names for countries not their official ones that the UN uses).
First year of data
Last year of data (Most of the 2001s also include 2002 but this file hasn’t been altered)
Then some other numbers.
The first file (above description) is what we call station headers. They mean we have temperature data for the years between the first and last year for each station. However there may be lots of missing data or the data may be deemed inhomogeneous (see the papers you have), so a station may not be used in the our analysis for a whole raft of reasons. As we work with station anomalies we also have a file (also 5159 lines) of stations normals (average temps in deg C*10 for 1961-90). If this second file contains -999 (missing values) then the station temperatures will not get used so the station isn’t used.
Once the paper comes out in the Journal of Climate, I will be putting the station temperature and all the gridded databases onto our web site. The gridded files on our web site at the moment are from our current analysis. The new analysis doesn’t change the overall character of the gridded fields, it is just easier for me to send the new lists of stations used from the new analysis.
I hope this helps.
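For readers who want to work with a header file of this kind, the encodings described in the email can be undone with a few small helpers. This is only a minimal Python sketch of the conventions as Jones states them (numbers multiplied by 10, east longitudes negative, -999 for missing height). The function names are hypothetical, and the column layout of the file itself is not assumed here, since the email does not give it.

```python
from typing import Optional

MISSING = -999  # missing-value flag quoted in the email for station height


def wmo_from_cru_id(cru_id: int) -> str:
    """CRU list number divided by 10, as a 5-digit WMO number (10010 -> '01001').

    Only meaningful where the CRU number really is a WMO number times 10;
    the email says some entries are approximations or "just a number".
    """
    return f"{cru_id // 10:05d}"


def decode_latitude(raw: int) -> float:
    """Degrees * 10 with north positive: 589 -> 58.9."""
    return raw / 10.0


def decode_longitude_east_positive(raw: int) -> float:
    """Degrees * 10 with EAST NEGATIVE in the file.

    The sign is flipped so the result follows the more common
    east-positive convention.
    """
    return -raw / 10.0


def decode_height(raw: int) -> Optional[int]:
    """Height in metres, or None when the file carries the -999 flag."""
    return None if raw == MISSING else raw
```

For example, `wmo_from_cru_id(10010)` gives `'01001'`, matching the Jan Mayen case in the email.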
I have a file entitled allnorms6190.dat of length 5159 dated Sept 13, 2002, which appears to be the file enclosed in this email though it doesn’t precisely match the description. It is a list of length 5159 (as described in the Jones letter), but it does not have station names, lats and longs; instead it has the station normals – useful information not presently available anywhere that I’m aware of. The first few lines of allnorms6190.dat are shown below.
10010 1921 2001 1961 1990 -57 -61 -61 -39 -7 20 42 50 28 1 -33 -52
10050 1912 1979 951 970 -117 -123 -122 -94 -33 17 48 43 8 -38 -73 -101
10080 1911 1999 1961 1990 -153 -163 -158 -124 -44 18 58 48 4 -55 -103 -133
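Lines like these are easy to read programmatically. The sketch below is in Python and rests on my reading of the sample rather than on any documented layout: I assume 17 whitespace-separated integers per line, namely the station number, first and last year of data, two further fields that look like a normals period, and then twelve monthly normals in deg C*10 with -999 for missing. The names `StationNormals`, `parse_normals_line` and `has_complete_normals` are hypothetical.

```python
from typing import List, NamedTuple, Optional

MISSING = -999  # missing-value flag mentioned in the Jones email


class StationNormals(NamedTuple):
    station_id: int
    first_year: int
    last_year: int
    normals_c: List[Optional[float]]  # Jan..Dec in deg C, None if missing


def parse_normals_line(line: str) -> StationNormals:
    """Parse one line under the ASSUMED 17-field layout described above."""
    fields = [int(tok) for tok in line.split()]
    if len(fields) != 17:
        raise ValueError(f"expected 17 fields, got {len(fields)}")
    # fields[3] and fields[4] look like a normals period and are skipped here
    monthly = [None if v == MISSING else v / 10.0 for v in fields[5:]]
    return StationNormals(fields[0], fields[1], fields[2], monthly)


def has_complete_normals(rec: StationNormals) -> bool:
    """True when no monthly normal carries the missing flag."""
    return all(v is not None for v in rec.normals_c)
```

Whether CRU dropped a station entirely or only for the months with a missing normal is not clear from the email, so `has_complete_normals` is just one possible screening rule.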
In 2007, when Willis Eschenbach sought station data, the current version of the information was refused under various pretexts, requiring him to make repeated requests, ultimately resulting in a list of 4138 stations being placed on the CRU website after 3 or 4 FOI requests.
In response to Jones’ helpful email, on Sept 17, 2002, I sent a short note to Jones thanking him for the list and inquiring about a concordance of his numbers to GHCN numbers – something that I started doing recently based on more recent versions. I referred to the availability of the 1991 version of station data (then available at CDIAC) and inquired about the 1994 version. Subsequently, it turned out that a variation of the 1994 version (cruwlda2.zip) had been online at CRU since 1996. Home computers at the time were not nearly as handy as they are now, and I hadn’t discovered the magic of R. As a result, files like the one at CDIAC were then awkward for me to handle. (My skills have definitely changed on this front over the past years.)
Thanks for this. It seems awkward not to use exact WMO station numbers – do you by any chance have a concordance of your numbers to WMO numbers where they do not correspond? I (think I) noticed that sometimes your numbers are also in use for a nearby but different GHCN station, which seems a bit awkward. I also noticed a few stations in which the lat-long’s do not seem to tie into GHCN data and can forward these possible errata if you like. Wouldn’t it make more sense to convert over to WMO station numbers carrying a concordance to your past numbers?
I’m still interested in the 1994 data set as it has become so standard. Is there a FTP from which the underlying station data and mean temperatures (either as anomalies or absolutes) can be downloaded? I’ve located an FTP for your 1991 version, but have had little success in locating the 1994 version.
When you do publish the 2002 version, I would urge you to make FTP available annualized data for individual station boxes as well as for grid-boxes, so that readers interested in regional studies can carry out verifications. (If this is not currently available for the older data, it would also be nice for it as well.)
It would also be nice if annualized data were also available as I am sure that many of your users are mostly interested in this. The 12-fold reduction in dataset size is fairly important for fitting into Excel spreadsheets, which work nicely on annual data.
Regards, Stephen McIntyre
On Sept 18, 2002, Jones sent me two files: cruwld.zip, the then published version of the station data (from Jones 1994), and a file of normals for the 1994 version, normup6190.dat. In the email a few days earlier, Jones said that he would place station data online at the time of publication of the new version (Jones and Moberg 2003), but in this letter he resiled somewhat, adding the word “possibly” and alluding to a desire to avoid problems such as those supposedly experienced between some European countries and GHCN, a point that recurs in later correspondence.
Attached are the two similar files [normup6190, cruwld.dat] to those I sent before which should be for the 1994 version. This is still the current version until the paper appears for the new one. As before the stations with normal values do not get used.
I’ll bear your comments in mind when possibly releasing the station data for the new version (comments wrt annual temperatures as well as the monthly). One problem with this is then deciding how many months are needed to constitute an annual average. With monthly data I can use even one value for a station in a year (for the month concerned), but for annual data I would have to decide on something like 8-11 months being needed for an annual average. With fewer than 12 I then have to decide what to insert for missing data. Problem also applies to the grid box dataset but is slightly less of an issue.
I say possibly releasing above, as I don’t want to run into the issues that GHCN have come across with some European countries objecting to data being freely available. I would like to see more countries make their data freely available (and although these monthly averages should be according to GCOS rules for GAA-operational Met. Service.
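The annualization issue that Jones raises, namely how many months are needed to constitute an annual average, can be made concrete with a short Python sketch. The threshold of 10 months is a hypothetical choice within the "8-11" range he mentions, not a documented CRU rule, and averaging only the months present is just one way to handle the gaps. It is more defensible for anomalies than for absolute temperatures, where a missing winter or summer month would bias the mean.

```python
from typing import Optional, Sequence


def annual_mean(monthly: Sequence[Optional[float]],
                min_months: int = 10) -> Optional[float]:
    """Average the available months of one year.

    Returns None when fewer than `min_months` values are present.
    `min_months` is an illustrative threshold, not a CRU rule.
    """
    present = [v for v in monthly if v is not None]
    if len(present) < min_months:
        return None
    return sum(present) / len(present)
```

With `min_months=10`, a year with two missing months still yields an annual value, while a year with three missing months is treated as missing.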
At the time, I was less attuned to some climate science practices, but there are some interesting points here. Jones sent me a file of exactly the same type as the one now requested. What changes took place between 2002 and 2009 that are relevant to a refusal decision? My qualifications are much greater now than they were in 2002. Warwick Hughes, at the time of the 2004 refusal, had published five articles in peer-reviewed literature. What relevant change had taken place between 2002 and 2004?
There’s nothing in here about confidentiality agreements, and if there were relevant agreements governing cruwld.zip, then they obviously didn’t prevent Jones from sending me the data or posting the data on the CRU website. (In this light, what is the justification for the deletion of cruwlda2.zip from the CRU website on July 31, 2009?)
Jones alludes to problems between Europe and the GHCN and his desire to have more countries make their data freely available. Surely the best way of accomplishing this is to place some sunshine on the matter. Let’s find out who the problem countries are, if any, and publicize the problems. It’s hard for me to imagine any European country that could sustain such obstruction in the face of international publicity. Anyway, it’s well worth finding out. If Jones really “would like to see more countries make their data freely available” as he says here, then surely we’re on the same side.