On Sept 28, 2006, Willis Eschenbach sent an FOI request for CRU station data. A year later, after many letters, we still do not have the CRU data as used, but we do have a list of stations used, a list which is slightly shorter (4138) than the 4349 stations said to have been used in Brohan et al 2006, their most recent publication. It looks as though they sanitized the list somewhat – in Brohan et al 2006, they said that they removed 55 duplicates. I guess that they’ve identified 156 more duplicates, for a total of 211 (about 4 times as many as reported in Brohan et al.). But perhaps there’s another reason. In my opinion, they should have delivered a list of 4349 stations – I’ve asked for this from Phil Jones.
Secondly, the list has not been delivered in working order: the data is supposed to be mostly (“98%”) derived from GHCN, but the identification numbers do not tie in precisely to GHCN numbers. The CRU ID numbers are 6 digits versus 11 digits at GHCN. Many of the CRU numbers tie in to GHCN numbers as follows: the GHCN number is in the form CCCWWWWWDDD, where CCC is the country code, WWWWW is the WMO number and DDD distinguishes the individual station – for example, nearby (but different) sites can have the same WMO number. GHCN DDD identifiers seldom get out of single digits. CRU identifications of the form WWWWWD tie to GHCN numbers for 1782 sites. As seen below, many sites can be identified with GHCN sites, but not without a further concordance. CRU says that they have a look-up table, but failed to disclose it. I’ve requested it.
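To make the ID layout concrete, here is a minimal Python sketch of how an 11-digit GHCN number could be split and reduced to a 6-digit WWWWWD key. The function names are hypothetical, the sample ID is made up, and the rule that D is the last digit of a single-digit DDD is my assumption about how the tie-in works, not something CRU has documented.

```python
from typing import Optional, Tuple


def split_ghcn_id(ghcn_id: str) -> Tuple[str, str, str]:
    """Split an 11-digit GHCN ID (CCCWWWWWDDD) into country, WMO and DDD parts."""
    if len(ghcn_id) != 11 or not ghcn_id.isdigit():
        raise ValueError(f"expected 11 digits, got {ghcn_id!r}")
    return ghcn_id[:3], ghcn_id[3:8], ghcn_id[8:]


def cru_key_from_ghcn(ghcn_id: str) -> Optional[str]:
    """Return a 6-digit WWWWWD key, or None when DDD needs more than one digit.

    Assumption: D is the final digit of DDD when DDD is between 000 and 009.
    """
    _, wmo, ddd = split_ghcn_id(ghcn_id)
    if not ddd.startswith("00"):
        return None  # DDD of 010 or more cannot be squeezed into one digit
    return wmo + ddd[-1]


# Made-up ID for illustration only: country 123, WMO 45678, DDD 001.
print(split_ghcn_id("12345678001"))      # ('123', '45678', '001')
print(cru_key_from_ghcn("12345678001"))  # 456781
print(cru_key_from_ghcn("12345678012"))  # None
```

A key built this way only covers the sites that follow the WWWWWD pattern; the rest still need a separate concordance.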
Thirdly, and somewhat unbelievably, the CRU identifications are not unique. In one case, there are 6 stations with an identical ID number. Perhaps there is some still undisclosed list and they’ve delivered it in non-working form for some reason of their own. If there is no proper list, then I have no idea how they can define a look-up table that functions for non-unique ID numbers. For my own attempt at a concordance, I’ve added a duplicate number for each group of sites with overlapping ID numbers so that the new ID number is unique. (I wasted considerable effort before I figured out that they had non-unique ID numbers – imagine.)
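The duplicate-number step can be sketched as follows. This is a Python illustration rather than the script I actually used; the function name and the sample IDs are hypothetical, and numbering duplicates from 0 in file order is just one reasonable convention.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def add_duplicate_numbers(cru_ids: Iterable[str]) -> List[Tuple[str, int]]:
    """Pair each ID with a running count of how often it has already appeared.

    The first occurrence gets 0, the second 1, and so on, so that the
    (id, duplicate_number) pair is unique even when the ID itself is not.
    """
    seen = defaultdict(int)
    keyed = []
    for cru_id in cru_ids:
        keyed.append((cru_id, seen[cru_id]))
        seen[cru_id] += 1
    return keyed


# Made-up IDs for illustration only.
ids = ["100000", "100000", "200000", "100000"]
unique_keys = add_duplicate_numbers(ids)
print(unique_keys)
# [('100000', 0), ('100000', 1), ('200000', 0), ('100000', 2)]
```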
After doing this, in order to make a concordance, for each CRU site unmatched in a first pass, I then selected GHCN stations that were within 1 degree latitude and within 1 degree longitude and whose names had the first 6 letters identical. If there was only one, I declared a match and assigned the GHCN number. This didn’t match as many sites as could be matched, but reduced the unmatched sites to about 354 sites.
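The second-pass rule described above can be sketched in Python along these lines. The `Station` class, the function names and the sample stations are all hypothetical, the name comparison is case-insensitive by my choice, and longitude wrap-around near ±180 degrees is ignored for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Station:
    station_id: str
    name: str
    lat: float
    lon: float


def name_prefix(name: str, n: int = 6) -> str:
    """First n characters of the name, trimmed and upper-cased for comparison."""
    return name.strip().upper()[:n]


def unique_nearby_match(cru: Station, ghcn: List[Station],
                        max_deg: float = 1.0, n: int = 6) -> Optional[Station]:
    """Return the single GHCN station within max_deg in both latitude and
    longitude whose name shares its first n characters with the CRU station,
    or None if there are zero or several candidates."""
    candidates = [
        g for g in ghcn
        if abs(g.lat - cru.lat) <= max_deg
        and abs(g.lon - cru.lon) <= max_deg
        and name_prefix(g.name, n) == name_prefix(cru.name, n)
    ]
    return candidates[0] if len(candidates) == 1 else None


# Made-up stations for illustration only.
cru_site = Station("1", "Alphatown", 10.0, 20.0)
ghcn_sites = [
    Station("A", "ALPHATOWN AIRPORT", 10.3, 20.4),  # close, same prefix
    Station("B", "ALPHATOWN", 12.5, 20.0),          # same prefix, too far away
    Station("C", "BETAVILLE", 10.1, 20.1),          # close, different name
]
match = unique_nearby_match(cru_site, ghcn_sites)
print(match.station_id if match else "no unique match")  # A
```

Requiring exactly one candidate is what makes this pass conservative: sites with two plausible GHCN neighbours are left for manual inspection.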
I then made an ASCII tab-separated table in which I wrote down each CRU station plus lat, long and altitude, together with the GHCN stations within a degree plus the same information, and manually inspected the stations. I probably could have figured out another semi-automatic method of reducing it further, but I also wanted to inspect the matches. In many cases, there was a fairly obvious match, with the previous method failing due to multiple candidates or spelling variations. In this way, I added 175 matches, getting up to 3959 matches (a little under 96%), leaving 179 unmatched.
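As a check on the bookkeeping, the counts quoted above can be reproduced in a few lines of Python. The variable names are mine, and 354 is the approximate figure quoted above for sites still unmatched after the automatic passes.

```python
total_cru = 4138             # stations in the delivered CRU list
unmatched_after_auto = 354   # left over after the automatic passes (approximate)
manual_matches = 175         # added by manual inspection

matched = total_cru - unmatched_after_auto + manual_matches
unmatched = total_cru - matched
pct_matched = 100 * matched / total_cru

print(matched)                # 3959
print(unmatched)              # 179
print(round(pct_matched, 2))  # 95.67
```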
Here are ASCII files listing all CRU stations together with proposed GHCN identifications (ID, name, lat, long shown from GHCN) and the unmatched list: All, Unmatched. These are ASCII tab-separated and can be opened in Excel or read in R.
The distribution of unmatched stations is really very strange. I add here that, for each country specified below, I’ve double checked manually against the GHCN inventory to confirm that there was at least one unmatched CRU station from that country. In total, I identified 29 countries where there were CRU stations that were not present at GHCN, including, surprisingly, stations from Canada, Australia and even the U.S. Here is a list of countries with at least one station that is unmatched in GHCN:
Argentina: CRU had quite a few stations not at GHCN.
Australia: nearly all match, but two CRU stations didn’t match GHCN – Maryborough and Brisbane Airport. Why these?
Austria – quite a few stations not at GHCN
Bolivia – a couple didn’t match
Brazil – a couple didn’t match
Canada – quite a few stations not at GHCN. I noticed a duplicate GHCN for Parry Sound, which is near Toronto and which occurs in two alter egos in GHCN.
Chile – a couple didn’t match. A couple were called “UNKNOWN” in the CRU list. Perhaps they are connected to the UCAR “Bogus Stations”.
China – quite a few stations not at GHCN
Denmark – a couple didn’t match
Dominica – a couple didn’t match
Germany – one didn’t match
Finland – one (Kuopio) didn’t match
Greenland – possibly a couple didn’t match
Guinea – one didn’t match
Iran – one didn’t match
Ireland – one may not match (Phoenix Park)
Israel – a few don’t match
Italy – a couple don’t match
Kyrgyz republic – one doesn’t match
Netherlands – a couple don’t match
Norway – a few don’t match
Oceania – a few don’t match
Peru – one doesn’t match
Sweden – a few don’t match
Syria – several don’t match
Taiwan – quite a few don’t match
UK – a couple don’t match (Kirkwall, Wick)
USA – about 25 don’t match e.g. Moroni, Lahontan
Russia – a couple may not match
It is quite weird to see these oddball stations crop up at CRU. I’m sure we’ll quickly track down where Moroni and Lahontan and their ilk come from, but it doesn’t seem to be GHCN.
With these results in mind, let’s review the history of CRU excuses as to why they should not be required to disclose information under the FOI Act – and it’s taken slightly over a year and many letters and appeals to even get this station list. In their original refusal, CRU stated that the data was already located at GHCN, as follows:
Datasets named ds564.0 and ds570.0 can be found at The Climate & Global Dynamics Division (CGD) page of the Earth and Sun Systems Laboratory (ESSL) at the National Center for Atmospheric Research (NCAR) site at: http://www.cgd.ucar.edu/cas/tn404/ Between them, these two datasets have the data which the UEA Climate Research Unit (CRU) uses to derive the HadCRUT3 analysis. The latter, NCAR site holds the raw station data (including temperature, but other variables as well). The GHCN would give their set of station data (with adjustments for all the numerous problems). They both have a lot more data than the CRU have (in simple station number counts), but the extra are almost entirely within the USA. We have sent all our data to GHCN, so they do, in fact, possess all our data.
In accordance with S. 17 of the Freedom of Information Act 2000 this letter acts as a Refusal Notice, and the reasons for exemption are as stated below
In response to a further request trying to pin them down, they stated that “more than 98%” of CRU data was on these sites and that the remaining 2% was collected under confidentiality agreements.
Our estimate is that more than 98% of the CRU data are on these sites. The remaining 2% of data that is not in the websites consists of data CRU has collected from National Met Services (NMSs) in many countries of the world. In gaining access to these NMS data, we have signed agreements with many NMSs not to pass on the raw station data, but the NMSs concerned are happy for us to use the data in our gridding, and these station data are included in our gridded products, which are available from the CRU web site. These NMS-supplied data may only form a very small percentage of the database, but we have to respect their wishes and therefore this information would be exempt from disclosure under FOIA pursuant to s.41. The World Meteorological Organization has a list of all NMSs.
Obviously, none of this justified not providing a list of stations, but that has taken another 6 months. In connection with the supposed confidentiality agreements, as reported previously, Doug Keenan asked for the countries with which there were confidentiality agreements that restricted access and was told:
I have done some searching in files – all from the period 1990-1998. This is the time when we were in contact with a number of NMSs. We have also got datasets from fellow scientists and other institutes around the world. All supplied data (eventually and sometimes at cost), but we were asked not to pass on the raw data to third parties, but we could use the data to develop products (our datasets) and use the data in scientific papers. It is likely that some of the NMSs and Institutes have changed their policies now – and that the people we were corresponding with (all by regular mail or fax) are no longer there or are in different sections. The lists below don’t refer to all the stations within these countries, nor to all periods, but to some of the data for some of the time.
Germany, Bahrain, Oman, Algeria, Japan, Slovakia and Syria
Scientists/Institutes (data for these countries)
Mali, India, Pakistan, Poland, Indonesia, Democratic Republic of the Congo (was Zaire), Sudan and some Caribbean Islands.
These are the only ones I can find evidence for. I’m sure there were a few others during the 1980s, but we have moved buildings twice since 1980.
Not sure how you will use this data.
Above I summarized the countries for which there are stations that are not matched at GHCN. Remarkably, these include virtually none of the countries where Jones said that they had received data subject to confidentiality agreements – so that the confidentiality agreement excuse cannot apply for any of these countries. And for each of the countries for which Jones said that there was a confidentiality agreement (Bahrain, Oman, Algeria, Japan, Slovakia, Mali, India, Pakistan, Poland, Indonesia, Zaire and Sudan), I was able to cross-identify all CRU stations with GHCN identifications so that the confidentiality excuse didn’t affect anything.
At this point, the only unmatched stations which would appear to be covered by a reported “confidentiality agreement” are about 6 stations in Syria (about half at GHCN) and one German station (Wahnsdorff). Otherwise there is no valid excuse for not disclosing this station data. Of course it is possible that Jones has confidentiality agreements with Canada and Australia, but was embarrassed to report them and thus omitted them in the above list. We’ll see.
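The comparison in the last two paragraphs is just a set intersection, which can be sketched in Python. Both sets are transcribed from the lists in this post; the variable names are mine, Zaire appears under its current name, and the “some Caribbean Islands” in Jones’ list are left out because he did not name them.

```python
# Countries with at least one CRU station unmatched at GHCN (from the list above).
unmatched_countries = {
    "Argentina", "Australia", "Austria", "Bolivia", "Brazil", "Canada",
    "Chile", "China", "Denmark", "Dominica", "Germany", "Finland",
    "Greenland", "Guinea", "Iran", "Ireland", "Israel", "Italy",
    "Kyrgyz Republic", "Netherlands", "Norway", "Oceania", "Peru",
    "Sweden", "Syria", "Taiwan", "UK", "USA", "Russia",
}

# Countries Jones named as covered by confidentiality agreements.
# The unnamed "some Caribbean Islands" are omitted.
confidentiality_countries = {
    "Germany", "Bahrain", "Oman", "Algeria", "Japan", "Slovakia", "Syria",
    "Mali", "India", "Pakistan", "Poland", "Indonesia",
    "Democratic Republic of the Congo", "Sudan",
}

overlap = sorted(unmatched_countries & confidentiality_countries)
no_stated_agreement = sorted(unmatched_countries - confidentiality_countries)

print(overlap)                   # ['Germany', 'Syria']
print(len(no_stated_agreement))  # 27
```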
It is disappointing that the pretexts for not providing the data previously have turned out to be untrue. However, it should now be possible to develop a reasonable concordance for the CRU stations to GHCN where applicable and to identify provenances for the oddball stations to make a concordance up to a very small number of stations – at which point analysis can begin.