As is by now well known, CRU lost or destroyed the “original” data that went into the construction of the CRU station data. This doesn’t mean that analysis is totally compromised, though it is made more difficult.
Let me explain this through a comparison to GISS methodology. A given station may have a number of (what I’ve called) scribal versions. The GISS “dset1” station record is a combination of scribal versions using Hansen’s reference station method, a method that has received virtually no close examination from the “community” though it has some important defects. (One such defect is its contribution to the “Great Dying of Thermometers”, which results in part from the interaction of Hansen’s reference station method with the changeover from World Weather Records provenance to CLIMAT provenance.)
The CRU station records are more or less conceptually equivalent to the GISS dset1 station records. Like GISS dset1, they are combinations of underlying scribal versions, though CRU’s method of combining versions is not the same as GISS’s. (It is much simpler, and probably more sensible if combined with someone actually looking at the scribal versions.)
While CRU has not retained the original scribal versions, GHCN has. These undoubtedly connect to the “ur”-versions of CRU station data, though CRU seems to have used data not archived at GHCN (not just stations absent from GHCN, but occasionally data for GHCN stations that GHCN itself has not archived). Many stations at CRU survive the Great Dying of Thermometers at GISS. In some cases, this is due to not using Hansen’s reference station method (a thermometer “killer”), but, in other cases, it seems to be due to CRU using data for GHCN stations that have not been acquired by GHCN.
Merely from the point of view of craftsmanship, I think that it’s instructive to compare CRU and GISS dset1 versions for individual stations, and I have done so below for a couple of the first stations that I looked at, neither of which has entered into prior discussion.
The top panel is the CRU version; the middle is the GISS dset1 version; the bottom is GISS minus CRU. Values are identical from January 1961 to December 1965, but are erratically different from January 1966 to December 1970.
GHCN/GISS has values from Jan 1971 to Dec 1980, a period for which CRU is blank. CRU has values from Jan 1981 on, whereas GISS dset1 has none apart from a few isolated values in 1991, which exactly match CRU.
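For readers who want to make this sort of period-by-period comparison themselves, here is a minimal sketch in Python. It assumes each version has already been read into a dictionary keyed by (year, month), with missing months simply absent; the function names and the tolerance `tol` are illustrative choices, not anything taken from CRU or GISS.

```python
from typing import Dict, List, Tuple

Month = Tuple[int, int]          # (year, month), month in 1..12
Series = Dict[Month, float]      # missing months are simply not present


def next_month(ym: Month) -> Month:
    """Return the calendar month following ym."""
    year, month = ym
    return (year + 1, 1) if month == 12 else (year, month + 1)


def classify_months(a: Series, b: Series, tol: float = 0.05) -> Dict[Month, str]:
    """Label every month present in either series.

    'same'   : both have a value and they agree within tol
    'differ' : both have a value and they disagree
    'only_a' : only series a has a value
    'only_b' : only series b has a value
    """
    labels: Dict[Month, str] = {}
    for key in sorted(set(a) | set(b)):
        if key in a and key in b:
            labels[key] = "same" if abs(a[key] - b[key]) <= tol else "differ"
        elif key in a:
            labels[key] = "only_a"
        else:
            labels[key] = "only_b"
    return labels


def runs(labels: Dict[Month, str]) -> List[Tuple[Month, Month, str]]:
    """Collapse consecutive months with the same label into (start, end, label)."""
    out: List[Tuple[Month, Month, str]] = []
    for key in sorted(labels):
        lab = labels[key]
        if out and out[-1][2] == lab and next_month(out[-1][1]) == key:
            out[-1] = (out[-1][0], key, lab)   # extend the current run
        else:
            out.append((key, key, lab))        # start a new run
    return out
```

Calling `runs(classify_months(cru, giss))` on two such dictionaries would yield a short list of date ranges of the kind described above (identical, erratically different, present in one source only), which can then be checked against the plotted panels.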
There are 4 scribal versions at GISS for this series, which respectively cover the periods 1961-1991 (but actually 1961-1970 with a few values in 1991); 1971-1980; 1966-1975; and 1991-2007+. (I collated a version in 2007 and am using this collation; the GISS data have presumably been updated since then to 2011.)
The overlap between the version 1991-on and the earlier versions is too short to qualify under the GISS formula and thus this thermometer “dies” at GISS. CRU’s values from 1991 on correspond to the GISS dset0 data that GISS didn’t use. It’s not that CRU had data from 1991 on that was unavailable to GISS; it’s just that Hansen’s methodology rejected the combination of the data – a discarding of data that surely merits close examination. Does this “matter”? Perhaps not, but it should offend anyone with any sense of craftsmanship.
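To illustrate how an overlap requirement can “kill” a thermometer, here is a toy sketch in Python. It is emphatically not Hansen’s actual code or formula: the single-pass ordering by length, the mean-offset adjustment, and the `min_overlap` threshold are all simplifying assumptions made for illustration, and `monthly` is a hypothetical helper for building test data.

```python
from typing import Dict, List, Tuple

Month = Tuple[int, int]          # (year, month)
Series = Dict[Month, float]      # missing months are simply not present


def monthly(start_year: int, end_year: int, value: float) -> Series:
    """Hypothetical helper: a constant series for every month in the given years."""
    return {(y, m): value for y in range(start_year, end_year + 1)
            for m in range(1, 13)}


def combine_versions(versions: List[Series],
                     min_overlap: int) -> Tuple[Series, List[int]]:
    """Combine scribal versions in one pass, longest first.

    A version is merged only if it shares at least min_overlap months with
    the record built so far; otherwise its index is added to 'rejected'
    and its data are discarded. min_overlap is an assumed parameter here.
    """
    order = sorted(range(len(versions)),
                   key=lambda i: len(versions[i]), reverse=True)
    combined: Series = dict(versions[order[0]])
    counts = {k: 1 for k in combined}       # how many versions feed each month
    rejected: List[int] = []

    for i in order[1:]:
        v = versions[i]
        common = set(combined) & set(v)
        if len(common) < min_overlap:
            rejected.append(i)              # too little overlap: data dropped
            continue
        # Shift the new version so that it matches the record over the overlap.
        offset = sum(combined[k] - v[k] for k in common) / len(common)
        for k, x in v.items():
            adjusted = x + offset
            if k in combined:
                n = counts[k]
                combined[k] = (combined[k] * n + adjusted) / (n + 1)
                counts[k] = n + 1
            else:
                combined[k] = adjusted
                counts[k] = 1
    return combined, rejected
```

With a version covering 1961-1970 plus a couple of months in 1991, a second covering 1966-1975, and a third starting in 1991, any threshold above a few months leaves the third version rejected, so the combined record ends even though later data exist. That is the qualitative behaviour described above; the real GISS procedure should be consulted for its actual rules.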
The other interesting question here is the provenance of CRU’s data in the 1980s. Did CRU get this from WWR, getting something that was missed by GHCN? I don’t know, but this is the sort of thing that should be documented before the parties forget.
Next is a similar comparison from London, Ontario, near Toronto. In this case, there is only one GISS dset0 record, but it has no values from Feb 1932 to summer 1940 and ends in mid-1991, though London, Ontario obviously collected records subsequent to this. This particular thermometer seems to have died at GHCN rather than as a result of Hansen’s reference station method.
As in the Mersing case, CRU has records in a period for which GHCN/GISS has none. Why? Dunno. Unlike Mersing, there is a step difference between the two versions, with the step difference being of the same order of magnitude as the trend being sought.
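A step of this kind can be roughly located and sized from the difference series (one version minus the other over their common months). The Python sketch below simply tries each candidate breakpoint and reports the one with the largest change in mean; it is a crude illustration, not a formal changepoint test, and the function name and the `min_seg` parameter are assumptions of mine.

```python
from typing import List, Tuple


def best_step(diffs: List[float], min_seg: int = 12) -> Tuple[int, float]:
    """Find the breakpoint with the largest change in mean difference.

    diffs   : difference series in time order (e.g. GISS minus CRU)
    min_seg : minimum number of values required on each side of the break
    Returns (index, step), where the break falls just before diffs[index]
    and step = mean(diffs[index:]) - mean(diffs[:index]).
    """
    n = len(diffs)
    if n < 2 * min_seg:
        raise ValueError("series too short for the requested segment length")

    total = sum(diffs)
    running = sum(diffs[:min_seg])          # sum of diffs[:i] for the current i
    best_idx, best_val, first = min_seg, 0.0, True

    for i in range(min_seg, n - min_seg + 1):
        before = running / i
        after = (total - running) / (n - i)
        step = after - before
        if first or abs(step) > abs(best_val):
            best_idx, best_val, first = i, step, False
        running += diffs[i]                 # advance to sum of diffs[:i + 1]
    return best_idx, best_val
```

The size of the step returned can then be set beside the century-scale trend at the station to judge whether the two are of comparable magnitude.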
Parsing of other station records will show other differences. Do any of these things “matter”? If the only question is whether the 20th century is warmer than the 19th century, then no. But some people are interested in the relative timing of the warming between the first half of the 20th century and the last half of the 20th century, with Thompson et al 2008 occasioning a substantial revision to the SST record in the late 1940s, 1950s and 1960s.
A couple of days ago, Trevor Davies of East Anglia said that the purpose of releasing station data was to “pull the rug out” from under skeptics. As I mentioned a couple of days ago, it seems to me that the proper reason for archiving station data is to ensure that the most accurate possible information is available on an important statistic.
But most of all, the global temperature index has become an important record. If this were a Consumer Price Index, the statistical agency would carefully examine the nuts and bolts of the information. I don’t get the impression that either Phil Jones or Jim Hansen is much interested in actually working on the details of the temperature indices that have brought attention to their respective institutions.