A CA reader, John Goetz, has written in with a mind-numbing puzzle related to Waldo in Bagdarin, Siberia, where Waldo appears to have experienced an adjustment unusual even by the standards of NASA chiropractors.
This is still fairly preliminary, so keep that in mind. John G tried to replicate a really simple step in the NASA calculation, the combination of different series at the same station, and encountered problems. He initially experimented with the Erbogacen station and more recently with the Bagdarin station. I’ve posted up a data set showing the three source versions (dset=0) and the combined version (dset=1) at http://data.climateaudit.org/data/giss/bagdarin.dat.
The three Bagdarin versions (222305540000, 222305540001 and 222305540002) do not appear to be from different stations, as their values are identical for many periods. The first series goes from March 1938 to April 1990, the second series from Jan 1951 to December 1990, and the third series from Jan 1987 to Apr 2007 (and probably has been updated). During the period of overlap of all 3 series from 1987 to 1990, every value is identical. For the period from May 1980 to 1987, all the values of the first two versions are identical. During the period from Jan 1951 to 1980, the first two versions diverge substantially, with the difference having a standard deviation of 1.14 deg C.
If I were trying to combine these records, my instinct would be that the exact identity of the series in their overlaps indicates (1) that they are all versions of the same thing; (2) there is no basis for preferring version 1 or version 2 where they differ; so I’d be inclined to simply average the available values of the three versions.
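As a minimal sketch of the "simply average the available values" idea, something like the following would do. The function name and the toy values are hypothetical (they are not the Bagdarin data), and missing months are marked with None as an assumption about how the series might be held in memory.

```python
from statistics import mean

def average_available(*versions):
    """Average whatever values are present at each time step.

    Each version is a list aligned on the same months; None marks a
    month for which that version has no value.
    """
    combined = []
    for values in zip(*versions):
        present = [v for v in values if v is not None]
        combined.append(mean(present) if present else None)
    return combined

# Hypothetical monthly values (deg C), NOT the actual Bagdarin data.
v0 = [-10.0, -8.0, None, None]
v1 = [-9.0, -8.0, -5.0, None]
v2 = [None, -8.0, -5.0, -2.0]

simple_average = average_available(v0, v1, v2)
print(simple_average)
```

Where the versions agree, the average just reproduces the common value; where they differ, neither version is preferred.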
The graphic below shows the differences between the NASA-combined version and the individual series and the average. The first thing that you notice is the opposite noisiness of the first two series in the 1960s; the standard deviation of the differences is 1.14 deg C – which is not a small amount if one has no reason to prefer one series to the other. If you look at the bottom panel, you can see that the two versions are averaged in some way – fair enough.
The second thing that you can see (which is what caught John’s interest) is the downward displacement of all versions prior to 1990. The amount of the displacement is about 0.3 deg C, an amount which should “matter” even to Hansen and Gavin Schmidt. John G has looked at many alternatives to try to explain the displacement. Below I outline what adjustment appears to have taken place, although developing a rational basis by which the adjustment was done is not easy and I have so far been unable to discern any policy. Warning: it’s very weird.
Here’s what I’ve noticed about the data (posted up here):
1) From Jan 1991 onward, when there is only one version (version 2), the GISS-combined is exactly equal to that version.
2) For December 1990 and earlier, Hansen subtracts 0.1 deg C from version 2, 0.2 deg C from version 1 and 0.3 deg C from version 0.
If I average the data so adjusted, I get the NASA-combined version to within rounding (0.05 deg C). Why these particular values are chosen is a mystery, to say the least. Version 1 runs on average a little warmer than version 0 where they diverge (and they are identical after 1980). So why version 0 is adjusted down more than version 1 is hard to figure out.
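The two observations above can be written down as a short sketch. The offsets (0.3, 0.2 and 0.1 deg C for versions 0, 1 and 2, applied to December 1990 and earlier) are the ones described in the post; the dictionary layout, the function name and the toy values are hypothetical, and this is an empirical description of the apparent adjustment, not NASA's code.

```python
# Offsets (deg C) subtracted from each version for Dec 1990 and earlier,
# as inferred in the post. Nothing is subtracted after the cutoff.
OFFSETS = {0: 0.3, 1: 0.2, 2: 0.1}

def combine_with_offsets(records, cutoff=(1990, 12)):
    """records maps version number -> {(year, month): temperature}.

    Returns {(year, month): mean of the offset-adjusted values present}.
    """
    months = sorted({ym for rec in records.values() for ym in rec})
    combined = {}
    for ym in months:
        adjusted = []
        for version, rec in records.items():
            if ym in rec:
                offset = OFFSETS[version] if ym <= cutoff else 0.0
                adjusted.append(rec[ym] - offset)
        combined[ym] = sum(adjusted) / len(adjusted)
    return combined

# Hypothetical values, NOT the actual Bagdarin data.
records = {
    0: {(1990, 1): -30.0},
    1: {(1990, 1): -30.0},
    2: {(1990, 1): -30.0, (1991, 1): -28.0},
}
emulated = combine_with_offsets(records)
print(emulated)
```

To compare against the NASA-combined series, one would check that the absolute difference in each month is no more than the 0.05 deg C rounding tolerance.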
Why is version 2 adjusted down prior to 1990 and not after? Again it’s hard to figure out. I’m wondering whether there isn’t another problem in splicing versions as with the USHCN data. One big version of Hansen’s data was put together for Hansen and Lebedeff 1987 and the next publication was Hansen et al 1999 – maybe different versions got involved. But that’s just a guess. It could be almost anything.
Here’s what Hansen et al 1999 said that they were doing, a statement which NASA spokesman Gavin Schmidt says is adequate to replicate their results and anything further would be spoon feeding:
Two records are combined as shown in Figure 2, if they have a period of overlap. The mean difference or bias between the two records during their period of overlap (dT) is used to adjust one record before the two are averaged, leading to identification of this way for combining records as the “bias” method (HL87) or, alternatively, as the “reference station” method [Peterson et al., 1998b].
The adjustment is useful even with records for nominally the same location, as indicated by the latitude and longitude, because they may differ in the height or surroundings of the thermometer, in their method of calculating daily mean temperature, or in other ways that influence monthly mean temperature. Although the two records to be combined are shown as being distinct in Figure 2, in the majority of cases the overlapping portions of the two records are identical, representing the same measurements that have made their way into more than one data set.
A third record for the same location, if it exists, is then combined with the mean of the first two records in the same way, with all records present for a given year contributing equally to the mean temperature for that year (HL87). This process is continued until all stations with overlap at a given location are employed. If there are additional stations without overlap, these are also combined, without adjustment, provided that the gap between records is no more than 10 years and the mean temperatures for the nearest five year periods of the two records differ by less than one standard deviation. Stations with larger gaps are treated as separate records.
I’ve tried to replicate this verbiage in a couple of different ways to see if it yielded the NASA-combined series and got nothing near their result. It would be interesting to check their source code and see how they get this adjustment, that’s for sure.
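For concreteness, here is a minimal sketch of one possible reading of the two-record step in the quoted description. It is not NASA's code. The assumptions are mine: dT is taken over all overlapping months, the second record is the one adjusted, and the two records are then averaged with equal weight wherever both exist. The function name and toy values are hypothetical.

```python
def combine_bias(reference, other):
    """Combine two {(year, month): temp} records, one reading of the 'bias' method.

    Returns (combined record, dT).
    """
    overlap = sorted(set(reference) & set(other))
    if not overlap:
        raise ValueError("records have no period of overlap")
    # dT: mean difference between the two records over their overlap.
    dT = sum(reference[k] - other[k] for k in overlap) / len(overlap)
    # Shift the second record by dT so the two agree on average in the overlap.
    adjusted = {k: v + dT for k, v in other.items()}
    combined = {}
    for k in sorted(set(reference) | set(adjusted)):
        vals = [rec[k] for rec in (reference, adjusted) if k in rec]
        combined[k] = sum(vals) / len(vals)
    return combined, dT

# Hypothetical values, NOT the actual Bagdarin data: identical in the overlap.
a = {(1987, 1): -30.0, (1987, 2): -25.0}
b = {(1987, 1): -30.0, (1987, 2): -25.0, (1991, 1): -28.0}
combined, dT = combine_bias(a, b)
print(dT, combined)
```

Under this reading, two records that are identical throughout their overlap give dT = 0, so the step leaves both unchanged.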
Are there similar problems in other series? I haven’t evaluated this yet. John G said that he had first noticed the combining problem at another Russian station (Erbogacen), so there appear to be at least two stations with this type of problem. In this case, the error, if it is an error, imparts a bias increasing recent values relative to older values. If the problem is more widespread, is there a systemic bias (as we found with the USHCN data) or does it cancel out? Questions for another day. (Note: I just did the same calculation for Erbogacen and there again seems to be a downward bias, but not as much as Bagdarin.)
Hansen cites the fact that Phil Jones gets somewhat similar results as evidence of the validity of his calculations. In fairness to Hansen, while they have not archived code, they have archived enough data versions to at least get a foothold on what they are doing. In contrast, Phil Jones at CRU maintains lockdown anti-terrorist security on his data versions and has even refused FOI requests for his data. None of these sorts of analyses are possible on CRU data, which may or may not have problems of its own.