In my post The Accidental Tourist I discussed the relationship between the Russian Meteo daily temperature record for Kurgan and two of the GHCN records for that same weather station. One surprise was that GHCN discarded an entire month’s worth of data when a single data point was suspect. Doing so left GISS estimating the missing month in order to calculate an annual average.
The daily records from Meteo provide an opportunity to test the accuracy of the GISS estimation algorithm. They also give an indication as to how readily data is dropped from the record, and perhaps a little bit of hope that the accuracy of the historical record can be improved.
In the referenced post I noted that the “GISS.0” record for Kurgan was derived from the Meteo record’s “Mid” values. Furthermore, I had found that there were eleven months in the Meteo record with a single suspect daily record that caused the entire month to be dropped from the GISS.0 record. For this particular effort I started by focusing on those eleven months.
In order to compare the GISS.0 estimate with the actual Meteo record, I needed to be able to do two things.
First, GISS does not record the estimated monthly value – it continues to report the month as “999.9”. Instead, it records an estimate for the seasonal average and the annual average. To determine the monthly estimate I needed to have enough other data points available to reverse-calculate it. Of the eleven months in question, nine had sufficient data available for a reverse-calculation. Here are those nine months:
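The reverse-calculation can be sketched as follows. This is a minimal illustration, assuming the published seasonal average behaves like a simple mean of three monthly values, so that the missing month is whatever value makes the arithmetic work. The function name and the numbers are hypothetical and are not taken from the Kurgan record.

```python
def implied_monthly_estimate(seasonal_mean, known_months):
    """Back out the one missing monthly value from a three-month seasonal mean.

    Assumes seasonal_mean == (m1 + m2 + m3) / 3, with two of the three
    monthly values known. This is an assumption about how the published
    seasonal figure relates to its months, not a statement of GISS code.
    """
    if len(known_months) != 2:
        raise ValueError("need exactly two known monthly values")
    return 3 * seasonal_mean - sum(known_months)


# Illustrative numbers only.
estimate = implied_monthly_estimate(seasonal_mean=4.0, known_months=[2.5, 6.5])
print(round(estimate, 1))
```

Because the published seasonal and monthly figures are rounded, a value recovered this way is only approximate.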
Second, I had to determine what value to assign, or estimate, for the suspect data points in the Meteo record. In the case of the data points I was interested in, all were flagged as having a Mid value that was either lower than the Min or higher than the Max. This fact left me with four fairly straightforward options:
- Ignore the day and calculate the average over the remaining days in the month.
- Use the Mid value anyway.
- In place of the Mid value, use the Min or Max value flagged as being inconsistent with the Mid value.
- Interpolate the value using the previous and next day Mid values.
Some may ask why I did not have a fifth option, which would be to use the mean of Min and Max. The reason is that for five of the months in question, Max values were not available.
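The four options are easy to compare side by side in code. The sketch below is illustrative only: the function name, the argument layout, and the sample values are assumptions, and the suspect day is assumed not to be the first or last day of the month when interpolating.

```python
from statistics import mean


def monthly_means(mid, bad_day, replacement):
    """Return the monthly mean under each of the four options.

    mid         -- list of daily Mid values for the month
    bad_day     -- zero-based index of the suspect day
    replacement -- the Min or Max value flagged as inconsistent with Mid
    """
    others = mid[:bad_day] + mid[bad_day + 1:]

    def with_value(value):
        # Monthly mean with the suspect day's Mid replaced by `value`.
        return (sum(others) + value) / len(mid)

    # Interpolation needs a neighbour on both sides within the month.
    if 0 < bad_day < len(mid) - 1:
        interpolated = with_value((mid[bad_day - 1] + mid[bad_day + 1]) / 2)
    else:
        interpolated = None

    return {
        "ignore_day": mean(others),
        "keep_mid": mean(mid),
        "use_min_or_max": with_value(replacement),
        "interpolate": interpolated,
    }


# Made-up four-day "month" in which day index 2 has a suspect Mid value.
for option, value in monthly_means([10.0, 12.0, 2.0, 14.0], 2, 11.0).items():
    print(option, value)
```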
I decided to try all four options and see what the effect was on the monthly average. Here is a side-by-side comparison:
For each month, my choice of how to handle the day that had the problem data is highlighted in red. Here is the rationale behind those selections:
- June 1963 – the 24th was flagged because the Mid value of 27.8 was higher than the Max value of -36.5. I concluded the sign of the Max value was transcribed incorrectly – a common error that I have seen many times in the quality control outputs from GHCN. I decided it was appropriate to keep the Mid value.
- June 1967 – no dates were flagged. I have no idea why this month was dropped from GHCN. I decided it was appropriate to keep the Mid value.
- For the remaining months, it seemed to me likely that the Min and Mid values were inadvertently swapped during transcription. Use of interpolation or simply ignoring the day altogether seemed excessive. For the most part, the difference between one method and the other was not terribly large.
I then compared my results with the GISS estimates:
As one can readily see, the GISS algorithm did a pretty good job with October 1960.
There are 1063 months in GISS.0 that have a valid (non-999.9) temperature record. It is sensible to ask whether or not adding nine more months of valid records has a material effect on the overall record. After all, those nine months affect a total of just nine years, and to a much lesser degree than the monthly effect. It is nearly impossible to perceive the difference when plotting the two series together, so what I did instead was plot their anomaly trends:
Thus, in the case of Kurgan (not necessarily the general case), replacing a small number of estimates with actual data reduced the slope of the warming trend by a small amount.
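For readers who want to try a comparison of this kind, here is a minimal sketch of one way to compute an anomaly trend. It assumes anomalies are taken relative to each calendar month's mean over the whole record and that the trend is an ordinary least-squares line; the baseline and fitting method behind the plot above are not stated, so treat both as assumptions. The function names and the `(year, month, temperature)` layout are hypothetical.

```python
def anomalies(series):
    """Convert (year, month, temp_or_None) rows to (time_in_years, anomaly).

    The baseline for each calendar month is that month's mean over all
    valid rows. Missing months (None) are skipped.
    """
    by_month = {}
    for _, month, temp in series:
        if temp is not None:
            by_month.setdefault(month, []).append(temp)
    baseline = {m: sum(v) / len(v) for m, v in by_month.items()}
    return [
        (year + (month - 0.5) / 12, temp - baseline[month])
        for year, month, temp in series
        if temp is not None
    ]


def trend_slope(points):
    """Ordinary least-squares slope of anomaly against time (degrees per year)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

Running `trend_slope(anomalies(...))` once on the series with the dropped months left as `None` and once with the recovered monthly values filled in gives the two slopes to compare.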