Occasionally I will take a trip after much careful planning and preparation, only to find myself going off into uncharted territory soon after embarking on my adventure. That is what happened to me recently when I started to take a fresh look at worldwide station coverage. Where I ended up and what I found when I got there were incredibly surprising.
It all began last week when GISS released their global mean summary for April 2008. Following this release I went to view their global maps to get an idea of where the “hot” and “cold” spots were last month. I viewed the data using both a 1200km and a 250km smoothing radius. Doing so helped me gauge the station coverage and the extent to which the 1200km smoothing algorithm estimates temperatures over the vast unsampled swaths of the planet.
It occurred to me that it would be interesting to compare April 2008 with April 1978 using a 250km smoothing radius. I was looking for “holes” in 2008 station coverage not present in 1978. I selected 1978 for two reasons. One was that the worldwide station coverage was near its peak that year. The second reason was that 1978 fell in the 1951-1980 30-year base period for calculating anomalies.
My thought was to identify multiple stations within a hole that were still reporting data today but were not being captured by GHCN. I wanted to see if the data from those stations supported the anomaly estimated by the 1200km smoothing. The 250km smoothed plots would be ideal for visually identifying holes. Here are the plots for April 1978 and April 2008:
There were lots of holes to choose from: Russia, China, Australia, Canada, Africa, and South America. I decided to start with Russia as I already knew where to look for recent temperature data from “discontinued” GHCN sites: meteo.ru. But first, I had to locate some stations to examine.
Looking at the April 2008 plot, the hole to the northeast of the Caspian Sea seemed like a good place to start. I went this time to the station data page at GISS and simply clicked my mouse on the map to the northeast of the Caspian Sea. GISS gave me a list of stations – sorted by increasing distance from where I clicked. At the top of the list was Kurgan, so I decided to go there first.
Wikipedia says Kurgan “is the administrative center of Kurgan Oblast, Russia; one of the oldest cities in Siberia.” The view from Google Earth indicates it is pretty remote as well, but apparently has a population of 310,000 (according to the GISS data page).
GHCN records for Kurgan extend from November 1893 to April 1990. They actually comprise three scribal records: (0) November 1893 to December 1989, (1) May 1929 to December 1989, (2) January 1931 to April 1990. Because I grabbed the data from the GISS website I will refer to the records as GISS.0, GISS.1, and GISS.2 respectively. Remember, however, that GISS takes the data from GHCN.
I was hoping that the Meteo record for Kurgan would match one of the three GISS records. What I had forgotten was that the Meteo records were of daily readings rather than monthly averages. This meant I was going to have to calculate monthly averages for Meteo before I compared it with the GISS records. It is at this point my journey took an unexpected turn.
The Meteo records have three daily temperature values: Min, Mid, and Max. The Mid value is described simply as “Daily air temperature”. I have not been able to find out when that value is recorded each day or how it is otherwise calculated. However, one thing is certain: Mid does not represent the average of Min and Max. In fact, many of the early records only include Min and Mid. In the Meteo record for Kurgan, Mid records are available from November 1893 to December 2005. Following is a plot of that record:
I calculated the monthly averages using the Mid values in the Meteo record. I then compared this monthly record with GISS.0 and found they very closely match in the months that overlap. The values for just nine months differ by 0.1, likely due to rounding differences. Another eleven monthly records not present in the GISS record were present in the Meteo record. I went back to the Meteo record and found that in ten of those months, one or two days were flagged as having a quality issue. The quality issue turned out to be a Mid value that was lower than the Min value, so in the case of GISS.0, the entire month’s worth of data was discarded when just one or two data points were suspect. Interestingly, the GISS algorithm later creates an estimate for the missing month when calculating the annual average!
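For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of the two steps described above: averaging daily Mid values into monthly means, and listing the months that differ or are missing between two monthly records. The tuple layout, the function names, the rounding to 0.1, and the rule that a single flagged day removes the whole month are my assumptions for illustration. They are not the Meteo file format, and they are not GHCN's actual processing code.

```python
from collections import defaultdict


def monthly_means(daily, drop_flagged_months=True):
    """Average daily Mid values into monthly means rounded to 0.1.

    daily: iterable of (year, month, day, mid, flagged) tuples.
    This layout is hypothetical; adapt it to however you parse the data.
    If drop_flagged_months is True, a month containing any flagged day
    is left out entirely (assumed to mimic the behaviour seen in GISS.0).
    Flagged days are never included in the average itself.
    """
    values = defaultdict(list)
    flagged_months = set()
    for year, month, _day, mid, flagged in daily:
        key = (year, month)
        if flagged:
            flagged_months.add(key)
            continue
        if mid is not None:
            values[key].append(mid)

    means = {}
    for key, vals in values.items():
        if drop_flagged_months and key in flagged_months:
            continue
        means[key] = round(sum(vals) / len(vals), 1)
    return means


def compare_monthly(a, b, tol=0.05):
    """Return (months where a and b differ by more than tol,
    months present in a but missing from b)."""
    common = a.keys() & b.keys()
    differing = sorted(k for k in common if abs(a[k] - b[k]) > tol)
    only_in_a = sorted(a.keys() - b.keys())
    return differing, only_in_a
```

Calling `monthly_means` once with `drop_flagged_months=True` and once with `False`, then passing the two results to `compare_monthly`, shows which months disappear purely because of a flagged day or two.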
With the exception of June 1967 (which is missing from the GISS record) and the fact that the GISS record ends in December 1989, I was able to use the Meteo Mid data to reproduce GISS.0 for Kurgan.
Max data values begin appearing in the Meteo record May 1, 1929. I happened to notice that GISS.1 also begins with May 1929. On a whim, I decided to calculate the monthly averages using the daily averages in the Meteo record when both the Min and Max values were present. To my surprise, this variant of the Meteo record matched the GISS.1 record!
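The Min/Max variant can be sketched the same way. The version below averages Min and Max for each day on which both are present and then averages those daily values within each month. The input layout and the function name are hypothetical, and the rounding to 0.1 is my assumption to match the precision of the monthly records.

```python
def monthly_minmax_means(daily):
    """Monthly mean of the daily (Min + Max) / 2 values.

    daily: iterable of (year, month, tmin, tmax) tuples, where tmin or
    tmax is None when the value is absent (hypothetical layout).
    Days lacking either value are skipped, so months with no Max
    readings at all simply do not appear in the result.
    """
    by_month = {}
    for year, month, tmin, tmax in daily:
        if tmin is None or tmax is None:
            continue
        by_month.setdefault((year, month), []).append((tmin + tmax) / 2)
    return {key: round(sum(vals) / len(vals), 1)
            for key, vals in by_month.items()}
```

Because days without a Max value are skipped, a record built this way naturally begins with the first month in which Max readings appear.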
At this point I have not been able to determine whether or not GISS.2 is also derived from the same record, but it is likely that it is not. Clearly, however, GISS.0 and GISS.1 are derived from the same record. If you recall, the GISS algorithms will combine the two derived records using the “bias method”, which assumes that one record is biased warmer or cooler than the other record. Here is a plot of the difference between the Meteo record calculated using Mid values and the Meteo record calculated using the average of Min and Max. Can you determine the relative bias?
There are several points to be made here:
- GISS (from GHCN) ultimately uses the Meteo record twice. In GISS.0 they use the “Mid” values from the record. In GISS.1 they use the average of Min/Max where possible. Those two variations of the same record are then combined with a third record, GISS.2, whose origin is unknown to me at this time.
- The bias method is used to combine GISS.0 with GISS.1 (and GISS.2). The bias method assumes that one record is running warmer or cooler than the other, and adjusts one of them accordingly. In the case of the Meteo record, Mid is cooler than the average of Min/Max most of the time, but not always, and not by a constant amount. The bias method is therefore an inappropriate method for combining these records.
- GHCN throws out an entire month’s worth of data when just one or two days are suspect. This is done rather than estimating the suspect days. In doing so, GHCN has left it to GISS to come back later and estimate the temperature for the entire missing month.
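To make the second point concrete, here is a simplified sketch of what a constant-offset adjustment does when the difference between two records is not constant. It is only an illustration of the idea as described above (one record shifted by a single number), not the GISS code, and the function names are hypothetical. If the difference between the records varied by month, the residuals left after the shift would not be close to zero.

```python
def constant_offset(reference, other):
    """Mean of (reference - other) over the months the two records share."""
    common = reference.keys() & other.keys()
    if not common:
        raise ValueError("records do not overlap")
    return sum(reference[k] - other[k] for k in common) / len(common)


def residual_after_offset(reference, other):
    """Difference that remains after shifting `other` by one constant.

    If one record were simply biased warmer or cooler than the other,
    every residual would be near zero.
    """
    offset = constant_offset(reference, other)
    return {k: round(reference[k] - (other[k] + offset), 2)
            for k in sorted(reference.keys() & other.keys())}


# Made-up monthly values for illustration only, NOT Kurgan data.
mid = {1: -18.0, 4: 3.0, 7: 19.0, 10: 2.0}
minmax = {1: -17.5, 4: 4.0, 7: 19.2, 10: 1.9}
residuals = residual_after_offset(mid, minmax)
```

With these made-up numbers the single offset removes the average difference, but each month is left with its own residual of a different size and sign, which is exactly the situation a constant bias cannot describe.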