Earlier this year I wrote a post on how much estimation GISS applies to the GHCN temperature record before generating zonal and global averages. A graphic in that post compared the amount of real temperature data with the amount of estimation over time. To read the graphic, take 2000 as an example. As of February 7, 2008, there were 3159 station records in the GHCN data with an entry for the year 2000. Of those station records, 62% were complete, so an annual average could be calculated directly. Another 29% were incomplete but contained enough monthly data for the GISS estimation method to kick in. The remaining 9% were so incomplete that no estimation could be done.
What I did not explore at the time, and would like to look at more closely here, is the accuracy of the estimation. With so much infilling going on, one would hope the accuracy is rather high (I will leave the definition of "high accuracy" for a later time). Because I had no real data to compare against the GISS estimates, I took another approach: I applied the GISS method to real temperature data as if that data were missing.
Recall that GISS never explicitly estimates missing monthly temperatures. Instead, it estimates a seasonal average when one monthly temperature is missing but the other two are present. Similarly, an annual temperature can be estimated when one seasonal value is missing but the other three are present. Using this methodology, GISS can estimate an annual temperature when as many as six monthly values are missing: one full season, plus one month in each of the other three seasons.
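The completeness rule just described can be sketched in a few lines of Python. This is a minimal illustration, not GISS code: the function names are hypothetical, and grouping the twelve months into four three-month seasons (for DJF, taking December from the previous year) is left to the caller.

```python
from typing import Optional, Sequence

# A month is a temperature (deg C) or None when missing.
Month = Optional[float]


def season_status(months: Sequence[Month]) -> str:
    """Hypothetical helper: classify one three-month season.
    All present -> 'complete'; one missing -> 'estimated'; more -> 'missing'."""
    n_missing = sum(m is None for m in months)
    if n_missing == 0:
        return "complete"
    if n_missing == 1:
        return "estimated"
    return "missing"


def year_status(seasons: Sequence[Sequence[Month]]) -> str:
    """Hypothetical helper: classify a year from its four seasons.
    Returns 'complete', 'estimated', or 'insufficient'."""
    statuses = [season_status(s) for s in seasons]
    if statuses.count("missing") > 1:
        return "insufficient"  # at most one season may be absent
    if all(s == "complete" for s in statuses):
        return "complete"
    return "estimated"


# Worst case the rule still accepts: six of twelve months missing.
worst_case = [
    [None, None, None],  # one season entirely missing
    [1.0, None, 3.0],    # each remaining season missing one month
    [10.0, 12.0, None],
    [None, 8.0, 5.0],
]
print(year_status(worst_case))  # estimated
```

Removing one more month from any of the three partial seasons turns a second season into "missing", and the year drops into the "insufficient" category.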
While GISS records no explicit monthly estimate, one can certainly be derived from the seasonal estimate. I have shown several times a one-line equation that exactly reproduces the GISS seasonal estimate. I will leave the derivation as an exercise for the reader; the implied monthly estimate follows from that equation and can be expressed as follows:
$$C_{est} = \bar{C} + \frac{(A - \bar{A}) + (B - \bar{B})}{2}$$
where C is the missing month, A and B are the other two months of the season, and the average values for A, B, and C are calculated from all valid entries for the given month in a particular station record.
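Because the seasonal equation itself is not reproduced in this post, the sketch here assumes it has the anomaly form S = (Ā + B̄ + C̄)/3 + [(A − Ā) + (B − B̄)]/2. Under that assumption the implied monthly value is C̄ + [(A − Ā) + (B − B̄)]/2, i.e. the missing month's average shifted by the mean anomaly of the two months present. The function names and numbers are hypothetical.

```python
import math
from typing import Optional, Sequence


def monthly_mean(values: Sequence[Optional[float]]) -> float:
    """Average of all valid (non-missing) entries for one calendar month."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid)


def seasonal_estimate(a, b, mean_a, mean_b, mean_c):
    """Assumed seasonal estimate when month C is missing: the average of the
    three monthly means plus the average anomaly of the two months present."""
    climatology = (mean_a + mean_b + mean_c) / 3.0
    anomaly = ((a - mean_a) + (b - mean_b)) / 2.0
    return climatology + anomaly


def implied_monthly_estimate(a, b, mean_a, mean_b, mean_c):
    """Value for C that, averaged with A and B, reproduces seasonal_estimate."""
    return mean_c + ((a - mean_a) + (b - mean_b)) / 2.0


# Hypothetical numbers, not taken from any station record.
mean_a = monthly_mean([0.5, -0.5, None])   # 0.0
mean_b = monthly_mean([-1.0, None, -1.0])  # -1.0
mean_c = 2.0
a, b = 2.0, 0.0

c_est = implied_monthly_estimate(a, b, mean_a, mean_b, mean_c)
s_est = seasonal_estimate(a, b, mean_a, mean_b, mean_c)
print(c_est)                                       # 3.5
print(math.isclose(s_est, (a + b + c_est) / 3.0))  # True
```

The final check is the point of the derivation: averaging A, B and the implied C gives back exactly the seasonal estimate, so the monthly figure is just another way of reading the same GISS calculation.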
Now to test the estimation accuracy. In Connecticut, December 2006 was warmer than normal, but February 2007 was colder than normal. The Hartford, CT record shows the following monthly and seasonal temperatures (°C):
Dec 2006: 3.3
Jan 2007: -0.3
Feb 2007: -4.6
DJF: -0.5
If the December 2006 value were missing from the Hartford record, GISS would estimate it at -0.7 C, which would yield a seasonal average of -1.9 C. Similarly, if February 2007 were missing, GISS would estimate it at 1.7 C and produce a seasonal average of 1.6 C. That's a 4.0 degree miss for December, a 6.3 degree miss for February, and a 3.5 degree swing between the two seasonal estimates, against an actual DJF average of -0.5 C.
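The Hartford arithmetic can be checked in a few lines of Python. The estimates of -0.7 C and 1.7 C are taken as given from the text (the station's monthly averages are not reproduced here), so this only verifies the seasonal averages and the size of the misses.

```python
# Hartford, CT values quoted above (deg C).
dec, jan, feb = 3.3, -0.3, -4.6
dec_est, feb_est = -0.7, 1.7  # GISS-style estimates quoted above


def season_mean(d: float, j: float, f: float) -> float:
    """Simple DJF average of three monthly values."""
    return (d + j + f) / 3.0


actual = season_mean(dec, jan, feb)      # DJF with all three months
no_dec = season_mean(dec_est, jan, feb)  # December replaced by its estimate
no_feb = season_mean(dec, jan, feb_est)  # February replaced by its estimate

print(f"DJF actual {actual:.1f}, without Dec {no_dec:.1f}, without Feb {no_feb:.1f}")
# DJF actual -0.5, without Dec -1.9, without Feb 1.6
print(f"Dec miss {dec - dec_est:.1f}, Feb miss {feb_est - feb:.1f}")
# Dec miss 4.0, Feb miss 6.3
print(f"Swing from rounded values {round(no_feb, 1) - round(no_dec, 1):.1f}")
# Swing from rounded values 3.5
print(f"Swing from unrounded values {no_feb - no_dec:.2f}")
# Swing from unrounded values 3.43
```

The 3.5 degree swing comes from differencing the rounded seasonal averages; working from the unrounded values gives (4.0 + 6.3)/3, or about 3.4 degrees, a gap that sits within the tenth-of-a-degree rounding of the data.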
The winter of 06-07 in Connecticut was a bit of an oddball. I really wanted to know what the typical error looked like. To do that, I performed the same calculation on all GHCN v2.mean records.
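A single-station version of that test can be sketched as a leave-one-out loop. This is an illustration under stated assumptions, not the code behind the results: the record layout (year mapped to twelve monthly values, None for missing) is hypothetical rather than the actual GHCN v2.mean file format, the implied estimate is assumed to take the anomaly form C̄ + [(A − Ā) + (B − B̄)]/2, and for simplicity the hidden month still contributes to its own monthly average.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical layout: year -> 12 monthly values (Jan..Dec, deg C), None = missing.
Record = Dict[int, List[Optional[float]]]

# Seasons as (year offset, month index); DJF takes December from the prior year.
SEASONS = [
    [(-1, 11), (0, 0), (0, 1)],  # DJF
    [(0, 2), (0, 3), (0, 4)],    # MAM
    [(0, 5), (0, 6), (0, 7)],    # JJA
    [(0, 8), (0, 9), (0, 10)],   # SON
]


def monthly_means(record: Record) -> List[float]:
    """Average of all valid entries for each calendar month.
    Assumes every calendar month has at least one valid entry."""
    return [mean(v[m] for v in record.values() if v[m] is not None)
            for m in range(12)]


def leave_one_out_errors(record: Record) -> List[float]:
    """Hide each month of every complete season in turn, estimate it from the
    other two months, and return the errors (estimate minus actual)."""
    means = monthly_means(record)
    errors = []
    for year in sorted(record):
        for season in SEASONS:
            cells = [(year + dy, m) for dy, m in season]
            if any(y not in record or record[y][m] is None for y, m in cells):
                continue  # only complete seasons can be tested
            values = [record[y][m] for y, m in cells]
            for i, (_, m_hidden) in enumerate(cells):
                # Mean anomaly of the two months left visible.
                anomaly = sum(values[j] - means[cells[j][1]]
                              for j in range(3) if j != i) / 2.0
                estimate = means[m_hidden] + anomaly
                errors.append(estimate - values[i])
    return errors


# Tiny synthetic record: two identical years, then one warm January.
base = [float(m) for m in range(12)]
record = {2000: list(base), 2001: list(base)}
record[2001][0] += 3.0  # January 2001 three degrees above the other year
errs = leave_one_out_errors(record)
print(len(errs), max(abs(e) for e in errs))  # 21 1.5
```

In the synthetic record, only the DJF season containing the warm January produces nonzero errors. The warm month itself is underestimated by 1.5 degrees, and December and February are each overestimated by 0.75, the same pattern as the Hartford winter on a smaller scale. Summarizing `errs` over many real station records (mean absolute error, a histogram, and so on) is then a matter of choice.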