Recently Anthony Watts noted that the Lampasas TX station was relocated in 2000 to an extremely poor location and attributed a hockey-sticking of the Lampasas series to this re-location. In a comparison that I made with nearby Blanco TX (which is the sort of comparison that USHCN says that they do), it seemed plausible that the move could have added over 1 deg C. Atmoz argued that the site problems had a negligible impact and that there was some presently unknown problem with the GISS algorithm, which I dubbed a UFA (unidentified faulty algorithm). In order to examine this a little more closely, I waded one more time into the swamp of temperature data, re-collating various versions of the Lampasas temperature series, eventually ending up with 19(!) different versions of the Lampasas TX temperature history. Perhaps, borrowing the language of climate modelers, we could dub these an “ensemble”.
These include various versions of the NASA GISS “raw” and “adjusted” series, scraped from the NASA website last year (causing some controversy within the climate blog world), but which now yield an interesting resource for comparing these different versions.
Here’s one comparison that caught my eye and caused me to re-work the material in a little more detail. This shows the impact of the Hansen adjustment (USHCN adjustments are additional) over three stages: green – pre-identification of the “Y2K” error; red – immediately after identification of the Y2K error, reflecting corrections implemented in Aug 2007; black – the current adjustment, which includes the various changes made (without announcement) in Sept 2007 and which caused considerable puzzlement here as we decoded them. NASA has now provided information, and the present documentation, while hardly to the standards that I would recommend, is an improvement and better than the documentation for rival collations, such as CRU.
As soon as one sees this graph, one wonders: what caused the NASA-stage adjustment for Lampasas TX in the early part of the 20th century to increase by as much as 0.3 deg C? By raising this question, I do not imply that all policy initiatives should be put on hold pending resolution of this matter (which seems to be an all too typical straw man response); it is mere curiosity: what is it in the behavior of the algorithms that leads to this result? Why is the temperature history of Lampasas TX being thrashed around this way? And this revision is taking place in one of the best documented networks in the entire world? Nor are the changes just the Y2K adjustment, as the major change occurred after the initial adjustment for the Y2K error.
The 19 Versions
From time to time, I post information on Station Data sources on a permanent page (see left frame) and this should be consulted for URLs.
So far I’ve identified 4 different locations where a total of 19 different versions are archived (this does not include multiple editions of the same version). The 4 locations are NOAA, CDIAC, GHCN and NASA GISS. Some of the data handling seems rather hard to justify, as we’ll see below.
NOAA: NOAA is by far the most up-to-date (up to May 2007) – it has Raw, TOBS and Filnet versions. I recommend that this version be adopted by all users.
CDIAC: CDIAC’s data ends in Dec 2005. They are at least 2 updates behind NOAA for reasons that are unclear. I’ve spot-checked CDIAC versions against the corresponding NOAA versions and found them identical in the examples that I studied. In addition to the Raw, TOBS and Filnet versions, CDIAC produces an Urban-adjusted version (also to Dec 2005). Intermediate versions (SHAP, MMTS) up to early 2000 were formerly available at CDIAC but were deleted in the past year. Jerry Brennan managed to archive the vintage versions before they were deleted.
GHCN: GHCN’s version of USHCN data ends in March 2006 and is also at least 2 updates behind NOAA, also for reasons that are unclear. Other than being short-dated, the GHCN Raw version corresponds to the USHCN Raw version up to rounding. For the most part, other than being short-dated, the GHCN Adjusted version matches the USHCN Filnet (Adjusted) version up to rounding. For reasons that seem hard to understand, the two versions do not match in the late 1940s or the early 1890s, where the GHCN Adjusted version appears to be drawn from the USHCN TOBS version or some analogue. Right now, it’s hard to say – other than that the versions are related, but distinct.
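Checking whether two versions match “up to rounding” is straightforward to mechanize. Here is a minimal sketch (the function name, the 0.1 deg C reporting step, and the sample values are my own illustrative assumptions, not actual Lampasas data): two values agree if they differ by no more than half the coarser series’ reporting step.

```python
# Hypothetical helper: flag months where two series disagree by more
# than rounding could explain. GHCN values are assumed here to be
# reported to 0.1 deg C, so a match tolerance of half that step is used.
def matches_up_to_rounding(ushcn, ghcn, step=0.1):
    """Return (index, ushcn_value, ghcn_value) triples that disagree."""
    mismatches = []
    for i, (u, g) in enumerate(zip(ushcn, ghcn)):
        if u is None or g is None:
            continue  # skip missing months
        if abs(u - g) > step / 2 + 1e-9:
            mismatches.append((i, u, g))
    return mismatches

# Illustrative values: the first two months round to the GHCN figures,
# the third differs by more than rounding and is flagged.
ushcn = [12.34, 15.07, 21.66]
ghcn = [12.3, 15.1, 22.0]
print(matches_up_to_rounding(ushcn, ghcn))  # [(2, 21.66, 22.0)]
```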
NASA: Now for the really thorny part. NASA went through some fairly wild gyrations last summer, making it difficult to tell what was going on at any given time. However, I think that it’s now possible to diagnose things fairly accurately. At present, their most recent data is drawn from GHCN (and thus, like GHCN, only goes to March 2006) even though data is available at NOAA up to May 2007. In addition, NASA does a bizarre adjustment in order to splice two different data sets – a completely unnecessary torturing of the data if they merely used the readily available and more up-to-date NOAA version, as I’ll show below.
Pre-Y2K: As of mid-2007, prior to the identification of the Y2K error, NASA used the vintage (2000) CDIAC SHAP version up to 2000 and the GHCN Raw version for 2000 and after. They removed any USHCN interpolations (made during the SHAP adjustment, which is included in the Filnet version BTW); instead of using the SHAP/Filnet interpolation, they used the Hansenizing interpolation that we’ve discussed elsewhere in connection with dset=1 [raise eyebrows]. The splice was particularly problematic if there was a TOBS or other adjustment in effect as at 2000, and this led to the Y2K error (and resulting bias). This process yielded the GISS “Raw” version (dset0). For USHCN stations, dset1 is equal to dset0 since there is only one record and the Hansenizing dset1 splice does not come into play. From this, they then did another adjustment – their two-legged trend adjustment to coerce results to unlit stations. What’s a little hard to understand about this methodology is why “lit” stations are being used at all, if the unlit stations are driving matters – but that’s a question for another day.
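The mechanics of the splice can be sketched in a few lines. This is my reconstruction of the step as described above, not NASA code, and the values are invented for illustration: if the SHAP series carries an adjustment that the GHCN Raw series lacks, a naive cutover at 2000 introduces a spurious step.

```python
# Minimal sketch of the pre-Y2K splice (my reconstruction, not NASA's
# code). Each series is a dict mapping year to an annual value; the
# splice takes SHAP values before the cutover year and GHCN Raw values
# from the cutover on, with no reconciliation of any offset between them.
def naive_splice(shap, ghcn_raw, cutover=2000):
    out = {year: v for year, v in shap.items() if year < cutover}
    out.update({year: v for year, v in ghcn_raw.items() if year >= cutover})
    return out

# If SHAP carries a +0.5 adjustment that GHCN Raw lacks, the splice
# introduces a spurious 0.5 deg step at the cutover year:
shap = {1998: 10.5, 1999: 10.5}                 # adjusted series
raw = {1998: 10.0, 1999: 10.0, 2000: 10.0}      # unadjusted series
print(naive_splice(shap, raw))  # {1998: 10.5, 1999: 10.5, 2000: 10.0}
```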
Post Y2K: After the Y2K problem was brought to their attention, they appear to have patched the above method by calculating (separately for each month) the difference between the GHCN Raw and Shap versions for the 15 years up to and including 1999 and then adjusting all data up to and including 1999 by this amount, in effect re-writing their entire USHCN data set to cope with the Y2K splicing error. I think that it would have made more sense to leave the 99% of the data in place and adjust the post-2000 data. The NASA two-legged adjustment was re-applied to the revised data, leading to revised adjustments.
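The patch described above can be sketched as follows. This is again my reconstruction under stated assumptions (the mean of the monthly differences over the 15-year window, applied additively to all earlier data); the function and the toy data are illustrative, not NASA’s code. For brevity the example uses a 2-year window rather than 15.

```python
# Sketch of the post-Y2K patch as I understand it: for each calendar
# month, average the (raw - shap) difference over the `window` years up
# to and including the cutover year, then add that offset to every
# value up to and including the cutover year.
def patch_pre_cutover(shap, raw, cutover_year=1999, window=15):
    """Series are dicts keyed by (year, month) tuples."""
    offsets = {}
    for month in range(1, 13):
        diffs = [raw[(y, month)] - shap[(y, month)]
                 for y in range(cutover_year - window + 1, cutover_year + 1)
                 if (y, month) in raw and (y, month) in shap]
        offsets[month] = sum(diffs) / len(diffs) if diffs else 0.0
    return {(y, m): (v + offsets[m] if y <= cutover_year else v)
            for (y, m), v in shap.items()}

# Toy example with a 2-year window: SHAP sits a constant 0.5 below raw,
# so the patch shifts the pre-cutover SHAP values up to match raw.
shap = {(1998, 1): 10.0, (1999, 1): 10.0}
raw = {(1998, 1): 10.5, (1999, 1): 10.5}
print(patch_pre_cutover(shap, raw, window=2))
# {(1998, 1): 10.5, (1999, 1): 10.5}
```

Note that the entire pre-cutover history is re-written by this step, which is why I suggest the opposite convention (adjusting the short post-cutover segment) would have been tidier.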
Current: After these first changes were made in mid-August 2007, NASA scrubbed the modified version and issued an entirely new version, to our considerable consternation last September. It got a little hard trying to keep up, but things seem to have settled down. In their third go at this, they’ve used the CDIAC Filnet version up to December 2005, which is now combined with the GHCN Raw version (for the three further months up to March 2006). As in the first patch, they appear to have patched the discontinuity by calculating the 15 year difference by month and then applying this difference to all records prior to Dec 2005.
It’s a bizarre and goofy way of doing things on a number of counts. First and most obviously, NOAA has a Filnet version through to May 2007 that is online and easily accessible. So why even use the stale GHCN version? And since they are using GHCN – why use the GHCN raw version and calculate a patch? If they’re going that route, which is stupid, why not just use the GHCN Adjusted version?
Filnet and UHI
When Hansen applies the NASA adjustment to data, relying on “unlit” sites, the effect of the USHCN Filnet adjustment needs to be considered. This adjustment appears to do some quite odd things, which we’ve talked about from time to time. There are some “good” USHCN sites, and it’s a little disquieting to see some of the adjustments to the “good” sites. While I haven’t assessed the quantitative effectiveness of the TOBS adjustment or the MMTS adjustment, I can see why these are not unreasonable in principle for a “good” site; the other adjustments applied to “good” sites trouble me. My impression of the impact of the SHAP/Filnet adjustments is that, whatever their stated intention, they end up merely creating a blend of good and bad sites, diluting the “good” sites with lower quality information from sites that are “bad” in some (objective) sense. When this version gets passed to Hansen, even his “unlit” sites no longer reflect original information, but are “adjusted” versions of unlit sites, in which it looks to me like there is blending from the very sites which are supposed to be excluded in the calculation.
Having said all this, even if the Hansen methodology is somewhat incoherent, as we discussed last fall, there are enough decent sites in the USHCN network that the overall results are not implausible.
The real issue in all of this is the quality of information in the ROW. On this topic, there are some opposing tendencies. I’m satisfied that the Lampasas site relocation in 2000 had a measurable impact on reported temperatures and demonstrates one more time, if further demonstration were needed, that you need impeccable meta-information and “good” sites if you want to develop the best quality long-term time series. You need to work out from the best data.
There are pros and cons to the USHCN network. It is a volunteer network, leading to some peculiar locations. Of course, some of these locations are chosen by volunteers who should know better, such as presumably Atmoz’ University of Arizona. Somehow it seems unlikely to me that Canadian or Swiss weather services would have quite the same profusion of exotic violations of WMO standards.
On the other hand, there doesn’t seem to be any shortage of exotic stories about mis-measurement in various parts of the world. Without proper documentation by the national services, it’s hard to assess the situation.
However, weather stations in other countries seem far more likely to be located at airports, and stations in the MCDW network, which make up the lion’s share of reporting stations in the post-1993 GHCN network used by both CRU and GISS, appear to be especially so. The classic urban heat island effect, as opposed to gross violations of WMO policy, would appear to be more of a consideration in these networks than in the USHCN network. (And of course, SSTs are a different story again.)
Regardless of whether these station histories “matter”, surely there’s no harm in NASA (and GHCN) adopting rational approaches to their handling of the USHCN network. To that end, I would make several recommendations to NASA:
1. Use the NOAA USHCN version rather than the stale CDIAC and/or GHCN versions.
2. Lose the splice and the patch.
3. Use USHCN interpolations rather than Hansenizing interpolations.
4. Use TOBS or perhaps MMTS, and if MMTS is used, ensure that NOAA places this online.
For GHCN, I’d be interested in knowing exactly where they are getting their data from. (It’s not impossible that the difference that I noticed here results from inconsistent USHCN versions – so maybe there’s something going on there as well.)
Here’s the range of the ensemble (with the two versions containing the Y2K error removed). The range of the ensemble in the early part of the century reaches about 1 deg C. The variation is not limitless. This range does NOT include the impact of the relocation, which was the effect that we originally sought to measure – that is on top of this ensemble. I may try to update this graphic at some point when we know more about the relocation error.
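For readers who want to reproduce this sort of graphic with their own collation, the year-by-year spread across versions is a one-liner per year. The sketch below uses made-up numbers, not actual Lampasas values; the spread is simply max minus min over whichever versions report that year.

```python
# Illustrative computation of the ensemble range: for each year, the
# spread (max - min) across all versions that have data for that year.
# The numbers below are invented for illustration only.
def ensemble_range(versions):
    """versions: list of dicts mapping year -> annual value."""
    years = sorted({y for v in versions for y in v})
    return {y: round(max(v[y] for v in versions if y in v)
                     - min(v[y] for v in versions if y in v), 2)
            for y in years}

versions = [{1910: 18.2, 1920: 18.5},
            {1910: 19.1, 1920: 18.7},
            {1910: 18.6}]               # a short-dated version
print(ensemble_range(versions))  # {1910: 0.9, 1920: 0.2}
```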