GHCN Updates

In its land surface temperature calculations, NASA GISS is little more than a distributor for NOAA GHCN. As Gavin Schmidt explained, they spend no more than 0.25 man-years on this product, which permits negligible (if any) quality control. Although there are 7280 stations in the GHCN network, only a fraction of these occur as up-to-date records in GHCN (or their distributor), even from countries where up-to-date information is easy to locate (e.g. Canada, New Zealand). I’ve discussed this bizarre failure to update station data on many occasions in the past.

Aside from seemingly poor practice, a further issue for me and others is whether some kind of bias has been introduced into the system by the discontinuity. Assessing this potential bias is not a small job. It is one that NOAA and NASA should obviously have done long ago, and the existing literature does not do so.

An obstacle to assessing such a bias has been the lack of any clear description of the update program. In my recent review of the data, I noticed a couple of scraps of information and have now constructed what I believe to be a reasonably accurate inventory of the stations that are included in the GHCN update program.

Peterson et al (BAMS 1997) reported the following update procedure:

Of the 31 sources, we are able to perform regular monthly updates with only three of them (Fig. 5). These are 1) the U.S. HCN, 1221 high quality, long-term, mostly rural stations in the United States; 2) a 371-station subset of the U.S. First Order station network (mostly airport stations in the United States and U.S. territories such as the Marshall and Caroline Islands in the western Pacific); and 3) 1502 Monthly Climatic Data for the World stations (subset of those stations around the world that report CLIMAT monthly code over the Global Telecommunications System and/or mail reports to NCDC). Other stations will be updated or added to GHCN when additional data become available, but this will be on a highly irregular basis.

When they said that the “other stations” would be updated on a “highly irregular basis”, they were not joking. In the 11 subsequent years, as far as I can tell, there have been no such updates. We’ve already collated information on the 1221 USHCN sites, but the GHCN website unfortunately doesn’t provide any lists of either the 1502 MCDW stations or 371 First Order stations.

Last fall, included in the NASA GISS archive were two tables named “mcdw.tbl” and “sod.tbl” (insert urls). The two tables consisted of columns of id numbers without any descriptors, but the first table had 1502 rows and the second table had 371 rows, matching the information in Peterson et al 1997. When I noticed this, I realized that this was a guide to the stations in the GHCN update program.

Even with this guide, it proved to be a lot of work actually reconciling this information to the inventory of NASA GISS/GHCN station id numbers, as the tables matched to a point, but only to a point. Many numbers in the 2nd column of these tables matched parts of the GHCN station identifications, but there were many sites that were only partial matches. In some cases, the GHCN station seemed to match the number in the first column better than the second column. I got about 95% matched semi-automatically but then had a bunch of leftover numbers for which I sought matches in a variety of ways. I inserted oddball id numbers at this website (http://weather.gladstonefamily.net/site/42599 ) and got names; then I checked partial name matches in my GHCN station inventories to see if the id number resembled any of the numbers in the two tables.
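The semi-automatic part of this kind of matching can be sketched roughly as follows. This is only an illustration under stated assumptions, not necessarily the procedure used for the list: it assumes that an 11-digit GHCN v2 identifier consists of a 3-digit country code, a 5-digit WMO number and a 3-digit modifier, and that the table entries are 5-digit WMO-style numbers. The vector `tbl_id` is a hypothetical stand-in for a column of mcdw.tbl, and 12345 is an invented number included to show what a leftover looks like.

```r
# A few GHCN v2 station ids (New Zealand stations)
ghcn_id <- c("50793780000", "50793844000", "50793987000", "50793994000")

# Assumption: characters 4-8 of the 11-digit id hold the 5-digit WMO number
wmo <- substr(ghcn_id, 4, 8)

# Hypothetical table column; "12345" is invented to show a non-match
tbl_id <- c("93780", "93994", "12345")

# match() gives the position of each table number among the WMO numbers, or NA
idx <- match(tbl_id, wmo)
matched <- data.frame(tbl_id = tbl_id, ghcn_id = ghcn_id[idx],
                      stringsAsFactors = FALSE)

# Leftovers that would need to be chased down by hand
leftover <- tbl_id[is.na(idx)]
matched
leftover
```

Exact matching of this kind only handles the easy cases; the partial matches described above still need name lookups and manual checking.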

I got close enough that I wanted to finish the enterprise and eventually wasted a lot of time on it, but I’ve emerged with a plausible list of identifications here: http://data.climateaudit.org/data/station/ghcn/update.dat .

There appears to be a considerable overlap (149 stations) between the two lists (mcdw.tbl and sod.tbl), reducing the total number to 1724 stations. Of the 1724 stations, only 1058 stations have values in 2008.
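The number of distinct stations is simply the size of the union of the two lists: the sum of the two list lengths less the overlap. A minimal sketch with invented id vectors (not the real tables), followed by the same arithmetic using the 1502 MCDW and 371 First Order counts reported by Peterson et al:

```r
# Toy id vectors standing in for the two tables (values are invented)
mcdw <- c(101, 102, 103, 104, 105)
sod  <- c(104, 105, 106)

overlap <- length(intersect(mcdw, sod))   # ids present in both lists
total   <- length(union(mcdw, sod))       # distinct ids across both lists

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
stopifnot(total == length(mcdw) + length(sod) - overlap)

# Same arithmetic with the station counts from the post
n_update <- 1502 + 371 - 149
n_update
```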

Although there are no USHCN stations in this part of the network, there are 138 US stations in the non-USHCN update, almost entirely from airports, including large airports like Phoenix, Los Angeles, Houston etc. If you want a list, you can examine them as follows:

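# Read the tab-delimited inventory of stations in the GHCN update program
info=read.table("http://data.climateaudit.org/data/station/ghcn/update.dat",sep="\t",header=TRUE)
temp=(info$country==425)   # US stations (GHCN country code 425)
temp1=(info$end_raw>=2008)&!is.na(info$end_raw)   # stations with raw values in 2008
info[temp,]   # list the US stations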
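The second flag (`temp1`) can be combined with the first to restrict the list to US stations that actually have values in 2008. A small sketch on a made-up data frame with the same `country` and `end_raw` columns (the rows, the id values and the `id` column name are invented for illustration):

```r
# Invented rows with the same columns used above; 425 is the US country code
demo <- data.frame(id      = c("42500000001", "42500000002",
                               "50700000001", "50700000002"),
                   country = c(425, 425, 507, 507),
                   end_raw = c(2008, 1990, 2008, NA),
                   stringsAsFactors = FALSE)

us      <- demo$country == 425
# Stations whose end year is missing are treated as not current
current <- !is.na(demo$end_raw) & demo$end_raw >= 2008

demo[us & current, ]   # US stations with values in 2008
table(us, current)     # cross-tabulation of the two flags
```

The same two flags apply unchanged to the `info` table read from update.dat.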

The ID numbers in the update.dat table tie into the ID numbers in the giss_info collation that I’ve already posted up (which contains some additional GISS information on these sites, e.g. altitude, (incorrect) population etc.).

Country codes are here and can be read as follows:

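# GHCN v2 country codes: fixed-width file with a numeric code and a country name
url="ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.country.codes"
ccode=read.fwf(url,widths=c(4,30),colClasses=c("numeric","character"))
names(ccode)=c("id","name")
ccode$name=gsub(" +$","",ccode$name)   # strip trailing blanks from the names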
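With the country codes in hand, the names can be attached to the station list and the stations counted by country. A minimal sketch using small invented stand-ins for `info` and `ccode` (the real tables would be read as above; the codes 425, 501 and 507 are assumed here to denote the United States, Australia and New Zealand, and the exact name strings are illustrative):

```r
# Invented stand-ins for the two tables read above
info_demo  <- data.frame(id      = c("a", "b", "c", "d"),
                         country = c(425, 425, 501, 507),
                         stringsAsFactors = FALSE)
ccode_demo <- data.frame(id   = c(425, 501, 507),
                         name = c("UNITED STATES OF AMERICA", "AUSTRALIA",
                                  "NEW ZEALAND"),
                         stringsAsFactors = FALSE)

# Look up each station's country name by its numeric code
info_demo$country_name <- ccode_demo$name[match(info_demo$country, ccode_demo$id)]

# Number of stations per country, largest first
counts <- sort(table(info_demo$country_name), decreasing = TRUE)
counts
```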

NASA GISS makes an effort to adjust for UHI. As noted elsewhere, their procedure in the US is considerably different than their procedure outside the US, as, in the US, the USHCN network, whatever its warts, provides many stations that are not in urban airports. There is scattered information on non-US stations, but this is clearly a major lacuna at present (and I’m sending a copy of this list to Anthony in the hope that this topic will interest him – which seems likely given the interesting recent posts at his site on Verkhoyansk in Russia). The Russian (and western China) sites seem like excellent first targets as they have a lot of leverage in overall land-based aggregates.

24 Comments

  1. Gary
    Posted Nov 24, 2008 at 3:05 PM | Permalink

    It should be pointed out that some of the 1221 USHCN stations have been closed in the last decade or so. The Surfacestations project had identified at least ten of these as of November 2008 with still about half the stations to survey.

  2. Posted Nov 24, 2008 at 4:02 PM | Permalink

    If USHCN is not used in GISS, what is it being used for? What is the point of a separate “high quality” network? Or, am I not reading this correctly?

    Steve: It gets incorporated about a year after the fact. I’ll make this clear in the post.

  3. chris
    Posted Nov 24, 2008 at 6:42 PM | Permalink

    I am surprised they have no recent data for Auckland Airport (6014) or Raoul Island (6029). The attached site (http://www.metvuw.com/index.php ) shows they do daily radiosonde balloons at Raoul as well as most of the other NZ sites. Raoul should be very interesting as the troposphere data should show whether there has been AGW heating as predicted by the models.
    For Auckland, they do the radiosonde at the old airport at Whenuapai but there is plenty of other data from the current airport at Mangere. The current data from many of the sites is here. There should be data for Campbell Island but I don’t know what site it is at.

    I wonder why Wellington didn’t make their sites. Maybe they took your criticisms to heart and have deleted it as lost

  4. Norm
    Posted Nov 25, 2008 at 12:55 AM | Permalink

    Like the multiple proxies buggering up the data in “Can’t See the Signal For the Trees by Willis Eschenbach on November 23rd, 2008”, is it not also possible that having so many stations in the US and in particularly warm locations also results in a forcing of the data toward the temperatures in those locations? I read about the land temperatures, and land + ocean temperatures, but there’s no just plain ocean temperatures. I believe that ocean temperatures should be the best indicator of GW, as they are all done by satellite with no undue corrections required. The land temperatures could easily be corrupted by UHI effects, whereas the ocean temps only go back to 1979, they should still show the rapid increases that Al G. and Hansen are complaining about starting in 1988.

    • jae
      Posted Nov 25, 2008 at 7:01 PM | Permalink

      Re: Norm (#4),

      I believe that ocean temperatures should be the best indicator of GW, as they are all done by satellite with no undue corrections required.

      You might like to read Roger Pielke, Sr’s current post

  5. Geoff Sherrington
    Posted Nov 25, 2008 at 4:44 AM | Permalink

    For Australian land, there is both a high quality network a few years old and a larger, older network with over 1,000 stations, some going back to the 1860s for temp. These can be bought or donated to collecting authorities. The problem is to discover which adjustments have already been made and by what method, when you study new global maps.

    It would not surprise me if the data sent from Australia to Hadley and USA is further processed, such as by considering surface sea temperatures nearby. As an outcast who dared to question, I have received no satisfactory reply beyond the type “We have no control over the changes to data made by later users”. I do not even know if NOAA uses the high quality data or the older set. I’m not in the club.

    • James Lane
      Posted Nov 25, 2008 at 6:32 AM | Permalink

      Re: Geoff Sherrington (#5),

      I, similarly, made some enquiries about the provenance of Australian temperature data, and got nowhere. This was years ago, before the birth of CA. I was interested in a paper that linked the recent Australian drought to AGW. I don’t now have the details, but I think Karoly was the lead author. I was puzzled as the raw data from the rural stations (most of those listed in the OP) didn’t seem to show any warming trend at all. I emailed the junior author who replied that the Australian temp data had been supplied by the Australian Bureau of Meteorology.

      Much later it occurred to me that the BOM data might be hadCRU stuff parroted back. I certainly don’t know this to be the case, but I might now have a closer look at the Australian data.

  6. Nicholas
    Posted Nov 25, 2008 at 8:41 AM | Permalink

    (insert urls)

    I think you forgot to insert them…

  7. Kieran
    Posted Nov 25, 2008 at 12:19 PM | Permalink

    Looking at the UK, there appear to be 10 up-to-date stations included.

    5 in Scotland
    4 in England
    1 in Northern Ireland (at an airport)

    There is no up-to-date non-airport location south of Scotland.

  8. Posted Nov 26, 2008 at 8:30 AM | Permalink

    Steve, I wouldn’t say this exercise was a ‘waste of time’ – that file is very useful.
    In the Eastern half of Australia (E of 137 long.) there are about 20 sites in use; only 3 of these are ‘rural’ and these 3 are all at airports.
    This raises a very basic question – I thought that the UHI effect of urban sites was supposed to be compensated for by nearby rural sites? But what happens if there are no nearby sites? Could it be that no compensation is done currently? That seems to be the case for a few sites I have looked at, Adelaide, Brisbane, Canberra, Dublin (Ireland). The files for before and after homogeneity adjustment are the same for 2008. But past temperatures are adjusted and – you’d never guess – usually downward, by a tapering scale by as much as 1 degree for Adelaide and Dublin, 0.6 for Brisbane (though slightly up for Canberra). This seems to create a false warming effect. Do you understand this? Could it be that past temperatures were adjusted downward when there were nearby rural sites, but not now that there are none?

    Kieran, I have set up a thread on the message board to discuss UK sites.

  9. Steve McIntyre
    Posted Nov 26, 2008 at 9:15 AM | Permalink

    #10. The change in the character of the database around 1990 is something that bothers me, since it introduces a pointless potential inflection. Given the billions being spent on climate, you’d think that NASA, CRU, NOAA, GHCN could actually keep the GHCN collection up to date, rather than getting so reliant on suburban airports.

    Maybe it doesn’t matter. However, in less time than they’ve spent “proving” that they don’t need to update the rural records, they could have collated the non-airport information from the big countries as a start: Canada, Russia, Australia, China, Brazil,..

  10. Steve Carson
    Posted Nov 26, 2008 at 12:24 PM | Permalink

    Not directly on topic, but I’ve been following this subject with great interest for some time and I’m thinking of putting together a temperature website.

    Can anyone point me to the best resource for where the GISS/GHCN weather stations are actually located, what area they are presumed to represent and how to get the current and any previous month’s summary data (mean temp, min temp, max temp)? Would really appreciate the assistance. Thanks.

  11. Deep Climate
    Posted Nov 29, 2008 at 1:52 PM | Permalink

    Steve,
    As you know I’ve been looking at the GHCN monthly data directly. The counts I have are a little higher than shown in your post and comments.

    For 2008 (from latest 2008-11-29 archive):
    1233 station records (including 134 U.S. stations)

    For 2007 (from 2008-11-22 archive):
    1247 station records

    I’m working from various versions of the GHCN monthly dataset found here (there were at least four this month):
    ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z

    I’m not doing any programming as such with this data – just using grep to split out the year of interest, and then browsing, linking to station lists, sorting and analyzing from there.

    FWIW, #3:

    GHCN data for New Zealand Jan-Oct, 2008

    50793780000 -43.48 172.52 CHRISTCHURCH
    178 -9999 151 121 71 68 65 60 96 108

    50793844000 -46.70 168.55 INVERCARGILL
    149 -9999 132 105 72 72 57 60 95 97

    50793987000 -43.95 -176.57 CHATHAM ISLAN
    149 -9999 143 124 99 91 86 84 103 102

    50793994000 -29.25 -177.92 RAOUL ISLAND,
    224 -9999 -9999 217 184 173 163 166 169 177

    None of the four have data for Feb. and Raoul is also missing March. But the recent months are there – don’t know if they made it into GISS.

  12. Steve McIntyre
    Posted Nov 29, 2008 at 2:13 PM | Permalink

    I agree that the present GHCN has 1233 stations in 2008 (though many stations are very spotty). I’ll have to check the script for the number shown above – I might have used the number of GISS dset1 stations – Hansen excludes some GHCN stations (though this distinction should have been clear if that’s what I did). I won’t have time to check this for a few days.

  13. Deep Climate
    Posted Nov 29, 2008 at 3:45 PM | Permalink

    #12:
    The GHCN data is here:
    ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/

    The mean, max and min datasets go back to 1880, though. The station data contains long/lat co-ordinates, as well as an implicit country “code” (e.g. 425 = U.S.). The country codes are in a separate file in that same ftp directory.

    It sounds like Steve has posted some of the GISS station data on this website – check the links above.

    As to what area each station “represents”, each station falls into one or another of the 5×5 deg grids, according to longitude and latitude. You can read how NCDC calculates grid anomalies from the average of each grid cell’s station data here:
    http://www.ncdc.noaa.gov/oa/climate/research/ghcn/ghcngrid.html

    Hope this helps …

  14. Deep Climate
    Posted Nov 29, 2008 at 3:50 PM | Permalink

    #14:
    Yes, agreed, that’s my only point – namely that GISS drops some currently reporting GHCN stations along the way for some reason (redundancy? missing data? QC?).

  15. Deep Climate
    Posted Nov 29, 2008 at 4:05 PM | Permalink

    I’ve checked 2007 GHCN monthly for “carryover” errors of the sort that plagued the initial GHCN October data. I checked all six spring and fall monthly transitions (i.e. Feb-Mar, Mar-Apr, Apr-May, Aug-Sep,Sep-Oct, Oct-Nov). All stations having at least one “0 transition” were identified and sorted by maximum absolute transition. The only definite carryover errors identified were for the two Slovakia stations. Here there was a whopping 8 degree rise from May to June, but identical temps in April and May. Comparison with nearby Czech Republic shows that this is not possible.

    64111903000 48.65 19.15 SLIAC
    20 26 62 106 106 190

    64111934000 49.07 20.25 POPRAD/TATRY
    14 2 39 83 83 166

    61111782000 49.68 18.12 OSTRAVA/MOSNO
    36 28 56 99 153 189

    I also checked the GHCN daily for POPRAD and that also shows a large difference from April to May. There were a lot of missing days in that file.

    But other possible candidates seem to be chance occurrences (lots of those in the tropics, of course). In particular, the rest of the temperate zone 0 transitions seem due to juxtaposition of warmer and colder months in a region, resulting in some chance 0 transitions. For example, consider the following stations in Luxembourg and France:

    62906590000 49.62 6.22 LUXEMBOURG/
    49 50 64 146 146 172
    61507222000 47.17 -1.6 NANTES
    84 92 84 145 145 170
    61507510000 44.83 -0.7 BORDEAUX/MERI
    81 97 97 156 161 189
    61507280000 47.27 5.08 DIJON
    53 64 69 147 156 185
    61507434000 45.87 1.18 LIMOGES
    56 72 71 148 139 166
    61507630000 43.63 1.37 TOULOUSE/BLAG
    67 85 90 152 159 198

    Throughout the region, March was cool and April was warm leading to small transitions between both Feb-Mar and Apr-May. The Luxembourg daily data shows only .3 deg difference between the average mean for April and May (calculated as min-max average), tending to confirm this analysis.

    By the way, the three April errors in 2008 (Resolute etc.) are still there in the latest archive. All the same, the prior incidence and impact of “carryover” appears to be quite limited, at least as far as I can see in the 2007 and 2008 data. It should still be fixed though. This may require a screening process to identify probable carryover errors, based on automation of the above procedure.

  16. Steve McIntyre
    Posted Nov 29, 2008 at 4:56 PM | Permalink

    My own take is that the information is probably OK in the national services. The priority is surely for them to find out why these errors occur rather than throwing another patch. If they find out why the errors occur, maybe they’d find out why they miss data that is plainly available.

    These people are paid to collect this data. I just don’t believe that it’s that hard a problem.

  17. steven mosher
    Posted Nov 30, 2008 at 12:07 AM | Permalink

    re 26. DC GHCN stations are dropped from GISS for a variety of “reasons” none of which is stated very clearly. You can refer to hansen 2001 to get some feel for the QC reasons but was ignored. Most of the “cleaning” happens by hand.

    One place to start is by downloading the GISTEMP code and having a look at the various readme files. also some of the input files provide some clues. If you need help holler..

  18. steven mosher
    Posted Nov 30, 2008 at 12:24 AM | Permalink

    re 16. GISS omissions are listed in the Step0 input files. See the file Ts.strange.RSU.list.IN. For some of these stations certain time periods are omitted; for others, the entire period is omitted. There are no documents that explain or allow one to understand these omissions. Here’s a partial sample:

    115624640010 HURGHADA lat,lon 27.3 33.8 omit: 0-9999
    134652010000 LAGOS/IKEJA lat,lon 6.6 3.3 omit: 0-9999
    134652360000 WARRI lat,lon 5.5 5.7 omit: 0-9999
    134652430000 LOKOJA lat,lon 7.8 6.7 omit: 0-9999
    205549450010 JUXIAN lat,lon 35.6 118.8 omit: 0-9999
    207433330002 PORT BLAIR lat,lon 11.7 92.7 omit: 0-9999
    210476960010 YOKOSUKA lat,lon 35.3 139.7 omit: 0-9999
    219415600005 PARACHINAR lat,lon 33.9 70.1 omit: 0-9999
    303824000000 FERNANDO DE N lat,lon -3.9 -32.4 omit: 0-9999
    314804440000 CIUDAD BOLIVA lat,lon 8.2 -63.5 omit: 0-9999
    403717300040 RUEL,ON lat,lon 47.3 -81.4 omit: 0-9999
    414762200010 CIUDAD GUERRERO,CHIHUAHUA lat,lon 28.6 -107.5 omit: 0-9999
    414762580020 QUIRIEGO, SONORA lat,lon 27.5 -109.2 omit: 0-9999
    414763730000 TEPEHUANES,DG lat,lon 25.4 -105.7 omit: 0-9999
    414766950010 CHAMPOTON, CAMPECHE lat,lon 19.4 -90.7 omit: 0-9999
    414767750030 CANTON, OAXACA lat,lon 18.0 -96.3 omit: 0-9999
    440785260010 ANNAS HOPE, ST.CROIX VIRG lat,lon 17.7 -66.7 omit: 0-9999
    425724910030 HOLLISTER USA lat,lon 36.8 -121.4 omit: 0-9999
    501947880000 KEMPSEY lat,lon -31.0 152.8 omit: 0-9999

    • Len van Burgel
      Posted Nov 30, 2008 at 7:00 PM | Permalink

      Re: steven mosher (#20),

      The only Australian station in your abbreviated list is Kempsey.

      Kempsey (Lat -31.08 Lon 152.08) has records back to 1882 and is still open. Temperature records are available from 1907. Kempsey is also known as Kempsey (Wide Street). Another station opened at Kempsey Airport in 2001 called Kempsey Airport AWS. It is 5km to the west further inland.

      The Bureau reports the daily observations as follows:
      “Temperature, humidity, cloud and rainfall observations are from Kempsey (Wide Street) {station 059017}. Wind and pressure observations are from Kempsey Airport AWS {station 059007}”.

      I suspect GHCN either is supplied with Kempsey Airport AWS data (or it assumes it is) and therefore because of its short record it is deleted. In addition although temperature records for Kempsey (Wide Street) go back 100 years, there are 10 years listed as missing mostly in the 1961-1990 period. That may also be a problem.

  19. steven mosher
    Posted Dec 1, 2008 at 2:00 AM | Permalink

    re 21, A short record or records with lacunae would not be deleted prior to processing, as the processing has a step to “overcome” these issues by combining stations within geographical areas according to the ‘reference station method’
    (either Karl or Peterson et al ) this is a step0 function in gistemp I believe ( from memory… impaired memory since I have suffered through reading gistemp more than the FDA recommends)

    • Len van Burgel
      Posted Dec 1, 2008 at 4:56 AM | Permalink

      Re: steven mosher (#22),

      Steven, I don’t disagree with you, but I thought it instructive to try to work out why it could be that such a long period record is rejected. If I get time, I will look at any other Australian records rejected.

  20. steven mosher
    Posted Dec 1, 2008 at 7:43 AM | Permalink

    re 23. ok.
