Hansen's Station Lists

The section of Hansen’s code that we’d been looking at just before the dump of relatively unannotated code was how Hansen combined scribal versions of different stations in a 2-column case. That step looks to contain a material error, already discussed without the benefit of source code, which is going to be examined further. We had just started trying to figure out how Hansen dealt with multiple series. This appears to be handled in the Step 1 program comb_records.py, which employs a subroutine get_best in which station versions are ranked according to provenance as follows:

'MCDW': 4, 'USHCN': 3, 'SUMOFDAY': 2, 'UNKNOWN': 1
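To make concrete what a provenance ranking like this does, here is a minimal Python sketch. It is not Hansen’s get_best. It only illustrates picking the highest-ranked version using the four ranks listed above. The record layout (dicts with a hypothetical 'source' key) and the tie rule (the first version seen wins) are my assumptions.

```python
# Provenance ranks as listed in comb_records.py (higher = preferred).
SOURCE_RANK = {'MCDW': 4, 'USHCN': 3, 'SUMOFDAY': 2, 'UNKNOWN': 1}


def rank_of(record):
    """Return the provenance rank of a record.

    `record` is a hypothetical dict with a 'source' key; sources not in
    the table (or a missing key) are treated as UNKNOWN.
    """
    return SOURCE_RANK.get(record.get('source', 'UNKNOWN'), SOURCE_RANK['UNKNOWN'])


def pick_best(records):
    """Return the highest-ranked record; max() keeps the first one on ties."""
    if not records:
        raise ValueError("no records to choose from")
    return max(records, key=rank_of)


# Toy versions of one station, for illustration only.
versions = [
    {'id': 'version_a', 'source': 'SUMOFDAY'},
    {'id': 'version_b', 'source': 'MCDW'},
    {'id': 'version_c', 'source': 'USHCN'},
]
print(pick_best(versions)['id'])  # version_b
```

Under this kind of rule an MCDW version always wins over a USHCN or First Order version of the same station, regardless of record length or quality, which is why the ranking matters.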

I am unaware of any mention of this ranking procedure in Hansen et al 1999, 2001. Hansen mentions MCDW and U.S. First Order (which seems to be mostly ASOS) as follows:

Second, updates of the GHCN data covering the most recent several years include only three component data sets [Peterson and Vose, 1997]: (1) up to about 1500 of the global MCDW stations that report monthly data over the Global Telecommunications System or mail reports to NCDC, (2) up to about 1200 United States Historical Climatology Network stations, which are mostly rural; (3) up to about 370 U.S. First Order stations, which are mostly airport stations in the United States and U.S. territories in the Pacific Ocean. Third, the update for the final (current) year is based mainly on MCDW stations

We’ve talked about USHCN in the past, but not much about MCDW or First Order (ASOS) networks. Hansen provides three lists that pertain to these three networks (the lists occurring in both Step 0 and Step 1 files):

ushcn.tbl
mcdw.tbl
sumofday.tbl

The first table, ushcn.tbl, is a concordance with 1221 rows (equal to the number of USHCN stations), linking USHCN identification numbers with GHCN station.inv numbers (carried forward into the GISS station.inv). This is the second such concordance of USHCN and GHCN identification numbers to appear online; an earlier concordance is posted at http://www.climateaudit.org/data/ushcn/details.dat . (I’ve not compared the two concordances yet.) So it’s nice to see GISS’ contribution. I checked the 1221 station IDs in ushcn.tbl for inclusion among the 7364 ids in GISS station.inv, and all were included.
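A check like this is a simple set-membership test. The sketch below assumes whitespace-separated files, the GHCN id in a given column of the concordance (which column that is in ushcn.tbl is an assumption to verify against the file), and the station id as the first token of each inventory line. The function names are hypothetical.

```python
def column_values(lines, col):
    """Return the token in 0-based column `col` of each line that has one."""
    values = []
    for line in lines:
        fields = line.split()
        if len(fields) > col:
            values.append(fields[col])
    return values


def missing_ids(concordance_lines, inventory_lines, ghcn_col):
    """List concordance GHCN ids that do not appear in the inventory.

    Assumes the inventory station id is the first token on each line;
    both file layouts here are assumptions, not documented formats.
    """
    inventory = set(column_values(inventory_lines, 0))
    return [gid for gid in column_values(concordance_lines, ghcn_col)
            if gid not in inventory]


# Toy lines for illustration only (not real entries from either file).
concordance = ["000001 10000001000", "000002 10000002000"]
inventory = ["10000001000 TOWN A", "10000002000 TOWN B", "10000003000 TOWN C"]
print(missing_ids(concordance, inventory, ghcn_col=1))  # []
```

An empty list corresponds to the "all were included" result reported above for ushcn.tbl.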

MCDW
The next list, mcdw.tbl, is more problematic. There are 1502 stations in the mcdw.tbl list (which dates from 1998, BTW), consistent with the number in Hansen et al 1999. The USHCN list was a concordance of USHCN numbers with GHCN numbers. The MCDW table also appears to be a concordance, but there are a couple of big differences from the USHCN case. There, the USHCN ids reconciled to USHCN listings and the GHCN ids reconciled to the GISS station.inv ids. In this case, neither happened.

Although 1301 GHCN ids in the concordance matched GISS station.inv ids, a total of 201 identifications did not. This raises several questions: where did these new IDs come from? What is their purpose? How are they used?
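One way to start sorting the unmatched ids is to ask whether each one is a near miss of an existing station. GHCN v2 ids are 11 digits: a 3-digit country code, a 5-digit WMO station number and a 3-digit modifier. The sketch below assumes the concordance uses that layout and treats an id as a near miss when its first 8 digits match an inventory id but the modifier differs. The function name and toy ids are hypothetical.

```python
from collections import defaultdict


def classify_ids(concordance_ids, inventory_ids):
    """Split concordance ids into exact matches, near misses and orphans.

    Assumes 11-digit GHCN v2 style ids (country code + WMO number +
    modifier). A near miss shares the first 8 digits with some inventory
    id but is not itself in the inventory.
    """
    inventory = set(inventory_ids)
    by_station = defaultdict(list)
    for inv_id in inventory:
        by_station[inv_id[:8]].append(inv_id)

    matched, near_miss, orphan = [], {}, []
    for cid in concordance_ids:
        if cid in inventory:
            matched.append(cid)
        elif cid[:8] in by_station:
            near_miss[cid] = sorted(by_station[cid[:8]])
        else:
            orphan.append(cid)
    return matched, near_miss, orphan


# Toy ids for illustration only.
inventory_ids = {"10000001000", "10000002000", "20000003001"}
concordance_ids = ["10000001000", "20000003000", "30000004000"]
matched, near_miss, orphan = classify_ids(concordance_ids, inventory_ids)
print(len(matched), len(near_miss), len(orphan))  # 1 1 1
```

Near misses would point to modifier changes; orphans would be ids with no counterpart at all in station.inv.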

Also, where do the 9-digit “MCDW” numbers come from? There doesn’t appear to be an online concordance of MCDW ids. There are individual reports at http://www1.ncdc.noaa.gov/pub/data/mcdw/ . In these reports, 5-digit WMO numbers are used; the first 3 digits are country codes (which differ somewhat from other lexicons). There is a 4th digit in the concordance, which is usually 0; again, I don’t know its function. In the USHCN case, there was a list of stations in the network. If someone can identify the provenance of the MCDW numbers in the Hansen concordance, I’d appreciate it.
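If the 9-digit numbers are read as described above (a 3-digit country code, a 4th digit that is usually 0, then a 5-digit WMO number), splitting them is straightforward. That layout is itself a guess, so the sketch below should be taken only as a way to test the guess against the table. The names are hypothetical.

```python
from typing import NamedTuple


class McdwId(NamedTuple):
    country: str  # first 3 digits: country code (assumed)
    extra: str    # 4th digit, usually 0, function unknown
    wmo: str      # last 5 digits: assumed to be the WMO station number


def split_mcdw_id(code):
    """Split a 9-digit MCDW concordance number under the assumed layout."""
    code = code.strip()
    if len(code) != 9 or not code.isdigit():
        raise ValueError(f"expected a 9-digit id, got {code!r}")
    return McdwId(code[:3], code[3], code[4:])


print(split_mcdw_id("123012345"))  # toy id, not a real entry
```

If the guess is right, the wmo field should match the WMO numbers printed in the MCDW reports, and the extra field should be 0 for most rows.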

I’ve spot checked some individual GISS records back to MCDW publications and have been able to trace some individual values back.

The MCDW stations appear to be primarily airports, including many international airports. For the rest of the world (ROW), these constitute the lion’s share of information since 1990.

First Order/ASOS
The other new list is sumofday.tbl, which also appears to be a concordance of GHCN numbers with other identifications. There are 371 rows in this concordance, consistent with the number of First Order stations referred to in Hansen et al 1999. These are hourly stations, most of which have been ASOS since the 1990s (see post on the HO-83 thermometer).

Again, about 20 ids in this table do not reconcile with any GISS station.inv numbers. Also, I’ve so far been unable to locate a data inventory containing all the station codes on the other side of the concordance.

AGU has specific policies on data provenance. Had Hansen observed these protocols, the exact digital source of the data would have been specified. That hasn’t happened so far, although they’ve indicated that they are trying to improve their documentation.

17 Comments

  1. Anthony Watts
    Posted Sep 9, 2007 at 11:25 PM

    Steve, MCDW stands for “Monthly Climatic Data for the World” and appears to be part of a regular CD-ROM that is published with that data and made available for purchase. I purchased that CD-ROM a while back; it used to be sold under the name “World Weather Disk” by a private company that took NCDC data and compiled it, but since then NCDC has offered a subscription service for a CD-ROM.

    I found a reference for the data format of the lists on that CD-ROM here:

    http://www.cs.indiana.edu/sudoc/image_30000056083458/30000056083458/adobe/READPUB.PDF

    A station list exists, and I think I can get to it. Give me a bit of time.

    In the meantime, this link (if you haven’t found it already) may shed some valuable light on the variety of datasets in existence that can be acquired on CD-ROM:

    http://www.ncdc.noaa.gov/oa/documentlibrary/surface-doc.html

    Lots of descriptions of datasets worldwide on the link above.

  2. Anthony Watts
    Posted Sep 9, 2007 at 11:37 PM

    OK, I have the station list. Unfortunately, since this is a WMO undertaking, no cross-reference to GISS exists, but one could perhaps be made by matching station names and countries.

    I have it on the surfacestations.org web site at this URL

    http://gallery.surfacestations.org/main.php?g2_view=core.DownloadItem&g2_itemId=27020

    I hope this helps you.

  3. Posted Sep 10, 2007 at 12:08 AM

    While I was doing some digging, I came across this map from NCDC:

    GHCN stations with mean temperature as of the year 1900.

    The USA data clearly makes up the bulk of the last century’s worth of mean temperature data, and a where’s-Waldo search turns up few ROW candidates that span 100 years. More detail is given in this report:

    http://www1.ncdc.noaa.gov/pub/data/documentlibrary/tddoc/td9100.pdf

  4. rafa
    Posted Sep 10, 2007 at 12:51 AM

    Re:2

    For Spain, 20 of the 33 MCDW stations are airports. I still have to check, but I think the remaining 13 are located at army facilities. Best

  5. rafa
    Posted Sep 10, 2007 at 1:02 AM

    Forgot to mention: all the MCDW stations are in the GHCN.

  6. Phil
    Posted Sep 10, 2007 at 2:40 AM

    #2 From:

    ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt

    IV. FORMAT OF “ghcnd-stations.txt”

    ------------------------------
    Variable     Columns   Type
    ------------------------------
    ID            1-11     Character
    LATITUDE     13-20     Real
    LONGITUDE    22-30     Real
    ELEVATION    32-37     Real
    FIPS         39-40     Character
    STATE        42-43     Character
    NAME         45-74     Character
    GSNFLAG      76-78     Character
    HCNFLAG      80-82     Character
    WMOID        84-88     Character
    SOURCEID     90-97     Character
    ------------------------------

    These variables have the following definitions:

    WMOID is the World Meteorological Organization (WMO) number for the station. If the station has no WMO number, then the field is blank.

    SOURCEID is an additional identification number for this station supplied in the original source dataset.
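A fixed-width layout like this can be read by slicing each line at the quoted column positions. The sketch below uses exactly the columns listed above; the layout of the readme may differ between versions, so the positions should be checked against the copy actually in use. The function names and the sample record are hypothetical.

```python
# (name, first column, last column): 1-based and inclusive, as quoted above.
FIELDS = [
    ("ID", 1, 11), ("LATITUDE", 13, 20), ("LONGITUDE", 22, 30),
    ("ELEVATION", 32, 37), ("FIPS", 39, 40), ("STATE", 42, 43),
    ("NAME", 45, 74), ("GSNFLAG", 76, 78), ("HCNFLAG", 80, 82),
    ("WMOID", 84, 88), ("SOURCEID", 90, 97),
]
REAL_FIELDS = {"LATITUDE", "LONGITUDE", "ELEVATION"}


def parse_station_line(line):
    """Parse one station line; blank fields (e.g. no WMOID) become None."""
    rec = {}
    for name, first, last in FIELDS:
        raw = line[first - 1:last].strip()
        if name in REAL_FIELDS:
            rec[name] = float(raw) if raw else None
        else:
            rec[name] = raw or None
    return rec


def format_station_line(rec):
    """Inverse of parse_station_line, useful for building test lines."""
    chars = [" "] * FIELDS[-1][2]
    for name, first, last in FIELDS:
        width = last - first + 1
        value = rec.get(name)
        text = "" if value is None else str(value)
        text = text.rjust(width) if name in REAL_FIELDS else text.ljust(width)
        chars[first - 1:last] = text[:width]
    return "".join(chars)


# Toy record for illustration only.
sample = {"ID": "ZZ000000001", "LATITUDE": 50.5, "LONGITUDE": -13.25,
          "ELEVATION": 300.0, "NAME": "EXAMPLE STATION", "WMOID": "99999"}
line = format_station_line(sample)
print(parse_station_line(line)["NAME"])  # EXAMPLE STATION
```

With the WMOID and SOURCEID fields parsed out, a station list could be cross-referenced by WMO number, as Anthony suggests.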

    Maybe this will help.

  7. Geoff Sherrington
    Posted Sep 10, 2007 at 5:57 AM

    I have been wondering if eyeballing was a technique used to adjust raw data. Is it here?

    “The procedure for duplicate elimination with mean temperature was more complex. The first 10,000 duplicates (out of 30,000+ source time series) were identified using the same methods applied to the maximum and minimum temperature data sets. Unfortunately, because monthly mean temperature has been computed at least 101 different ways (Griffiths 1997), digital comparisons could not be used to identify the remaining duplicates. Indeed, the differences between two different methods of calculating mean temperature at a particular station can be greater than the temperature difference from two neighboring stations. Therefore, an intense scrutiny of associated metadata was conducted. Probable duplicates were assigned the same station number but, unlike the previous cases, not mingled because the actual data were not exactly identical (although they were quite similar). As a result, the GHCN version 2 mean temperature data set contains multiple versions of many stations. For the Tombouctou example, the 6 source time series were merged to create 4 different but similar time series for the same station (see Figure 1).”
    National Climatic Data Center
    DATA DOCUMENTATION FOR DATA SET 9100 (DSI-9100)

  8. EW
    Posted Sep 10, 2007 at 6:14 AM

    I wonder if the Czech station 11438 is in the dataset and if it is combined. If you look at Sinan Unur’s page, there are in fact two different places under the same WMO No.

    The “historical” Schossl village (Strelna in Czech), with data from the 19th century, is located at
    50°39′58.49″N, 13°44′55.59″E

    However, the present 11438 is situated at the coal power plant Tusimice, which is here:
    50°23′9.57″N, 13°20′4.56″E
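To put a number on how far apart the two sites sharing WMO No. 11438 are, the coordinates above can be converted to decimal degrees and fed to the haversine great-circle formula. This is a sketch assuming a spherical Earth of radius 6371 km, which is more than accurate enough at this scale.

```python
import math


def dms_to_deg(d, m, s):
    """Convert degrees, minutes, seconds to decimal degrees."""
    return d + m / 60 + s / 3600


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points on a sphere, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


strelna = (dms_to_deg(50, 39, 58.49), dms_to_deg(13, 44, 55.59))
tusimice = (dms_to_deg(50, 23, 9.57), dms_to_deg(13, 20, 4.56))
print(round(haversine_km(*strelna, *tusimice)))  # 43
```

So the two locations are roughly 43 km apart, too far to treat as the same site without an adjustment.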

  9. EW
    Posted Sep 10, 2007 at 6:15 AM

    Forgot the link.

  10. Bob Koss
    Posted Sep 10, 2007 at 8:08 AM

    Here is a list of IDs for Global Summary of the Day stations.
    ftp://ftp.ncdc.noaa.gov/pub/data/gsod/ish-history.txt

    It cross-references Air Force DatSav3 station numbers with WBAN station numbers, if one exists.

    Here is a list of the two letter country codes that goes with it.
    ftp://ftp.ncdc.noaa.gov/pub/data/gsod/country-list.txt

  11. Bob Koss
    Posted Sep 10, 2007 at 8:11 AM

    Here is that same GSOD file in xls format ftp://ftp.ncdc.noaa.gov/pub/data/gsod/ish-history.xls

  12. Bob Koss
    Posted Sep 10, 2007 at 8:16 AM

    I believe the DatSav3 and WMO numbers are the same.

  13. Bob Koss
    Posted Sep 11, 2007 at 2:39 AM

    Steve,
    The SumofDay.tbl file left-hand column contains Weather Bureau, Army, Navy (WBAN) ID numbers. An XLS file can be found halfway down this page: http://mi3.ncdc.noaa.gov/mi3_reports
    I haven’t located any other format.

    I found files listing all the IDs in the middle column that don’t match GHCN v2.temperature.inv.txt or GISS station_list.txt. I came up with 40 of them, listed below by the file in which they were found. Many of them seem to match other locations, but the IDs are slightly different. I haven’t located any data files for them, though.

    http://www1.ncdc.noaa.gov/pub/data/climvis/ghcn/ATL-OCEAN/TEMP.ATL-OCEAN.INV

    13601 42278016000 1
    11630 42178535003 0
    11641 42178526000 1

    http://www1.ncdc.noaa.gov/pub/data/climvis/ghcn/IND-OCEAN/TEMP.IND-OCEAN.INV

    70701 14661967000 1

    http://www1.ncdc.noaa.gov/pub/data/climvis/ghcn/PAC-OCEAN/TEMP.PAC-OCEAN.INV

    40309 51691408000 1
    40504 51691348000 1
    40505 51691334000 0
    40710 51691376000 0
    40604 51691366000 1
    41406 51691212000 1
    22701 51691066000 1
    41606 51691245000 0
    41415 51691217000 0
    21504 51991285000 0
    22514 51991178002 0
    22519 51991176001 0
    22516 51991190000 0
    22521 51991182000 0
    61705 51691765000 0
    40308 51691413000 1
    22536 51991165000 0
    22501 51991162001 0

    http://www1.ncdc.noaa.gov/pub/data/climvis/ghcn/US/TEMP.FLORIDA.INV

    13889 42572206004 3

    http://www1.ncdc.noaa.gov/pub/data/climvis/ghcn/v2.comp.srt

    94850 42572743006 2
    13739 42572408009 1
    93721 42572406004 0
    14895 42572521007 0
    14821 42572428002 0
    12960 42572243004 0
    24233 42572793005 0
    13891 42572334009 1
    93134 42572297005 3
    14745 42572605003 0
    14923 42572544007 0
    14847 42572734003 0
    14840 42572636002 0
    94846 42572530003 0
    14852 42572525003 0
    94847 42572537005 1
    14848 42572535004 0

  14. Bob Koss
    Posted Sep 11, 2007 at 3:31 AM

    I found a comprehensive text file of WBAN stations.
    ftp://ftp.ncdc.noaa.gov/pub/data/inventories/WBAN.TXT

    Here is the file to explain the format.
    ftp://ftp.ncdc.noaa.gov/pub/data/inventories/WBAN-FMT.TXT

  15. Steve McIntyre
    Posted Sep 11, 2007 at 7:38 AM

    #13, I located many of the same places where one could track down numbers, but that doesn’t explain (1) why these id numbers materialize in Hansen’s program; (2) how Hansen’s program recognizes them; (3) if they collected the IDs by manually trawling through the literature, that should be documented in a manual; (4) why these stations are outside the population.

  16. Bob Koss
    Posted Sep 11, 2007 at 1:43 PM

    Sorry, Steve, I have no answers to your questions. Maybe they are just left over from something they thought they might use. I did find out where the MCDW IDs come from, so I might as well put it up.

    MCDW.tbl file IDs (left column) are a combination of the CountryCode and IndexNbr found in this tab-delimited file:
    ftp://ftp.wmo.int/wmo-ddbs/Pub9volA070907.flatfile

    Here is the file layout.

    http://www.wmo.int/pages/prog/www/ois/volume-a/9ALayoutGuide9805.html

    Here is the Code Table used.

    http://www.wmo.int/pages/prog/www/ois/volume-a/9ACodeTables9805.html

  17. Steve McIntyre
    Posted Sep 11, 2007 at 2:08 PM

    #16. Good spotting, as usual.
