USHCN "Raw" – A Small Puzzle

During the past few days, I’ve been assessing the GHCN-Daily dataset, which is very large, and I plan to do a number of posts on this topic, including a description of the data set. It turns out that literally hundreds of stations that expire around 1989 or 1990 in the NASA data set are alive and thriving in the GHCN-Daily parallel universe. More on this over the next few days.

Before I get to this, I’d like to document a small puzzle in connection with the calculation of the USHCN “raw” monthly average that arose out of inspection of the GHCN-D data, a puzzle that takes us back to the Detroit Lakes MN station, the investigation of which led to the identification of Hansen’s Y2K error.

I’m not saying that these small puzzles necessarily or even probably “matter” in terms of world averages, but they are relevant in terms of craftsmanship, and I presume the following: if one is going to the trouble of making these large temperature collations, then the craftsmanship should be as good as possible. By commenting on issues pertaining to craftsmanship, I am not imputing malfeasance, as some perpetually agitated commenters allege. However, as far as I can tell, no one – journal peer reviewer, NASA peer reviewer (if such existed), rival climate scientist, skeptic – ever seems to have gone to the trouble of parsing through the actual craftsmanship of the large temperature calculations, and I see no harm and some benefit in doing so. It looks like NASA is paying some attention and has already implemented a couple of recommendations made at CA. (I’ll have some similar suggestions in the near future.)

The GHCN-D dataset contains daily max and min information for nearly all the USHCN stations, as well as thousands of ROW (rest-of-world) stations. (I’ll discuss some discrepancies in the USHCN station lists on another occasion.) The GHCN-D data set is available in a huge zipped file, but is also available on a station-by-station basis. Most of the identification codes are different from those in GHCN-M (and thus GISS), but I’ve managed to create a concordance of over 3300 station identifications, and I do not preclude the possibility of further gains. I’ve created time series of monthly means for these 3300 or so GHCN-D series. As a first cut, I simply took a monthly average of available values, without requiring a minimum number of values to constitute an average (which I usually do and would probably do if I re-ran the results). I then calculated the monthly mean as the average of the mean monthly minimum and mean monthly maximum; in some cases, the two would be based on different numbers of measurements.
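
For readers who want to try the first-cut calculation themselves, here is a minimal Python sketch of it. It is not the code used for this post. It assumes the daily records have already been parsed into (year, month, element, value) tuples, with missing values given as None. The function name `monthly_means` and the `min_days` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def monthly_means(daily, min_days=1):
    """Monthly mean = (mean of daily TMAX + mean of daily TMIN) / 2.

    daily    : iterable of (year, month, element, value) tuples, where
               element is "TMAX" or "TMIN" and value is None if missing
               (an assumed, already-parsed layout, not the raw file format).
    min_days : minimum number of available values required for each
               element; 1 reproduces the "no minimum" first cut.
    """
    buckets = defaultdict(lambda: {"TMAX": [], "TMIN": []})
    for year, month, element, value in daily:
        if value is not None and element in ("TMAX", "TMIN"):
            buckets[(year, month)][element].append(value)

    result = {}
    for key, obs in sorted(buckets.items()):
        # TMAX and TMIN may have different numbers of available days.
        if len(obs["TMAX"]) >= min_days and len(obs["TMIN"]) >= min_days:
            result[key] = (mean(obs["TMAX"]) + mean(obs["TMIN"])) / 2
    return result
```

Raising `min_days` (say to 20) gives the stricter version mentioned above, in which months with too few daily values are dropped rather than averaged.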

The figure below shows the difference between the USHCN “raw” monthly mean and my calculation from daily information for a station (Kalispell MT) with an excellent match. There are rounding differences, but the two versions clearly reflect the same provenance. In this case, I presume that the small spike differences result from some procedural difference in calculation of monthly averages. While the differences appear attributable to rounding, the differences are not truly random: there are far more +0.1 differences than -0.1 differences, but this is unrelated to time.

ghcnd6.gif
Figure 1. USHCN Raw monthly (NOAA) minus monthly average calculated from GHCN-D. (I was at Kalispell airport once in the late 1970s and had an amusing experience there.)
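
The asymmetry between +0.1 and -0.1 differences can be checked with a simple tally. The following is a minimal Python sketch, not the code used for the figure. It assumes both series are held as dictionaries keyed by (year, month) and are in the same units; the function name `tally_differences` is hypothetical.

```python
from collections import Counter

def tally_differences(ushcn_raw, from_daily, step=0.1):
    """Count differences (ushcn_raw minus from_daily) in units of `step`.

    Both arguments are dicts keyed by (year, month); only months present
    in both are compared. A key of 1 in the result means +0.1, -1 means
    -0.1, 0 means no difference at this resolution.
    """
    counts = Counter()
    for key in ushcn_raw.keys() & from_daily.keys():
        diff = ushcn_raw[key] - from_daily[key]
        counts[round(diff / step)] += 1
    return counts
```

Comparing `counts[1]` with `counts[-1]` gives the imbalance directly, and repeating the tally by decade would show whether it changes over time.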

Next, here is the same plot for Detroit Lakes MN, a station which had a puzzling jump around 2000 in the original NASA version (a jump that could be attributed in part to the Y2K error). This particular error has now been patched by NASA. In this case, the tracking looks very similar to the Kalispell tracking from 1950 to about 1980. But in the late 1990s and 2000s, the USHCN Raw version (and thus the downstream versions) jumps up relative to the average calculated from GHCN-D daily information. Why is this?

ghcnd5.gif
Figure 2. As Figure 1, but for Detroit Lakes MN.

I parsed through about 40 such plots, most of which were in between Kalispell and Detroit Lakes in appearance. But there were a couple of oddballs: here’s one. It looks like the USHCN Raw version must be spliced from two different GHCN-D stations, with values after 1980 or so from the present station and earlier values from some other station.

ghcnd8.gif

Here’s a station (Dillon MT) which has a somewhat similar appearance of being spliced – only this time, it looks like the USHCN station is drawn from the GHCN-D data set prior to 1980 and perhaps some other related source after 1980.

ghcnd7.gif
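
A splice of either kind should appear as a level shift in the difference series. A crude way to locate it, sketched below in Python, is to try each candidate year and keep the one where the mean difference changes most. This is an illustrative assumption, not a method used for the plots above; `best_break` and `min_len` are hypothetical names, and a proper change-point test would be needed before drawing conclusions.

```python
from statistics import mean

def best_break(diffs, min_len=5):
    """Find the year at which the mean difference shifts the most.

    diffs   : list of (year, difference) pairs sorted by year.
    min_len : minimum number of points required on each side of the break.
    Returns (first_year_after_break, shift), or None if the series is
    too short to split.
    """
    best = None
    for i in range(min_len, len(diffs) - min_len + 1):
        before = mean(d for _, d in diffs[:i])
        after = mean(d for _, d in diffs[i:])
        shift = after - before
        if best is None or abs(shift) > abs(best[1]):
            best = (diffs[i][0], shift)
    return best
```

A station like Kalispell should return only a very small shift, while a spliced record should return a large one near the splice date.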

The puzzle that needs to be resolved is the exact relationship between the USHCN “Raw” and GHCN-D data. If this can be sorted out, then NASA could make a substantial gain in the timeliness of their reporting.

GHCN-D versions of USHCN stations are current through early March 2008. Right now NASA’s USHCN data is only current to March 2006 – the date of the most recent GHCN update. Following a CA suggestion, NASA is moving to make its USHCN stations more current by adopting the USHCN (NOAA) source, which is more current than the versions at GHCN-M or CDIAC.

However, the GHCN-D data is truly current. NASA already uses “raw” USHCN data for its current results, using a patch to splice each station to the FILNET version used for historic values. If monthly averages calculated from GHCN-D data were used instead of GHCN-M data, then NASA could report USHCN stations right through to February 2008 (and keep current) instead of the current system of being up to two years out of date for USHCN stations. (A better system would be for NASA to write to NOAA and ask them to update the USHCN data set on a monthly basis, which should be trivial to program and would dispense with the patch altogether.)

Gaining two years in report timeliness for USHCN stations is a small thing but worth doing. In some forthcoming posts, I’ll discuss how NASA can gain nearly 20 years in reporting timeliness for many international stations.

51 Comments

  1. MarkW
    Posted Mar 5, 2008 at 9:58 AM | Permalink

    The fact that an outsider has so little trouble finding so many problems may or may not indicate malfeasance. It does however strongly hint at incompetence.

    BTW, malfeasance does not necessarily imply evil intent, just a gross disregard of the standards of your profession.

  2. steven mosher
    Posted Mar 5, 2008 at 10:05 AM | Permalink

    Keeping current.

    In the past NASA has always posted a monthly update, like UAH and RSS and HADCRU. Is NASA going to avoid the monthly update? Shrugs. I hope not. I was counting on following this cold year month by month.

  3. LadyGray
    Posted Mar 5, 2008 at 10:10 AM | Permalink

    Climate Science is at much the same state as electronics were about a hundred years ago. Eventually there will probably be an organization similar to the IEEE (which was started by a couple of engineers in their basement) to establish what standards should be applied. We need a group of various scientists, held together by a few business managers, to start up an organization of volunteers to establish what the standards are, how they should be applied, and recommendations for assessing data. Their findings would probably be published on the web, and be downloadable for offline reference. They would not be interested in the data itself, but be more interested in methods of collection, archiving, and data analysis. Interpretation of the data would be left to others. This goes beyond simply auditing what occurs, though that is a valuable function of determining where standards should be applied.

  4. steven mosher
    Posted Mar 5, 2008 at 10:32 AM | Permalink

    A long while ago JerryB and I had a discussion about the rounding rules of daily and monthly. It’s OT here, but I recall noting that the rounding rules would appear to bias things upwards. The trend most likely is not going to be impacted by this, since it’s a constant practice over time. But I have noted in the past that I get minor differences between daily and monthly.

  5. Steve McIntyre
    Posted Mar 5, 2008 at 11:01 AM | Permalink

    #4. Noted, but I don’t want people to start discussing rounding issues.

  6. MarkW
    Posted Mar 5, 2008 at 11:15 AM | Permalink

    LadyGray,

    Any standards should be those that science in general should follow: keeping your data and methods open, etc.

    mosh,

    round and round we go.

    steve,

    you didn’t mean that kind of rounding, did you?

  7. Posted Mar 5, 2008 at 11:20 AM | Permalink

    Steve writes,

    It looks like NASA is paying some attention and has already implemented a couple of recommendations made at CA.

    IMHO, it constitutes professional plagiarism for NASA to implement Steve’s suggestions without some form of recognition. This doesn’t even have to be on the NASA webpage, but it wouldn’t kill Gavin or Reto to post a comment on CA to the effect that even though they usually disagree with most of what Steve says, he had a good point with regard to such-and-such, that procedures have been / will be revised accordingly, and thanks for the suggestion(s).

  8. Posted Mar 5, 2008 at 11:39 AM | Permalink

    Steve writes,

    It turns out that literally hundreds of stations that expire around 1989 or 1990 in the NASA data set are alive and thriving in the GHCN-Daily parallel universe.

    Even more curious are the “zombie” stations that are dead and buried, yet still continue to crank out adjusted data.

    A case in point is Delaware OH, which hasn’t had a daily observation since 1/01 and was officially closed in 5/03. However, this doesn’t stop CDIAC from continuing to provide annual average readings with “final” adjustments through at least 2005!

    See my obit, with graphs, on Surfacestations.org, at http://gallery.surfacestations.org/main.php?g2_itemId=5278. Does NASA use any of these living-dead zombie stations?

  9. EW
    Posted Mar 5, 2008 at 11:52 AM | Permalink

    #8

    How come? Where does CDIAC get the data?

  10. Posted Mar 5, 2008 at 12:04 PM | Permalink

    EW (#9) asks where CDIAC gets its data.

    According to the useful USHCN adjustment documentation at
    http://cdiac.ornl.gov/epubs/ndp/ushcn/ndp019.html, USHCN is developed and maintained by NCDC and CDIAC in cooperation with each other, so that CDIAC evidently has as much claim to be the official USHCN outlet as NCDC. Their data is available at http://cdiac.ornl.gov/epubs/ndp/ushcn/usa_monthly.html .

  11. John Lang
    Posted Mar 5, 2008 at 12:29 PM | Permalink

    This is always a good chart to keep in mind when talking about USHCN data.

  12. Jeff C.
    Posted Mar 5, 2008 at 12:47 PM | Permalink

    Being from Orange County I’ve investigated the Newport Harbor record before and stumbled on the discrepancy you note. According to the NCDC USHCN station history file, data from “Newport Harbor” prior to 1981 is actually from Avalon on Catalina Island. This seems very strange as the Newport Beach station had data available back to at least the 1930s.

    I don’t see how you can splice together data from Avalon with Newport Beach.

    1)Newport Beach and Avalon are at least 25 miles apart and separated by the ocean
    2)Newport Beach is on the South shore of mainland Orange County, Avalon is on the North shore of Catalina Island
    3)Newport Beach is on flatland with no significant hills or mountains for at least 10 miles, Avalon is in a sheltered cove at the base of a string of peaks reaching up to 2000 feet.

    I’d sure like to know why Avalon was used prior to 1980 when data from the Newport Beach site was available.

  13. Jeff C.
    Posted Mar 5, 2008 at 1:00 PM | Permalink

    Follow up to previous comment-

    USHCN Newport Harbor
    1909-1981 NWS Coop Station Avalon/Avalon Pleasure Pier, station number 0395
    1981-present NWS Coop Station Newport Beach Harbor, station number 6175

  14. Bernie
    Posted Mar 5, 2008 at 1:04 PM | Permalink

    That is one nasty chart. Do you have anything on its provenance?

  15. Bernie
    Posted Mar 5, 2008 at 1:11 PM | Permalink

    Oops
    #11 John Lang
    That is one nasty chart. Do you have anything on its provenance?

  16. EW
    Posted Mar 5, 2008 at 2:00 PM | Permalink

    Hu,

    actually my question was more about the source of data (measurements) from the stations that are effectively dead. 😉 Some mystical currents from the defunct line to the formerly active MMTS? Teleconnection? New station under the same codename?

  17. Don Keiller
    Posted Mar 5, 2008 at 2:11 PM | Permalink

    Steve “the differences are not truly random: there are far more +0.1 differences than -0.1 differences”
    Don’t you think that this is interesting, bearing in mind that the odd +0.1C here and there could make a significant difference to the +0.7 or so “increase” over the last century?

  18. JerryB
    Posted Mar 5, 2008 at 2:12 PM | Permalink

    Bernie,

    See: http://www.ncdc.noaa.gov/oa/climate/research/ushcn/ushcn.html

  19. Barclay E. MacDonald
    Posted Mar 5, 2008 at 2:13 PM | Permalink

    A contribution to making NASA’s job easier, its data collection more efficient, or to correcting analytical errors is a contribution to all of us.

  20. Demesure
    Posted Mar 5, 2008 at 2:42 PM | Permalink

    Maybe a mere coincidence but in the Newport Beach graph, it’s like they try to cool the 1900-1945 warming and warm the 1945-1977 cooling. A sort of hockey-sticking of modern temperatures.
    Steve, is it possible to have statistics for those discrepancies (% of uniform, upward, downward differences)?

  21. Anthony Watts
    Posted Mar 5, 2008 at 3:22 PM | Permalink

    I’ll point out that Newport Beach is odd in many ways, not the least of which is non-standard equipment, plus non-standard observing height and roof placement.

    It is viewable here:

    http://gallery.surfacestations.org/main.php?g2_itemId=670

  22. Joe Black
    Posted Mar 5, 2008 at 3:25 PM | Permalink

    JerryB says:
    March 5th, 2008 at 2:12 pm

    Bernie,

    See: http://www.ncdc.noaa.gov/oa/climate/research/ushcn/ushcn.html

    This page seems to indicate that an urban adjustment is made to the USHCN data. Would any GISS urban adjustments be in addition to this? When did NOAA start this adjustment?

  23. Kenneth Fritsch
    Posted Mar 5, 2008 at 4:22 PM | Permalink

    Re: http://www.climateaudit.org/?p=2830

    This page seems to indicate that an urban adjustment is made to the USHCN data. Would any GISS urban adjustments be in addition to this? When did NOAA start this adjustment?

    It has been my impression that GISS uses the USHCN FILNET version (before the urban adjustment) and does its own urban adjustment using the Hansen satellite lighted index. Look for the Karl references at the USHCN site to determine when and how USHCN makes their urban adjustments.

  24. Gary
    Posted Mar 5, 2008 at 9:15 PM | Permalink

    Frankly, I’m confused by all the source-agency versions and need a picture. Can anyone draw a simple flowchart showing who takes what from whom and the modifications made between nodes? TIA.

  25. Jeff C.
    Posted Mar 5, 2008 at 9:52 PM | Permalink

    One last point about the Newport Beach Harbor record, the Avalon/Newport Beach splice makes no sense from a UHI perspective.

    Due to Avalon’s island location, the only population in a 25 mile radius is that on the island – about 3600 people. Newport Beach on the other hand, has most of Orange County and a significant portion of Los Angeles County within 25 miles – on the order of 3 million people.

    So pre-1981 data is rural and post-1981 data is about as urban as it can get, despite the fact the Newport Beach station had continuous data back to 1921. No worries, I’m sure it’s been adjusted out.

  26. Brooks Hurd
    Posted Mar 6, 2008 at 5:14 AM | Permalink

    RE: 7

    Hu,

    I advise you not to hold your breath waiting for Gavin or others to acknowledge CA’s help to NASA.

  27. Posted Mar 6, 2008 at 8:11 AM | Permalink

    Re #8, to NASA’s credit, I have found, on digging deeper, that although the defunct station at Delaware OH is in GHCN (#42500332119) and GISS (#42572428004), GISS (unlike CDIAC) reports no monthly data for it after it stopped reporting altogether in 1/01.

    There is still a problem that it is missing a lot of data after 1996, but at least there was some real data during that period.

  28. novoburgo
    Posted Mar 6, 2008 at 10:21 AM | Permalink

    Re: Newport Beach. This is a completely bogus placement for the temperature sensor. It’s just off the SE corner of a flat roof and no more than 3-4 above the thermals rising from that roof. A prevailing westerly wind (coming from the bay) has to traverse the width of the roof before reaching the sensor. An idiot couldn’t have done a worse job (or could it have been a genius with an agenda?)!

  29. Bob Koss
    Posted Mar 6, 2008 at 2:22 PM | Permalink

    Hu,

    RE 27
    According to this link station 332119 is still reporting. The inventory and history files were last modified in 2005. The data files have data through 2007. Could this be simply another case of GHCN and GISS inexplicably dropping the station in 2000? Is there some other reference to the station indicating it being closed?

  30. steven mosher
    Posted Mar 6, 2008 at 7:31 PM | Permalink

    Wheres Waldo

  31. Paul Penrose
    Posted Mar 6, 2008 at 11:21 PM | Permalink

    Bernie,
    That chart is coming directly from the NOAA website, so I assume that NOAA created it.

  32. Geoff Sherrington
    Posted Mar 7, 2008 at 12:06 AM | Permalink

    # 30 steven mosher – superb youtube.

  33. John A
    Posted Mar 7, 2008 at 3:08 AM | Permalink

    Steve:

    By commenting on issues pertaining to craftsmanship, I am not imputing malfeasance, as some perpetually agitated commenters allege.

    I don’t impute malfeasance to issues of craftsmanship in climate science when a perfectly clear case of incompetence coupled with overweening arrogance will produce the same result. It’s clear to me that one of the key historical purposes of climate science, as the final port of academia for people not bright enough for the hard sciences yet apparently overqualified for a career in the private sector, has not actually changed over time.

    Lest you think that this is simply my own biased opinion (whose isn’t?) I watched Richard Lindzen make the same point in a television interview.

  34. Posted Mar 7, 2008 at 8:06 AM | Permalink

    Bob Koss (#29) writes,

    RE 27
    According to this link station 332119 is still reporting. The inventory and history files were last modified in 2005. The data files have data through 2007. Could this be simply another case of GHCN and GISS inexplicably dropping the station in 2000? Is there some other reference to the station indicating it being closed?

    According to MMS, it has been inactive since 1/01 and was officially closed 6/03. CDIAC’s USHCN monthly and daily data page at http://cdiac.ornl.gov/epubs/ndp/ushcn/newushcn.html has good daily data through 1996, then spotty data in 1997, getting worse in 1999 and 2000, finally ending 1/30/01. But this doesn’t stop it from having “monthly data” through 2005, when CDIAC’s coverage generally stops.

  35. John Goetz
    Posted Mar 7, 2008 at 8:52 PM | Permalink

    Steve, I don’t think I understand exactly which two datasets you are comparing for Kalispell.

    One is (I believe) http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/hcn/42500244558.dly.

    The other I thought was the raw GHCN plus USHCN corrected data available from GISS. However, when I create an average from the daily version and compare it to the GISS “raw” version, I do not get nearly the correlation you get. My calculated averages for the earlier years are higher – sometimes 0.5 C or more – than the GISS version. Thus, I think I am looking at a second dataset that is different from the one you are looking at.

    Can you post a link to both Kalispell datasets that you are comparing?

  36. John Goetz
    Posted Mar 7, 2008 at 9:07 PM | Permalink

    OK, I think you must be comparing with the monthly data buried in http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z.

    I get mostly excellent matches. Where I seem to have discrepancies are the months with one or more missing daily values. That leads me to believe that NOAA is estimating the missing values before calculating an average for the month.

    Steve: Raw (“areal”) in ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/hcn_doe_mean_data.Z

  37. Pierre Gosselin
    Posted Mar 8, 2008 at 4:58 AM | Permalink

    A little off the subject – sorry.
    In Hamburg, Germany a climate congress is taking place: Extreme Weather Congress.
    http://www.extremwetterkongress.de/en/beitrag.html
    Perhaps some readers here could send some climate data to cheer up these poor panicked folks.
    Send your abstracts and words of hope to:
    boettcher@extremwetterkongress.de

  38. John Goetz
    Posted Mar 8, 2008 at 4:13 PM | Permalink

    Steve,

    Do you have a pointer to a description of the file format? I don’t understand what each of the four rows per station per year means, and there are codes next to each monthly entry that I cannot find a definition for. I’ve poked through a number of files on the ftp site and cannot find an appropriate guide.

  39. John Goetz
    Posted Mar 8, 2008 at 4:22 PM | Permalink

    It seems we need a flow chart showing the sources of the temperature data and how it is averaged, adjusted, biased, combined, and gridded on its way to a global average.

    For example, do we know that the data in http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/hcn/ finds itself in ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/hcn_doe_mean_data.Z as the next step on its way to GISS, or does it end up in http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z as the next step? And in moving from one dataset to the other, what is being done to the data?

    I’m happy to help build that flow chart, and in particular I am happy to try to duplicate the process that turns data in one file to data in another file. Actually, this is why I am making this comment. I want to duplicate the process of turning daily into monthly prior to GISS using the data, but I don’t know which monthly to use.

  40. aurbo
    Posted Mar 9, 2008 at 12:28 AM | Permalink

    Re #39:

    John,

    The four strings of data for each station (q1 … q4) in the USHCNv1 data set contain the following versions:

    1. Raw: the data in this version have been through all quality control but have no data adjustments.
    2. TOB: these data have also been subjected to the time-of-observation bias adjustment.
    3. Adjusted: these data have been adjusted for the time-of-observation bias, MMTS bias, and station moves, etc.
    4. Urban: these data have all adjustments including the urban heat (UHI) adjustments.

    The q4 (UHI) adjustment for example contains the absurd manipulations to the NYC Central Park station referred to in an earlier CA posting and on other sites. It is yet to be explained.

    The NCDC brief descriptions to the various USHCNv1 parameters can be found by following the links from the parent USHCNv1 site. Click on the various explanatory links to the left of the US map.

    For USHCNv2, there is a hot link to USHCNv2 near the top of the page. In this version, q4, the UHI adjustment, has been eliminated. The explanation for this is that it is no longer necessary as urban effects are taken care of with a “change-point detection algorithm”. NCDC cites this procedure as a reason for not needing an urban adjustment because, as they state, “no specific urban correction is applied in HCN version 2 because the change-point detection algorithm effectively accounts for any ‘local’ trend at any individual station. In other words, the impact of urbanization and other changes in land use is likely small in HCN version 2.”

    How this method accounts for UHI effects is beyond my comprehension as change-point analysis is basically a statistical procedure designed to locate the most likely inflection points in a continuous stream of data. Nevertheless, the NCDC descriptions proceed to describe this procedure by citing several papers where presumably these problems are addressed. To my inadequate brain it appears to be more bafflegab than sound analysis.

    Finally, those wishing to download the latest USHCN version, try this USHCNv2 link.

  41. JerryB
    Posted Mar 9, 2008 at 4:07 AM | Permalink

    John Goetz,

    The USHCN V1 monthly file formats are described in
    ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/README.TXT
    which also mentions sources of USHCN data.

    The “GHCN Daily” file collection is of recent vintage, having first been
    published in December 2006. Also, the temperature data that it contains
    are TMAX/TMIN. It is too young to be the source for either GHCN V2, or
    USHCN V1, monthly data, and could not be the source for stations for which
    TMAX/TMIN data are not reported.

    See the paper:

    Peterson, T.C., and R.S. Vose, 1997: An overview of the Global
    Historical Climatology Network temperature database. Bulletin of the
    American Meteorological Society, 78 (12), 2837-2849. (PDF Version)

    linked at:
    http://www.ncdc.noaa.gov/oa/climate/ghcn-monthly/index.php?name=temperature
    for information about GHCN V2 sources.

  42. JerryB
    Posted Mar 9, 2008 at 4:50 AM | Permalink

    John Goetz,

    A revision, and a bit more:

    “…, and could not be the source for stations for which TMAX/TMIN data
    are not reported.”

    should have been:

    “…, and could not be the source for stations for which the “official”
    TMEAN is based on something other than TMAX and TMIN.”

    Regarding USHCN data getting to GHCN, see the update to
    http://www.climateaudit.org/?p=2711

    Various parts of the US NCDC route data to various collections such as
    GHCN and USHCN. The USHCN data go through USHCN processing before they
    become “official” and go to GHCN, or get published in the USHCN
    directory, on the NCDC FTP server. Then GISS, or whoever wishes to do
    so, e.g. CDIAC, can pick them up.

  43. steven mosher
    Posted Mar 9, 2008 at 9:57 AM | Permalink

    re 42. JerryB, did you know about this list of 138 Prime USHCN stations?

    ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/daily/README

    It would appear that in 1992 (H92) they selected the best 138 stations, and then later expanded that to 1062, changing the quality guidelines to gain spatial coverage.

    “The selection of stations for inclusion
    in H92 was performed with the following data quality issues in mind.

    1. The degree to which each station maintained a constant
    observation time for maximum and minimum temperatures,
    excursions from a station’s predominant observing time of no
    more than four years being desired.

    2. At least 95% of a station’s pre-1951 data should be contained
    in NCDC digital daily archives.

    3. A station’s potential for heat island bias over time should be
    low.

    4. Quality assessments based upon the decile ranking assigned by
    Karl et al. (1990) to the stations’ monthly maximum/minimum
    temperature data for certain quality characteristics.

    Since the release of H92, much more work has been conducted at NCDC
    involving compilation and digitizing of daily data. However, to enable
    the compilation of a database providing better spatial coverage of the
    contiguous United States, the four station selection criteria listed
    above were not strictly adhered to in the current version of the HCN/D
    presented here.”

  44. JerryB
    Posted Mar 9, 2008 at 11:39 AM | Permalink

    steven,

    I have been aware of the 138 station collection (NDP042), and
    have saved a copy of it, but have not done anything else with it.
    I have tinkered a bit with the 1062 station collection (NDP070).

    It was in NDP070 that I first noticed my favorite “outlier”:
    128 F TMAX in McConnellsville Lock, Ohio, on January 2, 1900.

  45. steven mosher
    Posted Mar 9, 2008 at 11:56 AM | Permalink

    Somehow I knew you knew about this.

    Moving from 138 stations to 1062 by “changing a few” quality rules seems to me to be a step change of sorts?

    One criterion that interested me was the attempt to select stations that had minimal TOBS adjustments (not in magnitude but in number). I think it would be interesting from a purely theoretical standpoint to look at what trends and errors we get in the US record if we select homogeneous stations at the start, rather than trying to homogenize bad stations to good ones and pretending that we gain “area coverage” thereby. (I think willis and I have beat this dead horse and are ready to mummify it.)

  46. John Goetz
    Posted Mar 9, 2008 at 11:58 AM | Permalink

    aurbo and JerryB

    Thank you for pointing me to exactly what I was looking for.

  47. John Goetz
    Posted Mar 9, 2008 at 12:19 PM | Permalink

    #40 aurbo

    When I look at the records for Kalispell, the “4. Urban: these data have all adjustments including the urban heat (UHI) adjustments.” records all have a small magnitude. Does this represent the amount of adjustment? It clearly does not represent the adjusted temperature.

  48. aurbo
    Posted Mar 9, 2008 at 12:49 PM | Permalink

    Re #44:

    Jerry

    The 128°F “outlier” from McConnellsville Lock OH back on Jan 2, 1900 would never have been a problem for contemporary weather plotters and analysts back then. They would have simply noted that the three nearest first order stations…Columbus OH, Parkersburg WV and Cincinnati OH all reported max temperatures of 28°F for that date and you can guess what they plotted for McConnellsville.

    This sort of illustrates the value of what was lost when computers programmed without AI oversight replaced thinking humans. Of course computers can do a lot more and much more quickly than people, but the point is that before the computer age, with much less to do and more time to do it, weather observers on the whole were more precise and fussier about their observations then than they are today when the dependence is on machines and electronics. The McConnellsville record was clearly a typo rather than a faulty observation. Where and how it entered into the NCDC data base is a matter of speculation, but you can bet that no observer ever entered that number onto the original record.

  49. JerryB
    Posted Mar 9, 2008 at 1:20 PM | Permalink

    Re #47,

    John,

    If you are looking in hcn_doe_mean_data.Z , the fourth line
    labeled 3C is a “confidence factor”, and is not related to
    the urban adjustment. The USHCN urban adjusted mean temps
    are in a different file: urban_mean_fahr.Z .

  50. John Goetz
    Posted Mar 9, 2008 at 1:52 PM | Permalink

    Steve McIntyre,

    In your Kalispell plot, do both datasets you used to compare begin in 1896 or 1899? I ask because the monthly data I find begins in 1896, but the daily data found in http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/42500244558.dly begins in 1899. Right now I am assuming that you are ignoring the years 1896 through 1898, but it is hard to tell from the plot.

  51. John Goetz
    Posted Mar 13, 2008 at 4:12 PM | Permalink

    Steve McIntyre,

    In your Kalispell plot, do both datasets you used to compare begin in 1896 or 1899? I ask because the monthly data I find begins in 1896, but the daily data found in http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/42500244558.dly begins in 1899. Right now I am assuming that you are ignoring the years 1896 through 1898, but it is hard to tell from the plot.