Crossword Puzzle #4: Re-Visiting Y2K

In my opinion, one of the main purposes of compiling the Hansen code is to produce intermediate versions so that we can determine what Hansen actually did. Today I’m going to compare the Step 0 version of Detroit Lakes MN (familiar to many readers from the Y2K error), as produced by running Hansen’s code on a current USHCN data set, with the corresponding version presently online at NASA GISS. They are materially different. In trying to figure this out, I’ve waded through the relevant Step 0 code, in which Hansen splices USHCN data into GHCN, and spent a little time pondering how this code, prior to the recent patch, could yield the observed Y2K error. (I might add that, although Hansen thanked me for observing that they needed a patch to correct the error, it’s more precise to say that I observed that the error resulted from using two different versions. My own suggestion would have been to use a consistent data version, rather than to add another adjustment.)

In some earlier speculation, I wondered whether Hansen might have been using a vintage version of the USHCN data. After looking through the matter some more, I’m 99% certain that Hansen used (and continues to use) a vintage USHCN version ending in 1999 – even though the online USHCN version is current up to late 2006 (more current than the corresponding GISS data). I’ll summarize the evidence for Hansen’s continued use of a vintage USHCN version and, along the way, improve the description of Hansen’s Step 0, as part of enabling this step to be implemented in a more current software environment.

First, the graphic below compares the Step 0 version of Detroit Lakes, as emulated by reader Not Sure, with the dset=0 version currently online at NASA GISS. The two are similar in recent years but do not match in earlier years. How come?

[Figure (patch2.gif): Detroit Lakes MN, Step 0 emulation vs. dset=0 version online at NASA GISS]

In the Hansen Code thread, there has been some discussion of the provenance of the USHCN noFIL version; investigating the above question resolves it.

The script get_USHCN takes as input the file hcn_doe_mean_data and produces a file hcn_doe_mean_data_fil using the command below. The name hcn_doe_mean_data is recognizable from USHCN directories; current versions are available at NOAA ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/hcn_doe_mean_data.Z (Mar 1, 2007) and at CDIAC (Aug 14, 2007; prior version May 10, 2005). On various occasions, we’ve discussed the differences between the USHCN “Raw”, “TOBS” and “Adjusted” versions; the selection command used here picks out the Adjusted version.

echo "extracting FILIN data"
grep -e " 3A" hcn_doe_mean_data > hcn_doe_mean_data_fil
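For readers who don’t use grep, the selection can be sketched in Python. This is a minimal sketch that assumes only what the shell line shows: a record is kept if its text contains the substring " 3A". The function names are hypothetical, and nothing here parses the USHCN record layout.

```python
from typing import Iterable, List


def select_adjusted(lines: Iterable[str]) -> List[str]:
    """Keep lines containing the substring " 3A", like grep -e " 3A"."""
    return [line for line in lines if " 3A" in line]


def extract_adjusted(src_path: str, dst_path: str) -> int:
    """Write the selected lines of src_path to dst_path (the shell redirect).

    Returns the number of lines kept.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        kept = select_adjusted(src)
        dst.writelines(kept)
    return len(kept)
```

Calling extract_adjusted("hcn_doe_mean_data", "hcn_doe_mean_data_fil") would mirror the grep line, assuming the input file is present in the working directory.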

The USHCN “A” version contains interpolated values, which are marked with an indicator M. The interpolated values can also be identified simply by setting adjusted values to NA wherever the raw values are NA. Hansen removes the interpolated values from the USHCN Adjusted version in the program USHCN2v2.f. As the open statements below show, it reads the input file hcn_doe_mean_data_fil and writes the noFIL file USHCN.v2.mean_noFIL. (The entire replacement operation would take a line or two in R.)

open(1,form='formatted',file='ID_US_G')
open(2,form='formatted',file='hcn_doe_mean_data_fil')
open(10,form='formatted',file='USHCN.v2.mean_noFIL')

This step is announced by the following echo statements:

echo "replacing USHCN1 data in $1 by USHCN_noFIL data (Tobs+maxmin adj+SHAPadj+noFIL)"

echo " reformat USHCN to v2.mean format"
${fortran_compile} USHCN2v2.f -o USHCN2v2.exe ; ./get_USHCN
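The two ways of identifying the interpolated (filled) values described above can be sketched as follows. This is a hypothetical illustration, not a transcription of USHCN2v2.f: it assumes one station-year has already been parsed into twelve aligned monthly values, with None standing in for NA, and that the M indicator has been extracted into a parallel list of flags. On consistent data, both functions should blank out the same months.

```python
from typing import List, Optional, Sequence

Month = Optional[float]  # None plays the role of NA


def remove_filled(raw: Sequence[Month], adjusted: Sequence[Month]) -> List[Month]:
    """Blank every adjusted month whose raw value is missing."""
    if len(raw) != len(adjusted):
        raise ValueError("raw and adjusted must be aligned month by month")
    return [None if r is None else a for r, a in zip(raw, adjusted)]


def drop_flagged(adjusted: Sequence[Month], flags: Sequence[str],
                 marker: str = "M") -> List[Month]:
    """Blank every adjusted month carrying the interpolation marker."""
    if len(flags) != len(adjusted):
        raise ValueError("flags and adjusted must be aligned month by month")
    return [None if f == marker else a for a, f in zip(adjusted, flags)]
```

For example, raw values [1.0, None, 3.0] with adjusted values [1.1, 2.2, 3.3] and flags ["", "M", ""] give [1.1, None, 3.3] either way.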

The Hansen Y2K Offset

Now we come to the Hansen Y2K offset, a procedure hardly recommended for amateurs, much less “professionals”. It is implemented in get_offset_noFIL and dif.ushcn.ghcn.2005.f. Hansen takes the file USHCN.v2.mean_noFIL.gz produced above and the GHCN file v2.mean (a standard-issue GHCN file, the “raw” version). The difference between the average monthly values of the GHCN and USHCN Adjusted versions is calculated for each station and saved in the file ushcn-ghcn_offset_noFIL, as shown in the latter program:

open(1,file='USHCN.v2.mean_noFIL',form='formatted')
open(2,file='ghcn_us_end',form='formatted')
open(10,file='ushcn-ghcn_offset_noFIL',form='formatted')

The derivation of the file USHCN.v2.mean_noFIL was described above. The file ghcn_us_end referred to here is a U.S. subset of GHCN, in which US stations are picked out by the country code 425, as shown below. (There’s a reference to end 1980 that I haven’t traced yet.)

echo "finding offset caused by adjustments"
grep ^425 v2.meany > ghcn_us
dump_old.exe ghcn_us ghcn_us_end 1980
grep -v -e -9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999 USHCN.v2.mean_noFIL > xxx
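The two grep filters in this step can be sketched in Python. This assumes the GHCN v2.mean record layout (a 12-character station ID whose first three digits are the country code, a 4-character year, then twelve 5-character monthly values with -9999 for missing); the function names are hypothetical, and the dump_old.exe step with its 1980 argument is not emulated.

```python
from typing import List, Tuple

MISSING = -9999  # GHCN v2 missing-value code


def parse_v2_line(line: str) -> Tuple[str, int, List[int]]:
    """Split an assumed v2.mean record into (station_id, year, monthly values).

    Assumed layout: characters 1-12 station ID (first 3 = country code),
    13-16 year, then twelve 5-character integer fields.
    """
    station = line[0:12]
    year = int(line[12:16])
    values = [int(line[16 + 5 * i:21 + 5 * i]) for i in range(12)]
    return station, year, values


def us_records(lines: List[str]) -> List[str]:
    """Like `grep ^425`: keep records whose country code is 425."""
    return [ln for ln in lines if ln.startswith("425")]


def drop_empty_years(lines: List[str]) -> List[str]:
    """Like the grep -v on twelve -9999 fields: drop all-missing station-years."""
    return [ln for ln in lines if any(v != MISSING for v in parse_v2_line(ln)[2])]
```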

The difference between the GHCN and USHCN versions was calculated for the 26-year period from 1980 to 2006.

This procedure gives a reasonable estimate of the size of the Y2K step caused by switching from the USHCN Adjusted version in the early part of the record to the GHCN Raw version in the later record, and it can patch this particular error, but it doesn’t explain why the error occurred in January 2000.
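The offset calculation described above can be sketched as a per-station, per-calendar-month mean difference. This is a minimal sketch under stated assumptions: each station’s record is a dict mapping year to twelve monthly values (None for missing), the sign convention is GHCN minus USHCN, and only months present in both versions contribute. It is not a transcription of dif.ushcn.ghcn.2005.f, whose exact averaging window is not reproduced here.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical representation: year -> twelve monthly means (None = missing).
Series = Dict[int, List[Optional[float]]]


def monthly_offsets(ghcn: Series, ushcn: Series) -> List[Optional[float]]:
    """Mean GHCN-minus-USHCN difference for each calendar month.

    Only years present in both series, and months non-missing in both,
    contribute; a month with no overlap gets None.
    """
    common_years = sorted(set(ghcn) & set(ushcn))
    offsets: List[Optional[float]] = []
    for m in range(12):
        diffs = [
            ghcn[y][m] - ushcn[y][m]
            for y in common_years
            if ghcn[y][m] is not None and ushcn[y][m] is not None
        ]
        offsets.append(mean(diffs) if diffs else None)
    return offsets
```

Restricting both inputs to 1980 onward before the call would roughly mimic the 1980 to 2006 window mentioned above.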

Let’s now proceed to the next step, in which Hansen splices the GHCN and USHCN versions. This is done in the program cmb2.ushcn.v2.f. The input and output files show that the noFIL version is an input and that the output is the v2.meanz file used in later steps.

open(3,file='ushcn-ghcn_offset_noFIL',form='formatted')
open(1,file='USHCN.v2.mean_noFIL',form='formatted')
open(2,file='v2.meany',form='formatted')
open(12,file='v2.meanz',form='formatted') ! output

If you parse through the Fortran code in this program, you will see that it replaces the GHCN versions of USHCN data with the USHCN (Adjusted) version in years where the USHCN data is available – with the GHCN (Raw) version providing the most up-to-date portion of the record.
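The replacement logic described above, in which a USHCN year takes the place of the GHCN year wherever available and GHCN supplies the rest, can be sketched as follows. This is a hypothetical simplification of cmb2.ushcn.v2.f, not a line-by-line port: the year-level replacement follows the description above, while the optional offsets argument is an assumption about how a patch using ushcn-ghcn_offset_noFIL might be applied (subtracting GHCN-minus-USHCN offsets from the GHCN fallback).

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical representation: year -> twelve monthly means (None = missing).
Series = Dict[int, List[Optional[float]]]


def splice(ushcn: Series, ghcn: Series,
           offsets: Optional[Sequence[Optional[float]]] = None) -> Series:
    """Use a USHCN year wherever one exists; otherwise fall back to GHCN.

    If offsets (GHCN minus USHCN, per calendar month) are given, they are
    subtracted from the GHCN fallback; a month whose offset is None is
    passed through unadjusted.
    """
    out: Series = {}
    for year in sorted(set(ushcn) | set(ghcn)):
        if year in ushcn:
            out[year] = list(ushcn[year])
        elif offsets is None:
            out[year] = list(ghcn[year])
        else:
            out[year] = [
                v if v is None or off is None else v - off
                for v, off in zip(ghcn[year], offsets)
            ]
    return out
```

With no offsets, a toy station whose USHCN Adjusted values sit 0.5 above GHCN Raw and stop in 1999 drops by 0.5 in 2000; passing offsets of -0.5 removes that step.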

The Vintage Version Hypothesis
If Hansen had used the current version of hcn_doe_mean_data, the GHCN Raw data would never take over from the USHCN data, because the USHCN Adjusted data is as up to date as the GHCN Raw data. Nor is there anything in this code to suggest a mechanism whereby a Y2K step would occur.

As noted above, it struck me that some of these peculiarities could be explained by the hypothesis that Hansen was using a vintage USHCN version that ended in 1999. This would account for a variety of disconnected and puzzling results:

1) if the vintage version ended in 1999, then the GHCN raw version would take effect in 2000 and the step would occur at Y2K. If the USHCN version ended more recently, then the step would occur in a different year. So to exactly replicate Hansen’s present results, we need to get the vintage USHCN version that he is using for the current results.

2) if a vintage USHCN version was used, this would explain why the online Detroit Lakes version does not match the emulated Detroit Lakes version in the early portion. (I’ve tried matching the online version against other candidates.) If this is not the case, then another crossword puzzle remains: explaining how the online Detroit Lakes version was produced.
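Point 1 can be made concrete with a small helper: under a year-level splice, the USHCN-to-GHCN switch, and hence any step, lands in the first GHCN year after the USHCN record ends. This is a hypothetical sketch that ignores station-level gaps; the name step_year is illustrative only.

```python
from typing import Iterable, Optional


def step_year(ushcn_years: Iterable[int], ghcn_years: Iterable[int]) -> Optional[int]:
    """First GHCN year after the USHCN record ends, or None if there is none.

    Under a year-level splice this is where the record switches from
    USHCN Adjusted to GHCN Raw, and so where any step would appear.
    """
    ushcn = list(ushcn_years)
    if not ushcn:
        return None  # no USHCN data: GHCN is used throughout, no switch
    last_ushcn = max(ushcn)
    later = [y for y in ghcn_years if y > last_ushcn]
    return min(later) if later else None
```

For example, step_year(range(1880, 2000), range(1880, 2007)) gives 2000, while step_year(range(1880, 2007), range(1880, 2007)) gives None, consistent with the point that a current USHCN file would leave nothing for GHCN to extend.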

Here we see the advantage of being able to consult actual code. Without the code to consult, one could speculate endlessly on why the answers were different. Inspecting the code can rule out, for example, a plausible hypothesis of JerryB’s pointing to other candidate data sets:

The hcn_doe_mean_data file can be found in the USHCN directory of the server to which you linked, and also on a CDIAC server, but that file does not include data with USHCN SHAP adjustments without FILNET adjustments. Without the USHCN.v2.mean_noFIL file, I would not expect you to get the same temperature numbers for USHCN stations that GISS gets. The temperature data contained in the USHCN.v2.mean_noFIL file presumably came from the 3A lines of the hcn_shap_ac_mean.Z file at ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/OtherIntermediates/ however, the format of those two files is apparently quite different, and converting a copy of one into a copy of the other could be quite a chore. Similar, but possibly slightly different, numbers could be found in the GHCN v2.mean_adj.Z file, which would seem to be more similar to the expected format, and with slight editing, might be “close enough”, but not an exact match. Such an effort would also require extracting only the USHCN stations, and only through year 2000 data.

Because we can trace the hcn_doe_mean_data file through the selection of the “3A” adjusted data, this otherwise plausible explanation appears to be precluded. For now, I’m hypothesizing a vintage data version.

23 Comments

  1. jae
    Posted Sep 11, 2007 at 1:44 PM

    Perhaps a dumb question, but what the hey: Why is this 1999 data version “obsolete?” How does it differ from the later version for the years through 1999?

  2. JerryB
    Posted Sep 11, 2007 at 3:42 PM

    Steve,

    Yes, there is no question but that GISS is using a vintage version of a USHCN
    file, and that it does not have the FILNET adjustment. Reto Ruedy came close
    to stating that in one of his emails to you. I had missed the USHCN2v2.f step,
    and so had imagined that they had named that file USHCN.v2.mean_noFIL. Instead,
    it appears that it is named as if it were an ordinary hcn mean data file.

  3. JerryB
    Posted Sep 11, 2007 at 3:53 PM

    BTW, Steve, you might try taking the hcn_shap_ac_mean.Z file from
    ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/OtherIntermediates/
    renaming it to the expected name, and feeding that to the
    get_USHCN step. One caution, it includes data for the year 2000.

    jae,

    It’s not that it’s obsolete, it’s that it is different than a usual USHCN
    file, and also that it does not include data for recent years.

  4. JerryB
    Posted Sep 11, 2007 at 3:55 PM

    Perhaps my previous BTW suggestion should have been addressed to Not Sure.

  5. Not sure
    Posted Sep 11, 2007 at 4:49 PM

    JerryB (4) Your experiment gave a somewhat different file. It’s here.

  6. Steve McIntyre
    Posted Sep 11, 2007 at 4:54 PM

    #2. Well, it may also be a different version of the filnet adjustment – it’s impossible to be sure on the present information.

  7. JerryB
    Posted Sep 11, 2007 at 5:03 PM

    Not Sure,

    Thank you! It may be tomorrow before I get to reply further.

  8. JerryB
    Posted Sep 11, 2007 at 5:16 PM

    Steve,

    In the “Quantifying the Hansen Y2K error” thread, you posted an email from Reto
    Ruedy which mentioned:

    “In 2000, USHCN provided us with a file with corrections not contained in the GHCN data. Unlike the GHCN data,
    that product is not kept current on a regular basis. Hence we used (as you noticed) the GHCN data to extend
    those data in our further updates (2000-present).”

    Based on Hansen’s 2001 paper on his adjustments, which indicated that GISS would
    prefer their own “adjustments” for missing data, I interpreted Ruedy’s statement
    to indicate that GISS received a USHCN file with SHAP, but not FILNET, adjustments.

    You are correct to say that it is impossible to be sure on the presently
    available information.

  9. JerryB
    Posted Sep 11, 2007 at 5:23 PM

    Steve,

    Let me ask if you would repeat the Detroit Lakes comparison, but with Not Sure’s
    new result in place of his previous result.

  10. Geoff Sherrington
    Posted Sep 12, 2007 at 2:56 AM

    Maybe a complete red herring, but some of the discontinuities I’ve seen on CA over recent months coincide with publication of Phil Jones papers, plus or minus a year, enough for me to report it as noticeable. Geoff.

  11. JerryB
    Posted Sep 12, 2007 at 4:27 AM

    New ballgame at GISS!

    I tried comparing Not Sure’s vintage Detroit Lakes with the dset=0 currently
    online at GISS, and saw how different they are. However, I noticed that the
    format at GISS had changed slightly, and wondered if more than the format had
    changed. I then compared Not Sure’s version with the version from GISS that
    I had saved on August 11th. For most months: exact matches. For most of the
    other months: a consistent difference of 0.1 C.

    I checked a few other locations for which I had saved GISS versions after
    the “Y2K change” at GISS last month, and the GISS numbers today are different
    than they were.

    It’s too early in the morning for this stuff!

  12. scp
    Posted Sep 12, 2007 at 5:08 AM

    I’m 99% certain that Hansen used (and continues to use) a vintage USHCN version ending in 1999 – even though the online USHCN version is current up to late 2006

    Does that mean it’s “very likely”? ; -)

    I guess this would explain why GISS maps has “Unadjusted 1880-1999” listed as a land data source and says “I cannot construct a plot from your input. GHCN unadjusted plots only available through 1999” when I try to plot using that data source with years above 2000.

    I can’t explain why it now says the same thing when I try to use only years during the 1900s with the unadjusted data source though. It used to let me do that.

  13. JerryB
    Posted Sep 12, 2007 at 5:48 AM

    Some GISS files of USHCN
    stations from before whatever change at GISS occurred very recently. The file
    for Danevang, like that for Detroit Lakes, indicates mostly exact matches, or
    differences of 0.1 C, when compared with Not Sure’s results. The file for
    Boulder has larger differences ( 0.3 C or 0.4 C) for some months. I have not
    checked the others yet.

    I will venture the guesses that Not Sure has STEP0 very well emulated, and that
    the old file with the SHAP adjustments is at least very similar to the vintage
    file that GISS has been using.

    As for the nature of the most recent change(s) at GISS, I have no guesses to
    offer at this time.

  14. Chris D
    Posted Sep 12, 2007 at 6:01 AM

    JerryB, #11 & 13: This has been documented over on Anthony’s blog, also:

    http://www.norcalblogs.com/watts/2007/09/33_of_the_ushcn_network_has_be.html#comments

  15. Murray Duffin
    Posted Sep 12, 2007 at 7:10 AM

    This one is a little off topic, but since very recent GISS data changes have become evident, it might be partly pertinent. I noticed a couple of days ago that I could no longer find raw and adjusted plots. I’m not a whiz and don’t work with data if I can avoid it. Now, as you can see below, yesterday I couldn’t find current plots for Canadian Arctic stations. Has this been another aspect of the changes? The following was written last night.

    Because there has been discussion here that GW is found mainly in the Canadian and Siberian north, and because much has been made recently of the unprecedented 2007 decline in Arctic areal ice extent, I decided to eyeball northern stations to look for trends. On Sept 11 2007 at http://data.giss.nasa.gov/gistemp/station_data/ I checked stations north of 63 with records from 1935 to 2007.

    The first thing that shows up is that only a few Canadian stations are updated to 2003, and almost none later. Why isn’t GISS data current? Even the few Canadian stations listed as having data to 2007 only have the plots to 2003.

    The next thing that becomes apparent is that there is no consistent circumpolar pattern. Western Siberia shows a very cold event in 1976, warming sharply in 1977/78, but little or no shift in the average level. Alaska and the Yukon have their cold event in 1975, with sharp warming to 1977 or 1978, and with a shift in the average upward by from 0.5 degrees C in the Yukon to >1.5 degrees C in western Alaska (Nome and Kotzebue, respectively on the Bering and Chukchi seas, south and north of the Bering strait, and nearly at the same longitude). This discontinuity is evident all over Alaska, so has to be a climate event. See: http://markov.ldgo.columbia.edu:81/%28/siberia/alexeyk/mydata/TSsvd.in%29readfile/.SST/.PDO/figviewer.html?map.url=T+fig-+line+-fig showing a major PDO switch starting about 1973, which may have been manifested in west Alaska temperatures starting in 1975.

    Neither of these events is apparent in the NWT and eastward to Norway. However from Baffin Island, across Greenland to Iceland there is a low in 1993 followed by a big recovery lasting 2 to 5 years of from 1.5 degrees C in eastern Canada and Iceland to 4.5 degrees C in Greenland. In Greenland there is also an upward shift in the average of at least 1.5 degrees C. See http://www.cgd.ucar.edu/cas/jhurrell/nao.stat.winter.html . The jump seems to have happened about 2 years after the smoothed peak of the NAO. This event is not apparent in Siberia.

    Out of 50 stations checked around the Arctic only 20 are clearly warmer in the last decade, than in the decade from 1932 to 1942. Of this 20, at least 3 in Alaska appear to have moved from a snow strip to a paved airport runway in 1997/98, so maybe evidencing microsite effect. A couple of others show what appear to be localized discontinuities (Olenek 1966/67 and Iakutsk 1988) that may also be microsite effects that I couldn’t track down. Fifteen out of 50 appear to be clearly warmer, with most of the warming being higher annual minima. Only a few have a recent maximum higher than the 1930s high. The recent peak year is 2003, 2005 or 2006, with 2003 being most common. In the few cases where there is a value for 2007 it is always cooler than the nearest peak. I didn’t count but only a few sites are clearly cooler than the 1930s or trending down. Surprisingly, Fairbanks is clearly cooling since a 1987 peak. Net net, there is no obvious reason to believe that the Arctic is warmer now than in the 1930s. I don’t see record GW from the Arctic unless it is in the missing Canadian records, or comes from an upward bias. Of course I am only eyeballing plots so cannot detect a few tenths of a degree. Since AGW theory says that warming will be much stronger at high latitudes I expected a trend obvious to an eyeball check.

    As for Arctic sea ice extent, the two major sources I could find are http://arctic.atmos.uiuc.edu/cryosphere and http://nsidc.com/news/press/2007_seaiceminimum/20070810_index.html . The nsidc site seems less problematic and more credible. The uiuc site has min. extent of 3.99x10^6 sq. km. at 6 Sept vs 4.24 for nsidc at 9 Sept. The nsidc curve of ice extent suggests that the major loss has been during July/Aug. See
    http://nsidc.com/sotc/sea_ice.html, whereas uiuc seems to show the fastest loss from 6 to 15 May, by scaling the annual info at http://arctic.atmos.uiuc.edu/cryosphere/IMAGES/sea.ice.anomaly.timeseries.jpg.
    NSIDC accounts for the July/Aug loss as a combination of unusually high insolation (little cloud cover), and atmospheric circulation transporting warm air around a stationary Siberian high. There is no obvious reason for a sharp drop in May. However it could be that uiuc depends on satellite microwave data to the exclusion of visual data, and microwave can be fooled by water ponding on the ice. In any case the ice extent is at an unprecedented low since satellite supported estimates have been made starting in 1972.

    NSIDC notes that ice extent has been declining since the early ‘50s. The only plot I could find is here: http://arctic.atmos.uiuc.edu/cryosphere/IMAGES/seasonal.extent.updated.jpg . There is an apparent anomaly at 1953, and NSIDC notes that sometime after 1972 the “Dehn ice charts” for the Canadian Arctic for 1953 to 1986 became available. Incorporating them is probably the source of the anomaly. If the summer curve pre 1953 is moved down to reattach directly at 1953, it becomes clear that the ice extent began to shrink about 1970, which would be reasonably consistent with the return to warming of the global average surface instrument plot. NSIDC further notes that estimation accuracy was at best +- 5 to 10% from 1972 to 1995, and that estimates are subjective, especially prior to the beginning of training of estimators in 2003. Accuracy prior to 1972, and especially prior to 1953 is anybody’s guess. Just eliminating the 1953 anomaly moves the 1940 minimum (PDO high, NAO low) in the referenced plot to the same level as 2002/3. A 10% overestimation would move it below the 2005 level, the lowest prior to 2007. Since the apparent 1953 correction was downward, one would expect overestimation more likely than underestimation. It is not unlikely that Arctic sea ice extent now is just at the lowest level of the 1932-1942 decade.

    Does anyone have a source for Canadian Arctic temperature plots extending to 2006? Or PDO sst to near 2007? Or the NAO to near 2007?

    Steve, while trying to dig up some info on PDO and NAO I came across this: http://ipcc-wg1.ucar.edu/meeting/Drght/restricted/present/Hughes.pdf . The Bristlecone pines appear again. Murray

  16. JerryB
    Posted Sep 12, 2007 at 8:10 AM

    Climate change science in action: it seems that within the past few days,
    Detroit Lakes, Danevang, and Boulder, all got cooler about one hundred
    years ago. That should help the trends. 🙂

    Steve, it may be time for you to do another one of your “scrapes” of GISS data.

    Also, if you can generate a GHCN format file from your previous scrape, it may
    be of use as a comparand for Not Sure and the others who are working at emulating
    the GISS processes. Comparing to what is currently online at GISS will result
    in false negatives.

  17. Demesure
    Posted Sep 12, 2007 at 9:21 AM

    That GISS or CRU “cool the past” to up the trends is nothing new. Just visually compare IPCC 2001 to IPCC 2007 global temperature graphics and the differences jump out at you.
    Such a large scale data tampering game can continue forever as long as there is backing from the politics.

  18. Murray Duffin
    Posted Sep 12, 2007 at 9:36 AM

    Tried to send a longish partially off topic message, about 1 and 1/2 pages in Word, but it seems to have not been accepted. Too long?? Too much off topic?? Other??
    Note, whenever I post my little green bars stall at about 1/2 way, no matter how long I wait. If I hit “submit” a second time I get a message that I have sent a duplicate, but only the one seems to post. This time, after more than 1 hour I hit submit again, got the message and it seems it still didn’t post. Any suggestions? Murray

  19. Jeremy Friesen
    Posted Sep 12, 2007 at 1:01 PM

    Close your browser, reopen it, and send your message in a couple chunks.

  20. bernie
    Posted Sep 12, 2007 at 1:03 PM

    #18
    See #15?!!!

  21. Chris W
    Posted Sep 12, 2007 at 1:16 PM

    Re: #15

    Murray, You can get up to date Canadian data from Environment Canada.

    http://www.climate.weatheroffice.ec.gc.ca/climateData/canada_e.html

    Pick your location and look for the csv option at the bottom of the page to get the data for use in excel, etc.
    You can also download a CD with daily data for all of Canada up to 2002.
    I’ve been looking closely at the data for Calgary (where I live) which dates back to 1885. The warming trend evident in the annual data is almost entirely due to warmer winters (January and February). All other months show pretty constant temperatures.

    I am not sure what that means (as I am not even sure I would qualify as a jester)…, but looking only at annual data does not tell the whole story.

  22. Murray Duffin
    Posted Sep 12, 2007 at 2:24 PM

    RE: 20 Thanks. Makes me feel silly
    Re: 21 Thanks, but I don’t find a “CSV option” and don’t get any choices beyond monthly (no annual data?), and get no data when I choose monthly and the CD is only to 2002. Are there no plots showing yearly averages roughly 1930 to 2007? Murray

  23. Murray Duffin
    Posted Sep 12, 2007 at 6:04 PM

    Re 15
    See http://nsidc.com/news/press/2007_seaiceminimum/images/20070910_extent.png On further reflection, if the melt in the late ’30s looked like now, there is neither reason nor likelihood that any sailors would have sailed into that huge bay that has melted, and less probability that aircraft would have overflown more than a small part of it. It is close to a dead certainty that, under such conditions, the ice extent would have been substantially overestimated in the years near to and including 1940. Murray

2 Trackbacks

  1. […] the precise provenance of the NASA USHCN data has been raised in recent posts – most recently here where I posited the use of a vintage USHCN data set. One of the nice things about climateaudit […]

  2. By Hansen Then and Now « Climate Audit on Oct 29, 2011 at 2:14 PM

    […] looks like this is the reason for the conundrum observed in my last post . I never thought of checking to see if Hansen had altered early 20th century values for Detroit […]