Quantifying the Hansen Y2K Error

I observed recently that Hansen’s GISS series contains an apparent error in which Hansen switched the source of GISS raw from USHCN adjusted to USHCN raw for all values from January 2000 onward. For Detroit Lakes MN, this introduced an error of 0.8 deg C. I’ve collated GISS raw minus USHCN adjusted for all USHCN sites (using the data scraped from the GISS site, for which I was most criticized in Rabett-world). Figure 1 below shows a histogram of the January 2000 step for the 1221 stations, calculated here as the average of the difference series after January 2000 minus its average over the 1990-1999 period. The largest step occurred in Douglas AZ, where the Hansen error is 1.75 deg C! The distribution is obviously bimodal.

[Figure 1 (hansen40.gif): histogram of the January 2000 step, GISS raw minus USHCN adjusted, for the 1221 USHCN stations]
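For concreteness, here is a minimal R sketch of that step calculation. It assumes a hypothetical data frame delta with one row per station-month and columns id, year, month and diff (diff being GISS raw minus USHCN adjusted, in deg C); the actual collation script differs in detail.

step_by_station <- sapply(split(delta, delta$id), function(x) {
  pre  <- mean(x$diff[x$year >= 1990 & x$year <= 1999], na.rm = TRUE)
  post <- mean(x$diff[x$year >= 2000], na.rm = TRUE)
  post - pre  # the January 2000 step for this station
})
hist(step_by_station, breaks = 50, xlab = "Step (deg C)",
     main = "January 2000 step: GISS raw minus USHCN adjusted")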

Next, here is a graph showing the difference between GISS raw and USHCN adjusted by month (with a smooth) for unlit stations (which are said to define the trends). The step in January 2000 is clearly visible and results in an erroneous upward step of about 0.18-0.19 deg C in the average of all unlit stations. I presume that a corresponding error would be carried forward into the final GISS estimate of US lower 48 temperature and that this widely used estimate would be incorrect by a corresponding amount. The 2000s are warm in this record with or without this erroneous step, but this is a non-negligible error relative to (say) the amounts contested in the satellite record disputes.

[Figure 2 (hansen41.gif): GISS raw minus USHCN adjusted, by month, for unlit stations, with smooth]
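A rough R sketch of how Figure 2 can be reproduced, again using the hypothetical delta data frame above plus a logical column unlit; lowess() stands in for whatever smoother was actually used:

unlit <- delta[delta$unlit, ]
tm  <- unlit$year + (unlit$month - 0.5) / 12       # decimal time
avg <- tapply(unlit$diff, tm, mean, na.rm = TRUE)  # all-station monthly mean
t0  <- as.numeric(names(avg))
plot(t0, avg, type = "l", col = "grey",
     xlab = "Year", ylab = "GISS raw minus USHCN adjusted (deg C)")
lines(lowess(t0, avg, f = 0.1), lwd = 2)           # the smooth
abline(v = 2000, lty = 2)                          # the January 2000 step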

Aug 7 UPDATE:
On the weekend, I notified Hansen and Ruedy of their Y2K error as follows:

Dear Sirs,
In your calculation of the GISS “raw” version of USHCN series, it appears to me that, for series after January 2000, you use the USHCN raw version whereas in the immediately prior period you used USHCN time-of-observation or adjusted version. In some cases, this introduces a seemingly unjustified step in January 2000.

I am unaware of any mention of this change in procedure in any published methodological descriptions and am puzzled as to its
rationale. Can you clarify this for me?

In addition, could you provide me with any documentation (additional to already published material) providing information on the
calculation of GISS raw and adjusted series from USHCN versions, including relevant source code.

Thank you for your attention,
Stephen McIntyre

Today I received the following response:

Dear Sir,

As to the question about documentation, the basic “GISS Surface Temperature Analysis” page starts with a “Background” section whose first paragraph contains the sentence: “Input data for the analysis, …, is the unadjusted data of GHCN, except that the USHCN station records were replaced by a later corrected version”. A similar statement appears in the “Abstract” and the “Introduction” section of our 2001 paper (JGR Vol 106, pg 23,947-23,948). The Introduction explains the above statement in more detail.

In 2000, USHCN provided us with a file with corrections not contained in the GHCN data. Unlike the GHCN data, that product is not kept current on a regular basis. Hence we used (as you noticed) the GHCN data to extend those data in our further updates (2000-present).

I agree with you that this simple procedure creates an artificial step if some new corrections were applied to the newest data, rather than bringing the older data in sync with the latest measurements – as I naively assumed. Comparing the 1999 data in both data sets showed that in about half the cases where the 1999 data were changed, the GHCN data were higher than the USHCN data and in the other half it was the other way round with the plus-corrections slightly outweighing the minus-corrections.

Although trying to eliminate those steps should have little impact on the US temperature trend (much less the global trend), it seems a good idea to do so and I’d like to thank you for bringing this oversight to our attention.

When we did our monthly update this morning, an offset based on the last 10 years of overlap in the two data sets was applied and our on-line documentation was changed correspondingly with an acknowledgment of your contribution. This change and its effect will be noted in our next paper on temperature analysis and in our end-of-year temperature summary.

The effect on global means and all our tables was less than 0.01 C. In the display most sensitive to that change – the US-graph of annual means – the anomalies decreased by about 0.15 C in the years 2000-2006.

Respectfully,

Reto A Ruedy

Well, my estimate of the impact on the US temperature series was about 0.18-0.19 deg C, a little more than Ruedy’s 0.15 deg C. My estimate added a small negative offset going into 2000 to the positive offset of about 0.15-0.16 after 2000; I suspect that Ruedy is not counting both parts, thereby slightly minimizing the impact. However, I think that you’ll agree that my estimate of the impact was pretty good, given that I don’t have access to their particular black box.
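The arithmetic of that two-part accounting, with illustrative numbers only (the pre-2000 offset is my inference, not a figure supplied by Ruedy):

pre_offset  <- -0.03   # hypothetical small negative offset going into 2000
post_offset <-  0.155  # roughly the 0.15-0.16 deg C offset after 2000
post_offset - pre_offset  # full step of about 0.18-0.19 deg C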

Needless to say, they were totally unresponsive to my request for source code. They shouldn’t be surprised if they get an FOI request. I’ll post some more after I get a chance to cross-check their reply.

As to the impact on NH and global data, I’ve noted long before this exchange that the non-US data in GHCN looks more problematic to me than the US data, and it would be really nice if surfacestations.org started getting some international feedback. Ruedy’s reply was copied to Hansen and to Gavin Schmidt. I’m not sure what business it is of Gavin’s other than his “private capacity” involvement in a prominent blog.

79 Comments

  1. steven mosher
    Posted Aug 6, 2007 at 9:38 PM | Permalink

    Hansen’s own Hockey stick.

  2. TCO
    Posted Aug 6, 2007 at 10:25 PM | Permalink

    Where did you earlier discuss Hansen? There is no link or citation.

    You also lack a link and citation to the Hansen publication itself.

    Reading this, it is unclear how much of the trend you think comes spuriously from this artifact. Is it 0.10 degrees or what? Why don’t you give that?

  3. Steve McIntyre
    Posted Aug 6, 2007 at 10:33 PM | Permalink

    Link to post a couple of days ago added. I estimate the error at about 0.18-0.19 deg C in Hansen’s US estimate for 2000 and after, as I said in the post.

  4. Willis Eschenbach
    Posted Aug 7, 2007 at 12:42 AM | Permalink

    Another oddity: why is the variance of the post-2000 data so much greater than that of the pre-2000 data?

    w.

  5. TCO
    Posted Aug 7, 2007 at 5:58 AM | Permalink

    Good catch, Will…is. Also the variance in 1900 is huge and gets smaller and smaller. Is it something about the two series which are being subtracted becoming more similar with time?

  6. reid simpson
    Posted Aug 7, 2007 at 6:35 AM | Permalink

    is it possible that a switch “from USHCN adjusted to USHCN raw” would cause an increase in variance?

  7. JerryB
    Posted Aug 7, 2007 at 7:21 AM | Permalink

    The magnitudes of the differences in the second graph for the pre-2000 years and the post-1999 years have to do with which USHCN adjustments GISS has used for the pre-2000 years, and which it has not. The pre-2000 portion of that graph may indicate the USHCN FILNET adjustments, while the post-1999 portion may indicate FILNET plus SHAP plus TOB adjustments, or their negatives. The wording of this comment is loose because I haven’t had enough coffee yet to want to write a more detailed description.

  8. bernie
    Posted Aug 7, 2007 at 7:29 AM | Permalink

    Willis:
    Is the grey line actually the variance and if so the variance of what? It does not appear symmetrical about the mean value.
    Thanks

  9. Michael Jankowski
    Posted Aug 7, 2007 at 9:19 AM | Permalink

    I guess I won’t hold my breath for the “Hansen mucks it up again” headline from any hockey fanatics.

  10. SteveSadlov
    Posted Aug 7, 2007 at 10:18 AM | Permalink

    Anyone remember that movie … “Entrapment.”

    Not saying anything here, but the main theme of that movie was how a series of minuscule adjustments would result in the mother of all thefts.

  11. Papertiger
    Posted Aug 7, 2007 at 12:16 PM | Permalink

    [snip – not interested in policy issues on this thread]

  12. Willis Eschenbach
    Posted Aug 7, 2007 at 1:23 PM | Permalink

    Bernie, you say:

    Is the grey line actually the variance and if so the variance of what? It does not appear symmetrical about the mean value.

    Steve can correct me if I’m wrong, but I think the grey line is the data, and the black line is some kind of filtered average. As you point out, it appears asymmetrical, but I think that is because the top part of the data is cut off in this chart (again I may be wrong).

    w.

    • Steve Garcia
      Posted Jun 30, 2009 at 12:38 PM | Permalink

      Re: Willis Eschenbach (#12),
      Willis and Bernie –
      I enlarged the image and get three impressions:
      1.) Looking at other years and the data traces vs the smoothed out curve, it appears that this asymmetry is common on this graph.
      2.) It does not appear that the data line (grey) extends up past the top of the image.
      3.) It does appear that the data line (grey) has more width above the line than below, so I speculate that the data line above the black line represents more data points above than below. The negative spikes are sharper, implying single points, whereas the positive spikes are more rounded.

      My overall impression is that there are simply more positive values than negative ones, implying the positive spikes are multiple data points; i.e., there is some kind of a dwell on the positive side.

      It is too indistinct to tell for sure, but that is my impression.

      The data, of course, are the data. I am just adding to the discussion (way after the earlier comments) about the image.

  13. Steve Garrison
    Posted Aug 7, 2007 at 2:26 PM | Permalink

    Steve, So when are you going to publish this?

  14. Steve McIntyre
    Posted Aug 7, 2007 at 3:46 PM | Permalink

    Has anyone visited the GISS data page today – something interesting: http://data.giss.nasa.gov/gistemp/ . More on this later.

  15. TCO
    Posted Aug 7, 2007 at 3:54 PM | Permalink

    Ok, I visited it, now, and don’t see what is interesting.

  16. J Edwards
    Posted Aug 7, 2007 at 4:03 PM | Permalink

    Wow, saw the attribution on the GISS data page. Congratulations Steve. Wonder what this does to their “Top 5 Warmest Years” list….

  17. Kenneth Fritsch
    Posted Aug 7, 2007 at 4:12 PM | Permalink

    The NASA GISS Surface Temperature Analysis (GISTEMP) provides a measure of the changing global surface temperature with monthly resolution for the period since 1880, when a reasonably global distribution of meteorological stations was established. Input data for the analysis, collected by many national meteorological services around the world, is the unadjusted data of the Global Historical Climatology Network (Peterson and Vose, 1997 and 1998) except that the USHCN station records up to 1999 were replaced by a version of USHCN data with further corrections after an adjustment computed by comparing the common 1990-1999 period of the two data sets. (We wish to thank Stephen McIntyre for bringing to our attention that such an adjustment is necessary to prevent creating an artificial jump in year 2000.) These data were augmented by SCAR data from Antarctic stations not present in GHCN. Documentation of our analysis is provided by Hansen et al. (1999), with several modifications described by Hansen et al. (2001). The GISS analysis is updated monthly.

    I am eagerly anticipating the “more later” from Steve M. I need a reality check — again.

  18. Anthony Watts
    Posted Aug 7, 2007 at 4:15 PM | Permalink

    RE15 In the text they place this on the GISTEMP page:

    (We wish to thank Stephen McIntyre for bringing to our attention that such an adjustment is necessary to prevent creating an artificial jump in year 2000.)

    Lets all clap at our keyboards for Steve

  19. TCO
    Posted Aug 7, 2007 at 4:17 PM | Permalink

    Wow. Good job, Steve.

    I have not been keeping up with all the ins and outs, but think maybe a more prominent note is needed, if they are actually changing the data set? If a previous corrected version was incorrect and has now been replaced by a modified one? Need to let users of the previous one know that they used an incorrect one?

  20. Douglas Hoyt
    Posted Aug 7, 2007 at 4:25 PM | Permalink

    Perhaps the error that Steve found in the USHCN exists in the GHCN data. That would make a significant impact on global temperatures and trends.

  21. mccall
    Posted Aug 7, 2007 at 4:30 PM | Permalink

    Congratulations on the quick e-catch — words worth a thousand pictures.

  22. James Lane
    Posted Aug 7, 2007 at 4:51 PM | Permalink

    GISS are to be commended for their acknowledgment to Steve, but what an opaque piece of writing. I’ve read it several times, and I can’t figure out what they mean. I hope Steve will be able to translate it for us.

  23. JerryB
    Posted Aug 7, 2007 at 5:19 PM | Permalink

    Re #20,

    Doug,

    The problem that Steve related in this thread is limited to
    GISS usage of USHCN station data.

  24. Steve McIntyre
    Posted Aug 7, 2007 at 5:34 PM | Permalink

    I’ve posted up my letter to GISS and the Hansen-Ruedy reply (copied to Gavin Schmidt.)

  25. TCO
    Posted Aug 7, 2007 at 5:53 PM | Permalink

    Where?

  26. TCO
    Posted Aug 7, 2007 at 5:59 PM | Permalink

    See it. I think it’s fine. You get a BZ for helping science, Steve.

    They don’t want a bunch of crowing about how this changes the overall average. Which is the reason for the other comments.

    Let’s let it go at that.

  27. TCO
    Posted Aug 7, 2007 at 6:03 PM | Permalink

    If this data was used by a lot of other people, the change in correction may need to be stated in a publication. In that case, it may, if central enough, warrant having you as a co-author, given that you “made a significant contribution”. If that causes too much acrimony, since you won’t sign onto other things in their paper, or just don’t play well in the sandbox, it might be right for you to publish on your own.

    All of the above is speculative as I have no idea about the datasets or their usage. I’m not sure if it’s a minor thing like eliminating a miscellaneous bad JCPDF, or if it is a big deal (widely used data for other work).

  28. Steve McIntyre
    Posted Aug 7, 2007 at 6:20 PM | Permalink

    #26,27. TCO, I don’t know whether you remember the Mann corrigendum of 2004 – where Mann admitted a few errors, not on principal components or verification r2 or bristlecones – and said that it didn’t “matter”. Yeah, yeah, he still says that it didn’t “matter”, but I doubt that anyone really feels very confident in his reassurances. My guess is that 0.15 (probably more like 0.18) will affect some of the U.S. hot year rankings a little – not enormously; the 2000s are still warm. Why would you or anyone simply think that this is the last stone to be turned over in this data set?

  29. TCO
    Posted Aug 7, 2007 at 6:22 PM | Permalink

    I don’t think it’s the last stone. Why do you think I think like that? I can disaggregate issues.

  30. TCO
    Posted Aug 7, 2007 at 6:23 PM | Permalink

    Just take your attaboy and be happy, Steve. I don’t give ’em easy.

  31. JerryB
    Posted Aug 7, 2007 at 6:51 PM | Permalink

    Steve,

    Let me add my congratulations. My guess is that the copy to Gavin is due to his having been pelted by RC regulars about your findings.

    Regarding the GISTEMP update:

    Before:

    “Input data for the analysis, collected by many national meteorological services around the world, is the unadjusted data of the Global Historical Climatology Network (Peterson and Vose, 1997 and 1998) except that the USHCN station records included were replaced by a later corrected version.”

    After:

    “Input data for the analysis, collected by many national meteorological services around the world, is the unadjusted data of the Global Historical Climatology Network (Peterson and Vose, 1997 and 1998) except that the USHCN station records up to 1999 were replaced by a version of USHCN data with further corrections after an adjustment computed by comparing the common 1990-1999 period of the two data sets. (We wish to thank Stephen McIntyre for bringing to our attention that such an adjustment is necessary to prevent creating an artificial jump in year 2000.)”

    The phrase “such an adjustment” seems yet to be defined.

  32. Steve McIntyre
    Posted Aug 7, 2007 at 6:59 PM | Permalink

    #30. OK. I’m still annoyed because they still didn’t provide proper provenance for the data before 2000, leaving another guessing game (not that I mind such puzzles.)

  33. TAC
    Posted Aug 7, 2007 at 7:34 PM | Permalink

    SteveM: Excellent!

    Everyone benefits when errors are caught and corrected.

    This reminds me that it’s time to pay another visit to the Tip Jar… 😎

  34. Bob Meyer
    Posted Aug 7, 2007 at 8:16 PM | Permalink

    Steve,

    By now whenever Hansen watches the movie “The Terminator” and hears the line about how the terminator “just won’t stop until you are dead” he must see your face.

    That was one great piece of detective work.

    Thanks.

  35. steven mosher
    Posted Aug 7, 2007 at 8:37 PM | Permalink

    SteveM.

    I’m dense. Did Ruedy say that the mistake was 0.15 C for the US over the period of 2000-2006?

    That kinda renders the silly LLN debates moot.

    The trend for the US is like 0.8 C per century.

    Gosh, an error that big in just 6 years of data.

    So, Anthony has fought global warming and reduced the global temp by 0.01 C by taking pictures.

    What would that reduction cost in carbon credits?

    Hmm, can we sell carbon credits for taking pictures?

  36. Steve McIntyre
    Posted Aug 7, 2007 at 9:00 PM | Permalink

    Yes, it’s 0.15 deg C for the US.

    I’d been looking at GISS adjustments before the surface stations survey got off the ground http://www.climateaudit.org/?p=1142 http://www.climateaudit.org/?p=1139 http://www.climateaudit.org/?p=1175 , and some of this analysis comes from continuing to poke at that data. The surface stations survey was a fantastic benefit to that analysis since it sharpened up sites that needed to be examined. So you notice things comparing Tucson to Grand Canyon that wouldn’t necessarily turn up in an abstract setting.

  37. steven mosher
    Posted Aug 7, 2007 at 9:58 PM | Permalink

    RE 30. I think the Goddard guys deserve some credit for doing the right thing.

    I posted a thank you to Gavin and crew on RC.

    Josh Halpern and others made a big deal (rightly) about a mistake that Kristin Byrnes made on a graph. Some here advised her to correct her error (science is a history of being wrong). She did.

    And some gloated. I would advise against this. Goddard made a mistake. It was pointed out.
    They fixed it. Congratulations on all sides.

    Next?

  38. JS
    Posted Aug 7, 2007 at 10:04 PM | Permalink

    Does this mean that 2005 will no longer be the warmest year on record?

  39. Lee
    Posted Aug 7, 2007 at 10:34 PM | Permalink

    [snip- I scrubbed something not because it was an error but because it was off topic and bickering. I’ve been trying to improve the threads by doing this and most readers appreciate the intervention with bickering people.]

  40. Alan Woods
    Posted Aug 7, 2007 at 10:37 PM | Permalink

    I agree with Lee. Steve, why do this?

    [Steve – because bickering and off-topic postings occasionally swamp the threads. I don’t always prune things but I’ve been trying to do so recently due to an increased activity of bickering. From my perspective, it’s helped as the parties involved are less likely to post bickering posts if they are regularly pruned.]

  41. Posted Aug 8, 2007 at 5:39 AM | Permalink

    This is what the contiguous U.S. GISS temperature history looks like now:

    Warmest year now is 1934, followed by 1998, 2006, 1921 and 1931:

    http://data.giss.nasa.gov/gistemp/graphs/Fig.D.txt

    Will NOAA issue a press release to correct the one where they announced 2006 as the warmest year on record for the US?

    http://www.noaanews.noaa.gov/stories2007/s2772.htm

    Congratulations Steve!

  42. JerryB
    Posted Aug 8, 2007 at 6:22 AM | Permalink

    Thanks Mikel for posting those links.

    GISS has moved very quickly.

    Steve,

    GISS input files for USHCN stations have been updated with
    new numbers. Lots of new numbers.

    Whether the changes that have already been made will stay,
    or whether there may be further revisions, time will tell.

  43. Steve McIntyre
    Posted Aug 8, 2007 at 8:31 AM | Permalink

    #42. Jerry, can you give me some particulars of changes and updates that you’ve noticed?

    They have already changed their US (Figure D) numbers online without any preservation of the old numbers. By sheer chance, I happened to have the old information sitting in my active R session; I hadn’t saved them or planned to save them (I have now) and will post them.

  44. Steve McIntyre
    Posted Aug 8, 2007 at 8:32 AM | Permalink

    #41. The NOAA calculation is different from the GISS calculation. It has its own set of hair on it.

  45. JerryB
    Posted Aug 8, 2007 at 8:37 AM | Permalink

    Steve,

    I looked at three of the stations that I checked a few days ago, and all three have completely new pre-2000 numbers in the GISS “raw” files.

    Station Name     USHCN Number   GHCN Number
    CHEYENNE WELLS   051564         42572465001
    ENOSBURG FALLS   432769         42572612001
    HOPEWELL         444101         42572401001

  46. Steve McIntyre
    Posted Aug 8, 2007 at 9:48 AM | Permalink

    #45. I checked Hopewell and I agree. Jeez, they’ve been crazy busy the last couple of days. I’m not sure what they’re doing but they’re really going at it fast. If Hopewell VA is typical, they’ll have changed all the GISS raw and GISS adjusted versions in the U.S. before 2000.

    I think that they are trying to do things too fast without thinking it through. If this is what they’ve done (and I’m not sure yet), the pre-2000 GISS raw (which was fairly stable) has been changed into pre-adjusted versions that now don’t track to original sources, whatever those sources were.

    My, my…

    If it were me in their shoes, I’d have kept the pre-2000 data intact and adjusted the post-2000 data. Far too many changes in what they’re doing. But it will take a couple of days to assess the situation.

  47. JerryB
    Posted Aug 8, 2007 at 9:59 AM | Permalink

    I’ve since checked Port Angeles, Eads, Hammon, Lakin, Boulder, and Kanab, all of which have USHCN adjustments, and all have new pre-2000 numbers at GISS.

    My guess is that they will change, or have already changed, the pre-2000 numbers for all USHCN stations with non-trivial adjustments.

    Perhaps this is a temporary measure until USHCN version 2 is final, but that’s just a wild guess.

  48. pk
    Posted Aug 8, 2007 at 10:21 AM | Permalink

    What will the new numbers do to the model fit?

  49. Steve McIntyre
    Posted Aug 8, 2007 at 11:08 AM | Permalink

    Here’s something interesting. If you compare “old” Hopewell VA numbers (fortunately preserved due to my much-criticized “scraping” of GISS data) to the “new” Hopewell VA numbers, the GISS “raw” data for, say, June 1934 or June 1935 has gone up by 0.7 deg C, while the GISS “adjusted” data has gone up by only 0.1 deg C. So in some cases, their “UHI” adjustment as applied offsets what was a programming error. Makes you wonder about the validity of the UHI adjustment.

    BTW, as Jerry previewed, their US data set is now a total mess. Everything’s been written over prior to 2000.
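    For anyone wanting to replicate the comparison, a minimal R sketch (the data frames and column names are hypothetical):

    # 'old' holds the scraped pre-rewrite series, 'new' the current GISS
    # files for one station, each with columns year, month, raw, adj.
    m <- merge(old, new, by = c("year", "month"), suffixes = c(".old", ".new"))
    m$d_raw <- m$raw.new - m$raw.old   # change in GISS "raw"
    m$d_adj <- m$adj.new - m$adj.old   # change in GISS adjusted
    subset(m, year %in% c(1934, 1935) & month == 6)  # e.g. the June values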

  50. J Edwards
    Posted Aug 8, 2007 at 11:24 AM | Permalink

    Steve, would this latest mess be grounds for a new FOIA request for all “adjustment” methods including source code? Somehow we need a little “sunshine” on just what is actually being done to the data.

  51. Steve McIntyre
    Posted Aug 8, 2007 at 11:30 AM | Permalink

    I’ve requested information through an email (this will be my 3rd request) but FOI time is close.

  52. Posted Aug 8, 2007 at 11:42 AM | Permalink

    #46 Steve

    So what’s so important about the post-2000 data that they’d rather adjust the pre-2000 data instead?

  53. J Edwards
    Posted Aug 8, 2007 at 12:14 PM | Permalink

    #52 I think it has to do with normalization of the surface data with the satellite data. At least that would make sense to me.

  54. Dave Dardinger
    Posted Aug 8, 2007 at 12:14 PM | Permalink

    re: 52 Kevin,

    I think that it seems necessary to the Climateers to always keep the current measured temperatures in sync with the current derived temperatures. Therefore they have to periodically readjust to make present average temperatures match.

    To be fair, if they did it the opposite way they’d be jeered at just as loudly, if not more so. Still, if one is looking at the thermometer outside or at a weather report in a paper from 1930, it will make one pause if the scientists say the temperature is different from what it is/was.

  55. Posted Aug 8, 2007 at 12:21 PM | Permalink

    There’s an interesting read from earlier this year over at Open Mind on this subject.

    Note the number of things that we were assured could not occur but have in fact now occurred.

    BTW, does anyone know the actual facts regarding the TOB correction and the temperature data outside the US? I get the impression that the TOB “correction” has an important effect and that it is not applied to non-US data? Has the TOB model been updated and re-validated as new data have become available?

    All this talk about “corrections” has made an incorrect application of the word into readily accepted SOP. We can now make “corrections” when the correct answer is not known.

  56. JerryB
    Posted Aug 8, 2007 at 1:39 PM | Permalink

    Based on data for the few stations for which I have before and after copies, I think I see what the new adjustments to the adjustments are, but I don’t believe what I think I see.

    It appears that whatever the adjustments were for the months of 1999 will get backed out for 1999, and for all previous years.

    As mentioned, this opinion is based on a very small number of stations. Steve has much more before data, and may find that what I think I see is not supported by that additional data.

  57. JerryB
    Posted Aug 8, 2007 at 2:46 PM | Permalink

    Steve,

    Reviewing Reto Ruedy’s note to you, I would revise my interpretation in light of his statement:

    “When we did our monthly update this morning, an offset based on the last 10 years of overlap in the two data sets was applied and our on-line documentation was changed correspondingly with an acknowledgment of your contribution.”

    I would say that the average adjustments of 1990 through 1999 would be what gets backed out (“offset”) of 1999 and preceding years. For most stations, the adjustments will be the same for each of those years, but for some stations the adjustments may have changed.

    I would say that such an adjustment would be a temporary measure. Whatever is its rationale would not survive much of a critique.
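    In rough R terms, that reading of the fix amounts to something like the following sketch (vector names are hypothetical; ushcn_adj is the corrected pre-2000 source, ghcn_raw the post-2000 source, yr the year of each monthly value):

    # Back the mean 1990-1999 difference out of 1999 and all earlier years,
    # so the pre-2000 data line up with the post-2000 GHCN raw data.
    offset   <- mean((ushcn_adj - ghcn_raw)[yr %in% 1990:1999], na.rm = TRUE)
    giss_new <- ifelse(yr <= 1999, ushcn_adj - offset, ghcn_raw)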

  58. Kenneth Fritsch
    Posted Aug 8, 2007 at 6:12 PM | Permalink

    RE: #55

    Note the number of things that we were assured could not occur but have in fact now occurred.

    I think the writer is conjuring up some straw men from CA. He treats the adjustment methods as being neutral on the issue of temperature trends, and given their underlying assumptions he is correct. Unfortunately, what he and many of the defenders of Hansen’s methods here at CA fail to address is the assumptions. It is as if they do not want to dig any further than the method itself allowed, and that makes for a very dull analysis.

  59. Sam Urbinto
    Posted Aug 8, 2007 at 7:32 PM | Permalink

    #55 Dan, it’s not just the best estimates thread; there’s also a brouhaha of sorts in the surface stations thread.

    Liling calls 20% of the 125-year trend in just 6 years “a glitch”, but rather than deal with that, dhogaza complains about Dr. McIntyre instead!!

    That glitch pales compared to McIntyre’s outright dishonesty about other issues, and Hansen’s graceful acceptance of the correction to the data analysis contrasts greatly with McIntyre’s unwillingness to acknowledge his own (frequent) errors.

    Dano, Guthrie, Bloom, Boris and others are up to the usual tricks also. Maybe I should stop by.

  60. Sam Urbinto
    Posted Aug 8, 2007 at 8:15 PM | Permalink

    It seems he stopped doing that WE….

    On the other hand, I made a comment over there at Tamino’s blog, but forgot to properly enclose my link tags.

    I’m sure nothing I ever do or say will ever be taken seriously ever again.

    I mean that’s obviously far worse than messing up 6 years of data after all.

  61. David Smith
    Posted Aug 10, 2007 at 9:46 PM | Permalink

    I plotted the annual GISS temperature versus the satellite-derived lower troposphere temperature (RSS) for the US. The resulting plot is here. The GISS temperatures include the recent adjustment.

    The periods where GISS surges ahead of the satellite record appear to be associated with times of El Ninos, perhaps involving changes in precipitation. The record is muddied by major volcanoes in the early 1980s and early 1990s and any correlation is weak, but it’s my best guess.

    The year 2006 looks like an outlier. It was about neutral on ENSO (a weak La Nina and a weak El Nino occurred), so why the apparent surge? It looks odd.

    Prior to the recent GISS adjustments the 2000-2006 period stood out as an odd period. Now, with the adjustments, GISS in the 2000s looks much closer to the satellite record (except for 2006).

  62. David Smith
    Posted Aug 11, 2007 at 7:28 AM | Permalink

    I revised the GISS versus satellite plot to include the old (incorrect) GISS numbers. The plot is here.

    Note how the old GISS values split away from the satellite record in 2000 – in retrospect it seems like that should have been a caution flag that something was amiss.

    It looks like something is still amiss with 2006.

    The most intriguing thing about the comparison to me is the larger year-to-year swings for GISS than for the satellite record. If the satellite record rises or falls by X then GISS moves by, say, 1.3X. I’ll quantify that later. Remarkably, that pattern of exaggerated swings seems to have broken down in 2000, even with the revised data.

    A possible natural explanation is that it reflects the difference between years in which the US receives cold wintertime Arctic air and the years (typically El Nino years) when the US doesn’t get the bitter cold air. The really cold Arctic air is only in the lowest regions of the atmosphere (below say 5,000 feet), which GISS would fully see, while the satellite also sees air above 5,000 feet and averages that “warmer” upper air with the cold surface air.

    Or, maybe it’s another data weirdness.
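    One way to quantify the swing ratio in R (a sketch; giss and rss are hypothetical vectors of annual US anomalies on a common set of years):

    fit <- lm(diff(giss) ~ diff(rss))  # regress year-to-year changes
    coef(fit)[2]                       # a slope near 1.3 would match the eyeballed ratio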

  63. Steve McIntyre
    Posted Aug 11, 2007 at 8:20 AM | Permalink

    #65. David, that’s a nice plot. I mentioned that I thought that there was still some air in the GISS 2006 numbers and this is another indication. There’s a major difference in data provenance in 2006: GISS only uses USHCN data up to March 2006 in their US numbers; they have a population of non-USHCN sites: primarily airports (which dominate the current data in all the indices). There is more up-to-date USHCN data available (up to late 2006), but, for some reason, GISS has not included it. My guess is that inclusion of the USHCN data will lower the final US number (based on this graphic).

  64. jae
    Posted Aug 11, 2007 at 8:56 AM | Permalink

    Isn’t it odd that the peaks for the GISS graph are much higher than for the satellite data, whereas there is no difference in the “troughs”? Why would GISS show higher highs, but almost identical lows?

  65. Posted Aug 11, 2007 at 11:03 AM | Permalink

    #34 Bob Meyer,

    I think “Butch Cassidy…” is more apt: “who are those guys?”

    #46 Steve McIntyre,

    You may have induced the “climate scientists” to commit a forced error.

  66. David Smith
    Posted Aug 11, 2007 at 1:05 PM | Permalink

    Re #65 The GISS and satellite US anomalies for 1979-1999 correlate at r=0.95, which is impressive. But 2000-2006 drops to a mediocre r=0.53 (only seven years of data, of course).

    jae, my guess is that the amplitude of the GISS peaks and valleys is driven by ENSO and reflects the fact that the satellites and surface stations are measuring somewhat different things.

    If the ENSO connection is true then confirming evidence should show up in the winter anomaly data. Unfortunately I have not found monthly anomaly GISS data for the US, only the annual numbers.
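    For reference, the subperiod correlations in R, with the same hypothetical giss, rss and year vectors as in the sketch above:

    early <- year >= 1979 & year <= 1999
    late  <- year >= 2000 & year <= 2006
    cor(giss[early], rss[early])  # about 0.95
    cor(giss[late],  rss[late])   # about 0.53, on only seven points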

  67. J Weimer
    Posted Aug 18, 2007 at 11:09 AM | Permalink

    Is there an elephant in this room? I find this discussion of statistical methodology fascinating, but somewhat reminiscent of rearranging the deck chairs on the Titanic. Pardon me, but I’m just an innocent interloper in this discussion. I undoubtedly don’t appreciate the finer points of statistical significance, yet with a few minutes of investigation, I’ve learned some things that are absolutely stunning, but have gone totally unmentioned in the blog dialogue, or the popular media.

    (1) When GISS talks about a “Surface Temperature Anomaly”, they are not speaking about the surface of the earth, but rather about the tiny bottom layer of the air. But the air moves up and down in the atmosphere, so how can you estimate the heat content of the atmosphere by only looking at the bottom 6 feet?

    (2) “Anomaly” means relative to the average temperature from 1951 to 1980 (a period of relative stability). That means that the 0.6 C temperature rise has occurred in the past 25 years — since 1980, not since 1880, which is how the popular press reports it. They talk about temperatures rising over the past century and a half. What this data shows is far more drastic than anything they report.

    (3) As startling as that revelation was to me, I saw something even more stunning when I looked at the GISS 2005 Summation. The global temperature rise is not evenly spread over the planet. It is highly concentrated in the North polar region, where it is actually 5 times greater than the average global rise — 2.5 to 3 C in the last 25 years! That observation should be hugely distressing but it is never the focus of the popular press coverage. They are much more concerned with the conflict and controversy over whether 1934 was a few tenths of a degree warmer than 2005, not whether the polar region has warmed 5.4 degrees F in the past 25 years.

    I say the warming of the Arctic should be hugely distressing because that is the most dangerous region on the planet to experience a temperature rise. It is the one region where a temperature rise will create extreme and irreversible positive feedback. Wherever the North polar ice cover is removed to expose the ocean and the tundra, the sun’s energy will be absorbed at 10 times the rate, and the thawing tundra will release methane with 20 times the GHG effect of CO2. Any temperature rise in the Arctic will be amplified, accelerate exponentially, and become irreversible. In the face of this looming planetary catastrophe, what does it matter if the U.S. lower 48 was a fraction of a degree hotter in 1934 or 2005?

    (4) I suppose that temperature records are the only data we have that allow us to look back a century or more, but it seems to me that total atmospheric heat energy is what we’re really after, not temperature. That would require us to also know the density and moisture content of the air for every temperature measurement. It would also require us to estimate the total air mass assumed to be associated with each discrete point and time where we have a temperature reading. Can someone please explain to me how that could possibly be calculated in a moving dynamic atmosphere, with a set of data points that must have changed in both quantity and quality over the past 125 years?

    Perhaps we should be less concerned with identifying trends over past centuries, where we are bound to have all kinds of data problems. We could focus more of our concern on the last 25 years, where we have identified a real and dangerous problem, and where our data is likely to be much more complete, consistent, and reliable. However, to get any media attention there has to be controversy and conflict, so perhaps Stephen M could highlight how correcting Hansen’s Y2K error has reduced the 25-year warming of the Arctic from 5.4 degrees (F) to 5.35 degrees (just guessing)– if, indeed, the correction of post-2000 mainland U.S. temperatures has any effect at all on the Arctic measurements. (I suggest using Fahrenheit because the general American audience of the popular media can understand a 25-year warming of 5 degrees F much better than a 125-year warming of 0.6 degrees C, which is what we’ve been told so far.)

  68. John F. Pittman
    Posted Aug 18, 2007 at 12:23 PM | Permalink

    #73
    Near surface anomalies were chosen as a proxy for global temperature. The methodology of this proxy is being challenged. The basic argument concerns what the cause is of phenomena such as melting Arctic areas, rather than whether they are occurring. Since about 1990, most of the emphasis and claims have been that the temperature rise is due not just to man, but in particular to manmade emissions, especially CO2. Yes, it is atmospheric heat, as you call it. However, the temperature anomaly is used as a proxy for this. The real issue concerns whether man is the cause or not. Many want to do something about climate change or at least prevent an assumed climatic catastrophe. But to do something usually means you have to know the cause of the problem or what a cure consists of. This is the nature of our discussion on CA. Others see those who question as “denialists”. Their claim is that we must do something now, or we kill Terra. Many who post here have problems with the assumptions that have to be accepted for such a position, such as that it is definitely manmade CO2. Of course, many who post here are interested in showing us the error of our thoughts. Your comment:

    Perhaps we should be less concerned with identifying trends over past centuries, where we are bound to have all kinds of data problems. We could focus more of our concern on the last 25 years, where we have identified a real and dangerous problem, and where our data is likely to be much more complete, consistent, and reliable.

    As I stated above, it is hard to do as you suggest unless you know the cause or have a cure, both of which are being challenged.

  69. Kenneth Fritsch
    Posted Aug 18, 2007 at 3:01 PM | Permalink

    The article linked here and coauthored by Pielke, Sr. expresses in its conclusion what I have been repeatedly attempting to say about the importance of quality control problems exposed here at CA and SurfaceStations.

    (Link: R-318.pdf)

    CONCLUSIONS. As Davey and Pielke (2005) documented and Peterson (2006) acknowledges, several USHCN stations are poorly sited or have siting conditions that change over time. These deficiencies in the observations should be rectified at the source, that is, by correcting the location and then ensuring high-quality data that are locally and, in aggregate, regionally representative. Station micrometeorology produces complex effects on surface temperatures, however, and, as we show in this paper, attempting to correct the errors with existing adjustment methods artificially forces toward regional representativeness and cannot be expected to recover all of the trend information that would have been obtained locally from a well-sited station.

    The comparison of the reanalysis with the unadjusted and adjusted station data indicates that the reanalysis can be used to detect the inhomogeneity of individual station observations resulting from nonclimatic biases. In general, the adjustments indeed correct a large portion of nonclimatic biases in these poorly sited stations as far as the difference between the NARR/NNR and station data is concerned. The NNR yields a relatively uniform and statistically significant trend in this region, which is statistically similar to two of the four station trends. However, we found that there are some inconsistencies in the trends of the adjusted data. Among the four stations that have been subjected to adjustments, only the adjusted trend at Lamar is consistent with the NNR trend (being statistically similar). The other three adjustments either make the consistent trend (Cheyenne Wells) statistically inconsistent, produce a statistically significant larger trend than for the surrounding stations (Las Animas), or cause little change in the trend (Eads). This leads us to conclude that, whereas the adjustments do improve the consistency among the nearby station data and reduce the differences with respect to the reanalysis at the monthly and yearly scales, the trends of the adjusted data are often inconsistent among closely located stations.
    Peterson’s approach and conclusions, therefore, provide a false sense of confidence with these data for temperature change studies by seeming to indicate that the errors can be corrected. For instance, the dependence of the corrections on other information (such as regional station moves, which in itself has been found on occasion to be inaccurate) can be considered an indication of the uncertainty and limitations of the “corrective approach” that is being sought. As a requirement, the statistical uncertainty associated with the effect of the adjustments on the regional temperature record needs to be quantified and documented.
    Temperature adjustments such as those resulting from change in instrumentation are, of course, necessary. However, the results shown in this paper demonstrate that the lack of correctly and consistently sited stations results in an inherent uncertainty in the datasets that should be addressed at the root, by documenting the micrometeorological deficiencies in the sites and adhering to sites that conform to standards such as the Global Climate Observing System (GCOS) Climate Monitoring Principles (online at http://gosic.org/GCOS/GCOS_climate_monitoring_principles.htm). A continued mode of corrections using approaches where statistical uncertainties are not quantified is not a scientifically sound methodology and should be avoided, considering the importance of such surface station data to a broad variety of climate applications as well as climate variability and change studies.

    From the USHCN itself, we see from the excerpt below the special problem that undocumented changes (unknown noncompliances) can present to any adjustment scheme. It is interesting, however, that USHCN and NOAA documents and scientists, like the poster here at CA, Lee, make numerous references to quality control as the process for (attempted) extraction of valid data from poorly obtained collections. They spell out what they see as an optimum site for collecting temperature measurements, but on further reading it is clear that a true proactive quality control process is not in place. Below also is the process used for detecting and adjusting for undocumented changes. I would dearly like to find a statistically versed person who could determine, if nothing more than qualitatively, the assumptions that are made about the compliance of stations in general in using the adjustment processes.

    http://www.ncdc.noaa.gov/oa/climate/research/ushcn

    The potential for undocumented discontinuities adds a layer of complexity to homogeneity testing. Tests for undocumented changepoints, for example, require different sets of test-statistic percentiles than those used in analogous tests for documented discontinuities (Lund and Reeves, 2002). For this reason, tests for undocumented changepoints are inherently less sensitive than their counterparts used when changes are documented. Tests for documented changes should, therefore, also be conducted where possible to maximize the power of detection for all artificial discontinuities. In addition, since undocumented changepoints can occur in all series, accurate attribution of any particular discontinuity between two climate series is more challenging (Menne and Williams, 2005).
    The USHCN Version 2 homogenization algorithm addresses these and other issues according to the following steps. At present, only temperature series are evaluated for artificial changepoints.
    1. First, a series of monthly temperature differences is formed between numerous pairs of station series in a region. The difference series are calculated between each target station series and a number (up to 40) of highly correlated series from nearby stations. In effect, a matrix of difference series is formed for a large fraction of all possible combinations of station series pairs in each localized region. The station pool for this pairwise comparison of series includes U.S. HCN stations as well as other U.S. Cooperative Observer Network stations.
    2. Tests for undocumented changepoints are then applied to each paired difference series. A hierarchy of changepoint models is used to distinguish whether the changepoint appears to be a change in mean with no trend (Alexandersson and Moberg, 1997), a change in mean within a general trend (Wang, 2003), or a change in mean coincident with a change in trend (Lund and Reeves, 2002) . Since all difference series are comprised of values from two series, a changepoint date in any one difference series is temporarily attributed to both station series used to calculate the differences. The result is a matrix of potential changepoint dates for each station series.
    3. The full matrix of changepoint dates is then “unconfounded” by identifying the series common to multiple paired-difference series that have the same changepoint date. Since each series is paired with a unique set of neighboring series, it is possible to determine whether more than one nearby series share the same changepoint date.
    4. The magnitude of each relative changepoint is calculated using the most appropriate two-phase regression model (e.g., a jump in mean with no trend in the series, a jump in mean within a general linear trend, etc.). This magnitude is used to estimate the “window of uncertainty” for each changepoint date since the most probable date of an undocumented changepoint is subject to some sampling uncertainty, the magnitude of which is a function of the size of the changepoint. Any cluster of undocumented changepoint dates that falls within overlapping windows of uncertainty is conflated to a single changepoint date according to
    1. a known change date as documented in the target station’s history archive (meaning the discontinuity does not appear to be undocumented), or
    2. the most common undocumented changepoint date within the uncertainty window (meaning the discontinuity appears to be truly undocumented)
    5. Finally, multiple pairwise estimates of relative step change magnitude are re-calculated at all documented and undocumented discontinuities attributed to the target series. The range of the pairwise estimates for each target step change is used to calculate confidence limits for the magnitude of the discontinuity. Adjustments are made to the target series using the estimates for each discontinuity.
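    To make steps 1 and 2 above concrete, here is a toy R sketch of the pairwise approach. It substitutes a crude maximum t-statistic scan for the formal changepoint tests (Alexandersson, Wang, Lund and Reeves) that the USHCN algorithm actually uses, and all object names are hypothetical:

    # Locate the most likely single mean shift in one difference series.
    best_break <- function(d) {
      n <- length(d)
      stat <- rep(NA_real_, n)
      for (k in 10:(n - 10)) {              # keep 10 values on each side
        a <- d[1:k]; b <- d[(k + 1):n]
        se <- sqrt(var(a) / length(a) + var(b) / length(b))
        stat[k] <- abs(mean(b) - mean(a)) / se
      }
      which.max(stat)                       # index of the candidate break
    }

    # Steps 1-2: difference series against up to 40 best-correlated
    # neighbours, then a changepoint scan on each difference series.
    pairwise_breaks <- function(target, neighbours) {
      r   <- cor(neighbours, target, use = "pairwise.complete.obs")
      top <- order(r, decreasing = TRUE)[1:min(40, ncol(neighbours))]
      sapply(top, function(j) best_break(target - neighbours[, j]))
    }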

  70. John F. Pittman
    Posted Aug 21, 2007 at 10:27 AM | Permalink

    I have read this 6 times now. I cannot see that it would do anything other than detect large errors. A small trend would not necessarily, in fact would be quite unlikely to, be corrected. Looking at the way that they set up the matrix, it would appear that a rural-dominated matrix would adjust urban down and, to a lesser degree, urban up. In an urban-dominated matrix, it would have the effect of adjusting the rural up with a small decrease in the urban. The two statements are based on the assumption of urban heat islands. Thus I would assume the reason UHI was not detected using this scheme is that it has been included as part of the normalization and homogenization steps.

  71. Posted Aug 21, 2007 at 10:33 AM | Permalink

    Translation to English from the posts 76 and 77:

    “Esto gracias a Steven McIntyre quien encontró un bug en los datos de la NASA (en los cuales se basan todas las publicaciones serias) debido al error del Y2K.)”

    Translation: “This was possible thanks to Steven McIntyre, who found a bug in NASA data (on which all serious publications are based) due to the Y2K error).”

  72. steven mosher
    Posted Aug 21, 2007 at 1:11 PM | Permalink

    re 75

    see this http://www.climahom.eu/software/docs/Prezentation_SW.pdf

  73. Jim Edwards
    Posted Aug 21, 2007 at 5:17 PM | Permalink

    Steve M and all readers:

    I suppose the horse has left the barn and already entered the glue factory, but can we not call Hansen’s error a Y2K problem?

    Calling it a Y2K error, while funny, will lead the lay reader to assume that the error lies at Microsoft’s feet, rather than NASA’s.

  74. Posted Aug 31, 2007 at 7:13 AM | Permalink

    Today, global warming news is very dangerous news for life on Earth. This global warming issue now looms large for the world. Now we are aware of this issue.

  75. Sam Urbinto
    Posted Aug 31, 2007 at 10:45 AM | Permalink

    Or how about this:

    http://earthobservatory.nasa.gov/Newsroom/NasaNews/2002/200201317366.html

  76. Posted Oct 29, 2007 at 11:11 AM | Permalink

    Thanks to Steve, we crazy Germans are also learning more and more about the climate … [snip] The best part is quoted (in German) here. [snip]

    My tip: Google can translate it.

    Rgds

    Konrad
    (Germany)

  77. Mike A.
    Posted Feb 11, 2008 at 11:48 AM | Permalink

    This stuff is practically Greek to me, but thank you Steve for finding the problem. I am currently experiencing a heated argument with another poster on another website concerning Global warming and the validity of the IPCC report and its followers and their agendas, etc. Whilst I am a mere average citizen, he is a reporter for a newspaper in Canada and his ego knows no bounds. I wish some of you smart guys could come and put him to sleep. It would help. Thanks for the interesting comments.

  78. Mike J.
    Posted Feb 22, 2008 at 3:25 PM | Permalink

    Mike A. – link?

18 Trackbacks

  1. […] example started with this, which led to this, and leading finally to […]

  2. […] notified the pair of the bug; Ruedy replied and acknowledged the problem as an “oversight” that would be fixed in the next data […]

  3. By The Baltimore Reporter on Aug 10, 2007 at 8:06 PM

    […] notified the pair of the bug; Ruedy replied and acknowledged the problem as an "oversight" that would be fixed in the next data […]

  4. […] notified the pair of the bug; Ruedy replied and acknowledged the problem as an “oversight” that would be fixed in the next data […]

  5. […] notified the pair of the bug; Ruedy replied and acknowledged the problem as an “oversight” that would be fixed in the next data […]

  6. By Global Warming is a YK2 Bug at InMuscatine on Aug 17, 2007 at 10:46 AM

    […] notified the pair of the bug; Ruedy replied and acknowledged the problem as an “oversight” that would be fixed in the next data […]

  7. […] was due to the famous Y2K bug in handling the raw data. McIntyre notified NASA of the bug, and they acknowledged the error as an “oversight” that would be fixed when the data were next updated. The […]

  8. […] thanks to Steven McIntyre, who found a bug in NASA data (on which all serious publications are based […]

  9. By Global Warming : Resources on Sep 1, 2007 at 9:44 PM

    […] the data from ground temperature stations. Steve McIntyre did find a glaring error in the data for US Temps over the last 100 years, removing 1998 as the hottest year. This site is often cited by skeptics of […]

  10. By The Baltimore Reporter on Nov 4, 2007 at 9:54 PM

    […] to be a Y2K bug in the handling of the raw data. McKintyre notified the pair of the bug; Ruedy replied and acknowledged the problem as an “oversight” that would be fixed in the next […]

  11. […] Hansen of NASA (who is apparently unfazed and unrepentant in the face of his recently revealed Y2K error). Even certain faux market environmentalists are following Hansen’s lead. The strategic […]

  12. […] August 6 (23:19 Eastern), I published my own first estimate of the impact of the error in the post Quantifying the Hansen Y2K Error. I showed a bimodal distribution of the step discontinuities and that the distribution was not […]

  13. […] errors lead to positive steps. There is a bimodal distribution of errors reported earlier at CA here , with many stations having negative steps. There is a positive skew so that the impact of the step […]

  14. […] in 2007 of their “Y2K error” and they changed their data accordingly with Reto Ruedy sending me the following email: When we did our monthly update this morning, an offset based on the last 10 […]

  15. […] in 2007 of their “Y2K error” and they changed their data accordingly with Reto Ruedy sending me the following email: When we did our monthly update this morning, an offset based on the last 10 […]

  16. By Bureaucracy: The Enemy Within on Feb 7, 2011 at 5:24 PM

    […] Or the web page of Steve McIntyre at Climate Audit, which has many examples – including the one that led to the claim that 1998 was the warmest year in the US when it was actually 1934. […]

  17. By Michelle Malkin » Hot news: NASA quietly fixes flawed temperature data; 1998 was NOT the warmest year in the millenium on Sep 13, 2011 at 2:49 PM

    […] further refines his argument showing the distribution of the error, and the problems with the USHCN temperature data. He also sends an email to NASA GISS advising of […]

  18. […] graphs had been provided to show the magnitude of the effect” is false. In one of my original posts on the matter, I showed graphics estimating the impact of the error on the U.S. temperature record […]