Hello, Viet Nam!

Hey, NOAA, if you’re not too busy deleting data, you can drop in and say hello.


  1. Deep Climate
    Posted Nov 20, 2008 at 11:23 PM | Permalink

    Same comment as 104 in Finland thread, ‘cos it seems to belong here.

    Have you tried this file?


    And have you read this description?

    Sometimes the front door works better.

    So let’s see … I showed that you were using the wrong dataset, and now I think I’ve pointed you to the right dataset.

    You’re welcome. Later …

    • Posted Nov 21, 2008 at 5:33 AM | Permalink

      Re: Deep Climate (#1),

      Have you tried this file?

      That file seems to contain *a lot* of -9999s (it looks like there are 3,184,980 missing values, but my count might be wrong). It seems to have been the output of a custom data extraction job for Vose. Given that it is not mentioned anywhere, I am guessing it is not relevant to what Steve is trying to do.

      It is entirely possible that Steve used the wrong file. However, for that claim to have credibility, you’ll need to point out the right file.

      At this point, it is more likely that the README file was hopelessly out of sync with what they were actually posting on the FTP server.

      — Sinan

  2. Steve McIntyre
    Posted Nov 20, 2008 at 11:29 PM | Permalink

    You’ve demonstrated nada. The RVose file is from last February and is irrelevant.

    I corresponded with Vose today and in that email he didn’t know what the problem was.

    Please stop congratulating yourself prematurely, DC. If you show a data set with valid October data fine, but so far you’ve thrown spitballs. Please check out the data.

  3. ChrisJ
    Posted Nov 21, 2008 at 12:31 AM | Permalink

    Sure are (were?) a lot of decoy files. Oh well. Hang in there. -chris

  4. Deep Climate
    Posted Nov 21, 2008 at 12:44 AM | Permalink

    Yes, I see that it has no data for 2008, so I withdraw that last one.

    But I think you still need to consider that a file identified as FD might actually be FD, and one described as anomaly would contain anomalies. Right? Why not try it for the gridcell in question?

    • James Lane
      Posted Nov 21, 2008 at 1:44 AM | Permalink

      Re: Deep Climate (#4),

      But I think you still need to consider that a file identified as FD might actually be FD, and one described as anomaly would contain anomalies. Right? Why not try it for the gridcell in question?

      Hey DC. Here’s an idea, why don’t you try it and report back?

    • KimberleyCornish
      Posted Nov 21, 2008 at 5:24 AM | Permalink

      Re: Deep Climate (#4),

      Would you mind being a little more specific about precisely what you are withdrawing? It seems to me that you accused Steve of using the wrong data file, then you referenced what you mistakenly thought was the correct file that Steve SHOULD have used, only to find that what you referenced was dodgy. Is this a correct account of it? One would think that if you can produce the file you think Steve should have used, then discussion can proceed. If you can’t, then your credibility is somewhat suspect. The file removal shenanigans are not something either innocent or separate from your participation in this blog discussion. That is to say, please either put up or shut up. That way bandwidth is not wasted, but made available for those who want to engage in serious discussion.

    • RomanM
      Posted Nov 21, 2008 at 11:27 AM | Permalink

      Re: Deep Climate (#4)

      But I think you still need to consider that a file identified as FD might actually be FD, and one described as anomaly would contain anomalies. Right? Why not try it for the gridcell in question?

      Why is it that the most arrogant posters are also usually the least likely to check that their statements are at least based on correct facts? First, read the readme file carefully (bold mine):

      These data sets contain gridded temperature anomalies for three parameters (mean, minimum, and maximum temperatures) from the GHCN V2 monthly temperature data sets.

      There are two options: grid_1880_YYYY.dat.gz, where YYYY is the current year and the values are calculated using the first difference method, and anom-grid-1880-current.dat.gz which uses the “anomaly” method.

      We calculated these anomalies with respect to the period 1961 – 1990 using the First Difference Method, an approach developed to maximize the use of available station records (see, e.g., Peterson et al., 1998, ‘The First Difference Method: Maximizing Station Density for the Calculation of Long-term Global Temperature Change’, Journal of Geophysical Research). The First Difference Method involves calculating a series of calendar-month differences in temperature between successive years of station data (FD_yr = T_yr – T_yr-1). For example, when creating a station’s first difference series for mean February temperature, we subtract the station’s February 1880 temperature from the station’s February 1881 temperature to create a February 1881 first difference value. First difference values for subsequent years are calculated in the same fashion by subtracting the station’s preceding year temperature for all available years of station data.

      For each year and month we sum the ‘first difference’ value of all stations located within the appropriate 5 X 5 degree box and divide by the total number of stations in the box to get an unweighted first difference value for each grid box. We then calculate a cumulative sum of these gridded first difference values for all years from 1880 to 1998 to produce a time series for each grid box. The cumulative sum is calculated for each grid box and each month of gridded first difference data independently through time. Each grid box time series is then adjusted to create anomalies with respect to the period 1961 – 1990.
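      The README’s recipe is easy to sketch numerically. Here is an illustrative Python sketch of the first-difference gridding (made-up station values for a single grid box – not NOAA’s code or data):

```python
import numpy as np

# Hypothetical October mean temperatures (deg C) for two stations sharing
# one 5x5 degree grid box, for six consecutive years; np.nan = missing.
station1 = np.array([2.0, 1.5, 3.0, 2.5, np.nan, 3.5])
station2 = np.array([1.0, 0.5, 2.0, 1.5, 2.0, 2.5])

# Step 1: per-station first differences, FD_yr = T_yr - T_yr-1.
fd1 = np.diff(station1)
fd2 = np.diff(station2)

# Step 2: unweighted grid-box mean of whatever first differences exist.
fd_grid = np.nanmean(np.vstack([fd1, fd2]), axis=0)

# Step 3: cumulative sum turns the differences back into a time series,
# defined only up to an additive constant (here anchored at 0).
series = np.concatenate([[0.0], np.cumsum(fd_grid)])

# Step 4: re-center on a base period to get anomalies (NOAA uses
# 1961-1990; this toy series just uses its own mean).
anomalies = series - series.mean()
```

      Note that step 3 fixes the absolute level arbitrarily, so everything depends on how the base year and missing values are handled.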

      Still not good enough for you? OK, let’s check for the “gridcell in question”. We will take the data as downloaded by Steve Mc in the thread Anomalous in Finland. Fortunately, I managed to download the NOAA data set before the magic disappearing act started.
      Do some R work:

      swede = ts.union(station1,station2,noaa[,329])
      #extract October
      #Note NOAA values for October are all NAs before 1992
      swede.oct = window(swede,c(1880,10), frequency=1)

      #form first difference sequence for all three series
      #(including the NOAA series)
      swede.diff = diff(swede.oct)

      #average the two station series
      #(line reconstructed; yields the swede.dave column shown below)
      swede.dave = (swede.diff[,1] + swede.diff[,2])/2
      #display the results from 1991 to 2008
      window(ts.union(swede.diff,swede.dave), 1991)

      Time Series:
      Start = 1991.75
      End = 2008.75
      Frequency = 1
      swede.diff.station1 swede.diff.station2 swede.diff.noaa[, 329] swede.dave
      1991.75 -0.6 0.0 NA -0.30
      1992.75 -8.1 -8.2 NA -8.15
      1993.75 4.3 4.7 4.50 4.50
      1994.75 2.5 1.9 2.20 2.20
      1995.75 -0.5 2.3 0.90 0.90
      1996.75 3.0 0.5 1.75 1.75
      1997.75 -1.8 -4.0 -2.90 -2.90
      1998.75 0.3 2.3 1.30 1.30
      1999.75 1.3 0.8 1.05 1.05
      2000.75 2.4 2.6 2.50 2.50
      2001.75 -3.8 -3.1 -3.45 -3.45
      2002.75 -3.1 -4.5 -3.80 -3.80
      2003.75 2.6 3.4 3.00 3.00
      2004.75 0.5 0.1 0.30 0.30
      2005.75 1.6 2.5 2.05 2.05
      2006.75 -3.1 -4.2 -3.65 -3.65
      2007.75 3.9 4.7 4.30 4.30
      2008.75 -3.1 -1.8 -2.45 -2.45

      You will note that the mean of the two “first differences” of the temperature series matches the first diffs of the original NOAA series. This indicates the latter had to be anomalies and not what was incorrectly claimed by DC.

      So why are the results so far out of line with the observed data? This appears to be caused by the missing (NA in the NOAA series) values prior to 1992, combined with the fact that the base year from which the first of the differences was calculated by NOAA was the coldest October on record for both stations. It is not clear how NOAA deals with missing values in constructing the anomalies from the first diffs, so I won’t try to guess now. My strong suspicion is that Steve’s 6.66 degree difference will be explained by this fact.
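      The baseline effect described above is easy to demonstrate: a cumulative sum of first differences recovers a series only up to the additive constant set by its starting point, so if the first usable year is a record cold one, every reconstructed value inherits that offset. A toy Python illustration (invented numbers, not the actual Swedish station data):

```python
import numpy as np

# True October temperatures; the first usable value is a record cold year.
true_temps = np.array([-5.0, 3.0, 2.5, 4.0, 3.5])

# First differences discard the absolute level...
fd = np.diff(true_temps)

# ...and cumulatively summing them reconstructs the series relative to
# the starting year. Anchoring the cumsum at 0 instead of -5.0 shifts
# every reconstructed value warm by a constant 5.0 degrees.
reconstructed = np.concatenate([[0.0], np.cumsum(fd)])

offset = reconstructed - true_temps  # constant +5.0 everywhere
```

      The year-to-year differences are reproduced exactly; only the level is wrong.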

      • fFreddy
        Posted Nov 21, 2008 at 12:20 PM | Permalink

        Re: RomanM (#22),

        Why is it that the most arrogant posters are also usually the least likely to check that their statements are at least based on correct facts?

        Arrogant – from the Latin, “a rogare” – “not asking”

  5. Posted Nov 21, 2008 at 3:10 AM | Permalink

    What fun!

    Now I know why I woke so early this morning (UK time) thinking about the AGW c*** the world believes, and how the heck to get the message out about – er – chewing gum in the machine.

  6. Mike Bryant
    Posted Nov 21, 2008 at 4:20 AM | Permalink

    It’s getting pretty deep around here. Time to get a shovel.

  7. Geoff Sherrington
    Posted Nov 21, 2008 at 5:04 AM | Permalink

    For what it is worth, my brother volunteered for a year in Viet Nam with the Australian Forces on condition that he could take his own sniper rifle because it worked better than the issued equipment. It was not made in USA.

    He is no longer alive. He complained repeatedly that the US forces were a menace because they were poorly trained, ill-disciplined, forgetful of basic rules, noisy and often detectable from some distance because of what they were smoking. He is far from the only Aussie Viet Nam vet to say these things in public, so I tend to believe them.

    This allegory from Steve would seem to have some basis in fact also. We should have a competition among readers some day to estimate the number of inexcusable errors that the keepers of global public temperature records have made – or been found out in. I have lost count and I have lost confidence. What are those USA climate guys smoking?

    Good night, USA.

  8. Posted Nov 21, 2008 at 5:58 AM | Permalink

    For everyone’s reference, you can see the history of changes to the contents of ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/grid/ at http://www.flickr.com/photos/asinan/tags/ncdcftplisting/. The screen shots are stamped with US/Eastern times at which they were taken.

    — Sinan

  9. Nylo
    Posted Nov 21, 2008 at 6:23 AM | Permalink

    The only thing that seems certain is that Steve has not used the file which NOAA uses to produce the graph. But that was precisely the whole point. Steve wanted to know, from the very beginning, which file NOAA uses, why it isn’t publicly available and how it is being calculated. Deep Climate’s interventions have helped to detect a number of other funny things, like the incredible number of continuous changes, in a matter of minutes, to the information publicly available in the NOAA directories, but so far they haven’t helped to solve any of the three main questions: which file, why it isn’t available and how it is calculated.

    Yet he says “you’re welcome” all the time, like we should be thankful to him for something (?)

    • Urederra
      Posted Nov 21, 2008 at 12:55 PM | Permalink

      Re: Nylo (#12),

      Deep Climate’s interventions have helped to detect a number of other funny things like this incredible number of continuous changes in a matter of minutes of the information publicly available in the NOAA directories,…

      So, is that what climate change really means?

  10. Bob North
    Posted Nov 21, 2008 at 7:59 AM | Permalink

    Steve – Love the clip. I am not sure how it relates to the topic, but I was laughing the entire time. Nice break.

  11. Steve McIntyre
    Posted Nov 21, 2008 at 8:30 AM | Permalink

    Did anyone happen to save the deleted NOAA file?

    • Posted Nov 21, 2008 at 9:57 AM | Permalink

      Re: Steve McIntyre (#14), which one?

      I have anom-grid2-1880-current.dat.gz and grid_mean_temp_1880_current.dat.gz but not grid_1880_2008.dat.gz

      — Sinan

  12. Jeff Alberts
    Posted Nov 21, 2008 at 9:26 AM | Permalink

    Bravo for spelling “Viet Nam” correctly. A Viet Namese friend once told me that all words in the Viet Namese language were one syllable, so it seemed odd to me that the country name would be two syllables.

    OT I know, sorry.

  13. Jack Linard
    Posted Nov 21, 2008 at 10:01 AM | Permalink

    #15 Khong van de gi

  14. Steve McIntyre
    Posted Nov 21, 2008 at 10:04 AM | Permalink

    grid_1880_2008.dat.gz was the current file that I’d like.

    I modified my scripts only slightly to check out all the irrelevancies thrown at us by Deep Climate and ended up with one of his irrelevant files in my working session while NOAA was shredding the data. It looks like the same thing happened to you.

    • Posted Nov 21, 2008 at 10:08 AM | Permalink

      Re: Steve McIntyre (#18), sorry. I was haphazardly extracting, dumping and importing in a temporary directory until I realized that the contents of the directory had changed. By that time, the file was already gone.

      I should have been more careful.

      — Sinan

  15. Bob B
    Posted Nov 21, 2008 at 10:32 AM | Permalink

    You think it’s cold now?–Accuweather calls out NOAA:


  16. Hoi Polloi
    Posted Nov 21, 2008 at 10:55 AM | Permalink

    There’s nowhere to run to for NOAA…

  17. StuartR
    Posted Nov 21, 2008 at 12:50 PM | Permalink

    re Steve McIntyre #18

    I just got back from work and checked, and yes, grid_1880_2008.dat.gz was the very file I chose to download out of noodling curiosity at 23.46 GMT last night. Do you still need it, Steve?

  18. Deep Climate
    Posted Nov 21, 2008 at 1:04 PM | Permalink

    Roman, I apologize for making you go through this. I’d already reread the material, and poked around and changed my mind. So, yes, you and Steve are right – it is an anomaly dataset. I fully admit I did not read the README carefully enough the first time. I went to post and saw your comments (and some others).

    So I admit without reservation that I was wrong about what the file was. But I still think it’s the wrong file, not because it’s a version with errors, but because it was generated via a different method than the dataset actually used by NCDC. I’m not really sure what this FD-generated dataset is used for, and maybe it’s considered obsolete. Or maybe it’s only useful for comparing recent year-to-year changes (as some of the documentation seems to suggest). I think this is the most reasonable explanation right now.

    So I think we can agree there are apparently two methods used to calculate grid cell anomaly values, the “first difference” method and the “anomaly” method. I gather the FD method can incorporate more stations, but suffers from certain forms of error propagation that make it inappropriate for the kind of analysis you want to do.

    My understanding is that the NCDC anomaly map is generated using the newer “anomaly” method, but that both datasets were distributed. Granted though, the README is from 2004, so who knows for sure.

    So even if you could get ahold of the FD version that disappeared, it would not be useful IMHO if your goal is to reproduce the NCDC anomaly map. And obviously I do see now that you couldn’t use this file to generate missing 2008 values in the “anomaly” dataset.

    So there’s no way to proceed until NCDC provides an up to date grid data – at least that’s the way it looks to me.

    Having said all that, I would dial back the rhetoric with NCDC. And at the end of the day, at least I’ve provided a reasonable explanation of why the two maps and datasets did not correspond. I don’t think I deserve abuse for that.

    Not that I’m excusing out of date documentation or datasets. As far as I can see, NCDC has not provided a dataset corresponding to the published anomaly product. So either they should provide the up-to-date data, or update the README (preferably both). I’d be surprised if this doesn’t happen sooner or later.

    But I also gather there is some sort of past conflict between Climateaudit and NCDC. So chances are they will sort things out in their own time and not communicate directly with you about this.

    Meanwhile can we all just get along …

    • Posted Nov 21, 2008 at 1:18 PM | Permalink

      Re: Deep Climate (#26),

      Meanwhile can we all just get along …

      Now that you seem to have understood the point of Steve’s post, why not?

      — Sinan

    • KevinUK
      Posted Nov 21, 2008 at 2:04 PM | Permalink

      Re: Deep Climate (#26),

      DC I do hope that you’ve now learned your lesson, namely that you’ve really got to get up very early in the morning to catch Steve out. However when it comes to the UK Met Office, UEA CRU, GISS and NOAA you can have snip



  19. crosspatch
    Posted Nov 21, 2008 at 1:15 PM | Permalink

    DC, I don’t believe there is any “conflict” except when people get defensive when questions are asked about their data. The culture within the various US government agencies seems to be one of covering their backsides more than one of trying to get the data correct, and often the attitude toward anyone who would question them is to first dismiss them as if they don’t know what they are talking about, and then to eventually change the data without any explanation. They seem to have a sort of barricade mentality when it comes to answering questions about their data and methods. Basically they treat people who find errors as “the enemy” rather than as the resource they are for getting things right.

    Maybe some of this stems from public positions these agencies have taken where they have expressed various conclusions as certainties and then they use these data to back those conclusions. When the data is discovered to be incorrect, it calls the credibility of those conclusions and the people reaching them into question and that seems to elicit a defensive response rather than a partnering response to get to the bottom of what is really going on. There wouldn’t be so much appearance of “conflict” if the various government agencies didn’t put so much energy into defending their position rather than trying to learn what the real position should be. There is only one “side” and that is the side of what is really going on, what temperatures really are, etc. The data should be able to speak for itself provided it is accurate. What seems to keep turning up, though, are conclusions based on a combination of incorrect data, incorrect methods applied to the data, “missing” data that isn’t really missing but being kept out of the record for some reason, etc.

  20. StuartR
    Posted Nov 21, 2008 at 1:16 PM | Permalink

    re #18 and #24

    Hi Steve, I found the Contact Steve Mc page and sent the grid_1880_2008.dat.gz file to the email there, hope it gets there intact

  21. Steve Huntwork
    Posted Nov 21, 2008 at 1:27 PM | Permalink

    I joined the U.S. Army in 1974 and I understood EXACTLY what you were trying to say! Of course, I eventually became an Army Meteorologist and specialized in satellite remote sensing.

    Some of us “old farts” still remember when scientific studies were based upon reality, instead of computer models.

    WAY TO GO!

  22. UK John
    Posted Nov 21, 2008 at 1:35 PM | Permalink

    A little light relief!

    I work exclusively in the Project Management field, and one thing I had noticed was that, whatever I did, the activity completion bars on my Gantt charts in Microsoft Project only moved to the right – a sort of “forcing” was evidently in play. I checked this on every project I had ever been involved with, and it was true for every one: the Gantt bars only moved to the right!

    I formed three hypotheses about what this “forcing” might be:

    1. It might be an as-yet-undiscovered software bug, deep in the Microsoft machine code, but I rejected this as “too hard”, as it might require some knowledge of computer programming outside my experience, and anyway only a few geeks would understand, so what’s the point!

    2. All projects everywhere always finish late; they always have – a sort of natural cycle effect! The general consensus view from all my professional scientific colleagues was that every project had finished on time, and to budget as well, so that couldn’t be right.

    3. Next, a mysterious force that had been unleashed by the planet because our human civilisation was no longer closely connected to nature, and this would conspire to make every project late and ruin our lives, unless we made sacrifices.
    This must be true, because when I made everyone sacrifice, and upped everybody’s working hours, the Gantt bars moved to the left! QED

  23. Deep Climate
    Posted Nov 21, 2008 at 3:03 PM | Permalink



    From the README

    http://ftp.ncdc.noaa.gov/pub/data/er-ghcn-sst directory contains the global gridded dataset that has been compiled from GHCN (gridded using the “anomaly” method), and Sea Surface Temperatures, and merged using reconstruction techniques similar to those of the Extended Reconstructed Sea Surface Temperature dataset.

    Took a quick look – has blended data to Oct. 2008.

    • RomanM
      Posted Nov 21, 2008 at 3:19 PM | Permalink

      Re: Deep Climate (#33),

      The directory contains three data files, two of them with time stamps in 2007 and the third (temp5d6c.v2.adj.1880.last.dat.gz) time stamped today at 3:33 pm – presumably EST. Was it available yesterday?

  24. Deep Climate
    Posted Nov 21, 2008 at 3:52 PM | Permalink

    Yes, I noticed it was timestamped Nov. 21 (both file and archive). Basically I just went through the directories in pub/data until I found a likely sounding directory with a November timestamp. But I only did that today, so I can’t say for sure what was there yesterday.

    A lot of people were challenging me to find the “right” data set, since I was so sure the other one was the “wrong” one (still am). Don’t know for certain that this is it, but my hunch is it will give a much closer match with the NCDC anomaly map.

  25. Steve McIntyre
    Posted Nov 21, 2008 at 4:59 PM | Permalink

    My point yesterday was that the archived information did not yield their graphic and that the archive contains some sort of error. I won’t have time to look at the new candidate for a day or so, but I can tell you without looking at it that it will not yield the graphic in question which is a land-only graphic. You’ve pointed to a graphic that is a land-sea blend. Whatever its merits, it’s not a solution to the problem as originally posed. Please note that I’m not claiming that it’s impossible for them to present a consistent file – it looks like the graphic was generated from a file without the error in the now-deleted archive. I only observed that there was something wrong with the archived file, which seems to have been confirmed by the deletion of the file – a point that you might acknowledge.

    I don’t think that there was an up-to-date file with that name yesterday, though I can’t prove it.

  26. Posted Nov 21, 2008 at 5:06 PM | Permalink

    I am not sure if this cavalier attitude with files and directories is the kind of scientific anarchism Feyerabend had in mind…

    There must still be many people who do not understand that to “put data into a publicly-available directory or website” is perfectly equivalent to “publishing” that same data (and to “announcing” it).

  27. Alan S. Blue
    Posted Nov 21, 2008 at 6:11 PM | Permalink

    Coming from the programming side, it’s shocking that it isn’t something like a public-read-only Subversion installation. The actual incoming data site list, the actual incoming data, the actual functional code, the actual data output, and the actual presentational graphics (+ script to produce same) all stored in an automated version control system. It sounds impressive, but it is easier than juggling folders full of daily updating files manually.
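    For what it’s worth, the workflow described above takes only a few commands to stand up. An illustrative Subversion sketch (the repository path, directory name, and commit messages are invented for the example):

```shell
# Create a repository and import the current data, code, and output.
svnadmin create /srv/repos/grid-data
svn import ./grid-data file:///srv/repos/grid-data -m "Initial import"

# Every later fix is a logged, dated revision with its own note
# (run from a working copy obtained with 'svn checkout')...
svn commit -m "Fix corner case in gridding module; test added"

# ...so anyone can see exactly what changed, and when, between versions.
svn log file:///srv/repos/grid-data
svn diff -r 1:2 file:///srv/repos/grid-data
```

    With read-only anonymous access enabled, the public gets the ‘pre-fix’ and ‘post-fix’ states for free, with no manual folder juggling.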

    No one really cares that much about individual freaking bugs – so long as there’s a definite ‘pre-fix’ date and ‘post-fix’ date and a little note: “Oops, we didn’t test for this particular corner case originally; test added today.”

    But there’s been a tremendous amount of effort spent here trying to infer exactly what comments like this one (from the NOAA readme of #33) mean:

    using reconstruction techniques similar to those…



  28. John Lang
    Posted Nov 21, 2008 at 7:45 PM | Permalink

    All these errors, changing important data files mid-stream, and all the questionable “adjustments” to the temperature data to date are extremely important.

    The 1880 to 2008 temperature trend puts us on a certain global warming track. It is much less right now than the global warming models predict but the trends to date show it might be a problem.

    A little less error here and a little less “adjustment” there and a little less data file error there might put us on a completely different temperature trend track, one which indicates global warming is not that much of a problem at all. A small change here or there might mean the difference between building a few more needed coal-fired power plants or closing all the existing ones.

    The satellite data which seems to be less susceptible to these problems indicates that global warming is much, much less of an issue than the NOAA temp trends/errors and the GISS models indicate.

    I, for one, think it is important these errors do not continue happening.

  29. Gary
    Posted Nov 21, 2008 at 7:48 PM | Permalink

    The skills of a librarian/archivist are clearly needed when it comes to making the data public. These professionals know how to categorize, catalog, and describe the information so that anyone can figure out what they’re looking at. Digital repositories are becoming more prevalent and except for routinely updating the storage formats, there’s very little difference between properly preserving electronic and paper-based information. A brief browsing of the NOAA website didn’t reveal this job title, although I may have missed it. There were some information technology positions, but these don’t necessarily require any knowledge of librarianship. Data preservation should be in the hands of people who know how and like to do it, not the scientists. The present situation is kind of like asking programmers to write instruction manuals — they don’t have a good track record with this either.

  30. Deep Climate
    Posted Nov 21, 2008 at 9:11 PM | Permalink

    Yes, this is the blended dataset, and I surmise it corresponds to the right hand graphic in the October, 2008 NCDC update. But at least it looks like it matches and has the right description and so on. Plus, you can get to it from the NCDC October update by following the links. So at least this part of the archive seems to be up to date.

    As far as I can see an up to date “land only” grid cell dataset is not to be found. I have to presume it was/is supposed to be in ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/grid/

    Now there were three different temp mean grid files in that directory (only one is left, while another is gone but can be found elsewhere). The readme definitely says that anom-grid-1880-current.dat.gz is constructed with the “anomaly” method, while another NCDC access page points to grid_1880-2008_RVose.dat. Of course, neither of those datasets is up to date.

    So even if you had a correct version of grid_1880_2008.dat, it’s not clear to me that it is/was the dataset used to generate the land only “anomaly method” chart. That’s what I mean by the “wrong” dataset. But who knows for sure? Isn’t there someone you could ask nicely, like one of the contacts listed – or is it too late for that?

    The only thing that is clear is that the archiving system is not, um, robust, and the documentation is not up to date or consistent. I get the sense that NCDC is not too motivated to fix it very fast – “catching flies with vinegar” and all that.

    • KimberleyCornish
      Posted Nov 21, 2008 at 10:18 PM | Permalink

      Re: Deep Climate (#41), One can reasonably take offence at the complete lack of humility in the tone of DC’s commentary. He/she has come into the discussion roaring like a lion about “errors” and (if he/she understands the concept of decency) owes Steve a rather abject and very overdue apology. The apology ought to be stated as such and delivered unambiguously, in some form such as: “I was wrong and regret what I wrote. I will take on board your position in future and treat it with a good deal more respect than I have so far. I see that your auditing efforts are in fact fully justified and deserve my support. I will adopt this policy on my own blog.”

      • Nicolas J.
        Posted Nov 22, 2008 at 8:07 AM | Permalink

        Re: KimberleyCornish (#45),

        Rubbish. This is not an apology, this is a pledge to a sect guru.

        • kim
          Posted Nov 22, 2008 at 8:23 AM | Permalink

          Re: Nicolas J. (#52),

          I don’t know, NJ; the by-laws to that sect seem pretty arcane and subversive. Seems more like a cult to me.

  31. Graeme Rodaughan
    Posted Nov 21, 2008 at 10:08 PM | Permalink

    NOAA Claims “NOAA understands and predicts changes in the Earth’s environment, from the depths of the ocean to the surface of the sun, and conserves and manages our coastal and marine resources”.

    However I would be far happier if they claimed “NOAA understands effective Data and Configuration Management techniques, and is able to provide 24-hour public access to both current and archived baselines of your data and any software that has been used to process it, in accordance with ‘…pick any professional industrial standard’”.

    Why “your data”? Because the general public has paid for its gathering and storage, and it is being used to motivate policies that impact the lives of the general public. Both are, in themselves, sufficient reasons for an effective data and configuration management system to be in place.

    Perhaps NOAA employees could spend some time at http://www.cmcrossroads.com/.

  32. Mike C
    Posted Nov 21, 2008 at 10:15 PM | Permalink

    I don’t mean to be the monkey in the wrench, the fly in the ointment or the pain in the a$$… but since this is all preliminary data, isn’t it possible that this is all SOP and you guys just walked into it for the first time while it was being done?

  33. Steve McIntyre
    Posted Nov 21, 2008 at 10:33 PM | Permalink

    #44. It might well be. However, I sent a polite email to Russell Vose notifying him of the defects and, in his shoes, I’d have sent back a short email saying so. Vose said that the problem lay with some other dude in NOAA and he’d inquire about it. It was the nature of the error that was odd and spoke to some sort of programming error in one of their modules.

    It’s not like I think that any of this is necessarily a big issue. I’ve noted that there seems to be a clean version somewhere (as the graphic looks like it came from a clean version).

    Because I’d raised the issue with them in a polite way, you’d also think that they’d alert me to the fact that they were withdrawing the data set to fix it. Much of the discussion here has arisen from the peculiarity of seeing data sets disappear while various folks were online looking at it disappear. This has happened on other occasions with other data sets and it’s always a little startling when people are online chatting about the data set and changes are taking place under our nose. We’ve seen this before with Hansen and with Mann.

    “Deep Climate” also enlivened the discussion by charging me with gross errors, and all of his charges have proved unfounded.

  34. ozzieaardvark
    Posted Nov 21, 2008 at 11:02 PM | Permalink

    @Deep Climate,

    I don’t want to seem like I’m piling on to some of the rather tart responses you’ve gotten to your original post, but I’m having a really hard time understanding the perspective and thought processes behind your last post (#41).

    You say:

    “So even if you had a correct version of grid_1880_2008.dat, it’s not clear to me that it is/was the dataset used to generate the land only “anomaly method” chart. That’s what I mean by the “wrong” dataset. But who knows for sure? Isn’t there someone you could ask nicely, like one of the contacts listed – or is it too late for that?
    The only thing that is clear that the archiving system is not, um, robust and the documentation is not up to date or consistent. I get the sense that NCDC is not too motivated to fix it very fast – “catching flies with vinegar” and all that.”

    You seem to be acknowledging that the NCDC approach to publishing their data and methodology is at best disorganized and unprofessional, while in the same breath (or at least the same post) saying that if someone asked more nicely, perhaps things would improve.

    A couple of points:
    1) Publishing their data and methodology is not only their job, it’s why they exist
    2) The data in question is being used to decide whether we need to reconstruct the global economy

    A couple of questions:
    1) Should we accept slipshod work from federally funded agencies on a question as important as this?
    2) Should asking nicely be a prerequisite for public servants doing their work in a professional way?

    I’m fully familiar with all the “good enough for government work” quips that could come out of the above, but my goodness man, this is IMPORTANT! There are people in this world that are going to either live or die as a consequence of public policy decisions taken based on this very data.

    I’m not trying to paint you as an apologist for all of this, but I just can’t fathom why someone would think that we should have to ask nicely in order for NCDC to do its job in an organized and professional way.

    Help me out here.


  35. Mike C
    Posted Nov 21, 2008 at 11:04 PM | Permalink

    Steve Mc, #46: You are on point about NOAA responsiveness. Unless you are Anthony Watts and have their thumb in a thistle, you are as high a priority as crumbs on the coffee room floor. As for the DC poster, well, I had the time to read through the posts and it all became apparent.

  36. E.M.Smith
    Posted Nov 22, 2008 at 3:38 AM | Permalink

    I’m a past Unix geek, and sometimes I forget that other folks are not. It’s worth mentioning that just about every software project on the planet has tight version control for its software source code and data files. The tools for this are built into every Unix and Linux system.

    To set up a complete version control system would take a typical Unix / Linux guy about 3 hours (or less). One “make” file. One README. A directory. The first data copies checked in. Some documentation of where the data creator was to put the data to be loaded into the system. And maybe one cron job to automate the periodic typing of the words “make update”.

    There is absolutely no reason whatsoever for data set updates to be as clumsy as evidenced in the recent failures. Don’t forget that company accounting and email archives must be kept accurate through periodic updates or someone can go to jail (Sarbox…) and often with complete rollback to any version. Journalling file systems anyone?

    The state of the art for businesses is that you have duplicate copies in primary and backup sites so that you have business continuation even in a disaster. These copies are both kept live, synchronized and accurate at all times. The cutover from one to the other is expected to be seamless.

    What I’ve seen of the NOAA/NCDC process looks like something high school kids would do saving their MP3 downloads. No, wait, I take that back. My kid has never deleted any of his MP3s …
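
    As a minimal sketch of the check-in discipline described above, done in pure Python for illustration (a real setup would use an actual version control tool such as RCS or git plus a cron job; the file names and archive layout here are hypothetical):

```python
# Illustrative sketch only: versioned check-in of a data file plus a
# CHANGELOG entry, mirroring the bookkeeping described above.
# File names and archive layout are hypothetical.
import shutil
from datetime import date
from pathlib import Path

def check_in(data_file, archive_dir="archive"):
    """Copy data_file into archive_dir under a versioned name and
    append a line to the archive's CHANGELOG. Returns the version."""
    src = Path(data_file)
    arc = Path(archive_dir)
    arc.mkdir(exist_ok=True)
    # Next version number = count of existing versioned copies + 1.
    version = sum(1 for _ in arc.glob(src.stem + ".v*")) + 1
    dest = arc / f"{src.stem}.v{version}{src.suffix}"
    shutil.copy2(src, dest)
    with (arc / "CHANGELOG").open("a") as log:
        log.write(f"{date.today().isoformat()} {dest.name} checked in\n")
    return version

# Example (hypothetical file name):
# check_in("grid_1880_2008.dat")  # -> 1, then 2 on the next update
```

    The point is not the tooling; it is that every update leaves a dated record and an intact prior version, so nothing silently disappears.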

  37. JimB
    Posted Nov 22, 2008 at 5:14 AM | Permalink

    “There is absolutely no reason whatsoever for data set updates to be as clumsy as evidenced in the recent failures. Don’t forget that company accounting and email archives must be kept accurate through periodic updates or someone can go to jail (Sarbox…) and often with complete rollback to any version.”

    The problem is that there’s no “feedback” loop. It’s an open-ended system. In business, if you fail to keep things archived and retrievable in an orderly fashion, you face stiff penalties or jail. Your compliance is mandatory, and people from the board of directors to senior management sign off on reports stating compliance.

    Here, there is no compliance and there are no penalties (direct ones, anyway). That we apply these laws to businesses but not our own government is another story entirely.


  38. crosspatch
    Posted Nov 22, 2008 at 6:03 AM | Permalink

    “The problem is that there’s no “feedback” loop. It’s an open-ended system.”

    I believe it is worse than that. There is actually “positive” feedback which is inherently unstable. Several influential members of these agencies have gone on record as stating that warming is a major crisis we must address. They use these data to back their claims. When the data shows more warming, it bolsters their position and increases the credibility of their argument. This has the potential to enhance their career on the speaking circuit and gets them invited to better cocktail parties, not to mention time on the major news networks and in the papers.

    Now when the data are shown to be incorrect, it can directly impact the personal credibility of the people who have gone out on a limb and stated that the science is “settled”. So in this way, a person who uses these data to advance their career sounding the clarion of “Global Warming” can experience a person who questions the veracity of that data as a personal attack or as a threat to their career and cocktail party invitations. While I can’t begin to say how anyone experiences something without talking to them, the defensive response and dismissive, condescending attitude toward anyone who would dare question the data leads one to at least wonder if that is the case. Rather than come out right away and engage in a dialog, the response seems to be to circle the wagons, delete the data, and not say anything in public. In the past, this has usually been followed by wholesale replacement of data with new data showing different results without any comment as to what was done, what was wrong, why it was replaced, etc. In other words, the reaction of the various government agencies erodes trust in them. On the other hand, if Hansen (for example) were to simply come out and say he didn’t care which way it goes, he just wants to know the facts, thanks people for pointing out errors, fixes the errors with an explanation of what went wrong and posted the code they use to operate on the raw data, the process would be transparent and if there is warming, then there is warming. And if there is cooling, then there is cooling, it is what it is. Bit it would establish trust in the people who provide the data and the counsel to government and industry.

    I currently have little trust in that counsel.

  39. Arthur Glass
    Posted Nov 22, 2008 at 8:27 AM | Permalink

    ‘Arrogant – from the latin, “a rogare” – “not asking”‘

    Not exactly. The adjective is derived from the stem of present participle of the verb ‘adrogo.’ Now, the primitive meaning of the simple verb ‘rogo’ is indeed ‘ask’, but the verb developed a specifically political meaning, to ‘ask for’ or ‘propose’ a law. The ‘ad’ part is the prefix/preposition ‘ad’, meaning ‘to’ or ‘toward’, and the relevant meaning of the compound verb is ‘take to oneself, claim, assume’, with the connotation that the ‘taking’ is arbitrary and therefore not valid or justifiable.

  40. JimB
    Posted Nov 22, 2008 at 10:23 AM | Permalink

    I think we’re in vehement agreement 🙂
    One-sided feedback is not a “loop”…it’s open-ended.

  41. Deep Climate
    Posted Nov 22, 2008 at 1:09 PM | Permalink

    Well, sure this part:

    “Steve, I was wrong and regret what I wrote.”

    And I mean it. I can’t quite sign on to the rest of it though.

    As far as NCDC documentation and archiving of the particular data goes, there are certainly problems. But I’m not sure as to what service level is mandated, or should be, for this kind of research data, or rectification of it. I don’t know, for example, if the out of date documentation in this particular directory has been pointed out to NCDC, even now.

    It could be that removal of the two files is the first step in dealing with the immediate problem of the wonky datasets. There is some sort of maintenance activity scheduled for today, I believe, and I expect we’ll see an update next week.

    Myself, I have an interest in the blended land-sea, zonal monthly anomaly data. Some of that is up to date and some is not. I wish it were all up to date, but I can’t get too exercised that it’s not (unless they don’t fix it within a reasonable amount of time).

    You have to look at the whole context, too. Of course, I’ll take Steve’s word that he sent a polite email (or he may have published it somewhere, anyway). But the heading of this post (and presumably the video, which I haven’t watched) could be construed as, um, somewhat insulting. So … I expect the NCDC to take care of it, but not to be particularly friendly, informative or speedy.

    Now a more substantive question. After all the brouhaha about the GHCN “copying” errors in the monthly data, I’m actually more interested in the general state of the GHCN monthly data than the NCDC temp analysis as such. Is there going to be a post on that in particular? Or can someone direct me to the right post?

  42. Hoi Polloi
    Posted Nov 22, 2008 at 1:28 PM | Permalink

    From the Daily Telegraph:

    The world has never seen such freezing heat

  43. crosspatch
    Posted Nov 22, 2008 at 3:09 PM | Permalink

    “I’m actually more interested in the general state of the GHCN monthly data”

    Yes, I believe that is what this whole thing has really called into question. How many times in the past have stations had monthly data carried over to a subsequent month, where the difference wasn’t glaring enough to attract immediate attention? Until that question has been answered (and corrected), the monthly data cannot be relied upon as accurate. They are going to need to get to the bottom of whatever caused that carryover of data, fix it, and rebuild the monthly data files if those data are to be considered accurate. If they would also post the code that does that, the community of parties interested in those data could then also help spot errors and actually help in the maintenance of the code itself. It’s one of the benefits of “open source” code.

  44. Deep Climate
    Posted Nov 22, 2008 at 3:40 PM | Permalink

    I don’t think it’s necessary to have the code to do some useful checking. I think taking a subset of GHCN station daily data for say 2008, generating averages for each month and comparing that to the published data would at least reveal the current scope of the “carryover” problem.

    I’ve not done that yet, and I’m not sure I want to take it on, but I have looked at the GHCN monthly data and identified the most probable “carryover error” candidates in the mid-November GHCN 2008 update (including three northern Canadian stations, one of which is the Resolute station that Steve noticed). I’m not sure where to comment about this, and I’d prefer to comment on a post dedicated to that specific issue. Anyway, I can’t do anything more on this until I get back to my regular computer (hopefully tomorrow).

    I guess it was touched on here, but the post and comments were mostly about the missing stations. Still, if that’s the right place I’ll go there some time soon.
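
    A rough sketch of the comparison proposed above, with made-up stand-ins for the actual GHCN daily and monthly file formats:

```python
# Sketch of the proposed check: recompute monthly means from daily
# values and flag disagreements with the published monthly figures.
# The data structures here are illustrative stand-ins, not the real
# GHCN layouts, and the tolerance is an arbitrary choice.
from collections import defaultdict

MISSING = -9999

def monthly_means(daily):
    """daily: list of (year, month, day, temp) tuples."""
    sums, counts = defaultdict(float), defaultdict(int)
    for year, month, _day, temp in daily:
        if temp == MISSING:
            continue
        sums[(year, month)] += temp
        counts[(year, month)] += 1
    return {k: sums[k] / counts[k] for k in sums}

def flag_discrepancies(daily, published, tol=0.1):
    """Compare recomputed means against published monthly values."""
    computed = monthly_means(daily)
    return [(ym, computed[ym], published[ym])
            for ym in sorted(computed)
            if ym in published and abs(computed[ym] - published[ym]) > tol]

# Tiny worked example: October agrees; November was "carried over".
daily = [(2008, 10, d, 5.0) for d in range(1, 32)] + \
        [(2008, 11, d, -10.0) for d in range(1, 31)]
published = {(2008, 10): 5.0, (2008, 11): 5.0}  # carryover error
print(flag_discrepancies(daily, published))
# [((2008, 11), -10.0, 5.0)]
```

    Run over a real subset of stations, the flagged months would give at least a lower bound on the scope of the carryover problem for the period checked.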

  45. crosspatch
    Posted Nov 22, 2008 at 5:47 PM | Permalink

    “I don’t think it’s necessary to have the code to do some useful checking.”

    DC, you have apparently been here for about a week. SteveM and the others on this site have been doing this for a considerable period of time though the focus does shift from time to time to one thing or another. I don’t want to discourage you from making a contribution but you might want to keep in mind that the ftp sites involved are probably being scoured for relevant data by a dozen or more people at this point. The obvious places one might find by looking at a README file have likely been visited already.

    ‘I think taking a subset of GHCN station daily data for say 2008, generating averages for each month and comparing that to the published data would at least reveal the current scope of the “carryover” problem.’

    How so? There might be only one such error one month, none the next month, and dozens of them another month. And 2007 might be completely different from 2008. In other words, the carryover problem doesn’t seem to be consistent. Because one happened in one month at one station, there is nothing to indicate that the same station in the same month would have the carryover problem in another year, or that 2008 results would have any relevance to 2007, or that the first half of 2008 bears any relationship to the second half of 2008, though there might be a pattern to what triggers these errors (I suspect it is some “magic” combination of missing values somewhere, but that is pure speculation). All one could say from the exercise you suggest is that so far in 2008 there have been X number of these errors.

    SteveM has already stated that he has found at least one similar error in a spring reading that results in a month being much colder than it should be. These errors might be sprinkled at random throughout the record. We already know they are there, that means the monthly record is incorrect. How incorrect it is seems irrelevant at this time since finding the source of the problem, fixing it, and regenerating the monthly data will show exactly how wrong it was.

    So it appears SteveM et al have discovered what appear to be two different problems. One was a monthly value being carried over to a subsequent month, and the second was that the set of data posted on the archive site is apparently not the data used to generate the published results. When SteveM brought the second problem to their attention, the relevant data began to disappear from the archive site without any explanation or ETA on when correct data would appear.

    This must be very frustrating for the researcher, particularly when different versions of data appear and disappear in rapid succession. It forces one to start over again from the beginning, only to have the rug pulled out again when the data is changed again. But see, DC, these are the kinds of issues that have been typical whenever a problem has been discovered. Antics that would be quite unacceptable from a citizen or private company are de rigueur for these agencies. Could you imagine if an auto company were required to archive emissions control data for the EPA and engaged in such antics when an error was discovered? They would be fined or possibly prosecuted.

    This is particularly irritating because many of us work in occupations that do require rigid standards for data handling and archiving, where the blatant deletion of incorrect data would be grounds for dismissal from most private entities, yet it is “business as usual” for our government agencies, it would seem. It is frustration more than conflict, I think.

  46. Hu McCulloch
    Posted Nov 22, 2008 at 7:36 PM | Permalink

    Re RomanM, #22,
    Thanks for the explanation of NOAA’s “first difference method”. Your post made a lot more sense than “Deep Climate”‘s citation of the definition of first differencing at #99 on the “Anomalous in Finland” thread.

    As described, the “First Difference Method” makes sense when not all stations cover the whole time period. However, when a single observation is missing for one station, it appears to treat the resumed data as if it were a new station, and therefore loses the valuable information that this is in fact the same station.

    So apparently it’s just as well if NOAA doesn’t really use it, as now claimed by DC.

    • Posted Nov 23, 2008 at 7:40 AM | Permalink

      Re: Hu McCulloch (#62), their so-called first-differencing method is more like a 12-differencing method given that the time unit in consideration is a month.

      I am not a fan of the anomaly method (especially the freedom it affords its users in picking the ‘normal’ period).

      Just two minor observations.

      — Sinan

  47. Deep Climate
    Posted Nov 23, 2008 at 3:16 AM | Permalink

    The NOAA discussion of transition from FD to anomaly method is here:

    I don’t really want to get into a discussion about “antics” and so forth, although I know your POV is fairly common here.

    As far as having something useful to contribute, I’ll leave it to others to judge:
    http://www.climateaudit.org/?p=4370#comment-313046 on GHCN monthly.

  48. JimB
    Posted Nov 23, 2008 at 6:10 AM | Permalink

    I believe that you bring value to the table and to the debate. I also think you have a sincere interest in doing science, regardless of what the results of that science determine (else why do it?).
    I think what some posters were/are trying to articulate is that Steve is very well known by pretty much every government climate office/agency in the system.
    It’s probably fair to say that getting a very polite email from Steve with questions regarding some data stream/file that your office is responsible for is…well…I imagine it’s pretty much like coming to work on Monday morning and discovering 60 Minutes has set up at the front door. It’s really the proverbial turd in the punchbowl.

    So it’s really not a question of “Geee Steve, if you’d just asked nicely, none of this would be necessary, they would have gladly sorted things out and thanked you very muchly!”

    Keep up the good work…


    • Kenneth Fritsch
      Posted Nov 23, 2008 at 11:37 AM | Permalink

      Re: JimB (#64),

      I have refrained from commenting on the DC exchange, but now that it appears to have ended, I will.

      I think once DC got by the finger wagging at Steve M on how to get NOAA responses and over the admonitions to pay more attention to what the NOAA website was showing, he well could have done some work to reveal a better path for auditing and made a request of NOAA on his own.

      As far as I can tell he did neither.

  49. Deep Climate
    Posted Nov 23, 2008 at 12:54 PM | Permalink

    #64, 66:
    I am planning to get in touch with NOAA, actually. The three points I want to raise are:

    1) GHCN – Request for documentation on monthly averaging, and communication of my preliminary data quality report (for more, see comments ff:

    2) Update of blended zonal monthly data (for use in UAH/RSS tropical troposphere analysis)

    3) Request to clarify/update documentation in ghcn/v2/grid and GHCN information page (right now they conflict), as well as estimated date for update of the gridded mean temp dataset(s).

    As I’ve said before, #3 is the lowest priority for me, but I will give it a shot if there is no update soon.

    Yes, it’s based on monthly series. It’s FD in the sense that the data is broken up into twelve separate time series; it seems this is considered a disadvantage by the researchers.

    As discussed above, my initial suggestions for “auditing” GHCN monthly averages are to do two types of independent checks on the data:

    a) Look for zero transitions in months that should show significant differences
    b) Calculate a subset of averages from scratch and compare to GHCN output

    Actually the more I think about it the more I think (b) could be done for a year of data without too much trouble.

    See #59 etc.
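
    Check (a) might be sketched like this, flagging stations whose consecutive monthly values are identical; the sample data are invented:

```python
# Sketch of check (a): scan a station's monthly record for exact
# month-to-month repeats ("zero transitions"), which are suspicious
# in seasons where a significant change is expected. Sample data
# are invented.
def flag_zero_transitions(series):
    """series: list of (year, month, temp) in chronological order.

    Returns [( (y1, m1), (y2, m2), repeated_value ), ...] for each
    pair of consecutive months with identical values.
    """
    flags = []
    for prev, cur in zip(series, series[1:]):
        if abs(cur[2] - prev[2]) < 1e-9:
            flags.append((prev[:2], cur[:2], cur[2]))
    return flags

obs = [(2008, 9, 1.2), (2008, 10, 1.2), (2008, 11, -8.4)]
print(flag_zero_transitions(obs))
# [((2008, 9), (2008, 10), 1.2)]
```

    Hits from a scan like this are only candidates, of course; a September value repeated in October at a high-latitude station is far more suspicious than a repeat between two climatologically similar months.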

    • Kenneth Fritsch
      Posted Nov 23, 2008 at 2:23 PM | Permalink

      Re: Deep Climate (#67),

      DC, I would recommend that you get to work doing your suggested actions and get back to CA when they are completed.

  50. crosspatch
    Posted Nov 23, 2008 at 1:02 PM | Permalink

    Well, antics was in the sense of: A ludicrous gesture or act; ridiculous behavior

    DC, if you had been watching this unfold over the years, by now what has happened would be “old hat”. The difference is that it is now NOAA and not GISS this time. We have seen repeatedly where data sets were replaced without comment. It is frustrating to download a version of a file, do some operation on it, and publish results, only to find that others may not be able to duplicate your work because the file on the archive site, while still bearing the same name, now has different data. They changed it without comment or notification, so all of the research done on the old file is now useless. And nobody knows if the new data carries different errors than the first one did, so everything now needs to start from scratch, when a note in a CHANGELOG file might easily have explained what was done and why.

    I don’t think people are asking for anything unreasonable. The files should bear a version number and when they are changed, the version number incremented and a note made in a change log file. That might go a long way to easing the frustration. Replacing files several times with identical names in a short period of time, deleting them, etc. all seem to be “antics” that frustrate anyone researching those files.

  51. Craig Loehle
    Posted Nov 23, 2008 at 2:11 PM | Permalink

    “Hello Viet Nam” just came on my local cable channel–how weird is that?

  52. Deep Climate
    Posted Nov 23, 2008 at 6:24 PM | Permalink

    I had already reported step (a) (at least for 2008) of my proposed check on GHCN monthly here:
    When I have more on GHCN, I’ll comment on the most recent GHCN thread.

    • Kenneth Fritsch
      Posted Nov 24, 2008 at 10:30 AM | Permalink

      Re: Deep Climate (#71),

      DC, I think the bigger picture intent of these audits at CA is to determine the general reliability that can be expected of these temperature data sets. I have compared UAH/RSS to GISS at the links below and I suspect that real differences exist. In order to make a more detailed analysis of these differences, I would judge it important to determine the QC/QA involved and willingness of the data set owners/keepers to reveal their methods and explain the errors detected in their data sets.

      I may be wrong in my impression, but you seem more intent on busily showing that this particular error(s) was inconsequential to the overall picture.

      Re: Kenneth Fritsch (#138), Re: Kenneth Fritsch (#140),

      • Deep Climate
        Posted Nov 24, 2008 at 11:09 AM | Permalink

        Re: Kenneth Fritsch (#76),

        There was a legitimate concern raised about the existence of the “carryover” error in earlier months, and there were even two concrete examples (Resolute and Finland). So I thought it was a relevant investigation. Actually, I was surprised to find only three definite errors, given that Steve had stumbled on one of them more or less by accident. But it still needs to be fixed.

        I have no idea what the 2007 GHCN monthlies will reveal. I’ll have a look at it later today when I get a few minutes.

        Your suggested links point back to this thread, so I can’t find them. Can you point me to the right thread? Thanks.

  53. Mike C
    Posted Nov 23, 2008 at 7:48 PM | Permalink

    I got in touch with NOAA several times last year and still haven’t heard back from them.

  54. Mike Bryant
    Posted Nov 23, 2008 at 8:29 PM | Permalink

    hey Mike C.,
    Maybe someone needs to go knock on their door.

  55. JimB
    Posted Nov 23, 2008 at 9:00 PM | Permalink

    Mike B:
    Would that be the front door?


  56. Steve McIntyre
    Posted Nov 24, 2008 at 10:29 AM | Permalink

    A new version of ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/grid/grid_1880_2008.dat.gz was posted on the NOAA website today without explanation or apology.

    PS. It appears to be the same version as the one that was deleted last week. Gridcell 329 is the same as before and is still problematic.

  57. Posted Nov 30, 2008 at 7:49 AM | Permalink

    Heads up!


    File:grid_mean_temp_1880_current.dat.gz 2434 KB 2008/11/28 07:00:00 PM

    I haven’t looked at the file yet.

    — Sinan

    • Posted Nov 30, 2008 at 8:03 AM | Permalink

      Re: Sinan Unur (#79), looks like a false alarm. This still contains the file from April, 2007. Sorry, got a little excited at first.

      — Sinan

  58. Posted Dec 3, 2008 at 5:44 PM | Permalink

    I just noticed that there is now a new version of grid_1880_2008.dat.gz on the server.

    I do not have a copy of the previous version but I am hoping Steve does so he can compare if anything has changed.

    — Sinan

  59. Deep Climate
    Posted Dec 5, 2008 at 2:08 PM | Permalink

    It’s December now – so this is the first to have November 2008 data. It would normally be updated at least once a month to incorporate the latest month’s data.

One Trackback

  1. […] Steve McIntyre of Climate Audit is the master of a devastating understatement that highlights hubris and reduces mountains to molehills with the flick of a wit… and today slips in a vid, Hello, Viet Nam! as a sweet underline of relativity, aka reality. […]
