Collated GISS Versions Online

I’ve loaded R-tables for the dset=1 and dset=2 versions. Each R-table is a list of 7364 items, and each item is a station time series. The files are about 8 MB each. I have a variety of little scripts to retrieve and analyze things. The data is located here:

http://data.climateaudit.org/data/giss/giss.dset1.tab
http://data.climateaudit.org/data/giss/giss.adj.tab

Each file can be downloaded as follows:

con <- url("http://data.climateaudit.org/data/giss/giss.dset1.tab")
load(con) #giss.dset1
length(giss.dset1) #7364

It took about 15 seconds to load over my high-speed cable connection.
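
Since the download takes a little while, one option is to fetch the file once and load it from disk after that. A minimal sketch of that idea follows. `fetch_rdata` is a hypothetical helper name (not one of the scripts mentioned above), and caching under `tempdir()` is an assumption; any writable folder would do.

```r
# Sketch: download an R save file once, then load it from the local copy.
# `fetch_rdata` is a hypothetical helper, not part of any package.
fetch_rdata <- function(src, dest = file.path(tempdir(), basename(src))) {
  if (!file.exists(dest)) {
    # mode = "wb" matters on Windows: the file is binary R data, not text
    download.file(src, dest, mode = "wb", quiet = TRUE)
  }
  env <- new.env()
  obj_names <- load(dest, envir = env)  # load() returns the names it restored
  mget(obj_names, envir = env)          # named list of the restored objects
}

# Usage (needs network access):
# giss <- fetch_rdata("http://data.climateaudit.org/data/giss/giss.dset1.tab")
# length(giss$giss.dset1)
```

Loading into a separate environment keeps the restored objects from silently overwriting anything of the same name in the workspace.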

The names are the station identifications: a 3-digit country code plus an 8-digit station identifier. So you can pull out a station as follows:

test=giss.dset1[[paste("42574500003")]]
tsp(test) # 1898.000 2006.917 12.000
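
Once a station is pulled out, it behaves like any other monthly `ts` object. The sketch below uses a toy list with random values in place of the real `giss.dset1`, so only the structure (a named list of monthly series starting in 1898) follows the description above; the variable names are illustrative.

```r
# Toy stand-in for giss.dset1: a named list of monthly ts objects.
# The values are random; only the structure follows the description above.
set.seed(1)
toy <- list(
  "42574500003" = ts(rnorm(1308, mean = 12), start = c(1898, 1), frequency = 12)
)

station <- toy[["42574500003"]]
tsp(station)                        # start, end and frequency of the series

# The first three characters of each name are the country code
country <- substr(names(toy), 1, 3)

# Restrict to a sub-period and collapse monthly values to annual means
recent <- window(station, start = c(1990, 1), end = c(1999, 12))
annual <- aggregate(station, nfrequency = 1, FUN = mean)
```

With real station data, a year that contains missing months gives `NA` under a plain `mean`, so some handling of gaps would be needed before reading much into the annual values.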

It would be easy to make a NetCDF file from this and I’ll post one up if someone sends me a conversion script. I thought about posting up an ASCII version but it’s about 4 times larger and NOBODY should be using Excel for this type of thing. Get with the R-program.
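
For anyone who does want a text version, the list can be flattened into one long table and written out as CSV. This is only a sketch under assumptions: `series_to_long` is a hypothetical helper, the toy input uses made-up values, and the second station ID is invented for illustration.

```r
# Hypothetical helper: flatten a named list of monthly ts objects
# into one long data frame (one row per station-month).
series_to_long <- function(x) {
  rows <- lapply(names(x), function(id) {
    s <- x[[id]]
    data.frame(
      id    = id,
      year  = as.integer(floor(time(s) + 1e-6)),  # small offset guards against rounding
      month = as.integer(cycle(s)),
      value = as.numeric(s),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy input with made-up values, standing in for giss.dset1.
# "42574500004" is an invented ID used only for this example.
toy <- list(
  "42574500003" = ts(1:24, start = c(1898, 1), frequency = 12),
  "42574500004" = ts(101:124, start = c(1950, 1), frequency = 12)
)
long <- series_to_long(toy)
out_file <- file.path(tempdir(), "giss_toy.csv")
write.csv(long, out_file, row.names = FALSE)
```

When reading such a file back into R, passing `colClasses = c(id = "character")` to `read.csv` keeps the station identifiers as text rather than large numbers.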

29 Comments

  1. Willis Eschenbach
    Posted Aug 24, 2007 at 12:33 AM | Permalink

    Steve, many thanks for the info. A couple points, and one note:

    1. To get it to download on my machine, I had to use

    con = url(paste("http://data.climateaudit.org/data/giss/giss.dset1.tab"))

    in place of what you had posted above.

    2. To get the program line

    test=giss.dset1[[paste(“42574500003”)]]

    to work, I had to replace the styled quotes with regular quotes.

    After that, it ran fine.

    My question is, do you have a file with the lat/long, elevation, and population data for the stations?

    Keep up the good work, it’s much appreciated.

    Steve: The pasting of quotation marks from WordPress is problematic. You may need to re-type them.

    w.

  2. PHE
    Posted Aug 24, 2007 at 12:51 AM | Permalink

    What software is needed to read them?

  3. Louis Hissink
    Posted Aug 24, 2007 at 3:32 AM | Permalink

    NOBODY should be using Excel

    Hah

    dumber than dumb geologists love it.

    I don’t.

    Version 2007 is causing me heaps of grief.

  4. TAC
    Posted Aug 24, 2007 at 3:42 AM | Permalink

    Willis (#1), My computer (MacBook) seems to balk at the funny quotes in your R command:
    con = url(paste(”http://data.climateaudit.org/data/giss/giss.dset1.tab”))
    However, it works fine with SteveM’s command once you replace the “<” and “–” with an “=” (or push the two characters together to form the assignment operator).
    con = url("http://data.climateaudit.org/data/giss/giss.dset1.tab")

  5. Wolfgang Flamme
    Posted Aug 24, 2007 at 3:44 AM | Permalink

    Steve,
    thank you for sharing the data. Could you share the ‘little scripts’ as well, please?

  6. fFreddy
    Posted Aug 24, 2007 at 5:14 AM | Permalink

    Re #3, Louis Hissink

    Version 2007 is causing me heaps of grief.

    Anyone who uses a Microsoft product in the same year as it was released is a braver man than I …

  7. Steve McIntyre
    Posted Aug 24, 2007 at 10:10 AM | Permalink

    giss.adj is now posted after re-scraping the USHCN sites.

  8. John Goetz
    Posted Aug 24, 2007 at 10:11 AM | Permalink

    Yes, I need to teach myself R, but I have so many other things I have to do as well, so for the time being I am stuck with Excel and Visual Basic, which is why I focus on one station at a time.

  9. Jon
    Posted Aug 24, 2007 at 11:05 AM | Permalink

    Steve: I’d like to re-import this information into a database. It would be nice if you could zip a csv file.

  10. Steve McIntyre
    Posted Aug 25, 2007 at 8:11 AM | Permalink

    I think that the giss.dset1 version that I archived was the version that had 335 stations with more than one time series. In any event, I’ve replaced this with the version that combines these.

  11. Kenneth Fritsch
    Posted Aug 25, 2007 at 10:08 AM | Permalink

    I thought about posting up an ASCII version but it’s about 4 times larger and NOBODY should be using Excel for this type of thing. Get with the R-program.

    There’s a comment that is getting as predictable as rain in Seattle. Unfortunately, after attempting some of these exercises with Excel, I have found that I have no choice but to follow Steve’s advice. I have had one false start on R, but I have noted that other posters here have evidently gotten with the program. I guess if I can remain content to learn from the information as presented by other users of R, I would not have to learn R. Unfortunately, I, like apparently others here, like to play with data our way.

    Meanwhile, I anticipate some better understanding of the processes that go into the GISS data set from these exercises.

  12. steven mosher
    Posted Aug 25, 2007 at 10:23 AM | Permalink

    A climate “R” package would be cool.

  13. Steve McIntyre
    Posted Aug 25, 2007 at 12:38 PM | Permalink

    #12. Steve Mosher, I can’t figure out how to make an R-package. I’ve read some manuals but am stumped. If anyone knows of a template that was used for making a package, I’d be happy to work up a package. Actually there are about 10 packages that I could make.

  14. Steve McIntyre
    Posted Aug 25, 2007 at 12:41 PM | Permalink

    #11. Ken, I’m not saying this to be tiresome. There are benefits right from a standing start: you can download and read data directly into a program; R functions are easier to manage than Excel macros; you can handle much MUCH bigger data sets; it goes on and on. You’ll be able to do anything that you can presently do in Excel in about 15 minutes.

  15. Sam
    Posted Aug 25, 2007 at 12:59 PM | Permalink

    Steve or anyone,

    I’m wondering if you can help clear up my confusion.

    I have apparently been incorrect in my assumption that GISS data was used in USHCN and GHCN results. Rather, I hear that they use the data from NCDC.

    What purpose does Hansen’s GISS data provide if they are not used in the historical data bases? Or am I incorrect in this as well?

    Is the difference between the two (GISS and USHCN) merely in the way they manipulate (adjust?) the data since I believe they collect the data from the same weather station history data bases?

    I’d greatly appreciate any insight or corrections you can provide.

  16. Joel McDade
    Posted Aug 25, 2007 at 1:07 PM | Permalink

    A newbie R thread would be cool.

    I have been tackling R off and on for a few months. While I can’t describe exactly why, it *is* difficult. And heck, I used to be pretty much an expert in Fortran and semi-fluent in Delphi.

    I have purchased a couple of books, but as soon as I need to do something slightly different and deviate from the examples provided, I am usually stuck for hours. The help files only occasionally *help*, as you almost need to be an expert to understand them.

    Despite the learning curve, however, I’ve seen and learned enough to know that it is the last language I will ever need or want to know.

  17. Richard
    Posted Aug 25, 2007 at 1:18 PM | Permalink

    Anyone have a conversion method to get the files into ASCII (for us too-old-to-learn-anything-new Excel guys)?

    Thanks,

    Richard

  18. Kenneth Fritsch
    Posted Aug 25, 2007 at 1:21 PM | Permalink

    RE: #14

    #11. Ken, I’m not saying this to be tiresome.

    You are not being tiresome. I just had to comment, since when I saw a post that commented on downloading to Excel, I noted to myself that your reply was coming shortly. The other day, I could not download an .nc file, so I searched (and searched) for a source with a text file that I could download. I found it and felt a little satisfied, but realized full well that if one has a hard head one better have a strong back.

  19. Steve McIntyre
    Posted Aug 25, 2007 at 1:25 PM | Permalink

    USHCN is a collection of 1221 US stations in 4 flavors: raw, TOB-adjusted, adjusted, and urban-adjusted. USHCN is a subset of GHCN, which has 2 flavors, raw and adjusted. GISS uses GHCN and has two versions: raw and adjusted. Their raw version is sort of like GHCN adjusted, except when it’s GHCN raw. Many if not most GHCN series are not updated, and current data tends to be the online WMO network at GHCN daily.

  20. Sam
    Posted Aug 25, 2007 at 1:50 PM | Permalink

    re: #19

    Thanks Steve. So Hansen’s data is not used by GHCN although he uses their data.

  21. Sam
    Posted Aug 25, 2007 at 2:01 PM | Permalink

    Steve, sorry. One last question.

    Which source of data is ‘considered’ to be the authoritative source when describing US temperatures – GISS or NCDC, I guess that would mean either NASA or the National Climatic Data Center? Sorry if I may seem a little thick, but I understand that NCDC still clings to the old saws about the last decade being the warmest years and that they haven’t accepted Hansen’s corrections.

  22. steven mosher
    Posted Aug 25, 2007 at 2:54 PM | Permalink

    RE 13.

    SteveMC..
    I’m assuming you have hit this page

    http://developer.r-project.org/

  23. steven mosher
    Posted Aug 25, 2007 at 3:15 PM | Permalink

    RE 16 Joel.

    If you didn’t grow up in Unix, Yacc, perl, C, and Vi, then
    I suspect R might be a steep ass hill to climb.

    I’m struggling and I’ve done this crap before.
    However, I had a lobotomy when I left engineering.
    It was required as part of the promotion package.

  24. Mhaze
    Posted Aug 25, 2007 at 3:16 PM | Permalink

    RE #1

    Worked for me using a tablet computer, quite a surprise, also XP 64 bit.

    I suggest going back to review the R documentation: pick the section dealing with grammar and syntax, and then watch for proper use of quotes, parentheses, and brackets, so that you have a reference as to what Should Be Correct.

  25. Steve McIntyre
    Posted Aug 25, 2007 at 3:35 PM | Permalink

    I think that NOAA uses USHCN with some sort of regional weighting. To add to the brew there is USHCN version 1 and USHCN version 2 (scheduled for release in July 2007 but still not released other than press release results).

  26. steven mosher
    Posted Aug 26, 2007 at 11:30 AM | Permalink

    RE 4.

    TAC thanks that worked for me

  27. steven mosher
    Posted Aug 26, 2007 at 11:39 AM | Permalink

    RE 13..

    Did you try here?

    http://www.maths.bris.ac.uk/~maman/computerstuff/Rhelp/Rpackages.html

    http://web.maths.unsw.edu.au/~wand/webcpdg/rpack.html

  28. Posted Aug 28, 2007 at 6:38 AM | Permalink

    Hi Steve and group,

    I’ve developed an R resource on http://www.stikir.com. It should be useful for those of you having trouble installing R, finding it difficult to climb the learning curve, or having problems accessing or using the large climate datasets.

    It’s a wiki site where you can enter R scripts. The server will run it and provide your results. Everyone can see and modify your analysis. Using this technology the provenance of the results is clear. The data, the analysis, and the results are all fully transparent and auditable.

    You can see some examples here:
    GHCN_Climate_Data_Sandbox

    If you press the “edit” button on the page, it should make sense how the tool can be used.

    The sample analyses are not particularly clever, but are provided to illustrate the functionality. I’ll happily lend a hand to people wanting to publish any more substantial analysis.

    The GHCN v2 temperature and precipitation data has been made available, as well as some Energy Information Administration data on carbon emissions. Steve, please let me know if this collated GISS data is in the public domain. If so, I’ll upload it to the site, along with any other data sets this community might be interested in.

    Regards,
    Mike

  29. steven mosher
    Posted Aug 28, 2007 at 7:32 AM | Permalink

    Mike,

    Very nice.