Reading Data

There is an interesting collection of data from 223 Russian stations here, mentioned by both Jerry Brennan and myself. These 223 stations are precisely the same as the 223 stations in CDIAC’s NDP048, conveniently listed here. CDIAC NDP-048 includes an extensive discussion of station history.

The versions at typically go to 1995, while NDP048 stops in 1990. The data at is stored in files coded by WMO number. Nicholas has kindly developed a method of reading the zip data into R. I’ve applied this to make a function read.meteo (see first post below) which will return an annualized anomaly series from the zip files. I’ve done similar functions for HadCRU3, GHCN v2 and will post up a collation of these functions some time.

Nicholas has also developed a method for downloading GISS station data directly from R. I’ll post this up as accessing GISS station data has been exceedingly frustrating since there is no large data archive and, without Nicholas’ skill, you have to do this manually through the web page.


  1. Steve McIntyre
    Posted Mar 2, 2007 at 10:19 AM | Permalink

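    read.meteo<-function(id0) {
    # url (archive location) and name0 (zip file name) must already be defined;
    # the listing as posted does not set them
    test<-try(download.file( file.path(url,name0),name0, "auto", quiet = FALSE, cacheOK = TRUE, mode = "wb"));
    # if the download failed, hand back the try-error object
    if(inherits(test,"try-error")) return(test)
    # open the station file inside the zip; files are coded by WMO number
    data_handle <- unz(name0,paste("F",id0,"V4.M02",sep="") ,"r")
    Data <- scan(data_handle, what = list(rep(0,14)), comment.char = "\x1a");
    close(data_handle)
    temp<-(Data== 9999);Data[temp]<-NA   # 9999 is treated as the missing-value code

    # NOTE: the lines that build chron and year1 from Data are missing from the posted listing
    chron<-t( array(chron,dim=c(12,length(chron)/12) ) )   # one row per year, 12 values per row
    ts(apply(chron,1,mean,na.rm=TRUE),start=min(year1) )   # annual mean, ignoring missing values
    }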
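
The last two lines of read.meteo reshape a vector of values into a year-by-12 matrix and average each row. A minimal self-contained sketch of that step follows, run on made-up values. The names `annualize`, `start_year` and `base` are hypothetical, and the anomaly step (subtracting a base-period mean) is an assumption, since the posted listing stops at the annual mean.

```r
# Hypothetical helper: 'monthly' is a numeric vector of monthly values in
# chronological order, starting in January of 'start_year'.
annualize <- function(monthly, start_year, base = NULL) {
  stopifnot(length(monthly) %% 12 == 0)
  # fill a 12 x nyears array column by column, then transpose: one row per year
  m <- t(array(monthly, dim = c(12, length(monthly) / 12)))
  annual <- ts(apply(m, 1, mean, na.rm = TRUE), start = start_year)
  if (!is.null(base)) {
    # assumption: anomaly = annual mean minus the mean over the base years
    ref <- mean(window(annual, start = base[1], end = base[2]), na.rm = TRUE)
    annual <- annual - ref
  }
  annual
}

# made-up data: three years of constant monthly values, one month missing
x <- c(rep(1, 12), rep(2, 12), rep(3, 12))
x[5] <- NA
a <- annualize(x, 1990)                        # annual means
b <- annualize(x, 1990, base = c(1990, 1991))  # anomalies against 1990-1991
```

With `na.rm = TRUE` a year with a few missing months still gets a mean, which is convenient but can bias the annual value when the missing months fall in one season.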


  2. MilanS
    Posted Mar 2, 2007 at 12:58 PM | Permalink

    Thanks, Steve, it works fine. However, I had to deal with a minor problem: the characters ” (HEX94) in the script have to be replaced by the character " (HEX22). Apparently the blog software causes this inconvenience, because I see in the preview that it converts both characters in my text to ” (HEX94). If the problem is known, you can delete my post.
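
The quote substitution can also be done in R before the script is parsed. This is a minimal sketch, assuming the pasted text has been read as UTF-8, where byte 0x94 of Windows-1252 corresponds to U+201D; `fix_quotes` is a hypothetical helper name, not part of the posted script.

```r
# Replace typographic quotes with the plain ASCII quotes that R expects.
fix_quotes <- function(lines) {
  # curly characters are written as Unicode escapes so this code stays ASCII
  curly <- c("\u201c", "\u201d", "\u2018", "\u2019")
  plain <- c("\"", "\"", "'", "'")
  for (i in seq_along(curly)) {
    lines <- gsub(curly[i], plain[i], lines, fixed = TRUE)
  }
  lines
}

# a line as it might look after the blog software has converted the quotes
pasted <- "paste(\u201cF\u201d, id0, \u201cV4.M02\u201d, sep=\u201c\u201d)"
cleaned <- fix_quotes(pasted)
```

For a whole script, the same function could be applied to the result of `readLines()` before passing the text to `parse(text = ...)`.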

  3. Steve McIntyre
    Posted Mar 2, 2007 at 2:22 PM | Permalink

    #2. I’ll post this (and other similar read scripts) as an ASCII file.

  4. JerryB
    Posted Mar 2, 2007 at 3:34 PM | Permalink

    There seems to be a problem, an apparent conflict between
    the description of the layout of the data, and the actual
    layout of the data.

    The description indicates TMIN, followed by TMID, followed
    by TMAX. However, perusing the file for Karsakpaj, I notice
    that when two observations are present, but not three, the
    two are supposedly TMIN and TMID, or TMID and TMAX, but
    seemingly never TMIN and TMAX.

    My guess is that the placement of the data within the files
    is not correct when two, and not three, observations are
    listed; that the two are actually TMIN and TMAX, and that
    the misplacement accounts for the apparent discrepancy that
    I mentioned in another thread between data and
    GHCN daily data for Karsakpaj.

    Be careful out there. 🙂

  5. JerryB
    Posted Mar 2, 2007 at 4:09 PM | Permalink

    When TMIN, TMID, and TMAX are all present, TMID seems usually *not* to be (TMIN+TMAX)/2.

    Be really careful out there. :-0

  6. Steve McIntyre
    Posted Mar 2, 2007 at 10:37 PM | Permalink

    #5. Jerry – can you determine what TMID is?

  7. JerryB
    Posted Mar 3, 2007 at 8:13 AM | Permalink


    TMID seems to be an average of several readings:

    Two excerpts from ndp040.txt:

    “Daily mean, minimum, and maximum temperatures are available (to the
    nearest tenth of a degree Celsius) for each station. Temperature
    observations were taken eight times a day from 1966-89, four times a day
    from 1936-65, and three times a day from 1881-1935. Daily mean
    temperature is defined as the average of all observations for each
    calendar day.”

    See Table 1 in ndp040.txt for more details.

    “The amount of missing data varies from element to element and station to
    station. Typically, the records of minimum/mean temperature are more
    complete than those of maximum temperature and rainfall.”

    It seems that when only two temps are listed, they probably are not TMIN
    and TMAX, and my previous guess on that aspect was probably in error.
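
A small numerical sketch of why a mean of several readings need not equal (TMIN+TMAX)/2. The eight readings are made up for illustration, and the lowest and highest of them stand in for TMIN and TMAX, which is an assumption: the archived extremes need not coincide with any fixed-hour reading.

```r
# Illustrative values only: eight hypothetical readings for one calendar day (deg C)
obs <- c(-12.0, -14.0, -15.0, -13.0, -8.0, -5.0, -6.0, -10.0)

tmid     <- mean(obs)                   # average of all observations for the day
midrange <- (min(obs) + max(obs)) / 2   # what (TMIN+TMAX)/2 would give here

c(TMID = tmid, midrange = midrange, difference = tmid - midrange)
```

The two agree only when the readings are spread symmetrically about the midpoint, so a daily cycle with a long cold night and a short warm afternoon pulls the mean below the midrange.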

  8. Steve McIntyre
    Posted Mar 3, 2007 at 8:34 AM | Permalink

    #7. I’m going to try to compare a daily archive from GSN to a daily archive at meteo and perhaps ndp048 for a site or two and see what turns up.

  9. JerryB
    Posted Mar 3, 2007 at 8:53 AM | Permalink

    Re #8,

    I bet a donut, or even two, that the GSN will not have the TMID numbers; both of the others will.

  10. Steve McIntyre
    Posted Mar 3, 2007 at 10:45 AM | Permalink

    #9. I’ve looked at daily data for site 29612 – Barabinsk where both GSN and meteo-ru have long daily archives. Where both sites have values, they are identical. There are quite a few daily readings in which there is a TMIN but no TMAX. Curiously GHCN has a number of months in which an average is given but there are no daily max values in the entire month. I wonder what they did. Either they estimated the monthly mean from the monthly minimum or there’s some other daily data which has strangely got lost from these two archives.
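
Counting days with a TMIN but no TMAX, and finding months with no daily maximum at all, can be sketched as below. The data frame layout (columns `year`, `month`, `tmin`, `tmax`, with NA for missing values) is hypothetical and the values are made up; neither archive is claimed to use this layout.

```r
# Hypothetical layout: one row per day; NA where no value is archived.
daily <- data.frame(
  year  = c(1950, 1950, 1950, 1950, 1950, 1950),
  month = c(1, 1, 1, 2, 2, 2),
  tmin  = c(-20.1, -18.4, -22.0, -15.3, NA, -14.8),
  tmax  = c(NA, NA, NA, -9.9, -8.7, NA)
)

# days with a TMIN but no TMAX
tmin_only <- sum(!is.na(daily$tmin) & is.na(daily$tmax))

# number of non-missing daily TMAX values in each year/month
n_tmax <- aggregate(list(n_tmax = !is.na(daily$tmax)),
                    by = list(year = daily$year, month = daily$month),
                    FUN = sum)

# months with no daily TMAX at all
no_tmax_months <- n_tmax[n_tmax$n_tmax == 0, c("year", "month")]
```

Months returned in `no_tmax_months` are the ones where a published monthly mean could not have been built from daily (min+max)/2 values in the same archive.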

  11. JerryB
    Posted Mar 3, 2007 at 12:00 PM | Permalink

    #9, I would guess that the TMID was used to estimate the monthly mean,
    and that when TMIN and TMAX were available, they were simply archived.

  12. JerryB
    Posted Mar 3, 2007 at 12:08 PM | Permalink

    BTW, my impression is that, except in the GHCN adjusted files,
    the GHCN monthly mean, min, and max, numbers are all calculated
    by someone other than GHCN people.

  13. JerryB
    Posted Mar 3, 2007 at 2:12 PM | Permalink

    I found my old program for comparing GHCN monthly means with
    their min and max monthly data, and ran it on Barabinsk,
    for which GHCN has three sets of monthly means, each with
    different numbers. For one of those sets, there are also
    monthly min and max numbers in GHCN for several years,
    and when min and max numbers are present, the mean for that
    set is (min+max)/2, and it usually differs from the numbers
    in the other two sets.

    I also remembered the name of the file that I posted in January
    2005 in which I listed those instances in which at that time the
    GHCN monthly mean differed from (min+max)/2 by more than 0.1 C :
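
A check of this kind, flagging months where the monthly mean differs from (min+max)/2 by more than 0.1 C, might look like the following minimal sketch. It is not JerryB's program: the function name `flag_mismatch`, the column names and the values are all hypothetical.

```r
# Hypothetical layout: one row per station-month with columns mean, min, max (deg C)
flag_mismatch <- function(monthly, tol = 0.1) {
  midpoint <- (monthly$min + monthly$max) / 2
  diff <- monthly$mean - midpoint
  # the small epsilon guards against floating-point noise at exactly 'tol';
  # rows where min or max is missing are never flagged
  monthly[!is.na(diff) & abs(diff) > tol + 1e-9, ]
}

# made-up values for illustration
m <- data.frame(
  year  = c(1960, 1960, 1960),
  month = c(1, 2, 3),
  mean  = c(-18.0, -15.6, -9.0),
  min   = c(-23.0, -20.0, NA),
  max   = c(-13.0, -10.0, -4.0)
)
flagged <- flag_mismatch(m)
```

In this example only the second row is returned, since its mean lies 0.6 C below the midpoint.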

  14. JerryB
    Posted Mar 3, 2007 at 6:28 PM | Permalink

    Some counts of ndp040 data, which I suspect will match those
    in files:

    Total number of month/year/station sets    183493

    Sets containing:

    All three temp types
      TMIN TMID TMAX     159522

    Two of three temp types
      TMIN TMID           17030
      TMID TMAX            1341
      TMIN TMAX              67

    One of three temp types
      TMIN                   31
      TMID                 5292
      TMAX                   16

    No temperature data     194

    PRCP                 182480
    No PRCP                1013

  15. JerryB
    Posted Mar 3, 2007 at 6:54 PM | Permalink

    Some plain counts of data types in ndp040:

    176661 TMIN
    183196 TMID
    160957 TMAX
    182491 PRCP

    indicate that some of the numbers in my previous comment don’t
    quite add up to what they should. Close, but not exact.
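
The size of the mismatch can be worked out directly from the figures in the two comments above. A short sketch of the arithmetic, using only those posted numbers (the vector names are hypothetical labels):

```r
# counts of month/year/station sets by combination, from comment 14
sets <- c(all3 = 159522, min_mid = 17030, mid_max = 1341, min_max = 67,
          min_only = 31, mid_only = 5292, max_only = 16, none = 194)
total <- 183493

# the combinations partition the sets, so they should sum to the total
partition_ok <- sum(sets) == total

# occurrences of each data type implied by the combination counts
implied <- c(
  TMIN = sum(sets[c("all3", "min_mid", "min_max", "min_only")]),
  TMID = sum(sets[c("all3", "min_mid", "mid_max", "mid_only")]),
  TMAX = sum(sets[c("all3", "mid_max", "min_max", "max_only")]),
  PRCP = 182480
)

# plain counts from comment 15
plain <- c(TMIN = 176661, TMID = 183196, TMAX = 160957, PRCP = 182491)

excess <- plain - implied
```

The combination counts do sum to 183493, and each plain count exceeds the implied count by the same amount, 11. That would be consistent with 11 records being included in one tally and not the other, though nothing in the thread says why.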
