Downloading UWisc Data

The Jeffs have been exploring the UWisc AVHRR data, which is stored as a large number of gzipped netCDF (*.cdf.gz) files. I couldn't figure out how to download this data and open it in R. So, as I usually do in these cases, I asked CA reader Nicholas, who, as usual, had a solution. I edited this slightly to make a function which downloads the file and unzips it into a temporary file, which can then be opened with the ncdf package and treated in the usual ways. The URL in the call can be structured to scrape data.

download.gz.cdf=function(url,tempname="temp.gz") {
download.file(url, tempname, mode="wb")
# decompress the gzipped data into RAM
gz <- gzfile(tempname, "rb")
data <- readBin(gz, "raw", 104857600)
close(gz)
# write the decompressed data back to the temporary file
tempfile <- file(tempname, "wb")
writeBin(data, tempfile)
close(tempfile)
}

For example:

year=1982; hour="0200"; tempname="temp.gz"
library(ncdf)
url=paste("ftp://stratus.ssec.wisc.edu/pub/appx/Antarctica/",year,"/",hour,"/mean_",year,"0199_",hour,".params.cdf.gz",sep="")
#this creates a structured url; structures vary
download.gz.cdf(url,tempname="temp.gz")
nc <- open.ncdf(tempname)

Then you have to parse the netCDF object to find the things you want, but this is standard parsing.
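For example, a minimal sketch of that parsing with the old ncdf package (the variable name "tsurf" is taken from the scraping script further down; other files may differ):

library(ncdf)
nc <- open.ncdf("temp.gz")           # after download.gz.cdf(), temp.gz holds the decompressed netCDF
names(nc$var)                        # list the variables stored in the file
tsurf <- get.var.ncdf(nc, "tsurf")   # pull one variable out as an ordinary R array
dim(tsurf)                           # 321 x 321 for these files
close(nc)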

35 Comments

  1. Posted Mar 27, 2009 at 2:21 PM | Permalink | Reply

    We used this library found by JeffC

    http://cran.r-project.org/web/packages/ncdf/index.html

    It did a fine job turning the cdf files into R objects.

    • Steve McIntyre
      Posted Mar 27, 2009 at 3:07 PM | Permalink | Reply

      Re: Jeff Id (#1), I’ve used this for a long time. The issue is not reading netcdf into R but *.cdf.gz files. Didn’t you do this in Perl or something external?

      • Posted Mar 27, 2009 at 3:12 PM | Permalink | Reply

        Re: Steve McIntyre (#3),

        I tried for a long time to do it in R. I’ll try your script out later because I couldn’t make it work, and you didn’t leave your cell number on the blog when you went on vacation ;) .

        • Steve McIntyre
          Posted Mar 27, 2009 at 3:17 PM | Permalink

          Re: Jeff Id (#5), calling me wouldn’t have helped. This needed Nicholas’ magic.

  2. Jeff C.
    Posted Mar 27, 2009 at 3:05 PM | Permalink | Reply

Jeff Id has a script that takes the unzipped monthly files, extracts the temperature data, and then combines them into a 3-dimensional R object (month,x,y). There are two objects, one for the 0200 files and one for the 1400 files. My script (copied below) then does the following:

    Opens 1400 3D R object
    Converts from Kelvin*10 to Celsius
    Averages four cells to one (from 25 x 25 km to 50 x 50 km)
    Calculates new lat-lon for new 50 x 50 km cells
    Strips out cells outside of the continent or ice shelves, leaving 5509 cells
    (calls an external file with non-ocean cell locations)
    Converts to anomalies

    Repeats process for 0200 3D R object

    Averages 1400 and 0200 anomalies

    Writes files to disk

    The external file for the ocean mask is too big to put here but I can forward it. Figuring out that mask was the most difficult part of the task.

#### Processes AVHRR monthly mean data ####
#
# input files required:
# "slon25.img.cdf", "slat25.img.cdf" lat-lon files
# "antarctic_ocean_mask1.csv" list of cells to be included
# "AVHRR1400 1982-2004.rts" AVHRR 1400 temperature file (3D)
# "AVHRR 0200 1982-2004.rts" AVHRR 0200 temperature file (3D)
    #

    #### Calculate lat lon for 5509 cells ######

    #read in lat-lon and parse to continent/shelves only
    # cell size enlarged to 50 x 50 km

    library(ncdf)
lon_file=open.ncdf("slon25.img.cdf")
lat_file=open.ncdf("slat25.img.cdf")
lon=get.var.ncdf(lon_file,"lon")/100
lat=get.var.ncdf(lat_file,"lat")/100
#write.csv(lon,"lon_coord_all.csv")
#write.csv(lat,"lat_coord_all.csv")

    #convert to polar xy coordinates for averaging
    x=(lat+90)*sin(lon*pi/180)
    y=(lat+90)*cos(lon*pi/180)

    #x average four points to one
    #doesn’t use last row or first column to match steig grid
    x1=x[seq(1,319,by=2),seq(2,320,by=2)]
    x2=x[seq(1,319,by=2),seq(3,321,by=2)]
    x3=x[seq(2,320,by=2),seq(2,320,by=2)]
    x4=x[seq(2,320,by=2),seq(3,321,by=2)]
    x_avg=(x1+x2+x3+x4)/4 #x in 160×160 grid
    rm(x,x1,x2,x3,x4) #remove variables from workspace

    #y average four points to one
    #doesn’t use last row or first column to match steig grid
    y1=y[seq(1,319,by=2),seq(2,320,by=2)]
    y2=y[seq(1,319,by=2),seq(3,321,by=2)]
    y3=y[seq(2,320,by=2),seq(2,320,by=2)]
    y4=y[seq(2,320,by=2),seq(3,321,by=2)]
    y_avg=(y1+y2+y3+y4)/4 #y in 160×160 grid
    rm(y,y1,y2,y3,y4) #remove variables from workspace

    #convert back to lat-lon, combine to one variable
    lat_avg=((x_avg^2+y_avg^2)^0.5)-90
    lon_avg=atan2(x_avg,y_avg)*180/pi
    dim(lon_avg)=c(25600,1)
    dim(lat_avg)=c(25600,1)
    coord_avg=cbind(lon_avg,lat_avg)

    #strip out ocean data points – ##ocean mask##
ocean_mask=scan("antarctic_ocean_mask1.csv",sep=",")
coord_5509=coord_avg[ocean_mask,]
rm(lon_avg,lat_avg)
write.csv(coord_5509,"coord_5509.csv",row.names=F)

    ## lat-lon complete ##

    ###### process AVHRR 1400 Data ########

    #read in AVHRR 1400 Data and parse to continent/shelves only
    # cell size enlarged to 50 x 50 km
load(file="AVHRR1400 1982-2004.rts")
    month.temp=fu[1:276,,]/10-273.15 ##puts temps in 276x321x321 grid, convert to deg C
    rm(fu) #removes fu from workspace

    #average four points to one
    #doesn’t use last row or first column to match steig grid
    month.temp1=month.temp[,seq(1,319,by=2),seq(2,320,by=2)]
    month.temp2=month.temp[,seq(1,319,by=2),seq(3,321,by=2)]
    month.temp3=month.temp[,seq(2,320,by=2),seq(2,320,by=2)]
    month.temp4=month.temp[,seq(2,320,by=2),seq(3,321,by=2)]
    month.temp_avg=(month.temp1+month.temp2+month.temp3+month.temp4)/4 #temps in 160 x 160 grid
    rm(month.temp,month.temp1,month.temp2,month.temp3,month.temp4) #remove variables from workspace

    #strip out ocean data points – ##ocean mask##
    dim(month.temp_avg)=c(276,25600)
    temp14_5509=ts(month.temp_avg[,ocean_mask],start=1982,frequency=12)
#write.csv(temp14_5509,"AVHRR1400_1982-2004_5509_points.csv")

    rm(month.temp_avg) #remove variables from workspace

    #calculate monthly mean for all 5509 cells
    #calculate anomalies for all 5509 cells
    a=as.numeric(temp14_5509)
    dim(a)=c(276,5509)
    anom14_5509=array(NA,dim=c(276,5509))
    mean14_5509=array(NA,dim=c(12,5509))
    for(i in 1:5509){
    b=a[,i]
    dim(b)=c(12,23)
mean14_5509[,i]=rowMeans(b,na.rm=TRUE) #calculates monthly mean for all cells
    anom14_5509[,i]=a[,i]-mean14_5509[,i]} #calculates anomalies for all cells
    rm(b,a) #removes temp variables from workspace

    #convert anoms array to time series, calculate continent mean
    anom14_5509=ts(anom14_5509,start=1982,frequency=12)
temp14_cont=ts(rowMeans(temp14_5509,na.rm=TRUE),start=1982,frequency=12)
anom14_cont=ts(rowMeans(anom14_5509,na.rm=TRUE),start=1982,frequency=12)

    ## AVHRR 1400 complete ##

    #### process AVHRR 0200 Data ######

    #read in AVHRR 0200 Data and parse to continent/shelves only
    # cell size enlarged to 50 x 50 km
load(file="AVHRR 0200 1982-2004.rts")
    month.temp=su[1:276,,]/10-273.15 ##puts temps in 276x321x321 grid, convert to deg C
    rm(su) #removes su from workspace

    #average four points to one
    #doesn’t use last row or first column to match steig grid
    month.temp1=month.temp[,seq(1,319,by=2),seq(2,320,by=2)]
    month.temp2=month.temp[,seq(1,319,by=2),seq(3,321,by=2)]
    month.temp3=month.temp[,seq(2,320,by=2),seq(2,320,by=2)]
    month.temp4=month.temp[,seq(2,320,by=2),seq(3,321,by=2)]
    month.temp_avg=(month.temp1+month.temp2+month.temp3+month.temp4)/4 #temps in 160 x 160 grid
    rm(month.temp,month.temp1,month.temp2,month.temp3,month.temp4) #remove variables from workspace

    #strip out ocean data points – ##ocean mask##
    dim(month.temp_avg)=c(276,25600)
    temp02_5509=ts(month.temp_avg[,ocean_mask],start=1982,frequency=12)
#write.csv(temp02_5509,"AVHRR0200_1982-2004_5509_points.csv")

    rm(month.temp_avg)

    #calculate monthly mean for all 5509 cells
    #calculate anomalies for all 5509 cells
    a=as.numeric(temp02_5509)
    dim(a)=c(276,5509)
    anom02_5509=array(NA,dim=c(276,5509))
    mean02_5509=array(NA,dim=c(12,5509))
    for(i in 1:5509){
    b=a[,i]
    dim(b)=c(12,23)
mean02_5509[,i]=rowMeans(b,na.rm=TRUE) #calculates monthly mean for all cells
    anom02_5509[,i]=a[,i]-mean02_5509[,i]} #calculates anomalies for all cells
    rm(b,a) #removes temp variables from workspace

    #convert anoms array to time series, calculate continent mean
    anom02_5509=ts(anom02_5509,start=1982,frequency=12)
temp02_cont=ts(rowMeans(temp02_5509,na.rm=TRUE),start=1982,frequency=12)
anom02_cont=ts(rowMeans(anom02_5509,na.rm=TRUE),start=1982,frequency=12)

    ## AVHRR 0200 complete ##

    #### combine 1400 and 0200 data sets ######
    #set from 1400 and 0200 AVHRR are averaged into one set
    temp_5509=(temp14_5509+temp02_5509)/2
    anom_5509=(anom14_5509+anom02_5509)/2
    anom_cont=(anom14_cont+anom02_cont)/2

write.csv(temp_5509,"temp_5509.csv",row.names=F)
write.csv(anom_5509,"anom_5509.csv",row.names=F)
write.csv(temp02_5509,"temp02_5509.csv",row.names=F)
write.csv(temp14_5509,"temp14_5509.csv",row.names=F)
write.csv(anom02_5509,"anom02_5509.csv",row.names=F)
write.csv(anom14_5509,"anom14_5509.csv",row.names=F)
#write.csv(anom02_cont,"anom02_cont.csv",row.names=F)
#write.csv(anom14_cont,"anom14_cont.csv",row.names=F)
#write.csv(temp02_cont,"temp02_cont.csv",row.names=F)
#write.csv(temp14_cont,"temp14_cont.csv",row.names=F)

    • Steve McIntyre
      Posted Mar 27, 2009 at 3:11 PM | Permalink | Reply

      Re: Jeff C. (#2), RomanM had a clever way of extracting an Antarctica mask:

      library(mapproj)
      trans.spole = function(lat,lon,R=1){
      crad = pi/180
      x = R*sin(crad*(90-lat))*sin(crad*lon)
      y = R*sin(crad*(90-lat))*cos(crad*lon)
      list(x = x, y = y)}

temp = map("world", plot=F)
anta = map("world",region = temp$names[grep("Anta",temp$names)],plot=F)
anta.p = trans.spole(anta$y,anta$x) # convert to south polar view

      anta$x=anta.p$x
      anta$y=anta.p$y
      rm(anta.p,temp)
      length(anta$x) #[1] 1342

      • Jeff C.
        Posted Mar 27, 2009 at 3:58 PM | Permalink | Reply

Re: Steve McIntyre (#4), This is clever. I think it only gives us the continent and doesn't include the ice shelves though. I did it the hard way of overlaying the Steig cells on top of all the AVHRR cells in Excel. I then deleted all the AVHRR cells that didn't fall in the Steig coverage area. I ended up with 5509 cells, the same number as Steig, but the points don't exactly overlay his (they are very close, though). The result is a list of 5509 cell IDs from the original AVHRR grid; the program keeps these and discards the rest.
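For reference, roughly the same matching could be done in R with a nearest-neighbour search instead of Excel. This is only a sketch: it assumes the 50 km averaged AVHRR coordinates (coord_avg from the script above, longitude in column 1, latitude in column 2) plus a hypothetical data frame steig.coord holding Steig's 5509 cell longitudes and latitudes ($long, $lat):

# put both coordinate sets on the same polar x-y plane (same transform as used in the script above)
xy=function(lon,lat) cbind((lat+90)*sin(lon*pi/180),(lat+90)*cos(lon*pi/180))
avhrr.xy=xy(coord_avg[,1],coord_avg[,2])          # 25600 averaged AVHRR cells
steig.xy=xy(steig.coord$long,steig.coord$lat)     # steig.coord is a hypothetical object of Steig's cell coordinates
# for each Steig cell, the index of the nearest AVHRR cell
nearest=apply(steig.xy,1,function(p) which.min((avhrr.xy[,1]-p[1])^2+(avhrr.xy[,2]-p[2])^2))
ocean_mask=sort(unique(nearest))   # candidate list of non-ocean cell IDs (cf. antarctic_ocean_mask1.csv)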

      • RomanM
        Posted Mar 27, 2009 at 4:36 PM | Permalink | Reply

        Re: Steve McIntyre (#4),

I made a minor improvement in the map by getting rid of an island well north of Antarctica. It caused the map boundaries to be non-square, causing problems.

library(maps)
trans.spole = function(lat,lon,R=1){
crad = pi/180
x = R*sin(crad*(90-lat))*sin(crad*lon)
y = R*sin(crad*(90-lat))*cos(crad*lon)
list(x = x, y = y)}

temp = map("world", plot=F)
anta = map("world",region = temp$names[grep("Anta",temp$names)],plot=F)
excl = which(anta$y > -60)

anta.p = trans.spole(anta$y[-excl],anta$x[-excl]) # convert to south polar view
        anta$x=anta.p$x
        anta$y=anta.p$y
        rm(anta.p,temp)
        length(anta$x) #[1] 1338

        It works better now.

        • Steve McIntyre
          Posted Mar 27, 2009 at 5:00 PM | Permalink

Re: RomanM (#12), we don't quite get a mask of the Wisconsin pixels from this as is. Here's a pretty good approximation using the Steig coordinates in grid.info.

grid.info$longclass=round(grid.info$long)   # grid.info: Steig's 5509 cell coordinates
test=tapply(grid.info$lat,factor(grid.info$longclass,levels=-179:180),max)   # northernmost Steig latitude in each degree of longitude
test=c(test[360],test);names(test)[1]="-180"   # wrap 180E around to -180
temp=!is.na(test)
h=approxfun(as.numeric(names(test))[temp],test[temp])   # interpolate the boundary as a function of longitude
info$antmax=h(info$long)   # info: lat-long of the Wisconsin pixels
temp=(info$lat>info$antmax);sum(temp)   # pixels north of the boundary (ocean)
sum(!temp) #5535

          This pares the Wisconsin pixels down to 5535 pixels for the continent.

    • Steve McIntyre
      Posted Mar 27, 2009 at 3:14 PM | Permalink | Reply

      Re: Jeff C. (#2),

Jeff Id has a script that takes the unzipped monthly files, extracts the temperature data, and then combines them into a 3-dimensional R object (month,x,y). There are two objects, one for the 0200 files and one for the 1400 files.

      The above method avoids the need for external unzipping. The unzipping is done within R. Then you can make the data matrix in a few lines. All you need to do is cycle through the years and months, picking out the temperatures from the ncdf object and collating them.

I'm trying to figure out what the 19 parameters in the Wisc object actually represent, though temperature seems to be the first one.
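A sketch of one way to inventory them from R, assuming (as Luboš's Mathematica dump further down suggests) that each variable carries a "Description" attribute; the attribute name is a guess and may need adjusting:

library(ncdf)
nc=open.ncdf("temp.gz")    # a file fetched with download.gz.cdf() as in the head post
for (v in names(nc$var)) {
a=att.get.ncdf(nc, v, "Description")   # attribute name assumed
cat(v, ":", if (a$hasatt) a$value else "(no Description attribute)", "\n")
}
close(nc)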

  3. Steve McIntyre
    Posted Mar 27, 2009 at 9:00 PM | Permalink | Reply

    Here’s a way to scrape the UWisc data. I’ve got it running now and it will take a while. Each year as collated is about 3 MB (extracted from about 36 MB of data.)

library(ncdf)
download.gz.cdf=function(url,tempname="temp.gz") {
download.file(url, tempname, mode="wb")
# decompress the gzipped data into RAM
gz <- gzfile(tempname, "rb")
data <- readBin(gz, "raw", 104857600)
close(gz)
# write the decompressed data back to the temporary file
tempfile <- file(tempname, "wb")
writeBin(data, tempfile)
close(tempfile)
}

year_range=1982:2004
K=length(year_range)
hour=c("0200","1400")
month=paste(1:12);month[1:9]=paste("0",month[1:9],sep="")
for(year in year_range) {
  surft=array(NA,dim=c(2,12,321,321))
  for (k in 1:2) {
    for(j in 1:12) {
      url=paste("ftp://stratus.ssec.wisc.edu/pub/appx/Antarctica/",year,"/",hour[k],"/mean_",year,month[j],"99_",hour[k],".params.cdf.gz",sep="")
      #url="ftp://stratus.ssec.wisc.edu/pub/appx/Antarctica/1982/0200/mean_19820199_0200.params.cdf.gz"
      test=try(download.gz.cdf(url,tempname="temp.gz"))
      if( !( class(test)=="try-error")) {
        nc=open.ncdf( "temp.gz", write=FALSE, readunlim=TRUE, verbose=FALSE)
        surft[k,j,,]=get.var.ncdf(nc, "tsurf") #321 x 321
        close(nc)
      }
    } #j
  } #k
  save(surft, file=paste("d:/climate/data/steig/wisc",year,".surft.tab",sep="") ) #dim 2 12 321 321
} #year
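A sketch (not the actual collation code) of how the yearly files saved above could then be stacked into two 276 x 321 x 321 arrays, one per observation hour; the paths and the surft object follow the script above:

year_range=1982:2004
surft02=array(NA,dim=c(12*length(year_range),321,321))
surft14=array(NA,dim=c(12*length(year_range),321,321))
for (i in seq_along(year_range)) {
  load(paste("d:/climate/data/steig/wisc",year_range[i],".surft.tab",sep=""))  # loads surft, dim 2 12 321 321
  surft02[(i-1)*12+(1:12),,]=surft[1,,,]   # the "0200" files
  surft14[(i-1)*12+(1:12),,]=surft[2,,,]   # the "1400" files
}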

  4. Steve McIntyre
    Posted Mar 27, 2009 at 9:49 PM | Permalink | Reply

    It took about 36 minutes to scrape the data. A peculiarity – there’s no monthly Wisc data for the last three months of 1994 – which, as I recall, is the time of the splice after NOAA-11 that bothered Ryan O.

    • Posted Mar 27, 2009 at 9:53 PM | Permalink | Reply

      Re: Steve McIntyre (#15),

      That’s right. JeffC and I mentioned that in another thread. Where did the data come from? I’m sure the NSIDC would have posted it if it existed in reasonable form. Still, Comiso’s data is visually an excellent match to surface stations and extremely clean.

  5. Posted Mar 28, 2009 at 9:20 AM | Permalink | Reply

    Dear Steve,

    well, that’s funny. I would be able to unzip the GZ archives, at least manually with WinRAR, and maybe even make this step automatic, but the mess in the resulting CDF file that I just see would probably look like too much of a good thing to me. ;-)

    Shouldn’t someone create a mirror where these files are translated to a somewhat more readable text form? It looks like a GB of data, not more. At Skydrive, a user has 25 GB of free space.

    Best
    Lubos

    • Steve McIntyre
      Posted Mar 28, 2009 at 9:35 AM | Permalink | Reply

      Re: Luboš Motl (#17),
      It won’t take me very long to collate the scraped data. I’ll post it up somewhere once I’ve done so. Within a few days as I’ve got a few other things to do.

  6. Posted Mar 28, 2009 at 9:33 AM | Permalink | Reply

    I will try to never underestimate Wolfram again! Of course, Mathematica can import all such CDF (and many other) files. Saving a file as mean.cdf,

    Import["mean.cdf", "Elements"]

    returns

{"Annotations", "Data", "DataFormat", "Datasets", "Dimensions", "Metadata"}

    And then the commands like

    Import["mean.cdf", "Annotations"]

    and the same with “Data” and other elements probably give me everything useful that the CDF file contains. ;-)

    • Steve McIntyre
      Posted Mar 28, 2009 at 9:42 AM | Permalink | Reply

      Re: Luboš Motl (#18), Importing a gz-ipped cdf file may be a little trickier. Try it. I had to get help from Nicholas to figure it out in R.

      While I admire your purity in Mathematica, you should also consider the option of being slightly bilingual in R as the retrieval codes may not be worth trying to figure out in Mathematica.

  7. Posted Mar 28, 2009 at 9:48 AM | Permalink | Reply

Dear Steve, while I admit that I have certain problems importing these particular GZ files, Mathematica automatically decompresses GZ and similar files, so the only extra step for GZ compression is adding .GZ to the filename in the import command. See

    http://reference.wolfram.com/mathematica/ref/format/GZIP.html

    It saves a file into the temporary directory. But so far I am getting $Failed with these files…

    At any rate, if I were working with those files, I would probably first create a local copy of all of them, and if there were a problem with GZ, I would decompress all of them in one way or another.

    • Posted Mar 28, 2009 at 9:54 AM | Permalink | Reply

      Re: Luboš Motl (#22),

When I did it I used some freeware to unzip the files before running the R ncdf package. I had to try several different programs before I found one that worked. Certain software wouldn't open these GZs; it was as if there was something different about this GZ version.

  8. Posted Mar 28, 2009 at 10:10 AM | Permalink | Reply

    Dear Jeff Id, surprisingly, I was just going to write something that loosely contradicts your observation.

    The problem with importing the files above is probably related to the remote FTP URL (although Mathematica can read FTP files in general), not to a non-standard GZIP format.

    When I save a cdf.gz from the server locally and write

    Import["mean2.cdf.gz"]

    I obtain

{"parameters", "size", "tsurf", "broadalb", "cldre", "cldtau", "cldphase", "cldtemp", "cldpress", "cldtype", "pw", "landmask2", "swdnsrf", "lwdnsrf", "swupsrf", "lwupsrf", "swdntoa", "swuptoa", "lwuptoa", "swcldfrc", "lwcldfrc", "levels", "tprofave", "wvprofave", "pprofave"}

It clearly works, much like my WinRAR 3.71, which worked too. (Hint: downloading serial keys for software via torrents is theft.) Not sure why the remote FTP import has problems right now, but the GZ/CDF packing seems to be perfectly OK. A more nontrivial example:

    Import["mean2.cdf.gz", "Annotations"] [[5]]

    returns

{"Description" -> "Cloud particle effective radius (micros), multiplied by 10 in 2-byte integer data type."}

You know, R is free, which is great, but commercial software can sometimes go beyond the limitations of free software. ;-) Home Edition Mathematica 7 is only USD 295 and I guess it can do all these things.

    • Posted Mar 28, 2009 at 10:20 AM | Permalink | Reply

      Re: Luboš Motl (#24),

I agree about the freeware point. There are dozens of features in Matlab which are superior to R, but for people to work together and replicate each other's work, R is excellent software.

Still, I did have problems with other free software on these particular GZ files. No idea why.

  9. Posted Mar 28, 2009 at 10:37 AM | Permalink | Reply

    Dear Jeff Id, I agree with that. It’s cool when things are free and when people collaborate. :-) Right now, I have decoded the whole structure of a typical CDF.GZ file from the server.

It is now pretty clear to me how the data is structured and what is in the file. For example, there are 25 pieces of data ("vectors" or "matrices"). Each of them comes with an annotation, information about the type of numbers in the vectors, their length, and the data themselves.

    See this PDF printout of a Mathematica notebook for a self-explanatory treatment of the structure of the file:

    http://lumajs.googlepages.com/cdf-import.pdf

    And please, don’t panic about GZ – it’s a standard triviality and if one decoder has problems, others surely don’t.

  10. Posted Mar 28, 2009 at 11:12 AM | Permalink | Reply

    You really do not have to mess around looking for software. Command line utilities provided by Info-ZIP are especially useful. You download them from, among other places, http://www.ctan.org/tex-archive/tools/zip/info-zip/WIN32/

    gzip is a standard format.

    — Sinan

  11. Posted Mar 28, 2009 at 11:13 AM | Permalink | Reply

    You probably know how to import CDF files into R, but if you don’t, here are some extra hints:

    http://cran.r-project.org/doc/manuals/R-data.pdf

    5.1 Binary data formats

Packages hdf5, RNetCDF and ncdf on CRAN provide interfaces to NCSA's HDF5 (Hierarchical Data Format, see http://hdf.ncsa.uiuc.edu/HDF5/) and to UCAR's netCDF data files (network Common Data Form, see http://www.unidata.ucar.edu/packages/netcdf/). Both of these are systems to store scientific data in array-oriented ways, including descriptions, labels, formats, units, … HDF5 also allows groups of arrays, and the R interface maps lists to HDF5 groups, and can write numeric and character vectors and matrices. Package ncvar on CRAN provides a higher-level R interface to netCDF data files via RNetCDF.

    There is also a package rhdf5 available from http://www.bioconductor.org.

    • Kenneth Fritsch
      Posted Mar 28, 2009 at 12:45 PM | Permalink | Reply

      Re: Luboš Motl (#28),

I have not been able to read an nc file into R directly from the web; I can only do it after downloading it locally to my computer and then loading it into R with the normal commands. The same goes for zipped nc files.

      This is way OT, but since you are commenting here, Sinan Unur, I need help in finding the link to your code for using the gridded GISS temperature data.

  12. Jeff C.
    Posted Mar 28, 2009 at 12:02 PM | Permalink | Reply

I've been playing around with this data for the past two weeks or so and am heartened to see others digging into it. If there is one thing I have learned (actually re-learned) over the past few days, it is that the more eyes reviewing things, the better.

    Here is a list of some things I have learned about the dataset. Some are firmer than others and I have tried to express my uncertainty in the comments below. Please let me know if anyone disagrees or has any additions. I hope I’m not rehashing things that are obvious; the intent is to put what I think I know in one place.

1) The files are 0200 and 1400 hours. I originally thought these were UTC, but the text says they are local times. This appears to mean that satellite passes from different times over different areas are patched together to form a "night" composite and a "day" composite (relatively speaking).
2) The monthly means in this dataset were compiled using CASPR; documentation for CASPR can be found here and here. For this dataset, CASPR used the daily 5 km data parsed to every 5th point to create the 25 km data. Dr. Jeffrey Key of U Wisconsin is the author of CASPR and appears to have compiled this dataset.
3) An extremely detailed description of the daily 5 km x 5 km dataset can be found here. This page has a good description of the data, its shortcomings and known flaws, missing days and processing methodology.
4) CASPR has multiple cloud-masking options for determining cloud cover. I can't find a specific description of what setting was used for this dataset, but the text seems to indicate it was something of a "default" setting. The link in #3 has a good description of CASPR's cloud-masking methods.
5) These data files are "all sky" (as opposed to "clear sky"). In all-sky mode, CASPR uses an interpolation algorithm to infill areas it thinks are cloud-covered. It uses information from adjacent clear-sky areas for the infilling. The author states that this infilling can have large errors as the clear-sky areas can be at a considerable distance. They don't quantify the error magnitude.
6) From reading Comiso 2000, it's not clear if Dr. Comiso used CASPR. He does seem to indicate that he discarded cloud-covered areas. This could be akin to the clear-sky mode of CASPR as opposed to the all-sky mode used for this data set.
7) In the SI, Dr. Steig describes the use of a daily filter where values that exceed +/-10 deg C of the climatological mean are considered cloud-covered and discarded. When asked at RC, Gavin stated that "climatological mean" meant the monthly average over the entire satellite record. Since this dataset is already in monthly means, we obviously can't apply a daily filter as Steig describes in the SI.
8) The dataset contains a data grid called "cloudtype". This value ranges from 0-99 and describes the percentage of time each cell experienced cloud cover during a given month. I'm working with this dataset to see if there are any obvious spatial/temporal patterns that might explain discrepancies with the Comiso dataset.

    I’ll put anything else up that I find. Some of the points above are speculative based on my reading of the documentation. Please let me know if anyone disagrees.
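For anyone who wants to look at the cloud field directly, a rough sketch of pulling it out of one monthly file. It assumes the grid Jeff C describes is the "cldtype" variable listed in Luboš's dump above and that it is stored as a 0-99 percentage; both assumptions should be checked against the CASPR documentation:

library(ncdf)
download.gz.cdf(url, tempname="temp.gz")   # url built as in the head post
nc=open.ncdf("temp.gz")
cld=get.var.ncdf(nc,"cldtype")             # 321 x 321 grid; assumed to be percent cloud cover (0-99)
close(nc)
mean(cld, na.rm=TRUE)                      # crude whole-grid mean for that month
image(cld)                                 # quick look at the spatial pattern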

    • Kenneth Fritsch
      Posted Mar 28, 2009 at 12:27 PM | Permalink | Reply

      Re: Jeff C. (#29),

      I am not in a position to disagree, but your review was very helpful in allowing me to better understand the important issues involved in that part of the analysis. It is most appreciated by this blog reader when an “involved” poster/analyzer takes the time to review what has been covered.

      • Jeff C.
        Posted Mar 28, 2009 at 12:55 PM | Permalink | Reply

        Re: Kenneth Fritsch (#31), Thanks Kenneth. Steve has talked about using the blog as a daily journal. I see what he means as it is helpful for me to put what I know in one place so I can find it later. Advancing years with two small rugrats running around has given me a pretty serious case of CRS.

  13. Jeff C.
    Posted Mar 28, 2009 at 12:04 PM | Permalink | Reply

    Errr, the smiley face with sunglasses is supposed to be point #8.
