CA reader, Nicholas, an extremely able computer analyst, has helped me with a number of problems with downloading data in compressed formats into R. One of the most annoying and heretofore unsolved problems was how to get Z files into R without having to handle them manually, a problem that I revisited recently when I looked at ICOADS data, which is organized in over 2000 Z files.
Z files are an obsolete form of Unix compression that is not even mentioned at zlib.com, and it was not supported in R. So if you wanted to analyze a Z file in R, you had to download the file, unzip it manually using WinZip or equivalent and then start again.
I presume that this obsolete format fits in an ecological niche with Fortran, an antique computer language (one that I learned over 40 years ago and which, in comparison with a modern language like R, seems about as relevant as medieval Latin). Since most climate scientists appear to live in an ecological niche with Fortran and Unix, many climate data sets are only available in Z files, e.g. USHCN, GHCN, ICOADS, although a number of data sets are available in NetCDF format, which is accessible in R through the ncdf package.
Nicholas figured out how to uncompress Z files and contributed a package “uncompress” to R, which is online and downloadable as of today. You can install R packages easily within a session using the Install Packages button. There are a couple of little tricks in using the package to extract ASCII data, so you have to pay close attention to the example. I did a test this morning and it worked like a champ. Here was my trial session (after installing uncompress). The "\n" separator passed to strsplit, which wraps the rawToChar call, is set here for Unix line endings, which will be the relevant option in most of our applications.
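library(uncompress)  # provides uncompress() for Z files
handle <- url("ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/station.inventory.Z", "rb")
data <- readBin(handle, "raw", 9999999)  # read up to 9999999 bytes as a raw vector
close(handle)  # release the connection once the bytes are read
uncomp_data <- uncompress(data)
Data <- strsplit(rawToChar(uncomp_data), "\n")  # split the text on Unix line endings
Data <- unlist(Data)  # character vector, one element per line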
This returns the ASCII contents of the file as a character vector with one element per line, which can be handled conventionally using a variety of techniques. For large files, I usually use the substr command to parse columns out, but you could also write the lines to a “temp.dat” file and read it using read.fwf or read.table or scan.
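To illustrate the two parsing routes, here is a minimal sketch using only base R. The two sample lines and the column layout (station id in columns 1-6, latitude in 8-12, longitude in 14-19) are made up for illustration and are not the actual USHCN inventory format; substitute the real positions for whatever file you are reading.

```r
# Hypothetical fixed-width records standing in for the uncompressed lines.
# Assumed layout: id = cols 1-6, lat = cols 8-12, lon = cols 14-19.
Data <- c("011084 31.06 -87.05",
          "012813 30.55 -87.88")

# Route 1: parse columns directly with substr (vectorised over all lines).
stations <- data.frame(
  id  = substr(Data, 1, 6),
  lat = as.numeric(substr(Data, 8, 12)),
  lon = as.numeric(substr(Data, 14, 19)),
  stringsAsFactors = FALSE
)

# Route 2: write the lines to a temporary file and read them with read.fwf.
# Negative widths tell read.fwf to skip that many characters.
tmp <- tempfile(fileext = ".dat")
writeLines(Data, tmp)
stations2 <- read.fwf(tmp,
                      widths = c(6, -1, 5, -1, 6),
                      col.names = c("id", "lat", "lon"),
                      colClasses = c("character", "numeric", "numeric"))
unlink(tmp)  # clean up the temporary file
```

Reading the id as character keeps any leading zeros. The substr route avoids the round trip through disk, which is why it tends to be the more convenient one for large files.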
Anyway, it’s a great utility!!
PS. I asked any number of people about how to handle Z files in R without having to do it manually and got nowhere. I did learn about a number of annoying Windows mysteries and some interesting R techniques, which I’ll note here as a diary item. It turns out that you can run DOS commands out of R by using the system() command. The following command runs Firefox:
system(paste('"c:/Program Files/Mozilla Firefox/firefox.exe"', '-url cran.r-project.org'))
On my Windows machine, some applications would only run from a default directory. So the following commands:
dir() # "COPYING.GZ"
dir() # "COPYING"
worked there, but they didn’t work in any other directory. Go figure.
Note – the R function gzfile handles gz files just fine; I was using the gzip.exe program only for testing DOS commands within R.
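For completeness, here is a small sketch of the gzfile route in base R. It writes its own tiny gz file to a temporary location first so that it is self-contained; the file contents are made up, and with real data you would simply point gzfile at the downloaded file.

```r
# Create a small gzip-compressed text file to read back (illustrative contents).
tmp <- tempfile(fileext = ".gz")
con <- gzfile(tmp, "w")
writeLines(c("line one", "line two"), con)
close(con)

# Read the compressed file directly; gzfile decompresses on the fly.
con <- gzfile(tmp, "r")
lines <- readLines(con)
close(con)

unlink(tmp)  # clean up the temporary file
```

The resulting character vector can then be parsed in the same way as the uncompressed Z file contents above.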