Here’s a major complaint about BEST now that I’ve looked at it more closely.
If BEST wanted to make their work as widely available as possible, they should have done their statistical programming in R, so that it was available in a public language, and made their data available as organized R-objects.
I’ve taken a quick look at their code, which is in Matlab. What I’ve browsed all looks like transliterated Fortran: I haven’t noticed much vector processing or list processing in the R style. IMO, one of the great benefits of the vector and list processing features in R is that you can write scripts that clearly document what you’re doing. I haven’t seen anything in their code files that looks like the sort of R script that I would like to see in order to follow the calculations.
I collated the station data and station information into two R-objects for interested readers so that data can be quickly accessed without having to compile rather large data sets.
The station information is uploaded to http://www.climateaudit.info/data/station/berkeley/details.tab. It is a dataframe of 39028 rows containing id, name, lat, long, etc, directly collated from the information in the BEST data. Some tidying of trailing spaces and use of NA has been done. It’s 1.8 MB.
The station data is uploaded to http://www.climateaudit.info/data/station/berkeley/station.tab. It’s organized in a style that I’ve used before: it is a list of 39028 objects, each object being a time series of the station data beginning in the first year of data. I didn’t collate the accompanying information about counts and uncertainty. Interested readers can consult the original data for these. Each of the 39028 objects is given a name corresponding to the id in the details file – which I’ve kept as a character object rather than a number (though it is a number). As an organized R-object, this is 39 MB, as opposed to 618 MB if data.txt in the PreliminaryTextDataset were expanded.
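For readers unfamiliar with this layout, here is a minimal sketch of the shape of the two objects, using toy stand-ins rather than the real files. The ids, names and values below are invented, and the monthly frequency is an assumption made for the illustration. The point is only that the list is indexed by the id as a character string, not by position.

```r
# Toy stand-ins for the two objects; ids, names and values are invented.
details <- data.frame(
  id   = c(1001, 1002),
  name = c("TOY STATION A", "TOY STATION B"),
  lat  = c(46.8, 40.1),
  long = c(-95.9, -88.2),
  stringsAsFactors = FALSE
)

# One time series per station, each starting in its own first year of data.
# Monthly frequency is an assumption made for this sketch.
station <- list(
  ts(c(-0.4, 0.1, NA, 0.3, 0.2, -0.1), start = c(1950, 1), frequency = 12),
  ts(c(0.5, 0.7, 0.2), start = c(1987, 1), frequency = 12)
)
names(station) <- as.character(details$id)  # list names are character ids

# Look up by name (character), not by position: station[[1002]] would ask
# for the 1002nd element, which does not exist in this toy list.
u <- station[[as.character(1002)]]
start(u)   # first year and period of that station's record
length(u)  # number of observations
```

Because each series carries its own start date, stations with short records take up no space for the years before their first observation.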
If you want to look at the BEST results for a given station, here’s how to do it quickly (assuming you want to keep the data in, say, directory d:/climate/data/berkeley). The example here is Detroit Lakes 1NNE, the subject of a number of posts in connection with Hansen’s Y2K:
destfile="d:/climate/data/berkeley/details.tab"
download.file("http://www.climateaudit.info/data/station/berkeley/details.tab",destfile, mode="wb")
load(destfile); nrow(details) #39028
destfile="d:/climate/data/berkeley/station.tab"
download.file("http://www.climateaudit.info/data/station/berkeley/station.tab",destfile, mode="wb")
load(destfile); length(station) #39028
details[grep("DETROIT L",details$name),1:5]
#           id                name     lat     long     alt
#144289 144289 DETROIT LAKES(AWOS) 46.8290 -95.8830 425.500
#144298 144298 DETROIT LAKES 1 NNE 46.8335 -95.8535 417.315
u=station[[paste(144298)]]
ts.plot(u,ylab="deg C (Seas Adj)",main="Detroit Lakes 1NNE")
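If you do this lookup often, the grep-then-index step can be wrapped in a small function. The following is only a sketch: get_station is a hypothetical helper of my own naming, and the toy details and station objects stand in for the real ones so that the snippet runs without downloading anything. It assumes the ids convert cleanly to the character names used in the list.

```r
# Hypothetical helper (not part of any BEST or R package): return the series
# for every station whose name matches a pattern, keyed by station id.
get_station <- function(pattern, details, station) {
  hits <- grep(pattern, details$name)       # row numbers of matching names
  ids  <- as.character(details$id[hits])    # list names are character ids
  station[ids]                              # sub-list, names preserved
}

# Toy data in the same shape as the real objects; all values are invented.
details <- data.frame(
  id   = c(1001, 1002, 1003),
  name = c("TOY LAKES(AWOS)", "TOY LAKES 1 NNE", "ELSEWHERE"),
  stringsAsFactors = FALSE
)
station <- list(
  ts(c(0.1, 0.2, 0.3), start = 1990),
  ts(c(0.4, 0.5, 0.6), start = 1950),
  ts(c(0.7, 0.8, 0.9), start = 1900)
)
names(station) <- as.character(details$id)

found <- get_station("TOY L", details, station)
names(found)              # ids of the matching stations
u <- found[["1002"]]      # pick one series out of the matches
```

With the real objects loaded, get_station("DETROIT L", details, station) should return the two Detroit Lakes series listed above, and either one can be passed to ts.plot as before.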
There are some puzzles about the station data that I’ll discuss in another post.
Update: Nick Stokes reports:
I made a GHCN v2 version of data.txt. It’s here. I had to split into two, bestdata1.zip and bestdata2.zip (to fit site limit). Each is about 11 Mb. Units are 1/10 deg C. [SM note - this is the same data that I collated into an R-list. For R users, you're far better off using my collation than re-collating from this.]
There is also a zipfile called ALL4inv.zip which has inventories in CSV format of BEST, GHCN, GSOD and CRUTEM3. The fields don’t match GHCN, but may be the best that can be derived from what BEST has published.
There are also lots of KMZ/KML files. The one called ALL4.kmz combines GHCN, GSOD, BEST and CRUTEM3 with a folder structure to show by source and start date.