Lucia has been experimenting with model data downloaded from KNMI and I thought that I’d try to experiment with this a little from time to time.
Unlike Benjamin Santer, Ph.D. of the U.S. federal Lawrence Livermore Labs, who refused to provide any assistance whatever in providing the data used in Santer et al, Geert Oldenburgh, the KNMI scientist responsible for their public interface, is pleasant and cooperative though it seems that they aren’t used to automated data inquiry of the type that I expect to do.
Their webpage is designed for manual retrieval, but it can be pinged so that data can be directly downloaded into R. KNMI had no objections to me trying to do this and Oldenburgh helped as much as he could. I’ve managed to write a neat little retrieval script that picks up ensemble combinations but I’m stumped as to how to retrieve individual runs. It looks like it should be possible; KNMI has no objection to my doing so; maybe one of the computer-oriented readers can solve this little puzzle.
Here’s what I can do so far.
The access points are cgi commands in the URL that instruct the KNMI computer to compile averages. This works very quickly. Examples of these commands are:
http://climexp.knmi.nl/get_index.cgi?email=yourname@you&field=tas_bcc_cm1_20c3m&standardunits=true
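or
http://climexp.knmi.nl/get_index.cgi?email=yourname@you&field=tas_bcc_cm1_20c3m&standardunits=true&lat1=-20&lat2=20&lon1=0&lon2=360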
If you insert this sort of URL in your browser, it will ping the KNMI computer to produce ensemble temperature averages for the region specified by the lat-long (default – global.) The HTML page that is produced contains information on the location of the sought data file. The information can be grep’ed from the HTML page and a new URL can be constructed where the ensemble average is located. This file can be read fairly easily – there’s an annoying comment at the end of the file that requires extra handling, but this is programmed easily enough. I’ve inserted the program below.
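The grep-and-rebuild step described above can be sketched in a few lines of R. This is only an illustration: the helper name extract_raw_data_path and the sample HTML fragment are made up here, and the real KNMI page may be laid out differently.

```r
# Hypothetical helper: pull the target of the "raw data" link out of HTML lines.
extract_raw_data_path <- function(html_lines) {
  # keep only the lines that mention the link text
  hits <- grep("raw data", html_lines, value = TRUE)
  # match href="..." sitting in the tag just before the link text
  m <- regmatches(hits, regexpr('href="[^"]+"[^>]*>raw data', hits))
  # keep only what is inside the quotes
  unique(sub('^href="([^"]+)".*$', "\\1", m))
}

# Made-up page fragment for illustration only.
page <- c(
  "<p>some text</p>",
  '<a href="data/itas_bcc_cm1_20c3m_0-360E_-90-90N_n.dat">raw data</a>, ',
  "<p>more text</p>"
)
path <- extract_raw_data_path(page)
# rebuild the full URL of the data file
data_url <- paste0("http://climexp.knmi.nl/", path)
```

Matching on the href pattern rather than on fixed character positions should be a little less fragile if the surrounding HTML changes.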
The ensemble averages are fine. Here’s where I get stuck on reading individual runs. Go to http://climexp.knmi.nl and then ping Scenario Runs on the right frame under Select A Field. Now as an example, check the radio button under tas for BCC_CM1 and click Select Field.
If you click Make Time Series in the first panel, this will make the ensemble average, producing a new page. Above the third panel in the new page is a hyperlink “raw data” which contains a time series of anomaly data. This is what I can retrieve with the script below.
If you now go back to the previous page, you’ll see hyperlinks at the bottom of the page which produce results from each of two different runs. If you click the hyperlink of “analyse ensemble member 1 separately”, it will generate a webpage looking just like the one we just had, except this time if you click Make Time Series in the first panel, it produces the results of this run located in a webpage structured as before.
I presume that there is a cgi command generated by one of these links, but I haven’t been able to figure out the structure. Maybe one of the computer programmers who visits here can figure it out.
Here are my scripts. Please paste in a real email address in the appropriate spot below (requested by Oldenburgh) so that usage of the access tools can be reported.
##FUNCTIONS
# read.ensemble: ping get_index.cgi for one model and scenario, then read the
# resulting data file into a monthly time series
read.ensemble=function(model,scenario,prefix=myprefix,region="GL",suffix="&standardunits=true") {
  # suffix is the lat-long part of the query; suffix2 is the matching part of the data file name
  if (region=="TRP") {suffix="&standardunits=true&lat1=-20&lat2=20&lon1=0&lon2=360";suffix2="0-360E_-20-20N"}
  if (region=="NH") {suffix="&standardunits=true&lat1=0&lat2=90";suffix2="0-360E_0-90N"}
  if (region=="SH") {suffix="&standardunits=true&lat1=-90&lat2=0";suffix2="0-360E_-90-0N"}
  if (region=="arctic") {suffix="&standardunits=true&lat1=60&lat2=90";suffix2="0-360E_60-90N"}
  # the Antarctic band is 90S to 60S, so lat2 is -60
  if (region=="antarctic") {suffix="&standardunits=true&lat1=-90&lat2=-60";suffix2="0-360E_-90--60N"}
  if (region=="GL") {suffix="&standardunits=true";suffix2="0-360E_-90-90N"}
  url=paste(paste(prefix,model,scenario,sep="_"),suffix,sep="")
  #http://climexp.knmi.nl/get_index.cgi?email=yourname@you&field=tas_bcc_cm1_20c3m&standardunits=true
  # "http://climexp.knmi.nl/get_index.cgi?email=yourname@you&field=tas_bcc_cm1_20c3m&standardunits=true&lat1=-20&lat2=20&lon1=0&lon2=360"
  my_info=download_html(url)
  # the third line matching "raw data" carries the link to the data file
  y=my_info[grep("raw data",my_info)][3]
  # y presumably looks like: <a href="data/....dat">raw data</a>, 
  n=nchar(y)
  # drop the 9 leading characters and the 16 trailing ones to leave the file path
  loc=file.path("http://climexp.knmi.nl",substr(y,10,n-16))
  Sys.sleep(2) # pause between calls
  test=readLines(loc)
  count=as.numeric(substr(test[1],19,26))+1 # parsed from the header line; not used below
  # skip the two header lines and keep the first 20 characters of each record
  writeLines(substr(test[3:length(test)],1,20),"temp.dat")
  test=read.table("temp.dat")
  x=test[1,1] # decimal year of the first record
  # monthly series starting at the (year, month) derived from x
  ts(test[,2],start=c(floor(x),round(12*(x%%1),0)+1),frequency=12)
}

# download_html: fetch a page and return its lines as a character vector
download_html=function(url) {
  download.file(url,"temp.html")
  html_handle <- file("temp.html","rt")
  html_data <- readLines(html_handle)
  close(html_handle)
  unlink("temp.html")
  return(html_data)
}
Here is a sample use. This can easily be modified for bulk retrieval.
knmi.info=read.csv("http://data.climateaudit.org/data/models/knmi.info.csv",sep="\t")
scenario="20c3m"
temp=(knmi.info$scenario==scenario);sum(temp) #24
id=unique(knmi.info$alias[temp]);K=length(id) #24
email="yourname@you" # paste in a real email address here
myprefix=paste("http://climexp.knmi.nl/get_index.cgi?email=",email,"&field=tas",sep="")
i=1; test=read.ensemble(model=id[i],scenario,region="GL")
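For bulk retrieval, one possible shape is a loop that calls the reader once per model, pauses between calls, and carries on if a single model fails. The sketch below is an assumption about how one might do it, not the script actually used here; bulk_retrieve and its fetch argument are hypothetical names, and fetch stands in for a call such as read.ensemble.

```r
# Hypothetical wrapper: call `fetch` once per model id, pausing between calls.
bulk_retrieve <- function(ids, fetch, pause = 2) {
  ids <- as.character(ids)
  out <- vector("list", length(ids))
  names(out) <- ids
  for (i in seq_along(ids)) {
    # a failed download is caught so the loop keeps going
    res <- tryCatch(fetch(ids[i]), error = function(e) NULL)
    # assigning NULL would delete the slot, so only store real results
    if (!is.null(res)) out[[i]] <- res
    if (i < length(ids)) Sys.sleep(pause)
  }
  out
}

# Possible use with the objects defined above (needs network access):
# runs <- bulk_retrieve(id, function(m) read.ensemble(model = m, scenario, region = "GL"))
```

Models that fail are left as NULL entries, so they can be picked out afterwards with sapply(runs, is.null).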
35 Comments
If you use this script, please paste in a real email address in the appropriate spot below (requested by Oldenburgh) so that usage of the access tools can be reported.
Steve, following your instructions above, the final ‘raw data’ hyperlink appears to be this:
http://climexp.knmi.nl/data/itas_bcc_cm1_20c3m.1_0-360E_-20-20N_n.dat
Perhaps it’s possible to simply use this, and substitute the different parameters you might want?
Off topic but this should interest Climate Audit. An online register has been created to flag scientific papers that may be tainted by fraud or misconduct. My instinct tells me that it won’t apply to climate science.
http://www.sciencedaily.com/releases/2008/12/081217075134.htm
“The idea is to identify papers that have been shown to be fraudulent but are still in circulation.”
#2/ I know that. But you need to ping the page to create the file. You’ve done that manually (as I have). The question is how to ping the page.
Steve,
When using the browser, an HTTP POST is done with the parameters in the data:
POST /get_index.cgi HTTP/1.1
….
Content-Length: 150
\r\n
email=yourname@you&field=tas_bcc_cm1_20c3m&lat1=-90&lat2=90&lon1=0&lon2=360&intertype=nearest&gridpoints=false&standardunits=standardunits
Your R script is sending a HTTP GET with the parameters in the URL:
GET /get_index.cgi?email=jan.fluitsma@hccnet.nl&field=tas_bcc_cm1_20c3m&standardunits=true HTTP/1.0
I don’t know how to do a HTTP POST in R.
You may want to use a simple one line rsync command to copy the directories at the site into a local directory on your computer and then work from there. It works for any directory tree and moves a complete copy to your computer. The data is likely to be much easier to work with once you have a local archived copy.
rsync is easy to use, search on ‘rsync’ .. If the site won’t allow rsync access, you might ask the contact if he could provide it for you.
rsync can also be used to keep your local archive up to date with simple difference transfers and a ‘crontab’ script.
Here is the wikipedia link http://en.wikipedia.org/wiki/Rsync
Not an R person but this seems to work:
Download the httpRequest package if you don’t already have it: http://cran.r-project.org/web/packages/httpRequest/index.html
> port = 80
> data = "email=someone@somewhere&field=tas_bcc_cm1_20c3m&gridpoints=false&intertype=nearest&lat1=&lat2=&lon1=&lon2=&standardunits=standardunits"
> simplePostToHost("climexp.knmi.nl","/get_index.cgi","",data,port=port)
For real world usage it’s probably better to use the postToHost with the array of name/value pairs.
As Jan F noted, the post fields are:
email someone@somewhere
field tas_bcc_cm1_20c3m
gridpoints false
intertype nearest
lat1
lat2
lon1
lon2
standardunits standardunits
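The name/value pairs above can be joined into a POST body in base R without typing the long string by hand. This is a rough sketch: make_post_body is a made-up helper, the email is a placeholder, and percent-encoding the values (so "@" becomes "%40") assumes the server decodes form data in the usual way.

```r
# Name/value pairs as listed above; the email is a placeholder.
fields <- c(
  email = "someone@somewhere",
  field = "tas_bcc_cm1_20c3m",
  gridpoints = "false",
  intertype = "nearest",
  lat1 = "", lat2 = "", lon1 = "", lon2 = "",
  standardunits = "standardunits"
)

# Hypothetical helper: join the pairs as name=value&name=value,
# percent-encoding each value.
make_post_body <- function(fields) {
  vals <- vapply(fields, function(v) URLencode(v, reserved = TRUE), character(1))
  paste(names(fields), vals, sep = "=", collapse = "&")
}

body <- make_post_body(fields)
```

The resulting string could then be handed to simplePostToHost in place of the hand-written data string.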
Some of the code got munged.
port = 80
data = etc…
My script generates the URL to ping the ensemble call just fine. It’s this:
The problem is that you have to figure out some way of adding the run number, with some extra term of the form “_1” or “.1”, and no one seems to know how. When you hit the button Make Time Series, something like this is generated; you just don’t see it. I’ve experimented with things like
in various ways but to no avail and no one seems to know.
Apologies, I completely missed part of the post
It’s posting data to http://climexp.knmi.nl/selectmembers.cgi
This is the field/value pairs it’s posting:
EMAIL someone@somewhere
NPERYEAR 12
ensanom on
nens1 1
station bcc_cm1_20c3m_tas_0-360E_-90-90N_ensemble
type i
wmo tas_bcc_cm1_20c3m_0-360E_-90-90N_n_++
Sorry, that was the wrong one. It’s posting this with the 1 being 0 or 1 for the ensemble picked:
email someone@somewhere
field data/tas_bcc_cm1_20c3m.1.someone@somewhere.info
gridpoints false
intertype nearest
lat1
lat2
lon1
lon2
standardunits standardunits
This page seems to contain a shed load of raw data. I’m not sure if it contains all possible model runs etc. but it certainly has a lot of them. The output is in graphic and text format depending on the file extension.
E.g. for the example you are playing with there is
itas_bcc_cm1_20c3m.0_0-360E_-90-90N_n.dat
itas_bcc_cm1_20c3m.0_0-360E_-90-90N_n.eps.gz
itas_bcc_cm1_20c3m.0_0-360E_-90-90N_n.png
itas_bcc_cm1_20c3m.0_0-360E_-90-90N_n.txt
It doesn’t seem to need an email login either FWIW.
Re: FrancisT (#12), Sorry just realized I’m echoing comment 2 more or less.
It would probably be worth verifying whether the raw data exists before recreating it though, so a check for that would be useful. Let me cogitate some more
If the data exists in http://climexp.knmi.nl/data/ it will be of the form
(baseurl)(type)_(model)_(exp)(.ens)(latlong)
baseurl=http://climexp.knmi.nl/data/i
type=(tas|pr|psl|tx|ty|sst|tos|z20|zos|z500)
model and exp derived by lowercasing the names e.g.
model GFDL CM2.0 =>gfdl_cm2_0
exp 20c3m = 20c3m
ens is the optional ensemble and is of the form .digit
latlong=_0-360E_-90-90N_n.dat
putting it together we have stuff like
http://climexp.knmi.nl/data/ipsl_csiro_mk3_5_sresa1b_0-360E_-90-90N_n.dat
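The naming pattern above is easy to put into a small R function. This is a sketch built only from the pattern as described; normalise_model and knmi_data_url are hypothetical names, and the file will only be there if the corresponding cgi call has already been made.

```r
# Hypothetical helper: "GFDL CM2.0" -> "gfdl_cm2_0"
# (lowercase, then every run of other characters becomes "_").
normalise_model <- function(name) {
  gsub("[^a-z0-9]+", "_", tolower(name))
}

# Hypothetical helper: (baseurl)(type)_(model)_(exp)(.ens)(latlong)
knmi_data_url <- function(type, model, exp, ens = NULL,
                          latlong = "_0-360E_-90-90N_n.dat") {
  # the ensemble member is optional and has the form ".digit"
  ens_part <- if (is.null(ens)) "" else paste0(".", ens)
  paste0("http://climexp.knmi.nl/data/i", type, "_", normalise_model(model),
         "_", exp, ens_part, latlong)
}

u0 <- knmi_data_url("tas", "bcc_cm1", "20c3m", ens = 0)
u1 <- knmi_data_url("psl", "csiro_mk3_5", "sresa1b")
```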
In looking for KNMI digital data, I took 2 towns in Australia for which I had annual averages of daily minimum temperature over the last 40 years at hand. The first was Meekatharra, West Australia, World Number 5461 50194443000. I also looked at Esperance, West Australia, 5572 50194638000.
My data were raw from the Aust Met Bureau, daily. I do not yet know the source of the KNMI figures.
However, for the first and only 2 cases I have looked at, there is a severe continuity break at year 2000. There is excellent agreement with the KNMI graph before 2000 for Meekatharra, back to 1950 (but post 2000 is lower than pre-2000). The opposite sense is present for Esperance, with pre-2000 temperatures non-systematically some 0.6 to 0.8 degrees higher than mine, but excellent agreement after year 2000.
Meekatharra (deg C, min annual mean)
year   my calcs   KNMI calcs
1998   17.02      17.0
1999   15.91      15.9
2000   14.82      14.8
2001   15.08      14.5
2002   16.11      15.2
2003   16.25      15.4
2004   15.91      15.1
2005   16.44      15.6
2006   15.84      14.8
etc
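The size of the mismatch is easier to see as a difference column. The R snippet below simply re-enters the Meekatharra figures quoted above and subtracts them; the column names are chosen here for illustration.

```r
# Figures copied from the Meekatharra table above (deg C, min annual mean).
meek <- data.frame(
  year = 1998:2006,
  mine = c(17.02, 15.91, 14.82, 15.08, 16.11, 16.25, 15.91, 16.44, 15.84),
  knmi = c(17.0, 15.9, 14.8, 14.5, 15.2, 15.4, 15.1, 15.6, 14.8)
)
# my calcs minus KNMI calcs, rounded to the precision of the inputs
meek$diff <- round(meek$mine - meek$knmi, 2)
# average gap up to and including 2000, and after 2000
early <- mean(meek$diff[meek$year <= 2000])
late <- mean(meek$diff[meek$year > 2000])
print(meek)
```

In these figures the two series agree to within a few hundredths of a degree through 2000 and differ by roughly 0.6 to 1.0 degrees afterwards.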
My examination was to see if these BOM designated high quality rural sites show the world-wide 1998-2006 “Hump” so familiar on temp graphs. The KNMI figures do not show the hump for each of the first 2 cases out of 2 that I looked at.
I too would like to be able to pull down daily/weekly/monthly/annual averages in digital form.
Something is clearly badly amiss.
Not sure what platform you’re on, but if you have access to Windows/IE for experimenting on then this app: http://www.fiddler2.com/fiddler2/ will show you what’s going on between the browser and the server.
try this for ensemble 0:
http://climexp.knmi.nl/get_index.cgi?email=yourname@you&field=data/tas_bcc_cm1_20c3m.0.yourname@you.info&standardunits=true
Don’t forget to append your email address and “info”.
On the page you get there are links to postscript, raw and netcdf data formats.
Hope that’s what you want.
#18. That command doesn’t work for me. I get response that there’s no data.
#Prior comments – the required command has to be a cgi line. The other suggestions only work if the cgi command has already been done. You’re getting wrongfooted because the dat files have been created by manual cgi’s (and these files will expire after a period of time). We need something along the lines of 18 if it works (which I can’t confirm so far.)
Pardon my ignorance/naivete. “Oldenburgh helped as much as he could”: Since he is friendly and cooperative, does it not make sense to request of him to find someone (else) at KNMI who can tell you how to do it?
I downloaded the /Data listing after each step to see when files are generated (it’s a 20 meg listing each time):
Logon – http://climexp.knmi.nl/start.cgi?climexp.knmi.nl@insurgent.us
Select “Monthly Scenario Runs” – http://climexp.knmi.nl/selectfield_co2.cgi?climexp.knmi.nl@insurgent.us
Select “tas” and click Select Fields – http://climexp.knmi.nl/select.cgi?email=climexp.knmi.nl@insurgent.us&field=tas_bcc_cm1_20c3m
Select “Analyze ensemble member 0” – http://climexp.knmi.nl/selectmember.cgi?climexp.knmi.nl@insurgent.us+0+tas_bcc_cm1_20c3m
File appears in /Data: tas_bcc_cm1_20c3m.0.climexp.knmi.nl@insurgent.us.info
Click “Make time series” – POST to http://climexp.knmi.nl/get_index.cgi with these fields:
email climexp.knmi.nl@insurgent.us
field data/tas_bcc_cm1_20c3m.0.climexp.knmi.nl@insurgent.us.info
gridpoints false
intertype nearest
lat1
lat2
lon1
lon2
standardunits standardunits
New file in /Data: itas_bcc_cm1_20c3m.0_0-360E_-90-90N_n.12.climexp.knmi.nl@insurgent.us.inf
Raw data link on page points to: http://climexp.knmi.nl/data/itas_bcc_cm1_20c3m.0_0-360E_-90-90N_n.dat
Since the “Make time series” command references a file generated by the “Analyze ensemble member 0” command, it doesn’t look like the process can be reduced to a single command.
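The two calls logged above follow a regular pattern, so the strings can at least be generated in R. This is a sketch from the logged requests only; selectmember_url and member_field are hypothetical names and the email is a placeholder.

```r
# Hypothetical helper: URL of the "Analyze ensemble member" step,
# of the form selectmember.cgi?email+member+field as logged above.
selectmember_url <- function(email, member, field) {
  paste0("http://climexp.knmi.nl/selectmember.cgi?", email, "+", member, "+", field)
}

# Hypothetical helper: value of the "field" argument in the
# follow-up POST to get_index.cgi.
member_field <- function(email, member, field) {
  paste0("data/", field, ".", member, ".", email, ".info")
}

step1 <- selectmember_url("someone@somewhere", 0, "tas_bcc_cm1_20c3m")
step2 <- member_field("someone@somewhere", 0, "tas_bcc_cm1_20c3m")
```

Calling the first URL and then posting the second value as the field would mirror the manual sequence, one ensemble member at a time.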
Do you know how to use Wireshark (aka Ethereal)? It will show you everything that is being sent back and forth.
I started from #18 but used the command:
wget "http://climexp.knmi.nl/get_index.cgi" --post-data="email=someone@somewhere&field=data/tas_bcc_cm1_20c3m.0.someone@somewhere.info&standardunits=true"
Which returned the correct html page with URLs to download the relevant data files.
Replacing the .0 with a .1 seems to do the right thing.
It has to be an email address that it recognises or a page about unknown email is returned.
So, using the command as specified in #18 with a valid email address and making sure you use the POST method should result in the correct page being returned.
Just FYI wget is available on both Linux and Windows, if required.
MD
#21. Nicely done. I’ll set this up in an R-routine tomorrow. I’ll advise KNMI as well.
#22, 23. I don’t know these tools but for present purposes, I think that #21 has what I was looking for. I’ve got a couple of other similar retrieval problems that I’ll post up as well.
James Smyth said:
Even though my name is on a number of the dissectors in Ethereal/Wireshark, I would hesitate to suggest it casually to people …
Re: Richard Sharpe (#25),
I think our host would be quite capable of figuring it out in a relatively short time. He’s not my mom, for crying (and laughing) out loud.
I received the following message from KNMI today:
I thought that my scripts and methods made it quite clear as to the distinction between model runs and ensembles. Indeed, the entire purpose of the post was to get assistance with a cgi script to extract the individual runs – and by posting scripts the steps are clarified. However, in the event that anyone misunderstood, I’ve posted up this message.
I can’t respond to the comment that the script yields the concatenated ensemble members and not the mean, as I am just feeling my way through this data. If the incorrect thing is being recovered, I’ll obviously modify things accordingly.
As KNMI says: “climate is confusing enough as it is without these kind of confusions” and I would encourage them to provide a proper description of their cgi options somewhere on their website. They’ve done a good job on the cgi access; I’m handy on things like this and I’d have preferred a little less chippiness on their part.
Or it could be that scientists at KNMI read this blog, and have seen how the posters here are able to find simpler ways to access and use data, as long as we can access the data.
OTOH, maybe that’s why the scientists from Santer’s group don’t want you to have access to data: they know you’ll find a simpler, more correct way to use it.
In linux, or cygwin on windows, etc, it should be possible to get a loop going on a wget command on a command line or in a shell script or in R that will increment filenames either locally or on the server or both. If that’s what you’re trying to do.
#29. That’s not the problem. R has brilliant tools for pasting URLs together and doing things. The only issues arose in trying to figure out how to write the cgi code to make the server do what we want. And I think that we’ve broken the back of this, tho I haven’t worked on it today.
The Climate Explorer http://climexp.knmi.nl is a web application to investigate climate data: observations, (re)analyses and model output. As a side effect of the way it is written it is possible to use it from a script as well as interactively. I recommend you first try it interactively, and figure out how to script it from there.
The http request http://climexp.knmi.nl/get_index.cgi instructs the software to retrieve a time-varying field and construct a time series by averaging over a region or interpolating to a point. The arguments can be specified in the URL as http://climexp.knmi.nl/get_index.cgi?arg1=val1&arg2=val2 etc. For get_index.cgi the arguments are
email – please register first and use a real (or almost-real) e-mail address, especially when using the system heavily.
field – can be found in the source code of one of the selectfield pages; for CMIP3 data it is ${var}_${model}_${scenario}, with var=tas,pr,psl,…; model the CMIP3 model name (mpi_echam5, gfdl_cm2_1, etc), and scenario the CMIP3 name (20c3m, sresa1b, picntrl, etc). Not all combinations are available. I concatenated the sresa1b experiments with similarly numbered 20c3m runs.
lon1,lon2 – longitude range to be averaged, default is 0,360
lat1,lat2 – latitude range to be averaged, default is -90,90
intertype = “nearest”: nearest grid point, “interpolated”: interpolate. The latter is only supported for a single grid point, i.e., lon1=lon2 and lat1=lat2. Default is at the moment “nearest”
This call just produces an HTML page that gives further options. The data files are put in http://climexp.knmi.nl/data/ and can be retrieved from there as
http://climexp.knmi.nl/data/i$field_$lon1-$lon2E_$lat1-$lat2N_i|n.dat
or
http://climexp.knmi.nl/data/i$field_$lon1-$lon2E_$lat1-$lat2N_$i|n_NN.dat
for an ensemble, NN=00,01,02,… The “i” or “n” denotes whether the point has been interpolated or approximated by the nearest grid point.
These files stay in the data directory until three days after last use. There is no need to check that these files exist to save work – get_index.cgi already does that. Please pause your script for a little while between calls in order not to hog the computer.
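For readers scripting this from R, the argument list above might be wrapped as follows. This is a sketch under assumptions: get_index_url and region_tag are hypothetical names, the defaults are the ones stated above, and only the arguments listed above are included.

```r
# Hypothetical helper: build a get_index.cgi URL from the listed arguments.
get_index_url <- function(email, field, lon1 = 0, lon2 = 360,
                          lat1 = -90, lat2 = 90, intertype = "nearest") {
  intertype <- match.arg(intertype, c("nearest", "interpolated"))
  # interpolation is described as supported for a single grid point only
  if (intertype == "interpolated" && !(lon1 == lon2 && lat1 == lat2))
    stop("'interpolated' is only supported for a single grid point")
  paste0("http://climexp.knmi.nl/get_index.cgi",
         "?email=", email, "&field=", field,
         "&lon1=", lon1, "&lon2=", lon2,
         "&lat1=", lat1, "&lat2=", lat2,
         "&intertype=", intertype)
}

# Hypothetical helper: the $lon1-$lon2E_$lat1-$lat2N part of the data file name.
region_tag <- function(lon1 = 0, lon2 = 360, lat1 = -90, lat2 = 90) {
  paste0(lon1, "-", lon2, "E_", lat1, "-", lat2, "N")
}

trp_url <- get_index_url("yourname@you", "tas_bcc_cm1_20c3m", lat1 = -20, lat2 = 20)
trp_tag <- region_tag(lat1 = -20, lat2 = 20)
```

A script would call the URL, pause for a little while as requested, and then look in the data directory for the file whose name contains the region tag.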
Other options that may be of interest to readers of this blog:
– Running trend analysis: see how unusual the last N years of the global mean temperature have been. Select an estimate of the global mean temperature (e.g., GISSTEMP), follow the link “correlate with another time series”, select “time”, “regression”, “Dec”, “12” and fill out the required number of years (11 to start on the big El Niño of 1998).
– Antarctic sea ice: select the NSIDC ice concentration from “Monthly observations” under “Select a field”, average this over -90 to -45, 0 to 360, and follow the link “View per season” to get seasonal means with a 10-yr running average. Do the same for Arctic sea ice.
If you use results from the Climate Explorer in a scientific publication, please cite one of my papers as credit:
Ulden, A.P. van and G.J. van Oldenborgh, Large-scale atmospheric circulation biases and changes in global climate model simulations and their importance for climate change in Central Europe
Atm. Chem. Phys., 2006, 6, 863-881, sref:1680-7324/acp/2006-6-863.
http://www.atmos-chem-phys-discuss.net/5/7415/2005/acpd-5-7415-2005.html
Oldenborgh, G.J. van, S.S. Drijfhout, A. van Ulden, R. Haarsma, A. Sterl, C. Severijns, W. Hazeleger and H. Dijkstra, Western Europe is warming much faster than expected
accepted, Climate of the Past, 2008.
Climate of the Past Disc., 2008, 4, 897-928, sref:1814-9359/cpd/2008-4-897.
http://www.clim-past-discuss.net/4/897/2008/cpd-4-897-2008.html
Share and enjoy,
Geert Jan
In emails, Geert Jan van Oldenborgh of KNMI is most cooperative, but is part of a team that has taken on a quite large task that no single person could possibly answer in detail off the cuff.
He noted privately that the data mismatch I showed allegedly originated from NOAA, that “This way the warming trend in Australia is underestimated in the GISS and NCDC estimates that depend on GHCN.”
Which more or less says what SM has been saying for some time now.
Re: Geoff Sherrington (#32), Geoff–someone has sent me some aussie data for you but I don’t have your email. Please email me cloehle at ncasi dot org
I’d appreciate help from any reader who knows this sort of stuff. I want to extract “land only” model run information from KNMI in R. They have a radio button that masks information to land, but I can’t figure out how it’s expressed in the cgi? command that I’m trying to paste together in R.
Information from a reader who knows how to extract what the radio button does would be appreciated.
The HTML source contains the following lines:
<input type="radio" class="formradio" name="masktype" value="all" checked> everything
<input type="radio" class="formradio" name="masktype" value="land"> only land points
<input type="radio" class="formradio" name="masktype" value="sea"> only sea points
It seems “masktype” is an argument for the get_index.cgi command
Thus, a retrieval line for a land only model should look something like this:
http://climexp.knmi.nl/get_index.cgi?&field=hadcm3_a2_temp&masktype=land&…(etc)…&email=yourname@you
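In R the mask could be tacked onto whatever URL is already being pasted together. A minimal sketch, assuming masktype is accepted as an ordinary get_index.cgi argument with the three values shown in the HTML source; add_masktype is a made-up name.

```r
# Hypothetical helper: append the mask argument to an existing get_index.cgi URL.
add_masktype <- function(url, masktype = c("all", "land", "sea")) {
  # only the three values from the radio buttons are allowed
  masktype <- match.arg(masktype)
  paste0(url, "&masktype=", masktype)
}

base_url <- "http://climexp.knmi.nl/get_index.cgi?email=yourname@you&field=tas_bcc_cm1_20c3m"
land_url <- add_masktype(base_url, "land")
```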
Re: Thor (#34),
I also sent an email to Geert who replied on a Saturday saying to add &masktype=land (or sea). I had thrashed around for some time trying to figure it out.
I’ve improved my scraping function and done a similar one to scrape some observation sets. I’ll probably do another post on this.