Downloading KNMI Model Data

Lucia has been experimenting with model data downloaded from KNMI, and I thought I'd try this out a little from time to time.

Unlike Benjamin Santer, Ph.D., of the U.S. federal Lawrence Livermore Labs, who refused to provide any assistance whatever in providing the data used in Santer et al., Geert Jan van Oldenborgh, the KNMI scientist responsible for their public interface, is pleasant and cooperative, though it seems that they aren't used to automated data inquiries of the type that I expect to do.

Their webpage is designed for manual retrieval, but it can be pinged so that data can be directly downloaded into R. KNMI had no objections to me trying to do this and Oldenburgh helped as much as he could. I’ve managed to write a neat little retrieval script that picks up ensemble combinations but I’m stumped as to how to retrieve individual runs. It looks like it should be possible; KNMI has no objection to my doing so; maybe one of the computer-oriented readers can solve this little puzzle.

Here’s what I can do so far.

The access points are cgi commands in the URL that instruct the KNMI computer to compile averages. This works very quickly. Examples of these commands are:


If you insert this sort of URL in your browser, it will ping the KNMI computer to produce ensemble temperature averages for the region specified by the lat-long (default – global.) The HTML page that is produced contains information on the location of the sought data file. The information can be grep’ed from the HTML page and a new URL can be constructed where the ensemble average is located. This file can be read fairly easily – there’s an annoying comment at the end of the file that requires extra handling, but this is programmed easily enough. I’ve inserted the program below.
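In outline, the retrieval looks something like this in R (a sketch only: "url" stands for the cgi call, "base" for the KNMI data directory, and the regular expression is my guess at the page structure, not anything KNMI documents):

```r
# Sketch: fetch the HTML page produced by the cgi call, locate the "raw data"
# hyperlink, and read the .dat file it points to.  "url" and "base" are
# placeholders for the cgi call and the KNMI data directory.
download.file(url, "temp.html")
page <- readLines("temp.html")
loc <- grep("raw data", page)                         # line with the hyperlink
dat <- gsub(".*href=\"([^\"]*\\.dat)\".*", "\\1", page[loc[1]])
# read.table's default comment.char="#" takes care of the comment lines,
# including the annoying one at the end of the file
test <- read.table(paste(base, dat, sep="/"))
```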

The ensemble averages are fine. Here's where I get stuck on reading individual runs. Go to and then ping Scenario Runs on the right frame under Select A Field. Now, as an example, check the radio button under tas for BCC_CM1 and click Select Field.

If you click Make Time Series in the first panel, this will make the ensemble average, producing a new page. Above the third panel in the new page is a hyperlink “raw data” which contains a time series of anomaly data. This is what I can retrieve with the script below.

If you now go back to the previous page, you’ll see hyperlinks at the bottom of the page which produce results from each of two different runs. If you click the hyperlink of “analyse ensemble member 1 separately”, it will generate a webpage looking just like the one we just had, except this time if you click Make Time Series in the first panel, it produces the results of this run located in a webpage structured as before.

I presume that there is a cgi command generated by one of these links, but I haven't been able to figure out the structure. Maybe one of the computer programmers who visits here can figure it out.

Here are my scripts. Please paste in a real email address in the appropriate spot below (requested by Oldenburgh) so that usage of the access tools can be reported.

read.ensemble=function(model,scenario,prefix=myprefix,region="GL",suffix="&standardunits=true") {
if (region=="TRP") {suffix="&standardunits=true&lat1=-20&lat2=20&lon1=0&lon2=360";suffix2="0-360E_-20-20N"}
if (region=="NH") {suffix="&standardunits=true&lat1=0&lat2=90";suffix2="0-360E_0-90N"}
if (region=="SH") {suffix="&standardunits=true&lat1=-90&lat2=0";suffix2="0-360E_-90-0N"}
if (region=="arctic") {suffix="&standardunits=true&lat1=60&lat2=90";suffix2="0-360E_60-90N"}
if (region=="antarctic") {suffix="&standardunits=true&lat1=-90&lat2=-60";suffix2="0-360E_-90--60N"}
if (region=="GL") {suffix="&standardunits=true";suffix2="0-360E_-90-90N"}

# ""
grep("raw data",my_info)
y= my_info[grep("raw data",my_info)][3]
# "raw data, "
read.ensemble=ts(test[,2],start=c(floor(x),round(12*(x%%1),0)+1),freq=12)

download_html =function(url) {
download.file(url, "temp.html")
html_handle <- file("temp.html", "rt")
html_data <- readLines(html_handle)
close(html_handle)
html_data }

Here is a sample use. This can easily be modified for bulk retrieval.

temp=($scenario==scenario);sum(temp) #24
id=unique($alias[temp]);K=length(id) #24
email="yourname@you" #
i=1; test=read.ensemble(model=id[i],scenario,region="GL")


  1. Steve McIntyre
    Posted Dec 20, 2008 at 4:06 PM | Permalink

    IF you use this script, please paste in a real email address in the appropriate spot below (requested by Oldenburgh) so that usage of the access tools can be reported.

  2. masmit
    Posted Dec 20, 2008 at 4:55 PM | Permalink

    Steve, following your instructions above, the final ‘raw data’ hyperlink appears to be this:

    Perhaps it’s possible to simply use this, and substitute the different parameters you might want?

  3. Reid
    Posted Dec 20, 2008 at 5:05 PM | Permalink

    Off topic but this should interest Climate Audit. An online register has been created to flag scientific papers that may be tainted by fraud or misconduct. My instinct tells me that it won’t apply to climate science.
    “The idea is to identify papers that have been shown to be fraudulent but are still in circulation.”

  4. Steve McIntyre
    Posted Dec 20, 2008 at 5:10 PM | Permalink

    #2/ I know that. But you need to ping the page to create the file. You’ve done that manually (as I have). The question is how to ping the page.

  5. Jan F
    Posted Dec 20, 2008 at 5:30 PM | Permalink


    When using the browser, an HTTP POST is done with the parameters in the request body:
    POST /get_index.cgi HTTP/1.1
    Content-Length: 150

    Your R script is sending a HTTP GET with the parameters in the URL:
    GET /get_index.cgi? HTTP/1.0

    I don’t know how to do a HTTP POST in R.

  6. bill-tb
    Posted Dec 20, 2008 at 5:48 PM | Permalink

    You may want to use a simple one line rsync command to copy the directories at the site into a local directory on your computer and then work from there. It works for any directory tree and moves a complete copy to your computer. The data is likely to be much easier to work with once you have a local archived copy.

    rsync is easy to use, search on ‘rsync’ .. If the site won’t allow rsync access, you might ask the contact if he could provide it for you.

    rsync can also be used to keep your local archive up to date with simple difference transfers and a ‘crontab’ script.

    Here is the wikipedia link

  7. insurgent
    Posted Dec 20, 2008 at 6:21 PM | Permalink

    Not an R person but this seems to work:
    Download the httpRequest package if you don’t already have it:

    > port = 80
    > data = "email=someone@somewhere&field=tas_bcc_cm1_20c3m&gridpoints=false&intertype=nearest&lat1=&lat2=&lon1=&lon2=&standardunits=standardunits"
    > simplePostToHost("","/get_index.cgi","",data,port=port)

    For real world usage it’s probably better to use the postToHost with the array of name/value pairs.
    As Jan F noted, the post fields are:
    email someone@somewhere
    field tas_bcc_cm1_20c3m
    gridpoints false
    intertype nearest
    standardunits standardunits
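    The same name/value pairs can also be sent with the RCurl package's postForm function – a sketch only, with the host left as a placeholder:

```r
# Sketch using RCurl (substitute the real Climate Explorer host for the
# placeholder below before running):
library(RCurl)
page <- postForm("http://<climate-explorer-host>/get_index.cgi",
                 email         = "someone@somewhere",
                 field         = "tas_bcc_cm1_20c3m",
                 gridpoints    = "false",
                 intertype     = "nearest",
                 standardunits = "standardunits")
```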

  8. insurgent
    Posted Dec 20, 2008 at 6:23 PM | Permalink

    Some of the code got munged.
    port = 80
    data = etc…

  9. Steve McIntyre
    Posted Dec 20, 2008 at 10:57 PM | Permalink

    My script generates the URL to ping the ensemble call just fine. It’s this:

    The problem is that you have to figure out some way of adding the run number into with some extra term of the form “_1” or “.1” and no one seems to know how. When you hit the button Make Time Series, something is generated like this, you just don’t see it. I’ve experimented with things like

    in various ways but to no avail and no one seems to know.

  10. insurgent
    Posted Dec 20, 2008 at 11:23 PM | Permalink

    Apologies, I completely missed part of the post :/
    It’s posting data to
    This is the field/value pairs it’s posting:
    EMAIL someone@somewhere
    ensanom on
    nens1 1
    station bcc_cm1_20c3m_tas_0-360E_-90-90N_ensemble
    type i
    wmo tas_bcc_cm1_20c3m_0-360E_-90-90N_n_++

  11. insurgent
    Posted Dec 20, 2008 at 11:34 PM | Permalink

    Sorry, that was the wrong one. It’s posting this with the 1 being 0 or 1 for the ensemble picked:
    email someone@somewhere
    field data/
    gridpoints false
    intertype nearest
    standardunits standardunits

  12. Posted Dec 21, 2008 at 3:12 AM | Permalink

    This page seems to contain a shed load of raw data. I’m not sure if it contains all possible model runs etc. but it certainly has a lot of them. The output is in graphic and text format depending on the file extension.

    E.g. for the example you are playing with there is

    It doesn’t seem to need an email login either FWIW.

    • Posted Dec 21, 2008 at 3:15 AM | Permalink

      Re: FrancisT (#12), Sorry just realized I’m echoing comment 2 more or less.

      It would probably be worth verifying whether the raw data exists before recreating it, though, so a check for that would be useful. Let me cogitate some more.

  13. Posted Dec 21, 2008 at 4:18 AM | Permalink

    If the data exists in it will be of the form


    model and exp derived by lowercasing the names e.g.
    model GFDL CM2.0 =>gfdl_cm2_0
    exp 20c3m = 20c3m
    ens is the optional ensemble and is of the form .digit

    putting it together we have stuff like
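    The lowercasing rule above can be sketched as a one-liner in R (my reading of the pattern: lowercase, with spaces and dots turned into underscores):

```r
# My guess at the name mangling, from the example GFDL CM2.0 => gfdl_cm2_0
mangle <- function(x) gsub("[ .]", "_", tolower(x))
mangle("GFDL CM2.0")   # "gfdl_cm2_0"
```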

  14. Geoff Sherrington
    Posted Dec 21, 2008 at 6:40 AM | Permalink

    In looking for KNMI digital data, I took 2 towns in Australia whose annual averaged minimum daily temperatures over the last 40 years were at hand: Meekatharra, West Australia, World Number 5461 50194443000, and Esperance, West Australia, 5572 50194638000.

    My data were raw from the Aust Met Bureau, daily. I do not yet know the source of the KNMI figures.

    However, for the first and only 2 cases I have looked at, there is a severe continuity break at year 2000. There is excellent agreement with the KNMI graph before 2000 for Meekatharra back to 1950 (but post-2000 is lower than pre-2000). The opposite sense is present for Esperance, with pre-2000 temperatures non-systematically some 0.6 to 0.8 degrees higher than mine, but excellent agreement after year 2000.

    year   my calcs   KNMI calcs   (deg C, min annual mean)
    1998 17.02 17.0
    1999 15.91 15.9
    2000 14.82 14.8
    2001 15.08 14.5
    2002 16.11 15.2
    2003 16.25 15.4
    2004 15.91 15.1
    2005 16.44 15.6
    2006 15.84 14.8 etc

    My examination was to see if these BOM designated high quality rural sites show the world-wide 1998-2006 “Hump” so familiar on temp graphs. The KNMI figures do not show the hump for each of the first 2 cases out of 2 that I looked at.

    I too would like to be able to pull down daily/weekly/monthly/annual averages in digital form.

    Something is clearly badly amiss.

  15. Wilbur
    Posted Dec 21, 2008 at 1:04 PM | Permalink

    Not sure what platform you’re on, but if you have access to Windows/IE for experimenting on then this app: will show you what’s going on between the browser and the server.

  16. arf
    Posted Dec 21, 2008 at 2:17 PM | Permalink

    try this for ensemble 0:
    Don’t forget to append your email address and “info”.
    On the page you get there are links to postscript, raw and netcdf data formats.
    Hope that’s what you want.

  17. Steve McIntyre
    Posted Dec 21, 2008 at 2:40 PM | Permalink

    #18. That command doesn’t work for me. I get response that there’s no data.

    #Prior comments – the required command has to be a cgi line. The other suggestions only work if the cgi command has already been done. You're getting wrongfooted because the dat files have already been created by manual cgi calls (and will expire after a period of time). We need something along the lines of #18, if it works (which I can't confirm so far).

  18. Shallow Climate
    Posted Dec 21, 2008 at 5:19 PM | Permalink

    Pardon my ignorance/naivete. “Oldenburgh helped as much as he could”: Since he is friendly and cooperative, does it not make sense to request of him to find someone (else) at KNMI who can tell you how to do it?

  19. insurgent
    Posted Dec 21, 2008 at 5:53 PM | Permalink

    I downloaded the /Data listing after each step to see when files are generated (it’s a 20 meg listing each time):
    Logon –
    Select “Monthly Scenario Runs” –
    Select “tas” and click Select Fields –
    Select “Analyze ensemble member 0” –
    File appears in /Data:
    Click “Make time series” – POST to with these fields:
    field data/
    gridpoints false
    intertype nearest
    standardunits standardunits

    New file in /Data:
    Raw data link on page points to:

    Since the “Make time series” command references a file generated by the “Analyze ensemble member 0” command, it doesn’t look like the process can be reduced to a single command.

  20. James Smyth
    Posted Dec 21, 2008 at 6:00 PM | Permalink

    Do you know how to use Wireshark (aka Ethereal)? It will show you everything that is being sent back and forth.

  21. MD
    Posted Dec 21, 2008 at 7:22 PM | Permalink

    I started from #18 but used the command:

    wget "" --post-data="email=someone@somewhere&field=data/"

    Which returned the correct html page with URLs to download the relevant data files.

    Replacing the .0 with a .1 seems to do the right thing.

    It has to be an email address that it recognises or a page about unknown email is returned.

    So, using the command as specified in #18 with a valid email address and making sure you use the POST method should result in the correct page being returned.

    Just FYI wget is available on both Linux and Windows, if required.


  22. Steve McIntyre
    Posted Dec 21, 2008 at 10:10 PM | Permalink

    #21. Nicely done. I’ll set this up in an R-routine tomorrow. I’ll advise KNMI as well.

    #22, 23. I don’t know these tools but for present purposes, I think that #21 has what I was looking for. I’ve got a couple of other similar retrieval problems that I’ll post up as well.

  23. Richard Sharpe
    Posted Dec 21, 2008 at 10:28 PM | Permalink

    James Smyth said:

    Do you know how to use Wireshark (aka Ethereal)? It will show you everything that is being sent back and forth.

    Even though my name is on a number of the dissectors in Ethereal/Wireshark, I would hesitate to suggest it casually to people …

    • James Smyth
      Posted Dec 22, 2008 at 11:42 AM | Permalink

      Re: Richard Sharpe (#25),

      Even though my name is on a number of the dissectors in Ethereal/Wireshark, I would hesitate to suggest it casually to people …

      I think our host would be quite capable of figuring it out in a relatively short time. He’s not my mom, for crying (and laughing) out loud.

  24. Steve McIntyre
    Posted Dec 22, 2008 at 2:24 PM | Permalink

    I received the following message from KNMI today:

    I just saw your blog post. Please make clear that you get the ensemble members concatenated in one file with your script, not the ensemble mean. To compute the ensemble mean press the big button at the bottom of the page. I hope not too many people get confused about this, climate is confusing enough as it is without these kind of confusions.

    I thought that my scripts and methods made it quite clear as to the distinction between model runs and ensembles. Indeed, the entire purpose of the post was to get assistance with a cgi script to extract the individual runs – and by posting scripts the steps are clarified. However, in the event that anyone misunderstood, I’ve posted up this message.

    I can’t respond to the comment that the script yields the concatenated ensemble members and not the mean, as I am just feeling my way through this data. If the incorrect thing is being recovered, I’ll obviously modify things accordingly.

    As KNMI says: “climate is confusing enough as it is without these kind of confusions” and I would encourage them to provide a proper description of their cgi options somewhere on their website. They’ve done a good job on the cgi access; I’m handy on things like this and I’d have preferred a little less chippiness on their part.

  25. henry
    Posted Dec 22, 2008 at 3:54 PM | Permalink

    Unlike Benjamin Santer, Ph.D. of the U.S. federal Lawrence Livermore Labs, who refused to provide any assistance whatever in providing the data used in Santer et al, Geert Oldenburgh, the KNMI scientist responsible for their public interface, is pleasant and cooperative though it seems that they aren’t used to automated data inquiry of the type that I expect to do.

    Or it could be that scientists at KNMI read this blog, and have seen how the posters here are able to find simpler ways to access and use data, as long as we can access the data.

    OTOH, maybe that’s why the scientists from Santer’s group don’t want you to have access to data: they know you’ll find a simpler, more correct way to use it.

  26. Sam Urbinto
    Posted Dec 22, 2008 at 6:26 PM | Permalink

    In linux, or cygwin on windows, etc, it should be possible to get a loop going on a wget command on a command line or in a shell script or in R that will increment filenames either locally or on the server or both. If that’s what you’re trying to do.

  27. Steve McIntyre
    Posted Dec 22, 2008 at 7:40 PM | Permalink

    #29. That's not the problem. R has brilliant tools for pasting URLs together and doing things. The only issues arose in trying to figure out how to write the cgi code to make the server do what we want. And I think that we've broken the back of this, though I haven't worked on it today.

  28. Posted Dec 23, 2008 at 2:04 AM | Permalink

    The Climate Explorer is a web application to investigate climate data, Observations, (re)analyses and model output. As a side effect of the way it is written it is possible to use it from a script as well as interactively. I recommend you first try it interactively, and figure out how to script it from there.

    The http request instructs the software to retrieve a time-varying field and construct a time series by averaging over a region or interpolate to a point. The arguments can be specified in the URL as etc. For get_index.cgi the arguments are

    email – please register first and use a real (or almost-real) e-mail address, especially when using the system heavily.

    field – can be found in the source code of one of the selectfield pages; for CMIP3 data it is ${var}_${model}_${scenario}, with var=tas,pr,psl,…; model the CMIP3 model name (mpi_echam5, gfdl_cm2_1, etc); and scenario the CMIP3 scenario name (20c3m, sresa1b, picntrl, etc). Not all combinations are available. I concatenated the sresa1b experiments with similarly numbered 20c3m runs.

    lon1,lon2 – longitude range to be averaged, default is 0,360
    lat1,lat2 – latitude range to be averaged, default is -90,90

    intertype = “nearest”: nearest grid point, “interpolated”: interpolate. The latter is only supported for a single grid point, i.e., lon1=lon2 and lat1=lat2. Default is at the moment “nearest”

    This call just produces an HTML page that gives further options. The data files are put in the data directory and can be retrieved from there as $field_$lon1-$lon2E_$lat1-$lat2N_i|n.dat


    for an ensemble, NN=00,01,02,… The “i” or “n” denotes whether the point has been interpolated or approximated by the nearest grid point.

    These files stay in the data directory until three days after last use. There is no need to check that these files exist to save work – get_index.cgi already does that. Please pause your script for a little while between calls in order not to hog the computer.
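    In R, these arguments can be pasted together along the following lines (a sketch, not Geert Jan's own code; the base URL is a placeholder here, just as it is omitted above):

```r
# Build a get_index.cgi call from the arguments described above.
# "base" is a placeholder for the Climate Explorer base URL.
make.index.url <- function(base, email, field,
                           lon1=0, lon2=360, lat1=-90, lat2=90,
                           intertype="nearest")
  paste(base, "/get_index.cgi?email=", email, "&field=", field,
        "&lon1=", lon1, "&lon2=", lon2, "&lat1=", lat1, "&lat2=", lat2,
        "&intertype=", intertype, "&standardunits=true", sep="")
```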

    Other options that may be of interest to readers of this blog:

    – Running trend analysis: see how unusual the last N years of the global mean temperature have been. Select an estimate of the global mean temperature (e.g., GISSTEMP), follow the link “correlate with another time series”, select “time”, “regression”, “Dec”, “12” and fill out the required number of years (11 to start on the big El Niño of 1998).

    – Antarctic sea ice: select the NSIDC ice concentration from “Monthly observations” under “Select a field”, average this over -90 to -45, 0 to 360, and follow the link “View per season” to get seasonal means with a 10-yr running average. Do the same for Arctic sea ice.

    If you use results from the Climate Explorer in a scientific publication, please cite one of my papers as credit:

    Ulden, A.P. van and G.J. van Oldenborgh, Large-scale atmospheric circulation biases and changes in global climate model simulations and their importance for climate change in Central Europe
    Atm. Chem. Phys., 2006, 6, 863-881, sref:1680-7324/acp/2006-6-863.

    Oldenborgh, G.J. van, S.S. Drijfhout, A. van Ulden, R. Haarsma, A. Sterl, C. Severijns, W. Hazeleger and H. Dijkstra, Western Europe is warming much faster than expected
    accepted, Climate of the Past, 2008.
    Climate of the Past Disc., 2008, 4, 897-928, sref:1814-9359/cpd/2008-4-897.

    Share and enjoy,

    Geert Jan

  29. Geoff Sherrington
    Posted Dec 23, 2008 at 4:51 AM | Permalink

    In emails, Geert Jan van Oldenborgh of KNMI is most cooperative, but is part of a team that has taken on a quite large task that no single person could possibly answer in detail off the cuff.

    He noted privately that the data mismatch I showed allegedly originated from NOAA, and that “This way the warming trend in Australia is underestimated in the GISS and NCDC estimates that depend on GHCN.”

    “It is unfortunate that there is no formal bug-tracking system in place for climate data, something like Bugzilla for programming. The chain is sufficiently complicated (BOM => NCDC => KNMI => you) to warrant a formal system IMHO. Also, dealing with these things by hand for O(10^5) time series is undoable.”

    Which more or less says what SM has been saying for some time now.

    • Craig Loehle
      Posted Dec 23, 2008 at 7:55 AM | Permalink

      Re: Geoff Sherrington (#32), Geoff–someone has sent me some aussie data for you but I don’t have your email. Please email me cloehle at ncasi dot org

  30. Steve McIntyre
    Posted Jun 13, 2009 at 12:27 PM | Permalink

    I’d appreciate help from any reader who knows this sort of stuff. I want to extract “land only” model run information from KNMI in R. They have a radio button that masks information to land, but I can’t figure out how it’s expressed in the cgi command that I’m trying to paste together in R.

    Information from a reader who knows how to extract what the radio button does would be appreciated.

  31. Thor
    Posted Jun 14, 2009 at 4:12 AM | Permalink

    The HTML source contains the following lines:

    <input type="radio" class="formradio" name="masktype" value="all" checked> everything
    <input type="radio" class="formradio" name="masktype" value="land"> only land points
    <input type="radio" class="formradio" name="masktype" value="sea"> only sea points

    It seems “masktype” is an argument for the get_index.cgi command

    Thus, a retrieval line for a land only model should look something like this:…(etc)…&email=yourname@you

    • Steve McIntyre
      Posted Jun 14, 2009 at 5:25 AM | Permalink

      Re: Thor (#34),
      I also sent an email to Geert who replied on a Saturday saying to add &masktype=land (or sea). I had thrashed around for some time trying to figure it out.

      I’ve improved my scraping function and done a similar one to scrape some observation sets. I’ll probably do another post on this.

2 Trackbacks

  1. By NRO on Global Warming – Cynical Heretic on Jul 26, 2009 at 10:39 AM

    […] doubt. Steve McIntyre has written on his experiences on trying to retrieve some of this information here and here. He has also written extensively on his difficulties in getting access to other data the […]

  2. […] need to register and insert your own e-mail address), which I developed with some help from this Climate Audit post that cut down the learning curve.  This is not a complete set of all CMIP5 models available at […]
