Collating Hurricane Track Data

I’ve collated some hurricane track data in flat text files so that they are more usable (at least for me) and have posted up the working files since there seems to be interest in this data. I’m also posting up some notes on fiddly format issues while they are fresh in mind.

Track data for 5 of the 6 basins discussed in Webster et al 2005 is located at . This also seems to be the source of the data used in Emanuel 2005 (which adjusted it). Neither article gives a data citation according to (say) AGU data citation requirements. These requirements (exact digital references) appear to be unknown to climate scientists, as data references are virtually always to print publications, although data citations are now starting to drift into the occasional SI. The 6th basin appears to be obtained from the Navy data set.

The archive contains collated data for Atlantic storms from 1851-2005 with uncollated information for 2006; for W Pacific storms, the collation is from 1945 to 2003(!) with uncollated storm information for 2004-2006, which requires ad hoc collation. Emanuel 2005 used data up to 2004; after Landsea criticized him for endpoint pinning, he added 2005 data. I’ve added the 2006 data to date to investigate the effect of endpoint pinning to 2006.

A collation script is here . This is not a polished script, as it’s a one-off collation exercise. I don’t guarantee the collation: in this type of exercise with new data, I sometimes have to revisit the collation a couple of times when the underlying files are messy, and I use some ad hoc tweaking for q.c. Undoubtedly the script could be made a little simpler; it has not been optimized in any sense and is a little slow, but since it’s a one-off, a little slowness doesn’t matter.

I make two tables for each basin: a data.frame with one record for each storm, and a data.frame with one record for each measurement, with measurements made every quarter day. The Track records contain the following:
names(Track) # id year qtr month day lat long wind press

Most of the fields in the hurricane tables are not used. However, the data frames are handy for summarizing the Track data as I’ll show later.
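As a rough illustration of the kind of summary the Track table allows, here is a minimal sketch. The values are invented and only the column names come from the list above; the summary columns n_obs and max_wind are hypothetical names, not fields of the actual hurricane table.

```r
# Toy Track table using the column names listed above; all values are invented.
Track <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  year  = c(2005, 2005, 2005, 2005, 2005),
  qtr   = c(0, 1, 2, 0, 1),          # assumed meaning: quarter-day index
  month = c(8, 8, 8, 9, 9),
  day   = c(23, 23, 23, 1, 1),
  lat   = c(23.1, 23.4, 23.9, 15.0, 15.3),
  long  = c(-75.1, -75.7, -76.2, -40.0, -40.8),
  wind  = c(30, 35, 45, 25, 30),
  press = c(1008, 1007, 1003, 1010, 1009)
)

# Collapse to one record per storm: number of measurements and peak wind.
n_obs    <- tapply(Track$wind, Track$id, length)
max_wind <- tapply(Track$wind, Track$id, max)

hurricane_summary <- data.frame(
  id       = as.numeric(names(n_obs)),
  n_obs    = as.vector(n_obs),
  max_wind = as.vector(max_wind)
)
print(hurricane_summary)
```

The same tapply pattern works for any per-storm statistic, for example min of press.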

To collate the annual storms for 2004-2006 in the West Pacific and 2006 in the Atlantic, I went to the following pages: . Tracks are stored for each storm, but the directories are not externally readable. To make a list of subdirectories for reading track data, I used the following expedient. At the top of the pages mentioned, there is a table of storms; I manually copied them into a text file labeled WPAC.2006.txt,… so that they can be read and analysed. I read these files using the R function scan and then collated the words following any of “Storm”/“Typhoon”/“Depression”/“Hurricane”. This left me with the required storm names, which are used in the subdirectories.
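The name-extraction step can be sketched roughly as follows. This is a minimal sketch and not the actual collation script: the sample lines, the placeholder storm names (ALPHA, BRAVO, ...) and the temporary file are invented for illustration, and the real pages may be laid out differently.

```r
# Hypothetical lines standing in for the manually copied storm table.
txt <- c("Tropical Storm ALPHA",
         "Typhoon BRAVO",
         "Tropical Depression 01W",
         "Super Typhoon CHARLIE")
tmp <- tempfile(fileext = ".txt")
writeLines(txt, tmp)

# scan() with what = "" returns the whitespace-separated words as characters.
words <- scan(tmp, what = "", quiet = TRUE)

# Keep the word that immediately follows any of the keywords.
keys <- c("Storm", "Typhoon", "Depression", "Hurricane")
idx  <- which(words %in% keys)
idx  <- idx[idx < length(words)]   # guard: a keyword as last word has no successor
storm_names <- words[idx + 1]
print(storm_names)
```

A storm whose name spans several words would need extra handling, since only the first word after the keyword is kept here.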

The data files are saved in tab-separated format as , etc. and should be directly readable through an R command. (I haven’t had any luck reading R-tables from this site – advice welcomed.)

hurricane <- read.table("...", sep="\t", header=TRUE)
Track <- read.table("...", sep="\t", header=TRUE)

Same for WPAC. At some point, I’ll add on EPAC, NIO, SIO and SH. I should be able to tidy this up some more. Anyway with a relatively clean database, I can now start testing some of the claims in Emanuel 2005. I’ve taken a preliminary look and some results will be interesting.
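For anyone wanting to check the tab-separated format locally before reading from the site, here is a minimal round-trip sketch. The small table and the temporary file are made up for illustration; only the sep="\t" and header=TRUE settings come from the commands above.

```r
# Made-up miniature table; the real Track table has more columns and rows.
Track <- data.frame(id = c(1, 1), year = c(2005, 2005), wind = c(30, 35))

# Write it as a tab-separated file with a header row and no row names.
tmp <- tempfile(fileext = ".txt")
write.table(Track, tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back with the same settings as the commands above.
Track2 <- read.table(tmp, sep = "\t", header = TRUE)
print(Track2)
```

Note that the quotes must be plain ASCII quotes and the assignment arrow must be written without a space between < and -.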


  1. TAC
    Posted Oct 12, 2006 at 4:08 AM | Permalink

    (I haven’t had any luck reading R-tables from this site – advice welcomed.)

    Steve, my version of R returns “Error: syntax error” when you use fancy quotes (” rather than "), as in:



    I don’t know if that’s the problem (are the fancy quotes introduced by the blog software?)

  2. TAC
    Posted Oct 12, 2006 at 4:38 AM | Permalink

    #1 Ignore my previous post. Apparently WordPress converts regular quotes into fancy quotes. In addition, I forgot about the .LT. (FORTRAN has its uses!) problem.
