The source code for Hansen’s Step 2, the “urban adjustment” step, is online. If anyone has been able to run the program through Step 2, I’d be interested in some intermediate results for the stations discussed here. The verbal description is not clear and the code is a blizzard of old-fashioned Fortran subscripts, so it will take a little while to translate the procedure into a modern language and see what he’s doing.
Hansen et al 1999 says:
An adjusted urban record is defined only if there are at least three rural neighbors for at least two thirds of the period being adjusted. All rural stations within 1000 km are used to calculate the adjustment, with a weight that decreases linearly to zero at distance 1000 km.
In the stations that I’ve looked at, I’ve seen adjusted stations being calculated when the above condition doesn’t seem to hold. So any light that can be shed on the procedures would be appreciated.
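To make the quoted weighting rule concrete, here is a minimal R sketch of a weight that falls linearly from 1 at the station to 0 at 1000 km. The function name linear_weight, the toy distances and anomalies, and the use of a plain weighted average are my own illustrative assumptions; the quote only specifies the shape of the weight, not how Hansen's code combines the rural series.

```r
# Linear distance weight as described in the Hansen et al 1999 quote:
# 1 at distance 0, falling linearly to 0 at the cutoff radius.
# The name and arguments are illustrative, not taken from the GISS code.
linear_weight <- function(dist_km, radius_km = 1000) {
  pmax(0, 1 - dist_km / radius_km)
}

# Made-up values for a single year, only to show how the weights would enter
# a simple weighted average (an assumption about the combination step).
rural_dist    <- c(100, 500, 900)    # km from the urban station
rural_anomaly <- c(0.2, 0.4, -0.1)   # hypothetical anomalies, deg C
w <- linear_weight(rural_dist)
weighted_mean <- sum(w * rural_anomaly) / sum(w)
```

With a weight of this shape, a rural station close to the 1000 km limit contributes very little compared with a nearby one.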
Stations within 1000 km
The first step in Hansen’s Step 2 is to identify the rural stations within 1000 km of each urban station to be adjusted. The archived script shows that this calculation is done from scratch in each run. There’s no particular harm in that, although the information presumably remains the same in every run. I’ve compiled the list of “Hansen-rural” stations for each of the 7364 stations, including in each list the id, name, lat, long, start_raw, end_raw, start_adj, end_adj, distance (from target), GISS-population, GISS-urban and GISS-lights. I archived the result at http://data.climateaudit.org/data/giss/stat_dist.tab . The information is somewhat redundant with the station information lists, but it is handy to have some of it directly accessible for analysis. The object is 33 MB. The script to do this is at http://data.climateaudit.org/scripts/station/hansen/step2A.txt
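For readers who want to see the shape of the neighbor search, here is a minimal R sketch, not the archived step2A script. It assumes a haversine great-circle distance with an Earth radius of 6371 km (the archived script may well use a different formula or radius), and the helper names haversine_km and rural_within are hypothetical. The toy table uses coordinates quoted later in this post.

```r
# Great-circle distance in km via the haversine formula.
# The 6371 km radius is an assumption, not taken from the archived script.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Toy station table built from coordinates quoted in this post.
toy <- data.frame(
  name  = c("BATTICALOA", "HAMBANTOTA", "DIYATALAWA", "KODAIKANAL", "MINICOY"),
  lat   = c(7.72, 6.12, 6.80, 10.23, 8.30),
  long  = c(81.70, 81.13, 81.00, 77.47, 73.15),
  urban = c("S", "S", "R", "R", "R"),
  stringsAsFactors = FALSE
)

# Rural ("R") stations within cutoff_km of the station in row `target`,
# sorted by increasing distance.
rural_within <- function(info, target, cutoff_km = 1000) {
  rural <- info[info$urban == "R", ]
  rural$dist <- haversine_km(info$lat[target], info$long[target],
                             rural$lat, rural$long)
  rural <- rural[rural$dist < cutoff_km, ]
  rural[order(rural$dist), ]
}

near_batticaloa <- rural_within(toy, 1)
```

Applying rural_within to every row with lapply would give a list of data frames, one per station, which is the same general shape as stat_dist.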
In order to analyze Hansen’s actual urban adjustment mechanism, it’s nice to identify the stations with only a few contributing rural stations. There’s a very pretty R function that can do this in one line. The object containing the information, stat_dist.tab, is structured as a list in R with 7364 items, one for each GISS station. Each item is an R data frame, which is structured like a matrix except that the columns can be of different types, a very handy thing to have. To obtain the number of rows for the 1000th station in the list, you can use the command:
nrow(stat_dist[[1000]]) #86
To obtain the number of rows for all 7364 stations, all you need to do to get a vector is:
stat.length=sapply(stat_dist,nrow)
Just like magic. No blizzard of Fortran subscripts and 5 pages of programming.
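If you don’t have stat_dist loaded, the same idiom can be tried on a toy list. The object toy_dist below is made up purely for illustration and has nothing to do with the real station data.

```r
# Toy stand-in for stat_dist: a list of data frames, one per station.
toy_dist <- list(
  data.frame(id = c("a", "b"), dist = c(120, 800)),
  data.frame(id = character(0), dist = numeric(0)),
  data.frame(id = c("a", "b", "c"), dist = c(50, 400, 950))
)

# sapply applies nrow to each list element and simplifies the result to a vector.
toy_length <- sapply(toy_dist, nrow)

# Positions of the stations with fewer than 3 rural neighbors.
few <- which(toy_length < 3)
```

Because each element can have a different number of rows, a list of data frames avoids the padded fixed-size arrays that the Fortran needs.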
Now to locate stations with only a few contributing rural stations for analysis purposes, you can do the following (I’ve already got my GISS information http://data.climateaudit.org/data/station/giss/giss.info.dat loaded as stations.tab):
temp=(stat.length<3) ; index=(1:7364)[temp]
temp_urban=(stations$urban=="U")
stations[temp&temp_urban,1:10]
This yields the following list of sites, all in India:
country id name lat long altitude alt.interp urban pop topo
1180 207 20742867000 NAGPUR SONEGA 21.1 79.05 310 302 U 930 HI
1188 207 20743128000 BEGAMPET 17.5 78.50 545 550 U 1796 HI
1190 207 20743185000 MACHILIPATNAM 16.2 81.15 3 30 U 113 FL
In these three cases, I checked and there was no “adjusted” series for any of the 3 sites, which, in this case, complies with the Hansen et al 1999 3-rural station criterion. I then checked “U” sites with exactly 3 neighbors, all in India, Brazil and New Zealand, and again didn’t obtain any adjusted series.
index=(1:7364)[temp_urban& (stat.length==3)];stations[index,1:10]
country id name lat long altitude alt.interp urban pop
1189 207 20743149000 CWC VISHAKHAP 17.70 83.30 66 142 U 353
2054 303 30382599000 NATAL AEROPOR -5.92 -35.25 52 12 U 377
2062 303 30382900000 RECIFE -8.05 -34.92 7 18 U 1184
2088 303 30383781000 SAO PAULO -23.50 -46.62 792 883 U 7034
6011 507 50793116001 AUCKLAND -36.90 174.80 5 9 U 145
6012 507 50793116002 ALBERT PARK -36.85 174.77 49 4 U 145
6013 507 50793116003 AUCKLAND, ALBERT PARK -36.85 174.77 49 4 U 145
6014 507 50793119000 AUCKLAND AIRP -37.02 174.80 6 17 U 145
6024 507 50793890001 DUNEDIN AERODROME -45.93 170.20 1 151 U 77
6025 507 50793893001 DUNEDIN MUSSELBURGH NEW ZE -45.90 170.50 2 190 U 77
I then experimented with sites classified as “small” and got some puzzles as shown below.
temp_small=(stations$urban=="S") # "S" is the code shown in the urban column below
index=(1:7364)[temp_small& (stat.length==3)];stations[index,1:10]
country id name lat long altitude alt.interp urban pop topo
312 125 12567197000 FORT-DAUPHIN -25.03 46.95 9 100 S 14 HI
1185 207 20743041000 JAGDALPUR 19.08 82.03 553 512 S 47 HI
1855 224 22443436000 BATTICALOA 7.72 81.70 12 8 S 42 FL
1861 224 22443497000 HAMBANTOTA 6.12 81.13 20 42 S 11 FL
6023 507 50793844000 INVERCARGILL -46.70 168.55 4 28 S 49 FL
For the first two sites, there was no adjusted series, but for the third through fifth sites (Batticaloa, Hambantota and Invercargill) there were adjusted series. Batticaloa and Hambantota are very close to each other and it turns out that their three rural (R) neighbors are identical. So, based on the apparent Hansen adjustment process in which the urban station trends are supposedly coerced to the rural reference stations, one would expect similar adjusted trends. This proves not to be the case. I’m not sure why right now and would welcome any thoughts.
The three rural stations for Batticaloa and Hambantota are shown below (Batticaloa first, then Hambantota). Note that the distances from the two sites to the three rural comparanda are similar and in the same order.
id name long lat start_raw end_raw start_adj end_adj dist pop urban lights
1859 22443476001 DIYATALAWA, SRI 81.00 6.80 1901 1980.917 1901 1980.917 128.2031 NA R A
1200 20743339000 KODAIKANAL 77.47 10.23 1900 1980.917 1900 1980.917 542.0993 NA R A
1203 20743369000 MINICOY 73.15 8.30 1931 2007.917 1931 2007.917 943.8923 NA R A
id name long lat start_raw end_raw start_adj end_adj dist pop urban lights
1859 22443476001 DIYATALAWA, SRI 81.00 6.80 1901 1980.917 1901 1980.917 76.9864 NA R A
1200 20743339000 KODAIKANAL 77.47 10.23 1900 1980.917 1900 1980.917 609.3207 NA R A
1203 20743369000 MINICOY 73.15 8.30 1931 2007.917 1931 2007.917 913.2765 NA R A
The figure below shows the annual temperature values of the three rural stations. Two of them end in 1980 and only one (Minicoy) continues to the present. I tested the proportion of years with at least 3 stations to the number of years of adjusted record and found that 2 of the 3 series failed the test. So it would be interesting to locate exactly where Hansen implements the 2/3 criterion in his code. I haven’t been able to do so yet. One also sees that, in this case, much depends on the Minicoy station as the only one continuing to the present.
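A rough R sketch of that kind of proportion test follows. It is only a sketch: the function name frac_covered is hypothetical, coverage is treated as continuous between each station’s start and end year (which ignores any gaps in the real records), and the adjustment period used here is a placeholder rather than the actual span of any adjusted series.

```r
# Fraction of years in an adjustment period with at least `min_n` rural
# stations reporting. Each station is assumed to report continuously from
# its start year to its end year, which ignores gaps in the real records.
frac_covered <- function(start, end, period, min_n = 3) {
  years <- period[1]:period[2]
  n_reporting <- sapply(years, function(y) sum(start <= y & y <= floor(end)))
  mean(n_reporting >= min_n)
}

# start_raw and end_raw of the three rural neighbors listed above.
start_raw <- c(1901, 1900, 1931)
end_raw   <- c(1980.917, 1980.917, 2007.917)

# The period below is hypothetical; substitute the actual first and last
# year of the adjusted record being checked.
frac <- frac_covered(start_raw, end_raw, period = c(1931, 2007))
passes <- frac >= 2 / 3
```

With that placeholder period, all three stations overlap only from 1931 to 1980, which is 50 of the 77 years and falls just short of two thirds. The outcome is sensitive to the period chosen, which is why the exact place where the criterion is applied in the code matters.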

Now for today’s puzzle, showing the dset=1 and adjusted versions of Hambantota and Batticaloa. How does Hansen get such different-looking adjustments from identical rural comparanda? If anyone can do runs of the actual code for these stations and save any intermediate work, it would help. (Also Wellington NZ).

