Here are some notes and functions from work I did last fall trying to benchmark the GISS Step 2 adjustment at a non-US site. My first efforts to follow the written prescription were unsuccessful. I’m placing some tools and a benchmark case (Wellington NZ) online, and perhaps people who are trying to decode Step 2 from the Fortran angle can push this peanut along a little. All discussion is in R.
The script is online here: http://data.climateaudit.org/scripts/station/hansen/step2/step2.txt
I’ve worked through the major chunks below, but you’re better off cutting and pasting from the ASCII version, as WordPress does odd things to quotation marks.
First here are commands to load functions that I’ve used.
source("http://data.climateaudit.org/scripts/station/hansen/step1.txt") #loads hansenq
source("http://data.climateaudit.org/scripts/station/hansen/step2/functions.txt")
Step 2A
This is an optional step that I would encourage people to ignore for present purposes since there is nothing at issue here and I’ve saved the files from this step in ASCII form online. This step finds the “rural” stations within 1000 km and then collects their GISS dset1 histories. For completeness, I’m showing the work here.
First here are commands to download two files of GISS dset1 and dset2 versions (scraped Feb 2008 vintage).
loc.giss="http://data.climateaudit.org/data/giss"
download.file( file.path(loc.giss,"giss.dset1.tab"),"temp.dat",mode="wb"); load("temp.dat")
download.file( file.path(loc.giss,"giss.dset2.tab"),"temp.dat",mode="wb"); load("temp.dat")
Next the station information is downloaded
url= file.path(loc.giss,"giss.info.dat")
stations=read.table(url,sep="\t",header=TRUE,fill=TRUE,quote="")
names(stations)[3]="name"
pi180=pi/180; R=6372.795 #earth's radius in km
A utility function calculating the great circle distance is used:
circledist =function(loc, lat, long, R=6372.795) {
  N=length(lat)
  if(N==0) circledist=NA else {
    pi180=pi/180
    x= abs(long - loc[2])
    if (N>1) delta= apply( cbind(x %%360, (360-x)%%360), 1, min)*pi180 else delta= min(x %%360, (360-x)%%360)*pi180
    loc=loc*pi180; lat=lat*pi180; long=long*pi180
    theta= 2*asin( sqrt( sin( (lat-loc[1])/2 )^2 + cos(lat)*cos(loc[1])*(sin(delta/2))^2 ))
    circledist=R*theta
  }
  circledist
}
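As a quick sanity check on circledist, two points one degree of longitude apart on the equator should be about 111.2 km apart. The function is restated compactly here (pmin replacing the apply/min branching) so the chunk runs on its own; the Wellington and Hokitika coordinates are my rough values, not taken from the station inventory:

```r
# compact restatement of circledist (pmin handles both scalar and vector cases)
circledist = function(loc, lat, long, R=6372.795) {
  pi180 = pi/180
  x = abs(long - loc[2])
  delta = pmin(x %% 360, (360 - x) %% 360) * pi180
  lat = lat*pi180; loc = loc*pi180
  theta = 2*asin(sqrt(sin((lat - loc[1])/2)^2 + cos(lat)*cos(loc[1])*sin(delta/2)^2))
  R*theta
}
round(circledist(loc=c(0, 0), lat=0, long=1), 1)  # 111.2 km per degree at the equator
# rough Wellington (-41.3, 174.8) to Hokitika (-42.7, 171.0): on the order of 350 km
circledist(loc=c(-41.3, 174.8), lat=-42.7, long=171.0)
```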
The function ruralstations locates the “rural” stations within a distance of 1000 km of the target station.
ruralstations=function(lat0,long0) {
#temp1, temp2 (coarse lat/long box screens) and temp_rur (the rural screen) are
#computed earlier in the online script and are not reproduced in this excerpt
ruralstations=stations[temp1&temp2&temp_rur,c("id","name","long","lat","start_raw","end_raw","start_adj","end_adj","dist","pop","urban","lights")]
#matrix of stations meeting first two screen tests
ruralstations$dist=circledist(loc=c(lat0,long0),lat=ruralstations$lat,long=ruralstations$long)
#calculate great circle distance for all stations in first screen
temp=(ruralstations$dist<=1000)&!is.na(ruralstations$dist)
ruralstations=ruralstations[temp,]
#restrict to stations within 1000 km
ruralstations=ruralstations[order(ruralstations[,"dist"]),]
#order by increasing distance
ruralstations$weights= 1- ruralstations$dist/1000
ruralstations
}
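The linear distance weighting at the end of this step is easy to check by hand; with made-up distances of 100, 400 and 750 km:

```r
dist.toy = c(100, 400, 750)      # hypothetical distances in km
weights.toy = 1 - dist.toy/1000  # linear taper to zero at 1000 km
weights.toy                      # 0.90 0.60 0.25
```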
This step leaves us with three data sets for onward use: an information set of 4 stations (here: HOKITIKA AERO, WHENUAPAI, KAITAIA and CHATHAM ISLAND); a data set holding the dset1 and dset2 versions of Wellington NZ; and a data set holding the dset1 versions of the 4 rural comparanda, all located at http://data.climateaudit.org/data/giss/step2/ .
STEP 2
Starting from this step, we first read in the comparandum series, the target series and the station information.
chron=read.table("http://data.climateaudit.org/data/giss/step2/compare.dat",sep="\t",header=TRUE)
chron=ts(chron[,2:ncol(chron)],start=chron[1,1])
target=read.table("http://data.climateaudit.org/data/giss/step2/target.dat",sep="\t",header=TRUE)
target=ts(target[,2:3],start=target[1,1])
dset1= target[,1]
dset2= target[,2]
rural=read.table("http://data.climateaudit.org/data/giss/step2/rural.dat",sep="\t",header=TRUE)
weights0=rural$weights
Now we count the number of rural stations with at least 3 values. This is done by counting availability in each year and setting the count to NA where fewer than 3 stations report; the range is then determined. In this case a range of 1951-1984 is obtained, whereas dset2 is calculated for 1939 to 1988. I don’t know why this doesn’t reconcile.
count=ts(apply(!is.na(chron),1,sum),start=tsp(chron)[1])
count.adj=count;count.adj[count<3]=NA
#if less than 3 stations, not calculated
M0=range(time(count)[!is.na(count.adj)])
#range of available
Then I calculated the fraction of this range in which there are at least 3 stations (some cases go in and out). This is not an issue in this example. I haven’t implemented this test yet, as there are other conundrums at hand, but will at some point.
Y=ts.union(dset1,count.adj)
temp1=(time(Y)>=M0[1])&(time(Y)<=M0[2])
N3=sum(temp1&!is.na(Y[,1])&!is.na(Y[,2])) #years with dset1 in range with count>=3
N3/(M0[2]-M0[1]+1) # fraction of range
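To make the counting step concrete, here is a toy run on a made-up 5-year matrix of three rural series (values and gaps invented for illustration):

```r
# 5 years x 3 made-up rural series, with gaps
chron.toy = ts(cbind(c(1,2,NA,4,5), c(1,NA,NA,4,5), c(1,2,3,4,NA)), start=1951)
count.toy = ts(apply(!is.na(chron.toy), 1, sum), start=tsp(chron.toy)[1])  # 3 2 1 3 2
count.toy[count.toy < 3] = NA  # years with fewer than 3 stations dropped
range(time(count.toy)[!is.na(count.toy)])  # 1951 1954
```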
Then a “reference” series is calculated, adding in the rural stations in order of decreasing length (longest first) using the Hansen delta-adjustment that we employed in Step 1.
#set up arrays
N=nrow(chron); K=ncol(chron) #K = number of rural comparanda
W=t(array(rep(weights0,N),dim=c(length(weights0),N) ) )
W[is.na(chron)]=NA
#matrix of weights: at least 3 stations and station available
long0=apply(chron,2,long)
#long (from functions.txt) returns the number of available values in each candidate rural series
order1=order(long0,decreasing=TRUE);index=NULL
#long0[order1] # orders by decreasing length
delta=rep(NA,K);
fixed=NULL; available= 1:K;
#start with the longest series
j0=order1[1];
reference=chron[,j0] #picks longest series to insert
delta[j0]=hansen_delta(chron[,j0], dset1);
#calculates Hansen delta using Step 1 methods
reference=reference-delta[j0];
reference.seq=reference
weights1=rep(weights0[j0],length(reference));weights1[is.na(reference)]=NA
N=nrow(chron);fixed=j0
#sequentially adds in the rural series
for (k in 2:K) {
j0=order1[k] #chooses the series to be added
delta[j0]=hansen_delta(chron[,j0], reference);#calculates Hansen delta
weights2=W[,j0]
weights1=apply(as.matrix(W[,fixed]),1,sum,na.rm=T)
Y=cbind(chron[,j0]-delta[j0],reference); X=cbind(weights2,weights1)
for(j in 1:N) reference[j]= weighted.mean(Y[j,],X[j,] ,na.rm=T)
fixed=c(fixed,j0)
reference.seq=cbind(reference.seq,reference)
}
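The delta-adjustment itself lives in hansen_delta (functions.txt); here is a minimal sketch of what the loop is doing, with a stand-in delta that just takes the mean difference over the overlap (a simplification of the Step 1 bias method, using made-up numbers):

```r
# stand-in for hansen_delta: mean difference over the common overlap
delta_toy = function(x, ref) mean(x - ref, na.rm=TRUE)

base.toy  = c(10.0, 10.2, 10.1, 10.3, 10.4)  # made-up reference series
rural.toy = base.toy + 0.5                   # same shape, offset by 0.5 deg C
adj.toy   = rural.toy - delta_toy(rural.toy, base.toy)
# after subtracting the delta the series coincide, so the weighted.mean merge
# in the loop combines series that are already on a common level
max(abs(adj.toy - base.toy))  # 0
```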
The reference series is then used in the two-legged adjustments to follow. These operations are bundled in a function called emulate_dset2 which can be called:
id0="50793436001"
test=emulate_dset2(id0,method="read")
This returns the various dset versions and the reference series.
Two-Legged Adjustment
Now collect the dset versions and reference series with placeholders for the adjustments (keeping an index for dates that meet the purported test criteria of 3 comparanda).
Y=ts.union(test$dset1,test$dset2,test$reference,test$reference,test$reference)
dimnames(Y)[[2]]=c("dset1","dset2","reference","adjustment","adjusted")
Y[,4:5]=NA
temp1= (test$count>=3)&!is.na(Y[,1]);sum(temp1) #patch assumption
Now calculate the two-legged adjustment over the period in which adjusted values have been reported (this particular period selection has not been replicated in this example, so this is a restricted test.) The two-legged adjustment here is done from the difference between the dset1 version for Wellington and the comparandum series (in column 3), using the following implementation of the two-legged procedure as described in the underlying texts:
twoleg=function(x) {
  N=length(x)
  stat=rep(NA,N)
  temp=!is.na(x)
  index=(1:N)[temp]
  for (i in index[2:(length(index)-1)]) {
    z=data.frame(x, abs( (1:N)-i), (1:N)-i); names(z)=c("y","abs","x")
    fm=lm(y~abs+x,data=z)
    stat[i]=ssq(fm$residuals)
  }
  index0=(1:N)[(stat==min(stat,na.rm=TRUE))&!is.na(stat)][1] #knot minimizing SSQ
  z=data.frame(x, abs( (1:N)-index0), (1:N)-index0); names(z)=c("y","abs","x") #center on index0, not the last loop i
  twoleg=lm(y~abs+x,data=z)
  twoleg
}
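To check that this parameterization can represent a broken trend, here is a self-contained run on a noise-free synthetic series with a slope break at year 30 (ssq is assumed to be the usual sum of squares, restated here; the final regression is centered on the selected knot):

```r
ssq.toy = function(x) sum(x^2)  # sum-of-squares helper

twoleg.toy = function(x) {
  N = length(x)
  stat = rep(NA, N)
  index = (1:N)[!is.na(x)]
  for (i in index[2:(length(index)-1)]) {
    z = data.frame(y=x, abs=abs((1:N)-i), x=(1:N)-i)
    stat[i] = ssq.toy(lm(y ~ abs + x, data=z)$residuals)
  }
  index0 = which(stat == min(stat, na.rm=TRUE))[1]  # knot minimizing SSQ
  z = data.frame(y=x, abs=abs((1:N)-index0), x=(1:N)-index0)
  lm(y ~ abs + x, data=z)
}

# noise-free series: slope +0.02 up to year 30, slope -0.05 afterwards
yr = 1:60
y.toy = ifelse(yr <= 30, 0.02*(yr-30), -0.05*(yr-30))
fm.toy = twoleg.toy(y.toy)
max(abs(fm.toy$residuals))  # essentially zero: the break is recovered
```

The y ~ abs + x form can represent any broken line with a single knot: for a knot at k, the slope is (coefficient on x) minus (coefficient on abs) before k and their sum after k.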
The adjustment is inserted in the data set as is the adjusted series.
temp=(time(Y)>=M0[1])&(time(Y)<=M0[2]) #patch assumption
Y[!temp,3]=NA
adj0=twoleg(Y[,3]-Y[,1])
Y[temp&!is.na(Y[,1]),4]=fitted(adj0)
Y[,5]=Y[,1]+Y[,4]
The results are illustrated below. The first panel shows the dset1 version for Wellington and the rural reference series calculated above (green). The second panel shows the two-legged fit (green) to the difference between the two series (dashed), compared to the actual adjustment series (solid). Obviously not at all the same. The bottom panel compares the dset2 version to the emulated dset2 using the above adjustment series.

In this particular case, the adjustment is not at all close. Of course, there are other issues here that we’ve visited previously, like why NASA has been unable to locate data for Wellington NZ for nearly 20 years, but that’s a different story.
In choosing this site, I wanted to stay away from sites with dozens of comparanda, in the hope that clues would emerge from the versions. I got stuck last fall. What I’d like from anybody who’s been able to get GISTEMP to work through this step is to extract working files for Wellington NZ (50793436001), and we’ll see if we can decode the intermediate steps.
I’ve also collated various scripts and programs from GISTEMP Step 2, in the order that I think they are implemented, in one file here, but have been unable to get much of a foothold in understanding the actual implementation of the calculations.