NASA Step 2: Another Iteration

Here are some more notes and scripts in which I’ve made considerable progress on GISS Step 2. As noted on many occasions, the code is a demented mess – you’d never know that NASA actually has software policies (e.g. here or here . I guess that Hansen and associates regard themselves as being above the law. At this point, I haven’t even begum to approach analysis of whether the code accomplishes its underlying objective. There are innumerable decoding issues – John Goetz, an experienced programmer, compared it to descending into the hell described in a Stephen King novel. I compared it to the meaningless toy in the PPM children’s song – it goes zip when it moves, bop when it stops and whirr when it’s standing still. The endless machinations with binary files may have been necessary with Commodore 64s, but are totally pointless in 2008.

Because of the hapless programming, it takes a long time and considerable patience to figure out what happens when you press any particular button. The frustrating thing is that none of the operations are particularly complicated. Relevant scripts are or will be online here.

I’ve excerpted major chunks below, but you’re better off to cut and paste from an ASCII version as WordPress does odd things to quotation signs. I’ve placed tools online so anyone that’s interested can experiment with arbitrary stations. In a companion post, I’ll illustrate some methods of analyzing individual stations.

Tidying Dset1
One problem for organized programming is that the output of dset1 is a mess. There are 7364 stations, but some of these stations have more than one version so that there are 7716 different versions in dset1. The versioning is different than dset0, since these are not scribal variations but time variations. Programming is complicated by the fact that some (100) versions don’t have any data in them. I’m not sure why these versions are even created, but that’s merely one of the pointless land mines for users that Hansen has created. I haven’t figured out why different dset1 versions are created in some cases, while in others, there are long NA periods between data. A problem for another day.

For dset2 calculations, only versions with 20 years of data were considered. In the dset2 calculations, annual anomaly versions of “Hansen-rural” dset1 stations are used, with the same station often contributing to multiple urban adjustments. In order to do this coherently, I created a list in which I identified the dset1 versions (together with the station and relevant station information) with at least 20 years of data. From the 7716 versions, 100 versions were excluded as having no data whatever (!) and another 1473 versions for having less than 20 years of data, leaving 6143 versions. An information file and a file consisting of 6143 time series of annual anomalies were created. I’ve uploaded the collation script as collation.giss.anom.txt, but interested users would probably be better off downloading my collations.

This was one of those exercises that doesn’t look like much in retrospect, but was done only after battering my head against the wall. The actual script doesn’t take all that long to run even on a home computer.

Wrapper Function
I’ve organized the following functions in a wrapper function emulate_dset2(id0), which will return the emulated dset2 series and intermediate diagnostics for a given station id. The steps in the wrapper function are described below.

Making a Rural Network for Comparison
The first step is collecting the Hansen-rural stations for comparison. For present purposes, we’re only discussing ROW stations, where Hansen-rural is triggered by a GHCN classification as R-rural, a classification readily seen to be extremely flawed (See past CA posts). But hey, why would NASA care? It’s been peer-reviewed. Hansen has a two-stage inspection in which he first searches for stations within 500 km and if he has “enough”, uses those, otherwise he proceeds to 1000 km. I’ve implemented the search using a function ruralstations (id0) , which uses a function circledist. I’ve used a slightly different strategy in which I’ve first created a short list of stations using a type of “Manhattan” metric set a little larger than 1000 km and then calculated great circle distances for these stations, taking the 500 km subset if there are enough. This type of vector processing is more efficient than Hansen’s endless obsolete do loops. This function takes a negligible amount of time to run for a given station.

circledist =function(loc, lat,long,R=6372.795) {
if(N==0) circledist=NA else {
x= abs(long -loc[2])
if (N>1) delta= apply( cbind(x %%360,(360-x)%%360),1,min) *pi180 else delta= min (x %%360,(360-x)%%360) *pi180
loc=loc*pi180; lat=lat*pi180; long=long*pi180
theta= 2* asin( sqrt( sin( (lat- loc[1])/2 )^2 + cos(lat)*cos(loc[1])* (sin(delta/2))^2 ))

ruralstations=function(id0) {
if(is.numeric(id0)) i=id0 else i= (1:7364)[! (names(giss.dset1),id0))]
temp1= (abs (station_versions$lat-lat0) <10) # & !$start_raw)
#initial screening only by latitude: must be within 10 deg latitude to be within 1000 km
test_angle= 10/sin(abs(lat0)*pi180)
x= abs(station_versions$long -long0)
delta= apply( cbind(x %%360,(360-x)%%360),1,min)
temp2=(delta<test_angle )
#second screen: within 1000 km by longitude
#matrix of stations meeting first two screen tests
#calculate circle distance for all stations in first screen
if(sum(temp)==0) ruralstations= NA else {
ruralstations=ruralstations[temp,] #restrict to stations within 1000 km
temp=(ruralstations$dist=3) ruralstations=ruralstations[temp,]
ruralstations=ruralstations[order(ruralstations[,”dist”]),]#order by increasing distance
ruralstations$weights= 1- ruralstations$dist/1000}

Hansen’s actual selections can be decoded from the file PAPARS.GHCN.CL.1000.20.log. I wrote a function to extract information for individual stations from these logs as follows ( a function that I will add to.)

K=min(grep(“possible range”,test))
test2= paparslist[c(1,grep(substr(id0,4,11),paparslist))]
test2[2]=gsub(“-“,” “,test2[2])

By calling log0=extract.log(id0), the first item log0[[1]] is the list of rural stations that Hansen selected.

I got exact matches for Wellington NZ and Batticaloa, both with small comparanda networks. In a larger test (Changsha), I extracted one more station than Hansen did, with Hansen apparently not picking up Zhijiang (20557745000) for reasons that are presently obscure. A project for another day.

Making the Reference Series
From the “rural” network, Hansen makes one reference series weighting each station in proportion to its distance from the target and folding in each station in order of decreasing length, adding/substracting an amount to the entering series to ensure that it has the same average as the prior entrants over the overlap period. I’ve done this in a function hansenref (rural) which uses the information from the rural network. This calculation relies on the prior tidying of the dset1 network. It returns the reference series, the rural anomaly network and the count of non-NA values by year, the latter two useful in diagnostics.

hansenref=function(rural) {
#collate rural anom
for (k in 1:K) anom.rural=ts.union(anom.rural,giss.anom[[ paste(rural$version[k]) ]])
#cross check to binary should be OK

W=t(array(rep(rural$weights,N),dim=c(length(rural$weights),N) ) )
#matrix of weights: at least 3 stations and station available
#calculates length of the candidate rural series
#long0[order1] # orders by decreasing length
fixed=NULL; available= 1:K;

#start with the longest series
if (max(long0)<20) reference=NA else {
reference=anom.rural[,j0] #picks longest series to insert

#sequentially adds in the rural series
for (k in 2:K) {
j0=order1[k] #chooses the series to be added
#if(sum(temp)>=20) {
bias= mean(Y[temp,1])-mean(Y[temp,2])
#in this case, there is a long separation: what happens then
Y[,1]=Y[,1]-bias #these have same centers
g=function(x) weighted.mean(x[1:2],x[3:4],na.rm=T)

Annual Adjustments

The next step is to calculate an annual adjustment series. For Wellington NZ, the adjustment exactly matches the actual dset2 minus dset1 difference. For Batticaloa, it is mostly exact, but is off by 0.1 deg in occasional step years. For Changsha, it doesn’t presently match. So I’m only partway there, but feel that I’ve got the main components.

The adjustment process is seriously weird. I haven’t gone back to the text yet to see how much of this is described in the text, but my guess is that some key steps will not have been described or will have been incompletely described.

My present surmise is that Hansen first tries the “two legged fit” on the continuous subset where there are at least 3 stations. The resulting slopes are then tested against various “flags” (which reject among other things too great a slope in one leg or too great a difference between legs). If the two-legged fit fails any of these tests, it goes back to one legged fit (Wellington is a case where it reverts back to a one-legged fit.) In fact, there are a LOT of one-legged fits in the entire inventory. Call this first range of years M0 (1951-1984 for Wellington).

Hansen also likes to zero things to the present (resulting in constant re-writing of history). It appears that the adjustment is zeroed on the last year of the M0 segment, by subtracting the last adjustment value in the range.

Hansen now arbitrarily expands this range by 50%, allocating the range extension first to any portion of the urban record more recent than the end of the M0 range (1989 for Wellington, with the balance extending the early portion, thus to 1939 in Wellington.) The M1 range is 1939-1989. The adjustment in the extended range is bewildering even for Hansen. He just uses the first and last M0 values in the respective range extensions, regardless of the actual reference series. When you plot up the Hansen adjustment against the difference between the reference series and the urban series (Which gives rise to the adjustment), the inconsistencies can be huge, to say the least.

Bulding the Monthly dset2 Series
The monthly dset2 series is in monthly average (not anomaly). To rebuild the “adjusted” dset2 version, Hansen spreads the annual adjustment into months, starting each year in the prior December. The monthly norms from the original anomaly calculation are recalled and used to re-scale the series. This step can be done pretty easily.

Wrapper Function Again
I’ve collected all the various intermediates as output from the wrapper function emulate_dset2. So for example, the command:


yields the following components:

W$annual: a time series of annual series, including the emulated adjustment and dset2 emulations
W$monthly: same for monthly series
W$adj: results from the two-legged adjustment including both two-legged W$adj$model and one-legged W$adj$fm0
W$rural: info on the rural comparanda
W$info: info on the urban station
W$reference: results from the reference calculation – W$reference$reference – the reference comparandum; W$reference$anom, the rural anomaly network;
W$M0, W$M1 – the 3-station and extended dates.

These can be used for diagnostics. I don’t think that there is any material extra time in collecting things, but, if necessary, once all the corners have been inspected, a pared down function could be used to return the dset2 emulation. At some point, this sort of trail can be used to keep track of exactly what weights are being assigned to the various stations.

emulate_dset2=function(id0) {

#collate urban target
if (is.numeric(id0) &(id0<8000)) i=id0 else i=(1:7364)[(names(giss.dset1)==id0)]
dset1.monthly=ts( c(t(giss.dset1[[i]][[1]][,2:13])) ,start=c(giss.dset1[[i]][[1]][1,1],1),freq=12)
dset1.annual= ts(giss.dset1[[i]][[1]][,18], start=giss.dset1[[i]][[1]][1,1])
dset1.count= ts( apply(![[i]][[1]][,14:17] ),1,sum,na.rm=TRUE),start=giss.dset1[[i]][[1]][1,1])
dset2.monthly=ts( c(t(giss.dset2[[i]][[1]][,2:13])) ,start=c(giss.dset2[[i]][[1]][1,1],1),freq=12)
dset2.annual= ts(giss.dset2[[i]][[1]][,18], start=giss.dset2[[i]][[1]][1,1])[i,]

#collate rural station comparanda

M0=range(time(Y)[temp0]) #range of available
#temp0=(time(Y)>=M0[1])&(time(Y)=3 in range with count>=3

M1=c(NA,max(c(time(Y))[temp4]) ) #1988

if ((adj$flag==0)|(adj$flag==100)) x=fitted(adj$model) else x= fitted(adj$fm0)
temp1A=(time(Y)>=M1[1])&![,”dset1.anom”])&(time(Y)<m0 [1])

##now rebuild monthly series
temp2=(time(Y)>=tsp(Z)[1])&(time(Y)< =tsp(Z)[2])
x= c(t(array(rep(Y[temp2,"adjustment"],12),dim=c(nrow(Y[temp2,]),12)) ))
X= t(array(Z[,"adjusted"],dim=c(12,nrow(Z)/12) ))
Z[,"emulation"]= c(t(scale(X,center= -dset1.norm,scale=FALSE) ))



Wellington NZ

Here are some diagnostic plots illustrating the use of this function for diagnostics. First we execute the wrapper function using the station id and the extraction function from unzipped Hansen logs.


First here is a simple plot comparing the actual delta (dset2 minus dset1) to the calculated adjustment (here for monthly results), a plot obtained very simply as follows:

plot.ts(W$monthly[,”delta”]-W$monthly[,”adjustment”],ylim=c(-.4,.4),main=paste(W$info$name,” dset2 (monthly delta)”))

This yields the following graphic, which, in this case, shows an exact match. So something has been rescued from the living hell of Hansen’s code, though other stations don’t work yet.

Here is a more detailed diagnostic plot, showing a variety or relevant adjustment information. First, I show here (black solid) the actual (rural reference minus the dset1 target) that gives rise to the ultimate adjustment (black dashed). Second, in green, I show the two-legged and one-legged fits over the M0 period (here these come from W$adj$model$fitted.values and W$adj$fm0$fitted.values), both being readily accessible from the list of outputs. In this case, these can be matched exactly to values extracted from Hansen’s PAPARS.list file, though this requires a little interpretation and head-scratching. Interpretations are plotted here in cyan and can be seen to match my fits in this case. All the values in Hansen’s log were divided by 10 to yield deg C, which is plotted here and is the unit of interest. Second, in this case, Hansen shows two positive slopes, but the results here require that the 2nd slope be negative. I haven’t checked how this affects flag tests. In red, I’ve shown first the impact of rounding to even 0.1 deg C and then the series displaced so that the last M) value of the adjustment is 0, together with extensions to the M1 range. This yields the exact match to the observed dset2 minus dset1 adjustment.


Here’s the code to generate the graphic shown here:


temp=(x =log0[[2]]$knee);sum(temp)
points( x[temp], 0.1*(a$Yknee – a$sl2*(x[temp]-xmid) ) ,col=5,pch=19,cex=.1)
#note that this is minus sl2 and plus sl1
points( x, 0.1*(a$Ymid + a$slope*(x-xmid) ) ,col=5,pch=19,cex=.1)
title(paste(W$info$name,” Adjustments”))


  1. George M
    Posted Jun 22, 2008 at 9:00 PM | Permalink

    One has to wonder who can actually be credited with or blamed for this code. Undoubtedly the good Dr. Hansen had a principal role in it initially, but what do you want to bet that hordes of graduate and undergraduate students have concocted most of the pieces of the current program(s)? And are also likely the ones who have to try to wring useful outputs from it on command. I have seen this kind of kludge before which resulted from a continuous turnover of programmers of varying experience and interest in their product.

  2. anna v
    Posted Jun 22, 2008 at 9:22 PM | Permalink

    I want to thank you for doing this for us. Though I started with binary codes back in 1967 or so, my competence ended with FORTRAN programming when I retired.

    I have waded through large volumes of programming, but the people writing the subroutines knew somebody else would come along down the line and would be checking. I suppose people working with this program did not expect anybody to go after the details.

    Good job.

  3. Steve McIntyre
    Posted Jun 22, 2008 at 9:25 PM | Permalink

    #1. HAnsen is the senior author on the GISTEMP publications. In addition, as head of GISS, he is repsonsible for ensuring that NASA policies are complied with, including their software policies. Unfortunately, Hansen, Schmidt et al don’t seem to feel that NASA’s policies apply to them.

  4. Barclay E. MacDonald
    Posted Jun 22, 2008 at 9:29 PM | Permalink

    Steve M.

    Under “Wrapper Function Again”, s/b intermediates as “output“.

    Your explanation of your findings is amazingly clear, given the complexity. Actually, you are pretty amazing. I wanted to tell you that before Hansen has you arrested:))

  5. Steve McIntyre
    Posted Jun 22, 2008 at 10:40 PM | Permalink

    I’ve added a section showing the benchmarking on Wellington NZ, in the process showing how the emulate_dset2 products can be used to produce diagnostic graphics.

  6. W Robichaud
    Posted Jun 22, 2008 at 10:46 PM | Permalink

    Hansen is going to???
    Put oil firm chiefs on trial, says leading climate change scientist

    Speech to US Congress will also criticise lobbyists
    ‘Revolutionary’ policies needed to tackle crisis

  7. Posted Jun 23, 2008 at 12:13 AM | Permalink

    That’s some interesting scientific methodology Hansen has there.


  8. fred
    Posted Jun 23, 2008 at 1:14 AM | Permalink

    Congratulations. And thanks. This is really excellent work. Am sure you will get to the bottom of the remaining issues soon.

  9. James Erlandson
    Posted Jun 23, 2008 at 3:34 AM | Permalink

    James Hansen will be on the Diane Rehm show today at 10:00 am eastern time.

    On the 20th anniversary of his groundbreaking global warming testimony before the U.S. Senate, James Hansen discusses his belief that the planet is dangerously close to tipping points that would be extremely difficult to reverse. The NASA climate scientist reflects on what has and hasn’t changed in two decades, political pressures, and the controversy he’s stirred by speaking out.

  10. Stan Palmer
    Posted Jun 23, 2008 at 4:37 AM | Permalink

    Hansen weights stations with the inverse of their distance. Shouldn’t this be by the inverse square of their distance? The current program would weight two stations at 200km as equivalent to 1 station at 100km but the former represent 4 times as much surface area. This seems to preferentially weight stations that are further from the urban location.

    Has their been any justification provided fro this inverse linear weighting?

  11. Steve McIntyre
    Posted Jun 23, 2008 at 6:01 AM | Permalink

    #10. I don’t think that this matters much given all the issues and is not a procedure that bothers me particularly. There’s no magic formula.

  12. Steve McIntyre
    Posted Jun 23, 2008 at 6:02 AM | Permalink

    I’ve made a slight change to the hansenref subroutine to fix a problem that didn’t affect Wellington but did affect Batticaloa, which now is reconciled as well.

  13. Michael Jankowski
    Posted Jun 23, 2008 at 6:20 AM | Permalink

    Re#11, it may not bother you, may not be relevant to this thread and what you’re attempting to do in general, and may only have a small impact on results. But while there may not be a “magic formula,” the weighting methodology should have to be justified somehow.

    I guess when all of the code has been decoded into R, maybe someone can do some sensitivity analysis regarding inverse linear vs inverse square, 500 and 1000 km cutoffs vs 500 only, or 250 and 750, etc, to see how much of an effect these seemingly arbitrary intermediate steps have on the end result.

  14. Steve McIntyre
    Posted Jun 23, 2008 at 6:48 AM | Permalink

    #13. With a proper methodology, a variety of studies can be carried out. This particular sensitivity is low on my list, but we’re getting towards the point where tools will be available. But I’ll be showing some things that will make you scratch your head much more this particular weighting system.

  15. Kieran
    Posted Jun 23, 2008 at 10:05 AM | Permalink

    First comment from multi-year lurker:

    Steve, would it be worth creating a Sourceforge/Savannah project for this work, so that version tracking/bug tracking
    is available, and third-party contributions can be made?

  16. Arthur Edelstein
    Posted Jun 23, 2008 at 10:23 AM | Permalink

    Per Steve’s request, I have worked out how to generate text files (instead of the usual binary output) from the STEP2 code of GISTEMP. All sources and output files (binary and text) can be downloaded here, either in a single zip file or in individual files.

  17. TheDude
    Posted Jun 23, 2008 at 10:31 AM | Permalink

    Speed it up, Hansen wants you to go to jail for this:P, naw, seriously

  18. Larry T
    Posted Jun 23, 2008 at 11:15 AM | Permalink

    I hop;e this isnt too off topic for this particular thread but I think that using longest overlap period etc as criteria for order of combining data and weighting makes little sense to me. I would think that promary weights should be to those stations whose temperature record was best ( least amount of missing data) and/or most consistent (lowest std dev maybe).

  19. Steve McIntyre
    Posted Jun 23, 2008 at 11:18 AM | Permalink

    #15. There are some things that would make sense to do. I am not prepared to spend the time to set them up. It makes a lot of sense to coordinate blog entries with a wiki style record and some efforts in this direction have been made. But they are not ones that I’ve been involved with and tend to have a little less drive and lose focus. Also the brand here is pretty strong and drives a lot of traffic, so that there’s little point not using the brand. I would dearly like to have a climateaudit wiki to provide a bit of a joint archive, but someone would have to do it and satisfy someone that I know (like John A or Pete or Nicholas or Anthony) that it’s going to work.

  20. John Goetz
    Posted Jun 23, 2008 at 8:36 PM | Permalink

    Folks, what Steve did here to tease the tiniest details out of the methodoolgy is quite amazing. I started clawing my eyeballs out over a week ago when I could no longer follow the notes I had made on the routines only a few hours earlier.

    I agree that some of the details – such as how much to weight a station 200km away, probably don’t matter much. It is seemingly more arbitrary decisions – such as zeroing the last year or extending the overlap 50%, that seem to have greater and more bizarre impacts. Furthermore, none of it seems documented in the peer-reviewed literature.

  21. Frank Upton
    Posted Jul 2, 2008 at 5:23 AM | Permalink

    I suppose the 100 or so versions with no data might be versions that did not, shall we say, give appropriate results and were abandoned.

%d bloggers like this: