Hansen's “Step 1”, which was the source of the August crossword puzzles, does not take a crossword-puzzle form in the USHCN stations that we've been discussing this week. The “Hansen bias” in combining scribal versions, noticed by John Goetz, does not occur in USHCN stations for two separate reasons: (1) in the USHCN stations that I've examined, there is only one dset=0 version, the dset=1 version is equal to the dset=0 version, and the “Hansen bias” issue noticed in the Russian stations doesn't come into play; (2) to be material, the “Hansen bias” requires a very short overlap between the historical data and the MCDW data. For the Russian stations, this overlap is typically only 4 years, so the December 1986 anomaly problem comes into play.
Jean S, as so often, was the first to figure out how the Hansen bias is implemented in multiple versions. He sent me his Matlab code about 2 weeks ago. Since then, Hansen has archived his Step 1 code. I've done it my own way in R, and my results match Jean S's up to an annoying rounding difference that should be resolvable.
Jean S's code (Matlab) is here.
My code (R) is here.
Here are notes on the process.
Hansen’s description is as follows:
Two records are combined as shown in Figure 2, if they have a period of overlap. The mean difference or bias between the two records during their period of overlap (dT) is used to adjust one record before the two are averaged, leading to identification of this way for combining records as the “bias” method (HL87) or, alternatively, as the “reference station” method [Peterson et al., 1998b].
The adjustment is useful even with records for nominally the same location, as indicated by the latitude and longitude, because they may differ in the height or surroundings of the thermometer, in their method of calculating daily mean temperature, or in other ways that influence monthly mean temperature. Although the two records to be combined are shown as being distinct in Figure 2, in the majority of cases the overlapping portions of the two records are identical, representing the same measurements that have made their way into more than one data set.
A third record for the same location, if it exists, is then combined with the mean of the first two records in the same way, with all records present for a given year contributing equally to the mean temperature for that year (HL87). This process is continued until all stations with overlap at a given location are employed. If there are additional stations without overlap, these are also combined, without adjustment, provided that the gap between records is no more than 10 years and the mean temperatures for the nearest five year periods of the two records differ by less than one standard deviation. Stations with larger gaps are treated as separate records.
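The “bias” method Hansen describes above can be sketched as follows. This is an illustrative Python sketch, not Hansen's code: the function name, the year→value dict format, and the sample data are all my own.

```python
def combine_bias(ref, new):
    """Combine two annual series (dicts: year -> temperature) by the
    'bias' method: offset the new record by its mean difference from
    the reference over the overlap, then average where both exist."""
    overlap = sorted(set(ref) & set(new))
    if not overlap:
        raise ValueError("no overlap; records must be handled separately")
    # dT: mean difference (bias) over the common years
    dT = sum(new[y] - ref[y] for y in overlap) / len(overlap)
    adjusted = {y: t - dT for y, t in new.items()}
    # average the reference and the adjusted record, year by year
    combined = {}
    for y in set(ref) | set(adjusted):
        vals = [s[y] for s in (ref, adjusted) if y in s]
        combined[y] = sum(vals) / len(vals)
    return combined

# toy example: the second record runs 1 degree warm over the overlap
ref = {1950: 10.0, 1951: 10.5, 1952: 10.2}
new = {1951: 11.5, 1952: 11.2, 1953: 11.0}
out = combine_bias(ref, new)
```

After adjustment the new record's non-overlap year (1953) is pulled down by the overlap bias, which is the whole point of the method.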
My own code uses the following functions:
hansen_bias
hansenq
recent
overlap_hansen
emulate_dset1
You can download it with a source instruction. Note that the function read.giss is handy for downloading GISS data in a usable form.
source("http://data.climateaudit.org/scripts/station/hansen/read.giss.txt")
source("http://data.climateaudit.org/scripts/station/hansen/step1.txt")
1. hansen_bias
This function calculates the “Hansen bias” between two time series by comparing annual averages, computed by “Hansen summing” over an overlap period. Two input formats are allowed: the default method="hansen" uses the table layout used in GISTEMP (year, 12 months, 4 quarters, annual); method="ts" compares two annual time series. This function has been tested against our crossword puzzle examples.
You can test this function as follows (watch for quotation marks out of WordPress, which you may have to replace):
id0="22340405000" # Gassim
station0=read.giss(id0)
hansen_bias(station0[[1]],station0[[2]])
#0.15
2. hansenq
This function will calculate quarterly and annual averages Hansen-style from monthly data with missing values. Monthly averages are calculated for all months, and also quarterly averages (“normals”); note that these are computed over the available dates rather than a fixed reference period, which will cause results to migrate over time. Quarters are DJF, MAM, JJA, SON, which causes problems when a scribal version starts in January, as we've observed.
If 2 or 3 months of a quarter are present, the quarterly anomaly is calculated as the average anomaly over the available months plus the quarterly normal. If 3 or 4 quarters are present, the annual anomaly is calculated the same way as the quarterly anomaly. The quarterly and annual values are then calculated by re-centering based on the quarterly and monthly averages calculated previously.
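The estimation rule for a single quarter can be sketched as follows. This is an illustrative Python sketch, not the GISTEMP code; `normals` stands in for the monthly means over the station's period of record, and the function name and data layout are my own.

```python
def quarter_value(months, normals, qnormal):
    """Hansen-style quarterly value from up to 3 monthly values with
    gaps (None): if at least 2 of the 3 months are present, average
    the available monthly anomalies and add back the quarterly normal."""
    present = [i for i, m in enumerate(months) if m is not None]
    if len(present) < 2:
        return None  # too few months: quarter is treated as missing
    anom = sum(months[i] - normals[i] for i in present) / len(present)
    return anom + qnormal

# example: January missing, Feb/Mar each 1.0 above their normals
val = quarter_value([None, 2.0, 4.0], [0.0, 1.0, 3.0], qnormal=4/3)
```

The annual value is computed the same way from 3 or 4 available quarters, which is where estimation error can compound.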
Here is a test of this function:
test=hansenq(station0[[1]])
round(test,2)- as.matrix(station0[[1]][,14:18])
In this case, all values are within 0.05 but don't match exactly. There's a loose end in the rounding somewhere that I need to attend to. I think that Jean S's rounding is exact.
3. recent
This is a little function that calculates the most recent value in a time series; it is used in ordering the versions.
4. overlap_hansen
This is a little function that calculates the length of overlap between two ANNUAL time series.
5. emulate_dset1
This is the function that emulates Hansen's calculation of dset=1 from dset=0. Its only required argument is id0, the 11-digit GISS id for a station; I find it best to use character strings. I've included a check and patch if you input numeric values, but haven't tested the patch. The GISS id is 11 digits as follows:
CCCWWWWWIII – 3-digit country; 5-digit WMO; 3-digit local
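The id layout (and a patch for numeric input of the kind mentioned above) can be sketched in a few lines. Illustrative Python, not the R code; the function name is my own.

```python
def parse_giss_id(id0):
    """Split an 11-digit GISS station id into (country, wmo, local).
    Keep ids as strings so leading zeros survive; zfill patches ids
    that arrive as numbers with a leading zero stripped."""
    s = str(id0).zfill(11)
    return s[:3], s[3:8], s[8:11]
```

For example, Gassim's id "22340405000" splits into country 223, WMO 40405, local 000.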
There is an option for automatic download of current values from GISS or you can set download0=FALSE and pick up values that you already have for station0 and station1 for testing purposes – as long as these are in Hansen 18-column format.
A variety of objects are returned for diagnostic purposes:
$station0 – input dset0 as monthly ts
$annual0 – input dset0 as annual ts
$station1 – input dset1 as monthly ts
$annual1 – input dset1 as annual ts
$adjusted – Hansen-adjusted monthly
$adjusted_annual – Hansen-adjusted annual
$reference – closing Hansen quarterly as ts
$delta – adjustments for each version
Here’s a quick example for Gassim used for the hansen_bias example above.
id0="22340405000" # Gassim
test=emulate_dset1(id0);
round(test$delta,2)
# 0.15 0.00
Here’s how to do another example.
id0="22230554000" # Bagdarin
test=emulate_dset1(id0);
round(test$delta,2)
# -0.34 -0.16 0.00
If you want to access the related objects e.g. if you want to plot the dset=1 annual, you can do so quickly.
plot.ts(test$annual1)
The first series selected is typically the MCDW series (See discussion http://www.climateaudit.org/?p=2034) using Hansen’s “get_best” routine. As others have observed, this is inconsistent with the published methodology and had to be decoded.
34 Comments
My guess is that your rounding difference occurs with negative values.
It's probable that your program/OS/machine rounds, for example, -1.25 to -1.3 instead of -1.2. A trivial and effective way to overcome this is to add a minuscule value to the number being rounded before actually rounding it. If you add 0.00000000001 before rounding, then you will get the same results as Jean S.
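The mismatch is plausibly round-half-to-even versus round-half-away-from-zero (R's round() documents half-to-even, while Matlab's round() rounds halves away from zero). The difference, and the epsilon trick, can be shown with a small Python sketch; the function names are my own.

```python
import math

def round_half_away(x, digits=0):
    """Round half away from zero, e.g. -1.25 -> -1.3 at one digit."""
    f = 10 ** digits
    return math.copysign(math.floor(abs(x) * f + 0.5), x) / f

# Python's built-in round() uses round-half-to-even ("banker's
# rounding"), so round(-1.25, 1) gives -1.2 while round_half_away
# gives -1.3.  Adding a tiny epsilon before rounding pushes exact
# halves upward, which is the fix suggested above:
def round_with_eps(x, digits=0, eps=1e-11):
    return round(x + eps, digits)
```

Note the epsilon makes ties round toward +infinity, so it reconciles the two conventions only at exact halves; away from halves it is harmless.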
Steve and Jean S. have done a fantastic job replicating the results.
There are a number of problems this exercise has identified. For one thing, Hansen does not follow the process he described in his two papers, HRGS99 and HL87.
HRGS99 gives the first description of what Hansen does, and is what Steve quotes first in this blog entry. This description refers us to a presumably more detailed description of the method in HL87:
Problem 1: From this I clearly read that Hansen orders the stations from longest record to shortest record. However, what we have found is that Hansen starts with the latest record (which is almost never the longest) and works backwards to the earliest record. The station combined with the latest record is the one that overlaps it the most. This usually means a station with overlap from 1987 – 1991 beats out an overlap of 1987 – 1990. A tie seems to be broken by the station with the longest record.
Sub-problem 1a: Because Hansen uses the latest record first, and because the latest record (presumably) continues to grow, this method of station combination will cause the combined record to (in Steve’s words) “migrate” over time.
I originally thought it clear how the length of overlap is determined. Looking at the raw data downloaded from GISS, and at Hansen's statement above, one would easily be led to believe that the length of overlap is a function of the number of months the records overlap. However, the team here determined that the number of years of overlap is the deciding factor. Jean S. provided further support for this notion by showing a reference to HL87 Figure 5, which mentions “common years”. This highlights two problems:
Problem 2: Using the coarser granularity of years versus months means that “extrema” (months of overlap before or after the annual overlap) are not used to determine station “bias”, thereby narrowing the window used in the bias calculation. This can give unfair weight to short periods of overlap.
Problem 3: Hansen often must estimate an annual temperature because one or more months of data are missing from one or both records. Missing months are usually not the same from one record to the next. For periods of short overlap, this can give unfair weight to Hansen’s method of estimating missing temperatures.
Sub-problem 1b: Hansen’s estimation method relies in large part on calculating quarterly and annual averages over the period of record. As mentioned above in Sub-problem 1a, the latest record presents a moving target because data is being added to the back end on a monthly basis. Because of this (and enhanced by the fact that the latest station is used first), the combined record is in a constant state of flux, even though all but the latest scribal record are static.
HL87 also says:
Problem 4: This implies that an overlap of less than 20 years does not qualify for combining records. As we have observed, records with four years of overlap are combined regularly. Thus, the implemented procedure does not match the documented procedure.
HRGS observes:
If this is the case, why not use the following process: When combining two stations, if the value for one station does not exist, use the other station’s value. For the period of month-to-month overlap, average the two station values.
References:
Hansen, J., and S. Lebedeff, 1987: Global trends of measured surface air temperature. J. Geophys. Res., 92, 13345-13372.
Hansen, J., R. Ruedy, J. Glascoe, and Mki. Sato, 1999: GISS analysis of surface temperature change. J. Geophys. Res., 104, 30997-31022, doi:10.1029/1999JD900835.
If you know that there is bias in the dataset because you can measure it where the dataset overlaps, then do you:
1. Throw out the part of the dataset that does not overlap (because you know it has a bias),
2. Use it without correction,
or
3. Try to adjust it using the bias measurement found in the overlap (assuming it is predictable).
I’m going to ask again.
If what we want is a slope, why not calculate the slope from internally consistent records and then combine the slopes?
No data homogenization required.
If climate is the interest why do records have to be combined?
John G, your wording here is not the best – as you know, we're not talking about different stations in most cases, but simply two different editions of the station history – what I've called scribal versions. I don't think we have any misunderstanding here, but your terminology is not the most apt.
This may be off topic but is related to the sphere of influence which this site is currently (and thankfully) achieving. A recent post by Cryosphere Today:
When the story broke I thought it would make every headline around the globe. It did not. Instead, the following correction was made days later:
Since this falls into the realm of climate I was hoping you or one of your auditors could investigate whether or not this was a glitch and if so, if all of the data previously provided by this entity is somehow corrupted. I find retractions (or suspicious modifications) of very newsworthy events such as the one posted above highly suspect. In any case (for what it is worth) your search for the truth is noble and your supporters span the globe. Keep up the good work!
Thanks a ton for this post Steve. Can I have your permission to cross post your R-code as the initial content for step 1 on my wiki? Note that it will then be creative commons licensed. I’ll probably make some changes to use pre-loaded datasets instead of going to the GISS website (which appears to be down)
Steve,
You are correct. I should have replaced station with scribal record.
#9. Sure. You should include some comments about the “Hansen bias” from prior discussions, though.
#4, #9 and #11
John:
So what does this say about combining station records as opposed to scribal versions?
Steve, re Hansen's first quote: creating new averages stepwise from successive sets introduces a bias.
Take the seven simple unordered numbers 5, 7, 3, 8, 2, 6, 4.
The sum is 35 and the arithmetic average is 5.
If you average the first two, 5 and 7, you get 6
Average 6 and 3 to get 4.5
Average 4.5 and 8 to get 6.25
Average 6.25 and 2 to get 4.125
Average 4.125 and 6 to get 5.0625
Average 5.0625 and 4 to get the final result of 4.53125, which is NOT 5.0000.
The method weights the result in favour of the last numbers.
If in the application of the method the last numbers averaged are from the stations with the shortest records, this could be because they are both recent and warmer. They would thus magnify the warming trend.
If you order the above test numbers from high to low you get a rather lower result (about 3 average) than from ordering from low to high (about 7 average).
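Geoff's arithmetic (with the final step worked through to 4.53125) can be checked directly, along with the weighted variant that Steve says Hansen's code actually uses. Illustrative Python; the function names are my own.

```python
def pairwise_running_average(xs):
    """Naive sequential pairwise averaging: the value added k steps
    from the end carries weight 1/2**k, so late values dominate."""
    acc = xs[0]
    for x in xs[1:]:
        acc = (acc + x) / 2
    return acc

def weighted_running_average(xs):
    """Weighting each step by the number of values already combined
    recovers the ordinary arithmetic mean."""
    acc, n = xs[0], 1
    for x in xs[1:]:
        acc = (n * acc + x) / (n + 1)
        n += 1
    return acc

xs = [5, 7, 3, 8, 2, 6, 4]
naive = pairwise_running_average(xs)       # 4.53125, not the mean
proper = weighted_running_average(xs)      # 5.0, the true mean
```

So the bias Geoff describes is real for naive pairwise averaging, but disappears once each iteration is weighted by the number of series combined so far.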
Am I reading this right?
Geoff.
One thing that rather confuses me. It’s obvious that there are a lot of different factors and problematic issues with trying to create the best temperature record possible, and get the most accurate results. Hansen’s got a big difficult job there. Why be so belligerent and difficult about it? Why did Schmidt spend so much time at RC making excuses for not releasing the code? Why wasn’t there more transparency to the process from the get-go? Why start changing all the data for no reason?
They confuse me.
#13. I don’t think that this particular error occurs. His code has a method for weighting each iteration by the number of series. I’ve emulated this by re-calculating the interim average using all included stations.
#13 Geoff
The bias method as described in HL87 (looking at the equations) actually does the division properly when combining successive records. This is done through the selection of the weights applied to the terms in the numerator. I originally thought the same thing, which is what prompted me to originally write Dr. Hansen and request a copy of his paper, which he did so promptly.
#12 Bernie
I haven’t even begun to boil that ocean. His method of combining stations is apparently the same as what is done for scribal records, with the exception that the weights given to station records decrease with distance from a grid center, whereas when combining scribal records all records are weighted equally. The method introduces bias, as does the method for estimating missing annual averages. I just don’t have an opinion right now as to whether or not what he does is mathematically appropriate.
ref 10 John Goetz, the distinction between station and scribal records is a bit blurred in the 1987 to present combinations. The 1987-ish to 1991-ish time frame is when new instrumentation on the same site was installed. The problem 1 and subs you mentioned is even more evident in the daily values.
In the USHCN, the transition from LIG to MMTS, ASOS, etc. instruments was made starting in the mid-'80s. While there are no separate scribal records in the GISS/GHCN data, there was an overlap period. Some problems with the new instrumentation's accuracy are noted in the GHCN documents. Several AMS papers indicate that the new instruments have primarily warm-bias errors. As of 2004, the AMS noted that errors from 0.3 C to over 1 C were still present at some sites. The GHCN station history database should show how combinations were made with the new instruments, and whether there were any adjustments to the LIG and digital histories other than TOB. Unfortunately, my Vista operating system is not playing well with the 130 MB station history file. Until I work that out, I can't be of much help.
If this has been mentioned elsewhere, forgive me.
#13: Geoff, although I think there is a (small?) ordering bias in Hansen's method, it is not coming from the averaging the way you describe. Hansen's method works in the following way (hopefully I can describe this in an understandable way; English is not my native language):
1) Initialize a list A of “adjusted” series to include the first series. (The selection of this first series was described earlier by Steve; it is not in any published work but was decoded from the published code.) Put the rest of the series into a list B of “available” series.
2) Calculate the “reference version” r as the average of all series in list A.
3) Find the series s in list B with the longest overlap (in annual averages) with the reference series r.
4) Calculate the Hansen bias between r and s; adjust s (remove the bias from it) and put it into list A. Remove s from list B.
5) Repeat from 2) until list B is empty (no more series to be combined).
6) Calculate the final combined series as the average of the series in list A.
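The six steps above can be sketched in code. This is an illustrative Python sketch under my own simplified data format (each record is a dict mapping year to annual value, and series[0] plays the role of the get_best-selected first record), not Hansen's actual code; it assumes every record overlaps the growing reference.

```python
def combine_records(series):
    """Iteratively combine annual records by the bias method:
    adjust each remaining record to the average of those already
    combined, picking the longest-overlap record at each step."""
    A = [dict(series[0])]               # step 1: adjusted series
    B = [dict(s) for s in series[1:]]   # still-available series

    def average(group):
        years = set().union(*group)
        return {y: sum(s[y] for s in group if y in s) /
                   sum(1 for s in group if y in s) for y in years}

    while B:
        ref = average(A)                                    # step 2
        s = max(B, key=lambda t: len(set(t) & set(ref)))    # step 3
        overlap = set(s) & set(ref)
        dT = sum(s[y] - ref[y] for y in overlap) / len(overlap)
        A.append({y: v - dT for y, v in s.items()})         # step 4
        B.remove(s)                                         # step 5
    return average(A)                                       # step 6

out = combine_records([{2000: 10.0, 2001: 11.0},
                       {2001: 12.0, 2002: 13.0}])
```

Note that because the first record anchors the reference, which record comes first matters, which is why get_best deserves the scrutiny it has received.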
So the most crucial thing in the order of the series is which series is selected first. This is done in the get_best routine described by Steve here.
#19 (cont'd): The algorithm is implemented in Hansen's code in a, hmmm, cumbersome way. Basically, it keeps track of the cumulative sum of the adjusted temperature values and the number of stations contributing to each value. Dividing those two, you get my “reference series”.
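That sum-and-count bookkeeping can be sketched as follows (illustrative Python; the name and year→value format are my own, not Hansen's arrays):

```python
def reference_from_sums(adjusted_series):
    """Build the reference series Hansen-style: accumulate a running
    sum of adjusted values and a count of contributing stations per
    year, then divide sum by count."""
    sums, counts = {}, {}
    for s in adjusted_series:
        for y, v in s.items():
            sums[y] = sums.get(y, 0.0) + v
            counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

r = reference_from_sums([{2000: 10.0}, {2000: 12.0, 2001: 11.0}])
```

It is the same year-by-year average as before, just maintained incrementally instead of recomputed.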
Just a silly question: where in Steve's script directory can I find the R code to scrape GISTEMP data? (I'll add pauses to the code so as not to run afoul of robots.txt, promise!)
I've never understood the averaging business. From my experience implementing DSP algorithms in hardware, there is a very big difference between averaging a time-varying signal and filtering it.
Re #18 (captdallas2, September 19th, 2007 at 8:23 am):
There have been several anecdotal instances of station custodians who claimed that data were taken from both the MMTS and LIG measuring equipment for up to two years after the MMTS was installed. The original data sheets or copies may be available in the NCDC archives somewhere.
#18. Again, as I stated in the post, this step is not material in USHCN stations – please re-read on this. So please don't bring USHCN issues into this post.
Has anyone generated a set of truth data to test with, other than rerunning sets of data that exhibit the various flaws these Step 1 algorithms are supposed to fix? That is:
There are some data sets that are much more continuous and reliable, without the flaws the Step 1 algorithms are supposed to fix. They could be artificially modified to exhibit the flaws; after running them through the Step 1 algorithms, are the flaws fixed, by comparison to the original flawless data set?
This would illuminate whether the Step 1 algorithms are creating any of the postulated problems/biases mentioned in various comments (e.g., #4). Distributing such test data sets alongside the Step 1 algorithms would be a valuable addition.
SteveMC.
I think a “data flow” diagram will help people understand the “maze o data”
A picture is worth 988.6784679 words
RE: #26 – in combination with a state machine diagram (to depict the specific logic including all its branches and erroneous race conditions and dead ends) that would be a very powerful analysis step.
RE: #27 – I'd add that (this is to all you students out there, especially ones trying to come up with Masters thesis or PhD dissertation topics) doing such an exercise would, in and of itself, constitute valuable original research and a welcome contribution to the general knowledge and experience base of Software Engineering and Quality studies. A wonderful case study, especially for those of you interested in root cause and corrective action analysis, forensics, etc.
re 27&28.
Exactly!
Re 15, 16, 19, 20 etc
Thank you for your patience re my #14. Occasionally, very occasionally, there is a Eureka moment but this was not one of them. You guys are far better at that and far closer to analysis of the developments.
I continue to have serious misgivings about (a) adjusting a data string based on an overlap (why up, why down, which one?) instead of simply averaging the overlap pairs; (b) infilling missing values by interpolations that make dubious assumptions about the constancy of weather over time; (c) inappropriate assigning of weights to weighted averages; (d) using a search range of 1000 km and a linear drop-off as well; (e) rejecting outliers instead of using them for valuable metadata interpretations; (f) as John G says, using a method that favours one dataset over another through order of selection when no apparent basis for preference exists.
Can I please ask another basic question? Why does one not take a simple arithmetic average of all available values at a point at a time, using the only value if missing data reduce you to just that one? (and using no value if one does not exist).
I wrote Steve a little while ago pointing out that we still have an error in the ordering used to combine scribal records. When I ran Steve’s code against all Russian stations, I noticed the bias calculated tended to be warm. This was counter to what I had observed when I randomly looked at GISS graphs for the Russian stations. I found it very easy to find stations where the early record was biased cool, but quite difficult to find ones in which the early record was biased warm. I don’t tend to be streaky with my luck, so this sounded some alarm bells.
I took a quick look at one of the stations with two records and a large warm bias applied to one of the records: CEKUNDA (GISS ID 22231532000). If you compare the two graphs here and here, you can see that the record is obviously biased downward (this is why Al Gore invented tabbed browsing). So something about our assumptions regarding the first record is incorrect.
Two other stations quickly jumped out and I think help point to an answer: BARGUZIN (GISS ID 22230636000) and BORZJA (GISS ID 22230965000). I see that the first record is at least tied for the latest record, if it is not clearly the latest record. If it is tied for the latest record, I noticed it is also at least tied for the longest record, as measured by the number of valid annual averages. Interestingly, Borzja’s second record is longer than the first, as measured in months, but equal as measured in years. The first record seems to have been selected as the starting point in the Step 1 process. Interesting…because it has more estimated values than the second record (a whole ‘nother topic).
At any rate, the ordering seems to be:
1) sort by record with the latest annual data
2) if tie, sort by record with the longest annual record
3) if tie, use the record with the smallest record index.
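That ordering can be expressed as a sort key. Illustrative Python, not the actual comb_records.py logic; the record format (index, year→value dict) is my own.

```python
def order_records(records):
    """Sort scribal records by the observed priority: latest annual
    value first, longest annual record next, then the original (GHCN)
    record index as the final tie-break."""
    return sorted(records,
                  key=lambda r: (-max(r[1]),   # 1) latest year, descending
                                 -len(r[1]),   # 2) record length, descending
                                 r[0]))        # 3) original index, ascending

records = [(1, {1950: 0.0, 1990: 0.1}),
           (0, {1950: 0.0, 1991: 0.2}),
           (2, {1940: 0.0, 1960: 0.0, 1990: 0.1})]
ordered = order_records(records)
```

In this toy example, index 0 wins on the latest year (1991), then index 2 beats index 1 on record length.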
Now, in Russia (and the rest of the world outside the US), the only way the first record index has a snowball’s chance in an AGW world is if no MCDW (post 1990) record exists for the station. Well, that seems to happen a lot in the rest of the world, and it happened with Borzja, Cekunda, and Barguzin. And a bunch of others.
What I find really interesting, and will investigate further, is that the first record for those three stations (which “ceased operation” c1990) is cooler than the other records. That means the other records for that station are biased downward to the first record. However, for many, many stations in Russia that continued reporting past 1990, the first record is warmer than most, if not all, later records that overlap it. Thus, the first record is biased downward toward the other records. I find it interesting that, no matter how we slice it, the early Russian record gets biased downward. I didn’t think they were that oppressed.
Here's Hansen's ranking subroutine from comb_records.py. Picking the right-hand series only works when there's an MCDW record (which is always on the right). John, in the cases that you've identified, looking at this script: there's no USHCN or MCDW record applicable, so the ranks are tied and the lengths are tied. Examining the details of the program, because the second record failed to be longer than the first record, the first record kept priority. The ordering of records here is taken over from GHCN. I inquired of GHCN whether they had any concordance of the sources of their versions; there is none online, but they said there may be some information offline somewhere and they are looking into it (even if they give an answer that I don't like, the NOAA folks do so pleasantly, as opposed to Hansen's crowd).
Anyway this confirms John G’s point and I need to modify my algorithm a little. As to why the ordering should seemingly be non-commutative in its impact on trends (if indeed that is the case), that should prove another worthy topic for investigation.
Re #32
Sorry to bother, but my reading of the subroutine is that if no record is ranked higher than ‘UNKNOWN’ then the first record with the longest length is used. If there is at least one record with a rank greater than ‘UNKNOWN’ then the first record with the highest rank is used regardless of length.
#33. Yes, we're on the same page.