I’ve got an idea of how Hansen sliced the salami in Praha-Libus, an enterprise no doubt dear to Luboš’ heart or, at least, stomach. There are only 2 series in play. Version 61111520000-0 goes from 1971 to 1991 and version 61111520000-1 goes from 1986 to 2007.
For all values where there is only the later version, the combined version is exactly equal to the later version 61111520000-1. During the period of overlap, there are 61 potential readings. In 59 of the 61 readings, both series are present and have identical values. So these series are in some sense merely scribal variations. In one of the 61 overlap readings, neither series is present. In the remaining reading, Version 61111520000-0 has a value of 1.4, while there is no reading for Version 61111520000-1.
When both versions are present, the combined value is equal to the common value.
But in the earlier portion of the record, the combined value is 0.1 deg C less than the value of Version 61111520000-0, slicing an erroneous 0.1 deg C from the early portion of the combined record – something that we’ve seen in many other series.
          Ver0   Ver1   Combined
Jun 1985  14.4   NA     14.3
Jul 1985  18.7   NA     18.6
Aug 1985  18.0   NA     17.9
…
Nov 1986   5.0    5.0    5.0
Dec 1986   1.4   NA      1.3
Jan 1987  -6.7   -6.7   -6.7
…
May 2006  NA     14.2   14.2
Jun 2006  NA     18.1   18.1
…
But how can one derive this using any sort of algorithm? or Al-Gore-ithm?
Here’s a method for replicating this strange result, melding some of the ideas developed in different contexts as we’ve pondered this.
First – applying an idea that Damek presented for Bagdarin. Obviously if you calculate the difference between the two series for the months in which they have common values, you don’t generate any difference. Damek hypothesized that Hansen calculates the start and the end of the period for which the two series have common values and then calculates their means over the total period. So in this case, the mean for version 0 is calculated over 60 months and for version 1 is calculated over 59 months. Since the missing value of version 1 is a winter month, this yields an “upward bias” for version 1 of 0.1316102. I took this value to one digit, 0.1, and then subtracted .01 from it (this derives the result, but there is probably some equivalent rendering), yielding a “warm bias” of 0.09 for version 1, the second series and the one continuing to the present. Here’s the code for this step:
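# months in which both versions report a value
temp=!is.na(X[,1])&!is.na(X[,2])
# first and last dates with common values
K= range(time(X)[temp])
temp1= (time(X)>=K[1])&(time(X)<=K[2]);sum(temp1)
# mean of each version over that whole date range, rounded to one digit
m1=round(apply(X[temp1,1:2],2,mean,na.rm=TRUE),1)
delta=m1[2]-m1[1] -.01; delta #0.09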
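A toy illustration of why averaging over the common date range, rather than over the months that both series actually report, can manufacture an offset. The numbers are made up (a hypothetical monthly climatology repeated for five years), not the Praha-Libus data, and the variable names are only for illustration:

```r
# Hypothetical monthly climatology (deg C), NOT the Praha-Libus data
clim <- c(-2, 0, 4, 9, 14, 17, 19, 18, 14, 9, 4, 0)
ver0 <- rep(clim, 5)   # 60 months, complete
ver1 <- ver0           # identical "scribal" copy ...
ver1[12] <- NA         # ... except that one December is missing

# Means over the whole common date range, skipping missing values
m0 <- mean(ver0, na.rm = TRUE)
m1 <- mean(ver1, na.rm = TRUE)
range_delta <- m1 - m0   # positive: ver1 looks "warmer" only because a cold month is absent

# Means restricted to months in which both series report
both <- !is.na(ver0) & !is.na(ver1)
paired_delta <- mean(ver1[both]) - mean(ver0[both])   # exactly 0

c(range_delta = range_delta, paired_delta = paired_delta)
```

In this sketch the two series never disagree, yet the range-based calculation reports an apparent offset of roughly 0.15 deg C, while the paired calculation reports none.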
In this case, the second series (version 1) continues to the present and therefore is not adjusted. Instead of deducting the warm bias from the “warm” series, Hansen deducts it from the other series. It’s pretty hard to deduce a rationale for this, other than mixing up signs, but there is no doubt that it’s deducted from the earlier version. Then do Hansen-rounding: multiply by 10, add 0.4999, take the floor and divide by 10.
Bingo – an exact match.
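# shift version 0 down by delta, average the available values, then Hansen-round
y= floor(10* apply(cbind(X[,1]-delta,X[,2]),1,mean,na.rm=TRUE)+.4999 )/10
# compare with the combined series (third column of combine)
range(y-combine[,3],na.rm=TRUE)
#[1] 0 0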
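To make the last step concrete, here is a small self-contained sketch that applies the same recipe to five rows of the table above. The helper names `hansen_round` and `combine_two` are made up for this illustration, and the 0.4999 constant and delta of 0.09 are simply the values used in the code above, not anything confirmed by GISS:

```r
# "Hansen-rounding" as used in the post's code: floor(10*x + 0.4999)/10
hansen_round <- function(x) floor(10 * x + 0.4999) / 10

# Subtract delta from the earlier version, average what is available, then round
combine_two <- function(v0, v1, delta) {
  hansen_round(rowMeans(cbind(v0 - delta, v1), na.rm = TRUE))
}

# Rows from the table: Jun 1985, Nov 1986, Dec 1986, Jan 1987, May 2006
v0 <- c(14.4, 5.0, 1.4, -6.7, NA)
v1 <- c(NA,   5.0, NA,  -6.7, 14.2)

res_009 <- combine_two(v0, v1, delta = 0.09)
res_009   # 14.3  5.0  1.3 -6.7 14.2, matching the Combined column

res_010 <- combine_two(v0, v1, delta = 0.10)
res_010   # 14.3  4.9  1.3 -6.8 14.2
```

With a delta of exactly 0.1, the months where both versions agree (Nov 1986, Jan 1987) land on a half-tenth and this rounding pushes them down by 0.1, which may be why the extra .01 was needed to reproduce the combined series.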
In this particular case, there is quite obviously no bias in one version relative to the other. There is one missing value in one series. However, as a result of one missing value, Hansen added 0.1 deg C to the warming at Praha – Libus relative to a rational combining of records.
Is this what Hansen actually did? It will take some more experiments to find out. However, I’m convinced that this is getting the salient features on the table.


41 Comments
It is beginning to look like an adjustment by dartboard to choose the offset, or they may in fact know something about the site micro bias and aren’t sharing.
Steve – very cool. Your crossword puzzle analogy in a recent interview was right on. It’s quite a rush as you get closer and closer to the solution.
Re 1
Is there metadata on station history (moves/changes) for the international sites? In the US data they imply that they adjust for known changes before guessing at non-identified changes.
Sticking to HL87’s original description:
The term “period in common” is apparently vague. I interpreted this to mean “months in which both stations had valid records”. But what you seem to be finding is that it means “all months in which both stations kept records”.
HL87 also says:
I think what we are finding is that the stations are actually ordered from latest to earliest. How ties are broken is not yet known.
Steve:
In HL87 in the last paragraph in Section 3, they write: “However, we include the subbox temperature change on the data tape which we make available, and that we anticipate it will useful for many purposes.” I wonder if such a data tape was actually in circulation.
all months in which both stations kept records.
Verbal descriptions like this are not as helpful as the algorithm, which I showed.
Calculate the date range of common records i.e. the first month in which both are represented and the last month in which both are represented. Then calculate the mean of each during that interval for all available values – disregarding that bias may be introduced by a missing winter month or missing summer month. It’s completely nuts.
Do you think that Hansen is using the same process for subboxes and boxes as well as compiling fragmented records? In addition, what does it mean that a single site has multiple records that overlap? I assume these count as one station not multiple stations?
#5. I think that they start from the station with the most records, and that’s OK. In theory it wouldn’t matter whether you added the delta to one series or subtracted it from the other. So I think that the deltas are calculated in count-order, but when the latest series enters into the calculation, it’s kept at zero and the delta goes against the other series.
Now we get into the question of whether Hansen has done his book-keeping correctly. I’ll bet that when the adjustment to the last series is done, Hansen subtracts the delta from the earlier series, regardless of which series it should be subtracted from. In this case, is there any rational basis for subtracting the delta from version 0? Of course not.
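A small sketch of the bookkeeping point, using hypothetical numbers for a case where the later series really is warmer by delta. Subtracting delta from the warm series and adding it to the cool series are equivalent up to a constant, whereas subtracting it from the earlier (cool) series, which is what the replication above implies, doubles the gap instead of closing it:

```r
# Hypothetical values (deg C); delta is the apparent "warm bias" of the later series
delta   <- 0.09
earlier <- c(10.0, 12.0, 14.0)
later   <- earlier + delta

# Two equivalent ways of removing the offset
gap_a <- mean(later - delta) - mean(earlier)   # subtract from the warm series
gap_b <- mean(later) - mean(earlier + delta)   # add to the cool series

# The sign convention implied by the replication: subtract from the earlier series
gap_c <- mean(later) - mean(earlier - delta)

c(gap_a = gap_a, gap_b = gap_b, gap_c = gap_c)   # approximately 0, 0 and 0.18
```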
#8. Huh? Bernie, this multiple record issue is exactly what this thread is about. Don’t worry about sub-boxes for now. That’s a different story.
#7 Steve…I was using a verbal description to tie back to HL87. I think the algorithmic description is always the clearest, but Hansen does not give us code or pseudo-code.
Well, we’re nowhere near being out of the woods on this yet. John Goetz sent me a list of some sites with 2 series. A couple had no overlap and shed no light. Joenssu is very similar to Praha-Libus in that it has one missing value, but the method above doesn’t work. This time a summer month is missing. In this case, the result is biased up by 0.1 deg C in the early portion, while according to the above patch proposal, it would be 0.3 deg C. I’m not sure which is worse: Hansen’s disclosure or his methods. What a beauty contest.
Brno is the same as Praha. One missing winter value in Dec 1986 in the overlap period and all early values are downshifted 0.1 deg C.
Steve:
I realize that we are talking about reconciling multiple records at a single site – I guess I thought we were looking at HL87 as a possible source of clues as to what he was also doing when he created individual site records. My earlier comment (#5) was about the existence of an actual freely distributed data tape back around 1987.
Steve,
This sounds really close, but just like Bagdarin, the sign of the correction seems reversed from what it “should” be. Damek, John and I traded some comments over this and were puzzled in that thread.
It would be very interesting to find a dataset pair where the opposite relationship occurs, (i.e. the later set is the cooler set) and see if the sign is still wrong (the earlier set is adjusted up). Something tells me the earlier set always gets adjusted down regardless of sign. Hope I’m wrong, but I have a bad feeling about this.
#13 Bernie…I think where we are now is a pretty narrow focus on how Hansen combines station records at the same location. The current investigations are taking place on stations with exactly two records – presumably the simplest to combine. What Steve, myself, and others are finding is that there is no simple algorithm that allows us to combine two station records in the way Hansen does. Read section 3 of HL87 and you will see a pretty straightforward description, but darned if we can duplicate it.
The problem with Hansen’s data tape is that it contains data (which we have lots of). We need his program, or at least some pseudo-code.
#14 Jeff C…take a look at Gassim. I never want to be accused of cherry-picking, but I will say it was hard to find.
Maybe some progress – anyone. The following bizarre AlGore-ithm yielded Hansen-combinations for both Praha and Joenssu. As before, calculate the averages over the period from the first month with common values to the last month with common values. Subtract the difference of means (which in this case merely reflects whether a summer or winter month is missing). Instead of rounding, which we’ve not seen: calculate the delta according to the formula below.
Yielded both Praha and Joenssu – although it’s late and I’ll have to check this tomorrow.
Does any part of this make sense? The only part that is making sense is why Hansen is refusing to release the source code – and it has nothing to do with being messy.
#14. Joenssu is the reverse. However, it appears to me that “Hansen-rounding” has the effect of attenuating the early-warming in this case. This is related to the impression that the error has an overall bias in favor of later warming although the opposite can occur.
#17. Didn’t work on Gassim
It seems to me that the objective of the random adjustments is to create “usable” data from questionable data. The use of multiple Al-Gore-ithms to create this data is highly questionable. If the data was subjected to a single process I would find it less objectionable. My conclusion is that the “adjusted data” does not have the sensitivity necessary for the job it has been assigned.
Playing around with Gassim it does look like the later set (set 1) was cooler than the earlier set (set 0) by 0.12 deg during the overlap period of 1987 through 1990.917. Interestingly, it does look like the earlier set was adjusted up by 0.2 dB to get the combined data. This does seem to indicate that the sign of the correction is the opposite of what would be expected. I didn’t yet try playing around with Steve’s algorithm, just looking for correction trends at this point.
For some reason the overlap period (1987.000 through 1990.917) for Gassim is exactly the same as for sets 1 and 2 from Bagdarin. Now why would stations in Russia and Saudi Arabia have the same overlap period between datasets? A side issue I know, just something I noticed.
Oops, 0.2 dB in #21 should be 0.2 degrees. My RF Comm engineer habits die hard.
Regarding the overlap period between Gassim and Bagdarin I meant the dates of the overlap were the same, not the data. That seemed a bit unclear when rereading the comment.
Over at RealClimate, Gavin said apropos the algorithm: “Its a two-piece linear correction, not rocket science”. And yet it’s apparently riddled with bugs and rounding errors. I don’t even want to know how actual rocket science turns out at NASA; no wonder their shuttles keep exploding.
Wow, thanks for this detailed care to Prague, Steve! Your ability to decode not only the right data but also the algorithms that could have been used to introduce errors is amazing. 😉 Still, there could be other, less embarrassing “microscopic” mechanisms that would lead to the same outcome although I am not able to say anything specific. Good luck to further adventures, Lubos
If you’re interested in some cultural background about the place, see
http://motls.blogspot.com/2007/09/steve-mcintyre-and-praha-libus.html
Some local info about “Joenssu” might be in order here as well. It has to refer to Joensuu, capital of the Northern Karelia province in Finland. This is an airport station at 62.7 N, 29.6 E, the “easternmost airport in the European Union” (Wikipedia). That means a continental, Russian-type climate in contrast to more maritime, less extreme climes in western and southern Finland. http://fi.wikipedia.org/wiki/Joensuun_lentoasema (sorry, only in Finnish so far.)
A funny thing about the Joensuu record in the GISS database is that temperature values from Jan. 1971 to July 1987 are missing, with the exception of 1981, which is complete. Now, looking at the Finnish meteorological institute’s website, one learns that the weather station is automatic and has been in operation since March 1955. The GISS data begins in 1961.
There is not much metadata on the internet about the station history. A picture of the airport terminal is here: http://www.airpro.fi/files/airpro/toimipisteet/joensuu.gif – the temperature sensor is possibly located in this asphalted area or close to it, so a local warming bias can not be ruled out. Somebody should pay a visit and talk to the very nice people in Joensuu; any adjustments done to their data in a Manhattan office will suffer from a lack of credibility.
Jeff C. September 1st, 2007 at 11:39 pm,
I’d say temperature anomalies are affecting your BER 🙂
What’s going on in the Central Med?
I looked at the Malta temperature record (Luqa site) because Luqa has data from 1881 to 2007.
There are 4 source series: 630165970000-4.
Strangely, the adjusted one starts in 1951 and is colder by 0.2°C from 1951 to 1954 than the combined one. Since 1955 the adjustment is reduced to -0.1° and after 1961 the two series are identical.
So Hansen’s temperature for the Mediterranean is “adjusted” upwards.
Boston has only 3 raw datasets. Might be ripe for some sharp fella to investigate.
Raw0 1881-2006 is 100% complete.
Raw1 1951-1990 appears scribal from raw0 for most of the period. Has 11 months missing and one full year.
Raw2 1984-1993 is 100% complete. Raw2 has numerous differences.
Some differences I noticed between Raw0 and Raw1. Not comprehensive.
Date      Raw0   Raw1   Difference  Comment
Sep 1953  19.1   21.3    2.2        ??
Jan 1956  -0.8   -0.5    0.3        eyesight?
Dec 1959   2.4    2     -0.4        ??
Jan 1962  -1.8    1.8    3.6        Sign error of 3.6 ouch!
Mar 1971   2.6    2.8    0.2        eyesight?
Dec 1972   0.6    3.3    2.7        no data next months. Short Dec?
May 1975  16.4   18.6    2.2        ??
Sep 1975  17.7   15.5   -2.2        ??
Jun 1978  20.2   22.4    2.2        No data in May. Short Jun?
Aug 1978  22     21.7   -0.3        No data in July. Short Aug?
Jun 1980  19.1   18.7   -0.4        eyesight??
Jul 1980  22.3   22.8    0.5        eyesight??
Apr 1978   7.2    9.2    2          eyesight?? no data in May
Mar 1982   3.7    3.2   -0.5        eyesight??
Jun 1987  18.4   19.9    1.5        no data Jul. Short Jun?
Below are a couple years compared. The Dec 1972 raw1 value of 3.3 looks to be erroneously derived from a very short month. The following months are missing. It’s likely Dec records were cut short. Its use in calculating the combined data for 1973 has added a tenth to the combined ANN for that year.
A study of the values adjacent to missing data may show a skew in temperature. I’ll bet there are a lot of short record months. Wonder what their minimum requirement for days per month is. People seem to have more liberal time off over the holidays than when I was young. Skeleton staff may not keep up.
        1972    1972    1972       1973    1973    1973
        Raw1    Raw0    Combined   Raw1    Raw0    Combined
JAN      0.6     0.6     0.6       999.9   -0.3    -0.3
FEB     -1.3    -1.3    -1.3       999.9   -1.1    -1.1
MAR      2.4     2.4     2.4       999.9    6.3     6.3
APR      7.2     7.2     7.2       999.9    9.9     9.9
MAY     14.2    14.2    14.2        13.9   13.9    13.9
JUN     18.6    18.6    18.6        21.1   21.1    21.1
JUL     23.2    23.2    23.2        23.5   23.5    23.5
AUG     21.9    21.9    21.9        23.8   23.8    23.8
SEP     18.7    18.7    18.7        18     18      18
OCT     11      11      11          13.1   13.1    13.1
NOV      5.7     5.7     5.7         7.7    7.7     7.7
DEC      3.3     0.6     1.9         4.2    4.2     4.2
D-J-F    0.6     0.6     0.6       999.9   -0.3     0.2
M-A-M    7.9     7.9     7.9       999.9   10      10
J-J-A   21.2    21.2    21.2        22.8   22.8    22.8
S-O-N   11.8    11.8    11.8        12.9   12.9    12.9
ANN     10.38   10.38   10.38      999.9   11.38   11.48
Could it be possible they are only displaying one decimal for the months yet they are doing full floating point calculations behind the scenes?
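One way to probe that question with the 1973 numbers in the table above. This sketch assumes (it is not confirmed anywhere in the thread) that ANN is the plain mean of the four seasonal means, with D-J-F built from the previous December plus January and February; the helper name `ann` is made up for the illustration:

```r
# Monthly values copied from the 1972/1973 table above (deg C)
jan_feb <- c(-0.3, -1.1)          # Jan, Feb 1973
mam     <- c(6.3, 9.9, 13.9)      # Mar-May 1973
jja     <- c(21.1, 23.5, 23.8)    # Jun-Aug 1973
son     <- c(18.0, 13.1, 7.7)     # Sep-Nov 1973

# ASSUMPTION: ANN = mean of four unrounded seasonal means, D-J-F using the prior December
ann <- function(dec_prev) {
  mean(c(mean(c(dec_prev, jan_feb)), mean(mam), mean(jja), mean(son)))
}

ann_raw0     <- ann(0.6)   # about 11.375 (table shows 11.38 for Raw0)
ann_combined <- ann(1.9)   # about 11.483 (table shows 11.48 for Combined)
ann_combined - ann_raw0    # about 0.108, the extra tenth traced to Dec 1972

# Using the one-decimal seasonal values printed in the table instead
ann_from_rounded <- mean(c(-0.3, 10.0, 22.8, 12.9))   # 11.35, which does not match 11.38
```

Under these assumptions the printed ANN values are reproduced only when the seasonal means are carried at full precision, which is consistent with the suggestion that more digits are kept behind the scenes than are displayed.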
Made a couple errors in my noticed differences near the end. Jun 1987 should be no data May not Jul. Apr 1978 should be Apr 1987. I suppose there could be more. Whatever. Tired. Gonna take a nap.
Re 29 Raw 0 vs Raw 1: a difference of 2.2 shows up 4 times in 15 records, twice in Sept, albeit with opposite signs. Seems very strange. Does it provide a clue, or is it just a bizarre coincidence? Murray
Murray Duffin,
About those multiple 2.2 degree differences.
The June 1978 I think is likely due to not having observations for first days of June, since May of that year is missing. I don’t know for sure. I suspect they may have been out of operation straddling May through part of June. That would leave them only the warmer part of June to derive their monthly temperature.
On the other three I’m clueless. Could be coincidental or maybe it has something to do with the distribution of days of observation. They do seem to be somewhat symmetrical around the warmest time of the year.
Wish I knew how few days they needed out of a month before they felt it acceptable to calculate a monthly temperature. Also what sort of symmetry would be required for the observation days during month. Bah. They probably don’t even consider that last one.
Sheesh, Hansen should be able to clear all this up in a minute. Has anyone contacted their congresspeople? Hansen’s boss? Barton?
I wonder why there are two series for Prague – Libus. The station area itself is here. It’s quite close to the Institute I have worked in since 1974, so I know how that part of the city has developed over these 30+ years. The station was finished in 1970.
The Czech page for Libus station says that official daily measurements (every 3 hrs) started in 1971; in 1973, they changed to hourly measurements. In 1991, an automatic Vaisala Milos 200 station was installed, taking hourly measurements. This station was replaced in 2003 by a Vaisala Milos 520. There’s definitely no reason for a change or for starting a new series in 1986.
Re #36 EW,
I suspect that there being two series is an artifact of how the data
were collected into the GHCN version 2 in the mid 1990s.
About 1000 locations in GHCN had series ending in 1990, and/or 1991, and
new series starting in 1987. My guess is that one, or two, of the
sources of GHCN data ended in 1990 and/or 1991, and that a different
source was used for more recent data for those stations, but with some
overlap to be on the safe side.
YMMV.
EW:
I get the soccer field and tennis courts, bottom right, but what is with what appears to be the baseball diamonds, center top? Or are they shot-put, discus, hammer and javelin throwing fields? This site certainly looks reasonable, assuming the station is away from the buildings.
#38
I don’t know – something new.
If you look at Lubos’ link to the presentation in Czech about Libus station then and now, you’ll see that in the 70’s it was practically in the fields; then concrete housing (e.g. to the right of the station) and more villas were built (slide 33).
EW:
Fascinating history – it is a shame I cannot read Czech since it looks like some of Lubos’ charts could shed light on the Praha Libus data set. Do you know in which direction the picture was taken? If it is looking West away from the river, Praha Libus looks like a possible contender for a UHI trend. If it is looking East, I assume that there would be less of a trend. Perhaps Lubos or any other person familiar with Prague can also comment on the other Prague station, Praha/Ruzyne. I gave up on analyzing the different versions of the Praha Libus data since there are not enough variations to provide clues as to a possible algorithm. Praha/Ruzyne has a much longer record, 4 sets of data and an intriguing pattern of adjustments. It is out by the airport, about 17 km from Praha Libus. With the two stations being so close, I was hoping to be able to check to see whether and how Praha Ruzyne was being used to adjust Praha Libus.
I found the place at Google Earth although with worse resolution. Just paste exactly 50° 0’28.88″N 14°26’49.47″E and you’ll get marked the position of a green trash container visible at the slide No. 33. The picture was apparently taken from south.
About UHI – Prague gets prevalent winds from Atlantic – from west. Therefore the western parts got cleaner and colder air. Libus is at the southern edge and there’s not much of industry or dwellings west of it. I always notice, that this part of the city is colder than the center or the eastern part, where I live (my part of the city is somewhat related to London’s East End – started as a typical neighborhood for working people in 19th century).
EW:
See my #91 here
Mike,
NASA is a very large organization, and your characterization painting everybody there as incompetent is not only grossly unfair, but tiring as well. Spaceflight is an inherently very dangerous and difficult undertaking. That there have been losses of spacecraft and people is tragic, but not surprising. As a counter, take a look at the two Mars rovers. Not only did they get them both there undamaged, but they are still going today – which is well beyond the expected lifetimes for these systems.
I have no problem with people being critical of Hansen, he deserves it, IMHO. If he were in charge of “rocket science” at NASA none of the missions would even get off the ground.