37 Comments
This, I think, sums it up pretty well:
http://cbullitt.wordpress.com/2009/11/22/the-harry_read_me-file/
Anyone know where Dr. Tim Mitchell went that he couldn’t be consulted on the code??
Perhaps raptured?
http://www.e-n.org.uk/p-1129-Climate-change-and-the-Christian.htm
So do you think there is going to be any talk of this at the AGU meeting in December?
I’ve also been trying to raise awareness of HARRY_READ_ME.TXT with comments on various high-traffic sites. The reality is that they were just trying things and if it looked right they took it. No formal software validation, not even any formal software development processes, hand-tuned data files scattered all around. It also seems that they were trying to go back after the fact and were having difficulty re-creating some published graphs.
If you had this level of software quality in a medical device the FDA would close you down in a heartbeat.
Steve,
I’ve long admired your work.
I look forward to a very interesting and careful/detailed analysis of that Harry file from you.
I’m sure you’ll find no end of useful information in it. I certainly hope so.
The day that science does not involve others checking one’s work is the day it should no longer be called science.
Steve
I’ve been pawing through the file myself for a while.
As a software guy, I find it really disturbing.
I haven’t found any references to a specification. Maybe I’m missing them.
Basically, this software doesn’t qualify as “tested”. It just produces output that looks right:
It’s not complete yet but it already gives extremely helpful information – I was able to look at the first problem (Guatemala in Autumn 1995 has a massive spike) and find that a station in Mexico has a temperature of 78 degrees in November 1995!
There’s also indication that the data has been manually fiddled:
I am seriously close to giving up, again. The history of this is so complex that I can’t get far enough into it before by head hurts and I have to stop. Each parameter has a tortuous history of manual and semi-automated interventions that I simply cannot just go back to early versions and run the update prog. I could be throwing away all kinds of corrections – to lat/lons, to WMOs (yes!), and more.
Is this CRUTEMP he’s talking about?
People trust this?
Steve
I don’t know if this is something you already have but here is a good bibliography from Yamal from 1998
Original Filename: 907975032.txt | Return to the index page | Permalink | Later Emails
From: Rashit Hantemirov
To: Keith Briffa
Subject: Short report on progress in Yamal work
Date: Fri, 9 Oct 1998 19:17:12 +0500
Reply-to: Rashit Hantemirov
Dear Keith,
I apologize for delay with reply. Below is short information about
state of Yamal work.
Samples from 2,172 subfossil larches (appr. 95% of all samples),
spruces (5%) and birches (solitary finding) have been collected within
a region centered on about 67°30’N, 70°00’E at the southern part of
Yamal Peninsula. All of them have been measured.
Success has already been achieved in developing a continuous larch
ring-width chronology extending from the present back to 4999 BC. My
version of chronology (individual series indexed by corridor method)
attached (file “yamal.gnr”). I could guarantee today that last
4600-years interval (2600 BC – 1996 AD) of chronology is reliable.
Earlier data (5000 BC – 2600 BC) are needed to be examined more
properly.
Using this chronology 1074 subfossil trees have been dated. Temporal
distribution of trees is attached (file “number”). Unfortunately, I
can’t sign with confidence the belonging to certain species (larch or
spruce) of each tree at present.
Ring width data of 539 dated subfossil trees and 17 living larches are
attached (file “yamal.rwm”). Some samples measured on 2 or more radii.
First letter means species (l- larch, p- spruce, _ – uncertain), last
cipher – radius. These series are examined for missing rings. If you
need all the dated individual series I can send the rest of data, but
the others are don’t corrected as regards to missing rings.
Residuary 1098 subfossil trees don’t dated as yet. More than 200 of
them have less than 60 rings, dating of such samples often is not
confident. Great part undated wood remnants most likely older than
7000 years.
Some results (I think, the temperature reconstruction you will done
better than me):
Millennium-scale changes of interannual tree growth variability have
been discovered. There were periods of low (5xxx xxxx xxxxBC), middle
(2xxx xxxx xxxxBC) and high interannual variability (1700 BC – to the
present).
Exact dating of hundreds of subfossil trees gave a chance to clear up
the temporal distribution of trees abundance, age structure, frequency
of trees deaths and appearances during last seven millennia.
Assessment of polar tree line changes has been carried out by mapping
of dated subfossil trees.
According to reconsructions most favorable conditions for tree growth
have been marked during 5xxx xxxx xxxxBC. At that time position of tree
line was far northward of recent one.
[Unfortunately, region of our research don’t include the whole area
where trees grew during the Holocene. We can maintain that before 1700
BC tree line was northward of our research area. We have only 3 dated
remnants of trees from Yuribey River sampled by our colleagues (70 km
to the north from recent polar tree line) that grew during 4xxx xxxx xxxx
and 3xxx xxxx xxxxBC.]
This period is pointed out by low interannual variability of tree
growth and high trees abundance discontinued, however, by several
short xxx xxxx xxxxyears) unfavorable periods, most significant of them
dated about 4xxx xxxx xxxxBC. Since about 2800 BC gradual worsening of
tree growth condition has begun. Significant shift of the polar tree
line to the south have been fixed between 1700 and 1600 BC. At the
same time interannual tree growth variability increased appreciably.
During last 3600 years most of reconstructed indices have been varying
not so very significant. Tree line has been shifting within 3-5 km
near recent one. Low abundance of trees has been fixed during
1xxx xxxx xxxxBC and xxx xxxx xxxxBC. Relatively high number of trees has been
noted during xxx xxxx xxxxAD.
There are no evidences of moving polar timberline to the north during
last century.
Please, let me know if you need more data or detailed report.
Best regards,
Rashit Hantemirov
Lab. of Dendrochronology
Institute of Plant and Animal Ecology
8 Marta St., 202
Ekaterinburg, 620144, Russia
e-mail: rashit@xxxxxxxxx.xxx
Fax: +7 (34xxx xxxx xxxx; phone: +7 (34xxx xxxx xxxx
Attachment Converted: “c:\eudora\attach\yamal.rwm”
Attachment Converted: “c:\eudora\attach\Yamal.gnr”
Attachment Converted: “c:\eudora\attach\Number”
“There are no evidences of moving polar timberline to the north during
last century.”
Amazing!!!
Anybody who is trying to duplicate Harry’s work with the IDL code, first, you have my sympathy.
Second, if you don’t have access to IDL, it is at least worth trying to use the open source IDL-compatible package GDL. Here is a link to the home page for the project.
http://gnudatalanguage.sourceforge.net/
Further on the CRU documents – has anyone checked out the tellingly titled “Extreme2100.pdf”? All looks suspiciously like cherry-picked “Yamal ‘extreme’ tree rings”, to a mere ignoramus like myself.
So now we have the real reason why climate scientists would not “free the code”, and it’s something suspected on CA for a long time.
Embarrassment.
If you look up Harry at CRU, under his name it says:
“Dendroclimatology, climate scenario development, data manipulation and visualisation, programming”
I wonder if CRU might like to rephrase that…
What is the ‘decline’ thing anyway? It is in a lot of code, seems to involve splicing two data sets, or adjusting later data to get a better fit. Mostly (as a programmer), it seems like a ‘magic number’ thing, where your results aren’t quite right, so you add/multiply by some constant rather than deal with the real problem. Aka “a real bad thing to do” : ).
\FOIA\documents\osborn-tree6\mann\mxdgrid2ascii.pro
printf,1,'Osborn et al. (2004) gridded reconstruction of warm-season'
printf,1,'(April-September) temperature anomalies (from the 1961-1990 mean).'
printf,1,'Reconstruction is based on tree-ring density records.'
printf,1
printf,1,'NOTE: recent decline in tree-ring density has been ARTIFICIALLY'
printf,1,'REMOVED to facilitate calibration. THEREFORE, post-1960 values'
printf,1,'will be much closer to observed temperatures than they should be,'
printf,1,'which will incorrectly imply the reconstruction is more skilful'
printf,1,'than it actually is. See Osborn et al. (2004).'
\FOIA\documents\osborn-tree6\briffa_sep98_d.pro
;mknormal,yyy,timey,refperiod=[1881,1940]
;
; Apply a VERY ARTIFICAL correction for decline!!
;
yrloc=[1400,findgen(19)*5.+1904]
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
2.6,2.6,2.6]*0.75 ; fudge factor
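For readers who don’t speak IDL: `findgen(19)*5.+1904` generates 1904, 1909, …, 1994, so `yrloc`/`valadj` define a piecewise-linear adjustment that is zero before the 20th century and rises to 2.6 × 0.75 = 1.95 by 1994; in the surrounding script these values are reportedly interpolated onto the yearly timebase and added to the series. A Python sketch of that arithmetic (a re-expression for illustration, not CRU’s code):

```python
import numpy as np

# Anchor years and adjustment values transcribed from briffa_sep98_d.pro:
# 1400, then every 5 years from 1904 to 1994 (19 steps).
yrloc = np.concatenate(([1400], 1904 + 5 * np.arange(19)))
valadj = 0.75 * np.array([0., 0., 0., 0., 0., -0.1, -0.25, -0.3, 0., -0.1,
                          0.3, 0.8, 1.2, 1.7, 2.5, 2.6, 2.6, 2.6, 2.6, 2.6])

def decline_adjustment(years):
    """Linearly interpolate the adjustment onto arbitrary years
    (IDL's interpol() does the same piecewise-linear interpolation)."""
    return np.interp(years, yrloc, valadj)

years = np.arange(1400, 1995)
adj = decline_adjustment(years)
# The pre-20th-century series is untouched; the late-20th-century
# series is shifted upward by up to 0.75 * 2.6 = 1.95.
```

Note the shape of the adjustment: slightly negative mid-century, then sharply positive after about 1950, which is exactly where the tree-ring “decline” diverges from the instrumental record.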
\FOIA\documents\osborn-tree6\mann\mxd_pcr_localtemp.pro
; Tries to reconstruct Apr-Sep temperatures, on a box-by-box basis, from the
; EOFs of the MXD data set. This is PCR, although PCs are used as predictors
; but not as predictands. This PCR-infilling must be done for a number of
; periods, with different EOFs for each period (due to different spatial
; coverage). *BUT* don’t do special PCR for the modern period (post-1976),
; since they won’t be used due to the decline/correction problem.
; Certain boxes that appear to reconstruct well are “manually” removed because
; they are isolated and away from any trees.
\FOIA\documents\osborn-tree6\combined_wavelet_col.pro
;
; Remove missing data from start & end (end in 1960 due to decline)
;
kl=where((yrmxd ge 1402) and (yrmxd le 1960),n)
sst=prednh(kl)
\FOIA\documents\osborn-tree6\mann\oldprog\calibrate_correctmxd.pro
; We have previously (calibrate_mxd.pro) calibrated the high-pass filtered
; MXD over 1911-1990, applied the calibration to unfiltered MXD data (which
; gives a zero mean over 1881-1960) after extending the calibration to boxes
; without temperature data (pl_calibmxd1.pro). We have identified and
; artificially removed (i.e. corrected) the decline in this calibrated
; data set. We now recalibrate this corrected calibrated dataset against
; the unfiltered 1911-1990 temperature data, and apply the same calibration
; to the corrected and uncorrected calibrated MXD data.
\FOIA\documents\osborn-tree6\mann\oldprog\maps12.pro
;
; Plots 24 yearly maps of calibrated (PCR-infilled or not) MXD reconstructions
; of growing season temperatures. Uses “corrected” MXD – but shouldn’t usually
; plot past 1960 because these will be artificially adjusted to look closer to
; the real temperatures.
;
\FOIA\documents\osborn-tree6\mann\oldprog\pl_decline.pro
;
; Now apply a completely artificial adjustment for the decline
; (only where coefficient is positive!)
;
tfac=declinets-cval
fdcorrect=fdcalib
for iyr = 0 , mxdnyr-1 do begin
fdcorrect(*,*,iyr)=fdcorrect(*,*,iyr)-tfac(iyr)*(zcoeff(*,*) > 0.)
endfor
;
; Now save the data for later analysis
;
save,filename='calibmxd3.idlsave',$
g,mxdyear,mxdnyr,fdcalib,mxdfd2,fdcorrect
;
end
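A note for non-IDL readers: in IDL, `a > b` is the element-wise *maximum* operator, not a comparison, so `zcoeff(*,*) > 0.` clips negative coefficients to zero; the loop subtracts the adjustment only where the regression coefficient is positive, as the comment says. A NumPy re-expression of that loop, with illustrative shapes and invented data (the names are borrowed from the script):

```python
import numpy as np

# Illustrative dimensions: (nlon, nlat) grid boxes, mxdnyr years.
nlon, nlat, mxdnyr = 4, 3, 5
rng = np.random.default_rng(1)
fdcalib = rng.normal(size=(nlon, nlat, mxdnyr))   # calibrated MXD field
zcoeff = rng.normal(size=(nlon, nlat))            # per-box coefficients
tfac = rng.normal(size=mxdnyr)                    # decline time series minus cval

# IDL's "zcoeff > 0." is an element-wise max with 0, i.e. np.maximum:
fdcorrect = fdcalib - (tfac[np.newaxis, np.newaxis, :]
                       * np.maximum(zcoeff, 0.0)[:, :, np.newaxis])
# Grid boxes with negative coefficients are left unchanged.
```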
\FOIA\documents\osborn-tree6\summer_modes\pl_decline.pro
;
; Plots density ‘decline’ as a time series of the difference between
; temperature and density averaged over the region north of 50N,
; and an associated pattern in the difference field.
; The difference data set is computed using only boxes and years with
; both temperature and density in them – i.e., the grid changes in time.
; The pattern is computed by correlating and regressing the *filtered*
; time series against the unfiltered (or filtered) difference data set.
;
;*** MUST ALTER FUNCT_DECLINE.PRO TO MATCH THE COORDINATES OF THE
; START OF THE DECLINE *** ALTER THIS EVERY TIME YOU CHANGE ANYTHING ***
I wouldn’t be too hard on Harry. I read through that whole read-me tonight, and he has the job from hell (been a programmer for a long time, but that tops any horror stories I’ve lived through!). He is mostly trying to merge incompatible data sets that are inconsistent, partial, and undocumented. He is also working with code he inherited that seems pretty awful. He is making a lot of mistakes, but everyone does, just not everyone documents them truthfully in semi real time like that. Being honest about your process and retaining a sense of humor in that kind of task speaks a lot about the guy.
The state of data collection there is pretty shocking though. The end result of all that will never be good — no surprise when things don’t match measurements later, just don’t shoot the programmer : ).
I’m sure this will all be clearer at some point, but what, if any, is the connection between “Mike’s Nature Trick” and the “decline” adjustments in the HARRY_READ_ME? What little I’ve read suggests to me that “Harry” was working to try to update the code for CRU TS. Is that the conclusion of any of the rest of you? That is a different ball game than the paleo/divergence issue, isn’t it?
Yes, I think Harry has been told to try and replicate their current datasets, and especially the gridded output datasets.
This starts with station data from GHCN and other places, which gets processed (to make .cts files). These then get merged into a ‘database’, consisting of .dtb/.dts and other files. This isn’t a database in the sense that a programmer in the 21st century might think of one. We aren’t talking relational or SQL here. Hell, there isn’t even any sign of indexing. This is how universities did computing back in the 1970s: data came in on cards or tapes, and you’d process them to produce a new pile of cards or output on tape.
To produce the gridded datasets, the .dtb file data gets converted to some text files. These can then be read in by some IDL scripts. IDL is a higher-level language than Fortran, has lots of special graphics stuff, and most usefully has a routine for triangulating data. This IDL routine is what does the interpolation between station data points to generate the gridded data points.
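That triangulate-and-grid step can be sketched with SciPy’s `griddata` (a generic stand-in used here for illustration, not CRU’s actual IDL routine; all data below is invented):

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical station data: 100 stations, each with a (lon, lat)
# position and a temperature anomaly.
rng = np.random.default_rng(0)
stations = rng.uniform([-10, 50], [30, 70], size=(100, 2))   # (lon, lat) pairs
anomalies = rng.normal(0.0, 1.0, size=100)

# A regular half-degree grid to interpolate onto.
lon = np.arange(-10, 30, 0.5)
lat = np.arange(50, 70, 0.5)
glon, glat = np.meshgrid(lon, lat)

# method="linear" Delaunay-triangulates the stations and interpolates
# within each triangle -- conceptually the TRIANGULATE/TRIGRID step in IDL.
grid = griddata(stations, anomalies, (glon, glat), method="linear")
# Grid points outside the stations' convex hull come back as NaN.
```

The design point is that triangulation-based interpolation only ever fills in between stations; anything outside the station network’s hull is undefined, which is one reason station coverage matters so much to the gridded product.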
All of this is explained by Tim in the readmes. Harry seems to be so offended by the fact that a readme starts with an underscore that he doesn’t actually read them.
It’s my guess that there are no professional software developers at CRU. Nor are there any professional data librarians or archivists. The code and data just exist, a communal pile of junk. If a postgraduate needs something, they just have to go and code it for themselves. In practice, in this environment, some poor sod gets a reputation as the person who knows how to make the computers work. This was someone called Mark, then Tim got lumbered, and now young Harry has had the baton passed to him.
What you have to consider is that CRU isn’t a government agency like, say, the Office for National Statistics or the Met Office. They are basically a university department. Status for a university comes in terms of PhDs produced, papers authored and so on. Money isn’t going to be taken away from that in order to fund someone to manage the data properly.
And anyway, PhD students are a hell of a lot cheaper than professional software developers 🙂
What is this code supposed to do? Do we know how it fits into the overall GCM codes (if it does at all)? Is it just a code that manipulates the data?
Maybe Phil should spend less time jetting around the world and more time making sure his poor programmers don’t inherit nightmare legacy code. Needles-in-your-eyes awful.
BTW I’ve uploaded the readme in sections starting here – http://di2.nu/foia/HARRY_READ_ME-0.html – the last section (35) needs further splitting.
Personally I found section 20 to be short and fascinating….
I should point out that, as with the others here, my main feelings regarding “Harry” are “Thank %deity% I didn’t have that job”.
It is blindingly obvious that this code – which appears to be the magic blackbox that creates the HADCRU3 stuff – is a total mess. No wonder CRU didn’t want to show it to anyone.
Was it Harry blowing the whistle? Is this his alibi?
I wonder if Harry had been given the thankless job of trying to sort out the CRU data to try to get them asap into a better state for audit – ie knowing that future non-refusable demand for audit/transparency was possible.
How many times have these guys said that you don’t need the code because all the details of how to duplicate the work are in the published literature?
This file shows that they can’t duplicate their own work from the published literature. And it also shows that they apparently don’t have all of their old code… so did they ever refuse to provide code rather than admit that they didn’t have it?
I can understand that HadCRU isn’t that keen on studying the climatic effects of changes in cloudiness. According to Harry the code to create gridded cloudiness data has been lost for good, and was undocumented, so they can’t recreate it. Up to 1995 they only have the results file. After 1995 they use a different procedure that calculates cloudiness from sunshine data.
I suppose they could start from scratch, but it would look odd if they got a different result from the published one the next time around.
Just a comment, possibly someone has already noticed. In my downloaded file all documents in the mail folder have the timestamp 2009-01-01-06:00:00, the same for some files in the documents folder, including HARRY_READ_ME.txt. Does this lead us anywhere? I do see that the files in the documents folder with this timestamp seem to be the most interesting, but maybe I’m just fooled by this observation. greenpeace.txt seems to be a mail which I can’t find in the mails folder. I find it very hard to believe that someone actually was at work at UEA at that very moment.
An addition: I just had a look at the files in Windows (XP); there the timestamps are 2009-01-01-00:00, on my Linux box 2009-01-01-00:06, an interesting difference. My timezone is CET (Stockholm, Sweden).
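One common source of such cross-machine discrepancies is that timestamps get reinterpreted in different time zones (zip archives in particular store zone-less local times that each OS renders its own way). That wouldn’t explain a 6-minute offset by itself, but the general mechanism is easy to demonstrate; the instant below is invented, not taken from the archive:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One instant, stored zone-aware in UTC, renders differently in each
# local zone -- so the same archive can show different mtimes on
# different boxes.
mtime = datetime(2009, 1, 1, 0, 0, tzinfo=timezone.utc)
stockholm = mtime.astimezone(ZoneInfo("Europe/Stockholm"))  # 2009-01-01 01:00 (CET)
chicago = mtime.astimezone(ZoneInfo("America/Chicago"))     # 2008-12-31 18:00 (CST)
```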
“Gizza’ job!” “I can do that!” (Yosser Hughes, Boys from the Blackstuff)
Seriously it is a bit of a trip down memory lane.
Real programmers don’t do comments, and they speak in Octal.
Any comments that do exist would have been written because the author lacked confidence, didn’t know what he was doing, and would have been written first and then ignored.
Some comments were useful. One that comes to mind: “You won’t understand this bit!”
Trust the code, not the programmer.
Compilers exist, so that you can bin the source code.
If your program doesn’t do what it ought, run it on another computer, or in the middle of the night.
But if it is an old object try to find the computer the programmer had hand-wired with that magic extra instruction.
There are reasons why all that changed: primarily the rise of the IT manager, the introduction of the career path, and the invention of specialities. The death of the programmer as renaissance man. But it was a hoot!
I should like to say that the commentary in the readme reads like something out of the 70s, but I guess it is a bit more up to date than that.
It seems that he is re-working what was meant to be a one-off but had turned into a two-off; it is not software that was ever intended to be handed over to operations. When you are done, you are done.
Unfortunately, stuff like that is never intended for the level of scrutiny it is now going to get. That does not make it bad per se. Maybe he will never get it to do what it did last time, but hey, maybe last time wasn’t right either.
But the Big Question is: Fit for purpose?
Well, yes and no: the purpose has magically changed. It was, back when the output was of purely academic interest; but as part of a mission to change the world, sadly not.
I very much doubt that Harry’s situation is unique, or that his fubar post-holocaust wreckage of a system is as bad as it could be. Also, do not blame anyone for not being a professional. The original versions of all the great classic software systems were written by non-professionals. And I might say that during my career, whenever anyone claimed to be a professional, I reminded him/her that we did not belong to a professional body, we could not be struck off, defrocked, or court-martialled, and that when we walked away we just left the faint echo of jangling spurs.
From what little I have read, he does know what he is doing, in that he does understand the objective. That is very 1970s/80s: pick people who understand the problem over people who know nothing but computing. It used to be said like this: “If I need to explain it all to you, it would be quicker to do it myself.” His is not a huge software project; it is a man-and-his-dog effort. Also it is not necessarily something that can be pondered about too much in advance. It is just an old-fashioned can of worms that must be swallowed one worm at a time. To his credit he has provided a commentary, a narrative that could be of help the next time. If the next programmer bothers to read it, which he/she may not.
So all in all, I can only see this as a lack of trust thing.
Can someone working in this fashion come up with good results? Yes.
Should his boss trust him to do so? Yes
Should the big boss trust the boss? Yes
Should a politician trust the big boss? Yes
Should we trust the politician? ….
Err NO!
So with the introduction of the last link, a lack of trust cascades back down the line, and the little guy gets the red face, and the rest of them get to point a lot of fingers both up and down the chain. So who is to blame for their distress? Well, who brought this house of cards down?
Well I think that is obvious.
Steve, take a bow.
Many Thanks
Alex
All very true. I really feel for this guy Harry; he didn’t invent this, he’s the guy who was appointed to carry on someone else’s work, someone who really seems to have messed things up. To make it worse, he’s in the wrong business. What other business would demand a steady, yearly growth from the IT system? ‘Hey, we had a revenue of 500 million bucks last year and have had a steady 5% increase for the last 10 years; the new system has to keep up with that!’ Poor Harry!
1062618881.txt
Read it all
Some of the numeric suffixes on filenames handled by Harry might give a clue to their dates. Near the start, the files in the directory beginning with ‘+’ are tmp.0311051552 – that’s almost 4pm on 5th Nov 2003.
Almost halfway through the file –
WELCOME TO THE DATABASE UPDATER
Writing vap.0710241541.dtb
and the next attempt at a run
Writing vap.0710241549.dtb
Reading other filenames, Harry was working on this patch on the 24th Oct 2007, using a master database from 18th Nov 2003 and an update from 11th Sept 2007.
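The suffix convention above appears to be YYMMDDHHMM, which can be decoded mechanically. A small sketch (my own helper for illustration, not CRU code; it assumes two-digit years fall in the 2000s, which Python’s `%y` pivot handles for these files):

```python
from datetime import datetime

def decode_suffix(filename):
    """Decode a CRU-style numeric filename suffix (YYMMDDHHMM),
    e.g. 'tmp.0311051552' -> 2003-11-05 15:52."""
    stamp = filename.split(".")[1][:10]
    return datetime.strptime(stamp, "%y%m%d%H%M")

decode_suffix("tmp.0311051552")      # the '+' directory file: 5 Nov 2003
decode_suffix("vap.0710241549.dtb")  # Harry's second vap run: 24 Oct 2007
```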
This actually looks like a programmer who was given faulty data and was told to make it fit.
In any case the AGW people are in deep trouble because now it is clear that they were lying.
I wonder if Harry was the hacker? Maybe he got tired of dealing with obvious lies and exported all of this.
I have referred this (the HARRY file) to the UK PM’s office for comment, and also for a comment on Monbiot’s call for re-analysis and for the head of Jones. I doubt I will get a reply [or I will end up like poor Dr Kelly.]
Another interesting comment from Harry
“Oh, sod it. It’ll do. I don’t think I can justify spending any longer on a dataset, the previous version of which was
completely wrong (misnamed) and nobody noticed for five years.”
Robin Debreuil: the decline is a decrease in the temperature calculated by measuring tree growth since 1960 or so. The fudges appear intended to bring it in line with actual measured temperature, which has not declined.
http://www.geog.ox.ac.uk/staff/mnew.html
Think I found Mark
9 Trackbacks
[…] appears to have been quite correct: the CRU’s bluster was hiding the fact that even they couldn’t understand their own climate models or data. However, as their internal documents show, they were being well-funded for making up scare stories […]
[…] now know that these scientists wrote programming notes in the source code of their own climate models admitting that results were being manually […]
[…] line stream-of-consciousness or stream-of-work log that has been discussed some elsewhere [1, 2, 3, […]