"a very disturbing HARRY_READ_ME.txt file"

Good notes on the source code by a blogger here. Also here.


  1. Calvin Ball
    Posted Nov 22, 2009 at 9:51 PM | Permalink

    This, I think, sums it up pretty well:

    Posts: 21296
    Incept: 2007-08-26

    Ok, one thing still to say. After reading 4000 lines of this now, I actually feel sorry for the guy. He’s trying his damnedest to straighten out somebody else’s mess.

    NOT something to crucify him for.

    I sure would like to know what happened to Tim Mitchell and why he wasn’t around to explain all his undocumented **** to. And why he didn’t document it.

    And why such a ****ty programmer was running this.

    And several other things, but still, my main point is:

    Harry didn’t make the mess, he’s trying to clean it up. So don’t think TOO bad of him. I really do feel sorry for him now, and there’s a good chance that some of the things I’ve noted above have been fixed now….

    I’m up to sometime after 2007 now in the file. He’s pasting data from then right where I’m stopping. This isn’t old. It started in 2006 or so.

    Posts: 21296
    Incept: 2007-08-26

    Pika: I don’t think that’s the intention of any of these people. They’re just scientists fighting for funding for their projects. Some of them might belong in that category, but I doubt a whole lot of them do.

    It’s the politicians that you need to focus on (as normal…)

    Odd how often they come up as bad guys. Wonder why that is?

  2. stevemcintyre
    Posted Nov 22, 2009 at 10:25 PM | Permalink


  3. kuhnkat
    Posted Nov 22, 2009 at 10:35 PM | Permalink

    Anyone know where Dr. Tim Mitchell went that he couldn’t be consulted on the code??

  4. kf
    Posted Nov 22, 2009 at 10:38 PM | Permalink

    So do you think there is going to be any talk of this at the AGU meeting in December?

  5. Posted Nov 22, 2009 at 11:52 PM | Permalink

    I’ve also been trying to raise awareness of HARRY_READ_ME.TXT with comments on various high-traffic sites. The reality is that they were just trying things and if it looked right they took it. No formal software validation, not even any formal software development processes, hand-tuned data files scattered all around. It also seems that they were trying to go back after the fact and were having difficulty re-creating some published graphs.

    If you had this level of software quality in a medical device the FDA would close you down in a heartbeat.

  6. Posted Nov 23, 2009 at 12:21 AM | Permalink

    I’ve long admired your work.

    I look forward to a very interesting and careful/detailed analysis of that Harry file from you.

    I’m sure you’ll find no end of useful information in it. I certainly hope so.

    The day that science does not involve others checking one's work is the day it should no longer be called science.

  7. Dishman
    Posted Nov 23, 2009 at 1:38 AM | Permalink

    I’ve been pawing through the file myself for a while.

    As a software guy, I find it really disturbing.

    I haven’t found any references to a specification. Maybe I’m missing them.

    Basically, this software doesn’t qualify as “tested”. It just produces output that looks right:

    It’s not complete yet but it already gives extremely helpful information – I was able to look at the first problem (Guatemala in Autumn 1995 has a massive spike) and find that a station in Mexico has a temperature of 78 degrees in November 1995!
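    A crude range check of the kind that would catch such a value – a hypothetical sketch with invented station records, not anything from the CRU code. A reading of 78 is plausible in Fahrenheit but not as a monthly mean in Celsius:

```python
# Hypothetical illustration (not CRU's actual QC code): flag monthly mean
# temperatures outside a plausible Celsius range. A 78-degree "November in
# Mexico" is plausible in Fahrenheit, not Celsius, so this check catches it.

def flag_implausible(records, lo=-60.0, hi=50.0):
    """Return the records whose monthly mean temperature (deg C) is implausible."""
    return [r for r in records if not (lo <= r["temp_c"] <= hi)]

stations = [
    {"station": "MX001", "month": "1995-11", "temp_c": 78.0},  # suspect value
    {"station": "GT002", "month": "1995-11", "temp_c": 22.4},
]

suspect = flag_implausible(stations)
print([r["station"] for r in suspect])  # → ['MX001']
```

    Station IDs and thresholds here are invented for illustration only.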

    There’s also indication that the data has been manually fiddled:

    I am seriously close to giving up, again. The history of this is so complex that I can’t get far enough into it before my head hurts and I have to stop. Each parameter has a tortuous history of manual and semi-automated interventions that I simply cannot just go back to early versions and run the update prog. I could be throwing away all kinds of corrections – to lat/lons, to WMOs (yes!), and more.

    Is this CRUTEMP he’s talking about?

    People trust this?

  8. Posted Nov 23, 2009 at 2:06 AM | Permalink


    I don’t know if this is something you already have but here is a good bibliography from Yamal from 1998

    Original Filename: 907975032.txt
    From: Rashit Hantemirov
    To: Keith Briffa
    Subject: Short report on progress in Yamal work
    Date: Fri, 9 Oct 1998 19:17:12 +0500
    Reply-to: Rashit Hantemirov

    Dear Keith,

    I apologize for the delay in replying. Below is short information about
    the state of the Yamal work.

    Samples from 2,172 subfossil larches (appr. 95% of all samples),
    spruces (5%) and birches (solitary finding) have been collected within
    a region centered on about 67°30'N, 70°00'E at the southern part of
    the Yamal Peninsula. All of them have been measured.

    Success has already been achieved in developing a continuous larch
    ring-width chronology extending from the present back to 4999 BC. My
    version of the chronology (individual series indexed by the corridor
    method) is attached (file “yamal.gnr”). I can guarantee today that the
    last 4600-year interval (2600 BC – 1996 AD) of the chronology is
    reliable. Earlier data (5000 BC – 2600 BC) need to be examined more
    Using this chronology, 1074 subfossil trees have been dated. The
    temporal distribution of trees is attached (file “number”).
    Unfortunately, I can’t assign with confidence each tree to a certain
    species (larch or spruce) at present.

    Ring width data of 539 dated subfossil trees and 17 living larches are
    attached (file “yamal.rwm”). Some samples were measured on 2 or more
    radii. The first letter means species (l – larch, p – spruce, _ –
    uncertain), the last digit the radius. These series have been examined
    for missing rings. If you need all the dated individual series I can
    send the rest of the data, but the others are not corrected with
    regard to missing rings.

    The remaining 1098 subfossil trees are not dated as yet. More than 200
    of them have fewer than 60 rings; dating of such samples is often not
    confident. A great part of the undated wood remnants are most likely
    older than 7000 years.

    Some results (I think the temperature reconstruction you will do
    better than me):

    Millennium-scale changes of interannual tree growth variability have
    been discovered. There were periods of low (5xxx xxxx xxxxBC), middle
    (2xxx xxxx xxxxBC) and high interannual variability (1700 BC to the
    present).

    Exact dating of hundreds of subfossil trees gave a chance to clear up
    the temporal distribution of tree abundance, age structure, and the
    frequency of tree deaths and appearances during the last seven
    millennia. An assessment of polar tree line changes has been carried
    out by mapping the dated subfossil trees.

    According to the reconstructions, the most favorable conditions for
    tree growth occurred during 5xxx xxxx xxxxBC. At that time the
    position of the tree line was far northward of the recent one.
    [Unfortunately, the region of our research doesn’t include the whole
    area where trees grew during the Holocene. We can maintain that before
    1700 BC the tree line was northward of our research area. We have only
    3 dated remnants of trees from Yuribey River sampled by our colleagues
    (70 km to the north of the recent polar tree line) that grew during
    4xxx xxxx xxxx and 3xxx xxxx xxxxBC.]
    This period is marked by low interannual variability of tree growth
    and high tree abundance, interrupted, however, by several short
    xxx xxxx xxxxyears) unfavorable periods, the most significant of them
    dated about 4xxx xxxx xxxxBC. Since about 2800 BC a gradual worsening
    of tree growth conditions has begun. A significant shift of the polar
    tree line to the south has been recorded between 1700 and 1600 BC. At
    the same time interannual tree growth variability increased
    appreciably. During the last 3600 years most of the reconstructed
    indices have not varied very significantly. The tree line has been
    shifting within 3-5 km of the recent one. Low abundance of trees has
    been recorded during 1xxx xxxx xxxxBC and xxx xxxx xxxxBC. A
    relatively high number of trees has been noted during xxx xxxx xxxxAD.
    There are no evidences of moving polar timberline to the north during
    last century.

    Please, let me know if you need more data or detailed report.

    Best regards,
    Rashit Hantemirov

    Lab. of Dendrochronology
    Institute of Plant and Animal Ecology
    8 Marta St., 202
    Ekaterinburg, 620144, Russia
    e-mail: rashit@xxxxxxxxx.xxx
    Fax: +7 (34xxx xxxx xxxx; phone: +7 (34xxx xxxx xxxx
    Attachment Converted: “c:\eudora\attach\yamal.rwm”

    Attachment Converted: “c:\eudora\attach\Yamal.gnr”

    Attachment Converted: “c:\eudora\attach\Number”

    • Dean
      Posted Nov 23, 2009 at 8:45 AM | Permalink

      “There are no evidences of moving polar timberline to the north during
      last century.”


  9. GaryC
    Posted Nov 23, 2009 at 2:09 AM | Permalink

    Anybody who is trying to duplicate Harry’s work with the IDL code, first, you have my sympathy.

    Second, if you don’t have access to IDL, it is at least worth trying to use the open source IDL-compatible package GDL. Here is a link to the home page for the project.


  10. The Blissful Ignoramus
    Posted Nov 23, 2009 at 2:56 AM | Permalink

    Further on the CRU documents – has anyone checked out the tellingly titled “Extreme2100.pdf”? All looks suspiciously like cherry-picked “Yamal ‘extreme’ tree rings”, to a mere ignoramus like myself.

  11. AndyL
    Posted Nov 23, 2009 at 3:08 AM | Permalink

    So now we have the real reason why climate scientists would not “free the code”, and it’s something suspected on CA for a long time.


  12. Posted Nov 23, 2009 at 4:14 AM | Permalink

    If you look up Harry at CRU, under his name it says:

    “Dendroclimatology, climate scenario development, data manipulation and visualisation, programming”

    I wonder if CRU might like to rephrase that…

  13. Posted Nov 23, 2009 at 5:41 AM | Permalink

    What is the ‘decline’ thing anyway? It is in a lot of code, seems to involve splicing two data sets, or adjusting later data to get a better fit. Mostly (as a programmer), it seems like a ‘magic number’ thing, where your results aren’t quite right, so you add/multiply by some constant rather than deal with the real problem. Aka “a real bad thing to do” : ).


    printf,1,'Osborn et al. (2004) gridded reconstruction of warm-season'
    printf,1,'(April-September) temperature anomalies (from the 1961-1990 mean).'
    printf,1,'Reconstruction is based on tree-ring density records.'
    printf,1,'NOTE: recent decline in tree-ring density has been ARTIFICIALLY'
    printf,1,'REMOVED to facilitate calibration. THEREFORE, post-1960 values'
    printf,1,'will be much closer to observed temperatures than they should be,'
    printf,1,'which will incorrectly imply the reconstruction is more skilful'
    printf,1,'than it actually is. See Osborn et al. (2004).'


    ; Apply a VERY ARTIFICAL correction for decline!!
    2.6,2.6,2.6]*0.75 ; fudge factor


    ; Tries to reconstruct Apr-Sep temperatures, on a box-by-box basis, from the
    ; EOFs of the MXD data set. This is PCR, although PCs are used as predictors
    ; but not as predictands. This PCR-infilling must be done for a number of
    ; periods, with different EOFs for each period (due to different spatial
    ; coverage). *BUT* don’t do special PCR for the modern period (post-1976),
    ; since they won’t be used due to the decline/correction problem.
    ; Certain boxes that appear to reconstruct well are “manually” removed because
    ; they are isolated and away from any trees.

    ; Remove missing data from start & end (end in 1960 due to decline)
    kl=where((yrmxd ge 1402) and (yrmxd le 1960),n)


    ; We have previously (calibrate_mxd.pro) calibrated the high-pass filtered
    ; MXD over 1911-1990, applied the calibration to unfiltered MXD data (which
    ; gives a zero mean over 1881-1960) after extending the calibration to boxes
    ; without temperature data (pl_calibmxd1.pro). We have identified and
    ; artificially removed (i.e. corrected) the decline in this calibrated
    ; data set. We now recalibrate this corrected calibrated dataset against
    ; the unfiltered 1911-1990 temperature data, and apply the same calibration
    ; to the corrected and uncorrected calibrated MXD data.

    ; Plots 24 yearly maps of calibrated (PCR-infilled or not) MXD reconstructions
    ; of growing season temperatures. Uses “corrected” MXD – but shouldn’t usually
    ; plot past 1960 because these will be artificially adjusted to look closer to
    ; the real temperatures.


    ; Now apply a completely artificial adjustment for the decline
    ; (only where coefficient is positive!)
    for iyr = 0 , mxdnyr-1 do begin
    fdcorrect(*,*,iyr)=fdcorrect(*,*,iyr)-tfac(iyr)*(zcoeff(*,*) > 0.)
    ; Now save the data for later analysis

    ; Plots density ‘decline’ as a time series of the difference between
    ; temperature and density averaged over the region north of 50N,
    ; and an associated pattern in the difference field.
    ; The difference data set is computed using only boxes and years with
    ; both temperature and density in them – i.e., the grid changes in time.
    ; The pattern is computed by correlating and regressing the *filtered*
    ; time series against the unfiltered (or filtered) difference data set.
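    For what it’s worth, the arithmetic in that “completely artificial adjustment” loop is easy to mimic in NumPy – this is a sketch with invented array shapes and data, not CRU’s actual code:

```python
import numpy as np

# Sketch of the IDL loop quoted above: fdcorrect holds a calibrated gridded
# field, tfac the per-year size of the "decline" adjustment, and zcoeff its
# spatial pattern. IDL's (zcoeff > 0.) is an elementwise maximum with zero,
# so only grid boxes with a positive coefficient are adjusted downward.
nlon, nlat, nyear = 4, 3, 5
rng = np.random.default_rng(0)
fdcorrect = rng.normal(size=(nlon, nlat, nyear))
tfac = np.linspace(0.0, 1.0, nyear)        # adjustment ramping up over time
zcoeff = rng.normal(size=(nlon, nlat))     # mixed-sign spatial pattern

original = fdcorrect.copy()                # kept only to show what changed
for iyr in range(nyear):
    fdcorrect[:, :, iyr] -= tfac[iyr] * np.maximum(zcoeff, 0.0)
```

    Boxes where zcoeff is zero or negative are left untouched, exactly as the `> 0.` guard in the IDL ensures.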

  14. Posted Nov 23, 2009 at 5:56 AM | Permalink

    I wouldn’t be too hard on Harry. I read through that whole read-me tonight, and he has the job from hell (been a programmer for a long time, but that tops any horror stories I’ve lived through!). He is mostly trying to merge incompatible data sets that are inconsistent, partial, and undocumented. He is also working with code he inherited that seems pretty awful. He is making a lot of mistakes, but everyone does, just not everyone documents them truthfully in semi real time like that. Being honest about your process and retaining a sense of humor in that kind of task speaks a lot about the guy.

    The state of data collection there is pretty shocking though. The end result of all that will never be good — no surprise when things don’t match measurements later, just don’t shoot the programmer : ).

  15. Basil Copeland
    Posted Nov 23, 2009 at 7:20 AM | Permalink

    I’m sure this will all be clearer at some point, but what, if any, is the connection between “Mike’s Nature Trick” and the “decline” adjustments in the HARRY_READ_ME? What little I’ve read suggests to me that “Harry” was working to try to update the code for CRU TS. Is that the conclusion of any of the rest of you? That is a different ball game than the paleo/divergence issue, isn’t it?

    • ncmoon
      Posted Nov 23, 2009 at 9:23 AM | Permalink

      Yes, I think Harry has been told to try and replicate their current datasets, and especially the gridded output datasets.

      This starts with station data from GHCN and other places, which gets processed (to make .cts files). These then get merged into a ‘database’, consisting of .dtb/.dts and other files. This isn’t a database in the sense that a programmer in the 21st century might think of one – we aren’t talking relational or SQL here. Hell, there isn’t even any sign of indexing. This is how universities did computing back in the 1970s: data came in on cards or tapes, and you’d process them to produce a new pile of cards or output on tape.

      To produce the gridded datasets, the .dtb file data gets converted to some text files. These can then be read in by some IDL scripts. IDL is a higher-level language than Fortran, has lots of special graphics stuff, and most usefully has a routine for triangulating data. This IDL routine is what does the interpolation between station data points to generate the gridded data points.
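      That triangulate-then-interpolate step can be sketched with SciPy’s Delaunay-based `griddata` – an illustration of the idea with made-up station positions, not the CRU IDL:

```python
import numpy as np
from scipy.interpolate import griddata

# Interpolate scattered "station" values onto a regular grid via Delaunay
# triangulation, roughly what IDL's TRIANGULATE/TRIGRID pair is used for.
# Station positions and the field are invented for illustration.
stations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
values = stations[:, 0] + 2.0 * stations[:, 1]   # a known planar field

lon, lat = np.meshgrid(np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 5))
grid = griddata(stations, values, (lon, lat), method="linear")
```

      Because linear interpolation over triangles reproduces a planar field exactly, `grid` here equals `lon + 2*lat` at every grid point; real station data would of course not be planar.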

      All of this is explained by Tim in the readmes. Harry seems to be so offended by the fact that a readme starts with an underscore that he doesn’t actually read them.

      It’s my guess that there are no professional software developers at CRU. Nor are there any professional data librarians or archivists. The code and data just exist, a communal pile of junk. If a postgraduate needs something they just have to go and code it for themselves. In practice, in this environment, some poor sod gets a reputation as the person who knows how to make the computers work. This was someone called Mark, then Tim got lumbered with it, and now young Harry has had the baton passed to him.

      What you have to consider is that CRU isn’t a government agency, like, say, the Office for National Statistics or the Met Office. They are basically a university department. Status for a university comes in terms of PhDs produced, papers authored and so on. Money isn’t going to be taken away from that in order to fund someone managing the data properly.

      And anyway, PhD students are a hell of a lot cheaper than professional software developers 🙂

  16. Dean
    Posted Nov 23, 2009 at 8:49 AM | Permalink

    What is this code supposed to do? Do we know how it fits into the overall GCM codes (if it does at all)? Is it just a code that manipulates the data?

  17. bender
    Posted Nov 23, 2009 at 8:51 AM | Permalink

    Maybe Phil should spend less time jetting around the world and more time making sure his poor programmers don’t inherit nightmare legacy code. Needles-in-your-eyes awful.

  18. Posted Nov 23, 2009 at 9:24 AM | Permalink

    BTW I’ve uploaded the readme in sections starting here – http://di2.nu/foia/HARRY_READ_ME-0.html – the last section (35) needs further splitting.

    Personally I found section 20 to be short and fascinating….

    • Posted Nov 23, 2009 at 9:48 AM | Permalink

      I should point out that, as with the others here, my main feelings regarding “Harry” are “Thank %deity% I didn’t have that job”.

      It is blindingly obvious that this code – which appears to be the magic black box that creates the HadCRUT3 stuff – is a total mess. No wonder CRU didn’t want to show it to anyone.

  19. Håkan B
    Posted Nov 23, 2009 at 11:42 AM | Permalink

    Was it Harry blowing the whistle? Is this his alibi?

  20. Posted Nov 23, 2009 at 12:19 PM | Permalink

    I wonder if Harry had been given the thankless job of trying to sort out the CRU data, to get it asap into a better state for audit – i.e. knowing that a future non-refusable demand for audit/transparency was possible.

  21. SineCos
    Posted Nov 23, 2009 at 2:17 PM | Permalink

    How many times have these guys said that you don’t need the code because all the details of how to duplicate the work are in the published literature?

    This file shows that they can’t duplicate their own work from the published literature. And it also shows that they apparently don’t have all of their old code… so did they ever refuse to provide code rather than admit that they didn’t have it?

  22. tty
    Posted Nov 23, 2009 at 3:54 PM | Permalink

    I can understand why HadCRU isn’t that keen on studying the climatic effects of changes in cloudiness. According to Harry, the code to create the gridded cloudiness data has been lost for good, and was undocumented, so they can’t recreate it. Up to 1995 they only have the results file. After 1995 they use a different procedure that calculates cloudiness from sunshine data.
    I suppose they could start from scratch, but it would look odd if they got a different result from the published one the next time around.
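    The sunshine-to-cloud idea can be illustrated with a toy complement relation – emphatically not CRU’s actual procedure (which would have used empirical regressions), just the underlying intuition that observed sunshine fraction and cloud cover trade off:

```python
# Toy illustration only, not CRU's actual sunshine-to-cloudiness method:
# the smaller the fraction of the possible sunshine hours actually
# observed, the cloudier the month is assumed to have been.

def cloud_percent(sun_hours, daylength_hours):
    """Naive cloud-cover estimate (%) as the unsunny fraction of the day."""
    frac = min(max(sun_hours / daylength_hours, 0.0), 1.0)
    return 100.0 * (1.0 - frac)

print(cloud_percent(4.0, 10.0))   # → 60.0
print(cloud_percent(10.0, 10.0))  # → 0.0
```

    The clamping guards against recorded sunshine exceeding the theoretical day length, which happens in real observational data.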

  23. Håkan B
    Posted Nov 23, 2009 at 4:54 PM | Permalink

    Just a comment – possibly someone has already noticed. In my downloaded file, all documents in the mail folder have the timestamp 2009-01-01 06:00:00, the same as some files in the documents folder, including HARRY_READ_ME.txt. Does this lead us somewhere? I do see that the files in the documents folder with this timestamp seem to be the most interesting, but maybe I’m just fooled by this observation. greenpeace.txt seems to be a mail which I can’t find in the mails folder. I find it very hard to believe that someone actually was at work at UEA at that very moment.

  24. Håkan B
    Posted Nov 23, 2009 at 5:53 PM | Permalink

    An addition: I just had a look at the files in Windows XP, where the timestamps are 2009-01-01 00:00; on my Linux box they are 2009-01-01 00:06 – an interesting difference. My timezone is CET, Stockholm, Sweden.

  25. Alexander Harvey
    Posted Nov 23, 2009 at 6:48 PM | Permalink

    “Gizza’ job!” “I can do that!” (Yosser Hughes, Boys from the Blackstuff)

    Seriously it is a bit of a trip down memory lane.

    Real programmers don’t do comments, and they speak in Octal.
    Any comments that do exist would have been written because the author lacked confidence, didn’t know what he was doing, and would have been written first and then ignored.
    Some comments were useful. One that comes to mind: “You won’t understand this bit!”
    Trust the code, not the programmer.
    Compilers exist, so that you can bin the source code.
    If your program doesn’t do what it ought, run it on another computer, or in the middle of the night.
    But if it is an old object try to find the computer the programmer had hand-wired with that magic extra instruction.

    There are reasons why all that changed: primarily the rise of the IT manager, the introduction of the career path, and the invention of specialities – the death of the programmer as renaissance man. But it was a hoot!

    I should like to say that the commentary in the readme reads like something out of the 70s, but I guess it is a bit more up to date than that.

    It seems that he is re-working what was meant to be a one-off but had turned into a two-off; it is not software that was ever intended to be handed over to operations. When you are done you are done.

    Unfortunately, stuff like that is never intended for the level of scrutiny it is now going to get. That does not make it bad per se. Maybe he will never get it to do what it did last time, but hey, maybe last time wasn’t right either.

    But the Big Question is: Fit for purpose?

    Well, yes and no – the purpose has magically changed. It was fit for purpose when the output was of purely academic interest; as part of a mission to change the world, sadly not.

    I very much doubt that Harry’s situation is unique, or that his fubar post-holocaust wreckage of a system is as bad as it could be. Also, do not blame anyone for not being a professional. The original versions of all the great classic software systems were written by non-professionals. And I might say that during my career, whenever anyone claimed to be a professional I reminded him/her that we did not belong to a professional body, we could not be struck off, defrocked, or court-martialed, and that when we walked away we just left the faint echo of jangling spurs.

    From what little I have read, he does know what he is doing, in that he does understand the objective. That is very 1970s/80s: pick people who understand the problem over people who know nothing but computing. It used to be said like this: “If I need to explain it all to you, it would be quicker to do it myself.” His is not a huge software project; it is a man-and-his-dog effort. Also it is not necessarily something that can be pondered about too much in advance. It is just an old-fashioned can of worms that must be swallowed one worm at a time. To his credit he has provided a commentary, a narrative that could be of help the next time – if the next programmer bothers to read it, which he/she may not.

    So all in all, I can only see this as a lack of trust thing.
    Can someone working in this fashion come up with good results? Yes.
    Should his boss trust him to do so? Yes
    Should the big boss trust the boss? Yes
    Should a politician trust the big boss? Yes
    Should we trust the politician? ….

    Err NO!

    So with the introduction of the last link, a lack of trust cascades back down the line; the little guy gets the red face, and the rest of them get to point a lot of fingers both up and down the chain. So who is to blame for their distress? Well, who brought this house of cards down?

    Well I think that is obvious.

    Steve, take a bow.

    Many Thanks


    • Håkan B
      Posted Nov 23, 2009 at 7:51 PM | Permalink

      All very true. I really feel for this guy Harry – he didn’t invent this, he’s the guy who was appointed to carry on someone else’s work, someone who really seems to have messed things up. To make it worse, he’s in the wrong business: what other business would demand a steady, yearly growth from the IT system? ‘Hey, we had a revenue of 500 million bucks last year and have had a steady 5% increase for the last 10 years – the new system has to keep up with that!’ Poor Harry!

  26. andrew
    Posted Nov 24, 2009 at 10:13 PM | Permalink


    Read it all

  27. Gary Luke
    Posted Nov 25, 2009 at 2:01 AM | Permalink

    Some of the numeric suffixes on filenames handled by Harry might give a clue to their dates. Near the start, the files in the directory beginning with ‘+’ are tmp.0311051552 – that’s almost 4pm on the 5th Nov 2003.

    Almost halfway through the file –
    Writing vap.0710241541.dtb
    and the next attempt at a run
    Writing vap.0710241549.dtb

    Reading other filenames, Harry was working on this patch on the 24th Oct 2007, using a master database from 18th Nov 2003 and an update from 11th Sept 2007.
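    That decoding is easy to sketch; the YYMMDDHHMM reading of the suffix is an inference from the examples in the log, not something documented anywhere in the code:

```python
from datetime import datetime

def decode_suffix(name):
    """Parse an intermediate-file name like 'vap.0710241541.dtb', whose
    numeric suffix appears to be a YYMMDDHHMM timestamp."""
    stamp = name.split(".")[1]
    return datetime.strptime(stamp, "%y%m%d%H%M")

print(decode_suffix("tmp.0311051552"))      # 2003-11-05 15:52:00
print(decode_suffix("vap.0710241549.dtb"))  # 2007-10-24 15:49:00
```

    Note that `%y` interprets two-digit years 00-68 as 20xx, which is fine for files written in the 2000s.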

  28. Fai Mao
    Posted Nov 25, 2009 at 2:02 AM | Permalink

    This actually looks like a programmer who was given faulty data and was told to make it fit.

    In any case the AGW people are in deep trouble because now it is clear that they were lying.

    I wonder if Harry was the hacker? Maybe he got tired of dealing with obvious lies and exported all of this.

  29. BruceL
    Posted Nov 25, 2009 at 5:04 AM | Permalink

    I have referred this (the HARRY file) to the UK PM’s office for comment, and also for a comment on Monbiot’s call for re-analysis and for the head of Jones. I doubt I will get a reply [or I will end up like poor Dr Kelly.]

  30. Gary
    Posted Nov 25, 2009 at 8:40 PM | Permalink

    Another interesting comment from Harry

    “Oh, sod it. It’ll do. I don’t think I can justify spending any longer on a dataset, the previous version of which was
    completely wrong (misnamed) and nobody noticed for five years.”

  31. makomk
    Posted Nov 26, 2009 at 8:42 AM | Permalink

    Robin Debreuil: the decline is a decrease in the temperature calculated by measuring tree growth since 1960 or so. The fudges appear intended to bring it in line with actual measured temperature, which has not declined.

  32. Posted Dec 5, 2009 at 3:31 AM | Permalink


    Think I found Mark.

9 Trackbacks

  1. […] appears to have been quite correct: the CRU’s bluster was hiding the fact that even they couldn’t understand their own climate models or data. However, as their internal documents show, they were being well-funded for making up scare stories […]

  2. […] now know that these scientists wrote programming notes in the source code of their own climate models admitting that results were being manually […]

  3. […] line stream-of-consciousness or stream-of-work log that has been discussed some elsewhere [1, 2, 3, […]
