Hansen Code

Technical Comments only.

163 Comments

  1. Not sure
    Posted Sep 8, 2007 at 5:23 PM | Permalink

    STEP0 completes successfully on a Gentoo Linux system. The result is a 43MB text file that looks to be in the same format as the v2.mean downloaded from GHCN.

    Notes: The FORTRAN compiler was gfortran 4.1.1. I used ksh-93 from Gentoo portage. In addition to adding “FC=gfortran” to your environment, you’ll have to add the current directory “.” to your PATH. Also, my version of tail did not like line 43 of do_comb_step0.sh. I had to change it to:

    tail -n+100 input_files/t_hohenpeissenberg_200306.txt_as_received_July17_2003 > t_hohenpeissenberg

    Note that that does not exactly “disregard pre-1880 data” as the comments say, because it still includes 1879.
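The off-by-one is easy to check: `tail -n +N` starts output at line N (1-indexed), so `-n+100` drops exactly 99 lines. Here is a small Python sketch of the same slicing (the 200-line stand-in list is hypothetical; where the 1880 boundary really falls depends on the station file's layout):

```python
# Stand-in for the Hohenpeissenberg file: 200 numbered lines (hypothetical).
lines = ['line %d' % n for n in range(1, 201)]

# tail -n +100 keeps everything from line 100 onward (1-indexed),
# i.e. it drops exactly the first 99 lines.
kept = lines[99:]

print(kept[0])                 # line 100 is the first line kept
print(len(lines) - len(kept))  # 99 lines dropped
```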

  2. Martin Å
    Posted Sep 8, 2007 at 5:32 PM | Permalink

    I also managed to complete STEP0 on an Ubuntu system. But STEP1 seems harder, since you have to compile some Python extensions first. I got some error messages doing that which I don’t have a clue how to deal with, and without the extensions the Python scripts won’t run. Anyone with better luck?

  3. Not sure
    Posted Sep 8, 2007 at 6:01 PM | Permalink

    RE: 111

    Yeah, I stumbled on the Python modules too. They were developed against an ancient version of Python (1.5). The “Makefile.pre.in” method is obsolete. I’m currently trying to build them using the newer “distutils” method, but I’m probably going to have to take a break for a while.

  4. bernie
    Posted Sep 8, 2007 at 6:09 PM | Permalink

    Martin #103. Thanks for posting the code. It looks like “Bad” is defined earlier. Also, what I see looks close to what Steve and others had fleshed out, namely the link to the long-term anomaly for the station. I still think we need to be sure we understand how they are combining the data. Some stations, like Praha Ruzyne #91, have adjustments over the time period of close to 3C.

    #91
    Doug
    Excellent question. I think one task is to home in on whatever leads to the classification of a site as rural versus non-rural, lit or unlit, since these classifications dictate what should be influencing what. Rural or unlit stations that should really be classified as non-rural, i.e., that have large potential UHI or microsite effects, could have a big influence on adjustments that are made or not made to a non-rural site.

  5. Posted Sep 8, 2007 at 9:00 PM | Permalink

    Here is the kind of thing that bothers me:

    GISTEMP_sources/STEP0/USHCN2v2.f:

    if(temp.gt.-99.00) itemp(m)=nint( 50.*(temp-32.)/9 ) ! F->.1C

    — Sinan
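For what it’s worth, the line does all its arithmetic in REALs (the literals 50. and 32. promote the /9 to real division), so the concern reduces to how NINT rounds the tenths-of-a-degree value. A Python sketch of the same computation (the function name is mine, not GISTEMP’s):

```python
def f_to_tenths_c(temp_f):
    """Mimic itemp(m) = nint( 50.*(temp-32.)/9 ): deg F to tenths of deg C.

    50.*(temp-32.)/9 is real arithmetic throughout; Fortran's NINT rounds
    half away from zero, which Python's round() (banker's rounding) does
    not, so it is emulated explicitly here.
    """
    x = 50.0 * (temp_f - 32.0) / 9
    return int(x + 0.5) if x >= 0 else -int(-x + 0.5)

print(f_to_tenths_c(32.0))   # freezing point -> 0 tenths of a degree C
print(f_to_tenths_c(33.8))   # 1.0 C -> 10 tenths
print(f_to_tenths_c(212.0))  # boiling point -> 1000 tenths
```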

  6. Not sure
    Posted Sep 9, 2007 at 1:02 AM | Permalink

    I created trivial setup.py files to get the Python C extensions to compile, but I guess the approach in #184 will work as well. Also, I used Perl to munge the .c files and fix the references to deprecated memory functions, e.g.:

    perl -p -i.pymalloc -e "s/Py_Malloc/PyMem_Malloc/g" EXTENSIONS/cartgrid/cartgridmodule.c

    I also found that lines 504 and 505 in stationstringmodule.c will cause my Python interpreter (2.4) to segfault. I commented them out, as they seem to be in error. Finally, I changed line 54 in v2_to_bdb.py to read “if key == 'id':”. There’s probably other stuff that I’m forgetting, but I’m still getting memory errors (*** glibc detected *** /usr/bin/python: free(): invalid pointer: 0x080e2c8c ***). This last error is also in stationstringmodule. It’s late and I’m going to bed. I’ll return to this tomorrow night if no one else has figured it out by then.

  7. Mark
    Posted Sep 9, 2007 at 5:32 AM | Permalink

    Not Sure (#191)

    To get rid of the glibc free() error, comment out the two Py{Mem}_Free calls near line 506 of stationstringmodule.c

    Mark.

  8. Not sure
    Posted Sep 9, 2007 at 9:12 AM | Permalink

    Mark (202), I did comment out those lines, as the Python docs make it clear that the result of PyString_AsString must not be deallocated. I was getting a different memory error before I did this.

    Dr. Motl (205) that will certainly work, but it will litter the code with useless comments which eventually could become a nuisance.

  9. Not sure
    Posted Sep 9, 2007 at 9:38 AM | Permalink

    The changes I’ve made should not affect the logic of the code. I just wonder if I can make it work on a low-cost Linux system. Sort of my personal crossword puzzle, if you will. I justify it to myself by thinking this will help make the logic accessible to those much smarter and actually versed in statistics, unlike me. It looks like this stuff was run on an AIX system, and not many people get time on those.

    Dr. Hansen certainly has nothing to fear from me, and I actually harbor an (unrealistic?) hope that my efforts will be useful to him and his team. I find it interesting that it seems most think any errors that are found will adjust the mean temperature downwards. I guess that is to be expected considering Dr. Hansen’s predictions of doom, and his evasiveness, but I am sure that the people on this site will be honest about any errors that underestimated the amount of warming, so I fail to see the harm done here.

  10. Posted Sep 9, 2007 at 10:26 AM | Permalink

    #223 I count 9854 lines of code, actually; only one order of magnitude less complex. Gavin is exaggerating.

    $ find . -type f -name '*.sh' -exec wc -l {} \; | awk '{s+=$1} END {print s}'
    297
    $ find . -type f -name '*.f' -exec wc -l {} \; | awk '{s+=$1} END {print s}'
    4239
    $ find . -type f -name '*.py' -exec wc -l {} \; | awk '{s+=$1} END {print s}'
    1167
    $ find . -type f -name '*.c' -exec wc -l {} \; | awk '{s+=$1} END {print s}'
    4151

    Total 9854
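Note that the unquoted globs (`-name *.sh`) will break if a matching file sits in the current directory; quoting them (`'*.sh'`) avoids that. The same tally can be sketched in Python (the extension list is taken from the commands above; the path is hypothetical):

```python
import os

def count_lines(root, extensions):
    """Total line counts per file extension under root, like find | wc -l | awk."""
    totals = {ext: 0 for ext in extensions}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            for ext in extensions:
                if name.endswith(ext):
                    # Count newline-terminated lines, as wc -l does.
                    with open(os.path.join(dirpath, name), 'rb') as f:
                        totals[ext] += sum(1 for _ in f)
    return totals

# e.g. count_lines('GISTEMP_sources', ('.sh', '.f', '.py', '.c'))
```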

  11. Not sure
    Posted Sep 9, 2007 at 11:30 AM | Permalink

    Mark (218) Thanks, that did it. Those lines looked suspicious to me too, but I really didn’t know what they do, so I left them alone. Unfortunately it’s still not running for me. The script drop_strange.py dies for me now, and I’m out of time for today.

    Out of curiosity, did you write setup.py scripts or did you use the approach in #184?

  12. Mark
    Posted Sep 9, 2007 at 12:17 PM | Permalink

    Not Sure (#243)

    Out of curiosity, did you write setup.py scripts or did you use the approach in #184?

    Hi,

    First thing I did was to install a separate copy of Python for use only by this program (Python 2.5.1).
    Changed the PATH to point to the new python.
    Modified the Makefile.pre.in file to remove the aix defines and change the shell to /bin/bash.
    Changed the #!/…./python path in the python scripts to point to the new python.
    Made the changes as described so far.
    And ran the scripts.

    If you build python without any CFLAG modifiers it will build with debugging symbols built in. This version can then be run under gdb, the gnu debugger (which is how I found the seg faults).

    Create a little parameter file with the arguments to use, i.e.:


    set args -d comb_records.py v2.mean_comb
    run

    and then execute by running gdb -x {parameter_file_name} /path/to/new/python. If anything goes wrong, the debugger will break and it should be possible to work out what has gone wrong.

    There are some other small changes I made but they should not affect the running of the code.
    Maybe a set of diffs posted to the google code site would help.

    Mark.

  13. Mark
    Posted Sep 9, 2007 at 1:04 PM | Permalink

    Hi,

    I was so busy talking about the C code that I should have mentioned that in drop_strange.py I changed the
    bsddb.hashopen() calls from .hashopen(NEW_BDB_NAME, 'nc') to .hashopen(NEW_BDB_NAME, 'n'), removing the c from the hashopen options arguments.

    hth,
    Mark.

  14. Mark
    Posted Sep 9, 2007 at 2:14 PM | Permalink

    Hi,

    For anyone interested, I have placed a patch at the following address:

    http://www.zen103403.zen.co.uk/patches/patch.html

    It should apply cleanly to a fresh extract of the GISTEMP_sources.tar.gz file (don’t forget to extract the EXTENSION.tgz file in the STEP1 directory).

    This should allow completion of steps 0 and 1 and part of step 2.

    Mark.

  15. 2dogs
    Posted Sep 9, 2007 at 2:51 PM | Permalink

    #39 – Use the PRE tag

  16. Martin Å
    Posted Sep 9, 2007 at 4:48 PM | Permalink

    Regarding the code.

    I have, with help from Mark’s patch (#268), been able to run STEP0 and STEP1 on data downloaded from NASA. I have only verified the two separate records and the combined record from Brno/Turany (611117230001), but all figures seem to match the version you can download from GISS. There is an artificial 0.1 degree lowering of the 1951-1986 part of the record, due to a missing value in the older record for December 1991. The seasonal averages also don’t add up when a value is missing. The seasonal means are not produced as an output from STEP1, though.

  17. steven mosher
    Posted Sep 9, 2007 at 4:54 PM | Permalink

    RE 286. VERY COOL.

    After STEP 1 the output should be GISS temp Dset=1: combining stations at the same location.

    PERHAPS a good test prog to write would be to compare GISS temp dset=1 (Steve has scraped that)
    with the output of step 1.

  18. anonymous
    Posted Sep 9, 2007 at 7:15 PM | Permalink

    quick programming note about python and division for those not familiar with the language:

    Float/Float -> Float

    Int/Int -> Int (not Float; the result is floored, i.e. rounded toward negative infinity)

    for example:

    >>> a = 2.0
    >>> b = 4.0
    >>> a/b
    0.5
    >>> a = 2
    >>> b = 4
    >>> a/b
    0

    Also, keep in mind that it will not be possible to tell from the source code the exact type of any given variable, as type is a runtime feature. That is, every time a variable is assigned, it (potentially) takes on a new value AND a new type.

  19. Mark
    Posted Sep 9, 2007 at 7:40 PM | Permalink

    Hi,

    I’ve just updated the page pointed to by my earlier message.

    There is a new patch that applies to a clean source tree that will run steps 0,1 and 2.

    I have started to encounter rounding/representation issues with the code.

    In STEP2 there is a source file called PApars.f. Code in this file caused an infinite loop because of a comparison between an integer cast to a float and a float expression.

    The line

    IF(FLOAT(N3).LT.XCRIT*(N3L-N3F+1.))

    was evaluating to true even when N3 and the RHS expression were supposedly equal (I logged them both having the value of 62 to 8 decimal places). I changed the code to the following


    IF((FLOAT(N3)+.00001).LT.XCRIT*(N3L-N3F+1.))

    and the STEP2 shell script ran to completion.

    I’m not sure of the effect this has on the final result; I don’t know enough Fortran to fully understand what algorithm the code is trying to implement. It appears to be checking the number of years needed.
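To illustrate the pitfall Mark describes, here is a minimal sketch in Python rather than Fortran (the numbers are illustrative, not taken from PApars.f): a float expression that is mathematically an integer need not compare equal to that integer, and a small epsilon guard like the +.00001 above masks the difference.

```python
# Illustrative sketch (Python, not PApars.f): a float expression that is
# mathematically an integer need not compare equal to that integer.
n3 = 3
rhs = (0.1 + 0.2) * 10   # mathematically 3.0, actually 3.0000000000000004

strict = float(n3) < rhs             # True, although the values are "equal"
guarded = (float(n3) + 1e-5) < rhs   # False once an epsilon is added

print(strict, guarded)
```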

    Mark.

  20. JerryB
    Posted Sep 10, 2007 at 8:10 AM | Permalink

    A question to those who have succeeded in running STEP0:
    where did you find a copy of the input file named USHCN.v2.mean_noFIL ?

  21. steven mosher
    Posted Sep 10, 2007 at 8:21 AM | Permalink

    334: JerryB here is the key

    get_offset_noFIL

    # get appropriate GHCN data -> fort.2
    grep ^425 ../../DOWN*/v2.mean > ghcn_us_new

    f77 dump.1980.f ; ln ghcn_us_new fort.1 ; a.out ; rm -f fort.1

    # get USHCN data -> fort.1
    cp -p /gcm/USHCN/DOWN*/USHCN.v2.mean_noFIL.gz .
    gunzip USHCN.v2.mean_noFIL.gz
    grep -v -e '-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999' USHCN.v2.mean_noFIL > xxx
    sort -n xxx > fort.1 ; rm -f xxx USHCN.v2.mean_noFIL
    f77 dif.ushcn.ghcn.2005.f ; a.out ; rm -f fort.[12] a.out
    mv fort.10 ushcn-ghcn_offset_noFIL

  22. Mark
    Posted Sep 10, 2007 at 8:27 AM | Permalink

    A question to those who have succeeded in running STEP0:
    where did you find a copy of the input file named USHCN.v2.mean_noFIL ?

    Hi,

    The only files I needed to obtain were v2.mean and hcn_doe_mean_data. After placing these two files in the input_files directory and uncompressing them the STEP0 process ran to completion.

    I obtained them both from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2.

    I am currently stuck in STEP3: trimSBBX.f is sometimes stopping with a seg fault (memory error) and sometimes stopping with an end-of-file condition. I don’t know enough Fortran to work out what’s going wrong.

    It’s the last process of STEP3; once that runs, it’s on to the final step.

    Mark.

  23. JerryB
    Posted Sep 10, 2007 at 8:44 AM | Permalink

    Re #334,

    steven,

    I saw that “key”, but I’m looking for the door that it fits. :-)

    Re #335,

    Mark,

    The hcn_doe_mean_data file can be found in the USHCN directory of the server
    to which you linked, and also on a CDIAC server, but that file does not
    include data with USHCN SHAP adjustments without FILNET adjustments. Without
    the USHCN.v2.mean_noFIL file, I would not expect you to get the same temperature
    numbers for USHCN stations that GISS gets.

  24. Jean S
    Posted Sep 10, 2007 at 9:36 AM | Permalink

    #330: Thanks! I suspect that the solution to the rounding thing is in the average function:

    data_row[n] = sums_row[n] / wgt

    where I suspect both of the quantities on the right are integers. This is why I’m interested in how exactly Python rounds int/int. Notice also that the offset (diff) seems to be rounded:

    diff = sum / wgt
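If both quantities really are Python 2 ints, the division floors, which differs from Fortran's truncation whenever the sum is negative. A sketch (the values are illustrative; Python 3's explicit // stands in for what Python 2's int/int did) of how that can shift an average by 0.1:

```python
import math

# Sketch: averaging two monthly anomalies stored in tenths of a degree C.
# Python 2's int/int floors; F77 integer division truncates toward zero.
sums = -5   # e.g. -0.2 C plus -0.3 C, in tenths (illustrative values)
wgt = 2

python2_avg = sums // wgt             # floor division, as Python 2 did for ints
fortran_avg = math.trunc(sums / wgt)  # F77-style truncation toward zero

print(python2_avg, fortran_avg)  # -3 vs -2: a 0.1 C difference in the average
```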

  25. JerryB
    Posted Sep 10, 2007 at 10:10 AM | Permalink

    The temperature data contained in the USHCN.v2.mean_noFIL file presumably
    came from the 3A lines of the hcn_shap_ac_mean.Z file at
    ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/OtherIntermediates/
    however, the format of those two files is apparently quite different, and
    converting a copy of one into a copy of the other could be quite a chore.

    Similar, but possibly slightly different, numbers could be found in the GHCN
    v2.mean_adj.Z file, which would seem to be more similar to the expected format,
    and with slight editing might be “close enough”, but not an exact match. Such
    an effort would also require extracting only the USHCN stations, and only
    data through year 2000.

    Alternate plan B would be to see if someone such as Reto Ruedy would put up a
    copy whence it could be downloaded.

  26. Stevan Naylor
    Posted Sep 10, 2007 at 10:25 AM | Permalink

    Re 328 – According to my 1982 Fortran 77 Reference Guide – which I amazingly managed to unearth from the family tech archives – “Integer division always truncates, rather than rounds.”

    Thus in F77: -2/3 = 0
    Not sure about Python though…

  27. Stevan Naylor
    Posted Sep 10, 2007 at 11:32 AM | Permalink

    RE 328 – I checked the Python online reference (here) and it appears that integer division involves the floor function:

    “The / (division) and // (floor division) operators yield the quotient of their arguments. The numeric arguments are first converted to a common type. Plain or long integer division yields an integer of the same type; the result is that of mathematical division with the `floor’ function applied to the result. Division by zero raises the ZeroDivisionError exception. “

    Thus: -2/3 = -1
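The two behaviors can be checked side by side in a quick sketch (math.trunc stands in for F77's truncation toward zero):

```python
import math

# F77 integer division truncates toward zero: -2/3 -> 0
f77_result = math.trunc(-2 / 3)

# Python integer division floors (rounds toward negative infinity): -2/3 -> -1
python_result = -2 // 3

print(f77_result, python_result)
```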

  28. steven mosher
    Posted Sep 10, 2007 at 11:40 AM | Permalink

    340… And did SBBXgrid execute successfully and output successfully?

  29. steven mosher
    Posted Sep 10, 2007 at 11:51 AM | Permalink

    340… SREAD…

    My bet is the offending variable in the SREAD call is ML… which is assigned
    from INFO(4).

    INFO(4) = length of record.

    In step 2 there were adjustments made to length of record. Maybe your fix there cascaded
    into a run-off-the-end-of-the-file error here.

    Might also explain the SEG error…

    Steve

  30. steven mosher
    Posted Sep 10, 2007 at 1:38 PM | Permalink

    re 362. No sweat.

    If need be we can force a solution to the SREAD in step 2 by hardcoding ML
    to a small number and seeing if we get rid of the EOF problem. That will
    let you continue while I look at the code. If restricting ML solves the
    EOF issue then we KNOW it’s tracked back to INFO(4), and I can see if that was
    updated in the file writes of Step2.

  31. Scott Lurndal
    Posted Sep 10, 2007 at 2:23 PM | Permalink

    Re #185, #192, et al.

    The usage of ‘Makefile.pre.in’ in the EXTENSIONS indicates that the extensions were built around autoconf. Makefile.pre.in
    is one of the output files from autoconf (the main one being configure). One would distribute the output from autoconf and the end user would then run configure to generate the Makefile(s) specific to the actual platform (AIX, Linux, Solaris, etc.) being used.

    Here, however, only the intermediate makefile (hand-customized for AIX) was distributed. Either the author didn’t understand how autoconf works, or the EXTENSIONS tarball is incomplete.

  32. Not sure
    Posted Sep 10, 2007 at 2:55 PM | Permalink

    Scott Lurndal (364): I didn’t use the Makefile.pre.in files at all. I wrote setup.py scripts that I’ll try to publish somewhere soon.

    The very first FORTRAN file in STEP2 is segfaulting for me. I uncommented the debug print at line 41 of text_to_binary.f, and here’s the output I get every time:


    524 131617103790000 100POTSDAM
    525 133617103810000 58BERLIN-DAHLEM
    525 133617103810001 58BERLIN-DAHLEM
    525 133617103810002 58BERLIN-DAHLEM
    525 134617103840005 49BERLIN-TEMPEL
    522 141617103930000 104LINDENBERG
    ./do_comb_step2.sh: line 18: 8602: Memory fault
    breaking up Ts.bin into 6 zonal files
    At line 20 of file split_binary.f
    Fortran runtime error: Bad unit number in OPEN statement
    trimming Ts.bin1-6
    200690003 OSTROV VIZE 4 9999 9999 9999 9999
    202920005 GMO IM.E.K. F

    The shell script has no error checking at all, which makes it very frustrating to use.

    Interestingly, my Ts.txt from STEP1 is 30759146 bytes long. It looks like Mark and I are diverging. I’ll try to get my stuff up on the web somewhere so that hopefully we can begin converging.

  33. steven mosher
    Posted Sep 10, 2007 at 4:07 PM | Permalink

    Thanks Steve.

    Currently, to my knowledge, Mark and “Not Sure” are the only guys trying to compile
    and run the code.

    What we need: guys who know Fortran, guys who know Python. And we suspect the
    distribution was hosted on AIX.
    If you know Fortran or Python, start learning the code as the guys trying to compile
    do their thing. Remote debugging via blog!

    Plus guys like JerryB, who understand all the data that is out there (USHCN), are
    also a big bonus and a huge help.

    Guys.. Let’s recap the following for newcomers.

    MARK and “Not Sure”, recap your status:

    1. System config you have
    2. Links to patches
    3. Current status
    4. Open issues

  34. Not sure
    Posted Sep 10, 2007 at 5:03 PM | Permalink

    1. Gentoo Linux, gfortran 4.1.1, ksh-93, Python 2.4
    2. No patches yet. Getting to it
    3. Working on STEP2
    4. Need to start on STEP3

  35. Not sure
    Posted Sep 10, 2007 at 5:10 PM | Permalink

    OK, I resolved the text_to_binary.f issue by changing line 89 to read

    if (ix.gt.0.and.idatum.ne.9999) then

    STEP2 finished, and created the 12 files described in gistemp.txt.

    file to_next_step/Ts.GHCN.CL.*
    to_next_step/Ts.GHCN.CL.1: data
    to_next_step/Ts.GHCN.CL.2: data
    to_next_step/Ts.GHCN.CL.3: data
    to_next_step/Ts.GHCN.CL.4: data
    to_next_step/Ts.GHCN.CL.5: data
    to_next_step/Ts.GHCN.CL.6: data
    to_next_step/Ts.GHCN.CL.PA.1: data
    to_next_step/Ts.GHCN.CL.PA.2: data
    to_next_step/Ts.GHCN.CL.PA.3: data
    to_next_step/Ts.GHCN.CL.PA.4: data
    to_next_step/Ts.GHCN.CL.PA.5: data
    to_next_step/Ts.GHCN.CL.PA.6: data
    to_next_step/Ts.GHCN.CL.PA.station.list: ASCII text
    to_next_step/Ts.GHCN.CL.station.list: ASCII text
    to_next_step/Ts.GHCN.CL.station.list_orig: ASCII text

    file to_next_step/Ts.GHCN.PA.*
    to_next_step/Ts.GHCN.PA.1: data
    to_next_step/Ts.GHCN.PA.2: data
    to_next_step/Ts.GHCN.PA.3: data
    to_next_step/Ts.GHCN.PA.4: data
    to_next_step/Ts.GHCN.PA.5: data
    to_next_step/Ts.GHCN.PA.6: data
    to_next_step/Ts.GHCN.PA.station.list: ASCII text

    ls -lh to_next_step/Ts.GHCN.PA.*
    -rw-r--r-- 1 jose users 770K Sep 10 16:08 to_next_step/Ts.GHCN.PA.1
    -rw-r--r-- 1 jose users 13M Sep 10 16:08 to_next_step/Ts.GHCN.PA.2
    -rw-r--r-- 1 jose users 2.2M Sep 10 16:08 to_next_step/Ts.GHCN.PA.3
    -rw-r--r-- 1 jose users 1.3M Sep 10 16:08 to_next_step/Ts.GHCN.PA.4
    -rw-r--r-- 1 jose users 1.2M Sep 10 16:08 to_next_step/Ts.GHCN.PA.5
    -rw-r--r-- 1 jose users 64K Sep 10 16:08 to_next_step/Ts.GHCN.PA.6
    -rw-r--r-- 1 jose users 510K Sep 10 16:08 to_next_step/Ts.GHCN.PA.station.list

    As an aside, the -fbounds-check argument to gfortran can be very useful.

    (Look Ma, I’m hacking FORTRAN!)

  36. Paul Linsay
    Posted Sep 10, 2007 at 6:25 PM | Permalink

    #335, Mark, in Hansen Frees the Code

    I am currently stuck in STEP3, trimSBBX.f is sometimes stopping with a seg fault (memory error) and sometimes stopping with an end of file condition.

    This is usually the sign of an array index that is out of bounds.

  37. scott lurndal
    Posted Sep 10, 2007 at 6:33 PM | Permalink

    #3

    If monthly(m) has a negative or zero value, one must wonder if the read at line 83 read the correct values. What was the value that was causing the memory fault, zero or a negative value?

  38. ural
    Posted Sep 10, 2007 at 8:01 PM | Permalink

    From what I gather, the data changes day by day for some of the inputs. It would be nice to freeze the input on a certain date, just for testing purposes.

    I would be happy to help on reversing the time stamps on the data sets: 1960-2000, with 2000 now 1960 and 1960 now 2000. It should show global cooling at the same rate as global warming, if the assumptions are correct.

  39. Andy
    Posted Sep 10, 2007 at 8:06 PM | Permalink

    Re #374, I think Geoff’s right about the excitement factor in play here. I set up a repository at code.google.com/p/open-gistemp yesterday and have seen zero interest.

    I’m going to use it for my own hacking, and I’ll do my best to incorporate the great work being done by Not Sure, Mark and the rest on the technical thread, with references back to the applicable posts.

    Anyone who wants to join in the effort is more than welcome.

  40. steven mosher
    Posted Sep 10, 2007 at 8:30 PM | Permalink

    RE 4…

    Ya, thanks. That’s the path we are headed down. I think we have an index walking off the end
    of an array/file…

    If you know Fortran… can we do any exception handling on the EOF error or are we resigned
    to “printf” debugging?

  41. steven mosher
    Posted Sep 10, 2007 at 8:36 PM | Permalink

    re 3. OK… I’ll check the code and see if the change is substantial.

    DID you implement the fix that Mark had for the line that was causing the infinite loop?

    The line

    IF(FLOAT(N3).LT.XCRIT*(N3L-N3F+1.))

    was evaluating to true even when N3 and the RHS expression were supposedly equal (I logged them both having the value of 62 to 8 decimal places). I changed the code to the following

    IF((FLOAT(N3)+.00001).LT.XCRIT*(N3L-N3F+1.))

  42. Paul Linsay
    Posted Sep 10, 2007 at 8:54 PM | Permalink

    #7, there is no such thing as exception handling in FORTRAN 77 or earlier; the END= and ERR= specifiers on READ/WRITE are your only choice.

  43. Not sure
    Posted Sep 10, 2007 at 9:27 PM | Permalink

    Scott Lurndal (5), the negative value was not monthly(m); it was “ix”, because the data contained some years before 1880, for some reason.

    Steven Mosher (8) I did apply Mark’s fix, but I wasn’t even making it to that code. I was thinking about trying it without, just to see.

    Paul Linsay (4) The segfault I was encountering was an index out of bounds condition. Compiling the file with -fbounds-check demonstrated this.

  44. Not sure
    Posted Sep 10, 2007 at 9:45 PM | Permalink

    Steve (43) I shore as heck can’t find it. I’ve uploaded my result from STEP0 here

  45. Andy
    Posted Sep 10, 2007 at 10:46 PM | Permalink

    Re #44. Results of a diff between my STEP0 and yours:

    diff v2.mean_comb-andy v2.mean_comb-notsure
    1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
    6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
    6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999

    These 3 records are the only differences between the 2 files.

    My system is CentOS 4.5, pdksh 5.2.14-30.3, gfortran 4.1.1-53.EL4, python 2.3.4-14.4. No modifications to the GISS code were required for me to complete STEP0, and I downloaded the 2 input files in the STEP0_README.txt file today.

  46. Andy
    Posted Sep 10, 2007 at 10:53 PM | Permalink

    Re #45 – boy, that sure posted differently than it looked in the preview. Let’s try that diff output again:

    diff v2.mean_comb-andy v2.mean_comb-notsure
    Andy’s row = 1, Not Sure’s row = 2

    1-1016035500022007 137 145 139 172 202 230-9999 264-9999-9999-9999-9999
    2-1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
    1-6244025000012007 75 108 132 169 254 280 301 300-9999-9999-9999-9999
    2-6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
    1-6244031000012007 65 99 119 162 232 251 259 266-9999-9999-9999-9999
    2-6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999

  47. Not sure
    Posted Sep 10, 2007 at 10:57 PM | Permalink

    OK, I got STEP3 to run to completion, but I’m nervous about what I had to do. Starting at line 401 it now reads:


    SUBROUTINE GRIDEA (p_NRM,TITLEZ, p_IBMM,p_NZS)
    C****
    C**** This output grid dependent routine sets the parameters
    C**** IBM(J) (= number of regions in latitude belt J) and
    C**** KZONE(J,N) (=1 if belt J belongs to domain N) and the TITLES
    C****
    C**** Current order of boxes: 1-4 north , west->east ...
    C**** . to
    C**** 77-80 south , west->east
    C**** Order of subboxes: 1-10 south , west->east ...
    C**** . to
    C**** 91-100 north , west->east
    C**** Sergei's ordering 1-10 west , south->north
    C**** . to
    C**** 91-100 east , south->north
    C****
    INTEGER p_NRM, p_NZS, p_IBMM
    PARAMETER (NRM=80,ICM=10,JCM=10, NZS=3, NCM=ICM*JCM)
    CHARACTER*80 TITLEZ(*),TITLES(1+NZS)/
    * ' LATITUDE BELT FROM XX.X ? TO XX.X ?',
    * ' NORTHERN LATITUDES: 23.6 N TO 90.0 N',
    * ' LOW LATITUDES: 23.6 S TO 23.6 N',
    * ' SOUTHERN LATITUDES: 90.0 S TO 23.6 S'/
    COMMON/GRIDC/IBM(8),KZONE(8,NZS+3)
    C****
    C**** Sergej's equal area grid
    C****
    C**** Grid constants for latitude zones
    REAL BANDS(9)/90.,64.2,44.4,23.6,0.,-23.6,-44.4,-64.2,-90./
    INTEGER NUMJ(8)/4,8,12,16,16,12,8,4/
    C**** Check grid dimensions
    IF(p_NRM.NE.NRM.OR.NZS.NE.p_NZS) THEN
    WRITE(6,'(4I9)') p_NRM,NRM, p_NZS,NZS
    STOP 'ERROR: INCONSISTENCY FOR NRM NCM NZS'
    END IF
    C**** Loop over all ZONES
    DO 50 J=1,8
    IBM(J)=NUMJ(J)
    IF(IBM(J).LE.p_IBMM) GO TO 20

    For some reason the integer parameters p_NRM (originally called “NRM$”), p_IBMM (originally called “IBMM$”), and p_NZS (originally “NZS$”) were getting converted into wonky floats until I added the “INTEGER p_NRM…” line. Also, the original parameter names caused gfortran to raise syntax errors.

  48. Not sure
    Posted Sep 10, 2007 at 11:06 PM | Permalink

    D’oh, the file I modified in #47 is zonav.f

  49. Ian McLeod
    Posted Sep 10, 2007 at 11:08 PM | Permalink

    Looks like the diff is real. Andy is getting another month compared to Not Sure. Is that an integer or rounding problem?

  50. Ross Berteig
    Posted Sep 11, 2007 at 12:43 AM | Permalink

    Re 47: My rusty mental file cabinet immediately tossed up the phrase “The dollar sign is reserved to Digital”. (A big chunk of my early programming experience was with a PDP-11/73 and a VAX-11/780 in assembler and C. Incidentally, I still have that PDP-11, and it still runs.) In a fair number of older languages the dollar sign is an acceptable part of an identifier along with the usual alphanumerics and underscore, and was often used much like underscore is by C folk.

  51. David
    Posted Sep 11, 2007 at 12:53 AM | Permalink

    A dollar sign at the end of a variable name was used to denote strings in BASIC: A$="HELLO"

  52. Mark
    Posted Sep 11, 2007 at 2:20 AM | Permalink

    Not Sure (#47)

    Are you sure that all steps of STEP3 are completing?

    I’m still getting a memory error while running trimSBBX. I haven’t had a chance to look since yesterday; I’m just picking it back up again now.

    I made similar changes to yours for zonav.f and managed to get it to run. I just made my changes the same as yours, but the memory error while running trimSBBX still persists (I thought that the data types in zonav may be causing the errors with trimSBBX).

    For reference I’m developing on Linux using GNU Fortran (GCC) 4.2.1 and Python 2.5.1. I installed a Korn shell to run the scripts (mksh).

    I’m also using ftnchek to give a quick heads-up on possible Fortran problems.

    My current patch is available from here

    I’m using v2.mean from here, but I’ll switch to a better one once all the steps are running.

    My current sticking points are:

    1. trimSBBX is stopping with either a memory fault or an end-of-file condition. It appears to be random and I suspect it’s an issue with data types.

    2. Assembling data for Step4_5. I’ve downloaded data from hadobs, but I can’t access ftp.emc.ncep.noaa.gov; the connection times out. STEP4 is looking for SBBX.HadR2, but I cannot see where this file is created. Maybe I should read the manual :))

    Mark.

  53. Cliff Huston
    Posted Sep 11, 2007 at 5:06 AM | Permalink

    #52 Mark,

    I can’t access ftp.emc.ncep.noaa.gov; the connection times out.

    Try ftp://ftp.emc.ncep.noaa.gov/cmb/sst/oimonth_v2/

    Cliff

  54. H
    Posted Sep 11, 2007 at 5:29 AM | Permalink

    re # 47

    “For some reason the integer parameters p_NRM (originally called “NRM$), p_IBMM (originally called IBMM$), and p_NZS (orig NZS$) were getting converted into wonky floats until I added the “INTEGER p_NRM…” line. Also, the original parameter names caused gfortran to raise syntax errors.”

    I recall that parameters with $ were system or shell parameters, like the name of the OS or the size of the memory.

  55. Cliff Huston
    Posted Sep 11, 2007 at 5:53 AM | Permalink

    Mark,

    STEP4 is looking for SBBX.HadR2 . . .

    You probably have this figured out by now, but: SBBX.HadR2 is the output from step 4, not an input to step 4.

    Cliff

  56. Mark
    Posted Sep 11, 2007 at 6:13 AM | Permalink

    #(53,54,55)

    Hi,

    Thanks for your comments, they are all helping.

    After downloading the monthly SST data I started having real problems reading it. I then noticed a test program in the README and tried that. It turns out that you have to compile with the option “-fconvert=swap” when using gfortran.
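The flag matters because the SST binaries were presumably written on a big-endian machine (AIX/POWER) and are being read on little-endian x86; -fconvert=swap byte-swaps unformatted records on the way in. The effect of reading without swapping, sketched with Python's struct module:

```python
import struct

# The same four bytes interpreted with and without byte swapping.
raw = struct.pack('>i', 1)   # big-endian record: 00 00 00 01

as_big = struct.unpack('>i', raw)[0]     # 1 (what -fconvert=swap recovers)
as_little = struct.unpack('<i', raw)[0]  # 16777216 (garbage without swapping)

print(as_big, as_little)
```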

    I’m now starting again from STEP0 using this option on all the fortran code to see if this changes the end result.

    Mark.

  57. Not sure
    Posted Sep 11, 2007 at 6:53 AM | Permalink

    Mark, yes, all of STEP3 is working for me.

    Did you just create an empty “SBBX.HadR2” file? That’s what I was going to try.

  58. Not sure
    Posted Sep 11, 2007 at 7:00 AM | Permalink

    Ross Berteig (50) I figured it was some sort of obsolete syntax, but it’s nice to be sure. Thanks.

  59. Not sure
    Posted Sep 11, 2007 at 7:31 AM | Permalink

    After reading some more, it seems that “SBBX.HadR2” should come from the hadobs site, and that it should be one of the ASCII format files. But which one? There’s no single file that covers the period they say they used (1870-present), and there are none with that name.

  60. Posted Sep 11, 2007 at 7:46 AM | Permalink

    If I produce variable type statements for the Fortran routines, can you use them? How would I get the modified code to you? I have an online place where I could leave the modified routines. I have the STEP0 routines done if you want to see examples.

  61. steven mosher
    Posted Sep 11, 2007 at 7:48 AM | Permalink

    re 60.

    Dan, good to see you here! and thanks for helping

  62. James
    Posted Sep 11, 2007 at 8:05 AM | Permalink

    Re #47: Learn some FORTRAN. Variables with first character I through N (as in INteger) are integers, and all other starting characters become floats. Good luck.

  63. Posted Sep 11, 2007 at 8:09 AM | Permalink

    Concerning the Praha-Ruzyně (airport) data, see especially bernie’s fascinating comment #91 at

    http://www.climateaudit.org/?p=2018

    Let me just point out one trivial observation. There is no “periodicity” of 3-4 years (one that would be related to leap years or anything like that). The reason why these lines are organized into periods that are 3-4 years long is simply that they want to make an adjustment that increases continuously from -2.6 Celsius in 1909 to 0 in 2007. If you kindly allow me to give my own interpretation, they want to add a 2.6 degree Celsius change to the trend in the last century, at least to the December values.

    The precise adjustment that should be applied to a particular December is a linear function of time, and it is eventually rounded to 0.1 Celsius accuracy. That’s why the years end up being clumped into groups of 3-4 years. Does anyone understand what these corrections mean [snip]? Incidentally, if the trend is only added to one month, I would expect the overall trend to be artificially increased by 2.6/12 = 0.2 Celsius degrees (per century) or so; do I understand it well?
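    To sanity-check this arithmetic, here is a toy reconstruction in Python (my guess at the scheme from the numbers above, not anything taken from the GISS code): a linear ramp from -2.6 °C in 1909 to 0 in 2007, rounded to 0.1 °C, really does clump the years into runs of mostly 3-4:

```python
def adjustment(year, start=1909, end=2007, total=-2.6):
    """Hypothetical ramp: `total` at `start`, 0 at `end`, rounded to 0.1 C."""
    return round(total * (end - year) / (end - start), 1) + 0.0  # +0.0 normalizes -0.0

adjs = [adjustment(y) for y in range(1909, 2008)]

# group consecutive years that share the same rounded adjustment
runs = []
for a in adjs:
    if runs and runs[-1][0] == a:
        runs[-1][1] += 1
    else:
        runs.append([a, 1])

print(adjs[0], adjs[-1])        # -2.6 0.0
print(max(n for _, n in runs))  # 4
```

    The slope is 2.6/98 ≈ 0.027 °C per year, so each 0.1 °C rounding bin spans about 3.8 years, which is exactly the clumping described above.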

    If I understand the signs of the adjustments well, they can hardly be justified by e.g. the urban heat island because the sign is wrong. It is the recent temperatures that should be reduced if you believe that they’re high because of urbanization.

    One comment related to the “comment” section of Praha-Ruzyne: they say “WWII & Russian occupation”. Let me say that certainly no Czech person would use the phrase “Russian occupation” for anything that happened in the WWII years. We were liberated in 1945, not occupied. ;-) But that’s probably one of the less serious problems in the file.

  64. Posted Sep 11, 2007 at 8:20 AM | Permalink

    Let me just say one thing. I am not accusing anyone of anything, just making a thought experiment. See how temperatures and precipitation depend on the month (in Prague).

    If I wanted to make fraud to support a theory that there is warming and it is primarily affecting winters, the best way would be to make old winters look colder. December and January are the two coldest months and the temperature is minimized near these two months – it is stationary so that it looks pretty much constant. So if you reduce the temperature in December, you will create colder winters in 1909 which is precisely what you want if you want to show that global warming makes winters milder. Moreover, the points near the stationary points – in this case, months near the coldest December or January – are the most natural months to be modified because you don’t create too big discontinuities.

    These corrections look bizarre to me. It would be good if someone tried to give us a less criminal explanation of what’s going on, as well as a more comprehensive analysis of which stations besides Praha-Ruzyně were treated in a similar way.

  65. Richard Sharpe
    Posted Sep 11, 2007 at 8:24 AM | Permalink

    James says:

    Re #47: Learn some FORTRAN. Variables with first character I through N (as in INteger) are integers, and all other starting characters become floats. Good luck.

    Sigh. A little knowledge is a dangerous thing. What about IMPLICIT?

  66. Jaye
    Posted Sep 11, 2007 at 8:29 AM | Permalink

    I did comment those lines as the Python docs make it clear that the results of PyString_AsString must not be deallocated. I was getting a different memory error before I did this.

    Ok guys, if you get different results based on remapping memory by commenting out lines, then at least one memory bug is living somewhere else and has not been found yet. I suggest somebody run the thing with valgrind in full verbose mode.

  67. Posted Sep 11, 2007 at 8:31 AM | Permalink

    Mark #14, wouldn’t it be easier for you and others to publish the full patched tar.gz archive of the full code and data?

  68. Mark
    Posted Sep 11, 2007 at 8:36 AM | Permalink

    wouldn’t it be easier for you and others to publish the full patched tar.gz archive of the full code and data?

    I guess there would be no problem; it’s not that large.

    I’ll make up a full archive of my changes and post it at the same page as my patch.

    Mark.

  69. Jean S
    Posted Sep 11, 2007 at 8:42 AM | Permalink

    I think I’m able to perfectly replicate the GISS station combining method (STEP1). The key thing for extending the 2-station method to more stations is simply to always create an “intermediate reference station” for comparison with the new records. The final combination is then simply the average of the shifted records. Additionally, there seems to be no rounding issue after all. I guess rounding problems arose because previously we tried to replicate with rounded annual means (=> rounded bias). My code seems to replicate perfectly all the stations I’ve tested so far (Bagdarin, Praha, Joensuu, Gassim, Bratsk).

    A request: please check the workings of this code for other stations. I’ve spent already too much time with this, I should take a break :) Also, if there are people out there who are fluent with R, it would be nice to have the code also in R :)

    I’ve sent my code to Steve, so I might put it up somewhere if it is too hard to copy from here. Here’s the main function:

    --- cut here ---

    function combined=GISScombine(temperatures);

    % combines stations ala GISS
    % by Jean S (9/2007)
    % assumes integers; and full years starting in Jan

    % Minimum overlap for combining
    min_overlap=4;

    [years,stations]=size(temperatures);

    % 1st station is assumed to be the "best"
    fixed_stations(:,1)=temperatures(:,1);
    no_fixed=1;
    ref_station=temperatures(:,1);

    % annuals for all stations
    for k=1:stations;
    annual(:,k)=GISSannual(temperatures(:,k));
    end

    available_stations=2:stations;
    % repeat until no new records to add
    while ~isempty(available_stations),
    [ref_annual,ref_seasonal]=GISSannual(ref_station);
    % Find the annual mean matches
    overlaps=repmat(isfinite(ref_annual),1,length(available_stations)) & ...
    isfinite(annual(:,available_stations));
    % Choose the longest match (if equal then the last maximum)
    [overlap_years,overlaps_loc]=sort(sum(overlaps));
    if overlap_years(end)

    Here's the function for the annual means:

    --- cut here ---
    function [annual,seasonal]=GISSannual(temperatures);

    % Calculates the GISS seasonal and annual means
    % by Jean S (9/2007)
    % Assumes full year input starting in Jan

    years=length(temperatures)/12;
    temp=reshape(temperatures,12,years)';
    seasonal=ones(years,4)*NaN;
    annual=ones(years,1)*NaN;

    % calculate the reference means and anomalies
    ref_mean=nanmean(temp);
    anomal=temp-repmat(ref_mean,years,1);

    % DJF
    seasonal(2:end,1)=mean([temp(1:end-1,12) temp(2:end,1:2)],2);
    season=[NaN anomal(1,1:2); anomal(1:end-1,12) anomal(2:end,1:2)];
    I=sum(isnan(season),2)==1;
    seasonal(I,1)=nanmean(season(I,:)')'+repmat(mean(ref_mean([1:2 12])),sum(I),1);

    % MAM
    seasonal(:,2)=mean(temp(:,3:5),2);
    season=anomal(:,3:5);
    I=sum(isnan(season),2)==1;
    seasonal(I,2)=nanmean(season(I,:)')'+repmat(mean(ref_mean(3:5)),sum(I),1);

    % JJA
    seasonal(:,3)=mean(temp(:,6:8),2);
    season=anomal(:,6:8);
    I=sum(isnan(season),2)==1;
    seasonal(I,3)=nanmean(season(I,:)')'+repmat(mean(ref_mean(6:8)),sum(I),1);

    % SON
    seasonal(:,4)=mean(temp(:,9:11),2);
    season=anomal(:,9:11);
    I=sum(isnan(season),2)==1;
    seasonal(I,4)=nanmean(season(I,:)')'+repmat(mean(ref_mean(9:11)),sum(I),1);

    % Annual
    annual=mean(seasonal,2);
    ref_mean_season=nanmean(seasonal);
    season=seasonal-repmat(ref_mean_season,years,1);
    I=sum(isnan(season),2)==1;
    annual(I)=nanmean(season(I,:)')'+repmat(mean(ref_mean_season),sum(I),1);
    --- cut here ---

    The input should be in integers such that every record is in a column. The first column is assumed to be the “best” record (which is not selected according to the data, but according to a preference list. See Steve’s post and Hansen’s code for details). Usually this is the only one extending to current times. Here’s an example call (Bratsk; I sent the file also to Steve):

    bratsk_data=load('bratsk.txt');
    bratsk_data(:,2:end)=10*bratsk_data(:,2:end);
    originals=bratsk_data(:,[6 2:5]);
    combined_bratsk=GISScombine(originals);
    max(abs(combined_bratsk-bratsk_data(:,7)))

    Have fun!
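    For those without Matlab, here is a bare-bones Python sketch of the core idea as described above. This is my paraphrase, not a port: integer storage, seasonal/annual means, and the minimum-overlap rule are all omitted. Each new record is shifted by its mean offset from the running reference over their common years, and the running reference is then recomputed as the average of everything combined so far:

```python
import math

NaN = float('nan')

def combine(records):
    """Combine station records a la GISS (bare-bones sketch).
    records: equal-length lists, NaN for missing; records[0] is the 'best' one."""
    shifted = [list(records[0])]
    ref = list(records[0])
    for rec in records[1:]:
        # mean offset of the new record from the current reference over the overlap
        pairs = [(r, s) for r, s in zip(ref, rec)
                 if not (math.isnan(r) or math.isnan(s))]
        bias = sum(r - s for r, s in pairs) / len(pairs)
        shifted.append([x + bias if not math.isnan(x) else x for x in rec])
        # the "intermediate reference station" is the average of all shifted records
        ref = []
        for col in zip(*shifted):
            vals = [v for v in col if not math.isnan(v)]
            ref.append(sum(vals) / len(vals) if vals else NaN)
    return ref

print(combine([[1.0, 2.0, 3.0, NaN], [NaN, 1.0, 2.0, 3.0]]))
# → [1.0, 2.0, 3.0, 4.0]
```

    The second record sits 1.0 below the first over their overlap, gets shifted up by that bias, and the combination then extends the series by one year.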

  70. Not sure
    Posted Sep 11, 2007 at 8:45 AM | Permalink

    James (62) You’d think they could’ve mentioned that here. I guess the right way to fix those parameter names is NRM_, IBMM_, etc.

  71. Jean S
    Posted Sep 11, 2007 at 8:45 AM | Permalink

    Ok, the first code block was cut due to the less-than sign. Let’s wait for Steve to post the code.

  72. Mark
    Posted Sep 11, 2007 at 8:46 AM | Permalink

    Hi,

    I have now put a tgz archive at http://www.zen103403.zen.co.uk/patches/patch.html

    This archive is the plain sources with my current patch applied.

    Mark.

  73. Steve McIntyre
    Posted Sep 11, 2007 at 9:00 AM | Permalink

    Jean S’s code is online at http://www.climateaudit.org/scripts/station/hansen/jeans/

  74. Not sure
    Posted Sep 11, 2007 at 9:03 AM | Permalink

    Andy, what version of tail do you have? I’m using GNU tail 6.4:

    tail --version
    tail (GNU coreutils) 6.4
    Copyright (C) 2006 Free Software Foundation, Inc.
    This is free software. You may redistribute copies of it under the terms of
    the GNU General Public License .
    There is NO WARRANTY, to the extent permitted by law.

    Written by Paul Rubin, David MacKenzie, Ian Lance Taylor, and Jim Meyering.

    You didn’t have to modify line 48 of “do_comb_step0.sh”?

    Also, I appreciate your efforts in setting up a Google project for us. I haven’t participated yet because I’m wondering if there’s going to be a “CA branded” project as some have talked about. I also think that git might be a better choice for source control than svn. With git we’ll be able to have several different patch sets concurrently. I like this because I’m focused on creating the smallest diff that will yield code that runs on a commodity Linux system. I’m not going to try to fix memory leaks or anything else. If the original code leaks, then hopefully my modified version will leak in exactly the same way. This is to preempt objections about the correctness of modifications. I can see how others would like to fix things as they find them. If we have multiple patch sets we can accommodate these two approaches simultaneously. If it turns out that disparate patch sets produce the same outputs for the same inputs they can be merged later.

  75. steven mosher
    Posted Sep 11, 2007 at 9:36 AM | Permalink

    re 47 and 65.

    U know Fortran? Download the dusty deck and get crackin. Pick a routine.
    document it.

  76. steven mosher
    Posted Sep 11, 2007 at 9:40 AM | Permalink

    RE 66.

    Somebody can run this through Pychecker and Valgrind while other folks try to get this
    running.

    Divide and conquer guys. Do what you are best at. report back.

    Herding cats isn’t that bad when they are smart cats.

  77. steven mosher
    Posted Sep 11, 2007 at 9:57 AM | Permalink

    RE 74.

    I think once somebody gets the pile to compile then it makes sense to mv to
    a more amenable environment. As it stands we are hacking the blog paradigm
    to do a remote coordinated debug.. or something like that.

  78. Andy
    Posted Sep 11, 2007 at 10:08 AM | Permalink

    Re #74, I didn’t have to modify line 48. Tail version as follows (from CentOS 4.5 / Redhat EL4).

    tail --version
    tail (coreutils) 5.2.1
    Written by Paul Rubin, David MacKenzie, Ian Lance Taylor, and Jim Meyering.

    Copyright (C) 2004 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    I agree with you on svn. Figured something was better than nothing, though.

    I was thinking this morning that I can probably best leverage my limited Python (and non-existent Fortran) skills by grabbing the changes you guys post that I need to get my platform to do a clean run and publishing them in the google code. I’m pretty far behind you since I had to build a new sandbox for this, but I can take the time to document until I catch up (if possible :-))

  79. James
    Posted Sep 11, 2007 at 10:21 AM | Permalink

    Re: #65, Richard

    What about IMPLICIT? If it isn’t used in the code and the comment I was replying to was confused about variables being floats, explaining IMPLICIT won’t help.
    A little knowledge can be dangerous, but almost zero FORTRAN knowledge will make this project really difficult.

  80. James
    Posted Sep 11, 2007 at 10:28 AM | Permalink

    Re: #65, Richard

    What about IMPLICIT? The original post didn’t understand why certain variables were floats, and my explanation clarified that. If IMPLICIT isn’t used in the code, why would I mention it?

    Yes, “a little knowledge can be dangerous”, but with almost zero FORTRAN knowledge this project can be a lot more difficult.

  81. Not sure
    Posted Sep 11, 2007 at 10:33 AM | Permalink

    OK, I’ve uploaded a clean working copy here. I tried to test it on a clean environment, but I ran into this, so YMMV.

  82. James
    Posted Sep 11, 2007 at 10:35 AM | Permalink

    Re #75 Mosher

    I’ll get on it but you won’t see me here. I’ll be on the Google Code site, working in SVN and using the wiki there. This blog does not lend itself well to development, and since it isn’t threaded, conversations are a real pain…

  83. Mark
    Posted Sep 11, 2007 at 10:47 AM | Permalink

    Hi,

    From what I can make out, the program named zonav, called by do_comb_step3.sh, attempts to use fort.11 as its input:
    CALL SREAD(11,ML ,AR(1,IB),WTR(1,IB),LENR(IB))

    But this file has been renamed by the previous operation in do_comb_step3.sh

    mv fort.11 BX.Ts.${label}.$rad
    so it can never run with valid data.

    zonav writes its output to fort.10 which is what trimSBBX is expecting to use, and I think that is why I couldn’t get trimSBBX to run.

    I am left thinking that these two operations may not actually be required for STEP4_5.

    Has anyone else looked at this yet?

    Mark.

  84. David
    Posted Sep 11, 2007 at 10:53 AM | Permalink

    People used variable names beginning with I-N for integers by convention. I don’t think they have to ‘become’ anything. Web search engines are your friend, do a search on: implicit fortran

  85. MarkW
    Posted Sep 11, 2007 at 11:06 AM | Permalink

    David,

    Fortran does not (or at least did not) require one to declare a variable before using it. Variables that started with the letters I-N were implicitly declared integers; everything else was implicitly declared float.

    This is another reason why Fortran variable names tended to be as short as possible. If you happened to misspell a variable, instead of raising a parse error, Fortran would just declare the misspelled name to be a new variable and continue happily on its way.

  86. Mark
    Posted Sep 11, 2007 at 11:32 AM | Permalink

    Hi,

    I think I’ve made some progress on STEP4.

    The elusive SBBX.HadR2 file is, I think, SBBX1880.Ts.GHCN.CL.PA.1200 (one of the files from STEP3) renamed.

    After downloading all the oiv… data files and unpacking them into the input_files directory, then renaming SBBX1880.Ts.GHCN.CL.PA.1200 to SBBX.HadR2 as described above, I am beginning to get somewhere (possibly somewhere wrong!).

    As I said earlier, the oiv data files are in big-endian format and therefore need to be accessed differently than the others.

    To get around this issue set the following environment variable (if you are using gfortran on an intel platform).

    GFORTRAN_CONVERT_UNIT='swap;native:1-10,12-99'

    This tells the Fortran runtime to use byte swapping on unit 11 only. Doing this will enable convert1.HadR2_mod4.exe to read the downloaded files.

    I am now able to run convert1.HadR2_mod4.exe and see the program appearing to work.

    Unfortunately it is now stopping with an end of file condition.

    Mark.

  87. Mark
    Posted Sep 11, 2007 at 11:43 AM | Permalink

    Hi,

    Lol, it’s just as easy to do

    GFORTRAN_CONVERT_UNIT='swap:11'

    Has the same effect.

    Mark.

  88. Not sure
    Posted Sep 11, 2007 at 11:51 AM | Permalink

    Mark (86) Interesting: step 5 requires both SBBX1880.Ts.GHCN.CL.PA.1200 and SBBX.HadR2. I guess this can be right, since SBBX.HadR2 is modified in step 4.

  89. steven mosher
    Posted Sep 11, 2007 at 1:04 PM | Permalink

    78. ANDY

    thats cool. I know SteveMc and John Goetz seem to be focusing on Step 1. And I have
    avoided the python since I’d just beclown myself ….

    A really cool thing would be a stand alone Step0 Step 1 Module. Basically
    Step 0 and step1 Take in all the land data and do a hansen dance and then output.

    A standalone EXE that did that sh*t would be huge.

    once verified we could assess the hansen Scribal dance.

  90. David
    Posted Sep 11, 2007 at 1:04 PM | Permalink

    #85: I stand corrected. It has been a while! It seems so crude nowadays. If I were a scientist demanding conciseness…

  91. Steve McIntyre
    Posted Sep 11, 2007 at 1:17 PM | Permalink

    #89. I’m more interested in replicating this in R (or Matlab) – in large part, so I can see what they are really doing. The blizzard of subscripts in Fortran and Python makes it very inelegant programming.

    I can see my way towards a Step 0 implementation already, although (as I observe in another post) it looks like one needs a hypothesized obsolete USHCN version to precisely replicate the results presently online. Their Y2K patch is really goofy and can be circumvented by using up-to-date USHCN versions.

    After I look through Jean S’s Matlab script a little more closely, I should be able to have a simple R implementation of Step 1 as well.

  92. steven mosher
    Posted Sep 11, 2007 at 1:21 PM | Permalink

    re 85.

    I’m having PTFS right now. Post Traumatic Fortran Syndrome.

    So Time out for a funny story.

    1987. I’m sitting in a class taught by Mason Woo. ( google him he’s cool)

    I’m sitting behind this DDG blond.

    Mason: how many of you write C?
    The whole frickin class raises their hand except the lovely lady.
    Mason: Miss, what language do you know?
    Babe: Fortran.

    At this point I lean over to my partner in crime and say.
    ” She may be cute, but she’s no rocket scientist”

    Then Mason asks everyone in the class what they do…blah blah
    blah blah…

    Miss Fortran babe: I program the guidance systems for ICBMs

    DOH!

  93. Richard deSousa
    Posted Sep 11, 2007 at 1:45 PM | Permalink

    RE #92: LOL…. so, did you submit your name for a date? :P

  94. steven mosher
    Posted Sep 11, 2007 at 2:09 PM | Permalink

    re 93.

    She wrote Fortran for christs sake. If I asked her to pass me a pointer she would
    have dissolved into tears and core dumped.

    Instead, I cruised El Camino and paid my tributes at Andy Capp’s tavern.

  95. Jean S
    Posted Sep 11, 2007 at 2:24 PM | Permalink

    The blizzard of subscripts in Fortran and Python makes it very inelegant programming.

    Yes, and at least the Python code I looked at was rather terrible. Not to mention the Fortran code (Step2), which is almost impossible to read, at least for a non-Fortran programmer. Anyhow, I have a feeling that the best pearls of Hansen’s code are still hidden somewhere in that Fortran spaghetti. If I had even a decent understanding of Fortran, I’d take a very careful look at padjust.f (which seems to contain the homogenization code) …

  96. David
    Posted Sep 11, 2007 at 2:51 PM | Permalink

    #94: She would have spent 80% of the time trying to debug you. :)

  97. steven mosher
    Posted Sep 11, 2007 at 2:53 PM | Permalink

    re 95.

    Thats the big mystery routine.

  98. SteveSadlov
    Posted Sep 11, 2007 at 3:11 PM | Permalink

    RE: #5 – With a quick scan of the thread, I did not see follow up comments. It bothers me too. I’ve got no issue with an F to C conversion, but it’s the reservation of an integer value that I cannot understand. That is, unless the goal is illogical built-in rounding via automatic truncation of everything to the right of the decimal point.

  99. Dave Dardinger
    Posted Sep 11, 2007 at 4:03 PM | Permalink

    I was reformatting padjust.f after pulling it up in Notepad to see how much of it I could still understand after decades of not having seen any Fortran. Most of it was pretty self-evident, but I have a couple of questions.

    1 What is the Calt which appears a couple of places after a “then”?
    2 What happens in a “call adj(….)”?

    I’d expect both of these to call a program external to padjust, but where are they?

    Thanks :)

  100. Not sure
    Posted Sep 11, 2007 at 4:10 PM | Permalink

    Andy (46) I ran STEP0 again on a different machine (gfortran 4.1.2, pdksh 5.2.14, AMD 64) and it gave me exactly the same file. I wonder if the version of tail is the difference. My work_files/t_hohenpeissenberg file starts in 1879 and ends in 2003.

  101. Andy
    Posted Sep 11, 2007 at 4:20 PM | Permalink

    Interesting – so does mine. Tar up your work files and put up a link to them. I’ll pull them down and diff them against mine.

  102. Stevan Naylor
    Posted Sep 11, 2007 at 5:03 PM | Permalink

    Dave [99] Anytime the Fortran compiler sees the character ‘C’ in the first “column” of a line of code, the whole line is treated as a COMMENT.

    Looks like alternate ‘IF()’ statements were commented out but left in. I wonder why…

  103. Not sure
    Posted Sep 11, 2007 at 5:06 PM | Permalink

    Andy, my work_files are here.

  104. Stevan Naylor
    Posted Sep 11, 2007 at 5:26 PM | Permalink

    Dave [99] ‘alt()’ is a subroutine which is coded at the end of the padjust.f source code. It claims to apply ‘linear approximation’ (so sez the comment) but it’s going to take a while to understand what it’s really doing.

  105. Dave Dardinger
    Posted Sep 11, 2007 at 5:40 PM | Permalink

    re: 102 Steve N

    Thanks!

  106. Dave Dardinger
    Posted Sep 11, 2007 at 5:48 PM | Permalink

    re: #104

    Well, duh! I was distracted by the C**** (which I did realize was a comment) and missed the “subroutine” above. Thanks again.

  107. Andy
    Posted Sep 11, 2007 at 8:27 PM | Permalink

    Re #’s 100, 101 and 103

    Differences are in v2.meany and v2.meanz

    diff andy notsure
    Binary files andy/cmb2.ushcn.v2.exe and notsure/cmb2.ushcn.v2.exe differ
    Binary files andy/cmb.hohenp.v2.exe and notsure/cmb.hohenp.v2.exe differ
    Binary files andy/dif.ushcn.ghcn.2005.exe and notsure/dif.ushcn.ghcn.2005.exe differ
    Binary files andy/dump_old.exe and notsure/dump_old.exe differ
    Binary files andy/hohp_to_v2.exe and notsure/hohp_to_v2.exe differ
    Binary files andy/USHCN2v2.exe and notsure/USHCN2v2.exe differ

    diff andy/v2.meany notsure/v2.meany
    1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
    6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
    6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999

    diff andy/v2.meanz notsure/v2.meanz
    1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
    6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
    6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999

  108. Not sure
    Posted Sep 11, 2007 at 10:24 PM | Permalink

    I bet you’re using a 32-bit system. That’s why all the executables differ. But the delta for the text output is exactly the same. Interesting.

  109. Not sure
    Posted Sep 11, 2007 at 10:47 PM | Permalink

    Mystery solved. I decided to start again from scratch, including re-downloading the data files and presto! I get the same file you do. I tracked the difference down to v2.mean:

    diff input_files/v2.mean ../../GISData/v2.mean
    70c70
    < 1016035500022007 137 145 139 172 202 230-9999 264-9999-9999-9999-9999
    ---
    > 1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
    533592c533592
    < 6244025000012007 75 108 132 169 254 280 301 300-9999-9999-9999-9999
    ---
    > 6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
    533786c533786
    < 6244031000012007 65 99 119 162 232 251 259 266-9999-9999-9999-9999
    ---
    > 6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999

    So that file has changed since I downloaded it on Saturday! It has a modification date of today!? At this point I’m really not sure (pun intended) of what we consider authoritative.

  110. H
    Posted Sep 12, 2007 at 12:06 AM | Permalink

    I looked through padjust.f. It seems like normal research code. Unfortunately :) Basically it reads input files and writes adjusted results to corresponding output files. If input data is marked bad it is skipped. The data is looped by year (starting from December) and the adjustment is calculated from yearly values. Then it is added to the monthly values for the given range. Months marked bad are skipped.

    What makes the result unpredictable are the year limits

    subroutine adj
    “———————-
    sl=sl1
    if (iy .gt. knee) sl=sl2

    iya=iy
    C-H Lower limit
    if (iy .lt. iy1a) iya=iy1a
    C-H Upper limit
    if (iy .gt. iy2a) iya=iy2a

    iadj=nint( (iya-knee)*sl-(iy2a-knee)*sl2 )
    ——————–“
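    In Python, the excerpt boils down to something like this. The variable meanings are my inference from the code: knee is the breakpoint year, sl1/sl2 the slopes before and after it, iy1a/iy2a the first and last years of the adjustment window; nint is Fortran’s round-half-away-from-zero:

```python
import math

def nint(x):
    """Fortran NINT: round half away from zero (Python's round() is round-half-even)."""
    return int(math.floor(x + 0.5)) if x >= 0 else int(math.ceil(x - 0.5))

def adj(iy, knee, sl1, sl2, iy1a, iy2a):
    sl = sl1 if iy <= knee else sl2      # slope switches at the knee year
    iya = min(max(iy, iy1a), iy2a)       # clamp the year into [iy1a, iy2a]
    return nint((iya - knee) * sl - (iy2a - knee) * sl2)

# the subtracted term pins the adjustment to zero at the final year iy2a
print(adj(2000, knee=1950, sl1=0.01, sl2=0.02, iy1a=1900, iy2a=2000))  # 0
```

    Note the clamping: years before iy1a all receive the iy1a adjustment, and years after iy2a receive zero, which matches the two-leg description in Hansen (1999) quoted below in this thread.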

  111. Ralph Becket
    Posted Sep 12, 2007 at 12:19 AM | Permalink

    When creating diffs, you may find the -u option to diff gives you more readable output.

  112. Posted Sep 12, 2007 at 3:10 AM | Permalink

    Re #91,

    Steve, I’m working on an R port too. Check out the page on Stikir.

    What do you make of this in antarc_comb.f?

    if(icc0.eq.iccp.and.idp.eq.id0) then ! should not happen
    write(*,*) 'ID already used',icc0,id0
    stop 4
    end if

    I read that as saying that there should be no station overlap between v2_mean and the antarc?.txt files. Unless I’ve made a mistake, I think there is.

  113. Jean S
    Posted Sep 12, 2007 at 3:41 AM | Permalink

    #110:

    the adjustment is calculated from yearly values.

    So how exactly is the adjustment calculated? It is described in Hansen (1999) as

    The homogeneity adjustment for a given city is defined to change linearly with time between
    1950 and the final year of data and to change linearly with a possibly different slope between 1950
    and the beginning of the record. The slopes of the two straight line segments are chosen to minimize
    the weighted-mean root-mean-square difference of the urban station time series with the time series of nearby rural stations.

  114. Mark
    Posted Sep 12, 2007 at 3:59 AM | Permalink

    Hi,

    Right from STEP0/antarc_to_v2.sh there is a possibility for error. When the shell script is extracting data it does a grep in the .list file using the station name as a key to extract the station id.

    ‘Faraday\Vernadsky’ (as it is defined in antarc1.txt) is given in antarc1.list as ‘FaradayVernadsky’.
    It can only be that the ‘\V’ is treated as an invalid backslash sequence and replaced with a ‘V’ for it to work.

    Mark.

  115. Posted Sep 12, 2007 at 6:10 AM | Permalink

    #109 Not sure

    “So that file has changed since I downloaded it on Saturday! It has a modification date of today!? At this point I’m really not sure (pun intended) of what we consider authoritative.”

    Does this look like someone is making corrections as we speak? Perhaps on the basis of reading this blog? Is Reto Ruedy making changes to the ‘simplified version of the code that real scientists would be interested in’ at the moment, and updating the files on GISS using it?

    KevinUK

  116. Steve McIntyre
    Posted Sep 12, 2007 at 6:12 AM | Permalink

    Could someone who’s compiled Step 1 pause for a moment and produce some intermediates for the Step 1 crossword puzzle? Could you do the Step 1 calculations for (say) Bratsk (22230309000)? This has 5 dset=0 series which are added one by one. Could you report on the results for each step so that I can benchmark replication in R (and confirm what is going on in this step)?

  117. Steve McIntyre
    Posted Sep 12, 2007 at 6:16 AM | Permalink

    #107. It looks like some 2007 values have been updated. Nothing untoward in that in a system that is producing monthly results.

  118. Scott
    Posted Sep 12, 2007 at 6:32 AM | Permalink

    Something looks wrong in GISTEMP_source/STEP0/USHCN2v2.f, line 18:

    itemp=-9999

    The problem I see is that itemp is declared as an integer array. The GNU Fortran 3.4.4 compiler halts on line 18 with an error. I assume the above assignment is supposed to initialize all indices of itemp to -9999.
    I fixed this by commenting out line 18 and adding the following at line 27:

    itemp(m)=-9999

    Any thoughts? Maybe someone who is not getting a compiler error can verify the assignment is properly initializing all indices of the itemp array.

  119. gb
    Posted Sep 12, 2007 at 7:12 AM | Permalink

    Re # 118.

    In Fortran 90 the statement itemp = -9999 initializes all array elements, I think; itemp(m) initializes only element m. Probably more correct is itemp(1:m) = -9999.

  120. Posted Sep 12, 2007 at 7:13 AM | Permalink

    Re my #112

    There is overlapping station data between SCAR and GHCN, but they are maintained as separate time series in v2_meanx using the duplicate field (character 12 in the ID, I think).

    Deduced from antarc_to_v2.sh:

    Duplicate number 7 is the australian data in antarc2.txt
    Duplicate number 8 is surface station data in antarc1.txt
    Duplicate number 9 is autom. weather stn data in antarc3.txt

  121. Posted Sep 12, 2007 at 7:21 AM | Permalink

    re # 118

    Absoft FORTRAN 77 Compiler ver. 8.2a (Mac OS X 10.4.10) also gives the error. A local DO/ENDDO loop can be used as a fix.

  122. Andy
    Posted Sep 12, 2007 at 7:23 AM | Permalink

    Re #’S 107 & 117, I had listed this mismatch as an issue on the Google code page and speculated that it may be due to different input files being used. Glad to know it’s resolved.

    Steve Mc, is it possible to host a frozen copy of the input files in the CA downloads area for testing?

  123. Trent
    Posted Sep 12, 2007 at 8:11 AM | Permalink

    Re #118,

    In FORTRAN 90 and 95, entire arrays can be assigned a value with a single statement of the form:

    itemp = -9999

    In FORTRAN 77, this was not possible. One had to assign each element separately in a do loop.

  124. Not sure
    Posted Sep 12, 2007 at 11:39 AM | Permalink

    Steve (116) The step 1 results for Bratsk are here. The full results for step 1 are here. These are with Saturday’s data file. I can re-run it with the latest data file if you like.

  125. Lucius Vorenus
    Posted Sep 12, 2007 at 3:57 PM | Permalink

    anonymous: That is, every time a variable is assigned, it (potentially) takes on a new value AND a new type.

    This is something which has infuriated me for years.

    Kernighan & Ritchie [& their ilk, such as McCarthy] made a terrible, disastrous error when they decided to go with poor typing, and character-driven, business-oriented, ASCII-ness of data values, rather than binary, scientific-oriented, strong typing of data, and almost every language which has succeeded them has made the same mistake, and it plagues the discipline like a cancer to this day.

    I know of precious few languages which have strong typing of mathematical quantities: languages which support [inherently] a distinction between 8, 16, 32, and 64 bit integers [preferably signed & unsigned]; 16, 32, 64, 80/96, and 128 bit real floats; and 32, 64, 128, 160/192, and 256 bit complex floats.

    Or even which are capable of understanding [inherently] the distinction between 8 [ASCII], 16, and 32 bit Unicode characters, etc.

    Instead, what we’re left with is inanities such as:

    <html>
    <body>

    <script language="javascript">

    var x = 2;
    var y = 2;
    window.alert(x + " + " + y + " = " + x + y);

    window.alert("Oops!\n\nLet's try that again.");

    var i = parseInt(2);
    var j = parseInt(2);
    var k = parseInt(parseInt(i) + parseInt(j));
    window.alert(i + " + " + j + " = " + k);

    </script>

    </body>
    </html>

    And the very fact that someone would introduce a poorly-typed language into any implementation of a mathematical calculation raises a huge red flag.

  126. Mark
    Posted Sep 13, 2007 at 3:48 AM | Permalink

    Lucius Vorenus (#125)

    This is probably off topic, but you could have got the correct result if you had typed

    window.alert(x + " + " + y + " = " + (x + y));

    (Note brackets round expression).

    It is the ECMA specification that says the + operator concatenates the values when one of the operands is a string.

    The first x would be treated as a number, but when the string object is encountered everything is converted to a string object and concatenated. The objects within the ‘(x + y)’ are all numbers, so the x + y is evaluated as a numeric expression and then converted to a string.

    I don’t think that it’s the typing of the object that’s the problem; it’s when you convert between types that problems arise. Lack of knowledge about the particular language that you use is also a factor (not directed at you).

    I personally think that objects should be typeless until the point that you do something with them, and then they should adopt a type that is suitable for the operation being performed on them. If there is no suitable representation of that object for the particular operation then an error should be raised.

    Why should numbers be given special preference?

    Mark.

  127. Mark
    Posted Sep 13, 2007 at 4:03 AM | Permalink

    In a different thread, Steven Mosher said,

    Areas covered occasionally by sea ice are masked using a time-independent mask.
    The Reynolds climatology is included, since it also may be used to find that
    mask. Programs are included to show how to regrid these anomaly maps:
    do_comb_step4.sh adds a single or several successive months for the same year
    to an existing ocean file SBBX.HadR2; a program to add several years is also
    included.

    Result: update of SBBX.HadR2

    Does anyone know where to get this existing SBBX.HadR2 ?

    Mark.

  128. fFreddy
    Posted Sep 13, 2007 at 4:19 AM | Permalink

    Re #125, Lucius Vorenus

    Kernighan & Ritchie [& their ilk, such as McCarthy] made a terrible, disastrous error when they decided to go with poor typing

    That’s a bit unfair. The original K&R required explicit declaration of type, and the C compiler I was using 20 years ago generated an error if you tried to make an assignment between types without an explicit cast.

    I entirely agree with your general point though – I still always switch on all available strong typing options.

  129. fFreddy
    Posted Sep 13, 2007 at 4:28 AM | Permalink

    Re #126, Mark

    I personally think that objects should be typeless until the point that you do something with them, and then they should adopt a type that is suitable for the operation being performed on them.

    I think this relies too much on perfection from the programmer. It’s too easy to make small mistakes like assigning a float to an integer when you didn’t mean to. When that happens, I would prefer it if the compiler points it out as an error. If the assignment is what I meant, then I can override with an explicit conversion.
    Strong typing means slightly more work for the programmer, but a big increase in confidence that the code is doing what you think it is doing.

  130. Andy
    Posted Sep 13, 2007 at 7:54 AM | Permalink

    Re #127, GISS added the following in the revised gistemp.txt included in the 9/10/07 source code release.

    Final Notes
    ———–
    A program that can read the two basic files SBBX1880.Ts.GHCN.CL.PA.1200 and
    SBBX.HadR2 in order to compute anomaly and trend maps etc was available on our
    web site for many years and still is.

    This prompted me to nose around their ftp site, and I located a SBBX.HadR2 at ftp://data.giss.nasa.gov/pub/gistemp

  131. Mark
    Posted Sep 13, 2007 at 8:42 AM | Permalink

    Andy (#130)

    That’s great, thank you very much. I’ll start from there.

    Mark.

  132. Posted Sep 14, 2007 at 5:22 AM | Permalink

    Hi team.

    I’d love to get in on this but my current workload prohibits me. I’d just like to leave this link to (imho) a good Fortran on-line reference for those who are frustrated.
    http://www.star.le.ac.uk/~cgp/prof77.html

    BTW, I usually work with an engineering tool used for verification of safety-critical systems. The tool consists of ksh/perl/fortran/C (latter two cross-compiled) software. Sometimes we cross-compile it with C++ and Ada too. That just happens when you keep a tool around for a while, so I’m not surprised at what you’re reporting about “inelegant programming”.

    Good Luck!

  133. Lynn Teuscher
    Posted Sep 14, 2007 at 4:26 PM | Permalink

    Where did all the discussion of testing the code go? There is no discussion of results or problems. I’m new to this site and find this very interesting, and would like to continue to follow the progress. I have extensive experience with Fortran from the early 60’s to the mid 90’s and might be able to help some.

  134. steven mosher
    Posted Sep 14, 2007 at 5:22 PM | Permalink

    Hey Lynn, welcome aboard. Read the thread and contact the people working the issues.

  135. Andy
    Posted Sep 14, 2007 at 9:41 PM | Permalink

    FWIW, no changes have been made to the “official” GISS code in the last few days (can’t say the same for the data :-) )

    Has anyone gotten all the source code steps to run yet?

  136. bernie
    Posted Sep 15, 2007 at 5:48 AM | Permalink

    I am with Lynn. Can someone summarize where “the gang” (that is, the varied and cosmopolitan computer geniuses, would-be hackers, programmers in forgotten languages, and those who remember punch cards) is with respect to getting the code to replicate one or other versions of the data?

  137. Steve McIntyre
    Posted Sep 15, 2007 at 6:45 AM | Permalink

    FWIW, no changes have been made to the “official” GISS code in the last few days (can’t say the same for the data).

    Please keep in mind that the “official version” of the code on Sept 7 introduced a change in data provenance that was inconsistent with the dset=1 versions of USHCN sites that I downloaded earlier in the day on Sept 7 (The timing being a coincidence.) They changed the online data versions some time in the next few days. This has been a little tricky to sort out.

  138. Mark
    Posted Sep 15, 2007 at 7:03 AM | Permalink

    Hi,

    I’m currently stuck on stage4_5 with an end of file condition, so, as a distraction I set about rewriting STEP0 in ‘C’ (after all, I wouldn’t be a programmer unless I had the urge to rewrite every bit of code I see).

    I’ve done everything up to dif.ushcn.ghcn.2005, which I’m just in the process of converting.

    I’ve pretty much stuck to their original algorithms, maybe changing gotos for while/do-while loops, and thus far the output compares exactly to that of the Fortran code.

    Mark.

  139. Clayton B.
    Posted Sep 15, 2007 at 7:55 PM | Permalink

    I too am with Lynn. It seems there’s been a loss of momentum on this thing (coming from one who has contributed nothing!).

    I am very interested in the progress of understanding the original code and the potential development of an enhanced version. Please keep us up to date guys.

  140. steven mosher
    Posted Sep 15, 2007 at 8:03 PM | Permalink

    Ok, STEVEMC…

    Here is what I suggest. Mark is furthest along. Mark, if you do a two-paragraph write-up of where you are and what you need, and ask STEVE MC to post it as an article, you will do two things:

    1. Become famous on CA.
    2. Refocus attention for those who dig this.

    Basically it will give you top billing.

    SteveMc? what say ye?

  141. steven mosher
    Posted Sep 15, 2007 at 8:10 PM | Permalink

    And Mark, you’re having another EOF in step 4 / step 5.

    Is it random or deterministic?
    Apparently there is no EOF error trapping or exception handling so we are stuck with “printf” debugging.

    Which routine? sread?

    Steve

  142. Mark
    Posted Sep 16, 2007 at 3:08 AM | Permalink

    I think I need to dig out a description of the required processes for stages 4 and 5.

    Stage 4 adds years to the SBBX.HadR2, but I can’t seem to find out which years need to be added (and do they even need to be applied to the version of SBBX.HadR2 that I have?).

    The error I’m now getting is ‘STOP ERROR: MONM0 TOO SMALL’ in SBBXotoBX.f during stage 5.

    I will do a short write up of what needs to be done to get to where I am. I’m away for a while today and I start my new job tomorrow, so busy times for me.

    Mark.

  143. H
    Posted Sep 16, 2007 at 6:55 AM | Permalink

    # 142 Mark

    MONM0 is used only to reserve memory. Why it’s not big enough is impossible to say, since the comparison value comes from the input files. The “standard” practice is to make a note and double MONM0 in the parameter line.

  144. Posted Sep 18, 2007 at 6:48 AM | Permalink

    Here’s my R port of Step 0.

    I’m ready to get stuck into Step 1, but am feeling a bit daunted by the Python code. Does anyone have a good enough understanding of the record combination algorithm to post some pseudo-code?

    Mike
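    Until someone posts definitive pseudo-code, here is my guess at the shape of the combination step, sketched in Python: offset the shorter record by the mean difference over the overlap (the “bias”), then merge. This is an interpretation of descriptions of the method, not a transcription of the GISS Python; the function name and the averaging of overlapping months are my assumptions.

    ```python
    # Hedged sketch of record combination: align two overlapping series for one
    # station by removing their mean offset over the common period. None marks
    # a missing month.
    def combine(longer, shorter):
        overlap = [(a, b) for a, b in zip(longer, shorter)
                   if a is not None and b is not None]
        if not overlap:
            return longer                       # no basis for an offset
        bias = sum(a - b for a, b in overlap) / len(overlap)
        out = []
        for a, b in zip(longer, shorter):
            if a is not None and b is not None:
                out.append((a + b + bias) / 2)  # average after removing the bias
            elif a is not None:
                out.append(a)
            elif b is not None:
                out.append(b + bias)            # shift the shorter record
            else:
                out.append(None)
        return out

    print(combine([10.0, 11.0, None], [9.0, 10.0, 12.0]))  # → [10.0, 11.0, 13.0]
    ```

    In the toy call, the bias over the two overlapping months is 1.0, so the shorter record is shifted up by one degree before merging.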

  145. Jean S
    Posted Sep 18, 2007 at 6:59 AM | Permalink

    Mike, see #69/#73 above.

  146. Steve McIntyre
    Posted Sep 18, 2007 at 7:53 AM | Permalink

    #145. I’ll do a post on Step 1 and link to Jean S’s and my code. I’ve worked through the problem concurrently with Jean S and have reconciled in R up to annoyingly small differences of less than 0.5.

  147. Steve McIntyre
    Posted Sep 18, 2007 at 7:55 AM | Permalink

    #144. By the way, Mike, that looks like a clean way to present an implementation of Step 0. I’ll try to take a look in light of some of the version issues that have been discussed recently.

  148. Posted Sep 18, 2007 at 8:23 AM | Permalink

    Great..

    A few notes on my R implementation:

    – It’s based on my interpretation of the Fortran source and hasn’t been verified.
    – All the input filenames were prefixed with “GISSTEMP_” and suffixed with “.txt”
    – It relies on Importing USHCN data and Importing GHCN data into R workspaces, which were done separately.
    – It relies on an awk script that tidies up the messy antarc*.txt files. You can get both the script and the resulting CSV file from that link.

    So it might just look clean because I’ve hidden my mess ;)

  149. H
    Posted Sep 19, 2007 at 3:59 PM | Permalink

    Can someone produce or point me to padjust.f input files fort.1, fort.12 .. fort.17? And preferably also the output files fort.22 .. fort.27?

  150. Steve McIntyre
    Posted Sep 20, 2007 at 10:06 PM | Permalink

    Has anyone got through to Step 2? If so, I would like to get some info from your run.

  151. Posted Sep 21, 2007 at 9:54 AM | Permalink

    I’ve cross-posted the R code to my wiki and it now runs on the site using pre-loaded GISS datasets. I had made some code changes, mostly because of differences in data formats. I’m reasonably sure I haven’t broken anything in the process.

    I see a discussion on STEP 2 has started. Am I right in saying that there’s been no R code yet to replicate the concluding pieces of STEP 1?

    – running the above for each ID, and picking out the combined version
    – manual changes listed in Ts.discont.RS.alter.IN, combine_pieces_helena.in, Ts.strange.RSU.list.IN

    I also notice that the GISS input_files sumofday.tbl, mcdw.tbl, ushcn.tbl, & v2.inv are not required in Steve’s posted code for STEP 1. What are they for?

    Regards, Mike

  152. Posted Sep 21, 2007 at 10:16 AM | Permalink

    Sorry, meant to be post previous message on the STEP 1 thread.

  153. H
    Posted Sep 21, 2007 at 3:27 PM | Permalink

    I converted padjust.f (STEP 2) to MATLAB. If somebody can provide the input files, I could debug it :) Anyway, the code is here http://koti.mbnet.fi/ppppp/padjust_H.m

  154. NeedleFactory
    Posted Sep 22, 2007 at 3:54 PM | Permalink

    Question about STEP0:
    antarc_comb.exe appears to take two (filename) arguments.
    The first is ‘v2_antarct.dat’ (produced by ‘antarc_to_v2.sh’).
    What is the second?

  155. Rui Sousa
    Posted Dec 18, 2008 at 8:40 AM | Permalink

    Hello friends,

    I have taken the v2.mean file from GHCN and I have imported it to an SQL Server 2008 database, to explore the data with the tools I am used to.

    I imported the raw data to a new table and then ran a SQL script to transform -9999 into Null values.

    The create table script is:


    CREATE TABLE [dbo].[v2mean](
    [stationID] [varchar](11) NULL,
    [Tie] [varchar](1) NULL,
    [Ano] [varchar](4) NULL,
    [1] [varchar](5) NULL,
    [2] [varchar](5) NULL,
    [3] [varchar](5) NULL,
    [4] [varchar](5) NULL,
    [5] [varchar](5) NULL,
    [6] [varchar](5) NULL,
    [7] [varchar](5) NULL,
    [8] [varchar](5) NULL,
    [9] [varchar](5) NULL,
    [10] [varchar](5) NULL,
    [11] [varchar](5) NULL,
    [12] [varchar](5) NULL,
    [filler] [varchar](1) NULL
    ) ON [PRIMARY]

    The script to handle the null values is here:


    declare @Month as int
    Set @Month = 1
    Declare @SQLString as varchar(500)

    while (@Month <= 12)
    BEGIN

    SET @SQLString = 'update dbo.[v2mean]
    set [{0}] = null
    where [{0}] = {1}'

    set @SQLString = REPLACE(@SQLString, '{0}', cast(@Month as varchar))
    set @SQLString = REPLACE(@SQLString, '{1}', '-9999')

    execute (@sqlstring)

    set @Month = @Month + 1
    END
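    As a stdlib cross-check of the fixed-width layout assumed by the CREATE TABLE above (11-char station ID, 1-char duplicate/“Tie”, 4-char year, then twelve 5-char monthly values with -9999 as missing), a sketch like this can be run against a few v2.mean lines before loading. The sample line below is synthetic, not a real record:

    ```python
    # Parse one v2.mean line under the column widths assumed by the SQL table.
    def parse_v2_mean(line):
        months = []
        for m in range(12):
            v = int(line[16 + 5 * m : 21 + 5 * m])
            months.append(None if v == -9999 else v)  # mirror the NULLing script
        return line[0:11], line[11], int(line[12:16]), months

    line = "22222296001" + "0" + "1959" + " -120" * 6 + "-9999" * 6
    sid, dup, year, months = parse_v2_mean(line)
    print(year, months[0], months[6])  # → 1959 -120 None
    ```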

  156. Rui Sousa
    Posted Dec 18, 2008 at 8:51 AM | Permalink

    Then I imported the station data into another table, even if I wasn’t able to identify all columns; the purpose was to be able to establish a relation with the mean data. This table has the following script:


    CREATE TABLE [dbo].[GHCNStationIventory](
    [StationID] [varchar](11) NULL,
    [Column 1] [varchar](1) NULL,
    [Station Name] [varchar](31) NULL,
    [Latitude] [varchar](6) NULL,
    [Longitude] [varchar](8) NULL,
    [Elevation] [varchar](5) NULL,
    [col 6] [varchar](5) NULL,
    [Urbanization] [varchar](1) NULL,
    [Column 8] [varchar](5) NULL,
    [Column 9] [varchar](2) NULL,
    [Column 10] [varchar](2) NULL,
    [Column 11] [varchar](7) NULL,
    [LandType] [varchar](16) NULL,
    [C13] [varchar](1) NULL,
    [Filler] [varchar](1) NULL
    ) ON [PRIMARY]

    My next step was to create a new table that in my opinion better satisfies the normal forms of relational database design (http://en.wikipedia.org/wiki/Database_normalization); here is the script to create this new table:

    CREATE TABLE [dbo].[StationMonthValue](
    [StationID] [bigint] NOT NULL,
    [Tie] [int] NOT NULL,
    [Year] [int] NOT NULL,
    [Month] [int] NOT NULL,
    [Value] [int] NULL,
    CONSTRAINT [PK_StationMonthValue_1] PRIMARY KEY CLUSTERED
    (
    [StationID] ASC,
    [Tie] ASC,
    [Year] ASC,
    [Month] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
    IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    On this new table I will have one record per value, which makes future queries much easier.

  157. Rui Sousa
    Posted Dec 18, 2008 at 8:58 AM | Permalink

    To copy data from the original format to the new table I used the following code:


    declare @Month as int
    Set @Month = 1
    Declare @SQLString as varchar(500)

    while (@Month <= 12)
    BEGIN

    SET @SQLString = 'insert into dbo.StationMonthValue
    SELECT CAST(v2mean.stationID as bigint), Tie, cast(v2mean.Ano as int) as [year],
    cast(? as varchar) as [Month], cast(v2mean.[?] as int) as value
    FROM GHCNStationIventory INNER JOIN
    v2mean ON GHCNStationIventory.StationID = v2mean.stationID'

    SET @SQLString = REPLACE(@SQLString, '?', cast(@Month as varchar))

    EXECUTE (@SQLString)

    SET @Month = @Month + 1
    END

    Then I queried this table to obtain the global average of the values, with this query:

    SELECT AVG(Value)
    FROM dbo.StationMonthValue

    The result was: 120

    Does this mean that the whole dataset presents a raw average deviation from the reference period of 1.2 ºC ?

  158. Rui Sousa
    Posted Jan 22, 2009 at 7:09 AM | Permalink

    Sorry, it means the average temperature of this period is 12 ºC.
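    The units are the likely source of confusion: GHCN v2.mean stores monthly means as integers in tenths of a degree Celsius, so a raw average of 120 corresponds to 12.0 ºC, not a 1.2 ºC anomaly. A trivial conversion makes this explicit:

    ```python
    # v2.mean values are integer tenths of a degree Celsius.
    def to_celsius(tenths):
        return tenths / 10.0

    print(to_celsius(120))  # → 12.0
    ```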

  159. Posted Mar 28, 2009 at 5:28 AM | Permalink

    I just ran across this comment on the GISTemp Updates page:

    Feb. 11,2009: Two bugs in STEP0 programs were corrected before they had a chance to affect any results. Thanks to Dr. Peter O’Neill for discovering and reporting them to us.

    I don’t understand how bugs discovered in 2009 cannot have affected results of calculations made in, say, the 1990s.

    • Peter O'Neill
      Posted Mar 29, 2009 at 5:52 PM | Permalink

      Re: Dan Hughes (#159),

      I just ran across this comment on the GISTemp Updates page:

      Feb. 11,2009: Two bugs in STEP0 programs were corrected before they had a chance to affect any results. Thanks to Dr. Peter O’Neill for discovering and reporting them to us.

      I don’t understand how bugs discovered in 2009 cannot have affected results of calculations made in, say, the 1990s.

      Surprising as that may seem, that is the case here.

      The first bug I reported was located in unreachable code in antarc_comb.f, and would have had serious consequences but for the fact that it could never be executed. The danger of course would be that at some future date the code might be taken over by someone with experience only of structured programming, who might make the mistake of believing that FORTRAN code of a certain era was always first designed and then coded – and who might then “fix” the unreachable code.

      The second bug was in USHCN2v2.f, where the format statement used to read ushcn.tbl selected the wrong (and blank) column for a single digit numeric value, but as the FORTRAN code reads a blank as zero, and as all the values in the correct column are in fact zero for ushcn.tbl, again the current calculations were not affected. The other .tbl files, where the corresponding values would not have been zero, are read instead (correctly) in Python.

      The coders of GISTemp seemed to have liked to live dangerously. I’m curious whether this lucky streak will still hold should I find any further bugs as I work my way through the later steps.

      On reflection I’m beginning to think releasing the code could have been a clever strategy to get it debugged at no cost before old-timers like myself, who started out with FORTRAN II and may still be able to put ourselves in the mindset needed to follow some of the more esoteric code, die off.

      Perhaps this could be our final business opportunity – “we can review your ‘legacy’ FORTRAN II/IV, ALGOL, PL/1, 360 ASSEMBLY, COBOL, APL, RPG II … code; GOTOs welcome; all common, even if now deprecated, stupid programming tricks accepted”?
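    The USHCN2v2.f bug described above can be mimicked in Python: a fixed-width read that points one column off can land on blanks, and Fortran treats an all-blank numeric field as zero. The field layout here is invented for the demo, not copied from ushcn.tbl:

    ```python
    # Simulate a Fortran formatted read that selects the wrong column.
    line = "421234567890 0 SOME STATION"

    right = line[13:14]  # intended column: "0"
    wrong = line[12:13]  # off-by-one column: a blank

    def fortran_int(field):
        # Fortran reads an all-blank numeric field as zero
        return int(field) if field.strip() else 0

    # Both give 0 here -- exactly the lucky case described above, where the
    # correct column happened to hold zeros anyway.
    print(fortran_int(right), fortran_int(wrong))  # → 0 0
    ```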

  160. Posted Mar 29, 2009 at 6:29 PM | Permalink

    Thanks for the info Peter.

    I would like to help get the code up and running. I have all the FORTRAN code compiled, but I have not yet looked into the details of that coding. I’ve noticed that all the independent efforts started back in 2007/2008 seem to have come to a dead stop. I remain amazed that independent porting of the code is so problematic. Can code actually be so tightly tied to specific machines and OS configurations?

    Drop me an e-mail dhughes4 at mac dot com.

  161. E.M.Smith
    Posted Jul 27, 2009 at 1:26 AM | Permalink

    I have GIStemp running on a x86 Linux (RedHat 7.2) box using the G95 compiler.

    I’ve run it up through step3. At step4_5 I hit the ‘endian’ problem with the read of the SBBX.HadR2 dataset in Step5 SBBXotoBX.f file (read(11) at line 112 or so) and convert.HadR2_mod4.upto15full_yrs.f with the same file. I’ve not yet reached the point of finding out if the oisstv2_mod4… et al files have similar problems.

    So I’m now looking at making an “endian conversion tool”. We’ll see. It might be easier just to get an old Macintosh PowerPC box (also big endian) and put linux on it with the G95 compiler…
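    For the record, a minimal sketch of such an “endian conversion tool”: walk a Fortran sequential unformatted file (4-byte record-length marker, payload, marker again) and byte-swap both the markers and the payload, assuming the payload is a stream of 4-byte words (which holds for both INTEGER*4 and REAL*4 data; verify against the actual SBBX record layout before trusting it):

    ```python
    import struct

    def swap_records(big_endian_bytes):
        """Convert a big-endian Fortran sequential file to little-endian."""
        out, pos = bytearray(), 0
        data = big_endian_bytes
        while pos < len(data):
            (reclen,) = struct.unpack(">i", data[pos:pos + 4])
            payload = data[pos + 4 : pos + 4 + reclen]
            # swap every 4-byte word in the payload
            swapped = b"".join(payload[i:i + 4][::-1]
                               for i in range(0, reclen, 4))
            out += struct.pack("<i", reclen) + swapped + struct.pack("<i", reclen)
            pos += 4 + reclen + 4
        return bytes(out)

    # demo: one record holding a single big-endian REAL*4 value of 1.0
    rec = struct.pack(">i", 4) + struct.pack(">f", 1.0) + struct.pack(">i", 4)
    out = swap_records(rec)
    assert struct.unpack("<f", out[4:8])[0] == 1.0
    ```

    The same word-reversal handles the record markers and the data, which is why the 4-byte-word assumption matters; records containing CHARACTER data or 8-byte reals would need different handling.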

    The major issue I found with the G95 compiler was the use, by GIStemp, of a non-standard extension to the language for doing data initialization in a variable declaration. Where they have:

    INTEGER foo(3)/1,2,3/

    You need instead:

    INTEGER foo(3)
    DATA foo /1,2,3/

    FWIW, I’ve moved the source files into a ‘src’ directory, the executables into a ‘bin’ directory, and added a Makefile to do all the compiles (pulling that out of the other scripts). At this point I have a version that works pretty well (up to the big-endian point…), compiles cleanly, and generally seems usable for testing.

  162. Posted Jul 27, 2009 at 1:31 AM | Permalink

    Oh, BTW, anyone wanting to ask question of me can do so at:

    http://chiefio.wordpress.com/gistemp/

    where I’m documenting my “discoveries”…

One Trackback

  1. [...] the graphic below compares the Detroit Lakes version from Step 0 from Not Sure not here to the Dset=0 version currently online at NASA GISS. The results are obviously similar in recent [...]
