Hansen Code

Technical Comments only.

163 Comments

  1. Not sure
    Posted Sep 8, 2007 at 5:23 PM | Permalink

    STEP0 completes successfully on a Gentoo Linux system. The result is a 43MB text file that looks to be in the same format as the v2.mean downloaded from GHCN.

    Notes: The FORTRAN compiler was gfortran 4.1.1. I used ksh-93 from Gentoo portage. In addition to adding “FC=gfortran” to your environment, you’ll have to add the current directory “.” to your PATH. Also, my version of tail did not like line 43 of do_comb_step0.sh. I had to change it to:

    tail -n+100 input_files/t_hohenpeissenberg_200306.txt_as_received_July17_2003 > t_hohenpeissenberg

    Note that that does not exactly “disregard pre-1880 data” as the comments say, because it still includes 1879.

  2. Martin Å
    Posted Sep 8, 2007 at 5:32 PM | Permalink

    I also managed to complete STEP0 on an Ubuntu system. But STEP1 seems harder, since you have to compile some Python extensions first. I got some error messages doing that, and I don’t have a clue how to deal with them. And without the extensions the Python scripts won’t run. Anyone with better luck?

  3. Not sure
    Posted Sep 8, 2007 at 6:01 PM | Permalink

    RE: 111

    Yeah, I stumbled on the Python modules too. They were developed against an ancient version of Python (1.5). The “Makefile.pre.in” method is obsolete. I’m currently trying to build them using the newer “distutils” method, but I’m probably going to have to take a break for a while.
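
    For anyone attempting the same, a trivial distutils setup.py along the following lines should be enough to build one of the STEP1 C extensions (the module and source names here are only illustrative; substitute the actual EXTENSIONS subdirectory you are building):

    # setup.py -- minimal distutils build script for a single C extension.
    # Run with: python setup.py build_ext --inplace
    from distutils.core import setup, Extension

    setup(
        name="stationstring",
        version="1.0",
        ext_modules=[Extension("stationstring",
                               sources=["stationstringmodule.c"])],
    )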

  4. bernie
    Posted Sep 8, 2007 at 6:09 PM | Permalink

    Martin #103. Thanks for posting the code. It looks like “Bad” is defined earlier. Also what I see looks close to what Steve and others had flushed out, namely the link to the long term anomaly for the station. I still think we need to be sure we understand how they are combining the data. Some stations, like Praha Ruzyne (#91), have adjustments over the time period of close to 3C.

    #91
    Doug
    Excellent question. I think one task is to home in on whatever leads to the classification of a site as rural versus non-rural, lit or unlit, since these classifications dictate what should be influencing what. Rural or unlit stations that should really be classified as non-rural, i.e., with large potential UHI or microsite effects, could have a big influence on adjustments that are made or not made to a non-rural site.

  5. Posted Sep 8, 2007 at 9:00 PM | Permalink

    Here is the kind of thing that bothers me:

    GISTEMP_sources/STEP0/USHCN2v2.f:

    if(temp.gt.-99.00) itemp(m)=nint( 50.*(temp-32.)/9 ) ! F->.1C
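
    For readers who don’t read Fortran, here is a rough Python rendering of that line (a sketch only; Fortran’s NINT rounds to the nearest integer, and Python’s round differs slightly on exact halves):

    # Rough equivalent of the Fortran above: convert degrees F to tenths
    # of a degree C, rounding to the nearest integer, and skip the
    # -99.00 missing-value flag.
    def f_to_tenths_of_c(temp_f):
        if temp_f > -99.00:
            return int(round(50.0 * (temp_f - 32.0) / 9))  # NINT analogue
        return None

    print(f_to_tenths_of_c(50.0))  # 100, i.e. 10.0 C stored as tenths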

    — Sinan

  6. Not sure
    Posted Sep 9, 2007 at 1:02 AM | Permalink

    I created trivial setup.py files to get the Python C extensions to compile, but I guess the approach in #184 will work as well. Also, I used perl to munge the .c files and fix the references to deprecated memory functions, e.g.:

    perl -p -i.pymalloc -e "s/Py_Malloc/PyMem_Malloc/g" EXTENSIONS/cartgrid/cartgridmodule.c

    I also found that lines 504 and 505 in stationstringmodule.c will cause my Python interpreter (2.4) to segfault. I commented them out, as they seem to be in error. Finally, I changed line 54 in v2_to_bdb.py to read “if key == ‘id’:”. There’s probably other stuff that I’m forgetting, but I’m still getting memory errors (*** glibc detected *** /usr/bin/python: free(): invalid pointer: 0x080e2c8c ***). This last error is also in stationstringmodule. It’s late and I’m going to bed. I’ll return to this tomorrow night if no one else has figured it out by then.

  7. Mark
    Posted Sep 9, 2007 at 5:32 AM | Permalink

    Not Sure (#191)

    To get rid of the glibc free() error, comment out the two PY{Mem}_Free calls near line 506 of stationstringmodule.c

    Mark.

  8. Not sure
    Posted Sep 9, 2007 at 9:12 AM | Permalink

    Mark (202) I did comment those lines as the Python docs make it clear that the results of PyString_AsString must not be deallocated. I was getting a different memory error before I did this.

    Dr. Motl (205) that will certainly work, but it will litter the code with useless comments which eventually could become a nuisance.

  9. Not sure
    Posted Sep 9, 2007 at 9:38 AM | Permalink

    The changes I’ve made should not affect the logic of the code. I just wonder if I can make it work on a low-cost Linux system. Sort of my personal crossword puzzle, if you will. I justify it to myself by thinking this will help make the logic accessible to those much smarter and actually versed in statistics, unlike me. It looks like this stuff was run on an AIX system, and not many people get time on those.

    Dr. Hansen certainly has nothing to fear from me, and I actually harbor an (unrealistic?) hope that my efforts will be useful to him and his team. I find it interesting that it seems most think any errors that are found will adjust the mean temperature downwards. I guess that is to be expected considering Dr. Hansen’s predictions of doom, and his evasiveness, but I am sure that the people on this site will be honest about any errors that underestimated the amount of warming, so I fail to see the harm done here.

  10. Posted Sep 9, 2007 at 10:26 AM | Permalink

    #223 I count 9854 lines of code actually. Only one order of magnitude less complex. Gavin is exaggerating.

    $ find . -type f -name *.sh -exec wc -l {} \; | awk '{s+=$1} END {print s}'
    297
    $ find . -type f -name *.f -exec wc -l {} \; | awk '{s+=$1} END {print s}'
    4239
    $ find . -type f -name *.py -exec wc -l {} \; | awk '{s+=$1} END {print s}'
    1167
    $ find . -type f -name *.c -exec wc -l {} \; | awk '{s+=$1} END {print s}'
    4151

    Total 9854

  11. Not sure
    Posted Sep 9, 2007 at 11:30 AM | Permalink

    Mark (218) Thanks, that did it. Those lines looked suspicious to me too, but I really didn’t know what they do, so I left them alone. Unfortunately it’s still not running for me. The script drop_strange.py dies for me now, and I’m out of time for today.

    Out of curiosity, did you write setup.py scripts or did you use the approach in #184?

  12. Mark
    Posted Sep 9, 2007 at 12:17 PM | Permalink

    Not Sure (#243)

    Out of curiosity, did you write setup.py scripts or did you use the approach in #184?

    Hi,

    First thing I did was to install a separate copy of Python for use only by this program (Python 2.5.1).
    Changed the PATH to point to the new python.
    Modified the Makefile.pre.in file to remove the aix defines and change the shell to /bin/bash.
    Changed the #!/…./python path in the python scripts to point to the new python.
    Made the changes as described so far.
    And ran the scripts.

    If you build python without any CFLAG modifiers it will build with debugging symbols built in. This version can then be run under gdb, the gnu debugger (which is how I found the seg faults).

    Create a little parameter file with the arguments to use, i.e.:


    set args -d comb_records.py v2.mean_comb
    run

    and then execute by running gdb -x {parameter_file_name} /path/to/new/python. If anything goes wrong, the debugger will break and it should be possible to work out what has gone wrong.

    There are some other small changes I made but they should not affect the running of the code.
    Maybe a set of diffs posted to the google code site would help.

    Mark.

  13. Mark
    Posted Sep 9, 2007 at 1:04 PM | Permalink

    Hi,

    I was so busy talking about the C code that I forgot to mention that in drop_strange.py I changed the
    bsddb.hashopen() calls from .hashopen(NEW_BDB_NAME, ‘nc’) to .hashopen(NEW_BDB_NAME, ‘n’), removing the c from the hashopen options arguments.

    hth,
    Mark.

  14. Mark
    Posted Sep 9, 2007 at 2:14 PM | Permalink

    Hi,

    For anyone interested, I have placed a patch at the following address

    http://www.zen103403.zen.co.uk/patches/patch.html

    It should apply cleanly to a fresh extract of the GISTEMP_sources.tar.gz file (don’t forget to extract the EXTENSION.tgz file in the STEP1 directory).

    This should allow completion of steps 0 and 1 and part of step 2.

    Mark.

  15. 2dogs
    Posted Sep 9, 2007 at 2:51 PM | Permalink

    #39 – Use the PRE tag

  16. Martin Å
    Posted Sep 9, 2007 at 4:48 PM | Permalink

    Regarding the code.

    With help from Mark’s patch (#268) I have been able to run STEP0 and STEP1 on data downloaded from NASA. I have only verified the two separate records and the combined record from Brno/Turany (611117230001), but all figures seem to match the version you can download from GISS. There is an artificial 0.1 degree lowering of the 1951-1986 part of the record, due to a missing value in the older record for December 1991. The seasonal averages also don’t add up when a value is missing. The seasonal means are not produced as an output from STEP1 though.

  17. steven mosher
    Posted Sep 9, 2007 at 4:54 PM | Permalink

    RE 286. VERY COOL.

    After STEP 1 the output should be GISS temp Dset =1… combining stations at same location.

    PERHAPS a good test Prog to write would be to compare GISS temp dset=1 ( steve has scraped that)
    with the output of step 1.

  18. anonymous
    Posted Sep 9, 2007 at 7:15 PM | Permalink

    quick programming note about python and division for those not familiar with the language:

    Float/Float -> Float

    Int/Int -> Int (not Float; and is always truncated)

    for example:

    >>> a = 2.0
    >>> b = 4.0
    >>> a/b
    0.5
    >>> a = 2
    >>> b = 4
    >>> a/b
    0

    Also, keep in mind that it will not be possible to tell from the source code the exact type of any given variable, as type is a run-time feature. That is, every time a variable is assigned, it (potentially) takes on a new value AND a new type.

  19. Mark
    Posted Sep 9, 2007 at 7:40 PM | Permalink

    Hi,

    I’ve just updated the page pointed to by my earlier message.

    There is a new patch that applies to a clean source tree that will run steps 0,1 and 2.

    I have started to encounter rounding/representation issues with the code.

    In STEP2 there is a source file called PApars.f. Code in this file caused an infinite loop because of a comparison between an integer cast to a float and a float expression.

    The line

    IF(FLOAT(N3).LT.XCRIT*(N3L-N3F+1.))

    was evaluating to true even when N3 and the RHS expression were supposedly equal (I logged them both having the value of 62 to 8 decimal places). I changed the code to the following


    IF((FLOAT(N3)+.00001).LT.XCRIT*(N3L-N3F+1.))

    and the STEP2 shell script ran to completion.

    I’m not sure of the effect this has on the final result; I don’t know enough Fortran to fully understand what algorithm the code is trying to implement. It appears to be checking for the number of years needed.
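
    For those who have not hit this class of bug before, a minimal Python illustration of the underlying pitfall (the actual XCRIT value is not shown above, so the numbers here are only illustrative):

    # Two expressions that are equal "on paper" can differ in the last
    # bits of their binary floating-point representation, so a strict
    # "less than" comparison can go the wrong way even when the printed
    # values agree to 8 decimal places.
    a = 0.1 + 0.2
    b = 0.3
    print("%.8f %.8f" % (a, b))  # both print as 0.30000000
    print(a > b)                 # True: a is really 0.30000000000000004
    print(a - b)                 # ~5.6e-17

    Adding a small epsilon before comparing, as in the change above, is one common (if blunt) way of making such comparisons tolerant of this.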

    Mark.

  20. JerryB
    Posted Sep 10, 2007 at 8:10 AM | Permalink

    A question to those who have succeeded in running STEP0:
    where did you find a copy of the input file named USHCN.v2.mean_noFIL ?

  21. steven mosher
    Posted Sep 10, 2007 at 8:21 AM | Permalink

    334: JerryB here is the key

    get_offset_noFIL

    # get appropriate GHCN data -> fort.2
    grep ^425 ../../DOWN*/v2.mean > ghcn_us_new

    f77 dump.1980.f ; ln ghcn_us_new fort.1 ; a.out ; rm -f fort.1

    # get USHCN data -> fort.1
    cp -p /gcm/USHCN/DOWN*/USHCN.v2.mean_noFIL.gz .
    gunzip USHCN.v2.mean_noFIL.gz
    grep -v -e '-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999' USHCN.v2.mean_noFIL > xxx
    sort -n xxx > fort.1 ; rm -f xxx USHCN.v2.mean_noFIL
    f77 dif.ushcn.ghcn.2005.f ; a.out ; rm -f fort.[12] a.out
    mv fort.10 ushcn-ghcn_offset_noFIL

  22. Mark
    Posted Sep 10, 2007 at 8:27 AM | Permalink

    A question to those who have succeeded in running STEP0:
    where did you find a copy of the input file named USHCN.v2.mean_noFIL ?

    Hi,

    The only files I needed to obtain were v2.mean and hcn_doe_mean_data. After placing these two files in the input_files directory and uncompressing them the STEP0 process ran to completion.

    I obtained them both from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2.

    I am currently stuck in STEP3: trimSBBX.f is sometimes stopping with a seg fault (memory error) and sometimes stopping with an end of file condition. I don’t know enough Fortran to work out what’s going wrong.

    It’s the last process of STEP3; once that runs, it’s on to the last steps.

    Mark.

  23. JerryB
    Posted Sep 10, 2007 at 8:44 AM | Permalink

    Re #334,

    steven,

    I saw that “key”, but I’m looking for the door that it fits. 🙂

    Re #335,

    Mark,

    The hcn_doe_mean_data file can be found in the USHCN directory of the server
    to which you linked, and also on a CDIAC server, but that file does not
    include data with USHCN SHAP adjustments without FILNET adjustments. Without
    the USHCN.v2.mean_noFIL file, I would not expect you to get the same temperature
    numbers for USHCN stations that GISS gets.

  24. Jean S
    Posted Sep 10, 2007 at 9:36 AM | Permalink

    #330: Thanks! I’m suspecting that the solution for the rounding thing is in the average-function:

    data_row[n] = sums_row[n] / wgt

    where I suspect both of the quantities on the right are integers. This is why I’m interested in how exactly Python rounds int/int. Notice also that the offset (diff) seems to be rounded:

    diff = sum / wgt
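
    A small sketch of what happens if both operands really are integers (Python 2’s int/int behaves like the explicit // below); note that this floors rather than truncates, which matters for negative anomalies:

    # sums_row[n] and wgt as integers: "/" in Python 2 floor-divides.
    sums, wgt = -25, 10
    print(sums // wgt)        # -3  (floor: rounds toward minus infinity)
    print(float(sums) / wgt)  # -2.5 (the value one presumably wants)
    # Fortran integer division would give -2 here (truncation toward
    # zero), so the two languages do not even disagree the same way.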

  25. JerryB
    Posted Sep 10, 2007 at 10:10 AM | Permalink

    The temperature data contained in the USHCN.v2.mean_noFIL file presumably
    came from the 3A lines of the hcn_shap_ac_mean.Z file at
    ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/OtherIntermediates/
    however, the format of those two files is apparently quite different, and
    converting a copy of one into a copy of the other could be quite a chore.

    Similar, but possibly slightly different, numbers could be found in the GHCN
    v2.mean_adj.Z file, which would seem to be more similar to the expected format,
    and with slight editing, might be “close enough”, but not an exact match. Such
    an effort would also require extracting only the USHCN stations, and only
    through year 2000 data.

    Alternate plan B would be to see if someone such as Reto Ruedy would put up a
    copy from whence it could be downloaded.

  26. Stevan Naylor
    Posted Sep 10, 2007 at 10:25 AM | Permalink

    Re 328 – According to my 1982 Fortran 77 Reference Guide – which I amazingly managed to unearth from the family tech archives – ” Integer division always truncates, rather than rounds.”

    Thus in F77: -2/3 = 0
    Not sure about Python though…

  27. Stevan Naylor
    Posted Sep 10, 2007 at 11:32 AM | Permalink

    RE 328 – I checked the Python online reference (here) and it appears that integer division involves the floor function:

    “The / (division) and // (floor division) operators yield the quotient of their arguments. The numeric arguments are first converted to a common type. Plain or long integer division yields an integer of the same type; the result is that of mathematical division with the `floor’ function applied to the result. Division by zero raises the ZeroDivisionError exception. “

    Thus: -2/3 = -1

  28. steven mosher
    Posted Sep 10, 2007 at 11:40 AM | Permalink

    340… And did the SBXXgrid execute successfully and output successfully?

  29. steven mosher
    Posted Sep 10, 2007 at 11:51 AM | Permalink

    340… SREAD…

    My bet is the offending Variable in the SREAD call is ML… which is assigned
    from INFO(4)

    INFO(4) = Length of record.

    In step 2 there were adjustments made to the length of record. Maybe your fix there cascaded
    into a run-off-the-end-of-the-file error here.

    Might also explain the SEG error…

    Steve

  30. steven mosher
    Posted Sep 10, 2007 at 1:38 PM | Permalink

    re 362. No sweat.

    If need be We can force a solution to the SREAD in step 2 by Hardcoding ML
    to a small number and seeing if we get rid of the EOF problem. That will
    let you continue while I look at the code.. If Restricting ML solves the
    EOF issue then we KNOW it’s tracked back to INFO(4) and I can see if that was
    updated in the file writes of Step2.

  31. Scott Lurndal
    Posted Sep 10, 2007 at 2:23 PM | Permalink

    Re #185,#192, et. al.

    The use of ‘Makefile.pre.in’ in the EXTENSIONS indicates that the extensions were built around autoconf. Makefile.pre.in is one of the output files from autoconf (the main one being configure). One would distribute the output from autoconf and the end user would then run configure to generate the Makefile(s) specific to the actual platform (AIX, Linux, Solaris, etc.) being used.

    Here, however, only the intermediate makefile (hand-customized for AIX) was distributed. Either the author didn’t understand how autoconf works, or the EXTENSIONS tarball is incomplete.

  32. Not sure
    Posted Sep 10, 2007 at 2:55 PM | Permalink

    Scott Lurndal (364): I didn’t use the Makefile.pre.in files at all. I wrote setup.py scripts that I’ll try to publish somewhere soon.

    The very first FORTRAN file in STEP2 is segfaulting for me. I uncommented the debug print at line 41 of text_to_binary.f, and here’s the output I get every time:


    524 131617103790000 100POTSDAM
    525 133617103810000 58BERLIN-DAHLEM
    525 133617103810001 58BERLIN-DAHLEM
    525 133617103810002 58BERLIN-DAHLEM
    525 134617103840005 49BERLIN-TEMPEL
    522 141617103930000 104LINDENBERG
    ./do_comb_step2.sh: line 18: 8602: Memory fault
    breaking up Ts.bin into 6 zonal files
    At line 20 of file split_binary.f
    Fortran runtime error: Bad unit number in OPEN statement
    trimming Ts.bin1-6
    200690003 OSTROV VIZE 4 9999 9999 9999 9999
    202920005 GMO IM.E.K. F

    The shell script has no error checking at all, which makes it very frustrating to use.

    Interestingly, my Ts.txt from STEP1 is 30759146 bytes long. It looks like Mark and I are diverging. I’ll try to get my stuff up on the web somewhere so that hopefully we can begin converging.

  33. steven mosher
    Posted Sep 10, 2007 at 4:07 PM | Permalink

    Thanks Steve.

    Currently to my Knowledge Mark and “Not Sure” are the only guys trying to compile
    and run the code.

    What we need. Guys who know Fortran, guys who know Python.. and we suspect the
    distribution was hosted on AIX.
    If you know Fortran or Python start learning the code as the guys trying to compile
    do their thing. Remote debugging VIA Blog!

    Plus Guys like JerryB who understand all the data that is out there ( ushcn) are
    also a big bonus and a huge help.

    Guys.. Lets recap the following For New Comers.

    MARK and “not sure” Recap your status

    1. System Config you have
    1. Links to patches
    2. Current status.
    3. Open Issues.

  34. Not sure
    Posted Sep 10, 2007 at 5:03 PM | Permalink

    1. Gentoo Linux, gfortran 4.1.1, ksh-93, Python 2.4
    1.(?) No patches yet. Getting to it
    2. Working on STEP2
    3. Need to start on STEP3

  35. Not sure
    Posted Sep 10, 2007 at 5:10 PM | Permalink

    OK, I resolved the text_to_binary.f issue by changing line 89 to read

    if (ix.gt.0.and.idatum.ne.9999) then

    STEP2 finished, and created the 12 files described in gistemp.txt.

    file to_next_step/Ts.GHCN.CL.*
    to_next_step/Ts.GHCN.CL.1: data
    to_next_step/Ts.GHCN.CL.2: data
    to_next_step/Ts.GHCN.CL.3: data
    to_next_step/Ts.GHCN.CL.4: data
    to_next_step/Ts.GHCN.CL.5: data
    to_next_step/Ts.GHCN.CL.6: data
    to_next_step/Ts.GHCN.CL.PA.1: data
    to_next_step/Ts.GHCN.CL.PA.2: data
    to_next_step/Ts.GHCN.CL.PA.3: data
    to_next_step/Ts.GHCN.CL.PA.4: data
    to_next_step/Ts.GHCN.CL.PA.5: data
    to_next_step/Ts.GHCN.CL.PA.6: data
    to_next_step/Ts.GHCN.CL.PA.station.list: ASCII text
    to_next_step/Ts.GHCN.CL.station.list: ASCII text
    to_next_step/Ts.GHCN.CL.station.list_orig: ASCII text

    file to_next_step/Ts.GHCN.PA.*
    to_next_step/Ts.GHCN.PA.1: data
    to_next_step/Ts.GHCN.PA.2: data
    to_next_step/Ts.GHCN.PA.3: data
    to_next_step/Ts.GHCN.PA.4: data
    to_next_step/Ts.GHCN.PA.5: data
    to_next_step/Ts.GHCN.PA.6: data
    to_next_step/Ts.GHCN.PA.station.list: ASCII text

    ls -lh to_next_step/Ts.GHCN.PA.*
    -rw-r--r-- 1 jose users 770K Sep 10 16:08 to_next_step/Ts.GHCN.PA.1
    -rw-r--r-- 1 jose users 13M Sep 10 16:08 to_next_step/Ts.GHCN.PA.2
    -rw-r--r-- 1 jose users 2.2M Sep 10 16:08 to_next_step/Ts.GHCN.PA.3
    -rw-r--r-- 1 jose users 1.3M Sep 10 16:08 to_next_step/Ts.GHCN.PA.4
    -rw-r--r-- 1 jose users 1.2M Sep 10 16:08 to_next_step/Ts.GHCN.PA.5
    -rw-r--r-- 1 jose users 64K Sep 10 16:08 to_next_step/Ts.GHCN.PA.6
    -rw-r--r-- 1 jose users 510K Sep 10 16:08 to_next_step/Ts.GHCN.PA.station.list

    As an aside, the -fbounds-check argument to gfortran can be very useful.

    (Look Ma, I’m hacking FORTRAN!)

  36. Paul Linsay
    Posted Sep 10, 2007 at 6:25 PM | Permalink

    #335, Mark, in Hansen Frees the Code

    I am currently stuck in STEP3, trimSBBX.f is sometimes stopping with a seg fault (memory error) and sometimes stopping with an end of file condition.

    This is usually the sign of an array index that is out of bounds.

  37. scott lurndal
    Posted Sep 10, 2007 at 6:33 PM | Permalink

    #3

    If monthly(m) has a negative or zero value, one must wonder if the read at line 83 read the correct values. What was the value that was causing the memory fault, zero or a negative value?

  38. ural
    Posted Sep 10, 2007 at 8:01 PM | Permalink

    From what I gather the data changes day by day for some of the inputs. It would be nice to freeze input on a certain date … just for testing purposes.

    I would be happy to help on reversing the time stamps on the data sets … 1960-2000, 2000 now 1960, 1960 now 2000. It should show global cooling at the same rate as global warming … if the assumptions are correct.

  39. Andy
    Posted Sep 10, 2007 at 8:06 PM | Permalink

    Re #374, I think Geoff’s right about the excitement factor in play here. I set up a repository on code DOT google DOT com/p/open-gistemp yesterday and have seen zero interest.

    I’m going to use it for my own hacking, and I’ll do my best to incorporate the great work being done by Not Sure, Mark and the rest on the technical thread, with references back to the applicable posts.

    Anyone who wants to join in the effort is more than welcome.

  40. steven mosher
    Posted Sep 10, 2007 at 8:30 PM | Permalink

    RE4…

    Ya thanks. That’s the path we are headed down. I think we have an index walking off the end
    of an array/file…

    If you know Fortran… can we do any exception handling on the EOF error or are we resigned
    to “printf” debugging?

  41. steven mosher
    Posted Sep 10, 2007 at 8:36 PM | Permalink

    re 3. Ok… I’ll check the code and see if the change is substantial..

    DID you implement the Fix that Mark had for the Line that was causing the Inf Loop?

    The line

    IF(FLOAT(N3).LT.XCRIT*(N3L-N3F+1.))

    was evaluating to true even when N3 and the RHS expression were supposedly equal (I logged them both having the value of 62 to 8 decimal places). I changed the code to the following

    IF((FLOAT(N3)+.00001).LT.XCRIT*(N3L-N3F+1.))

  42. Paul Linsay
    Posted Sep 10, 2007 at 8:54 PM | Permalink

    #7, there is no such thing as exception handling in FORTRAN77 or earlier. READ/WRITE is your only choice.

  43. Not sure
    Posted Sep 10, 2007 at 9:27 PM | Permalink

    Scott Lurndal (5) the negative value was not monthly(m), it was “ix” because the data contained some years before 1880, for some reason.

    Steven Mosher (8) I did apply Mark’s fix, but I wasn’t even making it to that code. I was thinking about trying it without, just to see.

    Paul Linsay (4) The segfault I was encountering was an index out of bounds condition. Compiling the file with -fbounds-check demonstrated this.

  44. Not sure
    Posted Sep 10, 2007 at 9:45 PM | Permalink

    Steve (43) I shore as heck can’t find it. I’ve uploaded my result from STEP0 here

  45. Andy
    Posted Sep 10, 2007 at 10:46 PM | Permalink

    Re #44. Results of a diff between my STEP0 and yours:

    diff v2.mean_comb-andy v2.mean_comb-notsure
    1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
    6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
    6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999

    These 3 records are the only differences between the 2 files.

    My system is CentOS 4.5, pdksh 5.2.14-30.3, gfortran 4.1.1-53.EL4, python 2.3.4-14.4. No modifications to the GISS code were required for me to complete STEP0, and I downloaded the 2 input files in the STEP0_README.txt file today.

  46. Andy
    Posted Sep 10, 2007 at 10:53 PM | Permalink

    Re #45 – boy that sure posted different than it looked in the preview. Let’s try that diff output again:

    diff v2.mean_comb-andy v2.mean_comb-notsure
    Andy’s row = 1, Not Sure’s row = 2

    1-1016035500022007 137 145 139 172 202 230-9999 264-9999-9999-9999-9999
    2-1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
    1-6244025000012007 75 108 132 169 254 280 301 300-9999-9999-9999-9999
    2-6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
    1-6244031000012007 65 99 119 162 232 251 259 266-9999-9999-9999-9999
    2-6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999

  47. Not sure
    Posted Sep 10, 2007 at 10:57 PM | Permalink

    OK, I got STEP3 to run to completion, but I’m nervous about what I had to do. Starting at line 401 it now reads:


    SUBROUTINE GRIDEA (p_NRM,TITLEZ, p_IBMM,p_NZS)
    C****
    C**** This output grid dependent routine sets the parameters
    C**** IBM(J) (= number of regions in latitude belt J) and
    C**** KZONE(J,N) (=1 if belt J belongs to domain N) and the TITLES
    C****
    C**** Current order of boxes: 1-4 north , west->east ...
    C**** . to
    C**** 77-80 south , west->east
    C**** Order of subboxes: 1-10 south , west->east ...
    C**** . to
    C**** 91-100 north , west->east
    C**** Sergei's ordering 1-10 west , south->north
    C**** . to
    C**** 91-100 east , south->north
    C****
    INTEGER p_NRM, p_NZS, p_IBMM
    PARAMETER (NRM=80,ICM=10,JCM=10, NZS=3, NCM=ICM*JCM)
    CHARACTER*80 TITLEZ(*),TITLES(1+NZS)/
    * ' LATITUDE BELT FROM XX.X ? TO XX.X ?',
    * ' NORTHERN LATITUDES: 23.6 N TO 90.0 N',
    * ' LOW LATITUDES: 23.6 S TO 23.6 N',
    * ' SOUTHERN LATITUDES: 90.0 S TO 23.6 S'/
    COMMON/GRIDC/IBM(8),KZONE(8,NZS+3)
    C****
    C**** Sergej's equal area grid
    C****
    C**** Grid constants for latitude zones
    REAL BANDS(9)/90.,64.2,44.4,23.6,0.,-23.6,-44.4,-64.2,-90./
    INTEGER NUMJ(8)/4,8,12,16,16,12,8,4/
    C**** Check grid dimensions
    IF(p_NRM.NE.NRM.OR.NZS.NE.p_NZS) THEN
    WRITE(6,'(4I9)') p_NRM,NRM, p_NZS,NZS
    STOP 'ERROR: INCONSISTENCY FOR NRM NCM NZS'
    END IF
    C**** Loop over all ZONES
    DO 50 J=1,8
    IBM(J)=NUMJ(J)
    IF(IBM(J).LE.p_IBMM) GO TO 20

    For some reason the integer parameters p_NRM (originally called “NRM$”), p_IBMM (originally called “IBMM$”), and p_NZS (originally “NZS$”) were getting converted into wonky floats until I added the “INTEGER p_NRM…” line. Also, the original parameter names caused gfortran to raise syntax errors.

  48. Not sure
    Posted Sep 10, 2007 at 11:06 PM | Permalink

    D’oh the file I modified in 47 is zonav.f

  49. Ian McLeod
    Posted Sep 10, 2007 at 11:08 PM | Permalink

    Looks like the diff is real. Andy is getting another month compared to Not Sure. Is that an integer or rounding problem?

  50. Ross Berteig
    Posted Sep 11, 2007 at 12:43 AM | Permalink

    Re 47: My rusty mental file cabinet immediately tossed up the phrase “The dollar sign is reserved to Digital”. (A big chunk of my early programming experience was with a PDP-11/73 and a VAX-11/780 in assembler and C. Incidentally, I still have that PDP-11, and it still runs.) In a fair number of older languages the dollar sign is an acceptable part of an identifier along with the usual alphanumerics and underscore, and was often used much like underscore is by C folk.

  51. David
    Posted Sep 11, 2007 at 12:53 AM | Permalink

    A dollar sign at the end of a variable was used to denote Strings in BASIC. A$=”HELLO”

  52. Mark
    Posted Sep 11, 2007 at 2:20 AM | Permalink

    Not Sure (#47)

    Are you sure that all steps of STEP3 are completing?

    I’m still getting a memory error while running trimSBBX, I haven’t had a chance to look since yesterday, I’m just picking it back up again now.

    I made similar changes to yours for zonav.f and managed to get it to run. I just made my changes the same as yours but the memory error while running trimSBBX still persists (I thought that the data types in zonav may be causing the errors with trimSBBX).

    For reference I’m developing on Linux using GNU Fortran (GCC) 4.2.1 and Python 2.5.1. I installed a Korn shell to run the scripts (mksh).

    I’m also using ftncheck to give a quick heads up on possible fortran problems.

    My current patch is available from here

    I’m using v2.mean from here, but I’ll switch to a better one once all the steps are running.

    My current sticking points are:

    1. trimSBBX is stopping with either a memory fault or end of file condition. It appears to be random and I suspect it’s an issue with data types.

    2. Assembling data for Step4_5. I’ve downloaded data from hadobs but I can’t access http://ftp.emc.ncep.noaa.gov, the connection times out. STEP4 is looking for SBBX.HadR2, but I cannot see where this file is created, maybe I should read the manual :))

    Mark.

  53. Cliff Huston
    Posted Sep 11, 2007 at 5:06 AM | Permalink

    #52 Mark,

    I can’t access http://ftp.emc.ncep.noaa.gov, the connection times out.

    Try ftp://ftp.emc.ncep.noaa.gov/cmb/sst/oimonth_v2/

    Cliff

  54. H
    Posted Sep 11, 2007 at 5:29 AM | Permalink

    re # 47

    “For some reason the integer parameters p_NRM (originally called “NRM$”), p_IBMM (originally called “IBMM$”), and p_NZS (originally “NZS$”) were getting converted into wonky floats until I added the “INTEGER p_NRM…” line. Also, the original parameter names caused gfortran to raise syntax errors.”

    I recall that parameters with $ were system or shell parameters, like the name of the OS or the size of the memory.

  55. Cliff Huston
    Posted Sep 11, 2007 at 5:53 AM | Permalink

    Mark,

    STEP4 is looking for SBBX.HadR2 . . .

    You probably have this figured by now, but: SBBX.HadR2 is the output from step 4, not an input to step 4.

    Cliff

  56. Mark
    Posted Sep 11, 2007 at 6:13 AM | Permalink

    #(53,54,55)

    Hi,

    Thanks for your comments, they are all helping.

    After downloading the monthly SST data I started having real problems reading the data. I then noticed a test program in the README and tried that. It turns out that you have to compile with the option “-fconvert=swap” when using gfortran.

    I’m now starting again from STEP0 using this option on all the fortran code to see if this changes the end result.

    Mark.

  57. Not sure
    Posted Sep 11, 2007 at 6:53 AM | Permalink

    Mark, yes, all of STEP3 is working for me.

    Did you just create an empty “SBBX.HadR2” file? That’s what I was going to try.

  58. Not sure
    Posted Sep 11, 2007 at 7:00 AM | Permalink

    Ross Berteig (50) I figured it was some sort of obsolete syntax, but it’s nice to be sure. Thanks.

  59. Not sure
    Posted Sep 11, 2007 at 7:31 AM | Permalink

    After reading some more it seems that “SBBX.HadR2” should come from the hadobs site, and that it should be one of the ASCII format files. But which one? There’s no single file that covers the period they say they used (1870-present), and there are none with that name.

  60. Posted Sep 11, 2007 at 7:46 AM | Permalink

    If I produce variable type statements for the Fortran routines can you use them? How would I get the modified code to you? I have an online place where I could leave the modified routines. I have the STEP0 routines done if you want to see examples.

  61. steven mosher
    Posted Sep 11, 2007 at 7:48 AM | Permalink

    re 60.

    Dan, good to see you here! and thanks for helping

  62. James
    Posted Sep 11, 2007 at 8:05 AM | Permalink

    Re #47: Learn some FORTRAN. Variables with first character I through N (as in INteger) are integers, and all other starting characters become floats. Good luck.

  63. Posted Sep 11, 2007 at 8:09 AM | Permalink

    Concerning the Praha-Ruzyně (airport) data, see especially barnie’s fascinating comment #91 at

    http://www.climateaudit.org/?p=2018

    Let me just point out one trivial observation. There is no “periodicity” of 3-4 years (one that would be related to leap years or anything like that). The reason why these lines are organized into periods that are 3-4 years long is simply that they want to make an adjustment that is continuously increasing from -2.6 Celsius in 1909 to 0 in 2007. If you kindly allow me to give my own interpretation, they want to add a 2.6 Celsius degree change to the trend in the last century, at least to the December values.

    The precise adjustment that should be applied to a particular December is a linear function of time and it is eventually rounded to 0.1 Celsius degree accuracy. That’s why the years end up being clumped into groups of 3-4 years. Does anyone understand what these corrections mean [snip]? Incidentally, if the trend is only added to one month, I would expect the overall trend to be artificially increased by 2.6/12 = 0.2 Celsius degrees (per century) or so, do I understand it well?
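
    A small Python sketch of the arithmetic being described, just to show why rounding a linear adjustment to 0.1 C produces clumps of years (the -2.6 C and the 1909-2007 span are taken from the comment above; everything else is illustrative):

    # Linear adjustment from -2.6 C in 1909 to 0.0 C in 2007, rounded to
    # 0.1 C as the published per-year adjustments appear to be.
    from itertools import groupby

    years = list(range(1909, 2008))
    adj = [round(-2.6 * (2007 - y) / (2007 - 1909), 1) for y in years]

    # Count how many consecutive years share the same rounded value.
    runs = [len(list(g)) for _, g in groupby(adj)]
    print(runs)  # mostly runs of 3 and 4 years, matching the clumping above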

    If I understand the signs of the adjustments well, they can hardly be justified by e.g. the urban heat island because the sign is wrong. It is the recent temperatures that should be reduced if you believe that they’re high because of urbanization.

    One comment related to the “comment” section of Praha-Ruzyne: they say “WWII & Russian occupation”. Let me say that certainly no Czech person would use the words “Russian occupation” for anything that happened in the WWII years. We were liberated in 1945, not occupied. 😉 But that’s probably one of the less serious problems in the file.

  64. Posted Sep 11, 2007 at 8:20 AM | Permalink

    Let me just say one thing. I am not accusing anyone of anything, just making a thought experiment. See how temperatures and precipitation depend on the month (in Prague).

    If I wanted to commit fraud to support a theory that there is warming and that it primarily affects winters, the best way would be to make old winters look colder. December and January are the two coldest months and the temperature is minimized near these two months – it is stationary, so that it looks pretty much constant. So if you reduce the temperature in December, you will create colder winters in 1909, which is precisely what you want if you want to show that global warming makes winters milder. Moreover, the points near the stationary points – in this case, months near the coldest December or January – are the most natural months to modify because you don’t create too big discontinuities.

    These corrections look bizarre to me. It would be good if someone tried to give us a less criminal explanation of what’s going on – as well as a more comprehensive analysis of which stations besides Prague-Ruzyně were treated in a similar way.

  65. Richard Sharpe
    Posted Sep 11, 2007 at 8:24 AM | Permalink

    James says:

    Re #47: Learn some FORTRAN. Variables with first character I through N (as in INteger) are integers, and all other starting characters become floats. Good luck.

    Sigh. A little knowledge is a dangerous thing. What about IMPLICIT?

  66. Jaye
    Posted Sep 11, 2007 at 8:29 AM | Permalink

    I did comment those lines as the Python docs make it clear that the results of PyString_AsString must not be deallocated. I was getting a different memory error before I did this.

    Ok guys if you get different results based on remapping memory by commenting out lines then at least one memory bug is living somewhere else and has not been found yet. Suggest somebody run the thing with valgrind in full verbose mode.

  67. Posted Sep 11, 2007 at 8:31 AM | Permalink

    Mark #14, wouldn’t it be easier for you and others to publish the full patched tar.gz archive of the full code and data?

  68. Mark
    Posted Sep 11, 2007 at 8:36 AM | Permalink

    wouldn’t it be easier for you and others to publish the full patched tar.gz archive of the full code and data?

    I guess there would be no problem, it’s not that large.

    I’ll make a full archive up of my changes and post them up at the same page as my patch.

    Mark.

  69. Jean S
    Posted Sep 11, 2007 at 8:42 AM | Permalink

    I think I’m able to perfectly replicate the GISS station combining method (STEP1). The key thing for extending the 2-station method to more stations is simply to always create an “intermediate reference station” for comparing with the new records. The final combination is then simply the average of the shifted records. Additionally, there seems to be no rounding issue after all. I guess rounding problems arose because previously we tried to replicate with rounded annual means (=>rounded bias). My code seems to replicate perfectly all the stations I’ve tested so far (Bagdarin, Praha, Joensuu, Gassim, Bratsk).

    A request: please check the workings of this code for other stations. I’ve already spent too much time on this, I should take a break 🙂 Also, if there are people out there who are fluent in R, it would be nice to have the code also in R 🙂

    I’ve sent my code to Steve, so I might put it up somewhere if it is too hard to copy from here. Here’s the main function:

    --- cut here ---

    function combined=GISScombine(temperatures);

    % combines stations ala GISS
    % by Jean S (9/2007)
    % assumes integers; and full years starting in Jan

    % Minimum overlap for combining
    min_overlap=4;

    [years,stations]=size(temperatures);

    % 1st station is assumed to be the "best"
    fixed_stations(:,1)=temperatures(:,1);
    no_fixed=1;
    ref_station=temperatures(:,1);

    % annuals for all stations
    for k=1:stations;
    annual(:,k)=GISSannual(temperatures(:,k));
    end

    available_stations=2:stations;
    % repeat until no new records to add
    while ~isempty(available_stations),
    [ref_annual,ref_seasonal]=GISSannual(ref_station);
    % Find the annual mean matches
    overlaps=repmat(isfinite(ref_annual),1,length(available_stations)) & ...
    isfinite(annual(:,available_stations));
    % Choose the longest match (if equal then the last maximum)
    [overlap_years,overlaps_loc]=sort(sum(overlaps));
    if overlap_years(end)

    Here's the function for the annual means:

    --- cut here ---
    function [annual,seasonal]=GISS_annual(temperatures);

    % Calculates the GISS seasonal and annual means
    % by Jean S (9/2007)
    % Assumes full year input starting in Jan

    years=length(temperatures)/12;
    temp=reshape(temperatures,12,years)';
    seasonal=ones(years,4)*NaN;
    annual=ones(years,1)*NaN;

    % calculate the reference means and anomalies
    ref_mean=nanmean(temp);
    anomal=temp-repmat(ref_mean,years,1);

    % DJF
    seasonal(2:end,1)=mean([temp(1:end-1,12) temp(2:end,1:2)],2);
    season=[NaN anomal(1,1:2); anomal(1:end-1,12) anomal(2:end,1:2)];
    I=sum(isnan(season),2)==1;
    seasonal(I,1)=nanmean(season(I,:)')'+repmat(mean(ref_mean([1:2 12])),sum(I),1);

    % MAM
    seasonal(:,2)=mean(temp(:,3:5),2);
    season=anomal(:,3:5);
    I=sum(isnan(season),2)==1;
    seasonal(I,2)=nanmean(season(I,:)')'+repmat(mean(ref_mean(3:5)),sum(I),1);

    % JJA
    seasonal(:,3)=mean(temp(:,6:8),2);
    season=anomal(:,6:8);
    I=sum(isnan(season),2)==1;
    seasonal(I,3)=nanmean(season(I,:)')'+repmat(mean(ref_mean(6:8)),sum(I),1);

    % SON
    seasonal(:,4)=mean(temp(:,9:11),2);
    season=anomal(:,9:11);
    I=sum(isnan(season),2)==1;
    seasonal(I,4)=nanmean(season(I,:)')'+repmat(mean(ref_mean(9:11)),sum(I),1);

    % Annual
    annual=mean(seasonal,2);
    ref_mean_season=nanmean(seasonal);
    season=seasonal-repmat(ref_mean_season,years,1);
    I=sum(isnan(season),2)==1;
    annual(I)=nanmean(season(I,:)')'+repmat(mean(ref_mean_season),sum(I),1);
    --- cut here ---

    The input should be in integers such that every record is in a column. The first column is assumed to be the “best” record (which is not selected according to the data, but according to a preference list. See Steve’s post and Hansen’s code for details). Usually this is the only one extending to current times. Here’s an example call (Bratsk; I sent the file also to Steve):

    bratsk_data=load('bratsk.txt');
    bratsk_data(:,2:end)=10*bratsk_data(:,2:end);
    originals=bratsk_data(:,[6 2:5]);
    combined_bratsk=GISScombine(originals);
    max(abs(combined_bratsk-bratsk_data(:,7)))

    Have fun!

  70. Not sure
    Posted Sep 11, 2007 at 8:45 AM | Permalink

    James (62) You’d think they could’ve mentioned that here. I guess the right way to fix those parameter names is NRM_, IBMM_, etc.

  71. Jean S
    Posted Sep 11, 2007 at 8:45 AM | Permalink

    Ok, the first code was cut due to the less-than sign. Let’s wait for Steve to post the code.

  72. Mark
    Posted Sep 11, 2007 at 8:46 AM | Permalink

    Hi,

    I have now put a tgz archive at http://www.zen103403.zen.co.uk/patches/patch.html

    This archive is the plain sources with my current patch applied.

    Mark.

  73. Steve McIntyre
    Posted Sep 11, 2007 at 9:00 AM | Permalink

    Jean S code online at http://www.climateaudit.org/scripts/station/hansen/jeans/

  74. Not sure
    Posted Sep 11, 2007 at 9:03 AM | Permalink

    Andy, what version of tail do you have? I’m using GNU tail 6.4:

    tail --version
    tail (GNU coreutils) 6.4
    Copyright (C) 2006 Free Software Foundation, Inc.
    This is free software. You may redistribute copies of it under the terms of
    the GNU General Public License .
    There is NO WARRANTY, to the extent permitted by law.

    Written by Paul Rubin, David MacKenzie, Ian Lance Taylor, and Jim Meyering.

    You didn’t have to modify line 48 of “do_comb_step0.sh”?

    Also, I appreciate your efforts in setting up a Google project for us. I haven’t participated yet because I’m wondering if there’s going to be a “CA branded” project as some have talked about. I also think that git might be a better choice for source control than svn. With git we’ll be able to have several different patch sets concurrently. I like this because I’m focused on creating the smallest diff that will yield code that runs on a commodity Linux system. I’m not going to try to fix memory leaks or anything else. If the original code leaks, then hopefully my modified version will leak in exactly the same way. This is to preempt objections about the correctness of modifications. I can see how others would like to fix things as they find them. If we have multiple patch sets we can accommodate these two approaches simultaneously. If it turns out that disparate patch sets produce the same outputs for the same inputs they can be merged later.

  75. steven mosher
    Posted Sep 11, 2007 at 9:36 AM | Permalink

    re 47 and 65.

    U know Fortran? Download the dusty deck and get crackin. Pick a routine.
    document it.

  76. steven mosher
    Posted Sep 11, 2007 at 9:40 AM | Permalink

    RE 66.

    Somebody can run this through Pychecker and Valgrind while other folks try to get this
    running.

    Divide and conquer guys. Do what you are best at. report back.

    Herding cats isn’t that bad when they are smart cats.

  77. steven mosher
    Posted Sep 11, 2007 at 9:57 AM | Permalink

    RE 74.

    I think once somebody gets the pile to compile then it makes sense to mv to
    a more amenable environment. As it stands we are hacking the blog paradigm
    to do a remote coordinated debug.. or something like that.

  78. Andy
    Posted Sep 11, 2007 at 10:08 AM | Permalink

    Re #74, I didn’t have to modify line 48. Tail version as follows (from CentOS 4.5 / Redhat EL4).

    tail --version
    tail (coreutils) 5.2.1
    Written by Paul Rubin, David MacKenzie, Ian Lance Taylor, and Jim Meyering.

    Copyright (C) 2004 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    I agree with you on svn. Figured something was better than nothing, though.

    I was thinking this morning that I can probably best leverage my limited Python (and non-existent Fortran) skills by grabbing the changes you guys post that I need to get my platform to do a clean run and publishing them in the google code. I’m pretty far behind you since I had to build a new sandbox for this, but I can take the time to document until I catch up (if possible :-))

  79. James
    Posted Sep 11, 2007 at 10:21 AM | Permalink

    Re: #65, Richard

    What about IMPLICIT? If it isn’t used in the code and the comment I was replying to was confused about variables being floats, explaining IMPLICIT won’t help.
    A little knowledge can be dangerous, but almost zero FORTRAN knowledge will make this project really difficult.

  80. James
    Posted Sep 11, 2007 at 10:28 AM | Permalink

    Re: #65, Richard

    What about IMPLICIT? The original post didn’t understand why certain variables were floats, and my explanation clarified that. If IMPLICIT isn’t used in the code, why would I mention it?

    Yes, “A little knowledge can be dangerous”, but with almost zero FORTRAN knowledge this project can be a lot more difficult.

  81. Not sure
    Posted Sep 11, 2007 at 10:33 AM | Permalink

    OK, I’ve uploaded a clean working copy here. I tried to test it on a clean environment, but I ran into this, so YMMV.

  82. James
    Posted Sep 11, 2007 at 10:35 AM | Permalink

    Re #75 Mosher

    I’ll get on it but you won’t see me here. I’ll be on the Google Code site, working in SVN and using the wiki there. This blog does not lend itself well to development, and since it isn’t threaded, conversations are a real pain…

  83. Mark
    Posted Sep 11, 2007 at 10:47 AM | Permalink

    Hi,

    From what I can make out the program named zonav, called by do_comb_step3.sh, attempts to use fort.11 as its input.
    CALL SREAD(11,ML ,AR(1,IB),WTR(1,IB),LENR(IB))

    But this file has been renamed by the previous operation in do_comb_step3.sh

    mv fort.11 BX.Ts.${label}.$rad
    so it can never run with valid data.

    zonav writes its output to fort.10 which is what trimSBBX is expecting to use, and I think that is why I couldn’t get trimSBBX to run.

    I am left thinking that these two operations may not actually be required for STEP4_5.

    Has anyone else looked at this yet?

    Mark.

  84. David
    Posted Sep 11, 2007 at 10:53 AM | Permalink

    People used variable names beginning with I-N for integers by convention. I don’t think they have to ‘become’ anything. Web search engines are your friend, do a search on: implicit fortran

  85. MarkW
    Posted Sep 11, 2007 at 11:06 AM | Permalink

    David,

    Fortran does not (or at least did not) require one to declare a variable before using it. Variables that started with the letters I-N were declared integers, everything else was declared float.

    This is another reason why Fortran variable names tended to be as short as possible. If you happened to misspell a variable, instead of declaring a parse error, Fortran would just declare the misspelled name to be a new variable, and continue happily on its way.

  86. Mark
    Posted Sep 11, 2007 at 11:32 AM | Permalink

    Hi,

    I think I’ve made some progress on STEP4.

    The elusive SBBX.HadR2 file is, I think, SBBX1880.Ts.GHCN.CL.PA.1200 (one of the files from STEP3) renamed.

    After downloading all the oiv… data files and unpacking them into the input_files directory, then renaming SBBX1880.Ts.GHCN.CL.PA.1200 to SBBX.HadR2 as described above, I am beginning to get somewhere (possibly somewhere wrong!).

    As I said earlier, the oiv data files are in big-endian format and therefore need to be accessed differently than the others.

    To get around this issue set the following environment variable (if you are using gfortran on an intel platform).

    GFORTRAN_CONVERT_UNIT='swap;native:1-10,12-99'

    This tells the fortran runtime to use byte swapping on only channel 11. Doing this will enable convert1.HadR2_mod4.exe to read the downloaded files.

    I am now able to run convert1.HadR2_mod4.exe and see the program appearing to work.

    Unfortunately it is now stopping with an end of file condition.

    Mark.

  87. Mark
    Posted Sep 11, 2007 at 11:43 AM | Permalink

    Hi,

    Lol, it’s just as easy to do

    GFORTRAN_CONVERT_UNIT='swap:11'

    Has the same effect.

    Mark.

  88. Not sure
    Posted Sep 11, 2007 at 11:51 AM | Permalink

    Mark (86) Interesting, step 5 requires both SBBX1880.Ts.GHCN.CL.PA.1200 and SBBX.HadR2. I guess this can be right since SBBX.HadR2 is modified in step 4.

  89. steven mosher
    Posted Sep 11, 2007 at 1:04 PM | Permalink

    78. ANDY

    that’s cool. I know SteveMc and John Goetz seem to be focusing on Step 1. And I have
    avoided the python since I’d just beclown myself ….

    A really cool thing would be a stand alone Step0 Step 1 Module. Basically
    Step 0 and step1 Take in all the land data and do a hansen dance and then output.

    A standalone EXE that did that sh*t would be huge.

    once verified we could assess the hansen Scribal dance.

  90. David
    Posted Sep 11, 2007 at 1:04 PM | Permalink

    #85: I stand corrected. It has been a while! It seems so crude nowadays. If I were a scientist demanding conciseness…

  91. Steve McIntyre
    Posted Sep 11, 2007 at 1:17 PM | Permalink

    #89. I’m more interested in replicating this in R (or Matlab) – in large part, so I can see what they are really doing. The blizzard of subscripts in Fortran and Python makes it very inelegant programming.

    I can see my way towards Step 0 implementation already (although as I observe in another post) it looks like one needs a hypothesized obsolete USHCN version to precisely replicate the results presently online. Their Y2K patch is really goofy and can be circumvented by using up-to-date USHCN versions.

    After I look through Jean S’s Matlab script a little more closely, I should be able to have a simple R implementation of Step 1 as well.

  92. steven mosher
    Posted Sep 11, 2007 at 1:21 PM | Permalink

    re 85.

    I’m having PTFS right now. Post-Traumatic Fortran Syndrome.

    So Time out for a funny story.

    1987. I’m sitting in a class taught by Mason Woo. ( google him he’s cool)

    I’m sitting behind this DDG blond.

    Mason: how many of you write C?
    The whole frickin class raises their hand except the lovely lady.
    Mason: Miss, what language do you know?
    Babe: Fortran.

    At this point I lean over to my partner in crime and say.
    ” She may be cute, but she’s no rocket scientist”

    Then Mason asks everyone in the class what they do…blah blah
    blah blah…

    Miss Fortran babe: I program the guidance systems for ICBMs

    DOH!

  93. Richard deSousa
    Posted Sep 11, 2007 at 1:45 PM | Permalink

    RE #92: LOL…. so, did you submit your name for a date? 😛

  94. steven mosher
    Posted Sep 11, 2007 at 2:09 PM | Permalink

    re 93.

    She wrote Fortran for Christ’s sake. If I asked her to pass me a pointer she would
    have dissolved into tears and core dumped.

    Instead, I cruised El Camino and paid my tributes at Andy Capp’s tavern.

  95. Jean S
    Posted Sep 11, 2007 at 2:24 PM | Permalink

    The blizzard of subscripts in Fortran and Python makes it very inelegant programming.

    Yes, and at least the Python code I looked at was rather terrible. Not to mention the Fortran code (Step2), which is almost impossible to read, at least for a non-Fortran programmer. Anyhow, I have a feeling that the best pearls of Hansen’s code are still hidden somewhere in that Fortran spaghetti. If I had even a decent understanding of Fortran, I’d take a very careful look at padjust.f (which seems to contain the homogenization code) …

  96. David
    Posted Sep 11, 2007 at 2:51 PM | Permalink

    #94: She would have spent 80% of the time trying to debug you. 🙂

  97. steven mosher
    Posted Sep 11, 2007 at 2:53 PM | Permalink

    re 95.

    That’s the big mystery routine.

  98. SteveSadlov
    Posted Sep 11, 2007 at 3:11 PM | Permalink

    RE: #5 – With a quick scan of the thread, I did not see follow-up comments. It bothers me too. I’ve got no issue with an F to C conversion, but it’s the reservation of an integer value that I cannot understand. That is, unless the goal is illogical built-in rounding via automatic truncation of everything to the right of the decimal point.

  99. Dave Dardinger
    Posted Sep 11, 2007 at 4:03 PM | Permalink

    I was reformatting padjust.f after pulling it up in Notepad to see how much of it I could still understand after decades of not having seen any Fortran. Most of it was pretty self-evident but I have a couple of questions.

    1 What is the Calt which appears in a couple of places after a “then”?
    2 What happens in a “call adj(….)”?

    I’d expect both of these to call a program external to padjust, but where are they?

    Thanks 🙂

  100. Not sure
    Posted Sep 11, 2007 at 4:10 PM | Permalink

    Andy (46) I ran STEP0 again on a different machine (gfortran 4.1.2, pdksh 5.2.14, AMD 64) and it gave me exactly the same file. I wonder if the version of tail is the difference. My work_files/t_hohenpeissenberg file starts in 1879 and ends in 2003.

  101. Andy
    Posted Sep 11, 2007 at 4:20 PM | Permalink

    Interesting – so does mine. Tar up your work files and put up a link to them. I’ll pull them down and diff them against mine.

  102. Stevan Naylor
    Posted Sep 11, 2007 at 5:03 PM | Permalink

    Dave [99] Anytime the Fortran compiler sees the character ‘C’ in the first “column” of a line of code, the whole line is treated as a COMMENT.

    Looks like alternate ‘IF()’ statements were commented out but left in. I wonder why…

  103. Not sure
    Posted Sep 11, 2007 at 5:06 PM | Permalink

    Andy, my work_files are here.

  104. Stevan Naylor
    Posted Sep 11, 2007 at 5:26 PM | Permalink

    Dave [99] ‘alt()’ is a subroutine which is coded at the end of the padjust.f source code. It claims to apply ‘linear approximation’ (so sez the comment) but it’s going to take a while to understand what it’s really doing.

  105. Dave Dardinger
    Posted Sep 11, 2007 at 5:40 PM | Permalink

    re: 102 Steve N

    Thanks!

  106. Dave Dardinger
    Posted Sep 11, 2007 at 5:48 PM | Permalink

    re: #104

    Well, duh! I was distracted by the C**** (which I did realize was a comment) and missed the “subroutine” above. Thanks again.

  107. Andy
    Posted Sep 11, 2007 at 8:27 PM | Permalink

    Re #’s 100, 101 and 103

    Differences are in v2.meany and v2.meanz

    diff andy notsure
    Binary files andy/cmb2.ushcn.v2.exe and notsure/cmb2.ushcn.v2.exe differ
    Binary files andy/cmb.hohenp.v2.exe and notsure/cmb.hohenp.v2.exe differ
    Binary files andy/dif.ushcn.ghcn.2005.exe and notsure/dif.ushcn.ghcn.2005.exe differ
    Binary files andy/dump_old.exe and notsure/dump_old.exe differ
    Binary files andy/hohp_to_v2.exe and notsure/hohp_to_v2.exe differ
    Binary files andy/USHCN2v2.exe and notsure/USHCN2v2.exe differ

    diff andy/v2.meany notsure/v2.meany
    1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
    6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
    6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999

    diff andy/v2.meanz notsure/v2.meanz
    1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
    6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
    6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999

  108. Not sure
    Posted Sep 11, 2007 at 10:24 PM | Permalink

    I bet you’re using a 32-bit system. That’s why all the executables differ. But the delta for the text output is exactly the same. Interesting.

  109. Not sure
    Posted Sep 11, 2007 at 10:47 PM | Permalink

    Mystery solved. I decided to start again from scratch, including re-downloading the data files and presto! I get the same file you do. I tracked the difference down to v2.mean:

    diff input_files/v2.mean ../../GISData/v2.mean
    70c70
    < 1016035500022007 137 145 139 172 202 230-9999 264-9999-9999-9999-9999
    ---
    > 1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
    533592c533592
    < 6244025000012007 75 108 132 169 254 280 301 300-9999-9999-9999-9999
    ---
    > 6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
    533786c533786
    < 6244031000012007 65 99 119 162 232 251 259 266-9999-9999-9999-9999
    ---
    > 6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999

    So that file has changed since I downloaded it on Saturday! It has a modification date of today!? At this point I’m really not sure (pun intended) of what we consider authoritative.
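
    In the meantime I’m going to start keeping checksums of the input files next to my work_files, so I at least know which version a given run used. Something as simple as this would do (a generic sketch, nothing GISTEMP-specific about it; hashfiles.py is just whatever you save it as):

    import hashlib, sys

    # Print a SHA-256 digest for each file named on the command line, e.g.
    #   python hashfiles.py input_files/v2.mean
    for name in sys.argv[1:]:
        with open(name, 'rb') as f:
            print("%s  %s" % (hashlib.sha256(f.read()).hexdigest(), name))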

  110. H
    Posted Sep 12, 2007 at 12:06 AM | Permalink

    I looked through padjust.f. It seems like normal research code, unfortunately 🙂 Basically it reads the input files and writes the adjusted results to corresponding output files. Input data marked bad is skipped. The data is looped over by year (starting from December), and the adjustment is calculated from yearly values and then added to the monthly values for the given range. Months marked bad are skipped.

    What makes the result unpredictable are the year limits:

    subroutine adj
    "----------------------
    sl=sl1
    if (iy .gt. knee) sl=sl2

    iya=iy
    C-H Lower limit
    if (iy .lt. iy1a) iya=iy1a
    C-H Upper limit
    if (iy .gt. iy2a) iya=iy2a

    iadj=nint( (iya-knee)*sl-(iy2a-knee)*sl2 )
    --------------------"
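
    In Python terms my reading of that fragment is roughly the following (names copied from the Fortran; this is just my interpretation and has not been checked against actual output):

    def adj(iy, knee, sl1, sl2, iy1a, iy2a):
        # slope sl1 applies up to the knee year, sl2 after it
        sl = sl1 if iy <= knee else sl2
        # clamp the year to the adjustment range [iy1a, iy2a]
        iya = min(max(iy, iy1a), iy2a)
        # offset chosen so the adjustment is zero at iy2a; note Fortran nint
        # rounds halves away from zero, unlike Python's round()
        return round((iya - knee) * sl - (iy2a - knee) * sl2)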

  111. Ralph Becket
    Posted Sep 12, 2007 at 12:19 AM | Permalink

    When creating diffs, you may find the -u option to diff gives you more readable output.

  112. Posted Sep 12, 2007 at 3:10 AM | Permalink

    Re #91,

    Steve, I’m working on an R port too. Check out the page on Stikir.

    What do you make of this in antarc_comb.f?

    if(icc0.eq.iccp.and.idp.eq.id0) then ! should not happen
    write(*,*) 'ID already used',icc0,id0
    stop 4
    end if

    I read that as saying that there should be no station overlap in v2_mean and the antarc?.txt files. Unless I’ve made a mistake, I think there is.

  113. Jean S
    Posted Sep 12, 2007 at 3:41 AM | Permalink

    #110:

    the adjustment is calculated from yearly values.

    So how exactly is the adjustment calculated? It is described in Hansen (1999) as

    The homogeneity adjustment for a given city is defined to change linearly with time between
    1950 and the final year of data and to change linearly with a possibly different slope between 1950
    and the beginning of the record. The slopes of the two straight line segments are chosen to minimize
    the weighted-mean root-mean-square difference of the urban station time series with the time series of nearby rural stations.
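
    Taken literally, the fitting step would be a two-segment least-squares problem with the knee fixed at 1950. A sketch of what I think that amounts to (function and variable names are mine; unweighted, and ignoring how the rural reference series itself is built, so not to be taken as the GISS code):

    import numpy as np

    # d is the annual (urban - rural) difference series, yrs the matching years.
    # The two segments share the value at the knee; the GISS weighting is ignored.
    def fit_two_slopes(yrs, d, knee=1950):
        t = np.asarray(yrs, dtype=float) - knee
        A = np.column_stack([np.ones_like(t),
                             np.minimum(t, 0.0),   # slope of the pre-knee segment
                             np.maximum(t, 0.0)])  # slope of the post-knee segment
        (offset, sl1, sl2), *_ = np.linalg.lstsq(A, np.asarray(d, dtype=float), rcond=None)
        return offset, sl1, sl2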

  114. Mark
    Posted Sep 12, 2007 at 3:59 AM | Permalink

    Hi,

    Right from STEP0/antarc_to_v2.sh there is a possibility for error. When the shell script is extracting data it does a grep in the .list file using the station name as a key to extract the station id.

    ‘Faraday\Vernadsky’ (as it is defined in antarc1.txt) is defined in antarc1.list as ‘FaradayVernadsky’.
    It can only be that the ‘\V’ is detected as an invalid backslash sequence and replaced with a ‘V’ for it to work.

    Mark.

  115. Posted Sep 12, 2007 at 6:10 AM | Permalink

    #109 Not sure

    “So that file has changed since I downloaded it on Saturday! It has a modification date of today!? At this point I’m really not sure (pun intended) of what we consider authoritative.”

    Does this look like someone is making corrections as we speak? Perhaps on the basis of reading this blog? Is Reto Reudy making changes to the ‘simplified version of the code that real scientists would be interested in’ at the moment, and updating the files on GISS using it?

    KevinUK

  116. Steve McIntyre
    Posted Sep 12, 2007 at 6:12 AM | Permalink

    Could someone who’s compiled Step 1 pause for a moment and produce some intermediates for the Step 1 crossword puzzle. Could you do the Step 1 calculations for (say) Bratsk “22230309000”? This has 5 dset=0 series which are added one by one. Could you report on the results for each step so that I can benchmark replication in R (and confirm what is going on in this step).

  117. Steve McIntyre
    Posted Sep 12, 2007 at 6:16 AM | Permalink

    #107. It looks like some 2007 values have been updated. Nothing untoward in that in a system that is producing monthly results.

  118. Scott
    Posted Sep 12, 2007 at 6:32 AM | Permalink

    Something looks wrong in GISTEMP_sources/STEP0/USHCN2v2.f, line 18:

    itemp=-9999

    The problem I see is that itemp is declared as an integer array, and the GNU Fortran 3.4.4 compiler halts on line 18 with an error. I assume the above assignment is supposed to initialize all indices in itemp to -9999.
    I fixed this by commenting out line 18 and adding the following at line 27:

    itemp(m)=-9999

    Any thoughts? Maybe someone who is not getting a compiler error can verify that the assignment is properly initializing all indices in the itemp array.

  119. gb
    Posted Sep 12, 2007 at 7:12 AM | Permalink

    Re # 118.

    In Fortran 90 the statement itemp = -9999 initializes all array elements, I think; itemp(m) initializes only element m. Probably more correct is itemp(1:m) = -9999.

  120. Posted Sep 12, 2007 at 7:13 AM | Permalink

    Re my #112

    There is overlapping station data between SCAR and GHCN, but they are maintained as separate time series in v2_meanx using the duplicate field (character 12 of the ID, I think).

    Deduced from antarc_to_v2.sh:

    Duplicate number 7 is the Australian data in antarc2.txt
    Duplicate number 8 is surface station data in antarc1.txt
    Duplicate number 9 is autom. weather stn data in antarc3.txt
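
    For anyone else picking v2.mean apart by hand, the fixed-width layout appears to be as follows (my reading, inferred from the records quoted in #107/#109; worth double-checking against the GHCN readme):

    # Sketch of a fixed-width parser for one v2.mean record, e.g.
    # "1016035500022007  137  145  139 ..."
    def parse_v2_line(line):
        station = line[0:11]   # country(3) + WMO(5) + modifier(3)
        dup     = line[11]     # duplicate digit (7/8/9 for the SCAR series above)
        year    = int(line[12:16])
        # twelve 5-character monthly means in tenths of a degree C; -9999 = missing
        months  = [int(line[16 + 5*i:21 + 5*i]) for i in range(12)]
        return station, dup, year, [m if m != -9999 else None for m in months]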

  121. Posted Sep 12, 2007 at 7:21 AM | Permalink

    re # 118

    The Absoft FORTRAN 77 compiler, ver. 8.2a (Mac OS X 10.4.10), also gives the error. A local DO/ENDDO loop can be used as a fix.

  122. Andy
    Posted Sep 12, 2007 at 7:23 AM | Permalink

    Re #’s 107 & 117, I had listed this mismatch as an issue on the Google code page and speculated that it may be due to different input files being used. Glad to know it’s resolved.

    Steve Mc, is it possible to host a frozen copy of the input files in the CA downloads area for testing?

  123. Trent
    Posted Sep 12, 2007 at 8:11 AM | Permalink

    Re #118,

    In FORTRAN 90 and 95, entire arrays can be assigned a value with a single statement of the form:

    itemp = -9999

    In FORTRAN 77, this was not possible. One had to assign each element separately in a do loop.

  124. Not sure
    Posted Sep 12, 2007 at 11:39 AM | Permalink

    Steve (116) The step 1 results for Bratsk are here. The full results for step 1 are here. These are with Saturday’s data file. I can re-run it with the latest data file if you like.

  125. Lucius Vorenus
    Posted Sep 12, 2007 at 3:57 PM | Permalink

    anonymous: That is, every time a variable is assigned, it (potentially) takes on a new value AND a new type.

    This is something which has infuriated me for years.

    Kernighan & Ritchie [& their ilk, such as McCarthy] made a terrible, disastrous error when they decided to go with poor typing, and character-driven, business-oriented, ASCII-ness of data values, rather than binary, scientific-oriented, strong typing of data, and almost every language which has succeeded them has made the same mistake, and it plagues the discipline like a cancer to this day.

    I know of precious few languages which have strong typing of mathematical quantities: languages which support [inherently] a distinction between 8, 16, 32, and 64 bit integers [preferably signed & unsigned]; 16, 32, 64, 80/96, and 128 bit real floats; and 32, 64, 128, 160/192, and 256 bit complex floats.

    Or even which are capable of understanding [inherently] the distinction between 8 [ASCII], 16, and 32 bit Unicode characters, etc.

    Instead, what we’re left with is inanities such as:

    <html>
    <body>

    <script language="javascript">

    var x = 2;
    var y = 2;
    window.alert(x + " + " + y + " = " + x + y);

    window.alert("Oops!\n\nLet's try that again.");

    var i = parseInt(2);
    var j = parseInt(2);
    var k = parseInt(parseInt(i) + parseInt(j));
    window.alert(i + " + " + j + " = " + k);

    </script>

    </body>
    </html>

    And the very fact that someone would introduce a poorly-typed language into any implementation of a mathematical calculation raises a huge red flag.

  126. Mark
    Posted Sep 13, 2007 at 3:48 AM | Permalink

    Lucius Vorenus (#125)

    This is probably off topic, but you could have got the correct result if you had typed

    window.alert(x + " + " + y + " = " + (x + y));

    (Note brackets round expression).

    It is the ECMA specification that says the + operator concatenates the values if either operand is a string.

    The first x would be treated as a number, but when the string object is encountered everything is converted to a string object and concatenated. The objects within the ‘(x + y)’ are all numbers, so x + y is evaluated as a numeric expression and then converted to a string.

    I don’t think that it’s the typing of the object that’s the problem; it’s when you convert between types that problems arise. Lack of knowledge about the particular language that you use is also a factor (not directed at you).

    I personally think that objects should be typeless until the point that you do something with them, and then they should adopt a type that is suitable for the operation being performed on them. If there is no suitable representation of that object for the particular operation then an error should be raised.

    Why should numbers be given special preference?

    Mark.

  127. Mark
    Posted Sep 13, 2007 at 4:03 AM | Permalink

    In a different thread, Steven Mosher said,

    Areas covered occasionally by sea ice are masked using a time-independent mask.
    The Reynolds climatology is included, since it also may be used to find that
    mask. Programs are included to show how to regrid these anomaly maps:
    do_comb_step4.sh adds a single or several successive months for the same year
    to an existing ocean file SBBX.HadR2; a program to add several years is also
    included.

    Result: update of SBBX.HadR2

    Does anyone know where to get this existing SBBX.HadR2 ?

    Mark.

  128. fFreddy
    Posted Sep 13, 2007 at 4:19 AM | Permalink

    Re #125, Lucius Vorenus

    Kernighan & Ritchie [& their ilk, such as McCarthy] made a terrible, disastrous error when they decided to go with poor typing

    That’s a bit unfair. The original K&R required explicit declaration of type, and the C compiler I was using 20 years ago generated an error if you tried to make an assignment between types without an explicit cast.

    I entirely agree with your general point though – I still always switch on all available strong typing options.

  129. fFreddy
    Posted Sep 13, 2007 at 4:28 AM | Permalink

    Re #126, Mark

    I personally think that objects should be typeless until the point that you do something with them, and then they should adopt a type that is suitable for the operation being performed on them.

    I think this relies too much on perfection from the programmer. It’s too easy to make small mistakes like assigning a float to an integer when you didn’t mean to. When that happens, I would prefer it if the compiler points it out as an error. If the assignment is what I meant, then I can override with an explicit conversion.
    Strong typing means slightly more work for the programmer, but a big increase in confidence that the code is doing what you think it is doing.

  130. Andy
    Posted Sep 13, 2007 at 7:54 AM | Permalink

    Re #127, GISS added the following to the revised gistemp.txt included in the 9/10/07 source code release.

    Final Notes
    ———–
    A program that can read the two basic files SBBX1880.Ts.GHCN.CL.PA.1200 and
    SBBX.HadR2 in order to compute anomaly and trend maps etc was available on our
    web site for many years and still is.

    This prompted me to nose around their ftp site, and I located a SBBX.HadR2 at ftp://data.giss.nasa.gov/pub/gistemp

  131. Mark
    Posted Sep 13, 2007 at 8:42 AM | Permalink

    Andy (#130)

    That’s great, thank you very much. I’ll start from there.

    Mark.

  132. Posted Sep 14, 2007 at 5:22 AM | Permalink

    Hi team.

    I’d love to get in on this but my current workload prohibits me. I’d just like to leave this link to (imho) a good Fortran on-line reference for those who are frustrated.
    http://www.star.le.ac.uk/~cgp/prof77.html

    BTW I usually work with an engineering tool used for verification of safety-critical systems. The tool comprises ksh/perl/fortran/C (the latter two cross-compiled) software. Sometimes we cross-compile it with C++ and ADA too. That just happens when you keep a tool around for a while, so I’m not surprised at what you’re reporting about “inelegant programming”.

    Good Luck!

  133. Lynn Teuscher
    Posted Sep 14, 2007 at 4:26 PM | Permalink

    Where did all the discussion of testing the code go? There’s no discussion of results or problems. I’m new to this site and find this very interesting, and would like to continue to follow the progress. I have extensive experience with Fortran, from the early 60’s to the mid 90’s, and might be able to help some.

  134. steven mosher
    Posted Sep 14, 2007 at 5:22 PM | Permalink

    Hey Lynn, welcome aboard. Read the thread and contact the people working the issues.

  135. Andy
    Posted Sep 14, 2007 at 9:41 PM | Permalink

    FWIW, no changes have been made to the “official” GISS code in the last few days (can’t say the same for the data 🙂 )

    Has anyone gotten all the source code steps to run yet?

  136. bernie
    Posted Sep 15, 2007 at 5:48 AM | Permalink

    I am with Lynn. Can someone summarize where “the gang” (that is, the varied and cosmopolitan computer geniuses, would-be hackers, programmers in forgotten languages, and those who remember punch cards) is with respect to getting the code to replicate one or other versions of the data?

  137. Steve McIntyre
    Posted Sep 15, 2007 at 6:45 AM | Permalink

    FWIW, no changes have been made to the “official” GISS code in the last few days (can’t say the same for the data.

    Please keep in mind that the “official version” of the code on Sept 7 introduced a change in data provenance that was inconsistent with the dset=1 versions of USHCN sites that I downloaded earlier in the day on Sept 7 (The timing being a coincidence.) They changed the online data versions some time in the next few days. This has been a little tricky to sort out.

  138. Mark
    Posted Sep 15, 2007 at 7:03 AM | Permalink

    Hi,

    I’m currently stuck on stage4_5 with an end of file condition, so, as a distraction I set about rewriting STEP0 in ‘C’ (after all, I wouldn’t be a programmer unless I had the urge to rewrite every bit of code I see).

    I’ve done everything up to dif.ushcn.ghcn.2005, which I’m just in the process of converting.

    I’ve pretty much stuck to their original algorithms, maybe changing gotos for while/do-while loops, and so far the output compares exactly to that of the Fortran code.

    Mark.

  139. Clayton B.
    Posted Sep 15, 2007 at 7:55 PM | Permalink

    I too am with Lynn. It seems there’s been a loss of momentum on this thing (coming from one who has contributed nothing!).

    I am very interested in the progress of understanding the original code and the potential development of an enhanced version. Please keep us up to date guys.

  140. steven mosher
    Posted Sep 15, 2007 at 8:03 PM | Permalink

    Ok, STEVEMC…

    Here is what I suggest. Mark is furthest along. Mark, if you do a 2 paragraph write-up
    of where you are and what you need and ASK STEVE MC to post it as an article, you will
    do two things.

    1. Become famous on CA.
    2. Refocus attention for those who dig this.

    Basically it will give you top billing.

    SteveMc? what say ye?

  141. steven mosher
    Posted Sep 15, 2007 at 8:10 PM | Permalink

    And Mark, you’re having another EOF in step 4 / step 5.

    Is it random or deterministic?
    Apparently there is no EOF error trapping or exception handling so we are stuck with “printf” debugging.

    Which routine? sread?

    Steve

  142. Mark
    Posted Sep 16, 2007 at 3:08 AM | Permalink

    I think I need to dig out a description of the required processes for stages 4 and 5.

    Stage 4 adds years to the SBBX.HadR2, but I can’t seem to find out which years need to be added (and do they even need to be applied to the version of SBBX.HadR2 that I have?).

    The error I’m now getting is ‘STOP ERROR: MONM0 TOO SMALL’ in SBBXotoBX.f during stage 5.

    I will do a short write up of what needs to be done to get to where I am. I’m away for a while today and I start my new job tomorrow, so busy times for me.

    Mark.

  143. H
    Posted Sep 16, 2007 at 6:55 AM | Permalink

    # 142 Mark

    MONM0 is used only to reserve memory. Why it’s not big enough is impossible to say, since the comparison value comes from the input files. The “standard” practice is to make a note and double MONM0 in the parameter line.

  144. Posted Sep 18, 2007 at 6:48 AM | Permalink

    Here’s my R port of Step 0.

    I’m ready to get stuck into Step 1, but am feeling a bit daunted by the Python code. Does anyone have a good enough understanding of the record combination algorithm to post some pseudo-code?
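
    To make the question concrete, here is the rough shape I am guessing at (based on Hansen and Lebedeff style descriptions, not on a reading of the Python, so the real overlap rules and weighting almost certainly differ; corrections welcome):

    # Guess at combining the duplicate records for one station (sketch only)
    def combine_records(records):
        # records: list of dicts mapping (year, month) -> temperature
        records = sorted(records, key=len, reverse=True)   # start from the longest
        combined = dict(records[0])
        n = 1
        for rec in records[1:]:
            overlap = [m for m in rec if m in combined]
            if not overlap:
                continue   # presumably the real code requires some minimum overlap
            # shift the new record to match the running combination over the overlap
            bias = sum(combined[m] - rec[m] for m in overlap) / len(overlap)
            for m, v in rec.items():
                shifted = v + bias
                combined[m] = (n * combined[m] + shifted) / (n + 1) if m in combined else shifted
            n += 1
        return combined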

    Mike

  145. Jean S
    Posted Sep 18, 2007 at 6:59 AM | Permalink

    Mike, see #69/#73 above.

  146. Steve McIntyre
    Posted Sep 18, 2007 at 7:53 AM | Permalink

    #145. I’ll do a post on Step 1 and link to Jean S’s and my code. I’ve worked through the problem concurrently with Jean S and have reconciled in R up to annoying small differences of less than 0.5.

  147. Steve McIntyre
    Posted Sep 18, 2007 at 7:55 AM | Permalink

    #144. By the way, Mike, that looks like a clean way to present an implementation of Step 0. I’ll try to take a look in light of some of the version issues that have been discussed recently.

  148. Posted Sep 18, 2007 at 8:23 AM | Permalink

    Great..

    A few notes on my R implementation:

    – It’s based on my interpretation of the Fortran source and hasn’t been verified.
    – All the input filenames were prefixed with “GISSTEMP_” and suffixed with “.txt”
    – It relies on Importing USHCN data and Importing GHCN data into R workspaces, which were done separately.
    – It relies on an awk script that tidies up the messy antarc*.txt files. You can get both the script and the resulting CSV file from that link.

    So it might just look clean because I’ve hidden my mess 😉

  149. H
    Posted Sep 19, 2007 at 3:59 PM | Permalink

    Can someone produce or point me to padjust.f input files fort.1, fort.12 .. fort.17? And preferably also the output files fort.22 .. fort.27?

  150. Steve McIntyre
    Posted Sep 20, 2007 at 10:06 PM | Permalink

    Has anyone got through to Step 2? If so, I would like to get some info from your run.

  151. Posted Sep 21, 2007 at 9:54 AM | Permalink

    I’ve cross-posted the R code to my wiki and it now runs on the site using pre-loaded GISS datasets. I had made some code changes, mostly because of differences in data formats. I’m reasonably sure I haven’t broken anything in the process.

    I see a discussion on STEP 2 has started. Am I right in saying that there’s been no R code yet to replicate the concluding pieces of STEP 1:

    – running the above for each ID, and picking out the combined version
    – manual changes listed in Ts.discont.RS.alter.IN, combine_pieces_helena.in, Ts.strange.RSU.list.IN

    I also notice that the GISS input_files sumofday.tbl, mcdw.tbl, ushcn.tbl, & v2.inv are not required in Steve’s posted code for STEP 1. What are they for?

    Regards, Mike

  152. Posted Sep 21, 2007 at 10:16 AM | Permalink

    Sorry, I meant to post the previous message on the STEP 1 thread.

  153. H
    Posted Sep 21, 2007 at 3:27 PM | Permalink

    I converted padjust.f (STEP 2) to MATLAB. If somebody can provide the input files, I could debug it 🙂 Anyway, the code is here http://koti.mbnet.fi/ppppp/padjust_H.m

  154. NeedleFactory
    Posted Sep 22, 2007 at 3:54 PM | Permalink

    Question about STEP0:
    antarc_comb.exe appears to take two (filename) arguments.
    The first is ‘v2_antarct.dat’ (produced by ‘antarc_to_v2.sh’).
    What is the second?

  155. Rui Sousa
    Posted Dec 18, 2008 at 8:40 AM | Permalink

    Hello friends,

    I have taken the v2.mean file from GHCN and I have imported it to an SQL Server 2008 database, to explore the data with the tools I am used to.

    I imported the raw data into a new table and then ran a SQL script to transform -9999 into NULL values.

    The create table script is:


    CREATE TABLE [dbo].[v2mean](
    [stationID] [varchar](11) NULL,
    [Tie] [varchar](1) NULL,
    [Ano] [varchar](4) NULL,
    [1] [varchar](5) NULL,
    [2] [varchar](5) NULL,
    [3] [varchar](5) NULL,
    [4] [varchar](5) NULL,
    [5] [varchar](5) NULL,
    [6] [varchar](5) NULL,
    [7] [varchar](5) NULL,
    [8] [varchar](5) NULL,
    [9] [varchar](5) NULL,
    [10] [varchar](5) NULL,
    [11] [varchar](5) NULL,
    [12] [varchar](5) NULL,
    [filler] [varchar](1) NULL
    ) ON [PRIMARY]

    The script to handle the null values is here:


    declare @Month as int
    Set @Month = 1
    Declare @SQLString as varchar(500)

    while (@Month <= 12)
    BEGIN

    SET @SQLString = 'update dbo.[v2mean]
    set [{0}] = null
    where [{0}] = {1}'

    set @SQLString = REPLACE(@SQLString,'{0}',cast(@Month as varchar))
    set @SQLString = REPLACE(@SQLString,'{1}','-9999')

    execute (@sqlstring)

    set @Month = @Month + 1
    END

  156. Rui Sousa
    Posted Dec 18, 2008 at 8:51 AM | Permalink

    Then I imported the station data into another table; even though I wasn’t able to identify all the columns, the purpose was to be able to establish a relation with the mean data. This table has the following script:


    CREATE TABLE [dbo].[GHCNStationIventory](
    [StationID] [varchar](11) NULL,
    [Column 1] [varchar](1) NULL,
    [Station Name] [varchar](31) NULL,
    [Latitude] [varchar](6) NULL,
    [Longitude] [varchar](8) NULL,
    [Elevation] [varchar](5) NULL,
    [col 6] [varchar](5) NULL,
    [Urbanization] [varchar](1) NULL,
    [Column 8] [varchar](5) NULL,
    [Column 9] [varchar](2) NULL,
    [Column 10] [varchar](2) NULL,
    [Column 11] [varchar](7) NULL,
    [LandType] [varchar](16) NULL,
    [C13] [varchar](1) NULL,
    [Filler] [varchar](1) NULL
    ) ON [PRIMARY]

    My next step was to create a new table that in my opinion better addresses the normal forms of relational databases (http://en.wikipedia.org/wiki/Database_normalization); here is the script to create this new table:

    CREATE TABLE [dbo].[StationMonthValue](
    [StationID] [bigint] NOT NULL,
    [Tie] [int] NOT NULL,
    [Year] [int] NOT NULL,
    [Month] [int] NOT NULL,
    [Value] [int] NULL,
    CONSTRAINT [PK_StationMonthValue_1] PRIMARY KEY CLUSTERED
    (
    [StationID] ASC,
    [Tie] ASC,
    [Year] ASC,
    [Month] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
    IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    On this new table I will have one record per value, which makes future queries much easier.

  157. Rui Sousa
    Posted Dec 18, 2008 at 8:58 AM | Permalink

    To copy data from the original format to the new table I used the following code:


    declare @Month as int
    Set @Month = 1
    Declare @SQLString as varchar(500)

    while (@Month <= 12)
    BEGIN

    SET @SQLString = 'insert into dbo.StationMonthValue
    SELECT CAST( v2mean.stationID as bigint), Tie, cast(v2mean.Ano as int) as [year],
    cast(? as varchar) as [Month], cast( v2mean.[?] as int) as value
    FROM GHCNStationIventory INNER JOIN
    v2mean ON GHCNStationIventory.StationID = v2mean.stationID'

    SET @SQLString = REPLACE(@SQLString,'?',cast(@Month as varchar))

    EXECUTE (@SQLString)

    SET @Month = @Month + 1
    END

    Then I queried this table to obtain the global average of the values, with this query:

    SELECT AVG(Value)
    FROM dbo.StationMonthValue

    The result was: 120

    Does this mean that the whole dataset presents a raw average deviation from the reference period of 1.2 ºC?

  158. Rui Sousa
    Posted Jan 22, 2009 at 7:09 AM | Permalink

    Sorry, it means the average temperature of this period is 12 ºC

  159. Posted Mar 28, 2009 at 5:28 AM | Permalink

    I just ran across this comment on the GISTemp Updates page:

    Feb. 11,2009: Two bugs in STEP0 programs were corrected before they had a chance to affect any results. Thanks to Dr. Peter O’Neill for discovering and reporting them to us.

    I don’t understand how bugs discovered in 2009 cannot have affected results of calculations made in, say, the 1990s.

    • Peter O'Neill
      Posted Mar 29, 2009 at 5:52 PM | Permalink

      Re: Dan Hughes (#159),

      I just ran across this comment on the GISTemp Updates page:

      Feb. 11,2009: Two bugs in STEP0 programs were corrected before they had a chance to affect any results. Thanks to Dr. Peter O’Neill for discovering and reporting them to us.

      I don’t understand how bugs discovered in 2009 cannot have affected results of calculations made in, say, the 1990s.

      Surprising as that may seem, that is the case here.

      The first bug I reported was located in unreachable code in antarc_comb.f, and would have had serious consequences but for the fact that it could never be executed. The danger of course would be that at some future date the code might be taken over by someone with experience only of structured programming, who might make the mistake of believing that FORTRAN code of a certain era was always first designed and then coded – and who might then “fix” the unreachable code.

      The second bug was in USHCN2v2.f, where the format statement used to read ushcn.tbl selected the wrong (and blank) column for a single digit numeric value, but as the FORTRAN code reads a blank as zero, and as all the values in the correct column are in fact zero for ushcn.tbl, again the current calculations were not affected. The other .tbl files, where the corresponding values would not have been zero, are read instead (correctly) in Python.

      The coders of GISTemp seemed to have liked to live dangerously. I’m curious whether this lucky streak will still hold should I find any further bugs as I work my way through the later steps.

      On reflection I’m beginning to think releasing the code could have been a clever strategy to get it debugged at no cost before old-timers like myself, who started out with FORTRAN II and may still be able to put ourselves in the mindset needed to follow some of the more esoteric code, die off.

      Perhaps this could be our final business opportunity – “we can review your ‘legacy’ FORTRAN II/IV, ALGOL, PL/1, 360 ASSEMBLY, COBOL, APL, RPG II … code; GOTOs welcome; all common, even if now deprecated, stupid programming tricks accepted”?

  160. Posted Mar 29, 2009 at 6:29 PM | Permalink

    Thanks for the info Peter.

    I would like to help get the code up and running. I have all the FORTRAN code compiled, but I have not yet looked into the details of that coding. I’ve noticed that all the independent efforts started back in 2007/2008 seem to have come to a dead stop. I remain amazed that independent porting of the code is proving so problematic. Can code actually be so tightly tied to specific machines and OS configurations?

    Drop me an e-mail dhughes4 at mac dot com.

  161. E.M.Smith
    Posted Jul 27, 2009 at 1:26 AM | Permalink

    I have GIStemp running on an x86 Linux (RedHat 7.2) box using the G95 compiler.

    I’ve run it up through step3. At step4_5 I hit the ‘endian’ problem with the read of the SBBX.HadR2 dataset in Step5 SBBXotoBX.f file (read(11) at line 112 or so) and convert.HadR2_mod4.upto15full_yrs.f with the same file. I’ve not yet reached the point of finding out if the oisstv2_mod4… et al files have similar problems.

    So I’m now looking at making an “endian conversion tool”. We’ll see. It might be easier just to get an old Macintosh PowerPC box (also big endian) and put linux on it with the G95 compiler…
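
    (As a stop-gap I may just byte-swap the data outside of Fortran. Something along these lines handles a stream that is nothing but 4-byte numeric words; the character titles in the SBBX files would get scrambled by it, so in practice it would have to be applied record by record. And if I remember right, gfortran has a -fconvert=big-endian option that sidesteps the whole issue, though I have not checked whether G95 offers an equivalent.)

    import numpy as np

    # Crude word-swap sketch: reinterpret the whole file as big-endian 4-byte
    # integers and write the same values back in native little-endian order.
    # Only valid where every item, record markers included, really is a 4-byte word.
    def swap_4byte_words(path_in, path_out):
        words = np.fromfile(path_in, dtype='>i4')
        words.astype('<i4').tofile(path_out)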

    The major issue I found with the G95 compiler was the use, by GIStemp, of a non-standard extension to the language for doing data declarations in a variable declaration. Where they have:

    INTEGER foo(3)/1,2,3/

    You need instead:

    INTEGER foo(3)
    DATA foo /1,2,3/

    FWIW, I’ve moved the source files into a ‘src’ directory and the executables into a ‘bin’ directory, and added a Makefile to do all the compiles (pulling that out of the other scripts). At this point I have a version that works pretty well (up to the big-endian point…), compiles well, and generally seems usable for testing.

  162. Posted Jul 27, 2009 at 1:31 AM | Permalink

    Oh, BTW, anyone wanting to ask questions of me can do so at:

    GIStemp

    where I’m documenting my “discoveries”…

One Trackback

  1. […] the graphic below compares the Detroit Lakes version from Step 0 from Not Sure not here to the Dset=0 version currently online at NASA GISS. The results are obviously similar in recent […]