Technical Comments only.
STEP0 completes successfully on a Gentoo Linux system. The result is a 43MB text file that looks to be in the same format as the v2.mean downloaded from GHCN.
Notes: The FORTRAN compiler was gfortran 4.1.1. I used ksh-93 from Gentoo portage. In addition to adding “FC=gfortran” to your environment, you’ll have to add the current directory “.” to your PATH. Also, my version of tail did not like line 43 of do_comb_step0.sh. I had to change it to:
tail -n+100 input_files/t_hohenpeissenberg_200306.txt_as_received_July17_2003 > t_hohenpeissenberg
Note that this does not exactly “disregard pre-1880 data” as the comments say, because it still includes 1879.
I also managed to complete STEP0 on an Ubuntu system. But STEP1 seems harder, since you have to compile some Python extensions first. I got some error messages doing that which I don’t have a clue how to deal with, and without the extensions the Python scripts won’t run. Anyone with better luck?
RE: 111
Yeah, I stumbled on the Python modules too. They were developed against an ancient version of Python (1.5). The “Makefile.pre.in” method is obsolete. I’m currently trying to build them using the newer “distutils” method, but I’m probably going to have to take a break for a while.
Martin #103. Thanks for posting the code. It looks like “Bad” is defined earlier. Also, what I see looks close to what Steve and others had fleshed out, namely the link to the long-term anomaly for the station. I still think we need to be sure we understand how they are combining the data. Some stations, like Praha Ruzyne (#91), have adjustments over the time period of close to 3 C.
#91
Doug
Excellent question. I think one task is to home in on whatever leads to the classification of a site as rural versus non-rural, lit or unlit, since these classifications dictate what should be influencing what. Rural or unlit stations that should really be classified as non-rural, i.e., ones with large potential UHI or microsite effects, could have a big influence on adjustments that are made or not made to a non-rural site.
Here is the kind of thing that bothers me:
GISTEMP_sources/STEP0/USHCN2v2.f:
if(temp.gt.-99.00) itemp(m)=nint( 50.*(temp-32.)/9 ) ! F->.1C
— Sinan
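To see what that line actually does, here is a small Python sketch of the Fahrenheit-to-tenths-of-Celsius conversion (note: Fortran’s NINT rounds half away from zero, which is not what Python 3’s built-in round does, so a helper is used; the input values are illustrative, not taken from the GISS data):

```python
def nint(x):
    """Fortran-style NINT: round half away from zero."""
    return int(x + 0.5) if x >= 0 else int(x - 0.5)

def f_to_tenths_c(temp_f):
    # mirrors: itemp(m) = nint( 50.*(temp-32.)/9 )  ! F -> 0.1 C
    return nint(50.0 * (temp_f - 32.0) / 9.0)

print(f_to_tenths_c(50.0))   # 100 tenths, i.e. 10.0 C
print(f_to_tenths_c(33.0))   # 6 tenths: 0.556 C rounded up to 0.6 C
print(f_to_tenths_c(-40.0))  # -400 tenths, i.e. -40 C
```

The storage unit is tenths of a degree, so every value loses up to 0.05 C to rounding at this step.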
I created trivial setup.py files to get the Python C extensions to compile, but I guess the approach in #184 will work as well. Also, I used perl to munge the .c files and fix the references to deprecated memory functions e.g.:
perl -p -i.pymalloc -e "s/Py_Malloc/PyMem_Malloc/g" EXTENSIONS/cartgrid/cartgridmodule.c
I also found that lines 504 and 505 in stationstringmodule.c will cause my Python interpreter (2.4) to segfault. I commented them out, as they seem to be in error. Finally, I changed line 54 in v2_to_bdb.py to read “if key == ‘id’:”. There’s probably other stuff that I’m forgetting, but I’m still getting memory errors (*** glibc detected *** /usr/bin/python: free(): invalid pointer: 0x080e2c8c ***). This last error is also in stationstringmodule. It’s late and I’m going to bed. I’ll return to this tomorrow night if no one else has figured it out by then.
Not Sure (#191)
To get rid of the glibc free() error, comment out the two PY{Mem}_Free calls near line 506 of stationstringmodule.c
Mark.
Mark (202) I did comment those lines as the Python docs make it clear that the results of PyString_AsString must not be deallocated. I was getting a different memory error before I did this.
Dr. Motl (205) that will certainly work, but it will litter the code with useless comments which eventually could become a nuisance.
The changes I’ve made should not affect the logic of the code. I just wonder if I can make it work on a low-cost Linux system. Sort of my personal crossword puzzle, if you will. I justify it to myself by thinking this will help make the logic accessible to those much smarter and actually versed in statistics, unlike me. It looks like this stuff was run on an AIX system, and not many people get time on those.
Dr. Hansen certainly has nothing to fear from me, and I actually harbor an (unrealistic?) hope that my efforts will be useful to him and his team. I find it interesting that it seems most think any errors that are found will adjust the mean temperature downwards. I guess that is to be expected considering Dr. Hansen’s predictions of doom, and his evasiveness, but I am sure that the people on this site will be honest about any errors that underestimated the amount of warming, so I fail to see the harm done here.
#223 I count 9854 lines of code actually. Only one order of magnitude less complex. Gavin is exaggerating.
$ find . -type f -name '*.sh' -exec wc -l {} \; | awk '{s+=$1} END {print s}'
297
$ find . -type f -name '*.f' -exec wc -l {} \; | awk '{s+=$1} END {print s}'
4239
$ find . -type f -name '*.py' -exec wc -l {} \; | awk '{s+=$1} END {print s}'
1167
$ find . -type f -name '*.c' -exec wc -l {} \; | awk '{s+=$1} END {print s}'
4151
Total 9854
Mark (218) Thanks, that did it. Those lines looked suspicious to me too, but I really didn’t know what they do, so I left them alone. Unfortunately it’s still not running for me. The script drop_strange.py dies for me now, and I’m out of time for today.
Out of curiosity, did you write setup.py scripts or did you use the approach in #184?
Not Sure (#243)
Hi,
First thing I did was to install a separate copy of Python for use only by this program (Python 2.5.1).
Changed the PATH to point to the new python.
Modified the Makefile.pre.in file to remove the aix defines and change the shell to /bin/bash.
Changed the #!/…./python path in the python scripts to point to the new python.
Made the changes as described so far.
And ran the scripts.
If you build python without any CFLAG modifiers it will build with debugging symbols built in. This version can then be run under gdb, the gnu debugger (which is how I found the seg faults).
Create a little parameter file with the arguments to use, i.e.:
set args -d comb_records.py v2.mean_comb
run
and then execute by running gdb -x {parameter_file_name} /path/to/new/python, if anything goes wrong the debugger will break and it should be possible to work out what has gone wrong.
There are some other small changes I made but they should not affect the running of the code.
Maybe a set of diffs posted to the google code site would help.
Mark.
Hi,
I was so busy talking about the C code I should have mentioned that in drop_strange.py I changed the bsddb.hashopen() calls from .hashopen(NEW_BDB_NAME, 'nc') to .hashopen(NEW_BDB_NAME, 'n'), removing the c from the hashopen options arguments.
hth,
Mark.
Hi,
For anyone interested, I have placed a patch at the following address:
http://www.zen103403.zen.co.uk/patches/patch.html
It should apply cleanly to a fresh extract of the GISTEMP_sources.tar.gz file (don’t forget to extract the EXTENSION.tgz file in the STEP1 directory).
This should allow completion of steps 0 and 1 and part of step 2.
Mark.
#39 – Use the PRE tag
Regarding the code.
I have, with help from Mark’s patch (#268), been able to run STEP0 and STEP1 on data downloaded from NASA. I have only verified the two separate records and the combined record from Brno/Turany (611117230001), but all figures seem to match the version you can download from GISS. There is an artificial 0.1 degree lowering of the 1951-1986 part of the record, due to a missing value in the older record for December 1991. The seasonal averages also don’t add up when a value is missing. The seasonal means are not produced as an output from STEP1, though.
RE 286. VERY COOL.
After STEP1 the output should be GISS temp Dset=1… combining stations at the same location.
PERHAPS a good test prog to write would be to compare GISS temp dset=1 (Steve has scraped that)
with the output of STEP1.
quick programming note about python and division for those not familiar with the language:
Float/Float -> Float
Int/Int -> Int (not Float; and is always truncated)
for example:
>>> a = 2.0
>>> b = 4.0
>>> a/b
0.5
>>> a = 2
>>> b = 4
>>> a/b
0
Also, keep in mind that it will not be possible to tell from the source code the exact type of any given variable, as type is a run-time property. That is, every time a variable is assigned, it (potentially) takes on a new value AND a new type.
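That rebinding behavior is easy to demonstrate in a couple of lines (a toy example, not from the GISS scripts):

```python
a = 2
print(type(a).__name__)  # int
a = a / 1.0              # rebinding gives 'a' a new value AND a new type
print(type(a).__name__)  # float
```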
Hi,
I’ve just updated the page pointed to by my earlier message.
There is a new patch that applies to a clean source tree that will run steps 0,1 and 2.
I have started to encounter rounding/representation issues with the code.
In STEP2 there is a source file called PApars.f. Code in this file caused an infinite loop because of a comparison between an integer cast to a float and a float expression.
The line
IF(FLOAT(N3).LT.XCRIT*(N3L-N3F+1.))
was evaluating to true even when N3 and the RHS expression were supposedly equal (I logged them both having the value of 62 to 8 decimal places). I changed the code to the following
IF((FLOAT(N3)+.00001).LT.XCRIT*(N3L-N3F+1.))
and the STEP2 shell script ran to completion.
I’m not sure of the effect this has on the final result; I don’t know enough Fortran to fully understand what algorithm the code is trying to implement. It appears to be checking for the number of years needed.
Mark.
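Mark’s .00001 guard is a classic tolerance fix. A minimal Python illustration of why a float product can compare strictly greater than the value it is “supposed” to equal (0.1 * 3 here is just a stand-in for XCRIT*(N3L-N3F+1.)):

```python
# 0.1 has no exact binary representation, so the product overshoots:
print(0.1 * 3)        # 0.30000000000000004
print(0.3 < 0.1 * 3)  # True, even though they are "equal" on paper

# the same guard Mark applied, expressed as a tolerance:
def lt_with_tol(a, b, tol=1e-5):
    return a + tol < b

print(lt_with_tol(0.3, 0.1 * 3))  # False -- now treated as equal
```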
A question to those who have succeeded in running STEP0:
where did you find a copy of the input file named USHCN.v2.mean_noFIL ?
334: JerryB here is the key
get_offset_noFIL
# get appropriate GHCN data -> fort.2
grep ^425 ../../DOWN*/v2.mean > ghcn_us_new
f77 dump.1980.f ; ln ghcn_us_new fort.1 ; a.out ; rm -f fort.1
# get USHCN data -> fort.1
cp -p /gcm/USHCN/DOWN*/USHCN.v2.mean_noFIL.gz .
gunzip USHCN.v2.mean_noFIL.gz
grep -v -e '-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999-9999' USHCN.v2.mean_noFIL > xxx
sort -n xxx > fort.1 ; rm -f xxx USHCN.v2.mean_noFIL
f77 dif.ushcn.ghcn.2005.f ; a.out ; rm -f fort.[12] a.out
mv fort.10 ushcn-ghcn_offset_noFIL
Hi,
The only files I needed to obtain were v2.mean and hcn_doe_mean_data. After placing these two files in the input_files directory and uncompressing them the STEP0 process ran to completion.
I obtained them both from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2.
I am currently stuck in STEP3: trimSBBX.f is sometimes stopping with a seg fault (memory error) and sometimes stopping with an end-of-file condition. I don’t know enough Fortran to work out what’s going wrong.
It’s the last process of STEP3; once that runs, it’s on to the last step.
Mark.
Re #334,
steven,
I saw that “key”, but I’m looking for the door that it fits. 🙂
Re #335,
Mark,
The hcn_doe_mean_data file can be found in the USHCN directory of the server
to which you linked, and also on a CDIAC server, but that file does not
include data with USHCN SHAP adjustments without FILNET adjustments. Without
the USHCN.v2.mean_noFIL file, I would not expect you to get the same temperature
numbers for USHCN stations that GISS gets.
#330: Thanks! I’m suspecting that the solution for the rounding thing is in the average-function:
where I suspect both of the quantities on the right are integers. This is why I’m interested in how exactly Python rounds int/int. Notice also that the offset (diff) seems to be rounded:
The temperature data contained in the USHCN.v2.mean_noFIL file presumably
came from the 3A lines of the hcn_shap_ac_mean.Z file at
ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/OtherIntermediates/
however, the format of those two files is apparently quite different, and
converting a copy of one into a copy of the other could be quite a chore.
Similar, but possibly slightly different, numbers could be found in the GHCN
v2.mean_adj.Z file, which would seem to be more similar to the expected format,
and with slight editing, might be “close enough”, but not an exact match. Such
an effort would also require extracting only the USHCN stations, and only
data through year 2000.
Alternate plan B would be to see if someone such as Reto Ruedy would put a
copy somewhere from which it could be downloaded.
Re 328 – According to my 1982 Fortran 77 Reference Guide – which I amazingly managed to unearth from the family tech archives – ” Integer division always truncates, rather than rounds.”
Thus in F77: -2/3 = 0
Not sure about Python though…
RE 328 – I checked the Python online reference (here) and it appears that integer division involves the floor function:
Thus: -2/3 = -1
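So the two languages disagree on negative operands: Fortran 77 truncates toward zero, while Python’s integer division floors. A quick sketch of the difference (math.trunc stands in for Fortran’s behavior; this only matters if a negative intermediate value ever hits an integer divide):

```python
import math

def fortran_div(a, b):
    """Fortran 77 integer division: truncate toward zero."""
    return math.trunc(a / float(b))

print(-2 // 3)             # -1  (Python floors)
print(fortran_div(-2, 3))  #  0  (Fortran truncates)
print(7 // 2, fortran_div(7, 2))  # 3 3 -- they agree for positive operands
```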
340… And did the SBXXgrid execute successfully and output successfully?
340… SREAD…
My bet is the offending variable in the SREAD call is ML, which is assigned from INFO(4).
INFO(4) = length of record.
In STEP2 there were adjustments made to the length of record. Maybe your fix there cascaded into a run-off-the-end-of-the-file error here.
Might also explain the SEG error…
Steve
re 362. No sweat.
If need be, we can force a solution to the SREAD in STEP2 by hardcoding ML
to a small number and seeing if we get rid of the EOF problem. That will
let you continue while I look at the code. If restricting ML solves the
EOF issue then we KNOW it’s tracked back to INFO(4), and I can see if that was
updated in the file writes of STEP2.
Re #185, #192, et al.
The use of ‘Makefile.pre.in’ in the EXTENSIONS directory indicates that the extensions were built around autoconf. Makefile.pre.in is one of the output files from autoconf (the main one being configure). One would distribute the output from autoconf, and the end user would then run configure to generate the Makefile(s) specific to the actual platform (AIX, Linux, Solaris, etc.) being used.
Here, however, only the intermediate makefile (hand-customized for AIX) was distributed. Either the author didn’t understand how autoconf works, or the EXTENSIONS tarball is incomplete.
Scott Lurndal (364): I didn’t use the Makefile.pre.in files at all. I wrote setup.py scripts that I’ll try to publish somewhere soon.
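For anyone who wants to try the setup.py route before those scripts are published, a minimal sketch looks like this (module and source file names are taken from the thread; everything else here is an assumption about how the extension should be declared, not Not Sure’s actual script):

```python
# hypothetical setup.py for the cartgrid extension under STEP1/EXTENSIONS
try:
    from setuptools import setup, Extension
except ImportError:  # older Pythons: fall back to distutils
    from distutils.core import setup, Extension

setup(
    name="cartgrid",
    version="0.1",
    ext_modules=[Extension("cartgrid", sources=["cartgridmodule.c"])],
)
```

Built with `python setup.py build_ext --inplace`, assuming the Py_Malloc/PyMem_Malloc renames from #191 have already been applied to the C source.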
The very first FORTRAN file in STEP2 is segfaulting for me. I uncommented the debug print at line 41 of text_to_binary.f, and here’s the output I get every time:
The shell script has no error checking at all, which makes it very frustrating to use.
Interestingly, my Ts.txt from STEP1 is 30759146 bytes long. It looks like Mark and I are diverging. I’ll try to get my stuff up on the web somewhere so that hopefully we can begin converging.
Thanks Steve.
Currently, to my knowledge, Mark and “Not Sure” are the only guys trying to compile
and run the code.
What we need: guys who know Fortran, guys who know Python. We suspect the
distribution was hosted on AIX.
If you know Fortran or Python, start learning the code as the guys trying to compile
do their thing. Remote debugging via blog!
Plus, guys like JerryB who understand all the data that is out there (USHCN) are
also a big bonus and a huge help.
Guys, let’s recap the following for newcomers.
Mark and “Not Sure”: recap your status.
1. System config you have
2. Links to patches
3. Current status
4. Open issues
1. Gentoo Linux, gfortran 4.1.1, ksh-93, Python 2.4
2. No patches yet. Getting to it.
3. Working on STEP2
4. Need to start on STEP3
OK, I resolved the text_to_binary.f issue by changing line 89 to read
if (ix.gt.0.and.idatum.ne.9999) then
STEP2 finished, and created the 12 files described in gistemp.txt.
As an aside, the -fbounds-check argument to gfortran can be very useful.
(Look Ma, I’m hacking FORTRAN!)
#335, Mark, in Hansen Frees the Code
This is usually the sign of an array index that is out of bounds.
#3
If monthly(m) has a negative or zero value, one must wonder whether the read at line 83 read the correct values. What was the value that was causing the memory fault, zero or a negative value?
From what I gather the data changes day by day for some of the inputs. It would be nice to freeze input on a certain date … just for testing purposes.
I would be happy to help on reversing the time stamps on the data sets … 1960-2000, 2000 now 1960, 1960 now 2000. It should show global cooling at the same rate as global warming … if the assumptions are correct.
Re #374, I think Geoff’s right about the excitement factor in play here. I set up a repository at code.google.com/p/open-gistemp yesterday and have seen zero interest.
I’m going to use it for my own hacking, and I’ll do my best to incorporate the great work being done by Not Sure, Mark and the rest on the technical thread, with references back to the applicable posts.
Anyone who wants to join in the effort is more than welcome.
RE4…
Ya, thanks. That’s the path we are headed down. I think we have an index walking off the end of an array/file.
If you know Fortran: can we do any exception handling on the EOF error, or are we resigned to “printf” debugging?
re 3. OK, I’ll check the code and see if the change is substantial.
Did you implement the fix that Mark had for the line that was causing the infinite loop?
The line
IF(FLOAT(N3).LT.XCRIT*(N3L-N3F+1.))
was evaluating to true even when N3 and the RHS expression were supposedly equal (I logged them both having the value of 62 to 8 decimal places). I changed the code to the following
IF((FLOAT(N3)+.00001).LT.XCRIT*(N3L-N3F+1.))
#7, there is no such thing as exception handling in FORTRAN 77 or earlier; the END= and ERR= specifiers on READ/WRITE are your only choice.
Scott Lurndal (5) the negative value was not monthly(m), it was “ix” because the data contained some years before 1880, for some reason.
Steven Mosher (8) I did apply Mark’s fix, but I wasn’t even making it to that code. I was thinking about trying it without, just to see.
Paul Linsay (4) The segfault I was encountering was an index out of bounds condition. Compiling the file with -fbounds-check demonstrated this.
Steve (43) I shore as heck can’t find it. I’ve uploaded my result from STEP0 here
Re #44. Results of a diff between my STEP0 and yours:
diff v2.mean_comb-andy v2.mean_comb-notsure
1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999
These 3 records are the only differences between the 2 files.
My system is CentOS 4.5, pdksh 5.2.14-30.3, gfortran 4.1.1-53.EL4, python 2.3.4-14.4. No modifications to the GISS code were required for me to complete STEP0, and I downloaded the 2 input files in the STEP0_README.txt file today.
Re #45 – boy that sure posted different than it looked in the preview. Let’s try that diff output again:
diff v2.mean_comb-andy v2.mean_comb-notsure
Andy’s row = 1, Not Sure’s row = 2
1-1016035500022007 137 145 139 172 202 230-9999 264-9999-9999-9999-9999
2-1016035500022007 137 145 139 172 202 230-9999-9999-9999-9999-9999-9999
1-6244025000012007 75 108 132 169 254 280 301 300-9999-9999-9999-9999
2-6244025000012007 75 108 132 169 254 280 301-9999-9999-9999-9999-9999
1-6244031000012007 65 99 119 162 232 251 259 266-9999-9999-9999-9999
2-6244031000012007 65 99 119 162 232 251 259-9999-9999-9999-9999-9999
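For chasing differences like these, a small parser beats eyeballing diffs. A sketch under the assumption that each v2.mean line is a 16-character key (11-digit station id + 1 duplicate digit + 4-digit year) followed by twelve 5-character monthly values in tenths of a degree, with -9999 meaning missing (that layout is inferred from the records quoted above, so verify it against the GHCN v2 README before relying on it):

```python
def parse_v2_line(line):
    """Split one v2.mean record into (key, twelve monthly values)."""
    key = line[:16]
    vals = [int(line[16 + 5 * i : 21 + 5 * i]) for i in range(12)]
    return key, vals

def compare_v2(file_a, file_b):
    """Return {key: (vals_a, vals_b)} for records that differ between two files."""
    def load(path):
        with open(path) as f:
            return dict(parse_v2_line(l.rstrip("\n")) for l in f if l.strip())
    a, b = load(file_a), load(file_b)
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
```

Run over Andy’s and Not Sure’s v2.mean_comb files, this would isolate exactly the three records above.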
OK, I got STEP3 to run to completion, but I’m nervous about what I had to do. Starting at line 401 it now reads:
SUBROUTINE GRIDEA (p_NRM,TITLEZ, p_IBMM,p_NZS)
C****
C**** This output grid dependent routine sets the parameters
C**** IBM(J) (= number of regions in latitude belt J) and
C**** KZONE(J,N) (=1 if belt J belongs to domain N) and the TITLES
C****
C**** Current order of boxes: 1-4 north , west->east ...
C**** . to
C**** 77-80 south , west->east
C**** Order of subboxes: 1-10 south , west->east ...
C**** . to
C**** 91-100 north , west->east
C**** Sergei's ordering 1-10 west , south->north
C**** . to
C**** 91-100 east , south->north
C****
INTEGER p_NRM, p_NZS, p_IBMM
PARAMETER (NRM=80,ICM=10,JCM=10, NZS=3, NCM=ICM*JCM)
CHARACTER*80 TITLEZ(*),TITLES(1+NZS)/
* ' LATITUDE BELT FROM XX.X ? TO XX.X ?',
* ' NORTHERN LATITUDES: 23.6 N TO 90.0 N',
* ' LOW LATITUDES: 23.6 S TO 23.6 N',
* ' SOUTHERN LATITUDES: 90.0 S TO 23.6 S'/
COMMON/GRIDC/IBM(8),KZONE(8,NZS+3)
C****
C**** Sergej's equal area grid
C****
C**** Grid constants for latitude zones
REAL BANDS(9)/90.,64.2,44.4,23.6,0.,-23.6,-44.4,-64.2,-90./
INTEGER NUMJ(8)/4,8,12,16,16,12,8,4/
C**** Check grid dimensions
IF(p_NRM.NE.NRM.OR.NZS.NE.p_NZS) THEN
WRITE(6,'(4I9)') p_NRM,NRM, p_NZS,NZS
STOP 'ERROR: INCONSISTENCY FOR NRM NCM NZS'
END IF
C**** Loop over all ZONES
DO 50 J=1,8
IBM(J)=NUMJ(J)
IF(IBM(J).LE.p_IBMM) GO TO 20
For some reason the integer parameters p_NRM (originally called “NRM$”), p_IBMM (originally called “IBMM$”), and p_NZS (originally “NZS$”) were getting converted into wonky floats until I added the “INTEGER p_NRM…” line. Also, the original parameter names caused gfortran to raise syntax errors.
D’oh, the file I modified in 47 is zonav.f.
Looks like the diff is real. Andy is getting another month compared to Not Sure. Is that an integer or rounding problem?
Re 47: My rusty mental file cabinet immediately tossed up the phrase “The dollar sign is reserved to Digital”. (A big chunk of my early programming experience was with a PDP-11/73 and a VAX-11/780 in assembler and C. Incidentally, I still have that PDP-11, and it still runs.) In a fair number of older languages the dollar sign is an acceptable part of an identifier along with the usual alphanumerics and underscore, and was often used much like underscore is by C folk.
A dollar sign at the end of a variable was used to denote Strings in BASIC. A$=”HELLO”
Not Sure (#47)
Are you sure that all steps of STEP3 are completing?
I’m still getting a memory error while running trimSBBX, I haven’t had a chance to look since yesterday, I’m just picking it back up again now.
I made similar changes to yours for zonav.f and managed to get it to run. I just made my changes the same as yours, but the memory error while running trimSBBX still persists (I thought the data types in zonav.f might be causing the errors with trimSBBX).
For reference I’m developing on Linux using GNU Fortran (GCC) 4.2.1 and Python 2.5.1. I installed a Korn shell to run the scripts (mksh).
I’m also using ftnchek to give a quick heads-up on possible Fortran problems.
My current patch is available from here
I’m using v2.mean from here, but I’ll switch to a better one once all the steps are running.
My current sticking points are:
1. trimSBBX is stopping with either a memory fault or an end-of-file condition. It appears to be random, and I suspect it’s an issue with data types.
2. Assembling data for STEP4_5. I’ve downloaded data from hadobs but I can’t access http://ftp.emc.ncep.noaa.gov; the connection times out. STEP4 is looking for SBBX.HadR2, but I cannot see where this file is created. Maybe I should read the manual :))
Mark.
#52 Mark,
Try ftp://ftp.emc.ncep.noaa.gov/cmb/sst/oimonth_v2/
Cliff
re # 47
“For some reason the integer parameters p_NRM (originally called NRM$), p_IBMM (originally called IBMM$), and p_NZS (orig NZS$) were getting converted into wonky floats until I added the INTEGER p_NRM line. Also, the original parameter names caused gfortran to raise syntax errors.”
I recall that parameters with $ were system or shell parameters, like the name of the OS or the size of memory.
Mark,
You probably have this figured by now, but: SBBX.HadR2 is the output from step 4, not an input to step 4.
Cliff
#(53,54,55)
Hi,
Thanks for your comments; they are all helping.
After downloading the monthly SST data I started having real problems reading the data. I then noticed a test program in the README and tried that. It turns out that you have to compile with the option “-fconvert=swap” when using gfortran.
I’m now starting again from STEP0 using this option on all the fortran code to see if this changes the end result.
Mark.
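The -fconvert=swap clue means the binary files are big-endian Fortran unformatted sequential files (the AIX heritage again). For anyone poking at these files from Python instead, here is a hedged sketch of the usual record layout: each record wrapped in 4-byte big-endian length markers. That wrapping is the common gfortran/xlf convention, not something the GISS docs state, so treat it as an assumption.

```python
import struct

def read_fortran_records(path):
    """Read big-endian Fortran unformatted sequential records as raw bytes."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack(">i", header)  # '>' = big-endian
            records.append(f.read(length))
            f.read(4)  # trailing length marker duplicates the header
    return records
```

Each returned record can then be unpacked with struct according to the layout the Fortran WRITE used.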
Mark, yes, all of STEP3 is working for me.
Did you just create an empty “SBBX.HadR2” file? That’s what I was going to try.
Ross Berteig (50) I figured it was some sort of obsolete syntax, but it’s nice to be sure. Thanks.
After reading some more, it seems that “SBBX.HadR2” should come from the hadobs site, and that it should be one of the ASCII format files. But which one? There’s no single file that covers the period they say they used (1870-present), and there are none with that name.
If I produce variable type statements for the Fortran routines, can you use them? How would I get the modified code to you? I have an online place where I could leave the modified routines. I have the STEP0 routines done if you want to see examples.
re 60.
Dan, good to see you here! and thanks for helping
Re #47: Learn some FORTRAN. Variables whose names start with I through N (as in INteger) are implicitly integers, and all other starting letters default to floats. Good luck.
Concerning the Praha-Ruzyně (airport) data, see especially barnie’s fascinating comment #91 at
http://www.climateaudit.org/?p=2018
Let me just point out one trivial observation. There is no “periodicity” of 3-4 years (one that would be related to leap years or anything like that). The reason why these lines are organized into periods that are 3-4 years long is simply that they want to make an adjustment that is continuously increasing from -2.6 Celsius in 1909 to 0 in 2007. If you kindly allow me to give my own interpretation, they want to add a 2.6 Celsius degree change to the trend in the last century, at least to the December values.
The precise adjustment that should be applied to a particular December is a linear function of time, and it is eventually rounded to 0.1 Celsius degree accuracy. That’s why the years end up being clumped into groups of 3-4 years. Does anyone understand what these corrections mean [snip]? Incidentally, if the trend is only added to one month, I would expect the overall trend to be artificially increased by 2.6/12 ≈ 0.2 Celsius degrees (per century) or so, do I understand it well?
If I understand the signs of the adjustments well, they can hardly be justified by e.g. the urban heat island because the sign is wrong. It is the recent temperatures that should be reduced if you believe that they’re high because of urbanization.
One comment related to the “comment” section of Praha-Ruzyne: they say “WWII & Russian occupation”. Let me say that certainly no Czech person would use the words “Russian occupation” for anything that happened in the WWII years. We were liberated in 1945, not occupied. 😉 But that’s probably one of the less serious problems in the file.
Let me just say one thing. I am not accusing anyone of anything, just making a thought experiment. See how temperatures and precipitation depend on the month (in Prague).
If I wanted to make fraud to support a theory that there is warming and it is primarily affecting winters, the best way would be to make old winters look colder. December and January are the two coldest months and the temperature is minimized near these two months – it is stationary so that it looks pretty much constant. So if you reduce the temperature in December, you will create colder winters in 1909 which is precisely what you want if you want to show that global warming makes winters milder. Moreover, the points near the stationary points – in this case, months near the coldest December or January – are the most natural months to be modified because you don’t create too big discontinuities.
These corrections look bizarre to me. It would be good if someone tried to give us a less criminal explanation of what’s going on, as well as a more comprehensive analysis of which stations besides Prague-Ruzyně were treated in a similar way.
James says:
Sigh. A little knowledge is a dangerous thing. What about IMPLICIT?
OK guys, if you get different results based on remapping memory by commenting out lines, then at least one memory bug is living somewhere else and has not been found yet. I suggest somebody run the thing under valgrind in full verbose mode.
Mark (#14), wouldn’t it be easier for you and others to publish the full patched tar.gz archive of the full code and data? I guess there would be no problem; it’s not that large.
I’ll make a full archive up of my changes and post them up at the same page as my patch.
Mark.
I think I’m able to perfectly replicate the GISS station combining method (STEP1). The key to extending the 2-station method to more stations is simply to always create an “intermediate reference station” for comparing with the new records. The final combination is then simply the average of the shifted records. Additionally, there seems to be no rounding issue after all. I guess rounding problems arose because previously we tried to replicate with rounded annual means (=> rounded bias). My code seems to replicate perfectly all the stations I’ve tested so far (Bagdarin, Praha, Joensuu, Gassim, Bratsk).
A request: please check the workings of this code for other stations. I’ve spent already too much time with this, I should take a break 🙂 Also, if there are people out there who are fluent with R, it would be nice to have the code also in R 🙂
I’ve sent my code to Steve, so I’ll might put it up somewhere if it is too hard to copy from here. Here’s the main function:
--- cut here ---
function combined=GISScombine(temperatures);
% combines stations ala GISS
% by Jean S (9/2007)
% assumes integers; and full years starting in Jan
% Minimum overlap for combining
min_overlap=4;
[years,stations]=size(temperatures);
% 1st station is assumed to be the "best"
fixed_stations(:,1)=temperatures(:,1);
no_fixed=1;
ref_station=temperatures(:,1);
% annuals for all stations
for k=1:stations;
annual(:,k)=GISSannual(temperatures(:,k));
end
available_stations=2:stations;
% repeat until no new records to add
while ~isempty(available_stations),
[ref_annual,ref_seasonal]=GISSannual(ref_station);
% Find the annual mean matches
overlaps=repmat(isfinite(ref_annual),1,length(available_stations)) & ...
isfinite(annual(:,available_stations));
% Choose the longest match (if equal then the last maximum)
[overlap_years,overlaps_loc]=sort(sum(overlaps));
if overlap_years(end)
Here's the function for the annual means:
--- cut here ---
function [annual,seasonal]=GISSannual(temperatures);
% Calculates the GISS seasonal and annual means
% by Jean S (9/2007)
% Assumes full year input starting in Jan
years=length(temperatures)/12;
temp=reshape(temperatures,12,years)';
seasonal=ones(years,4)*NaN;
annual=ones(years,1)*NaN;
% calculate the reference means and anomalies
ref_mean=nanmean(temp);
anomal=temp-repmat(ref_mean,years,1);
% DJF
seasonal(2:end,1)=mean([temp(1:end-1,12) temp(2:end,1:2)],2);
season=[NaN anomal(1,1:2); anomal(1:end-1,12) anomal(2:end,1:2)];
I=sum(isnan(season),2)==1;
seasonal(I,1)=nanmean(season(I,:)')'+repmat(mean(ref_mean([1:2 12])),sum(I),1);
% MAM
seasonal(:,2)=mean(temp(:,3:5),2);
season=anomal(:,3:5);
I=sum(isnan(season),2)==1;
seasonal(I,2)=nanmean(season(I,:)')'+repmat(mean(ref_mean(3:5)),sum(I),1);
% JJA
seasonal(:,3)=mean(temp(:,6:8),2);
season=anomal(:,6:8);
I=sum(isnan(season),2)==1;
seasonal(I,3)=nanmean(season(I,:)')'+repmat(mean(ref_mean(6:8)),sum(I),1);
% SON
seasonal(:,4)=mean(temp(:,9:11),2);
season=anomal(:,9:11);
I=sum(isnan(season),2)==1;
seasonal(I,4)=nanmean(season(I,:)')'+repmat(mean(ref_mean(9:11)),sum(I),1);
% Annual
annual=mean(seasonal,2);
ref_mean_season=nanmean(seasonal);
season=seasonal-repmat(ref_mean_season,years,1);
I=sum(isnan(season),2)==1;
annual(I)=nanmean(season(I,:)')'+repmat(mean(ref_mean_season),sum(I),1);
--- cut here ---
The input should be in integers such that every record is in a column. The first column is assumed to be the “best” record (which is not selected according to the data, but according to a preference list. See Steve’s post and Hansen’s code for details). Usually this is the only one extending to current times. Here’s an example call (Bratsk; I sent the file also to Steve):
bratsk_data=load('bratsk.txt');
bratsk_data(:,2:end)=10*bratsk_data(:,2:end);
originals=bratsk_data(:,[6 2:5]);
combined_bratsk=GISScombine(originals);
max(abs(combined_bratsk-bratsk_data(:,7)))
Have fun!
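For anyone porting this to Python, here is a minimal sketch (my own, not verified against GISS output) of the one-missing-month seasonal rule the Matlab code above implements: a season with exactly one missing month is filled from the anomalies of the other two months, re-centered on the season’s long-term mean.

```python
def season_mean(vals, ref_means):
    # vals: the 3 monthly values of one season for one year (None =
    # missing); ref_means: long-term means of those same 3 months.
    # With one month missing, average the anomalies of the other two
    # and add back the season's long-term mean, as in GISS_annual.
    present = [i for i, v in enumerate(vals) if v is not None]
    if len(present) == 3:
        return sum(vals) / 3.0
    if len(present) == 2:
        anoms = [vals[i] - ref_means[i] for i in present]
        return sum(anoms) / 2.0 + sum(ref_means) / 3.0
    return None  # 2+ months missing: season left undefined
```

With December and January present but February missing, the season is the mean DJF anomaly of the two known months plus the long-term DJF mean.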
James (62) You’d think they could’ve mentioned that here. I guess the right way to fix those parameter names is NRM_, IBMM_, etc.
Ok, the first code was cut due the smaller than sign. Let’s wait for Steve to post the code.
Hi,
I have now put a tgz archive at http://www.zen103403.zen.co.uk/patches/patch.html
This archive is the plain sources with my current patch applied.
Mark.
Jean S’s code is online at http://www.climateaudit.org/scripts/station/hansen/jeans/
Andy, what version of tail do you have? I’m using GNU tail 6.4:
You didn’t have to modify line 48 of “do_comb_step0.sh”?
Also, I appreciate your efforts in setting up a Google project for us. I haven’t participated yet because I’m wondering if there’s going to be a “CA branded” project as some have talked about. I also think that git might be a better choice for source control than svn. With git we’ll be able to have several different patch sets concurrently. I like this because I’m focused on creating the smallest diff that will yield code that runs on a commodity Linux system. I’m not going to try to fix memory leaks or anything else. If the original code leaks, then hopefully my modified version will leak in exactly the same way. This is to preempt objections about the correctness of modifications. I can see how others would like to fix things as they find them. If we have multiple patch sets we can accommodate these two approaches simultaneously. If it turns out that disparate patch sets produce the same outputs for the same inputs they can be merged later.
re 47 and 65.
U know Fortran? Download the dusty deck and get crackin. Pick a routine.
document it.
RE 66.
Somebody can run this through Pychecker and Valgrind while other folks try to get this
running.
Divide and conquer guys. Do what you are best at. report back.
Herding cats isn’t that bad when they are smart cats.
RE 74.
I think once somebody gets the pile to compile then it makes sense to mv to
a more amenable environment. As it stands we are hacking the blog paradigm
to do a remote coordinated debug… or something like that.
Re #74, I didn’t have to modify line 48. Tail version as follows (from CentOS 4.5 / Redhat EL4).
I agree with you on svn. Figured something was better than nothing, though.
I was thinking this morning that I can probably best leverage my limited Python (and non-existent Fortran) skills by grabbing the changes you guys post that I need to get my platform to do a clean run and publishing them in the google code. I’m pretty far behind you since I had to build a new sandbox for this, but I can take the time to document until I catch up (if possible :-))
Re: #65, Richard
What about IMPLICIT? If it isn’t used in the code and the comment I was replying to was confused about variables being floats, explaining IMPLICIT won’t help.
A little knowledge can be dangerous, but almost zero FORTRAN knowledge will make this project really difficult.
Re: #65, Richard
What about IMPLICIT? The original post didn’t understand why certain variables were floats, and my explanation clarified that. If IMPLICIT isn’t used in the code, why would I mention it?
Yes, “A little knowledge can be dangerous”, but almost zero FORTRAN knowledge and this project can be a lot more difficult.
OK, I’ve uploaded a clean working copy here. I tried to test it on a clean environment, but I ran into this, so YMMV.
Re #75 Mosher
I\’ll get on it but you won\’t see me here. I\’ll be on the Google Code site, working in SVN and using the wiki there. This blog does not lend itself well to development, and since it isn\’t threaded conversations are a real pain…
Hi,
From what I can make out, the program named zonav, called by do_comb_step3.sh, attempts to use fort.11 as its input.
CALL SREAD(11,ML ,AR(1,IB),WTR(1,IB),LENR(IB))
But this file has been renamed by the previous operation in do_comb_step3.sh
mv fort.11 BX.Ts.${label}.$rad
so it can never run with valid data.
zonav writes its output to fort.10 which is what trimSBBX is expecting to use, and I think that is why I couldn’t get trimSBBX to run.
I am left thinking that these two operations may not actually be required for STEP4_5.
Has anyone else looked at this yet?
Mark.
People used variable names beginning with I-N for integers by convention. I don’t think they have to ‘become’ anything. Web search engines are your friend; do a search on: implicit fortran
David,
Fortran does not (or at least did not) require one to declare a variable before using it. Variables that started with the letters I-N were declared integers, everything else was declared float.
This is another reason why Fortran variable names tended to be as short as possible. If you happened to misspell a variable, instead of reporting an error, Fortran would just declare the misspelled name to be a new variable and continue happily on its way.
Hi,
I think I’ve made some progress on STEP4.
The elusive SBBX.HadR2 file is, I think, SBBX1880.Ts.GHCN.CL.PA.1200 (one of the files from STEP3) renamed.
After downloading all the oiv… data files and unpacking them into the input_files directory, then renaming SBBX1880.Ts.GHCN.CL.PA.1200 to SBBX.HadR2 as described above, I am beginning to get somewhere (possibly somewhere wrong!).
As I said earlier, the oiv data files are in big-endian format and therefore need to be accessed differently than the others.
To get around this issue set the following environment variable (if you are using gfortran on an intel platform).
GFORTRAN_CONVERT_UNIT='swap;native:1-10,12-99'
This tells the Fortran runtime to use byte swapping only on unit 11. Doing this will enable convert1.HadR2_mod4.exe to read the downloaded files.
I am now able to run convert1.HadR2_mod4.exe and see the program appearing to work.
Unfortunately it is now stopping with an end of file condition.
Mark.
Hi,
Lol, it’s just as easy to do
GFORTRAN_CONVERT_UNIT='swap:11'
Has the same effect.
Mark.
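For those without gfortran’s convert feature, the record framing can also be handled by hand. Here is a hedged Python sketch (my own, not part of GISTEMP) of reading one big-endian Fortran sequential unformatted record, which carries a 4-byte length marker before and after each payload:

```python
import struct

def read_fortran_record(f, big_endian=True):
    # Fortran sequential unformatted files frame each record with a
    # 4-byte length before and after the payload; read on a
    # little-endian machine, a big-endian file garbles these markers,
    # which is what GFORTRAN_CONVERT_UNIT='swap:11' works around.
    prefix = '>' if big_endian else '<'
    head = f.read(4)
    if len(head) < 4:
        return None  # end of file
    (nbytes,) = struct.unpack(prefix + 'i', head)
    payload = f.read(nbytes)
    (tail,) = struct.unpack(prefix + 'i', f.read(4))
    if tail != nbytes:
        raise IOError('corrupt record framing')
    return payload
```

Interpreting the payload (reals, integers, character data) then depends on the record layout of the particular file, which the Fortran READ statements define.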
Mark (86) Interesting: step 5 requires both SBBX1880.Ts.GHCN.CL.PA.1200 and SBBX.HadR2. I guess this can be right, since SBBX.HadR2 is modified in step 4.
78. ANDY
that’s cool. I know SteveMc and John Goetz seem to be focusing on Step 1. And I have
avoided the python since I’d just beclown myself ….
A really cool thing would be a stand alone Step0 Step 1 Module. Basically
Step 0 and step1 Take in all the land data and do a hansen dance and then output.
A standalone EXE that did that sh*t would be huge.
Once verified we could assess the Hansen scribal dance.
#85: I stand corrected. It has been a while! It seems so crude nowadays. If I were a scientist demanding conciseness…
#89. I’m more interested in replicating this in R (or Matlab) – in large part, so I can see what they are really doing. The blizzard of subscripts in Fortran and Python makes it very inelegant programming.
I can see my way towards Step 0 implementation already (although as I observe in another post) it looks like one needs a hypothesized obsolete USHCN version to precisely replicate the results presently online. Their Y2K patch is really goofy and can be circumvented by using up-to-date USHCN versions.
After I look through Jean S’s Matlab script a little more closely, I should be able to have a simple R implementation of Step 1 as well.
re 85.
I’m having PTFS right now. Post Traumatic Fortran Syndrome.
So Time out for a funny story.
1987. I’m sitting in a class taught by Mason Woo. ( google him he’s cool)
I’m sitting behind this DDG blond.
Mason: how many of you write C?
The whole frickin class raises their hand except the lovely lady.
Mason: Miss, what language do you know?
Babe: Fortran.
At this point I lean over to my partner in crime and say.
” She may be cute, but she’s no rocket scientist”
Then Mason asks everyone in the class what they do…blah blah
blah blah…
Miss Fortran babe: I program the guidance systems for ICBMs
DOH!
RE #92: LOL…. so, did you submit your name for a date? 😛
re 93.
She wrote Fortran for Christ’s sake. If I asked her to pass me a pointer she would
have dissolved into tears and core dumped.
Instead, I cruised El Camino and paid my tributes at Andy Capp’s tavern.
Yes, and at least the Python code I looked at was rather terrible. Not to mention the Fortran code (Step2), which is almost impossible to read, at least for a non-Fortran programmer. Anyhow, I have a feeling that the best pearls of Hansen’s code are still hidden somewhere in that Fortran spaghetti. If I had even a decent understanding of Fortran, I’d take a very careful look at padjust.f (which seems to contain the homogenization code) …
#94: She would have spent 80% of the time trying to debug you. 🙂
re 95.
Thats the big mystery routine.
RE: #5 – With a quick scan of the thread, I did not see follow-up comments. It bothers me too. I’ve got no issue with an F to C conversion, but it’s the reservation of an integer value that I cannot understand. That is, unless the goal is illogical built-in rounding via automatic truncation of everything to the right of the decimal point.
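The truncation worry is easy to demonstrate. A small Python illustration (my own, hypothetical function name) of storing a Fahrenheit reading as integer tenths of a degree C, the v2.mean convention, with truncation versus rounding:

```python
def f_to_c_tenths(temp_f, truncate=False):
    # Convert a Fahrenheit reading to integer tenths of a degree C
    # (the v2.mean storage convention). Truncating instead of rounding
    # silently drops up to 0.1 C, which is the worry voiced above.
    tenths = (temp_f - 32.0) * 5.0 / 9.0 * 10.0
    return int(tenths) if truncate else int(round(tenths))
```

For example, 33 °F is 0.5555… °C; truncation stores 5 tenths where rounding stores 6.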
I was reformatting padjust.f after pulling it up in Notepad to see how much of it I could still understand after decades of not having seen any Fortran. Most of it was pretty self-evident, but I have a couple of questions.
1 What is the Calt which appears a couple of places after a “then”?
2 What happens in a “call adj(….)”?
I’d expect both of these call a program external to padjust, but where are they at?
Thanks 🙂
Andy (46) I ran STEP0 again on a different machine (gfortran 4.1.2, pdksh 5.2.14, AMD 64) and it gave me exactly the same file. I wonder if the version of tail is the difference. My work_files/t_hohenpeissenberg file starts in 1879 and ends in 2003.
Interesting – so does mine. Tar up your work files and put up a link to them. I’ll pull them down and diff them against mine.
Dave [99] Anytime the Fortran compiler sees the character ‘C’ in the first “column” of a line of code, the whole line is treated as a COMMENT.
Looks like alternate ‘IF()’ statements were commented out but left in. I wonder why…
Andy, my work_files are here.
Dave [99] ‘alt()’ is a subroutine which is coded at the end of the padjust.f source code. It claims to apply ‘linear approximation’ (so sez the comment) but it’s going to take a while to understand what it’s really doing.
re: 102 Steve N
Thanks!
re: #104
Well, duh! I was distracted by the C**** (which I did realize was a comment) and missed the “subroutine” above. Thanks again.
Re #’s 100, 101 and 103
Differences are in v2.meany and v2.meanz
I bet you’re using a 32-bit system. That’s why all the executables differ. But the delta for the text output is exactly the same. Interesting.
Mystery solved. I decided to start again from scratch, including re-downloading the data files and presto! I get the same file you do. I tracked the difference down to v2.mean:
So that file has changed since I downloaded it on Saturday! It has a modification date of today!? At this point I’m really not sure (pun intended) of what we consider authoritative.
I looked through padjust.f. It seems like normal research code. Unfortunately 🙂 Basically it reads input files and writes the adjusted result to corresponding output files. If input data is marked bad it is skipped. The data is looped by year (starting from December) and the adjustment is calculated from yearly values. Then it is added to the monthly values for the given range. Months marked bad are skipped.
What makes the result unpredictable are the year limits
subroutine adj
“———————-
sl=sl1
if (iy .gt. knee) sl=sl2
iya=iy
C-H Lower limit
if (iy .lt. iy1a) iya=iy1a
C-H Upper limit
if (iy .gt. iy2a) iya=iy2a
iadj=nint( (iya-knee)*sl-(iy2a-knee)*sl2 )
——————–“
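Decoded into Python (a sketch of just the quoted fragment, with variable names kept from the Fortran), the adjustment is a two-leg linear ramp: slope sl1 before the knee year and sl2 after, with the year clamped to [iy1a, iy2a] and the whole thing normalized so the adjustment vanishes at iy2a:

```python
import math

def nint(x):
    # Fortran NINT rounds halves away from zero, unlike Python round()
    return int(math.floor(x + 0.5)) if x >= 0 else int(math.ceil(x - 0.5))

def adj_fragment(iy, knee, sl1, sl2, iy1a, iy2a):
    # Direct transliteration of the quoted lines from subroutine adj
    sl = sl2 if iy > knee else sl1
    iya = min(max(iy, iy1a), iy2a)  # the C-H lower/upper limits
    return nint((iya - knee) * sl - (iy2a - knee) * sl2)
```

Note that outside [iy1a, iy2a] the adjustment is held constant at its boundary value, which is presumably what makes the year limits so influential.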
When creating diffs, you may find the -u option to diff gives you more readable output.
Re #91,
Steve, I’m working on an R port too. Check out the page on Stikir.
What do you make of this in antarc_comb.f?
if(icc0.eq.iccp.and.idp.eq.id0) then ! should not happen
write(*,*) 'ID already used',icc0,id0
stop 4
end if
I read that as saying that there should be no station overlap in v2_mean and the antarc?.txt files. Unless I’ve made a mistake, I think there is.
#110:
So how exactly is the adjustment calculated? It is described in Hansen (1999) as
Hi,
Right from STEP0/antarc_to_v2.sh there is a possibility for error. When the shell script is extracting data it does a grep in the .list file using the station name as a key to extract the station id.
‘Faraday\Vernadsky’ (as it is defined in antarc1.txt) is defined in antarc1.list as ‘FaradayVernadsky’.
It can only be that the ‘\V’ is detected as an invalid backslash sequence and replaced with a ‘V’ for it to work.
Mark.
#109 Not sure
“So that file has changed since I downloaded it on Saturday! It has a modification date of today!? At this point I’m really not sure (pun intended) of what we consider authoritative.”
Does this look like someone is making corrections as we speak? Perhaps on the basis of reading this blog? Is Reto Reudy making changes to the ‘simplified version of the code that real scientists would be interested in’ at the moment, and updating the files on GISS using it?
KevinUK
Could someone who’s compiled Step 1 pause for a moment and produce some intermediates for the Step 1 crossword puzzle? Could you do the Step 1 calculations for (say) Bratsk (22230309000)? This has 5 dset=0 series which are added one by one. Could you report on the results for each step so that I can benchmark replication in R (and confirm what is going on in this step)?
#107. It looks like some 2007 values have been updated. Nothing untoward in that in a system that is producing monthly results.
Something looks wrong in GISTEMP_source/STEP0/USHCN2v2.f, line 18:
itemp=-9999
The problem I see is that itemp is declared as an integer array. The GNU Fortran 3.4.4 compiler halts on line 18 with an error. I assume the above assignment is supposed to initialize all indices in itemp to -9999.
I fixed this by commenting out line 18 and adding the following at line 27:
itemp(m)=-9999
Any thoughts? Maybe someone who is not getting a compiler error can verify that the assignment properly initializes all indices in the itemp array.
Re # 118.
In Fortran 90 the statement itemp = -9999 initializes all array elements, I think; itemp(m) initializes only element m. Probably more correct is itemp(1:m) = -9999.
Re my #112
There is overlapping station data between SCAR and GHCN, but they are maintained as separate time series in v2_meanx using the duplicate field (character 12 in the ID, I think).
Deduced from antarc_to_v2.sh:
Duplicate number 7 is the Australian data in antarc2.txt
Duplicate number 8 is surface station data in antarc1.txt
Duplicate number 9 is autom. weather stn data in antarc3.txt
re # 118
Absoft FORTRAN 77 Compiler ver. 8.2a (Mac OS X 10.4.10) also gives the error. A local DO/ENDDO loop can be used as a fix.
Re #’S 107 & 117, I had listed this mismatch as an issue on the Google code page and speculated that it may be due to different input files being used. Glad to know it’s resolved.
Steve Mc, is it possible to host a frozen copy of the input files in the CA downloads area for testing?
Re #118,
In FORTRAN 90 and 95, entire arrays can be assigned a value with a single statement of the form:
itemp = -9999
In FORTRAN 77, this was not possible. One had to assign each element separately in a do loop.
Steve (116) The step 1 results for Bratsk are here. The full results for step 1 are here. These are with Saturday’s data file. I can re-run it with the latest data file if you like.
anonymous: That is, every time a variable is assigned, it (potentially) takes on a new value AND a new type.
This is something which has infuriated me for years.
Kernighan & Ritchie [& their ilk, such as McCarthy] made a terrible, disastrous error when they decided to go with poor typing, and character-driven, business-oriented, ASCII-ness of data values, rather than binary, scientific-oriented, strong typing of data, and almost every language which has succeeded them has made the same mistake, and it plagues the discipline like a cancer to this day.
I know of precious few languages which have strong typing of mathematical quantities: languages which support [inherently] a distinction between 8, 16, 32, and 64 bit integers [preferably signed & unsigned]; 16, 32, 64, 80/96, and 128 bit real floats; and 32, 64, 128, 160/192, and 256 bit complex floats.
Or even which are capable of understanding [inherently] the distinction between 8 [ASCII], 16, and 32 bit Unicode characters, etc.
Instead, what we’re left with is inanities such as:
And the very fact that someone would introduce a poorly-typed language into any implementation of a mathematical calculation raises a huge red flag.
Lucius Vorenus (#125)
This is probably off topic, but you could have got the correct result if you had typed
window.alert(x + " + " + y + " = " + (x + y));
(Note brackets round expression).
It is the ECMA specification that specifies that the + sign concatenates the values if there are strings.
The first x would be treated as a number, but when the string object is encountered everything is converted to a string object and concatenated. The objects within the ‘(x + y)’ are all numbers, so the x + y is evaluated as a numeric expression and then converted to a string.
I don’t think that it’s the typing of the object that’s the problem; it’s when you convert between types that problems arise. Lack of knowledge about the particular language that you use is also a factor (not directed at you).
I personally think that objects should be typeless until the point that you do something with them, and then they should adopt a type that is suitable for the operation being performed on them. If there is no suitable representation of that object for the particular operation then an error should be raised.
Why should numbers be given special preference?
Mark.
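For what it’s worth, Python behaves roughly the way Mark wants for mixed-type ‘+’: it raises rather than silently coercing. A tiny illustration (the function name is mine):

```python
def safe_add(x, y):
    # Python refuses to coerce between str and int for '+': it raises
    # TypeError instead of concatenating, unlike the ECMAScript
    # behaviour discussed above.
    try:
        return x + y
    except TypeError:
        return None
```

So safe_add(2, 3) adds, safe_add("2", "3") concatenates, and the mixed case is flagged instead of silently producing "23".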
In a different thread, Steven Mosher said,
Does anyone know where to get this existing SBBX.HadR2 ?
Mark.
Re #125, Lucius Vorenus
That’s a bit unfair. The original K&R required explicit declaration of type, and the C compiler I was using 20 years ago generated an error if you tried to make an assignment between types without an explicit cast.
I entirely agree with your general point though – I still always switch on all available strong typing options.
Re #126, Mark
I think this relies too much on perfection from the programmer. It’s too easy to make small mistakes like assigning a float to an integer when you didn’t mean to. When that happens, I would prefer it if the compiler points it out as an error. If the assignment is what I meant, then I can override with an explicit conversion.
Strong typing means slightly more work for the programmer, but a big increase in confidence that the code is doing what you think it is doing.
Re #127, GISS added the following in the revised gistemp.txt included in the 9/10/07 source code release.
This prompted me to nose around their ftp site, and I located a SBBX.HadR2 at ftp://data.giss.nasa.gov/pub/gistemp
Andy (#130)
That’s great, thank you very much. I’ll start from there.
Mark.
Hi team.
I’d love to get in on this but my current workload prohibits me. I’d just like to leave this link to (imho) a good Fortran on-line reference for those who are frustrated.
http://www.star.le.ac.uk/~cgp/prof77.html
BTW I usually work with an engineering tool used for verification of safety-critical systems. The tool comprises ksh/perl/fortran/C (latter two cross-compiled) software. Sometimes we cross-compile it with C++ and Ada too. That just happens when you keep a tool around for a while, so I’m not surprised at what you’re reporting about “inelegant programming”.
Good Luck!
Where did all the discussion of testing the code go? No discussion of results or problems. I’m new to this site and find this very interesting. I would like to continue to follow the progress. I have extensive experience with Fortran from the early 60’s to the mid 90’s and might be able to help some.
Hey Lynn, welcome aboard. Read the thread and contact the people working the issues.
FWIW, no changes have been made to the “official” GISS code in the last few days (can’t say the same for the data 🙂 )
Has anyone gotten all the source code steps to run yet?
I am with Lynn. Can someone summarize where “the gang” (that is, the varied and cosmopolitan computer geniuses, would-be hackers, programmers in forgotten languages and those who remember punch cards) is with respect to getting the code to replicate one or other versions of the data?
Please keep in mind that the “official version” of the code on Sept 7 introduced a change in data provenance that was inconsistent with the dset=1 versions of USHCN sites that I downloaded earlier in the day on Sept 7 (The timing being a coincidence.) They changed the online data versions some time in the next few days. This has been a little tricky to sort out.
Hi,
I’m currently stuck on stage4_5 with an end of file condition, so, as a distraction I set about rewriting STEP0 in ‘C’ (after all, I wouldn’t be a programmer unless I had the urge to rewrite every bit of code I see).
I’ve done everything up to dif.ushcn.ghcn.2005, which I’m just in the process of converting.
I’ve pretty much stuck to their original algorithms, maybe changing gotos for while/do-while loops, and so far the output compares exactly with that of the Fortran code.
Mark.
I too am with Lynn. It seems there’s been a loss in momentum on this thing (coming from one that has contributed nothing!).
I am very interested in the progress of understanding the original code and the potential development of an enhanced version. Please keep us up to date guys.
Ok, STEVEMC…
Here is what I suggest. Mark is furthest along. Mark, if you do a 2 paragraph write up
of where you are and what you need, and ask SteveMc to post it as an article, you will
do two things.
1. Become famous on CA.
2. Refocus attention for those who dig this.
Basically it will give you top billing.
SteveMc? what say ye?
And Mark, you’re having another EOF in step 4 / step 5.
Is it random or deterministic?
Apparently there is no EOF error trapping or exception handling so we are stuck with “printf” debugging.
Which routine? sread?
Steve
I think I need to dig out a description of the required processes for stages 4 and 5.
Stage 4 adds years to the SBBX.HadR2, but I can’t seem to find out which years need to be added (and do they even need to be applied to the version of SBBX.HadR2 that I have?).
The error I’m now getting is ‘STOP ERROR: MONM0 TOO SMALL’ in SBBXotoBX.f during stage 5.
I will do a short write up of what needs to be done to get to where I am. I’m away for a while today and I start my new job tomorrow, so busy times for me.
Mark.
# 142 Mark
MONM0 is used only to reserve memory. Why it’s not big enough is impossible to say, since the comparison value comes from the input files. The “standard” practice is to make a note and double MONM0 in the parameter line.
Here’s my R port of Step 0.
I’m ready to get stuck into Step 1, but am feeling a bit daunted by the Python code. Does anyone have a good enough understanding of the record combination algorithm to post some pseudo-code?
Mike
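Until someone posts authoritative pseudo-code, here is a hedged sketch (entirely mine, not verified against GISTEMP) of the combination rule as it has been reverse-engineered in this thread: each later record is offset by its mean difference from the running combined series over their overlap, then merged in.

```python
def combine_records(records):
    # records: lists of annual means (None = missing), in preference
    # order; the first is the starting "best" record. A sketch of the
    # combination rule discussed in this thread, not Hansen's literal
    # code: shift each record by its mean offset from the combined
    # series over the overlap, then merge.
    combined = list(records[0])
    for rec in records[1:]:
        overlap = [i for i in range(len(combined))
                   if combined[i] is not None and rec[i] is not None]
        if not overlap:
            continue  # no common years: record cannot be combined
        bias = sum(combined[i] - rec[i] for i in overlap) / len(overlap)
        for i, v in enumerate(rec):
            if v is None:
                continue
            shifted = v + bias
            if combined[i] is None:
                combined[i] = shifted
            else:
                combined[i] = (combined[i] + shifted) / 2.0
    return combined
```

The real code appears to weight by record length and to pick the longest overlap first (see the Matlab fragment up-thread), so treat this only as a starting point for an R port.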
Mike, see #69/#73 above.
#145. I’ll do a post on Step 1 and link to Jean S’s and my code. I’ve worked through the problem concurrently with Jean S and have reconciled in R up to annoying small differences of less than 0.5.
#144. By the way, Mike, that looks like a clean way to present an implementation of Step 0. I’ll try to take a look in light of some of the version issues that have been discussed recently.
Great..
A few notes on my R implementation:
– It’s based on my interpretation of the Fortran source and hasn’t been verified.
– All the input filenames were prefixed with “GISSTEMP_” and suffixed with “.txt”
– It relies on Importing USHCN data and Importing GHCN data into R workspaces, which were done separately.
– It relies on an awk script that tidies up the messy antarc*.txt files. You can get both the script and the resulting CSV file from that link.
So it might just look clean because I’ve hidden my mess 😉
Can someone produce or point me to padjust.f input files fort.1, fort.12 .. fort.17? And preferably also the output files fort.22 .. fort.27?
Has anyone got through to Step 2? If so, I would like to get some info from your run.
I’ve cross-posted the R code to my wiki and it now runs on the site using pre-loaded GISS datasets. I had made some code changes, mostly because of differences in data formats. I’m reasonably sure I haven’t broken anything in the process.
I see a discussion on STEP 2 has started. Am I right in saying that there’s been no R code yet to replicate the concluding pieces of STEP1
– running the above for each ID, and picking out the combined version
– manual changes listed in Ts.discont.RS.alter.IN, combine_pieces_helena.in, Ts.strange.RSU.list.IN
I also notice that the GISS input_files sumofday.tbl, mcdw.tbl, ushcn.tbl, & v2.inv are not required in Steve’s posted code for STEP 1. What are they for?
Regards, Mike
Sorry, meant to be post previous message on the STEP 1 thread.
I converted padjust.f (STEP 2) to MATLAB. If somebody can provide the input files, I could debug it 🙂 Anyway, the code is here http://koti.mbnet.fi/ppppp/padjust_H.m
Question about STEP0:
antarc_comb.exe appears to take two (filename) arguments.
The first is ‘v2_antarct.dat’ (produced by ‘antarc_to_v2.sh’).
What is the second?
Hello friends,
I have taken the v2.mean file from GHCN and I have imported it to an SQL Server 2008 database, to explore the data with the tools I am used to.
I imported the raw data to a new table and then ran an SQL script to transform -9999 into Null values.
The create table script is:
The script to handle the null values is here:
Then I imported the station data into another table; even if I wasn’t able to identify all columns, the purpose was to be able to establish a relation with the mean data. This table has the following script:
My next step was to create a new table that in my opinion better addresses the normal forms of relational databases (http://en.wikipedia.org/wiki/Database_normalization); here is the script to create this new table:
On this new table I will have one record per value, which makes future queries much easier.
To copy data from the original format to the new table I used the following code:
Then I queried this table to obtain the global average of the values, with this query:
The result was: 120
Does this mean that the whole dataset presents a raw average deviation from the reference period of 1.2 ºC ?
Sorry, it means the average temperature of this period is 12 ºC
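That’s right: v2.mean stores monthly means as integer tenths of a degree, with -9999 as the missing-value sentinel, so a raw average of 120 is 12.0 ºC. In Python terms (a sketch; the function name is mine):

```python
def mean_temperature_c(values):
    # v2.mean values are integer tenths of a degree C; -9999 marks a
    # missing month and must be excluded before averaging.
    valid = [v for v in values if v != -9999]
    return sum(valid) / len(valid) / 10.0
```

Any averaging query run before replacing -9999 with Null would be badly skewed, which is presumably why the Null-conversion script above matters.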
I just ran across this comment on the GISTemp Updates page:
I don’t understand how bugs discovered in 2009 cannot have affected results of calculations made in, say, the 1990s.
Re: Dan Hughes (#159),
Surprising as that may seem, that is the case here.
The first bug I reported was located in unreachable code in antarc_comb.f, and would have had serious consequences but for the fact that it could never be executed. The danger of course would be that at some future date the code might be taken over by someone with experience only of structured programming, who might make the mistake of believing that FORTRAN code of a certain era was always first designed and then coded – and who might then “fix” the unreachable code.
The second bug was in USHCN2v2.f, where the format statement used to read ushcn.tbl selected the wrong (and blank) column for a single digit numeric value, but as the FORTRAN code reads a blank as zero, and as all the values in the correct column are in fact zero for ushcn.tbl, again the current calculations were not affected. The other .tbl files, where the corresponding values would not have been zero, are read instead (correctly) in Python.
The coders of GISTemp seem to have liked living dangerously. I’m curious whether this lucky streak will still hold should I find any further bugs as I work my way through the later steps.
On reflection I’m beginning to think releasing the code could have been a clever strategy to get it debugged at no cost before old-timers like myself, who started out with FORTRAN II and may still be able to put ourselves in the mindset needed to follow some of the more esoteric code, die off.
Perhaps this could be our final business opportunity – “we can review your ‘legacy’ FORTRAN II/IV, ALGOL, PL/1, 360 ASSEMBLY, COBOL, APL, RPG II … code; GOTOs welcome; all common, even if now deprecated, stupid programming tricks accepted”?
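The blank-reads-as-zero behaviour Peter describes is worth spelling out, since it is exactly the kind of silent difference that bites a port. A sketch (the function name is mine) of what a faithful re-implementation has to do:

```python
def fortran_read_int(field):
    # A Fortran formatted read treats an all-blank numeric field as
    # zero, while Python's int() raises ValueError on an empty string.
    # A port that used bare int() would crash (or differ) exactly
    # where the misplaced column in USHCN2v2.f silently read 0.
    field = field.strip()
    return int(field) if field else 0
```

This is one reason byte-for-byte comparison against the original FORTRAN output, rather than “looks plausible” checking, is the right verification standard here.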
Thanks for the info Peter.
I would like to help get the code up and running. I have all the FORTRAN code compiled, but I have not yet looked into the details of that coding. I’ve noticed that all the independent efforts started back in 2007/2008 seem to have come to a dead stop. I remain amazed that independently porting the code is so problematic. Can code actually be so tightly tied to specific machines and OS configurations?
Drop me an e-mail dhughes4 at mac dot com.
I have GIStemp running on a x86 Linux (RedHat 7.2) box using the G95 compiler.
I’ve run it up through step3. At step4_5 I hit the ‘endian’ problem with the read of the SBBX.HadR2 dataset in Step5 SBBXotoBX.f file (read(11) at line 112 or so) and convert.HadR2_mod4.upto15full_yrs.f with the same file. I’ve not yet reached the point of finding out if the oisstv2_mod4… et al files have similar problems.
So I’m now looking at making an “endian conversion tool”. We’ll see. It might be easier just to get an old Macintosh PowerPC box (also big endian) and put linux on it with the G95 compiler…
The major issue I found with the G95 compiler was the use, by GIStemp, of a non-standard extension to the language for initializing data in a variable declaration. Where they have:
INTEGER foo(3)/1,2,3/
You need instead:
INTEGER foo(3)
DATA foo /1,2,3/
FWIW, I’ve moved the source files into a ‘src’ directory, the executables into a ‘bin’ directory and added a Makefile to do all the compiles (pulling that out of the other scripts). At this point I have a version that works pretty well (up to the bigendian point…) compiles well, and generally seems usable for testing.
Oh, BTW, anyone wanting to ask questions of me can do so at:
where I’m documenting my “discoveries”…
One Trackback
[…] the graphic below compares the Detroit Lakes version from Step 0 from Not Sure not here to the Dset=0 version currently online at NASA GISS. The results are obviously similar in recent […]