MBH98 Source Code: Status Report

I reported recently that the newly archived (July 2005) source code multiproxy.f shows that the cross-validation R2 statistic was calculated but not reported.

Today, I’m merely going to summarize some collation details from my inspection of the code, listing input files and output files and providing a lexicon of variables. The source code requires a variety of input files which do not exist at the existing data archive. Under the circumstances, I would have expected punctilious archiving, but this hasn’t happened. In addition, if one crosschecks the steps documented in the new source code against the replication issues listed here, rather few of them are covered in this current code dump.

I’ll return to this in a future post, with a particular consideration of Preisendorfer’s Rule N.

I went through the Fortran code here and collated information on the directories and files used to read in information (see Table 1 below). There are 3 types of data: gridcell temperature, proxy series and information files (rosters). None of the directory/file combinations match any files at the most recent (July 2004) "authoritative" archive http://holocene.meteo.psu.edu/shared/research/MANNETAL98 (formerly ftp://holocene.evsc.virginia.edu/pub/MANNETAL98/). The program multiproxy.f is in working order, because Mann has used it within the last 2 years to emulate our calculations. However, it cannot run on the files either at http://holocene.meteo.psu.edu/shared/research/MANNETAL98 (formerly ftp://holocene.evsc.virginia.edu/pub/MANNETAL98/) or at the older archive made public in November 2003 (ftp://holocene.evsc.virginia.edu/pub/MBH98/) [no longer available; I have a copy].
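
For anyone who wants to repeat the collation mechanically, here is a minimal Python sketch of the idea: scan fixed-form Fortran for OPEN statements and pull out literal file names. The sample fragment is hypothetical and is NOT copied from multiproxy.f; I am assuming the file names appear as quoted literals in a file= argument, which may not hold everywhere in the real code.

```python
import re

# Hypothetical fixed-form Fortran fragment for illustration only.
# It is NOT taken from multiproxy.f.
SAMPLE_SOURCE = """\
      open (unit=9,file='DATA/JONESBRIFFA/MONTHLY/globe-1902.dat',
     $      status='old')
      open (unit=11,file='quinn.dat',status='old')
c     open (unit=12,file='commented-out.dat')
      open (unit=13,file=fname,status='unknown')
"""

# Matches file='literal' or file="literal" (assumed form of the argument).
FILE_ARG = re.compile(r"""file\s*=\s*(['"])(.+?)\1""", re.IGNORECASE)


def join_continuations(source):
    """Drop comment lines and merge fixed-form continuation lines."""
    statements = []
    for line in source.splitlines():
        if not line.strip() or line[0] in "cC*!":
            continue  # blank line, or comment marker in column 1
        # Continuation: columns 1-5 blank, column 6 neither blank nor '0'.
        is_continuation = (
            len(line) > 5 and line[:5].strip() == "" and line[5] not in " 0"
        )
        if is_continuation and statements:
            statements[-1] += line[6:].strip()
        else:
            statements.append(line.strip())
    return statements


def literal_open_targets(source):
    """Return (directory, file name) for each OPEN with a literal name."""
    targets = []
    for stmt in join_continuations(source):
        if not stmt.lower().startswith("open"):
            continue
        match = FILE_ARG.search(stmt)
        if match:
            directory, _, name = match.group(2).rpartition("/")
            targets.append((directory, name))
    return targets


for directory, name in literal_open_targets(SAMPLE_SOURCE):
    print(f"{directory or '(none)':28s} {name}")
```

A scan like this only finds literal names. Names assembled at run time (the last OPEN in the sample, or entries such as nome(j) in Table 1) still have to be traced by reading the code.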

Table 1 is a collation of all input calls in multiproxy.f, showing the directory and file name. There are 3 types of files: temperature, proxy and information (rosters, locations etc.). Comments on these directories/files follow the table.

Table 1. Input Files to MBH98 Source Code multiproxy.f

| No. | Page | Directory | File | Contents |
|---|---|---|---|---|
| 1 | 7 | DATA/JONESBRIFFA/MONTHLY | glb-train-month**.int | Gridcell temperature |
| 2 | 8 | DATA/JONESBRIFFA/MONTHLY | globe-1902.dat | Gridcell locations |
| 3 | 26 | JONESBRIFFA/1854-1993/ | globe-1854.dat | Roster |
| 4 | 26 | JONESBRIFFA/1854-1993 | globe-1854.mask | Roster |
| 5 | 26 | JONESBRIFFA/1854-1993 | glb-long-all*.int | Gridcell temperature series |
| 6 | 26 | JONESBRIFFA/1854-1993 | glb-long-cold*.int | Gridcell temperature series |
| 7 | 26 | JONESBRIFFA/1854-1993 | glb-long-warm*.int | Gridcell temperature series |
| 8 | 35 | DATA/PROXY-ANNUAL/ | names-longtemp | Roster |
| 9 | 35 | DATA/PROXY-ANNUAL/ | temp-1820.loc | Roster |
| 10 | 35 | DATA/PROXY-ANNUAL/ | nome2(j) | Temperature annual |
| 11 | 14 | MULTIPROXY/DATA/nome0/ | nome(j) | Proxy data |
| 12 | 13 | MULTIPROXY/DATA/ | multiproxy.dat | Roster |
| 13 | 13 | MULTIPROXY/DATA/ | multiproxy-proxy.dat | Roster |
| 14 | 13 | MULTIPROXY/DATA/ | multiproxy-instr.dat | Roster |
| 15 | 31 | (blank) | quinn.dat | Nino index |

None of the temperature or information files can be matched in the FTP/MANNETAL98 directory or the FTP/MBH98 directory. First, the directory nomenclature is inconsistent with the directory nomenclature in both archives. Second, nearly all of the file names lack matches (a couple of output file names do match).
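
The file-name half of this comparison can be sketched in a few lines of Python. The expected names are the fixed names from Table 1 (wildcards kept as listed). The archive listing is only the handful of archive paths mentioned in this post, so it is a partial, illustrative listing and an assumption on my part; a real check would walk a local copy of each archive.

```python
from fnmatch import fnmatchcase

# Fixed input file names from Table 1. The run-time names nome(j) and
# nome2(j) are left out because they are not literal file names.
EXPECTED = [
    "glb-train-month**.int", "globe-1902.dat", "globe-1854.dat",
    "globe-1854.mask", "glb-long-all*.int", "glb-long-cold*.int",
    "glb-long-warm*.int", "names-longtemp", "temp-1820.loc",
    "multiproxy.dat", "multiproxy-proxy.dat", "multiproxy-instr.dat",
    "quinn.dat",
]

# Partial listing for illustration: only archive paths named in this post.
ARCHIVE_LISTING = [
    "FTP/MANNETAL98/INSTRUMENTAL/anomalies-new",
    "FTP/MANNETAL98/INSTRUMENTAL/gridpoints.loc",
    "FTP/MANNETAL98/datalist1400.dat",
    "FTP/MBH98/PCS/multiproxy.inf",
    "FTP/MBH98/INSTR/TEMP/temp.loc",
]


def find_matches(expected, listing):
    """Map each expected name (or wildcard) to archive paths whose base name fits."""
    report = {}
    for pattern in expected:
        report[pattern] = [
            path for path in listing
            if fnmatchcase(path.rsplit("/", 1)[-1], pattern)
        ]
    return report


report = find_matches(EXPECTED, ARCHIVE_LISTING)
unmatched = [name for name, hits in report.items() if not hits]
print(f"{len(unmatched)} of {len(EXPECTED)} expected names have no match")
```

This compares base names only, so it deliberately ignores the directory mismatch and tests the weaker question of whether a file of that name exists anywhere in the listing.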

1. The monthly gridcell temperature files glb-train-month**.int are presumably derived from the file FTP/MANNETAL98/INSTRUMENTAL/anomalies-new (archived for the first time in July 2004) by taking a subset and carrying out interpolations. However, the code for this step is not provided. I have been unable to replicate Mann’s selection of 1082 gridcells using the criteria reported in the Corrigendum SI. The code for making this selection is not provided. The file globe-1902.dat looks like it might be the same as the archived file FTP/MANNETAL98/INSTRUMENTAL/gridpoints.loc, but this is only a guess.

2. The file globe-1854.dat is a roster of latitude/longitude gridcell identifiers. It is not archived anywhere.

3. The file globe-1854.mask presumably identified the 219 gridcells said to have “nearly continuous” records from 1854 and illustrated in a diagram in MBH98, but it is not archived anywhere.

5-7. The files glb-long-all*.int, glb-long-cold*.int and glb-long-warm*.int are presumably derived from the file FTP/MANNETAL98/INSTRUMENTAL/anomalies-new mentioned above. However, they are not archived in this form. The file names-longtemp cannot be identified in any current archive.

8. The file temp-1820.loc cannot be identified in any current archive. It is a list of 10 “long” series. There will probably be some connection to the file FTP/MBH98/INSTR/TEMP/temp.loc, but this has more than 10 series.

9. The files PROXY-ANNUAL/nome2(j) are presumably 10 “long” temperature series. Again this is perhaps connected to the long series in FTP/MBH98/INSTR/TEMP, but the precise series are unidentified.

10. The file multiproxy.dat does not exist in any archive. Nothing remotely like it exists in FTP/MANNETAL98. The older archive FTP/MBH98 contains a roster file, FTP/MBH98/PCS/multiproxy.inf, which has the same sort of structure as is contemplated for the file multiproxy.dat.

The file multiproxy.inf has directory calls corresponding exactly to directories in FTP/MBH98, e.g. CORAL/MISC, and the file names correspond to file names in the FTP/MBH98 directories, e.g. redsea-o18.dat. For the non-PC series, I can see how multiproxy.inf works. For the PC series, it’s still unclear. Here MBH98 calculated PC series in steps, something that was not mentioned in MBH98 itself. Although the Corrigendum says that PC series were re-calculated in each step, the data archives do not bear this out: PC series were calculated fresh for some steps and some networks but not for others, and no rationale has ever been provided.

In order to find the right directory, an extra subdirectory level has to be included e.g. TREE/VAGANOV/BACKTO_1750 is one of the PC directories in multiproxy.inf. This method can be used to pick out PC series from different BACKTO subdirectories, but you’d need to have a separate multiproxy.dat for each calculation step (11 in all). None of these are provided.
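
To make the per-step roster idea concrete, here is a rough Python sketch. The roster layout (a list of directory/file pairs) and the file name pc01.dat are my own assumptions for illustration; the directories and redsea-o18.dat are the ones quoted above, and the years are just the two that appear in names mentioned in this post.

```python
import re

# Matches a BACKTO_<4-digit year> directory component.
BACKTO = re.compile(r"BACKTO_\d{4}")


def roster_for_step(entries, step_year):
    """Repoint every BACKTO_nnnn directory at the given calculation step."""
    return [
        (BACKTO.sub(f"BACKTO_{step_year}", directory), filename)
        for directory, filename in entries
    ]


# Hypothetical roster entries. "pc01.dat" is an invented file name.
template = [
    ("CORAL/MISC", "redsea-o18.dat"),          # non-PC series: unchanged
    ("TREE/VAGANOV/BACKTO_1750", "pc01.dat"),  # PC series: step-dependent
]

for year in (1750, 1820):
    print(year, roster_for_step(template, year))
```

A rewrite like this only produces candidate paths. Since PC series appear to have been calculated fresh for some steps and networks but not others, each generated directory would still need to be checked against what is actually in the archive.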

I think that there may be different rosters with names like backto1820.dat (which was a file emailed to me in April 2003 by Scott Rutherford and is identical to the file multiproxy.inf.) This suggests that there would be corresponding files for other periods, which have not been archived.

In the new archive FTP/MANNETAL98, there is a set of files somewhat like this (FTP/MANNETAL98/datalist1400.dat etc.), but the read-in form of these files is not consistent with the read-in format of multiproxy.dat, as the directory information is not in these files.

12-14. The individual proxy series are read in from files with the form MULTIPROXY/DATA/subdirectory/filename.dat. This form is inconsistent with a read-in from FTP/MANNETAL98, since the proxy series there have been collated into matrices in each calculation step. It is consistent with the forms in FTP/MBH98 as long as the multiproxy.dat files are specified correctly.

The table here shows the various output files written from the program multiproxy.f. The few files that have been archived under the multiproxy.f filename are indicated below with their location: for the temperature principal components calculations, the eigenvectors, eigenvalues and re-standardized and annualized PC series were archived at UMASS in 1999 and are also at FTP/MANNETAL98. The gridcell standard deviations were archived in 2004 at FTP/MANNETAL98. There are 5 rpc series archived in both locations, but these appear to be spliced and the staging is hard to reconcile. The files betas*- and corrs*- are statistics which are collated in stats-supp.htm, subject to withholding discussed yesterday.

I’ve provided a lexicon of variable names in Mann’s Fortran program here, in case anyone’s interested.

To make this operational, one needs either the missing input files or the programs that transition from the data as archived to the files referred to in multiproxy.f.

5 Comments

  1. fFreddy
    Posted Aug 2, 2005 at 9:21 AM

    Steve, the link to the lexicon is bust. Steve: Fixed.

  2. Larry Huldén
    Posted Aug 2, 2005 at 11:34 PM

    Is it possible that Mann originally believed in his hockey stick so much that he made the calculations quickly to get the results published and expected that they would be more thoroughly verified later? Because nobody has since been able to verify what he did, he doesn’t dare to disclose everything, because the original reconstruction was not properly done. I feel pretty sure that he will never disclose completely the methods and data used in MBH98 & 99.

  3. Jim Mitroy
    Posted Aug 6, 2005 at 9:21 PM

    Since the economy of the free world apparently depends on the
    hockey stick I decided to run the code through the FORTRAN
    syntax checker ftnchek.

    There were close to 100 screenfuls of warnings concerning
    mixed mode (mainly REAL*4/REAL*8) arithmetic. The tail-end
    of the output is enclosed below.

    This is what I got

    ……………………………………………………..
    ……………………………………………………..
    Warning near line 2960 col 13 file multiproxy.f: real G
    promoted to real*8 S: may not give desired precision
    3012 570 RETURN
    Error near line 3012 file multiproxy.f: missing END statement
    inserted at end of file

    2 syntax errors detected in file multiproxy.f
    476 warnings issued in file multiproxy.f

    Warning: Subprogram CSVD argument usage mismatch at position 6:
    Dummy arg IP in module CSVD line 2749 file multiproxy.f is used before set
    Actual arg IP0 in module %MAIN line 1138 file multiproxy.f is not set

    ************************** END ftnchek output ********************

    I really would be happier if decisions concerning the future of
    the world economy did not depend on whether the compiler sets
    uninitialized variables to zero.

    Steve: Could some of the diagnostics be compiler-specific? He’s produced results, so at some level I presume that his program must at least compile.

  4. Jim Mitroy
    Posted Aug 8, 2005 at 3:12 AM

    The ftnchek program is designed to check the syntax of program
    vs strict fortran standards. So programs which fail the checks
    can potentially give different results depending on the
    compiler/CPU mix.

    Uninitialized variables are usually set to zero, and most
    compilers do have an option to enforce this. But, the people
    who write programs with uninitialized variables generally do
    not bother fiddling with compiler options.

    The 470 or so such mixed REAL*4/REAL*8 mode expressions
    mean the program is more susceptible to round-off error
    creeping in and possibly polluting the quality of any
    computation. I noticed that the code had an SVD in there.
    If I were doing an SVD, I would generally like to maintain
    the integrity of all the 15 digits of REAL*8. It should
    be noted that there are compilers with options that can
    force everything to be REAL*8.

    For the record, I like to know what is happening to all
    15 digits of my floating point numbers as I add, subtract,
    multiply and divide them. Mixed-mode arithmetic and/or
    uninitialized variables do not constitute good programming
    practice.

    Jim Mitroy

  5. Sam Urbinto
    Posted Nov 7, 2007 at 11:33 AM

    Mann’s code: Based on a question from Steve and what Jimmy Smith said and steven mosher said, I’m bumping this one. Have at it gents.
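
On the mixed-mode warnings quoted in the comments above ("real G promoted to real*8 S: may not give desired precision"), here is a small Python sketch of what that kind of warning is about. It emulates REAL*4 by rounding through a 4-byte float with the standard struct module; the helper names are mine, and nothing here is taken from multiproxy.f or says anything about how much the effect matters for its results.

```python
import struct


def to_real4(x):
    """Round a Python float (a REAL*8) to the nearest REAL*4 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# A REAL*4 variable holding 0.1, then assigned to a REAL*8 variable.
g = to_real4(0.1)
s = g  # promotion keeps the REAL*4 rounding error; it does not restore digits
print(repr(s))       # 0.10000000149011612
print(abs(s - 0.1))  # roughly 1.5e-9, versus about 1e-17 for a true REAL*8


def sum_real4(values):
    """Accumulate in REAL*4: every term and every partial sum is rounded."""
    total = 0.0
    for v in values:
        total = to_real4(total + to_real4(v))
    return total


terms = [0.1] * 100_000
err4 = abs(sum_real4(terms) - 10_000.0)
err8 = abs(sum(terms) - 10_000.0)
print(f"REAL*4 accumulation error: {err4:.3g}")
print(f"REAL*8 accumulation error: {err8:.3g}")
```

The first part shows that promoting a REAL*4 value to REAL*8 carries about 7 good digits into a 15-digit variable. The second shows how much faster round-off builds up when the running total itself is held in REAL*4.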