Team Euro Code

Team Euro’s CPD submission (online here) has done a commendable job archiving data in their supplementary information (online here). It’s by no means perfect, but it’s a big improvement on prevailing practices in the field. So while they’ve chosen to unfairly disparage my own efforts with respect to source code, perhaps one can construe the marked improvement in their own practices as a type of compliment.

They state that they will provide source code for the calculations after the review period as follows:

The software (in the `python’ language) used to generate the reconstructions presented in the manuscript will be made available on http://mitrie.badc.rl.ac.uk when the manuscript is published.

Now this is obviously an improvement, but, to the extent that they are trying to adhere to improved practices in the paleoclimate field – something that I obviously endorse – it would be much better if they archived their code at the time of submission, as required in econometrics journals.

Notwithstanding the above disclaimer, some of their Python functions are archived at the MITRIE website here, although there does not appear to be any unifying script to date. To give a flavor of Juckes’ programming style, I’ve attached his code for the do_regress.py function below. I suspect that their job would have been more cost-effective if they had used R, but people use what they’re used to and I’ve got no complaint about that.

At my present rate, I’ll have much of the article replicated in R in a day or two.

73 Comments

  1. Armand MacMurray
    Posted Nov 1, 2006 at 1:11 PM | Permalink

    I’ve attached his code for the do_regress.py function below.

    Just a note that this isn’t visible to me yet.

  2. Steve McIntyre
    Posted Nov 1, 2006 at 2:17 PM | Permalink

    #
    ###########################################################################
    # module do_regress.py
    #
    ###########################################################################
    #
    # Utilities for proxy based temperature reconstructions
    ###########################################################################
    from Numeric import *
    import cdms
    import string, shelve, mods
    from config import data_dir, work_dir
    import basic_utils

    class regresser:

        def __init__(self, ncfile=data_dir+'nc/mitrie_proxies_v01.nc',
                     xrange=(1000,1980),
                     std_opt='std',
                     key_list=None,
                     key_opt=None,
                     std_period='full',
                     proxy_id='mann_etal1999',
                     n_extrap=1,
                     select_on_id=True,
                     replace_pcs=None,
                     ppcs=None,
                     mann_etal1999_fixed_pc=True,
                     verbose=0,
                     lcal=79 ):

            self.name = 'regresser'
            self.n_extrap = n_extrap

            self.ee = {}
            self.start = xrange[0]
            self.end_year = xrange[1]
            self.proxy_id = proxy_id
            self.ncfile = ncfile
            self.lcal = lcal
            self.time_res = False

            self.nc = cdms.open( self.ncfile, 'r' )
            keys = self.nc.variables.keys()
            keys.sort()

            if key_list != None:
                nnf = 0
                for k in key_list:
                    if k not in keys:
                        print k, ' not found in keys'
                        nnf += 1
                if nnf != 0:
                    print keys
                    self.nc.close()
                    raise 'error'
                keys = key_list
            else:
                keys = basic_utils.xx_where( \
                    lambda x: self.nc.variables[x].standard_name == 'proxy', keys, extract=True )
                if len(keys) == 0:
                    print ' no proxies found in: ', self.ncfile
                    self.nc.close()
                    raise 'error'

            if select_on_id:
                keys = basic_utils.xx_where( \
                    lambda x: self.nc.variables[x].collection_id == proxy_id, keys, extract=True )
                if len(keys) == 0:
                    print 'no proxies matching id %s found in %s ' % (proxy_id, self.ncfile)
                    self.nc.close()
                    raise 'error'

            ## [the blog software ate the comparison operators in this test as
            ##  an HTML tag; the reconstruction below is a guess -- presumably
            ##  it keeps proxies spanning the reconstruction period]
            keys = basic_utils.xx_where( \
                lambda x: (self.nc.variables[x].start[0] <= self.start and \
                           self.nc.variables[x].end[0] >= self.end_year - n_extrap), keys, extract=True )

            if proxy_id == 'mann_etal1999':
                if mann_etal1999_fixed_pc:
                    keys.pop( keys.index( 'mbh99_13' ) )
                else:
                    keys.pop( keys.index( 'mbh99_01' ) )
            elif proxy_id == 'mann_etal1998':
                sect_list = ['VAGANOV_SOVIET_TREEL', 'ITRDB-NORTH_AMERICA', \
                             'Stahle_swm_late', 'Stahle_swm_early' ]
                keys = basic_utils.xx_where( \
                    lambda x: self.nc.variables[x].section not in sect_list, \
                    keys, extract=True )

            if len(keys) == 0:
                print 'no proxies matching id %s and starting before %s found in %s ' % (proxy_id, self.start, self.ncfile)
                self.nc.close()
                raise 'error'

            if verbose > 1:
                print 'keys extracted:', keys
            self.keys = keys

            nproxy = len( keys )
            ny = self.end_year + 1 - self.start
            self.predictor = multiarray.zeros( (nproxy,ny), 'f' )
            self.predictor_raw = multiarray.zeros( (nproxy,ny), 'f' )
            self.predictor_mean = multiarray.zeros( (nproxy), 'f' )
            self.predictor_var = multiarray.zeros( (nproxy), 'f' )
            tax = self.nc.getAxis( 'time' ).getValue()
            i0 = self.start - tax[0]
            ik = 0

            if std_period == 'cal':
                n_right = lcal
            elif std_period == 'full':
                n_right = -1
            else:
                print 'std_period should be cal or full'
                raise 'error'

            if verbose > 1:
                print std_opt, lcal
            self.meta_dict = {}
            for k in keys:
                mdict = {}
                for at in self.nc.variables[k].listattributes():
                    mdict[at] = self.nc.variables[k].getattribute( at )

                self.predictor[ik,:], self.predictor_mean[ik], self.predictor_var[ik] = \
                    self.standardise( \
                        self.nc.variables[k].getValue().tolist()[i0:i0+ny], \
                        opt=std_opt, n_right=n_right )
                self.predictor_raw[ik,:] = \
                    self.nc.variables[k].getValue().tolist()[i0:i0+ny]
                ik += 1
                self.meta_dict[k] = mdict

            self.keys = keys
            self.l1 = nproxy
            self.nc.close()
            ##
            ## recalculated mann et al, 1998, 1999 proxy pcs
            ##
            if ppcs != None:
                if replace_pcs == None:
                    kks = range(3)
                else:
                    kks = []
                    for p in replace_pcs:
                        kks.append( self.keys.index( p ) )

                nc = cdms.open( data_dir + 'nc/mitrie_new_proxy_pcs_%4.4i_v01.nc' % self.start, 'r' )
                k1 = 0
                for k in kks:
                    v2 = nc.variables[ ppcs[k1] ].getValue()
                    self.predictor[k,:], self.predictor_mean[k], self.predictor_var[k] = \
                        self.standardise( v2, opt=std_opt, n_right=n_right )
                    self.predictor_raw[k,:] = v2
                    k1 += 1
                nc.close()

            ############################################
            ### predictand stuff.

            self.load_predictand( )
            if proxy_id == 'mann_etal1999':
                for k in range(3):
                    ss = sum( self.predictor[k,-self.lcal:]*self.predictand )
                    ## [the posting preserved only "if ss" here; the blog
                    ##  software swallowed everything from this point to the
                    ##  middle of a later composite-building method (roughly
                    ##  eighty lines, including the rest of __init__ and the
                    ##  def line of that method). "< 0:" is a guess at the
                    ##  surviving fragment; the lost block cannot be recovered]
                    if ss < 0:

            ## [text resumes mid-way through the later composite method]
            if verbose > 1:
                print 'getting composite::', res[1], res[2]
                print len(res[0])
                print res[0][0:10]
            self.composite = array( res[0] )

        def regress_composite( self, opt='cols' ):

            cc = dot( self.composite[-self.lcal:], self.composite[-self.lcal:] )
            if self.time_res:
                print 'smoothing predictand in variance calculation'
                sp = basic_utils.smooth( self.predictand, self.time_res )
                self.pp = dot( sp, sp )*len(self.predictand)/len(sp)
            else:
                self.pp = dot( self.predictand, self.predictand )

            if opt == 'cols':
                self.gg = dot( self.composite[-self.lcal:], self.predictand )/cc
                self.predicted_temp = self.gg*self.composite
            elif opt == 'cvm':
                self.gg = sqrt( self.pp/cc )

                self.predicted_temp = self.gg*self.composite
            else:
                print 'option ', opt, ' not available'
                raise 'error in regress_composite'

            self.residuals = self.predictand - self.predicted_temp[-self.lcal:]


        def residuals_pp( self ):

            lch = self.lcal/2
            self.ac = multiarray.zeros( (2,lch), 'f' )
            for i in range( self.lcal/2 ):
                self.ac[0,i] = vdot( self.residuals[:lch],
                                     self.residuals[i:i+lch] )/lch
                if i != 0:
                    self.ac[1,i] = vdot( self.residuals[:-i],
                                         self.residuals[i:] )/(self.lcal-i)
                else:
                    self.ac[1,i] = self.ac[0,i]

            self.ac[0,:] = self.ac[0,:]/self.ac[0,0]
            self.ac[1,:] = self.ac[1,:]/self.ac[1,0]

            v0 = vdot( self.residuals, self.residuals )/self.lcal
            v1 = vdot( self.residuals[1:], self.residuals[:-1] )/(self.lcal-1)
            v2 = vdot( self.residuals[2:], self.residuals[:-2] )/(self.lcal-2)
            v3 = vdot( self.residuals[3:], self.residuals[:-3] )/(self.lcal-3)
            std_err = sqrt( v0 )
            ndf1 = self.lcal*( 1.-v1/v0 )
            ndf2 = self.lcal*( 1.-sum( [v1/v0, v2/v1, v3/v2] )/3. )
            x = self.predicted_temp[-self.lcal:]
            u0 = vdot( self.predictand, self.predictand )/self.lcal
            u1 = vdot( self.predictand[1:], self.predictand[:-1] )/(self.lcal-1)

            tau_d = 1. - self.ac[0,1]
            tau_i = sum( self.ac[0,:] )/lch
            ndf3 = int( self.lcal*( 1.-u1/u0 ) )
            alpha = u1/u0
            return (std_err, ndf1, ndf2 )


        ##########################################################
        ### standardise proxies -- either on whole time series
        ### or on last n_right values (as in Mann et al., 1998)
        ##########################################################
        def standardise( self, aa, n_right=-1, opt='std' ):
            ##
            ## n_right > 0:: standardise using latter end of series only.
            ##
            if self.n_extrap > 0:
                ## [the missing-value test below was partly eaten by the blog
                ##  software; the threshold shown is a guess at the original]
                ii = basic_utils.xx_where( lambda x: x < -900., aa )
                if len(ii) > 0:
                    print 'standardise:: filling'
                    for i in ii:
                        aa[i] = aa[ii[0]-1]

            if n_right > 0:
                if min( aa[-n_right:] )
                ## [truncated: the remainder of standardise was also lost in
                ##  the posting]

  3. john lichtenstein
    Posted Nov 1, 2006 at 3:30 PM | Permalink

    The readme file makes it look like this package is everything. There are a couple of dependencies most folks would need to add, CDAT (Climate Data Analysis Tools) and an NCAR graphics package. Folks thinking of saving Python code for posterity should probably consider the readability guidelines from the Benevolent Dictator.

  4. Steve McIntyre
    Posted Nov 1, 2006 at 3:56 PM | Permalink

    In fairness to the authors, their tabs and spacing got lost in the posting to the site. What I like about R is the vector and matrix handling, so that you can elegantly avoid do-loops. When I see them, I just see busy work.
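
    For instance (just a sketch of the kind of thing I mean, with a made-up proxy matrix, not code from the paper):

    # "proxies" is a hypothetical years-by-series matrix; scale() centers
    # and standardizes every column at once, no do-loop required
    proxies.std <- scale(proxies)

    # the do-loop equivalent, for comparison
    for (j in 1:ncol(proxies)) {
      proxies.std[, j] <- (proxies[, j] - mean(proxies[, j])) / sd(proxies[, j])
    }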

  5. bender
    Posted Nov 1, 2006 at 5:38 PM | Permalink

    R rocks

  6. Mike T
    Posted Nov 1, 2006 at 9:01 PM | Permalink

    Steve, there is an extension for Python called numpy which makes the language very MATLAB/R-like. I think Python is also a little more general-purpose than R.

  7. Jaye Bass
    Posted Nov 1, 2006 at 10:51 PM | Permalink

    2 comments.

    1. Looks like typical academic code. Poorly commented, pasta-like, etc.

    2. Languages like C++, Java, etc. can be set up so that matrix operations look like matrix operations. I would wager that R doesn’t have the robustness (or speed?) that a general purpose language has, which would make it less suitable for really large sims/models. For instance, Blitz++ is a very fast matrix/vector library. Not for the faint of heart though…tough on compilers also.

  8. john lichtenstein
    Posted Nov 2, 2006 at 1:23 PM | Permalink

    Mike T is right that there are great matrix functions for Python in numpy. This code even uses Numeric, the predecessor to numpy.

    Bender, R is more complete, but there is a lot of weight behind SciPy now. And Python is way better at parsing data salad than R (or even SAS).

  9. bender
    Posted Nov 2, 2006 at 1:47 PM | Permalink

    Re #7

    Languages like C++, Java, etc. can be set up so that matrix operations look like matrix operations.

    It’s not just the look of the code, it’s also performance. I’ll take your wager on R’s speed for matrix operations.

    Re #8

    there is a lot of weight behind scipy now

    I’ll take power over popularity.

  10. fFreddy
    Posted Nov 2, 2006 at 2:07 PM | Permalink

    I’ll wager R itself is written in C++.

  11. Posted Nov 2, 2006 at 2:14 PM | Permalink

    X=randn(1000);tic,out=inv(X'*X)*X';toc

    elapsed_time =

    14.4530

    I have time to wait, old Matlab does fine 🙂

  12. bender
    Posted Nov 2, 2006 at 2:43 PM | Permalink

    I’ll wager R’s statistical libraries far outstrip anything available for C++. Run-time is only a tiny fraction of the real time-cost associated with any particular language. C++ is not for dummies. R is.

  13. fFreddy
    Posted Nov 2, 2006 at 3:24 PM | Permalink

    I’ll wager R’s statistical libraries far outstrip anything available for C++.

    I would expect that R’s statistical libraries are themselves written in C++.
    Given that it’s an open source program, you should be able to download the source code, compile it, and link it into your own program written in C, Basic, or whatever you like.

  14. bender
    Posted Nov 2, 2006 at 3:27 PM | Permalink

    Not sure why you keep saying that. It may be true but it’s irrelevant to anyone looking for statistical solutions.

  15. fFreddy
    Posted Nov 2, 2006 at 3:55 PM | Permalink

    Fair point. I’ll shut up.

  16. jaye
    Posted Nov 2, 2006 at 4:00 PM | Permalink

    C++ is not for dummies. R is.

    Which one to use depends on the scale of your problem. Large scale apps with a long lifetime probably should be done in the “real” language of your choice. Smaller scale, prototyping or quick what ifs are amenable to things like R, MathCad (one of my favorites) and MatLab. I’d wager they are a better choice.

    I’ll take your wager on R’s speed for matrix operations.

    Blitz++ runs at Fortran speeds…or faster.

  17. bender
    Posted Nov 2, 2006 at 4:43 PM | Permalink

    Re #16 I wasn’t going to say the F word …

    Re #15 Hey, fFreddy, I wasn’t tryin to razz ya. Comments & insights appreciated. (Overall I’ve used C++ as much as R; but I haven’t touched it in years.)

    Hearing about unfamiliar things like Python, numpy, blitz++ – this is good. You can’t get that at RC.

  18. Spence_UK
    Posted Nov 2, 2006 at 4:54 PM | Permalink

    UC, I tried that on my machine:

    >> X=randn(1000);tic,out=inv(X'*X)*X';toc

    elapsed_time =

    5.5180

    See? You don’t need a new language, you just need a faster computer 😉

  19. Mike T
    Posted Nov 2, 2006 at 8:16 PM | Permalink

    I hope you don’t actually use inv to compute that :),

    >> X=randn(1000);tic,out=inv(X'*X)*X';toc

    Elapsed time is 1.376841 seconds.

    >> X=randn(1000);tic,out=(X'*X)\X';toc

    Elapsed time is 0.943003 seconds.

    Not only is inv slower than \, but it’s also more susceptible to numerical precision problems.
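
    The same trap exists in R, by the way. An illustrative sketch only:

    # solve(A, b) factorizes once and solves the system; solve(A) %*% b
    # forms an explicit inverse first (slower and less accurate)
    X <- matrix(rnorm(1000 * 1000), 1000)
    A <- crossprod(X)               # X'X
    out.bad  <- solve(A) %*% t(X)   # explicit inverse: avoid
    out.good <- solve(A, t(X))      # direct solve: preferred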

  20. john lichtenstein
    Posted Nov 2, 2006 at 10:36 PM | Permalink

    Bender, popularity is power. What makes R great is that a lot of smart people put some great functions there. Now lots of smart people are working on SciPy and Climate Data Analysis Tools.

  21. Steve McIntyre
    Posted Nov 2, 2006 at 10:40 PM | Permalink

    Just looking at the Python code, my impression – and it’s gut feel only – is that there are 4-10 times more lines than you’d need with R. In R, you hardly ever use do-loops because of the matrix and vector orientation, and Juckes’ code was full of do-loops.
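
    As a rough illustration (my own sketch, not a line-by-line translation of do_regress.py), the proxy standardisation step collapses to a few lines:

    # standardise each column on the full period or on its last n.right
    # values (cf. the std_period option in do_regress.py); X is a
    # hypothetical years-by-proxies matrix
    standardise <- function(X, n.right = nrow(X)) {
      base <- tail(seq_len(nrow(X)), n.right)
      X.c  <- sweep(X, 2, colMeans(X[base, , drop = FALSE]))
      sweep(X.c, 2, apply(X[base, , drop = FALSE], 2, sd), "/")
    }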

  22. bender
    Posted Nov 2, 2006 at 10:57 PM | Permalink

    Popularity is good, no question. Power is more than just popularity however.

  23. Jaye Bass
    Posted Nov 2, 2006 at 11:54 PM | Permalink

    Heh…the F word…clunky but fast. The gold standard for computational speed.

  24. Nicholas
    Posted Nov 3, 2006 at 12:25 AM | Permalink

    I’m very much a fan of C-like languages and their derivatives, but I would say that if there is a language specifically designed for your purpose, and it’s designed well, then you should use it. I can probably write any function you perform in R much faster in C, but with more effort. If you aren’t running a script that’s going to take weeks or months to execute, performance probably doesn’t matter much to you and you should just stick with whatever makes your life easiest.

    I certainly don’t see the point of using a scripting language that’s not specifically designed for mathematical applications in order to write mathematical functions. There’s no reason you can’t, but as Steve says the scripts are going to end up longer and messier. Still, if one is very familiar with a given language, and unfamiliar with R, for a single program it may not be worth learning. If statistics are used heavily in your line of work, learning R or Matlab or similar (Maple?) is probably a good idea.

  25. Rafe
    Posted Nov 3, 2006 at 12:53 AM | Permalink

    Re: #24: Nicholas,

    I can probably write any function you perform in R much faster in C

    I would be surprised if the R implementation does not perform a whole bevy of high-level optimisations that would make it hard for a roll-your-own C program to compete on any non-trivial problem without major programming effort.

  26. Posted Nov 3, 2006 at 1:59 AM | Permalink

    Mike T, thanks for the note. Don’t tell anyone I’ve used inv().

    There is a clear difference:

    X=randn(79,415);

    out=X*((X'*X)\X');max(max(abs(eye(79)-out)))

    Warning: Matrix is close to singular or badly scaled.
    Results may be inaccurate. RCOND = 7.411750e-021.

    ans =

    1.8097e-014

    out2=X*inv(X'*X)*X';max(max(abs(eye(79)-out2)))

    Warning: Matrix is close to singular or badly scaled.
    Results may be inaccurate. RCOND = 7.411750e-021.

    ans =

    31.1229

  27. john lichtenstein
    Posted Nov 3, 2006 at 3:08 AM | Permalink

    SteveM, you can’t judge Python by this script. Scientists and engineers both are notoriously lousy programmers. PhDs especially. They just can’t be bothered to write comments. For a PhD scientist, professor J seems to be a couple of deviations better than average, yet he’s clear as mud. Write me the next time you find yourself confounded by data tossed salad and I will show you how Python goes beyond R, SAS, Perl, or anything in conversion of nonsense into arrays. The R tutorial page suggests learning Perl or SAS along with R, since the R/S text processing facilities are junk. But Python is mainly a text processing language. With numpy for arrays and matplotlib for graphs, it’s the room, the sun, and the sky.

    Bender, would I bother you with popular but useless things? Guido van Rossum is just so darn smart, I am trying to institute his PEP 8 style guide for Python as a style guide for SAS in my shop. Alex Martelli and Tim Peters too are such clear thinkers and concise writers that you have to consider what they are spending so much time on. While I am clearly enthralled, I will make an effort to contain my evangelism to this post. So, Bender, stay with me a minute.

    A long while ago I had a job where I needed to hand SAS code over to a COBOL programmer for implementation into a production system. One extremely bright COBOL programmer was brutal in his criticism of any use of repeated code that did not use a loop. He was very critical of the SAS code we gave him. It was a little unfair. Jim from Mar Vista could see, several steps beyond the code, what was going to happen next, yet we could barely see what we were writing. It’s only in Python that I can write code clear enough that I don’t worry Jim from Mar Vista would complain.

    Yes R is great if you can think in arrays. But Python lets you look like an organized thinker even if you are like me. I need to wear all my keys on my necklace or I would lose them. I selected the car I drive partially on the basis that it’s physically impossible to lock the keys in the ignition or to leave the lights on. But while I need help from Tokyo just to make my way to work, I like to think that if you look at my code it’s transparent.

    If you are really smart, like SteveM, no syntax can hold you back. But for the rest of us, clarity matters. R is great. But you also need something for the datasalad R is no good with. Python is less well developed wrt stat functions but is the best thing ever for datasalad.

  28. sione
    Posted Nov 3, 2006 at 3:59 AM | Permalink

    Steve,

    I have extended JAMA (JAva Matrix Algebra) package which is freely available from the link below, to be more Matlab-like. It means that it is basically writing Matlab codes but in Java.

    “JAMA”
    http://math.nist.gov/javanumerics/jama/

    If you want a copy, I can send you one.

    Here is a piece of code by UC shown above:

    —————- UC Code Snippets —————
    X=randn(79,415);

    out=X*((X'*X)\X');
    ————————————————-

    —————- Sione Code Snippets Using JAMA—————
    Matrix X = JDatafun.randn(79,415);

    Matrix out = X.times( X.transpose().times(X).solve( X.transpose() ) );
    ————————————————————–

  29. Nicholas
    Posted Nov 3, 2006 at 4:25 AM | Permalink

    Rafe: A mathematician friend of mine who writes a lot of software to be run on large clusters swears by C, and uses some high-performance C mathematical libraries (for stuff like matrix operations, fourier transforms, etc.) I would ask him what library he likes best and start with that. That should take care of providing me with well-optimized high level functions.

    As for actually performing mathematical optimizations by refactoring formulae, that is something I suppose I would be bad at because I am not a trained mathematician as such. But most of the code I see on this blog is pretty straightforward matrix ops. I’m sure a good matrix library would deal with them quite well.

    As I said in general I would use R for this type of thing, unless I was writing, say, a climate model which processed vast amounts of data. Then I would use something low level like C in order to be able to get close to optimal low-level code to reduce run-time. But if anyone is going to write mathematical code in C or any other language not specifically designed for it, chances are you are better off doing a little research and finding the premier mathematical library(ies) for the language you have chosen. There’s no point re-inventing the wheel, especially since the prewritten libraries are likely to be more heavily optimized than anything you’re likely to come up with yourself.

  30. Posted Nov 3, 2006 at 4:34 AM | Permalink

    Firstly I’d like to apologise for an error in our manuscript: On page 19, line 6, the manuscript states: “The code used by MM2005 is not, at the time of writing, available”: This comment should have referred to the code used by MM2005c (their Energy and Environment paper) — the code used by MM2005 (their GRL paper) was made available at time of publication.

    We apologise to McIntyre and McKitrick for suggesting that the code for MM2005 was unavailable when it was in fact the code for MM2005c which we had not been able to obtain, and note that they have made efforts to provide code for this and other publications which go beyond the norm in this field.

    Further details are on the Climate of the Past Discussion site, where anyone is welcome to post a comment.

    Stephen McIntyre has now made the code for their Energy and Environment paper available to us (though his http://www.climate2003.com site still declares that it is unavailable). Interestingly, given the strong emphasis of McIntyre and McKitrick’s 2005 Energy and Environment paper on the importance of centering data for principal component calculations this step appears to have been omitted for figure 2 of their paper, which displays a principal component of uncentred data. Further details are available at mitrie.badc.rl.ac.uk.

    Concerning the code, this is provided in the spirit of transparency. I fully agree with the comment above that python should not be judged on the basis of this code.

  31. Nicholas
    Posted Nov 3, 2006 at 5:20 AM | Permalink

    Martin: I’m a bit confused. I’m looking at Figure 2 in M&M’s 2005 Energy & Environment paper. The caption says:

    Figure 2. North American AD1400 PC1. Above: PC1 using centered calculations; below: MBH98 PC1 using decentered method. Solid – base case; dashed – with arbitrary addition of 0.5 to non-Graybill sites from 1400-1450. 25-year smoothing is applied.

    This seems, to me, to be explicitly indicating that the graph is of four sets of data. Two of them are the PC1 using centered calculation method (with & without arbitrary adjustment) and the other two are the PC1 using decentered calculation method (with & without arbitrary adjustment).

    Without reading the rest of the paper, it seems clear to me the figure is attempting to display the difference that decentering/centering makes to the results of the method. Why is it strange that they are presenting a graph of decentered data, when they are using it as a basis of comparison? It seems entirely straight-forward to me. Otherwise I don’t know how they could illustrate the impact of the decentering.

  32. Willis Eschenbach
    Posted Nov 3, 2006 at 5:58 AM | Permalink

    Martin, thank you for your letter. Would you like to also put some comments in your paper on the unavailability of the codes and data for some of your co-authors on your paper? Because if you don’t, I don’t see the point of commenting on code availability at all …

    In particular, could you respond to, or even better, offer some help with, Steve McIntyre’s problems with getting the following information? Steve M. says:

    Esper never responded to any requests for data. It practically took a federal case with Science to get any data from him and it is still incomplete. Esper failed to provide any coherent explanation of his methodology, let alone source code. Briffa has never provided source code. Briffa has even failed to identify the sites in Briffa et al 2001 (which have been used in 6 other studies). At this point, despite all the publicity, Mann has still not archived the actual results for his AD1400 step through to 1980 and the Euro Team didn’t collect it.

    Steve M. has always provided data and code, both in advance and on request, and has always been totally open about his methods and data. I find your singling out of Steve M. for your comments, while totally ignoring the well-known problems with your co-authors Briffa, Esper, and Moberg, to be a very troubling and disconcerting occurrence.

    Your comments greatly appreciated …

    w.

    PS – Everything in science requires a citation from some reputable Journal, so regarding your preferred course of action in this matter, let me offer a reference to the following Journal article. The Abstract says:

    … “And why beholdest thou the mote that is in thy brother’s eye, but considerest not the beam that is in thine own eye? Or how canst thou say to thy brother, Let me pull out the mote out of thine eye; and, behold, a beam is in thine own eye? Thou hypocrite, first cast out the beam out of thine own eye; and then shalt thou see clearly to cast out the mote out of thy brother’s eye.”

    “The Sermon On The Mount”, Christ, J. et al., Journal of Applied Theology, 30 CE

  33. Posted Nov 3, 2006 at 7:20 AM | Permalink

    Response to comment 32. The lower two curves in the plot are calculated using the MBH method, centred on the calibration period, which is described in the caption as “decentred”, i.e. not centred on the complete data record. The upper two curves, which are described in the caption as “Above: PC1 using centred calculations”, are calculated without centering of any kind. The discussion on page 77 of the paper makes it clear that the authors intend to show the contrast between the MBH PC and a centred PC, but what is plotted in the figure is a comparison between the MBH PC and a PC calculated with no centering.
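
    In R terms, the three variants are (an illustrative sketch, not the code from either paper; "tree" is a hypothetical years-by-sites matrix and "calib" marks the calibration years):

    pc.uncentred <- svd(tree)$u[, 1]                    # no centring at all
    pc.centred   <- prcomp(tree, center = TRUE)$x[, 1]  # centred on full record
    tree.decen   <- sweep(tree, 2, colMeans(tree[calib, ]))
    pc.decentred <- svd(tree.decen)$u[, 1]              # MBH-style decentring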

    Response to comment 33: I have never advocated that code should always be disclosed. For the record, I asked for the code for the McIntyre and McKitrick 2005 paper in December 2005, and it was not provided. I checked Stephen McIntyre’s website prior to submission and found a line clearly saying that the code was not yet available. It has now been provided and, as I acknowledge, this goes beyond what is normal in the field.

  34. Posted Nov 3, 2006 at 7:27 AM | Permalink

    I have never advocated that code should always be disclosed.

    Why not? Is it the threat of terrorism?

    For the record, I asked for the code for the McIntyre and McKitrick 2005 paper in December 2005, and it was not provided.

    That’s a matter of record that Steve can clarify.

    I checked Stephen McIntyre’s website prior to submission and found a line clearly saying that the code was not yet available.

    If you’re referring to http://www.climate2003.com, then it hadn’t been updated in months. That’s no excuse to make the false claim, six months later, that the software code was unavailable, because that is simply untrue – and you know it.

    It has now been provided and, as I acknowledge, this goes beyond what is normal in the field.

    That’s because what is normal in the field is a disgrace to science and can fairly be described as rampant misconduct.

  35. Steve McIntyre
    Posted Nov 3, 2006 at 7:35 AM | Permalink

    #30. Martin Juckes draws our attention today to the following comment which he has just published at CPD:

    On page 19, line 6, the manuscript states: "The code used by MM2005 is not, at the time of writing, available": This comment should have referred to the code used by MM2005c (their Energy and Environment paper) — the code used by MM2005 (their GRL paper) was made available at time of publication. In view of these considerations the following modification to the text is needed: on page 19, line 6-9: replace "The code …. (standardised)" with: "The code used by MM2005c (available at http://www.climate2003.com/scripts/) shows that they use PCs which have been (except in figure 2) centred but not normalised to unit variance (standardised). Figure 2 of MM2005c is an exception: it shows a graph with non-zero mean which cannot be the principal component of centred data."

    We apologise to McIntyre and McKitrick for suggesting that the code for MM2005 was unavailable when it was in fact the code for MM2005c which we had not been able to obtain, and note that they have made efforts to provide code for this and other publications which go beyond the norm in this field.

    The code used by MM2005c is now (mostly since March 2006, updated October 2006 following correspondence about missing components) available at http://www.climate2003.com/scripts/ (McIntyre, personal communication), as stated in our corrected text above. At the present time this page is not linked to from the main site (www.climate2003.com), which contains the statement that “The computer script used to generate the figures and statistics in the E&E here will be located here [in a couple of days]”. Our intended comment (that the code used by MM2005c was not available) was based on the above statement and on an email from Stephen McIntyre to us saying that he would forward the code when it became available. He first informed us of its availability after the publication of our manuscript. A detailed discussion of the provided code can be found on the MITRIE project website (http://mitrie.badc.rl.ac.uk), in addition to some comments on the issue of disclosing software.

    I will revert on a number of points here more fully. I wish to note for the record that the stale index webpage at http://www.climate2003.com was updated yesterday (November 2, 2006) and that a direct link to the Supplementary Information for MM05 was placed on the index page. This change was made promptly after the matter was brought to my attention. The date of the change is recorded at the website and, if Juckes wishes confirmation, I will provide an access password to coauthor Zorita to confirm this. I presume that Juckes either did not double-check the webpage when he submitted his correction or that he inadvertently used a cached version.

    Second, on the issue of EE Figure 2, when the matter was brought to my attention, I edited the backup script on Oct 31, 2006, including a commentary and a script to produce a re-stated version of Figure 2 using prcomp. I posted this re-stated version at climateaudit on Nov 2 here. Juckes discusses Figure 2, but failed to discuss the comments as of Oct 31, 2006.

    Third, the backup script originally archived in March 2006 includes a case using correlation PCs (which were reported in MM05b). Juckes’ correction stated: "…shows that they use PCs which have been (except in figure 2) centred but not normalised to unit variance (standardised)." Correlation PCs obviously yield identical results to PCs in which the chronologies are divided by their standard deviation – a point which Juckes, like Wahl and Ammann, fails to acknowledge. Juckes failed to refer to any of the discussion in Huybers 2005 or Reply to Huybers 2005 about correlation and covariance PCs.

    Juckes said that there were "uncertainties" about what we did, but never asked me any direct questions about his "uncertainties". I presume that he did not consult his coauthors either. During the last year, I have exchanged over 100 emails with 2 of his coauthors. One of his coauthors has posted here frequently. Eduardo Zorita exchanged code to reconcile results. I have met with 2 of his coauthors on 3 different occasions. I spent nearly an entire day at KNMI with Nanne Weber, one of his coauthors. On none of these occasions did the Juckes group express any "uncertainties" about what we did.

  36. welikerocks
    Posted Nov 3, 2006 at 7:39 AM | Permalink

    Question to the consensus: Whatsa matter with you? Why are you behaving this way?

    Answer from the consensus: Amateurs like Steve McIntyre might just find out what’s wrong with our important scientific papers and show everybody! These amateurs must be silenced and discredited in any way, shape, or form! Hrrrumph, waffle waffle waffle…kah kah kah

    LOL

  37. bender
    Posted Nov 3, 2006 at 7:47 AM | Permalink

    Another AGW double-standard for the file:

    Steve M must make all his codes (1) available, and (2) turnkey, and (3) must advise all team members when such codes are available, and (4) must update all stale web pages linking to such material, and (5) must be sure team members aren’t using cached web material; whereas the team is under no obligation to do anything whatsoever.

    Martin, you’re an extremely bright guy. Grow up. Stop being cute.

  38. Steve McIntyre
    Posted Nov 3, 2006 at 8:07 AM | Permalink

    Juckes says:

    Response to comment 33: I have never advocated that code should always be disclosed. For the record, I asked for the code for the McIntyre and McKitrick 2005 paper in December 2005, and it was not provided. I checked Stephen McIntyre’s website prior to submission and found a line clearly saying that the code was not yet available. It has now been provided and, as I acknowledge, this goes beyond what is normal in the field.

    Actually January 2006. The code was archived on March 1, 2006. The line on the old website does not "clearly say that the code was not yet available". It was a very stale webpage, but it said that the code would be archived – not that it was "not available". At the time that Juckes purported to check, it was archived – in the same directory as the MM03 scripts which Juckes appears to have located.

    During this period, I exchanged code with one of Juckes’ coauthors – I would have presumed Juckes consulted with his coauthors, but it appears that he failed to consult Eduardo Zorita about his "uncertainties".

  39. MarkR
    Posted Nov 3, 2006 at 8:10 AM | Permalink

    Zorita has put his name to the paper, presumably if he had reason to disagree with its contents, he wouldn’t have signed off on it.

  40. Posted Nov 3, 2006 at 8:24 AM | Permalink

    Hello Stephen,

    As you are no doubt aware, my update to the Climate of the Past Discussion site was posted yesterday, Nov 2nd, not today.

    Concerning your revised figure 2, would you confirm that you still have a minus sign multiplying the centred PC, and that without this minus sign the effect of the adjustment to the non-BCP is to decrease the PC in the first 50 years, as it does for the MBH PC?

  41. welikerocks
    Posted Nov 3, 2006 at 8:38 AM | Permalink

    re: #40

    Just an observation.

    Why wasn’t this all straightened out before statements were made and submitted in the paper? And look, instant communication right here, right now! Amazing how easy it is, and unfolding right before our eyes too.

    My husband is a published environmental scientist.
    I know the real drill. This is not it.

  42. Steve McIntyre
    Posted Nov 3, 2006 at 8:59 AM | Permalink

    #40. First, if an increase to non-BCP ring widths from 1400-1450 results in a decrease in both PC1s in the period 1400-1450, that hardly helps you, as it simply shows that PC methods are singularly inappropriate, whether Mannian or centered. In this case, the sign was chosen so that the orientation of the post-1450 values was consistent in the two versions. I notice that your “evaluation” of the Mannian method does not evaluate this perverse property of the methodology.

    You stated “at the present time this page is not linked to from the main site (www.climate2003.com)”. Yes, I realize that you posted your “correction” yesterday, but the webpage was amended yesterday.

  43. Steve McIntyre
    Posted Nov 3, 2006 at 9:11 AM | Permalink

    #40. The webpage was updated yesterday. I checked the time as against the time of your submission to CPD and the webpage was updated prior to your submission to CPD. Thus this statement in your “correction” was untrue as well.

  44. bender
    Posted Nov 3, 2006 at 9:12 AM | Permalink

    Let’s (yes, including me) keep the chatter down on this thread while Steve and Martin hash this out.

  45. Posted Nov 3, 2006 at 9:30 AM | Permalink

    The sign of the Principal Component is arbitrary; I only mentioned it because you appear to attach importance to the direction of the perturbation.

    As regards the difference between your centred PC and the Mann et al. PC, would you agree that this is partly due to the fact that you have not normalised the data, so that proxy series with high variance contribute disproportionately to your PC? If the data is normalised (by setting cor=TRUE in the call to princomp) the leading PC does reflect the anomalous growth in the 20th century, though it is still significantly different from the Mann et al. PC. Wouldn’t you agree that using normalised data is appropriate here?

    As far as I can tell from your modified site, it was updated after I posted the comment yesterday, but if I’ve got that wrong, and the check I made on your site yesterday got information from a cache for some reason, please accept my apologies.

    Just a brief comment in response to 41: the paper is on a discussion site, the idea is that the site can act as a forum for discussion (hence the name). This is slightly different from traditional printed publications.

  46. Steve McIntyre
    Posted Nov 3, 2006 at 9:44 AM | Permalink

    #45. As I just pointed out in #43, it was updated prior to your submission. Specifically, the webpage was updated at 14:24 Greenwich Mean Time, while your comment is shown as being made at 17:40 yesterday. If I used the term "at the present time" in a document, I would ensure that the comment was actually true at the time that I made it – something that is easily done and which you failed to do. It’s no good making an "apology" here. Your obligation is to correct your correction.

  47. Posted Nov 3, 2006 at 9:49 AM | Permalink

    So have you contacted the editor of Energy and Environment about the mistake in your published paper?

  48. Armand MacMurray
    Posted Nov 3, 2006 at 12:22 PM | Permalink

    Re:#44,48,49
    Bender makes a good suggestion.

  49. john lichtenstein
    Posted Nov 3, 2006 at 12:42 PM | Permalink

    Martin Juckes, thanks for introducing folks to CDAT.

  50. Steve McIntyre
    Posted Nov 3, 2006 at 4:03 PM | Permalink

    Martin Juckes stated:

    The sign of the Principal Component is arbitrary, I only mentioned it because you appear to attach importance to the direction of the perturbation.

    and

    Concerning your revised figure 2, would you confirm that you still have a minus sign multiplying the centred PC, and that without this minus sign the effect of the adjustment to the non-BCP is to decrease the PC in the first 50 years, as it does for the MBH PC?

    The sign of a PC series, as it comes out of a PC algorithm, is arbitrary. However, in the cases of interest here, the interpretation of a PC1 in which virtually all first eigenvector coefficients have the same sign as a weighted average of ring width site chronologies (i.e. nearly all positive coefficients) is far more natural than an interpretation in which they nearly all have negative coefficients. It’s always a good idea to look at the eigenvector coefficients and I recommend this to you. Note that the MBH99 PC1, which came out as an upside-down hockey stick, had mainly negative first eigenvector coefficients; Mann multiplied it by negative 1 and presented it as an upside-up hockeystick. I have no objection to this presentation as the underlying bristlecone chronologies have upside-up hockeysticks.

    In the cases of interest here, nearly all the weighting coefficients of the PC1 have the same sign and the most natural physical interpretation of the PC1 is as a weighted average with nearly all positive coefficients.

    In Figure 2, the first eigenvector coefficients were mostly positive in the AD1400 Mannian PC step. However, when non-BCP ring widths were increased in the early 15th century, this had the effect of flipping over the entire hockeystick, i.e. it assigned negative first eigenvector coefficients to nearly all the series. For interpretation purposes, there is no particular meaning to all negative coefficients and I multiplied the Mannian PC1 with the 15th century increment by negative 1 to yield the series in MM05 (EE) Figure 2.

    Using prcomp on the network, i.e. centered, the first eigenvector coefficients were again nearly all positive. When non-BCP ring widths were increased in the early 15th century, once again, this had the effect of flipping the entire series over, yielding mainly negative first eigenvector coefficients. Again, as with the Mannian case, I multiplied the PC1 with the 15th century increment by negative 1 to yield the series in re-stated Figure 2.

    The critical difference in the two situations is that after all the PCs are consistently oriented (such that first eigenvector coefficients are nearly all positive), the 15th century increment yields an increase with a centered prcomp PC1 and a decrease with the Mannian PC1. We expressed this conclusion in our EE 2005 article as follows:

    For a centered calculation, the increased ring widths for the first 50 years lead to an increase in the PC1 as expected. However, under the MBH98 de-centered method the increased ring widths for 50 non-Graybill sites in the early 15th century causes a significant decrease (!) in the early 15th century PC1. Carried forward through to Northern Hemisphere temperature calculations, these increased ring widths would be construed by the MBH98 method as evidence of colder temperatures in the early 15th century.

    This statement is correct. Since you have raised the matter and alleged an error, do you seriously deny the correctness of the above statement?
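
    In R, the re-orientation described above amounts to something like this (a sketch only, with "tree" a hypothetical site-chronology matrix):

    # orient the PC1 so that its eigenvector coefficients are
    # predominantly positive before comparing versions
    pca  <- prcomp(tree)
    flip <- sign(sum(pca$rotation[, 1]))  # -1 if loadings are mostly negative
    pc1  <- flip * pca$x[, 1]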

  51. Jean S
    Posted Nov 3, 2006 at 4:06 PM | Permalink

    re #45: Although Bender’s suggestion (#44) is good, I just can not keep my hands off from this:

    If the data is normalised (by setting cor=TRUE in the call to princomp) the leading PC does reflect the anomalous growth in the 20th century, though it is still significantly different from the Mann et al. PC. Wouldn’t you agree that using normalised data is appropriate here?

    No, Martin, no. It is inappropriate. Since the series are already in standard units, it is normal practice to use the covariance matrix for PCA. If you use the correlation matrix instead in this case, the burden of justification is on you. I could not find any such justification in your paper, only an astonishing claim about “another apparent error by McIntyre and McKitrick”. IMO, the only thing apparent here is that you are not familiar with PCA.

    BTW, M&M have already discussed the issue in their reply to Huybers. That comment and other related things are available from this site by clicking “multiproxy pdfs” on the right panel. I suggest you take a look at those publications before reiterating already-refuted claims.
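
    A quick sketch of the point (hypothetical matrix, illustration only): once the series are scaled to unit variance over the analysis period, covariance-matrix and correlation-matrix PCs coincide, so cor=TRUE buys nothing except on unequal-variance data.

    Z  <- scale(tree)                           # series already in standard units
    p1 <- princomp(Z, cor = FALSE)$scores[, 1]  # covariance PCA
    p2 <- princomp(Z, cor = TRUE)$scores[, 1]   # correlation PCA
    abs(cor(p1, p2))                            # = 1: identical up to sign/scale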

  52. Willis Eschenbach
    Posted Nov 4, 2006 at 2:43 AM | Permalink

    I have commented on this and another question at the "Climate of the Past" discussion site. Here is the part about proxy selection:

    Next, transparency in regards to proxy selection. For any multi-proxy reconstruction to have statistical weight, the proxies must be selected according to a priori rules. Otherwise, as some authors seem to have done, proxies are simply selected because they agree with the authors’ point of view.

    What were your criteria for proxy selection? I do not find any explanation of this in your paper. You are using 10 of the 18 proxies used by Moberg … why did you not use the other 8?

    For example, I note that you have included the Arabian Sea: Globigerina bulloides proxy, but not the Sargasso Sea dO18 proxy. And that you have included the Yamal proxy, but not the Indigirka, Siberia proxy. Why?

    Replacing just these two proxies out of the 18 selected gives a vastly different result, negating your claim of "robustness". (See http://www.climateaudit.org/?p=887 for a graph and discussion of the difference.) Your test for "robustness" consists of deleting one proxy at a time to see if it makes a difference. As this example shows, your test is not comprehensive enough for the job.

    In short, because you have not specified your proxy selection criteria, and because the results are quite sensitive to proxy selection, your paper is not scientifically robust as it stands.

    Any comments greatly appreciated,

    w.

  53. Willis Eschenbach
    Posted Nov 4, 2006 at 2:48 AM | Permalink

    Sorry, left off the address of the “Climate of the Past” discussion site.

    w.

  54. Steve McIntyre
    Posted Nov 4, 2006 at 8:00 AM | Permalink

    Willis, I saw your comment. For the record, Moberg’s SI was sufficient to identify data versions for 16 of 18 series. He refused (said he was not authorized) to provide either the Indigirka data (which is non-HS) or the Lauritzen data. This time, I filed a complaint with Nature (not Science) immediately and they required him to file a Corrigendum and provide both series, which he did. The inconsistency between the Lauritzen published version, with cold point dated to the mid 19th century, and the Nature version, with the same cold point supposedly dated to the 1930s, remains unexplained.

  55. Posted Nov 6, 2006 at 4:13 AM | Permalink

    I note that Stephen is tiring of this discussion and starting a new thread on http://www.climateaudit.org/?p=893#more-893.

    To clarify, the “error” I referred to in McIntyre and McKitrick (2005, Energy and Environment) was plotting an obviously un-centred PC and describing it as centred.

    Concerning the normalisation of data (#51 above) (or, in other words, the use of the correlation matrix instead of the covariance): it is true, as noted above, that the tree-ring series have already been normalised, but the normalisation is different for each series. Sounds weird, but it is a logical consequence of the fact that the different series span different periods. But it still sounds weird, so I think calling the input data normalised is a little misleading, though not strictly false. Normalising the data actually used (i.e. the selected time period) is standard practice where the normalisation is arbitrary or, as here, affected by factors outside the study period. Normalising the data on a period other than the study period sounds a little like what Mann et al. 1998 did, i.e. centering the data, but not on the study period.

    However, the main point regarding McIntyre and McKitrick (2005, Energy and Environment) is that the claim that the difference between the new reconstruction published there and the Mann et al. (1998) reconstruction is primarily due to differences in the centering of the data is false. Most of the difference results from the difference in normalisation.
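
    (Illustratively, with a hypothetical series x and index set study, not code from any of the papers, the two normalisations are:)

    x.full  <- (x - mean(x)) / sd(x)                # normalised on the full record
    x.study <- (x - mean(x[study])) / sd(x[study])  # normalised on the study period only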

    Concerning our choice of proxies, page 1021, line 15 of our manuscript says: “Finally there is a `Union’ composite made using 18 independent northern hemisphere proxy series marked with “*” in table 1. These series have been chosen on the basis that the extend to 1980 (the HCA composites and the French tree ring series end earlier), the southern hemisphere series have been omitted apart from the Quelcaya glacier data, Peru, which are included ensure adequate representation of tropical temperatures. The MBH1999 North American PCs have been omitted in favour of individual series used in other studies. Finally the Polar Urals data of ECS2002, MBH1999 and the Tornetraesk data of MSH2005 have been omitted in favour of data from the same sites used by JBB1998 and ECS2002 respectively.”

  56. Willis Eschenbach
    Posted Nov 6, 2006 at 5:38 AM | Permalink

    Martin, your response is much appreciated. However, what are needed are a priori rules for proxy selection which are actually followed. You say:

    Concerning our choice of proxies, page 1021, line 15 of our manuscript says: “Finally there is a `Union’ composite made using 18 independent northern hemisphere proxy series marked with “*” in table 1. These series have been chosen on the basis that the[y] extend to 1980 (the HCA composites and the French tree ring series end earlier), the southern hemisphere series have been omitted apart from the Quelcaya glacier data, Peru, which are included ensure adequate representation of tropical temperatures. The MBH1999 North American PCs have been omitted in favour of individual series used in other studies. Finally the Polar Urals data of ECS2002, MBH1999 and the Tornetraesk data of MSH2005 have been omitted in favour of data from the same sites used by JBB1998 and ECS2002 respectively.”

    While two of these are a priori proxy selection rules (must extend back to the year 1000, must go forward to 1980), the rest are just descriptions of proxy choices without giving any reason for the choice. Why the exclusion of later Tornetraesk data in favor of earlier? Why use the Tornetraesk data from ECS2002, but reject the Polar Urals data from the same source?

    Finally, you have not explained the omission of the Indigirka series. Since this runs from the year 0 to 1996, it clearly fits your criteria. The Yamal series is used … why not the Indigirka? If those are your proxy selection rules as you state, you have not followed them in the Indigirka case.

    In short, it is apparent that you have selected proxies on an “ad hoc” basis, and thus your study has no value. Let me pick the proxies as I wish, as you have done, and I will use your methodology to give you a very different result. For your study to be significant, you need to:

    1) Establish clear a priori proxy selection criteria, and

    2) Provide a rationale for the selection criteria, and

    3) Include all of the proxies that meet the criteria.

    You have not done this.

    w.

    PS – I know you don’t want to answer this, because I see you skate around it every time it comes up, so I will ask again. Why is Steve M. singled out in your paper by the false claim that he has withheld information, when your co-authors Briffa, Moberg, and Esper have withheld details of their data and methodology despite repeated requests? Why are they not mentioned in your paper?

    I don’t like to push you on this, but I have asked it several times, including on the CPD discussion forum, without any reply. Your unwillingness to answer this question is … interesting.

  57. Posted Nov 6, 2006 at 7:27 AM | Permalink

    Dear Willis,

    your question about why I singled out Stephen for mention of non-provision of data (I was doing no more than echo what he said on his own web site at the time) has been answered elsewhere on this wiki. I guess the proliferation of different pages is not helping the transparency of the discussion. Anyway, the answer is simply that Stephen has made many assertions to the effect that software should be made available on publication. The fact that he didn’t follow this procedure for his 2005 Energy and Environment paper is … interesting.

    The answer to your question about selection is also elsewhere on this blog, in a contribution from Stephen — so I think the information is available in the manuscript, but will endeavour to make it clearer on revision: where several versions exist, the earliest version is used. The reason for this (not given in the manuscript) is to try to avoid any suggestion of being influenced, in selection/rejection of data, by the results of the earlier publications. For the Indigirka — see Moberg et al. (corrigendum).

  58. Steve McIntyre
    Posted Nov 6, 2006 at 8:39 AM | Permalink

    #57. Martin has evaded answering about Sargasso Sea, which goes back to 1000, as pointed out elsewhere.

    The Moberg Corrigendum, which resulted from my inquiry and is familiar to readers of this blog, states as follows:

    The authorship of this Letter is amended to include Stein-Erik Lauritzen. Details of the Søylegrotta Cave record (series 8), which should have been accredited to S.-E.L., were not supplied in the paper but are available from the corresponding author (A.M., anders.moberg@natgeo.su.se) on request. In addition, the tree-ring-width data from the Indigirka river region (series G) were inadvertently used without the proper permissions: although the series has been discussed in the literature [1], they are unpublished data that have not been made publicly available; they may, however, be obtained through A.M.

    1. Sidorova, O. V., Naurzbaev, M. M. Response of Larix cajanderi to climatic changes at the Upper Timberline and in the Indigirka River Valley [in Russian]. Lesovedenie 2, 73–75 (2002)

    The data is available from one of the MITRIE coauthors; indeed, he’s provided it to me. Sidorova et al. stated:

    Current warming started at the beginning of the XIX-th century and presently does not exceed the amplitude of the medieval warming. The tree ring chronologies do not indicate unusually abrupt temperature rise during the last century

    As to selection criteria – and Juckes has provided no selection criteria so far to justify excluding Sargasso Sea and Indigirka – MITRIE coauthor Esper stated:

    this does not mean that one could not improve a chronology by reducing the number of series used if the purpose of removing samples is to enhance a desired signal. The ability to pick and choose which samples to use is an advantage unique to dendroclimatology.

  59. Francois Ouellette
    Posted Nov 6, 2006 at 8:44 AM | Permalink

    #57

    Dr. Juckes, you state:

    Anyway, the answer is simply that Stephen has made many assertions to the effect that software should be made available on publication. The fact that he didn’t follow this procedure for his 2005 Energy and Environment paper is … interesting.

    So if I understand correctly, this personal attack on Steve McIntyre in a peer reviewed publication is a little revenge, because he dared give openness lessons to the “established” scientists. How civilized and polite!

  60. Gerald Machnee
    Posted Nov 6, 2006 at 9:01 AM | Permalink

    Re # 57 and #59 – What is not stated is that Steve M will provide the information when requested while certain others will not provide it. THAT IS THE SIGNIFICANT PART!

  61. Steve Sadlov
    Posted Nov 6, 2006 at 10:25 AM | Permalink

    RE: #57 – So in other words, you allowed an ad hominem consideration to influence your selection of whom to single out. It was particularly ad hominem since you resorted to a hair-splitting argument to say “he hath not the data”, when of course his own archiving and transparency far exceed most of what we see within this discipline. You wanted to make an example of the one who, as I imagine you feel, opened this can of worms in the first place. It was pure revenge.

  62. Martin Juckes
    Posted Nov 6, 2006 at 3:47 PM | Permalink

    Folks, if saying the software was not available is such a crime, why did Stephen say just that on his own website?

    While we are on the issue, I’d appreciate a copy of c:/climate/data/mann/proxy.tab as used in the 2005 Energy and Environment paper. [this is a request for data, repeating an earlier request, not an accusation]

  63. Steve McIntyre
    Posted Nov 6, 2006 at 3:55 PM | Permalink

    First, where did I say that the software “was not available”? Any climate scientist who thinks that climate2003.com is my primary website, rather than climateaudit.org, is living on another planet. The index page at climate2003.com was obviously stale: it said in January 2005 that the code would be available in a few days. It never said that the code was “not available”. I exchanged code with one of your coauthors.

    The table proxy.tab is the same as proxy.MBH, which is loaded by the script. I am repeating an answer already given.

  64. per
    Posted Nov 6, 2006 at 4:13 PM | Permalink

    …where several versions exist, the earliest version is used. The reason for this (not given in the manuscript) is to try to avoid any suggestion of being influenced, in the selection or rejection of data, by the results of the earlier publications.

    Dear Martin,
    I am sure you will appreciate that I think your rationalisation (not given in the manuscript) is “… interesting”. I fail to see how your procedure prevents you from cherry-picking series that give a desired result. In fact, quite the opposite.

    yours confused
    per

  65. Willis Eschenbach
    Posted Nov 6, 2006 at 4:13 PM | Permalink

    Re #57,

    Dear Martin,

    This is astounding. You say:

    Dear Willis,

    your question about why I singled out Stephen for mention of non-provision of data (I was doing no more than echo what he said on his own web site at the time) has been answered elsewhere on this wiki. I guess the proliferation of different pages is not helping the transparency of the discussion. Anyway, the answer is simply that Stephen has made many assertions to the effect that software should be made available on publication.

    First, let’s get one thing straight. Non-provision of data means you don’t provide it when you are asked. Steve is not guilty of that; you just couldn’t find it. Esper, Briffa, and Moberg are guilty of that. But that’s not the astounding part.

    The astounding part is this. You are saying that you singled out Steve in your paper because he has always stood for openness and transparency, and failed to send you a notification that his data had been placed where he said he would place it. This you see as hypocrisy.

    The clear implication, of course, is that since Esper, Briffa, and Moberg have always stood for obfuscation, concealment, and trickery, and since they continue to practice that, there’s no need to mention it in your paper. There’s no hypocrisy there at all; they’re just doing what they always did.

    Say what? So if Esper, Briffa, and Moberg actually release their data, will you bust them for being hypocritical and not following their code of dishonor?

    Dr. Juckes, the data was there all along. Sure, Steve forgot to mention that it was there … but then, you forgot to look, didn’t you.

    Also, please quit saying you’ve answered questions when you haven’t. I said:

    Finally, you have not explained the omission of the Indigirka series.

    You said:

    The answer to your question about selection is also elsewhere on this blog, in a contribution from Stephen …

    NO. IT. IS. NOT. ANSWERED. ELSEWHERE. PLEASE. ANSWER. THE. QUESTION. ABOUT. INDIGIRKA.

    I really hate to be petty about this, but handwaving and saying “I answered that elsewhere” WHEN YOU HAVEN’T just doesn’t cut it.

    What about Indigirka? What about Sargasso? Both of them fit your criteria, and you didn’t use them. Why not?

    w.

    (crossposted to Juckes Omnibus)

  66. Martin Juckes
    Posted Nov 6, 2006 at 4:31 PM | Permalink

    Stephen, the script you have provided attempts to load c:/climate/data/mann/proxy.tab. This looks like a file to me (it is not loaded directly in ee2005.backup.txt, but in MBH98config4.txt, which is itself loaded by ee2005.backup.txt).

  67. Steve McIntyre
    Posted Nov 6, 2006 at 4:40 PM | Permalink

    These are Mann’s proxies. The table is at http://www.climateaudit.org/data/ee/proxy.MBH.txt
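
    For anyone else having trouble reading it, a minimal python sketch for parsing the table, assuming a plain whitespace-delimited text file with one header row (the actual layout of proxy.MBH.txt may differ):

        # Sketch only: assumes a local copy of proxy.MBH.txt saved as
        # whitespace-delimited text with a single header row; adjust
        # the parsing if the actual layout differs.
        with open('proxy.MBH.txt') as f:
            lines = f.read().splitlines()

        header = lines[0].split()
        rows = [line.split() for line in lines[1:] if line.strip()]
        print(header)
        print(len(rows), 'rows')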

  68. K Goodman
    Posted Nov 6, 2006 at 4:47 PM | Permalink

    re #67 – The link doesn’t work for me.

  69. bender
    Posted Nov 6, 2006 at 5:15 PM | Permalink

    Re #68 It works for me.

  70. Dave Dardinger
    Posted Nov 6, 2006 at 5:17 PM | Permalink

    re: #68 It did for me, so either it’s been fixed or the problem is on your end.

  71. Hans Erren
    Posted Nov 6, 2006 at 5:34 PM | Permalink

    …where several versions exist, the earliest version is used. The reason for this (not given in the manuscript) is to try to avoid any suggestion of being influenced, in the selection or rejection of data, by the results of the earlier publications.

    Martin,

    Could you clarify your choice for the Torneträsk data? The “Tornetraesk” series plots in the Baltic Sea; see the map at
    http://home.casema.nl/errenwijlens/co2/swedenmap.htm

    To my knowledge, Schweingruber refers to the only real Torneträsk location, so why does the series plot in the sea?

  72. sione
    Posted Nov 7, 2006 at 6:45 AM | Permalink

    Has anyone tried any dimensionality reduction methods other than PCA? Here are some newer non-linear dimensionality reduction techniques available on the internet, mostly as Matlab code (a minimal sketch of the linear PCA baseline they generalize follows the list).

    #1) LLE (Locally Linear Embedding)
    http://www.cs.toronto.edu/~roweis/lle/

    #2) IsoMap
    http://isomap.stanford.edu/

    #3) Principal Curve Analysis
    (R implementation)
    http://rss.acs.unt.edu/Rdoc/library/pcurve/html/00Index.html

    (Java Implementation)
    http://www.iro.umontreal.ca/~kegl/research/pcurves/implementations/index.html

    #4) KICA (Kernel Independent Component Analysis)
    http://www.cs.berkeley.edu/~fbach/kernel-ica/

    #5) KPCA (Kernel Principal Component Analysis) – Statistical Pattern Recognition free toolbox.
    http://cmp.felk.cvut.cz/cmp/software/stprtool/
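
    For reference, here is a minimal python sketch of the linear PCA baseline that all of the above generalize, computed via the singular value decomposition; the data are random and purely illustrative, and this is not code from any of the toolboxes listed:

        # Generic PCA-by-SVD sketch on made-up data.
        import numpy

        numpy.random.seed(0)
        X = numpy.random.randn(100, 5)             # 100 samples, 5 variables
        Xc = X - X.mean(axis=0)                    # centre each column
        U, s, Vt = numpy.linalg.svd(Xc, full_matrices=False)
        scores = Xc.dot(Vt[:2].T)                  # project onto first 2 PCs
        explained = (s ** 2) / (s ** 2).sum()      # variance fraction per PC
        print(scores.shape, explained[:2])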

  73. Posted Nov 7, 2006 at 9:18 AM | Permalink

    #71: Hans, looks like a typing error (by me) in entering the Tornetraesk position; it should be 68N, 20E, which will hopefully keep it out of the Baltic. Thanks for pointing that out.

    #67-70: I could download the file, but not read it with R. I’ve asked Stephen for clarification on page 894.

    #72: Not that I know of. Most people in this field don’t even use Principal Component Analysis.