Gergis “Significance”

Jean S observed in comments to another thread that he was unable to replicate the claimed “significant” correlation for many, if not, most of the 27 Gergis “significant” proxies. See his comments here  here

Jean S had observed:

Steve, Roman, or somebody 😉 , what am I doing wrong here? I tried to check the screening correlations of Gergis et al, and I’m getting such low values for a few proxies that there is no way that those can pass any test. I understood from the text that they used correlation on period 1921-1990 after detrending (both the instrumental and proxies), and that the instrumental was the actual target series (and not the against individual grid series). Simple R-code and data here.

I’ve re-checked his results from scratch and can confirm them.

The following graphic shows t-values for a regression of each detrended “passing” proxy against the target instrumental series, all downloaded from original data. ( I got exactly the same correlations as Jean S.) These calculations calculate significance in the style of Santer et al 2008. (The AR1 coefficient of residuals of proxy~instrumental is calculated and the number of degrees of freedom adjusted. A two-sided 5% significance test (again in Santer style) has a t-value benchmark of about +-2 for series of the length of the calibration period. (It varies a little, but this is not material for the point here and t-values of +-2 are shown in the graphic.) Only six of the 27 proxies have t-values exceeding +-2.

Figure 1. t-values from regression of detrended proxies on detrended instrumental. AR1 adjusted.

The next figure plots the locations of the screened-out proxies on the Gergis location map. I’ve shown the “not used” proxies by a red plus sign. I’ve also placed a circle around the proxies passing a t-test as above. (In two cases, the relevant t-value was lower than 2 and thus there are eight circles.)

Figure 2. Gergis Location Map annoted to show unused proxies and “significant” proxies.

One of the underlying mysteries of Gergis-style analysis is one seemingly equivalent proxies can be “significant” while another isn’t. Unfortunately, these fundamental issues are never addressed in the “peer reviewed literature”.

Gergis et al 2012 had stated:

For predictor selection, both proxy climate and instrumental data were linearly detrended over the 1921–1990 period to avoid inflating the correlation coefficient due to the presence of the global warming signal present in the observed temperature record. Only records that were significantly ([pre]p<0.05[/pre]) correlated with the detrended instrumental target over the 1921–1990 period were selected for analysis.

Bolded values are significant as determined by a normal distribution white noise p-value, p<0.05.


  1. Steve McIntyre
    Posted Jun 6, 2012 at 12:02 PM | Permalink
    ##Packages Used
    #Metadate : manually from files
    	info=infog[order(infog$noaa),] #in order of dataset
    #Get Gergis data from archive
    	for(j in 1:27) work[,j]=c(unlist(work[,j]))
    	proxy =window(work,1000)
    #instrumental target
    	trim=function(x) window( x,start= min( time(x)[!]), end=max(time(x)[!]) )
    	recon=trim(work[,2]) #reconstruction
    	screen_start <- 1921
    	screen_end  info$qt	
    	instrs= window(Instr,1921,1990)
    	info$direct=round(cor(proxys,instrs),2 )  #matches
    # detrended correlation
    	for ( i in 1:27) proxy_det[,i]=  lm(proxys[,i]~c(time(proxys)))$residuals
    	instr_det= lm(instrs~c(time(instrs)))$residuals
    	info$detrend=round(cor(proxy_det,instr_det),2 )  #matches
    	#info$spear = round(cor(proxy_det,instr_det,method="spearman"),2 ) 
    #adjusted t-value for AR1 autocorrelation in usual style
      t_adj=function(fm) {
            beta= fm$coef[2]
            neff= N * (1-r)/(1+r) ;neff # 102.2358 
            se.obs= sqrt((N-2)/(neff-2))* se.ols; 
    	t_adj= beta/se.obs
    	q= qt(.95,neff)
        for (i in 1:27) {
    	fm= lm(proxy_det[,i]~instr_det)
    	info[i,c("t_adj","qt")]= round(unlist(t_adj(fm)),3)
        info$prob=abs(info$t_adj)> info$qt	
    #  png("gergis_t_values.png",w=640, h=480)
    	mtext(side=3, "Gergis 'Significance'",font=2, line=1.1)
    	mtext(side=3, "t-values AR1-adjusted",line=.2,cex=.9)
    	infop$long[infop$long<0]=  (360+ infop$long[infop$long<0]) 
    	info$long[info$long<0]=  (360+ info$long[info$long<0]) 
    	rasterImage(img, 92,-93.3,206,22) 
    	temp= info$prob
    	points(info$long[temp], info$lat[temp],cex=1.8,col=1,lwd=2)
    	points(infop$long[temp],infop$lat[temp], pch="+",col=2,cex=1.7)
    	legend("bottomright",fill=c("blue","green","yellow","red"), legend=c("coral","tree","ice","Not Used"), inset =c(.02,.08),cex=1.5)
    	title("Screened Gergis Proxies")
    • Steve McIntyre
      Posted Jun 6, 2012 at 12:31 PM | Permalink

      Re: Steve McIntyre (Jun 6 12:02),
      for some reason, this script did not cut-and-paste right on a couple of tries. It may need a little babysitting to run turnkey.

      • Posted Jun 7, 2012 at 12:23 AM | Permalink

        I’ve run Steve’s code with and without detrending, and with and without the Quenouille correction. Without detrending (but with zero mean) or AR1 correction all (exc maybe Madang) proxies doo seem significant. The t-values are:
          #  No AR1  AR1 corrected
          1  6.419  5.117
          2  2.468  2.284
          3  4.081  1.706
          4 -3.602 -2.885
          5  4.035  4.156
          6  5.098  3.929
          7  3.928  1.885
          8  4.267  2.434
          9 -2.963 -1.793
         10  3.143  1.715
         11  5.959  1.927
         12  3.217  1.330
         13 -4.328 -3.335
         14 -2.131 -0.980
         15  3.739  2.377
         16 -5.148 -3.170
         17  3.884  1.558
         18  2.774  2.913
         19 -2.637 -2.131
         20 -2.587 -2.164
         21 -2.357 -2.023
         22 -2.743 -2.022
         23 -3.249 -3.278
         24 -4.403 -2.240
         25 -4.565 -4.219
         26 -1.860 -1.291
         27 -3.874 -3.037

        • Jean S
          Posted Jun 7, 2012 at 2:10 AM | Permalink

          Re: Nick Stokes (Jun 7 00:23),

          Without detrending

          Yeah, sure. Just to remind you (my bold):

          For predictor selection, both proxy climate and instrumental data were linearly detrended over the 1921–1990 period to avoid inflating the correlation coefficient due to the presence of the global warming signal present in the observed temperature record. Only records that were significantly (p<0.05) correlated with the detrended instrumental target over the 1921–1990 period were selected for analysis. This process identified 27 temperature-sensitive predictors for the SONDJF warm season (Figure 1 and Table 1) henceforth referred to as R27.

          Readers may also want to re-visit the discussions here and here. Jim Bouldin (my bold):

          Rather it is MUCH more likely that, if one does in fact observe just such a relationship, that the rings are in fact, responding to that climate variable of interest. Furthermore, one can test this conclusion even further, by the further test of detrending the composite ring series (not the original detrending of each core to remove the size effect, but a second detrending on the chronology), and then evaluating, statistically, whether there is a relationship between the resulting residuals and similarly produced climate residuals. If the generating process for the ring series was non-climatic, there will be no stat. significant relationship observed therein. If, instead, there *is* a relationship, it will show up. And lo and behold!…this is exactly what Gergis et al did, and exactly what they found.

          I wonder why it seems to me that Jim has now left the building.

        • Posted Jun 7, 2012 at 2:32 AM | Permalink

          Indeed so. I just observe the numerical results.

        • HaroldW
          Posted Jun 7, 2012 at 6:26 AM | Permalink

          Nick –
          Some of the Quenouille-adjusted t-values in your 12:23 post are relatively small. E.g. New Caledonia (#14) at -0.98. Can you provide the p=0.05 thresholds as well, for the statistically-impaired (such as myself)? Gergis apparently used a two-sided test, but a one-sided test would seem to be more appropriate; I’d be interested in both threshold values if it’s not too difficult to produce.

        • Paul Matthews
          Posted Jun 7, 2012 at 6:56 AM | Permalink

          Thanks Nick. There is now a non-technical summary at Bishop Hill for those who find Jean’s remarks a bit cryptic, or who haven’t appreciated the “significance” of this.

        • Posted Jun 7, 2012 at 8:25 AM | Permalink

          Yeah, Another Hockey Stick broken. Personally, though, I’m waiting for the tweet.

        • Posted Jun 7, 2012 at 4:22 PM | Permalink

          “I wonder why it seems to me that Jim has now left the building.”

          Maybe because he has other things to do, like work on the tree ring analysis papers he’s getting ready to submit? Otherwise I would certainly hang on your every word and respond to everything said here…

        • Posted Jun 7, 2012 at 8:24 PM | Permalink

          Here’s the expanded table with 2-sided p-values added. Hope the formatting works.

          t1=t-value no detrend, no AR1 adjustment
          p1=two=sided p-value
          t2=t-value Quenouille AR1 adjustment
          p2=two-sided p-value
                 t1    p1     t2    p2
          1   6.419 0.000  5.117 0.000   Mt Read
          2   2.468 0.016  2.284 0.026     Oroko
          3   4.081 0.000  1.706 0.110 Buckleys C
          4  -3.602 0.001 -2.885 0.006   Palmyra
          5   4.035 0.000  4.156 0.000 Celery Top
          6   5.098 0.000  3.929 0.000     Kauri
          7   3.928 0.000  1.885 0.076 Pink Pine
          8   4.267 0.000  2.434 0.023 Mangawhero
          9  -2.963 0.004 -1.793 0.084   Urewera
          10  3.143 0.002  1.715 0.100 North Isla
          11  5.959 0.000  1.927 0.086  Takapari
          12  3.217 0.002  1.330 0.205 Stewart_Is
          13 -4.328 0.000 -3.335 0.002   Fiji_AB
          14 -2.131 0.037 -0.980 0.342 New_Caledo
          15  3.739 0.000  2.377 0.024 NI_LIBI_Co
          16 -5.148 0.000 -3.170 0.004 Rarotonga_
          17  3.884 0.000  1.558 0.143    Vostok
          18  2.774 0.007  2.913 0.005    Vostok
          19 -2.637 0.010 -2.131 0.038    Fiji_1F
          20 -2.587 0.012 -2.164 0.035      Bali
          21 -2.357 0.021 -2.023 0.048  Abrolhos
          22 -2.743 0.008 -2.022 0.050    Maiana
          23 -3.249 0.002 -3.278 0.002   Bunaken
          24 -4.403 0.000 -2.240 0.037 Rarotonga_
          25 -4.565 0.000 -4.219 0.000  Ningaloo
          26 -1.860 0.067 -1.291 0.205    Madang
          27 -3.874 0.000 -3.037 0.004     Laing

        • HaroldW
          Posted Jun 7, 2012 at 8:58 PM | Permalink

          Thanks very much Nick!
          So with the Quenouille adjustment, without detrending, 8 of the 27 aren’t correlated to the p<0.05 level. Makes one suspect that the selection step is not doing what was intended.
          Makes me even more curious about Gergis' code.

        • Posted Jun 7, 2012 at 10:45 PM | Permalink

          I don’t think they claimed to have adjusted for autocorrelation. Famously, people often don’t.

          My guess is that the un-detrended unadjusted numbers were the ones actually used. Maybe some option for detrending hadn’t been set.

        • Steve McIntyre
          Posted Jun 7, 2012 at 10:52 PM | Permalink

          My guess is that the un-detrended unadjusted numbers were the ones actually used. Maybe some option for detrending hadn’t been set.

          Nick, as opposed to simply speculating, maybe you could get some clarification from the authors. If they’ve misdescribed their methodology on an important point, they should issue a corrigendum.

          In addition, if, as you speculate, they’ve misdeccribed the actual methodology, their actual methodology is classic Screening Fallacy.

      • Posted Jun 7, 2012 at 11:40 PM | Permalink

        Indeed, it is speculative, and I see from your comment elsewhere after reading Neukom and Gergis, that their methodology is different. It’s a Monte Carlo technique, which should yield results in accord with your program, except for a step which they interpolate with loess smoothing to eliminate high frequencies.

        I think that may be important, so I’ll refrain altogether from speculating as to whether their method is correctly applied.

        • Posted Jun 8, 2012 at 12:01 AM | Permalink

          But not ask the authors? Wouldn’t you have much more chance of obtaining a candid answer than Steve? Or does the good of the science in fact not matter that much to you, only the argument on CA?

        • HaroldW
          Posted Jun 8, 2012 at 5:36 AM | Permalink

          Nick –
          I also had noted that comment about Neukom and Gergis 2011. [It’s here, for reference.] Having read that, I tried applying a 10-year loess filter (as described in that paper) to the detrended data, and recomputed correlations — both normal and Spearman (used in N&G 2011). The correlations remain too low, and although I didn’t (couldn’t!) convert to p-values, they clearly were not going to produce a significant relationship in many of the proxies. E.g. for the Oroko proxy, both correlation coefficients had an absolute value below .01.
          So, I don’t think that’s what Gergis 2012 did. As long as we’re speculating, here’s mine — no detrending but using Spearman correlation. I don’t know how to convert Spearman coefficients to p-values, but Wiki says that t= r*sqrt((N-2)/(1-r*r)) is approximately t-distributed, where r is the Spearman correlation coefficient. Applying this formula yields |t|>2 for all 27 selected proxies.

        • Posted Jun 8, 2012 at 6:58 PM | Permalink

          And here’s my speculation: Nick did contact the authors, helping to trigger Karol’s letter to Steve, and Nick was not responsible for the improbable ‘also’. Add that to the data analysis on this thread and CA may have a new friend/hero/honest broker (delete as appropriate). It’s only speculation, mind.

  2. Steve McIntyre
    Posted Jun 6, 2012 at 12:33 PM | Permalink

    Jean S reported the following correlations:

    The detrended correlations I get are:

    Mt.Read Oroko Buckleyâ..s.Chance Palmyra Celery.Top.Pine.East Kauri Pink.Pine.South.Island.composite Mangawhero Urewera
    [1,] 0.3652164 0.04252273 0.2291246 -0.1843481 0.1821315 0.3158175 0.1725652 0.1186184 -0.179043
    North.Island_LIBI_Composite_1 Takapari Stewart_Island_HABI_composite Fiji_AB New_Caledonia NI_LIBI_Composite_2 Rarotonga Vostok_d18O
    [1,] 0.2301791 0.2386844 0.1397027 -0.3171772 0.03182531 0.1423895 -0.3007384 0.307115
    Vostok_Accumulation Fiji_1F Bali Abrolhos Maiana Bunaken Rarotonga.3R Ningaloo Madang Laing
    [1,] 0.2045613 -0.1540278 -0.04817993 -0.08125016 -0.03874459 -0.3797934 -0.1161867 -0.331931 -0.0960036 -0.4123975

    The normal correlations are:

    Mt.Read Oroko Buckleyâ..s.Chance Palmyra Celery.Top.Pine.East Kauri Pink.Pine.South.Island.composite Mangawhero Urewera
    [1,] 0.6142767 0.2867239 0.4435069 -0.4002756 0.4395454 0.5258422 0.4300725 0.4595843 -0.3381076
    North.Island_LIBI_Composite_1 Takapari Stewart_Island_HABI_composite Fiji_AB New_Caledonia NI_LIBI_Composite_2 Rarotonga Vostok_d18O
    [1,] 0.356136 0.5857284 0.3634118 -0.4647431 -0.2502446 0.4129373 -0.5295329 0.4260891
    Vostok_Accumulation Fiji_1F Bali Abrolhos Maiana Bunaken Rarotonga.3R Ningaloo Madang Laing
    [1,] 0.3188423 -0.3046171 -0.2993104 -0.2748438 -0.3156338 -0.3665768 -0.4709854 -0.4843583 -0.2200733 -0.425169

    • Jean S
      Posted Jun 6, 2012 at 1:47 PM | Permalink

      Re: Steve McIntyre (Jun 6 12:33),
      I think it is additionally worth pointing out that no matter what was their true correlation calculation procedure they seem to be using a two-sided test. That leads, for instance, as HaroldW observed, to the situation that “Urewera” (a NZ tree ring width chronology) has passed screening with negative correlation to the target instrumental. All other tree ring series exhibit positive correlation. The situation is even “funnier” in Neukom & Gergis (2011), where, e.g., different composites of the same species (spatial separation) are determined to be “significant” with opposite signs (see ALT composite 2 & 3 w.r.t. NINO3.4 in Table 5). I’m still waiting Jim Bouldin to handwave me an explanation.

      • Steve McIntyre
        Posted Jun 6, 2012 at 3:03 PM | Permalink

        Re: Jean S (Jun 6 13:47),

        Jean S,
        here’s something in Neukom and Gergis 2011 on their “methodology”

        Spearman correlation coefficients between proxy records and all instrumental data were calculated over the 1901–2000 period (or where both series have overlapping data) after linearly detrending both the proxy and instrumental records over the twentieth century. The detrending was applied to remove long-term trends in the data that can inflate apparent statistical significance. Correlations were then recalculated after applying a decadal ‘loess’ filter that removed all frequencies higher than ten years. All 5% significance levels used herein were estimated based on Monte Carlo estimates from 3000 correlation coefficients
        of AR(1) noise time series with the same length and lag-1 autocorrelations as the original (filtered and unfiltered) proxy and instrumental time series.

        🙂 🙂

        Why use a methodology with known properties? What a Team!

        • Espen
          Posted Jun 6, 2012 at 3:15 PM | Permalink

          Do they have any rationale for using non-parametric methods (Spearman correlation) here?

        • Craig Loehle
          Posted Jun 7, 2012 at 9:02 AM | Permalink

          Since the warming of the 20th century is a major part of the temperature signal they are trying to match with the proxies, detrending it out makes no sense.
          Dollars to donuts, when they tried the correlation without the loess filter they got zip.
          The whole procedure lacks a test on known data to see how it performs.

        • RomanM
          Posted Jun 7, 2012 at 10:37 AM | Permalink

          Actually, if linear de-trending is used, it does make some sense.

          The correlation of the two residual series from the regressions of each of proxy and temperature on time is equal to the partial correlation coefficient of proxy and temperature when the effect of the time variable has been removed. However, this means that you are looking at the correlation of two series with possibly different auto-correlation structures that needs to be properly interpreted for determining p-values.

          Here is a short script for calculating simple partial correlations:

          part.cor = function(vmat, cortype = "complete.obs"){
          # Handles either of two situations:
          # When vmat is a vector of three correlations, c(r12,r13,r23), calculates partial corr of vars 1 and 2 given 3
          # OR when vmat is three column matrix, calculates partial corr of columns 1 and 2 given column 3.
          cormat = NULL
          if (!is.null(dim(vmat))) {
          cormat = cor(vmat,use = cortype)
          corrs = c(cormat[2,1],cormat[3,1],cormat[3,2])} else { corrs = vmat}
          parcor = (corrs[1] - corrs[2]*corrs[3]) / sqrt((1 - corrs[2]^2)*(1-corrs[3]^2))
          list(part.cor = parcor, cor.mat = cormat)}

          Steve: very cool.

        • Posted Jun 7, 2012 at 12:15 PM | Permalink

          Wow, from Roman’s link;

          “In fact, the first-order partial correlation is nothing else than a difference between a correlation and the product of the removable correlations divided by the product of the coefficients of alienation of the removable correlations.”

          ‘Coefficient of Alienation’

          Who says statisticians can’t tell a joke?

        • clt510
          Posted Jun 8, 2012 at 12:21 AM | Permalink

          Espen, sorry for the late response, but you can use Spearman instead of Pearson (or I do anyway) when you don’t have a linear relationship, but the relationship remains monotonic. Wiki hits on this topic here. There’s more detail here.

          If the relationship is linear there’s little difference between the two methods.

          I assume they Loess filter to removing high frequency since it’s a standard method for low-pass filtering irregularly sampled data.

          I think I understand why they don’t give enough details to replicate their paper too–the mystery generates interest!


        • Espen
          Posted Jun 8, 2012 at 2:36 AM | Permalink

          clt510, thanks for the helpful response!. I was aware of when and how to use Spearman, although my knowledge of this is getting a bit rusty since it’s been 27 years since I did this kind of analysis. I’m still puzzled, though, because if you want to do principal component analysis and/or regression analysis, as these authors do, in the end you have to assume anyway that your variables are normally distributed and (in the case of regression) that the relationship is linear (or provide transformations so that the variables meet these requirements). I mean, it goes without saying that monotonicity isn’t sufficient if you want to use your analysis to actually estimate numeric values of one variable from the values of other variables 🙂

        • clt510
          Posted Jun 8, 2012 at 9:37 AM | Permalink

          Espen, to You use regression or PCI on non-normal data. I believe the assumption of normal distribution only enters if you use it to estimate the significance of your trend. (When they are estimating significance from the t-test, they are clearly assuming normality.)

          If the errors in the measurement aren’t normal but you can estimate their true distribution, that’s a place where Monte Carlo analysis shines.

      • Espen
        Posted Jun 6, 2012 at 3:11 PM | Permalink

        Upside Down Urewera – the Down Under Tiljander!

        I can’t wait till you guys start looking at the verification of the proxies they actually use, their “ensemble median RE” in Figure 3 is hard to give any meaningful interpretation, and the RE against the “early verification period” looks suspiciously low before 1600. I wouldn’t be surprised if this reconstruction may have some merit post 1600, but that the earlier part has very little value.

        • Robert L
          Posted Jun 7, 2012 at 7:24 AM | Permalink

          Urewera actually means “hot penis” in Maori (it’s a volcanic region with hot pools named for a Maori chief who burned himself rather badly). Maybe his penis isn’t the only thing that has been cooked in this case.

        • Alexej Buergin
          Posted Jun 7, 2012 at 10:44 AM | Permalink

          Espen: Since one of the co-authors is from the university of Bern, he, being down under, is currently upside down.
          Drake: That is some other guy from the university of Bern.

        • Posted Jun 7, 2012 at 2:55 PM | Permalink

          Sorry for the dangling reference Alexej. I was snipped in a painful area. I should have been more circumspect.

    • Paul Matthews
      Posted Jun 6, 2012 at 4:38 PM | Permalink

      So the ‘normal correlations’, which I assume means without de-trending (?) are generally much larger, 0.2 – 0.6, than the de-trended correlations, 0.03 – 0.4. Is the implication that they ‘forgot’ to de-trend, despite what the paper says, and therefore the usual proxy selection “bogus hockey stick effect” does apply?

      What results does Steve’s script give without de-trending?

  3. Posted Jun 6, 2012 at 12:39 PM | Permalink

    These “passing” “close” proxies may seem passing strange
    To get the “right” answers, be loose with your range!
    A graph of the whole 62 might well preface it
    To show how selection here runs at a deficit

    ===|==============/ Keith DeHavelle

    • Posted Jun 6, 2012 at 3:49 PM | Permalink

      Paleo partners in rhyme
      Make series of proxies in time
      They don’t speak too well
      As good poets can tell
      But boy can they do a good mime

      • Ben
        Posted Jun 7, 2012 at 12:18 PM | Permalink

        There once was a lass from down under
        Who relished statistical blunder
        What should make her blush,
        only led to a hush…
        No worries! Pols still won’t defund her

  4. Kenneth Fritsch
    Posted Jun 6, 2012 at 12:43 PM | Permalink

    JeanS, where did you obtain the Gergis proxy series and in the form required for doing correlations? I am in the process of converting these proxy data to proper form from the NOAA repository. Tree ring width data need to be extracted from the tables provided and the monthly/seasonal coral data have to be reduced to SONDJF data. Is the Gergis data as the authors used it available somewhere?

    • Kenneth Fritsch
      Posted Jun 6, 2012 at 12:46 PM | Permalink

      I see a link above in SteveM’s code for the Gergis data. When did these data become available? It is like I have been sleeping and missed an important development here.

      • Jean S
        Posted Jun 6, 2012 at 1:10 PM | Permalink

        Re: Kenneth Fritsch (Jun 6 12:46),
        I already gave the link in the other thread, but I guess you didn’t follow it 😉 According to the time stamp in the NOAA ftp site, data became available on May 24th. As stated a few times, only 27 (out of 62) that “were selected for analysis” (see the actual quote from the post) are available.

        • Steve McIntyre
          Posted Jun 6, 2012 at 1:20 PM | Permalink

          Re: Jean S (Jun 6 13:10),

          I’ve examined the source Neukom and Gergis population to try to locate the “62”. I was able to locate 57. Whether 62 is an error, the box is misreported or I’ve missed something, I don’t know.

          Many of the 57-27=30 are at NOAA – especially the coral series.

          The main gaps re Gergis et al 2012 are dendro chronologies and dendro measurement data. There are many many unarchived series listed in Neukom and Gergis 2011 in other SH areas.

        • Kenneth Fritsch
          Posted Jun 6, 2012 at 3:09 PM | Permalink

          JeanS and SteveM, sorry for my confusion as it was the 62-27 or 35 proxies that were in question between SteveM and Gergis. I did not realize that the 27 proxies were available in Gergis form. I might still confirm that I can get the same final Gergis form data starting with the NOAA repository.

          Keeping score with Mann(08) and Gergis(12): For Mann (08), we have approximately a p=<0.13, a cull rate for selection of 75 % and an average r of 0.22, while for Gergis we have a p=<0.05, a cull rate of 6/62 or about 90% and an average r =0.40 (not detrended) and r=0.20 (detrended) with r for Gergis averaged as absolute values.

          I think we are seeing more of the same old same old. What does surprise me is not so much what these authors are not forthcoming about but what they present (found with some digging I admit) and with which expect the reader to be impressed.

          Interesting for me is that the Gergis proxy correlations got worse on average when the series were detrended, but Gergis used detrended for pre-selection. For a proxy to be a good thermometer I would think that getting trends right would be more important than inter annual correlations.

          I find that the ratio of high frequency amplitude to that from lower frequencies in two series is what determines whether the two series can have a very different trends and yet have good higher frequency correlation. With a high ratio the higher frequency correlation (detrended or not) will not say much about trend differences in the series. In the case of a low ratio the data in both series defines a less noisy trend and any trend difference should be more easily seen in the low frequency correlation provided, of course, that the data series are not detrended.

        • Steve McIntyre
          Posted Jun 6, 2012 at 3:51 PM | Permalink

          Kenneth, detrending was a dispute between Mann and von Storch-Zorita, with the core Team arguing that not deterending was the only “right” way of doing things on that particular occasion. In the dispute with us, Wahl and Ammann argued that “high frequency” correlations were not of any interest in paleo reconstructions.

          It’s for them to keep all their stories straight.

        • Kenneth Fritsch
          Posted Jun 6, 2012 at 4:32 PM | Permalink

          “Kenneth, detrending was a dispute between Mann and von Storch-Zorita, with the core Team arguing that not deterending was the only “right” way of doing things on that particular occasion. In the dispute with us, Wahl and Ammann argued that “high frequency” correlations were not of any interest in paleo reconstructions.”

          Interesting then why Mann and Gergis use high frequency correlations in their pre-selections processes. Does any of this have to do with using r versus RE and CE for reconstruction calibration and verification? And further would the use of RE and CE protect against the problem of two series with excellent high frequency correlations having significantly different trends? Obviously this could depend on the amplitudes of high frequency and low frequency parts of the series. I am thinking that the use of RE, CE and high frequency r would all allow a divergent series to “pass” these test. Time for me to go to R to simulate some series to look at.

          Most of my observations come with working with monthly temperature series and calculating correlations with closest neighbor stations and trends of the difference series.

        • Nicholas
          Posted Jun 6, 2012 at 9:24 PM | Permalink

          So what are the correlations like for the 30 series which were not used? Better or worse than those they did use?

  5. Kenneth Fritsch
    Posted Jun 6, 2012 at 1:03 PM | Permalink

    “Only six of the 27 proxies have t-values exceeding +-2.”

    In Mann(08) the screening that the authors estimated to be for p <=0.13 (AR1 corrected) weeded out all but some 30 odd percent of the 1200 plus proxies. When all the proxies that should not have been used in the screening (not considering that screening is legitimate in the first place) like the instrumental, the altered MXD and Tiljander series that weeding out left something between 20% to 25% as I recall. The average correlation to instrumental temperature of those left after pre-selection in the TRW proxies was 0.20 as I recall.

    Qne cannot forget here that these 27 proxies were screened from a larger population.

  6. RomanM
    Posted Jun 6, 2012 at 1:21 PM | Permalink

    What’s with the Urewera tree ring series?

    It is suffering from a serious case of “decline” and an upside-down U-shaped look to it. It seems to be the only tree series negatively correlated with temperature.

    • Posted Jun 6, 2012 at 2:25 PM | Permalink

      As I proposed over at, “wrong-signed” proxies like Urewera can be handled in a PC framework by first sorting the 62 proxy universe by type, then computing PCs for each type, then splitting each PC into its + and – components, and then sorting the split PCs by the amount of total variance each acounts for.

      Then even if the positive half of the first TR PC, TRPC1+, is picking up temperature with the right sign, Urewera will most likely be in TRPC1- instead, and since it’s exceptional will be way down the list of split TRPCs, and likely not be jointly significant with the split TRPCs that precede it in this list. A reasonable reconstruction would then not include its “wrong way” signal. Even if it were jointly significant, one might still reject it on prior grounds, while retaining TRPC1+.

      • Steve McIntyre
        Posted Jun 6, 2012 at 2:56 PM | Permalink

        if one doesn’t know the sign of a proxy ex ante, then I don’t think one can use it as “proxy” based merely on correlation.

        In the case of the corals, the negative correlations are consistent. But rather than doing a PC calculation and permit arbitrary flipping, I think that the series should be given the correct orientation ex ante and NOT do a procedure which permits negative weights.

        • MikeN
          Posted Jun 6, 2012 at 6:38 PM | Permalink

          I never understood the point of Mann’s 3rd sentence in his reply to McIntyre.
          Screening, when used, employed one-sided tests only when a definite sign could be a priori reasoned on physical grounds.

          I would think this should be done in all cases, as it was with Tiljander. Nevertheless, the implication of this sentence is that it was not used with regards to Tiljander, allowing the bizarre argument to proceed.

        • Geoff Sherrington
          Posted Jun 6, 2012 at 11:28 PM | Permalink

          Correlation is neater when supported by plausible mechanism, shown by prior controlled experiments. This Gergis paper has some assumptions that contradict earlier observations by others. I’m still looking for explanations.

        • Posted Jun 7, 2012 at 1:32 PM | Permalink

          I doubt that Vostok Accumulation has any a priori sign relative to temperature.

          When sorting proxies by type, Vostok d18O and Vostok Accumulation would be different types, even though they are measurements from the same ice core.

          By including Ice Core Accumulation as a proxy type, Gergis et al are unnecessarily increasing the numerator degrees of freedom of any test of the joint significance of their network, even if they reduce each type to a single PC (or a single pair of signed PCs).

    • jitthacker
      Posted Jun 6, 2012 at 3:37 PM | Permalink

      In Xiong and Palmer (2000), Urewera is positively correlated with Mangawhero and Takapari at p<0.01. How then can their correlations with temp be opposite in sign? (I know the correlations are weak, but still.)

      Or have those series been updated?

      • jitthacker
        Posted Jun 6, 2012 at 3:40 PM | Permalink

        By the way, all three series are North Island, NZ, and at similar altitude (800-1000m).

      • Kenneth Fritsch
        Posted Jun 6, 2012 at 4:58 PM | Permalink

        jitthacker, you are perhaps looking at a different and longer time period than the instrumental period of 1920-1990 used in this case. The other two proxies do not correlate positively with Urewera in the 1920-1990 period using the Gergis data. Perhaps some divergence going on there.

  7. Rogelio Escobar
    Posted Jun 6, 2012 at 3:22 PM | Permalink

    These people have found highly a significant MWP in the SH

    A MWP “climate anomaly” is conceded in AR5. I wouldnt get too excited about an individual study. This study has some extra interest because it appears to be right within the heart of the South American tree ring chronologies indexed in Neukom and Gergis 2011.

  8. NicL
    Posted Jun 6, 2012 at 4:28 PM | Permalink

    May I ask the experts present a question about the N_eff= N*(1 – r)/(1 + r) correction factor for autocorrelation, per the 1952 formula by Quenouille, that Steve employs in his t_adj function?

    I can see the logic of deducting the two estimated parameters (intercept and slope) after making the AR1 correction, rather than before. But doing so can result in a negative adjusted DoF.

    This is not just of academic interest. It applies, for instance, to the Levitus 2005 trend in 0-3000m global ocean pentadal-mean heat content (AR1 coef=0.9, N=45), and presumably to the more recent darasets also. On that basis the ocean heat content trend is completely meaningless. (Even deducting 2 before rather than after the AR1 correction, the trend isn’t significant.)

    Is there any better correction method to use when autocorrelation is very high?

    Apologies that this is a bit off topic.

    • Kenneth Fritsch
      Posted Jun 6, 2012 at 4:41 PM | Permalink

      “May I ask the experts present a question about the N_eff= N*(1 – r)/(1 + r) correction factor for autocorrelation, per the 1952 formula by Quenouille, that Steve employs in his t_adj function?”

      Not an expert, but did not MM use a different method in a paper or submitted paper on the tropical troposphere. Lucia at the Black Board used a formula different than Quenouille, but I cannot recall its name.

    • Posted Jun 6, 2012 at 9:52 PM | Permalink

      I’ve described what I think is a more general method here. Though Quenouille is pretty good.

      I think the problem for r close to 1 is more to do with the finite data available to determine it. I think also the basis for choosing the AR1 model is that it is a kind of first order approx likely to be good for low r. Situations where you have r close to 1 and AR1 is still good would, I think, be rather special.

      • clt510
        Posted Jun 7, 2012 at 12:01 AM | Permalink

        Nick, I’ve read through your treatment. One question that came to my mind is AR(1) is usually an approximation of the real noise model.

        Have you done any tests to see whether you get an improvement in the trend estimate when you have “realistic” climate noise, and you’re fitting to a linear model + AR(1)? I’m assuming that for most noise sources, the main effect of the noise is to increase the uncertainty in the estimate of the trend (that is there is no net bias, is that true?).

        It’s been a long day so I apologize if this isn’t completely clear.

        • Posted Jun 7, 2012 at 1:03 AM | Permalink

          I’ve only fitted AR(n) models, and in fact only AR1 and AR2. There’s no problem in doing that – you get 3 extra regressors for AR2 etc. You can use other linear fitting models in the linear algebra.

          It’s true that the main effect of the noise is to increase the uncertainty of the trend. But it does alter the estimated trend a little. One of my motivations was to determine how much, since the Quenouille method doesn’t correct the trend at all.

          I did promise a follow-up, nut got sucked back into the climate plotter. I’ll get back to it, and hopefully post a more general code.

        • clt510
          Posted Jun 7, 2012 at 9:49 PM | Permalink

          Hi Nick, That’s me Carrick. (The blog software playing games with name tags again.)

          I suppose I should code that one up and test it.

          There are various reasons for including noise models in the inverse solution. I’m sure you’re familiar with this but basically there are the justifications I’ve seen 1) improved SNR, 2) reduction in bias, 3) [less often] more reliable estimate of the uncertainty. The biggest pitfall is introduction of bias into your trend estimate.

          So it seems to me that this is the sort of thing you’d need to test before favoring this approach over e.g. Quenouille.

          My personal preference as you probably know is a spectral Monte Carlo based approach.

        • clt510
          Posted Jun 7, 2012 at 9:55 PM | Permalink

          Hi Nick, that’s me (Carrick) … wordpress keeps insisting on calling me by my login name. (And this is a repost so if there are duplicates I apologize).

          I think the usual argument favoring modeling noise involves a) improving SNR, b) reducing bias in the estimate or c) improving the reliability of the estimate of the uncertainty in the central value. On the negative, it can generate a net bias in the estimate of the central value, so it has to be used with care.

          Anyway it’d be interesting to see a post that factors all of this in.


        • Posted Jun 8, 2012 at 12:02 AM | Permalink

          In essence, my approach says that with an ARMA noise model, you can perform a general regression on a few lagged variables which will lead to a (biquadratic) polynomial equation also in a few parameters, with coefficients determined my the usual kinds of summed products. The Quenouille correction then approximates that process (well), but my contention is that there is little cost in doing it exactly.

          You need some such noise model to get get such a reduction in variables.

        • clt510
          Posted Jun 8, 2012 at 2:00 AM | Permalink

          Yes I understand that. Again, the questions are what I listed above… does it introduce bias (or perhaps reduce it), does it reduce the variance (increase SNR), does it provide a better model of the uncertainty in the central value? I could probably add how sensitive is it to the assumed model for noise? I

          I think these are what any new method needs to explore… e.g.. what is needed to validate the method, to determine whether there is really a benefit of doing it “right”.

          (I’ve been burned enough many times with “more exact” methods, to know to be cautious. I suspect you have too.)

        • Posted Jun 8, 2012 at 5:17 AM | Permalink

          I claim that my part of the method is not “more exact”, it’s exact. It just sets up the linear equations for the multiple regression that I think all methods try to solve. You regress the time series against time and a set of lagged values, constrained by the AR(p) model. So I don’t think there is a question of introducing bias. The novelty (if any) is the use of Newton-Raphson iteration to solve the resulting equations, rather than approximating.

          Hu described here how to set up the full matrix for an AR(1) system and solve it with a big numbercrunch. I claim to be solving exactly the same system with a small numbercrunch. The O(N) ops are the same as Quenouille; the rest is just a set of O(3) ops (though there are several).

          I guess I should demonstrate that equivalence numerically.

        • Posted Jun 8, 2012 at 8:08 AM | Permalink

          Nick —

          Hu described here how to set up the full matrix for an AR(1) system and solve it with a big numbercrunch. I claim to be solving exactly the same system with a small numbercrunch.

          Thanks — I’ll have to take a look at it. With N = 100 and modern computers, O(N^2) is trivial, but with monthly or spatial or millenial data it could start to be a problem. I see how the known Cholesky decomposition of the AR(1) covariance matrix could cut the computation time of my “sandwich” matrix in half, but not how to get it down to O(N).

          (For some reason, WordPress is not giving a “reply” button on higher order comments and replies, so I’m having to reply to a comment way back up the chain.)

        • clt510
          Posted Jun 8, 2012 at 10:02 AM | Permalink

          Nick I must not be making myself very clear.

          It’s an “exact” treatment of an assumed noise model that probably doesn’t represent the true model. That’s the part that really worries me, and needs to be checked. How sensitive is your analysis to the assumed noise model, and does using the model introduce bias in your trend estimate if the noise model is wrong.

          When you apply more sophisticated least-squares models, it may not be unbiased and it may not reduce the variance in the estimate of the trend. More exact in one sense doesn’t make it “better” if it introduces a bias in the mean, or otherwise fails to improve the analysis.

          I’m not worried about the equivalence of yours and Hu’s algorithm, I see how they are equivalent. I’m interested in whether they are better than ignoring the presence of autocorrelation in the fit of the model, but using a measure of the autocorrelation in the estimation of the uncertainty in the trend.

          In a Monte Carlo based test (where you know the true slope, let’s call it m_true), what you will have in general is

          m_nick = (1 + epsilon) m_true ± sigma_nick

          where epsilon is a measure of your bias and sigma_nick is the p < 0.05 error in your estimate (e.g., 2-standard deviations). Assume you've done a large number of trials, so you have reasonably good accuracy (say a part in 1e3) in epsilon and sigma_nick.

          The questions for method validation is whether epsilon = 0 and whether sigma_nick/sigma_ols < 1.

          I claim that using OLS with y = m x + b to determine m_ols (and neglecting serial correlation) you'll find

          m_ols = (1 + epsilon_ols) m_true ± sigma_ols_corrected,

          where sigma_ols_corrected is corrected for serial correlation (by what ever method you choose) that epsilon_ols = 0.

          If this fails to explain what I'm referring to, I'll wait to I see a cleaned up code on your website and do the Monte Carlo myself.

          I can also provide you a Monte Carlo model that you can run in R (spectral based noise model that seems to give a reasonable description of climate noise in the monthly temperature series) if you prefer to do your own tests.


        • Posted Jun 8, 2012 at 1:19 PM | Permalink

          Nick —
          I’ve taken a look at your autocorrelation page at . How is what you’re doing different in its result from Cochrane-Orcutt?

          Cochrane-Orcutt (or its refined Praiss-Winston modification) is a “Feasible GLS” approach, in that it reestimates the regression coefficients optimally given the autoregressive structure. But since the autoregressive structure is only being estimated from the residuals, it is sometimes safer just to use OLS coefficients and then try to correct the standard errors, as in Quenouille, my CA at, and my “Moment Ratio Estimator” approach at . The latter paper shows that the popular Newey-West HAC standard errors are often inadequate.

        • Posted Jun 8, 2012 at 9:39 PM | Permalink

          Yes, I think you’re right – in the case of AR(1) it reduces to Cochrane-Orcutt. But it does do higher order cases.

          I see too on reflection that it isn’t the same as your method for AR(1), as I had thought, since you don’t modify beta.

        • Posted Jun 10, 2012 at 7:19 AM | Permalink

          Nic —
          Your procedure may still be faster than the Cochrane-Orcutt iterative algorithm, since (I think) you are estimating the regression coefficients and autoregressive structure simultaneously with Newton-Raphson, while CORC estimates the one given the other and vice versa to convergence.

        • Posted Jun 10, 2012 at 8:02 AM | Permalink

          Yes, it is simultaneous. I’ll be writing a new post shortly with a simpler version of the code useful for any AR(n). I’ve been checking it against C-O; it’s very similar in the error measures, but the actual trend isn’t the same. When I’ve resolved that I’ll post.

    • NicL
      Posted Jun 7, 2012 at 6:41 AM | Permalink

      Nick and Kenneth
      Thank you for your helpful comments and references about adjustment for autocorrelation.

      • NicL
        Posted Jun 8, 2012 at 3:20 PM | Permalink

        “Hu described here how to set up the full matrix for an AR(1) system and solve it with a big numbercrunch.”

        I’ve now used the method described in your 2009 post, using the G matrix, on the Levitus ocean heat dataset. It gives a significantly different result from using the Quenouille formula, presumably because N=37 and an AR1 coefficient of 0.9 is too demanding for a large sample limit based approximation.

        I know you say that the method you set out is standard, but I haven’t been able to find it in any of my textbooks. Can you give me a reference to any published paper or textbook setting it out, please?

        Could I ask you to clarify a couple of other points:

        a) is the reduction of the effective degrees of freedom using your G matrix method the same as with the Quenouille formula, being the square of the increase in standard error? N_eff is needed for the t-distribution.

        b) With your method, how should one allow for uncertainty in the raw data values, separate from the that in the statistical fit (regression residuals). Levitus gives uncertainties for the annual ocean heat content pentadal-average figures. I’m have no way of testing if these are themelves autocorrelated, but I think logically they must be.


  9. Howling Winds
    Posted Jun 6, 2012 at 8:17 PM | Permalink

    I wonder if someone could answer what might be well known already, but not to someone outside of the field. What criteria is used when choosing the trees themselves? Why trees in New Zealand as opposed to trees in Iowa, for example?

    • Duster
      Posted Jun 6, 2012 at 11:45 PM | Permalink

      If you are hypothesizing, for example, that the Medieval Warm Period was a regional or hemispheric phenomenon, then traveling far way from where you believe the phenomenon is centered and collecting data there would help test the hypothesis. The null hypothesis would be that the MWP was global. Evidence of MWP at a remote location from your presumed regional or hemispheric limits would fail to confirm your initial hypothesis and indicate that the null hypothesis remains the preferred view. So, if you think that the MWP is limited to the northern hemisphere, tree rings from Idaho would not be a test, while rings from New Zealand would be. If the MWP was hemispheric, there should be no evidence of it New Zealand. You want note however that tree rings are not good thermoeters so really you would want different proxies.

    • Craig Loehle
      Posted Jun 7, 2012 at 9:31 AM | Permalink

      The idea is to pick trees that are temperature limited rather than moisture limited. However, the choice of high elevation are high latitude trees does not always accomplish this. Mountain soils are thin and may dry out in drier years, leading to a spike in stress. Far northern and mountain sites can get confounding effects from snow depth. Boreal sites can get water-logged when it is warmer and permafrost melts, and trees don’t like that. So I view the choices of sites as subject to considerable error.

  10. William Larson
    Posted Jun 6, 2012 at 10:45 PM | Permalink

    Based on Mr. McIntyre’s graph above, with all the coral series having negative correlations (except, of course, for the one positive near-zero), are Gergis et alii claiming somewhere that all the coral data have been heretofore misinterpreted? Am I a dunce, or is not the onus on them to show/prove that the sign of each is actually positive before it can be considered a temperature proxy?

    Steve: Coral O18 data is presumed to be negatively correlated. Nothing to see here. But I’d prefer that the orientation be set up ex ante rather than calculated.

  11. Oakwood
    Posted Jun 6, 2012 at 11:44 PM | Permalink

    Is it possible to show graphs of the seleected proxies, or a link? I suspect this would be enlightening.

  12. ferd berple
    Posted Jun 7, 2012 at 12:32 AM | Permalink

    So some proxies accidental match the global averages, even though they all respond regionally. This then makes them likely to predict outside the selection range? Nonsense.

    One might as well argue that by taking the states that performed closest to the national average over the past 10 years, they would be a better proxy for the national average than taking all the states.

    All that the screening process is doing is to increase the noise, by selecting proxies that accidentally correspond while rejecting those that accidentally don’t correspond, all as a result of the very high noise to signal ratio in the underlying data.

  13. cd
    Posted Jun 7, 2012 at 7:01 AM | Permalink

    Steve (and/or others)

    Can you explain to me why there is a need to detrend the data. Surely if you’re interested in “long term proxies” it’d be more desirable to use the raw control data; by removing the trend you’re assessing the correlation at higher frequencies and similarly could force correlation where the proxy is sensitive to short term changes in temperature but not long term (upper and lower bounds as is the case in some tree-types). Conversely proxies that respond to low frequency changes may lose any signature and becoming stationary.

    • Steveta_uk
      Posted Jun 7, 2012 at 7:28 AM | Permalink

      cd, a tree grows during a growing season, typically 3 months of the year. The average temperature over the last decade is going to have little impact to how much the tree grows this year.

      • johanna
        Posted Jun 7, 2012 at 7:48 PM | Permalink

        That’s not always the case, at all. For a start, some trees in the tropics grow all year round – the ‘growing season’ of trees varies from a few weeks a year to constant depending on the species and the location. The length of the growing season is also affected by precipitation changes and local events such as the tree next door falling over, thereby letting in more light.

        Why, oh why, do the tree ring enthusiasts refuse to properly engage with biologists and foresters before they open their mouths?

        • johanna
          Posted Jun 8, 2012 at 8:48 AM | Permalink

          snip – no need to editorialize about tree rings. there’s been lots.

    • EdeF
      Posted Jun 7, 2012 at 8:33 AM | Permalink


      Trees grow more when they are young, ergo larger rings, and slow down as they get older,
      thus smaller rings. The detrending is an attempt to take out that bias. Otherwise, tree ring data would seem to be much warmer in the past (assuming wide tree rings equate to
      higher temps).

      • cd
        Posted Jun 7, 2012 at 8:45 AM | Permalink

        Cheers chaps.

        Still I have heard it said many times in relation to stable isotope proxies as well. For the life of me I can’t figure this one out either? If there is concordant drift in the proxy data then why assume, unlike the instrumental temperature record, that this is not a real temperature-controlled trend?

      • Ben
        Posted Jun 7, 2012 at 2:11 PM | Permalink

        It is not necessarily true that a narrower ring means slower growth. It would be surprising if ring widths were identical, as this would infer that the tree goes faster as it gets older, because an identical width is deposited over a longer circumference.

        The area of each ring can be approximated by Pi*(r2^2 – r1^2), more exactly by a line integral or arc length integration. Either method should correlate better than ring width.

        Should’nt area occupied by the ring be used instead of ring width?

        No detrending necessary…

        • Craig Loehle
          Posted Jun 7, 2012 at 3:33 PM | Permalink

          I agree completely but because it gets hard to do this with irregular cross-section trunks (not perfectly round) they use the exponential decay function. It is also the case that for some species the constant basal area increment rule does not hold.

        • Posted Jun 7, 2012 at 4:07 PM | Permalink

          It’s not that simple unfortunately. The change of width with size is dependent on both biological and geometric effects. The geometrical effects you describe are correctable by math alone, (but only perfectly if you have an estimate of the rings missed near the pith, which are often not included), while the biological effects are not. Biological effects are the potentially varying production, and allocation, of the products of photosynthesis as a function of tree age/size. The irregular cross section issue is relatively minor given that two cores, often at right angles, are taken from most trees, allowing for a correction from non-circularity.

          So using ring area (= basal area increment) doesn’t necessarily obviate the need to detrend each series. In fact, it will rarely do so, if you look at typical ring size variations in the ITRDB data set.

    • Posted Jun 7, 2012 at 4:04 PM | Permalink


      Posted Jun 7, 2012 at 7:01 AM

      Steve (and/or others)

      Can you explain to me why there is a need to detrend the data. Surely if you’re interested in “long term proxies” it’d be more desirable to use the raw control data;

      I agree — if you are going to cherry pick proxies by pre-screening on correlation with temperature, you get the best “cherry pie” (to use Roseanne D’Arrigo’s phrase) if you correlate directly on temperature rather than detrending first.

      By pre-screening with detrended data, Gergis et al are picking up some of the correlation with temperature itself (since temperature is not a perfect time trend), but not all of it. They could have gotten even “better” results by pre-screening with direct correlations on temperature.

      The variation of tree growth with age mentioned by EdeF is an entirely different issue. These TR proxies have all been run through an “RCS” program that tries to remove the effect of age per se.

      • cd
        Posted Jun 8, 2012 at 4:49 AM | Permalink

        Thanks Hu

        Just spotted your response. I understand the arguments for detrending but to be honest I think you’re solving one set of problems for another. I can understand the problems that can arise when dealing with highly autocorrelated data pairs and assessing correlation between these. But by detrending you’re examining the correlation of the pairs in terms of higher frequency variation. And if you detrend using a multi-level spline then the correlation and the freqnecy of the variation, is partially a function of the level of the spline as well as the variation in the data. AND IN THE END you have to retrend the output for the final reconstruction.

        • Posted Jun 8, 2012 at 7:59 AM | Permalink

          Smoothing the data before running a regression, whether with a spline or Gergis’s loess filter, may improve the correlation, but just generates serial correlation even if there was none to start with.

          A loess filter with a 10-year span is similar in effect to a 6-year MA, and hence reduces the effective sample size by a factor of 6. Ad hoc AR(1) corrections may reduce the distortion this has on standard errors, but it’s generally better just to use the raw data and live with its noisiness.

          The article “Spurious Periodicity in Inappropriately Detrended Time Series,” by Nelson and Kang, Econometrica 1981, shows that when a random walk is “detrended” with a trendline regression, the residuals exhibit strong low-order serial correlation, plus spurious patterns at high-orders. A similar effect may be present in Gergis’s “detrended” series, and may be generating spurious correlations between them.

  14. Posted Jun 7, 2012 at 8:10 AM | Permalink

    Jean S:

    Steve, Roman, or somebody 😉 , what am I doing wrong here?

    Something to learn from that somewhere, Ms Gergis.

  15. Andy
    Posted Jun 7, 2012 at 8:22 AM | Permalink

    I just love the irony of one of the coauthors of this paper being in a video looking smug and declaring that her work is peer reviewed, as if that guarantees it being correct.

  16. Matt Skaggs
    Posted Jun 7, 2012 at 9:08 AM | Permalink

    You are talking about age compensation, that is different from detrending the climate signal derived after age compensation.

    This paper describes why climate data needs to be detrended (and also discusses how detrending can be subjective):

    although I confess that I would have to spend more time on it to fully understand it myself!

    • cd
      Posted Jun 7, 2012 at 9:34 AM | Permalink

      Thanks Matt

      I have had a quick read and still I don’t got it – it could be I’m a bit thick. I understand the argument often made that where we have two strongly autocorrelated datasets (due to drift) correlation between them can be forced. Decomposing the signal into different frequencies enables the user to assess whether there is significant agreement at different resolutions, help identify aliasing issues or even if two different, independant processes are just comming into phase (but you need stationary data to do this). But that is pure analysis, and while it might seem worthwhile surely it doesn’t lend itself to prediction. If you detrend, say with a spline, the “degree of detrending” changes with the level variance (and resulting covariance for two pairs at longer time-periods)about the mean. This is way too subjective.


      if you do then find siginificant agreement between your control and your proxy (after detrending), then you use this relationship to make predictions about temperature in your unknown period, surely you still need to add the trend back into your reconstruction to get the final result(?). If so, and I see this with Kriging, you superficially reduce the uncertainty (in Kriging’s case the Kriging variance) depending on the level of fit – say if you use a variable trend like a spline. Is this not an issue?

      • HaroldW
        Posted Jun 7, 2012 at 12:32 PM | Permalink

        cd –
        My reading of the paper is that the detrended proxies were used only to determine the selection of the proxies (that is, the step which selected 27 of the original 62 proxies). Once the subset of proxies to be used was established, the reconstructions were based the original proxy time series (and not the detrended versions).

    • EdeF
      Posted Jun 7, 2012 at 10:49 AM | Permalink


      Thanks for the reference…….my ignorance of climate science, along with advanced statistics, is vast.

  17. DocMartyn
    Posted Jun 7, 2012 at 3:38 PM | Permalink

    I hate to ask, but does the Law Dome temperature series pass the t-Test?

  18. kim
    Posted Jun 7, 2012 at 9:31 PM | Permalink

    This isn’t just bad, it’s egergis.

    • Steven Mosher
      Posted Jun 7, 2012 at 9:37 PM | Permalink


  19. fastfreddy101
    Posted Jun 8, 2012 at 4:18 AM | Permalink

    Veritas Emergis

  20. Posted Jun 8, 2012 at 4:45 AM | Permalink

    I’ve now looked more carefully through your code and data, and I’ve read the paper of Neukom and Gergis 2011, which describes the methods in more detail. There’s one thing that doesn’t look right to me. You have done a correlation of proxy data with the aggregate Australasian temperature index from 1921 to 1990. This comes from their sheet 2 col D.

    But that is included as the measure (target in their terms) for the overall reconstruction. They don’t say that it was used for calibrating individual proxies, and in my view it would be a surprising choice.

    They clearly have gridded Hadcrut3v information available, and have plotted it in Fid S1. This shows the correlation of grid cells with the mean you have used, and while they say it is good overall, it’s not good everywhere. In fact, using your criterion there are clearly many local thermometer averages which would fail if tested as proxies.

    Surely with local grid information calculated, it is that which they would have used to screen proxies, rather than the region aggregate, which would be expected to have poorer correlation.

    • Jean S
      Posted Jun 8, 2012 at 5:58 AM | Permalink

      Re: Nick Stokes (Jun 8 04:45), (my bold)

      There’s one thing that doesn’t look right to me. You have done a correlation of proxy data with the aggregate Australasian temperature index from 1921 to 1990. This comes from their sheet 2 col D.

      But that is included as the measure (target in their terms) for the overall reconstruction. They don’t say that it was used for calibrating individual proxies,

      English is my third language, but (my bold):

      Only records that were significantly ([pre]p<0.05[/pre]) correlated with the detrended instrumental target over the 1921–1990 period were selected for analysis.

      • Posted Jun 8, 2012 at 6:32 AM | Permalink

        Well, that hangs on whether by target they mean the spatial average, which it’s true they did previously refer to as a target, or just the Hadcrut3v dataset. I have to agree that the earlier usage suggests the mean. On the other hand, it doesn’t make sense to me to use the spatial mean when you have calculated the local values.

        Over at Bishop Hill, Rob Wilson says:
        “Proxy screening, if done at all, should only be done at the local scale – i.e. does a proxy record provide a valid estimate of local scale temperature variability. Screening against a large scale instrumental target “can” lead to an overfitting situation.”

        • Layman Lurker
          Posted Jun 8, 2012 at 9:29 AM | Permalink

          Nick, if the local temperature is the target for calibration, then you would have to consider error in both x and y would you not?

        • Posted Jun 8, 2012 at 5:02 PM | Permalink

          I’m not sure what you mean there, but in the light of Karoly’s email it looks as if they did indeed use the spatial mean temp, as the word target implies. I think local would have been better.

    • Steve McIntyre
      Posted Jun 8, 2012 at 9:54 AM | Permalink

      Surely with local grid information calculated, it is that which they would have used to screen proxies, rather than the region aggregate, which would be expected to have poorer correlation.

      It’s not what they say. As Jean S observed, their paper said:

      Only records that were significantly ([pre]p<0.05[/pre]) correlated with the detrended instrumental target over the 1921–1990 period were selected for analysis.

      Nick, I’m sure that you can speculate endlessly on what they actually did. However, it is evident that they didn’t do what their paper says they did.

      • Steven Mosher
        Posted Jun 8, 2012 at 2:19 PM | Permalink

        I suppose they didnt post code. code describes what is done in code. its the best definition of what was done.
        written text about the code, is not the code. written text about the code, is a claim that needs to
        be audited.
        perhaps this will serve as a lesson to the authors, community, journal editors, etc that providing
        code is best practices.

        ok. done preaching to the choir.

        • Posted Jun 8, 2012 at 2:26 PM | Permalink

          But it needs to be said again and again Steve. The only model of the code that we know matters is the code itself. The only description of the code that we know matters is the code itself. (Other stuff may be useful as you say – or it may not.)

          The old climate approach must no longer be seen as a clever ‘trick’ to hide stuff but as totally stupid as well as unethical. That the IPCC is prepared to take such papers on board (in whatever draft), with a view to influencing world policy, should be seen both by policy makers and public as the utter joke it is.

  21. Kenneth Fritsch
    Posted Jun 8, 2012 at 8:42 AM | Permalink

    Below I have linked the individual Gergis 27 proxy series. The coral proxies were inverted to correspond to temperature. I show a long series of the coral and TRW proxies and a series from 1800 forward for the corral, TRW and ice core proxies. In the 1800 series I have included the HadCRUT temperature for the region that was used in calculating correlations in Gergis (2012).

    Notable to me in viewing these proxies is the large variation in the series structure and direction over time. Even if one were to agree that the pre-selection of proxies were a valid concept and further that the proxies were responding credibly well to temperature, I would think in order to get a representative picture of the Australasia temperatures would require a huge number of proxies.

    It is difficult for me to see how putting these individual series into a PCR calculation is going to produce something that correlates with temperature in the instrumental period with r=0.86 in any manner other than chance – and a careful pre-selection of proxies.

    Next I want to compare the proxy series in the instrumental period to the local temperatures.

    • Posted Jun 8, 2012 at 10:27 AM | Permalink

      Thanks, Ken — It looks like Ni Libi Composite 2 may have a Urewera issue!

    • Kenneth Fritsch
      Posted Jun 8, 2012 at 10:29 AM | Permalink

      The plot of the Australasia HadCRUT temperature anomaly series from Gergis (2012) was rescaled (standardized) to better compare with the proxy series in the links from my previous post. It is linked below. My HadCRUT temperature plots from the previous post are not properly scaled for comparisons.

  22. ferd berple
    Posted Jun 8, 2012 at 8:44 AM | Permalink

    Statisticians may want to have a look at this paper:

    Pielke Sr has posted portions at this site. This quote is interesting:

    “After decreasing over several decades of scale, to a minimum of ≈ +/-0.1 K at around 10–100 yrs, temperature fluctuations begin to increase, ultimately reaching +/-3 to +/-5 K at glacial-interglacial scales.”

    This implies that temperature does not have a constant variance. Most statistical methods assume a constant variance. Thus, conclusions that observed variance is not natural may well be artifacts of a wrong statistical assumption.

  23. Paul Matthews
    Posted Jun 8, 2012 at 9:02 AM | Permalink

    The Gergis et al paper seems to have been withdrawn!
    It is no longer on the AMS website.

    Steve: It’s still online at Melbourne. As you say, it isn’t online at J Climate., but I would not conclude that it’s been withdrawn.

  24. Steve McIntyre
    Posted Jun 8, 2012 at 12:06 PM | Permalink

  25. Paul Matthews
    Posted Jun 8, 2012 at 12:41 PM | Permalink

    Email just sent to Journal of Climate editor:

    Dear Professor Broccoli

    Please could explain why the Journal of Climate paper
    “Evidence of unusual late 20th century warming from an Australasian temperature reconstruction spanning the last millennium”
    is no longer available on the journal website?
    Has the paper been withdrawn by the authors?
    It appears that a significant error in the paper was found at the Climate Audit blog.

    • Paul Matthews
      Posted Jun 8, 2012 at 4:50 PM | Permalink

      Somewhat redundant now, but for information, Broccoli replies that he is unable to provide any information, due to journal policy.

  26. Skiphil
    Posted Jun 8, 2012 at 12:58 PM | Permalink

    Does it mean anything that the original PR is still on the U. of Melbourne site? Are they just slower to “update” that site?

    still on University of Melbourne news site as of 13:56 EST

    [emphasis added]

    “Our study revealed that recent warming in a 1000 year context is highly unusual and cannot be explained by natural factors alone, suggesting a strong influence of human-caused climate change in the Australasian region,” she said.

    The study published today in the Journal of Climate will form the Australasian region’s contribution to the 5th IPCC climate change assessment report chapter on past climate.

  27. Skiphil
    Posted Jun 8, 2012 at 2:33 PM | Permalink

    Real Climate has noted the disappearance of the paper at the AMS/Journal of Climate link:

    Gavin updates at Real Climate

    gavin says:
    8 Jun 2012 at 1:57 PM

    “The link to the Gergis et al paper at J. Climate is broken, and the paper no longer appears to be accessible. We have no information as to why or whether it’s temporary or not, but we’ll update if necessary when we have news.”

    • AndyL
      Posted Jun 8, 2012 at 3:08 PM | Permalink

      A few years ago Steve drew favourable attention to a maths paper that was withdrawn after a comemnt in a blog. We may be able to give similar praise to Gergis

    • Posted Jun 8, 2012 at 5:05 PM | Permalink

      Re: Skiphill Jun 8, 2012 at 2:33 PM

      We have no information as to why or whether it’s temporary or not, but we’ll update if necessary when we have news.” [emphasis added -hro]

      Now this is interesting! Gavin’s turns of phrases are sometimes quite puzzling, IMHO. This one certainly makes one wonder what conversations might have transpired behind closed screens. With this in mind … a possible translation from Gavinesque:

      “That damn ClimateAudit has been at it again! We are currently reviewing the best way of modifying things so that we can point out how spurious the claims of the contrarians are; then we can point out that they don’t affect our results in any significant way. Worst case scenario is Plan B: Just ignore ’em and advise IPCC to delete all references to paper as well (for now). It can always be put back in after all AR5 reviews have been completed.”

  28. EdeF
    Posted Jun 8, 2012 at 5:52 PM | Permalink

    I have found a reference to detrending as related to climate studies from a 500-level
    class at U. of Arizona, Spring 2011. I found it to be helpful. This is chapter 7. More
    chapters cover various statistical aspects of climate science such as correlation, filtering, etc.

    Click to access notes_7.pdf

    • Posted Jun 8, 2012 at 7:03 PM | Permalink

      Re: EdeF (Jun 8 17:52), Interesting — maybe it is worth comparing to wavelet analysis for doing the same things. See here:

      In this paper we propose to overcome the problem of spurious regression between fractionally differenced
      processes by applying the discrete wavelet transform (DWT) to both processes and then estimating the
      regression in the wavelet domain. The DWT is known to approximately decorrelate heavily autocorrelated
      processes and, unlike applying a first difference filter, involves a recursive two-step filtering and
      downsampling procedure. We prove the asymptotic normality of the proposed estimator and demonstrate
      via simulation its efficacy in finite samples.

  29. dp
    Posted Jun 9, 2012 at 11:58 AM | Permalink

    I don’t know if it has already been stated, but this is exactly why it is so damned important to distribute the source data and code with climate papers. Any debate otherwise is hereby squashed.

12 Trackbacks

  1. […] Gergis “Significance” […]

  2. […] Climate Audit, the paper was examined in more detail, and alarm bells went off. Concern centered around the 27 […]

  3. […] Gergis “Significance” […]

  4. […] Climate audit […]

  5. […] Joelle Gergis and her colleagues committed capital methodological and calculation errors, as McIntyre together with Jean S were able to show. McIntyre began to correpsond with Gergis by e-mail, but instead of thanking McIntyre for the […]

  6. […] experts began examining the “threads” of this “silk purse”. And they found that the glue was not quite, well, […]

  7. […] the Gergis paper was withdrawn upon the publication of the JeanS ‘Gergis Significance’ t-values. Unsurprisingly, Palmyra was one of the proxies that failed the t-test, so is a rogue data […]

  8. […] screening correlations were incorrectly calculated. JeanS calculated properly. Only 6 out of 27 proxies passed. (NB none of the six proxies outside the area […]

  9. By The Climate Change Debate Thread - Page 1646 on Oct 19, 2012 at 8:49 AM

    […] […]

  10. By Gergis et al Correspondence « Climate Audit on Oct 28, 2012 at 8:42 PM

    […] the target Australasian temp series and is unable to reproduce the claimed results in the paper. I suggest that you look at this Stephen Mcintyre post. Given that the error is now identified in […]

  11. […] the majority of their proxies did not pass a detrended correlation test – see CA discussion here (building on an earlier thread) reporting that only 6 of 27 proxies passed the stated significance […]

  12. […] that the majority of their proxies did not pass a detrended correlation test – see CA discussion here (building on an earlier thread) reporting that only 6 of 27 proxies passed the stated significance […]

%d bloggers like this: