Mann 2008 Correlation Benchmarks

Update Sep 24 – I suggest that you start with a later post here.

The purpose of working through frustrating details of Mannian lat-longs and so on was to start testing the assertion that the network contained 484 “significant” proxies and that this meant something. As so often, there’s more to this than meets the eye.

I’ve calculated a number of different distributions as the cases vary. I’ve gotten quite fond of flash gifs for showing these sorts of analyses. I’ll list the cases shown in the flash gif below and discuss each beneath the graphic.

Figure 1. Graphic with different distribution assumptions. The cases analysed: whether one or two gridcells are used for comparison, Luterbacher inclusion; degrees of freedom (Neff); SI versus replicated correlations. The graphs appear in the following order:
1. One pick; Luterbacher in; Neff=140
2. One pick; Luterbacher out; Neff=140
3. Pick two daily keno; Luterbacher out; Neff=140
4. Pick two daily keno; Luterbacher out; Neff=110
5. Pick two daily keno; Luterbacher out; Neff=110; absmax (replicated correlations not SI)
6. Pick two daily keno; Luterbacher out; Neff=64; absmax (replicated correlations not SI)

The SI gives a couple of correlation benchmarks as 90th and 95th percentiles (actually Mann calls this “significance”, but I’m going to use the more neutral term percentile for now.) The SI doesn’t explain where these benchmarks come from, but I’m familiar enough with the ecology here to have a pretty good idea. A fairly reasonable rule of thumb is that the Fisher transformation of a distribution of correlation coefficients has a normal distribution with sd=1/sqrt(N-3), where N is the number of degrees of freedom. The Fisher transformation can be represented in R by the atanh function and its inverse by the tanh function. If you calculate the tanh of the 90th and 95th percentiles of a normal distribution with sd=1/sqrt(140), you get correlation hurdles that closely match the Mann numbers. The tanh function in this range is very close to the identity, so maybe the values are just the qnorm percentiles, which are a titch higher. Regardless, this is almost certainly where the standards come from. I’ll apply this to consider some permutations for deriving alternative benchmarks.

tanh( qnorm(.95, sd=1/sqrt(140))) #[1] 0.1381269
tanh( qnorm(.9, sd=1/sqrt(140))) #[1] 0.1078893

There are some features to this graphic that I find pretty interesting.

First, the Mann distributions are noticeably bimodal. In the pick two daily keno cases, I simulated two columns, each 10,000 long, from rnorm(sd=1/sqrt(140)) and then picked the value with the higher absolute value. This picking procedure yields a bimodal distribution. So it seems pretty plausible that the bimodal distribution of Mann’s correlation coefficients has something to do with the pick two procedure, and that this needs to be allowed for in estimating statistical significance.
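As a cross-check on the pick-two effect, here is a quick simulation sketch in Python (stdlib only; a translation of the R procedure just described, with the 0.02 “near zero” threshold chosen arbitrarily for illustration):

```python
import math
import random

random.seed(0)
n = 10_000
sd = 1 / math.sqrt(140)

# Two columns of null "correlations" (tanh is near the identity at this scale)
r1 = [random.gauss(0, sd) for _ in range(n)]
r2 = [random.gauss(0, sd) for _ in range(n)]

# Pick two daily keno: keep whichever draw has the larger absolute value
pick = [a if abs(a) > abs(b) else b for a, b in zip(r1, r2)]

# Share of draws near zero, before and after picking
near0_single = sum(abs(r) < 0.02 for r in r1) / n
near0_pick = sum(abs(r) < 0.02 for r in pick) / n
print(near0_single, near0_pick)
```

Picking the larger absolute value thins out the draws near zero, which is exactly the dip between the two modes.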

Second, the Luterbacher correlations are all implausibly high as representatives of a “proxy” population – they show up as a peculiar bulge on the far right of the distribution. They contain instrumental data and so cannot be used in tests of the ability of proxies to capture a signal. In graphics 2-6, these correlations have been excluded, as they should have been in the original article.

Third, I can’t replicate many correlations in the SI, though I can replicate some almost exactly – so it’s a bit puzzling right now. My next-nearest-gridcell algorithm is a little different from Mann’s; I’m aware of the difference, but I can’t see why it would be material. I’ll re-visit this, and may re-issue my absmax calculation, at some point. The bulk of the differences arise in the problematic Briffa MXD network.

Fourth, the value of Neff=110 comes from an aside in the SI. Mann says (in effect) that, because of “modest autocorrelation”, what seems to be a 90th percentile is actually an 87.2 percentile. I can get this result using my interpretation of how he derives benchmarks by setting Neff=110, as shown below:

tanh( qnorm(.9, sd=1/sqrt(140))) #[1] 0.1078893
tanh(qnorm(.872, sd=1/sqrt(110))) #[1] 0.1078820

The combination of allowing for autocorrelation, allowing for pick two daily keno, excluding Luterbacher and using calculated correlations makes the Mann distribution look increasingly like draws from random data. If Neff is reduced to 64, one gets an almost exact match.
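For reference, the benchmark recipe can be written as a small function and evaluated at a few values of Neff (a stdlib-Python translation of the tanh(qnorm(…)) calculation above; “hurdle” is my label, not Mann’s):

```python
import math
from statistics import NormalDist

def hurdle(p, neff):
    # Fisher-transform benchmark: tanh of the p-th quantile of a
    # normal distribution with sd = 1/sqrt(neff)
    return math.tanh(NormalDist(0.0, 1 / math.sqrt(neff)).inv_cdf(p))

for neff in (140, 110, 64):
    print(neff, round(hurdle(0.90, neff), 4), round(hurdle(0.95, neff), 4))
```

At Neff=140 this reproduces the 0.1079 and 0.1381 hurdles quoted earlier; at Neff=64 the same nominal percentiles correspond to noticeably higher correlation hurdles.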

The article contains no analysis of autocorrelation in this network. I’ve examined many individual series and it is impossible to baldly assert that autocorrelation is “modest”. Autocorrelation varies hugely between “proxies”: it is very low in some series (e.g. Luterbacher) but immense in others. It’s an issue that needed to be worked through. At this point – and these are just notes – it might be pretty hard to show that the observed correlation distribution could not come from pick two daily keno, if more autocorrelation is allowed for.
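One standard way to gauge how much autocorrelation matters is a Quenouille-type effective sample size for the correlation of two AR(1) series. I am not claiming this is how the SI did its adjustment – it is just a sensitivity check under an assumed formula:

```python
def neff_ar1(n, rho1, rho2):
    # Effective sample size for the correlation of two AR(1) series
    # with lag-1 autocorrelations rho1 and rho2 (Quenouille-style
    # formula; an illustration, not necessarily the SI's calculation)
    return n * (1 - rho1 * rho2) / (1 + rho1 * rho2)

print(neff_ar1(140, 0.1, 0.1))   # "modest" autocorrelation barely moves N
print(neff_ar1(140, 0.7, 0.7))   # strong autocorrelation guts it
```

Under this formula, lag-1 autocorrelations of about 0.6 in both series are enough to take Neff from 140 down to the neighbourhood of 64.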

There are still some other shoes to drop. All these correlations use Mannian infilled proxy data, Mannian infilled temperature data and truncated Briffa MXD series. If true Briffa MXD values are used, the Briffa MXD correlations are going to drop like the value of a subprime mortgage fund.


  1. hengav
    Posted Sep 23, 2008 at 7:13 PM | Permalink

    Steve, I work fairly often with seismic data, where I use PC analysis for noise removal and then cepstral tapering to increase resolution of the signal. Perhaps I will try to show this in greater detail at some later point, but I will be brief on why this may or may not be germane to your analysis. At its simplest, a seismic signal recovered from the earth is the convolution of the reflectivity (the layered earth) and the wavelet traveling through it, plus noise. Processing of seismic signals sometimes does not take into account that, for accurate analysis, the wavelet and the reflectivity must be separated: for example, when you apply some process, say in the Fourier domain, you can no longer accurately reconstruct the original convolutional operator. How does this manifest itself in data that I am looking at? I can always tell a data set that has been corrupted by its bimodal power spectra. Not every data set I work with has this bimodal nature though. I can usually trace problem data sets to certain pre-processing algorithms, the FFT being just one of a handful that will give me grief.

    Keep digging.

  2. Steve McIntyre
    Posted Sep 23, 2008 at 8:33 PM | Permalink

    #1. In this case, I think that the bimodality is readily explained by the pick two procedure – one that is, to say the least, another novel statistical procedure.

    • hengav
      Posted Sep 23, 2008 at 11:59 PM | Permalink

      Re: Steve McIntyre (#2),

      Can I find the “pick two daily keno” procedure explained in the Mann SI? If so I will check that out. Otherwise I will assume the odds of me figuring it out myself are around 1/13.

      Another analogy to seismic is gridding. We would call it “binning”. The procedure of combining information in each grid cell would be called “stacking”. The advantage of stacking information is to correlate signals and reduce noise, which is assumed to be random. A major problem with stacking very sparse data is commonality. Think of it this way: if each grid cell had exactly the same type of information (like a weather station), over the same time period, then some process f(grid)=a*B+n would be meaningful, as it would have the form of its input in common. In the case of the gridcell data for climate proxies, you could have ice cores in one grid cell adjacent to some tree-ring data. And in these adjacent grid cells you add other various series, each distinct from cell to cell. In the output, f(apple) does not relate to f(orange) at all. I assume this is where keno comes into play.

  3. Henry
    Posted Sep 24, 2008 at 1:32 AM | Permalink

    The changing graphics might be easier to watch if they had a constant vertical scale.

    You are clearly correct about the “pick two” double peak, though it might be affected by any relationship between adjacent grid cells.

  4. Demesure
    Posted Sep 24, 2008 at 2:45 AM | Permalink

    Figure 1 is an animated gif, not a “flash gif”. Flash is a different format from GIF for animated images and has cute features like pause or rewind, but much more. Flash = gif on steroids, sort of.

  5. Chris H
    Posted Sep 24, 2008 at 4:28 AM | Permalink

    It would be VERY USEFUL if the frames of your animated GIFs were numbered (perhaps in the graph title), so that we could easily tell how far we were through the animation, and how much more there was to see. (As well as making it easy to identify which frame(s) we wanted to examine closer, since there is no pause button.)

  6. Posted Sep 24, 2008 at 6:52 AM | Permalink

    Wow! You really know your climate stuff. All of that just went over my head.

    My husband and I leave on September 30th to backpack to various countries around the world for a year or so. As we have been planning for our adventure we have been updating a blog.

    Since you are an expert…(yes, flattery can’t hurt)…could you let my husband and me know if we are going to be traveling in a country or region of the world that is going to experience a major weather problem? I know I’ve read about monsoon season here or there, but I just thought it wouldn’t hurt to ask.

    Feel free to subscribe to our blogs feed, follow us and be our hero if you know of any climate advice, info, etc. let us know!

    Thanks so much!!! Natalie

    • navy bob
      Posted Sep 24, 2008 at 8:22 AM | Permalink

      Re: Natalie (#7), I’d stay away from Tanzania. Too much rain blowing in from Spain.
