Update Sep 24 – I suggest that you start with a later post here.
The purpose of working through frustrating details of Mannian lat-longs and so on was to start testing the assertion that the network contained 484 “significant” proxies and that this meant something. As so often, there’s more to this than meets the eye.
I’ve calculated a number of different distributions as the cases vary. I’ve gotten quite fond of flash gifs for showing this sort of analysis. I’ll list the cases shown in the flash gif below and discuss them beneath the graphic.
Figure 1. Graphic with different distribution assumptions. The cases analysed vary along four dimensions: whether one or two gridcells are used for comparison; whether the Luterbacher series are included; the degrees of freedom (Neff); and SI benchmarks versus replicated correlations. The graphs appear in the following order:
1. One pick; Luterbacher in; Neff=140
2. One pick; Luterbacher out; Neff=140
3. Pick two daily keno; Luterbacher out; Neff=140
4. Pick two daily keno; Luterbacher out; Neff=110
5. Pick two daily keno; Luterbacher out; Neff=110; absmax (replicated correlations not SI)
6. Pick two daily keno; Luterbacher out; Neff=64; absmax (replicated correlations not SI)
The SI gives a couple of correlation benchmarks described as 90% and 95% percentiles (actually Mann calls this “significance”, but I’m going to use the more neutral term percentile for now). The SI doesn’t explain where these benchmarks come from, but I’m familiar enough with the ecology here to have a pretty good idea. A fairly reasonable rule of thumb is that the Fisher transformation of a distribution of correlation coefficients has a normal distribution with sd=1/sqrt(N-3), where N is the number of degrees of freedom. The Fisher transformation can be represented in R by the atanh function and its inverse by the tanh function.

If you calculate the tanh of the 90% and 95% percentiles of a normal distribution with sd=1/sqrt(140), you get correlation hurdles that closely match the Mann numbers. In this range the tanh function is very close to the identity, so maybe the values are just the qnorm percentiles, which are a titch higher. Regardless, this is almost certainly where the standards come from. I’ll apply this to consider some permutations for deriving alternative benchmarks.
tanh( qnorm(.95, sd=1/sqrt(140))) # 0.1381269
tanh( qnorm(.9, sd=1/sqrt(140))) # 0.1078893
There are some features to this graphic that I find pretty interesting.
First, the Mann distributions are noticeably bimodal. In the pick two daily keno cases, I made two columns, each 10,000 long, simulated from rnorm(sd=1/sqrt(140)) etc. and then picked the value with the higher absolute value in each row. This picking procedure itself yields a bimodal distribution. So it seems quite plausible that the bimodality of Mann’s correlation coefficients has something to do with the pick two procedure, and that this needs to be allowed for in estimating statistical significance.
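A minimal sketch of that picking experiment (my reconstruction of the null simulation, not Mann’s code; the seed and sample size are arbitrary):

```r
set.seed(42)   # arbitrary seed, for reproducibility
N <- 10000     # number of simulated "proxies"
# two columns of null correlations: Fisher-z draws mapped back through tanh
r1 <- tanh(rnorm(N, sd = 1/sqrt(140)))
r2 <- tanh(rnorm(N, sd = 1/sqrt(140)))
# pick two daily keno: keep whichever correlation is larger in absolute value
picked <- ifelse(abs(r1) >= abs(r2), r1, r2)
# the picked values are pushed away from zero; hist(picked) shows the
# bimodal shape, and the mean absolute correlation rises accordingly
mean(abs(r1))      # roughly 0.067
mean(abs(picked))  # roughly 0.095
```

The dip at zero arises because a picked value is only near zero when both candidate correlations happen to be near zero at once.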
Second, the Luterbacher correlations are all implausibly high as representatives of a “proxy” population – they show up as a peculiar bulge on the far right of the distribution. They contain instrumental data and therefore cannot be used in tests of the ability of proxies to capture a signal. In graphics 2-6, these correlations have been excluded, as they should have been in the original article.
Third, I can’t replicate many of the correlations in the SI. Some I can replicate almost exactly, so the discrepancies are a bit puzzling right now. My nearest-gridcell algorithm differs a little from Mann’s; I’m aware of the difference, but I can’t see why it would be material. I’ll revisit this at some point and may re-issue my absmax calculation. The bulk of the differences arise in the problematic Briffa MXD network.
Fourth, the value of Neff=110 comes from an aside in the SI. Mann says (in effect) that, because of “modest autocorrelation”, what seems to be a 90th percentile is actually an 87.2nd percentile. I can get this result using my interpretation of how he derives benchmarks, with Neff=110, as shown below:
tanh( qnorm(.9, sd=1/sqrt(140))) # 0.1078893
tanh(qnorm(.872, sd=1/sqrt(110))) # 0.1078820
The combination of allowing for autocorrelation, allowing for pick two daily keno, excluding Luterbacher and using calculated correlations makes the Mann distribution look increasingly like draws from random data. If Neff is reduced to 64, one gets an almost exact match.
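For reference, here is what the 95% hurdle becomes at Neff=64 under the same benchmark construction as the earlier qnorm/tanh lines (the Neff=140 hurdle is repeated for comparison):

```r
# 95% correlation hurdle under the Fisher-transform rule of thumb,
# at the reduced Neff=64 versus the original Neff=140
tanh(qnorm(.95, sd = 1/sqrt(64)))   # about 0.203
tanh(qnorm(.95, sd = 1/sqrt(140)))  # 0.1381269
```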
The article contains no analysis of autocorrelation in this network. I’ve examined many individual series and it is not possible to baldly assert that autocorrelation is “modest”. Autocorrelation varies hugely between “proxies”: it is very low in some series (e.g. Luterbacher) but immense in others. It’s an issue that needed to be worked through. At this point – and these are just notes – it might be pretty hard to show that the observed correlation distribution could not come from pick two daily keno, if a stronger autocorrelation assumption is allowed for.
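To illustrate why the autocorrelation assumption matters so much: one standard textbook-style adjustment for the correlation of two AR(1) series deflates the sample size by (1 - phi1*phi2)/(1 + phi1*phi2). This is a generic sketch, not a reconstruction of Mann’s method, and the lag-one coefficients below are made-up illustrations:

```r
# effective sample size for correlating two AR(1) series with
# lag-one autocorrelations phi1 and phi2 (standard deflation factor)
neff_ar1 <- function(N, phi1, phi2) N * (1 - phi1*phi2) / (1 + phi1*phi2)

neff_ar1(140, 0.1, 0.1)  # genuinely "modest" autocorrelation: about 137
neff_ar1(140, 0.5, 0.5)  # moderate persistence: 84
neff_ar1(140, 0.8, 0.8)  # heavily autocorrelated series: about 31
```

Even moderate persistence in both series pushes the effective sample size well below 140, which is the direction needed to reconcile the observed distribution with random draws.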
There are still some other shoes to drop. All these correlations use Mannian infilled proxy data, Mannian infilled temperature data and truncated Briffa MXD series. If true Briffa MXD values are used, the Briffa MXD correlations are going to drop like the value of a subprime mortgage fund.