]]>[note 3 Feb 2009: due to an off-by-one error in the degrees of freedom that were used to enter significance thresholds into the code, the effective P values used in the low-frequency screening are slightly higher (P=0.11 to P=0.12) than the nominal (P=0.10) cited value. This actually brings the decadal screening threshold closer to the annual screening threshold (P approximately 0.13 when serial correlation is accounted for, as discussed in the Supplementary Information document)].

http://signals.auditblogs.com/files/2008/10/zi670.txt

and corresponding grid instrumental,

http://signals.auditblogs.com/files/2008/10/xi670.txt

n=146, so t = tinv(0.975,146-2) = 1.98 . Some other values are needed,

Resulting local reconstruction with 95 % CI is:

As you can see, intervals computed this way are not very short. For year 1000 (Z = -1.43), the interval is -8.6 … 0.52 °C.

(*) glon(i)= -72.5000, glat(i) -43.5000, name ?

]]>Re: RomanM (#54),

The values calculated by Matlab correspond to exact distribution values that are contained in tables in statistical text books, so perhaps Prof. Mann could provide an explanation of where the values for 8 and 13 df come from.

Maybe he applied Monte Carlo to get those values?

Anyway, for i=1885 r=-0.02 and for i=1932 r=-0.005, so they shouldn’t enter gridboxcps.m. Maybe the selection between the two closest grid points saves them? In addition, there are 3 other series (i = 978, 1813, 1859) where the information content is too weak to construct a 95% CI (using Brown’s).

In the case of testing for correlation equal to zero (or equivalently when the slope β = 0), that statistic is (using a little algebra) identical to the one you give and has an exact t-distribution.

Thanks, exactly what I was looking for. Same test applies, whether x is stochastic (analysis of joint distribution of x and y ) or not (analysis carried out conditionally, given x ).
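The equivalence described above (the slope-based t statistic for β = 0 equals the correlation-based one) can be checked numerically. The following is only a sketch in plain Python: the data are made up for illustration, and the function name `slope_t_and_r_t` is hypothetical, not taken from any of the code discussed in this thread.

```python
import math

def slope_t_and_r_t(x, y):
    """Return (t from OLS slope / its standard error, t computed from r alone)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

    # Ordinary least squares slope and its standard error (n - 2 dof).
    slope = sxy / sxx
    sse = syy - slope * sxy              # residual sum of squares
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_slope = slope / se_slope

    # The same statistic written in terms of the sample correlation only.
    r = sxy / math.sqrt(sxx * syy)
    t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t_slope, t_r

# Made-up illustrative data (not proxy or instrumental values).
x = [0.1, 0.4, 0.35, 0.8, 1.2, 0.9, 1.5, 1.1]
y = [1.0, 0.7, 1.4, 1.1, 2.0, 1.3, 1.9, 2.4]
t_slope, t_r = slope_t_and_r_t(x, y)
print(t_slope, t_r)
```

The two numbers agree up to floating-point error, because the residual sum of squares equals syy·(1 − r²), so the slope-based statistic reduces algebraically to r·sqrt(n−2)/sqrt(1−r²).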

]]>The t statistic formula can be inverted as

r = sqrt( t^2 / (df + t^2) )

where df = n-2 to calculate critical values for proxy acceptance. I suspect that that is what was supposedly done (using a table?) and then commented out of the program. I calculated the values using Matlab (not copyrighted – can be used without attribution 😉 ):

df = [8, 13, 98, 144];

tval = tinv(.9, df)

ans = 1.3968 1.3502 1.2902 1.2875

sqrt((tval.^2)./(df+(tval.^2)))

ans = 0.4428 0.3507 0.1292 0.1067

The corresponding critical values given in the SI were

.42, .34, .13, .11.
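For readers without Matlab, the inversion step can be sketched in plain Python. This assumes the t quantiles are simply copied from the `tinv(.9, df)` output printed above (the standard library has no inverse t distribution); the function names `r_from_t` and `t_from_r` are hypothetical.

```python
import math

df = [8, 13, 98, 144]
# t quantiles copied from the tinv(.9, df) output printed in the comment.
tval = [1.3968, 1.3502, 1.2902, 1.2875]

def r_from_t(t, dof):
    """Critical correlation for a given critical t: r = sqrt(t^2 / (dof + t^2))."""
    return math.sqrt(t * t / (dof + t * t))

def t_from_r(r, dof):
    """Forward direction: t = r * sqrt(dof) / sqrt(1 - r^2), with dof = n - 2."""
    return r * math.sqrt(dof) / math.sqrt(1 - r * r)

r_crit = [r_from_t(t, d) for t, d in zip(tval, df)]
print([round(r, 4) for r in r_crit])

# Round trip: converting each r back should recover the original t.
t_back = [t_from_r(r, d) for r, d in zip(r_crit, df)]
print([round(t, 4) for t in t_back])
```

The printed correlations match the Matlab values above to within rounding, which is a quick way to see that the discrepancy with the SI is confined to the 8 and 13 df entries.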

The values calculated by Matlab correspond to exact distribution values that are contained in tables in statistical text books, so perhaps Prof. Mann could provide an explanation of where the values for 8 and 13 df come from. By the way, did anyone notice that, in several places in the SI, the word “degrees” was replaced by the symbol for degrees (as in angle), e.g. “n = 8° of freedom”? Ah … CliSci stat notation!

There was another item in the SI that I found particularly bothersome:

Although 484 ( 40%) pass the temperature screening process over the full (1850–1995) calibration interval, one would expect that no more than 150 ( 13%) of the proxy series would pass the screening procedure described above by chance alone. This observation indicates that selection bias, although potentially problematic when employing screened predictors (see e.g. Schneider (5); note, though, that in their reply, Hegerl et al. (10) contest that this is actually an issue in the context of their own study), does not appear a significant problem in our case.

Since a spurious proxy will pass the test when the absolute value of its correlation exceeds the critical value, this means that such false proxies will be accepted not 13% of the time, but twice that or 26% of the time (13% on the positive side plus 13% more on the negative side). Anyone who has taken elementary statistics would realize that the significance level of a two-sided test (done on the proxies) is double that of the one-sided test when the same critical value is used. So according to the calculations in the SI, there could likely be as many as 300 proxies that have gotten in “by chance alone” and are uncorrelated with the temperature.
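The doubling argument can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: both series are independent white noise (no serial correlation, so the one-sided rate comes out near 10% rather than the SI's 13%), n = 146 years as in the full calibration interval, and the critical value 0.1067 is the 144 df figure computed earlier in the thread. The names `corr` and `pass_rates` are hypothetical.

```python
import math
import random

def corr(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pass_rates(n_years=146, r_crit=0.1067, trials=2000, seed=1):
    """Fraction of pure-noise 'proxies' passing a one-sided and a two-sided screen."""
    rng = random.Random(seed)
    one_sided = two_sided = 0
    for _ in range(trials):
        proxy = [rng.gauss(0, 1) for _ in range(n_years)]
        temp = [rng.gauss(0, 1) for _ in range(n_years)]
        r = corr(proxy, temp)
        one_sided += r > r_crit          # sign specified a priori
        two_sided += abs(r) > r_crit     # either sign accepted
    return one_sided / trials, two_sided / trials

one, two = pass_rates()
print(one, two)

# Arithmetic from the comment: 13% versus 26% of the 1209 listed proxies.
n_proxies = 1209
chance_one = 0.13 * n_proxies   # about 157
chance_two = 0.26 * n_proxies   # about 314
print(round(chance_one), round(chance_two))
```

With the same critical value, the two-sided pass rate for noise is roughly twice the one-sided rate. The 314 figure applies only if every proxy were screened two-sided, which is the assumption made in the comment above.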

]]>Puzzled I am,

has a Student-t distribution on (n-2) dof, and

how do I transform t-test to be based on r only ?

(Why I am asking: I need statistically significant slopes to build satisfactory CIs for calibration.)

((Some proxies that enter gridboxcps (i=1885, i=1932) do not pass any of those r criteria Mann mentions.))

Don’t forget that 71 Luterbacher series have temp data included. Also the 95 accepted Schweingruber have 38 years of infilled high correlation data on the end.

]]>It makes a substantial difference for K. Hamed’s calculation in #49 which, if either, sign is expected. Furthermore, Kendall’s tau as used by K. does not take into account the “best of two” procedure that pre-picks cherries with big correlations.

This is correct, I used a one-sided test for all proxies in #49. A two-sided test would eliminate even more proxies.

]]>Where the sign of the correlation could a priori be specified (positive for tree-ring data, ice-core oxygen isotopes, lake sediments, and historical documents, and negative for coral oxygen-isotope records), a one-sided significance criterion was used. Otherwise, a two-sided significance criterion was used.

Thanks, Jean! My assumption in #45 that all “should” be positive was clearly wrong.

Would treering MXD’s be a priori positive, or two-sided?

The Andalusia/Serengeti precipitation record is an “historical record”, yet is accepted despite its negative sign, so this can’t quite be a complete list of the assumed signs. There doesn’t seem to be a field for this important factor on the XLS listing of 1209 proxies.

It makes a substantial difference for K. Hamed’s calculation in #49 which, if either, sign is expected. Furthermore, Kendall’s tau as used by K. does not take into account the “best of two” procedure that pre-picks cherries with big correlations. If adjacent gridcells were perfectly correlated, this would not make a difference, but Steve’s figures show they are not.

]]>