"Standardization" and averaging are operations that are done time after time in paleoclimate studies without much discussion of the underlying distributions. If one browses through recent statistical literature on "robust statistics", one finds much sophisticated analysis of how to handle outliers. The term "robust" is commonly used in paleoclimate, but the term as used in paleoclimate is merely a term of self-approval rather than an application of methods known to Hampel or Huber, to mention two prominent practitioners of robust statistics.
The multiproxy studies covering the MWP are all small-sample populations (4-14 series). Only MBH98, which has other problems, and the Briffa et al 2001 MXD network have large populations, and both only go back to 1400. Thus robustness becomes a real issue. I’ve previously noted the fantastic non-normality of key Moberg series. I’ve been reading articles on robust statistics from time to time and, in doing so, ran across Tukey 1960 on contaminated distributions. Tukey was one of the pre-eminent statisticians of the last half-century. Tukey 1960 is relatively accessibly written and is still stimulating today.
Tukey interspersed his article with a series of questions, inviting readers to think about them and then turn the page for the answer. Try to do the same here.
1. Given two normal populations with the same mean, one having three times the standard deviation of the other, it is proposed to prepare a sequence of mixed populations by adding varying small amounts of the wider normal population to the narrower one. It is well known that, in large samples, the relative efficiency of the mean deviation as compared with the standard deviation as a measure of scale is 88% when the underlying population is normal. As specific amounts of the wider normal population are added to the narrower one, thus defining new classes of distributions of fixed shape, will the relative efficiency for scaling of the mean deviation compared to the standard deviation increase, decrease or stay the same?
The relative efficiency of the mean deviation will increase.
2. Will the relative efficiency ever reach 100%? In other words, will the mean deviation be as good a measure of scale as the standard deviation for any of the contaminated populations obtained by mixing two normal populations which have the same mean but whose standard deviations are in the ratio 3:1? Will it never reach 100%, just reach it, or reach it and go beyond?
For some contaminated populations, the mean deviation will be a better estimate of scale, in large samples, than the standard deviation.
3. What fraction (between 0 and 1) of the wider normal population must be added to the narrower one for the mean deviation to be as good a large-sample measure of scale as the standard deviation?
When just less than 0.008 of the mixed population comes from the wider normal population, the mean deviation has the same large-sample precision as a measure of scale as the standard deviation. (Note: Do not judge an answer which deviates widely from 0.008 harshly. Many distinguished and experienced statisticians have given answers between 0.15 and 0.25.) A simulation sketch following these questions illustrates the crossover.
4. Can I expect to know whether or not the population from which I draw large samples deviates from normality as much as, say, a contaminated population containing 0.008, 0.02 or 0.05 of the wider normal? If I cannot, what should I do about estimating the scale of an actual population from a large sample?
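Before turning to Tukey’s answer to this fourth question, here is a minimal Monte Carlo sketch in R of Questions 1-3 (my own, not Tukey’s: the sample size, the number of replications and the use of squared coefficients of variation to measure relative efficiency are all illustrative assumptions).

## Sketch: relative efficiency of the mean deviation vs the standard deviation
## as estimators of scale for a 3:1 contaminated normal mixture.
## n, nrep and the efficiency measure are illustrative assumptions.
rel.eff <- function(eps, n = 1000, nrep = 2000) {
  sd.est <- md.est <- numeric(nrep)
  for (i in 1:nrep) {
    wide <- runif(n) < eps                   # which draws come from the wider normal
    x <- rnorm(n, sd = ifelse(wide, 3, 1))   # contaminated mixture of N(0,1) and N(0,9)
    sd.est[i] <- sd(x)                       # standard deviation
    md.est[i] <- mean(abs(x - mean(x)))      # mean (absolute) deviation
  }
  # both estimate "scale" only up to a constant, so compare squared
  # coefficients of variation; a ratio above 1 favours the mean deviation
  (var(sd.est) / mean(sd.est)^2) / (var(md.est) / mean(md.est)^2)
}
set.seed(1)
sapply(c(0, 0.002, 0.008, 0.02, 0.05), rel.eff)

If Tukey is right, the output should start near the textbook 88% at zero contamination, pass 100% near a contamination fraction of 0.008, and climb well above it thereafter.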
Tukey’s recommendations should cause great concern to paleoclimatologists accustomed to subtracting the mean and dividing by the standard deviation of a calibration period.
“Clearly the second moment which corresponds to the standard deviation is the least safe of all. Its use can only be recommended when [the contamination fraction] is far less than 0.01 and we are rarely sure of this…
Probably the most promising of the alternatives shown, if substantial [contamination] is to be feared, are (i) the mean of exp(-x^2/4) and, if [the contamination fraction] is quite likely to be <0.07 say, (ii) the 2%-truncated variance. …
It is hard to imagine a situation where contamination would appear and yet appear in such small amounts as to make the standard deviation either as good as the mean deviation or nearly as good as the 2% truncated standard deviation, at least insofar as variability of scaling goes…
Because of practical questions of computing [this was 1960], we may find averaging exp(-x^2/4) uncomfortable. If so, then the most reasonable solutions for this problem are:
1) the truncated variance and its square root, the truncated standard deviation, with 2% to 5% of the observations deleted from each tail.
2) the mean deviation.
Since the variance (and its square root, the standard deviation) cannot be more than 11% better than the 2%-truncated variance, while it can be 140% worse (for [a contamination fraction of] 0.05), the variance will never be a safe choice.
One of Tukey’s conclusions: Nearly imperceptible non-normalities may make conventional relative efficiencies of estimates of scale and location entirely useless… If contamination is a real possibility (and when is it not?), neither the mean nor the variance is likely to be a wisely chosen basis for making estimates from a large sample.
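To make the contrast concrete for the standardization step mentioned above, here is a minimal R sketch of subtracting a truncated mean and dividing by a truncated standard deviation over a calibration period. The proxy series, the 1902-1980 calibration window and the 2% trimming fraction are all illustrative assumptions, not taken from any particular study.

## Sketch: standardization using a Tukey-style truncated mean and truncated sd
standardize.robust <- function(x, calib, trim = 0.02) {
  y <- sort(x[calib])
  k <- floor(trim * length(y))                 # observations deleted from each tail
  core <- y[(k + 1):(length(y) - k)]
  (x - mean(core)) / sd(core)                  # truncated mean and truncated sd
}
# usage with a made-up annual series for 1000-1980 calibrated on 1902-1980
years <- 1000:1980
proxy <- rnorm(length(years))                  # stand-in for a real proxy series
z <- standardize.robust(proxy, calib = years >= 1902)

The conventional version simply uses mean(x[calib]) and sd(x[calib]); on a contaminated series the two can give quite different scalings.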
Now look at the qqnorm plots of (say) the Moberg series. Here we are far from "nearly imperceptible non-normalities". We have gross non-normality.
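For anyone who wants to see what such a plot looks like, here is a quick R sketch using a stand-in contaminated series rather than the actual Moberg data:

set.seed(2)
x <- ifelse(runif(500) < 0.05, rnorm(500, sd = 3), rnorm(500, sd = 1))  # 5% contaminated
qqnorm(x, main = "Normal Q-Q plot, contaminated series")
qqline(x)    # gross departures show up as heavy tails peeling away from the line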
A truncated mean applied to the MBH98 roster would have "thrown out" the MBH98 PC1 from the fifteenth-century roster, together with one series on the other side, and yielded an entirely different result (with a high 15th century).
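For illustration, a trimmed composite can be computed in R as follows. The roster here is a made-up years-by-series matrix, not the actual MBH98 network, and the 10% trimming fraction is simply what drops one series from each tail of a 14-series roster.

## Sketch: a truncated (trimmed) mean across a roster of standardized series
trimmed.composite <- function(roster, trim = 0.1) {
  apply(roster, 1, mean, trim = trim)   # mean(..., trim=) drops a fraction from each tail, year by year
}
roster <- matrix(rnorm(500 * 14), nrow = 500, ncol = 14)   # hypothetical 500-year, 14-series roster
plain  <- rowMeans(roster)              # the usual simple average
robust <- trimmed.composite(roster)     # drops the most extreme series in each year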
In the hands of later writers (e.g. Hampel), one of the objectives of analysis was to identify outliers and then determine on scientific grounds whether they should be included. This would lead to the type of analysis carried out in MM05b, where we considered specialist opinion on the validity of bristlecones as a proxy. The only effort to reclaim bristlecones has come from Rob Wilson in a posting to climateaudit; there is none in the peer-reviewed literature. But regardless, it’s pretty evident that Tukey would have had no truck with a result which could not be replicated with a truncated mean.
Update: Here are the qqnorm plots for Moberg, previously posted here.
Moberg has a knife-edge difference between MWP and modern levels, precariously balanced so that they “show a profit”. The most important contributors to the modern-MWP difference are the Arabian Sea diatoms, the Agassiz melt series and the Yang China composite (imprinted by Thompson’s Dunde series). The first and third are arguably precipitation proxies. The first and second are not calibrated to temperature; the raw non-normal series are used instead. It is inconceivable that any attention was paid here to compliance with Nature’s statistical policies.
Tukey, J.W. 1960. A survey of sampling from contaminated distributions. In I. Olkin et al. (eds.), Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Stanford University Press.