## Allen and Tett 1999

I’ve posted up a pdf for Allen and Tett 1999 here, as this seems to be a frequently cited article establishing that "optimal fingerprinting" is linear regression, and it gives a flavor of the literature. The approach looks to me like pretty garden-variety methodology, such as one would see in the fall term of an econometrics course. It’s hard to believe that this is the Royal Society’s "advanced statistical methods" – I wonder if they checked this with any statisticians.

1. Willis Eschenbach
Posted Sep 24, 2006 at 1:29 AM | Permalink

Some interesting stuff from that paper:

4 Consistency checks to detect model inadequacy

Having framed the optimal fingerprinting algorithm as a linear regression problem, a variety of simple checks for model adequacy immediately present themselves, drawn from the standard statistical literature. For simplicity, following Hasselmann (1997) we will focus on parametric tests based on the assumption of multivariate normality. To judge from the analyses we have performed to date, the assumption of normality is likely to be reasonably close to valid for temperature data on large spatio-temporal scales. Assuming normality for other data types (such as precipitation) would be more problematic.

Right, temperature series are normal … we have the HadCRUT3 monthly global temperature series, for example, on the largest spatio-temporal scale possible, which is incredibly non-normal, Jarque-Bera test says X-squared = 60.5801, df = 2, p-value = 7.006e-14 … in other words, non-normal at the 99.999999999999% level … even detrended it is still non-normal at the 99% level.

Or how about the Kaplan North Atlantic monthly SST series, undetrended it’s non-normal at the 99.9999999etc level, detrended it’s still non-normal at the same level.
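For readers who want to reproduce this kind of check, here is a minimal sketch of the Jarque-Bera test in Python (Willis is presumably using R); the series below are synthetic stand-ins, not the actual HadCRUT3 or Kaplan data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# A normal sample should pass the Jarque-Bera test; a skewed sample should fail.
normal_sample = rng.normal(size=1000)
skewed_sample = rng.exponential(size=1000)  # stand-in for a skewed climate series

jb_norm, p_norm = stats.jarque_bera(normal_sample)
jb_skew, p_skew = stats.jarque_bera(skewed_sample)

print(f"normal sample: JB = {jb_norm:.2f}, p = {p_norm:.3f}")
print(f"skewed sample: JB = {jb_skew:.1f}, p = {p_skew:.2e}")
```

The statistic combines sample skewness and excess kurtosis, and is asymptotically chi-squared with 2 degrees of freedom, so large values of either moment drive the p-value toward zero.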

Then there’s the null hypothesis statement, which immediately follows the previous statement:

Our null-hypothesis, H0, is that the control simulation of climate variability is an adequate representation of variability in the real world in the truncated state-space which we are using for the analysis, i.e. the subspace defined by the first k EOFs of the control run does not include patterns which contain unrealistically low (or high) variance in the control simulation of climate variability. Because the effects of errors in observations are not represented in the climate model, H0 also encompasses the statement that observational error is negligible in the truncated state-space (on the spatio-temporal scales) used for detection. A test of H0, therefore, is also a test of the validity of this assumption.

I love this one. Their null hypothesis is that the climate model works just fine, and if they can’t disprove the null hypothesis, why, everything’s just dandy … wonder how much time they spent trying to disprove the null hypothesis …

My null hypothesis, on the other hand, is that the climate models don’t work for sh*t, and it’s up to the modelers to prove otherwise.

Then we have the way they test their null hypothesis …

We formulate a simple test of this null-hypothesis as follows: if H0 is true then the residuals of regression [of the model results on the data] should behave like mutually independent, normally distributed random noise in the coordinate system …

Umm … well, OK. Here are the model results from the Santer et al. paper published in Science magazine, Amplification of Surface Temperature Trends and Variability in the Tropical Atmosphere, that was supposed to provide a “fingerprint” of tropical tropospheric warming. These are the models, and the non-normality of the residuals. Sorry for all the decimals, but I had to put them in to show how non-normal the residuals are:

CM2.1 , 99.9999898480390%
UKMO , 99.9999999999258%
M_hires , 99.9999999999513%
CM2.0 , 87.7303407597607%
CCSM3 , 99.9999965856309%
GISS-EH , 99.9999999999999%
GISS-ER , 99.9999999999999%
M_medres , 99.9999999892319%
PCM , 99.9999918309551%

Only one of these models (CM2.0) has even vaguely normally distributed residuals (we only have 87% confidence it’s not normal, so we can’t reject it), and that model gave wildly wacky results, with huge plunges above and below the actual data.

In addition, the tropical NOAA SST data, as well as the tropical HadCRUT SST data, both of which were used in the study, are non-normal at the 99.99999999% and 99.9999999% levels respectively. (As an aside, only one of these models (HadCRUT) showed a significant correlation with the data, p less than 0.05.)

This was the study that famously said that:

These results suggest that either different physical mechanisms control amplification processes on monthly and decadal timescales, and models fail to capture such behavior, or (more plausibly) that residual errors in several observational datasets used here affect their representation of long-term trends.

Oh, right, the models disagree with the data, so it is more plausible to think that the data is wrong …

How do these guys get away with this stuff?

w.

2. Pat Frank
Posted Sep 24, 2006 at 2:20 AM | Permalink

#1 — “How do these guys get away with this stuff?

That’s the $64,000 question dogging all of climate science.

3. bender
Posted Sep 24, 2006 at 5:11 AM | Permalink

It’s hard to believe that this is the Royal Society’s “advanced statistical methods”

That it’s marketed as “advanced” probably tells you something about the “unadvanced” methods that came before it.

4. Willis Eschenbach
Posted Sep 24, 2006 at 5:12 AM | Permalink

Given that much of the temperature data is non-normal, in some cases greatly so, I don’t see how their test for normal residuals would ever get passed. I’ve been looking at the Kaplan North Atlantic SST data, which is wildly non-normal, both trended (Jarque-Bera test, non-normal at p = 4e-35) and detrended (non-normal at p = 4e-37).

Having been unsuccessful with a linear detrend, I thought that if I instead detrended it with, say, a six-year gaussian smoothing of the data, it would have normal residuals. But no joy, the gaussian residuals were non-normal as well (p = 9e-26, moving in the right direction but a long way from there). This is despite the fact that the gaussian average itself is also significantly non-normal (p = 4e-37).

So I tried removing the average monthly anomalies from the linearly detrended data; that was the best yet, but it didn’t work either, still non-normal at p = 0.001. How about the gaussian-detrended data minus the gaussian monthly average anomalies? Getting close, p = 0.02, but still non-normal.

OK, how about detrending it with a longer gaussian smoothing, maybe 12 years, then removing the monthly average residuals … whoa, I did it. We can’t reject the null hypothesis that the residuals are normal, p = 0.13. Of course, this means we have 87% confidence that they are in fact non-normal … but we can’t reject the hypothesis. And lengthening the smoothing filter beyond that doesn’t improve matters.

I’m fresh out of ideas … but I’d sure like to see the climate model that can successfully pass their test regarding emulating the Kaplan North Atlantic SST record …
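As a sanity check on the recipe described above (smooth-detrend, then remove the mean monthly anomaly, then test the residuals), here is a sketch on a synthetic monthly series; the trend, seasonal amplitude, and noise level are made-up stand-ins, not the Kaplan data:

```python
import numpy as np
from scipy import stats
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(42)
n = 600  # 50 years of synthetic monthly data

t = np.arange(n)
seasonal = 1.5 * np.sin(2 * np.pi * t / 12)  # annual cycle
trend = 0.001 * t                            # slow warming trend
series = trend + seasonal + rng.normal(scale=0.3, size=n)

def jb_p(x):
    """p-value of the Jarque-Bera normality test."""
    return stats.jarque_bera(x)[1]

# 1. Linear detrend only: the seasonal cycle survives, so the residuals
#    are strongly bimodal and fail the normality test.
lin_resid = series - np.polyval(np.polyfit(t, series, 1), t)

# 2. Gaussian-smooth detrend plus removal of the mean monthly anomaly,
#    roughly the recipe described in the comment above.
resid = series - gaussian_filter1d(series, sigma=72)
monthly_mean = resid.reshape(-1, 12).mean(axis=0)
final_resid = resid - np.tile(monthly_mean, n // 12)

print(f"linear detrend only:      JB p = {jb_p(lin_resid):.2e}")
print(f"smooth + monthly anomaly: JB p = {jb_p(final_resid):.3f}")
```

On data like this, where the non-normality comes entirely from the trend and the seasonal cycle, the two-step recipe recovers roughly normal residuals; the point of the comment is that the real SST data resists even this.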

w.

5. Douglas Hoyt
Posted Sep 24, 2006 at 6:14 AM | Permalink

Willis, it would be helpful if you showed a histogram of one of these residuals compared to a normal distribution.

6. Posted Sep 24, 2006 at 9:24 AM | Permalink

Willis, this is all publishable stuff, it seems. Why don’t you write it up?

7. Tim Ball
Posted Sep 24, 2006 at 12:26 PM | Permalink

#2
One major way appears to be that they peer review each other’s papers, which is why they keep stressing that those who question them are, for the most part, not peer reviewed.

8. TCO
Posted Sep 24, 2006 at 7:18 PM | Permalink

For the most part, those who are criticizing are not bothering to put ass to chair seat and write papers. And when they do, they too often look like BC06 or Steve’s PPT presentation for the AGU: abortions. Let’s ditch the “hating the journals” when people are not even trying to get published.

9. Posted Sep 25, 2006 at 8:20 AM | Permalink

TCO,

I have to agree. Scientific revolutions are won on the battlefield of scientific journals. What many of the critics are doing is similar to guerrilla warfare: don’t face the enemy in the open, just stay on the fringe and strike here and there, claiming victory each time, but without really making a dent in the established regime. On the other hand, every published paper is another battle won. AGW proponents have understood that right from the start.

10. rwnj
Posted Sep 25, 2006 at 7:16 PM | Permalink

I have not followed the technical details that have been developed on this website, so I apologize if this has been answered somewhere else. I have two related comments:
1. PCA is a calculation performed on a covariance matrix. In a typical application, the covariance matrix is an estimate of the “true” covariance of the system which is derived from a finite number of observations of the system. Also in the typical application, the covariance matrix is estimated with the sample means removed. This can be shown to be the optimal estimate (with respect to likelihood) under some simple assumptions. If the sample means are not used to center the covariance estimate, then the estimate is simply not optimal with respect to likelihood. Perhaps an “uncentered” estimate is optimal with respect to some other data model. What is that data model?
2. Why are not all of these discussions prefaced with a description of an assumed data model? (e.g., the signal is Brownian motion with variance 1 per unit time and the noise is uncorrelated normal with mean 0 and variance 2). With an explicit data model, the discussion divides into two parts: is the data model appropriate and does the statistical technique diminish the noise and enhance the signal?

Regards,

rwnj

11. Posted Sep 26, 2006 at 1:14 AM | Permalink

If the model simulation of internal variability is correct, the variance in the response patterns from an M-member ensemble is approximately 1/M times the variance in the observations (exactly so if all distributions are Gaussian).

Lets see, Fisz (1980) Probability Theory and Mathematical Statistics:

Theorem 3.6.3. The variance of the sum of an arbitrary finite number of independent random variables, whose variances exist, equals the sum of their variance.

Didn’t find a theorem that explains ‘exactly so if all distributions are Gaussian’.
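The 1/M claim itself is easy to check numerically, and by Fisz’s theorem it needs only independence and finite variance, not Gaussianity: the variance of a mean of M independent members is 1/M times a single member’s variance. A sketch with arbitrary ensemble size:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 10            # ensemble size
n_trials = 20000  # number of independent ensembles

# Each member is independent unit-variance noise; the "response pattern"
# is the mean across the M members of each ensemble.
members = rng.normal(size=(n_trials, M))
ensemble_mean = members.mean(axis=1)

print(f"single-member variance: {members.var():.3f}")        # ~ 1.0
print(f"ensemble-mean variance: {ensemble_mean.var():.3f}")  # ~ 1/M
```

The same result holds if the members are drawn from, say, a uniform or exponential distribution, which underlines the commenter’s point: nothing here is special to the Gaussian case.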

But this is more important: they use ‘control integrations’ to estimate the covariance matrix of the ‘climate noise’, just like IDAG. Mann uses ad hoc spectrum smoothing. It is not detection, it is circular reasoning. Useless.

(and what is a rank-L vector?)

12. Willis Eschenbach
Posted Sep 26, 2006 at 3:22 AM | Permalink

Re #5, Doug, you asked for a histogram of the Kaplan dataset. Here ’tis …

Like I said … radically non-normal …

The problem is that the earth’s climate is chaotically bi-stable (or more properly multistable). Think for example of the PDO, or in this case, the AMO. The datasets from these chaotically bi-stable distributions are typically “humped”, with a concentration of data on both sides of the overall average. This gives us non-normal distributions.
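The point about “humped” data is easy to illustrate: pool samples from two regimes with different means and the Jarque-Bera test rejects normality even though each regime is itself Gaussian. A sketch with made-up regime means and scales:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# A "bi-stable" system spends time around two different means; pooled
# together, the data is two-humped and non-normal even though each
# regime is Gaussian on its own.
warm = rng.normal(loc=+1.0, scale=0.4, size=500)
cool = rng.normal(loc=-1.0, scale=0.4, size=500)
pooled = np.concatenate([warm, cool])

jb, p = stats.jarque_bera(pooled)
print(f"JB = {jb:.1f}, p = {p:.2e}")  # normality strongly rejected
```

Here the rejection is driven by kurtosis: a symmetric two-humped mixture is platykurtic (flatter-tailed than a Gaussian), which the test picks up even with zero skew.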

w.

13. Posted Sep 26, 2006 at 3:53 AM | Permalink

#1,12

How do you test normality of time series? i.i.d case is easy, but what if the correlation in time is high?

14. Steve McIntyre
Posted Sep 26, 2006 at 7:19 AM | Permalink

#10. I agree. You’d think that people purporting to make reconstructions based on a methodology such as PCA would show the applicability of the methodology. However, the original article was published in Nature, and the referees did not require any demonstration of the applicability of a “novel” methodology. So for a full explanation of the phenomenon, you’d have to ask Nature.

15. bender
Posted Sep 26, 2006 at 7:28 AM | Permalink

Re #13
Tests of normality are available in most standard stats packages.
Shapiro-Wilk’s test is one. In R, use function:

shapiro.test(x)

which assumes iid, of course.

If the series is nonstationary, then there is by definition more than one distribution. If the system is bistable, then there are 2 distributions – which you can estimate if you have a long enough data series and if you eliminate the transient phase where states are switching. If the series are autocorrelated, then you can use pre-whitening to remove the red.
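A sketch of the pre-whitening step, in Python rather than R (scipy.stats.shapiro plays the role of shapiro.test); the AR(1) coefficient and sample size are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, phi = 500, 0.8

# Simulate a red (AR(1)) series driven by Gaussian noise.
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

# Pre-whiten: estimate the lag-1 coefficient, then form e_t = x_t - phi_hat*x_{t-1}.
phi_hat = np.corrcoef(x[:-1], x[1:])[0, 1]
resid = x[1:] - phi_hat * x[:-1]

# Shapiro-Wilk assumes i.i.d. data, so it is only properly calibrated
# on the pre-whitened residuals, not on the raw red series.
w, p = stats.shapiro(resid)
print(f"phi_hat = {phi_hat:.2f}, Shapiro-Wilk p on residuals = {p:.3f}")
```

Note that a Gaussian-driven AR(1) series has Gaussian marginals anyway; the problem with testing the raw series is the violated independence assumption, which pre-whitening repairs.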

16. bender
Posted Sep 26, 2006 at 7:31 AM | Permalink

Re #14 Steve M, that’s what my supervisory committee asked me to do when as a grad student I proposed using PCA to extract signals from a network of tree ring data. They wanted proof of concept before any manuscripts were written. (But of course, there were no climatologists on my committee.)

17. Steve McIntyre
Posted Sep 26, 2006 at 8:45 AM | Permalink

#16. No wonder this Mann stuff seems so bizarre to you. Precautions taken by your advisory committee (presumably some time ago) were completely thrown to the winds by Nature and IPCC, followed by a scorched-earth policy by the Team of deny, deny, deny.

18. bender
Posted Sep 26, 2006 at 9:12 AM | Permalink

Re #17
That’s how I knew from the start (i.e. from the time the bizarre off-centering method was revealed) that your criticisms of their methods and data were spot on. Their reluctance to release the new bristlecone pine data is, as you have said many times, suspicious. Their reluctance to admit how statistically non-independent these “independent” multiproxy reconstructions are is telling. That they fail to understand Wegman’s primary point – that the community is too inbred and too far removed from the statistics community – resonates with my observations. They say they’ve “moved on”, but look at the faulty statistical methods still being used in the analysis of the hurricane data. Detection and attribution is becoming non-scientific as they become single-minded in their focus on the A in AGW and unwilling to consider alternative hypotheses.

Then again, maybe climatology never was a science. (In the Popperian sense of growing knowledge through iterative conjecture & refutation.)

19. Martin Ringo
Posted Sep 26, 2006 at 10:06 PM | Permalink

Re: tests of normality – 4 notes

1) I am about 20 — OK, 30 — years out of date on distribution tests, but the view within the community – say 1975 – was that distribution tests didn’t work well in practice, which led to their being ignored.

2) Be careful with the Shapiro-Wilk test for data with multiple observations with the same value. The test is based on order statistics, which are obviously sensitive to ties.

3a) If you are testing a variable Y for normality of the levels, you need to account for the ARMA structure in the model and then test the residuals — a point that is pretty easy to see when you plot the realizations of an AR(1) series with coefficient 0.5 next to a series of i.i.d. N(0,1) of the same sample size.
3b) If Y(t) = X(t)*B + e(t) where e(t) is i.i.d. N(0,sigma^2) but X is not normal, Y won’t be normal either. Thus again one has to look to the residuals for the test of normality to have any meaning.

4) The error term in a linear model does not have to be normal for the Gauss-Markov theorem to hold, only that the error is independently and identically distributed with zero mean and finite variance. Of course, the significance of the coefficients can no longer be read straight from the tables (t, F or Chi-Square), which presume normally distributed errors. In practice most applied econometric studies do not test for normality. They do test for serial correlation and heteroskedasticity in the residuals. (And most readers of said studies do a little implicit discounting of the significance, both for lack of normality and for data mining.) A “good” model has clean residuals: they will look roughly like white noise on a plot, although they will probably fail most tests of normality.
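Point 3b is easy to demonstrate: regress Y on a non-normal X with normal errors, and the levels of Y fail a normality test while the OLS residuals look far closer to normal. A sketch with an arbitrary skewed regressor:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 2000

# Y = X*b + e: normal i.i.d. errors e, but a skewed (non-normal) regressor X.
x = rng.exponential(size=n)
e = rng.normal(scale=0.5, size=n)
y = 2.0 * x + e  # Y inherits the non-normality of X

# Testing the *levels* of Y is the wrong place; test the OLS residuals.
b, a = np.polyfit(x, y, 1)
resid = y - (b * x + a)

print(f"JB p-value, levels of Y:   {stats.jarque_bera(y)[1]:.2e}")
print(f"JB p-value, OLS residuals: {stats.jarque_bera(resid)[1]:.3f}")
```

The levels of Y are wildly skewed because X is, yet the regression residuals are just the normal errors, so only the residual test says anything about the error distribution.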

20. Posted Sep 26, 2006 at 11:37 PM | Permalink

#19

Good points. Should add to 4) that if the error distribution is unknown, 2-sigma and 3-sigma limits are quite useless.

We should avoid making the same mistake as Mann & Jones do in the More on the Arctic post (5-sigma events in correlated time series).

21. bender
Posted Sep 27, 2006 at 12:44 AM | Permalink

Re #20

We should avoid making same mistake as Mann & Jones

You mean the mistake of assuming that the error distribution is (i) known and (ii) homogeneous?

22. bender
Posted Sep 27, 2006 at 12:46 AM | Permalink

Re #12
I am intrigued by Willis’ confident assertion in #12 of [chaotic] multi-(bi-)stability.

1. Mathematically & intuitively I understand the concept. But in reality, doesn’t local bistability in a network of n cells across a globe imply the system actually has 2^n super-states? i.e. ENSO, AMO, PDO are just big, resilient chunks of a superstate. If the globe is warming and we are passing from one superstate to the next, then some of these chunks will be more resilient to change over time than others. i.e. The illusion of large-scale bistability persists for some time, until familiar chunk after familiar chunk is finally broken down and reconstituted to form new and different chunks that characterize the new superstates. If this is the shape GW will take, then the notion of bistability and metastability seem not very useful. [Apologies for vagueness, imprecision, ambiguity. I’m trying my best with my limited toolkit.]

2. Are these systems really locally bistable? Or is this largely an illusion enhanced by the way warm & cool waters vertically separate? i.e. It’s not a single variable system with two states, but a two variable system with continuous states. [Again, apologies for the bandwidth-consuming musings. It makes sense to the writer, but probably sounds incoherent to the reader. Even after substantial efforts in editing.]

I’m willing to read if anyone’s got suggestions.

23. Willis Eschenbach
Posted Sep 27, 2006 at 1:30 AM | Permalink

Re #22, bender, thanks for the interesting question. I assert it based on the existence of a variety of “oscillations”, which is climatespeak for a couple of separate stable states.

Take for example the PDO. Here’s the correlation of the PDO with the SST:

And here’s the PDO index, from here.

Note the stable period between 1945 and 1975. That’s what I mean by “bi-stable” … but I’m willing to learn. In any case, I’m not sure I agree that there are 2^n superstates. Rather, I would say that there are various sub-systems, many of which have more than one stable state.

w.

24. Posted Sep 27, 2006 at 1:33 AM | Permalink

re #21

RC:

The April mean temperature is almost 5 standard deviations above the mean, a “5 sigma event” in statistical parlance. Under the assumption of stationary ‘normal’ statistics, such an event is considered astronomically improbable.

So, sample mean and sample std computed from 1961-1990 data. Then it is observed that 2006 value is 5 sample standard deviations above the sample mean. And that is astronomically improbable, they say.

Mistake: stationary does not mean i.i.d. It means that all of the distribution functions of the process are unchanged regardless of the time shift applied to them (let’s be strict here).

I think no one disagrees that there are non-zero autocorrelations in temperature data. The Jarque-Bera test assumes a random sample, and an autocorrelated series won’t necessarily qualify (as noted in #19, point 3a). And the sample mean and sample std won’t tell much if the samples are not random.

Maybe I should put it this way: in #1 Willis shows that the way Mann & Jones interpret their data is wrong. But now I’m confusing people (and myself), so I need to stop.

related discussion here (http://www.climateaudit.org/?p=678)

25. bender
Posted Sep 27, 2006 at 1:48 AM | Permalink

Re #23
Fair enough. Bistability is useful conceptually, but in practice it has its limits. Worth mentioning because I sometimes get the sense that skeptics take the “bistability” proposition to mean there is a hard ceiling on global warming. In reality there is no telling how many ceilings there are to bust through. Just wanted to be clear that local bistability does not imply global bistability.

26. bender
Posted Sep 27, 2006 at 2:03 AM | Permalink

Re #24
I follow better now. If underlying distribution is changing, then that “5 sigma” “event” may really be a 2 sigma event, with the 3 sigma difference attributable to a trend, or a switch among bistable states, or nonstationary background forcing effects, or what have you. So their “astronomic improbability” is exaggerated by cherry-picking the time-frame of the baseline “normals” used for comparison. And 1 and 2 sigma events aren’t all that uncommon.

27. Posted Sep 27, 2006 at 2:44 AM | Permalink

Generate AR1 with p=0.9, Gaussian driving noise, N=300, and take sample std and sample mean using 30 samples. Won’t take long to find 6-sigmas.
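That experiment is easy to reproduce; a sketch (baseline length 30 mimicking a 1961-1990-style reference period, other numbers as described above):

```python
import numpy as np

rng = np.random.default_rng(11)

def ar1(n, phi, rng):
    """AR(1) series with Gaussian driving noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

# Estimate mean/std from a 30-point "baseline" window of a strongly
# autocorrelated series, then see how extreme the rest of it looks.
max_sigma = 0.0
for _ in range(200):
    x = ar1(300, 0.9, rng)
    base = x[:30]  # the "reference period"
    z = np.abs(x[30:] - base.mean()) / base.std(ddof=1)
    max_sigma = max(max_sigma, z.max())

print(f"largest apparent sigma-event in 200 trials: {max_sigma:.1f}")
```

The short correlated baseline both underestimates the stationary variance and mislocates the mean, so apparent “sigma events” far beyond 5 turn up routinely in pure noise.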

So their “astronomic improbability” is exaggerated by cherry-picking the time-frame of the baseline “normals” used for comparison.

Yep. But of course they can claim that stationary ‘normal’ actually means those weakly correlated AR1 background processes that CGCMs and spectrum-smoothing methods provide. This might be an infinite loop.

And 1 and 2 sigma events aren’t all that uncommon.

5-sigma events are not uncommon, if you let me choose the distribution. Or if we observe an astronomical number of samples.