I’ve posted up a pdf for Allen and Tett 1999 here as this seems to be a frequently cited article that said that "optimal fingerprinting" was linear regression and gives a flavor for the literature. The approach looks to me like pretty garden variety methodology, such as one would see in the fall term of an econometrics course. It’s hard to believe that this is the Royal Society’s "advanced statistical methods" – I wonder if they checked this with any statisticians.

30 Comments
Some interesting stuff from that paper:
Right, temperature series are normal … we have the HadCRUT3 monthly global temperature series, for example, on the largest spatiotemporal scale possible, which is incredibly non-normal, Jarque-Bera test says X-squared = 60.5801, df = 2, p-value = 7.006e-14 … in other words, non-normal at the 99.999999999999% level … even detrended it is still non-normal at the 99% level.
Or how about the Kaplan North Atlantic monthly SST series, undetrended it’s non-normal at the 99.9999999etc level, detrended it’s still non-normal at the same level.
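For readers who want to see what a Jarque-Bera figure like the one above involves, here is a minimal sketch in plain Python. It is not the code used in the comment (which looks like R output); the synthetic samples and variable names are illustrative assumptions. The test assumes an i.i.d. sample, and the p-value comes from the asymptotic chi-squared distribution with 2 degrees of freedom, whose upper tail is exp(-JB/2).

```python
import math
import random

def jarque_bera(x):
    """Return (JB statistic, asymptotic p-value) for an i.i.d. sample x."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Chi-squared with 2 df has survival function exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

# Synthetic data only: a Gaussian sample and a strongly skewed one.
rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]

jb_normal, p_normal = jarque_bera(normal_sample)
jb_skewed, p_skewed = jarque_bera(skewed_sample)

# Tail probability for the statistic quoted in the comment (df = 2).
p_quoted = math.exp(-60.5801 / 2.0)

print(jb_normal, p_normal)
print(jb_skewed, p_skewed)
print(p_quoted)
```

The last line gives a value of roughly 7e-14, in line with the p-value quoted for HadCRUT3. Note that the calculation treats the monthly values as independent draws, which is the assumption questioned further down the thread.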
Then there’s the null hypothesis statement, which immediately follows the previous statement:
I love this one. Their null hypothesis is that the climate model works just fine, and if they can’t disprove the null hypothesis, why, everything’s just dandy … wonder how much time they spent trying to disprove the null hypothesis …
My null hypothesis, on the other hand, is that the climate models don’t work for sh*t, and it’s up to the modelers to prove otherwise.
Then we have the way they test their null hypothesis …
Umm … well, OK. Here are the model results from the Santer et al. paper published in Science magazine, Amplification of Surface Temperature Trends and Variability in the Tropical Atmosphere, that was supposed to provide a “fingerprint” of tropical tropospheric warming. These are the models, and the non-normality of the residuals. Sorry for all the decimals, but I had to put them in to show how non-normal the residuals are:
CM2.1 , 99.9999898480390%
UKMO , 99.9999999999258%
M_hires , 99.9999999999513%
CM2.0 , 87.7303407597607%
CCSM3 , 99.9999965856309%
GISSEH , 99.9999999999999%
HadCRUT , 99.1232969781016%
GISSER , 99.9999999999999%
M_medres , 99.9999999892319%
PCM , 99.9999918309551%
Only one of these models (CM2.0) has even vaguely normally distributed residuals (we only have 87% confidence it’s not normal, so we can’t reject it), and that model gave wildly wacky results, with huge plunges above and below the actual data.
In addition, the tropical NOAA SST data, as well as the tropical HadCRUT SST data, both of which were used in the study, are non-normal at the 99.99999999% and 99.9999999% levels respectively … (As an aside, only one of these models (HadCRUT) showed a significant correlation with the data (p less than 0.05).)
This was the study that famously said that:
Oh, right, the models disagree with the data, so it is more plausible to think that the data is wrong …
How do these guys get away with this stuff?
w.
#1 — “How do these guys get away with this stuff?”
That’s the 64 thousand $ question dogging all of climate science.
That it’s marketed as “advanced” probably tells you something about the “unadvanced” methods that came before it.
Given that much of the temperature data is non-normal, in some cases greatly so, I don’t see how their test for normal residuals would ever get passed. I’ve been looking at the Kaplan North Atlantic SST data, which is wildly non-normal, both trended (Jarque-Bera test, non-normal at p = 4e-35) and detrended (non-normal at p = 4e-37).
Having had no luck with linear detrending, I thought at first that if I instead detrended it with, say, a six-year Gaussian smoothing of the data, it would have normal residuals. But no joy, the Gaussian residuals were non-normal as well (p = 9e-26, moving in the right direction but a long way from there). This is despite the fact that the Gaussian average itself is also significantly non-normal (p = 4e-37).
So I tried removing the average monthly anomalies from the linear detrended data, that was the best yet but didn’t work either, still non-normal at p = 0.001. How about the Gaussian detrended data minus the Gaussian monthly average anomalies? Getting close, p = 0.02, but still non-normal.
OK, how about detrending it with a longer Gaussian smoothing, maybe 12 years, then remove the monthly average residuals … whoa, I did it. We can’t reject the null hypothesis that the residuals are normal, p = 0.13. Of course, this means we have 87% confidence that they are in fact non-normal … but we can’t reject the hypothesis. And lengthening the smoothing filter beyond that doesn’t improve matters.
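The two-step procedure described above (subtract a Gaussian-smoothed version of the series, then subtract each calendar month's mean residual) can be sketched roughly as follows. This is only an illustration on synthetic data: the comment does not say how a "12 year" filter maps to a kernel width, so the `sigma` parameter here (in months) is an assumption, as are the function names and the handling of the series ends.

```python
import math
import random

def gaussian_smooth(series, sigma):
    """Gaussian-weighted moving average; weights are renormalised near the ends."""
    n = len(series)
    half = int(3 * sigma)  # truncate the kernel at about 3 sigma
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        weights = [math.exp(-0.5 * ((j - i) / sigma) ** 2) for j in range(lo, hi)]
        total = sum(weights)
        out.append(sum(w * series[j] for w, j in zip(weights, range(lo, hi))) / total)
    return out

def remove_monthly_means(series):
    """Subtract each calendar month's mean (index 0 is assumed to be January)."""
    means = []
    for m in range(12):
        vals = series[m::12]
        means.append(sum(vals) / len(vals))
    return [v - means[i % 12] for i, v in enumerate(series)]

# Synthetic monthly series: slow trend + annual cycle + noise (not real SST data).
rng = random.Random(42)
n_months = 600
series = [0.002 * t + 0.3 * math.sin(2 * math.pi * t / 12) + rng.gauss(0.0, 0.2)
          for t in range(n_months)]

smooth = gaussian_smooth(series, sigma=24.0)          # assumed kernel width
detrended = [v - s for v, s in zip(series, smooth)]   # step 1: remove smooth trend
residuals = remove_monthly_means(detrended)           # step 2: remove monthly means
```

The `residuals` list is what would then be handed to a normality test. Each extra filter choice tried is another look at the same data, which is worth remembering when reading the resulting p-value.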
I’m fresh out of ideas … but I’d sure like to see the climate model that can successfully pass their test regarding emulating the Kaplan North Atlantic SST record …
w.
Willis, it would be helpful if you showed a histogram of one of these residuals compared to a normal distribution.
Willis, this is all publishable stuff, it seems. Why don’t you write it up?
#2
One major way appears to be they peer review each other’s papers, which is why they keep stressing that those who question are not for the most part peer reviewed.
For the most part, those who are criticizing are not bothering to put ass to chair seat and write papers. And when they do, they too often look like BC06 or Steve’s PPT presentation for the AGU: abortions. Let’s ditch the “hating the journals” when people are not even trying to get published.
TCO,
I have to agree. Scientific revolutions are won on the battlefield of scientific journals. What many of the critics are doing is similar to guerrilla warfare: don’t face the enemy in the open, just stay on the fringe and strike here and there, claiming victory each time, but without really making a dent in the established regime. On the other hand, every published paper is another battle won. AGW proponents have understood that right from the start.
I have not followed the technical details that have been developed on this website, so I apologize if this has been answered somewhere else. I have two related comments:
1. PCA is a calculation performed on a covariance matrix. In a typical application, the covariance matrix is an estimate of the “true” covariance of the system which is derived from a finite number of observations of the system. Also in the typical application, the covariance matrix is estimated with the sample means removed. This can be shown to be the optimal estimate (with respect to likelihood) under some simple assumptions. If the sample means are not used to center the covariance estimate, then the estimate is simply not optimal with respect to likelihood. Perhaps an “uncentered” estimate is optimal with respect to some other data model. What is that data model?
2. Why are not all of these discussions prefaced with a description of an assumed data model? (e.g., the signal is Brownian motion with variance 1 per unit time and the noise is uncorrelated normal with mean 0 and variance 2). With an explicit data model, the discussion divides into two parts: is the data model appropriate and does the statistical technique diminish the noise and enhance the signal?
Regards,
rwnj
Let’s see, Fisz (1980) Probability Theory and Mathematical Statistics:
Theorem 3.6.3. The variance of the sum of an arbitrary finite number of independent random variables, whose variances exist, equals the sum of their variances.
Didn’t find a theorem that explains ‘exactly so if all distributions are Gaussian’.
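A quick numerical check of the theorem, as a sketch: the two distributions below (uniform and exponential) are arbitrary non-Gaussian choices made for illustration, and the helper name is hypothetical. The point is only that additivity of variances needs independence and finite variances, not normality.

```python
import random

def variance(x):
    """Population variance of a list of numbers."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

rng = random.Random(7)
n = 100_000

# Two independent, clearly non-Gaussian variables.
u = [rng.uniform(0.0, 1.0) for _ in range(n)]     # variance 1/12
e = [rng.expovariate(1.0) for _ in range(n)]      # variance 1

total = [a + b for a, b in zip(u, e)]

var_of_sum = variance(total)
sum_of_vars = variance(u) + variance(e)

print(var_of_sum, sum_of_vars)
```

The two printed numbers should agree up to sampling error, even though neither input is Gaussian.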
But this is more important: they use ‘control integrations’ to solve the covariance matrix of the ‘climate noise’, just like IDAG. Mann uses ad hoc spectrum smoothing. It is not detection, it is circular reasoning. Useless.
(and what is rankL vector?)
Re #5, Doug, you asked for a histogram of the Kaplan dataset. Here ’tis …
Like I said … radically non-normal …
The problem is that the earth’s climate is chaotically bistable (or more properly multistable). Think for example of the PDO, or in this case, the AMO. The datasets from these chaotically bistable distributions are typically “humped”, with a concentration of data on both sides of the overall average. This gives us non-normal distributions.
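To illustrate the "humped" shape described above, here is a toy sketch of a two-state system. It is an assumption-laden caricature, not a model of the PDO or AMO: each value is drawn near one of two hypothetical stable levels (-1 and +1) with equal probability.

```python
import random

def excess_kurtosis(x):
    """Sample excess kurtosis (0 for a normal distribution)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(3)
n = 6000

# Hypothetical two-state system: each point sits near -1 or +1 with equal odds.
bistable = [rng.gauss(rng.choice((-1.0, 1.0)), 0.5) for _ in range(n)]

# Count points close to the overall average (0) and close to one state (+1).
near_mean = sum(1 for v in bistable if abs(v) < 0.25)
near_state = sum(1 for v in bistable if abs(v - 1.0) < 0.25)
kurt = excess_kurtosis(bistable)

print(near_mean, near_state, kurt)
```

Fewer points fall near the overall average than near either state, and the excess kurtosis is clearly negative, which is the kind of departure a Jarque-Bera test picks up.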
w.
#1,12
How do you test normality of time series? The i.i.d. case is easy, but what if the correlation in time is high?
#10. I agree. You’d think that people purporting to make reconstructions based on a methodology such as PCA would show the applicability of the methodology. However, the original article was published in Nature and the referees did not require that any demonstration of the applicability of a “novel” methodology be made. So for a full explanation of the phenomenon, you’d have to ask Nature.
Re #13
Tests of normality are available in most standard stats packages.
The Shapiro-Wilk test is one. In R, use the function:
shapiro.test(x)
which assumes iid, of course.
If the series is non-stationary, then there is by definition more than one distribution. If the system is bistable, then there are 2 distributions – which you can estimate if you have a long enough data series and if you eliminate the transient phase where states are switching. If the series are autocorrelated then you can use prewhitening to remove the red.
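As a rough sketch of what prewhitening means here, assuming the "red" component is well described by an AR(1) process (an assumption, and only one of several ways to prewhiten): estimate the lag-1 autocorrelation r and form e[t] = x[t] - r * x[t-1]. The function names and the synthetic series are illustrative.

```python
import random

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def prewhiten(x):
    """Remove AR(1) persistence: e[t] = x[t] - r * x[t-1], with r estimated from x."""
    mean = sum(x) / len(x)
    r = lag1_autocorr(x)
    centred = [v - mean for v in x]
    return [centred[t] - r * centred[t - 1] for t in range(1, len(x))], r

# Synthetic red-noise series with a known AR(1) coefficient.
rng = random.Random(11)
rho = 0.8
red = [rng.gauss(0.0, 1.0)]
for _ in range(4999):
    red.append(rho * red[-1] + rng.gauss(0.0, 1.0))

white, r_hat = prewhiten(red)
r_after = lag1_autocorr(white)

print(r_hat, r_after)
```

The estimated coefficient should land near 0.8 and the prewhitened series should show little remaining lag-1 correlation; a normality test such as `shapiro.test` would then be applied to the prewhitened values rather than to the raw series.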
Re #14 Steve M, that’s what my supervisory committee asked me to do when as a grad student I proposed using PCA to extract signals from a network of tree ring data. They wanted proof of concept before any manuscripts were written. (But of course, there were no climatologists on my committee.)
#16. No wonder this Mann stuff seems so bizarre to you. Precautions taken by your advisory committee (presumably some time ago) were completely thrown to the winds by Nature and IPCC, followed by a scorched-earth policy by the Team of deny, deny, deny.
Re #17
That’s how I knew from the start (i.e. from the time the bizarre off-centering method was revealed) that your criticisms of their methods and data were spot on. Their reluctance to release the new bristlecone pine data is, as you have said many times, suspicious. Their reluctance to admit how statistically non-independent these “independent” multiproxy reconstructions are is telling. That they fail to understand Wegman’s primary point – that the community is too inbred and too far removed from the statistics community – resonates with my observations. They say they’ve “moved on”, but look at the faulty statistical methods still being used in the analysis of the hurricane data. Detection and attribution is becoming non-scientific as they become single-minded in their focus on the A in AGW and unwilling to consider alternative hypotheses.
Then again, maybe climatology never was a science. (In the Popperian sense of growing knowledge through iterative conjecture & refutation.)
Re: tests of normality – 4 notes
1) I am about 20 (OK, 30) years out of date on distribution tests, but the wag within the community, say in 1975, was that distribution tests didn’t work well in practice, which led to their being ignored.
2) Be careful with the Shapiro-Wilk test for data with multiple observations with the same value. The test is based on order statistics, which are obviously sensitive to ties.
3a) If you are testing a variable Y for normality of the levels, you need to account for the ARMA structure in the model and then test the residuals, a point that is pretty easy to see when you plot the realizations of an AR(1) series with coefficient 0.5 next to a series of i.i.d. N(0,1) of the same sample size.
3b) If Y(t) = X(t)*B+e(t) where e(t) is i.i.d. N(0,sigma^2) but X is not normal, Y won’t be normal either. Thus again one has to look to the residuals for the test of normality to have any meaning.
4) The error term in a linear model does not have to be normal for the Gauss-Markov theorem to hold, only that the error is independently and identically distributed with a zero mean and finite variance. Of course, the significance of the coefficients can no longer be read straight from the tables (t, F or Chi-square) which presumed normally distributed errors. In practice most applied econometric studies do not test for normality. They do test for serial correlation and heteroskedasticity in the residuals. (And most readers of said studies do a little implicit discounting of the significance both for lack of normality and for data mining.) A “good” model has clean residuals: they will look roughly like white noise on a plot although they will probably fail most tests of normality.
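Point 3b can be illustrated with a small simulation. This is a sketch under made-up assumptions: X is drawn from an exponential distribution (any non-normal choice would do), B is set to 3, and skewness is used as a simple stand-in for a full normality test.

```python
import random

def skewness(x):
    """Sample skewness (0 for a symmetric distribution such as the normal)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

rng = random.Random(5)
n = 5000
beta = 3.0  # illustrative coefficient B

# Y = X*B + e with normal errors e but a non-normal regressor X.
x = [rng.expovariate(1.0) for _ in range(n)]
y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]

# Ordinary least squares with an intercept.
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

skew_levels = skewness(y)      # inherits the skew of X
skew_resid = skewness(resid)   # should be close to 0

print(slope, skew_levels, skew_resid)
```

The levels of Y come out strongly skewed even though the error term is exactly normal, while the regression residuals do not, which is why the test belongs on the residuals.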
#19
Good points. Should add to 4) that if the error distribution is unknown, 2-sigma and 3-sigma limits are quite useless.
We should avoid making the same mistake as Mann & Jones do in the More on the Arctic post. (5-sigma events in correlated time series)
Re #20
You mean the mistake of assuming that the error distribution is (i) known and (ii) homogeneous?
Re #12
I am intrigued by Willis’ confident assertion in #12 of [chaotic] multi(bi)stability.
1. Mathematically & intuitively I understand the concept. But in reality, doesn’t local bistability in a network of n cells across a globe imply the system actually has 2^n superstates? i.e. ENSO, AMO, PDO are just big, resilient chunks of a superstate. If the globe is warming and we are passing from one superstate to the next, then some of these chunks will be more resilient to change over time than others. i.e. The illusion of large-scale bistability persists for some time, until familiar chunk after familiar chunk is finally broken down and reconstituted to form new and different chunks that characterize the new superstates. If this is the shape GW will take, then the notion of bistability and metastability seems not very useful. [Apologies for vagueness, imprecision, ambiguity. I’m trying my best with my limited toolkit.]
2. Are these systems really locally bistable? Or is this largely an illusion enhanced by the way warm & cool waters vertically separate? i.e. It’s not a single variable system with two states, but a two variable system with continuous states. [Again, apologies for the bandwidth-consuming musings. It makes sense to the writer, but probably sounds incoherent to the reader. Even after substantial efforts in editing.]
I’m willing to read if anyone’s got suggestions.
Re #22, bender, thanks for the interesting question. I assert it based on the existence of a variety of “oscillations”, which is climate-speak for a couple of separate stable states.
Take for example the PDO. Here’s the correlation of the PDO with the SST:
And here’s the PDO index, from here.
Note the stable period between 1945 and 1975. That’s what I mean by “bistable” … but I’m willing to learn. In any case, I’m not sure I agree that there are 2^n superstates. Rather, I would say that there are various subsystems, many of which have more than one stable state.
w.
re #21
RC:
So, sample mean and sample std computed from 1961-1990 data. Then it is observed that the 2006 value is 5 sample standard deviations above the sample mean. And that is astronomically improbable, they say.
Mistake: stationary does not mean i.i.d. It means that all of the distribution functions of the process are unchanged regardless of the time shift applied to them (let’s be strict here 🙂
I think no one disagrees that there are non-zero autocorrelations in temperature data. The Jarque-Bera test assumes a random sample, and an autocorrelated series won’t necessarily do (as said in #19 3). And sample mean and sample std won’t tell much if the samples are not random.
Maybe I should put it this way: In #1 Willis proves that the way Mann & Jones interpret their data is wrong. But now I’m confusing people (and myself), so I need to stop.
related discussion here (http://www.climateaudit.org/?p=678)
Re #23
Fair enough. Bistability is useful conceptually, but in practice it has its limits. Worth mentioning because I sometimes get the sense that skeptics take the “bistability” proposition to mean there is a hard ceiling on global warming. In reality there is no telling how many ceilings there are to bust through. Just wanted to be clear that local bistability does not imply global bistability.
Re #24
I follow better now. If the underlying distribution is changing, then that “5 sigma” “event” may really be a 2 sigma event, with the 3 sigma difference attributable to a trend, or a switch among bistable states, or non-stationary background forcing effects, or what have you. So their “astronomic improbability” is exaggerated by cherry-picking the timeframe of the baseline “normals” used for comparison. And 1 and 2 sigma events aren’t all that uncommon.
Generate AR1 with p=0.9, Gaussian driving noise, N=300, and take sample std and sample mean using 30 samples. Won’t take long to find 6-sigma events.
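The experiment suggested above is easy to sketch. The version below reads "p=0.9" as the AR(1) coefficient and takes the first 30 values as the baseline for the sample mean and std; the number of trials, the seed, and the function names are assumptions added for illustration. An uncorrelated series is run through the same procedure for comparison.

```python
import math
import random
import statistics

def ar1_series(n, rho, rng):
    """Stationary AR(1) series with Gaussian driving noise of unit variance."""
    x = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho ** 2))]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, 1.0))
    return x

def max_sigma_excursion(series, baseline=30):
    """Largest |z| after the baseline window, using the baseline mean and std."""
    base = series[:baseline]
    mean = statistics.mean(base)
    std = statistics.stdev(base)
    return max(abs(v - mean) / std for v in series[baseline:])

def count_exceedances(rho, threshold, trials=1000, n=300, seed=1):
    """Number of simulated series containing at least one excursion > threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if max_sigma_excursion(ar1_series(n, rho, rng)) > threshold:
            hits += 1
    return hits

ar1_hits = count_exceedances(rho=0.9, threshold=6.0)   # strongly autocorrelated
iid_hits = count_exceedances(rho=0.0, threshold=6.0)   # uncorrelated baseline

print(ar1_hits, iid_hits)
```

With strong autocorrelation a 30-sample window tends to underestimate the spread of the process, so "6-sigma" excursions relative to that baseline turn up in a noticeable share of series, while they stay rare in the uncorrelated case.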
Yep. But of course they can claim that stationary ‘normal’ actually means those weakly correlated AR1 background processes that CGCMSs and spectrum smoothing methods provide. This might be an infinite loop.
5-sigma events are not uncommon, if you let me choose the distribution. Or if we observe an astronomical number of samples.
RE: #23 – OK, I am going to be a tree (well, actually, bush) ringer for a short while here. This is based on personal recollection so please excuse any slight errors. I recall reading an article in the Los Angeles Times back during the early to mid 80s (probably ~ 1983) where it was discussing a ring width study done on chaparral (don’t recall the exact species, perhaps ceanothus) in the near coastal transverse ranges (now Dano, that is most definitely Mediterranean! 🙂 ). The altitude was such that the main limiting factor of growth was moisture availability. The assertion of the study was that ring widths depicted that there was in general less moisture available from the 1940s to the time of the sampling (I suspect late 1970s) than had been available from some point in the 1800s (I seem to recall late 1860s) to the 1940s. The folks who did the study also issued a warning that the 1940s – 1970s lull in precip (taken as a proxy for late fall through spring mid latitude cyclones and cold fronts) would, based on the result for the earlier period, likely not hold. So, what has happened since then? After a fairly significant – ~ 7 year – regional drought (interestingly, affecting mostly Southern, but not Northern, California) during much of the 1980s, there came some very wet years since, with the exception of the odd dry one (expected that far south). I am not sure to what level the concept of the PDO was understood back then. Nonetheless I find it fascinating that the time frames seem to align.
Additional notes, Southern California (where I resided during most of the 1980s) experienced very moist years 1981 – 1983, corresponding to the upper portion of that significant rising edge of the PDO figure of merit. The So Cal drought I mentioned 1984 – 1991 was during the subsequent slight lowering and trough in the waveform after that significant 1973 through 1985 rising edge. IIRC, El Ninos during the rising edge were and 78 – 79, 82 – 83. There have been no truly extreme El Ninos since. There were two moderate ones during the time since, one in 97 and early 98 the other back in 00 and 01. Interestingly, the severe flooding and mud slides a few years back (03 I believe?) in So Cal were not during a true El Nino event but were a result of a persistent split polar jet with one leg stuck over So Cal. Also, the worst drought in recent California history, affecting the entire state, was 75 – 77, years during which most of the state melded in with Baja and Arizona from a precipitation standpoint. Also, Feb 1976 we had one of the more notable cP outbreaks ever witnessed, bringing a couple of inches of snow to most of the lowland areas of the state.
RE: #29 – sorry, final note, while the 98 El Nino was really major from an ENSO figure of merit perspective, as experienced in much of California, the 81 – 83 one was more impactful. My statement about extremity is as experienced here, not from an ENSO figure of merit perspective.