In both the climate blog world and the financial world, there has been much talk recently about the interaction of models and data distributions. Linear regression models, at least for the usual significance tests, assume normally distributed residuals. What happens to models when the data distributions don’t meet the assumptions? Sometimes it doesn’t matter much, sometimes it does. But it seems like an important thing to study.
The usual relaxation of white noise assumptions for residuals (and the one used in Mannian studies) is low order red noise. As so often in Team studies, this is asserted rather than proven. It didn’t hold in MBH98 and many of the “new” proxies in Mann et al 2008 depart from this assumption even further than before.
There are a variety of graphical techniques for showing different time series properties. Applied statisticians (as opposed to climate Teams) emphasize the need to examine data graphically and it’s something that I do. I do hundreds of plots and only illustrate some here. Willis has also been looking at the proxies graphically and has a post on the way, from the same sort of perspective but handled a bit differently, so keep an eye out for that.
As a start, I’ll show a 4-panel plot for each of two sediment time series. The panels, left to right, are: the time series on a common scale 800-2000; a violin plot of the distribution (this is a sort of histogram); an autocorrelation function out to lag 150; and a spectrum of the scaled series, again on a common scale. I’ve done these analyses on the period prior to the 20th century so that possible 20th century anthropogenic impact is not included in the distributions and spectra.
In these two cases, the distributions are quite different, with the paleosalinity series being rather asymmetric, and both have noticeable autocorrelation. I’ve uploaded similar plots for all the sediment series in this directory. These ones are in no way “wild” within the group; actually, the first one is relatively “tame” and this is one of the reasons that I’ve illustrated it here.
Now I’ll show similar plots for white noise, low order red noise (rho=.375) and a random walk (this one spending its time in the negative half). White noise and low-order (Mannian assumption) red noise have very symmetric violin plots, decorrelate very rapidly in the ACF and have little low-frequency power in the spectrum. The random walk decorrelates very slowly and has an asymmetric distribution (random walks nearly always spend most of their time on one side or other of the 0-axis.) The salient point is that these two sediment distributions (like the others as well) do not fit the assumption of low-order red noise.
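For readers who want to try the comparison themselves, here is a minimal sketch of the three benchmark processes and a sample ACF. It is written in Python with the standard library only, purely for convenience, and is not the script behind the plots. The function names, the series length of 1100 and the seed are all assumptions made for illustration.

```python
import random


def simulate_ar1(n, rho, seed=0):
    """Simulate x[t] = rho * x[t-1] + e[t] with standard normal e[t].

    rho = 0 gives white noise, rho = 0.375 the low-order red noise case,
    and rho = 1 a random walk.
    """
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag (mean removed)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    n = 1100  # assumed length, roughly annual values for 800-1899
    cases = [("white noise", 0.0), ("AR1 rho=0.375", 0.375), ("random walk", 1.0)]
    for label, rho in cases:
        acf = sample_acf(simulate_ar1(n, rho, seed=1), 150)
        # Print a few lags to see how quickly each process decorrelates.
        print(label, [round(acf[k], 2) for k in (1, 10, 50, 150)])
```

The white noise and rho=.375 cases should be close to zero by lag 10, while the random walk should still be strongly correlated there. The exact numbers depend on the seed.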
What sort of assumptions are needed to yield simulations that look sort of like these series? For the Black sediment series, here is a corresponding plot using AR=.97 (!) and MA=-0.3 (the combination of high AR and negative MA in this range is a method that I’ve experimented with in the past.) This yields a simulation that visually looks sort of like the Black sediment series. But you’re now dealing with series with much more troublesome statistical properties than a low-order AR1 series.
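As a rough illustration of why the AR=.97, MA=-0.3 combination behaves so differently from low-order red noise, the sketch below simulates an ARMA(1,1) series and computes its theoretical ACF. It is in Python with the standard library only and is not the original code. It assumes the sign convention x[t] = phi*x[t-1] + e[t] + theta*e[t-1]; the function names, burn-in length and seed are hypothetical.

```python
import random


def simulate_arma11(n, phi, theta, burn=500, seed=0):
    """Simulate x[t] = phi*x[t-1] + e[t] + theta*e[t-1] (assumed convention).

    The first `burn` values are discarded so the start-up at zero
    has little influence on the returned series.
    """
    rng = random.Random(seed)
    x_prev, e_prev = 0.0, 0.0
    out = []
    for t in range(n + burn):
        e = rng.gauss(0.0, 1.0)
        x = phi * x_prev + e + theta * e_prev
        if t >= burn:
            out.append(x)
        x_prev, e_prev = x, e
    return out


def arma11_acf(phi, theta, max_lag):
    """Theoretical ACF of a stationary ARMA(1,1) for lags 0..max_lag."""
    rho1 = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)
    # Beyond lag 1 the ACF decays geometrically at rate phi.
    return [1.0] + [rho1 * phi ** (k - 1) for k in range(1, max_lag + 1)]


if __name__ == "__main__":
    series = simulate_arma11(1100, phi=0.97, theta=-0.3, seed=1)
    acf = arma11_acf(0.97, -0.3, 150)
    print(len(series), [round(acf[k], 3) for k in (1, 10, 50, 150)])
```

Under that convention the theoretical lag-10 autocorrelation is about 0.71, against roughly 0.00005 for AR1 with rho=.375, which is one way of putting a number on “much more troublesome”.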
The fracdiff package has some nice simulation tools as well. The parameters for the Black series under fracdiff are d=0.48 (very close to the top level of 0.5!), ar =0.5 and ma=0.04. Here’s a realization from fracdiff.sim (using 3 parameters only.)
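For anyone without R to hand, a fractionally integrated series can be approximated by applying truncated fractional-integration weights to an ARMA(1,1) series. The Python sketch below (standard library only) does that. It makes no claim to reproduce what fracdiff.sim does internally, and the truncation length, function names and MA sign convention (x[t] = phi*x[t-1] + e[t] + theta*e[t-1]) are assumptions. The fracdiff documentation should be checked for its own MA sign convention before comparing results.

```python
import random


def fracdiff_weights(d, m):
    """Coefficients of (1 - B)^(-d) up to lag m: w[0] = 1, w[j] = w[j-1]*(j-1+d)/j."""
    w = [1.0]
    for j in range(1, m + 1):
        w.append(w[-1] * (j - 1 + d) / j)
    return w


def simulate_arfima(n, d, phi, theta, trunc=1000, seed=0):
    """Approximate ARFIMA(1,d,1) draw via a truncated moving average.

    Step 1: simulate an ARMA(1,1) series u of length n + trunc.
    Step 2: fractionally integrate it, x[t] = sum_j w[j] * u[t-j], j = 0..trunc.
    """
    rng = random.Random(seed)
    total = n + trunc
    u, u_prev, e_prev = [], 0.0, 0.0
    for _ in range(total):
        e = rng.gauss(0.0, 1.0)
        u_prev = phi * u_prev + e + theta * e_prev
        e_prev = e
        u.append(u_prev)
    w = fracdiff_weights(d, trunc)
    return [
        sum(w[j] * u[t - j] for j in range(trunc + 1))
        for t in range(trunc, total)
    ]


if __name__ == "__main__":
    # Parameters quoted above for the Black series: d=0.48, ar=0.5, ma=0.04.
    x = simulate_arfima(1100, d=0.48, phi=0.5, theta=0.04, seed=1)
    print(len(x), round(min(x), 2), round(max(x), 2))
```

With d this close to 0.5 the weights die away very slowly, so any truncation loses some of the longest-range persistence; a longer `trunc` trades run time for a closer approximation.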
In our simulations of the North American tree ring network, we used a more awkward simulation method which used the entire ACF. This has resulted in some carping by Wahl and Ammann, though they conceded that the method yielded realistic looking time series. If we were doing this again today, I’d simulate these results using fracdiff as above. It’s not something that makes any difference to the effect discussed in MM05a, as the bimodal HS distribution reported in MM05a holds even with simple AR1 and the longer-persistence models simply spread out the bimodal lobes a bit.
The big problem with these series for statistical analysis is that they are much closer to random walks (AR1=1) than to white noise (AR1=0) and these sorts of series are highly prone to spurious regression. Tests designed for white noise (and relaxed slightly to include low order red noise) don’t work. Not that that deters Mann et al.
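The spurious regression problem is easy to demonstrate with a small Monte Carlo experiment: regress one series on another, independent one, and count how often the ordinary t-test on the slope calls the relationship “significant”. The Python sketch below (standard library only) is an illustration under assumed settings (series length 200, 200 trials, a 1.96 cutoff), with hypothetical function names. It is not taken from any of the studies discussed.

```python
import math
import random


def white_noise(n, rng):
    """n independent standard normal draws (AR1 = 0)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, rng):
    """Cumulative sum of standard normal draws (AR1 = 1)."""
    x, total = [], 0.0
    for _ in range(n):
        total += rng.gauss(0.0, 1.0)
        x.append(total)
    return x


def slope_t_stat(x, y):
    """Ordinary least squares t-statistic for the slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    rss = syy - slope * sxy  # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se


def rejection_rate(make_series, n=200, trials=200, crit=1.96, seed=0):
    """Share of trials where two independent series look 'significantly' related."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = make_series(n, rng), make_series(n, rng)
        if abs(slope_t_stat(x, y)) > crit:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("white noise :", rejection_rate(white_noise))
    print("random walks:", rejection_rate(random_walk))
```

For independent white noise the rate should sit near the nominal 5%. For independent random walks it is typically well over half, even though the two series have nothing to do with each other.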
Of course, it’s hard to imagine real life events occurring which were unanticipated by models, isn’t it?