The Ritson Coefficient

In my previous comment on Ritson’s AR1 calculation, I think that I correctly diagnosed that the calculation was goofy, but I didn’t correctly diagnose what was going on (thanks to Demetris Koutsoyiannis who emailed me). I’ve re-visited it and I’m pretty sure that I’ve now diagnosed the problem with what Ritson was doing. My starting premise was 100% correct – if you simply do an AR1 fit to the US tree ring series, you get the high AR1 coefficients that I reported before, which differ radically from the Ritson coefficient. So you have to ask yourself: if there’s a perfectly good algorithm for calculating AR1 fits, why does Ritson propose a new algorithm for calculating an AR1 coefficient? Why wouldn’t he just use a standard algorithm? Needless to say, I am naturally pretty suspicious of Hockey Team non-standard algorithms.

Anyway, I checked Ritson’s method against synthetic AR1 series with coefficients varying up to and including a random walk and, while the answers differed from those of a standard algorithm, they were all in the right general range.
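
As a rough sketch of that check using only the standard algorithm (Ritson’s own estimator isn’t reproduced here, since his formula isn’t given in this post; the seed and series length are arbitrary choices):

    set.seed(42)                                       # arbitrary seed for reproducibility
    for (phi in c(0.2, 0.5, 0.9)) {
      x <- arima.sim(n = 581, model = list(ar = phi))
      print(coef(arima(x, order = c(1, 0, 0)))["ar1"]) # should roughly recover phi
    }
    rw <- cumsum(rnorm(581))                           # random walk (AR1 coefficient of 1)
    coef(arima(rw, order = c(1, 0, 0), method = "ML"))["ar1"]  # ML avoids the CSS
                                                       # non-stationarity error; fit just below 1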

Then I experimented with ARMA(ar = 0.9, ma = -0.6) noise, a type of noise pretty familiar from climate time series (leaving aside the larger question of long-term persistence and multiple scaling). ARMA(1,1) series are something that should be on the radar screen even of the Hockey Team.

Here the performance of the various methods varied fantastically. This is based on very quick simulations. If you correctly specified the model as ARMA(1,1), estimates of the AR1 coefficient using the standard arima function in R were 0.80-0.93, all pretty reasonable. If you estimated the AR1 coefficient using a mis-specified ARMA(1,0) model, you got AR1 coefficients in the 0.34-0.53 range, which, interestingly enough, is also the range of observed AR1 fits to North American tree ring data.
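
A quick sketch of this experiment in R (the seed and length are arbitrary choices, not the settings of the simulations reported above):

    set.seed(123)                                   # arbitrary seed
    x <- arima.sim(n = 1000, model = list(ar = 0.9, ma = -0.6))
    coef(arima(x, order = c(1, 0, 1)))["ar1"]       # correctly specified: near 0.9
    coef(arima(x, order = c(1, 0, 0)))["ar1"]       # mis-specified AR1: roughly 0.3-0.5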

Now for the Ritson coefficient: it was in the 0.0 to 0.2 range, again almost exactly matching the Ritson coefficients for the North American tree ring network. So the Ritson method fails catastrophically in the face of ARMA(1,1) noise. A conventional AR1 calculation is a little more stable against this misspecification; the Ritson method simply goes haywire.

Of course at bizarroclimate, they won’t care about such details.

It’s never easy with the Hockey Team.

http://www.climateaudit.org/?p=682

26 Comments

  1. TCO
    Posted May 29, 2006 at 12:27 PM | Permalink

    Do you know if this is an issue in other fields? Failure to deal with both components of ARMA leading to misspecification of the first one? Are there other models with even more components, and do algorithms “fail” by not including all of them?

  2. Jack Lacton
    Posted May 30, 2006 at 3:57 AM | Permalink

    TCO,

    You are getting very boring these days in asking such inanities. Once upon a time you at least appeared to undertake some study whereas now you’re simply sniping from the sidelines with valueless drivel.

  3. Steve McIntyre
    Posted May 30, 2006 at 7:04 AM | Permalink

    #1,2. I don’t think that it’s a bad question. I don’t know what the situation is in other fields. My impression is that other fields make a much better try at the statistics.

    BTW in one of Demetris Koutsoyiannis’ articles, which I was re-reading, he points out that if you take a time-period average of an AR1 stochastic process, you get an ARMA(1,1) process. So even if the “real” process is AR1, if there’s been any averaging along the way – many measured series are averages, and many natural series are smoother than gridcell temperatures – you need to use ARMA(1,1) at a minimum.
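
    A quick numerical check of this point in R (the block length of 12 and the series length are arbitrary illustrations, not values from the article):

        set.seed(1)
        fine <- arima.sim(n = 12 * 2000, model = list(ar = 0.95))  # fine-scale AR1, "monthly"
        annual <- colMeans(matrix(fine, nrow = 12))                # average 12 "months" into "years"
        arima(annual, order = c(1, 0, 1))                          # both ar1 and ma1 come out significant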

  4. Posted May 30, 2006 at 9:18 AM | Permalink

    Re #3. Right and well formulated: you need an ARMA(1,1) at a minimum.

    But the minimum may not be enough. I believe that the Markovian notion that lies behind it (because in this case averaging is not a natural thing, it is a mathematical procedure) cannot be a good perception of natural processes. Also, a lot of empirical analyses that I have done using several types of time series (for instance at very fine time scales of seconds, minutes, etc., in which no large-scale mechanisms/changes interfere) have never suggested a Markovian structure. As I wrote recently somewhere else:

    Viewing complex natural phenomena as AR(1) processes, which means Markovian processes, may be too simplified. Recall from the theory of stochastic processes that a Markovian process is by definition “a stochastic process whose past has no influence on the future if its present is specified” (Papoulis, 1991, p. 635). Thus, for me it is very difficult to imagine that only the present state of a complex natural system matters for its future and that we can drop our knowledge of its past.

  5. Posted May 30, 2006 at 9:43 AM | Permalink

    To me the problem lies in the determination of the present state. Typically in climate it is taken as the average of states over the last year. This is not ‘present’. While there is a clear annual periodicity, it is still an essentially arbitrary time scale. Reliance on a particular scale introduces artifacts of analysis.

  6. Jack Lacton
    Posted May 30, 2006 at 12:23 PM | Permalink

    #3

    Steve – my point was that TCO asks questions almost as a reflex action without first having a look-see at what’s going on elsewhere. He could have done some work first and then asked his question in terms of comparison to other fields, rather than leaving you (and others) to do the heavy lifting for him.

  7. Posted May 30, 2006 at 2:46 PM | Permalink

    Re #5
    This is a strong point. Indeed, reliance on a particular scale introduces artifacts. Therefore, a decent model should preserve its structure across several time scales. Note that an AR(1) model defined at a certain scale will no longer be an AR(1) model at any other scale (provided that the switching to different scales is done by time averaging). As pointed out in #3 and #4, it yields an ARMA(1,1) structure at aggregate scales.

    But I think there are additional problems with the Markovian notion. To overcome the problem of scales, let us assume that we have a Markovian model in continuous time. This has an exponentially decaying autocorrelation structure in continuous time.*

    Now, assume that we have some information from observation of the process at a resolution as high as you wish. That is, we know precisely the value of the process at the present time instant t = 0, as well as over a past time interval as long as you wish. We wish to make a prediction for the future. Given that the process is assumed Markovian, we will discard all observations from the past and base our prediction on the state at a single time instant (t = 0) only. Do you find this consistent with the behaviour of a complex natural system? Do you find that for the future trajectory of the natural system only its current state matters, whereas the path it has followed to reach the current state is totally irrelevant for the future?

    [*Footnote: A Markovian model in continuous time, if converted to discrete time by taking intervals of length D, yields (a) an AR(1) autocorrelation structure if we take instantaneous values every interval D, or (b) an ARMA(1,1) autocorrelation structure if we take averages over intervals of length D.]
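
    A numerical illustration of case (a), using a fine-scale AR(1) as a stand-in for the continuous-time process (D and the coefficient are arbitrary choices; case (b) can be checked the same way with block averages, as in #3):

        set.seed(2)
        D <- 10
        fine <- arima.sim(n = D * 5000, model = list(ar = 0.98))
        inst <- fine[seq(D, length(fine), by = D)]   # instantaneous values every D steps
        arima(inst, order = c(1, 0, 0))              # ar1 near 0.98^D = 0.82: still AR(1)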

  8. Peter Hartley
    Posted May 30, 2006 at 3:01 PM | Permalink

    Re #4, #7 Demetris, why could you not generalize the notion of a Markov process so that the “current state” depends on a vector of current and lagged values of the “state variables”? This is done in economic models. Admittedly, it works best if we have discrete rather than continuous time, and then only if no more than a finite number of past periods is relevant to the future, but it allows for a more complicated autoregressive structure than AR(1).

  9. Posted May 30, 2006 at 3:14 PM | Permalink

    Re #8. That is what I am saying. The more components your vector has, the better the approximation to reality. So the past matters and the Markovian/AR(1) notion is abandoned. The problem with ARMA-type models is that as you add vector components, you inflate the number of model parameters. Parameter parsimony is important. With a scaling model you have all available observations in your vector, but at the same time your model has only one parameter, exactly like an AR(1) model.

  10. Peter Hartley
    Posted May 30, 2006 at 3:20 PM | Permalink

    This does make the scaling model sound attractive! I must look at it some more. I downloaded some of your papers that Steve linked to but have not had time to read them carefully or fully as yet.

  11. Steve McIntyre
    Posted May 30, 2006 at 4:01 PM | Permalink

    #9. Demetris, Peter Huybers has an interesting article (Nature) in preprint, which constructs a frequency continuum from seconds to Milankovitch. He presented this graphic at AGU in December and it was very interesting. The impression that I get (and I don’t guarantee that I’ve understood it fully) is that the centennial scale is almost half way (in spectral terms) between annual and Milankovitch and is the most indeterminate in some sense. I don’t know how this converts in your multi-scale view of the world, but I’m sure that you’ll find it interesting.

    Huybers has just (May 30) posted up a preview of an article using frequency analysis on millennial reconstructions, which I won’t have time to read for a few days.

  12. Posted May 31, 2006 at 2:15 AM | Permalink

    Re #11. The article seems extremely interesting and I have to read it in detail. At first glance the figures harmonize with my scaling/entropic view — but this is just a first feeling. The different scaling exponents above and below the centennial scale are an interesting finding. But it would also be interesting to put it in terms of uncertainty (particularly because I see in the patchwork spectrum that the exponents above and below the centennial scale were calculated from different data sets).

    I must say that the authors’ contrast between “purely stochastic processes” and “deterministic control on the continuum” made me sad. I cannot understand what is meant by “purely stochastic”. “Purely random”? Or was “purely” used just as a pejorative for “stochastic”? I think that stochastic processes have provided the only reliable way to link probability/uncertainty/randomness with causal deterministic dynamics. Stochastic processes incorporate the deterministic controls in their structure. Stochastic processes have provided such tools as spectral analysis, analysis of uncertainty, and many others to explore and analyze data, and to identify the deterministic controls. Stochastic processes have provided such tools as stochastic (Monte Carlo) simulation, stochastic forecasting, stochastic integration (effective and efficient even for “purely deterministic” problems such as the numerical calculation of an integral), stochastic optimization, and others which enable us to get insight into complex systems and solve difficult problems. Stochastic processes have enabled views of the world that are much richer than mechanistic views. Poor stochastic processes! We use them every day and simultaneously depreciate them. We use them even to refute them.

    [Footnote: I do not blame this interesting paper, which just triggered my thought; my disappointment is related to a more general feeling from the dominant line in geophysical research].

  13. TCO
    Posted Jun 20, 2006 at 3:39 PM | Permalink

    OK. I went and googled and could not find anything good on the general issue of AR1 versus ARMA(1,1) determinations. You would think this is a pretty basic concept, in some sense similar to a single-factor regression versus a two-factor regression. You would think that there would be some decent writing on when to use one versus the other, how much the answers change, degrees of freedom, physicality issues, etc. It would seem to be almost textbook-level stuff.

  14. TCO
    Posted Jun 20, 2006 at 3:40 PM | Permalink

    In any case, I think looking at the problem through this lens, rather than one purely restricted to this individual case, would be instructive. Who is the Tukey of ARMA? Can we engage him?

  15. Steve McIntyre
    Posted Jun 20, 2006 at 10:08 PM | Permalink

    #13. Things like the Akaike Information Criterion are often used; or log-likelihood. I like to look at the standard errors of the coefficients. For most climate series, both AR1 and MA1 coefficients are very significant. Also remember Koutsoyiannis’ point: if you take an average of measurements from an AR1 process and take that series as your time series (e.g. days into a month, months into a year), you get an ARMA(1,1) series. So it’s hard to picture circumstances in which you would not have at least ARMA(1,1).
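
    For example, a minimal version of these checks in R (x standing for any candidate series):

        fit.ar1  <- arima(x, order = c(1, 0, 0))
        fit.arma <- arima(x, order = c(1, 0, 1))
        AIC(fit.ar1); AIC(fit.arma)                    # lower AIC is preferred
        fit.arma$coef / sqrt(diag(fit.arma$var.coef))  # rough t-statistics for the coefficients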

  16. TCO
    Posted Jun 20, 2006 at 11:51 PM | Permalink

    Actually, I’m having a hard time picturing when Koutsy’s conditions would occur. Not for tree rings, which are the bulk of the stuff.

  17. UC
    Posted Jun 21, 2006 at 7:43 AM | Permalink

    All models are false, but some are useful.

    Here’s one way to choose the model: Pick the model such that sequential prediction of the future given the past leads to lowest prediction error. Place a thermometer next to a proxy. Then choose the model that predicts proxy minus temperature sequences most efficiently.
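
    A sketch of one way to implement this in R (the function name and the burn-in of 50 points are illustrative choices, not from the comment above):

        seq.pred.error <- function(x, order, start = 50) {
          errs <- sapply(start:(length(x) - 1), function(t) {
            fit <- arima(x[1:t], order = order)        # fit on data up to time t
            x[t + 1] - predict(fit, n.ahead = 1)$pred  # one-step-ahead forecast error
          })
          mean(errs^2)                                 # mean squared sequential prediction error
        }
        # compare, e.g., seq.pred.error(x, c(1, 0, 0)) against seq.pred.error(x, c(1, 0, 1))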

  18. TCO
    Posted Jun 21, 2006 at 11:39 AM | Permalink

    How would Ritson do on the Akaike criterion? What about you? Should there be more factors? Physicality arguments? (The summing to annual is very unsatisfying given that the vast majority of the series are at annual resolution already.)

  19. UC
    Posted Jun 22, 2006 at 9:31 AM | Permalink

    IMO the Ritson coefficient works well if the assumptions – a very low-frequency signal and additive AR1 noise – are valid. It works for a random walk as well. But if the signal has high-frequency components, the coefficient will be underestimated. And I don’t get why the signal (local annual temperature, right?) should have only slow components – it would imply that we could predict near-future local annual temperatures with infinitesimal error. Makes no sense / I’m completely lost.

    BTW, #14: Do you mean the Cooley & Tukey Tukey?

  20. Phil B.
    Posted Jun 22, 2006 at 11:07 AM | Permalink

    Re #4 Demetris, you wrote “Recall from the theory of stochastic processes that a Markovian process is by definition ‘a stochastic process whose past has no influence on the future if its present is specified’ (Papoulis, 1991, p. 635).” But isn’t the present just a weighted sum of the past? I.e., for a first-order discrete Markov process x(n+1) = a*x(n) + e(n) => x(3) = e(2) + a*e(1) + a^2*e(0) + a^3*x(0). Or did I miss your point?

  21. Phil B.
    Posted Jun 22, 2006 at 11:31 AM | Permalink

    Re #3 & 4, It is straightforward to obtain the z-transform transfer function for a moving average. Multiply this transfer function by the AR1 transfer function. Downsampling the output to the annual rate then yields a transfer function that contains this product plus aliases. My guess is that your ARMA(1,1) modeling is a good approximation to this downsampled transfer function. A good reference for this downsampling is chapter 3 of the Strang/Nguyen book “Wavelets and Filter Banks”.
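
    In standard notation (a sketch; D is the length of the averaging window and a the AR1 coefficient): the moving-average transfer function is H_ma(z) = (1/D)*(1 + z^-1 + … + z^-(D-1)), the AR1 transfer function is H_ar(z) = 1/(1 - a*z^-1), and the pre-decimation system is their product H_ma(z)*H_ar(z); decimating by D then folds in the aliased copies of this response.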

  22. fFreddy
    Posted Jun 22, 2006 at 12:08 PM | Permalink

    Re #20, Phil B.
    What you say is true, but there are lots of different combinations of x0, e0, e1 and e2 which could get you to any given x3. With a Markov process, you don’t care about the path by which you got to x3: the distribution of x4 will be the same no matter what the different possible x2, x1, etc.

    Demetris’ point is basically that a climate system must have some “momentum”. You could imagine a point 50 years before the depths of the Little Ice Age, and a point 50 years after, where a snapshot of the global mean temperature would be the same at both points. But it would not be reasonable to say that the environment one year after each of those points has the same probability distribution.

  23. fFreddy
    Posted Jun 22, 2006 at 12:14 PM | Permalink

    I wonder if you could make an argument that the more data you could include in your “environmental snapshot” at any one time, the less you would need multiple ARMA terms? So the multiple terms are (sort of) acting as estimators for all the relevant real-world data that you can’t measure, and that the process could only be Markovian if your data vector contained absolutely all relevant data?
    Hmm. Bit spacey …

  24. Mark T.
    Posted Jun 22, 2006 at 12:23 PM | Permalink

    With a Markov process, you don’t care about the path by which you got to x3: the distribution of x4 will be the same no matter what the different possible x2, x1, etc.

    That’s a good description. I spent a lot of time with Markov this past semester in a Stochastic Modeling class and our teacher was nice enough to make sure we understood this very point.

    Mark

  25. Phil B.
    Posted Jun 22, 2006 at 2:24 PM | Permalink

    Re #22 fFreddy, given the equation x(n+1) = a*x(n) + e(n) where e(n) ~ N(0,1), i.e. e(n) is normally distributed with zero mean and a variance of 1, the pdf for x(n+1) is N(a*x(n), 1), which I believe is what you and Mark T. are pointing out. For the Little Ice Age example you have described, you are now assuming that e(n) has a time-varying mean and variance, i.e. e(n) ~ N(m(t), sigma(t)), or nonstationary statistics. Or that there is a better model than an AR1.

  26. fFreddy
    Posted Jun 22, 2006 at 4:25 PM | Permalink

    The LIA+-50 example was straight off the top of my head and was only intended to illustrate a point: don’t take it too seriously.

    Or that there is a better model than an AR1.

    Unless AR1 describes global temperature perfectly, there must certainly be a better model. Haven’t a clue what it is, though.

One Trackback

  1. By Take a Ritalin, Dave « Climate Audit on Dec 4, 2010 at 1:19 PM

    […] to Ritson’s recent postings at realclimate about autocorrelation which I discussed here and here. Ritson is now promoting the idea that autocorrelation in proxy series is really low and that we […]