Klemeš on Stochastic Processes

TCO asked about physical processes that can generate time series with autocorrelation properties. This is a harder question than it seems and leads into the giant topic of stochastic processes, which rapidly gets very complicated. I’m not in a position to give a thorough answer, although it’s a topic that interests me a lot. I’ve provided here some extended quotes from V. Klemeš, who has spoken to the issue (although more towards persistent processes than AR1 or ARMA(1,1) processes). He also writes acidly on climate modeling, which I also excerpt here, e.g. the following:

This I would call an “honest and humble” search for signals as opposed to the boastful claims of assorted “modellers” about all kinds of climate-change effects, motivated more by politics than by science and reflecting prejudices rather than fact.

The discussion is oriented towards persistent processes, rather than simple AR1 or ARMA(1,1) processes. I’ll come back to it with a few more examples on another day. Klemeš got into hydrology via the study of civil engineering in Czechoslovakia – he reminisces that this was one of the few university courses that a person of politically incorrect background could then study.

Any hydrologist is necessarily interested in series with highly persistent autocorrelations. Mandelbrot modeled such time series (e.g. Nile river levels, tree rings, varves) with fractional difference “long memory” processes. While the very gradually declining autocorrelations of such geophysical series could be modeled through “long memory” processes, Klemeš argued (very plausibly in my opinion) that it was inconceivable that nature engaged in such bureaucratic depreciation accounting, and he sought a plausible mechanism to achieve gradually declining autocorrelations (the contrast with the geometric decay of an AR1 process is sketched after the quotes below). A couple of his criticisms of people who apply inappropriate mathematical models:

Somehow the operational attitude toward mathematical modeling, the exaggerated strife for mathematical tractability and convenience ("Oh Lord, please keep our world linear and Gaussian") has blurred our sense for reality…

Our mathematical models, including time-series models, by which we try to describe geophysical records are only as good as our understanding of the processes that generated them. …a geophysical process must be analysed and understood from a physical point of view before it can be adequately mathematically described. ..

There is one important but often overlooked fact which I shall call the mischief factor of mathematics. It sometimes causes mathematics to frustrate rather than facilitate a scientific discovery: the specific mathematical method used in data analysis may introduce into the result some features which are then wrongly attributed to the physical process being discussed. Perhaps the most common case arises in correlation analysis where the mischief factor often manifests itself as spurious correlation…
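To make the “long memory” contrast concrete, here is a minimal simulation sketch (my own illustration, not anything from Klemeš; it assumes NumPy is installed and the parameters are arbitrary). An AR1 series has autocorrelations that die off geometrically, while a fractionally differenced “long memory” series of the kind Mandelbrot used has autocorrelations that decline only hyperbolically and are still noticeable at long lags.

```python
import numpy as np

def ar1(n, phi, rng):
    """Simulate an AR(1) process: x[t] = phi * x[t-1] + e[t]."""
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def fractional_noise(n, d, rng, ntrunc=1000):
    """Fractionally differenced noise (1 - B)^(-d) e[t], via a truncated MA expansion."""
    k = np.arange(1, ntrunc + 1)
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))  # MA(inf) weights
    e = rng.standard_normal(n + ntrunc)
    return np.convolve(e, psi, mode="full")[ntrunc:ntrunc + n]

def acf(x, lag):
    """Sample autocorrelation at a single lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(0)
n = 10_000
a = ar1(n, 0.9, rng)               # geometric decay: roughly 0.9 ** lag
f = fractional_noise(n, 0.4, rng)  # hyperbolic decay: roughly lag ** (2d - 1)
for lag in (1, 5, 10, 25, 50):
    print(f"lag {lag:3d}:  AR(1) acf = {acf(a, lag):5.2f}   long-memory acf = {acf(f, lag):5.2f}")
```

Both series are highly correlated at lag 1, but by lag 50 the AR1 correlation has essentially vanished while the fractionally differenced series is still visibly correlated.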

To go back to TCO’s question, Klemeš’ own suggested mechanism for generating persistence is the “semi-infinite storage” model, which seems very plausible to me, but whose behavior looks difficult to fully understand:

An exceptionally fruitful concept for the mathematical modeling of hydrological processes is the so-called semi-infinite storage reservoir, especially the type with a fixed bottom and no fixed maximum [Klemeš, 1970, 1971, 1973]. It adequately describes the basic mechanism common to such different water reservoirs as, for instance, a lake, a single dew droplet, a glacier, a groundwater basin and a man-made reservoir operated for flood control or hydroelectric generation. Their common property is, on the one hand, the possibility of running dry and, on the other, the fact that they have no fixed limit of storage capacity (water level in a dam can rise to any elevation above the dam crest, as is demonstrated in the history of dam failures, and a glacier can cover whole continents, as is documented in geological history).

Even a very simple model of this type can reveal very disturbing properties to be expected in hydrologic processes. For instance, a single non-linear reservoir fed with white noise will produce output that is nonstationary, a first-order Markov chain with time-variant serial correlation and random component [Klemeš, 1973]…

Most geophysical processes involve strong cumulative effects: they themselves represent processes of storage fluctuations.
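Here is a minimal sketch of that mechanism (my own toy version, not Klemeš’ own formulation; it assumes NumPy, and the leak rate is an arbitrary choice). The storage has a fixed bottom at zero but no ceiling, the inputs are independent white noise, and the outflow, being a fraction of whatever is currently stored, comes out strongly serially correlated.

```python
import numpy as np

def storage_reservoir(n, leak=0.05, seed=0):
    """Bounded-below, unbounded-above storage fed with white-noise input."""
    rng = np.random.default_rng(seed)
    storage = 0.0
    outflow = np.empty(n)
    for t in range(n):
        inflow = rng.standard_normal()        # independent input each step
        storage = max(storage + inflow, 0.0)  # fixed bottom: can run dry, never go negative
        outflow[t] = leak * storage           # release a fraction of current storage
        storage -= outflow[t]                 # no fixed ceiling on what remains
    return outflow

def lag1_corr(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

flow = storage_reservoir(50_000)
print(f"lag-1 autocorrelation of outflow: {lag1_corr(flow):.2f}")  # strongly positive, unlike the input
```

Even this crude version makes the point: the memory comes entirely from the accumulation, not from anything exotic in the forcing.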

I can sort of see how you can picture El Nino events from this perspective. Klemeš points out that there are often successive levels of integration in geophysical processes, adding to the modeling difficulties. For example, levels of Lake Ontario integrate precipitation; flow of the St Lawrence River is an even further integration.

The difficulty of the problem increases as we move from precipitation records to records of hydrological processes, which already by themselves reflect the effect of some hydrological storage and thus in their raw form already represent cumulative processes or include them as their components.

This is a pretty simple model to picture, and you can intuitively see how it generates autoregressive features, even if it’s hard to go from the process description to the autocorrelation properties. (There are many other processes which yield autocorrelations.) He has an extended discussion of how integration, applied several times, quickly yields “cycles” and “trends”, drawing on Slutsky’s 1933 Econometrica article. So when he points out that:

Hydrological series have a tendency to exhibit more pronounced and smoother cycles than precipitation series.

it is entirely consistent with a Slutsky integration phenomenon.
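The Slutsky effect is easy to reproduce numerically. A minimal sketch (my own illustration, assuming NumPy; the window length and number of passes are arbitrary): repeatedly averaging pure white noise manufactures smooth “cycles” and apparent “trends” out of a series that has none by construction.

```python
import numpy as np

def moving_average(x, window):
    """Simple moving average with the given window length."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

def lag1_corr(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

rng = np.random.default_rng(1)
noise = rng.standard_normal(2_000)   # no cycles, no trend by construction

smoothed = noise
for _ in range(3):                   # average (integrate) several times, à la Slutsky
    smoothed = moving_average(smoothed, 10)

print(f"lag-1 autocorrelation, raw noise: {lag1_corr(noise):.2f}")
print(f"lag-1 autocorrelation, smoothed:  {lag1_corr(smoothed):.2f}")
```

The raw series has essentially zero serial correlation; the repeatedly averaged one is so smooth that it looks cyclical to the eye.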

He observes re proxy data:

Such and similar effects may contaminate interpretations of past climatic trends, in particular when they are reconstructed from proxy data.

I really will try to post up on the Slutsky issue. Meanwhile, I’ll quote from Klemeš on climate modelers:

Nor do I see any point in constructing time-series models for “scenarios” of runoff, precipitation, temperature etc., for the “doubling” of CO2, year 2050 etc., implying an onset of some “stationary” state… This I would call an “honest and humble” search for signals as opposed to the boastful claims of assorted “modellers” about all kinds of climate-change effects, motivated more by politics than by science and reflecting prejudices rather than fact. …
“…their thinking is insincere because their wishes discolour the facts and determine their conclusions, instead of seeking to extend their knowledge to the utmost by impartially investigating the nature of things.” [quoting Confucius]

the ease with which scenarios involving “impacts” of any arbitrarily imposed “climate change” can now be produced even by amateurs and dilettantes whose grasp of problems does not extend beyond an ability to insert DO-loops into the various models to which they may have gained access.

This has led to “metabluffing” where, in contrast to ordinary bluffing described above, it is now not just the various questionable approximations of real (historic) events that are meticulously polished and presented as rigorous science, but concoctions produced by arbitrary and often physically incongruent changes of model parameters, “process realizations” that may be unrealizable under known physical law…

It has often been pointed out [Klemeš, 1982b, 1991b, 1992a; Rogers, 1991; Kennedy, 1991] that not much more can be said about the hydrological effects of a possible climate change beyond the fact that it introduces another source of uncertainty into water management. Specifically, I summarized the achievements of a decade of “climate-change-impact” modelling in these words: “Basically the only credible information obtained from the complex hydrological modeling exercises relating to climate variability and change is that, if the climate becomes drier, there will be less water available and the opposite for a wetter climate…”

14 Comments

  1. Peter Hartley
    Posted Sep 25, 2005 at 3:08 PM | Permalink

    “I can sort of see how you can picture El Nino events from this perspective.” If El Nino events are driven by cold water upwelling around the equator west of South America, they would be just like the lake wouldn’t they? There is a lower limit of zero to the amount of cold water upwelling at that spot, but the amount has no upper bound.

    On the original issue of the origin of ARMA processes, one explanation for an MA error term is that a shock to the system lasts for a longer period than the time gap between observations. The errors for successive observations are then correlated, but the correlation goes away once the gap is large enough that the errors no longer have a shock in common. As for an AR process, a common explanation is that deviations from the mean set up forces that either tend to bring the variable back to mean gradually (a positive AR coefficient) or lead to short-run over-adjustment but still long run convergence (a negative AR coefficient).
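    A minimal simulation of both mechanisms (a sketch only, assuming NumPy; the coefficients are arbitrary): in the MA case adjacent observations share a fading shock and the correlation cuts off after one lag, while in the AR case the restoring force makes the correlation die away gradually.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n = 20_000
    shocks = rng.standard_normal(n)

    # MA(1): each observation feels the current shock plus part of the previous
    # one, so neighbours are correlated but the correlation vanishes beyond lag 1.
    ma = shocks[1:] + 0.6 * shocks[:-1]

    # AR(1): a force pulls each deviation back toward the mean, so a shock's
    # effect fades gradually instead of cutting off.
    ar = np.zeros(n)
    for t in range(1, n):
        ar[t] = 0.6 * ar[t - 1] + shocks[t]

    def acf(x, lag):
        x = x - x.mean()
        return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

    for lag in (1, 2, 5):
        print(f"lag {lag}:  MA(1) = {acf(ma, lag):5.2f}   AR(1) = {acf(ar, lag):5.2f}")
    ```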

  2. JerryB
    Posted Sep 25, 2005 at 4:32 PM | Permalink

    Some upwelling equatorial cold water near South America is “usual”, more than “usual” is associated with La Nina, less than “usual” is associated with El Nino.

    Some factors in El Nino development may reinforce each other (e.g. eastward expansion of the “warm pool”, and reduction of the tradewinds and Walker circulation).

    A single driver may have been identified, but I do not know if it has.

  3. TCO
    Posted Sep 25, 2005 at 8:03 PM | Permalink

    1. Is a random walk process autocorrelated? If so, I can see how storage would give you a random walk versus independent behavior. Each year the net input to the reservoir might change the level by some number of inches. Let’s say the input is independent in its variation. The level will then be a random walk, not an independent Gaussian distribution, no? I could also see someone throwing darts at a board (let’s say a board that had values of -10 to 10). The working sum of the throws would be a random walk? Not sure what the low bottom is supposed to mean. Why is that invoked? That seems more related to skews (2 below).

    2. RE bottom: there is some interesting stuff in Hunter and Schmidt’s review on the value of talent in the workplace (J. Appl. Psych., I think) that talks about how the expected normal variation in output by workers may not be a normal curve but may be a skewed normal. Some obvious physical reasons for this can be if there are impediments or dependencies that limit how well/poorly a worker can do (for instance a machine that tops out at a certain rate). Not sure if this has any relation to stuff that is being done here, but thought I would throw it out.

    3. PDF or citation for the Klemeš article?

    4. Yes, I guess if there are storage factors (or cycles) in the climate, that would explain some correlated behavior of the proxies. Intuitively, it’s a lot easier to understand that in the lake level than in the tree rings. I guess there could be some, though. Tough year having a carryover effect? Or the climate itself having “storage”.

    5. a. Muchas gracias, Peter, that was helpful but only half physical. Could you actually give a (hopefully easy to understand) clichéd example of both of these processes where the physical rationale for the behavior is explained/verified – i.e. the mechanical or biological basis?
    b. On the MA: I guess if the tree is traumatized by a bad year and needs some time to recover (like my lawn when I don’t water it in the summer), this would be such an issue? Don’t know if that actually happens biologically with a tree as I would expect the seasonality of the year (winter) to be far more an issue of “trauma” than a good/bad growing season memory.
    c. On the AR: straining to think of an example for that, other than in man-made control systems. Got a physical example?

  4. Steve McIntyre
    Posted Sep 25, 2005 at 8:49 PM | Permalink

    Re 1. A random walk has an AR1 coefficient of 1 and is the classic case of autocorrelation. I find it helpful to think of AR1 = 0.9 as a blend of 90% random walk and 10% white noise. I’m not sure that it’s exactly right, but if not it’s pretty close.
    Re: #2 – there’s the old salesman distribution – 10% of the salesmen make 90% of the sales. My original hypothesis on the hockeystick was that 10% of the proxies would be accounting for 90% of the shape – that was before I knew how the principal components methods worked, but it was right.

    Re #5. Bristlecones retain needles from a good growth year for about 30 years and have more photosynthetic capacity. Drought can kill tree branches and reduce growth. O18 can diffuse in an ice core, creating autocorrelation.

  5. TCO
    Posted Sep 25, 2005 at 8:57 PM | Permalink

    2: I know you know this, but of course to some extent the Pareto effect (80-20 rule) has to be in effect. I mean…I once saw a chart that said that less than half of stocks got more than half of the gains and I was like…DUH! 🙂

    5. Thanks.

  6. TCO
    Posted Sep 25, 2005 at 9:00 PM | Permalink

    Do you correspond with Klemeš?

    Your first two would be MA processes, no? (storage). The last one I’m not sure of. Is it AR? Something else?

  7. Peter Hartley
    Posted Sep 26, 2005 at 7:26 AM | Permalink

    TCO — Sorry, I lost my internet connection for about 12 hours. Here are two examples of AR and MA processes related to weather. We can think of climate as the long term average weather conditions in a given location that result from the usual locations of pressure belts, ocean currents etc. resulting from oceanic and atmospheric circulations in 3D. Unusual weather for a given location involves a departure from the long term average, but the forces that create the average weather conditions for that location are still operative. They gradually reassert themselves and bring the weather conditions back toward the average. The result would be an AR type of error structure. A hurricane passing a given location might lead to an MA type of structure. The weather over a couple of successive days will be correlated as the hurricane approaches, passes over, then moves on, but the correlation will only be for a few days — an MA structure.

  8. TCO
    Posted Sep 26, 2005 at 8:47 PM | Permalink

    1. Thanks, good examples. Just to nail it down, are there some other processes from other fields that are even more classical? (non-climate examples?) P.S. I really liked the water level thing. I guess that is the Slutsky effect that you refer to offhand.

    2. The stuff about forcing back to the average is interesting. Does anyone ever make analogies to classical control theory (dampers and springs and 2nd-order diff eqs)?

  9. Peter Hartley
    Posted Sep 27, 2005 at 7:56 AM | Permalink

    TCO — the “classic” example of a moving average error structure in financial markets occurs when relevant information is released less frequently than the observation interval. A given piece of unanticipated information then affects a finite number of successive observations.

    As regards forces tending to move variables back to average — yes indeed, in economics people do make analogies to classical control theory. The idea that aspects of the economy can behave like dampers etc. is one theory behind business “cycles” (in quotes because these are stochastic tendencies rather than something deterministic that you could “set your watch by”).

  10. TCO
    Posted Sep 27, 2005 at 8:49 AM | Permalink

    You seem to know your stuff. What is your background?

  11. Posted Jul 2, 2008 at 2:45 AM | Permalink

    I’m also interested to know your background.

  12. Posted Jul 2, 2008 at 2:48 AM | Permalink

    awesome post. Very informative. It really shows your expertise in your field.

  13. Posted Jul 2, 2008 at 2:50 AM | Permalink

    I just don’t get some of the points. Is it different from La Nina phenomenon?

  14. Geoff Sherrington
    Posted Jul 2, 2008 at 5:01 AM | Permalink

    Re Peter Hartley # 1

    Is the following a non-climate example of a shock as you describe:

    one explanation for an MA error term is that a shock to the system lasts for a longer period than the time gap between observations.

    Possible example: taking antibiotics to shock-kill a bacterial infection. If several doses are prescribed, one also needs to prescribe how far apart the “shocks” should be spaced; otherwise the infection grows back between doses and might approach or exceed the starting level. If taken too close together, they admit the possibility of incomplete extermination and regrowth of the bacteria after the dose regime has ended. (I have predator-prey maths in mind, partly, Hartley.) Do clinical mathematicians have relevant models? I have no idea.