Comments on: A realclimate Advisory

By: Mark Frank

Mark Frank — Wed, 11 Jan 2006 07:10:36 +0000

John and Steve, thanks for your time. I am not convinced, but I now understand your viewpoint better. I suspect any further debate would waste everyone’s time.

By: John S

John S — Wed, 11 Jan 2006 01:09:50 +0000

One aspect of complicated model vs simple model is comparing reduced form models with structural models.

Structural models put in all the steps in a chain while reduced form models solve out for all the intermediate steps and express equations in their simplest possible form by removing intermediate variables. Depending on your purpose, each can perform a role and mathematically they are equivalent. For forecasting or real world applications, however, a problem with structural models is that errors at each step of a chain can cumulate and your ultimate forecast will be very poor indeed. A reduced form model just needs to estimate the total effect and is much more robust to real world errors.

I will try an analogy suggested by Rasmus: If I want to know the effect on the temperature of a gas from adding energy to it, this can be expressed very simply in one equation. This is akin to a reduced form model. A structural model, on the other hand, is akin to trying to model each particle in the gas. Theoretically you should get the same result, but in practice you won’t. The reduced form still incorporates physics in its parameter estimates, but has distilled them down to only those relationships that are necessary for the purpose at hand.

Your comment about correlation and causation is important but pretty funny. MBH essentially confuses correlation with causation for bristlecone pines and global temperature without any understanding of the underlying causation (which turns out to be pretty evanescent). Anyway, there are a number of statistical tests that can minimise confusing correlation with causation. An important one in the current context is that with trending series (high autocorrelation) there is a significant problem of spurious correlation (or related issues) and you need to explicitly test for this possibility. The whole realclimate thread was kicked off by a paper that (to paraphrase) said “you might be confusing correlation with causation because you are not accounting for high autocorrelation properly”. So I agree with you absolutely that correlation vs causation is a problem – but I see the fault in this issue with Rasmus et al because they seem to be doing precisely that.
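
A minimal sketch of the spurious-correlation problem with trending series, assuming pure random walks as a stand-in for highly autocorrelated data. The function names, series length, trial count and seed are illustrative choices, not anything taken from the paper under discussion. It counts how often two independent series look "significantly" correlated when the cutoff assumes i.i.d. data.

```python
import math
import random


def pearson_r(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def white_noise(rng, n):
    """Independent draws: no autocorrelation."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(rng, n):
    """Cumulative sum of independent draws: very high autocorrelation."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def false_alarm_rate(make_series, trials=500, n=100, seed=0):
    """Share of independent pairs whose |r| exceeds the i.i.d. 5% cutoff."""
    rng = random.Random(seed)
    r_crit = 1.96 / math.sqrt(n)  # approximate two-sided 5% cutoff, valid only for i.i.d. data
    hits = 0
    for _ in range(trials):
        a = make_series(rng, n)
        b = make_series(rng, n)  # generated independently of a
        if abs(pearson_r(a, b)) > r_crit:
            hits += 1
    return hits / trials


white_rate = false_alarm_rate(white_noise)
walk_rate = false_alarm_rate(random_walk)
print(f"white noise pairs flagged:  {white_rate:.2f}")
print(f"random walk pairs flagged:  {walk_rate:.2f}")
```

The white-noise rate should sit near the nominal 5%, while the random-walk rate should be many times higher even though every pair is independent by construction.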

As for your cats, in the real world the perfect is the enemy of the good. Provided you knew the limits to extrapolation from your model (i.e. the range over which your linear approximation to the non-linear relationship held true) then it sounds like an eminently useful and sensible solution to a problem. Only if you had to deal with very high vermin numbers outside the limits of your model would you have needed to improve the model.

You can Google for ‘RBC model’ but explaining it wouldn’t particularly benefit the discussion unless you are already familiar with them and their place in the history of economic thought.

By: Mark Frank

Mark Frank — Tue, 10 Jan 2006 08:18:47 +0000

Re #52.

John – I am not sure why you think a model based on scientific reality need have excess variables. It should have the right variables that reflect what is actually going on in the world. This may be more or less complex than a mathematical model based purely on the observed data – but it certainly shouldn’t have unnecessary variables. The climate is complex and any model that reflects reality is likely to be complex – but that’s only because all those factors affect the outcome – I don’t think anyone is suggesting throwing in variables that are irrelevant.

I do agree that the best model is the one that makes the best forecasts. A model that is based purely on finding the simplest solution that describes past data but has no foundation in the underlying science will surely come unstuck in the future (unless it happens by luck to also represent reality). In the example of my cats a linear model was a very simple solution that perfectly described the data available. However, it would come severely unstuck had I used it to predict the number of cats with very high levels of vermin (or vice versa). A solution based on a better understanding of the physical attributes of cats would be much more robust in making future predictions.
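
A small sketch of the point about a linear model that fits the available data well but fails at the extremes. The saturating curve `true_response` is a made-up stand-in (it is not the cat and vermin data, which are not given here), and the fitting range and test points are arbitrary assumptions.

```python
def true_response(x):
    """Hypothetical saturating relationship: rises quickly, then levels off near 10."""
    return 10.0 * x / (5.0 + x)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Observations are only available over a narrow range, 0 to 2.
xs = [0.25 * i for i in range(9)]
ys = [true_response(x) for x in xs]
slope, intercept = fit_line(xs, ys)


def linear_model(x):
    return intercept + slope * x


in_range_error = abs(linear_model(1.0) - true_response(1.0))
extrapolation_error = abs(linear_model(30.0) - true_response(30.0))
print(f"error inside the fitted range (x=1):   {in_range_error:.2f}")
print(f"error far outside the range (x=30):    {extrapolation_error:.2f}")
```

Inside the observed range the line is a close approximation; far outside it, the line keeps climbing while the hypothetical true curve has flattened.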

Doesn’t this go right back to your first statistics course – don’t confuse correlation with causation (even auto-correlation!)? I believe this means – not just that you can’t deduce causation from correlation but that you should not make predictions from an observed correlation unless you understand the underlying causation.

PS What’s an RBC model?

By: Steve McIntyre

Steve McIntyre — Tue, 10 Jan 2006 04:15:01 +0000

John S.: re your random comment, arguably a lot of microeconomics is simply the study of convex functions. I remember doing an econometrics course nearly all about convex functions (this was 1969). Here’s an intriguing comment about thermodynamics: "phenomenological thermodynamics is the study of Legendre transformations of convex functions." So one would expect some analogies to develop. I’ve been mulling over some of the parallels from time to time, but my head gets sore after a while and I think about proxies to soothe the pain.

By: John S

John S — Tue, 10 Jan 2006 03:29:03 +0000

Re #45
By ‘better’ I mean either it fits the data better or forecasts better. There are numerous statistics that can measure this quantitatively (e.g. R-squared or forecast error variances).

“If the simple model is created by tinkering with models until you find one that fits the data then it will almost certainly be “better” than the GCM model.”

The risk of ‘overfitting’ or otherwise generating the result you want with enough hard work is real and serious – but it applies equally to GCMs as simple models. However, in the long run, one model can’t be better than another model unless it contains all that is necessary to forecast (and nothing more). Just because a particular model fits the data well doesn’t mean it is right. But if a particular model doesn’t fit the data then you can be certain it is wrong. That is why rigorous statistical testing of models is required – to weed out the pretenders. No model should have a divine right to rule absent verified and consistent performance.

Let me try to explain: Suppose that the true model of the world is that x=f(y,z). If I have a model that is univariate, i.e. x=f(x) (forgive my abuse of notation here; making it correct is just too much trouble, and hopefully the idea is clear), then it cannot be better than a properly specified model with x=f(y,z). On the other hand, if you have a model where x=f(a,b,c,d,e,f,g,…) it can ultimately be no better at forecasting than the simple univariate model x=f(x) (and probably worse). The reason is that whenever y or z change, any model that excludes them will perform poorly – certainly worse than any model that includes them. Alternatively, if a complicated model is the same at forecasting as a univariate model I would be inclined to believe that all the extra variables that are included in the complicated model are, in fact, irrelevant.

Statistical testing is about achieving optimal parsimony, including everything that should be in there but nothing else (Occam’s Razor again).
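
A rough sketch of the parsimony argument, under assumptions that are entirely illustrative: the true relationship is taken to be linear, x = 2y - z + noise, and three regressions are compared on out-of-sample forecast error. One omits y and z, one includes exactly y and z, and one includes y and z plus eight irrelevant regressors. The names `ols_fit`, `simulate` and `compare_specs` are hypothetical helpers written for this example.

```python
import random


def ols_fit(X, y):
    """Least squares via the normal equations, solved by Gaussian elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    coef = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (A[i][k] - tail) / A[i][i]
    return coef


def simulate(rng, n, n_junk):
    """Draw n observations of (x, y, z, junk) from the assumed true model."""
    rows = []
    for _ in range(n):
        y, z = rng.gauss(0, 1), rng.gauss(0, 1)
        junk = [rng.gauss(0, 1) for _ in range(n_junk)]  # unrelated to x
        x = 2.0 * y - 1.0 * z + rng.gauss(0, 1)
        rows.append((x, y, z, junk))
    return rows


def design(rows, spec):
    """Regressor matrix for each specification (always with an intercept)."""
    if spec == "omitted":
        return [[1.0] for _ in rows]
    if spec == "correct":
        return [[1.0, y, z] for _, y, z, _junk in rows]
    return [[1.0, y, z] + junk for _, y, z, junk in rows]  # "kitchen_sink"


def compare_specs(reps=30, n_train=40, n_test=200, n_junk=8, seed=2):
    """Average out-of-sample mean squared forecast error per specification."""
    rng = random.Random(seed)
    totals = {"omitted": 0.0, "correct": 0.0, "kitchen_sink": 0.0}
    for _ in range(reps):
        train = simulate(rng, n_train, n_junk)
        test = simulate(rng, n_test, n_junk)
        for spec in totals:
            coef = ols_fit(design(train, spec), [r[0] for r in train])
            sq_errors = [
                (r[0] - sum(b * v for b, v in zip(coef, row))) ** 2
                for r, row in zip(test, design(test, spec))
            ]
            totals[spec] += sum(sq_errors) / len(sq_errors)
    return {spec: total / reps for spec, total in totals.items()}


avg_mse = compare_specs()
for spec, mse in avg_mse.items():
    print(f"{spec:13s} forecast MSE: {mse:.2f}")
```

Under these assumptions the correctly specified regression should forecast best, the one with extra irrelevant regressors somewhat worse, and the one that leaves out y and z worst of all.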

(As for your cats, extremes can be tricky, but local linear approximations of non-linear relationships can be very useful – even climate scientists do it.)

[And in a random comment that I just wanted to put out there for those that have ears to hear, GCMs remind me of RBC models – a lot.]

By: Mark Frank

Mark Frank — Mon, 09 Jan 2006 19:02:23 +0000

Thanks for the info on GCMs. I might try and get some info on how they work on RC as I am not a known enemy 🙂

By: Steve McIntyre

Steve McIntyre — Mon, 09 Jan 2006 17:59:36 +0000

#48. Mark, GCMs are not run thousands of times – that’s one of the problems. Ammann told me that 25 years of model-time took one calendar day of supercomputer time. So there has never been even one run of a GCM covering the Pleistocene – anything that’s been run is something quite different. But you must recognize that you are really dealing with single runs of GCMs. Presumably these have been selected and we don’t know much about the selection process. There is more than one model, but in each case we are dealing with selected single runs – so, if there is a monoculture (as there is), there is a real possibility of systemic choice-making (from a statistical point of view).

It is also a very real issue whether GCMs are properly “grounded” in physical reality, not least of which is that the parameterizations used in GCMs are ad hoc methods of “solving” Navier-Stokes on a gridcell basis, but there is no proof that you can expand Navier-Stokes from the micro-basis that it is shown to apply to, to gridcell aggregates.

One way of testing the “grounding” itself is whether they generate autocorrelation features that match proxy information. I get the impression that they don’t do this very well – which points to potential problems in their methodology. If the data has persistence (as it does), then, if your models do not yield similar persistence, it seems evident to me that there must be some problem with the GCM and that you cannot rely on it to generate null distributions. (However, since you are dealing with single runs, they are not generating null distributions anyway.)

By: Mark Frank

Mark Frank — Mon, 09 Jan 2006 15:50:27 +0000

Steve

Rather than get into what Rasmus has said elsewhere, why not consider the core proposition in the RC thread – independent of who said it. I take this as:

1) Stochastic models that reflect the data, but are not founded in physical reality, are not reasonable candidates for generating the null hypothesis and this includes autocorrelation models.

2) GCMs are founded in physical reality and reflect the data and therefore do provide a reasonable basis for generating the null hypothesis. (I am not at all clear how this works – do they run the models thousands of times with marginally different starting conditions?)

I don’t know enough about GCMs to comment on the second part and I think you are saying you don’t either. So I guess someone else will have to fill that in.

I don’t see why autocorrelation in proxies is relevant to 1. As I understand it, the proxies are likely to be correlated with the temperature, but surely they are consequences of the temperature not causes. So they are really just better ways of estimating the temperature. They are not independent of the data. What is needed is a model that takes into account the *causes* of the temperature data and shows why an autocorrelation model is appropriate. This is not a dispute about the accuracy of the data. It is about the best model to use to predict it in the future.

Unfortunately I can’t view Demetris’ papers without paying $30, which is a bit more than I am prepared to spend. The abstracts suggest that, while they are undoubtedly outstanding work, they are all about the consequences of different models on the data. This is not relevant to deciding whether the models have a physical justification or not. But maybe there is more in the papers themselves.

I have thought of another way of looking at this. Significance testing is all about the probability of the observed data given a model and statisticians are experts at this. They are aware of a number of models and can select models that best predict the observed data i.e. the probability of the data given the model is high including subtle characteristics of the data such as autocorrelation. However, going right back to basics, this is not the same as the probability of the model. To estimate that we need an a priori estimate of the probability of the model independent of the data (which can be modified somewhat by the observed data). This is where the subject matter expert comes in. They can look at a model proposed by the statistician and say “if that happened it might explain the data but I am afraid the world doesn’t work like that”.
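
A worked example of the distinction above between the probability of the data given a model and the probability of the model itself, using Bayes' rule. All the numbers are invented for illustration: the priors stand for the subject matter expert's judgement before seeing the data, and the likelihoods stand for how well each model explains the observed data.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a closed set of candidate models."""
    joint = {m: priors[m] * likelihoods[m] for m in priors}
    total = sum(joint.values())
    return {m: j / total for m, j in joint.items()}


# Hypothetical inputs, chosen only to illustrate the argument.
priors = {"curve_fit_only": 0.05, "physically_grounded": 0.95}
likelihoods = {"curve_fit_only": 0.20, "physically_grounded": 0.05}

post = posterior(priors, likelihoods)
for model, p in post.items():
    print(f"P({model} | data) = {p:.3f}")
# With these made-up numbers: curve_fit_only 0.174, physically_grounded 0.826
```

Here the curve-fitted model explains the data four times better, yet it remains the less probable model because the expert gave it a low prior. Different priors would of course give a different answer, which is exactly where the disagreement lies.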

By: Steve McIntyre

Steve McIntyre — Mon, 09 Jan 2006 13:59:24 +0000

#41: Mark, Cohn and Lins argued that statistical significance tests should attend to the possibility of persistence in the data – which changes the relevant significance test dramatically. Rasmus has objected that you can’t define the persistence based on the temperature data itself. This is one issue and I’ve seen disagreement on this. But aside from the temperature data, you get very high levels of persistence in many forms of proxy data – some of which I’ve surveyed here, ranging from Vostok to Bob Carter’s ODP logs. Demetris Koutsoyiannis has pointed out the same phenomenon in far more detail in his publications. So there is much evidence of persistence which is independent of the temperature data itself.
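
A minimal sketch of how persistence can change a trend significance test. It uses a simple AR(1) process as a stand-in for persistence, which is an assumption of this example and not the specific model used by Cohn and Lins; the coefficient 0.9, series length, trial count and seed are likewise arbitrary. Every simulated series has no trend at all, and the code counts how often a naive trend test that assumes independent errors declares one anyway.

```python
import math
import random


def ar1_series(rng, n, phi):
    """AR(1) noise with no trend: x[t] = phi * x[t-1] + e[t], started at zero."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def naive_trend_t(series):
    """t-statistic for the OLS slope against time, assuming independent errors."""
    n = len(series)
    times = range(n)
    mt, ms = (n - 1) / 2, sum(series) / n
    stt = sum((t - mt) ** 2 for t in times)
    sts = sum((t - mt) * (s - ms) for t, s in zip(times, series))
    sss = sum((s - ms) ** 2 for s in series)
    r = sts / math.sqrt(stt * sss)
    return r * math.sqrt((n - 2) / (1 - r * r))


def trend_false_alarm_rate(phi, trials=400, n=100, seed=1):
    """Share of trendless series that the naive test flags at roughly the 5% level."""
    rng = random.Random(seed)
    t_crit = 1.98  # approximate two-sided 5% cutoff for about 98 degrees of freedom
    hits = sum(
        abs(naive_trend_t(ar1_series(rng, n, phi))) > t_crit for _ in range(trials)
    )
    return hits / trials


iid_rate = trend_false_alarm_rate(phi=0.0)
persistent_rate = trend_false_alarm_rate(phi=0.9)
print(f"no persistence (phi=0.0):     {iid_rate:.2f}")
print(f"strong persistence (phi=0.9): {persistent_rate:.2f}")
```

With independent noise the naive test should flag a trend in roughly 5% of series, as designed. With strong persistence it should flag a far larger share, which is why the choice of null model matters so much.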

It seems to me to be a valid question whether the GCMs generate the forms of autocorrelation observed in the long proxy data. I don’t know at present, but I would be cautious about taking Rasmus’ word for it.

Aside from this particular thread, Rasmus has published both at realclimate and in the literature on statistics of record levels based on i.i.d. distributions – none of this survives any significance testing in which autocorrelation is considered. My recent post on Trenberth shows that Trenberth pointed out the problem (well known in general statistics) to climate scientists as long ago as 1984.

By: Mark Frank

Mark Frank — Mon, 09 Jan 2006 13:38:08 +0000

Louis – I am sorry – I don’t know which model you are referring to when you say “That mathematical model did not, and this is the problem.”