Rational Decisions, Random Matrices and Spin Glasses

Interesting title, no? What if I added Principal Components to this odd concatenation of concepts?

Galluccio et al. (1998) published a paper with the above title (see the reference below), which has led to a number of follow-ups that you can locate by googling. I’ll try to summarize Galluccio’s basic idea and then tie it back into principal components and multiproxy networks.

I haven’t fully grasped all aspects of this article but the concepts seem intriguing relative to issues that we are working on.

Galluccio considers the construction of financial portfolios from networks in which the stocks have low inter-series correlation (a "noisy covariance matrix"). Short selling (negative coefficients) is allowed, subject to the constraint that you have to put up margin on the short sales. The conclusion is that allowing negative coefficients results in a dramatic change in optimization properties. Galluccio describes the problem as follows:

In order to illustrate this rather general scenario with an explicit example, we shall investigate the problem of portfolio selection in the case one can buy, but also to short sell stocks, currencies, commodities and other kinds of financial assets. This is the case of ‘futures’ markets or margin accounts. The only requirement is to leave a certain deposit (margin) proportional to the value of the underlying asset [4].

Galluccio et al. compare this problem to the analysis of "spin glasses" and borrow some technical methods from solutions to that problem in physics. They demonstrate that an exponentially large number of "optimal" portfolios can be developed under plausible constraints and that slight changes in assumptions can lead to very different results.

By means of a combination of analytical and numerical arguments, we have thus shown that the number of optimal portfolios in futures markets (where the constraint on the weights is non linear) grows exponentially with the number of assets. On the example of U.S. stocks, we find that an optimal portfolio with 100 assets can be composed in ~10^29 different ways! Of course, the above calculation counts all local optima, disregarding the associated value of the residual risk R. One could extend the above calculations to obtain the number of solutions for a given R. Again, in analogy with [10], we expect this number to grow as exp(N f(R, a)), where f(R, a) has a certain parabolic shape, which goes to zero for a certain ‘minimal’ risk R*. But for any small interval around R*, there will already be an exponentially large number of solutions. The most interesting feature, with particular reference to applications in economy, finance or social sciences, is that all these quasi-degenerate solutions can be very far from each other: in other words, it can be rational to follow two totally different strategies!

[It is] ‘chaotic’ in the sense that a small change of the matrix J, or the addition of an extra asset, completely shuffles the order of the solutions (in terms of their risk). Some solutions might even disappear, while other appear. … When, in addition, nonlinear constraints are present, we expect a similar proliferation of solutions. As emphasized above, the existence of an exponentially large number of solutions forces one to re-think the very concept of rational decision making.
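To get a feel for this proliferation on a toy scale, here is a minimal Python sketch. It is not Galluccio's calculation. It assumes a simplified version of the problem: minimize the risk w'Cw subject to the margin-type constraint that the absolute weights sum to 1, with no target-return constraint. Under that assumption, within a given pattern of long/short signs s the stationary point is w = C^{-1}s / (s'C^{-1}s), and it counts as a local optimum when the signs of w agree with s. The function names and the random test matrices are hypothetical.

```python
import itertools
import numpy as np


def noisy_correlation(n_assets, n_obs, rng):
    """Sample correlation matrix of independent random series (pure noise)."""
    returns = rng.normal(size=(n_obs, n_assets))
    return np.corrcoef(returns, rowvar=False)


def count_local_optima(C):
    """Count sign patterns s whose stationary point w has sign(w) == s.

    Assumed toy problem: minimize w' C w subject to sum(|w_i|) = 1.
    Brute force over all 2^n sign patterns, so only usable for small n.
    """
    n = C.shape[0]
    C_inv = np.linalg.inv(C)
    count = 0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        s = np.array(signs)
        w = C_inv @ s
        w = w / (s @ w)  # rescale so that s.w = 1
        if np.all(np.sign(w) == s):  # self-consistent: sum(|w_i|) = 1 holds
            count += 1
    return count


rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
for n_assets in (4, 8, 12):
    C = noisy_correlation(n_assets, n_obs=3 * n_assets, rng=rng)
    print(n_assets, "assets:", count_local_optima(C), "local optima")
```

Each counted sign pattern is a distinct portfolio that cannot be improved by small changes in the weights, so the printed counts give a rough picture of how quickly the alternatives multiply as assets are added in this simplified setting.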

Now one of the things that we’re seeing as more is learned about the properties of MBH98-type methodologies is the proliferation of alternative solutions. But my hunch, and it’s only a hunch, is that Burger and Cubasch have only scratched the surface of the proliferation with their 64 flavors. I think that the real problem is that, when you have a noisy covariance matrix, which is what the MBH networks are, changes in weighting factors resulting from seemingly equally plausible methods can lead to very different reconstructions. I think that the number of flavors may be much more than 64 and may even expand exponentially in line with Galluccio et al (but I’m not sure of this.) I haven’t been able to fully map the two problems one to the other, but here are some of the similarities that strike me almost immediately.

First, both have constraints. In Galluccio, this is the portfolio constraint. In a singular value decomposition of a matrix (equivalent to principal components), the sum of squares of the eigenvector coefficients adds up to 1. This looks like it might be equivalent to the Galluccio constraint.
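The principal components side of this comparison can be checked numerically. The sketch below uses a made-up random data matrix and NumPy's `np.linalg.svd`; it confirms that each eigenvector (each row of `Vt`) has coefficients whose squares sum to 1. For comparison it also reports the sum of absolute coefficients, on the assumption (mine, not necessarily the paper's exact formulation) that a margin-type constraint fixes the sum of absolute weights instead.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

# Hypothetical 50 x 6 data matrix (50 observations of 6 series), centered.
X = rng.normal(size=(50, 6))
X = X - X.mean(axis=0)

# Rows of Vt are the eigenvectors of X'X (the principal component weights).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

sum_of_squares = np.sum(Vt**2, axis=1)    # the PC normalization
sum_of_abs = np.sum(np.abs(Vt), axis=1)   # size under an absolute-value constraint

print("sum of squared coefficients:", np.round(sum_of_squares, 6))
print("sum of absolute coefficients:", np.round(sum_of_abs, 3))
```

Both constraints leave the signs of the coefficients free, which is the feature the next point turns on, but they are not the same normalization, so any mapping between the two problems would have to account for that.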

Second, both can have positive and negative signs. In Galluccio, it is short selling. Obviously, one of the essential aspects of principal components methods (and more generally regression methods) is that you can have both positive and negative signs. This seems like a small thing, but Galluccio considers that this is very important in the proliferation of solutions. Now it seems to me that if something is supposed to be a temperature proxy, you should know ex ante whether it points up or down and you should not permit your multiproxy method to invert the sign. Mannian methods permit this – thus some instrumental gridcell temperature series actually have negative coefficients in the later MBH reconstruction steps (they are flipped over).

Hughes at the NAS Panel made an interesting distinction between the "Schweingruber Method" and the "Fritts Method". The "Schweingruber Method" in principle selects temperature-sensitive proxies ex ante and averages them or some such simple method. The Fritts method didn’t worry about whether proxies were temperature proxies or precipitation proxies. It just put everything into a hopper, turned on the black box and waited for the answer at the other end. Mann took this to its logical extreme.

The Schweingruber method led to Schweingruber et al 1993 with its large network of over 400 temperature-sensitive sites, also described in several Briffa et al publications, about which I’ve previously posted. The problem for the Hockey Team is that the Schweingruber method, which seems far more logical to me, led to the "divergence problem" – ring widths and densities went down over the large population in the second half of the 20th century, as grudgingly admitted in several Briffa, Schweingruber publications.

The Mann method, as I’ll show in more detail in another post, seems to have taken the exact opposite position: allowing any sort of proxy, including precipitation proxies, even instrumental precipitation measurements, hoping that the multivariate method would sort it out through teleconnections. This complete disregard for whether something is a temperature proxy and total reliance on multivariate statistics to sort out the mess is what increasingly seems to me to be the distinctive "contribution" of MBH.

However, it seems obvious that people have not fully thought through the properties of the Mann method as applied to noisy networks. Galluccio has already established some properties of wild networks and some of their findings should translate into the analysis of multiproxy networks. But exactly how? Maybe Jean S or Luboš will have some thoughts on this.

You can also see why the issue of covariance or correlation PC methods gets submerged when you start thinking about the properties of noisy covariance (or correlation) matrices. At the end of the day, reconstructions are simply linear combinations of the original proxies and each multivariate method is simply yielding a set of weighting factors. These weighting factors are called "regression coefficients" under a multiple linear regression and an eigenvector under principal components/singular value decomposition. A simple average results from weighting factors of 1/n,..,1/n. The issue in noisy covariance matrices is that the final reconstruction is not robust to different plausible choices of weighting factors.
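To make the "set of weighting factors" point concrete, here is a minimal Python sketch on synthetic data. Everything in it is an assumption for illustration: the network size, the weak common signal, the 40-year calibration window and the function names are invented, and none of it reproduces the MBH98 procedure. It simply builds three weight vectors (simple average, first eigenvector of the correlation matrix, and multiple regression coefficients) and forms each "reconstruction" as the same kind of linear combination of the proxies.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical noisy network: 15 standardized "proxies" over 120 "years",
# each a weak common signal plus independent noise.
n_years, n_proxies = 120, 15
signal = np.cumsum(rng.normal(size=n_years))
signal = (signal - signal.mean()) / signal.std()
X = 0.3 * signal[:, None] + rng.normal(size=(n_years, n_proxies))
X = (X - X.mean(axis=0)) / X.std(axis=0)


def weights_simple_average(X):
    """Weights 1/n, ..., 1/n."""
    n = X.shape[1]
    return np.full(n, 1.0 / n)


def weights_first_pc(X):
    """Leading eigenvector of the correlation matrix (its sign is arbitrary)."""
    corr = (X.T @ X) / X.shape[0]  # X is standardized, so this is a correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    return eigvecs[:, -1]


def weights_regression(X, target, calib):
    """Least-squares coefficients over a calibration window (no intercept, for brevity)."""
    coef, *_ = np.linalg.lstsq(X[calib], target[calib], rcond=None)
    return coef


calib = slice(n_years - 40, n_years)  # assumed calibration period: last 40 years
weights = {
    "average": weights_simple_average(X),
    "pc1": weights_first_pc(X),
    "regression": weights_regression(X, signal, calib),
}

# Every method yields a reconstruction of the same form: proxies times weights.
recons = {name: X @ w for name, w in weights.items()}

names = list(recons)
corr_between = np.corrcoef(np.vstack([recons[k] for k in names]))
print(names)
print(np.round(corr_between, 2))  # PC1 correlations may be negative (arbitrary sign)
```

Changing the noise level, the calibration window or the seed and re-running shows how much the three reconstructions agree or disagree for a given network.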

Arguably the NAS Panel has implicitly rebuked the grab-bag Mann approach to proxy selection, stating the following:

Using proxies sensitive to hydrologic variables (including moisture-sensitive trees
and isotopes in tropical ice cores and speleothems) to take advantage of observed correlations
with surface temperature could lead to problems and should be done only if the proxy–
temperature relationship has climatologic justification. (p. 110)

Reference: Stefano Galluccio, Jean-Philippe Bouchaud and Marc Potters, 1998. Rational Decisions, Random Matrices and Spin Glasses http://www.citebase.org/cgi-bin/fulltext?format=application/pdf&identifier=oai:arXiv.org:cond-mat/9801209

If you google noisy covariance matrix together with galluccio and various combinations, you can get a bibliography pretty fast.

1. Dave Dardinger

Steve,

Perhaps Ross will weigh in on this topic. I’ve finally got around to getting “Taken By Storm,” for which you’ve unjustly taken some grief. (People wanting you to defend something you weren’t a part of and aren’t apparently qualified to talk about.) Anyway, much of Chapter 3, which I’m presently reading, is concerned with the complexity of systems. I’d think that either Christopher or Ross would have something to say on whether there’s a connection between such a portfolio problem and Principal Components analysis or not.

2. TCO

I’m not a quant jock. Understand the very basic concepts from an MBA, non-math perspective. Will try to read the article.

In your post, I don’t quite understand the comment about local optimums. From a finance perspective, what matters is the actual optimum (portfolio risk reduction). Local optima would be mathematical curiosities.

Yes, clearly Mann has taken an approach that is very prone to mining and overfitting and the Bloomfield (name?) comment in the NAS questioning that you can qualify which flavor of BC to use by recent performance fits right in there. The economics article on sinning in the basement seems relevant here as well (as data mining is a sin that is cautioned against, but some argument for a bit of it now and then is made). I think fishing for relationships is interesting. But only if you follow it up with some qualification of what you’ve found. Not just a math mess.

In addition to the mining of the Mann hopper, the whole thing seems rather strange. How did he come up with the overall method that he uses for the recon? The various steps and combinations? Why are some proxies more equal than others? How does he justify his particular geo-weighting? The whole thing seems like a kludge.

I guess I need to re-read MBH98, but one thing that I’ve always wondered, always bothered me. Does he give a rationale for what series are included, what not? I think this is a standard thing in social science meta-analysis. You describe what the criteria are and then say that you did a lit search and this was what came back.

3. TCO

Disregard the remark about local optima. I didn’t understand what was going on. Reading the paper now and it makes more sense.

4. Steve McIntyre

He said in Mann et al 2000 that he chose series according to “clear a priori criteria” and that this was one of the reasons for the robustness of the result. We asked Nature to provide us with what the criteria were. Mann refused and Nature refused. The journals are good institutions on balance, but they are really pathetic in some respects.

We raised the data selection issue with NAS and they dodged it.

5. TCO

Description of the series choosing method (or sample choosing if this were social science, or study choosing if this were medical science) is a basic part of the methodology. That is part of the meta-analysis. Something that people can look at, can see if it is appropriate, can speculate about how different criteria would give different results. It pains me that Mann, the mathematical physicist, is so oblivious of basic practices from the weaker sciences like sociology. And how can the data selection be from 2000? It should be part of the 1998 paper. I really need to reread it. I’m just feeling lazy today.

6. TCO

I looked at the paper. The part at beginning with the portfolio theory stuff was quite readable. The part where it got into math was above me.

I think one of the things that may make things curious by including short-sales is the nature of the option. In a sense a short sale has unlimited downside. Is the model including this aspect or just the margin at risk? If you consider that they may take your house away from you, then you really have more money at play than just the margin.

7. Paul Linsay

#2

Local optima would be mathematical curiosities.

Um, no. This is very common in multivariate non-linear systems. There’s one best optimum, but lots of others that are nearly as good. The most famous example is the traveling salesman problem. What’s the shortest route between N different towns? It turns out to be impossible to solve exactly without simply walking through all possible routes when N is large. But you can get good approximations that are close to the best.

8. TCO

Agreed. I thought they meant local optima in the sense of variation of a parameter (of curvature) but far from the optimal frontier. The paper spells it out. Saw that when I read it.

Still don’t think much of Steve’s point about covariance and correlation getting lost when you look at the larger concept. I think this derives too much from an advocacy and a he-said she-said view of the discussion as just about trashing Mann or Steve. I have no idea if correlation or covariance makes more sense in the given example of MBH. But I do know that we should not tar the sin of off-centering with some of the result that comes from standard deviation dividing. That’s just clear thinking and disaggregation of issues.

9. fFreddy

Re #7, Paul Linsay

There’s one best optimum, but lots of others that are nearly as good.

True, in most real world examples. But in theory, there is no reason not to have multiple optima – like multiple minima on y= (x^2 – 1)^2.

10. Posted Jul 1, 2006 at 8:47 AM

My intuition on this is that a better model of the eigenvalues is the Boltzmann distribution, with the values as energy levels rather than strategies, which seems too complex for the PCA. In a ‘cold’ system with little long variance, the eigenvalues are more evenly spread than in a network with a ‘hotspot’ like the proxies with a hockey stick.

Following the analogy to VS and his tame networks, there really is only one independent dimension there, so the result should be fairly tame. In the real world case there is greater independence between the samples, which the PCA resolves into eigenvectors. Due to noise these can come out in different orders, so there is a lack of robustness introduced by taking a small finite number. The real world series are probably sampled from an infinite-dimensional population with cross correlation and autocorrelation, a complexity that is very hard to replicate.

11. Dave Dardinger