Right now, I’m working on two main projects where I intend to produce papers for journals: one is on the non-robustness of the "other" HS studies; the other is on MBH98 multivariate methods. The latter topic is somewhat "in the news" with the two Bürger articles and with the exchange at Science between VZ and the Hockey Team goon line of Wahl, Ritson and Ammann. ("Goon" is a technical term in discussion of hockey teams: Hockey Teams all have enforcers sent out to fight with and intimidate opposing players.)
From work in progress on the second project, here are some comments on the linear algebra in Bürger et al. Contrary to B-C claims that MBH98 methods cannot be identified in known multivariate literature, I show that, even though this is seemingly unknown to the original authors and later commentators, MBH98 regression methods can be placed within the corpus of known multivariate methods as one-factor Partial Least Squares (this is probably worth publishing by itself, although it opens the door to other results). (Update: I’ve edited this to reflect comments from Gerd Bürger, which have reconciled some points.)
While there is much to like in Bürger and Cubasch 2005 and Bürger et al 2006, they were unable to place MBH98 within a multivariate framework, stating:
MBH98 utilize a completely different scheme which one might call inverse regression. (To our knowledge, no other application makes use of this scheme.)
They summarize the algebra of MBH as follows:
There, one first regresses x (proxies) on y (temperature) and then inverts the result, which yields, according to eq. (5), the linear model
where the “+” indicates the Moore–Penrose (pseudo) inverse, with , , etc. denoting the cross-covariance between x and x, and y and x, respectively and using the matrix
Here’s what I agree with and disagree with in this formulation:
1) I agree that the reconstructed PCs are linear combinations of the proxies (as is the NH temperature index), a point that we made as early as MM03, although a contrary view is expressed in Zorita et al 2003 and von Storch et al 2004;
2) I think that this is a cumbersome and infelicitous algebraic expression, although I’ve now reconciled it to my own notation. It’s obviously not expressed this way in MBH98: this is how B-C characterize MBH98, not what MBH said they did. I think that there are some important cancellations.
3) Because Bürger et al have not carried out all possible cancellations (and in particular have not focussed on the one-dimensional case which governs the MWP), they’ve been unable to place MBH98 within the universe of multivariate methods. I’ve referred in passing to the placement that I’ve derived and will show a few more details today.
4) I’m still checking on how Bürger dealt with re-scaling.
It would be amusing if B-C were one more instance of commentators not quite implementing what MBH98 actually did. In terms of robustness issues, I don’t think that this would affect their non-robustness conclusions, but it might affect specific results. The continuing inability of commentators to place MBH98 methods within the vast corpus of multivariate methods is surely intriguing. As a follow-on, I think that a couple of later studies have ended up using methodology that is formally equivalent or near-equivalent to MBH98 without being aware of it, and even describing the methods as novel.
The viewpoint that the reconstruction is linear in the proxies was one that we expressed as early as MM03 as follows:
It should be noted that each of the above steps in the MBH98 northern hemisphere temperature index construction is a linear operation on the proxies. Accordingly, given the roster of proxies and TPCs in each period, the result of these linear operations is a set of proxy weighting factors, which generates the NH average temperature construction.
Similar comments were made in MM05b (E&E). An opposing view was expressed in Zorita et al 2003 as follows:
“The optimized temperature fields target the whole available proxy network at a given time, so that the inclusion of a few instrumental data sets in the network should have little influence on the estimated fields, unless the instrumental records are explicitly overweighted. The advantage is that the method is robust against very noisy local records. This contrasts with direct regression methods, where the estimated temperature fields are the predictands of a regression equation. In this case a few instrumental records, highly correlated to the temperature fields, may overwhelm the influence of proxy records with lower correlations in the calibration period.”
Von Storch et al 2004 took the same position as Zorita et al 2003 that the effect of individual proxies could not be allocated:
“MBH98’s method yields an estimation of the value of the temperature PCs that is optimal for the set of climate indicators as a whole, so that the estimations of individual PCs cannot be traced back to a particular subset of indicators or to an individual climate indicator. This reconstruction method offers the advantage that possible errors in particular indicators are not critical, since the signal is extracted from all the indicators simultaneously.”
Obviously our viewpoint, that individual proxies (i.e. bristlecones) had a dramatic and non-robust impact on MBH98 results, was at odds with the viewpoint expressed by VZ. B-C agree that the MBH result is linear in the proxies. So I think that the earlier viewpoint of VZ has been definitively superseded (and it’s a view that Eduardo held in passing rather than strongly). I don’t make this point as a gotcha, but because I think that it is extremely important to think in terms of estimators formed from linear combinations of proxies in order to think mathematically about the various issues in play.
MBH98 Linear Algebra
Earlier this year, I posted up linear algebra showing that the reconstructed PC1 was a linear combination of the proxies with weights proportional to the correlation coefficients to the target temperature PC1, followed by re-scaling; and that the NH temperature index was simply this series multiplied by a constant.
This first post showed this for the reconstructed temperature PC1. If u = U[,1] is the temperature PC1 and ρ = cor(u, Y) is the vector of columnwise correlation coefficients, then I derived the reconstructed temperature PC1 as follows (before re-scaling):
with the result stated as follows:
That is, the reconstructed TPC1 in the AD1000 and AD1400 steps is simply a linear weighting of the proxies, with the weights proportional to their correlation to the TPC1.
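The one-dimensional result can be checked numerically. The following is a minimal sketch on synthetic data, not the MBH98 code: the sizes `n` and `p`, the helper `standardize`, and the random "proxies" are all assumptions made for illustration, and both u and Y are taken to be standardized over the calibration period. It regresses each proxy on u, inverts the coefficient row with the Moore-Penrose pseudo-inverse, and compares the result with a plain correlation-weighted sum of the proxies.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 80, 6                      # calibration years and proxy count (made up)
u = rng.standard_normal(n)        # stand-in for the temperature PC1
Y = np.outer(u, rng.uniform(-0.5, 1.0, p)) + rng.standard_normal((n, p))

def standardize(a):
    """Centre and scale to unit standard deviation, column by column."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

u = standardize(u)
Y = standardize(Y)

rho = Y.T @ u / n                 # columnwise correlations cor(u, Y)

# Step 1 (calibration): regress each proxy on u; one coefficient per proxy.
beta = Y.T @ u / (u @ u)

# Step 2 (inversion): apply the pseudo-inverse of the 1 x p coefficient row.
u_hat = Y @ np.linalg.pinv(beta[None, :])[:, 0]

# The same series written as a correlation-weighted sum of the proxies.
u_alt = Y @ rho / (rho @ rho)

print(np.allclose(beta, rho), np.allclose(u_hat, u_alt))
```

With standardized inputs the calibration coefficients are the correlations themselves, so the pseudo-inverse step only supplies the constant 1/(ρ'ρ); the weights on the proxies stay proportional to ρ.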
In the next post here, discussing our source code, I showed that we re-scaled empirically and then extended the linear reconstruction to the NH temperature reconstruction, since all steps in this procedure are also linear. In the one-dimensional case, the RPC1 is just multiplied by a constant. However the re-scaling can be stated in linear algebra terms, yielding an expression for the re-scaled reconstruction as follows, if I’m not mistaken:
The simplifications and cancellations of this linear algebra are something that I’ve been working on for a while and am still working on. These results are not inconsistent with our code – they are equivalent to it. However, the cancellations involved here make it possible to condense about 20 lines of code into 1 or 2.
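To illustrate how compact the one-dimensional case becomes, here is a hedged sketch of a correlation-weighted, variance-matched reconstruction. The function name `reconstruct_pc1`, the synthetic data and the choice of calibration window are assumptions for illustration only; the proxies are assumed to be standardized over the calibration period, and the re-scaling is taken to mean matching the calibration-period standard deviation of the target.

```python
import numpy as np

def reconstruct_pc1(Y_cal, u_cal, Y_full):
    """Correlation-weighted reconstruction, re-scaled to the target's
    calibration-period standard deviation (one-dimensional case)."""
    rho = np.array([np.corrcoef(u_cal, Y_cal[:, j])[0, 1]
                    for j in range(Y_cal.shape[1])])
    scale = u_cal.std() / (Y_cal @ rho).std()   # variance matching
    return (Y_full @ rho) * scale

# Synthetic example: 200 "years", the last 80 used for calibration.
rng = np.random.default_rng(1)
signal = rng.standard_normal(200)
Y_full = np.outer(signal, rng.uniform(0.2, 0.8, 5)) + rng.standard_normal((200, 5))
cal = slice(120, 200)
Y_full = (Y_full - Y_full[cal].mean(axis=0)) / Y_full[cal].std(axis=0)
u_cal = signal[cal]

recon = reconstruct_pc1(Y_full[cal], u_cal, Y_full)
print(recon[cal].std(), u_cal.std())
```

The body of the function is the whole method in this case: compute the correlations, form the weighted sum, and multiply by one constant.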
Contrary to what I’d posted earlier, I’ve worked through the linear algebra of the Bürger et al 2006 expression from first principles and can now reconcile their results to mine, although as noted above, I don’t believe that they have carried out all possible cancellations.
In terms of other studies, I have been unable to replicate Mann and Jones 2003. They appear to use proxies weighted by correlation coefficients. I corresponded with Phil Jones about this in an attempt to figure out what they did. Jones said that the weights did not "matter" and that similar results were obtained using an unweighted average. This was false. I asked him what weights were used in Mann and Jones 2003; he said that he did not know, only Mann did, and terminated the correspondence. I think that Hegerl et al 2006 (J Clim) also use correlation-weighted proxies, but we’ll have to wait and see when it’s published.
One-Factor Partial Least Squares
Partial Least Squares is a known multivariate technique, commonly applied in chemometrics. Googling will turn up many references. Partial Least Squares is an iterative process and, after the first step, the iterations proceed through what have been identified mathematically to be Krylov subspaces, but these later subspaces are not relevant to the one-factor PLS process. The first step of Partial Least Squares also constructs an estimator by weighting series according to their correlation coefficients.
Bair et al 2004 has a concise formulation of this, although the point can be seen in any multivariate text discussing Partial Least Squares.
Comparing this formulation to the derivation in my post summarized above, in both cases the coefficients of the estimator are proportional to the correlation coefficients between the individual series and the target. In the MBH98 case, the estimator is re-scaled to match the variance of the target, which is one of the PLS options. In the later MBH steps, more than one temperature PC is reconstructed, but each temperature PC is reconstructed through an individual PLS process. For the purposes of NH temperature, the estimate is totally dominated by the temperature PC1 and the lower order temperature PCs have little actual impact. In the 15th century step and the earlier MBH99 step, only one temperature PC is reconstructed anyway.
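The comparison can be made concrete with a small sketch of the first step of PLS for a single target (often written PLS1). This is a textbook-style first step on synthetic data, written from the general description above rather than taken from Bair et al 2004 or from MBH98; the function name `pls1_one_factor` and the data are assumptions for illustration.

```python
import numpy as np

def pls1_one_factor(X, y):
    """First (and only) step of PLS1 for centred X (n x p) and centred y (n,)."""
    w = X.T @ y
    w = w / np.linalg.norm(w)     # weights, proportional to cov(x_j, y)
    t = X @ w                     # score: the single latent factor
    q = (t @ y) / (t @ t)         # regress y on the score
    return w, t * q               # weight vector and fitted values

rng = np.random.default_rng(2)
n, p = 90, 7
y = rng.standard_normal(n)
X = np.outer(y, rng.uniform(-0.4, 0.9, p)) + rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized predictors
y = y - y.mean()

w, y_fit = pls1_one_factor(X, y)
rho = np.array([np.corrcoef(y, X[:, j])[0, 1] for j in range(p)])

# Optional re-scaling so the fit has the same standard deviation as the target.
y_rescaled = y_fit * y.std() / y_fit.std()

print(np.allclose(w, rho / np.linalg.norm(rho)))
```

When the predictors are standardized, the PLS weight vector is just the normalized vector of correlations, so the one-factor fit differs from a correlation-weighted sum only by a constant. Swapping the regression constant `q` for the variance-matching constant gives the re-scaled variant.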
Once one expresses MBH98 as a multivariate method within a known statistical corpus, it totally transforms the ability to analyze it both in terms of available tools and even how you approach it.
As noted above, PLS is commonly used in chemometrics, where it is viewed as an alternative to other multivariate methods, including principal components regression. Viewed from the perspective of linear algebra, all these multivariate methods can be construed as simply a method of assigning weights to the individual proxy series. Again, this is a simple statement, but it is well to keep firm memory of this when confronted with bewildering matrix algebra – at the end of the day, after all the Wizard of Oz manipulations, you simply end up with a series of weights, and there is no guarantee that all this work will do any better than a simple average. Indeed, in my opinion, the FIRST question that should be asked is: why aren’t we just doing an average?
The corpus of available methods is bewildering: multiple linear regression is one obvious multivariate method – in this case, the resulting weights are called "regression coefficients". You could also do a principal components analysis and, if you truncate to one PC, the resulting weights are called the "first eigenvector" (which are then scaled by the first eigenvalue). Other named methods include ridge regression, partial least squares, canonical correspondence analysis, canonical ridge regression. Principal components regression is regression on the principal components. There are connections between most of these methods and there is much interesting literature on the connections between these methods (e.g. Stone and Brooks 1990; papers at Magnus Borga website; etc.)
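As a rough illustration of the "everything ends up as a set of weights" point, the sketch below computes the weight vector that four of these methods assign to the same synthetic network. The data, the helper `unit` and the choice to normalize every weight vector to unit length (so that only the relative weights are compared) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 5
y = rng.standard_normal(n)                       # synthetic target
X = np.outer(y, rng.uniform(0.3, 0.9, p)) + rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)         # standardized "proxies"
y = y - y.mean()

def unit(v):
    """Scale a weight vector to unit length so methods can be compared."""
    return v / np.linalg.norm(v)

weights = {
    "simple average": unit(np.ones(p)),
    "OLS regression": unit(np.linalg.lstsq(X, y, rcond=None)[0]),
    # eigh returns eigenvalues in ascending order; the last column is the
    # leading eigenvector, and its overall sign is arbitrary.
    "first eigenvector (PC1)": unit(np.linalg.eigh(X.T @ X)[1][:, -1]),
    "one-factor PLS": unit(X.T @ y),
}

for name, w in weights.items():
    print(f"{name:24s}", np.round(w, 2))

# Each estimator is then the same operation: a weighted sum of the series.
estimates = {name: X @ w for name, w in weights.items()}
```

Whatever the method is called, the last line is identical for all of them, which is why the simple average is a natural benchmark.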
More generally, surely all these multivariate methods are, from a linear algebra point of view, alternative mappings onto a dual space (the coefficient space). Each multivariate method is a different mapping onto the dual space. B-C do not construe their "flavors" in a multivariate method context. They generalize the non-robustness that we had observed phenomenologically, but their consideration is also phenomenological and doesn’t really rise to an explanation of what’s going on. (I’ve got an idea of what would be involved in a proper explanation, but doing so is a different issue.)
In this respect, it’s helpful to construe a simple unweighted average as also being a method of assigning coefficients in the dual space. When you read a good article like Stone and Brooks 1990, linking OLS, PLS and ridge regression through a "continuum", it makes one wonder how one can go from OLS coefficients to coefficients for a simple average through a "continuum". I’ve developed some thoughts along this line, which seem too obvious to be original and not wrong, but I’ve not located them anywhere.
If you do a simple average of the series after "standardizing", this could lead to further variations involving robust measures of location and scale: perhaps using a trimmed mean as a measure of location and the mean absolute deviation as a measure of scale, as alternatives to the ordinary mean and standard deviation, which stand in poor repute among applied statisticians as standardization methods (see Tukey, Hampel etc.)
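A minimal sketch of such a robust standardization follows. The function names, the default trimming fraction and the toy series are assumptions for illustration, and the scale measure is read here as the mean absolute deviation about the chosen location; a median absolute deviation would be an obvious substitute.

```python
import numpy as np

def trimmed_mean(x, frac=0.1):
    """Mean after dropping the lowest and highest `frac` of the sorted values."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(frac * x.size)
    return x[k:x.size - k].mean()

def mean_abs_dev(x, center):
    """Mean absolute deviation of x about a given centre."""
    return np.mean(np.abs(np.asarray(x, dtype=float) - center))

def robust_standardize(x, frac=0.1):
    """Subtract the trimmed mean and divide by the mean absolute deviation."""
    x = np.asarray(x, dtype=float)
    loc = trimmed_mean(x, frac)
    return (x - loc) / mean_abs_dev(x, loc)

# Toy series with one outlier; the ordinary mean would be 22.
series = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
z = robust_standardize(series, frac=0.2)
print(trimmed_mean(series, 0.2), z)
```

A simple average of series standardized this way is still a linear combination of the series once the location and scale constants are fixed, so it fits into the same weights framework.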
In the MBH case, it’s not just partial least squares. The tree ring networks have been pre-processed in blocks through principal components analysis; the gridcell temperature series have been pre-processed through principal components analysis on a monthly basis (and then averaged, moving them slightly and annoyingly off orthogonality, although they remain approximately orthogonal). Some series are included in networks, others aren’t. However, I think that these are subsidiary issues and there is much advantage in considering the MBH98 regression module separately, from the viewpoint of the known method of one-factor partial least squares.
Tomorrow I’ll show how a number of these methods apply to a VZ pseudoproxy network. The VZ pseudoproxy network is a massive oversimplification – it assumes that pseudoproxies add the SAME proportion of white noise to gridcell temperature. But because it’s so simple, you can test some things out before you move to more complicated red and low-frequency noise. My main beef is that results from this proxy Happy-world have been extended to situations where they do not apply.
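For readers who want to experiment, a pseudoproxy network of this simple kind can be sketched as follows. This is an illustrative construction, not the VZ code: the function name `make_pseudoproxies`, the parameter `noise_frac` (taken here to mean the share of each pseudoproxy's variance contributed by noise) and the random stand-in for gridcell temperatures are all assumptions.

```python
import numpy as np

def make_pseudoproxies(temps, noise_frac, rng):
    """Add white noise to each gridcell series (columns of `temps`) so that
    noise contributes the same fraction `noise_frac` of every proxy's variance."""
    sd = temps.std(axis=0)
    noise_sd = sd * np.sqrt(noise_frac / (1.0 - noise_frac))
    return temps + rng.standard_normal(temps.shape) * noise_sd

rng = np.random.default_rng(4)
temps = rng.standard_normal((2000, 10))          # stand-in for gridcell temperatures
proxies = make_pseudoproxies(temps, noise_frac=0.5, rng=rng)

# With 50% noise variance, each proxy should correlate with its gridcell
# at roughly sqrt(0.5), up to sampling error.
corrs = [np.corrcoef(temps[:, j], proxies[:, j])[0, 1] for j in range(10)]
print(np.round(corrs, 2))
```

Because every pseudoproxy gets the same signal-to-noise ratio and the noise is white, a network like this is about as favourable to any weighting method as one could construct.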
von Storch, H., E. Zorita, J. M. Jones, Y. Dmitriev and S. F. B. Tett, 2004. Reconstructing past climate from noisy data. Science 306, 679-682.
Zorita, E., F. González-Rouco and S. Legutke, 2003. Testing the Mann et al. (1998) Approach to Paleoclimate Reconstructions in the Context of a 1000-Yr Control Simulation with the ECHO-G Coupled Climate Model. Journal of Climate 16, 1378-1390.
Bair, E., T. Hastie, D. Paul and R. Tibshirani, 2004. Prediction by Supervised Principal Components. http://www-stat.stanford.edu/~hastie/Papers/spca.pdf
Bürger et al 2006. Tellus,…