I’ve been reading through some articles on altitudinal reconstructions by Rob Wilson and other Luckman students. The studies all follow a strategy similar to that of Wilson et al 2007: principal components analysis, truncation to eigenvalues greater than 1, varimax rotation and regression. It’s pretty obvious that these operations are all linear and that, if the linear algebra were boiled down, the operations simply provide weights for the sites. (On a previous occasion, I showed how the MBH linear algebra was a vast panoply of calculations, many of which cancelled out once the linear algebra was analyzed.)
Today I’m going to work through the linear algebra of the multivariate method used in Wilson et al 2007 and similar studies. It appears to me that the Varimax rotation doesn’t actually do anything and is cancelled out.
I’m not 100% sure of the algebra below; maybe UC or Jean S or someone else can check this. Also, if anyone has any references in a real statistics book (not by climate scientists) on the effect of combining regression with varimax principal components, I’d be interested.
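First, denote the network of tree ring chronologies as the matrix $X$. Equation (1) shows its decomposition into principal components.

(1) $X = U S V^T$

where $U$ is the principal components, $S$ the diagonal matrix of singular values (whose squares are, up to scaling, the eigenvalues) and $V$ the eigenvectors. Let’s say that the measurements are for $n$ periods for $m$ proxies. $U$, $S$ and $V$ are not truncated in this equation. Thus, dim($X$) = $(n,m)$; dim($U$) = $(n,m)$; dim($S$) = $(m,m)$; dim($V$) = $(m,m)$. A useful expression for the principal components matrix is the following:

(1a) $U = X V S^{-1}$

since $V$ is orthogonal and $S$ diagonal. Denote the truncation to the $k$ principal components with eigenvalues greater than 1 by $U_k$, $S_k$, $V_k$; dim($U_k$) = $(n,k)$; dim($S_k$) = $(k,k)$; dim($V_k$) = $(m,k)$. Equation (1a) applies, mutatis mutandis, to the truncated versions as well. Denote the rotation operator by $R$, dim($R$) = $(k,k)$, with the rotated varimax PCs $W$ obtained as per equation (2) as follows:

(2) $W = U_k R$

The inverse regression of temperature $T$ on the rotated varimax PCs is denoted by equation (3) as follows:

(3) $T = W \beta + \epsilon$

with the reconstruction obtained by equation (4)

(4) $\hat{T} = W \hat{\beta}$

By simple linear regression algebra, we have:

(5) $\hat{\beta} = (W^T W)^{-1} W^T T = (R^T U_k^T U_k R)^{-1} R^T U_k^T T = R^T U_k^T T$

since $U_k$ has orthonormal columns ($U_k^T U_k = I$) and $R$ is orthogonal ($R^T R = I$).

From (1a) we have the following expansion for $U_k$:

(6) $U_k = X V_k S_k^{-1}$

From (5), we therefore have:

(7) $\hat{\beta} = R^T S_k^{-1} V_k^T X^T T$

From (4) we therefore can express the reconstruction as follows:

(8) $\hat{T} = U_k R R^T S_k^{-1} V_k^T X^T T = X V_k S_k^{-1} S_k^{-1} V_k^T X^T T = X V_k S_k^{-2} V_k^T X^T T$

since $R$ is orthogonal, where $X^T T$ is (up to scaling, for centred series) the columnwise covariance of each site chronology with the target temperature. This shows clearly that the matrix $V_k S_k^{-2} V_k^T$ is a linear transformation (a rotation and rescaling) of the vector of covariances, which simply changes the weights of the contributing sites: the expansion of the expression $V_k S_k^{-2} V_k^T X^T T$ is simply a set of weights (negative allowed).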
Since the varimax rotation operator does not enter into equation (8), it looks to me like this operation doesn’t do anything when combined with a subsequent regression operation.
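Here is a quick numerical check of the cancellation, as a sketch only. The data are synthetic, the sizes `n`, `m`, `k` are made up, and an arbitrary orthogonal matrix stands in for the varimax rotation (I am assuming, as above, that varimax is an orthogonal rotation). The last step uses my reading of equation (8) as a set of weights on the sites, $V_k S_k^{-2} V_k^T X^T T$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 200, 12, 3          # hypothetical sizes: periods, sites, retained PCs

# Synthetic stand-ins for the chronology network X and temperature target T.
X = rng.standard_normal((n, m))
X -= X.mean(axis=0)           # centre each site chronology
T = rng.standard_normal(n)
T -= T.mean()

# Equation (1): X = U S V^T, then truncate to the first k components.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k].T

def random_orthogonal(size, rng):
    """Arbitrary orthogonal matrix standing in for a varimax rotation."""
    Q, _ = np.linalg.qr(rng.standard_normal((size, size)))
    return Q

def reconstruct(R):
    W = Uk @ R                                      # rotated PCs, equation (2)
    beta, *_ = np.linalg.lstsq(W, T, rcond=None)    # regression coefficients
    return W @ beta                                 # reconstruction, equation (4)

recon_unrotated = reconstruct(np.eye(k))
recon_rotated = reconstruct(random_orthogonal(k, rng))

# The same reconstruction written directly as weights on the sites.
weights = Vk @ np.diag(sk ** -2.0) @ Vk.T @ X.T @ T
recon_weights = X @ weights

print(np.allclose(recon_unrotated, recon_rotated))
print(np.allclose(recon_unrotated, recon_weights))
```

Both checks should come out `True`. Note that the fitted values of a regression depend only on the column space of the regressors, so the first check would hold for any invertible `R`, not just an orthogonal one.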
As I argued in my post a year ago on VZ pseudoproxies, if tree ring proxies are believed to be signal plus low-order noise (I don’t necessarily believe that, but it’s presumed in many analyses), then any rotations that result in negative coefficients result in overfitting. With the information available in these studies, and in the absence of any archived measurement or chronology information, it is impossible to assess the severity of the problem. My guess is that there’s some overfitting and that the claimed r2 statistics are exaggerated, but that a simple average of the proxies or a simple allowance for regional weighting will probably yield results that are generically similar in appearance; I’d expect the greatest differences to be in the 20th century portion.
In that post, I described experiments with a VZ pseudoproxy network, which I construe as being a very “tame” network in that the noise is prescribed as white or low-order red and there is no contamination of the network by totally mis-specified series. A year ago, von Storch and Zorita were embroiled in a dispute with the Team over the impact of different methodologies on attenuation of low-frequency variance. The VZ observation about low-frequency attenuation was 100% correct as far as it went, but the debate got sidetracked by an irrelevant distraction thrown up by the Team about whether trended or de-trended calibration was used. If one examines the linear algebra of that calibration (along the lines of the discussion above), one can see how idle the Team’s purported rebuttal was.
In developing my own view of the matter, I prepared the following two graphics, which I think are very nice illustrations of the impact of various multivariate methods on a tame network. The first graphic shows the effect of several different multivariate methods on a reconstruction; numerous other multivariate methods could be developed. The comparison between OLS and Averaging is intriguing: the calibration fit of OLS is highest, but it has the worst performance in recovering low-frequency information. Mannian regression (which I’ve shown, by following the linear algebra, to be equivalent to Partial Least Squares regression) is in between the two. Principal components (either correlation or covariance) applied to a “tame” network does quite well (but is obviously vulnerable to breakdown with contamination, as I’ve discussed elsewhere). But I’d like you to look at the 2nd figure.
Figure 1. Reconstruction from VZ pseudoproxies using several multivariate methods.
This figure shows the coefficients of the different methods. People should recognize that, beneath the skin, “regression coefficients”, “eigenvectors”, “empirical orthogonal functions” etc. all boil down eventually to producing a vector of weights. Here I plotted the vector of weights in a controlled network under the different schemes. The OLS coefficients are the most diverse. In this controlled network, the variation in coefficients is entirely adventitious. The introduction of negative coefficients through the OLS mechanics is particularly pernicious, as this flips the signal over and works against any recovery of the signal.
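To make the "vector of weights" point concrete, here is a minimal sketch on a made-up tame network (not the VZ pseudoproxies, which I am not reproducing here): every proxy is the same signal plus independent white noise. The sizes, the noise level and the names `w_ols` and `w_avg` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 15                      # hypothetical calibration length and proxy count

signal = rng.standard_normal(n)     # stand-in "temperature" target
# Tame network: each proxy is the same signal plus independent white noise.
proxies = signal[:, None] + 2.0 * rng.standard_normal((n, m))

Xc = proxies - proxies.mean(axis=0)
yc = signal - signal.mean()

# OLS: regress the target on all proxies at once.
w_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# Averaging: equal weights, rescaled by one regression coefficient on the mean.
composite = Xc.mean(axis=1)
scale = (composite @ yc) / (composite @ composite)
w_avg = np.full(m, scale / m)

print("OLS weights:      ", np.round(w_ols, 3))
print("averaging weights:", np.round(w_avg, 3))
print("spread of OLS weights:", w_ols.std(), "vs averaging:", w_avg.std())
print("negative OLS weights:", int((w_ols < 0).sum()))
```

Since every proxy carries the same signal by construction, any spread in the OLS weights here comes from the noise alone. Whether any of them actually go negative depends on the noise level and the seed.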
Figure 2. Weights from different multivariate methods from VZ pseudoproxies.
If you think that something is a temperature proxy, at a minimum, you need to use a multivariate method that can’t result in negative coefficients. An advantage of simple CVM methods is that they avoid the introduction of negative coefficients. In the particular case of Rob’s network, without the data being archived, I can’t tell whether the coefficient distribution looks more like an OLS distribution or is more balanced. Going towards OLS distributions of coefficients will increase calibration r2 but, in a controlled pseudoproxy system, these seeming gains are illusory, and it is the same in a real network.
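The calibration-versus-verification point can be sketched the same way. This is an illustration under assumptions, not a reproduction of any of the studies: the network is synthetic signal plus white noise, the period lengths are invented, and the out-of-sample score is simply 1 - SSE/SST on a held-out period. The one thing guaranteed by the algebra is that the OLS calibration r2 cannot be lower than that of the rescaled average, because the average is itself a linear combination of the proxies.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cal, n_ver, m = 80, 400, 20       # hypothetical period lengths and proxy count

def make_network(n):
    """Signal plus independent white noise at every site (a 'tame' network)."""
    signal = rng.standard_normal(n)
    return signal, signal[:, None] + 2.0 * rng.standard_normal((n, m))

def r2(y, fit):
    """1 - SSE/SST; equals the usual r2 in calibration for these fits."""
    return 1.0 - np.sum((y - fit) ** 2) / np.sum((y - y.mean()) ** 2)

y_cal, X_cal = make_network(n_cal)
y_ver, X_ver = make_network(n_ver)

# Centre everything on calibration-period means.
mu_x, mu_y = X_cal.mean(axis=0), y_cal.mean()
Xc, yc = X_cal - mu_x, y_cal - mu_y

w_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
comp = Xc.mean(axis=1)
w_avg = np.full(m, (comp @ yc) / (comp @ comp) / m)

results = {}
for name, w in [("OLS", w_ols), ("averaging", w_avg)]:
    cal = r2(y_cal, Xc @ w + mu_y)
    ver = r2(y_ver, (X_ver - mu_x) @ w + mu_y)
    results[name] = (cal, ver)
    print(f"{name:10s} calibration r2 = {cal:.3f}   held-out score = {ver:.3f}")
```

How the two methods compare on the held-out period will vary with the seed, the noise level and the ratio of proxies to calibration years, so I would treat any single run as suggestive only.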