I’m not sure that there’s a huge demand for more linear algebra on MBH98, but here’s the rest of the proof that the NH temperature index in an MBH98-type calculation is simply a linear combination of proxies and, when only one temperature PC is reconstructed, the weights are proportional to the correlation between each proxy and the temperature PC1.
Re-scaling of Reconstructed Temperature PCs
After the calculation of RPCs, Mann re-scaled the variance of each RPC in the calibration period to the variance of the “observed” temperature PC. This step is not described in MBH98 itself and was not replicated in our emulation for MM05b. There, purely for reasons of expediency, we re-scaled the variance of our emulated NH reconstruction to the variance of the MBH reconstruction. The two methods have a negligible impact on cross-verification statistics for the 15th century: the cross-validation R2, RE and CE all coincide within 1%. The method is also not written in stone. Bürger and Cubasch describe it as a “flavor” among various reconstruction alternatives.
The existence of this step in MBH98 has also eluded other observers. For example, Von Storch et al. postulated that the lesser low-frequency range of MBH relative to other reconstructions [e.g. Esper et al., 2002] was due to MBH regression procedures without subsequent “parching” (i.e. re-scaling) of the series. The step was seemingly only introduced by Ammann and Wahl at the penultimate stage of their analysis: their code includes an annotation dated April 2005 providing for re-scaling of the RPCs as described here, which presumably had been absent in their earlier code. The re-scaling step is definitely observable in the MBH98 source code archived in summer 2005.
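This step can be expressed in linear algebra terms as follows:
(19) each re-scaled RPC is the RPC multiplied by the ratio of two calibration-period standard deviations, that of the “observed” temperature PC over that of the RPC.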
In our algorithm, we have calculated the two standard deviations directly. This is a slight departure from a purely linear analysis, since the re-scaling term is slightly non-linear (it depends on the standard deviation of the RPC itself). However, it is very close to linear in the observed ranges, and linearizing the above equation has little consequence (especially relative to the general imprecision of the procedure) while simplifying the eventual analysis.
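The re-scaling step can be sketched in a few lines of Python. This is only an illustration with made-up numbers: the function name `rescale_rpc`, the toy series and the position of the calibration window are all hypothetical, and nothing here is taken from the MBH98 or Ammann and Wahl code.

```python
from statistics import stdev

def rescale_rpc(rpc, observed_pc, calib):
    """Scale a full-length RPC so that its calibration-period standard
    deviation matches that of the observed temperature PC.

    rpc         : reconstructed PC over the whole reconstruction period
    observed_pc : observed temperature PC (calibration period only)
    calib       : slice selecting the calibration years within rpc
    """
    factor = stdev(observed_pc) / stdev(rpc[calib])
    return [factor * x for x in rpc], factor

# Toy numbers (hypothetical): 4 pre-calibration years, then 5 calibration years
rpc = [0.10, -0.20, 0.05, 0.00, 0.30, -0.10, 0.20, -0.30, 0.10]
observed_pc = [0.90, -0.40, 0.50, -0.80, 0.20]
calib = slice(4, 9)

rescaled, factor = rescale_rpc(rpc, observed_pc, calib)
```

The factor depends on the RPC's own standard deviation, which is where the slight non-linearity comes from. Once the factor is treated as a fixed constant, the step is an ordinary linear scaling.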
Calculation of NH Temperature Index
The RPCs are estimates of the temperature PC series, which were originally obtained from a PC decomposition of 1082 gridcell series (after division by their standard deviation and area weighting). The next MBH98 step is to take the vector of eigenvalues λ and the matrix of eigenvectors V from the original temperature PC decomposition (using Neofs principal components) and re-combine them with the RPCs to estimate the gridcell temperatures (still divided by standard deviation and area-weighted). In matrix terms, the estimate is the RPC matrix multiplied by diag(λ) and then by the transpose of V.
Next, the division by gridcell standard deviation and the area weighting applied prior to the PC calculation are reversed by multiplying by the gridcell standard deviations and by the reciprocal of the area weights (gridcell cosine latitudes in MBH98, although this should have been their square root).
In this step the result is the matrix of estimated gridcell temperatures, and the two scaling factors are the vector of gridcell standard deviations and the vector of gridcell weights (cosine latitudes here), each applied in diag() form. Finally, the NH temperature index is calculated as an area-weighted average over the applicable NH gridcells, with each gridcell temperature multiplied by its area weight (cosine latitude). The identification of NH gridcells (or other candidate areas, such as the “sparse” set of 219 gridcells or even an individual gridcell) can be specified by a logical vector index, which functions as a selection of columns.
In the weighted average, the result is the NH temperature index and index is a logical vector denoting NH gridcells. The weight vector is used directly rather than in diag() form, as this effects the weighted average. To calculate results for a sparse subset as in MBH98, the logical vector index merely needs to be re-defined.
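The three steps just described (re-combine with eigenvalues and eigenvectors, undo the pre-PC scaling, take the area-weighted average) can be sketched as follows. This is a toy illustration, not the MBH98 code: the function name `nh_index`, the three gridcells and all numbers are hypothetical, and dividing by the sum of the selected weights is my reading of "area-weighted average".

```python
import math

def nh_index(rpcs, eigvals, eigvecs, sd, weight, index):
    """Temperature index from RPCs.

    rpcs    : list of years, each a list of k RPC values
    eigvals : k eigenvalues
    eigvecs : one row of k eigenvector entries per gridcell
    sd      : gridcell standard deviations
    weight  : gridcell area weights (cosine latitude here)
    index   : logical vector selecting the gridcells to average
    """
    total_w = sum(w for w, use in zip(weight, index) if use)
    series = []
    for row in rpcs:
        acc = 0.0
        for j, use in enumerate(index):
            if not use:
                continue
            # Step 1: re-combine RPCs with eigenvalues and eigenvectors
            scaled = sum(u * lam * v for u, lam, v in zip(row, eigvals, eigvecs[j]))
            # Step 2: undo the scaling applied before the PC decomposition
            temp = scaled * sd[j] / weight[j]
            # Step 3: accumulate the area-weighted sum
            acc += weight[j] * temp
        series.append(acc / total_w)
    return series

# Toy setup (hypothetical): 3 gridcells at 60N, 30N and 30S, 2 retained PCs
eigvals = [2.0, 1.0]
eigvecs = [[0.6, -0.2], [0.5, 0.7], [0.4, 0.1]]
sd = [1.5, 0.8, 1.1]
weight = [math.cos(math.radians(lat)) for lat in (60.0, 30.0, -30.0)]
index = [True, True, False]          # the two NH gridcells
rpcs = [[1.0, 0.5], [-0.4, 0.2], [0.3, -0.9]]

nh = nh_index(rpcs, eigvals, eigvecs, sd, weight, index)
```

Re-defining `index` (for example to a single `True` entry) gives the series for a sparse subset or an individual gridcell without touching the rest of the calculation.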
Weighting of the RPCs
There are some useful simplifications that can be obtained from the linear algebra here. First, I show that the calculation of the temperature index can be obtained as a linear combination of the RPCs, with the weights depending only on the selection of RPCs and the selection of candidate gridcells through a logical vector index.
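It can be readily seen then that the temperature index is the matrix of RPCs multiplied by a vector of weights, with one weight per RPC.
That vector is obtained by carrying the eigenvalues, the eigenvectors, the gridcell standard deviations and the area weights of the selected gridcells through the steps above, none of which depends on the RPCs themselves.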
This weighting can be readily calculated as a function of the selection of Neofs eigenvectors and the logical vector of gridcells, thereby quickly yielding either an NH temperature index or a temperature index for an individual gridcell. The key point here is that, up to the very slight linearization of the re-scaling step, the temperature index is a linear combination of the RPCs.
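The collapse to one weight per RPC can be checked numerically. The sketch below uses toy numbers and hypothetical names (`rpc_weights`, `index_stepwise`) and assumes the index is normalized by the sum of the selected area weights. It computes the weights once and confirms that they reproduce the step-by-step calculation.

```python
import math

def rpc_weights(eigvals, eigvecs, sd, weight, index):
    """One weight per RPC, independent of the RPC values themselves."""
    total_w = sum(w for w, use in zip(weight, index) if use)
    omega = []
    for k, lam in enumerate(eigvals):
        # The area weight cancels against its reciprocal from the
        # unscaling step, leaving eigenvector entry times gridcell sd.
        s = sum(eigvecs[j][k] * sd[j] for j, use in enumerate(index) if use)
        omega.append(lam * s / total_w)
    return omega

def index_stepwise(row, eigvals, eigvecs, sd, weight, index):
    """Same index for one year, computed gridcell by gridcell."""
    total_w = sum(w for w, use in zip(weight, index) if use)
    acc = 0.0
    for j, use in enumerate(index):
        if use:
            scaled = sum(u * lam * v for u, lam, v in zip(row, eigvals, eigvecs[j]))
            acc += weight[j] * (scaled * sd[j] / weight[j])
    return acc / total_w

# Toy setup (hypothetical): 3 gridcells, 2 retained PCs, 3 years
eigvals = [2.0, 1.0]
eigvecs = [[0.6, -0.2], [0.5, 0.7], [0.4, 0.1]]
sd = [1.5, 0.8, 1.1]
weight = [math.cos(math.radians(lat)) for lat in (60.0, 30.0, -30.0)]
index = [True, True, False]
rpcs = [[1.0, 0.5], [-0.4, 0.2], [0.3, -0.9]]

omega = rpc_weights(eigvals, eigvecs, sd, weight, index)
via_weights = [sum(u * w for u, w in zip(row, omega)) for row in rpcs]
via_steps = [index_stepwise(row, eigvals, eigvecs, sd, weight, index) for row in rpcs]
```

The two series agree to rounding error, so the gridcell-level arithmetic only needs to be done once per choice of eigenvectors and gridcells.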
Collecting the Terms
There is an even further simplification possible. Let’s recall that the RPCs can be calculated in 2 lines:
We can now express the final NH temperature index in terms of the original proxies as follows:
where the RPC weight vector and G are calculated as shown above, both being calculable from the calibration period data. Since the proxies are located on the far left of the calculation, we can accomplish another simplification similar to the one for the RPCs: the NH reconstruction is simply a linear combination of the proxies, expressed as follows,
(27) the NH temperature index is the proxy matrix multiplied by w, where w is a vector of weights that collects all of the calibration-period terms to the right of the proxies.
In the one-TPC situation (the AD1400 and AD1000 steps), this reduces to a much simpler form (see post 520), with matrices in capitals replaced by vectors in Greek lower case (including the vector of weights) and constants in Roman lower case.
That is, the NH temperature reconstruction is a linear combination of the individual proxies, each weighted by its correlation to the temperature PC1. I’ll post some more on this situation on some other occasion, but, for now, if the proxies are all orthogonal (and the inter-proxy correlation of most of the 15th century proxies is so low that this is surprisingly close to being the case for the major block of the proxies), then the coefficients would be identical to those from multiple regression. With some inter-proxy correlation, this multivariate method gives coefficients that differ from those of multiple linear regression. However, when one starts to think about this odd methodology in statistical terms, I think that it’s helpful to start with known results arising from multiple regression.
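The one-PC result can be illustrated with synthetic data. The sketch below assumes the calibration takes the two-step inverse-regression form (each standardized proxy regressed on the temperature PC1, then the fitted coefficients inverted by least squares across proxies) with equal proxy weights. The data are random numbers, and all names are hypothetical.

```python
import random
from statistics import mean, stdev

def standardize(x):
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]

def correlation(x, y):
    xs, ys = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(xs, ys)) / (len(x) - 1)

rng = random.Random(0)
n_years, n_proxies = 79, 22

# Synthetic "temperature PC1" and proxies (each proxy stored as a column)
u = standardize([rng.gauss(0, 1) for _ in range(n_years)])
proxies = []
for _ in range(n_proxies):
    a = rng.uniform(-0.5, 0.5)  # arbitrary signal strength for this proxy
    proxies.append(standardize([a * ut + rng.gauss(0, 1) for ut in u]))

# Step 1: regress each proxy on the PC: g_j = (u . y_j) / (u . u)
uu = sum(v * v for v in u)
g = [sum(a * b for a, b in zip(u, y)) / uu for y in proxies]

# Step 2: invert by least squares: u_hat_t = sum_j y_tj g_j / sum_j g_j^2
gg = sum(v * v for v in g)
weights = [v / gg for v in g]
u_hat = [sum(w * y[t] for w, y in zip(weights, proxies)) for t in range(n_years)]

# Correlation of each proxy with the PC, for comparison with the weights
r = [correlation(u, y) for y in proxies]
```

Under these assumptions each weight equals the proxy's correlation divided by the same constant `gg`. The later re-scaling and RPC weighting only multiply the series by further constants, so the proportionality carries through to the NH index.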
In the 15th century case, this boils down to an operation like multiple regression with 79 observations against 22 variables. In the AD1820 case, there are 112 variables. If you add autocorrelation into the mix, the “effective” number of observations is reduced from 79 to some lower number; Moberg et al. estimate 6 in not dissimilar circumstances and then recoil from the implications. You can start to see why you can get some pretty high calibration R2 values and why estimation of confidence intervals based on calibration residuals is fraught with problems. We are starting to see an infectious spread of this aspect of MBH98 methodology in multiproxy studies in the guise of correlation-weighted reconstructions (see Mann and Jones, 2003; Mann, Rutherford, Ammann and Wahl, 2005). I’ll re-visit this on another occasion.
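As a rough benchmark for the calibration R2 point, ordinary multiple regression has a known null result: when the response is independent Gaussian noise and the model has an intercept plus p regressors fitted to n observations, the expected calibration R2 is p/(n-1). The sketch below applies that textbook figure to the counts quoted above. It is an analogy only, since the MBH98 procedure is not literally ordinary least squares, and the function name is hypothetical.

```python
def expected_null_r2(n_obs, n_vars):
    """Expected calibration R^2 of an OLS fit (with intercept) when the
    response is independent Gaussian noise unrelated to the regressors."""
    if n_vars >= n_obs - 1:
        # At least as many free coefficients as degrees of freedom:
        # the calibration fit is exact.
        return 1.0
    return n_vars / (n_obs - 1)

cases = {
    "15th century (79 observations, 22 variables)": (79, 22),
    "AD1820 (79 observations, 112 variables)": (79, 112),
}
benchmarks = {name: expected_null_r2(n, p) for name, (n, p) in cases.items()}

for name, value in benchmarks.items():
    print(f"{name}: expected null R2 = {value:.3f}")
```

Replacing 79 with a smaller "effective" number of observations pushes the benchmark higher still, which is the sense in which autocorrelation makes matters worse. Treat that substitution as a heuristic rather than an exact result.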