We have a number of readers who are highly qualified econometricians. I think that initially they find it hard to believe the description of Hockey Team statistical practices. Martin Ringo is one such reader, who has a doctorate in "finite sample properties of a variety of feasible, generalized least squares estimators". He’s sent in the following discussion of principal components reconstruction from a different aspect than any that I’ve considered – whether the temperature PCs from the short calibration period will even accomplish what MBH want. Marty:
Even if we had the exact PC component from the SVD of the long series, it is not enough to reconstruct the temperatures. (That is, even if MBH’s proxies could perfectly estimate the temperature PCs, they still won’t be able to reconstruct what they want to reconstruct.) To reconstruct the temperature, even the averages across all the grids, one needs a better estimate of the post-multiplying matrices associated with the long series SVD than the post-multiply matrices of the short series will give.
Martin Ringo: More on Linear Algebra
I am going to start by reviewing what the proxy variables and all those regressions were about (explained in “MBH Calibration-Estimation Procedure”). There is a relatively short temperature record, roughly 150 years, and Mann et al and most of us would like to have a much longer record. To do this the MBH98 procedure was to use the several linear relationships between the temperature records and proxy variables over that same period to produce a estimate of a set of the principal components of that temperature and then to use that relationship applied to the long period (600 or 1000 years) of the proxy records.
Now for a little linear algebra (although fortunately I am going to avoid all those proxy regression and scaling issues).
MBH98 had 1082 temperature series over 94 years. Call that a 94 by 1082 temperature matrix. The matrix has a most a rank of 94, and thus MBH98 couldn’t use it for principal component analysis (PCA) as such. They expressed on a monthly basis to give a 1128 x 1082 matrix (presumably of full rank 1082). Using the matrix form of MBH98’s equation (in the Methods Section), the temperature matrix could be expressed as:
1) T1 = U1 * S1 * V1′, (for monthly data)
where (for simplicity) I am going to assume that 1) was derived from the compact form of the SVD algorithm with U1 a 1128 x 1082 matrix of principal components, S1 a 1082 x 1082 diagonal matrix, and V1 a 1082 x 1082 matrix. U1 and V1 are, of course, both orthogonal matrices.
MBH98 took the first Neofs PCs of U1 to derive a T^ (“T hat”) estimate of the full temperature matrix.
The thing to note about equation 1) is that it can be used to “reconstruct” T. That is with U1, S1, and V1 we can get T, and if the first Neofs of U1 explain most of the variation, the can approximately reconstruct T from those first Neofs PC. MBH98 took the long series of proxies and relationships from the inverse regression equations to build a U1^ as long as the proxy data. If one goes through the linear algebra in “MBH Calibration-Estimation Procedure,” you will see that the length of the U1^ is independent of the length of T1. Hence the long proxy series allows MBH98 to produce a long estimate of the temperature PCs, a long U^ so to speak.
That is what was done. I looked at the reconstruction a slightly different way. Suppose we don’t have to use U^. Suppose we have the actual U, the actual PCs from the actual, long, temperature series. Then we should be able to do an even better job of reconstructing the full temperature matrix for the long period, say a full 600 or 1200 years, because the actual PCs must be a better estimator than the estimated PCs. Posing the problem this way gives a type of upper bound on what can be accomplished through reconstruction of estimated temperature PCs.
This type problem is just the kind of thing Monte Carlo experiment can — I won’t say solve, but — give insight into. The experiment simulated a set of serially and spatially correlated temperature variables (as deviations from the mean), a matrix T of monthly data. T_full, which is combination of T1 and, say, T2 corresponding to observed and unobserved parts. T1 is decomposed per equation 1) and the annual version of T_full is decomposed similarly:
2) T_full = U_full * S_full * V_full’, (annual)
[where T_full might be 1200 x 1028, U_full the same, S_full 1028 x 1028, and V_full 1028 x 1028]
The first Neofs PCs from U_full are taken: call them W. The U1 from the short period SVD is simply thrown away. In this “world” we know the true PCs. But the S1 and V1 of equation 1) are retained. This is the crucial point: the MBH98 reconstruction has to use the S1 and V1 matrices from the short (observed) period SVD not the S_full and V_full of the full period SVD. We then have two reconstructions
3a) R_short = [W| zeros] * S1 * V1′;
3b) R_full = [W| zeros] * S_full * V_full’;
[Note the “[W|zeros]” represents filling out the last 1082-Neofs columns of the matrix with zeros to make the matrix multiplication conform.]
The entire difference in the two estimates are the postmultiplying matrices S and V. with R_short we have a reconstruction just as MBH98 would have done only with a better, in fact perfect, estimate of the temperature PCs, but still limited by the postmultiply matrices being determined from the short (observed) period data. With R_full those matrices are taken from the full period data.
If the MBH98 reconstruction procedure is feasible, it will perform reasonably well in comparison to the R_full reconstruction. That is, the limitation will the fact that Neofs PCs cannot perfectly reconstruction 1082 temperature series.
MBH98 use the Reduction in Error (RE) statistic to “verify” their reconstruction. Alternatives to that are the “Coefficient of Efficiency,” CE, (do not use that term as climatologists use it in economics, sociology, educational research or industrial engineering) and the old standby rho-squared, R2. Let me suggest that Henri Theil’s U statistic — “coefficient of uncertainty [of forecasts],” (Theil, H, Applied Economic Forecasts and Policy, North Holland, 1961), is superior to all three.
4) U = ( sqrt( sum( (Yhat-Y)^2) ) )
/ ( sqrt( sum(Yhat^2))+sqrt( sum(Y^2)) )
John interjects: this might be easier to read
It measures the size of the squared errors in proportion to the values being forecasted, both the forecast and the actual values. This (the original) version of the statistic goes from 0, a perfect forecast, to 1, a forecast in which the errors completely overwhelm the forecast, and because of that, I prefer it to the later version with a more tractable distribution but unbounded upper end.
But in any case I calculated all four statistics.
The results: The RE statistics for the R_short reconstructions of the individual temperature series all average well less than 0 for 100 run batches. The grand average is probably something like -20. When the reconstruction is compared on the basis of a spatial average, the RE values improve a bit, to maybe around -5. While it might be fun to hoist Dr. Mann by his own petard, remember the RE is a lousy statistic for Monte Carlos because of its asymmetrical unboundedness. Get 99 RE values of 0.5 and one of -100 (which is can happen), the conclusion on the average is a -0.5, yet the procedures might have been pretty good. The Theil’s U statistic is much more telling, at least to me. There the R_short, i.e. the MBH98 type reconstruction, never performed lower than a 0.70 while the R_full, i.e. what could be done with accurate weightings, never was over 0.2.
A note of caution: It should always be remembered that all Monte Carlo results are dependent on the parameters. For things like the Classical Linear Model (the textbook regression model), changing the parameters doesn’t change the Monte Carlo results (as to the nature or the distribution). But with more complex models, especially when nonlinearity is introduced, use beware. One would like to sample over all the parameter space, but life and computing facilities will run short. And right now I haven’t figured out just what is the appropriate variation of parameters. Also because the time consuming nature of the simulation, I am doing it right now with about half the size of MBH98’s work. I report these preliminary results because they were so dramatic and have remained unchanged across my various parameter changes. Black and white: Neofs — be that 10% of number of series or 75% — principal components constructed from the whole temperature series can reconstruct the full temperature series fairly accurately on an individual basis and very accurately on a spatial average when that reconstruction is based on the postmultiplying matrices that were jointly part of the SVD. But those same principal components cannot even reasonably reconstruct the spatial average when that reconstruction is based on the postmultiplying matrices of a short span of the full period of the temperature reconstruction.
To give a numerical flavor to the differences in the S * V’ matrices from the full versus short series SVDs, below are the two S * V’ matrices for first the full and then the short SVD on a 40 x 4 data set:
Full data S*V’ values:
-0.178 1.432 0.920 -7.440
-0.086 -5.435 3.568 -0.603
-0.849 -2.786 -4.445 -1.065
-4.448 0.398 0.540 0.250
Short data S*V’ values:
0.930 1.379 -0.599 4.414
1.126 -1.833 1.631 0.557
1.244 -0.272 -1.055 -0.320
-0.556 -0.745 -0.548 0.276
And the effect? Below is the 40th observation on the data:
0.161 -0.135 -1.872 1.108
What did the short data SVD reconstruction predict?
0.016 0.031 -0.373 -0.959
-90.0% -122.7% -80.1% -186.5%
Maybe it isn’t even necessary to go through the Monte Carlo experiments. What MBH98 and the principal components reconstruction advocates appear to be butting their collective heads against is a little linear algebra.