Eduardo Zorita and I are in the process of reconciling some results. We have taken one issue off the table – VZ implemented Mannian PCs accurately enough that this does not account for any differences between our results and theirs. So I take back some observations and I’ll place updates in appropriate places. In fairness to me, their description in GRL of what they did was, shall we say, language-challenged; my interpretation of their description was the most reasonable one. The description was probably not a problem for anyone else, but then I was probably the only one trying to figure out what they did from the description alone. This demonstrates once more the advantages of using source code for reconciliation.
By getting this issue off the table, we can try to focus on other issues that may account for the differences in results, e.g. how the pseudoproxies are constructed – which was what we primarily discussed in our GRL article. In this respect, Eduardo has acknowledged that their reconstructions testing the impact of Mannian PCs in an ECHO-G context have high verification r scores in both cases (with and without Mannian PCs). This certainly suggests to me, as we hypothesized, that their pseudoproxies did not accurately represent the dog’s breakfast of MBH proxies, for which the combination of high RE and low verification r2 is a distinctive signature that a simulation should replicate. Hopefully there will be more on this.
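Since the RE–r2 contrast carries much of the weight here, a minimal sketch of the two verification statistics may help. This uses their standard textbook definitions, not anyone’s actual code; the function and variable names are my own.

```python
import numpy as np

def verification_stats(obs, recon, cal_mean):
    """Return (RE, r2) over a verification period.

    obs, recon : observed and reconstructed values over the verification years
    cal_mean   : mean of the observations over the *calibration* period,
                 the no-skill benchmark against which RE is measured
    """
    sse = np.sum((obs - recon) ** 2)          # squared error of the reconstruction
    ss_clim = np.sum((obs - cal_mean) ** 2)   # squared error of the calibration mean
    re = 1.0 - sse / ss_clim                  # RE > 0: beats the calibration mean
    r2 = np.corrcoef(obs, recon)[0, 1] ** 2   # squared Pearson correlation
    return re, r2
```

A reconstruction that merely reproduces the calibration-period mean shift can score a high RE while its year-to-year correlation with observations, and hence its r2, is negligible – which is the signature in question.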
To benchmark our PC calculations, I posted up a 581×70 network of AR1=0.7 red noise and we exchanged PC results and source code using our respective methods. The following graphic shows our respective implementations of Mannian methods. There’s one small difference which I’ll describe separately in another post, because it’s amusing and because it’s relevant to the trended-nondetrended argument.
Black, red: Zorita and McIntyre implementations of Mannian PCs on benchmark red noise network.
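For anyone wishing to reproduce the benchmark, here is a minimal sketch of how such a network can be generated: 70 independent AR(1) series of length 581 with lag-one coefficient 0.7. The seed and innovation scale are arbitrary assumptions, so this will not reproduce the posted network exactly.

```python
import numpy as np

def ar1_network(n=581, m=70, phi=0.7, seed=0):
    """Generate an n x m matrix of independent AR(1) 'red noise' series."""
    rng = np.random.default_rng(seed)
    innovations = rng.standard_normal((n, m))
    x = np.empty((n, m))
    x[0] = innovations[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + innovations[t]  # AR(1): x_t = phi*x_{t-1} + noise
    return x  # columns play the role of the 70 series in the network
```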
So if we’re getting the same results on the same data, why did I think there was a problem? In their GRL paper, VZ said:
MM05 noted that MBH98 normalized their data unconventionally prior to the PCA, by centering the time series relative to the instrumental-period mean, 1902–1980, instead of relative to the whole available period. Why this was done is unclear. It is, however, not entirely uncommon in climate sciences…
MM05 performed a Monte Carlo study with a series of independent red-noise series; they centered their 1000 year-series relative to the mean of the last 100 years, and calculated the PCs based on the correlation matrix.
If you do PCs using the correlation matrix, you cancel out the impact of the decentering, since the correlation calculation re-centers each series on its full-period mean and scales it to unit variance. What Eduardo had done was to de-center the data, making a new matrix X, and then calculate the matrix t(X) * X – which is neither a covariance nor a correlation matrix. However, an SVD based on this matrix will yield Mannian results, although the results are not "principal components" within Preisendorfer’s definition, as they are not an analysis of variance. So it’s the description of the method that’s at fault – and, of course, the original method itself is at fault. But the algorithm used for checking is OK at this step. So my editorial surmises that VZ had not implemented this step "correctly" – in the sense of correctly implementing an incorrect method – were themselves incorrect, and I’m updating and annotating accordingly.
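To make the distinction concrete, here is a minimal sketch of the two calculations, assuming the 1902–1980 calibration period corresponds to the last 79 rows of the matrix; the variable names are my own, and I omit the further rescaling steps in the full MBH recipe.

```python
import numpy as np

def mannian_pcs(X, cal=79, k=2):
    """Decenter on the calibration-period mean (here the last `cal` rows),
    then SVD the decentered matrix. This is equivalent to an
    eigendecomposition of t(X) * X for the decentered X, which is neither
    a covariance nor a correlation matrix."""
    Xd = X - X[-cal:].mean(axis=0)                     # short-segment decentering
    u, s, vt = np.linalg.svd(Xd, full_matrices=False)
    return u[:, :k] * s[:k]                            # first k "principal components"

def correlation_pcs(X, k=2):
    """Conventional PCs from the correlation matrix: each series is
    re-centered on its full-period mean and scaled to unit variance,
    so any prior decentering cancels out."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(Z, full_matrices=False)
    return u[:, :k] * s[:k]
```

Run on the red-noise benchmark above, the first calculation promotes hockey-stick shaped series into the PC1 while the second does not – which is why a description in terms of "the correlation matrix" describes a materially different calculation.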
It’s possible that there’s an interaction between detrended calibration and the impact of spurious PC1s – intuitively that seems very plausible to me, and it’s on the agenda. Also, I am 100% convinced that the VZ pseudoproxies don’t adequately allow for "bad apples" – contaminated series of the sort found in the MBH network. This type of robustness analysis is what’s done in "robust statistics", a very active field.