The tendency for PCA to invert signals when they are inversely correlated seems perfectly natural: it is designed to transform input data sets in this way. But as a mechanism for combining temperature studies it seems an odd thing to do.

Unless I’m missing a trick (I will check the GRL code), I would be minded to conclude that PCA, conventional or decentred, is not an appropriate tool with which to combine global temperature measurements.

I’ve got a technical question regarding the basics of using PCA for global temperature reconstruction. I did consider posting it on realclimate, but thought better of it… I’m not sure of the best place to post it, but a comment on another discussion regarding PCA for dummies seems perhaps the best bet :)

I’ve been conducting a parallel study to test your hypothesis that the MBH98 PCA method is flawed. This is partly for my own interest: I use statistics regularly at work but have relatively little practical experience of multivariate statistics, and this seemed an interesting, practical example to follow.

I’ve coded up the full MBH98 method in MATLAB, following Mann’s algorithm description and the pca-noamer.f file, and sure enough, hockey sticks pop out pretty much every time.
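The kind of simulation described here can also be sketched in Python. This is a minimal sketch, not the MBH98 code or the archived GRL code. It rests on several assumptions:

- The inputs are AR(1) red noise with a hypothetical persistence of 0.9.
- The network is a hypothetical 70 series spanning 581 years (AD1400 to 1980).
- A final 79-year calibration window stands in for 1902 to 1980.
- "Decentring" is reduced to subtracting each series' calibration-period mean, with any rescaling step omitted.

The helper names `pc1`, `hockey_stick_index` and `ar1_noise` are illustrative only. The hockey-stick index is taken here as the gap between the calibration-period mean and the full-period mean, divided by the full-period standard deviation.

```python
import numpy as np

def pc1(data, centre_rows):
    """PC1 time series of `data` (years x series) after subtracting each
    column's mean over `centre_rows` (all rows = conventional PCA)."""
    centred = data - data[centre_rows].mean(axis=0)
    # First left singular vector scaled by its singular value = PC1 scores.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, 0] * s[0]

def hockey_stick_index(series, calib):
    """|calibration mean - full-period mean| / full-period standard deviation."""
    return abs(series[calib].mean() - series.mean()) / series.std()

def ar1_noise(rng, n_years, n_series, phi):
    """Independent AR(1) red-noise series, one per column."""
    eps = rng.standard_normal((n_years, n_series))
    x = np.zeros((n_years, n_series))
    x[0] = eps[0]
    for t in range(1, n_years):
        x[t] = phi * x[t - 1] + eps[t]
    return x

rng = np.random.default_rng(0)
n_years, n_series, phi = 581, 70, 0.9    # hypothetical AD1400-1980 network
calib = slice(n_years - 79, n_years)      # stands in for 1902-1980
full = slice(0, n_years)                  # all years: conventional centring

hsi_centred, hsi_decentred = [], []
for _ in range(20):                       # 20 independent noise networks
    data = ar1_noise(rng, n_years, n_series, phi)
    hsi_centred.append(hockey_stick_index(pc1(data, full), calib))
    hsi_decentred.append(hockey_stick_index(pc1(data, calib), calib))

print(f"mean |HSI|, conventional PCA: {np.mean(hsi_centred):.2f}")
print(f"mean |HSI|, decentred PCA:    {np.mean(hsi_decentred):.2f}")
```

Under these assumptions the decentred PC1 should show a markedly larger index. Subtracting only the calibration mean adds a term to the sum of squares that rewards series whose late values sit away from their long-run level, so PC1 leans towards those series even in pure noise.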

While trying to get to grips with this, I observed that inversely correlated series are picked up by the PCA, flipped so that they align, and incorporated into the PCs. This seems a quirky thing to do with temperature. For example, if the northern hemisphere were warming and the southern hemisphere were cooling, conventional PCA would invert one of the signals and combine the two in the output PCs, resulting in one strong signal of arbitrary sign.
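The hemisphere example can be checked with a few lines of Python. The two records below are synthetic: a hypothetical trend of 0.02 per year, rising in one and falling in the other, plus small noise. Conventional PCA is done as an SVD of the column-centred data. This is a sketch of the general behaviour, not of any particular reconstruction.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(100)
trend = 0.02 * years                      # hypothetical trend, deg C per year

# Synthetic regional records: one warming, one cooling by the same amount.
north = trend + 0.1 * rng.standard_normal(years.size)
south = -trend + 0.1 * rng.standard_normal(years.size)
data = np.column_stack([north, south])

# Conventional PCA: centre each column on its full-period mean, then SVD.
centred = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)

loadings = vt[0]                          # PC1 weights on (north, south)
pc1 = u[:, 0] * s[0]                      # PC1 time series
explained = s[0] ** 2 / np.sum(s ** 2)    # share of variance in PC1

print("PC1 loadings (north, south):", np.round(loadings, 2))
print(f"PC1 explains {explained:.0%} of the variance")
```

PC1 should carry almost all of the variance, with loadings of opposite sign on the two records, so one record enters inverted. The SVD could equally have returned both the loadings and the PC1 series negated. That is the arbitrary sign: nothing in the decomposition says whether the combined signal is "warming" or "cooling".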

This seems to me to be the case for conventional PCA. Is there a specific step that is taken in the MBH98 PCA (or others like it) that prevents such a thing from happening? Or have I got entirely the wrong end of the (hockey) stick here…

Thanks in advance.

**Steve’s comments:** I’ve archived the code for my simulations at GRL if you want to crosscheck. The code is in R, but you can probably get the idea. You’re exactly right about the inversions. There’s a pretty graphic in our E&E paper showing a take on this: we added 0.5 to the 50 non-bristlecone series in the AD1400 network and did MBH98 PCA. It flipped all 50 non-bristlecone series over, and the higher growth rates translated into lower 15th century temperature indexes under MBH98 methods. You’re exactly right on this – it’s pretty weird.
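A toy version of that flip can be sketched with synthetic data. This is not a reproduction of the E&E calculation. It assumes a hypothetical network of 5 series with a 20th-century rise plus 50 flat noise series. The 0.5 is added to the flat series only over the first 100 years, taken here as the 15th-century portion. Decentring is again reduced to subtracting the calibration-period mean.

```python
import numpy as np

rng = np.random.default_rng(2)
n_years = 581                             # hypothetical AD1400-1980 span
calib = slice(n_years - 79, n_years)      # stands in for 1902-1980
early = slice(0, 100)                     # assumed 15th-century portion

# Hypothetical network: 5 "blade" series with a late rise, 50 flat series.
blade = np.zeros(n_years)
blade[calib] = np.linspace(0.0, 2.0, 79)
blades = blade[:, None] + 0.3 * rng.standard_normal((n_years, 5))
others = 0.3 * rng.standard_normal((n_years, 50))
others[early] += 0.5                      # higher 15th-century values

data = np.hstack([blades, others])

# Simplified decentred PCA: subtract only the calibration-period mean.
centred = data - data[calib].mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
loadings, pc1 = vt[0], u[:, 0] * s[0]

# Fix the arbitrary sign so that the late "blade" points upward in PC1.
if pc1[calib].mean() < pc1[: n_years - 79].mean():
    loadings, pc1 = -loadings, -pc1

print("blade loadings all positive:", bool(np.all(loadings[:5] > 0)))
print("other 50 loadings all negative:", bool(np.all(loadings[5:] < 0)))
print(f"PC1 mean, first 100 years:    {pc1[early].mean():.2f}")
print(f"PC1 mean, rest before calib:  {pc1[100:n_years - 79].mean():.2f}")
```

Under these assumptions the 50 flat series get negative PC1 weights. Their higher early values therefore push the PC1 index down in the first 100 years rather than up, which is the direction of the effect described above.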