A Reader's Comment about Principal Components

Here’s an interesting comment from an applied statistician about principal components:

I am a chemical engineer and a statistician by both training and experience. I was in charge of applied statistical methods at a major chemical company for about 37 years; retired as a Corporate Fellow with 40 years of service, and work as a full-time consultant in statistical methods…

In a time when everyone who has access to statistical software becomes an "expert" in statistical methods, I see an affection for using PCA and the fruits thereof. It’s not a beautiful sight…

… it illustrates a general principle and a major problem with PCA. Namely, the "answers" are sensitive to the metrics (scaling, engineering units, etc.) in which the data are expressed. In some instances the choice of metrics (engineering units, etc.) totally determines the perceived "meaning of the data", so we can make drastic changes in the "conclusions" just by changing the metrics. I’ve done this and have watched authors of papers go slack-jawed when they realize the foolishness of their profound "conclusions".
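The unit sensitivity described above is easy to reproduce. The following is a minimal sketch, not the commenter's own example: the temperature and pressure data are synthetic, and the names `first_pc` and `standardise` are hypothetical helpers introduced only for illustration. It runs a covariance-matrix PCA on the same two variables twice, once with pressure in bar and once in pascals, and then repeats the comparison on standardised columns.

```python
import numpy as np


def first_pc(data):
    """Return the unit-length first principal component of the sample covariance matrix."""
    cov = np.cov(data, rowvar=False)        # np.cov centres each column itself
    _, eigvecs = np.linalg.eigh(cov)        # eigenvalues come back in ascending order
    pc1 = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
    return pc1 if pc1[0] >= 0 else -pc1     # fix the arbitrary sign


def standardise(data):
    """Rescale each column to zero mean and unit variance (correlation-matrix PCA)."""
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)


rng = np.random.default_rng(0)
n = 500
# Synthetic, assumed data: temperature in degrees Celsius, pressure in bar.
temperature_c = rng.normal(150.0, 5.0, n)
pressure_bar = 2.0 + 0.01 * (temperature_c - 150.0) + rng.normal(0.0, 0.05, n)

data_bar = np.column_stack([temperature_c, pressure_bar])
data_pa = np.column_stack([temperature_c, pressure_bar * 1e5])  # same pressures in pascals

pc1_bar = first_pc(data_bar)   # temperature has the larger numerical variance
pc1_pa = first_pc(data_pa)     # pressure has the larger numerical variance
pc1_std_bar = first_pc(standardise(data_bar))
pc1_std_pa = first_pc(standardise(data_pa))

print("PC1 loadings, pressure in bar:   ", pc1_bar)
print("PC1 loadings, pressure in Pa:    ", pc1_pa)
print("PC1 loadings, standardised (bar):", pc1_std_bar)
print("PC1 loadings, standardised (Pa): ", pc1_std_pa)
```

With these assumed numbers, the first component is almost entirely "temperature" when pressure is in bar and almost entirely "pressure" when it is in pascals, although the underlying measurements are identical. Standardising each column removes the dependence on units, at the cost of giving every variable equal weight, which is itself a modelling choice that needs to be stated.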

6 Comments

  1. David H
    Posted Feb 24, 2005 at 3:29 PM | Permalink

    Is this not what slide 20 of Prof. Jolliffe’s PCA presentation shows? Incidentally, the link to this presentation on RealClimate, which is supposed to justify MBH’s non-centred PCA, no longer works. I wonder why. This is by no means the first piece of previously available material to disappear when people start to take an interest.

  2. Spence_UK
    Posted Feb 24, 2005 at 3:42 PM | Permalink

    I’ve been looking into PCA methods and how the MBH98 curve might be influenced, and I’m amazed at how basic guidelines recommended by so many experienced PCA users have apparently been completely ignored by Mann et al. I came across this PowerPoint presentation, http://www.met.rdg.ac.uk/~swx03itj/ToCentre.ppt, by the author of Professor Mann’s favoured PCA textbook, Ian Jolliffe, which describes a number of different centring techniques, complete with an observation of how applying different centring techniques changes the interpretation of the results. This presentation stresses the importance of explaining what has been done, why it has been done and what the consequences are, all conspicuously absent from the hockey stick papers. I’m pretty sure I’ve got a very clear explanation of the mathematics behind the construction of the hockey sticks now, and I maintain that the use of a more conventional centred PCA results in a very high risk of negative correlations creeping in, which have no sensible interpretation in reconstructing average global temperatures.

  3. John A.
    Posted Feb 24, 2005 at 4:20 PM | Permalink

    From Spence’s link (page 21):

    Uncentred “covariance” analysis

    “Results are not invariant to choice of scale”

    “It seems unwise to use uncentred analyses unless the origin is meaningful. Even then, it will be uninformative if all measurements are far from the origin”

  4. John S.
    Posted Feb 24, 2005 at 6:22 PM | Permalink

    Another problem with PCA is that the PCs that result have no inherent meaning. One has to argue very hard that particular components actually mean something. In the case of bristlecones and that North American PC1 (or is it PC4?) it seems that the principal component could well be an ‘aerial CO2 fertilisation’ effect rather than anything temperature related. Garbage in, garbage out. (Quite apart from any question about whether your method is the methodological equivalent of a blender.)

  5. Michael Ballantine
    Posted Feb 25, 2005 at 7:41 AM | Permalink

    John, I very much agree with your “aerial CO2 fertilisation” comment. My circle of friends is very much into gardening and agriculture in general, and very well informed compared to the general public. It is our consensus that ring growth of trees in the wild should not be used as a temperature indicator for any useful purpose.

    Tree growth responds strongly to 4 things that can vary from year to year: water, CO2, sunlight and temperature. Even assuming that tree ring growth can be shown to have a reliably positive correlation with temperature, how the hell do you account for water, CO2 and sunlight?

    Perhaps “Peter the farmer” can offer some useful comments on this.

  6. N. Joseph Potts
    Posted Feb 25, 2005 at 10:35 AM | Permalink

    The people over at RealClimate express so much frustration at having to deal with non-climatologists. It seems they claim a wider brief than is actually theirs. The author of this post would appear frustrated at the amateurism of non-statisticians (Mann, et al.).
    It would seem that, in at least some of the technologies critically at issue, the high ground of expertise is on THIS side of the discussion, rather than THAT one. I’ve heard that over here, we’re “not doing science.” Might it be said that over there, they’re “not doing statistics?”
    This is not to say for a minute that they’re even doing climatology over there, if identification and selection of proxies is climatology. Their analysis seems seriously flawed on virtually every score.
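The warning quoted in comment 3, that an uncentred analysis "will be uninformative if all measurements are far from the origin", can also be checked numerically. The sketch below is a generic illustration on synthetic data and is not a reproduction of any paper's procedure. The offset of (100, 100), the spread values and the helper name `leading_direction` are all assumptions made for the example. It compares the leading direction found with and without subtracting the column means.

```python
import numpy as np


def leading_direction(matrix):
    """Return the leading right singular vector, with its largest component made positive."""
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    v = vt[0]
    return v if v[np.argmax(np.abs(v))] > 0 else -v


rng = np.random.default_rng(1)
n = 400

# Assumed set-up: the data cloud sits far from the origin, at (100, 100),
# and varies mostly along the perpendicular direction (1, -1).
offset = np.array([100.0, 100.0])
offset_dir = offset / np.linalg.norm(offset)
spread_dir = np.array([1.0, -1.0]) / np.sqrt(2.0)

# Standard deviation 3.0 along spread_dir and 0.5 along offset_dir.
scores = rng.normal(size=(n, 2)) * np.array([3.0, 0.5])
data = offset + scores @ np.vstack([spread_dir, offset_dir])

uncentred_pc1 = leading_direction(data)                      # no mean subtraction
centred_pc1 = leading_direction(data - data.mean(axis=0))    # conventional centring

print("Uncentred PC1:", uncentred_pc1)
print("Centred PC1:  ", centred_pc1)
print("|cos| between uncentred PC1 and the offset direction:", abs(uncentred_pc1 @ offset_dir))
print("|cos| between centred PC1 and the spread direction:  ", abs(centred_pc1 @ spread_dir))
```

In this set-up the uncentred first component points from the origin towards the centre of the data cloud, so it mostly describes where the data sit rather than how they vary. The centred first component recovers the direction of greatest variation. Which origin is "meaningful" remains a judgement that has to be explained case by case.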
