More on PCs

DF criticized my post on principal components yesterday as follows:

Most of your figures for conventional PC analysis are misleading. You are comparing PCA1 to mean as if PCA1 has an intrinsically meaningful scale, when it does not. If you rescaled your comparison plots so that PCA1 and the mean had the same variance, then the results would be nearly indistinguishable (aside from questions of orientation). I do not believe such near equivalence holds for the Mann, offset-centered method.

I disagree with this comment on a number of grounds. In fact, I think that the scaling appropriately illustrates the near-identity of the PC1 and the first HS-shaped series. I agree with DF that, in the circumstances of this example, the rescaled mean approximates the PC1, but I take home an entirely different message: this illustrates the well-known non-robustness of the mean and illustrates the need for climate scientists to use a robust measure of location.

In my example, the weight of X[,1] in the PC1 is near unity – empirically it’s about 1-ε^2 here, where ε ~ 0.025 – and the weights of the other series are close to 0 – empirically about ε/sqrt(n)*flip[i], where flip[i] is +1 or -1, roughly half and half here. The PC algorithm is not sensitive to n in picking out X[,1] from white noise; it is sensitive to the standard deviation σ of the noise series: as σ increases, the PC algorithm will eventually find random patterns in the white noise instead of the signal. In this toy example, the HS blade is big enough that it separates itself from the noise. In such a case, the PC1 is approximated by:

(1) pc1 ~ (1-ε^2) * X[,1] + ε/sqrt(n) * Σ(i=2:n) X[,i]*flip[i]
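A rough sketch of how one might check the weights in (1) numerically. This is not the script behind the figures: the series length (200), the blade shape (a 50-step ramp up to 1), the seed, and the use of power iteration on the covariance matrix to get the first eigenvector are all assumptions made for illustration.

```python
import math
import random

def make_series(n_series=10, length=200, sd=0.05, seed=0):
    """Column 0 is an assumed hockey-stick shape plus noise; the rest are white noise."""
    rng = random.Random(seed)
    blade = [0.0] * (length - 50) + [t / 50 for t in range(1, 51)]
    cols = [[b + rng.gauss(0.0, sd) for b in blade]]
    cols += [[rng.gauss(0.0, sd) for _ in range(length)] for _ in range(n_series - 1)]
    return cols

def covariance(cols):
    """Sample covariance matrix of the columns (each column centered on its own mean)."""
    m = len(cols[0])
    centered = []
    for c in cols:
        mu = sum(c) / m
        centered.append([v - mu for v in c])
    return [[sum(a * b for a, b in zip(ci, cj)) / (m - 1) for cj in centered]
            for ci in centered]

def first_pc_weights(cov, iters=200):
    """Leading eigenvector of cov by power iteration (weights of the PC1)."""
    w = [1.0 / math.sqrt(len(cov))] * len(cov)
    for _ in range(iters):
        w = [sum(row[j] * w[j] for j in range(len(w))) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w

weights = first_pc_weights(covariance(make_series()))
print("weight on X[,1]:", weights[0])
print("weights on noise series:", weights[1:])
```

Under these assumptions the first weight should come out close to 1 and the rest close to 0 with mixed signs, which is the pattern (1) describes.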

Figure 8 below shows a slightly re-stated Figure 1 from yesterday, this time with the median (blue) added in the right panel. It noticeably separates from the mean (red).


Figure 8. As with Figure 1 yesterday. Blur is sd=0.05. Blue – median.

In this particular example, it’s the difference in variance that makes the HS stick out (rather than its HS-ness per se, although the HS-ness comes into play in the Mannian method.) To show this, I randomly permuted the values of the HS-series in Figure 9 below, leaving everything else unchanged. Obviously the PC1 recovers the high-variance series. You can also see where the mean and median yield different results.


Figure 9. As with Figure 8, with permutation of order.
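A small sketch of why the permutation leaves the result unchanged: shuffling the order of a series does not change its variance, so the permuted HS series is still the high-variance column. The blade shape, length and seed are assumptions for illustration, not the values used for Figure 9.

```python
import random
import statistics

rng = random.Random(1)
length, sd = 200, 0.05

# Assumed hockey-stick shape: flat shaft, then a 50-step ramp up to 1, plus noise.
hs = [0.0] * (length - 50) + [t / 50 for t in range(1, 51)]
hs = [v + rng.gauss(0.0, sd) for v in hs]
noise = [rng.gauss(0.0, sd) for _ in range(length)]

shuffled = hs[:]
rng.shuffle(shuffled)  # destroys the HS shape but keeps exactly the same values

var_hs = statistics.variance(hs)
var_shuffled = statistics.variance(shuffled)
var_noise = statistics.variance(noise)

print("HS variance:      ", var_hs)
print("shuffled variance:", var_shuffled)
print("noise variance:   ", var_noise)
```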

DF stated that the rescaled mean would have similar properties to the PC1 in this particular case and this is, in fact, true. Figure 10 below shows the PC1, mean and median. As noted above, I take a very different moral from this. We know that the PC1 is, to a very close approximation, simply the high-variance outlier series (the HS series in this example). In this example, 9 of 10 series are constant at 0 up to white noise and there is one outlier. The weights in the PC1 recover the outlier rather than the null value of 9 of 10 series. The re-scaled mean also recovers the outlier. If you think about it for a minute, the approximation of the PC1 to the re-scaled mean can be understood by working through the algebra a little. Since pc1 ~ X[,1],

pc1 - n*μ ~ X[,1] - n * (1/n * Σ(i=1:n) X[,i])
= -Σ(i=2:n) X[,i] = -(n-1) * mean(X[,2:n])

Since the X[,i], i=2:10, are by construction white noise (μ here being the mean series and σ the standard deviation of the white noise series):

var(pc1 - n*μ) = (n-1)^2 * var(mean(X[,2:n]))
= (n-1)^2 * σ^2/(n-1) = (n-1) * σ^2

Thus,

sd(pc1 - n*μ) = sqrt(n-1) * σ

I checked this formula empirically and it seems to be correct. So if you have one outlier series and (n-1) white noise series, the PC1 and n*μ are going to be fairly similar, as DF observes. But that’s not the end of the story; it’s the beginning.
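One way such an empirical check could look, as a sketch only: it substitutes X[,1] for the PC1 (the approximation used in the algebra above) and compares the observed standard deviation of X[,1] - n*μ with sqrt(n-1)*σ. The series length, the seed and the sine-shaped stand-in for the outlier series are assumptions; the outlier's shape cancels out of the difference.

```python
import math
import random
import statistics

rng = random.Random(2)
n, sigma, length = 10, 0.05, 20000

# x[0] stands in for the outlier series; its shape does not matter here.
x = [[math.sin(t / 500.0) for t in range(length)]]
x += [[rng.gauss(0.0, sigma) for _ in range(length)] for _ in range(n - 1)]

# mu is the mean series across all n columns at each time step.
mu = [sum(col[t] for col in x) / n for t in range(length)]

# With pc1 ~ X[,1], the difference reduces to minus the sum of the noise series.
diff = [x[0][t] - n * mu[t] for t in range(length)]

observed = statistics.pstdev(diff)
predicted = math.sqrt(n - 1) * sigma
print("observed sd: ", observed)
print("predicted sd:", predicted)
```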


Figure 10. Scaled versions of PC1, mean and median.

The idea of variance re-scaling is one that is very peculiar to Hockey Team climate science right now. Variance re-scaling is implemented in Briffa, Mann; it’s discussed at length in von Storch et al [2004] and Esper et al [2005]. But you can’t go to Draper and Smith or Ripley and find a discussion of this method or a recommendation that it be done. I could stand corrected on this, but it seems to be somewhat sui generis to Hockey Team climate science right now.

We know that the PC1 is not robust since it essentially only recovers the outlier series X[,1]. Since the rescaled mean approximates a non-robust measure, it is evident that the re-scaled mean is not a robust statistic either, using "robust" in its technical sense. If you google this term, you’ll find many discussions, e.g. Ripley here. Ripley (and similar studies) specifically identify the "mean" as a non-robust statistic. Considerable effort has been spent in statistics over the past 20-30 years (see Ripley references) to develop location measures that are not subject to breakdown. The median is a simple method, shown here, because it is easy to implement and illustrates the point nicely.
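To make the mean-versus-median contrast concrete, here is a minimal sketch with one HS-shaped series and nine white noise series. The blade shape, length and seed are assumed for illustration and are not the values behind the figures. At each time step it takes the mean and the median across the ten series, then averages each over the last ten steps, where the blade is largest.

```python
import random
import statistics

rng = random.Random(3)
n, sd, length = 10, 0.05, 200

# Assumed hockey-stick shape: flat shaft, then a 50-step ramp up to 1.
blade = [0.0] * (length - 50) + [t / 50 for t in range(1, 51)]
series = [[b + rng.gauss(0.0, sd) for b in blade]]
series += [[rng.gauss(0.0, sd) for _ in range(length)] for _ in range(n - 1)]

# One row per time step, holding the n values observed at that step.
rows = list(zip(*series))
mean_series = [statistics.mean(r) for r in rows]
median_series = [statistics.median(r) for r in rows]

# Average each location estimate over the last 10 steps (end of the blade).
end_mean = statistics.mean(mean_series[-10:])
end_median = statistics.mean(median_series[-10:])
print("mean at blade end:  ", end_mean)
print("median at blade end:", end_median)
```

With nine of ten series centered on 0, the mean is pulled up by roughly one tenth of the blade height, while the median should stay near 0.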

So while, as DF points out, the rescaled mean approximates the PC1 (which is nearly identical to the outlier series), both methods in this particular example are yielding non-robust "reconstructions" and a very different result is obtained using a "robust" statistic – the median. Obviously this issue affects not just MBH98 but also the many Hockey Team studies using pocket (say 5-15) subsets of proxy networks.


36 Comments

  1. Martin Ringo
    Posted Mar 30, 2006 at 4:11 PM | Permalink

    What size is “n” in your example? (And apologies if I missed in previous PC post.)

  2. Jeremy
    Posted Mar 30, 2006 at 5:33 PM | Permalink

    Forgive a statistical newbie here, but wouldn’t ‘rescale(ing) the mean and PC1 so that they have the same variance’ be nearly identical to plotting two sets of data on the same x but two different y-scales?

    In fact, I do this all the time, when I want two different sets of data with (usually) different structures and meanings to take up the same amount of page space. The idea is to get the most amount of information on how the data sets behave right next to each other. I don’t usually do it so that one function(data set) looks smaller compared to another function(data set).

  3. Steve McIntyre
    Posted Mar 30, 2006 at 5:59 PM | Permalink

    The point that I’m trying to make (and everyone seems to be missing) is that the PC1 (and the rescaled mean) are simply approximating X[,1] and are not “robust” averages. A “robust mean” in the sense of Ripley would be recovering the mean of X[,2:10] ie. a straight line of 0, which the median is coming closer to. The job of a robust statistic is to discount outliers – the 1 in 10 series. The PC1 and rescaled mean are doing exactly the opposite and breaking some important statistical rules.

  4. Jeremy
    Posted Mar 30, 2006 at 6:35 PM | Permalink

    I think I understand your point Steve. I was just trying to sound simple for the purpose of illustrating that rescaling the mean to show it looks like the PC1 is about as obvious as one could get. A mean is going to show everything as a proportion, so it stands to reason (but thank you for the algebra) that it’s going to show a hockey stick if you feed it one. So suggesting to rescale the variance on the mean to show this seems suspiciously like someone saying, “Well, those two should have the same size on the plot, so make it so.”

    Now that I think about it, I’ve never heard of someone rescaling sigma^2 on their data plots. I can’t understand why anyone would do this, but I don’t do statistical analysis.

  5. TCO
    Posted Mar 30, 2006 at 7:05 PM | Permalink

    1. variance rescaling is a different, but venal, sin before God.

    2. You still owe me answers/responses in the some thread, steve.

    3. Well…if I grab the other PCs, does it all work out in the end?

  6. Dano
    Posted Mar 30, 2006 at 7:18 PM | Permalink

    Cheggidout! Steve gets some play! Credit where credit is due.

    Not what some of the dead-enders want to hear, but hey, you take what you can get.

    Best,

    D

  7. Bob K
    Posted Mar 30, 2006 at 7:44 PM | Permalink

    RE: 6

    It seems from the first sentence of the abstract that they are implying M&M did a reconstruction. Is that the way you read it?

  8. Paul
    Posted Mar 30, 2006 at 7:50 PM | Permalink

    It appears as if there’s an awful lot of Mann & Jones in there… Same ole, same ole.

  9. TCO
    Posted Mar 30, 2006 at 7:57 PM | Permalink

    Is it worth the $9.00? Could some one (heehee…who runs a blog…heehee) give me the net/net? I don’t understand how “long term persistence” proves “recent non-normal warming”. I guess they are saying that some aspect of the pattern in recent days is different from the aspects in earlier days? The ARMA parameters are different? Is that what they say?

  10. Paul
    Posted Mar 30, 2006 at 8:07 PM | Permalink

    #9 –

    From the abstract, it appears that they’re using the same data set that’s been hashed over and over and over. Temperature proxies from the same trees, and Jones’ unreleased data. What’s different?

    Using different math with bad data still gives bad results – GIGO. Steve M first showed us that their original math was bad and then, for an encore, showed that the data itself is more than suspect.

    This new paper doesn’t use new data (at least from the abstract), so what does it matter if the math is “good?”

  11. MarkR
    Posted Mar 31, 2006 at 6:09 AM | Permalink

    Re #6
    “the mean temperature variations σ(m, L) between L years, obtained from moving averages over m years, are considerably larger than for uncorrelated or short-term correlated records”. Doesn’t this come back to using the “bristlecone data”?

    “Accordingly, the hypothesis that at least part of the recent warming cannot be solely related to natural factors, may be accepted with a very low risk, independently of the database used”. But all the databases used in this analysis are inter related, not independent, and all include the “Bristlecone”. I bet it does matter which database is used if you take the Bristlecones out.

    Dano, it doesn’t change a thing. The databases used by Mann and followers contain a wildly outlying data set, the “Bristlecone”. Take them out and there is no trace of the “Hockey Stick”, unless you use the special “data mining algorithm”, and no long term variances.

  12. Paul
    Posted Mar 31, 2006 at 8:59 AM | Permalink

    RE #11:

    Rob Wilson’s thread has some interesting issues related to the Bristlecone pines, and I think they relate to almost all tree ring derived data. Primary among them is that it is probably easy to see a local “climate” signal from tree ring data, but not possible to extract a specific temperature signal from that data. The “divergence problem” is an example of why.

    I think the best we can say is that “we can see when the climate was good for trees and when the climate was bad for trees.”

  13. kim
    Posted Mar 31, 2006 at 9:28 AM | Permalink

    With what corresponding instrumental temperature records were these RW ‘temp proxies’ compared?
    ====================================

  14. Paul Penrose
    Posted Mar 31, 2006 at 9:43 AM | Permalink

    It seems all these studies depend on the assumption that the tree-rings contain a temperature signal and that signal is the strongest one present. However given the weak r2s and the non-robustness to the absence of the BCPs that these studies exhibit, I don’t see how that assumption can be considered valid. Now maybe there’s some other independent research that validates this key assumption, but I have not seen any yet. So, for the time being I don’t have a lot of confidence in these studies. This in turn casts a lot of doubt on the statement that the recent warming trends are too great to be natural.

    Another thing that is bothering me: I keep seeing statements about warming trends over the last 20 or 30 years and then linking it to “climate change”. How can we be talking about climate over such short time intervals? In my mind we aren’t talking about climate until we start getting into at least 100 year periods, and even then a single 100 year interval may not be significant by itself. Just a pet peeve I guess.

  15. Spence_UK
    Posted Mar 31, 2006 at 9:49 AM | Permalink

    “the mean temperature variations σ(m, L) between L years, obtained from moving averages over m years, are considerably larger than for uncorrelated or short-term correlated records”.

    I’d be interested to know whether David Stockwell’s results also exhibit these properties. I might pop over to ENM later and have an ask…

  16. Douglas Hoyt
    Posted Mar 31, 2006 at 10:28 AM | Permalink

    It seems to me if you want to relate tree ring width (RW) to local temperature (T) that one would start with a regression equation of the form:

    RW = A + B*T + C*PDI + D*CO2

    where PDI is the Palmer Drought Index and probably a better variable than precipitation and CO2 is the concentration of carbon dioxide and a proxy for CO2 fertilization. Perhaps other variables may need to be added to the equation, but it is already complicated with 3 “independent” variables.

    Problems arise in common statistical packages because they usually assume the “independent” variables are really independent (not correlated with each other) and that is seldom the case. In fact, for example, CO2 and T might correlate for 1900-1940 and 1976-2005 and anti-correlate for 1940-1976, to give an obvious example. There are more sophisticated regression techniques such as ridge regression that take into account the cross-correlations between the independent variables, but I don’t know if they use it in tree ring studies. I am also not sure if it would help for cases where the correlation comes and goes. You would probably get entirely different coefficients for 1900-1940 and 1940-1975, no matter how the analysis is done.

    It could very well be that tree rings represent temperature sometimes and not at other times. That is my 2 cents worth.

  17. jae
    Posted Mar 31, 2006 at 10:37 AM | Permalink

    Re: #14:

    It seems all these studies depend on the assumption that the tree-rings contain a temperature signal and that signal is the strongest one present. However given the weak r2s and the non-robustness to the absence of the BCPs that these studies exhibit, I don’t see how that assumption can be considered valid. Now maybe there’s some other independent research that validates this key assumption, but I have not seen any yet. So, for the time being I don’t have a lot of confidence in these studies. This in turn casts a lot of doubt on the statement that the recent warming trends are too great to be natural.

    Yes, that’s what I keep harping about. Why the heck doesn’t some dendrochronologist (or “dendroclimatologist”) get on here and explain how they can ASSUME that tree growth is LINEARLY correlated with temperature? It is fundamental to all this work!

  18. Mark
    Posted Mar 31, 2006 at 10:56 AM | Permalink

    Michael Mann even states that this assumption is requisite to the analysis he provides. Then he goes on to fail to provide the proof he hinged his claims on.

    Mark

  19. Dano
    Posted Mar 31, 2006 at 11:44 AM | Permalink

    11:

    Dano, it doesn’t change a thing. The databases used by Mann and followers contain a wildly outlying data set [standard meme follows]

    You’re right. It doesn’t change a thing. Decision-makers are still making policy with the view that AGW is harmful. Too bad…

    Anyway, I read the paper. The paper statistically compares findings, which implies the data are good enough to analyze. The paper used MM03, so you should feel real good-like ’bout that.

    But if’n you don’t like the underlying data, complain to Rybski et al.

    Of course, one could always go out and get their own data to back their claim, but until then all there is is finger-pointing. And people who do this for a living are the ones being listened to (as evident by the recent pouting over the Clim Change snub).

    And decision-makers don’t read this site – they get counsel from folk who understand the science, which implies the data are good enough to use in decision-making.

    What’s the takeaway?

    Stop atomistically quibbling on blogs and gain access to decision-makers. Because they are moving forward in a policy direction some quibblers don’t want policy to go. Quibbling over a dataset won’t do it. And commenting won’t do it either, because those who get information to counsel decision-makers don’t read this blog. Skeptics aren’t turning the public discussion in the direction they want to go. Certain conservative policy-makers can’t halt the process, either, because they are outnumbered.

    What does that tell you boys?

    Best,

    D

  20. John Lish
    Posted Mar 31, 2006 at 11:53 AM | Permalink

    #19 Really Dano? Seems to me that countries are moving further away from Kyoto et al, certainly that seems to be the case for the UK. Are you confusing lip-service with action?

  21. Paul
    Posted Mar 31, 2006 at 12:10 PM | Permalink

    RE #19:

    Maybe there’s something to be gleaned from this link.

    Certain conservative policy-makers can’t halt the process, either, because they are outnumbered.

    What does that tell you boys?

    It tells us that people are easily scared. If you want to appeal to the majority having a consistent level head and strong basis in science, fine…History has taught me to have so such faith in the majority. Being the majority doesn’t make you right, only in the majority.

  22. jae
    Posted Mar 31, 2006 at 1:02 PM | Permalink

    re: 19. Now that’s a real scientific, reasoned, rational post, if I ever saw one.

  23. Paul
    Posted Mar 31, 2006 at 1:45 PM | Permalink

    RE #21 – Personal Edit.

    History has taught me to have so such faith in the majority.

    Should be “History has taught me to have NO so such faith in the majority.

  24. Michael Jankowski
    Posted Mar 31, 2006 at 1:54 PM | Permalink

    RE: 6

    It seems from the first sentence of the abstract that they are implying M&M did a reconstruction. Is that the way you read it?

    That’s certainly the way I read it.

  25. John A
    Posted Mar 31, 2006 at 4:07 PM | Permalink

    …decision-makers don’t read this site – they get counsel from folk who understand the science, which implies the data are good enough to use in decision-making.

    Are you sure? That’s not what I’m seeing in the weblog statistics.

  26. Follow the Money
    Posted Mar 31, 2006 at 4:14 PM | Permalink

    #20

    “”#19 Really Dano? Seems to me that countries are moving further away from Kyoto et al, certainly that seem to be the case for the UK. Are you confusing lip-service with action? “”

    I find it distressing countries are moving away from Kyoto because I wanted to see them suffer for what they agreed to. Western nations, one after another, when faced with strict reductions requirements meet the judgment day on Kyoto – the day they ask themselves, “Do we send money to China to buy carbon credits?” Report after report about how France, Britain, etc (not Germany of course, they rigged it) may not meet reduction target, nary a report offering the central purpose and motivator of the Kyoto scheme – the carbon credit trading casino – provides the solution.

    The other solution is statistical. 50-50 we start seeing downward hockey stick graphs showing EU compliance with CO2 targets. Heck, maybe they’ll have more cold winters and claim that the proof of compliance!

  27. Steve McIntyre
    Posted Mar 31, 2006 at 5:19 PM | Permalink

    I’ll post up something on Rybski et al. by itself. It gets frustrating to endlessly have people proclaim MBH without bristlecones as the “MM reconstruction”. I categorically told vS that we did not provide an alternative reconstruction when I gave him the url for the dataset.

  28. Mark
    Posted Mar 31, 2006 at 5:20 PM | Permalink

    The US, in its infinite wisdom, is now setting up a Senate panel to discuss mandatory regulation of greenhouse gas emissions. It seems we will also need to suffer before seeing the light not unlike the EU.

    As to your last comment (re #26), doubtful. Even the most supportive of Kyoto realize that it cannot do anything significant even if they meet their targets. The costs are staggering. Perhaps when the downward trend in the actual record begins, correlating with decreased solar activity, we will finally be able to say “I told you so.” Unfortunately, it is my opinion that we will be so far down the road to bankruptcy by then it will no longer matter.

    Mark

  29. TCO
    Posted Mar 31, 2006 at 5:38 PM | Permalink

    Steve, I’ve commented a bit on the inkstain blog about “your reconstruction”. I’m a lot more interested in your take on the work itself first, (far) second on the issue of how independent the different reconstructions are (I imagine the “your reconstruction” has a lot of commonality with MBH) and then thirdly with the implied PR problem that someone will find fault in “your reconstruction” even though you did it as a variant of MBH (to show an effect) not as a reconstruction.

  30. jae
    Posted Mar 31, 2006 at 6:44 PM | Permalink

    Alas, what Dano says in his typically obnoxious way in #19 is probably correct, at least for the short run. Correct but not right. It’s the steam-roller effect, inertia generated during the last few years by blow-hard alarmists (many of whom are the scientists, themselves!) citing the poor science generated at some public institutions (like the Hokey Stick). Anyway, I am convinced that there is still not enough scientific proof of AGW to be formulating and implementing expensive “fixes.” Ironically, as Lomborg points out, any “fixes” will probably cause more harm than ignoring the issue. I think it is quite likely that the AGW “scare” is just another event that keeps the money flowing to the “scarers,” like the formaldehyde scare, Alar scare, etc., etc., etc. And like those events, the truth will eventually emerge.

  31. Bob K
    Posted Mar 31, 2006 at 6:49 PM | Permalink

    For the umpteenth mention on this site. No reconstruction was made. See Steve’s #27

    Myself, I’d call it a mathematical deconstruction. But Steve is probably trying to state it as accurately as he feels comfortable with.

  32. jae
    Posted Apr 1, 2006 at 12:00 AM | Permalink

    ONCE AGAIN I CRY OUT TO THE “DENDROCLIMATOLOGISTS,” WHERE THE HELL IS THE PROOF THAT TREE GROWTH IS LINEARLY RELATED TO TEMPERATURE? YOUR WHOLE DAMN DISCIPLINE DEPENDS ON THIS RELATIONSHIP, AND IT DOESN’T APPEAR THAT YOU HAVE ANY FRIGGING CLUE! WHY DOESN’T ANYONE ADDRESS THIS TOPIC? DANO, YOU PORTRAY YOURSELF AS THE GURU OF TREE RINGS; WHAT YOU SAY, BOY?

  33. Armand MacMurray
    Posted Apr 1, 2006 at 4:28 AM | Permalink

    Jae,
    It’s not polite to “shout” in blogs!

    I think Dano’s poor analysis in #19 suggests that he may also lack any detailed knowledge of the underpinnings of the temperature-tree ring relationship.

  34. Peter Hearnden
    Posted Apr 1, 2006 at 4:31 AM | Permalink

    Re #32 Now that’s a real scientific, reasoned, rational post, if I ever saw one – not.

  35. Posted Apr 2, 2006 at 5:31 AM | Permalink

    oh dear

    Peter H has joined this line. Generally whenever Peter H joins I switch off and look at the newer blog lines for new discussions. Peter H’s interlocution generally means the beginning of claims of ad hominems and other nasty contributions, so generally as a rule I switch off (sign off) and go elsewhere.

    However on this occasion I volunteer the following to Peter H.

    The reason jae (no 32) “SHOUTED” was because nobody has bothered to answer the question or even made an attempt to answer the question over a very long time.

    Peter, perhaps you could use your influence to get somebody to answer the question. It would be very helpful to the discussion.

    Best wishes

    Harry G (from down under – bloody cold this summer no sign at all of AGW in my Olive Grove)

  36. kim
    Posted Apr 2, 2006 at 6:43 AM | Permalink

    Well, I’m sure I learned it in 3rd grade. We had a great teacher and learned a lot. I distinctly remember how enthralled I was at the idea that a sturdy, silent tree could pass on the information garnered through its whole life and held fast in its heart.
    ==============================================
