Jean S., thanks much for your reference explaining PCA and the time you have taken to explain the nuances of this analytical tool and its applications in the Mann publications. It makes the time spent at this blog well worth the effort and a great learning experience.

Also, I would be very interested to see a reconstruction calibrated against world temp, US temp, and unfudged satellite temp, done with the R statistical package, which is open to anyone. A sort of Open Millennial Temperature Reconstruction.

I can understand there may be good reasons why SteveM has chosen not to do it, but I don’t understand why everyone else seems to be avoiding it. It is after all probably the most important issue at this time.

And how does one know that the signal found is for the required variable, e.g. temp?

Indeed when it is described as multivariate, I can see only two factors, growth anomaly and temperature. I don’t see any data for water, or CO2 etc. being used.

And how does one deal with Bender’s point that the variables are not stationary over time? E.g. trees may be temperature-limited, or they may be water- or CO2-limited, at different times.

We wouldn’t know that any of the tree ring PC’s indicated temperature until we regressed T on the PC’s and found some of them to be significant (after correcting for serial correlation and spurious regression and adjusting critical values for data mining).

There is no guarantee that any of them would be significant, but there is no reason they couldn’t be.

CO2 is very relevant, since it arguably fertilizes trees, and since CO2 has increased a lot in the past 50 years, correlating with the recent growth spike in many of the series. Also, we have a reasonable measure of it over the past 1000 years or more from ice cores. This spike may also be due to temperature, or to tree-specific factors like injuries that leave a tree in a “stripbark” condition. Since the stripbark is visible, one can simply avoid those trees for this purpose as recommended by NAS, compute PC’s from the remaining trees, then regress Temp on the PC’s and CO2 during the last 79 years or whatever. Then if anything is significant, use these coefficients to construct pre-instrumental T estimates from the PC and CO2 data.

These estimates will of course have a big se, and are meaningless unless the se is reported alongside them. For this purpose, it is sufficient to use what I call the “coefficient forecast se”, based only on the uncertainty of the coefficients, and not the “total forecast se”, which also incorporates the variance of the regression error that will accompany the actual Temp value. Unfortunately, EViews automatically gives you the total forecast se if you ask for the “forecast se”, so you have to remember to back the coefficient forecast se out of it, using @SE to recover the se of the regression errors along with the Pythagorean theorem. (If you’re using EViews, that is…)
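As a minimal sketch of that back-out step: it assumes the coefficient uncertainty and the regression error are independent, so that (total forecast se)^2 = (coefficient forecast se)^2 + (regression se)^2. The function name and the numbers are made up for illustration and are not EViews output.

```python
import math

def coefficient_forecast_se(total_forecast_se, regression_se):
    """Back the coefficient-only forecast se out of the total forecast se.

    Assumes total^2 = coefficient^2 + regression_se^2, i.e. the two
    error sources are independent (the "Pythagorean" relation).
    """
    diff = total_forecast_se ** 2 - regression_se ** 2
    if diff < 0:
        raise ValueError("total forecast se cannot be smaller than the regression se")
    return math.sqrt(diff)

# Hypothetical values: total forecast se of 0.5, regression se of 0.3.
print(coefficient_forecast_se(0.5, 0.3))  # about 0.4
```

The guard on a negative difference is there because a total forecast se smaller than the regression se would mean the two inputs do not come from the same regression.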

According to Steve in the Almagre discussion, Graybill was originally looking for this CO2 fertilization effect when he cored his trees, but it turned out not to be very strong. Nevertheless, it can’t hurt to try adjusting for it. It can always be left out if it is insignificant.

One problem with adjusting for CO2 in this manner is that it might automatically erase any evidence of a causal link from CO2 to Temp, which after all is what this discussion is all about. Perhaps the AGW people have come up with a way around this. Have they?

I take this to be the common observation that PCs have no inherent meaning. They are a mathematical construct that finds the orthogonal vectors that explain the most variance in your data. What that projection means is then a matter for philosophers to debate. Mann’s PC1 could be a temperature signal, a CO2 signal, a signal related to the albedo of my nether regions or some combination of all or none of the above.

If you want to attach some meaning to it you need to do a lot more work. A physical theory is one way, a good (non-spurious) regression might be another. But talking about Mann’s PC1 as “the” temperature component is just wishful thinking.

…or at least that is my take on it. And you seem to have said more or less that at #68 while I was typing.
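To make the “mathematical construct” point concrete, here is a small sketch of PC1 for two variables, using the closed-form eigen-decomposition of a 2x2 covariance matrix. The data and the function name are hypothetical. Note that nothing in the calculation knows what the two columns measure: the same numbers give the same PC1 whether you label them temperature, CO2 or anything else.

```python
import math

def pc1_2d(xs, ys):
    """First principal component of two variables.

    Returns (largest eigenvalue, unit direction vector, share of total variance).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    # Largest eigenvalue of the covariance matrix [[sxx, sxy], [sxy, syy]].
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    lam = trace / 2 + math.sqrt(max(trace * trace / 4 - det, 0.0))

    # Orientation of the major axis of the covariance ellipse.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(angle), math.sin(angle))
    return lam, direction, lam / trace

# Hypothetical data: the second column is exactly twice the first.
lam, direction, share = pc1_2d([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(lam, direction, share)  # PC1 points along (1, 2) and carries all the variance
```

Deciding whether that direction means anything physical is the extra work (a physical theory, or a non-spurious regression) that the calculation itself cannot supply.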

Some points:

-Yes, there are several variables in proxies and PCA does not fully separate them. It simply “condenses” your variables into a few signals (assuming that the variables are linearly related, which may already be assuming too much). It is not the purpose of PCA in this tree ring proxy thing to fully “extract” the temperature signal, not even in MBH.

-In MBH, the “temperature extraction” is supposed to happen (it does not) in the final regression phase (the calibration). Keep in mind that the MBH proxies also contain instrumental data, precipitation proxies etc.

-PCA in general might be useful even if it does not separate the variables. This was indicated already by Hu (#37): one should test which PCs are best for forecasting (i.e. which contain the most “temperature signal”). I do not believe it would work for tree rings and hemispheric temperature, but in principle it is a working idea.

-And once again: Mannian PCA is not PCA at all. It does not have any of the properties that true PCA has. What Mannian PCA does is project the proxies such that those proxies with the highest difference between the calibration mean and the overall mean get rewarded. The only purpose I can see for it is that it guarantees the overfit in the final phase of the MBH algorithm, even when only a few proxies are used.
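The last point can be illustrated with a toy calculation. This is only a sketch of the centering effect, not the MBH code: the two series and the calibration window are invented. Both series have the same spread about their own full-period mean, but when deviations are taken about the calibration-period mean instead, the series whose mean shifts in the calibration period ends up with a larger sum of squares, and so carries more weight in any variance-maximizing decomposition.

```python
def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_about(xs, center):
    """Sum of squared deviations of xs about an arbitrary center."""
    return sum((x - center) ** 2 for x in xs)

def full_vs_short_centered(series, calib):
    """Return (sum of squares about the full-period mean,
               sum of squares about the calibration-period mean)."""
    full = sum_sq_about(series, mean(series))
    short = sum_sq_about(series, mean(series[calib]))
    return full, short

# Assumed calibration period: the last four of eight "years".
calib = slice(4, 8)
flat = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]     # no shift in the calibration period
shifted = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]  # mean rises in the calibration period

print("flat   ", full_vs_short_centered(flat, calib))     # (2.0, 2.0)
print("shifted", full_vs_short_centered(shifted, calib))  # (2.0, 4.0)
```

The inflation is n times the squared difference between the full-period mean and the calibration mean (here 8 x 0.5^2 = 2), which is why the difference in means is what gets rewarded.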

Principal Components Analysis (PCA) provides a concise overview of a dataset and is usually the first step in any analysis. It is very powerful at recognising patterns in data: outliers, trends, groups etc.

I can see how the data is grouped now. Are the Principal Components recalculated and weighted for graphing?

Also, it seems to me that if Proxies as a group were originally chosen because they provided a good representation for temperature, why would one need to look for a “signal” in amongst the noise?

And how does one know that the signal found is for the required variable, e.g. temp?

Indeed when it is described as multivariate, I can see only two factors, growth anomaly and temperature. I don’t see any data for water, or CO2 etc. being used.

And how does one deal with Bender’s point that the variables are not stationary over time? E.g. trees may be temperature-limited, or they may be water- or CO2-limited, at different times.

Seems to me that this choice of data manipulation merely disguises the fact that the “Proxies” chosen do not correlate well with each other in the first place. So they can’t all be temperature Proxies, if in fact any of them are.

Looking at the **example** application in the linked slide show above, it appears to be a one-dimensional problem.

In summary, PCA seems to be useful when you are looking for patterns in data with multiple variables, not when the whole purpose of the exercise is based on the assumption that the data has only one variable, and you already know it is temperature.
