What I want is someone who thinks about the subtleties in the method (even though it’s an “old development”) rather than just using it like a technician. There are people at the very, very highest levels of science who push the “I believe” button. Sometimes that is fine and efficient. Other times, you want someone who always wonders what’s under the hood. In the case of PCA, thinking through the methods and assumptions would need someone familiar with both the theoretical and the practical pitfalls.

1 Find a number of different proxy (or alleged proxy) datasets.

2 Make sure you include one (Mann et al Bristlecone), or two (Moberg #1 and #11), that have a long flat line followed by a steep rise in the 20th century.

3 Use whatever statistical technique is available to overweight the specially selected (for HockeyStickness) proxies, and to drown out, as far as possible, the information from the others.

4 Don’t use the standard statistical tests for robustness: either ignore the inconvenient ones or invent new ones.

5 Voilà: you have a peer-reviewed, multi-proxy, statistically robust Hockey Stick.

PS If I had my way, every one of these Hockey Stick papers would have a qualified statistician on board, and would also be reviewed by at least one.

The reason the climatologists don’t include statisticians is that this is almost entirely statistical work, and they would have to credit the statistician accordingly.

The climatologists doing this work are merely recycling other people’s data, using statistical methods that, as SteveM has shown time and again, they have no real understanding of.

1 Why use PCA at all? My understanding is that it is a method for consolidating a large number of data series, but Moberg only has 11.

From Wegman:

Principal Components

Principal Component Analysis (PCA) is a method for reducing the dimension of a high-dimensional data set while preserving most of the information in those data. Dimension is here taken to mean the number of distinct variables (proxies). In the context of paleoclimatology, the proxy variables are the high dimensional data set consisting of several time series that are intended to carry the temperature signal. The proxy data set in general will have a large number of interrelated or correlated variables. Principal component analysis tries to reduce the dimensionality of this data set while also trying to explain the variation present as much as possible. To achieve this, the original set of variables is transformed into a new set of variables, called the principal components (PC) that are uncorrelated and arranged in the order of decreasing “explained variance.” It is hoped that the first several PCs explain most of the variation that was present in the many original variables. The idea is that if most of the variation is explained by the first several principal components, then the remaining principal components may be ignored for all practical purposes and the dimension of the data set is effectively reduced.
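As a rough illustration of the dimension reduction Wegman describes, here is a minimal PCA sketch in Python. The data are synthetic random series, not real proxies, and the 90% variance cutoff is an arbitrary choice for illustration, not anything from the papers under discussion:

```python
import numpy as np

# Toy example: reduce 11 "proxy" series (rows = years) to the few
# leading principal components that explain most of the variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 11))        # 500 "years" x 11 synthetic series

Xc = X - X.mean(axis=0)               # conventional full-period centering
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = s**2 / np.sum(s**2)       # fraction of variance per PC
pcs = Xc @ Vt.T                       # principal component time series

# Keep only the leading PCs needed to explain 90% of the variance
k = int(np.searchsorted(np.cumsum(explained), 0.90)) + 1
X_reduced = pcs[:, :k]
```

With genuinely correlated proxies, k would typically be much smaller than the original 11 columns; with pure noise, as here, little compression is possible.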

2 What is the effect of choosing a limited calibration period with rising temperatures? And any subsequent recentering?

From Wegman again:

Principal component methods are normally structured so that each of the data time series (proxy data series) are centered on their respective means and appropriately scaled. The first principal component attempts to discover the composite series that explains the maximum amount of variance. The second principal component is another composite series that is uncorrelated with the first and that seeks to explain as much of the remaining variance as possible. The third, fourth and so on follow in a similar way. In MBH98/99 the authors make a simple seemingly innocuous and somewhat obscure calibration assumption. **Because the instrumental temperature records are only available for a limited window, they use instrumental temperature data from 1902-1995 to calibrate the proxy data set. This would seem reasonable except for the fact that temperatures were rising during this period. So that centering on this period has the effect of making the mean value for any proxy series exhibiting the same increasing trend to be decentered low. Because the proxy series exhibiting the rising trend are decentered, their calculated variance will be larger than their normal variance when calculated based on centered data, and hence they will tend to be selected preferentially as the first principal component.** (In fact the effect of this can clearly be seen in RPC no. 1 in Figure 5 in MBH98.). Thus, in effect, any proxy series that exhibits a rising trend in the calibration period will be preferentially added to the first principal component…. The net effect of the decentering is to preferentially choose the so-called hockey stick shapes.
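The decentering effect in the quoted passage can be sketched numerically. This is a toy construction, not MBH’s actual algorithm: ten trendless noise series plus one series with a long flat handle and a late rise, compared under full-period centering versus centering on only the last 80 “years” (a stand-in for a late calibration window):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 580
noise = rng.normal(size=(n, 10))                     # 10 trendless "proxies"
stick = np.concatenate([np.zeros(n - 80), np.linspace(0.0, 3.0, 80)])
X = np.column_stack([noise, stick + rng.normal(scale=0.3, size=n)])

def pc1_loadings(X, center):
    """Absolute loadings of each series on the first principal component,
    after subtracting the supplied per-series center."""
    Xc = X - center
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return np.abs(Vt[0])

full = pc1_loadings(X, X.mean(axis=0))               # full-period centering
short = pc1_loadings(X, X[-80:].mean(axis=0))        # "calibration window" centering

# Centering on the rising window leaves the hockey-stick series (last column)
# far from its window mean over the whole flat handle, inflating its apparent
# variance, so its PC1 loading jumps relative to full centering.
print(f"PC1 loading on hockey-stick series: full={full[-1]:.2f}, short={short[-1]:.2f}")
```

The mechanism is exactly the one Wegman states: a series centered on a late, elevated window sits well below zero over its whole flat portion, so its sum of squares is much larger than under full centering, and PC1 preferentially aligns with it.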

Also, it would help to know who you’re up against if you want to go one (or more) better.

Will read the article on Monday to get the full context (e.g. to see how that sentence ends, among other things).

But for the moment,

“scaling its variance and adjusting its mean value so that these become identical to those in the instrumental record”

means nothing more than that: they rescaled the mean and rescaled the variance of some reconstruction vector to match that of the NH temperature record.

You disagree with rescaling for some reason? If t is temperature and p is proxy, then linear sensitivity of p to t would yield:

p=a*t+b (response function)

or

t’=(p-b)/a (restated as calibration function)

Rescaling just means figuring out values for a (which controls variance of the series) and b (controls mean). Rescaling has no effect on the shape (autocorrelation structure, information content, whatever you want to call it) of the reconstituted t’.
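A quick numerical sketch of that point, with synthetic series. Here a and b are chosen by moment matching (matching mean and standard deviation), which is one plausible reading of the quoted rescaling step; the key observation is that an affine rescaling cannot change the shape of the series:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.cumsum(rng.normal(size=200))                  # stand-in "instrumental" record
p = 0.5 * t + 2.0 + rng.normal(scale=0.1, size=200)  # proxy responding linearly to t

# t' = (p - b) / a, with a and b chosen so t' matches t's mean and std:
# a controls the variance of the rescaled series, b controls its mean.
a = p.std() / t.std()
b = p.mean() - a * t.mean()
t_prime = (p - b) / a

print(np.isclose(t_prime.mean(), t.mean()))   # mean matches the target record
print(np.isclose(t_prime.std(), t.std()))     # so does the standard deviation
print(np.corrcoef(p, t_prime)[0, 1])          # shape unchanged: correlation is 1
```

Since t' is an affine function of p with a > 0, the correlation between them is exactly 1: the rescaling moves and stretches the series but leaves its autocorrelation structure and information content alone, which is the point made above.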

Maybe I’m missing something? Like I said, I need to read a few papers.

…

P.S. If you have a very specific question in mind (e.g. one dependent on the context provided by some paper), but phrase it in vague or general terms, then you’re not going to get the answer you’re seeking. If original point #14 had made reference to the paper, I would have known there was context to the remark. As with TCO on the issue of how many PCs to interpret, you were after something very specific but didn’t provide the necessary context.

Not complaining. Just pointing out that that’s why it takes me two or three tries to understand what it is you’re *really* after.

Supplementary methods

Isn’t calibrating to strongly rising temperatures the same flaw as in Mann et al?

Also, I don’t see any mention of RE and r2 results.

Surely what we have here is more spurious correlation and spurious calibration?