Proxy Screening by Correlation

Um.

Isn’t that just a somewhat clinical way of saying “shaking the ol’ cherry tree”?

(That’s a very serious matter.)

I’m sorry for posting on the wrong thread, I was pretty wound up. I’m still the new guy and have spent more time reviewing where Steve McIntyre and this group have been. Many of my intended directions have already been covered by previous work, but I am still bothered by one main point.

If the concept of the tree ring proxy is that it correlates well with temperature, what is the rationale for throwing out data? What I mean is (assuming this is temperature) that if there is noise through the period the data represents, all these fancy statistics are doing is choosing an average with a high-end peak. If tree ring data is temperature, why not use all of it? Is there some specific physical process that has been demonstrated which makes some trees bad for their entire multi-century lives?

I plotted the accepted and rejected data today on my blog and found that even though my data was 30-year smoothed, the rejected data had a high-frequency component missing from the accepted data. It’s on my front page.

Re: Hu McCulloch (#47),

are your inter-proxy correlations over the full length of the shorter proxy?

No. All my calculations are in the period 1850-1931 (I was trying to avoid the effect of the controversial trend afterwards). My calculated correlations are between the instrumental record in ‘HAD_NH_reform.txt’ and the proxies in ‘itrdbmatrix.txt’, and between the proxies themselves, using the same common period of 1850-1931 for all. Incidentally, I get 484 acceptable proxies in this period (336 +ve and 148 -ve). Maybe it is only this period? I am running the full 1850-1931 now, but it will take some time. I will post the results I get.

It would be helpful if somebody could confirm this behavior before we get carried away; maybe Jeff Id can help?
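For anyone wanting to reproduce this kind of count, here is a minimal sketch of the screening step, with synthetic data standing in for ‘HAD_NH_reform.txt’ and ‘itrdbmatrix.txt’; the 0.3 threshold and the toy series are illustrative assumptions, not the values used in the actual study:

```python
import numpy as np

def screen_proxies(temp, proxies, r_crit):
    """Correlate each proxy (column) with the instrumental series over a
    common period, then split proxies by sign of 'significant' correlation.
    r_crit is a placeholder acceptance threshold, not Mann et al.'s."""
    t = (temp - temp.mean()) / temp.std()
    p = (proxies - proxies.mean(axis=0)) / proxies.std(axis=0)
    r = (p * t[:, None]).mean(axis=0)       # Pearson r for every proxy at once
    pos = np.where(r >= r_crit)[0]          # "+ve" accepted proxies
    neg = np.where(r <= -r_crit)[0]         # "-ve" accepted proxies
    return r, pos, neg

# toy data: 82 "years" (1850-1931), 10 proxies, the first 3 built to track temperature
rng = np.random.default_rng(0)
temp = np.cumsum(rng.normal(size=82))                    # a drifting target series
proxies = rng.normal(size=(82, 10))
proxies[:, :3] += 0.8 * ((temp - temp.mean()) / temp.std())[:, None]

r, pos, neg = screen_proxies(temp, proxies, r_crit=0.3)
```

The three constructed "signal" proxies should land in the positive-pass set, while most pure-noise proxies fall below the threshold.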

The problem is that I get a (relatively large) number of proxies that are all (significantly) positively correlated with temperature, but at the same time are (significantly) negatively correlated with each other!

This is indeed curious, but perhaps you are looking at these correlations over different time periods? The correlations with temperature must be over the instrumental period, 1850-1995, or perhaps some subset if you’re looking at local temperatures, but are your inter-proxy correlations over the full length of the shorter proxy? If the “significance” is then spurious (e.g. it doesn’t adequately take serial correlation into account, or otherwise uses too many DOF), then one could be randomly positive while the other was randomly negative.

Just a thought.
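The serial-correlation point is easy to illustrate. A common rough adjustment (one of several in the literature; the Quenouille-style formula below is an assumption about which correction one would use, not what any of the authors here did) deflates the sample size to an effective n before testing a correlation between two autocorrelated series:

```python
import numpy as np

def ar1(n, phi, rng):
    """Generate an AR(1) series with lag-1 coefficient phi."""
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal()
    return x

def lag1(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return float((x[:-1] * x[1:]).sum() / (x * x).sum())

def n_effective(x, y):
    """Effective sample size for correlating two AR(1)-like series:
    n_eff = n * (1 - r1*r2) / (1 + r1*r2)."""
    r1, r2 = lag1(x), lag1(y)
    return len(x) * (1 - r1 * r2) / (1 + r1 * r2)

rng = np.random.default_rng(1)
x = ar1(500, 0.8, rng)   # two independent but strongly persistent series
y = ar1(500, 0.8, rng)
neff = n_effective(x, y)
```

With persistence of 0.8 in both series, the effective sample size drops to roughly a fifth of the nominal 500, so a naive 2/sqrt(n) significance cutoff would be far too permissive.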

Re: Hu McCulloch (#44),

Re: K. Hamed (#45),

Jeff Id’s comment #43 over in the Ian Jolliffe thread is off-topic there, but is right on for this thread, so I’m commenting here instead. He wrote,

I just plotted the accepted data on top of the total average. You can clearly see the amplification by Mann’s process on local temperature. I then subtracted the two, and I am absolutely surprised by the result. It was much clearer than I expected. There is a sharp vertical increase in the difference between Mann and the average with a total rise equal to about 1 standard deviation.

You guys really should see this. Wow!

http://noconsensus.wordpress.com/2008/09/08/manns-statistical-amplification-of-local-data/

Just as a check, can Jeff Id calculate the covariance matrix of the accepted **All Positive** proxies? Its entries should all be positive. Maybe there is something wrong with my calculations.

Re: Hu McCulloch (#43),

The problem is that I get a (relatively large) number of proxies that are **all (significantly) positively correlated with temperature**, but at the same time are **(significantly) negatively correlated with each other!** Positive correlation of each of two proxies with temperature implies that an increase (above the mean) in temperature is associated with an increase in **both** proxies, but negative correlation between the same two proxies implies that an increase in one would be associated with a decrease in the other, hence the paradox.

The same thing occurs with some proxies that are all negatively correlated with temperature, but some of which are negatively correlated with each other (they should be positively correlated). Similarly some proxies that have different signs of correlation with temperature are positively correlated with each other (they should be negatively correlated).

I am aware that this can easily be generated in synthetic data by using a suitable covariance matrix. However, for this particular case of temperature proxies, I am under the impression that **the sign of the correlation between any two proxies should be the product of the signs of their correlations with temperature (or zero if they are independent?)**. My question is whether the latter statement is correct in the current situation. If the answer is yes, then either something is wrong with the data as a group or most of the (significant) correlations we see are simply spurious. If the answer is no (with a convincing argument), then my apologies for making this comment.
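On the mathematical question: the sign-product rule is not forced. The only constraint is that the joint correlation matrix be positive definite, and for corr(T,P1) = a and corr(T,P2) = b that only pins corr(P1,P2) to the interval ab ± sqrt((1−a²)(1−b²)); with a = b = 0.3 it can legally be as low as about −0.82. A small sketch of the "suitable covariance matrix" case (the 0.3 and −0.5 values are arbitrary for illustration):

```python
import numpy as np

# Correlation matrix for (T, P1, P2): both proxies correlate +0.3 with
# temperature, yet -0.5 with each other.
C = np.array([[1.0,  0.3,  0.3],
              [0.3,  1.0, -0.5],
              [0.3, -0.5,  1.0]])
eigs = np.linalg.eigvalsh(C)     # all positive => a legal correlation structure

# Draw a large sample and confirm the sample correlations reproduce the signs.
rng = np.random.default_rng(2)
T, P1, P2 = rng.multivariate_normal(np.zeros(3), C, size=5000).T
r = np.corrcoef(np.vstack([T, P1, P2]))
```

So such a pattern is internally consistent; the real question is whether physical temperature proxies *should* behave this way, which is a matter of the signal-to-noise structure rather than of correlation algebra.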

I just plotted the accepted data on top of the total average. You can clearly see the amplification by Mann’s process on local temperature. I then subtracted the two, and I am absolutely surprised by the result. It was much clearer than I expected. There is a sharp vertical increase in the difference between Mann and the average with a total rise equal to about 1 standard deviation.

You guys really should see this. Wow!

http://noconsensus.wordpress.com/2008/09/08/manns-statistical-amplification-of-local-data/

Your graph does a very good job of showing the different implications of the 484 series that passed “screening” versus those that did not. The ones that pass naturally have a good correlation with temperature, and if this is mainly by chance, they will tend to average to the HS shape you find.

The tricky part is how to validly eliminate this data-mining factor, without at the same time rejecting any proxies with a valid temperature signal.
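The data-mining effect itself is easy to demonstrate with proxies that are pure noise by construction; this toy setup (a trend confined to a 100-year “instrumental” window, a 0.2 screening threshold, both arbitrary choices) is a sketch of the mechanism, not of the actual MZHBMRN08 procedure:

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, n_proxies = 600, 2000
cal = slice(500, 600)                          # "instrumental" calibration window

# Target series: flat except for an upward trend in the calibration window.
temp = np.zeros(n_years)
temp[cal] = np.linspace(0.0, 1.0, 100)

# Proxies: pure white noise, i.e. zero genuine temperature signal.
proxies = rng.normal(size=(n_years, n_proxies))

# Screen on calibration-period correlation, keeping r > 0.2.
t = temp[cal] - temp[cal].mean()
p = proxies[cal] - proxies[cal].mean(axis=0)
r = (p * t[:, None]).sum(axis=0) / (
    np.sqrt((p ** 2).sum(axis=0)) * np.sqrt((t ** 2).sum()))
passed = proxies[:, r > 0.2]

# "Reconstruction" = simple average of the accepted proxies.
recon = passed.mean(axis=1)
rise = recon[550:].mean() - recon[500:550].mean()   # uptick inside calibration
flat = abs(recon[:500].mean())                      # pre-calibration stays flat
```

Even though no proxy contains any signal, a few percent pass by chance, and their average reproduces the calibration-period trend while staying flat (and variance-damped) beforehand, which is exactly the HS-from-chance shape described above.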

MZHBMRN08 are taking an empirical approach to the issue of which proxies have a good signal and which are so weak that they just add noise to the results. This is not necessarily invalid, and is in fact in the spirit of the standard MV calibration approach discussed here by UC last year: all proxies are considered, but then the reconstruction is based on a GLS average of the individual reconstructions. In the case of uncorrelated errors, this amounts to weighted LS, with weights that depend on the strength of each correlation with temperature. This gives reduced weight to the weakly correlated series, which is just a more elegant version of what MZHBMRN08 are doing. However, I doubt that Mann et al. appropriately adjusted their final confidence intervals for the effect of the pre-screening.
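A minimal sketch of that weighted-LS calibration average, under the simplifying assumptions that proxy errors are uncorrelated (the uncorrelated case mentioned above; correlated errors would require full GLS) and that every proxy has a usable nonzero calibration slope; the function name and toy data are illustrative:

```python
import numpy as np

def cc_wls_recon(temp_cal, prox_cal, prox_full):
    """Classical-calibration sketch: regress each proxy on temperature over
    the calibration window, invert each fit into a per-proxy temperature
    estimate, then combine the estimates with inverse-variance weights."""
    n, m = prox_cal.shape
    est = np.empty((prox_full.shape[0], m))
    w = np.empty(m)
    t = temp_cal - temp_cal.mean()
    for i in range(m):
        pc = prox_cal[:, i]
        b = (t * (pc - pc.mean())).sum() / (t * t).sum()      # OLS slope
        a = pc.mean() - b * temp_cal.mean()                   # OLS intercept
        s2 = ((pc - (a + b * temp_cal)) ** 2).sum() / (n - 2) # residual variance
        est[:, i] = (prox_full[:, i] - a) / b                 # inverted fit
        w[i] = b * b / s2        # weak proxies (small b, big s2) get tiny weight
    return est @ (w / w.sum())

# toy check: three proxies tracking T with very different noise levels
rng = np.random.default_rng(4)
T = rng.normal(size=200)
P = 2.0 * T[:, None] + rng.normal(size=(200, 3)) * np.array([0.1, 0.5, 2.0])
recon = cc_wls_recon(T[:100], P[:100], P)
r_check = np.corrcoef(recon, T)[0, 1]
```

The down-weighting is automatic: the noisiest proxy contributes almost nothing, so there is no need for a hard accept/reject threshold, which is the "more elegant" alternative to pre-screening. (In real use a near-zero or sign-flipped slope would need explicit handling before inverting the fit.)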

The fact that they only used proxies with positive correlations, when in fact, as Steve shows above, half of the non-Luterbacher series have negative correlations, is another big potential problem.
