Proxy Screening by Correlation

Um.

Isn’t that just a somewhat clinical way of saying “shaking the ol’ cherry tree”?

(That’s a very serious matter.)

I’m sorry for posting on the wrong thread, I was pretty wound up. I’m still the new guy and have spent more time reviewing where Steve McIntyre and this group have been. Many of my intended directions have already been covered by previous work, but I am still bothered by one main point.

If the concept of the tree ring proxy is that it correlates well with temperature, what is the rationale for throwing out data? What I mean is (assuming this is temperature) that if there is noise through the period the data represents, all these fancy statistics are doing is choosing an average with a high-end peak. If tree ring data is temperature, why not use all of it? Is there some specific physical process that has been demonstrated which makes some trees bad for their entire multi-century lives?

I plotted the accepted and rejected data today on my blog and found that even though my data was 30-year smoothed, the rejected data had a high-frequency component missing from the accepted data. It’s on my front page.

Re: Hu McCulloch (#47),

are your inter-proxy correlations over the full length of the shorter proxy?

No. All my calculations are in the period 1850-1931 (I was trying to avoid the effect of the controversial trend afterwards). My calculated correlations are between the instrumental record in ‘HAD_NH_reform.txt’ and the proxies in ‘itrdbmatrix.txt’, and between the proxies themselves, using the same common period of 1850-1931 for all. Incidentally, I get 484 acceptable proxies in this period (336 +ve and 148 -ve). Maybe it is only this period? I am running the full 1850-1931 now, but it will take some time. I will post the results I get.

It would be helpful if somebody could confirm this behavior before we get carried away; maybe Jeff Id can help?
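For anyone wanting to reproduce this kind of count, here is a minimal sketch of the screening step, with synthetic data standing in for ‘HAD_NH_reform.txt’ and ‘itrdbmatrix.txt’; the 0.3 threshold and the toy series are illustrative assumptions, not the values used in the actual study:

```python
import numpy as np

def screen_proxies(temp, proxies, r_crit):
    """Correlate each proxy (column) with the instrumental series over a
    common period, then split proxies by sign of 'significant' correlation.
    r_crit is a placeholder acceptance threshold, not Mann et al.'s."""
    t = (temp - temp.mean()) / temp.std()
    p = (proxies - proxies.mean(axis=0)) / proxies.std(axis=0)
    r = (p * t[:, None]).mean(axis=0)       # Pearson r for every proxy at once
    pos = np.where(r >= r_crit)[0]          # "+ve" accepted proxies
    neg = np.where(r <= -r_crit)[0]         # "-ve" accepted proxies
    return r, pos, neg

# toy data: 82 "years" (1850-1931), 10 proxies, the first 3 built to track temperature
rng = np.random.default_rng(0)
temp = np.cumsum(rng.normal(size=82))                    # a drifting target series
proxies = rng.normal(size=(82, 10))
proxies[:, :3] += 0.8 * ((temp - temp.mean()) / temp.std())[:, None]

r, pos, neg = screen_proxies(temp, proxies, r_crit=0.3)
```

The three constructed "signal" proxies should land in the positive-pass set, while most pure-noise proxies fall below the threshold.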

The problem is that I get a (relatively large) number of proxies that are all (significantly) positively correlated with temperature, but at the same time are (significantly) negatively correlated with each other!

This is indeed curious, but perhaps you are looking at these correlations over different time periods? The correlations with temperature must be over the instrumental period, 1850-1995, or perhaps some subset if you’re looking at local temperatures, but are your inter-proxy correlations over the full length of the shorter proxy? If the “significance” is then spurious (e.g. it doesn’t adequately take serial correlation into account, or otherwise uses too many DOF), then one could be randomly positive while the other was randomly negative.

Just a thought.
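The serial-correlation point is easy to illustrate. A common rough adjustment (one of several in the literature; the Quenouille-style formula below is an assumption about which correction one would use, not what any of the authors here did) deflates the sample size to an effective n before testing a correlation between two autocorrelated series:

```python
import numpy as np

def ar1(n, phi, rng):
    """Generate an AR(1) series with lag-1 coefficient phi."""
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal()
    return x

def lag1(x):
    """Sample lag-1 autocorrelation."""
    x = x - x.mean()
    return float((x[:-1] * x[1:]).sum() / (x * x).sum())

def n_effective(x, y):
    """Effective sample size for correlating two AR(1)-like series:
    n_eff = n * (1 - r1*r2) / (1 + r1*r2)."""
    r1, r2 = lag1(x), lag1(y)
    return len(x) * (1 - r1 * r2) / (1 + r1 * r2)

rng = np.random.default_rng(1)
x = ar1(500, 0.8, rng)   # two independent but strongly persistent series
y = ar1(500, 0.8, rng)
neff = n_effective(x, y)
```

With persistence of 0.8 in both series, the effective sample size drops to roughly a fifth of the nominal 500, so a naive 2/sqrt(n) significance cutoff would be far too permissive.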

Re: Hu McCulloch (#44),

Re: K. Hamed (#45),

Jeff Id’s comment #43 over in the Ian Jolliffe thread is off-topic there, but is right on for this thread, so I’m commenting here instead. He wrote,

I just plotted the accepted data on top of the total average. You can clearly see the amplification by Mann’s process on local temperature. I then subtracted the two, and I am absolutely surprised by the result. It was much clearer than I expected. There is a sharp vertical increase in the difference between Mann and the average with a total rise equal to about 1 standard deviation.

You guys really should see this. Wow!

http://noconsensus.wordpress.com/2008/09/08/manns-statistical-amplification-of-local-data/

Just as a check, can Jeff Id calculate the covariance matrix of the accepted **All Positive** proxies? Its entries should all be positive. Maybe there is something wrong with my calculations.

Re: Hu McCulloch (#43),

The problem is that I get a (relatively large) number of proxies that are **all (significantly) positively correlated with temperature**, but at the same time are **(significantly) negatively correlated with each other!** Positive correlation of each of two proxies with temperature implies that an increase (above the mean) in temperature is associated with an increase in **both** proxies, but negative correlation between the same two proxies implies that an increase in one would be associated with a decrease in the other, hence the paradox.

The same thing occurs with some proxies that are all negatively correlated with temperature, but some of which are negatively correlated with each other (they should be positively correlated). Similarly some proxies that have different signs of correlation with temperature are positively correlated with each other (they should be negatively correlated).

I am aware that this can easily be generated in synthetic data by using a suitable covariance matrix. However, for this particular case of temperature proxies, I am under the impression that **the sign of the correlation between any two proxies should be the product of the signs of their correlations with temperature (or zero if they are independent?)**. My question is whether the latter statement is correct in the current situation. If the answer is yes, then either something is wrong with the data as a group or most of the (significant) correlations we see are simply spurious. If the answer is no (with a convincing argument), then my apologies for making this comment.
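On the mathematical question: the sign-product rule is not forced. The only constraint is that the joint correlation matrix be positive definite, and for corr(T,P1) = a and corr(T,P2) = b that only pins corr(P1,P2) to the interval ab ± sqrt((1−a²)(1−b²)); with a = b = 0.3 it can legally be as low as about −0.82. A small sketch of the "suitable covariance matrix" case (the 0.3 and −0.5 values are arbitrary for illustration):

```python
import numpy as np

# Correlation matrix for (T, P1, P2): both proxies correlate +0.3 with
# temperature, yet -0.5 with each other.
C = np.array([[1.0,  0.3,  0.3],
              [0.3,  1.0, -0.5],
              [0.3, -0.5,  1.0]])
eigs = np.linalg.eigvalsh(C)     # all positive => a legal correlation structure

# Draw a large sample and confirm the sample correlations reproduce the signs.
rng = np.random.default_rng(2)
T, P1, P2 = rng.multivariate_normal(np.zeros(3), C, size=5000).T
r = np.corrcoef(np.vstack([T, P1, P2]))
```

So such a pattern is internally consistent; the real question is whether physical temperature proxies *should* behave this way, which is a matter of the signal-to-noise structure rather than of correlation algebra.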

I just plotted the accepted data on top of the total average. You can clearly see the amplification by Mann’s process on local temperature. I then subtracted the two, and I am absolutely surprised by the result. It was much clearer than I expected. There is a sharp vertical increase in the difference between Mann and the average with a total rise equal to about 1 standard deviation.

You guys really should see this. Wow!

http://noconsensus.wordpress.com/2008/09/08/manns-statistical-amplification-of-local-data/

Your graph does a very good job of showing the different implications of the 484 series that passed “screening” versus those that did not. The ones that pass naturally have a good correlation with temperature, and if this is mainly by chance, they will tend to average to the HS shape you find.

The tricky part is how to validly eliminate this data-mining factor, without at the same time rejecting any proxies with a valid temperature signal.
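The data-mining effect itself is easy to demonstrate with proxies that are pure noise by construction; this toy setup (a trend confined to a 100-year “instrumental” window, a 0.2 screening threshold, both arbitrary choices) is a sketch of the mechanism, not of the actual MZHBMRN08 procedure:

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, n_proxies = 600, 2000
cal = slice(500, 600)                          # "instrumental" calibration window

# Target series: flat except for an upward trend in the calibration window.
temp = np.zeros(n_years)
temp[cal] = np.linspace(0.0, 1.0, 100)

# Proxies: pure white noise, i.e. zero genuine temperature signal.
proxies = rng.normal(size=(n_years, n_proxies))

# Screen on calibration-period correlation, keeping r > 0.2.
t = temp[cal] - temp[cal].mean()
p = proxies[cal] - proxies[cal].mean(axis=0)
r = (p * t[:, None]).sum(axis=0) / (
    np.sqrt((p ** 2).sum(axis=0)) * np.sqrt((t ** 2).sum()))
passed = proxies[:, r > 0.2]

# "Reconstruction" = simple average of the accepted proxies.
recon = passed.mean(axis=1)
rise = recon[550:].mean() - recon[500:550].mean()   # uptick inside calibration
flat = abs(recon[:500].mean())                      # pre-calibration stays flat
```

Even though no proxy contains any signal, a few percent pass by chance, and their average reproduces the calibration-period trend while staying flat (and variance-damped) beforehand, which is exactly the HS-from-chance shape described above.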

MZHBMRN08 are taking an empirical approach to the issue of which proxies have a good signal and which are so weak that they just add noise to the results. This is not necessarily invalid, and is in fact in the spirit of the standard MV calibration approach discussed here by UC last year: all proxies are considered, but then the reconstruction is based on a GLS average of the individual reconstructions. In the case of uncorrelated errors, this amounts to weighted LS, with weights that depend on the strength of each correlation with temperature. This gives reduced weight to the weakly correlated series, which is just a more elegant version of what MZHBMRN08 are doing. However, I doubt that Mann et al. appropriately adjusted their final confidence intervals for the effect of the pre-screening.
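A minimal sketch of that weighted-LS calibration average, under the simplifying assumptions that proxy errors are uncorrelated (the uncorrelated case mentioned above; correlated errors would require full GLS) and that every proxy has a usable nonzero calibration slope; the function name and toy data are illustrative:

```python
import numpy as np

def cc_wls_recon(temp_cal, prox_cal, prox_full):
    """Classical-calibration sketch: regress each proxy on temperature over
    the calibration window, invert each fit into a per-proxy temperature
    estimate, then combine the estimates with inverse-variance weights."""
    n, m = prox_cal.shape
    est = np.empty((prox_full.shape[0], m))
    w = np.empty(m)
    t = temp_cal - temp_cal.mean()
    for i in range(m):
        pc = prox_cal[:, i]
        b = (t * (pc - pc.mean())).sum() / (t * t).sum()      # OLS slope
        a = pc.mean() - b * temp_cal.mean()                   # OLS intercept
        s2 = ((pc - (a + b * temp_cal)) ** 2).sum() / (n - 2) # residual variance
        est[:, i] = (prox_full[:, i] - a) / b                 # inverted fit
        w[i] = b * b / s2        # weak proxies (small b, big s2) get tiny weight
    return est @ (w / w.sum())

# toy check: three proxies tracking T with very different noise levels
rng = np.random.default_rng(4)
T = rng.normal(size=200)
P = 2.0 * T[:, None] + rng.normal(size=(200, 3)) * np.array([0.1, 0.5, 2.0])
recon = cc_wls_recon(T[:100], P[:100], P)
r_check = np.corrcoef(recon, T)[0, 1]
```

The down-weighting is automatic: the noisiest proxy contributes almost nothing, so there is no need for a hard accept/reject threshold, which is the "more elegant" alternative to pre-screening. (In real use a near-zero or sign-flipped slope would need explicit handling before inverting the fit.)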

The fact that they only used proxies with positive correlations, when in fact, as Steve shows above, half of the non-Luterbacher series have negative correlations, is another big potential problem.
