Proxy Screening by Correlation

I’ve made histograms of the proxy correlations for 1850-1995, as reported in r1209.xls (which contains results for all proxies, unlike SI SD1.xls, which withholds results below a benchmark). The breaks are in 0.1 intervals. On the left is the histogram before screening; on the right, a histogram of the 484 proxies after screening.

Clearly a great deal of analysis could be done on this topic. I’ll just scratch the surface on this, as I’m going to be away for a couple of days, but felt that the issue warranted being on the table right away.

The first couple of things that struck me about the pre-screening distribution were –
1. there was an odd tri-modality to the distribution, with a bulge off to the right with very high correlations;
2. the distribution was surprisingly symmetric other than the right-hand bulge, but was “spread” out more than i.i.d. normal distributions;
3. Mannian screening was, for the most part, one-sided, although high negative values were retained.

A little inspection showed that the right-hand bulge of very high correlations arose entirely from the Luterbacher gridded series, which, as I understand it (and I haven’t reviewed the Luterbacher data), contains instrumental information in the calibration period and is not a “proxy” in the sense of tree rings or ice cores. So when Mann says that 484 series passing a benchmark is evidence of “significance”, this inflates the perceived merit of tree ring and ice core data since the 71 Luterbacher series make a non-negligible contribution. Removing the Luterbacher series, one gets the more symmetric distribution shown below:

Next, the bimodality of this distribution calls for a little explanation. The vast majority of the proxies in this figure are tree rings, so we’re back to tree rings. It’s possible that this bimodality is a real effect, i.e. that some chronologies respond negatively to temperature and others positively. But it’s equally possible that a form of pre-screening has already taken place in collating the network, with very “noisy” chronologies being excluded from even the pre-screened network. It would take some careful analysis of the tree ring networks to pin this down. Selection bias seems more likely to me than actual bimodality, but that’s just a guess right now.

Next, the correlations are more spread out than one would expect from i.i.d. normal distributions, where Mann’s SI states that 90% of the proxies would be within -0.1 to 0.1 correlations. Given that there are almost as many negative as positive correlations, the relatively symmetric distribution suggests to me that autocorrelation effects are wildly under-estimated in his benchmark and that the true 90% benchmark is much higher than 0.1. It’s not nearly as clear as Mann makes out that the yield of 484 proxies (less 71 Luterbacher) is as significant as all that.
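A rough Monte Carlo sketch of this point, under assumptions that are purely illustrative: both series are independent AR(1) noise with a lag-one coefficient of 0.7 (a made-up value, not one estimated from the proxies or from CRU), and the function names are hypothetical. It compares the 90th percentile of the null correlation for white noise and for red noise over a 146-year window.

```python
import math
import random


def ar1_series(n, phi, rng):
    """Return n values of a stationary AR(1) process with coefficient phi."""
    # Draw the first value from the stationary distribution (no spin-up needed).
    x = rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi * phi)
    values = []
    for _ in range(n):
        values.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return values


def pearson_r(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def null_q90(n, phi, trials, seed):
    """90th percentile of r between two independent AR(1) series."""
    rng = random.Random(seed)
    rs = sorted(
        pearson_r(ar1_series(n, phi, rng), ar1_series(n, phi, rng))
        for _ in range(trials)
    )
    return rs[int(0.90 * trials)]


N_YEARS = 146  # 1850-1995 inclusive
PHI = 0.7      # assumed persistence, for illustration only

iid_q90 = null_q90(N_YEARS, 0.0, 1000, seed=1)
red_q90 = null_q90(N_YEARS, PHI, 1000, seed=1)
print(f"90th percentile of r, white noise: {iid_q90:.3f}")
print(f"90th percentile of r, AR(1) noise: {red_q90:.3f}")
```

With white noise the percentile should land close to the 0.1 benchmark; with persistent noise it should come out noticeably higher, which is the sense in which a 0.1 threshold would overstate significance.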

This particular operation looks more and more like ex-post cherry picking from red noise (along the lines of discussions long ago by both David Stockwell and myself). This is a low-tech way of generating hockey sticks, not quite as glamorous as Mannian principal components, but it “works” almost as well.
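The mechanism can be sketched with synthetic data. This is not Mann's procedure, only a toy version of screening: the network size, series length, AR(1) coefficient, linear "temperature" ramp and 0.1 threshold below are all assumed values chosen for illustration.

```python
import math
import random


def ar1_series(n, phi, rng):
    """Return n values of a stationary AR(1) process with coefficient phi."""
    x = rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi * phi)
    values = []
    for _ in range(n):
        values.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return values


def pearson_r(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mean(values):
    return sum(values) / len(values)


def composite(series_list):
    """Unweighted year-by-year average of equal-length series."""
    return [mean(column) for column in zip(*series_list)]


def blade(comp, calib_len, tail_len=20):
    """Mean of the last tail_len values minus the pre-calibration mean."""
    return mean(comp[-tail_len:]) - mean(comp[:-calib_len])


rng = random.Random(2008)
N_SERIES, LENGTH, CALIB = 500, 300, 100  # hypothetical network dimensions
THRESHOLD = 0.1                          # illustrative screening benchmark
# Stand-in for a rising instrumental record over the calibration window.
target = [i / (CALIB - 1) for i in range(CALIB)]

# Pure red noise: no series contains any temperature signal at all.
network = [ar1_series(LENGTH, 0.7, rng) for _ in range(N_SERIES)]
kept = [s for s in network if pearson_r(s[-CALIB:], target) > THRESHOLD]

blade_all = blade(composite(network), CALIB)
blade_kept = blade(composite(kept), CALIB)
print(f"kept {len(kept)} of {N_SERIES} series")
print(f"blade, all series:      {blade_all:+.3f}")
print(f"blade, screened series: {blade_kept:+.3f}")
```

The unscreened average should stay near zero throughout, while the screened average should turn upward in the calibration window and remain flat before it, even though no series carries a signal.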

It’s pretty discouraging that Gerry North and Gaby Hegerl were unequal to the very slight challenge of identifying this problem in review.

This entry was written by Stephen McIntyre, posted on Sep 5, 2008 at 9:00 PM, filed under Mann et al 2008 and tagged correlation, luerbacher, Mann et al 2008, mann_2008, screening.

52 Comments

David Stockwell

Posted Sep 5, 2008 at 10:48 PM | Permalink

The bulge to the right would be because it is a bounded distribution, I would think, more of a beta distribution.

Steve: I manually inspected the high-correlation proxies and they were all Luterbacher instrumental-influenced data sets. So it’s really a “mixed” distribution, with the Luterbacher instrumental-based series having very high correlations to CRU. Both distributions are bounded and beta is a better image than normal, but the point that I was trying to emphasize was that the distribution is a mixture and the Luterbacher data should be excluded in calculations purporting to assess true “proxies”.
John A

Posted Sep 6, 2008 at 12:11 AM | Permalink

It’s pretty discouraging that Gerry North and Gaby Hegerl were unequal to the very slight challenge of identifying this problem in review.

Given Gerry North’s previous due diligence efforts, I’m surprised that you’re surprised.

Steve: I intended a little more irony in my remark. I was a little tired yesterday and missed the mot juste.
- Demesure
  
  Posted Sep 6, 2008 at 5:20 AM | Permalink
  
  Re: John A (#2), “undue diligence” is a more appropriate word.
Pat Frank

Posted Sep 6, 2008 at 1:22 AM | Permalink

I have it on very good authority that Gerry North has won awards for his reviewing. He must have had an off day.
jeff id

Posted Sep 6, 2008 at 3:07 AM | Permalink

I’m still working on the negative hockey stick. We’ll see how it goes.

I am not as versed in statistics as many of you, but it seems to me that the 0.1 threshold represents a 10% limit on comparison of truly random data. Smoothing the data can shift that probability significantly.
- K. Hamed
  
  Posted Sep 6, 2008 at 4:11 AM | Permalink
  
  Re: jeff id (#4),
  
  Even before smoothing, if the temperature and the proxy are both autocorrelated (including long-term persistence), the variance of the cross-correlation estimator is inflated. If for example (using simulation) both series are fractional Gaussian Noise with H = 0.9, the variance may be inflated by a factor of around 3 (for n = 145 observations). For two AR(1) series each with r(1) = 0.7, the variance inflation would be around 2.7. I have also similar results for the distribution-free Kendall tau.
  - John Tofflemire
    
    Posted Sep 6, 2008 at 3:29 PM | Permalink
    
    Re: K. Hamed (#5),
    K. Hamed,
    
    Are you saying that the significance of the observed correlations is lower than it appears to be in the pre-screen histograms? It’s well known in economic time series analysis that two highly correlated variables that are also AR(1) autocorrelated can produce a high r2 when regressing one variable on the other but, when correcting for the autocorrelation, can result in extremely low r2. Is this somewhat the same idea? Thanks.
    - K. Hamed
      
      Posted Sep 6, 2008 at 6:42 PM | Permalink
      
      Re: John Tofflemire (#15),
      
      Yes, that’s what I meant. The actual distribution of r under the null of zero correlation is wider (larger variance) when both series are autocorrelated than it is for iid data. A correlation that seems highly significant assuming iid may actually be well within the 90% CI of the actual (wider) curve.
Jeff Id

Posted Sep 6, 2008 at 5:09 AM | Permalink

K. Hamed.

Wow! Nice.
Craig Loehle

Posted Sep 6, 2008 at 8:04 AM | Permalink

Steve: are these correlations wrt the nearest CRU grid or the global series?
Bob B

Posted Sep 6, 2008 at 9:12 AM | Permalink

http://wmbriggs.com/blog/2008/09/06/do-not-smooth-times-series-you-hockey-puck/
Jeff Id

Posted Sep 6, 2008 at 10:29 AM | Permalink

Bob, #9

I liked your link but couldn’t comment for some reason.
- Tony Edwards
  
  Posted Sep 6, 2008 at 3:28 PM | Permalink
  
  Re: Jeff Id (#10),
  
  Maybe if you enter through the main site
  
  http://wmbriggs.com/blog/
  
  and allow cookies, you will be able to comment.
  If you haven’t been before, it’s well worth while making a regular visit.
  Mr(?) Briggs is a very knowledgeable and entertaining writer on statistical matters.
Alan S. Blue

Posted Sep 6, 2008 at 1:09 PM | Permalink

How good is the overlap between the list of temperature indicators used in Loehle’s paper and this paper?
popoff

Posted Sep 6, 2008 at 1:42 PM | Permalink

sorry because of my ignorance, but I don’t understand the meaning of

before-screening and after-screening
Kenneth Fritsch

Posted Sep 6, 2008 at 2:18 PM | Permalink

I must assume that we all are viewing the histograms with an x axis that is in 0.1 bins of a p value for the correlations and not r values of the correlations — or am I assuming wrong. Mann used p values for screening and r values.
Jeff Id

Posted Sep 6, 2008 at 3:35 PM | Permalink

I have just completed a summation of all the Mann data. I scaled it to SD units and have averaged it with no weighting on my blog.

I think you will find this pretty interesting. I do need some guidance though from those of you who have more experience with the individual datasets. Any comments would be appreciated.

A Summation of the Mann Data on Global Warming
Jeff Id

Posted Sep 6, 2008 at 5:33 PM | Permalink

I plotted the averaged Luterbacher data the same way, using only data that passed one of the 3 significance tests specified in the main table.

http://noconsensus.wordpress.com

You can see where the correlation came from.
Henry

Posted Sep 6, 2008 at 6:11 PM | Permalink

I may have misunderstood, but the process seems to be:

(a) choose those series which correlate with recent temperature rises

(b) take some linear combination of them

with the result that they show an increase in the 20th century (the blade) and suggest something flatter before then (the shaft). There is no surprise there: the blade is inevitable from the process and the shaft could reasonably result from a random collection of series which average to around flat in the unconstrained period.

It would be interesting to see what this did if the same method was used but with the period used to select series shortened (say from 1960) to see what it suggested for the first half of the 20th century. Then do it again (say up to 1960) to see what it suggested for recent decades.
- mugwump
  
  Posted Sep 6, 2008 at 6:48 PM | Permalink
  
  Re: Henry (#18),
  
  It would be interesting to see what this did if the same method was used but with the period used to select series shortened (say from 1960) to see what it suggested for the first half of the 20th century. Then do it again (say up to 1960) to see what it suggested for recent decades.
  
  Even more interesting would be to use the same method but with the mirror image of the instrumental record (eg, reflect the record about the 1860 T value). Then the instrumental record would have a downward trend. Mann’s method would select for proxies that tracked that downward trend, and would extrapolate them back in the past. If Mann’s method is valid it should find no signal from such a reflected record. But I suspect it would show that the 20th century was the coldest for the last 1000 years…
Jeff Id

Posted Sep 6, 2008 at 7:10 PM | Permalink

Henry,

I’m working on reflecting and crudely correlating the temperature image. I should have something crude tomorrow.

As far as real discovery of true temperature variation, I have looked at the Mann data a dozen different ways (I can sort it, turn on and off different series and look at it pretty quickly now) and the warming period from about 200 to 800 AD really stands out even in the rejected data. I think we could correlate a set of hypothetical curves which would show a high degree of correlation and reconstruct a very high temperature from the oldest data in this paper. I think that the existence in the rejected data of this signal, which matches the Mann paper, is evidence that it shouldn’t have been rejected in the first place.

There seems to be an upslope in most of the averages I have done in present day temps. So perhaps there is some signal in there, or maybe they picked the right data.

I need to figure the correct way to scale these graphs so I can work in temperature units. Any suggestions which could speed this up?
deadwood

Posted Sep 6, 2008 at 9:41 PM | Permalink

Jeff@21:

“warming period from about 200 to 800 AD really stands out”

Does this sound right to you? It is my understanding that this period of history was cold.
- DaleC
  
  Posted Sep 7, 2008 at 4:24 AM | Permalink
  
  Re: deadwood (#22),
  
  I’m still looking at the entire data set so I cannot comment on the 484 finally retained series, but a plot of all 1209 scaled to between -1 and 1 certainly shows it warmer from 200 to 900, when it is generally thought to have been colder, following the Roman warm period. Note however that the number of proxies then is much much lower. To make anything of this would surely require detailed examination of which proxies are giving the increase, how many, and where they are.
  - Craig Loehle
    
    Posted Sep 8, 2008 at 8:01 AM | Permalink
    
    Re: DaleC (#24), One thing to watch out for is that not all the series cover the entire period. This means that when you compute anomalies, short series will be wrong. For example, let’s say series X covers only the MWP, say 800 to 1200. It is warmer at this site than at other times, but the data for other times is missing. If you take an anomaly, say against its own mean, you now lower all X temp values close to zero anomaly. There are ways people get around this by doing piecewise reconstructions, but it is unclear to me how they do it or if it is right.
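A toy illustration of this pitfall, with made-up numbers: one hypothetical record spans years 1 to 2000, another covers only 800 to 1200, and both sites are assumed to be 0.5 units warmer during 800-1200. Taking each record's anomaly against its own mean erases the warmth in the short record.

```python
def own_mean_anomaly(record):
    """Subtract a record's own mean from each of its values."""
    baseline = sum(record.values()) / len(record)
    return {year: value - baseline for year, value in record.items()}


def true_temp(year):
    """Toy site temperature: 0.5 units warmer during 800-1200, otherwise 0."""
    return 0.5 if 800 <= year <= 1200 else 0.0


# Hypothetical records: one spans years 1-2000, the other only 800-1200.
long_record = {year: true_temp(year) for year in range(1, 2001)}
short_record = {year: true_temp(year) for year in range(800, 1201)}

long_anom = own_mean_anomaly(long_record)
short_anom = own_mean_anomaly(short_record)

# Average the two anomaly series in a year where both exist.
composite_1000 = (long_anom[1000] + short_anom[1000]) / 2
print(f"long-record anomaly in year 1000:  {long_anom[1000]:.3f}")
print(f"short-record anomaly in year 1000: {short_anom[1000]:.3f}")
print(f"two-site average in year 1000:     {composite_1000:.3f}")
```

The short record contributes a zero anomaly in every year it covers, so averaging it in pulls the composite below what either site actually experienced.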
Hu McCulloch

Posted Sep 6, 2008 at 9:42 PM | Permalink

RE Bob B, #9, William Briggs makes a very important point on the webpage Bob links in his comment: If you smooth a series and then do statistics with it, you’re surely kidding yourself, since the true independent sample size is much smaller than the nominal sample size.

Mann et al report (SI p. 5) that “To avoid aliasing bias, records with only decadal resolution were first interpolated to annual resolution, and then low-pass filtered to retain frequencies f [less than] 0.05 cycle/yr (the Nyquist frequency for decadal sampling.) …. We assumed n = 144 nominal degrees of freedom [for correlations] over the 1850-1995 (146 year) interval for correlations between annually resolved records … and n = 13 degrees of freedom for decadal resolution records. The corresponding one-sided p=0.10 significance thresholds are |r| = 0.11 and |r| = 0.34 respectively.”

The Punta Laguna examples selected for auditing by Steve would apparently be considered decadal, since the spacing is never less than about 8 years. They would then be interpolated to annual frequency, itself a smoothing operation, as shown in the graphs in the recent CA threads, and then further massaged with the unspecified (Butterworth?) low-pass filter, removing cycles shorter than approximately 1/.05 = 20 years. These were then correlated with local instrumental temperature, at an annual frequency, apparently, yielding r = .397 for #382 and .627 for #383 over the full 146 year calibration period, according to the spread sheet on Mann’s website. Since there are only about 15 decades in 146 years, they “conservatively” used a critical value of .34 based on only 13 DOF rather than 144 DOF as would be appropriate for serially independent errors. Since these two exceeded this value, they were apparently included, while the other two Punta Laguna series, which fell short, were not used for the reconstruction.

Although it is commendable that Mann et al did not use 144 DOF, 13 DOF is still way too many for these two series. #382 has only 9 true observations, while 383 has only 10. With 9 observations, there are only 7 DOF, and the .10 1-tailed critical value of r is .472, not .34 (actually .351 by my calculation, but close enough) as for 13 DOF per Mann et al. Since .397 falls short of this, there is no way #382 is significant, even at this feeble “significance” level, corresponding to a t-stat of about 1.28.
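These critical values can be checked with the standard relation r = t / sqrt(t² + DOF), where t is the one-tailed critical value of Student's t. The sketch below is only a check of that arithmetic: it uses the Python standard library, integrates the t density numerically with Simpson's rule, and the function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    norm = math.exp(log_norm) / math.sqrt(df * math.pi)
    return norm * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on the density over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def critical_r(df, alpha=0.10):
    """One-tailed critical correlation: r = t / sqrt(t^2 + df)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection on the upper-tail probability
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    t_crit = (lo + hi) / 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


for df in (144, 13, 7, 5):
    print(f"DOF = {df:3d}: one-tailed 10% critical r = {critical_r(df):.3f}")
```

For 13 and 7 DOF this should reproduce, to rounding, the .351 and .472 quoted above; the 5 DOF row corresponds to the smaller effective sample size raised in the next paragraph.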

The reported r for #383 does exceed .472, but in order for the test to be valid, the 10 actual observations should have been correlated directly with the corresponding instrumental temperatures, per statistician Briggs. Since the 10 real observations were massaged by first interpolating and then smoothing with a filter that damps cycles under 20 years in duration, it is hard to say what the true effective sample size is — perhaps as small as 146/20 = 7, for 5 DOF!

So it looks as if Mann et al may have admitted far too many of the “decadal” series, even by their very weak criterion.

PS: Note that although the Pyrgophorus coronatus of #382 is a gastropod or snail, Cytheridella ilosvayi of #383 is a tiny crustacean or “seed shrimp”, and not a mollusk at all.
Joel McDade

Posted Sep 7, 2008 at 7:01 PM | Permalink

Is this NCDC.NOAA description of the NAO recon the same Luterbacher we are discussing here?

TIA
Jeff Id

Posted Sep 7, 2008 at 10:04 PM | Permalink

Dale,

Very nice graph. It gives a great look at the data sampling level.

I need help from someone to convert this raw data to temp. Can someone tell me how Mann did it? If not can someone suggest a statistically reasonable method. I’m no Steve but I can help.

I posted the Decade averaged data on my site today. It looks a lot like Dale’s graph.
http://noconsensus.wordpress.com

I didn’t finish the slope analysis today or the other things I wanted to try, it was nice and hot out today 🙂

I could use some guidance.

Hu McCulloch, I know I’m the new guy but..

Your post like others is f..n brilliant. I read every letter, and I know your analysis could force a correction of the Mann paper. The point shouldn’t be though that Mann admitted too many proxies, the point is that sorting and eliminating data by the noise level in a short window of known (slightly problematic) measured temperatures is faulty in general. Paleoclimatology is using this method everywhere from what I can tell, and it has to stop. If we can’t stop this incorrect analysis which flattens historic temperature trends compared to measured data, there will be one Mann after another to contend with.
- Dan White
  
  Posted Sep 7, 2008 at 10:38 PM | Permalink
  
  Re: Jeff Id (#26),
  
  I know your analysis could force a correction of the Mann paper….Paleoclimatology is using this method everywhere from what I can tell, and it has to stop.
  
  Oh you certainly are a newbie, aren’t you? Such wide-eyed optimism. Tsk, tsk. 🙂 Just kidding.
  
  You know 10 times the stats I ever will, and I’m sure you can make a contribution to this site. However, if you haven’t done so yet, you MUST take a break from your recent analysis and go back through CA’s archives. You really should start with the original hockey stick controversy (which will take a few days of reading in itself) at the very least. You will then have a good appreciation of what you are dealing with. Oh, check out the Ababneh thesis discussion as a nice companion to the hockey stick discussion. It’s classic. (I have to give you a heads up: It ends with Ababneh’s attorney advising her not to release her PhD thesis raw data! Put that in the “can’t make this stuff up” file!) Also, be sure to spend some time at Real Climate if you haven’t already. I suggest you take a little valium first to keep your heart rate down.
  
  Good luck! 🙂
Jeff Id

Posted Sep 7, 2008 at 11:26 PM | Permalink

Dan,

You’re probably right, I promise to take your advice (minus the valium..Increases my dumbness). I have skimmed through this site before but a more careful review is a good idea. And by the way holy cr.. to the legal advice, I sure don’t remember that from reading before.

Regarding my last post, everyone needs to keep their eye on the ball, and the faulty method is far more important than this single paper. Pretty strong statement for a guy who figured out the broken statistic analysis on Thursday. Hey, at least I figured it out!

It’s too late to be thinking about this. I still need some advice on getting this Mann data converted in terms of temp anomaly. I don’t really care which method except that it needs to be considered reasonable.
JamesG

Posted Sep 8, 2008 at 1:58 AM | Permalink

JeffId: Choosing a proxy on the basis of an upslope at the blade doesn’t mean the proxy tail will be flat. I don’t like the method either but it all looks like pure noise to me so just eliminating some of it shouldn’t stop you getting a medieval warm period – and Mann indeed still gets one. Have you tested your theory by comparing the average plot of the total to the average plot of the remainder? It seems to me that regardless of the proxy selection, hockey-stick comes from the instrument data alone.
George Tobin

Posted Sep 8, 2008 at 5:56 AM | Permalink

1) DaleC #24 Thank you for that graph. It looks to my untrained eye like the only conclusion we can draw is that the more proxies we add, the noisier the outcome. If there really is a pony in there, I am not seeing it.

2) BTW, does anybody know if “Korttajärvi” really is the Finnish word for “bristlecone”?

3) JeffId & JamesG: Is the issue just about selective use of existing data sets, or is it more about pasting modern instrumental data onto the blade in a manner that makes the blade more angled and the handle lower by comparison? If, hypothetically speaking, one set out to arrange data to support the proposition that the current warming period is the warmest in 2000 years, one could accomplish that either by (a) smoothing the past into a flat handle or (b) permitting variance in the handle but elevating the blade such that the older peaks look diminished. The latter seems more credible and scientific-looking so infilling and data integration would probably be at least as important an ingredient as smoothing–if one were intent on arriving at a pre-conceived outcome, not that anyone has done that, mind you.
- JamesG
  
  Posted Sep 8, 2008 at 8:54 AM | Permalink
  
  Re: George Tobin (#30),
  I think that the argument isn’t the height of the MWP. Craig’s plot may be the better, or Moberg’s even. There is so much uncertainty there it’s probably not worth much. But even if the MWP was rather higher than today, it’s that fake end slope, from massaged and cherry-picked instrument data that adds the dramatic rapidity to the modern warming and makes it look manmade.
tomppa

Posted Sep 8, 2008 at 6:33 AM | Permalink

To #31: Korttajärvi is built from kortta, which is derived from the Finnish word korsi, meaning straw, and järvi, which means lake. So a free translation to English would give “Strawlake”.

No, Korttajärvi isn’t the word for bristlecone.
Julian Droms

Posted Sep 8, 2008 at 7:02 AM | Permalink

Pardon my lack of familiarity.

The distribution graphs plot the correlation of the proxies with what exactly?

What is the y axis? Correlation of proxies with what?

Thanks, very interesting.
Julian Droms

Posted Sep 8, 2008 at 7:07 AM | Permalink

Oh… are these correlation values for the proxies against the CRU instrument recorded temperatures as interpreted by Hadley?
Jeff Id

Posted Sep 8, 2008 at 9:02 AM | Permalink

JamesG:

Asked
Have you tested your theory by comparing the average plot of the total to the average plot of the remainder? It seems to me that regardless of the proxy selection, hockey-stick comes from the instrument data alone.

I have only briefly looked at a couple dozen graphs. If you plot the data Mann used it has a steep tow at the end which by the statistical fraud employed fits the measured temp. The end of the hockey stick in his picture has a “bright” red curve overlaid on it forming the height of the hockey stick. Still, the end of the proxy data has been significantly amplified by this technique. It leaves the rest of the noise in the graph un-amplified. By performing this amplification at only the end of the graph he is creating an artificial correlation to temperature scale. The rest of the plot is NOT on the same vertical scale, and it might not even be temperature.

George Tobin: Answer B I think.

I posted last night a smoothed version of the total data.

Mann Data Smoothed and Averaged

I think this afternoon I will overlay the total data on the Mann data. If I am right, I think it will show my point nicely. If I am wrong, it wouldn’t be the first time 🙂
- IainM
  
  Posted Sep 8, 2008 at 9:10 AM | Permalink
  
  Re: Jeff Id (#36),
  
  I think the jpg is missing from your web page.
- IainM
  
  Posted Sep 8, 2008 at 9:22 AM | Permalink
  
  Re: Jeff Id (#36),
  
  Now it’s there.
Hu McCulloch

Posted Sep 8, 2008 at 10:02 AM | Permalink

Re DaleC, #24,

Your graph provides a very useful summary of the 1209 series considered by Mann, Zhang, Hughes, Bradley, Miller, Rutherford and Ni 2008 (hereafter MZHBMRN08).

Since you have scaled each series to have a max value of +1 and a min value of -1, each series should hit +1 at least once and -1 at least once. It is interesting that most of these hits, both positive and negative, occur during approximately 1500-1700, which even MWP denialists would grant were LIA years. There must be many duplicate years, so there won’t be a full 1209 +1s and -1s visible in the graph. However, I find it odd that there appear to be many more years with at least one +1 than with at least one -1. Is there an error here?

The average is interesting, but shouldn’t be taken at face value as an indication of warming or cooling. MZHBMRN08 presumably selected these series on the grounds that they thought there was some physical reason to expect a positive correlation with temperature, but then find that only 484 have what they consider to be a significant positive correlation with instrumental temperature, even on a 1-tailed test that dismisses the approximately equal number of negative non-Luterbacher correlations.

It would therefore be very useful to run the same graph for the MZHBMRN08 484, as well as for the non-dendro, non-Luterbacher subset of these 484. Even this wouldn’t be formally calibrated to temperature, but would still be of interest to contemplate.

Your mean of the full 1209 scaled series shows a strong HS in the last few decades. However, the blue line with the number of proxies shows that the number of proxies starts to tank after about 1960, so that there are only a handful of proxies in these last couple of decades. Craig Loehle and I, in our 2008 revision of Craig’s 2007 paper, found a similar problem in Craig’s 18-proxy set, and so terminated the reconstruction when the number of proxies fell to 9, or half of the full set. Since this occurred in 1950, we terminated the 30-year smoothed reconstruction in 1935. MZHBMRN08 should have done something comparable.

The blue line also shows that the number of proxies falls off sharply before the LIA, so that there are inevitably relatively few maxima and minima before this period. In order to visualize what the MWP-relevant subset are doing, it would therefore be very useful also to do the same plot a) for the subset of the full 1209 that make it back to 950AD or so, and b) for the subset of the 484 that make it back to the same date.

Keep up the good work!
- DaleC
  
  Posted Sep 9, 2008 at 12:26 AM | Permalink
  
  Re: Hu McCulloch (#39),
  
  Hu, thanks for the kind words. I’m trying to find a way to reveal the underlying structure of the data set which requires nothing more than elementary high school mathematics to be able to follow. Jeff ID has a nice chart at http://noconsensus.wordpress.com/ which compares the 484 series to the 1209 scaled by standard deviation, the more common approach, but the concept of scaling by standard deviation is much too technical for most people. Isn’t a deviation some sort of kinky sexual practice? Scaling to between -1 to 1 has the advantage of giving everything equal weight in an average, which is easy to understand, and is surely fair enough as a starting point. Comparing the averages of the 1209 set to the 484 set, and then with/without the Luterbacher series, seems to me to be an elementary and defensible position which leads to the conclusion that the entire exercise is not worth pursuing. It looks like it may be that all these proxies indicate is that the Luterbacher set is the odd man out, and the rest is mostly just a very noisy ensemble. Luterbacher are the new bristle cones?
  
  My 22 year trailing moving average has all data points – I’m not being a purist, so all start and end points are shown. The moving average will reduce every point in a series which otherwise touches -1 or 1 except the first. What you have observed is that an unusually large number of series start with the highest (or lowest) value. I agree this looks a bit strange, but have not pursued it yet. Looking at each of the 1209 series in isolation shows a very mixed bag in terms of series topography. Some are blocky, some have all/most values different, some are fairly stable, some jerk around, some have outliers up to 10 times the amplitude of the other points, and so on. A dog’s breakfast, as the vernacular would have it.
  
  I put some of these issues to Dr Ben in an earlier thread, but he deflected with magnificent aplomb, so I’m none the wiser.
K. Hamed

Posted Sep 9, 2008 at 5:09 AM | Permalink

Steve, and others: I have a question. Consider for example proxies that are (significantly) positively correlated with the temperature. Is it normal that some of these would be (significantly) negatively correlated with each other? I know this is possible statistically, but is it OK in this particular application?
In other words can two (significantly) negatively correlated proxies contribute positively to the temperature? If I am not mistaken, there seems to be a lot of these in this dataset.
Hu McCulloch

Posted Sep 9, 2008 at 10:10 AM | Permalink

Dale C, #40, writes, re graph in #24,

My 22 year trailing moving average has all data points – I’m not being a purist, so all start and end points are shown. The moving average will reduce every point in a series which otherwise touches -1 or 1 except the first. What you have observed is that an unusually large number of series start with the highest (or lowest) value.

I see — but it might be more representative to omit the readings with fewer than 22 points in the average, since these have much higher variance and so are more likely to be the extremes. It also makes some difference whether you scale to +/-1 before or after you take the moving average, since in the former case, the MA will ordinarily never reach the limiting values, while in the latter it always will. I’m not sure which way is better, just that they are different.
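The difference between the two orderings can be seen on a synthetic series. This is a sketch on random stand-in data with a 22-point trailing average over full windows only; the helper names are hypothetical and nothing here uses the actual proxy files.

```python
import random


def scale_to_unit(series):
    """Linearly rescale so the minimum maps to -1 and the maximum to +1."""
    lo, hi = min(series), max(series)
    return [2 * (x - lo) / (hi - lo) - 1 for x in series]


def trailing_mean(series, window):
    """Trailing moving average using full windows only."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


rng = random.Random(24)
raw = [rng.gauss(0.0, 1.0) for _ in range(500)]  # stand-in proxy series
WINDOW = 22

scale_then_smooth = trailing_mean(scale_to_unit(raw), WINDOW)
smooth_then_scale = scale_to_unit(trailing_mean(raw, WINDOW))

print("scale, then smooth: range",
      round(min(scale_then_smooth), 3), "to", round(max(scale_then_smooth), 3))
print("smooth, then scale: range",
      round(min(smooth_then_scale), 3), "to", round(max(smooth_then_scale), 3))
```

Scaling first leaves the smoothed series well inside the limits, while scaling last forces it to touch both -1 and +1, so the two orderings weight each series differently in an average.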
Hu McCulloch

Posted Sep 9, 2008 at 10:26 AM | Permalink

K. Hamed writes in #41,

Steve, and others: I have a question. Consider for example proxies that are (significantly) positively correlated with the temperature. Is it normal that some of these would be (significantly) negatively correlated with each other? I know this is possible statistically, but is it OK in this particular application?
In other words can two (significantly) negatively correlated proxies contribute positively to the temperature? If I am not mistaken, there seems to be a lot of these in this dataset.

This is possible, but seems unlikely. Suppose y1 and y2 are two proxies related to temperature x by
y1 = x + e1 + u,
y2 = x + e2 - u,
where e1 and e2 are proxy-specific white noise, while u is a common white noise factor. Then cov(y1, y2) = var(x) - var(u), and hence will be negative if var(u) is bigger than var(x), even though both are positively correlated with x.
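A quick simulation of this two-proxy model, with illustrative variances (var(x) = 1, var(u) = 4, unit-variance e1 and e2) that are assumptions rather than estimates from any data:

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


rng = random.Random(43)
N = 20000
SD_X, SD_U = 1.0, 2.0  # illustrative: var(u) = 4 exceeds var(x) = 1

x = [rng.gauss(0.0, SD_X) for _ in range(N)]  # "temperature"
u = [rng.gauss(0.0, SD_U) for _ in range(N)]  # common factor, opposite signs
y1 = [xi + rng.gauss(0.0, 1.0) + ui for xi, ui in zip(x, u)]
y2 = [xi + rng.gauss(0.0, 1.0) - ui for xi, ui in zip(x, u)]

cov_y1_x = covariance(y1, x)
cov_y2_x = covariance(y2, x)
cov_y1_y2 = covariance(y1, y2)
print(f"cov(y1, x)  = {cov_y1_x:+.2f}")
print(f"cov(y2, x)  = {cov_y2_x:+.2f}")
print(f"cov(y1, y2) = {cov_y1_y2:+.2f}  (model value: {SD_X**2 - SD_U**2:+.2f})")
```

Both proxies covary positively with x while covarying negatively with each other, so the sign pattern is possible in principle whenever an opposite-signed common factor dominates.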

However, since many of the proxies use similar imperfect methodologies, the common factors are more likely to enter in with the same sign rather than opposite signs. If the covariances between many of the proxies are negative, it is more likely due to the fact that about half of the non-Luterbacher series are in fact negatively correlated with temperature x, as shown in Steve’s figures above.
- K. Hamed
  
  Posted Sep 9, 2008 at 11:52 AM | Permalink
  
  Re: Hu McCulloch (#43),
  The problem is that I get a (relatively large) number of proxies that are all (significantly) positively correlated with temperature, but at the same time are (significantly) negatively correlated with each other! Positive correlation of each of two proxies with temperature implies that an increase (above the mean) in temperature is associated with increase in both proxies, but negative correlation between the same two proxies implies that an increase in one would be associated with a decrease in the other, hence the paradox.
  
  The same thing occurs with some proxies that are all negatively correlated with temperature, but some of which are negatively correlated with each other (they should be positively correlated). Similarly some proxies that have different signs of correlation with temperature are positively correlated with each other (they should be negatively correlated).
  
  I am aware that this can be easily generated in synthetic data by using a suitable covariance matrix. However, for this particular case of temperature proxies, I am under the impression that the sign of correlation between any two proxies should be the multiplication of the signs of their correlations with temperature (or zero if they are independent?). My question is whether the latter statement is correct in the current situation? If the answer is yes, then either something is wrong with the data as a group or most of the (significant) correlations we see are simply spurious. If the answer is no (with a convincing argument) then my apologies for making this comment.
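One way to test the sign-product rule in general, independent of this dataset, is to ask which values of corr(y1, y2) are mathematically possible given the two correlations with temperature. The 3x3 correlation matrix of (x, y1, y2) has to be positive semidefinite, which bounds corr(y1, y2) within r1x*r2x ± sqrt((1 - r1x^2)(1 - r2x^2)). The sketch below just evaluates that bound; the function name and the example correlations are illustrative, not values taken from the proxy files.

```python
import math

def inter_proxy_correlation_range(r1x, r2x):
    """Range of corr(y1, y2) allowed by a valid (positive semidefinite)
    correlation matrix, given corr(y1, x) = r1x and corr(y2, x) = r2x."""
    centre = r1x * r2x
    half_width = math.sqrt((1 - r1x ** 2) * (1 - r2x ** 2))
    return centre - half_width, centre + half_width

# Both proxies have the same positive correlation r with temperature.
for r in (0.2, 0.5, 0.8):
    low, high = inter_proxy_correlation_range(r, r)
    print(f"r = {r}: corr(y1, y2) can lie between {low:.2f} and {high:.2f}")
```

The lower bound is positive only when r1x^2 + r2x^2 > 1, so the sign of the inter-proxy correlation is forced to match the product of signs only when the correlations with temperature are quite strong. For modest correlations, a negative inter-proxy correlation is not ruled out by the algebra alone.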
K. Hamed

Posted Sep 9, 2008 at 12:04 PM | Permalink

Re: Jeff Id,
Re: Hu McCulloch (#44),
Re: K. Hamed (#45),

Jeff Id’s comment #43 over in the Ian Jolliffe thread is off-topic there, but is right on for this thread, so I’m commenting here instead. He wrote,

I just plotted the accepted data on top of the total average. You can clearly see the amplification by Mann’s process on local temperature. I then subtracted the two, and I am absolutely surprised by the result. It was much clearer than I expected. There is a sharp vertical increase in the difference between Mann and the average, with a total rise equal to about 1 standard deviation.

You guys really should see this. Wow!

Mann’s Statistical Amplification of Local Data

Just as a check, can Jeff Id calculate the covariance matrix of the accepted All Positive proxies? They should all be positive. Maybe there is something wrong with my calculations.
Hu McCulloch

Posted Sep 9, 2008 at 12:52 PM | Permalink

Re K. Hamed #45,

The problem is that I get a (relatively large) number of proxies that are all (significantly) positively correlated with temperature, but at the same time are (significantly) negatively correlated with each other!

This is indeed curious, but perhaps you are looking at these correlations over different time periods? The correlations with temperatures must be over the instrumental period, 1850-1995 or perhaps some subset if you’re looking at local temperatures, but are your inter-proxy correlations over the full length of the shorter proxy? If the “significance” is then spurious (e.g., it doesn’t adequately take serial correlation into account, or otherwise uses too many degrees of freedom), one correlation could be randomly positive while the other was randomly negative.

Just a thought.
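To put rough numbers on the degrees-of-freedom point: a common rule of thumb (swapped in here as an assumption, not something used in the thread or in Mann’s screening) is that if two series each behave like AR(1) processes with lag-1 autocorrelations rho1 and rho2, the effective sample size for their correlation is about n(1 - rho1*rho2)/(1 + rho1*rho2). The sketch below applies it to the 146 annual values of 1850-1995 with made-up autocorrelations, using a crude large-sample 5% threshold of 1.96/sqrt(n_eff).

```python
import math

def effective_sample_size(n, rho1, rho2):
    """Approximate effective sample size for the correlation of two AR(1) series."""
    return n * (1 - rho1 * rho2) / (1 + rho1 * rho2)

def rough_threshold(n_eff):
    """Crude two-sided 5% threshold for |r| under no true correlation."""
    return 1.96 / math.sqrt(n_eff)

n = 146  # annual values, 1850-1995 inclusive
for rho in (0.0, 0.5, 0.9):  # illustrative lag-1 autocorrelations, same for both series
    n_eff = effective_sample_size(n, rho, rho)
    print(f"rho = {rho}: n_eff = {n_eff:.1f}, |r| threshold = {rough_threshold(n_eff):.2f}")
```

With strongly autocorrelated (for example, heavily smoothed) series, the threshold a correlation has to clear rises sharply, so correlations that look “significant” against n = 146 may not be.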
- K. Hamed
  
  Posted Sep 9, 2008 at 1:14 PM | Permalink
  
  Re: Hu McCulloch (#47),
  
  are your inter-proxy correlations over the full length of the shorter proxy?
  
  No. All my calculations are in the period 1850-1931 (I was trying to avoid the effect of the controversial trend afterwards). My calculated correlations are between the instrumental record in ‘HAD_NH_reform.txt’ and the proxies in ‘itrdbmatrix.txt’ and between the proxies themselves using the same common period of 1850-1931 for all. By the way, incidentally I get 484 acceptable proxies in this period (336 +ve and 148 -ve). Maybe it is only this period? I am running the full 1850-1931 now, but it will take some time. I will post the results I get.
  
  It would be helpful if somebody can confirm this behavior before we get carried away, maybe Jeff Id can help?
Jeff Id

Posted Sep 9, 2008 at 2:44 PM | Permalink

Hu McCulloch,

I’m sorry for posting on the wrong thread; I was pretty wound up. I’m still the new guy and have spent more time reviewing where Steve McIntyre and this group have been. Many of my intended directions have already been covered by previous work, but I am still bothered by one main point.

If the concept of the tree ring proxy is that it correlates well with temperature, what is the rationale for throwing out the data? What I mean is (assuming this is temperature) that if there is noise throughout the period the data represents, all these fancy statistics are doing is choosing an average with a high-end peak. If the data is temperature for tree rings, why not use all of it? Is there some specific physical process that has been demonstrated which makes some trees bad for their entire multi-century lives?

I plotted the accepted and rejected data today on my blog and found that even though my data was 30 year smoothed, the rejected data had a high frequency component missing from the accepted data. It’s on my front page.
Jeff Id

Posted Sep 9, 2008 at 6:20 PM | Permalink

I had a commenter ask me to plot Europe vs North America. It turns out that North America matches the rejected data almost perfectly, while Europe correlates with the data that passed calibration.

http://noconsensus.wordpress.com
MrPete

Posted Sep 9, 2008 at 8:06 PM | Permalink

Jeff, tree data is complicated. The arguments include things like whether environmental context determines if the trees are temp-limited or precip-limited. Much of this is not well proven. There are also huge data issues, such as those shown in Almagre Tree #31 (there’s a whole post here on just that single tree).
Jeff Id

Posted Sep 10, 2008 at 2:26 PM | Permalink

I have now compiled the data by latitude band (60°S to 20°S, 20°S to 20°N, and 20°N to 60°N) on my homepage.
Evan Jones

Posted Sep 12, 2008 at 9:41 PM | Permalink

Proxy Screening by Correlation

Um.

Isn’t that just a somewhat clinical way of saying “shaking the ol’ cherry tree”?

(That’s a very serious matter.)