Inversions from Partial Correlation Coefficients

Willis raised an interesting point about trying to invert series based on partial correlation coefficients with monthly temperatures. His post and UC’s response are collected here.

Willis:

So, I figured I’d see if I could reconstruct what the tree ring widths looked like from the correlations to the Dulan temperature. Much to my surprise, I found out that there is no unique inversion from correlation to ring widths. Here are four synthetic ring widths, each with identical correlations to the Dulan month by month temperature, along with the Dulan temperature itself:

As you can see, the four synthetic ring width series are all quite different from the Dulan temperature. In addition, the correlations with the quarterly (D-J-F, M-A-M, J-J-A, and S-O-N) and annual Dulan temperatures are very poor. None of the correlations are significant. Finally, the trends are very different.

Now, if we can’t compute the ring widths from the correlations, and we assume that the temperature is related to the ring widths because of the correlations, doesn’t that mean that we can’t compute the temperature from the ring widths? And since the trends can be quite different and still give the same correlations, doesn’t this mean that there is not necessarily a relation between temperature trends and tree ring width trends?

For my next excursion into dendro, I’m going to see if this is true of the Wilson study as well …

OK, six hours later, went to work (I’m currently building a handicapped access ramp on a commercial building), back again. Here’s the Wilson et al. data, compared to the 1950-1990 instrumental record. Wilson used principal components; I’ve looked at the PC1 correlations. It has a number of very good correlations with various months, and it only has one negative correlation. The correlations also extend over a longer period, 21 months instead of 12 as in the Zhang et al. data. Here are the correlations.

Prev. Jan, 0.09
Prev. Feb, 0.24
Prev. Mar, 0.31
Prev. Apr, 0.36
Prev. May, 0.34
Prev. Jun, 0.14
Prev. Jul, 0.02
Prev. Aug, 0.16
Prev. Sep, 0.22
Prev. Oct, 0.16
Prev. Nov, 0.05
Prev. Dec, -0.04
Jan, 0.08
Feb, 0.16
Mar, 0.36
Apr, 0.26
May, 0.44
Jun, 0.32
Jul, 0.21
Aug, 0.28
Sep, 0.30

So I expected that the synthetic PC1s would be very close to each other. Once again, I was surprised … here are four synthetic PC1s that all have the same correlation with the monthly data, month by month:

As you can see, not only are the PC1s different, but their trends are also different.

CONCLUSIONS

1) Very different tree ring width patterns can give identical correlations with a given set of monthly temperatures.

2) The fact that tree ring widths are correlated with monthly temperatures does not mean that tree ring width trends are correlated with temperature trends.

3) For any given set of monthly RW/temperature correlations, there exists a family of distinct RW curves which will give the same correlations with the monthly temperatures (within instrumental accuracy).

… am I crazy for thinking that this makes it very hard to put confidence intervals on a historical tree-ring based reconstruction, and that it makes the calculated trends very suspect?

UC added here

Ref discussion at CA. I use a simulated monthly temperature record, data here. The simulated ring width response is here. Correlations with monthly temps are here. But as Willis suggested, we can find other time series that give exactly the same correlations.
See e.g. simul1 and simul2. Try with Matlab:

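% Column 1 of each matrix is a ring-width series, the remaining columns the monthly temperatures,
% so s(2:end,1) lists the correlation of that series with each month.
s=corrcoef([originalrw temperature]);s(2:end,1)
s=corrcoef([simulrw1 temperature]);s(2:end,1)
s=corrcoef([simulrw2 temperature]);s(2:end,1)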

simulrw1 and simulrw2 are obtained using different initial guesses (white noise and red noise). Quite different results,

This entry was written by Stephen McIntyre, posted on Apr 18, 2007 at 7:50 AM, filed under Statistics.

11 Comments

Steve McIntyre

Posted Apr 18, 2007 at 8:02 AM | Permalink

Leaving aside questions of precipitation, the Dulan junipers supposedly have a positive correlation to winter temperature and a negative correlation to spring temperature. This type of positive and negative coefficient pattern reminds me of Legendre polynomials, where different orthogonal polynomials are constructed over the [-1,1] interval (i.e. in this case an annual cycle). The first Legendre polynomial is P0(x)=1, i.e. an annual average. If the RW is acting as a type of integral with both positive and negative coefficients, it can quickly be orthogonal or close to orthogonal to the annual average P0(x). Here is a picture of the first few Legendre polynomials. I think that the real problems arise when you have both positive and negative correlations, such as these junipers. It’s worth considering the two cases separately.
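
For reference, a short sketch of the orthogonality property being invoked, using the standard definitions on [-1,1] (the choice of which polynomials to list is only for illustration):

$P_0(x)=1,\quad P_1(x)=x,\quad P_2(x)=\frac{1}{2}(3x^2-1)$

$\int_{-1}^{1} P_m(x)P_n(x)\,dx=\frac{2}{2n+1}\delta_{mn}$

Taking $m=0$ gives $\int_{-1}^{1}P_n(x)\,dx=0$ for every $n \ge 1$. So if the monthly weights of a response followed any of the higher-order polynomials, with their mix of positive and negative coefficients, that response would be exactly blind to a uniform shift of the whole year.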
bernie

Posted Apr 18, 2007 at 10:26 AM | Permalink

OK:
I have to admit that what you guys are doing statistically is beyond me. Logically, however, I am completely confused as to how it makes sense to talk about 12 separate monthly or 4 quarterly correlations of temperature with a single dependent variable, annual ring width. There is clearly considerable correlation among months – winter months are cold and summer months are hot, which amounts to a lack of independence among the independent variables. Wouldn’t a better approach be to class each year by the pattern of monthly temperatures against some norm and use those classifications as dummy variables? At the moment, I sense a huge effort to construct findings that are more a matter of chance than a meaningful and definable mechanism. Of course once you do this you destroy the precision of being able to use rw as proxies for temperature – but at least you then can build a model that reflects the actual rw growth mechanisms. At the moment, the whole exercise looks serendipitous. I also think that there may be a problem in the degrees of freedom in these models given the actual number of tree chronologies. What you have done is make me dig out all my stat books and rework things I haven’t looked at for 25 years.
UC

Posted Apr 18, 2007 at 1:21 PM | Permalink

It is quite an interesting problem: you are given 12 vectors (zero mean). You need to find a vector P that is aligned so that the cosines of the angles between P and those 12 vectors are the given 12 values (sample correlation coefficients). I got my solutions using pseudoinverse + iterations, i.e. an under-determined set of linear equations + a constraint on the solution. I’m quite sure there is a better way ..
With a simple 3D example it is easy to show that there can be 0, 1, or more solutions. If those given correlations are close to 1 it is harder to find solutions.
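
The geometry described above can be sketched in a few lines. This is not the pseudoinverse + iterations method; it is a simpler illustration that starts from one known series and builds another with the same correlations. It is written in plain Python so it runs without any toolbox. All names (`same_correlation_series`, `temps`, `rw`) and the random test data are hypothetical. The idea: after centering, split the series into the part lying in the span of the temperature vectors, which the correlations pin down, and the remainder, which can be swapped for any other zero-mean vector of the same length that is orthogonal to every temperature series.

```python
import math
import random


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def corr(u, v):
    """Sample (Pearson) correlation coefficient."""
    uc, vc = center(u), center(v)
    return dot(uc, vc) / math.sqrt(dot(uc, uc) * dot(vc, vc))


def orthonormal_basis(vectors):
    """Gram-Schmidt; assumes the inputs are linearly independent."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def project(v, basis):
    out = [0.0] * len(v)
    for q in basis:
        c = dot(v, q)
        out = [o + c * qi for o, qi in zip(out, q)]
    return out


def same_correlation_series(rw, temps, rng):
    """Return a different series with the same correlation to every series in temps.

    Needs more years than (number of temperature series + 1).
    """
    basis = orthonormal_basis([center(t) for t in temps])
    rw_c = center(rw)
    fixed = project(rw_c, basis)                 # determined by the correlations
    free = [x - f for x, f in zip(rw_c, fixed)]  # invisible to the correlations
    z = center([rng.gauss(0, 1) for _ in rw])    # arbitrary replacement direction
    z = [x - p for x, p in zip(z, project(z, basis))]
    scale = math.sqrt(dot(free, free) / dot(z, z))  # keep the overall length
    return [f + scale * x for f, x in zip(fixed, z)]


# Hypothetical data: 12 monthly temperature series over 40 years.
rng = random.Random(0)
n_years, n_months = 40, 12
temps = [[rng.gauss(0, 1) for _ in range(n_years)] for _ in range(n_months)]
rw = [0.3 * temps[4][k] + rng.gauss(0, 1) for k in range(n_years)]

rw2 = same_correlation_series(rw, temps, rng)
r1 = [corr(rw, t) for t in temps]
r2 = [corr(rw2, t) for t in temps]
print(max(abs(a - b) for a, b in zip(r1, r2)))  # differences at rounding level
print(corr(rw, rw2))                            # yet the two series differ
```

With 40 years and 12 months the replaceable part lives in a 27-dimensional space, which is why low correlations leave so much room. When the given correlations approach 1 the fixed part takes up nearly all of the vector's length and the room shrinks, consistent with the remark above.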
bernie

Posted Apr 18, 2007 at 3:02 PM | Permalink

But isn’t a problem with the real data that the set of linear equations is not independent and, therefore, the solution is over-determined??
Joe Ellebracht

Posted Apr 18, 2007 at 4:22 PM | Permalink

This discussion is over my head, but let me make a suggestion anyway.

The ring widths are a function of at least 2 things, temperature and precipitation (OK, maybe CO2 concentration but I am ignoring that and nonlinearity and the interaction between temp and precip). Assuming for the purpose of discussion that the effects of temperature and precipitation are additive, then using a one factor temperature model assumes, in the model’s forecasts (backcasts), that precipitation has a unitary value at the average of its value during the calibration period. Starting from the obviously existing 2 factor model sitting there waiting to be calculated allows one to test the reasonableness of the one-factor model (temp) assumptions about the other factor (precip). Also to look for interaction factors.

Additionally, since (I think) some have calculated historical precipitation in some of the same regions from tree ring widths, perhaps these precipitation estimates could be used in a 2 factor model to get temps.

As to the slope of the one factor line, in a linear regression wouldn’t it be obvious from the error statistics whether the error of the estimate of the slope allows for the slope to be negative within the significance test?
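
The slope question can be made concrete with the textbook one-factor least-squares formulas. This is a minimal sketch on made-up numbers; the function name and the data are hypothetical, and it assumes independent errors, which matters here because autocorrelated residuals typically make the standard error look smaller than it should.

```python
import math


def slope_with_standard_error(x, y):
    """Ordinary least squares for y = a + b*x; returns (b, se_b, b / se_b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance uses n - 2 degrees of freedom
    return slope, se, slope / se


# Hypothetical calibration data, for illustration only.
temp = [1.0, 2.0, 3.0, 4.0, 5.0]
rw = [2.0, 1.0, 4.0, 3.0, 5.0]

slope, se, t_ratio = slope_with_standard_error(temp, rw)
print(f"slope={slope:.3f}  se={se:.3f}  t={t_ratio:.3f}")
```

The ratio in the last position is what gets compared against a Student-t distribution with n-2 degrees of freedom; if it is small, a negative slope cannot be ruled out.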
bender

Posted Apr 18, 2007 at 9:04 PM | Permalink

It is trivially obvious that significant correlations may arise from high-frequency coherence or low-frequency coherence. I believe that is what is happening in the red and green cases above (each compared to blue). Basically, a Pearson correlation coefficient is frequency-independent, and is thus a poor measure of ‘match’ between two non-iid processes. Most good dendroclimatologists understand this already.

Willis’s skepticism about one’s ability to robustly estimate confidence intervals stands, however. That’s precisely why the dendros don’t do it. Not because they haven’t thought of it, but because they don’t know how.
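
The point about high- versus low-frequency coherence can be illustrated with a toy example. This is a sketch with made-up sinusoids, not the series plotted above; `wave`, `y_low` and `y_high` are hypothetical names. One series shares only its slow component with `x`, the other only its fast component, and the Pearson coefficient cannot tell them apart.

```python
import math

N = 240  # samples; every frequency below is a whole number of cycles per record


def wave(cycles):
    return [math.sin(2 * math.pi * cycles * k / N) for k in range(N)]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    uc = [a - mu for a in u]
    vc = [b - mv for b in v]
    num = sum(a * b for a, b in zip(uc, vc))
    den = math.sqrt(sum(a * a for a in uc) * sum(b * b for b in vc))
    return num / den


low, low_other = wave(2), wave(3)       # two unrelated slow components
high, high_other = wave(60), wave(70)   # two unrelated fast components

x = add(low, high)
y_low = add(low, high_other)    # agrees with x only at low frequency
y_high = add(low_other, high)   # agrees with x only at high frequency

print(f"{corr(x, y_low):.3f}")   # 0.500
print(f"{corr(x, y_high):.3f}")  # 0.500
```

Both candidates score r = 0.5 against `x`, although only `y_low` tracks its slow variation, which is the part that matters for trends.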
UC

Posted Apr 19, 2007 at 12:32 AM | Permalink

sim1 is obtained using a red noise initial guess. It is true that the correlation with the original (r=0.36) is stronger in the HF band (0.5…1) than in the LF band. But note how weak that correlation is, given that these two series have identical correlations with 12 temperature series. Correlations from 0 to 0.5 give quite a lot of freedom for the alignment of that vector; the situation would be completely different if at least some of those correlations were 0.9..0.99.

Basically, a Pearson correlation coefficient is frequency-independent, and is thus a poor measure of ‘match’ between two non-iid processes. Most good dendroclimatologists understand this already.

Hmm, if a reconstruction explains X % of the variance, it can be explaining HF part or LF part. In the GW sense the low frequencies are interesting, but for that band we need lots of samples..

Problems in climatology are very difficult, and it seems that due to this some of those scientists take the short cut. And never admit a mistake. Ref RC.
bender

Posted Apr 19, 2007 at 2:47 AM | Permalink

Re #7 Glad to see we agree.
UC

Posted Apr 21, 2007 at 1:39 PM | Permalink

Yes, hmm hmm, now for the next question:
If I make my own temperature series (better than HadCRUT), and it happens to have exact correlations with some proxy collection, INVR method (as described by Juckes et al) will give exactly the same reconstruction?

Works with JBB
Steve McIntyre

Posted Apr 21, 2007 at 2:19 PM | Permalink

UC, as usual this looks interesting. Can you make your write up a little (actually a lot) less terse, as I can’t figure out what you’ve done, and, if I can’t, this may apply to others as well.
UC

Posted Apr 22, 2007 at 6:55 AM | Permalink

I just rotated the original temperature vector (Jean S can explain this part more clearly [ref: personal communication]).

Background: Juckes et al, Appendix A1, the optimal estimate for the coefficients $\beta _i$ is

$\hat{\beta} _i=\frac{\sum _{k \in C} x_{ik} y _k}{\sum _{k \in C} y _k ^2 }$

in matrix form,

$\hat{\beta}=(T_c^T T_c)^{-1}T_c^T P_c$

where T is instrumental and P proxy data. To find another T that yields the same betas we can use the method that Willis brought up: find a vector that has the same sample correlation coefficients with N other vectors. I just took the proxy matrix instead of the temperature matrix, and got a new temperature series that yields the same betas as in JBB INVR. Low correlations make it possible to rotate that temperature vector quite a lot, as you can see from the differences between the new and the original.
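
Why such a rotation leaves the betas alone can be sketched in a few lines. The notation $y_\parallel$, $y_\perp$, $u$ and $y'$ is introduced here for illustration and is not taken from Juckes et al. Split the calibration-period temperature vector into its projection onto the span of the proxy vectors $x_i$ and the remainder:

$y = y_\parallel + y_\perp, \qquad \sum _{k \in C} x_{ik}\, y_{\perp k} = 0 \ \text{ for all } i$

Now replace the remainder by any other vector $u$ that is orthogonal to every proxy and has the same length:

$y' = y_\parallel + u, \qquad \sum _{k \in C} x_{ik} u_k = 0, \qquad \sum _{k \in C} u_k^2 = \sum _{k \in C} y_{\perp k}^2$

Both the numerator and the denominator of the estimate are then unchanged:

$\sum _{k \in C} x_{ik} y'_k = \sum _{k \in C} x_{ik} y_k, \qquad \sum _{k \in C} {y'_k}^2 = \sum _{k \in C} y_k^2 \quad\Rightarrow\quad \hat{\beta}'_i = \hat{\beta}_i$

With $n$ calibration years and $N$ linearly independent proxies, $u$ can be chosen anywhere on a sphere in an $(n-N)$-dimensional subspace (one dimension fewer if the series must also stay zero-mean), so the freedom is large whenever $y_\perp$ is long, which is the low-correlation case.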