Eduardo Zorita Comments…

Eduardo Zorita sent the following in as a comment on earlier postings. As I did on a similar occasion with Rob Wilson, I’m re-posting it as a separate post to ensure that it’s properly noted.

Zorita
Steve (I see that some of my comments below have already been dealt with during the weekend. Sorry for the possible repetitions.)

I will try to address some of the numerous points that you have raised. But first, let me try to explain a little what the pseudo-proxy approach can and cannot achieve. In the experimental sciences one cannot really prove a theory; one can only falsify it by performing an experiment in which the theory does not seem to hold. In paleoclimate we obviously cannot do experiments, so we resort to parallel worlds that mimic the real world with a certain degree of realism. In the pseudo-proxy approach these parallel worlds are the output of climate models, and this idea has also been applied in other, far-away areas of research, for instance to test methods to disentangle the genetic lineage of organisms. The drawback is that we cannot represent the real world all that realistically – we cannot grow bristlecone pines inside the computer – so we have to simplify the problem and produce something that could look like a dendrochronological, or other proxy, time series. Given these limitations, you are bound in this approach by two factors. First, you can try to be as realistic – or pessimistic, if you prefer – as possible, generating artificial “bad apples”, and test whatever method you like. If the method does not perform well, your study can always be regarded as too pessimistic, and therefore not relevant for the real world: “you have constructed the bad apples to discredit the method”.

On the other hand, one has to reach some degree of realism to avoid a second caveat, which is better illustrated with an example. Imagine that you have a marvelous proxy P that shows a correlation of 1 with the Northern Hemisphere temperature. In this case any method, indeed the simplest one T=P, will perform perfectly, but you will not be able to claim that the method is right, because the starting point was unrealistic. So one has to design proxies that are, on the one hand, realistic enough but, on the other hand, tend to be optimistic, so that at the end of your analysis you can write something like “even in this optimistic scenario, the method …”. Therefore, one cannot test the method in “isolation”: the input data are also important.
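
To make the trade-off concrete, here is a minimal sketch of the usual pseudo-proxy construction: take a model grid-point temperature series and degrade it with noise at a chosen signal-to-noise ratio. The function name, the AR(1) option and the SNR convention are assumptions for illustration, not the VS04 code:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pseudoproxy(temperature, snr=1.0, ar1=0.0):
    """Degrade a model temperature series into a pseudo-proxy by adding
    white (ar1=0) or AR(1) noise, scaled so var(signal)/var(noise) = snr."""
    n = temperature.size
    noise = rng.standard_normal(n)
    for t in range(1, n):                     # optionally redden the noise
        noise[t] = ar1 * noise[t - 1] + np.sqrt(1.0 - ar1 ** 2) * noise[t]
    noise *= temperature.std() / (noise.std() * np.sqrt(snr))
    return temperature + noise

# e.g. a 1000-"year" toy temperature series, half signal variance, half noise
model_temp = np.cumsum(rng.standard_normal(1000)) * 0.05
proxy = make_pseudoproxy(model_temp, snr=1.0, ar1=0.3)
```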

The other side of the coin is that if you do not find something very significant – for instance in our response to your GRL paper, in which we did not find a large difference between “normal” PC centering and MBH-PC centering – it can of course be due to the fact that we were too optimistic in our generation of proxies, or to the fact that the differences do not exist. We found that in the world represented by ECHO-G and by our pseudo-proxies these differences really were not large. Nothing more, but nothing less. This problem is the same as in statistical hypothesis testing, and in science in general: not being able to reject the null hypothesis – in this case, that the differences do not exist – does not mean that you have proven it.

Now, to some particular points:

Yes, the PC-variance rescaling is implemented in VS06, although I personally think it is wrong. After finding the optimal (defined in some way) regression parameters, this rescaling shifts their values away from the optimum. Interestingly, there is a paper on this point that has not been cited in all this discussion, written quite a few years ago by Bürger in Climate Research 1996 (the same Bürger as in Bürger and Cubasch) in the context of statistical downscaling. Statistical downscaling denotes the methods used to estimate regional climate change from the output of global climate models, and technically it is a problem similar to that of climate reconstruction – this time the targets are the local variables and the predictors the large-scale fields. In that paper the tension between optimal estimation of the mean and conservation of variance is clearly illustrated.

- Detrended or non-detrended calibration. This is a well-known issue and, to my knowledge, it has been considered in the statistical literature under different names: partial correlation, non-stationary regression, regression with serially correlated data. The first paper seems to have been written by Yule as early as 1926 (“Why do we sometimes get nonsense correlations between time series?”), and I recently read a review paper on this topic written by Phillips in 2005 (“Challenges of trending time series econometrics”). So the literature must be large. In climate research it is actually very well recognized: this is why, for instance, to calculate the power of the monthly temperatures in Sydney to predict simultaneous monthly temperatures in Toronto you filter out the annual cycle. Otherwise you get a very nice high anticorrelation, which is of course useless. Or you can try to predict the number of births from their correlation with the number of storks, both showing a trend due to urbanization: again a nice, albeit useless, correlation, unless you believe that storks may indeed play a role. Other examples abound; one particularly nice one, showing a very high (of course spurious) correlation between Northern Hemisphere temperature and West German unemployment, was presented at the NAS panel meeting. To ascertain a real link you need a certain number of degrees of freedom, and a long-term trend is just one number, which can be arbitrarily rescaled through the calibration step to any other number one pleases. I think this is widely recognized in the analysis of instrumental data, but surprisingly not in paleoclimate.
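
The stork example is easy to reproduce numerically. A minimal sketch (all numbers illustrative): two series that share nothing but a linear trend correlate strongly, and the correlation collapses once each series is detrended:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
t = np.arange(n, dtype=float)

# two independent series that share only a deterministic trend
x = 0.1 * t + rng.standard_normal(n)
y = 0.1 * t + rng.standard_normal(n)

print(np.corrcoef(x, y)[0, 1])   # large, around 0.9: spurious

def detrend(s):
    """Remove an OLS straight line from s."""
    return s - np.polyval(np.polyfit(t, s, 1), t)

print(np.corrcoef(detrend(x), detrend(y))[0, 1])   # near zero
```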

In the case of proxies, you would have to believe that the long-term trends in the proxy are entirely due to the impact of its local climate or, to be more accurate, due to the impact of local temperature. This may or may not be the case, as proxies may be affected by many other long-term effects, especially in the 20th century: precipitation, nutrients, changes in the amplitude of the annual cycle, biological adaptation, and a long list of others. Actually, we know that this is not just an assumption, since many tree-ring indicators and local temperatures do show a different link before and after approximately 1980, so that there must be a source of non-climatic long-term trends. As this behavior is not really understood, one has to assume that it could also have happened in the past.

This is essentially the rationale for detrending, or alternatively for including random trends in the pseudo-proxies if one relies on non-detrended calibration. Conversely, if one has very good knowledge of the proxies and can rule out these potential sources of trends, then non-detrended calibration should be correct.

Surely, the econometrics literature may offer more sophisticated solutions to this problem, and we would be well advised to look more carefully into some of these more professional studies.

Ironically, in each of the three papers we have submitted in which reconstruction methods are tested (VS04, VS06 and one under revision), at least one reviewer required us to test the method with red-noise pseudo-proxies (or proxies with random trends). In VS06 this test was not even in the first draft and was included at the request of a reviewer. This indicates that the problem is recognized by at least some in the paleo community.

All this is, however, not really essential, since the method fails even with non-detrended calibration and even with white noise, and in both models tested (ECHO-G and HadCM3). Bürger, Fast and Cubasch had already pointed this out in January in the Tellus paper, which had been submitted to Science in spring 2005. Science did not consider it relevant enough for publication at that time, although we explicitly recommended it. Now, for some reason (or perhaps by chance), they have changed their opinion. In my humble opinion, this paper is, however, better than the Wahl et al. comment and actually better than our VS04, since it delves in much more detail into the causes of the failure of many more methods.

21 Comments

  1. Peter Hartley
    Posted May 2, 2006 at 7:52 AM | Permalink

    Re: “Actually, we know that this is not just an assumption, since many tree-ring indicators and local temperatures do show a different link before and after approximately 1980, so that there must be a source of non-climatic long-term trends. As this behavior is not really understood, one has to assume that it could also have happened in the past.”

    I have a question: To what extent is the deteriorating correlation between tree ring indicators and temperatures post-1980 a phenomenon of local temperatures as opposed to globally averaged surface temperature series?

    I seem to recall Steve posted examples where the local temperature correlated with tree ring indicators but neither the local temperature nor the tree ring indicators correlated with the globally-averaged series. If so, one wonders whether the poor correlation with average ground temperatures post-1980 is due to systematic non-stationary errors in those temperature measures, resulting from inaccurate correction for urban heat islands and other factors such as changes in sample composition. For example, I think Douglas Hoyt suggested that the balloon and satellite temperatures were more closely correlated with tree ring widths and densities than the globally-averaged surface temperatures.

    On the other hand, I think Steve has also posted examples where the tree ring measurements did not correlate well even with local temperatures. There has been much discussion on this site of other confounding influences such as precipitation, CO2, other nutrients, etc. It would seem to me that substantial progress could be made by taking modern data and correlating it with the huge number of local variables we can measure today. It is not just that the proxies should be “brought up to date”; they should be brought up to date along with a whole battery of other data we can measure. If this is not being done, why not?

  2. John Davis
    Posted May 2, 2006 at 9:52 AM | Permalink

    Re #1.
    I rather think that the “divergence problem” of poor recent tree ring proxies is simply the result of selecting, from a fairly random sample, those series which best fit a steep trend over a given “calibration” period. Unsurprisingly, if the samples are revisited at some later date, the fit is markedly worse and trends back towards the mean.

  3. Steve McIntyre
    Posted May 2, 2006 at 9:57 AM | Permalink

    I’ve been re-reading some articles on “robust statistics”, especially by Frank Hampel, a leading scholar, and the following quote in Robust Inference (ftp://ftp.stat.math.ethz.ch/Research-Reports/93.pdf) caught my eye, among many other important observations:

    Leverage points can be good or bad; if they are proper observations, they contain a lot of information, much more than the other points, and can be extremely valuable (e.g., an isolated report on a solar eclipse in antiquity, together with many modern data). But if they are gross errors, they can completely spoil the fit to the “good” data. It is therefore of utmost importance in practice to identify and study all leverage points and try to find out whether they are trustworthy or dangerously misleading (this is mostly not a statistical problem!).

    Think about this in the context of the various MBH commentaries. The bristlecones are, in Hampel’s sense, a “leverage point”. We identified them as a leverage point a year ago, by following the biased PC method and by showing (in our EE article especially) the degree of leverage of the bristlecones (doing business as the North American PC1) and the Gaspe cedars on the MBH98 reconstruction. In effect, we were viewing the various proxy series as “points” and identifying leverage. Our 2005 articles are phrased entirely in terms of robustness, rather than correctness. It’s a little frustrating to see this debate still stuck at the level of “correctness”, for which I doubt that there is a real answer.
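
    In operational terms, Hampel’s leverage points can be flagged from the diagonal of the hat matrix of a least-squares fit. A minimal sketch on hypothetical data (not the MBH network):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # 49 ordinary predictor values plus one extreme value: a leverage point
    x = np.concatenate([rng.standard_normal(49), [8.0]])
    y = 0.5 * x + rng.standard_normal(50)

    X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
    H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix of the OLS fit
    leverage = np.diag(H)

    # common rule of thumb: flag leverage > 2p/n (p = 2 parameters here)
    flagged = np.where(leverage > 2 * X.shape[1] / len(x))[0]
    print(flagged, leverage[flagged])           # the extreme point dominates
    ```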

    MBH98 non-robustness also emerges through the difference between detrended and non-detrended calibration. In the empirical case of MBH98, the reconstructions with detrended and non-detrended calibration are virtually the same except for the 15th century, where the difference arises because the two methods assign different weights to the bristlecones.

    I would view this as simply another way of getting to the bristlecones as a leverage point. We got there by examining retained PC series, but you can get there by examining the leverage points through different calibration methods.

    In our 2005 articles, we then examined the literature regarding bristlecones to see whether they had been established as a temperature proxy and found that specialist opinion, even Hughes, did not make that claim.

    Simply by genuflecting on the various multiproxy series, you can’t decide whether the bristlecones dba the NOAMER PC1 are a valid temperature proxy. Simply shouting loudly that non-detrended methods are “correct” doesn’t make the bristlecones a valid proxy. Either they are or they aren’t, but that has to be decided on the merits. Rob Wilson thinks that they aren’t as bad as I think. Maybe so, but someone has to show that. The literature as it stands now does not show it. Further, CO2 fertilization was a known caveat in IPCC 2AR and should not have been ignored in MBH98 without warning labels. There is a highly veiled discussion in MBH99 of a supposed adjustment for CO2 fertilization, but the “adjustment” was IMHO nonsense and definitely did not remove the leverage; it attenuated it only a little.

    Here, in my opinion, people should try to move away from an abstract concept of what is the “correct” method – detrended vs non-detrended calibration, or 5 retained PCs versus 2 retained PCs – to recognizing that the differences resulting from these methods have identified a “leverage point” (the bristlecones).

  4. Steve McIntyre
    Posted May 2, 2006 at 10:07 AM | Permalink

    Also, Eduardo, I strongly disagree with your characterization of our results here, in which you repeat the claim in your GRL paper. I don’t object to testing the impact through simulations, only to how you did it.

    1) You did not implement the actual Mann algorithm. You did an SVD on the correlation matrix of decentered data and not on the rectangular data matrix itself. There’s a big, big difference. I’ve checked the code that you just sent me and I’m 100% certain of this. While you acknowledged the existence of the AHS bias in Mannian PC methods (based on a pers. comm. from Zwiers), it doesn’t look like you ever benchmarked your PC methodology to ensure that you were able to produce the AHS effect from red noise. There’s some reconciliation that we need to do.

    2) The amount of signal content makes a difference to where the “breakdown” in signal recovery occurs. In the signal-to-noise ranges that you use, the MBH98 method and conventional PC analysis will both pick up a signal. However, below 20% signal / 80% noise, HS-mining competes with signal detection.

    3) “Bad apples” – we used this term in our Reply to the Comment by VZ. Mann’s PC method magnifies the impact of bad apples. You only need one trending bristlecone in a 50-series network to make a HS trend appear “significant”. It is the complete opposite of the point of view of robust statistics.

  5. Ross McKitrick
    Posted May 2, 2006 at 10:07 AM | Permalink

    Eduardo, thanks for your extended comment. The literature on trend estimation remains very active in econometrics. Timothy Vogelsang is one of the leaders in it and has applied his methods to temperature data – his papers are worth a look.

    It’s important to distinguish the issue of trend identification from nonstationary regression. Trend identification in economics helps to distinguish permanent productivity growth from transitory business-cycle effects. This presupposes the data are stationary (or trend-stationary). Nonstationarity is predicted to arise in economic data because of forward-looking behaviour, and can yield spurious correlations even in data with no trend. But I would be very surprised if GCM-generated temperature series were nonstationary. Assuming they are not, you are right that removing common trends and cycles (e.g. the Sydney–Toronto example) is important in this kind of analysis, to avoid falsely identifying the shared trend or cycle as a structural component. The argument that detrending removes the information being sought (in this case a structural link between temperature and proxy) seems to me to come close to conceding that tree rings do not capture annual-frequency information. In any case, the question of whether a trend term should be included can be settled empirically, as I said in my post yesterday, so I can’t see why people try to decide it a priori. Not having seen the regressions, I’d be very surprised if a trend term weren’t highly significant, in which case the data should be detrended.
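
    A sketch of the empirical test Ross has in mind might look like the following – hypothetical, not his actual regressions: regress the proxy on temperature plus a deterministic trend term and inspect the trend coefficient’s t-statistic; if it is highly significant, the shared trend is carrying the fit and the data should be detrended:

    ```python
    import numpy as np

    def trend_t_stat(y, x):
        """OLS of y on [const, x, linear trend]; returns the t-statistic
        of the trend coefficient. A large |t| says the trend term, not x,
        is carrying much of the fit."""
        n = len(y)
        t = np.arange(n, dtype=float)
        X = np.column_stack([np.ones(n), x, t])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (n - X.shape[1])        # residual variance
        cov = s2 * np.linalg.inv(X.T @ X)            # OLS covariance matrix
        return beta[2] / np.sqrt(cov[2, 2])
    ```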

    Thank you for clarifying the inherent limits of your pseudoproxy simulations. However, I don’t think it is sufficient to say that you left out ‘bad apples’ in order to avoid an accusation that the study was an attempt to discredit the method by construction. In the MBH98 context, bad apples are a reality in the data barrel. We argued that by leaving them out of your pseudoproxy matrix you erred in the other direction – giving too much credit to the method, by construction. The test of all this, we argued, was that your simulation model cannot generate a (spuriously) high RE and a zero r2. If your RE statistic is high, so will be your r2 (and vice versa). Are we right in this?
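
    For reference, the two verification statistics can be computed as below – a generic sketch using the standard definitions (RE benchmarked against the calibration-period mean), not any particular paper’s code. A reconstruction that merely carries a mean offset into the verification period can score RE > 0 while r2 stays near zero, which is exactly the diagnostic proposed here:

    ```python
    import numpy as np

    def re_and_r2(obs, recon, cal_mean):
        """Verification-period skill scores.
        RE = 1 - SSE(recon) / SSE(calibration-period mean as predictor);
        r2 is the squared correlation, blind to offsets in mean and variance."""
        err = obs - recon
        ref = obs - cal_mean
        re = 1.0 - (err @ err) / (ref @ ref)
        r = np.corrcoef(obs, recon)[0, 1]
        return re, r ** 2
    ```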

  6. jae
    Posted May 2, 2006 at 10:13 AM | Permalink

    Re #1: There is a lot of speculation about the “divergence” issue. It is intriguing to me that the Idsos show instrumental records for a different location each week in their weekly CO2 Science newsletter, and all of the locations they have shown to date indicate a decreasing temperature (of course, it appears that they are “cherry picking” by ignoring big cities). If tree rings are recording a temperature signal, maybe the divergence is due to global cooling! It is looking to me like the only way to demonstrate that the surface of the planet is warming is to incorporate instrumental records tainted by the “urban heat island effect”. Can someone point to a truly rural location that shows INCREASING temps?

  7. eduardo zorita
    Posted May 2, 2006 at 10:40 AM | Permalink

    #4.
    Steve,

    Yes, I can get an Artificial Hockey Stick with my code very easily, and I have already presented it at some project meetings. Take 100 red-noise time series, calculate the SVD of the decentered correlation matrix, and you obtain a very nice hockey stick as the leading PC. I think I sent you some figures some time ago, but now I am not completely sure.
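
    That recipe is short enough to sketch in full. The version below short-centers the rectangular data matrix before the SVD – the MBH-style variant Steve describes in #4, whereas Eduardo’s Fortran works from the correlation matrix, which is precisely the difference under reconciliation. The dimensions and AR(1) coefficient are illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    n_years, n_series, cal = 581, 100, 79   # e.g. 1400-1980, calibrate 1902-1980

    # 100 independent AR(1) red-noise pseudo-proxies: no common signal at all
    X = np.empty((n_years, n_series))
    X[0] = rng.standard_normal(n_series)
    for t in range(1, n_years):
        X[t] = 0.2 * X[t - 1] + rng.standard_normal(n_series)

    # "short centering": subtract the mean of the calibration period only
    Xd = X - X[-cal:].mean(axis=0)

    # leading PC of the decentered data matrix
    U, s, Vt = np.linalg.svd(Xd, full_matrices=False)
    pc1 = U[:, 0] * s[0]

    # calibration-period mean is ~0 by construction while the shaft sits
    # offset: a hockey-stick step out of pure noise
    print(pc1[:-cal].mean(), pc1[-cal:].mean())
    ```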

    I would probably agree with both of your last comments: it is reasonable to think that the signal-to-noise ratio is important, and if you go beyond 80% noise it is also reasonable to think that the properties of the noise (in this case red, and hence AHS-prone) will come out more clearly.

    To your third comment: I think that we misunderstand each other because we are thinking in different contexts. When we said “robust”, it was in the context of including new pseudo-proxies in Africa and Asia to achieve a more regular coverage, or of including proxies without noise – something like long instrumental records.
    We have never tested noise with strange properties, such as the bristlecone pines may contain. Our set-up is quite general: all pseudo-proxies equal, their number not diminishing backwards in time, simple white or AR(1) noise…

    I understand that perhaps you interpret our comment on your GRL paper as proof that centering does not play a role in the real world, but this is not the case.

    To Ross’ comment: I think that you are again (naturally) thinking of our comment on the GRL paper. I am thinking of VS04, or of more general questions about pseudo-proxies. You insist that particular flaws in MBH are caused by the bristlecone pines; this is OK, I have nothing to say about this. We focus on something perhaps a little more general.

    I hope this clarifies my previous comments somewhat.

  8. Mark
    Posted May 2, 2006 at 10:53 AM | Permalink

    jae, I think the Arctic has been shown to be warming. It’s pretty rural up there for sure! 🙂

    Mark

  9. jae
    Posted May 2, 2006 at 11:09 AM | Permalink

    Mark: I don’t know about that. There is so much confusing information out there. See:
    http://www.co2science.org/scripts/CO2ScienceB2C/articles/V8/N2/EDIT.jsp

  10. JerryB
    Posted May 2, 2006 at 11:21 AM | Permalink

    Regarding Arctic warming as a trend, vs. a one-time shift (it got warmer in 1976 and roughly stayed so), see Sue Ann Bowling’s http://climate.gi.alaska.edu/Bowling/FANB.html and her 1990 article to which it links.

    Then look up the PDO shift of 1976, of which Bowling was not aware when she wrote those articles.

  11. Mark
    Posted May 2, 2006 at 11:38 AM | Permalink

    I agree… but the Arctic is the one that you see even skeptics supporting. Antarctica, however, is cooling, with the exception of the area immediately surrounding the peninsula, which sticks out into the southern Pacific. Of course, I actually had a guy read the GISS station data and tell me “half of the stations are showing warming and half are showing cooling!” He failed to note that the only stations with more than 10 years of data show cooling, and the only ones showing warming are in the very small area around the peninsula. What a maroon.

    Mark

  12. Steve McIntyre
    Posted May 2, 2006 at 12:34 PM | Permalink

    Mark, jae, Jerry – please move this discussion to another thread – say one of the satellite threads or something like that.

  13. Mark
    Posted May 2, 2006 at 12:48 PM | Permalink

    Oh, sorry, got off topic and failed to notice the divergence.
    Mark

  14. jae
    Posted May 2, 2006 at 1:11 PM | Permalink

    Sorry, Steve. I’ll move to General/Tree Ring Widths #1

  15. TCO
    Posted May 2, 2006 at 11:59 PM | Permalink

    Thank you for dispatching the riffraff, Steve. Ed, I am reading VS06 right now (the one that talks about an MM reconstruction and fails to mention the 63 Bürger and Cubasch reconstructions – haha – but that is not my main concern). I will have some clarifying comments for you. Maybe it is easier if I make a statement and you correct it.

  16. Ross McKitrick
    Posted May 3, 2006 at 8:34 AM | Permalink

    #7, Yes, Eduardo, I am thinking about your comment on our paper in GRL, which is, of course, an application of the pseudoproxy method. However, it is more than just the problems with the bristlecones; it is the interaction between the bristlecones and the flawed method. By disputing our analysis based on your simulation, you are implicitly saying you correctly simulated the underlying problem. In your 3rd paragraph above (“The other side of the coin is that …”) you say, and I agree, that there are two possible explanations for why you didn’t find a big difference in outcomes: either you didn’t simulate the problem with the proxies, or there really isn’t a difference to find. In your note above you say the situation is like a non-rejected null, so the matter can’t be decided. But in our response we said: yes, it can, using the RE/r2 comparison. My working assumption is that your RE and r2 are both high, which means you didn’t correctly simulate the underlying problem (bristlecones interacting with the AHS effect to yield a spuriously high RE even when there is no significant fit). I am quite willing to be proved wrong, but you have the numbers, not me.

  17. eduardo zorita
    Posted May 3, 2006 at 9:15 AM | Permalink

    #16: “By disputing our analysis based on your simulation, you are implicitly saying you correctly simulated the underlying problem.”

    No, this is an extrapolation, and it depends on what one sees as the underlying problem (this is too vague for me) and on the meaning of “implicit”. We could only claim that the decentering did not cause a problem with the type of simple red-noise pseudo-proxies that we used, and in the ECHO-G world. Steve argues that MBH did something different from our implementation, and this point may be right or not, but that is a different matter from our claim.

    “…it is the interaction between the bristlecones and the flawed method.”
    I have nothing to argue against this. Do you mean that we did not represent a bristlecone in the pseudo-proxies properly enough? Of course we did not. If you explain to me the general statistical properties of a bristlecone, perhaps I could.

    “I am quite willing to be proved wrong, but you have the numbers, not me”
    Steve has the numbers.

  18. Steve McIntyre
    Posted May 3, 2006 at 10:55 AM | Permalink

    If anyone gets caught in Spam Karma, send me an email. Spam tends to come in waves and we are being inundated right now – 199 in the last 24 hours.

    Eduardo has sent me a lot of data together with the Fortran programs by which he calculated the PCs. I’ve sent some questions back to him and I expect that we will arrive at a complete reconciliation.

    To enable the reconciliation, I will post up a couple of tree-ring-sized networks together with PC1s using an MBH98-style method (which I’ve reconciled 100% to actual MBH calculations) and my interpretation of Eduardo’s implementation – which yields different results. I expect that this reconciliation will proceed smoothly and hopefully set an example to Mannians of how such reconciliations should be done, i.e. that attempts to reproduce results should be encouraged and obstacles should not be created. To do this, in my opinion, it’s important to have benchmark data, because people use different computer languages. I can see what Eduardo’s done in his Fortran programs and easily implement it in R. This is more instructive to me than simply running a Fortran program. Likewise, if I give him back benchmark data and my results, he can compare his results to confirm that I’ve replicated his methods exactly.

  19. Pat Frank
    Posted May 3, 2006 at 12:43 PM | Permalink

    #18, Steve wrote: “I expect that this reconciliation will proceed smoothly and hopefully set an example to Mannians of how such reconciliations should be done, i.e. that attempts to reproduce results should be encouraged and obstacles should not be created. To do this, in my opinion, it’s important to have benchmark data…”

    Actually, to do that requires people who are dedicated to knowledge for its own sake, and not for the sake of political action. Sincere commitment to the work, in other words, not tendentious connivance.

    Thank you, Eduardo, for showing everyone how a proper scientist approaches a difference of outcome – with cooperation, openness, and transparency. John A, please note.

    Thanks also to Steve M., who has consistently adhered to that standard here. We all want to see the correct answer, and in science that means the same answer determined by a maximally rigorous method, no matter the personal expectations or opinions.

  20. Bruce
    Posted May 3, 2006 at 3:12 PM | Permalink

    I would like to add my thanks to both Eduardo and Steve for engaging in a constructive manner designed to develop mutual understanding. It is hard to predict the outcome. Most likely both parties will learn and modify their approaches to at least some degree, and arrive at a common view that will provide a sound footing for further work. Sounds like good science to me.

  21. BradH
    Posted May 3, 2006 at 9:11 PM | Permalink

    Bravo, Eduardo! A true scientist!

One Trackback

  1. […] to reader Alberto for providing the heads up. Alberto also points out that Zorita has done a post on Climate Audit in 2006. If you’re interested, you can get a perspective on his take of proxy […]