Eduardo Zorita sent the following in as a comment on earlier postings. As I did on a similar occasion with Rob Wilson, I’m re-posting this as a separate post on its own to ensure that it’s properly noted.
Steve (I see that some of my comments below were already dealt with over the weekend. Sorry for any repetition.)
I will try to address some of the numerous points that you have raised. But first, let me try to explain a little what the pseudo-proxy approach can and cannot achieve. In the experimental sciences one cannot really prove a theory; one can only falsify it by performing an experiment in which the theory does not seem to hold. In paleoclimate we obviously cannot do experiments, so we resort to parallel worlds that mimic the real world to a certain degree of realism. In the pseudo-proxy approach these parallel worlds are the output of climate models, and the same idea has also been applied in quite distant areas of research, for instance to test methods for disentangling the genetic lineage of organisms. The drawback is that we cannot represent the real world that realistically: we cannot grow bristlecone pines inside the computer, so we have to simplify the problem and produce something that could look like a dendrochronological (or other proxy) time series. Given these limitations, you are bound in this approach by two factors. First, you can try to be as realistic (or pessimistic, if you prefer) as possible, generating artificial “bad apples” and testing whatever method you prefer. If the method does not perform well, your study can always be regarded as too pessimistic, and therefore not relevant for the real world: “you have constructed the bad apples to discredit the method”.
On the other hand, one has to reach some degree of realism, to avoid a second caveat that is best illustrated with an example. Imagine that you have a marvelous proxy P that shows a correlation of 1 with the Northern Hemisphere temperature. In this case any method, indeed even the simplest one, T=P, will perform perfectly, but you will not be able to claim that the method is right, because the starting point was unrealistic. So one has to design proxies that are, on the one hand, realistic enough but that, on the other hand, tend to be optimistic, so that at the end of your analysis you can write something like “even in this optimistic scenario, the method”. Therefore, one cannot test the method in “isolation”: the input data are also important.
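To make the idea concrete, here is a minimal sketch of how a pseudo-proxy can be built: a "true" series plus noise of a chosen size. Everything in it is an assumption for illustration. The series `temp` is synthetic random data standing in for model output (it is not ECHO-G output), and `make_pseudoproxy` and `noise_sd` are hypothetical names, not taken from any of the papers discussed.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def make_pseudoproxy(temp, noise_sd, rng):
    """Hypothetical pseudo-proxy: the 'true' series plus white noise."""
    return [t + rng.gauss(0.0, noise_sd) for t in temp]


rng = random.Random(0)
# Synthetic stand-in for a model temperature series (unit variance).
temp = [rng.gauss(0.0, 1.0) for _ in range(1000)]

perfect = make_pseudoproxy(temp, 0.0, rng)  # the "marvelous" proxy, T = P
noisy = make_pseudoproxy(temp, 2.0, rng)    # noise sd twice the signal sd

r_perfect = pearson(temp, perfect)
r_noisy = pearson(temp, noisy)
print(round(r_perfect, 3), round(r_noisy, 3))
```

With zero noise the proxy is the temperature itself, so any method looks perfect. With a noise standard deviation of 2 the expected correlation is about 1/sqrt(5), roughly 0.45, and it is only at such noise levels that a test of the method starts to say something.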
The other side of the coin is that if you do not find something very significant, for instance in our response to your GRL paper, in which we did not find a large difference between “normal” PC centering and MBH-PC centering, it can of course be because we were too optimistic in our generation of proxies, or because the differences do not exist. We found that in the world represented by ECHO-G and by our pseudoproxies these differences really were not large. Nothing more, but nothing less. This problem is similar to that in statistical hypothesis testing, and in science in general. Not being able to reject the null hypothesis, in this case that the differences do not exist, does not mean that you have proven it.
Now, to some particular points:
Yes, the PC-variance rescaling is implemented in V06, although I personally think it is wrong. After finding the optimal (defined in some way) regression parameters, this rescaling shifts their values away from the optimum. Interestingly, there is a paper on this point that has not been cited in all this discussion, written quite a few years ago by Bürger in Climate Research 1996 (the same Bürger as in Bürger and Cubasch) in the context of statistical downscaling. Statistical downscaling denotes the methods used to estimate regional climate change from the output of global climate models, and technically it is a problem similar to that of climate reconstruction: the targets this time are the local variables, and the predictors are the large-scale fields. In this paper the tension between optimal estimation of the mean and variance conservation is clearly illustrated.
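The tension between optimal estimation and variance conservation can be seen in a deliberately simple sketch. It uses one predictor and one target, both synthetic, and is not the V06 or MBH implementation; all names are hypothetical. For a least-squares fit with correlation r, the fitted values have variance r² times that of the target. Inflating them to match the target variance raises the in-sample mean squared error from var(y)(1 − r²) to 2 var(y)(1 − r), which is never smaller.

```python
import math
import random

rng = random.Random(1)
n = 500
# Synthetic target (think: temperature) and a noisy predictor (think: a PC).
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [v + rng.gauss(0.0, 1.5) for v in y]


def mean(v):
    return sum(v) / len(v)


def variance(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def mse(estimate, target):
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


mx, my = mean(x), mean(y)
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

# Step 1: ordinary least-squares fit, optimal in the mean-squared-error sense.
slope = cov_xy / variance(x)
fit = [my + slope * (a - mx) for a in x]

# Step 2: rescale the fitted values so their variance matches the target's.
scale = math.sqrt(variance(y) / variance(fit))
rescaled = [my + scale * (f - my) for f in fit]

mse_fit = mse(fit, y)
mse_rescaled = mse(rescaled, y)
print("variance:", variance(y), variance(fit), variance(rescaled))
print("mse:", mse_fit, mse_rescaled)
```

The rescaled series recovers the target variance, but only by moving the regression coefficient away from its least-squares value, so its error is larger.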
-Detrended or non-detrended calibration. This is a well-known issue and, to my knowledge, it has been considered in the statistical literature under different names: partial correlation, non-stationary regression, regression with serially correlated data. The first paper seems to have been written by Yule as early as 1926 (“Why do we sometimes get nonsense correlations between timeseries?”), and I recently read a review paper on this topic written by Phillips in 2005 (“Challenges of trending timeseries in econometrics”). So the literature must be large. In climate research it is actually very well recognized: this is why, for instance, to calculate the power of the monthly temperatures in Sydney to predict simultaneous monthly temperatures in Toronto you filter out the annual cycle. Otherwise you get a very nice high anticorrelation, which is of course useless. Or you can try to predict the number of births from its correlation with the number of storks, both showing a trend due to urbanization: again a nice, albeit useless, correlation, unless you believe that storks may indeed play a role. Many other examples abound; one particularly nice one, indicating a very high (of course spurious) correlation between Northern Hemisphere temperature and West German unemployment, was shown at the NAS panel meeting. To ascertain a real link you need a certain number of degrees of freedom, and a long-term trend is just one number, which can be arbitrarily rescaled through the calibration step to any other number one pleases. I think this is widely recognized in the analysis of instrumental data, but surprisingly not in paleoclimate.
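The storks-and-births effect is easy to reproduce with a small sketch. The two series below are synthetic and independent apart from a shared linear trend; the names `storks` and `births` are only labels, and removing a least-squares linear trend is just one of several possible detrending choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def detrend(series):
    """Remove the least-squares linear trend (and the mean) from a series."""
    n = len(series)
    t_mean = (n - 1) / 2
    s_mean = sum(series) / n
    slope = sum((i - t_mean) * (s - s_mean) for i, s in enumerate(series)) / sum(
        (i - t_mean) ** 2 for i in range(n)
    )
    return [s - s_mean - slope * (i - t_mean) for i, s in enumerate(series)]


rng = random.Random(2)
n = 200
# Two unrelated white-noise series that happen to share a downward trend.
storks = [-0.05 * i + rng.gauss(0.0, 1.0) for i in range(n)]
births = [-0.05 * i + rng.gauss(0.0, 1.0) for i in range(n)]

r_raw = pearson(storks, births)
r_detrended = pearson(detrend(storks), detrend(births))
print(round(r_raw, 2), round(r_detrended, 2))
```

The raw correlation is high only because of the common trend, which contributes a single degree of freedom. Once the trend is removed, what remains is the correlation between two independent noise series, which is close to zero.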
In the case of proxies, you would have to believe that the long-term trends in the proxy are completely due to the impact of its local climate or, to be more accurate, to the impact of local temperature. This may or may not be the case, as proxies may be affected by many other long-term effects, especially in the 20th century, such as precipitation, nutrients, changes in the amplitude of the annual cycle, biological adaptation, and a long list of others. Actually, we know that this is not just an assumption, since many tree-ring indicators and local temperatures do show a different link before and after approximately 1980, so there must be a source of non-climatic long-term trends. As this behavior is not really understood, one has to assume that it could also have happened in the past.
This is essentially the rationale for detrending, or alternatively for including random trends in the pseudo-proxies if one relies on non-detrended calibration. Alternatively, if one has a very good knowledge of the proxies and can rule out these potential sources of trends, then non-detrended calibration should be correct.
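One common way to give pseudo-proxies non-climatic low-frequency wander is to contaminate them with AR(1) "red" noise instead of white noise. The sketch below is only an illustration under that assumption: the coefficient 0.7, the synthetic `temp` series and the function names are hypothetical, and it is not claimed to match the noise models used in the papers discussed.

```python
import random


def ar1_noise(n, phi, rng):
    """AR(1) 'red' noise: e[t] = phi * e[t-1] + white-noise innovation."""
    out = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        out.append(phi * out[-1] + rng.gauss(0.0, 1.0))
    return out


def lag1_autocorr(v):
    """Sample lag-1 autocorrelation of a series."""
    n = len(v)
    m = sum(v) / n
    num = sum((v[i] - m) * (v[i + 1] - m) for i in range(n - 1))
    den = sum((a - m) ** 2 for a in v)
    return num / den


rng = random.Random(3)
n = 2000
temp = [rng.gauss(0.0, 1.0) for _ in range(n)]  # synthetic stand-in signal

white = [rng.gauss(0.0, 1.0) for _ in range(n)]
red = ar1_noise(n, 0.7, rng)

# Two pseudo-proxies built from the same signal with different noise colors.
proxy_white = [t + e for t, e in zip(temp, white)]
proxy_red = [t + e for t, e in zip(temp, red)]

r_white = lag1_autocorr(white)
r_red = lag1_autocorr(red)
print(round(r_white, 2), round(r_red, 2))
```

The red noise keeps a memory of its own past values, so over a short calibration window it can look like a trend that has nothing to do with temperature.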
Surely the econometrics literature offers more sophisticated solutions to this problem, and we would be well advised to look more carefully into some of these more professional studies.
Ironically, for each of the three submitted papers in which reconstruction methods are tested (VS04, VS06 and one under revision), at least one reviewer required that the method be tested with red-noise pseudo-proxies (or proxies with random trends). In VS06 this test was not even in the first draft and was included at the request of a reviewer. This indicates that the problem is recognized by at least some in the paleo community.
All this is, however, not really essential, since the method also fails even with non-detrended calibration and even with white noise, and in both models tested (ECHO-G and HadCM3). Bürger, Fast and Cubasch had already pointed this out in January in the Tellus paper, which had been submitted to Science in spring 2005. Science did not consider it relevant enough for publication at that time, although we explicitly recommended it. Now, for some reason (or perhaps by chance), they have changed their opinion. In my humble opinion this paper is, however, better than the Wahl et al. comment and actually better than our VS04, since it delves in much more detail into the causes of the failure of many more methods.