Could be translated as:

[Response: We, **the approved climate “scientists”**, have stressed repeatedly that single scientists

You might want to tuck the following RealClimate comment away for when the NAS panel’s results are released.

[Response: We have stressed repeatedly that single scientists and single papers are not the things that the public or policy-makers should be paying much attention to. Instead, they should pay attention to the consensus summaries such as are produced by the National Academies or the IPCC where all of the science can be assimilated and put in context. In such summaries, it is very clear what everyone agrees on (gravity, the human created rise in CO2, the physics of the greenhouse effect, conservation of energy etc.), what still remains uncertain (aerosols etc.) and what implications these uncertainties may have. – gavin]

N(effective) = N (1-R1) / (1+R1)

where R1 is the lag one autocorrelation. Here’s a table of R1 versus (1-R1) / (1+R1) for some reasonable values of R1

R1      (1-R1) / (1+R1)
0.50    0.33
0.60    0.25
0.70    0.18
0.80    0.11
0.90    0.05
0.95    0.03

As Steve points out in #52, it only takes a lag one autocorrelation of 0.82, which is common in many temperature series, to give a ten-fold reduction in N …
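The table and the #52 claim can be checked directly. A minimal sketch (function name is mine, not from the comment) of the effective-sample-size factor:

```python
# Effective sample size for an AR(1)-like series:
#   N_eff = N * (1 - r1) / (1 + r1)
# where r1 is the lag-one autocorrelation.
def n_eff_factor(r1):
    """Reduction factor (1 - r1) / (1 + r1) applied to the nominal N."""
    return (1.0 - r1) / (1.0 + r1)

for r1 in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.82):
    print(f"r1 = {r1:4.2f}  ->  N_eff / N = {n_eff_factor(r1):.3f}")
```

For r1 = 0.82 the factor is about 0.099, i.e. the ten-fold reduction mentioned above.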

w.

(I see that some of my comments below have already been dealt with during the weekend. Sorry for the possible repetitions.)

I will try to address some of the numerous points you have raised. But first, let me try to explain a little what the pseudo-proxy approach can and cannot achieve.

In experimental sciences one cannot really prove a theory; one can only falsify it by performing an experiment in which the theory does not seem to hold. In paleoclimate we obviously cannot do experiments, so we resort to parallel worlds that mimic, to a certain degree of realism, the real world. In the pseudo-proxy approach these parallel worlds are the output of climate models, and the same idea has also been applied in other, far-away areas of research, for instance to test methods for disentangling the genetic lineage of organisms. The drawback is that we cannot represent the real world all that realistically: we cannot grow bristlecone pines inside the computer, so we have to simplify the problem and produce something that could look like a dendrochronological (or other proxy) time series.

Given these limitations, you are bound in this approach by two factors. First, you can try to be as realistic (or pessimistic, if you prefer) as possible, generating artificial “bad apples”, and test whatever method you prefer. If the method does not perform well, your study can always be dismissed as too pessimistic, and therefore not relevant for the real world: “you have constructed the bad apples to discredit the method.” On the other hand, one has to reach some degree of realism, to avoid a second caveat that is better illustrated with an example. Imagine that you have a marvelous proxy P that shows a correlation of 1 with the Northern Hemisphere temperature. In this case any method, indeed the simplest one, T = P, will perform perfectly, but you will not be able to claim that the method is right, because the starting point was unrealistic. So one has to design proxies that are, on the one hand, realistic enough but, on the other hand, tend to be optimistic, so that at the end of your analysis you can write something like “even in this optimistic scenario, the method….”
Therefore, one cannot test the method in “isolation”: the input data are also important.
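The trade-off described above can be made concrete with a toy sketch (entirely my own construction, not from the comment): degrade a “model temperature” series with noise of varying strength to get pseudo-proxies of varying quality.

```python
import random

random.seed(0)

def make_pseudo_proxy(temperature, noise_sd):
    """Pseudo-proxy = model temperature plus white noise of given sd."""
    return [t + random.gauss(0.0, noise_sd) for t in temperature]

def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# A stand-in for a model's hemispheric temperature series.
temperature = [random.gauss(0.0, 1.0) for _ in range(1000)]

clean = make_pseudo_proxy(temperature, noise_sd=0.1)  # unrealistically good
noisy = make_pseudo_proxy(temperature, noise_sd=2.0)  # closer to a real proxy

print(correlation(temperature, clean))   # near 1: the T = P trap above
print(correlation(temperature, noisy))   # much weaker: where testing is meaningful
```

With the near-perfect proxy, any method looks good and the test proves nothing; only once the proxies are degraded does a comparison of methods carry information.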

The other side of the coin is that if you do not find something very significant (for instance, in our response to your GRL paper we did not find a large difference between “normal” PC centering and MBH-style PC centering), it can of course be because we were too optimistic in our generation of proxies, or because the differences do not exist. We found that in the world represented by ECHO-G and by our pseudo-proxies these differences really were not large. Nothing more, but nothing less. This problem is similar to statistical hypothesis testing, and to science in general: not being able to reject the null hypothesis, in this case that the differences do not exist, does not mean that you have proven it.

Now, to some particular points:

Yes, the PC-variance rescaling is implemented in VS06, although I personally think it is wrong: after finding the optimal (defined in some way) regression parameters, this rescaling shifts their values away from the optimum. Interestingly, there is a paper that has not been cited in all this discussion of this point, written quite a few years ago by Bürger in Climate Research 1996 (the same Bürger as in Bürger and Cubasch) in the context of statistical downscaling. Statistical downscaling denotes the methods for estimating regional climate change from the output of global climate models, and technically it is a problem similar to that of climate reconstruction: the targets this time are the local variables, the predictors the large-scale fields. In that paper the tension between optimal estimation of the mean and conservation of variance is clearly illustrated.

- Detrended or non-detrended calibration. This is a well-known issue, and to my knowledge it has been considered in the statistical literature under different names: partial correlation, non-stationary regression, regression with serially correlated data. The first paper seems to have been written by Yule as early as 1926 (“Why do we sometimes get nonsense correlations between time-series?”), and I recently read a review paper on this topic written by Phillips in 2005 (“Challenges of trending time series in econometrics”), so the literature must be large. In climate research it is actually very well recognized: this is why, for instance, to calculate the power of monthly temperatures in Sydney to predict simultaneous monthly temperatures in Toronto you filter out the annual cycle. Otherwise you get a very nice, high anticorrelation, which is of course useless. Or you can try to predict the number of births from its correlation with the number of storks, both showing a trend due to urbanization: again a nice, albeit useless, correlation, unless you believe that storks may indeed play a role. Many other examples abound; one particularly nice one, showing a very high (of course spurious) correlation between Northern Hemisphere temperature and West German unemployment, was presented at the NAS panel meeting. To ascertain a real link you need a certain number of degrees of freedom, and a long-term trend is just one number, which can be arbitrarily rescaled through the calibration step to any other number one pleases. I think this is widely recognized in the analysis of instrumental data, but surprisingly not in paleoclimate.
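The stork/trend point above is easy to demonstrate numerically. A toy sketch (my own numbers, not from the comment): two independent noisy series that share only a linear trend correlate strongly, and detrending removes the spurious link.

```python
import random

random.seed(1)
n = 100
births = [0.5 * i + random.gauss(0, 3) for i in range(n)]  # trend + noise
storks = [0.3 * i + random.gauss(0, 3) for i in range(n)]  # independent noise, shared trend shape

def detrend(y):
    """Remove the least-squares straight line from y."""
    m = len(y)
    mx = (m - 1) / 2.0
    my = sum(y) / m
    sxx = sum((i - mx) ** 2 for i in range(m))
    slope = sum((i - mx) * (v - my) for i, v in enumerate(y)) / sxx
    return [v - (my + slope * (i - mx)) for i, v in enumerate(y)]

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

print(corr(births, storks))                    # high: only the shared trend
print(corr(detrend(births), detrend(storks)))  # near zero: no real link
```

The trend contributes, in effect, a single degree of freedom, which is why the raw correlation is high but carries no evidence of a physical connection.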

In the case of proxies, you would have to believe that the long-term trends in the proxy are entirely due to the impact of its local climate, or to be more accurate, due to the impact of local temperature. This may or may not be the case, as proxies can be affected by many other long-term effects, especially in the 20th century: precipitation, nutrients, changes in the amplitude of the annual cycle, biological adaptation, and a long list of others. Actually, we know this is not just an assumption, since many tree-ring indicators and local temperatures do show a different link before and after approximately 1980, so there must be a source of non-climatic long-term trends. As this behavior is not really understood, one has to assume that it could also have happened in the past.

This is essentially the rationale for detrending, or alternatively for including random trends in the pseudo-proxies if one relies on non-detrended calibration. Conversely, if one has very good knowledge of the proxies and can rule out these potential sources of trends, then non-detrended calibration should be correct.

Surely the econometrics literature may offer more sophisticated solutions to this problem, and we would be well advised to look more carefully into some of these more professional studies.

Ironically, in every one of the three papers submitted in which reconstruction methods are tested (VS04, VS06 and one under revision), at least one reviewer required the method to be tested with red-noise pseudo-proxies (or proxies with random trends). In VS06 this was not even in the first draft and was included at the request of a reviewer. This indicates that the problem is recognized by at least some in the paleo community.

All this is, however, not really essential, since the method also fails even with non-detrended calibration and even with white noise, and in both models tested (ECHO-G and HadCM3). Bürger, Fast and Cubasch had already pointed this out in January in the Tellus paper, which had been submitted to Science in spring 2005. Science did not consider it relevant enough for publication at that time, although we explicitly recommended it; now, for some reason (or perhaps by chance), they have changed their opinion. In my humble opinion this paper is, however, better than the Wahl et al. comment, and actually better than our VS04, since it delves in much more detail into the causes of the failure of many more methods.

Would it not be advantageous to move your filter in one-sample steps rather than by the length of the filter? For example, the 4-year average would be samples 1-4, 2-5, 3-6 …
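The two ways of applying the filter can be sketched side by side (function names are mine): non-overlapping blocks versus a window slid one sample at a time.

```python
def block_means(x, width):
    """4-year-style averaging in non-overlapping blocks: samples 1-4, 5-8, ..."""
    return [sum(x[i:i + width]) / width
            for i in range(0, len(x) - width + 1, width)]

def sliding_means(x, width):
    """The same filter moved in one-sample steps: samples 1-4, 2-5, 3-6, ..."""
    return [sum(x[i:i + width]) / width
            for i in range(len(x) - width + 1)]

x = list(range(1, 13))       # 12 "annual" values
print(block_means(x, 4))     # 3 non-overlapping bins
print(sliding_means(x, 4))   # 9 overlapping windows
```

Note that the sliding version gives many more output points, but the overlapping windows are strongly correlated with each other, so they do not add degrees of freedom in the sense discussed elsewhere in this thread.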

Now, to make a reconstruction, the argument is that there is a low-frequency relationship without there being a high-frequency relationship, as evidenced by, say, the 2^5-year scale. But in a series of length 128 you only have 4 measurements to fit at that scale. With the 64-year scale, which is not even centennial, you only have 2 bins. So how do you get any confidence intervals? You might get an r^2 of 100%, but the t-statistic won’t be significant.
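The two-bin case is the extreme illustration of this: a straight line through two points fits exactly whatever the data, so r^2 = 1 by construction. A minimal sketch (my own toy values):

```python
def fit_r2(x, y):
    """r^2 of the least-squares straight line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Two 64-year bin means: any two numbers at all give a perfect fit.
print(fit_r2([0.0, 1.0], [3.2, -7.1]))   # 1.0, carrying no evidence
```

With zero residual degrees of freedom there is simply nothing left from which to form a confidence interval.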

In wavelet analysis, where they try to deal with scaling issues systematically, in series of length 128, the confidence intervals by the time you get to the 5th scale are from floor to ceiling.

It’s not whether you’ve calculated the value in the bin accurately; it’s that you’ve only got a very few low-frequency values with which to establish a relationship.

And by the way, the estimation of the mean in autocorrelated time series is fraught with problems totally ignored by the Hockey Team, although this is a different issue. Once the data cease to be “independent”, the variance of the mean estimate does not decline as 1/n but at a much lower rate, and there are quite plausible circumstances under which Hockey Team methods would underestimate it by an order of magnitude. I’ll post up some references from Hampel, whom I’ve been re-reading.
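This inflation is easy to see by Monte Carlo. A sketch under my own assumptions (AR(1) noise with unit marginal variance; parameters chosen for illustration): for lag-one autocorrelation r1, the variance of the sample mean is larger than the naive 1/n by roughly a factor (1 + r1) / (1 - r1).

```python
import random

random.seed(2)

def ar1_series(n, r1):
    """AR(1) series with unit marginal variance."""
    x, out = 0.0, []
    innov_sd = (1.0 - r1 * r1) ** 0.5
    for _ in range(n):
        x = r1 * x + random.gauss(0.0, innov_sd)
        out.append(x)
    return out

def var_of_mean(n, r1, trials=2000):
    """Monte-Carlo variance of the sample mean of an AR(1) series."""
    means = [sum(ar1_series(n, r1)) / n for _ in range(trials)]
    m = sum(means) / trials
    return sum((v - m) ** 2 for v in means) / trials

n, r1 = 200, 0.9
print(var_of_mean(n, r1))   # roughly (1 + r1)/(1 - r1) times larger than...
print(1.0 / n)              # ...the naive 1/n for independent data
```

With r1 = 0.9 the inflation factor is about 19, which is exactly the order-of-magnitude underestimate the comment warns about.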

They only claim that they can recover the verification-period mean. Well, this has (let me count) one degree of freedom, so you can’t have any confidence interval on it.

I don’t understand. A mean is a summary statistic calculated from a sample, and standard statistics gives a confidence interval for the mean.

Is there something different about this mean?
