Mann reported a “significant” correlation of -0.5481829 between Dongge dO18 and gridcell temperature. Today I will report on exactly how Mann calculated this “significant” correlation. In keeping with recent requests, I will refrain from making any comments on this procedure, in the confident expectation that my critics will provide some commentary on what they think of this procedure.

Let me start by describing the data sets used in the calculation. The temperature data used in the calculation is Mann’s infilled version of CRU data for gridcell 27.5N 107.5E, with the Mann version shown below against current CRU annual data (red dots). The CRU series starts in 1921, so the first half of the temperature data is not “observed” directly but has been “infilled” by Mann in a calculation that I have not yet had an opportunity to examine.

The Dongge O18 data is shown below (inverted orientation); the original data is not available annually but only in irregular years (shown in red dots), with the Mann data interpolated linearly. There are 11 values after 1921 (the start of the actual CRU instrumental record) and 33 values since 1850, the start of the infilled instrumental record. The age model used by Mann is the “tuned” age model, with the age apparently “tuned” using the method criticized by Gavin Schmidt in his critique of Loehle.
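For readers who want to see what the interpolation step does mechanically, here is a minimal sketch of putting irregularly dated readings onto an annual grid by linear interpolation. The years and values are invented placeholders, not Dongge data, and `interpolate_annual` is a hypothetical helper name, not anything from Mann's code.

```python
def interpolate_annual(years, values):
    """Linearly interpolate irregularly dated readings onto every year in between."""
    annual = {}
    points = list(zip(years, values))
    for (y0, v0), (y1, v1) in zip(points, points[1:]):
        for year in range(y0, y1):
            frac = (year - y0) / (y1 - y0)
            annual[year] = v0 + frac * (v1 - v0)
    annual[years[-1]] = values[-1]  # keep the final reading itself
    return annual

# Placeholder readings (NOT actual Dongge values)
sample_years = [1850, 1858, 1871, 1880]
sample_values = [-8.0, -7.6, -8.2, -7.9]
annual = interpolate_annual(sample_years, sample_values)
print(len(annual), "annual values built from", len(sample_years), "readings")
```

In this toy case 4 readings become 31 annual values, which illustrates how few independent observations can sit behind an annual-looking series.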

Mann’s low-frequency correlation proved to be the correlation between highly smoothed versions of both series: each series was Mannian smoothed using a Butterworth filter with f=0.05. For reference, the smoothed gridcell series is shown below (this is shown in SD Units below, together with the version extracted from clidatal, which matches very closely.)

Next here is the corresponding smoothed version of the Dongge O18 series:

In their SI, Mann et al say:

Owing to reduced degrees of freedom arising from modest temporal autocorrelation, the effective P value for annual screening is slightly higher …For the decadally resolved proxies, the effect is negligible because the decadal time scale of the smoothing is long compared with the intrinsic autocorrelation time scales of the data.

Obviously, the radical smoothing of these two series will reduce the number of degrees of freedom. Santer et al 2008 recently commented on the effect of autocorrelation on degrees of freedom and, presumably, one of the first observations that a co-author of Santer et al 2008 (such as Gavin Schmidt) reviewing this article would make is: ummm, Mike, can you flesh out your argument that autocorrelation in the “decadally resolved” series doesn’t matter?

Y’see, the number of years is 146 (1850-1995). The autocorrelation of the residuals in a linear regression is 0.9945544 and the resulting degrees of freedom using the Quenouille formula used in Santer et al 2008, N(1-r)/(1+r), is only 0.399, something that must have worried Gavin Schmidt.
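The arithmetic is easy to check. A short sketch using the two numbers quoted above (the function name `effective_sample_size` is mine, chosen for illustration):

```python
def effective_sample_size(n, r1):
    """Quenouille-style adjustment: n_eff = n * (1 - r1) / (1 + r1)."""
    return n * (1.0 - r1) / (1.0 + r1)

n_years = 146       # 1850-1995, as stated in the post
r1 = 0.9945544      # lag-1 autocorrelation of the regression residuals
n_eff = effective_sample_size(n_years, r1)
print(round(n_eff, 3))  # 0.399
```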

As an experiment, I tried the following procedure to calculate the relationship between O18 and Dongge gridcell temperature. I made the assumption that Dongge O18 could not teleconnect with future temperatures. Based on this assumption, for each year in which there was a Dongge speleo O18 reading, I calculated the average gridcell temperature for the prior years (up to the previous reading.) The results are shown below (with the “binned” temperature as red dots), compared with the original “infilled” series and the Mannian smooth.
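The binning can be sketched roughly as follows. This is an illustration under stated assumptions, not the script used for the figure: the helper name `bin_prior_years` is hypothetical, the toy temperatures are invented, and I assume here that each bin runs from the year after the previous reading up to and including the year of the current reading.

```python
def bin_prior_years(reading_years, annual_temp):
    """Average temperature over the years since the previous reading.

    Assumption: the bin for a reading covers (previous reading year, reading year],
    so no temperature from after the reading is ever used.
    """
    binned = {}
    for prev, curr in zip(reading_years, reading_years[1:]):
        window = [annual_temp[y] for y in range(prev + 1, curr + 1) if y in annual_temp]
        if window:
            binned[curr] = sum(window) / len(window)
    return binned

# Toy annual series (NOT gridcell data): temperature equals years since 1850
toy_temp = {year: float(year - 1850) for year in range(1850, 1861)}
toy_readings = [1850, 1853, 1860]
binned = bin_prior_years(toy_readings, toy_temp)
print(binned)
```

The first reading gets no bin in this sketch because there is no earlier reading to mark where its window starts; other conventions are possible.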

Now I realize that this procedure does not exploit all possible covariance information between Dongge O18 and ring widths of Argentine cypress, but this is just a blog and not a “peer reviewed” publication in an esteemed journal such as PNAS. If critics will grant me the permission to proceed with the analysis on this basis, below is a scatter plot between the “binned” gridcell temperature and Dongge O18.

The r2 of the relationship is 0.0006547 (adjusted r2: -0.03158) with a t-statistic of -0.143, a value which does not meet any significance test.
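These three numbers hang together, as a quick consistency check shows. The sample size n = 33 is an assumption on my part, taken from the 33 Dongge values since 1850 mentioned earlier; the critical value of about 2.04 is the standard two-sided 5% Student-t value for 31 degrees of freedom.

```python
import math

r2 = 0.0006547   # reported r-squared
n = 33           # ASSUMPTION: the 33 readings since 1850 mentioned earlier
k = 1            # one predictor

# Adjusted r-squared for a regression with k predictors
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# t-statistic of the slope in a one-predictor regression, from r alone
r = -math.sqrt(r2)  # negative sign taken from the reported t-statistic
t_stat = r * math.sqrt(n - 2) / math.sqrt(1.0 - r2)

T_CRIT_31DF = 2.04  # approximate two-sided 5% critical value, 31 df
print(round(adj_r2, 5), round(t_stat, 3), abs(t_stat) > T_CRIT_31DF)
```

Both computed values land close to the reported adjusted r2 and t-statistic, and the t-statistic is nowhere near the critical value.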

Spurious correlations between smoothed series (the Slutsky-Yule effect) have been known to economists and statisticians since the 1930s. They have been mentioned in recent climate literature, e.g. Gershunov et al (J Clim 2001), who state:

spurious relationships abound, especially when one deals with low-frequency phenomena diagnosed in short time series (Wunsch 1999). In general, the apparent presence of trends and periodicities in short filtered random time series is known as the “Slutsky–Yule effect” (Stephenson et al. 2000).
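The effect is easy to reproduce with pure noise. The sketch below is a toy simulation, not a replication of anything in Mann et al: it uses a plain 21-year moving average rather than a Butterworth filter, independent Gaussian white noise rather than real data, and arbitrary choices for the window, the number of trials and the seed.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def moving_average(x, width):
    """Simple running mean; a stand-in for a low-pass filter."""
    return [statistics.fmean(x[i:i + width]) for i in range(len(x) - width + 1)]

def correlation_spread(n_years=146, width=21, trials=200, seed=0):
    """Std. dev. of correlations between unrelated series, raw vs smoothed."""
    rng = random.Random(seed)
    raw, smooth = [], []
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n_years)]
        b = [rng.gauss(0, 1) for _ in range(n_years)]
        raw.append(pearson(a, b))
        smooth.append(pearson(moving_average(a, width), moving_average(b, width)))
    return statistics.pstdev(raw), statistics.pstdev(smooth)

raw_sd, smooth_sd = correlation_spread()
print("spread of r, raw series:     ", round(raw_sd, 3))
print("spread of r, smoothed series:", round(smooth_sd, 3))
```

The two series in each trial are unrelated by construction, so any correlation is spurious; the point is that smoothing widens the range of correlations that turn up by chance.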

## 17 Comments

Seriously, is there any other scientific discipline where you can just “fill-in” (read: make up) missing data??

And what is the rationale for running the correlation on smoothed data sets? I see no evidence that the smoothing is doing anything other than rewriting the data.

Steve: Why are you asking me what the rationale is? Ask Mann or Gavin Schmidt.

I see five charts in your post. I expected a sixth to follow this comment:

Steve: Fixed.

I think Steve is right to use only the raw Dongge data to determine this correlation, and not the interpolated and then smoothed Dongge data.

However, I’d be curious to know what the correlation is between raw Dongge and unbinned temperature (for the same year only).

Is the rationale for binning temperatures that groundwater d18O represents an average of several past years’ rainfall and not just the current year’s rainfall? If so, the binning should perhaps extend a fixed number of years into the past (perhaps with declining weights) from each Dongge observation, regardless of their spacing. In any event, I think Steve is correct to use only current and/or past temperatures, and not future temperatures, in constructing his temperature variable.

Wow, the smoothed Dongge PREDICTS the smoothed gridcell temperature.

That’s what I call correlation! Thank you Mann.

And what is the justification for the assumption “that Dongge O18 could not teleconnect with future temperatures”?

Ok, sarcasm off.

Wouldn’t the correlation improve dramatically if it turned out that the dating of the dO18 was wrong by 5-10 years before 1960? Having said that, the 1950-2000 correlation looks lousy whichever way you look at it (although I admit my eyeballs have not been peer-reviewed).

As a retired ME I am pretty much an agnostic on AGW, but you pick too many nits with accepted warming theory.

Re: Jim Norvell (#7),

Two space shuttles burned up because engineers failed to “pick nits.”

Re: Jim Norvell (#7),

Hmm. Seems to me as if much of accepted warming theory has a very low signal to nits ratio.

Regarding the Dongge O18 data, it’s been a long time since my last statistics course, but the notion of linear interpolation between irregular-year data points raises a flag. This seems to add additional smoothing at the front end. Depending on the ratio of filled-in to raw-data years, this could be a very heavy-handed smoothing. But maybe that’s what you already said in another way. If so, sorry.

A great idea, only Gavin would censor everything at RC, Hansen is busy on the Discovery channel claiming Armageddon from 2 C temp rise, and Mann is busy with stealth deletions and alterations on the Penn State web site. In all seriousness, why isn’t the paleoclimate crowd demanding these answers, if not in the review process, then in later commentary? (I know, rhetorical and OT).

I understand the argument for nonexistence of teleconnection. If you have a timeseries and you average a section, placing the result at the one side causes a phase change in the filtered signal. Variable frequency data (red dots) results in a variable phase change which if there is a signal would make correlation worse.

I don’t think this is quite right. Mann is using a 20-yr lowpass filter which makes the effective sampling rate about once every 10 yrs, for N ~ 14. Applying the Quenouille formula on an annual autocorrelation basis in this case produces a meaningless number. It would be more suitable to calculate the autocorrelation at lag ~11. A correlation greater than 0.5 for N=14, given serial independence, does reach 95% significance.
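For anyone who wants to check that last claim, here is a small sketch of the standard t-test for a correlation coefficient under serial independence. It takes N = 14 from the comment and r = -0.5481829 from the post; the two-sided 5% critical value of about 2.179 for 12 degrees of freedom is the textbook Student-t figure, and the function name `t_from_r` is just for illustration.

```python
import math

def t_from_r(r, n):
    """t-statistic for a Pearson correlation r from n independent pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

n_eff = 14            # effective sample size suggested in the comment
r_reported = -0.5481829
T_CRIT_12DF = 2.179   # approximate two-sided 5% critical value, 12 df

t_stat = t_from_r(r_reported, n_eff)
# Smallest |r| that clears the two-sided 5% threshold at this sample size
r_crit = T_CRIT_12DF / math.sqrt(T_CRIT_12DF ** 2 + (n_eff - 2))
print(round(t_stat, 2), round(r_crit, 3))
```

On these assumptions the reported correlation clears the two-sided threshold, though not by a wide margin, and the result depends entirely on treating the 14 values as independent.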

Jesper #12 writes,

But if you use the autocorrelation at lag 11, the “N” in the formula should be 14 or so, not 146, so that as long as there is still a fair amount of autocorrelation, the adjusted sample size will still be well under 14.

But the AR(1) model probably isn’t very good for this complicated doubly smoothed data that may have been autocorrelated to start with. I would therefore just take the Quenouille adjustment as an indication of cause for extreme concern, rather than as definitive. I think Steve did right to just regress the raw Dongge data on binned temperature, though as I noted in #3 above, there might be other reasonable ways of binning.

#12. I think that the most sensible approach (as Hu concurs) is to do the analysis with binned data – perhaps varying the binning approach along the lines suggested in a comment by Hu. What is the correct number of df for a Mannian calculation, considering all the steps? Who knows. Remember you start off with irregular data spaced about 10 years apart on average. You then do a linear interpolation. You then apply a Butterworth filter with f=.05 to both the instrumental and proxy data and you get a set of residuals with autocorrelation off the charts. The setup doesn’t fit the assumptions to do a linear regression (which is equivalent here to a correlation).

You have no basis for asserting that the effective N is about 14 merely because “Mann is using a 20-yr lowpass filter”. I don’t know how many df are really in this setup; but there’s more to this setup than the Butterworth. You haven’t shown that Slutsky-Yule isn’t present here, for example.

You can’t ignore the binned results, which are arguably more faithful to the data as it exists. All in all, the Mannian calculation is a seriously screwed up way of doing things. Given that PNAS knew that these calculations would be scrutinized, you’d think that they’d have made a better effort to have stuff that was a little less embarrassing.

Re: Steve McIntyre (#14),

As you are probably aware, PNAS has a mixed reputation because of the different tracks of peer review that can be taken to have a paper published in the journal. The “Communicated by Lonnie G. Thompson” on Mann (2008) indicates that the paper was submitted through Track I. In this path to publication, Dr. Thompson served as editor for the article and obtained at least 2 reviews of the paper from individuals at other institutions, and not from those of any of the authors. The peer review process is not conducted at the level of the PNAS editorial board, but at the level of the communicator. In principle, those peers and their reviews should be anonymous, but in practice, they are often not. Once the concerns of the reviewers are answered, the editor advises the editorial board of whether to publish it. While the editorial board reserves the right to reject any manuscript, this rarely happens on Track I publications. I have been witness to conversations between scientists that sounded like the following: “We’ll get Dr. X down the hall to communicate it, and he’ll lean on 2 of his peers to bless it, and it’ll get published.”

My mentor, a NAS member themselves, refused to publish my work in PNAS because of the stigma attached when you communicate your own work. I don’t enjoy the fact that I am immediately more skeptical of manuscripts submitted through Track I, but I am.

Off topic:

Working in the digital signal processing area, I think what is missing in handling climate data is a better understanding (and knowledge) of sampled data. After all what we have here is discrete data series sampled from a continuous signal. Besides obvious errors with the sampling process (UHI et al) I wonder if people working in climate science fundamentally understand what it means to “take a temperature sample”. Basic Nyquist and Shannon stuff.

Besides, I think taking a “true” daily-average “temperature sample” at a weather-station is quite pointless (for climate science IMHO) if you don’t have records about the accompanying daily sunshine duration and energy transport via wind, convection, and so on. Even then I have difficulty fathoming how one could create a representation of “real climate” from this data – but maybe I just lack imagination.

Does anyone know of a publication comparing the statistics of binned vs smoothed data? This impacts on the recent discussions regarding methods of smoothing, as you would think that binning would always be more reliable than smoothing on autocorrelated data. But I can’t find any study or advice on it. Thanks.