The “full” network isn’t.
Mann et al 2008 describe a “full” network consisting of 1209 proxies:
We made use of a multiple proxy ("multiproxy") database consisting of a diverse (1,209) set of annually (1,158) and decadally (51) resolved proxy series … Reconstructions were performed based on both the "full" proxy data network and on a "screened" network (Table S1)
The locations of the “full” network are illustrated in their Figure 1, captioned “spatial distribution of full (A) and screened (B)” network. Their Figure S7 compares “long-term CPS NH land (a) and EIV NH land plus ocean (full global proxy network) (b) with and without using tree-ring data”. Their Figure S8 compares “long-term CPS NH land (a) and EIV NH land plus ocean (b) reconstructions (full global proxy network)”. Their Figure S11 shows “plots of the NH CPS Land reconstruction (A) and EIV Land plus ocean (full global proxy network)”.
Table S1 shows breakdown counts by dendro and non-dendro, by annual-resolved and total, by NH, SH and Global. The top portion of Table S1 is excerpted here:
The problem is that their actual calculations appear almost certainly not to have used the network with the count statistics shown in Table S1, but a truncated version of that network: their "full" network used only 663 of the 1209 sites (560 of the 1036 NH sites). Jean S has run the gridproxy.m program using the FULL option, and so we have been able to "prove" this for this part of the program (CPS) through an actual run of the program. (I'm not sure that the EIV portion of the program can be run with the present code.)
First, here is a table showing counts that compare to the top right counts (NH all) of Table S1. The column headed Matlab shows the counts obtained from the gridproxy.m program, while the column headed Reported shows the corresponding column from Table S1. I calculated the other three columns from first principles, as explained below; for now, note that my NH total in the third column matches the counts from gridproxy.m exactly in nearly every case and in no case differs by more than 1.
| NH Dendro | NH Other | NH Total | Matlab | Reported |
| --------- | -------- | -------- | ------ | -------- |
| 421 | 139 | 560 | 560 | 1036 |
| 297 | 129 | 426 | 427 | 700 |
| 162 | 124 | 286 | 287 | 408 |
| 90 | 120 | 210 | 211 | 277 |
| 62 | 39 | 101 | 102 | 151 |
| 28 | 34 | 62 | 62 | 102 |
| 19 | 30 | 49 | 49 | 77 |
| 11 | 30 | 41 | 41 | 63 |
| 6 | 24 | 30 | 30 | 46 |
| 4 | 21 | 25 | 25 | 40 |
| 4 | 20 | 24 | 24 | 37 |
| 4 | 20 | 24 | 24 | 34 |
| 4 | 19 | 23 | 23 | 33 |
| 4 | 18 | 22 | 22 | 28 |
| 4 | 18 | 22 | 22 | 28 |
| 3 | 18 | 21 | 21 | 27 |
| 2 | 17 | 19 | 19 | 24 |
An Unreported Screening Procedure on the Full Network
The difference appears to result from an unreported screening procedure on the “full” network.
Mann’s SI does describe a screening procedure on his “screened” network:
Screening Procedure. To pass screening, a series was required to exhibit a statistically significant (P < 0.10) correlation with either one of the two closest instrumental surface temperature grid points over the calibration interval (although see discussion below about the influence of temporal autocorrelation, which reduces the effective criterion to roughly P < 0.13 in the case of correlations at the annual time scales)
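As a rough consistency check, the annual threshold of r = 0.106 that appears in the code is about what a one-sided P < 0.10 test gives if the calibration interval has roughly 146 annual values (1850-1995 is my assumption; the length is not stated in the excerpt), at least for white noise. A Monte Carlo sketch:

```python
import math
import random

def pearson_r(x, y):
    """Plain Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def null_exceedance(threshold, n_years=146, trials=10000, seed=1):
    """Fraction of independent white-noise pairs whose sample correlation
    exceeds `threshold`: the one-sided false-positive rate of the screen."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n_years)]
        y = [rng.gauss(0, 1) for _ in range(n_years)]
        if pearson_r(x, y) > threshold:
            hits += 1
    return hits / trials

rate = null_exceedance(0.106)
print(round(rate, 3))   # close to 0.10 for white noise
```

With autocorrelated (red) proxy series, the false-positive rate of the same threshold is higher, which is presumably the point of the SI's remark about the effective criterion at annual time scales.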
The implementation of a procedure along these lines can be discerned in gridproxy.m in the case (confusingly) entitled 'WHOLE', which actually means screened. At one stage of the program, correlation hurdles are defined for the various proxy classes (using the proxy class codes to set the hurdle). "Annual" and "decadally" resolved proxies have different benchmarks. Documentary series and speleothems (the 5000 and 6000 series) have somewhat different benchmarks, and corals (the 7000 series) are checked for negative correlation. The relevant lines are:
case 'WHOLE'
ia=4; %% annually resolved
ib=5; %% decadally resolved
corra=0.106; %% 9000,8000,7500,4000,3000,2000
corra2=-0.106; %% 7000
corra3=0.136; %% 6000, 5000
corrb=0.337; %% same as above +1
corrb2=-0.337;
corrb3=0.425;
Later in the program, they test the observed correlation for each proxy against the relevant benchmark (their code is very inelegant; they speak Matlab with a heavy Fortran accent). For the “screened” network, corra is 0.106.
for i=1:m1-1 % This is for searching annually-resolved proxies
if (z(3,i)==9000 | z(3,i)==8000 | z(3,i)==7500 | z(3,i)==4000 | z(3,i)==3000 | z(3,i)==2000) &...
x(kk,i+1)>-99999 & x(kkk,i+1)>-99999 &...
z(1,i)>=ilon1 & z(1,i)<=ilon2 & z(2,i)>=ilat1 & z(2,i)<=ilat2 & z(ia,i)>=corra
n=n+1;
…
end
For corals and speleothems, they do it a little differently, testing the absolute value of the correlation:
… z(1,i)>=ilon1 & z(1,i)<=ilon2 & z(2,i)>=ilat1 & z(2,i)<=ilat2 & abs(z(ia,i))>=corra3
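Stripped of the Fortran-style loop and index bookkeeping, the class-specific screen amounts to the following (a Python sketch, not Mann's code; the thresholds and class codes are taken from the listings above, and the abs() for the 5000/6000 classes follows the second listing):

```python
# Sketch of the class-specific screening logic discerned in gridproxy.m.
# Default thresholds are the 'WHOLE' (screened) values; the 'FULL_' case
# sets every threshold to zero.
POSITIVE_CLASSES = {9000, 8000, 7500, 4000, 3000, 2000}  # keep if r >= corra
NEGATIVE_CLASSES = {7000}                                # corals: keep if r <= corra2
ABSOLUTE_CLASSES = {6000, 5000}                          # keep if |r| >= corra3

def passes_screen(proxy_class, r, corra=0.106, corra2=-0.106, corra3=0.136):
    if proxy_class in POSITIVE_CLASSES:
        return r >= corra
    if proxy_class in NEGATIVE_CLASSES:
        return r <= corra2
    if proxy_class in ABSOLUTE_CLASSES:
        return abs(r) >= corra3
    return False

# With the FULL_ thresholds (all zero), a tree-ring series (class 9000)
# with a negative correlation still fails the r >= 0 test:
print(passes_screen(9000, -0.3, corra=0.0, corra2=0.0, corra3=0.0))  # False
```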
Now watch what happens in the “FULL” case. Here’s how they set their hurdles. Can you see what happens?
case 'FULL_'
ia=4;
ib=5;
corra=0;
corra2=0;
corra3=0;
corrb=0;
corrb2=0;
corrb3=0;
If a series has a negative correlation to temperature (as about half of the tree ring series do), the test shown above will lead to the exclusion of this series. So this test either intentionally or unintentionally eliminates all the series with negative correlations to gridcell temperature – thus the reduction in NH count from 1036 to 560. Jean S got this count from running gridproxy.m and I got this number by independently comparing reported correlations to a benchmark of 0. So while it’s not impossible that I’ve misinterpreted something here, until we see some alternate explanation, I think that this interpretation should be regarded as valid.
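A toy calculation shows the arithmetic of the exclusion: if the signed calibration correlations of a 1036-series network were roughly symmetric about zero (an idealization, since real proxies carry some signal), an r >= 0 screen would keep about half, in the same ballpark as the observed drop to 560:

```python
import random

def screened_count(n_series=1036, seed=0):
    """Toy network: calibration correlations drawn symmetrically about
    zero. The r >= 0 test (the FULL_ thresholds) keeps only the
    positively correlated series."""
    rng = random.Random(seed)
    corrs = [rng.uniform(-1, 1) for _ in range(n_series)]
    return sum(1 for r in corrs if r >= 0)

kept = screened_count()
print(kept)   # about half of 1036
```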
Now this creates problems, both with the statistical significance of what was reported and with the accuracy of the methodological description in the PNAS article – familiar themes in this corpus.
We've discussed on many occasions that you can "get" a HS merely by picking upward-trending series from networks of red noise (David Stockwell had a good note on this phenomenon on his blog a couple of years ago; my first experiments of this type were on the cherry picks in the original Jacoby network). Since gridcell temperature series from the mid-19th century to the late 20th century by and large have an upward trend, this test is equivalent to a simple pick operation. Whatever the merits of effectively screening the data set for upward-trending data, it affects statistical benchmarks, since the appropriate null is the same sort of picking operation applied to red-noise networks.
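The pick effect is easy to reproduce from scratch: generate red-noise "proxies", keep only the ones positively correlated with a rising calibration "instrumental" series, and average the survivors. All parameters below are illustrative, not Mann's:

```python
import random

def ar1(n, phi=0.9, rng=None):
    """Red noise: AR(1) with lag-one autocorrelation phi."""
    rng = rng or random
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out

def pearson(x, y):
    n = len(x); mx = sum(x) / n; my = sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

rng = random.Random(42)
n_years, cal = 1000, 150               # series length; calibration window
trend = list(range(cal))               # a rising "instrumental" record
survivors = []
for _ in range(500):
    p = ar1(n_years, rng=rng)
    if pearson(p[-cal:], trend) > 0:   # screen on calibration correlation only
        survivors.append(p)

mean = [sum(p[t] for p in survivors) / len(survivors) for t in range(n_years)]
handle = sum(mean[:n_years - cal]) / (n_years - cal)   # pre-calibration average
blade = sum(mean[-50:]) / 50                           # end of calibration window
print(len(survivors), round(blade - handle, 2))
```

The average of the survivors is flat before the calibration window and rises within it: a hockey stick manufactured from pure noise.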
Secondly, whatever one may think of the Mannian PC algorithm (and its defenders are reduced to a pretty hard core right now), few people defend the non-reporting of their modification of the PC algorithm and their failure to assess the statistical significance and properties of these changes (quite aside from whether they can "get" a HS using some other method). We're into something pretty similar here: again there seems to be an unreported screening operation on the "full" network. The value of archiving code is once again demonstrated, as the unreported operation can only be discerned by parsing code. Because Mann et al commendably archived code concurrent with publication, it hasn't taken years to identify the issue. (I note that the precise identification of the problem with Mannian PCs was likewise only possible because a little bit of code was inadvertently left in a directory identified for the public after MM03.) So code really does help pin down problems.
41 Comments
Steve: It looks like you forgot to close a tag in the second code listing above:
<strong>z(ia,i)>=corra
— Sinan
Re: Sinan Unur (#1),
He also forgot to close a [b] tag in html.
There seems to be an interaction between blockquote and strong in WordPress. I did it right but WordPress removed the end bold inside the blockquote, so I tried to manually fix it.
If the period for testing for correlation with temperature were something like 500 years, then maybe those with no correlation are "bad" proxies, without the screen being potentially spurious (a "mining for hockey sticks" exercise). But even then, negative correlations would be just as good as positive (just be careful with the sign). Had they used r2, they wouldn't have made this error of treating a negative correlation as though it had no predictive power.
Again this is great detective work.
I guess I am bemused by the decision to use r > |.106| as the criterion for including a proxy series. Why not r > |.206|? r > |.106| corresponds to an r2 of about 1%, which means that there are innumerable other factors – some likely more potent – contributing to the proxy metric. Without specifying these, how does one go about reconstructing the past temperature record without error bars that make the whole exercise problematic? I mean, if you had two thermometers in the same grid cell that calibrated at r = .10, would you use either of them? Or do I have the wrong end of the stick here?
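For reference, the variance explained at these correlation levels is easy to tabulate:

```python
# Variance explained (r^2) at various screening correlation thresholds:
# roughly 1%, 2% and 4% respectively.
for r in (0.106, 0.136, 0.206):
    print(f"r = {r:.3f}  ->  r^2 = {r * r:.1%}")
```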
#5. Also recall the pick two procedure for their calculation of correlations.
The other aspect to this is reverse engineering. It’s hard to rationally justify the purposes of many of these methods, but they all end up affecting weights. Some preliminary calcs by Jean S indicate that the weights are highly concentrated on a few proxies. Lucy and Charlie Brown one more time. So the issue will be whether this few proxies (bristlecones, Finnish sediments with Mann orientation) are wonder thermometers. Same old same old.
If one assumes the missing 'abs' to be a bug, rather than a feature – what is the effect on the results of adding the abs to:
z(1,i)>=ilon1 & z(1,i)<=ilon2 & z(2,i)>=ilat1 & z(2,i)<=ilat2 & z(ia,i)>=corra
in the non-corals/speleothems case?
Steve,
I tried sending you an email about this, but it seems to have become lost. There are quite a few dead links to climate2003.com on the CA homepage.
Thought you might like to know for cleanliness purposes.
Steve: That’s an old website. I got tired of paying for it when I had space at CA but need to spend some time posting the pages again under a new alter ego. So much to do, so little time.
If you want to prove that the mean climate in the past 1000 years was without structure, picking proxies with r = .1 against climate is ideal. Take a bunch of proxies that explain barely more than 1% of the variation in temperature, average them, and you get 0 at each point in time… a nice flat hockey stick handle.
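The flattening described here is just averaging: if the proxies are mostly independent noise, the standard deviation of their mean shrinks like 1/sqrt(n). A quick sketch:

```python
import math
import random

def sd_of_mean(n_proxies, n_years=500, seed=3):
    """Std dev over time of the average of n independent noise 'proxies'."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_years)]
              for _ in range(n_proxies)]
    mean = [sum(s[t] for s in series) / n_proxies for t in range(n_years)]
    mu = sum(mean) / n_years
    return math.sqrt(sum((v - mu) ** 2 for v in mean) / n_years)

for n in (1, 25, 100):
    print(n, round(sd_of_mean(n), 3))   # shrinks roughly like 1/sqrt(n)
```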
#7. Hard to say. I don’t think that anyone has got the next portion to work yet.
There are any number of calculations which can be used to show the pointlessness of certain operations. At some point, you’d think that people in the trade have to take some responsibility for underwriting subprime proxies.
Also, what is with all the magic numbers in the code, like the following:
Whoever wrote that needs to take a CS101 refresher.
Steve: those are codes for the types of proxies. It's a bit inelegant, but I find their Fortran-style do-loops in preference to logical vectors and functions more annoying.
I have to mention that it is not just the picking process that results in creation of the hockey stick. It is also the amplification process: an ideal scale and offset matches recent temperatures as closely as possible, which de-amplifies the historic data.
A simple picking process amplifies current data while averaging the remainder. If there is a real historic signal, it is on the correct scale. I demonstrated this effect in this post. The data was not amplified, just picked.
http://noconsensus.wordpress.com/2008/09/22/how-come-so-many-independant-papers-claim-hockey-sticks/
Amplification of the sorted data results in a statistical demagnification and offsetting of historic data relative to present data. As I showed in this post.
http://noconsensus.wordpress.com/2008/09/29/simple-statistical-evidence-why-hockey-stick-temp-graphs-are-bent/
It is very important that the discussion doesn’t become, why not use better or higher correlation values. Any noise created by processes other than the one you are looking for will cause this effect. Higher correlation sorting of a set of data gives the same shaped result as low correlation with a noisier graph due to less series passing calibration.
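The de-amplification point can be illustrated with a generic attenuation sketch. This is not Mann's algorithm; the signal-to-noise ratio and series lengths below are made up and exaggerate the effect. Each noisy proxy is rescaled by regressing temperature on it over the calibration window, and the reconstructed pre-calibration "warm period" comes out at a small fraction of its true amplitude of 1.0:

```python
import random

rng = random.Random(7)
n_years, cal = 400, 100
# "True" temperature: flat, with an early warm period and a modern ramp
truth = [0.0] * n_years
for t in range(50, 100):
    truth[t] = 1.0                                  # early warm period
for t in range(n_years - cal, n_years):
    truth[t] = (t - (n_years - cal)) / cal          # modern ramp, 0 -> 1

def noisy_proxy(snr=0.5):
    """A weak proxy: scaled truth buried in unit-variance noise."""
    return [snr * v + rng.gauss(0, 1) for v in truth]

def calibrate(proxy):
    """Regress temperature on the proxy over the calibration window only."""
    xs, ys = proxy[-cal:], truth[-cal:]
    mx, my = sum(xs) / cal, sum(ys) / cal
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return [a + b * v for v in proxy]

recons = [calibrate(noisy_proxy()) for _ in range(200)]
mean = [sum(r[t] for r in recons) / len(recons) for t in range(n_years)]
baseline = sum(mean[:50]) / 50       # reconstructed pre-warm-period level
bump = sum(mean[50:100]) / 50        # reconstructed early warm period
print(round(bump - baseline, 2))     # far below the true amplitude of 1.0
```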
Jeff, here are some posts by David Stockwell along similar lines: http://landshape.org/enm/2006/03/ (the pictures seem to be lost; I'll check with him about this). This was in part pursuant to passim remarks here. I'll re-read your post and try to understand the interaction.
An elementary exercise that grad students have to go through, before collecting data, is to check whether the sample size they plan to collect, given the expected variability, will enable them to detect an effect of a given size (a power analysis). With a correlation of .1 as a cutoff, it is unlikely that any signal (such as a MWP) would be detectable, no matter how many proxies are tossed in the hat.
It is intentional. They state in the main text:
So the first if (and the corresponding one for the "decadal" proxies later) tests for positive correlation. This includes categories 9000 (tree-ring width), 8000 (ice-core oxygen isotopes? – yes), 7500 (MXD), 4000 (lake sediments), 3000 (composite??? – SM: 3000 – a few temperature reconstructions from tree rings unaccountably not classed with dendro; 3001 – ocean sediments) and 2000 (Luterbacher). It is worth mentioning that four Lake Korttajärvi series are in the category 4000.
The second if
tests for negative correlation and is designed only for the category 7000 (coral oxygen-isotope records?).
Now the third if for the categories 6000 (speleothems) and 5000 (what is this category??? [SM- documentary, 5000 mostly precipitation, 5001 mostly Chinese ] , it sure looks like the historical documents category… maybe 5000 and 4000 should be switched??) is the one for series with a priori unknown sign of the correlation. The interesting thing is the line right after
So although the sign of the correlation is not known, it is known a priori that the series with negative correlation should be flipped!
Re: Jean S (#15),
Steve, does this mean they expected a positive correlation for the Lake Korttajarvi sediments you mentioned in your “It’s Saturday Night Live” post? Are lake sediment studies normally positively correlated and the Korttajarvi sediments involved a different kind of data collection procedure which actually gives a negative correlation?
Steve: I can’t imagine that all the possible lake sediment parameters are known a priori to be positive.
I’ve made a few inline comments on #15. In my post, I said that 5000,6000 were speleothems and corals. Brain cramp. Corals are 7000 series, 6000 is speleothem, 5000 documentary.
I’m still not convinced that they intentionally shrunk the “full” network without reporting the screen and shrink. While their stats and commentary on the “full” network are incorrect, I suspect that this particular screen was inadvertent; otherwise, they are making intentional misrepresentation – something that I’m reluctant to think on these facts.
I’m sure that you’ve noticed that they use Mannian smoothing – wouldn’t it be nice to encounter one method that was used in conventional literature.
The function lowpassmin cycles through end point padding alternatives. I've experimented with 10-point Butterworth filters using the R function from the signal package and noticed very serious end point ringing for the Socotra and Dongge series. I'm not used to Butterworth filters and don't want to over-interpret this, but the effect is quite large on a first pass.
Steve
Read some EE Texts about Butterworth filters. That is where they originate. Very nice but they have some spurious outputs that are known in the EE world.
Butterworth basically just means “no ripple”. i.e. the frequency response falls off from a shelf into a nice smooth curve with no wiggling around. There are filters with steeper falloff but they have ripple in the “shelf” portion in exchange. There are also filters with less phase shift but they also have ripple.
Regarding the coding style: I think what Soronel was trying to say is that a good programmer uses named constants to make the code readable. In C you could do:
#define CODE_DOCUMENTARY 5000
if( type == CODE_DOCUMENTARY ) …
and so on.
Here are some graphs comparing the three primary types of low pass filter in electronics – Butterworth, Bessel and Chebyshev, showing the strengths and weaknesses of each.
Electronics texts are not very helpful on the impact of these filters on noisy stochastic series, which are a different design issue.
The issue that I'm looking at is end effects, where the Butterworth filter seems to produce some spurious ringing at the ends of some series, e.g. Dongge. But I'm also not used to these filters, and there may be some differences between the R and Matlab implementations that are producing an artifact. Jean S has done a Matlab calc on Dongge but I'm stuck for now on transposing it to R.
Given that the smoothing is probably not a good idea in the first place (see Matt Briggs), spending time on potential end point issues resulting from Mannian smoothing is a very unedifying project.
Re: Steve McIntyre (#21),
I assume, from this comment, that you are interested in the time domain response of the Butterworth filter and not the frequency domain that previous posters have described. It has been a long time since my circuit lectures, but your comment seems to be describing the impulse response of the filter. Perhaps more knowledgeable people can comment, but this is an inherent part of the filter design. The ringing will change depending on the shape of the signal. A signal with an abrupt cutoff will have a perceptible ring. This can be heard on the audible ringing tone of a telephone system. The tone will be low-pass filtered after D-to-A conversion. Depending on the state of the signal when it is cut off, a pop (ringing) may be heard. So the signal sounds like ring-pop.
Re: Steve McIntyre (#21),
If the end-padding method is fixed, RomanM’s impulse response matrix tells all we need ( assuming the processes in question are Gaussian.. )
http://www.climateaudit.org/?p=3504#comment-301058
I think “The Scientist and Engineer’s Guide to Digital Signal Processing” might be more along the lines you’re looking for. The whole book is free to download. Chapter 20, “Chebyshev Filters”, is about Chebyshev and Butterworth filters.
From an earth sciences perspective, a more important, underlying question is whether signal processing/filtering methods developed for electronics are, in fact, applicable to climatological data and analysis, both for real data and for proxies.
Exactly, I’ve had several jobs where dropping the numbers directly into the test would get you a serious dressing down and people who couldn’t learn got canned.
Steve McI writes (#6),
Last weekend I tried implementing the Mannian Pick Two procedure on my golf game: the course wasn’t busy, so I tried teeing off twice on each hole and playing the better of the two drives. This gave me a much better score than the stuffy conventional rules. The PGA should take note!
Can anyone give an overview of the “5000 – Documentary” proxies? Because the central European historians are pretty emphatic about the Medieval Warm Period, and it would be interesting to see if any of that information is in here in the first place.
From tracking wheat harvests through birthrates and taxes, there’s a fair chunk of records.
Mr. McIntyre : I think Don is right and you are indeed dealing with the impulse response properties of the filter.
I was just trying to help by describing what the term “Butterworth” actually means. What it describes is a frequency domain property (maximal passband flatness and no ripple). I realize this isn’t very helpful statistically but thought it would be good to at least know what the term means.
I am fairly sure that you are both right that a Butterworth filter will exhibit significant ringing after a step change. It’s arguable what the correct method is for using a Butterworth on a series which has finite length. The only 100% valid approach is to truncate the smoothed result by the filter width, so that all your output values are only involving valid input sample values.
If you’re not going to do that you’d have to, at the very least, replicate the end values to avoid injecting high frequencies into the filter near the ends. Step changes are not what this type of filter is designed to handle at all. But even that isn’t really a very clever thing to do, since you’re affecting the trend by having a dramatic loss of all frequency components of the signal near the ends. Neither is linear interpolation very clever, that fixes the first derivative but not the second derivative, etc.
So really where does it stop? You’d basically need a way to generate data at the ends which has the same spectral properties of the data itself but won’t affect the trend and append that. I’m still not sure that’s a totally valid solution but it’s the best I can come up with. I’d truncate it.
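The end-padding alternatives discussed above can be illustrated with even a trivial smoother. The sketch below uses a centered moving average rather than a Butterworth filter (so it shows the padding effect, not the ringing), with padding names of my own choosing:

```python
def smooth(x, w=9, pad="reflect"):
    """Centered moving average of width w with selectable end padding."""
    h = w // 2
    if pad == "reflect":                  # mirror the series at each end
        ext = x[h:0:-1] + x + x[-2:-h - 2:-1]
    elif pad == "repeat":                 # replicate the end values
        ext = [x[0]] * h + x + [x[-1]] * h
    elif pad == "zero":                   # pad with zeros (injects a step)
        ext = [0.0] * h + x + [0.0] * h
    else:
        raise ValueError(pad)
    return [sum(ext[i:i + w]) / w for i in range(len(x))]

# A rising series: zero-padding drags the smoothed endpoint down, while
# repeat/reflect padding track the end of the series more closely.
series = [0.1 * t for t in range(40)]
for pad in ("reflect", "repeat", "zero"):
    print(pad, round(smooth(series, pad=pad)[-1], 2))
```

With a 10th-order IIR filter the same padding choice feeds the recursion, so the end treatment matters even more than in this FIR toy.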
Ah-hah:
Here they state:
Butterworth maximally flat magnitude
Disadvantages:
Some overshoot and ringing in step response.
They go on to say that if you want minimal ringing after a step change you should use a Thomson filter. Maybe that’s the better way to go in this case. So if you’re going to do your own study, you might want to look into that. Won’t help for pre-existing studies using Butterworth though.
Me, I don’t “want” to do anything with this particular filter. It’s just that it’s embedded in this study. (Actually Mann used it before; we just haven’t looked at it before.)
What is worth investigating is the impact of such an extensive time duration of a 10th order IIR filter.
Mark
Causality issues notwithstanding, of course.
Mark
EA and Steve #27:
Maybe that is why the X-ray density sign was swapped. Otherwise the lake sediments would have a negative correlation to the other methods!
I’m amused by the topic of filter design having cropped up. It’s many years since I did any of this kind of thing and I never really understood the electronics. But I knew someone who did. His standard technique for testing analogue filters was to put a square wave into it and look at the output on a scope. This allows you to see the impulse response. A Butterworth filter would, if I recall, ring a bit. You’d get a high-frequency spike that then decayed very quickly. Bessel filters wouldn’t. They were, in effect, critically damped.
I’m more than a little bit amazed to find this coming up in a discussion on statistics.
But I’m having real trouble imagining what applying such a filter to this kind of data means. If one believed that the climate was a collection of overlapping cycles of different lengths – then this technique could be used to remove the shortest cycles.
But proxies aren’t a continuous signal. They are a series of data points, and the variation between successive data points could be quite large – not dissimilar to the change at an end point. If you are getting ringing at an endpoint, are you also getting ringing within the dataset? This rather depends on the frequency of sampling (Nyquist?) and also on what smoothing might already have been applied.
Steve: Take a look at the Dongge cave smoothing on the new thread. Is this the sort of effect that you’ve noticed:
Re: Nick Moon (#36),
Actually that allows you to see the step response (assuming the period of the pulse had sufficient duration), which is different than the impulse response.
Mark
I can say from my experience in a different field (audio engineering) that Butterworth filters are commonly used as low- or bandpass filters when “treating” noisy analogue sound recordings, because of their nice flat response in the frequency domain. In my own audio restoration work I am *not* using them any more however, as I find the VERY noticeable impulse distortion objectionable – a steep BW filter will turn a sharp audio impulse (no matter if caused by a scratched record or by a percussive instrument) into a declining sinusoidal wave of perceptible length, about 5 to 10 times as long as the original impulse, audible as a “ringing” noise like from a dampened plucked string. Definitely NOT what you’d want to use if – like with the temperature curves discussed here – you’d need to preserve the curve SHAPE rather than (a certain part of) its spectral properties! By now, there are linear-phase digital lowpass and bandpass filters available for audio and video use that avoid these defects. Sorry I have no idea how exactly these are implemented mathematically, but IMHO this would be a field where to look for better filtering/smoothing methods.
I’m starting a thread on Butterworth filters. Given that frequency fidelity is not an issue in these reconstructions (nobody argues for stable underlying cycles), as others observe, why use this particular method? Anyway, please take this to the other thread.
Re: Steve McIntyre (#39),
What happened to the other thread? It seems to have disappeared?
[I suppose it will re-appear as soon as I post this, however.
Steve: sorry about that. I was adding some info from UC and inadvertently left it offline.
Re: #36
Yes the spike at the end really does look very like what you see on an oscilloscope.
Re: #38 – I stand fully corrected.