Mann 2008 Correlation Benchmarks

Update Sep 24 – I suggest that you start with a later post here.

The purpose of working through frustrating details of Mannian lat-longs and so on was to start testing the assertion that the network contained 484 “significant” proxies and that this meant something. As so often, there’s more to this than meets the eye.

I’ve calculated a number of different distributions as cases vary. I’ve gotten quite fond of flash gifs to show these sorts of analyses. I’ll list the cases shown in the flash-gif below and discuss the cases below the graphic.

Figure 1. Graphic with different distribution assumptions. The cases analysed: whether one or two gridcells are used for comparison, Luterbacher inclusion; degrees of freedom (Neff); SI versus replicated correlations. The graphs appear in the following order:
1. One pick; Luterbacher in; Neff=140
2. One pick; Luterbacher out; Neff=140
3. Pick two daily keno; Luterbacher out; Neff=140
4. Pick two daily keno; Luterbacher out; Neff=110
5. Pick two daily keno; Luterbacher out; Neff=110; absmax (replicated correlations not SI)
6. Pick two daily keno; Luterbacher out; Neff=64; absmax (replicated correlations not SI)

The SI refers to a couple of correlation benchmarks as 90th and 95th percentiles (actually Mann calls this “significance”, but I’m going to use the more neutral term percentile for now). The SI doesn’t explain where these benchmarks come from, but I’m familiar enough with the ecology here to have a pretty good idea. A fairly reasonable rule of thumb is that the Fisher transformation of a distribution of correlation coefficients has a normal distribution with sd=1/sqrt(N-3), where N is the number of degrees of freedom. The Fisher transformation can be represented in R by the atanh function and its inverse by the tanh function. If you calculate the tanh of the 90th and 95th percentiles of a normal distribution with sd=1/sqrt(140), you get correlation hurdles that closely match the Mann numbers. The tanh function in this range is very close to the identity (tanh(x) is approximately x for small x), and maybe the values are just the qnorm percentiles, which are a titch higher. Regardless, this is almost certainly where the standards come from. I’ll apply this to consider some permutations for deriving alternative benchmarks.

tanh( qnorm(.95, sd=1/sqrt(140))) #[1] 0.1381269
tanh( qnorm(.9, sd=1/sqrt(140))) #[1] 0.1078893

There are some features to this graphic that I find pretty interesting.

First, the Mann distributions have a very noticeable bimodal distribution. In the pick two daily keno cases, I made two columns each 10,000 long simulated from rnorm(sd=1/sqrt(140)) etc and then picked the value with the highest absolute value. This picking procedure yielded a bimodal distribution. So it seems pretty plausible that the bimodal distribution of Mann’s correlation coefficients has something to do with the pick two procedure and that this needs to be allowed for in estimating statistical significance.
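The pick-two procedure described above can be sketched in R roughly as follows. This is a minimal illustration, not the script behind the figure: the seed, the variable names and the 0.02 "near zero" cutoff are assumptions made for the example, and the two columns are drawn independently, which real neighbouring gridcells would not be.

```r
# Sketch of the pick-two simulation; seed, names and cutoff are illustrative assumptions.
set.seed(1)
n    <- 10000
neff <- 140
sdev <- 1 / sqrt(neff)

# Two independent columns of simulated Fisher-z values
z <- cbind(rnorm(n, sd = sdev), rnorm(n, sd = sdev))

# One pick: just the first column. Pick two: keep whichever value has the
# larger absolute value, retaining its sign.
one_pick <- z[, 1]
pick_two <- ifelse(abs(z[, 1]) >= abs(z[, 2]), z[, 1], z[, 2])

# Back-transform to correlations
r_one <- tanh(one_pick)
r_two <- tanh(pick_two)

# Share of correlations close to zero: picking by magnitude hollows out the centre
near_zero <- 0.02
frac_one  <- mean(abs(r_one) < near_zero)
frac_two  <- mean(abs(r_two) < near_zero)
c(one_pick = frac_one, pick_two = frac_two)
```

Plotting `hist(r_two, breaks = 50)` against `hist(r_one, breaks = 50)` should show the dip around zero that gives the pick-two case its two humps.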

Second, the Luterbacher correlations are all implausibly high as representatives of a “proxy” population – they show up as a peculiar bulge on the far right of the distribution. They have instrumental data in them and cannot be used in tests of the ability of proxies to capture signal. In graphics 2-6, these correlations have been excluded, as they should have been in the original article.

Third, I can’t replicate many correlations in the SI. I can replicate some almost exactly – so it’s a bit puzzling right now. My next nearest gridcell algorithm is a little different than Mann’s; I’m aware of the difference; I can’t see why the present differences would be material, but will re-visit this at some point. So I may re-issue my absmax calculation at some point. The bulk of the differences arise in the problematic Briffa MXD network.

Fourth, the value of Neff=110 comes from an aside in the SI. Mann says (in effect) that, because of “modest autocorrelation”, what seems to be a 90th percentile is actually an 87.2 percentile. I can get this result using my interpretation of how he derives benchmarks by setting Neff=110, as shown below:

tanh( qnorm(.9, sd=1/sqrt(140))) #[1] 0.1078893
tanh(qnorm(.872, sd=1/sqrt(110))) #[1] 0.1078820
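The Neff value can also be backed out directly rather than by trial and error. Under the interpretation above, the benchmark is r = tanh(q_p/sqrt(Neff)), where q_p is the standard normal quantile at percentile p, so Neff = (q_p/atanh(r))^2. A short sketch, in which `implied_neff` is a hypothetical helper name and the whole calculation rests on that assumed form of the benchmark:

```r
# Hypothetical helper: the Neff implied by a correlation benchmark r at percentile p,
# assuming the benchmark has the form tanh(qnorm(p, sd = 1/sqrt(Neff))).
implied_neff <- function(r, p) (qnorm(p) / atanh(r))^2

# Nominal 90th percentile benchmark with Neff = 140
r90 <- tanh(qnorm(0.90, sd = 1 / sqrt(140)))

neff_nominal  <- implied_neff(r90, 0.90)    # recovers 140 by construction
neff_adjusted <- implied_neff(r90, 0.872)   # very close to 110
c(nominal = neff_nominal, adjusted = neff_adjusted)
```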

The combination of allowing for autocorrelation, allowing for pick two daily keno, Luterbacher exclusion and calculated correlations, makes the Mann distribution look increasingly like draws from random data. If Neff is reduced to 64, one gets an almost exact match.

The article contains no analysis of autocorrelation in this network. I’ve examined many individual series and it is impossible to baldly assert that autocorrelation is “modest”. Autocorrelation varies hugely between “proxies”; it is very low in some series (e.g. Luterbacher) but it is immense in others. It’s an issue that needed to be worked through. At this point, and these are just notes, it might be pretty hard to show that the observed correlation distribution could not come from pick two daily keno, if more autocorrelation is allowed for.
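To give a feel for how much autocorrelation matters, here is a sketch using one common rule of thumb for the effective sample size of a correlation between two AR(1) series, Neff = N(1 - rho_x rho_y)/(1 + rho_x rho_y). This formula is an assumption brought in for illustration; nothing in the article or SI says it was used, and the lag-1 values below are made up rather than measured from any proxy.

```r
# Rule-of-thumb effective sample size for the correlation of two AR(1) series.
# Assumption: illustrative only, not the method used in the SI.
neff_ar1 <- function(n, rho_x, rho_y) n * (1 - rho_x * rho_y) / (1 + rho_x * rho_y)

n_years <- length(1850:1995)          # 146 annual values in the calibration period
rho     <- c(0, 0.2, 0.4, 0.6, 0.8)   # hypothetical lag-1 autocorrelations (same in both series)

neff        <- neff_ar1(n_years, rho, rho)
benchmark90 <- tanh(qnorm(0.90, sd = 1 / sqrt(neff)))

data.frame(rho = rho, neff = round(neff, 1), benchmark90 = round(benchmark90, 3))
```

Under this rule of thumb the 90th percentile hurdle rises steadily as the assumed autocorrelation increases, which is why a bald assertion of “modest” autocorrelation carries so much weight.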

There are still some other shoes to drop. All these correlations use Mannian infilled proxy data, Mannian infilled temperature data and truncated Briffa MXD series. If true Briffa MXD values are used, the Briffa MXD correlations are going to drop like the value of a subprime mortgage fund.

Sea Ice – End of Game Analysis

On July 2, I started this popular series of threads as follows:

For anyone who’s betting that 2008 meltback will exceed 2007 meltback, I think that you’ll be able to pretty much know where you stand by the end of this week and your chances are not looking good right now based on this week’s exit polls. Another Climate Audit first.

The 2007-2008 gap was then just over 500,000 sq km. I don’t pretend to have any expertise in this topic, but I think that my early July predictions on this stood up better than those of the official organizations.

Thanks to TAC for keeping an eye on things.

As observed by commenters, 2008 levels were still exceptionally low and certainly cannot be interpreted as disproving anything about long-term trends. Indeed, quite the opposite. The most that can be said in the opposite direction is only that the baby-ice models all predicted a huge decrease in 2008 relative to 2007 and considerably more baby ice survived than initially projected.

The Mystery in Kenya

Today’s Mannian mystery takes us to Kenya Tanzania, not to Kilimanjaro itself, but to the great plains, still home to prides of lions, herds of wildebeest and giraffes.

Mann 2008: the Briffa MXD Network

Those who forget the past are condemned to repeat it. The SI for MBH (and Mann et al 2007) had incorrect geographic locations for numerous proxies.

The same error is repeated in Mann et al 2008, a defect encountered by Jeff Id and myself in trying to replicate reported correlations to gridcell temperatures. Nearly 100 “Schweingruber” MXD series have incorrect geographic locations in the SI; strangely, occasional “Schweingruber” MXD series have correct geographic locations. It’s the usual dog’s breakfast.

Say My Name

In an online trailer for a new climate documentary, James Hansen, presumably exhausted from answering “niggling questions” at a gala Lehman Bros dinner, tells the film-maker:

I’m not going to use McIntyre’s name.

The problem of name usage has been recently considered in several important philosophy workshops and conferences. On the top right, I linked to Knowles et al, 2007, a recent presentation on the name usage problem to a large academic conference. Readers should not be deterred by the advanced mathematics. On the bottom left, I linked to a workshop presentation by Mathers et al. (with a highly recognizable nickname for M&M readers). An earlier treatment of the topic is at (Knowles et al 1999, 2000). The say my name issue is also considered passim in Jones et al (1996).

New Light on the Lost Cedars of Gaspé

A data set that was almost as controversial in MBH98 as the Graybill bristlecones was the Gaspé cedar chronology used by Jacoby and d’Arrigo. An interesting new cedar chronology from Quebec has just appeared at NCDC, shown below. The third chronology shown below is an unreported update to the Gaspé series. I reported the unreported update in a 2005 post (see comments by Martin Wilmking, a young dendro interested in the divergence issue). Also see 2007 discussion here.


Figure 1. Three Quebec White Cedar Chronologies.

The red chronology is cana036 (St Anne’s River), a chronology used in Jacoby and D’Arrigo 1989, 1992; a chronology that was as important in the AD1400 network as the bristlecone PC1. (We discussed it at length in MM2005 (EE); our attention was attracted to it because, out of 435 series in MBH, it was the only series where early values had been “infilled”, presumably in order to “get” it into the troublesome 1400 network.)

The green chronology is a digitized version of an unpublished update that I obtained somewhat by accident. As I reported in some early CA posts here, here and here, Jacoby and d’Arrigo did not publish the updated information, refused to provide a digital version of the update and refused to identify the location of either the original site or the update. In the linked post, I provide Jacoby’s “a few good men” explanation of why not all data should be reported; D’Arrigo is the dendro who explained to an astonished NAS panel that “you have to pick cherries if you want to make cherry pie”. Jacoby was very big on the idea that trees could teleconnect to world climate bypassing local climate, an idea that achieved its most Rococo implementation in the Mann papers. The HS-shaped Gaspé chronology is used in Rutherford et al 2005, Mann et al 2007 and Mann et al 2008.

The Gaspé chronology was never published in a proper study – the only publication that I’ve seen is by Sheppard and Cook in a recreational magazine (Sheppard, P.R., Cook, E.R. 1988. Scientific value of trees in old-growth natural areas. Natural Areas Journal 8(1):7–12), where they observe that little was known of cedar chronologies and other studies. Some of the leading cedar specialists in the world are at Ross’ university (Guelph) and they explained to us in 2004 that cedars grow best in cool moist years.

The new data (C. Dagneau and D. Duchaine) goes from 1540-2005 and is also white cedar from Quebec. It has a 0.41 correlation to the withheld Gaspé update and only 0.12 to the questionable data used in Mann and other studies. (Note: The elevation of the timbers in this study is not known. Neither is the elevation of the cana036 (St Anne’s River) series with the pronounced HS shape and, as noted above, Jacoby refused to provide information on the location. Presumably the unreported update version was at similar elevation to the original Cook sample, whatever that was. Rob Wilson has also incorporated European historical timbers with uncertain provenance into altitude chronologies, though I’ve not determined whether there are considerations that he employed that are inapplicable here. A proper comparison obviously requires updating the proxies – something long overdue in Gaspé and the reason why I originally requested location information from Jacoby, as it was then my intent to re-sample these trees.)

I recently reported on the inconsistency between new data versions and Team data versions at Sheep Mountain, Tornetrask and Polar Urals. Gaspé is one more example. It is hard for me to see how objective scientists could use such data without showing its validity.

Update: Reader Reference observes:

This site looks interesting, http://www.foretgaspesie-les-iles.ca/ Ah now, how about this pdf report? Regeneration dynamics of Thuja Occidentalis L. in old mesic cedar stands on the Gaspé Peninsula. A natural regeneration study of Eastern White Cedar? Always wanted to know why it is so hard to grow Thuja sp. from seed. (OT).

Wow, look at the spiral twist on those trunks, need to take care orientating the core barrel there. Let’s see now, where are these beauties located? Rivière Dartmouth watershed 49 01’N 64 50’W, 8% slope, (click on the Terrain button and Zoom out a bit). Inside the Forillon National Park? Oops, I’ll probably need permits to study there….

Let’s try this location in the Réserve écologique de la Grande Rivière watershed 48 36’N 64 49’W (definitely need permits here!), 4% slope circa 200m elevation. Mesic slope?, mineral flush, plenty of ground water, all a Thuja needs for a long and happy life.

Update 2: My correspondence asking for data
Here is correspondence regarding Gaspe prior to CA being started. It’s as polite as anyone could ask. And CA didn’t exist. After a year of effort, I got nothing. This is obviously not an isolated incident.

3/19/2004
Dear Dr Cook,
I note that you collected this site (cana036) many years ago. I was wondering if you published on this site and, if so, could provide me a reference. Thanks, Steve McIntyre

3/22/2004
I have not published anything about this chronology. Gordon Jacoby and Rosanne D’Arrigo have used it however in some of their climate reconstructions. You will need to contact them for references.
Ed

3/22/2004
Thanks for the reply. I’ve seen the Jacoby-d’Arrigo references and, in fact, that’s what occasioned my interest. It was included in their “northern treeline” index – which seemed a little odd to me, since the site is far from the treeline. I also notice that the earliest portion of the chronology (not used by Jacoby and d’Arrigo) is based only on one tree. If you were doing the chronology today, would you include the portion based on only one tree in your site chronology? Also do you know (from past notes or otherwise) any details about potential logging or other forestry operations in the area?
Thanks, Steve McIntyre

3/29/2004
Dear Ed,
Curiously, this site has an extraordinarily large (and disproportionate) influence in the results of Mann et al (1998). I’m planning to get a tree ring specialist from Quebec to re-visit the site. Do you by any chance have a map (or other description) of your sample locations which you could send me?

Also, the early part of the archived chronology is based on only one tree. Would it be fair to say that if you were to re-do the chronology today, you would not publish the portion of the chronology relying on only one tree?
Thanks, Steve McIntyre

4/12/2004
Hi, bringing forward this inquiry again and checking whether you had a map of the sample locations for cana036? Thanks, Steve McIntyre

4/14/2004
Dear Rosanne,
I [understand] that there is some data extending Ed Cook’s archived data (ending in 1982) up to 1991. It is highly relevant to some studies that I am currently carrying out and I would appreciate the updated series version both in crn and rwl forms. Thank you for your attention.
Regards, Steve McIntyre

4/14/2004 [communicated from Rosanne]
the data you have are probably superior with regards to a NH signal.

5/5/2004
Dear Dr. Cook,
I was hoping that you could attend to this inquiry. I was hoping to get to this site in June or July. It’s also my understanding that other unarchived data from Gaspe has been collected by LDEO and I would appreciate information on this as well. Thank you for your attention.
Regards,
Steve McIntyre

8/23/2004
Dear Dr. Cook, I’ve run across short discussions of this chronology in Sheppard and Cook, Natural Areas Journal (1988) and again in Cook and Peters (1987). I would like to arrange for someone to visit this site prior to winter and would appreciate particulars on its exact location.
In the Natural Areas Journal article, you also reference a cedar site in Michigan which has not been archived. I presume that the pending cedar site in Maine refers to Sag Pond – is this correct?
Regards, Steve McIntyre

9/24/2004
will send something to you next week.
Ed

10/15/2004
Any progress with this?
Steve

10/15/2004
Hi Steve,
I will do my best next week. I have been a bit over the top on things lately.
Ed

11/16/2004
Any progress on this?
Steve

1/31/2005
Dear Dr. Cook, as I mentioned in my email to Connie Woodhouse, I would appreciate a listing of the sites used in your interesting recent article in Science, Cook et al [2004], preferably in a format that includes ITRDB codes where available. Connie Woodhouse mentioned that you travel frequently – which is certainly evident from the varied places that you have reported on. I think that it would be a good idea to simply archive the listing as an additional SI, but in any event, I would appreciate the listing. Thanks, Steve McIntyre

PS if you’ve had an opportunity to locate the exact location of the Ste Anne River, Gaspe tree series, I would appreciate it. I’ve had no luck getting the 1991 update to this series from Dr Jacoby, all of which is quite frustrating, and lends itself to criticism.

2/4/2005
Dear Connie, I’ve sent a request to Cook without any acknowledgement. In view of Cook’s previous behaviour, I do not think that the problem arises from Cook’s travel. In your capacity as a co-author, I re-iterate my request for identification of the sites and, if you do not have the information, request that you take responsibility for obtaining the information and then forwarding to me. I’m tired of sending unacknowledged emails to Cook. Regards, Steve McIntyre

Update 3:
Francois and Bender, I might have found the Gaspe cedar location. Take a look here here. Right description, right location AND, like Graybill sites, easy access.

Original Caption: headwaters of the Ste Anne River

On the morning of our second day, we started our ‘serious’ exploration of the Gaspé by driving north up the 86-mile valley of the Grand Cascapedia River, which rises near the Chic Choc mountains in the central part of the peninsula. Highway 299 is a great road, with very little traffic, winding its way between the fast-flowing river and forest covered hills.

It was about 11:30 AM by the time we reached the Gite du Mont-Albert area of Gaspésie Provincial Park, the main starting point for tourist activities in this part of Quebec, renowned for its hiking and winter-skiing activities.

After paying our C$3.50 per person day-use fee at the Interpretive Centre, Sue and I confirmed that the higher trails leading into the peaks were still closed due to snow depth, so we decided to at least try the relatively short ‘Belvedere (Lookout) de la Lucarne’ route. Here, Sue is starting up the trail through the forest shortly after noon and the 2nd photo shows me with some of the peaks in the background as we emerge in a clearing higher up the slope. The 3rd photo shows one of the trail signposts to help keep us sorted out on the interconnecting system (along with a small map that I had printed from the internet before leaving home). In less than a half-hour, we had reached the wooden ‘belvedere’ on a small rise, where we had great views of the mountains in all directions (4th and 5th photos). Because the trails are interconnected, we decided to continue onward down the slopes to the nearby Sainte-Anne River and circle back to the Gite area by a different route….

We had noticed many piles of Moose droppings as we ascended and, sure enough, only a few minutes after leaving the belvedere we stumbled upon one of these large beasts browsing beside the trail. It was as surprised as we were and ambled off into the forest before I could draw my camera!

After descending from our Lucarne ‘lookout’ perch, we crossed Highway 299 and quickly encountered the narrow upstream reaches of the Sainte-Anne River as seen here. A very well-built pedestrian bridge (2nd photo) allowed for easy crossing and we were soon exploring along the banks of this fast-flowing and clear body of water. We had brought a cooler with us when we left on this Gaspé trip and used supplies from it to make ourselves some cheese and tomato sandwiches before setting off on the hike. There were not a lot of dry places to sit in the forest this early in the season, but we managed to find some boulders beside the river to use as seats while we enjoyed an early afternoon picnic and the sound of rushing water (3rd photo).

On our way back to our parked car, we continued along this side of the river before crossing a second foot-bridge to return to our starting point. Along the way we came across many places where winter snow was still hanging on in the shadows of the forest (4th photo) and also a few diversions off the trail because of winter blow-down trees (5th photo). Shortly after skirting that large specimen snapped off at ground level, we met two Park maintenance workers heading toward it with a chainsaw as they carried out their clean-up duties prior to the real start of the tourist season. By 2:30 PM, we were in our car and headed for the north coast, planning to stay in Ste.-Anne-des-Monts where our little stream finally reaches the St. Lawrence River.

The Mann Correlation Mystery

Here’s another interesting mystery in Mann et al 2008. Their SI table rtable1209 reports correlations to 1850-1995 instrumental temperature. Their PNAS SI Table SD1 sets all but 484 “significant” correlations to NA, so the rtable1209 table is more comprehensive. The instrumental version supposedly used in their calculations is now archived at WDCP (hooray), though it wasn’t archived in their Penn State SI, and this can be used to test reported correlations. In their SI, they report lat-longs of all 1209 series.

I calculated 1850-1995 correlations between proxies and corresponding gridcell values and compared to reported values, yielding the graphic below (color coded by proxy “type”). As you can see, many correlations match exactly (up to rounding) – showing that the calculation is grabbing the “right” things for many of the series, but many don’t. There is also a remarkable pattern in the differences, which is evident as soon as you look at the graphic and which I’ll discuss below.

Figure 1. Scatter plot of calculated correlations from "infilled" 1209 series and corresponding gridcell temperatures

The pattern is this: if the Mann correlation is positive, then the reported value is equal to or greater than the value that I calculated; while if the Mann correlation is negative, the reported value is less than or equal to the value that I calculated. So if you multiply the difference by the sign of the Mann correlation, you get the following highly non-random pattern:
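The sign adjustment is a one-liner in R. The sketch below uses made-up numbers purely to show the mechanics; `reported` and `calculated` are hypothetical vectors, not values taken from the SI or from my calculations.

```r
# Toy illustration with invented numbers; not values from the SI.
reported   <- c(0.25, -0.18, 0.05, -0.30, 0.12)   # hypothetical reported correlations
calculated <- c(0.20, -0.10, 0.05, -0.30, 0.02)   # hypothetical recalculated correlations

# Difference between reported and calculated, multiplied by the sign of the reported value
signed_diff <- (reported - calculated) * sign(reported)

# Non-negative whenever the reported value is at least as far from zero as the calculated one
signed_diff
```

If the differences were just noise in the matching of gridcells, `signed_diff` would be expected to scatter on both sides of zero rather than sit on one side.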

Figure 2. Difference between reported correlation and calculated correlation, multiplied by sign of Mann correlation

If one now looks at histograms of the correlation, one gets the following. The reported correlations are “hollowed out” around 0, yielding a somewhat trimodal distribution. The bump out on the high end of the correlations comes from the Luterbacher series, which use instrumental data. These are claimed in the 484 “significant” correlations, though they are unrepresentative of proxy series used in the MWP. There are also a lot of Briffa MXD gridded series in the 484.

Figure 3. Histograms of reported and calculated correlations

There’s a tricky little comment in the SI which I’m going to investigate in this connection:

To pass screening, a series was required to exhibit a statistically significant (P < 0.10) correlation with either one of the two closest instrumental surface temperature grid points

This would tend to hollow out the distribution as there are two chances at a “significant” correlation. So far, I haven’t figured out how (or whether) Mann adjusted his significance benchmark for this double dip and would be interested in any reader thoughts on this (see the SI.)
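As a rough guide to the size of the double-dip effect, here is a sketch under two simplifying assumptions that are mine, not Mann’s: the test is one-sided and the two gridcell correlations are independent (neighbouring gridcells will in fact be correlated, which would shrink the effect). It also reuses the Neff=140 interpretation of the benchmark.

```r
# Sketch under two assumptions: a one-sided test and independent gridcells.
neff  <- 140
p_one <- 0.90                        # nominal single-test percentile

# Chance that at least one of two independent null correlations clears the benchmark
p_pass_either <- 1 - p_one^2         # 0.19 rather than the nominal 0.10

# Per-test percentile needed to keep the overall pass rate at 10%
p_adj <- sqrt(p_one)

bench_nominal  <- tanh(qnorm(p_one, sd = 1 / sqrt(neff)))
bench_adjusted <- tanh(qnorm(p_adj, sd = 1 / sqrt(neff)))
c(nominal = bench_nominal, adjusted = bench_adjusted)
```

On these assumptions the unadjusted benchmark lets through nearly twice as many null series as advertised, and the adjusted hurdle lands close to the nominal 95th percentile value.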

Jeff Id

Another interesting post from Jeff Id: http://noconsensus.wordpress.com/2008/09/20/online-experiment-with-the-latest-hockey-stick/

Mann 2008: the Luterbacher Mystery

Jeff Id has identified another intriguing mystery in the arduous problem of determining what Mann’s realdata was. Jeff observed that the version of the Luterbacher lutannt10 series in Mann’s infilled data version allproxy1209 was different than the version of lutannt10 in allproxyoriginal (Sep 5 version), illustrating this as below:

I’ve verified this information; below is my replot of lutannt10 versions:

I also compared versions of this series in the 6 versions of Mann’s data archived over the past week (3 infilled versions, 2 distinct; 3 ‘original’ versions, 2 distinct). Two different versions of lutannt10 occur in the 4 distinct versions. The version labelled “1209” above occurs in the Sep 4 allproxy1209, Sep 5 allproxy1209 and WDCP proxy-infilled data sets, while the version labelled “original” occurs in the Sep 4 allproxyoriginal, the Sep 5 allproxyoriginal and the WDCP proxy-original data versions, i.e. one version occurs in all infilled versions and an inconsistent version appears in all “original” versions.

The Luterbacher data is not itself archived and so there is, at present, no way to verify which is the “original” ‘original’ realdata. (I think that Hans Erren tried to obtain Luterbacher’s data a couple of years ago and was unsuccessful.)

Whatever the result, I guess that we’re going to see still one more version of either the infilled or original data and maybe even get the real realdata. At that point, we won’t just have the “original” data; we’ll have a Team of original data.

BBC Climate Wars Part 2

Discussed here with embedded youtube clips. Here is a clip on the Stick. Try not to puke.