Detrended in Amherst

Wahl et al. [2006] fulminated as follows:

The VS04 results have been interpreted to cast serious doubt on the MBH reconstruction. … However, these results are in large part dependent on a detrending step not used by MBH, which is physically inappropriate and statistically not required. The take-away message for the climate community should be strong encouragement for more vigorous cross-comparisons of the various reconstruction implementations, based on real-world proxy series, model emulations, and simulated modifications to real-world data. Such a step would help eliminate unnecessary confusion that can distract from the crucial contributions of climate change research to important scientific and policy questions.

Quite aside from the issue of whether trending or detrending makes any difference to the VZGT results (I’m convinced that the issue is immaterial to their principal point), it grates on me no end that Mannians can seemingly with a straight face suggest to others that cross-comparisons are a good idea as a means of avoiding confusion, after it has required years of quasi-litigation to gradually unpeel details of Mannian methods. I want to go back over some of our correspondence with Nature. I started doing this because I recalled some curious issues of trended-vs-detrended arising in our Materials Complaint. If the implementation of trending-detrending matters to anything, then Nature should step up and take its share of the blame for failing to respond to very specific requests for methodological clarification. There is also a rich irony in this, because Mann’s justification for not providing proper methods was that Zorita et al 2003 had managed to replicate their results – a claim still extant in the Corrigendum SI.

In the wake of Wahl et al., realclimate piled on as follows:

Von Storch et al. claimed to have tested the climate reconstruction method of Mann et al. (1998) in model simulations, and found it performed very poorly. Now, Eugene Wahl, David Ritson and Caspar Amman show that the main reason for the alleged poor performance is that Von Storch et al. implemented the method incorrectly. What Von Storch et al. did, without mentioning it in their paper, was to remove the trend before calibrating the method against observational data – a step that severely degrades the performance of Climate Field Reconstruction (CFR) methods such as the Mann et al. method (unfortunately this erroneous procedure has already been propagated in a paper by Burger and Cubasch (GRL, 2005) where the authors refer to a personal communication with Von Storch to justify the use of the procedure). Another more recent analysis has shown that CFR methods perform well when used correctly. (See our addendum for a less technical description of what this is all about). How big a difference does this all make? The calibration error in the temperature minimum around 1820, where one of the largest errors occurs, is shown as 0.6ºC in the standard case of 75% variance in the Von Storch et al analysis. This error reduces to 0.3ºC even in the seriously drift-affected ECHO-G run when the erroneous detrending step is left out.

I’m more inclined to think that detrending is an option and that one would want results to be robust to such methodological decisions. But let’s look at some of the history just for amusement. Prior to the publication of MM03, I politely asked Mann if there was a more detailed description of methodology – correspondence here. (I also asked him to confirm that the data was the data actually used in MBH98.) Mann cited Zorita et al 2003 as a reason why he didn’t feel obliged to describe methods in more detail:

Owing to numerous demands on my time, I will not be able to respond to further inquiries. … Other researchers have successfully implemented our methodology based on the information provided in our articles [see e.g. Zorita, E., F. Gonzalez-Rouco, and S. Legutke, Testing the Mann et al. (1998) approach to paleoclimate reconstructions in the context of a 1000-yr control simulation with the ECHO-G Coupled Climate Model, J. Climate, 16, 1378-1390, 2003.]. I trust, therefore, that you will find (as in this case) that all necessary details are provided in the papers we have published or the supplementary information links provided by those papers.

In Mann’s first response to MM03, we see the question of detrended-nondetrended turn up as follows:

MM also appear to inconsistently use standard deviations of un-detrended data, while MBH98 had normalized their EOFs by detrended gridpoint standard deviations.

Yes, you read that correctly. Here Mann is accusing us of using non-detrended data, whereas the "correct" Mannian approach was using detrended data. Now the procedure is different, but one can see why anyone encountering Mannian controversial material might conclude that detrending was part of Mannian methodology. Shortly afterwards, Mann et al submitted an article to Climatic Change (rejected by Stephen Schneider if you can imagine) in which they described "Important Technical Errors" alleged to have been committed in MM03, including the following:

MM03 also appear to have estimated gridpoint standard deviations from the un-detrended surface temperature data, while MBH98 had normalized their EOFs by detrended gridpoint standard deviations.

Again, you read that right. After publication of MM03, we tried again, quite politely, to obtain source code so that any unnecessary methodological questions could be avoided, and were once again rebuffed – correspondence here.

We then filed a complaint with Nature – anyone remember Mann’s 159 series? We asked for results of the individual steps and to this day there is no information on the results of Mann’s individual steps in calibration-verification. Here is our complaint to Nature, in which we said:

Prior to the publication of our article, we requested other particulars on the computational methodology from Professor Mann and were refused. Accordingly, we attempted to assess the impact of the data problems by following the methodology publicly disclosed in MBH98. Professor Mann then criticized us for failing to replicate previously undisclosed details of his methodology. We once again requested particulars on his methodology, including copies of the computer programs used to read in the proxy and temperature series and to produce the Northern Hemisphere temperature index, but we have been categorically refused.

The policies of Nature rightly place a burden on authors to disclose data and methods to any interested readers. We have been systematically and deliberately stymied by Professor Mann on the most elementary requests: a proper listing of his data series and the exact computational procedures used. In the process of trying to obtain this information we have concluded that the disclosure at the Nature SI site is not merely inadequate, but in some cases it contradicts what is now revealed at the University of Virginia FTP site.

Among the listed items in our Materials Complaint to Nature was the following:

10. The disclosure of methodology for calculating tree ring principal components is inaccurate. Again MBH98 methodology is not "conventional". In this case, the FTP site contains computer programs which show that the data was transformed in ways not disclosed in MBH98. These undisclosed transformations have a material impact on the final results.

Nature promised to have our Materials Complaint independently reviewed, but failed to comply with that promise and dealt with it internally. In February, Mann replied to the Materials Complaint including the following response to # 10 above (obviously I disagree with the answer, but I’m trying to move along):

Each of these statements in incorrect. A conventional PCA was indeed used. The authors apparently failed to take note of the stepwise procedure used by us, and described in our paper. This procedure allows PC series to be calculated independently for each sub-interval (e.g. 1820-1980, then 1780-1980, …, 1400-1980) to allow for the use of an increasing number of data in the different sub-networks increasingly later in time. The misunderstanding of this procedure led to them eliminating roughly 80% of the proxy indicators used by us prior to AD 1600, the primary reason for the spurious result that they have reported. Precise details regarding how the data were standardized are provided in the revised supplementary information. We have shown elsewhere that the MBH98 reconstruction is in fact entirely robust with respect to whether or not the proxy series were standardized by the detrended or raw calibration period variance.

Once again, all of the original proxy data used, and all of the PC series used, were available on the public ftp site from July 2002, though the complainants did not download and use the correct data. The new, revised ftp site provides the data and listings of data in a thoroughly documented manner such that similar mistakes should not be possible in the future.

Here Mann confounds the de-centering issue with the unusual stepwise PC calculations, which were being disentangled at the same time. Anyway, shortly after receiving Mann’s response, Nature advised us that they would require a Corrigendum and in March sent us a copy of the Corrigendum (which differed in one important detail from the one finally published in July 2004). We provided detailed comments on the proposed Corrigendum and even discussed detrended-nondetrended as follows:

2. MBH98 stated that for the temperature data “the mean was removed, and the series was normalized by its standard deviation”. Recently, Mann et al. stated that they used “de-trended gridpoint standard deviations” to normalize temperature data. Again, in view of the inaccurate prior description, a line item in the Corrigendum would appear to be warranted.

The issue also came up in Mann’s Reply to our submission to Nature, which would have been read by the referees, as follows:

We also show that our re-standardization of all indicators in the MBH98 network by their detrended standard deviation during the calibration period, prior to calibration and reconstruction did not significantly influence the MBH98 reconstruction (line 3, Figure 1c). This latter step was motivated by the fact that 20th century trends in instrumental and proxy data typically far exceed the expectations for a ‘red noise’ null hypothesis (6). Normalizing by the detrended standard deviation therefore more properly weights the data series with respect to their estimated noise variance….

MM04 criticize the PC representation of the North American ITRDB data which was based, for the period AD 1400-1450, on an EOF analysis of the 70 constituent series which were standardized, as discussed earlier, by their detrended calibration period standard deviation.

Caption to Figure 1: Alternative versions of MBH98 reconstruction (shown for AD 1400-1500 period) in which (3) indicators have not been restandardized by detrended calibration period variance and (4) time series of the reconstructed instrumental eigenvectors have not been standardized to have same variance as the corresponding instrumental eigenvectors during the calibration period.

Anyone reading this might conclude that detrending was part of Mannian methods. In the Corrigendum SI, detrending was again mentioned as follows:

All predictors (proxy and long instrumental and historical/instrumental records) and predictand (20th century instrumental record) were standardized, prior to the analysis, through removal of the calibration period (1902-1980) mean and normalization by the calibration period standard deviation. Standard deviations were calculated from the linearly detrended gridpoint series, to avoid leverage by non-stationary 20th century trends. The results are not sensitive to this step (Mann et al, in review)
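For readers trying to picture the step described in the SI passage above, here is a minimal sketch in Python of standardizing a series by its calibration-period mean and by either the raw or the linearly detrended standard deviation. It is not Mann's code, which remains unavailable; the function names `linear_detrend` and `standardize` and the toy 79-point series are made up purely for illustration.

```python
from statistics import mean, pstdev


def linear_detrend(series):
    """Remove the least-squares straight line (and the mean) from a series."""
    t = range(len(series))
    t_bar, y_bar = mean(t), mean(series)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, series))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sxy / sxx
    return [yi - y_bar - slope * (ti - t_bar) for ti, yi in zip(t, series)]


def standardize(series, detrended_sd=True):
    """Subtract the mean, then divide by the raw or the detrended
    (population) standard deviation of the same period."""
    y_bar = mean(series)
    sd = pstdev(linear_detrend(series)) if detrended_sd else pstdev(series)
    return [(y - y_bar) / sd for y in series]


# Hypothetical "calibration period" of 79 points: a trend plus a small wiggle.
series = [0.01 * t + (0.1 if t % 2 else -0.1) for t in range(79)]

raw = standardize(series, detrended_sd=False)
det = standardize(series, detrended_sd=True)

print("sd after raw standardization:      ", round(pstdev(raw), 3))
print("sd after detrended standardization:", round(pstdev(det), 3))
```

For a series with a trend, the detrended standard deviation is smaller than the raw one, so the same series comes out with a larger amplitude after standardization. Whether that matters to a final reconstruction is exactly the sort of thing one would want to check against the actual code rather than guess at.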

Here we have another example of Hockey Team cheque-kiting. The citation "Mann, in review" presumably was to the MBH submission to Climatic Change, which was probably rejected by the time that Corrigendum SI appeared (and this has not been changed in the Corrigendum.)

Amusingly the Corrigendum SI cites Zorita et al 2003 approvingly here and here:

Note that an additional, independent application of the methodology of MBH98 can be found in the following publication:

Zorita, E., F. Gonzalez-Rouco, and S. Legutke, 2003: Testing the Mann et al. (1998) approach to paleoclimate reconstructions in the context of a 1000-yr control simulation with the ECHO-G Coupled Climate Model, J. Climate, 16, 1378-1390.

We’ve described the schmozzle with Nature in which they cut and cut the word limit, eventually (without notice) down to 500 words, and said that our article was too "technical". Referee #2 of the second review here said:

I would encourage them to pursue their testing of MMB98,and by the way other reconstructions. As I wrote in my first evaluation, this should be a normal and sound scientific process that should not hampered. For instance, questions that seem to be quite critical, such as the sensitivity of the MBH98 reconstructions in more remote periods to changes or omissions in the proxy network or the dependency of the final results to the rescaling of the reconstructed PCs, have become clearer to me now. From the reply in MBH04 I am now afraid that they were not sufficiently described in the original MBH98 work. In particular the PCs renormalization, could have been included as clarification in the recent Corrigendum in Nature by MBH.

I found the last comment intriguing, as it was evidence to me that the Corrigendum had not been peer reviewed by our referees and, if not by them, then it was NOT independently peer reviewed. I tried to get Nature to answer this and they evaded the question, but Marcel Crok eventually got them to admit that the MBH Corrigendum was not peer reviewed. BTW Referee #3 is someone familiar. I also know who Referee #2 is. I’m guessing that Referee #1 (who was added only after referees #2 and #3 recommended our first submission) is Osborn or Briffa or someone in that crowd.

After seeing the inadequate information in the Corrigendum SI, we tried once again with Nature. This time, buttressed by the seeming support of the referees, we re-iterated our request to Nature for Mannian details still unavailable in the Corrigendum SI:

the referees expressly encouraged us to continue our analysis of MBH98 and of multiproxy calculations generally and one of them expressly stated that our efforts should not be “hampered”.

Under the circumstances, we believe that the full data set and accompanying programs for MBH98 should now be included in the Nature Supplementary Information, along with an accounting of any discrepancies between what has been listed at Nature.com to date and what was actually used in MBH98…. [including] "the results of the 11 “experiments” referred to in MBH98" ….

Nature refused even to require Mann to disclose the results of the individual steps in the calibration and verification periods. This remains shocking to me.

And with regard to the additional experimental results that you request, our view is that this too goes beyond an obligation on the part of the authors, given that the full listing of the source data and documentation of the procedures used to generate the final findings are provided in the corrected Supplementary Information. (This is the most that we would normally require of any author.)

A little later, realclimate reported that detrended standard deviations were used in Mannian PC analysis:

Eigenvalue spectrum for Mann et al (1998) PCA analysis (1902-1980 zero reference period, data normalized by detrended 1902-1980 standard deviation):

In passing, Mann and Jones 2003 mentions the use of detrended data variance:

While our ‘standard’ reconstruction involved area and local-correlation weighted composites, the sensitivity to the weighting scheme was also examined. Calibration resolved variance (‘RE’) [see e.g., Mann et al., 1998] was conservatively estimated from the detrended decadal data variance resolved in the instrumental record.

IPCC TAR mentioned in connection with a figure that:

All series were linearly detrended prior to analysis, and spectra computed using a standard Tukey window with the window width (maximum lag used in the estimate) set to one-fifth of the series length,

The issue of whether to choose detrended or non-detrended correlations can be traced back nearly a century in economic literature. To the extent that VZGT were guessing at Mannian procedure, it was not an unreasonable guess, especially based on then-contemporary excoriations by Mann against us for using non-detrended standard deviations in an emulation, excoriations which would (ahem) have been familiar to a Nature referee, who might have been misled by them if he went on to consider such matters.

But no one should be required to guess. I don’t think that the Wahl et al point matters a damn. Even if there is an error in methodology and even if it mattered to the conclusion, Mann has had lots of opportunities to provide accurate information on his methodology and failed to do so. To this day, no one even knows the results of the individual steps. But the most preposterous aspect of this dispute is the realclimate suggestion that Mann is owed an apology.

The people who should be apologizing right now are Nature – they had an opportunity and a justification for requiring Mann to archive both source code and intermediate results necessary to verify results and they failed to do so.

This entry was written by Stephen McIntyre, posted on May 5, 2006 at 2:54 PM, filed under MBH98, Replication and tagged ritson, storch.

26 Comments

Doug L

Posted May 5, 2006 at 4:29 PM | Permalink

“This highway leads to the shadowy tip of reality: you’re on a through route to the land of the different, the bizarre, the unexplainable…Go as far as you like on this road. Its limits are only those of mind itself. Ladies and Gentlemen, you’re entering the wondrous dimension of imagination. Next stop….The Twilight Zone.”
Steve McIntyre

Posted May 5, 2006 at 5:28 PM | Permalink

It is a common view that the term "AlGorithm" derives from Arabic, but I presume that the term is actually derived from the name of the erstwhile inventor of the Internet (also developer of the computer language AlGol). So when Mann said that he would not be intimidated into revealing his AlGorithm, think Austin Powers and his MoJo.
David Stockwell

Posted May 5, 2006 at 6:17 PM | Permalink

So does the Mann procedure use detrended data in a variance estimation step, but “un-detrended” data elsewhere? BTW what on earth is “un-detrended” data? Detrended then undetrended again?
ET SidViscous

Posted May 5, 2006 at 6:19 PM | Permalink

I wonder if there is a mini Mann
TCO

Posted May 5, 2006 at 6:20 PM | Permalink

From SI: “…nearly identical results are obtained through the use of a 1902-1971 calibration period (Mann et al, in review).” Did this paper ever come out?
Steve McIntyre

Posted May 5, 2006 at 6:21 PM | Permalink

#4. Yes. Think about it.
Steve McIntyre

Posted May 5, 2006 at 6:26 PM | Permalink

#5. That’s submission to Climatic Change by MBH in 2004. I was asked to review. I’m sure that I’ve reported on all this. In my capacity as a reviewer, I asked for the supporting data and code. Schneider said that no one had ever asked for such things in 28 years of editing and it would require a change in editorial policy. I asked that the policy be re-considered. They agreed to establish a data policy requiring authors to provide supporting data but did not agree to code and would not even ask for code. So I asked for supporting data. Mann refused once again; so I pointed out to Schneider that Mann had failed to comply with their newly minted data policy and that was the end of the article. By this time, Mann had kited a check on the submission (and it seems another check.) Jones and Mann 2004 cited the rejected submission to supposedly repudiate MM03 and then could cite Jones and Mann 2004. What do the facts matter when you can cite something?
- Skiphil
  
  Posted Dec 7, 2012 at 3:22 PM | Permalink
  
  More fascination with the checkered career of new AGU Fellow Michael Mann. Doesn’t this history resonate in some interesting ways with the recent episodes of Gergis, Karoly et al (2012)….??
TCO

Posted May 5, 2006 at 6:35 PM | Permalink

But the Corrigendum also needs updating. 🙂
TCO

Posted May 5, 2006 at 6:36 PM | Permalink

Do you remember if the assertions in the Corrigendum (I’m assuming the different references to a paper in review are to the same paper) were borne out by Mann’s submitted paper?
Brooks Hurd

Posted May 5, 2006 at 7:30 PM | Permalink

Steve,

If Mann were running a group of banks in the US and pulled these sorts of games, the Federal banking examiners would shut him down, seize his assets and throw him into a Federal prison. Jake Butcher used to shuffle money from bank to bank just before the Federal bank examiners would arrive at one of his banks. From the perspective of that one bank, the books would balance. One day they hit all his banks simultaneously. I believe that he is still in prison.

If Mann is so convinced that he is correct, then it would be a reasonable expectation that he be held to the same standards of proof as Jake Butcher. Jacob Franklin Butcher
bkc

Posted May 5, 2006 at 9:27 PM | Permalink

I wonder if there is a mini Mann

Doctor Schmidt?!
IL

Posted May 6, 2006 at 12:49 AM | Permalink

I wonder if there is a mini Mann

Amman?
Paul

Posted May 6, 2006 at 6:55 AM | Permalink

Wahl seems so forthright about not “de-trending” data. As your co-author pointed out in another thread, it doesn’t really matter whether you “de-trend” or not. You just need to be able to satisfactorily account for the variance in the data over time.

If there is a trend in the data the authors need to:

1. establish whether it is:
a) a stochastic trend (i.e. non-stationary time series), or
b) a deterministic trend (i.e. trend stationary time series).

2. Test over an estimation period that:
a) the errors (or residuals) are properly behaved (IID)
b) the constructed proxy series cointegrates with the historical temperature data (i.e. perform a Dickey-Fuller/Augmented Dickey-Fuller test)

Only then can they proceed to validate the model over a post estimation period.

I have never seen any evidence that Mann, Wahl or any associate have approached either 1, or 2.
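As a rough illustration of the kind of check listed in item 1 above, here is a minimal sketch of the plain (non-augmented) Dickey-Fuller regression in Python, using only the standard library. The function name `dickey_fuller_t` and the simulated series are hypothetical, and nothing here uses proxy or temperature data. The 5% critical value of about -2.86 is the usual asymptotic value for the version with a constant and no trend; a residual-based cointegration test, as in item 2b, needs different critical values.

```python
import random
from statistics import mean


def dickey_fuller_t(series):
    """t-statistic on rho in the regression
       diff(y)_t = alpha + rho * y_(t-1) + e_t.
    A strongly negative value is evidence against a unit root."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(lagged)
    x_bar, d_bar = mean(lagged), mean(diffs)
    sxx = sum((x - x_bar) ** 2 for x in lagged)
    rho = sum((x - x_bar) * (d - d_bar) for x, d in zip(lagged, diffs)) / sxx
    alpha = d_bar - rho * x_bar
    rss = sum((d - alpha - rho * x) ** 2 for x, d in zip(lagged, diffs))
    std_err = (rss / (n - 2) / sxx) ** 0.5
    return rho / std_err


DF_CRITICAL_5PCT = -2.86  # asymptotic, constant included, no trend term

# Simulated examples: stationary white noise and its cumulative sum,
# which is a random walk (a stochastic trend).
random.seed(0)
white_noise = [random.gauss(0, 1) for _ in range(500)]
random_walk = []
level = 0.0
for shock in white_noise:
    level += shock
    random_walk.append(level)

for name, data in [("white noise", white_noise), ("random walk", random_walk)]:
    t_stat = dickey_fuller_t(data)
    verdict = "reject unit root" if t_stat < DF_CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: t = {t_stat:.2f} ({verdict})")
```

The white noise series rejects a unit root decisively, while a random walk usually will not. A real application would use the augmented version with lag selection from an established statistics package rather than this bare-bones regression.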
Peter Hartley

Posted May 6, 2006 at 7:06 AM | Permalink

“Oh! what a tangled web we weave
When first we practise to deceive!”

Sir Walter Scott, Marmion, Canto vi. Stanza 17
Steve McIntyre

Posted May 6, 2006 at 8:16 AM | Permalink

MBH98: correlation (r) and squared-correlation (r2) statistics are also determined

Mann Dec 2004: Our reconstruction passes both RE and R^2 verification statistics if calculated correctly.

Mann to NAS Panel Mar 2006: I did not calculate the verification r2 statistic. That would be a foolish and incorrect thing to do.
David Stockwell

Posted May 6, 2006 at 9:21 AM | Permalink

re: 15. One would be hard pressed to find an honest explanation for that.
Dave Dardinger

Posted May 6, 2006 at 9:43 AM | Permalink

re: #15 It’s simple. Others, the “fools” on the Hockey Team, calculated the r2 but Mann did not.
David Stockwell

Posted May 6, 2006 at 9:57 AM | Permalink

Re: #17. Oh I get it. Bradley or Hughes did the calculation of r2.
Lubo Motl

Posted May 6, 2006 at 12:24 PM | Permalink

Dear Steve,

I claim discovery or at least co-discovery of the fact that Al Gore invented algorithms. 😉

http://www.google.com/search?q=%22al+gore%22+%22only+discovery%22+algorithms

All the best
Lubos
Jean S

Posted May 7, 2006 at 6:30 AM | Permalink

re #13: In the calibration period, MBH98 used a highly reliable test to confirm that residuals are indeed i.i.d. (Gaussian and uncorrelated). The test used is a standard Mannian procedure known to the rest of the world as “inspection” 😉 This is described in MBH98 as follows:

The spectra of the calibration residuals for these quantities were, furthermore, found to be approximately ‘white’, showing little evidence for preferred or deficiently resolved timescales in the calibration process.
Steve McIntyre

Posted May 7, 2006 at 6:46 AM | Permalink

#21. But let’s suppose that someone wanted to check the "inspection" through an actual statistic. We requested these very residuals in December 2003 or alternatively the results of the individual steps. Mann refused, the NSF refused, Nature refused.

The properties of the 1820 step differ from the properties of the 1400 step. Mann presumably noticed this in 1999 – hence the different confidence intervals in MBH99.

Did he even carry out a Mannian "inspection" on the residuals from steps other than AD1820? Or is this one more time, like the verification r2, where the results for the 1820 step were reported but not the results for the problematic early step?

I wish that others would write to Nature and ask for the results of the individual steps so that these "inspections" can be checked.
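To make concrete what "an actual statistic" could look like once residuals are in hand, here is a small Python sketch of two standard ones, the lag-1 autocorrelation and the Durbin-Watson statistic. The residual series below are invented for illustration only, since the MBH98 residuals have never been made available, and the function names are hypothetical.

```python
import math


def lag1_autocorrelation(residuals):
    """Sample autocorrelation of the residuals at lag 1."""
    m = sum(residuals) / len(residuals)
    dev = [r - m for r in residuals]
    return sum(a * b for a, b in zip(dev[:-1], dev[1:])) / sum(d * d for d in dev)


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 for uncorrelated residuals,
    near 0 for strong positive and near 4 for strong negative
    serial correlation."""
    num = sum((b - a) ** 2 for a, b in zip(residuals[:-1], residuals[1:]))
    return num / sum(r * r for r in residuals)


# Invented residual series, 79 and 80 points long.
persistent = [math.sin(2 * math.pi * t / 79) for t in range(79)]  # slow drift
alternating = [1.0, -1.0] * 40                                    # sign flips

for name, res in [("persistent", persistent), ("alternating", alternating)]:
    print(name,
          "lag-1 r =", round(lag1_autocorrelation(res), 3),
          "DW =", round(durbin_watson(res), 3))
```

The Durbin-Watson statistic is approximately 2(1 - r1), where r1 is the lag-1 autocorrelation, so either number tells much the same story. Residuals that drift slowly, as in the first example, are anything but "white", whatever an inspection of the spectrum might suggest.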
Steve McIntyre

Posted May 7, 2006 at 6:54 AM | Permalink

A curiosity in the 2004 Corrigendum. Although the MBH98 confidence intervals were supposedly increased in MBH99, there is no mention of this in the Corrigendum. In fact, the confidence intervals here http://www.nature.com/nature/journal/v430/n6995/extref/FigureData/nhmean.txt are unchanged from MBH98.

A couple of days ago, realclimate was huffing and puffing about how an amendment published by VZGT in an "obscure" journal was an insufficient correction – leaving aside the issue of whether detrended-nondetrended is germane to the point. GRL is not "obscure", but neither is it Nature. If Mann found that the confidence intervals published in Nature were wrong, isn’t that the place that he should have pointed this out? And even if he had failed to previously do so, how could he justify not doing so in the 2004 Corrigendum?
Jean S

Posted May 8, 2006 at 5:27 AM | Permalink

re #15: Steve, you clearly fell into Mannian word trap 😉 Let’s see:

MBH98: correlation r and squared-correlation r^2 statistics are also determined

Yes, but there is no mention of those being calculated for the temperature reconstructions.

Mann Dec 2004: Our reconstruction passes both RE and R^2 verification statistics if calculated correctly.

Now the reduction of error (RE), usually called the coefficient of determination and called “conventional ‘resolved variance'” in MBH98, is usually denoted with the capital letter, i.e., R^2. So RE=R^2 🙂 Also, even if you take R^2 to be the squared sample correlation coefficient, the result holds if “calculated correctly”! This is because in the simple linear regression (one predictor) R^2=r^2.

Mann to NAS Panel Mar 2006: I did not calculate the verification r2 statistic. That would be a foolish and incorrect thing to do.

Again correct. His computer may or may not have calculated those, very likely he did not do that himself 🙂 It would be foolish since his computer can do it much better, and incorrect because it just might show that there is no statistical skill in his reconstructions.
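For anyone who wants to see the arithmetic behind the R^2 = r^2 remark above, here is a short Python sketch with invented numbers (not MBH data; all names are hypothetical). In the fitting period of a one-predictor least-squares regression, the coefficient of determination equals the squared correlation exactly. In a separate verification period, RE (measured against the calibration mean) and the verification r^2 are different statistics and can disagree sharply.

```python
from statistics import mean


def ols_fit(x, y):
    """Intercept and slope of a one-predictor least-squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pearson_r(x, y):
    """Sample correlation coefficient."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def skill(observed, predicted, reference_mean):
    """1 - (sum of squared errors) / (sum of squares about reference_mean)."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - sse / sum((o - reference_mean) ** 2 for o in observed)


# Invented calibration data: one predictor, one target.
x_cal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_cal = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]
intercept, slope = ols_fit(x_cal, y_cal)
fit_cal = [intercept + slope * v for v in x_cal]

R2_cal = skill(y_cal, fit_cal, mean(y_cal))   # coefficient of determination
r2_cal = pearson_r(x_cal, y_cal) ** 2         # squared correlation

# Invented verification data: the level shift is captured, the wiggles are not.
x_ver = [-3.0, -2.0, -1.0, 0.0]
y_ver = [-1.0, -3.0, -0.5, -2.5]
fit_ver = [intercept + slope * v for v in x_ver]

RE_ver = skill(y_ver, fit_ver, mean(y_cal))   # reference is the calibration mean
r2_ver = pearson_r(y_ver, fit_ver) ** 2

print("calibration  R^2 =", round(R2_cal, 4), " r^2 =", round(r2_cal, 4))
print("verification RE  =", round(RE_ver, 3), " r^2 =", round(r2_ver, 3))
```

In this toy case the verification RE is high because the fit gets the change in mean level roughly right, while the verification r^2 is close to zero because the year-to-year variations are not tracked at all. That is why reporting one statistic and not the other is not a harmless omission.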
jae

Posted May 8, 2006 at 5:32 PM | Permalink

What a shell game! You really do have to keep your eyes on the pea!
Michael Jankowski

Posted May 9, 2006 at 11:07 AM | Permalink

However, these results are in large part dependent on a detrending step not used by MBH, which is physically inappropriate and statistically not required.

So that detrending step was “statistically not required?” What ever happened to being statistically “robust?”

I have never seen any evidence that Mann, Wahl or any associate have approached either 1, or 2.

That’s because they’ve “moved-on.”

One Trackback

By Climategatekeeping: the Nature Intervention « Climate Audit on Jan 5, 2010 at 11:42 AM

[…] MBH98 Corrigendum (July 2004) also check-kited Mann et al (Clim Chg submitted) (see here) In passing, the MBH Corrigendum was not externally peer reviewed – a point directly […]
