Mann cited Ammann and Wahl’s recently released paper at NAS (which was not available to us in time for the NAS panel, although I’d seen and reviewed an earlier draft). After reading it, Per said that he thought that the reviewers had done a lousy job.
Now I was only a reviewer for the first draft (after sending in my review, I seem to have been replaced; I never heard anything more from Climatic Change.) I provided many detailed comments that were simply ignored. Obviously I have an interest in the matter, but Schneider knew of my interest and presumably that was one of the reasons that he asked me to review. (I had previously reviewed a submission by MBH to Climatic Change in 2004, which has never seen the light of day. My review was detailed, so Schneider knew how I reviewed things. Actually my review of the MBH submission led to the introduction of a limited data policy at Climatic Change for the first time, so I had a positive contribution.) In this case however, Schneider disregarded most of my comments – which, by and large, pertain to objective things even though there is a controversial edge.
The only materially new sections of the revised article are the discussions of RE versus R2, low frequency versus high frequency, Appendixes 1-3 plus of course the table of verification statistics. Ammann and Wahl say that they have provided the verification statistics so they are "available to the community". Good and I’m glad that they saw the light. But let’s be clear about this – they were not added as a simple response to a reviewer request. They refused to provide this information to me as a reviewer and Schneider abetted this. Readers of this blog know that, as recently as December AGU, Ammann was still refusing to disclose the verification r2 and similar statistics. It was only because we filed a complaint with UCAR about misconduct in withholding the results and because I’ve got an audience at the blog and have hammered away at Ammann for attempting to withhold adverse results, that they disclosed the adverse results.
I’ll have quite a bit to say about the article itself in a few days. (I’ve still got to bring the NAS panel to the end of last Friday.)
No one should be under the impression that, if I’d been a little "nicer" to Ammann, he would have done this on his own. I’ve made some very polite offers to Ammann and have been ignored. He fought disclosure to the bitter end. If you don’t believe me, read on. This goes up to our review of Ammann and Wahl, which I’ll post up tomorrow or the next day. I’ve started with some requests from me to Ammann long before the Climatic Change process began. I’ve described my discussions with Ammann at AGU elsewhere.
Correspondence with Ammann and Climatic Change
Dec. 22, 2004 SM to Ammann (no answer)
Dear Dr Ammann,
I attended your presentation at the AGU last week on your emulation of MBH. As you may be aware, I have considerable background with this and am interested in the project. When do you anticipate posting up your results? I would be interested in any results that you are able to share at the present time.
Thanks, Steve McIntyre
Jan 4, 2005 SM to Ammann (no answer)
Dear Dr Ammann,
Michael Mann has been citing an article of yours under review as both confirming his results and discrediting findings attributed to McIntyre and McKitrick. I would appreciate a chance to look at your article. Thanks, Steve McIntyre
May 12, 2005 Clim Chg to SM
Dear Dr. McIntyre,
Attached is a letter from Stephen Schneider requesting review of the above referenced paper, which is also sent as an attachment (ms and four figures).
Please acknowledge receipt and let us know if you need a hard copy.
May 12, 2005 SM to Clim Chg
I appreciate the invitation and would be happy to provide a review within 4 weeks. Regards, Steve McIntyre
May 12, 2005 Clim Chg to SM
Dear Dr. McIntyre,
Thank you for confirming receipt and for your interest in providing comments. Attached are also the Guidelines for Reviewers and Climatic Change Editorial Policy. CCedpolicy98.pdf CCGuidelinesRevs98.pdf
Regards, Katarina Kivel
May 13, 2005 Ammann and Wahl to M&M
Dear Steve McIntyre and Ross McKitrick,
we have finally submitted our manuscripts containing our own reproduction of the Mann-Bradley-Hughes climate reconstruction including a now complete analysis and verification of suggested modifications put forth in your GRL and Energy and Environment articles.
It is our understanding that you should get the two papers to review shortly (or you might have received them already). If you should not receive such a request, please let us know so that we can send you a copy.
Caspar and Gene Wahl
May 13, 2005 SM to Ammann (no answer)
Thanks for the email. I’ve received the CC paper, but not the GRL paper, which will probably arrive next week, so I wouldn’t mind seeing it.
Obviously there’s quite a lot of interest in this topic. Yours is the 4th Comment submitted so far to GRL, so replying has become a small industry. We have 2 Replies to finalize for next week. I suspect that they’ll run all of them at the same time.
I’ve started reconciling your code to our code and finding a lot of similarities so far. I’m glad it’s in R. While you characterize your results quite differently than we do in our EE article, many of the conclusions on calculations seem pretty similar (you note similarities in a couple of places in the CC submission). In my opinion, the key issues are going to be assessing the quality of bristlecones as a determinant of world climate history and sole reliance on RE statistics without insuring against possibilities of spuriousness.
While both parties have different objectives in terms of conclusions that they wish to emphasize, I’m 99% sure that there will be a great deal of common ground in terms of code. In order to focus debate, I would like to suggest that we try to work towards some joint statement on how we have emulated MBH98 and on any residual differences between our methods. I’m annotating as I go, and if there is some possibility of doing a joint statement, I’ll share these comments rather than using them for controversial purposes.
I would characterize both algorithms as emulations, as neither of us has "reproduced" MBH98 in audit terms, although each of us has replicated enough of the characteristics to make analytical statements. I think that your website language is somewhat misleading in this respect. In passing, it seems a little churlish that you should criticize von Storch (correctly) for not attempting to replicate MBH methods in your CC article, while at the same time, not acknowledging our emulations which attempt and substantially accomplish what you criticize vS for not doing.
Regards, Steve McIntyre
June 6, 2005 SM to Climatic Change
In a first look at the submission by Wahl and Ammann, I noticed the following missing information and data, which I require in order to finish the review.
The authors rightly place considerable importance on the need to report verification statistics on climate reconstructions (see page 7) and stated (page 23) that such verification statistics would be available at a website. However, I was unable to locate them in the article or at the website, other than the Reduction of Error statistic, the distribution of which is at issue and which certainly should not be the only significance test cited. Could you please have the authors provide the following verification statistics for each of the runs cited in the article:
1. Skill Score (according to formula in Wilks, equation 7.20)
2. Product-moment correlation coefficient (Cook et al.)
3. Sign test (Cook et al.)
4. Product means test (Cook et al.)
5. Coefficient of efficiency (Cook et al.)
Thanks very much. I anticipate making a number of comments after receiving this information.
June 10, 2005 Response by Ammann and Wahl
Dr. Stephen Schneider
Editor in Chief, Climatic Change
Dear Dr. Schneider:
This communication is in response to a request for additional information (specifically calculation of a number of statistics) for our submission, # 3321.
Our general conclusion is that the statistics we already have included in mss. #3321 are the most meaningful for the purpose of examining the validity of reconstructions of decadal and multi-decadal trends of surface temperature over the last millennium. Extending the set of measures to include those requested would add only very-high frequency (interannual) information that cannot, by construction, examine the fidelity of reconstructing longer-term trends. Thus, these measures are not directly relevant to the purpose of mss. #3321. We explain our reasoning in detail below.
We also would like to emphasize that the purpose of making our code and data sets available (cf. http://www.cgd.ucar.edu/ccr/ammann/millennium/CODES_MBH.html ) is to facilitate examination of the MBH reconstruction and the other scenarios we examine. Of course, the reviewer is free to use these tools to calculate these statistics him/herself. Indeed, we are already aware of one such use at the following website, http://www.climateaudit.org (S. McIntyre).
First, and most generally, the statistics requested by the reviewer measure in the high frequency (interannual) range, as explained in detail below for the various measures. It was not by omission of consideration that we did not include any of these measures, but rather that we consider the exact interannual tracking of the climate reconstructions we have done to be, at most, of minor consequence to determining their usefulness. The reason for this is that it is at the scale of low-frequency information (multi-decadal to secular variation) that issues about last-millennium climate reconstructions are the most salient. This is clear from the last-millennium paleo-reconstruction literature, and the scientific debate has generally shifted towards this set of issues in regards to the MBH reconstruction in particular, as demonstrated by the attention given to the von Storch et al. (2004) and Moberg et al. (2005) examinations of it. In mss. #3321 itself, we address the primary issue of whether or not the early 15th century can reasonably be considered anything like the later 20th century in terms of N. Hemisphere average surface temperature. Individual years are not at issue here, but rather averages on the order of 2-5 decades. We also address the low-frequency amplitude issues raised by von Storch et al. and Moberg et al., in recognition of their importance.
The measures we use, RE and deviation from the mean of the verification period, are specifically included to account for this consideration. RE, by design, picks up a combination of both high and low frequency information in the independent verification period (explained below), and the deviation of the reconstructed verification-period mean from its instrumental counterpart picks up the lowest frequency information possible, at the scale of the entire verification period (1854-1901). We believe that the combination of these two measures is appropriate to characterize the reconstructions for the primary task of discerning long-term deviations from the calibration-period (1902-1980) mean, which is the heart of the matter for last-millennium reconstructions.
Consequences of Focusing on High Frequency-only Measures
If we were to employ high-frequency-only measures, our primary conclusion concerning the trajectory of N. Hemisphere temperature over the 600 years would not be fundamentally altered. That is, the results presented by McIntyre and McKitrick, which we find to be without merit based on RE and deviation from verification period mean, would still remain without merit. None of these results would be altered into significance by the use of high-frequency-only measures, thus the MM "correction" to MBH that the early 15th century was at least as warm as the late 20th century would be refuted in any case. What could possibly change is that some of the MBH "segments" (based on varying richnesses over time of the proxy data) and some of the WA scenarios we present might not pass verification significance testing at the highest-frequency domain. If one wanted to use this frequency domain as the primary gauge of significance (which we argue, as above, is not at the heart of the matter), then the most impact such consideration would have would be to make moot the reconstruction scenarios thus judged. In such a process, some information that is demonstrably valid at lower frequencies could be lost, but no new information would be added.
An analogy to the frequency spectra of musical instruments is apt in this regard. A violin and flute playing A440 are both producing sound pressure waves with a fundamental frequency of 440 cycles per second. Although they are playing the same note, what allows us to readily detect that two quite different instruments are being played is the sonic energy being produced at higher frequencies by each instrument (called "harmonics", or more generally "overtones"). The energy and frequency spectra of these higher-frequency components of the whole sound differ for families of instruments, and indeed for each individual instrument. Using high-frequency-only measures of merit as the final arbiters in validating climate reconstructions would be analogous to using only the overtones to characterize the sound being produced by different instruments. Doing so would allow us to determine which instrument is being played, but would lose the information of what notes they are actually playing! In climate reconstruction, such a process would involve losing trend information in relation to a standard (typically the calibration period mean), but would focus on year-to-year fidelity. It is exactly this result that use of the statistics requested as primary validation criteria would entail.
Based on these considerations, we believe that the measures of merit we have reported in mss. #3321 are appropriate to validation at the frequency domains that are salient in last-millennium climate reconstruction of hemispheric/global averages. That high-frequency information has, at least some, relevance we do not argue, but we do strongly argue against using high-frequency-only measures as final arbiters of significance. To do so could result in throwing out demonstrably valid decadal/multi-decadal information, which we believe is a scientifically inappropriate waste of information.
Considerations Concerning the Requested Statistics and the RE Statistic
All of the requested statistics we have examined (the first four) isolate high-frequency (interannual) information on reconstruction performance. Each of the four measures is evaluated in this regard here. The RE statistic is evaluated in relation to the other statistics in (5).
1) The product moment correlation coefficient (r) Any arbitrary offset in the means of the series being compared leaves (r) completely unchanged, meaning that it can have either low or high values that are entirely unrelated to the low-frequency performance of the reconstructions.
2) The coefficient of efficiency (CE) In the case of CE, a related issue arises in that, by design, the mean of the period being examined is the standard against which deviations in the instrumental values are calculated (cf. (5) below). Thus, CE is incapable of measuring the ability of the reconstructions to detect changes in mean behavior (in relation to the calibration period mean) of the instrumental data being used for verification.
3) The sign test This test is, again by construction, a high-frequency-only statistic. It measures only year-to-year changes in the direction of sign of the reconstructions in relation to those of the actual values.
4) The product means test The product means test can also be a test that is insensitive to detection of changes in climate average behavior–depending on the mean values used for calculating the "cross-products of the actual and estimated yearly departures from their respective mean values" (Cook et al., 1994). If these means are both over the verification period, then again, this statistic is a high-frequency-only measure. It is this use that we expect from the context of the Cook et al. explanation.
5) The reduction of error statistic (RE) RE is identical to CE, with one exception. For a given period of interest, both subtract from one the ratio of the sum of squared residuals of reconstruction to the sum of squared deviations of the instrumental values from their mean. In the case of CE, as mentioned in (2), the mean for the instrumental values during the verification period is the verification-period mean itself. In the case of RE, the mean for the instrumental values during the verification period is the calibration-period mean. This difference allows the RE of verification to detect as useful information changes in the mean of the reconstructed values from the calibration-period mean. RE rewards this detection, and thus it can register as a valid reconstruction one that does lose some high frequency fidelity in the verification period, but which retains useful low-frequency fidelity in relation to offsets from the calibration period mean. Cook et al. discuss this "odd behavior" that a high-frequency test (they mention r2) can show poorer performance than RE in such a situation. However, this discussion is concerned with ensuring that high frequency reconstruction fidelity is the target of interest; conversely–and most importantly–the detection of differences of mean between the calibration and verification periods is not considered as a target of examination.
Concerning whether RE is "at issue" The requester mentions that the RE statistic is at issue, a claim that Dr. Ammann and I have shown is made moot by the results of our indirect tests in ms #3321. In addition, Dr. Ammann and I have shown in other material referenced in mss. #3321 that the analysis of McIntrye and McKitrick in GRL (2005)–which claims RE significance levels are improperly determined by Mann, Bradley, Hughes–is itself deeply flawed. Thus, the argument in the request is incorrectly put in this regard, and it also ignores that we do use an entirely separate statistic–the deviation from verification period mean. [my bold]
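For readers who want to see how the statistics argued over in this response differ in practice, here is a minimal sketch in Python. It follows the definitions as stated in the letter: RE and CE both subtract from one the ratio of squared reconstruction residuals to squared deviations of the instrumental values from a reference mean, with RE using the calibration-period mean and CE using the verification-period mean. The function names and the six-point toy series are hypothetical and chosen purely for illustration; they are not taken from the Wahl and Ammann code, from MBH, or from any real data.

```python
from statistics import mean


def sum_squared_error(obs, recon):
    """Sum of squared residuals between observed and reconstructed values."""
    return sum((o - r) ** 2 for o, r in zip(obs, recon))


def reduction_of_error(obs, recon, calibration_mean):
    """RE: the reference is the calibration-period mean."""
    ref = sum((o - calibration_mean) ** 2 for o in obs)
    return 1.0 - sum_squared_error(obs, recon) / ref


def coefficient_of_efficiency(obs, recon):
    """CE: the reference is the verification-period mean of the observations."""
    verification_mean = mean(obs)
    ref = sum((o - verification_mean) ** 2 for o in obs)
    return 1.0 - sum_squared_error(obs, recon) / ref


def pearson_r(x, y):
    """Product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy verification period (illustrative numbers only).
# The calibration-period mean is taken to be 0.0; the verification-period
# observations sit about 0.4 below it.
calibration_mean = 0.0
obs = [-0.5, -0.3, -0.5, -0.3, -0.5, -0.3]
# A toy "reconstruction" that gets the verification-period mean right
# but gets every year-to-year wiggle backwards.
recon = [-0.3, -0.5, -0.3, -0.5, -0.3, -0.5]

re_val = reduction_of_error(obs, recon, calibration_mean)
ce_val = coefficient_of_efficiency(obs, recon)
r_val = pearson_r(obs, recon)

print(f"RE = {re_val:.3f}, CE = {ce_val:.3f}, r = {r_val:.3f}")

# Adding a constant offset to the reconstruction leaves r unchanged,
# which is the property described in item (1) of the letter.
r_shifted = pearson_r(obs, [v + 1.0 for v in recon])
print(f"r with offset = {r_shifted:.3f}")
```

In this toy case RE comes out positive (about 0.76) while CE is about -3 and r is -1: the same series passes on RE and fails badly on the other two. This only illustrates the arithmetic of the definitions; it says nothing about which statistic ought to be reported for an actual reconstruction.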
June 10 Schneider to SM (Cover Letter for A&W Response of June 10, 2005)
Dear Dr. McIntyre,
With regard to your request, authors Eugene Wahl and Caspar Ammann claim (see attached) that much of the data you have requested can be derived from information they have already given and argue that high-frequency results are not what is in debate in most of the literature. In fact, I wonder, given Ed Lorenz’s classical contributions on unpredictability of weather, which leads to stochasticity of high-frequency climate, what could we learn at an interannual time scale about longer term issues like the multi-decade averaged paleo-climatic temperature reconstruction that has been in dispute?
In any case, if you have strong arguments to the contrary, of course, I will be happy to receive them and pass them on to the authors.
In addition, given the nature of this issue, it is not unlikely that there will be unresolved methodological and philosophical differences among reviewers and between reviewers and authors on this topic. If, as I suspect, that turns out to be the case, then the usual practice at Climatic Change, when there is no closure between some reviewers and authors, is to commission “springboard” editorials that openly raise these issues of dispute, so that the broad interdisciplinary readership of Climatic Change can be better enlightened on what is technical and what is paradigmatic disagreement.
But, it is premature to predict an impasse between reviewers and authors until all reviews are in and the authors’ revision is resubmitted.
Thank you for your efforts as a reviewer.
Stephen H. Schneider
June 15, 2005 SM to Climatic Change
Could you please send me a pdf of the following publication. I would like to refer to it in order to respond to the recent letter from Wahl and Ammann in connection with my review of their CC submission. Dr. Schneider should have a copy of the article around (he cited it in a recent presentation in England). If not, could you obtain it from the authors.
Wahl, E.R. and Ammann, C.: "Stationarity and Fidelity of Simulated El Niño-Southern Oscillation Climate Proxies over the Last Millenium in Forced Transient AOGCM Output".
At their website, they say the following in connection with this study: "This result indicates that modern-period validations of reconstructions based on relatively poor-quality proxies can give a strongly false sense of security about the likely long-term reliability of these reconstructions." Thus, it bears strongly on their current argument for refusing to produce verification statistics that I believe to be relevant.
Regards, Steve McIntyre
June 15, 2005 Ammann Response to June 15 Request
This request for an additional manuscript is rather puzzling to me. It appears highly unusual that a reviewer would be requesting through an editor material that is (a) not mentioned anywhere in the manuscript and (b) not at all relevant to the research contained in the paper under review. The submitted manuscript, including the online distribution of the reconstruction code, seems quite sufficient for performing a review.
The requested paper is completely irrelevant for the review because it is based on Climate Model data, studies a few single isolated grid points, and focuses on high-frequency interannual climate variability. All three of these issues are not under consideration in the Climatic Change manuscript where the aim is to introduce an open-source code to redo Mann-Bradley-Hughes (MBH) within its own framework and evaluate a number of recently raised criticisms that concern century scale hemispheric climate. Neither the criticisms nor our evaluation addressing them question the fundamental assumptions underlying MBH. This is clearly stated in our submission. There is no element in the thrust of the manuscript that the reviewer is considering that has any link to the mentioned paper on Stationarity and Fidelity of ENSO using climate model data.
After brief consultation with E. Wahl, I politely decline this request and would ask the reviewer in question to get in touch with us if he or she is interested in this science unrelated to the manuscript under consideration.
June 22, 2005 SM to Climatic Change (no answer)
Dear Dr. Schneider,
In your letter of June 10, 2005, you suggest that I, in my capacity as a reviewer, should carry out computer runs in order to obtain the data that I requested from Wahl and Ammann. You stated as follows:
With regard to your request, authors Eugene Wahl and Caspar Ammann claim (see attached) that much of the data you have requested can be derived from information they have already given.
Wahl and Ammann’s exact words were:
We also would like to emphasize that the purpose of making our code and data sets available (cf. http://www.cgd.ucar.edu/ccr/ammann/millennium/CODES_MBH.html ) is to facilitate examination of the MBH reconstruction and the other scenarios we examine. Of course, the reviewer is free to use these tools to calculate these statistics him/herself.
First, the website in question only contains information on one scenario. It does not include information on the other scenarios examined. These are promised after publication in Climatic Change. So the data sets involved in the “other scenarios” are, as a matter of fact, not available.
Second, the availability of “much of the data” is no substitute for the availability of all the data. In my experience, it is usually the data that is hardest to obtain that is most likely to prove problematic.
Thirdly, availability of source code does not affect the authors’ responsibility to provide requested data and results. The availability of source code is very important for verification and replication, but it is not a substitute for provision of important and standard statistics as calculated by the authors.
Finally, last year, with respect to another paper, you explicitly took the view that Climatic Change reviewers were not expected to run source code. Your words were:
Reviewers are not expected to rerun authors codes, (Jan. 25, 2004)
[it] is not generally a reviewer responsibility to perform replication analyses–as a practical matter we’d have precious few pro bono reviewers if each were required to perform replication work on complex codes–theirs or anyone else’s. (Feb. 19, 2004)
Wahl and Ammann may not be aware of this CC policy, but I find it particularly odd that you should have adopted the position of your recent letter. I request that you re-consider your decision in the light of the opposite position that you took last year.
Thank you for your consideration on this matter. I also find both the reasons for the refusal by Wahl and Ammann to provide the requested results and the refusal itself to be very unacceptable. I will send you comments on this in a separate email.
Regards, Steve McIntyre
June 22, 2005 SM to Climatic Change (no answer)
The response letter from Wahl and Ammann states that they "have shown in other material referenced in mss. #3321 that the analysis of McIntrye and McKitrick in GRL (2005)–which claims RE significance levels are improperly determined by Mann, Bradley, Hughes–is itself deeply flawed."
This "other material" is not on the present record. 1) Could you ask them to briefly summarize the flaws in the analysis of RE significance levels that they are referring to here. 2) if the other material is unpublished, could you ask them to provide a copy of the other material so referenced. 3) could you find out from them the approximate anticipated publication date of the other material?
Regards, Steve McIntyre
July 7, 2005 SM to Schneider
Dear Dr Schneider and Ms Kivel,
I have not heard back from you on my most recent correspondence. Proceeding on the information available to me, I have enclosed my review of the Wahl and Ammann submission, which certainly took more time than I would have wished.
However, I have benefited from reading many interesting articles in Climatic Change and am happy to assist with any reviewing where you think that my assistance will be of benefit to CC.
Regards, Steve McIntyre
July 8, 2005 Clim Chg to SM
Dear Dr. McIntyre,
Steve Schneider has asked me to let you know that he appreciates your reviewing the manuscript by Wahl and Ammann in a timely manner. But since it is important that the process be as thorough as possible, please let us know if you need more time to prepare a revised version of your review based on this response from the authors to your most recent request, or if you are satisfied with your review as is.
We look forward to hearing from you.
RESPONSE FROM EUGENE WAHL AND CASPAR AMMANN
The attached article text is in response to the request from the reviewer received June 30, 2005. It is the full text of an article submitted by Caspar Ammann and myself to GRL, which was declined. The decline decision was not for technical reasons, but because GRL had several comments on the same initial paper by McIntyre and McKitrick (2005), of which ours was one, and the editor chose to decline for reasons of repetitiveness. We disagree with this decision from an editorial policy standpoint, however, we are planning to submit this text to another journal besides GRL. What is attached is exactly the text to which we refer in mss 3321. The attachment and this paragraph together provide a full response to the reviewer’s questions 2 and 3.
In response to the reviewer’s question 1, in the quote from our response letter to the reviewer’s earlier request for additional information (response dated June 10) we are actually not commenting on an analysis of the significance of RE levels as such. Rather, we are highlighting that the analysis in McIntyre and McKitrick (2005)–on which the question regarding RE significance levels is itself based–is flawed. We demonstrate these flaws in the attached text, which as mentioned is in the process of being re-submitted (the fundamental scientific content will be unchanged on re-submission). The purpose of mentioning these flaws is to show that the McIntyre and McKitrick article, which is used as a basis for justifying higher RE levels for significance than those commonly used in dendroclimatology and by MBH, is itself at issue for being conceptually inaccurate, and thus cannot be a strong basis on which to question standard analyses of RE significance.
July 8, 2005 SM to Clim Chg
No, it is unnecessary to change anything. Regards, Steve McIntyre
Review Submitted July 7, 2005 (coming soon)