I’m going to give a fairly brief account of previous attempts to get the residual series and/or cross-validation R2 from Mann, including inquiries made directly to Mann and NSF, through Nature, by Climatic Change, by Natuurwetenschap & Techniek and by the House Energy and Commerce Committee. As you will see, no one has been able to get Mann to disclose the information – even in response to a very direct question from the House Committee.
Do residuals and cross-validation statistics "matter" and should Mann have to disclose them? Well, they are vital to the consideration of any statistical model. They should be every bit as important to a climate scientist as DNA fingerprints and stem cell colony photos are to stem cell researchers. Imagine if the SI to the Hwang article had not contained this information: without the detailed SI, Hwang would still be in business. There are many reasons short of fraud to examine residuals; concern about fraud is probably the last of them. But none of these other reasons has so far prevailed. Preparing this review has reminded me just how determined Mann has been to avoid disclosing this information, and what a dangerous line he is treading with respect to the House Energy and Commerce Committee.
Mann and NSF
Reprising briefly my recent discussion of this stage: on December 17, 2003, I requested the residual series from Mann, copying David Verardo. Verardo had replied here that the source code was Mann’s personal property – a highly questionable assertion, since Mann’s terms of employment appear to provide that the source code was university property, a claim discussed at Title to Source Code and The Tort of Conversion here.
After being rebuffed by Mann in the request for residuals, on Dec. 17, 2003, we added this request to our existing Materials Complaint to Nature here (item 3). Nature promised to seek "external independent advice" on these matters, but failed to do so as is evident in the correspondence file.
In February 2004, since Mann’s response to the Materials Complaint had not satisfied Nature, the editors advised me that they would require a Corrigendum, saying that they "trust that the responses answer all your queries, and that you find this resolution of the matter satisfactory". When we expressed concerns, they said: "The authors have assured us that the data sets and methods are revealed completely and accurately, and we are confident that they are as keen as yourself to resolve the matter."
When we saw the draft Corrigendum, we pointed out many problems, including our concern about whether the residual series were in the proposed SI (see heading Supplementary Information item 3). We saw the draft SI only in June 2004 after being directed there in connection with our review of a submission by MBH to Climatic Change. We noticed that the requested information on cross-validation statistics and residuals was not in the draft SI and immediately notified Nature. I received a temporizing reply: "We do hope that this will provide you with the information that you are after, but please do not hesitate to get in touch if further problems remain."
When we saw the referee comments in August 2004, we realized that the referees had not been involved in refereeing the Corrigendum. For example, one of the referees said:
For instance, questions that seem to be quite critical, such as the sensitivity of the MBH98 reconstructions in more remote periods to changes or omissions in the proxy network or the dependency of the final results to the rescaling of the reconstructed PCs, have become clearer to me now. From the reply in MBH04 I am now afraid that they were not sufficiently described in the original MBH98 work. In particular the PCs renormalization, could have been included as clarification in the recent Corrigendum in Nature by MBH.
He also said that our investigations should not be "hampered" as follows:
I would encourage them to pursue their testing of MMB98, and by the way other reconstructions. As I wrote in my first evaluation, this should be a normal and sound scientific process that should not hampered.
On August 10, 2004, we re-iterated our longstanding requests for the residuals and source code. These were referred to the Editor himself. On Sep. 7, 2004, the tortuous process reached a dead end as follows:
And with regard to the additional experimental results that you request, our view is that this too goes beyond an obligation on the part of the authors, given that the full listing of the source data and documentation of the procedures used to generate the final findings are provided in the corrected Supplementary Information. (This is the most that we would normally require of any author.)
Obviously not every author had been required to issue a completely new SI. Further, MBH had certainly not received a clean bill of health from the referees. In my opinion, regardless of a publication decision on our comment, this position by Nature – taken directly by Philip Campbell – was completely unreasonable.
Climatic Change 2004
Late in 2003, MBH had submitted to Climatic Change an article excoriating our 2003 paper. You can see a reference to it on Stephen Schneider’s reference list here. Mann and others had vehemently objected to our publishing at E&E without their having a chance to review (notwithstanding Mann’s prior statement to us that he was too busy – see #16), a position reported at Schneider’s website as follows:
Mann and his colleagues and other members of the scientific community were outraged when they learned of the publication of the McIntyre/McKitrick article. Most credible scientific journals receiving criticism of previously published work typically give the authors under fire the chance to review and respond to an article challenging their claims.
In fairness to this position, in late 2003, Schneider offered me a chance to review the MBH submission criticizing our 2003 paper. (I’ve found Schneider to be an unfailingly engaging and cheerful correspondent.) In my capacity as a reviewer, I promptly requested the source code and residual series, in this case also specifically requesting the Durbin-Watson statistic. (I wasn’t then thinking about anything as mundane as a total failure of the cross-validation R2 statistics.) This occasioned a lengthy and very interesting correspondence. Schneider advised me that no one had ever requested source code in his 28 years of editing the journal, and that even entertaining the request required an editorial decision. Eventually I was informed that Climatic Change had adopted a policy requiring authors to provide supporting data, but not source code. I re-iterated my request for the residual series as supporting data, and Schneider duly requested the information from Mann. Mann provided a URL for the SI then being prepared for the Nature Corrigendum (without mentioning that connection to Schneider, who interpreted the gesture as merely being helpful). But Mann refused the request for the residuals and the Durbin-Watson statistics in no uncertain terms:
It is not our responsibility to provide [the residual series], we have neither the time nor the inclination to do so. These can be readily produced by anyone seeking to reproduce our analysis, based on the data we have made available, and our method which we have described in detail.
For the Durbin-Watson statistic, they said "We did not describe such statistics in our study." Can you imagine an econometrician satisfying an editor with such a statement?
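For readers unfamiliar with it, the Durbin-Watson statistic is a simple function of the residual series – which is exactly why the residuals were needed to compute it. A minimal sketch (illustrative only; the series here are hypothetical, not MBH residuals):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: roughly 2 for uncorrelated residuals,
    near 0 for strong positive first-order autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Illustrative series (hypothetical data):
rng = np.random.default_rng(0)
white = rng.standard_normal(500)   # uncorrelated residuals
trending = np.cumsum(white)        # a random walk: highly autocorrelated

print(durbin_watson(white))        # close to 2
print(durbin_watson(trending))     # close to 0
```

The point is that the statistic is trivially computed once the residual series is in hand – and cannot be computed without it.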
Schneider sent the MBH response to me with the comment:
I am hopeful that you will now be better able to complete your review, though not all items you requested–in particular source code–are included.
I dutifully wrote a review, pointing out that MBH had just stuck a finger in the eye of Climatic Change’s policy on providing supporting data and had thereby disqualified themselves. In my opinion, Schneider should have dealt with the matter editorially as soon as Mann refused to provide the supporting data; he had seen the breach of policy for himself and hardly needed a review from me to confirm it.
I never heard any more about the submission and, in any event, it was never published. However, by then, Jones and Mann had cited MBH [submitted to Climatic Change] as authority for statements hyperventilating against us. (They did not withdraw these statements when the submission failed to appear.) In terms of our getting the residuals, the upshot was that Mann et al. had withdrawn their paper rather than provide the residual series. (I re-iterate that we did not know that there were problems as elementary as the failed cross-validation R2 when we started asking for the residuals; Mann, however, did. This undoubtedly accounts for the almost hysterical comments about the R2 statistic when we first mentioned it in our revised Nature submission.)
Natuurwetenschap & Techniek
In our submission to Nature, notwithstanding our explicit statements that we were not offering an "alternative" reconstruction, one of the reviewers asked us for cross-validation statistics. Grudgingly, we did the calculations for our re-submission and first noticed the low R2 values for the 15th century step in our emulation of MBH98; we reported this, though rather as an afterthought. Our main focus was on the remarkable lack of "robustness" of the results to a few series and to the methodological fingers on the scale through the PC method and the "editing" of the Gaspé series. In our 800-word version, we even dropped this point. However, the passing mention of the low R2 provoked a hysterical response from Mann in his Reply to Referees – he fulminated against the R2 statistic on no fewer than 5 occasions. (These fulminations ultimately resulted in the curious diatribe against the R2 statistic in Rutherford et al., a diatribe completely inconsistent with the parties’ prior positions on cross-validation R2 statistics. The only motivation for this diatribe was our pending Nature submission, which was theoretically governed by confidentiality restrictions prohibiting the responding authors from using the material for their own purposes. However, that’s another story.)
On our side, the issue of cross-validation statistics came into somewhat sharper focus when we saw the referee comments in August. The referees spent a lot of time on RE versus R2 – far beyond anything in our revised submission, where the R2 comment had been exported to the SI. I suspect that Mann brought the attention upon himself by his fulminations against the R2 statistic in his Reply to Referees.
One of the referees (#2 here, the one who also said "I am particularly unimpressed by the MBH style of ‘shouting louder and longer so they must be right’") gave short shrift to Mann’s argument for the supremacy of RE over R2, and his comments are very perspicacious given subsequent MBH positions:
The advocacy of RE in preference to r by MBH is a bit extreme. The correlation coefficient certainly has drawbacks, but no verification measure is perfect, and I see no evidence in the verification literature (or Wilks) that RE is the standard preferred measure. Indeed the only one of the 3 references (7) cited in the revised response that was available to me is somewhat critical of RE. My preference would be not to rely on a single measure, but to look at contributions from bias, differences in variances and departures from linear dependence.
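The three "contributions" the referee lists correspond to a standard decomposition of mean squared error into squared bias, variance mismatch, and a term driven by imperfect correlation. A minimal sketch of that decomposition, with hypothetical data:

```python
import numpy as np

def mse_decomposition(recon, obs):
    """Decompose mean squared error into the three contributions the
    referee lists: squared bias, variance mismatch, and a term driven
    by imperfect linear dependence (correlation < 1)."""
    f = np.asarray(recon, dtype=float)
    o = np.asarray(obs, dtype=float)
    bias2 = (f.mean() - o.mean()) ** 2
    sf, so = f.std(), o.std()              # population std (ddof=0)
    r = np.corrcoef(f, o)[0, 1]
    var_term = (sf - so) ** 2
    corr_term = 2.0 * sf * so * (1.0 - r)
    mse = np.mean((f - o) ** 2)
    return mse, bias2, var_term, corr_term  # mse equals the sum of the parts

# Hypothetical reconstruction vs. observations:
rng = np.random.default_rng(3)
obs = rng.standard_normal(150)
recon = 0.5 * obs + 0.3 + 0.5 * rng.standard_normal(150)
mse, bias2, var_term, corr_term = mse_decomposition(recon, obs)
```

No single summary number – RE or r – captures all three failure modes at once, which is the referee’s point.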
This referee was a statistician who specialized in principal components. (A reader has written to me that he is in fact a very eminent specialist, if you’re trying to guess.) However, we lost ground with Referee #2 of our original article (#3 in the re-submission), who was strongly influenced by RE arguments. (Frustratingly, he benchmarked RE versus R2 against the AD1820 step, where the R2 is favorable. This step was illustrated in a map in MBH98 – obvious proof that they used the R2 statistic when it was to their advantage.) If you read the comments of this referee, you’ll notice the mis-spelling McKritik, a Germanic mis-spelling that I’ve noticed elsewhere, giving some clues as to who the referee might be.
While the referee position was frustrating in terms of getting published at Nature, the comments do not give MBH98 a clean bill of health and, as I mentioned above, should have occasioned a fresh re-refereeing of MBH98 itself, which did not take place. For our purposes, we realized that we had to deal directly with RE statistics in a way that we’d not done in our Nature submission. This led directly to a complete re-thinking of the topic expressed in our GRL article, which was far more than a regurgitation of the Nature submission.
We improved our MBH emulation using the new data at the Corrigendum SI (available in July 2004) and felt confident enough in our emulation to assert that the cross-validation R2 for the 15th century step was approximately 0.02 – obviously a damning result (as were the other standard cross-validation statistics used in paleoclimate, such as the CE, product mean test and sign test). We reported this in our GRL article and no one to date has denied it, although we’ve been accused of many things. With such a lousy verification R2 statistic, we argued that it was impossible for the underlying model to have statistical significance and thus that the seemingly significant RE statistic was in fact "spurious", using this in a statistical sense [Granger and Newbold, 1974; Phillips 1986] rather than in the Mannian sense of merely being a term of disapproval.
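For concreteness, here is how these verification statistics are conventionally defined – a sketch under my own naming, not MBH code. RE benchmarks squared errors against the calibration-period mean, CE against the verification-period mean, and R2 is the squared correlation over the verification interval:

```python
import numpy as np

def verification_stats(obs, recon, calib_mean):
    """Standard paleoclimate verification statistics over a
    verification interval (illustrative definitions)."""
    obs = np.asarray(obs, dtype=float)
    recon = np.asarray(recon, dtype=float)
    sse = np.sum((obs - recon) ** 2)
    re = 1.0 - sse / np.sum((obs - calib_mean) ** 2)  # reduction of error
    ce = 1.0 - sse / np.sum((obs - obs.mean()) ** 2)  # coefficient of efficiency
    r2 = np.corrcoef(obs, recon)[0, 1] ** 2           # squared correlation
    return re, ce, r2

# Hypothetical example: a "reconstruction" that captures only a mean shift
# away from the calibration mean scores a respectable RE even though its
# r2 (and CE) are near zero.
rng = np.random.default_rng(1)
n = 200
obs = 1.0 + rng.standard_normal(n)         # verification mean shifted from 0
recon = 1.0 + 0.01 * np.sin(np.arange(n))  # tracks the shift, not the wiggles
re, ce, r2 = verification_stats(obs, recon, calib_mean=0.0)
```

This mean-shift effect is precisely how a seemingly strong RE can coexist with an R2 indistinguishable from zero – the pattern at issue for the 15th century step.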
Further, it seemed impossible to us that Mann would not have calculated the R2 statistic (especially since the R2 statistic for the AD1820 step was shown in a map by gridcell). Since it was not reported in the original SI, the only conclusion was that Mann had withheld the information. We eventually commented on this in very sharp terms in our E&E article.
So this topic was very fresh in my mind when, in late 2004, I was interviewed by Marcel Crok, a reporter for Natuurwetenschap & Techniek. Like others, he initially viewed the story as an unlikely curiosity, but gradually became intrigued and wrote a lengthy article. As part of his due diligence, he asked me for some questions to put to Mann through which he could try to differentiate our positions. I suggested the question about the cross-validation R2. So the question from NWT to Mann was really the first direct inquiry about the cross-validation R2 (as opposed to the inquiries for the residual series, which would have led to it). I’ve excerpted the dialogue from NWT below since it is really quite provocative, but I urge interested parties to re-read the full letter at NWT since it gives such a nice flavor of Mann’s efforts to block any criticism.
[[2) There is a severe debate between you and MM about the skill of the calculation. You claim a high RE-statistic. MM show that their simulated hockey sticks also give a high RE-statistic but a very low R^2 statistic. ]]
We showed in our reply to the REJECT MM comment to Nature, that they incorrectly calculated all of their verification statistics, because they didn’t account for the changing spatial sampling of the Northern Hemisphere temperature record back in time. See the attached supplementary information ("supplementary3.pdf"–read page 2) that was provided to the reviewers of the rejected comment by McIntyre and McKitrick. Keep in mind that the reviewers of their Nature comment, who had the expertise and full available material to judge whether or not MM’s claims were plausible, decided that they were not.
Our reconstruction passes both RE and R^2 verification statistics if calculated correctly. Wahl and Ammann (in press) reproduce our RE results (which are twice as high as those estimated by MM), and cannot reproduce MM’s results. There is little, if anything correct, in what MM have published or claimed. Again, none of their claims have passed a legitimate scientific peer review process!
See also Rutherford et al (in press–see above) for an extensive discussion of cross-validation, and the relative merits of different metrics (RE vs CE vs r2). It is well known to any scientists in meteorology or climatology that RE is the preferred metric for skill validation because it accounts for changes in mean and variance prior to the calibration interval (which R^2 does not!). The preferred use of RE dates back to the famous paper by Lorenz in evaluated skill in meteorological forecasts.
It must be stated that McKitrick has been shown to be prone to making major errors in his published work. You should refer to the discussions here:
particularly interesting, in the context of this discussion, is his failure in an independent context (the Michaels and McKitrick paper discussed in the first link) to understand the issue of cross-validation! That is, in both the McIntyre and McKitrick ’03 paper, and the Michaels and McKitrick ’05 paper, the authors failed to even understand the importance of performing cross-validation! Such papers could never be published in a respected scientific journal.
[[In MBH98 you didn’t calculate the R^2 statistic, but in Mann and Jones (2003) you did. I asked Eduardo Zorita questions about this and he said he would calculate both. Why didn’t you calculate the R^2 in MBH98? ]]
Repeating what I said above, see Rutherford et al (in press–see above) for an extensive discussion of cross-validation, and the relative merits of different metrics (RE vs CE vs R^2). It is well known to any scientists in meteorology or climatology that RE is the preferred metric for skill validation because it accounts for changes in mean and variance prior to the calibration interval (which R^2 does not!). The preferred use of RE dates back to the famous paper by Lorenz in evaluated skill in meteorological forecasts.
It’s interesting to re-read Mann’s answer, especially in light of subsequent access to Mann’s source code in the summer of 2005, which provided incontrovertible evidence that Mann had in fact calculated the cross-validation R2 statistic (which was then not reported). See, for example, Cross-Validation R2 and More on Cross-Validation R2. Note that Mann told NWT that his reconstruction "passes both RE and R^2 verification statistics if calculated correctly." This is obviously a different claim from saying that the RE statistic should be preferred.
In any event, while Mann fulminated at length, you will note that he did not provide the cross-validation R2 statistic in question to NWT.
House Energy and Commerce Committee
Now comes a remarkable twist to the story. The House Energy and Commerce Committee became intrigued with the matter when Mann, who had testified to Congress, injudiciously told the Wall Street Journal that he would not be intimidated into disclosing his algorithm (unofficial copy online here). The House Committee asked Mann (and Bradley and Hughes) straight out:
7 c. Did you calculate the R2 statistic for the temperature reconstruction, particularly for the 15th Century proxy record calculations and what were the results?
d. What validation statistics did you calculate for the reconstruction prior to 1820, and what were the results?
You wouldn’t think that left much wiggle room. But do you think they got a straight answer? Neither Bradley nor Hughes even answered the question, as I noted here. I’ve provided Mann’s full answer below, as it is rather delicious:
A(7C): The Committee inquires about the calculation of the R2 statistic for temperature reconstruction, especially for the 15th Century proxy calculations.
In order to answer this question it is important to clarify that I assume that what is meant by the “R2” statistic is the squared Pearson product-moment correlation, or r2 (i.e., the square of the simple linear correlation coefficient between two time series) over the 1856-1901 “verification” interval for our reconstruction. My colleagues and I did not rely on this statistic in our assessments of “skill” (i.e., the reliability of a statistical model, based on the ability of a statistical model to match data not used in constructing the model) because, in our view, and in the view of other reputable scientists in the field, it is not an adequate measure of “skill.” The statistic used by Mann et al. 1998, the reduction of error, or “RE” statistic, is generally favored by scientists in the field. See, e.g., Luterbacher, J.D., et al., European Seasonal and Annual Temperature Variability, Trends and Extremes Since 1500, Science 303, 1499-1503 (2004).
RE is the preferred measure of statistical skill because it takes into account not only whether a reconstruction is “correlated” with the actual test data, but also whether it can closely reproduce the mean and standard deviation of the test data. If a reconstruction cannot do that, it cannot be considered statistically valid (i.e., useful or meaningful). The linear correlation coefficient (r) is not a sufficient diagnostic of skill, precisely because it cannot measure the ability of a reconstruction to capture changes that occur in either the standard deviation or mean of the series outside the calibration interval. This is well known. See Wilks, D.S., STATISTICAL METHODS IN ATMOSPHERIC SCIENCE, chap. 7 (Academic Press 1995); Cook, et al., Spatial Regression Methods in Dendroclimatology: A Review and Comparison of Two Techniques, International Journal of Climatology, 14, 379-402 (1994). The highest possible attainable value of r2 (i.e., r2 = 1) may result even from a reconstruction that has no statistical skill at all. See, e.g., Rutherford, et al., Proxy-based Northern Hemisphere Surface Temperature Reconstructions: Sensitivity to Methodology, Predictor Network, Target Season and Target Domain, Journal of Climate (2005) (in press, to appear in July issue)(available at ftp://holocene.evsc.virginia.edu/pub/mann/RuthetalJClimate-inpress05.pdf). For all of these reasons, we, and other researchers in our field, employ RE and not r2 as the primary measure of reconstructive skill.
As noted above, in contrast to the work of Mann et al. 1998, the results of the McIntyre and McKitrick analyses fail verification tests using the accepted metric RE. This is a key finding of the Wahl and Ammann study cited above. This means that the reconstructions McIntyre and McKitrick produced are statistically inferior to the simplest possible statistical reconstruction: one that simply assigns the mean over the calibration period to all previous reconstructed values. It is for these reasons that Wahl and Ammann have concluded that McIntyre and McKitrick’s results are “without statistical and climatological merit.”
A(7D): The Committee asks “[w]hat validation statistics did you calculate for the reconstruction prior to 1820, and what were the results?”
Our validation statistics were described in detail in a table provided in the supplementary information on Nature’s website accompanying our original nature article, Mann, M.E., Bradley, R.S., Hughes, M.K., Global-Scale Temperature Patterns and Climate Forcing Over the Past Six Centuries, Nature, 392, 779-787 (1998). These statistics remain on Nature’s website (see http://www.nature.com/nature/journal/v392/n6678/suppinfo/392779a0.html) and on our own website. See ftp:holocene.evsc.virginia.edu/pub/Mannetal98.
Interestingly, the link to Nature does not contain the said statistics, nor does the UVA website. (The statistics are still up at the UMass website, which is not among the links he provided.) You’d think that he’d try to get things like this right once in a while.
I won’t go in detail over the many mis-statements and mischaracterizations here. The main point is: did the House Committee get the requested information about the cross-validation statistics? The answer is obviously that they didn’t. Maybe they’d have done better if they’d asked about steroids.
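One technical claim in A(7C) is easy to check: that r2 = 1 can indeed arise from a reconstruction with no skill. A toy sketch with hypothetical numbers shows this is true as far as it goes – though it is the converse of the failed-R2 problem, not a substitute for reporting the statistic, since the two measures catch different failures:

```python
import numpy as np

# A "reconstruction" that is a scaled, shifted copy of the observations
# correlates perfectly (r2 = 1) yet reproduces neither the mean nor the
# variance of the test data, so its RE is strongly negative.
rng = np.random.default_rng(2)
obs = rng.standard_normal(100)
recon = 2.0 * obs + 10.0                 # perfect correlation, badly biased

r2 = np.corrcoef(obs, recon)[0, 1] ** 2  # 1.0 to floating-point precision
# RE against a calibration-period mean taken as 0 here:
re = 1.0 - np.sum((obs - recon) ** 2) / np.sum((obs - 0.0) ** 2)
```

Neither statistic alone settles skill – which is exactly why the Committee asked for the results of both.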
Anyway I’ll tell you tomorrow whether I was able to get the R2 information from Ammann at AGU, which Mann had so stoutly withheld even from the House Energy and Commerce Committee.