Mosher on the Provenance of Said et al 2008

Steve Mosher summarizes his reading of the provenance of section 1 of Said et al 2008 as follows (the other sections, the substantive portion of the work, are not the subject of the allegations):

Reading Wegman's mail and look at the texts it appears that there is something that Mashey misses. The SNA boilerplate material in the Wegman Report was produced by Denise Reeves. She is the plagiarist for the SNA material in the Wegman Report. The Wegman Report was used by Sharabati in the first 31 pages of his dissertation. Sharabati then wrote section 1 of Said, which is the paper that got retracted.

Wikipedia -> Reeves (Wegman Report) -> Sharabati (dissertation) -> Said et al 2008 (Sharabati)

Here is what Mashey writes about Sharabati taking from Reeves (Wegman Report):

“The SNA introduction was about 5 pages of text in the WR, of which some came from Reeves, but may have been edited further. Said (2008), Sharabati (2008) and Rezazad (2009) used shorter extracts, pp. 118-128. It is difficult for text to be both original work and standard ‘boiler plate’.”

Here is the problem. In Sharabati (2008) the first 31 pages ARE boiler plate. They are explicitly NOT presented as his original work.

We know this by reading the dissertation and noting that he closes the first 31 pages by saying that in the chapters that follow he presents his original work.

So, it's not too difficult to see that in Sharabati's dissertation the boiler plate is NOT presented as original work. What follows the first 31 pages is presented explicitly as his original work. One might have hoped that he would use proper citations in the prefatory material for his original work. Had he cited the Wegman Report as his source it would have been clearer. Also, when this work became section 1 of Said 08, he faced a problem. How could he cite his dissertation? Section 1 of Said 08, which was written by Sharabati, is word for word out of his dissertation. So, he makes up a cite for it. Bad move.

To recap: Reeves takes material from Wikipedia and other sources and gives Wegman a few pages of boilerplate (with no cites) for the Wegman Report. Wegman and company do not check this for plagiarism. Sharabati, writing his dissertation, lifts from the Wegman Report for its prefatory material. He does not represent this as his original work; his original work, he claims explicitly, follows page 31. Later, Sharabati cribs from his own dissertation to write section 1 of Said 08.

The editor accepts the proposal to fix the problem in Said 08 by including proper citations. He is overruled and the article is retracted. Wegman is then tarred for plagiarism when:

1. he did not write the section (section 1 of Said 08) where the plagiarism occurred;
2. Sharabati borrowed from his own dissertation;
3. the dissertation cribbed from the Wegman Report;
4. Reeves wrote that section of the Wegman Report.

Weird.

I’ve examined Elsevier’s policies on how to handle plagiarism allegations and when retraction is an appropriate remedy. Also weird. More on this in an upcoming post.

162 Comments

  1. Willis Eschenbach
    Posted May 29, 2011 at 11:45 PM | Permalink

    Fascinating analysis, Mosh. More twists and turns than a Poirot novel.

    All the best,

    w.

    PS – typo in the first line, should be “looking at the texts”.

    • Posted May 30, 2011 at 12:56 AM | Permalink

      Re: Willis Eschenbach (May 29 23:45), I pull my hair out reading Mashey's texts, so I just go back to the original stuff and remember how we do this when we study author influence in literature:

      You work BACK to the source.

      When you look at it that way it becomes really clear.

      Said 08 section 1 is the section with the problem.

      Sharabati wrote that. The editor didn't check, Wegman didn't check, Said didn't check.

      Sharabati created section 1 directly from his own dissertation. Mashey had that text but I don't know if he checked it.

      IF he had read the whole thing (it's only 230 pages) he would have found the words on page 31:

      The literature I present in the following chapters is solely my own work and represents my contribution to the field of network theory and analysis with the exception of few definitions in which it was difficult to separate them from other material. Consequently, I preferred to keep these definitions that were named after scholars or previous defined in the science in context. The dissertation offers solutions to many of the problems encountered by analysts and researchers and many of the ideas that are presented in the dissertation can be used successfully in the fields of network theory, graph theory and matrix theory.

      Sharabati created the first few pages of his dissertation from the Wegman Report and did not cite it. I'm not sure of the citation guidelines for dissertations and citing congressional testimony. Anyway, Sharabati thought that the Wegman Report was an original source.

      The 5 pages of the Wegman Report that Sharabati lifted were produced by Reeves. She copied general definitions from Wikipedia and some primary sources.

      It's way easier to say "Wegman plagiarized" than it is to say "Wegman gave the mundane task of creating throat-clearing boilerplate to a grad student and she fumbled the ball."

  2. Posted May 30, 2011 at 12:03 AM | Permalink

    Well, the other thing from Mashey's report, which you've based this on, is that Ms Reeves was asked to write an SNA summary because she had been sent on a one-week course. From Wegman's email to Elsevier:

    “Denise worked (and still works) for Mitre Corporation. Her company sent her to take a short course on social network analysis from Kathleen Carley, a professor at Carnegie-Mellon University. Dr. Carley is an internationally recognized expert on social network analysis. When Denise returned from her short course at Carnegie-Mellon, I took her to be the most knowledgeable among us on social network analysis, and I asked her to write up a short description we could include in our summary. She provided that within a few days, which I of course took to be her original work”

    Wegman needed a stick to beat Mann with – couldn't find anything new with statistics, so there you go. Ms Reeves, with a one-week course, was their expert.

    • Posted May 30, 2011 at 12:21 AM | Permalink

      Update – I thought Mashey said a one-week course, but I see now that the duration was not specified.

    • Posted May 30, 2011 at 12:32 AM | Permalink

      Re: Nick Stokes (May 30 00:03), Nick, here is what I found interesting in this post:

      Wegman and Said on social networks: More dubious scholarship

      Mashey writes:

      Indeed, as mentioned previously, the Said et al introduction is largely a condensation of the five-page Wegman et al section. But it starts out a little differently:

      A social network is an emerging tool frequently used on quantitative social science to understand how individuals or organizations are related. The basic mathematical structure for visualizing the social network is a graph. A graph is a pair (V ,E) where V is a set of nodes or vertices and E is a set of edges or links.

      The first part is barely comprehensible English and so is probably from the authors;

      The first paragraph of Said 08 has such bad English that he thinks it comes from the authors (non-native speakers).

      Then, in his side-by-side analysis:

      Click to access said-et-al-social-networks1.pdf

      Mashey looks at this first paragraph and does NOT look for a source???

      WHY? He writes:

      [General definition, no antecedent sought]

      He doesn't look for a source BECAUSE he considers it to be a general definition. As if plagiarism doesn't apply.

      BUT, if he had checked he would have found this paragraph in Sharabati's dissertation. He would have found that section 1 is lifted word for word. And he had the dissertation.

      Here is the bottom line:

      We now can trace the provenance of the Said 2008 Section 1.

      Reeves takes boilerplate material from Wikipedia and other sources. She creates a document and gives it to Wegman. She does not cite any of her sources. There is no clear guide for preparing a report for Congress (like a citation guide). Wegman and company take her text and assume that there are no issues, much like the editors of a journal don't check.

      Later that material gets used by Sharabati in his dissertation and then he reuses it in Said 08.

      Now that we understand the flow it becomes clear why people want to make it simpler by saying that Wegman plagiarized. The truth is a lot messier.

      Rather than confront that truth you want to distract us with Reeves' training. Wrong. Rigsby is the person you want to look at for the mathematical approach. Reeves was assigned boilerplate. She screwed it up.

      I’ll fault Mashey for being obtuse and misleading about the source of section 1 of Said.

      The source is clear: Sharabati's dissertation. That's clear from the text and from Wegman's mail.

    • Posted May 30, 2011 at 12:38 AM | Permalink

      Re: Nick Stokes (May 30 00:03), SNA does NOT beat up Mann. Get the argument straight.

      Wegman is looking for an argument to bolster his contention that errors are more likely to enter publication IF the reviewers and authors are related in a social network.

      He could not prove this, so his work is only suggestive. It doesn't attack Mann; it seeks to explain how errors can propagate and persist. Then again, it could be caused by a collective brain fart.

      • Posted May 30, 2011 at 1:22 AM | Permalink

        Re: steven mosher (May 30 00:38),
        “SNA does NOT beat up Mann.”
        In the Wegman Report, it does. As you say, W has nothing but suspicion about actual reviews, padded out with graphs with black squares (and Wiki). But it gives him the chance to talk a lot about cliques and allegiances, claiming that these are technical terms. Clique is a ring-in from graph theory, and no, it doesn’t mean what Congress thinks it means. But so there is wisdom there like:
        “Discussion: Several other interesting details emerge. The clique Claussen-Ganopolski-Brovkin-Kubatzki-Bauer is completely isolated from the other researchers in the area. Similarly, the Ribe-Gimeno-Garcia-Herrera-Gallego clique and the Arbarbanel-Rabinovich-Tsimning-Huerta-Gibb clique are nearly isolated with only linkages to Mann in the first case and linkages to Mann and Lall in the second case.”

        That’s what SNA can do for you. Hi-tech gossip.

        PS: Reeves' course was one week – Vergano said that, not Mashey.

        • Posted May 30, 2011 at 1:50 AM | Permalink

          Re: Nick Stokes (May 30 01:22), SNA beats up Mann?

          Evidence?

          Mann's math is beaten up by the stats. Peer review is brought under suspicion by the SNA. In the Q&A Wegman is clear that he cannot prove the case without the authors' names. The SNA of the WR is presented as some suspicions. Are they followed up by recommendations to Congress about peer review? Hmm. No. The recommendations are directed at policy documents.

          Recommendation 1. Especially when massive amounts of public monies and human lives are at stake, academic work should have a more intense level of scrutiny and review. It is especially the case that authors of policy-related documents like the IPCC report, Climate Change 2001: The Scientific Basis, should not be the same people as those that constructed the academic papers.

          Do we want to reject this recommendation? I hardly think so. Neither did Overpeck, who agreed with this position in the climatehate mails.

          Yes, it was a one-week class. Probably enough education to write some boilerplate and neglect the cites. Clearly, nothing in the contrarian position depends upon the SNA. The mistakes in contrarianism have other roots. The idiots who refuse to believe in AGW don't do so because of SNA.

        • Posted May 30, 2011 at 3:09 AM | Permalink

          Re: steven mosher (May 30 01:50),
          Evidence?
          The evidence is the report. Make of it what you will. There’s lots more like I quoted.

          Peer review isn’t brought under suspicion by SNA. That just proves that people wrote joint papers. We knew that. W’s definition of a clique is satisfied by any set of authors who wrote a paper. The O’Donnell-Lewis-McIntyre-Condon clique wrote O10. Wow! But the WR SNA is clique-clique-clique. Says nothing at all about peer review. That may be imperfect, but this SNA adds nothing to our knowledge of it.

          An odd feature of the WR was the disconnect between body and conclusions. They rarely interact. So yes, SNA didn't feature in the conclusions. It had done its job.

          Denise’s one-week class – well, remember Wegman:

          I took her to be the most knowledgeable among us on social network analysis
          So after that boilerplate, it’s all downhill for SNA.

        • Posted May 30, 2011 at 6:17 AM | Permalink

          > The evidence is in the report.

          Without speculating on the nature of the evidence for now, as speculation can be dangerous, there might also be some evidence in the Hearings themselves:

          Click to access BartonHearingsJUL2006.pdf

          (The original resource seems unavailable this morning.)

          For instance, here is one paragraph from Wegman’s testimony, with our emphasis:

          > Because of this apparent isolation, **we** decided to attempt to understand the paleoclimate community by exploring the social network of authorships in the temperature reconstruction area. **We** found that at least 43 authors have direct ties to Dr. Mann–and this should be figure 6, please; thank you–have direct ties to Dr. Mann by virtue of coauthored papers with him. **Our** findings from this analysis suggest that authors in this area of the relatively narrow field of paleoclimate studies are closely connected. **Dr. Mann has an unusually large reach in terms of influence.** He is the coauthor with every one of these people which are indicated by the black edge borders on the top and the side of this graph. In particular, he has a close connection with Drs. Jones, Bradley, Hughes, Briffa, Rutherford, and Osborne and those are indicated by the solid block on the upper left-hand corner.

        • Steve McIntyre
          Posted May 30, 2011 at 6:30 AM | Permalink

          In particular, he has a close connection with Drs. Jones, Bradley, Hughes, Briffa, Rutherford, and Osborne and those are indicated by the solid block on the upper left-hand corner.

          Are you suggesting that this assertion is incorrect? Pub-leeze.

          A point also being overlooked in all of this is the non-independence of authorship of hockey stick articles – something that deserves its own analysis. Without using the language of social network analysis, I observed early on that, in addition to the serial re-use of the same proxies, the so-called "independent" studies did not have independent authors. Mann and Jones 2003 is not "independent" of Jones et al 1998 or Mann et al 1998 (or for that matter, Briffa et al 1998).

        • Posted May 30, 2011 at 6:45 AM | Permalink

          > Are you suggesting that this assertion is incorrect? Pub-leeze.

          Certainly not. Usually, correct information is more useful to "beat up" on someone.

          But I am suggesting that Wegman is owning the SNA in his testimony. So whoever wrote which part of the report is irrelevant to the intellectual responsibility.

          > A point also being overlooked in all of this is the non-independence of authorship of hockey stick articles – something that deserves its own analysis.

          Indeed. We can also ask ourselves if Mann could be the captain of the Kyoto Flames, or Blades, or Heat:

          http://web.archive.org/web/20050222224714/www.climate2003.com/blog/hockey_team.htm

        • Posted May 30, 2011 at 6:08 PM | Permalink

          Re: willard (May 30 06:45), I don't think you will find many veterans of this discussion who would argue that Wegman does not bear final responsibility for the preparation of the Wegman Report.

          The evidence is that he selected Reeves and Reeves provided material without citation. It's unclear what bibliographical guidelines there are for presentations to Congress. Perhaps Reeves thought she didn't have to footnote.
          We can hold Wegman responsible for selecting her; we can hold him responsible for failing to give her proper guidance; we can hold him responsible for failing to check her work. Those failings are clear and indisputable in my mind.

          That doesn't make him guilty of plagiarism. In the same way that I argued with people who thought Jones was guilty of fraud, I have to argue with people who think that Wegman copied text and promoted it as his original work. When I argued that Jones was sloppy and not corrupt, I think that was accurate and fair. I'd say the same about Wegman here. With "hide the decline" I suggested fixing the problem via errata; I'd suggest the same here. Punishments should be proportional to the crime. Nothing in the science changes.

          So, it's clear to me that Wegman did not plagiarize. His real failings lie elsewhere. If people wanted to focus on the real failings, then the discussion would be much different.

          It would center around using grad students, puffing up their publication records, etc. Let's take Sharabati. In Mosher's universe, the work he did for Said 2008 was NOT worthy of a co-authorship. Somebody who writes the boilerplate for a paper shouldn't have his publication record puffed up.

          Of course Sharabati was a pet student; we all know the pet students get to publish papers with the star professor.

        • Posted Jun 1, 2011 at 12:50 PM | Permalink

          > The evidence is that he selected Reeves and Reeves provided material without citation.

          This is certainly not the same evidence that Moshpit asked Nick Stokes for above, about his claim that Wegman used his SNA to beat up the Kyoto Flames and its captain.

          Nick Stokes cited the Wegman Report as evidence. I offered to also take into account the Barton Hearings. In the transcripts of the Barton Hearings, we can read:

          > Dr. Mann has an unusually large reach in terms of influence.

          We can also read Barton (does he have a clique too?) himself saying:

          > Our central question is: Can we count on hockey stick studies? That answer from Dr. Wegman and his panel appears to be, “No.” And it doesn’t appear to be a matter of overlooking the researchers’ written caveats about their particular work; rather, the Wegman panel has identified a fundamental error of methodology. If that finding holds up, it will highlight a mistake that lay dormant for years as a closed network of supportive colleagues saw and heard what it wanted.

          Rather than conceding even this seemingly indisputable point made by Nick Stokes, we get some highsticks (spitball not being thematic), like citing a random paragraph from the Wegman Report, shifting topics, and equivocating “to copy” and “to plagiarize”.

          This will not erase this evidence. And if these quotes are not enough to substantiate the claim that Wegman is using SNA as a stick to cross-check the captain of the Kyoto Flames, perhaps we should extend bender’s advice to the Barton Hearings.

        • Posted Jun 2, 2011 at 8:50 PM | Permalink

          No, Steve, the fact that Wegman was shocked, shocked, in 2006 that Michael Mann had close co-author relations with Scott Rutherford, his doctoral student, and Ray B., his post-doctoral advisor, is one of the best "gambling going on here" moments in climate science history.

          Pulleze

        • Posted May 30, 2011 at 5:48 PM | Permalink

          Re: willard (May 30 06:17), willard, nothing in that paragraph can be construed as an attack on Mann or his science.

          One of the interesting questions associated with the ‘hockey stick controversy’ are the relationships among the authors and consequently how confident one can be in the peer review process. In particular, if there is a tight relationship among the authors and there are not a large number of individuals engaged in a particular topic area, then one may suspect that the peer review process does not fully vet papers before they are published.
          Indeed, a common practice among associate editors for scholarly journals is to look in the list of references for a submitted paper to see who else is writing in a given area and thus who might legitimately be called on to provide knowledgeable peer review. Of course, if a given discipline area is small and the authors in the area are tightly coupled, then this process is likely to turn up very sympathetic referees. These referees may have co-authored other papers with a given author. They may believe they know that author’s other writings well enough that errors can continue to propagate and indeed be reinforced.

          In order to answer such questions about the relationships among authors in the area of temperature reconstructions, we developed two datasets.

          Nick’s contention was that the SNA was used to attack Mann

          Wegman needed a stick to beat Mann with – couldn’t find anything new with statistics, so there you go

          A plain reading of the text and an understanding of logic should show you that
          Wegman was up to two things in the report.

          1. Attacking the science of mann by looking at the stats.
          2. TRYING TO EXPLAIN how such mistakes can propagate.

          The SNA is related to the latter. The SNA does not attack Mann the person or his science. It is aimed at the PEER REVIEW SYSTEM.

          Simply, if you want to attack Wegman's work, I have no issue with that, but get the argument right. The HS work is directed at Mann. The SNA work is directed at the peer review SYSTEM. Mann's work is used as an example. Wegman does not conclude that Mann has any fault in this system. Wegman doesn't argue that the HS is invalid BECAUSE of the system.

          The math shows the problems with the HS; the SNA PURPORTS to explain how such a mistake can get into the system.

        • Posted May 30, 2011 at 10:11 PM | Permalink

          > [A]n understanding of logic should show you that Wegman was up to two things in the report. […]

          A plain reading of my comment above should show that it was about the Hearings, not about the Wegman report.

          > [N]othing in that paragraph [see Moshpit’s quote of the “interesting questions”] can be construed as an attack on Mann or his science.

          A plain reading of that section should show that the “interesting questions” quoted by Moshpit are immediately followed by this: “In order to answer such questions **about the relationships among authors in the area of temperature reconstructions** […]” (our emphasis).

          > the SNA work is directed at the peer review SYSTEM.

          A plain reading of the “discussions” of the SNA section of the Report should suffice to show that Wegman might have instead directed his SNA work at the “Team”, and more centrally its captain.

          An understanding of logic should show that Moshpit's argument has no merit and that his claim in no way refutes Nick Stokes's.

        • Posted May 30, 2011 at 11:48 PM | Permalink

          Re: willard (May 30 22:11), On the contrary

          the SNA analysis adds Nothing to to the criticism of mann's science. NOTHING. It is not used as a stick to beat Mann. The SNA analysis is used to assess peer review. Mann has nothing to do with how peer review is set up. If I'm reviewing the umpire system and I look at tapes of Gaylord Perry pitching, to assess the umpire system, you would be hard pressed to say I was beating up Gaylord Perry.

          The best evidence that SNA is NOT used to beat up Mann is the fact that none of us who criticize him for things like decentered PCA even read or care about the SNA stuff. It's a rather weak criticism of peer review. The better case, the one we argue, comes from the mails.

          But we can end this simply. Which paper of Mann's does the SNA analysis refute? Which statistical method does it impugn? What character defect does it expose? None. The SNA analysis, thin and feeble as it is, is not directed at Mann the scientist or Mann the person. Its findings are not turned into recommendations.

          In any case, who wrote section 1 of Said 08? Where did he copy it from?

          Is Wegman guilty of plagiarism? Does the evidence show that he personally copied material and presented it as his own original work?

          Or are the real facts a lot more complicated?

          Note that I do not excuse him from responsibility.

        • Posted May 31, 2011 at 1:29 PM | Permalink

          > the SNA analysis adds Nothing to to the criticism of mann’s science [sic.].

          There is no need to “add”. It just is something else, and something relatively new. It still looks like a stick to beat Mann, at least to Nick Stokes, and perhaps also to whoever read plainly the Wegman Report and the Barton Hearings.

          > Its findings are not turned into recommendations.

          This might be demonstrably false, as three of the four recommendations are related to the “discussions” of the SNA section in one way or another.

          An interesting question associated with our current controversy is the relationship between the authors' knowledge and the content of the resources they cite, and consequently how confident one can be in the hypothesis that Wegman knew that the Team could have been named the Kyoto Flames.

          A plain reading of the introduction of the Wegman Report shows that Wegman’s clique of resources included CA and its ancestor:

          > The discussion and evaluation of the use of PCA to some extent has degenerated in to the battle of competing web blogs: http://www.climateaudit.org, http://www.climate2003.org, and http://www.realclimate.org.

          An understanding of logic should show that if Wegman consulted the resources he cited, he could have known that the Team might have been called the Kyoto Flames.

          But it’s tough to be sure. So little time. So much to do.

        • steven mosher
          Posted May 31, 2011 at 3:27 PM | Permalink

          I'll note that you addressed none of my arguments and answered none of the important questions. I award you no points, and may God have mercy on your soul. The hilarious thing is that the only recommendation that comes close to commenting on the peer review process is something that Overpeck himself argued for. And, as you well know, he would never accept a recommendation based on Wegman's faulty SNA analysis.

        • Posted May 31, 2011 at 9:13 PM | Permalink

          The only relevant arguments and questions in this subthread are related to the sources of evidence for Nick Stokes's claim that Wegman's SNA is a stick used to beat up the Kyoto Flames and its captain. The arguments that Moshpit offered against Stokes's claim have been shown to have no merit.

          A plain reading of the Wegman Report should allow anyone who followed bender’s advice to see that the relationship between the recommendations and the conclusions deserves due diligence.

        • zinfan94
          Posted Jun 5, 2011 at 2:49 PM | Permalink

          I am really confused by your claim that Denise Reeves "… is the plagiarist for the SNA material in the Wegman Report".

          She wasn’t listed as an author, so how can she be the plagiarist?

          Furthermore, it appears that most of the original work in the Wegman Report was actually done by Denise Reeves. The SNA analysis was done by her, even if this work was poorly done. The rest of the WR was mostly assembled from unattributed work of others, or “fake” work that was supposed to replicate the statistical analysis. The statistical analysis is apparently simply a summary of work done by McIntyre. Much of the rest of the report was plagiarized from Bradley and others, and doctored up to suit Wegman.

          It appears to me that most of the original work in the WR was done by Denise Reeves! Do I understand this correctly?

        • Posted May 30, 2011 at 5:36 PM | Permalink

          Re: Nick Stokes (May 30 03:09),

          “Peer review isn’t brought under suspicion by SNA.”

          Let me see if I can be clearer. The case Wegman makes is pretty tame. As he notes, in a small field where authors and co-authors review each other's work there is a prima facie case for suspicion. Wegman tries to put some math on that. That math certainly doesn't weaken the suspicion.

          But I see that you have retreated from the claim that SNA is used to attack MANN.

          It's not. It's USED to add credence to a suspicion. It certainly doesn't detract from the suspicion.

          I hope that's clear, because it's incontrovertible that the SNA does not attack Mann the individual or his work.

        • Posted May 31, 2011 at 2:56 AM | Permalink

          Re: steven mosher (May 30 17:36),
          There's no explanatory role for the SNA stuff that makes sense at all. The only statistical issue that the WR talks about is the use of the wrong average for centering. So the pretext for SNA is – how could that have got past review? So W sets up a hullabaloo about how scientists wrote papers together.

          But in fact there is no mystery about how it got past review. One reason is that it has little practical effect, which makes it hard for an observer to spot. Wegman himself calls it
          “a simple seemingly innocuous and somewhat obscure calibration assumption.”

          But the main one is that it’s a programming error. You can’t find it without getting into the code (MM05 GRL). Reviewers rarely do that in any field of science. In fact, M&M didn’t notice it in their first 2003 auditing. So that’s a perfectly adequate explanation for it passing the reviewers.

          So if there is no reasonable expectation that reviewers would find the decentering, and SNA can’t help us with a review issue, and isn’t referenced in the recommendations, then what is it there for? It is directly ad hominem. It doesn’t talk about Mann’s science. It creates a conspiracy. All those scientists breathing together. A clique – yes that’s it. And Mann is at the centre. Big diagrams with all the arrows pointing at Mann.

        • Posted May 31, 2011 at 10:40 AM | Permalink

          Re: Nick Stokes (May 31 02:56),

          It doesn’t talk about Mann’s science. It creates a conspiracy. All those scientists breathing together. A clique – yes that’s it. And Mann is at the centre. Big diagrams with all the arrows pointing at Mann

          Ok NOW we are getting somewhere.

          You believe that a network diagram that has Mann and others in it is evidence that Wegman thinks there is a conspiracy? I don't think I've ever seen any of the nutjobs who think there is a conspiracy in climate science cite Said 2008 or the Wegman Report and make this point about the diagrams. They usually go directly to the climategate mails for better evidence of conspiracy.

          And you know I've seen Mann defend himself against all sorts of charges. He has a very thin skin. Let me look for how he reacted to the Wegman SNA material at the time it was written. Seriously, I haven't seen how he took that material at the time. "You know how Mike is," so it would be interesting to see if he took SNA as a personal attack on him, and if he construed these charts as you do.

        • Posted Jun 1, 2011 at 5:13 PM | Permalink

          Steven,
          I did a little research into the usage of the word clique by the Wegman clique. Firstly, from the Wegman Report we have:

          “Mann, Rutherford, Jones, Osborn, Briffa, Bradley and Hughes form a clique”
          “the Mann-Rutherford-Jones-Osborn-Briffa-Bradley-Hughes clique”
          “The Mann-Briffa-Hughes-Bradley-Rutherford clique”
          “the Gareth Jones-Allen-Parker-Davies-Stott clique”
          “The clique Claussen-Ganopolski-Brovkin-Kubatzki-Bauer”
          “the Ribe-Gimeno-Garcia-Herrera-Gallego clique”
          “the Arbarbanel-Rabinovich-Tsimning-Huerta-Gibb clique”

          What does this remind me of? I looked up some boilerplate NoKo propaganda:
          “the bidding of the war criminal Bush clique”
          “the war criminal Bush warmongering capitalist clique”
          “the American Militarist Bush Junta Clique “

          OK, they lay on the adjectives a bit thick. But clique does not sound good.

          Then I went to the Stupak questioning of Wegman. Here, perhaps stung by this sort of criticism, W had his students do his own network analysis in the same math style. So they talked about cliques too. But never in the same way, only in a detached mathematical way. In fact, the only time it came down to individuals, they handled it like this:
          “clique number 11 consists of the nodes (Wegman, Solka, Bryant), clique number 9 consists of the coauthors (Wegman, Solka, W. Martinez, Reid), clique number 2 consists of the actors (Wegman, Solka, W. Martinez, Marchette, Priebe)”
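
          For what it's worth, a "clique" in this graph-theoretic sense is just a set of authors who are all pairwise connected, so every multi-author paper produces one automatically. A minimal sketch (hypothetical paper and author lists, using Python's networkx) of how mechanically such cliques fall out of a coauthorship list:

```python
import networkx as nx
from itertools import combinations

# Hypothetical coauthorship lists, purely for illustration
papers = {
    "paper_A": ["Author1", "Author2", "Author3", "Author4"],
    "paper_B": ["Author3", "Author5", "Author6"],
    "paper_C": ["Author3", "Author7"],
}

G = nx.Graph()
for authors in papers.values():
    # every pair of coauthors gets an edge, so each author list is a clique by construction
    G.add_edges_from(combinations(authors, 2))

# Enumerate the maximal cliques; each paper's author list reappears inside one of them
for clique in nx.find_cliques(G):
    print(sorted(clique))
```

          In other words, listing cliques from a coauthorship graph mostly restates who wrote papers together.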

        • Posted May 31, 2011 at 8:16 PM | Permalink

          And the authors of MBH98 and related papers have yet to acknowledge the error, to my knowledge, or to give credit where credit is due.

        • Posted May 30, 2011 at 9:45 AM | Permalink

          “climatehate mails.”

          Freudian slip?

          How do we know Freud even wore a slip?

    • Keith W.
      Posted May 30, 2011 at 1:51 AM | Permalink

      Also, Nick, Wegman did not say expert, just most knowledgeable. All that implies is that she was the person involved with the report with the most knowledge about social network analysis. That to me means that she was the person with the most familiarity with the background texts pertaining to the subject.

      If I'm writing a report on widgets, and I want to give some details about the manufacturing process, and I have as a co-writer someone who has worked in a similar factory or has more recent information about the process, it makes sense to turn that section of writing over to them. I place my trust in them that what they write will either be original or be properly cited with regard to references. I do expect them to go to a better source than Wikipedia if there is any importance connected to the paper.

      Wegman trusted someone to do a professional job on a subsection of a paper being presented to Congress. All he wanted was basic information to demonstrate a supposition, not the solution to the Riemann Hypothesis. He knew that he did not have evidence that conclusively proved that Team members had coddled the papers of fellow Team members in the review process. The only evidence that would do that would be if the journals involved released the reviews and the names of the reviewers, which they were not going to do.

      But there were patterns, and papers where statistical errors had been made had been published: statistical errors that review by a statistician would have found, and that would either have blocked publication until corrections were made or led to a retraction of the paper prior to publication. So, the review process, which was being used to vindicate erroneous papers, was suspect. Why were bad papers getting through, while the journals said the papers had been reviewed by qualified experts and peers? Might it not have been favoritism?

      One person fell down on producing the proper citations on an analysis piece that was not the major emphasis of the Wegman report. It was secondary analysis, suggesting that a body with the power to require the journals to provide the evidence of who reviewed what might look to see if there had been nefarious deeds. Congress did not act on the situation, and until Climategate, no evidence was available that this supposition was anything more than a supposition. No one cared about the minor peccadillo when its argument could not be proven. But with the appearance of evidence, it has come under fire from those trying to negate the entire impact of the Wegman report. Because, if this one part is true, then the rest is true as well. And that discredits the journals across the board.

      • Posted May 30, 2011 at 6:47 AM | Permalink

        Re: Keith W. (May 30 01:51),
        Wegman seems to have a knack for self-refutation. In a report where he berates climate scientists for not engaging statisticians (“apparently no independent statistical expertise was sought or used”), he presents a social network analysis where his “most knowledgeable” person has been to a one-week short course. And doesn’t even name her as coauthor.

        • Tom Gray
          Posted May 30, 2011 at 7:48 AM | Permalink

          The world is faced with potential catastrophe and this is the level of debate:

          Is
          Is not
          Is so
          You did it
          You did it too

          Why don't we just use a variant derived from Huffman encoding?

          So

          Is = 0
          Is not = 1
          Is so = 10
          You did it = 1

          Climate science particular arguments could be given their own Huffman code

          So bandwidth and storage for blog threads could be compressed to

          00 01 11 101 110 10 01 00 01 10 01 10 01 10 01 10 to infinity

          On the rare occasion when a new argument is generated, a body like IANA could be used to generate a new Huffman code for it.
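
          Of course the codes above are tongue-in-cheek (they are not even prefix-free). For the curious, a toy sketch of what an actual Huffman code over these arguments might look like, with made-up frequencies:

```python
import heapq

# Made-up frequencies for the arguments above (illustrative only)
freqs = {"Is": 40, "Is not": 35, "Is so": 15, "You did it": 7, "You did it too": 3}

# Build the Huffman tree by repeatedly merging the two least frequent nodes
heap = [[weight, [arg, ""]] for arg, weight in freqs.items()]
heapq.heapify(heap)
while len(heap) > 1:
    lo = heapq.heappop(heap)
    hi = heapq.heappop(heap)
    for pair in lo[1:]:
        pair[1] = "0" + pair[1]   # left branch
    for pair in hi[1:]:
        pair[1] = "1" + pair[1]   # right branch
    heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])

codes = dict(heap[0][1:])
print(codes)  # the most frequent arguments get the shortest bit strings
```

          The most common retorts compress to a single bit, which is about what they deserve.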

        • Posted May 30, 2011 at 7:53 AM | Permalink

          Re: Tom Gray (May 30 07:48),

          I agree. The people who are helping are the climate scientists. And I think it would be good if we just let them get on with it.

        • Tom Gray
          Posted May 30, 2011 at 8:35 AM | Permalink

          You’ve got to be kidding. Calling people “shills”, “flat earthers”, “deniers” and worse is helping?

          Climate scientists should stick to what they know and not open their mouths about things that they do not. They have poisoned the well on this issue and should hang their heads in shame for that.

          There is potential for major environmental disruption. There is potential for major economic disruption. In the depression of the 30s, people starved while food was left to rot or dumped into the sea. Certain climate scientists respond to a potential for a repeat of that by calling people nasty names. No, they should not get on with that.

          That the issue has degenerated into name calling between sides and that one side is as bad as the other does not excuse the behavior of certain climate scientists. Contra their own opinion, they are not smarter than everybody else combined three times over.

        • Posted May 30, 2011 at 3:19 PM | Permalink

          Re: Tom Gray (May 30 08:35),
          “not open their mouths about things that they do not.”

          I don’t think they have been vocal on that – I think your examples are from private emails which people, not in a friendly spirit, decided to publish. If what they say to each other in private distresses you, don’t read.

        • Posted May 30, 2011 at 9:25 PM | Permalink

          Re: Nick Stokes (May 30 15:19), It distresses me that they would write to potential reviewers of McIntyre and call him a fraud, in private. It is one thing to do that in public where the man could defend himself, but to privately trash him to people who may be asked to review him goes a bit too far. Personally, if I'm not willing to say it in public I don't say it in private. Somebody should ask these guys if they would like to repeat those charges in public, where the consequences would fall more heavily on the person making the charge of fraud rather than the victim of it.

        • Posted May 30, 2011 at 11:47 PM | Permalink

          “Somebody should ask these guys if they would like to repeat those charges in public”

          Someone didn’t ask.

        • Posted May 31, 2011 at 10:23 AM | Permalink

          Re: Nick Stokes (May 30 23:47),

          Sorry Nick. There are a great many things I regret not doing in the book. 200 pages in 30 days. Many, many things got left on the cutting room floor.
          For the most part I stayed away from the majority of Mann's nasty behavior toward Steve. In fact I didn't even discuss these issues in the book, as they were not central to the story about FOIA, which is the story that I asked Revkin to follow. It's really the only story that matters to me. So, it would have been rather out of place for me to ask Mann to repeat these charges publicly when I didn't discuss them in depth, or at all, in the book.

        • Brian B
          Posted May 30, 2011 at 10:00 AM | Permalink

          –The people who are helping are the climate scientists. And I think it would be good if we just let them get on with it.–

          Indeed. Heaven knows the Team has sedulously earned the trust to work without any grown-ups looking over their shoulders.
          Why should we bother their important work by having someone other than their friends or fellow self-confirming activists checking their work?

        • Posted May 30, 2011 at 10:02 AM | Permalink

          I have no problem letting them get on with it, as long as they show their work, to everyone.

        • Posted May 30, 2011 at 11:17 AM | Permalink

          Re: Tom Gray (May 30 07:48), I like this.

          Maybe some climate scientists will realize that blog wars are a dance and they will get back to the business of science and stay out of politics.

        • Steve Fitzpatrick
          Posted May 31, 2011 at 8:16 AM | Permalink

          I don't think that is a realistic possibility. If there is anything the UEA emails showed, it is that those involved are very highly motivated in a political sense, and want to bring about substantial reduction in fossil fuel use, ASAP. It is their overriding concern; it is the Team's primary mission: save the Earth from mankind. The convolution of the scientific process with political considerations is everywhere evident. I just don't think the "political climate" within climate science is going to change any time in the foreseeable future, because most of those who enter the field and who are promoted within the field hold common political views and personal values (strongly green, strongly left).

        • DEEBEE
          Posted May 30, 2011 at 12:35 PM | Permalink

          Nick, how was that self-refutation? Stipulating your juxtaposition of the two "facts", you could accuse him of being inconsistent, a hypocrite, if Reeves was wrong. It still does not refute his criticism of climate "scientists" for not engaging statisticians: given the kindergarten mistakes by the team, even a one-week course in stats would have been heaven-sent.

          Sorry, I "self-refute" and correct the above to: at least a one-week course in scientific integrity.

        • Michael Smith
          Posted May 31, 2011 at 5:29 AM | Permalink

          Nick Stokes wrote:

          “Wegman seems to have a knack for self-refutation. In a report where he berates climate scientists for not engaging statisticians (“apparently no independent statistical expertise was sought or used”), he presents a social network analysis where his “most knowledgeable” person has been to a one-week short course.”

          Your analogy is preposterous. The fact that someone, Wegman, criticizes scientists for failing to consult certain experts — in ONE situation — does not in any way obligate that person, Wegman, to ALWAYS and CONSISTENTLY consult a similar expert regarding EVERYTHING he says, does or writes.

          Climate scientists are presenting findings that hordes of leftist politicians and environmental activists are invoking to demand an at-gunpoint, forced, global-scale reduction in mankind’s standard of living — findings which depend crucially in many cases on the use of advanced statistics, a field of study in which many of these scientists are not experts and are, in some cases, clearly ignorant. Wegman — by contrast — issued a report on his findings regarding one specific statistical dispute in climate science and chose to use a lightly-trained person to author boilerplate on a minor section regarding SNA because he thought it, to some extent, explained the propagation of errors in the publications in that field. The two situations are not remotely comparable. Your attempt to invoke the latter as proof of “self-refutation” of Wegman’s point regarding the former only betrays your desperation to find SOMETHING, ANYTHING, no matter how thin it may be ridiculously stretched, to discredit a statistical analysis that confirmed M&M’s findings and exposed Mann’s errors.

          What’s MORE, even if Wegman’s failure to consult an expert on SNA WAS, in fact, an error of the same magnitude and importance as climate scientist’s failure to consult experts on statistics, that error would not “self-refute” anything. One error does not justify another; the hypocrisy of a critic justifies a criticism of his character, but does not constitute an argument against the critic’s argument. See the logical fallacy of ad hominem for elaboration.

        • pete
          Posted May 31, 2011 at 5:42 AM | Permalink

          Your attempt to invoke the latter as proof of “self-refutation” of Wegman’s point regarding the former only betrays your desperation to find SOMETHING, ANYTHING, no matter how thin it may be ridiculously stretched, to discredit a statistical analysis that confirmed M&M’s findings and exposed Mann’s errors.

          A lot of people seem to be under the impression that the SNA plagiarism is the only thing Deep Climate has found wrong with the Wegman Report.

          You seem to think that Wegman did some sort of statistical analysis that confirmed MM05. All he did was re-run Steve's code, cherry-picking and all, and then confuse the issue by describing Steve's fractional arima "persistent red noise" as "AR(1) with parameter 0.2".

          There is a pattern throughout the report of plagiarised material, slightly degraded to obscure its source and mangle its meaning. This includes the “analysis” of short-centered PCA, the supposed centerpiece of the report.

        • Posted May 31, 2011 at 10:14 AM | Permalink

          Re: pete (May 31 05:42),

          “A lot of people seem to be under the impression that the SNA plagiarism is the only thing Deep Climate has found wrong with the Wegman Report.”

          I hardly think that. Part of the problem is the continual confusion of the story by people like Mashey and Dave Clarke (DC). The story on the plagiarism in the SNA portion is pretty clear (who did what), if they just told it simply and clearly.

          If they didn't try to overcharge the plagiarism case (like skeptics overcharged the fraud case against Jones), then they could move on clearly to the stats issues.

          But as long as they say things like "the Said paper was a cornerstone of scepticism" or "Wegman copied Wikipedia", as long as they do those things, the things that go just over the edge, the things that play to the choir, you'll never get traction on the stats issues. Never.

          It's those little twists that create so many of the kerfuffles. Why, look: Nick Stokes, willard, and I are arguing about whether or not the SNA analysis beat Mann with a stick.

          Silly. Why, from their position, they really should want to talk about the stats work. But instead they spend time arguing about trivialities over a statement in a comment. Weird. It's diffusing their radical potential.

        • Posted May 31, 2011 at 3:02 PM | Permalink

          Steven,
          pete is not referring to the Mashey report, but to this more recent post by DC. It is indeed devastating. Fig 4.1, from M&M2005 GRL, purports to show a hockey-stick generated from red noise, which is compared with the MBH plot. It doesn't say how that instance of noise was chosen, but there's a clear implication that it is at least typical. Well, it seems it was indeed a random choice, but from a subset of 100 out of 10000 runs. And those 100 were chosen for their prominent "hockey-stick index".

          Likewise Fig 4.4 purports to show HS plots from red noise. The caption starts impressively:
          “One of the most compelling illustrations that McIntyre and McKitrick have produced is created by feeding red noise [AR(1) with parameter = 0.2] into the MBH algorithm.”
          But again, what you see is not a random sample of results. It’s a random sample (12) of the top 100 of 10000, chosen for hockey-stick index, as per M&M code. And the description of the noise source is quite wrong. The “persistent” character (ARFIMA) is important – the NAS report with AR(1) needed to go to very high autocorrelation to achieve the same effect.

          Steve- Nick, you’re arguing points that even Wahl and Ammann knew not to argue. That the MBH algorithm mined for hockey stick shapes was also confirmed by the NAS panel. They reported:

          McIntyre and McKitrick (2003) [actually 2005] demonstrated that under some conditions, the leading principal component can exhibit a spurious trendlike appearance, which could then lead to a spurious trend in the proxy-based reconstruction. To see how this can happen, suppose that instead of proxy climate data, one simply used a random sample of autocorrelated time series that did not contain a coherent signal. If these simulated proxies are standardized as anomalies with respect to a calibration period and used to form principal components, the first component tends to exhibit a trend, even though the proxies themselves have no common trend. Essentially, the first component tends to capture those proxies that, by chance, show different values between the calibration period and the remainder of the data. If this component is used by itself or in conjunction with a small number of unaffected components to perform reconstruction, the resulting temperature reconstruction may exhibit a trend, even though the individual proxies do not. Figure 9-2 shows the result of a simple simulation along the lines of McIntyre and McKitrick (2003) (the computer code appears in Appendix B). In each simulation, 50 autocorrelated time series of length 600 were constructed, with no coherent signal. Each was centered at the mean of its last 100 values, and the first principal component was found. The figure shows the first components from five such simulations overlaid. Principal components have an arbitrary sign, which was chosen here to make the last 100 values higher on average than the remainder.

          They illustrated with the following figure, the scale of which is determined by being a principal component of length 581 – a point not understood by some critics. The scale of the MBH PC1 is equivalent. The PCs are rescaled before regression (though the rescaling is redundant to the regression).

          It is foolish to deny the data mining of the Mannian PC method. In the case at hand, it considered the bristlecones as the "dominant pattern of variance" though they were only a local pattern of Graybill strip bark chronologies.

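
          For anyone who wants to see the effect the NAS panel describes, here is a minimal numpy sketch along the lines of their simulation (the AR(1) coefficient of 0.9 is an arbitrary choice for illustration; their actual code is in their Appendix B and may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, length, calib = 50, 600, 100

# 50 autocorrelated series with no common signal (AR(1), phi = 0.9 chosen arbitrarily)
X = np.zeros((length, n_series))
for j in range(n_series):
    e = rng.standard_normal(length)
    for t in range(1, length):
        X[t, j] = 0.9 * X[t - 1, j] + e[t]

# "Short-centering": subtract the mean of the last 100 values only
Xc = X - X[-calib:].mean(axis=0)

# First principal component via SVD
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = U[:, 0]

# The sign of a PC is arbitrary; pick it so the calibration period is the high end
if pc1[-calib:].mean() < pc1[:-calib].mean():
    pc1 = -pc1

# Despite there being no signal, PC1 tends to show a step up in the last 100 values
print(pc1[-calib:].mean() - pc1[:-calib].mean())
```

          Re-running it with ordinary full-period centering removes the effect, which is the point of the comparison.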

        • Posted May 31, 2011 at 5:35 PM | Permalink

          Steve,
          I can't see what that extract from the NAS report has to do with Wegman's use of rigged examples of red noise simulations in Figs 4.1 and 4.4.

        • pete
          Posted May 31, 2011 at 5:51 PM | Permalink

          Steve, the point isn’t whether or not short-centered PCA mines for high HSI. The point is Wegman didn’t check. He just re-ran your code (which cherry-picks the 1% of PC1s with highest HSI). If he’d actually had a proper look at the issue he wouldn’t have labelled your fractional-arima-plus-cherry-pick result as “AR(1) with parameter = 0.2”.

        • Steve McIntyre
          Posted May 31, 2011 at 6:52 PM | Permalink

          I’ve commented on the lack of due diligence by academic inquiries in the past. After saying that strip bark chronologies should be avoided in reconstructions, the NAS panel illustration used reconstructions dependent on strip bark chronologies. I asked North about that in an online seminar and he had no answer – he said that my questions were always tough. In a seminar at Texas A&M, North said that his inquiry just “winged” it. The figure showing the distribution of the HSI index is from the full sample. Mann’s claim was that his reconstruction was “99% significant” – whatever that means. Our article did not say that all simulations generated a HS pattern. [Note (Sep 23, 2014): it said that simulations from networks with the persistence properties of the North American network “nearly always” generated HS-shaped PC1s; in the running text, observing that a 1-sigma HS occurred over 99% of the time.] The point was that MBH operations applied to red noise could generate high HSI-index results.

          In a real data set, Mann's data mining algorithm picked out Graybill bristlecone chronologies in the North American network and moved them into a much higher PC – which we reported in our articles. The Graybill chronologies were known beforehand as problematic, as we observed at the time, and the NAS panel recommended that they be "avoided".

        • Posted May 31, 2011 at 9:07 PM | Permalink

          McIntyre, your plot shows some sort of hockey stick. But is the vertical axis relevant? If it is, then the amplitude is about 1/10 of the required figure, i.e. it is in the noise.

          Steve: the sum of squares of a principal component from singular value decomposition is 1. For a series of length 581, this determines the scale on the y-axis. The MBH PC1 has precisely the same scale. In the subsequent regression step, the PC1 is (in effect) rescaled to temperature. Don't lose sight of the fact that the problem is a combination of both methods and proxies. It is seldom that a statistical procedure is so flawed as to be "wrong", but Mannian principal components really is.
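
          A quick back-of-envelope version of that scale point (a sketch, not the MBH code): any unit-norm vector of length 581 necessarily has tiny elements, so the raw y-axis amplitude of a PC by itself tells you nothing about physical magnitude.

```python
import numpy as np

# Any unit-norm vector of length 581 (e.g. a PC from an SVD) has RMS element 1/sqrt(581)
v = np.random.default_rng(1).standard_normal(581)
v /= np.linalg.norm(v)            # sum of squares is now 1
print(np.sqrt(np.mean(v ** 2)))   # ~0.0415, i.e. 1/sqrt(581); set by the length, not by climate
```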

        • RomanM
          Posted May 31, 2011 at 10:02 PM | Permalink

          If he'd actually had a proper look at the issue he wouldn't have labelled your fractional-arima-plus-cherry-pick result as "AR(1) with parameter = 0.2".

          You have raised this more than once as if it was some sort of egregious error. Are you saying that Wegman did not understand the statistics of what was done? Exactly what would “those of us with a technical bent” technically decree it to be referred to as?

        • pete
          Posted May 31, 2011 at 11:24 PM | Permalink

          MM05-GRL used fractional arima. This is not the same thing as AR(1). If Wegman had tried to reproduce MM05 using AR(1)p=0.2, his results would have been different from MM05-GRL.

          I think he should have called it “fractional arima”, or “arfima”. The fact that he didn’t shows that he didn’t understand that the results weren’t in fact from AR(1)p=0.2 simulations.

          The Figure 4.4 caption also says nothing about the 1% cherry-pick, although that trick is inappropriate whether or not you describe it properly.

          But it's not the mislabelling of the result that's the egregious error. It's the rubber-stamping of an analysis that he didn't take the time to understand.
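
          To see why the label matters, here is a rough sketch with toy generators (the AR coefficient and the d value are illustrative choices of mine, not the MM05 fitted values): AR(1) with phi = 0.2 has autocorrelation that dies off almost immediately, while a fractionally differenced "persistent" series does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 600

def ar1(phi, n):
    # Ordinary AR(1) noise
    x = np.zeros(n)
    e = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def fracdiff_noise(d, n, trunc=1000):
    # ARFIMA(0, d, 0): filter white noise with the MA weights of (1-B)^(-d),
    # psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k, truncated at `trunc` lags
    psi = np.ones(trunc)
    for k in range(1, trunc):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    e = rng.standard_normal(n + trunc)
    return np.convolve(e, psi, mode="full")[trunc:trunc + n]

def acf(x, lag):
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

a, f = ar1(0.2, n), fracdiff_noise(0.45, n)
print([round(acf(a, k), 2) for k in (1, 10, 50)])  # near zero beyond the first lag
print([round(acf(f, k), 2) for k in (1, 10, 50)])  # decays slowly: long-memory "red" noise
```

          The two noise models behave very differently when fed through a decentered PCA, which is why describing one as the other is more than a slip of the pen.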

        • RomanM
          Posted Jun 1, 2011 at 7:09 AM | Permalink

          I think he should have called it “fractional arima”, or “arfima”. The fact that he didn’t shows that he didn’t understand that the results weren’t in fact from AR(1)p=0.2 simulations.

          This is utter nonsense. Section 2.2 of the Wegman report discusses both standard ARMA and fractional differencing. Why would the latter portion be in the report if Wegman “did not understand”? Anyone reading the MM script for generating the series could not possibly be oblivious to the fact that the series are fractional AR(1).

          Is using such terminology wrong? If you look at the original Hosking paper in Biometrika on fractional differencing, you will note that the author refers to the ARIMA(0,d,0) process as “fractionally differenced white noise”. Maybe you should inform him that he doesn’t understand that “in fact” it is not white noise.

          The term “arfima” does not seem to be de facto standard terminology for such processes. Hosking did not use it, nor did Haslett and Raftery in their paper in 1989 from which the R methodology was taken. R help for the fracdiff library states at one point “Fractionally differenced ARIMA aka ARFIMA(p,d,q) models”.

          Your assessment that a competent experienced statistician would not “take the time to understand” before creating such a report is sheer arrogance.

        • oneuniverse
          Posted Jun 1, 2011 at 5:51 AM | Permalink

          Pete, figure 4.2 covers all 10,000 simulated PC1’s. The hockey-stick-selecting feature of the MBH algorithm, and the lack of such a feature in the standard centered algorithm, are starkly visible in the comparison of the two histograms.

          You accuse Wegman of only re-running M&M's code, yet you ignore the additional work carried out by Wegman and co-authors, encapsulated in figures 4.6 and 4.7.

          You accuse Wegman of not understanding the statistical analysis, yet you ignore the fact that in Appendix A the bias introduced by the MBH centering is mathematically analysed and the magnitude of the bias is derived in the general case.

        • Posted Jun 1, 2011 at 7:51 AM | Permalink

          Roman,
          “This is utter nonsense. Section 2.2 of the Wegman report discusses both standard ARMA and fractional differencing. Why would the latter portion be in the report if Wegman “did not understand”?”

          Oddly enough, the part that introduces fractional brownian motion in Sec 2.2
          “Random (or stochastic) processes whose autocorrelation function, decaying as a power law, sums to infinity are known as long range correlations or long range dependent processes. Because the decay is slow, as opposed to exponential decay, these processes are said to have long memory. Applications exhibiting long-range dependence include Ethernet traffic, financial time series, geophysical time series such as variation in temperature, and amplitude and frequency variation in EEG signals. Fractional Brownian motion is a self-similar Gaussian process with long memory.”
          is very close to one from:
          Govindan Rangarajan, Mingzhou Ding (eds.), Processes with long-range correlations: theory and applications (Springer, 2003)

          The later description of it:
          “An object with self-similarity is exactly or approximately similar to a part of itself. For example, many coastlines in the real world are self-similar since parts of them show the same properties at many scales. Self-similarity is a common property of many fractals,”
          comes with minor changes from Wikipedia.

        • pete
          Posted Jun 1, 2011 at 7:06 PM | Permalink

          Anyone reading the MM script for generating the series could not possibly be oblivious to the fact that the series are fractional AR(1).

          That’s exactly the point; Wegman didn’t bother to read the MM script.

        • RomanM
          Posted Jun 1, 2011 at 9:17 PM | Permalink

          You must be clairvoyant. Along with mind reading, can you predict the future too?

          Two minutes earlier, you wrote “Wegman’s error was to use code which he didn’t understand”. Now you tell me “Wegman didn’t bother to read the script”. Which is it?

        • Posted Jun 1, 2011 at 9:46 PM | Permalink

          Romanm,
          “Two minutes earlier, you wrote “Wegman’s error was to use code which he didn’t understand”. Now you tell me “Wegman didn’t bother to read the script”. Which is it?”

          Where’s the inconsistency? It says that he didn’t understand the code because he didn’t read it.

          Do you think he did read it? If so, how could he have missed that his graphs showing hockey sticks from red noise came from a program that selected them from a large sample according to HS index? Do you think he read that (it's clearly marked) and just didn't think we needed to know?

        • RomanM
          Posted Jun 1, 2011 at 9:49 PM | Permalink

          Nick, if he didn’t read it, what “code” did he run?

        • pete
          Posted Jun 1, 2011 at 10:40 PM | Permalink

          RomanM, do you seriously not understand that it’s possible to run code without reading it?

        • oneuniverse
          Posted Jun 3, 2011 at 6:12 AM | Permalink

          Pete, if you ran the MM05 script as archived, it would almost certainly not run, as there are a few sections that are hard-coded to the local (non-standard) directory structure on Steve's development PC and would require editing.

          Steve: at the time, I hadn't fully appreciated the possibility of turnkey programs, in which all data is retrieved from online datasets – a concept that developed together with Climate Audit. The original concept was to show the actual calculations in detail.

        • Posted Jun 3, 2011 at 6:48 AM | Permalink

          Does the program have a version history?

        • RomanM
          Posted Jun 3, 2011 at 7:06 AM | Permalink

          What “program” are you talking about and what point, if any, are you trying to make?

        • Posted Jun 3, 2011 at 10:12 AM | Permalink

          1U,
          obviously this relates to the earlier question of whether he actually read the code. Your explanation might be right. But the url.source website where he would then have got the archived data is also used as a write-to site, and it’s a journal site (in my version). So he wouldn’t have had authority to do that, and would have had to fix that, probably by changing to a local site. So then he would have had to put the archived data in the right place. I suppose he could have just commented out the write statements, but it does seem all very careless.

        • RomanM
          Posted Jun 3, 2011 at 12:08 PM | Permalink

          Nick, besides the issues that oneuniverse has raised, there is the fact that the R package waveslim is used in the script. That library is not part of the default installation of R and must be added to R or the script will not run.

          If it is not present, the error message would alert the user to add the package, and it would be expected that at that point any statistician would read the descriptions and familiarize themselves with the use of the materials. If it is already present on the computer, then it would have been downloaded earlier, at which time the same understanding of the workings of the package would have been gained. Most of the R help descriptions for the various functions include references to the papers which describe the mathematical algorithms and/or the statistical background for those functions.
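          To make the dependency concrete (the error text below is the standard R message, quoted from memory, and hosking.sim is one of the waveslim functions a reader would meet):

          # Running the script on a machine without the package stops at the library call:
          #   > library(waveslim)
          #   Error in library(waveslim) : there is no package called 'waveslim'
          # Installing it first exposes the documentation and its references:
          install.packages("waveslim")
          library(waveslim)
          ?hosking.sim   # help page, with references to the underlying papers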

          Yes, it is possible for people like DeepClimate, who have a limited background in statistics and have never done research, teaching or consulting in the discipline, to run the scripts without any understanding of how the script works or what it does. Prof. Wegman with his many years of previous experience would have found it no more difficult than I did to come up to speed on the topic if he did not already have that specific knowledge. For anyone to claim that he did not do so before creating a report for the US Congress is sheer hubris.

        • Posted Jun 3, 2011 at 1:35 PM | Permalink

          Roman,
          I think it is odd that you keep insisting that he read the program properly, because the kindest explanation for what he claimed is that he didn’t read it. Otherwise, you haven’t explained what a statistician is doing claiming that the occurrence of simulation results demonstrates HS-ness, when they had been selected from a much larger sample exactly on the basis of HS-ness.

          And said “We have been able to reproduce the results of McIntyre and McKitrick (2005b).” when Figs 4.1, 4.3 and 4.4 are merely plots of archived MM2005 calculated results. That’s not reproducing, it’s copying.

        • oneuniverse
          Posted Jun 3, 2011 at 2:36 PM | Permalink

          Nick, actually, I made a mistake – Fig.4.3 is not generated from the archived data.

          (Sorry about that – I’m not running the fig.3 code yet until the UVa archive is available, but I should’ve paid better attention.)

        • Posted Jun 3, 2011 at 4:00 PM | Permalink

          1U,
          Thanks, yes, I think it’s only the graphs depending on the 100 sample selection (4.1 and 4.4).

        • oneuniverse
          Posted Jun 3, 2011 at 5:25 PM | Permalink

          Nick, the inadvertent (imo) use of archived results instead of fresh results is down to not noticing an incorrect (but similar to the correct) filename in a single line.

          hockeysticks<-read.table(file.path(url.source,"2004GL021750-hockeysticks.txt"),sep="\t",skip=1)

          should be, for the purposes of the reproduction,

          hockeysticks<-read.table(file.path(url.source,"hockeysticks.txt"),sep="\t",skip=1)

          Also, you wrote to Roman : “you haven’t explained what a statistician is doing claiming that the occurrence of simulation results demonstrates HS-ness, when they had been selected from a much larger sample exactly on the basis of HS-ness.”

          As already mentioned, Fig. 4.2 is the one that encapsulates the behaviour of the MBH algorithm across all 10,000 simulations. Wegman et al. note in the discussion of this figure : “In particular, the MBH98 methodology (and follow-on studies that use the MBH98 methodology) show a marked preference for ‘hockey stick” shapes.”. So the hockey-stick selecting feature of the MBH method is already established in the discussion of Fig. 4.2. It is not mentioned in the discussion or captions of Fig.4.1.

          In Fig. 4.4's caption, there is the phrase I assume you're objecting to: "The MBH98 algorithm found 'hockey stick' trend in each of the independent replications." While this statement may well be correct for the full 10,000 simulations (depending on the definition of hockey stick, e.g. HS index > some number), it shouldn't be part of the caption for the subset of 12 (although this could be argued – I personally wouldn't). I'm just now redoing the 10,000 arfima simulations (my PC from yesterday isn't accessible), and will report what a random selection of plots from the full 10,000 looks like. Since there are likely to be very few simulated PC1s with an absolute hockey stick index < 0.5, I'm guessing they will all exhibit HS-ness to some degree. (By contrast, the majority of PC1s from the centered method will almost certainly have an absolute hockey stick index < 0.5.)
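          A minimal sketch of the check I have in mind, assuming hs holds the 10,000 simulated hockey-stick indices (the object name is mine, not from the script):

          # Fraction of simulated PC1s with only a weak hockey-stick shape, and a random
          # dozen from the full set for plotting; hs is a hypothetical vector of HS indices.
          mean(abs(hs) < 0.5)                  # proportion below the 0.5 threshold
          pick <- sample(seq_along(hs), 12)    # a random sample of 12 for visual inspection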

        • Posted Jun 3, 2011 at 6:12 PM | Permalink

          1U,
          The discussion of Fig 4.1 starts “The similarity in shapes is obvious … However, the top panel clearly exhibits the hockey stick behavior induced by the MBH98 methodology.”, referring to the synthetic PC1 and the MBH result. It doesn’t actually say how the top panel plot was chosen, but I don’t think that “It’s no 71 in a subset of 100 selected by HS index from a sample of 10000 simulations” is really what you’d expect a statistician to offer without explanation, when saying that it shows MBH-style HS behaviour.

          If you’re doing tests, I think it would be interesting to show what the top 100 of results from centred synthetic PC’s would do. Just use stat4 instead of stat2 in the order command, and see how Figs 4.1 and 4.4 would look.
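          For clarity, a sketch of the selection step I mean, keeping the stat2/stat4 names from the description above (the archived script may differ in detail):

          # Top 100 simulations ranked by hockey-stick statistic, for the decentered and
          # centered runs respectively; stat2 and stat4 are assumed to hold those statistics.
          top_decentered <- order(-abs(stat2))[1:100]
          top_centered   <- order(-abs(stat4))[1:100]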

        • oneuniverse
          Posted Jun 3, 2011 at 7:17 PM | Permalink

          Nick, ok, I’ll do that, and also plot random samples from the centered PCs.

          Please note though that due to their different distributions, the top 100 of the centered method PCs will be less representative of the whole set than the top 100 of the decentered method PCs, which should be apparent from a comparison with the random samples.

        • oneuniverse
          Posted Jun 3, 2011 at 10:04 PM | Permalink

          Nick, you can view the 4 figures here.

          Each figure plots 12 series, either a random sample of 12 from the 10,000, or a random sample of 12 from the top 100, for centered and MBH methods respectively.

          Of the random sample from the 10000 produced by the MBH method, I would say 10 show distinct hockey-stickness, one has a milder uptick, and one is disqualified from being a HS as it doesn’t have the flat handle.

          None of the random samples from the 10000 produced by the centered method are hockey-sticks.

        • oneuniverse
          Posted Jun 3, 2011 at 10:15 PM | Permalink

          Please note that the Y-axis scalings are different – the centered ones have (-7,7), the top 100 MBH one has (-.1,.03) as in the original, and the random MBH one is (-.1, .1).

        • Posted Jun 3, 2011 at 11:01 PM | Permalink

          1U,
          Unfortunately, I can’t see the plots – I get a connection reset error.

          I tried it myself. As you probably found, they don’t use the same process for centred and decentred. Decentred has explicit SVD; centred uses prcomp() or princomp(). It should come to the same thing, but I think it is bad practice in the circumstances. And of course the scaling comes out all different, which I guess is reflected in your graph scales that you mention.

          I thought the best and fairest way to compare was to just readjust the "mannomatic" fn to have default M=581 (instead of M=78) for centering, so the whole-series mean is used. That way they go through exactly the same process.

          I found then that centred gave strong hockey stick shapes (with the selection of 100).

          I’ll write a blog post about it. I’d be happy to post your results too if you’d like.

        • Posted Jun 3, 2011 at 11:41 PM | Permalink

          1U,
          Odd things with your images. Firefox timed out immediately – IE just went on loading indefinitely (I waited 3 min) – but Chrome worked.

          Yes, that looks right. I’ve been getting similar HS shapes with centred selected. I haven’t produced the plots for unselected yet, but individual plots look consistent with that.

          I spoke too soon with Chrome, it also loses the site after any change.

        • oneuniverse
          Posted Jun 4, 2011 at 2:49 PM | Permalink

          Nick, I can access MediaFire with Firefox, but the iPad won't even load the MediaFire homepage. I've transferred the files to here at RapidShare. I've added figures of a random sample of 100 from the full populations of centered and MBH results respectively. Each is broken up into 4 files of 25 plots each, as I couldn't get a 10×10 grid to work.

          You suggest a fairer comparison is to compare the MBH method to an adjusted "mannomatic" function (call it the "semi-mannomatic") which uses the whole series in place of the 1902-1980 period, but still with the 3 MBH steps: 1) centering, 2) dividing by the SD, and 3) dividing by the SD of the residuals from a linear fit.
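          For concreteness, a minimal sketch of those three steps as a function (the name and details are mine, not taken from the MM05 script):

          # "Semi-mannomatic": centre on the whole series rather than 1902-1980, then
          # apply the two MBH scaling steps.
          semi_mannomatic <- function(x) {
            x <- x - mean(x)              # 1) centre on the full series
            x <- x / sd(x)                # 2) divide by the series SD
            fit <- lm(x ~ seq_along(x))   # 3) divide by the SD of residuals from a linear fit
            x / sd(residuals(fit))
          }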

          Based on my limited familiarity with dendro and statistics, I tentatively agree that the standardisation step of division by the SD makes sense, since the MBH proxies are a mixture of different measurements (ring width, ring density, ice O18, etc.), and standardisation is considered mandatory when there are different units. (Have these different proxies already been standardised into common units before the PCA, though? If so, then the step 2 division by SD is questionable.)

          With similar tentativeness, I disagree that the 3rd step of dividing by the SD of the residuals from a linear fit should be included, since this appears to be an unusual step, justified neither in the paper nor [from my brief review] in the basic literature on PCA (please correct me if I'm wrong).

          Taking the above, and your criticism of the lack of SVD for the centered step, into consideration, I reran the simulations (with a simulation size of 2000 rather than 10000, as my PC here is slow), using the prcomp method with scale=TRUE for the centered case, which performs steps 1 and 2 of the semi-mannomatic and uses SVD. (I would, given more time, run your suggested function too for comparison.)
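          The centered-and-scaled run amounts to something like the following, with noise_matrix standing in for a simulated proxy network of years by proxies (the object name is a placeholder):

          # prcomp() centres, optionally scales to unit variance, and uses SVD internally;
          # the PC1 time series is the first column of the scores.
          pcs <- prcomp(noise_matrix, center = TRUE, scale. = TRUE)
          pc1 <- pcs$x[, 1]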

          A comparison of the random samples, which I think we agree is the fair comparison, shows that hockey sticks are the norm in the MBH method (this is unchanged, of course), and not in the centered and scaled prcomp method. I found 2, maybe 3 sticks in the random sample of 100 (from population of 2000 centered). A look at the Fig. 2 histograms shows that there’s very little change from the MM analysis.

          Steve – you should consult our Reply to Huybers and Reply to von Storch on whether it makes sense to double-standardize the tree ring networks as Wahl and Ammann argue. All of this is just ways to get the bristlecones in disguise.

        • oneuniverse
          Posted Jun 4, 2011 at 4:55 PM | Permalink

          Dear mods, please release my post at Jun 4, 2011 at 2:49 PM from moderation if possible?

        • Posted Jun 5, 2011 at 1:46 AM | Permalink

          1U,
          Some continuing trouble – when I clicked on your link, McAfee site-advisor came up with this dramatic warning. I’ll switch to linux to proceed, but for the moment I have a 10000 sim run going.

          I’ve found the best way to display pics is to set up a site with Google sites – you have a lot more control. It’s free. Before that I used to use Tinypic, with no issues.

          There've been some unexpected household issues here which have delayed my own simulations, but I still hope to have a blog page in a day or so. I imagine it will agree with your results.

          I don't have strong feelings about whether the extra matters (divide by SD, etc.) are good or bad – my basis for suggesting the semi-mannomatic (liked that word) was to limit the change to the factor Wegman was talking about – the decentering.

        • oneuniverse
          Posted Jun 5, 2011 at 7:17 AM | Permalink

          Steve, thank you. So the proxies are already cast into common units, and the statistical authorities cited by Huybers do not recommend standardising them again.

          (I'm not clear how this relates to the bristlecones, though. Also, how do W&A get a hockey stick when they omit the bristlecones & foxtails (fig. 1c)?)

          Nick, thanks for the googlesites suggestion, I’ve moved them to here.

          Wegman talks about the decentering, MM05 discusses the extra divisions as well.
          I think it’s very useful to see how MBH method differs from the recommended procedure, which is what MM05 does.

          It’s also interesting to examine the individual effects of the extra steps taken by MBH.
          I’m running the simulation now with the semi-mannomatic. It would then make sense for completeness to do additional simulations omitting the other steps (divisions by SD).

        • oneuniverse
          Posted Jun 5, 2011 at 12:55 PM | Permalink

          Dear mods, I have another post in moderation (Jun 5, 2011 at 7:17 AM) – please rescue if possible?

          Nick, I've now run the simulation using the semi-mannomatic as well. I've uploaded the fig. 2 for the run – there's little appreciable difference between it and the MM analysis.
          (I moved to googlesites as suggested, link is in moderated post).

        • oneuniverse
          Posted Jun 5, 2011 at 6:43 PM | Permalink

          re: my question about fig. 1c (fig5 of WA07) – please ignore, I had misunderstood it.

        • Posted Jun 5, 2011 at 8:16 PM | Permalink

          Thanks, 1U
          Yes, the new site works really well. The selection of the top 100 does also bring hockeysticks out of centered simulations. The random selection of decentered does show endpoint behaviour, not so obvious, partly because it is as likely to go down as up (for PCA, it doesn’t matter which way). And as you say, Fig 2, and its message, are still OK.

          That Ammann site you linked to is helpful too.

        • Posted Jun 8, 2011 at 8:50 AM | Permalink

          1U,
          I have posted my results corresponding to yours here, with commentary. I linked to your results.

        • Fred
          Posted Jun 1, 2011 at 10:03 PM | Permalink

          Actually the situation is worse than described by pete. Wegman's fig 4.4 shows examples that were not calculated 'de novo' by him, but that are read directly from a file of results originally archived by MM05. (Remember these are drawn from random realisations – he can't possibly have got the exact same answers in a new calculation without using the (unarchived) seed for the R random number generator.) This re-reading of the saved MM05 file is what the archived code does, and since there is no sense in which re-running the code as written and getting that figure can count as any sort of validation, the conclusion that Wegman did not read or understand the code he was running is inescapable.
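          The point about random realisations is easy to illustrate; without the saved seed there is no way to regenerate the identical draws:

          # Identical random draws are only possible when the same seed is set and reused.
          set.seed(42); a <- rnorm(5)
          set.seed(42); b <- rnorm(5)
          identical(a, b)   # TRUE
          c <- rnorm(5)     # no seed reset: a fresh, different realisation
          identical(a, c)   # FALSE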

          It may well be that Wegman edited the code to use AR(1) and rho=0.2, but if he didn’t notice that the code throws that all out and reads in data from a saved file before plotting, that again points to a lack of reading comprehension or carelessness.

          Of course, since Wegman never archived his own code we don’t know for sure.

          Ironically, this is one case where the perfect replication of the results by DC tells us all we need to conclude that Wegman did not in fact do any 'due diligence' before writing his report.

        • Steve McIntyre
          Posted Jun 1, 2011 at 11:31 PM | Permalink

          I’m quite prepared to agree that the value of reports from academic panels that do inadequate or no due diligence is much diminished. However, the NAS panel did even less due diligence. North said that they just “winged it”. After saying that bristlecones should be avoided, they used reconstructions containing bristlecones anyway.

          And the IPCC doesn’t do any due diligence of the type that you are discussing here – a point that Mann made several years ago.

          Makes one wonder about academic panels.

        • oneuniverse
          Posted Jun 2, 2011 at 11:10 AM | Permalink

          FWIW, I’ve just run an edited version of the MM05 R-script which uses hockey-sticks from the newly generated simulations rather than the archived hockeysticks. The functions in Part 1 were not audited for correctness. The code to generate figure 3 (4.3 in the WR) was skipped (since I can’t access Mann’s MBH FTP archive at UVa). Hockeystick no.8 of WR Fig.4.4 was also omitted since it’s dependent on file pc01.out of the UVa archive. (What is the significance of no.8 ? Why isn’t it like the other 11, a random sample from the top 100?)

          Apart from the missing no.8 stick, Figs. 4.1, 4.2 and 4.4 of the WR were successfully reproduced (although they are not exactly identical, of course).

        • Posted Jun 2, 2011 at 6:18 PM | Permalink

          Oneuniverse, I did the same, with the same outcome.

          “(although they are not exactly identical, of course)”

          Indeed, and that shows you were more thorough than Wegman. His Fig 4.1 is identical with Fig 1 in MM05. He didn’t run the code here – he just plotted M&M archived results.

        • oneuniverse
          Posted Jun 3, 2011 at 8:53 AM | Permalink

          Nick Stokes: He didn’t run the code here – he just plotted M&M archived results.

          The natural explanation is that he ran the code (which itself plots archived results for fig. 4.1, 4.3 and 4.4). If he didn’t run the code, how did he create the report’s fig. 4.2 (“our recomputation of the Figure 2 in McIntyre and McKitrick(2005b)”), which is very similar but not identical to fig. 2 of MM2005b? There’s no reason to believe that it isn’t a recomputation.

          The R-script archived for MM2005b is not quite turn-key code, as Steve points out above. It is apparent to me, from the way that tables are often saved and then immediately reloaded in the next section (the next line), that the code would sometimes have been run in segments by the authors, to allow development and testing without having to recompute the simulations at every run. Running the MM code (once the hard-coded file paths are edited) will create a fresh set of simulations. The critical fig. 2 (4.2 in WR) is generated from these new simulations. Due to what is apparently a left-over piece of code, Figs. 4.1, 4.3 and 4.4 are plotted using the archived hockeysticks rather than those from the new simulations. The authors of the Wegman report should have spotted that this is a mistake for the purposes of replication (although there would have been reasons in the code's past to plot the exact sticks published in MM2005b) – however, using the freshly created simulations yields results that are effectively almost identical to the archived results.

          DeepClimate writes :

          It’s true that NRC did provide a demonstration of the bias effect using AR1 noise instead of ARFIMA. But it was necessary to choose a very high lag-one coefficient parameter (0.9) to show the extreme theoretical bias of “short-centered” PCA. Indeed, that high parameter was chosen by the NRC expressly because it represented noise “similar” to McIntyre’s more complex methodology.

          I edited the MM script to run some simulations using AR(1) with ar coefficients of 0.2, 0.5 and 0.9. (To save time, I reduced the simulation size for 0.5 and 0.9, after checking that there was little difference between that and a full 10,000 run for coeff 0.2 – I’ll do the full run later).

          The bias of the decentered PCA is still clearly visible in all the histogram comparisons, although the ‘hockey stick’ indices are of lower absolute value for the lower coefficients – the absolute values for the decentered analysis cluster just under 1 for coeff 0.2, just over 1 for coeff 0.5, and just under 2 for coeff 0.9 (for comparison, the abs. values for the arfima method cluster around 1.7). The histograms for the centered analysis are correspondingly narrower – the same kind of ‘reverse image’ quality is present for all the comparisons.

          So while it's not possible to get the steep hockey stick of MBH without using a high coefficient, this is due to the characteristics of the bristlecone data, and not the MBH algorithm. DeepClimate is wrong to state that "it was necessary to choose a very high lag-one coefficient parameter (0.9) to show the extreme theoretical bias of 'short-centered' PCA", since the bias is plainly apparent even with a low coefficient like 0.2.
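          A sketch of the kind of edit involved, with the network size and coefficients as placeholders; the script's arfima machinery is simply swapped for arima.sim here:

          # Build synthetic "proxy networks" of AR(1) noise at several lag-one coefficients;
          # each network is 581 years by 70 series (sizes illustrative).
          make_network <- function(rho, nproxy = 70, nyear = 581)
            replicate(nproxy, as.numeric(arima.sim(list(ar = rho), n = nyear)))
          networks <- lapply(c(0.2, 0.5, 0.9), make_network)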

        • oneuniverse
          Posted Jun 3, 2011 at 9:11 AM | Permalink

          The authors of the Wegman report should have spotted that this is a mistake for the purposes of replication

          Sorry, that wasn’t well phrased – rather, they should have spotted the use of archived results, since it’s a mistake for the purposes of replication.

        • Layman Lurker
          Posted Jun 4, 2011 at 10:28 PM | Permalink

          DeepClimate is wrong to state that “it was necessary to choose a very high lag-one coefficient parameter (0.9) to show the extreme theoretical bias of “short-centered” PCA.”, since the bias is plainly apparent even with a low coefficient like 0.2.

          Steve may correct me if I’m wrong, but wasn’t the whole point of using arfima to simulate the bristlecones? Obviously, they are not AR1 p=0.2.

        • Posted Jun 4, 2011 at 12:10 PM | Permalink

          > What “program” are you talking about and what point, if any, are you trying to make?

          So we can talk about code and script without fussing, but not program. Good to know.

          Sometimes, scripts, programs, and code have version numbering and, in general, version control:

          http://cran.r-project.org/web/packages/waveslim/index.html

          We see that version 1.6.4 of waveslim was published on 2010-06-10 by Brandon Whitcher.

          So, do we have some metadata for Steve’s script?

        • Posted Jun 4, 2011 at 8:06 PM | Permalink

          Re: willard (Jun 4 12:10),

          Start here:

          http://cran.r-project.org/src/contrib/Archive/waveslim/

          If you think there is an issue related to versioning, then you have all the data and code at your disposal to do the test.

          Take the script.

          Download the various archives and determine if there are changes due to versioning.

          If you have trouble figuring out how to write R to load a specific version
          of a library, then just ask the R help list or RTFM.
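          One way to do it (an assumption on my part, not the only route; take the exact file name from the archive listing, the version number below is only illustrative):

          # Install a specific archived release of waveslim from source.
          download.file("http://cran.r-project.org/src/contrib/Archive/waveslim/waveslim_1.5.tar.gz",
                        "waveslim_1.5.tar.gz")
          install.packages("waveslim_1.5.tar.gz", repos = NULL, type = "source")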

          For perfectionists I would suggest that they include in all scripts
          a little piece of code that sends sessionInfo() to a file.

          But, again, if you think there might be a version Issue you have all the tools and data required to set that speculation to rest or to raise it as a legit concern.

        • RomanM
          Posted Jun 4, 2011 at 9:28 PM | Permalink

          Steven, I think that willard is trying to create some sort of issue with the perceived lack of a version number on Steve's script. I have no idea what his possible issue with it may be or where he thinks that he could take it.

          The script is published on a journal site and what you see is what you get. It is fixed in time. Nothing to see here, willard…

        • Posted Jun 5, 2011 at 6:37 AM | Permalink

          It would be interesting to know how Wegman's team could guarantee that Steve's code was used to produce the result, whether he could determine if changes are due to versioning, whether he knows how to send sessionInfo() to a file, let alone why, or whether they could load a specific version of a library. If they could not, how could we be confident that hiring statisticians is enough to audit coding practices in a way that satisfies the ideals of reproducible research?

          Since we've heard so much good of auditing coding practices, it would be interesting to know how archiving code guarantees that this was the code used for the simulations. Having code, like having data, is insufficient to warrant any knowledge about it. The second idea that came to mind is that we'd have to keep track of the dependencies for replication, perhaps in a less cumbersome way than suggested by Mosh. The first one is that a versioning system could help for reproducible research, more so if it included some collaboratory notebook facility.

        • Posted Jun 5, 2011 at 5:33 PM | Permalink

          Re: willard (Jun 4 12:10),

          "It would be interesting to know how Wegman's team could guarantee that Steve's code was used to produce the result, whether he could determine if changes are due to versioning, whether he knows how to send sessionInfo() to a file, let alone why, or whether they could load a specific version of a library. If they could not, how could we be confident that hiring statisticians is enough to audit coding practices in a way that satisfies the ideals of reproducible research?"

          First, you have not heard so much about good code auditing practices. In fact, you've heard very little about it. Outside of a smattering of posts where I point people to the primary literature, it's not been much of a topic. Anyway, hiring statisticians has nothing to do with reproducible research. The field leading the way in reproducible research is my old field, computational linguistics, although other fields (biomedical) are also at the forefront. Essentially RR is an ideal and various fields work toward that ideal with more or less success. Code and data are necessary, but the ideal requires more, as I've detailed many times.

          The easiest way to prove that code was used to produce the results is to require a reproducible document. For example, use Sweave. I assumed you knew this, willard. If you want references on how to do this, GIYF, or start with some simple examples:
          http://www.stat.umn.edu/~charlie/Sweave/
          or you can start here
          http://sepwww.stanford.edu/doku.php?id=sep:research:reproducible
          Jon is still around and he answers emails. He's a kind and generous man; I am sure he will answer your questions.
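          A minimal sketch of what a Sweave document buys you (the file name and chunk contents are illustrative):

          # A Sweave source file, say report.Rnw, mixes LaTeX with R chunks delimited by
          # <<...>>= and @; running Sweave() executes every chunk and writes report.tex,
          # so the published figures and numbers are regenerated from the code each time.
          Sweave("report.Rnw")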

          WRT changes being due to versioning… that is elementary. Depending on the package you use, it's a simple matter to check change logs; for example, you can go to the change log on Rforge and see a daily log of changes made, suggestions made, and bugs reported. But there's an easier way, described below. The point is you have the tools to check IF you think it's a problem.

          sessionInfo() is pretty straightforward. According to the rules of reproducible research, the system you use and the tools you use should all be documented. If you look back at my questions to Gavin WRT GISTEMP you will see these sorts of questions: compilers, OS, etc. The goal of course is to produce machine-independent results, which would also mean recording seeds for random number generation. But basically, if you want to produce reproducible results you'd start by capturing the system characteristics, like so:

          > WillardStupidQuestion <- sessionInfo()
          > WillardStupidQuestion
          R version 2.12.0 (2010-10-15)
          Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

          locale:
          [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

          attached base packages:
          [1] stats graphics grDevices utils datasets methods base

          other attached packages:
          [1] raster_1.6-16 sp_0.9-72

          loaded via a namespace (and not attached):
          [1] grid_2.12.0 lattice_0.19-13

          # Now save stupid willard to a file.
          > save(WillardStupidQuestion, file="Troll.Rdata")

          See. its easy.

          Now, when you ask for help on the R help list, folks require that you
          include executable code and we always ask for sessionInfo() just to
          eliminate that as a cause for concern. Go ahead and try it. Get yourself a Mac with the OS described above, download the version of R detailed, download the versions of the packages detailed, and run the code I've put here in the comment. Or give me remote access to your desktop and I'll take over your system and show you.

          Now, your next question. The one advantage of R and the packages is that I could send you everything.

          1. the R source code for the version I used
          2. the source code for all the packages I used
          3. the source code for the compilers I used
          4. the make files
          5. my script.

          You can build everything from source and then run your scripts. So with that software drop, you actually build everything from source. That approach doesn't require that you document which version you used, because you just include the source for the version and build from source. If you wanna get real picky you could also recompile the OS source.

        • Posted Jun 5, 2011 at 5:42 PM | Permalink

          Re: willard (Jun 4 12:10),

          So willard, after you get your system set up, just type this at the console:

          WillardStupidQuestion <- sessionInfo()
          save(WillardStupidQuestion, file="Troll.Rdata")

          There, you have copied your sessionInfo to a file.

        • Posted Jun 5, 2011 at 8:22 PM | Permalink

          > First, you have not heard so much about good code auditing practices. In fact, you've heard very little about it. Outside of a smattering of posts where I point people to the primary literature, it's not been much of a topic.

          Anyone who read the blog can recall MrPete mentioning it from time to time. Yet we agree that we have still heard very little of it, nothing more than a few links. And so we thank Mosh for describing a bit more what reproducible research should look like.

          > Any way Hiring statisticians has nothing to do with reproduceable [sic.] research.

          Here is a relevant paragraph in the Wegman Report:

          > We have been able to reproduce the results of McIntyre and McKitrick (2005b). While at first the McIntyre code was specific to the file structure of his computer, with his assistance we were able to run the code on our own machines and reproduce and extend some of his results. In Figure 4.1, the top panel displays PC1 simulated using the MBH98 methodology from stationary trendless red noise. The bottom panel displays the MBH98 Northern Hemisphere temperature index reconstruction.

          This quote shows that Wegman and his clique intended to **reproduce** M&M's results, and thus to yield to the ideal of RR. Interestingly, the caption of figure 1 is said to be a "reproduced version", whereas figures 2 and 3 are said to be "recomputations". Is figure 1 reproduced in a way that is converging toward the "ideal" of RR, not unlike in the above quote? (As an aside, we note that we do not have any timestamps for these reproductions.)

          This quote also shows that Wegman and his clique needed some assistance to run the code. With this kind of assistance, it’s quite possible to run code without reading it, as pete emphasized earlier. It’s quite possible that Wegman read it, but then Nick’s question remains unanswered:

          > [H]ow could he have missed that his graphs showing hockey-sticks from red noise came from a program that selected them from a large sample according to HS index. […] Do you think he read that (it’s clearly marked) and just didn’t think we needed to know?

          Considering the assistance that was needed, it remains to be seen whether checking change logs would be an elementary task for the Wegman clique. In any case, the reading that has been done argues strongly for literate programming.

          Finally, notwithstanding all that has been said so far, we have yet to see what warrants the **reproduction** of random simulations. (Mosh might have forgotten to add this question to his Troll.Rdata file.) Unless we can answer that stupid question, it might be more prudent to talk about "auditability" as an ideal too.

        • Steve McIntyre
          Posted Jun 5, 2011 at 9:25 PM | Permalink

          Tell me, Willard – can you tell me (1) how to calculate the confidence intervals in MBH99; (2) how to calculate the number of retained PCs in MBH98 and MBH99?

          Look, I’m all in favor of the newfound zeal by you and others for replication and would very much appreciate any light that you can shed on these longstanding puzzles.

          And while you’re at it, can you provide me with the results of the individual MBH98 “experiments” before splicing – the “dirty laundry”.

        • Posted Jun 6, 2011 at 12:14 AM | Permalink

          Re: willard (Jun 4 12:10), Willard, it's pretty clear you have no idea what you are talking about. First, with respect to what MrPete and I have been talking about here since 2007: as I've noted several times, RR is an ideal for the publication of science. Frankly, I put little weight on anything Wegman did. As I view it, his report to congress, like most science papers, is not actually science. They are advertisements for science but not the science itself. You asked a question about versions. As I explained, if you have a question about the versions YOU can do several things to answer those questions. YOU are not prevented from looking at those questions. What I'm suggesting is that instead of giving other people assignments, you do some work yourself.
          As Gavin challenged folks to do their own damn science or shut up, I'm suggesting that YOU check these things or shut up. I have no patience for people who ask questions that they can answer for themselves, especially when others have gone to the pains of providing them with the tools free of charge, even more so when the work has been produced as a volunteer effort with zero compensation. I suspect that you didn't even go to RForge. In fact, I'm pretty damn certain you didn't. Further, if people want to avoid all questions about versions they could build from source. They can because of millions of man-hours freely devoted.

          As for your last question, what warrants the reproduction of random simulations? Typically people do this to check for the following:

          1. that the code reproduces the figures that were published
          2. to test for machine/processor/OS/compiler independence.

          So, you seem to agree that RR is a good idea. That would put you squarely in my camp and would make you highly critical of Mann. Thanks for your support

        • Posted Jun 6, 2011 at 1:22 AM | Permalink

          Perhaps Steve should concede, in all fairness, that it would be more useful, and helpful, to discuss how the CI should have been estimated in a proper way, or to prove once and for all that there exists no statistical method for calculating CI from data sets chosen according to such and such and so and so, say because of data snooping, because estimating the confidence levels from the calibration period is WRONG, or else. And that should hold even if we're looking at crossword puzzles that interest talented and young scholars, regrettably compelled to anonymity.

          At the very least, it would help the reconstructers to avoid some critical comments from CA and perhaps also contribute to the advance of the empirical methodology of the field.

          But if we can’t convene on that, I suggest we turn it over to UC.

          There is no need to feel one-eyed.

        • Steve McIntyre
          Posted Jun 6, 2011 at 10:49 AM | Permalink

          It’s quite different trying to figure out how something should be done as opposed to seeing if people have proved what they claimed to have proved.

          At the outset, I didn’t presume that the academics in question weren’t doing things right or that they didn’t know how to do things. However, it has become increasingly clear that they really are at sea in their statistical perspective on proxy reconstructions.

          From time to time, I've started down this road in my posts trying to apply Brown and Sundberg methods. It's very slow going for me trying to apply these methods. Sorry about that. And unfortunately I get distracted by other issues that seem pressing at the time. On many occasions, I've observed that a big problem in the field is the inconsistency of the so-called "proxies" and urged specialists to try to resolve details of these inconsistencies, as an alternative to cherrypicking, for example, Yamal over Polar Urals.

        • Posted Jun 6, 2011 at 11:41 AM | Permalink

          Re: willard (Jun 4 12:10),

          Perhaps Steve should concede, in all fairness, that it would be more useful, and helpful, to discuss how the CI should have been estimated in a proper way, or to prove once and for all that there exists no statistical method for calculating CI from data sets chosen according to such and such and so and so, say because of data snooping, because estimating the confidence levels from the calibration period is WRONG, or else. And that should hold even if we’re looking crossword puzzles that interest talented and young scholars, regrettably compelled to anonymity.

          What an odd way of trying to dictate what people should or shouldn't do. Willard, I'll play bender and suggest that you read the whole blog. Trying to figure out how something was done is one type of challenge. It requires certain talents and engages certain crowds. Telling people how things should be done is quite another thing. Reading your sentences here, I get the impression that you don't understand statistics. You certainly don't understand coding. Here is the deal: unless you can demonstrate some value here, I don't think anybody will be taking your suggestions. Just sayin'.

        • Posted Jun 7, 2011 at 10:31 AM | Permalink

          Glad to agree with Mosh that RR is an ideal, however misspecified it is as an idea for now.

          Glad to agree that sharing of code does not by itself warrant reproducibility, but does with communities trustily explaining how to run the code, postponing for another time the epistemological quandaries of what the reproduction of simulations and the maintenance of repositories imply.

          Glad to agree that little weight should be put on anything Wegman did, as long as due diligence is still paid to the Barton Hearings and to the Report made by him and his clique.

          Enough agreements have been found to forgive Moshpit's relentless unnecessary roughness, unsportsmanlike misconduct, and poor peripheral vision.

          May we continue to follow bender’s advice, whatever “reading the whole blog” means. Perhaps Moshpit should restart five years ago:

          More on MBH Confidence Intervals

          when Eduardo and others were making some fair comments.

        • Posted Jun 7, 2011 at 1:14 PM | Permalink

          Re: willard (Jun 4 12:10),

          Glad to agree with Mosh that RR is an ideal, however misspecified it is as an idea for now.

          Glad to agree that sharing of code does not by itself warrant reproducibility, but does with communities trustily explaining how to run the code, postponing for another time the epistemological quandaries of what the reproduction of simulations and the maintenance of repositories imply.

          Glad to agree that little weight should be put on anything Wegman did, as long as due diligence is still paid to the Barton Hearings and to the Report made by him and his clique.

          Enough agreements have been found to forgive Moshpit's relentless unnecessary roughness, unsportsmanlike misconduct, and poor peripheral vision.

          May we continue to follow bender’s advice, whatever “reading the whole blog” means. Perhaps Moshpit should restart five years ago:

          More on MBH Confidence Intervals

          when Eduardo and others were making some fair comments.

          Once again you misunderstand which really makes me question whether your practical knowledge has any depth whatsoever.

          1. RR is not misspecified. It is being practiced exactly as I described.
          2. Repositories are not required, as I explained. Do you know what Sweave is? Did you read and understand anything about the various options I gave, some dependent upon repositories, others not?

          3. Unnecessary roughness? Trolls who persist as you did, usually get worse. It’s rather necessary roughness.

          4. Been there, read that. As I noted, YOU add no value here. No value to the statistical discussions and no value to the code discussions. You add no value because you are ignorant. Eduardo, on the other hand, adds value. And he had the honesty to announce that his intentions in CHANGING the topic were self-interested: "My comment was somewhat self-serving, I admit." You, on the other hand, willard, are a troll pure and simple.

        • Posted Jun 7, 2011 at 9:57 PM | Permalink

          1.1 If we’re to apply Mosh’s ringtone of the moment, the RR links and projects presented so far are “mere promotional tools” for RR, since we do not have the “code” that would specify RR. And so it fails a simple test of reflexivity. A formal specification of RR is thus lacking.

          1.2 Fomel & Claerbout (2009) conclude:

          > Your solution to reproducibility might differ from those described in this issue, but only with a joint effort can we change the standards by which computational results are rendered scientific.

          The originators of the RR movement openly admit that they’re far from a complete specification of reproducibility, one that encompasses all solutions, in contrast to Moshpit’s minimization.

          1.3 In the same article, the authors discuss many difficulties they encountered and some they still consider important to solve (e.g. recruiting), in contrast to Moshpit's minimization.

          1.4 This discussion deflects attention away from the fact that we do not know which version of the script Wegman and his clique ran, assuming he did run it, and that in any case evidence shows he did very little due diligence.

          2.1 A piece of code, be it a simple HTML page, should reside somewhere, and be maintained by an authority that has responsibility over it. Saying that repositories “are not required” goes against very basic principles of archivistics, let alone ontology, unless we cling to spiritualism.

          2.2 The archiving problems implied by repositories are never to be trivialized, but let’s turn them over to archivists for now.

          3.1 Identifying someone as a troll is a trick to polemicize:

          Polemics, Politics, and Problematizations

          3.2 Moshpit has clearly been trying to make the discussion about me, which is the more reprehensible version of the same tired trick he used in #comment-280226 onward, when he tried again and again to change the subject.

          4.1 It is Steve who started the discussion about MBH in #comment-283602, so recalling what Eduardo said (without acknowledging that Steve conceded the point, now and then) as a talking point about honesty and topic shifting should be redirected at him in the first place.

          4.2 Everything Moshpit added to the discussion only served to derail it. The topic of this thread is about the Report produced by Wegman and his clique, in which we find the SNA of the Kyoto Flames, and also in my opinion the Barton Hearings, which provides some relevant context.

          5.1 Poor reading skills and overconfidence provide weak gatekeeping.

          5.2 A community shall be judged by the way it deals with its visitors.

          5.3 Let Mosh have the last word.

        • Posted Jun 7, 2011 at 11:37 PM | Permalink

          1.1 Wrong. Various forms of the specification are laid out. Your requirement that the specification itself be reproducible shows that you don't understand the difference between a specification and the thing specified. The specification is not research; it is the guideline for RR. It's actually pretty simple, and if you understood it at a working level you wouldn't say stupid things.

          1.2 Wrong. As I pointed out, there are several fields in which it is practiced. You'll note I linked to the primary literature. Looks like you haven't kept up with your reading. Suggest you do more googling; try using DARPA as a keyword, or computational linguistics, or bioinformatics.

          1.3 Misleading; as I said, it's being practiced. Of course there are challenges. But look what we have to work with. Educated people such as yourself can't even read the most up-to-date stuff. Of course there are recruiting issues.

          1.4 Correct. Wegman to my knowledge hasn't produced code. In my mind that puts him toward the Mann/Jones/Scafetta side of the debate. So, you are rightly critical of everyone (nearly all of climate science).

          2.1 Wrong. Once again you misunderstand a distinction. Your issue was version control. There are two ways to handle that. You suggested a problem with the version of the library Steve used, and I gave you a simple solution which you don't even understand.

          2.2 Wrong again.

          3.1 Wrong again. You tried to hijack this discussion with a question about versions. That question is a legitimate question AND you have the tools to answer that question. You are free to answer your own question. But you are too lazy or stupid to do it.

          3.2 Wrong. The issue is your behavior, not you. The issue is you have everything at your disposal to answer your own questions, but you are really NOT interested in the answer. If you were, you would answer it for yourself.

          4.1 Wrong again. You suggested that it might be better to show HOW the CI was calculated than to find mistakes with Mann. Giving Steve advice on what he should do. You cited Eduardo. What you failed to point out was this: Eduardo ACKNOWLEDGED that he wanted to shift the discussion FOR HIS OWN PURPOSE. You do not make such an admission. That's being a troll.

          4.2 You asked a question about versions. I gave you directions on how to handle your own question. When you fail to answer a question which you have the power to answer, it becomes clear to all.

          5.1 I'm glad to see you admit your flaws at least. Google is not your friend.

          5.2 Depends on who is doing the judging. You have the right to comment here. That is more than I can say for the places you frequent.

          5.3 Sorry, you can't give me what I already have.

        • Posted Jun 8, 2011 at 10:50 AM | Permalink

          The questions that started this subthread (starting with pete's #comment-280928) were whether Wegman and his clique **checked**, **read** or **understood** Steve's script. The Wegman Report shows evidence that the Wegman clique did not pay due diligence, at the very least.

          In the Wegman Report we can read that:

          > We have been able to reproduce the results of McIntyre and McKitrick (2005b). While at first the McIntyre code was specific to the file structure of his computer, **with his assistance** we were able to run the code on our own machines and reproduce and extend some of his results.

          The nature of the assistance provided by Steve has not been made public, as far as I know. My first surmise was that Steve simply sent them a version of his script that worked for them. But since it would be stupid to suppose anything like a version control system to help reproduce research, asking for a rollback around the times Wegman and his clique ran Steve's code would also be stupid. There's only one version of Steve's code made public, and that's that.

          Nonetheless, the relevant question remains: do we have evidence of the kind of assistance that was being provided to Wegman?

          If this assistance helped Wegman and his clique run Steve's code, knowing the nature of this assistance might also help us reproduce the graphs produced in that report. There is no need for anything other than instructions sent by email to anyone with a working level in reproducible research, after all.

        • Posted Jun 8, 2011 at 1:16 PM | Permalink

          Once again, willard cannot be honest about what he is trying to do or what he has done. Unlike Eduardo.

          But since it would be stupid to suppose anything like a version control system to help reproduce research, asking for a rollback around the times Wegman and his clique ran Steve's code would also be stupid. There's only one version of Steve's code made public, and that's that.

          Nonetheless, the relevant question remains: do we have evidence of the kind of assistance that was being provided to Wegman?

          If this assistance helped Wegman and his clique run Steve's code, knowing the nature of this assistance might also help us reproduce the graphs produced in that report. There is no need for anything other than instructions sent by email to anyone with a working level in reproducible research, after all.

          1. You asked about versions of the libraries that Steve used. What I pointed you at were all the versions for that library. You can quite simply download those versions, install them with the install.packages() function, and specify the version you want to run. You can run the script against all versions and see if the answer changes. That is something YOU can do. The tools are available to answer YOUR question. But you refuse to answer your own question when people have volunteered their time to give you this power. In our world, willard, people who don't RTFM when the manual has been made available are seen as lazy or worse.

          2. Wegman as I have said should have made his scripts available. It’s one reason why I give his work no points.

          So to repeat. You’ve asked a question about versions. If that was a legitimate question that you were really interested in, I’ve described how you can tell whether the version of the library matters or not. If you ask a question and I tell you how you can answer that question, and you subsequently move to other questions, That gives me reason to believe that your questions are not about uncovering the truth. They are about derailing the conversation. Googling around to get a sketchy understanding of things you dont understand, quoting out of date assesments, linking to threads without reading all the comments, is also evidence that your questions are not to be taken seriously. Let me contrast this with how real questions work. In 2007 I asked gavin a question about GISSTEMP. he said the papers answered the question. They did not. So, we asked for code. In the end we got the code and my question was answered. I thanked him. I had the ability to to look at the code and get the answer I sought. You’ve asked a question about versions. I pointed you to the resource. You didnt do anything except raise other issues. I’ve never liked skeptics who practiced that kind of BS, and so I’ll have to be consistent and say that until you show that you’ve spent sometime answering the question you raised, you’ll be a troll.
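          For reference, here is a minimal sketch of that check. The package name ("fracdiff") and the version string are placeholders only, not necessarily the library in question; substitute whatever the script actually loads.

          ```r
          ## Sketch: run a script against a specific archived version of a CRAN package.
          ## Package name and version below are illustrative placeholders.

          # Option 1: install straight from the CRAN source archive
          url <- "https://cran.r-project.org/src/contrib/Archive/fracdiff/fracdiff_1.4-2.tar.gz"
          download.file(url, destfile = basename(url))
          install.packages(basename(url), repos = NULL, type = "source")

          # Option 2: let the remotes helper look up and install the version for you
          # install.packages("remotes")
          remotes::install_version("fracdiff", version = "1.4-2")

          # Then re-run the script under each installed version and compare the output, e.g.
          # source("script.R")
          # saveRDS(result, sprintf("result_%s.rds", packageVersion("fracdiff")))
          ```

          If the saved results are identical across versions, the library version does not matter for the question at hand.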

        • Posted Jun 9, 2011 at 11:07 AM | Permalink

          My first question was related to the version of Steve’s code. The answer seems to be that the code here:

          ftp://ftp.agu.org/apend/gl/2004GL021750/

          is the final version. No other version seems to have been published since then.

          Wegman and his clique wrote that they have been assisted by Steve to run that code. Nothing else is known about the nature of this assistance by Steve.

          Wegman and his clique wrote that they have “reproduced” MM’s work, and they made some “recomputations”. We also know that Steve sent Wegman some emails.

          Wegman and his clique did not publish their code, as promised. This non-disclosure has been critiqued by Steve and Mosh.

          An intriguing puzzle. Pieces are missing.

        • steven mosher
          Posted May 31, 2011 at 7:15 PM | Permalink

          I’m aware that he is speaking about Dave Clarke’s work. I would hope that people could do the following:

          1. Characterize Wegman’s scholarship shortcomings accurately

          2. Characterize the import of these papers accurately

          Then move on to the science and engage. And the engagement should be between the principals and not their seconds. (Although you make a fine second, so take no criticism from that.) That’s what I would rather see.

          All I can do is say what I can to clear away some misconceptions about how those documents came to be created. Suggest that Wegman own certain shortcomings and move on to the science.

        • loner
          Posted Jun 2, 2011 at 8:43 AM | Permalink

          Seriously, how much science is discussed here? Most of the time it’s political smears. The science ran out long ago, and never achieved much anyway. AGW is real, and so is the basic point of the MBH hockey stick, whether you accept the hockey stick or not. The prime reason for this site even existing is now defunct. The temperature records are fundamentally correct, the physics is correct, the anthropogenic source of CO2 is correct, the cryosphere is receding.

          Not much left to do here except submit FOI requests.

        • pete
          Posted May 31, 2011 at 7:16 PM | Permalink

          > I’ve commented on the lack of due diligence by academic inquiries in the past.

          Have you talked about the lack of due diligence in the Wegman report? A lot of readers here are still under the impression that he “confirmed” your MM05-GRL findings.

          > The figure showing the distribution of the HSI index is from the full sample.

          You’re moving the pea under the thimble — we aren’t talking about Figure 2 in MM05-GRL. Figure 4.4 in the Wegman Report was generated using your code. Your code includes a 1% cherry-picking step, and uses fractional arima rather than AR(1) as claimed by Wegman. Given that Wegman did no due diligence, why should anyone consider his report credible?

        • John M
          Posted May 31, 2011 at 7:50 PM | Permalink

          I see now.

          I’ve seen people showing up with this argument before and never fully explained it. Now it’s very clear.

          It’s not whether or not Steve Mc was right.

          It’s not whether or not Mann’s method mines for hockey sticks.

          This is all about nailing Wegman.

          This helps a lot.

        • pete
          Posted May 31, 2011 at 8:12 PM | Permalink

          > This is all about nailing Wegman.

          This is a thread about Wegman. As far as Wegman is concerned it doesn’t matter if Steve was right or wrong, because Wegman never bothered to check whether Steve was right or wrong.

          For the record, short-centering does promote hs-shaped series to earlier principal components. And the effects of this on temperature reconstructions have been exaggerated by “skeptics”.

        • John M
          Posted May 31, 2011 at 8:15 PM | Permalink

          Actually, this is a thread about social networking and bias, but…whatever.

          Just good to know that it’s agreed that Mann’s PCA mined for hockey sticks.

        • Posted Jun 1, 2011 at 9:16 AM | Permalink

          Nick, in case you are wondering why we used an ARFIMA process: in our original submission to Nature in January 2004 we used an AR1 model and calculated the coefficients by regression on the 70 NOAMER sites going back to 1400. In Mann’s first reply he criticised this as follows:

          > 20th century trends in instrumental and proxy data typically far exceed the expectations for a ‘red noise’ null hypothesis (6). Normalizing by the detrended standard deviation therefore more properly weights the data series with respect to their estimated noise variance. It is inappropriate to use a red noise model in testing standardization procedures, as MM04 did, because simple spectral analyses of the actual series reveals many of them to be statistically inconsistent with an underlying red noise model.

          One of the referees was not greatly impressed with this:

          > MBH seem to be too dismissive of MM’s red noise simulations. Even if red noise is not the best model for the series, they should have reservations about a procedure that gives the ‘hockey stick’ shape for all 10 simulations, when such a shape would not be expected.

          But to deal with it, in June 2004 Steve started experimenting with the hosking procedure in R, which fits models on long proxy series allowing for near-unit-root behaviour. The coefficients were, indeed, turning out to indicate near unit roots and long persistence. Our Nature resubmission was rejected in August, and we thereafter made a new submission to GRL using the ARFIMA noise model, which was applied consistently from then on. It was never criticized, and it should be borne in mind that a general model would yield an AR1 form if that fit the data better.
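          For readers who want to see the distinction concretely, a minimal sketch follows. It is not the code used for the GRL paper; the CRAN fracdiff package stands in here for whichever ARFIMA-fitting routine was actually used, and the series, parameters and seed are illustrative only.

          ```r
          ## Sketch: fit AR(1) and ARFIMA noise models to the same persistent series
          ## and simulate null series from each. All numbers are illustrative.
          library(fracdiff)

          set.seed(1)
          x <- fracdiff.sim(581, d = 0.45, ar = 0.2)$series   # synthetic long-memory "proxy"

          # AR(1) fit: one short-memory coefficient
          phi <- coef(arima(x, order = c(1, 0, 0)))["ar1"]

          # ARFIMA fit: the fractional-differencing parameter d captures long persistence
          fd <- fracdiff(x, nar = 1)
          c(ar1 = unname(phi), d = fd$d, arfima_ar = unname(fd$ar))

          # Null (noise-only) series simulated under each fitted model
          null_ar1    <- arima.sim(model = list(ar = unname(phi)), n = length(x))
          null_arfima <- fracdiff.sim(length(x), d = fd$d, ar = fd$ar)$series
          ```

          The point of the exercise is simply that an AR1 coefficient and a fractional-differencing parameter describe very different amounts of persistence, which is what the choice of null model turns on.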

        • Posted Jun 1, 2011 at 9:25 AM | Permalink

          Re: Ross McKitrick (Jun 1 09:16),
          Thanks, Ross
          I don’t myself have criticisms of ARFIMA – it sounds interesting. But is 0.2 a reasonable value for the parameter of the corresponding AR1 model?

        • Steve McIntyre
          Posted Jun 1, 2011 at 9:56 AM | Permalink

          The tree ring chronologies have AR coefficients all over the map. At the time – and this was before Climate Audit – I remember thinking to myself that the AR properties of tree ring chronologies varied a lot by author. Stahle’s series had little autocorrelation, while Graybill’s had a lot.

          Mannian principal components has other remarkable properties. In MM05b, we observed that, in the NOAMER network, if you arbitrarily increased values of all series other than the Graybills (the ones in the CENSORED directory), the Mann algorithm would flip them over and make an even bigger HS.

          In our reply to VZ, we observed that you could make a real network with an actual signal that was not a HS – but the addition of 2 or so HS series to the network and application of the Mann algorithm would result in a HS PC1 with the ignoring of the real signal.

          Mannian principal components data mines for hockey sticks, and does so quite efficiently. End of story. People can debate how one should represent a tree ring network for comparison – something that specialists in the field have neglected. I think that our approach was sensible and showed the effects of data mining.

          As both Mann and ourselves agree, in the case at hand, Mann’s data mining algorithm located actual HS patterns in the data in the bristlecones. At the end of the day, from a data analysis point of view – as we pointed out at that time – the issue is the validity of the Graybill chronologies as a dominant index of world temperature. This leads to practical data analysis and consideration of bristlecones. We surveyed relevant issues in MM 2005 (EE).

          The bristlecones affect multiple reconstructions, whereas Mannian principal components don’t. However, the MBH network was relatively unusual in being such a large network. You can get a stick by simply picking out series whose properties are known ahead of time – this is the method of most reconstructions. Mann’s method seemed more impressive at the time because he got a stick from a very large data set. However, this was because of the weight given to Graybill bristlecones. Use of principal components – not just Mannian principal components – contributes to this, since, if you use enough PCs, the bristlecones come in anyway and, under Mannian regression, still lead to a HS. However, these other methods differ materially from the original published method, and all have hair on them, not least because of the dependence on Graybill bristlecones, a practice that the NAS panel (not Wegman) said to ‘avoid’.
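          As an illustration of the short-centering effect described above, here is a minimal sketch using nothing but AR1(0.2) noise and a crude hockey-stick index (mean of the last 79 "years" minus the mean of the rest, in standard-deviation units). It is not a rerun of the MM05 or Wegman simulations, just the principle.

          ```r
          ## Sketch: PC1 of pure red-noise networks under full centering vs
          ## "short" centering on the final calibration segment. Illustrative only.
          set.seed(42)
          n_yr <- 581; n_series <- 70; cal <- (n_yr - 78):n_yr   # last 79 "years"

          hsi <- function(v) (mean(v[cal]) - mean(v[-cal])) / sd(v)   # crude hockey-stick index

          pc1 <- function(X, short = FALSE) {
            ctr <- if (short) colMeans(X[cal, ]) else colMeans(X)   # short vs full centering
            s <- svd(sweep(X, 2, ctr))
            s$u[, 1] * s$d[1]
          }

          one_trial <- function() {
            X <- replicate(n_series, as.numeric(arima.sim(list(ar = 0.2), n_yr)))
            c(full = abs(hsi(pc1(X))), short = abs(hsi(pc1(X, short = TRUE))))
          }

          res <- replicate(100, one_trial())
          rowMeans(res)   # mean |HSI| is typically larger for the short-centered PC1
          ```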

        • steven mosher
          Posted May 31, 2011 at 7:23 PM | Permalink

          Let me be clearer Pete.

          When people talk about the Climategate mails there is always one guy in the bunch who goes on about the HARRY_READ_ME file.

          That’s largely a distraction from core issues. But it generates long long debates about peripheral problems. What gets short shrift are the key problems.

          So, I’m suggesting that if DC wants to talk about the stats issue and foreground that, that he should not try to overplay the plagiarism card.

          When you get into that discussion, the fights are stupid (does SNA beat Mann with a stick?) and pointless.

          But if people keep saying over-the-top things (Jones committed fraud, Wegman plagiarized Wikipedia), then we proceed to avoid the science. I got all day.

        • pete
          Posted May 31, 2011 at 8:45 PM | Permalink

          If DC just talked about the stats, then certain people are just gonna say that Wegman’s an expert statistician and the Wegman Report said Steve was right. Especially journalists.

          So he uses the plagiarism angle to demonstrate Wegman’s lack of credibility. And that’s reached a wider audience, which is good. Those of us with a technical bent already know that the problems with short-centering have been exaggerated.

          Steve: what evidence do you have that Wegman had any personal participation in the copying of boilerplate, as opposed to it being done in Said et al 2008 by Sharabati without Wegman’s knowledge?

          Luterbacher et al 2010 plagiarized its opening paragraph from Mann et al 2008. Does that mean that its authors also lack credibility?

          Wahl and Ammann 2007 plagiarized its main concepts from the Mann et al submission to Nature in 2004, responding to us. Does that demonstrate their lack of credibility as well?

          Far from the problems with MBH98 being “exaggerated”, in my opinion, they have been under-estimated.

        • Posted May 31, 2011 at 9:06 PM | Permalink

          How about the problems with strip-bark pines, and why they get so heavily weighted? And the problem with upside-down Tiljander, and why it gets so heavily weighted? Without those, there is no hockey stick.

        • John M
          Posted May 31, 2011 at 9:12 PM | Permalink

          What was the exaggeration? You can have just a little bit of artificial hockey-stickness?

          And it seems to me what you just said indicates that it is all about getting Wegman.

        • pete
          Posted May 31, 2011 at 11:12 PM | Permalink

          > Steve: what evidence do you have that Wegman had any personal participation in the copying of boilerplate, as opposed to it being done in Said et al 2008 by Sharabati without Wegman’s knowledge?

          I haven’t claimed “personal participation” by Wegman. In fact, I’ve been making the point that the real problem with the report is the uncritical repetition of your claims without bothering to replicate them.

          > me: the problems with short-centering have been exaggerated.

          Watch the pea move…

          > Far from the problems with MBH98 being “exaggerated”, in my opinion, they have been under-estimated.

          Did your cherry-pick of the 1% of PC1s with highest HSI result in the under-estimation or exaggeration of the problems with short-centering?

        • Layman Lurker
          Posted Jun 1, 2011 at 12:11 AM | Permalink

          Pete, IIRC what the Mann algorithm does with ltp red noise vs AR1(.2) is immaterial wrt MBH. Either way the bristlecone-dominated PC1 is the product. Therefore the real issue is the validity of the bristlecones as the dominant proxy and the sensitivity of the reconstruction to their exclusion.

        • Posted Jun 1, 2011 at 2:36 AM | Permalink

          Re: Layman Lurker (Jun 1 00:11),
          “Pete, IIRC what the Mann algorithm does with ltp red noise vs AR1(.2) is immaterial wrt MBH. “
          I agree. But it’s material to Sec 4 of the Wegman report. Success of the emulation depends a lot on how well you feel the red noise chosen relates to the spectrum present in the data. That’s why it’s red, after all. So if he says it’s AR1(.2) when a more controversial choice was actually implemented, then that matters.

          Though IMO the unstated sampling from the HSI top 1% selection is more blatant.
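          To be concrete about the kind of sampling being described, a sketch with purely hypothetical numbers: rank the simulated PC1s by hockey-stick index and keep only the top 1% before drawing panels.

          ```r
          ## Sketch of a top-1% selection step; hsi_vals is a stand-in for |HSI|
          ## scores of 10,000 simulated PC1s (hypothetical numbers).
          set.seed(7)
          hsi_vals <- abs(rnorm(10000, mean = 1, sd = 0.3))
          top_idx  <- order(hsi_vals, decreasing = TRUE)[1:100]   # the top 1%
          shown    <- sample(top_idx, 12)                         # e.g. a 12-panel figure
          summary(hsi_vals)           # distribution over all simulations
          summary(hsi_vals[top_idx])  # distribution over what gets displayed
          ```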

          As to bristlecones, Wegman barely mentions them.

        • oneuniverse
          Posted Jun 1, 2011 at 6:46 AM | Permalink

          Nick,

          The Wegman report’s discussion of bristlecones is more significant than you make out. They point out that a) the bristlecones dominate Mann’s reconstruction, b) the bristlecones are unreliable temperature proxies, and c) there is no MBH hockeystick without the bristlecones, whether a centered or decentered analysis is used.

          > In the MBH98 de-centered principal component calculation, a group of twenty primarily bristlecone pine sites govern the first principal component. Fourteen of these chronologies account for over 93% variance in the PC1 and 38% of the total variance. The effect is that it omits the influence of the other 56 proxies in the network. In a centered version of the data, the influence of the bristlecone pine drops to the fourth principal component, where it accounts for 8% of the total variance.

          > Out of the 70 sites in the network, 93% of the variance in the MBH98 PC1 is accounted for by only 15 bristlecone and foxtail pine sites, all with data collected by one man, Donald Graybill. Without the transformation, these sites have an explained variance of less than 8%. The substantially reduced share of explained variance coupled with the omission of virtually every species other than bristlecone and foxtail pine, argues strongly against interpreting it as the dominant component of variance in the North American network. There is also evidence present in other articles calling the reliability of bristlecone pines as an effective temperature proxy into question.

          > Although we have not addressed the Bristlecone Pines issue extensively in this report except as one element of the proxy data, there is one point worth mentioning. Graybill and Idso (1993) specifically sought to show that Bristlecone Pines were CO2 fertilized. Bondi et al. (1999) suggest [Bristlecones] “are not a reliable temperature proxy for the last 150 years as it shows an increasing trend in about 1850 that has been attributed to atmospheric CO2 fertilization.” It is not surprising therefore that this important proxy in MBH98/99 yields a temperature curve that is highly correlated with atmospheric CO2.

          > Furthermore, the MM03 results occur even in a de-centered PC calculation, regardless of the presence of PC4, if the bristlecone pine sites are excluded.

          > Additionally, Mann et al. responded to the MM03 critique of the bristlecone pine, which pointed out that the bristlecone pine had no established linear response to temperature and as such was not a reliable temperature indicator. Mann et al. responded by stating that their indicators were linearly related to one or more instrumental training patterns, not local temperatures. Thus, the use of the bristlecone pine series as a temperature indicator may not be valid.

        • Posted Jun 1, 2011 at 7:58 AM | Permalink

          oneuniverse.

          All those paras but the third are not in their own part of the report, but in the summary of M&M papers. W is just reporting what they said, not what he’s saying.

          And the third starts, correctly “Although we have not addressed the Bristlecone Pines issue extensively in this report…” (in fact, not at all, in their discussion). It is in the findings where, typically for this report, it launches into a discussion unrelated to the body of the text.

        • oneuniverse
          Posted Jun 1, 2011 at 10:29 AM | Permalink

          The authors state (and not in the summaries): “We have been able to reproduce the results of McIntyre and McKitrick (2005b).” Thus, they implicitly confirm that the bristlecones dominate the reconstruction. Do you contest this?

          As you point out, the references to the literature (Graybill and Idso 1993, Biondi et al. 1999) on the unreliability of the bristlecones as temperature proxies are provided in the Findings section of the report, and not the summaries. You have a complaint that this matter was not discussed in the body of the text (although present in the summaries), and so should not be mentioned in the findings. I have to disagree. The body of the text (ignoring SNA) examines the behaviour of the MBH algorithm: to create a hockey-stick, the algorithm requires the presence of a hockey-stick in the input data. It’s therefore appropriate and responsible for the Wegman report, as well as criticising the biased nature of the algorithm, to reiterate that the hockey-stick series in the input set are from material whose suitability as a temperature proxy for the instrumental period is highly contested in the literature. (e.g. Graybill and Idso 1993, one of the authors of which is the creator of the series, states: “It is notable that trends of the magnitude observed in the 20th C ringwidth growth are conspicuously lacking in all of the time series of instrumented climatic variables that might reasonably be considered growth-forcing in nature.”)

        • Steve McIntyre
          Posted May 31, 2011 at 10:33 PM | Permalink

          The programs describing a “very artificial fudge” and so on are tree ring programs that implemented variations of the “Briffa bodge”, i.e. just changing data to make it look more like they expected it to. The classic example is Briffa’s 1992 Tornetrask chronology, used in a number of multiproxy reconstructions. The Briffa bodge was one way of hiding the decline.

          Later, CRU developed another technique for hiding the decline – just deleting the data. This replaced the Briffa bodge.

        • Michael Smith
          Posted Jun 1, 2011 at 5:21 AM | Permalink

          Pete replied to my comment by saying:

          “You seem to think that Wegman did some sort of statistical analysis that confirmed MM05.”

          Sorry, Pete, but your response is an unpersuasive collection of unsubstantiated claims.

          I’m not an expert in statistics; rather, I’m a guy who uses statistics in the real world to assess and control the quality of processes that output millions of pieces of molded plastic daily in my business. What I AM good at is telling those who “know the details” in statistics from those who only “know the superficial”.

          M&M know this stuff down to the root, i.e. they know it down to its real-world, existential, what-does-it-mean-about-reality essence. People like DC, by contrast, — and I’ve met them by the bushel — are mere pretenders who’ve only learned enough to make shallow, superficial accusations and claims that impress their fellow, even-more-ignorant believers, but which fail to hold water when one considers the entire context of M&M’s analysis.

          Steve and Ross KNOW WHAT THEY KNOW and they know WHAT THEY DON’T KNOW, and are conscientious about not going “a bridge too far” in their claims. NOTHING builds credibility with me more than seeing a person constantly and consistently note the limits of their knowledge and claims. In this debate, the over-the-top claims and beyond-context conclusions are overwhelmingly (though not completely) issued by the alarmists and their fellow travelers, like DC and yourself.

        • pete
          Posted Jun 1, 2011 at 6:04 AM | Permalink

          > Steve and Ross KNOW WHAT THEY KNOW and they know WHAT THEY DON’T KNOW, and are conscientious about not going “a bridge too far” in their claims.

          Well done, you’ve just triggered Poe’s law.

        • stan
          Posted May 31, 2011 at 10:41 PM | Permalink

          Nick, you cannot be serious. Self-refutation?! The climate scientists butchered their work because of the failure to get stats help. Their failure to get help went to the central core of everything they were doing and influenced a public debate involving billions of people and many trillions of dollars. You want to equate that with Wegman’s actually getting help on the most insignificant part of a report, only the help screwed up.

          That’s what you call self-refutation?! That’s two bridges too far. Get real.

    • Posted May 30, 2011 at 2:17 AM | Permalink

      Re: Nick Stokes (May 30 00:03), She really doesn’t write a summary. She writes an introduction, throat-clearing; as Mashey notes, some of the definitions she stole don’t even get used.

    • KnR
      Posted May 30, 2011 at 4:40 AM | Permalink

      ‘couldn’t find anything new with statistics’ – partly true: he confirmed, as others had, that Mann’s stats were rubbish, but that oddly causes you no concern.
      The bits borrowed but not cited, which is probably the problem, are not central to the Wegman conclusion that Mann was using statistics not to inform but to dis-inform, a habit the team got itself into as it pushed the notion of the ‘good lie’ to its outer limits, so that it found its way into science, where it has no role at all.

  3. Posted May 30, 2011 at 1:26 AM | Permalink

    Kinda weird that the editor of the magazine faxed a Wegman email to the USA Today reporter.

    Am I reading Mashey correctly?

    So basically the GMU response to the FOIA says that the USA Today reporter cannot transmit it to a third party. And then the editor faxes a Wegman email to the reporter, and then it shows up in a Mashey report.

    Weird.

    • Posted May 30, 2011 at 1:40 AM | Permalink

      Re: steven mosher (May 30 01:26),
      There’s nothing to stop the editor faxing a copy of his own received email if he wants. It’s independent of FOI.

      But as I’m sure you’re aware, universities, if required to release material under FOI, can’t add their own conditions (though a court can impose conditions, as currently at UVa).

      • Posted May 30, 2011 at 2:07 AM | Permalink

        Re: Nick Stokes (May 30 01:40), I’m aware that an editor may choose to make a mail sent to him by Wegman public. I find it odd that the editor of a journal would fax an email from an author to a journalist, especially when the case is still being handled at GMU. If GMU is conducting a tough investigation they will be questioning all the authors independently so that they cannot coordinate stories.
        Releasing a mail from Wegman seems problematic while the GMU case is still open.

        Also, who knows, perhaps the journal has a policy for handling plagiarism cases. Perhaps those policies allow an editor to reveal the communications with an author. I would expect a good policy would control these communications.

        With respect to the 3rd party issue. you’re missing the point.

        If the GMU material has that Wegman mail in it, why would the reporter ask for it?

        If it doesn’t, why would the editor give it?

        It’s not that I see anything wrong here (legally); it’s the backstory I’m interested in.

        • Posted May 30, 2011 at 7:51 AM | Permalink

          Re: steven mosher (May 30 02:07),

          “If the GMU material has that wegman mail in it, why would the reporter ask for it?”
          The mail is 16 Mar 2011; Vergano received his GMU material Nov 2010.

          “If it doesnt why would the editor give it?”
          Azen’s role as editor may now be precarious, and his reputation lessened. He may think the mail puts him in a better light. Or he may think it would have been obtained under FOI anyway.

        • Tom Gray
          Posted May 30, 2011 at 7:55 AM | Permalink

          “may be” “may think”

          Now there is an argument that is based on the facts.

          Do the people in these “debates” actually think before they write?

          I would be very cautious about putting words into someone’s mouth or thoughts into someone’s mind. I thought speculations about motives were strictly forbidden on this blog.

        • Tom Gray
          Posted May 30, 2011 at 7:56 AM | Permalink

          “reputation lessened”

          Ah, the memory of the Academy of Athens pales beside the level of debate here.

        • Posted May 30, 2011 at 8:02 AM | Permalink

          Re: Tom Gray (May 30 07:55),
          I’m not arguing anything. Mosh asked – I’ve suggested possible explanations. I have no dog in the fight.

          But it can’t be good for Azen. No editor looks good if papers have to be withdrawn. Especially one that he waved through personally.

        • Posted May 30, 2011 at 9:49 AM | Permalink

          Which would explain why a lot of rubbish climate science papers don’t get pulled.

        • Posted May 30, 2011 at 11:13 AM | Permalink

          Re: Nick Stokes (May 30 07:51), Thanks for clarifying the date. That makes it weirder.

          Maybe he thinks it helps Wegman. I wonder if he checked with Wegman?

        • Posted Jun 2, 2011 at 9:10 PM | Permalink

          Why are you assuming that these emails came from Azen? Deep Climate clearly says that they were sent to Vergano as a result of an FOI request.

          Now a government agency (GMU) has gotta be joking if it thinks it can place restrictions on material released by FOI, if for no other reason than that anyone else can get the material by another FOI request. Once released, it is public record. So that dog won’t hunt, any more than the bull that lawyers commonly put at the bottom of emails.

  4. bobdenton
    Posted May 30, 2011 at 3:02 AM | Permalink

    I feel a great deal of sympathy for Wegman; I don’t consider this particular example of plagiarism to be anything but trivial. That said, the cited authors of a paper or report must take responsibility for all its contents. When material prepared by other persons is used, there must be some procedure for confirming that those passages are appropriately cited – I assume by directly asking the person to confirm that, except insofar as it is their own work, they have reviewed it to ensure it complies with the appropriate academic standard of citation; even better, getting this confirmation in writing.

    The way some passages have been copied uncited seems to indicate a systemic failure to ensure that included work which is not the work of the responsible authors has been appropriately cited, and Wegman must accept some responsibility for that.

    His suggestion that an erratum sheet be issued seems an appropriate remedy (along with an apology to offended parties), and had GMU completed their misconduct procedure on time they could have directed such a remedy. Now that resolution is no longer open to them.

    Hopefully, they will issue directions for authors who include work prepared by others to follow. This can’t be an uncommon event.

  5. Latimer Alder
    Posted May 30, 2011 at 4:46 AM | Permalink

    Am I alone in being a non-academic and wondering what all the fuss is about? Whether such and such a piece of text is original work or pinched from somebody else doesn’t matter to me.

    Surely it is the content of the text, not its provenance that is important?

    Some seem to be like a football team beaten 5-0 at home and having a bitter argument over whether Joe Scruggins or his twin brother Jim scored the fourth goal. It is pretty irrelevant. Your team still got stuffed.

  6. Stacey
    Posted May 30, 2011 at 4:51 AM | Permalink

    Re N Stokes
    1 Does the alleged plagiarism affect the findings of the Wegman et al report? Answer: No

    2 Does the Wegman et al report prove an unhealthy relationship amongst the Fiddlestick Team members?
    Answer: No
    3 Do the Climategate emails show that Wegman et al were right all along?
    Answer: Yes yes yes

    • slowjoe
      Posted May 30, 2011 at 7:00 AM | Permalink

      I would imagine that this is the succinct reason for the witch-hunt against Wegman. The Climategate emails show that peer review had been subverted by at least Jones, as postulated to a wide audience by the Wegman Report, and published in a peer-reviewed paper in Said 2008.

      Were he wrong, they’d have published another paper saying so.

  7. pete
    Posted May 30, 2011 at 6:05 AM | Permalink

    As noted on Andrew Gelman’s blog, the reason this plagiarism is problematic is that Wegman was presented to Congress as an expert.

    Arguments against the contents of the report have had little effect; most people hearing two conflicting “expert” views aren’t in a position to decide which expert is right. What they can understand is that Wegman wasn’t capable of putting together a report without plagiarising most of it.

    The supposed “content” is just Wegman re-running Steve’s code. But Steve’s already run his own code! The only thing the report adds to the debate is an “expert” agreement with Steve. Without Wegman’s “expert” stamp of approval, the Wegman Report is worthless.

    • Bad Andrew
      Posted May 31, 2011 at 11:33 AM | Permalink

      I disagree. “Expert” may be meaningful to Congress and academics, but to reg’lar people like me, we’ve seen so many “experts” trotted out for so many things for so many years… it jus’ don’t mean too much. Expert A says this, Expert B says that… what else is on?

      Andrew

    • loner
      Posted Jun 3, 2011 at 4:40 AM | Permalink

      The funny thing is the auditor missed everything that Wegman got wrong. Everything. Did some hidden kryptonite cause a loss in super auditor powers?

  8. Tom Gray
    Posted May 30, 2011 at 6:14 AM | Permalink

    You know that when I read things like this, I find it difficult to understand that the world is facing a potentially grave crisis. There could be massive disruptions to the environment. There could be massive disruptions to the economy. This crisis could be beyond humanity’s ability to cope.

    So how does humanity address it? We divide up into sides and cast aspersions on the integrity and capability of all of the other sides. We push our own findings beyond their capacity to favor our own political objectives.

    The ancients used to devise ingenious ways to do numerological calculations on the names of their opponents. Using different alphabets and novel methods of calculation, they were all able to indicate that their opponents’ names were equivalent to 666 – the number of the beast. Now we, as inheritors of the rationalism of the Enlightenment, have moved beyond all of that. We would not dream of using irrational calculations to try to discredit someone else’s findings.

    Maybe we should all just take a step back from all of this and try to understand just what we are all doing and if it is really accomplishing anything.

    • DEEBEE
      Posted May 30, 2011 at 12:50 PM | Permalink

      I did a checksum analysis on your text, while playing it backwards and it was 999. So playing it forward must be 666. /sarc

  9. andymc
    Posted May 30, 2011 at 8:14 AM | Permalink

    Pete,

    don’t forget that the NAS panel also agreed with Wegman and Steve. (Also, Steve was an expert reviewer for the IPCC.)

  10. Geoff Sherrington
    Posted May 30, 2011 at 9:07 AM | Permalink

    There is a spectrum of plagiarism, classed by intent of the accused. If the intent was to knowingly deceive, that is serious. If the intent was innocent, akin to a typographic error, that is not serious and should not reflect upon the author.

    We get known unknowns in the plagiarism of photographs that I study. If a photographer snaps a wall of a room on which hangs a copyrighted painting or photo, is the resulting image an example of plagiarism? I think not, for practical use, if the intent is established as innocent. Another example: here, it is hard to take a wide inner-city photo without including some graffiti, some of which is commissioned and copyrighted, but not labelled as such, at all or prominently. Would you call that plagiarism?

  11. EdeF
    Posted May 30, 2011 at 9:16 AM | Permalink

    (Steve, you are mentioned, along with Ross and Bishop Hill, in a readable article in First Things magazine by Prof. William Happer, Cyrus Fogg Brackett Professor of Physics at Princeton University: “The Truth About Greenhouse Gases”. FYI)

    http://www.firstthings.com/article/2011/05/the-truth-about-greenhouse-gases

  12. Craig Loehle
    Posted May 30, 2011 at 9:54 AM | Permalink

    On multiauthor papers it is very difficult, especially on literature surveys, to verify the originality of the work of coauthors. There is no “procedure” one can use. If I got burned like this by a coauthor, it would be very upsetting, but what could I do “next time”? Not clear at all.

    • Arthur Dent
      Posted May 30, 2011 at 11:39 AM | Permalink

      The obvious response would be to do no further work with such a co-author. Once bitten, twice shy, so to speak.

  13. golf charley
    Posted May 31, 2011 at 9:50 AM | Permalink

    If only so much effort could be put into helping Phil Jones track down the original data sources for his 1990 Nature paper concerning UHI, I feel sure that Mashey et al could truly advance the science.

  14. genealogymaster
    Posted May 31, 2011 at 10:49 AM | Permalink

    I’m watching this play out on a few websites, and I find it interesting that those who are against seeing Dr. Mann’s material keep coming up with the same argument over and over and offer nothing to continue the debate. Congrats, Steve, on keeping it civil; I can well understand that for some of these items it can get ugly at times. One comment I’ve heard is that if scientists like Dr. Mann weren’t so sloppy with their research and archiving this wouldn’t be a problem; adding to that, some say that UVa may have put student records in. Why would anyone do that?

    • Posted May 31, 2011 at 8:21 PM | Permalink

      I don’t understand. Who is “against seeing Dr. Mann’s material”?

One Trackback

  1. By Top Posts — WordPress.com on May 31, 2011 at 7:13 PM

    […] Mosher on the Provenance of Said et al 2008 Steve Mosher summarizes his reading of the provenance of section 1 of Said et al 2008 as follows (The other sections, […]