McKitrick and Nierenberg 2010 Rebuts Another Team Article

McKitrick and Nierenberg 2010, rebutting Schmidt 2009, is in press at the Journal of Economic and Social Measurement.

Schmidt (Int J of Climatology) 2009, which commented on McKitrick and Michaels 2007, was peer reviewed by Phil Jones (the puffball review is in the Climategate documents); McKitrick was not given a chance to comment. In contrast, when McKitrick and Nierenberg submitted to IJOC, despite specific requests that Schmidt not be a reviewer, they ended up with what was, in effect, a Team peer review, with Gavin Schmidt an important and unreported member of the Team. As in other incidents of Team peer review, the Team managed to stifle the comment at IJOC. In Climategate terminology, the Team ensured that there wasn’t a “leak” at IJOC. Eventually, McKitrick and Nierenberg submitted to the Journal of Economic and Social Measurement, which was not subject to the Team.

Ross writes at his webpage:

NEW PAPER ON CONTAMINATED SURFACE TEMPERATURE DATA: In 2007 I published a paper with Pat Michaels showing evidence that CRU global surface temperature data used by the IPCC are likely contaminated due to socioeconomic development and variations in data quality. In 2009 Gavin Schmidt published a paper in the International Journal of Climatology claiming our results, as well as those of de Laat and Maurellis who independently found the same things we did, were spurious. My rebuttal, coauthored with Nicolas Nierenberg, has been accepted at The Journal of Economic and Social Measurement.

* McKitrick, Ross R. and Nicolas Nierenberg (2010) Socioeconomic Patterns in Climate Data. Journal of Economic and Social Measurement, forthcoming.

Data/Code archive here.

Ross provides the following account of Team peer review at the International Journal of Climatology (this is the same journal where Team peer reviewers stifled our comment on Santer et al 2008):

We submitted in April 2009. I requested that Schmidt not be a reviewer, or at least that he not be given a veto. Andrew Comrie promised that he would not let this happen. In September I got an email from Schmidt asking for my Stata code to assist him in preparing a response to the IJOC submission. I wrote to Comrie saying WTF, and he wrote back saying

Dear Ross,

Thanks for checking back on this. You had requested that Schmidt should not be a referee, and he is not. Some of the 3rd party reviewers have requested more background on the exchange, including views from Schmidt and other related submissions, so Schmidt has been asked to write responses. The entire set of exchanges will be assessed once responses are coordinated, and we’ll let you know how to proceed at that point. Thanks for providing your code too.

Best wishes,

Andrew

Then in October 09, having heard nothing back, I wrote

From: Ross McKitrick [mailto:rmckitri@uoguelph.ca]
Sent: Friday, October 16, 2009 2:08 PM
To: Andrew Comrie
Cc: Nicolas Nierenberg
Subject: JOC-09-0139 – International Journal of Climatology

Dear Andrew

Another month has passed concerning our manuscript, and I am writing to ask if there is any progress to report. In a previous email you told me that even though Gavin Schmidt was not a reviewer he was nonetheless brought into the review process and given the opportunity to supply comments. At your request I supplied data and code to him, but I have not received any further communication from him about any comments or concerns he might have about our findings. My assumption is that Schmidt will be disposed against conceding anything we argue, and communication from him to that effect should not, of itself, be taken as evidence against the validity of our findings.

I have been concerned about the fact that there was a perceived need to give Schmidt an opportunity to communicate privately with the referees. Everything they need in order to evaluate the technical content of the two papers was in his original paper and SI, plus our submission and supplemental material. By giving Schmidt a backchannel it opens up the possibility that he will raise new issues outside the scope of the two submissions, which under the circumstances would be improper, especially if we are not given the chance to respond to them.

Alternatively, if he has been brought into the process because a reviewer felt incapable of evaluating specific technical issues (e.g. pertaining to the statistical analysis), then an independent reviewer with expertise in the area should be consulted. I would caution against the assumption that Schmidt has the specific expertise to adjudicate such points, if by any chance that had been the motivation for bringing him into the process.

Thank you for your consideration.
Yours truly
Ross

To which Andrew replied

Dear Ross,

Gavin Schmidt left me a message a few days ago saying that he hoped to have his comments ready within a week. When I receive them, the process then goes back to the reviewers. As per earlier emails, I understand your concerns about his role and that of the reviewers, including the dimensions you mention below. I reiterate what I said before about keeping the review process fair all round.

I do appreciate your patience in this extended process caused by reviewer delays and requests. As soon as I have news I will be in touch.

Best wishes,

Andrew

On December 15, still having heard nothing, I sent in a revision that took into account some new results arising from the paper that ended up in SP&P. Comrie wrote back to say he was still waiting for “some reviewer feedback”.

On Feb 8 2010 we got the rejection based on 3 referee reports. None of them pointed to any technical flaws; they either dismissed the whole literature or raised objections that would have applied equally to Schmidt’s paper. I sent in a protest letter the same day
(http://rossmckitrick.weebly.com/uploads/4/8/0/8/4808045/response_to_ijoc.pdf) which Comrie and MacGregor summarily dismissed.

At his website, Ross summarized the process as follows:

After 10 months we found out that IJOC was rejecting our paper on the basis of some inane referee reports to which Nico and I were not given a chance to reply. We did anyway, and if anyone thinks the rejection by IJOC amounts to a knock against our paper, please read our response letter for some perspective. Whether or not the IJOC editors read it, they refused to reconsider our paper. Interestingly, we learned from the Climategate release that Schmidt’s paper, which focuses on defending Phil Jones’ CRU data against its various critics, was sent by the IJOC Editors to be reviewed by Phil Jones of the CRU. As you can imagine his review was shallow and uncritical, but evidently impressed the editors of IJOC. They didn’t ask deLaat or me to supply a review, nor did they invite us to contribute a response. Every interaction I have had over the years with the IJOC has left me very unimpressed.

Whereas Gavin Schmidt demanded (and received) code from McKitrick as part of his review of the Mc-N comment, Phil Jones demanded nothing, providing only the following review (followed by a number of “minor comments” on punctuation and references):

This paper is timely as it clearly shows that the results claimed in dML06 and MM07 are almost certainly spurious. It is important that such papers get written and the obvious statistical errors highlighted. Here the problem relates to the original belief that there were many more spatial degrees of freedom. This is a common mistake and it will be good to have another paper to refer to when reviewing any more papers like dML06 and MM07. There is really no excuse for these sorts of mistakes to be made, that lead to erroneous claims about problems with the surface temperature record.

My recommendation is that the paper be accepted subject to minor revisions. I have grouped my comments into minor changes that are needed, and a second set of thoughts that the author might like to consider to help clarify his arguments. It is certain that this paper will get read by a particular type of climatologist, so it ought to be as clear as possible. I’m happy if all the thoughts are ignored.

The asymmetry of the peer review process is impossible to justify. Schmidt was reviewed by a conflicted party (conflicted in his favor) and received a puffball review. McKitrick and Nierenberg were reviewed by a conflicted party (adverse) who did what they could to prevent a “leak” in the system. (Climategate correspondent Tim Osborn was on the editorial board of IJOC.)

I remain puzzled by the justification for conflicted parties (conflict of interest including both partisanship and friendship) acting as anonymous reviewers, let alone doing so without disclosure to the submitting authors. The only “explanation” that I’ve received is that the editor should take this into consideration, but, when confronted with partisan editors, this affords little reassurance. In this case, I am further puzzled by the apparent inequity in Schmidt acting as part of a peer review Team for McKitrick’s comment on his article, when McKitrick was not afforded a similar courtesy.

459 Comments

  1. stan
    Posted Dec 15, 2010 at 8:43 AM | Permalink

    It may turn out that the best thing the web and blogs can do for scientific journals will be giving the authors a forum to present facts such as these. Or perhaps I should say “best for science”. If journals behave in a manner that is best for science, this will be good for the journals, too.

  2. Fred
    Posted Dec 15, 2010 at 8:48 AM | Permalink

    One would think that by now the Team should have figured out they look guilty, guilty, guilty when they jump through hoops to suppress a critique on one hand and provide puff piece support for their Team friends on the other.

    They just don’t seem to get this interweb thingy and its power to suppress the suppression of truth.

    Where’s WikiLeaks when you really need them 🙂

  3. glacierman
    Posted Dec 15, 2010 at 9:03 AM | Permalink

    The actions of the IJOC are completely inappropriate, and thankfully, are well documented.

  4. bernie
    Posted Dec 15, 2010 at 9:21 AM | Permalink

    Stunning. This whole Hockey Stick/Climategate/BCP/Wegman episode will provide an important case study in the History of Science.

    However, the pressure is surely building with the latest M&W contribution.

  5. P.Solar
    Posted Dec 15, 2010 at 9:27 AM | Permalink

    Congratulations to Ross McKitrick et al on this rigorous work and finally finding a place to get it published.

    This was an ingenious method of detecting the UHI contamination indirectly, rather than examining the individual records and locations.

    It’s a bit like them pinning Al Capone for tax evasion.

    Schmidt and the Team are going to have to up their game. Their amateurish methods just don’t stand up to examination.

    It seems their only way to defend their work now is by underhand review tactics and pulling influence within their network of relations in the pally peer review system.

  6. Anthony Hanwell
    Posted Dec 15, 2010 at 9:41 AM | Permalink

    Now deep into retirement but with two articles in Nature 50 years ago, I am astonished and sickened at these revelations. I suspect there were some underhand journal activities in my day but now as suggested by your first commentator, the glare of publicity made possible by the Internet must surely curb such disgraceful behaviour? Climate science is an oxymoron.

    • Robinson
      Posted Dec 16, 2010 at 8:21 AM | Permalink

      Apparently it does not curb such behaviour.

      Strange but true.

  7. KnR
    Posted Dec 15, 2010 at 10:08 AM | Permalink

    If people working in climate science want to know why they have such a poor reputation, this offers them an insight into the problem. Individuals working in the area see it as their right to control what goes on in it, and to ensure that what does go on can only support them. Poor practice was seen as fine when it was nothing more than a little known and little cared about area. But once public and wider scientific interest became involved, it was no longer possible to carry on in the old ‘ways’. But through arrogance or ignorance, some seem unable or unwilling to change their ways.

    One of the problems with the awful reviews that followed Climategate was that they left the message that there was no real need to change or improve anything in climate science. So they have carried on making the same mistakes that led them into trouble in the first place. Hopefully this will come back to bite them in the arse, to everyone’s benefit.

  8. Mailman
    Posted Dec 15, 2010 at 10:09 AM | Permalink

    I’m engaged with a scientist on another forum who sees peer review as the be-all and end-all when it comes to scientific literature; he will not hear a word against the system!

    But I think this is exactly what the team wants. They understand the value and perceived impartiality that peer review has with scientists, so if they can subvert the process they know they have won.

    Mailman

    • mark t
      Posted Dec 15, 2010 at 11:34 AM | Permalink

      It works fine (mostly) in fields sans politicization. Most battles are carried out in respective journals, never seen by the public at large. If something sloppy, or outright incorrect gets through, journal readers catch it and submit comments, which get published if they have merit. The record is then corrected and everyone moves on to the next new idea.

      Not so when there are gatekeepers ala climate science and The Team.

      Mark

      • Dave
        Posted Dec 15, 2010 at 12:46 PM | Permalink

        I think it’s something of a myth that the peer review process in particular, and ‘science’ in general, work well. They just about work long-term, and that’s the best you can say.

        The kind of behaviour we see in climate science is not at all unusual – cf. Planck: “Science advances one funeral at a time”. Many fields have at one time or another been subject to gatekeepers of the ‘truth’. The only thing different here (to most examples) is that we’re being asked to trust the scientific method *in the short-term* when it can only be relied upon in the long-term, and to spend vast sums on the weight of it.

        • mark t
          Posted Dec 15, 2010 at 1:28 PM | Permalink

          That’s why I threw in the “sans politicization” part. Fields that do not rely on any public perception (particularly engineering, with which I am most familiar) rely more heavily on results and political hands are tied by that fact. Such cases, of course, often have much more objective criteria for evaluation, making the task of review much simpler.

          Mark

    • curious
      Posted Dec 19, 2010 at 8:43 AM | Permalink

      Mailman – for your correspondent’s info:

      http://retractionwatch.wordpress.com/

  9. bender
    Posted Dec 15, 2010 at 10:15 AM | Permalink

    Another case of a bluff being called to their detriment. The defense of S09 was pathetic given the adroit critique of McKitrick.

    That makes three cases – Mann98, Schmidt09, Steig09 – where this approach has failed badly.

    The bluff does not work any more, my alarmist friends. Enough people are fed up with your pomp & bluster that there is now sufficient interest and analytical capacity to refute your weakest and shallowest claims. The sooner you drop the pomp & bluster & buffoonery, the better off we will all be.

    We need UNBIASED, TRANSPARENT estimates of the rate of climate change. Biased, opaque methods work against your adopted cause. Are you so stupid that you still don’t realize this? Or so selfish that you simply can’t arrest your urge to indulge?

    Cut the crap. Admit when you’re wrong. It’s a blogospheric world, and honesty is the only policy.

    • Posted Dec 15, 2010 at 9:55 PM | Permalink

      I would say many more than three cases. MBH98 and all the so-called “independent” verifications of same. I don’t know how many there are.

    • nono
      Posted Dec 16, 2010 at 10:16 AM | Permalink

      It’s a blogospheric world
      ————————-
      Well said.

    • Skiphil
      Posted Dec 12, 2012 at 4:45 PM | Permalink

      Two years on from Bender’s sensible remarks and Schmidt-Mann-Steig still man the battlements at RealClimate, with little improvement that I can detect. Mann is more entrenched than ever with his pop sci book and his AGU Fellow status. Is “science” self-correcting?

      • mrmethane
        Posted Dec 12, 2012 at 5:26 PM | Permalink

        …. one funeral at a time, according to one wag ….

  10. Craig Loehle
    Posted Dec 15, 2010 at 10:27 AM | Permalink

    This demonstrates once again that complaints about suppression are not just sour grapes. In my experience, and in accounts people have given me, superficial and dismissive comments are enough to reject an article; sometimes reviewers can find no fault with the methods but reject anyway because the paper gets the “wrong” answer. Simply arrogant and wrong behavior. Bullying.

  11. Mac
    Posted Dec 15, 2010 at 10:33 AM | Permalink

    Well Team members did say they were going to redefine peer review in order to keep research critical of their approach out of publications.

    They have been true to their word.

  12. bender
    Posted Dec 15, 2010 at 10:35 AM | Permalink

    A brief note to newcomers here. (Someone could do this better than me, but here goes.)

    This particular exchange is not a tempest in a teapot. Indeed, this is the reason for climategate. A group of skeptical auditors wanted to make sure that, when the UEA CRU surface record was putatively adjusted for urban heating effects, these effects were estimated robustly and the corrections were in fact correctly implemented.

    What ensued was a gobsmacking pattern of denial – which only raised the skeptics’ concern that these effects were not estimated and removed using proper experimental and statistical techniques.

    This paper represents important progress in understanding what is (and isn’t) in global surface temperature data sets. It appears the surface record may well be – as skeptics hypothesized – contaminated with local urban & land-use heating effects that have nothing whatsoever to do with well-mixed greenhouse gases. When you interpolate amongst urban stations, you attribute warming to non-urban areas that isn’t actually there.

    (Hence the raison d’etre for WUWT.)
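
    (A toy numerical illustration of that interpolation point, sketched in Python; every number here is hypothetical, chosen only to show how a purely local urban bias can end up attributed to surrounding non-urban cells.)

        import numpy as np

        true_trend = 0.15   # K/decade: the real trend everywhere (hypothetical)
        uhi_extra = 0.30    # K/decade of extra, purely local urban warming (hypothetical)

        # Three stations report trends; only one is urban, but it happens to be the
        # nearest station for three of the five grid cells being infilled.
        station_trends = {"rural_A": true_trend, "rural_B": true_trend,
                          "urban": true_trend + uhi_extra}
        nearest_station = ["rural_A", "rural_B", "urban", "urban", "urban"]

        gridded = np.array([station_trends[s] for s in nearest_station])
        print(f"true regional trend:    {true_trend:.2f} K/decade")
        print(f"gridded regional trend: {gridded.mean():.2f} K/decade")  # 0.33, inflated by the urban bias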

    • Posted Dec 15, 2010 at 11:03 AM | Permalink

      And as I am sure readers with experience of scientific and technical publishing will generally recognize, this perversion of peer review simply does not occur in other domains. These shenanigans are what Jones, Schmidt, Mann, et al will be remembered for.

      • Skip Smith
        Posted Dec 15, 2010 at 5:46 PM | Permalink

        Actually, this kind of gatekeeping crap is fairly common in the peer reviewed literature.

        Steve: doesn’t justify it. If the literature is being relied upon for policy making, then measures should be taken to eliminate it in this branch of literature.

        • Skip Smith
          Posted Dec 15, 2010 at 7:12 PM | Permalink

          Wasn’t justifying it, just pointing out the assertion that this doesn’t happen in other fields is incorrect.

        • Posted Dec 15, 2010 at 7:32 PM | Permalink

          Machiavellian machinations are mostly hidden or known to the few involved in other sciences, NONE of which will have the potential financial and social impact of Climatology… The existence of reviewer bias in other fields may or may not be important, but this particular egregious example of “hiding the decline” is KNOWN and needs to be shown to the world…

        • Duster
          Posted Dec 17, 2010 at 3:59 AM | Permalink

          “Machiavellian machinations are mostly hidden or known to the few involved in other sciences, NONE of which will have the potential financial and social impact of Climatology… ”

          Not true really, and in other fields the opposing parties sound very similar and resort to similar tactics. Consider some of the shenanigans that took place not long after the founding of the Royal Society. Another now historical example is the opposition faced by Benoit Mandelbrot. Many of his critics were utterly dismissive and reading their arguments strongly suggests that they did not read Mandelbrot’s work. It happens in all fields of investigation when the purse strings of research funding and published exposure come under the control of a clique convinced of the correctness of their views.

          My personal view is that “peers” are defined far too narrowly and that committees that judge the worthiness of proposed research ought to be completely naive about the field where the research will be undertaken.

        • oneuniverse
          Posted Dec 15, 2010 at 8:50 PM | Permalink

          Actually, this kind of gatekeeping crap is fairly common in the peer reviewed literature.

          What evidence led you to that conclusion? Also, quantify ‘fairly common’, please?

        • Skip Smith
          Posted Dec 15, 2010 at 9:17 PM | Permalink

          My experiences publishing in the peer reviewed literature led me to this conclusion. “Fairly common” means “pretty much anytime you try to publish a paper critical of someone else’s work.”

        • Posted Dec 15, 2010 at 10:04 PM | Permalink

          Please indicate the field(s) of your experience. My experience is in chemistry and physics – and I’ve never seen anything resembling the manipulation of the peer review process exhibited here. It would be very helpful if you could provide an example (or examples).

        • Pat Frank
          Posted Dec 15, 2010 at 11:27 PM | Permalink

          I agree, ZT. I’m a chemist, and have never seen anything remotely like what’s happened in climate science. I’ve controverted others’ work, and have never experienced a lock out.

          In an early case, the guy whose results I was disputing was (and still is) a prominent research academic, who said my work was “polluting the literature.” But there was never a hint of anyone trying to block my papers.

        • pesadia
          Posted Dec 17, 2010 at 6:21 PM | Permalink

          Lee Smolin wrote a book called “The Trouble with Physics” in which he outlines many problems with the science. Although he does not refer to the peer review process, there are some disturbing aspects which deter research in areas that are not mainstream. He makes the point that unless you are developing string theory, you may have difficulty obtaining funds.

        • Skip Smith
          Posted Dec 16, 2010 at 1:23 AM | Permalink

          I’m in the social sciences.

          I’ve experienced gatekeeping behavior multiple times, most recently about two years ago, when I submitted to a top social science journal. Two reviewers recommended publication with only minor revisions. One reviewer recommended rejection for trivial reasons, was blatantly hostile, and argued the issue had already been definitively settled by the work I was criticizing. My guess is that this was either the person I was criticizing or one of his graduate students.

          My manuscript was rejected, and later accepted at another journal.

        • Adam Gallon
          Posted Dec 16, 2010 at 4:01 AM | Permalink

          “I’m in the social sciences”
          That explains a lot, another oxymoron like “climate science”, where bull & PC rule the roost.

        • Skip Smith
          Posted Dec 16, 2010 at 2:10 PM | Permalink

          You’re aware that Ross McKitrick is a social scientist, aren’t you?

        • oneuniverse
          Posted Dec 16, 2010 at 11:59 PM | Permalink

          He’s also a natural scientist with 14 peer-reviewed science journal articles (16 in economics).

          Adam, you may be thinking of “sociology” rather than “social sciences”, which is a larger collection of fields that includes the former.

        • oneuniverse
          Posted Dec 16, 2010 at 7:34 AM | Permalink

          “pretty much anytime you try to publish a paper critical of someone else’s work.”

          Thanks Skip – what you describe doesn’t cover Schmidt’s experience when criticising McKitrick and Michaels – in contrast to Mc & N’s experience criticising Schmidt. As Steve Mc wrote, “The asymmetry of the peer review process is impossible to justify.”

          It might be good to hear from (many) others involved in peer-reviewed publishing before coming to such a broad conclusion as you did eg. see Richard’s comment on this thread.

        • Skip Smith
          Posted Dec 16, 2010 at 2:23 PM | Permalink

          You mean broad conclusions like “this perversion of peer review simply does not occur” outside of climate science?

        • oneuniverse
          Posted Dec 16, 2010 at 10:49 PM | Permalink

          Yes, ZT chose to go overboard with that statement: there are incidents outside climate science of peer-review gone bad.

          If you want to debate that such debased peer-review is fairly common in the classic “hard” sciences, you need to provide some evidence other than your personal experience publishing outside those fields.

        • nono
          Posted Dec 16, 2010 at 10:23 AM | Permalink

          I’m in Geophysics, dynamics of the lithosphere.

          I’ve witnessed gatekeeping behavior against crackpots, who then claimed they had been censored.

        • Posted Dec 16, 2010 at 10:32 AM | Permalink

          Yes, thanks Skip. Social science and climate science apparently have similar peer review problems that need to be resolved. In subjects that tend to help people and economies (like engineering, chemistry, physics, medicine, etc.) the types of peer-review ‘practices’ described in this article are not tolerated.

        • Brian H
          Posted Dec 17, 2010 at 8:09 AM | Permalink

          OT

      • NormD
        Posted Dec 16, 2010 at 8:40 AM | Permalink

        This seems germane

        How To Publish A Scientific Comment in 123 Easy Steps

      • per
        Posted Dec 17, 2010 at 12:17 PM | Permalink

        Note that for “gatekeeping”, we are talking about one bad reviewer. If you want to make the case that you get one bad reviewer, that is a widespread phenomenon in science. The idea that chemistry/ physics are exempt is very strong.

        Just for info, there was a recent Science paper on bacteria containing arsenic. There is some severe criticism (e.g.
        http://rrresearch.blogspot.com/2010/12/arsenic-associated-bacteria-nasas.html
        http://www.guardian.co.uk/science/blog/audio/2010/dec/13/science-weekly-podcast-arsenic-bacteria-backlash)

        The embarrassing thing is NASA went overboard, with a press conference, etc. The science looks to be extremely ropey, and all of a sudden NASA doesn’t want to discuss anything that isn’t “peer-reviewed”. Even taking the paper at its best, with the evidence on its face, it is verging on the astonishing that the paper got through. The evidence for As being in the DNA is simply embarrassingly poor. Science is likely not to want too much publicity, because it is likely going to be very embarrassing for Science’s standards.

        per

    • jorge kafkazar
      Posted Dec 16, 2010 at 12:24 AM | Permalink

      I think you mean http://www.surfacestations.org/

    • Posted Dec 18, 2010 at 5:41 AM | Permalink

      Re: bender (Dec 15 10:35),

      This particular exchange is not a tempest in teapot. Indeed, this is the reason for climategate. A group of skeptical auditors wanted to make sure that when the UEA CRU surface record was putatively adjusted for urban heating effects, that these effects were estimated robustly and that corrections were in fact correctly implemented.

      What ensued was a gobsmacking pattern of denial…

      There is another pea-under-thimble issue here, which I regard as pretty crucial to the whole shenanigan. Similar to the way Mann validates recent warming by using data-without-bristlecones-but-with-Tiljander, mutually validating data-without-Tiljander-but-with-bristlecones. Here we have “natural solar influence is invalidated because recent correlation fails, therefore it’s our unnatural CO2, etc” but the solar correlation only fails so long as UHI is regarded as negligible.

      “THEREFORE TRIP UP ALL THOSE WHO SUGGEST UHI IS SIGNIFICANT”.

  13. Mac
    Posted Dec 15, 2010 at 10:46 AM | Permalink

    Andrew Comrie to Phil Jones (2004), “Your contribution to the review process is essential and greatly valued”

    It would seem that Andrew Comrie is greatly enamoured of the Team.

    • bender
      Posted Dec 15, 2010 at 10:47 AM | Permalink

      Or maybe he’s just being polite. He was very polite with Ross.

  14. bender
    Posted Dec 15, 2010 at 11:04 AM | Permalink

    I like how, on p. 17, removing the effect of spatial autocorrelation brings the surface trend lower than the tropospheric trend, in conformity with GHG theory & GCM output. Let’s hear Gavin argue against that one.

  15. RomanM
    Posted Dec 15, 2010 at 11:05 AM | Permalink

    What an embarrassment to science many of the “learned societies” of climate have become:

    Gavin Schmidt left me a message a few days ago saying that he hoped to have his comments ready within a week. When I receive them, the process then goes back to the reviewers.

    Not only was Gavin a “reviewer”, he was in charge of the entire review process! Where has the integrity of these people gone?

    • mark t
      Posted Dec 15, 2010 at 11:28 AM | Permalink

      Advocacy attracts those with an agenda, not integrity. Your question assumes they had some to begin with.

      Mark

      • Posted Dec 15, 2010 at 1:32 PM | Permalink

        Good point. Advocacy is one side of the coin. The other (to my mind) is the step-change in funding in the 1988-92 (Bush Snr) years after Hansen’s testimony to Congress – from $200m to $2bn per year was it? This feeds into and motivates advocacy – indeed the two are highly synergistic.

        Not all integrity goes out the window as new money, and thus people, arrive. But soon the bad begins to predominate, both within groups and individuals. As Lindzen has pointed out many times, once the increase in funding has taken hold, none of the rest of the badness requires a conspiracy, at least not one simplistically conceived. That observation of course doesn’t prove that there are no conspiratorial people whatsoever in any part of the landscape either. The Climategate emails contain not exactly exciting plotting but enough to poison the particular wells around which CRU operated. And that’s just a sample.

        • Harold
          Posted Dec 22, 2010 at 7:49 PM | Permalink

          NASA is something of a special case. They spent a lot of time lobbying for funding in the post-Apollo years. The shuttle SRB that failed was originally supposed to have been fabricated in one piece in Florida, but getting congressional support from the Utah delegation was much easier if it was made in two pieces in Utah.

          Or perhaps East Anglia was in a similar funding situation, so they needed dire results to increase funding.

  16. bender
    Posted Dec 15, 2010 at 11:12 AM | Permalink

    Dear Dr. McKitrick,

    If I understand Table 6 correctly, this would suggest that 1/3 of the surface warming trend in the last 30 years may be attributable to urban and land-use effects alone. Do I have that correct? Second, given that regression produces biased coefficients, might this number be as high as, say, 1/2?

    Thank you for your tenacity in making public this important work.

    • Posted Dec 15, 2010 at 12:21 PM | Permalink

      Given the assumptions and the method, yes, a 1/3 reduction in the land-based trend is, in my view, a reasonable conclusion of the analysis. I hope to be able to firm up the analysis in future by pooling time series and cross section, but for now that is the result.

      I’m not sure what you mean when you say that regression produces biased coefficients. The estimators we used are asymptotically unbiased, at least theoretically.

      • RuhRoh
        Posted Dec 15, 2010 at 3:37 PM | Permalink

        Or, to say it differently, would it be equally correct to say that, if indeed 1/3 of the putative temperature rise is attributable to urban temperature artifacts, the reported trend overstates the increase by 50%?

        BTW, are the results amenable to a geo-referenced presentation? Some folks (not me) seem to be very good at implementing those things in a very web-friendly way, such as KevinUK, etc. The map of removed outliers had me wishing for a color graphic conveying the results of the non-outliers. My ability to plow through tables of numbers has become impaired in concert with my short-term memory…
        Hashtable burnout…
        Thanks for all the grinding on the gory details!
        RR
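
        (A quick back-of-envelope check of that 1/3-versus-50% framing, sketched in Python; the 0.30 K/decade figure is purely illustrative.)

            reported = 0.30               # illustrative reported land trend, K/decade (hypothetical)
            contaminated_fraction = 1.0 / 3.0

            true_trend = reported * (1.0 - contaminated_fraction)   # 0.20 K/decade
            overstatement = reported / true_trend - 1.0             # 0.50, i.e. 50%

            print(f"true trend:    {true_trend:.2f} K/decade")
            print(f"overstated by: {overstatement:.0%}")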

      • See - owe to Rich
        Posted Dec 15, 2010 at 5:30 PM | Permalink

        Ross M, before this paper, based on your earlier paper with Michaels, in order to assess the global UHI contamination per decade, I had written to myself as follows.

        “I have found that M&M actually talk about an increase of 0.17K v 0.30K per decade over land. Therefore the estimated UHI discrepancy is 0.13K per decade, but that is over land. If there is none over sea then we should multiply this by a factor of about 0.3 to get to a global figure of 0.04K per decade.”

        Please can I now ask you:

        Was that analysis ever (approximately) correct?

        Is it still correct and if not what should it be?

        The answers will figure in a paper I am trying to write.

        Thanks,
        Rich.
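
        (For reference, the arithmetic in the quoted note as a small Python sketch; the 0.3 land fraction and the assumption of no contamination over the oceans are Rich’s, not results from the paper.)

            land_trend_reported = 0.30   # K/decade over land (figure quoted from M&M)
            land_trend_expected = 0.17   # K/decade over land (the other figure quoted)
            land_fraction = 0.3          # rough fraction of the Earth's surface that is land

            uhi_land = land_trend_reported - land_trend_expected   # 0.13 K/decade over land
            uhi_global = uhi_land * land_fraction                  # ~0.04 K/decade globally

            print(f"land discrepancy:   {uhi_land:.2f} K/decade")
            print(f"global discrepancy: {uhi_global:.2f} K/decade")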

  17. Pat
    Posted Dec 15, 2010 at 11:14 AM | Permalink

    Perhaps the House of Commons Science & Technology Committee may be interested in Ross’ experience, and the U.S. House of Representatives Scitech committee may want to talk to IJOC about the peer review process.

    • RomanM
      Posted Dec 15, 2010 at 11:32 AM | Permalink

      Not really. IJOC is a publication of the Royal Meteorological Society which is based in the UK.

      Given the track record of government investigations of climate science misdoings in the UK, all one would see is another splash of white paint on the entire matter.

  18. Dishman
    Posted Dec 15, 2010 at 11:14 AM | Permalink

    I find this disappointing, but not surprising.

    As I see it, The Team has staked out an untenable position, and committed to defending it to the bitter end.

    That seems unwise to me.

    It further seems to me that in attempting to hold their position, they have closed their eyes to something important. That’s not mine to judge, though. I’ll leave that to each of the members of The Team to judge for themselves.

    • bender
      Posted Dec 15, 2010 at 11:31 AM | Permalink

      They’ll back-pedal without admitting any error and claim that this proves skeptics were making much ado about nothing (50% warming trend is enough to validate alarmist movement) while inflicting much collateral damage in the process. And they’ll get away with it. No one cares about process. It’s all about GHGs and the GMT trend. As long as the magnitude of attribution is non-zero their agenda will continue.

      • Dishman
        Posted Dec 15, 2010 at 12:02 PM | Permalink

        I understand what you’re saying, and it is a reasonable position based on the evidence available.

        My statement stands.

        • bender
          Posted Dec 15, 2010 at 12:17 PM | Permalink

          Your statement disagrees with mine.

          I think they have staked out several positions, some untenable, others more tenable – and that they will only choose to defend the tenable. The job of the skeptic is to force them off the less tenable positions. While they may be reluctant to back off a position, and they may loathe the optics – they DO back off positions without “defending them to the bitter end”. It’s called “moving on”. You move on while claiming victory. This is exactly what Steig seems to be doing.

          So, you see, we disagree quite strongly.

        • Ron Cram
          Posted Dec 15, 2010 at 7:14 PM | Permalink

          I don’t think you disagree quite strongly. In neither case do they give ground. If they claim victory and move on, their prior strong defense still stands. Getting them to declare victory and move on is not success.

      • Pat Frank
        Posted Dec 15, 2010 at 12:17 PM | Permalink

        As it turns out, it is zero, in the statistical sense.

        • bender
          Posted Dec 15, 2010 at 12:19 PM | Permalink

          Is your evidence for this claim related to this specific post?

        • Pat Frank
          Posted Dec 15, 2010 at 12:31 PM | Permalink

          No, I have a paper coming out in the next issue of E&E demonstrating this fact.

        • Posted Dec 15, 2010 at 1:22 PM | Permalink

          Re: Pat Frank (Dec 15 12:31), Hi Pat,

          Do you have a preprint available? If so, please contact me at cdquarles at msn dot com with a link to it.

          Thanks

        • Pat Frank
          Posted Dec 15, 2010 at 2:41 PM | Permalink

          It’s not posted anywhere, and hasn’t yet appeared on the E&E website.

          But here’s the title and abstract:
          “Uncertainty in the Global Average Surface Air Temperature Index: A Representative Lower Limit”

          Abstract: “Sensor measurement uncertainty has never been fully considered in prior appraisals of global average surface air temperature. The estimated average ±0.2 C station error has been incorrectly assessed as random, and the systematic error from uncontrolled variables has been invariably neglected. The systematic errors in measurements from three ideally sited and maintained temperature sensors are calculated herein. Combined with the ±0.2 C average station error, a representative lower-limit uncertainty of ±0.46 C was found for any global annual surface air temperature anomaly. This ±0.46 C reveals that the global surface air temperature anomaly trend from 1880 through 2000 is statistically indistinguishable from 0 C, and represents a lower limit of calibration uncertainty for climate models and for any prospective physically justifiable proxy reconstruction of paleo-temperature. The rate and magnitude of 20th century warming are thus unknowable, and suggestions of an unprecedented trend in 20th century global air temperature are unsustainable.”
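
          (Purely as a sketch of how the two quoted numbers could relate, assuming the ±0.2 C station error and a systematic term combine in simple quadrature; the paper’s actual derivation may differ.)

              import math

              station_error = 0.2       # +/- C, average station error quoted in the abstract
              total_uncertainty = 0.46  # +/- C, representative lower limit quoted in the abstract

              # If the two components combined in quadrature, the implied systematic term would be:
              systematic_term = math.sqrt(total_uncertainty**2 - station_error**2)
              print(f"implied systematic component: +/- {systematic_term:.2f} C")   # ~0.41 C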

        • See - owe to Rich
          Posted Dec 15, 2010 at 5:10 PM | Permalink

          That is an interesting theory, and I’ll look forward to seeing your data which support it. But there are a lot of data which don’t support it. From year to year, we don’t see anything like a 0.46K standard deviation in the various anomalies. Are you accusing HadCRUT3 and GISS of cooking the data to avoid large variance? Are you accusing the satellite UAH anomalies of the same?

          Is variance more of a problem than bias, here?

          Rich.

        • Pat Frank
          Posted Dec 15, 2010 at 11:46 PM | Permalink

          I’m principally concerned with instrumental error, especially systematic error, which has been thoroughly neglected in making global temperature series. The results such as you describe could arise from a slow drift of systematic bias among the stations that contribute the average annual anomaly. Variance and bias due to systematic error only show up when the experimental instrument is tested against a precision instrument that isn’t subject to, e.g., uncontrolled variables. Without an external reference, the biased measurements just look like data. If one starts with systematic biases in the measurements, and there’s more systematic drift, one needn’t have reason to suspect anything wrong, just looking at the data.

        • Geoff Sherrington
          Posted Dec 16, 2010 at 9:15 PM | Permalink

          Pat, here is some internal Australian work that was forwarded to me by the BOM. I believe it is open file. It deals with sensor error evaluation and other variables related to instrumentation. The site is only a few miles from my home so I can relate to it with local knowledge. It’s a bit late for your paper in prep, but might assist questions arising.

          Click to access Jane%20Warne%20thermometry%20Broadmeadows.pdf

        • Pat Frank
          Posted Dec 17, 2010 at 12:56 PM | Permalink

          Thanks Geoff — see my comment in our other thread below, where things aren’t so compressed.

        • Brooks Hurd
          Posted Dec 18, 2010 at 8:30 AM | Permalink

          Pat,
          I look forward to reading your article. It has been my belief for years that much of the climate science community has ignored the effect of instrument error on their conclusions.

        • Pat Frank
          Posted Dec 18, 2010 at 7:57 PM | Permalink

          Thanks, Brooks. If you contact me at pfrank830 AT earthlink POINT net, I’ll send you a reprint.

  19. oeman50
    Posted Dec 15, 2010 at 12:11 PM | Permalink

    Can you say “IJOC gate”?

    • bender
      Posted Dec 15, 2010 at 12:21 PM | Permalink

      What, because an editor reneged on a promise that he was not bound to uphold?

    • Steven Mosher
      Posted Dec 18, 2010 at 11:00 PM | Permalink

      not without breaking my tongue. does mosher sound finnish to you?

  20. Pat Frank
    Posted Dec 15, 2010 at 12:30 PM | Permalink

    What’s really psychologically jarring is the friendly and reasonable stance exhibited by Andrew Comrie throughout his communications with Ross, contrasted with his apparently concurrent perversion of the review protocols.

    In reviewing Chemistry papers, I would have to ask permission of the editor to bring in a colleague to assist. That colleague would write a joint review with me. S/He would not write a separate review, and would certainly not:
    a) bring in further unannounced reviewers in a team effort, and;
    b) end up in control of the entire tempo of the review process.

    One can only wonder how these folks can possibly think they’re engaged in ethical practice. This is especially true for an editor like Andrew Comrie, who really doesn’t have any ax to grind in the specific science at hand. What could he possibly be thinking to give an unfair inside track to scientists who are shamelessly and baldly intent on subverting scientific ethics in order to carry the argument?

    • Hank Hancock
      Posted Dec 18, 2010 at 1:22 AM | Permalink

      I would like to offer some validation to your comment from another field. I’ve published a number of papers in leading medical journals. My field is perinatology. While we’ve faced the occasional reviewer who is initially critical of our submission, I’ve found that reasonable exchange resolves most issues and the paper goes to publication. The journals I’ve published in are usually very careful to

      a) not select reviewers who may have a vested interest in the acceptance or rejection of a submission

      b) not introduce new reviewers or subject matter experts to the review process without formal process (with proper notification and justification for their inclusion to the authors). Such midstream changes to the review process are generally seen as the journal having been incompetent in their selection of reviewers, and the reviewers themselves being incompetent to recognize they weren’t qualified to review.

      I never cease to be amazed at the degree of shenanigans disclosed in the climate science publication process, and find the IJOC’s obviously biased selection of reviewers, and its allowance for interference by Schmidt, who had just as obvious a vested interest, beyond any reasonable defense.

  21. Craig Loehle
    Posted Dec 15, 2010 at 12:41 PM | Permalink

    The thing that bothers me about Gavin’s “refutation” of the original M&M work is that his arguments, such as autocorrelation of climate fields undermining the conclusions, are merely plausible alternatives, not demonstrations. That one can defend the status quo with such hand-waving should not be allowed.

    • bender
      Posted Dec 15, 2010 at 12:46 PM | Permalink

      It is a dodge. An unwillingness to engage. An unwillingness to “tell the whole truth”. The peer review process as we know it does not oblige this level of engagement.

      • Luis Dias
        Posted Dec 16, 2010 at 9:23 PM | Permalink

        Sure. Just as anyone can dismiss Gavin’s papers, and why not, his “friends’” papers, while making some disparaging comments on how they can go wrong, so can Gavin himself. He can say whatever he wants; after all, it’s his blog.

        Furthermore, he has no interest whatsoever in giving his opponents too much credit by investing more time than he deems necessary to get his point across. So he will write his own paper dismissing Ross’s findings, and then the audience will consider it a “debunking” of another skeptic paper and call it a day. For this to happen he probably doesn’t even need to address all the strong points of McKitrick 2010.

    • mikep
      Posted Dec 15, 2010 at 6:42 PM | Permalink

      It’s worse than handwaving. S09 certainly gives the impression that he thinks autocorrelation in the dependent variable is enough on its own to cause problems – presumably Jones does too, judging by his review. But the point is that it’s autocorrelation in the residuals that causes problems. Jones as a reviewer could and should have asked that this be tested for. But he didn’t; whether through ignorance or other reasons is not clear.
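
      (A minimal sketch, in Python with made-up data, of the kind of check described above: fit the regression, then compute Moran’s I on the residuals rather than on the dependent variable. The inverse-distance weights and the simulated data are illustrative only, not the specification used in any of the papers.)

          import numpy as np

          rng = np.random.default_rng(0)

          # Made-up gridcell data: coordinates, two covariates, and a response.
          n = 200
          coords = rng.uniform(0, 50, size=(n, 2))
          X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
          y = X @ np.array([0.1, 0.3, -0.2]) + rng.normal(scale=0.5, size=n)

          # Ordinary least squares and its residuals.
          beta, *_ = np.linalg.lstsq(X, y, rcond=None)
          resid = y - X @ beta

          # Inverse-distance spatial weights (zero diagonal), row-standardised.
          d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
          W = np.zeros_like(d)
          W[d > 0] = 1.0 / d[d > 0]
          W /= W.sum(axis=1, keepdims=True)

          def morans_I(x, W):
              # Moran's I: (n / S0) * (z' W z) / (z' z), with z the demeaned values.
              z = x - x.mean()
              return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

          print(f"Moran's I of the residuals: {morans_I(resid, W):.3f}")
          # A value near the null expectation of -1/(n-1) suggests little residual spatial
          # autocorrelation; a clearly positive value is the warning sign described above.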

  22. RayG
    Posted Dec 15, 2010 at 7:48 PM | Permalink

    This December is rapidly turning into a very interesting month. I might even be tempted to say that Christmas has come a little early. So far, we have O’Donnell et al rebutting Steig 2009, Zhong and Imhoff’s presentation at AGU challenging Jones et al 1990 and its progeny and now McKitrick and Nierenberg rebutting Schmidt 2009. Perhaps I am being a little greedy, but I hope that there are more presents to come.

    Thank you to all of the authors who have persevered in publishing their work.

  23. Henry
    Posted Dec 15, 2010 at 8:51 PM | Permalink

    It does not make any difference to the analysis, but I don’t like MN’s Mercator projection in Figure 1: it should only be used for navigation. What would happen if you wanted to show anything near the Poles?

    • Posted Dec 15, 2010 at 9:07 PM | Permalink

      Software to put dots on maps is, I’m embarrassed to say, something I haven’t yet acquired. It’s a deficiency in Stata that it doesn’t (AFAIK) have a mapping utility. Yes, I know R does; I sat down last year on November 16 to learn R and then got distracted shortly thereafter. Nicolas used google maps to generate Fig 1. If anyone is willing to improve on it please email me ross.mckitrick at uoguelph dot ca right away and I’ll send you the coordinates, and I can probably get a better-looking Figure into the ms before printing.
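
      (For anyone taking Ross up on this, a minimal sketch of such a map in Python with matplotlib and cartopy, on a non-Mercator projection; lons and lats are placeholders for the actual station coordinates.)

          import matplotlib.pyplot as plt
          import cartopy.crs as ccrs

          # Placeholder station coordinates (degrees); replace with the real lon/lat lists.
          lons = [-79.4, 0.1, 151.2, -122.3]
          lats = [43.7, 51.5, -33.9, 47.6]

          ax = plt.axes(projection=ccrs.Robinson())   # a compromise world projection, unlike Mercator
          ax.set_global()
          ax.coastlines(linewidth=0.5)
          ax.scatter(lons, lats, s=10, color="red", transform=ccrs.PlateCarree())
          plt.savefig("stations_map.png", dpi=150, bbox_inches="tight")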

      • Laws of Nature
        Posted Dec 16, 2010 at 4:37 AM | Permalink

        Dear Ross,

        Steve did several plots like that with R (I believe).
        Learning R is one thing, taking a working document and only replacing the dots quite another 🙂

        For what it’s worth, I agree with Pat Frank and bender:
        The tone of Andrew Comrie is very friendly; he should be made aware of this blog, for example, and hopefully he is willing to comment!
        And I too am amazed by December so far!
        I start to wonder if it is possible to save $100 billion somehow 🙂

        All the best
        LoN

      • Posted Dec 16, 2010 at 10:02 PM | Permalink

        Thanks to 2 readers who have supplied me with maps.

  24. AntonyIndia
    Posted Dec 15, 2010 at 10:48 PM | Permalink

    From the IJOC’s website:”The International Journal of Communication is an online, multi-media, academic journal that adheres to the highest standards of peer review and engages established and emerging scholars from anywhere in the world. The International Journal of Communication is an interdisciplinary journal that, while centered in communication, is open and welcoming to contributions from the many disciplines and approaches that meet at the crossroads that is communication study.”

    The above shows their “highest standards of peer review”. They bring down the curtains on themselves. Who wants to publish with such a dubious club, and who trusts what they read there? Another case of ‘defending vested interests’.

  25. AusieDan
    Posted Dec 15, 2010 at 11:22 PM | Permalink

    Pat Frank, Bender & Ross McKitrick
    I agree with Pat Frank completely.

    I have studied both maximum annual temperature and rainfall at a number of individual Australian locations.

    When there is no UHI, temperature and (negative) rainfall anomalies vary together with no sign of long term increase.

    Long term trend in temperature in Sydeny for example was flat from 1866 to 1957. In March 1858 there was a major change in the immediate built environment and from then temperature departed from rainfall and started to rise.

    Ditto Adelaide in 1978, when the thermometer was moved from the edge to the centre of the city.

    I have a draft paper on this and more if anybody is interested. Steve McIntyre has my email address.

  26. AusieDan
    Posted Dec 15, 2010 at 11:26 PM | Permalink

    Typos again, I’m afraid.

    The change in Sydney built environment occurred in March 1958, not 100 years previously.

    And yes, my home town is spelt Sydney.

  27. Richard
    Posted Dec 16, 2010 at 12:39 AM | Permalink

    This whole episode is extraordinary. I am a reviewer for a major international journal in my field (thank God, it’s not Climate Science), so am more than familiar with peer review in my profession. Other close relatives of mine also hold similar positions. I have never in my life come across such a tainted process as Ross has described. This is not science. It’s the mafia.

    • Steve McIntyre
      Posted Dec 16, 2010 at 12:45 AM | Permalink

      Actually there are some even worse incidents. I think that I’ll take up Team peer review in some posts in the next month or so – maybe rattle a few skeletons.

      It’s frustrating that Muir Russell, who was charged with investigating such incidents, failed to do so.

      • Richard
        Posted Dec 16, 2010 at 12:49 AM | Permalink

        All I can say is that I admire the persistence that Ross and you display – if I was confronted with this challenge to publish my papers, I’m sure I wouldn’t have the time to combat it. I can only stand back in awe and congratulate you. I just can’t understand how IJOC can let this happen. It’s pretty clear what is going on here! What is damning for IJOC is that this paper was published elsewhere.

      • Posted Dec 16, 2010 at 4:10 AM | Permalink

        Speaking of “Team peer review” and that which Muir Russell failed to do … as many here are aware, the unsavoury practices in which the Team engaged extended to their involvement in the IPCC.

        There is now a “powerful new research tool” available that will lighten the load for those who wish to examine the foundations on which the IPCC has built its “assessment” in AR4. AccessIPCC provides an annotated version of all 44 Chapters contained in Working Groups 1, 2 & 3 of the IPCC’s Fourth Assessment Report.

        In addition to linking citations within the text directly to the reference (content of which can be seen in advance via tooltip in mouseover), we have “tagged” citations (and some key people) with a variety of parameters.

        There are also a number of summary tables – and we didn’t cherry-pick or torture one byte of data. Andrew Montford has noted that the “overall effect [of AccessIPCC] is to illuminate our understanding of the AR4 process.”

        It might even help in the ‘rattling of skeletons’, Steve 🙂

  28. bender
    Posted Dec 16, 2010 at 12:43 AM | Permalink

    I would invite Mike Comrie to comment, both on the editorial process, and on the fact that the article appears to be sound, and was eventually accepted elsewhere.

    • mondo
      Posted Dec 16, 2010 at 5:45 AM | Permalink

      Bender. I think you mean Andrew Comrie.

      • bender
        Posted Dec 16, 2010 at 8:04 AM | Permalink

        Ok, him too. But Mike Comrie probably knows more about hockey sticks.

  29. Nicolas Nierenberg
    Posted Dec 16, 2010 at 1:57 AM | Permalink

    Gavin has commented about this over at realclimate. [SM – http://www.realclimate.org/index.php/archives/2010/12/responses-to-mcshane-and-wyner/comment-page-1/#comment-194921]

    Since this is likely to come up anyway, there is another conspiracy-laden post at CA with regards to the peer review of a new publication by McKitrick and Nierenberg. This had its genesis in a ‘comment’ on my 2009 paper on spurious correlations in work by McKitrick and Michaels and separately, de Laat and Maurellis. Instead of submitting a comment on my paper, M&N submitted a ‘new’ paper that was in effect simply a comment and specifically asked that I not be a reviewer. I was therefore not chosen as a reviewer (and I have no idea who the reviewers were). Nonetheless, since the submission was so highly related to my paper, and used some of the data I had uploaded as part of my earlier paper, the editor of IJOC asked me to prepare a counter-point to their submission. I did so, and in so doing pointed out a number of problems in the M&N paper (comparing the ensemble mean of the GCM simulations with a single realisation from the real world, and ignoring the fact that the single GCM realisations showed very similar levels of ‘contamination’, misunderstandings of the relationships between model versions, continued use of a flawed experimental design etc.). I had no further connection to the review process and at no time did I communicate directly to the reviewers.

    The counter-point I submitted was fair and to the point (though critical), and in no way constituted any kind of improper influence. Editors make decisions about who reviews what paper – not authors, and they make the decisions about what gets accepted or not, not reviewers. Authors who seek to escape knowledgeable scrutiny of their work often come up with lists of people who they claim are unable to give a fair review, and editors need to use their discretion in assessing whether this is a genuine issue, or simply an attempted end run around the review process.

    I have not yet seen the ‘new’ M&N paper, but it is very likely to be more of same attempts to rescue a flawed analysis. It should be noted that the main objection to my 2009 paper was that I didn’t show that the residuals from McKitrick’s regression contained auto-correlation. This could have been usefully added (and can be seen here), and in any case was admitted by McKitrick in yet another flawed paper on the topic earlier this year. The overwhelming reason why McKitrick is wrong though is because he is using an incorrect null hypothesis to judge the significance of his results. A much more relevant null is whether the real data exhibit patterns related to economic activity that do not occur in single realisations of climate models, and this is something he has singularly failed to do in any of these iterations.

    • bender
      Posted Dec 16, 2010 at 8:08 AM | Permalink

      Gavin, Gavin. None of the issues you point to affects the substance of their argument. They’re right; you’re wrong. Get over your silly mistake about spatial autocorrelation. Digging your heels in ever deeper makes us all wonder about what the rest of your science is like.

      • Mikael Lönnroth
        Posted Dec 16, 2010 at 8:53 AM | Permalink

        If this blog post is to be taken as part of the argument about the process, though, then either Gavin is not telling the truth or McKitrick & McIntyre have come to unfounded conclusions about collusion?

        • kim
          Posted Dec 16, 2010 at 9:22 AM | Permalink

          Andrew Comrie and Gavin Schmidt appear to contradict each other over the issue of the process of choosing purview, inadequately excused by the nuance of ‘counterpoint’.
          ========

        • bender
          Posted Dec 16, 2010 at 9:42 AM | Permalink

          It’s not that black & white. Read the thread. Comrie made a promise that he sorta kinda did not keep. But really he kinda did. But not really really. Besides, he wasn’t obliged to.

          Ok. It’s not black-and-white like you make it out to be.

        • Steve McIntyre
          Posted Dec 16, 2010 at 9:55 AM | Permalink

          Re: Mikael Lönnroth (Dec 16 08:53),
          If you parse the words, Comrie said:

          Some of the 3rd party reviewers have requested more background on the exchange, including views from Schmidt and other related submissions, so Schmidt has been asked to write responses.

          It appears that the reviewers wanted to know what Gavin thought before expressing their own opinions and that Comrie was the one who solicited comments from Gavin. Gavin explains this on the basis that the McKitrick-Nierenberg paper was a sort of comment on Schmidt. But Schmidt 2009 was equally a comment on McKitrick and Michaels and, in that case, the reviewers did not seek out McKitrick’s views before rendering a decision on Schmidt. Nor was McKitrick given an opportunity to examine Schmidt’s code prior to acceptance of Schmidt 2009. We have Phil Jones’ review to show the difference between the pal review that Schmidt received and the adversarial review that McKitrick and Nierenberg received.

          If Gavin the International Mystery Man is blind to the differences, then that says more about the Team than about the validity of the discrimination.

        • bender
          Posted Dec 16, 2010 at 10:02 AM | Permalink

          But it’s ok to discriminate, as long as it’s only against crackpots.

        • Posted Dec 17, 2010 at 12:42 AM | Permalink

          Or as long as it’s indiscriminate.

        • Posted Dec 16, 2010 at 12:01 PM | Permalink

          Gavin said:

          since the submission was so highly related to my paper, and used some of the data I had uploaded as part of my earlier paper, the editor of IJOC asked me to prepare a counter-point to their submission

          Gavin is speculating about why he was asked to contribute a response. Comrie gave me a different reason: he said it was at the request of the other reviewers. I have no idea if that is true or not.

          Gavin’s stated rationale would have equally applied to his paper commenting on my work and that of de Laat and Maurellis, but we obviously weren’t invited to provide a counter-point for that paper. And in that case the IJOC asked Phil Jones to be a referee, and they had to have known he was far from neutral, especially when they saw the puffball review he provided.

          In the cover letter to IJOC I said

          Although he was responding to my paper, and he used my data and my methodology to make his argument, neither I nor my coauthor were invited to referee or comment on the paper. I therefore request the same consideration, namely that you not ask Gavin Schmidt to referee my paper.

          Gavin then hints that this was an attempt at an end-run: “Authors who seek to escape knowledgeable scrutiny of their work often come up with lists of people who they claim are unable to give a fair review.” However, since he then goes on to say

          I have not yet seen the ‘new’ M&N paper, but it is very likely to be more of the same attempts to rescue a flawed analysis.

          I’d say we were right to claim he could not give a fair review.

          As for his other points, complete tabulations of spatial autocorrelation tests on the dependent variable and residuals are shown in the paper. He’s the one who built an argument on the issue without conducting (or at least reporting) any tests. And he seems to be advocating a Texas Sharpshooter approach to the issue. If, out of the hundreds of GCM runs that could be procured, you can find one that exhibits correlations between socioeconomic patterns and warming patterns, that proves the pattern is spurious everywhere it is observed. Well, no. Even if the process is totally random, you might get 5% of the runs exhibiting a spurious pattern. Our paper presents a range of null hypotheses that covers all the reasonable statements of the problem, namely whether real-world observations exhibit the contamination pattern in question.

        • bender
          Posted Dec 16, 2010 at 12:08 PM | Permalink

          If, out of the hundreds of GCM runs that could be procured, you can find one that exhibits correlations between socioeconomic patterns and warming patterns, that proves the pattern is spurious everywhere it is observed. Well, no. Even if the process is totally random, you might get 5% of the runs exhibiting a spurious pattern.

          Surely to god Gavin understands this.

        • Kenneth Fritsch
          Posted Dec 16, 2010 at 12:39 PM | Permalink

          Gavin Schmidt reminds me of a politician who appears to be knowledgeable on the subject at hand and then comes forth with an answer like this one of Schmidt’s. You do not know whether you overestimated his knowledge base or whether he is simply giving you a “political” answer. Some Team members do have a rather unique way of replying to criticisms.

        • bender
          Posted Dec 16, 2010 at 12:36 PM | Permalink

          It’s called a “permutation test of significance”. If Gavin really has run a large number of simulations then use this as a denominator. Then count the number of runs with correlations equalling or exceeding those of McKitrick’s and use this as the numerator. That ratio is a direct estimate of the probability of Ross uncovering this relationship by random chance alone (sketched below).

          Should McKitrick have done this himself? IMO, yes. OTOH not every reviewer would insist on that approach as a condition of publication.

          The point is: if Gavin has those numbers – and he says he does – then why doesn’t he reveal them? Why does he dance around saying that McKitrick used an incorrect test of significance? Let’s see ’em.

          Bottom line: My bet is that no more than 10 in 1000 yield correlations as high as McKitrick’s. Implying a 1% chance he’s wrong. I am tempted to guess 1 in 1000.
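          A minimal sketch in Python of the permutation-style count bender describes, using made-up stand-in numbers rather than anything from Schmidt’s or McKitrick’s archives: fit the same regression to each individual model run, then ask how often a run yields a coefficient of the same sign and at least the magnitude of the one estimated on observations.

          ```python
          # Hypothetical illustration only: random stand-ins, not the actual trend fields
          # or socioeconomic variables used in the papers under discussion.
          import numpy as np

          rng = np.random.default_rng(0)

          def slope(y, x):
              """OLS slope of y on x, with an intercept."""
              X = np.column_stack([np.ones_like(x), x])
              beta, *_ = np.linalg.lstsq(X, y, rcond=None)
              return beta[1]

          n_cells, n_runs = 440, 1000                    # grid cells and model runs (illustrative)
          socio = rng.normal(size=n_cells)               # stand-in socioeconomic predictor
          obs = 0.5 * socio + rng.normal(size=n_cells)   # "observed" trend field
          runs = rng.normal(size=(n_runs, n_cells))      # model-run trend fields (no contamination)

          b_obs = slope(obs, socio)
          b_runs = np.array([slope(r, socio) for r in runs])

          # Numerator: runs whose coefficient has the observed sign and at least its magnitude.
          hits = (np.sign(b_runs) == np.sign(b_obs)) & (np.abs(b_runs) >= np.abs(b_obs))
          print(f"observed slope {b_obs:.3f}; empirical p = {hits.mean():.4f}")
          ```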

        • Posted Dec 16, 2010 at 12:57 PM | Permalink

          There’s more to it than that. You have to get the signs right. If the model generates significant correlations with the opposite sign as the observed data, that increases the significance of the model-data mismatch. And to motivate such a test you have to explain why the GCM outputs should be considered appropriate to generate the benchmark stochastic term for the observations, rather than the error term in the observations themselves. Especially since the SAC tests show that the GCMs have a different spatial dependence structure than observations. Put another way, you could come up with obviously nonsensical models to generate the numerator in the above test, which would let you get any critical value you want. There has to be a statistical argument why your benchmarking process corresponds to the null hypothesis. Our regression model provides this automatically, but the GCM Wheel of Fortune approach does not.

          Anyway, if this really was Gavin’s argument, I can attest that the referees made no reference to it in their rejection of the paper.

          The thing is, Gavin proposed the 5 GISS-E runs as a benchmark. His argument fails on his own terms, so it won’t do for him now to appeal to some large speculative population of GCM runs that might or might not exist, and if they do, they might or might not yield the set of coefficients he’s after. His model runs generated coefficients that bore no resemblance to the results on observed data. Even leaving aside the fact that he failed to correct the SAC in his own regressions on the GISS-E data, I don’t understand how he could claim that he demonstrated our results are spurious when he could not replicate them on his artificial data set.

        • bender
          Posted Dec 16, 2010 at 1:02 PM | Permalink

          I referred to magnitude of correlations, not significance levels:

          the number of runs with correlations equalling or exceeding those of McKitrick’s

          so in my example I’m also insisting that we get the signs right.

        • bender
          Posted Dec 16, 2010 at 1:07 PM | Permalink

          His model runs generated coefficients that bore no resemblance to the results on observed data.

          Yes, I know. That’s why I want to see 1000 unsnooped runs. To prevent him from snooping 5 new ones that might be more favorable to his case.

          His latest argument is bogus, so why not invite him to show us the numbers?

          And after he’s done that maybe he can talk to us about Imhoff et al 2010, and whether their estimate of UHI is compatible with Jones et al (1990).

        • Bernie
          Posted Dec 16, 2010 at 1:15 PM | Permalink

          Surely Gavin has mis-specified the entire issue. As I read it, Gavin is claiming that physical climate forcings as captured in the GCMs provide such a great explanation of the observed temperature trends that a model that includes the same physical climate forcings plus socio-economic factors would provide no better explanation of the temperature trend. Wouldn’t the fact that a certain run of a GCM explains so much of the variability in the temperature trends that there is no unexplained variance, and therefore nothing for the socio-economic factors to explain, mean that that particular GCM must be the Holy Grail of GCMs? Surely this argues that existing GCMs have a far greater explanatory power than has heretofore seemed to be the case, given the variation among GCMs.

        • Posted Dec 16, 2010 at 1:48 PM | Permalink

          Nail/Head. I have been working for a few months with another coauthor on this very issue. We are using all the GCM runs currently available, evaluating every combination numerically possible (2^55) alongside the socioeconomic and circulation index data I had previously assembled, to see what variables really explain the spatial trend pattern. We got stalled because it is difficult to deal with the requirement that the GCMs have to yield positive correlations with the observations — the issue bender and I noted above. Inequality constraints in this kind of model are difficult to formulate.

        • Bernie
          Posted Dec 16, 2010 at 10:36 PM | Permalink

          Ross:
          It seems that Gavin is arguing that since enough monkeys sitting at enough typewriters for long enough might type “To be or not to be”, then Shakespeare clearly (a) didn’t write Hamlet or (b) Shakespeare was not a great writer or (c) Shakespeare was a monkey or (d) any combination of (a), (b) and (c). His logic is mystifying.

          Surely, given the newly emerging size of UHI effects, Occam would argue that you control for the obvious UHI effects (whatever their source) before looking at the more complex climate factors with unknown feedback effects. Jones et al’s efforts to treat UHI as minimal increasingly look like an excuse not to include such socio-economic measures in their models.

        • oneuniverse
          Posted Dec 17, 2010 at 1:12 PM | Permalink

          Thanks Bernie!

          Gavin trundles on though eg. #46: “Deciding that a spatial pattern that is correlated to a ‘socio-economic’ variable is causative, requires an understanding of what the distribution of that pattern is under a null hypothesis of no ‘contamination’. GCMs can produce such a distribution (albeit imperfectly), and so should be used for the null.”

          Would the authors consider leaving a reply at RC ?

        • bender
          Posted Dec 17, 2010 at 7:33 PM | Permalink

          Gavin can’t possibly imagine a mechanism whereby socioeconomic variables stand in as a proxy for something like, say, umm, people doing industrial things in urban areas?

          Not afflicted with the malady of thought, I guess.

          Here’s a clue, Gav, if you have such a hard time thinking of a mechanism, why not drop Ross an email and you guys talk about it like real people?

        • bender
          Posted Dec 17, 2010 at 7:57 PM | Permalink

          McKitrick: Interesting, there are spatial autocorrelation patterns in global climate data that match spatial autocorrelation patterns in socioeconomic data.

          Schmidt: So what, my models produce that kind of spatial autocorrelation patterning with no socioeconomic variables. Therefore your correlation is spurious.

          McKitrick: But the patterns in the climate models don’t match the patterns in the socioeconomic data, despite them both being spatially autocorrelated. Therefore the correlation I observe in real data is probably not spurious.

          Schmidt: It probably is spurious because the spatial autocorrelation is so high that you don’t have as many effective degrees of freedom as you think you do.

          McKitrick: Yes, but there is no spatial autocorrelation in the model residuals, and that’s the issue. It means I have a well-specified model, which means whatever those socioeconomic variables are proxying, the relationship is pretty strong.

          Schmidt: Yeah, well, you haven’t pinpointed the mechanism.

          Mosher: It’s true that some of Ross’s variables are pretty flakey.

          Imhoff: Say, maybe these UHI effects are stronger than we think.

          Long: And maybe their impact is being smeared into the rural data.

          Mosher: I know how we can estimate those effects and how to correct for them.

          Jones: We already know the magnitude of UHI and it’s so small it hardly needs correcting.

          Gavin: We already know the magnitude of UHI and it’s so small it hardly needs correcting.

          Team chorus: We already know the magnitude of UHI and it’s so small it hardly needs correcting. No one shall publish anything in climate science without our say-so.

          You tell me who’s being productive in moving us forward and who’s being obstructionist and holding us back.

        • Craig Loehle
          Posted Dec 16, 2010 at 5:02 PM | Permalink

          This type of defence is also used for the GCMs. To the charge that the GCMs do not reproduce the PDO and/or mid-Century warm bump, it is asserted that sometimes they do show a decadal excursion. A similar type of false defense.

        • Neil Fisher
          Posted Dec 17, 2010 at 3:05 AM | Permalink

          I always thought it would be interesting to know if any of these “stochastic decadal excursions” in the model runs ever lasted significantly longer than a decade or so. By the “Gavin logic” described above, if even one of those runs showed a century long excursion, then it is not beyond the realm of possibility that the real realisation of climate we have experienced was simply one such excursion itself. Of course, such model runs would no doubt be thrown out as “unrealistic”…

    • Salamano
      Posted Dec 16, 2010 at 9:20 AM | Permalink

      This is a posting replying to Stephen Mosher on RealClimate–

      44.The overwhelming reason why McKitrick is wrong though is because he is using an incorrect null hypothesis to judge the significance of his results. A much more relevant null is whether the real data exhibit patterns to economic activity that do not occur in single realisations of climate models, and this is something he has singularly failed to do in any of these iterations.

      #####
      That’s a good point Gavin. With his code in hand could you do that?

      [Response: Yes, and I have for each iteration I have looked at. And McKitrick’s conclusions fail to hold up every time. There is a limit to how many times I’m going to do it again. Anyone else could do so themselves using the archived data for Schmidt (2009) or data from IPCC AR4. Note that this requires looking at individual runs, not ensemble means. – gavin]

      What is the ‘incorrect’ null hypothesis (or, judging from his later comments, perhaps ‘irrelevant’ null is a better construct?). Which conclusions are failing to hold up? Seems like an exercise more than a few folks could examine.

      • bender
        Posted Dec 16, 2010 at 9:48 AM | Permalink

        McKitrick’s conclusions fail to hold up every time.

        Ahh, there’s that pea again. If the significance levels change when using individual runs as opposed to ensemble averages, then I wouldn’t exactly call that “conclusions” that “fail to hold up”. See, the details matter here. So no one should be asking Gavin if he COULD do the analysis. They should be asking him to show his evidence – the way McKitrick did.

        But over at RC they’ve got the sheep correctly programmed to NOT demand such evidence.

      • bender
        Posted Dec 16, 2010 at 12:44 PM | Permalink

        There is a limit to how many times I’m going to do it again.

        Sounds like a possible pre-emptive dodge. That “limit” needs to be about 1000-10000 in order to have a well-resolved permutation test of significance. Ask Gavin exactly how many runs he’s done and of those how many produced correlations equal to or exceeding McKitrick’s. Copy and paste my text if you like. If he refuses then, because he’s already snooped the possibilities, you know what the answer is: Ross right, Gavin wrong.

  30. Geoff Sherrington
    Posted Dec 16, 2010 at 6:01 AM | Permalink

    For Pat Frank,

    Here is a set of statistical tables and graphs for 16 Australian “rural” (and probably truly rural) stations for the last 40 years. If you can tease it out, standard deviations around 0.5 on annual temperatures are not uncommon, but I fear to comment on cause because I have not conducted causal sub-experiments. Rawer data available on request.

    http://www.geoffstuff.com/SI%20GRAPHS%20AND%20STATS.doc

    Do chemists like you and me really have such antics before a publication is accepted? I think not, though I am out of the modern game play.

    • Pat Frank
      Posted Dec 16, 2010 at 1:20 PM | Permalink

      Hi Geoff — the data I used for my paper was a comparison between the field temperature readings of standard sensor/screen combinations, and measurements of a high-precision temperature sensor that was mostly immune to the uncontrolled variables of wind speed and solar loading. All the data were taken under the same field conditions at the same time.

      That comparison allowed one to judge the validity of the sensor measurements against a “true” measurement. The data one gets is a bias frequency of the sensor measurements relative to the “true” value, and that gives the variance due to uncontrolled variables.

      Unfortunately, one always has to have a measure of the “true” value — the true air temperature in this case — in order to evaluate the systematic error hidden in the station sensor temperature data.

      The raw temperature data stream from a surface station doesn’t have that information, so it’s impossible to know the magnitude of the systematic error hidden in the numbers.

      Your question about Chemistry is relevant to the topic here. Never, in my experience, have I seen anything at all in Chemistry like the circumstances Ross experienced. Even when Pons and Fleischmann were involved in the cold fusion fiasco, and things got politicized, I don’t recall any reports or complaints of gate keeping going on during the contentious debate on the issue.

      On the other hand, I’ve now published two papers on climate — this one and the Skeptic paper. Each time, one reviewer accused me of dishonesty; something that has never happened to me in about 60 Chemistry publications and a couple of controversies. And I’ve never heard any colleagues complain of like accusations or gate-keeping.

      Climate science is uniquely poisoned, and it seems to me deliberately so through the active collusion of some scientists. That the poison has spread so widely as to have intoxicated multiple journal editors to the point of blithe dishonesty is simply astonishing. It seems clear that normative morality in climate science has gone the way of Hannah Arendt’s analysis, where evil is unrecognized because it is become contextually banal. What’s also astonishing is how easily and readily these people were moved.

      • Geoff Sherrington
        Posted Dec 16, 2010 at 6:52 PM | Permalink

        Pat, understood. I was giving some “different” SDs that show an order of magnitude for comparison. Used to own an analytical chem lab so I know exactly what you mean, hence my comment on not having looked at causation.

        It’s hard to be dishonest in Chemistry. The results are often so easy to replicate when correct.

        The more this develops, the more I fault the Learned Institutions for not enforcing guidelines, rules of conduct, etc. If this trend spreads to extreme consequences, we will arrive at a day when our medical specialists increasingly lack professionalism, and that is a personal worry.

        • Pat Frank
          Posted Dec 17, 2010 at 1:16 PM | Permalink

          Geoff, it occurred to me later that in mentioning “causal sub-experiments” you meant basic instrumental uncertainties.

          Thanks for Jane Warne’s report. I’d actually found it in my earlier data searches on the web. But it didn’t present data that I could use. It shows temperatures from various setups relative to a reference Stevenson screen, which is a worthy study if one wants to estimate the necessary correction to account for instrumental changes.

          But what I really needed are measured temperatures relative to a high precision standard sensor that is relatively impervious to solar loading and wind speed. That’s what Hubbard and Lin reported, and that’s what allowed me to estimate a lower limit of systematic error in the record from standard climate stations.

          I really agree with your worry. Methodological standards are everything in science. If they go for reasons of incompetence, or are jettisoned for reasons of ideology, the whole of modern society goes with it along with liberal freedom.

        • Brooks Hurd
          Posted Dec 18, 2010 at 9:01 AM | Permalink

          Geoff and Pat, I have often wondered whether “team” climate scientists would be willing to fly on airliners designed and built by people who made use of the team’s selective reasoning and their disregard for analyses which lead to conclusions other than their own.

      • AnyColourYouLike
        Posted Dec 16, 2010 at 8:51 PM | Permalink

        Pat Frank

        “Your question about Chemistry is relevant to the topic here. Never, in my experience, have I seen anything at all in Chemistry like the circumstances Ross experienced. Even when Pons and Fleischmann were involved in the cold fusion fiasco, and things got politicized, I don’t recall any reports or complaints of gate keeping going on during the contentious debate on the issue.

        On the other hand, I’ve now published two papers on climate — this one and the Skeptic paper. Each time, one reviewer accused me of dishonesty; something that has never happened to me in about 60 Chemistry publications and a couple of controversies. And I’ve never heard any colleagues complain of like accusations or gate-keeping.

        Climate science is uniquely poisoned, and it seems to me deliberately so through the active collusion of some scientists. That the poison has spread so widely as to have intoxicated multiple journal editors to the point of blithe dishonesty is simply astonishing. It seems clear that normative morality in climate science has gone the way of Hannah Arendt’s analysis, where evil is unrecognized because it is become contextually banal. What’s also astonishing is how easily and readily these people were moved.”

        Pat, I just want to say that I think this is really beautifully put, and gets to the nub of the “crazy-making” antics in this field, which leave many of us open-mouthed with astonishment, though probably some combat veterans here may have become merely numb and inured. As a relative newcomer, I’m still at the shocked stage: it really is a very bizarre situation!

        • Pat Frank
          Posted Dec 17, 2010 at 1:19 PM | Permalink

          Thanks, I agree with your astonishment. I’d never have believed such things could happen, especially in physics, except that I’ve now seen it with my own eyes.

      • Posted Dec 17, 2010 at 4:41 AM | Permalink

        I’ve now published two papers on climate — this one and the Skeptic paper. Each time, one reviewer accused me of dishonesty; something that has never happened to me in about 60 Chemistry publications and a couple of controversies. And I’ve never heard any colleagues complain of like accusations or gate-keeping.

        Your testimony is vital Pat because, unlike Ross, your previous publications were in science. Are you the first such cross-over person willing to be critical of the much-feted ‘consensus’? As others have said, thank you – for the years given to this, out of love for true science.

        • Pat Frank
          Posted Dec 17, 2010 at 1:35 PM | Permalink

          Willie Soon and Sallie Baliunas are among the earliest scientists to take notice and publish work critical of what was happening to climate science. They have received endless villainous criticism, to the point that it’s my understanding that Sallie didn’t want to experience any more and left off. Willie has carried on, and has continued to suffer attacks on his character as a scientist.

          But remember that Frederick Seitz, a prominent physicist and past president of the US NAS, was attacked and defamed when, as far back as 1996, he protested the tendentious climatological bowdlerization of the IPCC’s 2AR. So, I’d guess that scientists external to climate science have been protesting the circumstances as they’ve come to their own realization of the corrosion of climate science and especially of the pernicious impact of that corrosion on science itself and on society. And of course, those within the field who protested and persisted have been under continuous defamatory assault. Pat Michaels and Fred Singer are prominent examples.

        • Posted Dec 17, 2010 at 4:09 PM | Permalink

          Thanks. I’ve always taken Soon and Baliunas (as well as Michaels and Singer) as inside the climate science tent. Seitz is new to me. I’ll look into it. But the vicious nature of the attacks is for me proto-totalitarian. Not the real thing, not yet, because the political systems hold the worst excesses at bay. Your reaching for Arendt says the same thing. Thank you again for standing up to it. (As for Steve, what can one say? Immense.)

        • Posted Dec 18, 2010 at 3:34 AM | Permalink

          Re: Pat Frank (Dec 17 13:35), Tim Ball is another. His Wikipedia punishment is to simply have his whole article deleted. Another He Who Must Not Be Named.

      • oneuniverse
        Posted Dec 19, 2010 at 7:52 PM | Permalink

        Pat Frank: Even when Pons and Fleischmann were involved in the cold fusion fiasco, and things got politicized, I don’t recall any reports or complaints of gate keeping going on during the contentious debate on the issue.

        Hi Pat, there were complaints. eg. from Julian Schwinger :

        “My first attempt at publication, for the record, was a total disaster. “Cold Fusion: A Hypothesis” was written to suggest several critical experiments, which is the function of hypothesis. The masked reviewers, to a person, ignored that, and complained that I had not proved the underlying assumptions. Has the knowledge that physics is an experimental science been totally lost?

        “The paper was submitted, in August 1989, to Physical Review Letters. I anticipated that PRL would have some difficulty with what had become a very controversial subject, but I felt an obligation to give them the first chance. What I had not expected–as I wrote in my subsequent letter of resignation from the American Physical Society–was contempt.”
        (“Cold Fusion : A Brief History of Mine”, J. Schwinger 1994)

        “The pressure for conformity is enormous. I have experienced it in editors’ rejection of submitted papers, based on venomous criticism of anonymous referees. The replacement of impartial reviewing by censorship will be the death of science.”
        (from a talk given in Japan on fellow Nobel Laureate Tomonaga’s centennial birthday)

  31. kim
    Posted Dec 16, 2010 at 9:09 AM | Permalink

    Hang ’em high. Oops, they’re already hoist on their own retard.
    ===============

    • Brian H
      Posted Dec 17, 2010 at 8:58 AM | Permalink

      No hanging involved! A “petard” is a sapper’s mine, used to blow down castle walls. Premature detonation “hoists” the layer indeed, possibly in several discrete segments.

      • bender
        Posted Dec 18, 2010 at 5:07 PM | Permalink

        If Brian H read the blog he would know that (1) the hoisting of petards has already been thoroughly discussed (no pedagogy required), and (2) lecturing kim on just about anything is laughable.

        • kim
          Posted Dec 19, 2010 at 3:46 AM | Permalink

          Heh, I’m grooming him for succession.
          ============

  32. Kenneth Fritsch
    Posted Dec 16, 2010 at 11:26 AM | Permalink

    From reading McKitrick’s paper it would appear to me that Gavin Schmidt assumed that spatial autocorrelation in a series carries over to the residuals that would be used in determining the degrees of freedom for a statistical estimate (such as the one used by McKitrick in his analysis), or else that Schmidt erroneously used the wrong autocorrelation.

    I do not read RC very often, so I do not know whether this question would be answered by Schmidt. In the first case it would appear that Schmidt was lazy and in the second that he has incomplete knowledge of the statistical processes he was criticizing.

    Did not Steve M point to a similar problem of using the autocorrelation of a series versus the residuals of that series in another dispute with the Team?

    • geronimo
      Posted Dec 16, 2010 at 2:40 PM | Permalink

      Gavin is quick to describe anyone who isn’t a climate scientist as a “citizen scientist”, with the contempt of a high priest for the hoi polloi. What has struck me is that quite a large number of the climate science community are “citizen statisticians”, with an incomplete knowledge of the statistical methods needed to separate signal from noise.

      • Steve McIntyre
        Posted Dec 16, 2010 at 3:01 PM | Permalink

        I think that “amateur statistician” is more apt.

      • Posted Dec 16, 2010 at 3:21 PM | Permalink

        Re: geronimo (Dec 16 14:40),

        While Steve’s definition could make sense, “amateur” often has the following meaning: a person who engages in some art, science, sport, etc. for the pleasure of it rather than for money, i.e. the lover of some activity. However, when I read Gavin Schmidt’s current defense of his statistical skill, I think more in terms of “The Rape of the Lock” (Alexander Pope; see Wikipedia if you crave a succinct explanation), and I hope that this does not break the bounds of propriety.

      • Brian H
        Posted Dec 17, 2010 at 9:03 AM | Permalink

        Climate scientists [ -snip]. Their obstinate DIY clannishness excludes all interference by professionals in the dozens of subspecialties they claim to meld perfectly, uniquely, and exclusively.

        Bah.

        [RomanM: No need to be abusive.]

    • bender
      Posted Dec 16, 2010 at 2:59 PM | Permalink

      Schmidt assumed that a spatial auto correlation of a series leads to the same auto correlation for the residuals that would be used in determining the degrees of freedom for a statistical estimation

      He assumed, correctly, that if a model were mis-specified this is exactly what would happen. But he did not comment on the fact that this did NOT happen in Ross’s analysis.

      That it didn’t happen indicates Ross probably has a well-specified model. And I can only guess that Gavin didn’t want to advertise that too loudly.

      Gavin’s complaint about individual runs versus ensemble averages in significance testing is half-right – so he’s not as ignorant as some suggest. However he does fail to tell the whole story, suggesting it would be too much work to do all the runs required, and that someone else can do it as easily as he can. The reason it’s too much work is the payoff: the result would not be in his favor.
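      As a rough illustration (synthetic data and a hand-rolled Moran’s I, not anything from the papers) of the specification check being discussed: compare spatial autocorrelation in the dependent variable with that in the regression residuals. An autocorrelated dependent variable alongside roughly uncorrelated residuals is what a well-specified model should produce.

      ```python
      # Synthetic example: y is spatially autocorrelated because its predictor is
      # spatially smooth, but the residuals of the correct regression are not.
      import numpy as np

      rng = np.random.default_rng(1)
      n = 200
      coords = rng.uniform(0, 10, size=(n, 2))          # hypothetical station locations

      # Inverse-distance weights, zero diagonal, row-standardised.
      d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
      w = np.zeros_like(d)
      w[d > 0] = 1.0 / d[d > 0]
      w = w / w.sum(axis=1, keepdims=True)

      def morans_i(z, w):
          """Moran's I: (n / sum(w)) * (z'Wz / z'z) for the centred vector z."""
          z = z - z.mean()
          return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

      x = np.sin(coords[:, 0]) + np.cos(coords[:, 1])   # spatially smooth predictor
      y = 2.0 * x + rng.normal(scale=0.5, size=n)
      X = np.column_stack([np.ones(n), x])
      beta, *_ = np.linalg.lstsq(X, y, rcond=None)
      resid = y - X @ beta

      print("Moran's I (y):        ", round(morans_i(y, w), 3))
      print("Moran's I (residuals):", round(morans_i(resid, w), 3))
      ```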

      • Pat Frank
        Posted Dec 16, 2010 at 6:21 PM | Permalink

        Let’s see . . . is that called ‘a lie of omission’?

        • bender
          Posted Dec 16, 2010 at 7:20 PM | Permalink

          I don’t like these kinds of comments because it kind of forces me to reply and clarify and protest over putting words in my mouth.

          I don’t know what Gavin knows. I can only guess at what he might have figured out. I can’t say that he isn’t disclosing the whole truth. I can only guess that that might be the case.

          I invite him to come here and discuss.

        • Pat Frank
          Posted Dec 16, 2010 at 8:08 PM | Permalink

          bender: “But [Gavin] did not comment on the fact that this did NOT happen in Ross’s analysis.

          “That it didn’t happen indicates Ross probably has a well-specified model. And I can only guess that Gavin didn’t want to advertise that too loudly.

          🙂

  33. zinfan94
    Posted Dec 16, 2010 at 8:29 PM | Permalink

    I am not sure I understand why this isn’t simply sour grapes. Let’s lay out the facts:

    – McKitrick and Nierenberg tried to publish a rebuttal to Schmidt 2009 and submitted their paper to the International Journal of Climatology.
    – McKitrick asked the editor that Dr. Schmidt not be a referee, and the editor agreed.
    – One of the referees asked for Dr. Schmidt’s comments on the paper, which a referee is allowed to do.
    – The result of the comments exposed problems in the paper that couldn’t be repaired to the satisfaction of the editor at IJOC. Essentially, McKitrick and Nierenberg couldn’t address the shortcomings well enough to satisfy the editor… They got their day in court, their “debate”, whatever you want to call it, and lost. They were unable to convince the editor.
    – Now some version of the paper (that Schmidt hasn’t even seen) will be published in Journal of Economic and Social Measurement.

    Looks to me like McKitrick and Nierenberg lost the “game” fair and square, and now have come home to their friends at CA to complain about the bad referees- they look like a couple of sore losers.

    • Posted Dec 16, 2010 at 9:08 PM | Permalink

      One of the referees asked for Dr. Schmidt’s comments on the paper, which a referee is allowed to do.

      No, this actually is improper. A referee is not supposed to show a paper to others. If he or she feels unable, on technical grounds, to provide a report, the proper thing to do is to decline to act as reviewer, not to contact someone else for help, and certainly not the one person most unsuitable under the circumstances to inject neutral advice. As I pointed out to the editor, there were no issues at stake that went beyond those published in the previous papers or explained in the published code archives, so there was no reason to ask Gavin to provide backchannel input. If the referee didn’t have the technical depth to do the review, they should have declined.

      the comments exposed problems in the paper that couldn’t be repaired to the satisfaction of the editor

      Actually none of the referees made any reference to the points Gavin submitted, so you have no basis to say this. More to the point, we were never shown Gavin’s comments, so I have no idea if any problems were exposed.

      McKitrick and Nierenberg couldn’t address the shortcomings well enough to satisfy the editor.

      We were never shown Gavin’s comments, let alone asked to address any of the issues raised by him or any of the referees. There was never a “debate” as you call it.

      Plenty of papers, including good papers, get rejected. That’s life in academia. That is not the point of this thread. The point here is to illustrate the asymmetry in treatment of papers. Schmidt submitted a comment on my work, I was not asked to referee it nor invited to respond, and the paper was given a light review by Phil Jones who had a direct personal interest in seeing Schmidt’s paper published. I submitted a comment on Schmidt’s paper and the editor allowed Schmidt to provide rebuttal material to the reviewers, which I was never shown, after promising that Schmidt would not be asked to act as a reviewer. This is not sour grapes, it is a legitimate complaint about bias in the process.

      • Luis Dias
        Posted Dec 16, 2010 at 9:31 PM | Permalink

        I think this is pretty obvious, don’t understand where the confusion comes from…

      • glacierman
        Posted Dec 17, 2010 at 8:38 AM | Permalink

        It looks like the whole thing was a delay tactic. No issues were ever identified.

      • Richard
        Posted Dec 18, 2010 at 1:36 AM | Permalink

        Ross is completely correct here. It is definitely an improper way for an editor or referees to behave. I’ve never encountered behaviour like this. Further, common sense would suggest that it would not engender open scientific enquiry and would only encourage the sort of statements and behaviour from Gavin quoted above. It would appear that Gavin has learnt nothing from Climategate.

    • bender
      Posted Dec 17, 2010 at 2:35 AM | Permalink

      Too funny. Where’s willard peacemaker to set the record straight?

    • TAC
      Posted Dec 17, 2010 at 5:24 AM | Permalink

      FWIW, I agree with Luis: This is pretty obvious. Did you really mistake “asymmetry” for “sour grapes”? Do you know the meaning of these terms?

      I am not trying to be mean — just wondering why communication across the “climate divide” is often so difficult.

    • kuhnkat
      Posted Dec 17, 2010 at 11:05 AM | Permalink

      “- Now some version of the paper (that Schmidt hasn’t even seen) will be published in Journal of Economic and Social Measurement.”

      Your statement presupposes that Schmidt can have something useful to say. Care to prove this?

  34. Barclay E MacDonald
    Posted Dec 16, 2010 at 10:02 PM | Permalink

    Zinfan94

    “The result of the comments exposed problems in the paper that couldn’t be repaired to the satisfaction of the editor at IJOC.”

    Did you just make this up?

    Specifically, what “problems” were “exposed” to the editor (as opposed to ones that existed only in Gavin’s mind)? Second, what is your evidence that repairs were proposed to the editor by Dr. McKitrick prior to rejection? What were the proposed repairs prior to rejection? What exactly did the editor say in response to the proposed repairs, prior to rejection?

    And third, specifically, what evidence do you have that the comment below from Dr. McKitrick’s letter of February 18, 2010 to the editor was in any way a mis-statement?
    “We have received your response dated February 8, 2010 rejecting our manuscript. In your cover letter you indicated that your aim was to provide a fair and thorough review process. However, the result of the process, bearing in mind the context created by the publication of Schmidt’s paper in 2009, and the fact that you did not give us the opportunity to respond to any of the referee comments, did not end up being fair to us.
    The third referee supported consideration of a revised version, and raised points that could be addressed in a revision, so we will focus on the first two referees’ comments.”

    I look forward to your response.

  35. Steven Mosher
    Posted Dec 17, 2010 at 1:19 AM | Permalink

    You guys should note that this is part of the “new” tactics. Rapid reaction. Now, things will get interesting. For the rapid reaction to work they need code. It’s even better if they can get on the review (by hook or crook).

    Expect them to put out rapid reaction pieces, muddy the waters, shut down the threads, and move on.

    • Luis Dias
      Posted Dec 17, 2010 at 11:30 AM | Permalink

      This is somewhat funny in itself, if it wasn’t sad.

      In real terms, this means that the “skeptic” McKitrick papers will be “legitimate” in the literature for 1/6 or 1/10 of the time that the responses to them will. 9/10 of the time, Gavin at RC will claim the “peer-review” high ground of having McKitrick “falsified and debunked”, and only has to handwave and behave as if little is happening in 1/10th of the time.

      This is of course a highly truth relativistic analysis I’m making. I’m only making this analysis assuming that neither party will concede their points and that both parties will have their points published.

      Obvious asymmetry is obvious (as the kids say in the internetz).

  36. Steven Mosher
    Posted Dec 17, 2010 at 1:25 AM | Permalink

    Ok, OT. But I listened to this guy today

    http://paul.kedrosky.com/archives/2009/07/dragon-kings_vs.html

  37. Steven Mosher
    Posted Dec 17, 2010 at 1:48 AM | Permalink

    OT, Bender you might like this, steve u too

    http://arxiv.org/abs/0907.4290

  38. Posted Dec 17, 2010 at 10:32 AM | Permalink

    It still speaks volumes about the integrity of skeptics and proponents that skeptics provide data and methods when asked. No FOI required.

    • Luis Dias
      Posted Dec 17, 2010 at 11:34 AM | Permalink

      It works against honest people, though.

      As Anthony Trollope, in “Barchester Towers” said,

      “Wise people, when they are in the wrong, always put themselves right by finding fault with the people against whom they have sinned. . . . A man in the right relies easily on his rectitude, and therefore goes about unarmed. His very strength is his weakness. A man in the wrong knows that he must look to his weapons; his very weakness is his strength. The one is never prepared for combat, the other is always ready. Therefore it is that in this world the man that is in the wrong almost invariably conquers the man that is in the right, and invariably despises him.”

      From Stephen Budiansky, here:

      http://budiansky.blogspot.com/2010/12/you-can-always-beat-honest-man.html#ixzz18O5qeAT6

      • Posted Dec 18, 2010 at 4:43 AM | Permalink

        Re: Luis Dias (Dec 17 11:34), Trollope reads better IMHO if one substitutes “cunning people” for “wise people”. The tenacity of people here, as well as the continuing success of Shakespeare’s plays, is all evidence that though cunning may buy time, as with IJoC, people want integrity to prevail.

  39. Steven Mosher
    Posted Dec 17, 2010 at 10:48 AM | Permalink

    What happened to the RC thread?

    • bender
      Posted Dec 17, 2010 at 11:46 AM | Permalink

      Which thread? A thread at RC? Or a thread on RC?

      • RuhRoh
        Posted Dec 17, 2010 at 12:53 PM | Permalink

        I can help with image processing. What is your website?
        RR

        • Steven Mosher
          Posted Dec 17, 2010 at 1:18 PM | Permalink

          Re: RuhRoh (Dec 17 12:53),

          http://stevemosher.wordpress.com/

          leave me a comment, your email address will show up for me.

          It’s a big project. I think Imhoff started to work on it, but it looks like it was never finished. Suffice to say, the one scientist I pointed to it (he works in the field) was totally unaware that this had been done. Anyway, an interesting bit of imagery history.

  40. Steven Mosher
    Posted Dec 17, 2010 at 10:55 AM | Permalink

    “Can I ask you something in CONFIDENCE – don’t email around, especially not to Keith and Tim here. Have you reviewed any papers recently for Science that say that MBH98 and MJ03 have underestimated variability in the millennial record – from models or from some low-freq proxy data. Just a yes or no will do. Tim is reviewing them – I want to make sure he takes my comments on board, but he wants to be squeaky clean with discussing them with others. So forget this email when you reply.
    Cheers
    Phil”

    • Luis Dias
      Posted Dec 17, 2010 at 11:36 AM | Permalink

      tsc tsc, the man has a big depression, why won’t you leave the poor man alone? Look at him, so cute, like a sad teddy bear who’s getting a beating from skeptic bullies. Shame on you!

      • Steven Mosher
        Posted Dec 17, 2010 at 1:10 PM | Permalink

        Re: Luis Dias (Dec 17 11:36),

        You miss the point. My point is this: most people believe that sharing a paper you are asked to review is not proper protocol. If journals want to make it proper they should. What the mail shows is that Osborn knew what squeaky clean means. Now it appears that journal editors are going to support a different approach to review. Personally I have no problem with hostile self-interested reviewers. Does that shock you? Steve and I disagree on this. What I would object to is changing rules without notice and changing standards based on the content of the article. The point of the mail is NOT what Jones said. The point of the mail is Osborn’s understanding of the process.

        But in general they are going down a path of redefining peer reviewed literature. Let that process be open and fair and rigorous so that it produces the best science and not just the science that supports our conclusions. I happen to think that Ross has got some flakey predictors, so my gut reaction is that there is something “wrong” with his approach. But I certainly would say it merits publication and does not merit special treatment just because I happen to think (gut feel) that there is something amiss with it. It’s only because “they” have determined that “no skeptics publish peer reviewed lit” that they feel the need to keep these kinds of papers out of the “canon”.

        • Artifex
          Posted Dec 17, 2010 at 1:24 PM | Permalink

          Personally I have no problem with hostile self interested reviewers. Does that shock you?

          No, but… hostile, biased reviewers coupled with anonymous review is a poor formula. The best weapon we have against these slimy and dishonest practices is a bright light and ridicule. The reason Steve is so effective and so disliked by the realclimate scientists is that holding their actions up to a spotlight and saying “look at this” has a lot of convincing power.

          I also have no problems with hostile review, but if they have the right to biased review, I hold that I have just as much right to break the anonymity of the review and expose those who make inane and ideological comments to public ridicule.

        • Scott B.
          Posted Dec 17, 2010 at 4:52 PM | Permalink

          It does not shock me, as hostile, self-interested reviewers should provide the best critiques. But there need to be rules regarding these critiques, to make sure they’re legitimate. Not just comments designed to muddy the waters, delay the publication process, or anything else not directly in pursuit of the science. And those rules need to be followed.

        • Steven Mosher
          Posted Dec 18, 2010 at 11:10 PM | Permalink

          I’m not gunna make up rules for journals. I’m no journal expert. I can judge that the process seems different for Ryan and Ross, and can argue that the journals should have a consistent process. But even the best process is only as good as the people in it. Hence the necessity of personal attacks at times.

          Like the dude at AGU from Science who stood up and argued that they had a great data archiving policy. Enforcement? Oh crap, you don’t expect us to enforce our policy.

        • bender
          Posted Dec 18, 2010 at 11:12 PM | Permalink

          heh heh. politician.

        • Steven Mosher
          Posted Dec 19, 2010 at 1:17 AM | Permalink

          Re: bender (Dec 18 23:12), Bender, check out the presentations from the uncertainty in Paleo section on Friday, 1340 (maybe Steve Mc can get them).

          There was a fascinating problem in the whole pseudo-proxy/GCM discussion, where the GCM hindcast and the proxy were of opposite signs. Wahl and some other guy both pressed on the point.

          The downscaling work was also cool.

          Also, by using the GCM as a prior, one team could construct S/N fields, and another could say that proxies in certain areas were uninformative. All very “cutting edge”: one team had built an emulator of the GCM to give finer results.

          And ya, one guy did a nice paper on the dangers of wiggle matching.

          And Jones made a plea for more winter proxies, as they are more informative. He also did some highlights from Brohan 2010. It appears the bucket problem will be addressed…

          JEG and company did a paper on reconstructing El Nino. Also, there is a problem with GCMs and observations:

          1. In tropics in a GCM, warmer = more precip = higher salinity.
          2. In observations, warmer = more precip = rain = lower sea surface salinity.

  41. Barclay E MacDonald
    Posted Dec 17, 2010 at 5:15 PM | Permalink

    Peer review should move science forward each time. That is the correct objective. Saying that it need only move science forward ultimately is unsatisfactory.

  42. Noblesse Oblige
    Posted Dec 17, 2010 at 7:45 PM | Permalink

    No thoughtful person can subscribe to the notion that peer review in climate is a valid process. Even without the forces running loose in the climate world, peer review is a questionable undertaking, rife with bias. The social psychologists have known this for decades and have even researched the role of bias. See for example the classic paper by Mahoney http://www.mang.canterbury.ac.nz/writing_guide/review/mahoney.shtml

    In climate it has become nothing more than a defense mechanism for the dogma to hide behind. It needs to go if we are ever going to see the light of day in this mess. Even single-point decisions by editors would be better than what we have.

    • bender
      Posted Dec 17, 2010 at 8:11 PM | Permalink

      You can’t get away from bias. That’s NOT the problem here. As long as the biases are independent they tend to cancel out.

      No, the problem here is non-independence of review resulting from a concerted conspiracy to share criticisms and subvert the maintenance of independence of thought. The reason for independence is precisely to thwart the emergence of groupthink – which, left unchecked, will always arise in response to everyday business forces of self-interest.

      Mosher just showed us: Osborn knew this. And Jones knew this. That’s why “squeaky clean” is such a high-entropy word.

      Climategatekeeping is not about bias. It’s about subversion of independence of thought. It’s about *propagating* bias. It’s about the spread of memes. It’s about propaganda masquerading as science.

      Comrie failed to provide the independence to which McKitrick was entitled. Three independent repudiations of McKitrick would have been far, far more powerful than one group-review led by Schmidt.

      McKitrick was right to take his manuscript elsewhere.

      • Steven Mosher
        Posted Dec 19, 2010 at 2:11 PM | Permalink

        hehe,

        Glad you remembered the high entropy aspect of this. At some point I think I need to go back to the mails to detail what I’m calling the “thin green line”. If Climategate is an example of noble cause corruption, then we should expect to see two things: an erosion of ethical behavior toward outsiders and an increase in rules surrounding external communication – getting the story straight, things you don’t say. Nobody is allowed to cross the thin green line. The “new” AGU has decided that its principal goal will be helping members “get the word out.” Do you think they will help Judith Curry get her word out? I doubt it. In the mails Mann pleaded for help; he portrayed himself as fighting a lonely battle. Jones joined him in that struggle. Going forward it’s clear that the professional institutions will join the struggle. The Union of Concerned Scientists has, and other associations are. The journals are, with their editorials. They are all adopting Mann’s world view. Increasingly there will be no place for the scientist who just wants to do science, free from the entanglements of the sweaty public. There will be questions you can’t ask, doubts you can’t raise, and free inquiry will gradually be replaced by science in the furtherance of the dominant political paradigm. And no one inside the green line will be able to see it. And if by chance they do see the matrix they won’t be able to raise their hand and say they disagree. That personal value will have no standing.

        • EdeF
          Posted Dec 19, 2010 at 8:27 PM | Permalink

          When I was about 6 or 7 I saw a map of the world and instantly recognized that Africa fit up against S. America, and Europe with North America. (Not sure what I thought about Oz.) Looking at the satellite images on UHI and looking at some of the temperature data for several city-countryside pairs, it’s equally obvious what is going on with UHI. It is time for the team to admit UHI contamination, come up with better processes to deal with it in the CRUTEM-like databases, and move on. This stonewalling is childish.

    • Ryan O
      Posted Dec 17, 2010 at 9:37 PM | Permalink

      Fantastic article. The question most people should ask after reading it is, “Did we really need to do an experiment to know that this is true?”

      Human nature is human nature. Simply because one person is a “scientist” does not mean he is magically unaffected by human misperceptions and cognitive frailties.

      This is what makes the adamant defense of peer-review processes (and not just in climate science) such a counterproductive activity. The peer-review process needs fixing, but to fix it, the practitioners must first admit it is broken.

      • Posted Dec 18, 2010 at 1:23 PM | Permalink

        IMHO, as a complete layperson with no scientific training, peer review is simply a cursory examination of a paper. If there’s nothing obviously major wrong with the paper, then it’s good to go. Such review does NOT mean the paper’s conclusions are valid. It’s simply a first, quick examination of a submitted paper. Replication is the real test, and if it’s replication by only your buddies, then it’s probably biased.

    • Posted Dec 18, 2010 at 1:20 PM | Permalink

      The social psychologists have known this for decades and have even researched the role of bias.

      Was that research peer-reviewed?

      *ducks and runs*

  43. Salamano
    Posted Dec 17, 2010 at 9:41 PM | Permalink

    This may be OT, but it is part of the RC discussion on this same topic…

    Malcolm Hughes chimed in with something that I was wondering whether or not was common knowledge:

    “As the author responsible for suggesting the selection criteria for tree-ring data in the Mann et al papers (1998, 1999, 2000, 2008, 2009 etc) I can tell you why the particular criteria were used, and why there is a clear basis for them. It is important to note that the requirement for at least 8 series was coupled with the criterion that there be a mean correlation equal to or greater than 0.5 between the individual series at one site that were combined to produce the site chronology. Wigley et al (1984) derived and tested a statistic they called ‘Expressed Population Signal’ (EPS) that has since been referred in many, many publications (444 by December 17 2010 according to the ISI Web Of Knowledge). They wrote ‘time series are averaged to enhance a common underlying signal or combined to produce area averages. How well, then, does the average of a finite number (N) of time series represent the population average…?’. To calculate EPS you need N and the mean correlation between the N series (rbar). In FORTRAN terms it is given by N*rbar/(1+(N-1)*rbar). If you write a simple MS-Excel formula you can calculate EPS for various values of rbar and N. Setting rbar as 0.5 shows EPS rising steeply up to 0.89 at N ~ 8, and then yielding very little increase in EPS for each additional series. By the way, the Wigley et al. (1984) paper includes not only testing of this and another statistic in real-life use, but also includes a formal derivation of them. Of course, as with any statistic, EPS is a guide to judgment and the assumptions on which it is based must be borne in mind. Given how much attention has been given to the problem of replication of tree-ring data in the published literature, as witnessed by the frequent citing of the Wigley et al (1984) paper, McShane and Wyner’s rejoinder reveals a distinct lack of familiarity with the most basic material on which they chose to pronounce.
    Reference:
    Wigley et al. 1984. Journal of Climate and Applied Meteorology, 23, 201-203.

    Comment by Malcolm Hughes — 17 December 2010 @ 12:18 PM
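
    As a quick check of the formula Hughes quotes, here is a minimal sketch in Python (our own illustration, not code from any of the cited papers; the function name eps is ours). At rbar = 0.5 it reproduces the saturation he describes, reaching about 0.89 at N = 8:

    # EPS as quoted: EPS = N*rbar / (1 + (N-1)*rbar)
    def eps(n, rbar):
        """Expressed Population Signal for n series with mean inter-series correlation rbar."""
        return n * rbar / (1 + (n - 1) * rbar)

    for n in range(2, 13):
        print(n, round(eps(n, 0.5), 3))   # at rbar = 0.5, EPS reaches ~0.89 by n = 8 and then flattens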

    • HAS
      Posted Dec 17, 2010 at 11:54 PM | Permalink

      Wigley et al is available here: http://journals.ametsoc.org/doi/pdf/10.1175/1520-0450%281984%29023%3C0201%3AOTAVOC%3E2.0.CO%3B2

      • Gerald Machnee
        Posted Dec 18, 2010 at 1:33 AM | Permalink

        Any further explanations of “Expressed Population Signal” (EPS)?

        • Steven Mosher
          Posted Dec 18, 2010 at 1:47 AM | Permalink

          Re: Gerald Machnee (Dec 18 01:33), Today Jones said that EPS should be above .85.

          I suggest folks go back and look at Yamal and calculate EPS there..

          Yes, Jones was at AGU, as were Malcom Wahl and John Mashey.

          The last speaker was tremendous: a statistician who talked about “tricks”. She said the word about 4-5 times. Coincidentally, at that time Mashey looked over at the Hughes/Wahl gaggle and smiled. Don’t know what to make of that. But her presentation was great, especially in talking about forward modelling of proxies. I wish bender had been there.

    • bender
      Posted Dec 18, 2010 at 2:37 AM | Permalink

      Good thing those bcps have high EPS 😉

  44. pete
    Posted Dec 17, 2010 at 10:20 PM | Permalink

    Section 2.2 combines some very serious errors.

    1: Ensemble means are not comparable to realised observations. Your method needs to be applied to the individual model runs. The paragraph you quote from S09 mentions “plenty of internal variability”. When you average over model runs you remove this internal variability.

    2: You’ve confused the “confidence interval estimated on model generated data” with the distribution of coefficients estimated on multiple realisations of model data. This is related to point 1 — obviously you can’t get a distribution from an ensemble if you apply the test to the ensemble mean instead of individual runs.

    3: You’re looking for “significant coefficients of the same approximate size and sign”, whereas the proper test would be to check that no more than 5% of the simulated coefficients exceed the (absolute) magnitude of the observed coefficients. I think that in econometrics lingo you’d describe this as calculating a response surface.
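
    To see the force of point 1, here is a toy sketch with entirely synthetic numbers (not model output) of how averaging over runs suppresses exactly the internal variability the test is supposed to confront:

    # Toy illustration: averaging runs shrinks run-to-run internal variability.
    import numpy as np

    rng = np.random.default_rng(0)
    n_runs, n_cells = 20, 400
    forced = 0.2                                                 # common "forced" trend (arbitrary units)
    runs = forced + rng.normal(0.0, 0.25, (n_runs, n_cells))     # each run adds its own "weather" noise

    print("spread across cells, single run:", runs[0].std())
    print("spread across cells, ensemble mean:", runs.mean(axis=0).std())   # roughly 0.25 / sqrt(20)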

    • Ryan O
      Posted Dec 17, 2010 at 11:15 PM | Permalink

      Ross,

      I am curious about this as well, though I think Gavin’s use of the models as the null is the wrong null. The model data and observational data do not display the same properties, so I’m not sure how one could justify models as a null. Regardless, using the individual model runs rather than ensemble means seems the right way to do the analysis to me.

      Also, I’m not sure that requiring the nulls (whatever they are) to display the same sign as the correlations in the observational data is correct, either. While I think it would be correct to penalize based on the null showing the same sign as the observational data for one factor but opposite for another factor (as there is a physical constraint that the factors should be consistent with each other), I do not see how one could dismiss a case of a model showing the same magnitude of correlation for each of the factors simply because all of the signs are opposite.

      • bender
        Posted Dec 18, 2010 at 2:52 AM | Permalink

        The null was decided when Gavin declared that his models produce SAC patterns not unlike the patterns Ross was pointing to in the real data. The null hypothesis is neither correct nor incorrect; it is just what Gavin asserted. Ross merely took his cue from Gavin. Basically had to re-do Gavin’s work because Gavin did it wrong.

        Pete has said nothing here that I haven’t said already. It is true that one should look at individual runs to test Gavin’s hypothesis – not ensembles. However, when you do that I’m fairly confident that what you will find will not support Gavin’s contention that the models produce patterns not unlike those in the socioeconomic data. They likely produce more randomized patterns that are quite unlike the socioeconomic data. And if the correlations are negative then that’s even stronger proof for Ross’s argument, and against Gavin’s. So testing on absolute values of correlations is wrongheaded.

        Why are we arguing with each other and picking holes in past papers when the obvious solution is for Gavin to put up or shut up? Show us the runs & correlation stats. I want to see 1000 of them.

        • Ryan O
          Posted Dec 18, 2010 at 11:07 AM | Permalink

          Bender,

          First, the null hypothesis must represent a possible state for the population in which the postulated effect does not exist. One then determines the likelihood that the apparent effect in the real data could be due to a random realization of the null. If the models do not have the same statistical properties as the real data, then they cannot represent a random realization of the same population of data from which the real observations are drawn. If so, they are not an appropriate null regardless of whether someone used them as such in the past.

          The question about whether models are an appropriate null is an important one, as they continue to be used to test the fidelity of proxy reconstructions . . . and apparently have been accepted as a valid method to test SAC patterns. I do not think they are suitable for either use. It is not an argument against M&N; it is an additional argument against Gavin.

          Second, we seem to agree that the appropriate method is to test using individual models rather than ensembles (though, like you, I doubt the results would change).

          Third, I disagree on the issue of the correlation sign, at least from a statistical standpoint. When determining the statistical significance of a trend line, for example, realizations with both signs are used to determine the p-value. I do not see why that is not the case here.

          However, if there is a physical constraint that any postulated relationship between SAC and temperature patterns must have a certain sign, then the existence of correlations of the opposite sign in the models give additional evidence that the models are the wrong null, because they produce patterns that are not physically possible in the real data.

          Lastly, your implication that the goal is to “pick holes in past papers” is a bit misplaced. Until the question is definitively answered from all possible angles, the “UHI is not significant” meme will continue to be sung. What I am curious about are things that Gavin et al. might continue to point to in order to keep the meme alive. If M&N actually do answer these questions, that would be fantastic. If they do not (and I don’t believe they do), then it still gives Gavin the opportunity to blather on about the insignificance of UHI.

          While it should be Gavin who has to prove both that models are correct nulls and that he finds the same socioeconomic patterns in the models, real life is far from ideal. At the moment, there seems to be a group of people who can freely make claims without substantiating them, and these claims are taken as fact until the evidence against them is overwhelming. Regardless of whether this is right, it is fact. If one wants the no-UHI meme to disappear, then one must play by these rather unfair rules. Such is life.

        • bender
          Posted Dec 18, 2010 at 1:47 PM | Permalink

          Speaking of misplaced points, I’m willing to debate you on most of these issues. I agree about choice of null in general; I assure you I don’t need lecturing there. Entire volumes have been written on choice of appropriate nulls. But I’m not interested in discussing how science should be done *in general*. I’m interested in explaining to people why Gavin is wrong both in fact and in process, and why Ross is right.

          I will move off this point (my friend, willard) when there is recognition of error. Not sooner. This is how science proceeds. You show some class and admit when you’re wrong.

          Speaking of which, it is very classy of Ryan O to not argue with team supporters on the import of his recent paper. Their whitewash stinks – pretending Ryan O has not refuted their claims when in fact he has shaken them into the dustbin.

        • Ryan O
          Posted Dec 18, 2010 at 2:39 PM | Permalink

          Bender,

          I’m not sure that there is a whole lot we have to debate, except for perhaps the sign issue. My only point there is that requiring the correct sign is a physical – not statistical – constraint, and is more appropriately used to discard invalid nulls. I think we both agree that getting the sign right is important . . . but we seem to advocate different ways of dealing with it. Both ways, however, indicate that Gavin is wrong.

          Our differences, I think, are because we are looking at this from different perspectives. In my case, I believe that M&N are entirely correct. However, what I believe does not matter. As long as there is any wiggle room for the no-UHI meme, the meme will persist. From this perspective, it does not matter that Gavin has the logical burden of proof. What matters is whether he has any room to repeat the meme without demonstrating complete intellectual bankruptcy.

          Your perspective, I gather, is that M&N have demonstrated that significant UHI contamination of the temperature record is statistically very likely. You also know that the counter-arguments are unlikely to change this. You want Gavin et al. to acknowledge they have the burden of proof, and produce said proof forthwith (or forever hold their peace).

          I agree that your way is the way it should work, but how it actually works seems to be quite different. In other words, rather than having the debate settled based on the other side admitting they cannot provide the requisite evidence to back up their claims, the debate can only be settled by showing preemptively that the requisite information does not – and cannot – exist.

        • bender
          Posted Dec 18, 2010 at 4:46 PM | Permalink

          The reason that, in this specific case, we count only the correlations exceeding McKitrick’s (and not significance levels, or, what is the same, absolute values of correlations), i.e. a one-tailed rather than a two-tailed test, is the hypothesis put forth by Gavin: ‘the patterns in the real climate data can be explained by my model just as well as by your socioeconomic data’. So it’s a one-tailed comparison between model & data, because your null hypothesis is a strong positive correlation (at least as strong as between climate data and socioeconomic data), not a zero correlation. Strong negative correlations between observations and climate model runs would certainly be interesting to know about; and they should count against Gavin, not for him. That is why the one-tailed test (counting only those correlations that exceed Ross’s) is more appropriate.

          On choice of null model. If Gavin’s climate model is a poor choice of null (because the distributions it produces don’t match those of observed data), well that’s Gavin’s problem. It’s his hypothesis that the physical models are more awesome than any flakey socioeconomic data. I agree that in general it makes sense to use a model where means and distributions are equivalent, if you’ve got that choice. But that wasn’t the statement Gavin made. That it’s a weak null is a direct consequence of his quick mouth and eagerness to prove Ross wrong. (Think, Gavin, next time, and you won’t have to back-pedal so far and so fast. How the hell did your trash get through peer review, anyways? Oh, wait, the process you guys use only *appears* “squeaky clean”.)

          And sure, when someone refuses to admit his error, it forces you to do his work. Again. And (as with Steig) your choice is not just to pound him into the ground with yet another reply, but do something more constructive, e.g. a methods paper using a more rigorous null. That approach makes sense and it’s that approach that got you praise from Weaver. “Don’t snipe; innovate.”

          Me, I’m here to snipe. It’s what I do. Apologies to willard.
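
          For concreteness, here is a rough sketch of the one-tailed count described above, with placeholder numbers standing in for the per-run correlations (nothing below is taken from S09 or M&N):

          # One-tailed comparison: fraction of individual model runs whose correlation with the
          # socioeconomic variables is at least as strong (and of the same sign) as the observed one.
          import numpy as np

          observed_corr = 0.45                                            # hypothetical observed correlation
          run_corrs = np.random.default_rng(1).normal(0.0, 0.1, 1000)     # hypothetical per-run correlations
          print("one-tailed p:", np.mean(run_corrs >= observed_corr))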

        • Ryan O
          Posted Dec 18, 2010 at 7:36 PM | Permalink

          Bender,

          Got it. I am slow sometimes. 😉

        • pete
          Posted Dec 18, 2010 at 7:37 PM | Permalink

          Gavin: ‘the patterns in the real climate data can be explained by my model just as well as by your socioeconomic data’

          This isn’t quite right. The claim is: ‘your socioeconomic data explains the patterns in my model data just as well as it explains the real data.’

          i.e. if the socioeconomic data can explain the a priori unrelated model data, then Ross’s explanation for the observed data may also be spurious.

        • bender
          Posted Dec 19, 2010 at 1:02 AM | Permalink

          If I were you, I would ask Ross about that.
          The socioeconomic data don’t explain the model output. And the reason is because the model’s UHIs are too weak. And the reason they’re too weak is because they assume so. And the reason they assume so is because of Phil Jones’ unique approach to science, and the solidarity of the team. What do you make of Imhoff et al 2010?

        • Steven Mosher
          Posted Dec 19, 2010 at 2:21 PM | Permalink

          I also posted a link to a recent study of Uccles.

          Going back to 1883, using historical maps and an urban energy balance model, the team estimated that 50% of the warming was the result of UHI.

          Every city will of course be different. But it strikes me that we are in a position to use urban energy balance models to estimate what the temperature would have been like had the urban growth not occurred. Or, since the science takes note of these models in urban planning, they can hardly object to using them in rural reconstructions. If the science were open and free, some uninspired grad student would be given this project by Professor Bender: “here’s an interesting problem and a new-fangled way of looking at it; there you have your dissertation topic.”

    • HAS
      Posted Dec 18, 2010 at 12:27 AM | Permalink

      I would just note that Schmidt 2009, to which M&N is a response, happily reported results for the ensemble of GISS-ER and AMIPf8 runs with no caveats at the time. As I noted over at RC, it would be useful if Gavin released the comments he made when asked by the IJC editor.

      An additional point to note is that M&N in replicating the analysis by Schmidt did note of the data that:

      “The average GISS-E land surface trend is 0.20 °C/decade, well below the reported trend of 0.30 °C/decade in the CRU3v compilation. The range of trends over land in the all-GCM mean is 0.07 to 0.56 °C/decade with a mean of 0.23 °C/decade, putting the CRU3v data in the upper half of the model spread. The standard deviations of the ensemble mean modeled trends are much smaller (one-third or less) than those in the observational data. This is not because the trends are averaged across multiple model runs, instead the trends in individual model runs have very small standard deviations to begin with.”

      On the face of it one would have to say it’s worth doing some more work here.

      • Ryan O
        Posted Dec 18, 2010 at 12:33 AM | Permalink

        Yes. I’ve noticed the same things in the models. That’s why I do not think models are a very good null at all. However, I do think Pete has a point . . . even though Gavin did the same thing.

        • oneuniverse
          Posted Dec 19, 2010 at 7:43 PM | Permalink

          Also, are we certain that UHI contamination does not affect the models? According to RealClimate’s FAQ:

          # Are climate models just a fit to the trend in the global temperature data?

          No. Much of the confusion concerning this point comes from a misunderstanding stemming from the point above. Model development actually does not use the trend data in tuning (see below). Instead, modellers work to improve the climatology of the model (the fit to the average conditions), and it’s intrinsic variability (such as the frequency and amplitude of tropical variability). The resulting model is pretty much used ‘as is’ in hindcast experiments for the 20th Century.

          Trends aside, climatological measurements must be used to determine ‘average conditions’ and ‘intrinsic variability’. Can we confirm that no datasets potentially affected by UHI (or other urban effects) are used?

        • bender
          Posted Dec 20, 2010 at 10:54 AM | Permalink

          One must be careful to watch “the pea under the thimble”, as Steve likes to say. (This kind of red herring is served up all over “skeptical science” attempts to refute the skeptics.)

          1. Reasonable skeptics do not assert that climate models are “just a fit” to GMT. There’s a ton of physics in the models, a lot of it correct with parameters estimated with high precision. However, there are some parameters that are not estimated precisely via independent experimentation – and the quality of a model is determined, not by its average strength, but by the strength of its weakest link. Enter the problem of “tuning”. Moist convection has many tuned parameters. (How many is a fine question for Gavin.) Because it is a sub-gridcell process, an empirical parameterization is necessary; it is not described at the level of the physical process of cloud formation. (Which is why Lindzen and Spencer are relevant.)

          2. Reasonable skeptics do not assert that it’s the *GCM* tuning that is the most critical problem. It’s the correlative task of determining the various forcings by solar, volcanoes, aerosols, black carbon, GHG, etc., an exercise that does not involve the GCMs. This is not a statistical exercise, but, again, an ad hoc tuning. And what do you think they are tuning that model to? A surface record that includes UHI contamination? Well, ask Gavin; he’s a straight-shooter.

          It’s easy to refute invented strawmen. As we see here, the “refutation” is more often a dismissal than a genuine attempt to engage on the subject. Back in the real world: it’s much harder to refute a rational skeptic.

          Finally, why are there spelling errors in their FAQ? Do they not scrutinize their own writing? That might explain some things.

        • oneuniverse
          Posted Dec 22, 2010 at 12:57 PM | Permalink

          Enter the problem of “tuning”. Moist convection has many tuned parameters. (How many is a fine question for Gavin.)

          I asked Gavin a few questions about ModelE tuning at Collide-a-skape’s “Curry Agonistes” thread, which he graciously answered. He doesn’t mention moist convection. My questions reproduced verbatim, Gavin’s answers quoted:

          – Are threshold relative humidities of water and ice clouds, and the gravity wave drag parameters the only parameters in the GISS ModelE tuned according to a consideration of their emergent effects?

          – What are the other four or so tunable parameters in the emergent phenomena scale category, as mentioned in the RC FAQ?

          In terms of tuning at the global level, we have three main cloud related parameters we play with (to get a reasonable global albedo, high cloud amount and radiative balance). For stratospheric circulation (which impacts strat-trop exchange, high latitude sea level pressure fields), we tune the GW drag. In the new models with interactive aerosol effects, there are a couple of parameters associated with effective radii that can be important. There are some rather broad constraints on these values from observations, but there is still a lot of scope for variations that have important effects. Pretty much everything else is set for local/process reasons. This means that our ability to globally tune to fit anything other than gross features of the climatology is impossible. We cannot tune for climate sensitivity for instance, even if we wanted to, nor can I force any particular metric to suddenly match much better to observations (I wish!). One thing to note is that differently-tuned versions of one model are almost always more similar to each other than they are to a different model, or even their AR4 ancestor.

          – Given these tunable parameters, what is the tunable range of the radiative balance ?

          – What is the resultant tunable range of the model’s equilibrium temperature?

          We are able to tune the radiative balance by a couple of W/m2 perhaps, the albedo by a 1% etc. It can’t make a model that is hopelessly wrong suddenly work well. As for the model global mean surface temperature, we do not specifically tune for that, but we usually end up with values between 13 and 15 deg C.

        • bender
          Posted Dec 20, 2010 at 11:39 AM | Permalink

          For more on tuning GCMs read Kiehl (2007):

          Kiehl (2007) on Tuning GCMs

        • oneuniverse
          Posted Dec 21, 2010 at 6:22 PM | Permalink

          Thanks bender, Kiehl 2008, what a result. Using Google to find citing papers, I came across a paper that appears to have been written in response, “Why are climate models reproducing the observed global surface warming so well?” (Reto Knutti, GRL 2008), which contains some notable comments about modelling practices :

          Models differ because of their underlying assumptions and parameterizations, and it is plausible that choices are made based on the model’s ability to simulate observed trends.

          Models, therefore, simulate similar warming for different reasons, and it is unlikely that this effect would appear randomly. While it is impossible to know what decisions are made in the development process of each model, it seems plausible that choices are made based on agreement with observations as to what parameterizations are used, what forcing datasets are selected, or whether an uncertain forcing (e.g., mineral dust, land use change) or feedback (indirect aerosol effect) is incorporated or not.

          Such a manner of selecting forcing datasets would be wrong, but Reto Knutti strangely gives all this the green light:

          The model development process is always open to influence, conscious or unconscious, from the participants’ knowledge of the observed changes. It is therefore neither surprising nor problematic that the simulated and observed trends in global temperature are in good agreement.

        • oneuniverse
          Posted Dec 21, 2010 at 6:25 PM | Permalink

          “Kiehl 2007”, not 2008..

        • oneuniverse
          Posted Dec 21, 2010 at 6:43 PM | Permalink

          While it is impossible to know what decisions are made in the development process of each model

          Impossible?

        • bender
          Posted Dec 21, 2010 at 6:56 PM | Permalink

          i.e.
          Impossible, based simply on reading a paper or model documentation.
          Impossible, simply by scanning model code.
          Impossible, without insight into the model development process.
          Impossible, without rather deeper investigation than is common to undertake.

          i.e. Possible.

        • Steven Mosher
          Posted Dec 22, 2010 at 12:18 AM | Permalink

          Re: bender (Dec 21 18:56),

          Watch Tim Palmer’s presentation from AGU. He gets climategate, uncertainty, and the silliness of stamping one’s foot about your grandchildren. If you need a link, holler.

        • oneuniverse
          Posted Dec 22, 2010 at 9:37 AM | Permalink

          Knutti 2008 follows Kiehl 2007 in finding sensitivity/forcing correlation for some more recent models:

          It is shown that while climate sensitivity and radiative forcing are indeed correlated across the latest ensemble of models, eliminating this correlation would not strongly change the uncertainty range of long-term temperature projections.

          The correlation is for CMIP3 models; however, the long-term temperature projection appears to be from the Bern2.5D model (not considered in Kiehl 2007). I didn’t spot any explanation of why the Bern model was a suitable replacement for the CMIP3 models, so I don’t see Knutti’s result concerning the small change in the long-term uncertainty of the Bern model as a mitigating argument against Kiehl’s and Knutti’s own results for the sensitivity/forcing correlation.

          Even if the Bern2.5D model was shown to be substitutable for the CMIP3 models, or if CMIP3 models had been used and had produced a similar long-term result, it still wouldn’t be a mitigating argument. The similarity of the presented 20th c. results for different forcings & parametrisations of different models remains eyebrow-raising.

      • pete
        Posted Dec 18, 2010 at 1:10 AM | Permalink

        instead the trends in individual model runs have very small standard deviations to begin with.

        I get standard deviations between 0.15 and 0.46 for the trend fields from the IPCC model runs (cf 0.26 for the CRU data).

        Schmidt 2009 … happily reported results for the ensemble of GISS-ER and AMIPf8 runs with no caveats at the time.

        Gavin reported both individual and ensemble mean results.

        Note that he says “significant correlations exist for the economic variables even with the ensemble mean” (emphasis mine). He’s publishing in a climate journal, so he can assume the audience knows the difference between an ensemble and the individual runs.

        • HAS
          Posted Dec 18, 2010 at 4:23 PM | Permalink

          Pete

          Which dataset were you using – the ones in M&N’s SI? If not are there differences?

          BTW the use of ensemble or individual runs is an interesting question.

          First, I’d note in passing that central tendency is generally taught in basic statistics courses, so it’s not just climate scientists who would understand the issue.

          The substantive question here is what these runs represent.

          If you take the view that each is a separate attempt to model the earth’s climate over the period in question, then the ensemble is likely to be a better estimate of the world’s climate than the individual runs, and the reduction in variance is simply an artifact of the fact that we have better information. If the hypothesis being tested is that the real world controlled to eliminate contamination shows contamination, then using your best estimate of the real world without contamination makes sense.

          Using an ensemble shouldn’t raise any eyebrows either – this is how climate models are regularly used in practice: multiple runs are used to derive better estimates of what’s happening in the real world.

          Not only are climate models “ensembled” to give better estimates, all the other series being used are derived from ensembles of e.g. weather station measurements and this doesn’t cause people too much grief. (In all cases there is the question of whether the simple mean is the appropriate estimate from what are not necessarily independent samples, but let’s put that Pandora’s box aside.)

          The alternative view being espoused by Schmidt is that each individual run stands alone: “A much more relevant null is whether the real data exhibit patterns to economic activity that do not occur in single realisations of climate models, and this is something he has singularly failed to do in any of these iterations.”

          Some have interpreted Schmidt to be saying that a probabilistic approach is required – run 1000 models and if 95% showed no contamination then you could put your money on M&N.

          But on a strict interpretation Schmidt is saying that if there exists one realisation of a climate model that exhibits contamination then M&N’s case is disproven. He makes this explicit later on in RC: “Classical statistics … by convention uses a 95% cutoff, so if you found that this pattern occurred less than one time in 20 runs you might start to think that there was something to be explained. However, it still would not prove that there was contamination – perhaps the real world was just one of those times.”

          This suggests Schmidt sees climate models as modeling what happens on a wide range of possible earths, of which our actual experience is just a particular case (and presumably there exists a model run that will match experience exactly – it’s just that it hasn’t been run yet).

          The problem with this line of reasoning is that unless you have a view about the relationship between the set of individual model runs that have been done so far and the run that represents reality, climate models are limited in what they can tell us about the real world (or the “real world run” in Schmidt’s terminology).

          In fact climate science makes quite detailed claims about the relationship between individual runs and reality. Paramount amongst these (in some quarters at least) is the claim that the individual runs, with appropriate forcings, tell us something about the “real world run”. So the “real world run” is being claimed to be dependent on the particular runs we have happened to have done.

          The problem for Schmidt in the extreme is a logical one. If the “real world run” contains some significant “contamination” that no other run contains (or that only a limited number of runs contain), then until he finds that run we can have no confidence that other climate model runs tell us anything about an important aspect of reality.

          All of which IMHO is sophistry, of course.

          Models are an attempt to describe something that exists independently of the model, namely observed phenomena; multiple model runs are routinely used to improve the estimates of those phenomena from the models; and if the best estimates from those models fail to give good estimates of the phenomena then either there is something wrong with the models (and this will limit the inferences you can make with them) or there’s something wrong with the measurement of the phenomena (or both).

          So, Ryan O, I think it is quite instructive to compare the null with the output of models, even if your instinct is correct that model runs are no substitute for real live measurement.

          Schmidt concludes his last comment quoted by me above by saying: “McKitrick’s hypothesis would have a lot more traction if it actually predicted something observable, rather than being a post hoc validation.” M&N’s hypothesis predicts that the rate of change in temperature measured by land surface stations will continue to diverge from the value estimated by climate models as various environmental variables change.

        • bender
          Posted Dec 18, 2010 at 4:59 PM | Permalink

          You are missing the point. The reason individual runs are preferable to ensembles, in this specific case, is because that’s what Gavin hypothesized: my model produces SAC patterns just like those in your socioeconomic data. “Standard practice” is irrelevant when it comes to testing specific claims.

          You need to make an apples-to-apples comparison: one model realization versus one real-world realization. (pete is correct on this, but I said it first.) But then you need to do this comparison 1000 times to obtain a proper permutation test of significance. Gavin looked at how many runs before shooting off his mouth that his model produces SAC patterns just like those in the socioeconomic data? We don’t know. (pete, why don’t you ask him?)

          Obviously no one here is reading what I write. Probably time to call it quits for a few months.

        • HAS
          Posted Dec 18, 2010 at 5:37 PM | Permalink

          Actually I read what you wrote and made a number of observations if you go back through my comment.

          First, I had previously made the point that M&N were responding to Schmidt.

          Second, I was observing that Schmidt was applying a more stringent test than you were (no contamination in all possible worlds).

          Third, I disagreed with you and Pete that an ensemble versus observation is necessarily a “mistake”. Just repeating what you’ve said doesn’t make it true, Bellman notwithstanding.

        • bender
          Posted Dec 18, 2010 at 6:53 PM | Permalink

          You want to debate whether it’s appropriate to use individual runs or an ensemble mean? This should be interesting. Of course it’s not so because I say it’s so … I’m not much into dogma … but I’m willing to explain to you why you’re wrong and pete’s right.

        • HAS
          Posted Dec 19, 2010 at 3:10 PM | Permalink

          Coming back to this, I see the issue remains of whether one should test the contamination hypothesis on each run or use the models to estimate the real-world parameter and test on that.

          I thought a simpler example might help elucidate.

          I’m running an urban bus system and the statistic of interest is the annual average daily number of buses that pass each bus stop between 4 & 5 pm. I have RFID readers around the city and tags on my buses (trop), and I also employ faithful retainers at each bus stop to record the passage of buses during the day. My suspected source of contamination of their measures is the proximity of the observation point to a pub, just showing how faithless faithful retainers can be these days.

          I have a bus system modeling team that has developed a complex model of traffic flows etc. that allows me to throw in the dispatch times for buses on any day, and it models where the buses will be. The model deals with weekdays, weekends, school holidays etc. The team is particularly proud of the new engine simulation module whereby, considering the flows of fuel vapour in the cylinder, they can model the changed power of the buses in the event that we switch to bio-diesel – but I digress.

          So the question is:

          Do you run the model for a randomly selected day, estimate the statistic, check for contamination and give the comparison a tick or a cross, and then select the next day in the set? OR

          Do you run the model over the set of randomly selected days to estimate the statistic, and check that for contamination?

          Oh, and before you answer, I did mention pubs were shut on Sundays, didn’t I?

        • HAS
          Posted Dec 20, 2010 at 3:33 PM | Permalink

          On reflection I was carried away by my own lyricism in the last sentence and forgot I was testing each run against the annual average. So ignore that, but the question still remains.

        • EdeF
          Posted Dec 19, 2010 at 10:38 AM | Permalink

          Bender, you have been studying the GCMs: how long does it take to do one run? Would 100 runs suffice? I say this because I run large multivariable stochastic simulations that can take a while to run. One individual run can be all over the place. My guess is that the GCMs take ages to do an individual run; we might have another ice age before they could run 1000. Is it practical?

        • Ryan O
          Posted Dec 19, 2010 at 12:35 PM | Permalink

          If it is not practical, then the claim that GCMs reproduce the same correlations to socioeconomic factors as the observational data should not have been made in the first place. Now that the claim has been made, the only way to demonstrate it properly is to do lots and lots of runs.

          Perhaps that was part of the strategy, as hundreds of runs from a given GCM are not likely to ever be performed.

        • pete
          Posted Dec 18, 2010 at 4:43 PM | Permalink

          Which dataset were you using – the ones in M&N’s SI? If not are there differences?

          Yep, SI dataset.

          First I’d note in passing that central tendency is generally taught in basic statistics courses so its not just climate scientists that would understand the issue.

          The same mistake was made in MMH10, so while I’m sure a wide variety of people understand the issue, I’m not convinced Ross does.

          The substantive question here is what these runs represent.

          If you take the view that each is a separate attempt to model the earth’s climate over the period in question, then the ensemble is likely to be a better estimate of the world’s climate than the individual runs, and the reduction in variance is simply an artifact of the fact that we have better information. If the hypothesis being tested is that the real world controlled to eliminate contamination shows contamination, then using your best estimate of the real world without contamination makes sense.

          Each run comes from slightly different starting conditions. Averaging over an ensemble tells you which features are deterministic (climate) and which are stochastic/chaotic (weather).

          Schmidt concludes his last comment quoted by me above by saying: “McKitrick’s hypothesis would have a lot more traction if it actually predicted something observable, rather than being a post hoc validation.” M&N’s hypothesis predicts that the rate of change in temperature measured by land surface stations will continue to diverge from the value estimated by climate models as various environmental variables change.

          He’s talking about successful out-of-sample prediction — it’s not enough just to make a prediction, the prediction has to be good.

          MN10 does contain a validation test, but given the presence of SAC it’s a bit of a joke.

        • HAS
          Posted Dec 18, 2010 at 5:18 PM | Permalink

          Pete

          Few comments.

          You assert it was a “mistake” to use an ensemble, and therefore, because they did that, M&N don’t understand central tendency. I’m sure you weren’t seriously suggesting that and were only making a debating point.

          I previously observed that Schmidt made the same “mistake”, M&N were responding to that, and I also spent some time in my comment outlining why ensembles could well be appropriate.

          Perhaps we should explore if I’m mistaken in that view (and ipso facto don’t understand central tendency)?

          Your only comment in response to the points I made was to say that ensembles are used to separate climate from weather. I guess that begs the question of whether the observed trend in the temperature gradient over the period 1979-2002 is an artifact of weather or climate?

          If it’s climate, I’m sure you will agree from your own reasoning that using the average from ensembles is the appropriate thing to do.

          I’ll leave you to quietly contemplate the implications if you say it’s only weather.

          On your point about predictions, what you might have liked Schmidt to say is neither here nor there. What he said was what I quoted.

          Finally I’m sorry I don’t get the joke about SAC. Did you feel M&N’s treatment inadequate?

        • pete
          Posted Dec 18, 2010 at 6:45 PM | Permalink

          You assert it was a “mistake” to use an ensemble and therefore because they did that M&N don’t understand central tendency. I’m sure you weren’t seriously suggesting that and were only making a debating point.

          The mistake was not using the individual runs. It’s not necessarily a mistake to look at the ensemble mean as well.

          And I’m not suggesting Ross doesn’t understand central tendency. I’m suggesting he doesn’t understand the implications of central tendency for model ensembles.

          I guess that begs the question whether the observed trend in the temperature gradient over the period 1979-2002 is an artifact of weather or climate?

          It’s well established that it’s climate. The observed trend is well outside the envelope of unforced model trends.

          Finally I’m sorry I don’t get the joke about SAC. Did you feel M&N’s treatment inadequate?

          They withhold 30% of data at random. Given the spatial autocorrelation, that’s a really easy test to pass.

        • bender
          Posted Dec 18, 2010 at 6:49 PM | Permalink

          Stop debating pete on points where he’s right unless you know what you’re talking about.

        • Ryan O
          Posted Dec 18, 2010 at 7:33 PM | Permalink

          Pete,

          Central tendency for model ensembles requires that models be random realizations of the same underlying population. While this is often asserted, it has never been established, and there are many reasons why it may not be true. The models use different parameterizations and include different physics, so if enough runs were available to do a sufficient Monte Carlo comparison of the results, my money is on the fact that they would show they are not likely to all be random realizations of the same population.

          Until this is established one way or the other, the central tendency argument is nothing more than an unsubstantiated claim that has some logical appeal, but no evidence to back it up.

          Which, of course, is more reason to test against individual runs . . . but also begs the question that if the models are not alternate representations of the same population, then, by necessity, at least some of them cannot be alternate representations of the real earth. At present, we have no way of knowing which are which (or if, indeed, any of them are alternate representations of the real earth).

          And even if they are alternate representations of the real earth, the scale at which that representation is valid has not been established, either.
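
          One simple way to probe the “same population” question is sketched below with invented per-run trend values; the Kruskal-Wallis test is our choice for illustration, not something proposed in this thread:

          # Do runs from different models look like draws from one common population?
          # Synthetic per-run trend values; the choice of test is illustrative only.
          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(4)
          model_a = rng.normal(0.20, 0.05, 8)
          model_b = rng.normal(0.23, 0.04, 5)
          model_c = rng.normal(0.15, 0.06, 6)
          h, p = stats.kruskal(model_a, model_b, model_c)
          print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.3f}")    # a small p would argue against a common population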

        • bender
          Posted Dec 18, 2010 at 11:10 PM | Permalink

          There is an irony in pete’s bleating about McKitrick’s alleged lack of comprehension of the central limit theorem that no one has picked up on yet.

          Here’s a puzzle for pete. If you did a Monte Carlo simulation of 1000 model runs and each run to the socioeconomic data, what do you think the mean correlation static would be. Ha ha. Joke’s on you buddy. Smile for me, and confess that Ross knows something that you just learned today. Something about the CLT.

          (The only reason I suggested the Monte Carlo approach (as opposed to using the ensemble mean) is so that Gav could have his 5 in 1000. I didn’t want him batting 0. Ha ha.)

          Now, about pete’s bleating over SAC and the withholding of 30% being inadequate. Tell me how much you want withheld and give me a citation. Then let’s do a bootstrap resample on each simulation run. Say a thousand resamples on 100 runs. That’s a hundred thousand simulations. It will take that pesky SAC right out of the analysis … and you know what the result is going to be? A slightly lower significance level for Ross. Oops. Ha ha. Try it.

          Meanwhile, I have a paper for you. Guess which one. Dated 2010. Hee hee.

          Now get back to your dorm room. Your KD’s burning.
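
          A rough sketch of the resampling exercise bender proposes, with synthetic inputs (our construction, not M&N’s code; whether a naive cell-level bootstrap really disposes of the spatial autocorrelation is part of what is in dispute):

          # Bootstrap-resample grid cells within each model run and recompute the correlation
          # with a socioeconomic variable, building up a reference distribution.
          import numpy as np

          rng = np.random.default_rng(2)
          socio = rng.normal(size=400)                  # synthetic socioeconomic field
          runs = rng.normal(size=(100, 400))            # synthetic trend fields, 100 runs

          def bootstrap_corrs(run, socio, n_resamples=1000):
              idx = rng.integers(0, len(run), size=(n_resamples, len(run)))   # cells, with replacement
              return np.array([np.corrcoef(run[i], socio[i])[0, 1] for i in idx])

          reference = np.concatenate([bootstrap_corrs(r, socio) for r in runs])
          print("reference distribution size:", reference.size)   # 100 runs x 1000 resamples = 100,000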

        • pete
          Posted Dec 18, 2010 at 11:55 PM | Permalink

          Here’s a puzzle for pete. If you did a Monte Carlo simulation of 1000 model runs and each run to the socioeconomic data, what do you think the mean correlation stat[ist]ic would be[?]

          Zero?

          Now, about pete’s bleating over SAC and the witholding of 30% being inadequate. Tell me how much you want withheld and give me citation.

          I’m pointing out that Ross’s method is wrong. I don’t see how I’m obligated to fix it for him.

        • bender
          Posted Dec 19, 2010 at 12:30 AM | Permalink

          Heh heh. Wanna put money on it?

        • pete
          Posted Dec 19, 2010 at 1:45 AM | Permalink

          My apologies; apparently you’re less stupid than your behaviour suggests.

          Yes, the limit will be small rather than actually zero (small based on Ross’s estimates for the ensembles). Point for you.

          Since we’re playing here’s a question for you:

          How do you estimate quantiles for a sampling distribution when all you have is an estimate for its location?

        • TAG
          Posted Dec 19, 2010 at 6:11 AM | Permalink

          Can I ask a simple layman’s question that has occurred to me? Apparently Schmidt says that there are spatial patterns in his model outputs but these patterns are meaningless.

          So the models produce meaningless patterns on the continental scale, according to Schmidt. He then uses that to indicate that the pattern found by McKitrick in the real-world data is also likely to be meaningless. Schmidt uses this as an “Aha!” indication that McKitrick’s analysis is incorrect.

          However, I constantly hear of other patterns (droughts, floods, cats and dogs living together) that are found in model outputs for various scenarios. Schmidt’s comments seem to say that there are mechanisms known for these GCM patterns but that McKitrick can produce no mechanism for his. So these other patterns are correct and McKitrick is wrong.

          However, hypothetical mechanisms with no justification in real-world data are just so much arm waving.

          So if Schmidt says that patterns in GCM outputs are meaningless, then why does this not hold for all patterns found in GCM outputs? Why is McKitrick wrong but these other scenarios correct?

        • bender
          Posted Dec 19, 2010 at 8:24 AM | Permalink

          apparently you’re less stupid than your behaviour suggests

          apparently you are prone to making quick assumptions about people.

          i’m not here to joust with you. i’m here to say UHI is underestimated, and accounts for 1/3 of the supposed AGW in the GMT surface record.

        • Neil Fisher
          Posted Dec 24, 2010 at 3:55 PM | Permalink

          Re: pete (Dec 18 23:55), Pete, if you believe you’ve found an error in the published record, do you not feel obliged to correct it in the literature? Ross has already acknowledged (and issued corrections for) several errors that others have found, and I would suggest to you that if he thought your point had merit, it would be on that list. If that’s true, then your failure to write such a comment leaves what you believe to be a significant error in the scientific record. Best you check if it “matters”, eh? I wouldn’t imagine you’d have difficulty finding a sympathetic referee.

          Bender: perhaps this is yet another aspect that belongs in your list of hypocrisies (if you still have it) – Team papers need to be rebutted in detail in the litchurchur, anti-team papers (you know what I mean) need only be rebutted by unresolved speculation on blogs. Oh wait – nah, that one has to be in your list already!

        • pete
          Posted Dec 24, 2010 at 4:43 PM | Permalink

          Ross’s paper hasn’t been published yet. It saves everyone a bit of time and acrimony if errors can be identified and fixed early.

          For example, I expect Ross will fix the multicollinearity problem I pointed out (nb: doing so will strengthen his conclusion), without anyone needing to submit a comment to JESM.

        • pete
          Posted Dec 18, 2010 at 11:52 PM | Permalink

          The central tendency argument works best for an ensemble of runs from the same model (so I think we’re agreed on that part? I’m pretty sure the modelling community feel the same way too).

          Presumably there is something in the model averaging literature that can handle the non-random sample from some theoretical space of possible climate models.

        • bender
          Posted Dec 19, 2010 at 12:33 AM | Permalink

          dear pete,
          the correlation with the ensemble converges in the limit to the ensemble of correlations.
          your pal,
          bender

          (this proves how generous I am toward gav, letting him have 1000 swings at the ball. 5/1000 is better than 0/1, i guess)

        • bender
          Posted Dec 19, 2010 at 12:38 AM | Permalink

          “ensemble of runs from the same model”

          heh heh, the irony of a climatologue lecturing bender on statistics.

          pete, if it’s not the same model it’s, by definition, not an “ensemble” in the statistical sense of the word.

          only in climate science do they invent their own meanings for well-defined concepts, such as “ensemble”.

          pull out your kendall & stuart. when you’ve mastered that, then come back and lecture ross & me.

          thanks for playing.

        • Ryan O
          Posted Dec 19, 2010 at 9:17 AM | Permalink

          Bender is tricksy . . .

        • Steven Mosher
          Posted Dec 19, 2010 at 5:04 PM | Permalink

          Re: Ryan O (Dec 19 09:17), He’s wicked smart. I know: I read the whole damn blog from start to finish as he suggested, including the comments (ok, I skipped some of the unthreaded), and nobody comes away from an engagement with bender unscathed. I suspect he is the kind of guy moshpit would pick as a director for the dissertation .. and then curse the choice till I was done.

        • j ferguson
          Posted Dec 19, 2010 at 5:46 PM | Permalink

          And Mosh, this is the guy who cleans your pool? Wet or dry?

        • pete
          Posted Dec 19, 2010 at 2:09 PM | Permalink

          Playing dumb so I give him the simplified answer, then pretending he’s all wise and Socratic because the simplified answer’s not the whole story? Yeah, tricksy about covers it.

          So bender: gonna answer my question?

          How did Ross expect to estimate quantiles for a sampling distribution when all he did was estimate its location?

        • bender
          Posted Dec 19, 2010 at 5:58 PM | Permalink

          look, pete, i’m not going to joust with you. i like you. if you have a problem with ross’s paper, then write a full rebuttal to the journal where he published. nitpicking is not a rebuttal. this is not a bluff. you’re smarter than me and ross? then move the goalsticks of science forward in a way that gavin couldn’t, or couldn’t be bothered.

          done for the holidays.

        • pete
          Posted Dec 19, 2010 at 6:24 PM | Permalink

          Gotta know when to fold ’em eh bender?

          Happy Holidays!

        • bender
          Posted Dec 19, 2010 at 7:27 PM | Permalink

          ask something coherent and relevant and i’m sure i won’t be able to resist replying. meanwhile i’ve asked a half-dozen questions here that you could respond to. some were even directed to you. so “you first” as they say. and if you can’t goad me into a reply, perhaps try something coherent that ross will find enticing.

          as i see it, you’re the one folding. i don’t do socratic with sophomores.

          happy holidays

        • Ryan O
          Posted Dec 19, 2010 at 8:21 PM | Permalink

          Pete,

          The answer is he can’t, but the more relevant question is whether it matters. Even a quick glance at Table II in S09 shows that M&N’s answer is not likely to change by including individual model runs. Even factor m only yields a correlation coefficient at the upper 95% CI of ~0.22, or half that of the observational data, and the wrong sign as well.

          While I still agree that M&N’s analysis would have been better were individual model runs used, the results aren’t likely to change.

          Besides, as Gavin was the one who first proposed the test in S09, he has the burden of proof to show that they would.

          Not to put words in Bender’s mouth, that is.

        • bender
          Posted Dec 20, 2010 at 11:45 AM | Permalink

          results aren’t likely to change

          Precisely. But this is what pete disputes. And on the basis of what evidence? He won’t say. Chooses instead to play guessing games.

          Bluff.

          And now he has Nicolas Nierenberg’s comments to reply to. Will he reply?

        • pete
          Posted Dec 20, 2010 at 8:26 PM | Permalink

          While I still agree that M&N’s analysis would have been better were individual model runs used, the results aren’t likely to change.

          Sometimes this gets lost in the back-and-forth: a better analysis is a good thing, even if the qualitative results don’t change.

        • morebrocato
          Posted Dec 20, 2010 at 8:47 PM | Permalink

          “a better analysis is a good thing, even if the qualitative results don’t change”.

          — Haven’t we been down this road before? Hasn’t ‘The Team’ dismissed publications, despite their ‘better’ analysis, because, upon their inspection, it didn’t change the qualitative results (significantly enough to their satisfaction)?

          ‘Better’ analysis apparently is only necessary when it challenges team-established climate understanding. If you’re already ‘in’, feel free to put out whatever you like. We could always add it to the literature with an “if confirmed, this could be awesome” tagline to it. Or maybe we could issue some corrigenda later that somehow still say there’s nothing wrong with the original (because so much of statistical analysis has improved in the last year).

        • bender
          Posted Dec 21, 2010 at 5:42 AM | Permalink

          A better analysis IS better … IF it is published. So what are you waiting for? If the socioeconomics literature is so dang easy to penetrate, then go for it. Otherwise it’s just nitpicking commentary at a blog.

        • mikep
          Posted Dec 19, 2010 at 4:42 AM | Permalink

          In reply to HAS’s comment

          “Finally I’m sorry I don’t get the joke about SAC. Did you feel M&N’s treatment inadequate?”

          Pete says

          “They withhold 30% of data at random. Given the spatial autocorrelation, that’s a really easy test to pass.”

          Perhaps Pete missed table 3 in M&N 2010. Here is the relevant quote from section 3.2:

          “The first row of results refers to the original configuration in MM07: the CRU gridded trends regressed on the UAH tropospheric trends and the rest of the MM07 model variables in Equation (1). In this case the robust LM score is significant for both the dependent variable and the residuals, but much more so for the dependent variable, indicating that a spatial lag model is appropriate. The second and third rows show the test scores using CRU3v surface data and either UAH4 or RSS4 tropospheric series. In both cases the dependent variable lag is significant while the residual lag term is insignificant. Again this indicates that the spatial lag model is appropriate, and also indicates that the regression model is well-specified in the sense that the SAC is removed from the error terms.”

          Is Pete still under the impression that SAC in the dependent variable rather than the residuals is a problem that affects the use of the equation for forecasting? The point is that SAC in the dependent variable is a feature of the data that the independent variables should explain. If the independent variables do their job and there is no residual SAC in the estimated equation, then there is no problem with using the equation as is.
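
          The distinction between SAC in the dependent variable and SAC left in the residuals can be checked directly. Here is a bare-bones sketch (ours, not M&N’s Stata code) computing Moran’s I of OLS residuals under an assumed spatial weights matrix:

          # Moran's I of regression residuals: values near the expectation -1/(n-1) suggest the
          # regressors have absorbed the spatial structure. Toy data and a placeholder weights matrix.
          import numpy as np

          def morans_i(x, W):
              z = np.asarray(x, float) - np.mean(x)
              return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

          rng = np.random.default_rng(3)
          n = 50
          X = np.column_stack([np.ones(n), rng.normal(size=n)])
          y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
          beta, *_ = np.linalg.lstsq(X, y, rcond=None)
          resid = y - X @ beta

          W = rng.random((n, n)); np.fill_diagonal(W, 0.0)    # placeholder spatial weights
          W = W / W.sum(axis=1, keepdims=True)                # row-standardise
          print("Moran's I of residuals:", morans_i(resid, W))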

        • bender
          Posted Dec 19, 2010 at 8:16 AM | Permalink

          Is Pete still under the impression that SAC in the dependent variable rather than the residuals is a problem that affects the use of the equation for forecasting?

          That is my impression.

        • mikep
          Posted Dec 19, 2010 at 10:52 AM | Permalink

          Perhaps this ubiquitous (at least in climate science circles) mistake deserves a name: Gavin’s error?

        • Scott Brim
          Posted Dec 19, 2010 at 12:39 PM | Permalink

          Both words should be capitalized within that usage; i.e. Gavin’s Error.

          A more-or-less informal title for the same mistake might be worded as “Schmidt’s Slip-up.”

        • theduke
          Posted Dec 19, 2010 at 1:57 PM | Permalink

          Schmidt’s Fallacy?

        • Pat Frank
          Posted Dec 20, 2010 at 3:15 PM | Permalink

          HAS: “I guess that begs the question whether the observed trend in the temperature gradient over the period 1979-2002 is an artifact of weather or climate?”

          pete: “It’s well established that it’s climate. The observed trend is well outside the envelope of unforced model trends.”

          That argument is true only on the assumption that climate models accurately capture the climate of Earth. But whether models accurately capture Earth climate is precisely the larger question at hand.

          To suppose that unforced model trends define natural variability, and go on to suppose that observed trends are unnatural because they are outside of modeled trends, is to put the scientific cart before the horse.

          Climatology has jumped the scientific gun. Modelers apparently can’t think of how the models could be wrong and so have assumed their models are correct. Observations are then interpreted strictly in terms of models, with the physical truth of model outputs an apparently unquestioned given. This opposes the way science is actually done. In actual science, observations test theories.

          In fact, climate observations til now have nominally falsified model predictions. I write “nominally” because no one has ever propagated model uncertainties into their purported predictions.

          In any case, there are not enough climate data to delimit natural variability. Proxy paleo-data are far too crude, and surface temperature data prior to about 1950 are far too sparse. And, in fact, surface temperature data are uniformly too imprecise to constrain an interpretation of a trend less than about 1 C per century.

          This whole discussion is a fascinating exercise in detailed statistical reasoning, but the circular tendency implicit in putting model outputs ahead of observation places it squarely outside science.

        • mark t
          Posted Dec 20, 2010 at 7:35 PM | Permalink

          I’m thinking first courses in logic need to be taught in high school. It is baffling that people actually think this way, but it seems to permeate otherwise technical discussions regularly.

          Mark

        • ianl8888
          Posted Dec 21, 2010 at 1:23 AM | Permalink

          > … but the circular tendency implicit in putting model outputs ahead of observation … <

          Judith Curry, on her website, also nailed this circularity in the IPCC process about 6 weeks ago

        • Pat Frank
          Posted Dec 21, 2010 at 2:20 AM | Permalink

          Judith Curry is several years late to that conversation.

        • bender
          Posted Dec 21, 2010 at 5:35 AM | Permalink

          better late than never

        • Posted Dec 22, 2010 at 1:59 PM | Permalink

          Re: pete (Dec 18 16:43), I would argue with your claim that the weather is stochastic, particularly without having your definition of stochastic to examine. Where I live the weather is a damped, driven dynamic system: tell me the current dew point and the direction and rate of change of the dew point and I will give you a low-temperature prediction +/- 2F for the next 24h (for example, if the dew point is 32F and unchanging, I will predict the next 24h low to be 32F, with a range of 30 to 34). Climate is deterministic, though, as it is strictly determined by the previously realized weather.

        • pete
          Posted Dec 22, 2010 at 4:30 PM | Permalink

          Weather is chaotic, in the sense that it has sensitive dependence on initial conditions.

          This makes the weather realisations, over the sort of time frame a GCM uses, essentially random.

          I didn’t mean to imply that weather couldn’t be predicted in the short term.

        • Posted Dec 23, 2010 at 12:49 PM | Permalink

          Re: pete (Dec 22 16:30), pete,

          Thanks, that I understand. What I don’t understand is the claim that weather is stochastic. Chaotic is not the same thing. Even granting that to the GCMs, in the real world, the only scale at which I would call weather stochastic is at the molecular level or over very short time periods; similar to radioactive decay or chemical reactions. However, in the bulk state and typical time scales, neither of those used to be considered stochastic back when I took chemistry and physics.

          Hm, yes weather (modeled) is sensitive to initial conditions and boundary values; but, pete, by definition climate must be also. That implies, to me, that what we call the climate must also be strongly sensitive (modeled) to initial conditions and boundary values; which means that any series of measurements made or model outputs obtained will have spatial and temporal auto-correlation (scale invariant maybe?).

        • pete
          Posted Dec 23, 2010 at 5:34 PM | Permalink

          You’re right, chaotic and stochastic aren’t the same thing. But if the perturbations to the initial conditions are random, then a chaotic deterministic system becomes a stochastic one.

          Climate is less sensitive to initial conditions than weather. I can’t predict whether October 11 next year will be warmer than October 10, but I can predict that winter will be colder than summer.

        • Benjamin
          Posted Dec 24, 2010 at 5:11 AM | Permalink

          This has nothing to do with “climate”; it only has to do with the fact that seasonal cycle variance is greater than day-to-day variance.

          That’s what we call “seasons”.

          It doesn’t imply anything about long term variance vs short term variance, or global variance vs local variance (which is what an average person cares about).

        • pete
          Posted Dec 25, 2010 at 4:10 PM | Permalink

          Just calling them ‘seasons’ doesn’t magically make them predictable.

          We can predict the seasons because they’re forced by Earth’s orbit, i.e. they’re dependent on boundary conditions rather than initial conditions.

        • Harold
          Posted Dec 25, 2010 at 9:52 AM | Permalink

          Climate is less sensitive to initial conditions than weather? You speak as if these two things were the result of two different physical systems with two different sensitivities to initial conditions. This can’t be, so the perceived differences are definitional, not physical.

        • pete
          Posted Dec 25, 2010 at 4:09 PM | Permalink

          Suppose I spend all day flipping coins. The result of any given coin flip is sensitive to initial conditions. But the long run ratio of heads to tails isn’t. Same physical system, but different measurements with different sensitivities.
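
          A minimal numerical sketch of the distinction pete is drawing (a toy chaotic map, not a climate model; whether the analogy carries over to the coupled climate system is exactly what is disputed below): individual iterates are unpredictable after a tiny perturbation of the initial condition, while a long-run statistic of the same system is not.

          import numpy as np

          # Logistic map at r = 4: fully deterministic but chaotic
          def trajectory(x0, n):
              x = np.empty(n)
              x[0] = x0
              for i in range(1, n):
                  x[i] = 4.0 * x[i - 1] * (1.0 - x[i - 1])
              return x

          a = trajectory(0.2, 50_000)
          b = trajectory(0.2 + 1e-10, 50_000)   # tiny perturbation of the initial condition

          # "Weather": by step 50 the two trajectories have decorrelated completely
          print("difference at step 50:", abs(a[50] - b[50]))   # typically order 1

          # "Climate": the long-run average is insensitive to the initial condition
          print("long-run means:", round(a.mean(), 3), round(b.mean(), 3))   # both close to 0.5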

        • oneuniverse
          Posted Dec 25, 2010 at 6:19 PM | Permalink

          pete, each coin-flip is independent of the others. The relevant physical parameters of the environment and the coin itself remain constant.

          The climate system is different – its ‘coin flips’ are not independent – a wrong ‘coin flip’ will introduce an error in the energy balance (and other variables). There’s no obvious reason why these errors should cancel in the long run. Any such assumption needs to be demonstrated.

          Dr. Judith Curry directed me to this paper by L.Smith, “What might we learn from climate forecasts?”, PNAS 2002, which reiterates this non-controversial point:

          “Given a coupled nonlinear simulation, it is not clear that one can get the averages right without being able to simulate the details correctly.”

          “Given the nonlinearities involved, it is not clear whether a model that cannot produce reasonable “weather” can produce reasonable climate statistics of the kind needed for policy making, much less whether it can mimic climate change realistically.”

        • mark t
          Posted Dec 25, 2010 at 10:12 PM | Permalink

          Wow. Pete spends all this time arguing stats topics and then suddenly demonstrates a lack of understanding of i.i.d. Yeah, you can claim otherwise, pete, but your poor analogy is an intellectual ‘tell.’ If you really did understand, you would also understand why your example does not apply and thus, you would not have used it.
          Mark

        • pete
          Posted Dec 26, 2010 at 3:17 AM | Permalink

          Suddenly the Climate Audit crowd has a touching faith in models. It must be a Christmas miracle!

          Successive Bernoulli trials are i.i.d. Actual real-life coin flips aren’t.

        • mark t
          Posted Dec 27, 2010 at 9:43 PM | Permalink

          I apologize. Pete’s confusion is unrelated to the concept of i.i.d., though he does seem to think that the small perturbations that occur during a coin flip, a random process, somehow serve as an apt analogy to initial conditions (and the resulting long-term changes) in the climate/weather system.
          Mark

        • Posted Dec 25, 2010 at 12:54 PM | Permalink

          Re: pete (Dec 23 17:34),

          That’s a mighty big ‘if’ you are assuming away without evidence, in my opinion. It is also my opinion that there is a thing called Earth’s weather system, but not a separate thing called Earth’s climate system. Climate is strictly dependent upon and determined by the previously realized weather.

          Re the October prediction? Yes you can. About 2/3rds of the time it will be within a few degrees of this year’s numbers and about 50% of the time warmer or colder if this year’s value is at the median of the values previously seen.

        • pete
          Posted Dec 25, 2010 at 4:06 PM | Permalink

          Re the October prediction? Yes you can. About 2/3rds of the time it will be within a few degrees of this year’s numbers and about 50% of the time warmer or colder if this year’s value is at the median of the values previously seen.

          You missed my point. You can’t predict if 11 October 2011 will be warmer or colder than 10 October 2011.

        • Dave Dardinger
          Posted Dec 24, 2010 at 10:20 AM | Permalink

          Re: cdquarles (Dec 23 12:49),

          What I don’t understand is the claim that weather is stochastic. Chaotic is not the same thing.

          In Newtonian physics, given a state of a system, the subsequent states are deterministic, though they may or may not be chaotic. In quantum physics, in addition to not being able to know the system exactly in the first place, there are many possible states following from a given state. This makes the very concept of chaos unrealistic, though we can postulate that there’s some sense in which many situations reduce to the Newtonian situation to a high degree of accuracy (planetary movements might be such a situation). But I doubt that weather is one of them. There are too many interactions between the atoms in the atmosphere to make a mathematical system like chaos more realistic than just looking at the quantum results themselves.

        • Posted Dec 25, 2010 at 1:06 PM | Permalink

          Re: Dave Dardinger (Dec 24 10:20),

          Dave,

          I think that the normal rules of bulk chemistry and physics apply here, given the types of measurements made in observing the system – without other evidence to the contrary, that is. It has been shown, I think, that quantum chemistry rules simplify down to the usual rules in bulk systems when sampling chemical reactions or radioactive decay.

          I think that it has been shown that planetary orbits qualify for the Newtonian treatment and that the orbit of Earth is indeed deterministic and chaotic to a sufficient degree of accuracy and precision.

        • Dave Dardinger
          Posted Dec 25, 2010 at 2:38 PM | Permalink

          Re: cdquarles (Dec 25 13:06),

          No, the sort of observation here isn’t just some sort of averaging but observing the particular configurations interacting. To keep it somewhat in the realm of what’s talked about on this board, it’d be like a member of the Team claiming that Climategate was meaningless because the sheets [as printed] were (on average) just a light shade of grey, and we didn’t need to consider the designs consisting of words as important since it’s the bulk printing which is important. So as not to make it too complicated, it’s the quantum mechanical version of the butterfly effect which we need to look at. The difference is that it’s not just slightly different initial conditions causing the final result to be completely different in each “run” of reality, but different spots in the hugely multi-dimensional phase space which are selected (randomly??) in such a “run”.

          And I’m afraid that we’ll have to leave it there as I know Steve doesn’t like this sort of discussion here. If you want to discuss it on some other board, let me know where you’re responding.

  45. Geoff Sherrington
    Posted Dec 19, 2010 at 3:35 AM | Permalink

    Brought up on differentiating between ‘precision’ and ‘accuracy’ or in other words ‘repeatability’ and ‘bias’, I’d like to see some more dissection of these concepts in the discussion above.

    Specific reasons – how valid is it to compare models from the literature when the authors have got together beforehand and submitted those that seemed closer to the rest (if they did)? What is the validity of rejecting runs that ‘don’t look right’ before they see the light of day (if they did)? How do you mathematically incorporate agreed conventions, such as on starting conditions and bounds for some parameters (like avoiding boiling oceans) (like they did)? Is it valid to specify the sign of nominated feedbacks? Is it valid to do these things if your intention is to apply classical statistical estimates of closeness?

    I’d be fascinated to hear from a modeller who did not swap notes with others before publishing. I keep coming back to our very own CSIRO comparison with 67 model runs http://www.pas.rochester.edu/~douglass/papers/Published%20JOC1651.pdf

    Trends (milli °C/decade) by pressure level (hPa):

    Level       Sfc  1000   925   850   700   600   500   400   300   250   200   150   100
    CSIRO       163   213   174   181   199   204   226   271   307   299   255   166    53
    AVERAGE     156   198   166   177   191   203   227   272   314   320   307   268    78
    Std. Dev.    64   443    72    70    82    96   109   131   148   149   154   160   124

    See how close CSIRO Mark 3 is to the average of all the others in the mid range, 204/203, 226/227, 271/272 for example. If you are using closeness to the ensemble average as a criterion (and I do not suggest that you do) then the other modellers might as well pack up and go home.

    Except that these figures tell us precisely zero about freedom from bias, as coming decades might reveal.

    Is there not a single modeller willing to step forward and say that the calculations were entirely from first principles, unaffected by input from any other modellers? I have not seen such a statement yet, but then I have not read all of the literature.

  46. ben
    Posted Dec 19, 2010 at 6:38 AM | Permalink

    Steve, how you stay sane through all of this I don’t know. Breathing exercises? Yoga?

  47. Nicolas Nierenberg
    Posted Dec 20, 2010 at 11:03 AM | Permalink

    I have put up my thoughts about all of this at my blog .

    • Nicolas Nierenberg
      Posted Dec 20, 2010 at 11:04 AM | Permalink

      http://nierenbergclimate.blogspot.com/2010/12/socioeconic-trends-in-climate-data-is.html

      • bender
        Posted Dec 20, 2010 at 11:31 AM | Permalink

        Who would make a statistical argument without using the standard statistical tests in the literature?

        International Man of Mystery would.

      • bender
        Posted Dec 20, 2010 at 11:36 AM | Permalink

        Also remember that land makes up only one third of the Earth’s surface so even if there were a 50% error in land trends this would only be a 15% difference in the overall trend. Therefore this shouldn’t be an argument over the big picture.

        This is precisely what the warmista *should* be saying. But burying their heads in denial over UHI contamination problems – this only makes them look anti-scientific. Which they are.

  48. Posted Dec 20, 2010 at 1:31 PM | Permalink

    I’ve been offline for a couple of days. Before responding to what I take to be Pete’s main objection, indulge me one bleat.

    When Pat and I published MM2004, I was told by a blogger that there was a cosine error and even though they didn’t do the calculations they were sure it meant I’m ALL WRONG and therefore we’ll all ignore the results. So I fixed the calculations and published a correction showing the conclusions stayed the same, but the results were still ignored.

    Then I was told by a blogger that the error term is clustered and even though they didn’t do the calculations they were sure it meant I’m ALL WRONG and therefore we’ll all ignore the results. So I fixed the calculations and found the conclusions stayed the same, but the results were still ignored.

    Then I was told that it was only a partial sample of the world and even though they didn’t do the calculations they were sure it meant I’m ALL WRONG and therefore we’ll ignore the results. So I built a new database covering the whole (CRU-available) world and re-did the calculations and got results (MM07) just as strong as before, but the results were still ignored.

    Then I was told at a conference that the results simply reflect the fact that Europe is a special case climatically and that’s what drives the results and even though they didn’t do the calculations they were sure it meant I’m ALL WRONG and therefore we’ll ignore the results. So I re-did the calculations leaving Europe out and found it didn’t affect the conclusions, but the results were still ignored (and the questioner said he still wouldn’t believe them).

    Then I was told by a blogger that the temperature field is spatially autocorrelated and even though they didn’t do the calculations they were sure it meant I’m ALL WRONG and therefore we’ll ignore the results. So I tested for SAC and re-did the calculations and found it didn’t affect the conclusions, but the results were still ignored. And the blogger refused to submit his argument to a journal so I couldn’t rebut it in print.

    Then I was told by the IPCC that natural atmospheric circulations account for everything and my results were actually insignificant and even though they didn’t do the calculations they were sure it meant I’m ALL WRONG and therefore we’ll ignore the results. So I tested for circulation effects and re-did the calculations and published a paper showing it didn’t affect the conclusions, but the results were still ignored.

    Then I was told by Gavin that there’s SAC, and GCM runs can replicate the effects and the results don’t hold up on non-UAH data sets and even though he didn’t do the calculations needed to prove these things he was sure it meant I’m ALL WRONG and therefore we’ll ignore the results. So I tested for these things and re-did the calculations using non-UAH data and GCM data and published a paper showing it didn’t affect the conclusions. Now I am waiting to confirm that the results will be ignored.

    Meanwhile an anonymous commenter comes along saying that coefficients estimated on observed data cannot be compared to ensemble means, instead they must be compared to individual model runs, and even though they didn’t do the calculations they were sure it meant I’m ALL WRONG and therefore we should ignore the results.

    It could very well be the case that I’m ALL WRONG. It’s just that the people who say I’m wrong keep putting forward reasons that they don’t bother to check, and when I check them, they don’t hold up. I think the data show that the spatial warming pattern is correlated with the spatial pattern of socioeconomic development in a way that seriously undermines the view that homogeneity adjustments are adequate for removing non-climatic trends from surface temperature series. The fact that the people who should be most concerned about this problem–namely the main producers and promoters of these data products, including the IPCC–repeatedly react with careless and dismissive arrogance hardly inspires confidence in their position.

    Now, once more doing the checking….

    I have posted a log file showing a comparison of the coefficients estimated on all 55 GCM runs in my possession to the coefficients from observations (using the cru3v-uah4 pair). I have computed an indicator showing, for each of the socioeconomic coefficients, the fraction of the gcm-generated range that exceeds the draw from observations. Although it’s not a proper bootstrap, it’s all I can do given the limited gcm output I have. Also the gcm coeffs are not maximum likelihood because I am ignoring the SAC issue, for simplicity. Nonetheless taking the view that observed coeffs should be compared to the range of gcm-generated coeffs, this gives us an idea of whether the gcm range is big enough to encompass the observation-generated coeffs. The results should be self-explanatory, but for ease of reading I have added notes (scroll to the bottom).
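
    For concreteness, here is a sketch of how such an indicator could be computed. The arrays are synthetic stand-ins (the real inputs are the coefficients in the log file), and range_fraction_beyond is a hypothetical helper reflecting one reading of “the fraction of the gcm-generated range that exceeds the draw from observations”.

    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic stand-ins, NOT the MN10 numbers: 6 "socioeconomic" coefficients
    # estimated on observations, and the same 6 estimated on each of 55 GCM runs
    obs_coef = rng.normal(loc=1.0, scale=0.2, size=6)
    gcm_coef = rng.normal(loc=0.0, scale=0.3, size=(55, 6))

    def range_fraction_beyond(obs, runs):
        # Fraction of the GCM-generated range lying beyond the observed value;
        # 0 if the whole range falls short of the observation, 1 if it all exceeds it
        lo, hi = runs.min(), runs.max()
        return float(np.clip((hi - obs) / (hi - lo), 0.0, 1.0))

    for j in range(gcm_coef.shape[1]):
        f = range_fraction_beyond(obs_coef[j], gcm_coef[:, j])
        print(f"coefficient {j}: {100 * f:.1f}% of the GCM range exceeds the observed value")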

    Beyond that, I note that Pete acknowledges that the 25-year trends are climatic, not merely weather; and ensemble means are indicative of climatic processes, not merely weather. It is the trends that are being compared, and moreover trends in backcasts using common forcing inputs, so use of ensemble means is certainly appropriate. How convenient, though, for defenders of a scientific theory to be able to argue that absurdities in individual GCM runs are irrelevant because it’s only the ensemble means that we should examine, and when absurdities in the means are noted they are dismissed because it’s only the individual GCM runs that we should examine, and the potential population of such runs is so large we will never actually have it.

    • TAG
      Posted Dec 20, 2010 at 3:56 PM | Permalink

      What does this say about the IPCC process?

    • HAS
      Posted Dec 20, 2010 at 4:18 PM | Permalink

      Amen to that, particularly the last para.

      What it leaves hanging however is the more important question:

      “Why should we think that testing against a GCM, in any shape or form, has the ability to validate (or otherwise) the hypothesis that measured land surface temps are related to local socio-economic factors?” To the extent that this testing tells us anything, it is more likely to tell us something about the GCMs (with a nod to Occam).

      Every time I come face to face with the idea that somehow GCMs have a life of their own (literally) and are therefore of equal value as observations of the real world, I sense a deep philosophical divide that goes to the heart of modern scientific method and empiricism.

      So in my book teaching statistics isn’t going to solve this. A more fundamental need is giving a good grounding in formal systems theory, scientific method, and basic training in understanding the difference between the empirical universe and models, systems and meta-systems etc. Also teaching that modeling is purposeful so utility is a defining characteristic.

      It is of course quite valid to believe that models are the empirical universe; the contradiction arises when you start trying to tell other people to do something to change the future, if you see what I mean (“go talk to your model” :)).

    • pete
      Posted Dec 20, 2010 at 8:07 PM | Permalink

      Meanwhile an anonymous commenter comes along saying that coefficients estimated on observed data cannot be compared to ensemble means, instead they must be compared to individual model runs, and even though they didn’t do the calculations they were sure it meant I’m ALL WRONG and therefore we should ignore the results. (emphasis mine)

      I’d like to point out that this sort of conversation goes a lot better if you don’t put words in other people’s mouths. All I did was point out a fairly obvious error in your paper. I haven’t made any claims about the implications of that error.

      But apparently you’re upset because people are making a big deal out of errors that don’t matter. That’s an interesting line to take on a blog that has “Method Wrong + Answer Correct = Bad Science” as a mantra.

      • Nicolas Nierenberg
        Posted Dec 21, 2010 at 3:36 AM | Permalink

        Forgetting for the moment the surrounding conversations, I believe that Ross has now shown that looking at the individual runs doesn’t change the result, so I assume you are now satisfied?

        • pete
          Posted Dec 21, 2010 at 7:01 AM | Permalink

          It depends on what you mean by ‘satisfied’. I’m certainly more interested in your results.

          But I’m not satisfied that Ross understands the error he made. In his last comment he still claims that “use of ensemble means is certainly appropriate”.

          Once you’ve established a significant relationship between your variables and the temperature record the next step is to look at how that fits in with the other evidence. To do that you need a good estimate of the uncertainties in your result, which you can’t get without fixing this error.

        • bender
          Posted Dec 21, 2010 at 9:42 AM | Permalink

          Gosh, you love that word, don’t you? Contrary to your opinion, this is not an “error”, for the reasons I’ve outlined on three different occasions. If you think it is an “error” then you should write a letter in reply to the journal where they published.

          Oh right, your tactic is to not resolve anything because you’d prefer to maintain your license to continue using that word.

          I expected more from you, pete. You’re another Tom P. Bye.

        • Posted Dec 21, 2010 at 11:39 AM | Permalink

          Once you’ve established a significant relationship between your variables and the temperature record the next step is to look at how that fits in with the other evidence. To do that you need a good estimate of the uncertainties in your result, which you can’t get without fixing this error.

          No, I reject that statement. The paper discusses at considerable length the treatment of the uncertainties, i.e. the stochastic processes involved, and I defy you to find any treatment of the topic in the literature that has gone into this much detail to test for, model and treat the spatial dependence issue. The contrast between the maximum likelihood estimates of SAC in the observations versus the models is itself an important finding. You have yet to comment on that.

          Last fall I presented earlier versions of these results to a joint economics/climatology workshop at the University of Exeter, home of the Hadley Centre. Some of the comments that came out of that led to refinements of the analysis that went into the final version. Nobody described the use of ensemble means as an “error”. In fact given the combined expertise in the room, I’d have liked to see someone try.

          OK, I take back any insinuation that you think the results are ALL WRONG. It’s just that you haven’t yet indicated which results you think are valid.

        • bernie
          Posted Dec 21, 2010 at 12:27 PM | Permalink

          Pete:
          You say:
          “Once you’ve established a significant relationship between your variables and the temperature record the next step is to look at how that fits in with the other evidence. To do that you need a good estimate of the uncertainties in your result, which you can’t get without fixing this error.”

          Bender and Ross may understand what you are saying, but this seems to me to be somewhat imprecise and confusing. What exactly are you saying should be looked at with what other evidence? Which uncertainties are you talking about? Moreover, whose variables are you talking about – Ross’s or those of GCM’s? Do you have examples of where the authors of GCMs have done what you are saying Ross needs to do?

        • Ryan O
          Posted Dec 21, 2010 at 3:14 PM | Permalink

          He’s saying that looking at the ensemble mean does not give any indication of the variability from run-to-run.

          For example, let’s say that the ensemble mean yields a correlation of -0.15 to factor m. Based on the degrees of freedom remaining, we can obtain an estimate of the range for the ensemble mean using the standard error for covariances. Let’s say that gives a 95% CI of -0.15 +/- 0.05. But this only applies to the ensemble mean.

          Now, let’s say that you have 100 individual runs. The mean of the runs is -0.15, and the standard deviation (after the DoF correction for SAC) is 0.02. This gives a 95% CI of -0.15 +/- 0.04. Close to the ensemble mean, right?

          But let’s say instead that the 100 individual runs had a mean of -0.15, but a standard deviation of 0.20. Because the individual runs show so much more variability, the 95% CIs bloat to -0.15 +/- 0.40. Much different than the ensemble mean.
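
          A toy simulation (made-up numbers, not the S09 or MN10 data) of the point in the last three paragraphs: regressing the ensemble mean on the factor hides the run-to-run spread, while per-run regressions retain it.

          import numpy as np

          rng = np.random.default_rng(2)

          # Toy setup: 100 "runs", each regressed on the same factor m
          n_runs, n_cells = 100, 400
          m = rng.normal(size=n_cells)

          # Each run has its own true coefficient centred on -0.15; set run_spread to
          # 0.02 or 0.20 to reproduce the two cases described above
          run_spread = 0.20
          betas = -0.15 + run_spread * rng.normal(size=n_runs)
          runs = betas[:, None] * m[None, :] + rng.normal(scale=1.0, size=(n_runs, n_cells))

          # Regressing the ensemble mean on m averages away the run-to-run spread,
          # so the fitted coefficient looks deceptively precise
          b_ens = np.polyfit(m, runs.mean(axis=0), 1)[0]

          # Per-run coefficients retain the spread that matters for significance claims
          b_runs = np.array([np.polyfit(m, r, 1)[0] for r in runs])

          print("ensemble-mean coefficient: ", round(b_ens, 3))
          print("per-run coefficient spread:", round(b_runs.std(), 3))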

          So in theory, Pete’s question is a legitimate one. The problem with Pete’s question, though, is this:

          1. We do have individual runs (listed in S09), and even though the number of runs is small, one can still obtain an estimate for the standard deviation for individual model runs. Ross has even more results, and the variability in the model runs still does not account for the correlation seen in the observational data. So it’s a moot point.

          2. The validity of Pete’s question depends on the models being an appropriate null for the test. This is by no means proven, and using models as nulls for tests has been contested in the literature. The models do not show the same statistical properties as the real-world data.

          3. The modelers argue that it is only the ensemble means that have predictive power, not the individual models. So if it is only the ensemble means that are proposed to have a physical meaning, then the modelers are admitting that the individual runs are not likely to represent a real alternative representation of earth’s climate. Therefore, testing using the individual model runs is entirely nonsensical. They’ve already been declared to be unrealistic representations of the climate.

          4. None of this has any bearing whatsoever on the results that Ross gets from the observational data. This fact is routinely ignored by Pete and everyone else who doesn’t want to admit that UHI is a problem. The results that Ross gets are statistically significant in and of themselves, even after correction for SAC.

          5. Even assuming that Pete were right and the range calculated from the individual runs encompassed the coefficients calculated from the observational data, it doesn’t matter until the modelers can prove that the ensemble means or individual runs truly are alternate representations of earth’s climate, thus proving that the models are a better null than whatever noise model Ross used.

          6. The Team conveniently moves the goalposts on appropriate noise models for nulls. For a good example, see Ross and Steve’s critiques about MBH when the noise model is generated from empirical autoregressive coefficients and then take a look at Wahl & Amman’s responses. In that case, Ross and Steve had good reason to believe that higher-order coefficients were necessary to properly model the noise . . . yet the Team rejected the critique on the ground that it was unrealistic. Now, when Ross develops his UHI argument, the Team rejects the results because a demonstrably unrealistic noise model (the GCMs) kinda-sorta duplicates Ross’s results if it’s a full moon on Thursday and you’re reading the results while high on crack. The asymmetry of the Team’s statistical arguments is as plain as a neon sign.

        • bernie
          Posted Dec 21, 2010 at 4:14 PM | Permalink

          Ryan:
          Many thanks for the clear exposition of the points at issue. I assume we will hear from pete and bender if they have any differences in interpretation.
          The question in my mind remains why GCMs would not, from the start, include UHI/socioeconomic variables that are known to have an impact on temperature. Are they concerned that they cannot effectively separate out these variables from CO2 forcings? If so, that rather begs the question, doesn’t it?

        • Ryan O
          Posted Dec 21, 2010 at 4:15 PM | Permalink

          That one is easy. UHI is not a problem. So why include it? 😉

        • bernie
          Posted Dec 21, 2010 at 4:22 PM | Permalink

          Oh, you mean they are applying the Ostrich principle?

        • Ryan O
          Posted Dec 21, 2010 at 4:18 PM | Permalink

          The real answer, though, is that UHI is not a “forcing”. It causes contamination in the surface temperature record, making the surface temperature record appear to warm more than it really has. Besides, it’s hard to include UHI in a model unless you also include cities and make assumptions about population growth and other related things that have nothing to do with climate.

        • bender
          Posted Dec 21, 2010 at 5:33 PM | Permalink

          Yes, important to underline that it’s not UHI that is the problem, but UHI contamination. No one denies UHI. What is being denied is the possibility that failure to correctly remove that effect inadvertently results in non-warming locations being warmed.

          Why the denial? I don’t know. I keep asking and I’m not getting any replies.

        • Posted Dec 21, 2010 at 4:30 PM | Permalink

          The Ostrich theorem is invoked. To give one example:

          Inhomogeneities in the data arise mainly due to changes in instruments, exposure, station location (elevation, position), ship height, observation time, urbanization effects, and the method used to calculate averages. However, these effects are all well understood and taken into account in the construction of the data set.

          (From Jun et al, quoted in the paper). See discussion in the MN10 paper. They don’t deal with the problem because “it’s not a problem.”

        • bender
          Posted Dec 21, 2010 at 5:42 PM | Permalink

          But if the effect is “taken into account” then how do the correlations arise between real climate data and socioeconomic data? Gavin is asserting that this could occur by random chance alone, climate being a stochastic process well-represented by the models? Come on, Gavin, there’s a 5% chance you’re right. Tops. UHI contamination is WAYYYY more likely.

          What are you afraid of, Gavin? That you’ll have to re-parameterize your model, thus finally being forced to admit that they are in fact tuned to observations? And that this tuning is quite critical to model performance?

          ‘Fess up. Whole truth.

        • Harold
          Posted Dec 25, 2010 at 10:20 AM | Permalink

          “could occur by random chance alone”

          This one makes me giggle. Maybe I should find a youtube video showing a RC glider climbing the thermal over a large parking lot on a sunny day. Maybe it’s un-scientific in a way, but it’s akin to the “rubber seal and ice water” demonstration.

          If I can use the secondary results of an effect to fly a plane, my guess is the primary effect is too large to legitimately ignore.

        • bernie
          Posted Dec 21, 2010 at 5:52 PM | Permalink

          Ross:
          Nice quotation. It is hard to imagine a more compelling operational definition of the Ostrich Principle/Theorem!
          It reminds me of the old joke of the guy with a stomach pain who goes to a number of different medical specialists and gets a different diagnosis based on each doctor’s specialty. Scientists cannot just ignore variables simply because they are not in their area of focus. As a group of models, GCMs are clearly underspecified and are therefore seriously flawed.

        • bender
          Posted Dec 21, 2010 at 5:59 PM | Permalink

          bernie,
          It’s probably not the under-specification of GCMs that is the problem. I’m surprised you’re missing the point; it’s not like you. Search the blog for UHI contamination.

        • bernie
          Posted Dec 21, 2010 at 6:09 PM | Permalink

          bender:
          OK I see your point – but then how do you characterize socio-economic variables that Ross is working with?

        • bender
          Posted Dec 21, 2010 at 6:47 PM | Permalink

          Sorry, I’m not understanding. How do I characterize these variables? My understanding is that the socioeconomic variables are a proxy for human activity & infrastructure – waste heat & asphalt – which are more concentrated in urban areas, and are probably on the rise throughout most of the world through the post-war 1950s-80s. My surmise – and maybe I’m wrong here – is that some of the apparent warming – that inside the UHI – is UHI effect that has not been “adequately accounted for”. But some of the warming is manufactured, and lies outside the urban core, in areas where attempts at correcting UHI have inadvertently led to the artificial warming of truly rural (and thus non-warming) sites. My guess/assumption is that’s why it takes several socioeconomic variables to account for the spatial pattern of warming – it’s different aspects of UHI effects occurring at finer and broader spatial scales.

          I don’t know if I’m right or if Ross & Nicolas agree. But does that answer your question?

        • TAG
          Posted Dec 21, 2010 at 7:19 PM | Permalink

          Layman’s description – not for the participants but for people like me

          As I understand it, the issue is not the GCMs; it is the historical temperature record, because it has not been adjusted to account for the changes in UHI.

          McKitrick says that warming in the historical record is correlated spatially with the degree of economic growth (GDP?). Thus UHI effects have not been removed from the historical record. A significant proportion of the purported AGW warming is thus an artifact of inadequate UHI adjustment. The historical record is thus contaminated.

          Schmidt says that the GCMs produce simulated temperature records that show the same variation, and that therefore McKitrick’s correlation is spurious.

          McKitrick says that he checked his calculations against the GCM runs that he has and that Schmidt’s assertion does not hold.

          The argument on this blog is about the degree of variability that should be assumed in describing the historical record.

          Should the hypothesis be tested against a number of individual runs, with the number of runs showing the same or greater degree of correlation used as the measure? Or should the individual GCM runs be consolidated into an ensemble mean, with the variability of that mean being concomitantly less?

          I know that this is inaccurate but I hope it does have the flavour of what is going on.

        • bender
          Posted Dec 21, 2010 at 5:30 PM | Permalink

          This is consistent both with my understanding of the issue and with my understanding of pete’s complaint. Thanks to Ross, Nicolas and Ryan O for choosing to engage.

          Give pete *some* credit. He spotted a small but non-trivial error in Ross’s code (for which Ross thanked him). I’m sure pete won’t let it go to his head.

        • bender
          Posted Dec 21, 2010 at 5:34 PM | Permalink

          This was a reply to bernie. Why it got placed here, I have no idea.

      • bender
        Posted Dec 21, 2010 at 5:46 AM | Permalink

        pete, please reply to Nicolas and Ross. I see the word “error” in your comment a couple of times, yet you have actually not made your case. We’re listening, so proceed.

        • pete
          Posted Dec 21, 2010 at 6:32 AM | Permalink

          Are you really listening bender? My case is:

          The regression coefficients for the ensemble mean only give you the location of the sampling distribution. If you want information about the scale of the sampling distribution (which you need if you want to claim that a coefficient is ‘significant’) then you have to look at the individual model runs.

        • bender
          Posted Dec 21, 2010 at 8:25 AM | Permalink

          I’m listening, but I’m afraid I don’t understand what you’re saying. Merely repeating yourself isn’t going to help. “location” and “scale” of sampling distributions? These are geographical terms, not statistical, so I’m afraid your jargon is opaque to me.

          “Significance” of a coefficient is shorthand for a coefficient being significantly different from zero. Not sure why you feel the need to put quotes on it.

        • bender
          Posted Dec 21, 2010 at 9:33 AM | Permalink

          Wait a sec. You are simply talking about the mean and variance of the distribution of correlation coefficients. But this is no different from what was established a week ago, where I said:

          “Gavin’s complaint about individual runs versus ensemble averages in significance testing is half-right”

          McKitrick and Nierenberg 2010 Rebuts Another Team Article

          where I proceeded to explain why he’s not 100% right.

          This was TWO DAYS before you visited this blog. So you’ve really contributed nothing to the discussion. Gavin made an assertion. I half agreed. We all think it doesn’t make a difference because the ensemble of correlations that you lust after converges on the ensemble correlation.

          Why don’t you and Gavin just admit UHI contamination is a problem? Is it really so hard?

        • pete
          Posted Dec 21, 2010 at 10:03 PM | Permalink

          I’m listening, but I’m afraid I don’t understand what you’re saying. Merely repeating yourself isn’t going to help. “location” and “scale” of sampling distributions? These are geographical terms, not statistical, so I’m afraid your jargon is opaque to me.

          If you’re going to tell me to go read Advanced Theory of Statistics, then don’t complain if I start using proper statistical terminology. (‘Location’ and ‘scale’ are slightly more general terms than ‘mean’ and ‘variance’, and apply even when a distribution doesn’t have a mean or variance. That’s not an issue here, so feel free to read them simply as ‘mean’ and ‘variance’.)

          We all think it doesn’t make a difference because the ensemble of correlations that you lust after converges on the ensemble correlation.

          The estimate for mean of the sampling distribution of the regression coefficients converges to the regression coefficient of the ensemble.

          The estimate for the variance of the sampling distribution will converge to some strictly positive number. We need this number to determine significance, and we can’t get it by looking at the ensemble.

        • bender
          Posted Dec 21, 2010 at 11:34 PM | Permalink

          It’s about time that you clarified for us that you indeed have nothing to add beyond what I stated 6 days ago.

      • bender
        Posted Dec 21, 2010 at 6:04 AM | Permalink

        Ross: an anonymous commenter comes along saying that coefficients estimated on observed data cannot be compared to ensemble means, instead they must be compared to individual model runs

        pete: this sort of conversation goes a lot better if you don’t put words in other peoples’ mouths

        Ross is not “putting words in your mouth”. I actually said this – and long before you showed up here. You’re the one who’s making a big deal of it, whereas I explained why it doesn’t matter. If you’re now agreeing that it doesn’t matter then, terrific, we all agree: UHI contamination is a serious problem.

        You want to publish GOOD METHOD = GOOD SCIENCE? Then go for it! But let’s get the facts straight. It’s Steve’s blog. It’s Wegman’s “mantra”. No one here is interested in nitpicky correctness for correctness’ sake. We commenters are mostly engineers and we just don’t want the plane to crash. So methodological dogma is, I would say, not a big part of our “mantra”.

        Wegman was commenting on the unreliability of climate proxies – a skepticism which we now know was actually shared by Phil Jones. To suggest that Ross’s paper is as erroneous as tree rings are imprecise is laughable.

        Get serious. Or make us laugh. You choose.

    • pete
      Posted Dec 20, 2010 at 8:16 PM | Permalink

      Beyond that, I note that Pete acknowledges that the 25-year trends are climatic, not merely weather; and ensemble means are indicative of climatic processes, not merely weather. It is the trends that are being compared, and moreover trends in backcasts using common forcing inputs, so use of ensemble means is certainly appropriate.

      No, I was referring to the 25-year global trends.

      How convenient, though, for defenders of a scientific theory to be able to argue that absurdities in individual GCM runs are irrelevant because it’s only the ensemble means that we should examine, and when absurdities in the means are noted they are dismissed because it’s only the individual GCM runs that we should examine, and the potential population of such runs is so large we will never actually have it.

      You still seem to be having trouble grasping the difference between the ensemble mean and the individual runs.

      If you want to know the sampling distribution of your regression coefficients, it’s vital that you take into account the variation about the ensemble mean. As bender so kindly points out, regression results for the ensemble mean only give you the location of your sampling distribution, not its scale.

      It sounds like you want a nice, simple, black-and-white rule: “always use the ensemble mean” or “always use the individual runs”. But it doesn’t work that way; whether one or the other is appropriate depends on what you’re using them for.

      • HAS
        Posted Dec 20, 2010 at 9:17 PM | Permalink

        Pete, I guess my point is that it is quite acceptable to take the view that you do not “want to know the sampling distribution of your regression coefficients”. I might be quite satisfied just knowing the sampling distribution of the temperature gradients as derived from the ensemble. In fact I think there are good methodological reasons why the problem should be framed this way.

      • bender
        Posted Dec 21, 2010 at 5:51 AM | Permalink

        You still seem to be having trouble grasping the difference between the ensemble mean and the individual runs.

        Anyone with such misguided thinking is not worth replying to. So why bother? Why don’t you focus instead on a coherent reply to Ross and Nicolas? Explain “the error”. And are you conceding that “the error” doesn’t matter?

    • bender
      Posted Dec 20, 2010 at 10:03 PM | Permalink

      Excellent, excellent summary.

      Too bad pete has chosen the more acrimonious path in reply. pete, I’d have preferred it if you could have commented on the whole proceedings, rather than continue with the irrelevant nit-picking. But I understand. Gavin is your pal and you don’t care much about science at this stage.

    • Posted Dec 20, 2010 at 11:16 PM | Permalink

      The rule of thumb seems to be “Don’t piss off Canadians who are good at Math.”

      • theduke
        Posted Dec 20, 2010 at 11:21 PM | Permalink

        “. . . truth, justice, and the Canadian way.”

    • pete
      Posted Dec 21, 2010 at 7:45 AM | Permalink

      I’ve never used Stata, so it’s a little hard to interpret that log file. Are the trend fields for the individual runs in degrees/year or degrees/decade?

      • Posted Dec 21, 2010 at 11:18 AM | Permalink

        Good catch. You are right: in the main code files for the paper everything is matched up to deg/decade. Here I missed that step for the individual gcm runs. I have posted a revised log file at the same link.

        There is some bread for you in these revised runs. In 43 of the 55 runs (78%) the GCM does not yield any matches (down from 100%). In 8 runs (15%) the range for 1 of the 6 coeffs encompasses the observed coeff (up from 0%). In 3 runs (5%) it does so for 2 coefs and in 1 run it does for 3 (up from 0%). In none of the runs does a GCM yield 4 or more coeffs exceeding those in the observations.

        Looking at the GCM-generated coef ranges, in 1 case just under 10% of the range exceeds the observed coef. In the other cases the overlap is 5% or less, indicating that the coefs are in the tails of the gcm ranges.

        Out of 55*6=330 socioecon coeffs, in 17 cases the GCMs produced a coeff that exceeded the value estimated on observations, which is 5%. So they are getting a match at about the rate we would expect from Gaussian random guesses. Given a large enough list of GCM runs you will eventually get one that yields all 6 coeffs exceeding the observed values. But we can’t say how many model runs you’ll need since the success rate on that score was 0/55 in this sample.
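
        A sketch of how tallies of this shape can be computed, again with synthetic stand-ins rather than the actual coefficients, and reading a “match” as a run’s coefficient exceeding the value estimated on observations:

        import numpy as np

        rng = np.random.default_rng(3)

        # Synthetic stand-ins, NOT the actual coefficients: 6 observed values and
        # the same 6 coefficients from each of 55 GCM runs
        obs_coef = rng.normal(loc=1.0, scale=0.2, size=6)
        gcm_coef = rng.normal(loc=0.0, scale=0.4, size=(55, 6))

        exceeds = gcm_coef > obs_coef                 # shape (55, 6) by broadcasting

        per_run = exceeds.sum(axis=1)                 # 0..6 "matches" per run
        for k, c in enumerate(np.bincount(per_run, minlength=7)):
            print(f"runs matching on {k} of 6 coefficients: {c}")

        # Overall exceedance rate across all 330 coefficients (reported above as 17/330, about 5%)
        print(f"overall exceedance rate: {100 * exceeds.mean():.1f}%")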

        With regard to the climate/weather issue, you can’t appeal to the fact that you were referring to the global trend as a climatic variable, unless you are also claiming that “climate” has no meaning as a local phenomenon. In other words if all local trends are “weather” and climatic trends are only observed at global levels, then who cares about climate? We only ever experience the local trends. If we cannot say that a 25-year trend in a 2.5 degree grid cell is an indicator of the local climate, then why does the IPCC publish those maps with coloured grid cells? Why not just use one colour for the whole world?

        But if you are prepared to accept the grid cell trend as a climatic variable, in some sense, then a comparison to ensemble averages is appropriate. If you are going to set aside MN10 on the basis that you reject the concept of a grid cell trend as a climatic variable, so be it: but I think you will find you have to reject a lot of empirical papers on that basis.

        • bender
          Posted Dec 21, 2010 at 11:38 AM | Permalink

          Admits errors. Redoes calculations. Willing to be convinced by new data.

          He’s not allowed on the Team.

        • Steven Mosher
          Posted Dec 22, 2010 at 12:30 AM | Permalink

          Re: bender (Dec 21 11:38),

          http://www.agu.org/meetings/fm10/lectures/lecture_videos/A42A.shtml

          here. you and Ryan should watch this

        • bender
          Posted Dec 22, 2010 at 10:38 AM | Permalink

          Hey, Gavin, what did he say at 17:46?

        • bender
          Posted Dec 22, 2010 at 10:46 AM | Permalink

          24:33 Oops

        • bender
          Posted Dec 22, 2010 at 10:48 AM | Permalink

          26:09 missing box

        • bender
          Posted Dec 22, 2010 at 10:50 AM | Permalink

          27:30 structural errors common to all models

        • Ryan O
          Posted Dec 22, 2010 at 11:06 AM | Permalink

          It’s times like these that I hate dialup . . .

        • bender
          Posted Dec 22, 2010 at 11:16 AM | Permalink

          43:30 skewers multi-model “ensembles”

        • bender
          Posted Dec 22, 2010 at 11:24 AM | Permalink

          49:20 affordability of emissions cuts given model uncertainty; futility of the precautionary principle

        • bender
          Posted Dec 22, 2010 at 11:26 AM | Permalink

          51:10 only modest effort to resolve uncertainty

        • bender
          Posted Dec 22, 2010 at 11:34 AM | Permalink

          57:55 cloud feedbacks & probability of climate catastrophe

        • bender
          Posted Dec 22, 2010 at 11:35 AM | Permalink

          59:00 high climate sensitivities result from unphysical cloud parameterizations

        • bender
          Posted Dec 22, 2010 at 11:39 AM | Permalink

          62:00 “now that the cat is out of the …” ????

        • John M
          Posted Dec 22, 2010 at 3:27 PM | Permalink

          Maybe that’s where Mosher tried to ask a question. Someone must’ve recognized him.

        • Steven Mosher
          Posted Dec 22, 2010 at 5:28 PM | Permalink

          Re: John M (Dec 22 15:27), Dude, somebody did recognize me at Jones’s talk. It was funny as hell. My badge was turned around and she was saying “You look familiar? What’s your name?” Ha. I looked at my badge. Oops. Then, thinking quickly, I said, Steve Smith. WAAA. Anyway, I would NEVER ask a question at such an event. I was an observer. For the most part I liked everything I saw and heard. The people were great, the youth earnest. All good. That’s why I suggested that AGU should invite skeptics. Not to talk, but to listen and just walk around.

        • bender
          Posted Dec 22, 2010 at 11:43 AM | Permalink

          Steve M: you will probably want to scrutinize this material for a post.

        • Steven Mosher
          Posted Dec 22, 2010 at 5:21 PM | Permalink

          Re: bender (Dec 22 11:43),

          Bender, you and Ryan and Pete may want to watch all the videos on the Newton Institute site. I have forwarded them to SteveMc. There is also one where a Bayesian does something similar to Lucia’s exercise with a two-box model. There is also a FASCINATING session on emulating the outputs of a GCM, and how that emulation discovered an error in the model. Palmer gets climategate. I won’t put words in his mouth (go listen), but he sees that the science needs to focus on uncertainty. He sees opportunity in this debacle. Judith also gets it. She wonders if the science can rule out “King Dragons”, outliers in power-law distributions. Peter Webster gets it. We spent a couple minutes talking about ICOADS at lunch. Peter’s question: how can you understand natural variability when you run everything through EOFs and smooth out all the bumps? (my translation of heavy maths talk that was above moshpit paygrade)

          In any case, what was SteveMc’s primary concern in all of the hockey stickery? When it comes down to it, all of Mann’s shenanigans result in NARROW CIs. Osborn, bless his soul, even noted that the CIs were smaller than the instrument series in some cases. I’ll sum up all of Steve’s scientific points in one phrase: we are more uncertain than Mann thinks.

        • Steven Mosher
          Posted Dec 22, 2010 at 5:59 PM | Permalink

          Re: bender (Dec 22 11:43),

          http://www.newton.ac.uk/programmes/CLP/seminars/index.html

          http://www.newton.ac.uk/programmes/CLP/seminars/092110159.html

          http://sms.cam.ac.uk/media/1081571?format=flv&quality=high&fetch_type=stream

          Lots more, bender. Moshpit is back in school, watching them all.

        • bender
          Posted Dec 22, 2010 at 8:32 PM | Permalink

          Did you notice it was Alan Robock who introduced Tim Palmer?

        • bender
          Posted Dec 22, 2010 at 8:33 PM | Permalink

          Do you think it was Gerald Browning at AGU that got yanked, asking about the “cat out of the bag”?

        • EdeF
          Posted Dec 23, 2010 at 2:39 PM | Permalink

          Tim Palmer has an interesting brief on the history of GCM modeling. He even mentions the U-word. Interesting that they don’t use random variables; it sounds like small changes in initial conditions produce enough variability for them to handle at the moment. Use of random variables would cause them to have to make orders of magnitude more runs, which they can’t do at present. Since small changes in input may lead to large effects downstream, I am interested in why they have left out human-generated heat in their models. Possibly it’s just too big of a job to do at present. Looks like they have added new capabilities each decade since the 50s. I thought that the original idea for the GCM was merely to estimate the sensitivity of the climate to, say, a doubling of CO2. For some reason this has morphed into a belief that they can predict large time scale changes in climate. Now, what do you do when your models underpredict the temperature record as found in the land station data? There might be a tendency to tweak knobs and throw in some fudge factors.

          The use of large scale models has one overriding goal: to help people understand physical processes. A number is not the goal, human understanding is the goal. Are we getting any wiser?

        • Speed
          Posted Dec 23, 2010 at 3:23 PM | Permalink

          … it sounds like small changes in initial conditions produce enough variability for them to handle at the moment.

          Don’t weather simulations (forecasts) depend on initial conditions while climate simulations (forecasts) depend on boundary conditions?

          But forecast skill of even our new models degrades with time and by 1-2 weeks out there is little forecast skill left. So knowing that, how can we forecast 50 years from now what will be happening with increasing greenhouse gases?

          Technically, one is an initial value problem and the other a boundary condition problem.
          http://cliffmass.blogspot.com/2009/08/global-warming-misconceptions_17.html

        • bender
          Posted Dec 23, 2010 at 6:38 PM | Permalink

          This is OT. (As are musings about chaos & weather versus climate.)

        • Steven Mosher
          Posted Dec 22, 2010 at 12:39 AM | Permalink

          Re: bender (Dec 21 11:38),

          http://sms.cam.ac.uk/media/1083628;jsessionid=8870C1C6C261914F335D7B4024E2566D?format=flv&quality=high&fetch_type=stream

        • HAS
          Posted Dec 21, 2010 at 3:13 PM | Permalink

          It is still possible to argue (and effectively Schmidt does so in a comment in RC) that each run with initial conditions represents a different world, and there is one run that models the real world. (However it is also claimed that runs with a range of realistic initial conditions cluster and thus allow us to make inference about what the real world one would look like).

          Of course this means that the Schmidt test for contamination (“you’ve got to show it isn’t in the model runs”) breaks down. Without some additional claims about the relationship between initial conditions and contamination, or contamination across runs, you can only run the test if you are given the real-world run. Note that on this interpretation testing the ensemble means doesn’t help with the test.

          So I think I’m moving my position. I no longer think that there is nothing wrong with testing the ensemble; I now think testing individual runs is a fool’s errand.

        • Ryan O
          Posted Dec 21, 2010 at 6:54 PM | Permalink

          I would go the other way.

          The whole reason (in my opinion) for ensemble means being used at all is to allow modelers to place a greater certainty on the “projections” than is warranted and to account for the fact that most individual runs deviate quite significantly (especially on regional levels) from what is known to have happened in the hindcasts. It’s a way to turn something that has more internal variability than they want to admit into a pseudo-deterministic projection that looks good on paper.

          The real climate isn’t an ensemble. It’s a single realization of a stochastic process. Either an individual model run represents a possible earth, or it doesn’t. Of course, this has the obvious corollary that either the models have real problems or the internal variability of the climate is much greater than the party line. And if the individual runs cannot be said to represent earth, then one must ask how the average of non-earths can be anything but an average non-earth.

        • bender
          Posted Dec 21, 2010 at 7:01 PM | Permalink

          The ensemble allows you to focus on that which is hypothesized to be deterministic, such as GHG forcing. With individual model runs the stochastics may prevent you from seeing things like deterministic trends that are a response to simulated external forcings.

          The statistics of individual runs are relevant.

          Of course, the most relevant statistic of all would be the one Willis wants to see – the statistics of individual runs that are dismissed as “unrealistic” versus those that are accepted as “realistic”.

          Asking that question gets them talking about the value of ensembles in a hurry.

        • Ryan O
          Posted Dec 21, 2010 at 8:12 PM | Permalink

          Agreed, but the range of the individual runs is what needs to be used to set your confidence intervals . . . and the range needs to be determined without any post hoc elimination of runs.

        • bender
          Posted Dec 21, 2010 at 8:14 PM | Permalink

          yep

        • HAS
          Posted Dec 21, 2010 at 8:18 PM | Permalink

          In the end we can probably all agree that the statistics of individual runs or ensembles are only relevant if Schmidt’s test is relevant in some shape or form.

          It isn’t.

          The idea that one introduces GCMs into the validation of empirical results derived independently is a joke and should be treated as such (particularly given all the questions about the quality of GCMs in hindcasts – and in addition to the various issues mentioned here I’d particularly note the problems at the sub-global level).

          In any normal discipline the conversation would have been along the lines of: “That’s an interesting result M&N, let’s apply it to the output of our GCMs to see what it tells us about how well they are performing. Wow, from the small number of runs we have it looks as though our GCM models are getting it right – they show no contamination – but better check that we’re handling UHI correctly in our core data.”

        • bender
          Posted Dec 21, 2010 at 8:49 PM | Permalink

          “The idea that one introduces GCMs into the validation of empirical results derived independently is a joke”

          I’ll debate that one.

        • HAS
          Posted Dec 21, 2010 at 9:05 PM | Permalink

          If the validation failed what would you conclude?

        • bender
          Posted Dec 21, 2010 at 11:29 PM | Permalink

          Fair enough. Nobody should take any test against a GCM as “conclusive”. It is at best “indicative” or “suggestive”. But the idea is not “a joke”.

        • HAS
          Posted Dec 22, 2010 at 1:10 AM | Permalink

          I think you’d conclude there was something wrong with the Climate Model(s).

          If you are looking for validation of the UHI effects identified by M&N, there are much more direct experimental ways to do it. Schmidt’s response is a classic case of “if you’ve only got a hammer, everything looks like a nail.”

          We seem to give GCMs as they stand a status well beyond their due. They are interesting laboratories in which to test the science, but if I want to validate UHI effects or estimate the temperature probability distribution in 2050 give me a plumber any day.

        • Neil Fisher
          Posted Dec 23, 2010 at 11:52 PM | Permalink

          “The idea that one introduces GCMs into the validation of empirical results derived independently is a joke”

          I’ll debate that one.

          As would I – but I’m not agreeing with bender. “Joke” is not quite right, even though it looks at first glance to be laughable (to me at least) to use an unvalidated, unverified model to validate unrelated empirical observations. If I were being generous, I would call it “an interesting suggestion”; if I were being parsimonious, I would call it “ridiculous” or perhaps “completely unjustified”. Read it again, bender – “The idea that one introduces GCMs into the validation of empirical results derived independently is a joke”

        • Pat Frank
          Posted Dec 22, 2010 at 12:39 PM | Permalink

          A 2005 paper published in Nature by the ClimatePrediction.Net team* revealed their ratio of kept vs. total, at any rate. In the control experiment, they ran 2,017 “unique simulations” of which they had to discard 869.

          This was because the 869 simulations showed global cooling with constant forcing (Figure 1a). The rationale for the dismissal was that, “Some model versions show substantial drifts in the control phase owing to the use of a simplified ocean. We remove unstable simulations …”

          In the doubled CO2 regime, six of 414 model versions were uncooperative and showed cooling. This was again ascribed to “the use of a simplified ocean.” So, these six simulations were “excluded from the remaining analysis of sensitivity.”

          Apparently, no one wondered whether use of a simplified ocean might delegitimize the ‘stable simulations’ as well.

          *D. A. Stainforth, et al., (2005) “Uncertainty in predictions of the climate response to rising levels of greenhouse gases” Nature 433, 403-406.

        • Steven Mosher
          Posted Dec 22, 2010 at 5:31 PM | Permalink

          Re: Pat Frank (Dec 22 12:39),
          Slab ocean is crucial … so is the sulfur cycle.

        • pete
          Posted Dec 21, 2010 at 11:01 PM | Permalink

          With regard to the climate/weather issue, you can’t appeal to the fact that you were referring to the global trend as a climatic variable, unless you are also claiming that “climate” has no meaning as a local phenomenon.

          We can use observed trends as estimates for climatic trends. In the global or regional case, the standard error in these estimates should be low enough that we can make inferences about the climatic trend from the observed trend.

          If the standard errors are too high (say at the gridcell scale) then any given observed trend doesn’t tell us a lot about the climatic trend. This is not the same as saying the climatic trend doesn’t exist.

          But if you are prepared to accept the grid cell trend as a climatic variable, in some sense, then a comparison to ensemble averages is appropriate.

          A comparison to ensemble averages is fine. The question is how do you make that comparison? How large does a difference need to be before it’s significant? In order to make that comparison you need an estimate for your standard error.

          The standard error is a result of variability about the ensemble mean. If you remove that variability by taking the average over the model runs, you remove your ability to estimate that standard error.
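
          A minimal numerical sketch of this point, using made-up runs rather than anything from a GCM archive: the spread of per-run trend estimates is what carries the standard-error information, and that spread is gone once the runs are averaged first.

          # Hypothetical illustration only; the run count, trend and noise level are assumptions.
          import numpy as np

          rng = np.random.default_rng(0)
          n_runs, n_years = 55, 30
          forced_trend = 0.02          # assumed deterministic trend, deg C per year
          noise_sd = 0.15              # assumed internal variability, deg C

          years = np.arange(n_years)
          runs = forced_trend * years + rng.normal(0, noise_sd, (n_runs, n_years))

          # trend estimated from each individual run
          run_trends = np.array([np.polyfit(years, r, 1)[0] for r in runs])

          ensemble_mean_trend = np.polyfit(years, runs.mean(axis=0), 1)[0]
          spread = run_trends.std(ddof=1)   # the spread that supplies a standard error

          print(ensemble_mean_trend, spread)   # the mean alone gives no spread to work with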

        • Salamano
          Posted Dec 21, 2010 at 11:30 PM | Permalink

          I’m a little confused.

          How is this when compared to the weather world, as opposed to the climate world? We use ensemble modeling all the time, and we use many runs that take the initial state of the atmosphere and perturb it to various small degrees to either side. Taken together, the envelope is assumed to account for all such inconsistencies, abnormalities, non-adjustments, and small-scale variability, so that the result can be presumed to be a solid representation of what’s going on.

          In the climate world, does the use of ensembles therefore state that ‘everything we haven’t already thought of’ is similarly negligible because it’s either too insignificant or already accounted for by the perturbation of the initial conditions run through the model?

          I also know in the weather world, if we discover something that affects the initial state conditions of the model, it’s addressed…not dismissed under the penumbra of ensemble envelope accounting. Would it even make sense to dismiss such things when the objective is accuracy?

        • bender
          Posted Dec 21, 2010 at 11:32 PM | Permalink

          sorry, Salamano, pete has other questions to reply to.

        • Ryan O
          Posted Dec 22, 2010 at 1:47 AM | Permalink

          Pete needs a lesson in estimating standard errors.

        • pete
          Posted Dec 22, 2010 at 2:09 AM | Permalink

          Care to elaborate?

        • bender
          Posted Dec 22, 2010 at 9:55 AM | Permalink

          Why should Ryan O elaborate on standard errors, when you won’t elaborate on your F-test? Double-standard much?

        • Ryan O
          Posted Dec 22, 2010 at 2:21 AM | Permalink

          By the way, trying to define the standard error as the spread of individual runs about the mean does not win this argument. You must first show why individual model runs are a good null, and then you must show why a feature of the climate in the observational data (as Ross’s socioeconomic correlations persist for longer than the magic 30-year number) should not be tested against what you yourself called “climate” – the ensemble mean – and instead should be tested against what you yourself called “weather”.

          You can’t have it both ways. Pick one.

        • pete
          Posted Dec 22, 2010 at 5:07 AM | Permalink

          None of this has any bearing whatsoever on the results that Ross gets from the observational data. This fact is routinely ignored by Pete and everyone else who doesn’t want to admit that UHI is a problem. The results that Ross gets are statistically significant in and of themselves, even after correction for SAC.

          Statistical significance requires some sort of null model. If not GCMs then what?

          you must show why a feature of the climate in the observational data (as Ross’s socioeconomic correlations persist for greater than the magic 30-year number)

          There’s nothing magic about 30 years; it’s a useful rule of thumb but it depends on the scale of the observations.

          The trend field is a combination of forced climatic trends and endogenous internal variability. If you average across the field, central tendency gives you a regional/global climate trend. If you average over model realisations, central tendency gives you a local climate trend.

        • bender
          Posted Dec 22, 2010 at 9:56 AM | Permalink

          one broken record
          one dodge

        • pete
          Posted Dec 22, 2010 at 5:48 AM | Permalink

          3. The modelers argue that it is only the ensemble means that have predictive power, not the individual models. So if it is only the ensemble means that are proposed to have a physical meaning, then the modelers are admitting that the individual runs are not likely to represent a real alternative representation of earth’s climate. Therefore, testing using the individual model runs is entirely nonsensical. They’ve already been declared to be unrealistic representations of the climate.

          The ensemble means can be physically interpreted as the deterministic response to exogenous forcings.

          The variability about the ensemble means can be physically interpreted as endogenous internal variation.

          Since the internal variability is inherently unpredictable, there’s no point trying to predict it.

          If I told you that a fair coin flipped a thousand times was going to come up tails 468–532 times, would you disbelieve me because I couldn’t predict the individual coin flips?
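
          For reference, the 468–532 range is roughly the binomial mean plus or minus two standard deviations, as a quick check (not part of the thread’s data) confirms:

          # 1000 flips of a fair coin: mean = 500, sd = sqrt(n*p*(1-p)) ~ 15.8
          import math

          n, p = 1000, 0.5
          mean = n * p
          sd = math.sqrt(n * p * (1 - p))
          lo, hi = mean - 2 * sd, mean + 2 * sd    # roughly a 95% range
          print(round(lo), round(hi))              # ~468 to ~532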

        • bernie
          Posted Dec 22, 2010 at 7:42 AM | Permalink

          “Deterministic” as in the Laws of Thermodynamics or the consensus of who is going to win the Super Bowl? Seems to me that given our current understanding the GCM ensemble is closer to the second.

        • bender
          Posted Dec 22, 2010 at 9:58 AM | Permalink

          whereupon pete attempts to establish authority by displaying his impressive understanding of the concept of randomness

          but still dodges the question

        • pete
          Posted Dec 21, 2010 at 11:24 PM | Permalink

          Given a large enough list of GCM runs you will eventually get one that yields all 6 coeffs exceeding the observed values. But we can’t say how many model runs you’ll need since the success rate on that score was 0/55 in this sample.

          This is almost certainly not the appropriate test. If you swap out the first model and put the observations in its place, you’ll get a 0/55 success rate as well.

          Note that if the means of the sampling distributions are ~0, you’ve only got a 1/64 chance of matching the signs, let alone exceeding the magnitude.

          I find the distribution of F scores more interesting. 19/55 — more than a third — are higher than the F score for the observations.

        • bender
          Posted Dec 21, 2010 at 11:31 PM | Permalink

          This is almost certainly not the appropriate test.

          I disagree; I think it is an appropriate test. I’ve made my case why. What’s your case to the contrary?

        • pete
          Posted Dec 21, 2010 at 11:35 PM | Permalink

          See where I said “[t]his is almost certainly not the appropriate test”? If you look carefully, just to the right of that there are some more words. Then below them are a few more. If you read those words they might just answer your question.

        • bender
          Posted Dec 22, 2010 at 8:16 AM | Permalink

          Ohhhh. Is that how writing works? Thanks.

        • bender
          Posted Dec 22, 2010 at 9:50 AM | Permalink

          I find the distribution of F scores more interesting. 19/55 — more than a third — are higher than the F score for the observations.

          So you think an F test is appropriate because of the result it “gets”? This is science?

          Sorry, pete, but that’s not how it works. Please explain your a priori logic.

          I don’t even understand what you’re calculating. An F-test is a ratio of two variances. Let’s start from there … what variances? Why? And why is this better than the permutation test that I described? Please try not to channel Gavin’s guru, Tom P.

        • bender
          Posted Dec 22, 2010 at 9:53 AM | Permalink

          Wait a sec. You’re comparing the (spatial) variance of one of the socioeconomic variables to the (spatial) variance of the climate data? And you’re proposing the ratio as a meaningful statistic? Is that what you’re doing? And you expect me to follow that bone-headed calculation as though it were somehow intuitive?

          You tell me right now if you’re Tom P. Because if you are, we’re done.

        • pete
          Posted Dec 22, 2010 at 5:04 PM | Permalink

          I don’t know who Tom P is, but I’ll pretend to be him if it will shut you up.

        • bender
          Posted Dec 22, 2010 at 10:11 AM | Permalink

          This is the second time pete has insisted I guess what’s in his mind, refusing to clarify what he’s written. And the second time I’ve played along.

          pete, I won’t be playing a third time. You want to play like Tom P? Game over.

        • Ryan O
          Posted Dec 22, 2010 at 2:13 AM | Permalink

          Pete, seriously, think about this for a second:

          Out of 55*6=330 socioecon coeffs, in 17 cases the GCMs produced a coeff that exceeded the value estimated on observations, which is 5%. So they are getting a match at about the rate we would expect from Gaussian random guesses.

          Then think about this:

          Note that if the means of the sampling distributions are ~0, you’ve only got a 1/64 chance of matching the signs, let alone exceeding the magnitude.

          Now you are on to F tests.

          These are not the droids you are looking for.

        • pete
          Posted Dec 22, 2010 at 2:21 AM | Permalink

          I know you’ve got a good grasp of stats Ryan, so I’m interested in hearing your objections. But blog comments have low enough signal as it is, without me speculating as to what specific error you think I’ve made.

        • bender
          Posted Dec 22, 2010 at 10:03 AM | Permalink

          For the second time …
          Please explain the logic of your F test. It’s not transparent to me. Seems it’s not transparent to Ryan either.

          Stop obfuscating, demanding other people refute your approach before it’s even been explained.

          My hypothesis is that you like the F test for the result that it “gets”. You didn’t think for more than a half-second about what a ratio of two variances means.

          You are getting further and further from Gavin’s argument. Not closer.

        • Ryan O
          Posted Dec 22, 2010 at 11:03 AM | Permalink

          Until you explain exactly what you did for your F test, the numbers you quote don’t mean much to me. By the way, I think I have an idea of what you did, and, if I am right, that test isn’t telling you what you think it is.

        • Layman Lurker
          Posted Dec 22, 2010 at 12:48 PM | Permalink

          Pete is comparing the F scores from each model run to the F score from a table (for the observations?) marked “use combined data” (immediately underneath the table from the last model run). 19/55 of these F scores from the individual model runs exceed the posted F score of 102.4 in the “use combined data” table. The model F scores are derived from the MS ratios of the model vs residuals for each run. The F score is then obviously compared to the critical F value from the alpha=.05 F table using the appropriate degrees of freedom (posted in brackets for each run). My interpretation of the model run F scores is as a null of variance equivalence between model and residuals where a score which exceeds the critical F value means the null is rejected. I have not deduced how the F score was derived for the “use combined data” table.

          I don’t understand how Pete’s 19/55 comparison would be meaningful.

        • bender
          Posted Dec 23, 2010 at 6:44 PM | Permalink

          F scores from each model run

          Again. How do you get an F score from a model run when an F score is the ratio of two variances? What are the two variances, and what makes them comparable as a ratio? Last chance before this thread dies.

        • pete
          Posted Dec 23, 2010 at 6:55 PM | Permalink

          F = (SS.model/df.model) / (SS.resid/df.resid)
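
          A minimal sketch of how that F score comes out of a single regression fit (random placeholder data, not the MN10 log file; the sample and covariate counts are assumptions):

          # Hedged illustration of F = (SS.model/df.model) / (SS.resid/df.resid)
          import numpy as np

          rng = np.random.default_rng(1)
          n, k = 440, 6                        # made-up sizes: n gridcells, k covariates
          X = rng.normal(size=(n, k))
          y = rng.normal(size=n)               # pure noise here, unlike the real data

          X1 = np.column_stack([np.ones(n), X])
          beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
          fitted = X1 @ beta
          resid = y - fitted

          ss_model = np.sum((fitted - y.mean()) ** 2)
          ss_resid = np.sum(resid ** 2)
          df_model, df_resid = k, n - k - 1

          F = (ss_model / df_model) / (ss_resid / df_resid)
          print(F)    # compared against an F(k, n-k-1) table, or a Monte Carlo reference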

        • pete
          Posted Dec 22, 2010 at 4:49 PM | Permalink

          The F-test is used to test for joint significance. So if we want to know if any of the six socioeconomic covariates are significant we should use something like the F-test.

          t-tests are not as suitable in this sort of case — imagine we had 20 covariates: we’d expect one of them on average to pass the t-test even under the null.

          Of course neither the t-test or the F-test is appropriate in this particular case. The assumptions that lead to the t-statistic or F-statistic following a t-distribution or F-distribution are violated.

          That’s why I want the results from individual model runs: we need a replacement for the t-tables or F-tables so that we can do significance tests.

          Using the Monte Carlo distribution of the F-statistic suggests that the p-value for the observed regression* is about 0.34 — not even close to significant.

          Please note that I said “suggests”! I don’t think that this is necessarily the ‘right’ test to do, but it was the obvious (to me at least) quick-look-at-the-log-file test.

          * Layman Lurker:

          You want to look at the line
          reg cru3v uah4 slp dry dslp water abslat g e p m y c , robust cluster(gdp99)

          to see what the following regression is about. The combined data is the data from MM07 plus S09.
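
          The Monte Carlo p-value described here appears to be just the fraction of model-run F scores exceeding the observed one, which lines up with the 19-of-55 count quoted upthread:

          # Sketch using the thread's own counts, not a recomputation of the log file.
          n_exceed, n_runs = 19, 55        # per pete: 19 of 55 model-run F scores exceed the observed F
          p_mc = n_exceed / n_runs
          print(round(p_mc, 2))            # ~0.35, in line with the ~0.34 quoted above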

        • Ryan O
          Posted Dec 22, 2010 at 12:27 PM | Permalink

          Pete,

          without me speculating as to what specific error you think I’ve made.

          I wasn’t implying you made an error (though the F test thing is . . . well, not an error per se, but not useful, either).

          You yourself have already answered whatever objections you might have to Ross’s and Nic’s study, although you seem not to realize it yet. You yourself have stated why MN10’s results are likely to be correct, although you have yet to make the connection in your head. This is okay – happens to me a lot – but one thing Nic L. could tell you is that, unless I realize where I’ve gone wrong on my own, I often fail to see my error in reasoning.

          So I don’t plan on being more specific than that. Reread all of your posts, from beginning to end, and think about how the things you already have realized render your objections to MN10 largely moot.

        • pete
          Posted Dec 22, 2010 at 5:24 PM | Permalink

          Pete, seriously, think about this for a second:

          Out of 55*6=330 socioecon coeffs, in 17 cases the GCMs produced a coeff that exceeded the value estimated on observations, which is 5%. So they are getting a match at about the rate we would expect from Gaussian random guesses.

          So, converting this to a two-tailed test, Ross’s results aren’t quite significant at 90%.

          Then think about this:

          Note that if the means of the sampling distributions are ~0, you’ve only got a 1/64 chance of matching the signs, let alone exceeding the magnitude.

          This means Ross’s test has a Type I error rate of at least 98%, compared to the usual 5%.

          You yourself have stated why MN10’s results are likely to be correct, although you have yet to make the connection in your head.

          You want me to speculate about what you mean here. But the way I’d do this is to start with the simplest obvious explanations, and rule them out before proceeding further. But the simplest explanations are slightly insulting to you. Like did you not see the distinction between sign-matching one coefficient and sign-matching six? Or did you think that the 1/64 chance of sign-matching was evidence for Ross’s theory, even though the same holds for any six random coefficients?

          Much easier if you’d just tell me. I’ve got xmas shopping to do.

        • bender
          Posted Dec 22, 2010 at 11:39 PM | Permalink

          I’ve got xmas shopping to do.

          gotta know when to fold ’em.

        • Ryan O
          Posted Dec 22, 2010 at 8:45 PM | Permalink

          Pete,

          This will be my last post on this subject. I, too, have other things to do.

          But the way I’d do this is to start with the simplest obvious explanations, and rule them out before proceeding further.

          Hardly. The simplest obvious explanation is UHI. You have decided a priori that UHI is not the answer and are engaged in a series of mental gyrations to justify your belief.

          Now, more to the point:

          You do realize that the results in Ross’s log file are not corrected for SAC, right? Now what would you have said if Ross presented observational results that were not corrected for SAC? Yet you have no reservations about quoting uncorrected results when they suit your purposes. This does not seem to be terribly objective.

          Signs. As I stated at the beginning, I think a 2-tailed test is more proper. But it’s not the same thing as your 2-tailed test, and I will explain.

          Let’s do a thought experiment. Let’s take 3 balls . . . one massive one and two much less massive ones. Place them at fixed points in space. Impart velocities to the 2 small ones that are random in magnitude and direction. Observe.

          During the test, you notice that the two small balls seem to accelerate toward the big ball regardless of initial trajectory. You hypothesize that this is due to a previously unknown force that you call “gravity”. You want to publish your results.

          Someone comes along and says that calculating the probability the effect is real by simply looking at the error variance in the trajectory is not a good null. They propose a trajectory model, which incorporates things like instrument error and space dust winds and naturally-occurring magnetic field perturbations (but none of that hokey gravity crap or godawful big balls) as a better null. They further claim that they can reproduce your results in some not-insignificant percentage of the model runs.

          As evidence, they show that in ~10% of the runs, the magnitude of the error in the measured trajectory of the two small balls equals or exceeds that of your test.

          Case closed?

          Pete would seem to think so.

          Now you decide to look at their claims a bit further. Remember, in your test, the trajectory error was the same (within experimental error) for both small balls, and the direction vector always pointed toward the big ball. But in the model runs, though the distribution of trajectory errors included the values you found in your test (between the 90 and 95% 2-tailed CIs), the vectors were random and in only 10% of the cases did the small balls move in the same direction (within experimental error).

          Note that in order for the model to have simulated your result of a force between the big and small balls, the small balls must move in the same direction.

          In other words, the test only duplicated your results 0.1 * 0.1 . . . or 1% of the time.

          So Pete, I’ll give you that the signs could be either positive or negative. Bender won’t, and Ross won’t, but I will. However, that comes with a caveat: the combination of the signs must be consistent. Otherwise, the GCMs are only duplicating one element of the observations – magnitude. In order to show that the GCMs generate the same pattern as what is seen in the observational data, they must also duplicate the direction vector, which in this case means the correlations must be internally consistent with the relationship displayed by the socioeconomic factors.

          If the socioeconomic factors were unrelated, the consistency requirement is no longer present. In that case, they have no defined relationship to each other, so there is nothing requiring them to demonstrate any particular pattern of correlations. But in Ross’s case, the socioeconomic factors are almost interchangeable in some cases (you yourself noted the high correlation between two of them). The factors are essentially proxies for human land and energy usage, and are physically coupled.

          So sure, I’ll give you the sign argument – with the caveat that the result is consistent with the underlying physical mechanism. And in order for you to have an argument, you must demonstrate that the joint probability of the factors showing up in a single model run with the correct relationships to each other exceeds whatever significance benchmark you propose.

          This means Ross’s test has a Type I error rate of at least 98%, compared to the usual 5%.

          No. Ross’s test has a Type I error rate of about 10% if I grant you the sign issue; 5% otherwise.

          Your test has a Type II error rate of 1 – 1/32, or about 97%.
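
          A sketch of the joint criterion described here, with invented placeholder arrays rather than MN10 output: a run only counts as reproducing the observed pattern if it matches both the sign pattern and the magnitudes.

          # Hedged illustration; obs and runs are made up, not the actual coefficients.
          import numpy as np

          rng = np.random.default_rng(3)
          obs = np.array([0.5, 0.3, -0.2, 0.4, 0.1, -0.3])   # placeholder observed coefficients
          runs = rng.normal(0, 0.2, size=(55, 6))             # placeholder per-run coefficients

          signs_match = np.all(np.sign(runs) == np.sign(obs), axis=1)
          magnitude_match = np.all(np.abs(runs) >= np.abs(obs), axis=1)

          joint = signs_match & magnitude_match
          print(joint.mean())    # fraction of runs satisfying both conditions at once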

        • Salamano
          Posted Dec 22, 2010 at 11:01 PM | Permalink

          Ryan,

          You’re probably not following this thread anymore, but I wanted to say that I felt I was able to understand what you’re talking about all the way through (which is a common experience with your posts).

          I’m still a little confused as to why UHI gets to vanish within the noise or is accounted for by varied ensemble runs, but something like temperature proxies of a prior millennium are guaranteed to rise to the surface (to within some .1 degC accuracy), all despite the ability to run multiple ensembles that can capture similar scenarios.

          Is it something where they both can be right, or both wrong, but not one right and the other wrong, unless by decree?

        • bender
          Posted Dec 22, 2010 at 11:38 PM | Permalink

          The simplest obvious explanation is UHI. You have decided a priori that UHI is not the answer and are engaged in a series of mental gyrations to justify your belief.

          Yes, exactly. And yet he has failed to even say the phrase. Gavin’s alternative hypothesis amounts to mysticism.

          I asked Ross a year ago if he trawled through thousands of variables to come up with his magic subset (thus increasing his risk of detecting a spurious correlation), and he said he didn’t. Which means Gavin’s mysticism is pretty unlikely. Whereas Imhoff’s UHI is a relatively sure bet.

          pete, again, did you like Imhoff’s paper?

        • pete
          Posted Dec 22, 2010 at 11:44 PM | Permalink

          This will be my last post on this subject. I, too, have other things to do.

          Fair enough.

          You do realize that the results in Ross’s log file are not corrected for SAC, right? Now what would you have said if Ross presented observational results that were not corrected for SAC? Yet you have no reservations about quoting uncorrected results when they suit your purposes. This does not seem to be terribly objective.

          If I had a copy of Stata handy I’d have run the SAC-corrected ones myself. Since I don’t I’ll have to make do with what Ross gave me. (Although if someone has Davidson and MacKinnon handy, it’d be useful to know if ‘robust’ refers to FGLS and what correction Stata uses for clustering.)

          A few people seemed to think I should have been satisfied when the first log file came out and appeared to show that the range of coefficients from the model runs wasn’t large enough.

          Now I’m saying that the updated log file suggests that the distribution of coefficients from individual runs might be interesting. I don’t think that’s a leap too far.

          Let’s do a thought experiment. Let’s take 3 balls . . . one massive one and two much less massive ones. Place them at fixed points in space. Impart velocities to the 2 small ones that are random in magnitude and direction. Observe.

          During the test, you notice that the two small balls seem to accelerate toward the big ball regardless of initial trajectory.

          And in order for you to have an argument, you must demonstrate that the joint probability of the factors showing up in a single model run with the correct relationships to each other exceeds whatever significance benchmark you propose.

          In your gravity case we know what the correct relationship is — the small balls should be heading in the same direction (towards the larger one).

          We’ve got no a priori correct sign (or pattern of signs) in the MN10 case. You can’t just define the results of Ross’s regression to be the ‘correct relationship’.

          If the socioeconomic factors were unrelated, the consistency requirement is no longer present. In that case, they have no defined relationship to each other, so there is nothing requiring them to demonstrate any particular pattern of correlations. But in Ross’s case, the socioeconomic factors are almost interchangeable in some cases (you yourself noted the high correlation between two of them).

          I see your point here, I was implicitly assuming that the covariates were unrelated (which they ought to be).

          Looking a bit more carefully, one of p, m, and y really needs to be dropped from the regression.

        • bender
          Posted Dec 22, 2010 at 11:56 PM | Permalink

          We’ve got no a priori correct sign (or pattern of signs) in the MN10 case. You can’t just define the results of Ross’s regression to be the ‘correct relationship’.

          Yes, you can, because the a priori mechanistic hypothesis, from IPCC no less, is UHI contamination. The hypothesis dates back to 1989 (Parker) if not 1973 (Oke) if not ~1800. That’s what I call “a priori”.

        • pete
          Posted Dec 22, 2010 at 11:59 PM | Permalink

          Does UHI predict negative coefficients on education and gdp growth?

        • bender
          Posted Dec 23, 2010 at 7:45 AM | Permalink

          I wouldn’t think so, pete.

          But I am pretty sure I stated that N&M isn’t going to be the last word on the issue. And I’m pretty sure Ross would agree. And I’m pretty sure I mentioned Imhoff once or twice.

          Mosher says:

          People would take notice of Ross’ result and try to formulate a regression that had a stronger physical basis.

          Which expresses exactly my sentiment. It’s a notable result – but not a final result – and none of your tinkering is going to overturn it.

        • bender
          Posted Dec 23, 2010 at 8:04 AM | Permalink

          education – Maybe educated societies don’t pave over everything in sight? Maybe industrialized countries aren’t as educated as people who live in them think? I dunno.

          gdp “growth” – is that the rate of change in gdp? Because if it is, maybe the fastest-growing areas are areas where there has been less growth in the past. So they’re UHIs of the future?

          Ross may want to comment on this one. I’m no socioeconomist, so I don’t have a lot to say.

          The relationship could be spurious too. Just as with the strong positive ones – only less likely.

        • Steven Mosher
          Posted Dec 23, 2010 at 2:02 AM | Permalink

          Re: pete (Dec 22 23:44),

          The a priori hypothesis of UHI being positive has been confirmed repeatedly by observational climate science. It’s a tenet of climate science. It’s the result of understood physical processes. We understand it so well that we can even model the UHI for cities. It is understood that the bias is positive. Hence the effort to “green” cities to diminish deaths from heat waves, which are enhanced by UHI.

          The question has always been: does this bias infect the record? When Peterson and Jones and Parker found no bias they postulated (but did not test) the following:

          UHI does not infect the record (AS PETERSON EXPLICITLY EXPECTED IT TO) because (they postulated) the temperature stations are located in the tiny pockets of cities called “cool parks.”

          Well now we have the imagery products to actually see if stations are located in cool parks. We also have the imagery products to understand the differences in the following between built areas and unbuilt areas:
          1. Water content of the surface/soil
          2. Amount of impervious surface.
          3. Building height.
          4. Electrification
          5. Land use/land cover/canopy.
          6. Evapotranspiration measures.
          7. Irrigation data.

          Imhoff’s work is getting closer to capturing the variables that would go into a physically “realistic” regression.
          As it stands I find Ross’s regression interesting; ideally I’d like to see something that had more physically based independent variables. That is how this would normally proceed. People would take notice of Ross’ result and try to formulate a regression that had a stronger physical basis.

          Finally, in Jones’ paper where he pegged the number at .05C, he took notice of the literature (accepted climate science literature) that put the range between 0C and .3C. Clearly, Ross’s figures are not out of bounds with respect to the very literature that Jones took notice of. Ross’ work is not at odds with accepted climate science. It is at odds with TEAM SCIENCE, which overstates its certainty that there is no UHI effect in the record. And the effort to fight any adjustment whatsoever has led the team to do silly things like deny Willis data and mess about with the peer review process. As bender suggests, now they’ve got a fight with Imhoff.

        • Posted Dec 23, 2010 at 10:14 AM | Permalink

          Steve, the challenge will be not only finding “snapshots” of those variables, but finding repeated measures over multidecadal time spans. We are trying to compare changes in T to changes in explanatory variables.

        • Posted Dec 23, 2010 at 10:02 AM | Permalink

          Regarding the rationale for the socioeconomic variables, including educational attainment, we discussed this in MM04:

          Proposing a causal relationship between national socio-economic conditions (such as income and literacy) and the quality of local meteorological data requires justification. A possible mechanism by which economic activity, and attendant land-use changes, affects measures of sensible heat is an induced change in the local Bowen ratio (e.g. Friedrich et. al. 2000, Pielke Sr. et. al. 2003). Other mechanisms might include changes to local atmospheric chemistry from air pollution. Data quality can also be affected by economic conditions. Climate stations are costly to construct, maintain and operate. In the US, this has at times required a full time national staff of over 450 trained personnel (Linacre 1992, p. 31). Meteorological equipment must be kept in good working order, with recommended inspections once a week or immediately after a severe weather disturbance, as well as full maintenance and calibration twice a year, with immediate repair or replacement of defective instruments (Environment Canada 1992). In much of the world, the resources needed to attain these standards would be considered a “luxury.” Since public resources are required, the quality of station data is not independent of general economic conditions. …
          In countries with relatively low educational attainment, skilled labour is scarce and hence its real cost is higher than in countries with advanced educational systems. This constrains the ability of a national meteorological service to hire and retain technical staff for data collection and managing meteorological equipment. While there is no a priori reason to assume this will bias temperature trends up (or down), it could lead to a warming bias if non-urban stations become under-used, if Stevenson Screens are allowed to discolor, or if cold weather events interfere with data collection more often than warm events. An observed spatial pattern of published surface trends and the spatial pattern of educational attainment does not imply anything about the competence of individuals who look after the meteorological instruments, instead the concern is that economic conditions impose a constraint on the overall quality control process.

          And confirming bender’s point: the selection of socioeconomic variables was done prior to any econometric work. I gave the list to my RA and what he found is what I used.

          Pete says:

          If I had a copy of Stata handy I’d have run the SAC-corrected ones myself.

          Perhaps you’re much better at programming than me, but I described below what would be required, and it would take me months to do it, if it even turns out to be possible. Nevertheless, if you think it needs doing, just buy a copy of Stata.
          Otherwise we’re back to the pattern of criticisms based on calculations that nobody has actually done.

        • Nicolas Nierenberg
          Posted Dec 22, 2010 at 7:01 AM | Permalink

          It is important to note that the results that Ross quickly put up do not correct for SAC. It seems likely to me that they will all lose significance when this correction is made, but I haven’t tried it.

        • bender
          Posted Dec 22, 2010 at 10:08 AM | Permalink

          I wager otherwise.

          But Nicolas. Suppose you do correct for SAC (I outlined a Monte Carlo subsampling scheme to do this) and the result is still significant. Will this satisfy somebody like Gavin – who claims that Ross’s result is spurious REGARDLESS of the “significance” level – because earth’s weather is just one realization of a stochastic process, and I can generate that realization if I run my model enough times?

          Your pattern analysis is not definitive. It is suggestive. And that’s as far as you can go with correlative approaches. After that, it’s up to Imhoff’s team to finish the job.

          That’s, ultimately, why pete & Gavin should be replying to Imhoff, not picking nits over N&M.
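
          One plausible form of the subsampling idea mentioned above (not necessarily the scheme bender outlined; the data and the 1-in-4 thinning rule are placeholders, not the MN10 setup): refit the regression on random spatial subsamples and see whether the coefficient of interest survives the thinning.

          # Hedged sketch of Monte Carlo subsampling as a blunt guard against SAC.
          import numpy as np

          rng = np.random.default_rng(4)
          n, k = 440, 6
          X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
          y = X[:, 1] * 0.5 + rng.normal(size=n)        # made-up data with one real effect

          coefs = []
          for _ in range(1000):
              idx = rng.choice(n, size=n // 4, replace=False)   # random 1-in-4 subsample
              b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
              coefs.append(b[1])

          coefs = np.array(coefs)
          print(coefs.mean(), np.percentile(coefs, [2.5, 97.5]))   # does the effect survive thinning?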

        • Posted Dec 22, 2010 at 10:42 AM | Permalink

          I think the discussion is into seriously diminishing returns at this point, and I am behind in my Christmas baking and decorating, so unless I start seeing something more compelling than grasping at straws I might not put much time into the remaining thread.

          Pete: surely, in the context of a multiyear debate over the role of SAC in these regressions, you don’t expect us to take seriously an observation about these F scores. As Nico said: sheesh! (or words to that effect).

          But even if they were SAC-corrected you have to do the estimations imposing sign restrictions if you want to use the joint F scores for the purpose you are referring to. Otherwise you can’t distinguish between cases where a high F score indicates “good” and “way bad” outcomes.

          Finally you’re flitting back and forth between two distributional concepts that are incompatible with each other. The regression model takes the GCM input as representative of the explanatory power of a class of GCMs and constructs a random term using the residual unexplained by the independent variables and the SAC error weighting model. Out of that we get a distribution of the slope coefficients. The library-of-runs approach generates a distribution of coefficients by using individual runs of GCMs with no claim that each one is representative of the explanatory power of any class, instead it is the library as a whole that is being tested. But the residual sum of squares in any one regression does not yield a meaningful “unexplained” variance term since the right hand side variables in any one regression do not represent the “explanatory power” of the modeling concept being tested. Hence the sum or average of F scores is meaningless for your purpose. Each one is generated by only one of (in this case) 55 GCM runs you are taking to represent the explanatory model. To get your F score you would have to come up with a way of doing the estimations jointly under sign restrictions and constructing a multivariate SAC-corrected CI ellipsoid. If such a computation were even possible (since each model will have different optimal distance weights for the W matrix) it would be extremely difficult.

          Given the evidence available from the regressions already shown, it’s pretty clear that the distribution of model-based coeffs, whether computed one way or t’other, lies far enough from the distribution of the observation-based coefficients to justify treating them as distinct processes.

          Land surface data contamination is now, so to speak, the heliocentric model. It’s simple, plausible, and explains things nicely. You’re proposing new and ever-more elaborate epicycles as an alternative. We know where this argument ends up.

        • bender
          Posted Dec 22, 2010 at 10:59 AM | Permalink

          Land surface data contamination is now, so to speak, the heliocentric model. It’s simple, plausible, and explains things nicely. You’re proposing new and ever-more elaborate epicycles as an alternative.

          Parsimony: last refuge for desperate skeptics.

        • pete
          Posted Dec 22, 2010 at 6:18 PM | Permalink

          But even if they were SAC-corrected you have to do the estimations imposing sign restrictions if you want to use the joint F scores for the purpose you are referring to. Otherwise you can’t distinguish between cases where a high F score indicates “good” and “way bad” outcomes.

          I disagree with the need for sign restrictions. If the correlations are spurious there’s no reason to expect them to be one sign or the other.

          The regression model takes the GCM input as representative of the explanatory power of a class of GCMs

          The regression doesn’t use the GCMs to explain anything. You’re using socioeconomic variables to explain the variance in the GCMs.

          If the socioeconomic variables give you an explanation for the GCMs, then that explanation must be spurious.

          And so if your explanation of the observed trend field looks like the spurious explanations of the GCMs, then that explanation might be spurious too.

          Given the evidence available from the regressions already shown, it’s pretty clear that the distribution of model-based coeffs, whether computed one way or t’other, lies far enough from the distribution of the observation-based coefficients to justify treating them as distinct processes.

          I’ve had a look at a set of scatterplots (i’th coefficient versus j’th coefficient from the regressions in your log file) and based on that I have to say I’m not convinced that the observed coefficients are ‘far away’.

          (I also noticed from the scatterplots that the high correlation (0.95) between m and y (gdp growth and income growth) seems to be causing some identification problems.)

        • bender
          Posted Dec 22, 2010 at 8:31 PM | Permalink

          I disagree with the need for sign restrictions. If the correlations are spurious there’s no reason to expect them to be one sign or the other.

          The reason for the sign restriction is – for the umpteenth time – because of Gavin’s claim: that positive correlations between regions of economic growth and high temperature can arise by random chance alone. Negative correlations count AGAINST Gavin, NOT for.

          What did Ross say about diminishing returns on this conversation?

        • pete
          Posted Dec 22, 2010 at 11:56 PM | Permalink

          Where did Gavin make this claim about positive correlations?

        • RuhRoh
          Posted Dec 23, 2010 at 1:21 AM | Permalink

          Pete;
          Are you the kind of guy who ‘wins’ by always making the last utterance?
          RR

        • bender
          Posted Dec 23, 2010 at 10:50 AM | Permalink

          Please don’t taunt. pete has made some valuable observations (three that I’m willing to back him on), even if his argument in general is slanted. He’s doing what a critical reviewer is supposed to do – pick away at tiny issues that could be of some concern. He hasn’t got anything yet of substance, but at least he is staying relatively focused.

          You can’t fault a guy for replying when others are faulting him for not replying.

        • bender
          Posted Dec 23, 2010 at 7:57 AM | Permalink

          Don’t mince words. He protested the *positive* correlations that Ross found. Where? In the literature. Whether he actually used the word “positive” is immaterial. He protested the result.

        • RuhRoh
          Posted Dec 23, 2010 at 12:07 PM | Permalink

          Bender;
          I appreciate your prompt, polite, well-reasoned guidance to me.

          To the extent my post was ‘mind-guarding’, your response was prompt and effective ‘anti-groupthink’. YAMBFB *.

          I regret my ham-handed, ‘tricky’ interjection into an otherwise fascinating discussion.

          I now revert to turkey-lurkey mode…
          RR
          *(Yet another merit badge for Bender…)

        • pete
          Posted Dec 23, 2010 at 5:42 PM | Permalink

          You’re misrepresenting Gavin’s argument.

          http://www.realclimate.org/index.php/archives/2009/02/on-replication/comment-page-2/#comment-112190

        • bender
          Posted Dec 23, 2010 at 6:33 PM | Permalink

          I think those relationships are spurious. They aren’t really significant (despite what the calculation suggests) and therefore whether they are positive or negative is moot. If anyone thinks otherwise, they have to explain why models with no extraneous contamination show highly significant correlations with economic factors that have nothing to do with them, and which disappear if you sub-sample down to the level where the number of effective degrees of freedom is comparable to the number of data points used

          This is his argument verbatim. So please do not dream of accusing me of *willful* misrepresentation. If there is any “misrepresentation” (and I don’t think there is) it is inadvertent.

          His argument is that N&M’s correlations – which are (by and large, with noted exceptions, thanks) positive – are spurious. This is why the negative ones count against Gavin, NOT for him. He’s not commenting on patterns in general. He’s commenting very specifically on N&M’s correlations.

          Can we admit I’m right, finally, and drop this?

        • bender
          Posted Dec 23, 2010 at 6:35 PM | Permalink

          pete,
          Where did Gavin do the sub-sampling that he describes in his argument? Or was it just another elaborate thought experiment of his?

        • pete
          Posted Dec 23, 2010 at 6:42 PM | Permalink

          S09, second-to-last paragraph.

        • bender
          Posted Dec 23, 2010 at 6:45 PM | Permalink

          Thx

        • pete
          Posted Dec 23, 2010 at 6:50 PM | Permalink

          So would positive coefficients on education and gdp growth (the opposite signs to Ross’s regressions) count for or against Gavin?

        • bender
          Posted Dec 23, 2010 at 6:51 PM | Permalink

          The article is behind a paywall and the link that real climate provides has rotted.

        • bender
          Posted Dec 23, 2010 at 6:56 PM | Permalink

          So would positive coefficients on education and gdp growth (the opposite signs to Ross’s regressions) count for or against Gavin?

          That’s a fair question, pete. I’ve already confessed to an inability to interpret those relationships, and have explained why. If there is a straightforward interpretation that is consistent with urban/industrial/asphalt UHI/station quality, then they would count against Gavin. Otherwise, yes, I think they weaken Ross’s argument. But with the information at hand I honestly can’t say.

        • bender
          Posted Dec 23, 2010 at 6:58 PM | Permalink

          pete, are gavin’s GCM correlations on those variables (educ & gdp growth) positive?

        • pete
          Posted Dec 23, 2010 at 7:05 PM | Permalink

          http://pubs.giss.nasa.gov/authors/gschmidt.html

        • pete
          Posted Dec 23, 2010 at 7:08 PM | Permalink

          On Gavin’s 5 GCMs the education coefficients are negative, although for the set of 55 it’s half-and-half each way.

        • pete
          Posted Dec 23, 2010 at 7:23 PM | Permalink

          And probably worth ignoring the coefficients on gdp growth and income growth until the multicollinearity between those two variables and population growth is corrected.

          Ross: it’s just occurred to me that the correct variable to drop is y (since p & m are closer to orthogonal than p & y), but it’s y that has the inconvenient negative coefficient. Therefore dropping y might open you up to criticisms of cherry-picking. If anyone makes that criticism, you can point them to this comment.

        • bender
          Posted Dec 23, 2010 at 7:25 PM | Permalink

          If we sub-sample the data so that the number of samples is closer to the dof indicated above, the significance of correlations to ‘g’ and ‘e’ become marginal or disappear depending on the subsample. Notably, the correlation to ‘g’ is very fragile disappearing completely even with a sub-sampling of 1 in 4 points.

          Perhaps correls with g and e are spurious and the others are effectively proxies for UHI?

        • bender
          Posted Dec 23, 2010 at 7:27 PM | Permalink

          pete, start a new chain to continue discussion.

        • Posted Dec 23, 2010 at 10:10 AM | Permalink

          The regression doesn’t use the GCMs to explain anything. You’re using socioeconomic variables to explain the variance in the GCMs.

          GCM data are on both the right and left side of the estimating equations. The socioecon coeffs only get to explain the portion of the surface trend field not explained by the lower tropospheric trend field, and the fixed geographic variables. This is the set-up for both observational and GCM runs.
          Also, Table 7 shows the results after subtracting the GCM trends from the observed trends, which is equivalent to inserting the surface GCM trends on the rhs with a coefficient of 1.
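
          A small check of the algebra in that last sentence, with random placeholder data rather than the MN10 variables: regressing the difference (observed minus GCM trend) on the covariates is the same fit as forcing a unit coefficient on the GCM surface trend.

          # Hedged sketch; y - g = X*beta is algebraically identical to y = 1*g + X*beta.
          import numpy as np

          rng = np.random.default_rng(6)
          n, k = 440, 6
          X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # stand-in covariates
          g = rng.normal(size=n)                                       # stand-in GCM surface trend
          y = g + X @ rng.normal(size=k + 1) + rng.normal(size=n)      # stand-in observed trend

          beta_diff, *_ = np.linalg.lstsq(X, y - g, rcond=None)        # regress (obs - GCM) on X
          print(beta_diff)   # same coefficients as a fit with the GCM trend fixed at coefficient 1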

    • Posted Dec 21, 2010 at 2:37 PM | Permalink

      Re: Ross McKitrick (Dec 20 13:31),

      Then we were all told by Pete that Ross hadn’t understood Pete’s criticisms, and even though Pete didn’t show appreciation of Ross’ core argument, Pete seems sure it meant Ross is ALL WRONG and therefore he can ignore the results. So everyone tested for troll effects and re-did the explanations, showing it didn’t affect the conclusions, but the results were still ignored.

  49. pete
    Posted Dec 20, 2010 at 8:19 PM | Permalink

    When Pat and I published MM2004, I was told by a blogger that there was a cosine error and even though they didn’t do the calculations they were sure it meant I’m ALL WRONG and therefore we’ll all ignore the results. So I fixed the calculations and published a correction showing the conclusions stayed the same, but the results were still ignored.

    Which blogger are you talking about here? I remember Tim Lambert pointing out the cosine error, and he did in fact do the calculations.

  50. Kenneth Fritsch
    Posted Dec 21, 2010 at 10:05 AM | Permalink

    “Which blogger are you talking about here? I remember Tim Lambert pointing out the cosine error, and he did in fact do the calculations.”

    Did he do the calculations that would show what effect the change in the cosine calculation would make on the final results/conclusions? And if he did, what did he find?

    • theduke
      Posted Dec 21, 2010 at 10:26 AM | Permalink

      FWIW: http://scienceblogs.com/deltoid/2004/08/mckitrick6.php

    • Posted Dec 21, 2010 at 11:28 AM | Permalink

      Consequently, every single number he calculates is wrong.

      What the recalculations showed was much less dramatic than that, which is why we presented the before-and-after results in side-by-side format in the correction. Lambert didn’t produce calculations to support his hysterical claim; he produced calculations that showed our conclusions were still significant at 0.0004%. But years later others were repeating his headline claim and still not producing any calculations to back it up.

      Also note, in the link theduke presented, that Lambert goes on to claim that the results are ALL WRONG(tm) because the error terms are not clustered… but no calculations.

  51. Steven Mosher
    Posted Dec 22, 2010 at 1:01 AM | Permalink

    here bender

    http://www.newton.ac.uk/programmes/CLP/clpw04p.html

    • bender
      Posted Dec 22, 2010 at 8:19 AM | Permalink

      Those guys aren’t allowed on the team. Maybe pete can go debate them.

  52. pete
    Posted Dec 23, 2010 at 7:37 PM | Permalink

    pete, start a new chain to continue discussion.

    Luckily I’m not reading this on my netbook, where the over-nested comments come out looking like modern poetry.

    Perhaps correls with g and e are spurious and the others are effectively proxies for UHI?

    I think he’s talking about MM07’s G3 model — that’s the one that restricts the socioeconomic variables to g, e, and x, with x (number of missing months) being inapplicable to the GCM case. So in this case there aren’t any others.

    • oneuniverse
      Posted Dec 23, 2010 at 8:43 PM | Permalink

      pete: “Statistical significance requires some sort of null model. If not GCMs then what?”
      Ross McKitrick earlier: “There has to be a statistical argument why your benchmarking process corresponds to the null hypothesis.”

      GCMs are usually considered to be our best attempts at modelling the climate system, but that alone doesn’t make them appropriate nulls.
      What is the specific argument?

      • HAS
        Posted Dec 23, 2010 at 9:12 PM | Permalink

        Well perhaps we could start by demonstrating that the output of GCMs is independent of the socioeconomic variables … whoops no that can’t be right :).

        If your null hypothesis is that the gradients of the measured surface temperatures are independent of population growth (for example), this has nothing to do with climate or climate models per se, and there are any number of ways to test it.

        • oneuniverse
          Posted Dec 28, 2010 at 12:51 PM | Permalink

          HAS, please share if possible?

          Were I unaware of my naivety in statistics, I’d propose rerunning the MN10 analysis many times, each time remapping the socioeconomic data grid-cells (so that Grid i ends up with the socio data from Grid j, i =/= j). For the mapping function, I’d consider random permutations, and as separate tests, translations & rotations, to preserve the spatial structure of the socio data.
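
          A bare-bones sketch of the random-permutation variant of that idea, with synthetic stand-ins for the residual trend field and the socioeconomic field (none of these numbers come from MN10):

          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(2)
          n_cells = 440
          resid = rng.normal(size=n_cells)                  # stand-in for surface-trend residuals
          socio = 0.2 * resid + rng.normal(size=n_cells)    # stand-in for a socioeconomic field

          obs_r = stats.pearsonr(resid, socio)[0]

          perm_r = np.array([stats.pearsonr(resid, rng.permutation(socio))[0]
                             for _ in range(5000)])

          # Two-sided permutation p-value: how often a random remapping of the socio field
          # gives a correlation at least as strong as the observed one.
          p_perm = np.mean(np.abs(perm_r) >= abs(obs_r))
          print(f"observed r = {obs_r:+.3f}, permutation p = {p_perm:.4f}")

          A plain shuffle destroys the spatial structure of the socio field, which is presumably why translations and rotations are suggested as separate tests: they move the field around while keeping its spatial autocorrelation intact.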

        • HAS
          Posted Dec 28, 2010 at 3:13 PM | Permalink

          What MN10 has identified in course grids across the globe is a statistically significant correlation between the land surface temperature gradient and the satellite temperature gradients plus some socio-economic variables. This is over a period of time limited by the availability of satellite observations.

          In my mind the logical next step in investigating this relationship would be to start testing it at higher levels of resolution, preferably getting down to the level where individual unhomogenized observations start to be reflected in the experiments. That way one starts to get greater granularity in one’s understanding of the phenomenon and, potentially, of what is causing it all.

        • HAS
          Posted Dec 28, 2010 at 4:27 PM | Permalink

          That should of course be “coarse”.

      • Layman Lurker
        Posted Dec 24, 2010 at 12:03 AM | Permalink

        I can see the logic for using the models as the null. However, equivalence of the models and the observations would be a necessary condition for validity, and I believe the track record of models at reproducing observed spatial variability is poor. I think Ryan intends to write an article, based on their experience with their own paper, about spurious spatial correlations (due to autocorrelation) being interpreted as signal.

        There are many other ways in which nonequivalence could be demonstrated. Kenneth Fritsch has been doing some work at tAV in the last several weeks looking at distance correlations of proxies vs. observations. As one might expect, these comparisons show major differences: one would have to add progressively more white noise to the observations to degrade the observed distance correlations down to the level of the proxies. While the proxies are not the models, it shows that any difference in the noise assumptions of models vs. observations would kill the notion of models as a valid null for Ross’s test.
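
        To make the white-noise point concrete, here is a toy version (synthetic series and hypothetical noise levels, with pairwise correlations averaged rather than plotted against distance): adding noise to the "observations" drags their inter-site correlations down toward the proxy level.

        import numpy as np

        rng = np.random.default_rng(3)
        n_sites, n_years = 30, 100

        signal = rng.normal(size=n_years)                            # shared regional signal
        obs    = signal + 0.5 * rng.normal(size=(n_sites, n_years))  # instrument-like series
        proxy  = signal + 2.0 * rng.normal(size=(n_sites, n_years))  # much noisier proxy-like series

        def mean_pair_corr(series):
            """Average correlation over all distinct pairs of site series."""
            c = np.corrcoef(series)
            return c[np.triu_indices(len(series), k=1)].mean()

        print("proxies:", round(mean_pair_corr(proxy), 2))
        for sd in (0.0, 0.5, 1.0, 2.0):   # progressively more white noise added to the obs
            noisy = obs + sd * rng.normal(size=obs.shape)
            print(f"obs + noise (sd = {sd}):", round(mean_pair_corr(noisy), 2))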

  53. Geoff Sherrington
    Posted Dec 23, 2010 at 9:50 PM | Permalink

    Steven Mosher,

    Last night I listened to Tim Palmer spend 72 minutes and one muffled objection lecturing on “After Climategate & Cancun; What Next for Climate Science?”

    http://www.newton.ac.uk/programmes/CLP/clpw04p.html

    Thank you for the reference.

    Stripped of the wordsmithing and details validly needed to give context, I was left with the impression that his overall plan bore a close resemblance to the following set of pictures, whose art has been around for decades.

    The 4 images show progressive ease of handling, especially the final compression set that Palmer was keen to promote.

    Please note that

    (a) the final result is replicable by anyone given the same starting set. Note for inquiring commissioners who replicate.

    (b) the final result is bland and is different in hue because of the simplifications imposed. Lesson – if you make GCMs more compressed, you might end up with a different colour to your outcome.

    (c) by step 4, the simplification is starting to add noise where no noise existed before. Lesson, as above.

    (d) Step 2, dealing in an elementary way with stochastic considerations, can remove useful information from the data.

    (e) While 5 blue colour bands are clearly defined at the start of the cartoon, the same cannot be said of various GCMs. Note the Palmer point that most model runs that are detonated before planned orbit are detonated soon after launch. Wrong choice of initial conditions?

    I apologise to readers that this will seem gibberish unless you listen to the lecture.

    • Posted Dec 24, 2010 at 4:33 AM | Permalink

      Thanks Geoff and Steve Mosher for the many references from AGU and Newton Institute for (potential) perusal over the Christmas break. I’ll try and take in this Palmer presentation especially. Have a good break, ye CA editors and lurkers.

  54. nassim
    Posted Dec 26, 2010 at 2:55 AM | Permalink

    Most of you clowns don’t work fat-tail statistics so you are completely misguided and lost. Nature does not conform to your wishes of a normal universe.

    • HAS
      Posted Dec 26, 2010 at 4:07 AM | Permalink

      That’s interesting – what tests do you think M&N should have used on their model to take account of this?
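
      One simple check along those lines, sketched here with placeholder residuals rather than anything from M&N, is just to look at the tail behaviour of the regression residuals:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(4)
      resid = rng.standard_t(df=3, size=440)   # placeholder heavy-tailed "residuals"

      print("excess kurtosis:", round(stats.kurtosis(resid), 2))           # near 0 if the tails were normal
      print("normality test p:", round(stats.normaltest(resid).pvalue, 4)) # small value rejects normal tails

      If the residuals really were heavy-tailed, one would want standard errors that do not lean on normality, for instance bootstrap or robust variants.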

      • theduke
        Posted Dec 26, 2010 at 11:16 AM | Permalink

        Just a guess:

        http://www.huffingtonpost.com/nassim-nicholas-taleb/my-letter-addressing-the_b_270737.html

        • oneuniverse
          Posted Dec 28, 2010 at 2:40 PM | Permalink

          It could be – Taleb is known for his somewhat self-aggrandizing ad hominem statements (this is not a comment on the validity of his work). He also tends to repeat himself about Gaussian and fat-tailed distributions, a subject not unknown to others.

          Taleb has also said: you never win an argument until they attack your person
          nassim: Most of you clowns don’t work fat-tail statistics

          Is nassim assuming a normal distribution of fat-tail cognizant clowns amongst CA readers?

    • MrPete
      Posted Dec 27, 2010 at 5:18 PM | Permalink

      Re: nassim (Dec 26 02:55),
      You’re a smart guy, Nassim. So, have you really looked into climate data, probability and statistical analysis?

      What’s your take on the finding that key paleoclimate tree ring proxies are rooted in measurements with wild excursions?

      Re-post of “Tamino and the Magic Flute”

      and

      More on Almagre Tree 31

      I’m certainly no stats expert. Yet… isn’t it a bit difficult to support fat-tailed hypotheses for climate, when uncertainty about paleoclimate is so huge?

      It’s one thing to consider potentially high risk. But in this case, the real issue seems to be the potential for natural variability to be much larger than is considered politically correct at present.

      I’m glad you’re so certain about who is misguided, lost and a clown. Please feel free to shed light on how to exit the circus tent. I’m sure this blog’s proprietor will be quite happy to provide you with a forum. 🙂

      • theduke
        Posted Dec 29, 2010 at 11:49 AM | Permalink

        crickets . . .

        A drive-by poster afraid of standing his ground?

    • MikeP
      Posted Dec 29, 2010 at 10:31 AM | Permalink

      I agree Nassim, fat tails are a growing problem in America. You’ve probably been blinded by living in an idealized statistical world.

    • Posted Dec 29, 2010 at 12:22 PM | Permalink

      Re: nassim (Dec 26 02:55), Being away for Christmas I missed this one. If this was the real Taleb, shouldn’t we encourage a rather deeper interaction? Dr K, Kolmogorov and Mandelbrot would seem useful places to start.

    • Posted Dec 29, 2010 at 2:09 PM | Permalink

      I suspect this is a poser. Either it’s someone posing as the one-hit wonder of the financial crisis hit parade, or it’s Taleb himself posing as someone who knows the first thing about these issues.

      • Posted Dec 29, 2010 at 8:22 PM | Permalink

        Ha. Sunday Times hack Bryan Appleyard counts Taleb as a mate – along with thinkers like John Gray, James Lovelock and Roger Scruton. Niall Ferguson is another who’s pontificated alongside the guy on the credit crunch – at Davos I think. It might help if he took a sensible view of AGW – albeit marginally.

      • MrPete
        Posted Dec 29, 2010 at 10:19 PM | Permalink

        Re: Ross McKitrick (Dec 29 14:09),
        From his (hidden) email address, I suspect a poser. The email address is completely invalid and unrelated to his advertised email on his home page.

        I recommend ignoring this troll. If Nassim Taleb actually wants a serious discussion, he knows how to reach any of the primary contributors to CA.

1 Trackback

  1. […] One more article of the team rebutted. Read through the tactics used by the team and IJOC for this article which ultimately had to be re-submitted and published in another journal. The team never learn and their tactic is always to lie, obfuscate, deny and cheat. McKitrick and Nierenberg 2010 Rebuts Another Team Article Climate Audit […]
