Surface Stations

People have quite reasonably asked about my connection with the surface stations article, given my puzzlement at Anthony’s announcement last week. Anthony described my last-minute involvement here.

As readers are probably aware, I haven’t taken much issue with temperature data other than pressing the field to be more transparent. The satellite data seems quite convincing to me over the past 30 years and bounds the potential impact of contamination of surface stations, a point made in a CA post on Berkeley last fall here. Prior to the satellite period, station histories are “proxies” of varying quality. Over the continental US, the UAH satellite record shows a trend of 0.29 deg C/decade (TLT) from 1979-2008, significantly higher than their GLB land trend of 0.173 deg C/decade. Over land, amplification is negligible.
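For readers who want to check numbers of this kind: a trend in deg C/decade is simply an OLS slope fitted to the monthly anomaly series and scaled. A minimal sketch follows; the series below is a synthetic stand-in with a known trend, not actual UAH data.

```python
import numpy as np

# Synthetic monthly anomalies for 1979-2008 (360 months): a known trend
# of 0.02 degC/yr plus noise stands in for a real satellite record.
rng = np.random.default_rng(42)
years = 1979 + np.arange(360) / 12.0
anom = 0.02 * (years - years[0]) + rng.normal(0, 0.15, years.size)

# OLS slope in degC/yr, scaled to degC/decade.
slope_per_year = np.polyfit(years, anom, 1)[0]
trend_per_decade = 10.0 * slope_per_year
print(round(trend_per_decade, 3))  # should recover roughly 0.2 degC/decade
```

Real comparisons (CONUS TLT versus land averages, say) differ only in the data fed in, not in the arithmetic.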

Anthony had asked me long ago to help with the statistical analysis, but I hadn’t followed up. I had looked at the results in 2007, but hadn’t kept up with it subsequently.

When Anthony made his announcement of big news, I volunteered to check the announcement – presuming that it was something to do with FOIA. Mosher and I were chatting that afternoon, each of us assigning probabilities and each assigning about a 20% chance to it being something to do with the surface stations project.

Anthony sent me his draft paper. In his cover email, he said that the people who had offered to do statistical analysis hadn’t done so (each for valid reasons). So I did some analysis very quickly, which Anthony incorporated in the paper and made me a coauthor though my contribution was very last minute and limited. I haven’t parsed the rest of the paper.

I hadn’t been involved in the surface stations paper until after his announcement though I was familiar with the structure of the data from earlier studies.

I support the idea of getting the best quality metadata on stations and working outward from stations with known properties, as opposed to throwing undigested data into a hopper and hoping to get the answer. I think that breakpoints methods, whatever their merits ultimately demonstrate, need to be carefully parsed and verified against actual data with known properties (as opposed to mere simulations where you may not have thought of all the relevant confounding factors). To that extent, Anthony’s project is a real contribution, whatever the eventual results.

It seemed to me that random effects methodology could be applied to see the impact on trends of the various complicating factors – ratings category, urbanization class, equipment class. (Using the grid region as a separate random effect even provides an elegant way of regional accounting within the algorithm.) This yielded apparent confirmation in expected directions: a distinct effect for urbanization class in the expected direction; of ratings in the expected direction; and of max-min in the expected direction.
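For the curious, the flavor of the calculation can be conveyed with a deliberately crude sketch: estimate per-level effects on station trends as group means about the grand mean. This is not the REML random-effects fit actually used (which estimates all factors jointly), and the station data here is entirely synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # synthetic stations

# Illustrative factor levels: ratings 1-5 and an urban/rural flag.
rating = rng.integers(1, 6, n)
urban = rng.integers(0, 2, n)

# Synthetic station trends: base trend plus known effects plus noise.
trend = (0.15 + 0.02 * (rating - 3)    # worse rating -> warmer trend
         + 0.05 * urban                # urbanization effect
         + rng.normal(0, 0.03, n))

# Crude "effect": group mean minus grand mean, for each factor level.
grand = trend.mean()
rating_eff = {r: trend[rating == r].mean() - grand for r in range(1, 6)}
urban_eff = {u: trend[urban == u].mean() - grand for u in (0, 1)}
print({r: round(e, 3) for r, e in rating_eff.items()})
print({u: round(e, 3) for u, e in urban_eff.items()})
```

Group means confound correlated factors when the design is unbalanced, which is precisely why a joint mixed-model fit is preferable; the sketch only shows the direction of each effect.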

Figure 1. Random Effects of Urbanization, Rating, Equipment, Max-Min.

Whenever I’m working on my own material, I avoid arbitrary deadlines and like to mull things over for a few days. Unfortunately that didn’t happen in this case. There is a confounding interaction with TOBS that needs to be allowed for, as has been quickly and correctly pointed out.

When I had done my own initial assessment of this a few years ago, I had used TOBS versions, and I am annoyed with myself for not properly considering this factor. I should have noticed it immediately. That will teach me to keep to my practice of not rushing. Anyway, now that I’m drawn into this, I’ll have to carry out the TOBS analysis, which I’ll do in the next few days (at the expense of some interesting analysis of Esper et al.)
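For readers unfamiliar with TOBS: if the max/min thermometer is read and reset during the warm afternoon, one hot afternoon can be counted into two days' maxima, biasing the record warm. The effect is easy to demonstrate on simulated data; the magnitude below is an artifact of the synthetic parameters, and only the sign is the point.

```python
import numpy as np

rng = np.random.default_rng(1)
ndays = 3000
hours = np.arange(24)

# Diurnal cycle peaking at 15:00, plus random day-to-day offsets.
cycle = 5.0 * np.cos(2 * np.pi * (hours - 15) / 24)
offsets = rng.normal(0.0, 3.0, ndays)
hourly = (offsets[:, None] + cycle[None, :]).ravel()

def mean_daily_max(series, obs_hour):
    """Mean of daily maxima when the max thermometer is read and
    reset at the end of `obs_hour` (0-23) each day."""
    start = obs_hour + 1
    maxima = [series[i:i + 24].max()
              for i in range(start, series.size - 24, 24)]
    return float(np.mean(maxima))

midnight = mean_daily_max(hourly, 23)   # calendar-day maxima
afternoon = mean_daily_max(hourly, 16)  # 5 pm observer
print(round(afternoon - midnight, 2))   # positive: warm bias from afternoon reset
```

A change in a station's observation time therefore shifts its apparent mean, which is why TOBS interacts with any siting comparison.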

I have commented from time to time on US data histories in the past – e.g. here, here, here – each of which was done less hurriedly than the present analysis.


  1. Posted Jul 31, 2012 at 9:39 AM | Permalink

    Hi Steve.

    Any chance that the list of stations and their site ratings will be made available soon?

  2. Kenneth Fritsch
    Posted Jul 31, 2012 at 10:37 AM | Permalink

    I posted the following at the Blackboard with the hope of seeing more discussion of the issues raised in the Watts paper:

    “I see that SteveM at CA has started a thread on the subject of the Watts paper. Maybe some of these critical details can be revealed there. I would hope that a reasonable discussion could avoid the personality issues. I have questions about the use of change point algorithms that have not been answered to my satisfaction to date and I have a great deal of interest in the benchmarking of these various algorithms by testing against realistic simulated data where the truth is known. I would think that change point analysis could hypothetically be the best method of adjusting non homogeneous temperature data. Unfortunately I am aware of the limitations of these methods when working with noisy data. The key to validating any system is testing it with realistic data.”

    SteveM, when you say the following, I would agree, but how do you use the actual data, as opposed to simulated data where the truth is known, to test an adjustment process? Obviously, if we know the truth for a given part of the actual raw data, it would be best to use that data. My problem is that I am not at all certain that we can find actual data where we know the truth, and, if we could, whether these data would include sufficient typical non-homogeneities to properly test the process.

    “I think that breakpoints methods, whatever their merits ultimately demonstrate, need to be carefully parsed and verified against actual data with known properties (as opposed to mere simulations where you may not have thought of all the relevant confounding factors). To that extent, Anthony’s project is a real contribution, whatever the eventual results.”

    I was very frustrated by Watts’s lack of specificity in the text of his paper in reference to the temperature data sets he was using. I am assuming that Raw means the raw data before TOB adjustment, and Adjusted means the finally adjusted temperatures after TOB and then application of the Menne change point algorithm.
    The links below are to the Watts paper text and figures:

    Click to access watts-et-al_2012_discussion_paper_webrelease.pdf

    Click to access watts-et-al-2012-figures-and-tables-final1.pdf
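    The benchmarking against simulated data with known truth described above can be prototyped in a few lines. The toy detector below simply splits the series where the two segment means differ most; real homogenization algorithms (e.g. Menne's pairwise method) are far more elaborate, so this is only a sketch of the testing idea.

```python
import numpy as np

rng = np.random.default_rng(7)

def detect_step(x):
    """Return the index maximizing the absolute difference of segment
    means - a toy single-change-point detector."""
    best_k, best_stat = None, -1.0
    for k in range(5, x.size - 5):  # keep both segments non-trivial
        stat = abs(x[:k].mean() - x[k:].mean())
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k

# Simulated series with a known 1.0-unit step at index 120.
n, truth = 240, 120
x = rng.normal(0, 0.3, n)
x[truth:] += 1.0

print(detect_step(x))  # should land near 120 for this clear step
```

    The value of such a harness is that the noise model can then be made progressively more realistic (autocorrelation, trends, multiple breaks) to see where a detector starts to fail.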

  3. Posted Jul 31, 2012 at 10:39 AM | Permalink

    The consistency of the expected result is impressive.

  4. Posted Jul 31, 2012 at 10:53 AM | Permalink

    Jeff, I believe, is now aware of this paper, where he gets ‘mentioned’

    Click to access LskyetalPsychScienceinPressClimateConspiracy.pdf

    Steve, are you aware of your name being mentioned in the same paper? (smeared by implication?)

    this is how it was reported……

  5. Benjamin
    Posted Jul 31, 2012 at 10:57 AM | Permalink

    TOBS ?

    • Posted Jul 31, 2012 at 1:40 PM | Permalink

      Time of OBservation

      • Posted Jul 31, 2012 at 2:45 PM | Permalink

        Or… Time of Observation Bias?

  6. Manfred
    Posted Jul 31, 2012 at 11:06 AM | Permalink

    Hi Steve,

    Figure 23 in Anthony’s paper opens up, in my view, an easy new way to select well-sited stations globally.

    Click to access watts-et-al-2012-figures-and-tables-final1.pdf

    Well placed stations appear to have very similar trends in tmin, tmax, and tmean. Poorly placed stations don’t, mostly because tmin trends are significantly increased.

    So dUHI is (usually) affecting tmin much more than tmax (and tmean) and we are looking for stations with small dUHI over time.

    To select proper stations, in a first step only those stations with very similar trends of tmin, tmax, tmean should be taken, where “similar” preferably means similar over several t.b.d. time scales.

    That would include UHI-free stations and stations where UHI did not change much: exactly what is being looked for.

    If the number of such stations is not sufficient, more stations may be added, by extracting only time spans with similar tmin, tmax, tmean.
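    The selection rule proposed above is easy to prototype once per-station trends of tmin, tmax and tmean are in hand. A sketch with synthetic trends follows; the tolerance of 0.02 degC/decade is an arbitrary placeholder for "similar".

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200  # synthetic stations

# Synthetic per-station trends (degC/decade). "Contaminated" stations
# get an extra tmin-only warming, mimicking the dUHI signature.
base = rng.normal(0.15, 0.05, n)
contaminated = rng.random(n) < 0.4
tmax = base + rng.normal(0, 0.01, n)
tmin = base + rng.normal(0, 0.01, n) + np.where(contaminated, 0.10, 0.0)
tmean = (tmax + tmin) / 2

# Keep stations whose three trends agree within a tolerance.
tol = 0.02
stacked = np.stack([tmin, tmax, tmean])
spread = stacked.max(axis=0) - stacked.min(axis=0)
selected = spread < tol

print(int(selected.sum()), "of", n, "stations selected")
print("contaminated among selected:", int((selected & contaminated).sum()))
```

    The filter rejects essentially all stations carrying the synthetic tmin-only signal, at the cost of also discarding some clean stations whose trends disagree by chance.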

    • Posted Jul 31, 2012 at 12:04 PM | Permalink

      I have graphs presented by M.A. Vukcevic that show that the increase in average temperature is mostly caused by higher temperatures in the winter, while summer temperatures are fairly constant.


      • Manfred
        Posted Jul 31, 2012 at 4:16 PM | Permalink

        But is this related to UHI (for central England)?

    • DocMartyn
      Posted Jul 31, 2012 at 1:53 PM | Permalink

      Manfred, what you suggest is possible, but should be performed blindly and using notarized predictions.
      One would, a priori, select some metric which one believed would describe particularities of sites.
      One would then suggest, from the temperature data alone, that in a particular locale there are predicted to be n=14 sites =(1), n=25 sites =(2), and so on.
      The locations/names/coordinates and the predicted classifications are then legally sealed and stored.
      One would then dispatch volunteers to assess sites, without them knowing ANYTHING about an individual site, and have them perform the assessment of site quality.
      This data is then collated.
      Only when the predictive and recorded data-sets are complete would one compare the two.
      Such a study would be very powerful.
      If one were to base one’s classification on the US data from the Watts study for one’s predictive identification of site quality, and use it in conjunction with European, Japanese or Australian volunteers, then one would have a very good, statistically coherent study.
      I am sure that many of the readers at Bishop Hill could be recruited to examine local station records and siting issues.
      However, this would all depend on the predictions being done first, in secrecy, and being deposited in a notarized fashion. This is how similar studies are done in the biomedical field.

      • Manfred
        Posted Jul 31, 2012 at 4:33 PM | Permalink

        I was proposing another ivory tower algorithm instead, easy to investigate for those who have the skills to process the raw data. The first result would probably be a histogram of raw data trends for stations with equal trends in tmin, tmax, tmean compared with the others.

        As I understood it, the main issue with Muller’s UHI non-detection is the a priori assumption that low UHI stations have low dUHI. Station ratings are about UHI, but only dUHI matters for temperature trends. The connection between UHI and dUHI is complicated, as shown in the log (or square) population law; better sited stations may experience more dUHI due to small changes than already heavily contaminated sites.

        Additionally, station ratings are questionable outside the ensemble verified by Anthony Watts’s surface station project.

        Comparison of tmin, tmax, tmean trends fills this gap of Muller’s poorly understood and unverified assumption, because dUHI is typically directly visible as differences in these trends.

        The result would still be a lower limit for dUHI, because some awkward environment change, or environment changes and weather combined, may affect all trends in the same way.

        But it may already be good enough to detect some dUHI. This approach could then be made more complex with additional selection criteria such as station rating.

  7. MarkB
    Posted Jul 31, 2012 at 11:35 AM | Permalink

    Please note: many times in the past, commenters here and at other sites have been critical of every author of a climate science paper for a single detail, and assigned responsibility for an error to every author. Perhaps now, it will be allowed that every ‘author’ of a scientific paper doesn’t necessarily know every detail of every point made.

    Another point. The fact that a statistician was brought in over the weekend to finish a paper does not reflect well on the entire effort. A scientific paper is not a homework assignment. I’m actually surprised Steve would put his name on such an effort. Apparently, at least one problem has arisen as a result. I have no idea how this work will shake out, but it certainly didn’t start well. Any such work should be gone over with a fine tooth comb before seeing the light of day.

    • Brandon Shollenberger
      Posted Jul 31, 2012 at 11:57 AM | Permalink


      Please note: many times in the past, commenters here and at other sites have been critical of every author of a climate science paper for a single detail, and assigned responsibility for an error to every author. Perhaps now, it will be allowed that every ‘author’ of a scientific paper doesn’t necessarily know every detail of every point made.

      I’ve been reading this blog since it first came online, and I have trouble thinking of any examples of what you describe. Your description may be accurate for “other sites,” but I don’t think it’s accurate for this one.

    • ChE
      Posted Jul 31, 2012 at 12:05 PM | Permalink

      It may turn out that there are problems, but the idea behind this was to put it out into the blogosphere for trial by fire. This wasn’t peer-reviewed, and as it turns out, neither was the BEST paper. Both are subject to this pre-peer review review. I don’t think anybody is going to be surprised if problems turn up that require some revision.

      • j ferguson
        Posted Jul 31, 2012 at 12:11 PM | Permalink


        This wasn’t peer-reviewed, and as it turns out, neither was the BEST paper.

        Not quite so re: BEST paper:

        Unless you are suggesting that BEST didn’t pass peer review.

        • ChE
          Posted Jul 31, 2012 at 12:14 PM | Permalink

          Thus illustrating the larger point about blog review.

      • Steven Mosher
        Posted Jul 31, 2012 at 1:19 PM | Permalink

        “but the idea behind this was to put it out into the blogosphere for trial by fire.”


        You will note that data for this paper is absent. That effectively means that we cannot do a proper review. We can’t audit it.

        Prediction: special pleading will commence.

        • Latimer Alder
          Posted Jul 31, 2012 at 2:30 PM | Permalink

          Re: Steven Mosher (Jul 31 13:19),

          Mosh is right. You have to publish the data as well as the press release.

          You cannot even begin to claim the high ground without doing so. Leave such nonsense to the stuffy academics.

      • Posted Jul 31, 2012 at 1:47 PM | Permalink


        It is hard to blog-review in detail without the data they used being publicly available, but we will try to look at what we can.

        Steve: Zeke, I’ll talk to Anthony about making the classifications public right now. He’s a bit sensitive from past experience, but I think that there’s a better chance of the classification being put to good use if it’s public now.

        • Posted Jul 31, 2012 at 3:20 PM | Permalink

          I think Anthony is within his rights to withhold his new WMO-standard classifications from all but reviewers and coauthors until this paper is accepted for publication.

          But of course if he wanted to release them now that would be great! I’m dying to know what happens to Wooster and Circleville here in OH, as well as Boulder! Back in 2007, I predicted that Boulder would rise to 2 using the equivalent Leroy 1999 standards from 3 using the now-obsolete CRN standards.

        • Steven Mosher
          Posted Jul 31, 2012 at 3:40 PM | Permalink

          But Hu.

          1. Anthony has put it out for blog review and cited Muller as a precedent for this practice. That practice included providing blog reviewers with data.

          2. Anthony brought Steve on board at the last minute even though he’s been working on this paper for a year. Steve has a practice as a reviewer of asking for data. Since we bloggers are asked to review this, we would like the data.

          3. If they want to release the data with limitations, that is fine too. I will sign an NDA to not retransmit the data, and to not publish any results in a journal.

          4. You have to consider the possibility that Anthony and Steve could now stall for as long as they like, never release the data, and many people would consider this published paper to be an accepted fact.

          Steve: Mosh, calm down. This is being dealt with.

        • Posted Jul 31, 2012 at 7:23 PM | Permalink

          Steve: Zeke, I’ll talk to Anthony about making the classifications public right now. He’s a bit sensitive from past experience, but I think that there’s a better chance of the classification being put to good use if it’s public now.

          Will you continue your association with the paper if the relevant data (stn ids and siting classifications) is not made public and archived in a timely manner?

        • Gdn
          Posted Jul 31, 2012 at 8:09 PM | Permalink

          It is hard to blog-review in detail without the data they used being publicly available, but we will try to look at what we can.

          Steve: Zeke, I’ll talk to Anthony about making the classifications public right now. He’s a bit sensitive from past experience, but I think that there’s a better chance of the classification being put to good use if it’s public now.

          On the other hand, is it publishable in a journal if it’s completely out there on a blog page?

        • Armand MacMurray
          Posted Aug 1, 2012 at 2:11 AM | Permalink

          Release of all the classifications is of course necessary for a complete review. If Anthony is so extremely gun-shy about releasing the whole batch immediately, why not start quickly with a useful subset, such as a truly random 10% or perhaps a few states’ worth?

        • Steven Mosher
          Posted Aug 1, 2012 at 3:05 PM | Permalink

          Well, the stakes just got raised on this one.

          Congressional testimony

          “A new manuscript by Muller et al. 2012, using the old categorizations of Fall et al., found roughly the same thing. Now, however, Leroy 2010 has revised the categorization technique to include more details of changes near the stations. This new categorization was applied to the US stations of Fall et al., and the results, led by Anthony Watts, are much clearer now. Muller et al. 2012 did not use the new categorizations. Watts et al. demonstrate that when humans alter the immediate landscape around the thermometer stations, there is a clear warming signal due simply to those alterations, especially at night.”

          Hmm. I suppose that sitting on data and not reporting adverse results, or merely taking one’s name off the paper, just got a bit dicey.

          Perhaps, co author Christy should be sent a notice that the results he testified about were not fully baked.

        • Posted Aug 1, 2012 at 5:06 PM | Permalink

          GDN, you have to remember that preprints have always circulated in every field, but today we have preprint servers such as arXiv which have fuzzed the standard. Publication today is essentially regarded as the publication of a peer-reviewed paper; however, it is always best to consult a journal’s specific policy. This dichotomy is seen in that many people leave the original version on arXiv for people outside the paywall to read, and the journals do not object much.

        • Posted Aug 1, 2012 at 8:57 PM | Permalink

          Muller’s papers were available online before being submitted to a journal? I was unaware of this.

        • Posted Aug 2, 2012 at 7:31 AM | Permalink

          General practice is to post/send out preprints at the same time as submission because having a manuscript in good enough shape to post/send out means that it is also in good enough shape to submit. There could be a few days either way in general. People are using arXiv today to establish precedent for submissions because of how long review can take.

        • Anthony Watts
          Posted Aug 2, 2012 at 8:59 AM | Permalink

          Mr. Mosher knows my email, and has my telephone number, and mailing address, and so far he hasn’t been able to bring himself to communicate his concerns to me directly, but instead chooses these potshots everywhere.

          The project was worked on for a year before we released it; a number of people looked at it at various stages. Dr. John Christy was in fact the one who suggested we should put a note in about TOBS at the end, saying we will continue to investigate it, because he knew it would be an important consideration. I concurred. We also knew that to do it right, the TOBS comparison couldn’t simply rely on the “trust us” data from NCDC. Christy had already been through that with his study of irrigation effects in California and had to resort to the original data on B91 forms to disentangle the issue.

          What we are finding so far suggests NCDC’s TOBS times (we have the master file for all stations) don’t match what the observers actually do. That’s a discrepancy that we need to resolve before we can truly measure the effect along with siting.

          Mr. Mosher would do well to note this comparison.

          1. When The Team gets criticized on a technical point, they typically dismiss it with a wave of the hand, saying “it doesn’t matter”. Upside down proxies, YAD061, and lat/lon conflations are good examples.

          2. When we get criticized on a technical point, we stop and work on it to address the issue as best we can.

          Whining won’t help #2 go any faster.

          Zeke has been gracious and helpful, and for that I thank him.

          – Anthony

        • Brandon Shollenberger
          Posted Aug 2, 2012 at 10:39 AM | Permalink

          Anthony, something you said here raises a question I’ve asked elsewhere:

          1. When The Team gets criticized on a technical point, they typically dismiss it with a wave of the hand, saying “it doesn’t matter”. Upside down proxies, YAD061, and lat/lon conflations are good examples.

          Over on The Blackboard, I discussed a problem I have with your Figure 23. The first panel of it displays the location of all the compliant sites with your updated classification scheme. Table 4 of your paper says 13 previously compliant sites are no longer considered compliant. However, when I checked the 71 sites listed as Rank 1/2 on the Surface Stations website, it seemed 30 were missing from your map. I even went ahead and generated an image of your map with the location of those 71 sites marked.

          If my results are right, either that figure or that table must be wrong. It’s also possible the problem goes deeper. There’s no way for me (or any other reader) to tell.

          Could you tell me if I’ve messed up somewhere, or explain what’s going on if I haven’t?

        • Brandon Shollenberger
          Posted Aug 2, 2012 at 10:54 AM | Permalink

          Actually, I may have an answer to my own question. I think a lot of the sites that didn’t show up on that map are airports. If so, Figure 3 (23 was a typo) may actually be showing Rank 1/2 sites sans airports. It doesn’t say so, and it probably should show the airports, but it’s not a big deal. It would also explain why my visual count of locations on the map came up short.

          As an aside, anyone could probably regenerate the station list from that image if they wanted. All not publishing such a list does is add a layer of tedium. It might prevent a perfect replication of the list when sites are located very close to each other, but otherwise…

        • Posted Aug 2, 2012 at 4:56 PM | Permalink

          “We also knew that to do it right, the TOBS comparison couldn’t simply rely on the “trust us” data from NCDC.”

          But what do you propose to rely on? A complete reinterpretation of the B-91 forms? I understand that your contention is that even these can’t be relied on.

          The fact is that a TOBS adjustment is required if time of observation has changed. NCDC has reports that the times did change. Adherence to stated times may be imperfect, but that doesn’t mean that the reported changes can be ignored.

  8. theduke
    Posted Jul 31, 2012 at 12:43 PM | Permalink

    I highly recommend everyone read (or re-read) this post from last fall by Steve, which is also linked in the second paragraph of this post. It’s a “big picture” view of his thoughts on the subject of BEST and satellite temp data. It is also very accessible for those of us who are not fully conversant with all the math and science that is discussed here.

  9. BarryW
    Posted Jul 31, 2012 at 12:48 PM | Permalink

    One issue of the temperature data related to urban/suburban/rural is that, given no microsite bias, all three temperature records are valid. The real point is that they are only valid for the area around them that is uniform with the point at which the data was taken. Heat is heat, but if the site only represents .01% of the grid cell, then its contribution is only .01 percent. Trying to homogenize that out makes no physical sense. This doesn’t address all the problems, but it would at least take that out of the equation.
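    The weighting point can be made concrete with a toy grid cell: in an area-weighted average each station counts in proportion to the area it represents, so a microsite or urban signal is not spread over the whole cell. All numbers below are invented.

```python
import numpy as np

# One grid cell, three stations: urban, suburban, rural (synthetic trends,
# degC/decade), with the fraction of cell area each is taken to represent.
trends = np.array([0.40, 0.25, 0.15])
area_fraction = np.array([0.02, 0.08, 0.90])  # invented land-use shares

equal_weight = trends.mean()
area_weight = np.sum(trends * area_fraction)

print(round(equal_weight, 3))  # 0.267: the urban station dominates
print(round(area_weight, 3))   # 0.163: urban heat counts for its 2% only
```

    The gap between the two averages is exactly the "homogenizing it out" effect described above.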

  10. Steven Mosher
    Posted Jul 31, 2012 at 1:11 PM | Permalink

    In keeping with the traditions of climate audit … turnkey code and data? You guys put the paper up for review, we can hardly start without the data.

    A few questions.

    1. Did you double check the ratings or just put the data in an algorithm?
    2. Since the new site ratings seem to depend upon some manual labor done using Google Earth, did you have occasion to do a spot check on the accuracy of those ratings?

    To the latter point, I’ll draw your attention to just one of many comments about the accuracy of Google Earth, and suggest that an audit would definitely have to go down to the raw data there.
    I hope folks kept records.

    “The imagery in Google Earth is stretched based on the angle of the aerial object taking it. Further, the 3D terrain of Google Earth is low resolution, so the imagery can shift depending on the inaccuracy of the terrain. (You might want to try turning off the terrain and making comparisons). To make matters worse, Google has not described in detail how it goes about registering imagery. Human factors are probably involved in alignment and especially in stitching together images for esthetic appearance sake.

    The key point is that Google Earth is NOT a GIS or survey-grade dataset. They don’t promise to be, and they discourage its use for those types of applications.”

    3. Since Wickham made her station list available to you prior to submission, will you make your station list available to others?

    4. Why did you stop at 2008?

    5. What you say about amplification here differs from what you wrote in the paper.

    6. What does a comparison with CRN show?

    7. You use USHCNv2 metadata to classify rural/urban. Did you check that? Do you accept that definition of rural?

    8. How were the grid averages computed?

    Steve: As I mentioned, I’ve been involved with this paper for only a few days. You know my personal policies. I did some limited statistical analysis, which, to my considerable annoyance, I need to revisit. As you know, I don’t have a whole lot of interest in temperature data, which is an absolute sink for time. So I’m going to either have to do the statistics from the ground up according to my standards or not touch it anymore.

    • A. Scott
      Posted Jul 31, 2012 at 2:52 PM | Permalink

      Steve … re: the Google Earth imagery – since the Leroy 2010 standard appears to deal only with sinks and sources within 100m, and realistically it would seem likely in many/most cases the sinks and sources are much closer than that – is any error in Google Earth of any importance?

      It seems that no matter how much you warp, twist, or stretch, over 100m or less you cannot create a huge error. And then, even if it were something on the order of 25%, it would seem that would not make all that significant a change in the ratings formula result, with the exception of a few stations with sinks/sources right at 100m that might suffer a shift. I’d think those cases would be very few, if any?
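      That intuition can be checked directly. Using a simplified distance-only class rule (the 10/30/100 m cutoffs below are an assumption standing in for the full Leroy 2010 standard, which also considers shading and surface types), one can count how many synthetic stations change class under a given measurement error.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simplified distance-only class rule (an assumption, not the full
# Leroy 2010 standard): d is metres to the nearest heat sink/source.
def site_class(d):
    if d >= 100: return 1
    if d >= 30:  return 2
    if d >= 10:  return 3
    return 4

n = 1000
dist = rng.uniform(1, 200, n)              # synthetic sink distances
err = dist * rng.uniform(-0.10, 0.10, n)   # +/-10% measurement error

before = np.array([site_class(d) for d in dist])
after = np.array([site_class(d) for d in dist + err])
changed = (before != after).mean()
print(f"{changed:.1%} of stations change class under +/-10% error")
```

      Only stations whose controlling distance sits near a cutoff can flip class, which is the effect described above; the simulated fraction stays small.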

      • Steven Mosher
        Posted Jul 31, 2012 at 3:20 PM | Permalink

        You might think that, but that’s just the kind of thing you want to audit.

        As I understand it, photos were used and measurement tools were used on those photos. So I would expect that a complete dataset would include:

        1. a copy of the photos.
        2. a description of the method used.
        3. a calibration test of the method.
        4. the measurements produced.

        Then you want to check that the rating was actually done properly. Remember, just because the rater says it’s a 3 doesn’t mean it’s a 3. Raters make mistakes.
        Was it blind-rated? Did the rater know he was re-rating a site previously rated at 1? Who was the rater?

        Lots of things you want to know, and check, and not take people’s word for. You know, apply the same standards of scrutiny.

        Recall, if you will, that Anthony requested paper copies of B91s to check NOAA. I think Steve may also have made requests for this data. So a standard of checking, it seems to me, has been set. All the way down to the raw data, if it’s available.

        • A. Scott
          Posted Jul 31, 2012 at 10:14 PM | Permalink

          Why couldn’t you use the Google Maps Sat. view (or other aerial pictometry) as a cross check?

          That data is not stretched or warped. Measure a handful of sites where you think an error may exist in Google Earth then do same in Google Maps Sat. view (or Terraserver or any of the similar). Seems that should give you at least a close idea.

          All that said, again – considering the Leroy 2010 method uses at most a 100m distance – I just can’t see any such warping or other inaccuracy being more than 5-10 percent, if that. And I can’t think there are any large number of stations where a 5-10 meter error over 100 meters would cause a class change in any significant number of stations.

          I don’t disagree it should be checked in interest of accuracy, but seems a low priority to me.

        • Howard
          Posted Aug 1, 2012 at 11:48 AM | Permalink

          As a very large critic of you and your *style*, I agree 100% with your take on using GE. (I also applaud your relentless hammering for the data and code.) Google Earth is a great program that I use all the time, but you always need to confirm with ground-truth and/or USGS topographic maps and/or aerial photography. However, this level of investigation is just a small part of data collection site evaluation.

          For commercial real estate toxics due diligence, we are bound by professional standards to look at historical USGS topo maps, historical Sanborn fire insurance maps, historical city directories, historical air photos, and public agency records. All of this research and subsequent analysis costs between $1,000 and $2,000 per site.

          This minimum professional standard of care is very likely beyond the capability of Surface Stations crowd sourcing.

          In defense of Watts’ Surface Stations (I am a huge critic of WUWT, BTW), they are attempting to do the first steps of the fundamental research that should have already been carried out in each state and county by undergrads at local colleges and universities paid for by the feds between NASA, NOAA and USGS. One year, $1M each and Bob’s your uncle.

          Until individual site evaluation is conducted, all of the temperature data sets are “a pig in a poke”, no matter how much statistical lipstick is rigorously applied. It sounds like there are enough holes in the Watts et al. study to discredit and derail the effort of reducing uncertainty in temperature records.

        • A. Scott
          Posted Aug 1, 2012 at 4:03 PM | Permalink

          Howard … your numbers and info are quite interesting and valuable. As is your grudging appearance of respect for Watts’ work ;-).

          You point out just how ridiculous this whole temperature data process seems to be. Alleged experts don’t or won’t do the comparatively tiny amount of proper due diligence to assure the quality of the temp records.

          AGW research has been widely reported to receive billions in funding worldwide, and a similarly huge sum in the US … yet they won’t do the simple due diligence of proper site inspection and reviews.

          Using your numbers – at $2,000 each – we’re talking roughly $2.4 million in total for a detailed professional review of all 1200 stations.

          That seems a tiny price to pay – in fact, shouldn’t such a survey be done every 5-10 years on all of the core reporting stations?

    • Posted Aug 1, 2012 at 2:58 AM | Permalink

      Steve Mc:

      As you know, I don’t have a whole lot of interest in temperature data, which is an absolute sink for time.

      A big reminder of why we should be so grateful to guys like Anthony Watts and Steven Mosher.

      Having mentioned three people I greatly respect I’ll add that when I read Anthony’s teaser on Friday and all the hoopla it generated, including the published ruminations of the said Steves, something told me to “Calm down,” as one aforementioned just said to another. (The only evidence you have for my calmness and even boredom at this point is that you won’t find any record of me joining the speculation game before Anthony’s press release on Sunday. Mind you, I liked the guy who ended ‘That’s the report from my gut’. But my gut was telling me that this wasn’t worth the attention. And I think that now Steve Mc may be regretting he gave it the small amount of attention he did.)

      That isn’t to say that I criticise Mr Watts in the least. He was having a go at something a bit different. Worth a shot – and once problems are sorted out, who knows what the end result will be. And I also feel that Mosher’s decision months back to get pretty heavily involved in BEST was an excellent one.

      I love that feeling of seeming to face multiple ways at once, don’t you? Something I feel is essential to get even close to the truth in the climate game 🙂

      • Steve Reynolds
        Posted Aug 1, 2012 at 4:45 PM | Permalink

        I agree.
        It is good to have more skeptical results getting enough attention to be worth auditing here.

  11. Keith AB
    Posted Jul 31, 2012 at 1:30 PM | Permalink

    Is SM saying that he was duped by AW?

    Is he saying that had he known what conclusion the Watts et al 2012 paper was going to draw he wouldn’t be part of the paper?

    What is Mr McIntyre saying here?

    Steve: I was only on the paper a short time and I overlooked an important issue, which Anthony had paid insufficient attention to. I should have known better – my bad. I’m very annoyed at myself.

    • j ferguson
      Posted Jul 31, 2012 at 3:58 PM | Permalink

      Maybe it wasn’t last-minute until Friday. He let himself be provoked into early release – no crime to my mind, but it precluded more care.

    • Posted Aug 1, 2012 at 9:20 PM | Permalink

      Steve, these guys are trolling you. They are trying to exploit your integrity to go after Anthony. No skeptic believes you contributed anything more to this paper than what was stated, nor do they for a moment believe that what is posted is the final product. Everything is clearly stated as preliminary and pre-publication.

  12. dhogaza
    Posted Jul 31, 2012 at 1:37 PM | Permalink

    “Is SM saying that he was duped by AW?”

    No, he says that he simply didn’t notice that TOBS issues weren’t accounted for in the conclusion, and his totally reasonable excuse is that he was under time pressure and therefore rushed.

    “Is he saying that had he known what conclusion the Watts et al 2012 paper was going to draw he wouldn’t be part of the paper?”

    I understood him to be saying TOBS issues are important, so I’d assume he’d have told Anthony and tried to help him make the paper better …

    Steve: Quite so. It is very much my intention to ensure that these issues are recognized and properly treated.

    • vvenema
      Posted Jul 31, 2012 at 4:21 PM | Permalink

      Steve, the discussion and conclusions of Watts et al. (2012) state: “We are investigating other factors such as Time-Of-Observation changes which for the adjusted USHCNv2 is the dominant adjustment factor during 1979-2008.”

      This makes this web publication even weirder to me. At least Anthony Watts seems to have ignored this problem knowingly. How did he expect to pass review while leaving such an important confounding factor out of the analysis? And it is not as if analyzing this would require a whole new study – that might have been an excuse for a less important confounder.

  13. Posted Jul 31, 2012 at 1:44 PM | Permalink

    Mr. McIntyre (or others)

    I am aware of the theory behind the TOBS, but it has struck me as an adjustment of a contrived scenario.

    Background: I grew up as the son of an aeronautical engineer who had a fluid-based min-max thermometer whose readings he religiously recorded when he came home at the end of each work day and during the weekends. He plotted them up on 11×17 K&E 1x1mm graph paper, one sheet per year, stacked on top of each other, year after year, for about 35 years.

    Did he always make the recording at 6pm? No – whenever he got home, or after the hottest part of the day.

    I am trying to figure out how some TOBS adjustment can or should be made to a min-max temperature record — provided that there was a gap between the min and max markers and the current mercury levels. A guy doesn’t do this to record bad data.

    Yes, cold fronts would come through at 10pm, making the low that occurred at 4am and was recorded at 6pm not the low of the calendar day. That low would be recorded the next day. But how can a TOBS adjust for that without recording min-max temperatures many times per day? Does that data exist? I doubt it. And why would it be a TOBS ‘adjustment’ instead of the min of several mins recorded that day?

    Does the metadata exist to support blanket significant TOBS adjustments? Convince me that TOBS is not a contrived adjustment to get the answer “we want.”

    Steve: allowing for a TOBS adjustment is reasonable enough. When max min are read daily, if they are read in late afternoon near the daily maximum, a hot day can end up contributing to the maxima for two consecutive days and the cooler next day not counted. The adjustment is made relative to theoretical midnight readings.

    I understand the suspicion of these various adjustments, which often seem arbitrary, but this one is fair enough.
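
    This double-counting can be sketched with a toy simulation (all numbers synthetic and purely illustrative, not any station’s data): a min-max thermometer reset near the daily maximum lets one hot afternoon contribute to the recorded maxima of two consecutive days, while a midnight reset counts it once.

    ```python
    def daily_maxima(hourly, reset_hour):
        # Maxima recorded by a min-max thermometer reset once a day at
        # reset_hour; each reading covers the 24 hours ending at the reset.
        n_days = len(hourly) // 24
        out = []
        for d in range(1, n_days):
            end = d * 24 + reset_hour  # reading taken at reset_hour on day d
            out.append(round(max(hourly[end - 24:end]), 1))
        return out

    # Three synthetic days: normal, hot (peak 30 C at 15:00), normal again.
    base = [10 + 8 * max(0, 1 - abs(h - 15) / 9) for h in range(24)]  # peak 18 C
    hot = [t + 12 for t in base]
    hourly = base + hot + base

    midnight = daily_maxima(hourly, 0)   # midnight reset: hot peak counted once
    evening = daily_maxima(hourly, 17)   # 5 pm reset: hot afternoon counted twice

    print(midnight)  # [18.0, 30.0]
    print(evening)   # [30.0, 28.2]
    ```

    With a 5 pm reset, the hot day’s peak lands in one window and its still-warm early evening dominates the next, so the average recorded maximum is biased warm relative to midnight readings.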

  14. Mindbuilder
    Posted Jul 31, 2012 at 1:45 PM | Permalink

    If TOBS for rural stations was a problem in Anthony’s current paper, wouldn’t it have messed up his last one (Fall 2011) as well?

    • Posted Jul 31, 2012 at 1:52 PM | Permalink


      Fall et al. examined raw, TOBS, and fully adjusted data. Their primary conclusions were based on the fully adjusted data, since raw data can have lots of other confounding issues (TOBS changes, instrument changes, instrument moves, etc.) that may be correlated with urbanity or CRN rating and skew the result.

  15. Posted Jul 31, 2012 at 1:51 PM | Permalink


    Along with TOBs you might want to take a closer look at MMTS. I sent Anthony some data on the sensor transition dates so that you can do a before/after comparison.

    Here is some past work I did on the subject, which seems to show a pretty clear cooling bias in max temps due to the MMTS transition:

  16. Steve McIntyre
    Posted Jul 31, 2012 at 2:07 PM | Permalink

    In my original look at this information (2007) here, I used TOBS data. I need to revisit this work.

    • A. Scott
      Posted Jul 31, 2012 at 3:18 PM | Permalink

      Steve … the Conclusions section of the paper does note the TOBS issue is one that needs more investigation:

      “We are investigating other factors such as Time-Of-Observation changes which for the adjusted USHCNv2 is the dominant adjustment factor during 1979-2008.”

      Seems to make it clear that it was not overlooked – the issue was acknowledged and noted for future review?

      Steve: needed to be dealt with.

      • Steven Mosher
        Posted Jul 31, 2012 at 3:29 PM | Permalink

        The problem is that everyone who follows this in detail knows that there is a TOBS adjustment. We know why it is made, and we know that the adjustment has been tested and validated. We know that every time you see a weird result with data, or something too good to be true, you check to make sure that the proper TOBS correction has been applied.

        in fact we spent a considerable time here at CA going over TOBS.

        Anytime Anthony does work my first question is always..
        Did you use TOBS?

        its basic.

        • JamesG
          Posted Aug 1, 2012 at 7:38 AM | Permalink

          Where has it been tested and validated? Citation please. When I read the original paper the adjustment is based on, the authors made it clear that it was neither tested nor validated against real data, but was in fact largely guesstimated. That it cools the past and warms the present by around 0.4C is enough to tell us it should have been tested somewhere. I’ll be interested in Steve’s findings. I suspect this is a minefield.

  17. Ivan
    Posted Jul 31, 2012 at 2:13 PM | Permalink


    “The satellite data seems quite convincing to me over the past 30 years and bounds the potential impact of contamination of surface stations”

    Why are you so sure? Have you studied the satellite data and methodology with the same level of auditing scrutiny as you did the paleo-reconstructions, to claim so confidently? Or is it simply convenient to say so, in order to dismiss the potentially “toxic” conclusion that the surface data might be “cooked”? Correct me if I am wrong, but the procedures for collecting and processing the satellite data to create a temperature record are extremely complicated, much more so than in the case of the surface record, and both satellite records have already undergone more than one revision, each of those revisions substantially increasing the trend. What is the specific basis for your belief that the satellite data has more integrity than the surface record?

  18. Mindbuilder
    Posted Jul 31, 2012 at 2:28 PM | Permalink

    If TOBS is a serious problem then I’d say we need to go back to the old observation times, or use both. And it also seems like we should erect shelters at each location that are the same as the original ones along with the MMTS style shelters. Recording electronic sensors or digital cameras pointed at thermometers could be used in the old shelters.

  19. Ivan
    Posted Jul 31, 2012 at 2:32 PM | Permalink

    Steve: “Anthony sent me his draft paper. In his cover email, he said that the people who had offered to do statistical analysis hadn’t done so (each for valid reasons). So I did some analysis very quickly, which Anthony incorporated in the paper and made me a coauthor though my contribution was very last minute and limited. I haven’t parsed the rest of the paper.”

    So, you allowed your name to be added to the list of coauthors without reading the paper itself?!

    Steve: If the paper is submitted anywhere, I will either sign off on the analysis or not be involved. I didn’t “allow” or not “allow” anything in respect to the discussion paper.

  20. Posted Jul 31, 2012 at 2:39 PM | Permalink

    “I support the idea of getting the best quality metadata on stations and working outward from stations with known properties,… To that extent, Anthony’s project is a real contribution, whatever the eventual results.”

    I agree on the idea, if achievable. But how does Anthony’s project contribute? The data seems to be just a photo (and maybe some Google Earth measurement) at a particular point in time (2009). The metadata needed for trend is of station history.

    Steve: quite so. NCDC has pretty good station histories.

    • Leo G
      Posted Jul 31, 2012 at 3:12 PM | Permalink

      Nick, maybe Anthony’s project will get some bureaucrat to authorize a few thousand dollars for a site road trip. Reading Anthony’s reply to Revkin, it seems to me that he has a point in his argument about getting out of the office and doing some field work. The basis of the temp trend is the stations. Does it not make sense to know how they are sited and what can affect them before you go through all of your modeling tests?

      • Posted Jul 31, 2012 at 3:15 PM | Permalink

        USGS and USCGDS aerial photos used for their topo maps could be a useful way to go back in time and inspect the site history.

        Let’s hope they saved the film.

    • kuhnkat
      Posted Jul 31, 2012 at 3:14 PM | Permalink


      you seem to think that NO current data is better than what Watts has done. How does that work?? Because you already have the answer you prefer??

      I would note that better site data was needed to tie into reclassifying the stations using Leroy 2010:

      Click to access CS202_Leroy.pdf

    • Kenneth Fritsch
      Posted Jul 31, 2012 at 3:27 PM | Permalink

      “I agree on the idea, if achievable. But how does Anthony’s project contribute? The data seems to be just a photo (and maybe some Google Earth measurement) at a particular point in time (2009). The metadata needed for trend is of station history.”

      And that can be readily accomplished without leaving your desk. Do not do smiley emoticons.

      I agree that Watts’ evaluations are snapshots, but if those snapshots present a different picture than the metadata, what then?

      • Posted Jul 31, 2012 at 4:08 PM | Permalink

        “if those snapshots present a different picture than the metadata what then?”
        They don’t present a “different” picture. They present an unrelated picture. To analyse trend, you need to know about the past. It’s possible that the current photos could be used to aid interpretation of past metadata, but I can see no indication that this has been done.

        • Kenneth Fritsch
          Posted Jul 31, 2012 at 4:42 PM | Permalink

          Oh god, Nick, here we go again. Very obviously I am saying that if the snapshot shows a different picture than the metadata would imply for the time of the snapshot, there is a problem. And by the way, a snapshot could be used for validating metadata even further back in time than the time of the snapshot. For example, there are changes that the snapshot might show that could be tracked by other means, like when a parking lot was blacktopped or an air conditioner was installed. I am under the impression that even good metadata does not account for these micro climate changes that the snapshots reveal.

          Having said that, I have continued to repeat that one must know when, how, and over what time period the micro climate changed to become what is documented in the snapshot in order to fully utilize it. That is particularly true where studies use only a brief period of the last 30 years. If a really low quality station evaluated today was a low quality station 30 years ago, then we can expect no effect on the 30 year trend. Also, would a slowly evolving change be found by change point algorithms or metadata?

  21. Posted Jul 31, 2012 at 3:08 PM | Permalink

    Let’s address the TOBS a different way.

    Here is what we know: Someone recorded a min and a max value at a recorded time on a specific day. We will assume the record is good enough so that 4s can be told from 8s, 7s from 1s, 6s from zeros. That potential source of error is for another day.

    Someone recognizes that there is a TOBS issue with the max on day B falling on the same day as an unusually cold min. Did that max occur before the cold front, or is it a holdover from the day before?

    As I see it, we have two basic assumptions:
    1. the recorder is conscientious, so that we can trust the max and min – no action necessary, or
    2. the recorder is an idiot or doesn’t care about getting it right.

    If 2, then we recognize that the max MIGHT be in error.
    our choices are:
    2A: Record the potential error value in the measurement, and thus increase the error bars of our analysis, or
    2B: Fudge the number to some TOBS-adjusted estimate and… do what to the overall error estimate? Leave it unchanged?

    Naturally, I’m in favor of 1, with 2A as a fall back if the metadata indicates sloppiness of the recorder. 2B seems unacceptable to me ever since high school chemistry lab some 40 years ago.

    If people believe TOBS is really important, then it should primarily be seen as increasing the uncertainty of the results to a point where few conclusions can be made, not as strengthening the signal.

    And what is this “estimated at midnight” baloney? What on earth in the written record indicates that the min happened at 11:50 or 00:10? Does it really make a difference to the 100 year climate record if the “day” was from midnight to midnight or 18:00 to 18:00? Is a six hour timeshift over a 100 year record that critical to the result? Moving from an actual written record at 18:00 to an estimated record at midnight does nothing to improve accuracy. My skepticism is pegging off scale.

    • Kenneth Fritsch
      Posted Jul 31, 2012 at 3:22 PM | Permalink

      Stephen, the point has to be the consistency of the time of measurement, not precisely when it is made.

    • rk
      Posted Jul 31, 2012 at 5:32 PM | Permalink

      What we have is a non-robust data gathering mechanism. I’m all in favor of accepting that reality, and accepting the variance that comes with it.

      Our basic problem is that data from weather stations were not designed to be used in exotic statistics. They were designed to tell people what the temperature, etc., is.

      So therefore we want to acknowledge that reality in our statistics. If a signal is not detectable given the real life experience, then it is not very strong.

    • Posted Jul 31, 2012 at 6:02 PM | Permalink

      Stephen Rasey —
      I too was once a TOBS Denialist, but I became a True Believer after a lengthy discussion on CA back in 2007:

      I agree with you that there is nothing magical about midnight — 9 or 10 AM or PM is in fact the optimal time to avoid double counting of warm days or cool nights.

      But switching from 5PM or so to 7AM or so, as has often been done, definitely cools the station on average.

      Personally, I think it would be better to just treat this as a new station with a new offset, rather than to try to use the Karl algorithm to “adjust” it away.

      • Skiphil
        Posted Jul 31, 2012 at 6:30 PM | Permalink

        Has anyone ever been able to study whether taking Tmin and Tmax daily over a month, a year, etc. adequately represents what happens to temps through the varying 24 hour cycles? E.g., aren’t there some days and nights where many more minutes and hours are closer to Tmax or closer to Tmin? Does it all somehow average out, or could there be significant distortions of the real “physical” temps because we don’t have 24 hour continuous data in the historical record? Do satellites now provide any adequate comparison for “validation” purposes? I may not be phrasing any of this right – I’m not in this field – but I wonder whether Tmin and Tmax can be enough for an accurate representation even if we had good enough data for those numbers.

        • Bob Koss
          Posted Jul 31, 2012 at 8:02 PM | Permalink


          Some stations measure once a day. Others measure once per hour; the newest network (CRN) does it every 10 seconds. There will be different readings recorded by different measurement systems. It doesn’t make much difference as long as the chosen method is applied consistently at each station. It might on average be worth a couple tenths of a degree difference due to the method used.

          Something I’ve never seen discussed is what happens if the chosen method for a station gets changed. I think that should result in a separate record being created. Although each method is self-consistent, the difference in methods may end up creating an unwanted step-change in the record if only a single record continues to be maintained.

          Here is an example using a USCRN record.

          Columns F-G are Tmax, Tmin.
          Column H is (Tmax+Tmin)/2
          Column I is the average of 24 hourly readings ultimately derived from the averages of 10 second readings which are taken from the average of three different thermocouples.

          Note that sometimes column H is higher than I and sometimes the reverse is true.

          It is certainly a well thought out system. Too bad the system is only about ten years old. Also too bad that they don’t include humidity readings in the data files, which I know they record.
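
          The contrast between columns H and I can be mimicked with made-up hourly data (synthetic numbers, not the CRN files themselves): on a smooth symmetric day the two summaries agree, while a brief spike moves the min-max midrange far more than the 24-hour mean.

          ```python
          import math

          def summaries(hourly):
              # Two ways to summarize one day of hourly temperatures:
              midrange = (max(hourly) + min(hourly)) / 2  # column-H style: (Tmax+Tmin)/2
              mean24 = sum(hourly) / len(hourly)          # column-I style: hourly average
              return round(midrange, 2), round(mean24, 2)

          # Day A: smooth symmetric cycle -- the two summaries coincide.
          day_a = [15 + 5 * math.sin(2 * math.pi * h / 24) for h in range(24)]

          # Day B: flat 15 C except a one-hour cold spike (a front passing at 4 am);
          # the spike drags the midrange down far more than the 24-hour mean.
          day_b = [15.0] * 24
          day_b[4] = 5.0

          print(summaries(day_a))  # (15.0, 15.0)
          print(summaries(day_b))  # (10.0, 14.58)
          ```

          A spike on the warm side would push the midrange above the mean instead, which is why neither column is systematically higher.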

        • Bob Koss
          Posted Jul 31, 2012 at 8:15 PM | Permalink


          I linked to the monthly file instead of the daily file, but the basic description is similar.

        • Bob Koss
          Posted Jul 31, 2012 at 8:22 PM | Permalink


          Now that I look at the daily file I see that is where they record the humidity. Dumb of me for not checking it out first.

        • mt
          Posted Jul 31, 2012 at 10:43 PM | Permalink

          Using data from here, here’s some code that looks at adjusting the time of observation, simulating min/max measurements from the hourly CRN data. It’s set up to compare 5pm vs other times:

          # Import data, convert invalid measurements
          data = read.table("CRNH0202-2011-FL_Sebring_23_SSE.txt");
          data[,11:12] <- apply(data[,11:12], 2, function(x){replace(x, x==-9999, NA)});
          # Set up matrices: one row per observation hour, one column per day
          mins = matrix(rep(0, 24*364), 24, 364);
          maxs = matrix(rep(0, 24*364), 24, 364);
          # For each starting hour
          for (off in 1:24) {
            # get the correct local hour from the data
            hour = (data[off,5]/100) + 1;  # shift 0..23 to 1..24
            # For each day
            for (day in 1:364) {
              # min is the min of the min, max is the max of the max
              start = off + ((day - 1) * 24);
              maxs[hour,day] = max(data[start:(start+24),11],na.rm=TRUE);
              mins[hour,day] = min(data[start:(start+24),12],na.rm=TRUE);
            }
          }
          # Convert missing days (all-NA windows give +/-Inf) to NA
          mins[which(is.infinite(mins))] = NA;
          maxs[which(is.infinite(maxs))] = NA;
          # for each hour, average daily diffs for the year
          diffs = rep(0, 24);
          for (hr in 1:24) {
            diff = ((mins[18,]+maxs[18,])/2) - ((mins[hr,]+maxs[hr,])/2);
            diffs[hr] = mean(diff, na.rm=TRUE);
          }
          # plots
          plot(diffs,type='l',main='Average difference from 5pm by hour\nCRNH0202-2011-FL_Sebring_23_SSE');
          plot(((mins[18,]+maxs[18,])/2) - ((mins[8,]+maxs[8,])/2),type='l',
               main='Diff in daily temp moving from 5pm to 7am\nCRNH0202-2011-FL_Sebring_23_SSE');

          Output plots are average difference over the year for different observation times, and daily differences for 5pm vs 7am. The yearly average difference for CRNH0202-2011-AK_Barrow_4_ENE has similar shape, but smaller spread.

          Steve: very relevant.

        • Skiphil
          Posted Aug 1, 2012 at 2:09 PM | Permalink

          Thanks Bob, and I also found this, which discusses related matters:

          That is the kind of issue I was groping toward from my layman’s perspective – that it *might* matter by more than a tenth or two if a temp record is only (Tmin + Tmax) / 2.

          This is maybe more about the error bars than any specific correction that could be made, but I was thinking about how temps can *sometimes* be volatile during a 24 hr period, especially as weather fronts move in or out. Maybe it all averages out, but if there are any cloud cover changes as discussed at that BH article, then there might be warming or cooling that is not about “global warming” per se (as anything related to CO2).

    • Steven Mosher
      Posted Jul 31, 2012 at 9:21 PM | Permalink

      The point is this. If I told you that the thermometer was moved from a grassy field to under an air conditioner you would say that things changed and you would want to investigate that. If the time at which the observation was taken changes we would also want to investigate that.
      We cannot pay attention to changes in observation practice in a selective manner. If we complain about changes to instruments, we have to complain about changes in time of observation. When we do actually look at the EFFECT of changing the time of observation, we see very clearly that it biases the answer: changing TOB changes the temperature. Attention to details like this is something that WUWT fans should appreciate. There are two approaches to TOB changes

      1. split the data, and call it two stations.
      2. correct the bias.

      Bias correction for TOB has been investigated. At John daly, here at CA, and in the literature.

      Ignoring the need for a correction – pretending that observing practice matters for microsite but doesn’t matter for a change in TOB – is not best practices.
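
      A stylized sketch of the two options (toy numbers; a real TOB correction is modeled from hourly data and station metadata, not from a simple difference of segment means):

      ```python
      def split_station(series, change_idx):
          # Option 1: treat the record as two independent "stations".
          return series[:change_idx], series[change_idx:]

      def bias_correct(series, change_idx):
          # Option 2: remove the step at the change point. The step here is
          # estimated crudely as the difference of segment means -- a toy
          # stand-in for a real TOB adjustment.
          early, late = series[:change_idx], series[change_idx:]
          step = sum(late) / len(late) - sum(early) / len(early)
          return early + [round(t - step, 2) for t in late]

      # A flat 15.0 C record with a -0.4 C step when observation time changed.
      series = [15.0] * 6 + [14.6] * 6

      a, b = split_station(series, 6)
      print(len(a), len(b))                          # 6 6
      print(bias_correct(series, 6) == [15.0] * 12)  # True
      ```

      Either way, the change point must be known and the step attributed to the observing practice, not to climate; what is not defensible is leaving the step in the record and reading it as a trend.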

  22. Kenneth Fritsch
    Posted Jul 31, 2012 at 3:18 PM | Permalink

    I think that Watts’ initial intentions were good in that he kept updating the CRN ratings for stations as the team doing the work turned them in. Some participants at these blogs, including me, did some preliminary calculations, and while the results appeared to vary with who was doing the calculations – or, more importantly, how the calculations were being done – the results were not overwhelmingly different (although I thought that with sufficient data one might be able to see significant differences) from the adjusted results from USHCN. The gallery at the time was expecting some dramatic differences, or so was my perception. My point at that time was that the number of CRN 1 and CRN 2 stations was very small, and that given the noisy data for temperature trends amongst even closely spaced stations, seeing a statistically significant difference due to CRN rating would require a very large difference in trends or a larger number of stations in those classifications. I even suggested the grouping of CRN123 versus CRN45 at that time.

    I admired Watts, and particularly his team’s efforts, in going out into the field and looking at the micro climate conditions first hand. I have often thought that climate scientists, like economists, fling around data and statistics of which they do not have an intimate understanding, and the result could be garbage in and garbage out.

    I was puzzled when Watts withdrew the updating of the CRN ratings, until I realized he hoped to get the data analyzed and published. He was slow in accomplishing this task, and in the meanwhile others published papers based on the CRN findings before Watts did his first paper. I have not been happy with the approach taken by any of these papers, including the Watts coauthored one.

    Now Watts has a different rating criteria that evidently gives different results and obviously makes this result, if it were to hold up, a publishable event. I do not understand if the prepublication is a matter of shopping the results around or not, but I see no way in hell it can be published without the original data and code. After all Watts is not exactly a climate scientist regular who might be given that exception.

    • Posted Jul 31, 2012 at 4:39 PM | Permalink

      Yeah only real climate scientists can leave out the code and data. That’s one thing we know for sure.

    • Ivan
      Posted Jul 31, 2012 at 5:39 PM | Permalink

      “but I see no way in hell it can be published without the original data and code.”

      that’s because he is not a member of the Hockey Team.

  23. Posted Jul 31, 2012 at 3:41 PM | Permalink

    What surprises me about all this is the apparent urgency that was created, which seems unwarranted and likely to lead to mistakes and omissions.

    “More haste, less speed” seems a worthwhile maxim in science, especially when writing about something as slow-moving as the Earth’s climate.


  24. dearieme
    Posted Jul 31, 2012 at 3:52 PM | Permalink

    It seems to me that the big deal with the paper is that a group has taken the trouble to examine the sites and analyse them using physical thinking. This seems to me to be potentially a far superior approach to the sort of adjustment flummery used heretofore. If detail needs sorting out, so be it: at least it won’t be smuggled into the literature, errors and all, by pal review.

    I must say, though, that I might not have liked having my name added to a paper in a rush. (It happened to me a couple of times, and both times my new colleagues managed to get my name wrong!!)

  25. jfk
    Posted Jul 31, 2012 at 5:57 PM | Permalink

    Well, I am glad at least Mosher wants to apply the same standards to this paper as to one written by, say, Michael Mann. And I’m afraid I can’t understand how “the statistics” for anything like this can be done over a weekend. It’s troubling. I can’t understand the paper very well; there isn’t enough detail.

  26. Rob MW
    Posted Jul 31, 2012 at 6:58 PM | Permalink

    From the Watts paper – Lines 215 >>> 226

    The USHCNv2 monthly temperature data set is described by Menne et al. (2009).
    The raw and unadjusted data provided by NCDC has undergone the standard quality-control screening for errors in recording and transcription by NCDC as part of their normal ingest process but is otherwise unaltered. The intermediate (TOB) data has been adjusted for changes in time of observation such that earlier observations are consistent with current observational practice at each station. The fully adjusted data has been processed by the algorithm described by Menne et al. (2009) to remove apparent inhomogeneities where changes in the daily temperature record at a station differs significantly from neighboring stations. Unlike the unadjusted and TOB data, the adjusted data is serially complete, with missing monthly averages estimated through the use of data from neighboring stations. The USHCNv2 station temperature data in this study is identical to the data used in Fall et al. (2011), coming from the same data set.

    Is it not the case that Anthony is simply using real_climate_science that would underscore a comparison of oranges to oranges thereby avoiding inconsistency from within the real_climate_science community with respect to their own accepted science (oranges) ??

    If this is the case then it would seem to me that criticism surrounding (TOB) is a moot point.

  27. Christoph Dollis
    Posted Jul 31, 2012 at 6:59 PM | Permalink

    I realize it was probably just miscommunication between the two of you, or maybe a last minute honour Watts thought to give you by listing you since you had pitched in, but one of the things that reassured me about the mathematics behind the paper was your involvement. It was disappointing, to say the least, to have enthusiastically touted this paper to some friends, and then see your post.

    But … let that be a lesson to me. I despise self-interested cognitive biases in science, but am hardly immune.

    I realize this is an interruption in what you would otherwise be doing, Steve, but I hope that you can help to tighten up the paper and salvage the value there is within it, which I hope is high. But, failing that, if it needs to be criticized, I hope you’ll do that too, with respect and rigour both.

    Let the science prevail.

    (That said, the conclusions of the paper make 100% intuitive sense to me and I won’t be in the least surprised to see them borne out.)

  28. Posted Jul 31, 2012 at 7:00 PM | Permalink

    I put up some initial thoughts here:

    If the CRN1/2 stations from Fall et al are indeed mostly included in the new CRN1/2 pool, it does raise some concerns about how a larger effect is being found using laxer criteria.

  29. dhogaza
    Posted Jul 31, 2012 at 7:45 PM | Permalink

    “Watts et al. say a statistically significant signal was found in data using minimum adjustments.”

    The fact that a statistically significant signal can be found in a set of data says nothing about the accuracy of that data. And any conclusion you draw from the statistical analysis is only as strong as the accuracy of the underlying data.

    The raw data is known to have problems. If you refuse to address them, and if they’re significant, it’s garbage in, garbage out.

    Watts needs to show that homogenization algorithms are wrong. You do that by analyzing the algorithms and showing where they are wrong, not by asserting that an analysis of raw, flawed data must be better just because it shows a lower trend. That’s essentially what Watts is doing.

    Steve: Yes and no. I agree with your comment about the importance of addressing problems in raw data – that’s obviously been a major concern of mine with respect to bristlecones, Yamal and so on, where there are problems more serious than “tobs”. I also agree with your remark about assuming something is better because the result meets expectations. Again, a criticism of mine with respect to proxy reconstructions.

    I also think that the deconstruction of homogenization algorithms is a different job from presentation of the surface stations data classification and that the two jobs should be kept separate.

    • dhogaza
      Posted Jul 31, 2012 at 7:45 PM | Permalink

      That was meant to be a reply to Stephen Rasey’s post below.

    • dhogaza
      Posted Jul 31, 2012 at 11:42 PM | Permalink

      “I also think that the deconstruction of homogenization algorithms is a different job from presentation of the surface stations data classification and that the two jobs should be kept separate.”

      Wrong, because Watts is declaring that the homogenization algorithms are wrong, and pretty much stating that it’s due to a desire to show an inflated trend. He didn’t just present his surface stations data classification (hidden, as Mosher has pointed out, where’s the data?), he says they prove the homogenization algorithms are wrong.

      If he didn’t go down that path, I’d agree with that. But not only has he gone down that path, but that’s his entire schtick for years, and that’s the major conclusion of his “work”.

      How can you, as co-author, have missed this?

      • A. Scott
        Posted Aug 1, 2012 at 3:34 AM | Permalink

        Watts “surface stations data classification” is the result of applying Leroy (2010) siting standards to the existing readily available station data. Not a thing I can see to stop you from duplicating his work and verifying or disproving his results.

        Watts identified the data used. He identified the siting standards used. He listed the process they took. And he showed his results, including how the stations shifted in rating categories from the prior Leroy (1999) standards.

        The USHCN Version 2 Serial Monthly Dataset page here:

        … provides the 4 data sets for each set of station ratings, along with the MMTS and Cotton Region Shelter (Stevenson) site information used for Menne (2010).

        The NCDC Station Histories appear to be here:

        WMO-CIMO endorsement of Leroy(2010) standard is here:

        Click to access 1064_en.pdf

        Leroy(2010) is here:

        Click to access CS202_Leroy.pdf

        Watts(2009) is here:

        Click to access surfacestationsreport_spring09.pdf

        And Muller’s Station data is here:

        Click to access berkeley-earth-station-quality.pdf

        I am a complete layman. I read the Watts report, did a little reading – mostly at blogs like here, and with 5 minutes of searching I was able to find all the above data links.

        Of course, I was a fool for doing the digging: had I bothered to read the references first, I would have found all of these data links listed in the Watts report itself.

        I believe that is all of the data required to reproduce Watts work. I even included the NCDC station history metadata in case you don’t want to do the extensive visual and/or onsite inspection Anthony and his help spent well over a year doing.

        Seems to me instead of complaining about his work – if you want to refute it you should just jump in and have at it. Do the work and show where he is wrong.

        • A. Scott
          Posted Aug 1, 2012 at 3:51 AM | Permalink

          Sorry dhogaza … forgot to provide you Fall (2010):

          Click to access r-367.pdf

          NOAA’s Climate Reference Network Site Handbook (see Sec. 2.21)

          Click to access X030FullDocumentD0.pdf

          Watts Surface Stations Project site master list:

          (the brief notes should provide an initial screen of suspect stations)

        • Posted Aug 1, 2012 at 5:34 PM | Permalink

          A. Scott: Watts “surface stations data classification” is the result of applying Leroy (2010) siting standards to the existing readily available station data. Not a thing I can see to stop you from duplicating his work and verifying or disproving his results.

          One thing that would stop us from verifying his results is that he has not provided a list of the USHCN stations that he has classified, the classifications that were assigned, or the methodology used to make the assignments.

          The fact that Google has aerial imagery, that Leroy 2010 explains a new classification scheme, and that USHCN provides its station data freely to the public does not somehow make Anthony Watts’ refusal to provide the station ids that he used, to provide the Leroy 2010 station classifications that he used, or to provide the methods used to make that classification in his paper more palatable. Hide the data; hide the code! 😆

          Steve: I agree that there is little point circulating a paper without replicable data – even though this unfortunately remains a common practice in climate science. It’s not what I would have done. I’ve expressed my view on this to Anthony and am hopeful that this gets sorted out. Making the data set publicly available for statistically oriented analysts seems far more consistent with the crowdsourcing philosophy that Anthony’s successfully employed in getting the surveys done than hoarding the data like Lonnie Thompson or a real_climate_scientist.

          It would have been nice if you’d spoken out on any of the occasions in which I’ve been refused data. You are entitled to criticize Anthony on this point, but it does seem opportunistic if you don’t also criticize Lonnie Thompson or David Karoly etc.

        • Posted Aug 1, 2012 at 5:59 PM | Permalink

          Re: Ron Broberg (Aug 1 17:34), Re: A. Scott (Aug 1 03:34):
          See: surface-stations/#comment-345602

          I agree Anthony Watts and crew should be held to the same incredibly tough standards that are required to be met by everyone else in the field of climate science.

          So what does that give him before he has to cough up the code, the data and all the details — twenty, thirty years? …and a half dozen FOIAs defended to the teeth? Just askin’…. 😉

          Or we could hang on a few days or weeks and let them deal with the other important issues that have been raised… Relax….

    • dhogaza
      Posted Jul 31, 2012 at 11:45 PM | Permalink

      “I also think that the deconstruction of homogenization algorithms is a different job from presentation of the surface stations data classification and that the two jobs should be kept separate.”

      Seriously, if you believe it’s a different job, how can you justify Watts position that the trend based on homogenization is “spurious” and inflated by a factor of two?

      If it’s properly a different job, Watts should STFU.

      If you like his conclusion, he should do the work.

      Really, who are you trying to fool here?

      • dhogaza
        Posted Jul 31, 2012 at 11:47 PM | Permalink

        “If it’s properly a different job, Watts should STFU.”

        And as co-author, you should call him on it.

        (I know you’re not really a co-author, as it’s normally understood, but until you make him take your name off the paper, you are a co-author. Time to choose, are you, or not? If not, make him remove your name from the paper, publicly.)

        • AndyL
          Posted Aug 1, 2012 at 2:20 AM | Permalink

          Steve has frequently said that when a novel statistical method is introduced, there should be a paper on details of the technique that is separate from the paper using the technique. His comment above seems no different.

    • KnR
      Posted Aug 1, 2012 at 4:15 AM | Permalink

      There is a step before RAW data, and that is how you go about collecting it.
      Remember, this is a problem of data collection because of problems with instruments/sites.

      If you don’t get the collection right, whatever you do with the data afterwards does not matter.

  30. Arnost
    Posted Jul 31, 2012 at 8:11 PM | Permalink

    Is TOBs bias such an issue given that the Watts study period was 1978-2008? I note that Fig 3 in the Menne et al paper “THE UNITED STATES HISTORICAL CLIMATOLOGY NETWORK MONTHLY TEMPERATURE DATA – VERSION 2” shows that the TOBs bias trend flattened after 1990… and so the impact on the Watts study should be limited to only the 80’s (if at all).

    In any case from the Menne paper:

    “The net effect of the TOB adjustments is to increase the overall trend in maximum temperatures by about 0.012°C dec-1 and in minimum temperatures by about 0.018°C dec-1 over the period 1985-2006.”

    So even if we accept that this applies across the entire Watts study period, this is still only about a tenth of the trend Watts is highlighting – i.e. 0.145°C per decade.
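    The arithmetic behind that “only a tenth” claim can be checked with a quick sketch (variable names are mine, and the figures are simply the draft Menne numbers quoted in this comment):

```python
# Rough check: how large are the quoted TOB adjustments relative to the
# 0.145 deg C/decade trend highlighted in the Watts draft?
tob_tmax = 0.012   # deg C/decade, TOB effect on Tmax (Menne draft, 1985-2006)
tob_tmin = 0.018   # deg C/decade, TOB effect on Tmin
tob_tmean = (tob_tmax + tob_tmin) / 2   # approximate effect on mean temperature

watts_trend = 0.145   # deg C/decade, trend highlighted in the Watts draft

fraction = tob_tmean / watts_trend
print(f"TOB effect on Tmean trend: {tob_tmean:.3f} deg C/decade")
print(f"Fraction of 0.145 deg C/decade: {fraction:.0%}")
```

    The mean-temperature effect here is approximated as the average of the Tmax and Tmin effects, which comes out near 10% of the highlighted trend.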

    • Posted Aug 1, 2012 at 12:52 AM | Permalink

      That’s odd. In my version of the Menne et al BAMS paper, the TOBS adjustments are shown in Fig 4 and the corresponding text, starting on p 996, says:
      “The net effect of the TOB adjustments is to increase the overall trend in maximum temperatures by about 0.015°C
      decade-1 (±0.002) and in minimum temperatures by about 0.022°C decade-1 (±0.002) during the period 1895-2007.”

      Eyeballing Fig 4, over 1979-2008 the trend difference due to TOBS looks like 0.06 °C/decade.

      • Arnost
        Posted Aug 1, 2012 at 3:26 AM | Permalink

        Click to access 141108.pdf

        Got that from the above… also didn’t realise it was different from the BAMS paper – but interestingly the trend for 1985-2006 is specified there [and this is the period of interest].

        My point is given that Anthony is only referencing the period from 1978-2008, then something like 50% of the stations would have ALREADY changed TObs (Time of Observation) from evening to morning. So these cannot be an issue. That should logically reduce this already small overall trend. See DeGaetano 2000:

        But thinking about this a bit further, maybe there is an MMTS conversion issue in play – and not only that you go from a LiG reading to an electronic one. Menne et al suggest that most of the HCN sites were converted in the 80’s. Modern base units record daily min/max temps for up to 35 days, so I would assume that this would be done on a strict daily basis (midnight – midnight).

        Click to access nimbus-spec.pdf

        But I’m not sure if the earlier ones did this:

        So a conversion from an old style MMTS to a newer one may introduce another TOB issue where you go from a morning reading to a midnight one. Is this understood and accounted for? Wouldn’t this introduce a warming bias?

        • Posted Aug 1, 2012 at 5:27 PM | Permalink

          I wonder if there was a typo in the draft – 1985-2006 should have been 1895-2006? Otherwise it’s an odd time period to choose.

          There was a significant change in Fig 4 between draft and final. The flattening post-1990 that you noted has gone away.

      • mt
        Posted Aug 1, 2012 at 7:55 AM | Permalink

        Here is the 1985-2006 version. Good to see that peer review works; showing an impact of TOBS changes that ignores most of the 1970-90 switchover is disingenuous at best.

        But if the change in observation was from afternoon to morning, I’m not sure the adjustment makes sense. “The net effect of the TOB adjustments is to increase the overall trend in maximum temperatures by about 0.012°C dec and in minimum temperatures by about 0.018°C dec over the period 1985-2006”. The bias in maximum readings is a positive bias, namely the measurement is the max of the tail end of the previous day and the current day’s maximum. How can fixing that bias yield an increase in max temps?

        • mt
          Posted Aug 1, 2012 at 8:21 AM | Permalink

          Never mind. If they’re making the 7am measurements look like 5pm measurements, they’d need to correct the cold bias in the minimums and reintroduce the warm bias in the maximums.

      • Ivan
        Posted Aug 1, 2012 at 10:24 AM | Permalink

        That’s not odd at all; that’s what the data shows.

  31. Posted Jul 31, 2012 at 8:33 PM | Permalink

    This is a particularly funny thread. People are worried that Anthony rushed things and that led to errors.

    Yes, he did, and it did. That wasn’t the point of the timing of the release. He hasn’t submitted, so there’s no need to rush from this point. He’s doing exactly what he said: let the blogs have at it. What Steve Mc is doing here will only add to the paper. The work on the station sitings can stand alone without the TOBS consideration. The TOBS will only give a fuller picture.

    The timing was a righteous poke back at some tricks played earlier. I like it. Goose/gander.

    • Christoph Dollis
      Posted Jul 31, 2012 at 9:19 PM | Permalink

      Watts should have nailed down who is and isn’t a co-author first (and whether the co-authors stand behind the paper).

    • dhogaza
      Posted Jul 31, 2012 at 11:38 PM | Permalink

      Gosh, now I’m on moderation, very cool!

      Steve: not personally. don’t get overinflated. some words trigger moderation.

  32. A. Scott
    Posted Jul 31, 2012 at 8:44 PM | Permalink

    crosspost from WUWT:

    From my limited non-technical understanding, the data is readily available publicly with the exception of Anthony’s siting results using Leroy 2010. This includes the raw and adjusted temp data along with the Leroy 2010 rating specs, which would allow anyone to do their own duplication of the work.

    To me that seems preferable here – anyone attempting to duplicate should start from the beginning, rather than working backward from the conclusions.

    The paper notes they applied the readily available specs of Leroy 2010 to the Fall 2010 USHCNv2 data set.

    They identify the data they use:

    “We make use of the subset of USHCNv2 metadata from stations whose sites have been classified by Watts (2009)” and; “site rating metadata from Fall et al (2011)”.

    They further narrow:

    “Because some stations used in Fall et al. (2011) and Muller et al. (2012) suffered from a lack of the necessary supporting photography and/or measurement required to apply the Leroy (2010) rating system, or had undergone recent station moves, there is in a smaller set of station rating metadata (779 stations) than used in Fall et al (2011) and Muller et al. (2012), both of which used the data set containing 1007 rated stations.”

    Seems correct to expect Steven Mosher and Zeke would have access to this station data as Watts used the same data as Muller 2012 in this regard?

    They included description of data used, methods – how they calculated numbers, and their conclusions.

    To me it would seem much more relevant, for those interested in replicating to follow the entire process – and see how their siting category counts came out.

    And only THEN compare to Watts conclusions.

    I would also be interested in seeing how the USCRN stations, which were designed per Leroy 1999 (“which was designed for site pre-selection, rather than retroactive siting evaluation and classification”) fare under a review using Leroy 2010.

    Watts 2012 notes “Many USHCNv2 stations which were previously rated with the methods employed in Leroy (1999) were subsequently rated differently when the Leroy (2010) method was applied in this study”…

    Again, it would be very interesting, and potentially valuable, to see if the new USCRN sees the same siting quality results using Leroy 2010.

    • Armand MacMurray
      Posted Aug 1, 2012 at 2:35 AM | Permalink

      Having personally visited the CRN site north of Seattle, and seen photos of a number of the other sites, I’d be quite surprised if any ended up below the top site ranking using Leroy 2010.

      Regarding replication, it seems to me that the different aspects such as statistical analysis, checking Leroy 2010 scores, and so forth, are best done by those with expertise and interest in those areas. There’s no reason to demand that a single person or group do it all, or that it be done at the same time, or even in a certain order.

      Personally, I think it will be pretty clear upon looking at 10 or 20% of the sites whether Anthony’s new Leroy 2010 scores are done correctly, and that evaluation of the statistical analysis should not wait for a re-scoring of all the sites.

  33. Posted Jul 31, 2012 at 11:12 PM | Permalink

    Steve, you note that amplification is negligible over land in models with respect to long term trends. This is true globally in models, but I have to wonder the extent to which this effect varies (in models) from location to location. I wonder about this because a few years back I empirically estimated the amplification factor globally based on interannual temperature fluctuations. I found it to be in the model ball park (perhaps a bit larger, actually), which implied a large warming bias in the surface data, a large cooling bias in the satellite data, some less large combination of those two, or an unknown real climatic effect on lapse rate variation that only operates on the long term and is absent from current models:

    I was motivated to see if I could get a similar result for the US, so I compared USHCN data from NCDC with UAH data for the same area. Much to my surprise, the slope for twelve month smoothed and subsequently detrended data (UAH as X, USHCN as Y) indicated more variation of temperature at the surface: a slope of about 2.23 (2.32 if you don’t detrend). This leads to a very slight cooling of the surface relative to the satellite record adjusted to surface variation levels. This suggests to me that global trends are biased warm but there is not likely to be a significant bias in the US record.
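    The comparison described above (twelve-month smoothing, detrending, then regressing surface on satellite anomalies) can be sketched as follows. This is a minimal illustration with synthetic monthly series, since the actual USHCN and UAH inputs are not reproduced here; the function names are mine:

```python
import numpy as np

def detrended_slope(x, y):
    """OLS slope of y on x after removing each series' linear trend in time."""
    t = np.arange(len(x))
    x_d = x - np.polyval(np.polyfit(t, x, 1), t)
    y_d = y - np.polyval(np.polyfit(t, y, 1), t)
    return np.polyfit(x_d, y_d, 1)[0]

def smooth12(a):
    """Trailing 12-month moving average."""
    return np.convolve(a, np.ones(12) / 12.0, mode="valid")

# Synthetic stand-ins: a satellite-like series, and a surface series that
# varies roughly 2.2x more, plus noise (the real inputs would be the
# USHCN CONUS and UAH lower-troposphere monthly anomalies).
rng = np.random.default_rng(0)
uah = rng.normal(size=360)
ushcn = 2.2 * uah + rng.normal(scale=0.3, size=360)

slope = detrended_slope(smooth12(uah), smooth12(ushcn))
print(f"estimated surface/satellite variability ratio (slope): {slope:.2f}")
```

    With smoothing and detrending applied to both series equally, the recovered slope approximates the built-in variability ratio, analogous to the ~2.23 figure reported above.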

  34. Alexej Buergin
    Posted Aug 1, 2012 at 4:50 AM | Permalink

    If the time of observation should turn out to be a problem during the last 30 years, that would be really, really shocking.

    (PS. So I learned what TOBS is in meteorology; would anybody inform me what STFU means in Canada?)

    • Alexej Buergin
      Posted Aug 1, 2012 at 5:43 AM | Permalink

      So Climate Audit has new rules. My own experience was that
      1) if you are impolite
      2) if you have intellectually nothing to offer
      you get snipped.
      Seems this guy, who is and has, is treated differently.

      • AndyL
        Posted Aug 1, 2012 at 10:27 AM | Permalink

        Steve is more tolerant towards his critics than his supporters

        The phrase says more about the person who used it than it does about Steve, so leaving it in place could be a fair response.

        Steve: Precisely so. I expect regular Climate Audit readers and commenters to comment politely and am disappointed when they don’t. If someone does not comply with such policies, I prefer that people do not respond to such comments.

        • Posted Aug 1, 2012 at 5:16 PM | Permalink

          Some, not Eli to be sure, say that Tony Watts’ rush to press release was driven by a desire to come out before John Christy’s testimony in the Senate today.

          Steve: I know that it was more related to Muller. It was definitely a mistake to let the Berkeley thing get under his skin; Anthony realizes that now.

        • RomanM
          Posted Aug 1, 2012 at 6:08 PM | Permalink

          Seems like I remember a similar situation with the earlier BEST results. Strange, but I don’t recall the same some and definitely not Eli complaining vociferously about such an egregious action.

          Definitely crickets

        • Posted Aug 2, 2012 at 7:33 AM | Permalink

          General practice is to post/send out preprints at the same time as submission because having a manuscript in good enough shape to post/send out means that it is also in good enough shape to submit. There could be a few days either way in general. People are using arXiv today to establish precedent for submissions because of how long review can take.

          As Eli recalls that is what Berkeley did, and it is quite standard. What Watts did is post a draft, and a draft full of blunders.

        • RomanM
          Posted Aug 2, 2012 at 8:25 AM | Permalink

          Eli likes to repeat himself by posting the same text multiple times…

          You may not call it a “draft”, but the initial Berkeley release was rushed out filled with a generous amount of errors. Furthermore, it appears that the updated “manuscript in good enough shape to post/send out” is not quite so good enough. However, the BEST folks did not seem to be hindered from milking the media without indicating the publication status of the document.

          Maybe “Eli recalls” selectively…

        • RB
          Posted Aug 2, 2012 at 11:31 AM | Permalink

          It is true that we do not know the publication status of the BEST document. We could however interpret Mosher here as implying that McKitrick’s review may have been found to be “not quite so good enough” by the editors:

        • RomanM
          Posted Aug 2, 2012 at 12:00 PM | Permalink

          According to an update on Ross McKitrick’s web site:

          [Update July 30: JGR told me “This paper was rejected and the editor recommended that the author resubmit it as a new paper.”]

          Sounds to me like it has been rejected…

        • Posted Aug 5, 2012 at 5:59 PM | Permalink

          SOP at AGU journals now. If they think major changes are needed they reject with a suggestion to rewrite in view of the referee’s reports. It unclogs the pipeline. BEST has updated their web page to show that one of their papers has been accepted subject to some changes in the methods paper which is also under consideration.

        • TerryMN
          Posted Aug 5, 2012 at 6:02 PM | Permalink

          Josh, you have a bad habit of making declarative sentences with assertions that may or may not be true.

        • RomanM
          Posted Aug 6, 2012 at 7:22 AM | Permalink

          This has been the polite language for rejecting non-viable statistics journal manuscripts as far back as I can remember. At that point, the paper is no longer under consideration for publication by the journal so it has indeed been rejected. Should the manuscript be rewritten, it is resubmitted as a new paper.

          Your efforts in spinning facts based on concepts such as “it depends on what the exact meaning of the word rejected is” come across as comical…

        • Venter
          Posted Aug 6, 2012 at 10:50 AM | Permalink

          If you don’t know the meaning of rejected, better go back to school and redo your English comprehension classes.

        • Venter
          Posted Aug 6, 2012 at 10:52 AM | Permalink

          My comment is aimed at Bunny Boi

        • Posted Aug 5, 2012 at 5:56 PM | Permalink

          snip –

          Steve- I normally don’t snip critics but you’re making an untrue factual allegation here. I did not do “much of the statistical analysis” in the paper. I did not even see the paper until Friday; I did one analysis, which unfortunately did not catch a latent problem. It would have been more appropriate to acknowledge me than to list me as a coauthor, but unfortunately I did not catch this as the grandchildren were over visiting on Saturday night and Sunday morning and I missed some emails.

        • Posted Aug 2, 2012 at 7:35 AM | Permalink

          Not to beat on this too much, and Eli suspects that is what Tony told you and what he believed, but Christy is ALSO an author and was quite aware of the hearing. He could have been pushing Watts without showing his hand. Just sayin’, but in any case the optics are awful.

        • Posted Aug 2, 2012 at 10:56 AM | Permalink

          There’s no reason Anthony and coauthors couldn’t now take it down, now that it’s been helpfully crowd reviewed. This wouldn’t be a retraction, since it was just a circulation draft in the first place. Leaving it up creates the impression this is a semi-official version.

  35. Geoff Sherrington
    Posted Aug 1, 2012 at 5:28 AM | Permalink

    Please keep in mind that Anthony’s pre-publication covers the USA 48 contiguous States (CONUS). Ultimately, we seek a world estimate of temperature change, with uncertainty quantified, plus more confidence in ascribing change to various factors.

    In Australia (another 2% of the area of the globe) there has been discussion about writing a pre-publication similar to the above. However, this might be impossible to do well, if at all. It would be even harder to do, for example, over the Antarctic.

    The Watts pre-publication deals mainly with 2 techniques. The first is microsite changes of some magnitude, assessed using methods attributed to Leroy (2010). In Australia, there are over 1,200 sites with temperature data, but there are also many sites where man has not placed any or many objects that can store and re-radiate heat in the way the Leroy method compensates for. Conversely, there are many that could be analysed this way, but why bother when there are many without the complications?

    The second major Watts technique relates to the ways that USA data are treated after collection. ‘The identified biases include station moves, changes in instrumentation, localized changes in instrumentation location, changes in observation practices, and evolution of the local and microsite station environment over time.’

    Again, there are many Australian sites where many of these factors are absent. It would be perverse to seek out sites heavily affected by these to see if the Watts corrections work in Australia as well as in the USA, though a handful of important sites could use examination.

    I raised Antarctica in these 2 contexts. There is a relationship between analysis method and population density of countries. There has been a good deal of prior work done in Australia, which has a land area similar to CONUS but 5% of the USA population. The work that is being done tends towards the conclusion that official estimates are inflated; qualitatively, it would not be surprising to find that more work on Australian data would give trend results similar to those reported by Anthony & Co. (The story of Steig et al 2009 and its rebuttal by O’Donnell et al 2010 is well documented for the Antarctic.)

    In short, one can select a number of pristine Australian sites to analyse for trends over the last 30 years. I have done some 45 of these and found a wide trend variation, from about +0.47 to −0.27 degrees C per decade linear, far greater than the CONUS trends. Therefore, we seem to have a noise problem. If there is to be further development of statistics as Steve Mc has foreshadowed, for example to cope with TOBS, then it would be great if that statistical proficiency could also dig a little deeper into signal:noise topics.

    Finally, the vast bulk of liquid-in-glass observations in Australia were made at 0900 hours, so TOBS can be selectively ignored.

    Yes, the validation of historic temperatures is boring and not mentally stimulating, but without a verified temperature base, there is not much point in using advanced statistics. That has been a criticism of BEST.
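    The per-site trend calculation described above (an OLS linear trend on a monthly series, reported in degrees C per decade) can be sketched as follows; the function name and the synthetic example series are mine:

```python
import numpy as np

def decadal_trend(monthly_temps):
    """OLS linear trend of a monthly temperature series, in deg C per decade."""
    t_years = np.arange(len(monthly_temps)) / 12.0
    slope_per_year = np.polyfit(t_years, monthly_temps, 1)[0]
    return slope_per_year * 10.0

# Example: a synthetic 30-year monthly series warming at 0.2 deg C/decade,
# with substantial month-to-month noise (as one sees at individual sites).
rng = np.random.default_rng(1)
series = 0.02 * (np.arange(360) / 12.0) + rng.normal(scale=0.5, size=360)
trend = decadal_trend(series)
print(f"estimated trend: {trend:.2f} deg C/decade")
```

    Applied site by site, this is the sort of calculation that would reveal the wide spread of trends Geoff reports; the noise level at a single site puts a real uncertainty on each estimate.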

  36. dearieme
    Posted Aug 1, 2012 at 5:33 AM | Permalink

    Since my instinct is that there has been a bit of warming, and one of my complaints has been that there seems to have been no non-risible estimate of it based on land surface measurements, I am glad to think that, after due criticism and correction (if required), there may soon be such a non-risible estimate.

    As for what might have caused such a rise, Lord knows. Is there any evidence worth a hoot?

    • Posted Aug 1, 2012 at 6:19 PM | Permalink

      I’m amused at the risible rush to a rise.
      It’s conflicted, what’s visible: Caution is wise.
      And the Watts et al. paper addresses a bit;
      I’m assuming “non-risible” means “worth a … hoot.” ];-)

      ===|==============/ Keith DeHavelle

  37. Espen
    Posted Aug 1, 2012 at 6:07 AM | Permalink

    A (perhaps naïve) question: Do MMTS stations have the same TOBS issues, or do they record the 00:00-23:59 max/min temperatures properly in a way that doesn’t require a manual reset at some specific time of day? I ask because the Watts et al result for “Rural MMTS, no Airports” (see slide #45 and #52 in the “Overview” PPT) is the most eye catching of all, with practically no warming.

    • mt
      Posted Aug 1, 2012 at 6:28 AM | Permalink

      I’m pretty sure MMTS would have TOBS issues if there was a change in the time of observations. This shows that MMTS thermometers have to be manually read, and implies they’re capturing the same min/max over the last 24 hours.

      • Espen
        Posted Aug 1, 2012 at 8:01 AM | Permalink

        Thanks, mt!

        It seems that this has changed recently, though: Anthony now has an update in his second discussion thread where he notes that “With the advent of the successor display to the MMTS unit, the LCD display based Nimbus, which has memory for up to 35 days (see spec sheet here), they stopped worrying about daily readings and simply filled them in at the end of the month by stepping through the display.”

        • Arnost
          Posted Aug 2, 2012 at 1:23 AM | Permalink

          As I said earlier, a conversion from an old style MMTS base station to a newer one may introduce another TOB issue – where you go from a morning reading to a midnight one. Is this understood and accounted for? It should introduce a warming bias.

          Steve; this issue is well known to specialists.

  38. Stephen Richards
    Posted Aug 1, 2012 at 9:31 AM | Permalink

    Just a summary.

    I can see why SteveM is a little peeved. If one is going to be involved in a project one should be there from the beginning even if your contribution is small and towards the end.

    Anthony Watts and his ‘team’ made an enormous effort to get to where they are now. They have seen NOAA/NASA take their data and publish a rebuke long before any reasonable conclusions could be made and even before the data was complete. The derisive comments here from the lukewarmers at Lucia’s site and the trolls from RC are, IMHO, missing the ‘trick’ that Anthony and his team are doing.

    They have made public the body of a paper which has not yet been submitted for publication. Whether by design or accident (and only they know), this has allowed them to refine the body of the paper and pick up possible data options which will make the paper more ‘airtight’. If you are not a GW supporter, you need your work to be very much more precise, accurate and without error than if you are a ‘team’ member. As we have seen from the likes of Mosher, owzyafarther etc, non-team-member papers are treated in a much more rigorous manner than otherwise would be the case.

    As a senior project manager I was guilty of using and abusing people like SteveM, because their inputs were guaranteed to aid the success of the project, which was always my primary target. The people I abused in this way were not always happy about it, and that led to a reduction in their valuable work. There is a balance to be struck, I now realise, and I hope that AW and SMc will find a way. This work is too important not to be completed with its best possible outcome.

    For me; I think a public apology to SteveM (I suspect that Steve doesn’t need it) might go a long way.

    Steve: Anthony and I have chatted. There was a misunderstanding due to the rush, the time zone and my grandkids staying overnight. I signed off on Saturday at dinner and we went out with them in the morning. I missed some emails in the evening and on Sunday morning until later. I would have suggested an acknowledgement. But again, in the rush, I missed something that I wouldn’t normally miss and I’m very annoyed at myself.

    Anthony: Steve created an entire section, and in fact referred to it as “my section” in emails. An acknowledgment would have been insufficient in my view.

  39. Posted Aug 1, 2012 at 10:37 AM | Permalink

    I have a comment from Jul 31, 2012 at 6:48 PM awaiting moderation. It begins:
    I thought the comment from Mike B, Sep 28, 2007 6:45 PM was well taken:

    and has a key quote:
    There’s lots of assumin’ goin’ on out there.

    …I interact regularly with government agencies (Census Bureau, Department of Energy, Bureau of Economic Analysis) that wouldn’t dream of something so sloppy.

    But I guess the right people liked the result, so it lives on.

  40. Alexej Buergin
    Posted Aug 1, 2012 at 10:53 AM | Permalink

    As much as I regret that I have to wait longer for the promised report on Esper 2012, I have a feeling that a serious look at TOBS might be rewarding. It might just show that US meteorologists did a good job during the last 30 years (I am talking about the people using the equipment), and that TOBS, as it should be in the age of the computer and automated measuring, is no longer a serious issue.

  41. Ivan
    Posted Aug 1, 2012 at 11:05 AM | Permalink

    This entire debate about TOBS is a red herring, and here is why: if the difference between the raw data and the final product is mainly due to the TOBS adjustments, then you would expect to see similar raw trends for rural and urban stations, and for “compliant” and “non-compliant” stations, wouldn’t you?

    However, that’s not the case at all. The raw trend for all the 1,2 rural stations, airports excluded, is 0.108 °C per decade, three times lower than the reported official trend. And when you take into account just the MMTS rural stations without airports, the trend is 0.032, essentially flat! At the same time, the raw data for the non-compliant 3,4,5 class stations show 0.212 °C; for all stations, urban and rural 1,2 – 0.155; for the class 3,4,5 stations raw, 0.246; and for all stations adjusted, 0.3. How is it that the TOBS adjustments for the good and rural stations are so much higher than for the bad and urban ones? Obviously the bulk of the difference between the good and bad stations has nothing to do with TOBS. And the MMTS rural 1,2 stations, with the trend of 0.032, are the only ones which are relevant for assessing the real climatic warming. So the entire fuss about TOBS is beside the point.

    From the paper:

    “The gridded average of all compliant Class 1&2 stations in the CONUS is only slightly above zero at 0.032°C/decade, while Class 3,4,5 non-compliant stations have a trend value of 0.212°C/decade, a value nearly seven times larger. NOAA adjusted data, for all classes of rural non-airport stations has a value of 0.300°C/decade nearly ten times larger than raw data from the compliant stations.

    These large differences demonstrated between regional and CONUS trends accomplished by removal of airports and choosing the rural subset of stations to remove any potential urbanization effects suggests that rural MMTS stations not situated at airports may have the best representivity of all stations in the USHCNv2.”

    Question: Why, then, does the same paper trumpet the 0.155 trend as relevant, when it includes both airport and urban station data, as well as the measurements made with the older and less reliable equipment?

    • Ivan
      Posted Aug 1, 2012 at 11:11 AM | Permalink

      In other words, the effect of artificially warming up the rural and good stations to match the urban and bad ones is by far greater than the effect of TOBS adjustments.

    • toto
      Posted Aug 1, 2012 at 2:10 PM | Permalink

      then you would expect to see the similar raw trends for rural and urban stations, and for “compliant” and “non compliant” stations, don’t you?

      No, because TOBS biases are much more important for rural stations than for urban stations, and rurality in turn happens to be correlated with station quality. That’s the whole point of the “confound” term used by Steve in the post.

      Steve: Absolutely.
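      The confounding that toto and Steve describe can be illustrated with a toy simulation (all numbers are hypothetical, chosen only to make the mechanism visible, not taken from the paper): if afternoon-to-morning observation changes are concentrated among rural/compliant stations, a naive comparison of raw trends by station class attributes the TOBS cooling bias to station quality.

```python
import random

random.seed(0)

TRUE_TREND = 0.25   # hypothetical true trend, deg C/decade, same for both groups
TOBS_BIAS = -0.15   # hypothetical cooling bias from an afternoon-to-morning switch

def mean_raw_trend(n, p_tobs_switch):
    """Mean observed raw trend for n stations, a fraction of which switched TOB."""
    trends = []
    for _ in range(n):
        bias = TOBS_BIAS if random.random() < p_tobs_switch else 0.0
        trends.append(TRUE_TREND + bias + random.gauss(0, 0.02))
    return sum(trends) / len(trends)

# Suppose the afternoon-to-morning switch is far more common at rural stations:
rural_mean = mean_raw_trend(1000, p_tobs_switch=0.8)
urban_mean = mean_raw_trend(1000, p_tobs_switch=0.2)

# Both groups share the same true trend, yet the raw group means differ,
# which a naive rural-vs-urban comparison would misread as a quality effect.
print(f"rural raw trend: {rural_mean:.3f}")
print(f"urban raw trend: {urban_mean:.3f}")
```

      This is why the TOBS confound has to be disentangled before raw trend differences between station classes can be attributed to siting quality.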

      • Ivan
        Posted Aug 1, 2012 at 5:55 PM | Permalink

        First, the “good” stations could be equally urban and rural, depending upon micro-siting. Quality does not have anything to do with urban vs rural.

        Further, why is TOBS more important for rural stations? You mean the “real” climatic trend is larger than the raw trend at urban, poorly placed stations? And that the more urban and more poorly placed a station is, the less likely it is to experience problems with TOBS?

        Finally, if the rural stations are generally more likely to have TOBS issues than urban ones, are the non-airport rural stations then also more likely to have TOBS issues than the airport rural ones (since the trend at the rural airports is three times higher than at the non-airport rural stations!)? And do the MMTS rural stations have much greater problems than the CRS rural stations, since the MMTS trend is about flat, whereas the CRS trend is 0.108 °C per decade? Man, that would really be a fine-tuned intelligent design, made by God in order to preserve the global warming hype! 🙂

      • James Smyth
        Posted Aug 1, 2012 at 7:50 PM | Permalink

        No, because TOBS biases are much more important for rural stations

        Why? Briefly.

        Steve: Because the incidence of afternoon-to-morning time changes is much greater.

      • Ivan
        Posted Aug 1, 2012 at 7:51 PM | Permalink

        To summarize your and Steve’s argument – the fact that the good and rural stations show almost no warming trend whereas the badly placed and urban ones show huge warming is not a consequence, as one might think, of the latter being affected by, you know, UHI, but, au contraire, of the former not being “properly adjusted”. I knew there must be some logical explanation.

        Steve: please do not presume that I’m overlooking obvious points. Rural good stations whose TOBS changes from afternoon to morning have measurably lower trends than rural good stations with no TOBS change. You can’t ignore this merely because it gives a result that you “like”.

        • Ivan
          Posted Aug 1, 2012 at 9:29 PM | Permalink

          How about rural good stations with MMTS having a substantially lower trend than good rural CRS (if I am not wrong, the MMTSs do not have any TOBS issues, since they automatically record the highest and lowest temperature for a given day)? Or rural airport stations having a 0.240 °C trend, whereas non-airport rural ones have from 0.032 to 0.108, depending upon location and measurement technique? Is that also due to TOBS changes? I am not claiming that the TOBS adjustments are irrelevant, I just don’t see how they could explain those differences.

          Steve: TOBS is a different issue than automatic recording. It affects MMTS as well. I don’t know why you’re arguing in such categorical terms. I didn’t say that TOBS accounts for everything. Only that it is a confounding factor that needs to be disentangled and it wasn’t. The statistical analysis needs to be re-done. It will be re-done.

        • Ivan
          Posted Aug 1, 2012 at 11:56 PM | Permalink

          OK, I support your effort to recalculate. But the reason why I was and am skeptical is the following: we have data of various degrees of validity according to several independent criteria. The best data available have the lowest trend, and, lo and behold, as you go to less and less reliable data, the trend increases. Occam’s Razor seems to favor the elementary explanation that urban and poorly sited stations have an artificial warming bias, especially visible at airports. This relationship in the data is so obvious, so overwhelming, that it seems to me exceedingly unlikely that it could be just a product of some simple non-adjusting error. The TOBS problem would have to “target” only the good and rural stations with such precision, and so many factors would have to coincide, in order to make TOBS a significant factor. The MMTS rural, non-airport data would have to have dramatic TOBS problems, but not the rural MMTS airport stations; the 1,2 compliant urban stations would have to have TOBS issues, but not the 3,4,5 class urban stations; MMTS rural stations would have to have systematically more TOBS problems than CRS; airports in general would have to have many fewer problems than the non-airport stations. And on it goes. What is the basis for these expectations/predictions? Belief that all those factors could operate in harmony borders on religion.

        • Armand MacMurray
          Posted Aug 3, 2012 at 10:15 AM | Permalink

          Ivan, it would be worth your time to look into a sample of actual stations in order to better understand how temps were recorded in real life. There are many sources of non-uniformity that can correlate with station quality or location.

          Here are a few that I’ve seen:
          1) High-quality rural stations are often run by an individual rather than by an institution (e.g. farmer volunteer vs. airport office). If a single individual is responsible for the readings in the former case, TOB may take a big jump as that responsibility passes from person to person; in the latter case, there are often “official” procedures in place specifying TOB.

          2) The MMTS (the “first” version, not NIMBUS) is not necessarily “better” than the manual liquid thermometer/Stevenson screen system. For example, IIRC the MMTS has to be manually reset each day to clear it for monitoring the next day’s min/max temps. In addition, the MMTS outdoor sensor is physically connected to the indoor control box via a cable. The default cable shipped was of limited length, and so greatly limited the siting choices for the MMTS temp sensor vs. the connection-free earlier manual liquid thermometer.

        • Ivan
          Posted Aug 4, 2012 at 2:21 PM | Permalink


          you are just illustrating my point. It is one thing to say that the rural stations may not be as reliable as some people think, because of the general personnel problems you note, but quite another to claim that the data must of necessity exhibit a strong cooling bias! Why would changing persons necessarily alter TOBS so as to increase the trend? Why not decrease it? What is the basis for believing that there exists ANY bias on that account, let alone a warming bias? Various changes could simply cancel each other out.

          Also, your second paragraph only strengthens my case; you are pointing out that there is likely a warming bias in MMTS.

        • Armand MacMurray
          Posted Aug 5, 2012 at 1:48 AM | Permalink

          Ivan, all I’m saying is that before making claims and assumptions about what is likely and what is not, one needs to look at the actual data.

          There are many possible sources of bias, some of which may be issues in practice, and some of which may not. For those that are issues, they are not necessarily random, but may be systematic. One really needs to look at the actual data and metadata in order to get a feel for the specific situation — the data awaits you!

  42. Posted Aug 1, 2012 at 11:40 AM | Permalink

    This is certainly an interesting discussion — is TOBS important? Is the temperature record a time sink? Is the temperature record accurate? If somebody asked me if the temperature record was accurate — I would ask “Which one”?

    Analyzing and clarifying the temperature record may be a time sink. However, if I understand correctly, the paleoclimate records are calibrated against these many temperature records – and not everybody chooses the same record. (Correct me if I am wrong.) So my understanding is that the most meticulous paleoclimate record could be rendered worthless by calibrating against a temperature record of dubious accuracy. If one has a paleoclimate series of dubious accuracy plotted against a temperature time series of dubious accuracy – what exactly was shown by all this effort?

    …and I did not even ask about TOBS yet. I read Anthony’s latest comment (in his second comment thread) on how the data collection was done, and I agree with his general premise that corrections for TOBS might not be as revealing as some imagine. His comments on the data collectors and the methodology match what I know about problems in other areas – ones far less problematic than a partly volunteer data collection network.

    I guess those are the sort of things a simple layman (in climate science) might ask of the experts. I hope this makes sense to others.

    Steve: the temperature data issues are not really relevant to paleoclimate calibration as the uncertainties and issues are not germane. I begrudge the time because the differences in dispute are IMO rather small, but the data sets are large and complicated.

    • Keith AB
      Posted Aug 1, 2012 at 12:44 PM | Permalink

      snip – overeditorializing

      • Posted Aug 1, 2012 at 1:15 PM | Permalink

        Re: Keith AB (Aug 1 12:44), Keith:

        I think that the newest records done with the “memory” stations could be of high value – a personal opinion. I think Anthony makes the point very well that the confusion with TOBS might not invalidate previous records but perhaps degrades their worth to some unquantifiable degree. Much of the confusion appears to come down to human nature: people simply did not record data at the same time every day. There appears to be no way to untangle that portion of the data. Maybe somebody can develop a statistical test that proves when the recorders were visiting their grandkids or grocery shopping. As much as I respect our host, I suspect that developing that test is beyond even him – perhaps even Drs. Hansen and Mann.

        However, Anthony does point out effectively that some of the recent data could be just fine and perhaps should become the gold standard for judging the remaining data. Since the change to the new record keeping thermometers in the 1980s (I think) maybe there is not enough data to perform any reliable calibrations.

        I should clarify one point. Every project that I am currently working on professionally came about because of the Global Temperature Record and the “proof” of AGW. Every project that I am not working on was cancelled because of the same proof in the sense that projects were cancelled because of extra costs in resource exploration and extraction. IOW — It would be difficult to argue that extra costs and at least some of the economic downturn did not occur because of concerns about GHG emissions. Bad economy — no projects. Bad economy for many reasons of course — but there is a contribution in my area from this particular debate. Call this debate a contributing factor of some undefinable proportion and leave it there please. It’s not the point of this discussion.

        So does this discussion affect me? Rather directly I would say — right in the pocketbook — at least in my current projects.

  43. Manfred
    Posted Aug 1, 2012 at 12:06 PM | Permalink

    Can anybody verify this – I think there is an error in the NOAA adjustment procedure.

    “1.A quality control procedure is performed that uses trimmed means and standard deviations in comparison with surrounding stations to identify suspects (> 3.5 standard deviations away from the mean) and outliers (> 5.0 standard deviations). Until recently these suspects and outliers were hand-verified with the original records. However, with the development at the NCDC of more sophisticated QC procedures this has been found to be unnecessary.

    2.Next, the temperature data are adjusted for the time-of-observation bias (Karl, et al. 1986) which occurs when observing times are changed from midnight to some time earlier in the day…”


    The first step should already remove some of the error due to double counting of tmin or tmax measured at critical times such as 7 am or 2 pm. It actually removes those cases with the largest contribution to the TOBS adjustment – double counts with a large difference from the true value. I don’t see any reduction of the TOBS adjustment for errors already removed in step 1.
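    The neighbour-comparison screen in step 1 can be sketched roughly as follows. The 3.5/5.0 standard-deviation thresholds are taken from the quoted NOAA text; everything else (function names, the 10% trim fraction, the sample numbers) is illustrative, not NOAA’s actual implementation:

```python
def trimmed_stats(values, trim_frac=0.1):
    """Trimmed mean and std dev of neighbour readings.
    The 10% trim fraction is an assumption; the exact NOAA choice isn't given."""
    v = sorted(values)
    k = int(len(v) * trim_frac)
    core = v[k:len(v) - k] if k else v
    mean = sum(core) / len(core)
    var = sum((x - mean) ** 2 for x in core) / len(core)
    return mean, var ** 0.5

def classify(reading, neighbour_readings):
    """Flag a reading as 'ok', 'suspect' (>3.5 sd) or 'outlier' (>5.0 sd)
    relative to the trimmed statistics of surrounding stations."""
    mean, sd = trimmed_stats(neighbour_readings)
    z = abs(reading - mean) / sd if sd else 0.0
    if z > 5.0:
        return "outlier"
    if z > 3.5:
        return "suspect"
    return "ok"

# Hypothetical same-day readings (deg C) from surrounding stations:
neighbours = [14.0, 14.5, 13.8, 14.2, 14.1, 13.9, 14.3, 14.0]
print(classify(14.2, neighbours))   # close to the neighbour mean
print(classify(25.0, neighbours))   # far outside: flagged as an outlier
```

    Manfred’s point is that a screen like this catches exactly the double-counted extremes that the subsequent TOBS adjustment is also meant to compensate for, so applying both without coordination risks correcting the same error twice.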

  44. Nicholas Swart
    Posted Aug 1, 2012 at 12:11 PM | Permalink

    I think some are being a bit too hard on Steve and Anthony. We all know the story of Wiles’ proof of Fermat’s Last Theorem.

    • jfk
      Posted Aug 2, 2012 at 11:23 AM | Permalink

      I know that the proof is publicly available and I can check its correctness if I put enough work into it.

  45. Chuck L
    Posted Aug 1, 2012 at 1:25 PM | Permalink

    I see that a comment I posted has been moderated away. In it I said that the paper was the result of the efforts of volunteers without government or university grants or other aid; that you, Steve McIntyre, would do your usual thorough and objective analysis of the TOBS issue; and I wondered whether TOBS would be an issue for all classes of stations, which might not change the qualitative results of bias resulting from poorly placed weather stations.

    If I inadvertently violated blog protocol or offended someone, I certainly am sorry, since that was not my intention. That being said, I am curious why my comment was removed.

    Steve: If people are going to participate in public debate, their work should be held to professional standards. I deleted the comment, because I don’t want to support excuse making.

    • Chuck L
      Posted Aug 2, 2012 at 6:39 AM | Permalink

      Fair enough, I wasn’t trying to make excuses for anybody; rather, I was impressed that a group of volunteers collaborated on this study without the support of the “climate establishment.”

      Thank you for posting this and thank you for taking the time to reply to me.

  46. Posted Aug 1, 2012 at 4:33 PM | Permalink

    Mosher posted 8 questions

    1. Did you double check the ratings or just put the data in an algorithm?
    2. Since the new site ratings seem to depend upon some manual labor done using Google Earth: did you have occasion to do a spot check on the accuracy of those ratings?
    3. Since Wickham made her station list available to you prior to submission, will you make your station list available to others?
    4. Why did you stop at 2008?
    5. What you say about amplification here differs from what you wrote in the paper.
    6. What does a comparison with CRN show?
    7. You use USHCNv2 metadata to classify rural/urban. Did you check that? Do you accept that definition of rural?
    8. How were grid averages computed?

    Reading the rest of thread, I’m at a loss to find cogent answers to any of them. Anyone?

    Steve: better to ask Anthony. As I mentioned in the post, I was not involved in the writing of the paper other than contributing a rushed statistical analysis that unfortunately exacerbated the TOBS problem. Anthony was trying to be polite by adding me as a coauthor, but an acknowledgment would have been appropriate. I was offline on Saturday night and Sunday morning as our grandchildren were visiting and didn’t deal with this issue and it got overtaken by the rush. An unfortunate misunderstanding. However, since I’ve had my fingers burned, I’m now chipping in and trying to ensure that the matter is dealt with correctly. In the meantime, please don’t grind at me as though I was the person who did the work. I can inquire on some of these issues, but that’s all that I can do.

  47. Peter Wilson
    Posted Aug 1, 2012 at 5:52 PM | Permalink

    There have been a number of comments about the immediate nonavailability of the data for this paper. Fair enough.

    At least I think we can be sure Anthony is not going to say “I have 5 years of research tied up in this data, why should I give it to you when all you want to do is find something wrong with it?”

    Nor do I believe any FOI requests will be necessary before the data is released.

  48. Curt
    Posted Aug 1, 2012 at 7:38 PM | Permalink

    The TOBS adjustment is based on the idea that for a once-a-day observation of the minimum and maximum temperatures of the previous 24-hour period, the closer the observation time is to the typical time of minimum or maximum temperature in the diurnal cycle, the more likely it is that there will be a duplicate or near-duplicate reading for two days. Averaged over many days, this could artificially bias the minimum or maximum to be more extreme, if not corrected for.

    As the trend in the US rural stations, which at least until very recently employed these min/max thermometers, has been from early-evening observation (5pm or 7pm in most of the sources I’ve found) to early-morning observation (usually 7am), this has been presumed to put an artificial cooling bias into the temperature record, so a net positive correction, increasing as more stations have been converted, has been added to the raw data.

    Because the more recent 7am readings have been very close to the typical sunrise minimum daily temperature, the problem of “duplicate” minimum daily temperatures should be particularly acute. However, I have seen anecdotal reports that many of the volunteer COOP weather observers have long been aware that this would yield bad daily readings on many days, and so would reset the minimum reading in the early afternoon so that the next morning’s reading would always show the low for that morning and not data from the previous morning.

    Now, I realize that the plural of anecdote is not data… but it seems to me that it would be quite straightforward to evaluate properly whether this is the case for individual stations. For a station, when the time of observation is shifted from early evening to early morning, the number of “duplicate” minimum readings in the raw data should increase greatly if there is no other resetting. Probably the best measure would be to compare the lag-1 autocorrelation in the minimum readings before and after the change. If there is no significant increase in this metric, the adjustment may well be unwarranted.
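    The check Curt proposes can be sketched as a toy calculation (the daily series, the 0.3-degree “duplicate” tolerance, and the function names are all illustrative assumptions; real station data would be substituted): compare the duplicate fraction and lag-1 autocorrelation of daily minima before and after a documented observation-time change.

```python
def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a daily series."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def duplicate_fraction(xs, tol=0.3):
    """Fraction of consecutive daily minima within `tol` degrees of each other
    (tol is an illustrative threshold, not an official definition)."""
    pairs = list(zip(xs, xs[1:]))
    return sum(abs(a - b) <= tol for a, b in pairs) / len(pairs)

# Hypothetical daily-minimum series around a 5pm -> 7am observation switch;
# the "after" series carries each morning's minimum into the next day's reading:
before = [8.1, 9.4, 7.2, 10.0, 8.8, 7.5, 9.1, 8.3, 10.2, 7.9]
after  = [8.1, 8.1, 9.4, 9.4, 7.2, 7.2, 10.0, 10.0, 8.8, 8.8]

print("dup fraction before/after:", duplicate_fraction(before), duplicate_fraction(after))
print("lag-1 autocorr before/after:", lag1_autocorr(before), lag1_autocorr(after))
# a jump in either metric after the switch suggests genuine double-counting;
# no jump would support the anecdote that observers reset the instrument midday
```

    Run per station against the documented change date in the metadata, this would separate stations where the TOBS adjustment is warranted from those where observers were already resetting the instrument.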

  49. A. Scott
    Posted Aug 1, 2012 at 8:46 PM | Permalink

    This looks like a valuable resource for aerial, topo and other info of the sort

  50. pjie2
    Posted Aug 2, 2012 at 4:03 AM | Permalink

    Mike B at Tamino’s blog has pointed out that substantial chunks of the current draft (Sections 2.1 and 2.2 in particular) are word-for-word identical with Fall et al 2011. These will obviously need revising before publication to avoid accusations of self-plagiarism. Just a heads-up for Steve M, assuming he wishes to stay a co-author on the final paper.

  51. Posted Aug 2, 2012 at 7:19 AM | Permalink

    I think there are some more issues which need looking at beside TOBs.

    (1) Fluctuations are generally likely to be greater, the colder the average temperature one is dealing with. So, tropics: tiny fluctuation. Temperate zones: moderate fluctuation. Arctic zones: huge fluctuation of temperature. Thus during the warming-earth period we’ve seen during 1979-2008, there is a good a priori reason that min temperatures should be seen to rise more than max temperatures. Likewise, during a cooling-earth time I would expect the opposite.

    (2) So what should be paid very close attention to, is records where the overall max trend appears to be zero. If in such cases, min trend shows at all, these should be regarded with suspicion as likely to be an artefact of land use changes. There may be statistically rather few of these. But we have to make use of what we have, and if something shows up that looks likely to be significant, we have to find another way to winkle it out of the other effects on all records.

    (3) The correlation between station dropout and temperature rise, that Ross McKitrick showed, should be investigated. As noted above, Arctic temperature fluctuations are far greater than temperate zone fluctuations – and this may play into Ross’ results.

    (4) IIRC, Roy Spencer and Andrei Ilarionov showed that the UHI effect on trends is far more marked on very rural areas becoming slightly less rural, than on urban areas becoming even more urban. A significant effect, that might escape detection by even Leroy 2010?

    • Posted Aug 2, 2012 at 7:46 AM | Permalink

      Polar climates have little daily variation because of the long days/nights

      BEST disposes of point 3. The dropout is not nearly as strong in their larger data set.

      Spencer had a really bad way of handling density, basically too coarse grained.

      Temperature drops in deserts can be huge because the humidity is low.

  52. Scott Brim
    Posted Aug 2, 2012 at 10:06 AM | Permalink

    In the nuclear industry, the analysis product itself, and the process used to produce that analysis product, are viewed together as being one unified thing. Likewise, the raw data and the interpreted data, and the analysis methods used to analyze that raw and interpreted data, thus producing information, are also viewed together as being one unified thing.

    Within the nuclear industry, in viewing the fitness of an analysis product for its intended purpose, any deficiencies found in the raw data, in the analysis methods, in the interpreted data, or in the overall process used to produce that analysis product mean that the analysis product itself is deficient. If so, the analysis product is sent back for repairs and rework, including repairs to the process methods, as necessary.

    This is why, for example, it took twenty years and cost fifteen billion dollars to study and analyze the Yucca Mountain site for use as the nation’s high level nuclear waste repository. Moreover, a good portion of that fifteen billion dollars was spent documenting all facets of both the process methods and the analysis product so that it was all totally accessible and totally transparent.

    The validity of the surface temperature record of the United States, and the suitability of its employment in climate science, is arguably a more important public issue to be properly addressed than is the suitability of the Yucca Mountain site for storing the nation’s nuclear waste.

    And yet no one proposes spending fifteen billion dollars to properly study the validity of the surface temperature record of the United States. The job of looking at important problems in the surface temperature record, problems which are not yet properly addressed by the government agencies responsible for that information, is instead being left to ad-hoc associations of volunteers acting in the public interest.

    Ad-hoc or not, the volunteers who have done this important work must be held accountable for the quality of their product. If the product is deficient in terms of process and/or content, it has to be sent back for repair and rework.

    In today’s wired world, it has never been more true that an ounce of prevention is worth a pound of cure. It is plainly obvious that a disciplined internal peer review of the Watts et al paper prior to its public release would have prevented much of the controversy that is now developing about it. Haste makes waste, in other words.

  53. Konrad
    Posted Aug 2, 2012 at 10:56 PM | Permalink

    While I can understand Steve McIntyre’s irritation with revisiting the statistics due to the TOB issue, I feel many in the climate blogosphere are investing undue hope that this may invalidate Anthony Watts’ paper. Perhaps the amount of time people have spent trying to tease a climate signal from low-resolution meteorological data, or the large body of work resting on this dubious foundation, has confused some into thinking that Anthony Watts needs to calculate a climate signal using alternate methods. He does not actually have to do this for this paper to have an impact.

    Some may recall the shrill demands that Steve McIntyre produce his own proxy temperature reconstruction when he brought the infamous hockey stick into question. All that was actually necessary was to demonstrate that there were one or two issues with short-centring proxy data before principal component analysis: for instance, that short-centring red noise prior to PCA also produces hockey sticks.

    All that Anthony Watts need demonstrate is
    1. The amount of metadata required to turn a low resolution meteorological record into a climate record.
    2. That this metadata is unavailable, has not been used or has not been used correctly in the surface temperature products on which much of climate science has been based.

  54. Posted Aug 6, 2012 at 1:32 PM | Permalink

    I’m laughing at a line in a post by the blathering professor “Eli Rabett”, who is disdainful of many in the climate arena, including Anthony Watts but sometimes alarmists too.

    In criticizing Richard Muller’s claimed flip from skeptic to believer in CAGW and other behaviour, Rabett says “This ain’t saying that the BEST project was useless, they have developed some interesting methods, and pushed the surface temperature instrumental record back somewhat. It wasn’t that others were unaware of such records, but the level of trust was, let us say, about where Michael Mann stands in Steve McIntyre’s mind.”

  55. Jim2
    Posted Aug 7, 2012 at 6:03 AM | Permalink

    Watts did make a mistake, but Muller doesn’t get a free pass. He and his team promised to be transparent. If you go to the BEST web site and download the code, the associated readme file says the code isn’t cleaned up and may not work. Obviously in the latest paper, according to peer review, not all the methods used were elucidated. So let’s not beat up only Watts here. Muller hasn’t lived up to his promises, either.

  56. Ivan
    Posted Aug 8, 2012 at 11:25 AM | Permalink

    One more strong indication that this entire TOBS panic is probably a waste of time (or should I say – a deliberate diversion).

    Roy Spencer developed his own surface temperature index for the US 48, using only the stations which have a homogeneous method of taking data over time (four measurements per day) and which are hence completely free of any TOBS biases. In addition, he corrects the data for urban heat bias by the so-called population density adjustment. The linear trend thus obtained for 1973-2012 is 0.145 °C per decade.

    Now, Spencer does not correct for the two things that Watts does: the micro-siting issues and the airports. If he had done that, the overall trend would likely have been much lower. The 1,2 “compliant” stations typically have 2 times less warming than the 3,4,5 non-compliant stations. Assuming a random selection of good and bad stations in Spencer’s index, we should expect the trend to fall substantially when corrected for the micro-siting warming bias (because the 3,4,5 stations are much more numerous than the 1,2 stations).

    So, in all likelihood, the real climatic trend at TOBS bias-free stations is almost certainly lower than 0.150 °C (the reported trend in Anthony’s paper) and likely much lower than that. If anything, Anthony has significantly EXAGGERATED the real climatic trend.

    • Posted Aug 10, 2012 at 4:50 AM | Permalink

Seriously, what do people mean by gobbledegook such as “..have 2 times less warming than..”?

      Do you mean less than 1/2 the warming?

      I see this gibberish everywhere nowadays and I have no idea what any of these writers mean.

  57. Posted Aug 9, 2012 at 12:23 PM | Permalink

People are getting over-excited.

I read Watts as saying he would release raw data; I don’t remember when (e.g. soon, or when the paper was submitted for formal publication). I have the impression they were going to post a few more things once they recovered from the main effort.
But A. Scott has pointed out herein that the root raw data is already publicly available, that Watts listed sources, and that Watts has explained his method, including reference to published station evaluation criteria. Ideally, the earlier in the data chain that verifiers/critics start, the better. But hopefully solid data is generated to build further on – Watts is in effect claiming (unlike Muller) that the data is not solid, specifically that there are serious errors/omissions.

So the effect of time of observation on the data needs more work – did others like Muller even mention it? (Though “Ivan” downplays the significance of TOBS, others disagree; it seems a secondary factor.)

Hastiness has been acknowledged, unlike others who can’t acknowledge fundamental errors in their work. The lesson may be to avoid chasing the schedule of sleazy types like Muller, who released a paper so flawed that his own team and rabid alarmists like Michael Mann have panned it. (Though Mann may be paying Muller back for having pointed in 2004 to McIntyre & McKitrick breaking his hockey stick.)

    Seems to be good technical discussion amongst the blather here.
As for knowing only the current state of each station, that is an important step. It facilitates substantially answering the question “Is this station’s data accurate today?” If the answer is no, the station’s data should be removed from the database until the question “How quickly did the station get to that inaccurate state?” can be answered. (Presumably there are very few cases where the environment has improved, such as a walkway or road torn up after being re-routed.)
At least some of the answer can be obtained by a huge amount of slogging through construction records, news reports (campus newspapers, newsletters of host/data-collection organizations, etc.), and old photographs. This is perhaps somewhat amenable to automation (modern search and image-recognition techniques if the data is in computer format; otherwise there is the additional cost of photographing/scanning). Cash to manage the research can be mailed to me at …. 😉

More seriously, personally I am skeptical that much of the fussing over instrumental temperature analysis is worthwhile toward the goal of predicting change and thus facilitating preparation. I suggest it is even less worthwhile in the blame-humans debate, as satellite temperatures, estimation of temperatures in past centuries and millennia, calculation of the human contribution to the CO2 increase, the physics of CO2’s effect on heat flow in the atmosphere, and the inaccuracy of theories (“models”) are far more important. So I’ll leave slogging through the history of surface stations to those criticizing Watts’ work. 🙂

  58. Posted Aug 10, 2012 at 2:55 PM | Permalink

    Regarding people thinking their messages are missing, note that WordPress is not always displaying messages in date order.

    In this thread I see a batch of messages Posted Jul 31, 2012 at 3:58 PM through Posted Aug 6, 2012 at 10:52 AM displayed after my Aug 9, 2012 message.

    It isn’t a sub-thread problem (i.e. a Reply will be displayed above later messages that aren’t a reply).

  59. Posted Aug 10, 2012 at 2:56 PM | Permalink

    Re “Curt Posted Aug 1, 2012 at 7:38 PM”
    Please elaborate on how an observation close to the time of minimum and maximum can cause a bias.

I do expect that an observation from an automatic min-max logging device close to the time of minimum or maximum could be in error, as the min or max may not yet have been reached.
The common cause of that would be changing weather, a chinook being an extreme example (temperature can rise or fall at a rate of several degrees per hour).
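A toy simulation can show the classic carry-over mechanism behind the afternoon-reset bias (all numbers here – the 8 C diurnal amplitude, 4 C day-to-day spread, and 17:00 reset time – are made up for illustration, not taken from any station record): when a max thermometer is read and reset shortly after the daily maximum, a hot day’s late-afternoon tail can exceed the next, cooler day’s maximum and effectively get recorded twice.

```python
import math
import random

random.seed(0)

HOURS = 24
N_DAYS = 365

def hourly_temps(day_mean):
    # Idealized diurnal cycle: minimum near 03:00, maximum near 15:00.
    return [day_mean + 8.0 * math.sin(2 * math.pi * (h - 9) / 24)
            for h in range(HOURS)]

# Independent day-to-day variation in the daily mean.
day_means = [15.0 + random.gauss(0, 4) for _ in range(N_DAYS)]
series = [t for m in day_means for t in hourly_temps(m)]

def recorded_maxima(obs_hour):
    """Max thermometer read and reset once a day at obs_hour.
    Each recorded value is the maximum since the previous reset."""
    maxima = []
    prev = obs_hour  # sample index of the first reset
    for d in range(1, N_DAYS):
        end = d * HOURS + obs_hour
        maxima.append(max(series[prev:end]))
        prev = end
    return maxima

midnight = recorded_maxima(0)    # reset at 00:00: calendar-day maxima
afternoon = recorded_maxima(17)  # reset at 17:00, just after the daily max

bias = sum(afternoon) / len(afternoon) - sum(midnight) / len(midnight)
print(f"warm bias from afternoon reset: {bias:+.2f} C")
```

With an afternoon reset, whenever yesterday was markedly warmer than today, yesterday’s 17:00–23:00 temperatures exceed today’s own maximum and are booked against today, so the average of recorded maxima comes out warm relative to a midnight reset.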

I am wary of minimums and maximums; I don’t see how they represent climate for the debate over causes of change. (They are of value to activities sensitive to temperature, such as damage to plants or equipment from freezing or overheating. But intuitively, to me an integration is better, because the main debate is over heat being trapped.)
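The point that (Tmin+Tmax)/2 is not the same thing as an integrated (time-averaged) temperature is easy to demonstrate with a made-up asymmetric diurnal cycle (the shape and numbers below are purely illustrative):

```python
import math

HOURS = 24

# Asymmetric diurnal cycle: a sharp afternoon peak on a flat cool base,
# so (Tmin + Tmax)/2 need not equal the time-integrated daily mean.
temps = [10.0 + 12.0 * max(0.0, math.sin(2 * math.pi * (h - 8) / 24)) ** 3
         for h in range(HOURS)]

minmax_mean = (min(temps) + max(temps)) / 2   # what a min-max record gives
integrated_mean = sum(temps) / len(temps)     # hourly-sampled integration

print(f"(Tmin+Tmax)/2 = {minmax_mean:.2f} C")
print(f"24-hour mean  = {integrated_mean:.2f} C")
print(f"difference    = {minmax_mean - integrated_mean:+.2f} C")
```

Here the two summaries disagree by a few degrees; a min-max record weights the brief peak as heavily as the long cool base, whereas an integration does not.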

(As for quick temperature change, I’m remembering the story from Transair, an airline that served Whitehorse YT out of Winnipeg MB. One day they had to hurry to get everyone on board and take off, because the temperature was dropping toward the 737 Classic’s authorized ground minimum of -55F, IIRC. That area of the Yukon is often the coldest in Canada, though IIRC even -55F was not common there.)

  60. nono
    Posted Sep 3, 2012 at 8:06 PM | Permalink

    Jul 31, 2012 at 9:04 AM:
    “I’ll have carry out the TOBS analysis, which I’ll do in the next few days”


11 Trackbacks

  1. […] – indicate a warming over the U.S. closer to NOAA’s estimate. This point was raised by ClimateAudit blogger Steven McIntyre: “Over the continental US, the UAH satellite record shows a trend of 0.29 deg C/decade (TLT) from […]

  2. […] to correct for the time of observation bias (TOB). This is an important issue, which is why McIntre [sic] is having some doubts about the […]

  3. […] surely co-author and obsessive skeptic Steve McIntyre will back Anthony unequivocally? Nope: Anthony sent me his draft paper. In his cover email, he said that the people who had offered to do […]

  4. By Watts' New Paper - Analysis and Critique on Aug 2, 2012 at 10:10 AM

    […] 24 different models).  Note that McIntyre is a co-author of Watts et al., but has only helped with the statistical analysis and did not comment on the whole paper before Watts made it public.  We suggest that he […]

  5. […] coffin of AGW. So as not to be confused with “added authors”, Steve McIntyre distances himself in a hurry (h/t Riccardo) and Roger Pielke Sr with more difficulty, given the initial momentum (h/t […]

  6. […] However, if he is so distinguished, why does he feel it necessary to rely upon Watts et al (2012), which the esteemed Professor apparently co-authored? Whatever the extent of Christy’s actual involvement, this unpublished paper is now receiving significant constructive (but very damaging) criticism; and being disavowed by one of the other high-profile co-authors – Steve McIntyre. […]

  8. By Copritevi « Oggi Scienza on Aug 21, 2012 at 2:57 AM

    […] McIntyre, a retired consultant to mining companies, who has however made clear that he only lent a hand with the […]

  9. By Copritevi | Svoogle News on Aug 21, 2012 at 9:31 PM

    […] McIntyre, a retired consultant to mining companies, who has however made clear that he only lent a hand with the […]

  10. By Copritevi. « Raggioindaco blog. on Aug 22, 2012 at 6:25 AM

    […] McIntyre, a retired consultant to mining companies, who has however made clear that he only lent a hand with the […]

  11. […] […]
