Met Office Archives Data and Code

The UK Met Office has released a large tranche of station data, together with code.

Only last summer, the Met Office had turned down my FOI request for station data, saying that the provision of station data to me would threaten the course of UK international relations. Apparently, these excuses have somehow ceased to apply.

Last summer the Met Office stated:

The Met Office received the data information from Professor Jones at the University of East Anglia on the strict understanding by the data providers that this station data must not be publicly released. If any of this information were released, scientists could be reluctant to share information and participate in scientific projects with the public sector organisations based in the UK in future. It would also damage the trust that scientists have in those scientists who happen to be employed in the public sector and could show the Met Office ignored the confidentiality in which the data information was provided.
However, the effective conduct of international relations depends upon maintaining trust and confidence between states and international organisations. This relationship of trust allows for the free and frank exchange of information on the understanding that it will be treated in confidence. If the United Kingdom does not respect such confidences, its ability to protect and promote United Kingdom interests through international relations may be hampered…

The Met Office are not party to information which would allow us to determine which countries and stations data can or cannot be released as records were not kept, or given to the Met Office, therefore we cannot release data where we have no authority to do so…

Some of the information was provided to Professor Jones on the strict understanding by the data providers that this station data must not be publicly released and it cannot be determined which countries or stations data were given in confidence as records were not kept. The Met Office received the data from Professor Jones on the proviso that it would not be released to any other source and to release it without authority would seriously affect the relationship between the United Kingdom and other Countries and Institutions.

The Met Office announced the release of “station records [that] were produced by the Climatic Research Unit, University of East Anglia, in collaboration with the Met Office Hadley Centre.”

The station data zipfile here is described as a “subset of the full HadCRUT3 record of global temperatures” consisting of:

a network of individual land stations that has been designated by the World Meteorological Organization for use in climate monitoring. The data show monthly average temperature values for over 1,500 land stations…
The stations that we have released are those in the CRUTEM3 database that are also either in the WMO Regional Basic Climatological Network (RBCN) and so freely available without restrictions on re-use; or those for which we have received permission from the national met. service which owns the underlying station data.

I haven’t parsed the data set yet to see which countries and/or stations are excluded from the subset.

The release was previously reported by Bishop Hill and John Graham-Cumming, who’s already done a preliminary run of the source code made available at the new webpage.

We’ve reported on a previous incident where the Met Office had made untrue statements in order to thwart an FOI request. Is this change of heart an admission of error in their FOI refusal last summer, or has there been a relevant change in their legal situation (as distinct from bad publicity)?

168 Comments

  1. Stuart
    Posted Dec 22, 2009 at 11:40 PM | Permalink

    Why should climate data ever be a secret?

    • Architeuthis
      Posted Dec 25, 2009 at 9:30 PM | Permalink

      Why indeed? And why do not more people ask this basic question when trillions of dollars are demanded based on that data?

  2. Murray Pezim
    Posted Dec 22, 2009 at 11:50 PM | Permalink

    That’s great; they obviously have nothing nefarious and spooky to hide. Now you can have at her… :thumbsup:

    • Raven
      Posted Dec 22, 2009 at 11:59 PM | Permalink

      The real question is why did it take the embarrassment of a leak/hack to force the MET to release this stuff?

      The thing that the climate science community seems to forget is they are demanding an extraordinary amount of trust from ordinary people who will be affected by the policies put in place based on their data/science. They should not be surprised to find out that people think they are hiding something if they act like they are hiding something.

      Personally, I won’t start trusting the climate science community until I hear a lot of public apologies for the unprofessional behaviour that they have either engaged in or silently tolerated over the last 10 years or so. Unfortunately, it looks like I will be waiting a long time given the response to date.

      • Geoff Sherrington
        Posted Dec 23, 2009 at 12:04 AM | Permalink

        Raven, You need not jump to conclusions that Climategate prompted this. There has been other work going on quietly before that. Maybe it had some effect.

      • Murray Pezim
        Posted Dec 23, 2009 at 10:03 AM | Permalink

        I’ve found that people always tend to be protective over data; the main reason for this is that it can be misinterpreted and misrepresented by the McExperts. Joe SixPack, on the other hand, can’t tell a McExpert from a bona-fide Expert, hence, he gets misled. Self-interest groups exploit this weakness/naivety in Joe SixPack to drum-up support. It’s that simple…

        • JBnID
          Posted Dec 24, 2009 at 10:43 AM | Permalink

          It is the elite’s belief that ‘Joe Six-Pack’ can’t see through BS that confuses the entire issue.
          Data is data, and unless it’s gathered with PRIVATE money it is, by definition, public.
          Public money buys public information, period. When a scientist makes a claim but holds the basis of that claim ‘secret’ he should be laughed at, ridiculed, and made to go sit in the corner. In the current situation, that corner should have bars around it.

        • Architeuthis
          Posted Dec 25, 2009 at 9:36 PM | Permalink

          Very true. This is also a reason why Joe Sixpack should not be allowed to vote or have a say in how governments operate.

          Personally, if you tell him the next decade will be the hottest ever and it is then a cold decade, you should not be too shocked if Joe calls BS. If, on the other hand, Joe just accepts the explanation that it was actually a hot decade even after freezing his butt off, then I agree, Joe should not be allowed to vote or hold opinions.

  3. Ecochemist
    Posted Dec 22, 2009 at 11:53 PM | Permalink

    Is this the raw data or the “value added” data?

    • boballab
      Posted Dec 22, 2009 at 11:54 PM | Permalink

      It looks like it’s the “value added”.

      • Steven Hales
        Posted Dec 23, 2009 at 12:22 AM | Permalink

        “Now with lemon”

        • PeterS
          Posted Dec 23, 2009 at 12:55 AM | Permalink

          If this is not the raw data then what’s the point? They might as well be the data for Mars. We must see the raw data.

        • Scott Gibson
          Posted Dec 23, 2009 at 1:38 AM | Permalink

          We can at least compare the value added data with original data. That way, we can see if the adjustments are reasonable. Also, they can no longer respond to criticism by saying that we’re using the wrong data.

        • Syl
          Posted Dec 23, 2009 at 9:28 AM | Permalink

          Now with 10% more

  4. boballab
    Posted Dec 22, 2009 at 11:53 PM | Permalink

    For those that haven’t checked yet, John Graham-Cumming thinks he’s found one bug already, and he thinks this is not the actual source code, but code that was recently ginned up.

  5. AnonyMoose
    Posted Dec 22, 2009 at 11:58 PM | Permalink

    Only 3.4 MB.

  6. Posted Dec 22, 2009 at 11:58 PM | Permalink

    Steve,

    I suspect you’ll dislike this, but congratulations on the beginnings of a battle won. Some unknown people also deserve our thanks, but without your efforts none of this happens.

    The release of even a subset of the data is a huge change in openness and understanding of climate science. My suspicion is that in a few years, people will have much better agreement about results, and those in power will wonder why they followed leaders who forced an opinion rather than let it develop naturally.

    • Follow the Money
      Posted Dec 23, 2009 at 3:42 PM | Permalink

      “The release of even a subset of the data is a huge change in openness”

      Or the opposite. A subset? Look at the “Map of Station Locations”

      The ones in the subset have peculiarities: Why every station in New Zealand? Why not even one in Taiwan? Why a Western US bias over the Eastern US? Why so many in the W. European heartland? Why the few used in the Volga River region?

  7. bender
    Posted Dec 23, 2009 at 12:05 AM | Permalink

    Where do we place our bets on faulty homogenization and UHI correction?

    • boballab
      Posted Dec 23, 2009 at 12:15 AM | Permalink

      Can’t place bets yet; there is still no raw data. This is the corrected data from CRU that is used to make up CRUTEM and is combined with the HadSST series to make HadCRUT. The source code looks to be specifically made just for this release. So we still do not get to see inside the black box that is CRU.
      http://bishophill.squarespace.com/blog/2009/12/22/met-office-code.html

      • bender
        Posted Dec 23, 2009 at 11:11 AM | Permalink

        cancel my enthusiasm

        Steve: Didn’t you mean: Curb My Enthusiasm?

  8. Calvin Ball
    Posted Dec 23, 2009 at 12:07 AM | Permalink

    Is this change of heart an admission of error in their FOI refusal last summer or has there been a relevant change in their legal situation (as distinct from bad publicity)?

    I suspect it had more to do with sludge and a centrifugal compressor, and people higher up realizing that there was only one way to turn the bloody thing off.

  9. Josh Keeler
    Posted Dec 23, 2009 at 12:10 AM | Permalink

    If this is merely a subset of the available data, how reliable can any analysis made of it be in evaluating results from the full set of data? Isn’t this a bit like trying to build a house with half the foundation missing?

    Also, what criteria did they use in designating these sites over any other potential sites?

    I’m sure the answers aren’t available yet, but it seems to me that these are important questions to ask.

    • Josh Keeler
      Posted Dec 23, 2009 at 12:21 AM | Permalink

      should have read the answers from the Met office before asking that second question:

      “The choice of network is designated by the World Meteorological Organisation for monitoring global, hemispheric and regional climate.

      To compile the list of stations we have released, we have taken the WMO Regional Basic Climatological Network (RBCN) and Global Climate Observing System (GCOS) Surface Network stations, cross-matched it and released the unambiguous matches.”

  10. Geoff Sherrington
    Posted Dec 23, 2009 at 12:11 AM | Permalink

    Would it not be prudent to quality-control the station data before getting too excited over gridded data?

    Gridding/interpolation/weighting is a subject in itself, one which has not received much attention, comparatively, because of past inability to access raw data.

    It just happens that there are many people who know their work in gridding, from outside the climate community.

  11. Harold
    Posted Dec 23, 2009 at 12:16 AM | Permalink

    I still think all this confidentiality is spurious. I had to deal with “confidential” information all the time. The easiest thing to do was to tell people: if the information is confidential, don’t give it to me, because I have to be able to disclose it. It was rare that the company involved didn’t decide the information wasn’t confidential after all.

    All the journals have to do is tell authors that papers relying on confidential data, code, etc, will not be published. Don’t publish if they don’t give up the information.

  12. Jud
    Posted Dec 23, 2009 at 12:23 AM | Permalink

    Let’s not get too excited.
    From the FAQ…

    “Questions and answers about the data sets

    Please select a question to open or close the answer.

    1. Is the data that you are providing the “value-added” or the “underlying” data?

    The data that we are providing is the database used to produce the global temperature series. Some of these data are the original underlying observations and some are observations adjusted to account for non climatic influences, for example changes in observations methods or site location.

    The database consists of the “value added” product that has been quality controlled and adjusted to account for identified non-climatic influences. It is the station subset of this value-added product that we have released. Adjustments were only applied to a subset of the stations so in many cases the data provided are the underlying data minus any obviously erroneous values removed by quality control. The Met Office do not hold information as to adjustments that were applied and so cannot advise as to which stations are underlying data only and which contain adjustments.”

    • ZT
      Posted Dec 23, 2009 at 12:51 AM | Permalink

      That FAQ is an interestingly repetitive disclosure of tedious things you already know, until you find:

      The data may have been adjusted to take account of non climatic influences, for example changes in observations methods [yup – you said all that before], and in some cases this adjustment may not have been recorded so it may not be possible to recreate the original data as recorded by the observer [so – you haven’t got the original data].

      You pretty soon start wondering just what they have been doing with the taxpayer money if they can’t be bothered to keep the original, presumably voluminous (perhaps 10 MB?), set of measurements. Amazing how the ‘lost in an office move’ excuse worked so well with politicians. ‘Really? Do you know, the same thing happened to my expense reports.’ I bet they keep cats in the supercomputer – either that or they are using it for Perl development.

      • Josh Keeler
        Posted Dec 23, 2009 at 9:59 AM | Permalink

        I believe it was the CRU that claimed to have lost data in their office move, not the UK MET. But maybe they had the same movers.

      • PaulM
        Posted Dec 23, 2009 at 12:33 PM | Permalink

        The most amazing/pathetic thing in the FAQ is the response to Q3, where they claim that it wasn’t possible to store data in the 1980s, and that best practice in the 1980s involved deleting your data:

        “The data set of temperatures, which are provided as a gridded product back to 1850 was largely compiled in the 1980s when it was technically difficult and expensive to keep multiple copies of the database.

        For IT infrastructure of the time this was an exceedingly large database and multiple copies could not be kept at a reasonable cost. There is no question that anything untoward or unacceptable in terms of best practices at the time occurred.”

        • Posted Dec 23, 2009 at 12:59 PM | Permalink

          It’s funny since magnetic tape storage has existed for more than 50 years now.

        • MrPete
          Posted Dec 23, 2009 at 4:24 PM | Permalink

          I was a data (demographics) professional in the 1980’s, and ran a university student computing system before that.

          Their excuse is specious. In the 1970’s (!) we had multi-hundred megabyte drives. Lots of them (on our DECsystem 20 minicomputer.) In the early 1980’s, I had a 20MB hard drive at home. Not cheap but very usable.

          By the late 1980’s, everything was smaller. We fit the entire archive of a major demographic data vendor on thirty 10 MB cartridges, on a single two-foot bookshelf.

          And I haven’t even mentioned tape, which as others have noted remained the bulk-data archive medium of choice for a very long time.

          Losing or destroying the primary data has never been a best practice. Ever.

      • tty
        Posted Dec 23, 2009 at 12:49 PM | Permalink

        I strongly agree. It is true that mainframe RAM memory was expensive back then, but techniques for offline storage were well developed, reliable and fairly cheap.
        I worked with a large mainframe system which had problems with online storage of old historical data back then. We solved it by only keeping the pointers that indicated that the data existed online while storing the actual data on tape. This of course meant that the oldest data was not available in real time, since the tape had to be mounted to access it, but it worked. Later when memory became cheaper we restored the data online. But of course we were only dealing with reliability data for aircraft components, not the fate of the planet.

        • Syl
          Posted Dec 23, 2009 at 1:35 PM | Permalink

          RAM was never used for storage. Magnetic tape drives were used and it was relatively cheap even back then. We are not talking about THAT much data here. Even today, archiving is done with magnetic drives in many instances.

          Don’t be surprised if they magically find the data after much pressure.

          If they really did lose it, effort should be made to get the RAW data back from many countries. We need to get an idea on the “value” that was added.

        • Posted Dec 24, 2009 at 11:27 AM | Permalink

          To echo what everyone else is saying: back in the 70s and 80s, people were creating large computer programs and using proper software maintenance systems to version and archive code. For somebody to suggest that older versions should be deleted would have been absolutely unknown. The older versions were essential for maintaining customers in the field. The cost of storage would simply not have been a factor for this. Magnetic tape was ubiquitous and cheap, and if an older version were required the computer operator was notified to mount a tape. This terminology still persists even though the reason behind it faded away about 25 years ago.

    • bender
      Posted Dec 23, 2009 at 9:33 AM | Permalink

      The Met Office do not hold information as to adjustments that were applied and so cannot advise as to which stations are underlying data only and which contain adjustments.

      Well, then it’s as good as crap, isn’t it?

      • Gary
        Posted Dec 23, 2009 at 9:59 AM | Permalink

        Well, a subset of it is. What’s needed is an audit comparing it to whatever raw data can be recovered from original sources. The Met Office will need to get busy…

  13. Josh Keeler
    Posted Dec 23, 2009 at 12:42 AM | Permalink

    “Underlying data are held by the National Meterological Services and other data providers and such data have in many cases been released for research purposes under specific licences that govern their usage and distribution.

    It is important to distinguish between the data released by the NMSs and the truly raw data. e.g. the temperature readings noted by the observer. The data may have been adjusted to take account of non climatic influences, for example changes in observations methods, and in some cases this adjustment may not have been recorded so it may not be possible to recreate the original data as recorded by the observer.”

    If some data is raw and some is adjusted, and we still don’t know which is which, that leaves a lot of wiggle room.

    Looking at the uncertainty band in their own graphic, http://www.metoffice.gov.uk/climatechange/science/monitoring/data-graphic.GIF why is the uncertainty still close to 0.5°C currently, but less in the 70’s? Are they really less certain of the accuracy of the data now than then, and if so why? Or am I misinterpreting what the uncertainty band represents?

    • ZT
      Posted Dec 23, 2009 at 12:54 AM | Permalink

      Probably a reduction in the number of stations – or a slightly unsubtle attempt to hide the decline – or perhaps both. They spent the money that would have gone on stations on a supercomputer – to eliminate the need for measurement entirely.

  14. PeterS
    Posted Dec 23, 2009 at 1:00 AM | Permalink

    Obviously we need to analyse the data to make some kind of judgement but I can already see a lot of skepticism, which is really reasonable given what we already know. So, how does one know if the data is real? Is there a way to verify it by spot checks and cross checking with actual raw data collected elsewhere by other institutions?

  15. Barry R.
    Posted Dec 23, 2009 at 1:11 AM | Permalink

    Slightly off topic, but just eyeballing the graph of the data I get the impression that temperatures were trending up in the 1850 to early 1880s time frame, then dropped abruptly in the early 1880s and didn’t get back to the late 1870s-early 1880s levels until maybe the late 1920s or early 1930s. The early 1880s drop might be due to Krakatoa (lots of sulfur dioxide to reflect sunlight and cool things down), but my understanding is that sulfur dioxide washes out for the most part in about three years. Am I seeing something real here or not? Is there any mechanism that would cause an extremely large volcanic eruption to have a multi-decade impact on climate? Is the data good enough to see if there was a persistent drop after the two even bigger eruptions in the 1809-1815 period that supposedly caused the “Year Without a Summer”?

    • Tony Hansen
      Posted Dec 23, 2009 at 1:47 AM | Permalink

      Barry,
      It is interesting that the temps appear to drop ‘before’ Krakatoa.
      E.M. Smith had a chart with a similar drop ‘before’ Tambora.

  16. DeNihilist
    Posted Dec 23, 2009 at 2:12 AM | Permalink

    Arghhh! This is too frustrating. There is a need to get a proper coalition together to raise some serious bucks and purchase, if need be, the raw data from those that “own” it!

    • s.matik
      Posted Dec 23, 2009 at 2:53 AM | Permalink

      Well.. the UNFCCC got there first, and that would be 17 years ago. See the text of the Convention
      http://unfccc.int/essential_background/convention/background/items/1350.php
      Somebody ought to “read it open”, or in other words translate it into something understandable. I spent a little time trying and found it to be very interesting. For starters, it seems the base for the largest propaganda machine concerning human-caused climate change, or AGW, was set up with that treaty.
      Not going further into detail, but here’s a quote from the text..

      ARTICLE 5:
      RESEARCH AND SYSTEMATIC OBSERVATION

      In carrying out their commitments under Article 4, paragraph 1(g), the Parties shall:

      (a) Support and further develop, as appropriate, international and intergovernmental programmes and networks or organizations aimed at defining, conducting, assessing and financing research, data collection and systematic observation, taking into account the need to minimize duplication of effort;

      (b) Support international and intergovernmental efforts to strengthen systematic observation and national scientific and technical research capacities and capabilities, particularly in developing countries, and to promote access to, and the exchange of, data and analyses thereof obtained from areas beyond national jurisdiction; and

      (c) Take into account the particular concerns and needs of developing countries and cooperate in improving their endogenous capacities and capabilities to participate in the efforts referred to in subparagraphs (a) and (b) above.

  17. David Mellon
    Posted Dec 23, 2009 at 2:33 AM | Permalink

    This info has been out since 12/8/2009. I am a newbie so please excuse any obvious ignorance in this comment. I briefly looked at the U.S. data with an emphasis on Nevada since I lived in the state for 27 years. The Nevada stations in the newly available data are as follows: Reno (el 4400′), Las Vegas (el 2000′), Winnemucca (el 4400′), and Ely (el 6400′). The mean elevation of the four measurement stations is 4,300′. Nevada has more mountain ranges than any other state and the mean elevation of the state is about 5,500′ (U.S. Geological Survey). I believe it would have been more appropriate to include South Lake Tahoe (el 6,300′) instead of Reno NV. They are about 30 miles apart so the location is a good fit. Substituting South Lake Tahoe yields a mean elevation of 5,375 feet and gets rid of one of the two major metropolitan heat islands used by CRU in this limited dataset.

    The more I examine the measured temperature data on AGW the more I realize how little reliable data they really have! Considering the size of Nevada, the granularity of the data is ridiculously coarse! There are numerous climate zones in Nevada and they should be represented properly in any serious attempt to reconstruct the state’s temperature record.

  18. Karl Lehenbauer
    Posted Dec 23, 2009 at 2:43 AM | Permalink

    I have written some code to crank these land surface climate station files into CSV files palatable to R. The CSV files can be found here:

    http://lehenbauer.com/ukmo09

    ukmo09stations.csv is one row per station and contains all of the key=value pairs from each station file.

    ukmo09stationtemps.csv holds the observation years (station, year and 12 months of temperatures) for each station, while ukmo09stationrows.csv has one row per temperature.

    The normals and standard deviations appear in ukmo09normals.csv.

    So you can actually use…
    x <- read.csv("http://lehenbauer.com/ukmo09/ukmo09stations.csv")
    …but I request that you cache the files locally if you're going to access them a lot.
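
    For anyone who prefers Perl, the “cache the files locally” request can be honoured with a conditional download. This is a minimal sketch, not part of Karl’s code: the URL is the one given above, LWP::Simple and Text::CSV are widely used CPAN modules, and the only assumption about the file is that it has a header row.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple qw(mirror);
    use Text::CSV;

    # Fetch the file only when the remote copy is newer than the local one,
    # so repeated runs do not hit the server again.
    my $url   = 'http://lehenbauer.com/ukmo09/ukmo09stations.csv';
    my $local = 'ukmo09stations.csv';
    my $status = mirror( $url, $local );
    # mirror() returns the HTTP status: 200 on a fresh download, 304 if the
    # local copy is already up to date.
    die "mirror failed with HTTP status $status\n"
        unless $status == 200 || $status == 304;

    # Read the cached copy; the columns are whatever the header row names.
    my $csv = Text::CSV->new( { binary => 1, auto_diag => 1 } );
    open my $fh, '<', $local or die "cannot open $local: $!";
    my $header = $csv->getline($fh);         # header row
    while ( my $row = $csv->getline($fh) ) {
        print join( '|', @$row ), "\n";      # one station per row
    }
    close $fh;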

    • Demesure
      Posted Dec 23, 2009 at 4:09 AM | Permalink

      Thank you so much Karl !
      Maybe you could add a readme.txt file to your download directory, since your above post will be drowned.

      • RomanM
        Posted Dec 23, 2009 at 8:36 AM | Permalink

        I put up a post on the Met data here some time ago.

        The post includes an R script for reading both the data and the other station information contained in the released files. The script uses the extension .doc (WordPress won’t allow me to upload .txt files), but it is actually a simple text file which can be opened easily by R.

  19. Richard
    Posted Dec 23, 2009 at 3:55 AM | Permalink

    I suspect this is just a publicity stunt to give the supporters of AGW a talking point. “See we have released the data and it all compiles as we have published. So there is nothing wrong and you have made a big fuss over nothing”

    As has been pointed out, this is not the raw data but the “corrected” data. The “black box” of HadCRUT remains a black box.

    Still, let’s see if this reveals something. I suspect nothing much.

    The fight has to be kept up to get the raw data and to learn how they derive the corrected data from it.

  20. Posted Dec 23, 2009 at 4:06 AM | Permalink

    * This is a subset of the ‘adjusted’ data, so virtually useless.
    * It was released on about Dec 8th, almost certainly in response to climategate.
    * JGC’s plots show that even with this adjusted data, the trend is significantly lower than CRUTEM.

  21. chris
    Posted Dec 23, 2009 at 4:15 AM | Permalink

    Under UK law I think you can submit a Data Protection request for 10 GBP. The organisation (private as well as public) has to release all info which relates to you, including any files, emails etc. This may reveal more info on how the original FOI request was handled including any emails to/from others.

    You know you want to…

  22. cogito
    Posted Dec 23, 2009 at 4:25 AM | Permalink

    I just took the data for one station (Saentis, Switzerland, 066899) and compared it to the original data. It matches 100%. However, the original data is already homogenized by the local Met office, from which I’m trying to get the raw data or the homogenization procedures.

  23. Posted Dec 23, 2009 at 5:42 AM | Permalink

    I’m a novice in the details of station data. And I think there are many people on this blog who know it inside out. I would really appreciate it if someone can let me know the answers to these questions – probably the answers are buried in numerous blogs.

    1. We can download the raw unadjusted station data from all stations? Or only some stations? Or no stations? (But Willis did a great post about N. Australia recently…)

    2. Now that the MET office has released the station data, we can correlate (if had the time and energy) the adjusted station data with the raw station data (see question 1)?

    3. Now that the MET office has released data and code we know which stations make up the index? We did already know this before? We didn’t know it before?

    I guess the bit I am confused about is that people have been doing posts for ages about the raw data. And from the above post and comments it seems like we only get adjusted data. Can’t we now join the dots together? Some dots together? No dots?

    Thanks a lot to anyone who takes the time to help. Probably I’m not the only one with these questions…

    Steve
    scienceofdoom.com

  24. P Gosselin
    Posted Dec 23, 2009 at 6:08 AM | Permalink

    “…and those in power wonder why they followed leaders who forced an opinion rather than let it develop naturally.”

    Not sure who you mean by “those in power” and “leaders”.
    I’d say there was immense pressure on scientists to customise results to match specific desires. Having read Corcoran’s NP piece, my feeling is that CRU, Mann, Briffa etc. were highly paid hockey stick foundries supplying people with huge financial and political stakes.
    And everything I’ve read thus far sure makes it look to me like it was a whistle-blower. Seems to have been quite calculated.

  25. P Gosselin
    Posted Dec 23, 2009 at 6:09 AM | Permalink

    Replying to Jeff Id, 11:58.

  26. VG
    Posted Dec 23, 2009 at 6:45 AM | Permalink

    Steve.. you obviously were a polite person who asked Mann a polite question many years ago. I don’t understand why you could even bother to give these people any credence at all. At this stage you or I would not know if the data they are supplying is credible. It’s been flogged to death (the data, by you and others). I think it is really over now (if not today, it will be in 6 months). Old soldiers never die, they only fade away, and this is the way it will be handled.. It will just fizzle out… But thank you for what you have done, and if anyone deserves a Nobel Prize it will be you!

  27. Stacey
    Posted Dec 23, 2009 at 7:18 AM | Permalink

    I am very puzzled. If different countries have sent data to CRU which has been paid for but is now lost, why can’t these countries just resend it? Simple, really.

    • Jimchip
      Posted Dec 23, 2009 at 10:15 AM | Permalink

      I can’t recall the specific East Anglia email, but there are two somewhat common problems: 1) some reporting regions don’t always send current data… they resend last month’s for the sake of the report. 2) There appears to be a default file name that the data is saved as. Due to ‘unsophistication’ wrt the use of the reporting software, the previous month gets overwritten when the current data is saved. It will never be available again from the original reporting region.

  28. John DeFayette
    Posted Dec 23, 2009 at 7:39 AM | Permalink

    Regarding the answer to Q.4, OK, I understand the first two sentences are valid, if weak, defenses. Can anyone tell me what the rest of the answer has to do with temperature record accuracy?

    Has the Met office turned AGW on its head with an innovative new inference? Now we know that various physical measurements, from glacier volumes to sea level changes, air moisture content and, I suppose, also polar bear head counts, are all nothing more than a massive observational validation of the Met office’s increasing temperature record.

    Boy, I hate it when I get my causes mixed up with my effects!

    Cheers,
    John

  29. nearwalden
    Posted Dec 23, 2009 at 7:54 AM | Permalink

    Two licensing nits:

    – there is no license or copyright info associated with the data.

    – the website says: “This code is released under an Open Source licence that is contained as comments in the code. By running the code you indicate your acceptance of the licence.” I have been through both files and there is no licence in the comments.

    These are not the end of the world, but it is good practice (for data providers and users) to be explicit on what license terms the data and code are provided under.

    For example, if the open source license referenced above is the GPL, then the license requires that all derivative works also be made freely available under the same GPL license.

  30. G. E. Lambert
    Posted Dec 23, 2009 at 8:11 AM | Permalink

    New here. Can anyone point me to a tutorial on “gridbox size” and the station selection criteria for same?

  31. Posted Dec 23, 2009 at 8:30 AM | Permalink

    I have started parsing and organizing the data. I just posted the list of all stations included and a Perl script to organize everything into a database file. I am going to post the data files as well as station normals and standard deviations in the next few hours.

    You can find the files at http://www.unur.com/climate/cru-station-records/

    • Posted Dec 23, 2009 at 10:01 PM | Permalink

      OMG! There were several off-by-one and parsing errors in my code. I failed to see the obvious misalignment and the fact that the January standard deviation was parsed as ‘deviations’ for all stations etc etc in my excitement this morning.

      I think everything has been fixed now. Apologies for any inconvenience this might have caused.

  32. Posted Dec 23, 2009 at 8:33 AM | Permalink

    I tried to post this before but the post does not seem to have made it.

    I have started parsing the data files into metadata, normals, standard deviations and anomalies.

    So far, I have posted the Perl script I used to parse all the files and the list of stations (along with metadata) on my web site.

    See http://www.unur.com/climate/cru-station-records/

  33. Posted Dec 23, 2009 at 8:41 AM | Permalink

    Just looking at the output of my script, I see some interesting transcription issues.

    Unless I am mistaken, there are 32 stations with no country specified. Also note:

    ALGERIA|27
    ALGERIA—–|1

    ANTAR|4
    ANTARCTICA|15

    ARGEN INA|2
    ARGENTINA|54

    COTE D’IVOIRE|10
    COTE-D’IVOIR|1

    TURKE|4
    TURKEY|21

    W.GER ANY|2
    W.GERMANY|6

    These are not the only examples. I need to go back and triple check my code and the data. If these errors are really in the data, they would, of course, not mean the data are wrong but they make me cringe.

    • Posted Dec 23, 2009 at 8:45 AM | Permalink

      These transcription errors really do appear in the data. So much for quickly sorting countries by stations.

      • Posted Dec 23, 2009 at 9:09 AM | Permalink

        Added a table of number of stations by (misspelled) country name.

        UNITED STATES and USA together contribute 71 stations. The country with no name contributes 32 😉

        • Posted Dec 23, 2009 at 9:18 AM | Permalink

          The largest contribution seems to come from RUSSIA (40) plus USSR (31) plus RUSSIA (ASIA) (25) plus RUSSIAN FEDER (8) plus RUSSIA (EUROP) (7) and, finally RUSSIA EUROPE (1).

          Then, of course, there are the countries of I, O and S.

          Clearly, these transcription errors do not mean there is anything wrong with the numbers but it makes one wonder: Couldn’t they have hired some undergrads to check the names?
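
          For what it is worth, the kind of collation Steve mentions further down (rational country allocations) can be started with nothing more than a lookup table. A minimal Perl sketch: the alias list covers only the variants quoted in this sub-thread, and the choice of canonical names (for example folding USSR into RUSSIA) is an editorial assumption, not anything specified in the release.

          use strict;
          use warnings;

          # Map the variant country labels quoted above to a single canonical
          # form. Only the variants listed in this sub-thread are included; a
          # real cleanup pass would need the full list from the files.
          my %canonical = (
              'ALGERIA-----'   => 'ALGERIA',
              'ANTAR'          => 'ANTARCTICA',
              'ARGEN INA'      => 'ARGENTINA',
              "COTE-D'IVOIR"   => "COTE D'IVOIRE",
              'TURKE'          => 'TURKEY',
              'W.GER ANY'      => 'W.GERMANY',
              'UNITED STATES'  => 'USA',
              'USSR'           => 'RUSSIA',      # assumption: fold into one bucket
              'RUSSIA (ASIA)'  => 'RUSSIA',
              'RUSSIA (EUROP)' => 'RUSSIA',
              'RUSSIA EUROPE'  => 'RUSSIA',
              'RUSSIAN FEDER'  => 'RUSSIA',
          );

          sub clean_country {
              my ($raw) = @_;
              $raw =~ s/^\s+|\s+$//g;            # trim stray whitespace
              return 'UNKNOWN' if $raw eq '';    # the 32 stations with no country
              return $canonical{$raw} // $raw;   # pass through anything already sane
          }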

        • Byronius
          Posted Dec 23, 2009 at 10:59 AM | Permalink

          How is gridding accomplished when you don’t know in what country the station is located? Purely lat-long? If the latter, then all one must do is plot the lat-longs on a map. If the former… it would be interesting to know if the stations of “unknown location” share a common characteristic (such as lower temps / trends).

        • Posted Dec 23, 2009 at 12:09 PM | Permalink

          Grids are based solely on coordinates.

          I put together a map of CRU station locations, if you are interested.
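
          To make “based solely on coordinates” concrete: the CRUTEM3 product works on 5° by 5° grid boxes, so a station’s box can be found from latitude and longitude alone. A minimal sketch that mirrors the index arithmetic in the released station_gridder script (quoted in a later comment); the example station is hypothetical.

          use strict;
          use warnings;

          # Map a station's coordinates to a 5-degree grid box, mirroring the
          # index arithmetic in the released station_gridder script. No
          # country information is involved.
          sub grid_box {
              my ( $lat, $long ) = @_;                   # degrees; south/west negative
              my $lon_i = int( ( -$long + 180 ) / 5 );   # longitude band
              my $lat_i = int( (  $lat  + 90  ) / 5 );   # latitude band
              return ( $lon_i, $lat_i );
          }

          # Example: a station at 40.8 N, 73.9 W
          my ( $i, $j ) = grid_box( 40.8, -73.9 );
          print "longitude band $i, latitude band $j\n";   # prints 50 and 26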

    • G. E. Lambert
      Posted Dec 23, 2009 at 9:33 AM | Permalink

      Have you found any stations in the wrong folder? E.g., Folder 72 appears to be the United States. For unidentified countries, can you correlate to latitude/longitude to make the determination?

      Or, is your point that the labeling is so error prone, how can we trust the actual data to be any better? (That might be my point.)

      • Posted Dec 23, 2009 at 9:57 AM | Permalink

        It just smells bad when these data are supposedly someone’s great achievement and are used as input to decisions affecting the lives of billions of people, yet something as small as getting the names of the countries right is not done.

        I would like to believe that there is no economist on planet earth who would not react strongly if the World Bank published data like this.

        I have seen homework assignments prepared by undergrad Intro Stats students with more attention to detail.

        And, I could rant all day long about that Perl script they posted.

        • Gary
          Posted Dec 23, 2009 at 10:13 AM | Permalink

          A bullet-point summary of the problems will do. When they boast about releasing the data, the response will be “how do you explain this list of problems, then?” You can’t spell, you can’t count, you can’t write decent programs.

        • Posted Dec 23, 2009 at 10:56 AM | Permalink

          I am going through the code. I find the two following items suspect:

          1. Use of truncation rather than rounding:

          
          my $LonI = int( ( -$station->{long} + 180 ) / 5 );
          my $LatI = int( (  $station->{lat}  + 90  ) / 5 );
          

          I am not sure if this matters. Also, I am not sure what the right thing to do is, but truncation (using int) and rounding (using sprintf '%.0f') are different things and I do not think they made the choice consciously.

          2. Using the string ‘196908’ as an array index rather than a hash key.

          
          my $key = sprintf "%04d%02d", $i, $j;
          ...
          push @{ $GridA[$key][$LonI][$LatI] }, ...
          

          Since ‘196908’ is automatically converted to an integer when used as an array index, this piece of code just created the 196,908th element of @GridA, which explains the memory footprint of this simple script.

          Again, none of these would necessarily cause errors in the calculations (I am sure they diff’ed the output of the real code they used against this Perl script), but it smells bad.
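
          For illustration only, here is a minimal sketch of the two alternatives suggested above (the variable names are made up, not the Met Office’s): keying the accumulator by the “YYYYMM” string in a hash rather than using it as a numeric array index, and rounding explicitly with sprintf instead of truncating with int.

          use strict;
          use warnings;

          # 1. Use the "YYYYMM" string as a hash key, so '196908' is just a key
          #    rather than the 196,908th element of a sparse array.
          my %grid;
          sub add_observation {
              my ( $year, $month, $lon_i, $lat_i, $anomaly ) = @_;
              my $key = sprintf "%04d%02d", $year, $month;
              push @{ $grid{$key}[$lon_i][$lat_i] }, $anomaly;
          }

          # 2. If rounding to the nearest index were wanted instead of
          #    truncation, sprintf does it explicitly. Whether rounding or
          #    truncation is the *right* choice is a separate question the
          #    released code does not answer.
          sub nearest_int {
              my ($x) = @_;
              return 0 + sprintf '%.0f', $x;
          }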

        • G. E. Lambert
          Posted Dec 23, 2009 at 10:33 AM | Permalink

          Yes, given the stakes involved this does all seem pretty “bush-league”. I would like to believe that when we get to see the work of internationally acclaimed scientists we will be dazzled by their intellect and their craftsmanship. This preliminary look behind the curtain is disappointing.

        • G. E. Lambert
          Posted Dec 23, 2009 at 3:03 PM | Permalink

          Have done just the most simplistic “look” at the data for MARQUETTE, MICHIGAN, 727430. I am, so far, unable to reproduce their results for “normal” and “standard deviation”. My results agree exactly for June through December. The results for January through May, however, are significantly different. I guess I need to look at and understand the data. Any ideas to save a novice from embarrassment?

        • G. E. Lambert
          Posted Dec 23, 2009 at 4:11 PM | Permalink

          Problem solved. It was embarrassing.
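
          For anyone else trying to reproduce the per-month “normal” and “standard deviation” fields, here is a minimal sketch of the obvious calculation. The 1961-1990 base period is the usual CRUTEM convention and should be checked against the station-file headers, the data layout is assumed rather than taken from the files, and the released code may divide by n rather than n - 1.

          use strict;
          use warnings;
          use List::Util qw(sum);

          # Monthly "normal" (mean) and standard deviation over an assumed
          # 1961-1990 base period. $temps->{$year}[$month_index] is assumed to
          # hold the monthly mean temperature in deg C, with missing values
          # stored as undef.
          sub monthly_normal_and_sd {
              my ( $temps, $month_index ) = @_;
              my @vals = grep { defined }
                         map  { $temps->{$_}[$month_index] }
                         grep { $_ >= 1961 && $_ <= 1990 }
                         keys %$temps;
              return ( undef, undef ) if @vals < 2;
              my $n    = scalar @vals;
              my $mean = sum(@vals) / $n;
              # Sample standard deviation (n - 1); the released code may use n.
              my $var  = sum( map { ( $_ - $mean )**2 } @vals ) / ( $n - 1 );
              return ( $mean, sqrt $var );
          }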

  34. Quondam
    Posted Dec 23, 2009 at 8:44 AM | Permalink

    Downloaded data from the MET and GISS sites were compared in a spreadsheet for a common station in Marquette, MI (727430). MET monthly data starts in 1873, GISS in 1881. Through 1962, MET and GISS-Raw values are identical. From 1963 to 1993, MET values are about 0.3° higher, with a seasonal adjustment of similar magnitude in which MET temperatures are cooler in summer and warmer in winter. Since 1994, MET values are a constant 1.3° lower. A graph also shows spikes apparently due to transcription errors and a region of missing data in 2004-2007.

    As both files are of monthly data, they aren’t daily RAW readings. With 80 years of identical values, there is a common input, with ‘value added’ algorithms changing in 1963 and 1993.

  35. Posted Dec 23, 2009 at 9:29 AM | Permalink

    The code up at the MET Office does not appear to be the code used by CRU:

    http://strata-sphere.com/blog/index.php/archives/11957

  36. Posted Dec 23, 2009 at 9:44 AM | Permalink

    Just now looking at the station_gridder.perl file from the U.K. Met Office.

    The code is written in newbie cargo cult style. For example:

    for ( my $Year = 1850 ; $Year <= 2009 ; $Year++ ) {

    is a C style loop. A decent Perl programmer would write it as:

    for my $Year (1850 .. 2009) {

    Instead of using Perl’s standard File::Find, the code manually walks through subdirectories.

    And sub mean is an eyesore.

    In any case, the housekeeping code does the same thing my stations.pl script does. I am going to add in the anomaly calculation code at some point. Why did I have to find out about this just before Christmas? 😦
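
    For readers who do not write Perl, here is what the two idiomatic alternatives mentioned above look like. A minimal sketch: the directory name and the “file names are all digits” test are assumptions about the unpacked release, not something the Met Office documents.

    use strict;
    use warnings;
    use File::Find;

    # Idiomatic range loop instead of the C-style counter loop:
    for my $year ( 1850 .. 2009 ) {
        # ... process one year ...
    }

    # Walk the unpacked station-file tree with the core File::Find module
    # instead of hand-rolled directory recursion.
    my @station_files;
    find(
        sub {
            # $_ is the current file name, $File::Find::name the full path
            push @station_files, $File::Find::name if -f $_ && /^\d+$/;
        },
        'crutem3_subset'    # hypothetical name of the unzipped data directory
    );
    printf "found %d station files\n", scalar @station_files;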

  37. Steve McIntyre
    Posted Dec 23, 2009 at 9:44 AM | Permalink

    People should not assume that there will be a quick return on this particular data dump or, for that matter, that there is necessarily even anything “wrong” with the final result. 99% of all audits of financial statements do not find material errors in the statements, but it is a precaution that’s done for a reason.

    My own suggestion to interested people is to look at cases – like Willis did at Darwin or Lars Kamel attempted in South Siberia. My personal concerns and interests are more in the paleo area. I may experiment with the temperature data sets, but it looks like there are enough others interested in this that it need not be a priority for me.

    At the end of the day, keep in mind that there is considerable collateral evidence that temperatures are higher now than in the 19th century. The allocation of the increase between the period to the 1930s and from the 1930s to the present may have some play in it.

    Given that millions of dollars have been spent over the years on the CRU database, that the craftsmanship should be so abysmal as to have lost or destroyed information before “adding value” is disquieting. As defenders have observed, GHCN has station versions, but these also seem to have been adjusted in various not-very-transparent ways.

    In the past, I’ve postulated that the principal commercial interest being protected here was not the NMSs, but CRU’s – that the cost of producing CRUTEM was far less than the grants for producing it and they were using the funds to support other activities, that they did negligible quality control or maintenance on temperature data, that they executed an old and simplistic computer program to produce their index. The amount of money being applied elsewhere is negligible in Wall St terms, but that doesn’t mean that it was unimportant to CRU.

    • Posted Dec 23, 2009 at 11:36 AM | Permalink

      My own suggestion to interested people is to look at cases – like Willis did at Darwin or Lars Kamel attempted in South Siberia. My personal concerns and interests are more in the paleo area. I may experiment with the temperature data sets, but it looks like there are enough others interested in this that it need not be a priority for me.

      Steve, you’ve achieved an enormous victory by getting at least this partial data released. However, perhaps the massive CRU Audit effort that is now possible should be directed to someone else’s website so that you can focus on your own priorities here? Or a consortium of 3 or 4 cooperating websites?

  38. Steve McIntyre
    Posted Dec 23, 2009 at 9:47 AM | Permalink

    Sinan, when we got the CRU list a couple of years ago, I collated station information with rational country allocations.

    Some of this work may carry forward if the Met Office station IDs are the same as the semi-idiosyncratic CRU station IDs.

  39. Steve McIntyre
    Posted Dec 23, 2009 at 9:49 AM | Permalink

    WIKI: In order to keep track of various utilities for handling station data, I would like to have a technical wiki to archive these scripts. Can we discuss alternatives?

    • MrPete
      Posted Dec 23, 2009 at 11:05 AM | Permalink

      Really, the only issue is whether “we” host it, or find a willing host for high-volume transactions.

      One thing we’re working toward is an Open Science open-source archive (SourceForge, etc). I *think* (not sure) there’s a wiki available there…

    • MrPete
      Posted Dec 23, 2009 at 11:06 AM | Permalink

      By the way Steve, you can upload to climateaudit.info now, if that’s all you need.

    • Posted Dec 23, 2009 at 1:17 PM | Permalink

      Another possibility is the nascent skeptics’ wiki, Neutralpedia, running on the MediaWiki platform – see here

    • Posted Dec 23, 2009 at 1:49 PM | Permalink

      Steve, I’m the maintainer of neutralpedia.com (which is shamefully devoid of content atm, we’ve been busy). If you want a wiki you are welcome to use that, I’ll make all the necessary modifications to the software that you may need. Bandwidth or space are not a problem, I have several physical servers. Also if you wish to have your own wiki, independent of neutralpedia.com, it can be done. You can see my email so feel free to contact me explaining your requirements.

    • alantrer
      Posted Dec 28, 2009 at 12:34 AM | Permalink

      The appropriate solution depends on the intended requirement. A “script” these days has a broader meaning. It can be as simple as parsing data into a spreadsheet. It can be as elaborate as plotting graphs for each station in a geographic region on a map.

      I have seen a call for “papers” for a balanced and reasoned analysis. For this a MediaWiki is appropriate.

      I have also seen many calls for an open source alternative to both global temperature calculations and climate modeling. For this a SourceForge is appropriate.

      If a script is a one-off tool to achieve a simple data manipulation then a wiki is sufficient: rich in meta data with a discussion of the process.

      If the objective is data management and code development then a software management site is more appropriate.

      Personally I believe both are needed. And by moniker alone climateaudit would seem an appropriate trail head.

      The resources seem available and motivated. They just require an organizing force.

  40. EdeF
    Posted Dec 23, 2009 at 9:51 AM | Permalink

    Now, that wasn’t so hard, was it?

    Have been looking at California stations and I am underwhelmed. Calif has
    only 6 stations: Eureka, San Francisco, Los Angeles, San Diego, Fresno and
    Sacramento. Four are along the coast and have very localized climate
    that does not represent the rest of the state. Only two stations, Fresno and
    Sacramento for the entire center of the state. All stations are less than
    200 ft above sea level. There are no stations in the upper Sacramento valley,
    nor the lower San Joaquin valley, none in the high or low deserts, nada in eastern
    Calif, nyet in the Sierras, the Cascades nor the Klamath range. Nothing in
    interior Southern Calif. All 6 stations would have mild weather, no large winds,
    low rainfall (except Eureka), no snowfall, no extreme wx either high
    temperatures or extreme lows. I am not sure what this is, but it does not
    represent California geographic areas nor climate. An additional 6-7 stations
    could be added that would do just that. I would add Redding, Bakersfield, Yosemite, Bishop, Pasadena, Palm Springs and Lancaster and that would help.
    All of the stations BTW are heavily urbanized, except maybe Eureka, which is
    shrouded in a giant haze of pot smoke.

    • Jimchip
      Posted Dec 23, 2009 at 10:26 AM | Permalink

      Click to access 82851.pdf

      Christy at UAH has studied the San Joaquin. I’ll agree with “I am underwhelmed” wrt the surface temp data.

    • bender
      Posted Dec 23, 2009 at 11:02 AM | Permalink

      One of the topics Steve M has discussed in the past is modern statistical methods of analysis for paleo data – for example the use of mixed-effects models to replace the “artisanal” methods of classical dendroclimatology (GREAT word, BTW). It seems to me there is an opportunity for this also in the analysis of inhomogeneous and “polluted” surface station data. For example, rather than scan for putative UHI trends and then “detrend”, why not estimate the urban-related effects *directly* by regressing records onto population counts? That way you’re not assuming anything about when the polluting trend was initiated and terminated. You let the model tell you when the effect occurred.
      .
      This would be a good topic for a thread that could generate some interesting modeling activity in a wiki. I presume that Hu McCulloch, Craig Loehle, RomanM, NW, and EW would all be interested in this topic.

      Steve: Yes, homogenization methods strike me as artisanal as well; the statistical aspect of the problem has underpinned some of my interest. My own impression is that a more advanced method would keep the original data as long as possible and include the adjustments in the estimation. I did something like this with one of Hansen’s steps in a long-ago thread.

      • Posted Dec 23, 2009 at 5:25 PM | Permalink

        Bender, do you know about the McKitrick and Michaels paper?
        “Quantifying the influence of anthropogenic surface processes and inhomogeneities on gridded global climate data,” Journal of Geophysical Research, 2007.

        from the abstract:
        “..If done correctly, temperature trends in climate data should be uncorrelated with socioeconomic variables that determine these extraneous factors. This hypothesis can be tested, which is the main aim of this paper. Using a new database for all available land-based grid cells around the world we test the null hypothesis that the spatial pattern of temperature trends in a widely used gridded climate data set is independent of socioeconomic determinants of surface processes and data inhomogeneities. The hypothesis is strongly rejected (P = 7.1 × 10^-14), indicating that extraneous (nonclimatic) signals contaminate gridded climate data. The patterns of contamination are detectable in both rich and poor countries and are relatively stronger in countries where real income is growing. We apply a battery of model specification tests to rule out spurious correlations and endogeneity bias. We conclude that the data contamination likely leads to an overstatement of actual trends over land. Using the regression model to filter the extraneous, nonclimatic effects reduces the estimated 1980–2002 global average temperature trend over land by about half.”

        • hengav
          Posted Dec 23, 2009 at 8:26 PM | Permalink

          Bender and many others are quite familiar with the paper. It is the statistical benchmark of Urban heating (as a non-climatic signal) papers. Too bad Wikipedia won’t let it be included as part of either of the authors’ published papers. UHI investigations are a good place to start with this data. Pick a town near you and have a look.

    • Posted Dec 24, 2009 at 12:01 PM | Permalink

      So no-one need re-invent the wheel, have a browse through
      http://chiefio.wordpress.com/
      first, as he’s been doing quite a bit of this.

      Worth a repost, imo, though most are probably already aware of this.

      Shamefully few, and biased, station selections for the “global” average. Really shoddy and transparently biased work.

      Sadly, Pete Tillman

  41. Syl
    Posted Dec 23, 2009 at 9:52 AM | Permalink

    A von Storch editorial stating that everything must be opened up. Threw a few climatologists under the bus as well.

    http://online.wsj.com/article/SB10001424052748704238104574601443947078538.html?mod=googlenews_wsj

  42. Mac Lorry
    Posted Dec 23, 2009 at 10:05 AM | Permalink

    The reported sloppiness and questionable provenance of the data raise the question of whether any statistically significant climate signal can be extracted from it.

    Given that the source of the data is defensive about its use in debunking various papers, I would be concerned that publishing anything based on this data could be subject to invalidation by some as-yet-unknown revelation about it.

    Perhaps they are feeding crackers to whistle blowers.

    • Posted Dec 23, 2009 at 10:09 AM | Permalink

      Mac Lorry:

      There is no problem extracting statistically significant signals from any set of numbers. The hard part is to decide whether something that is statistically significant means anything.

      • Mac Lorry
        Posted Dec 23, 2009 at 10:57 AM | Permalink

        Sinan Unur:

        Doesn’t the signal have to actually exist in the data? One of the comments in the leaked emails was that when white noise was used as the input to some software it still produced a hockey stick graph. The implication being that there was no signal in the input data, and thus the hockey stick pattern was the result of bias in the software. If so, then my question was valid. The operative phrase being “climate signal” rather than any arbitrary signal.

  43. Posted Dec 23, 2009 at 10:07 AM | Permalink

    Steve:

    In the past, I’ve postulated that the principal commercial interest being protected here was not the NMSs, but CRU’s – that the cost of producing CRUTEM was far less than the grants for producing it and they were using the funds to support other activities, that they did negligible quality control or maintenance on temperature data, that they executed an old and simplistic computer program to produce their index

    I think you have been proved right.

    when we got the CRU list a couple of years ago, I collated station information with rational country allocations.

    That ought to be a good way of fixing the meta information here. Now that I have the data in a SQLite DB, it is easy to run queries against it (a minimal query sketch follows this comment). Is there a link for the file you created?

    WIKI: In order to keep track of various utilities for handling station data, I would like to have a technical wiki to archive these scripts. Can we discuss alternatives?

    First, I will add a copyright and license statement to my code (Perl Artistic License is very permissive). That will ensure that you can include it in such a Wiki.

    Second, now that your blog is being hosted on WordPress.com, could we use the dedicated server for such a Wiki?

    Steve: Yes, but I’d like someone to volunteer on the software. Pete Holzmann has already shouldered more than his share.
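
    As promised above, here is a minimal sketch of querying such a SQLite file from Perl with DBI and DBD::SQLite. The database file name, table name and column names are hypothetical placeholders; use whatever the parsing script actually created.

    use strict;
    use warnings;
    use DBI;

    # Connect to the SQLite file produced by the parsing script. All names
    # below (file, table, columns) are hypothetical placeholders.
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=cru_stations.db', '', '',
                            { RaiseError => 1, AutoCommit => 1 } );

    # Count stations per (raw, uncleaned) country label.
    my $sth = $dbh->prepare(
        'SELECT country, COUNT(*) AS n FROM stations GROUP BY country ORDER BY n DESC'
    );
    $sth->execute;
    while ( my ( $country, $n ) = $sth->fetchrow_array ) {
        printf "%-20s %4d\n", defined $country ? $country : '(none)', $n;
    }
    $dbh->disconnect;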

    • Posted Dec 23, 2009 at 10:59 AM | Permalink

      Steve:

      Yes, but I’d like someone to volunteer on the software. Pete Holzmann has already shouldered more than his share.

      I am definitely able to set it up. My puny virtual server is not up to the task, otherwise I would have offered.

      The problem is the time frame. I cannot do it if you want it very quickly. Let’s discuss over email how I can help.

    • MrPete
      Posted Dec 23, 2009 at 11:24 AM | Permalink

      Here’s something someone could follow through on: an old review of Wiki Farm Host services. Many are free, some are not.
      http://pascal.vanhecke.info/2005/10/30/free-hosted-wikis-comparison-of-wiki-farms/

      I also found http://wikihosting.net

      Now back to my Real World 🙂

    • Posted Dec 23, 2009 at 12:15 PM | Permalink

      CA poster Shen has very recently set up a MediaWiki (same platform as Wikipedia) with the intent of establishing a wiki for “Climate Change”, called Neutralpedia. He has uploaded the emails and set up a skeleton website, and was going to get more content online before announcing, but then Lawrence Solomon’s article appeared. And now this. I think he’s open to possibilities.

      Just my pennyworth while beset with flu.

  44. edrowland
    Posted Dec 23, 2009 at 10:19 AM | Permalink

    The FAQ states: The choice of network is designated by the World Meteorological Organisation for monitoring global, hemispheric and regional climate. To compile the list of stations we have released, we have taken the WMO Regional Basic Climatological Network (RBCN) and Global Climate Observing System (GCOS) Surface Network stations, cross-matched it and released the unambiguous matches.

    So does anyone know anything about how the stations in the RBCN and GCOS were selected? Based on the California stations, mentioned earlier, it would seem that there has been no effort to select for non-urban stations.

    • johnh
      Posted Dec 23, 2009 at 11:04 AM | Permalink

      According to the GCOS site

      ‘The requirements of the GCOS for climate observations are specified by the following scientific panels:
      Surface, upper air, marine, meteorology and atmospheric chemistry composition – Atmospheric Observations Panel for Climate (AOPC)
      Ocean climate – Ocean Observing Panel for Climate (OOPC)
      Terrestrial climate – Terrestrial Observation Panel for Climate (TOPC)’

      http://gosic.org/gcos/GCOS-dev.htm

  45. aurbo
    Posted Dec 23, 2009 at 10:26 AM | Permalink

    Just had time for a quick check of a station I’m very familiar with…New York City. The released data lists the WMO number as 725030 which is actually LGA (LaGuardia Airport). The data starts in 1822 and runs up to near present. I had to check by hand so I only had time to pick three years…1822, 1870 and 1950.

    LaGuardia Airport first started taking obs in Oct 1939. The 1822 data fits very well (within 0.1°C) with what was called the NY City Office. Actually, this data was taken at Ft Jay on Governor’s Island from 1822 to about 1870 when it shifted to locations on the lower end of Manhattan from 40 Wall St to the Whitehall Building on Battery Place. This latter spot is 1 mile north of what was Ft Jay.

    The 1870 data matches the City Office fairly well, and is reasonably close to Central Park which began about 1869. A 0.1°C difference can be attributed to the conversion errors between °C and °F.

    The 1950 data is close to the LGA raw obs, but not exactly. Most are within 0.2°C but one or two months are more than 0.5°C off. Both the City Office and Central Park track well with LGA, most months within 0.7°C.

    I didn’t have time to download and check against the most recent year(s).

    CRU uses the WMO ID for LaGuardia Airport. To answer a question asked earlier, the first two digits are the WMO Block Number, in this case Block #72 which is for the US. (The US also uses Block #74). The next 3 digits are the station number for synoptic stations, and the final digit (when not 0) is for nearby stations which were not included in the original WMO assignments.
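
    As an illustration of that numbering convention, a sketch based only on the description above:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Split a 6-digit WMO identifier as described above:
    #   digits 1-2: WMO block number (e.g. 72 and 74 for the US)
    #   digits 3-5: station number within the block
    #   digit 6:    0 for the original assignment, non-zero for a nearby station
    sub parse_wmo_id {
        my ($id) = @_;
        return unless $id =~ /^(\d{2})(\d{3})(\d)$/;
        return { block => $1, station => $2, suffix => $3 };
    }

    my $parts = parse_wmo_id("725030");    # the LaGuardia ID quoted above
    printf "block %s, station %s, suffix %s\n",
        @{$parts}{ qw(block station suffix) };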

    It’s a little puzzling as to why there are any differences at all that are larger than 0.1°C. In the three years I checked there does not appear to be any systematic adjustments…at least not for the 1822 and 1870 strings. It’s clear that the metadata for changing station locations is not included.

  46. Kenneth Fritsch
    Posted Dec 23, 2009 at 10:40 AM | Permalink

    I have little to add to this discussion, but to recognize that the original data is evidently all gone for all time (we are getting “value added” data in this latest dump). Critical to note in this latest effort will be any evidence (or lack thereof) of sloppiness and what that implies for the general level of work coming out of CRU.

    Also of note was the post by an apparent defender of the consensus status quo where Murray Pezim says:

    “I’ve found that people always tend to be protective over data; the main reason for this is that it can be misinterpreted and misrepresented by the McExperts. Joe SixPack, on the other hand, can’t tell a McExpert from a bona-fide Expert, hence, he gets misled. Self-interest groups exploit this weakness/naivety in Joe SixPack to drum-up support. It’s that simple…”

    Interesting whether some in the consensus would admit to this type of rationale for withholding information. Applied to everyday policy decisions this would mean that the voter must be exposed only to what the “experts” choose to reveal. That’s scary stuff.

    Who decides who the experts are, and would sloppiness be a criterion for judging?

    • ianl8888
      Posted Dec 24, 2009 at 1:30 AM | Permalink

      Nonetheless, that’s exactly what happens – here in Aus, even the Census data is secret from us … too many awkward questions can be asked of such a database

  47. Posted Dec 23, 2009 at 10:50 AM | Permalink

    snip – piling on

  48. Sean Peake
    Posted Dec 23, 2009 at 11:18 AM | Permalink

    Re MET statement: “… a network of individual land stations that has been designated by the World Meteorological Organization for use in climate monitoring.”

    Is it me or does this read like a hedge? The WMO chose the stations, not the Met Office or CRU, so don’t blame us if they are wrong.

    • johnh
      Posted Dec 23, 2009 at 1:38 PM | Permalink

      A Met Office official said exactly that when asked about the Russian IEA complaint about the CRU selection distorting the temp profile for Russia.

      • johnh
        Posted Dec 23, 2009 at 1:45 PM | Permalink

        Here’s the link to the story, it was a Hadley spokesman.

        http://www.express.co.uk/posts/view/146517/Climate-change-lies-by-Britain

        • EdeF
          Posted Dec 24, 2009 at 11:15 AM | Permalink

          “The World Meteorological Organisation chooses a set of stations evenly distributed across the globe and provides a fair representation of changes in mean temperature on a global scale over land. We don’t pick them so we can’t be accused of fixing the data. We are confident in the accuracy of our report.”

          Evenly distributed across the globe. There are 38 stations in the UK and NI, and 6 in California; comparing land area, that means the UK is 10X more represented than California. For the US as a whole the figure would be 20X, and for Canada the value is 25X. For land area in the northern hemisphere the main players are the former Soviet Union and Canada; you have to get both right.

  49. Chris
    Posted Dec 23, 2009 at 11:24 AM | Permalink

    A key item revealed in the Met Office FAQ (although not new to many readers here) is that the original temperature data collection work in the 1980s was financed by the US DOE. And the DOE has continued to fund CRU work for many years since then. Should some group or organization have the will and the financial means to pursue FOIA requests legally, through all the roadblocks that will be thrown up, ALL of the CRU data and code will be subject to release. I also expect that the original raw temperature readings from the 1980s work are still around somewhere, even if there is no surviving copy at CRU.

    Since none of it involves national security issues in the US, there is no valid reason to withhold it. I’m sure it will take a court case (or a few), but given that the DOE, and thus US taxpayers, funded the data collection, it’s not going to matter whether some supposed commercial interests exist for the temperature data coming from certain locations.

    I think there needs to be a concerted effort to ensure that all of the temperature data and its attendant metadata, and all the code used to adjust it, are completely out in the open. This goes not just for CRU but also for GHCN and GISS, and any other data set that might be used.

  50. Turboblocke
    Posted Dec 23, 2009 at 11:36 AM | Permalink

    At the very beginning of the article Steve says, “Only last summer, the Met Office had turned down my FOI request for station data, saying that the provision of station data to me would threaten the course of UK international relations. Apparently, these excuses have somehow ceased to apply.”
    However, there doesn’t appear to be a change in policy:
    From FAQ 7 http://www.metoffice.gov.uk/climatechange/science/monitoring/subsets.html
    “We can only release data from NMSs when we have permission from them to do so. In the meantime we are releasing data from a network of stations designated by the World Meteorological Organisation for climate monitoring together with any additional data for which we have permission to release.

    We will release more of the remaining station data once we have the permissions in place to do so. We are dependent on international approvals to enable this final step and cannot guarantee that we will get permission from all data owners.”

    Auditing some of the comments above, it appears that some of you have read the FAQs… how come you didn’t pick up that little gem?

    Steve: I read that. On its face, it suggests that they could have released much if not most of the present information last summer. Or that they could have sought waivers last summer as they were obliged to do under FOI law as others have pointed out. A question: what countries have sent in waivers relevant to the present situation? Is there any evidence that these countries ever had legally relevant confidentiality agreements? Pardon me if I don’t take everything at face value.

  51. Steven Mosher
    Posted Dec 23, 2009 at 12:14 PM | Permalink

    I’ve got FOIAs in to determine if Jones followed procedures in accepting confidential data. Keep you all posted.

  52. tty
    Posted Dec 23, 2009 at 12:25 PM | Permalink

    Following Steve’s advice I have looked at a specific case, Haparanda in Sweden. I chose this because it is in all the major datasets, and is one of only three Swedish stations in the released CRUTEM data that are more-or-less rural (Haparanda is a small town, though it has been growing rapidly over the last few years).

    I have analyzed the differences between the following versions:

    GHCN Raw (four datasets: 0 1860-1991, 1 1949-1990, 2 1961-1970, 3 1987-2009)
    GHCN Adjusted (two datasets 0 1860-1991, 1 1949-1990)
    GISS Adjusted (one dataset 1881-2009)
    CRU “Value-added” (1859-2009)
    NORDKLIM (one dataset 1890-2001)

    NORDKLIM (http://www.smhi.se/hfa_coord/nordklim/) is a cooperative climatology program operated by the Nordic meteorological services, I included it since it is presumably the “official” Swedish version.

    The four unadjusted GHCN sets are very similar, though not quite identical, where they overlap. The differences are slight, a matter of one or two tenths of a degree in odd months. These look like rounding errors or minor corrections.

    The NORDKLIM and the CRU datasets are completely identical except for 2001 where there is a very slight difference which looks like a small retrospective correction that was never implemented in NORDKLIM. It would seem that at least in this case the “value added” is zero, and the CRU version is identical to the data that was turned over by SMHI. These two versions are also very similar to the four GHCN raw versions.

    GISS adjusted is 0.1 degrees colder than the GHCN raw version 1881-1991, and identical to GHCN dataset 3 from 1992 on. This is probably an inadvertent “adjustment” caused by the weird GISS algorithm for merging datasets which virtually guarantees that the older dataset is changed either up or down without any reason (note that dataset 0 ends in 1991).

    The real “odd man out” is GHCN adjusted. This is adjusted by -0.825 degrees 1860-1928, -1.325 degrees 1929-1946, -0.65 degrees 1947-1954, and by zero from 1955 onwards. I must say it is very difficult to see any reason for these adjustments. They do largely eliminate a period of unusually high temperatures in 1930-45, but since this is amply attested from other sources, it seems that in this case GHCN is adjusting away real changes, not measurement errors.
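
    For anyone wanting to see the effect of those steps on a series, a minimal sketch (the year ranges and offsets are the ones read off above; the raw value is hypothetical):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Step adjustments for Haparanda as read off the GHCN adjusted series above
    # (year ranges and offsets in degrees C; illustrative only).
    my @steps = (
        [ 1860, 1928, -0.825 ],
        [ 1929, 1946, -1.325 ],
        [ 1947, 1954, -0.650 ],
        [ 1955, 9999,  0.000 ],
    );

    sub adjustment_for {
        my ($year) = @_;
        for my $s (@steps) {
            return $s->[2] if $year >= $s->[0] && $year <= $s->[1];
        }
        return 0;
    }

    # Example: turn a hypothetical raw annual mean into its "adjusted" value
    my ( $year, $raw ) = ( 1935, 1.2 );
    printf "%d: raw %.2f -> adjusted %.2f\n", $year, $raw, $raw + adjustment_for($year);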

    Another interesting question is why there is no GHCN adjusted dataset after 1991. This is true for all Swedish stations; there are absolutely no adjusted data for any station after 1991. I seem to remember that there was a major changeover in data handling that year, so that datasets before and after 1991 are generally separate. Could it be that the GHCN algorithm (which is only supposed to use series longer than 20 years) does not recognize that they are actually from the same site? In that case we can expect a major influx of new “adjusted” data in 2011.

  53. thefordprefect
    Posted Dec 23, 2009 at 12:29 PM | Permalink

    The raw data is not destroyed, you just have to work to get it. For example at Armagh, NI:

    raw data

    etc.

    This is a good example of well recorded daily data. Other locations would have been more difficult to read and transcribe.

    To store such data in the 80s would have been difficult – 1000 stations (say), 150KB per picture, one picture per month for each of am/pm: 1000 * 150KB * 2 * 12 = 3.6GB/year.
    To store the hand-converted text would require much less, but then you lose the raw data.

    Armagh is interesting in that looking at the monthly data there is a discontinuity in September 1878 which raises all preceding months by 1 deg C. Overall the hockey stick shape is well preserved when this is corrected:

    Note that for many years the Met Office has had online unadjusted temps for a few UK stations.

    Steve: CRU’s data is monthly. The raw data could have been saved using 1980s technology. If they digitized paper records, they should have preserved the paper records in case originals in say Mozambique got lost or destroyed.

    • Posted Dec 23, 2009 at 11:50 PM | Permalink

      thefordprefect Posted Dec 23, 2009 at 12:29 PM

      The raw data is not destroyed, you just have to work to get it. For example at Armagh, NI:

      This is a good example of well recorded daily data. Other locations would have been more difficult to read and transcribe.

      I have to say, that is a poor quality image, hardly easily readable;
      a) it looks to have been digitized at low-res, and
      b) it appears to have been digitized at two levels (i.e., there appear to be no shades of grey)


      To store such data in the 80s would have been difficult – 1000 stations (say), 150KB per picture, one picture per month for each of am/pm: 1000 * 150KB * 2 * 12 = 3.6GB/year.
      To store the hand-converted text would require much less, but then you lose the raw data.

      Well young whipper-snapper, this seems like a common meme going ’round the blogs from certain quarters; we stored textual information back in ‘those’ days in textual form after an operation called keypunching (see) performed by a keypunch operator in the keypunch department.

      I would say there’s a good chance the ‘data’ they initially received had already been transcribed/keypunched onto cards – making a card deck (see) –

      – or had arrived on mag (magnetic) tape (with the ‘data’ having already been transcribed of course as described above) in unprocessed form. I find the thought of ‘Xeroxed’ records from all quarters of the world a little unlikely (we could consider facsimile (fax) machines as the medium, but they were not quite as ubiquitous yet, BUT we are considering the bulk transfer of data in what would have been termed ‘datasets’).

      Now we come to what might seem a stickier area I have not yet seen addressed or questioned: how was a largish dataset like world temps handled and processed on the small machines of that day?

      Specifically, how would a multi-megabyte ‘file’ on a DASD (Direct Access Storage Device) or serially from a tape be processed on run after run, possibly looking at the effect of different adjustment techniques?

      Easy.

      It’s called “multi-pass assembly” or “multi-pass compilation”.

      The ‘tape’/card deck would have been read in onto a DASD device (hard disk) or tape, or perhaps read by the program directly (with a speed penalty with this technique however) … a first pass would collect various ‘data’ from certain fields, creating an intermediate *new* data set (or two) stored on DASD (or maybe tape) … and this process step would be followed by a 2nd ‘pass’ of the dataset on the DASD (or tape) into the program again *but* this time the other temporary dataset is ‘read’ and some combination of operations takes place to homogenize or otherwise process ‘the data’.

      Processes like this were common for various resident ‘assemblers’ or multi-pass compilers (think FORTRAN or PASCAL compilers etc. in that day) used on machines in that era. A machine of modest memory (128KB) could process multi-thousand line ‘source’ files without tying up memory for the simple storage of data from one ‘stage’ or phase of processing to the next.
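
      For anyone who never worked that way, a minimal two-pass sketch (in modern Perl rather than the assembler/FORTRAN of the day, with a made-up file name and record layout); on the hardware described above the summary built in pass 1 would itself have been written out as an intermediate dataset:

      #!/usr/bin/perl
      use strict;
      use warnings;

      # Input is a hypothetical whitespace-delimited file of
      # "station year month temp" records.

      # Pass 1: scan the whole dataset once, accumulating per-station sums.
      my ( %sum, %count );
      open my $in, '<', 'temps.dat' or die "temps.dat: $!";
      while (<$in>) {
          my ( $station, $year, $month, $temp ) = split;
          $sum{$station} += $temp;
          $count{$station}++;
      }
      close $in;

      # Pass 2: re-read the same dataset and emit each value as an anomaly
      # from the per-station mean collected in pass 1.
      open $in, '<', 'temps.dat' or die "temps.dat: $!";
      while (<$in>) {
          my ( $station, $year, $month, $temp ) = split;
          my $mean = $sum{$station} / $count{$station};
          printf "%s %d %02d %6.2f\n", $station, $year, $month, $temp - $mean;
      }
      close $in;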

  54. Posted Dec 23, 2009 at 12:41 PM | Permalink

    I am looking at .

    Oh, the subtlety kills me: “Difference from long term average (1961-1990)” instead of difference from recent average.

  55. Steven Mosher
    Posted Dec 23, 2009 at 1:29 PM | Permalink

    Well, we can put to rest Kim Cobb’s concern that it will cost too much to open the code.

    Gridding the value added data is a minor piece of code.

    Calculating the error bars is far more interesting.

    The code that turns “raw” data to value added data is the real holy grail.

    • Peter
      Posted Dec 24, 2009 at 9:00 AM | Permalink

      Hi SM,

      I downloaded the CRU value added data for a station near me that I have been familiar with for a long (30 yrs plus) time, and then the Environment Canada data for the same station and compared them. I created annual averages, then subtracted the Env Can averages from the CRU data. For this station, it doesn’t take code: just add 0.35 C to the record, starting in 1961, and leave it in until 2007, where it drops to 0.1, and then disappears in 2008. The EC data starts in 1941, and for the first 20 years the annual averages match within 0.01 – 0.04 C, which I put down to transcription or rounding or some such. There are 4 outliers in the period from 1961-2008 but the difference returns to a constant amount the following year in each case. This was done in Excel quickly; I have not scanned month by month to find the source of the outliers, I have assumed each is an erroneous value since it doesn’t persist.

      I mention I have been familiar with the station for a long time; it is a small town which hasn’t changed to speak of. Certainly no appreciable growth or shrinkage, no large industrial changes in it or its environs, and what could possibly justify a step change for 46 years, which suddenly disappears over 2 years now? Maybe they moved it in 1961 and put it back in 2008. Slowly.

      When you add a trendline in Excel, the EC data has a barely detectable positive slope, while the CRU line has 0.7-0.8C per century to the eyeball.
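
      The same comparison can be scripted; a minimal sketch, assuming hypothetical file names and a ‘year month temp’ layout rather than the actual CRU or EC formats:

      #!/usr/bin/perl
      use strict;
      use warnings;

      # Read two files of "year month temp" records (hypothetical layout),
      # form annual means, and print CRU minus EC for the overlapping years.
      sub annual_means {
          my ($file) = @_;
          my ( %sum, %n, %mean );
          open my $fh, '<', $file or die "$file: $!";
          while (<$fh>) {
              my ( $year, $month, $temp ) = split;
              $sum{$year} += $temp;
              $n{$year}++;
          }
          close $fh;
          # keep complete years only
          $mean{$_} = $sum{$_} / 12 for grep { $n{$_} == 12 } keys %sum;
          return \%mean;
      }

      my $cru = annual_means('cru_station.txt');
      my $ec  = annual_means('ec_station.txt');
      for my $year ( sort grep { exists $ec->{$_} } keys %$cru ) {
          printf "%d  %+.2f\n", $year, $cru->{$year} - $ec->{$year};
      }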

      Steve: what’s the station and station #?

      • bender
        Posted Dec 24, 2009 at 9:19 AM | Permalink

        Would you care to post the plot? What is the station name or station code?

        • Peter
          Posted Dec 24, 2009 at 11:45 AM | Permalink

          Station is Yarmouth NS Canada, # 716030. If I have glaring errors please point them out, first try at this type of thing.

        • Peter
          Posted Dec 24, 2009 at 3:16 PM | Permalink

          Sorry 71603. Doesn’t take long for error to creep into numbers, I guess.

  56. geo
    Posted Dec 23, 2009 at 2:19 PM | Permalink

    It really is heartwarming to know that the mighty and ancient UK Foreign Office sprang into concerted effort last summer and, having hectored, pleaded, and cajoled governments all over the globe, managed to make this possible.

    I mean, that must be what happened given the previous denial, right?

  57. Andrew Bennett
    Posted Dec 23, 2009 at 2:20 PM | Permalink

    About focussing on individual climate records such as those at Darwin: it is imperative to read in depth about the history of the area.

    Yes, the Commonwealth of Australia has a most admirable reputation in meteorology, not least because the nation’s original 1901 constitution specifically empowers the Parliament “to make laws for the peace, order, and good government of the Commonwealth with respect to… meteorological observations”.

    However, the Northern Territory has always been a special place in space and time.

    The following are essential reading:

    “We of the Never-Never” by Mrs Aeneas Gunn;

    “Kings in Grass Castles” by Dame Mary Durack;

    and especially

    “The Territory” by Ernestine Hill. (The opening quote from Mark Twain says it all)

    These titles do not appear to be in print outside of Australia but may be found in the library of any established American college, for example.

    It would be prudent to examine, instead of Darwin’s, the climate data from south-eastern or south-western Australia. The data are freely available on line from the Bureau of Meteorology.

    • Alexej Buergin
      Posted Dec 25, 2009 at 3:36 AM | Permalink

      Are you saying that Northern Australians cannot be trusted?

      • Andrew Bennett
        Posted Dec 25, 2009 at 1:06 PM | Permalink

        The early history of the NT is not that of a well ordered society. The Swan River Colony, on the other hand, had the benefit of the lash.

        There is no doubt that modern data from Darwin are being collected by trained professionals, and so the location is a suitable basis for the SOI for example.

        • Alexej Buergin
          Posted Dec 26, 2009 at 5:18 AM | Permalink

          But they measured approximately the same temperatures 100 years ago as today (maybe half of a °C more). The (ever increasing) corrections have been applied since 1940 and are now at +3°C. Do the trained professionals do a better job?

          (Europeans would probably make the same remarks about SE-Australia that you make about the North).

        • Andrew Bennett
          Posted Dec 26, 2009 at 1:09 PM | Permalink

          Are you alleging that the post-1940 Darwin corrections were applied by BoM staff?

          Your blogic seems to be: some trained meteorologists have been incompetent or unprofessional, therefore all are. That is the real tragedy in all of this.

          Go to

          http://www.bom.gov.au/climate/averages/index.shtml

          and check out York Post Office (WA) and Bathurst Gaol (NSW). From my reading of the history, those data have fair credibility in the early years. The monthly mean maximum temperatures at the two sites seem, from a simple inspection, also to lack trends.

        • Alexej Buergin
          Posted Dec 27, 2009 at 3:25 PM | Permalink

          Is data that has “fair credibility” not a better indication of climate than data from another place a continent away?
          The focus is not on the measurements, but on the adjustments.

        • Andrew Bennett
          Posted Dec 28, 2009 at 3:20 PM | Permalink

          Your focus may well be the heavy adjustments at Darwin, which are absolutely not supported by the online BoM data. The more recent of the latter are nicely displayed at

          http://reg.bom.gov.au/cgi-bin/climate/change/trendmaps.cgi

          For Darwin time series see under ‘high quality site networks’. The slider on the site-specific page allows you to select the trend (“T”).

          No doubt, the homogenized global products that are so much in discussion have serious problems.

          I was responding to an earlier comment of Steve’s, same thread, that for a quick ‘return’ one should focus on selected sites rather than slog through an entire archive. My concern is the difficulty in assessing the data quality in old records from sites with unknown or turbulent histories. We should select sites that are rural and have long histories of civil order, such as

          http://www.bathurst-nsw.com/History.html

  58. EdeF
    Posted Dec 23, 2009 at 2:39 PM | Permalink

    Everything was stored on magnetic tape in the 80s and 90s. Raw data from countries could have been something as simple as a typewritten piece of paper. At what point did Hadley decide they were the repository for global temp data? At that time they should have enacted data archival methods.

  59. Kenneth Fritsch
    Posted Dec 23, 2009 at 2:49 PM | Permalink

    Phil Jones has commented, and I will paraphrase here, that while the raw data could be reconstructed, it would be a waste of time to do it, but that he might do it someday.

    While Jones’ statement appears to be one of his off-the-cuff, spur-of-the-moment reactions, I am wondering if it is theoretically possible to reconstruct the raw data from the value added data – assuming that one has the algorithm for going from raw to value added data. I was thinking that if a value added data point depends on multiple pieces of raw data, then the same final result could be obtained from many different combinations of raw values.

    The alternative to recovery of the original data would require CRU going back to the original sources and hoping that they would have retained the data and would supply it a second time. I have not heard that CRU has even made a good faith effort in that direction.

    If Jones is being sincere and thoughtful in his statement above, I would guess that he does not see any real value in having the raw data in hand for testing his data set at any future time or revising it as USHCN did recently with its Version 2. Does anyone judge that if Jones were convinced of his wrong-headed thinking in this instance that he would be more inclined to make a good faith effort to recover the raw data?

    By the way is the CRU algorithm used to process raw data to value added available to the public? And, if not, could this be a reason for Jones holding out on the raw data?

    • Posted Dec 24, 2009 at 1:27 PM | Permalink

      “By the way is the CRU algorithm used to process raw data to value added available to the public? And, if not, could this be a reason for Jones holding out on the raw data?”

      Nope. And duh.

      Pete Tillman

      “It’s not what we don’t know that’s the problem, it’s what we know that ain’t so.”
      — Mark Twain (probably)

  60. G. E. Lambert
    Posted Dec 23, 2009 at 3:06 PM | Permalink

    Apologies, I “mis-posted” this.

    Have done just the most simplistic “look” at the data for MARQUETTE, MICHIGAN, 727430. I am, so far, unable to reproduce their results for “normal” and “standard deviation”. My results agree exactly for June through December. The results for January through May, however, are significantly different. I guess I need to look at and understand the data. Any ideas to save a novice from embarrassment?
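
    For what it’s worth, the naive calculation would look something like this minimal sketch (the row layout, the missing-value marker, the 1961-1990 base period mentioned elsewhere in the thread, and the sample-vs-population choice for the standard deviation are all assumptions, and any one of them could produce a mismatch of the kind described):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Naive per-month "normal" (mean) and standard deviation over 1961-1990,
    # from rows laid out as "year v1 ... v12".
    my $missing = -99.0;
    my ( @sum, @sumsq, @n );
    while (<>) {
        my @f = split;
        next unless @f == 13 && $f[0] =~ /^\d{4}$/;    # skip header lines
        my ( $year, @months ) = @f;
        next unless $year >= 1961 && $year <= 1990;
        for my $m ( 0 .. 11 ) {
            next if $months[$m] == $missing;
            $sum[$m]   += $months[$m];
            $sumsq[$m] += $months[$m] ** 2;
            $n[$m]++;
        }
    }
    for my $m ( 0 .. 11 ) {
        next unless $n[$m] && $n[$m] > 1;
        my $mean = $sum[$m] / $n[$m];
        my $sd   = sqrt( ( $sumsq[$m] - $n[$m] * $mean ** 2 ) / ( $n[$m] - 1 ) );
        printf "month %2d: normal %6.2f  sd %5.2f\n", $m + 1, $mean, $sd;
    }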

    • G. E. Lambert
      Posted Dec 23, 2009 at 3:22 PM | Permalink

      solved the problem – thanks anyway.

      • RomanM
        Posted Dec 23, 2009 at 3:35 PM | Permalink

        1979 was the problem…

        • G. E. Lambert
          Posted Dec 23, 2009 at 4:12 PM | Permalink

          yup, thanks

  61. Follow the Money
    Posted Dec 23, 2009 at 3:21 PM | Permalink

    I am curious about the map evidencing the difference between the station subset data, that which is released, and the greater number of stations “used in land surface temperature record CRUTEM3”.

    The site says:

    Why these stations?

    The choice of network is designated by the World Meteorological Organisation for monitoring global, hemispheric and regional climate.

    To compile the list of stations we have released, we have taken the WMO Regional Basic Climatological Network (RBCN) and Global Climate Observing System (GCOS) Surface Network stations, cross-matched it and released the unambiguous matches.

    Who cares what the WMO supposedly “designated”? And was the WMO list further pared down to those that are “unambiguous matches”?

    I can eyeball some immediate problems on the map.

    No. 1, the Volga River region, which is problematically forgotten elsewhere because of cooling.

    Why not even one station from Taiwan?

    Why is every station from New Zealand included? (Hmmmm)

    And why the preference for Western US stations over Eastern ones?

    I suspect the released “subset” is cherry-picked to show warming; the rest of CRUTEM3 would also show warming, but with even more additional fiddling around.

  62. Posted Dec 23, 2009 at 5:38 PM | Permalink

    Steve:

    Given that the CRU web site is still off-line, do you have any copies of files released back in 2007?

    http://www.cru.uea.ac.uk/cru/data/landstations/ is redirecting to an “under construction” page.

    CRU Reveals Station Identities



    Steve:
    Yes, I have some versions that I’ll put online for interested parties.

  63. geo
    Posted Dec 23, 2009 at 7:58 PM | Permalink

    Even as “value added” data, this will provide a good deal of insight. As Jones himself and the FOIA rejections have said, anyone can go back to a given NMS (National Meteorological Service) and get the raw data. So it should be possible, piecemeal, to get a good idea of what they are doing by working backwards from this value added data for given countries to the raw data that many of those countries do make available.

    In most cases that data is probably available today in much better/friendlier shape than Jones got it originally due to advances in storage/tech.

  64. P Solar
    Posted Dec 23, 2009 at 8:18 PM | Permalink

    I posted on an earlier thread that in French the word “cru” is the past participle of “croire” (to believe), which ties up quite well with what it now means in English.

    I should add it has another meaning: raw or uncooked. Seems this one got lost in translation 😉

  65. Geoff Sherrington
    Posted Dec 23, 2009 at 8:46 PM | Permalink

    Re previous Geoff Sherrington
    Posted Dec 23, 2009 at 12:11 AM

    To clarify this post, I obtained a large tranche from the Hadley Met Office early in Dec 09 and emailed back asking for the provenance of the data (an answer will come after New Year). So when I wrote the above post, I made an error and looked at the gridded data at the end, because I thought that was where the attention was focussed, hence my strange wording. Mea culpa.
    I shall report if the two releases of station data are the same or add to each other.

    See

    http://www.metoffice.gov.uk/climatechange/science/monitoring/subsets.html

    for the early set. I’m still confused about who has released what from where and how adjusted each is. I guess I’m not alone.

  66. Susann
    Posted Dec 23, 2009 at 9:04 PM | Permalink

    It seems to me that there are several possibilities at work:

    1) They didn’t want to release their data and horribly documented code because they were afraid of being exposed as sloppy bookkeepers.

    2) They didn’t want their data and code to be released to a non-scientist or scientific adversary for silly tribe-ish reasons.

    3) They didn’t want to release their data and code to outsiders because they were genuinely concerned that it would be used for nefarious purposes.

    4) They didn’t want to release their data and code because they knew it would not stand up to scrutiny.

    Not being a scientist able to judge the science itself, I am unable to determine which of the above applies.

    • Craig Loehle
      Posted Dec 24, 2009 at 10:53 AM | Permalink

      Another possibility which I find more plausible is that they did not and do not want to release their code and data because this is the reason they exist and for which they receive millions in grants. If they release it and people see how easy it is to run their own analyses, they are out of business. I think the same applies to GISS where Gavin said they only have a 1/2 time person running their code and doing QA.

  67. DGH
    Posted Dec 24, 2009 at 10:33 AM | Permalink

    From the Q&A on the announcement page.

    “4. How can you be sure that the global temperature record is accurate?

    …Furthermore, the strong scientific evidence that climate is changing as a result of human influence is also based on the growing evidence that other aspects of the climate system are changing; these include the atmosphere getting moister, global rainfall patterns changing, reductions in snow cover, glacier volume, and Arctic sea ice, increases in sea level and changes in global scale circulation patterns…”

    The temperature record is accurate because it correlates with other climate changes which allow us to leap to the “human influence” conclusion. Trust us.

    • bender
      Posted Dec 24, 2009 at 10:37 AM | Permalink

      snip – dont-try-to-prove/disprove-in-one-paragraph rule

      • bender
        Posted Dec 24, 2009 at 11:11 AM | Permalink

        bull. I asked that references be listed documenting the effects of AGW on circulation changes. I wasn’t arguing a damn thing. Please put my comment back.

        Steve: sorry. there’s so many non-compliant comments these days that I may make editing errors. Water under the bridge unfortunately.

        • bender
          Posted Dec 24, 2009 at 3:28 PM | Permalink

          No sweat at all. I’m trying to coax the discussion in the direction of Mann’s latest offing: “Global signatures and dynamical origins of the LIA and MCA”. Which is, to be fair, a bit of hat-racking if not coat-racking.

        • DGH
          Posted Dec 25, 2009 at 11:14 AM | Permalink

          Our host’s snip left a gap in the thread – apology in advance if I misunderstood your request. And another apology if I broke a local rule. I take it you asked for support re the global scale circulation patterns. That’s not mine to produce. You’ll need to request that from the Met Office folks.

          The quote that I posted, both the whole Question and the excerpted Answer, were lifted directly from their announcement webpage. The last sentence, out of quotes, was my attempt to paraphrase their answer and briefly highlight their ongoing buffoonery. The point that I was attempting to make was that the Met Office Q&A author’s bias was exposed by the “human influence” non sequitur. What does the A in AGW have to do with the accuracy of the temperature record?

          More concerning (to me) is that they defend the quality of their value added data by referencing supposedly related climate changes, i.e. global scale circulation patterns. Related trends, if truly related and truly trendy, might support the conclusion they prefer, but certainly not the quality of the independent record of temperatures.

          The sun orbits the Earth, therefore my observation that the sun rises is accurate. The moon also rises therefore my observations of the sun are doubly accurate.

          No fellows, neither Science nor the solar system works that way.

  68. Posted Dec 24, 2009 at 1:15 PM | Permalink

    I’ve published some instructions on how to get the Perl code running.

    CRU Howto

  69. Charlie T
    Posted Dec 24, 2009 at 2:48 PM | Permalink

    I looked at Dumfries (UK) and the file says at the top:
    Source file= Jones
    Jones data to= 1990
    To get some idea of the overall impact of DrJ’s adjustments have a look at his ‘before and after’ adjustments graph in slide 8:
    http://www.cgd.ucar.edu/cas/symposium/061909presentations/Jones_Boulder_june2009.ppt

    Picking off the points (roughly), from 1890 to 1980:
    Before his adjustments the global thermometer record showed a cooling trend of about 0.36C per century.
    This is adjusted to a global warming trend of about +0.33C per century.
    So the adjustments (roughly 0.69C per century in total) amount to double the resulting warming over this period.

    If one goes from 1826 onward the global cooling trend is smaller (0.12C per century) and the adjustments amount to only 125% of the warming.

  70. Posted Dec 24, 2009 at 3:35 PM | Permalink

    “I’ve found that people always tend to be protective over data; the main reason for this is that it can be misinterpreted and misrepresented by the McExperts. Joe SixPack, on the other hand, can’t tell a McExpert from a bona-fide Expert, hence, he gets misled. Self-interest groups exploit this weakness/naivety in Joe SixPack to drum-up support. It’s that simple…”

    Nice try! But even if Joe SixPack is too stupid to interpret the data, surely those “experts,” whose climate models have never made a valid prediction, can debunk Joe SixPack’s findings. So there is nothing to fear and there is no excuse for withholding “public” data – unless the experts have something to hide.

  71. Vincent Holbrook
    Posted Dec 24, 2009 at 4:05 PM | Permalink

    Did NCDC modify any of this data?

  72. bin
    Posted Dec 24, 2009 at 7:20 PM | Permalink

    Wasn’t there an email in which one of the researchers said he would rather delete the data than share it? That would establish a later date at which the data could still be presumed to be around. If that date is fairly recent, it would render the 80s lack-of-storage-space argument redundant.

    Feel free to delete this comment 🙂

  73. bin
    Posted Dec 25, 2009 at 12:29 AM | Permalink

    http://www.eastangliaemails.com/emails.php?eid=877&filename=1210341221.txt

    2. You can delete this attachment if you want. Keep this quiet also, but
    this is the person who is putting in FOI requests for all emails Keith and Tim
    have written and received re Ch 6 of AR4. We think we’ve found a way
    around this.
    …..
    This message will self destruct in 10 seconds!
    Cheers
    Phil

  74. Geoff Sherrington
    Posted Dec 25, 2009 at 7:44 AM | Permalink

    There are several sources of error for foreign countries supplying met records to others like the British Met Office. Some have come up in Willis Eschenbach’s posts on WUWT, re Darwin.

    I’ll use Australia’s BOM as an example, not wishing to imply that it has particular blame not shared by others, but to illustrate points where I have examples if needed.

    2. The host might send truly raw data, but I suspect this would be rare after (say) 1950.

    3. The host might send aggregated data, like monthly Tmean, or it might send daily Tmax and Tmin. What the recipient does with these is a guess. It is not generally within the means of the host to specify a particular treatment – it gets the standard global treatment, I suppose.

    4. The process of homogenisation by the host is on-going. Thus, there can be a number of versions, all with the same start date, but some with different finishing dates AND varying degrees of adjustment. They can be sent to collection agencies like WMO or NOAA, who might or might not change to the most recent, or average the overlapping sets, or take the start of one data set and the end of a later one.

    5. In the meantime, the host might then do more than adjustments for scribal errors, small corrections for station moves and minor meta data things. It might have done its own routine adjustments in much the same style as Giss have outlined. I do not know what mechanism is used to ensure that the adjustments are not done again.

    6. The host might put historic data online. Often, the completed data will be more advanced than the metadata, so when one finds that the online data corresponds to no other, one suspects further adjustment.

    7. The host might, in a few years, issue another online set. Tertiary users like KNMI might have adopted the first set and not upgraded to the second set, and they might not know how to describe which set is which because of a lack of metadata to tag each version and its changes.

    8. People in the host country might write papers introducing data from other relevant nearby comparison stations that might not be eligible for the global set because of length or missing data or whatever. These can find their way into larger meta stats studies.

    The overall result is that the paper trail from original observation to data used in estimation of global means is littered with traps, and it is very hard to reverse engineer to see what the host did to this or that, when and why and by how much, PLUS what the recipient did to each of the host stations, which for some stations can mean up to 8 possibilities to go from host to compiler.

    These are some of the reasons why it can be dangerous to simply pull a set of temperatures off the Net and do studies with them. You can, luck being what it is, end up with a temp set that has divergence from others of a couple of degrees C a year in some years.

    If you do a neat proxy study using a wrong temp set, the calibration is wrong and the inferences drawn for the pre-instrument era are wrong.

    This possibility is perhaps more common than not.

    In summary, I do not know which home data match the Brit Met Office release. It is possible that it is complex, with some stations being treated differently to others, with no warning.

  75. oldgifford
    Posted Dec 26, 2009 at 10:58 AM | Permalink

    First how I converted the CRU downloads to XLS
    Moved all the files into one directory as .txt files
    You can download this as a zip file

    http://www.akk.me.uk/URL CRU_data.zip

    It’s ~ 3.3Mb

    Rename all the files to .xls
    You then get the data in one column
    Use Data → Text to Columns, delimited by space, to split into columns.

    Would appreciate your comments on this anomaly.

    I took the Met Office Station data for Oxford.
    They give tmax and tmin.
    For each year added the 12 month values and divided by 12 to get the average for the year.

    Calculated (tmax-tmin)/2+tmin to get the average temperature for each year.

    Took the CRU data for Oxford which only covers 1900-1980.
    [ I wonder why just this limited period when the station data is readily available on the Met Office site to 2009]
    Again, added each month together and divided by 12 to get the year’s average
    Compared the CRU average with the Met Office average.
    Minor differences until the last 3 years when the difference jumps from maximums of around 0.05 to 0.5

    I’ve asked the Met Office about the anomaly but still waiting for a reply.
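
    For anyone repeating the year-average step outside Excel, a minimal sketch (the ‘year plus 12 monthly values’ row layout and the -99.0 missing marker are assumptions; check them against the actual files):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Annual mean from a space-delimited station file laid out as
    # "year v1 v2 ... v12" (12 monthly values in deg C).
    my $missing = -99.0;
    while (<>) {
        my @f = split;
        next unless @f == 13 && $f[0] =~ /^\d{4}$/;    # skip header/metadata lines
        my ( $year, @months ) = @f;
        my @good = grep { $_ != $missing } @months;
        next unless @good == 12;                       # complete years only
        my $sum = 0;
        $sum += $_ for @good;
        printf "%d %6.2f\n", $year, $sum / 12;
    }

    Run over the CRU station file and over a similarly reshaped Met Office file, the year-by-year differences then drop out with a simple join or a spreadsheet.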

  76. Vincent
    Posted Dec 26, 2009 at 6:10 PM | Permalink

    These are certainly not “raw data”. Few of you seem to understand what was actually measured. In most cases it was the maximum and minimum temperatures taken once a day at various times. The maximum was usually on a different calendar day from the minimum. They were usually entered on data sheets, which are the true “raw data”.

    All the figures now supplied are the result of multiple processing. First, the average of the max and min is taken, sometimes from the same day, sometimes from different days. Then they are averaged over the week, the month, the year, after any readings that are unbelievable (depending on what you believe) have been eliminated, missing sequences “estimated”, and time-of-measurement bias, changes of sites, instruments, observers, administration, and urbanization dealt with – all lumped together under the heading “homogenization”.
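
    A minimal sketch of just that first step, assuming a hypothetical ‘date tmax tmin’ record layout rather than any particular archive’s format:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Daily (Tmax+Tmin)/2, then a monthly mean, from records assumed to look
    # like "yyyy-mm-dd tmax tmin".
    my ( %sum, %days );
    while (<>) {
        my ( $date, $tmax, $tmin ) = split;
        my ($ym) = $date =~ /^(\d{4}-\d{2})/ or next;
        $sum{$ym}  += ( $tmax + $tmin ) / 2;
        $days{$ym} += 1;
    }
    printf "%s %6.2f\n", $_, $sum{$_} / $days{$_} for sort keys %sum;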

    All these processes are subject to errors which cannot currently be calculated, but their total must amount to several degrees Celsius, which should be attached as uncertainty to all of the figures supplied. It means that the figures are incapable of indicating an upwards or downwards trend unless the trend exceeds several degrees.

    The original data sheets are probably unobtainable. Even if they were, they would be impossible to process. Even if it were possible, it is unlikely that they would provide useful information.

    • Posted Dec 27, 2009 at 6:48 AM | Permalink

      All these processes are subject to errors which cannot currently be calculated, but their total must amount to several degrees Celsius. […]
      It is unlikely that [the original data sheets] would provide useful information.

      I respectfully disagree. Access to the original data sheets would allow us, in the absence of the actual algorithms used for all the processing you describe, to reverse-engineer this processing and check whether the various “corrections” and embellishments are sensible, and – more importantly – whether they on average push long-term trends in a certain direction. I trust it is not to be put down to selective reporting that the few spot-checks done on single stations so far have shown a steeper upward slope in the processed version, whenever changes were made, in comparison to the raw data. What we need now is as much raw, or at least less-processed, material as can be recovered, to do a large number of such station-by-station comparisons, in order to make a well-founded statement about the degree of bias introduced into the available dataset by the corrective processing.

  77. Roderic Fabian
    Posted Dec 27, 2009 at 8:45 PM | Permalink

    This link:

    http://home.swbell.net/frew/lks.html

    opens a web page where I’ve recorded links to Google Maps showing the locations of the CRU stations specified in the data release.

    Scanning through this I’ve learned that about a third of the stations are rural. There are large areas such as much of Europe, South America, India, much of the Middle East, where there are no rural stations to speak of. Most of the rural stations seem to be concentrated in the other areas, such as North America and Siberia.

    Since the locations are only to the nearest tenth of a degree one can only get a general impression of a station’s location unless the station name provides something specific.

  78. Rod Fabian
    Posted Dec 28, 2009 at 12:18 PM | Permalink

    Here’s a web page with links to Google Maps of the station locations listed in the CRU data released:

    http://home.swbell.net/frew/lks.html

    Looks like about a third of the stations are rural, mostly in N. America and Siberia.

9 Trackbacks

  1. […] is nevertheless a small step forward, since before Climategate the Met Office refused to release even these modified data, so even if the only thing available now is a subset of the criticized modified […]

  2. By CRU Releases (Some) Data « the Air Vent on Dec 23, 2009 at 1:30 AM

    […] dataset in climatology. Does this mean it’s done — nope. This is only the beginning, CA has a post describing the release of more metadata (data about data) and actual records for CRU than ever […]

  3. By Climategate, what is going on? - EcoWho on Dec 23, 2009 at 1:41 AM

    […] Met Office Archives Data and Code finally some data comes out, but not the raw variety.. […]

  4. By An Aussie Merry Christmas « TWAWKI on Dec 23, 2009 at 6:25 AM

    […] office finally starts to release the codes that for years it hid, More eco-religious child […]

  5. […] Steve McIntyre has a post up regarding the recent release of CRU code and data from the MET Office in the UK. I don’t have time for anything but a quick post at the moment (need to shovel some global warming that has my cars snowed in), but I can tell you now the code released by the MET is not the same as the code released with the CRU data and emails, and is very likely not the code used to create the recent CRU temperature profiles used by IPCC and others. […]

  6. […] office finally release data but its not the raw data. More at Climate Audit here Seems they are trying to placate the populace without doing the right thing – AGAIN! The […]

  7. By Top Posts — WordPress.com on Dec 24, 2009 at 7:07 PM

    […] Met Office Archives Data and Code The UK Met Office has released a large tranche of station data, together with code. Only last summer, the Met Office […] […]

  8. […] Hill took note of the release next, and of Graham-Cummings’ initial analysis. Subsequently Steve McIntyre of Climate Audit noted that the Hadley Centre had only last summer refused to release any such data to him when he had […]

  9. By Some Data Released in UK – NearWalden on Jan 26, 2010 at 8:18 PM

    […] Steve McIntyre reports that “the UK Met Office has released a large tranche of station data, together with code”. […]