Karl and Hansen Condemn Poor USHCN Metadata

Karl and Hansen have condemned poor USHCN metadata and, perhaps anticipating the need for something like surfacestations.org, gave a stunning endorsement to dramatically improving the collection and dissemination of photographs and other detailed site information for U.S. weather stations. In a statement, they said:

Are we making the measurements, collecting the data, and making it available in a way that both today’s scientist, as well as tomorrow’s, will be able to effectively increase our understanding of natural and human-induced climate change? We would answer the latter question with an emphatic NO. There is an urgent need for improving the record of performance.

They added:

It is necessary to fully document each weather station and its operating procedures. Relevant information includes: instruments, instrument sampling time, station location, exposure, local environmental conditions, and other platform specifics that could influence the data history. The recording should be a mandatory part of the observing routine and should be archived with the original data.

They concluded by saying:

The free, open, and timely exchange of data should be a fundamental U.S. governmental policy and, to the fullest extent possible, should be enforced throughout every federal agency that holds climate-relevant data. Freedom of access, low cost mechanisms that facilitate use (directories, catalogs, browse capabilities, availability of metadata on station histories, algorithm accessibility and documentation, etc.), and quality control should be an integral part of data management.

You think that I’m joking?

Well, I’m having a little fun. Karl and Hansen didn’t say this recently. They said this nearly 10 years ago, as members of a National Research Council panel set up to consider climate observing systems in the U.S. in response to concerns expressed in 1997 by the Framework Convention about the deteriorating state of the climate observation network:

The 1997 Conference on the World Climate Research Programme to the Third Conference of the Parties of the United Nations Framework Convention on Climate Change concluded that the ability to monitor the global climate was inadequate and deteriorating.

In 1999, a U.S. National Research Council panel was commissioned to study the state of the U.S. climate observing systems and issued a report entitled “Adequacy of Climate Observing Systems” (National Academy Press), online here; the above quotes are taken from this 1999 report. The panel was chaired by Tom Karl. James Hansen was a member. A full listing of members is shown below.

[adequa15.gif: listing of the panel members from the 1999 NRC report]

The report was summarized as follows:

This report discussed how instrumentation, observing practices, processing algorithms, and data archive methods used by scientists may profoundly affect the understanding of climate change. The Board assessed whether scientists are making the measurements, collecting the data, and making it available in a way that would enable contemporary and future scientists to effectively increase understanding of natural and human-induced climate change. The report concluded that this was not the case, and illuminated the importance of multi-decadal climate monitoring and recommended strategies to achieve those goals.

The panel asked:

Are we making the measurements, collecting the data, and making it available in a way that both today’s scientist, as well as tomorrow’s, will be able to effectively increase our understanding of natural and human-induced climate change? The Panel on Climate Observing Systems Status would answer the latter question with an emphatic NO. Given the potential impact of anthropogenic climate change on our society and in a worst-case scenario a catastrophic change in climate, there is an urgent need for improving the record of performance.

They noted:

Although the models used to simulate climate have deficiencies themselves, a major limitation in the detection and attribution studies is the lack of sufficiently long, consistent records of observed climate variables.

Among their recommendations were the following:

3. Metadata: Fully document each observing system and its operating procedures. This is particularly important immediately prior to and following any contemplated change. Relevant information includes: instruments, instrument sampling time, calibration, validation, station location, exposure, local environmental conditions, and other platform specifics that could influence the data history. The recording should be a mandatory part of the observing routine and should be archived with the original data. Algorithms used to process observations need proper documentation. Documentation of changes and improvements in the algorithms should be carried along with the data throughout the data archiving process.

4. Data Quality and Continuity: Assess data quality and homogeneity as a part of routine operating procedures. This assessment should focus on the requirements for measuring climate variability and change, including routine evaluation of the long-term, high-resolution data capable of revealing and documenting important extreme weather events….

10. Data and Metadata Access: Develop data management systems that facilitate access, use, and interpretation of data and data products by users. Freedom of access, low cost mechanisms that facilitate use (directories, catalogs, browse capabilities, availability of metadata on station histories, algorithm accessibility and documentation, etc.), and quality control should be an integral part of data management. International cooperation is critical for successful data management.

Later they observed:

4. The free, open, and timely exchange of data should be a fundamental U.S. governmental policy and, to the fullest extent possible, should be enforced throughout every federal agency that holds climate-relevant data. Adherence to this principle should be promoted more effectively by the U.S. government in its international agreements, with particular attention given to implementation.

5. Vastly improved documentation of all changes in equipment, operations, and site factors in operational observing systems is required to build confidence in the time series of decadal-to-centennial climate change.

As a result of the NRC recommendations, a program to monitor the health of the observing system was supposedly set up, described here, which said:

As a result, the National Research Council (NRC) undertook an assessment of the U.S. climate observing capacity. The NRC recommended that a system of performance measures be developed and monitored on a regular basis because it would be unwise to wait for a major environmental assessment or data archeology effort to discover that problems that occurred 10 or 20 years earlier had already inflicted considerable damage to the climate record. The NRC also recommended that an institutional infrastructure be developed to assess the quality of data sets and correct problems as they occur.

Under this program, incinerators have been installed at the Tahoe City site; a barbecue and air conditioner have appeared at Marysville; and the Lake Spaulding site has moved from a remote location to a site on a concrete pad. The National Research Council has responded to complaints about continuing problems with data access by instituting a new blue-ribbon panel on data access. In response to the recommendations of the Karl Panel, Karl, as an executive, has either taken no steps to document for each USHCN site its “local environmental conditions and other platform specifics that could influence the data history”, including photographs (a step considered prudent for the new US CRN network), or, if they were collected, has taken no steps to publicly archive the site photographs in an accessible location. The former seems more likely. However, they have studied the data using a two-step linear regression methodology to detect site discontinuities, and the results of new adjustments using this methodology will be released as USHCN v2 in July 2007.

115 Comments

  1. Posted Jun 5, 2007 at 8:16 PM | Permalink

    Karl and Hansen sound like two politicians who need to get the bad news on the street before the election season starts so, when the issue does come up, they can claim this is an old issue that has already been dealt with.

  2. Steve Sadlov
    Posted Jun 5, 2007 at 8:26 PM | Permalink

    I took a meteo course from Dozier during my undergrad. He seemed OK, from my recollection. But that recollection is over 20 years old and at the time I was still a Brower quoting ecomaniac. So, caveat emptor! 🙂

  3. Joel McDade
    Posted Jun 5, 2007 at 8:29 PM | Permalink

    Where is the first part of this (the first three word-press quotes) from?

  4. Steve McIntyre
    Posted Jun 5, 2007 at 8:33 PM | Permalink

    #3. page 1 of the Foreword

  5. David Smith
    Posted Jun 5, 2007 at 9:00 PM | Permalink

    I received an interesting comment from a USHCN site manager (the person who records and reports the data at a site).

    He said that when they used a Stevenson box, the box was in a grassy area some distance from his location. It required field readings which could be a pain during bad weather.

    Then, when the small cylindrical sensor (with its convenient indoor display and recorder) was installed (1990s), it was placed in his “backyard”, only 20 feet from his back door.

    This much-closer location apparently was to accommodate the placement of the cable which connects the sensor to the indoor display/recorder.

    He found it ironic that the box, which required field readings, was inconveniently some distance away while the cylinder, which read inside, was near his back door.

    He also noted that the cylinder site selection and installation was done by government people, not him.

  6. K
    Posted Jun 5, 2007 at 9:05 PM | Permalink

    It is hard to argue against this. And I don’t. Who isn’t for truth, justice, and er… apple pie.

    And it provides a handy disclaimer for any work done to date. If the future refuses to cooperate with AGW then “the data made me do it.”

    It also will enlarge funding for every aspect of climate science. It will be exactly those now running the show who will be the experts directing the greatly expanded programs; who else would know what needs to be done?

    And critics, like M&M, will be further marginalized. For “they are quibbling about the past, the real science is what is done now.”

    Fun aside. Better data must be had. Personally I lean toward extensive satellite sensing rather than thousands of surface stations. This morning I see the US is cutting back – they cost too much – but that may be a ploy in a larger game.

  7. Joe Ellebracht
    Posted Jun 5, 2007 at 9:47 PM | Permalink

    Yes, this statement is a good argument for using satellite data.

  8. Gerald Machnee
    Posted Jun 5, 2007 at 10:11 PM | Permalink

    Great work Steve M and associates!!
    You can say progress is forward.

  9. Posted Jun 5, 2007 at 10:22 PM | Permalink

    Eli Rabbid…. HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA!!!!!! Told Ya!

  10. Anthony Watts
    Posted Jun 5, 2007 at 11:11 PM | Permalink

    I think we all deserve a latte

  11. Jim Johnson
    Posted Jun 5, 2007 at 11:43 PM | Permalink

    Latte?

    You referring to my acronym for Local Anthropogenic Temperature Tainting Effects?

    Seems like you’re finding lots of those, Anthony :^)

    Keep up the good work.

  12. John Baltutis
    Posted Jun 6, 2007 at 12:34 AM | Permalink

    IMHO, the work of surfacestations.org should give everyone a good feel for what’s occurred eight years after the report WRT their recommendations.

  13. Posted Jun 6, 2007 at 3:42 AM | Permalink

    Congratulations Steve. A minor, but important victory. For now… 🙂

    Keep up the good work. History should treat you with somewhat more respect than the Team.

  14. Posted Jun 6, 2007 at 4:42 AM | Permalink

    Is there a link to Hansen’s statement? I am unable to locate it via Google.

  15. Posted Jun 6, 2007 at 5:22 AM | Permalink

    If you read this with an open mind it is a major victory.

    It brings all the surface records collected to date into question.

    When you are trying to measure something your instrument needs to be at minimum 3X as accurate as the required accuracy of the signal. Less than that and you are basically measuring noise. Typically you want an instrument 10X as accurate.

    If the temperature record is accurate to +/- 1 deg F and that is on the order of the signal measured then your data is no damn good. If that data is used to calibrate proxy data then proxy data can’t be counted on to be any better than +/- 3 deg F at best.

    Which means that the proxy data can be used to define trends but not absolute values.

    Here are some links that might be useful:

    http://nist.gov/

    http://www.cstl.nist.gov/div836/836.05/thermometry/calibrations/uncertainty.htm

    http://www.iotech.com/handbook/chapt_6.html

    ===

    http://www.ptcmetrology.com/Metrology_test_equipment.html

    Traceability is defined as ” the property of a result of a measurement or the value of a standard whereby it can be related to stated references, usually national or international standards, through an unbroken chain of comparisons all having stated uncertainties”. Traceability ensures that the measurements are accurate representations of the specific quantity subject to measurement, within the uncertainty of the measurement. To ensure traceability, suitably calibrated standards that are appropriately maintained and cared for, proper standard operating procedures, continuous measurement control, surveillance, and suitable documentation must all be present.

    Test numbers issued by NIST should not be used nor required as proof of the adequacy or traceability of a test measurement. Having a NIST number does not provide evidence that the measurement value provided by another organization is traceable.

    ===

    In addition, even to maintain +/- 1 deg F in the data, regular calibrations are required (once a year minimum). On top of that, how calibration is done is important. Is it on site or is the instrument shipped? Calibrations can shift due to shock and vibration when instruments are shipped. If it is done on site, did the calibrating instrument shift from handling? NIST has information on the required calibrations and likely instrument shift over time. They have recommendations for calibration intervals vs accuracy requirements.

    That will take us forward. The question then is: how good is the record we have? How well do the instruments used 100 years ago follow the practices we know today are needed to get good data?

    The short version? It is all crap.

  16. Posted Jun 6, 2007 at 5:36 AM | Permalink

    If you have temperature instrument calibration questions contact:

    Andrea Swiger

    301 975-4800
    andrea.swiger@nist.gov

  17. bernie
    Posted Jun 6, 2007 at 5:40 AM | Permalink

    Did I miss something? Aren’t Steve’s quotations from early reports? Isn’t this a case of a long-term recognition of the limitations of the current record?

  18. Posted Jun 6, 2007 at 5:47 AM | Permalink

    A temperature calibration lab:

    http://www.calibrationlabs.com/temperature_calibration_services.htm

  19. steven mosher
    Posted Jun 6, 2007 at 7:24 AM | Permalink

    RE #15.

    For what it’s worth, this is what Jones has to say.

    “The random error in a single thermometer reading is about 0.2C (1 sigma) [Folland et al., 2001]; the monthly average will be based on at least two readings a day throughout the month, giving 60 or more values contributing to the mean. So the error in the monthly average will be at most 0.2/√60 = 0.03C and this will be uncorrelated with the value for any other station or the value for any other month.”
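
    (Just to spell out what that arithmetic assumes, here is a minimal Matlab/Octave sketch; the 0.2 C sigma and 60 readings come from the quote, everything else is invented for illustration, and only independent, zero-mean reading errors are modelled.)

    % Sketch (illustrative): random reading error of 0.2 C (1 sigma)
    % propagated into a 60-reading monthly mean.
    sigma = 0.2;                  % per-reading random error, deg C
    n = 60;                       % two readings a day for a month
    se_theory = sigma/sqrt(n);    % the quoted 0.2/sqrt(60) ~ 0.026 C
    % check by simulating many synthetic months of independent errors
    nmonths = 10000;
    errs = sigma*randn(n, nmonths);
    se_sim = std(mean(errs, 1));  % spread of the simulated monthly-mean error
    fprintf('theory %.3f C, simulation %.3f C\n', se_theory, se_sim);

    The 1/√N shrinkage applies only to the independent, random part of the error; any bias or drift shared by the readings sits outside this calculation, which is where later comments pick up.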

  20. Posted Jun 6, 2007 at 7:33 AM | Permalink

    Bernie,

    This is what I am wondering. Steve implies that Hansen has made this statement because of surfacestations.org.

    If they are all from some previous decade, that of course is unlikely.

  21. Jaye
    Posted Jun 6, 2007 at 7:42 AM | Permalink

    I googled some of the phrases in the first couple of quote blocks and they came out of a 1999 report.

  22. Geoff Olynyk
    Posted Jun 6, 2007 at 7:48 AM | Permalink

    Man, is that opening paragraph in the post misleading. Those statements by Karl et al. were made 8 years ago!

    Doesn’t make the message any less true, mind you, but it’s not like surfacestations.org has (as of yet) struck such fear into the hearts of NASA that Hansen is releasing official statements about USHCN data archiving policies in response.

  23. steven mosher
    Posted Jun 6, 2007 at 8:02 AM | Permalink

    re 20-22.

    SteveM wrote “perhaps anticipating the efforts of surfacestations.org” in the first sentence.
    So, Bernie I don’t see how “Steve implies that Hansen has made this statement because of surfacestations.org.”
    as you write.

    ANTICIPATING would kinda be a big clue. Some might think too much exposure to acausal filters may have
    impacted reading skills, or not.

  24. Michael Jankowski
    Posted Jun 6, 2007 at 8:06 AM | Permalink

    Yeah, probably would’ve been clearer with a statement referring to, “a website or effort along the lines of surfacestations.org.”

  25. Jeff Norman
    Posted Jun 6, 2007 at 8:07 AM | Permalink

    Re:#20 & #22

    SteveM starts by stating:

    Karl and Hansen have condemned poor USHCN metadata and, perhaps anticipating the efforts of surfacestations.org

    I put the emphasis on the “anticipating”.

    The only dates in the article are 1997 and 1999, which suggests the need for something that details the limitations of the weather station data, like surfacestations.org, say, was anticipated by eight to ten years.

    So while you might have read it the wrong way, that in no way means it was written the wrong way.

    Of course it could have been written better but getting something approved by a committee takes a long time, especially when climate science is involved.

  26. rhodeymark
    Posted Jun 6, 2007 at 8:13 AM | Permalink

    Calibrate your irony meter…

  27. Geoff Olynyk
    Posted Jun 6, 2007 at 8:31 AM | Permalink

    Oh, I read it correctly and understood. I do think it is a bit misleading to title a thread “Karl and Hansen condemn poor USHCN metadata” when the zeitgeist on this site for the last week has been about a new website condemning poor USHCN metadata. One gets the impression from the thread title that Karl et al. made these statements based on this _recent_ work.

    Then there’s the word “anticipating”, which implies some forethought to surfacestations.org, again implying that the USHCN metadata issue has been on Hansen’s mind in 2007.

    It’s a bit of a letdown to note that these statements are so old, because it means that (as comment #1 said) they could now say things like “Don’t worry, we dealt with this in 1999. Read the old report!”

    In summary, I agree that nothing untrue was said in this thread, but the way it was presented was weird.

    Now, this probably isn’t worth any more thought, so I’ll drop it.

  28. MarkW
    Posted Jun 6, 2007 at 8:43 AM | Permalink

    Geoff,

    You are seeing mainly what you want to see. Not necessarily what’s there.

  29. steven mosher
    Posted Jun 6, 2007 at 8:51 AM | Permalink

    I find it funny that some folks went off 1/2 cocked. Some even googled to find the dates
    when SteveM provided them. Had they been following the discussion, they would have known
    there are several issues. The quality of the sites, the adjustment methodology, and site stewardship.
    It’s not that hard to follow. we are merely doing the work that Hansen and Karl supported.
    Skeptics? hardly. Team players!

    I see a wonderful article(s) coming out of this.

  30. Steve McIntyre
    Posted Jun 6, 2007 at 8:54 AM | Permalink

    #28. I did not “imply” that Karl and Hansen were responding to the recent “controversy” over photographing USHCN stations. All the statements, e.g. the “emphatic NO”, are clearly linked to the 1999 report. Sure, I was having a little fun with them, but the fact that these problems were recognized 10 years ago and the situation is just as bad or worse now deserves mockery.

    it means that (as comment #1 said) they could now say things like “Don’t worry, we dealt with this in 1999. Read the old report!”

    They recognized the problem in 1999. If they say that they “dealt” with it, then that claim can be appraised by the present survey. The Marysville barbecue and Tahoe City incinerator would suggest that they haven’t dealt with the problem.

  31. bernie
    Posted Jun 6, 2007 at 8:59 AM | Permalink

    Woah, Neddy.
    I was simply pointing out that some in reading the post seemed to have missed the dates that Steve clearly cites – no big deal! (SteveMosher #23 – I think you confused me with someone else, perhaps bigcitylib? I fully concur with Jeff N’s reading of what SteveM originally posted.) Second, the fact that the quotations are from statements made 8 to 10 years previously does not lessen the point that quality audits of data are relevant. Third, the fact that the data quality issue has still not been addressed – though there is a promise that it is being addressed – is IMO a valid criticism of those responsible for the quality of the data.

  32. Posted Jun 6, 2007 at 8:59 AM | Permalink

    The only way Hansen could have anticipated this is if he had read
    Landscheidt. Scandalously misleading.

  33. Steve McIntyre
    Posted Jun 6, 2007 at 9:11 AM | Permalink

    bigcityliberal fabricated the following claim against me over at Eli Rabett

    Steve quotes statements made by Hansen et al about 8 or 9 years ago in documents he “neglects” to link to,

    The post above contained a citation and link to the National Research Council panel report as follows:

    In 1999, a U.S. National Research Council panel was commissioned to study the state of the U.S. climate observing systems and issued a report entitled: “Adequacy of Climate Observing Systems. National Academy Press”, online here

    I followed that up with an image showing the makeup of the panel and extensive quotations from the report. Why would anyone say that I didn’t link to the 1999 report?

    BTW “bigcityliberal” has a profile at his blog here and probably lives in downtown Toronto not far from me. His profile is not extensive but there’s an interesting connection to Eli Rabbit. I can’t provide a quote showing this because of my own blog rules. (I thought of giving myself an exemption but decided against it). Check it out.

  34. steven mosher
    Posted Jun 6, 2007 at 9:14 AM | Permalink

    re 31.

    Yup Bernie, I did confuse you with big city lib. Repeated exposure to acausal filters is my
    excuse and I’m sticking to it. Mann made me do it. R^2 for this correlation is nearly .62

  35. Steve McIntyre
    Posted Jun 6, 2007 at 9:16 AM | Permalink

    #Re 29. I’ll suggest to Anthony that it would be a good idea to write to the NOAA people at http://www.ncdc.noaa.gov/oa/hofn/ asking them whether (1) they have photographs of USHCN sites and, if so, would they either put them online or send them to you; and (2) whether they would like photographs of the sites by volunteers (in keeping with the voluntary spirit of the COOP network itself) and if there are protocols for such photographs that they would like observers to comply with.

  36. Posted Jun 6, 2007 at 9:22 AM | Permalink

    Once more the perfect is floated out as the enemy of the useful. This is the same thing as was done with the Mann, Bradley and Hughes studies.

  37. Posted Jun 6, 2007 at 9:23 AM | Permalink

    Whoopsie! It was unclear (perhaps purposely?) that the later document was where the former quotes had come from, and I admit to not clicking through on that link. The post still reads as if it were “breaking news”, however, implying that Hansen was making a statement in response to surfacestations. So: still scandalous.

    Incidentally, what IS the connection between Eli and me?

  38. Posted Jun 6, 2007 at 9:29 AM | Permalink

    Actually, I see it now. Nevermind.

  39. Anthony Watts
    Posted Jun 6, 2007 at 9:43 AM | Permalink

    RE 35 SteveM, Yes I’ll do that, but it looks like they aren’t doing much. No pictures to be found on the website, and USHCN is relegated to the bottom of this page almost as a footnote.

    This page: http://www.ncdc.noaa.gov/oa/hofn/us-insitu.html

    It offers output, but it is typically coarse lat/lon and some maps with no landmarks, plus you can get some other tables.

    It appears they still have their nose buried in data. No mention of photography anywhere. No QC reports on the sites themselves, nor QC notes on the data.

    Gotta run but maybe somebody can look up our recently discussed sites and post what they say the “health” of these sites are.

  40. Ken Fritsch
    Posted Jun 6, 2007 at 9:45 AM | Permalink

    I have to admit to a bit of frustration when we allow discussions, such as this one, to get derailed by side issues. I see the issue as Hansen, Karl, et al. making statements about their concern over the quality of temperature data collection and then moving on without really doing anything substantial about it. In light of these statements, it makes the audit of quality control of temperature stations even more important.

    In my mind the evident lack of an action plan makes their statements more of a political nature and in the vein of those politicians recently throwing out self-righteous statements about goals for mitigating AGW. I would love to hear them tell (truthfully and realistically) what meeting these goals means to the voters back home. Even the greener EU seems to have trouble making the transition on AGW from talking the talk to walking the walk (not that I see the potential walks as best paths). As for Hansen, we seem to have seen much of his political side exposed over the past 20 years and to me this is simply another example of that exposure.

  41. jae
    Posted Jun 6, 2007 at 9:53 AM | Permalink

    LOL. The problems were recognized 10 years ago, but nothing was done, and the data is still used as though it’s all valid. Gee, another example of the efficiency of the US Government.

  42. MarkW
    Posted Jun 6, 2007 at 9:54 AM | Permalink

    Steve,

    I suspect you meant #27.

  43. Steve Sadlov
    Posted Jun 6, 2007 at 9:56 AM | Permalink

    RE: #39 – Whenever we hit paydirt (in the highly effective quality auditor sense) the usual suspects undertake a diversionary assault.

  44. MarkW
    Posted Jun 6, 2007 at 10:09 AM | Permalink

    #36,

    A cite would be useful to back up your claim that Mann et al. did anything to verify the quality of the network.

  45. Steve McIntyre
    Posted Jun 6, 2007 at 10:10 AM | Permalink

    Once more the perfect is floated out as the enemy of the useful. This is the same thing as was done with the Mann, Bradley and Hughes studies.

    Eli, pray be a little more specific. I’m a great believer in the useful and do not subscribe to the view that anything less than perfection should be rejected because it is not perfect. However, I do believe that attention to detail is important and that representations should be accurate. The USHCN network has been represented to be “high-quality” and care taken in the selection of stations. A spot check showed that the network included stations like Marysville and others, so that the representation that the stations are “high-quality” and selected with care is false.

    Is the network “useful”? It’s hard to say right now. If you averaged the Marysville barbecue and the decent Orland site and concluded that this was similar to an urban airport series, would that prove that the UHI was not material? Obviously not. Without knowing how much of the USHCN is contaminated by barbecues and incinerators and trailers, it’s hard to say that it’s “useful”. It’s possible that a “useful” network can be extracted from this. Obviously this should have been done a long time ago and not be dependent on volunteers.

    As to MBH, it was undoubtedly “useful” to IPCC, but it had major flaws as discussed at length elsewhere. Worse, several key claims, instrumental to its widespread acceptance, were simply false: e.g. the claim to “statistical skill” without disclosing that the verification r2 was ~0. Mann illustrated the verification r2 in Figure 3 of MBH98 and then denied to the NAS panel that he had even calculated the statistic, saying that that would be a “foolish and incorrect thing to do”.

  46. paminator
    Posted Jun 6, 2007 at 10:52 AM | Permalink

    RE: 15 and 19

    The analysis Jones presents is relevant for random, white-noise error. It does not address the accuracy issues described in #15. It does not address any of the error sources evident in Anthony Watts’ site surveys.

    I agree with M. Simon in #15. The claims of accuracy in the surface temperature record made by Jones are crap.

    Re: 36, Eli- Please elaborate on who benefits from the ‘usefulness’ of unquantified signal-swamping errors in the surface temperature record.

  47. Posted Jun 6, 2007 at 10:55 AM | Permalink

    Ken, Karl went out and set up the US Climate Reference Network, which is an elegant way of starting to characterize and deal with the problems in the 1999 report for ground stations. BTW, read the report, it deals with much more than surface measurements, including problems with satellite measurements and more.

    The issue with the original two MBH papers was that there were formal errors, omissions in the data list, etc., but that the net effect on the result was small. That has been hashed over here more than elsewhere, but authoritatively in the NAS report. Moreover, as we see, subsequent proxy reconstructions with the same proxies, different proxies, other methods, etc., yield essentially the same result. This is brought home clearly in Figure 6.10c on page 467 of the WGI report. You can see it here, or download the entire 8 Mb chapter.

  48. Pat Frank
    Posted Jun 6, 2007 at 11:24 AM | Permalink

    #19 & 46 — Jones’ method is fine for determining the standard deviation of the mean, but is nonsense for establishing the uncertainty of a physical measurement. Applying the sqrt(N) criterion supposes that the physical accuracy of a measurement can be improved to an arbitrary minimum merely by making more measurements. I.e., if the physical error is +/-0.03 C after a month, it then becomes +/-0.007 C after a year? Hardly. Jones’ method of lessening errors is a lesson, rather, in how to misuse statistics. It won’t matter how many readings he takes, he’ll never know the temperature to a greater accuracy than the immediate calibration limits of his thermometer. That is, +/- 0.2 C. The high precision he gets after many readings will merely relate to the statistical mean. The physical uncertainty remains as before: +/- 0.2 C.
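
    (A minimal Matlab/Octave sketch of the distinction Pat is drawing, with an invented fixed calibration offset added on top of the random reading error; the numbers are illustrative only.)

    % Sketch: the random part of the error shrinks as 1/sqrt(N); a fixed
    % calibration offset does not shrink at all (illustrative numbers).
    true_T = 15.0;               % assumed true temperature, deg C
    sigma = 0.2;                 % random reading error (1 sigma)
    offset = 0.2;                % assumed fixed calibration bias
    N = 60;
    readings = true_T + offset + sigma*randn(N,1);
    est = mean(readings);
    fprintf('error of the mean: %+.3f C (random part ~%.3f C, bias %.3f C)\n', ...
            est - true_T, sigma/sqrt(N), offset);

    However many readings go into the mean, the bias term is untouched; only the scatter around it tightens, which is the precision-versus-accuracy point above.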

  49. Posted Jun 6, 2007 at 11:26 AM | Permalink

    Once more the perfect is floated out as the enemy of the useful. This is the same thing as was done with the Mann, Bradley and Hughes studies.

    I think the correct turn of phrase is “The perfect is the enemy of the good”. That’s great when you’re teaching an art class or creative writing, not so much when trying to put a man on the moon (or in this case, get a Mann off one (JK)).

    I don’t understand the resistance to getting the most reliable and accurate temp measurements possible, thereby reducing the need to adjust the data for measurement errors and decreasing the chance that graphical temp trends are a byproduct of those adjustments. There are instances where adjustments are clearly needed, e.g. Christy and the Satellites (great name for a band), but, in the case of ground based stations, why rely on statistical adjustment when you can get much better / accurate raw data simply by maintaining the stations in a more scientifically proper manner. If Global Warming and the tracking of its progress is so important, then I would think this would be worth the effort.

    If this “Army Of Davids” (to borrow from Glenn Reynolds) cataloging stations finds that most are placed and maintained properly, then that’s one less issue the skeptics have in the shed. If, on the other hand, a good portion of the sites mirror some of the worst photographed so far, then there is an obvious problem that needs to be addressed.

  50. Steve Sadlov
    Posted Jun 6, 2007 at 12:22 PM | Permalink

    FYI:

    An interesting paper

  51. Keith Herbert
    Posted Jun 6, 2007 at 1:26 PM | Permalink

    To bigcitylib:
    I visited your website and see that you accuse Steve McIntyre of being an AGW denier (and later a GW denier). I am unable to find anything to support that claim. Can you please point me to your reference? The criticism of the Mann et al Hockey Stick doesn’t constitute an opinion on global warming.
    The Hockey Team did issue a corrigendum, though they do not admit it on their site. If I, as a structural engineer, were to make an error of the magnitude of the Hockey Team and try to pretend nothing was wrong, I would lose my license. But for some reason climate scientists are given great leeway and allowed to accuse others of impeding them.

  52. MichaelJ
    Posted Jun 6, 2007 at 2:05 PM | Permalink

    Seems to me that the AGW defenders are quick to besmirch the reputations of the skeptics who are not convinced of their “science”. Why is this such a problem for those like bigcitylib, Eli, the now seldom seen Steve Bloom, and others who make it a point to throw aspersions on the place of employment or motivations for those who disagree that this is settled science? We would merely like to see accuracy in any science field. Why is Climate Science treated differently from other fields of science where replication of results is necessary for a theory to be verified? What is the threat to those who fervently believe in this theory to having accurate numbers and statistical methods employed to substantiate their research? For my whole adult life, I thought this was what science was supposed to be all about.

  53. Jaye
    Posted Jun 6, 2007 at 2:14 PM | Permalink

    I put the emphasis on the “anticipating”.

    The only dates in the article are 1997 and 1999, which suggests the need for something that details the limitations of the weather station data, like surfacestations.org, say, was anticipated by eight to ten years.

    Naw, that’s just post lawyering. Had he said something like “…they said that in anticipation of some future effort like surfacestations.org” then I would agree, but given the proximity to the recent surfacestation thread (which I plan on participating in, btw) the natural reading of the first few lines of the post is misleading. I suppose that you could say that technically there is an alternative interpretation but that’s stretching it. Not that SM doesn’t do a great job but tryin’ to keep it real.

  54. Steve Sadlov
    Posted Jun 6, 2007 at 2:22 PM | Permalink

    RE: #51- Perhaps it’s because core beliefs of the true believers include things like “Love Your Mother (Earth),””All One People” or “each according to his abilities, each according to his needs?”

    In other words, social and political goals enslave science for a perceived “higher order goal” or “greater good?”

  55. Steve Sadlov
    Posted Jun 6, 2007 at 2:23 PM | Permalink

    Sorry that was supposed to be RE: #52.

  56. Steve McIntyre
    Posted Jun 6, 2007 at 2:24 PM | Permalink

    I thought that the link to the 1999 report was pretty clear, but I’ll add a few sentences to clarify that Hansen and Karl said this nearly 10 years ago and didn’t do anything about it.

    BTW Eli’s comment above was in the spam filter queue for some reason (probably because of links in the post which sometimes triggers the filter) and I’ve just restored it.

  57. steven mosher
    Posted Jun 6, 2007 at 2:32 PM | Permalink

    #36.

    Eli, Thanks so much for posting here. I think open conversation is a good thing; You’re of a like mind I’m sure.
    Anyways, you have butchered the good Dr. Johnsen when you wrote: “Once more the perfect is floated
    out as the enemy of the useful.” Tell me you don’t teach at the university, please. However, some emeriti
    excel at wearing the harlequin. Someone, most assuredly, can recommend a tailor in your area. You will have
    to find your own nano hat maker.

    This is NOT an issue of the perfect being an enemy of the good (to get the quote right).
    This is an issue of holding Hansen and Karl to their standard: Good sites. Good data. Open data. Open source.

    So, I would anticipate that folks might start looking at sites in Colorado. Colorado has about 25 sites.
    A few dedicated folks can survey the sites in a few days. Heck, if you lived there you could do it. But you won’t.
    That is telling.

  58. Mark T.
    Posted Jun 6, 2007 at 2:38 PM | Permalink

    Are there any on the Air Force Academy grounds? How about near Farrish Park (off of Rampart Range Rd.)? I’ll be up at Farrish (I think that’s how it is spelled) this weekend camping with my son. We were on the Academy grounds today doing a site survey for some testing we plan to do shortly. Somebody in our group mentioned a weather station.

    Oh, if you hadn’t already guessed, I live in CO Springs. 🙂 I’ll also be down near the Rio Grande reservoir, camping in 30-mile campground (30 miles from Creede) the week of July 4th as well. If there are any around there, I could make a stop.

    Mark

  59. MichaelJ
    Posted Jun 6, 2007 at 2:45 PM | Permalink

    #54 Steve, I greatly enjoy your posts and have learned quite a bit from them, so keep it up. In a delicious contrast, I tried posting a much more innocuous post on realclimate.org where I pointed out an obvious error in one of their posters’ remarks, but alas it was censored out completely. An incredibly good example of the difference in the two sites and their willingness to debate, I should think.

  60. Steve McIntyre
    Posted Jun 6, 2007 at 2:45 PM | Permalink

    The issue with the original two MBH papers was that there were formal errors, omissions in the data list, etc., but that the net effect on the result was small. That has been hashed over here more than elsewhere, but authoritatively in the NAS report. Moreover, as we see, subsequent proxy reconstructions with the same proxies, different proxies, other methods, etc., yield essentially the same result. This is brought home clearly in Figure 6.10c on page 467 of the WGI report. You can see it here, or download the entire 8 Mb chapter.

    Eli, I realize that this is the Team talking point, but there’s more to it than this. Mann said that his reconstruction had “significant skill”; it actually had a verification r2 of ~0. Is this a “small” issue? An r2 of ~0 is pretty much the statistical equivalent of bankruptcy. If you have an r2 of 0, you really can’t place confidence intervals on the reconstruction as Mann purported to do. That’s why the NAS panel withdrew any “confidence” in the early portion of the Mann reconstruction.

    The NAS panel said that bristlecones should be “avoided” in temperature reconstructions for the various reasons discussed elsewhere. They did no due diligence to check whether the illustrated reconstructions used bristlecones (they do). It’s as though engineers said that substandard concrete should not be used in bridge construction and then presented 4 alternative designs all using substandard concrete. As to them being authoritative on this, North in a seminar online said that the NAS panel “just winged it”, saying “That’s what you do” on these types of panels.

    As to the “different proxies” – there’s a common core of proxies in these reconstructions; they are not random. The common core are bristlecones/foxtails; Yamal rather than Polar Urals Update; Thompson’s Dunde ice core. Most of the proxies do not contain any common signal.

    Try reading the Wegman report as well. North was asked under oath whether he disagreed with anything in the Wegman report and he said that he didn’t.

  61. Michael Jankowski
    Posted Jun 6, 2007 at 2:57 PM | Permalink

    This is brought home clearly in Figure 6.10c on page 467 of the WGI report. You can see it here

    You clearly haven’t followed the issues with proxies.

    BTW, that figure also “brings home clearly” that the onset of widespread glacial retreat should have waited for the 20th century, and we know from the IPCC that it started several decades earlier than that. But I guess that detail is “small.” I’m sure it was the glaciers mistakenly receding prematurely, possibly a miscommunication via teleconnections of some sort.

  62. steven mosher
    Posted Jun 6, 2007 at 3:26 PM | Permalink

    #48.

    Thanks Pat. When I read Jones’s paper that paragraph stuck in my throat. But
    I could not articulate what was wrong with it. Now I can. Thanks. Come to think of
    it the refutation is rather simple. According to Jones a thermometer that had a single reading error
    of 10 degrees C, would still produce a good average on a monthly or yearly basis or a century basis.

  63. DeWitt Payne
    Posted Jun 6, 2007 at 3:48 PM | Permalink

    #62

    While it’s possible to interpolate a thermometer reading, or any other measuring device with a scale, to finer than the smallest scale division, if you don’t record the reading to that precision, then the actual precision is the recorded data. Since temperatures are only reported to the nearest degree, the actual precision is +/- 1 degree, not 0.2 degree.

  64. Ken Fritsch
    Posted Jun 6, 2007 at 4:20 PM | Permalink

    Re: #60

    This is brought home clearly in Figure 6.10c on page 467 of the WGI report. You can see it here, or download the entire 8 Mb chapter.

    Is not this reply evidence of Eli and others of like mind who have posted here before more or less taking the statements of the scientists who put together the 4AR pretty much at face value, without feeling they have the expertise to independently question them or the ability to judge those who might?

    They seem to feel more comfortable when defending these scientists by attempting to make light of those who do criticize. The nuanced phrasing and spin of the IPCC and what it has “left out” seems to go right over their heads.

    I think the attempt to equate criticisms of rather fundamental and basic points with getting in the way of the good and their attempts to do good is another good example of their condition – and a condition that from my experience gives little near term hopes for recovery.

  65. Stan Palmer
    Posted Jun 6, 2007 at 4:22 PM | Permalink

    re 63

    Measurements to the nearest degree would be +/- 0.5 degrees. So a reading of 20 degrees must be between 19.5 and 20.5.

  66. Posted Jun 6, 2007 at 4:24 PM | Permalink

    re 63:
    How do you explain this then?

  67. Jonathan Schafer
    Posted Jun 6, 2007 at 4:38 PM | Permalink

    I posted this link a few weeks back on unthreaded, but that thread gets so many comments some may have missed it. It seems in line with the general comments made by Steve Mc on this entry. Here are some excerpts which I found illuminating… Karl, btw, was on the panel.

    Natural Climate Variability On Decade-to-Century Time Scales

    To obtain a clear picture of the causes of climate variability, both modeling and real-world observations must be employed, and the traces left by past changes must be uncovered. National and international efforts to explore and document climate variability on decade-to-century time scales have begun to provide the necessary foundation for examining human influences on climate. In addition to inspiring the fundamental questions above, the workshop’s presentations and wide-ranging discussions gave the CRC the basis for formulating the following set of recommendations for the conduct of future research.

    Criteria must be established to ensure that key variables are identified and observations are made in such a way that their results will yield the most useful data base for future studies of climate variability on decade-to-century time scales. For instance:

    Minimal quality standards that exceed those required for measuring diurnal and seasonal cycles must be implemented.

    The quality and continuity of data acquisition must be maintained over time. (Converting successful, appropriate research programs to sustained operational ones is one way to provide this continuity.)

    Critical forcings and internal climate variables must be monitored as well, to complete the data base.

    Multiple quantities should be monitored simultaneously, to permit cross-checking and provide statistical control.

    Models should be consulted to help design optimal sampling strategies for monitoring systems.

    Modeling studies must be actively pursued in order to improve our skill in simulating and predicting the climate state. The use of many types of models, and closer links between models and observational studies (such as data assimilation), will be necessary. Specific recommendations are:

    Different model types must be intercalibrated to establish levels of confidence.

    Models of the individual climate system components must be improved in order to facilitate the development of better coupled models, which integrate these components.

    Known weaknesses in both models and existing datasets must be targeted for study to improve our understanding of the non-linear global dynamics of the complex interactive climate system – for instance, the interactions between the wind-driven and thermohaline circulations are poorly understood.

    To permit accurate data/model comparisons, both observational data and the model output must be subjected to the same processing.

    Easier access to existing computers must be available to facilitate wider participation in modeling efforts, and higher-speed, larger-memory computers must be developed to make possible more highly resolved models, longer simulations, and more careful sensitivity studies.

    Records of past climate change, particularly those reflecting the pre-industrial era, must be actively addressed as a source of valuable new data on the natural component of climate variability. The following approaches are recommended:

    The coverage provided by existing proxy indicators must be expanded so that they yield regional, even global, information.

    Interpretations of proxy indicators currently in use must be continually evaluated for possible improvements; the associated uncertainties and limitations must be assessed and problems identified.

    New proxy indicators of climate must be developed to permit cross-checking of data derived from proxy records currently in use.

    Efforts to locate and fully exploit the wealth of information contained in historical records must be supported.

    Climate data must be properly archived, and made readily and freely available to researchers worldwide. Exchange of model-derived information, data from in situ observations, and proxy-record knowledge is an overarching concern. Solutions to the most challenging and important research problems depend greatly on the integration of data from all these sources.

    This report is from 1995. They seemed to think that data integrity, availability, etc. were all very important areas for climate research. How the times have changed. All efforts now seem to be to obfuscate and hide the data, and to denigrate anyone who challenges their opinions.

    Sad.

  68. steven mosher
    Posted Jun 6, 2007 at 4:38 PM | Permalink

    re #63.

    I get it. When I read Jones’s paper, as I noted, my BS meter was pegged when I read
    the error analysis. But I could not put the refutation as elegantly as Pat Frank did.
    oh, you’re no slouch, so I learned from you as well. k?

  69. jae
    Posted Jun 6, 2007 at 4:39 PM | Permalink

    Hans: don’t the errors cancel out over the long run?

  70. Earle Williams
    Posted Jun 6, 2007 at 4:46 PM | Permalink

    Re #66

    Hans Erren,

    Annual temperature is a construct, not a measurement. Each daily reading that was recorded with precision of 1C is still limited in its precision. Certainly one can generate a mean temperature to many decimal places, but that does not increase the certainty of the measurement. For example, what was the range of values recorded for 1971? All 365 measurements go into the single mean temperature construct. Actually it would be at least 730 measurements, since at a minimum there must be min and max readings per day. But these are not 730 measurements of a single quantity. These are 730 measurements of 730 quantities: the instantaneous temperature for the day and time of the observation.

    I may not be explaining myself well, but I posit that 730 observations of 730 phenomena does not equal 730 observations of 1 phenomenon. What that means for the resulting uncertainty of the constructed annual temperature I am not qualified to judge. but it is not as trivial as plotting a curve.

  71. Posted Jun 6, 2007 at 4:54 PM | Permalink

    re 70:

    I may not be explaining myself well, but I posit that 730 observations of 730 phenomena does not equal 730 observations of 1 phenomenon. What that means for the resulting uncertainty of the constructed annual temperature I am not qualified to judge. but it is not as trivial as plotting a curve.

    To me – a practical geophysicist – it shows that a daily observation with a one degree precision is sufficient to get an idea of how much the climate is changing, provided the accuracy is good. The latter is the art of homogenisation 😉

  72. DeWitt Payne
    Posted Jun 6, 2007 at 4:54 PM | Permalink

    #69

    don’t the errors cancel out over the long run?

    IMO, maybe, maybe not. If the errors are gaussian, or the central limit theorem applies, then yes. Instrumental drift is 1/f noise, I think, and that gets worse rather than better by averaging. A simple analogy: when counting radioactive decay, the longer the counting time the better the precision as long as the counting time is very short compared to the half-life of the isotope. If it’s not, then longer counting times give worse precision and accuracy unless corrected for half-life.
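
    (A rough Matlab/Octave sketch of the drift point, using a random walk – strictly 1/f^2 rather than 1/f noise – as a crude stand-in for instrumental drift; step sizes are invented for illustration.)

    % Sketch: white-noise reading error vs. random-walk drift under averaging.
    N = 365;
    white = 0.2*randn(N,1);           % independent reading errors
    drift = cumsum(0.01*randn(N,1));  % slowly wandering bias (crude drift stand-in)
    fprintf('average white-noise error: %+.3f C\n', mean(white));
    fprintf('average drift error:       %+.3f C\n', mean(drift));

    Run it a few times: the white-noise average collapses toward zero, while the drift average wanders and does not shrink with more samples in the same way.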

    #65

    I think I made the same statement earlier and decided I was wrong when challenged. If you look at it from the significant figure point of view, I’m pretty sure it’s +/- 1 not +/- 0.5. There should be a way of proving it, but I’m not going to get started tonight.

  73. DeWitt Payne
    Posted Jun 6, 2007 at 5:02 PM | Permalink

    a daily observation with a one degree precision is sufficient to get an idea how much climate is changing, provided the accuracy is good.

    I agree that averaging over time and multiple stations will improve the precision of the measurement. However, I don’t think that precision will scale as (n)^-0.5. That’s the best case and real measurements are almost never best case.

  74. Mark T.
    Posted Jun 6, 2007 at 5:06 PM | Permalink

    I think I made the same statement earlier and decided I was wrong when challenged. If you look at it from the significant figure point of view, I’m pretty sure it’s +/- 1 not +/- 0.5.

    Accuracy of any measurement device is +/- 1/2 the smallest division.

    Mark

  75. Mark T.
    Posted Jun 6, 2007 at 5:07 PM | Permalink

    Uh, that assumes the device is working properly, of course. It could be faulty/malfunctioning and be waaaay off as well. 🙂

    Mark

  76. coolwater
    Posted Jun 6, 2007 at 5:18 PM | Permalink

    This whole discussion IMO is a reliability vs. validity issue. I think the reliability of the measuring instruments is good (or at least whatever error there is is random). To me, the much bigger threat is the validity of the measures: are they measuring what they purport to measure? If there are systematic biases (parking lots, etc.) then the answer is maybe not. I’m a little worried if the climate people are just now asking if their data has some validity issues.

  77. jae
    Posted Jun 6, 2007 at 5:27 PM | Permalink

    76:

    I’m a little worried if the climate people are just now asking if their data has some validity issues

    As Steve M has indicated in his post, they were asking the questions 10 years ago. But they did nothing and seemed to have stopped asking the questions. Maybe they don’t need to worry about them any more, since the train has now left the station, and they “fix” any problems by adding “adjustments.”

  78. klaus brakebusch
    Posted Jun 6, 2007 at 5:41 PM | Permalink

    #69, jae

    re: Hans: don’t the errors cancel out over the long run?

    It depends on the type of error. There are in fact different types
    of errors.

    For example, if you have a thermometer which (perhaps due to fabrication) shows 14.2 C instead of 14.0 C, you will usually have this error for the whole life/usage of the thermometer.

    If you have, for example, an alcohol or quicksilver thermometer, it may also change its reaction to temperature with the age of the thermometer. Even glass, when not properly fabricated, may allow diffusion of the liquid encapsulated in the body of glass, and a thermometer with decreasing content will show an increasingly erroneous value over time.

    Another kind of error: people read a value between 14.1 and 14.2. When they have to write down the value, they will choose between one of these values. Also, if the thermometer’s location in height fits well for a medium-sized person, a rather short person may prefer 14.2 and a rather tall person may prefer 14.1. These are more or less random errors.

    ** Only random errors tend to cancel out over time.
    Type 1 and 2 errors usually are persistent for the whole dataset for that instrument.

    Hmm, I hope I did explain it somehow. This is one of those occasions where I have to admit that English isn’t my mother language.

  79. coolwater
    Posted Jun 6, 2007 at 5:56 PM | Permalink

    77: yeah, I see that now…well that’s too bad…it does seem like for something this important they could have done some quality audits.

    I’m new to the climate data…but I’ve seen regressions done on bad data and then when you go back to drill down to see what’s going on it gets pretty ugly.

  80. klaus brakebusch
    Posted Jun 6, 2007 at 6:09 PM | Permalink

    #73, #74

    In fact it depends less on the theoretical point of view. Practically, it depends more on the type of instrument and its display.

    With an analog display, interpolating is rather easy. If you have a thermometer which shows bars for every 0.1 C, you can still read values of 0.1, 0.15 (if it’s in the middle of 2 bars) or 0.2. Here we could read values down to a precision of 0.05.

    Another case would be a digital display. If the display shows a value of 0.2, it may in fact be 0.2 or 0.15 or 0.25. Practically, the precision is less than 0.1.

  81. John Baltutis
    Posted Jun 7, 2007 at 2:35 AM | Permalink

    Re #67

    National and international efforts to explore and document climate variability on decade-to-century time scales have begun to provide the necessary foundation for examining human influences on climate.

    Which, IMHO, assumes that human influences dominate climate variability. This assumption has led to the UN/IPCC generated scares.

  82. Posted Jun 7, 2007 at 3:00 AM | Permalink

    # 46 , …

    I don’t think that quantization noise is a problem, that tends to average out (see Matlab code example below). The quoted text in #19 is from Brohan et al, and they consider bias-like errors in different sections. And those errors are problematic, try to average them out..

    BTW, Brohan et al,

    There will be a difference between the true mean monthly temperature (i.e. from 1 minute averages) and the average calculated by each station from measurements made less often; but this difference will also be present in the station normal and will cancel in the anomaly.

    Just take two measurements per month, and we’ll get the same anomaly? Saves money.


    close all
    r = randn(365,1) + pi;        % 'true signal'
    rr = round(r);                % rounded to whole units
    t = (1:365)';
    rm = cumsum(r)./t;            % running mean of the true signal
    rrm = cumsum(rr)./t;          % running mean of the rounded signal

    subplot(311), plot(r)
    subplot(312), plot(rr,'r')
    subplot(313), plot(rrm-rm)    % quantization error after averaging N samples
    grid on

  83. DeWitt Payne
    Posted Jun 7, 2007 at 3:12 AM | Permalink

    #74

    I agree that the accuracy is +/- 0.5, but we’re talking precision here and I think that’s still +/- 1. I could be wrong, though.

  84. DocMartyn
    Posted Jun 7, 2007 at 4:33 AM | Permalink

    RE #15.

    For what it's worth, this is what Jones has to say.

    Re #9 “The random error in a single thermometer reading is about 0.2C (1sigma) [Folland et al., 2001]; the monthly average will be based on at least two readings a day throughout the month, giving 60 or more
    values contributing to the mean. So the error in the monthly average will be at most 0.2/√60 = 0.03C and this will be uncorrelated with the value for any other station or the value for any other month.”

    Doesn't this assume that the error is random and normally distributed? If the error curve is not normally distributed, but say sigmoidal, then there is a systematic error in the system. Unless the actual error distribution is sampled, there is no way you can say that the error in the monthly average is 0.2/√60 = 0.03C.
    Moreover, if the rates at which the high and low temperatures are attained and then return to the average differ, the errors for max and min will be very different; i.e. if the minimum temperature is a spike that lasts only 15 minutes, whereas the maximum is a feature that lasts an hour and a half, then the response time of the thermometer will induce a bias.
    The response time of the system must be at least an order of magnitude faster than the temperature change for you to be able to make an accurate recording.
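    A rough Matlab sketch of this response-time point (all shapes and numbers assumed for illustration): a first-order sensor lag with a 10-minute time constant clips a 15-minute minimum spike noticeably, while a broad maximum is barely affected, so the recorded min and max acquire different biases.

    t = (0:1439)';                                 % minutes in a day (assumed)
    T = 10 + 5*sin(2*pi*(t - 540)/1440);           % smooth diurnal cycle (assumed shape)
    T(t>=180 & t<195) = T(t>=180 & t<195) - 2;     % 15-minute cold spike near the 3 AM minimum
    tau = 10;                                      % assumed sensor time constant, minutes
    Tm = zeros(size(T)); Tm(1) = T(1);
    for k = 2:numel(T)
        Tm(k) = Tm(k-1) + (T(k) - Tm(k-1))/tau;    % first-order lag of the sensed temperature
    end
    [min(T) min(Tm); max(T) max(Tm)]               % the brief min is clipped; the broad max barely is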

  85. MarkW
    Posted Jun 7, 2007 at 5:11 AM | Permalink

    There are two types of error here: errors in reading the thermometer (if it is the old-style glass one) and a bias error of the thermometer itself. Reading errors will average out over time.
    A bias error will stay the same, regardless of how many times it’s read.
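    A quick Matlab sketch of the distinction (all numbers assumed for illustration): a fixed bias survives any amount of averaging, while random reading errors shrink roughly as 1/sqrt(N).

    N = 10000;                      % number of readings (assumed)
    true_temp = 14.0;               % assumed true temperature
    bias = 0.2;                     % persistent instrument bias, in C (assumed)
    noise = 0.05*randn(N,1);        % random reading error, 1-sigma 0.05 C (assumed)
    readings = true_temp + bias + noise;
    mean(readings) - true_temp      % stays near 0.2: the bias never averages out
    std(noise)/sqrt(N)              % random contribution to the mean: about 0.0005 C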

  86. MarkW
    Posted Jun 7, 2007 at 5:14 AM | Permalink

    Steve S.

    I’ll try to find a link to it again, but I have a quote from a lady who worked in the previous Canadian administration. To paraphrase her, she stated that it didn’t matter if global warming is real or not, since the solutions proposed are things we need to be doing anyway.

    To a Gaia worshiper, anything man does to change the environment is evil and must be fought.

  87. MarkW
    Posted Jun 7, 2007 at 5:19 AM | Permalink

    #63,

    If you are rounding to the nearest degree, wouldn’t the error be +/- 0.5?

  88. Mark T.
    Posted Jun 7, 2007 at 9:57 AM | Permalink

    I agree that the accuracy is +/- 0.5, but we’re talking precision here and I think that’s still +/- 1. I could be wrong, though.

    Precision would be 1 division, not +/- 1 division. The error is actually “best case” error, which is +/- 1/2 division.

    Doesn't this assume that the error is random and normally distributed? If the error curve is not normally distributed, but say sigmoidal, then there is a systematic error in the system. Unless the actual error distribution is sampled, there is no way you can say that the error in the monthly average is 0.2/√60 = 0.03C.

    Yes, sort of. The error needs to be centered around the true mean for it to cancel in the manner he states. Any combination of like PDFs is covered by the CLT, so the average of the errors will approach a Gaussian PDF centered on the systematic mean error, with a variance that shrinks as 1/N. If the systematic mean error is zero, i.e. the errors are centered on the true value, then it works.
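    A small Matlab sketch of this (numbers assumed): even with uniform, decidedly non-Gaussian reading errors, the monthly mean error comes out roughly Gaussian with a spread of about sigma/sqrt(60), but it stays centred on whatever systematic bias is present.

    M = 20000;                          % simulated months (assumed)
    E = 0.1 + 0.4*(rand(60,M) - 0.5);   % 60 readings/month, uniform error +/- 0.2 C plus a 0.1 C bias (assumed)
    m = mean(E);                        % monthly mean errors
    [mean(m) std(m) std(E(:))/sqrt(60)] % centred on the 0.1 bias; spread is about sigma/sqrt(60)
    hist(m, 50)                         % histogram looks roughly Gaussian, as the CLT predicts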

    Mark

  89. Steve Sadlov
    Posted Jun 7, 2007 at 11:00 AM | Permalink

    RE: #86 – As you may be aware, I am a reformed Gaia worshipper / Deep Ecology True Believer. In my own case, it was a sort of hybrid between pure Gaia and a wannabe Native American approach; I'll let your imagination fill in the blanks on some of the details of the latter. In my own construct at the time, Amerikkka, headquartered in LA, and headed by Reagan, was the new Rome. We (the revolutionary opposition) were akin to the Celts in the Asterix comic series. The "New Rome" was spreading its dark, evil (of course, Big Oil driven) development across the land. Our own plan was a revolt, leading to the establishment of Ecotopia, straight out of Callenbach's book. I am not making this up. I can only imagine how this has progressed since I was involved.

  90. Dave Dardinger
    Posted Jun 7, 2007 at 2:59 PM | Permalink

    The real limit of accuracy is the calibration accuracy combined with the accuracy of the division lines themselves (equal spacing, square and sharp), not the actual distance between division lines. One way to see this is to think about using verniers, which will allow you to get one or two additional digits of correct readings, assuming the main lines and the verniers are sufficiently accurate.

    Another way is to realize that in most cases a reading will be sufficiently far from the lines that you can be essentially 100% sure of the reading. Only when you’re near the line is there a possibility of observer error (except for the occasional recording error). BTW, if you want to take readings accurate to the nearest degree, you should score the lines on your thermometer on the half degrees.

    Finally, realize that if you’re averaging many readings, the observer errors above will tend to average out, allowing still more accuracy to be determined. But, of course, there will still be bias errors which won’t average out, so adding more than another digit is likely just wishful thinking.

  91. DeWitt Payne
    Posted Jun 7, 2007 at 4:16 PM | Permalink

    My question is: if you have a digital thermometer (as in the current instruments) that reads to 0.1 degrees, why would you round the reading to the nearest degree? IMO, that's unnecessary censoring of the data, causing a significant degradation in data quality.

  92. Neil Fisher
    Posted Jun 7, 2007 at 4:50 PM | Permalink

    I must admit firstly that some of this talk of errors is over my head. However, I would like to add the following:

    Firstly, the two parameters of precision and accuracy are quite different – you can take a reading with a precision of 0.5, but this gives no guarantee of accuracy to any degree. That is to say, your instrument may have a scale marked in 0.5 steps (which defines precision), but be incorrectly calibrated to the physical quantity you are measuring by, for example, 10 units! In this case, our precision is good, but our accuracy is not – do not be deluded into thinking that high precision = high accuracy!

    Secondly, there hasn't been any discussion of the various calibration errors that are possible. So far, I have only seen mention of "bias", which it seems to me considers only an offset error (i.e., the instrument reads 0.5 high at all readings). There are also linearity considerations (i.e., the instrument shows 0.5 when the true reading should be 0, but shows 110 when the true reading should be 100). The linearity curve does not need to be a straight line, and may be a complex curve (an "S"-shaped curve is quite common). Each individual instrument will be different. Well-designed and simple instruments such as thermometers would likely show, at a guess, a Gaussian distribution of linearity errors – can anyone quantify this?

    There are other considerations as well (such as monotonicity for electronic sensing), but the point of this post is to get you thinking about possible errors and how they might affect, most especially, regional and global averages – in other words, what errors “average out” and what errors “add up”?
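    As a rough Matlab sketch of which of these cancels (all shapes and numbers assumed): a pure offset drops out of an anomaly entirely, while an assumed S-shaped linearity error leaks a little spurious signal into the trend once the temperature distribution shifts.

    T_base  = 15 + 8*randn(3650,1);           % assumed temperatures in the "normal" period
    T_later = T_base + 0.5;                   % same weather, genuinely 0.5 C warmer
    offset  = @(T) T + 0.5;                   % instrument with a pure offset error
    scurve  = @(T) T + 0.3*tanh((T - 15)/8);  % instrument with an assumed S-shaped error
    mean(offset(T_later)) - mean(offset(T_base))   % exactly 0.5: the offset cancels in the anomaly
    mean(scurve(T_later)) - mean(scurve(T_base))   % slightly more than 0.5: the nonlinearity does not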

  93. John F. Pittman
    Posted Jun 7, 2007 at 6:20 PM | Permalink

    I always liked what my heat transfer professor did to show the difference between knowing and thinking you know how to get the correct answer. Many technical fields have to deal with significant figures and data accuracy. He would set a test problem where, if you used a calculator and carried all the digits (WHAT YOU THINK YOU KNOW) to the next step rather than carrying only the significant figures (WHAT IS ACTUALLY KNOWN), you would get the wrong answer at the end. He did this for a practical reason: to get his students to understand the limits of data. One of the limits posited in this thread and others can basically be stated as "garbage in, garbage out". The difference between random error and instrument drift is well known; it is the basis of NIST instruments. Instrument (or in situ) drift can't be averaged out; its bias remains even in the mean used for computations. Suppose we had a thermometer of 0.01C accuracy. We mass-produce it and set up hundreds of temperature stations around the world. After 20 years we detect that the average temperature is increasing and compute 0.1C per decade. Global warming…no. The instrument was never calibrated, and it has an uncorrected drift of 0.1C per decade. Microsites (in situ) can show the same problem. Take again our 0.01C-accurate thermometer, but now it is calibrated each year. After 20 years our computations again show a 0.1C per decade increase. Global warming…no. We added asphalt, moved the thermometer closer to a building, then air-conditioned the building, added a barbecue grill, and put in a light bulb that was sometimes left on so the station could be read more easily.
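    A tiny Matlab sketch of that drift scenario (noise level assumed), in which a flat climate plus an uncorrected 0.1C-per-decade instrument drift comes back out of the analysis as an apparent warming trend:

    years   = (0:19)';                 % 20 years of annual means
    true_T  = 14 + 0.05*randn(20,1);   % no real trend, just small year-to-year noise (assumed)
    drift   = 0.01*years;              % uncorrected instrument drift: 0.1 C per decade
    reported = true_T + drift;
    p = polyfit(years, reported, 1);
    p(1)*10                            % fitted "trend": roughly 0.1 C/decade, entirely spurious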

    The real point of all this is that if these errors have not been properly accounted for, the adjustments and computations will be in error. Yet, from what I read in this thread and others, the methods described for some temperature adjustments do not address drift or microsite problems at all. They either assume there is some general trend in the area that the surrounding sites can somehow make up for, or they assume all the causes of drift or micrositing have been explained away such that only random error remains. Not only are such assumptions not justifiable, the pictures show a totally different story. I hope posting a comment from a different thread that is relevant to data quality is allowed. In #9 at http://www.climateaudit.org/?p=1640 Eli states

    A quick addition. A single photo is a point in time…It is not that Anthony’s project is useless, it is that for climate science purposes it would not be very useful, of course for political purposes it would be very useful.

    As a professional who has done investigations for banks and lawyers, I could make Eli choke on his "droppings". Note in many of the pictures the approximate age of bushes, the asphalt, the layers of paint on buildings; notice whether the roof was recently retiled, the obvious age of the nearby power pole, etc. All this evidence can be used to show a multi-year heat influence on the station. But it does not stop there. Perhaps we should look at what humans do. Did you know that people who fly small planes love to take pictures, often posted at the small hangar, airport, etc. where the station is located? Guess what military planes do for practice? Nothing like shooting some film around the old base. Not to mention some excellent satellite pictures you can pay for. You can get dates from these pictures for influences you saw in another picture, and correlate times, changes, and events. It is so good that I have personally written reports that cancelled million-dollar loans and purchases, and have reviewed felony convictions based on this kind of evidence. Political?? In some areas, criminal. Which brings me to my last comment.

    Since 2002 we have had the OMB's Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies, Final Guidelines (corrected), 67 Fed. Reg. 8452 (Feb. 22, 2002), which require U.S. federal agencies to ensure they obtain and disseminate good data. Perhaps we should launch an initiative for the OMB to audit the agencies and the work of Karl and Hansen. It appears that they, or the agency (or agencies) they work for, are not complying with federal law: they said this around 1997, the guidelines went into effect in 2002, the work that Anthony is doing shows that what they admitted then is still true, and yet they are still publishing, commenting, etc., on what appears to be obviously flawed data.

  94. rrs
    Posted Jun 7, 2007 at 7:20 PM | Permalink

    hmmmm, i used a slide rule in college. no chance of carrying over extra digits. also learned error analysis much differently ….

  95. steven Mosher
    Posted Jun 7, 2007 at 11:01 PM | Permalink

    I’ve been thinking some more about Jones quote

    “The random error in a single thermometer reading is about 0.2C (1sigma) [Folland et al., 2001];
    the monthly average will be based on at least two readings a day throughout the month, giving 60 or more
    values contributing to the mean. So the error in the monthly average will be at most 0.2/√60 = 0.03C and this will
    be uncorrelated with the value for any other station or the value for any other month.”

    I think there is another, hitherto undiscussed, issue with it. The typical weather station makes
    two measurements a day: a max reading and a min reading. To get the daily "mean" you add the two
    and divide by two.

    So let's imagine the signal is a nicely shaped sinusoidal wave with a max of 1C and a min of -1C,
    and that when we measure the max there is a 1-sigma error of 0.1C, and the same when we measure the min.

    So the daily max will vary (+/- 3 sigma) from 0.7C to 1.3C and the daily min will vary from -0.7C to -1.3C.

    So our daily sum will vary between -0.6C and 0.6C and our daily mean between -0.3C and 0.3C
    (assuming the errors are not correlated, i.e. no bias towards always being high or low).

    Right? So basically, Jones is wrong, because while you make two measurements per day, those measurements
    are combined to give one mean measurement per day, and that mean has the same 1-sigma error of +/- 0.1C (in my simple
    example) as the two measurements used to make it up.

    Am I being stupid here?

  96. BlogReader
    Posted Jun 8, 2007 at 1:07 AM | Permalink

    Finally, realize that if you’re averaging many readings, the observer errors above will tend to average out, allowing still more accuracy to be determined.

    Is that correct? I’m thinking of an extreme example where you read 10 deg C each day (where you have ticks every 1degC) for a month. Are you now accurate to anything beyond 9.5 to 10.5degC?

  97. Posted Jun 8, 2007 at 1:18 AM | Permalink

    #96

    Quantization noise is signal dependent. Your example is a good illustration of this.

  98. Posted Jun 8, 2007 at 1:22 AM | Permalink

    #95

    I think you are correct: if they compute the midrange using noisy data and use the true midrange as a reference, then Brohan's 60 is the wrong number.

  99. Willis Eschenbach
    Posted Jun 8, 2007 at 2:55 AM | Permalink

    Steven Mosher, thank you for your post. In fact, assuming the errors are not correlated, the standard error of the mean is not 0.1 as you say in your example.

    Uncorrelated errors add "orthogonally". In general, the formula is

    error_{total}=\sqrt{error_1^2 +error_2^2}

    If each error is 0.1, the error of the sum of the two values will be

    error_{total}=\sqrt{2*0.1^2}=0.14

    and the error of the average will be half of that, or 0.07.
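    A quick Matlab check of this (a sketch, assuming independent, normally distributed 0.1 C errors on both readings):

    e1 = 0.1*randn(1e6,1);   % error in the max reading (assumed)
    e2 = 0.1*randn(1e6,1);   % error in the min reading (assumed)
    std(e1 + e2)             % about 0.14, i.e. sqrt(2)*0.1: uncorrelated errors add orthogonally
    std((e1 + e2)/2)         % about 0.07: the error of the two-reading daily mean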

    w.

  100. Posted Jun 8, 2007 at 4:08 AM | Permalink

    I withdraw #98, 60 is ok. This error is also signal dependent, but we’ll get 60 for the worst case.

  101. MarkW
    Posted Jun 8, 2007 at 4:49 AM | Permalink

    “A quick addition. A single photo is a point in time…It is not that Anthony’s project is useless, it is that for climate science purposes it would not be very useful, of course for political purposes it would be very useful.”

    Quick question.

    Wasn’t it Eli who was warning us not to let the perfect become the enemy of the good?

    Yet here he is, doing just that.

  102. steven Mosher
    Posted Jun 8, 2007 at 7:05 AM | Permalink

    re #99

    Thanks, it’s been a few years

  103. Melvin Jones
    Posted Jun 8, 2007 at 4:38 PM | Permalink

    If we're dealing with temperature changes over the Earth, with only a tiny fraction of the planet covered so as to measure it, and then we combine and combine and combine for a "50,000-foot view" of the issue, and all we can find is an average change of 1/2 of 1 percent a year in this merged data, I'd say that in order to really know what the heck is going on (not what we have now) we need really well-calibrated thermometers, accurate down to 0.001, located at least (say) 100 feet from outside influences. The readings need to be displayed to the same degree of accuracy. The records, and the places they are taken, need to be well documented, available, and open for anyone to check, and so on. Derived figures based upon adjustments for being placed where they shouldn't be are no good (perhaps a limit of 10% of the stations should be put in place?).

    By the way, the headline being as if this was news was satire, a joke, irony, whatever. “Geez, dude, chill out.”

    The emperor is wearing a leotard.

  104. Melvin Jones
    Posted Jun 8, 2007 at 4:44 PM | Permalink

    MarkW, the only question I ask is: if the study of one photo is not so worthwhile, why does anyone want to be so vocal about it? If it doesn't matter, why worry?

    Of course, this raises another question: doesn't everything have to start some place? It's not as if it couldn't be done every year, or every five or ten or whatever, after this first time.

  105. Posted Jun 8, 2007 at 7:50 PM | Permalink

    Perfection is the enemy of the good.

    True(ism).

    Better is the enemy of the good.

    ????

    Apparently, in the world view of Eli… True.

  106. RealityExplorer
    Posted Jun 22, 2007 at 12:08 AM | Permalink

    re:99

    In fact, assuming the errors are not correlated, the standard error of the mean is not 0.1 as you say in your example.

    Uncorrelated errors add "orthogonally". …and the error of the average will be half of that, or 0.07.

    Let's try a reality check here. Say measurement A is 5 +/- 0.1, i.e. from 4.9 to 5.1, and measurement B is 5 +/- 0.1, i.e. again from 4.9 to 5.1.
    Therefore the average is 5, with the average of the lows being 4.9 and of the highs 5.1 … i.e. the uncertainty is still 0.1.
    You don't magically gain precision by averaging the values. I don't know what the error distribution curve is like within that range (whether it's a normal distribution or not; that depends on how much of this is reading uncertainty or calibration uncertainty, etc.), so while, granted, it would likely bunch up towards the center and make 5 more likely to be the result, that can't be known without knowing the error distribution.

    I'd even wonder, regarding reading errors, whether there is any bias due to unconscious expectation among those who expect global warming, making them more likely to perceive the result as higher than it "is". Similar perceptual expectation biases have been observed in psychological experiments. This could lead to a self-fulfilling-prophecy effect: over time there would be an upward drift, since there would likely be (and has been) an increase in the number of people who expect global warming (or hope to see it to confirm their bias), and an increase in how strongly they believe it and how inclined they are to think about it as a potential factor that might affect their reading (versus the times when they know it's supposed to be a cold day and perhaps physically feel cold and are biased in that direction, or conversely on a hot day).

    btw, using measurements to represent such a large geographic area risks more errors than just the localized heat-island effects. For example, normal local weather patterns may shift, so that the jet stream or ocean currents move slightly over time relative to some fixed measurement points and alter temperature readings at one or more spots without necessarily changing readings elsewhere to compensate. The hope, presumably, is that such random shifts would cancel each other out, but I don't have enough information to assess whether random chance might lead to some fluctuation up or down, perhaps even for several years.

  107. Willis Eschenbach
    Posted Jun 22, 2007 at 3:22 AM | Permalink

    RealityExplorer, statistics is often counter-intuitive. In this case, you can’t just average the low and high values as you say in your post.

    The error value is a statement about the probable size of an error. In your example, if the value is 5 and the standard error is 0.1, there is about a 95% chance that the true answer is between 4.8 and 5.2. But the largest and smallest numbers are less probable. For example, there is about a 68% chance that the true value is between 4.9 and 5.1.

    Now, suppose we average two such values. If the errors are independent, some of the time they will cancel each other. And only occasionally will the two extreme values occur at the same time. Because half of the time the errors are in different directions (one negative, one positive), the average of the two values will have a narrower range of probable errors.

    This can be confirmed by the thought experiment of averaging say 10,000 such values, each with an error of 0.1. You will not get an average of 5 ± 0.1 as you claim. You will come out with an average of 5 plus or minus a very small number, because the errors will average out. There is nothing "magical" about this as you seem to think. As long as the errors are symmetrically distributed and are independent, you do gain accuracy by averaging them. In this thought experiment, the answer is 5 ± 0.001. The general formula for the average of a number of measurements of some value with the same standard error is

    error_{total} = \frac{\sqrt{N*error^2}}{N}=\frac{error}{\sqrt{N}}

    w.

  108. Misidentify
    Posted Jun 22, 2007 at 4:12 AM | Permalink

    When an adjustment, which has its own independent error value, is made to a set of observed values, how should the error value be calculated? In the case of some weather station measurements, a number of sequential adjustments are made; how then is the error value determined?

  109. Paul Linsay
    Posted Jun 22, 2007 at 6:40 AM | Permalink

    #107, Willis, what you say is true if you have a measurement resolution of 0.001 or better. It’s not true if your best resolution is only 0.1 because your measurements will not have a Gaussian distribution that can be averaged over. You will just measure 5 +- 0.1 every time and the error is 0.1.

  110. Steven B
    Posted Jun 22, 2007 at 8:11 AM | Permalink

    Paul,

    I think what you are trying to say is that quantisation errors are correlated, so if the true value is 5.028 you’ll get 5.0 or 5.1 reported depending on the observer’s personal biases, and if the true value changes to 5.032, you still get 5.0 and 5.1 reported with the same frequencies. The convergence of the mean is to the mean of the quantised distribution, and for the accuracy improvement to work, the relative frequency of reporting 5.0 or 5.1 would have to change slightly as the true value went from 5.028 to 5.032. In practice, whether people round up or down is not sensitive to the precise value – for such a value they will almost always round down: very strong correlation.

    This is clearly the case for a single quantity measured directly. But in climate measurements there is a whole range of temperatures that occur, and the quantisation error in each case will be different. One day the true value is 5.038, the next it is 7.212, the next 3.819. Now whether the quantisation pushes it up or down depends on whether those true values close to the half-way points change relative frequencies depending on the precise value; the unambiguous ones always get pushed the same way. Now I could believe 5.04 might get treated noticeably differently from 5.06, rounded down more often than up, but will 5.0499 really give a one-in-ten-thousand change in ratio compared to 5.0501? Or will it be down entirely to observer peculiarities, as to what they habitually do when the point is indistinguishable from the half-way point?

    It doesn’t matter whether the distribution is Gaussian or not – the central limit theorem applies to any distribution with finite variance. But it does very definitely matter that quantisation errors are correlated with the true value, and with each other. You can improve a little on the nominal quantisation by averaging, since some of the ambiguous cases can be influenced by where they are, but below a certain resolution no information can get through the filter of observation, and the accuracy cannot be improved beyond this point.

  111. Paul Linsay
    Posted Jun 22, 2007 at 9:30 AM | Permalink

    #110, you’re correct. In principle 5.032 should be read as 5.0 some of the time and 5.1 the rest of the time. In practice that doesn’t happen. Even when using an automatic system to record the data, it usually gets stuck on one value or the other. Adding a bit of random noise of the right amplitude can solve this problem. But as always, attempts to improve resolution will run into other errors as the resolution improves.
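    A minimal Matlab sketch of that trick (noise amplitude assumed equal to one resolution step): plain rounding of 5.032 to a 0.1 resolution gets stuck on 5.0, while dithered readings average back to roughly 5.032.

    true_val = 5.032;
    N = 1e5;
    plain    = round(true_val*ones(N,1)*10)/10;                 % every reading comes out 5.0
    dithered = round((true_val + 0.1*(rand(N,1)-0.5))*10)/10;   % add uniform noise of one step before rounding
    [mean(plain) mean(dithered)]                                % 5.0 versus roughly 5.032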

  112. RealityExplorer
    Posted Jun 22, 2007 at 11:13 AM | Permalink

    re: 110, Willis… I'd suggest you read my post more carefully. You were talking about statistics assuming, it appeared, a normal distribution.
    I was referring to an error range independent of the error distribution. The actual range of potential values is *not* magically reduced by averaging two values;
    it remains +/- 0.1. What may change is the distribution, which, as I noted, with certain error distributions such as a normal one would bunch up towards the middle (effectively what you were saying), increasing the probability that the result in the example is closer to 5. However, the point is that if you don't know the error distribution, all you can say for sure is that the error range is still +/- 0.1 (and regardless of the likelihood distribution of the "real value" within that result, that is still the range, even if the outlying values are less likely).

    The point was also that I'm curious whether, for reading errors, people actually know and take into account the true distribution of those errors. If there is a psychological basis for people to read higher over time, that increases the likelihood that the actual value is less than the individual readings, and less than the averages. If people are much more likely to read 5 rather than 4.9 due to an unintentional, unaware bias (e.g., even though it's not likely, pretend 99% of the time they'll read 4.9 as 5), then averaging values doesn't change this and the "real" value is likely 4.9 rather than 5. By not knowing the true error distribution, your method of using statistics to imply that the average value is somehow more likely to be 5, based on two biased readings, is wrong, misleading and an inappropriate use of statistics.

  113. RealityExplorer
    Posted Jun 22, 2007 at 11:57 AM | Permalink

    PS: though of course even the +/- 0.1 is an arbitrary cutoff. Also, of course, this assumes the reading of, e.g., analog thermometers; I don't know the
    distribution of those vs. digital. In addition there is some distribution of actual vs. reading errors due to calibration problems and to systematic effects, as people have noted regarding the urban heat island effect and other factors. Simply averaging values when there is, e.g., a systematic factor leading to higher readings does not magically get rid of that effect and make the result more likely to be true.

    I haven't looked into this in detail, but I'm curious how often the pressure/density of the atmosphere, its water vapor content, etc., are also measured at the same points and times within the record, so those factors can be taken into account; there is a meaningful difference in the weight that should be given to a temperature reading depending on the density of the atmosphere and the heat characteristics of the different substances within the atmosphere at that point.

    This is clear if you consider what the notion of a global temperature might conceptually mean, and its purpose. I assume that, in a sense, the global average temperature is conceptually a proxy for the energy contained within the system as a whole. In reality that energy is distributed unevenly over the surface (or actually throughout the volume of the atmosphere, but ignore that for the moment).
    Presumably the notion is that if you average the temperatures over equal-sized blocks covering the globe you'll get a meaningful average temperature for the whole. One factor I'm curious whether they've taken into account is that this notion seems to depend on there being an equal quantity of gas represented within each block, so that each block of the same physical size makes the same contribution to the total energy of the system. If the pressure and density of the gas vary, then each block does not make an equal contribution to the energy of the system as a whole.

    In some sense a global temperature average might also be considered a proxy for what the temperature would be if the whole system were well mixed and at equilibrium, with no fluctuations over the globe. The difficulty with this is that it assumes temperature readings at each point should be given equal weight when considering the inter-mixed temperature, which ignores the fact that pressure and density vary. To use an extreme that won't happen in reality to illustrate the point: pretend the atmosphere were twice as dense within the area of one temperature reading as within another. In that case the reading is a proxy for twice as much gas, and presumably, if everything were intermixed and at the same pressure, that amount of gas would have contributed disproportionately to the resulting average temperature. Also, of course, the whole concept ignores the temperature changes that would occur as gas is placed under more or less pressure if the atmosphere were mixed to be homogeneous.

    There is also the issue that each temperature measurement is a proxy for a large geographic area within which the temperature will vary. For example, pretend you took one fixed temperature reading to stand in as a proxy for the global temperature. Obviously that wouldn't be very accurate, and I'm sure there are points on the earth where that one temperature would show cooling, or warming, regardless of the rest of the world. Presumably localized weather patterns may change how that proxy site compares to a true average over every point within that area (El Nino, etc., may shift values locally). Obviously the hope is that these factors will randomly cancel out, but I'd be curious what the distribution of potential fluctuations is and how likely they are to all cancel out conveniently, especially given the small percentage of the surface (and ocean) covered over time and the self-selected nature of much of that coverage, tied to proximity to human settlements or trade routes, etc., whose distribution may change as weather patterns change.

    I'd also be curious how well they take into account variations in the distribution of temperature through the day versus the time the temperature was taken,
    given that temperatures are taken at the same clock time each day while the angle of the sun at that time varies seasonally. If the temperature were taken continually throughout the day there would be a distribution curve of temperatures, and taking, say, two readings in a day and averaging them isn't necessarily going to produce the real average for the day; it depends on the distribution of temperatures through the day (and as the seasons vary and localized weather changes, the location of those two measurements within the daily distribution curve will vary). Again, hopefully randomness will cancel out various factors, but what level of random fluctuation might potentially exist?
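    A short Matlab sketch of that last point (the diurnal shape is assumed): for an asymmetric daily temperature curve, the midrange (Tmax+Tmin)/2 and the true 24-hour mean can differ by several tenths of a degree.

    t = (0:1439)'/60;                                        % time of day in hours
    T = 10 + 5*sin(2*pi*(t - 9)/24) + 2*exp(-(t - 15).^2/8); % assumed skewed diurnal cycle
    midrange  = (max(T) + min(T))/2;                         % what (Tmax+Tmin)/2 reports
    true_mean = mean(T);                                     % the actual daily average
    [midrange true_mean midrange - true_mean]                % differ by roughly half a degree here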

  114. jae
    Posted Jun 22, 2007 at 12:07 PM | Permalink

    113: you may want to read this.

  115. Willis Eschenbach
    Posted Jun 22, 2007 at 3:53 PM | Permalink

    RealityExplorer, thanks for the clarification. Indeed, I was talking about statistical distributions of error values, while you were talking about extreme values. As you point out, provided we know the extreme limits in advance, they don’t change … but how often does that happen?

    In addition, that is not what is usually meant by a statement such as "5 ± 0.1", which almost always means a value of 5, with one standard error being 0.1.

    Your other comments (about pressure, lack of measurements, and distribution of temperatures during the day) are most relevant.

    w.