that they sited the station where they did.

There are many other better locations in that area outside of town that would meet all requirements.

The simplest example is that of a coin flip. If I flip a fair coin 1000 times I would expect to see the percentage of heads approach 50%, but I would be quite surprised if it were exactly 50%.
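A quick simulation (a sketch, not anyone's actual analysis) illustrates the point: the observed fraction of heads drifts toward 50% as the number of flips grows, but an exactly even split is uncommon.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

def heads_fraction(n_flips):
    """Flip a fair coin n_flips times and return the observed fraction of heads."""
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

for n in (10, 100, 1_000, 100_000):
    # The fraction approaches 0.5 as n grows, but is rarely exactly 0.5.
    print(f"{n:>7} flips: fraction of heads = {heads_fraction(n):.4f}")
```

The standard error of the observed fraction shrinks like 1/√n, which is the quantitative content of "the percentage approaches 50%".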

The question is, how large is large? If I move to a more complex variable, such as a six-sided die, how many rolls of the die do I need to have confidence that the observed mean is within some guardband of the expected mean of 3.5? If I make it two or more dice, how many additional rolls are required to converge on the expected mean, again within a specific guardband?

The number of dice rolls required for convergence does not change for loaded (biased) dice. If a die is loaded to favor six, for example, the expected mean might be 4.3, but 1000 or so rolls are still required to have confidence (guardbanded) in the observed mean.
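This can be checked directly: below, a fair die and a hypothetically loaded die (face probabilities chosen here to give an expected mean of about 4.3, matching the example) converge on their respective expected means at a similar rate.

```python
import random

random.seed(2)

FAIR = [1/6] * 6
# Hypothetical loading that favours six; expected mean ≈ 4.34 for these weights.
LOADED = [0.11, 0.11, 0.11, 0.11, 0.12, 0.44]

def mean_of_rolls(weights, n_rolls):
    """Roll a die with the given face probabilities and return the observed mean."""
    faces = random.choices(range(1, 7), weights=weights, k=n_rolls)
    return sum(faces) / n_rolls

expected_loaded = sum(f * p for f, p in zip(range(1, 7), LOADED))
print(f"fair   die, 1000 rolls: mean = {mean_of_rolls(FAIR, 1000):.3f} (expect 3.5)")
print(f"loaded die, 1000 rolls: mean = {mean_of_rolls(LOADED, 1000):.3f} "
      f"(expect {expected_loaded:.2f})")
```

The LLN converges on whatever the die's true expected value is; loading moves the target, not the number of rolls needed to hit it.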

When applying the LLN to temperature measurements, we need to think of the problem in many, many more dimensions. Here are some I can think of off the top of my head:

1) How many measurements of a specific thermometer by a specific observer are required to have confidence in the (guardbanded) accuracy of the observer’s reading of the thermometer?

2) How many measurements of temperature are needed in a single day to have confidence in the mean temperature? (For example, on a cold February day where the temperature jumps for a two-hour period, an artificially warm average would be registered if only the daily min/max are used.)

3) How many measurements are required in a specific grid cell to have confidence in the cell’s average?
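The scenario in item 2 is easy to make concrete. With invented numbers (a day at −10 °C except for a hypothetical two-hour spike to +5 °C), the (min+max)/2 convention lands far from the mean of 24 hourly samples:

```python
# Hypothetical cold February day: -10 °C all day except a two-hour spike to +5 °C.
hourly = [-10.0] * 24
hourly[13] = 5.0   # 13:00
hourly[14] = 5.0   # 14:00

true_mean = sum(hourly) / len(hourly)            # mean of all 24 samples: -8.75 °C
minmax_mean = (min(hourly) + max(hourly)) / 2    # the min/max convention: -2.5 °C

print(f"24-sample mean : {true_mean:.2f} °C")
print(f"(min+max)/2    : {minmax_mean:.2f} °C")
```

A 6 °C discrepancy from a single brief excursion, and no number of additional *days* averaged the same way removes it, because the min/max estimator is biased for this kind of day, not merely noisy.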

One might argue that millions of thermometer readings are available, but one must also recognize that those readings have space and time dimensions to them as well. If I want to compare the temperature of 1999 with 1899, are 365 daily averages from each year "large enough"? Are they independent enough (remember, the LLN is a probability theorem that requires independence)? Is the number of stations sampling the temperature on a given day in a given cell large enough? Etc.
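The independence question matters quantitatively. In the sketch below (invented AR(1) weather with persistence parameter `phi`, not real data), correlated days make the annual mean wander far more than the naive "365 independent samples" formula would suggest:

```python
import random
import statistics

random.seed(3)

def annual_mean(phi, n_days=365):
    """Mean of one simulated year of AR(1) anomalies: x[t] = phi*x[t-1] + noise."""
    x, total = 0.0, 0.0
    for _ in range(n_days):
        x = phi * x + random.gauss(0.0, 1.0)
        total += x
    return total / n_days

# Spread of the annual mean across many simulated years,
# for independent days (phi=0) versus persistent weather (phi=0.8).
for phi in (0.0, 0.8):
    means = [annual_mean(phi) for _ in range(500)]
    print(f"phi={phi}: stdev of the annual mean = {statistics.stdev(means):.3f}")
```

With persistence, the effective sample size is far smaller than 365, so the uncertainty of a year-vs-year comparison is several times larger than independence would imply.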

I moved from a cold climate to a warm one. Of course it's warmer!

In the end, there is a difference between:

1. weighing the same gold bar on the same calibrated scale over and over, and

2. weighing a fat lady on a yo-yo diet with a scale that gets hit by basketballs every odd Tuesday.

That is my statistical insight.

But this is getting ridiculous.

So, I suppose, using the LLN, if 1 million people tell me it was warmer 25 years ago than today… that even though they are imprecise instruments, the result is magically fixed by the LLN.

Should be just fine if you adjust for memory smoothing over time.

Cackle

You are quite correct. The quote to which you refer shows confusion of accuracy and precision. There are many forms of "measurement bias" or "inaccuracy" that are unaffected by the number of observations made. As a simple example, with your meter-long ruler: if the operator mistakenly chose one a foot long, all readings would have gross errors.

People are pretty crappy at guessing the temperature. They are an instrument, but they have a pretty big error. Still, as our friends tell us, the LLN fixes everything.

And if 800 people can guess the weight of a bull to one pound, then imagine what a million could do guessing temperature?

So, I suppose, using the LLN, if 1 million people tell me it was warmer 25 years ago than today… that even though they are imprecise instruments, the result is magically fixed by the LLN.

(A joke, of course. Or is it?)
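The joke has a precise core, which a sketch with invented numbers makes visible: averaging many imprecise observers shrinks their *random* error like 1/√n (the Galton ox-weighing effect), but a bias shared by all of them, such as everyone remembering the past as warmer, survives any crowd size.

```python
import random
import statistics

random.seed(4)

TRUE_TEMP = 20.0       # hypothetical true temperature, °C
SHARED_BIAS = 2.0      # hypothetical shared bias: everyone "remembers" it warmer
OBSERVER_NOISE = 5.0   # each individual guess is very imprecise (std dev, °C)

def crowd_mean(n_people):
    """Average the guesses of n_people noisy observers who share one bias."""
    guesses = [TRUE_TEMP + SHARED_BIAS + random.gauss(0, OBSERVER_NOISE)
               for _ in range(n_people)]
    return statistics.fmean(guesses)

for n in (10, 1_000, 100_000):
    # The mean converges — but to truth + bias, not to the truth.
    print(f"{n:>7} guessers: mean = {crowd_mean(n):.2f} °C (truth = {TRUE_TEMP})")
```

The LLN delivers a very precise estimate of 22 °C, which is still 2 °C wrong.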

I think a ruler marked with a resolution of one meter cannot be used to obtain an accurate measurement of an object one micron long, no matter how large the number of attempts. In this case "accurate" means that the difference between the actual length of the object and the mean of the reported measurements is a small fraction of the actual length of the object.

What am I missing? Thanks
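Nothing, in the noise-free case: if every honest reading is the graduation nearest the true length, every reading of a 1 µm object on a meter-graduated ruler is 0 m, and the mean of any number of identical readings is still 0 m. A minimal sketch:

```python
import statistics

TRUE_LENGTH = 1e-6   # a 1-micron object, in meters
STEP = 1.0           # ruler graduated in whole meters only

def reading(true_length, step):
    """An honest, noise-free reading: the graduation nearest the true length."""
    return round(true_length / step) * step

# Every reading is identical, so the mean never moves off 0 m,
# no matter how many measurements are averaged.
for n in (10, 1_000, 1_000_000):
    mean = statistics.fmean(reading(TRUE_LENGTH, STEP) for _ in range(n))
    print(f"mean of {n:>9} readings: {mean} m (true length: {TRUE_LENGTH} m)")
```

Averaging can beat an instrument's resolution only when noise comparable to the graduation step dithers readings across adjacent graduations; with no noise (or far too little, as here), the LLN just converges ever more confidently on the same quantized value.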

Under the assumption that all these sites (or at least nearly all) are compliant with the stated process and specifications, one could use more sampling to reduce sampling errors and, of course, eliminate the uncertainty of bias errors. Correction procedures are in place in attempts to reduce biases that can be detected when they occur over very short periods of time, but it is not clear how well they would work where many sites are out of compliance, particularly when changes occur over long time periods. I would guess that the assumption of nearly complete compliance is part and parcel of the calculation of uncertainties, and is why the specifications are developed and assumed to be adhered to.

A further concern of mine is not knowing how the (lack of) coverage uncertainty would be handled or calculated if the assumption of compliance (or of a proper and complete correction for non-compliance) at almost all sites were found not to hold.

A poster recently pointed to the obvious truism that one cannot test quality into a product, in reference to making adjustments to the data product; but I think in these cases we are not at all certain how much quality is in the product, or how much of the uncertainty calculations depend on an assumption of that quality.
