Comments on: Rule N and Weighted Regression

By: Geoff Sherrington

Geoff Sherrington — Fri, 21 Mar 2008 03:31:23 +0000

Re # 23 Hans Erren
Thank you. The more I contemplate this figure the more questions I have. If it was a fundamental part of the construction of adjustments, I’d think it worth a revisit. I can’t find the source either, & I’m not being lazy. Geoff.

By: Hans Erren

Hans Erren — Fri, 21 Mar 2008 00:57:16 +0000

re 16
I can’t find the source at this moment; if I remember correctly it was from Hansen.
AFAIK it’s a pairwise correlation, grouped in latitudinal bands. Of course stations can be more than 3000 km apart: in the EW direction!

By: Geoff Sherrington

Geoff Sherrington — Fri, 21 Mar 2008 00:37:30 +0000

Re # 21 Hu

Many thanks. I’m trying to help others avoid the tedium of math analysis if some of the fundamental inputs are doubtful. I re-read Steve’s article on station spacing (the one with the diverse map projections) and was none the wiser. We put years of computer team time into finding the right ways to interpolate ore grades and it is not a trivial subject. It has direct parallels with using paired stations to correct for surface temps and, as I have been saying for a long time, you simply cannot use a search distance of 1200 km for temp work. In short, there is no global surface temperature graph, land or sea, in which I have any faith at all. You can drive a big truck through many of the assumptions and corrections.

By: Hu McCulloch

Hu McCulloch — Thu, 20 Mar 2008 17:20:21 +0000

Re Geoff, #19,20,

Hans would be the person to answer these questions. I’d like to know if he did these himself or if not where he found them, as they’re very useful.

One thing I see here is that beyond about 3000 KM (27 deg), the correlations are mostly noise, so that it is better to try to constrain them to a functional form than to just take each correlation at face value. They pretty much die to 0 by 3000 KM, except perhaps for the tropics. There is a piecewise linear curve fit line if you look closely.

Another funny thing is that the northern latitude correlations converge on unity at 0 distance, while the tropical and southern ones don’t. Is this climatological, or does it just mean that tropical and southern stations often aren’t as reliable as northern ones?

Geoff is right that it might make a difference whether distance is measured NS or EW. Also, what is the frequency of this data? Daily data surely has much less correlation, so I’m guessing this is monthly or annual.
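Here is a minimal sketch of what "constraining the correlations to a functional form" could look like. Everything in it is an assumption for illustration: the data are synthetic, the exponential form exp(-d/L) is a stand-in (the graph under discussion uses a piecewise linear fit), and the names `fit_decay_length`, `true_L` and `candidates` are made up.

```python
import numpy as np

def fit_decay_length(dist_km, corr, candidates_km):
    """Grid search for the decay length L that minimises the squared
    error between observed correlations and exp(-dist / L)."""
    sse = [np.sum((corr - np.exp(-dist_km / L)) ** 2) for L in candidates_km]
    return float(candidates_km[int(np.argmin(sse))])

# Synthetic station pairs: separations up to 5000 km, noisy correlations.
rng = np.random.default_rng(0)
true_L = 1000.0                                  # assumed decay length, km
dist_km = rng.uniform(0.0, 5000.0, 400)
corr = np.exp(-dist_km / true_L) + rng.normal(0.0, 0.15, 400)

candidates = np.arange(200.0, 3001.0, 50.0)
L_hat = fit_decay_length(dist_km, corr, candidates)
print("fitted decay length (km):", L_hat)
```

Beyond a few decay lengths the individual points are almost pure noise, but the fitted curve still gives a usable correlation at every distance, which is the reason for fitting a form rather than taking each pairwise value at face value.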

By: Geoff Sherrington

Geoff Sherrington — Thu, 20 Mar 2008 11:19:16 +0000

Graph in #16 additional

Why are there so few points nestled beside the left Y-axis? The graphs indicate that there are very few cases of comparison stations being closer than about 200 km to each other.

“Curiouser and curiouser” as Dodgson would say.

By: Geoff Sherrington

Geoff Sherrington — Thu, 20 Mar 2008 08:46:10 +0000

Re # 16 Hu McCulloch

A quick look failed to find the answer to these questions.

In the series of graphs of correlation coefficient versus station separation, were the station pairs selected with a linear E-W search shape, or with some ellipsoid shape, or with a circle of up to 5000 km radius? Let’s assume the last of these.

Given that a half-circumference of the earth is 20,000 km, North Pole to South, if you divide this into 7 zones shown (should be 8, but Antarctic is empty) then each latitude band is about 3,000 km wide. But the points on the graph go out to 5,000 km, so they are sampling adjacent bands. Problem?

The next problem with the graph is that the number of pairs within a latitude zone drops off with distance after a while. I would have thought that the more distant the search, the more numerous the pairs available.

Final problem, a correlation coefficient below about 0.8 gets a bit ratty when plotted. In some of the graphed latitude bands, most of the coefficients are below 0.8 by far, so I personally would not have much regard for them. In fact, I’d read very little of use into this graph other than that the tropics behave differently from the poles.

Have you a reference to more of this type of work? Please don’t go to any trouble. I just seem to be missing some vital assumptions. I used to do similar things interpolating grades of ore deposits between drill holes.
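The band arithmetic above can be checked with a few lines. This is only a sketch: it takes the 20,000 km pole-to-pole figure from the comment, assumes equal-width bands, and the function names are made up. It also works out the largest great-circle separation that two stations at the same latitude can have, which bears on whether 5,000 km pairs must come from adjacent bands.

```python
POLE_TO_POLE_KM = 20_000.0             # half-circumference, as in the comment
KM_PER_DEG = POLE_TO_POLE_KM / 180.0   # about 111.1 km per degree of arc

def band_width_km(n_bands):
    """N-S width of each band if the pole-to-pole arc is split equally."""
    return POLE_TO_POLE_KM / n_bands

def max_separation_same_latitude_km(lat_deg):
    """Largest great-circle distance between two points at one latitude:
    they sit 180 degrees of longitude apart and the short route runs over
    the nearer pole, an arc of (180 - 2*|lat|) degrees."""
    return (180.0 - 2.0 * abs(lat_deg)) * KM_PER_DEG

print("band width, 7 bands (km):", round(band_width_km(7)))
print("band width, 8 bands (km):", round(band_width_km(8)))
print("3000 km in degrees of arc:", round(3000.0 / KM_PER_DEG, 1))
for lat in (0.0, 30.0, 60.0):
    print("max separation at latitude", lat, "(km):",
          round(max_separation_same_latitude_km(lat)))
```

On these assumptions a band is a little under 3,000 km wide N-S, yet two stations in the same band can be well over 5,000 km apart E-W at any latitude equatorward of about 67 degrees, so pairs at 5,000 km need not come from adjacent bands.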

By: Steve McIntyre

Steve McIntyre — Wed, 19 Mar 2008 15:24:32 +0000

#16. Hu, both methods contain variance matching steps. The difference is whether you have more or less equal weights or whether you weight proxies according to correlation to the trend.

I haven’t parsed some of the recent discussion and will try to spend some time on it.
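To make the equal-weights versus correlation-weights distinction concrete, here is a toy sketch. It is not the CVM or MBH code and makes no claim about how either is implemented: the data are synthetic, calibration and reconstruction periods are collapsed into one, and `variance_matched_composite` is a made-up name. The only point is that both composites end with the same variance matching step and differ only in the weights.

```python
import numpy as np

def variance_matched_composite(proxies, target, weights):
    """Weighted sum of standardized proxies, rescaled so that its mean and
    standard deviation match those of the target (variance matching)."""
    z = (proxies - proxies.mean(axis=0)) / proxies.std(axis=0)
    raw = z @ weights
    return target.mean() + target.std() * (raw - raw.mean()) / raw.std()

# Synthetic target and proxies with differing signal strength.
rng = np.random.default_rng(1)
n_years, n_proxies = 100, 8
target = np.cumsum(rng.normal(0.0, 0.1, n_years))
strength = np.linspace(0.0, 1.0, n_proxies)
proxies = target[:, None] * strength + rng.normal(0.0, 0.5, (n_years, n_proxies))

equal_w = np.ones(n_proxies)
corr_w = np.array([np.corrcoef(proxies[:, j], target)[0, 1]
                   for j in range(n_proxies)])

comp_equal = variance_matched_composite(proxies, target, equal_w)
comp_corr = variance_matched_composite(proxies, target, corr_w)
print("std of target, equal-weight, corr-weight composites:",
      target.std(), comp_equal.std(), comp_corr.std())
```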

By: Patrick Henry

Patrick Henry — Wed, 19 Mar 2008 15:17:37 +0000

Some problems with GISS temperature data for Death Valley.
http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=425746190010&data_set=1&num_neighbors=1

GISS shows temperatures from 1911-1921 between 21.5 and 26 C. The official records from 1922 show the average for that period to be much higher, at 27.3 C. Also note the photograph of a well-maintained and properly located station.

mwr-050-01-0010.pdf

By: Hu McCulloch

Hu McCulloch — Wed, 19 Mar 2008 15:11:10 +0000

UC, #12, writes,

Interesting comparison, CVM vs. MBH algorithm. Would you prefer to eat a newspaper or drink a bottle of glue? [smile] Juckes INVR is CCE with identity S, IIRC.

I'll take the "glue" any day! Variance Matching or "NLS" as I call it, a la Moberg et al, is statistical nonsense. The MBH data set has a lot of problems, as we are all aware, but at least their estimator, as described above, is an estimator, if only an inefficient one.

If the covariance matrix V of the first stage regression errors is known, then GLS with P = inv(V) is the efficient estimator of U. But any P, including P = I, gives an unbiased estimator. Of course, the variance of a non-GLS estimator requires knowing V, and falsely assuming that V = sigma^2 I will give you wrong answers, so you still have to come up with an intelligent estimate of V. If you have enough degrees of freedom to have reasonable confidence in this estimate of V, you may as well do GLS with it, and thereby improve your efficiency. (With small DOF, weights based on your estimate of V may be cherry picking randomly well-fitting proxies, but I'm willing to assume that away for now.)

Thanks for the Technometrics cite. I'll take a look at it.

Hans Erren, #13 writes, re my proposal to model the correlation as a simple declining function of the great circle distance between the two sites,

But only north of 30N and south of 30S. In the equatorial region the spatial correlation is poor.

Hans has already illustrated his point with an excellent graph he posted on 2/11/08 at comment #83 of Thread 2711. The graph itself is on his site at http://home.casema.nl/errenwijlens/co2/station_correlation.gif:

The relationship is indeed tighter north of 23.6N than in the tropics or southern latitudes. An ideal model should somehow take this different behavior into account, but I think that as a first approximation, it's better to assume that the relationship is universal than to just ignore it altogether.

The "gridding" beloved of climatologists is, if I understand it correctly, a clumsy attempt to take this correlation into account, by essentially assuming that the correlation is perfect within 5 degree gridcells (556 KM at the equator), and 0 between gridcells. MBH grid their data, I gather, presumably in an effort to "correct" for spatial correlation in this manner.
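A rough numerical sketch of the two ideas above: a covariance matrix V built from a declining function of great-circle distance, and a weighted estimator with P = I versus P = inv(V). All of it is assumed for illustration: the generic linear model y = X b + e stands in for the actual estimator, the exponential decay and its 1000 km length are arbitrary, and the site coordinates and regressor values are invented.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km (inputs in degrees)."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlon = np.radians(lon2) - np.radians(lon1)
    a = np.sin((p2 - p1) / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def distance_covariance(lats, lons, decay_km=1000.0, sigma=1.0):
    """V[i, j] = sigma^2 * exp(-d_ij / decay_km); an ASSUMED functional form."""
    lats, lons = np.asarray(lats, float), np.asarray(lons, float)
    d = great_circle_km(lats[:, None], lons[:, None], lats[None, :], lons[None, :])
    return sigma ** 2 * np.exp(-d / decay_km)

def weighted_estimate(X, y, P):
    """b = (X' P X)^-1 X' P y for a symmetric positive definite weight P."""
    XtP = X.T @ P
    return np.linalg.solve(XtP @ X, XtP @ y)

def estimator_covariance(X, P, V):
    """Covariance of the weighted estimator when the true error covariance is V."""
    XtP = X.T @ P
    A = np.linalg.solve(XtP @ X, XtP)   # the estimator is b = A @ y
    return A @ V @ A.T

# Six made-up sites (degrees) and a made-up regressor.
lats = np.array([40.0, 42.0, 45.0, 50.0, 35.0, 60.0])
lons = np.array([-100.0, -95.0, -80.0, 10.0, 140.0, 30.0])
X = np.column_stack([np.ones(6), [0.2, 0.5, 0.9, 1.4, 1.8, 2.5]])
beta = np.array([1.0, -0.5])

V = distance_covariance(lats, lons)
P_eye = np.eye(6)
P_gls = np.linalg.inv(V)

cov_eye = estimator_covariance(X, P_eye, V)
cov_gls = estimator_covariance(X, P_gls, V)
print("coefficient variances with P = I:     ", np.diag(cov_eye))
print("coefficient variances with P = inv(V):", np.diag(cov_gls))
```

With error-free data both weightings return `beta` exactly, which is the unbiasedness point; the variances differ, and the P = inv(V) ones cannot be larger. Gridding corresponds to replacing the smooth V with a block matrix of ones within a gridcell and zeros between cells.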

By: Patrick Henry

Patrick Henry — Wed, 19 Mar 2008 14:08:01 +0000

Some good NOAA records of raw temperature data from the 1930s, before any modern temperature manipulations. It appears that 1938 was a very hot year.
January 1938, February 1938, March 1938, April 1938, May 1938, June 1938, July 1938, August 1938, September 1938, October 1938, Dust storms, Forest fires