My OSU colleague over in the Statistics Dept., Noel Cressie, has a 1993 book, Statistics for Spatial Data, that provides a modern update of kriging.

He also has an interesting article on global kriging with Gardar Johannesson, “Fixed rank kriging for very large spatial data sets,” J. Royal Statistical Soc. B (2008), vol. 70, 209–226. They argue against isotropic exponential kriging, and instead reduce the rank of all global possibilities by means of wavelet-like bisquare local deviation functions centered on a geodesic grid that can have a general covariance matrix. They end up with a 396 × 396 covariance matrix to estimate, but have 173,405 satellite-generated observations (on ozone concentrations) to fit it with. Their objective is then to interpolate to 51,840 prediction points.
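A toy numpy sketch may make the rank-reduction point concrete. Everything here (dimensions, basis matrix, data) is invented for illustration; the one real ingredient is the Sherman-Morrison-Woodbury identity, which is what lets an n of 173,405 be handled with only an r × r inverse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: r basis functions, n observations (the paper's scale
# is r = 396 and n = 173,405; these sizes are purely illustrative).
r, n, n_pred = 10, 500, 50

S = rng.standard_normal((n, r))        # basis functions at data sites
S0 = rng.standard_normal((n_pred, r))  # basis functions at prediction sites
K = np.eye(r)                          # r x r covariance of basis coefficients
sigma2 = 0.5                           # measurement-error variance
z = S @ rng.standard_normal(r) + np.sqrt(sigma2) * rng.standard_normal(n)

# Sherman-Morrison-Woodbury: with Sigma = sigma2*I + S K S', we have
# Sigma^{-1} z = (1/sigma2) * (z - S (sigma2 K^{-1} + S'S)^{-1} S' z),
# so only an r x r matrix is ever inverted, never n x n.
A = np.linalg.inv(sigma2 * np.linalg.inv(K) + S.T @ S)
Sigma_inv_z = (z - S @ (A @ (S.T @ z))) / sigma2

# Fixed-rank kriging predictor at the prediction sites: S0 K S' Sigma^{-1} z
y_hat = S0 @ (K @ (S.T @ Sigma_inv_z))
print(y_hat.shape)  # (50,)
```

At the paper's scale the saving is the whole game: inverting a dense 173,405 × 173,405 matrix is infeasible, while a 396 × 396 inverse is trivial.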

All in all, I would say there were some nuggets of information and insights here.

A “silly” metaphor, if you ask me! :-)


Last year someone on CA or another science blog did a time-lapse view of these fronts as viewed from above the South Pole. There were about six of them rotating year after year around the pole, which again produces correlations whose significance eludes me. The distance part of the correlation (corresponding to latitude) gets smaller as you approach the pole, so maybe the correlations should be examined on a rotation-time base rather than a distance base.

In general, I suspect that a geostatistical semivariogram of temperatures from close-spaced stations, taken on a given day, would reach a sill after a few km. Smoothing to monthly data would give a greater range, and annual smoothing greater again. One of the reasons I’m pushing this approach is to put numbers on these assertions and then to contemplate what they mean. Personally, I see little useful information in the Della-Marta data graphs linked above, to the extent that their use could lead to invalid methods being adopted. Ditto the early Hansen equivalents. Correlations of 0.5 or below at 1,000 km do not excite me; they are near my definition of noise. The negative correlations might simply reflect that highs are followed by lows rotating around the globe and that these in turn are linked to temperatures. I’m surprised that this effect does not show more prominently. Maybe it is buried in latitude dependence.

This makes the problem one of estimating the temperature at a given point on the globe, which then pulls in the use of local temperature estimation models. I simply note two things in passing: 1) this brings in additional known information relevant to temperature estimation, such as surface characteristics, position on the globe, etc.; and 2) these models are not necessarily dependent on having the complete time series to build. However, this does introduce further uncertainty into the estimates of the sample temperature, and I’m sure that those cleverer than I can then calculate the optimum sample size to maximize the quality of the final estimate (a function, no doubt, of the quality of the local temperature estimation models).

It would also be interesting to think about whether it is better to estimate the vector of global temperatures in time by drawing vector samples, or to estimate each year’s temperature independently.

I know less about USA Football cheer squads than I do about statistics, so I don’t think this will have much impact on the debate either!

Hu, gridding is of course not limited to shapes bounded by lats and longs. Equal-area hexagonal tiling could be used, even at the expense of having an odd-shaped leftover in a region where there are no temperature records anyhow. I think the assumptions you might have to use would have a tiny effect on the outcome. The broader field of geostatistics (as opposed to the kriging part of it) accommodates different block sizes within an ore deposit. I’m keen for recent specialists to look at the problem from start to finish, rather than just a chapter or topic within geostatistics.
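As one concrete (and much cruder) equal-area alternative to hexagonal tiling, here is a hypothetical binning scheme: fixed-height latitude bands, with the number of longitude cells per band scaled by cos(latitude) so each cell covers roughly the same area. The function name and parameters are mine, not from any standard package:

```python
import numpy as np

def equal_area_cell(lat_deg, lon_deg, band_height_deg=5.0):
    """Assign a (band, cell) index on a roughly equal-area grid:
    fixed-height latitude bands, with the number of longitude cells
    per band shrinking as cos(latitude). A simple stand-in for
    hexagonal or geodesic tiling."""
    n_bands = int(180.0 / band_height_deg)
    band = min(int((lat_deg + 90.0) // band_height_deg), n_bands - 1)
    mid_lat = -90.0 + (band + 0.5) * band_height_deg
    # ~360/5 = 72 cells at the equator, fewer toward the poles.
    n_cells = max(1, int(round((360.0 / band_height_deg) *
                               np.cos(np.radians(mid_lat)))))
    cell = int((lon_deg % 360.0) // (360.0 / n_cells))
    return band, cell

# A station near the equator vs. one at 80N: far fewer cells per band
# near the pole, so each polar cell covers a comparable area.
print(equal_area_cell(0.0, 10.0), equal_area_cell(80.0, 10.0))
```

The leftover-shape problem remains at the poles, but as noted above, those are regions with few temperature records anyway.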

Maybe I would not be good at stimulating one of those USA Football cheer squads, because this debate has not roared into life, but that does not mean I do not appreciate beauty.

My instinct would be first to define the problem at hand: what are we attempting to estimate, and what inferences do we want to make from it? The appropriate model and statistical analysis would then flow from that.

I have to agree. Maybe it’s too many years of having to design systems that actually work. I just fail to see how you can come up with a system to solve a problem without first defining the problem to be solved and then fully defining the requirements of the system to solve it. Otherwise, as you said, you wind up with “… a technique looking for an application…”.

I suppose that kriging was mentioned here as a potential alternative method for interpolating global temperatures or perhaps just as an interesting method of interpolation.

In climate science the critical estimate appears to be the CIs for temperatures, both regional and global, monthly and seasonal, maximum and minimum. It is my opinion that climate scientists have to make a stab at determining these uncertainties in the instrumental temperature records so that their conclusions concerning climate models, as well as the calibrations/verifications for reconstructions of historical temperatures, can be properly evaluated. It is also my opinion that these estimations have not been all that satisfying and are subject to assumptions.

Below I have listed some links to articles/information about estimating CIs for temperature measurements and interpolations.

What I have found in my calculations is that the spatial temperature relations change over time and are affected by factors such as altitude, proximity to large bodies of water and, of course, distance separation. Also, a surprising feature for those who have not attempted to put together a continuous temperature series for 5 × 5 degree grids is the lack of such grids with a reasonable population over any extended period of time. There is no doubt that lots of interpolation is required to fill in the “empty spaces”, and therefore the uncertainty of those interpolations is critical to any regional and global temperature trends.

People complain about the grid concept, but at some point a grid or effective grid is required in order to assign a temperature to a given area unless, of course, we can have measurements over infinitesimal areas.

What about spatial relationships amongst temperature-measuring stations changing over time, and a method for handling that? Could one practically have a model for the spatial relationship that changes periodically (even annually) for correlating station and/or grid temperatures? How refined a model does one need to include all the effects of the factors known to influence the spatial correlation of temperatures, like distance, altitude, proximity to bodies of water and perhaps latitude? Finally, what is the proper way of testing the validity of these models? Use pseudo-temperature grids from a climate model? By withholding data from grids that have a large number of station measurements?

Since kriging, as it is applied in geostatistics, does not deal with the time factor, I would guess that we could compare the methods used there to those of temperature interpolation. For that purpose, I would think an expansion of what Geoff Sherrington posted here, and more discussion of it, is in order. Has anyone used kriging or some similar statistical method that includes time to study the effects of plate tectonics on ore deposits?

But simply aggregating temperatures in predetermined lat/long grid cells is no different to aggregating the phone numbers in a telephone book.

Gridding isn’t quite that bad, though it is a very inefficient way to estimate global temperature.

Hansen/GISS gridding, as I understand it, is equivalent to assuming a constant covariance throughout each 5° “square”. Missing squares within 1200 km (about 10.8°) of at least one gridsquare with at least one reading are then infilled with the equal-weighted average of those gridsquares. This is equivalent to assuming a constant, positive covariance between gridsquares that are within 1200 km of one another.

However, since occupied gridsquares are not influenced by nearby occupied gridsquares, this constant inter-gridsquare correlation must be essentially zero (say .01). Also, since the influence of a gridsquare on its neighborhood does not increase with the number of stations it contains, there must be no nugget effect, so the intra-gridsquare correlations are effectively assumed to be 1.00.
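The infill rule described above can be sketched directly. This is a reading of the comment's description (equal-weighted average of occupied squares within 1200 km), not GISS's actual code, and the station anomalies are hypothetical:

```python
import numpy as np

EARTH_R = 6371.0  # mean Earth radius, km

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in km."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlon = np.radians(lon2 - lon1)
    cosang = (np.sin(p1) * np.sin(p2) +
              np.cos(p1) * np.cos(p2) * np.cos(dlon))
    return EARTH_R * np.arccos(np.clip(cosang, -1.0, 1.0))

def infill(empty_center, occupied, radius_km=1200.0):
    """Equal-weighted average of the anomalies of occupied gridsquares
    whose centers lie within radius_km of the empty square's center;
    NaN if none qualify. `occupied` is a list of (lat, lon, anomaly)."""
    vals = [v for (lat, lon, v) in occupied
            if great_circle_km(empty_center[0], empty_center[1],
                               lat, lon) <= radius_km]
    return float(np.mean(vals)) if vals else float("nan")

# Hypothetical anomalies: two squares within 1200 km, one far away.
occ = [(60.0, 10.0, 1.0), (65.0, 20.0, 3.0), (10.0, 100.0, -5.0)]
print(infill((62.5, 15.0), occ))  # averages only the two nearby squares
```

Note what the rule omits: no distance weighting within the 1200 km radius, and no down-weighting of a square by how few stations it contains, which is exactly the zero-nugget, step-function covariance structure described above.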

Furthermore, since the 5° “gridsquares” in fact become narrow wedges in northern and southern latitudes, the distortions become more extreme near the poles — stations in Scandinavia that are relatively clustered get disproportionately little weight, while an isolated station in Siberia or Greenland gets inordinately high weight.

If the true correlations are anything like exponential with a .2 nugget factor and 1/e distance of about 1200 km (see the paper Mosh cited, Caesar, Alexander and Vose, http://hadobs.metoffice.com/hadghcnd/HadGHCND_paper.pdf ), this is a very inefficient way to compute a global average. Its properly computed variance (using the isotropic exponential model) would be much higher than if the appropriate exponential model had just been used in the first place.

The covariance structure implicit in gridding isn’t quite isotropic, since it depends on where you are in your “square”, and how square your “square” is. However, it could be approximated near the equator with a correlation function that is unity out to 2.5° (about 280 km), then .01 out to 1200 km, and then dropping to zero itself beyond 1200 km. Near the poles, the areas of the inner circles then shrink in proportion to cos(latitude).
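The two correlation assumptions being contrasted, the piecewise gridding approximation just described and the exponential-with-nugget model mentioned earlier, can be written down side by side (a sketch; the parameter values are those quoted in the comments):

```python
import numpy as np

def gridded_corr(d_km):
    """Correlation implicit in equatorial 5-degree gridding as described
    above: unity within ~280 km, 0.01 out to 1200 km, zero beyond."""
    d = np.asarray(d_km, dtype=float)
    return np.where(d <= 280.0, 1.0, np.where(d <= 1200.0, 0.01, 0.0))

def exp_corr(d_km, nugget=0.2, scale_km=1200.0):
    """Exponential model with a nugget: (1 - nugget) * exp(-d / scale)
    for d > 0, and exactly 1 at d = 0."""
    d = np.asarray(d_km, dtype=float)
    return np.where(d == 0.0, 1.0, (1.0 - nugget) * np.exp(-d / scale_km))

d = np.array([0.0, 100.0, 500.0, 1200.0, 2000.0])
print(gridded_corr(d))
print(np.round(exp_corr(d), 3))
```

Plotting the two against distance shows the mismatch at a glance: the gridded version overstates correlation inside the square, then collapses to near zero, while the exponential model decays smoothly through the whole range.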
