I am new to Kriging myself, so please correct me if I make any errors here. Steve McIntyre (who may be on the beach at the moment!) is far more knowledgeable, and has posted about the topic frequently on CA. See, for starters, “Toeplitz Matrices and the Stahle Treering Network”, 3/22/08, “Antarctic Spatial Autocorrelation #1″, 2/20/09, “Steig Eigenvectors and Chladni Patterns”, and follow-up posts.
Wikipedia has a useful post on Kriging, in which it draws a distinction between “Simple Kriging”, which assumes a known constant unconditional mean for the random field, and “Ordinary Kriging”, where the unconditional mean is unknown. In the former case, the predicted value at an observed point in space will be a convex combination of the unconditional mean and an average of the observed values, while in the latter case it will just be an average of the observed values, with weights summing to 1. We are mostly concerned with “Ordinary Kriging”, though the distinction should be kept in mind.
From Steve’s posts, I associate Kriging with a covariance matrix that has correlations between pairs of observations that decay exponentially with distance, with or without an observation-specific “nugget effect” added on. I am puzzled, therefore, that the Wiki article makes no mention of this type of spatial structure, which I suggest be called “exponential Kriging” when necessary. Instead, Wiki just assumes a general covariance matrix, or “generalized Kriging” as it might be called.
Exponential Kriging can either be covariance stationary (typically called isotropic in the spatial context), ie have the same variances for every point in space (and same covariances for every pair of points as a function of distance), or just correlation stationary, ie have the same correlations as a function of distance, but location-specific variances. The latter might be appropriate if deserts and forests, mountains and plains, polar and tropics, or land and sea turn out to have different variances.
For exponential Kriging on a geoid without nugget effects, the correlation between two observations x(s) and x(s’), made at points s and s’ on the sphere, just becomes an exponential function of the great circle angular distance θ between s and s’:
corr(x(s), x(s’)) = exp(-λ θ)
When doing exponential Kriging on a plane, one could imagine extending the plane to infinity, and thereby obtaining a perfect estimate of the unknown mean as the average (either efficiently weighted or inefficiently unweighted) over all observations, which equal the unknown mean plus disturbances that have the exponentially decaying covariance structure.
On a geoid, however, the surface is finite, and therefore we can never estimate the unknown mean with perfect precision. In fact, in a climate context, what we are actually interested in is the average of x(s) over the whole surface, not the unobserved mean. If we call this unobserved average X, we can write
x(s) = X + y(s),
where y(s) has mean zero and is what I will call the deviation of x(s) from X itself. Although the covariance structure of the deviations y(s) derives from that of the disturbances x(s) about the unobserved mean μ, it is somewhat different, since X has finite variance about μ and correlates with the x(s)’s.
The variance of X about μ is in general the double integral over the sphere of all the covariances, divided by the square of the area A = 4π. Under covariance stationarity, this reduces to the single integral of the covariances relative to any point, divided by A. If I have done the math correctly, this variance is
V = var(X) = (exp(-λπ)+1)/(2(λ2 +1)),
when θ is measured in radians and assuming a unit variance for the disturbances. (The symbol π is Greek pi in this font, not the Roman letter en. This formula may be derived from Integral #407 in the 16th edition of the CRC Math Tables.)
The variance of the deviations y(s) = x(s) – X is then 1-V, and the covariances of the deviations (relative to a unit variance for the disturbances) are
cov(y(s), y(s’)) = exp(-λθ) – V.
Dividing this by 1-V to obtain correlations, we get
corr(y(s), y(s’)) = (2(λ2 + 1)exp(-λθ) – exp(-λπ) – 1)/(2λ2 + 1 – exp(-λπ))
Figure 1 below plots the correlations of both the disturbances and the deviations as a function of distance for λ = 10/π, assuming 20,000 km from pole to pole:
When we look at correlations of deviations from estimated means, we are not looking directly at the correlations of the disturbances, even after allowing for a nugget effect. The former are systematically lower than the latter, and in fact even go negative at long distances, and so will provide a downward biased estimate of λ if used without this adjustment.
Kriging is named for South African mining engineer D.G. Krige (master’s thesis, 1951). Since it’s a proper name, I prefer to use an upper case K (as in Gaussian distribution), though Wiki uses the lower case.
Update 11/21/10. My OSU colleague over in the Statistics Dept., Noel Cressie, has a 1993 book, Statistics for Spatial Data, that provides a modern update of kriging.
He also has an interesting article on global kriging with Gardar Johannesson, “Fixed rank kriging for very large spatial data sets,” J. Royal Statistical Soc B (2008), vol. 70, 209-226. They argue against isotropic exponential kriging, and instead reduce the rank of all global possibilities by means of wavelet-like bisquare local deviation functions centered on a geodesic grid that can have a general covariance matrix. They end up with a 396X396 covariance matrix to estimate, but have 173,405 satellite-generated observations (on ozone concentrations) to fit it with. Their objective is then to interpolate to 51840 prediction points.