Kriging on a Geoid

Geoff Sherrington and others on the First Difference Method post have requested a post for discussing Kriging.

I am new to Kriging myself, so please correct me if I make any errors here. Steve McIntyre (who may be on the beach at the moment!) is far more knowledgeable, and has posted about the topic frequently on CA. See, for starters, “Toeplitz Matrices and the Stahle Treering Network”, 3/22/08, “Antarctic Spatial Autocorrelation #1”, 2/20/09, “Steig Eigenvectors and Chladni Patterns”, and follow-up posts.

Wikipedia has a useful post on Kriging, in which it draws a distinction between “Simple Kriging”, which assumes a known constant unconditional mean for the random field, and “Ordinary Kriging”, where the unconditional mean is unknown. In the former case, the predicted value at an observed point in space will be a convex combination of the unconditional mean and an average of the observed values, while in the latter case it will just be an average of the observed values, with weights summing to 1. We are mostly concerned with “Ordinary Kriging”, though the distinction should be kept in mind.

From Steve’s posts, I associate Kriging with a covariance matrix that has correlations between pairs of observations that decay exponentially with distance, with or without an observation-specific “nugget effect” added on. I am puzzled, therefore, that the Wiki article makes no mention of this type of spatial structure, which I suggest be called “exponential Kriging” when necessary. Instead, Wiki just assumes a general covariance matrix, or “generalized Kriging” as it might be called.

Exponential Kriging can either be covariance stationary (typically called isotropic in the spatial context), ie have the same variances for every point in space (and same covariances for every pair of points as a function of distance), or just correlation stationary, ie have the same correlations as a function of distance, but location-specific variances. The latter might be appropriate if deserts and forests, mountains and plains, polar and tropics, or land and sea turn out to have different variances.

For exponential Kriging on a geoid without nugget effects, the correlation between two observations x(s) and x(s’), made at points s and s’ on the sphere, just becomes an exponential function of the great circle angular distance θ between s and s’:

corr(x(s), x(s’)) = exp(-λ θ)

When doing exponential Kriging on a plane, one could imagine extending the plane to infinity, and thereby obtaining a perfect estimate of the unknown mean as the average (either efficiently weighted or inefficiently unweighted) over all observations, which equal the unknown mean plus disturbances that have the exponentially decaying covariance structure.

On a geoid, however, the surface is finite, and therefore we can never estimate the unknown mean with perfect precision. In fact, in a climate context, what we are actually interested in is the average of x(s) over the whole surface, not the unobserved mean. If we call this unobserved average X, we can write

x(s) = X + y(s),

where y(s) has mean zero and is what I will call the deviation of x(s) from X itself. Although the covariance structure of the deviations y(s) derives from that of the disturbances x(s) about the unobserved mean μ, it is somewhat different, since X has finite variance about μ and correlates with the x(s)’s.

The variance of X about μ is in general the double integral over the sphere of all the covariances, divided by the square of the area A = 4π. Under covariance stationarity, this reduces to the single integral of the covariances relative to any point, divided by A. If I have done the math correctly, this variance is

V = var(X) = (exp(-λπ)+1)/(2(λ² +1)),

when θ is measured in radians and assuming a unit variance for the disturbances. (The symbol π is Greek pi in this font, not the Roman letter en. This formula may be derived from Integral #407 in the 16th edition of the CRC Math Tables.)

The variance of the deviations y(s) = x(s) – X is then 1-V, and the covariances of the deviations (relative to a unit variance for the disturbances) are

cov(y(s), y(s’)) = exp(-λθ) – V.

Dividing this by 1-V to obtain correlations, we get

corr(y(s), y(s’)) = (2(λ² + 1)exp(-λθ) – exp(-λπ) – 1)/(2λ² + 1 – exp(-λπ))
[corrected]

Figure 1 below plots the correlations of both the disturbances and the deviations as a function of distance for λ = 10/π, assuming 20,000 km from pole to pole:

Figure 1

When we look at correlations of deviations from estimated means, we are not looking directly at the correlations of the disturbances, even after allowing for a nugget effect. The former are systematically lower than the latter, and in fact even go negative at long distances, and so will provide a downward biased estimate of λ if used without this adjustment.

Kriging is named for South African mining engineer D.G. Krige (master’s thesis, 1951). Since it’s a proper name, I prefer to use an upper case K (as in Gaussian distribution), though Wiki uses the lower case.

Update 11/21/10. My OSU colleague over in the Statistics Dept., Noel Cressie, has a 1993 book, Statistics for Spatial Data, that provides a modern update of kriging.

He also has an interesting article on global kriging with Gardar Johannesson, “Fixed rank kriging for very large spatial data sets,” J. Royal Statistical Soc B (2008), vol. 70, 209-226. They argue against isotropic exponential kriging, and instead reduce the rank of all global possibilities by means of wavelet-like bisquare local deviation functions centered on a geodesic grid that can have a general covariance matrix. They end up with a 396X396 covariance matrix to estimate, but have 173,405 satellite-generated observations (on ozone concentrations) to fit it with. Their objective is then to interpolate to 51840 prediction points.

This entry was written by Hu McCulloch, posted on Aug 26, 2010 at 1:24 PM, filed under Uncategorized and tagged kriging, mcculloch. Bookmark the permalink. Follow any comments here with the RSS feed for this post. Both comments and trackbacks are currently closed.

40 Comments

Nick Stokes

Posted Aug 26, 2010 at 4:28 PM | Permalink

Wiki says that you get to the covariance function by estimating the variogram. In practice this is often done with various parametric models. The first of these listed by Wiki is the exponential model. The second, “spherical variogram” model may be of interest here.
- Jeff Id
  
  Posted Aug 26, 2010 at 5:27 PM | Permalink
  
  I was wondering the same thing. I’ve never kriged, it’s all the rage in Europe I hear. Everything in my non-climate background I can think of came from a gridded matrix.
  - Patagon
    
    Posted Aug 28, 2010 at 12:30 PM | Permalink
    
    Then you may like MacNoleg(96) comments on the issue:
    
    Click to access oleg.pdf
    
    😉
- Hu McCulloch
  
  Posted Aug 26, 2010 at 10:30 PM | Permalink
  
  Thanks, Nick — I see that the Wiki Variogram article discusses the exponential model, but still the Kriging article itself does not even mention it.
  
  It seems from the Variogram article that in geostatistics the property I was calling “covariance stationary” (as in time series models) is more elegantly known as “isotropic”.
Hans Erren

Posted Aug 26, 2010 at 5:48 PM | Permalink

Jeff, Kriging is very common all over the world in mineral exploration for interpolation of geochemical samples.

google geochemical kriging for a treasure trove of papers.
Louis Hissink

Posted Aug 26, 2010 at 7:15 PM | Permalink

During the middle 1970’s at WMC’s Kambalda Nickel Operations, I was involved with trying to understand how Kriging actually worked and whether or not it was a suitable replacement for the polygonal ore-reserve method then in use.

I wrote some Fortran programs to compute the variogram and after extensive modelling we discovered that the variogram range was the distance greater than which data was random, and when the distance was less than the variogram range, linearly correlated, or close to.

The general discipline is geostatistics, or I prefer geomathematics re Fritz Achtberg’s text on the topic. Mind you geostatistics has limitations – De Beers tried to use it to estimate the ore grades for the Naimbian beach mines (diamond) and the best minds could not get it to work. To this day it is impossible to calculate the ore-grade of an alluvial diamond deposit from the collected samples.

Allied to geostistics is the idea of factoring a measurement with an area of influence representing the physical object being studied. This introduces the idea of intensive and extensive variables, and the pitfalls when using intensive variables in isolation as is routinely done in gridding routines (averaging temperature readings within a 5min lat/long grid, for example).

As I often written here and elsewhere, the way a global mean temp is computed using gridding methods is fundamentally biased and plain wrong. What they have ended up computing is the GMT of a spherical surface that has not relevance to physical reality.
- Hu McCulloch
  
  Posted Aug 26, 2010 at 9:47 PM | Permalink
  
  I agree that gridding is pretty primitive, but what’s GMT? Surely not Greenwich Mean Time or the Giant Magellan Telescope — http://en.wikipedia.org/wiki/GMT_(disambiguation)
pete m

Posted Aug 26, 2010 at 10:12 PM | Permalink

i think he means global mean temperature

ie gmt of spherical surface – ie 1 level, not the whole volume etc.
- Hu McCulloch
  
  Posted Aug 26, 2010 at 10:22 PM | Permalink
  
  Thanks, Pete — I see that Louis used it in the same paragraph, so I should have caught that! It’s what I called X in my post.
David Watt

Posted Aug 27, 2010 at 3:26 AM | Permalink

Kriging as I understand it was used in South Africa and elsewhere to assess the extent of localised variance in mineral deposits(the nugget effect). It allowed you to take account of this local variance when assessing the accuracy of your estimate of the grade of the ore for a particular section of the mine.

In the fluid media of climate science (air or water) if we assume that the ground stations are well placed away from heat islands, extreme local variance is unlikely. By and large the assumption that a temperature data point is representative of area around it will be reasonably accurate.

This assumption of course will start to fall down in areas (like the Antarctic) with few data points and with those there are clustered in unrepresentative locations.

But even here Kriging will not be of much help.

There may perhaps be some way statistically of using Kriging for analysing the temperature record considering the many data points that are in heat islands as being equivalent to”nuggets”. But I do not think so as the meteorologists in effect are choosing to place most of their data points within known nuggets that can in no way be considered “random”.

Unless I am missing something radical,it looks very much as if Kriging is a dead end.
- Geoff Sherrington
  
  Posted Aug 27, 2010 at 6:52 AM | Permalink
  
  David, Kriging breaks down when the nugget effect becomes too severe. Up to that point, for more uniform ore deposits, it is widely used and can give excellent results. The nugget effect is much more severe than UHI. With nuggests, you are dealing with spot samples that are up to orders of magnitude above the mean.
  - WillR
    
    Posted Aug 27, 2010 at 7:48 AM | Permalink
    
    Re: Geoff Sherrington (Aug 27 06:52), I think so too. Probably quite useful for a large copper porphyry — probably not so useful in venous systems or with nuggets, conglomerate etc.
    
    At least that’s what our chief geologist tells me when I look at the data.
  - Hu McCulloch
    
    Posted Aug 27, 2010 at 11:37 AM | Permalink
    
    Geoff Sherrington —
    
    Kriging breaks down when the nugget effect becomes too severe.
    
    Kriging assumes the nuggets have mean zero, and so is misled by positive mean UHI nuggets.
    
    Kriging only breaks down if there is no spatial correlation to the disturbances net of the nugget effect. Of course if the nuggets are so noisy that the spatial correlation appears to be zero, there is no point to it. But in climate data, there is strong spatial correlation.
    
    In most of the graphs in Steve’s posts I linked at the beginning of the post, there is a visible nugget effect, but it is rarely more than 30% of the zero-distance covariance. Somewhere Hansen has a graph showing strong correlation in GHCN data out to about 1200 km. Does anyone have a link offhand?
    - Steven Mosher
      
      Posted Aug 27, 2010 at 12:00 PM | Permalink
      
      Hansen 87 I believe, but there is more recent better work.
      Referenced by me and kennth over at Id’s site a while back
      
      arrg
      
      Click to access HadGHCND_paper.pdf
      
      note that the spatial correlation is also a temporal correlation ( your weather yesterday may be my weather today) so the corrleation can be angle dependent.. ( think downstream as opposed to cross stream..
    - Hu McCulloch
      
      Posted Aug 27, 2010 at 1:13 PM | Permalink
      
      Steve —
      Thanks for the Caesar, Alexander & Vose reference (JGR 2006).
      
      They show that their estimated 1/e distance (their “Correlation Length Scale” or CMS) varies from 400-1500 km, depending on month, latitude, and whether you’re looking at Min or Max temps.
      
      Their Fig. 2 for 60-90 dN summers shows a zero-distance limiting correlation of about .8, for a 20% nugget factor. It crosses 1/e at about 600 km, but then crosses zero and presumably goes negative after 1500 km.
      
      I would argue that these negative correlations for deviations from the global average (GMT :-)) are a natural and valid consequence of the assumption that the underlying but unobserved disturbances have a necessarily positive exponential correlation structure. They mean that you get bonus points for adding a station that is out of range of your existing stations! (Sort of like adding a stock with negative “beta” to your portfolio.)
      
      Even aside from the subtle disturbance/deviation distinction, CAV should be estimating their CLS where the correlation hits 1/e (36.8%) of the 0-distance limit (.8 in their Fig 2), not where it hits 1/e itself.
    - Steven Mosher
      
      Posted Aug 28, 2010 at 4:11 AM | Permalink
      
      Re: Hu McCulloch (Aug 27 13:13),
      
      Kenneth and I have both tried to generate some discussion around the paper with no luck.
      
      Can you expand on this
      I would argue that these negative correlations for deviations from the global average (GMT 🙂 ) are a natural and valid consequence of the assumption that the underlying but unobserved disturbances have a necessarily positive exponential correlation structure. They mean that you get bonus points for adding a station that is out of range of your existing stations! (Sort of like adding a stock with negative “beta” to your portfolio.
- Peter D. Tillman
  
  Posted Aug 27, 2010 at 2:47 PM | Permalink
  
  Re: David Watt (Aug 27 03:26), Nugget effect
  
  The literal Nugget Effect was first encountered when trying to calculate ore reserves in placer gravels. I was peripherally involved in such an effort many years ago, when I was working for a company that was trying to mine deep gravels in the old Yuba River (California) placers — near where Anthony Watts lives, IB.
  
  Anyway, you sample low-grade placer gravels with BIG samples, and calculate ore reserves using polygons, ims. Even then, accuracy of predicted grades wasn’t great, as I recall. The project went into production — they rebuilt an old Yuba dredge boat with a longer, narrower bucket-line — but never made much money. Incidentally, the industry standard then for placer gold grade reports was milligrams Au per cubic yard of gravel.
  
  Cheers — Pete Tillman
rich

Posted Aug 27, 2010 at 4:07 AM | Permalink

Here http://www.kriging.com/pg1979_download.html is a free book on the subject. It’s very down-to-earth (this is a joke – laugh now) and I liked the treatment of “variograms”. You might even buy the up-to-date version.
P. Solar

Posted Aug 27, 2010 at 7:17 AM | Permalink

>> Kriging is named for South African mining engineer D.G. Krige (master’s thesis, 1951). Since it’s a proper name, I prefer to use an upper case K (as in Gaussian distribution), though Wiki uses the lower case.
>>

Gauss is a name , a proper noun, gaussian is an adjective and should not be capitalised. Equally verbs are not capitalised so kringing would be correct.
- Hu McCulloch
  
  Posted Aug 27, 2010 at 11:39 AM | Permalink
  
  I’ll give you bowdlerize, but would you insist on victorian, shakespearian, or american?
  - Earle Williams
    
    Posted Aug 27, 2010 at 1:33 PM | Permalink
    
    I’ll stick my neck out here and postulate that when a verb is created from a proper noun (e.g. to bork), its usage requires lower case. A resulting adjective or gerund will be lower case. For an adjective crafted directly from a proper noun (e.g. Malthusian) the adjective will always be spelled with an initial upper case letter.
    
    How’s that for ad hoc arm waving? 🙂
    - Hu McCulloch
      
      Posted Aug 27, 2010 at 6:44 PM | Permalink
      
      Earle Williams
      Posted Aug 27, 2010 at 1:33 PM | Permalink | ReplyI’ll stick my neck out here and postulate that when a verb is created from a proper noun (e.g. to bork), its usage requires lower case. A resulting adjective or gerund will be lower case. For an adjective crafted directly from a proper noun (e.g. Malthusian) the adjective will always be spelled with an initial upper case letter.
      
      How’s that for ad hoc arm waving?
      
      I’ll Google it. 😉
Geoff Sherrington

Posted Aug 28, 2010 at 1:03 AM | Permalink

A geostatistics blog thread was suggested for the global temperature problem because (a) there is wide use of the method in mining and (b) it is usually possible to estimate its performance by comparing pre-mining estimates with post mining ore recovery. It is some years since I was involved with it, please read the reference at the end of this note.

For a start, ignore the time domain. It’s all space and stationary.

Within a single drill hole, there might be chemical analysis every metre for 2 or more elements or compounds (like copper and gold). Geostatistics methodology is applied to determine via a semivariogram, the distance at which an analysis value ceases to be useful in predicting a more distant value. This distance or range affects the spacing between subsequent drill holes, so that interpolation of grade can be made between holes with some estimate of confidence. It becomes apparent whether the deposit is isotropic (the grade drops away similarly in all directions) or more complex, such as when stratigraphic layering is involved.

In the simple case, one can derive a search sphere that is figuratively moved through the deposit, using a weighted formula to assign more or less importance to a point approaching the fringe of the sphere. This process has the general description of “kriging”. More often, the search shape is an ellipsoid of rotation, or even a 3D ellipsoid with non-orthogonal axes. The search object can also have changed properties like orientation, size and weighting when encountering different geologies known to host different grades or metals.

The geostatistics method then chooses those blocks which are of economic ore grade and those that are waste. There are practical boundaries to the size of blocks, which can be of any solid shape that tiles to leave no gaps between blocks. The shape of the to-be-mined portion is calculated in such a way that a minimum of low grade or waste blocks intrude into the ore block pattern. This is especially sensitive at the edges of a diffuse deposit.

The pre-mining model thus gives the shape of the excavation, which blocks will go to the waste dump and which will go to the processing plant. Commonly, there is more than one metal, so different stockpiles are defined (say one for copper and one for gold).

The analogy between mining and global temperature can start with a single station over a year of daily readings of Tmean, or to get into multivariate statistics, Tmax and Tmin. The latter is conceptually similar to a drill hole with 365 one-metre analyses for two elements. The same interpolation method used for the single drill hole can be used here to derive estimates of missing values. As a check, a cross-variogram can be done between Tmax and Tmin to see that they agree within limits that have evolved over time and testing elsewhere. (An outlier finder, if you like).

There is another weather station nearby. How useful is it to infill data still missing from the first station? The situation resembles the ore case of two adjacent drill holes and interpolation between them.

Next, we have a number of weather stations that operated during that year. Kriging methods can be used to interpolate between them and in turn to estimate optimum sizes and shapes for grid cells. It is not automatic that a X minute x Y minute x Z metre cell is a good a priori assumption. It might be convenient for the size of the computer or for the station density, but the geostatistics method should be able to place better confidence limits on the properties of whatever grid cell shape is used – and reject cells where there is inadequate data.

The geoid can be selected as the shape of the mine analogue. The mine is essentially an eggshell shaped layer 1 m thick (or more, if you wish to blend in lower troposphere satellite temperatures or radio sonde readings or natural altitudes). For a given year, a global Tmax and Tmin can be calculated and confidence limits assigned.

Time then enters the equation. In the climate case, a simulation is done for each year for each station. The method has some flexibilities. If there is a station move, a new station number is created and it enters the data set for later years. If there is a period without observations that is too long to infill, the infilling is done to some extent when the assemblage of years is combined to show global temperature movement with time.

The end result depends on assumptions made along the way. Philosophically, kriging is no better than a guess when infilling missing data. There are subjective choices – does one use only stations with 100 years of near-continuous data, or does one throw all available data into the melting pot? In theory the latter approach is possible with geostatistics methodology, though it might throw some back. With contemporary computing power and software, it is possible to apply a range of assumptions to evaluate mathematically, the mean assumption, the deviation from the mean (which gives a certainty mathematically expressed.)

There might well be better ways to approach climate using geostatistics than I have roughly outlined here. The suggestion of treating a year of weather data as a drill hole analogue might not be optimum; it might be better to work in the time domain and treat each weather station over the years as analogous to a single drill hole.

The purpose of this post is not so much to educate – see the reference below for a start – as to try to capture the attention of the many people who worked with quite advanced geostatistics but have not made the mental link to global climate reconstruction. People like those at the UK Met Office who are embarking on a clean slate exercise might like to tap into this rich vein of potential calculation.

Geostatistics have been used for purposes other than mining. See Isobel Clark’s book at http://www.kriging.com/books.html
Rainstat: another data set courtesy of Soil, Water and Climate. This is some
summary statistics on rainfall in South Africa and Namibia over the last 70 years. Lots of variables, lots of non-stationarity and LOTS of samples. This file comes with a companion boundary file, outlining the Republic of South Africa.
Johnsen, G.A., Mortensen, D.A.„ and Gotway, C.A. 1996. “Spatial
and temporal analysis of weed seedling populations using geostatistics”. Weed Science, 44:704-710.

In the field of geostatistics, some people prefer certain approaches and certain texts, so a universal recommendation here will draw fire. For an introductory and older book online, try

http://www.kriging.com/PG1979/
- Kenneth Fritsch
  
  Posted Aug 29, 2010 at 9:55 AM | Permalink
  
  I have not been able to get this post posted so I will remove a couple of links and try again and post the remaining links in another post.
  
  I suppose that kriging was mentioned here as a potential alternative method for interpolating global temperatures or perhaps just as an interesting method of interpolation.
  
  In climate science the critical estimate appears to be the CIs for temperatures both regional and global, monthly and seasonal, maximum and minimum. It is my opinion that climate scientists have to make stab determining these uncertainties in the instrumental temperature records in order that their conclusions concerning climate models can be properly evaluated as well as calibrations/verifications for reconstructions of historical temperatures. It is also my opinion that these estimations have not been all that satisfying and subject to assumptions.
  
  Below I have listed some links to articles/information about estimating CIs for temperature measurements and interpolations.
  
  What I have found in my calculations is that the spatial temperature relations change over time and are affected by factors such as altitude and proximities to large bodies of water and, of course, distance separation. Also a surprising feature for those who have not attempted to put together a continuous temperature series for 5 X 5 degree grids is the lack of such grids with a reasonable population over any extended period of time. There is no doubt that lots of interpolation is required to fill in the “empty spaces” and therefore the uncertainty of those interpolations is critical to any regional and global temperature trends.
  
  People complain about the grid concept, but at some point a grid or effective grid is required in order to assign a temperature to a given area unless, of course, we can have measurements over infinitesimal areas.
  
  What about spatial relationships amongst temperature measuring stations changing over time and a method for handling that. Could one practically have a model for the spatial relationship that changes periodically (even annually) for correlating station and/or grid temperatures? How refined of a model does one need to include all the effects of factors that are known to influence the spatial correlation of temperatures like distance, altitude, proximity to bodies of water and perhaps latitude. Finally what is the proper way of testing the validity of these models? Use pseudo temperature grids from a climate model? By withholding data from grids that a large number of station measurements?
  
  Since kriging, as it is applied to geostatistics, does not deal with the time factor, I would guess that we could compare the methods used there to those of the temperature interpolation. For that purpose, I would think an expansion of the post that Geoff Sherrington posted here and more discussion of it is in order. Has anyone used kriging or some other similar statistical method, that includes time, to study the effects of plate tectonics on ore deposits?
  
  Click to access Shen.pdf
  
  http://journals.ametsoc.org/doi/pdf/10.1175/JCLI4121.1
Geoff Sherrington

Posted Aug 28, 2010 at 2:26 AM | Permalink

Hu, Weather correlations ovr large distances were also covered by Autst. Met. Mag. 53 (2004) 75-93
Updating Australia’s high-quality annual temperature dataset
Paul Della-Marta and Dean Collins
National Climate Centre, Bureau of Meteorology, Australia
and
Karl Braganza
CSIRO Atmospheric Research, Aspendale, Australia
(Manuscript received June 2003; revised September 2003)

One graph is at
http://i260.photobucket.com/albums/ii14/sherro_2008/d-m05.jpg?t=1282980266

I have a paper copy and part of an OCR version, but the OCR program does not cope well with the layout for reasons that only OCR programs know.
- Hu McCulloch
  
  Posted Aug 29, 2010 at 6:29 AM | Permalink
  
  Thanks, Geoff —
  
  The Australian correlation graph you link shows correlations falling from about .8 at the 0 limit, to about 1/e of this at 1200 km. But then there is a distinct increase in correlations (especially in MAX temps) out to 2500 km.
  
  Since any plausible isotropic (covariance stationary) model would have correlations decreasing with distance, this must mean that Australia is not isotropic. I suspect that most of the higher correlations beyond 1200 km are coming from pairs of points that are on the coast, and therefore commonly influence by ocean current temperatures.
  
  A similar increase to positive values appears in the Antarctic data — see links at head of post. Again, this is probably due to circumpolar currents.
  
  The beauty of an isotropic model, even if it isn’t precise, is that you don’t have to measure temperature everywhere in order to compute a plausible covariance matrix and infill the missing area.
  
  The very novel part of Steig 09 was that they had an AVHRR matrix that enabled them to compute weather covariances (if not accurate temperatures) for the entire continent of Antarctica, in 50 km squares. This covariance matrix (used in its entirety without taking PCs) beatifully delineates the Transantarctic mountain range and shows that W. Ant is relatively weakly correlated with E. Ant, even where it is relatively close. (Unfortunately, I lost some nice plots of these that I made last year in a disk crash.)
  
  An isotropic exponential model would have made Scott/Amundsen at the S. Pole (which has a long record but shows little trend) dominate the adjacent part of W. Ant. during the 38-year absence of Byrd station. But the S09 AVHRR matrix shows that most of W. Ant is more strongly correlated with Rothera/Adelaide or McMurdo than with Scott/Amundsen.
  
  Unfortunately, by taking only the first 3 PCs of the AVHRR covariance matrix, S09 lost a lot of the detail in this structure, and somehow gave the extreme peninsula stations (dominated by 6 stations on King George Island 100 km from the peninsula itself) disproportionate influence in W. Ant.
  - Geoff Sherrington
    
    Posted Aug 30, 2010 at 3:50 AM | Permalink
    
    Hu, Agreed, but with reservations. For some seasons the Australian continent has a succession of east-moving fronts (particularly south of the Tropic) that come through with a regularity that must show on the correlations.
    
    Last year someone on CA or another science blog did a time lapse view of these fronts as viewed from above the South Pole. There were about 6 of them rotating year after year around the pole, which again produces correlations whose significance eludes me. The distance part of the correlation (corresponding to latitude) gets smaller as you approach the pole, so maybe the correlations should be examined on a rotation time base rather than a distance base.
    
    In general, I have suspicions that a geostatistical semivariogram of temperatures from close-spaced stations taken on a given day would give a sill after a few km. If you smooth to monthly data, the range would be greater, then annual smoothing greater again. One of the reasons I’m pushing this approach is to put numbers on these assertions and then to contemplate what they mean. Personally, I see little useful information in the Della-Marta data graphs linked above, to the extent that their use could lead to invalid methods being adopted. Ditto the early Hansen equivalents. Correlations of 0.5 or below at 1,000 km do not excite me. They are near my definition of noise. The negative correlations might simply reflect that highs are followed by lows rotating around the globe and that these in turn are linked to temperatures. I’m surprised that this effect does not show more prominently. Maybe it is buried in latitude dependence.
Geoff Sherrington

Posted Aug 28, 2010 at 7:25 AM | Permalink

As an example of ore estimation then confirmation, we have the Ranger One, Number One open cut orebody 250 km East of Darwin.

The total size of the No. 1 Orebody prior to mining was calculated, using a cut-off grade of 0.05% U3O8, to be 22.159 Mt averaging 0.259% U3O8.

A total of 19.78 Mt ore averaging 0.32% U3O8 was mined from the No. 1 open cut before closure about a decade after the estimate.
Gordon Melville Ford

Posted Aug 28, 2010 at 11:47 AM | Permalink

During the early years of open pit mining at the Jmestown Mine in California the head grade at the mill was 90% of the Kriged reserve estimate but the tonnage recovered per bench was 110% of the estimate. Mining dilution was higher than anticipated as ore/waste boundries that were clearly visible in drill holes and underground working became blurred after blasting. Our estimate of shovel selectability was too optimistic.
Louis Hissink

Posted Aug 28, 2010 at 1:20 PM | Permalink

Surely all that is needed is to weight each and every temperature station with an area representing the landuse type – and adopt the polygonal reserve system used in the industry. I’ve noticed that in GIS packages it is termed Voronoi polygons etc.

The crucial thing is to make sure that the temperature metric is used to factor an intensive variable, here area, to do the aggregating to get a global mean temperature, or

(a1xt1 + a2xt2 +…+ anxtn)/(a1+a2+…+an)= Temp.

And no data infilling is permitted, and of course everytime a station is moved or closed the areas have to be recomputed, not a trivial task given what actually happened over time. (In fact I don’t think we could even do that given the lack of data), so well it comes down to reality, I suspect we are unable to reconstruct the historical temperature metric globally over time.

But simply aggregating temperatures in predetermined lat/long grid cells is no different to aggregating the phone numbers in a telephone book. Both calculations can be used to add, albeit specious, authenticity to the verbal virtuosity of the intellectuals selling a particular policy.
- Hu McCulloch
  
  Posted Aug 29, 2010 at 6:57 AM | Permalink
  
  But simply aggregating temperatures in predetermined lat/long grid cells is no different to aggregating the phone numbers in a telephone book.
  
  Gridding isn’t quite that bad, though it is a very inefficient way to estimate global temperature.
  
  Hansen/GISS gridding, as I understand it, is equivalent to assuming a constant covariance throughout each 5° “square”. Missing squares are then infilled if they are within 1200 km (about 10.8 deg) of at least one gridsquare with at least one reading, with the equal-weighted average of those gridsquares. This is equivalent to assuming a constant, positive covariance between gridsquares that are within 1200 km of one another.
  
  However, since occupied gridsquares are not influenced by nearby occupied gridsquares, this constant inter-gridsquare correlation must be essentially zero (say .01). Also, since the influence of a gridsquare on its neighborhood does not increase with the number of stations it has, there must not be a nugget effect, so that the effective intra-gridsquare correlations are effectively assumed to be 1.00.
  
  Furthermore, since the 5° “gridsquares” in fact become narrow wedges in northern and southern latitudes, the distortions become more extreme near the poles — stations in Scandinavia that are relatively clustered get disproportinately little weight, while an isolated station in Siberia or Greenland gets inordinately high weight.
  
  If the true correlations are anything like exponential with a .2 nugget factor and 1/e distance of about 1200 km (see the paper Mosh cited, Caesar, Alexander and Vose, http://hadobs.metoffice.com/hadghcnd/HadGHCND_paper.pdf ), this is a very inefficient way to compute a global average. Its properly computed variance (using the isotropic exponential model) would be much higher than if the appropriate exponential model had just been used in the first place.
  
  The covariance structure implicit in gridding isn’t quite isotropic, since it depends on where you are in your “square”, and how square your “square” is. However, it could be approximated near the equator with a correlation function that is unity out to 2.5° (about 280km), then .01 out to 1200 km, and then dropping to zero itself beyond 1200 km. Near the poles, the areas of the inner circles then shrink in proportion to cos(latitude).
HAS

Posted Aug 29, 2010 at 2:28 AM | Permalink

I have the feeling that this is a technique looking for an application wrt to global temperatures.

My instinct would be to first define the problem in hand – what are we attempting to estimate and what inferences do we want to make from that. The appropriate model and statistical analysis would then flow from that.

Possible questions include: Has the surface temperature of the globe increased over the last 100 years? Has the sea, land surface and atmosphere increased its net energy content over this period? To what extent has human activity contributed to either? How does the temperature/net energy compare to predictive models at a sub-global level?

In each case you’ll need different models and different tests.

It is the failure to be explicit and systematic about these issues that has caused problems for climate science. Just finding another test (with its underlying model) ain’t going to solve this. Better to start with the problem and the inferences involved and work from there.
- Geoff Sherrington
  
  Posted Aug 29, 2010 at 5:49 AM | Permalink
  
  HAS – You are partially correct. As a long time reader of CA, I’ve assumed that others follow the reasons for wanting to try a differnt mathematical approach, especially one that has proven itself useful in other applications. There is a small army of mathematicians out there who do geostatistics at advanced level every day. I’m trying to get readers here to get them involved, friend to friend.
  
  Indeed, I have strong reservations about the need for a global temperature and its time trends, but that is about the most common measurement used to drive policy. If it can be calculated more accurately and its confidence limits firmed up better than now, we would have a gain.
HAS

Posted Aug 29, 2010 at 2:29 AM | Permalink

Should probably have ended:

IMHO

🙂
Joe Crawford

Posted Aug 29, 2010 at 11:31 AM | Permalink

HAS (Aug 29 02:28),

My instinct would be to first define the problem in hand – what are we attempting to estimate and what inferences do we want to make from that. The appropriate model and statistical analysis would then flow from that.

I have to agree. Maybe it’s too many years of having to design systems that actually work. I just fail to see how you can come up with a system to solve a problem without first defining the problem to be solved and then fully defining the requirements of the system to solve it. Otherwise, as you said, you wind up with “… a technique looking for an application…”.
- Geoff Sherrington
  
  Posted Aug 29, 2010 at 8:37 PM | Permalink
  
  Joe, I agree, but please put the posts in context with what has been discussed before. In seeking a better method to calculate global temperature variation over the last 150 years (as a broad objective). A first step in proposing a different approach is to see if any arguments knock it stone dead before you waste the time.
  
  Hu, gridding is of course not limited to shapes bounded by lats and longs. Eqi-area hexagonal tiling should be able to be used, even at the expense of having an odd-shaped leftover in a region where there are no temperature records anyhow. I think the assumptions you might have to use would have a tiny effect on the outcome. The broader field of geostatistics (as opposed to the kriging part of it) accommodates different block sizes within an ore deposit. I’m keen for recent specialists to look at the problem from start to finish, rather than just a chapter or topic within geostatistics.
  
  Maybe I would not be good at stimulating one of those USA Football cheer squads, because this debate has not roared into life, but that does not mean I do not appreciate beauty.
  - HAS
    
    Posted Aug 30, 2010 at 2:02 AM | Permalink
    
    Well I don’t really know enough about statistics to know how best to do this, but I assume the classic approach to calculating global temperature would be simply to draw a random sample of temperatures at positions across the globe and calculate it from that. The larger the sample size the lower the SD and the better the estimate.
    
    This makes the problem one of estimating the temperature at given point on the globe, and this then pulls in the use of local temperature estimation models (that I simply note in passing two things: 1) this brings in additional known information relevant to temperature estimation such as surface characteristics, position on globe etc and 2) these models are not necessarily dependent on having the complete time series to to build). However this does introduce further uncertainty into the estimates of the sample temperature, and I’m sure that those clever than I can then calculate the optimum sample size to maximize the quality of the final estimate (a function no doubt of the quality of the local temperature estimation models).
    
    It would also be interesting to think about whether it is better to estimate the vector of global temperatures in time by drawing vector samples, or to estimate each year’s temperature independently.
    
    I know less about USA Football cheer squad than I do about statistics, so I don’t think this will have much impact on the debate either!
Kenneth Fritsch

Posted Aug 31, 2010 at 9:18 AM | Permalink

Although I would have preferred a more in depth discussion of kriging as it is applied to geostatistics, I have been able to learn from the points made in this thread on spatial interpolation, and particulary those made by Hu McCulloch. They have fit well with some of my concerns and questions.

All in all, I would say there were some nuggets of information and insights here.
- Hu McCulloch
  
  Posted Aug 31, 2010 at 6:38 PM | Permalink
  
  All in all, I would say there were some nuggets of information and insights here.
  
  A “silly” metaphor, if you ask me! 🙂
Hu McCulloch

Posted Nov 21, 2010 at 6:26 PM | Permalink

I just added the following update to the post:

My OSU colleague over in the Statistics Dept., Noel Cressie, has a 1993 book, Statistics for Spatial Data, that provides a modern update of kriging.

He also has an interesting article on global kriging with Gardar Johannesson, “Fixed rank kriging for very large spatial data sets,” J. Royal Statistical Soc B (2008), vol. 70, 209-226. They argue against isotropic exponential kriging, and instead reduce the rank of all global possibilities by means of wavelet-like bisquare local deviation functions centered on a geodesic grid that can have a general covariance matrix. They end up with a 396X396 covariance matrix to estimate, but have 173,405 satellite-generated observations (on ozone concentrations) to fit it with. Their objective is then to interpolate to 51840 prediction points.