Hansen's "Bias Method"

John Goetz sent me some scanned excerpts from Hansen and Lebedeff 1987 which (only somewhat) clarify what Hansen’s doing – although the descriptions below do not explain the Kalakan problem of how the combined station record becomes colder than any of the measurements in the period. There are somewhat different issues in combining versions of a record that differ only through scribal errors and in combining records from different stations – a distinction that Hansen didn’t discuss (and which I’ll try to get to). But bear in mind that both situations arise and may call [almost certainly call] for different techniques. The method that one gets used to in climate analyses is the “anomaly method”, in which a reference period is established (say 1961-1990) and then deltas (“anomalies”) are calculated from monthly averages over the reference period.
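For contrast, here is a minimal sketch of the conventional anomaly method (my own illustration, not drawn from any of the papers discussed here; the function name and the simple array interface are assumptions made for the example):

```python
import numpy as np

def monthly_anomalies(years, months, temps, ref=(1961, 1990)):
    """Anomaly-method sketch: subtract the station's own 1961-1990 mean for
    each calendar month.  A station with no data in the reference period
    cannot be centered and is left as NaN."""
    years, months, temps = (np.asarray(a, float) for a in (years, months, temps))
    anoms = np.full(temps.shape, np.nan)
    for m in range(1, 13):
        this_month = months == m
        in_ref = this_month & (years >= ref[0]) & (years <= ref[1])
        if in_ref.any():
            anoms[this_month] = temps[this_month] - temps[in_ref].mean()
    return anoms
```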

Closer analysis of these excerpts confirms my perception that Hansen isn’t using the anomaly method. Instead, Hansen uses what he describes, with perhaps unintended irony, as the “bias method”. I’ll try to explain the method and then provide an example suggesting that his nomenclature may be quite apt. Again, none of these comments are written in stone, as it’s hard to work out what they actually do in the absence of either adequate documentation or source code, and it’s easy to get wrong-footed.

Hansen and Lebedeff 1987

The essence of the “Bias Method” is stated as follows:

We would like to incorporate the information from all of the relevant available station records. … We calculate the mean of both records for the period in common, and adjust the entire second record T_2 by the difference (bias) \delta T . … The zero point of the temperature scale is arbitrary.

The situation that I want you to picture (and which I’ll illustrate below) is a long urban record which Hansen wants to adjust in light of a series of generally shorter rural records. (As soon as you express it this way, you can almost immediately see the catch.) So how does this differ from the anomaly method? The anomaly method uses a fixed reference period (say 1961-1990) to center all records. If you’ve got a rural record for 1880-1920, another for 1960-1980 and a third for 1970-2000, and your reference period is 1961-1990, the anomaly method doesn’t allow you to use records that can’t be centered on that period (here, the 1880-1920 record). So what Hansen does (appears to do) is to center the shorter series on the mean of the target (urban) series during the overlap period. More on this below.

Hansen describes the benefit of his method in HL87 as follows:

A principal advantage of this method is that it uses the full period of common record in calculating the bias \delta T between the two stations. Determination of \delta T is the essence of the problem of estimating the area-average temperature change from data at local stations. A second advantage of this method is that it allows the information from all nearby stations to be used provided only that each station have a period of record in common with another of the stations. An alternative method commonly used to combine station records is to define \delta T by specifying the mean temperature of each station as zero for a specific period which had a large number of stations, for example, 1950-1980; this alternative method compares unfavorably to ours with regard to both making use of the maximum number of stations and defining the bias \delta T between stations as accurately as possible.

Hansen then describes an iterative procedure in which stations are added one at a time, with each interim composite used as the benchmark for the next series. He observed that experiments indicated the results were not very sensitive to the order in which stations were added. He says that the same method was used for combining different scribal versions.
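As far as I can reconstruct it, the core step is simple enough to sketch. The following is my own reading of the description, not Hansen’s code; the function name, the equal per-year weighting and the longest-first ordering are assumptions based on the excerpts quoted here, and the distance weighting used in HL87 is omitted:

```python
import numpy as np

def combine_bias_method(records):
    """My reconstruction (not Hansen's code) of the HL87 'bias method'.

    records: list of 1-D arrays on a common annual time axis, with np.nan
    where a station has no data.  Records are added one at a time, longest
    first; each incoming record is shifted by its mean difference (the bias
    dT) from the running composite over their overlap, and all records
    present in a given year then contribute equally to that year's mean."""
    records = sorted(records, key=lambda r: np.isfinite(r).sum(), reverse=True)
    composite = records[0].astype(float).copy()
    n = np.isfinite(composite).astype(float)       # records contributing in each year
    for rec in records[1:]:
        overlap = np.isfinite(composite) & np.isfinite(rec)
        if not overlap.any():
            continue                                # HL87 requires a common period
        dT = np.mean(composite[overlap] - rec[overlap])
        shifted = np.where(np.isfinite(rec), rec + dT, 0.0)
        prev_sum = np.where(n > 0, composite * n, 0.0)
        n = n + np.isfinite(rec)
        composite = np.where(n > 0, (prev_sum + shifted) / np.maximum(n, 1.0), np.nan)
    return composite
```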

Hansen et al 1999
Hansen et al 1999 and 2001 appear to use the same procedure. Hansen et al 1999 says:

We first describe how multiple records for the same location are combined to form a single time series. This procedure is analogous to that used by HL87 to combine multiple-station records, but because the records are all for the same location, no distance weighting factor is needed.

Two records are combined as shown in Figure 2, if they have a period of overlap. The mean difference or bias between the two records during their period of overlap \delta T is used to adjust one record before the two are averaged, leading to identification of this way for combining records as the “bias” method (HL87) or, alternatively, as the “reference station” method [Peterson et al., 1998b]. The adjustment is useful even with records for nominally the same location, as indicated by the latitude and longitude, because they may differ in the height or surroundings of the thermometer, in their method of calculating daily mean temperature, or in other ways that influence monthly mean temperature. Although the two records to be combined are shown as being distinct in Figure 2, in the majority of cases the overlapping portions of the two records are identical, representing the same measurements that have made their way into more than one data set.

A third record for the same location, if it exists, is then combined with the mean of the first two records in the same way, with all records present for a given year contributing equally to the mean temperature for that year (HL87). This process is continued until all stations with overlap at a given location are employed. If there are additional stations without overlap, these are also combined, without adjustment, provided that the gap between records is no more than 10 years and the mean temperatures for the nearest five year periods of the two records differ by less than one standard deviation. Stations with larger gaps are treated as separate records.

The “Bias Method”?

If I’ve understood Hansen’s “Bias Method” correctly (and I don’t guarantee that I have), consider a situation where the target urban series has a long-term warming trend that is non-climatic (just supposing, in a mathematical sense). Then let’s suppose that we have four shorter rural series, none of which, by hypothesis, individually has a trend, and that we center each of them with respect to the target series as shown below. If you then take an average or weighted average of the re-centered rural stations, you will obviously, in this case, get a trend that is only a little less than the underlying urban trend.
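Here is a toy simulation of exactly the situation just described (my own construction, with made-up numbers, nothing from GISS): a spurious 2 °C/century urban trend, four trendless 30-year rural records re-centered on the urban series over their respective overlaps, and then a simple average of the re-centered rural stations.

```python
import numpy as np

np.random.seed(0)
years = np.arange(1900, 2001)
urban = 0.02 * (years - 1900) + np.random.normal(0, 0.2, years.size)   # spurious 2 C/century trend

# Four trendless rural records, each re-centered on the urban series over its own period
recentered = []
for start in (1900, 1925, 1950, 1975):
    mask = (years >= start) & (years < start + 30)
    rural = np.random.normal(0, 0.2, mask.sum())                        # no trend by construction
    dT = urban[mask].mean() - rural.mean()                              # 'bias' vs. the urban target
    series = np.full(years.size, np.nan)
    series[mask] = rural + dT
    recentered.append(series)

composite = np.nanmean(np.vstack(recentered), axis=0)                   # average of re-centered rurals
ok = np.isfinite(composite)
slope = np.polyfit(years[ok], composite[ok], 1)[0] * 100
# comes out close to the spurious urban trend (roughly 1.8 C/century with these numbers)
print(f"rural composite trend ~ {slope:.2f} C/century; the (spurious) urban trend was 2.00")
```

With these toy numbers, the composite of trendless rural stations inherits most of the spurious urban trend, which is the effect sketched in the figure below.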

If you add a whole bunch more short series, you don’t change the underlying problem, though you may attenuate it a little. If the shorter series have their own trends, the situation will be exacerbated from the one shown. Hansen et al 1999 and 2001 are online. I’ve posted John Goetz’ scan of Hansen and Lebedeff 1987 online as well for reference.

[Figure (refere65.gif): A Potential Bias in Hansen’s “Bias” Method?]

Perhaps some other interpretation is possible. In any case, one feels that the potential statistical pitfalls of a method not used off the Island have been poorly analyzed. The lack of statistical reflection on the Island is really quite remarkable.

There are some separate perils for variations of this method in connection with combining records subject only to scribal error, which I’ll discuss separately. At present, I still cannot contemplate any rational method that would yield the actual combined records at some of the stations that we’ve looked at (e.g. Kalakan).

References:
Hansen, J., and S. Lebedeff, 1987: Global trends of measured surface air temperature. J. Geophys. Res., 92, 13345-13372.
Hansen, J., R. Ruedy, J. Glascoe, and Mki. Sato, 1999: GISS analysis of surface temperature change. J. Geophys. Res., 104, 30997-31022, doi:10.1029/1999JD900835.


70 Comments

  1. Posted Sep 1, 2007 at 9:14 AM | Permalink

    How do we know that the differences among stations are merely offsets and don’t have a slope difference (based on absolute temperature)?

    Has all the major temperature measurement instrumentation been analyzed for response time constants and changes over time? That type of information might help in finding physical phenomena to explain necessary adjustments.

  2. Al
    Posted Sep 1, 2007 at 9:14 AM | Permalink

    What happens if you “reverse the bias method”?

    That is, figure the slope of the first short series, the second short series… all the short series. And use that _slope_ as the target slope for the urban series?

    In this case, it looks like it would turn “Global Warming” into a historical oddity. But it seems closer to the _concept_ of “How on earth do I correct the urban site for UHI effects?” adjusting.

  3. VG
    Posted Sep 1, 2007 at 9:19 AM | Permalink

    Hope this is an ok place to post this: http://data.giss.nasa.gov/cgi-bin/gistemp/findstation.py table station temperature data to 2007 is ALL plotted to 2006 in graph form. This seems to be the case for all GISS NASA station data. Is this a mistake, or has it been reported previously? It makes it seem in tabular form that they are showing the average to July 2007 when in fact it’s the average Jan-Dec up to 2006? May be wrong.

  4. wf
    Posted Sep 1, 2007 at 9:27 AM | Permalink

    Is it possible to have this translated into algebra?

  5. DougM
    Posted Sep 1, 2007 at 9:47 AM | Permalink

    If I am reading Hansen 1999 and 2001 correctly, this method describes how to combine shorter records into one longer record. The records that are combined are from the same general location and are regarded as either rural or urban, but you cannot combine both, as they are treated very differently when it comes to calculating the climate trend. If an early record made when the area was basically rural and a more modern record from a time when the population has grown enough for the station to be labeled urban are combined, then the combined record must be labeled urban, so that it will have no effect on the long-term temperature trend. The last step before gridding is to change the trend of the urban stations to match the trend of the surrounding rural stations to eliminate any UHI effect on the long-term trend, and thus urban and rural sites have to be kept completely separate until then. Of course the stations do have to be labeled correctly to eliminate any effect from UHI.

    I think Hansen has been taking lessons from the philosopher Kant, who is notorious for presenting things in the most confusing way possible. Having a method of combining records that depends on the order you combine them gives a whole orchard of cherry trees if one wants to use them. A method of combining partial records into one that doesn’t depend on the order should be developed.

  6. John Goetz
    Posted Sep 1, 2007 at 9:56 AM | Permalink

    DougM – Hansen does at least define an order, and that is to process beginning with the station having the longest record and ending with the station having the shortest record. However, the evidence suggests that this ordering is not actually used. It almost appears that the station with the latest record is used first. Unfortunately, regardless of the ordering, none of us have been able to duplicate the results yet.

  7. Douglas Hoyt
    Posted Sep 1, 2007 at 10:03 AM | Permalink

    People should get away from the old fashioned 30 year normal period and work with a 1 year normal period. They should work with anomalies of each station from the get go. Combining short series into long series should be avoided. Urban areas should be avoided. If you do that, then you will likely find that the global temperatures in the 1930s are very similar to the temperatures in the 1990s.

  8. MarkR
    Posted Sep 1, 2007 at 11:23 AM | Permalink

    ……a long urban record which Hansen wants to adjust in light of a series of generally shorter rural records…….So what Hansen does (appears to do) is to center the shorter series on the mean of the target (urban) series during the overlap period

    If the urban record gives a distorted trend of temperature, and one wants to use the rural record to correct that, then surely one wouldn’t want to use the distorted urban record to calibrate the supposedly accurate rural record?

    In addition it seems that many “rural” records have been wrongly classified as rural as time went on, having actually turned into urban areas.

    Urban temps rise (we think), and are used to create an artificial rising trend in (broken or duplicated) rural records, which are in turn used to show that there has been no urban heat island effect.

    So there will be a tendency for misclassified “rural” areas to be used to correct urban trends, when in fact they only serve to further distort the urban trend – artificially using urban temps to increase the average, while at the same time inflating the rural record through wrong calibration or misclassification.

  9. Posted Sep 1, 2007 at 11:32 AM | Permalink

    Looking at the abstract of that paper (I am going to ask our librarian to try to locate the actual paper):

    A strong warming trend between 1965 and 1980 raised the global mean temperature in 1980 and 1981 to the highest level in the period of instrumental records. The warm period in recent years differs qualitatively from the earlier warm period centered around 1940; the earlier warming was focused at high northern latitudes, while the recent warming is more global.

    This is what everyone would pick on if this paper were presented at an economics department. Replace the word “temperature” with “stock market index”. Now, suppose someone comes to you and says that while the effects of the U.S. stock market crash of 1929 were reflected in only a few other exchanges, the effects of the 1987 Black Monday crash were felt in all global exchanges.

    Think of a few responses to this claim. Translate back.

    A computer tape of the derived regional and global temperature changes is available from the authors.

    But no source code for you!

    — Sinan

  10. Paul Wescott
    Posted Sep 1, 2007 at 12:00 PM | Permalink

    MarkR, 9/1, 11:23 am

    Hinkel et al. demonstrated at Barrow, AK, that it doesn’t take much land use change to distort a “Rural” temperature record.

  11. Steve McIntyre
    Posted Sep 1, 2007 at 12:01 PM | Permalink

    Scanned HL87 now online courtesy of John G at http://data.climateaudit.org/pdf/others/HL87.pdf

  12. MarkR
    Posted Sep 1, 2007 at 12:30 PM | Permalink

    #9 PS. The continual reinforcing of error in the adjusted temperature records could have an effect similar to the multiplier or accelerator in Economics. Is this what warmers refer to as forcing multipliers or somesuch?

  13. TAC
    Posted Sep 1, 2007 at 2:22 PM | Permalink

    SteveM, I could not at first believe that you had accurately described the ‘bias’ statistical method — how could an important monitoring program apply such an ad hoc and arbitrary statistical method? — so I went back and read both the 1987 and 1999 Hansen papers. Although I’m not 100% sure — the method is not fully described — I think you did get it right.

    What is odd is that it would not have been difficult to solve this problem with a general linear model, a standard statistical method — it is described in intro Econometrics textbooks — which has easily derived theoretical properties and has been field-tested on millions of diverse problems.

    Of course there are some details that have to be worked out, but the standard approach is straightforward: Define the global temperature as a weighted sum of temperatures at all sampling sites; define a model employing indicator-variable predictors for site location and year; because residual errors are likely correlated, particularly for densely monitored parts of the planet, assume a GLS model (computing the correlation structure of the errors can be tricky; the noise will likely reflect multiple signals with distinct spatial and temporal correlation structures; one has to pay some attention to this step); fit the model for all sites and years simultaneously using the EM algorithm to deal with the nuisance of missing data (there will be lots of missing observations, but that is easy to handle using standard statistical methods). A lot of computation, but no conceptual problem.

    The fitted coefficients on indicator variables corresponding to each year would then be estimates of annual global temperature deviations for that year. The estimators would have the usual linear-model properties: unbiasedness, minimum variance (if you know the correlation structure; in fact we have to estimate Sigma), etc.

    This really isn’t ‘rocket science’…
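    For what it’s worth, here is a minimal sketch of the year-dummy idea TAC describes, using plain OLS rather than GLS and ignoring the error-correlation and EM details he mentions; the function name and array interface are my own assumptions, not anything from the papers.

    ```python
    import numpy as np

    def year_effects(station_idx, year_idx, temps, n_stations, n_years):
        """Toy OLS version of the year-dummy regression:
        temp ~ station fixed effects + year dummies (first year as baseline).
        The returned year coefficients play the role of annual temperature
        estimates; GLS/EM refinements are omitted."""
        station_idx, year_idx = np.asarray(station_idx), np.asarray(year_idx)
        temps = np.asarray(temps, float)
        X = np.zeros((temps.size, n_stations + n_years - 1))
        X[np.arange(temps.size), station_idx] = 1.0            # one intercept per station
        rows = np.flatnonzero(year_idx > 0)                    # year 0 is the baseline
        X[rows, n_stations + year_idx[rows] - 1] = 1.0
        beta, *_ = np.linalg.lstsq(X, temps, rcond=None)
        return np.concatenate(([0.0], beta[n_stations:]))      # deviations from year 0
    ```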

  14. steven mosher
    Posted Sep 1, 2007 at 3:04 PM | Permalink

    H87 is a treasure trove of wackiness. Where to begin?

  15. Posted Sep 1, 2007 at 3:21 PM | Permalink

    I am particularly intrigued by HL87 figure 3 (annual correlation clouds as a function of distance and latitude): the equatorial stations correlate very poorly!
    At 3000 km the stations don’t correlate at all, so what’s the meaning of a global (or even northern hemisphere) average?

  16. John S
    Posted Sep 1, 2007 at 3:25 PM | Permalink

    Oh, you mean you estimate a fixed effects panel data model with time dummies and a bit of clustering.
    Maybe these guys could learn something from the techniques used to analyse the PSID.

  17. KDT
    Posted Sep 1, 2007 at 3:34 PM | Permalink

    If I were to devise a method of combining records of various lengths and overlap periods, and wanted to feel comfortable that it was robust, I’d find a nice set of complete records to test it on, hide various chunks of the records in a systematic fashion and examine the results. I’m no statistician, but the programmer in me thinks such a test would be informative. Is the Bias Method described in enough detail for a test like this?
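    A test of the kind KDT suggests is easy to sketch once some combine() routine exists; here is a rough harness (my own construction, with hypothetical names) that reuses the combine_bias_method() sketch in the head post above:

    ```python
    import numpy as np

    def holdout_test(full_record, n_pieces=4, trials=100, seed=1):
        """KDT-style check: chop a complete record into overlapping fragments
        (each given an arbitrary constant offset, as different versions might
        have), recombine with the combine_bias_method() sketch above, and
        measure how far the recombined shape drifts from the original."""
        rng = np.random.default_rng(seed)
        n = full_record.size
        errors = []
        for _ in range(trials):
            fragments = []
            for _ in range(n_pieces):
                start = int(rng.integers(0, n // 2))
                length = int(rng.integers(n // 4, n - start))
                frag = np.full(n, np.nan)
                frag[start:start + length] = full_record[start:start + length] + rng.normal(0, 1)
                fragments.append(frag)
            combined = combine_bias_method(fragments)
            ok = np.isfinite(combined)
            diff = (combined[ok] - combined[ok].mean()) - (full_record[ok] - full_record[ok].mean())
            errors.append(float(np.std(diff)))                 # zero point is arbitrary
        return float(np.mean(errors))
    ```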

  18. Posted Sep 1, 2007 at 3:34 PM | Permalink

    still a pretty picture:

  19. Gavin
    Posted Sep 1, 2007 at 3:39 PM | Permalink

    Sigh….

  20. Steve McIntyre
    Posted Sep 1, 2007 at 3:43 PM | Permalink

    #17. I think that we’ll get to that point but we’re still sorting it out. However if I’ve diagnosed it right, the method will not work against the situation described in my post and any simulations will merely flesh out this point.

    I agree about the panel effects. I’ve spent a lot of time on mixed effects models and figured out a nice way of linking tree ring chronology methods to mixed effects calculations.

    I think that there are a lot of potential uses not just in temperatures but in proxies: though it’s not a magic bullet for spurious regression.

    Another thing that I’m mulling over is the strangeness of using this sort of “bias” method for scribal errors, which I’ll describe in a separate post. Hansen will take two series that may have 180 identical values and 20 non-identical values and calculate a “bias” spreading the delta on the non-identical values over all the values. Does it “matter”? I don’t know, but it’s nutty.

  21. Posted Sep 1, 2007 at 3:54 PM | Permalink

    Another thing that I’m mulling over is the strangeness of using this sort of “bias” method for scribal errors, which I’ll describe in a separate post. Hansen will take two series that may have 180 identical values and 20 non-identical values and calculate a “bias” spreading the delta on the non-identical values over all the values. Does it “matter”? I don’t know, but it’s nutty.

    It’s not nutty, it’s lazy. QCing takes time, lots of it.

  22. steven mosher
    Posted Sep 1, 2007 at 3:54 PM | Permalink

    re 15.

    I’m intrigued by the fall-off in correlation (at 1200 km) to 0.33 at low latitudes…
    and by the decision to stick with 1200 km…

    Note that he notes that reducing 1200 km to 800 km would reduce coverage to 65%.

  23. Jeff C.
    Posted Sep 1, 2007 at 4:00 PM | Permalink

    #20

    “Hansen will take two series that may have 180 identical values and 20 non-identical values and calculate a “bias” spreading the delta on the non-identical values over all the values. Does it “matter”? ”

    Bagdarin, Russia, described in a previous post, is an interesting case of this. The non-identical points are primarily missing data points that are absent from one dataset while present in the others. If the missing data were random over time it wouldn’t make much of a difference, but they aren’t. Data points missing from winter pull the set up; data missing from summer pull it down. If the missing data has any seasonal periodicity, a false bias is almost certain.
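    Jeff C’s point can be illustrated with a toy monthly example (made-up numbers, not the Bagdarin data): drop only the winter months from one version and its annual means acquire a large warm offset relative to the complete version.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    months = np.arange(1, 13)
    seasonal = -15.0 * np.cos(2 * np.pi * (months - 1) / 12)   # cold winters, warm summers
    complete = seasonal + rng.normal(0, 1, (30, 12))            # 30 years of monthly data
    gappy = complete.copy()
    gappy[:, [0, 1, 11]] = np.nan                               # Jan, Feb, Dec missing in one version

    annual_complete = complete.mean(axis=1)
    annual_gappy = np.nanmean(gappy, axis=1)                    # winter gaps pull the annual means up
    print(f"false offset from seasonally missing data: {np.mean(annual_gappy - annual_complete):.1f} C")
    ```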

  24. Posted Sep 1, 2007 at 4:00 PM | Permalink

    RE: #18

    Based on that chart, I’d only buy into ~500 km interaction range.

    Would some type of Kalman Filtering be useful in trying to extract a signal from all of the temperature variability?

  25. hans kelp
    Posted Sep 1, 2007 at 4:16 PM | Permalink

    Re# 19

    Are you Gavin Schmidt from Realclimate?

    H.K.

  26. W F Lenihan
    Posted Sep 1, 2007 at 4:54 PM | Permalink

    Of course the Gavin who posted #19 is Gavin Schmidt of Real Climate. You can tell by the quality of the scientific content of his posting.

  27. Robert Wood
    Posted Sep 1, 2007 at 5:26 PM | Permalink

    I notice “sigh” has been a frequent recent post of Gavin’s.

    Perhaps he is musing upon his future career prospects; or maybe the wool is falling from his eyes; or maybe he is just left tired by the musing of such an obdurate group of numbskulls.

    I personally think it is the former, otherwise he would be mister attack dog.

  28. Robert in Calgary
    Posted Sep 1, 2007 at 5:35 PM | Permalink

    Re: 19 “sigh”

    From a poster at Climate Science as he thanks Roger for his great work….

    “When I read the scientists on realclimate I always get the feeling that they have got it the wrong way around. They have started with the answer, they are not seeking it. …. With facts on your side, there shouldn’t be questions you don’t want to discuss. Especially when these questions are scientifically important.”

  29. Trevor
    Posted Sep 1, 2007 at 5:38 PM | Permalink

    Re: 19 and subsequent comments. To be fair to Gavin Schmidt, I think that you should assume that the post is from a Jester with a perverted sense of humour!

  30. J Christy
    Posted Sep 1, 2007 at 5:40 PM | Permalink

    Steve: In Christy et al. 2006 (J. Climate) we developed the homogeneous segment method and tested it using the San Joaquin Valley. We did not think the method of Hansen was appropriate to accomplish what was needed with any statistical rigor. Part of our testing was to check that the final trend results were robust when the segments assembled for merging were subsampled and the time series reconstructed thousands of times (which also gives good error statistics). Our results show that in general (and more will be coming) GHCN data produce more positive trends than we found through our rigorous breakpoint detection method, our merging methodology and by using approximately 10 times the number of stations. This gives a large enough sample size to understand the error characteristics of the time series.
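    This is not Christy et al.’s actual procedure, but the generic subsample-and-reconstruct idea can be sketched roughly as follows (a toy version of my own, reusing the combine_bias_method() sketch from the head post for the merging step):

    ```python
    import numpy as np

    def trend_spread(segments, n_boot=1000, seed=3):
        """Generic sketch of subsampling the segments, recombining them, and
        collecting the trend each time to get error statistics."""
        rng = np.random.default_rng(seed)
        trends = []
        for _ in range(n_boot):
            k = max(2, len(segments) // 2)
            pick = rng.choice(len(segments), size=k, replace=False)
            combined = combine_bias_method([segments[i] for i in pick])
            ok = np.isfinite(combined)
            if ok.sum() > 2:
                trends.append(np.polyfit(np.flatnonzero(ok), combined[ok], 1)[0])
        return float(np.mean(trends)), float(np.std(trends))
    ```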

  31. steven mosher
    Posted Sep 1, 2007 at 5:54 PM | Permalink

    one thought.

    Selecting the “longest” record seems reasonable…….

    1. Shouldn’t it really be the longest record with the least number of metadata changes?
    2. Define longest. I’ve seen records that are long and sparse.
    3. Why not “center” zones or grids on the longest record? This arbitrary anthropomorphic tiling of the sphere bugs the ever livin’ crap out of me.

    Averaging in other sites based on distance?

    Which record has more weight: a record 200 km away that is 60 years long with no site changes that correlates at 0.85, or a record 3 km away that is 10 years long that correlates at 0.15?

    Plus early portions of records are prone to being FUBAR.

  32. hans kelp
    Posted Sep 1, 2007 at 6:06 PM | Permalink

    Hey, it’s just that I remember from way back that he (Gavin Schmidt) promised never to post on this weblog again because Steve McIntyre titled a thread “Is Gavin Schmidt honest?” If it is Gavin Schmidt posting in #19, it just confirms that you cannot trust his words on even simple matters! Instead of acting stupid and derogatory, I suppose it would be better for Gavin Schmidt to assist in an open and fair scientific discourse and thereby raise his standing as an honest scientist. I for one consider Gavin’s standing rather low for the moment. Even lay people like me can see through much of the falsehood presented by him and the folks surrounding him!!! Just today I told an employee of mine that I could see from the postings on Climate Audit that they were about to “track down” something not so favourable to Hansen or the Team. If #19 really is Gavin Schmidt, I’m 100% sure they are on to something. Just a thought!

    H.K.

  33. Paul Penrose
    Posted Sep 1, 2007 at 6:11 PM | Permalink

    I have noticed a common thread among the various hockey team papers: the desire to use as much data as possible. Now this, on the surface, seems like a good idea, but if the data is noisy, fragmented, has an unknown provenance, or anything else that makes it suspect, then it should not be used. The team, however, seems so intent on using all the data they can get their hands on that they ignore all these problems. This also explains all the odd methods they employ in their analysis. More data does not always equal better final results.

  34. Steve McIntyre
    Posted Sep 1, 2007 at 7:04 PM | Permalink

    It’s not Gavin Schmidt though he did email me recently. He permitted a post of mine a couple of weeks ago at RC, but deleted a link to climateaudit.

  35. jae
    Posted Sep 1, 2007 at 7:32 PM | Permalink

    18: Fascinating! Maybe the mysterious “teleconnections” (hope that is the right word) isn’t as good as some think.

  36. Geoff Sherrington
    Posted Sep 1, 2007 at 8:19 PM | Permalink

    A few quick thoughts on the range of influence of pairs of observations near the Poles. Just for fun.

    Let’s assume that points up to 1000 km apart can be used to help predict each other re temperature, as Hansen accepts and uses in his adjustments.

    I do not know when Hansen uses a straight line, a stepped line, a logarithmic line or a line to a power or inverse power to weight for distance when comparing and weighting sites, as opposed to adjusting for temperature change over time. I’m confused.

    Mental exercise. Start with a point at 500 km North of the South Pole, say Mt Glossopteris at longitude 120 degrees West. (Glossopteris was a plant found in fossils, as geologists know). Call this our Reference Point, RP.

    A point 1000 km south of the RP is on the other side of the world, 500 km from the Pole. That is, at noon at the RP it is midnight at this point. (We assume here that the South Pole is fixed at where Amundsen-Scott Base is, to avoid worrying about magnetic positioning, magnetic pole movement, global wobble, GPS inaccuracy, etc). Does it matter that the two points on opposite sides of the world are both in constant darkness in midwinter and constant sunlight in midsummer? How does one get a grip on diurnal variations? Also, the two points have different thicknesses of atmosphere between them and the Sun, except for two moments daily. Then they alternate half-daily as to which one has the thicker atmosphere, so there is a solar irradiance attenuation effect not the same as (say) comparing sites along the Equator.

    What is really meant by an average temperature at a chosen time at the Pole, corrected by station observations up to 1000 km away? It’s a mix of day and night – but at least it’s unlikely to have an asphalt problem.

    A Warm Northerly wind at the RP, if continued forward, becomes a Southerly at this opposite point. How does one write an algorithm to correct for wind direction?

    Look next at what happens if one goes East from the RP for 1000 km., which is at about 85 deg South latitude. Roughly, one stops at 0 deg longitude, on the Greenwich meridian. The time difference between the RP and this site is a third of a day, or 8 hours. A TOB correction seems in order. The northerly wind example that we used before at the RP becomes a South-westerly if uniformly projected onto a near-planar map projection centered on the Pole. (More probably, in real life, the winds would keep at more constant latitudes and travel around the Pole).

    Another complication is cells based on latitude and longitude lines, the latter converging to a point at the Poles. 5 degree grid cells have rather different sizes, with latitude, but this is not the place for an elementary discourse on spherical geometry and map projections.

    I am gently making these points because Hans Erren kindly advised of a paper on the India CA thread where the Arctic was used as a location for estimation of the range of interactions between sites. Near-polar regions have particular properties not so similar closer to the Equator, some mathematical, some temporal, some spatial, some climatic – like different albedos.

    In the absence of computational code released by Hansen, we have no idea of the sophistication of the adjustments made in polar regions, or whether particular problems were not thought of. We do not know if the 1000 km range derived from Polar regions is valid for ROW.

    Unfortunately, the ROW is being fed stories from the USA with no good chance to check them. And they are backed up by Hollywood, so they must be right.

    You are now invited to re-read Steve’s leader to this thread and mentally apply the Hansen bias adjustment to Polar regions. I wish you freedom from headaches.

    We do like the USA in Australia, but we do not like recalcitrant scientists who do not disclose.

    Geoff.

  37. bernie
    Posted Sep 1, 2007 at 9:34 PM | Permalink

    I just noticed Figure 2 in HL87: Do those numbers make sense to others? It indicates only 52 stations for the Eastern US, but 142 for Central America and the Western Caribbean! And 119 for the Middle East! And 138 for South Eastern Europe and North Africa! And 33 in Tierra del Fuego!! Is the map simply misaligned?

  38. Posted Sep 1, 2007 at 10:50 PM | Permalink

    Hansen and Lebedeff, 1987

    I have a different problem with Hansen’s 1987 paper. I start by accepting the adjustment methods and the logic leading to the final conclusion: that for the period studied, after excluding the urban heat island effect caused by cities, there was global warming of 0.5 degree Celsius.

    Is it possible to remove the urban heat island effect by removing the data for “cities”? During the period studied, world population grew from about 1.5 to 5 billion. Urban population growth was more pronounced, and not only in “cities” that are officially recognized as cities. In many countries, most functional cities are unincorporated. The definition of a specific city varies over time more than the growth of the functional city. Worldwide, figures for “administrative city” populations lag de facto populations because of the way that boundaries are drawn and adjusted.

    In my opinion, the observed 0.5 degree Celsius of warming could be explained entirely by the bias arising from under-adjustment for urban heat island effect. With accelerating rural to urban migration in developing countries, this possible source of bias may have increased since the 1987 paper was written.

  39. TCO
    Posted Sep 3, 2007 at 7:37 AM | Permalink

    Steve,

    If none of it is written in stone, you ought to moderate the snark with which you lace your musings about Hansen. As it is, you are being very accusatory, but trying to keep a caveat to protect yourself. It is also outrageous that you just plot all those Waldo plots as if it is some mystery without first at least describing what Hansen says he did in his papers. You don’t fight fair. And it’s no wonder that you use a medium where you control the discussion to allow yourself to do that.

    Oh…and DO YOU GET the point about some adjustments HAVING to be in a randomly nonphysical direction to compensate for random errors in the physical direction? Or are you just being recalcitrant to admit anything?

  40. Steve McIntyre
    Posted Sep 3, 2007 at 8:19 AM | Permalink

    #41. I don’t see any snark in this post. I got a bit snarky about Hansen in other posts, because of his public snarkiness, but I agree that two wrongs don’t make a right. While the evidence on the bias method for different stations is not fully digested at present, the existence of a problem in how Hansen combined data versions is at this point pretty much written in stone.

    As to trying to describe what Hansen did in his papers, there are many posts here containing quotations from Hansen and concerted efforts to figure out what he did from limited available descriptions. This particular post contains what I believe to be the most salient Hansen excerpts describing this aspect of his method. If you feel that there is another relevant excerpt that I’ve failed to draw attention to, please do so.

  41. fFreddy
    Posted Sep 3, 2007 at 9:38 AM | Permalink

    Re #41, TCO

    Oh…and DO YOU GET the point about some adjustments HAVING to be in a randomly nonphysical direction to compensate for random errors in the physical direction?

    I don’t. Please explain.

  42. blogReader
    Posted Sep 3, 2007 at 9:57 AM | Permalink

    34: Now this, on the surface, seems like a good idea, but if the data is noisy, fragmented, has an unknown provenance, or anything else that makes it suspect, then it should not be used.

    According to http://www.realclimate.org/?comments_popup=442

    When you take the average of many independent measurements, the errors will tend to cancel. So, for a single series, yes it’s true that it would be silly to state the figure to 0.1C precision if the errors are +-0.5C, but when taking the mean over thousands of series, then 0.1C signals are discernible. -rasmus

    If I measure a board of wood with a tape measure a couple of thousand times my 1mm accuracy can become 0.2mm?

  43. Mark T
    Posted Sep 3, 2007 at 10:42 AM | Permalink

    What you’re referring to, blogReader, is not an appropriate analogy. If, however, you were to measure 1000 different pieces of board, then you could expect to increase the accuracy of your _average reading_ to well below that of your rule. However, that assumes you correctly round to the nearest subdivision on your rule with each and every measurement (which creates a uniform distribution of the error) AND your rule does not have any built-in bias, i.e. the subdivision units are accurate. You could potentially minimize bias by using a different rule for every measurement, but you’d still need some way to guarantee that each rule has a distribution of bias around the true mean. Say your tape measure was of the HG-83 kind and always produced a result around 0.3 cm higher than the actual measurement; then your results would be off by that amount as well.

    Rasmus starts off with the false premise that all the errors are iid and centered around the true mean. What he’s referring to is observations from different locations by thousands of different readers, all taken at the same time. He’s assuming all of the thermometers have the same kinds of errors, and all the observers correctly round to the nearest division, and there are NO biases in any of the measurements. Steve and Anthony have shown that to be untrue.

    Mark
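    A quick simulation of Mark T’s point (toy numbers of my own): iid reading errors average away as the number of boards grows, while a fixed instrument bias does not.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    true_lengths = rng.uniform(500, 1500, 1000)    # 1000 different boards, in mm
    read_noise = rng.normal(0, 0.5, 1000)          # iid reading error
    instrument_bias = 3.0                          # a rule that always reads 3 mm long

    unbiased = np.round(true_lengths + read_noise)                   # read to the nearest mm
    biased = np.round(true_lengths + read_noise + instrument_bias)

    print("mean error, unbiased rule:", round(np.mean(unbiased - true_lengths), 2))  # ~0
    print("mean error, biased rule:  ", round(np.mean(biased - true_lengths), 2))    # ~3, regardless of sample size
    ```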

  44. Larry
    Posted Sep 3, 2007 at 11:15 AM | Permalink

    Or in a nutshell, the errors only cancel if their distribution is Gaussian, and there’s no bias. You can’t know that a priori. I would think this is obvious, but apparently it isn’t.

  45. Larry
    Posted Sep 3, 2007 at 11:18 AM | Permalink

    Actually, they don’t even have to be Gaussian, they just have to be symmetrical. But even this can’t be assumed.

  46. Mark T
    Posted Sep 3, 2007 at 12:27 PM | Permalink

    Symmetrical doesn’t even matter, just _independent and identically distributed_… The LLN says the average of those measurements will approach the true mean. Of course, the bias may actually be the true mean… 🙂

    Mark

  47. Larry
    Posted Sep 3, 2007 at 12:50 PM | Permalink

    Point being though, that these conditions can’t simply be presumed.

  48. Posted Sep 3, 2007 at 2:45 PM | Permalink

    blogReader September 3rd, 2007 at 9:57 am,

    If I measure a board of wood with a tape measure a couple of thousand times my 1mm accuracy can become 0.2mm?

    If you have a number of different readers in a controlled environment you might increase your precision that much. As to accuracy – has your tape measure been calibrated lately?

    Now can you apply that to climate where you are not measuring the same “thing” under controlled conditions? Some rooms are hotter than others. Some are in bright sun. Others are under clouds. In some places it is snowing. In others rain. etc.

    And from all this we are supposed to discern a signal which is around 5% or less of diurnal variation and as little as 1% or less of annual variation? Pretty good signal processor you have there. Got code?

  49. Posted Sep 3, 2007 at 2:50 PM | Permalink

    Mark T September 3rd, 2007 at 10:42 am,

    It also depends on what you mean by “same time”. Just the fact that you are using min/max assures you that the measurements are not taking place at the same local time.

  50. Posted Sep 3, 2007 at 3:01 PM | Permalink

    re 18:
    some thoughts on the non-correlation of the tropics.
    The data clouds are for annual averaged temperature. In the higher latitudes this average is dominated by variation in winter temperature: e.g. the 2003 summer heat wave doesn’t show up at all in the annual averages of the last decade in Europe, whereas the severe winters do. As winter temperature is also governed by large high-pressure areas, which have a typical diameter of 3000 km, the correlation as a function of distance follows immediately from the scale of the pressure systems. The tropics have a completely different annual regime, dominated more by precipitation/humidity than by temperature/air pressure (wet and dry seasons and monsoons).

  51. Larry
    Posted Sep 3, 2007 at 3:31 PM | Permalink

    If I measure a board of wood with a tape measure a couple of thousand times my 1mm accuracy can become 0.2mm?

    That’s actually a more interesting question than it first appears. If you have a tape measure that is graded to 1mm, and if you’re measuring an edge that is in actuality 10.6 mm, that would be a true statement if and only if two out of five measurements came out 11 mm, and three out of five came out 10 mm. Those are the only two choices, and for that to be true, the ratio of readings has to split that way.

    Needless to say, when you try to apply that to temperature measurements in Stevenson enclosures in a set of locations that run from soup to nuts in terms of installation standards, that doesn’t mean much. That analysis only would apply to visual reading of the thermometer itself, and not to the question of whether or not the temperature actually sensed by the thermometer is the same as the real temperature of interest.

  52. Larry
    Posted Sep 3, 2007 at 3:32 PM | Permalink

    Err. reverse that. Two out of five would come out 10, and three would come out 11.
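    A small simulation of Larry’s point, with the correction above applied (toy numbers of my own): the average of rounded readings approaches 10.6 mm only when the reading noise is large enough to split the readings between 10 and 11 in roughly the right proportion.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    true_edge = 10.6                               # mm
    for noise_sd in (0.0, 0.05, 0.5):
        readings = np.round(true_edge + rng.normal(0, noise_sd, 5000))   # read to nearest mm
        print(f"noise sd {noise_sd:4.2f} mm: mean reading {readings.mean():5.2f}, "
              f"fraction reading 11 = {np.mean(readings == 11):.2f}")
    # With no reading noise every reading is 11; only when the noise spans the
    # 10.5 mark does the 10/11 split approach 2:3 and the average approach 10.6.
    ```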

  53. Paul Penrose
    Posted Sep 3, 2007 at 3:37 PM | Permalink

    BlogReader,
    That did not address the issues I raised at all. As already pointed out, the Law of Large Numbers requires that the errors be independent and identically distributed, which can’t be assumed, but my point was that it also won’t fix fragmented data, or data with unknown trends. Data that has an unknown provenance may have any or all of the above problems; you just don’t know.

  54. Mark T
    Posted Sep 3, 2007 at 4:33 PM | Permalink

    It also depends on what you mean by “same time”. Just the fact that you are using min/max assures you that the measurements are not taking place at the same local time.

    They aren’t even measuring true max/min anyway since they measure at the same time during the day (locally), which is not always the max/min points.

    Mark

  55. Neil Fisher
    Posted Sep 3, 2007 at 4:40 PM | Permalink

    TCO said

    You don’t fight fair. And it’s no wonder that you use a medium where you control the discussion to allow yourself to do that.

    LOL!

    Steve doesn’t fight fair? Steve only comments where he controls the discussion? How ironic!

    He’s been called a “mining company shill”, a “denialist” and no doubt many other things by “the Team”, and why? Because he’s dared to question. Because he’s tried to replicate. Because he’s found errors. Did he get an apology when his findings, which were dismissed by “the Team” as “incorrect”, “wrong”, or “don’t matter”, were actually supported by experts in the field? No, I don’t recall he did.

    He’s had legitimate, on topic questions on the science edited from posts at RealClimate, and I have yet to see anyone complain that he has done the same here.

    Steve started CA, IIRC, because he found himself edited into oblivion at RC, so to suggest he “doesn’t fight fair” and “controls the discussion” is… is… I’m lost for words! If this whole debacle was a TV show, I’d be rolling on the floor with laughter about now.

    And yet the greatest irony is this: if “the Team” had followed the scientific method – if they’d shared their data and fully documented their methods – Steve would have a lot less leverage with people like myself who, while hardly experts, have a basic understanding of science in general and an interest in what’s going on. If you say “trust me, it’s my field”, then I say: trust Steve on stats – it’s his field!

  56. Falafulu Fisi
    Posted Sep 3, 2007 at 7:21 PM | Permalink

    I posted the following message at RealClimate but it was not published by the admin; in it I criticized Hansen’s climate linear feedback system model. I don’t know if my message was offensive or something.

    ——- message posted at RealClimate ——–

    Timothy Chase said…
    I point this out because it really doesn’t make much sense to speak of running “climate data thru a purely black box model” given the complexity of the climate system – the fact that there are so many aspects which could be modeled.

    Tim, how about you (ray ladbury , RealClimate members and others) go and re-read the Rossow/Aires paper that I quoted at James Annan’s website. I didn’t respond to James Annan’s comment over there as he seemed to nitpick certain lines & paragraphs to quote in his message, meaning I doubt that he read the mathematical derivation to be able to understand it and make comment. I also doubt that he (James) is familiar with feedback control theory at all, so I couldn’t be bothered to reply.

    “Inferring instantaneous, multivariate and nonlinear sensitivities for the analysis of feedback processes in a dynamical system: Lorenz model case-study”

    http://pubs.giss.nasa.gov/docs/2003/2003_Aires_Rossow.pdf

    Once, you understand the paper above, then you can come back for a debate. If you don’t have a background in feed-back control theory, then I suggest the following books:

    #1) “Digital Control of Dynamic Systems”, 3rd Ed., by Gene F. Franklin, J. David Powell and Michael Workman

    #2) “Feedback Control of Dynamic Systems”, 4th Ed., by Gene F. Franklin, J. David Powell and Abbas Emami-Naeini

    #3) “System Identification: Theory for the User” , 2nd Ed, by Lennart Ljung.

    #4) “Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models” by Oliver Nelles.

    #5) “Multivariable Feedback Control: Analysis and Design” by Sigurd Skogestad and Ian Postlethwaite.

    #6) “Neural Networks for Modelling and Control of Dynamic Systems” by Magnus Nørgaard, O. Ravn, N. K. Poulsen, and L. K. Hansen

    The NARMAX (nonlinear autoregressive moving average with exogenous inputs) algorithm for multiple-input multiple-output (MIMO) systems mentioned in the Rossow/Aires paper is covered in detail in Ref #6.

    Timothy Chase said…
    Whether you are speaking of genetic programming or neural networks, such approaches work best when the problems which they are applied to are fairly deliminated – or at least far more so than the earth’s climate system.

    Tim, get any of those books I have quoted above and read about the discipline of feedback control theory and system identification to get some understanding, then come back to debate. The comment you made above makes it obvious to me that your opinion is uninformed. Get to grips with feedback control theory before you comment. You will also find the formal definition of “system sensitivity” as it is defined in the “control” literature.

    Ray Ladbury said…
    Climate science is quite mature

    Ray, all disciplines of science are mature today; however, that doesn’t mean we are getting any closer to a universal climate model. How about you read the issues discussed in the following NASA-sponsored workshop (see link below) from a few years back and see those difficulties that, as I said, we’re not any closer to resolving. Note: if you click on the link, when it appears just refresh it, so that the text doesn’t get squashed up to the left side.

    “WORKSHOP ON CLIMATE SYSTEM FEEDBACKS”
    http://grp.giss.nasa.gov/reports/feedback.workshop.report.html

    I am amazed at how everyone here (RealClimate) attacked Dr. Schwartz’s work on climate sensitivity as too simplistic while James Hansen’s paper on sensitivity escapes criticism. James Hansen’s work on climate linear feedback control is now regarded as inappropriate and misleading. See the link for the workshop above. Where is the criticism of James Hansen? His model is also simplistic and doesn’t represent the true physics. Just compare the model described in Hansen’s paper (below) with the work done by Rossow/Aires, shown above.

    “Climate sensitivity – Analysis of feedback mechanisms” by J. Hansen, A. Lacis, D. Rind and G. Russell.

    The one-sided criticism of Dr. Schwartz’s work looks like an attempt to shut down debate.

  57. Gary
    Posted Sep 3, 2007 at 7:28 PM | Permalink

    #57 – FWIW, it’s not worth responding to TCO. While precise in his criticisms, TCO seems to be stuck in a very narrow rut, always telling Steve what he should do and contributing little to the overall climate science discussion. Best not to feed the trolls.

  58. steven mosher
    Posted Sep 3, 2007 at 8:39 PM | Permalink

    re 58.

    you will not get a proper answer about control theory over there.
    They don’t know what they don’t know. Or they do know and they
    don’t dare say anything.

  59. Larry
    Posted Sep 3, 2007 at 8:52 PM | Permalink

    58, 60, indeed. I’m still waiting for a rebuttal of the Schwartz paper. Pooh-poohing isn’t a rebuttal.

    Yes, system dynamics theory has a lot to contribute to this discussion, and in a roundabout way (via filter theory) it’s connected with the statistical work that Steve is doing here.

  60. steven mosher
    Posted Sep 4, 2007 at 7:50 AM | Permalink

    re 61.

    They are leaning on Annan’s analysis of Schwartz:

    1. A mistake about the CI of a ratio
    2. A mistake about the time constant

    SteveMc links Annan’s blog.

  61. Posted Sep 4, 2007 at 8:25 AM | Permalink

    #62. That is especially rich. I read the old Hansen paper that someone scanned and Steve posted. In this paper, it is stated that some chosen smoothing was arbitrary. At least Schwartz documented why and how it applied. I could not determine if it was correct, but his assumptions appeared reasonable to me.

  62. Falafulu Fisi
    Posted Sep 4, 2007 at 4:47 PM | Permalink

    Here is another message that I posted at RealClimate but which was not published by the admin. English is my second language, and to my understanding of what I have written, the message is not offensive. I am starting to think that the so-called climate scientists at RealClimate might have got their PhDs in rubbish collection or carpet cleaning rather than climate science. I put forward something for debate, but obviously it has not been posted. I bet that if I posted something about toilet cleaning at RealClimate they would enthusiastically post it on their site. What is wrong with my challenge, RealClimate? Prof. Mann, stop hiding behind the so-called reasonable debate moderation, since my message below is in reasonable language, unless you see my message as something unanswerable by your fellow group members. If it is the latter, then just front up and say that you don’t have a clue about what I had put forward for discussion. Just be honest about the issues that you know and the ones that you don’t. By not posting my message, I see that you’re scared to debate with me. This is unbelievable.

    ——- message posted at RealClimate ——–

    doghaza said…
    if your post contained URL references to said books, your post probably was held up by the spam filter.

    Yes, that filter was the web admin him/herself, who must be a member of the RealClimate group. The refs to the books were not URLs, just plain text, with titles, authors, publishers.

    doghaza said…
    This filter has no bias against people who don’t know anything about climate science yet imagine themselves to be a threat to the entire field.

    No, I didn’t say that I am a threat to the entire field. I am probably a threat to the knowledge of the members of the RealClimate group here. There is a misconception among the public that if you’re a climate scientist, then professionals from related disciplines such as mathematics, computer science, economics, solid-state physics, photonic electrical engineering, signal processing, control engineering, cosmology, etc., couldn’t understand climate modeling. This is not true. Anyone who has mastered numerical analysis could go into the field of climate modeling (or any branch of numerical-based science, really), since all those related disciplines I have mentioned above involve numerical techniques that are exactly the same as those used in climate modeling. All you need to know is the vocabulary or terminology of the climate modeling literature. For example, the chaotic nature of the weather is often discussed in climate science, but this phenomenon occurs everywhere; a DSP (digital signal processing) engineer, for instance, would want to know about it in order to design a heartbeat monitoring instrument. There are certain heart conditions in which the heart beats irregularly and unpredictably (chaotic beating). The aim of the instrumentation is to design and implement algorithms that can detect the telltale signs of such irregular beating when it starts to develop and automatically alert a medical officer immediately, so that a possible pre-emptive treatment can be considered before the condition gets worse. Such instruments are in regular use in clinics and hospitals today. The chaos in a heartbeat time series is no different from chaos in climate or weather patterns. Same mathematics, although a different domain of application. Chaos also occurs in the domain of economics/finance, and again, same mathematics, but a different domain of application.

    There is a lot of discussion here about climate sensitivity, where I have noted before that the answer to this question lies in the field of feedback control theory. I don’t know whether any members of the RealClimate group have knowledge of feedback control theory or not, but I could be wrong. I raised it before here, but it was not published, and I don’t see what was wrong with my message. There is a formal definition of “system sensitivity” in the control literature, and that is exactly where I want to bring the debate. See, if my message is not published here, then how on earth would Timothy Chase and others know that I have put forward a topic for debate?

  63. Posted Sep 5, 2007 at 7:05 AM | Permalink

    #52, a point that is frequently missed in the discussion of warming or cooling is humidity (water vapor). BTW, water vapor is approx 95% of the atmospheric greenhouse effect. Part of the reason why I don’t buy the end-of-the-world scenarios is the fact that 90F in Tampa FL (70% RH) is not equal to 90F (50% RH) in NYC. Why? Humidity. Consult a psychrometric chart, or the nice detailed one at http://www.coolerado.com/CoolTools/Psychrmtrcs/0000Psych11x17US_SI.pdf, and one immediately sees why this is so. The total energy of the air is not simply the degree you read off the thermometer; it is the Enthalpy in Btu/#, which includes the latent heat along with the sensible heat. In the example above, 90F, 70% RH = 45 Btu/#; 90F, 50% RH = 38.6 Btu/#; hence Tampa is hotter than NYC.

    My training is as an engineer. Quite frankly, I don’t understand why anyone would even attempt to create a global average temperature when such a measurement of partial energy is meaningless. Reading the thermometer to determine heat is like watching the leaves on trees and inferring the wind speed; there may be some correlation but nothing of accuracy. The only meaningful measure of energy in the atmosphere is Enthalpy.

    Again, going back to the psychrometric chart, the UHI effect not only increases the air temperature but lowers the RH of the air!!! Why? It’s the humidity ratio, the total absolute water vapor content of the air. When energy is added to the air it affects both RH and temp. This concept is used in the design of air conditioning, btw. Note the far-right scale on the chart in pounds of water. http://apollo.lsc.vsc.edu/classes/met130/notes/chapter4/specific_hum_plot.html shows average specific humidity by latitude. http://www.engineeringtoolbox.com/air-psychrometrics-properties-t_8.html talks about absolute humidity from my perspective as an engineer in air conditioning. BTW, the psychrometric chart is specific to altitude, so you can’t use the same chart for Washington DC as for Denver CO; otherwise you would get the wrong Enthalpy in Btu/#.

    I understand why the average person would be easily duped by people showing temperature charts; however, anyone who is in the physical sciences, such as engineers and meteorologists, would know this. It’s no wonder that most meteorologists don’t buy into AGW; they all know about dew points, i.e. the water content of the air, and the psychrometric chart. Basically, without factoring in humidity (water vapor), any attempt to compare temperatures from one year to the next is like comparing apples to oranges: both are fruit, both are round and both have a skin, but that’s the extent of their similarity.

    Secondly, another point is that attempting to adjust urban temperatures using rural temperatures without knowing the surface heat absorption characteristics of the area is more of a finesse than an accurate estimation. If thermodynamics has anything to say about this idea of adjusting urban sites, it would be to demand to know the heat transfer rates of the surfaces under and around the temperature recording stations. Without this information you can’t compute the heat transfer rates, and thus you can’t accurately adjust the temperature to some ambient temperature that discounts the UHI effect. If you can’t accurately do this, then once again, any inference of regional, let alone global, temperature change is sheer guessing.

    Worse yet, this brings up the whole point of significant digits: how is it statistically meaningful to report tenths or hundredths of a degree when the margin of error or bias is greater than the units (the location of the decimal place) used? This is basic math, something you learn in grade school.

  64. Posted Sep 5, 2007 at 7:50 AM | Permalink

    UHI effect changing Enthalpy. In the example above with 90F, 70% RH = 45 Btu/# with 150 grains of water (read the far-right scale), if the asphalt were to heat the air by, say, 5F to 95F, the absolute amount of water would stay the same at 150 grains, and the new conditions would be 95F, 60% RH = 46.5 Btu/# with 150 grains. That means a 5F increase in temperature made a 1.5 Btu/# increase in Enthalpy. Please note from the psychrometric chart that temperature and Enthalpy are not proportional!!! The higher the ambient temperature, the less linear the relationship between temperature and Enthalpy.

    So in order to reverse-engineer out the UHI effect, you must determine the excess amount of temperature, then go to the psychrometric chart and follow the grains-of-water line from the old temperature to the new temperature, which then tells you the proper Enthalpy or total heat. However, before you reverse-engineer the UHI effect you have to determine what that excess amount of temperature is. You might be able to do this by taking the temp and RH of the nearby station to determine the grains of water in the air, and comparing that to the temp and RH of the urban station. If the grains are the same, then and only then could you reasonably infer the amount of UHI effect in Enthalpy. If the grains are not the same, then you must consider two issues: 1. the nearby rural station is not representative of the area you are trying to use to factor out the UHI effect, and 2. you must do a thermodynamic calculation of the heat absorption rate of grass specific to the ambient temp versus the urban mixed-use area heat absorption rate. (Heat absorption rates are not the same for each material; urban environments contain asphalt, concrete, etc.)
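    As a rough cross-check of the psychrometric numbers in the two comments above, here is a small sketch of my own using the Tetens saturation-pressure approximation and the standard moist-air enthalpy formula, with sea-level pressure assumed (the linked chart is more precise):

    ```python
    import math

    def enthalpy_btu_per_lb(temp_f, rh_percent, pressure_psia=14.696):
        """Approximate moist-air enthalpy (Btu per lb of dry air) from dry-bulb
        temperature and relative humidity: h = 0.240*T + W*(1061 + 0.444*T)."""
        temp_c = (temp_f - 32.0) / 1.8
        p_sat_kpa = 0.61078 * math.exp(17.27 * temp_c / (temp_c + 237.3))   # Tetens, over water
        p_vap_psia = (rh_percent / 100.0) * p_sat_kpa / 6.89476             # kPa -> psia
        w = 0.622 * p_vap_psia / (pressure_psia - p_vap_psia)               # humidity ratio, lb/lb
        return 0.240 * temp_f + w * (1061.0 + 0.444 * temp_f)

    print(enthalpy_btu_per_lb(90, 70))   # ~45 Btu/lb  (the Tampa example)
    print(enthalpy_btu_per_lb(90, 50))   # ~38 Btu/lb  (the NYC example)
    # Holding the moisture content fixed (the ~150 grains case) and raising the
    # dry-bulb 5 F adds only about 1.3-1.5 Btu/lb, as in the UHI example above.
    ```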

  65. SteveSadlov
    Posted Sep 5, 2007 at 11:36 AM | Permalink

    RE: #66 – As a solid promoter of the total heat content paradigm, I wholeheartedly endorse your point of view.

  66. steven mosher
    Posted Sep 5, 2007 at 11:36 AM | Permalink

    re 64. Falafulu Fisi

    The function of RealClimate is to REINFORCE the message and support believers.

    Nothing wrong with that.

    Sometimes the RC folks will engage you when it suits their purpose. Otherwise they will ignore you.

  67. Falafulu Fisi
    Posted Sep 5, 2007 at 2:25 PM | Permalink

    Steven Mosher said…
    The function of RealClimate is to REINFORCE the message and support believers.

    That is absolutely true, Steven. They reinforce the beliefs of those who are sitting on the fence, just in case they turn away from their AGW religion. Here is another example. If you look at the following link (see below), you will note that my messages #161 and #162 are flagged as awaiting moderation; however, two more messages, #163 and #164, which were posted after #161 and #162, have already been approved. I don’t see anything offensive in my messages #161 and #162 that would warrant holding them for approval. Do you see anything offensive there, Steven Mosher? I am curious to know.

    http://www.realclimate.org/index.php/archives/2007/08/regional-climate-projections/

    RealClimate, could you state what is wrong with my messages?

  68. Falafulu Fisi
    Posted Sep 5, 2007 at 2:33 PM | Permalink

    Steven Mosher, here are the messages I mentioned in post #69 above, just in case they are never approved and are simply removed from the moderation queue.

    —– posted at RealClimate : message #161 —–

    Timothy Chase, here is another useful paper as an introduction to System Identification, which is a sub-domain of feedback control theory. I have emailed you the list of books on feedback control theory and System Identification (both linear and non-linear).

    “GREENHOUSE CLIMATE MODELS: AN OVERVIEW”
    http://www.date.hu/efita2003/centre/pdf/103.pdf

    I think this paper is a useful introduction because it gives a preliminary idea of the methods, and it is relevant to the sensitivity debate. When Gavin Schmidt posts his article on why he thinks the AR(1) model in Schwartz’s sensitivity analysis is an over-simplification, System Identification methodology could be applied to the question.

    —————— end #161 ——————–
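
    For readers unfamiliar with the AR(1) approach being debated in the quoted message, here is a minimal sketch (my own illustration in Python, not Schwartz’s or RealClimate’s code) of the textbook lag-1 autocorrelation estimate and the e-folding decay time it implies for an annual series:

        import numpy as np

        def ar1_decay_time(series, dt_years=1.0):
            """Estimate the AR(1) coefficient from the lag-1 autocorrelation
            and convert it to an e-folding decay time in years."""
            x = np.asarray(series, dtype=float)
            x = x - x.mean()
            # lag-1 autocorrelation
            r1 = np.sum(x[1:] * x[:-1]) / np.sum(x * x)
            # for an AR(1) process x_t = phi * x_{t-1} + noise, r1 estimates phi
            tau = -dt_years / np.log(r1) if 0 < r1 < 1 else float("nan")
            return r1, tau

        # toy example: synthetic AR(1) series with a known coefficient of 0.7
        rng = np.random.default_rng(0)
        x = np.zeros(500)
        for t in range(1, 500):
            x[t] = 0.7 * x[t - 1] + rng.normal()
        print(ar1_decay_time(x))   # r1 near 0.7, tau near -1/ln(0.7), about 2.8 years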

    —– posted at RealClimate : message #162 —–

    Timothy Chase said…
    A tool to consider wherever raw data exists in a great abundance.

    That’s true. The Support Vector Machine (SVM) is a very popular algorithm in the domain of Machine Learning (ML). It is widely adopted in areas such as speech recognition, robotics, spam filtering, search engines, image processing, data classification (pattern recognition), data approximation (regression), computer network intrusion detection, data-mining, text-mining and many more. It is only recently that I have noticed the research communities in climate science picking up on it and starting to use it for analysis. SVM has been around since the early 1990s. Also, the majority of the statistics community hasn’t picked up on it yet.

    There are lots of different algorithms in ML that would be of interest for climate modeling, but I think climate researchers have not picked up on some of those algorithms yet. E.g., PCA (principal component analysis) is a linear method and is still the popular method for climate data analysis today; however there is a more robust, non-linear version called k-PCA (kernel PCA), and this algorithm has been available in the machine learning literature since 1998.

    The Journal of Machine Learning Research (JMLR) has made all its papers freely available to download online.

    “Journal of Machine Learning Research”
    http://jmlr.csail.mit.edu/papers/

    Here is another source of popular, freely downloadable research papers in machine learning, the “Neural Information Processing Systems” (NIPS) proceedings. You can click on any volume from any year to download a title that might be of interest.

    “Neural Information Processing Systems”
    http://books.nips.cc/

    Occasionally I see papers related to climate modeling and data analysis being published in the machine learning literature before being picked up in climate research journals. E.g., a popular algorithm in machine learning called ICA (independent component analysis), a linear method, has been available in the machine learning literature since the early 1990s, but climate researchers have only picked up on it in recent years, as in the following paper (see link below). Again, ICA hasn’t penetrated the statistics literature either; it is still not widely known in the statistics community.

    “Rotation of EOFs by the Independent Component Analysis: Toward a Solution of the Mixing Problem in the Decomposition of Geophysical Time Series.”

    http://adsabs.harvard.edu/abs/2002JAtS...59..111A

    The JMLR and NIPS links above are useful free resources for climate researchers.

    —————— end #162 ——————–
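
    To illustrate the PCA versus kernel-PCA point in the quoted message, here is a minimal sketch using scikit-learn (my own choice of library and toy data, not anything from the comment): linear PCA cannot separate two concentric rings along its first component, while an RBF-kernel PCA can.

        from sklearn.datasets import make_circles
        from sklearn.decomposition import PCA, KernelPCA

        # toy non-linear data: two concentric rings
        X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

        linear = PCA(n_components=2).fit_transform(X)
        kernel = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

        def separation(scores, labels):
            """Gap between class means along the first component (a crude separability measure)."""
            return abs(scores[labels == 0, 0].mean() - scores[labels == 1, 0].mean())

        print("linear PCA separation:", round(separation(linear, y), 3))   # near zero
        print("kernel PCA separation:", round(separation(kernel, y), 3))   # noticeably larger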

  69. Sam Urbinto
    Posted Sep 5, 2007 at 2:52 PM | Permalink

    I think I’ve mentioned this before: pretty much you need to know the material the station is over, the amount of sunlight at the location, the temperature, the humidity, and the wind speed and direction. Readings from multiple locations in the same basic area (at varied distances and heights) would be nice too, taken fairly often (every 10 minutes?).

    But then you are still only getting the surface anyway, for some location that gets averaged. So what meaning does it have, eh.

    Talking about the 5×5 grids: at the poles they’re triangles with a roughly 50 km base and 560 km sides, with the point at 90 (over sea, trapezoids starting at 88 N/S); I think that averages out to around 167 sq km. Along a meridian a degree of latitude is about 111 km everywhere, while a degree of longitude starts at about 2 km near 90 and grows to 111 km at the equator.

    The diagonal of the smallest 5×5 grid is 555 km and of the largest 785 km. You can get the distance for any grid’s diagonal here: http://www.wcrl.ars.usda.gov/cec/java/lat-long.htm

    So I calculated that the “squares” in the first fourth of each hemisphere are around 74 km wide, growing to 392, then 517, then 556 km wide by the time the equator is reached.

    Those are just to get an idea of what’s going on; it’s very much a swag.

    Sort of like sampling only temperature near the ground. 🙂
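
    A rough way to check those figures: here is a minimal Python sketch (my own, assuming a spherical Earth of radius 6371 km) of the east-west width and area of a 5×5 degree cell as a function of its equatorward latitude:

        import math

        R_EARTH_KM = 6371.0  # assumed spherical Earth radius

        def cell_geometry(lat_deg, size_deg=5.0):
            """Width (east-west, at the cell's equatorward edge), height and area
            of a size_deg x size_deg cell whose equatorward edge is at lat_deg."""
            lat1 = math.radians(lat_deg)
            lat2 = math.radians(lat_deg + size_deg)
            height_km = R_EARTH_KM * math.radians(size_deg)                 # ~555 km for 5 degrees
            width_km = R_EARTH_KM * math.radians(size_deg) * math.cos(lat1)
            # exact spherical area of the cell (slice of a latitude band)
            area_km2 = R_EARTH_KM**2 * math.radians(size_deg) * (math.sin(lat2) - math.sin(lat1))
            return width_km, height_km, area_km2

        for lat in (0, 22.5, 45, 67.5, 85):
            w, h, a = cell_geometry(lat)
            print(f"cell starting at {lat:5.1f}N: width {w:6.1f} km, height {h:5.1f} km, area {a:9.0f} km^2")

    The roughly 50 km base and 555 km height for the cell touching the pole, and the roughly 556 km width at the equator, line up with the figures quoted above; the sketch also shows the cell area shrinking from about 309,000 km^2 at the equator to about 13,000 km^2 at the pole.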

  70. Falafulu Fisi
    Posted Sep 5, 2007 at 3:12 PM | Permalink

    Steven Mosher, my message #161 at RealClimate (RC), which I cut & pasted up here in #70 on this thread, has been deleted by the moderator at RealClimate. I see that my message #162 has made it, i.e. it has been posted up there at RC. I think that my message #161 is a direct challenge to them (RC), and I suspect that is the reason they didn’t post it, given Gavin Schmidt’s assertion that Schwartz’s AR(1) sensitivity analysis is an over-simplification; I was going to argue that this is not the case at all by applying system identification algorithms.
