The Two Jeffs on Emulating Steig

The two Jeffs (C and Id) have interesting progress reports on emulating Steig using unadorned Tapio Schneider code here. Check it out. One of the first questions that occurred to third-party readers was whether RegEM somehow increased the weight of Peninsula stations relative to continental stations as compared to prior studies. Jeff C observes:

As I became more familiar, it dawned on me that RegEM had no way of knowing the physical location of the temperature measurements. RegEM does not know or use the latitude and longitude of the stations when infilling, as that information is never provided to it. There is no “distance weighting” as is typically understood as RegEM has no idea how close or how far the occupied stations (the predictor) are from each other, or from the AWS sites (the predictand).

Jeff notes that the Peninsula is less than 5% of the land mass, but has over 35% of the stations (15 of 42). Jeff shows that the reported Steig trend is cut in half merely through geographic grouping, saying:

Again, I’m not trying to say this is the correct reconstruction or that this is any more valid than that done by Steig. In fact, beyond the peninsula and coast, data is so sparse that I doubt any reconstruction is accurate. This is simply to demonstrate that RegEM doesn’t realize that 40% of the occupied station data came from less than 5% of the land mass when it does its infilling. Because of this, the results can be affected by changing the spatial distribution of the predictor data (i.e. occupied stations).

The irrelevance of geography is something that we’ve observed in other Mannian methods, starting right from the rain in Maine (which falls mainly in the Seine.) In MBH98, geographic errors didn’t “matter” either. The rain in Spain/Kenya error in Mann 2008 only “mattered” because the hemisphere changed. Had the error stayed in the same hemisphere, it wouldn’t have “mattered”. Gavin Schmidt and Eric Steig took umbrage at someone bothering to notice a geographic error in the Supplementary Information. At the time, I noted that I wasn’t sure whether the error was a typo or, as in the MBH and Mann 2008 cases, was embedded in the information files themselves. In either case, I didn’t expect the error to “matter” simply because I didn’t expect that Steig’s methods care whether a site was correctly located – a point that is a corollary to the results of the two Jeffs. Take a look.


  1. Bernie
    Posted Feb 15, 2009 at 9:16 AM | Permalink

    I am not sure as to the preferred location of comments. My apologies for the repetition.

    Jeff & Jeff:
    Nicely done.
    The caveats and limitations are also a nice touch.

  2. Ian
    Posted Feb 15, 2009 at 9:40 AM | Permalink

    Well done, JJ09. I’m not sure, and really don’t care, whether it halves the slope or doubles it. This is proper science: look at a paper, find a few station errors, consider the whole approach, notice the difference between coastal and inland sites, see that east/west is significant, and then do the proper science. Why the hell can’t the professionals do the same…

  3. Allen63
    Posted Feb 15, 2009 at 9:40 AM | Permalink

    Value added analysis.

    Some interpolation is OK. However, interpolation along changing latitude lines is more like extrapolation into the unknown in this case, I think. My “feeling” is that we may never accurately know (accurately enough to confirm or deny AGW) what was happening in Antarctica prior to satellites.

    Nonetheless, interesting that warming is confirmed during decades when warming may have been inertia (from natural warming since the 1800s). However, lately, no warming — when warming is supposed to be rampant from CO2. A result pretty much in accord with Steig’s finding, I presume.

  4. Ron Cram
    Posted Feb 15, 2009 at 10:03 AM | Permalink

    Nice work, guys. Amazing what comes out when a paper is examined closely.

  5. Jeff C.
    Posted Feb 15, 2009 at 10:21 AM | Permalink


    Thanks for the mention. FYI – the reason I was waffling between the peninsula station count being 35% (15 of 42) and 40% (17 of 42) was that I was originally including the two stations on the South Orkney Islands (Orcadas and Signy) under the category of the “Antarctic Peninsula”. Although they are close (about 400 miles away), they aren’t actually on the peninsula so the 35% value is technically correct. However, an island at 60 degrees north latitude is probably no more representative of the continent-wide Antarctic trends than is that of the peninsula itself.

  6. Jeff C.
    Posted Feb 15, 2009 at 10:24 AM | Permalink

    Ooops, should be 60 degrees South latitude.

  7. Ryan O
    Posted Feb 15, 2009 at 10:49 AM | Permalink

    Very nice. It would be interesting to see how your result compares to the AVHRR recon, especially as the overall trend you got was half of Steig’s AWS recon . . . which already didn’t match the AVHRR. Great job, guys!
    P.S. – and many thanks for the explanations and help in the Deconstruction thread. 🙂

  8. AnonyMoose
    Posted Feb 15, 2009 at 11:05 AM | Permalink

    RegEM is used for the geographic infilling, but is unaware of geography? I thought RegEM was being used to infill missing data periods within each station’s records. Somehow in previous phrasing I missed that it was used for the land blanket. Well, using for geographic tasks something which is unaware of two-dimensional geometry is asinine. And such a use also obviously requires adjustments for topology, as altitude has obvious effects upon temperatures.

  9. Bob North
    Posted Feb 15, 2009 at 11:36 AM | Permalink

    I must say that I did not realize that RegEM does not consider location in infilling data. The nearest-neighbor concept is nothing new in evaluating geospatial data and, it seems to me, some sort of distance/altitude weighting is absolutely critical for having any type of confidence in a reconstruction of any type, be it benzene in a groundwater plume, gold in an ore body, or temperature in Antarctica. If this is true, I am left speechless.

    • Posted Feb 15, 2009 at 12:01 PM | Permalink

      Re: Bob North (#9),

      It’s true. It’s the main reason I was interested in working on RegEM. I’m just an engineer and don’t work in climatology but I can’t even begin to understand how data could be imputed without spatial weighting.

      I left this on RomanM’s thread when I was starting to figure it out.

      I’m really not sure what’s going on exactly but I don’t believe location of individual stations was incorporated into the analysis except by correlation to the 3 pc’s in the RegEm analysis.

      In the reconstructions done above, it is only the temperatures fed into a matrix. No information at all is included about spatial position. You could feed a trend in from the north pole and as I understand it the weighting would be determined by the statistical match to the 3 PC’s. Of course you’d expect a better match to stations in closer proximity but there are no other controls or even checks for the final weighting (things that make engineers happy).

      Jeff C showed clearly that adding more stations with a particular trend (upslope) changes the weighting of those trends on all the other stations.

  10. RomanM
    Posted Feb 15, 2009 at 12:16 PM | Permalink

    Jeff Id (#10), you are correct in your conclusion that RegEM does not, in fact, explicitly take distance into account. However, a satellite image from a given moment does not usually look like random noise – there are going to be spatially induced relationships between values from adjacent grid points. RegEM uses the covariance matrix of the values from the various gridpoints, which will thus contain information about the location relationships implicitly. The PCs are calculated from that matrix.

    In some circumstances, this may have some advantages. The results from a grid point may conceivably be more closely related to those of a point at the same altitude than to those of a closer point at a different altitude. From looking at their results, it is highly doubtful that this method is the best way to go in this analysis.
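The point about covariance carrying location information implicitly can be illustrated with a toy simulation. This is only a sketch on synthetic data, not RegEM and not the Steig data: the station positions, the window width and the helper names `simulate_stations` and `correlation` are all invented for the illustration. Each synthetic station reads a sum over adjacent noise cells, so stations that sit close together share signal and end up correlated even though no coordinates are ever passed to the correlation step.

```python
import math
import random

def simulate_stations(positions, window, n_times, rng):
    """Each station reads the sum of `window` adjacent white-noise cells,
    so stations whose windows overlap share part of their signal."""
    n_cells = max(positions) + window
    series = {p: [] for p in positions}
    for _ in range(n_times):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
        for p in positions:
            series[p].append(sum(noise[p:p + window]))
    return series

def correlation(x, y):
    """Plain Pearson correlation; it never sees the station positions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(42)
# Hypothetical layout: two neighbouring stations and one far away.
series = simulate_stations(positions=[0, 1, 20], window=5, n_times=2000, rng=rng)
corr_near = correlation(series[0], series[1])   # windows overlap in 4 of 5 cells
corr_far = correlation(series[0], series[20])   # windows do not overlap
print(f"near pair: {corr_near:.2f}   far pair: {corr_far:.2f}")
```

The nearby pair comes out strongly correlated and the distant pair close to zero. The caveat raised in the comment still applies: this implicit information is only as good as the overlap in the records used to estimate the covariance.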

  11. PhilH
    Posted Feb 15, 2009 at 12:30 PM | Permalink

    I would like to propose that you add “The Air Vent” to your “Blogroll.”


  12. stan
    Posted Feb 15, 2009 at 12:35 PM | Permalink

    snip – editorializing

  13. Hank Henry
    Posted Feb 15, 2009 at 12:51 PM | Permalink

    I wonder how this method would work if you took a couple dozen spaced out stations from Australia or North America or some other continent and ran the numbers?

  14. MikeU
    Posted Feb 15, 2009 at 1:13 PM | Permalink

    This is a little difficult to comprehend – how can any scientist not think spatial weighting matters when interpolating geographically sparse data? It seems intuitively obvious that it would, and it’s good to see that intuition is borne out by running gridded cells over the same dataset. A factor of 2 difference isn’t exactly minor. Nice work, gentlemen.

  15. Allen63
    Posted Feb 15, 2009 at 1:13 PM | Permalink

    Though open to considering most any idea, I would be hard to convince that a global or continental temperature extrapolation/interpolation method that does not explicitly account for distance, altitude, longitude, latitude, and surroundings (mountains, plains, ocean, snow fields, city streets, etcetera) has any credibility for “fine” trend discrimination of the type needed to prove AGW. I also question the use of data from hundreds of miles away to “correct” local data. How many of these methods are “proven” performers and how many are merely personal preference “guesstimates”?

    If the AGW community and politicians were not planning to “forcibly-take” my money to prevent “catastrophic” AGW, I could be more charitable regarding the questionable AGW “science” used to support their position.

  16. Jeff C.
    Posted Feb 15, 2009 at 1:49 PM | Permalink

    My take on this is that RegEM will do a reasonable job on infilling missing data from dates if the temperature series used are well correlated. Take California as an example. If I had ten locations from the central valley (Bakersfield – Fresno area) that had data missing from scattered dates, RegEM would probably do a decent job infilling as all the sites have similar climatic trends.

    Now add in two more California series of Eureka (NW Coast) and Palm Springs (SE Desert). Although both are in California, their climates aren’t anything like the central valley. RegEM doesn’t know that these two sites are distant as it doesn’t have the lat/long coordinates. As Roman mentions in #11, the algorithm can recognize that the two new sites aren’t well correlated to the original ten and make implicit assumptions regarding distance. However, if lots of points in the time series are missing (as we know is the case) any trend difference may not be readily apparent. In addition, since there are only two outliers, their impact on the overall reconstruction is minimized. RegEM would still do a reasonable job on the central valley sites, but probably is off considerably for the other two.

    We might have an analogous geographic imbalance in the Steig reconstruction due to the prevalence of peninsula and coastal stations.

    I’m going to run some variations deleting gridcells to see the impact. I’ll put up the results later in the day.

    • Billy Ruff'n
      Posted Feb 15, 2009 at 2:35 PM | Permalink

      Re: Jeff C. (#17),
      The California example raises a question in the mind of this layman: Wouldn’t it be possible to “test” the accuracy of RegEM reconstructions under varying circumstances using known data? For example, take data from a region where data sets are reasonably complete (California?), randomly delete elements of the known data set to approximate the data voids found in places like Antarctica, then let RegEM do a reconstruction to “recreate” the missing (deleted) data, and then measure the accuracy of the reconstructed data vs the deleted (known) data. You could then repeat the process under varying circumstances, e.g. more or less deleted data, greater geographic separation between stations, or deleting data from stations with different climatic trends.
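The hold-out procedure described above can be sketched in a few lines. This is a minimal illustration under loud assumptions: the data are synthetic (one shared signal plus station noise), and the infilling step is a deliberately crude stand-in (the average of the other stations at the same time step), not RegEM. The function names `make_series` and `holdout_rmse` are made up for the example. The part that carries over to a real test is the structure: hide known values, infill, then score the infilled values against what was hidden.

```python
import math
import random

def make_series(n_times, n_stations, rng):
    """Synthetic anomalies: one shared signal plus independent station noise."""
    data = []
    for _ in range(n_times):
        shared = rng.gauss(0.0, 1.0)
        data.append([shared + rng.gauss(0.0, 0.3) for _ in range(n_stations)])
    return data

def holdout_rmse(data, missing_fraction, rng):
    """Hide a random fraction of known values, infill them two ways,
    and return the RMSE of each infill against the hidden truth."""
    n_times, n_stations = len(data), len(data[0])
    hidden = [[rng.random() < missing_fraction for _ in range(n_stations)]
              for _ in range(n_times)]
    # Baseline infill: each station's own mean over the values left visible.
    own_mean = []
    for s in range(n_stations):
        seen = [data[t][s] for t in range(n_times) if not hidden[t][s]]
        own_mean.append(sum(seen) / len(seen) if seen else 0.0)
    err_cross = err_own = 0.0
    n_hidden = 0
    for t in range(n_times):
        for s in range(n_stations):
            if not hidden[t][s]:
                continue
            # Stand-in infill (NOT RegEM): mean of the other visible stations.
            others = [data[t][k] for k in range(n_stations)
                      if k != s and not hidden[t][k]]
            cross = sum(others) / len(others) if others else own_mean[s]
            err_cross += (cross - data[t][s]) ** 2
            err_own += (own_mean[s] - data[t][s]) ** 2
            n_hidden += 1
    if n_hidden == 0:
        return 0.0, 0.0
    return math.sqrt(err_cross / n_hidden), math.sqrt(err_own / n_hidden)

data = make_series(n_times=120, n_stations=10, rng=random.Random(0))
rmse_cross, rmse_own = holdout_rmse(data, missing_fraction=0.2, rng=random.Random(1))
print(f"cross-station infill RMSE: {rmse_cross:.2f}")
print(f"own-mean infill RMSE:      {rmse_own:.2f}")
```

To approximate the Antarctic situation more closely, the random mask could be swapped for one that hides long contiguous blocks or entire groups of stations, which is the harder case raised elsewhere in this thread.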

  17. Jeff C.
    Posted Feb 15, 2009 at 2:54 PM | Permalink

    Another observation in all of this is that Steig refers to the occupied station data set as the predictor and the AWS reconstructed data set as the predictand. However, in the methods section and in the Jeff Id implementation of RegEM, there is no real distinction between these data sets.

    From the methods section of the paper:

    RegEM uses an iterative calculation that converges on reconstructed fields that are most consistent with the covariance information present both in the predictor data (in this case the weather stations) and the predictand data (the satellite observations or AWS data).

    In the Jeff Id RegEM, both the AWS data (63 sites) and the occupied station data (42 sites) are dumped in. 105 series are input, and 105 series are output. RegEM has no idea which is the predictor and which is the predictand. This is important as any true distance weighting would need to be applied to both data sets, not just the occupied station data.

    We have been focused on the AWS recon series (i.e. the 63 infilled AWS series) because that is what Steig provided. RegEM also provides an occupied station recon (the other 42 series of the 105 output from RegEM). There might be something to be learned by evaluating these.

    I highly recommend visiting Jeff Id’s site and getting his code to run through this yourself. You need both R and Matlab to run it; if you work in a technical industry you probably have a site-wide Matlab license.

  18. Ryan O
    Posted Feb 15, 2009 at 6:01 PM | Permalink

    I’m not sure if this belongs here or in the RegEM thread. It’s more general than just Antarctica.
    In my opinion, the problem with using RegEM like this is more fundamental than spatial weighting. RegEM can only find correlations. It cannot identify causality. Without establishing causality, no weighting scheme is any less arbitrary than another.
    While it is reasonable to suppose that stations close to each other will share many aspects of climate and weather, the degree to which they share is dependent on more than just proximity. For example, land topology can greatly affect the degree of coupling, and, over time, the nature of that coupling can change. This is not restricted to times of years or more – it can occur in days if it is strongly dependent on highly variable aspects of the weather, such as the location of the jet stream. The correlation in temperature between points in an area as small as Montana (where I’m from) can change from year to year and month to month, and depending on how much needs to be infilled, there may not be enough actual information present for any algorithm (RegEM or otherwise) to accurately capture those changes.
    Because RegEM cannot identify causality, I hesitate to believe any uncertainties calculated from imputed series. Those uncertainties are valid IF and ONLY IF the correlations between the series did not change with time . . . and by virtue of needing RegEM to begin with, that information is unavailable.
    Furthermore, the uncertainties with any algorithm like RegEM must be dependent on the data being missing in a random fashion. With climate information in real life, this is rarely the case. Data is usually missing in large chunks, not randomly.
    I think imputation algorithms can be valuable tools, but the current way they are being used is, in my opinion, poor science. This doesn’t just go for Antarctica; it goes for the paleoclimate reconstructions as well. Unless you can identify a physical cause for temperature between sites to be correlated in a meaningful way, the output is suspect at best.
    Correlation does not demonstrate causality.

  19. Hal
    Posted Feb 15, 2009 at 6:12 PM | Permalink

    # 20 Hans Erren

    RegEm shows that the correlation of i before e (rather than e before i, following c) is much higher in full continental usage.

    The most prevalent peg was the fact that the study appeared to reverse the “Steig” meme that has been a staple of disinformation efforts for a while now.

    Therefore Stieg is now the correct usage.

  20. husten
    Posted Feb 15, 2009 at 6:24 PM | Permalink

    Jeff, Geostatistics tools have an algorithm that computes and visualizes the variance between stations: the (Semi)Variogram.
    Geostats, like RegEM, has its own set of believers and deniers. In the end it all depends how you interpret the results. Some of the tools might be of use to you here. The various software packages involved all honour the distance between stations, address spatial clustering etc. I am not sure there is a FREE software, one would need to google. Most code around is based on the 1980’s BLUEpack.

  21. Robert Wood
    Posted Feb 15, 2009 at 6:48 PM | Permalink

    JeffId @ #10

    With spatial data, one should not “impute” data, as S&M do, but, rather interpolate… in two dimensions. This is not difficult. Image processing technology applies here.

    • Posted Feb 15, 2009 at 7:00 PM | Permalink

      Re: Robert Wood (#24),

      One of my several hidden backgrounds is image processing which I’ve done quite a bit of. It seems from the quote Jeff C put in the article that these methods were rejected. — “Unlike simple distance weighting……….”

  22. Jon
    Posted Feb 15, 2009 at 8:58 PM | Permalink

    Isn’t proper spatial weighting one of the purposes of GISS? This puts new contrast on the discrepancy between GISS and Steig et al results. Also vis-a-vis the notion of prior methods doing a ‘back of the envelope estimation’, Steig et al seem to be taking the casual approach!

    To the Jeffs: I know you are struggling to select a spatial weighting of your own. Perhaps you ought to follow the GISS procedure and use RegEM as a replacement for the GISS in-fill procedures.

    • Ryan O
      Posted Feb 15, 2009 at 10:12 PM | Permalink

      Re: Jon (#26), There’s a conceptual difference between weighting in order to determine an aggregate measurement (like “surface temperature”) and weighting in order to drive an interpolation. (GISS may do a bit of both . . . the homogenization procedure smacks a bit of the latter.)
      The former does not use the weighting to determine the degree to which the grid cells communicate or can be used as predictors of each other. It is an averaging technique to spread points of data over large areas.
      The latter, on the other hand, uses the weighting to determine the degree to which grid cells are causally connected – i.e., can be used as predictors of each other. While distance is a reasonable parameter to choose, how would you know if you’ve chosen the correct distance function? How would you assign a confidence interval? Nor would the function be universal; it would depend on topology and local climatic variations.
      While distance certainly wouldn’t be useless, it is not necessarily physically correct. Physically one quantity can only be used as a predictor of another if there is a causal connection. Distance probably could be used in many cases to approximate the strength of that connection . . . but it’s still just an approximation. Depending on the situation, it might not even be a good approximation.
      Still, it’s probably better than just teleconnecting everything everywhere . . . but I’m not sure how much I would trust it.

  23. alpha
    Posted Feb 15, 2009 at 9:38 PM | Permalink

    (longtime reader, first time commenter, etc.)

    Steve, it might be useful to put together a regularly updated summary post with a table.
    Idea is to give a heads up view of exactly which papers have been contested and by who. With
    you, Watts, Pielke Junior, and others, it appears that there are a fair number of skeptics out there.

    Moreover, there appear to be at least three or four nontrivial data errors (Mann, Hansen, and now Steig)
    and it would be good to summarize them (acknowledged and unacknowledged) in one place.

    Possible columns of this table could be:

    1) year
    2) authors
    3) title
    4) abstract
    5) your summary of their message
    6) link(s) to any papers, code or forensics you or others have done which points out an error
    7) link(s) to any admission of error
    8) link(s) to any post or email denying access to data or code
    9) number of citations
    10) binary indicator: was this cited in IPCC or other influential report?

    Basic idea is the briefest possible heads up view to show — definitively — that many of the
    core papers by Mann and crew have a significant degree of associated controversy.

    The goal would be for me (or others) to point an intelligent non-specialist at this page and — by sheer weight of acknowledged error — demonstrate to them that the “consensus” isn’t really so.

    • Peter D. Tillman
      Posted Feb 16, 2009 at 12:12 PM | Permalink

      Re: alpha (#26),

      Rather than asking Steve to do this (he’s pretty well committed), why don’t you write up a brief summary of Steig’s article (and others, if possible) in the format you outlined, and post it to — the wiki for this site. Make sure to post a link here so people can find your writeup, and add to it.

      Best, Pete Tillman

  24. alpha
    Posted Feb 15, 2009 at 9:42 PM | Permalink

    also, all this stuff regarding imputation is very dodgy.

    If you have a missing data problem, if you have NAs, it really depends on how those NAs arose.

    Are they missing completely at random (MCAR)? Are the NAs due to completely random events and uncorrelated with any other measured variable?

    Or are they statistically dependent on some other columns in your predictor matrix?

    The best thing to do when you have an NA is to get the distribution of the possible values for that NA. Sometimes in the MCAR case this will be the univariate distribution for that column alone, because no other columns correlate with it. At other times it will be a conditional distribution, where other columns can be used. But using a simple scalar replacement is generally not a good idea.

    NAs are real things, they shouldn’t be papered over…

  25. bugs
    Posted Feb 15, 2009 at 10:35 PM | Permalink

    “starting right from the rain in Maine (which falls mainly in the Seine.)”

    No abuse here and childish taunts, no siree, just honest, dispassionate commentary and analysis.

    Steve: And even after the error had been identified and was well known, the rain in Maine continued to fall in the Seine in the Mann et al 2007 SI. I agree that Mann’s refusal to correct the geographical mislocations was childish. The important thing for readers to reflect on is that, under Mannian methods, geographic errors don’t “matter”.

  26. mhc
    Posted Feb 16, 2009 at 12:39 AM | Permalink

    It seems to me that a very simple way to get reasonable weights for weather stations would be to draw a Voronoi diagram, with one polygon around each weather station. Use the area of the polygon as the weight. This essentially attributes to each weather station all points closer to that station than any other station, and avoids all kinds of special cases like two weather stations in a cell, no stations in a cell, etc.

    (I spent most of the winter of 1978 using this method to estimate the amount of ore in potential open-pit uranium mines. Voronoi diagrams can be drawn pretty easily with a compass and straightedge, and areas can be found with a mechanical planimeter. Somewhat boring, though. 🙂
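The polygon-area weighting can be approximated numerically without drawing anything: assign every point of a fine grid to its nearest station and count. The sketch below is only an illustration under stated assumptions. The domain is a unit square rather than a continent (a real application would need a land mask and a suitable map projection), and the station coordinates, readings and the function name `voronoi_weights` are invented.

```python
def voronoi_weights(stations, grid=100):
    """Approximate each station's Voronoi cell area on the unit square by
    assigning every grid-cell centre to its nearest station."""
    counts = [0] * len(stations)
    for i in range(grid):
        for j in range(grid):
            x, y = (i + 0.5) / grid, (j + 0.5) / grid
            nearest = min(
                range(len(stations)),
                key=lambda k: (stations[k][0] - x) ** 2 + (stations[k][1] - y) ** 2,
            )
            counts[nearest] += 1
    total = grid * grid
    return [c / total for c in counts]

# Hypothetical layout: three stations clustered in one corner, one far away.
stations = [(0.05, 0.05), (0.10, 0.05), (0.05, 0.10), (0.70, 0.70)]
readings = [1.0, 1.0, 1.0, 0.0]   # invented values, clustered stations read high

weights = voronoi_weights(stations)
simple_mean = sum(readings) / len(readings)
weighted_mean = sum(w * r for w, r in zip(weights, readings))
print("weights:", [round(w, 3) for w in weights])
print(f"simple mean: {simple_mean:.2f}   area-weighted mean: {weighted_mean:.2f}")
```

With this layout the three clustered stations share a small corner of the square, so their combined weight is well below the three quarters they get in a simple average.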

  27. Posted Feb 16, 2009 at 2:14 AM | Permalink

    In Googling “Voronoi Diagram” I found that Dr John Snow solved the mysterious cholera epidemic that struck central London in the mid 19th Century by constructing such a diagram, where the weights were the number of cholera victims in each house.

    Here is the diagram:

    The outer line denotes the points of equidistance between water pumps. The only pump in the area was the one on Broad Street, so Snow went there and took the handle off and the epidemic was stopped in its tracks.

    The pump (without the handle) is still there:

    and the nearest pub was renamed in honour of the Doctor:

  28. BKR
    Posted Feb 16, 2009 at 2:37 AM | Permalink

    The reference to John Snow is wonderfully accurate for this blog. For a fascinating account of his work, see
    Statistical Models and Shoe Leather
    Author(s): David A. Freedman
    Source: Sociological Methodology, Vol. 21 (1991), pp. 291-313
    Published by: American Sociological Association
    Stable URL:

    As Freedman (a noted statistician) observes “…this paper suggests that statistical technique can seldom be an adequate substitute for good design, relevant data, and testing predictions against reality in a variety of settings.”

    • Peter D. Tillman
      Posted Feb 16, 2009 at 12:16 PM | Permalink

      Re: BKR (#32),

      Thanks for the ref, and the delightful quote. It does seem like the Mannian crowd prefers shuffling electrons to fieldwork, doesn’t it?

      I’d appreciate a copy of the paper if you have it: pdtillmanATgmailDOTcom

      TIA & Cheers — Pete Tillman

  29. bender
    Posted Feb 16, 2009 at 4:03 AM | Permalink

    I had *assumed* RegEM used spatiotemporal covariance for infilling. My very bad.

    • John A
      Posted Feb 16, 2009 at 4:23 AM | Permalink

      Re: bender (#33),

      It’s very easy to overestimate the abilities of the Team. You’d think we would have wised up by now…

  30. Louis Hissink
    Posted Feb 16, 2009 at 4:16 AM | Permalink

    Spatial weighting has been used in geostats for decades – it’s otherwise known as area of influence issues, and usually the method is to weight a reading with the area it is representative of.

    It’s actually a fancy way of making sure the intensive values (station readings) are applied to areas (the extensive variables) to produce numbers which actually mean something physically.

    It’s the main reason why the method of calculating the global mean temp is just a load of horsefeathers. No different to aggregating telephone numbers in the grid cells.

  31. husten
    Posted Feb 16, 2009 at 5:22 AM | Permalink

    Geostats: Re: husten (#22), Google has indeed yielded tools that are still free:
    1.) at
    or 2.) at which is also recommended by NASA – see 3.) at

    You don’t want to use commercial tools. Their customers are the oil, coal and mining industries and therefore always yield results biased towards global cooling. 😉 /sarcasm-off/ Most are based on the same code – gslib- but offer more or less comfortable data analysis tools.

  32. Harry Eagar
    Posted Feb 16, 2009 at 12:14 PM | Permalink

    As I recall, Tufte also showed that the epidemic was pretty well over before Snow took the pump handle off.

    I predict something similar with respect to global warming and mitigation.

  33. Posted Feb 16, 2009 at 1:04 PM | Permalink

    Pat Michaels has an interesting point in his editorial piece at the Guardian:

    The problem with Antarctic temperature measurement is that all but three longstanding weather stations are on or very near the coast. Antarctica is a big place, about one-and-a-half times the size of the US. Imagine trying to infer our national temperature only with stations along the Atlantic and Pacific coasts, plus three others in the interior.

    As a test of the veracity of the RegEM reconstruction, couldn’t a similar experiment be conducted with the US continent as the testbed?

  34. Dean P
    Posted Feb 16, 2009 at 6:42 PM | Permalink

    I posted the following over at RC and here was Gavin’s reply:

    One of the primary points of the WUWT discussion is that RegEM assumes that the missing data locations are random in nature. Since the missing data in the Steig paper isn’t random (almost all the interior is “missing”), then is it proper to use a method that assumes otherwise?

    [Response: No. The issue isn’t that the data have to be randomly missing in time or space, but that the value of the missing data is unrelated to the fact that it is missing. – gavin]

    So the question is, does RegEM really expect the missing data to be from random locations and if so, does Gavin realize this?

    • Jason
      Posted Feb 17, 2009 at 8:04 AM | Permalink

      Re: Dean P (#44),

      The data in the interior is different because it is in the interior.

      The data in the interior is missing because it was in the interior.

      It is plainly NOT the case that the value of the data is unrelated to the fact that it is missing. In fact, both are primarily the result of a single factor: the geographic location.

      It is remarkable that Gavin can understand the basic principle and then so spectacularly fail to apply it.

  35. BKR
    Posted Feb 17, 2009 at 2:23 AM | Permalink

    Pete: (#40)

    Click to access Freedman91.pdf

    Absolutely delightful reading if you are interested in statistical analysis of data (it focuses on social science but is still useful for thinking about the stuff here).

  36. Carrick
    Posted Feb 17, 2009 at 10:03 AM | Permalink

    A bit off-the-wall question, but one could apply RegEM to the total Earth ground-station data.

    What happens when you do that? How does the reconstruction compare to other methods (e.g., GISS & HadCRUT)?

  37. Harry Eagar
    Posted Feb 17, 2009 at 11:49 AM | Permalink

    ‘the value of the missing data is unrelated to the fact that it is missing.’

    While I take Jason’s point, on the other hand, the value of missing data will always be unrelated to the fact that it is missing.

    The value is/was the value, it does not change by the fact of observation, at least not till you get down to subatomic observations.

    The speed at which I drove to work this morning is unrelated to whether a cop with a laser gun was hiding behind a billboard (purely hypothetical, we don’t have billboards in Hawaii).

    • bender
      Posted Feb 17, 2009 at 2:51 PM | Permalink

      Re: Harry Eagar (#48),

      While I take Jason’s point, on the other hand, the value of missing data will always be unrelated to the fact that it is missing.

      The data are assumed to be missing at random relative to the data field, not the x, y geo-coordinates. The reality is that data sensors are most likely to fail under extreme weather, which in Antarctica means extreme cold. I note further that the data field in fact covaries with x, y, z, with the South Pole and higher elevations being far colder than more northerly locations at lower elevation. Therefore it is a bit disingenuous to dismiss someone’s concern about data missing largely from the colder interior continental region, i.e. the data are probably not missing at random with regard to the temperature data field. (Of course you have no way of knowing this because the missing data are not known to you. That is why it is an assumption that is so easy to pretend is true.)
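A toy simulation can make the distinction concrete. It is a sketch on invented numbers only: the temperature distribution, the failure probabilities and the helper name `observed_mean` are all assumptions, not anything taken from the Antarctic records. In one case readings drop out at a flat rate regardless of temperature; in the other, the sensor is far more likely to fail when it is very cold.

```python
import random

def observed_mean(temps, fail_prob, rng):
    """Mean of the readings that survive, where fail_prob(t) is the
    chance that a reading of temperature t goes unrecorded."""
    kept = [t for t in temps if rng.random() >= fail_prob(t)]
    return sum(kept) / len(kept)

rng = random.Random(7)
# Invented "true" temperatures in deg C.
temps = [rng.gauss(-30.0, 10.0) for _ in range(5000)]
true_mean = sum(temps) / len(temps)

# Case 1: failures unrelated to the value (flat 30% loss).
random_loss_mean = observed_mean(temps, lambda t: 0.3, rng)

# Case 2: failures concentrated in extreme cold (assumed 80% loss below -35).
cold_loss_mean = observed_mean(temps, lambda t: 0.8 if t < -35.0 else 0.1, rng)

print(f"true mean:                 {true_mean:.1f}")
print(f"observed, random loss:     {random_loss_mean:.1f}")
print(f"observed, cold-biased loss: {cold_loss_mean:.1f}")
```

The flat-rate case stays close to the true mean, while the cold-biased case reads several degrees too warm, and nothing in the surviving data alone reveals that the bias is there.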

      Re: Dean P (#44),
      You must be careful in talking with the dismissive ones at RC. They would rather show you to be an idiot – especially if you smell like a skeptic – than take the time necessary to fully answer (and clarify, if necessary) your question. As my comment above would indicate.

  38. Dean P
    Posted Feb 17, 2009 at 3:06 PM | Permalink


    I’ve had that happen there before, but thanks for the warning. The reason I even went over there was that several people had mentioned Jeff & Jeff’s work, but hadn’t described what the result was (and I’m not sure I did it justice). The link to the article had been dismissed as not worthy of reading even on the slowest of days. I really wanted to see if Gavin knew that RegEM doesn’t factor in distances when doing its magic.

    I think he knows that, but then I’m not sure he understands the issue that this can lead to. And that is that the easiest way to warm the Antarctic is to take more measurements on the peninsula. RegEM will handle the rest…

    (note, this is my take on what J&J have shown… if that’s not accurate, then please let me know!)

    • bender
      Posted Feb 17, 2009 at 4:25 PM | Permalink

      Re: Dean P (#50),
      It is impossible to probe them. They know when you’re probing and will always dodge if they’re in the wrong. This makes it impossible to tell when they’re really wrong vs. when they’re just annoyed and are ignoring the substance of your question. Occasionally, and only when they are in the right, you will get a sensible reply. Don’t forget to genuflect. It increases your odds of getting an answer.

  39. Harry Eagar
    Posted Feb 17, 2009 at 5:44 PM | Permalink

    Hmmm. OK. I get you better now. It’s like the joke about the drunk looking for his lost keys in the dark. He looks under the lamp post, although that’s not where the keys are, it’s where the light is.

    There is an expressive word in Hawaiian pidgin for what I think about this: shibai.

    To watch you guys tease out the story is amusing and stimulating. Even if The Team were all bullet-proof statisticians, I would still think that recreating a temperature history by making up temperatures where there are no observations is shibai, at least when the making up is on the scale we see in Steig.

    But it’s all about process in the audit, ain’t it?

    Gee, hope I didn’t cross the snip line.

  40. Douglas Hoyt
    Posted Feb 18, 2009 at 7:36 AM | Permalink

    Here is a suggested test of RegEM:

    1. Double up the number of peninsular stations from 15 to 30.
    2. The new 15 stations have identical temperature records to the existing 15. The “new” stations could be visualized as being 10 feet away from the existing stations.
    3. Re-run the RegEM analysis and calculate the continent-wide trend. If RegEM is correct, then there should be no change in trend. If the method is poor, the trend will change.
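The three steps above amount to a duplication-invariance check that can be run against any estimator. The sketch below shows the shape of such a check on made-up numbers. It does not run RegEM: `plain_mean` and `cell_mean` are simple stand-in estimators, and the trends and cell labels are hypothetical. Only the station counts (15 peninsula, 42 total) come from the post.

```python
def plain_mean(stations):
    """Every station counts equally, wherever it is."""
    return sum(trend for _, trend in stations) / len(stations)

def cell_mean(stations):
    """Average stations within each grid cell first, then average the cells."""
    by_cell = {}
    for cell, trend in stations:
        by_cell.setdefault(cell, []).append(trend)
    cell_avgs = [sum(v) / len(v) for v in by_cell.values()]
    return sum(cell_avgs) / len(cell_avgs)

# (cell label, trend in deg C per decade); all values invented.
peninsula = [("peninsula", 0.4)] * 15
rest = [(f"cell{i % 9}", 0.05) for i in range(27)]

base = peninsula + rest      # 42 stations, as in the post
doubled = base + peninsula   # steps 1 and 2: 15 identical "new" stations

def sensitivity(estimator):
    """Step 3: change in the estimated trend caused by the duplicates."""
    return estimator(doubled) - estimator(base)

print(f"plain mean shift: {sensitivity(plain_mean):+.3f}")
print(f"cell mean shift:  {sensitivity(cell_mean):+.3f}")
```

The plain mean moves toward the peninsula value when the duplicates are added, while the cell-based average does not move. Running the same check on RegEM itself would mean replacing the stand-in estimators with a call to the actual reconstruction code.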

  41. Posted Feb 26, 2009 at 7:49 AM | Permalink

    I think environmental awareness is slowly improving. In addition, the market niche for environmental protection keeps growing, since demand is rising too. So development is slowly taking a positive course. Furthermore, the economic crisis should also be seen as an opportunity, because when old structures are destroyed, new structures will grow. As nature would have it, when something new emerges it can be far better and more modern. Just let the politicians get on with it; they all only want to save their world of money, not our environment. Politics is only about power, not idealism.
    Here is also a tip for you for posting.
    This is not meant to be spam; I find these pages interesting.
    Umweltschutz im Bog (environmental protection in the blog)
    NEW record for carbon dioxide emissions
    With sustainable regards
