Smerdon et al 2008 on RegEM

Smerdon et al 2008 is an interesting article on RegEM, continuing a series of exchanges between Smerdon and the Mann group that has been going on for a couple of years.

We haven’t spent as much time here on RegEM as we might have. I did a short note in Nov 2007 here.

In July and August 2006, Mann (dba Anonymous Reviewer #2), in the open review of Bürger and Cubasch (CPD, 2006), referred to “correct” RegEM, meaning Rutherford et al 2005.

On July 10, 2006, Jean S commented on the Rutherford-Mann 2005 “adaptations”, noting three important ones:
1. use of a “hybrid” method: separate application of RegEM to “low-frequency” and “high-frequency” components, as separated by Mannian versions of Butterworth filters;
2. stepwise RegEM;
3. an unreported “standardization” step. CA readers were aware by this time that short-segment standardization could have a surprising impact on reconstructions – a point then very much in the news, its confirmation in the North and Wegman reports being fresh at the time. Jean S observed of this unreported standardization:

The above code “standardizes” all proxies (and the surface temperature field) by subtracting the mean of the calibration period (1901-1971) and then divides by the std of the calibration period. I’m not sure whether this has any effect to the final results, but it is definitely also worth checking. If it does not have any effect, why would it be there?

The unreported standardization step noted by Jean S was subsequently determined to be at the heart of an important defect described in Smerdon and Kaplan 2007.
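
To make concrete what is at stake, here is a minimal R sketch (toy data and made-up variable names, not the R05 or M05 code) contrasting the two choices: standardizing over the calibration interval only, as an actual reconstruction must, versus standardizing over the full period, which is only possible when the “missing” values are already known, as in a pseudoproxy experiment.

    # Minimal sketch (toy data): calibration-only vs full-period standardization.
    set.seed(1)
    n <- 600                                   # say, years 1400-1999
    calib <- 501:600                           # hypothetical calibration interval
    X <- matrix(rnorm(n * 5), n, 5)            # toy "proxy" matrix
    X[1:500, ] <- X[1:500, ] - 0.5             # earlier period colder on average

    # R05-style: center and scale with calibration-interval statistics only
    std_calib <- scale(X, center = colMeans(X[calib, ]),
                          scale  = apply(X[calib, ], 2, sd))

    # M05-style: center and scale with full-period statistics
    std_full <- scale(X)

    # With non-random missingness (a colder early period) the two versions
    # differ by a systematic offset, not just sampling noise:
    round(colMeans(std_calib[1:500, ]) - colMeans(std_full[1:500, ]), 2)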

Mann et al 2005 had supposedly tested the RegEM methodology used in the Rutherford et al 2005 reconstruction, which was then presented as mutually supporting the MBH reconstruction (although Rutherford et al 2005 could be contested on grounds other than those discussed by Smerdon, since Rutherford et al 2005 used Mannian PCs without apology). The Smerdon and Kaplan 2007 findings are summarized as follows in Smerdon et al 2008:

Mann et al 2005 attempted to test the R05 RegEM method using pseudoproxies derived from the National Center for Atmospheric Research (NCAR) Climate System Model (CSM) 1.4 millennial integration… Mann et al 2005 did not actually test the Rutherford et al 2005 technique, which was later shown to fail appropriate pseudoproxy tests (Smerdon and Kaplan 2007). The basis of the criticism by Smerdon and Kaplan (2007) focused on a critical difference between the standardization procedures used in the M05 and R05 studies (here we define the standardization of a time series as both the subtraction of the mean and division by the standard deviation over a specific time interval). Their principal conclusions were as follows: 1) the standardization scheme in M05 used information during the reconstruction interval, a luxury that is only possible in the pseudoclimate of a numerical model simulation and not in actual reconstructions of the earth’s climate; 2) when the appropriate pseudoproxy test of the R05 method was performed (i.e., the data matrix was standardized only during the calibration interval), [the derived reconstructions exhibited] biases and variance losses throughout the reconstruction interval; and 3) the similarity between the R05 and Mann et al. (1998) reconstructions, in light of the demonstrated problems with the R05 technique, suggests that both reconstructions may suffer from warm biases and variance losses.

In their Reply to Smerdon and Kaplan 2007 (Mann et al 2007b), Mann and coauthors claimed that the selection of the ridge parameter using generalized cross-validation (GCV), as performed in R05 and M05, was the source of the problem:

The problem lies in the use of a particular selection criterion (Generalized Cross Validation or ‘GCV’) to identify an optimal value of the ‘ridge parameter’, the parameter that controls the degree of smoothing of the covariance information in the data (and thus, the level of preserved variance in the estimated values, and consequently, the amplitude of the reconstruction).

Smerdon et al 2008 (JGR) delicately observed that this assertion was supported only by arm-waving:

The authors do not elaborate any further, however, making it unclear why such conclusions have been reached.
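
Since the Reply pins the problem on GCV selection of the ridge parameter, it may help to see, in a generic setting, what that selection does. The sketch below is ordinary ridge regression with a GCV criterion (an illustration of the idea, not Schneider’s RegEM implementation): the ridge parameter minimizing the GCV score is chosen, and noisier data push the selected value upward, i.e. toward heavier smoothing and lower-amplitude estimates.

    # Generic ridge regression with GCV selection of the ridge parameter
    # (illustration only, not the RegEM code).
    ridge_gcv <- function(X, y, lambdas) {
      n <- nrow(X)
      gcv <- sapply(lambdas, function(lam) {
        H <- X %*% solve(crossprod(X) + lam * diag(ncol(X)), t(X))  # hat matrix
        r <- y - H %*% y                                            # residuals
        n * sum(r^2) / (n - sum(diag(H)))^2                         # GCV score
      })
      lambdas[which.min(gcv)]
    }

    set.seed(2)
    X <- scale(matrix(rnorm(100 * 10), 100, 10))
    y <- X[, 1] + rnorm(100, sd = 2)                 # noisy toy target
    ridge_gcv(X, y, lambdas = 10^seq(-2, 3, length.out = 50))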

Smerdon et al 2008 report that the “explanation” given in Mann et al 2007a, 2007b for the problem is invalid, stating:

These results collectively rule out explanations of the standardization sensitivity in RegEM-Ridge that hinge on the selection of the regularization parameter, and point directly to the additional information (i.e., the mean and standard deviation fields of the full model period) included in the M05 standardization as the source of the differences between M05- and R05-derived reconstructions. It should be noted further that this information, especially in terms of the mean, happens to be “additional” only because of a special property of the dataset to which RegEM is applied herein: missing climate data occur during a period with an average temperature that is significantly colder than the calibration period. This property clearly violates an assumption that missing values are missing at random, which is a standard assumption of EM (Schneider 2006). If the missing data within the climate field were truly missing at random, there presumably would not be a significant systematic difference between the M05 and R05 standardizations, and hence corresponding reconstructions. The violation of the randomness assumption, however, is currently unavoidable for all practical problems of CFRs during the past millennium and thus its role needs to be evaluated for available reconstruction techniques.

Finally, when the application of RegEM-Ridge is appropriately confined to the calibration interval the method is particularly sensitive to high noise levels in the pseudoproxy data. This sensitivity causes low correlation skill of the reconstruction and thus a strong “tendency toward the mean” of the regression results. It therefore will likely pose some challenges to any regularization scheme applied to this dataset when the SNR in the proxies is high. We thus expect RegEM-TTLS, which according to M07a does not show standardization sensitivity, to have significantly higher noise tolerance and skill than RegEM-Ridge. The precise reasons and details of this skill increase is a matter for future research. It remains a puzzling question, however, as to why the R05 historical reconstruction that was derived using RegEM-Ridge and the calibration-interval standardization (thus expected to be biased warm with dampened variability) and the M07a historical reconstruction that used RegEM-TTLS (thus expected not to suffer significantly from biases) are not notably different. The absence of a demonstrated explanation for the difference between the performance of RegEM-Ridge and RegEM-TTLS, in light of the new results presented herein, therefore places a burden of proof on the reconstruction community to fully resolve the origin of these differences and explain the present contradiction between pseudoproxy tests of RegEM and RegEM-derived historical reconstructions that show little sensitivity to the method of regularization used.
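
Both the “tendency toward the mean” under heavy proxy noise and the interaction with a colder missing period can be seen in a very small pseudoproxy experiment. The sketch below is entirely synthetic (it is not the CSM-based test): a regression of temperature on a noisy proxy is calibrated over a warm late interval and used to reconstruct an earlier, colder interval. As the SNR falls, the reconstruction collapses toward the calibration mean, losing variance and acquiring a warm bias.

    # Tiny synthetic pseudoproxy experiment: calibrate on a warm recent
    # interval, reconstruct a colder earlier one, and watch what low SNR does.
    set.seed(3)
    n <- 1000
    temp <- c(rnorm(900, mean = -0.4, sd = 0.3),   # "reconstruction" interval (colder)
              rnorm(100, mean =  0.0, sd = 0.3))   # "calibration" interval (warmer)
    calib <- 901:1000

    recon_check <- function(snr) {
      proxy <- temp + rnorm(n, sd = sd(temp) / snr)   # pseudoproxy at a given SNR
      fit   <- lm(temp[calib] ~ proxy[calib])         # calibrate on the late interval
      recon <- coef(fit)[1] + coef(fit)[2] * proxy[1:900]
      c(warm_bias = mean(recon) - mean(temp[1:900]),
        var_ratio = var(recon) / var(temp[1:900]))
    }
    round(sapply(c(4, 2, 1, 0.5), recon_check), 2)    # columns: decreasing SNR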

While Mann is normally not reticent about citing papers under review, Smerdon et al 2008 is, for some reason, not cited in either Mann et al 2008 or Steig et al 2009.

In my opinion, there are other issues with the RegEM project, quite apart from these. They relate more to exactly what one is trying to accomplish with a given multivariate methodology.

References:
Bürger, G., and U. Cubasch. 2005. Are multiproxy climate reconstructions robust? Geophysical Research Letters 32, L23711: 1-4.
—. 2006. On the verification of climate reconstructions. Climate of the Past Discussions 2: 357-370.
Mann, M. E. 2006. Interactive comment on “On the verification of climate reconstructions” by G. Bürger and U. Cubasch. Climate of the Past Discussions 2: S139-S152. url
Mann, M. E., S. Rutherford, E. Wahl, and C. Ammann. 2005. Testing the Fidelity of Methods Used in Proxy-Based Reconstructions of Past Climate. Journal of Climate 18, no. 20: 4097-4107.
Mann, M. E., S. Rutherford, E. Wahl, and C. Ammann. 2007a. Robustness of proxy-based climate field reconstruction methods. Journal of Geophysical Research 112. (revised Feb 2007, published June 2007) url
Mann, M. E., S. Rutherford, E. Wahl, and C. Ammann. 2007b. Reply to Smerdon and Kaplan. Journal of Climate 20: 5671-5674. (Nov 2007) url
Rutherford, S., M. E. Mann, T. J. Osborn, R. S. Bradley, K. R. Briffa, M. K. Hughes, and P. D. Jones. 2005. Proxy-Based Northern Hemisphere Surface Temperature Reconstructions: Sensitivity to Method, Predictor Network, Target Season, and Target Domain. Journal of Climate 18, no. 13: 2308-2329.
Smerdon, J. E., J. F. González-Rouco, and E. Zorita. 2008. Comment on “Robustness of proxy-based climate field reconstruction methods” by Michael E. Mann et al. Journal of Geophysical Research 113. url
Smerdon, J. E., and A. Kaplan. 2007. Comments on “Testing the Fidelity of Methods Used in Proxy-Based Reconstructions of Past Climate”: The Role of the Standardization Interval. Journal of Climate 20: 5666-5670. url
Smerdon, J. E., A. Kaplan, and D. Chang. 2008. On the origin of the standardization sensitivity in RegEM climate field reconstructions. Journal of Climate 21: 6710-6723. url

44 Comments

  1. Posted Feb 13, 2009 at 1:52 PM | Permalink

    This sensitivity causes low correlation skill of the reconstruction and thus a strong “tendency toward the mean” of the regression results.

    I wonder if that’s what happened to my reconstruction efforts. I have attempted to recreate the AWS RegEm at this link.

    Antarctic Temperature RegEm Forensics

    I am admittedly a rookie when it comes to these techniques, but while I get similar results to the AWS reconstruction, there is an offset between my data and the final result.

  2. Scott Brim
    Posted Feb 13, 2009 at 2:18 PM | Permalink

    Is there a RegEM 101 online class available somewhere that explains in some detail what it is and what it does?

    • MJT
      Posted Feb 13, 2009 at 3:33 PM | Permalink

      Re: Scott Brim (#2),
      http://climateaudit101.wikispot.org/Glossary_of_Acronyms has a Glossary of Acronyms.
      This article is linked from that list.

      Click to access imputation.pdf

      • Scott Brim
        Posted Feb 13, 2009 at 4:12 PM | Permalink

        Re: MJT (#3),
        .
        I’m reading through this material now. Thanks.
        .
        Of course, another related question arises — one which is likely much more difficult to answer — have the software systems that implement RegEM techniques been subject to software QA/QC/V&V processes so as to be reasonably certain these systems are operating faithfully to the externally specified RegEM algorithms or desired variants thereof?
        .
        Moreover, are these software systems adequately documented both internally and externally? (I define “software system” in a broad context here. Some of the short snippets of R code we see posted on CA on a regular basis would have taken pages and pages and pages of coding to duplicate in FORTRAN.)

  3. Kohl Piersen
    Posted Feb 13, 2009 at 3:40 PM | Permalink

    I am bemused.

    Are the problems which seem to bedevil many of these climatological papers (not just the ones identified here) a result of scientists who are climatologists (or whatever) first and only secondarily statisticians? Or are they the result of wishful thinking (they want & expect certain results, and so anything which leads in that direction is ‘promoted’ and anything contrary ‘relegated’)? Or is there something more methodical going on – deliberate manipulation for particular purposes (whatever they may be)?

  4. Kohl Piersen
    Posted Feb 13, 2009 at 3:45 PM | Permalink

    MJT

    The link refers to an article entitled:

    “Analysis of Incomplete Climate Data: Estimation of Mean Values and
    Covariance Matrices and Imputation of Missing Values” by
    TAPIO SCHNEIDER

    Clearly, that arises from climate science.

    Has this method been successfully applied in other fields?

    • MJT
      Posted Feb 14, 2009 at 12:20 AM | Permalink

      Re: Kohl Piersen (#5),
      Here is a non-climate science paper on the Regularized EM Algorithm. http://www.cs.ucr.edu/~hli/paper/hli05rem.pdf
      A quick google also turned up some information in medical imaging for example: http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&id=JEIME5000012000001000017000001&idtype=cvips&gifs=yes

      • Jean S
        Posted Feb 14, 2009 at 3:41 PM | Permalink

        Re: MJT (#14),
        I can not access the second paper right now, but the first paper does not appear to be dealing with Schneider’s RegEM.

        • Scott Brim
          Posted Feb 14, 2009 at 4:17 PM | Permalink

          Re: Jean S (#28)

          Jean S: MJT #14… I can not access the second paper right now, but the first paper does not appear to be dealing with Schneider’s RegEM.

          The first paper was put up for my benefit as newbie in understanding RegEM. The paper contains a fairly readable description of the concepts behind RegEM, albeit for an application in the physical sciences that is not related to climate science.
          .
          I must say that the subsequent discussions among Geoff, Pat, and Peter are a lot more understandable to me, after having read MJT’s various RegEM-related references, than they would have been otherwise.

  5. Henry
    Posted Feb 13, 2009 at 3:51 PM | Permalink

    a burden of proof on the reconstruction community to fully resolve the origin of these differences and explain the present contradiction between pseudoproxy tests of RegEM and RegEM-derived historical reconstructions that show little sensitivity to the method of regularization used

    That comment is a particularly subtle knife. Now the team will have to fail to confirm each others results by a particular amount.

    • Posted Feb 13, 2009 at 6:47 PM | Permalink

      Re: Henry (#6),

      That comment is a particularly subtle knife. Now the team will have to fail to confirm each others results by a particular amount.

      …unless they ignore them and “move on”. I don’t see the compulsion of the Hockey Team to explain themselves anywhere – do you?

  6. Mike B
    Posted Feb 13, 2009 at 4:03 PM | Permalink

    Oh my.

    And to think Smerdon and Kaplan did this work right under Hansen’s nose at Columbia.

  7. Geoff Sherrington
    Posted Feb 13, 2009 at 5:24 PM | Permalink

    This is astounding. The use of figures outside the calibration period, to influence the calibration measures, is an absolute NO NO. Keeping your cake and eating it too.

    ALL infilling is guesswork. One can easily imagine when data are unreported because they are abnormally high or low in the opinion of the reporter. There is no mathematical method that will infill such results, since infilling tends towards the local mean. (Some help can come from surrounding sites, but we are not discussing that here).

    It is especially unwise to infill or smooth calibration data. Such an approach tends to remove the very purpose of the calibration. Missing values are best left out and if this causes distortions (like leaving out a disproportionate number of winter results) then the conclusion is simply that the calibration cannot be done with the available figures.

    Remove them from active duty and move on. I mean the figures, but the same could apply to the authors.

    • Pat Frank
      Posted Feb 13, 2009 at 6:05 PM | Permalink

      Re: Geoff Sherrington (#10), Geoff, thanks for raising an issue that’s bothered me for some time. In-filling of ‘missing’ data points, by whatever means, adds no independent information to a data set. If someone did that in any other experimental field, it would be called cheating.

      It’s totally bemusing to me that this has become standard practice in proxy climatology. Any statistics based on in-filled data sets that is any different from the statistics of the original spotty data is meaningless and misleading. I plain don’t understand how this practice passes review. And I further plain don’t understand how proxy climatology retains any credibility outside the hermetic cloister of the practitioners. But it does, remarkably.

      I have the same problem with the practice of adjusting urban temperatures using rural station values. Doing so removes all independent significance from the urban temperatures. It makes them no more than a scaled version of the rural set. Using those adjusted temperatures as independent additions to a regional data set is completely unjustifiable.

    • davidc
      Posted Feb 14, 2009 at 1:55 AM | Permalink

      Re: Geoff Sherrington (#9),

      Geoff, you say: “Missing values are best left out” but with this kind of statistical analysis that would exclude a lot of data. Given the prominent use of covariance I think you would need a data set in which every location had data at the same times as every other location that was retained. And that would not be easily decided by objective criteria. If you had a record from 1850 to the present which missed June 1945 wouldn’t you need to exclude the data at all the other locations for June 1945? Or would you exclude all the long record and keep the others? As Steve says, with these methods you need to know what you are trying to do.

      I think that the fundamental problem is the type of statistical analysis that is being done. I think the right approach (to answer the questions, are we facing catastrophe and do we need to take drastic action?) is:

      1. Look at individual sites, estimate a trend.
      2. Plot a histogram of the trends (and if you feel a need, do some of the statistics following naturally from that view of the results)

      Anything alarming here?

      If there is (like a trend of the order of the expected UHI) separate the data into subsets according to expected UHI. No “adjustments” required.

      Look at individual sites at the alarming end of the histogram to see if there’s anything odd going on. Is it actually linear, for example? Easy to see with an individual site (impossible, I think, with ridge regression and PCA, although they assume that ALL of the sites are linear; this statistical analysis just obscures the obvious). Specific site issues, cf. Wattsup?

      There could be lots more to flow from this approach but I think it’s very striking that this simple approach to communicating hasn’t been tried. I say striking, not surprising.

      Steve, thanks for your fantastic work (and Anthony).
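
      The two-step recipe described above takes only a few lines of R. The sketch below assumes a hypothetical data frame temps with columns station, year and temp (names invented for illustration): fit a linear trend per station, then look at the distribution of trends.

          # Per-station trends and their histogram (sketch; assumes a hypothetical
          # data frame `temps` with columns station, year, temp).
          trends <- sapply(split(temps, temps$station), function(d)
            coef(lm(temp ~ year, data = d))["year"])      # degrees per year, each station
          hist(trends, breaks = 30, xlab = "trend (deg/yr)",
               main = "Distribution of per-station trends")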

      • Geoff Sherrington
        Posted Feb 14, 2009 at 6:58 AM | Permalink

        Re: davidc (#16),

        I think that a major part of the problem is that in many instances the range of data is quite small for say temperatures at a station. So also is the effect being sought, such as 0.8 deg C over a century over the globe. Therefore, a rather anomalous result, discarded because it looks out of place, can lever the rest of the data. The alternative, to in-fill with a value that looks plausible either by eyeball or convoluted statistics, is not scientific, although it is convenient. There is no great merit in departing from scientific principles, in the direction of guessing, because it’s convenient. My science does not work that way, unless heavily qualified. Would you rely on landing an aircraft whose automatic guidance software was part guess, when an error of a few feet separates success from disaster?

        You would appreciate the problem more if, like in my geochemical career, we were looking for those golden nuggets in the tonnes of dross. Our purpose was to strive to confirm the integrity of information-rich outliers, while climate science attempts to homogenise them, and this has the potential to distort the true data.

        We also had the possibility of returning to collect another specimen, which option is generally not open to temporal temperature data. So perhaps I am being too dogmatic about the climate approach.

        However, when in Re: MJT (#14), I read in the first reference

        Since the missing data Y is totally unknown and is “guessed” from the incomplete data, how can we choose a suitable Y to make the solution more reasonable?

        then I see others recognising the same problem.

        davidc again,

        If you had a record from 1850 to the present which missed June 1945 wouldn’t you need to exclude the data at all the other locations for June 1945?

        The answer is no, you would not reject the data lightly. You might treat separately the data before and after a large loss, and you might create synthetic losses to see the errors that that induces. In an extreme case, we have so many adjustments to data already that I for one have no trust in most of it. We might, in some cases like the Antarctic, stretch missing value substitution to the point of fairy stories.

        If it walks right, if it talks right, if it looks right, it’s still a guess and one algorithm is as good as another to mislead. That is one of my criticisms of the primal urge to estimate a global average so that small children can be scared. Forget the global average, handle sub-sets with which you can become intimate and start to understand them.

        When you understand a few, then aggregate your knowledge. And again and again.

        • davidc
          Posted Feb 15, 2009 at 12:53 AM | Permalink

          Re: Geoff Sherrington (#18),

          Forget the global average, handle sub-sets with which you can become intimate and start to understand them.

          When you understand a few, then aggregate your knowledge. And again and again.

          Exactly. Then you won’t use nonsense data like the Finnish sediment data that showed when the bridge was built. But my other point was that the problems start with the selection of a statistical model. Once you decide to use RegEM, or lots of similar methods, as I understand it you are pretty much obliged to infill. But if you used simpler methods you’re not. If you analyse trends at individual locations missing data is not a problem. But if you were tempted to infill missing data at a single location and use data from other locations it would be pretty obvious that what you were doing was flawed. And if you felt you had to infill it would be obvious that the data at that location was inadequate.

        • Geoff Sherrington
          Posted Feb 15, 2009 at 4:36 AM | Permalink

          Re: davidc (#36),

          I’ll defer to Steve if he disagrees with this, but to me there is almost a philosophical event, a “paradigm shift” to use the language of 20 years ago.

          Most science is directed to the discovery of new knowledge and its confirmation. Most science seizes upon observations that are different to the expected. Some branches of science love data that vary over large spans in complicated ways. But much of what climate science does is “homogenise”. Seek the lowest denominator, discard the interesting variations on the theme. Play a monotone instead of a melody.

          Another paradigm shift. In the past, if mistakes were made and discovered, the authors and helpers would seek what went wrong and why. Often this was by breaking up the bigger problem into bite-sized pieces, controlling as many variables as possible, then saying Eureka when a subset revealed the error. Climate people on the other hand commonly deny error and react by contriving hideously complicated models that have little chance of finding fundamental errors because they are poorly formulated originally.

          When PCs were becoming available for my geological fraternity in the early 80s, we encouraged data gatherers to still plot maps by hand, the old way, rather than just plug the numbers into a package. Numbers take on a personality when you worry them enough. That can help you do a better job. But, to lump a huge mass into an amorphous machine and place your faith in a one-page printout – that is not science, that is becoming a servant of the machine.

          It is for reasons like this that I dislike the brute force solution to the 4 colour map problem. It’s not really proven, it’s just shown very unlikely to be wrong.

  8. Mike Davis
    Posted Feb 13, 2009 at 6:40 PM | Permalink

    Pat and Geoff: Thank you both for those comments.

  9. Alan Wilkinson
    Posted Feb 13, 2009 at 10:42 PM | Permalink

    Pat Frank, the argument will be that the infilling is a merging of information from different sources to give a more complete overall picture.

    The problem for them is that the additional data (satellite data) does not cover the same timespan. Therefore they are extrapolating not only spatially but temporally, and it appears most of the trend they produce comes from the timespan the satellite data does not cover.

  10. Posted Feb 14, 2009 at 2:09 AM | Permalink

    This sensitivity causes low correlation skill of the reconstruction and thus a strong “tendency toward the mean” of the regression results.

    I don’t know much about RegEM, but this sounds very much like the same story we’ve had here on ICE and CCE. If data are missing at random, we can use ‘natural calibration’ (ICE), as the calibration data are ‘like’ the missing values. That is, we have knowledge about the unknown temperature value prior to observing the proxies. That’s why the estimator gives values close to the calibration mean when SNR is low.

    So, if modern values are expected to be unprecedented, we should use statistical extrapolator, CCE. The difference is clear, as shown earlier with the Briffa case ( http://www.climateaudit.org/?p=4475#comment-314389 )

    I tried original regem.m with Juckes’ JBB (Jones 98) dataset, and here is the result:

  11. Peter Hartley
    Posted Feb 14, 2009 at 7:53 AM | Permalink

    Geoff #9 and subsequent contributors

    I think that there may be a reasonable idea implicit in some interpolation of temperature series. For example, we expect the temperatures to have high spatial autocorrelation. Suppose temperature is missing at location A on date t. We can use the relationships between non-missing temperatures at A and measured temperatures at its neighbors B, C, D etc on other dates to infer the likely temperature at A on date t given the temperatures at B, C, D etc on date t. It is true that the temperature at A on date t is then some function of the measurements at the other locations on date t, and thus in a sense does not add information to those values. Surely, the constructed values at A cannot be used to tell us anything more about the spatial relationship since that was used to obtain the missing value. It may, however, be able to tell us something about the temporal relationship, which was not used to construct the missing value.

    The basic idea behind EM is that it uses the currently estimated statistical model to predict the most likely value for missing observations. Those observations are then combined with the non-missing ones and used to estimate new parameter values for the statistical model. The new model is then used to obtain new estimates for the missing values and so on, iterating to convergence. If the basic model is correct and we are only uncertain about the parameter values, this has a logic to it.

    One practical problem with the algorithm is that the likelihood function that is being maximized often has many local maxima and so the result is sensitive to starting values. Essentially, different sets of parameter values and values for the missing observations give an equally good fit to the observed data. A conceptual problem is that it is a “statistical sausage machine”. The source of the infilled data is rather mysterious and hidden to the user, making it very hard to judge whether the output has any value.

    The results are also very dependent on us having the right functional form and the implicit assumption that the missing observations are not systematically biased in some way — they are “missing at random” and uncorrelated with the assumed determining factors in the statistical model. This latter assumption might not be right in the application in question. Ground temperatures might be more likely missing in certain types of weather, and the satellite measurements are more likely to be missing on cloudy days, i.e. certain types of weather.
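
    The iteration described above fits in a short loop. The sketch below is a bare-bones EM-style imputation cycle for data assumed multivariate normal: fill the gaps, re-estimate the mean and covariance, re-impute each gap from its conditional expectation, and repeat until convergence. It deliberately omits the ridge regularization and the conditional-covariance correction that Schneider’s RegEM adds, so it illustrates the iteration, not the published algorithm.

        # Bare-bones EM-style imputation loop (illustration only; RegEM adds
        # ridge regularization and a conditional-covariance correction).
        # Assumes a numeric matrix with no all-missing rows.
        em_impute <- function(X, tol = 1e-6, maxit = 100) {
          miss <- is.na(X)
          # start by filling gaps with column means
          X[miss] <- rep(colMeans(X, na.rm = TRUE), each = nrow(X))[miss]
          for (it in seq_len(maxit)) {
            old <- X[miss]
            mu <- colMeans(X); S <- cov(X)          # re-estimate mean and covariance
            for (i in which(rowSums(miss) > 0)) {   # each row with gaps
              m <- miss[i, ]; o <- !m
              # conditional expectation of missing given observed (regression step)
              X[i, m] <- mu[m] + S[m, o, drop = FALSE] %*% solve(S[o, o], X[i, o] - mu[o])
            }
            if (max(abs(X[miss] - old)) < tol) break  # iterate to convergence
          }
          X
        }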

    • Pat Frank
      Posted Feb 14, 2009 at 12:12 PM | Permalink

      Re: Peter Hartley (#19), Peter, all your iterative method would do is produce a numerically coherent result. The iteration is not being done within the context of any falsifiable physical theory, and so the result is, in two words, scientifically meaningless. The significance of the result is then assigned by fiat — it looks reasonable, and so it is reasonable.

      The fact remains that infilling data by reference to surrounding data is not new data. It’s re-scaled old data and adds nothing to a temperature series except meaningless, but perhaps polemically useful, points. We’ve seen here how autocorrelation removes statistical degrees of freedom and raises confidence intervals. Infilling produces points that are entirely correlated to other data, statistically. They add nothing.

      Even infilling using satellite temperatures would require having fully established a scaling relationship between satellite data and surface station data in a scientific context entirely removed from the two temperature data sets themselves. That is, the relationship between data sets would have to be developed by reference to the instruments themselves and the physical parameters of what they measure. Temperature probes measure kinetic energy, for example, while satellites measure radiation. A physical relationship interconverting these measurements would have to be developed in order to know how to change a satellite measurement from 600 km in space into a surface station temperature 1.5 m above the ground. Only after that could satellite data be used to infer a missing surface temperature. One can’t just take a satellite T series, and a surface station T series, and normalize the first to the second to get points for infilling. Doing so is a kind of circular reasoning in which all the complexities of the relationship are compressed into the same linear hypothesis that is also being used to scale the data.

      • Geoff Sherrington
        Posted Feb 14, 2009 at 7:58 PM | Permalink

        Re: Pat Frank (#23),
        And Peter Hartley,

        A better analogy than I used before. Imagine a medical instrument recording your electrocardiogram. Imagine using RegEM type software to fill in missing data. How do you pick up ectopic beats and mild fibrillation? You are defeating a diagnostic purpose of the procedure.

        If you find missing pulse patterns, there’s not much point in taking data from siblings and infilling. There might be, if you can establish a confident relation between the appropriate behaviour of the various hearts, but the equivalent of that step is often done in climate work in an unsatisfactory way.

        On another tack, somewhere on these Antarctic posts it was noted that there was large variation between 2 stations on ends of a smallish island. IIRC, this amount of variation sets a base level for variation in surrounding places. That is, interpolations over large distances cannot produce a smoother result than the variation between 2 close stations unless errors or special events are present.

        I have no doubt that a great deal of thought and skill has gone into these problems, but occasionally problems are intractable. I’m not knocking the mathematicians who work at this – I hope they can find an answer.

  12. Posted Feb 14, 2009 at 8:01 AM | Permalink

    So, Juckes INVR (=CCE) is the only (published) reconstruction method that can extrapolate. Juckes only showed a smoothed result, and incorrectly thought that the root-mean-square residual in the calibration period suffices for uncertainties. For your convenience, I’ll plot the result using Brown’s formula for CI:

    Next, let’s try RegEM with pseudoproxies that do contain significant MWP signal..

  13. Posted Feb 14, 2009 at 10:04 AM | Permalink

    Mann et al 2005 attempted to test the R05 RegEM method using pseudoproxies derived from the National Center for Atmospheric Research (NCAR) Climate System Model (CSM) 1.4 millennial integration… Mann et al 2005 did not actually test the Rutherford et al 2005 technique, which was later shown to fail appropriate pseudoproxy tests (Smerdon and Kaplan 2007).

    This sounds more like lore than science: building speculations upon previous speculations, until any contact with reality is lost and all that remains is the moral the sage wants to transmit.

    Thank you very much for this blog. The auditing work being done here is outstanding. However, what shocked me most was to discover the kind of science and methodology which are behind all these doomsday predictions.

  14. Mike B
    Posted Feb 14, 2009 at 10:42 AM | Permalink

    I just read the commentary on the Bürger and Cubasch paper.

    Wow. Mann is … quite … a … [trying to find a way not to be snipped] piece of work.

    I can’t believe the scientific community tolerates his bullying tactics.

    The idea that there is scientific consensus on the Hockey Stick, even among the relatively closed Climate Science Community, is slowly being exposed as mythology.

    It’s becoming increasingly obvious that it is Hansen, Schmidt, Mann and the rest of the Team who would prefer to have the debate in the echo chamber of the mainstream press rather than in scientific journals.

  15. Peter Hartley
    Posted Feb 14, 2009 at 12:58 PM | Permalink

    Pat #23 I don’t really disagree with your arguments here. If there is no theory justifying linking the two series, doing so statistically can yield misleading results. I was only trying to explain intuitively what EM and related procedures try to do.

    The implicit theory underlying EM is that the missing observations are randomly distributed with respect to the rest of the data and we can use the model as a “theory” to fill them in in a way that makes the chance of seeing the data that actually was measured the most probable outcome (maximized likelihood of observation).

    As I said, however, I find algorithms like EM a bit of a “statistical sausage machine” — what comes out might “taste good” to the researcher, but you don’t want to know the manufacturing process or you might be disinclined to accept it! In other words, I am fundamentally agreeing with you that output from EM and related procedures needs to be treated with some caution. We would like to see any results confirmed with more transparent calculations before we pay a lot of attention to them.

  16. Kohl Piersen
    Posted Feb 14, 2009 at 2:56 PM | Permalink

    Re MJT #14

    Yes. Exactly what I was looking for. Thanks.

  17. Jean S
    Posted Feb 14, 2009 at 3:31 PM | Permalink

    Given the acknowledgement of error

    I guess it is a good strategy to admit a “small” error when pressured, and hope that no one notices the greatest problem. This “standardization problem” pales in comparison to what was actually done in Rutherford et al 2005: the core of the (Reg)EM method was coded horribly wrong (*)! Or as Tapio Schneider put it:

    estimating a covariance matrix as a sample second moment matrix when data are not centered would obviously be problematic

    (*) I guess Smerdon et al have not noticed this yet, as they seem to use Schneider’s original code as the core of the RegEM algorithm, not Rutherford’s modified code.
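
    For anyone unsure why Schneider’s remark matters, a toy check shows how far an uncentered second-moment matrix drifts from the covariance matrix when the columns have non-zero means, as temperature and proxy series generally do:

        # Toy check: uncentered second-moment matrix vs proper covariance matrix.
        set.seed(4)
        X <- matrix(rnorm(200 * 3, mean = 5), 200, 3)   # columns with non-zero means
        second_moment <- crossprod(X) / (nrow(X) - 1)   # t(X) %*% X / (n - 1), no centering
        proper_cov    <- cov(X)                         # centers the columns first
        round(second_moment - proper_cov, 2)            # off by roughly mean_i * mean_j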

    • Skiphil
      Posted Jan 31, 2013 at 3:20 PM | Permalink

      As a current BH thread discusses, the picture got much worse but now Mann et al. claim it doesn’t matter because they have moved on again after this inconvenient dismemberment of their work:

      Click to access 2010b_jclim_smerdonetal.pdf

    • Skiphil
      Posted Jan 31, 2013 at 9:18 PM | Permalink

      Smerdon et al. (2013) continues to criticize the Mann-Rutherford corpus rather sharply, if briefly (it is only a Reply to a Comment):

      Click to access 2013_jclim_smerdonetal.pdf

  18. Jeff Shifrin
    Posted Feb 14, 2009 at 3:34 PM | Permalink

    Infilling data is not just useless (since NO NEW INDEPENDENT data is being introduced), it leads to invalid conclusions. The extra dependent data appear to increase the sample size, and thus often create a result that appears to be statistically significant because of the large sample size, but wouldn’t have been statistically significant based on the actual smaller sample.

    • Jean S
      Posted Feb 14, 2009 at 4:26 PM | Permalink

      Re: Jeff Shifrin (#27),
      This is something I do not understand at all in Steig et al: if your target is temperature of Antarctica (or a large portion of it), why do you run RegEM on gridded temperature? Why not only infill your target temperature? In fact, doing that you would not need regularized EM, standard EM would probably suffice. In the same vein one can wonder why Mann et al 2008 used RegEM, and not standard EM.

  19. Steve McIntyre
    Posted Feb 14, 2009 at 5:27 PM | Permalink

    Jean S, thank you for refreshing the link to your prior comment on RegEM.

    Right now we seem to get two inconsistent stories from Steig on which code was used. Steig said at RC that the original Schneider code was used as is, while the article says that the Rutherford-Mann adapted code was used. Can’t both be right.

  20. Gerald Machnee
    Posted Feb 14, 2009 at 5:50 PM | Permalink

    Some data is infilled, then some was deleted if it was a certain amount off. How much did the deleted data contribute to less variation? Or my question is – was the deleted data mostly higher or lower?

  21. MrPete
    Posted Feb 14, 2009 at 8:08 PM | Permalink

    Layman here. Is this anywhere close to a reasonable analogy?…

    My doctor did some blood tests on me. They got my cholesterol numbers and a few other things, but didn’t obtain blood sugar, etc. So, they found another person whose tests closely matched the tests they did have for me…

    …and concluded I must be diabetic since the other person is diabetic.

  22. Mike Davis
    Posted Feb 14, 2009 at 10:11 PM | Permalink

    MrPete:
    Find another dr. fast.

  23. John F. Pittman
    Posted Feb 15, 2009 at 6:07 AM | Permalink

    The intuition behind is that we hope that the missing data have little uncertainty given the incomplete data because the EM algorithm implicitly assumes a strong relationship between the missing data and the incomplete data.

    http://www.cs.ucr.edu/~hli/paper/hli05rem.pdf I think we will get little traction from those using RegEM, since this quote probably reflects what they do believe as scientists. However, a previous post had pointed out that in many cases the missing data are not random, but concentrated in certain month(s). I see non-random missingness as a problem. The spatial context must also have an assumed strong relationship for the paper’s temporal-spatial validity. Wouldn’t out-of-sample spatial data (unused stations) be a way of testing this assumption of a strong relationship, and perhaps of testing the CIs as well?

  24. Posted Feb 15, 2009 at 8:43 AM | Permalink

    Spatial weighting is one of the main issues I have with the RegEM implementation in the Antarctic. The fact that the data are piled together into a matrix with no attempt to ensure that a station in West Antarctica has as little effect as possible on East Antarctica makes my head hurt. There is also the problem of station density in a particular area.

    Jeff C has regridded the Antarctic data according to proximity in an effort to reduce the dominance of high density stations over the rest of the reconstruction. Just by reasonable regridding (still not taking the proximity to the reconstructed station into account) the AWS trend is cut in half!!

    AWS Gridded Reconstruction

    Steve: this is good stuff. I’d done a thread linking to this post to attract more attention to your post.

  25. Robinedwards
    Posted Feb 16, 2009 at 3:31 PM | Permalink

    I’ve posted elsewhere on this infilling problem, and the problems I have with the solutions that have been used. Having just come across the lively discussion in this thread I see that I am not alone in having doubts about the validity of extensive infilling. No-one really prefers infillings (or indeed proxies) if real temperature data are available, but thermometers didn’t appear until the end of the 18th century, and even now they are not easy to install and maintain in remote, very cold regions. So we have a severe problem which those versed in the arcane arts of powerful statistical software are tackling. It seems that some practitioners are less careful than others about their data provenance, and we must be very grateful indeed for the group of engineers/statisticians who inhabit this remarkable blog and who alert the world to these unfortunate habits.

    However, it still seems to me that synthesising data in an attempt to redress the lack of physical measurements in Antarctica is a worrisome procedure which would be difficult to justify to those policy-makers who currently wield absolute power in the world of climate science. Not that they take any notice of anything that runs counter to their preconceived notions, of course.

  26. Posted Feb 17, 2009 at 8:57 AM | Permalink

    Some time ago Jean S wrote

    The book by C.R. Rao (Yes, THE Rao) and H. Toutenburg is an excellent buy if you want an uptodate account of the linear models. It is interesting to follow the work of the Team from this 400+ page book: Hegerl 2006 (total least squares, p. 70) is slightly more advanced (in terms of page numbers) than MBH (partial least squares, p. 65). Judging from that the next attempt will be minimax estimation (p. 72) or censored regression/LAD estimators (p. 80). 😉

    ..and if I’m correctly linking this thread to Rao’s book, we are now on Ch. 8, Analysis of Incomplete Data Sets. And the point I’m after is: do we have (8.4) Missing Data in the Response (solution: ICE-type calibration), or (8.6) Missing Values in the X-matrix (the one-regressor solution, p. 257, looks CCE-type to me)? As I noted before, it seems to me that RegEM is close to ICE. I wrote a short script to test this, here

    And, as bender mentioned the extrapolator vs. interpolator concept, you can try this script with missing values in the middle:

    or missing values at the end:

    It is quite clear that if the values to be imputed are not ‘like’ the calibration values, and the SNR is as low as in all these proxy studies, RegEM (tried without options!) won’t work. CIs are based on calibration data, and that won’t do if we want to see whether current temperatures are unprecedented.

    Kalman smooth the CCE output, that’s my suggestion. Start with a random walk model for temperature; it doesn’t have to be a perfect model at the beginning 😉
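
    For readers who have not followed the earlier ICE/CCE threads, a minimal univariate sketch (toy data only) shows the contrast being drawn here: ICE regresses temperature on the proxy and therefore pulls estimates toward the calibration mean, while CCE regresses the proxy on temperature and inverts the fit, allowing estimates well outside the calibration range.

        # ICE vs CCE on toy univariate calibration data.
        set.seed(5)
        temp_cal  <- rnorm(70, mean = 0, sd = 0.3)       # calibration temperatures
        proxy_cal <- temp_cal + rnorm(70, sd = 0.6)      # noisy proxy (low SNR)

        ice <- lm(temp_cal ~ proxy_cal)                  # inverse calibration (ICE)
        cce <- lm(proxy_cal ~ temp_cal)                  # classical calibration (CCE)

        proxy_new <- -1.5                                # proxy value well below the calibration range
        ice_est <- predict(ice, newdata = data.frame(proxy_cal = proxy_new))
        cce_est <- (proxy_new - coef(cce)[1]) / coef(cce)[2]   # invert the CCE fit
        c(ICE = unname(ice_est), CCE = unname(cce_est))
        # ICE stays near the calibration mean; CCE extrapolates much further out.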

  27. Ryan O
    Posted Dec 20, 2009 at 10:36 AM | Permalink

    Just as an FYI for anyone who had previously read this thread, the Steig study apparently used the original Schneider code . . . either that, or in Steig’s case, the differences between the Mann-Rutherford version and the original version are negligible.

One Trackback

  1. […] here (two 3 MB zip files). High-frequency recon19 shows that I wasn’t completely lost with this comment suggesting RegEM is ICE-like calibration method […]