UC on Mann Smoothing

UC writes in that there’s another Mannian problem:

And GRL08 ( http://www.meteo.psu.edu/~mann/shared/articles/MannGRL08.pdf ) is the prologue. Tried this new smoother ( http://www.meteo.psu.edu/~mann/smoothing08/lowpassadaptive.m ) with HadCRUT monthly and f=0.0104,

[smoothedbest,w0,w1,w2,msebest] = lowpassadaptive(HadM, 0.0104);

got this figure,

[Figure T1: monthly HadCRUT series with the lowpassadaptive smooth]

But the weights are [w0 w1 w2]

ans =

0.4100 0 0.5500

Surprisingly, this does not agree with the text, where the constraint sum(w_j)=1 is specified. I guess there’s something wrong with Mann’s exhaustive search,

for weight0=0:0.01:1
for weight1=0:0.01:1.0-weight0
for weight2=0:0.01:1-weight0-weight1

I hope he will fix this soon.

UPDATE Sept 3, 8 am:
It didn’t take long. As noted in a comment below, Mann changed the online code within hours of this being posted at CA to the following (note the difference in the third line). There is no annotation of the change nor credit to CA.

for weight0=0:0.01:1
for weight1=0:0.01:1.0-weight0
weight2=1-weight0-weight1;
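
As a quick check (a sketch, not taken from either version of Mann’s file), one can enumerate the grid exactly as the original triple loop does and count how many candidate triples actually satisfy the constraint; the vast majority do not:

bad = 0; total = 0;
for weight0 = 0:0.01:1
    for weight1 = 0:0.01:1.0-weight0
        for weight2 = 0:0.01:1-weight0-weight1
            total = total + 1;
            % the original loop admits every triple with w0+w1+w2 <= 1
            if abs(weight0 + weight1 + weight2 - 1) > 1e-6
                bad = bad + 1;
            end
        end
    end
end
fprintf('%d of %d candidate triples do not sum to 1\n', bad, total);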

60 Comments

  1. Jean S
    Posted Sep 2, 2008 at 11:20 AM | Permalink

    Yes, I tried with a signal consisting entirely of ones with the last value changed to zero … and obtained a pretty interesting “smoothing” 😉

    Amazingly or not, Mann solves this rather simple optimization problem with exhaustive search … and even that is coded wrong! Seems to me that none of the reviewers bothered to check the code.

  2. Fred
    Posted Sep 2, 2008 at 11:36 AM | Permalink

    Must have been peer reviewed.

    That technique has always worked before.

  3. Jack Linard
    Posted Sep 2, 2008 at 1:41 PM | Permalink

    I’m an engineer and therefore smarter than most scientists.

    But if I was a scientist and I’d come up with that graph after years of exhaustive investigation, I would have said “Geez, 1910 to 1940 has the same shape and magnitude as 1980 to 2010! Now what the hell did those two sequences have in common?”

  4. Posted Sep 2, 2008 at 2:08 PM | Permalink

    #1

    Mann solves this rather simple optimization problem with exhaustive search

    Mann:

    The Matlab smoothing procedure,
    though not optimal in terms of its statistical properties,
    thus gives results quite similar to more sophisticated and
    precise approaches, and offers the advantage of being less
    computationally cumbersome and easily adaptable (any
    other arbitrary low-pass filter available in Matlab can be
    substituted for the ‘Butterworth’ filter used in our routine).

    less computationally cumbersome??

    tic,[smoothedbest,w0,w1,w2,msebest] = lowpassadaptive(randn(100,1),0.025);toc
    Elapsed time is 18.584496 seconds

  5. Curt
    Posted Sep 2, 2008 at 4:22 PM | Permalink

    UC (#4), you say:

    less computationally cumbersome??

    tic,[smoothedbest,w0,w1,w2,msebest] = lowpassadaptive(randn(100,1),0.025);toc
    Elapsed time is 18.584496 seconds

    Perhaps after his exhaustive search, he could no longer afford that much time…

    In my own work, I care deeply about the computational efficiency of my filter calculations. But I want to be able to repeat them 10,000 – 20,000 times per second.

  6. Thor
    Posted Sep 2, 2008 at 11:58 PM | Permalink

    It seems the code has been updated now

    for weight0=0:0.01:1
    for weight1=0:0.01:1.0-weight0
    weight2=1-weight0-weight1;

    • Jean S
      Posted Sep 3, 2008 at 6:03 AM | Permalink

      Re: Thor (#6),
      http://www.meteo.psu.edu/~mann/smoothing08/

      (minor errors fixed 9/2/08: (i) upper limit error bound too low; (ii) w3 could be chosen such that weights don’t add to unity; under most circumstances—including the analyses in the manuscript–these errors do not influence results obtained, but they could under some circumstances)

      🙂

  7. Posted Sep 3, 2008 at 12:22 AM | Permalink

    #6

    And who said they don’t read CA? 🙂

  8. Mark
    Posted Sep 3, 2008 at 1:01 AM | Permalink

    here’s a BBC news article about Mann’s latest work:

    http://news.bbc.co.uk/2/hi/science/nature/7592575.stm

  9. Posted Sep 3, 2008 at 6:05 AM | Permalink

    I just read the 1300 year articles coming out in the papers this morning and wrote this link below on my blog.

    Climate hottest for 1300 years – The Second Coming?

    I then came here to see what was happening. I should have known it was already being addressed. What I say in my blog is different though. While CA is forced to address the science, the selective elimination of data seems designed to make the result difficult to refute rather than to make the science better.

  10. Posted Sep 3, 2008 at 6:10 AM | Permalink

    Dr. Mann, some weights, such as [0.95 0.05 0] are still missing. Seems to be a rounding issue, try ‘format hex’ to locate the problem.

  11. K. Hamed
    Posted Sep 3, 2008 at 7:44 AM | Permalink

    RE #11

    This should take care of it. Loop counters seem to be single precision.

    w0=0:0.01:1;
    for i=1:length(w0)
    weight0=w0(i);
    w1=0:0.01:1-weight0;
    for j=1:length(w1)
    weight1=w1(j);
    weight2=1-weight0-weight1;

  12. Posted Sep 3, 2008 at 9:07 AM | Permalink

    11, 12. It’s just that Mann is a terrible programmer. Any student knows that you don’t use real variables in loops, you use integers, otherwise you can get this rounding problem.
    It should be
    for j=0:100; weight0=j/100;
    etc

    Or if he knew anything at all about Matlab, he would write
    weight0=linspace(0,1,101)
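
    Fleshing that suggestion out (a sketch only; the variable names are illustrative and the inner evaluation is omitted), integer counters keep the grid exact:

    for j = 0:100
        weight0 = j/100;
        for k = 0:(100 - j)
            weight1 = k/100;
            weight2 = 1 - weight0 - weight1;   % sums to one up to machine precision
            % ... evaluate the candidate (weight0, weight1, weight2) here ...
        end
    end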

  13. Mark T.
    Posted Sep 3, 2008 at 9:50 AM | Permalink

    He doesn’t. One of these days he’ll learn, maybe not. I always chuckle when he mentions “filters available in Matlab.” There are no filters available, there are solvers available. Anyone can code the solvers in any language by simply picking up a book on filter design and implementing the appropriate solver. MATLAB just makes it easy. 🙂

    Btw, ad-hoc implementations that are “not optimal in terms of its statistical properties” are quite scary. You can always find an ad-hoc implementation that works in very select cases, particularly when you have a pre-determined outcome (which can sometimes be useful). “more sophisticated and precise approaches” carry statistical weight.

    Mark

    PS: technically, MATLAB should be all caps in publications since that is the actual product name.

  14. Posted Sep 3, 2008 at 11:12 AM | Permalink

    It is also interesting to see that the great Mann still doesn’t seem to realize that his crazy ‘minimum roughness’ smoothing (a) forces the smoothed curve to go through the final data point, and (b) gives this final point an absurdly large influence on the data. Despite the facts that Steve pointed out (a) ages ago, and that (b) is clear from Mann’s own code:
    apad=2*indata(nn)-indata(nn-ipad);
    This shows that the final data point influences all the padded data, with a weighting factor of 2.

    From lowpass.m:

    % (2) pad series with values over last 1/2 filter width reflected w.r.t. x and y (w.r.t. final value)
    % [imposes a point of inflection at x boundary]

    The same lack of understanding appears in his GRL08 paper:
    “and the ‘minimum roughness’ constraint which seeks the smoothest behavior near the time series boundaries by tending to minimize the second derivative of the smooth near the boundaries.”
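
    For concreteness, this is what the quoted apad line does at the end of a series (a toy sketch; the series, pad length and variable names are all illustrative):

    y    = randn(100,1);                    % toy series, column vector
    nn   = numel(y);
    npad = 20;                              % roughly half the filter width
    pad  = 2*y(nn) - y(nn-1:-1:nn-npad);    % reflect about both x and y at the final point
    ypad = [y; pad];
    % every padded value contains 2*y(nn), so a symmetric filter applied to
    % ypad is pinned to the final observation, as noted above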

  15. Craig Loehle
    Posted Sep 3, 2008 at 11:28 AM | Permalink

    For those who may not have been reading CA long, the problem of smoothing up to the end of your data was discussed exhaustively here a while back. All possible methods for smoothing Mann’s data out to 2006, as in the recent paper, were discussed and shown to create artifacts, sometimes huge artifacts.

  16. RomanM
    Posted Sep 3, 2008 at 3:44 PM | Permalink

    With regard to UPDATE Sept 3, 8 am, the text below had been posted by this morning on Mann’s Supplemental Information page for the paper:

    Lowpass filter routine to implement ‘adaptive’ boundary constraints as described in manuscript (minor errors fixed 9/2/08: (i) upper limit error bound too low; (ii) w3 could be chosen such that weights don’t add to unity; under most circumstances—including the analyses in the manuscript–these errors do not influence results obtained, but they could under some circumstances)

    No attribution to CA’s discovery. Maybe they will when they fix the other problem in the program.

    In the paper, Mann states:

    Rather than choosing a single best constraint, the adaptive approach seeks an optimal combination S0(t) of the three individual smooths Sj(t) of time series T(t) that result from application of each of the three lowest-order (see section 1) boundary constraints viz.

    S0(t) = w1S1(t) + w2S2(t) + w3S3(t)

    S1(t), S2(t) and S3(t) represent the minimum smooth, slope, and roughness solutions respectively, and the coefficients wj are chosen such that Σ [S0(t) − T(t)]² is minimized, subject to the constraints wj ∈ [0,1] and Σwj = 1.

    But the program reads:

    smoothed=weight0*smoothed0+weight1*smoothed1+weight2*smoothed2;
    mse = var(smoothed-indata)/var(indata);
    if (mse<=msebest)
    w0=weight0;
    w1=weight1;
    w2=weight2;
    msebest=mse;
    smoothedbest=smoothed;

    If I read “var” correctly, mse in the program calculates the variance of the differences between the weighted combination of the smoothed series and the temperature series (indata), not the sum of squares of the differences (The term in the denominator of mse is irrelevant since it is the same for all of the approximately 5000 cycles of the loop – a waste of calculation). When the means of the series S0(t) and T(t) are equal, then the variance and the mean square error are also equal. Otherwise, they are not.

    I don’t think that Prof. Mann understands that his Butterworth-filtfilt smoothing produces three series whose means are in general not only all different from each other but from the mean of the temperature series as well. He is actually minimizing Σ [S0(t) − T(t) − (w1m1 + w2m2 + w3m3 − mT)]² where m1, m2, m3 and mT are the means of the four series involved in the calculation. The final weights produced by the program could differ from the weights when the mse is correctly calculated.

    Perhaps he could also explain why the conditions he has imposed on the minimization are necessary. In fact, he might discover that the best fit (with minimum mse) is obtained by simply regressing the three series on the temperatures with no preconditions on the “weights”. It could have saved some embarrassment because the flawed program would have been unnecessary.
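
    A toy check of the var-versus-mse point above (illustrative numbers, not from the paper):

    d   = randn(200,1) + 0.5;   % stand-in for smoothed - indata, with a mean offset
    mse = mean(d.^2);           % what the paper's criterion calls for
    v   = var(d);               % what the program actually minimizes (up to scaling)
    [mse v]                     % mse exceeds v by roughly mean(d)^2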

  17. Posted Sep 3, 2008 at 4:27 PM | Permalink

    Independent Verification of the coding and Verification of the calculations is required before the calculated numbers can be considered to be candidates for acceptable results.

  18. RomanM
    Posted Sep 16, 2008 at 2:56 PM | Permalink

    How come I seem to kill a thread whenever I post a comment? 😉

    I have done some work to determine whether the error noted by UC and the error that I noted in comment 17 actually have a material effect on the results in the paper. The answer in short is yes, on both counts. My description of that work is too long to fit in the margin … er, I mean, to be included in this post, so I have created a pdf document which can be found here.

    Of the two errors, I think that the latter one (var instead of mse) is the more egregious of the two. The first is a mathematical error which relates to the author’s abilities as a programmer, but the latter indicates a lack of appreciation of the elementary statistical tools and concepts which he uses not only in this paper, but also in all of his temperature reconstructions.

    • MarkR
      Posted Sep 16, 2008 at 8:41 PM | Permalink

      Re: RomanM (#19), Have you tried posting a copy of your comments in #17 on Real Climate? You might get an answer from Prof. Mann?

    • Posted Sep 16, 2008 at 10:54 PM | Permalink

      Re: RomanM (#19),

      How come I seem to kill a thread whenever I post a comment? 😉

      You might want to check the body count; I always thought I was the threadstopper 😉

      Note that lowpassmin.m is used in almost all the m-files Mann provided in PNAS SI. var instead of mse is a serious error, but as the whole idea of padding data to minimize (smoothed – real) is silly, I guess it doesn’t change the results.. As Willis points out, this kind of smoothing is really a statistical exercise, where one needs to predict future values. Mann and Brohan seem to have difficulties in understanding this.

    • Jean S
      Posted Sep 17, 2008 at 1:50 AM | Permalink

      Re: RomanM (#19),

      It should be noted that as written by Prof. Mann, version1 requires looping 176851 times on each application of lowpassadaptive.m. During each loop, the routine calculates the variance of the observed temperatures, a value which does not change for any of those 176851 times. To avoid long waits, that repeated calculation was removed and several other minor changes were made to reduce calculation times. In the second (corrected) version, the total number of loops for each year is 5151, a much smaller figure. Of course, all of the looping could have been avoided entirely had Prof. Mann realized that (given the constraints he had placed on the weights – all positive and totaling 1) the entire optimization can be restated as a very simple problem in quadratic programming for which multiple programs exist (some in Matlab) providing solutions with greater than two decimal place accuracy in a quick and easy fashion.

      Michael “I am not a statistician” Mann does not seem to be much of a programmer either. As I said in #1, it sure seems none of the reviewers bothered to read the code. On the other hand, it seems that they did not understand the paper either; otherwise there is no way they would have given the green light to this garbage.
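
      For reference, a sketch of the quadratic-programming reformulation mentioned above (this is not Mann’s code; it assumes the Optimization Toolbox and that S1, S2, S3 and T are column vectors holding the three candidate smooths and the data):

      S   = [S1 S2 S3];                  % n-by-3 matrix of the three candidate smooths
      H   = 2*(S.'*S);                   % quadratic term of ||S*w - T||^2
      f   = -2*(S.'*T);                  % linear term
      Aeq = ones(1,3);  beq = 1;         % weights sum to one
      lb  = zeros(3,1); ub = ones(3,1);  % each weight in [0,1]
      w   = quadprog(H, f, [], [], Aeq, beq, lb, ub);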

  19. Willis Eschenbach
    Posted Sep 16, 2008 at 9:41 PM | Permalink

    MarkR, you say:

    Have you tried posting a copy of your comments in #17 on Real Climate? You might get an answer from Prof. Mann?

    BWA-HA-HA-HA … oops, sorry, Freudian slip, what I meant to say is that your suggestion would make an interesting experiment. I and others have had little to no success in this arena … however, RomanM, give it a try and report back what happens.

    w.

  20. Willis Eschenbach
    Posted Sep 16, 2008 at 9:55 PM | Permalink

    Mann is clueless about smoothing. He thinks the best smoother is the one that minimizes the squared difference between the smooth and the last few datapoints in the series. He originally claimed that nonsense in his GRL paper about smoothing. I wrote a reply to GRL, but they said I was being too hard on Mann … could be, I don’t think much of his statistical abilities.

    Think about his claim for a minute … the theoretical smoother that would minimize the difference between the smoother and the data points themselves is the smoother which does absolutely no smoothing at all. That would have a squared difference of zero … but I don’t think that’s what we want, is it?

    In fact, we want the smoothing algorithm that minimizes the error, not between the smoother and the data points, but between the smoother and the eventual smoothed line (which, of course, will only be revealed when we have more data points in hand.)

    However, despite the fact that we can’t check the difference between the smoother and the eventual smoothed line at the end of the dataset, we can test and compare it anywhere in the dataset other than within half the smoother width of the end. I have never found a dataset where that test showed that any of Mann’s options (minimum slope, minimum roughness, and minimum smooth) could outperform the smoothing algorithm which I developed and use myself. This means, of course, that his latest failed attempt (average his three smoothers) will also not outperform my algorithm.
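
    The test described in the previous paragraph can be sketched in a few lines (a toy version only: an 11-point moving average stands in for the 40-year smoother, and repeating the last value stands in for the end treatment under test):

    y  = cumsum(randn(300,1));              % toy series
    k  = ones(11,1)/11;                     % centered moving-average kernel
    full_smooth = conv(y, k, 'same');       % the "eventual" smooth, with all data in hand
    err = zeros(201,1);
    for t = 50:250                          % interior points, away from both ends
        yt   = y(1:t);                      % pretend the series ends at time t
        ypad = [yt; repmat(yt(end), 5, 1)]; % end treatment under test: repeat last value
        st   = conv(ypad, k, 'same');
        err(t-49) = st(t) - full_smooth(t); % end estimate vs. the eventual smooth
    end
    rms_end_error = sqrt(mean(err.^2))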

    I probably should repolish and resubmit my paper to GRL … another project for the list.

    w.

  21. Posted Sep 16, 2008 at 11:22 PM | Permalink

    re RomanM’s pdf (good read!),

    Basically, there is simply no real scientific content to the entire paper.

    Yes.

    There is one further point that needs to be made. It is clear that
    this “method” will likely be used (or has already been used
    possibly?) in smoothing paleo reconstructions of temperatures. What
    can one say about the interpretation and the inherent uncertainty
    bounds of such results?

    As I said, lowpassmin.m is used everywhere in the new Mann paper. Uncertainty increases; this method is extremely sensitive to measurement errors, see

    http://www.climateaudit.org/?p=1681#comment-114704

    and

    http://www.climateaudit.org/?p=2541#comment-188580

    Similar thing happens with real statistical smoothing,

    http://www.climateaudit.org/?p=1681#comment-114062

  22. Willis Eschenbach
    Posted Sep 17, 2008 at 1:40 AM | Permalink

    Perhaps I should explain my method for handling missing data in a centered smoothing algorithm. I apply it with a Gaussian smoother, but it can be used with any kind of smoother.

    A smoother usually works by multiplying a string of N data points by a “kernel”. The kernel is a string of values, of the same length N as the string of data. The kernel values have the additional property that they sum to one (1). Value by value the data is multiplied by the corresponding kernel value. Then the resulting amounts are summed. This sum is the “Gaussian average”. It is a weighted average, with the kernel containing the weight as a percentage for each data point.

    Now the question arises, how do we deal with missing data? Start with a single missing data point. I reasoned that we know how much of the total weight we are missing. Let’s say that single missing data point gets 11% of the weight (the corresponding kernel value is 0.11). From that we know that on average we’re at 89% of the eventual value. So our best guess (absent other knowledge) is whatever total we had up to that point, divided by 0.89 (1 minus the missing data weight).

    That was my insight. I just keep a running total of how much data I have in hand. If I end up with 73% of the data, I do not consider whether it is from the “past” or the “future”. I do not concern myself with whether the missing 27% is in one year, or three separated years, or three missing years in a row. I just divide my result by the percentage of data weight present. That’s the best estimator I’ve found for the missing data. It does not pad the data or extend it. It just takes what it has as the best guess, and adjusts the total accordingly.

    The effect of this is that at the beginning and the end of a block of data, my algorithm transitions smoothly from being an acausal smoother (a centered smoother which averages based on both the future and the past data) to being a causal filter (an uncentered smoother which considers only past and present data).

    As far as I know I invented this method … but then, I don’t know very far, so it probably already has a name.

    Since the transition from acausal to causal is slow and even at the end of the dataset, the possible error also increases slowly. (Remember, this is the error between this estimate and the final future smoothed actual result). This is particularly true with a Gaussian smoother, because much of the weight is near the center. Because of this, the error at the end is minimized.

    My best to everyone,

    w.

    PS – I also found that it is possible to improve this algorithm marginally, by noting that if the data is rising, the estimate by this method is low, and vice versa. This means that the error is somewhat correlated with the trend of the data. However, the gains in using this method are tiny, visually indistinguishable from the basic method.
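
    A minimal sketch of the renormalized-kernel idea described above, assuming a Gaussian kernel and NaN marking missing values (the function name and details are illustrative, not Willis’s actual code):

    function ys = gauss_smooth_renorm(y, sigma)
    % At each point, average whatever data falls under the kernel (inside the
    % series and not NaN) and divide by the kernel weight actually present,
    % rather than padding the series.
    n    = numel(y);
    half = ceil(3*sigma);                        % truncate the kernel at ~3 sigma
    k    = exp(-0.5*((-half:half)/sigma).^2);
    k    = k / sum(k);                           % full kernel sums to one
    ys   = nan(size(y));
    for t = 1:n
        lo  = max(1, t-half);
        hi  = min(n, t+half);
        seg = y(lo:hi);                  seg = seg(:);
        wk  = k((lo:hi) - t + half + 1); wk  = wk(:);
        use = ~isnan(seg);
        if any(use)
            ys(t) = sum(wk(use).*seg(use)) / sum(wk(use));   % divide by weight present
        end
    end
    end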

    • Posted Sep 17, 2008 at 6:35 AM | Permalink

      Re: Willis Eschenbach (#25),

      It does not pad the data or extend it.

      Can’t you replace your

      non-symmetric filter + no data padding

      with equivalent

      symmetric filter + data padding ?

      Then you could compare more easily with Mann’s filter.

      One should note that Matlab’s filtfilt.m does ‘minimum roughness’ anyway,

      http://www.mathworks.com/access/helpdesk_r13/help/toolbox/signal/basic19a.html

      filtfilt reduces filter startup transients by carefully choosing initial conditions, and by prepending onto the input sequence a short, reflected piece of the input sequence. For best results, make sure the sequence you are filtering has length at least three times the filter order and tapers to zero on both edges.
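
      For reference, the basic Butterworth-plus-filtfilt combination discussed in this thread looks like the following (a sketch, not Mann’s lowpass.m; the filter order is illustrative, y is assumed to be a column of annual values, and the Signal Processing Toolbox is required):

      fc     = 1/40;                % cutoff: 40-year period for annual data
      [b, a] = butter(5, fc/0.5);   % lowpass Butterworth; order 5 is illustrative
      ys     = filtfilt(b, a, y);   % zero-phase filtering: forward and backward passes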

  23. Spence_UK
    Posted Sep 17, 2008 at 2:13 AM | Permalink

    As Willis points out, this kind of smoothing is really a statistical exercise, where one needs to predict future values. Mann and Brohan seem to have difficulties in understanding this.

    Ah – but Mann and Brohan know what the future values are, their friends Dr Hansen and Dr Schmidt informed them, and they can therefore judge which is best by eye using this special knowledge. Only card-carrying climate scientists are endowed with this special knowledge so only their judgement is considered credible.

    😉

  24. Steve McIntyre
    Posted Sep 17, 2008 at 7:13 AM | Permalink

    We’ve seen the same lack of understanding in MBH98. The entire apparatus of expanding the “reconstructed” temperature principal components to gridcells and taking averages leads to weights for the RPCs which do not change. The weights can be calculated once, saving millions of calculations.

  25. RomanM
    Posted Sep 17, 2008 at 10:26 AM | Permalink

    Thanks to everyone for taking time to read the document and post comments. Let me reply to several of the observations and suggestions:

    Re: MarkR (#20), and Willis

    Have you tried posting a copy of your comments in #17 on Real Climate?

    I get enough of being ignored or abused elsewhere. 😉 I don’t see much point in taking it to a partisan environment which in the past has been unable to admit errors. If anything, submitting a note to GRL might be more to the point. However, for me the fun is always in the chase – doing the math, solving the problem, etc. Once that is done, writing up papers has always been an onerous task, bogged down in references and other niggly publication details. As well, I am still trying to decode the Matlab filtfilt algorithm’s way of incorporating its initial calculated “steady state” conditions into the filtering process. When I have that, I can then determine the effect of each observation on the smoothed value at any other place.

    Re: Willis Eschenbach (#22),

    Think about his claim for a minute … the theoretical smoother that would minimize the difference between the smoother and the data points themselves is the smoother which does absolutely no smoothing at all. That would have a squared difference of zero … but I don’t think that’s what we want, is it?
    …

    However, despite the fact that we can’t check the difference between the smoother and the eventual smoothed line at the end of the dataset, we can test and compare it anywhere in the dataset other than within half the smoother width of the end. I have never found a dataset where that test showed that any of Mann’s options (minimum slope, minimum roughness, and minimum smooth) could outperform the smoothing algorithm which I developed and use myself. This means, of course, that his latest failed attempt (average his three smoothers) will also not outperform my algorithm.

    It may be part of what we want under some circumstances if the intent is to find some sort of “optimal” smooth. This is why in the document I mentioned that Mann uses a “40 year” filter without ever defining it and why it is important that it actually be defined in some rigorous way. Obviously, in the unconstrained case, a less smooth series can have a lower mse and the original series fits itself best with mse = 0. However, if well-defined conditions are placed on the smoothed series, it makes the problem mathematically meaningful and minimum mse can then be a measure of optimality. E.g., these conditions could be numeric bounds on the possible variation of the sequence. The omission of such conditions in the paper is one of the major reasons that the paper has no genuine scientific value. As well, that is also why any meaningful comparison of your filter to others needs to take into account the smoothness level required and/or achieved in both results.

    Re: UC (#24),

    As I said, lowpassmin.m is used everywhere in the new Mann paper. Uncertainty increases; this method is extremely sensitive to measurement errors

    Your references to your earlier comments are both interesting and helpful. After examining the calculations involved in the Matlab filtering functions, I would be hard put to come up with an interpretation of what the “40 year” smoothed temperature for a given year actually estimates or represents. There are many properties of the Mann smoothie that I doubt its users fully understand, in particular the potential amount of bias in using that smoothed value to estimate what the actual temperature might have been. This leads to a more drastic underestimate of the uncertainty level. Using IPCC probability jargon, it is “almost absolutely certain” that most viewers of a hockey stick graph will be misled in interpreting a Mann smoothed graph.

    • RomanM
      Posted Sep 26, 2008 at 12:49 PM | Permalink

      Re: RomanM (#30),

      As well, I am still trying to decode the Matlab filtfilt algorithm’s way of incorporating its initial calculated “steady state” conditions into the filtering process. When I have that, I can then determine the effect of each observation on the smoothed value at any other place.

      I think I have it! Like UC, I also like to view a filter as a matrix which tells me what weight each observation in the sequence has on a given smoothed value. The complication in trying to do so in Mann’s case was not only the “optimal initial conditions” of filtfilt and how these initial conditions are implemented in the filter function, but the padding and even more padding in the black box of the Mann methodology. It occurred to me today to instead use the linearity of the overall procedure to construct the filter matrix. If you apply the smoothing to a sequence consisting of a one in the kth position and zeros everywhere else, the resulting smooth is the kth column of the matrix. I wrote a short program in Matlab to calculate the matrix using the lowpass.m function given by Mann:

      function [fmatrix] = mannmat(n, freq, be, en)
      fmatrix = zeros(n);
      ident = eye(n);
      for k = 1:n
      [smooth, mse] = lowpass(ident(1:n,k), freq, be, en);
      fmatrix(1:n, k) = smooth;
      end

      Here, n is the length of the sequence to be filtered, freq is the “frequency” of the smooth using the butterworth filter, and be and en determine the type of padding to be used at the beginning and end of the data sequence: 0 for mean padding, 1 for reflection and 2 for reflect-and-flip. The sequence is then multiplied by the matrix to apply the smooth. The result was checked for some of the be and en values against lowpass.m applied to the 158 values of the 1850-2007 Hadcrut temperature data supposedly used in the paper and the differences were small enough to be accounted for by calculation rounding. (If Prof. Mann chooses to use the program, I hope he gives me some credit for it. 🙂 )
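
      Hypothetical usage of the function above (assuming lowpass.m from Mann’s SI is on the path and hadcrut holds the 158 annual values):

      F        = mannmat(158, 1/40, 2, 2);   % minimum-roughness padding at both ends
      smoothed = F * hadcrut;                % hadcrut: 158-by-1 column of annual values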

      Some of the results are particularly interesting. For the case n = 158, freq = 1/40, be = 2, and en = 2 (the “minimum roughness” case at both ends – this is the padding shown in UC’s graph (#44)), the 2007 temperature has the weights:

      0.61505, 0.70775, 0.80268, 0.89909, 0.99620

      in determining the smooth for the years 2003 to 2007. Seems a little like endpinning! Another interesting observation: Which year has the second largest weight for the smooth at 2006? Nope, not 2006, but 1995! It should be noted that these weights are calculated without using the observed temperature sequence. Here is a graph of the contribution for each of the final 11 years towards the smoothing of the entire sequence (a plot of the last 11 columns of the matrix):

      Note the effect of the last value on the smoothing at times far removed from it. It also shows the low weight of values near 2007 in determining the smoothing for their own year. That appears to be a side-effect of the padding process. The cases be = 0, en = 0, and be = 1, en = 1, are considerably more uniform.

  26. Mark T.
    Posted Sep 17, 2008 at 10:55 AM | Permalink

    What I don’t get is that the point of smoothing isn’t to minimize the MSE (assuming “optimal” means MMSE) between the data points and the smooth, it is to minimize the MSE between the true state of the system, i.e., the actual temperature or proxy as it is transformed into temperature, and the calculated smooth. A Kalman filter will do this, as an example (UC has some good posts on this floating around in here, too).

    Mann’s solution is by no means optimal in any sense that I know of (and he apparently admits this, but uses it anyway). Nor is it “adaptive,” at least not in the sense that I ever learned (or used). Willis is correct: what Mann is trying to minimize is absolutely meaningless.

    Good work, btw, RomanM. The use of var() to calculate the MSE rather than taking the mean of the squared differences is puzzling. Any fixed offset between smoothed and indata will get removed using var(), providing an incorrect result. Very strange, even if you ignore the purpose-of-smoothing issue. The more I think about this, the more baffled I get.

    Mark

    • RomanM
      Posted Sep 17, 2008 at 12:53 PM | Permalink

      Re: Mark T. (#31),

      I agree with you 100% that the paper is pretty much a meaningless exercise. In a sense, I suspect what Mann is trying to do is to represent the data points as closely as possible with a sequence of a given smoothness level where he interprets “as closely as possible” as minimum mse. My guess is that he somehow had the misimpression that the means of the three fitting series are each equal to the mean of the original series despite the fact that this will obviously not be the case. This would then imply to him that the weights should be constrained to total one (for unbiasedness), the means of the differences would equal 0, and then he could calculate his mse using the built-in function var instead of mse (which is also built-in in Matlab).

      What he sees as the “true state of the system” is the smooth in the interior of a series where the endpoints play a lesser role and where all three of his fitting smooths are almost identical. He spends a lot of effort trying to demonstrate that his method is “superior” to the other three individually when you truncate the series at selected points as if that would prove something. I think that it gets particularly bizarre when he states:

      Unfortunately, the relative performance of the different smoothing methods cannot be evaluated for the present day because the true multidecadal smoothed behavior cannot be defined through the present. However, we can turn to model simulations as surrogates for plausible 21st century global temperature scenarios, as discussed in the next section.

      Then he goes on to look at the results of a bunch of models and states:

      These series were treated as 55 plausible surrogate 1850–2100 global mean surface temperature series.

      When you don’t have the real thing, bring out the models! But hey, this isn’t science, it’s climate magic.

  27. Mark T.
    Posted Sep 17, 2008 at 1:19 PM | Permalink

    true multidecadal smoothed behavior cannot be defined through the present.

    Ugh, he really believes this, doesn’t he? Two words: Kalman Filter. He can at the very least find the MMSE estimate of the true series mean and compare his nonsense to that. There may actually be a paper in this idea.

    Mark

  28. Manny
    Posted Sep 24, 2008 at 6:29 PM | Permalink

    I know nothing about low pass or Kalman, but I am not stupid : one cannot smooth out noise up to the last data point without inserting fake data.

    In Mann08, the last part of the red line (the CRUT) is at the core of their argument. If one stops it at 1986, as one should when smoothing over 40 years of data ending in 2006, current temperature is no warmer than the MWP. No more argument. It seems to me these fancy smoothing algorithms should be denounced.

    • Mark T.
      Posted Sep 25, 2008 at 9:33 AM | Permalink

      Re: Manny (#34),

      I know nothing about low pass or Kalman, but I am not stupid : one cannot smooth out noise up to the last data point without inserting fake data.

      You shouldn’t make statements that could be at odds with what you admit you don’t understand. Besides the fact that the Kalman filter is a one-step predictor, the “smoothed output” for the current point (present data) is a function of current and past inputs. The Kalman filter also outputs a prediction for the next state, but that’s immaterial.

      What Mann does is based on a 40-point FIR that is not lagged, for which it is not possible to get a smooth up to the current data unless some tricks are employed (which result in threads such as this one).

      Mark
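
      For what it’s worth, a minimal local-level Kalman filter looks like the sketch below (the noise variances q and r and the data vector y are assumptions; this is not anything from the papers under discussion). The filtered estimate at time t uses only data up to t, so it runs right to the last observation without padding:

      q  = 0.01;  r = 1;             % assumed process and measurement noise variances
      x  = 0;     P = 1e6;           % diffuse initial state
      xf = zeros(size(y));
      for t = 1:numel(y)
          P = P + q;                 % predict
          K = P / (P + r);           % Kalman gain
          x = x + K*(y(t) - x);      % update with the current observation
          P = (1 - K)*P;
          xf(t) = x;                 % causal estimate through time t
      end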

  29. Willis Eschenbach
    Posted Sep 25, 2008 at 1:18 AM | Permalink

    Manny, you say:

    I know nothing about low pass or Kalman, but I am not stupid : one cannot smooth out noise up to the last data point without inserting fake data.

    Actually, I described how to do that exact thing without inserting fake data in #25.

    e.

    • Posted Sep 25, 2008 at 2:12 AM | Permalink

      Re: Willis Eschenbach (#35),

      Actually, I described how to do that exact thing without inserting fake data in #25.

      But then your filter is no longer time-invariant, and comparison with middle values is not apples to apples. You can transform your filter to a time-invariant version, but then you’ll need ad hoc padding.

    • RomanM
      Posted Sep 25, 2008 at 4:44 PM | Permalink

      Re: Willis Eschenbach (#35),

      Actually, I described how to do that exact thing without inserting fake data in #25.

      Willis, you did and you didn’t. It may appear that the two methods of smoothing a sequence of values (padding, and altering the smoothing algorithm) are different, but in essence they are two equivalent formats for describing how a smoothing algorithm handles the endpoints.

      In the case of padding, new “observations” are constructed from the existing values (e.g. by reflection, with or without flipping the values, repeating the end value, using the mean of all or part of the sequence, etc). Then the same smoothing algorithm which has been used in the center of the series (in essence, using UC’s terminology, time-invariant) is applied to the padded series in order to extend the smoothing to the end. When the smoothing is done as a linear function applied to the sequence (as in a weighted moving average) and the padding values are also linear functions of the observed values (as in each of the cases that I mentioned above), it is possible to alter the smoothing algorithm (in a unique way) near the endpoints in a fashion that appears similar to what you do with your truncated “kernel”. When the revised algorithm is applied to the original unpadded sequence it produces exactly the same results as the unaltered algorithm did on the padded sequence.

      Conversely, it is not difficult to show that when using the method you describe, a unique set of “padding” values can be calculated in a linear fashion from the sequence. When these values are appended to the end of the sequence and you apply your smoothing algorithm (unaltered without truncation of the weights), you get exactly the same result as you did without padding. Each of these two methods can be made to look like the other in its usage.

      UC’s point of “comparison with middle values is not apples to apples” applies if you do not compare the padded version of your smoother at the endpoints. As well, if you wish to examine your method’s behavior at the endpoints with that of another method, you should express them both in the same format for the comparison to make sense.

      I had written the above before seeing your latest post (while I had still not submitted this one). I will add that you are correct in that the error bars for any smoothing should be calculated using the format which gives an explicit expression for the relationship between the smoothed value and the original series. That is indeed the format which you prefer.

  30. Willis Eschenbach
    Posted Sep 25, 2008 at 2:44 PM | Permalink

    UC, thanks for your interesting post. You say:

    But then your filter is no longer time-invariant, and comparison with middle values is not apples to apples. You can transform your filter to a time-invariant version, but then you’ll need ad hoc padding.

    Not sure what you mean by “time-invariant”, could you explain that? What I do is this: as the filter approaches the end, it gradually changes from an “acausal” filter (averages using data from both before and after the date in question) to a “causal” filter (averages only using data from before the date in question). Is that what you mean by “time-invariant”?

    And while comparison of end values with middle values is not “apples-to-apples”, the error is small even on the last data point, and decreases rapidly away from the end. So there is a difference … but in general it makes very little difference.

    Also, using my method, the error bars for the final points are easy to calculate. This means that in good scientific fashion the best estimate plus error bars can be given, which of course once again makes the comparison apples to apples … what more do we need?

    w.

    • DeWitt Payne
      Posted Sep 25, 2008 at 3:32 PM | Permalink

      Re: Willis Eschenbach (#38),

      Just to see if I understand this point, I think that UC means the group delay changes as you approach the end point. That makes time non-linear.

    • Posted Sep 26, 2008 at 5:53 AM | Permalink

      Re: Willis Eschenbach (#38),

      Not sure what you mean by “time-invariant”, could you explain that?

      See RomanM’s #40. I like to think of linear filters as matrix–vector multiplication, so: a time-invariant filtering matrix is Toeplitz, http://ccrma.stanford.edu/~jos/mdft/Matrix_Multiplication.html .

      For example, a “causal” filter does not depend on future values. Now, this causal filter’s results at any point could be replicated by choosing an appropriate set of future values … but does that mean that it is not really a “causal” filter, since it now depends on future values?

      And again, a causal filtering matrix is lower triangular. I really don’t know how to classify filters that use invented future values 😉 But let’s define these padded filters in your way; then the filter is acausal, except for the last value. That’s the only output that doesn’t use _real_ future values.

      It is easy to get confused with this stuff; Mann has two peer-reviewed journal papers that aim to validate this:

      The best procedure, it seems to me, is just to add an error bar on the end of smoothed series.

      For additive measurement errors this can be done. But in the sense Mann is using these figures (1987 … 2026 was the warmest 40 yr period ever) it is quite difficult to add error bars. We’d need a statistical model for the signal. And even with additive measurement errors, it seems that there’s a great confusion (*), see

      http://signals.auditblogs.com/2008/09/25/moore-et-al-2005/

      ( About a paper that Rahmstorf cites, and then shows that IPCC projections underestimate the global warming .. )

      (*) Me or Moore is confused. Or both are 😉

  31. Willis Eschenbach
    Posted Sep 25, 2008 at 6:36 PM | Permalink

    Roman M and DeWitt, thank you for your comments.

    I’ll have to think about the question of padding. I see what you mean, that my procedure can be replicated by choosing an appropriate set of padding values. However, I don’t know that that makes the two equivalent, or that it means that my procedure depends on the padded values.

    For example, a “causal” filter does not depend on future values. Now, this causal filter’s results at any point could be replicated by choosing an appropriate set of future values … but does that mean that it is not really a “causal” filter, since it now depends on future values?

    Like I said, I’ll have to think about that. The best procedure, it seems to me, is just to add an error bar on the end of smoothed series.

    w.

    • RomanM
      Posted Sep 25, 2008 at 7:38 PM | Permalink

      Re: Willis Eschenbach (#41),

      For example, a “causal” filter does not depend on future values. Now, this causal filter’s results at any point could be replicated by choosing an appropriate set of future values … but does that mean that it is not really a “causal” filter, since it now depends on future values?

      A filter is “causal” if it only depends on prior time values at ALL points and not just at a single point. In no way can a causal filter be “replicated” by future values. Your filter looks causal at the endpoint, but if you think about it so does every padded filter as well since the padding is calculated using existing prior values.

      In your case, by the way, the “replication” is unique, i.e. there is exactly one set of padding values which will give you the unpadded results. They are not too difficult to calculate using R.

  32. Willis Eschenbach
    Posted Sep 25, 2008 at 10:17 PM | Permalink

    RomanM, many thanks. I had verified for myself that the “replication” is unique by the ancient Australian method known as “suck it and see” …

    With your definition, I guess any filter which is padded at the end would be neither causal nor acausal … which doesn’t make a whole lot of sense to me. I likely look at this a little different because of the way that I came to design my filter. Initially, I was not concerned with the endpoints, but only with possible missing values in the dataset and how to deal with them. My method treats all missing values (past, present, or future) equally, and deals with them the same way. It treats missing future values in the middle of the dataset the same as missing past values in the middle of the dataset, or missing future values at the end of the dataset. Not sure if that makes it causal, acausal, both, or neither.

    The reason I use my method, however, is not theoretical but practical — of all the methods I know of, including loess and lowess smoothing, my method gives the smallest errors. When I find a better method, I’ll switch.

    w.

  33. Posted Sep 26, 2008 at 3:23 PM | Permalink

    Excellent program, RomanM. Now, from this viewpoint MannGRL08 optimization problem gets interesting..

  34. Posted Sep 27, 2008 at 9:01 AM | Permalink

    .. but let’s first check how these smoothers respond to white measurement errors with unity variance (for a linear filter matrix F and unit-variance white noise e, the smoothed noise F*e has covariance F*F', so diag(F*F') gives the noise variance at each point):

    f2 = mannmat(100, 1/40, 2, 2);
    f1 = mannmat(100, 1/40, 1, 1);
    f0 = mannmat(100, 1/40, 0, 0);

    close all
    plot(diag(f0*f0'), 'b')
    hold on
    plot(diag(f1*f1'), 'g')
    plot(diag(f2*f2'), 'r')
    legend('0', '1', '2')

    This roughness thing seems to amplify noise near the first points!

    • RomanM
      Posted Sep 27, 2008 at 3:30 PM | Permalink

      Re: UC (#47),

      Your graph nicely demonstrates the need for large uncertainty bars for the smoothed values when using the reflect-and-flip padding!

      I am still trying to understand what is going on in the smoothing process by visually examining the matrices. Using the Matlab “surf” function, you get

      It sort of looks like a hybrid space vehicle! This matrix uses simple reflection at each end of the series (corresponding to be = en = 1). There is a sideways “fin” along with some “wrinkles” near both ends of the main diagonal. From their position, I am guessing that they might be an artifact of the reflection padding which magnifies the effect of some of the observations.

      • Posted Sep 28, 2008 at 2:03 AM | Permalink

        Re: RomanM (#48),

        Your graph nicely demonstrates the need for large uncertainty bars for the smoothed values when using the reflect-and-flip padding!

        Yes, and it is a formal explanation of this http://www.climateaudit.org/?p=1681#comment-114704

        We need one more matrix for Brohan,

        http://hadobs.metoffice.com/hadcrut3/smoothing.html ,

        continuing the series by repeating the final value.

        • RomanM
          Posted Sep 28, 2008 at 12:56 PM | Permalink

          Re: UC (#49),

          We need one more matrix for Brohan, … continuing the series by repeating the final value.

          Seeing this as a good way to learn some more Matlab, I wrote up a function to calculate the smoothing matrix for an endpinned moving average with an arbitrary set of weights (probably can be done more easily, but I learned some things writing it) and ran it for a sequence of length comparable to the ones you used in your graph in #47.

          function [hmat] = endpin(n, wts)
          wts = wts/sum(wts);
          lenw = length(wts);
          wmat = full(spdiags(repmat(wts,n,1), 0:(lenw-1), n, n+lenw-1));
          hmat = wmat(1:n,(2+floor(lenw/2)):(n-1 + floor((lenw)/2)));
          hmat = [sum(wmat(1:n, 1:(1+floor(lenw/2))),2), hmat, sum(wmat(1:n,(n + floor((lenw)/2)):(n+lenw-1)) ,2)];
          end

          hadwt = binopdf(0:20, 20, .5);
          hadleymat = endpin(100, hadwt);
          plot(diag(hadleymat*hadleymat'), 'r');

          The result:

          Hadley says they use a 21-point binomial filter which with that many points is close to being Gaussian. Unlike the Mann smoothie, points far enough away have no effect on the smoothed value at a given place. As you correctly suspected, the error bar for white noise at the last value is bigger (about twice as large) than at points in the middle of the sequence.

        • Posted Sep 28, 2008 at 1:54 PM | Permalink

          Re: RomanM (#50),

          As you correctly suspected, the error bar for white noise at the last value is bigger (about twice as large) than at points in the middle of the sequence.

          And that should be clearly visible in this figure http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/annual_s21.png

          But it’s not.

        • UC
          Posted Jan 11, 2013 at 8:13 AM | Permalink

          I think something is fixed now; see Figure 6 of the new HadCRUT4 paper,

          Click to access HadCRUT4_accepted.pdf

          Interesting to see when Mann et al will become aware of this; the impulse response of lowpassmin.m is quite a bit longer than in the HadCRUT smoothing case..

  35. RomanM
    Posted Sep 28, 2008 at 3:00 PM | Permalink

    Look how long it took them to figure out that their method exaggerates the end value of the series! After an inconvenient downturn in the monthly temperatures early in the year, they suddenly realized that they were misleading the public and needed to ensure that they no longer “cause much discussion” through the use of “inappropriate” methodology. After a possible full year or two of lower temperatures, they could conceivably equally suddenly realize that the error bounds are indeed inaccurate and the upper bounds need to be raised. Perhaps we could speed this process up. The site does state

    We are continually reviewing the way that we present our data and any feedback is welcome.

    Do you think they really mean it? 😉

    • Posted Sep 29, 2008 at 7:25 AM | Permalink

      Re: RomanM (#52),

      These error bounds are clearly incorrectly calculated, and I’m afraid that it is not the only mistake in Brohan’s paper. It’s not like a typo; it is a clear misunderstanding of how errors propagate.

      BTW,

      Re http://www.climateaudit.org/?p=3361#comment-292113

      2008/08 value seems to be 0.387. My prediction was

      % Year Month -2 sigma Predict. +2 sigma
      2008 8 0.15206 0.34864 0.54521

      Not bad. But I guess GCMs did a better job.

  36. Posted Sep 30, 2008 at 4:59 AM | Permalink

    ssatrend.m seems to be a non-linear filter, so RomanM’s method cannot be directly used. But I made a small modification, linearization about a trend,

    lin = (1:n)';

    for k = 1:n
        [smooth] = ssatrend(ident(1:n,k) + lin, freq);
        fmatrix(1:n, k) = smooth - lin;
    end

    With this version the resulting matrix approximates the filter very well. And sensitivity to noise is again very clear:

    (n=177; embedding dimension 30, as used in http://signals.auditblogs.com/2008/09/25/moore-et-al-2005/ )

    **

    The asymmetry in the figure of #47 seems to be due to a small programming error: lowpass.m pads the beginning and the end differently (at the beginning, the first value is copied). Something that Mann can correct in the next version, without credit to CA.

  37. Posted Feb 6, 2009 at 6:29 AM | Permalink

    Time to update, as 2008 is available, using HadCRUT and minimum roughness:

    ..and just another way to look at this:

  38. RomanM
    Posted Jul 3, 2009 at 10:32 AM | Permalink

    Thanks, UC and Steve. I have now downloaded these files from the CA site.

    I’ve been trying to familiarize myself with the inner workings of SSA. It looks like there may be some useful aspects of the methodology, but I’m not sure that smoothing as done by the team is necessarily one of them.

4 Trackbacks

  1. By Jeff Id | Another Newyork Times on Jan 22, 2011 at 10:32 PM

    […] introduced himself to Climate Audit soon after he started his blog (here). He began with a variety of interesting technical analyses of Mann et al 2008 – […]

  2. By Jeff Id « Climate Audit on Jan 23, 2011 at 5:03 PM

    […] introduced himself to Climate Audit soon after he started his blog (here). He began with a variety of interesting technical analyses of Mann et al 2008 – […]

  3. By UC on Mannian Smoothing « Climate Audit on Jun 19, 2011 at 10:06 PM

    […] readers it is clear why Minimum Roughness acts this way, see for example RomanM’s comment in here (some figures are missing there, will try to update). But to me it seems that the methods used in […]

  4. […] Willis, a couple of days after the release of Mann’s smoothing article, UC observed that there was an error in Mann’s published algorithm, which was reported at CA here. […]