Like a Dog on a Bone

UC observed a couple of days ago that Hadley Center, authors of the pre-eminent temperature series, have suddenly identified an “error” in how they presented temperature data. For presentation of their smoothed temperature series in a part-year situation, their methodology calculated the average of the months then available and used that as the estimate of the current year’s temperature. For their influential graphic showing the smoothed temperature series, they used a 21-point binomial filter (this is reported), padding the series by extrapolating the latest annual value forward 10 years. This obviously places a lot of leverage on January and February temperatures. (UC has replicated their smoothing method; he sent me code and I’ve confirmed that the replication is exact.)
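For concreteness, here is a minimal sketch of the kind of end-padded binomial smooth being described. The actual CRU/UC code is not shown here, so the padding rule (repeat the last available annual value out to the half-width of the filter) and the binomial weights are assumptions based on the description above, not the Hadley Center’s implementation.

```python
import numpy as np
from scipy.special import comb

def binomial_weights(npts=21):
    """Normalized weights of an npts-point binomial filter: C(npts-1, k) / 2^(npts-1)."""
    w = comb(npts - 1, np.arange(npts))
    return w / w.sum()

def smooth_end_padded(annual, npts=21):
    """Pad each end by repeating the first/last value (npts-1)/2 times, then filter.
    With a part-year value (e.g. a Jan-Feb average) sitting at the end of 'annual',
    that single noisy number is effectively carried 10 years into the future."""
    half = (npts - 1) // 2
    padded = np.concatenate([np.full(half, annual[0]), annual, np.full(half, annual[-1])])
    return np.convolve(padded, binomial_weights(npts), mode="valid")
```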

As has been widely reported, January and February 2008 temperatures are noticeably lower than last year’s. Here is a plot of Hadley Center GLB monthly temperatures, with the two 2008 months shown as bold points. This ties into the monthly plot at the HadCRU site here.

hadcru32.gif

These cold January and February 2008 temperatures have led to a noticeable downturn in the smoothed annual series. This has not escaped the notice of the Hadley Center, who were extremely quick off the mark to identify an “error” which resulted in graphical emphasis of a downturn (here):

We have recently corrected an error in the way that the smoothed time series of data were calculated. Data for 2008 were being used in the smoothing process as if they represented an accurate estimate of the year as a whole. This is not the case and owing to the unusually cool global average temperature in January 2008, the error made it look as though smoothed global average temperatures had dropped markedly in recent years, which is misleading.

January 2007, heading into the IPCC WG1 conference in Paris in February 2007, was a very warm month. I thought that it would be interesting to plot the HadCRU-style result as of January 2007 and compare it to the corresponding January 2008 version (now excised from the website). The blue dots below show the effect of the CRU smoothing method used in 2007, incorporating Jan and Feb 2008 – showing the downturn that caused the Hadley Center to notice the “smoothing error”. The black shows the present annual series – not using 2008 data – which is what is currently displayed on the Hadley Center website (prettied up and with pseudo-“error” bars). The red dots show what their 2007 method would have yielded in February 2007, at the time of the IPCC WG1 conference.

hadcru28.gif

They pounced on the “smoothing error” like a dog on a bone when temperatures went down, but didn’t notice precisely the same “error” last year, when it yielded record high results. Looks like there are some pit bulls in England as well.

One minor curiosity which some reader may be able to explain. I compared the HadCRU GLB annual series – column 2 in http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/annual – with the HadCRU GLB monthly series – column 2 in http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/monthly. I calculated annual averages from the monthly version and compared them to the archived annual version, with the result shown below. I then manually compared values for 1861 (monthly: -0.811 -0.477 -0.491 -0.375 -0.765 -0.172 -0.308 -0.173 -0.379 -0.397 -0.410 -0.191, with an average of -0.4124167, as compared to the archived annual value of -0.568). Perhaps this is explained somewhere; I didn’t see an explanation in the website documentation – if anyone sees one, I’d be interested.

hadcru30.gif
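For anyone who wants to reproduce the comparison, a minimal sketch follows. The file names are placeholders for local copies of the two linked files, and the assumed column layout (a date label such as 1861/01 in column 1, the anomaly in column 2) should be checked against the actual files.

```python
import pandas as pd

# Assumed local copies of the HadCRUT3 nh+sh diagnostics linked above (hypothetical names).
monthly = pd.read_csv("hadcrut3_monthly.txt", sep=r"\s+", header=None, usecols=[0, 1],
                      names=["date", "anom"])
annual = pd.read_csv("hadcrut3_annual.txt", sep=r"\s+", header=None, usecols=[0, 1],
                     names=["year", "anom"])

monthly["year"] = monthly["date"].astype(str).str[:4].astype(int)
from_monthly = monthly.groupby("year")["anom"].mean()

diff = from_monthly - annual.set_index("year")["anom"]
print(diff.abs().sort_values(ascending=False).head())   # the post finds 1861 to be the largest outlier
```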

Merely from looking at the monthly temperature histories, I urge readers not to draw any particular conclusions from a couple of cold months. The monthly history has many such cold downspikes and recoveries tend to be quite rapid. If the HadCRU results are an accurate history, one could just as easily look at this graph and argue that the most recent downspike was not as cold as corresponding downspikes in the 1980s. (Evaluating the HadCRU results is very difficult because their data as used is not disclosed.)

From my immediate view in Toronto, we still have banks of snow, which will still be here when April begins in two days. I certainly don’t recall such a situation during my adult life – so the present downspike seems a little unusual from a Toronto perspective, but I recognize that this is only one perspective.

237 Comments

  1. Posted Mar 30, 2008 at 10:31 AM | Permalink

    I calculated annual averages from the monthly version and compared them to the archived annual version, with the result shown below. I then manually compared values for 1861 (monthly: -0.811 -0.477 -0.491 -0.375 -0.765 -0.172 -0.308 -0.173 -0.379 -0.397 -0.410 -0.191, with an average of -0.4124167, as compared to the archived annual value of -0.568). Perhaps this is explained somewhere; I didn’t see an explanation in the website documentation – if anyone sees one, I’d be interested.

    See http://www.climateaudit.org/phpBB3/viewtopic.php?f=6&t=142 ,

    Response from officials,

    The annual averages are calculated from an area average of the annual average in each data grid cell i.e. the data are averaged first in time, then in space. The method you have used – average first in space then in time – does yield slightly different results, but the two are not significantly different given the estimated uncertainties.

    Steve: Thanks for this. I hadn’t noticed your other note. It’s interesting that we both independently examined data for the same year 1861 – the choice was obvious as the largest outlier, but we both crosschecked against an individual year.
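    To see why the two orders of averaging can diverge, here is a toy sketch with entirely made-up numbers for a two-cell grid in which one cell (think Arctic) only reports in a few months:

```python
import numpy as np

# Two-cell toy "globe": cell_b (think Arctic) only reports in a few warm months.
cell_a = np.array([0.0, 0.1, 0.2, 0.1, 0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.0, 0.1])
cell_b = np.array([np.nan] * 6 + [1.0, 1.2, 1.1] + [np.nan] * 3)

# Space first, then time (roughly the recalculation in the post):
space_then_time = np.nanmean(np.vstack([cell_a, cell_b]), axis=0).mean()

# Time first, then space (what the Met Office says it does):
time_then_space = np.nanmean([np.nanmean(cell_a), np.nanmean(cell_b)])

print(space_then_time, time_then_space)   # the two differ once coverage is uneven
```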

  2. Kenneth Fritsch
    Posted Mar 30, 2008 at 10:35 AM | Permalink

    I think it was either Christy or Spencer who commented that when one goes about adjusting data, one must be careful and aware that, even when the adjustments are proper, the adjusters can have a tendency to be biased in one direction — like a dog on a bone.

  3. Jryan
    Posted Mar 30, 2008 at 10:35 AM | Permalink

    Oh, also… so we are taking historical January and February numbers to calculate the average for January and February this year? How is that supposed to make sense? Doesn’t that bias the graph to trend heavily in the direction of the predominant trend for the time period observed, regardless of actual temp?

  4. Steve McIntyre
    Posted Mar 30, 2008 at 10:52 AM | Permalink

    They define their pseudo-“error” bars as:
    * Columns 3 and 4 are the upper and lower 95% uncertainty ranges from the station and grid-box sampling uncertainties.
    * Columns 5 and 6 are the upper and lower 95% uncertainty ranges from the coverage uncertainties.
    * Columns 7 and 8 are the upper and lower 95% uncertainty ranges from the bias uncertainties.

    If there’s a difference in results arising from whether you average first spatially and then annually, as opposed to first annually and then spatially, is this additional to the above factors or is it partly subsumed in them? The plot below compares the reported pseudo-“CIs” to the difference arising from the order of averaging. Blue is the column 3-4 CI; red is the combined “CI”.
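    How the three ranges might be combined into a single band is not stated in the file header; a common convention, used here purely as an assumption and not necessarily how Hadley builds its band, is to add the independent half-widths in quadrature:

```python
import numpy as np

def combined_halfwidth(col3, col4, col5, col6, col7, col8):
    """Combine the three 95% ranges by adding their half-widths in quadrature.
    This is an assumed convention, not Hadley's documented procedure."""
    halves = [0.5 * np.abs(hi - lo) for hi, lo in ((col3, col4), (col5, col6), (col7, col8))]
    return np.sqrt(sum(h ** 2 for h in halves))
```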

  5. steven mosher
    Posted Mar 30, 2008 at 11:02 AM | Permalink

    I found it funny that this acausal filter is only misleading when it shows cooling.

    By the way since Tammy has avoided finishing the fight over PCA, he has moved on to
    easier challenges, losing a fight to Lucia first, and now he is content with bashing Anthony,
    or as he calls it shooting fish in the barrel.

    I suggested that he take on Hadley for this error, but I think he’s not likely to.

    a man has to know his limitations

  6. Greg Meurer
    Posted Mar 30, 2008 at 11:05 AM | Permalink

    Steve,

    This further explanation from the Met office will help:

    http://www.climateaudit.org/phpBB3/viewtopic.php?f=6&t=142&st=0&sk=t&sd=a&start=10#p3049

    Greg

  7. Posted Mar 30, 2008 at 11:10 AM | Permalink

    So, if 2008 is as cold as Jan-Feb suggests, they have to write

    This is not the case and owing to the unusually cool global average temperature in 2008, the error made it look as though smoothed global average temperatures had dropped markedly in recent years, which is misleading.

    and fix that s21 data again. It is quite interesting that ‘colder’ errors are found right away, but ‘warmer’ errors will remain (this is still wrong). The whole picture gets biased very rapidly.

  8. Posted Mar 30, 2008 at 11:37 AM | Permalink

    You would think that after encountering a few of these things they would Verify the code and Verify the calculations.

    I have not looked at the coding, but I suspect it can’t be a major piece of programming. Even a second pair of eyes should have assisted in finding a problem. We can only hope that someone in the industry begins to understand that independent Verification is not rocket science. (Snarky aside: maybe because it is climate science, independent Verification is thought to be not necessary.)

    I haven’t thought it through completely, but how can an equation and associated coding be developed to do the required arithmetic without knowing that the end of the data stream has been reached? It seems that the following storage locations (in an array, maybe) have been filled with the last value in the data stream. If the values were 0.0, wouldn’t the correct results be obtained?

  9. VG
    Posted Mar 30, 2008 at 11:39 AM | Permalink

    Is this correct?
    http://www.cru.uea.ac.uk/cru/climon/data/themi/g17.htm

  10. Richard Sharpe
    Posted Mar 30, 2008 at 11:40 AM | Permalink

    Evaluating the HadCRU results is very difficult because their data as used is not disclosed.

    Am I to understand that they do not disclose their data?

    If that is the case, there can be only one conclusion.

  11. M.Villeger
    Posted Mar 30, 2008 at 11:44 AM | Permalink

    It explains how any cooling will always be a consequence of Global Warming… Fascinating indeed.

  12. mccall
    Posted Mar 30, 2008 at 11:55 AM | Permalink

    One had hopes that the balance of Hadley (vs. GISS slant) would be sustained — apparently not.

  13. Martin Å
    Posted Mar 30, 2008 at 12:05 PM | Permalink

    I think displaying data like this is misleading. They present one graph as if it consists of measurement data only. But the end points contain as much extrapolated data as measured data. They are trying to answer the question “What is the average temperature from 10 years ago until 10 years in the future” with measurement data. IT IS IMPOSSIBLE.

    The only sensible question to answer with measurement data is what happened from the past until now. E.g. use an exponential IIR filter. A non-weighted filter (average temperature for the preceding 10 years) might be easier for a layman to understand.

    The same goes for tree rings. The average temperature during the latest x years might have had an influence on the average tree ring width the latest x years. This is the only physical comparison, and there are no end point effects.

    Then, if you want to do a more high resolution investigation, to resolve changes in the latest years, you have to do that separately (with corresponding larger error bars).

  14. Posted Mar 30, 2008 at 12:11 PM | Permalink

    VG,

    Is this correct?
    http://www.cru.uea.ac.uk/cru/climon/data/themi/g17.htm

    Excellent. Now let me pass that data through Mann’s smoothing method called minimum roughness (using his lowpass.m),

    ( http://signals.auditblogs.com/files/2008/03/mann_smooth.png )

    🙂

  15. Raven
    Posted Mar 30, 2008 at 12:11 PM | Permalink

    Perhaps the folks at Hadley should attend med school. This study suggests 4th-year med students have something to teach them:

    Arocha & Patel (1995) examined the effects of inconsistent data on subjects’ hypotheses generation and evaluation. Early, intermediate and advanced novices (2nd, 3rd & 4th year medical students) showed differences in terms of their use of co-ordinating operations. These are responses to inconsistent data that are commonly used in scientific and everyday reasoning and include; ignoring data, excluding data, re-interpreting data, re-interpreting hypothesis, modifying a hypothesis to fit the data, and changing a hypothesis altogether. Early and intermediate novices performed more data operations and they more frequently ignored or reinterpreted inconsistent data, whereas advanced novices more often changed their hypotheses to account for the data, changes which decreased the inconsistency with the data. Advanced novices generated a number of early hypotheses and this allowed them to narrow their initial hypothesis set in the face of inconsistent evidence and make fewer data reinterpretations.

    Click to access csrp508.pdf

  16. steven mosher
    Posted Mar 30, 2008 at 12:22 PM | Permalink

    re 14. UC I expect Tamino to come over here ANY second and defend the honor of mann.

    See if he posts my post.

    “Here you go Tammy.

    Tell us about Dr. Manns Minimum roughness smoothing approach.

    1. You ran away from a fight with a mining guy.
    2. Lost to a girl.
    3. resorted to attacking a weather guy.

    Take on Dr. Mann, it’ll do your manhood some good

    http://www.climateaudit.org/?p=2955#comment-229796

    Let’s see what the data analist of climate science can make out of this.

  17. Jesper
    Posted Mar 30, 2008 at 12:42 PM | Permalink

    Steve,

    Here’s a ref for Hadley error calculations, fyi. Perhaps something to sink your statistical teeth into at some point.

    Jones, P. D., Osborn, T. J. & Briffa, K. R. Estimating Sampling Errors in Large-Scale Temperature Averages. Journal of Climate 10, 2548-2568 (1997).

  18. Mike C
    Posted Mar 30, 2008 at 12:48 PM | Permalink

    When presenting the HadCRU data, I’ve had to point out that they include an average of an incomplete year. It is amusing that their misleading presentation is only corrected when it shows cooling. At least now I only have to point out that the last year (red plot) on their graph is an average of monthly temperatures of the current year.

    5 and 16 Mosh
    Let’s face it, Tammy, aka Tamino, aka Grant Foster is a waste of time. His posts and comments that repeatedly promote certain candidates and political party should be a clue as to the motivation to play games with words and statistics; to scare up votes. The first post of his that I read was on upward trends in the satellite data. Not once did he mention that the satellite record begins right after the Pacific climate shift, that there are two volcanoes early in the record and a string of El Ninos late in the record, which are the causes of the upward trend in the satellite data.

  19. RomanM
    Posted Mar 30, 2008 at 1:23 PM | Permalink

    #17 Jesper

    I think the one you want is the more recent paper

    Brohan, P., J.J. Kennedy, I. Harris, S.F.B. Tett and P.D. Jones, 2006: Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850. J. Geophysical Research 111, D12106, doi:10.1029/2005JD006548

    It is available for money at the AGU site: http://www.agu.org/pubs/crossref/2006/2005JD006548.shtml however a free sort-of-unofficial version of it can be found at
    http://www.cru.uea.ac.uk/cru/data/temperature/HadCRUT3_accepted.pdf .

    If any problems were to be found in the earlier work, you would just be told that they “have moved on… so it doesn’t matter.” I have skimmed through it and my favourite part is the section “Blending land and marine data”, where they describe how they combine sea and land temperatures for a grid box. They use weights which depend on the relative uncertainty of the two values, “so that the more reliable value has a higher weighting than the less reliable”. A statistician might think it more appropriate to use the relative areas of sea and land as weights, since the CRU method introduces bias and underestimates the uncertainty. However, at the end of that section we are told that

    This is reasonable if it is assumed that, in any grid box, the land temperature and SST values for that box are each estimates of the same blended temperature. In reality this may not be true (see section 6.4) and an area-weighted average might in some cases give a more physically consistent average temperature. However, the choice of blending weight makes very little difference to large scale averages, so the extra complexity of a blending algorithm which accounts for possible land-sea temperature anomaly differences is not justified.

    It seems to be the mantra of the profession: “It’s wrong, but it doesn’t matter because it won’t make a difference in the answer”. Where have we heard that before?

  20. JD
    Posted Mar 30, 2008 at 1:36 PM | Permalink

    Mike C is correct; Tamino is merely trying to distract Steve from more important issues.

    Steve, is it possible that this filtering error goes some way towards accounting for the steep gradients seen on some of the IPCC graphs?

    I am amazed that such an influential organisation can make such a rudimentary slip-up. It is a very important point. What is the best way to plot a trend line through a noisy data set where the last point or points will tend to cause a distortion? Obviously there are several approaches that could be used, but what is the most reliable method?

  21. Jon
    Posted Mar 30, 2008 at 2:01 PM | Permalink

    and fix that s21 data again. It is quite interesting that ‘colder’ errors are found right away, but ‘warmer’ erros will remain ( this is still wrong). The whole picture gets biased very rapidly.

    Exactly. I am reminded of something from some years back. Caltech, JPL, and PBS, under the supervision of Caltech physics Prof. David Goodstein, did an educational film series called “The Mechanical Universe”. The beginning and end of each episode contained clips of Goodstein lecturing to the Caltech freshman physics class. One of the episodes covered Millikan’s oil-drop experiment. During the end clip, Goodstein starts talking about the famous error that Millikan made in measuring the charge of the electron. You see, Millikan thought he knew what the answer should be. He worked carefully to remove all the sources of experimental error he could and ended up distorting the data to match more closely what he expected rather than the true answer.

  22. Posted Mar 30, 2008 at 2:06 PM | Permalink

    20 (JD):

    but what is the most reliable method?

    If you are ‘filtering’ over 21 points, don’t calculate the filtered value for any of the first 10 or the last 10 data points. IMHO, this is the only method that does not introduce bias or extra noise into the system.
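    A sketch of that “honest gap” approach, leaving the first and last ten years of a 21-point binomial smooth blank rather than padded (the weights here are the standard binomial ones; nothing Hadley-specific is assumed):

```python
import numpy as np
from scipy.special import comb

def smooth_valid_only(x, npts=21):
    """Filter only where the full npts-wide window exists; the (npts-1)/2 points
    at each end are returned as NaN instead of being estimated."""
    w = comb(npts - 1, np.arange(npts))
    w = w / w.sum()
    half = (npts - 1) // 2
    out = np.full(len(x), np.nan)
    out[half:len(x) - half] = np.convolve(x, w, mode="valid")
    return out
```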

  23. mzed
    Posted Mar 30, 2008 at 2:09 PM | Permalink

    Uh…hadcrut3vgl shows .075 for January, .178 for February…are you sure you don’t have these two datapoints mixed up on your graph? There should be an uptick from Jan. to Feb…

  24. Vic Sage
    Posted Mar 30, 2008 at 2:35 PM | Permalink

    Unfortunately, this is not surprising.

    Sadly, the new definition of good science is anything that agrees with the desired outcome.

  25. Posted Mar 30, 2008 at 2:46 PM | Permalink

    #22

    That’s the good way. And if other ways are used, it has to be indicated in the result that end-points vs. middle-points are not apple-to-apples.

    Some related comments I’ve written here:

    http://www.climateaudit.org/?p=1681#comment-114704

    http://www.climateaudit.org/?p=1681#comment-114062

    http://www.climateaudit.org/?p=2541#comment-188580

  26. JD
    Posted Mar 30, 2008 at 3:02 PM | Permalink

    Leif,

    Thank you for the reply. That makes good sense and I wholly agree. However, whilst this principle is intrinsically safe, it does leave a sizeable gap between the end of the curve and the last data record. This can be seen in the post by UC, above.

    With hindsight, I don’t think I phrased my post very well. By filtering, I really meant any smoothing or trend-determining / function-fitting technique. So maybe the real questions are:

    What methods are being used, have been used or should be used?
    How reliable are the fits/interpolations to the last record?

    These questions are posed with climate data in mind, where any real trend is unknown and the effective noise is large.

    I believe the “error” that Steve discovered and brought to our attention demonstrates the importance of this issue.

  27. Posted Mar 30, 2008 at 3:10 PM | Permalink

    26 (JD):

    However, whilst this principle is intrinsically safe, it does leave a sizeable gap between the end of the curve and the last data record.

    Such a gap is scientifically honest. Now, if we believe we have a model for how the expected quantity should vary [maybe cyclically, or with an agreed-upon trend], one can exploit that ‘foreknowledge’ in any number of ways. But such must be stated, discussed, and understood beforehand. I know, of course, that that is precisely the problem, but I don’t see any way around it. I simply get suspicious [and dismiss the claim out of hand] when I see ‘smoothed’ values plotted that include the endpoints.

  28. steven mosher
    Posted Mar 30, 2008 at 3:39 PM | Permalink

    a 21 point Binomial filter? Clearly it should have been a 1000 point binomial filter.

    It would be fun to plot the smooth for various filter widths

  29. sylvain
    Posted Mar 30, 2008 at 3:53 PM | Permalink

    William M. Briggs had a couple of interesting posts a few weeks ago about this:

    http://wmbriggs.com/blog/2008/03/09/you-cannot-measure-a-mean/

    http://wmbriggs.com/blog/2008/03/10/it-depends-what-the-meaning-of-mean-means/

  30. John Lang
    Posted Mar 30, 2008 at 3:55 PM | Permalink

    All of this smoothing just masks what is really going on with the climate.

    The smoothed HadCRUT3 line does not tell you that something unusual happened in the climate in 1878, 1893-1895, 1945-1947, 1984, 1998 and 2007.

    It does pick up the general climate trends accurately from 1911-1944, but the stable cooler climate of 1947-1976 looks like it is just part of a general warming trend which did not start until 1976.

    The 21 point binomial filter line needs to be plotted together with the monthly data so that more accurate information is available to the reader.

  31. Jim Arndt
    Posted Mar 30, 2008 at 4:07 PM | Permalink

    Hi,

    I have noticed that since the 1930s there has been a definite downward trend. Has anyone here noticed that? But I guess it really depends on your start date. Still, 75 years ago it was hotter than today, even with UHI effects, if you look at Steve’s annual vs. monthly plot.

  32. Bill Mecorney
    Posted Mar 30, 2008 at 5:47 PM | Permalink

    UC #14 Looks like a Stickey Hock to me. I wish I had half the tech savvy as you folks. My question is a simple one. Cores of hundreds year old stubby trees that grow for two to three weeks at a time per year are generating this volume of complicated point counter point to the fifth and sixth decimal? Then somebody freehands a line through a variable number of dots on a “Graph” and the Earth is DOOMED? I’m waiting for Godzilla, something I can actually see. I would much prefer being in the position of Doubter, given the quality of “research” and recalcitrant guff from the Warmetariat. Carry on, and gratitude to Steve and Ross.

  33. dover_beach
    Posted Mar 30, 2008 at 6:08 PM | Permalink

    Mosher at #16:

    “Let’s see what the data analist of climate science can make out of this.”

    Please tell me you meant that, Steven. Either way, I won’t be able to visit Tamino’s blog or read a reference to him or it, and keep a straight face. What a great way to start the week.

  34. Posted Mar 30, 2008 at 6:21 PM | Permalink

    Leif Svalgaard (#22) wrote,

    If you are ‘filtering’ over 21 points, don’t calculate the filtered value for any of the first 10 or the last 10 data points. IMHO, this is the only method that does not introduce bias or extra noise into the system.

    Agreed. But if one really insists on continuing the filter (technically smoother) to the endpoint, a more reasonable approach than endpadding (or Mann’s inflated endpegging algorithm, debunked here earlier somewhere) would be to simply lop weights off the filter, so that the last value of an eg 21 point filter would simply use the first 11 weights, renormalized to sum to unity. An appropriate graphical way to show that the formula has changed is to show the last eg 11 points as a dotted continuation of the solid complete filter curve.
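    A sketch of the lop-and-renormalize idea for a symmetric filter (an illustration of the suggestion above, not anyone’s production code):

```python
import numpy as np
from scipy.special import comb

def smooth_truncated(x, npts=21):
    """Near the ends, drop the weights that would fall outside the data and
    renormalize the remainder to sum to one, as suggested above."""
    w = comb(npts - 1, np.arange(npts))
    w = w / w.sum()
    half = (npts - 1) // 2
    n = len(x)
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        ww = w[lo - (i - half): hi - (i - half)]   # weights that overlap the data
        out[i] = np.dot(ww / ww.sum(), x[lo:hi])
    return out
```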

  35. steven mosher
    Posted Mar 30, 2008 at 7:35 PM | Permalink

    Re 33. One nice thing about leavng spelling mistakes in all my posts is that sometimes
    when I mean it, I can pass off something nasty as unintentional. Was there a spelling misatke
    in post 16?

  36. Posted Mar 30, 2008 at 7:43 PM | Permalink

    34 (Hu): Yes, some way of telling the apples from the oranges could be chosen. My problem with that is that this fine point will be completely lost on the ‘unwashed masses’ and the media. Better not to pretend we know more than we do.

  37. Posted Mar 30, 2008 at 8:26 PM | Permalink

    Hadley Center, authors of the pre-eminent temperature series, have suddenly identified an “error” in how they presented temperature data.

    Is there a reference for them describing this as an “error”? The method Steve describes is a standard smoothing approach used in climate science.

    For their influential graphic showing smoothed temperature series, they used a 21-point binomial filter (this is reported) extrapolating the latest number for 10 years.

    If they are really trying to pass off a standard smoothing approach as an “error” in order to change the method, 1) it is an untruth, and 2) this is like changing accounting practices in order to conceal unfavourable results. Accountants might use the “f” word for that.

  38. Posted Mar 30, 2008 at 9:15 PM | Permalink

    Re #34, the 6/9/07 CA thread debunking Mann’s pretentious 2004 GRL exegesis of endpegging is at http://www.climateaudit.org/?p=1681. (See my 2 bits at comment #40.) Consistent application of Mann’s rule to the current data would require that the current value of the 21-year average temperature be last month’s remarkably low reading.

    This is just as preposterous now that the trend is cooling as it was when the trend was warming.

    Note that a 21-year average of annual averages of monthly data is ultimately just a 252-month average of monthly data. Mann’s formula would require flipping the preceding 251 months over last month and then applying the 252-month average to these figures. This would identically just give last month’s undoubtedly noisy reading as the quintessence of current climate.

    This was “science” when the trend was warming, but is now “erroneous” when the trend is down. Curiouser and curiouser…
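    For illustration, here is a sketch of the reflect-and-flip end rule described above (as the comment attributes it to Mann’s 2004 GRL note). For any symmetric filter this pins the smoothed endpoint exactly to the final observation:

```python
import numpy as np
from scipy.special import comb

def smooth_min_roughness(x, npts=21):
    """Reflect the series about both the final time and the final value before
    filtering ('minimum roughness' padding as described above).  For a symmetric
    filter this forces the smoothed endpoint to equal the last raw value."""
    w = comb(npts - 1, np.arange(npts))
    w = w / w.sum()
    half = (npts - 1) // 2
    pad_start = 2 * x[0] - x[half:0:-1]            # reflected-and-flipped head
    pad_end = 2 * x[-1] - x[-2:-half - 2:-1]       # reflected-and-flipped tail
    return np.convolve(np.concatenate([pad_start, x, pad_end]), w, mode="valid")

x = np.cumsum(np.random.randn(150))                     # toy series
assert np.isclose(smooth_min_roughness(x)[-1], x[-1])   # endpoint pegged to last value
```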

  39. Rob
    Posted Mar 30, 2008 at 9:18 PM | Permalink

    From my immediate view in Toronto, we still have banks of snow, which will still be here at the beginning of April in two days. I certainly don’t recall such a situation during my adult life – so the present downspike seems a little unusual from a Toronto perspective but I recognize that this is only one perspective.

    Adelaide, South Australia, just smashed the record for any Australian city by recording 15 straight days of maximums over 35 deg C (in March, of all things). Could be confirmation bias on my part, but when that kind of thing happens in our summer (well, autumn), something as extreme, but inverted, seems to be happening in the NH.

    Just another perspective…

  40. Posted Mar 30, 2008 at 9:34 PM | Permalink

    Meanwhile, many parts of Australia’s eastern states, extending right up into central Queensland, smashed existing March minimum temperature records two nights in a row. Some places broke records that had stood since the late 1800s!

  41. Matt A
    Posted Mar 30, 2008 at 9:59 PM | Permalink

    Sorry I can’t provide a reference; however, while things are undoubtedly hot in Adelaide, Sydney has experienced the coldest summer in 25 years. This is largely linked, I believe, to rainfall patterns and La Nina/El Nino effects on Australia’s climate.

  42. Geoff Sherrington
    Posted Mar 30, 2008 at 10:13 PM | Permalink

    Re 40 Rob.

    But of course Melbourne’s last fortnight of March has been well below average temp and its monthly rainfall average fell in just 2 days. Also the daily minimums for the whole of February 2008 were historically low, on average some 2 deg C below the 100-year average (however that might be calculated). Put that in your 21-point non-linear weighted moving average and it still feels COLD to me.

    Please, one swallow does not a summer make. Look at the picture over decades for less anxiety.

    Re # 35 Steven Mosher, I used to own an analytical chemistry laboratory so I guess I did not slip up in spelling my work so often. I was not an analist because I preferred to be an amateur gynaecologist and there’s an inch or so of difference, the error being significant.

  43. Posted Mar 30, 2008 at 10:25 PM | Permalink

    My comment above was based on a TV news report, however here is a page listing notable Australian weather data scavenged from the BoM for the 30th March:

    http://www.australianweathernews.com/news/2008/080330.stm

    Those that are interested can use the calendar LHS to check other days.

  44. Ian Castles
    Posted Mar 30, 2008 at 11:29 PM | Permalink

    Re #40. The estimated average mean temperature of South Australia for the latest available month (February 2008) was 1.3 C cooler than the 1961-90 average, and the 10th coolest February in the 59-year record beginning in 1950. These are official Australian Bureau of Meteorology figures.

  45. Geoff Sherrington
    Posted Mar 30, 2008 at 11:31 PM | Permalink

    Re HADCRU and extrapolation

    We have recently corrected an error in the way that the smoothed time series of data were calculated.

    Wrong. This is not an error. It is a standard math procedure. It just became inconvenient.

    Re # 20 JD

    ALL methods that project into the future are guesses. Nature does not know our clever maths. Sure, we can make pretty projections as an aid to understanding, or dot them as Hu suggests, but they should be clearly labelled as GUESS. Even if we have years of past data that accurately fit a sine curve, we have no justification for projecting it unless we have an “authenticated” physical explanation for the curve, PLUS knowledge that the curve will keep behaving that way in the future. We can never know the latter, so science would lead us to use the GUESS word even when it’s high-probability guessing. Otherwise, we get egg on the face like HADCRU did in this thread.

  46. Mark T
    Posted Mar 30, 2008 at 11:38 PM | Permalink

    I would argue that using a non-causal filter to describe otherwise causal processes is indeed an error. If it were only used for presentation (i.e., to make it pretty) I wouldn’t have as much of a problem. However, this is used as if it is real data.

    Mark

  47. Ian Castles
    Posted Mar 31, 2008 at 12:10 AM | Permalink

    Further to #39, #40, #43 and #44. For Australia as a whole, the Bureau of Meteorology estimate of mean temperature for February was the 9th coolest since 1950 and the 2nd coolest (after 2002) of the 32 Februarys since 1977. This is for an area the same size as the contiguous 48 States of the US.

  48. Posted Mar 31, 2008 at 12:53 AM | Permalink

    Data for 2008 were being used in the smoothing process as if they represented an accurate estimate of the year as a whole.

    This brings up, of course, an interesting question: how accurate an estimate of the upcoming year is the Jan-Feb average? Jan-Feb is part of that average, so the two cannot be independent. We can use past data to obtain some clue:

    ( http://signals.auditblogs.com/files/2008/03/janfeb_yxpred.png )

    (simple average data)

    (another way to look at the data)

    A simple model T_JF = T_annual + noise fits quite well. And with that model, we have just a noisy estimate of 2008, and the HadCRUT people have found out how badly these end-point methods are affected by noise. Yet http://hadobs.metoffice.com/hadcrut3/diagnostics/global/simple_average/annual_s21.png still has unchanged uncertainties near the end-points. But let’s give them some time.

    Of course, if someone wants to do ICE or CCE, you can try to fit T_JF=alpha*T_annual+noise or T_annual=alpha*T_JF+noise , but I wouldn’t go there 😉
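    One way to put a number on that, sketched below: assemble the monthly anomalies into a years-by-12 array (how that array is built from the HadCRUT file is left open; the shape is the only assumption here) and look at the spread of the Jan-Feb mean around the eventual annual mean.

```python
import numpy as np

def janfeb_prediction_error(monthly):
    """monthly: (n_years, 12) array of anomalies for complete years.
    Returns the standard deviation of (Jan-Feb mean) - (annual mean), i.e. the
    'noise' term in the T_JF = T_annual + noise model discussed above."""
    jf = monthly[:, :2].mean(axis=1)
    ann = monthly.mean(axis=1)
    return np.std(jf - ann, ddof=1)
```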

  49. Posted Mar 31, 2008 at 12:53 AM | Permalink
  50. Rob
    Posted Mar 31, 2008 at 1:08 AM | Permalink

    Re #47 et al. Thanks for your interesting responses. Yes, I think the heat in Adelaide was because of a high pressure system in the Tasman Sea that had been trapped by La Nina and was directing a slow-moving northerly airstream over this part of the country.

    We’re probably all getting rather OT by this point!

  51. Patrick Hadley
    Posted Mar 31, 2008 at 3:38 AM | Permalink

    Re #37 David Stockwell asks for a reference to Hadley describing their error:
    Quote: We have recently corrected an error in the way that the smoothed time series of data were calculated. Data for 2008 were being used in the smoothing process as if they represented an accurate estimate of the year as a whole. This is not the case and owing to the unusually cool global average temperature in January 2008, the error made it look as though smoothed global average temperatures had dropped markedly in recent years, which is misleading. End Quote

    http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/

  52. Willis Eschenbach
    Posted Mar 31, 2008 at 3:54 AM | Permalink

    Hu, you are 100% correct to say:

    Agreed. But if one really insists on continuing the filter (technically smoother) to the endpoint, a more reasonable approach than endpadding (or Mann’s inflated endpegging algorithm, debunked here earlier somewhere) would be to simply lop weights off the filter, so that the last value of an eg 21 point filter would simply use the first 11 weights, renormalized to sum to unity. An appropriate graphical way to show that the formula has changed is to show the last eg 11 points as a dotted continuation of the solid complete filter curve.

    I use the same method (use all available weights, then renormalize to sum to unity) with a gaussian filter. A gaussian filter is virtually identical to a high number (e.g. 21 point) binomial filter. I have tested this method against all of Mann’s methods, and found it to have the least error (compared to the eventual full acausal filter value revealed after time).

    However, your suggestion of lopping off 10 points ( (21 points – 1) / 2 ) goes way too far. Using the method we both recommend, on say a 21 point binomial filter, the first 8 points only contain 13% of the weight. So if we lop off two points ( (21 points-1) / 2 – 8 points) we’ll only have an error of about 13% in the third point from the end.

    A 21-point binomial smoother is almost identical to a 7 point full width to half maximum (FWHM) Gaussian smoother. One implication of this is that the full width of a 21 point binomial smoother covers ±5 standard deviations. That’s why we can chop off so many points without loss of accuracy. It’s a poor choice in smoothers because there’s almost no weight out towards the edges. All the weight is in the center.

    An equivalent 7 point FWHM Gaussian smoother has 13% of the weight in the very first point, so again we can chop off two points ((7 points-1) / 2 – 1 point) to get a similar error. This is as we’d expect, since the two are almost identical. A 21 point binomial filter measures about 7 point FWHM.

    It is quite possible to calculate error bars for the estimate. It is done by calculating, for each timestep in the dataset, the error of the final point smoothing (with only 11 of 21 datapoints available) as compared to the eventual reality. From the historical record of the errors of your chosen smoothing method (the error dataset), obtain the standard deviation. That standard deviation is the best estimator of the standard error of the final data point in a series.

    It is also possible to use the error dataset to improve final smoothing point forecasts … but there you’re heading into uncharted, slippery territory. I have had some success from using a weighted combination of short, medium, and long gaussian causal filters to improve estimates of final point errors.

    Thanks for bringing up this interesting and important point.

    w.
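    A sketch of the error-bar recipe described above, using a binomial kernel and the truncate-and-renormalize endpoint treatment from this thread (an illustration, not the actual code behind the figures quoted here):

```python
import numpy as np
from scipy.special import comb

def endpoint_error_sd(x, npts=21):
    """For each interior year, compare the truncated 'as-if-this-were-the-last-year'
    smooth (first half+1 weights, renormalized) with the eventual full symmetric
    smooth, and return the standard deviation of those differences."""
    w = comb(npts - 1, np.arange(npts))
    w = w / w.sum()
    half = (npts - 1) // 2
    full = np.convolve(x, w, mode="valid")         # full smooth for years half..n-half-1
    wt = w[:half + 1] / w[:half + 1].sum()         # endpoint-style truncated weights
    errs = [np.dot(wt, x[i - half:i + 1]) - full[i - half]
            for i in range(half, len(x) - half)]
    return np.std(errs, ddof=1)
```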

  53. Chris Wright
    Posted Mar 31, 2008 at 4:49 AM | Permalink

    Although the recent falls in global temperature appear to be dramatic, I think the HADCRUT3 data for the southern hemisphere may be more significant. It shows a very consistent decline over the last six years (about a fifth of a degree, not including the recent falls). If this trend continues for a few more years then, for the southern hemisphere at least, virtually all of the 20th century global warming will have been wiped out.

    Bearing in mind that the very high 20th century solar activity may be coming to an end, it may be that we really will have to start worrying about climate change: global cooling. History shows that people starve and civilisations fall when the world gets colder. It may be that there really is a problem with carbon dioxide: there isn’t enough of it!

    Chris

  54. Posted Mar 31, 2008 at 5:47 AM | Permalink

    Steve wrote:

    From my immediate view in Toronto, we still have banks of snow, which will still be here at the beginning of April in two days. I certainly don’t recall such a situation during my adult life

    It looks to me that a single site with a maritime climate, averaged over several years, gives excellent agreement with other such sites, indicating that the oceans are good global thermometers. Sites with continental climates are, however, dreadful global thermometers. It is starting to look to me as though we can do a decent job of measuring global temperature with quite a small number of high quality sites.

    When the satellite data gave unacceptable results for global warming, people looked for sources of error, and were not happy till they found them. One may suspect that they looked much too hard, and looked for potential errors in one direction only. The Aqua satellite, however, is free from these “corrections”, so since 2002 we have had unimpeachable data for global temperatures, against which earthbound methods for measuring global temperatures may be judged. I sometimes suspect that global cooling may have set in, in substantial part, because of the launch of the Aqua satellite in 2002.

  55. steven mosher
    Posted Mar 31, 2008 at 6:02 AM | Permalink

    re 49 thanks uc. james annan notes that there is a bet on 2008 final figures. see his blog 4 details

  56. Bernie
    Posted Mar 31, 2008 at 6:32 AM | Permalink

    Geoff, Ian, Rob and Carl:
    Interesting variance. It would help if you gave a sense of the distances between, say, Sydney, Melbourne, Adelaide and Brisbane. I have a sense that some do not realize quite how large Australia is, nor the topography of SE Australia, which explains the climate difference between Sydney and Adelaide.
    Most importantly, what’s the prognosis for this year’s wine?

  57. Posted Mar 31, 2008 at 8:39 AM | Permalink

    Re Willis (#52), it’s true that if you lop the first (or last) 8 points off a 21-point binomial filter, you still have well over 75% of the weights remaining (86.8%, in fact).

    It wouldn’t be too misleading to stick with a solid line as long as you have say 75% of the weights, and switch to a dotted line only when you are under 75%. With the 21 point binomial filter, this would require dotting only the last 3 points (or intervals to be precise, since the points themselves are just dots).

    But at a minimum these last 3 should definitely be differentiated somehow from the others.
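    These weight fractions are easy to check directly from the cumulative binomial weights (a quick sketch; the 21-point weights are the standard C(20, k)/2^20):

```python
import numpy as np
from scipy.special import comb

w = comb(20, np.arange(21)) / 2**20       # 21-point binomial filter weights
print(round(w[:8].sum(), 3))              # ~0.132: weight in the first 8 points (see #52)
print(round(w[8:].sum(), 3))              # ~0.868: weight left after lopping 8 points
```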

  58. Johan i Kanada
    Posted Mar 31, 2008 at 9:21 AM | Permalink

    Re: Filtering & Smoothing

    What is the reason for using anything other than a simple n-period moving average? Such a method is simple, always works, and cannot be misused or misinterpreted.

    What extra knowledge does one gain by applying e.g. a 21-point binomial filter? Or any other “fancy” method?

    If a link to some other source is the best response, that would be appreciated too.

    Thanks,
    /Johan

  59. Mark T.
    Posted Mar 31, 2008 at 9:29 AM | Permalink

    The binomial weights the current sample most heavily, which makes intuitive sense. All FIR filters are non-causal if implemented in any manner other than trailing edge (i.e. if the taps extend into the future). The problem is that the delay of the filter is half the filter length, so you almost have to implement it in a non-causal manner. The result with an MA is that future samples end up making up half of the current output, which makes very little sense since they don’t exist yet. At least with the binomial, future samples are not weighted as heavily.

    Mark

  60. Jim P
    Posted Mar 31, 2008 at 10:42 AM | Permalink

    I find this sentence strange

    A significant drop in global average temperature in January 2008 has led to
    speculation that the Earth is experiencing a period of sustained cooling

    Where has this speculation appeared and why would anyone think that data from these two months only would lead to speculation of sustained cooling?

    Jim P

  61. Steve McIntyre
    Posted Mar 31, 2008 at 10:46 AM | Permalink

    #60. Why indeed? I don’t see this statement in any prior comments.

  62. Phil.
    Posted Mar 31, 2008 at 11:00 AM | Permalink

    I find this one strange too:

    Hadley Center, authors of the pre-eminent temperature series, have suddenly identified an “error” in how they presented temperature data.

    What was so sudden about it? Surely it was just recent (the word they themselves used)?

  63. Posted Mar 31, 2008 at 11:18 AM | Permalink

    #51 Thank you, Patrick. The lack of outrage is amazing.

  64. kim
    Posted Mar 31, 2008 at 11:23 AM | Permalink

    Recent implies a lag between discovery and announcement. They could have said, ‘just now’ or ‘now we have discovered’. Nonetheless, any moment of revelation is ‘sudden’. I doubt they came upon the realization slowly.

    You are quite a rhetorician. It’s a gift, and shouldn’t be misused.
    ========================================

  65. Aaron Wells
    Posted Mar 31, 2008 at 11:48 AM | Permalink

    I find this sentence strange

    A significant drop in global average temperature in January 2008 has led to
    speculation that the Earth is experiencing a period of sustained cooling

    Where has this speculation appeared and why would anyone think that data from these two months only would lead to speculation of sustained cooling?

    Jim P

    What HadCRU really meant to say, but refused to, is that the drop in global average temperature has led to speculation that the persistent warming has slowed, or even ceased.

    They can’t let that go on unchallenged.

  66. Mat S Martinez
    Posted Mar 31, 2008 at 12:13 PM | Permalink

    two months of cool weather during the peak months of a strong La Nina and people are saying some pretty wild ideas.

    Also, why are people reading that the correction has an ulterior motive other than just being a correction? People just love a good conspiracy…

  67. kim
    Posted Mar 31, 2008 at 12:22 PM | Permalink

    Seven years of flat or dropping temperatures and people are saying some pretty wild ideas.
    ======================================================

  68. Phil.
    Posted Mar 31, 2008 at 12:39 PM | Permalink

    Re #64

    Recent implies a lag between discovery and announcement. They could have said, ‘just now’ or ‘now we have discovered’. Nonetheless, any moment of revelation is ‘sudden’. I doubt they came upon the realization slowly.

    You are quite a rhetorician. It’s a gift, and shouldn’t be misused.
    ========================================

    Whereas you have English comprehension difficulties and frequently misuse the language.
    As well as a fertile imagination and an inability to distinguish your daydreams from fact!

  69. Sam Urbinto
    Posted Mar 31, 2008 at 12:55 PM | Permalink

    Why would or wouldn’t anything lead to speculation? I can speculate about anything for any reason, or even no reason.

    And I would in any event certainly be surprised if two solid months towards the zero anomaly line didn’t lead to speculation.

  70. Sam Urbinto
    Posted Mar 31, 2008 at 12:57 PM | Permalink

    Especially given that it’s speculation that the anomaly is a valid way to “assess global climate system heat changes”. Saying it does assumes that it’s both valid and correct, neither of which is to any degree certain.

    http://wmbriggs.com/blog/2008/03/28/quantifying-uncertainty-in-agw/

    http://climatesci.org/2008/03/28/a-short-tutorial-on-global-warming/

  71. Posted Mar 31, 2008 at 1:01 PM | Permalink

    I’ve written to Derek Twigg, the minister responsible for the Hadley Centre, pressing for publication of the raw data and code behind HADCRUT. I’m sure it would help if others did the same too.

  72. Posted Mar 31, 2008 at 1:06 PM | Permalink

    Johan i Kanada writes,

    What the reason for using anything else than simple n-period moving average? Such a method is simple, always works, and cannot be misused or misinterpreted.

    What extra knowledge does one gain by applying e.g. a 21-point binomial filter? Or any other “fancy” method?

    A binomial or gaussian filter has nicer spectral properties when you have an unbounded sample, but you are right that just an n-period moving average would be a lot easier to understand. Perhaps someone can correct me, but I think a 7- or 9-year MA would have about the same properties as a 21-point binomial filter, given that the middle 5 points of the latter carry about half the total weight. A 7- or 9-year MA would also reduce endpoint considerations.
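    A rough way to see the “nicer spectral properties” point is to compare the magnitude responses of a 9-year moving average and a 21-point binomial filter; the FFT grid length and the 0.15 cycles/year threshold below are arbitrary choices for illustration only.

```python
import numpy as np
from scipy.special import comb

binom = comb(20, np.arange(21)) / 2**20     # 21-point binomial weights
ma9 = np.ones(9) / 9                        # 9-year moving average weights
freqs = np.fft.rfftfreq(256)                # cycles per year (unit sample spacing)
for name, w in [("binomial-21", binom), ("MA-9", ma9)]:
    H = np.abs(np.fft.rfft(w, 256))         # magnitude of the frequency response
    print(name, "worst leakage above 0.15 cyc/yr:", round(H[freqs > 0.15].max(), 3))
```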

  73. Phil.
    Posted Mar 31, 2008 at 1:25 PM | Permalink

    Re #72

    What the reason for using anything else than simple n-period moving average? Such a method is simple, always works, and cannot be misused or misinterpreted.

    Check out n-period moving averages here:

    http://www.amstat.org/publications/jse/v11n1/datasets.hays.html

  74. Willis Eschenbach
    Posted Mar 31, 2008 at 1:42 PM | Permalink

    Hu, thanks for the reply. You say:

    Re Willis (#52), it’s true that if you lop the first (or last) 8 points off a 21-point binomial filter, you still have well over 75% of the weights remaining (86.8%, in fact).

    It wouldn’t be too misleading to stick with a solid line as long as you have say 75% of the weights, and switch to a dotted line only when you are under 75%. With the 21 point binomial filter, this would require dotting only the last 3 points (or intervals to be precise, since the points themselves are just dots).

    But at a minimum these last 3 should definitely be differentiated somehow from the others.

    If you are happy with 75% of the weight, on a 21 point binomial filter (which is about a 7 point FWHM binomial filter) you only need to use a dotted line for the very last interval. The first 12 points of a 21 point binomial filter contain almost exactly 75% of the weight.

    Our range of year-to-year changes is finite, with an average and a standard deviation. For the HadCRUT dataset, the standard deviation of the first difference (year-to-year change) is about .3. So a 25% error in the final year of smoothing will be about a 0.07°C standard error. As can be seen above in UC’s post, this is not a whole lot. (yes, it’s a bit more complex than that … but that’s a reasonable approximation.)

    w.

  75. kim
    Posted Mar 31, 2008 at 2:04 PM | Permalink

    Gad, Phil., I hope Steve leaves #68 up. It’s suddenly more revelatory about you than about me.
    ===================================================

  76. 00
    Posted Mar 31, 2008 at 2:08 PM | Permalink

    In the answer from Hadley regarding the different average values:

    In 2007, large areas of the Arctic Oceans were ice free and had sea surface temperatures that were a few degrees warmer than average. A number of ships went up to the Arctic each month and took observations.

    However, when the annual average was computed there were observations in most Arctic grid boxes so overall the Arctic received a greater weight in the annual global average than in any single monthly average.

    Does it mean that the Arctic becomes more important in the annual global average when the Arctic is warmer than normal, but not when it is colder and the ships can’t do their work?

  77. MarkW
    Posted Mar 31, 2008 at 2:23 PM | Permalink

    The big drop in temperatures started in Jan/Feb of last year. The last two months are just a continuation of that trend.

  78. Steve McIntyre
    Posted Mar 31, 2008 at 2:27 PM | Permalink

    #66. No one used the word “conspiracy”. The point is that these institutions seem far more alert to errors causing something to go down than to errors that cause something to go up.

    If it was an error this year, then it was also an error last year when a warm El Nino led to warm Jan-Feb 2007. Did these high numbers cause Hadley Center to re-examine their algorithm and see if there was an error? Nope. However, low numbers caused them to promptly identify the error.

    I think that it’s reasonable to say that there is a bias towards much faster identification of downward errors than upward errors. There’s a big difference between “bias” and “conspiracy” – so please don’t use inflammatory terms like “conspiracy” to pick a fight.

  79. James Chamberlain
    Posted Mar 31, 2008 at 2:30 PM | Permalink

    And Matt, #66. There is and was no error. That’s the whole point. There is just “another” way of looking at the data depending on what you want the final product to look like.

  80. JD
    Posted Mar 31, 2008 at 2:37 PM | Permalink

    How about considering Kalman filtering (Linear quadratic estimation, LQE)?

  81. Willis Eschenbach
    Posted Mar 31, 2008 at 2:38 PM | Permalink

    Regarding my last post, Hu, I just took a look at the HadCRUT3 dataset that’s causing all of the fuss.

    Using the method we discussed, I calculated the difference between the truncated (causal) gaussian and the acausal gaussian filters for the entire dataset (less the first and last years to avoid end effects). The average error was 0° ± 0.045°C (1 standard error).

    Thus, using that method, the 95% CI on the last datapoint would be 1.96 * 0.045 = 0.09°C … not much.

    w.

  82. Posted Mar 31, 2008 at 2:57 PM | Permalink

    FWIW (not a lot), we have had a cold March here in the Seattle area, along with a significant amount of late snow. And the snowfall in the mountains here during this winter was much higher than average.

  83. Robinedwards
    Posted Mar 31, 2008 at 4:01 PM | Permalink

    I’m just a bit mystified by the intense interest in details of the techniques that could be (and indeed are) used for “smoothing” data observations. I’m not a fan of smoothing in general, though I know how to do it in simple ways. No-one seems yet to have mentioned non-parametric smoothers, as far as my reading goes. This type of smoothing uses running medians as the estimate of local level, so is not as heavily influenced by real fliers as parametric smoothers. Would such methods be deemed acceptable in the present (climatological) context?

    The above is “by the way”, since I really wanted to ask why smoothing seems to be so important? Are we hypothesising that the actual observations are in error, and need modifying before we can put any faith in them? Normally, we use averages of “repeat” observations to handle data misfortunes. With time series for a single set of observations that is not really an option, so we guess that other observations closely related in time could be used to produce the smoothing effects of averaging. This is fair enough, but as has been pointed out, choosing a method to handle the ends of the time series can give rise to substantial differences of opinion. I’d have surmised that the actual end observations, perhaps two at either end of the series, have the best chance of being decent estimates of the local level, and that an acceptable smoothing technique should make use of that. On this rather simplistic reasoning Hadley Centre seems to be using a lot of imagination in presenting their “smoothed curves” or bar charts.

    Over the last few weeks I’ve commented to several friends on the strange approach of the Hadley Centre to displaying plots of their numerical information. I wondered at first whether I was misreading the plots, but having collected the data (at least, some of it) it was pretty clear that the HC was using artistic licence quite liberally. It is gratifying to see that this subject is now the subject of an active discussion here.

    The methods I favour for examining climate data do not rely on smoothing. On the contrary, I accept all reported values as being the best information available for that series, and I look for major patterns that are affected hardly at all by occasional observational misfortunes. The “grand scale” is I feel the most telling way to handle noisy data generated by inherently chaotic multi-feedback systems (that may be influenced by external “forcings” – to use the jargon terminology). We all think of CO2 concentration, greenhouse gasses in general, solar energy input, solar activity effects such as those espoused by Svensmark, and no doubt many others.

    What climatologists appear to be trying to do is to collect data that might (possibly) contain a signal that relates to these sorts of effects and to prove as quickly as possible, and to their own satisfaction, that their surmised effect fits the observed data better than anyone else’s.

    Strangely, they seem to lose sight of the fact that by smoothing their data they are surely disguising anything that might signal the onset of an abrupt change. This applies to both current and historical climate data. My investigations have convinced me that most climate change takes place abruptly, generally preceded and followed by periods of remarkable stability or only very gradual change. Large scale data – global or hemispherical – do not show very abrupt changes. Such occurrences may be non-contemporaneous on the global scale, smearing out any very striking discontinuity, but they are very obvious for given sites.

    I’ve loads of graphical displays, backed up by more formal statistical analyses, to illustrate abrupt climate change should anyone be interested to see them.

    Robin

  84. SteveSadlov
    Posted Mar 31, 2008 at 4:56 PM | Permalink

    Mat Martinez – AGWist agent provocateur.

  85. Posted Mar 31, 2008 at 5:44 PM | Permalink

    I’ve loads of graphical displays, backed up by more formal statistical analyses, to illustrate abrupt climate change should anyone be interested to see them.

    How abrupt is abrupt?

    If the climate is solar forced, as seems likely, we are seeing a substantial change in solar behavior over a short period, starting August 2007, and simultaneously temperatures start taking a dive, also starting August 2007, which hints that abrupt climate change may have begun in August 2007, and temperatures may continue to fall over the next decade or two.

    How does that agree with your notion of “abrupt”?

  86. cce
    Posted Mar 31, 2008 at 6:45 PM | Permalink

    The AR4 WGI SPM was released February 2nd, 2007. The HadCRU temperature anomaly for January 2007 would have come out weeks later.

  87. Cthulhu
    Posted Mar 31, 2008 at 6:59 PM | Permalink

    I suspect the graph that made them change the method was the following one. It is right on the front of the hadcrut3 page, and the Jan+Feb 2008 updates made it obvious the method was in error when it hadn’t been so obvious before.

    Compare the recent:

    With the one from Aug 2007 (latest wayback machine archived one, wish I could find the Jan or Feb 2008 one):

    Notice they no longer include the current year in the trend.

    After the update for Jan 2008 the method flaw became obvious. The end of the trend sloped down dramatically as the method effectively made every month in 2008 have the same low anomaly as Jan 2008.

    I doubt it was as apparent in 2007 (I didn’t notice it) as the spurious end part of the trend simply continued the prior trend. It wasn’t until the trend bucked that it became obvious how much undue weighting Januaries were receiving.

  88. John Lang
    Posted Mar 31, 2008 at 7:16 PM | Permalink

    Let’s not lose sight of the fact that the cooling over the last year was really driven by the ENSO – La Nina. The ENSO abruptly switched from a moderate El Nino to a moderate La Nina about January 15th, 2007. Global temperatures started dropping by February 2007.

    It should be more than clear now that the ENSO has a larger than expected effect on the global climate. Just look at 1878, 1998 and 2007. Just look at the ENSO switch which occurred around 1976.

    This effect is not built into the climate models at all. Or, at least, the predictive capability of the climate models for ENSO is non-existent.

    On January 7th, 2007, the NOAA predicted that El Nino would continue for at least 3 more months and placed the chance of a La Nina developing within those 3 months at 1.0%. Well, 8 days later, the La Nina was 100.0% in place – and the NOAA is supposed to have the best predictive models for the ENSO.

    So given we are relying on climate models to provide us with the global warming likelihoods and two of the most important factors in the climate are non-predictive in the models, clouds and the ENSO, we should take the global warming predictions with the grain of salt indicated.

    I should also note that La Nina has weakened considerably in the last few months. The most recent sea surface temperature maps now show warmer than average waters at the South American coast. We’ll have to keep watching to see if La Nina returns (since the most recent map shows a slight cooling trend at the coast again versus the past few weeks but the best bet is that La Nina will be over shortly.)

  89. Posted Mar 31, 2008 at 8:11 PM | Permalink

    RE68 Steve McIntyre writes:

    The point is that these institutions seem far more alert to errors causing something to go down than to errors that cause something to go up.

    I would attribute it to “expectation bias” on the part of the data and webpage gatekeepers. Since they are English, I’ll use the tea analogy.

    You are making tea. You put water to boil on the stove, light the fire, and set the teakettle on the burner, see that all is well, and go about your business.

    You look over from your desk, you see the burner going, the kettle is making the pops and creaks as the metal expands due to increasing temperature. All is well, the temperature is rising.

    In two minutes, you begin to hear the chorus of small bubbles forming on the bottom. No need to look over, all is well. The temperature is rising.

    In another minute, you hear bubbles, no need to look to see thin wisps of steam rising from the spout, all is well. The temperature is rising, water should be ready soon.

    30 seconds later, the whistle begins, and you know the heating process (AGW) went perfectly. The water temperature went up as expected.

    But if the burner had gone out just before the whistle, you wouldn’t notice it for some time, until you realized the whistle never came. Then you’d get up from your chair to do something about it. Ah, the burner went out, the water is cold, we’ll move it to another burner that isn’t faulty.

    All is well.

    Expectation bias in temperature rise, the Lipton Tea of climate science.

  90. Geoff Sherrington
    Posted Mar 31, 2008 at 10:54 PM | Permalink

    Re many posts on end effects in smoothed series:

    Is there any disagreement with my contention that all projections into the future are a GUESS and should be labelled as such?

    I cannot understand why you mathematical types argue the merits of different ways to guess. The data ought to be presented in actual form, stopping at the last accepted real observation.

    If others than the bookkeepers of data wish to be clairvoyant or clever with future maths and future data, let them go their hardest – so long as they label it a GUESS and don’t go to the bank with it.

    Did you hear about the clairvoyant with dementia who used to forget his predictions before they actually happened?

  91. Posted Apr 1, 2008 at 12:01 AM | Permalink

    #58

    What is the reason for using anything other than a simple n-period moving average? Such a method is simple, always works, and cannot be misused or misinterpreted.

    #72

    Perhaps someone can correct me, but I think a 7- or 9- year MA would have about the same properties as a 21-point binomial filter, given that the middle 5 points of the latter carry about half the total weight. A 7- or 9-year MA would also reduce endpoint considerations.

    These choices are not very good if there is some specific frequency band that needs to be filtered out,

    ( http://signals.auditblogs.com/files/2008/04/bin_vs_ma.png )

    But, weather noise is not defined anyway, so anything goes. Filter order is of course important; there are fewer endpoint worries with a short filter.
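
    For anyone who wants to see the weight comparison behind this exchange, a small R sketch (an illustration only, not UC’s or the earlier commenter’s code) lays the 21-point binomial weights alongside a 9-point moving average and reports how much weight the central points carry:

    # Illustration: 21-point binomial filter weights vs a 9-point moving average
    binom <- choose(20, 0:20) / 2^20      # same weights as used elsewhere in this thread
    ma9   <- rep(1/9, 9)                  # equal-weight 9-year moving average
    sum(binom[10:12])                     # weight carried by the middle 3 binomial points
    sum(binom[9:13])                      # weight carried by the middle 5 binomial points
    # Overlay the two weight sequences
    plot(-10:10, binom, type = "h", xlab = "lag (years)", ylab = "weight")
    points(-4:4, ma9, col = "red", pch = 19)

    The linked bin_vs_ma.png presumably shows the corresponding frequency responses, which is where the two choices differ most.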

    #80

    How about considering Kalman filtering (Linear quadratic estimation, LQE)?

    Weather noise is not defined, so we can’t write the dynamic model. Maybe climate signal is a random constant, and all we observe is weather noise. I can write a Kalman filter for that case 😉
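
    For the random-constant case described here, the Kalman filter collapses to a short scalar recursion. A minimal sketch (illustration only; the true value and the noise variance are made-up numbers, not estimates):

    # Scalar Kalman filter for a random-constant state observed in white "weather" noise.
    # The values of x_true and R below are arbitrary illustration choices.
    set.seed(1)
    x_true <- 0.4                              # unknown constant "climate signal"
    R      <- 0.1^2                            # observation noise variance
    y      <- x_true + rnorm(100, sd = sqrt(R))

    xhat <- 0; P <- 1                          # vague initial estimate and variance
    est  <- numeric(length(y))
    for (t in seq_along(y)) {
      # no time update: the state is constant, so the prediction is just (xhat, P)
      K    <- P / (P + R)                      # Kalman gain
      xhat <- xhat + K * (y[t] - xhat)
      P    <- (1 - K) * P
      est[t] <- xhat
    }
    plot(y, col = "grey"); lines(est, lwd = 2) # converges to the running mean of y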

    # 66

    Also, why are people reading that the correction has an ulterior motive other than just being a correction? People just love a good conspiracy…

    They still haven’t corrected the s21 figure uncertainties. I guess it takes a few more cold months to realize…

    There are claims of great accuracy. Criticisms are met by ad hoc excuses.

  92. Willis Eschenbach
    Posted Apr 1, 2008 at 12:10 AM | Permalink

    Geoff, you say:

    Is there any disagreement with my contention that all projections into the future are a GUESS and should be labelled as such?

    I suspect no one answered you because we are not talking about projections into the future. Since you are, I can’t figure out what you are talking about.

    We are talking about parsimoniously extending a data smoothing up to the present. The method I advocate incrementally transforms a 100% acausal smoother into a 100% causal smoother.

    If you don’t understand that, or if you couldn’t care less about that, that’s perfectly fine. However, since many of the people on this blog both understand and care about such things as acausal smoothers, you might consider contributing to another online community that cares about and understands things that are important to you.

    My best to you,

    w.
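
    For readers wondering what “incrementally transforms a 100% acausal smoother into a 100% causal smoother” might look like in practice, one way to do something in that spirit is to truncate the centred filter where it runs off the end of the data and renormalize the remaining weights. The sketch below is an illustration of that general idea, not necessarily the exact method Willis advocates:

    # Truncate-and-renormalize smoothing: a full centred 21-point binomial filter in the
    # interior, shrinking to a purely causal (past-and-present) filter at the last point.
    smooth_truncated <- function(x, w = choose(20, 0:20) / 2^20) {
      half <- (length(w) - 1) / 2
      n    <- length(x)
      out  <- numeric(n)
      for (i in seq_len(n)) {
        lo <- max(1, i - half)
        hi <- min(n, i + half)
        wi <- w[(lo - i + half + 1):(hi - i + half + 1)]   # weights that fall on available data
        out[i] <- sum(wi * x[lo:hi]) / sum(wi)             # renormalize so they sum to 1
      }
      out
    }

    Applied to the annual HadCRUT3 column it can be compared directly with the end-padded version discussed later in the thread.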

  93. Marshall
    Posted Apr 1, 2008 at 3:31 AM | Permalink

    Greg Meurer #6

    Considering the recent sudden drop in observed temperature, global cooling can certainly be measured over a long period rather than over a small group of months. Why can’t we determine an average for decades, say 1850 – 2008?

    Does anyone have any insight into what will happen during the midsummer high point? Does anyone believe there will be any sudden melting of glaciers and flooding?

  94. Geoff Sherrington
    Posted Apr 1, 2008 at 4:14 AM | Permalink

    Re # 93 Willis Eschenbach

    Willis, I have no argument with you. Indeed, I am enjoying learning from you. I was reacting to some aspects of endpadding and to comments above that do deal with looking into the future. There are some.

    Heck, I’ve been smoothing data for decades. When we get into a discussion about which is the best method for a particular set of data, we sometimes make a guess as to which method might be most suitable, and we usually find that different choices give different results. Not all the guesses are equal. When the guess is aimed at positioning past data so that the next point sits smoothly, that is (obliquely) guessing about the future.

    My preference is simplistic. The last data point should be actual.

    Our backgrounds might also have us at odds. In my work we treasured anomalous data points as they were often information-rich. In this work the more common treatment is smoothing out the spike to make a trend easier to comprehend.

    Mea culpa for not expressing concisely.

  95. Posted Apr 1, 2008 at 6:17 AM | Permalink

    Marshall, sudden melting and flooding have happened in all ages. It’s very likely to happen this summer.

  96. Mark
    Posted Apr 1, 2008 at 7:13 AM | Permalink

    Interesting how Hadley pretends this is something that just caught their attention. Last year I noted that Hadley used the YTD average for the current year and how this was “distorting” the temperatures early in the year due to the El Nino. As the year progressed this disappeared as temperatures began to drop with the onset of the La Nina. At the time I thought it would be interesting to see what would happen when the cold data for early 2008 was reflected in their graphs. I wasn’t disappointed! When the January report came out in mid February there was no YTD number for 2008, so I sent a note off to them asking why. The response and follow up is shown below. I was not necessarily convinced by the answer but I had no way of checking what had been done for January 2007. Then sure enough in mid March the YTD numbers were initially included and showed a tumultuous drop. I was then very surprised when I referenced the data several weeks later to find this claim of an error and the graphs changed. The fact of the matter is that they were well aware of this methodology all along. This is not a matter of ‘catching an error’. This was driven by politics. The powers that be did not like the picture that was being painted!

    ==============================================================

    Dear Mark,

    I’m sorry if I didn’t make it clear the first time: The January monthly
    estimate is not added to the annual series because it is the same
    number. When the February data are processed, an ‘annual’ average is
    calculated and added to the annual series. As far as I know, this is
    what happened last year.

    Best regards,

    John

    On Mon, 2008-02-25 at 10:06 -0500, Mark Thompson wrote:

    > However, looking at the annual series, there is currently no figure
    > shown for 2008 even though the January 2008 monthly figure is available.
    >
    > John Kennedy wrote:
    >
    >> Dear Mark,
    >>
    >> The practise of including a year-to-date figure continues. However, when
    >> we have only the January data the estimate for the year is the same as
    >> the January monthly figure. A separate estimate is made of the annual
    >> figure when the February data are processed.
    >>
    >> Best regards,
    >>
    >> John Kennedy
    >>
    >> Dataset: hadcrut3
    >> Date/Time: 2008/02/21:18:08
    >> Name: Mark Thompson
    >> E-mail: xxxx@sympatico.ca
    >> Quote?:
    >> Comment: Last year you included a year-to-date figure in your annual
    >> data series/graphs. Why has this practice been stopped as of January
    >> 2008?

    John Kennedy, Climate Monitoring and Research Scientist, Met Office Hadley Centre, FitzRoy Road, Exeter EX1 3PB. Tel: +44 (0)1392 885105 Fax: +44 (0)1392 885681 E-mail: john.kennedy@metoffice.gov.uk http://www.metoffice.gov.uk Global climate data sets are available from http://www.hadobs.org

    ================================================================

  97. kim
    Posted Apr 1, 2008 at 7:42 AM | Permalink

    Mark, thanks for the context for the word ‘recent’ in comment #62. Perhaps the revelation was not so sudden.
    =============================================

  98. Ron Cram
    Posted Apr 1, 2008 at 8:11 AM | Permalink

    re:88

    John Lang,

    You write that the recent cooling is driven by ENSO turning to La Nina. I disagree. While this has played a role, it is really the PDO turning to its cool phase in combination with La Nina. When PDO is in its warm phase, El Nino can be quite warm – such as in 1998. When the PDO is in its cool phase, the La Nina can be quite cool. But the opposite is not true. La Nina will never be this cool when the PDO is in its warm phase.

  99. Ron Cram
    Posted Apr 1, 2008 at 8:12 AM | Permalink

    re: 88

    John Lang,

    I should have pointed out in the comment above that this is important because the PDO will be in its cool phase for the next 30 years or so.

  100. James Chamberlain
    Posted Apr 1, 2008 at 8:23 AM | Permalink

    Mark #96. Do you have the communications from 2007 pointing out their “error”, or the so-called smoking gun? That would make a lovely thread post or blog topic in its own right.

  101. Mark
    Posted Apr 1, 2008 at 8:44 AM | Permalink

    #100

    I didn’t communicate with them in 2007. I only did this when I noticed that they hadn’t included YTD figures in the January 2008 output. However at that time they clearly knew about the practice of using YTD averages for the current year in the annual graphs. Then they went ahead and produced their standard graphs and posted them. It took a couple of weeks but my guess is that someone up the chain of command noticed, didn’t like the picture that was presented and voila, Hadley suddenly comes out with this ‘error’ charade.

  102. Mike C
    Posted Apr 1, 2008 at 8:49 AM | Permalink

    98 & 99 Ron Cram,

    I looked at the PDO and MEI after reading your comment. I understand the decadal phases of PDO but when there is a La Nina the PDO also swings negative, even if for a short period of time. What was most compelling for me was the 1998 El Nino. It sticks out like a sore thumb on both ENSO and temperature records but not in the PDO record. I’m also stumped by the fact that ENSO is very well understood and studied; it is essentially the effect of changing amounts of cold upwelling water at the equator that is spread across the surface by wind. PDO and its causes on the other hand are not understood as is stated on the PDO web page.

  103. Mark T.
    Posted Apr 1, 2008 at 8:50 AM | Permalink

    Weather noise is not defined, so we can’t write the dynamic model. Maybe climate signal is a random constant, and all we observe is weather noise. I can write a Kalman filter for that case

    Indeed, the first step in implementing a Kalman filter is to estimate the noise covariance. Easy enough to do when your noise is thermal noise in a receiver, which is white: sigma^2 * I.

    Mark

  104. Mark
    Posted Apr 1, 2008 at 9:12 AM | Permalink

    It is interesting to note that they not only removed the 2008 YTD data from the smoothed annual series but also from the standard annual series. I guess they were stymied for an excuse to diddle the monthly series, which still shows the precipitous drop they’d rather not have anyone see!

  105. Posted Apr 1, 2008 at 9:47 AM | Permalink

    While this is all generally confusing to me, and I thank all of you who post in terms that we laymen can understand, I am still confused about the “HadCRU GLB: End Effects” chart that Steve has shown above.
    Does Hadley apply the new formula across the board? Or does Hadley apply the new formula only to the months showing cooling (i.e. Jan, Feb 2008)?
    Essentially the graph that Steve highlights only shows a difference at the very end of the plot. Should not the graph have changed significantly across the entire plot?

  106. Ron Cram
    Posted Apr 1, 2008 at 9:57 AM | Permalink

    re: 102

    Mike C,

    You did not provide any links to support your comments so I am not sure what information you are looking at. I presume it is something that looks like the graph found here.

    You are correct that the PDO is not well understood. I believe the “monthly values” measurement contains a great deal of noise. The important signal is found in the longer term “regime shifts.” See the Wikipedia article here.

    It takes four or five years for the PDO to shift from one regime to another. The Bratcher and Giese paper was written in 2002 and based on their observations they predicted the PDO would change regimes in “about four years.” It did not actually seem to shift until late 2007.

    So while monthly values may fluctuate, I do not expect you will see any exceptionally warm El Ninos for the next 30 years or so. The exceptionally strong and cool La Ninas, which we have not seen in recent decades, will dominate this cool PDO regime.

  107. Mark
    Posted Apr 1, 2008 at 10:11 AM | Permalink

    #105

    I believe the graph is produced by smoothing the annual averages. The issue was that 2008 was also included using a year-to-date average, which comes out really low at the current time due to the cold temperatures experienced in January and February. Hadley didn’t like the picture this presented, so they’ve simply decided to truncate the annual graphs as of the end of 2007, and they probably now won’t get updated (with 2008 data included) until 2008 is behind us (unless of course temperatures rocket up for the rest of the year!). The annual data for prior years are thus unaffected, although the smoothed curve as previously displayed did change pre-2008, since its calculation had incorporated the now excised 2008 YTD data.

  108. Sam Urbinto
    Posted Apr 1, 2008 at 10:12 AM | Permalink

    Last year looked “normal” and this one didn’t. Who knew what and when is rather immaterial, as long as the practice gets fixed. Hopefully now that everyone knows the auditing is going on, they (the climate folks) will fix problems themselves before somebody else exposes them. A good thing, I think.

  109. pk
    Posted Apr 1, 2008 at 10:14 AM | Permalink

    Am I missing something? Why does January/February come out so low when we’re talking about global temps?

  110. pk
    Posted Apr 1, 2008 at 10:21 AM | Permalink

    Nevermind…having a blonde moment.

  111. aurbo
    Posted Apr 1, 2008 at 10:31 AM | Permalink

    On Smoothing:

    Having been in the business of generating temporal series of daily, monthly and annual data graphically for over 50 years, I have done a lot of experimenting to find a method that produces consistent and informative results without totally destroying the significance of any particular data point. I found that if one is not too eager to smooth data right up to the most recent observation, a simple MA was acceptable, with a period long enough to accomplish the smoothing but short enough to provide an operationally useful end-point.
    An important secondary consideration is that in data showing a measurable cyclical characteristic, the period of smoothing should not be equivalent to a fundamental or low order harmonic frequency of such a cycle.

    For most observational weather parameters, I found that 5 units (days, months or years) is a reasonable period that accommodates most of the above considerations. This means that the end-points of the smoothed curve are only 3 units away from the most recent data point. Over time, it became apparent that a 5-unit linear (equal-weighted) smoothing often fails to appropriately capture the appearance, if not the significance, of abrupt discontinuities in a timely way. So I modified the smoothing algorithm to a quasi-logarithmic one in which the smoothing period was still 5 units, but the data points were weighted by a 1-3-5-3-1 multiplier centered on each data point sequentially. This solved the problem of bringing more attention to any significant inflections in the curves and also had the salubrious effect (in the days before cheap calculators, much less portable computers, were widely available) that in summing the 5 adjusted data points, one merely had to move the decimal point one place to the left to obtain the value of the point to be plotted.

    Additionally, if one felt it necessary to extend smoothing up to the current data point, the point could be plotted with 70% (1+3+5)/10 of the weight that this point would ultimately receive when the missing two data points became available. Those who are willing to accept forecasts for the unobserved (future) points could do so without being terribly embarrassed by the magnitude of the adjustments needed to correct the originally plotted curve when the real data became available.

    Finally, like most other people who deal with weather data operationally, I noted with some disdain Hadley’s inclusion of the warm Jan-Feb in 2007 in their bogus annual averages last year and looked forward to seeing what they would do if a subsequent Jan-Feb combination came in cold. Therefore, I wasn’t surprised in the least when they “discovered” the bias error this year. How convenient!

  112. aurbo
    Posted Apr 1, 2008 at 10:44 AM | Permalink

    Important correction to the boo-boo in my previous post (#111) above.

    The quasi-logarithmic 5-unit smoothing weighting should have been 1+2+4+2+1 and the weighting for estimating 70% of the weight of the final end point in the 4th paragraph should have read (1+2+4)/10.

    My bad.
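
    For completeness, the corrected 1-2-4-2-1 smoother is easy to write down in R (a sketch of the description above, not aurbo’s own code); the weights sum to 10, which is what made the hand calculation a matter of shifting the decimal point, and the first three weights supply 70% of the full weight at the end point:

    # 5-point 1-2-4-2-1 smoother as described above (illustration)
    w5 <- c(1, 2, 4, 2, 1) / 10
    smooth5 <- function(x) as.numeric(stats::filter(x, w5, sides = 2))
    sum(c(1, 2, 4)) / 10   # fraction of the full weight available at the last observed point
    # Example on the annual HadCRUT3 series used elsewhere in the thread:
    # had <- read.table("http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/annual")
    # plot(had[, 1], had[, 2], type = "l"); lines(had[, 1], smooth5(had[, 2]), lwd = 2)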

  113. Mark
    Posted Apr 1, 2008 at 11:03 AM | Permalink

    #108

    So what exactly is “normal”? Hadley apparently defines it as “as long as the information as presented supports global warming alarmism”. As Steve pointed out they had no interest in addressing this matter until it ran counter to their interests. Yes, Hadley appears to be very good at “fixing” things!

  114. Mike C
    Posted Apr 1, 2008 at 11:39 AM | Permalink

    106 Ron Cram,

    Fortunately I know enough about Ron Cram to know that Ron Cram knows what I’m talking about, as evidenced by the link you provided. If you look at the maps of the cool and warm phases you might notice that they are generally smoothed versions of Pacific temperatures during ENSO events. The reason that PDO advocates cannot identify the cause of shifts in PDO phases is probably because they view ENSO as being caused by PDO rather than the other way around. ENSO is a function of thermocline slope, which is a function of winds, which is a function of several factors, particularly the MJO, the Pacific gyre and surface temperature in the southern Pacific, the latter being a function of solar variability. Solar activity will have to remain low for Pacific SSTs to come down a little more to get back to the strong and long-lasting La Ninas, similar to the mid 20th century cooling period.

  115. Sam Urbinto
    Posted Apr 1, 2008 at 12:08 PM | Permalink

    Mark: “Normal” means matching the earlier data fairly well – a trendline that has been going up keeps going up, rather than suddenly switching to going down.

    🙂

    Anyway, what’s funny is that all the folks obsessing over counting only a month or a year as not being kosher (aside from the assumed correlated and accurate relationship between the anomaly trend and temperature in the first place) overlook that the climate people themselves are obviously concerned about the months.

    So it’s okay for Hadley or GISS to speak in terms of months, and track months, and get their first anomaly mean over the course of a month, and speak of “warmest months” and “hottest years” etc but not for anyone doing the opposite. Typical.

    It’s an anomaly trend, and it goes up and down. Get over it. 🙂

  116. bill-tb
    Posted Apr 1, 2008 at 12:16 PM | Permalink

    Steve, I agree with not taking too much from this, but I ask a simple question: what would be the story if the sign of the error were reversed?

  117. Mark
    Posted Apr 1, 2008 at 12:43 PM | Permalink

    #116 – Steve can of course answer for himself but here’s my two cents worth:

    The sign WAS reversed in early 2007 with the El Nino spike. In that case there WAS NO STORY because it was accepted that using a YTD average in deriving the annual graphs was standard practice at Hadley. Anyone in the know of course realized that this temporarily over emphasized temperatures for the current year (and earlier years for the smoothed series).

    If you wanted to mirror what has just happened in a hypothetical situation based on the transition from 2006 into 2007 it would have been:

    – Prior practice would have been to NOT include YTD numbers in the calculation/display of the annual series (smoothed and unadjusted).
    – Then starting at the beginning of 2007 Hadley would have changed their method and included a current YTD figure in the graphs.
    – This would have caused the curves on the annual graphs to suddenly spike up at the end.
    – What Hadley has actually done now is no better than the hypothetical situation just described, except that with the hypothetical situation you couldn’t try to pass it off as an “error”

  118. UK John
    Posted Apr 1, 2008 at 1:01 PM | Permalink

    Phil is quite right: this is just a correction, nothing more, nothing less. I cannot believe there was any motive behind this. I am a firm believer in the cock-up theory – you have to be clever to conspire.

    It is a correction of a statistical algorithm so poor that only two months of unexpected, against-the-trend data can destroy the whole time series.

    Now that is not clever!

  119. Mark
    Posted Apr 1, 2008 at 1:19 PM | Permalink

    #115 While on the smoothed curve it might have looked like a continuation of a normal trend this wouldn’t have been the case with the unsmoothed curve. I’ve taken the annual data from 2001 on and then added in the 2007 number as it would have been calculated as of the end of February 2007:

    Tell me how they didn’t notice this “anomaly”!!!???

    Have no doubt they knew EXACTLY what was going on!

  120. Willis Eschenbach
    Posted Apr 1, 2008 at 2:38 PM | Permalink

    Geoff, many thanks for your reply. You say:

    Re # 93 Willis Eschenbach

    Willis, I have no argument with you. Indeed, I am enjoying learning from you. I was reacting to some aspects of endpadding and to comments above that do deal with looking into the future. There are some.

    Heck, I’ve been smoothing data for decades. When we get into a discussion about which is the best method for a particular set of data, we sometimes make a guess as to which method might be most suitable, and we usually find that different choices give different results. Not all the guesses are equal. When the guess is aimed at positioning past data so that the next point sits smoothly, that is (obliquely) guessing about the future.

    My preference is simplistic. The last data point should be actual.

    Our backgrounds might also have us at odds. In my work we treasured anomalous data points as they were often information-rich. In this work the more common treatment is smoothing out the spike to make a trend easier to comprehend.

    Mea culpa for not expressing concisely.

    Geoff, rather than guess which smoothing method is most suitable, it is quite possible to test each candidate smoothing method to determine their average final point errors over the dataset. That lets you select the one which does the best job of estimating the final end point.

    Regards,

    w.
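
    A sketch of the kind of test Willis describes (an illustration of the idea, not his code): for each year t, compute the end-point value a candidate smoother would have reported with data only up to t, and compare it with the value the same smoother assigns to year t once later data are in. One candidate, the endpoint-padded 21-point binomial filter discussed in this thread, is included to keep the sketch self-contained:

    # End-point revision test (illustration). "burn = 30" just skips start-up effects;
    # years within 10 of the end are excluded so the full-data value is no longer changing.
    w21 <- choose(20, 0:20) / 2^20
    endpad_smooth <- function(x) {                       # candidate: pad ends with the boundary value
      xp <- c(rep(x[1], 10), x, rep(x[length(x)], 10))
      as.numeric(stats::filter(xp, w21, sides = 2))[11:(length(x) + 10)]
    }
    endpoint_error <- function(x, smoother, burn = 30) {
      full <- smoother(x)                                # smooth of the complete series
      n    <- length(x)
      errs <- sapply(burn:(n - 11), function(t) abs(smoother(x[1:t])[t] - full[t]))
      mean(errs)                                         # average absolute end-point revision
    }
    # endpoint_error(had[, "cru"], endpad_smooth)        # using the 'had' table read elsewhere in the thread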

  121. Spence_UK
    Posted Apr 1, 2008 at 3:30 PM | Permalink

    Re #91 (following on from #80), UC:

    If we can’t specify the dynamic model for a Kalman filter, how about the sequential Monte Carlo method (particle filter)?

    Of course, we can only determine whether it is any good after seeing the results 😉

    FWIW I tend to agree with Leif (and others who expressed the view) that it is better to truncate the smoothed series. This gives a clear, concise line with no added complication for the reader to understand, and fewer degrees of freedom for tuning the graphic.

  122. Sam Urbinto
    Posted Apr 1, 2008 at 4:28 PM | Permalink

    I’ll get worried when the anomaly decides to go over +/- 1 C. So what if the anomaly in 2007 spiked to .575 or not?

    Did the preliminary number for 2007 ever show up like that? If not, why would they have the 2008 preliminary number show up like it did?

    Something still seems wrong. Who knew what, when? I don’t think it matters, but still, interesting. And curious.

  123. bill-tb
    Posted Apr 1, 2008 at 4:54 PM | Permalink

    Curious that all the errors seem to go the wrong way for AGW team, isn’t it?

  124. Sam Urbinto
    Posted Apr 1, 2008 at 4:59 PM | Permalink

    It’s what’s expected. I think it’s the measurements, methods, and averaging just happening to be higher since about 1980. Certainly historically it’s been up. What it means, eh.

  125. Neil Fisher
    Posted Apr 1, 2008 at 8:53 PM | Permalink

    Curious that all the errors seem to go the wrong way for AGW team, isn’t it?

    Which only ever seems to mean that “it’s worse than we thought!”. ;-(

  126. Geoff Sherrington
    Posted Apr 1, 2008 at 9:28 PM | Permalink

    Re # 120 Willis Eschenbach

    Thank you for the courtesy in your reply to my poor expression.

    Steve’s second graph, HadCRU GLB: End Effects, shows a red line resulting from a projection into the future that could have been made in 2007. He labelled it as questionable. My small plea was to keep up such labelling.

    Again obliquely, I failed to specify “future”. I did not just mean the days following today. I meant to include the time following a data point in the past. In this sense, smoothing/filtering can be a projection from the past into the future and different methods do give different results.

    It is a given that improvements to bring calculated data closer to observed are to be congratulated. The more that method can replace guessing, the better we all are. I was not having a go at you.

    OT if I may – In 1978 or so we flew an aircraft for half a million km, recording some 300 observations a second. We sought by software to construct an automatic positive-anomaly filter/detector for on-line data. I have not revisited this for a decade. Can you point me to a recent reference that appeals to you as a specialist? It’s not vital, just a lingering question. Wikis have limits.

  127. hswiseman
    Posted Apr 1, 2008 at 9:55 PM | Permalink

    It will be interesting to see how the Cryosphere folks react when a really late Hudson’s Bay ice out messes with the anomaly big time. I predict some crying… and a whole lotta smoothing.

  128. Phil.
    Posted Apr 1, 2008 at 10:16 PM | Permalink

    Re #119

    Mark, where did you get the 2007 figure from? It doesn’t match the data I’ve seen; early in 2007 the monthly data showed 0.507.

  129. Ron Cram
    Posted Apr 1, 2008 at 10:25 PM | Permalink

    re: 114

    Mike C,

    I am not sure what is meant by “PDO advocates” but evidently I am suspect. I do not view the PDO as “causing” ENSO, but I do view the PDO as more important than ENSO for long-term climate prediction. Although interactions between PDO and ENSO occur, I also do not think ENSO “causes” the PDO – as you seem to think. PDO regime changes are not well understood, but it is highly likely the PDO is very sensitive to changes in solar activity over time. Other factors may also be involved that no one has yet identified.

    ENSO gets a great deal of publicity because the changes from year to year are memorable. News readers/viewers have a context in which to understand changes in ENSO. Changes in the PDO are not memorable, and a great many news readers would not remember the last time the PDO was in its cool phase.

  130. Willis Eschenbach
    Posted Apr 1, 2008 at 11:39 PM | Permalink

    Steve M., I was able to replicate your results exactly regarding the difference between the monthly and the annual data … very strange.

    However, I haven’t found anything to explain the difference. It’s not the removal of the anomalies, I checked that. It’s not that some months have different numbers of days, checked that as well.

    I’m in mystery …

    w.

  131. Geoff Sherrington
    Posted Apr 2, 2008 at 12:12 AM | Permalink

    Re 139 Willis Eschenbach

    Do I recall that the reference period for calculating anomalies was changed for monthly as opposed to annual at some stage? But that would not give the observed pattern, I think.

  132. jeez
    Posted Apr 2, 2008 at 12:21 AM | Permalink

    Am I the only one amazed that we have global temperature measurements from 1861 perceived as accurate to three decimal places?

    Yes I know we are talking about anomalies, but you have to have numbers from which to subtract the mean.

    Wasn’t spurious accuracy covered in these people’s basic arithmetic classes?

    They would be lucky if measurements from 1861 from a single station are accurate to half a degree.

    Begin law of large numbers discussion and debunking, or just link to appropriate thread on CA.

  133. JD
    Posted Apr 2, 2008 at 12:31 AM | Permalink

    Weather noise is not defined, so we can’t write the dynamic model. Maybe climate signal is a random constant, and all we observe is weather noise. I can write a Kalman filter for that case

    Indeed, the first step in implementing a Kalman filter is to estimate the noise covariance. Easy enough to do when your noise is thermal noise in a receiver, which is white: sigma^2 * I.

    Whilst I agree that weather noise is not defined, I was under the impression that red (brown) noise, possibly modified, was a reasonable model. It is best to truncate or show that there are reservations about the ends of the curves. However, if this region is important, and it clearly is judging by the amount of interest here, then why not use Kalman filtering?

  134. Posted Apr 2, 2008 at 1:14 AM | Permalink

    The correction was probably done because I sent the following e-mail to the guy who maintains the pages a couple of weeks ago:

    Dear mr. Kennedy

    I was looking at the temperature record graphs here:

    http://hadobs.metoffice.com/hadcrut3/diagnostics/global/simple_average/

    I noticed that the year 2008 is already drawn to the annual graphs even though data are only available for january. This makes the first curve to look like that the whole global warming has stopped because of cold january 2008 which obviously isn’t the case.

    I think it would be best to either weight every month separately and make a graph with monthly records instead of annual records or leave the year 2008 out of annual graphs until the dataset is available for the whole year. Now the cold januarly makes the whole year 2008 to look like coldest for a long time even though we don’t know yet whether it’s going to be record warm or moderately cold this year.

    Best regards

    Tuukka Simonen
    Project planning officer
    Information systems science
    Turku School of Economics, Finland

    You can see the similarities with the correction notification:

    We have recently corrected an error in the way that the smoothed time series of data were calculated. Data for 2008 were being used in the smoothing process as if they represented an accurate estimate of the year as a whole. This is not the case and owing to the unusually cool global average temperature in January 2008, the error made it look as though smoothed global average temperatures had dropped markedly in recent years, which is misleading.

    What do you think?

  135. Jeff Norman
    Posted Apr 2, 2008 at 2:35 AM | Permalink

    Steve,

    In your original post you say:

    Merely from looking at the monthly temperature histories, I urge readers not to draw any particular conclusions from a couple of cold months. The monthly history has many such cold downspikes and recoveries tend to be quite rapid.

    Okay, but where did the heat energy go? And if the recovery from the downspike is quite rapid then where would the heat energy come from?

    If you assume hot air is hiding somewhere then you are forced to concede that the temperature record is flawed or certainly has a much larger uncertainty associated with it than normally gets attributed.

    If you assume the heat is hiding out somewhere in the atmosphere as water vapour then you have to concede again that, as Roger Pielke sr. has said, measured temperature is not a good indicator of atmospheric heat content and therefore a poor indicator of global warming.

    If you assume that the heat energy has been lost to space then you have to concede the possibility of the Lindzen iris effect and thereby cast out the current warming hypothesis.

    If as you suggest the heat energy quickly recovers then someone has to explain where it came from and how it got there because a rapid increase in temperature is not really consistent with AGW by GHG.

    It’s like the 1998 El Nino. Where did the heat come from and where did it subsequently go? These questions have never really been addressed AFAIK.

    And don’t say the oceans because I have yet to hear/see an explanation of a joule of how heat energy in the atmosphere gets carried back to the surface and hidden away in either the oceans or the land surfaces.

  136. Steve McIntyre
    Posted Apr 2, 2008 at 6:07 AM | Permalink

    #135. Jeff, the questions are valid and puzzling, but there have certainly been other downspikes from which there were rapid returns.

    The question that I take from this – and it’s similar to yours – is this: the present downspike is presumably “unforced” in GCM terms, in the sense that El Nino-La Nina cycles are or should be generated within a GCM and are not themselves “independent” variables at the level of solar, volcanic, GHG, aerosol. If you can have an unforced short-term fluctuation of a half-degree or 3/4 degree on a global scale, then why can’t you have one on a decadal scale as well? I know that people will arm-wave and say that it’s climate and not weather, but it’s not hard for me to imagine a 1/f distribution for Nino-Ninas that could generate decadal unforced fluctuations of the same scale as the present cold snap.

    If there were 1/f noise, as Cohn and Lins among others argued, then this would have a considerable impact on the level of significance of the trend over the past 30 years.
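
    To illustrate the sort of behaviour described in this comment, here is a small simulation sketch (entirely illustrative, with arbitrary parameter choices; not a claim about the real series) of strongly persistent, 1/f-like noise producing unforced decadal excursions:

    # Long-memory ("1/f"-like) noise wandering on decadal scales with no forcing at all.
    # Requires the fracdiff package; d = 0.45 and the 0.15 scaling are arbitrary choices.
    library(fracdiff)
    set.seed(2)
    sim <- fracdiff.sim(150, d = 0.45)$series
    sim <- 0.15 * sim / sd(sim)                 # rescale to a plausible annual-anomaly spread
    plot(1850:1999, sim, type = "l", xlab = "year", ylab = "simulated anomaly (deg C)")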

  137. Patrick Henry
    Posted Apr 2, 2008 at 6:27 AM | Permalink

    The Met Office says that Canada is warming fast: “the coldest days are now up to 4 °C warmer than they were in the middle of the 20th Century”.
    http://www.metoffice.gov.uk/corporate/pressoffice/2008/pr20080326.html

    Maybe Ottawa should invite them to come over and help clear the snow.

  138. David Smith
    Posted Apr 2, 2008 at 6:34 AM | Permalink

    Re #135. Jeff, during a La Nina event the heat basically accumulates in the western tropical Pacific, beneath the surface. Being piled up like that, the warm water loses contact with the atmosphere and is unable to release its heat.

    Then, when the anomalous winds which caused the pile-up weaken, the water is able to spread eastward across the surface of the ocean. It then has greater contact with the atmosphere and is able to release its heat.

    A good website to visit is TAO Warm Water Volume, which shows accumulation and dissipation of warm water in the tropical Pacific. Check the anomaly plots.

    I have some interesting graphs of warm pool behavior but they’re at home, and I’m not. There are also some good links which show the ocean profile but those aren’t handy.

    Steve M’s question is key and I’d add something to it: can there be multidecadal periods in which warm water that piles up in the western Pacific mixes with cool water beneath it, thus effectively dissipating part of the accumulated warmth? The heat doesn’t disappear, it instead just slightly warms the deeper water. But, as far as we on the surface are concerned, we have a multidecadal cool spell.

  139. Mark
    Posted Apr 2, 2008 at 7:03 AM | Permalink

    #128 – I used the January/February average from the HADCRUT3 global average series:

    http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/monthly

  140. Mark
    Posted Apr 2, 2008 at 7:07 AM | Permalink

    #134 Did you get a reply?

  141. David Smith
    Posted Apr 2, 2008 at 7:56 AM | Permalink

    Re #135. The current upper-ocean temperature anomaly profile is here. This is a cross-section along the equator – colors represent temperature anomalies while the gray is land.

    It shows the current pile-up of warm water in the western Pacific. This pile is overlaid to some extent by wind-driven cool water from the eastern Pacific. At some point the winds will weaken (already happening) and the warm pile will tend to rise and spread eastward across the surface, releasing heat to the atmosphere.

    $64,000 questions include whether, and to what extent, the pile of warm water mixes with cooler deep water. And, does the behavior change over time? And, might any mixed water eventually move to a region where radiative loss is better than it is in the Warm Pool region?

  142. Ron Cram
    Posted Apr 2, 2008 at 8:09 AM | Permalink

    re: 134

    Tuukka,

    You certainly noticed the issue. No doubt you wish they had credited you for finding the error and you probably should have been named.

    Why do you think the error was not noticed when January 2007 was unusually warm?

  143. Phil.
    Posted Apr 2, 2008 at 8:17 AM | Permalink

    Re #140

    OK, that explains it: the currently listed data is different from that used in 2007, which was 0.611 and 0.506.

    Re #143
    Ron, in Jan 2007 the warm anomaly was continuing a warming trend so it doesn’t result in a change of slope, whereas in 2008 it was a sharp change in sign which attracts attention (there’s no reason to suppose that Tuukka was the only one who commented on it).

  144. Mike C
    Posted Apr 2, 2008 at 8:48 AM | Permalink

    129 Ron Cram,
    That the PDO is so poorly understood is the point. ENSO changes weather patterns all over the globe. When pressure centers change in location they create changes in surface winds, which create changes in ocean currents, which create changes in the volume of cold upwelling ocean water (not just cold upwelling in the equatorial Pacific but all over the Pacific). The question of why ENSO events follow a roughly 30-year cycle has not yet been answered.

    136 Steve,
    That is a major area where the models fail. They look at the surface as being (sort of) a constant temperature that is changed by forces from above (volcanic aerosols cool, solar heats and etc). What they do not consider is the cool upwelling from below, which essentially changes the assumption that the surface temperature is constant.

  145. Mark
    Posted Apr 2, 2008 at 8:53 AM | Permalink

    #144: As I stated in #119 the not changing in slope argument is not valid since we’re not just talking about the smoothed curve. When you reference the Hadley graphs there are 3 series on the one web page: smoothed annual, annual (unsmoothed) and monthly. Certainly the sudden jump in the annual (unsmoothed) curve would be noticeable as I illustrated. Also Hadcrut3 is not the only data series which uses this standard set of charts. These graphs are also produced for multiple geographic regions for a variety of datasets. Probably the most noticeable jump as per the February 2007 results would have been in the Northern Hemisphere land temperatures (CRUTEM3) where the annual (unsmoothed) graph would have been based on these numbers:

    2001 0.779
    2002 0.854
    2003 0.801
    2004 0.819
    2005 0.923
    2006 0.881
    2007 1.598 (average of Jan07/Feb07 monthly data).

    Any argument that Hadley didn’t know all along that their methods created distortions in their graphs does not have any sound basis. Again, they had no problem with that when it worked in favour of their beliefs. When the shoe was moved to the other foot – well we’ve just witnessed the reaction.

  146. Posted Apr 2, 2008 at 9:03 AM | Permalink

    Re #134, 143, 144 etc, so CRU admits the old formula (endpad with the average of the last year even if it only has two months of data) was wrong, but what is the new formula? Will they discard 2008 and endpad with 2007 until 2008 is complete so that the series will only change once a year? Are they still endpadding?

  147. Mark
    Posted Apr 2, 2008 at 9:09 AM | Permalink

    #145 I agree with the notion that upwelling of cold water is a key factor in what drives planetary climate. There is a very poor understanding at this point as to the underlying mechanisms and related data. However I did find one item from McPhaden and Zhang which dealt with changes in upwelling volumes off the coast of South America associated with the Great Pacific Climate Shift in the late 70’s. It allowed me to do a sample calculation as to how powerful a factor this could be relative to other items incorporated into IPCC analyses:

    Calculated impact of reduced deep upwelling off the South American Pacific coast:

    47 Sverdrups (from McPhaden and Zhang)
    – 35 Sverdrups
    = 12 Sverdrups difference
    x 264,000,000 US gallons/second per Sverdrup
    x 3.785411 litres/US gallon
    x 16 degrees C temperature differential (zero at depth, 16 C at surface; 1 kcal warms 1 litre of water by 1 C) – see note at end
    = 191,874,912,768 kcal/second
    x 4,184 joules/kcal
    = 802,804,635,021,312 watts

    divided by

    510,064,471,909,788 square metres of Earth surface area (radius 6,371 km)

    = 1.57 watts/square metre

    I used a 16 degree differential as that got me to a number similar to the IPCC claims on net AGW impacts. It really depends on where the offsetting “downwelling” occurs. However, given that average ocean surface temperatures are just over 12 degrees C and equatorial sea surface temperatures off of South America can exceed 30 degrees C, an actual figure of 16 degrees C is certainly plausible.
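
    As a quick check of the arithmetic above (same quoted inputs; the conversion assumes that 1 kcal warms 1 litre of water by 1 degree C and that the flow figures are per second, since a Sverdrup is a flow rate):

    # Back-of-envelope check of the figures quoted above
    dSv    <- 47 - 35                        # reduction in upwelling, Sverdrups
    litres <- dSv * 264e6 * 3.785411         # litres per second (1 Sv ~ 264 million US gallons/s)
    kcal   <- litres * 16                    # kcal per second for a 16 deg C differential
    watts  <- kcal * 4184                    # joules per second
    watts / (4 * pi * (6371e3)^2)            # W per square metre of Earth surface, about 1.57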

  148. aurbo
    Posted Apr 2, 2008 at 9:11 AM | Permalink

    Re some questions above on why the global temperature anomalies do not appear to be conserved on a monthly or inter-annual basis: it may be that there is some confusion or conflation of energy in general, whether it be thermal (observed as temperature) or latent, potential or kinetic (measured by other physical parameters).
    My woefully incomplete understanding of the complexity of thermodynamics notwithstanding, conservation of energy does not mean conservation of temperature.

    For example, as for the month to month imbalance in total global temperature anomalies, one element to consider is the monthly or annual variation in Global water vapor. That is, the same amount of thermal energy may be present in the atmosphere even if the sensible heat (temperature) is somewhat different. It’s not called latent heat for nothing.

    Consider the countervailing effects that take place when, for example, large areas of the Arctic Ocean are free of ice which occurs in many summers, especially in the late stages of a presumably cyclical multi-year warm regime. Heat that would be trapped in the water under an ice-cover is freed to move into the atmosphere in the form of latent heat contained in the water vapor evaporated from the exposed liquid sea surface. This latent heat will be subsequently converted into sensible heat in the atmosphere when condensation or sublimation takes place. At the same time, the process of evaporation extracts heat from the adjacent sea/atmosphere interface which results in cooling of the sea surface. The saturation water vapor which limits the rate of evaporation is totally dependent upon the surface temperature of the Ocean, and evaporation will continue until the atmosphere in contact with the surface becomes saturated. Ocean heat energy observed as water temperature then becomes “hidden” as latent heat in the water vapor which is carried away into the atmosphere.

    When the sun-angle in the Arctic is very low and the water is exposed in ice-free areas (typical of late summer/early autumn) the balance of outgoing and incoming thermal radiation will shift to a net negative (outgoing). So, ironically, the greater the area of exposed water in the Arctic Ocean, the larger the amount of heat lost from the Ocean whether by conduction at the immediate interface, by evaporation into the atmosphere as latent heat of the water vapor, or radiated out into space. I suspect this is one component of an iris effect which involves sea ice rather than overhead cloudiness.

    All of the above is exclusive of the processes occurring at the ocean/ice interface under an ice-cover. The SST is essentially isothermal where melting and/or refreezing is taking place and until the ice-cover becomes very thin, solar heating of the ocean surface layers is minimal.

    The bottom line is that these ocean/atmosphere/solar processes can be very complex and at times, perhaps, counterintuitive.

  149. Mark
    Posted Apr 2, 2008 at 9:11 AM | Permalink

    #146 If you look at the data files for the Hadley annual graphs you’ll see that they no longer include current year-to-date data, hence the annual graphs will likely now only change (i.e. be extended by a year) once a year, in January.

  150. Phil.
    Posted Apr 2, 2008 at 9:24 AM | Permalink

    Re #145

    Certainly the sudden jump in the annual (unsmoothed) curve would be noticeable as I illustrated.

    Not as much as you think. Firstly, the ‘jump’ wasn’t as big in the contemporaneous data as you show; secondly, as you show, 2007 just appears as a continuation of the previous warming in 2004 and 2005 with a hiatus in 2006. It isn’t that remarkable.

  151. Steve McIntyre
    Posted Apr 2, 2008 at 9:36 AM | Permalink

    IPCC chapter 3 Appendix 3A, with many Hadley authors, says that for its endpoint padding it reflected the time series at the boundary, a frequent padding alternative.

    … which effectively reflects the time series about the boundary. If there is a trend, it will be conservative in the sense that this method will underestimate the anomalies at the end.

    Interesting that Hadley Center publications for the public do not follow the “more conservative” practice adopted by IPCC.

  152. Posted Apr 2, 2008 at 9:43 AM | Permalink

    Hu,

    Are they still endpadding?

    Yes. s21 is currently padded with 2007 values http://www.climateaudit.org/?p=2865#comment-229074

  153. Phil.
    Posted Apr 2, 2008 at 9:52 AM | Permalink

    Re #152

    Yes. s21 is currently padded with 2007 values

    Not in their datafiles.

  154. Mark
    Posted Apr 2, 2008 at 10:00 AM | Permalink

    #150 Not very noticeable???

  155. Posted Apr 2, 2008 at 10:45 AM | Permalink

    Not in their datafiles.

    Do you have a better explanation for the 0.420 in http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/annual_s21, last row, second column? (Downloaded today 16:38 UTC, no daylight saving.)

  156. Posted Apr 2, 2008 at 10:54 AM | Permalink

    Steve (#151) writes,

    IPCC chapter 3 Appendix 3A, with many Hadley authors, says that for its endpoint padding , it reflected the time series at the boundary, a frequent padding alternative.

    … which effectively reflects the time series about the boundary. If there is a trend, it will be conservative in the sense that this method will underestimate the anomalies at the end.

    Interesting that Hadley Center publications for the public do not follow the “more conservative” practice adopted by IPCC.

    But I thought you and Willis established back in the 6/9/07 thread on “Mannomatic Smoothing and Pinned Endpoints” that reflecting about the boundary, as advocated by Mann (2004) and evidently adopted by IPCC, amounts to endpegging and therefore is much “less conservative” than endpadding with the last value, as apparently done by CRU.

    Of course, endpadding with the last value still gives it inordinate importance. Short of extrapolating the final trend, it would make more sense to truncate the filter and renormalize as discussed above. And as I discussed on the earlier thread, there are much better ways to extrapolate the final trend than endpegging.

    Steve:
    Hu, there are two different sorts of reflection. “Standard” reflection is to reflect along the x-axis, so that the value in year N+10 is equal to the value in year N-10. This is an expedient often used in algorithms. Mannian reflection is a double reflection – along the x-axis and along the y-axis centered on the endpoint value. If this were done in the HAdCRU case, they’d more or less be doubling up the Jan-Feb downspike.
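
    A small sketch of the padding alternatives being discussed in this exchange, following the descriptions above (an illustration of those descriptions, not code from IPCC, CRU or Mann): padding with the end value, “standard” reflection about the boundary, and the Mannian double reflection about the end point.

    # Three padding schemes as described above (illustration only)
    pad_series <- function(x, k = 10, method = c("endpoint", "reflect", "mannian")) {
      method <- match.arg(method)
      n <- length(x)
      head_pad <- switch(method,
        endpoint = rep(x[1], k),                   # repeat the first value
        reflect  = x[(k + 1):2],                   # reflect in time about the start
        mannian  = 2 * x[1] - x[(k + 1):2])        # reflect in both time and value
      tail_pad <- switch(method,
        endpoint = rep(x[n], k),                   # repeat the last value
        reflect  = x[(n - 1):(n - k)],             # reflect in time about the end
        mannian  = 2 * x[n] - x[(n - 1):(n - k)])  # reflect in both time and value
      c(head_pad, x, tail_pad)
    }
    # The padded series can then be run through the same 21-point binomial filter
    # used elsewhere in this thread, keeping only positions 11:(n + 10).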

  157. Phil.
    Posted Apr 2, 2008 at 11:34 AM | Permalink

    Re #155

    You have better explanation for 0.420 in http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/annual_s21 , last row, second column ? (Downloaded today 16:38 UTC, (no daylight saving ))

    Yes it’s an average of the actual data taken in 2007!

    Re #156

    Hu, the reflection you refer to just preserves the gradient at the end; it’s used in some versions of cubic spline fitting, for example.

  158. Posted Apr 2, 2008 at 11:53 AM | Permalink

    Yes it’s an average of the actual data taken in 2007!

    and

    http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/annual

    2007 0.403 is ?

  159. Steve McIntyre
    Posted Apr 2, 2008 at 12:03 PM | Permalink

    Phil: UC is 100% correct. You can confirm that the s21 version is produced by padding of the annual version as follows:

    had_s21=read.table("http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/annual_s21")
    # Annual series smoothed with a 21-point binomial filter
    had=read.table("http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/annual")
    # 1 1850 -0.374 -0.359 -0.389 -0.273 -0.476 -0.327 -0.417 -0.272 -0.477
    id=c("year","cru","ub1","lb1","ub2","lb2","ub3","lb3","ub1+2","lb1+2","ub","lb")
    dimnames(had_s21)[[2]]=id
    dimnames(had)[[2]]=id

    # pad through the end value, then apply the 21-point binomial filter
    f=function(x) {
      F=choose(20, 0:20)/sum(choose(20, 0:20))   # 21-point binomial weights
      N=length(x)
      test=c(rep(x[1],10),x,rep(x[N],10))        # pad 10 values at each end with the boundary value
      test=filter(test,F)                        # centred filter on the padded series
      f=test[11:(N+10)]                          # drop the padded positions
      f}
    range(f(had[,"cru"])-had_s21[,"cru"])
    # -0.0006833076 0.0006065302

  160. Mike C
    Posted Apr 2, 2008 at 12:04 PM | Permalink

    147 Mark,
    The associated downwelling mainly occurs in the Indonesia area. But you would have to be careful because some of the heat in that area during La Nina is dissipated by evapotranspiration. The warm water is piled up in a small but deep pool near Indonesia. Evaporation carries away some of the water, leaving saltier, heavier water that sinks. See the bottom animation here:
    http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_update/wkxzteq.shtml Some of the heat is also dissipated by the “horse shoe pattern” of weather systems that travel to higher latitudes (both north and south) then back to the east. Animation of actual and anomalies here: http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_update/sstanim.shtml
    One thing you might want to keep in mind in all of this is that there is no Coriolis effect at the equator so you will have different ocean and atmosphere behavior at the equator than at higher latitudes.

  161. Phil.
    Posted Apr 2, 2008 at 12:11 PM | Permalink

    Jeez, I thought you were a mathematician!

    0.403 is the actual average of the data for the whole of 2007; the value in the smoothed version (0.421) is the result of smoothing the annual data.
    The data since 2000 are: 2001 0.400, 2002 0.455, 2003 0.457, 2004 0.432, 2005 0.479, 2006 0.422, 2007 0.403.
    As you’d expect, there’s a slight downturn at the end in the smoothed data too!
    The announced change has been not to include data from the partially completed year, hence no entries for 2008 in the annual data; previously the data for the partially completed year had been used.


    Steve:
    Oh, puh-leeze, Phil. What’s your point? UC’s filter yields the reported s21 exactly.

  162. Posted Apr 2, 2008 at 12:14 PM | Permalink

    Mannian reflection is a double reflection – along the x-axis and along the y-axis centered on the endpoint value. If this were done in the HAdCRU case, they’d more or less be doubling up the Jan-Feb downspike.

    I used that in http://www.climateaudit.org/?p=2955#comment-229796, red curve (starts at 1850 and ends 2008). Blue is the vector ‘padded’ from Mann’s code. The method is not Mann’s invention, but it fits the picture very well when temps go up (‘the constraint employed by the filter preserves the late 20th century trend’, MannJones03), and very badly if temps keep going down.

    Memo to Hadley: Expand the uncertainties or do not smooth near the end-points.

  163. Phil.
    Posted Apr 2, 2008 at 12:15 PM | Permalink

    Re #160

    Yes, they’re still using the same smoothing algorithm; it seems to do a fairly good job. They’re not padding out the incomplete year anymore, which is what the whole storm in a teacup has been about.

    Steve: Oh puh-leeze. We’re just observing a little irony here.

  164. Phil.
    Posted Apr 2, 2008 at 12:29 PM | Permalink

    Memo to Hadley: Expand the uncertainties or do not smooth near the end-points.

    Or do what they are doing, which is to show the annual data and smoothed data together. There doesn’t seem to be much to complain about, frankly; since they give the actual data on the same page, there’s nothing to stop anyone checking out their own favorite algorithms.

    Steve:
    They’re disseminating this information to the public. The “public” should not be required to run algorithms to check the Hadley Center.

  165. Posted Apr 2, 2008 at 12:33 PM | Permalink

    Jeez, I thought you were a mathematician!

    🙂

    Enough talking with you, back to the Kalman filter problem..

  166. jeez
    Posted Apr 2, 2008 at 12:55 PM | Permalink

    I never claimed to be a mathematician. Although I did compete in a Math Field Day in 1973. Did pretty well too.

  167. Johan i Kanada
    Posted Apr 2, 2008 at 1:28 PM | Permalink

    Option 1)
    When a year starts and ends is entirely arbitrary. Hence, for the purposes of graphing temperature trends, the best and most robust method must be to use the “before present” approach, i.e. the last year is always the last 12 months, regardless of whether the last month is Dec, Jan, June or any other month. Then calculate the annual values from the monthly values (simple average). Finally, run the annual values through a causal LP filter of some sort, and generate a curve (where possibly some sort of trend can be calculated/detected).

    Option 2)
    Forget about the annual values: start from the available monthly values, run them through a causal LP filter of some sort, and generate a curve (where possibly some sort of trend can be detected/calculated).

    Why complicate life?

    I would also like to posit that a simple 10-sample moving average (120 samples in the second case) is perfectly good enough for trend detection purposes. If this is not correct, please show me the difference. (I really would like to understand the problem (if any) with using a simple moving average.)

    If a non-causal filter is used, one has to exclude the last N values (where N is the number of samples right of center in the smoothing window), since they are not valid whatever the padding method, because they are simply not comparable to the rest of the values. Hence, it is much better to use a causal filter (i.e. no padding). Does that make sense?

  168. steven mosher
    Posted Apr 2, 2008 at 2:19 PM | Permalink

    re 166. that was a mass feel day in 1968 dude.

  169. Posted Apr 2, 2008 at 4:29 PM | Permalink

    #140 Yes, I got a reply:

    Dear Tuukka Simonen

    We are constantly reviewing the way that we present our data on the web and feedback like yours helps us to do this in an informed manner. While we cannot guarantee to please everyone, we will consider all suggestions.

    My thanks and best regards,

    John

    #142 Well, I think January 2007 didn’t bend the trend strongly enough to be clearly visible, since the trend was going upwards to begin with. Now that the trend has clearly twisted, it was easy to identify the problem (well, not that easy, because feedback was needed to identify it).

  170. jeez
    Posted Apr 2, 2008 at 5:17 PM | Permalink

    I think the Hadley Center missed a great opportunity to demonstrate prescience.

    Should all of 2008 turn out to be as cold as January, February, and March have been, the predictive power of their treatment of endpoints would have been very impressive.

    Of course this would be different from their current demonstration of postscience.

  171. hswiseman
    Posted Apr 2, 2008 at 6:11 PM | Permalink

    #135 & 148

    I suspect that most here know of the latent heat phenomenon, but the explanation below is a good one. Most interesting is the difference in energy transfer between evaporation/condensation versus melting or freezing.

    Directly quoted from METEOROLOGIST JEFF HABY, theweatherprediction.com

    The processes of evaporation and condensation take 7.5 times as much energy as melting or freezing. This is why evaporational cooling will cool the air much more than the melting of snow. For example, let’s say snow is falling and the outside temperature is 40 degrees Fahrenheit. As the snow falls into the warmer air it will begin to melt and some of it will be evaporating. The evaporation from the wet snow will cool the air 7.5 times as much as the melting of the snow. If the temperature drops from 40 to 32 degrees as the snow falls, about 7 of those 8 degrees of cooling is caused by the evaporation process. Melting cools the air also, just not near as much as evaporation does. When water undergoes a phase change (a change from solid, liquid or gas to another phase) the temperature of the H20 stays at the same temperature. Why? Energy is being used to either weaken the hydrogen bonds between H20 molecules or energy is being taken away from the H20 which tightens the hydrogen bonds. When ice melts, energy is being taken from the environment and absorbed into the ice to loosen the hydrogen bonds. The energy taken to loosen the hydrogen bonds causes the surrounding air to cool (energy is taken away from the environment: this is latent heat absorption). The temperature of the melting ice however stays the same until all the ice is melted. All hydrogen bonds must be broken from the solid state before energy can be used to increase the H20’s temperature.

    Energy always flows from a warmer object toward a colder object. An ice cube at 32 degrees F absorbs energy from air that has a temperature warmer than freezing. Energy flows from the room toward the ice cube. Throw enough ice cubes in your kitchen and you may notice the temperature of the air cooling slightly. Energy is taken from your warmer room and moved into the ice cubes to melt them; A subtraction of energy causes cooling. The same holds when comparing freezing to condensation. The condensation process will warm the surrounding air 7.5 times as much as when the freezing process occurs. When a thunderstorm develops, the release of latent heat by condensation is 7.5 times as much as the release of latent heat by freezing. Now let’s do some application of this latent heat process with regard to forecasting.

    1. Evaporational cooling from rain (in the absence of downdrafts) will cause the temperature to decrease but the dewpoint to increase. The dewpoint will always (in the range of normally observed temperatures) increase more than the temperature falls (e.g. suppose the temperature is 70 F with a dewpoint of 50 F, after a persistent rain the temperature will cool to about 63 and the dewpoint will rise to about 63).

    2. Temperatures have a difficulty warming significantly on days when there is surface snow cover. The melting and evaporation from the snow continuously cools the air.

    3. Condensation releases latent heat. This causes the temperature of a cloud to be warmer than it otherwise would have been if it did not release latent heat. Anytime a cloud is warmer than the surrounding environmental air, it will continue to rise and develop. The more moisture a cloud contains, the more potential it has to release latent heat.

    4. The amount of cooling experienced during melting or evaporation is a function of the dewpoint depression. If the air is saturated, evaporation will be minimized. Evaporational cooling can not take place once dew forms on the ground but can start to take place when the sun begins to warm the surface (dewpoint depression becomes greater than 0).

    5. Dry climates tend to have a larger diurnal range in temperature than moist climates. The primary reason is because of latent heat. In a dry climate, evaporational cooling is at a minimum and there is little water vapor to trap longwave radiation at night. Therefore, in a dry climate the highs will be higher and the lows lower as compared to a moist climate at the same altitude and latitude (all else being equal).

  172. Posted Apr 2, 2008 at 7:37 PM | Permalink

    Re Steve #156, I stand corrected. Just a single reflection will in fact end up being equivalent to truncating the weights at the end point, and therefore is more conservative than endpadding. I had forgotten Mann’s method uses a double reflection.

  173. Mark
    Posted Apr 2, 2008 at 8:54 PM | Permalink

    #142 As I’ve pointed out in several posts, this change is not just about the smoothed data plot; it also involves the unsmoothed graph. With that graph it was very clear that in early 2007 the use of a YTD average, in conjunction with full-year averages for prior years, was misleading: it presented a significant spike up in what had been a relatively flat trend for the prior 4 or 5 years. However, this suited the alarmist bent of Hadley, so they were quite happy to leave it be. It was only when it started to work against them that they decided to change things up.

  174. Phil.
    Posted Apr 2, 2008 at 9:20 PM | Permalink

    Re #174

    With that graph it was very clear that in early 2007 the use of a YTD average, in conjunction with full-year averages for prior years, was misleading: it presented a significant spike up in what had been a relatively flat trend for the prior 4 or 5 years.

    Not true, as shown by your own graph (reproduced below): it was not a flat trend for several years; in fact 2007 continued the trend from 2004 & 2005, with a hiatus in 2006!

  175. Geoff Sherrington
    Posted Apr 3, 2008 at 2:42 AM | Permalink

    Re # 168 Steven Mosher

    A mass feel day in 1968. Memory lane. She was only a farmer’s daughter, but you could never get her back on the land. But rigidy didge, I have never smoked pot. Not even forgetting to exhale. So the 60s were not the magic dragon for everyone.

    More OT, where does the heat go from month to month as the average temp changes? Probably clouds have a lot to do with it, as so many have eloquently said on CA before. Ever noticed on a hot sunny day how much cooler it is when a cloud rolls by? That cloud is sending heat back to space. Or so some models say.

  176. John F. Pittman
    Posted Apr 3, 2008 at 5:06 AM | Permalink

    #174 Phil: By the same reasoning, 2008 should have been considered as continuing “the trend from 2003 to 2004 and 2005 to 2006 with a hiatus in 2005 & 2007”. And thus it is still “bait and switch”. Notice that the small trend that started in 2003 to 2004 is simply getting larger every other year. (As if anyone can actually believe in such a small number of points, essentially the same in magnitude, representing such a complex system as the world’s “average”.)

  177. RomanM
    Posted Apr 3, 2008 at 6:36 AM | Permalink

    #172 Hu:

    Truncating the weights at the end point is what I consider a great way to continue the smoothing to the endpoint. However it is not equivalent to a single reflection padding. Consider the following example:

    Suppose you are doing a simple 1-2-1 smoothing on a series and the final two terms are a, b. The last smoothed value is:

    Single reflection (x axis): Smooth a, b, a. Smoothed result: (a+b)/2

    Double reflection (x and y axes): Smooth a, b, b+(b-a). Smoothed result: b

    Truncated: Smoothed result: (a+2b)/3

    Truncating the weights is equivalent to padding the series with the smoothed result itself:

    Smooth: a, b, (a+2b)/3. Smoothed result: (a+2b)/3.

    Notice that in a truncation smoothing with more weights, at each stage of truncation as you approach the end of the series, you would be using a different padding value.
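    A few lines of R confirm the three endpoint results for arbitrary a and b (just a check of the arithmetic above, with made-up values):

    a <- 0.2; b <- 0.7
    w <- c(1, 2, 1) / 4
    sum(w * c(a, b, a))               # single reflection: (a + b)/2
    sum(w * c(a, b, 2 * b - a))       # double reflection: b
    sum(c(1, 2) * c(a, b)) / 3        # truncated weights: (a + 2b)/3
    sum(w * c(a, b, (a + 2 * b) / 3)) # pad with the truncated result: same (a + 2b)/3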

  178. Johan i Kanada
    Posted Apr 3, 2008 at 8:14 AM | Permalink

    #177

    This illustrates why acausal smoothing is not only un-physical but also arbitrary and inconsistent, and thus incorrect.

    Isn’t the best (in fact the correct) approach to apply a simple LP filter? Such an approach generates simple, non-disputed, consistent, and physically appropriate results, i.e. valid and correct results.

    What are the main objections to the LP filter approach?

  179. Mark T
    Posted Apr 3, 2008 at 8:38 AM | Permalink

    Um, a binomial filter is a simple low-pass filter. You need to define what you mean by “a simple LP filter” since that is about as vague as you can get. Any FIR implementation will be acausal or, if implemented causally, it will result in a delay that also skews the results. Are you meaning an IIR, i.e. a filter with feedback?

    Mark

  180. Johan i Kanada
    Posted Apr 3, 2008 at 9:08 AM | Permalink

    #179

    I meant an IIR. Like e.g. y(t) = (1-a)u(t) + ay(t-1).

    Btw, if by binomial filter you mean smoothing that takes the future into account, I don’t really see that as a filter in a real physical sense, but more as a way to recalculate the numbers for presentation purposes. Note that what we are trying to do is detect a trend (or not) in a measured (sort of) physical property. Thus, using acausal methods does not seem like a valid approach.

  181. Pat Keating
    Posted Apr 3, 2008 at 9:11 AM | Permalink

    175 Geoff
    Not only the models. Pilots flying above the cloud tops know how bright the reflection of sunlight from them is.

    179 Mark
    Yes, indeed.
    My favorite smoothing/LP filtering technique for most applications is the recursive filter, Yn = (1 – a)*Yn-1 + a*Xn, where {X} is the raw data, {Y} is the smoothed data, and a determines the cut-off frequency.

    It is computationally very efficient, well behaved, and causal. It does produce a delay, but that is inevitable.
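    For anyone who wants to try it, the recursive filter described in the two comments above takes only a few lines of R (a sketch with an invented series, using Pat Keating’s convention, where smaller a means heavier smoothing and a longer delay):

    # y[n] = (1 - a) * y[n-1] + a * x[n]  -- causal exponential (IIR) smoother
    exp_smooth <- function(x, a) {
      y <- numeric(length(x))
      y[1] <- x[1]                          # initialise on the first observation
      for (n in 2:length(x)) y[n] <- (1 - a) * y[n - 1] + a * x[n]
      y
    }

    x <- cumsum(rnorm(100, 0, 0.1))         # made-up noisy series
    round(tail(exp_smooth(x, a = 0.2)), 3)  # note the lag relative to tail(x)

    No padding is needed and the last smoothed value uses only past and present data, which is the causality argument being made; the price is the delay both commenters acknowledge.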

  182. Posted Apr 3, 2008 at 9:55 AM | Permalink

    Phil #174, I agree with John (#176) that it’s a real stretch to call 2004 and 2005 a trend “with a hiatus in 2006”. Cherry-picking the start point of a trend line is bad enough, but cherry-picking both ends is even worse! (In fact, what Phil is doing here might be called inverse cherry picking, or “lemon dropping”.)

    Roman (#177) is right that even at the end point, reflection will not be quite equivalent to truncation with discrete data. I was thinking in terms of integrals of continuous data, where they will be equivalent at the end point (though not quite as you approach the endpoint). But of course all this data is discrete…

    With an odd number of points in the filter (so that its center corresponds unambiguously to one of the observations), there is, however, a little issue of where one should hold the “mirror” when reflecting. IPCC AR4 WG1 p. 336 (Appendix 3.a) says that Chapter 3 “uses the ‘minimum slope’ constraint at the beginning and end of all time series, which effectively reflects the time series about the boundary.” For short smoothing it uses the 5 weights (1 3 4 3 1)/12, and for longer smoothing a 13-period filter. However, it does not spell out how these are implemented at boundaries.

    If we are using say the 5 year filter on 2003-2007, these years could either be reflected by holding the mirror on 07 (as Steve assumes in remark on #156) to yield a synthetic series that has the values of 03 04 05 06 07 06 05 04 …, or else they could be reflected by holding the mirror immediately after 07, to yield the synthetic series 03 04 05 06 07 07 06 05 …

    The former case, reflecting on the last observation, yields weights in 2006 and 2007 of (1 3 5 3)/12 and (2 6 4)/12, respectively. Notice that the final average (for 07) actually places a higher weight on 06 than on 07.

    The latter case, reflecting just after the last observation, yields (1 3 4 4)/12 and (1 4 7)/12, respectively. Here, the final average places a higher weight on 07 than on 06, which seems more reasonable.

    Simply truncating eliminates this issue and gives weights for the last 2 averages of (1 3 4 3)/11 and (1 3 4)/8. With a long filter and small weights on each point, however, it would be hard to see the difference between truncating and either form of reflection, particularly on the last date observed.

    CRU endpadding, on the other hand, will yield (1 3 4 4)/12 and (1 3 8)/12. With CRU’s 21 year filter, the emphasis on the last observation is even more dramatic than with IPCC’s 5-point filter, since the last average always places half the total weight plus half the middle weight on the last observation. This served CRU’s alarmist purposes when the trend was up, but will not if the average for 2008 turns out to be much lower than earlier years.
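    These endpoint weights are easy to check mechanically: treat each scheme as padding followed by the same centred filter, and feed in unit impulses to read off the effective weights. A short R sketch with the (1 3 4 3 1)/12 filter and a toy 7-year series (the fractions printed should match those quoted above):

    w <- c(1, 3, 4, 3, 1) / 12
    n <- 7                                     # toy series, years 1..7
    reflect_on    <- function(x) c(x, x[(n - 1):(n - 2)])  # mirror on the last observation
    reflect_after <- function(x) c(x, x[n:(n - 1)])        # mirror just after it
    endpad        <- function(x) c(x, x[n], x[n])          # CRU-style endpoint padding

    last_weight <- function(pad_fun) sapply(1:n, function(j) {
      e <- numeric(n); e[j] <- 1               # unit impulse in year j
      stats::filter(pad_fun(e), w)[n]          # smoothed value at the final year
    })

    round(12 * rbind(reflect_on = last_weight(reflect_on),
                     reflect_after = last_weight(reflect_after),
                     endpad = last_weight(endpad)), 3)  # rows end in (2 6 4), (1 4 7), (1 3 8)
    w[1:3] / sum(w[1:3])                       # truncation weights on the last 3 years: (1 3 4)/8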

    So my prediction for 2008 is that if 2008 does continue to run cold, CRU will quietly discover another “error” in its algorithm before it releases the smoothed value for 2008, and will switch to either reflection or truncation.

    At least until another hot year materializes, that is!

    Mann (2004) doesn’t spell out where he wants to reflect the data, but since his illustration involves an even-numbered (10-year) filter, he presumably reflects just after the last data point. An even filter raises the additional little issue of how you should align the averaged values relative to the original series, since they are in effect off by half a time unit. One approach would be to plot the annual average for 1990, say (which runs throughout 1990), at 1990.5, but the 10-year smoother which runs from 1985 through 1994 (whose center point is at the beginning of 1990) at 1990.0. With an odd filter, you can just plot 1990 and any average centered on 1990 at 1990.0 or 1990.5 without ambiguity. Note that the graph Phil linked in #174 plots the reading for 2001 at 2001.5, etc.

  183. Posted Apr 3, 2008 at 9:57 AM | Permalink

    RomanM, that’s a good idea to re-think the original acausal-with-padding filter as a non-time-invariant filter. It helps when we start to look at the propagation of uncertainties (the filter matrix is no longer Toeplitz in the non-time-invariant case; BTW, I’ve heard that term here earlier 😉 ).

  184. Posted Apr 3, 2008 at 10:37 AM | Permalink

    Hu,

    Mann (2004) doesn’t spell out where he wants to reflect the data, but since his illustration involves an even-numbered (10-year) filter, he presumably reflects just after the last data point.

    Careful, it’s Mann , see

    http://holocene.meteo.psu.edu/Mann/tools/Filter/lowpass.m

    ipts=10; fn=frequency*2; npad=round(1/fn);

    filter order (ipts) and padding length (npad) are independent.

  185. Sam Urbinto
    Posted Apr 3, 2008 at 11:09 AM | Permalink

    Jeez:

    Right, take a min/max of 5 and 20 and get 12.5 for the mean for a day. It’s not measuring (recording) it to a half degree; the math is just giving me that answer. It’s the process of averaging and re-averaging. If we get a month of 31 days with a variety of .5 and .0 days (since we only record the whole number for each measurement (rounded or truncated?)), what do we end up with?

    Then we have to combine the readings within the grids (monthly mean for each station, then with the other stations, or daily means for all stations into a monthly mean?) Oh, and where and how are the adjustments made? Are the readings weighted by days in the month or number of stations in the grid?

    Anyway, in the process of all this, we end up with more digits to the right of the decimal point. What amazes me is that anyone pays attention to anything other than the whole numbers at all.

    Jeff:

    What makes the anomaly go up “so much over ‘normal'” in Jan 2007 and go up “not as much over ‘normal'” in Jan 2008? If it’s assumed the decimal portion of the anomaly rising means anything at all, and it’s not just the averaging process, or measurement bias/error, or coincidence.

    Good question. Random chance of where and how we measure it, and what the wind is doing in regards to that. Where measured point A goes up and non-measured B goes down in most locations making it go up more versus measured point A goes down and non-measured B goes up in most locations. Errors in the methods of combining or calculating or both. Plenty of answers.

    David:

    That’s the answer I would think, to what has happened to any additional energy in the system. It is stored (and released) in some combination of the atmosphere and hydrosphere. Then there’s the mechanical aspects of what the planet is doing related to the sun and internally. Ugh, this seems somewhat complicated.

    hswiseman:

    Exactly. It seems mostly ignored that both the atmosphere and hydrosphere participate, and that clouds and water vapor are part of both. The interactions between them are powered by whatever the energy coming in has to do with land changed by population, urbanization and industrialization.

    Geoff:

    Yes. Clouds. All part of the big picture of the role that they and water vapor have in the atmosphere and how that interacts with what they do as part of, and in relation to, the hydrosphere.

    John:

    Oh, no, you see, What happened in or since 2003 is just cherry picking on your part. It’s the trend. 5 years is not long enough. You can’t just look at a month or a year!!!! Climate only counts if it’s 30 years. Waaaaaah!!!!

  186. Phil.
    Posted Apr 3, 2008 at 11:17 AM | Permalink

    Re #183

    Phil #174, I agree with John (#176) that it’s a real stretch to call 2004 and 2005 a trend “with a hiatus in 2006″. Cherry-picking the start point of a trend line is bad enough, but cherry-picking both ends is even worse! (In fact, what Phil is doing here might be called inverse cherry picking, or “lemon dropping”.)

    It wasn’t me that picked those cherries, it was Mark! I was just disagreeing with his description of the previous trend as flat; his cherry picking started after 2000, when the anomaly was 0.239, which would have been way off his scale! It’s interesting that you didn’t have any problem with Mark’s original cherry picking, Hu, but go after me when I rebut his argument.

  187. RomanM
    Posted Apr 3, 2008 at 11:21 AM | Permalink

    It seems to me that we are ignoring the intent of the smoothing in this discussion. What is “best” depends quite a bit on the answer to that question.

    For the most part, if I am looking at past history, then I assume the process has the basic form X(t) = m(t) + e(t), where m is the “signal” (deterministic component) and e(t) is the “noise” (random component). The intent of the smoothing is to emphasize m and minimize e. When looking at the behaviour in the interior of the series, if m(t) is reasonably continuous and not varying inordinately, I would think that information about the value of m(t) at some fixed point t = t0 would be contained not only in the behaviour of the series before t0, but also after t0 as well. It makes sense to use an acausal filter which involves values on both sides of t0 to estimate the value of m(t0). Choices of the weights, symmetry, number of points, etc. can be governed by the assumptions about the character of m and e. At the ends of the interval, less information is available (but still there is some) on one side than the other, and again I don’t see a problem with smoothing to the endpoint. You need to examine the assumptions on m(t) inherent in the method of smoothing used to decide whether that specific choice of smoothing is appropriate. IMHO, truncated smoothing makes the fewest assumptions about the future behaviour of the series while retaining the flavour of the method used in the interior. The loss of information at the endpoints can be reflected by the obviously wider error bounds (which can be estimated fairly simply because of the linearity of the smoothing method).

    The “LP” method, y(t) = (1-a)u(t) + ay(t-1), (which I know as single exponential smoothing) has the property of being biased towards the past (the “delay” referred to above) and seems to depend more on the assumptions one makes about the form of the series. If my intent was to predict future values of the series, then this might be what I would use. As well, the more complicated the smoothing method, the less obvious the impact of the individual series’ values on the smoothed result – think back to the earlier threads on “how did they get that smoothed curve with those weird properties?” – and the more difficult the calculation of believable error bars.
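    On the error-bounds point above: because the smoother is linear, y = Fx, the variance of each smoothed value under independent errors of variance sigma^2 is just sigma^2 times the sum of the squared weights in that row, so the widening near a truncated endpoint falls out directly. A hedged R sketch (iid noise is a simplification; the 21-point binomial weights and sigma are illustrative, not CRU’s actual error model):

    w <- choose(20, 0:20) / 2^20
    N <- 50
    Fmat <- matrix(0, N, N)
    for (i in 1:N) {
      j <- (i - 10):(i + 10)
      keep <- j >= 1 & j <= N
      Fmat[i, j[keep]] <- w[keep] / sum(w[keep])   # truncate and renormalise at the ends
    }
    sigma <- 0.1
    se <- sigma * sqrt(rowSums(Fmat^2))            # standard error of each smoothed value
    round(se[c(1, 25, N)], 4)                      # endpoint, interior, endpoint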

  188. Johan i Kanada
    Posted Apr 3, 2008 at 12:15 PM | Permalink

    #187 Roman

    No, you can not assume that m(t) contains information from m(t+1), as this is un-physical. m(t) will (likely) affect m(t+1), but not the other way around.

    What does “biased toward the past” mean? (I may show my ignorance here.) That it represents physical realities? I.e. the past may affect the future, and not the other way around.

    Further, you write “If my intent was to predict future values of the series…” Well, that is exactly what we are trying to do, isn’t it? (I.e. determine some trend embedded within the random noise, for the purpose of understanding where we’re heading.)

  189. Mark
    Posted Apr 3, 2008 at 12:35 PM | Permalink

    #186 UC, don’t worry about Phil cherry picking. It’s a spurious argument, as it’s not pertinent to the original point: Phil claimed no noticeable change in slope and argued about whether or not the trend from 2001 to 2006 was ‘flat’. Well, I’ll admit I was wrong – it’s actually downward (see bottom)! And look at that, there’s almost a right angle between the trendline to 2006 and the one from 2007! C’mon Phil, who’s not going to notice that! Ironically Phil’s ‘cherry pick’ of 2004 to 2006 has almost the identical slope. If Phil is claiming I did a cherry pick, well that’s just bogus – we’re arguing over whether there would have been a noticeable divergence in the graph’s slope. The ‘plateau’ in temperatures after 2000 is clearly evident in the Hadley graph for unsmoothed annual HADCRUT3 temperatures (see second graph):

  190. Mark
    Posted Apr 3, 2008 at 12:38 PM | Permalink

    Here’s the link to the referenced charts:

  191. Mark
    Posted Apr 3, 2008 at 12:39 PM | Permalink

    Try this:

    http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/

  192. RomanM
    Posted Apr 3, 2008 at 12:53 PM | Permalink

    #188 Johan

    No, you can not assume that m(t) contains information from m(t+1), as this is un-physical. m(t) will (likely) affect m(t+1), but not the other way around.

    If both time t and time t-1 are in the past, then it is not a question of which value physically can or cannot affect the other, but whether I can use the relationship between them to extract mutual information. If I know how tall a tree is now, could I not make some sort of inference from that information about how tall it might have been one year ago?

    What does “biased toward the past” mean? (I may show my ignorance here.) That it represents physical realities? I.e. the past may affect the future, and not the other way around.

    Look at the smoothing formula: y(t) = (1-a)u(t) + ay(t-1). Suppose, e.g., your sequence has 0 error and u(t) = m(t) = t. (m(t) is increasing at one unit per step). Even if the last smoothed value is right on: y(t-1) = m(t -1) = t-1, and your observation u(t) = m(t) = t, the smoothed value y(t) = t – a, a systematic underestimate of the correct value – that is known as bias.

    Further, you write “If my intent was to predict future values of the series…” Well, that is exactly what we are trying to do, isn’t it? (I.e. determine some trend embedded within the random noise, for the purpose of understanding where we’re heading.)

    That is exactly my point. Sometimes we want to understand what has happened in the (noisy) past. If so, then my choice would be to use an acausal filter which could give me the best information about whether there were trends in the data. If I want to numerically extend these trends to the future, then I need to understand the physical realities of the situation more fully and use proper statistical techniques to create a sequence of forward-moving predicted values. Clearly, the theme in this thread is that this was done using incredibly poor methodology with the temperature data.

  193. Johan i Kanada
    Posted Apr 3, 2008 at 2:04 PM | Permalink

    #192 Roman

    Hmmm, I still don’t understand.
    You write “If I know how tall a tree is now, could I not make some sort of inference from that information about how tall it might have been one year ago?” But the case here is that we do have a signal also from a year ago, from two years ago, from three years ago, etc. We know a priori that m(t) is independent from m(t+1), i.e. this is given from basic scientific principles. So why try to reconstruct m(t) from m(t+1) when we know that is against the underlying science?

    Better to use the causal LP filter approach (and of course it can be of higher order than one), which is physically sound, and further you do not have any end point padding problem.

  194. UK John
    Posted Apr 3, 2008 at 2:48 PM | Permalink

    As a lurker, at times it gets difficult to follow the statistical “my dick’s bigger than yours” arguments.

    However it’s clear that Hadley got it wrong, so badly wrong that just two months of “rogue” data has forced them to change their whole statistical approach. For ordinary mortals this would just be an error requiring correction, but for experts in these climate records, people who profess to tell us “facts”, this is incompetence.

    Even I could work out that it might have been wise to stick in a few unexpected rogue values and test what your pretty algorithm then does. Any reasonable software QA procedure would do this; why didn’t they?

  195. RomanM
    Posted Apr 3, 2008 at 3:10 PM | Permalink

    #193 Johan

    One more post on this side topic…

    We may have a “signal” from a year ago, but it has an error component, so it is not perfect information. m(t) has its own characteristics which can be used to improve our guess at the value of m(t) at that time. For example, if m(t) is approximately linear around t, then it is easy to show that (m(t-1) + m(t+1))/2 is (approximately) equal to m(t). Thus the average of the values on either side of t is actually an estimate of m(t). The smoothing procedure takes advantage of this fact to improve our guess at time t.

  196. Johan i Kanada
    Posted Apr 3, 2008 at 3:45 PM | Permalink

    #195 Roman

    I will let go for now, hopefully I will get some time to analyse this a bit more in detail soon, including the theoretical basis. (First I have to re-read those course books from way back when.)

    “I’m right, so there!”

    See ya,
    /Johan

  197. Posted Apr 3, 2008 at 6:08 PM | Permalink

    Mark wrote (#189),

    Ironically Phil’s ‘cherry pick’ of 2004 to 2006 has almost the identical slope.

    Look again. His point was that if you just look at 2004 and 2005, and skip 2006, 2007 is almost on trend. He “lemon dropped” 2006.

    For that matter, if you just look at 2001 and 2002, 2007 is well below “trend,” but so what?

  198. jeez
    Posted Apr 3, 2008 at 6:19 PM | Permalink

    You are rapidly converging on the zen question of climate science. What is the trend of a single point?

  199. Sam Urbinto
    Posted Apr 3, 2008 at 6:36 PM | Permalink

    If you’ve drunk the Kool-Aid of the trend equaling energy levels, the only other question is simple. Is it lemon- or cherry-flavored?

    And can you drink it while you’re meditating trying to reach Zen?

  200. Posted Apr 3, 2008 at 7:44 PM | Permalink

    UC writes (189),

    Hu,

    Mann (2004) doesn’t spell out where he wants to reflect the data, but since his illustration involves an even-numbered (10-year) filter, he presumably reflects just after the last data point.

    Careful, it’s Mann , see
    http://holocene.meteo.psu.edu/Mann/tools/Filter/lowpass.m
    ipts=10; fn=frequency*2; npad=round(1/fn);
    filter order (ipts) and padding length (npad) are independent.

    Strange, but it looks like there would be no problem if npad >= ipts/2. What happens if npad > ipts/2 depends on what filtfilt does.

    Here, anyway, he appears to be placing the “mirror” before and after the first and last points, not directly on them. It’s unclear what he would do if ipts were odd.

  201. Posted Apr 3, 2008 at 10:32 PM | Permalink

    What happens if npad > ipts/2 depends on what filtfilt does.

    filtfilt.m (*) pads 3 times the filter order at both ends, and reflects both x and y

    Here, anyway, he appears to be placing the “mirror” before and after the first and last points, not directly on them.

    Hmm, I think the mirror is placed before the first data point, and at the last data point:

    lowpass([ 3 2 1 zeros(1,15) 1 2 3 ]',0.1,2,2);

    %npad =

    5

    %padded' =

    6
    6
    5
    4
    3
    3
    2
    1
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    0
    1
    2
    3
    4
    5
    6
    6
    6

    (*) $Revision: 1.7.4.2 $ $Date: 2004/12/26 22:15:49 $
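    The padded vector above is easy to reproduce outside Matlab; a minimal R sketch of the same double reflection (mirror placed before the first data point, on the last one), using the test series from this comment:

    x <- c(3, 2, 1, rep(0, 15), 1, 2, 3)
    npad <- 5
    N <- length(x)
    pre  <- rev(2 * x[1] - x[1:npad])            # reflection includes x[1] itself
    post <- 2 * x[N] - x[(N - 1):(N - npad)]     # reflection excludes x[N]
    c(pre, x, post)                              # 6 6 5 4 3 | data | 4 5 6 6 6, as above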

  202. Posted Apr 4, 2008 at 3:32 AM | Permalink

    Re HadCRUT error propagation

    http://hadobs.metoffice.com/hadsst2/diagnostics/time-series.html

    says

    Uncertainties on the biases (bucket corrections) have strong correlations in space and time, so they are just as large for decadal global averages as for monthly grid-point values.

    I’ll try :

    load annual_s21.htm
    load annual.htm

    t=annual(:,1);
    F=diag(fliplr(pascal(21)));F=F/sum(F);
    s21a=(annual_s21(:,2)-annual_s21(:,8)); % lower bias uncert.
    sa=(annual(:,2)-annual(:,8)); % annual lower bias uncert.

    plot(t, sa)

    hold on

    plot(t, s21a,'r')
    saF=filter(F,1,[ones(10,1)*sa(1); sa; ones(10,1)*sa(end)]);

    plot(t,saF(21:end),'g')

    ( http://signals.auditblogs.com/files/2008/04/sst_cis.png )

    Almost an exact match. That is, the uncertainty is passed through the same filter as the best estimate. This is the only one of those CIs that I can replicate.

    This brings up a question: is this a valid way to do it? I didn’t find an answer right away; here is the question in a mathy way:

    Let C be a covariance matrix of errors, and F the filter matrix. What is required for C and F to satisfy

    diag(\sqrt{FCF^T})=F\sqrt{diag( C )}

    where diag(A) is a vector of main diagonal elements of matrix A, and \sqrt{A} is the square root of elements of A ?

  203. Phil.
    Posted Apr 4, 2008 at 8:02 AM | Permalink

    Re #194

    However it’s clear that Hadley got it wrong, so badly wrong that just two months of “rogue” data has forced them to change their whole statistical approach. For ordinary mortals this would just be an error requiring correction, but for experts in these climate records, people who profess to tell us “facts”, this is incompetence.

    They haven’t “changed their whole statistical approach”; they’ve kept the same smoothing algorithm but decided not to include incomplete-year data in their annual graphs. The CRU has adopted a different approach on their graphs, showing the smoothed curve for the complete years and the incomplete year’s data as the last element in the histogram:

  204. Steve McIntyre
    Posted Apr 4, 2008 at 8:18 AM | Permalink

    #203. Phil, #194 is too strident. However, the field has been rife with what appears to be opportunistic end point smoothing – whether it be Emanuel’s pinned end points, Mann’s double reflection, or CRU’s extrapolation of two months of El Nino.

  205. UK John
    Posted Apr 4, 2008 at 8:30 AM | Permalink

    Sorry, will retire to lurker room !

    Steve: No need to do that… I’m just disagreeing.

  206. Phil.
    Posted Apr 4, 2008 at 8:42 AM | Permalink

    Re #204

    Or even closer to home, Loehle’s E&E paper.

  207. Craig Loehle
    Posted Apr 4, 2008 at 8:47 AM | Permalink

    Phil says: “Or even closer to home, Loehle’s E&E paper.”

    What do you mean by that? I only reported dates for which I had data. In my first paper, 1980 was the last date shown because 1995 was the last date for which I had data (I was using a running mean). In the correction, 1935 was the last date shown because 1946 was the last date for which I had enough series (I was being more conservative). An end point bias is when you have a multiyear smoothing but somehow manage to fudge the smooth to still report the last year. For example, for a 5-year running mean, how can you have a 2007 value?

  208. Mark
    Posted Apr 4, 2008 at 9:34 AM | Permalink

    #204 – Steve, as I’ve tried to emphasize, this is not just about smoothing. Again the Hadley statement was:

    “We have recently corrected an error in the way that the smoothed time series of data were calculated. Data for 2008 were being used in the smoothing process as if they represented an accurate estimate of the year as a whole. This is not the case and owing to the unusually cool global average temperature in January 2008, the error made it look as though smoothed global average temperatures had dropped markedly in recent years, which is misleading.”

    There is no mention of the annual series in the same 3-chart set, which was also altered. No smoothing applies to that graph, so prior years were not impacted and there was no need to change it. Yet they did! Why? As I said in my prior comments, the spike upward that was created in this graph at the beginning of 2007 caused no problems for them. So why change it now?

  209. Earle Williams
    Posted Apr 4, 2008 at 11:32 AM | Permalink

    I would love to see Phil. explain the end point smoothing in Loehle, 2007. Please, no one offer conjecture of what Phil. means. Let us see if Phil. is capable of recognizing the distinction between smoothing and opportunistic end point smoothing, and if he can show us where the smoothing bias is in Loehle 2007. I’d like to think there is substance to go with the snark, but too often I see vague assertions and innuendo from the gadflies with nothing meaningful to back it up.

    “I’m waiting!” -Vicini

  210. Phil.
    Posted Apr 4, 2008 at 2:58 PM | Permalink

    Re #209

    I would love to see Phil. explain the end point smoothing in Loehle, 2007. Please, no one offer conjecture of what Phil. means. Let us see if Phil. is capable of recognizing the distinction between smoothing and opportunistic end point smoothing, and if he can show us where the smoothing bias is in Loehle 2007. I’d like to think there is substance to go with the snark, but too often I see vague assertions and innuendo from the gadflies with nothing meaningful to back it up.

    Just popped back in for a few minutes, so here’s an example: Loehle has a proxy with one data point per century, with the last data point at 1955; he smooths it, which has the effect of putting that value in every year from 1905-1980, and then averages that with the other proxies.

  211. Craig Loehle
    Posted Apr 4, 2008 at 3:20 PM | Permalink

    Ok, Phil, let’s be precise. The data I was using was sparse. The data were only available as samples. Typically geologists then interpolate these samples to date between them, which is what I was doing with temperature (yes, this introduces some unknown noise). In my first paper, some data were interpolated (when I got them) and others I just smoothed. A point at 1955 was spread out from 1940 to 1970 (not 1905 to 1980). In the correction, I interpolated all data first and then backed off from the ending in 1949 to get my last point at 1935. The effect of interpolation and smoothing sparse data was NOT hidden but was mentioned. It was an effort to make use of sparse data. Funny, though, no one on the Team objected to Moberg’s Nature paper! I guess once you use wavelets and hide the details, no one can tell what you did, so it must be ok since it was so sophisticated.

  212. Earle Williams
    Posted Apr 4, 2008 at 3:21 PM | Permalink

    Re #210

    Phil.,

    And how does this bias the endpoints compared to the entire time over which the smoothing is applied? Does the smoothing only fill the years 1905 – 1980 and not for example the years 1005 – 1904? Feel free to pop in for a few minutes when you can.

  213. Mat S Martinez
    Posted Apr 4, 2008 at 4:11 PM | Permalink

    #84, sorry I didn’t reply sooner. Been busy at work this week. I am not:

    AGWist agent provocateur

    I just like to see people be fair. I think that many actions done by those in climate science are viewed as being guided by some “agenda”. Like people can’t make mistakes and that only when it suits some hidden agenda do people correct these mistakes. In fact, it is amusing to see how the criticisms toward the Loehle paper are given/received on climateaudit. Pretty much a double standard b/c many posters on this site want so badly to have evidence against global warming… which by the way I don’t see the Loehle paper as providing.

    Opinions and desire cloud objectivity.

  214. Earle Williams
    Posted Apr 4, 2008 at 4:33 PM | Permalink

    Re #213

    Mat,

    Opinions and desire cloud objectivity.

    Seeing what you want to see isn’t unique to any particular side on this or any other issue. E.g., you seeing that folks at CA want to see Loehle 2007 as disproving AGW. That’s hardly the case. Speaking for myself, I want it to be disproving the MBH et al hockey stick that tried to do away with the MWP. Nothing more, nothing less. Yet if you want to interpret that as looking for evidence to disprove AGW, well all I can say is you’ve proved your own point very well.

  215. Phil.
    Posted Apr 4, 2008 at 8:40 PM | Permalink

    Re #211

    A point at 1955 was spread out from 1940 to 1970 (not 1905 to 1980).

    Not according to your datafile, LoehleData13.csv, on the web, it’s filled with the value 0.13606107 from 1905-1980 inclusive.

  216. Michael Smith
    Posted Apr 5, 2008 at 3:53 AM | Permalink

    From 213:

    evidence against global warming… which by the way I don’t see the Loehle paper as providing.

    Can anyone explain why AGW-proponents can’t grasp the distinction between “evidence against global warming” and “evidence against unprecedented global warming”?

  217. Geoff Sherrington
    Posted Apr 5, 2008 at 4:49 AM | Permalink

    Re # 56 Bernie

    Doing some Aussie catch-up while much of the world sleeps. Road distances, Melbourne to ….., in km (multiply by 0.6 to get miles)

    Sydney 880
    Brisbane 1700
    Adelaide 730
    Perth 3400
    Broome 4000 km.

    Hope this helps.

  218. Posted Apr 5, 2008 at 12:07 PM | Permalink

    Hu (146)

    Will they discard 2008 and endpad with 2007 until 2008 is complete so that the series will only change once a year?

    This brings up a question: which is the better predictor of the 2008 average,

    the Jan-Feb 2008 average or the 2007 average?

    Or more generally, if your prediction method is restricted to an n-month average, how do you choose the n that minimizes the prediction error?

    No answers to the #202 question so far. One solution: C is a matrix of ones and F is a row vector (?)
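    That solution is easy to check numerically: with perfectly correlated errors of equal variance (C proportional to a matrix of ones) and non-negative filter weights, both sides reduce to sigma times the sum of the weights. A quick R check with illustrative values:

    n <- 21
    Fw <- matrix(choose(20, 0:20) / 2^20, nrow = 1)  # one row of a 21-point binomial filter
    sigma <- 0.05
    C <- sigma^2 * matrix(1, n, n)                   # fully correlated errors
    sqrt(diag(Fw %*% C %*% t(Fw)))                   # lhs: uncertainty of the filtered value
    Fw %*% sqrt(diag(C))                             # rhs: filtered uncertainty -- identical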

  219. Matthew
    Posted Apr 7, 2008 at 8:21 AM | Permalink

    Quick question about DAILY AVERAGES. I eMailed NOAA yesterday because I was curious about daily temperature averages, and how they are calculated. They take the MAX & MIN, and average them.

    “The daily average temperature is the average of the daily maximum and
    minimum temperatures (which frequently occur at times other that the
    hourly reported temperatures).

    Ron Jones
    Internet Projects Specialist
    National Weather Service
    Office of the CIO
    Silver Spring, MD 20910”

    This doesn’t seem to take into account the fact that most of the day – especially here in New England – could have been warm or cold. A precipitous drop or a spike in temperature for just a short period of time would change the daily average by a few degrees.

    April 5 average of MAX & MIN was 45.5˚ (based upon figures in the 3 day history on the NOAA site for East Milton, MA) link below,
    April 5 average of hourly recorded temperature was 41.5˚ (based on the same NOAA chart from April 5)

    Am I being completely naive about this?
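    For what it’s worth, the size of this gap is easy to reproduce from any day’s hourly record; a toy R example (the hourly values below are invented for a cold-morning/warm-afternoon shape, not the East Milton observations):

    hourly <- c(33, 32, 31, 31, 30, 30, 31, 33, 36, 39, 42, 44,
                45, 46, 45, 43, 41, 39, 37, 36, 35, 34, 34, 33)
    c(minmax = (max(hourly) + min(hourly)) / 2,   # NOAA-style daily mean
      hourly = mean(hourly))                      # mean of the 24 hourly readings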

  220. MarkW
    Posted Apr 7, 2008 at 8:31 AM | Permalink

    Matthew,

    You are not being naive. This problem has been discussed a number of times on this site, as well as others.

    NOAA takes the position that it doesn’t matter that an average of MIN and MAX does not adequately represent the average daily temperature. Let’s assume that the daily temperature can be represented by a curve of temperature vs. time. NOAA believes that as climate changes, this curve will maintain its present shape, therefore a simple MIN/MAX average is adequate.

    There’s also the problem that historically, MIN and MAX are all that were recorded. So if you want to compare current temps to historical temps, you have to keep the reporting methods consistent.

    There are those amongst us that believe that the historical record is so bollixed (am I allowed to use language this strong?) up, that trying to figure out what historical temperatures were, is not possible, much less trying to figure it out to within a few hundredths of a degree.

    Other problems include changes to the environment around the temperature stations, some of which have been recorded, most of which have not, combined with the lack of studies detailing how changes to the environment around the temperature stations will affect the readings.

  221. Matthew
    Posted Apr 7, 2008 at 8:57 AM | Permalink

    Thanks Mark! I appreciate your response. I guess I understand why in the past MAX & MIN were the only readings taken, but it seems irrational to have “upgraded” from liquid MAX/MIN thermometers to stations that can measure weather data all day, and then basically throw out the rest of the day’s temperature data and only use MAX/MIN. I realize the complexity of trying to use earlier records, but at some point we should be creating a new data set that reflects contemporary methods & equipment for collecting data. Has anyone started from scratch to build a new data set that doesn’t have to jibe with past data sets? In twenty years we’d have an interesting collection.

  222. MarkW
    Posted Apr 7, 2008 at 10:27 AM | Permalink

    Matthew,

    They don’t throw out the data, they just don’t use it to compare to the historical record. The problem in trying to use the enhanced data is that they just don’t have enough of it; maybe in another 30 years.

  223. Sam Urbinto
    Posted Apr 7, 2008 at 12:15 PM | Permalink

    The min/max question is an interesting one. The logic is that if you always take the measurements the same way, then you are comparing apples to apples; regardless if the numbers are off, they’re always off the same way (and as Mark said, the old records are like that, so if you want to compare….)

    If only we could be sure of the site biases over time, the site as indicative of the area (sampled site A and unsampled site B 100 meters away behave the same from day to day), the frequency and representativeness of the measurements similar from day to day (similarity between high today with conditions of the high tomorrow) and any adjustments for TOBS etc are correct and stable over time.

    Would it be better to average 10 minute readings into an hour, and 24 of those into a day? Or to take the high/low of the 10 minute averages? Or to perform some statistical calculation on the 240 readings? Probably. Would it be any better than sampling a spot (or say 5 spots of known un-biased conditions within 100 meters of each other) for min/max? Do readings originally designed for short term weather rather than long term climate help us no matter how we do it? I don’t think we know the answers to those questions.

    But hey.

  224. JD
    Posted Apr 11, 2008 at 8:17 AM | Permalink

    A one-dimensional Kalman filter for the correction of near surface temperature forecasts
    George Galanis a1 and Manolis Anadranistakis a2
    a1 Greek Naval Academy, Hatzikyriakion, Piraeus 145 10, Greece
    a2 Hellenic National Meteorological Service, El. Venizelou 14, Elliniko 167 77, Athens, Greece

    Abstract

    A one-dimensional Kalman filter is proposed for the correction of maximum and minimum near surface (2 m) temperature forecasts obtained by a Numerical Weather Prediction model. In our study we used only one parameter (observed temperature), employing in this way scalar system and observation equations, and a limited time interval (7 days). As a result, our algorithm can be easily run on any PC with only minor technical support. The corresponding results are rather impressive, since the systematic error of our time series almost disappears, and they show the merit of post-processing even when only a simple method is applied.
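    For readers curious what such a correction looks like in practice, here is a minimal one-dimensional Kalman filter sketch in R. It is only the generic scalar predict/update recursion applied to a forecast bias, with invented numbers, not the Galanis & Anadranistakis implementation:

    # State: slowly varying forecast bias b_t = b_{t-1} + w_t, w_t ~ N(0, q)
    # Observation: today's forecast error (obs - fcst) = b_t + v_t, v_t ~ N(0, r)
    kalman_bias <- function(obs, fcst, q = 0.01, r = 0.25) {
      n <- length(obs)
      b <- numeric(n)
      b_prev <- 0; P_prev <- 1                  # vague initial state
      for (t in 1:n) {
        P_pred <- P_prev + q                    # predict
        K <- P_pred / (P_pred + r)              # Kalman gain
        b[t] <- b_prev + K * ((obs[t] - fcst[t]) - b_prev)  # update
        P_prev <- (1 - K) * P_pred
        b_prev <- b[t]
      }
      list(bias = b, corrected = fcst + b)      # estimated bias and corrected forecasts
    }

    set.seed(1)
    truth <- 10 + sin(1:30 / 3)
    fcst  <- truth - 1.5 + rnorm(30, 0, 0.3)    # forecasts that run 1.5 degrees too cold
    obs   <- truth + rnorm(30, 0, 0.3)
    round(tail(kalman_bias(obs, fcst)$bias, 3), 2)  # estimate settles near 1.5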

  225. steven mosher
    Posted Apr 11, 2008 at 9:37 AM | Permalink

    re 221. Google NOAA CRN.

    The problem is more than historical. It’s also global, as not every nation has the resources to build out instrumentation networks.

    There have been studies comparing the differences between integrating temps over the day and sampling the min and max and dividing by 2. I still can’t remember where I read it, but I think StMac remembers. The final issue is whether you get a different measure of trend.

    Oddly enough, when you run a GCM and report the average temp to the IPCC, you report a metric that results from integration rather than (Max+Min)/2.

  226. Posted Apr 15, 2008 at 11:52 PM | Permalink

    This figure http://www.cru.uea.ac.uk/cru/data/temperature/nhshgl.gif has been fixed now as well. The black line doesn’t overlap with the last value anymore (compare, for example, with this). It still overlaps with the year 2007 value; I guess it takes a whole cold year for them to learn…

  227. Phil.
    Posted Apr 16, 2008 at 11:11 AM | Permalink

    Re #226

    This figure http://www.cru.uea.ac.uk/cru/data/temperature/nhshgl.gif has been fixed now as well. The black line doesn’t overlap with the last value anymore

    Which is exactly what I showed a couple of weeks ago in #203.

  228. Posted Apr 16, 2008 at 11:57 AM | Permalink

    #227
    Oops, didn’t pay attention, sorry. Let me try to combine those two versions,

  229. UK John
    Posted Apr 16, 2008 at 1:39 PM | Permalink

    As a non statistical lurker a general question for all you expert types.

    We have statistical analyses of observational trends, with all sorts of adjustments and smoothing, and we have proxy studies of things that might represent temperature, where we then do the statistics bit with them.

    Then we have computer models that simulate what might happen in the real climate system, and we even start to discuss and analyse them with statistics as if the model were real. I suppose why not!

    Then we have a model of how the atmospheric gases might absorb infra red and warm up, and I suppose someone has applied statistics to that as well.

    But out in the real practical world where I live, has this sort of statistical stuff ever been applied to anything that is useful? I know a lot of obscure mathematical theory does get used in all sorts of stuff, even the thing I am typing on! But where is the practical realisation of something like a proxy study?

  230. Mark T.
    Posted Apr 16, 2008 at 4:45 PM | Permalink

    But out in the real practical world where I live, has this sort of statistical stuff ever been applied to anything that is useful?

    Oh yes, quite a lot, actually. Communications as well as radar are heavily based on detection theory, which is statistics wrapped up in engineering form. Component analysis techniques are used widely in many engineering fields, particularly w.r.t. any sort of detection problem (such as comm, radar, image processing). Of course, the distributions for the desired “signal” (the thing you’re attempting to detect), as well as accompanying noise and/or other impairments, are typically known a priori. These distributions are often known because they have been constructed in a particular way, or they can be tested. Of course, even when you have this additional information from a theoretical/empirical viewpoint, once you get out into the real world it all gets fuzzy, and many of the detection methods either don’t work as well as advertised, or stop working altogether.

    But where is the practical realisation of something like a proxy study?

    Hehe… $64,000 question, eh? 🙂

    Mark

  231. bender
    Posted Apr 16, 2008 at 6:40 PM | Permalink

    Systematic bias in choice of methods favoring a particular conclusion.
    Judgement for the plaintiff.
    Next topic.

  232. Geoff Sherrington
    Posted Apr 17, 2008 at 4:04 AM | Permalink

    Re # 231 Bender
    Have not seen you post for a while. If you’ve been away, welcome back.

    Maybe the agenda is a little more complex than you summarise.

    Systematic bias in choice of methods, reported with several different caveats, so that some unexpected outcomes will fit past assertions.

    “I’ll be Judge, I’ll be Jury,” said cunning old Fury;
    “I’ll try the whole cause, and condemn you to death”.

    Written by a mathematician.

  233. Tony Edwards
    Posted Apr 17, 2008 at 10:37 AM | Permalink

    Geoff Sherrington #217

    In the interest of accuracy on a science based and auditing site, you really should multiply kilometres by 0.625 to get miles, otherwise you end up with a 4% error. Not much in temperature but a lot when travelling.

    Lewis Carroll was quite a guy apparently.

  234. manacker
    Posted Jul 20, 2008 at 11:24 AM | Permalink

    I have been following and downloading the monthly temperature anomaly values published by UAH, Hadley, etc. as they come out.

    Recently I noticed that the Hadley values I had downloaded as they were first published for the first four months of 2008 had subsequently been changed.

    I have noticed occasional minor adjustments after the fact in most of the records, but this adjustment covered four successive months and was not “minor”.

    Original record
    J -0.105
    F +0.039
    M +0.430
    A +0.250

    “Corrected” record
    J +0.054
    F +0.192
    M +0.445
    A +0.254

    The net difference is an average of +0.083C per month, so fairly significant in a record where annual changes are only a fraction of this amount.

    So my question: has the Met Office changed its method of calculating the reported monthly values, or has it started some ex post facto “corrections” to the monthly record for the first four months of 2008, in order to “mitigate” the current cooling trend?

    I sincerely hope that the latter is not the case.

    Up until now, I have always assumed that it is only the GISS record that has been compromised (and is therefore out of line with the others).

    Do we now have a similar problem with the Hadley record?

    If anyone has any information on what has happened, I would appreciate hearing it.

    Max

  235. Steve McIntyre
    Posted Jul 20, 2008 at 12:40 PM | Permalink

    I’m glad that you saved an earlier version. This may be useful.

    We know much less about CRU methodology than GISS methodology.

  236. Posted Dec 27, 2009 at 1:07 PM | Permalink

    Those are some nice illustrative graphs. thanks for this man!

  237. Mary Tembo
    Posted Nov 23, 2011 at 9:59 AM | Permalink

    I am new in this field. Can any good Samaritan please send me some scripts on how to calculate a running mean when you have an already computed time series? I tried with GrADS, and I am failing terribly. Or maybe someone can help me with using a different plotting software altogether, starting with the time series and continuing to the moving average. Thanks
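    In case it helps, a centred running mean of an already computed series takes only a couple of lines in R (a sketch; GrADS is a different tool, but the arithmetic is the same, and NA is returned where the window is incomplete):

    running_mean <- function(x, k = 12) as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))

    x <- cumsum(rnorm(120, 0, 0.1))     # stand-in for your computed time series
    plot(x, type = "l", col = "grey")
    lines(running_mean(x, 12), lwd = 2)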