Comments on: The First Difference Method

By: Hu McCulloch

Hu McCulloch — Tue, 26 Jun 2012 21:43:11 +0000

Tq —
Jeff Id convinced me that while FDM solves one problem, it just creates another, and so is not the way to go. See https://climateaudit.org/2010/08/19/the-first-difference-method/#comment-240064 above.
I’m not sure what CSM stands for, offhand.

By: WIP on First Differences | Musings from the Chiefio

WIP on First Differences | Musings from the Chiefio — Tue, 26 Jun 2012 17:26:54 +0000

[…] https://climateaudit.org/2010/08/19/the-first-difference-method/ Changes that are likely to cause a level shift in the series, such as a TOBS or equipment change or a station move, should simply be treated as the closing of the old station and the creation of a new one, thereby eliminating the need for the arcane TOBS adjustment program or a one-size-fits-all MMTS adjustment. Missing observations may simply be interpolated for the purposes of computing first differences (thereby splitting the 2 or more year observed difference into 2 or more equal interpolated differences). When these differences are averaged into the composite differences and then cumulated, the valuable information the station has for the long-run change in temperature will be preserved. (When computing a standard error for the index itself, however, it should be remembered in counting stations that the station in question is missing for the period in question). […]

By: bonn mailin

bonn mailin — Mon, 30 Aug 2010 16:57:36 +0000

Hello guys, can you help me? What are the advantages and disadvantages of FDM?
Second, what are the advantages and disadvantages of CSM? Tq

By: Hu McCulloch

Hu McCulloch — Sun, 29 Aug 2010 18:27:54 +0000

Ryan O
Posted Aug 21, 2010 at 8:28 AM

The First Difference Method

Hu, what they do with RegEM is not a separate issue. It’s integral.

Here’s an example (I don’t know how to do tables, so unfortunately I’m stuck with words):

Generate 3 series, each of length 400, and embed a trend in them. For series 1, replace the last 200 points with NA. For series 2, replace the first and last 100 points with NA. For series 3, replace the first 200 points with NA.

Seasonally demean the existing points. If you calculate the mean trend at this point, it will be greatly reduced from what you embedded. But this doesn’t mean that the trend information is “lost” . . . all it means is that you calculated the trend prematurely.

The trend information is easy to restore. First, calculate the difference in mean between the 100 overlapping points for series 1 and 2 and add that mean to series 1 (such that the difference in mean during the overlap is zero). Then calculate the difference in mean between the 100 overlapping points for series 2 and 3 and add that mean to series 3. If you calculate the trend now, it will be equal to the embedded trend.
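The three-series example can be sketched in a few lines. This is only an illustration under stated assumptions: the slope, noise level, and helper names (`make_series`, `trend_of_mean`) are hypothetical, and plain demeaning stands in for seasonal demeaning.

```python
import numpy as np

n = 400
t = np.arange(n)
slope = 0.01                      # embedded trend per period (hypothetical value)
rng = np.random.default_rng(0)

def make_series(start, stop):
    """Trend plus small noise, observed only on [start, stop); NaN elsewhere."""
    x = np.full(n, np.nan)
    x[start:stop] = slope * t[start:stop] + rng.normal(0.0, 0.05, stop - start)
    return x

def trend_of_mean(series):
    """OLS slope of the cross-series mean at each time step."""
    avg = np.nanmean(np.vstack(series), axis=0)
    return np.polyfit(t, avg, 1)[0]

s1, s2, s3 = make_series(0, 200), make_series(100, 300), make_series(200, 400)

# Demean each series over its own observed period (no seasonal cycle here).
d1, d2, d3 = (s - np.nanmean(s) for s in (s1, s2, s3))
naive = trend_of_mean([d1, d2, d3])

# Shift series 1 and 3 so their means match series 2 over each 100-point overlap.
a1 = d1 + (np.nanmean(d2[100:200]) - np.nanmean(d1[100:200]))
a3 = d3 + (np.nanmean(d2[200:300]) - np.nanmean(d3[200:300]))
restored = trend_of_mean([a1, d2, a3])

print(f"embedded {slope:.4f}  naive {naive:.4f}  restored {restored:.4f}")
```

With these settings the naive trend comes out well below the embedded one, while the offset-matched trend is close to it.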

While you could certainly go through the entire BAS READER database and do this (Jeff Id did exactly that with the nearest-station reconstructions), you could also use RegEM to determine these offsets. The latter is what S09 chose. Despite the vast difference in complexity, the two methods yield nearly identical results.
….

Suppose that each of your three series, after demeaning, runs from -1 to +1 during its 200-period experience. While I no longer have RegEM up and running, my understanding from Tapio Schneider’s article is that RegEM will compute an initial covariance matrix using this data, and thus find equal variances, and a negative correlation of -1/2 between series 1 and 2 and between 2 and 3 (with no information on 1 and 3 as yet).
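One way to arrive at the -1/2 figure is sketched below. It assumes the covariance is taken over jointly observed periods only and each variance over the series' own 200 points. That is an assumption made for illustration, not a statement of what RegEM itself computes, and `place` and `pairwise_corr` are hypothetical helpers.

```python
import numpy as np

n = 400
ramp = np.linspace(-1.0, 1.0, 200)   # each demeaned series runs from -1 to +1

def place(start):
    """Put the ramp at positions [start, start + 200); NaN elsewhere."""
    x = np.full(n, np.nan)
    x[start:start + 200] = ramp
    return x

s1, s2, s3 = place(0), place(100), place(200)

def pairwise_corr(a, b):
    """Covariance over jointly observed periods, scaled by each series' own variance."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return np.nan                  # no overlap, so no information
    cov = np.mean(a[both] * b[both])   # series are already zero-mean on their own span
    return cov / np.sqrt(np.nanmean(a ** 2) * np.nanmean(b ** 2))

r12 = pairwise_corr(s1, s2)
r23 = pairwise_corr(s2, s3)
r13 = pairwise_corr(s1, s3)
print(r12, r23, r13)
```

Under these assumptions the 1-2 and 2-3 correlations are very close to -1/2 and the 1-3 correlation is undefined, since those two series never overlap.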

It will therefore initially infill series 1 in periods 201-300 and series 2 in periods 301-400 with values growing in magnitude from 0 to -.5, and series 2 in periods 1-100 and series 3 in periods 101-200 with values declining from +.5 to 0.

It then recomputes the covariance matrix from the infilled data, with an add factor to prevent loss of total variance. The correlation between 1 and 2, and between 2 and 3, is still negative, though there is now a weak positive correlation between 1 and 3.

It then iterates to convergence.

I can’t quite visualize what it ends up with, but it must be pretty flat. I therefore don’t see that RegEM (unlike Roman’s PlanB) solves the problem of lost trend that I posted.

What do you actually get putting these 3 series into RegEM?

By: Hu McCulloch

Hu McCulloch — Sun, 29 Aug 2010 15:29:25 +0000

Eric Steig
Posted Aug 20, 2010 at 2:55 PM

The First Difference Method

….
McCulloch shows an example with missing values.
Station A has 13, 14, –, –, –.

What is the correct thing to assume about the missing values? I said that he assumed zero, but what he actually implicitly assumes is 15, 16, 17, since he claims the trend is 1°/yr. But then he ignores that in calculating the anomalies, which implicitly are assumed (correctly) to be zero (that is, no different from the climatological mean value of 13.5, based on existing data) for all missing data.

So he’s comparing apples and oranges.

Actually, I wasn’t thinking in terms of “infilling” the missing values at all, but only in terms of computing a best-estimate of the regional or global temperature anomaly.

In order to do this, of course, one must implicitly fill in a value for every point in the region or globe, including any missing station locations, and then integrate these up.

However, the missing station locations are of no special importance for this calculation. Furthermore, it is not necessary to actually compute the value for every point in the region (or even a dense finite subset of those points). All we need to know is the covariances of the observed stations with each other and with the regional temperature. This can be done either with a spatial model like exponential kriging (see https://climateaudit.org/2010/08/26/kriging-on-a-geoid/) or with the aid of comprehensive climate data like the S09 AVHRR matrix (see https://climateaudit.org/2010/08/26/kriging-on-a-geoid/#comment-240413).

But that said, if I were to infill the missing values for station A from my hypothetical data, I would have come up with (13 14 15 16 17), not (13 14 13.5 13.5 13.5).
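The two infills for station A can be reproduced with a small sketch. One assumption to flag: the 1°/yr trend is obtained here by extending the line through the station's own two observations, as a stand-in for the trend in the hypothetical regional data.

```python
import numpy as np

obs = np.array([13.0, 14.0, np.nan, np.nan, np.nan])   # station A
years = np.arange(obs.size)
seen = ~np.isnan(obs)

# Anomaly-style infill: a missing anomaly of zero means the station's own mean.
mean_fill = np.where(seen, obs, np.nanmean(obs))

# Trend-style infill: extend the line through the observed points (assumption).
slope, intercept = np.polyfit(years[seen], obs[seen], 1)
trend_fill = np.where(seen, obs, intercept + slope * years)

print(mean_fill)
print(trend_fill)
```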

Sorry for the delay getting back to you, Eric — see also the corrections in the main post.

I’m still thinking about what RyanO says RegEM does with anomalies, but my understanding is that you at least did start by subtracting seasonal means from the Reader and AWS data, and therefore had anomalies relative to each station’s observation period. However, a method like RomanM’s could still recover the full trend from such data, so that it does not in itself mean that you underestimated the fluctuations in Ant. temps.

(Replies to replies to replies can come out very narrow in the new CA format, so I’ve just inserted this at the end with its header, and then pasted in the Permalink.)

By: Hu McCulloch

Hu McCulloch — Sun, 29 Aug 2010 14:56:19 +0000

In reply to Steven Mosher.

Steven Mosher
Posted Aug 20, 2010 at 10:43 PM

Hmm. It doesn’t look right.

“If the data are monthly or even daily, annual differences are taken, divided by 12 or 365, and then accumulated to obtain a monthly or daily index.”

Unclear, Hu.

That’s because it wasn’t — see corrections to head post! 🙂

By: sky

sky — Fri, 27 Aug 2010 20:52:50 +0000

In reply to Mesa.

Ditto your thoughts on confusion here. What is remarkable is that virtually no one recognizes that the FDs of a time-series lead to a fundamentally different sense of “trend” than that implicit in linear regression.

Inasmuch as the time-average FD of any finite series is simply the slope of the secant between the first and last points of the series, we get an EQUALLY time-weighted trend. Linear regression, on the other hand, weights the data values in proportion to their distance from the central time-point of the series, so the weighting INCREASES LINEARLY away from that point.
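Both statements are easy to check numerically. The sketch below uses a made-up series (the slope, noise, and length are arbitrary) and writes the OLS slope as a weighted sum of the data values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
t = np.arange(n)
x = 0.02 * t + rng.normal(0.0, 1.0, n)    # hypothetical series

# Mean first difference versus the secant slope between first and last points.
fd_mean = np.diff(x).mean()
secant = (x[-1] - x[0]) / (n - 1)

# OLS slope as a weighted sum of the data: weights are linear in (t - mean(t)).
w = (t - t.mean()) / np.sum((t - t.mean()) ** 2)
ols_from_weights = np.sum(w * x)
ols_polyfit = np.polyfit(t, x, 1)[0]

print(fd_mean, secant)
print(ols_from_weights, ols_polyfit)
```

The interior points cancel in the sum of first differences, which is why only the endpoints matter for the FD trend.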

FDing materially changes the spectral structure of the time-series via the well-known high-pass filter amplitude response 2sin(omg/2), where omg is the radian frequency in the baseband range 0-pi. By increasingly suppressing the low-frequency components, which tend to dominate climate data, their confounding effect upon estimates of regressional trend is sharply reduced. Thus the temporal mean of the FDs is closer to the SECULAR trend. The regressional trend, of course, can be obtained by accumulating the FDs from beginning to end to reconstruct the original series sans the temporal mean.
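The amplitude response quoted above follows from the frequency response of the difference filter y[t] = x[t] - x[t-1], which is 1 - exp(-i*omg). A short numerical check, with an arbitrary frequency grid:

```python
import numpy as np

omega = np.linspace(0.0, np.pi, 257)        # radian frequency, baseband 0..pi

# Frequency response of y[t] = x[t] - x[t-1] is H(omega) = 1 - exp(-i*omega).
gain = np.abs(1.0 - np.exp(-1j * omega))
formula = 2.0 * np.sin(omega / 2.0)

print(np.max(np.abs(gain - formula)))       # difference is at rounding level
print(gain[0], gain[-1])                    # zero gain at omega=0, gain 2 at pi
```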

The rub, however, comes when there are gaps in the multiple series whose FDs are ensemble averaged. This introduces errors in the average FD at any time step, which propagate cumulatively thereafter, making exact reconstruction of the ensemble-averaged series impossible. There is no free lunch!
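A small simulation illustrates the point about gaps. Everything here is hypothetical: five random-walk "stations", one of which loses five observations, with first differences averaged over whichever stations are available at each step.

```python
import numpy as np

rng = np.random.default_rng(2)
n_t, n_s = 60, 5
x = rng.normal(0.0, 1.0, (n_t, n_s)).cumsum(axis=0)   # hypothetical station series
full = x.mean(axis=1)                                 # true ensemble average

def cumulate(fd, start):
    """Rebuild a level series from averaged first differences."""
    return start + np.concatenate([[0.0], np.cumsum(fd)])

# Complete data: cumulated average FDs reproduce the ensemble average exactly.
recon_full = cumulate(np.diff(x, axis=0).mean(axis=1), full[0])

# Gap in station 0 at steps 20..24: average FDs over the available stations only.
gapped = x.copy()
gapped[20:25, 0] = np.nan
recon_gap = cumulate(np.nanmean(np.diff(gapped, axis=0), axis=1), full[0])

err = recon_gap - full
print(err[19], err[25], err[-1])
```

The reconstruction is exact up to the gap, and the error picked up across the gap then stays in every later value.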

What further compromises ensemble averages obtained from gapped temperature data is the fact that the variance is by no means uniform from station to station regionally, let alone throughout the globe. Thus high-variance Siberian stations have a stronger influence on the global average than low-variance tropical coastal stations.

The entire enterprise of using fragmented data records to construct “global average temperature series,” relying implicitly upon the premise of spatial homogeneity of temperature variation and uniform geographic coverage, needs to be re-evaluated from the ground up.

By: Rod McLaughlin

Rod McLaughlin — Thu, 26 Aug 2010 23:08:34 +0000

As you may have noticed, the Guardian apologised to Montford for this article:
http://www.guardian.co.uk/environment/cif-green/2010/aug/19/climate-sceptics-mislead-public

By: Kriging on a Geoid « Climate Audit

Kriging on a Geoid « Climate Audit — Thu, 26 Aug 2010 18:24:12 +0000

[…] on a Geoid Geoff Sherrington and others on the First Difference Method post have requested a post for discussing […]

By: Andrew

Andrew — Wed, 25 Aug 2010 17:53:23 +0000

Although first differences are not widely used (and the surface datasets all seem to use anomaly methods), they have been applied to some climate problems before. Here is one application to radiosonde data:

http://journals.ametsoc.org/doi/abs/10.1175/JCLI3198.1