Jeff Id convinced me that while FDM solves one problem, it just creates another, and so is not the way to go. See http://climateaudit.org/2010/08/19/the-first-difference-method/#comment-240064 above.

I’m not sure what CSM stands for, offhand.

2nd, what are the advantages and disadvantages of CSM? Tq

Ryan O

Posted Aug 21, 2010 at 8:28 AM

http://climateaudit.org/2010/08/19/the-first-difference-method/#comment-240022

Hu, what they do with RegEM is not a separate issue. It’s integral.

Here’s an example (I don’t know how to do tables, so unfortunately I’m stuck with words):

Generate 3 series, each of length 400, and embed a trend in them. For series 1, replace the last 200 points with NA. For series 2, replace the first and last 100 points with NA. For series 3, replace the first 200 points with NA.

Seasonally demean the existing points. If you calculate the mean trend at this point, it will be greatly reduced from what you embedded. But this doesn’t mean that the trend information is “lost” . . . all it means is that you calculated the trend prematurely.

The trend information is easy to restore. First, calculate the difference in mean between the 100 overlapping points for series 1 and 2 and add that mean to series 1 (such that the difference in mean during the overlap is zero). Then calculate the difference in mean between the 100 overlapping points for series 2 and 3 and add that mean to series 3. If you calculate the trend now, it will be equal to the embedded trend.
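The three-series example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slope of 0.01 per period is a made-up value, the series are noise-free, and demeaning each series over its own observation window stands in for the seasonal demeaning step. Series 2 is used as the reference, as in the description.

```python
import numpy as np

n = 400
t = np.arange(n)
slope = 0.01                      # hypothetical embedded trend per period
truth = slope * t                 # noise-free signal shared by all series

# Same signal, three different observation windows.
s1 = truth.copy(); s1[200:] = np.nan                      # observed 0..199
s2 = truth.copy(); s2[:100] = np.nan; s2[300:] = np.nan   # observed 100..299
s3 = truth.copy(); s3[:200] = np.nan                      # observed 200..399

# Demean each series over the points it actually has.
anoms = np.vstack([s1, s2, s3])
anoms = anoms - np.nanmean(anoms, axis=1, keepdims=True)

def trend_of_mean(x):
    """OLS slope of the cross-series mean, ignoring missing values."""
    return np.polyfit(t, np.nanmean(x, axis=0), 1)[0]

naive_trend = trend_of_mean(anoms)

# Restore the offsets using the 100-point overlaps with series 2.
aligned = anoms.copy()
aligned[0] += np.nanmean(anoms[1, 100:200] - anoms[0, 100:200])
aligned[2] += np.nanmean(anoms[1, 200:300] - anoms[2, 200:300])

restored_trend = trend_of_mean(aligned)
print(naive_trend, restored_trend)
```

In this noise-free setup the trend computed right after demeaning comes out well under half of the embedded slope, while the trend after offset matching equals it. With noisy data the recovery would only be approximate.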

While you could certainly go through the entire BAS READER database and do this (Jeff Id did exactly that with the nearest-station reconstructions), you could also use RegEM to determine these offsets. The latter is what S09 chose. Despite the vast difference in complexity, the two methods yield nearly identical results.

….

Suppose that each of your three series, after demeaning, runs from -1 to +1 during its 200-period experience. While I no longer have RegEM up and running, my understanding from Tapio Schneider’s article is that RegEM will compute an initial covariance matrix using this data, and thus find equal variances, and a *negative* correlation of -1/2 between series 1 and 2 and between 2 and 3 (with no information on 1 and 3 as yet).
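The -1/2 figure can be checked numerically. The sketch below is not RegEM; it assumes only that the initial correlation is formed from pairwise-available products with each series' mean taken as zero (true here by construction) and each variance taken over that series' own 200 observations. The linear ramps and the helper name `pairwise_corr` are illustrative choices.

```python
import numpy as np

ramp = np.linspace(-1.0, 1.0, 200)   # demeaned series running from -1 to +1

s1 = np.full(400, np.nan); s1[0:200] = ramp
s2 = np.full(400, np.nan); s2[100:300] = ramp
s3 = np.full(400, np.nan); s3[200:400] = ramp

def pairwise_corr(x, y):
    """Correlation from overlapping points only, assuming zero means."""
    both = ~np.isnan(x) & ~np.isnan(y)
    cov = np.mean(x[both] * y[both])
    return cov / np.sqrt(np.nanmean(x**2) * np.nanmean(y**2))

r12 = pairwise_corr(s1, s2)
r23 = pairwise_corr(s2, s3)
print(r12, r23)   # both close to -0.5
```

Series 1 and 3 share no observed periods, so this step gives no estimate for that pair. The sketch says nothing about what the iteration converges to.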

It will therefore initially infill series 1 in periods 201-300 and series 2 in periods 301-400 with values growing in magnitude from 0 to -.5, and series 2 in periods 0-99 and series 3 in periods 100-199 with values declining from +.5 to 0.

It then recomputes the covariance matrix from the infilled data, with an add factor to prevent loss of total variance. The correlation between 1 and 2, and between 2 and 3, is still negative, though there is now a weak positive correlation between 1 and 3.

It then iterates to convergence.

I can’t quite visualize what it ends up with, but it must be pretty flat. I therefore don’t see that RegEM (unlike Roman’s PlanB) solves the problem of lost trend that I posted.

What do you actually get putting these 3 series into RegEM?

Eric Steig

Posted Aug 20, 2010 at 2:55 PM

http://climateaudit.org/2010/08/19/the-first-difference-method/#comment-239928

….

McCulloch shows an example with missing values.

Station A has 13, 14, –, — , –.

What is the correct thing to assume about the missing values? I said that he assumed zero, but what he actually implicitly assumes is 15, 16, 17, since he claims the trend is 1°/yr. But then he ignores that in calculating the anomalies, which implicitly are assumed (correctly) to be zero (that is, no different from the climatological mean value of 13.5, based on existing data) for all missing data.

So he’s comparing apples and oranges.

Actually, I wasn’t thinking in terms of “infilling” the missing values at all, but only in terms of computing a best-estimate of the regional or global temperature anomaly.

In order to do this, of course, one must implicitly fill in a value for every point in the region or globe, including any missing station locations, and then integrate these up.

However, the missing station locations are of no special importance for this calculation. Furthermore, it is not necessary to actually compute the value for every point in the region (or even a dense finite subset of those points). All we need to know is the covariances of the observed stations with each other and with the regional temperature. This can be done either with a spatial model like exponential kriging (see http://climateaudit.org/2010/08/26/kriging-on-a-geoid/) or with the aid of comprehensive climate data like the S09 AVHRR matrix (see http://climateaudit.org/2010/08/26/kriging-on-a-geoid/#comment-240413).

But that said, if I *were* to infill the missing values for station A from my hypothetical data, I would have come up with (13 14 15 16 17), not (13 14 13.5 13.5 13.5).
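To put numbers on the two infills for station A, here is a small sketch that fits an ordinary least-squares line to each version. The year index 0 to 4 is an assumption made for illustration; only the five temperature values come from the example.

```python
import numpy as np

years = np.arange(5)                                    # hypothetical time index
trend_fill = np.array([13.0, 14.0, 15.0, 16.0, 17.0])   # continue the 1 deg/yr trend
mean_fill = np.array([13.0, 14.0, 13.5, 13.5, 13.5])    # fill with the station mean

slope_trend_fill = np.polyfit(years, trend_fill, 1)[0]
slope_mean_fill = np.polyfit(years, mean_fill, 1)[0]
print(slope_trend_fill, slope_mean_fill)
```

The first fit gives a slope of 1 per year; the second gives 0.05 per year, which shows how much the choice of infill matters for the fitted trend.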

Sorry for the delay getting back to you, Eric — see also the corrections in the main post.

I’m still thinking about what RyanO says RegEM does with anomalies, but my understanding is that you at least did start by subtracting seasonal means from the Reader and AWS data, and therefore had anomalies relative to each station’s observation period. However, a method like RomanM’s could still recover the full trend from such data, so that it does not in itself mean that you underestimated the fluctuations in Ant. temps.

(Replies to replies to replies can come out very narrow in the new CA format, so I’ve just inserted this at the end with its header, and then pasted in the Permalink.)

Steven Mosher

Posted Aug 20, 2010 at 10:43 PM

Hmm. It doesn't look right.

“If the data are monthly or even daily, annual differences are taken, divided by 12 or 365, and then accumulated to obtain a monthly or daily index. ”

Unclear, Hu.

That’s because it wasn’t — see corrections to head post! :-)

Inasmuch as the time-average FD of any finite series is simply the ROC of the secant-slope between the initial and last point in the series, we get an EQUALLY time-weighted trend. Linear regression, on the other hand, uses weighting that INCREASES LINEARLY away from the central time-point of the series.
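Both statements are easy to verify numerically. The sketch below uses an arbitrary random-walk series (the seed and length are illustrative) and checks that the mean first difference equals the secant slope, and that the OLS slope is a weighted sum of the data with weights linear in the distance from the central time point.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=50))   # arbitrary test series
n = len(x)
t = np.arange(n)

# Mean of first differences telescopes to the end-to-end secant slope.
fd_mean = np.diff(x).mean()
secant = (x[-1] - x[0]) / (n - 1)

# OLS slope written as sum(w * x), with weights linear in (t - mean(t)).
w = (t - t.mean()) / np.sum((t - t.mean()) ** 2)
ols_from_weights = np.sum(w * x)
ols = np.polyfit(t, x, 1)[0]

print(fd_mean, secant, ols_from_weights, ols)
```

The weights `w` are negative before the midpoint and positive after it, growing in magnitude toward both ends, so the end points dominate the regression slope.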

FDing materially changes the spectral structure of the time-series via the well-known high-pass filter amplitude response 2sin(omg/2), where omg is the radian frequency in the baseband range 0-pi. By increasingly suppressing the low-frequency components, which tend to dominate climate data, their confounding effect upon estimates of regressional trend is sharply reduced. Thus the temporal mean of the FDs is closer to the SECULAR trend. The regressional trend, of course, can be obtained by accumulating the FDs from beginning to end to reconstruct the original series sans the temporal mean.
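The amplitude response quoted above follows from the first-difference filter y[t] = x[t] - x[t-1], whose frequency response is 1 - exp(-i*omg). A quick numerical check, with the frequency grid chosen arbitrarily:

```python
import numpy as np

omega = np.linspace(0.0, np.pi, 101)      # radian frequency, baseband 0..pi
response = 1.0 - np.exp(-1j * omega)      # frequency response of x[t] - x[t-1]
gain = np.abs(response)

expected = 2.0 * np.sin(omega / 2.0)
print(np.max(np.abs(gain - expected)))    # difference is at rounding level
```

The gain is zero at omg = 0 and rises to 2 at omg = pi, which is the high-pass behavior described.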

The rub, however, comes when there are gaps in the multiple series whose FDs are ensemble averaged. This introduces errors in the average FD at any time step, which propagate cumulatively thereafter, making exact reconstruction of the ensemble-averaged series impossible. There is no free lunch!
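A toy case shows how a single gap leaves a permanent offset. The two station series below are hypothetical (one quadratic, one linear), and averaging whatever first differences are available at each step is an assumption about how the ensemble average is formed.

```python
import numpy as np

t = np.arange(10)
a = 0.1 * t**2                 # hypothetical station A
b = 0.5 * t                    # hypothetical station B
b_gapped = b.copy()
b_gapped[5] = np.nan           # one missing value in station B

def fd_reconstruct(stations):
    """Average the available first differences, then accumulate from zero."""
    fd = np.diff(stations, axis=1)
    avg_fd = np.nanmean(fd, axis=0)
    return np.concatenate([[0.0], np.cumsum(avg_fd)])

truth = (a + b) / 2.0
truth = truth - truth[0]

full = fd_reconstruct(np.vstack([a, b]))
gapped = fd_reconstruct(np.vstack([a, b_gapped]))
error = gapped - truth
print(error)
```

With complete data the reconstruction matches the true ensemble mean. With the gap, the error is zero up to the gap and then settles at a constant 0.5 for every later time step: the mis-estimated differences at the gap are never corrected by later data.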

What further compromises ensemble averages obtained from gapped temperature data is the fact that the variance is by no means uniform from station to station regionally, let alone throughout the globe. Thus high-variance Siberian stations have a stronger influence on the global average than low-variance tropical coastal stations.
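As a rough illustration of the variance point, consider an unweighted average of two stations with independent fluctuations. The standard deviations below are invented for the example, and independence is an assumption; real stations are correlated.

```python
import numpy as np

# Hypothetical standard deviations: one high-variance, one low-variance station.
sd = np.array([3.0, 0.5])

# Variance of the unweighted mean of independent series.
var_of_mean = np.sum(sd**2) / len(sd) ** 2

# Fraction of that variance contributed by each station.
share = sd**2 / np.sum(sd**2)
print(var_of_mean, share)
```

Here the high-variance station accounts for about 97% of the variance of the average, even though both stations are weighted equally.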

The entire enterprise of using fragmented data records to construct “global average temperature series,” relying implicitly upon the premise of spatial homogeneity of temperature variation and uniform geographic coverage, needs to be re-evaluated from the ground up.

http://www.guardian.co.uk/environment/cif-green/2010/aug/19/climate-sceptics-mislead-public
