Over on WUWT (http://wattsupwiththat.com/2010/07/13/calculating-global-temperature/), Zeke Hausfarther and Steven Mosher have been discussing the calculation of global temperature from station data. They list several methods of combining records, noting that most of the major indices use the Common Anomalies Method (CAM). They mention, but do not discuss, the First Differences Method (FDM).

In fact, FDM is far superior to any method based on averaging anomalies. Simply averaging anomalies relative to each station’s mean (~~as in the much-discussed Steig et al 2009 study of Antarctic temperatures~~ — *see below*) greatly understates any trend there may be in the data, while using a common period unnecessarily restricts the available data. At the same time, FDM eliminates the need for opaquely complex adjustments for TOBS or MMTS, and automatically takes care of station moves.

Suppose, to take a purely hypothetical example, that four stations, A, B, C and D, have partial data on years 1-5, as indicated in the following table:

**Table 1: Temperatures (°C)**

Station |
Yr 1 | Yr 2 | Yr 3 | Yr 4 | Yr 5 |

A | 13 | 14 | - | - | - |

B |
- | 10 | 11 | - | - |

C |
- | - | 22 | 23 | - |

D |
- | - | - | 15 | 16 |

Clearly all the stations point to a +1°C/yr uptrend in temperature.

If we compute anomalies for each station, as was done by Steig et al (2009) (*to the best of my knowledge*), and then average over available stations (*S09 did something much more elaborate*), we obtain the following:

**Table 2: Anomalies**

Station |
Yr 1 | Yr 2 | Yr 3 | Yr 4 | Yr 5 |

A | -.5 | +.5 | - | - | - |

B |
- | -.5 | +.5 | - | - |

C |
- | - | -.5 | +.5 | - |

D |
- | - | - | -.5 | +.5 |

Avg. |
-.5 | 0 | 0 | 0 | +.5 |

The Least Squares trend in the average anomalies is only ~~1/6~~ 1/5 °C/yr. *[(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 = 10, not 12 as I computed last night!]*

The First Difference Method, on the other hand, first computes annual differences as available for each station and then averages the first differences as available across stations. The average first differences are then cumulated from an arbitrary starting point (Year 1 in the following table), and then may be adjusted as anomalies relative to any desired reference period (Years 1-5 in the last row of the table):

**Table 3: First Differences**

Station |
Yr 1 | Yr 2 | Yr 3 | Yr 4 | Yr 5 |

A | - | 1 | - | - | - |

B |
- | - | 1 | - | - |

C |
- | - | - | 1 | - |

D |
- | - | - | - | 1 |

Avg. |
– | 1 | 1 | 1 | 1 |

Cumulative |
0 | 1 | 2 | 3 | 4 |

Yr1-5 = 0 |
-2 | -1 | 0 | 1 | 2 |

The FDM trend is correctly +1 °C/yr, using either reference point, Yr1 = 0 or Yr1-Yr5 = 0.

If the data are monthly or even daily, annual differences are taken, ~~divided by 12 or 365,~~ and then accumulated to obtain a ~~monthly (or daily) index~~ *separate index for each month or day. Each index is then normalized to average to 0 in a desired base period, and then these are interleaved the monthly series. The interleaved monthly series may then be averaged to obtain an annual series.* .

Changes that are likely to cause a level shift in the series, such as a TOBS or equipment change or a station move, should simply be treated as the closing of the old station and the creation of a new one, thereby eliminating the need for the arcane TOBS adjustment program or a one-size-fits-all MMTS adjustment.

Missing observations may simply be interpolated for the purposes of computing first differences (thereby splitting the 2 or more year observed difference into 2 or more equal interpolated differences). When these differences are averaged into the composite differences and then cumulated, the valuable information the station has for the long-run change in temperature will be preserved. (When computing a standard error for the index itself, however, it should be remembered in counting stations that the station in question is missing for the period in question).

The Common Anomalies Method (CAM), used by most of the major indices, including MOSHtemp ;-), requires restricting the data base to stations that are observed during a common base period, or at least during most of it. This requires throwing out a large portion of the data, and/or relying heavily on estimated TOBS and MMTS adjustments to artificially extend stations with these changes. In Table 1, for example, there is no period longer than 1 year for which more than 1 station has complete data, so that no trend could be computed by this method at all. If we were to settle for a base period in which each station has at least 2/3 of its data, a 3-year period such as Yr2-Yr4 could be used, as illustrated below:

**Table 4: Common Anomalies (Yr2-Yr4)**

Station |
Yr 1 | Yr 2 | Yr 3 | Yr 4 | Yr 5 |

B |
- | -.5 | +.5 | - | - |

C |
- | - | -.5 | +.5 | - |

Avg. |
- | -.5 | 0 | +.5 | - |

The CAM index is unnecessarily restricted to just three years, and has a trend of only +~~0.25~~ 0.5 °C/yr.

FDM of course entails the usual problem choosing a method of averaging, as do all the other methods. I would incline toward Kriging, though gridding could give similar results.

The Least Squares Method (LSM), advocated by Tamino and RomanM (the latter at http://statpad.wordpress.com/2010/02/19/combining-stations-plan-b/) will give similar results to FDM if I understand it correctly, but FDM is far more transparent, based as it is on just differencing, averaging, and cumulating.

FDM has been advocated by NCDC scholars Peterson, Karl, et al (JGR 1998, http://homepage.mac.com/williseschenbach/.Public/peterson_first_difference_method.pdf), and also by Craig Loehle (working paper), but evidently is not widely used.

**Update 8/29** Just for the record, as noted below at http://climateaudit.org/2010/08/19/the-first-difference-method/#comment-240064, Jeff Id has convinced me that while FDM solves one problem, it just creates other problems, and hence is not the way to go.

Instead, one should use RomanM’s “Plan B” — see

http://statpad.wordpress.com/2010/02/19/combining-stations-plan-b/, http://climateaudit.org/2010/08/19/the-first-difference-method/#comment-240129 , with appropriate covariance weighting — see http://climateaudit.org/2010/08/26/kriging-on-a-geoid/ .

## 134 Comments

I found that in applying a FDM to the Antarctic, spurious trends were introduced by the step in the first delta of the series. I think it would be fine if you had several thousand series but what this analysis is missing is that the single first step (first delta) is simply the offset for the ‘anomaly’ from the mean of the trend. Since trends are tenths of a degree and monthly anomaly steps are several degrees, this first step offset causes a huge amount of noise in local trends.

Since the time window anomaly method uses a number of steps it actually will compute a more realistic trend for the Antarctic or a regional area of the globe. If you are looking for a global average only, you are correct though that first difference will do a good job.

That’s the beauty of Roman’s method. It uses all the overlapping data on a monthly basis to insure that artificial steps aren’t introduced.

Thanks, Jeff. I can see that as I described the method, monthly temperatures will be constant within the first year, so that a constant trend of 1 deg. per year will show up as a series of 1 deg. jumps every Dec.-Jan.

Some modification would be required to smooth this out. Perhaps the monthly values during the first year could just be set to reflect the average annual change.

On further thought, this problem exists with any form of seasonal adjustment — if all the Januaries sum to 0 over some period and all the Februaries sum to zero, then all the Jan-Feb differences sum to zero, and any long-run trend must show up in the Dec-Jan differences.

Using a 1-year reference period as in the raw cumulants from year 1 just makes the problem more conspicuous. The solution is just to use a longish reference period, like 1961-90, and then to set each month’s average to zero in this period as well as the grand average. Climate mavens will notice that Dec-Jan mysteriously has a higher variance than any other 1-month change, and any long-run trend will be slighly understated (since the intra-year changes are missing), but no one else will see anything odd.

(In my example, a 1 dC change per year would imply 100 dC per century, perhaps a bit extreme for anything but IPCC projections for the 21st c. ;-) A value of .01 dC per year would be more realistic for a trend, but still a 1dC actual 1-year change would not be unusual, and this would jump out in the first Dec-Jan change if Year 1 were retained as the reference period.)

Computing a temperature index is a lot like computing a price index. CAM would be like insisting on using the same commodities, with the same weights, in 1850 as in 2010. Staples like whale oil and cell phones would have to be ignored entirely. But FDM is like using the preferred chain-linked approach to a price index. Comparing 1850 to 2010 is still pretty artificial, but at least you get a number.

This stair step effect is also a problem, but I was referring to the effect of noise on the method.

When the first difference is integrated the anomaly is returned inside the average trend. The actual offset of the anomaly results in no effect on the first point introduced. The constant in the integration (cumsum) is the mean of the total trend at the first point. While it sounds good, the noise in the data causes so much random offset that trends cannot be accurately discerned without many hundreds or even thousands of anomaly introductions to cancel each other out.

IMO, It’s really a non-working method for anything less than thousands of trends, unless this offset is computed with more data somehow.

Here’s a comment at least one order of magnitude less significant than that of everything else here: Temperatures are measured (and reported) in degrees centigrade, but differences (or changes) in temperature are measured and recorded in centigrade degrees. So it’s Cd, not dC, when talking about a trend.

I think the answer to this is that you lose one year of data to initialize the FDM–you only get a diff for yr 2 and lose yr 1.

I was excited to try the method on the Antarctic but found it didn’t work well for me. I was combining two to four close temperature stations into a regional trend. On integration the trends I was getting were nothing like the original stations. After some consideration I came to understand that the first differences method doesn’t work well if the station count is too low.

The first delta steps from the Anomaly mean to the second value. It makes a lot of sense conceptually and all other deltas follow. The anomaly mean becomes the C in the integration. Mathematically it is equivalent to using the raw anomalies and before averaging, offset the first point of the anomaly to the anomaly mean of the rest of the data.

In practice, I found that the up to several degree monthly deltas between nearby temperature stations meant that the offsets of the first month of each newly introduced anomaly ended up making large and essentially random steps in the data. While it sounded like the right way to go, first differences led to terrible replication of trends. The effect was so bad that even a hundred stations didn’t return the entire Antarctic to reasonable trends.

I produced a plot a long time ago where multiple months of coexisting data were used to determine each anomaly offset. It’s identical to first differences except that it uses additional data to provide an accurate initial offset for the integration. The plot shows the net effect on Antarctic trend with various numbers of months used to calculate the offset between anomalies. Figure 3 in the attached link.

http://noconsensus.wordpress.com/2009/08/31/area-weighted-antarctic-offset-reconstructions/

The first points in the plot even started with a negative continental trend, it was only when the offset values used information from about 30 months of overlap that things made sense. Now when I see the first differences method discussed, I have to explain that the method doesn’t work as well as expected on real world data. It makes more sense that it shouldn’t work when you consider the tiny little trends we’re looking for in the noisy anomaly data.

The answer from my perspective is to use multiple months of overlapping data to compute the offset such that artificial steps aren’t introduced. Roman’s method uses all overlapping months in its current configuration but that doesn’t allow for nearby stations to diverge in trend. An ideal balance could be found where his method is limited to a reasonable number of coexisting data points to insure that spurious steps aren’t introduced and allow enough freedom for trend divergence over time.

In principle, whenever a trend line is computed from monthly data, 12 seasonal intercepts should be used instead of 1 common intercept. This completely eliminates the arbitrary choice of a seasonal reference period, effectively letting the trend line regression reestimate the 12 arbitrary seasonal constants of integration.

In practice, it won’t make much difference for the estimated trend, but it can make the residuals look less strange. It will add a little standard error to the estimated trend, but that is as it should be, since we really don’t know what the seasonal constants are.

I apparently wasn’t logged in when I posted the above, so please color it pink!

My preference is Roman’s method (but on my MAC there was a floating point bug that I couldnt get anyone on the Mac sig to take note of) Then probably first differences.

I did CAM primarly because I could. and because I wanted to compare against zeke and CRU.

based on the CAM i lose about 3K stations out of 7K.

If I pick continous records, the number drops below 1000. answer stays the same, more or less. So, in theory, I’d pick the methods that give me the most data: Roman and FDM. In practice, CAM works.

except some tests with sparse data could only be done with Romans or FDM

hu. I;m not getting table 3. must be late

Re: Steven Mosher (Aug 20 03:37),

FD(N) = Yr(N) – Yr(N-1)

Avg(N) = average over station FD(N)’s

Cumulative = cumsum(Avg)

get that. The (y1-5) =0 ?. and then How you get -2?

Given the code I have this should be a piece of cake. bang it out this weekend

Don’t forget to compare your local trends to the trends by averaging methods. I think you’ll be surprised.

what do u mean by that. little slow this morning.

I’ve been trying to explain why first differences doesn’t work well, without much success :D.

When you combine two anomalies A and B, B starting after A both are continuous after that. Take first differences a and b. The first delta of b has a powerful affect on the offset during integration.

When you integrate(cumsum) db/dt you get B + C (a constant). When you integrate or cumsum a first difference of b by itself, C is zero, IOW the first point from the original anomaly is set to zero. When you do the cumsum of both a and b together, the first point of B is offset by a constant C which happens to be the value of A at the time B is introduced. IOW, the first point makes no step in the data and people just consider that all the rest of the data will work out.

What that means is that you have assumed that the two thermometers were both measuring the same error in temperature as compared to the climatological trend. The same delta from the true regional trend, wheras we know that all kinds of microclimate issues can drive temp a degree off kilter in one region compared to another. In practice I think you’ll find that the thermometers are often more than 1 C different from a mean trend line – but FDM assumes that they are the same – always.

The result is FDM gives you a lousy result with real data.

If however, you use more of the data to calculate that offset when B starts, you can caluclate the correct climatological offset and combine your anomalies with far less error. The link I provided elsewhere in this thread does exactly that. Figure 3 in it started with 1 month – FDM all the way to 60 months and shows the net effect on trend. I even made a video once of what happened regionally in the Antarctic but never posted it.

Try it out yourself though, I’ve been wrong before. It doesn’t take a lot of programming anyway, just check the result to make sure that it’s decent. On a global average with thousands of stations, it will probably turn out fine, in a regional plot, my guess is that it will make a colorful map.

This paragraph wasn’t wrong but to make it more clear:

What that means is that you have assumed that

[at that first point where B is introduced]the two thermometers were both measuring the same error in temperature as compared to the climatological trend.Jeff,

I may have this wrong, but I don’t think that FDM creates the issue you’ve attributed to it. From the first paragraph at the link you provided, it appears that you’ve anomalized (is that a word?) each record, and then averaged the anomalies. That yields results such as Hu’s Table 2, which clearly does not produce an accurate answer for the toy problem.

Instead, the order of operations must be:

(a) compute first differences for each record;

(b) for each year, average the first differences, only using records which have a FD for that year;

(c) cumulate the average FD’s from step b;

(d) compute a reference level from the values of step c, and

(e) subtract the reference level of step d from the step c sequence to produce the anomaly sequence.

It’s still unclear to me whether this technique will be better with real-world data, than the extant methods. In particular, I wonder how sensitive it is to lower-quality records. Offhand, it would seem that a low-quality record could distort the average FD during its tenure, and thereby would introduce error into the accumulated anomaly. But under the assumption that all records show the same anomaly trends, FDM seems to be an excellent method.

Having written the last paragraph above about the effect which a low-quality record can introduce into a trend calculation, I now realize that Jean S’s 4:49 post made the same observation more concisely. Far more importantly, he proposed a solution to the issue. So read that post instead.

When you reintegrate all your steps, the first step from your FD becomes an offsetting step. It’s mathematically equivalent to offsetting the first point of a newly introduced anomaly to the mean of the existing trend.

The point to point noise of the anomaly and trend cause steps in the data which are not small in comparison to the trend.

Jeff,

I agree with you fully if all the records were complete, that is, contained data for all years. FDM provides no advantage in that case, and almost certainly requires additional mathematical operations.

But its utility arises when combining records which don’t span the entire duration under consideration. It provides a simple, though not foolproof, method of combining partial records. As your straightforward example in the first paragraph of http://noconsensus.wordpress.com/2009/08/31/area-weighted-antarctic-offset-reconstructions/ shows, averaging anomalies can produce artificial steps when records begin or end. FDM deals with that case without introducing any artifacts.

I just came across this and have been toying with the same differential method. The key is to drop the first data point after a station break. I’d like to say great minds think alike but I need to be more modest ;-)

I see he means offset<-ave(y1:y5) so thats 2

then first year is 0-2. ok

OK.

http://www.google.com/images?hl=en&q=senor+wences&rlz=1I7SKPB_en&um=1&ie=UTF-8&source=univ&ei=uRxvTJCyIpSgnQfBsrTeBw&sa=X&oi=image_result_group&ct=title&resnum=4&ved=0CDoQsAQwAw

In general, I like the fundamental idea behind the FDM. The problem with the method IMO is that it is relative sensitive to even single outliers in a time step (the error propagates to all subsequent time instances due to the cum operator). I’ve previously proposed here to fix this issue by modifying the FDM such that

median(instead of the average) of station FDs is calculated. This can be generalized to other robust techniques. A relatively appealing (to me) method would be rather simple weighted median if, e.g., area/distance weighting is desired.The cumulative should be insensitive to outliers. It reduces down to just last value minus first value in the series. (n2 – n1) + (n3 – n2) …

My question with FDM is what are the differences telling us. If there’s a series that is:

2 2 2 2 2 4 2 2 2 2 2

FDM gives us the following for differences

0 0 0 0 2 -2 0 0 0 0

Interpreted like anomalies, this says that the sixth value is warm and the seventh value is cold. Anomaly from the average says that the sixth value is warm, the rest are slightly below average. Which interpretation is correct for the seventh value, cold or average?

mt,

In case you didn’t follow Hu’s reply, your sequence of FDMs are correct. However, you can’t interpret this sequence as an anomaly sequence; its units are degrees (C or K) per year. In order to reach a sequence which is temperature-like, you must cumulate the FDMs as described in the original post. That yields the sequence

0 0 0 0 0 2 0 0 0 0 0,

as in Hu’s post, arbitrarily setting the year 1 value to 0. Hu then forms a reference temp. by averaging all the values, and subtracts the reference value from each element of the sequence, producing

-.2 -.2 -.2 -.2 -.2 -.2 1.8 -.2 -.2 -.2 -.2 -.2

which is the temperature anomaly sequence by this method. Year 7 is warm, others are “average.” No cold years in this record.

By the way, the subtraction of a baseline period (in order to compute the anomaly) means that the value selected above for the first year during the cumulation step, is immaterial. Whatever value you choose, it will be cancelled in the anomaly calculation. Of course, the choice of baseline period can shift the entire sequence up or down, as is true of any anomaly sequence.

A weighted median or trimmed mean would certainly be a useful check on the average.

Erroneous single observations — recording 17 dC as 71 dC for example — won’t affect the long-run change in FDM, but will only show up in the one cumulative observation.

FDM would still be subject to permanent errors caused by unrecorded changes in station location etc, but that’s true of any method. But Jean’s median FDM method would automatically eliminate such errors, whether transitory or permanent, if they were important.

In reply to mt, FDM would give

0 0 0 0 0 0 2 0 0 0 0 0 0

relative to Year 1 = 0, or approximately

-.2 -.2 -.2 -.2 -.2 -.2 1.8 -.2 -.2 -.2 -.2 -.2 -.2

relative to Years 1-11 = 0. Either way, year 7 shows up as warm.

I was interested in the comment concerning kriging for estimating the spatial averages. This is an interesting question that I have thought about a lot. Kriging has a number of advantages for this purpose, such as:

1. Automatic declustering (based on a spatial covariance model assumption, of course)

2. Easily assigning measurement errors to each sample

3. Filtering noise modelled in the spatial covariance structure

4. Providing a simple estimate of the uncertainty of each estimation

5. A well established method of making estimates from point measurements (stations) to blocks (cell) averages.

6. Straightforward inclusion of collocated data eg elevations etc.

7. Option to use inequalities eg if a station is considered contaminated and is reading too hot then we can indicate that this is a maximum temperature for the station series and the Kriging should take this into account.

All of these are important issues and not well addressed by simple cell averaging of samples.

However, there are then the questions with Kriging of estimation on a geoid, the handling of stationarity and the selection of the spatial covariance model itself. Also, do you krige each series by year over the globe or consider the problem as a 2D geoid surface with a third dimension in time? The latter 3D approach has the advantage of allowing the kriging to interpolate also missing station values. In the time direction the stationarity assumption would become quite complicated though.

I like Jean’s notion of the median. If I understand it correctly it would better handle abrupt changes due to poorly documented station and instrument changes.

Would this method mean that we could forego other types of adjustments and simply go with the raw data? Assuming, of course, we could actually use raw data.

The other question I have is the whether stations with discontinuous records generally have as good data as those with continuous records.

There go those pesky Team members, underestimating trends again!

Hu: Your point about using ‘first differences’ in calculations of average trends may have some merit, but your example of what is wrong with simply averaging anomalies relative to individual stations means is totally misleading. It would make no sense to calculate anomalies the way you suggest we did (and we certainly did not).

In the face of missing data, one cannot simply assume values of zero — which is what you have done in estimating the raw trends for each station as 1°/yr. Indeed, your estimate of the trend is 10 times (!) too large. As a minimum infilling method, one would put in climatology: 13.5 for station A, 10.5 for station B, etc. In that way, your original estimate for the trend — all over the same period — would be 0.05. This is the conservative approach, which is presumably what one wants. This is also the value you get when you calculate the mean trend (not 1/6).

Re: Eric Steig (Aug 20 09:23),

Dr. Steig,

Thanks for the comments. Your critique of Hu’s Toy Example is opaque to me, starting with

“In the face of missing data, one cannot simply assume values of zero — which is what you have done in estimating the raw trends for each station as 1°/yr.”

Specifically, I can’t follow why one would infill 13.5 for station A, 10.5 for B, and so on. A resulting trend estimate of 0.05 for that particular Toy Example would seem to be much too low.

Apologies in advance if I am missing something obvious…

13 to 14 can be a change from 13.45 to 13.55.

it can also be a change from 12.55 to 14.45.

So it can range from almost 0 to almost 2 degrees. Hence Dr. McCulloch’s estimate of 1 degree.

However given the expected magnitude of warming, a 2 degree change would be very large and very unlikely. Dr. Steig’s point is that a better estimate would be for only a small increase and thus the estimate should be that the temperature should be estimated to be 13.5 for unfilled entries.

I think that the reply to Dr. Steig’s objection is that the error might be large for a single entry but for a realistic situation in which there are a large number of stations then the errors would average out.

Re: Stan Plamer (Aug 20 11:34),

OK, fwiw I was interpreting the Toy Problem differently. I thought that with “13”, Hu meant “13.0” or “13.00”, etc. Not trying to give realistic trends, instead providing an example of how some simple numbers would play out via the First Difference Method arithmetic.

Yes, by “13”, etc, I meant that the precisely measured local temperature was the integer 13.

I’d like to expand a bit on that…

I fully realize that this was just a “toy example” from Hu to demonstrate the difference between the two techniques, but a more realistic example where there is a truly discernable trend would be more helpful. To me at least.

Also, Dr. Steig, could you clarify exactly where Hu assumed missing values were zero?

Thanks.

In Table 3, missing values will generate a value of 0 (marked as -)

Not following…in Table 3, Hu has the average of 1, -, -, and – as 1. If he were substituting “0” for -, then the average would be 0.25, no?

Or am I still “missing” something. :)

Since I ventured into the fray here, I suppose I ought to help out a bit. But this is not the new normal for me. Don’t go complaining if I dont’ reply to every last question. I’m don’t spend time here. I only took a look since a colleague told me ‘they are complaining about your work at CA again…’ As usual, it’s a quick one-off opinion, which turns out to be wrong, but I don’t expect anyone will correct it, since the ridiculous stuff about station Harry and how much difference it made (not) is still up, uncorrected, as far as I’m aware.

Anyway, this is not complicated.

McCulloch shows an example with missing values.

Station A has 13, 14, –, — , –.

What is the correct thing to assume about the missing values? I said that he assumed zero, but what he actually implicitly assumes is 15, 16, 17, since he claims the trend is 1°/ yr. But then he ignores that in calculating the anomalies, which implicitly are assumed (correctly) to be zero (that is, no different from the the climatological mean value, of 13.5 , based on existing data) for all missing data. So he’s comparing apples and oranges.

If you don’t get it, play around with the numbers and it will make sense.

Dr. Steig,

In fairness, your response on Station Harry was given a fair amount of prominence on CA – see http://climateaudit.org/2009/02/06/steigs-corrections/

No-one expects you to answer all questions, but if you do stick around, I think you will see that you are treated with respect by the regulars here. This is probably the most technical of the “skeptic” blogs (sorry Jeff) and people appreciate questions being answered and errors being pointed out – no matter what side of the aisle they are made.

At what other blog, mainstream or skeptic, do you see this type of back and forth discussion on technical issues like the First Difference Method?

Re: Eric Steig (Aug 20 14:55),

(Bold mine)

Really? This assumption is the cause of the exact problem with the use of anomalies that the FDM is trying to fix. The “correct” assumption should be that the unobserved missing temperature is the sum of the climatology (i.e. the long term average) PLUS the anomaly you would get if you were able to measure it at that location at that particular time. Replacing the latter with zero is going to produce a bias in the end result.

It should be noted that if there are no missing values in the data set – all stations are present throughout the entire temperature record without a single missed value – there would be no need for exotic statistical methods. To calculate a “global” temperature, simply average (possibly in a weighted areally representative fashion) the station measurements. Now, start selecting values in the set and replace them with their long term station mean. The “new” global average including this imputed value will shrink toward the mean global temperature over the entire record so zero anomaly is a poor choice particularly since the effect will depend on whether the overall temperatures are warmer or cooler than average at that moment.

What is needed is a way of estimating the missing anomaly when combining records. Some of the methods may include temporal interpolation of the single station (needs anomalies if the interpolation uses consecutive months and misses annual variation if using consecutive year values) or imputation from nearby stations (possibly only one or two stations). The Least Squares method used by Jeff and myself uses all available measurements during that month to estimate the missing anomaly.

Re: Eric Steig (Aug 20 14:55),

What makes you think

anythingwas assumed about the missing data?What I appreciate about this method, at first glance, is it doesn’t appear to require any infilling nor assumptions about missing data.

It operates only on the data that is present. I could easily be missing something, but where is it making assumptions about missing data?

Re: Eric Steig (Aug 20 14:55),

Actually, he explicitly measures the trend based on data that is present. The trend measures out (and is visibly obvious as) 1°/ yr.

I suppose you could say he assumes the missing data matches the measured trend. But in reality, he simply assumes the missing data is missing… ie it should not factor into any calculations.

Of course, with so much data missing, I would hope the confidence intervals would be rather wide… :)

[replying to Eric Steig 8/20 9:23 AM]

Thanks, Eric, for checking in! BTW, thanks also for posting your AVHRR matrix last year. It gave us a lot to chew on!

It was my understanding last year that the two Jeffs and Ryan O were able to replicate your reconstruction pretty well using seasonally demeaned BAS manned station data, plus the first 3 PCs of the AVHRR matrix. Each seasonal mean can only be computed from the available data. Please correct me if I’m wrong.

No — I’m just assuming that we know nothing about the missing values, and computing the mean change from the (very limited) available data.

Is this what you did in Steig et al 09??? If so, you grossly underestimated any trend that was actually present!

As it happens, I should have said 1/5, not 1/6 (see correction in post). But not .05!

Yes — When I was playing with the data last year, I immediately noticed that the only real data for W. Ant. is Byrd (1957-1970 + a few summers until 1975) and Russkaya (1980-90). Byrd has a strong up trend during its short life, but if you compute anomalies they contribute next to nothing to the overall trend, since they are all in the first half. Russkaya is essentially flat, so anything computed from these two series by CAM will return little uptrend.

FDM will give a strong uptrend in the first half of your 1957-2006 period, but flatness in the second half. This will increase the point estimate of the trend, but also its se, so it’s not obvious what the net effect on significance would be.

Team H underestimates MWP trends as well!

Hu, that is not quite right. You start with the seasonally demeaned BAS data, but then you infill the missing points with RegEM. This is not conceptually different than the interpolation required for FDM; and is arguably a more accurate procedure. One might argue whether it is the

mostaccurate (probably not), but after having tried both, I would place my money on RegEM.Anyway, Dr. Steig did not simply regress the average seasonally demeaned BAS data against the AVHRR eigenvectors. While your comments may be quite applicable to the temperature indices like GISTemp, they don’t apply to the S09 reconstruction.

What RegEM does with the data is a separate issue. My problem is with starting with the seasonally demeaned BAS data. This would be appropriate if every station were observed for the entire study period (1957-2006). However, since many of them (eg Byrd, Russkaya) are only observed for a decade or so, subtracting their seasonal means forces them to average to zero in their observation period. But if you want to entertain the possiblity of a trend (or even random walk drift that leads to a significant change in the level), you should not start out by assuming the data is stationary with mean 0.

Do you concur, then, that S09 started by removing each station’s seasonal means? My point was that no matter what you do from there, you have lost a large part of any trend (or random drift) that may or may not be in the data.

Hu, what they do with RegEM is

nota separate issue. It’s integral.Here’s an example (I don’t know how to do tables, so unfortunately I’m stuck with words):

Generate 3 series, each of length 400, and embed a trend in them. For series 1, replace the last 200 points with NA. For series 2, replace the first and last 100 points with NA. For series 3, replace the first 200 points with NA.

Seasonally demean the existing points. If you calculate the mean trend

at this point, it will be greatly reduced from what you embedded. But this doesn’t mean that the trend information is “lost” . . . all it means is that you calculated the trend prematurely.The trend information is easy to restore. First, calculate the difference in mean between the 100 overlapping points for series 1 and 2 and add that mean to series 1 (such that the difference in mean during the overlap is zero). Then calculate the difference in mean between the 100 overlapping points for series 2 and 3 and add that mean to series 3. If you calculate the trend

now, it will be equal to the embedded trend.While you could certainly go through the entire BAS READER database and do this (Jeff Id did exactly that with the nearest-station reconstructions), you could also use RegEM to determine these offsets. The latter is what S09 chose. Despite the vast difference in complexity, the two methods yield nearly identical results.

Just because you start with seasonal anomalies does not mean the end result is a reduction in trend. If one does not account for the different anomaly reference periods, your comments are spot on. If one does account for the different reference periods, this problem does not exist (excluding sampling error effects).

This, of course, makes sense . . . the whole problem from the beginning was removing means from the series that were referenced to different timeframes. Adjusting the difference in means to be zero during periods of overlap adjusts the anomalies to have the same reference timeframe (plus or minus sampling error in determining the difference in means).

With noiseless data, this will yield identical results to FDM. With noise, the bargain-basement FDM method seems to perform

worse. Your proposed solution – a longish reference period where most of the data is complete – is conceptually identical to seasonally demeaning everything using that as a reference period and doing FDM on the anomalies. FDM provides the offsets for stations that don’t exist in the reference period in a neater way than how one would offset anomalies (and in certain cases, a more robust way), it, too, has issues.S09 does not have the problem you describe.

Replace this:

with this:

With noiseless data, this will yield identical results to FDM. With noise, the bargain-basement FDM method seems to perform worse. Your proposed solution – a longish reference period where most of the data is complete – is mathematically the same operation as seasonally demeaning everything using a longish reference period (i.e., removal of 12 seasonal means calculated over the same period) where most of the data is complete and doing FDM on the anomalies. Though FDM provides the offsets for stations that don’t exist in the reference period in a neater way than what is typically done for anomalies, FDM is not without problems and the anomaly method is not entirely without solutions.

Actually, for a really simple example, just perform the offsetting procedure I described on your Table 2. You will get the correct trend.

I have a somewhat unrelated question for you guys. I posted this over at moshtemp too.

Which method do you view as superior, the RSM method or the first difference method? I have tried to do my own variant of the RSM method in which the MAAT average of all the stations over the time period is used as a reference. I then compute the offsets between each stations data and the MAAT average data. Adjust each station for the average offset of the overlapping period and then compute the MAAT average of the offset-adjusted stations. Do you think this method introduces any flaws into the method? I just thought it seemed easier than the RSM method without the data requirements of the anomaly method

Interesting discussion. I’ve been meaning to play around with implementations of each method in the same model and see how trends change both regionally or globally, but I’ve been somewhat sidetracked with other projects.

Its worth noting that in our piece linked at the beginning of the post, Mosh and I were only making the point that the choice of anomalization method appears not to significantly alter the temperature trend at the global level. I certainly imagine that CAM can lead to a fair bit of bias in specific regions with poor coverage and discontinuous records.

Re: Zeke Hausfather (Aug 20 14:38),

First cut is done. I have some more clean up work to do and then I’ll shoot you final final answers. Maybe a week depending how the code clean up and organization goes. Plus I need a break

Hu,

While FDM has advantages over anomalies, the specific deficiency you have shown in your examples only occurs if the baseline has incomplete data

andyou do not adjust station means to be equal during periods where data overlap occurs. If you do that adjustment with your examples above, the difference between FDM and anomalies is exactly zero.As far as the Antarctic trends of Dr. Steig et al., I agree with Dr. Steig. This particular issue doesn’t cause a problem directly. It does indirectly, but it is the same problem FDM has: what to do about missing data and “noise” (deviations from the assumed linear trend). Rather than the simple interpolation procedure you propose above for FDM (which has it’s own issues), the interpolation is determined via RegEM. Different ways to skin the same cat, but conceptually similar. And RegEM is not nearly as sensitive to outliers. While RegEM may not be the best method, without some kind of modification to FDM like Jean S proposed, I think it’s far better.

Echo Jeff’s experiences trying to use it for Antarctica. I think I tried once and gave up, because I’d already tried it on Siberian GHCN data and didn’t have much luck there, either.

I also tried FDM on the Antartic station data and got hopeless results. I concluded that the data was just too noisy and incomplete for the method to succeed. RegEM isn’t perfect by any means, but it certainly works far better in this case.

Anyone know if there’s any accuracy to the method I mentioned above at 10:43 AM?

I’d really like some feedback as i’m working on something

Robert you should provide an example of sorts. hard to tell what you are doing. pretend you were asking me to code it

Well, Call me stupid but I’ll just code the sucker up.

Looking at my data structures it should be a quick job.

R help list to the rescue.

Hu, if you want to supervise I am open to taking direction.

I’ll open a thread since I just completed my other work, replicating myself. how dumb was that.

So I can think about this some more.

1.you difference every time series.

2. you average them

a. by gridcell

b. then area weight.

3. you cumsum.

4. Pick a reference point or period.

5. recenter to the reference.

is that it?

Hmm. It doesnt look right.

“If the data are monthly or even daily, annual differences are taken, divided by 12 or 365, and then accumulated to obtain a monthly or daily index. ”

Unclear,Hu.

with monthly data I’ve taken the first difference of every series. next step?

I’m reading between the lines here, but I’m guessing that Hu’s approach is

1. FD(month n) = (Temp(n) – Temp(n-12))/12 — for each record

2. average FDs, same as annual data

3. cumulate result of step 2. Difference here is that one seems to lose 11 months, whereas in using annual data, the resultant sequence is as long as the original. E.g. if you had a single record of length N years, you compute N-1 first differences, and after cumulation you have N years again. With a monthly record of length N months, you get N-12 first differences, and after cumulation you have only N-11 data points.

4. & 5. Same as annual.

Off the top of my head, I’d think it preferable to treat monthly data as twelve separate annually-sampled sequences. Segregate data by month. Process all the January data (perform your steps 1 through 5), then February, etc. Finally, interlace the 12 resultant anomalies. Cum grano salis.

Mosh —

Take the January differences, Feb. differences, etc, so that you lose the first 12 observations (instead of just the first observation as in my annual example). Then average these differences (somehow weighted) across available stations. Then select a constant of integration for each month, the simplest being 0, so that the 12 values for the first year are all 0. Then accumulate all the January average differences to get the January portion of the index, all the Feb averages differences to get the Febs, etc.

If desired, convert the entire series to a selected base period like 1961-90, by subtracting out each month’s average during this period, and thereby making the grand average during this base period zero as well. This erases your arbitrary choice of zero in year 1 as the constant of integration, but replaces it by a zero average during your base period instead, which is less conspicuous but equally arbitrary.

Then if you try to fit a trend line to the data, you should ideally include 12 seasonal intercepts rather than just 1 intercept in order to allow the seasonals to fit the same trend as the annual averages.

If you want just come to my blog and we will work through it there. the first step, taking the 12 month lag is done, and combining and area weighting are also built in functions for me. If you like.

That’s because it wasn’t — see corrections to head post! :-)

I have a few posts up that might be of interest.

Methods to Combine Station Data

Combining Inhomogeneous Station Data – Part I

Combining Inhomogeneous Station Data – Part II

I wonder if these methods break down when you have wide variations in the geography in the cell grid, mainly due to differences in altitude, latitude and

coastal vs inland locations of the stations. In a similarly sized grid in California you have vaste differences between temperatures between coastal and

desert locations even though they are at similar altitudes and latitudes. Temperature swings in the desert can be 100 deg F from winter to summer. Taking the average monthly or annual temperature and then computing off-sets from this leads to widely fluctuating values. On the coast the temperature extremes are highly muted by the closeness to the Pacific Ocean. The problem is that most of the stations that are used in the station temperature averaging

(I am thinking of the stuff I saw when we first found out what stations CRU

were using) are from coastal, urban centers. Giant sections of the hinterlands

are not used, because they lack the longevity and accuracy of the more

civilized stations in the urban areas. As you know, those stations can have

missing data for large chunks of time. Unless interior and high altitude station are used, the average of what is left is likely to have less variance

and less likely to have extremes. Also, most of the proxies are out in the hinterlands near the remote sites.

Hu M when you say,

“At the same time, FDM eliminates the need for opaquely complex adjustments for TOBS or MMTS, and automatically takes care of station moves.”

I absolutely do not understand what FDM or using anomalies would have to do with TOBS and MMTS adjustments.

USHCN and GHCN use difference series amongst stations to make homogeneity adjustments by looking for statistically significant break points in the differenced series.

I also recall a major temperature series using FDM for making adjustments but then switched to anomalies and for reasons. Perhaps Steve Mosher has a reference on this – or I could look again.

The substantial TOBS adjustment is based on a complicated regression model taking latitude, position within the time zone, proximity to mountains, etc, into account in order to try to predict the effect of a TOBS change on an individual station. If you need to calculate anomalies relative to some base period, like 1961-90, this lets you extend your series back or forward past a TOBS change. But then the value of your index depends systematically on any error in your TOBS model.

But if you use FDM, you can just treat the two series as different stations, leaving it to the numerous other stations to bridge the change date. After the fact, you can come back and look at the unaccounted for change in the station’s readings in order to study the effect of TOBS changes, but your index won’t depend on that study.

The also substantial MMTS adjustment is just a constant, but that constant has been measured with error from just a few observed equipment changes (and then only indirectly, from stations that may be in the next county). Again, FDM would just treat the new equipment as a new station. After the fact, you could come back and estimate the MMTS constant from all the observed equipment changes, but your estimate would be derived from the index and not vice-versa.

It’s baffling to me that this causes so much confusion (?).

1. **Basics**. If you simply first difference everything, then average (or median, or robust average, etc) the stations that have data you get the mean (or median, or robust average) temperature change for all stations that report in that period *and the period before*.

2. **Zeros**. This is one point of confusion I think. If data doesn’t exist for a time period, **and the period immediately before**, then the first difference doesn’t exist, by definition. You may call that data point an ‘N/A’ if you wish, but it is certianly **not a zero**. Zero is a number, so to call it zero is essentially an infill estimation (see below). So for example, if you average a bunch of zeros for missing data with first differences that exist, you will reduce the absolute magnitude of the overall first differnce. ‘N/A’ s will not affect the average of the existing first differences at all. They just don’t count in the calculation.

(NB – Note that if you had data that was only available every other period, you would never have a strictly defined first difference available!]

4. **Missing Data**. This is open to interpretation/judgment. If a first difference is unavailable for a given time period for a given station, you have several choices. (1) You may call it N/A and exclude it from the calculation. (2) You may assume it has the same first difference as the mean (or median,etc) of all the other stations – this is exactly the same as (1) for a mean calculation!. (3) You can attempt to estimate it with a regression using the past data from this station and other stations. Depending on the r^2 of this regression this may be OK. (4) You can set it to zero – this is by far the worst of all the options….it has no justification at all.

Any thoughts appreciated…

I second argument 2 – you need one datum to initialise the FDM (Craig Loehle has an unanswered post to the same effect at 6:22 AM).

To me, this method is particularly attractive, because it assumes nothing (!) about nonexistent data, and still uses the most of the existing data.

Re: Mesa (Aug 20 20:07),

Ok. I’ll use NA approach and not infill

Ditto your thoughts on confusion here. What is remarkable is that virtually no one recognizes that the FDs of a time-series lead to a fundamentally different sense of “trend” than that implicit in linear regression.

Inasmuch as the time-average FD of any finite series is simply the ROC of the secant-slope between the initial and last point in the series, we get an EQUALLY time-weighted trend. Linear regression, on the other hand, uses weighting that INCREASES LINEARLY away from the central time-point of the series.

FDing materially changes the spectral structure of the time-series via tha well-known high-pass filter amplitude response 2sin(omg/2), where omg is the radian frequency in the baseband range 0-pi. By increasingly suppressing the low-frequency components, which tend to dominate climate data, their confounding effect upon estimates of regressional trend is sharply reduced. Thus the temporal mean of the FDs is closer to the SECULAR trend. The regressional trend, of course, can be obtained by accumulating the FDs from beginning to end to reconstruct the original series sans the temporal mean.

The rub, however, comes when there are gaps in the multiple series whose FDs are ensemble averaged. This introduces errors in the average FD at any time step, which propagate cumulatively thereafter, making exact recontruction of the ensemble-averaged series impossible. There is no free lunch!

What further compromises ensemble averages obtained from gapped temperature data is the fact that the variance is by no means uniform from station to station regionally, let alone throughout the globe. Thus high-variance Siberian stations have a stronger influence on the global average than low-variance tropical coastal stations.

The entire enterprise of using fragmented data records to construct “global average temperature series,” relying implicitly upon the premise of spatial homegeneity of temperature variation and uniform geographic coverage, needs to be re-evaluated from the ground up.

Hu, thanks for a clear explanation of the First Difference Method. I use the FDM solely. It makes the best use of sparse and spotty data, or as it is known in climate science … data …

If an “anomaly” is to be computed, it should be calculated after the temperature series is computed. That avoids the missing data overlap problem, etc. The “anomaly” calculation is just a renormalization of the temperature data. In no way should it affect the calculation itself….that’s silly.

BTW – Eric Steig’s posts make no sense to me at all….I believe you did the FDM calculation correctly – your dashes are ‘N/A’s – not zeros.

Finally, there is a huge amount of technology available for infilling methods if that is desired, including tests ffor R2 that tell you if the infilling is likely to help versus excluding unavailable data….

Another point – infilling is likely to be more successful with autocorrelated data, and data that has strong correlation to ‘neighbors’…but this can be tested easily and a metric established for go/no go on infilling….

Thanks for the explanation Chad. Took 5 minutes to do it

( actually less)

1. email the R list for the right function in the package I use.

2. load the right Robject

3. a couple lines of code.

DataOBJ<-loadV2Mean(getwd())

V2Mean<-DataOBJ$Data

V2Zoo<-as.Zoo(V2Mean)

X<-diff(V2Zoo,lag=12,na.PAD=T)

X<-X/12

Y<-rowMeans(X,na.rm=T)

Z<-cumsum(Y)

plot(as.zoo(Z)))

First cut, no spatial weighting, which are just builtin functions for me

LinkText Here

Re: steven mosher (Aug 21 11:30),

Thats’ 1/10s of C guys. so needs to be scaled

I’ve just spent about 4 hours comparing different methods of combining anomaly. The post demostrates what FDM really is and why it shouldn’t ever be used for combining temperature series. As well as why Roman’s method is better.

I hope that those considering using FDM will take a moment to understand it.

http://noconsensus.wordpress.com/2010/08/21/comparison-and-critique-of-anomaly-combination-methods/

Thanks, Jeff — I’ll get back to you after I’ve studied it and Ryan’s points.

That’s why you’re the best.

Jeff, I think the limitation you found in using the FDM is what I read that influenced the change in methods from FDM to CAM that one of the major temperature series owners announced. I suspect it is the GHCN series and I also suspect that it included some of the same authors who wrote the 1998 paper referenced above in the introduction to this thread that were promoting the FDM. I am now more motivated to find that reference.

I think you may have made a mistake in implementation. . As I think the FDM should be used, the key is to average the derivatives, not the integrals. For the simple two station series, the derivatives are identical so when averaged and then integrated give no jump.

I averaged the derivatives dsld is the first difference of a two column matrix, the cumulative sum is taken from rowMeans.

dsld=diff(sld,1)

sim3=ts(cumsum(rowMeans(dsld,na.rm=TRUE)),end=2010,deltat=1/12)

Maybe the demo of the offset anomalies that produced the exact same result wasn’t clear enough.

http://www.ncdc.noaa.gov/temp-and-precip/ghcn-gridded-temp.html

Data Set Development

These datasets were created from station data using the Anomaly Method, a method that uses station averages during a specified base period (1961-1990 in this case) from which the monthly/seasonal/annual departures can be calculated. Prior to March 25, 2005 the First Difference Method (Peterson et al. 1998) was used in producing the Mean Temperature grid, and prior to February 24, 2004, the FD Method was used in producing the Maximum and Minimum Temperature grids. Due to recent efforts to restructure the GHCN station data by removing duplicates (see Peterson et al. 1997 for a discussion of duplicates), it became possible to develop grids for all temperature data sets using the Anomaly Method. The Anomaly Method can be considered to be superior to the FD Method when calculating gridbox size averages. The FD Method has been found to introduce random errors that increase with the number of time gaps in the data and with decreasing number of stations (Free et al. 2004). While the FD Method works well for large-scale means, these issues become serious on 5X5 degree scales.

Hu, yoy say at the start “Thanks, Jeff. I can see that as I described the method, monthly temperatures will be constant within the first year, so that a constant trend of 1 deg. per year will show up as a series of 1 deg. jumps every Dec.-Jan.”

In geostatistics, you look beyond the first difference, to observations spaced 2 apart, then those spaced 3 apart ….

Why cannot the earth be teated like an ore deposit? Each day, it is a 3-D figure with observations which can be skipped where missing, and a series of block grades made to cover the globe. Next day, do likewise. After all daily data (preferably max and min treated separately) are digested, you have a time sequence of the pattern of block estimates. This is then the starting material for investigations of trends and sensitivity studies and validation studies and autocorrelation corrections and estimates of error.

Nobody has been able to explain to me why geostatistics has been so successful in calculation block grades of are and waste in mines, but would not be so useful for gridded global temperatures. Indeed, the exercise would possibly be simpler for temperatures, though the volume of numbers would be greater. Going through first differences and kriging is starting down the path, so why not follow the whole yellow brick road?

Re: Geoff Sherrington (Aug 21 21:43),

nick stokes had a look at kridging. drop him a note and he may hold court

Hi Nick,

Did you have a deep look? Our first foray into geostatistics in its developmental years took a decade to come to grips with, then there were still plenty of hard questions unanswered. Fortunately, we had a specialist computer section with professional statsticians and mathematiciand and cartologists doing the work and I was involved on the fringe of it all, so I can’t go back and spend another decade at my age. We got superb results from Ranger One, Number One orebody, in terms of pre-mining prediction versus after-mining production. Geoff.

try at his blog

Re: Geoff Sherrington (Aug 22 20:40),

Geoff, sorry about the late response – I haven’t been tuning in much the last few days.

I’ve always thought kriging seemed like the right way to look for spatial correlation over rather randomly distributed points. It comes back to the idea of a variogram.

However, existing software and published algorithms are oriented towards a distribution of a sigle scalar, or at most just a few, whereas here we have a time series at each point.

I’m sure someone must have done kriging on a full borehole depth profile, which would be a good analogy. But I haven’t found anything on that.

So I haven’t got much further with the idea. My post is here.

Wasn’t GS’s idea to do kriging for each day and thus create a time series of kriged data?

Re: Stan Plamer (Aug 24 09:12),

I hadn’t realised that. But it would be a big task. I think monthly rather than daily would be a bit less noisy, and could use GHCN data. Also anomalies would be needed – I guess daily anomalies could be calculated, but again everything is noisier.

Nick,

If you are interested in investigating kriging further I would be happy to discuss with you and try some tests. I have a licence for the full commercial software package Isatis which is the gold standard for geostatistical applications.

I also know a bit about kriging etc too…

Re: ThinkingScientist (Aug 24 12:58),

TS, thanks. Sounds like you probably know more about it than I do.

It occurred to me that kriging is the best way to calculate interpolates, but may be overkill for the application that we mainly talk about, which is in calculating a regional or global average.

Is Isatis set up to interpolate borehole profiles in total, or does it do one level (or an average) at a time?

Re: ThinkingScientist (Aug 24 12:58),

“Also, do you krige each series by year over the globe or consider the problem as a 2D geoid surface with a third dimension in time? The latter 3D approach has the advantage of allowing the kriging to interpolate also missing station values. In the time direction the stationarity assumption would become quite complicated though.”I missed your earlier post, which covered some of the issues I’d been wondering about. Assuming the geoid issue can be dealt with, does Isatis treat that 2D+time approach, which seems to be what is needed?

Hi Nick,

Thanks for the reply.

There is no problem with kriging in 3D where one of the dimensions is time. The problem is exactly equivalent to drillholes or (as in my case) construction of 3D models from wells for stoachstic seismic inversion.

I also have in-house software for 3D kriging and uncertainty simulation using our own technology. But Isatis is an excellent starting point because it is a technical gold standard and well known in many disciplines.

Modern geostatistics has many tools appropriate to the problem and handling of of temperature series (dense in the time direction which is equivalent to well logs, but sparse in XY) is straightforward in well log software. This is essentially an exercise in 3D modelling which is what I do for a living.

I don’t write here under my real name or details but happy to coordinate offline through a mutual and anonymous email exchange – our host SM will have access to my email from my posts or maybe you have an open email I can contact you on to discuss further?

Re: ThinkingScientist (Aug 24 15:18),

TS, my contact details are here at my blog. Thanks for the suggestion.

Re: ThinkingScientist (Aug 24 15:18),

TS, for some reason my first response went in to moderation. It just had a link for the email address. I’ll try without a link – the address is at the contact link at my website (under Resources, on the right).

Hi Geoff/TS/others

would be nice in SM absence

if could give a anon post on Modern geostatistics as I think this certainly would be of interest to me & possibly others (how does it work in real life/commerce?) & possibly relate to CC debate.

thanks

Dougie

Hi Geoff,

I made some comments about kriging at the top of the post. I view the problem as a 2D surface (of the geoid) with a third dimension in time. The problem then becomes, as you say, one of interpolation into blocks of dimension (timestep)xLatxLong in size. You do need to account for great circle distances to krige on the geoid so some special code is required.

There are many advantages of doing it by kriging (see list near top of this post) but you still need a method to prepare the input data. Using the FDM approach seems to remove the problems of station moves etc as you end up with separate station series following a move which can easily be kriged in 3D (where the third dimension is time) However with kriging (and implicit in many other methods eg inverse distance etc) you still have the issue of the stationarity assumption to deal with.

Hu, My apologies for the runaway on geostatistics. It was not a fair diversion of your post. Is there a way it can be placed on a new thread if deemed worthwhile? There is a large group of geostats people from widerworld who do not seem to have entered the discussion to date – I’m sure they have a lot to contribute. There are a few suggestions I’d like to make to examine another path or two, some overalapping with your approach.

A post on Kriging on the Sphere would be a good idea — plus it ties in with some of Steve’s old posts, including the Chaldni pattern ones.

I can log in and start the post this evening from my home computer, but for some reason my office computer won’t let me log into CA.

I’ve been playing with the math a little, and have figured out how to integrate the Krige (exponential decay with great circle distance) correlations over the sphere in closed form. The covariances about the global mean, which must integrate to zero, are then the Krige covariances minus this constant. The correlations about he global mean are then a slight modification of the Krige correlations, and go slightly negative at great distances.

Jeff’s illustration at http://noconsensus.wordpress.com/2010/08/21/comparison-and-critique-of-anomaly-combination-methods/ is based on the point that when a second series is added to a single initial series, FDM is equivalent to adjusting the second series to match the first series at its first observation and then averaging the two levels together from that point on. However, if each series consists of a climate signal plus transitory noise, it will sometimes happen that the first series will have a high noise value when the second series is added. The noise in the first series then gets enshrined into the second series, even though the second series has no tendency to reverse it (as will happen in the first series). The result can be a very noisy reconstruction. The problem is lessened, but not eliminated when there are several series. My example was all signal and no noise, so this problem did not arise.

Jeff recommends instead adjusting the new series to have the same average as the original series over a long timespan. This makes sense, but I’m wondering how to do this with several series, given that each pair of series probably have a different overlap period. Perhaps a weighed average could be used, weighting each of the overlaps by its length. However, something symmetrical has to be done when series drop out, and I’m not sure what that would be.

So I guess FDM isn’t all I cracked it up to be. Thanks for your patience, Jeff!

One should still just treat TOBS and equipment changes as a new station, in order to dispense with complicated and perhaps inaccurate adjustments for these factors.

You might try looking at several posts which describe the method used by Jeff starting with this one. In fact, the method effectively compares the stations pairwise using all available observations over the common (and not necessarily contiguous) overlap period.

It has many positive features. No single common period is required. Weights for combining can be incorporated. Since the method reduces to a statistically standard approach, there are no difficulties in calculating uncertainty bounds. Examining the residuals from the fit gives information about possible station changes without a reliance on a small sample of specifically chosen neighbors.

The resulting combined series represents actual seasonal temperatures, not anomalies. Anomalies can then be calculated relative to any base if so desired after the fact (even for stations which do not overlap the reference period). One small drawback is that adding new data can change earlier estimates in the combined series because the latest values will add new information on station differences. However, these differences will generally be relatively small.

Thanks, Roman — I now see that your Plan B modification of Tamino’s suggestion is the way to go. http://statpad.wordpress.com/2010/02/19/combining-stations-plan-b/

As stated, it gives equal weights to all stations. But estimating it by GLS with an appropriate covariance matrix would be straightforward.

True — but in live time this just means that you have to settle on a set of stations (with at least 10 years or so of readings), compute offsets, and then go with that formula for several years. 5 or 10 years later, you come out with Version 2 of your index, with new stations added and slightly modified offsets for the old stations.

And so on.

Unfortunately, Tamino changed his server in March and so his Feb. post on this is not included in his archive. Perhaps he can respost it?

Hu,

I got so excited to see a better method to average temperatures together, I created a whole global temperature trend using Roman’s method.

http://noconsensus.wordpress.com/2010/03/24/thermal-hammer/

Note that the anomaly isn’t even calculated until the last moment and only for visual display. Figure 4 is the 12 annual intercepts you recommended elsewhere for trend calculation in this thread to take care of the stairstep problem. The multiple intercepts were then used for a non-stepped anomaly creation as well.

More complete documentation of the code and method is here (less plots though).

http://noconsensus.wordpress.com/2010/03/25/thermal-hammer-part-deux/

Are you ignoring TOBS and MMTS changes, or treating them as new “stations”? Is this what you mean by “same instrument”?

I just used the ghcn V2.mean file with the various ‘duplicate’ versions stated to come at times from different sources. Often the data is exactly the same where it coexists but covering different time windows. Other times it’s got some small noise between them, still others it’s totally different.

A good example is here:

http://statpad.wordpress.com/2010/07/19/ghcn-twins/#more-417

Comment got stuck in moderation, it had links to a site of questionable character.

Thanks Hu,

Just to be really clear, I didn’t read about FDM and say that won’t work. I said that’s a great idea, and went on to implement it. ’twas the god of physics that smote my efforts.

My Role in the Antarctic paper was to provide an area weighted reconstruction to demonstrate that our result matched what the temperature stations were showing, I started with FDM and ended up with monthly offset. It was an engineering style sanity check which visually and numerically matched Ryan and Nic’s various more complex regressions and didn’t match Steig’s original very well. To me it was important because it demonstrated that higher resolution from higher DOF reconstructions was correctly spatially constraining the station information, but the reviewers seemed so unimpressed that they didn’t even mention it :D In the latest re-submission we took it and a lot of other work completely out.

The thing I tried for the method you described was averaging out all the raw data per year (or month if you wish) and then using that as the reference station from which offsets for each series are calculated.

“FDM is equivalent to adjusting the second series to match the first series at its first observation and then averaging the two levels together from that point on. However, if each series consists of a climate signal plus transitory noise, it will sometimes happen that the first series will have a high noise value when the second series is added. The noise in the first series then gets enshrined into the second series, even though the second series has no tendency to reverse it (as will happen in the first series).”

Why try and combine the series at all? From the first differences, one should be able to compute a separate trend for each part and then – if desired – average these trends to arrive at an overall trend. E.g. if you have three bits of 100 years each, just adjacent to each other with no overlap at all (and thus no easy way of joining them without spurious steps and years 101 and 201), you can still determine there’s a trend of, say, +1deg/century for the first bit, -2deg/century for the second, and +1deg/century again for the third, which then tells you that in this hypothetic example, over these 300 years of observations the total trend is zero. With real-world data that doesn’t come in orderly one-century blocks, the calculation of the overall trend from the partial trends will be less obvious, but can certainly be done.

If one has a limited objective, for example finding the trend in Antarctic temperatures for 1957-2006 as in S09, Chris is quite right — just run a regression on time with 12 seasonal intercepts for each station. Or, if that won’t fit in your computer and you don’t care about losing the intra-year drift, skimp by first computing seasonals for each station relative to the station’s average, and then just include a single intercept for each station.

But if we just want a temperature index that can be used to fit all sorts of models, rather than one that is built on a specific model (a linear trend over a specific period), we’re back to looking for overlapping subperiods.

Because we want to know how the earth’s temperature is changing on both regional and global scales. This change is not linear in time and depends on a variety of factors whose effects we may wish to understand. Regional series are necessary for paleo reconstructions, hurricane research, etc. All of these are considerably more important than simple linear trends.

The method you describe would be a nightmare to solve and implement – probably more difficult than the problem at hand. Since the trends are not linear, the results are strongly affected by how many stations exist at a given time, where those stations are located and how long and complete their records are. At the end you would like to have uncertainty estimates. What sort of precision might you expect from such a procedure’s end result? It may sound relatively simple, but then so does the problem of properly combining temperature series.

I think RomanM makes some very pertinent points in this post and ones of which we need to be reminded:

1) Importance of regional temperatures in understanding climate

2) Understanding the factors affecting temperature

3) Estimating the uncertainty of temperature trends

I have done my share of looking at regional temperatures, temporal and spatial associations and geographical factors and I always come back to these points.

Unfortunately, I also come up against my lack of the statistical expertise to confidently make the proper comparisons and estimates.

I have nothing to contribute other than to admire the willingness to share and exchange ideas and results on this site. This to me seems to be a tremendous example of how effective CA is and how effective other areas of the Internet could be.

Doesn’t the FDM method eliminate the need for subjective adjustments that are provided for station moves?

Wouldn’t this compensate for its sensitivity to noise in the first sample?

We started plotting the global surface trend using the First Difference Method back in 2006 (although we called it the Integrated Derivative Method). The graph we obtain is almost identical to the CRU trend starting from the same NCDC data. We present our results in the Home Analysis section here:

http://www.hashemifamily.com/Kevan/Climate/

Once you have an easy way to calculate the global trend, you can start to look for systematic sources of error in the original data. As an example, we eliminate the last twenty years of a station that disappears from the record, and recalculate the trend. The warming of the latter 20th century drops from 0.6C to 0.4C. In other exercises, we show how sensitive the rising end of the trend is to the subset of stations you are prepared to use. With stations that continue to operate from 1960-2000, for example, the 1990s are no hotter than the 1940s.

http://homeclimateanalysis.blogspot.com/2009/12/continuous-stations.html

Your website indicates that there were only 73 such stations. Where were they located geographically? Mainly in the United States?

That’s a good question, which I never asked myself. So I just extracted them again and plotted them on a map, which you will find here:

Sure enough: all but a few of them are in the United States.

I thought that might be the case. What your reduced set shows is the difference between the US temperature record and the global temperatures. Despite James Hansen’s statements to the contrary, US temperatures in the 1930s were likely .as high or higher than in the 1990s.

I was must admit that I was somewhat surprised that so few records were complete during a 40 year period.

I think this comment sub-thread illustrates the problem in trying to come up with a “global average temperature” from such horribly inconsistent data. I can’t really see anything meaningful from it.

Interesting, but how do you compare the 90s to the 40s with stations that only operate from the 60s to the 90s?

It’s interesting that S. Korea is one of the few places that has good temperature records. But surely these were interrupted during WW II and the Korean War.

Of the set of continuous stations from 1960 to 2000, there are always several existing between 1940 and 2000, so we can use the FDM (or Integrated Derivatives if you like) to get a trend. So, our calculation of the trend does not assume the Korean data is present in the 1940s.

How would I use the first difference method on yearly data. Like for instance I have 31 stations with yearly data starting at different times and so on but stretching from 1885 to 2009.

I would like to use the first difference method on it but what are the steps I need to use. I would preferably need some pretty specific details because i’d have to do it by hand (excel) rather than through programming.

We describe exactly how to apply first differences to yearly data in the Home Analysis section here:

http://www.hashemifamily.com/Kevan/Climate/

We provide all our analysis code as well as archives of the NCDC data with monthly and yearly averages in a format that’s easy to analyze. We also have a mapping from station ID to global location. Let me know if you have any questions.

Yeah it looks like an interesting analysis you guys did. So essentially you take Station A year 2 and minus Station A year 1 from it. This gives you the derivative for year 2? Finally you average all the derivatives for all stations across all years but then you integrate it. How do you do integration?

The integration we do in Excel, by adding the differences together. All the plots we did in Excel (you can probably tell). You will find the integration in the GYD sheet of Climate.xls in this archive:

http://www.hashemifamily.com/Kevan/Climate/Climate.zip

Sorry I’m off-topic again, but where else is there to just paste interesting tidbits?

A Guardian hack decides to take on the Global Warming Policy Foundation, who have chosen Andrew Montford to do an enquiry into the enquiries. Montford will tell the truth, but he’s obviously not neutral, so his conclusions – which we already know – will not convince anyone who is not already on board. Nigel Lawson’s GWPF think tank is an embarrassment, and Bob Ward picking on it is like Michael Mann choosing Sarah Palin as an opponent on climate change.

http://www.guardian.co.uk/environment/cif-green/2010/aug/19/climate-sceptics-mislead-public?showallcomments=true

In Unthreaded…

Hu,

If it would help, I can provide 6 years of daily max min data from 2 stations a few km apart where most of the externalities are standardised naturally. There are visual differences when graphed, but some are systematic and some seem (small) noise, so it might make a good test bed. If you are trying to maximise match between 2 signals, a good starting point should have a favourable signal:noise ratio, then you go to worse cases. Your experiments do not need to accidentally introduce artefacts that are hard to know about.

Although first differences are not

widelyused (and the surface datasets all seem to use anomaly methods) it has been applied to some climate problems before. Here is one done with radiosonde data:http://journals.ametsoc.org/doi/abs/10.1175/JCLI3198.1

As you may have noticed, the Guardian apologised to Montford for this article:

http://www.guardian.co.uk/environment/cif-green/2010/aug/19/climate-sceptics-mislead-public

Actually, I wasn’t thinking in terms of “infilling” the missing values at all, but only in terms of computing a best-estimate of the regional or global temperature anomaly.

In order to do this, of course, one must implicitly fill in a value for every point in the region or globe, including any missing station locations, and then integrate these up.

However, the missing station locations are of no special importance for this calculation. Furthermore, it is not necessary to actually compute the value for every point in the region (or even a dense finite subset of those points). All we need to know is the covariances of the observed stations with each other and with the regional temperature. This can be done either with a spatial model like exponential kriging (see http://climateaudit.org/2010/08/26/kriging-on-a-geoid/) or with the aid of comprehensive climate data like the S09 AVHRR matrix (see http://climateaudit.org/2010/08/26/kriging-on-a-geoid/#comment-240413).

But that said, if I

wereto infill the missing values for station A from my hypothetical data, I would have come up with (13 14 15 16 17), not (13 14 13.5 13.5 13.5).Sorry for the delay getting back to you, Eric — see also the corrections in the main post.

I’m still thinking about what RyanO says RegEM does with anomalies, but my understanding is that you at least did start by subtracting seasonal means from the Reader and AWS data, and therefore had anomalies relative to each station’s observation period. However, a method like RomanM’s could still recover the full trend from such data, so that it does not in itself mean that you underestimated the fluctuations in Ant. temps.

(Replies to replies to replies can come out very narrow in the new CA format, so I’ve just inserted this at the end with its header, and then pasted in the Permalink.)

Suppose that each of your three series, after demeaning, runs from -1 to +1 during its 200-period experience. While I no longer have RegEM up and running, my understanding from Tapio Schneider’s article is that RegEM will compute an initial covariance matrix using this data, and thus find equal variances, and a

negativecorrelation of -1/2 between series 1 and 2 and between 2 and 3 (with no information on 1 and 3 as yet).It will therefore initially infill series 1 in periods 201-300 and series 2 in periods 301-400 with values growing in magnitude from 0 to -.5, and series 2 in periods 0-99 and series 3 in periods 100-199 with values declining from +.5 to 0.

It then recomputes the covariance matrix from the infilled data, with an add factor to prevent loss of total variance. The correlation between 1 and 2, and between 2 and 3, is still negative, though there is now a weak positive correlation between 1 and 3.

It then iterates to convergence.

I can’t quite visualize what it ends up with, but it must be pretty flat. I therefore don’t see that RegEM (unlike Roman’s PlanB) solves the problem of lost trend that I posted.

What do you actually get putting these 3 series into RegEM?

hello guys, can you help me? what is the advantages and disadvantages of FDM.

2nd, what is the advantages and disadvantages of CSM. Tq

Tq —

Jeff Id convinced me that while FDM solves one problem, it just creates another, and so is not the way to go. See http://climateaudit.org/2010/08/19/the-first-difference-method/#comment-240064 above.

I’m not sure what CSM stands for, offhand.

## 3 Trackbacks

[...] The First Difference Method Over on WUWT (http://wattsupwiththat.com/2010/07/13/calculating-global-temperature/), Zeke Hausfarther and Steven Mosher [...] [...]

[...] on a Geoid Geoff Sherrington and others on the First Difference Method post have requested a post for discussing [...]

[...] http://climateaudit.org/2010/08/19/the-first-difference-method/ Changes that are likely to cause a level shift in the series, such as a TOBS or equipment change or a station move, should simply be treated as the closing of the old station and the creation of a new one, thereby eliminating the need for the arcane TOBS adjustment program or a one-size-fits-all MMTS adjustment. Missing observations may simply be interpolated for the purposes of computing first differences (thereby splitting the 2 or more year observed difference into 2 or more equal interpolated differences). When these differences are averaged into the composite differences and then cumulated, the valuable information the station has for the long-run change in temperature will be preserved. (When computing a standard error for the index itself, however, it should be remembered in counting stations that the station in question is missing for the period in question). [...]