Thanks for the offer. The three papers related to the undocumented changepoints are the ones I’m interested in, especially Lund and Wang, since they deal with changepoints in trends. This is what they’re planning on using in V2, so I’d like to get a head start on understanding what we’re going to see.

Alexandersson, H. and A. Moberg, 1997: Homogenization of Swedish temperature data. Part I: Homogeneity test for linear trends. Int. J. Climatol., 17, 25-34.

Wang, X.L., 2003: Comments on “Detection of undocumented changepoints: A revision of the two-phase model”. J. Climate, 16, 3383-3385.

Lund, R., and J. Reeves, 2002: Detection of undocumented changepoints: a revision of the two-phase regression model. J. Climate, 15, 2547-2554.

I’ve got most of the papers – which ones are you looking for?

http://www.ncdc.noaa.gov/oa/climate/research/ushcn/

The homogenization algorithm is described as:

1. First, a series of monthly temperature differences is formed between numerous pairs of station series in a region. The difference series are calculated between each target station series and a number (up to 40) of highly correlated series from nearby stations. In effect, a matrix of difference series is formed for a large fraction of all possible combinations of station series pairs in each localized region. The station pool for this pairwise comparison of series includes U.S. HCN stations as well as other U.S. Cooperative Observer Network stations.

2. Tests for undocumented changepoints are then applied to each paired difference series. A hierarchy of changepoint models is used to distinguish whether the changepoint appears to be a change in mean with no trend (Alexandersson and Moberg, 1997), a change in mean within a general trend (Wang, 2003), or a change in mean coincident with a change in trend (Lund and Reeves, 2002). Since all difference series consist of values from two series, a changepoint date in any one difference series is temporarily attributed to both station series used to calculate the differences. The result is a matrix of potential changepoint dates for each station series.

3. The full matrix of changepoint dates is then “unconfounded” by identifying the series common to multiple paired-difference series that have the same changepoint date. Since each series is paired with a unique set of neighboring series, it is possible to determine whether more than one nearby series share the same changepoint date.

4. The magnitude of each relative changepoint is calculated using the most appropriate two-phase regression model (e.g., a jump in mean with no trend in the series, a jump in mean within a general linear trend, etc.). This magnitude is used to estimate the “window of uncertainty” for each changepoint date since the most probable date of an undocumented changepoint is subject to some sampling uncertainty, the magnitude of which is a function of the size of the changepoint. Any cluster of undocumented changepoint dates that falls within overlapping windows of uncertainty is conflated to a single changepoint date according to

1. a known change date as documented in the target station’s history archive (meaning the discontinuity does not appear to be undocumented), or

2. the most common undocumented changepoint date within the uncertainty window (meaning the discontinuity appears to be truly undocumented).

5. Finally, multiple pairwise estimates of relative step change magnitude are re-calculated at all documented and undocumented discontinuities attributed to the target series. The range of the pairwise estimates for each target step change is used to calculate confidence limits for the magnitude of the discontinuity. Adjustments are made to the target series using the estimates for each discontinuity.
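The pairwise scheme in steps 1–2 can be sketched in a few lines. This is a hedged illustration, not NCDC’s code: `difference_series` and `best_mean_shift` are hypothetical names, and the max-t scan implements only the simplest model in the hierarchy (a jump in mean with no trend), standing in for the full Alexandersson–Moberg / Wang / Lund–Reeves battery:

```python
# Hedged sketch of steps 1-2 (NOT NCDC's actual code; names are made up).
import numpy as np

def difference_series(target, neighbor):
    """Step 1: monthly target-minus-neighbor difference series."""
    return np.asarray(target, dtype=float) - np.asarray(neighbor, dtype=float)

def best_mean_shift(diff):
    """Step 2, simplest model in the hierarchy: scan every split point and
    return the one maximizing a two-sample t statistic for a jump in mean
    (no trend). A large t suggests an undocumented changepoint."""
    n = len(diff)
    best_t, best_k = 0.0, None
    for k in range(2, n - 2):  # keep at least 2 points per segment
        a, b = diff[:k], diff[k:]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        if se == 0.0:
            continue
        t = abs(a.mean() - b.mean()) / se
        if t > best_t:
            best_t, best_k = t, k
    return best_k, best_t
```

Because both series share the regional climate signal, that signal cancels in the difference, leaving station-specific breaks as the dominant feature — which is the point of the pairwise design.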
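Steps 3–4 (unconfounding, then conflating dates within overlapping uncertainty windows) could look roughly like this sketch. The fixed `window` parameter is my simplification: per the description above, the real window width depends on the changepoint magnitude.

```python
# Hedged sketch of steps 3-4 (illustrative only; the fixed +/- months
# "window" stands in for the magnitude-dependent window of uncertainty).
from collections import Counter

def conflate(dates, window, documented=()):
    """Cluster changepoint dates whose uncertainty windows overlap, then
    resolve each cluster to (1) a documented change date if one lies in
    the window, else (2) the most common undocumented date."""
    dates = sorted(dates)
    clusters, current = [], [dates[0]]
    for d in dates[1:]:
        if d - current[-1] <= window:
            current.append(d)   # windows overlap: treat as one discontinuity
        else:
            clusters.append(current)
            current = [d]
    clusters.append(current)
    resolved = []
    for c in clusters:
        docs = [d for d in documented if c[0] - window <= d <= c[-1] + window]
        resolved.append(docs[0] if docs else Counter(c).most_common(1)[0][0])
    return resolved
```

For example, `conflate([118, 119, 120, 120, 121, 200, 201], window=3, documented=(200,))` returns `[120, 200]`: the cluster around month 120 collapses to its modal date, while the 200/201 pair snaps to the documented change at 200.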

For UHIE they state:

“the change-point detection algorithm effectively accounts for any ‘local’ trend at any individual station. In other words, the impact of urbanization and other changes in land use is likely small in HCN version 2.”

If I’m following this correctly, they are looking for discontinuities relative to the nearby sites; but if all the sites are showing UHIE, then how could this method adjust for it? The best you could get, I would think, would be something that looked like the average of the stations’ trend lines. Or is the assumption that UHIE appears only as discontinuities in the data? It would seem that if the majority of a data set is bad, it would contaminate the whole set, “correcting” the good sites based on bad data.
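To make that worry concrete with toy numbers (synthetic, not station data): any signal shared by the target and its neighbors, trend included, vanishes from the difference series, so a purely relative test has nothing to detect.

```python
# Toy numbers only: a trend common to target and neighbor cancels in the
# difference series, so a relative changepoint test sees a flat line.
import numpy as np

months = np.arange(360)
shared_trend = 0.002 * months          # e.g. region-wide urban warming
target = 10.0 + shared_trend
neighbor = 12.0 + shared_trend         # different mean, identical trend
diff = target - neighbor               # constant -2.0: the trend is gone
print(np.ptp(diff))                    # peak-to-peak spread ~ 0
```

A changepoint test applied to `diff` finds nothing to adjust, which is exactly the problem if every neighbor shares the contamination.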

They reference a number of papers for their algorithm, but I can’t get to them since they’re not available without paying. You would think they would have the algorithms online, if not the papers. Although, actually, I’d prefer to see the code.

That’s the way I read it. Gobsmacks ya, doesn’t it?

Is that adjustment in the second figure what I think it is? Is this added to the raw data? Are they really dropping the temperature by 4 degrees from 1909 to 1989 then increasing the temperature by 5 degrees from 1989 to 2005?

You can find a link to Peterson 2006 over on Pielke. Basically, Pielke/Davey wrote about some questionable sites in Colorado (Anthony, your inspiration, right?). So Peterson came back, analyzed the Pielke sites, and concluded “no microsite effects.”

The argument turns on homogeneity adjustments, which are opaque to me at this point; I only did a quick skim. SteveM the greater indicated that he may post something on it. Peterson was “somewhat” careful not to generalize his conclusions, but mumbled about a weight of evidence building. However, toward the end Peterson says this (paraphrasing): “Although I showed this doesn’t matter, thanks to Pielke and Davey. And this in no way excuses shoddy sites.”

http://icecap.us/images/uploads/CENTRAL_PARK_TEMPERATURE_COMPARISON.pdf
