Yesterday I described the work done to the surface station records in Hansen Step 2 in preparation for adjusting urban stations to match the trend of nearby rural stations. The basic substeps are:
- Deciding which stations are rural and which are urban. The methodology used for most of North America differs from that applied to the rest of the world.
- Sorting the rural stations by record length
- Identifying rural stations that are near the urban station, where “near” means within 500 km. Failing that, the radius can be extended to 1000 km, roughly the distance from New York City to Indianapolis, IN.
- After the nearby stations are identified, they are combined into a single series, beginning with the series that has the longest record.
- The urban series is subtracted from the combined rural series.
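The selection and differencing substeps above can be sketched in Python as follows. This is a minimal illustration, not GISS code. The station layout, the function names (`great_circle_km`, `nearby_rural`, `overlap_difference`), the use of the haversine great-circle distance, and the reading of “failing that” as “no rural station at all within 500 km” are all assumptions. Combining the rural stations into one series is not shown, because the method is not described here; `overlap_difference` takes the already-combined series.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def nearby_rural(urban, rural_stations, radii_km=(500.0, 1000.0)):
    """Rural stations within the first radius that finds any, longest record first.

    Stations are hypothetical dicts with 'id', 'lat', 'lon' and 'data' ({year: value}).
    ASSUMPTION: the 1000 km radius is used only if nothing lies within 500 km.
    """
    for radius in radii_km:
        near = [s for s in rural_stations
                if great_circle_km(urban["lat"], urban["lon"], s["lat"], s["lon"]) <= radius]
        if near:
            # Record length approximated as the number of years with data.
            return sorted(near, key=lambda s: len(s["data"]), reverse=True)
    return []

def overlap_difference(rural, urban):
    """Combined-rural minus urban, for years where both {year: value} series have data."""
    years = sorted(y for y in rural
                   if y in urban and rural[y] is not None and urban[y] is not None)
    return years, [rural[y] - urban[y] for y in years]
```

The returned years are only those with valid values in both series, which corresponds to the n years used in the fit described below.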
The overlap period of this new series is passed to a FORTRAN program called getfit, which uses regression analysis to fit a line with a single break in slope somewhere in the middle (referred to as the knee). It returns the slopes of the two line segments along with the coordinates of the knee. The following image is an example of what this program is trying to do.
The algorithm iterates through every year of the overlap except the first five and the last five, trying each as the knee in its search for the best broken-line fit to the curve.
Each knee is processed through the fitting algorithm, which returns the two line slopes, the temperature value of the knee (y-value), and an RMS value for the fit. If the resulting RMS is smaller than the previous smallest RMS, the old slopes and knee are discarded in favor of the new values. At the end of the iteration process, the best knee and slopes will have been selected for this particular curve.
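The knee search can be sketched as below. The fixed-knee fit is an ordinary least-squares fit of a continuous broken line (with the knee counted on the left side), solved from its normal equations. It illustrates the idea but is not a transcription of getfit, whose exact slope formulas may differ. `fit_fixed_knee` and `best_knee` are hypothetical names, and the candidate knees are assumed to be every calendar year from the sixth through the sixth-to-last year of the overlap.

```python
import math

def fit_fixed_knee(xs, ys, xk):
    """Least-squares broken-line fit with the knee fixed at year xk.

    Model (knee counted on the left side):
        y = yk + ml*(x - xk)  for x <= xk
        y = yk + mr*(x - xk)  for x >  xk
    Returns (ml, mr, yk, rms), or None if either side has no spread.
    """
    n = len(xs)
    left = [min(x - xk, 0) for x in xs]   # nonzero only left of the knee
    right = [max(x - xk, 0) for x in xs]  # nonzero only right of the knee
    sa, sb, sy = sum(left), sum(right), sum(ys)
    saa = sum(a * a for a in left)
    sbb = sum(b * b for b in right)
    say = sum(a * y for a, y in zip(left, ys))
    sby = sum(b * y for b, y in zip(right, ys))
    if saa == 0 or sbb == 0:
        return None
    # The left and right regressors are never both nonzero for one year, so the
    # 3x3 normal equations reduce to one equation in yk plus back-substitution.
    denom = n - sa * sa / saa - sb * sb / sbb
    if abs(denom) < 1e-12:
        return None
    yk = (sy - sa * say / saa - sb * sby / sbb) / denom
    ml = (say - sa * yk) / saa
    mr = (sby - sb * yk) / sbb
    resid = [y - (yk + ml * a + mr * b) for y, a, b in zip(ys, left, right)]
    rms = math.sqrt(sum(r * r for r in resid) / n)
    return ml, mr, yk, rms

def best_knee(xs, ys, margin=5):
    """Try each year except the first and last `margin` as the knee; keep the lowest RMS."""
    best = None
    for xk in range(xs[0] + margin, xs[-1] - margin + 1):
        fit = fit_fixed_knee(xs, ys, xk)
        if fit is not None and (best is None or fit[3] < best[4]):
            best = (xk,) + fit
    return best  # (knee_year, ml, mr, yk, rms), or None if no knee could be fitted
```

Fed a series that is exactly a broken line with its knee at 1920, `best_knee` should recover that knee year, both slopes, and yk, with an RMS of essentially zero.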
The two slopes returned are the left slope (ml) and the right slope (mr). During the calculations, the knee is considered to be part of the “left-side” data. The formulas for calculating the slopes are:
- x is a year in the overlap period.
- y is the temperature value for the year x.
- n is the count of years in the overlap period with valid y values.
- Variables with a subscript l represent data to the left of, or including, the knee. A subscript r represents data to the right of the knee.
The y-value of the knee, yk, is found using the following:
RMS is calculated using yk:
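For reference, one conventional way to write the broken-line model and its RMS error follows, with x_k denoting the knee year (a symbol introduced here) and the sum taken over the n overlap years with valid y values. This is a standard textbook form consistent with the description above, not necessarily the getfit formula itself; in particular, getfit may normalize the RMS differently.

$$
\hat{y}(x) =
\begin{cases}
y_k + m_l\,(x - x_k), & x \le x_k \\
y_k + m_r\,(x - x_k), & x > x_k
\end{cases}
\qquad
\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{x}\bigl(y - \hat{y}(x)\bigr)^{2}}
$$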
The left and right slopes are now used to adjust the urban record, iterating over the years of overlap between the urban and combined rural records. For years up to and including the “knee year”, the adjustment to the urban record (rounded to the nearest integer) is:
For years after the “knee year”, the adjustment to the urban record (rounded to the nearest integer) is:
Finally, the adjustment is added to the urban record, producing the homogenized urban record. Because the UHI effect should make urban readings warmer than those of their rural neighbors, one would expect the adjustment values to be largely negative in order to remove it.
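A minimal sketch of the per-year adjustment, under an explicit assumption: that the adjustment for a year is the value of the fitted broken line at that year, using ml up to and including the knee year and mr after it. The helper names `nint` and `urban_adjustments` are hypothetical. The rounding to integers suggests the record is held in integer units (for example, tenths of a degree), which is also an assumption. Python's built-in `round()` rounds halves to even, so the sketch rounds half away from zero, as Fortran's NINT does.

```python
import math

def nint(v):
    """Round half away from zero, like Fortran NINT (Python's round() rounds half to even)."""
    return int(math.floor(abs(v) + 0.5)) * (1 if v >= 0 else -1)

def urban_adjustments(years, knee_year, ml, mr, yk):
    """Hypothetical helper: integer adjustment for each overlap year.

    ASSUMPTION: the adjustment is the broken-line value at that year, using the
    left slope up to and including the knee and the right slope after it.
    """
    adj = {}
    for x in years:
        slope = ml if x <= knee_year else mr
        adj[x] = nint(yk + slope * (x - knee_year))
    return adj
```

For example, with a knee in 1920, yk = -3.0, ml = 0.5 and mr = -1.0, the years 1910, 1920 and 1930 receive adjustments of -8, -3 and -13.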
Each year's adjustment is applied to the months from December of the previous year through November of the current year. This is in line with GISS reporting annual temperatures on a winter-through-fall (December to November) cycle.
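The December-through-November bookkeeping can be sketched like this, assuming a hypothetical monthly record stored as a `{(year, month): value}` dict in the same integer units as the adjustments. `apply_adjustments` is an illustrative name, and months missing from the record stay missing.

```python
def apply_adjustments(monthly, adjustments):
    """Add each year's adjustment to Dec of the previous year through Nov of that year.

    `monthly` is a hypothetical {(year, month): value} dict (month 1-12); `adjustments`
    maps year -> integer adjustment. Returns a new dict and leaves the input untouched.
    """
    out = dict(monthly)
    for year, adj in adjustments.items():
        # The GISS "year" runs from December of year-1 through November of year.
        months = [(year - 1, 12)] + [(year, m) for m in range(1, 12)]
        for key in months:
            if key in out:
                out[key] += adj
    return out
```

With an adjustment of -5 for 1920, December 1919 through November 1920 are lowered by 5, while November 1919 and December 1920 are unchanged.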
There is a special case that seems to extend the range of adjustable years beyond the period of overlap. I do not fully understand what is going on yet, nor do I know whether the case ever actually occurs. For now, I believe the above fairly summarizes the general case.
If I am able to determine what is going on in the special case, I will post the results here. Understanding it is complicated by the fact that the case is handled across several programs, and the variable names are not only unclear but also inconsistent.
For now, however, I intend to take a fresh look at Cedarville to try to understand what is happening in that urban, one-stoplight town.
Steve: It’s worth comparing John’s analysis above to http://www.climateaudit.org/?p=2095