It is my understanding (correct me if I am wrong) that the following methodology applies: No stations outside of a given gridcell are directly used in the calculation of the temperature series of that gridcell.

Really? That’s a huge oversight.

]]>I think you missed the point. In Steve’s example, station 507934360010 has a weight more than 17 times as large as station 507933090000 in determining the **global** average merely because of the artificially chosen gridding pattern. That is mainly because it does not contribute anything to estimating temperatures in neighbouring gridcells (at points that may be closer to it than any station in that neighbouring cell).

Stations at the edges of the gridcells have virtually a zero contribution to the overall result!

The weighting decreases the impact of a single station, but there are proportionally more stations (greater surface area). The effect washes out, so stations in each area contribute the same amount.

]]>It is my understanding (correct me if I am wrong) that the following methodology applies: No stations outside of a given gridcell are directly used in the calculation of the temperature series of that gridcell. The series from the stations are combined using weights that are a decreasing function of the distance of the station from the center of the grid cell. If this is true, then there are serious problems with the current method.

What’s wrong? If you want to merely estimate the temperatures at the center of the grid-cell (which is an arbitrary point defined as a result of the particular grid pattern chosen), this is not necessarily a bad way to do it. However, this value is a poor indicator of the temperatures anywhere else in the cell.

There are several negative side effects. The obvious one is that stations are assigned a weight due not to their quality, but to their distance from a center point which is the result of the particular (arbitrarily chosen) gridding pattern. Notice that in Steve’s example, station 507934360010 has a weight more than 17 times as large as station 507933090000 in determining the gridcell result. Shift the boundaries of the gridcell and the weights will change, possibly reversing the weights for these two stations.
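A toy one-dimensional illustration of this point (synthetic station positions, not the actual stations; the linear taper w = 1 − d/1200 is the GISS-style form discussed later in the thread): move the gridcell center and the distance-based weights reorder.

```python
# Toy illustration: shifting the gridcell boundary changes each station's
# distance to the cell center, and hence its weight. Positions are made up.
stations = {"A": 100.0, "B": 500.0}   # station positions along a line, km

def center_weights(center_km, positions, cutoff=1200.0):
    """Weight each station by 1 - d/cutoff, d = distance to the cell center."""
    return {name: max(0.0, 1.0 - abs(pos - center_km) / cutoff)
            for name, pos in positions.items()}

w1 = center_weights(150.0, stations)   # cell centered near station A
w2 = center_weights(450.0, stations)   # grid shifted: center near station B

print(w1)  # A outweighs B
print(w2)  # ordering reversed
```

Nothing about the stations changed between the two runs; only the arbitrary grid placement did.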

The second side effect is less obvious. Suppose that we now use these gridcell values (area weighted) to calculate a “world temperature”. The weight of a station toward calculating the world temperature series is now basically the same as the weight of the station in calculating the gridcell value. Stations at the edges of the gridcells have virtually a zero contribution to the overall result! However, rotate the grid east or west a little, and you will get a different world average since the relative weights of the stations change!!! I can’t believe that the GISS folks would not be able to see this. Does it matter? At best, the error bounds would be larger than they need to be because of this type of unequal treatment of the stations. At worst, there is serious room for possible biases in the estimates. Either way, I doubt that this is a good way of doing things.

Never let it be said that we can’t be constructive here on CA. Let me suggest a better approach. A more equitable treatment of the stations should not depend on the chosen gridding pattern. First, for each point on the globe, estimate the temperature using all stations, with weights similar to the GISS method, decreasing as a function of the distance to that point (and possibly further weighted by station quality). Then, for a particular grid pattern, average the estimated temperatures over all points in the gridcell. In practice, the procedure is reasonably easy to carry out. First, the distance weighting can ignore all stations farther than some given distance, so that each point estimate is calculated from a relatively small number of stations. Secondly, averaging over the points within a gridcell reduces to two-dimensional integrals of the weight functions over the gridcell (which may have to be evaluated numerically, depending on the weight functions chosen). This needs to be done only once per gridcell, and it yields a small set of stations (some of which may lie outside the gridcell) with a corresponding set of weights for that cell. Once these are calculated, they can be used to calculate the temperature series.
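A minimal sketch of the idea, with hypothetical function names and a simple linear taper standing in for whatever weight function one actually chooses (numerical averaging over sample points replaces the exact integrals):

```python
# Sketch of grid-independent averaging: estimate temperature at many sample
# points inside the cell via distance weighting over nearby stations, then
# average the point estimates. All names and parameters are illustrative.
import numpy as np

def point_estimate(pt, station_xy, station_temp, cutoff=1200.0):
    """Distance-weighted estimate at one point; stations beyond cutoff get zero weight."""
    d = np.hypot(*(station_xy - pt).T)          # distances station -> point
    w = np.clip(1.0 - d / cutoff, 0.0, None)    # linear taper, zero past cutoff
    return np.sum(w * station_temp) / np.sum(w)

def cell_average(x0, x1, y0, y1, station_xy, station_temp, n=20):
    """Approximate the cell integral by averaging over an n-by-n grid of points."""
    xs = np.linspace(x0, x1, n)
    ys = np.linspace(y0, y1, n)
    pts = np.array([(x, y) for x in xs for y in ys])
    return np.mean([point_estimate(p, station_xy, station_temp) for p in pts])
```

Note that a station outside the cell still contributes whenever it lies within the cutoff of some point inside the cell, which is exactly the behaviour the current method lacks.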

Does this remedy the previous issues? The answer is yes. Although stations near the center of the gridcell will still have more influence in determining the cell value, stations near the edges have higher influence than they do in the current method. As well, their influence will extend into neighboring cells, so that in the overall structure each station is treated equally. Secondly, since the initial estimation was done independently of the grid pattern, the estimates of world temperatures would not depend on the grid structure chosen. Differences between HadCrut and GISS have been explained away by some as differences in the treatment of Arctic temperatures. One wonders if it goes deeper than that.

]]>I’m not sure if this is your question in the original post but HadR2 refers to Hadley SST version 2. As if you didn’t already have buckets of problems.

]]>The problem appears to be in weights0 defined in the lines:

distance0=c( 71.6541, 194.1838, 200.6309, 240.3882) #calculated distance from gridcell center

weights0=1-distance0/1200;weights0

I took another approach which often helps in decoding the coefficients of linear combinations. Fitting a regression to predict their gridcell series (target) using the given stations gives the “weights” used in the calculation. However, the monthly anomaly needs to be calculated as well. You can do both of these with an ANOVA using month as a categorical variable and the station series as the covariates. In R this can be done with the following:

month = as.factor(rep(month.abb, 54))

linmod = lm(chron[,2] ~ 0 + month + chron[,3:6], na.action = "na.exclude")

summary(linmod)

predanom = predict(linmod)

cor(predanom, chron[,2], use = "pair") # .999815

plot(chron[,1], chron[,2] - predanom, type = "l") # look at residuals

The fit is very good, with a correlation of .999815. The coefficients for the months adjust for the anomalies (though not zeroed to exactly the time period used by GISS), and the coefficients for chron[,3:6] are the weights used. This gives the following weights for the stations:

X507934360010 0.62333689

X507935460000 0.18488482

X507933730000 0.15513372

X507933090000 0.03618952

which would correspond roughly to Weight = 0.872273 - Distance/283.9 (my guess is approximately constant x (1 - Distance/250)). I would have checked these weights using your programme; however, there seem to be some errors in the hansen_combine function, in particular, in the line:

if (max(long0) reference=X[,1] #picks longest series to insert.

Hope this helps you track down what is going on.
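As a quick numerical sanity check on the fitted relation above (a sketch only, not the GISS code), the linear form reproduces the regression-derived weights to within a couple of hundredths:

```python
# Compare the regression-derived station weights with the inferred
# linear relation Weight = 0.872273 - Distance/283.9.
dist = [71.6541, 194.1838, 200.6309, 240.3882]      # km from gridcell center
w_reg = [0.62333689, 0.18488482, 0.15513372, 0.03618952]

w_lin = [0.872273 - d / 283.9 for d in dist]
resid = [abs(a - b) for a, b in zip(w_reg, w_lin)]
print([round(w, 4) for w in w_lin])
print(max(resid))   # all within a couple of hundredths
```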

]]>I haven’t looked at any of the code, but I was struck by the endian discussion. Assuming – possibly incorrectly – that this ‘problem’ has been sorted out, is it possible that differences in how compilers handle ‘negative zero’ data could be causing the delta flyers?

It might be instructive to find out what kind of computers this stuff ran on. The world of mainframes wasn’t always comprised of little-endian, two’s-complement machines, and compilers that functioned with them – and I expect NASA was loaded with various mainframes for a long time.

It may also be unwise to assume that NASA code was necessarily compiled with even the ‘then current’ versions of compilers.

]]>The originally unreported revisions a few weeks later did not have the effect of improving the organization or coherency of the code, but substituted a different USHCN data version, which happened to restore 1998 as being a titch warmer than 1934 in the U.S. Just a coincidence, I’m sure.

One has to keep an eye on versions – but there are some fairly recent versions available, and I don’t think that this is the problem.

]]>At the time, they indicated that they would shortly provide a version that would be of more interest to scientists. A few weeks later they did provide a revised version, but I am not clear that this is of any greater interest than the previous release, or whether it is actually the improved version Jim Hansen was referring to.

The versions that have been released appear to be quite old. Assuming they were originally used, they will help in understanding what they have done in the past. However, there is a possibility that they are doing something different now. Even if the steps are eventually made to run, they may not fully replicate the current output.

]]>One question is whether the GISTEMP urban adjustments do anything beyond a random permutation of data and, if that’s the case, does that matter?

I thought that was already answered by the “adjustments since 2000” graph in The Register and your previous analysis of Step 2.

If it’s not clear enough, here’s another way to clarify it:

Use the current data set.

Pick a recent but not too recent time, like Jan 1990 or 1995.

For each end date from the selected date through today:
    Run Step 2 for data up to end date
    Calculate the unweighted average ROW temperature of the Step 2 output for the selected date

Graph the selected date’s ROW temperature vs end date

If that shows a trend, then Step 2 is clearly having an effect. If it does, there’s probably merit in proceeding to the next analysis:

Use the current data set.

Pick a start date as before

For each end date from the selected date through today:
    Run Step 2 for data up to end date
    Calculate the unweighted average annual ROW temperature of the Step 2 output for all years through the selected date

For all years 1880 to the selected date:
    Calculate the best-fit slope of the adjusted temperature of that year (across the end-date runs)

Graph the slopes of the previous step (ie 1880-1990)

The Register suggests that the zero crossing is around 1970. I’d make a SWAG that for Step 2, it’s actually 1980 – the year of the census data on which Step 2 is based. Further, the slope of the final graph will tell you roughly how badly Step 2 is currently distorting the data.
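The core of both procedures is the slope fit in the last step. A minimal sketch with synthetic numbers (Step 2 itself is not run here; `drift_slope` is a hypothetical helper): for a fixed selected date, collect its adjusted value from each end-date run and fit a least-squares slope. A nonzero slope means the adjustment keeps rewriting the past as new data arrives.

```python
# Fit a least-squares slope to a selected date's adjusted temperature
# as a function of the run's end year. Data below are synthetic.
import numpy as np

def drift_slope(end_years, adjusted_at_selected_date):
    """Least-squares slope of the selected date's adjusted value vs run end year."""
    return np.polyfit(end_years, adjusted_at_selected_date, 1)[0]

# Hypothetical example: each yearly rerun warms Jan 1990 by 0.002 deg C.
years = np.arange(1995, 2008)
vals = 0.31 + 0.002 * (years - 1995)
print(round(drift_slope(years, vals), 6))   # a clearly nonzero drift
```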

]]>