I’ve discussed “mixed effects” methods from time to time in paleoclimate contexts, observing that this statistical method known off the Island can provide a context for some paleoclimate recipes, e.g. in making tree ring chronologies. This would make a pretty good article.
Another interesting example of this technique, which would also make a pretty good article, is placing Hansen’s rather ad hoc “reference” period method in a statistical framework using a mixed effect method. I’ve done an experiment on the gridcell containing Wellington NZ with interesting results.
The details of Hansen’s “reference method” are not fully described in Hansen et al 1999, 2001, but should be decodable from code placed online in the past year (though some details of the code remain impenetrable to me; I’m chipping away at them).
For this particular gridcell calculation with 250 km smoothing, there are 4 contributing stations, so the starting point is 4 monthly adjusted station series. Let’s stipulate the adjustments for the purposes of the present calculations, though the urban adjustments are themselves questionable. Hansen’s “recipe” for his reference period is to start with the longest series, then arrange the other series in descending order of length. For the second series, he calculates the average difference by month over the overlap period where both series have values; this is called the “bias”. The “bias” is subtracted from the second series and a weighted average of the two series is made according to their weights (which depend on the distance from the gridcell center), using whatever values are available. This process is repeated for each remaining series, and then the final series is converted to an anomaly centered on 1951-1980.
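To make the recipe concrete, here is a minimal sketch in R of one combining step as I have described it above. It is my reading of the procedure, not Hansen's code: the function name combine_two and its arguments are hypothetical, and the station weights are simply taken as given.

```r
# One "reference method" combining step, as described in the text (a sketch,
# not GISS code). ref, new: monthly series of equal length, NA where missing.
# month: calendar month (1..12) of each element. w_ref, w_new: assumed weights.
combine_two <- function(ref, new, month, w_ref, w_new) {
  overlap <- !is.na(ref) & !is.na(new)
  # "bias": mean of (new - ref) by calendar month over the overlap period
  bias <- tapply((new - ref)[overlap],
                 factor(month[overlap], levels = 1:12), mean)
  new_adj <- new - as.numeric(bias)[month]
  # weighted average using whatever values are available
  ok_ref <- !is.na(ref)
  ok_new <- !is.na(new_adj)
  num <- ifelse(ok_ref, w_ref * ref, 0) + ifelse(ok_new, w_new * new_adj, 0)
  den <- w_ref * ok_ref + w_new * ok_new
  out <- num / den
  out[den == 0] <- NA
  out
}

# Toy check: four successive Januaries, second station reads 2 deg warmer
ref   <- c(10, 11, 12, NA)
new   <- c(NA, 13, 14, 15)
month <- c(1, 1, 1, 1)
combined <- combine_two(ref, new, month, w_ref = 1, w_new = 1)
combined
```

Repeating this step for the third and fourth stations, and then subtracting the 1951-1980 monthly means, would give the final anomaly series. Note that the estimated "bias" depends on the order in which stations are added, which is one reason the result need not match a simultaneous fit.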
At the end of the day, they have, in effect, estimated 48 deltas (4 stations x 12 months) plus an anomaly for each year. Hansen argues that this “reference period” method is a better utilization of available data than requiring a fixed anomaly period as in the CRU method.
Now watch the following calculation, about which I’ll comment more at the end. First we import the test data: the 4 stations and Hansen’s gridded data (all excerpted from Hansen data).
Next, I make a data frame that is more organized for statistical analysis – a “long” array in which each row contains the date in decimal years (e.g. 1955.42), the station id, the month (as a factor) and the adjusted temperature. Each measurement goes on its own row.
Network = data.frame(id = c(t(array(rep(name0, N), dim = c(M, N)))), year = rep(year, M), month = month, chron = c(chron))
This dataset will look something like this:
id year month chron
1 X507934360010 1938.000 1 NA
2 X507934360010 1938.083 2 NA
3 X507934360010 1938.167 3 NA
4 X507934360010 1938.250 4 NA
Now for an important trick. To do the sort of analysis that I wish to do, we have to create a nested factor for each of the 12 months for each station. This can be done as follows (a technique noted in Bates 2005 article in R-News):
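One way to build such a factor, shown here as a minimal sketch on a toy data frame rather than the Wellington data: when both id and month are factors, the `:` operator returns their interaction, with levels of the form station:month. The station ids and values below are made up for illustration.

```r
# Toy long-format frame: 2 hypothetical stations x 24 months (simulated values)
set.seed(1)
ids <- c("X001", "X002")   # hypothetical station ids
N   <- 24                  # months per station
toy <- data.frame(
  id    = factor(rep(ids, each = N)),
  year  = rep(1938 + (0:(N - 1)) / 12, times = length(ids)),
  month = factor(rep(rep(1:12, length.out = N), times = length(ids))),
  chron = rnorm(N * length(ids), mean = 12, sd = 3)
)

# Station-by-month factor: one level per (station, month) pair;
# [drop = TRUE] discards any combinations that never occur
toy$IM <- with(toy, id:month)[drop = TRUE]

head(toy, 4)
nlevels(toy$IM)   # 2 stations x 12 months
```

The same assignment applied to Network (after making sure id and month are factors) should give an IM column like the one shown next.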
Now the data frame looks like this – the IM factor becomes very useful:
id year month chron IM
1 X507934360010 1938.000 1 NA X507934360010:1
2 X507934360010 1938.083 2 NA X507934360010:2
3 X507934360010 1938.167 3 NA X507934360010:3
4 X507934360010 1938.250 4 NA X507934360010:4
Now we can calculate a statistical model that does in one line everything that Hansen does in pages – only better – using the lme4 package (also loaded here):
fm.month = lmer(chron~(1|id)+(1|IM)+(1|year),data=Network)
The object fm.month contains random effects for both the station:month combinations and each time period, all calculated according to an algorithm known off the Island. The values in ranef(fm.month)$year compare to the Hansen gridded anomalies as shown in the graphic below (the scale of the difference in the third panel differs from that of the upper two panels). The lmer random effect and the Hansen reference-method anomaly have a 0.975 correlation (which is about as high as I’ve been able to get with any actual emulation of Hansen’s method, BTW). In this particular case, the Hansen reference method turns out to increase the trend by 0.05 deg C/decade relative to the lmer estimate, or about 0.25 deg C over the 50 years shown here. I don’t know whether this occurs in other gridcells; it’s the sort of thing that might be worth returning to on some occasion. (The urban adjustment to the Wellington NZ series had also caused an increase in the trend.)
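For readers who want to try the idea without the station files, here is a self-contained sketch on simulated data. Everything in it is an assumption made for illustration (4 made-up stations, 240 months, made-up variances); it only shows that the (1|year) random effect recovers a common signal buried under station and station:month offsets, even when the stations start at different times.

```r
library(lme4)  # assumes the lme4 package is installed

set.seed(42)
M <- 4; N <- 240                                  # hypothetical stations, months
id    <- factor(rep(paste0("S", 1:M), each = N))
tt    <- rep(1938 + (0:(N - 1)) / 12, times = M)  # decimal year
month <- factor(rep(rep(1:12, length.out = N), times = M))
IM    <- (id:month)[drop = TRUE]                  # station:month factor

time_eff <- rnorm(N)                              # "true" common anomaly per period
st_eff   <- rnorm(M)                              # station offsets
im_eff   <- rnorm(nlevels(IM), sd = 3)            # station:month offsets
chron <- 12 + st_eff[as.integer(id)] + im_eff[as.integer(IM)] +
  rep(time_eff, times = M) + rnorm(M * N, sd = 0.3)
chron[id == "S3" & tt < 1945] <- NA               # ragged starts, as with real stations
chron[id == "S4" & tt < 1950] <- NA

toy <- data.frame(id, year = tt, month, IM, chron)

# Same model formula as above; REML = FALSE requests maximum likelihood
fm   <- lmer(chron ~ (1 | id) + (1 | IM) + (1 | year), data = toy, REML = FALSE)
anom <- ranef(fm)$year[, "(Intercept)"]           # estimated effect per time period
cor(anom, time_eff)
```

One caveat on terminology: lmer fits by REML unless REML = FALSE is passed, so the sketch sets it explicitly to get maximum-likelihood fits. With only 4 stations the (1|id) variance is poorly determined, and lmer may print a singular-fit message on some data.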
I know that some critics have been screeching about the pointlessness of examining what Hansen actually does and urging me to study other “more important” problems. In my opinion, before making suggestions, it’s a good idea to understand what people are doing or trying to do. In this particular step, Hansen’s reference period method can be construed as a way of constructing estimates of the various quantities estimated through the above method (all neatly packaged in the lmer-object fm.month). In this particular case, Hansen’s calculations, viewed as statistical estimates, do not coincide with the maximum-likelihood estimates, which are arrived at in the lmer calculation. At the present time, it’s hard to tell whether the Hansen method would tend to cause the introduction of an upward trend relative to the maximum-likelihood calculation (as in the above case) or whether the trend differences from maximum likelihood estimates are randomly distributed between positive and negative values.
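The trend difference quoted earlier (0.05 deg C/decade, or about 0.25 deg C over 50 years) is the sort of number obtained by regressing the difference between two anomaly series on time. A minimal sketch, using synthetic stand-ins for the two series rather than the actual Wellington results:

```r
# Synthetic stand-ins for two anomaly series over 50 years (not real data)
set.seed(7)
year     <- 1950 + (0:599) / 12
common   <- rnorm(600, sd = 0.4)
series_a <- common                            # stand-in for the mixed-effects anomaly
series_b <- common + 0.005 * (year - 1975)    # same series plus a 0.005 deg C/yr drift

d <- series_b - series_a
slope_per_year   <- unname(coef(lm(d ~ year))["year"])
trend_per_decade <- 10 * slope_per_year       # deg C per decade
total_over_50yr  <- 5 * trend_per_decade      # deg C over five decades
c(trend_per_decade, total_over_50yr, cor(series_a, series_b))
```

Running the same regression gridcell by gridcell would show whether the differences lean one way or scatter around zero.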
However, I think that the above example is sufficient to show that the Hansen “reference period” method does not yield a maximum likelihood estimate. Once the issue is framed in a proper statistical format, it’s hard to think of any particular reason for using the Hansen “reference” method. As noted above, it’s possible that errors introduced by this method are random and cancel out, but that would be sheer good luck and not a justification of the method.