I’ve transliterated relevant Tapio Schneider code into R (pttls.m) and parts of regem.m that seem relevant at present. Jeff Id has extracted a variety of intermediates from his Matlab run and I’ve fully reconciled through two steps with remaining differences appearing to be probably due to transmission rounding. My dXmis statistic at step one was 1.258116 (Jeff – 1.258) and at step two was 0.4673552 (Jeff – 0.4674). I’ve posted a detailed turnkey script for the reconciliationhere (something that I do from time to time to make sure that I don’t inadvertently modify a reconciliation – scripts which are unexciting but time consuming.)
I’ve added in a few features that I’ll use for analysis purposes. RegEM calculates a bazillion coefficients – well, maybe not quite a bazillion – in the case at hand, 1,341,662 coefficients per iteration. Given that there are only 23,449 actual measurements between the AWS and surface stations in the 1957-2006 period, this is a yield of over 57 coefficients per measurement per iteration, which seems like a high yield and one that might give rise to simplification.
In particular, given that the “money” series is the reconstruction trend, this means that the station and AWS reconstructions end up being combined some way. By doing the algebra in a little different order (merely the commutative property of matrix algebra), we should be able to get to weighting factors using this information. But that’s a little distance in the future.
Today, I want to show a couple of graphics from the first two steps, in which I calculated the proportion of positive coefficients to total coefficients by year. Given that we’re dealing with millions of coefficients, these coefficients have to be considered statistically – something that you see in random matrix theory, which we’ve alluded to from time to time. Here they are. The proportions are not identical (I checked), but there isn’t much change in the first step.
What does this mean? Dunno but, intuitively, there are a couple of things that I don’t like.
First, I don’t like the idea that 1 out of 3 coefficients is negative. When things are eventually recombined into an Antarctic average, these negative coefficients will express themselves as upside down contributions from some stations – which surely can’t be a good thing.
Second, I don’t like the non-stationarity in the proportion of positive coefficients. Why does the proportion of positive coefficients decrease so sharply from the early 1980s to 2000? Does it “matter”? Dunno. This is the sort of thing that IMO makes it very undesirable to use poorly understood methods (such as TTLS RegEM with RegPar=3) for important applied results, particularly when the new results seem to yield “better” answers than earlier results with better understood methods (GISTEMP, Monaghan) and the differences are not clearly reconciled.
Anyway, I should have an end-to-end port of this method to R later today or tomorrow.