I’ve now ported my emulation of Schneider’s RegEM PTTLS to R and benchmarked it against Jeff’s Matlab as shown below. I caution readers that this is just an algorithm. There are other ways of doing regressions and infills. The apparent convergence to three PCs noted by Roman is still pending as a highly interesting phenomenon. There are other RegEM flavors in Schneider’s algorithm, but I haven’t bothered trying to emulate these as it’s the PTTLS version that seems to be in play right now.

The method below retains coefficients from the final iteration, which I plan to use for further analysis. Here’s a short benchmark of the *regem_pttls* function.

First load functions

source(“http://data.climateaudit.org/scripts/regem/regem.functions.txt”)

Then load the data (also with a collation of Matlab intermediates)

#this uses BAS scrape for surface and reverse-engineered Steig recon_aws for AWS. Two surface series need to be deducted (Marion, Gough)

#version here is not quite file compatible with Steig. See Nic L on Dec 2002 problem

download.file(“http://data.climateaudit.org/data/steig/Data.tab”,”Data.tab”,mode=”wb”)

load(“Data.tab”)

download.file(“http://data.climateaudit.org/data/steig/Info.tab”,”Info.tab”,mode=”wb”)

load(“Info.tab”)

download.file(“http://data.climateaudit.org/data/steig/recon_aws.tab”,”recon_aws.tab”,mode=”wb”)

load(“recon_aws.tab”)

dimnames(recon_aws)[[2]]=Info$aws_grid$id

download.file(“http://data.climateaudit.org/data/steig/jeff.tab”,”temp.dat”,mode=”wb”);load(“temp.dat”) #matlab compilation#surface

surf=Data$surface

surf=window(surf,start=1957,end=c(2006,12)) #ends in 2006 per Steig

surf = surf[,-26] #deletes column 26 – Marion

surf = surf[,-17] #deletes column 17 – Gough

dim(surf) #600 42

sanom=as.matrix(surf)

for (i in 1:42) sanom[,i]=anom(surf[,i],reference=1957:2007)

#Steig probably used different reference period and this can be experimented with later

(ny=ncol(sanom)) #42#AWS reverse engineered (rather than READER scrape)

anoms =window(recon_aws, start = 1980);dim(anoms) #324 63

dat_aws = window(Data$aws[,colnames(recon_aws)],start=1980,end=c(2006,12));

dim(dat_aws) #324 63

anoms[is.na(dat_aws)] =NA

#apply(anoms,2,mean,na.rm=T) #some are zero, some aren’t

(nx=ncol(anoms)) #63# combine anomalies

anomalies=ts.union(anoms,sanom)

dimnames(anomalies)[[2]]=c( dimnames(anoms)[[2]],dimnames(sanom)[[2]])

dim(anomalies) #600 105

Now do the iteration through the function below for 5 iterations (it takes a while (: )

regemtest=regem_pttls(X=jeff$X,maxiter=5,regpar=3)

Now do a quick comparison to Matlab results:

test=unlist(sapply(regemtest,function (A) A$dXmis) )

data.frame(test=test,jeff=jeff$dXmis[1:length(test),3])

Match perfectly. Wasn’t that easy.

# test jeff

#1 1.25811556 1.25800

#2 0.46735524 0.46740

#3 0.31128378 0.31130

#4 0.18168994 0.18170

#5 0.11207077 0.11210

#6 0.08017931 0.08018

To do a run on (say) surface stations by themselves with regpar=2, just do this

regem.surface=regem_pttls(X=jeff$X[,64:105],maxiter=5,regpar=2)

To get station-by-station RegEM trends (whatever they may prove to be):

trend=function(x) lm(x~ I((1:length(x))/12))$coef[2]

trend.ant= apply(regem.surface[[21]]$X,2,trend)

## 12 Comments

I had to remove the line

surf < – surf[,-26] #deletes column 26 – Marion

to get the script to run.

Martin,

Get rid of the space between the less-than symbol and the dash and try it again.

#1. That’s a WordPress artifact. I changed to = to avoid the problem.

Steve, can’t you just link to a file containing the scripts rather than attempting to embed them in blog post text? It seems that such scripts are often munged after running the WordPress guantlet.

Re: joshv (#4), “running the WordPress guantlet.”

Visualizing a printing press which involves steel gauntlets…no wonder things get munged.Go stattos!

Think how much better it would be if all climate research pieces that use statistics were subject to peer review (and testing) by statisticians. Whatever the outcome we could feel reassured by the robustness of those papers that passed this process.

snip -editorialing

I have been running the AWS reconstruction using Steve’s R RegEM code, and there are a few things that I would welcome help with and/or an explanation of:

1) The initial values of non-missing observations don’t seem to be preserved, which I thought RegEM was meant to do. For instance, the temperature anomaly for Cape King (ID 7351) December 2004 (anomalies[576,5]) is 1.58125. Looking at the values of the data matrix X output by the regem_pttls function (by applying unlist(sapply(test,function (A) A$X) ) to its output), after iteration 1 the value is 1.675311, after iteration 2 1.724464, after iteration 3 1.755643, after iteration 25 1.805082 and after iteration 51 1.796506.

I see that the column means of the regem output are zero, and I was wondering if re-centering of them on each iteration could explain the drift of the values of non-illed in data?

NB the initial column mean for Cape King is zero, but it is non zero for a number of the other AWS due to the inclusion of various data points (in December 2002, etc) that Steig did not use and to the issue of what period he averaged AWS data over, as per previous posts of mine.

2) When, in order to carry out further iterations, I try to run regem_pttls on its own output (after applying unlist(sapply(test,function (A) A$X), I get the message:

“Error in while (iter tol) { :

missing value where TRUE/FALSE needed”.

The script I used was:

test=regem_pttls(X=anomalies,maxiter=2,regpar=3)

aa=test[[3]]

dimnames(aa$X)=dimnames(anomalies)

bb=aa$X

test2=regem_pttls(X=bb,maxiter=2,regpar=3)

3) I don’t seem to be getting the AWS regem reconstruction to converge anywhere near accurately on Steig’s reconstruction (using uncorrected data for both). With maxiter=50 (which so far as I can see actually involves 51 applications of the algorithm), the excess of the regem output and Steig’s reconstruction is at lowest -3.05 and at highest 3.59, albeit the average standard deviation is not that high at 0.23. May I ask if Steve or anyone else has obtained better results – perhaps I am doing something wrong?

Re: Nic L (#7),

yes, it’s recentered after the splice. That would probably account for the drift.

#7. Nic, I’m not presently in a position to provide support or explanation on this. I’ve just established my own foothold. I’ve only tried to reconcile to Jeff Id’s Matlab run so far – reconciling to Steig is another project as we don’t know what “adaptations”, if any, Steig might have done.

#7. it looks like this diagnostic is thrown up if you try regem_pttls with no missing values.

3) I’ve only reconciled to Jeff Id. I’ll compare to AWS next.

Steve,

.

In your regem.functions.txt, you have 2 versions. One is the regem_pttls function (which calls pttls) and the other is the regem_pca function, which seems to be self-contained. I was wondering what the difference was between the two.

.

I ran both using the cloudmasked data from Comiso and the 42 surface stations (reader.steig.anom):

.

The above is how I get the PCs to place next to the surface stations for running RegEM. The script is based off Jeff Id’s script, with some simplifications.

.

Next, I use the first member of the steig.PCs list – “ts.union(surf, PCs[, 1:PC.max])” – which is the 42 surface stations plus the 3 PCs – as an input to both regem_pttls and regem_pca:

.

.

The pttls version took about 5 times as long, but only took 28 iterations to converge. The pca version took 46 iterations to converge, but was much faster.

.

I then extracted the last iteration in each list, took the means of the rows, and plotted the difference:

.

.

Pretty close, but there is more divergence the further you go in time from the 3 satellite PCs. If you do not know why this occurs, that’s okay. It was more a question of curiosity. I benchmarked a reconstruction using the both function versions vs. Jeff Id’s Matlab RegEM reconstruction. He got a continent-wide trend of 0.118; using regem_pca and regem_pttls I got a continent-wide trend of 0.115 for both.