RegEM PTTLS Ported to R

I’ve now ported my emulation of Schneider’s RegEM PTTLS to R and benchmarked it against Jeff’s Matlab as shown below. I caution readers that this is just an algorithm. There are other ways of doing regressions and infills. The apparent convergence to three PCs noted by Roman is still pending as a highly interesting phenomenon. There are other RegEM flavors in Schneider’s algorithm, but I haven’t bothered trying to emulate these as it’s the PTTLS version that seems to be in play right now.

The method below retains coefficients from the final iteration, which I plan to use for further analysis. Here’s a short benchmark of the regem_pttls function.

First load functions

source(“http://www.climateaudit.info/scripts/regem/regem.functions.txt”)

Then load the data (also with a collation of Matlab intermediates)

#this uses BAS scrape for surface and reverse-engineered Steig recon_aws for AWS. Two surface series need to be deducted (Marion, Gough)
#version here is not quite file compatible with Steig. See Nic L on Dec 2002 problem
download.file(“http://www.climateaudit.info/data/steig/Data.tab”,”Data.tab”,mode=”wb”)
load(“Data.tab”)
download.file(“http://www.climateaudit.info/data/steig/Info.tab”,”Info.tab”,mode=”wb”)
load(“Info.tab”)
download.file(“http://www.climateaudit.info/data/steig/recon_aws.tab”,”recon_aws.tab”,mode=”wb”)
load(“recon_aws.tab”)
dimnames(recon_aws)[[2]]=Info$aws_grid$id
download.file(“http://www.climateaudit.info/data/steig/jeff.tab”,”temp.dat”,mode=”wb”);load(“temp.dat”) #matlab compilation

#surface
surf=Data$surface
surf=window(surf,start=1957,end=c(2006,12)) #ends in 2006 per Steig
surf = surf[,-26] #deletes column 26 – Marion
surf = surf[,-17] #deletes column 17 – Gough
dim(surf) #600 42
sanom=as.matrix(surf)
for (i in 1:42) sanom[,i]=anom(surf[,i],reference=1957:2007)
#Steig probably used different reference period and this can be experimented with later
(ny=ncol(sanom)) #42

#AWS reverse engineered (rather than READER scrape)
anoms =window(recon_aws, start = 1980);dim(anoms) #324 63
dat_aws = window(Data$aws[,colnames(recon_aws)],start=1980,end=c(2006,12));
dim(dat_aws) #324 63
anoms[is.na(dat_aws)] =NA
#apply(anoms,2,mean,na.rm=T) #some are zero, some aren’t
(nx=ncol(anoms)) #63

# combine anomalies
anomalies=ts.union(anoms,sanom)
dimnames(anomalies)[[2]]=c( dimnames(anoms)[[2]],dimnames(sanom)[[2]])
dim(anomalies) #600 105

Now do the iteration through the function below for 5 iterations (it takes a while (: )

regemtest=regem_pttls(X=jeff$X,maxiter=5,regpar=3)

Now do a quick comparison to Matlab results:

test=unlist(sapply(regemtest,function (A) A$dXmis) )
data.frame(test=test,jeff=jeff$dXmis[1:length(test),3])

Match perfectly. Wasn’t that easy.

# test jeff
#1 1.25811556 1.25800
#2 0.46735524 0.46740
#3 0.31128378 0.31130
#4 0.18168994 0.18170
#5 0.11207077 0.11210
#6 0.08017931 0.08018

To do a run on (say) surface stations by themselves with regpar=2, just do this

regem.surface=regem_pttls(X=jeff$X[,64:105],maxiter=5,regpar=2)

To get station-by-station RegEM trends (whatever they may prove to be):

trend=function(x) lm(x~ I((1:length(x))/12))$coef[2]
trend.ant= apply(regem.surface[[21]]$X,2,trend)

12 Comments

  1. Martin Åkerberg
    Posted Feb 17, 2009 at 5:52 PM | Permalink

    I had to remove the line

    surf < – surf[,-26] #deletes column 26 – Marion

    to get the script to run.

  2. Paul Penrose
    Posted Feb 17, 2009 at 5:55 PM | Permalink

    Martin,
    Get rid of the space between the less-than symbol and the dash and try it again.

  3. Steve McIntyre
    Posted Feb 17, 2009 at 5:57 PM | Permalink

    #1. That’s a WordPress artifact. I changed to = to avoid the problem.

  4. joshv
    Posted Feb 17, 2009 at 6:47 PM | Permalink

    Steve, can’t you just link to a file containing the scripts rather than attempting to embed them in blog post text? It seems that such scripts are often munged after running the WordPress guantlet.

    • AnonyMoose
      Posted Feb 17, 2009 at 11:16 PM | Permalink

      Re: joshv (#4), “running the WordPress guantlet.”

      Visualizing a printing press which involves steel gauntlets… no wonder things get munged.

  5. tolkein
    Posted Feb 18, 2009 at 7:52 AM | Permalink

    Go stattos!

    Think how much better it would be if all climate research pieces that use statistics were subject to peer review (and testing) by statisticians. Whatever the outcome we could feel reassured by the robustness of those papers that passed this process.

    snip -editorialing

  6. Nic L
    Posted Feb 19, 2009 at 10:42 AM | Permalink

    I have been running the AWS reconstruction using Steve’s R RegEM code, and there are a few things that I would welcome help with and/or an explanation of:

    1) The initial values of non-missing observations don’t seem to be preserved, which I thought RegEM was meant to do. For instance, the temperature anomaly for Cape King (ID 7351) December 2004 (anomalies[576,5]) is 1.58125. Looking at the values of the data matrix X output by the regem_pttls function (by applying unlist(sapply(test,function (A) A$X) ) to its output), after iteration 1 the value is 1.675311, after iteration 2 1.724464, after iteration 3 1.755643, after iteration 25 1.805082 and after iteration 51 1.796506.
    I see that the column means of the regem output are zero, and I was wondering if re-centering of them on each iteration could explain the drift of the values of non-illed in data?
    NB the initial column mean for Cape King is zero, but it is non zero for a number of the other AWS due to the inclusion of various data points (in December 2002, etc) that Steig did not use and to the issue of what period he averaged AWS data over, as per previous posts of mine.

    2) When, in order to carry out further iterations, I try to run regem_pttls on its own output (after applying unlist(sapply(test,function (A) A$X), I get the message:

    “Error in while (iter tol) { :
    missing value where TRUE/FALSE needed”.

    The script I used was:
    test=regem_pttls(X=anomalies,maxiter=2,regpar=3)
    aa=test[[3]]
    dimnames(aa$X)=dimnames(anomalies)
    bb=aa$X
    test2=regem_pttls(X=bb,maxiter=2,regpar=3)

    3) I don’t seem to be getting the AWS regem reconstruction to converge anywhere near accurately on Steig’s reconstruction (using uncorrected data for both). With maxiter=50 (which so far as I can see actually involves 51 applications of the algorithm), the excess of the regem output and Steig’s reconstruction is at lowest -3.05 and at highest 3.59, albeit the average standard deviation is not that high at 0.23. May I ask if Steve or anyone else has obtained better results – perhaps I am doing something wrong?

    • Steve McIntyre
      Posted Feb 19, 2009 at 11:45 AM | Permalink

      Re: Nic L (#7),

      X[indmis]=Xmis[indmis] #update data
      X=scale(X,scale=FALSE) # #recenter

      yes, it’s recentered after the splice. That would probably account for the drift.

  7. Steve McIntyre
    Posted Feb 19, 2009 at 11:40 AM | Permalink

    #7. Nic, I’m not presently in a position to provide support or explanation on this. I’ve just established my own foothold. I’ve only tried to reconcile to Jeff Id’s Matlab run so far – reconciling to Steig is another project as we don’t know what “adaptations”, if any, Steig might have done.

  8. Steve McIntyre
    Posted Feb 19, 2009 at 11:55 AM | Permalink

    #7. it looks like this diagnostic is thrown up if you try regem_pttls with no missing values.

  9. Steve McIntyre
    Posted Feb 19, 2009 at 11:56 AM | Permalink

    3) I’ve only reconciled to Jeff Id. I’ll compare to AWS next.

  10. Ryan O
    Posted Apr 13, 2009 at 4:57 PM | Permalink

    Steve,
    .
    In your regem.functions.txt, you have 2 versions. One is the regem_pttls function (which calls pttls) and the other is the regem_pca function, which seems to be self-contained. I was wondering what the difference was between the two.
    .
    I ran both using the cloudmasked data from Comiso and the 42 surface stations (reader.steig.anom):

    get.PCs=function(dat, surf=reader.steig.anom, PC.max=3) {

    dat=t(dat)
    dat.svd=svd(dat)

    ### Multiply the right singular vectors through and store as a time series
    PCs=ts(t((dat.svd$d)*t(dat.svd$v)), start=1982, freq=12)

    ### Return the desired number of PCs next to the surface stations,
    PCs=list(ts.union(surf, PCs[, 1:PC.max]), dat.svd$u[, 1:PC.max], dat.svd$u, dat.svd$v, dat.svd$d)
    PCs
    }

    steig.PCs=get.PCs(avhrr.anom)

    .
    The above is how I get the PCs to place next to the surface stations for running RegEM. The script is based off Jeff Id’s script, with some simplifications.
    .
    Next, I use the first member of the steig.PCs list – “ts.union(surf, PCs[, 1:PC.max])” – which is the 42 surface stations plus the 3 PCs – as an input to both regem_pttls and regem_pca:
    .

    steig.regem.pttls=regem_pttls(steig.PCs[[1]], maxiter=100)
    length(steig.regem.pttls) #28
    steig.regem.pca=regem_pca(steig.PCs[[1]], maxiter=100)
    length(steig.regem.pca) #46

    .
    The pttls version took about 5 times as long, but only took 28 iterations to converge. The pca version took 46 iterations to converge, but was much faster.
    .
    I then extracted the last iteration in each list, took the means of the rows, and plotted the difference:
    .

    .
    Pretty close, but there is more divergence the further you go in time from the 3 satellite PCs. If you do not know why this occurs, that’s okay. It was more a question of curiosity. I benchmarked a reconstruction using the both function versions vs. Jeff Id’s Matlab RegEM reconstruction. He got a continent-wide trend of 0.118; using regem_pca and regem_pttls I got a continent-wide trend of 0.115 for both.