In your regem.functions.txt, you have 2 versions. One is the regem_pttls function (which calls pttls) and the other is the regem_pca function, which seems to be self-contained. I was wondering what the difference was between the two.

I ran both using the cloudmasked data from Comiso and the 42 surface stations (reader.steig.anom):

get.PCs=function(dat, surf=reader.steig.anom, PC.max=3) {

dat=t(dat)

dat.svd=svd(dat)

### Multiply the right singular vectors through and store as a time series

PCs=ts(t((dat.svd$d)*t(dat.svd$v)), start=1982, freq=12)

### Return the desired number of PCs next to the surface stations,

PCs=list(ts.union(surf, PCs[, 1:PC.max]), dat.svd$u[, 1:PC.max], dat.svd$u, dat.svd$v, dat.svd$d)

PCs

}

steig.PCs=get.PCs(avhrr.anom)

The above is how I get the PCs to place next to the surface stations for running RegEM. The script is based off Jeff Id’s script, with some simplifications.
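For anyone checking the algebra in get.PCs, here is a minimal sketch on made-up data. The toy matrix, its size, and the reading of rows as months and columns as grid cells are my assumptions, not the real AVHRR data. It only confirms that scaling the right singular vectors by the singular values is the same as v %*% diag(d), and that multiplying the left singular vectors back in recovers the transposed input.

```r
# Toy stand-in for the satellite data: 24 months (rows) by 10 grid cells (columns).
# The size and orientation are assumptions made for this illustration.
set.seed(1)
toy <- matrix(rnorm(24 * 10), nrow = 24, ncol = 10)

# Same steps as get.PCs: transpose, then decompose.
toy.svd <- svd(t(toy))

# Scale each right singular vector by its singular value, as in get.PCs.
PCs <- t(toy.svd$d * t(toy.svd$v))

# The same quantity written as an explicit matrix product.
PCs.alt <- toy.svd$v %*% diag(toy.svd$d)

# Multiplying the left singular vectors back in recovers the transposed data.
recon <- toy.svd$u %*% t(PCs)

max.pc.diff <- max(abs(PCs - PCs.alt))
max.recon.diff <- max(abs(recon - t(toy)))
```

Both differences should be at the level of floating-point rounding. Keeping only PCs[, 1:PC.max] then gives the truncated set that goes next to the surface stations.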

Next, I use the first member of the steig.PCs list – “ts.union(surf, PCs[, 1:PC.max])” – which is the 42 surface stations plus the 3 PCs – as an input to both regem_pttls and regem_pca:

steig.regem.pttls=regem_pttls(steig.PCs[[1]], maxiter=100)

length(steig.regem.pttls) #28

steig.regem.pca=regem_pca(steig.PCs[[1]], maxiter=100)

length(steig.regem.pca) #46

The pttls version took about 5 times as long, but only took 28 iterations to converge. The pca version took 46 iterations to converge, but was much faster.

I then extracted the last iteration in each list, took the means of the rows, and plotted the difference:

[Plot: difference between the row means of the final regem_pttls and regem_pca iterations]
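The extraction step could look roughly like the sketch below. It runs on fake lists rather than the real RegEM output, assuming only that each list element holds one iteration with the infilled matrix stored as $X. The helper names make.fake.run and last.X are hypothetical, and whether the PC columns should be included in the row means is left open.

```r
# Hypothetical stand-ins for the two RegEM outputs: one list element per
# iteration, each carrying the infilled matrix as $X.
set.seed(2)
make.fake.run <- function(n.iter, n.months = 12, n.series = 4) {
  lapply(seq_len(n.iter), function(i) {
    list(X = matrix(rnorm(n.months * n.series), n.months, n.series))
  })
}
fake.pttls <- make.fake.run(3)
fake.pca <- make.fake.run(5)

# Pull the infilled matrix from the final iteration of a run.
last.X <- function(run) run[[length(run)]]$X

# Average across series for each month, then difference the two runs.
mean.pttls <- rowMeans(last.X(fake.pttls))
mean.pca <- rowMeans(last.X(fake.pca))
run.diff <- ts(mean.pttls - mean.pca, start = 1982, frequency = 12)

# plot(run.diff) would then draw the difference series.
```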

Pretty close, but there is more divergence the further you go in time from the 3 satellite PCs. If you do not know why this occurs, that’s okay. It was more a question of curiosity. I benchmarked a reconstruction using both function versions against Jeff Id’s Matlab RegEM reconstruction. He got a continent-wide trend of 0.118; using regem_pca and regem_pttls I got a continent-wide trend of 0.115 for both.

X[indmis]=Xmis[indmis] #update data

X=scale(X,scale=FALSE) #recenter

yes, it’s recentered after the splice. That would probably account for the drift.
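A toy illustration of the effect, using made-up numbers and a single column. It applies the same two lines to a series with one gap, and shows that the observed entries move by the mean of the spliced column. The values in x.obs and Xmis are invented, and nothing here says whether the regem functions add the mean back at some later stage.

```r
# Toy series: three observed values and one gap (all numbers are made up).
x.obs <- c(1.0, 2.0, 3.0, NA)
indmis <- is.na(x.obs)

# Pretend one round of infilling produced this estimate for the gap.
Xmis <- c(0, 0, 0, 6.0)

# Splice the estimate in, then recenter, as in the two quoted lines.
X <- matrix(x.obs, ncol = 1)
X[indmis] <- Xmis[indmis]   # update data
X <- scale(X, scale = FALSE)  # recenter

# Each observed entry has moved by the mean of the spliced column.
shift <- x.obs[!indmis] - X[!indmis, 1]
```

Here the spliced column has mean 3, so every observed value is shifted down by 3. Since the infilled values change from one iteration to the next, so does the column mean, and the observed values drift with it.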

1) The initial values of non-missing observations don’t seem to be preserved, which I thought RegEM was meant to do. For instance, the temperature anomaly for Cape King (ID 7351) December 2004 (anomalies[576,5]) is 1.58125. Looking at the values of the data matrix X output by the regem_pttls function (by applying unlist(sapply(test,function (A) A$X) ) to its output), after iteration 1 the value is 1.675311, after iteration 2 1.724464, after iteration 3 1.755643, after iteration 25 1.805082 and after iteration 51 1.796506.

I see that the column means of the regem output are zero, and I was wondering if re-centering them on each iteration could explain the drift in the values of the non-filled-in data?

NB the initial column mean for Cape King is zero, but it is non zero for a number of the other AWS due to the inclusion of various data points (in December 2002, etc) that Steig did not use and to the issue of what period he averaged AWS data over, as per previous posts of mine.

2) When, in order to carry out further iterations, I try to run regem_pttls on its own output (after applying unlist(sapply(test,function (A) A$X))), I get the message:

“Error in while (iter tol) { :

missing value where TRUE/FALSE needed”.

The script I used was:

test=regem_pttls(X=anomalies,maxiter=2,regpar=3)

aa=test[[3]]

dimnames(aa$X)=dimnames(anomalies)

bb=aa$X

test2=regem_pttls(X=bb,maxiter=2,regpar=3)

3) I don’t seem to be getting the AWS regem reconstruction to converge anywhere near accurately on Steig’s reconstruction (using uncorrected data for both). With maxiter=50 (which so far as I can see actually involves 51 applications of the algorithm), the excess of the regem output over Steig’s reconstruction is at lowest -3.05 and at highest 3.59, albeit the average standard deviation is not that high at 0.23. May I ask if Steve or anyone else has obtained better results – perhaps I am doing something wrong?
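On point 2, one possible explanation, offered as a guess rather than a diagnosis: if the loop tests a relative change computed only over the missing entries, then a matrix with nothing left missing gives 0/0, and the while condition becomes NA. The sketch below reproduces the message without running RegEM at all. The convergence measure, the names change, size and dXmis, and the tolerance are all assumptions of mine, not taken from the regem_pttls source.

```r
# A fully infilled matrix has no missing entries left.
X.full <- matrix(c(1, 2, 3, 4), nrow = 2)
indmis <- is.na(X.full)

# Hypothetical convergence measure: change in the imputed values
# relative to their size. With no missing entries both norms are 0.
X.new <- X.full
change <- sqrt(sum((X.new[indmis] - X.full[indmis])^2))
size <- sqrt(sum(X.new[indmis]^2))
dXmis <- change / size   # 0 / 0 gives NaN

tol <- 0.005             # assumed tolerance, for illustration only
cond <- dXmis > tol      # comparison with NaN gives NA

# Using an NA condition in while() triggers the error.
msg <- tryCatch({
  while (cond) {}
  "no error"
}, error = function(e) conditionMessage(e))
```

If that is what is happening, checking any(is.na(bb)) before the second call would show whether there is anything left for the function to infill.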

Think how much better it would be if all climate research pieces that use statistics were subject to peer review (and testing) by statisticians. Whatever the outcome we could feel reassured by the robustness of those papers that passed this process.

snip -editorialing

*Visualizing a printing press which involves steel gauntlets…* no wonder things get munged.