So how did Steig et al. calculate the AWS reconstruction? Since we don’t have either the satellite data or the exact path they followed using the RegEM Matlab files, we can’t say for sure how the results were obtained.
However, using a little mathemagic, we can actually take the sequences apart and then calculate reconstructed values for those times when the reconstructions have been replaced by the measured temperature values.
I realize that there are problems with the AWS data sets, but for our purposes that is irrelevant. This analysis can easily be repeated for the updated results if and when Prof. Steig does the recalcs.
We are given 63 reconstructed sequences each extending monthly from 1957 to 2006. The first step is to look only at the part of those sequences which is not affected by either the measured values nor by the satellite induced manipulation – the time period 1957 – 1979.
We begin by using a principal component analysis (and good old R) on the truncated sequences. For our analysis, we will need three variables previously defined by Steve Mc.: Data, Info, and recon_aws. In the scripts, I also include some optional plots which are not run automatically. Simply remove the # sign to run them.
#Run this section if you don’t have the variables Data, Info and recon_aws
#start the calculations
early.rec = window(recon_aws,start=1957,end = c(1979,12))
rec.pc = princomp(early.rec)
# Comp.1 Comp.2 Comp.3 Comp.4 …
#1.334132e+01 6.740089e+00 5.184448e+00 1.267015e-07 …
There are 63 eigenvalues in the PCA. The fourth largest one is virtually zero. This makes it very clear that the reconstructed values are a simple linear combination of only three sequences, presumably calculated by the RegEM machine. The sequences are not unique (which does not matter for our purposes).
We can use the first three principal components from the R princomp procedure:
rec.score = rec.pc$scores[,1:3]
#matplot(time(early.rec),rec.score,type = “l”)
#flip the PCs for positive trend
#does not materially affect the procedures
rec.score = rec.score%*%diag(sign(coef(lm(rec.score ~ time(early.rec)))[2,]))
#SM: this gives coefficients for all 63 reconstructions in terms of the 3 PCs
pred.recon = predict(reg.early)
max(abs(predict(reg.early)-early.rec)) # 5.170858e-07
I have flipped the orientation of the PCs so that the trend coincides with a positive trend in time for each component. This also doesn’t make a difference to our result (I sound like a member of the team now) because the coefficient which multiplies the PC in the reconstruction will automatically change sign to accommodate the change. pred.recon is the reconstruction using only the three PCs . The largest difference between Steig’s values and pre.con is .0000005 so the fit is good!
Now, what about the time period from 1980 to 2006? The method we have just used will not work because, for each series, some of the reconstructed values have been replaced by measurements from the AWS. We will assume that exactly three “PCs” were used with the same combining coefficients as in the early period.
The solution is then to find intervals in 1980 to 2006 time range where we have three (or more) sites which do not have any actual measurements during that particular interval. Then, using simple regression, we can reconstruct the PC sequences. It turns out that the problem can be reduced to just two intervals: 1980 – 1995 and 1996-2006:
#figure out first and last times for data for each aws
dat_aws = window(Data$aws[,colnames(early.rec)],end=c(2006,12))
not_na = !is.na(dat_aws)
trange = t(apply(not_na,2,function(x) range(time(dat_aws)[x])))
#construct 1980 to 1995
#identify late starters
# 89828 21363 89332
#1996.000 1996.083 1996.167
names1 = c(“89828″, “21363″, “89332″)
reg1 = lm(rec.score ~ early.rec[,names1]+ early.rec[,names1]+ early.rec[,names1])
rec.pc1 = cbind(1,window(recon_aws[,names1],start=1980,end =c(1995,12)))%*%coef(reg1)
#matplot(rec.pc1,type = “l”)
#construct 1980 to 1995
#identify earlier finishers
# 89284 89834 89836
#1992.167 1994.750 1995.917
names2 = c(“89284″,”89834″,”89836″)
reg2 = lm(rec.score ~ early.rec[,names2]+ early.rec[,names2]+ early.rec[,names2])
rec.pc2 = cbind(1,window(recon_aws[,names2],start=1996))%*%coef(reg2)
#matplot(rec.pc2,type = “l”)
Now we put the two together and check to see how the fit is (ignoring the elements that have been replaced)
#check for fit 1980 to 2006
late.rec = window(recon_aws,start=1980)
estim.rec = cbind(1,rbind(rec.pc1,rec.pc2))%*%reccoef
diff.rec = estim.rec-late.rec
matplot(time(late.rec),diff.rec,type=”l”,main = “Diff Between Recons and Measured Temps”,ylab = “Anom (C)”,xlab = “Year”)
Not bad! Again we have a very good fit. The differences visible in the plot are the differences between the reconstruction we just did and the “splicing” of the AWS data by Steig. Some of the larger ones are likely due to the already identified problems with the data sets.
Finally we put everything together into a single piece:
recon.pcs = ts(rbind(rec.score,rec.pc1,rec.pc2), start = 1957,frequency=12)
colnames(recon.pcs) = c(“PC1″,”PC2″,”PC3″)
matplot(time(recon.pcs),recon.pcs,type = “l”,main = “AWS Reconstruction PCS”,xlab=”Year”, ylab =”PC Value”)
legend(1980,-40,legend = c(“PC1″, “PC2″,”PC3″),col = c(1,2,3), lty = 1)
#trends (per year) present in the PCs
reg.pctrend = lm(recon.pcs ~ time(recon.pcs))
# PC1 PC2 PC3
#(Intercept) -199.522419 -62.90221792 -222.1577310
#time(recon.pcs) 0.101605 0.03189134 0.1128271
The trend slopes (which are per year) should be taken with a grain of salt since the PCs are multiplied by different coefficients (often quite a bit less than one).
Where can we go from here? I think one interesting problem might be to evaluate the effect of surface stations on the PC sequences (only pre-1980). This might give some insight on what stations drive the results. As well, AWS positions appear to be related to the coefficients which determine the weight each PC has on that AWS reconstruction.
Finally, measures of “validation” can now be calculated using the observed AWS data to see how well the reconstruction fits. All possible without knowin’ how they dunnit!