Hence my regressions using her “pseudo data” don’t count for much of anything.

I take that back. On rereading my 4/11/08 comment with my OLS regressions above at http://climateaudit.org/2008/04/07/more-on-li-nychka-and-ammann/#comment-142629 ,

I see that I used Steve’s archived version of the 14 MBH proxies rather than the noisy data on Li’s webpage, so the results should be relevant.

The same goes for my CO2 adjustments on 4/12/08 at

http://climateaudit.org/2008/04/07/more-on-li-nychka-and-ammann/#comment-142629

and my Mizon-based model on 4/16/08 at

http://climateaudit.org/2008/04/07/more-on-li-nychka-and-ammann/#comment-142646 .

The data sets that I posted on my website are not the real Northern Hemisphere temperature and the MBH99 proxies. They are generated by adding white noise with unit variance to the standardized real data. The pseudo data sets on my website only serve as a toy example to try the R code that I used in my paper. However, the results in Li et al. (Tellus, in press) are based on the real data instead of the pseudo data. I am sorry that I did not explain very clearly what the data set on my webpage is and also sorry for the confusion that I brought to you as a consequence. I have modified my webpage to make the point more explicitly.

Hence my regressions using her “pseudo data” don’t count for much of anything. It’s strange she wouldn’t post the real data.

Adding unit-variance white noise to a series that has already been standardized to unit variance may explain why Steve was finding correlations of precisely 1/sqrt(2) (about 0.71) with archived versions.
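That arithmetic is easy to check by simulation (a sketch with a made-up series, not the actual MBH data; any standardized series will do):

```r
# Adding unit-variance white noise to a standardized (sd = 1) series:
# the sum has sd sqrt(2), and its correlation with the original series
# is 1/sqrt(2) in expectation.
set.seed(1)
n <- 1e5
x <- as.numeric(scale(rnorm(n)))  # stand-in for a standardized proxy series
y <- x + rnorm(n)                 # "pseudo data": x plus unit white noise
cor(x, y)                         # close to 1/sqrt(2) = 0.707
sd(y)                             # close to sqrt(2) = 1.414
```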


##LOAD FUNCTIONS AND DATA

library(nlme)

extend.persist <- function(tree) {
  # Extend each column forward by persisting its last non-NA value
  extend.persist <- tree
  for (j in 1:ncol(tree)) {
    test <- is.na(tree[, j])
    end1 <- max(c(1:nrow(tree))[!test])        # last non-NA row in column j
    test2 <- (c(1:nrow(tree)) > end1) & test   # NA rows after that point
    extend.persist[test2, j] <- tree[end1, j]
  }
  extend.persist
}

proxy=read.table("http://data.climateaudit.org/data/mbh99/proxy.txt",sep="\t",header=TRUE)

proxy=ts(proxy[,2:ncol(proxy)],start=1000) #1000-1980

proxy=extend.persist(proxy)

m0=apply(proxy[903:981,],2,mean);sd0=apply(proxy[903:981,],2,sd)

proxy=ts(scale(proxy,center=m0,scale=sd0),start=1000)

proxy[,3]=- proxy[,3] #flip NOAMER PC1

name0=c("Tornetrask","fran010","NOAMER PC1","NOAMER PC2","NOAMER PC3","Patagonia","Quel 1 Acc","Quel 1 O18","Quel 2 Acc","Quel 2 O18","Tasmania","Urals",

"W Greenland O18","morc014")

dimnames(proxy)[[2]]=name0

sparse<-read.table("ftp://ftp.ncdc.noaa.gov/pub/data/paleo/paleocean/by_contributor/mann1998/nhem-sparse.dat",skip=1)

sparse<-ts(sparse[,2],start=sparse[1,1])

#MAKE DATA FRAME

Z=data.frame(sparse[1:127],proxy[(1854:1980)-999,])

names(Z)[1]="sparse"

fm=lm(sparse~.,data=Z)

fm1 <- gls(sparse~., Z, correlation = corARMA(p = 1, q = 1))

#ARMA 1 1

arima(fm1$residuals,order=c(1,0,1))

#Coefficients:

# ar1 ma1 intercept

# 0.9630 -0.7688 0.0029

#s.e. 0.0272 0.0670 0.0809

#sigma^2 estimated as 0.02881: log likelihood = 44.63, aic = -81.25

arima(fm1$residuals,order=c(2,0,0))

#Coefficients:

# ar1 ar2 intercept

# 0.2576 0.4223 0.0087

#s.e. 0.0797 0.0798 0.0467

#sigma^2 estimated as 0.02998: log likelihood = 42.21, aic = -76.41

#CRU3 NH

source("d:/climate/scripts/spaghetti/CRU3.nh.txt")

Z=data.frame(cru.nh[1:131],proxy[(1850:1980)-999,])

names(Z)[1]="cru"

fm2 <- gls(cru~., Z, correlation = corARMA(p = 1, q = 1))

#ARMA 1 1

arima(fm2$residuals,order=c(1,0,1))

#Coefficients:

# ar1 ma1 intercept

# 0.9521 -0.7081 -0.0024

#s.e. 0.0322 0.0775 0.0532

#sigma^2 estimated as 0.01257: log likelihood = 100.38, aic = -192.77

arima(fm2$residuals,order=c(2,0,0))

#Coefficients:

# ar1 ar2 intercept

# 0.3386 0.3608 -0.0086

#s.e. 0.0809 0.0812 0.0326

#sigma^2 estimated as 0.01324: log likelihood = 97.09, aic = -186.17

#CRU2 NH

source("d:/climate/scripts/spaghetti/CRU2.nh.txt")

Z=data.frame(CRU[1:130],proxy[(1851:1980)-999,])

names(Z)[1]="cru"

fm3 <- gls(cru~., Z, correlation = corARMA(p = 1, q = 1))

#ARMA 1 1

arima(fm3$residuals,order=c(1,0,1))

#Coefficients:

# ar1 ma1 intercept

# 0.9712 -0.8494 -0.0102

#s.e. 0.0235 0.0472 0.0654

#sigma^2 estimated as 0.02853: log likelihood = 46.4, aic = -84.8

arima(fm3$residuals,order=c(2,0,0))

#Coefficients:

# ar1 ar2 intercept

# 0.1497 0.3294 -0.0235

#s.e. 0.0821 0.0824 0.0293

#sigma^2 estimated as 0.03100: log likelihood = 41.2, aic = -74.4

Hu, have you looked at the effect of ARMA(1,1) rather than AR2 in this type of analysis?

No. Perhaps a Mizon-like “general first order dynamic model” (see #46) should in fact be construed to include one lag of the error in addition to one lag of y and the x’s. This would include ARMA(1,1) as a testable special case, I believe.
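For what it’s worth, that nesting can be illustrated by simulation (my own sketch with made-up parameters, not the proxy data): a static regression with AR(1) errors, y_t = b*x_t + u_t with u_t = rho*u_{t-1} + e_t, implies the dynamic model y_t = rho*y_{t-1} + b*x_t - rho*b*x_{t-1} + e_t, so an unrestricted dynamic regression should recover the common-factor (COMFAC) restriction on the lag coefficients:

```r
# Simulate a static model with AR(1) errors, then fit the unrestricted
# dynamic model and check the COMFAC restriction b1 = -rho*b0.
set.seed(42)
n   <- 5000
rho <- 0.6
b   <- 2.0
x <- rnorm(n)
u <- as.numeric(arima.sim(list(ar = rho), n = n))   # AR(1) errors
y <- b * x + u                                      # static model
t <- 2:n
fit <- lm(y[t] ~ y[t - 1] + x[t] + x[t - 1])        # unrestricted dynamic model
co  <- unname(coef(fit))   # (intercept, lag-y, x, lag-x)
# co[2] should be near rho, co[3] near b, and co[4] near -rho*b
```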

I’m surprised your AR(2) and ARMA(1,1) coefficients look so different. What are the first few correlations implied by the two processes? Are these estimates from the OLS residuals, or are they iteratively obtained using GLS estimates?
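One quick way to answer the correlation question (a sketch using the fm1 coefficients quoted above; `ARMAacf` in base R’s stats package gives the theoretical autocorrelations of a specified ARMA process):

```r
# Theoretical autocorrelations implied by the two error processes
# fitted to the fm1 residuals (coefficients from the arima output above)
acf_arma11 <- ARMAacf(ar = 0.9630, ma = -0.7688, lag.max = 5)
acf_ar2    <- ARMAacf(ar = c(0.2576, 0.4223), lag.max = 5)
round(rbind(arma11 = acf_arma11, ar2 = acf_ar2), 3)
```

Despite the very different-looking coefficients, both processes imply a lag-1 correlation near 0.45; they diverge mainly at longer lags, where the ARMA(1,1) correlations decay more slowly (at rate 0.963 per lag).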

# Coefficients:

# ar1 ma1 intercept

# 0.9373 -0.8487 -0.0012

#s.e. 0.0530 0.0783 0.0361

as compared to the following for AR2:

#Coefficients:

# ar1 ar2 intercept

# 0.0724 0.2212 -0.0004

#s.e. 0.0862 0.0862 0.0227

#sigma^2 estimated as 0.033: log likelihood = 36.36, aic = -64.71

I did a quick experiment using the gls function in the nlme package and none of the coefficients were significant. The standard error of the target series (df=127) was 0.265 while the standard error of the residuals (df=112) was 0.248.

Brown & Sundberg 87 take a ML approach that ultimately assumes n is large.

Is that true? Isn’t R just one component of it? In addition, R gives a relatively simple means to check for possible outliers in , something Steve was looking for earlier. I have to admit that I haven’t spent much time on Brown87 & Brown89.

Brown’s (2.8) includes a factor of

sigma^2(xi) = 1/el + 1/n + xi^T (X^T X)^{-1} xi,

where “xi” is the unknown to be reconstructed (or, in the case of validation, the reserved X value to be compared to the reconstructed value.) This is a scalar, so it factors out and appears outside the quadratic form, but conceptually it starts off as part of the covariance matrix in the middle.

It is true that as n goes to infinity, the coefficients become known perfectly and this term (along with 1/n) drops out. Brown & Sundberg 87 take a ML approach that ultimately assumes n is large. Have they already imposed this assumption in their R?

However, n is not so large in the present context that the coefficient uncertainty can just be ignored. Isn’t this the whole point of your critique of MBH standard errors — that MBH just look at the disturbance term (whose variance equals that of the calibration errors), and overlook the coefficient uncertainty? In the post above, I argue that this factor could be the completely expected source of the ad hoc 1.30 “inflation factor” employed by LNA.
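In the simplest univariate case the point can be illustrated numerically (a toy sketch with made-up numbers, not LNA’s actual calculation): with one observation of the new proxy (el = 1), the classical prediction-variance factor 1 + 1/n + (x0 - xbar)^2/Sxx is the scalar analogue of Brown’s term, and its square root is the factor by which residual-only standard errors understate the interval width:

```r
# Toy univariate calibration: how much a prediction interval widens once
# coefficient uncertainty (the 1/n and (x0 - xbar)^2/Sxx terms) is
# included, relative to a residual-variance-only interval.
set.seed(7)
n   <- 79                     # calibration period length (made up)
x   <- rnorm(n)               # calibration-period predictor values
x0  <- 2.5                    # reconstruction target far from the mean
Sxx <- sum((x - mean(x))^2)
infl <- sqrt(1 + 1/n + (x0 - mean(x))^2 / Sxx)
infl                          # always > 1; grows as x0 leaves the mean
```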

The Davis and Hayakawa paper isn’t online, but I think I can find it in our stacks. If so I’ll send you a copy at the e-mail on your webpage.

Sundberg99 looks like a good update of the issues raised in Brown82 and Brown & Sundberg 87. I’ll take a look at it.

I tried with simple simulations, when I add outliers to , both R and Brown82 2.8 will detect them.

Making a record of errors per post, it should be:

I tried with simple simulations, when I add outliers to , both R and Brown82 2.8 will detect them.

Hu, one more paper that might be of interest,

Prediction Diagnostic and Updating in Multivariate Calibration, by Brown and Sundberg, Biometrika, Vol. 76, No. 2. (Jun., 1989), pp. 349-361.

In #53 replace with ,

S is used in Brown82 Eq. 2.8, and this equation for R uses .

Hu,

I am puzzled that the R in the new paper does not seem to take “xi”

See Sundberg99 Eq. 2.12; I guess that kind of division is useful in Brown87. The ML estimator is very close to the CCE, except when R is too large. R is asymptotically distributed as . I tried with simple simulations: when I add outliers to , both R and Brown82 2.8 will detect them.
