Mannian Collation by Santer

I reported a while ago on Santer’s refusal to provide the T2 and T2LT data as collated. Despite Santer’s rude refusal, PCMDI placed the data online anyway, notifying me that this had been their plan all along. I reported on that at the time, but haven’t yet reported on the collated data.

The T2 and T2LT data is now located at http://www-pcmdi.llnl.gov/projects/msu in a reasonably accessible format. I collated the T2 and T2LT “TROP” data sets, each into a time series with 49 columns (one for each of the 49 runs). As a first-cut analysis, I took the average of the 49 runs and the count of represented runs up to 2000 (when the population decreases sharply, though a few models are archived up to 2050). This yielded the following interesting graphic:
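For readers who want to check the first figure, here is a minimal sketch of the averaging step (assuming the 49-column monthly matrix T2 built by the script in comment 2 below, with missing values already set to NA):

# average the available runs in each month and count how many runs report a value
avg49 = ts(rowMeans(T2, na.rm=TRUE), start=tsp(T2)[1], freq=12)
count49 = ts(rowSums(!is.na(T2)), start=tsp(T2)[1], freq=12)
par(mfrow=c(2,1))
plot(window(avg49, end=2000), type="l", ylab="deg C", main="Average of 49 Runs")
plot(count49, type="l", ylab="runs", main="Run Count")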

I was intrigued by the sharp decline in the average of the models around 1960. There were no changes in the run population at the time (as shown by the count information).

I cross-checked the data in a couple of different ways, one of which was the following barplot of the maximum value for each of the 49 model-run combinations. The 7th model CNRM3.0_run1 had a very anomalous maximum value of over 13 deg C.


Figure 2. Maximum Value for the 49 PCMDI Model-Run Combinations.
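A sketch of that cross-check (same assumed T2 matrix as above; the 7th column is CNRM3.0 run 1):

maxval = apply(T2, 2, max, na.rm=TRUE)   # maximum of each of the 49 model-run columns
barplot(maxval, las=2, cex.names=0.5, ylab="deg C")
which.max(maxval)                        # flags the column with the anomalous maximum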

I plotted the values for the 7th model-run combination (from CNRM in France), yielding the following interesting graph, showing some sort of goof in the data as archived. I don’t know whether the error rests with Santer’s collation, with the archiving at PCMDI or with the underlying model. The only thing that we know with considerable certainty is that Santer will say that the error doesn’t “matter”.


Figure 3. Santer Collation of CNRM3.0 Tropical Troposphere Data

When he refused to provide data as collated in response to my original request, Santer had emphasized how easy it was to calculate the T2 and T2LT averages from the terabytes of data at PCMDI:

You should have no problem in accessing exactly the same model and observational datasets that we employed. You will need to do a little work in order to calculate synthetic Microwave Sounding Unit (MSU) temperatures from climate model atmospheric temperature information. This should not pose any difficulties for you. Algorithms for calculating synthetic MSU temperatures have been published by ourselves and others in the peer-reviewed literature. You will also need to calculate spatially-averaged temperature changes from the gridded model and observational data. Again, that should not be too taxing.

Perhaps so. But the operation appears to have enough complexity to result in a botched version of the CNRM3.0 run 1 in the PCMDI archive for Santer et al 2008.

UPDATE: The impact of the one bad series on the 49-series average is not negligible, as shown below.
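A sketch of the with/without comparison (again assuming the T2 matrix from comment 2; column 7 is the bad CNRM3.0 run):

avg.all = ts(rowMeans(T2, na.rm=TRUE), start=tsp(T2)[1], freq=12)
avg.adj = ts(rowMeans(T2[,-7], na.rm=TRUE), start=tsp(T2)[1], freq=12)
plot(window(avg.all, end=2000), type="l", ylab="deg C")
lines(window(avg.adj, end=2000), col=2)
legend("topleft", c("all 49 runs","without CNRM3.0 run 1"), lty=1, col=1:2)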

UPDATE: Here’s a comparison of averages of GISS AOM (which apparently does not have volcanic forcing) with GISS EH (which does) – both averages of archived T2 tropical troposphere. I must say that I’m a little surprised by the form of the difference between the two – my understanding was that volcanic impacts were held to be fairly transient, whereas there is a notable displacement of GISS EH relative to GISS AOM around 1960. The step takes place at or near the time of the bad CNRM3.0 splice, resulting in a peculiar similarity between the diagram showing GISS EH versus GISS AOM and the diagram with and without the bad CNRM run.




Figure: vertical red lines are at 1960.0 and 1965.0.
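For reference, a sketch of how the two model averages can be formed from the collation (the grep patterns are hypothetical – check info.pcmdi$model from the comment 2 script for the exact labels used in the archive):

giss.eh = grep("giss.*e.*h", info.pcmdi$model, ignore.case=TRUE)    # hypothetical pattern
giss.aom = grep("giss.*aom", info.pcmdi$model, ignore.case=TRUE)    # hypothetical pattern
avg.eh = ts(rowMeans(T2[, giss.eh, drop=FALSE], na.rm=TRUE), start=tsp(T2)[1], freq=12)
avg.aom = ts(rowMeans(T2[, giss.aom, drop=FALSE], na.rm=TRUE), start=tsp(T2)[1], freq=12)
plot(avg.eh, type="l", ylab="deg C"); lines(avg.aom, col=2)
abline(v=c(1960, 1965), col="red", lty=3)   # the vertical red lines at 1960.0 and 1965.0
legend("topleft", c("GISS EH","GISS AOM"), lty=1, col=1:2)
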
Documentation of models is said to be available at www-pcmdi.llnl.gov/ipcc/model_documentation/ipcc_model_documentation.php .

130 Comments

  1. TonyS
    Posted May 9, 2009 at 10:46 AM | Permalink

    Now that can’t be right. Noise over a step function? Surely there must be something quite wrong here…

  2. Steve McIntyre
    Posted May 9, 2009 at 10:51 AM | Permalink

    If anyone doubts the error, here is a script that scrapes the PCMDI data and yields the above graph (this is copied from my working script; I sometimes edit at the console):

    ##DOWNLOAD DATA: both T2 and T2LT
    tam=list(NA,2);names(tam)=c("T2","T2LT")
    url="http://www-pcmdi.llnl.gov/projects/msu"
    name0=c("tam2.tar.gz","tam6.tar.gz")
    for (k in 1:2) {
    download.file(file.path(url,name0[k]),"temp.gz",mode="wb")
    handle=gzfile("temp.gz")
    fred=readLines(handle)
    close(handle)
    N=length(fred)
    index2= grep("RTIME",fred) #length 49
    index1= c(index2[2:length(index2)]-28,N)
    index=data.frame(index2+1,index1);names(index)=c("start","end")
    writeLines(fred,"temp.dat")
    writeLines(fred[28],"name1.dat")
    name1=scan("name1.dat",what="")
    id=fred[grep("Data:",fred)];id=substr(id,34,nchar(id));id=gsub(" +$","",id)

    tam[[k]]=list()
    for(i in 1:49) {
    tam[[k]][[i]]=read.table("temp.dat",skip=index$start[i]-1,nrow=index$end[i]-index$start[i]+1)
    names(tam[[k]][[i]])=name1
    tam[[k]][[i]]=ts(tam[[k]][[i]],start=c(round(tam[[k]][[i]][1,2]),1),freq=12)
    }
    names(tam[[k]])=id
    }# k
    save(tam,file="d:/climate/data/models/santer/tam.tab")

    ##COLLATE INFO
    fred=strsplit(names(tam[[1]]),"/")
    sapply(fred,length) #all 4
    info.pcmdi=sapply(fred,function(A) A[[1]])
    info.pcmdi=data.frame(info.pcmdi)
    names(info.pcmdi)="model"

    test=sapply(fred,function(A) A[[2]])
    test=gsub("-","_",test)
    test=strsplit(test,"_")
    n=sapply(test,length)
    info.pcmdi$run=sapply(test, function(A) A[[length(A)]])
    #run id

    ##COLLATE T2 and T2LT Time Series
    test=sapply(tam[["T2"]],function(A) A[,"TROP"])
    T2=NULL
    for(i in 1:length(test)) T2=ts.union(T2,test[[i]])
    dimnames(T2)[[2]]=id
    temp= T2< -900
    T2[temp]=NA
    dim(T2)
    #[1] 2412 49
    tsp(T2)
    year= floor( time(T2));
    N=length(unique(year));month=rep(1:12,N)
    T2out=cbind(year,month,T2) ;dimnames(T2out)[[2]]=c("year","month",id)
    #write.table(T2out,file="d:/climate/data/models/santer/T2out.csv",sep="\t",row.names=FALSE)

    temp= time(T2)<2001
    plot( c(time(T2))[temp],T2[temp,7],type="l",main=paste(id[7],"T2"),ylab="deg C");abline(h=0,lty=2)

    • John Ritson
      Posted May 10, 2009 at 4:25 AM | Permalink

      Re: Steve McIntyre (#2),
      I tried running the script but it clagged out at
      index1= c(index2[2:length(index2)]-28,N)
      Error: object ‘N’ not found

      Any assistance would be appreciated.

      Steve: Fixed. N is the length of the readLines object. Sorry bout that. Don’t know how that miss crept in.

    • Dave Dardinger
      Posted May 12, 2009 at 5:30 PM | Permalink

      Re: Steve McIntyre (#2),

      I decided I really need to learn R, so I pulled up the installation I’d created a couple of years ago and loaded in your code. Then I copied the download section into the R Console. It took some time, but it read 13 items twice [ tam2 & tam6 ]. But I have a problem with the save line. Since I don’t have a d drive, I changed it to c, but didn’t try creating the file structure implied. The actual error messages are:

      > save(tam,file="c:/climate/data/models/santer/tam.tab")
      Error in file(file, "wb") : unable to open connection
      In addition: Warning message:
      cannot open file 'c:/climate/data/models/santer/tam.tab'

      Should I have left it as d:, even though I don’t have such a drive, or should I create c:/climate/data/models/santer/ or what?

  3. MarkB
    Posted May 9, 2009 at 11:05 AM | Permalink

    This is why teachers tell you “Show your work” in math class. The principle shouldn’t be difficult to understand.

  4. Ross McKitrick
    Posted May 9, 2009 at 11:10 AM | Permalink

    It’s very reassuring to learn that the algorithms were published by Santer and colleagues in the peer-reviewed literature. Otherwise we might have worried there was a screwup somewhere.

    • stan
      Posted May 10, 2009 at 6:15 AM | Permalink

      Re: Ross McKitrick (#4),

      “peer-reviewed” is becoming a pejorative. If he were still alive, George Carlin might someday have to add it to his “seven words” sketch.

  5. Steve McIntyre
    Posted May 9, 2009 at 11:17 AM | Permalink

    I uploaded collated monthly versions for the 49 runs into CA/data/santer/T2out.csv and T2LTout.csv – both tab-separated ASCII.

  6. Steve McIntyre
    Posted May 9, 2009 at 11:27 AM | Permalink

    Maybe we should be looking for a bristlecone strip bark core that “predicts” the CNRM3.0 model.

  7. Dishman
    Posted May 9, 2009 at 11:57 AM | Permalink

    Steve, I know you like to believe that people are acting in good faith.

    I’m sorry, but I feel compelled to question that in this case.

    It seems likely to me that Santer knew of the specific problem with this data set when he rudely refused to release the data.

    It seems to me that he was adopting a stance of arrogance to conceal a known defect. This is a common human behavior. (oh noes, I’ve accused him of being human.)

  8. Posted May 9, 2009 at 11:57 AM | Permalink

    It’s the same rubbish as Steig, it should be no problem for you to guess a hundred steps. Work it out.

    I just snipped the rest. — again. Sigh..

  9. Posted May 9, 2009 at 12:19 PM | Permalink

    The 1900 warming by 13 degrees was caused by the collapse of capitalism while the refreshing cooling around 1962 was caused by the publication of the Silent Spring:

    http://www.marxists.org/archive/lafargue/1900/xx/bankrupt.htm
    http://en.wikipedia.org/wiki/Silent_Spring

    Steve, before you allow the septics 😉 to develop their speculations, you should think twice about the Real Climate drivers!

    Best wishes
    Gavin …

  10. Shallow Climate
    Posted May 9, 2009 at 12:49 PM | Permalink

    Aha! Now I see! Mr. McIntyre, “You have made one of the all-time classic blunders! The most famous of which is never get involved in a land war in Asia…” But your blunder is interpreting Santer’s comments to you as rudeness–they were actually a sincere compliment! Ya see, he was saying that you COULD sort through this mess, although he could not. (So he had no choice but to go ahead and publish something anyway and then hide the mess.) Ya see, this guy worships you.
    snip

  11. Alan S. Blue
    Posted May 9, 2009 at 1:12 PM | Permalink

    How does simple removal of the offending model affect the overall average (aka the first figure)? And does the agreement with instrumental temperature improve?

    • Steve McIntyre
      Posted May 9, 2009 at 1:57 PM | Permalink

      Re: Alan S. Blue (#11),
      I’ve added a figure showing the impact of this one goofy series on the ensemble average.

      Santer et al 2008 considers the period 1979-1999. This bit of goofiness is in an earlier period (1900-1960). Given the large number of studies utilizing PCMDI data, I presume that some study somewhere has used this goofy information – unless of course the error is limited to Santer’s collation.

  12. theduke
    Posted May 9, 2009 at 1:47 PM | Permalink

    . . . Rather than “auditing” our paper, you should be directing your attention to the 2007 IJoC paper published by David Douglass et al., which contains an egregious statistical error.

    Please do not communicate with me in the future.
    Ben Santer

    • Dishman
      Posted May 9, 2009 at 2:49 PM | Permalink

      Re: theduke (#12),

      Oof. I see that in a whole new light now.

      It appears to me that he knew there were flaws, and tried to direct Steve’s attention elsewhere.

      “No, no, do it to Julia.”

      Steve’s polite requests for data should not strike fear into the hearts of men.

  13. Pat Frank
    Posted May 9, 2009 at 4:57 PM | Permalink

    The CNRM square wave step in Figure 3 puts me in mind of Figure 1 in the paper Chris Essex et al. published on prediction in chaotic systems: 2007, “Broken symmetry and long-term forecasting”, JGR 112, D24S17 1-9, doi:10.1029/2007JD008563.

    They show that large-scale vertical jumps can appear in the dynamics of a chaotic system due only to the internal dynamics, without any external cause. They call these jumps dynamical surprises, in that they cannot be anticipated. They also show that the system can return just as suddenly to the former state. Very small changes in the starting conditions can cause a system to exhibit a dynamical surprise, where before it apparently evolved coherently.

    So, really, maybe the CNRM model output isn’t actually “goofy.” Maybe it’s telling us something profound about the evolution of chaotic systems. Maybe that jump is a true model result — a spontaneous and relevant analogy to “global warming.”

    Gee, maybe climate can warm or cool without any change at all in external forcing. What a thought. It just takes the right variation in the climatic micro state to produce the new initial conditions that lead to a sudden large thermal transition.

    All this business of not enough intensity in TSI to account for the climate change is looking in the wrong direction. Causeless sudden steps are just a natural outcome of a chaotic climate. Richard Lindzen has been pointing this out for a decade or more.

    • John Ritson
      Posted May 11, 2009 at 6:47 AM | Permalink

      Re: Pat Frank (#15),
      Pat it’s a 10 degree jump, it would have gone down in history if it was real. It didn’t happen, it’s just another mistake (like all the others).

      • Pat Frank
        Posted May 11, 2009 at 11:39 AM | Permalink

        Re: John Ritson (#56), That’s true, John, but it’s a model simulation, not actual physical data. Given the hysteresis of chaotic systems, maybe the jump was a legitimate manifestation of the CNRM model rather than an aberration.

        • John Ritson
          Posted May 11, 2009 at 3:14 PM | Permalink

          Re: Pat Frank (#59), Pat, I take your point. In either case, a shaky model or bad data handling has resulted in a non physical result that should have been rectified or excluded from the paper.

  14. Posted May 9, 2009 at 5:29 PM | Permalink

    How long until Gavin claims some anonymous reader at Climate Audit found the error first?

    • AnonyMoose
      Posted May 10, 2009 at 8:23 AM | Permalink

      Re: Sonicfrog (#16),

      How long until Gavin claims some anonymous reader at Climate Audit found the error first?

      Now that it’s a holiday Sunday, Gavin will pull someone away from their mother and get this critical data problem fixed right away. But of course it’s not critical to all the studies which used it.

      • Steve McIntyre
        Posted May 10, 2009 at 8:43 AM | Permalink

        Re: AnonyMoose (#34),
        Gavin’s effort to expunge evidence of the Harry error at BAS was pretty amazing – especially when the supposed urgency didn’t extend to fixing the GISS version, which remained incorrect for some weeks afterwards.

        Anyway we’ve given Gavin something to do today.

  15. Posted May 9, 2009 at 5:59 PM | Permalink

    Thankfully this paper was published in a Peer-Reviewed Journal and worked on by mathematical wizard Gavin Sch[m]idt, clearly making data errors simply impossible. I know, I’ve checked the wiki, and 2500 of the world’s greatest climatologists ever to exhale CO2 agree that there is no such thing as a data error (so says Wikipedia – the sum of all human knowledge).

  16. Neil Fisher
    Posted May 9, 2009 at 6:47 PM | Permalink

    It never ceases to amaze me that the Team takes such a bad attitude to people who ask questions. As I’ve said before, taking the time to organise your thoughts sufficiently to coherently answer a question you haven’t considered deeply before often results in a “Eureka!” moment – at least it often does for me. And judging by the reactions and explanations of the team to previous CA deconstructions, once they get past the “it doesn’t matter” (denial) stage, it appears to me that the same thing happens to them. Odd, then, that so very few of them have seen the value of such work and continue to denigrate the fine work that is shown here, at Jeff Id’s blog, and at Lucia’s blog.

    Keep plugging away people – the work is extremely valuable, even though that might not be widely recognised until some, umm, climate scale time has passed.

  17. Barclay E. MacDonald
    Posted May 9, 2009 at 8:09 PM | Permalink

    Maybe I’m wrong, and this is all hindsight, but the error Steve points out does seem like an obvious one, readily noticed by someone familiar with or reviewing the data. My first reaction was to wonder whether there wasn’t simply some error in transcribing the data to the archive or some error in Steve’s collection of it. Obviously Steve has checked for the latter and is asking others to review what he did, just in case he missed something.

    If it is an error in transcribing Santer’s data, it appears that the person responsible for the transfer was not paying attention. On the archive site Steve refers to above, it states, “No additional documentation nor additional support for these data can be provided”.

    So it was known that what was archived had to be accurate. There is no place else to go. Moreover, it was anticipated these datasets needed to be accurate because they would be referenced in future publications: “Publications using any or all of the synthetic MSU T2 temperatures and/or the synthetic T2LT temperatures should reference these datasets as follows: “Synthetic MSU temperatures from 49 simulations of 20th century climate change were calculated as described in Santer, B.D., et al., 2008: Consistency of modeled and observed temperature trends in the tropical troposphere. International Journal of Climatology, 28, 1703-1722, doi:10.1002/joc.1756.”

    But of course there is the following disclaimer: “Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees (I assume this includes Santer) makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.”

  18. Steve McIntyre
    Posted May 9, 2009 at 8:33 PM | Permalink

    I can pretty much exclude collation problems at my end. This period of the data wasn’t used in Santer et al 2008, but the data has presumably been used in other studies over the 20CEN, where it would have an impact if this is the version in use.

    When I see this sort of error, I am seldom comforted by statements that the error is a one-off without knowing precisely what coding led to it. While the quantum of the error may be large in this run (hence the discovery of the effect), perhaps there are other less obvious goofs in the collation elsewhere.

    IMO, if Santer believes that this is an isolated problem, the easiest way to forestall criticism is to show the source code and thereby demonstrate that it’s a one-off. In a real audit, the onus would be on the company to prove that there’s no problem; it’s not all that easy to assess the impact of this sort of problem without knowing exactly what people did, and this takes time. I know the steps for the ensemble mean, but not for the H1 model runs.

  19. Geo
    Posted May 9, 2009 at 8:43 PM | Permalink

    It really is a pity they don’t see the obvious usefulness of using Steve as a resource rather than insisting on treating him as the enemy. To quote an old saying, they would be much, much better off having him inside the tent pissing out, rather than outside the tent pissing in. It would be a much better world to be arguing over interpretations rather than GIGO.

  20. Alan Wilkinson
    Posted May 9, 2009 at 9:03 PM | Permalink

    The updated “average run” chart without the “bad CNRM” still shows a step down at 1960, merely halved in size.

    So is there more bad data hidden somewhere or is there some rational reason to expect such an abrupt decrease in temperature at that date?

    • Steve McIntyre
      Posted May 9, 2009 at 9:16 PM | Permalink

      Re: Alan Wilkinson (#22),

      I’ve been trying to figure that out. It appears to be due to how volcanic forcings are modeled, and related to the Agung volcano – though not all 20CEN models appear to include volcanoes.

      • Alan Wilkinson
        Posted May 9, 2009 at 10:19 PM | Permalink

        Re: Steve McIntyre (#23),

        According to HadCRUT the biggest monthly temp drop after Mount Agung was 0.45 deg C, which seems pretty close to the drop in your adjusted chart. Assuming the models “model” volcanic forcings by plugging in the actual dates/magnitudes of eruptions, then if some of the models don’t do this it seems odd that the average drop remains as large as the actual drop.

        • Nick Moon
          Posted May 10, 2009 at 9:18 AM | Permalink

          Re: Alan Wilkinson (#25),

          …if some of the models don’t do this it seems odd that the average drop remains as large as the actual drop.

          That is a strange co-incidence isn’t it? I suppose it could be just serendipity or well, hey you don’t suppose it could be something like…

          Some climate scientist takes a whole bunch of time series datasets and decides to average them together. But it doesn’t give the right answer (or the answer they want). So they try adding and removing series until they get something that ‘looks right’. You could end up with a whole pile of essentially similar series, and then one rogue one added in, because it changes the average in one particular part of the series by just the right amount.

          No, something like that has never happened and surely never could get past the peer review process 🙂

          But why 49 series, why not 50 or a hundred? And why are some models run once and some run loads of times? Who decides to include 5 runs of one model but only one of another?

    • Posted May 9, 2009 at 9:22 PM | Permalink

      Re: Alan Wilkinson (#22),

      .. updated “average run” chart without the “bad CNRM” still shows a step down at 1960, merely halved in size

      The step down seems to correspond to Mount Agung which erupted in 1963. http://en.wikipedia.org/wiki/Mount_Agung

      The error SteveM found is definitely troubling. It would be easy enough to find something that big.

  21. pete m
    Posted May 9, 2009 at 11:50 PM | Permalink

    I collated the T2 and T2LT “TROP” data sets, each into a time series with 49 columns (representing each of the 49 runs) and as a first cut analysis, took an average of the 49 runs and the count of represented runs up to 2000 (when the population decreases sharply, though a few models are archived up to 2050).

    Steve – did you mean 2005?

    an egregious statistical error

    And lol. Beware the quote that comes back to haunt you.

    Steve: 2050 is correct. A few models (ECHAM) carry forward to 2050.

  22. james lane
    Posted May 10, 2009 at 1:55 AM | Permalink

    I’m a little confused. Does the error persist into Santer, or is it limited to the PCMDI archive?

    • Steve McIntyre
      Posted May 10, 2009 at 5:57 AM | Permalink

      Re: james lane (#27),

      Santer has done a number of widely cited articles, of which Santer et al 2008, the one arguing with Douglass et al 2007, is only one.

      In other articles (and at present this is only a quick impression), Santer seems to have worked off the same collation of PCMDI models. For example, Santer et al PNAS 2006 ( http://www.pnas.org/content/103/38/13905.abstract) looks to me like it uses the same collation – but I’m not certain of this.

      The data archived for Santer et al 2008 covers quite a bit more than the paper itself, which is limited to the satellite period – so this particular goof doesn’t matter to the supposed findings in the satellite period, some of which don’t hold up on other grounds. BTW I’ve posted our IJC submission up at arxiv.org (more on that on another occasion).

      However, CNRM3.0 over the full course of the 20th century appears to have been used in Santer et al PNAS 2006. Does the error in the archive affect the article? Dunno. It’s possible that they used a different data set for their articles and this error was introduced only in the archived version – sort of like Mann’s excuse back in 2003.

      We’ll see. To evaluate this sort of thing, you need to know how the error was introduced and what studies the erroneous data was used in, if any. I’ll wait and see what PCMDI says before examining this further.

  23. Steve McIntyre
    Posted May 10, 2009 at 6:22 AM | Permalink

    I’ve added a comparison of averages of GISS AOM (which apparently does not have volcanic forcing) with GISS EH (which does) – both averages of archived T2 tropical troposphere. I must say that I’m a little surprised by the form of the difference between the two – my understanding was that volcanic impacts were held to be fairly transient, whereas there is a notable displacement of GISS EH relative to GISS AOM around 1960. The step takes place at or near the time of the bad CNRM3.0 splice, resulting in a peculiar similarity between the diagram showing GISS EH versus GISS AOM and the diagram with and without the bad CNRM run.

    The two versions are more or less in synch after 1960, but GISS EH trends up relative to GISS AOM prior to 1960, with a very large step in the GISS EH series around 1960. It’s hard to picture that the Agung impact by itself would be so permanent. So they must have changed something else around 1960 – maybe they changed their aerosols or something like that.

    • Craig Loehle
      Posted May 10, 2009 at 8:09 AM | Permalink

      Re: Steve McIntyre (#31), It looks to me like a 15-year recovery time from the 1963 volcano (assuming that is what dropped the temp). That is far longer than actual and suggests either too long a residence time for volcanic aerosols, or errors in how the earth warms/cools in response to changes in forcing, such that it takes a long time to recover from a cool spell.

  24. Posted May 10, 2009 at 8:11 AM | Permalink

    How long until Gavin claims some anonymous reader at Climate Audit (?) found the error first?

    WOW. My worst snark ever. That should be “RealClimate”.

    PS. Happy Mom’s Day to the Moms.

  25. Posted May 10, 2009 at 9:13 AM | Permalink

    You’d think that with 17 authors, at least one of them would have taken a look at the data:

    B. D. Santer, P. W. Thorne, L. Haimberger, K. E. Taylor, T. M. L. Wigley,
    J. R. Lanzante, S. Solomon, M. Free, P. J. Gleckler, P. D. Jones, T. R. Karl, S. A. Klein, C. Mears, D. Nychka, G. A. Schmidt, S. C. Sherwood, and F. J. Wentz

    I wonder which of them will come forward and take responsibility?

    Did Douglass, Christy, Pearson and Singer use the same data in their original article, or a different version without the big error?

    If this conspicuous error is present, it’s likely that lesser ones are present as well — Steve’s discovery of the Harry AWS error led to Racer Rock, plus several others.

    I just heard Fred Singer speak at OSU, and read his “Nature, Not Human Activity, Rules the Climate”, which gives a lay-level explanation of some of the issues in Douglass et al, though it precedes the important Santer+16 comment. It’s available via http://www.sepp.org or http://www.heartland.org.

  26. Jeff Norman
    Posted May 10, 2009 at 9:14 AM | Permalink

    Steve,

    As always, your work is truly appreciated.

    The 9th, 10th & 11th runs appear to have the next highest maximum values, on the order of 2°C. Are these the ones that extend out to 2050 or are these next on your list of things to check?

  27. Paul Linsay
    Posted May 10, 2009 at 10:16 AM | Permalink

    This whole business of averaging models is nonsense. Each one embodies a different set of physical assumptions. At best, only one can be right. They could all be wrong. One theory says the sky is blue, the other that it’s red, therefore we take the average and agree that it’s green. Huh? Ensemble averaging of models is an admission of ignorance, not a statement of understanding.

    We discussed this a while ago over at Lucia’s. I think I convinced her of this.

    snip

    /rant

  28. pyromancer76
    Posted May 10, 2009 at 10:38 AM | Permalink

    It is obvious that scientific publications — “peer reviewed” — need an (honest) auditor. There should be some consequences (isn’t this how we raise children?) for all co-authors [#32] who approve errors in data or interpretation, and for publications (owners, editors) if they do not require the data and methods of interpretation to be filed with them before the publication date.

    At least PCMDI, the government data base (or whatever its purpose is), maintains “public” records. However, they should have some “auditor” who stands by the accuracy of the data stored and/or who makes certain it is corrected as soon as errors are found.

    snip – prohibited editorializing

    I think the effect of volcanic eruptions on temperature/climate with regard to step changes and the time for recovery needs closer attention — #24 and #32.

  29. Dee Nyer
    Posted May 10, 2009 at 11:05 AM | Permalink

    This is off-topic and I am completely new to commenting here, but I do not know how or where to place this in a site-friendly manner. I come here frequently to read, but am not qualified to add informed comment. So, correct me gently.

    Actually, my comment is a question. As a layman relying solely on common sense and what I read here as well as WattsUp, I have been embroiled in one of those common exchanges on another site with AGW zealots. I would appreciate comment on the following poll. I instinctively sense there is something amiss:

    http://news.mongabay.com/2009/0122-climate.html

    Looking a little further, I discovered that the 97% figure came from 75 of 77 ‘climatologists’ (out of over 3,000 respondents) agreeing with the two major questions of the poll.

    Click to access 012009_Doran_final.pdf

    Any help?

  30. Steve McIntyre
    Posted May 10, 2009 at 11:07 AM | Permalink

    GISS introduces volcanic forcing through Stratospheric Optical Depth, estimates of which were developed by Hansen’s colleagues at GISS – see references here: http://data.giss.nasa.gov/modelforce/strataer – and which, strictly speaking, were not developed independently of the temperature data. Here’s the GISS graphic showing the history.

  31. Plimple
    Posted May 10, 2009 at 11:15 AM | Permalink

    Does this error impact upon Santer’s study, i.e. the comparison between model results, radiosonde data and satellite observations during the period from the late seventies to the late nineties / early 2000s?

    It’s not obvious to me from your figures that it will impact upon their analysis. The error seems to end in the 1960s whereas their analysis begins in the late seventies. Also, Santer et al show comparisons between the observational data and each separate model, and CNRM doesn’t seem to be an outlier in this instance, i.e. in figure 3. Of course, later when they plot multi-model averages this could be important if they were calculating trends before the late seventies.

    Another possibility. Have you looked at the CNRM data in the IPCC model archive? Is this error present in that data, or is its genesis definitely linked to Santer’s efforts?

    • Steve McIntyre
      Posted May 10, 2009 at 12:38 PM | Permalink

      Re: Plimple (#43),

      Another possibility. Have you looked at the CNRM data in the IPCC model archive? Is this error present in that data, or is its genesis definitely linked to Santer’s efforts?

      No idea. At some point, I’ll notify PCMDI of the error and ask them to investigate.

      The error seems to end in the 1960s whereas their analysis begins in the late seventies.

      I discussed this above, noting the same point. The problem, if any, is more likely to be with studies like Santer et al PNAS 2006, which used models for the full 20CEN than with Santer et al 2008, which is post-1979. Please look at my earlier discussion of this point.

  32. dearieme
    Posted May 10, 2009 at 12:34 PM | Permalink

    “Santer’s rude refusal”: careful, chaps. Santer has a contractual obligation to be rude. It’s called the Santer Clause.

  33. Posted May 10, 2009 at 1:09 PM | Permalink

    Please excuse my intrusion. I am just an everyman who only superficially understands the arguments here. Is there any way for this discussion to be made accessible (meaning understandable) to the public at large? This is so important; our future hinges on it. And, maybe because my name is David Douglass, I’d like to avoid the possibility that my namesake could be responsible for catastrophic decision making. It’s obvious now that the power of public opinion CAN be a powerful force.

    Steve:
    There’s nothing in this post that is relevant to policy making nor of general public interest. Please relax.

  34. Alan S. Blue
    Posted May 10, 2009 at 1:54 PM | Permalink

    “There’s nothing in this post that is relevant to policy making….”

    Well, I certainly wouldn’t say that.

    Both at a governmental level and a journal level there is a clear need for more transparency. This post would benefit from a code-publishing requirement. The way things appear to be, there are actually requirements – but the actual application appears sporadic and clearly insufficient for replication. (Which, IMNSHO should be the absolute minimum level of compliance allowed.)

  35. Henry
    Posted May 10, 2009 at 3:09 PM | Permalink

    It is interesting that you first spotted the 1960 dip rather than the 1900 jump. There are probably two reasons for this.

    One is that there are in fact two dips in the 1960s making the effect particularly dramatic.

    The other is that with a rising trend, erroneous upward jumps are harder to spot, since they look like an extreme version of what is happening anyway. This could lead to an unconscious bias in other examples; authors checking might be spotting downward errors more easily and ignoring upward errors. All the more reason for allowing others access to the code.

  36. Posted May 10, 2009 at 4:26 PM | Permalink

    RE David Douglass # 46,

    Is there any way for this discussion to be made accessible (meaning understandable) to the public at large?

    I recommend you get a copy of Fred Singer’s “Nature, Not Human Activity, Rules the Climate”, which gives a lay-level explanation of some of the issues in Douglass, Singer et al., which Santer+16 are contesting. It’s available via http://www.sepp.org or http://www.heartland.org.

  37. Barclay E. MacDonald
    Posted May 10, 2009 at 5:56 PM | Permalink

    I appreciate your efforts to keep this thread on topic, but I kind of have to agree with Alan in 47 above. I keep coming back to the disclaimer at the PCMDI site:

    “This data available on this site was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied …for the accuracy, … of any information, … or process disclosed”

    So what good is the data for anyone ….. (OK, I’m beginning to wander far afield).

  38. Posted May 10, 2009 at 6:14 PM | Permalink

    When he refused to provide data as collated in response to my original request, Santer had emphasized how easy it was to calculate the T2 and T2LT averages from the terabytes of data at PCMDI:

    Steve, this may be slightly tangential to the topic, but I have been working with Apple Computer on a project associated with the release of the next version of their operating system, Snow Leopard. It seems that there will be a native capability (through Open CL) to do some truly awesome number crunching using the main parallel Intel CPUs as well as the Graphics Processor Units (GPUs). I can’t say exactly what we are doing yet (I am giving one of the lunch talks at the Apple Worldwide Developer Conference in June) but we are going to very soon have the ability to extremely rapidly process tens-of-gigabyte data files. I think that this is going to take desktop computing to an entirely new level in regards to scientific data processing.

    After my talk on the 12th I will post a link to the video. Here is the announcement on the Apple website.

    http://developer.apple.com/wwdc/sciencemedicine/

    If you are interested, check the Open CL Lab link.

    • DaveR
      Posted May 11, 2009 at 1:35 AM | Permalink

      Re: Dennis Wingo (#51), Such a concept has already been up and running for over 2 years on Windows platforms. It gives a very substantial increase in speed to the folding@home distributed computing project.

  39. Willis Eschenbach
    Posted May 11, 2009 at 1:06 AM | Permalink

    This dataset is still incomplete without the reference surface temperature data. They’ve given T2 and T2LT emulations. But we can’t replicate their results without whatever they might have used as the surface temperature in the tropics. This is the “base case” against which the tropospheric amplification is measured. Did they use sea surface temperature? Did they use the temperature of the lowest level of the atmosphere (1000 hPa over the ocean, less over land)? Did they use the SAT? NMAT?
    .
    Since they’re being reluctant with their data, I’ve tried three of the model results against the most logical candidate surface temperature (listed in the PCMDI database as tas). One worked perfectly. The other two didn’t work at all.
    .
    So I can’t replicate their amplification figures. Grrrr. I believe I have used the proper tas, as I had downloaded those model results before and checked them against the corresponding atmospheric temperatures. Always possible that it’s my error.

    In any case it would have been easy for them to include the surface data as well.
    .
    Unfortunately, the PCMDI datasets are huge, a quarter gigabyte or more each. I recently snagged some from a hotel while I was traveling, but here in the islands it’d be a half-day download. So I can’t check further.
    .
    Laugh or cry …
    .
    w

    • Kenneth Fritsch
      Posted May 11, 2009 at 10:32 AM | Permalink

      Re: Willis Eschenbach (#52),

      This dataset is still incomplete without the reference surface temperature data. They’ve given T2 and T2LT emulations. But we can’t replicate their results without whatever they might have used as the surface temperature in the tropics. This is the “base case” against which the tropospheric amplification is measured.

      Willis E makes a very cogent point here. The essence of the Douglass/Santer debate is about the ratio (differences) of the tropical troposphere temperature trends versus those of the tropical surface temperature trends and differences between these ratios (differences) for the observed and the modeled.

      The proper test of this would be to calculate the troposphere to surface ratios (differences) for the model results and compare them statistically with the same ratios (differences) for the observed.

      Also, the satellite comparisons confine one to bands of altitude in the troposphere, while the radiosondes allow one to make many comparisons (modeled to observed) for smaller increments of the troposphere. We can argue about the relative errors between the satellites and the radiosondes, just as we can about the difference between the UAH and RSS satellite results, but in the end they are all we have at the moment.
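      A minimal sketch of that calculation (hypothetical names: trop.t2lt and trop.tas stand for monthly tropical-mean troposphere and surface series from one model run or observational dataset):

      # trend in deg C per decade over an assumed 1979-1999 window
      dec.trend = function(x, start=1979, end=1999+11/12) {
      x = window(x, start=start, end=end)
      coef(lm(x ~ time(x)))[2] * 10
      }
      ratio = dec.trend(trop.t2lt) / dec.trend(trop.tas)   # amplification ratio
      tdiff = dec.trend(trop.t2lt) - dec.trend(trop.tas)   # trend difference
      c(ratio=ratio, difference=tdiff)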

    • Steve McIntyre
      Posted May 11, 2009 at 2:05 PM | Permalink

      Re: Willis Eschenbach (#52),

      Willis, KNMI has a web-based facility for downloading subsets of some model scenarios. While T2LT and T2 are not available at KNMI, surface temperatures for the 20cen runs are available. I’ve been experimenting off and on with a scraping program since their own access programs are the typical labor-intensive methods apparently still used in the “community”.

      I’ve collated tas data for the 19 Santer models and uploaded it to CA/data/santer/ensemble.trp.tab. A list of 19 objects, each a time series with each run in a column. Total size is 371K (as opposed to the terabytes that Santer insists be downloaded). I haven’t tried to match KNMI runs to Santer runs yet.

      • Willis Eschenbach
        Posted May 11, 2009 at 2:11 PM | Permalink

        Re: Steve McIntyre (#62), true, I’d forgotten about the KNMI data. While they have very little data at KNMI for the atmospheric temperatures, they have the surface temperature (tas). I’ll take a look at your data and see what it looks like.

        Many thanks,

        w.

        • Steve McIntyre
          Posted May 11, 2009 at 2:30 PM | Permalink

          Re: Willis Eschenbach (#63),
          I’ve compared the Santer PCM and GISS_EH runs with KNMI tas runs and they seem to be more or less consistent using “natural” order.

      • Willis Eschenbach
        Posted May 11, 2009 at 7:29 PM | Permalink

        Re: Steve McIntyre (#62), you say:

        I’ve collated tas data for the 19 Santer models and uploaded it to CA/data/santer/ensemble.trp.tab. A list of 19 objects, each a time series with each run in a column.

        I looked in climateaudit.org/data/santer and can’t find anything called “ensemble.trp.tab” … you have further clues about its location or its real name when it’s at home?

        w.

  40. Amabo
    Posted May 11, 2009 at 2:37 AM | Permalink

    I’m using CUDA right now. It’s pretty efficient until you get to datasets that are larger than the memory allocatable on your GPU, after which you have to split your data into parts and up- and download your data to and from your graphic card. A serious efficiency choker.

  41. Julius St Swithin
    Posted May 11, 2009 at 2:39 AM | Permalink

    It seems as if GCMs have problems with warming when it is not caused by CO2 and cooling when it is not caused by volcanoes. To see how models performed against observed temperatures have a look at:

    http://www.climatedata.info/Temperature/temperature.html

    Julius

  42. Mike Lorrey
    Posted May 11, 2009 at 12:03 PM | Permalink

    Excellent work, Steve. I think looking through the top ten ranked models with high maximums is good; also rank the minimums as well – that should help weed out what may be causing the anomalous drop in 1960.

    That said, everyone should remember that all of these data sets are NOT actual Microwave Sounding Unit satellite data of tropospheric temperatures, they are models/simulations of *synthetic* data, i.e. what MSU data should look like if a particular model was accurate.

    “5. Output files of synthetic MSU temperatures
    There are a total of 98 ASCII files containing synthetic MSU temperatures from
    49 simulations of 20th century climate change (16).”

    As such, we need to compare these models against the actual MSU satellite data.

    According to the referenced explanatory paper:

    “The designations T4 and T2 reflect the fact that the original MSU measurements employed MSU channels 4 and 2 (subsequently replaced by equivalent data from other channels in the latest Advanced Microwave Sounding Units). The bulk of the microwave emissions monitored by channel 4 is from roughly 14 to 29 km above Earth’s surface (150 to 15 hPa). Channel 2 primarily samples emissions from the surface to 18 km (75 hPa). The T2LT retrieval is constructed using the outer and inner “scan angles” of channel 2, and is a measure of temperatures from the surface to 8 km (350 hPa). For further details of the atmospheric layers sampled by MSU, see (2).

    2 Karl TR, Hassol SJ, Miller CD, Murray WL (eds). 2006. Temperature Trends in the Lower Atmosphere: Steps for Understanding and Reconciling Differences. A Report by the U.S. Climate Change Science Program and the Subcommittee on Global Change Research. National Oceanic and Atmospheric Administration, National Climatic Data Center, Asheville, NC, 164 pp.”

  43. Steve McIntyre
    Posted May 11, 2009 at 5:11 PM | Permalink

    I downloaded the CNRM3.0 TRP surface series from KNMI and it did not have the goofy problem of the Santer version, though the form of the model is rather odd looking.

    Update May 12: Note- see correction on KNMI collation below.

    • Scott Brim
      Posted May 11, 2009 at 6:08 PM | Permalink

      Re: Steve McIntyre (#66)

      ” …. though the form of the model is rather odd looking.”

      Well, to my unpracticed eye, it sort of resembles a Scotch Terrier.
      .
      Does this pooch come with confidence intervals of some kind?

  44. Ross McKitrick
    Posted May 11, 2009 at 9:09 PM | Permalink

    Steve, there’s something else odd about the data files. In both T2LT and T2 the year 2000 occurs 13 times. 2000:1-2000:12 is followed by 2000:1, 2001:2, 2001:3, etc. After 2000 the years are off by one row.

    Steve: There’s a trailer record. Delete the last row. I’ll do so sometime.
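    In the meantime, a workaround sketch following that note (assumes the tab-separated T2out.csv from comment 5, downloaded locally):

    T2out = read.table("T2out.csv", sep="\t", header=TRUE)
    T2out = T2out[-nrow(T2out), ]   # drop the trailer record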

  45. Steve McIntyre
    Posted May 11, 2009 at 10:15 PM | Permalink

    BTW I haven’t finished verifying this collation of KNMI data. It’s a newish script and bugs sometimes creep into these scrapes.

  46. Willis Eschenbach
    Posted May 12, 2009 at 12:19 AM | Permalink

    Steve M. and Pete M., many thanks. I’ll report back.

    w.

    • Steve McIntyre
      Posted May 12, 2009 at 6:59 AM | Permalink

      Re: Willis Eschenbach (#73),

      I re-scraped the data, fixing a problem in my read. Here is a replacement for the graphic in the earlier figure. (This was an issue only in the KNMI scrape which, as I noted above, I was still verifying. The problem arose because I needed to change the number of columns in a fixed-format read in my KNMI scrape – values like E-01 were not being read correctly.)

      • Willis Eschenbach
        Posted May 13, 2009 at 7:36 PM | Permalink

        Re: Steve McIntyre (#75), Steve, thanks for the fix. It has allowed me to calculate the amplification in the Santer datasets … not pretty. I’m writing a paper for the journals on amplification now, I’ll post the results for the Santer dataset as time allows.

        My best to all,

        w.

        • Steve McIntyre
          Posted May 13, 2009 at 7:43 PM | Permalink

          Re: Willis Eschenbach (#88),
          there seem to be some discrepancies in availability. KNMI only has 2 PCM models, while Santer uses 4.

  47. Posted May 12, 2009 at 6:54 AM | Permalink

    RE #66,
    In the 1980s, some French economists were interested in “bifurcation” models, systems of equations that could give either of two very different outcomes depending on small changes in some input. Perhaps they contributed to the Meteo France CNRM model as well?

  48. Posted May 12, 2009 at 9:18 AM | Permalink

    Hu
    I always assume a screwup in my processing before suspecting bifurcations in model output. I tend to suspect the same in others, particularly those who have been as reluctant to archive data as Santer. (If this were an exciting bifurcation, it would merit a paper in itself!)

    • bender
      Posted May 12, 2009 at 11:25 AM | Permalink

      Re: lucia (#76),

      I always assume a screwup in my processing

      Whenever I see a trend in data my first angle of attack is to ask whether there is a systematic bias in the way that the population was sampled or the attributes were measured. Similarly, anomalies usually indicate correctable errors.

  49. Kenneth Fritsch
    Posted May 12, 2009 at 9:52 AM | Permalink

    When I was playing around with the RSS and UAH troposphere temperature series I noticed that using the annual in place of monthly results yielded serial correlations of the regression residuals such that the AR1 adjustments were reduced to the point of giving more evidence of statistically significant differences between observed and modeled trends. Has anyone here looked at those differences or plans to look at them?

    I also will repeat a question that was not answered on another thread here at CA:

    If the serial correlations of the residuals are found to be not statistically significant should a correction for it be applied?

  50. Posted May 12, 2009 at 1:30 PM | Permalink

    RE Steve #75,

    Here is a replacement for graphic in earlier figure.

    Does this replace the graph in #66 (with the apparent “bifurcation”), or Figure 3 in the original post (with its 13°C step between 1900 and 1960)? [Steve: the KNMI scrape in #66 NOT the Santer 13 degrees. ]

    RE Ken Fritsch #77,

    I also will repeat a question that was not answered on another thread here at CA:

    If the serial correlations of the residuals are found to be not statistically significant should a correction for it be applied?

    Good question — On the one hand, it’s generally best to keep your model simple unless there is good reason to make it more complicated. So if you can’t reject zero serial correlation, it’s reasonable to impose that assumption. (This is the principle of “Occam’s razor,” but all too often people insist we slash our wrists with it…)

    On the other hand, if the s.c. is truly zero, as long as it’s being estimated consistently you’ll get the right answer even if you (unnecessarily) correct for it.

    And on the third hand (we economists have a lot of hands!), if you have such a short sample that it would be hard to detect serial correlation even if it were present, yet with other, similar data you regularly see serial correlation of say 0.3, it would be reasonable to correct with that figure without trying to estimate it from your own data.

  51. Posted May 12, 2009 at 1:55 PM | Permalink

    Kenneth Fritsch

    If the serial correlations of the residuals are found to be not statistically significant should a correction for it be applied?

    The answer, as with most things, is “that depends”. If you had a theoretical basis for expecting some particular magnitude of serial autocorrelation, you could use that level of autocorrelation. In which case, this is part of your hypothesis about the data, and you explain that. Alternatively, if you have tons of past data establishing some ‘known’ constant, utterly true, always existing level, you could correct based on that.

    But if you are just data mining, and find a small, statistically insignificant autocorrelation, you should probably assume the autocorrelation is zero. (On the other hand, if you are getting some number above 0.5 and excluding it, you might realize you really have so little data you can’t begin to say. Be cautious about any conclusions. Better yet, do it both ways and tell people what you get both ways.)

    Oh… I should have read Hu’s answer. Anyway, I’m an engineer and give pretty much the same advice as a three-handed economist. (But then, in the lab, the rule is “take s*** wads of data; if you want uncorrelated, wait long enough between samples. That way, you avoid complicated statistics.”)

    • bender
      Posted May 12, 2009 at 4:12 PM | Permalink

      Re: lucia (#81),
      Ken:
      If a series is not serially correlated (correlation does not attain significance), then there is no serial correlation to correct for. The lower the significance, the more trivial would be the correction.
      .
      Now, whether your sample is representative of the population about which you would like to make an inference is another question. Short time-series may exhibit negligible serial correlation simply because the statistics have not converged as they would with a long time-series. You might want to correct a sample based on what you know to be true for the population it supposedly represents.

      • Kenneth Fritsch
        Posted May 13, 2009 at 12:14 PM | Permalink

        Re: bender (#82),

        I have previously asked a second question that has not been addressed:

        If I have a monthly data series and I find it highly correlated, I do my AR1 adjustments with the resulting decrease in the degrees of freedom. Monthly data gives me more degrees of freedom than the corresponding annual data series, but can give some or all of those degrees of freedom back with the AR1 adjustments.

        If I assume that the adjustment for serial correlation of the regression residuals is an imprecise one, why would I not use the annual data series? I have observed UC state more than once at CA that the best way to handle serial correlation is to avoid it.

        I did a statistical comparison of the UAH and RSS satellite tropical troposphere temperature trends against the modeled trends using both monthly and annual data series and found that statistical significance was more readily obtained using the annual data series with the decreased AR1 values.
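        For concreteness, the adjustment I have in mind is the usual lag-1 effective-sample-size correction; a minimal sketch (x is an assumed monthly or annual temperature series):

        ar1.trend = function(x) {
        t = as.numeric(time(x))
        fm = lm(x ~ t)
        r1 = acf(resid(fm), plot=FALSE)$acf[2]             # lag-1 autocorrelation of residuals
        n = length(x)
        neff = n*(1-r1)/(1+r1)                             # effective sample size
        se = summary(fm)$coef[2,2]*sqrt((n-2)/(neff-2))    # inflated trend standard error
        c(trend=unname(coef(fm)[2]), se.adj=se, r1=r1, neff=neff)
        }
        # with monthly data r1 is typically much larger, so neff shrinks far more than for annual data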

        • bender
          Posted May 13, 2009 at 9:09 PM | Permalink

          Re: Kenneth Fritsch (#87),

          Monthly data gives me more degrees of freedom than the corresponding annual data series

          More, yes. Twelve times as many, no. Most of the monthly data is redundant as far as low-frequency climatic fluctuations are concerned. Most of the information contained in the twelving of the observations is the seasonal cycle, which is extraneous noise as far as internal climate variability and responses to external forcings are concerned. IOW the monthly data don’t increase the information content (and degrees of freedom) nearly as much as you seem to think.
          .
          Not sure what your question is.

        • Kenneth Fritsch
          Posted May 14, 2009 at 1:42 PM | Permalink

          Re: bender (#90),

          The degrees of freedom you excerpted from my post, and to which I was referring, were those used in calculating the standard error without making any adjustments for serial correlation of the residuals.

          Actually, Bender, my point is the same as yours and I understand that subdividing the annual data does not provide 12 times more data. In fact my point was, and is, that I see monthly data series used frequently when annual data would appear to provide more straightforward results, given their reduced tendency to have serial correlation of the regression residuals. At least a trend calculation providing all the adjustments and standard errors for both monthly and annual data would be more informative.

          So, no, my question has not been answered – which put more directly would be: why is there a seemingly large preference to use monthly time series with all the serial correlations of residuals when annual data could be used more straightforwardly?

        • bender
          Posted May 14, 2009 at 8:24 PM | Permalink

          Re: Kenneth Fritsch (#91),

          So, no, my question has not been answered – which put more directly would be: why is there a seemingly large preference to use monthly time series with all the serial correlations of residuals when annual data could be used more straightforwardly?

          Ah. Well, I could guess why they do it. But then I’d be accused by DaveR of mind-reading.

        • Mark T
          Posted May 14, 2009 at 2:13 PM | Permalink

          Re: bender (#90),

          Most of the information contained in the twelving of the observations is the seasonal cycle

          “Twelving” made me chuckle. 🙂

          Anyway, if you assume the seasonal (yearly) variation is the highest frequency of interest, Nyquist theory states that you only need twice the bandwidth of the “signal,” or two samples per year, to keep all of your information. It logically follows that any higher sample rate is redundant (in the case of an analog to digital conversion, there is reason to go quite a lot higher, but that is not germane to this discussion). It also logically follows that any seasonal information would be aliased back in if one only used annual data, and would show up as a low frequency term.

          Not sure if this helps toward your question, however, Kenneth. Maybe monthly data with a 6 month LPF cutoff would remove the residual problems?

          Mark

        • Mark T
          Posted May 14, 2009 at 3:12 PM | Permalink

          Re: Mark T (#92),

          It also logically follows that any seasonal information would be aliased back in if one only used annual data, and would show up as a low frequency term.

          Of course, since the annual data is presumably a mean for the year, i.e., an integrate and dump operation, there’s no real concern unless you absolutely desire the seasonal variation (for whatever reason).

          Mark

  52. Greg F
    Posted May 12, 2009 at 7:15 PM | Permalink

    Dave,

    You can define any path you want. If you’re using Windows it is easier to create the folders you want and then copy and paste the address. It is best not to save your data in the root of the C drive (that is, the drive you boot from). I would suggest going to “My Documents” and creating a folder or folders there. Open the folder the data is going into and copy the address from the address bar at the top. Paste it into the R code to replace the path Steve is using. Steve’s path is:

    save(tam,file=”c:/climate/data/models/santer/tam.tab”)

    I created a folder with the path:

    C:\Documents and Settings\Greg F\My Documents\My Data

    Append tam.tab:

    C:\Documents and Settings\Greg F\My Documents\My Data/tam.tab

    It looks like you need to replace the backslashes with forward slashes:

    C:/Documents and Settings/Greg F/My Documents/My Data/tam.tab

    The save command will now look like this:

    save(tam,file=”C:/Documents and Settings/Greg F/My Documents/My Data/tam.tab”)
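
    Just to illustrate (a minimal sketch, not part of Steve’s script – the folder name is only an example and the object “tam” is assumed to exist already), the same thing can be written with file.path():

    # Hypothetical folder; substitute whatever directory you created under My Documents
    datadir <- "C:/Documents and Settings/Greg F/My Documents/My Data"
    outfile <- file.path(datadir, "tam.tab")   # forward slashes work fine in R on Windows
    save(tam, file = outfile)                  # assumes the object 'tam' already exists
    load(outfile)                              # reads it back in a later session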

  53. Laws of Nature
    Posted May 12, 2009 at 11:17 PM | Permalink

    Dear Dave,

    in “Windows-R” you can also set a working directory (with “Change dir” in the “File” menu).
    I found that very useful, since it lets you simplify the file operations.

    Cheers,
    LoN

    • Dave Dardinger
      Posted May 15, 2009 at 1:55 PM | Permalink

      Re: Laws of Nature (#86),

      Thank you for the hint. That let me save the file properly (I’m just getting back to R since I’ve been busy with other things till today.)

  54. John Ritson
    Posted May 15, 2009 at 7:00 AM | Permalink

    I’ve learnt R this week, what fun.
    Here is my first contribution:

    These are some selected rows from CNRM3.0
    It looks like a MCMK type problem they have run into.
    Each part of the globe gets an error of a different magnitude; at least the sign is always the same.

    [9808] ” NO RTIME MS1800 GLOBAL NH SH NHHL NHML NHLL SHHL SHML SHLL TROP
    [10288] ” 480 1899.9166 1199.0000 -0.9937 -0.9119 -1.0754 0.4406 -1.0786 -1.1522 -0.7002 -1.0025 -1.2294 -1.1892″
    [10289] ” 481 1900.0000 1200.0000 8.8407 8.4133 9.2676 3.4258 6.6583 11.0346 4.0506 7.8419 11.7094 11.7167″
    [11008] ” 1200 1959.9166 1919.0000 9.3575 9.0609 9.6534 4.3603 7.4355 11.5106 5.2422 8.2988 11.8272 11.9896″
    [11009] ” 1201 1960.0000 1920.0000 -0.8382 -0.9180 -0.7583 -0.2712 -0.7613 -1.2061 0.1493 -0.4824 -1.2034 -1.2610″

  55. Posted May 15, 2009 at 11:55 AM | Permalink

    Ken

    So, no, my question has not been answered – which, put more directly, is: why is there seemingly such a strong preference for monthly time series, with all the serial correlation of the residuals, when annual data could be used more straightforwardly?

    I can only speak for myself when answering.

    I use monthly data because, if I assume the noise is AR(1) and run Monte Carlo tests using lag-1 autocorrelations of the level I see in the data, t-tests using monthly data have more power than those using annual average data. So I reduce type II error at the same level of type I error.

    It’s not a big difference. But dealing with the autocorrelation under the assumption that the noise is red is not onerous, so why “simplify” and accept somewhat higher type II errors?

    Plus, often “simplifying” means I can’t use the most recent data. I can understand ending in Dec. for peer-reviewed papers that take months to go through the process, but why ignore Jan–Nov data in December?

    • Kenneth Fritsch
      Posted May 15, 2009 at 12:38 PM | Permalink

      Re: lucia (#97),

      I use monthly data because, if I assume the noise is AR(1) and run Monte Carlo tests using lag-1 autocorrelations of the level I see in the data, t-tests using monthly data have more power than those using annual average data. So I reduce type II error at the same level of type I error.

      I do not think that what you found above is always the case. I have found the opposite.
      I am not so sure that the AR(n) adjustments are that well accepted – why else would there be so many methods of adjustment?

      Plus, often “simplifying” means I can’t use the most recent data. I can understand ending in Dec. for peer-reviewed papers that take months to go through the process, but why ignore Jan–Nov data in December?

      On this part I totally agree with you – and particularly where a tin of fudge could depend on the trend result using the latest month’s results.

      Another question: if monthly data is net/net better than annual, could weekly data be better than monthly and daily better than weekly – assuming at least a ten-year trend?

  56. Posted May 15, 2009 at 3:02 PM | Permalink

    Ken–
    I ran tests using synthetic data that was AR(1), with lag-1 autocorrelations near what we see in monthly temperature data. I don’t know if it’s always true or just true near the regions I tested with synthetic data, and I don’t know what happens if the noise has a different structure. But if I am going to assume AR(1) in both instances, then selecting the method with the higher power, under the assumption that the statistical model being applied is true, makes some sense.

    Also, if I had data where the monthly lag-1 autocorrelation was 0.85 and statistically significant given the number of monthly data points, but the annual lag-1 autocorrelation was 0.14 and not statistically significant because I have too little data, I would be very cautious about analyzing the annual average data without accounting for the autocorrelation in the time series. You know the autocorrelation shouldn’t be zero, based on the strong monthly value. If an analysis that ignored autocorrelation says “reject” but the one that used r = 0.14 says “fail to reject”, what do you conclude? (How about you, bender?)

    This sort of thing is why I think the answer to whether or not we neglect autocorrelation when it’s not statistically significant is “it depends”. In this case, the monthly data are sufficient to give strong confirmation to the notion that the lag 1 autocorrelation for annual average data is not zero.

    Mind you, if your point is that we can minimize the impact of making different assumptions about the noise structure if we use annually averaged data, I agree. If I were designing a lab experiment, I would avoid putting myself in the position of having to make assumptions about the structure of the noise by either a) taking so much data it almost doesn’t matter or b) spacing samples at least 3 integral time scales apart so the data points are effectively uncorrelated.

    • Kenneth Fritsch
      Posted May 15, 2009 at 4:50 PM | Permalink

      Re: lucia (#100),

      I ran tests using synthetic data that was AR(1), with lag-1 autocorrelations near what we see in monthly temperature data. I don’t know if it’s always true or just true near the regions I tested with synthetic data, and I don’t know what happens if the noise has a different structure. But if I am going to assume AR(1) in both instances, then selecting the method with the higher power, under the assumption that the statistical model being applied is true, makes some sense.

      Synthetic data with AR(1) correlations would not necessarily or always simulate the real world. I did the Santer-based observed versus modeled temperature trend comparison on an annual and a monthly basis (corrected for AR(1) correlation of the residuals, which was large for monthly and negligible for annual). I found the annual data in that case showed more statistical differences than the monthly.

      I appreciate your replies and those of others here on these matters, and realize that there are not always yes-or-no answers. Lucia, could you direct me to your Monte Carlo simulations testing the Type I error and power of monthly and annual data series?

      Many temperature series have daily data, but I do not believe I have seen papers published on temperature trends using daily data.

  57. Posted May 15, 2009 at 3:25 PM | Permalink

    Ken

    Another question: If monthly data is net/net better than annual could weekly data be better than monthly and daily better than weekly -assuming at least a ten year trend?

    If the noise is AR(1), then I suspect that 10 years’ worth of daily data will be better than 10 years’ worth of annual average data from the point of view of type II error. I’d have to check specifically with Monte Carlo. Or I could fiddle around with some equations to figure out how the expected value of the uncertainty in the trends drops as we switch from annual average to monthly to daily data.

    You never get around the problem that you can’t ever know the precise nature of the statistical model with any certainty.

  58. Larry T
    Posted May 15, 2009 at 3:26 PM | Permalink

    One of the things that bothers me with the whole average temperature thing is that we basically use 2 points to get the average. I will use this example: at 12:01 am a cold front comes in and drops the temperature by 20 degrees, and it stays constant until 11:59 pm the following day, when a warm front warms it by 20 degrees. The average temperature for the 2 days would be approximately the minimum + 10 degrees, where the true average is very close to the minimum. Estimating the average daily station temperature from 2 points is very inaccurate compared to using 24 points (hourly), 24*60 (minute), or 24*60*60 (second), or even the near-continuous sampling we can get in this computer age. Further integrating this to get a monthly, seasonal or yearly average station temperature, and then further integrating to get state, regional, national, continental, hemisphere or world temperature averages, strikes me as meaningless.

  59. Larry T
    Posted May 15, 2009 at 3:27 PM | Permalink

    sorry — I forgot to use spell checker first

  60. Posted May 15, 2009 at 3:35 PM | Permalink

    LarryT–
    You are describing what amounts to measurement uncertainty in the system. All measurement systems contain uncertainty. Given your specific example, one imagines that on some other day a warm front comes in doing more or less the opposite. Statistical analyses actually handle that type of uncertainty rather well. It’s not a bias.

    The siting issues are much more perplexing. If a tree slowly grows and shades a sensor, that could create a trend that has nothing to do with climate in the grid cell the experimentalists intend to observe. I suspect that over decades, encroaching or receding cities, station moves, poor station siting and slowly deteriorating housing for the sensors are potentially much bigger issues than measuring only twice a day.

  61. Larry T
    Posted May 15, 2009 at 3:57 PM | Permalink

    Lucia –

    My career work as a professional applied mathematician was orbital and trajectory analysis of missiles, satellites, and deep-space missions, including one manned Apollo mission, where I was using measurements to at least 8 decimal places and double precision was mandatory. So extracting 0.1-degree trends from data measured twice a day to 0.5 or 0.1 degrees seems very unscientific to me.

  62. Posted May 15, 2009 at 4:40 PM | Permalink

    LarryT–
    Sure. You need that sort of precision for space missions. That doesn’t mean averages can’t be useful in some context, or that the effect of measurement uncertainty in an averaged quantity doesn’t diminish when we take lots of data. Global surface temperature is an averaged quantity. Monthly averaged global temperature is averaged over space and time. The effects of a cold front making an instantaneous daily measurement wrong will be cancelled by the warm fronts. That doesn’t do you any good if you want to figure out whether you need to wear a coat on Tuesday, but it does do us some good when trying to detect trends over centuries.

    But bias errors can be killers.

  63. Posted May 15, 2009 at 6:59 PM | Permalink

    Ken-
    It’s true synthetic data aren’t necessarily real world data. Synthetic data are useful for answering certain questions about statistical techniques.

    I did comparisons for AR(1) at home, but I only documented cases with Gaussian white noise at my blog: http://rankexploits.com/musings/2009/have-you-wondered-why-i-use-monthly-data/

    I didn’t check the specific issue you allude to, which is: how often does the annual average data happen to give a “statistically significant” result when the monthly data don’t? Instead, I looked at the question: if the data have structure X, how often do we make certain types of mistakes using a given statistical method?

    I’m a bit curious about the first question, and I think it now may be worthwhile to document further. (I thought documenting more than I did previously would bore readers.)

  64. steven mosher
    Posted May 16, 2009 at 8:50 AM | Permalink

    re 102. We may prefer integrating temperature over a time period, but what we have to work with is a historical fact: the data wasn’t collected that way. At a typical station a min/max thermometer is read once a day and two measurements are recorded, a daily min and a daily max. The observer rounds the observed measurement to the nearest degree. Later these two numbers are averaged and the result (for that day) is rounded.

    But WRT the issue of integrating the data versus recording the high and low and averaging them, the fundamental questions are:

    1. Does it bias the estimation of the daily average?
    2. Does it bias the trend estimation?

    Somewhere around here on CA I linked to an article that empirically suggests that there is no systematic bias introduced. You can check this for yourself by downloading real-time data from an ASOS or from new stations in the CRN:

    http://www.ncdc.noaa.gov/crn/hourly
    and you can get it every 5 minutes if you like
    http://www.ncdc.noaa.gov/crn/hourly?sensmon=1&station_id=1007&year=2009&month=05&day=16&time=0500&timeZoneID=US/Alaska&category=exp

    here’s how the hourly temp is calculated.

    http://www.ncdc.noaa.gov/crn/officialtemp.html

    And I found a nice little directory that I overlooked before
    http://www1.ncdc.noaa.gov/pub/data/uscrn/documentation/research/

    I suggest the PowerPoint on the density study… and the doc file.
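
    For anyone who wants to poke at the min/max question before downloading CRN files, here is a toy R sketch on a purely made-up diurnal cycle (synthetic numbers, so it only illustrates the comparison, not the real-world answer):

    # One synthetic day at 1-minute resolution: a smooth diurnal cycle plus an evening cold front
    hrs  <- seq(0, 24, by = 1/60)                  # time of day in hours
    temp <- 10 + 5 * sin(2 * pi * (hrs - 9) / 24)  # diurnal sine
    temp[hrs > 18] <- temp[hrs > 18] - 8           # abrupt cold front at 6 pm
    mean(temp)                                     # "integrated" daily average
    (min(temp) + max(temp)) / 2                    # the min/max average an observer would record

    Whether a day-to-day difference like this matters for trends is exactly question 2 above, since an offset only biases the trend if it changes systematically over time.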

  65. steven mosher
    Posted May 16, 2009 at 9:11 AM | Permalink

    Hmm. An interesting bias in averaging the min and max:

    wwwmaths.anu.edu.au/events/IBS03/biometrics%20conference/Roger%20Littlejohn.ppt

    and

    http://www.engr.udayton.edu/weather/source.htm

  66. Steve McIntyre
    Posted May 16, 2009 at 10:54 AM | Permalink

    The PCMDI extraction of CNRM data has a very weird pattern in which values are extracted by decade for this model (but not for any others, other than CSIRO Mk 3.5, which wasn’t used by Santer).

    Here’s a screen capture of the PCMDI directory for SST (tos) – I presume that temperature (tas) will be similar. Note the tremendous economy of Geert’s program, which gives you the desired monthly series without having to download and process 400+ GB of data. Santer and PCMDI are employed to make this data available – Geert of KNMI works part-time on this, and his program (though far from perfect) is much better than what Santer provides.

  67. Willis Eschenbach
    Posted May 16, 2009 at 3:03 PM | Permalink

    Steve, all of the PCMDI extractions have different time periods. I suspect it depends on file size. Some of them have one big file going back to 1850, while others have split it into three or more files. Some, as you point out, run to as many as 15 files for that time period.

    PCMDI desperately needs a decent interface to these files. I can only download a quarter-gigabyte file when I’m on the road in some hotel; the connection here in the Solomon Islands sucks.

    Geert’s program is much, much better, but it contains only a fraction of the atmospheric data (most models have no atmospheric data at KNMI). Geert has been most helpful, but it is not really his job.

    w.

  68. Hu McCulloch
    Posted May 19, 2009 at 10:01 AM | Permalink

    RE Lucia #100,

    Also, if I had data where the monthly lag-1 autocorrelation was 0.85 and statistically significant given the number of monthly data points, but the annual lag-1 autocorrelation was 0.14 and not statistically significant because I have too little data, I would be very cautious about analyzing the annual average data without accounting for the autocorrelation in the time series.

    If the monthly data is AR(1) with rho = .85, then the lag 12 correlation should be .85^12 = .142. However, the annual averages will not be AR(1) with this rho, since Dec and Jan still have correlation .85. In fact, r1 for the annual averages will be .334. However, r2 for the annual averages will only be .048, far less than .334^2 = .112, so that the annual averages are no longer exactly AR(1).
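
    These numbers are easy to check by brute force; a small R sketch (purely synthetic AR(1) monthly noise with rho = .85, nothing taken from the actual temperature data) gives annual-average correlations close to the values above:

    set.seed(1)
    x <- arima.sim(list(ar = 0.85), n = 12 * 50000)   # long synthetic monthly AR(1) series
    annual <- colMeans(matrix(x, nrow = 12))          # 12 months per column -> annual means
    acf(annual, lag.max = 2, plot = FALSE)$acf        # lag 1 comes out near 0.33, lag 2 near 0.05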

    In any event, AR(1) should only be regarded as an approximation to the actual pattern of serial correlation. With monthly data, AR(2) might be identifiable in Santer’s data (20+ years), but it’s unlikely that anything more than AR(1) could be identified with the annual data. It’s probably safest to try both monthly and annual and see if they’re consistent.

    But you’re right that if AR(1) is highly significant in the monthly data, you’re justified in correcting the annual data for SC, even if it’s not significant at the annual frequency.

  69. Posted May 19, 2009 at 10:09 AM | Permalink

    Hu–
    Yes. I know you need to adjust the form of the correlation when I average over 12 months. In fact, I even posted on it several times. I’ll run that more rigorously and check if things change. (I admit… I didn’t, even though I knew.) I also want to look at the synthetic series to see not only the type II errors, but how often the annual-average results agree with the corresponding monthly ones. (I didn’t do that.)

    In any case, I think whatever one does, it’s best not to ‘shop around’. That is: you shouldn’t run both and then pick the one that gives you the answer you want in the specific case being tested. (That’s not to say you should never look at both. But the criterion for picking which you take more seriously shouldn’t be whichever gives the answer you prefer.)

    I’m going to run some more synthetic data cases to see if I change my mind about annual vs. monthly on this basis.

  70. Posted May 19, 2009 at 3:27 PM | Permalink

    Ken–
    I reran so I could compare something I hadn’t checked before, but which your questions triggered.

    This time I did the following:
    1) Created 96 months of synthetic data
    2) Created 8 years of equivalent data by computing annual averages based on the 96 months just computed.
    3) Repeated this 64,000 times to create 64,000 tests.

    I did this for a case where the monthly data are AR(1) with ρ = 0.5 as the known lag-1 correlation for the monthly data. The 8-year annual data is not AR(1), but it is what you get if the monthly data are AR(1).

    I then used the Nychka effective-sample-size method, Neff/N = (1−r)/(1+r), to correct the uncertainty bounds, and found the number of false rejections when I applied the 95% confidence limits.

    For the monthly data, I got 6.581% false rejections. (That is, Type I error is higher than intended. I could fix this up – but I don’t want to worry about it yet.)

    Now, for the annual average data, I didn’t quite know what to do to figure out the type I error for what you actually do. The average lag 1 autocorrelation based on 8 years worth of data was -0.253. For most cases, the lag 1 autocorrelation would not be significant. However, for some it would. So, when you apply your method, do you apply the Nychka correction based on the observed correlation? Or not?

    Anyway, I decided that I’d just pretend the residuals are uncorrelated no matter what I got for the lag-1 correlations. In this case, I get 3.009% false rejections when I should get 5%. So the method gives too few rejections. (Type I error too low.)

    On the other hand, if I apply Nychka for any and all values of sample autocorrelation, I get 15.683% false rejections when I should get 5%. (This is really bad, and likely no one would do this method. But the reason it happens is the lag 1 autocorrelation tends to be negative, so applying the correction often narrows the uncertainty intervals.)

    I could do other things and report various results. But before I do, could you explain your full decision-making chain? That way, I can compute the type I and type II errors for the method as implemented. That is: how do you test whether the lag-1 autocorrelation is statistically significant? When it is, do you use Nychka? Even if the lag-1 autocorrelation is negative? Etc.
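
    For anyone who wants to play along at home, here is a rough R sketch of this kind of Monte Carlo (a reconstruction from the description above, not my actual spreadsheet; here the Nychka correction is applied blindly to the estimated residual lag-1 correlation):

    set.seed(42)
    n_months <- 96; rho <- 0.5; true_trend <- 0; n_sims <- 5000
    t_mon <- (1:n_months) / 12                         # time in years

    false_reject <- replicate(n_sims, {
      y   <- true_trend * t_mon + arima.sim(list(ar = rho), n = n_months)
      fit <- lm(y ~ t_mon)
      r1  <- acf(resid(fit), plot = FALSE)$acf[2]      # lag-1 autocorrelation of the residuals
      se  <- summary(fit)$coef[2, 2] * sqrt((1 + r1) / (1 - r1))   # Nychka-style SE inflation
      abs(coef(fit)[2] - true_trend) > qt(0.975, n_months - 2) * se
    })
    mean(false_reject)   # nominal 5%; it typically comes out somewhat higher, as described above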

    • Kenneth Fritsch
      Posted May 19, 2009 at 6:10 PM | Permalink

      Re: lucia (#116),

      Lucia, I just read your post. I will look up my Santer-related calculations and get back to you. I used the Nychka correction for the monthly AR(1) and, as I recall, for the annual data also, as I do not recall checking any of the AR(1) or AR(n) residuals for statistical significance. I may be able to link to the CA post where I posted my results. As I recall, Hu M made a comment along the lines of what he did above – but also something about the leveraging of data at the ends of the time series.

  71. Posted May 19, 2009 at 8:18 PM | Permalink

    Ken – I found a slight error in my spreadsheet which will change some numbers. I’ll do the type II error under the assumption that you didn’t check the lag coefficients after computing the annual averages.

  72. Posted May 19, 2009 at 9:47 PM | Permalink

    Ken–

    Using the synthetic data above, I do find that using monthly data has greater power when I set the tests to have identical type I errors.

    In the test, I found that using the annual average data created by averaging 12 months of monthly data, when I apply a t-test at the 95% level, I get 5.95% false rejections. (That is, I say I find a statistically significant difference between the individual synthetic run and the known trend I used to generate it.) I should get 5%.

    Using the Nychka method on monthly data, I get too many rejections. So, to fix this, I added a fiddle factor of 0.204/sqrt(N) to the ordinary Nychka method to change the ‘effective number of data points’. (This is kludgy, but Nychka recommended something similar. I changed the value of 0.68 he recommends to 0.204 because that resulted in 5.92%, which is very close to the type I error when I use annual average data and ignore the AR(1) coefficient.)

    So, both these tests match type I error.

    So, then I looked at the power (power is 1-type II error.)

    To find power, you have to choose an alternative hypothesis. I picked an alternative hypothesis of 0 C/century, which I know is wrong because I generated the trends using 2 C/century. For this case, the power of the tests using monthly data is 60.4%; the power using annual average data is 52.4%. I tested other alternatives; the power changes, but monthly data always wins.

    Obviously, this is only one test. But I’m pretty sure we will find that if we correctly identify the statistical model and match type I error, using monthly data is nearly always going to have greater power than using annual average data.

    One reason I suspect this is that the standard deviation of the 64,000 trends is slightly smaller when I use monthly data. Also, I computed the standard deviation of the residuals, sy, for the 64,000 synthetic runs. I find the standard deviation of sy is a smaller fraction of the average of sy with monthly data. This means that no matter what statistical model we assume to hold, the results for the monthly data are less noisy. That has to be a point in favor of using monthly data.

    Of course, with real data, we don’t know the statistical model. But I don’t see how that can suddenly make using the annual average data obviously better. It might be better in the sense that people aren’t going to argue so much about the assumptions you made about the statistical models, but this comes at the expense of making more type II errors, because you lost information when you averaged the data before applying least squares.

  73. Kenneth Fritsch
    Posted May 20, 2009 at 8:16 AM | Permalink

    Lucia, I reported my results of reducing the adjusted trend slope standard error (not standard deviation as referenced in the posts) by using the annual data and the Nychka adjustment at the thread “This Gets Even More Amusing” in posts #30 and #32 at the link listed below:

    http://www.climateaudit.org/?p=4195

    Using monthly data from the RSS T2LT temperature series for the tropics from Santer et al. (2008) and the corresponding annual data, where TSD is the unadjusted trend SEM, ARC is the AR(1) autocorrelation of the regression residuals, ARFact is the Nychka adjustment factor and ATSD is the adjusted trend SEM:

    1979-1999 monthly: TSD = 0.031; ARC = 0.884; ARFact = 4.29; ATSD = 0.132
    1979-1999 annual: TSD = 0.085; ARC = 0.111; ARFact = 1.13; ATSD = 0.096
    1979-2007 monthly: TSD = 0.0173; ARC = 0.874; ARFact = 4.01; ATSD = 0.069
    1979-2007 annual: TSD = 0.047; ARC = 0.018; ARFact = 1.02; ATSD = 0.047

    The trends for these series were in the neighborhood of 0.150 degrees C per decade.
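
    For anyone who wants to reproduce numbers in this form, a minimal R sketch of the calculation as I understand it (the variable names are mine, the series objects are hypothetical, and my exact ARFact values above may come from a slightly different variant of the adjustment):

    trend_with_ar1 <- function(temp, per_year = 12) {
      time <- (seq_along(temp) - 1) / per_year        # time in years
      fit  <- lm(temp ~ time)
      tsd  <- summary(fit)$coef[2, 2]                 # unadjusted trend standard error
      arc  <- acf(resid(fit), plot = FALSE)$acf[2]    # lag-1 autocorrelation of the residuals
      arfact <- sqrt((1 + arc) / (1 - arc))           # Nychka-style inflation factor
      c(trend = coef(fit)[[2]], TSD = tsd, ARC = arc, ARFact = arfact, ATSD = tsd * arfact)
    }
    # trend_with_ar1(monthly_series)                  # per_year = 12 for monthly data
    # trend_with_ar1(annual_series, per_year = 1)     # per_year = 1 for annual data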

    In the meantime I want to go back and double-check my calculations, better document what I did, and reference any calculations I did subsequently on troposphere and surface trend differences using annual and monthly data.

    My review of the old posts on Santer et al. reminded me of all the debate we had on using standard deviations versus standard error in the Santer paper calculations and my initial confusion on that issue. The Santer authors (and Gavin Schmidt at RC) seemed to be contradicting themselves about Douglass being wrong in using the SEM instead of SD when in fact they used a form of SEM in the paper. I had to read the paper a second time to pick up on the contradiction. MM, in their unpublished comment paper to Santer et al., mention this issue.

    I had forgotten my initial confusion on the SEM issue in the Santer paper and was thinking that I had understood it from the start – not. Had my wife been involved in these discussions I would not have forgotten my past mistakes so conveniently.

    As an aside, Lucia, there might be differences between monthly and annual data for Type I and II errors depending on the length of the time series. I noticed that you are using an 8-year time series, whereas my series covered 21 and 29 years.

    • Kenneth Fritsch
      Posted May 20, 2009 at 8:33 AM | Permalink

      Re: Kenneth Fritsch (#120),

      I presented my difference calculations between the surface and the troposphere T2LT and T2 tropical temperature trends in several posts in the same thread following my Post #32.

      http://www.climateaudit.org/?p=4195

      See also Hu M’s comment on trend degrees of freedom and autocorrelation at Post #64 of the same thread.

  74. Posted May 20, 2009 at 8:35 AM | Permalink

    Ken–
    Is your argument that in this specific case using annual average data happens to result in a “significant” conclusion and monthly does not? I sometimes get similar things in my synthetic data. When I test for type I error, sometimes annual average data will get the false rejection but monthly won’t, and vice versa. Sometimes both do. (Most of the time, neither do.) Similar things happen with the type II error. But, on average, using monthly data results in fewer errors overall.

    So, I conclude, we should give monthly data precedence over annual average even if we prefer the answer with annual average data.

    With synthetic data, we can know whether a particular result was an error or a correct result. But you can’t know that when applying the test to the actual troposphere data.

    There could be a difference in the rate of type I and type II errors depending on the length of the time series. I could check. But… I doubt there will be a difference.

    You know the real problem with the Santer test in the troposphere. It’s not the annual vs monthly. It’s the volcanoes.

    The Santer test assumes the data are a realization of a process that is “deterministic linear trend” + “AR(1) noise”. When computing the uncertainty intervals, all deviations from the straight line are treated as “random noise”.

    But if you get the tropospheric temperature trends from the models, you will discover the ensemble average over all models during that 21-year period is not linear. So some of the residuals from the line are deterministic, not noise. In the limit where weather had no noise, the Santer method would treat these as noise with a correlation very near 1, create humongonourmous error bars, and you would not be able to reject any observed trend no matter what.

    For surface measurements during this period, this makes the Santer method overestimate the uncertainty intervals, such that a 95% confidence limit gets ratcheted up to something like 97%. I don’t know what it is for the troposphere, but when Steve gets all the data from 1970 to now sorted out, it would be worth examining.

    • Kenneth Fritsch
      Posted May 20, 2009 at 10:49 AM | Permalink

      Re: lucia (#122),

      Is your argument that in this specific case using annual average data happens to result in a “significant” conclusion and monthly does not?

      At this point it has to be my argument as I have not looked at general cases or the theoretical. I may want to do some simulations on monthly and annual data.

      But if you get the tropospheric temperature trends from the models, you will discover the ensemble average over all models during that 21-year period is not linear.

      I think that the observed trends are probably not necessarily linear either, and there has been discussion of that and of the possibility of break points in the time series – somewhat, I think, in the vein of the Koutsoyiannis process to which Bender refers in the post above.

  75. bender
    Posted May 20, 2009 at 9:38 AM | Permalink

    You guys (Hu, lucia, Ken) are getting a little too serious about the use of AR(1) and AR(2) models to represent real-world temperature data. A Koutsoyiannis process (Kolmogorov, fracdiff, yaddayadda) is not an AR process. Therefore all these various error models are going to be pretty poor approximations. Look at how your AR models perform within a given series. I bet the estimated AR coefficients are not stable through time.

  76. Posted May 20, 2009 at 12:58 PM | Permalink

    Bender–
    I’m not suggesting that AR(1) is “the truth”. Ken asked why I pick monthly data rather than annually averaged data when doing analyses. I pick it because I think that, all other things being equal, using monthly data, rather than averaging first to create annual average data, results in tests with greater statistical power.

    On the one hand we can’t know the true statistical model. But we can assume one and, given the assumed statistical model, determine which method has greater power. I told Ken I checked two cases: linear trend with white noise and linear trend with AR(1). In principle, I could check any number of statistical models. But… well, that would be time consuming. It does seem to me that if, to do an analysis, we make an assumption about the statistical properties of the noise, then we should choose between monthly and annual average data based on which method has greater statistical power under the assumption already made.

    Ken… Roger Sr. suggested a reference for a manuscript I put together. And… I’m going to look at something. Tom Wigley has a section discussing what to do when the “noise” in two separate time series is correlated. You don’t treat them the way Santer treated them. And guess what? The “noise” from Pinatubo (and other volcanoes) a) must be correlated across runs of models and b) must be correlated with the earth’s data series.

    So, I’m going to create a time series by taking the multi-model average over all runs and subtracting the observations, and then do the Nychka method on that.

    What you are testing here is whether the difference has a trend. And, if I’m interpreting the Wigley paper correctly, this is the right thing to do. (His discussion relates to comparing UAH to RSS.)

  77. Posted May 20, 2009 at 1:07 PM | Permalink

    Bender–

    Look at how your AR models perform within a given series. I bet the estimated AR coefficients are not stable through time.

    I’m absolutely sure they will be unstable. If nothing else, the assumption that the series over the 20th century is a linear trend + noise is poop. It’s known to be poop by everyone. So, in that context, I restrict my choices to: given that we are making assumption “X”, which method would be best if assumption “X” were true?

    Does my doing this necessarily mean I believe “X” is true? Not necessarily. But sometimes it’s worth making assumptions for the purpose of trying to pin something down. I don’t know how this fits into the way real statisticians approach problems. But I guess, being an engineer, I don’t see it as much different from “If the thermocouple reached steady state, what would the temperature be?” or “If we account for phenomena A, B and C, and ignore D, in the evolution of temperatures for the thermistor, what do we discover?” We can get that answer and then later investigate the effect of “D”. Doing the analysis ignoring D does not mean that we absolutely, positively are sure D should be ignored. It only means we want to know the answer under a certain specific set of assumptions.

  78. Kenneth Fritsch
    Posted May 22, 2009 at 10:13 AM | Permalink

    Lucia, my primary reason for noting the annual versus monthly differences was to determine what others here had experienced and their thoughts. I appreciate your comments and those of others. I am certainly not advocating, nor would I be qualified to advocate, for annual over monthly in all or even some cases of time series. It appears that monthly data are used in published papers most often for moderate-length temperature series.

    I was attempting a thought experiment while planting annuals, and it will probably reinforce the fact that I am a better gardener than would-be statistician. It had to do with the end-point data on a monthly series. I then kind of got into an argument with myself and decided that it would be an opportune time to use some R code to do some synthetic analyses with monthly versus annual time series. It is discussions like this one that provide me the incentive to look deeper and in more detail into these statistical choices.

    I agree that doing the analyses by differences, where practical, is the preferred method of comparison and can make determining statistical significance less complicated.

    My only problem with using monthly data lies in the extreme case where one has a time series (or other series, for that matter) in which subdividing the data only provides more degrees of freedom, and one does not attempt to compensate for autocorrelation of the residuals – in which case one would, with high probability, be struck dead by the Stats gods. In this extreme case (not saying it can be related directly to temperature series) I am thinking that the end-to-start joins of the subdivided parts would cause problems for an AR(1) adjustment.

    The critical question about using annual versus monthly data (if one assumes a reasonably good adjustment for autocorrelation of the monthly data) then becomes one of whether temperature trend changes are expected to occur on an annual or a monthly basis – or, putting both of these propositions in question, whether the changes occur on a decadal or longer basis.

  79. Posted May 22, 2009 at 3:23 PM | Permalink

    Ken–
    We had a frost last Sunday. Fortunately, I covered my annuals in pots, and had deferred planting other annuals. I did sprinkle basil seeds the day after the frost. I have extra packages of seeds, so I figure it’s worth the risk to sprinkle 1 set.

    Yes. You do need to compensate for the autocorrelation if you use monthly data (or any data that is correlated). There might be some combination of spectral properties, averaging windows and amounts of data that sometimes makes using annual averages better. I tend to doubt it – but I haven’t actually tested anything more than white noise (where monthly always beats annual average) and the case where the monthly data is AR(1) with lag-1 correlations about what we are seeing, and the lengths of data sets I’ve been using.

    My motivation in doing this was that people (particularly those who don’t like what I find) frequently suggest that somehow I’m picking monthly because I “like” the results, or that I’m picking OLS when in fact some other idiosyncratic method they dreamed up would be “better”. Anyway, I wanted to have a specific criterion for choosing between a suggested method and what I am currently doing. So my rule is: if both are applied at the same confidence level, pick the method with the lower type II error. Then I figure out, of two suggested methods that a) use the same assumptions about the noise and b) use the same confidence level, which has lower type II errors.

    Monthly has been winning out when I’ve checked which is better given a set of assumptions. I’ve found monthly is better ‘every’ time I’ve checked. (But ‘every’ means very few cases.) I anticipated this answer – but if someone found something else with other assumptions, I’d believe them. (Then I’d rerun the numbers. But that’s me. 🙂 )

  80. Posted May 24, 2009 at 12:41 PM | Permalink

    The assumption of trend + AR(1) cannot hold for all time scales (daily, monthly, annual), as the downsampling operation breaks the autocorrelation structure. Maybe we need to start with daily data (e.g. http://www.engr.udayton.edu/weather/ ) and find a proper continuous-process model for ‘weather noise’ (AR(1) — http://en.wikipedia.org/wiki/Ornstein-Uhlenbeck_process , etc.)

    • Steve McIntyre
      Posted May 24, 2009 at 1:38 PM | Permalink

      Re: UC (#129),

      Hmmm, interesting point. A “month” is not a period that has any obvious meaning. As you observe, monthly AR(1) must somehow emerge from the daily and annual structures.

      It’s ironic that modern climate science applies a period preserved from the Babylonians or Assyrians or even earlier. Perhaps in addition to her other attributes, Ishtar is the goddess of AR1.

      • Kenneth Fritsch
        Posted May 24, 2009 at 3:00 PM | Permalink

        Re: Steve McIntyre (#130),

        A “month” is not a period that has any obvious meaning.

        Tell that to any female member of the family.

        Actually the Q1-Q4 that economists employ might work better for climate science.

  81. Hu McCulloch
    Posted May 24, 2009 at 6:09 PM | Permalink

    RE Lucia, #116,

    Now, for the annual average data, I didn’t quite know what to do to figure out the type I error for what you actually do. The average lag 1 autocorrelation based on 8 years worth of data was -0.253. For most cases, the lag 1 autocorrelation would not be significant. However, for some it would. So, when you apply your method, do you apply the Nychka correction based on the observed correlation? Or not?

    The traditional Durbin-Watson protocol is to ignore negative serial correlation in the residuals and to only worry about positive serial correlation. For this reason, the DW tables are one-tailed, with a null of zero SC and an alternative of positive SC.

    The reason for this is presumably that positive serial correlation in the errors typically makes OLS t-statistics overstated. Good statisticians never want to overstate their results, but can live with understated results.

    Furthermore, if there is zero serial correlation in the errors themselves, the residuals of a trend regression will be expected to have a negative first order serial correlation, especially when the sample is small, as shown in your simulation with 8 years of data, or 30 years as in the STHTWLSFGJKKMNSSW, aka Santer08, data.

    In fact, as I show in my recent working paper at http://www.econ.ohio-state.edu/jhm/papers/MomentRatioEstimator.pdf, a zero r1 from the residuals in a trendline regression indicates modest positive serial correlation in the errors. But it won’t be significant, so you’re justified in just ignoring the problem in most cases.

    • Kenneth Fritsch
      Posted May 25, 2009 at 11:29 AM | Permalink

      Re: Hu McCulloch (#132),

      STHTWLSFGJKKMNSSW, aka Santer08, data.

      In fact, as I show in my recent working paper

      Your working paper would appear very lonely compared to the 17-author Santer et al. (2008).

      I have been attempting to better understand ARIMA and ARMA models so that I can properly simulate some temperature series. I found this link to be useful for me:

      http://www.duke.edu/~rnau/411arim.htm

      In my initial simulations (with large AR(1) correlations) I was seeing significant autocorrelation of the residuals out to very high orders (greater than AR(10)). Then I discovered the pacf function in R to go with acf, and quickly found that it was the AR(1) term that was contributing to all those higher-order correlations. Now watch me abuse the pacf function.
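
      The same thing is easy to see with a synthetic series – a minimal sketch, nothing to do with the actual temperature data:

      set.seed(123)
      x <- arima.sim(list(ar = 0.9), n = 360)   # 30 "years" of monthly AR(1) noise
      acf(x)    # the correlations stay significant out to many lags
      pacf(x)   # only the lag-1 partial autocorrelation stands out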

    • Posted May 25, 2009 at 1:25 PM | Permalink

      Re: Hu McCulloch (#132),

      a zero r1 from the residuals in a trendline regression indicates modest positive serial correlation in the errors. But it won’t be significant, so you’re justified in just ignoring the problem in most cases.

      Yes. That’s what I get. I get negative serial autocorrelation.

      When I asked my question, “you” referred specifically to Kenneth. He’s trying to explain some specific results he got using the Santer tropospheric data, and I’m trying to do comparable synthetic runs to see what end up being the type I and type II errors for his method overall.

4 Trackbacks

  1. […] May, 2009 (11:15) | Data Comparisons Written by: lucia While visiting Climate Audit I clicked a link to web page that reminded me that that Chapter 9 of the WG1 contribution to the […]

  2. […] visiting Climate Audit I clicked a link to web page that reminded me that that Chapter 9 of the WG1 contribution to the […]

  3. […] Now, assuming I had a list that told me precisely which models to exclude from the hind-cast, to perfectly replicate the graph in Figure 9.5a, I would need to obtain PCMDI data and recompute the monthly data masking as discussed in a cited reference. Presumbably, that could be done. I’ve heard rumors it’s a snap, so presumably, I quickly and easily write a script to do as great a job as Santer evidently did with the CNRM3.0 runs for the Tropical Troposphere, (see discussion by SteveM.) […]

  4. […] like those used by Santer that are readily available on the PCMDI website. Steve M. recently posted on a strange anomaly (pun) with the CNRM3.0 model showing a maximum value of over 13 °C: I […]