I examined some gridcells that were outliers under the ARMA(1,1) model. Here’s the result for the first cell I looked at: the top panel shows the autocorrelation function (ACF), which has an unusual structure, to say the least. The second panel shows the temperature anomaly series, which is equally odd.
I started by selecting cells for which the AR1-MA1 coefficients exceeded 1.9 and looked first at the cell with the maximum value: gridcell 2.5N, 12.5E, which appears to be in Congo (not Zaire). For that cell I plotted the autocorrelation function and then the temperature series. Obviously some of the gridcell values are wrong. Here is an excerpt from the data set showing that 10 values are completely wrong. This is the 2003 vintage of the data set, and I will update to the 2005 version. It’s possible that this has been picked up in later updates, but even if it has, each edition of these datasets has obviously been used extensively.
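For anyone who wants to repeat this screening on other cells, here is a minimal Python sketch of the two steps: ranking cells by a precomputed value and computing a sample ACF. It assumes the per-cell AR1-MA1 value has already been obtained from an ARMA(1,1) fit (the fitting step is not shown), and the names `select_cells` and `sample_acf` are hypothetical, not taken from CRU or any published code.

```python
def select_cells(stats, threshold=1.9):
    """Return (cell, value) pairs with value > threshold, largest first.

    `stats` is a hypothetical mapping from a cell key such as (lat, lon)
    to a precomputed AR1-MA1 value; the ARMA(1,1) fit itself is not shown.
    """
    hits = [(cell, v) for cell, v in stats.items() if v > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def sample_acf(x, max_lag):
    """Sample autocorrelation r_k = c_k / c_0 for lags 0..max_lag.

    Assumes a complete series with no missing values.
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance (variance)
    if c0 == 0:
        raise ValueError("constant series has no defined ACF")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
        for k in range(max_lag + 1)
    ]
```

Because every lag is normalized by c_0, a handful of grossly wrong values can dominate the variance and distort the whole ACF, which is one way a cell like this can end up flagged.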
This seems pretty sloppy for a dataset that has supposedly been intensively scrutinized and peer-reviewed by stadiums of scientists. It’s not as though it took me very long to spot this defect. I was under the impression that CRU was supposed to have quality control systems in place to pick up egregious outliers like this. It would be a good idea for someone to scrutinize the procedures and see what happened in this case.
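Whatever CRU's actual quality control consists of, even a crude screen would catch values this far out of line. As a purely illustrative sketch (not CRU's procedure), here is a median/MAD filter in Python; the function name `flag_outliers` and the cutoff `k=5` are arbitrary assumptions for the example.

```python
from statistics import median


def flag_outliers(values, k=5.0):
    """Return indices whose distance from the median exceeds k * MAD.

    `values` may contain None for missing months; these are skipped.
    k=5 is an arbitrary illustrative cutoff, not a CRU setting.
    """
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median([abs(v - med) for v in present])  # median absolute deviation
    if mad == 0:
        return []  # degenerate case: no spread to compare against
    return [
        i for i, v in enumerate(values)
        if v is not None and abs(v - med) > k * mad
    ]
```

The median and MAD are used rather than the mean and standard deviation because a few wild values inflate the standard deviation and can hide themselves; the median-based spread is barely affected by them.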
Does this sort of error "matter"? I don’t know. The first problems that I noticed with the Mann data set were little things.
Another curiosity in this data set, which may indicate a more serious type of problem: notice the episode of values in the 19th century. How on earth could the editors of this dataset purport to guarantee "homogeneity" between those values and the 20th-century data when there is a gap of more than 50 years between them?
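Gaps like this are easy to find mechanically. A minimal sketch, assuming a cell's record has been reduced to a list of years that contain at least one valid value; `longest_gap` is a hypothetical helper, not part of any CRU tool.

```python
def longest_gap(years):
    """Return (last_year_before_gap, first_year_after_gap, missing_years).

    Returns None if fewer than two distinct years are present.
    """
    ys = sorted(set(years))
    if len(ys) < 2:
        return None
    best = None
    for a, b in zip(ys, ys[1:]):
        missing = b - a - 1  # whole years with no data between a and b
        if best is None or missing > best[2]:
            best = (a, b, missing)
    return best
```

Any cell whose longest gap exceeds a few decades would then be a candidate for the kind of homogeneity question raised above.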