As a technique, CPS is little more than averaging and shouldn’t take more than a few lines of code. You standardize the network – one line; take an average – one more line; and then re-scale to match the mean and standard deviation of your target – one more line. (I’m not commenting here on the statistical merits of CPS – merely on what implementing it in a modern programming language involves.) However, Mannian CPS takes pages of code, and, needless to say, everywhere there are snares for the unwary. I guess Mann learned his programming from Hansen and passed the style along to his students.
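For reference, the whole technique really can be sketched in three lines of R. (The variable names here are hypothetical, not Mann's; this is a minimal illustration, not his code.)

```r
# Minimal CPS sketch (hypothetical names, not Mann's code):
# proxies is a years x N matrix, target is the instrumental series over
# the calibration period, calib indexes the calibration rows.
simple_cps <- function(proxies, target, calib) {
  Z <- scale(proxies)                # standardize the network - one line
  avg <- rowMeans(Z, na.rm = TRUE)   # take an average - one more line
  # re-scale to the target mean and sd - one more line
  (avg - mean(avg[calib])) / sd(avg[calib]) * sd(target) + mean(target)
}
```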
We had previously spent some time starting down this analysis before we were so rudely interrupted by Santer et al 2008. So let’s pick up where we left off.
We had just finished smoothing the proxies. This had its own set of complications, as Mann used a Butterworth filter on uncentered data. This led to peculiar endpoint problems, requiring long endpoint padding to control the effect of the Butterworth filter. Emulating this in R was complicated by the odd problem that R’s filtfilt operates differently from Matlab’s filtfilt. Roman M identified and resolved this by writing a pretty function, filtfiltx, which emulates Matlab’s filtfilt, enabling an emulation of Mann’s smoothing function in R.
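The general shape of the smoothing step can be sketched as follows, assuming the R `signal` package. The filter order and padding length here are illustrative only – they are not Mann's exact parameters, and this uses R's own `filtfilt` rather than Roman M's filtfiltx.

```r
# Hedged sketch of a Mann-style lowpass smooth (not Mann's exact code).
# Filter order (5) and pad length (100) are illustrative assumptions.
library(signal)

mann_smooth <- function(x, cutfreq = 0.1, npad = 100) {
  npad <- min(npad, length(x))
  bf <- butter(5, 2 * cutfreq)   # normalized cutoff: cutfreq / Nyquist(0.5)
  # reflect-pad the endpoints to tame filtfilt edge effects
  xp <- c(rev(x[1:npad]), x, rev(x[(length(x) - npad + 1):length(x)]))
  y <- filtfilt(bf, xp)          # zero-phase (forward-backward) filtering
  y[(npad + 1):(npad + length(x))]
}
```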
Aside from the fact that I’m used to R, going through the exercise of translating Matlab into R is helpful in walking through each step, always bearing in mind UC’s “Implicit Assumption Advisory.”

On to gridboxcps.m.
After smoothing the proxies using Mannian smoothing, they smooth the “instrumental” data – the “instrumental” data appears to be related to CRU data, but has been “infilled” back to 1850 in many gridcells that don’t actually have 19th century data. Mannian correlations are often between infilled proxies and infilled data. The instrumental data is smoothed using the same smoothing function as the proxies. The comments in the program say that the data is smoothed with a “lowpass filter with a cutoff frequency of 0.05 — 20 years”, but the parameter cutfreq is set at 0.1 (10 years). It’s a bit of a guess which was actually used for the archived runs, so I worked this through using cutfreq=0.1, presuming that the comment is incorrect rather than the wrong version having been archived (though either is possible).
Averaging into Gridcells
The next step is to “average all proxies … into 5×5 grid boxes”. This is something that should be pretty routine, but hey… If I were doing the exercise from scratch, I would do a one-time assignment of a gridbox number (I use the “Jones” sequence N to S, Dateline to Dateline) in my proxy “info” database and use this information whenever required. This is easy enough to do.
Mann does something a little different. Instead of assigning each proxy to a gridcell, he makes an empty dummy array of 1732 gridcells and looks for proxies contained within each gridcell, a procedure repeated over and over – a piece of pointless inefficiency in rococo GISTEMP style. However, in addition to the inefficiency, each search uses “less than or equal to” comparisons on the gridcell boundaries, so a proxy sitting exactly on a boundary is found by more than one gridcell. In the AD800 network that I’ve been looking at, Mann has (incorrectly) used coordinates of 12.5N 50E for the Socotra Island speleothem. On its face, the logic of the program includes this proxy in two gridcells: 12.5N 47.5E and 12.5N 52.5E. This problem was noted a couple of weeks ago by Jean S and UC, who observed that the regts series in two gridcells were identical in an early network (Socotra Island).
Given that a relatively small number of proxies are exactly on gridcell boundaries, my procedure would be to re-examine the source documents for a more precise location to take the lat-long off dead center. In the case of Socotra Island, the matter is easily resolved as the original reference places the site at 12.5N 54E, removing the problem in this instance.
Because the Mann algorithm permits proxies to be located in more than one gridcell, the emulation of his method (using a concise style) is complicated somewhat. Instead of having one gridbox associated with each proxy, you need a list of 1-4 gridcells associated with each proxy. I did this in what I thought was a fairly pretty way. I have a function jones that locates the Jones gridcell for a given lat-long. I created a new function outerjones that yielded 1-4 gridcells in the Mann style and made a list for all 1209 proxies for repetitive use. The outerjones function perturbed each lat-long location plus-minus 0.0001 deg and found the unique gridcells (4 on the exact corner). This eliminates one repetitive step.
outerjones = function(x) unique(c(outer(x[1] + c(-.0001, .0001), x[2] + c(-.0001, .0001), jones)))
outerlist = apply(details[, c("lat", "long")], 1, outerjones)
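The `jones` function itself isn't shown above; a hypothetical reconstruction consistent with the stated convention (N to S, Dateline to Dateline, 72 five-degree cells per latitude band) would look like this. It happens to reproduce the two Socotra cell numbers (1126 and 1127) quoted below, but it is my sketch, not the original function.

```r
# Hypothetical sketch of a Jones-sequence gridcell lookup (my reconstruction,
# not the original function): 5x5 cells numbered N to S, Dateline to Dateline.
jones <- function(lat, lon) {
  row <- floor((90 - lat) / 5)         # 0..35, counting from the North Pole
  col <- floor((lon + 180) / 5) %% 72  # 0..71, eastward from the Dateline
  row * 72 + col + 1
}
```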
To simplify the programming, for a given step, I then made a subset of the proxies that (1) started early enough in time; (2) passed the Mannian correlation step; and (3) were trimmed to start at the beginning of the Mannian step. For the AD800 step, this network goes from 800-1996 and has 18 series.
tempi = !is.na(proxys[period_MBH[k] - tsp(proxys)[1] + 1, ]) & passing; sum(tempi)
temp = (time(proxys) >= period_MBH[k])
working = proxyscaled[temp, tempi]; dim(working)  # 1197 18
dimnames(working)[[2]]
We then determine the gridcells attached to these proxies using the list that we created once (rather than cycling through 1732 gridcells over and over). In this case, there are 13 such gridcells (including two cells for Socotra Island). The list of gridcells can be obtained in one line as follows:
idsort = sort(unique(unlist(outerlist[tempi]))); K = length(idsort); K  # 13
# 94 245 328 402 661 780 922 955 1126 1127 1462 1626 1938
Next we set up an array with one column for each of the above gridcells. We name each column by its Jones gridcell number (which is not the same as the Mann gridcell number – Mann goes from S to N). I use Jones numbers because I’m used to them.
grids = array(NA, dim = c(nrow(working), K))
dimnames(grids)[[2]] = idsort
We then average the standardized proxies within each gridcell, again emulating the Mannian calculation in a more intelligible style. One of the few annoying features of R is that some important functions (e.g. apply) work only on matrices; the average function shown here will average the rows of a matrix but won’t fail when handed a vector.
average = function(X) if (is.null(ncol(X))) X else apply(X, 1, mean, na.rm = TRUE)
For each gridcell column, the attached proxies are picked out from our standard list and averaged (they have already been standardized in Mann style). Using techniques like lists and lapply enables one to accomplish pages of programming in one line and, more importantly, to avoid the endless subscripting that characterizes Mann-Hansen programming.
for (i in 1:K) {
  tempr = !is.na(unlist(lapply(outerlist[tempi], function(A) match(idsort[i], A))))  # picks out the series in gridcell i
  grids[, i] = average(working[, tempr])
}
grids = ts(grids, start = period_MBH[k])
But aside from style, one feels that clearer programming would have identified the apparent defect by which one proxy is assigned to multiple gridcells.
Calibrate Proxies Against Instrumental Data
In a typical CPS calculation, the proxies are assumed, rightly or wrongly, to be temperature proxies and calibration is done after they are averaged. Mannian CPS calibrates on more than one occasion and again it appears that programming issues lead to some unexpected results.
This step is done in the gridboxcps.m program after the comment % calibrate proxies against instrument data. The calibration is a usual Team method: re-scale and re-center the proxy to the mean and standard deviation of the “target” instrumental data. As the program is written, this requires that target instrumental data exist for the gridbox (or gridboxes) to which the proxy is attached. What happens if there is no target instrumental data in the required box? As written, the program omits the affected proxy. In the AD800 network, two series are affected: Agassiz melt and Central Greenland GISP. As far as I can tell, they do not enter into subsequent CPS calculations. So at the end of these two operations, Mannian CPS has double-counted Socotra Island and removed Agassiz and Central Greenland, leaving us with 11 gridcells. [see below as this doesn’t seem to be right for the AD1000 network examined by UC. UC’s compilation below doesn’t contain Tornetrask and Tiljander, but does contain GISP and Agassiz – I’ll check with him on this, as this seems odd.]
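The calibration rule and the omission behavior described above can be illustrated in a few lines. (The function name and arguments here are mine, chosen for clarity – this is a sketch of the rule, not the gridboxcps.m code.)

```r
# Hedged illustration of the calibration rule (my names, not Mann's):
# re-scale a gridbox proxy average to the instrumental mean/sd; if the
# box has no instrumental target, the proxy is simply dropped.
calibrate <- function(proxy, obs.mean, obs.sd, calib) {
  if (is.na(obs.mean) || is.na(obs.sd)) return(NULL)  # no target: omitted
  (proxy - mean(proxy[calib])) / sd(proxy[calib]) * obs.sd + obs.mean
}
```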
Update – I’ve now compared UC’s graphic here against my emulation. I can identify all 10 series in his graphic, which confirms the double-counting of the Socotra series, but UC’s chart includes the Agassiz and GISP series (while excluding a number of series that my emulation would have included), so I’m missing something here. The caption below the figure identifies the series.

Left column, top to bottom: 1 – Socotra Island speleothem (C13-O18 combination); 2 – Punta Laguna sediment combination; 3 – Graybill foxtail (Campito Mt ca533); 4 – Scotland speleothem (Baker); 5 – Graybill bristlecone (nv512). Right column, top to bottom: 1 – duplicate Socotra speleothem; 2 – inverted Dongge Cave speleothem; 3 – Shihua Cave speleothem; 4 – GISP O18; 5 – Agassiz melt.
Here’s how I emulated this calculation (the first line coordinates between Jones-sequence gridcell numbering and Mann-sequence gridcell numbering):
test = match(idsort, info$jones); temp = !is.na(test); sum(temp); info[temp, ]
mean.obs = info$mean[test]; sd.obs = info$sd[test]
mean.proxy = apply(grids[(1850:1995) - tsp(grids)[[1]] + 1, ], 2, mean)
sd.proxy = apply(grids[(1850:1995) - tsp(grids)[[1]] + 1, ], 2, sd)
regts = array(NA, dim = c(nrow(grids), ncol(grids)))
dimnames(regts)[[2]] = dimnames(grids)[[2]]
for (i in (1:ncol(grids))[!is.na(mean.obs)]) {
  regts[, i] = (grids[, i] - mean.proxy[i]) * sd.obs[i] / sd.proxy[i] + mean.obs[i]
}
regts = ts(regts[, !is.na(mean.obs)], start = tsp(grids)[1]); dim(regts)
Re-Gridding
The Mann SI states:
In regions within North America and Eurasia with especially dense spatial coverage, grid boxes were first averaged over 15° resolution boxes before inclusion in the spatial average to diminish potential bias due to variations in spatial sampling density (see Fig. 1B).
In the computer code, this is done as follows. All of the gridded series located south of 30N are retained as is. The gridded series north of 30N are re-averaged into 15×15 degree boxes. In the AD800 network, this results in the Tornetrask and (upside-down) Tiljander sediments being averaged. We now have 10 gridboxes, plots shown below (with names of contributing series attached):
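The re-gridding step, as I read it, can be sketched as follows. The function and argument names are mine, and the exact 15-degree box boundaries are an assumption – this shows the mechanics (south-of-30N cells pass through, northern cells are averaged by 15×15 box), not Mann's exact implementation.

```r
# Hedged sketch of the re-gridding step (my reading, not Mann's exact code):
# regts is a years x gridcells matrix; lat/lon are the gridcell centers.
regrid15 <- function(regts, lat, lon) {
  north <- lat > 30
  key <- ifelse(north,
                paste(floor(lat / 15), floor((lon + 180) / 15)),  # 15x15 box id
                paste("S", seq_along(lat)))                        # kept as-is
  sapply(unique(key), function(k) rowMeans(regts[, key == k, drop = FALSE]))
}
```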
As in other studies, for analysis of the period back to AD800, the number of contributing proxies is very small. The claimed use of 1209 proxies is mostly rhetorical. Many of these proxies are already familiar to us.
Tornetrask, Graybill bristlecones, Quelccaya and Cook’s Tasmania reconstruction are staples of all the reconstructions. The Graybill bristlecone chronology is one of only a couple of strong HS contributors. The other main HS contributor is the Tornetrask-Tiljander combination. We’ve already discussed the Tiljander series – Tiljander reported that the modern portion of this series has been disturbed; Mann inverted the authors’ interpretation and shows the series upside down from their reading. (Obviously the authors don’t think that the data went hugely down in the 20th century; that’s why they said the modern portion shouldn’t be used.) But you can see that Mann’s error in the Tiljander series is not simply incidental.
Most of the other series are speleothems, a class of proxy relatively undiscussed at CA. Shihua Cave was used in Moberg; the South African speleothem was used in Loehle, with this use being criticized by Gavin Schmidt. (Loehle used the grey-scale series, Mann used C13 and O18 – some of Schmidt’s disputes with Loehle would seem to apply to Mann as well.) Dongge Cave and Socotra Island are both speleothems as well. I plan to do specific posts on these proxies.
NH and SH Averages
Finally, Mann calculates NH and SH averages, re-scaling the averages to the target CRU (or HadCRU) hemispheric temperature. As far as I can tell from the code, the reconstructions against the different “targets” are linear transformations of one another, with only the mean and standard deviation of the recon varying somewhat. In each case, the reconstruction is a weighted average of the contributing proxies, with the Mannian weights varying a little due to things like re-gridding, intermediate rescaling, etc.
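The point about targets is easy to verify numerically: re-scaling the same underlying average to two different targets yields series that are exact linear transformations of each other. (The target means and standard deviations below are invented numbers for illustration.)

```r
# Demonstration that two re-scalings of the same average are linear
# transformations of each other (illustrative target values, not CRU's).
rescale <- function(x, target.mean, target.sd) {
  (x - mean(x)) / sd(x) * target.sd + target.mean
}
set.seed(5)
x <- cumsum(rnorm(200))       # a stand-in for the proxy average
a <- rescale(x, 0.0, 1.0)     # "CRU" target (invented numbers)
b <- rescale(x, 0.2, 0.8)     # "HadCRU" target (invented numbers)
# a and b are perfectly correlated: same recon, different mean/sd
```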
Though Mann has provided a commendable amount of information on the reconstruction, I have so far been unable to locate anything other than spliced versions of output (as with MBH98). Below is a comparison of my emulation (spliced) to the spliced NH reconstruction (CRU target). It isn’t quite the same – even when you’re spotted the data and code, it’s never easy repeating a Mann calculation – but it’s pretty close: certainly close enough to justify sensitivity analyses, like the impact of using the Tiljander data the right way up or Ababneh bristlecones, which I’ll do another day.
