Here are some notes on my attempts to replicate Juckes’ CVM calculations, together with a script.
I can replicate some reconstructions very closely – e.g. Esper and Jones within less than a tenth of a degree of the archived CVM – but other replications, including the Union reconstruction, are not as close. In each case, I checked the closeness of the CVM replication by calculating the correlation to the archived CVM series and the range of discrepancies. There are a few other puzzles listed below, including Juckes’ use of the “unadjusted” Mannian PC1. Maybe Juckes will be prepared to clarify some of the problems that I encountered; but, if not, maybe others can solve the Juckesian conundra. The script contains two functions: juckes.cvm, which calculates CVM reconstructions using a network of proxies and a target period; and verification.stats, which generates calibration r2 and RE statistics and verification r2, RE and CE statistics.
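As a rough illustration of the closeness check just described (correlation to the archived series plus the range of discrepancies), here is a minimal Python sketch. It is not the script itself; the function names `pearson` and `closeness` are hypothetical and the inputs are assumed to be two equal-length lists already aligned by year.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def closeness(emulation, archived):
    """Correlation and range of (emulation - archived) discrepancies.

    Both inputs are assumed to cover the same years in the same order.
    """
    diffs = [e - a for e, a in zip(emulation, archived)]
    return {
        "r": pearson(emulation, archived),
        "min_diff": min(diffs),
        "max_diff": max(diffs),
    }
```

A correlation with several leading 9s combined with a narrow discrepancy range suggests the method has been matched; a high correlation with a wide range points to a scaling difference.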
For nearly all cases, I first did a calibration on 1856-1980 following Juckes and, in addition, did a calibration on 1902-1980 with verification on 1856-1901 following MBH. I presented results last December at AGU on verification statistics for the archived reconstructions, noting the high calibration r2 statistics, negligible verification r2 statistics and calibration DW statistics that were typically well into the danger zone.
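For readers unfamiliar with these statistics, the following Python sketch shows the standard textbook definitions of r2, RE, CE and the Durbin-Watson statistic. It is an illustration under stated assumptions, not a transcription of verification.stats: the function names are hypothetical, and `obs` and `est` are assumed to be equal-length lists with calibration and verification periods given as lists of indices.

```python
def _mean(v):
    return sum(v) / len(v)

def r2(obs, est):
    """Squared Pearson correlation between observed and estimated values."""
    mo, me = _mean(obs), _mean(est)
    sxy = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((e - me) ** 2 for e in est)
    return sxy * sxy / (sxx * syy)

def skill(obs, est, ref_mean):
    """1 - SSE / (sum of squares of obs about ref_mean).

    With ref_mean = calibration-period mean this is RE;
    with ref_mean = verification-period mean it is CE.
    """
    sse = sum((o - e) ** 2 for o, e in zip(obs, est))
    ref = sum((o - ref_mean) ** 2 for o in obs)
    return 1.0 - sse / ref

def durbin_watson(resid):
    """Durbin-Watson statistic of a residual series (values near 2 indicate
    little first-order autocorrelation)."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    return num / sum(e * e for e in resid)

def verification_stats_sketch(obs, est, calib_idx, verif_idx):
    oc = [obs[i] for i in calib_idx]
    ec = [est[i] for i in calib_idx]
    ov = [obs[i] for i in verif_idx]
    ev = [est[i] for i in verif_idx]
    mc, mv = _mean(oc), _mean(ov)
    return {
        "calibration_r2": r2(oc, ec),
        "calibration_RE": skill(oc, ec, mc),
        "calibration_DW": durbin_watson([o - e for o, e in zip(oc, ec)]),
        "verification_r2": r2(ov, ev),
        "verification_RE": skill(ov, ev, mc),  # reference: calibration mean
        "verification_CE": skill(ov, ev, mv),  # reference: verification mean
    }
```

Because the verification-period mean minimizes the reference sum of squares, RE is never lower than CE for the same verification period.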
Juckes archived his data in zipped directories. The files are not particularly large and it would be convenient if they were directly readable without having to manually unzip them. You also need to be familiar with the ncdf format to start using the series. These are not big obstacles, but I don’t understand why the series couldn’t have been saved in text files. To save interested readers the labor of doing these collations, I’ve archived proxy and CVM reconstruction data from Juckes’ website in *.txt files so that interested readers can verify analyses without having to fool around with Juckes’ zip files or download ncdf packages.
The Esper CVM reconstruction is based on only 5 series – of which two are foxtail sites within about one inch of one another. These two series are said to be “independent”, although they obviously aren’t. (In some chronologies, sites as close as these two foxtail sites would be included in the same chronology, e.g. Tornetrask, Yamal.) Both foxtail sites recur in the Union reconstruction, again as “independent” sites. Here I got a very close replication – four 9s correlation and within 0.01 deg. I’m not sure why the replication wasn’t exact – one should be able to get exact replication at this stage. However, the closeness of the replication indicated that the emulation method was correct, i.e. scale on 1000-1980; average; and re-scale on 1856-1980 target. The Esper CVM, calibrated on 1856-1980, had a high r2 in 1902-1980 and ~0 in 1856-1901. The RE statistic (1856-1901 here as later in this note) was 0.11 (0.06 for 1902-1980 calibration). In my script, I’ve shown results for calibration on 1856-1980 for each case – the results are typically high calibration r2, ~0 verification r2 and RE values a little higher than calibration on 1902-1980. However, to summarize the discussion, results in the paragraphs below will be based on 1902-1980 calibration, reserving 1856-1901 as a verification period (readers can consult the script for further details).
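The three steps of the emulation (scale, average, re-scale) can be sketched in Python as follows. This is a simplified illustration, not juckes.cvm: the names `standardize` and `cvm_sketch` are hypothetical, each series is assumed to be a dict mapping year to value, and the scaling here uses the mean and population standard deviation, which is an assumption and may differ from the normalization Juckes actually used.

```python
from statistics import mean, pstdev

def standardize(series, years):
    """Center and scale a {year: value} series using the given reference years."""
    vals = [series[y] for y in years]
    m, s = mean(vals), pstdev(vals)
    return {y: (v - m) / s for y, v in series.items()}

def cvm_sketch(proxies, target, scale_years, calib_years):
    """Composite-plus-variance-matching sketch.

    1. scale each proxy on scale_years (e.g. 1000-1980);
    2. average the scaled proxies over their common years;
    3. re-scale the composite so that its mean and standard deviation
       over calib_years (e.g. 1856-1980) match those of the target.
    """
    scaled = [standardize(p, scale_years) for p in proxies]
    common = sorted(set.intersection(*(set(p) for p in scaled)))
    composite = {y: mean([p[y] for p in scaled]) for y in common}

    cvals = [composite[y] for y in calib_years]
    tvals = [target[y] for y in calib_years]
    cm, cs = mean(cvals), pstdev(cvals)
    tm, ts = mean(tvals), pstdev(tvals)
    return {y: tm + (v - cm) * ts / cs for y, v in composite.items()}
```

For example, `cvm_sketch(proxies, target, range(1000, 1981), range(1856, 1981))` would follow the periods mentioned above, assuming every proxy and the target have values for all of those years.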
This uses 6 series, including 3 SH series. Again I got pretty much exact replication, although I’d like to get exact replication. The calibration r2 was 0.20; verification r2 was ~0 and verification RE was -0.60. I’ve discussed some particular problems with this reconstruction – the ad hoc Tornetrask “adjustment”, which is carried into the Union reconstruction and the questionable 11th century Polar Urals record, also carried into the Union reconstruction.
Here my emulation had three 9s correlation, but the scale was somewhat off, leading to discrepancies of up to 0.05 deg C. I have no idea why. Juckes uses 10 of 18 proxies, excluding the Sargasso Sea proxy (which seems like one of the best SST proxies) and the Yakutia/Indigirka proxy – that’s the one that Juckes challenged me to show was a proxy. Once again, a characteristic pattern of statistics – high calibration r2 (0.37), negligible verification r2 (0.0004) and low RE (0.11).
Again my emulation had four 9s correlation, but a slight scale discrepancy leading to occasional discrepancies of over 0.12 deg C. This reconstruction only went to 1960 (because of the divergence problem), is heavily smoothed and is second-generation cherry-picking of proxies from prior studies. It has characteristic high calibration r2 (0.37), negligible verification r2 (0.002), but highish RE (0.34 for 1856-1980 calibration and 0.48 for 1902-1980 calibration). This higher RE doesn’t mean that it’s “right”, merely that, as readers can surely see, the RE statistic is sensitive to tuning the change in mean in the series.
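The sensitivity of RE to the change in mean can be seen in a toy example. The numbers below are invented purely for illustration and have nothing to do with any archived series; `re_stat` is a hypothetical helper using the standard RE definition (reference: calibration-period mean). An "estimate" that merely sits at the verification-period mean, or even one that is perfectly anti-correlated with the observations but has the right mean, still gets a positive RE when the verification mean differs from the calibration mean.

```python
def re_stat(obs, est, calib_mean):
    """RE = 1 - SSE / sum of squares of obs about the calibration mean."""
    sse = sum((o - e) ** 2 for o, e in zip(obs, est))
    ref = sum((o - calib_mean) ** 2 for o in obs)
    return 1.0 - sse / ref

# Invented verification-period observations, cooler than the calibration mean.
calib_mean = 0.0
obs_verif = [-0.4, -0.2, -0.3, -0.1, -0.5]
verif_mean = sum(obs_verif) / len(obs_verif)

# Estimate 1: a flat line at the verification mean (no year-to-year skill).
flat = [verif_mean] * len(obs_verif)

# Estimate 2: observations mirrored about their mean (correlation of -1).
mirrored = [2 * verif_mean - o for o in obs_verif]

re_flat = re_stat(obs_verif, flat, calib_mean)
re_mirrored = re_stat(obs_verif, mirrored, calib_mean)
print("RE, flat estimate:    ", round(re_flat, 3))
print("RE, mirrored estimate:", round(re_mirrored, 3))
```

Both RE values come out positive even though neither estimate tracks the year-to-year observations, which is why RE needs to be read alongside the verification r2.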
The Union CVM
This is another second-generation exercise in which proxies are selected from prior studies, which themselves already contained a significant element of data mining and data snooping. The use of two foxtail series and the Yamal substitution are symptomatic. Here I got a worse replication than in the cases discussed above. I only got 0.95 correlation and discrepancies ranged as much as 0.20 deg C. This is rather puzzling in view of the other replications. The calibration r2 was 0.49, while the verification r2 was 0.001; the verification RE was 0.22. Update – Jean S observed that Juckes flipped the Chesapeake Mg/Ca series. Re-doing the calculation with a flipped version of this series improves the emulation correlation to 0.98 and reduces the discrepancies a little, but there’s something still off. Interestingly, the verification RE increased to 0.42 merely by flipping the Chesapeake series. Since there’s a rather well-established physical connection between temperature and Mg/Ca ratios, the flipping seems a little opportunistic.
Here we start getting into some replication puzzles. I had some difficulty sorting out these reconstructions. Juckes twitted me for taking a long time to figure out his rms normalization and indeed that did take a long time, as the methodology was nowhere mentioned in the text and has not previously, to my knowledge, been used in paleoclimate studies. In my opinion, if I have difficulty in figuring these things out, I would assume that any other reader would as well.
A first complication for CVM analysis is that one does not know from the archive the exact orientation of any PC series as used in Juckes’ CVM reconstructions. For example, the unfixed PC1 (series 20) and fixed PC1 (series 32) in Juckes’ archive have opposite orientations. Some series are flipped for the reconstruction. I don’t disagree with flipping PC series when there is a natural interpretation justifying one orientation. To replicate the CVM reconstructions, you have to try to figure out what orientation was used for each PC series Juckes used. I ended up doing a reverse engineering (of a type that Jean S and I frequently do with Mannian mysteries) in which I regressed the reconstruction against the candidate proxies and used the sign of each coefficient as an input to the CVM emulation. This worked pretty well for accomplishing an emulation, although the signing process needs to be examined. On an earlier occasion, I asked Juckes for details providing exact references for which PC series (including data citations) were used for which reconstruction. Juckes was very unhelpful.
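The sign-recovery step can be sketched in Python as an ordinary least squares regression of the archived reconstruction on the candidate series, keeping only the sign of each fitted coefficient. This is an illustrative sketch, not the code in the script: `solve` and `ols_signs` are hypothetical names, all series are assumed to be equal-length lists aligned by year, and the candidates are assumed not to be collinear.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols_signs(recon, candidates):
    """Regress recon on an intercept plus the candidate series.

    candidates: dict mapping a series name to a list aligned with recon.
    Returns {name: +1 or -1}, the sign of each fitted coefficient.
    """
    names = list(candidates)
    cols = [[1.0] * len(recon)] + [candidates[k] for k in names]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, recon)) for ci in cols]
    beta = solve(xtx, xty)
    return {k: (1 if b >= 0 else -1) for k, b in zip(names, beta[1:])}
```

A series with a recovered sign of -1 would be multiplied by -1 before entering the emulation. With strongly correlated candidates the fitted signs can be unstable, which is one reason the signing process needs to be examined.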
Base Case: Series #3 appears to be the Mannian CVM version used in Juckes’ spaghetti graph. Here there was a real surprise – although one should never be surprised when the Team is involved. When I reverse-engineered series 3 against all the candidate series, including both fixed and unfixed proxies, I got a much higher correlation using the unfixed PC1 (0.9999922) – one of the best correlations that I got, combined with a close range. So it appears highly likely that the Mannian CVM was constructed with the unfixed PC1. For this reconstruction, the calibration r2 is 0.29; verification r2 is 0.008; RE a surprisingly low 0.002. (Jean S has recently noticed that the MBH99 reconstruction has a low RE statistic.) Update: Juckes does say that the spaghetti graph uses the unadjusted PC1. In MM05 (EE), as discussed in another thread, we observed that the key issue in the MBH99 North American network was simply the dominance of bristlecones through having so many of them in the longer data set, rather than the artificial dominance in MBH98 through the mathematical artifice of the Mannian PC methods.
French Extrapolation: This is a totally irrelevant and inconsequential variation. Here Juckes considers the impact of extrapolating a French series. This variation uses the unfixed PC1 plus extension by persistence of this one series. Results are very similar to the Base Case, both in terms of closeness of replication and for verification statistics. The French series is really indistinguishable from adding a series of white noise in these reconstructions.
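For clarity, "extension by persistence" is taken here to mean carrying the last available value forward to the end of the period. A minimal Python sketch of that reading, with a hypothetical function name and a series held as a dict of year to value, would be:

```python
def extend_by_persistence(series, end_year):
    """Extend a {year: value} series to end_year by repeating its last value.

    This reflects one common reading of "persistence"; the exact fill used
    in the archived series is an assumption here.
    """
    last_year = max(series)
    extended = dict(series)
    for year in range(last_year + 1, end_year + 1):
        extended[year] = series[last_year]
    return extended
```

If the series already reaches `end_year`, the function returns an unchanged copy.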
The “Adjusted PC1” – The Juckes SI stated that the “pc” reconstruction used the “unadjusted first proxy PC”. This does not appear to be what was done. The reverse-engineering through regression of the reconstruction on the candidate proxies indicates that the adjusted PC1 was used. However, there’s a very interesting twist. In this reconstruction, Juckes used the adjusted PC1 in its inverse (downward-pointing) orientation. With this particular combination, I get a correlation of 0.996 and range of about 0.05 deg – which is about what I’ve been able to get for any of Juckes’ CVM versions of Mann. So I feel confident that this is what Juckes did. The calibration r2 is low (0.13); the verification r2 is its usual ~0 and it has a negative RE. While these statistics aren’t very good, they aren’t much different than for the Jones reconstruction.
As a caveat: I do not accept that the Mannian “adjustment” to the PC1 is “correct” in any sense as this adjustment is a can of worms which (somewhat surprisingly) I’ve not discussed on the blog. I will try to do so as Jean S is interested in this issue and others should be.
Mannian PCs – Juckes Edition: Juckes re-calculated PCs on a slightly reduced version of the MBH AD1000 network and reported reconstructions for several cases. The first variation (suffix “mbh”) uses re-calculated Mannian PCs (despite this method being condemned by all parties) using 25 proxies instead of the 27 in the MBH network. The two exclusions are two bristlecone series which end in the 1970s – Methuselah Walk (lower border) and Upper Timber Creek. Inconsistently, Juckes uses the Methuselah Walk series (ca535) in its upside down version in one of the two versions used in Moberg (Moberg inadvertently used two versions from the same site). These PCs are rather similar to Mannian PCs – Juckes uses an “unfixed” version of the PC1. This version has a calibration r2 of 0.28, verification r2 of 0.008 and an RE of -0.001.
Mannian PCs – Juckes Variation – The “mbhx” reconstruction results from another pointless variation. Here Juckes uses Mannian PCs using a calibration of 125 years instead of 79 years. I would presume that the bias would be less in such circumstances. Since the methodology is condemned, I see little purpose in analyzing this variation of the condemned methodology. The variation has virtually identical statistical performance to the prior case: calibration r2 of 0.30, verification r2 of 0.007, verification RE of 0.001.
Correlation PCs on slightly reduced network (“std”) – The “std” reconstruction uses the archived “std” PCs, which are equivalent to correlation PCs (the chronologies divided by their standard deviation). Juckes uses the slightly reduced network of 25 series. In Juckes’ AD1400 network, there are versions for the network without closing extrapolations and for the network with closing extrapolations. Although the formalism is carried over to the AD1000 network, PC series 16-20 are identical to PC series 1-15 and do not show results with the expanded network with closing fills. The purpose of having both versions on file is unclear. In our EE article, we observed that the emphasis on bristlecones in the AD1000 network arises simply from longevity without a mathematical artifice as in MBH98, so one wouldn’t expect big differences between versions, but there is some difference. Statistics are similar – the calibration r2 for the CVM was 0.30; verification r2 improved by an order of magnitude – from 0.001 to 0.1, and verification RE slightly improved to a low 0.04.
Covariance PCs on reduced network (“cen”) – again I got a decent replication of the “cen” series by using the “cen” PCs without “adjustment” other than flipping PC series. Results were similar – calibration r2 0.27; verification r2 a very “strong” 0.016 and RE of 0.012.
So where does that leave us? First, some puzzling replication anomalies, most of which should be resolvable. The most serious questions are whether the Mannian CVM in the spaghetti graph uses the unfixed PC1 and whether the “pc” variation unintentionally used an inverted PC1. The more serious issue pertains to the handling of verification statistics. Obviously there’s been lots of publicity and discussion of the failure of the verification r2 statistic. Mann told the NAS panel that he did not calculate the verification r2 statistic as that would be a “foolish and incorrect thing” to do (although there is indisputable evidence that he did calculate it). Wahl and Ammann argued that correlation was not a relevant statistic for the low frequencies of interest to paleoclimate and advocated exclusive use of the RE statistic (a statistic curiously not reported by Juckes). Why didn’t Juckes report verification r2 statistics? Did Juckes, like Mann, calculate verification r2 statistics, not like the results and not report them? Or was there a little data snooping even on the calculation of verification statistics – with Juckes attempting to avoid an unpleasant answer by not calculating the statistic?