Today I wish to revisit an issue discussed in one of the very first CA posts: did MBH98 actually use the PC retention rules first published in 2004 at realclimate and described both there and by Tamino as “standard” selection rules?
Much of the rhetorical umbrage in Tamino’s post is derived from our alleged “mistakes” in supposedly failing to observe procedures described at realclimate. As observed a couple of days ago, Tamino has misrepresented the research record: both our 2005 articles observed that the hockey stick shape of the bristlecones was in the PC4. In MM 2005 (EE), we observed - citing the realclimate post as Mann et al 2004d:
If a centered PC calculation on the North American network is carried out (as we advocate), then MM-type results occur if the first 2 NOAMER PCs are used in the AD1400 network (the number as used in MBH98), while MBH-type results occur if the NOAMER network is expanded to 5 PCs in the AD1400 segment (as proposed in Mann et al., 2004b, 2004d ). Specifically, MBH-type results occur as long as the PC4 is retained, while MM-type results occur in any combination which excludes the PC4. Hence their conclusion about the uniqueness of the late 20th century climate hinges on the inclusion of a low-order PC series that only accounts for 8 percent of the variance of one proxy roster.
Mann, M.E., Bradley, R.S. and Hughes, M.K., 2004d. False Claims by McIntyre and McKitrick regarding the Mann et al. (1998) reconstruction. Retrieved from website of realclimate.org at http://www.realclimate.org/index.php?p=8.
In MM2005 (EE), we observed correctly that you got a HS-shaped reconstruction if you use 5 PCs (including the bristlecones in the PC4), while you don’t get one if you use 2 PCs. (We considered many other permutations in MM2005 (EE), including correlation PCs; Tamino’s not the first person to criticize us without properly reading our articles.) Despite the above clear statement, Tamino distorted the research record by falsely alleging that we had failed to consider results with 5 PCs:
When you do straight PCA you *do* get a hockey stick, unless you make yet another mistake as MM did. .. When done properly on the actual data, using 5 PCs rather than just 2, the hockey stick pattern is still there even with centered PC — which is no surprise, because it’s not an artifact of the analysis method, it’s a pattern in the data.
Here Tamino, as he acknowledges, relies heavily on Mann’s early 2005 realclimate post where Mann stated:
MM incorrectly truncated the PC basis set at only 2 PC series based on a failure to apply standard selection rules to determine the number of PC series that should be retained in the analysis. Five, rather than two PC series, are indicated by application of standard selection rules if using the MM, rather than MBH98, centering convention to represent the North American ITRDB data. If these five series are retained as predictors, essentially the same temperature reconstruction as MBH98 is recovered (Figure 2).
So what exactly is this “mistake” that we are supposed to have made? We said that you got a HS reconstruction with 5 PCs; so did Mann and Tamino. We said that you didn’t get a HS reconstruction with 2 PCs; so did Mann and Tamino. Their argument, as I understand it, is that it is a “mistake” to do a reconstruction with 2 PCs rather than 5 PCs, as 5 PCs are mandated by “standard selection rules”.
I’ve reviewed what Preisendorfer and others have said about determining “significance” of PCs – that PC analysis is merely exploratory and that Rule N (or similar rules) merely creates a short list of candidate patterns; such rules do not themselves establish scientific significance. As Preisendorfer said, there is no “royal road” to science. Someone somewhere has to do the grunt work of showing that the PC4 has scientific validity as a temperature proxy; a Rule N analysis can’t do that.
But today I want to discuss something quite different and something that has really annoyed me for a long time. For all the huffing and puffing by Mann and Tamino about Preisendorfer’s Rule N being a “standard selection rule” or a “correct” way of doing things, Mann failed to produce the source code for the Preisendorfer tree ring calculations when asked by the House Energy and Commerce Committee for MBH98 source code, even though it was a highly contentious issue where implementation errors had already been alleged.
Worse, as shown below (re-visiting a point made in early 2005), it is impossible to reproduce, using Rule N as illustrated at realclimate, the pattern of retained PCs shown for the first time in the Corrigendum SI of July 2004. MBH98 itself made no mention of Rule N in connection with tree ring networks, referring instead to factors such as “spatial extent” which have nothing to do with Rule N:
Certain densely sampled regional dendroclimatic data sets have been represented in the network by a smaller number of leading principal components (typically 3–11 depending on the spatial extent and size of the data set). This form of representation ensures a reasonably homogeneous spatial sampling in the multiproxy network (112 indicators back to 1820)
In fact, when one re-examines the chronology, there is no mention of Rule N in connection with tree rings prior to our criticism of Mann’s biased PC methodology (submitted to Nature in January 2004) and Mann’s subsequent realization that the hockey stick shape of the bristlecones was in the PC4, a point first made in Mann’s Revised Reply to our submission, which presumably was submitted around May 2004. The Supplementary Information 1 to that submission was substantially identical to the later realclimate post. (That post was one of the very first dated posts, actually preceding the start-up of realclimate on Dec 1, 2004, which perhaps evidences a little too much interest in the matter on their part.)
I’ve been substantially able to replicate the methodology illustrated in the realclimate post and I’ve applied the methodology to all other MBH98 network/step combinations, resulting in some disquieting inconsistencies. The figure below shows on the left the realclimate Rule N calculation for the AD1400 NOAMER network. The 4th red + sign marks the eigenvalue of the bristlecone PC4 upon which the MBH98 reconstruction depends in this period. The red + signs show eigenvalues using a centered (covariance) calculation; the blue values show eigenvalues using Mannian PCs. The two lines show simulated results from random matrices generated using AR1 coefficients. On the right is my emulation of this calculation – which, as you can see, appears to be identical. The eigenvalue information shown here is completely consistent with the information in MM 2005 (GRL) and MM2005 (EE).
The blue arrow shows the eigenvalue of the bristlecone-dominated PC using Mannian methods; the red arrow shows the eigenvalue using a centered calculation. These results were reported in MM2005 (GRL) where we stated that explained variance of the bristlecones was reduced from 38% (the blue arrow) to 8% (the red arrow) and that the bristlecones were not the “dominant component of variance” in the North American network, as previously claimed by Mann et al – an impression that they would obviously have been given by their incorrect PC calculation. So when they say that the error didn’t “matter”, it certainly mattered when they said that this particular shape was the “dominant component of variance” or the “leading component of variance”, claims that they made in response to our 2003 article.
Caption to realclimate figure and MBH 2004 Nature submission: FIGURE 1. Comparison of eigenvalue spectrum for the 70 North American ITRDB data based on MBH98 centering convention (blue circles) and MM04 centering convention (red crosses). Shown is the null distribution based on simulations with 70 independent red noise series of the same length with the same lag-one autocorrelation structure as the actual ITRDB data using the centering convention of MBH98 (blue curve) and MM04 (red curve). In the former case, 2 (or perhaps 3) eigenvalues are distinct from the noise floor. In the latter case, 5 (or perhaps 6) eigenvalues are distinct from the noise floor. The simulations are described in “supplementary information #2”.
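For readers who want the general shape of such a calculation, here is a minimal sketch in R of a Rule N style comparison as the caption describes it: eigenvalues of a centered network set against a null spectrum from independent red noise series with matching length and lag-one autocorrelation. It is not the script used for the figures above. The function names, the toy network, the 95% quantile and the 100 simulations are all assumptions made for illustration, and it uses only the full-period centered calculation.

```r
# Illustrative sketch only: names, toy data and the 95% cut-off are assumptions.
set.seed(1)

# Share of total variance carried by each PC of a (years x series) matrix,
# after centering every column on its full-period mean.
eigen_fractions <- function(X) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  d <- svd(Xc, nu = 0, nv = 0)$d
  d^2 / sum(d^2)
}

# Lag-one autocorrelation coefficient of a single series.
lag1 <- function(x) {
  x <- x - mean(x)
  n <- length(x)
  sum(x[-1] * x[-n]) / sum(x^2)
}

# One AR1 ("red noise") series of length n with coefficient rho.
ar1_series <- function(n, rho) {
  e <- rnorm(n)
  x <- numeric(n)
  x[1] <- e[1]
  for (k in 2:n) x[k] <- rho * x[k - 1] + e[k]
  x
}

# Null spectrum: for each eigenvalue rank, the 'prob' quantile over 'nsim'
# simulated red-noise networks with the same size and lag-one coefficients as X.
rule_n_null <- function(X, nsim = 100, prob = 0.95) {
  rho <- apply(X, 2, lag1)
  sims <- replicate(nsim, {
    Z <- sapply(rho, function(r) ar1_series(nrow(X), r))
    eigen_fractions(Z)
  })
  apply(sims, 1, quantile, probs = prob)
}

# Toy network (not MBH98 data): 20 series sharing one common signal.
n <- 200; m <- 20
signal <- ar1_series(n, 0.5)
X <- sapply(1:m, function(j) signal + ar1_series(n, 0.3))

lambda <- eigen_fractions(X)               # observed eigenvalue shares
preis  <- rule_n_null(X)                   # simulated noise floor by rank
n_significant <- sum((lambda - preis) > 0) # PCs standing above the noise floor
```

The final line counts the eigenvalues that exceed the simulated noise floor, which is all that a rule of this kind delivers: a short list of candidate PCs, not a demonstration that any of them is a temperature proxy.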
Now for some fun. The figure below shows the calculations for two other MBH98 network/step combinations, with Mannian PC results shown for comparison to actual retention. On the left is a calculation for the Vaganov AD1600 network, where 2 PCs were retained (retained PCs are circled). PCs 3, 4 and 5 were not retained, but all are “significant” according to the Rule N supposedly used in MBH98. In fact, the unused PC3 here is surely more “significant” than the covariance PC4 which Mann now claims under the algorithm illustrated at realclimate.
On the right is another network – Stahle.SWM AD1750, showing an opposite pattern. In this case, MBH retained nine PCs although only three are “significant” under the realclimate version of Rule N. This is also a relatively small network (located in southwestern U.S. and Mexico and inexplicably excluded from the NOAMER network with which it somewhat overlaps.)
Note: The dotted line additionally shows the 2/M “rule” that was noticed in the examination of the source code in connection with Mann deciding how many “climate fields” to retain. It’s not mentioned anywhere but is also illustrated here for convenience.
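As a rough illustration of what a 2/M cut-off looks like in practice, here is a small R sketch. The note above does not define M, so the sketch assumes that M is the number of series in the network and that a PC passes when its share of total variance exceeds 2/M. The helper name is hypothetical and the eigenvalue shares are made-up numbers, not MBH98 results.

```r
# Hypothetical helper: count PCs whose variance share exceeds 2/M.
# Assumption: M is the number of series in the network.
count_two_over_m <- function(lambda, M) {
  sum(lambda > 2 / M)
}

# Made-up eigenvalue shares for a 10-series network (they sum to 1).
lambda <- c(0.40, 0.25, 0.12, 0.08, 0.05, 0.04, 0.03, 0.015, 0.01, 0.005)
n_keep <- count_two_over_m(lambda, M = length(lambda))
```

With these invented shares the threshold is 0.2, so only the first two PCs pass.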
It seems quite possible to me that the Rule N version illustrated at realclimate was not actually used in MBH98. The first surfacing of the realclimate algorithm in the present form came only after Mann realized that the bristlecone hockey stick really did occur in the PC4 and was not the “dominant component of variance”. There is no contemporary evidence that it was used and there is no source code proving its use. Both Mann and Nature refused to provide details back in 2003 and 2004. The actual pattern of retained PCs cannot be reproduced according to the way that I’ve implemented the realclimate algorithm. As I said earlier, I’m not sure why Tamino has decided to re-open these particular scabs. In Mann’s shoes, I would have left this stuff alone.
The question for Tamino: which is incorrect, the information on retained PCs in the Corrigendum SI, or the claim that the algorithm illustrated at realclimate was used in that form in MBH98? If there is some other explanation, some way of deriving the Vaganov AD1600 and Stahle.SWM AD1750 retentions using the realclimate algorithm, please show how to do it. I’ll post up data and code for my implementation to help you along. C’mon, Tamino. You’re a bright guy. Show your stuff.
UPDATE: Willis Eschenbach reports below in a comment that he has also examined these calculations and results and also concludes that MBH98 did not use Rule N.
NOTE: Just in case Tamino says that …sigh… it’s too much work, here’s my script. The functions used for the calculations are in http://data.climateaudit.org/data/mbh98/preisendorfer.functions.txt and can be analyzed there in case there are any defects; the collated information is http://data.climateaudit.org/data/mbh98/preisendorfer.info.dat and the tree ring networks are in http://data.climateaudit.org/data/mbh98/UVA. These can be read into an R session as follows:
The Stahle 1750 graphic can be produced using this command (and the functions show the calculation):
The Vaganov 1600 example can be produced as follows:
The North American comparison can be done as follows:
#Do Plot illustrated at CA
title(main=paste(target$network[i],": AD",target$period[i],sep=""))
temp=( (x$lambda-x$preis)>0); target$preis[i]=sum(temp)
arrows(x0=2.2, y0=.382, x1=1.15, y1=.382, length = 0.1, angle = 30, code = 2, col = 4, lty = 1, lwd = 4)
arrows(x0=5, y0=.12, x1=4.15, y1=.08, length = 0.1, angle = 30, code = 2, col = 2, lty = 1, lwd = 4)
Results from my simulations are stored in the R-object http://data.climateaudit.org/data/mbh98/preis_mannomatic.tab which contains results for 20 network/step cases shown in the preisendorfer.info.dat file.