Creating a guide for Climate Audit Newbies

This is something that I should write up, but others are volunteering. Pete T asked for this thread. I don’t promise to accept or use anything. WordPress formerly had a Sticky button and I used to have a Road Map post as a sticky. An old one is here. I’m increasingly using the Pages to keep track of links – see for example here; there’s also a collection of links to Station Data,… I use the Categories button in the left frame to sort through things myself. I urge readers to do so as well.

In editor mode, I can search through posts and/or comments for individual words which helps me locate things. If any of you can persuade WordPress to make this function available to readers (without editing privileges), it would be a useful improvement to WordPress.

A Custom R Function

Doug Nychka’s fields package in R is really excellent. Among its nice functions is a command world() that draws an outline of the continents. I wrote him today inquiring whether there was an option in which land could be colored. Unfortunately there wasn’t. However by the end of the day, there was. Doug sent me a function world.color() which yielded the following pretty map.

world1.gif

It will be included in the next release of fields, but in the meantime CA readers can download the function here:

library(fields) #load the package
source("http://data.climateaudit.org/scripts/world.color.R") #download function
world.color()

You can handle this like the world() function: you can use xlim and ylim delimiters as in a plot call, and you can use the add=T option if you want to overlay it on a plot of points to make a location map. Thanks, Doug.

UC on CCE

I then compared verification statistics for the different reconstructions as shown below. OLS yielded much the “best” fit in the calibration period, but the worst fit in the verification period.

OLS, which is equivalent to ICE, actually finds the best fit (it minimizes the calibration residuals), and in the proxy-temperature case produces the most obvious overfit. Let me try to get an understanding of the methods applied. As we know, the core of MBH9x is the double pseudoinverse; in Matlab:

RPC=P*pinv(pinv(TPC(1:79,T_select))*P(903:end,:));

where P is the standardized proxy matrix (calibration mean set to zero, calibration detrended std set to 1), TPC is the target (the temperature PCs) and T_select is that odd selector of target PCs. After this, the RPCs are variance-matched to the TPCs, and brought back to temperatures via matrix multiplication. As you can see, I can replicate the MBH99 reconstruction almost exactly with this method:

comparison

Differences in 1400-1499 are related to problems with the archived data, and those in 1650-1699 are due to an unreported step in the MBH procedure. Steve and Jean S have noted these independently, so I’m quite confident that my algorithm is correct.

I’ve considered this method a variation of the classical calibration estimator (CCE), and Steve has made the point that it is one form of PLS. These statements are not necessarily in conflict. The original CCE is (with standardized target)

\hat{X}'=(\hat{B}S^{-1}\hat{B}^T)^{-1}\hat{B}S^{-1}Y'

where matrices \hat{B} and S are ML estimates of B and var(E) , obtained from the calibration experiment with a model

Y=XB+E

By setting S=I , I get exactly the same result as with the double pinv, which verifies my observation that MBH9x is CCE with a special assumption about proxy noise, followed by an incorrect variance-matching step after the classical estimation.
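This equivalence is easy to check numerically. Here is a small sketch (pure Python with synthetic data – not the MBH code; all names, dimensions and noise levels are invented for illustration) that runs the Matlab-style double pseudoinverse and the CCE formula with S=I on the same calibration data:

```python
import random

random.seed(42)

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inv2(M):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Synthetic calibration data: k=2 temperature PCs, q=6 proxies, n=79 years
n, k, q = 79, 2, 6
T = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
Btrue = [[random.uniform(0.5, 1.5) for _ in range(q)] for _ in range(k)]
P = [[sum(T[i][m] * Btrue[m][j] for m in range(k)) + random.gauss(0, 0.2)
      for j in range(q)] for i in range(n)]

# Calibration estimate Bhat = pinv(T) P = (T'T)^-1 T' P   (k x q)
Tt = transpose(T)
Bhat = matmul(inv2(matmul(Tt, T)), matmul(Tt, P))

# A new proxy observation generated from a known temperature state
x_true = [1.5, -0.8]
y_new = [[sum(x_true[m] * Btrue[m][j] for m in range(k)) + random.gauss(0, 0.2)
          for j in range(q)]]

# Route 1: the Matlab-style double pseudoinverse, y_new * pinv(Bhat)
# (for a full-row-rank k x q matrix, pinv(Bhat) = Bhat' (Bhat Bhat')^-1)
BBt = matmul(Bhat, transpose(Bhat))
pinvB = matmul(transpose(Bhat), inv2(BBt))
x_pinv = matmul(y_new, pinvB)[0]

# Route 2: CCE with S = I: x_hat = (Bhat Bhat')^-1 Bhat y_new'
x_cce_col = matmul(inv2(BBt), matmul(Bhat, transpose(y_new)))
x_cce = [row[0] for row in x_cce_col]

print(max(abs(a - b) for a, b in zip(x_pinv, x_cce)))  # agreement to rounding error
```

Both routes produce the same reconstruction up to rounding, reflecting the identity pinv(B) = B'(BB')^-1 for a full-row-rank B.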

Back to the OLS (ICE) estimate, which is obtained by regressing X directly on Y,

\hat{\hat{X}}'=Y'^T(Y^TY)^{-1}Y^TX

this is justified only with a special prior distribution for X' , which we don’t have. Thus OLS is out of the question. Yet it is interesting to observe that OLS is actually a matrix-weighted average between CCE and the zero matrix (Brown82, Eq 2.21):

\hat{\hat{X}}'=[(X^TX)^{-1}+\hat{B}S^{-1}\hat{B}^T]^{-1}(\hat{B}S^{-1}\hat{B}^T)\hat{X}'

It would be interesting to compute the MBH99 results with the real CCE, but S is not invertible for the AD1700 and later steps. Results up to AD1700 are shown here. CCE:

CCE ad1600

ICE:

ICE ad1600

As you can see, the variability of ICE is lower, as it is weighted towards zero. But hey, where did that hockey stick go?
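The shrinkage behind that remark can be seen even in the simplest univariate case. In this sketch (pure Python, synthetic data, illustrative only – the symbols follow the post, not any MBH code), a proxy y responds linearly to temperature x; CCE inverts the calibration regression of y on x, while ICE regresses x directly on y:

```python
import random, statistics

random.seed(1)

b, sx, se = 1.0, 1.0, 0.7   # true loading, temperature sd, proxy noise sd
n = 2000                     # long calibration so the estimates are stable

# Calibration period: proxy y responds to temperature x
x = [random.gauss(0, sx) for _ in range(n)]
y = [b * xi + random.gauss(0, se) for xi in x]

bhat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)  # y on x (CCE)
chat = sum(xi * yi for xi, yi in zip(x, y)) / sum(yi * yi for yi in y)  # x on y (ICE)

# Verification period: new temperatures with the same variance
xv = [random.gauss(0, sx) for _ in range(n)]
yv = [b * xi + random.gauss(0, se) for xi in xv]

cce = [yi / bhat for yi in yv]   # classical calibration estimate
ice = [chat * yi for yi in yv]   # inverse (OLS/ICE) estimate

print(statistics.pstdev(xv), statistics.pstdev(cce), statistics.pstdev(ice))
# ICE is shrunk toward zero (sd below the target's); CCE slightly inflates it
```

The ICE reconstruction ends up with less variance than the target, which is exactly how a large past excursion gets flattened.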

Tree Ring Standardization – A Linear Mixed Effects Perspective

I’ve already received my first condescending comments from the dendro world about the mysteries of standardization. Just to pre-empt some further pontification presuming that I know nothing about these mysteries, I’m posting up some notes that I wrote in 2004 on standardization – which was what I would have been working on had people just accepted the defects in the Mannian multiproxy articles at the time. I hadn’t looked at these comments in a couple of years, but they are interesting and well worth publishing as they are quite original.

Many “standardization” operations carried out in dendrochronology can be interpreted using nonlinear mixed effects methods (implemented here in the nlme package of Pinheiro and Bates). RCS standardization can be interpreted using the nls function, while “conservative” standardization (negative exponential, “COFECHA option 2”) can be interpreted using the nlsList function. Given this equivalence, there are many advantages to interpreting dendrochronological standardization in a mixed effects context, as dendrochronological standardization can then be integrated into a broader statistical framework, rather than remaining a rather ad hoc collection of techniques, as it is presently. There are many diagnostic tools available in nlme that are unavailable in standard dendrochronology application packages. Interestingly, while dendrochronologists are well aware of both RCS and “conservative” methods, one sees exactly the same pulls in nonlinear mixed effects modeling, where the nlme function effects a form of compromise between the nls and nlsList functions. (For nlme users, this parallels the identical compromise that the lme function achieves between the lm and lmList functions.) I’ve been able to accurately emulate site chronologies using nlme methods. By utilizing the tools within the nlme package, new diagnostics and analyses can be produced, illuminating the results. [I have not yet considered spline modeling from a mixed effects framework, but it should be possible.]

Conversely, dendrochronology brings an extraordinarily rich collection of data, which somehow has eluded the mixed effects statistical community, but which integrates many of the issues that interest mixed effects statisticians. Longitudinal data sets used in mixed effects modeling are often very small and short in comparison with tree ring collections, where one can have data sets for individual sites with over 100,000 measurements.

The application of mixed effects techniques is very natural. Tree ring data is already grouped into cores from individual trees, which have pronounced individual differences between trees that are suited to mixed effects modeling. Growth is affected by aging, usually modeled through a generalized exponential. Individual cores usually have pronounced serial correlation and heteroskedasticity. Growth is also affected by annual climatic factors (and, indeed, the measurement of this effect is the primary goal of dendrochronology).

By applying a more general statistical methodology, there is the prospect that the analysis and allocation of growth among age effects, climatic effects and individual effects under the simple nls and nlsList modeling of dendrochronology can be placed in a broader context and perhaps some new results obtained.
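The nls/nlsList contrast can be illustrated outside R. This sketch (Python rather than nlme; the data, tree names and parameter values are all invented) fits a simplified negative-exponential age curve w = a·exp(-c·t) to synthetic cores two ways: one pooled fit across all cores (the RCS/nls analog) and one fit per core (the “conservative”/nlsList analog), via ordinary least squares on log ring widths:

```python
import math, random

random.seed(7)

def fit_loglinear(ts, ws):
    """OLS fit of log(w) = log(a) - c*t; returns (a, c)."""
    n = len(ts)
    lw = [math.log(w) for w in ws]
    tbar, lbar = sum(ts) / n, sum(lw) / n
    slope = (sum((t - tbar) * (l - lbar) for t, l in zip(ts, lw))
             / sum((t - tbar) ** 2 for t in ts))
    return math.exp(lbar - slope * tbar), -slope

# Synthetic cores: a shared age curve (decay rate 0.02) with a
# tree-specific growth level -- the "random effect"
ages = list(range(1, 101))
cores = {}
for tree in range(8):
    a_i = math.exp(random.gauss(0, 0.3))   # individual growth level
    cores[f"tree{tree}"] = [a_i * math.exp(-0.02 * t) * math.exp(random.gauss(0, 0.1))
                            for t in ages]

# RCS analog: one pooled age curve across all cores (nls-style)
all_t = ages * len(cores)
all_w = [w for ws in cores.values() for w in ws]
a_pool, c_pool = fit_loglinear(all_t, all_w)

# "Conservative" analog: a separate curve per core (nlsList-style)
per_core = {name: fit_loglinear(ages, ws) for name, ws in cores.items()}

print(round(c_pool, 3), {name: round(c, 3) for name, (a, c) in per_core.items()})
# pooled decay rate near 0.02; per-core rates scatter around it
```

A mixed-effects fit sits between these two poles: per-tree levels are modeled as random effects around the pooled curve, which is the compromise the nlme function effects.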

Another Assignment for the EAS8001 Mission Impossible Team

Mission Impossible Team, here’s your assignment today. Unfortunately you failed your last assignment: replicating Mann’s claimed correlations. But that probably was impossible. Today’s assignment is probably possible, but it is a dangerous expedition into the dark underground – into the terrifying world of Mannian RegEM. Courage and perseverance will be required. You may not return alive.

Tucson Update

You may recall the post earlier this year where the USHCN official climate station with the largest positive trend in the USA turned out to be located in a parking lot at the University of Arizona in Tucson. See below:

Tucson1.jpg
Click picture to see image gallery at surfacestations.org

The GISS surface temperature plot for Tucson:

The U of A Tucson station is still there in the parking lot, but as www.surfacestations.org volunteer Bob Thompson found out while visiting family this Thanksgiving, it appears they are in the process of dismantling the station. The wooden portion of the Stevenson Screen has been removed.

tucson-update.jpg
Click picture for full sized image

It seems odd for them to remove the shelter, yet leave the precision aspirated thermometer, which is the long tube with the “T” on the left side spanning the old shelter mount.

Almagre Data

A couple of days ago, I received the Almagre measurement data from the excellent dendrochronological lab at the University of Guelph. I’ve posted the measurement data online in two files (as I received them): crossdated and uncrossdated. There are 41 cores that have been crossdated and 20 cores that are not crossdated. These files are as I received them – in “Tucson format”, a punch-card-style fixed format. I’ve re-collated them into an alternate format (crossdated, uncrossdated) which will be easier to use for anyone other than dendros.

As all of you well know, I’m just working my way through this process to see what happens. Given that our cores worked from the bark in, I would have thought that there would be no particular difficulty in crossdating once the measurements were done. It’s not as though these are timbers from a medieval pueblo that one is trying to place in a sequence. But dendros have their own ways of doing things. There’s a reason why they do this – the methods are ones that have been used for a long time and are quite stable. These aren’t Mannian innovations to multivariate analysis. However, I don’t know whether there are any statistical implications in this process and I’m interested in seeing what the difference is between the crossdated and uncrossdated trees. I need to visit with the lab and get some more education on this. My guess is that they use a program called COFECHA, which tests for a common signal.

From a statistical point of view, the step does seem intriguing and I’ll need to crosscheck with Pete Holzmann on the quality of each of the non-crossdated cores. If some or all of these were “good” cores – i.e. there was at least a decent section before heartrot was encountered (and heartrot was more frequent in the cores than one would ever guess from the literature) and there was a priori no reason why the core could not be crossdated – one wonders whether the COFECHA process is introducing a false unanimity into the “signal”.
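My guess about what such a program does – test for a common signal – can be sketched generically. The toy code below is not COFECHA and is purely illustrative: it slides an undated core along a master chronology and reports the placement with the highest correlation, which is the basic logic of statistical crossdating. All data are invented.

```python
import random

random.seed(3)

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Master chronology: the common "signal" over 300 years
master = [random.gauss(0, 1) for _ in range(300)]

# An undated 80-ring core that actually starts at year 150, plus core-level noise
true_start = 150
core = [master[true_start + i] + random.gauss(0, 0.5) for i in range(80)]

# Slide the core along the master and score each candidate placement
scores = {s: corr(core, master[s:s + len(core)])
          for s in range(0, len(master) - len(core) + 1)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))   # recovers the true placement
```

A core that fails to correlate well at any offset would be set aside – which is precisely why one might worry about selection toward a common signal.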

Information on the core locations is in an Excel file in the relevant data directory, http://climateaudit.info/data/colorado .

After I look through the data, I’ll forward the data to the ITRDB data bank, where I’m pretty sure that I’ll set a record for promptness of placing the data into a public archive.

However, as we stand presently, the measurement data is now online here. IPCC AR4 Box 6.4 illustrates 8 proxies going back to the MWP, of which 6 are tree ring based. Three of the data sets are completely unarchived: Taymyr, Yamal and the Luckman-Wilson Jasper measurements (over which Rob Wilson does not have jurisdiction); updates for two more are unarchived (Tornetrask, Sheep Mountain – the active ingredient of the Mann and Jones 2003 PC1). Only one (Jacoby’s Mongolia) is archived. Our meta-data is considerably more complete than for any of these sites. We have also posted pictures online. I will be receiving a DVD with detailed photos of the cores used for digitizing (which will be over 1 GB in size) and will send a copy of the DVD, together with a DVD of tree photos, to ITRDB in case this sort of data ever becomes of interest to anyone.

I need to do some analysis of these cores. After reading Mannian RegEM dreck for the past few days, I feel like I need to be cleansed.

Pseudoproxies in Mann et al 2007

Judith Curry and JEG have expressed an interest in talking about Mann et al 2007, Robustness of proxy-based climate field reconstruction methods, JGR pdf. This is a successor article to Mann et al 2005, Testing the Fidelity of Methods Used in Proxy-Based Reconstructions of Past Climate, pdf.

Looking past the annoying and amusing faults, here are some thoughts about the substance of the article. There are two sorts of results in Mann et al 2007: results on the MBH98 network and pseudoproxy results.

The pseudoproxy results are much weakened because they only consider results from one quirky and idiosyncratic multivariate method (RegEM TTLS) in a very tame pseudoproxy network without

(a) comparing results from the method to methods other than their own equally quirky RegEM Ridge, or

(b) examining results from networks that are flawed and, in particular, flawed in ways that may potentially compromise MBH. I’ve posted on both these issues and will review some thoughts on this.
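For readers unfamiliar with the setup: a pseudoproxy is a model temperature series degraded with noise to a target signal-to-noise ratio, and a “tame” network is one where that noise is white and well behaved. The sketch below is illustrative only (Mann et al’s actual pseudoproxies come from GCM fields, and the numbers here are invented); it shows the construction of a tame network and, for contrast, a “flawed” variant where a few series carry a non-climatic drift:

```python
import random, statistics

random.seed(11)

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (statistics.pstdev(a) * statistics.pstdev(b) * n)

def make_pseudoproxy(signal, snr):
    """Degrade a model series with white noise so sd(signal)/sd(noise) = snr."""
    noise_sd = statistics.pstdev(signal) / snr
    return [x + random.gauss(0, noise_sd) for x in signal]

# A toy "model temperature" history: weak trend plus interannual noise
temperature = [0.0005 * t + random.gauss(0, 0.25) for t in range(1000)]

# A tame pseudoproxy network: every series is signal + white noise at SNR 0.5
network = [make_pseudoproxy(temperature, snr=0.5) for _ in range(20)]

# A "flawed" variant of the kind at issue: a few series pick up a slow
# non-climatic drift (e.g. a fertilization or disturbance effect)
flawed = ([[x + 0.002 * t for t, x in enumerate(p)] for p in network[:3]]
          + network[3:])

mean_net = [sum(p[t] for p in network) / len(network) for t in range(1000)]
print(round(corr(temperature, mean_net), 2))  # averaging a tame network recovers the signal
```

In a tame network the noise averages out by construction; the interesting questions arise only when the flaws are of the second kind.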

It is somewhat surprising to see another lengthy effort to rehabilitate the MBH98 network, which is analysed complete with the original (incorrect) PC series, without conceding a comma to the NAS panel or Wegman reports or even mentioning the bristlecone problem. Mann shows that he can recover the bristlecone shape using RegEM if we spot him the PC1 or the raw bristlecone series. This was never in doubt – see MM (EE 2005) – and, other than for polemical reasons, it’s hard to see any purpose or interest in the application of RegEM to this flawed network.

Re-Visiting RegEM: Rutherford et al 2005

Mann et al 2007 is a new RegEM version, replacing the RegEM calculations of Rutherford et al 2005. The logic is not entirely self-contained, so I re-visited some of our previous comments on Rutherford et al (here, here, here and here). I’m going to reprise two issues today: Collation Errors, a small but amusing defect; and Calibration-Period Standardization, a more serious problem, raised last year by Jean S in this post – an issue evocative of the calibration-period standardization error in MBH98. This latter problem has now been acknowledged; in June 2007, nearly a year later, Mann and Rutherford deleted all source code pertaining to Rutherford et al 2005. The error is mentioned in Mann et al 2007 – needless to say, without mentioning climateaudit.

Collation Errors
First, the smaller point, collation errors. I observed an amusing collation error in Rutherford et al 2005, in which they had inadvertently spliced their instrumental data to the wrong calendar year. On an earlier occasion, we had reported (in MM03) collation errors in the first MBH98 data set to which we had been directed at Mann’s FTP site. After MM03, the “wrong version” was deleted from Mann’s ftp site and a new MBH98 directory appeared (which has in its turn dematerialized, to be replaced by yet another version). The collation errors observed in the first version did not occur in the later versions, and some time ago I arrived at the conclusion that the PCproxy version that Rutherford had directed us to was a version that he’d developed for his own studies (perhaps Rutherford et al 2005) and that it probably hadn’t been a factor in the original MBH. Anyway, Rutherford et al 2005 contained a snark about this topic as follows:

some reported putative “errors” in the Mann et al. (1998) proxy data claimed by McIntyre and McKitrick (2003) are an artifact of (a) the use by these latter authors of an incorrect version of the Mann et al. (1998) proxy indicator dataset


Of course, why did we use an “incorrect version”? Because they said to use it. I raised the splicing issue with Rutherford and he said that it was before his time. I asked Mann about it and he said that he was too busy to respond, but that Eduardo Zorita hadn’t had any problems replicating his results. 🙂 Anyway, when I went to re-check the code containing the collation error, it had been hoovered as well. If you go to the Rutherford et al 2005 SI and follow the link for the MultiproxyPC Matlab Code, you go here, where a message states:

UPDATE (June 20, 2007):
Since the original publication of this work, we have revised the method upon discovering a sensitivity to the calibration period.
When, as in most studies, data are standardized over the calibration period, however, the fidelity of the reconstructions is diminished when employing ridge regression in the RegEM procedure as in M05 (in particular, amplitudes are potentially underestimated; see Auxiliary Material, section 2).

[Note: a re-visit on Feb 12, 2009 showed that the above link had been modified some time previously to say only:

UPDATE (June 20, 2007): Since the original publication of this work, we have revised the method upon discovering a sensitivity to the calibration period.

]

Interested parties can obtain the hoovered material here, originally saved by John A for me since I had been blocked from the Rutherford site.

Unreported “Standardization”
The issue of sensitivity to the calibration period was noted on July 10, 2006, here, when Jean S posted the following note, in which he noticed the then-unreported short-segment standardization (a huge issue in MM2005) and flagged it as a potential problem (although he merely noted the issue without fully diagnosing it). (UC had noticed another interesting problem.)

There is a “standardization” step in Rutherford’s code, which seems to be (to my knowledge) unreported. Here’s a segment of the code (taken from “mbhstepwiselowrecon.m”, but the same is present in all reconstruction codes):

[nrows,ncols]=size(composite);
if i==1% first step standardize everything
means=nanmean(composite(500:560,:));
stdevs=nanstd(composite(500:560,:));
composite=composite-repmat(means,nrows,1);
composite=composite./repmat(stdevs,nrows,1);
save standardlow
else % now just standardize the new pcproxy network
means=nanmean(composite(500:560,1009:1008+nproxies));
stdevs=nanstd(composite(500:560,1009:1008+nproxies));
composite(:,1009:1008+nproxies)=composite(:,1009:1008+nproxies)-repmat(means,nrows,1);
composite(:,1009:1008+nproxies)=composite(:,1009:1008+nproxies)./repmat(stdevs,nrows,1);
end

The above code “standardizes” all proxies (and the surface temperature field) by subtracting the mean of the calibration period (1901-1971) and then dividing by the standard deviation of the calibration period. I’m not sure whether this has any effect on the final results, but it is definitely also worth checking. If it does not have any effect, why would it be there?

In Mann et al 2007, they say:

When, as in most studies, data are standardized over the calibration period, however, the fidelity of the reconstructions is diminished when employing ridge regression in the RegEM procedure as in M05 (in particular, amplitudes are potentially underestimated; see Auxiliary Material, section 2).

The difference is not illustrated in the article itself, but here is the relevant comparison from the SI, illustrating the impact of calibration-period standardization on the Rutherford et al 2005 RegEM (using ridge regression), said to be corrected in Mann et al 2007 (using TTLS regression).

ruther10.jpg

Rutherford et al had cited their code in their Journal of Climate article. In my opinion, Journal of Climate should require them to restore the code even if it was erroneous so that people can inspect what they did.
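The operation at issue is easy to restate outside Matlab. A sketch (pure Python; the series is invented, and the 1901-1971 calibration window is represented by an index range) showing how standardizing over a short calibration segment differs from standardizing over the full record:

```python
import statistics

def standardize(series, lo=None, hi=None):
    """Subtract the mean and divide by the sd computed on series[lo:hi] only."""
    window = series[lo:hi]
    m, s = statistics.mean(window), statistics.pstdev(window)
    return [(x - m) / s for x in series]

# A proxy with a late-period excursion (e.g. a 20th-century ramp)
proxy = [0.0] * 80 + [0.05 * i for i in range(20)]

full = standardize(proxy)                  # standardized over the whole record
calib = standardize(proxy, lo=80, hi=100)  # standardized over the "calibration" rows only

# Short-segment standardization recentres and rescales on the calibration
# window, so the pre-calibration portion sits much further from zero
print(round(sum(full[:80]) / 80, 2), round(sum(calib[:80]) / 80, 2))
```

The same series thus enters the regression with a very different apparent level and amplitude outside the calibration window, which is why sensitivity to this choice matters.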

Stepwise Reconstruction
In Jean S’ note, he observes that the RegEM method is not really designed to deal with stepwise situations. In the MBH PCproxy network, where PCs are calculated on several occasions, this poses an interesting conundrum: the PC most dominated by the bristlecones changes position, so that the bristlecones contribute mainly to the MBH PC2 in the AD1820 network, while they morph over to the PC1 in the earlier networks. What happens in a RegEM situation when PCs change places? It’s not discussed in any of the publications. It seems to me that it would be an issue, although I have no plans to investigate the matter.

Ross McKitrick on Mann et al 2007

Ross McKitrick writes:

First, on page 15, they say:

This result (as well as the separate study by Wahl and Ammann [2007]) thus refutes the previously made claim by MM05 that the features of the MBH98 reconstruction are somehow an artifact arising from the use of PC summaries to represent proxy data.

Well, if we had argued that the only problem in MBH98/99 is the PC error, and that the PC error alone produces the MBH hockey stick, then this paper and its triumphant conclusion might count for something. But we argued something a tiny bit more complex (though not much). The PCs were done wrong, and this had 2 effects.

  1. it overstated the importance of the bristlecones in the NOAMER network, justifying keeping them in even though they’re controversial for the purpose and their exclusion overturned the results.
  2. It overstated the explanatory power of the model when checked against a null RE score (RE=0), since red noise fed into Mann’s PC algorithm yielded a much higher null value (RE>0.5) due to the fact that the erroneous PC algorithm “bends” the PC1 to fit the temperature data.

Mann’s new paper doesn’t mention the reliance on bristlecones. Nobody questions that you can get hockey sticks even if you fix the PC algorithm, as long as you keep the bristlecones. But if you leave them out, you don’t get a hockey stick, no matter what method you use. Nor, as far as I can tell, does this paper argue that MBH98 actually is significant, and certainly the Wahl&Ammann recalculations (http://www.climateaudit.org/?p=564) should put that hope to rest.

Second, maybe I’m missing a nuance, but the diatribe against r2 (and by extension, Steve et moi, para 64) is misguided on 2 counts. They say that it doesn’t “reward” predicting out of sample changes in mean and variance (paragraph 39). Where I work, we don’t talk about test statistics “rewarding” estimations; instead we talk about them “penalizing” specific failures. The RE and CE scores penalize failures that r2 ignores. We argued that r2 should be viewed as a minimum test, not the only test. You can get a good r2 score but fail the RE and CE tests, and as the NRC concluded, this means your model is unreliable. But if you fail the r2 test, and you “pass” the RE test, that suggests you’ve got a specification that artificially imposes some structure on the out of sample portion that conveniently follows the target data, even though the model has no real explanatory power.

Another thing that’s misguided is their use of the term “nonstationary”. They say that RE is much better because it takes account of the nonstationarity of the data. Again, where I work, if you told a group of econometricians you have nonstationary data, then proceeded to regress the series on each other in levels (rather than first differences) and brag about your narrow confidence intervals, nobody would stick around for the remainder of your talk. And you’d hear very loud laughter as soon as the elevator doors closed up the hall.

Third, what’s missing here is any serious thought about the statistical modeling. I get the idea that they came across an algorithm used to infill missing data, and somebody thought “whoah, that could be used for the proxy reconstruction problem” and thus was born RegEM. Chances are (just a guess on my part) people developing computer algorithms to fill in random holes in data matrices weren’t thinking about tree rings and climate when they developed the recursive data algorithm. You need to be careful when applying an algorithm developed for problem A to a totally different problem B, that the special features of B don’t affect how you interpret the output of the algorithm.

For example, in the case of proxies and temperature, it is obvious that there is a direction of causality: tree growth doesn’t drive the climate. In statistical modeling, the distinction between exogenous and endogenous variables matters acutely. If the model fails to keep the two apart, such that endogenous variables appear directly or indirectly on the right-hand side of the equals sign, you violate the assumptions on which the identification of structural parameters and the distribution of the test statistics are derived. Among the specification errors in statistical modeling, failure to handle endogeneity bias is among the worst because it leads to both bias and inconsistency.

A comment like (para 13)

“An important feature of RegEM in the context of proxy-based CFR is that variance estimates are derived in addition to expected values”

would raise alarm bells in econometrics. It sounds like that guy in Spinal Tap who thinks his amp is better than the others because the dial on his goes up to 11. So, the stats package spits out a column of numbers called “variances”. My new program is better than the old one because the dial goes up to “variances”.

It’s just a computer program executing a formula, and it won’t tell you if those numbers are gibberish in the context of your model and data (as they would be if you have nonstationary data). You need a statistical model to interpret the numbers and show to what extent the moments and test statistics approximate the quantities you are interested in.
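The RE/CE/r2 distinction Ross describes can be made concrete. The sketch below (pure Python, invented series – not any reconstruction’s actual verification data) computes the three statistics as usually defined in this literature: RE benchmarks squared error against the calibration-period mean, CE against the verification-period mean, and r2 is the squared correlation. It then constructs a “reconstruction” that captures an out-of-sample mean shift but nothing else, which scores a high RE despite having essentially no correlation skill:

```python
import random

random.seed(5)

def sse(obs, pred):
    return sum((o - p) ** 2 for o, p in zip(obs, pred))

def re_stat(obs, pred, calib_mean):
    # benchmark: predicting the calibration-period mean forever
    return 1 - sse(obs, pred) / sum((o - calib_mean) ** 2 for o in obs)

def ce_stat(obs, pred):
    # benchmark: predicting the verification-period mean
    m = sum(obs) / len(obs)
    return 1 - sse(obs, pred) / sum((o - m) ** 2 for o in obs)

def r2_stat(obs, pred):
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    vo = sum((o - mo) ** 2 for o in obs)
    vp = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (vo * vp)

# Calibration mean is 0; the verification period is warmer (mean ~1)
obs = [1.0 + random.gauss(0, 0.3) for _ in range(50)]

# A "reconstruction" that captures the mean shift but nothing else:
# it wiggles independently of the observations
pred = [1.0 + random.gauss(0, 0.3) for _ in range(50)]

print(round(re_stat(obs, pred, 0.0), 2),   # high: credited for the mean shift alone
      round(ce_stat(obs, pred), 2),        # near or below zero: no real skill
      round(r2_stat(obs, pred), 2))        # near zero: no correlation
```

This is the failure mode in the passage above: passing RE while failing r2 can simply mean the specification imposes a convenient level on the out-of-sample segment.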