Comments on: Wahl and Ammann Again #1

By: Steve McIntyre

Steve McIntyre — Thu, 06 Sep 2007 05:04:15 +0000

Wahl and Ammann “accepted March 1 2006” is finally online at Climatic Change. It will be interesting to see (a) whether the status of “Ammann and Wahl, (under review)” [at GRL] has been changed; and (b) whether the text of the paper supposedly “in press” for IPCC on Feb. 28, 2006 has been altered, and if so how, to accommodate the rejection of the GRL paper.

By: Brooks Hurd

Brooks Hurd — Thu, 31 Aug 2006 23:06:32 +0000

Steve,
It appears to me that WA and others on the team are responding to criticism of their work in ways that are very similar to the methods politicians use when they wish to silence their detractors. Both groups make use of logical fallacies (straw man, appeal to authority, ad hominem, etc.) to “prove” their points. I am afraid that if this continues, certain branches of science will become indistinguishable from politics.

By: UC

UC — Thu, 31 Aug 2006 13:34:16 +0000

#20:

I’ll clarify the presentation later (no paychecks from the oil industry, so I need to do other work as well 🙂 ). But to put it briefly:

y = s + n, where y is the measurement, s is the signal and n is additive Gaussian(0, 4/12) noise. The signal s is an AR(1) process with coefficient p = 0.99 and driving-noise variance q = 0.01. An optimal linear filter is used to obtain the estimate of s (see e.g. Gelb, Applied Optimal Estimation). The trick is that the filter uses an underestimated p, 0.77. As a result, the 2-sigma bars are broken (the true signal falls outside them too often), and the variability is clearly underestimated. Yet RE and RMSE say ‘good work, fella!’ Disclaimer: I haven’t double-checked the computations, the colors of the first figure are confusing, etc.
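A minimal Python sketch of this setup, using only the standard library, may help readers follow it. It assumes that the "optimal linear filter" is a scalar Kalman filter (the standard optimal linear estimator for this state-space model), that Gaussian(0, 4/12) means variance 4/12, and that RE is computed on the second half of the series against the first-half mean of s. The seed, series length and all function names are illustrative choices, not UC's Matlab code.

```python
import math
import random


def simulate(n_steps, p=0.99, q=0.01, r=4 / 12, seed=1):
    """Simulate an AR(1) signal s and measurements y = s + n with n ~ N(0, r)."""
    rng = random.Random(seed)
    s_t = rng.gauss(0.0, math.sqrt(q / (1 - p * p)))  # draw s_0 from the stationary law
    s, y = [], []
    for _ in range(n_steps):
        s_t = p * s_t + rng.gauss(0.0, math.sqrt(q))
        s.append(s_t)
        y.append(s_t + rng.gauss(0.0, math.sqrt(r)))
    return s, y


def kalman_filter(y, p_model, q=0.01, r=4 / 12):
    """Scalar Kalman filter assuming s_t = p_model * s_{t-1} + w_t, y_t = s_t + n_t."""
    x, P = 0.0, q / (1 - p_model * p_model)  # prior: the model's stationary mean and variance
    est, sd = [], []
    for y_t in y:
        x_pred, P_pred = p_model * x, p_model * p_model * P + q  # predict step
        K = P_pred / (P_pred + r)  # Kalman gain
        x, P = x_pred + K * (y_t - x_pred), (1 - K) * P_pred  # update step
        est.append(x)
        sd.append(math.sqrt(P))  # the filter's own claimed 1-sigma error
    return est, sd


def std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((v - m) ** 2 for v in xs) / len(xs))


def scores(est, sd, truth, ref_mean):
    """RMSE, RE against ref_mean, coverage of est +/- 2 sd, and spread of est."""
    sse = sum((e - t) ** 2 for e, t in zip(est, truth))
    ss_ref = sum((t - ref_mean) ** 2 for t in truth)
    hits = sum(abs(e - t) <= 2 * d for e, d, t in zip(est, sd, truth))
    return {
        "rmse": math.sqrt(sse / len(truth)),
        "re": 1 - sse / ss_ref,
        "coverage": hits / len(truth),
        "std": std(est),
    }


N = 2000
s, y = simulate(N)
half = N // 2
ref_mean = sum(s[:half]) / half  # first half plays the "calibration" role
results = {}
for label, p_model in [("matched", 0.99), ("mismatched", 0.77)]:
    est, sd = kalman_filter(y, p_model)
    results[label] = scores(est[half:], sd[half:], s[half:], ref_mean)
results["raw y"] = scores(y[half:], [math.sqrt(4 / 12)] * half, s[half:], ref_mean)
for label, sc in results.items():
    print(label, {k: round(v, 3) for k, v in sc.items()}, "true std:", round(std(s[half:]), 3))
```

In runs like this the "mismatched" row shows much lower 2-sigma coverage and a much smaller spread than the "matched" row, while its RE can still be positive and its RMSE can still be lower than that of the raw measurement.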

By: Steve McIntyre

Steve McIntyre — Thu, 31 Aug 2006 12:07:40 +0000

#18. UC, you need to add some text explaining what you’re doing in these calculations. They look very interesting and I’d like to understand them, but I can’t follow it. Can you expand the description of the calculations?

By: Steve McIntyre

Steve McIntyre — Thu, 31 Aug 2006 12:04:00 +0000

In MM05a, and in improved form in our Reply to Huybers, we argued that an appropriate benchmark for the RE statistic in an MBH context was over 0.5, compared with the significance benchmark of 0 used by Mann and in the dendro literature.

Our modeling in these articles was probably needlessly complicated because you get very high RE statistics in “classical” spurious regressions e.g. Yule’s alcoholism versus C of E marriages.

The history of the RE score deserves a separate paper. I saw the same statistic (under a different name) mentioned for economic forecasting in Granger and Newbold 1973 (the 1973 paper, not the famous 1974 one), citing earlier use by Theil.

My take on it is that the statistic has very low power (or no power) against spurious co-trending, as evidenced by its values in the Yule case, which is the very problem of concern.
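A minimal Python sketch of the co-trending point, under stated assumptions: two series are generated independently and share only a linear trend (a stand-in for Yule-type co-trending, not Yule's data and not the MM05a benchmark). One is calibrated on the other by OLS over the first half and RE is computed over the second half. Slope, noise level, seed and function names are arbitrary illustrative choices.

```python
import random


def ols(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b


def re_stat(pred, actual, cal_mean):
    """RE = 1 - SSE / sum((actual - calibration mean)^2) over the verification period."""
    sse = sum((p - a) ** 2 for p, a in zip(pred, actual))
    return 1 - sse / sum((a - cal_mean) ** 2 for a in actual)


rng = random.Random(42)
n = 200
# Two independently generated series: each is a linear trend plus its own noise.
# Neither drives the other; they share only the trend.
x = [0.02 * t + rng.gauss(0, 0.3) for t in range(n)]
y = [0.02 * t + rng.gauss(0, 0.3) for t in range(n)]

cal, ver = slice(0, n // 2), slice(n // 2, n)  # calibrate on first half, verify on second
a, b = ols(x[cal], y[cal])
pred = [a + b * u for u in x[ver]]
cal_mean = sum(y[cal]) / len(y[cal])
re_value = re_stat(pred, y[ver], cal_mean)
print(f"slope={b:.2f}  verification RE={re_value:.2f}")
```

Because the verification-period mean of y sits well above its calibration-period mean, the RE denominator is large and RE comes out high even though the two series were generated independently.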

By: UC

UC — Thu, 31 Aug 2006 06:29:12 +0000

My simple Matlab simulations indicate that RE rewards heavy low-pass filtering (here). Comparing the 2nd and 3rd figures is quite interesting: the RMSE and RE are smaller in the 3rd figure, but the residuals are correlated (is it a better reconstruction than the original measurement?). When I have more time I can write the equations down. The signal-to-noise ratio is a very important issue here.

By: Martin Ringo

Martin Ringo — Thu, 31 Aug 2006 05:27:15 +0000

Re: RE

If one is trying to reconstruct temperature anomalies using a data set of temperatures from 1850-2000, then the RE statistic inherently tends to be closer to one than it would be for a stationary data set. As I understand it (and in 40 years of statistics and econometrics I never ran into it until I started reading climate papers), the RE is 1 minus the ratio of the sum of squared differences between predicted and actual values (both in the verification period) to the sum of squared deviations of the actual verification-period values from the mean of the __calibration period__. [I don’t know how to produce bold or underline in this format.] Thus, to make an RE statistic bigger, simply pick a data set for which the means of the verification and calibration periods differ by a significant amount. [Steve, this may be a point that you made earlier, but I missed it.]
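In symbols (notation introduced here, not Ringo's): let V be the n verification years, y_t the actual values, \hat{y}_t the predictions, \bar{y}_v the verification-period mean and \bar{y}_c the calibration-period mean. The definition above, and the standard decomposition of its denominator about the verification mean, are:

$$
\mathrm{RE} = 1 - \frac{\sum_{t \in V} (\hat{y}_t - y_t)^2}{\sum_{t \in V} (y_t - \bar{y}_c)^2},
\qquad
\sum_{t \in V} (y_t - \bar{y}_c)^2 = \sum_{t \in V} (y_t - \bar{y}_v)^2 + n\,(\bar{y}_v - \bar{y}_c)^2 .
$$

The second term of the denominator vanishes only when the two period means coincide; for fixed prediction errors, any shift between the means enlarges the denominator and pushes RE toward one.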

For the CRU anomalies the 1856-1902 mean is something like -0.32, while the 1902-1980 mean is something like -0.16, i.e. half the size. Run a Monte Carlo on these two periods with errors consistent with a given calibration-period R-squared, and you will get RE statistics bigger than the R-squared. Because the calibration-period mean used in the denominator of the RE ratio differs greatly from the mean of the verification period, the deviations from that calibration mean are very large, and hence the ratio is very small. Thus the RE moves toward one when there is a significant difference (relative to the standard errors of the prediction) between the means of the variable being predicted in the two periods.
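A single simulated draw in the spirit of the Monte Carlo suggested above, as a minimal Python sketch. Only the two period means (-0.16 and -0.32) come from the comment; the year-to-year standard deviation (0.15), the prediction-error standard deviation (0.10), the seed and the 47-year verification length are assumptions. The same signal and the same prediction errors are reused with and without the mean shift, so any change in RE comes from the denominator alone.

```python
import random


def re_stat(pred, actual, cal_mean):
    """RE = 1 - SSE / sum((actual - calibration mean)^2)."""
    sse = sum((p - a) ** 2 for p, a in zip(pred, actual))
    return 1 - sse / sum((a - cal_mean) ** 2 for a in actual)


def demeaned_gauss(rng, n, sd):
    """n Gaussian draws with their sample mean removed, so the shift is exact."""
    draws = [rng.gauss(0, sd) for _ in range(n)]
    m = sum(draws) / n
    return [d - m for d in draws]


rng = random.Random(7)
n = 47  # number of years in 1856-1902
cal_mean = -0.16  # approximate 1902-1980 mean quoted above
signal = demeaned_gauss(rng, n, 0.15)  # year-to-year variation (assumed sd)
errors = [rng.gauss(0, 0.10) for _ in range(n)]  # prediction errors (assumed sd)

re_by_mean = {}
for ver_mean in (-0.16, -0.32):  # no shift vs. the quoted 1856-1902 mean
    actual = [ver_mean + v for v in signal]
    pred = [a + e for a, e in zip(actual, errors)]
    re_by_mean[ver_mean] = re_stat(pred, actual, cal_mean)
    print(f"verification mean {ver_mean:+.2f}: RE = {re_by_mean[ver_mean]:.3f}")
```

Repeating this over many seeds and error variances would give the distribution Ringo describes; the single draw only shows the direction of the effect.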

I will further speculate that the benchmarking of the RE statistic may have been done using a “no relationship” null hypothesis without specifying the change in the means of the dependent variable. Note: how one builds such a change into a no-relationship simulation is a bit tricky. For instance, what is the empirical average RE statistic for the subset of “no-relationship” experiments that have an OLS-significant relationship in the calibration period and a doubling or halving of the average values between the two periods? But if you, Steve, did that in your benchmarking, you might want to start making a big deal about it and show the arithmetic of the incremental effect on the RE when the verification and calibration periods have differing means.

By: Ross McKitrick

Ross McKitrick — Thu, 31 Aug 2006 02:38:18 +0000

#10:

In his report Dr Wegman also discusses in detail the issue of non centred PCA analysis. However, the hockey stick pattern emerges whether non centred or centred PCA analysis is applied (not mentioned in Wegman’s report) provided that the correct number of statistically significant principal components are employed. McIntyre and McKitrick (MM) failed to do this, hence their analysis censored out a great deal of data.

Whoever Paul Monro is, he obviously has not read our E&E05 paper, nor has he grasped the real message of W&A, so maybe he didn’t read it either. (We were just recently told about another European scientist who declared to a questioner a similar refusal to read E&E05, even while chastising us for not doing the very analysis undertaken therein.) As Steve says above, W&A showed, as did we, that the hockey stick = the bristlecones. The PC error makes it seem legit to rely on them because it bounces them up to the PC1. That being the case, it’s a wonder MBH even bother keeping the rest of the data, since the bcp’s (bristlecone pines) force the opposite conclusion from the one the rest of the data would indicate.

His bristling dismissal of the social network analysis only serves to illustrate the Wegman panel’s point about the ‘self-reinforcing feedback mechanism’ and the isolation of that community.

The Wegman panel did study E&E05, and addressed the issues Monro raises in its summary of E&E05 on pp. 79-80 of the report. It’s too bad they didn’t discuss it more in the front matter of their report, but they obviously grasped the issues:

In the MBH98 de-centered principal component calculation, a group of twenty primarily bristlecone pine sites govern the first principal component. Fourteen of these chronologies account for over 93% variance in the PC1 and 38% of the total variance. The effect is that it omits the influence of the other 56 proxies in the network. In a centered version of the data, the influence of the bristlecone pine drops to the fourth principal component, where it accounts for 8% of the total variance. The MM03 results are obtained if the first two NOAMER principal components are used. The MBH98 results can be obtained if the NOAMER network is expanded to five principal components. Subsequently, their conclusion about the climate of the late 20th century is contingent upon including low-order principal components that only account for 8% of the variance of one proxy roster. Furthermore, the MM03 results occur even in a de-centered PC calculation, regardless of the presence of PC4, if the bristlecone pine sites are excluded.

In the Gaspe “northern treeline” series, MM05a found that the MBH98 results occur under three conditions: 1) the series must be used as an individual proxy; 2) the series must contain the portion of the series that relies only on one or two trees for data; and 3) it must contain the ad-hoc extrapolation of the first four years of the chronology. Under all other conditions, including using an archived version of the series without extrapolation, MM03 type results occur.

MM05a also addresses the MBH98 claims of robustness in their findings. The sensitivity of the 15th century results to slight variations in the data and method of two individual series show a fundamental instability of the results that flatly contradicts the language used in MBH98 and in Mann et al. (2000) where it states “…whether we use all data, exclude tree rings, or base a reconstruction only on tree rings, has no significant effect on the form of the reconstruction for the period in question…” Additionally, MM05a notes much of the specialist literature raises questions about these indicators and at the least these questions should be resolved before using these two series as temperature proxies, much less as uniquely accurate stenographs of the world’s temperature history.

In response to MM03, Mann et al. wrote several critiques that appeared in Nature magazine as letters and as separate articles. The Mann et al. (2004) paper argued that the MM03 use of centered principal components calculations amounted to an “effective omission” of the 70 sites of the North American network. However, the methodology used omits only one of the 22 series. A calculation like this should be robust enough that it is relatively insensitive to the removal of one series. Also, “effective omission” is more descriptive of the MBH98 de-centering method, which uses 14 bristlecone sites to account for over 99% of explained variance.
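The de-centering effect described in the excerpt can be illustrated with a toy Python sketch (standard library only). It assumes that "de-centered" means subtracting each series' mean over a late sub-period (assumed here to correspond to the 1902-1980 calibration period) instead of the full-record mean. The synthetic data, the two flagged series with a late upswing standing in for the bristlecones, and the window lengths are all hypothetical choices, not the MBH98 network or Wegman's calculation. PC1 is found by power iteration on the cross-product matrix of the centered series.

```python
import random


def power_iteration(mat, iters=1000):
    """Leading eigenvector of a symmetric positive semi-definite matrix."""
    k = len(mat)
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v


def pc1_share(series, center_on, flagged):
    """Share of PC1's squared loadings that falls on the flagged series.

    Each series is centered on its own mean over `center_on` (a slice): the
    full record for conventional centering, or a late sub-period for
    short-centering. PC1 is the leading eigenvector of the resulting
    series-by-series cross-product matrix.
    """
    centered = []
    for s in series:
        m = sum(s[center_on]) / len(s[center_on])
        centered.append([v - m for v in s])
    k = len(centered)
    cross = [[sum(a * b for a, b in zip(centered[i], centered[j])) for j in range(k)]
             for i in range(k)]
    pc1 = power_iteration(cross)
    return sum(pc1[i] ** 2 for i in flagged)


rng = random.Random(3)
T, late = 1000, 80  # record length and length of the late centering window (hypothetical)
n_series, flagged = 20, [0, 1]  # series 0 and 1 get a late upswing (hypothetical)
series = []
for i in range(n_series):
    x = [rng.gauss(0.0, 1.0) for _ in range(T)]
    if i in flagged:
        start = T - late
        # Flat, then a linear ramp from 0 to 2 over the last `late` steps.
        x = [v + (2.0 * (t - start) / (late - 1) if t >= start else 0.0)
             for t, v in enumerate(x)]
    series.append(x)

share_full = pc1_share(series, slice(0, T), flagged)
share_short = pc1_share(series, slice(T - late, T), flagged)
print(f"PC1 share on flagged series: full centering {share_full:.2f}, "
      f"short centering {share_short:.2f}")
```

In runs like this the short-centered PC1 is typically dominated by the two flagged series, because their early values sit far from the late-window mean and so contribute a large sum of squares; with full-record centering that offset is much smaller.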

By: Steve McIntyre

Steve McIntyre — Wed, 30 Aug 2006 22:19:29 +0000

#14. No, you’re looking for too conventional an explanation. Look at my next post on overfitting. It’s not that the relationship is varying – it’s just that you have overfitting on an unimaginable scale through their “inverse regression” method which nobody bothered figuring out.

By: Mark T.

Mark T. — Wed, 30 Aug 2006 22:12:26 +0000

They fail r2 outside of the calibration and training periods simply because the statistics are not stationary (among other glaring problems, including components that are highly correlated). Nearly every text I’ve read clearly states that varying statistics require online (adaptive) methods for calculating components. There are also calculations that can be done to determine the level of “non-stationarity” in the data, and thresholds for determining if even online methods will work properly. I have citations if necessary (may take a day to revive them all).

Mark