Comments on: Conflict and Confidence: MBH99

By: UC

UC — Tue, 24 Nov 2009 14:08:11 +0000

This is the sort of “dirty laundry” one doesn’t want to fall into the hands of those who might potentially try to distort things…

No, no, just trying to clarify. So, from this text

http://www.eastangliaemails.com/emails.php?eid=355&filename=1062527448.txt

I can infer that I was quite close with http://www.climateaudit.org/?p=647#comment-103560, but now we can hopefully figure it out exactly. Or do we even need to, since the main post above shows how to really do it? (And do all the millennial reconstructions published so far need to be corrected?)

By: K. Hamed

K. Hamed — Thu, 14 Aug 2008 16:11:48 +0000

Steve,

can you explain why the correlations between proxies and the temperature in nhem-raw.dat are very poor in the period 1902 to 1980 for proxies 2, 5, 7, 12, 13, and 14?

By: Hu McCulloch

Hu McCulloch — Tue, 12 Aug 2008 13:46:20 +0000

Gary (#8) asks,

Can you clarify the definition of the ‘Inconsistency R’ statistic in layman’s terms?

Let me try — Suppose you have 3 proxies (different tree ring series or whatever) that give reconstructed temperature anomalies for some year in the past of -.5, -1.0, and -1.5 dC, respectively, each with error bars (2 sigma) of plus or minus 1.0 dC. These average to -1.0 dC, with a somewhat reduced composite error bar of about 0.6 dC. In this case, each proxy’s reading is within its error bar of the average, so there is probably no inconsistency problem. Brown’s R-stat more rigorously combines the three discrepancies into a single chi-square distributed statistic that takes into account any differences in precisions and correlations of the different proxy measures.

But now suppose that instead the three proxies gave readings of +1.0, -1.0, and -3.0 dC, each still with an error bar of 1.0 dC. They still average out to -1.0 dC, with the same composite error bar of 0.6 dC. However, something must be wrong with the assumption that these are measuring the same global temperature, since two of the readings are 4 standard errors from the average. Brown’s statistic would surely register an inconsistency in this case.
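The arithmetic in these two examples can be sketched in a few lines. This is only a simplified stand-in for Brown's R-stat: it assumes three independent proxies of equal precision, so the sum of squared deviations from the average, scaled by the variance, is chi-square with 2 degrees of freedom. The function name `consistency_check` and its parameters are made up for illustration.

```python
import math

def consistency_check(readings, two_sigma=1.0):
    """Simplified consistency check for three independent, equally precise proxies.

    Assumption (not Brown's full R-stat): no correlation between proxies and
    identical error bars, so the statistic is chi-square with 2 d.o.f.
    """
    if len(readings) != 3:
        raise ValueError("the closed-form p-value below holds for 3 proxies only")
    sigma = two_sigma / 2.0                       # one standard error
    n = len(readings)
    mean = sum(readings) / n                      # composite estimate
    composite_two_sigma = two_sigma / math.sqrt(n)
    stat = sum((r - mean) ** 2 for r in readings) / sigma ** 2
    p_value = math.exp(-stat / 2.0)               # chi-square tail area, 2 d.o.f.
    return mean, composite_two_sigma, stat, p_value

for readings in ([-0.5, -1.0, -1.5], [1.0, -1.0, -3.0]):
    mean, bar, stat, p = consistency_check(readings)
    verdict = "inconsistent" if p < 0.05 else "consistent"
    print(f"{readings}: mean={mean:+.2f} dC, bar={bar:.2f} dC, "
          f"stat={stat:.1f}, p={p:.2g} -> {verdict}")
```

Both sets of readings give the same average and the same composite error bar; only the statistic tells them apart.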

Of course, any statistic should give a false rejection of the null (here that the proxies are all measuring the same global temperature anomaly) p percent of the time, where p is the test size used. Steve’s Figure 1 above has a red line at the 95% critical value of the appropriate chi-square distribution, corresponding to a test size of p = 5%, I believe. Outside the calibration period (where the fit has to be good, by construction), it looks to me like maybe 40% of the readings are above the line, which is far more than one would expect by chance if the proxies really were consistent.
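To illustrate the "5% by chance" benchmark, here is a rough Monte Carlo sketch. It assumes three hypothetical proxies that really do measure the same anomaly, with independent normal errors of known size, and uses the simplified 2 d.o.f. chi-square statistic rather than Brown's R-stat itself. All names and parameter values are illustrative.

```python
import math
import random

def rejection_rate(n_years=20000, sigma=0.5, size=0.05, seed=0):
    """Fraction of simulated years in which consistent proxies exceed the critical line."""
    rng = random.Random(seed)
    critical = -2.0 * math.log(size)   # chi-square critical value for 2 d.o.f.
    rejections = 0
    for _ in range(n_years):
        truth = rng.uniform(-2.0, 2.0)             # arbitrary true anomaly
        readings = [truth + rng.gauss(0.0, sigma) for _ in range(3)]
        mean = sum(readings) / 3.0
        stat = sum((r - mean) ** 2 for r in readings) / sigma ** 2
        if stat > critical:
            rejections += 1
    return rejections / n_years

print(f"fraction above the 95% line: {rejection_rate():.3f}")
```

Under these assumptions the fraction should hover near 0.05, which is the yardstick against which something like 40% would be judged.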

Steve — what is the actual fraction of pre-calibration values that lie above the line?

Inconsistency of proxies during the reconstruction period could be caused by cherry picking of proxies during the calibration period. If say 100 proxies were examined for correlation to instrumental temperature, just by chance 5 or so would appear to be significant at the 5% test size (loosely called the “95% significance level”), even if none had any real value as a proxy. If the three “best” proxies were then singled out and used, their computed standard errors would give a misleadingly small confidence interval for the historical reconstruction. However, there would be a very good chance that they would flunk Brown’s R-stat test.
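The "5 or so out of 100" figure can be checked with a small simulation. This sketch assumes white-noise proxies and a white-noise "temperature" over 79 years (1902 to 1980), and uses the Fisher z approximation for the 5% two-sided test; real proxies are autocorrelated, which would change the numbers. The function names are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def count_spurious(n_proxies=100, n_years=79, seed=1):
    """Count worthless proxies that look 'significant' at the 5% test size."""
    rng = random.Random(seed)
    temperature = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
    z_crit = 1.96                                  # two-sided 5% normal critical value
    hits = 0
    for _ in range(n_proxies):
        proxy = [rng.gauss(0.0, 1.0) for _ in range(n_years)]  # no real signal
        r = pearson(proxy, temperature)
        z = math.atanh(r) * math.sqrt(n_years - 3)  # Fisher z approximation
        if abs(z) > z_crit:
            hits += 1
    return hits

print("spuriously significant proxies out of 100:", count_spurious())
```

Picking the "best" few from such a batch selects on noise, which is why their calibration-period standard errors would understate the true uncertainty.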

Phil. (#5) also asked if the test could be applied to Craig Loehle’s reconstruction. (See http://www.econ.ohio-state.edu/jhm/AGW/Loehle/.) However, he did not calibrate his data from scratch as assumed by Brown, but rather just took published local temperature reconstructions that had already been calibrated by their various authors, and then averaged them together. A few of these series did have published error estimates, but my understanding is that not all of them did. Accordingly, the standard errors I provided for the corrected estimates in the 2008 Loehle and McCulloch paper are based entirely on the deviations of the individual proxies from their average, which basically assumes that the individual proxies are consistent. In any event, any published errors would be only for the precision of the estimate of local temperature, and would not include the also important deviation of local temperature from the global average. Since Brown’s statistic does not take this additional source of error into account, it is possible that it would reject the hypothesis that the various proxies are all measuring the same temperature anomaly, but only because local temperature anomalies naturally differ at any point in time.

Someone also asked why the statistic looks so good in the calibration period. As noted above, this is true by construction, since the estimated coefficients and standard errors were computed from this data, and therefore must be consistent with it. It is only out of the calibration period that the statistic starts to tell us something we didn’t already know.

Brown’s R-stat in a sense provides a “verification” check on the calibration, but one that does not require withholding instrumental observations as is often done in this literature.

(Sorry I’ve been away from the group for so long — I’ve been traveling and have had time-consuming day-job and other responsibilities.)

By: Sam Urbinto

Sam Urbinto — Wed, 06 Aug 2008 17:22:10 +0000

Phil. #29

After that SEM versus SD debacle (Why don’t you use the same easy to pass method everyone else does rather than one that’s hard to pass?) I don’t see what use it is to delve into a study that clearly shows once you remove the material that gives you a certain signal, the signal goes away. 😀

Seriously, it might be interesting, but unless the IPCC relies upon it for policy, what’s the point of taking time away from investigating the things it relies upon?

By: Mark T.

Mark T. — Wed, 06 Aug 2008 16:16:37 +0000

Wordsmithing.

Mark

By: Phil.

Phil. — Wed, 06 Aug 2008 15:34:59 +0000

Re #28

You asked why not audit Loehle?

I certainly did not. I said: "For comparison why not run the same test on the Loehle reconstruction". As I have pointed out since, it would be a comparison of two reconstructions with different philosophies (e.g. dendro vs non-dendro), and it would be interesting to see whether this consistency analysis picks up a significant difference; comparing with recons that contain the same proxies wouldn't seem to tell us much. Steve pointed out that the use of smoothing and non-annual data might be a problem: "Also, because Loehle (and Moberg) used some very smoothed data versions, there are some issues involved in transposing the methods that need to be thought through fairly carefully before jumping to any conclusions." That is a valid concern. As for auditing Loehle, that has already been done on here: errors were found and a corrected version produced. If the use of the data by the IPCC is an important consideration, then compare with Moberg using similar logic (but also subject to Steve's reservations).

By: DeWitt Payne

DeWitt Payne — Wed, 06 Aug 2008 14:32:41 +0000

Phil,

You asked why not audit Loehle? I answered. The priority is to investigate the data and methods used by the IPCC, because they are the ones who claim there is a problem (the sky is falling) and we must do something about it. The false dichotomy is that if the skeptics do not have a valid alternative theory for the recent temperature change, then the AGW theory must be correct.

By: Phil.

Phil. — Wed, 06 Aug 2008 13:49:38 +0000

Re #24

Also, because Loehle (and Moberg) used some very smoothed data versions, there are some issues involved in transposing the methods that need to be thought through fairly carefully before jumping to any conclusions. I think that it’s more orderly to work through the recons that purport to have annual resolution, see what they look like from different perspectives, before worrying too much about Loehle.

I can appreciate the concern regarding the smoothing issues and how that might impact on the method.

I’m not worrying about Loehle; rather, my approach is different from yours, Steve. Faced with a group of recons and a new analytical technique to apply, I’d pick the two most different ones first to see what the range of values is likely to be. Your approach is more orderly; these are just two different ways to skin the same cat.

Re #25

DeWitt, I’ve no idea what you’re referring to here.

By: UC

UC — Wed, 06 Aug 2008 07:51:59 +0000

#17, #20, let me try to add something from an engineering viewpoint:

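Assume that calibration was perfect, i.e. $\hat{B} = B$ and $\hat{\Gamma} = \Gamma$. Now, in the prediction phase we have to estimate $x$ in the normal regression model

$y = Bx + e$,

where $e$ is normally distributed, with $E(e) = 0$ and $\mathrm{Cov}(e) = \Gamma$. Under these assumptions it can be shown that the quadratic form of the residual vector $r = y - B\hat{x}$ (with $\hat{x}$ the generalized least squares estimate of $x$),

$W = r^T \Gamma^{-1} r$,

is $\chi^2_{q-p}$ distributed, where $q$ is the number of proxies in $y$ and $p$ is the dimension of $x$. Thus, when having multiple observations, we can test whether observations agree with the statistical model, without knowing the true $x$.

The only difference between R and W is that in the former (calibration), we have only estimates of $B$ and $\Gamma$. Thus, R is asymptotically ($n \to \infty$, with $n$ the number of calibration observations) $\chi^2_{q-p}$ distributed.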
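A small simulation can illustrate the point that the residual quadratic form does not depend on the unknown true value. The sketch below makes assumptions not stated in the comment: a scalar temperature x, three proxies with known slopes `b` and known independent error standard deviations `sigma` (a diagonal error covariance), and a weighted least squares estimate of x. In that setup W should be chi-square with 3 - 1 = 2 degrees of freedom, so its average should be close to 2 whatever the true x is. All names and numbers are illustrative.

```python
import random

def w_statistic(y, b, sigma):
    """Weighted least squares estimate of x and the residual quadratic form W."""
    weights = [1.0 / s ** 2 for s in sigma]
    x_hat = (sum(w * bi * yi for w, bi, yi in zip(weights, b, y))
             / sum(w * bi * bi for w, bi in zip(weights, b)))
    w_stat = sum(w * (yi - bi * x_hat) ** 2 for w, bi, yi in zip(weights, b, y))
    return x_hat, w_stat

def mean_w(true_x, n_draws=20000, seed=0):
    """Average W over simulated observations y = b * true_x + noise."""
    rng = random.Random(seed)
    b = [0.8, 1.0, 1.5]        # assumed known proxy slopes (perfect calibration)
    sigma = [0.5, 0.7, 1.0]    # assumed known error standard deviations
    total = 0.0
    for _ in range(n_draws):
        y = [bi * true_x + rng.gauss(0.0, s) for bi, s in zip(b, sigma)]
        total += w_statistic(y, b, sigma)[1]
    return total / n_draws

for true_x in (-1.0, 3.0):
    print(f"true x = {true_x:+.1f}: mean W = {mean_w(true_x):.3f}")
```

With estimated rather than known slopes and variances, as in R, the chi-square result would hold only approximately for a long calibration period.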

By: DeWitt Payne

DeWitt Payne — Tue, 05 Aug 2008 21:25:18 +0000

Phil,

I realize the semantic implications of my choice of analogy can be considered pejorative, but that isn’t my intention.

If someone claims the sky is falling, do you spend all your time investigating the validity of the claims that the sky isn’t falling? I hope not, because disproving their claims says nothing about whether the sky is falling or not. Believing otherwise would be the false dichotomy fallacy. The data and models of the AGW proponents need to be subject to at least the same level of skepticism and scrutiny as the FDA applies to a potential blockbuster new drug application.