Signal to Noise Ratio Estimates of Mann08 Temperature Proxy Data

Guest post by Jeff Id from The Air Vent (used by invitation)

Occasionally, when working on one thing long enough, you discover something unexpected that lets you take a step forward in understanding. At the ICCC conference, I met Steve McIntyre and took time to ask him why Mann07, “Robustness of proxy-based climate field reconstruction methods”, didn’t show any variance loss in the historic signal. The paper makes the claim that CPS is a functional method for signal extraction, which I’ve long and vociferously contested ;) . Neither of us had a good answer, but I had to know. In Mann07 – Part II at the Air Vent, the mystery was solved. The M07 paper uses model data as a ‘known’ temperature signal and adds various levels of noise to it. While the work oddly uses white noise in most operational tests, it does present an example using ARMA(1,0,0) noise with ρ = 0.32, and it showed very little variance loss. Replicating M07 using CPS wasn’t difficult and the results were confirmed – no historic variance loss, so no artificially flat handle for the Mann hockey stick.

With white noise or low-autocorrelation noise, there will be none of the variance loss (the flat HS handle) reported in von Storch and Zorita 2004, Christiansen 2010, McIntyre and McKitrick 2005, or numerous other studies. This is because low-AR noise doesn’t create signal-obscuring trends on a long enough timescale to make a difference. However, if red noise with autocorrelation matching the values observed in the proxies is used, we get a very different result, one that overturns the conclusions of Mann07. But that isn’t the topic of this post.
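For readers who want to experiment, here is a minimal sketch of the pseudoproxy setup under discussion: a “known” signal plus ARMA(1,0,0) noise, composited and scaled over a calibration window. The signal, noise level, proxy count and calibration window are placeholder assumptions, and M07/M08 involve screening and other steps not shown here, so this sketches the mechanics rather than replicating either paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_noise(n, rho, sigma):
    """ARMA(1,0,0) noise: x[t] = rho * x[t-1] + e[t], with e ~ N(0, sigma^2)."""
    x = np.zeros(n)
    e = rng.normal(0.0, sigma, n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + e[t]
    return x

n_years, n_proxies = 1000, 50
signal = np.cumsum(rng.normal(0, 0.02, n_years))   # stand-in "known" temperature
cal = slice(n_years - 100, n_years)                # calibration window (last 100 years)

def cps_variance_ratio(rho, noise_sd=1.0):
    """Variance of a CPS-style composite reconstruction outside the calibration
    window, relative to the true signal's variance over the same period."""
    proxies = [signal + ar1_noise(n_years, rho, noise_sd) for _ in range(n_proxies)]
    composite = np.mean(proxies, axis=0)
    # scale the composite so its calibration-period mean and std match the target
    recon = (composite - composite[cal].mean()) / composite[cal].std()
    recon = recon * signal[cal].std() + signal[cal].mean()
    return recon[:cal.start].var() / signal[:cal.start].var()

print("white noise, rho = 0.00:", round(cps_variance_ratio(0.0), 2))
print("red noise,   rho = 0.90:", round(cps_variance_ratio(0.9), 2))
```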


Erice 2010

I’m off to Sicily tonight for the 2010 conference of the World Federation of Scientists, hosted by the redoubtable Antonio Zichichi. I’ll be a bit spotty checking in.

I never did finish reporting on the 2009 conference, as we got overtaken by Yamal and then by Climategate and the inquiries. I’ve got a better computer this year (enough battery life to get through the day) and I’ll try to post some conference reports.

I’m making a presentation in a session on Improving IPCC.

Briggs on McShane and Wyner

As usual, a good analysis from Matt Briggs here

McShane and Wyner 2010

A reader (h/t ACT) draws attention to an important study on proxy reconstructions (McShane and Wyner 2010) in the Annals of Applied Statistics (one of the top statistical journals):
A Statistical Analysis of Multiple Temperature Proxies: Are Reconstructions of Surface Temperatures Over the Last 1000 Years Reliable?

Its abstract states:

We find that the proxies do not predict temperature significantly better than random series generated independently of temperature. Furthermore, various model specifications that perform similarly at predicting temperature produce extremely different historical backcasts. Finally, the proxies seem unable to forecast the high levels of and sharp run-up in temperature in the 1990s either in-sample or from contiguous holdout blocks, thus casting doubt on their ability to predict such phenomena if in fact they occurred several hundred years ago.

They cite the various MM articles.
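The headline claim, that the proxies do not beat random series on contiguous holdout blocks, is easy to sketch in code. The toy below only illustrates that kind of benchmark; it is not McShane and Wyner’s actual Lasso or principal-components specifications, and the dimensions, noise levels and OLS fit are assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: 149 "annual" values with the last 30 held out as a
# contiguous validation block, and 20 candidate predictors.
n, p, holdout = 149, 20, 30
temp = np.cumsum(rng.normal(0, 0.05, n)) + np.linspace(0, 0.6, n)  # toy temperature

def holdout_rmse(X, y, n_holdout):
    """Fit OLS on the early block; report RMSE on the held-out late block."""
    train = slice(0, len(y) - n_holdout)
    test = slice(len(y) - n_holdout, None)
    A = np.column_stack([np.ones(len(y)), X])          # add an intercept
    beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    resid = y[test] - A[test] @ beta
    return float(np.sqrt(np.mean(resid ** 2)))

# "Proxies": temperature plus heavy noise.  "Nulls": AR(1) series unrelated to it.
proxies = temp[:, None] + rng.normal(0, 2.0, (n, p))
nulls = np.zeros((n, p))
for t in range(1, n):
    nulls[t] = 0.4 * nulls[t - 1] + rng.normal(0, 1.0, p)

print("proxy predictors, holdout RMSE: ", round(holdout_rmse(proxies, temp, holdout), 3))
print("random predictors, holdout RMSE:", round(holdout_rmse(nulls, temp, holdout), 3))
```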

Ross on Panel Regressions

Ross comments:

One of the benefits of panel regressions is that it forces you to spell your null hypothesis out clearly. In this case the null is: the models and the observations have the same trend over 1979-2009. People seem to be gasping at the audacity of assuming such a thing, but you have to in order to test model-obs equivalence.

Under that assumption, using the Prais-Winsten panel method (which is very common and is coded into most major stats packages) the variances and covariances turn out to be as shown in our results, and the parameters for testing trend equivalence are as shown, and the associated t and F statistics turn out to be large relative to a distribution under the null. That is the basis of the panel inferences and conclusions in MMH.

It appears to me that what our critics want to do is build into the null hypothesis some notion of model heterogeneity, which presupposes a lack of equivalence among models and, by implication, observations. But if the estimation is done based on that assumption, then the resulting estimates cannot be used to test the equivalence hypothesis. In other words, you can’t argue that models agree with the observed data, using a test estimated on the assumption that they do not. As best I understand it, that is what our critics are trying to do. If you propose a test based on a null hypothesis that models do not agree among themselves, and it yields low t and F scores, this does not mean the hypothesis of consistency between models and observations is not rejected. It is a contradictory test: if the null is not rejected, it cannot imply that the models agree with the observations, since model heterogeneity was part of the null when estimating the coefficients used to construct the test.

In order to test whether modeled and observed trends agree, test statistics have to be constructed based on an estimation under the null of trend equivalence. Simple as that. Panel regressions and multivariate trend estimation methods are the current best methods for doing the job.

Now if the modelers want to argue that “of course” the models do not agree with the observations because they don’t even agree with each other, and it would be pointless even to test whether they match observations because everyone knows they don’t; or words to that effect, then let’s get that on the table ASAP because there are a lot of folks who are under the impression that GCM’s are accurate representations of the Earth’s climate.
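To make concrete where the null of trend equivalence enters, here is a minimal sketch: a pooled regression with series-specific trends, followed by a joint test that the trend differences are zero. It is deliberately simplified (plain OLS on synthetic data, with none of the Prais-Winsten or panel covariance machinery that MMH10 actually uses), so treat it as an illustration of the logic rather than of the paper’s method.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy panel: one "observed" series plus three "model" series over 1979-2009,
# all synthetic placeholders.  Under the null, every series shares one trend,
# so the trend-by-series interaction terms should be jointly zero.
rng = np.random.default_rng(3)
years = np.arange(1979, 2010)
frames = []
for name, slope in [("obs", 0.015), ("m1", 0.025), ("m2", 0.022), ("m3", 0.028)]:
    y = slope * (years - 1979) + rng.normal(0, 0.12, years.size)
    frames.append(pd.DataFrame({"series": name, "year": years - 1979, "anom": y}))
panel = pd.concat(frames, ignore_index=True)

# Pooled regression with series-specific intercepts and trends.
fit = smf.ols("anom ~ C(series) * year", data=panel).fit()

# Jointly test that all trend differences (relative to the baseline series) are zero.
names = list(fit.params.index)
interactions = [c for c in names if ":year" in c]
R = np.zeros((len(interactions), len(names)))
for i, c in enumerate(interactions):
    R[i, names.index(c)] = 1.0
print(fit.f_test(R))
```

If the joint F statistic is large relative to its null distribution, the hypothesis that all series share a common trend is rejected; MMH10 does the analogous thing with Prais-Winsten estimation and a proper panel covariance matrix.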

Re-read Pielke Jr on Consistency

Roger vs Annan here

Using Santer’s Method

Using Santer’s own methodology with up-to-date observations, here are results comparing observations to the ensemble mean of Chad’s collation of 57 A1B models to 2009. In each case, the d1* calculated Santer-style has moved into very extreme percentiles.

The results from Ross’ more advanced methodology are not in any sense “inconsistent” with the application of Santer’s own methods to up-to-date data.

| Tropo      | Sat | Obs Trend | Ensemble | Santer d1* (1999) | d1* (2009) | Percentile |
|------------|-----|-----------|----------|-------------------|------------|------------|
| Lapse_T2LT | rss | -0.033    | -0.079   | -0.67             | -2.819     | 0.003      |
| Lapse_T2LT | uah | 0.048     | -0.079   | -3.5              | -7.395     | 0          |
| Lapse_T2   | rss | 0.005     | -0.069   | NA                | -4.212     | 0          |
| Lapse_T2   | uah | 0.084     | -0.069   | NA                | -8.518     | 0          |
| T2LT       | rss | 0.159     | 0.272    | 0.37              | 1.69       | 0.948      |
| T2LT       | uah | 0.075     | 0.272    | 1.11              | 2.862      | 0.996      |
| T2         | rss | 0.121     | 0.262    | 0.44              | 2.196      | 0.981      |
| T2         | uah | 0.04      | 0.262    | 1.19              | 3.449      | 0.999      |
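For readers who want to see roughly how a d1*-type number is formed, here is a sketch. The AR(1) effective-sample-size adjustment for the observed trend and the inter-model term follow my reading of Santer et al. 2008, and the series and the 57 model trends below are synthetic placeholders rather than Chad’s collation, so the output is illustrative only.

```python
import numpy as np

def ar1_adjusted_trend(y, t):
    """OLS trend of y on t and its standard error, with an effective-sample-size
    adjustment for lag-1 autocorrelation of the residuals (in the spirit of the
    adjustment described in Santer et al. 2008; a sketch, not their exact code)."""
    X = np.column_stack([np.ones_like(t), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    n_eff = max(len(y) * (1 - r1) / (1 + r1), 4.0)     # effective sample size
    s2 = resid @ resid / (n_eff - 2)
    se = np.sqrt(s2 / np.sum((t - t.mean()) ** 2))
    return beta[1], se

rng = np.random.default_rng(4)
t = (np.arange(1979, 2010) - 1979) / 10.0          # time in decades
obs = 0.05 * t + rng.normal(0, 0.15, t.size)       # toy observed series (K)
model_trends = rng.normal(0.25, 0.08, 57)          # toy ensemble of 57 trends (K/decade)

b_obs, se_obs = ar1_adjusted_trend(obs, t)
b_mod = model_trends.mean()
se_mod = model_trends.std(ddof=1) / np.sqrt(model_trends.size)

# d1*-style statistic: trend difference over the combined uncertainty
d1 = (b_obs - b_mod) / np.sqrt(se_obs**2 + se_mod**2)
print(f"obs trend {b_obs:.3f} K/decade vs ensemble {b_mod:.3f}; d1* = {d1:.2f}")
```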

A Mixed Effects Perspective on MMH10

Today’s post is complementary to MMH10, which, as readers obviously realize, is in Ross’ excellent style. There has been a kneejerk reaction from climate scientists that the article is “wrong” – typically assuming that we have neglected some trivial Santerism (which we haven’t).

This post is NOT – repeat NOT – an explication of MMH10. Let me repeat another point – Santer’s ensemble results don’t hold up with additional data using his own method. So this result doesn’t depend on the validity of the MMH10 method.

Some of the defences of the Santer results are based on the idea that the standard deviation of an “ensemble mean model” should be huge relative to the standard deviation of realizations of any given model. Unfortunately, the precise meaning of an “ensemble mean model” is hard to pin down in statistical terms.

I thought that it would be useful to try to pin the concept of an “ensemble mean model” down in statistical terms. As I was doing so, I did the simple boxplot diagram presented here, which provided interesting results and made me think more about the stratification issue. But this post is not an elucidation of the MMH10 algebra, which comes at things from a different perspective. I’ll try to figure out a way of presenting the MMH10 algebra in a friendly version as well, on another occasion.
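As a concrete (and entirely synthetic) illustration of the within-model versus between-model spread at issue, here is a small sketch; the models, run counts and trend values are made up and are not the collation used for the figure, but the decomposition is the one a mixed-effects view of an “ensemble mean model” turns on.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a collation of runs: a few models, several runs each,
# one 1979-2009 trend (K/decade) per run.  Not the A1B collation from the post.
rng = np.random.default_rng(5)
rows = []
for model in ["A", "B", "C", "D", "E"]:
    mu = rng.normal(0.25, 0.07)                      # model-specific mean trend
    for run in range(int(rng.integers(2, 6))):       # 2-5 runs per model
        rows.append({"model": model, "trend": mu + rng.normal(0, 0.03)})
trends = pd.DataFrame(rows)

# Between-model spread of model means vs pooled within-model spread of runs.
model_means = trends.groupby("model")["trend"].mean()
within_resid = trends["trend"] - trends.groupby("model")["trend"].transform("mean")
print(f"between-model sd of trends: {model_means.std(ddof=1):.3f}")
print(f"within-model sd of trends:  {within_resid.std(ddof=1):.3f}")

# trends.boxplot(column="trend", by="model")   # the kind of boxplot described above
```

How these two components should enter the uncertainty of an “ensemble mean model” is the question the post is circling.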

UPDATE Aug 13 – there was a mistake in the collation of trends that I used in this post, which was (confusingly enough) supposed to help clarify things rather than confuse them. I used trends calculated up to 2099 instead of 2009, which ended up making the within-group standard deviations too narrow. I’ll rework this and re-post. However, I’m going to Italy next Tuesday and am working on a presentation, so the re-post will have to wait. This has also been a distraction from the MMH issues, so let’s focus on them.

CRU: “We had never undertaken any reanalysis…”

At the close of Boulton’s April 9 interview with CRU, the only such interview relevant to the proxy reconstruction controversies that constitute 99% of the Climategate emails, Boulton asked CRU to comment on Ross McKitrick’s National Post op-ed from last October during the Yamal controversy. The response was given to Muir Russell on or after June 16, and the “report” doesn’t refer to it. But it contains some interesting answers pertaining to long-standing questions about the Polar Urals chronology that were not addressed in the “report”. (Whether the answers make sense is a different question.)

Remarkably, CRU’s explanation for never reporting the Polar Urals chronology of Briffa et al 1995 with the incorporation of additional measurements is that they never bothered calculating the impact of the additional measurements. And check out their explanation for why they didn’t do a regional Yamal-Polar Urals-Schweingruber chronology.


Wahl-Briffa Attachments Were Deleted

The Muir Russell Inquiry was supposed to examine the email controversy. One of the issues that they purported to examine was the surreptitious Wahl-Briffa correspondence of 2006 that Fred Pearce described as a “direct subversion of the spirit of openness intended when the IPCC decided to put its internal reviews online”.

In April 2010, I requested copies of the attachments to the controversial Wahl-Briffa correspondence of July 2006 and in addition, attachments to some earlier less controversial correspondence. The July 2006 Wahl-Briffa correspondence is of particular interest because it was the subject of Jones’ “delete all emails” request to Mann, Briffa, Wahl and Ammann.

From time to time, we’ve heard reassurances that nothing was really deleted.

However, in their response to my FOI request, the university said that they were unable to comply with my request for the attachments to the Wahl-Briffa correspondence because the documents had been destroyed. They refused to provide me with the Wahl and Ammann version that was used in the IPCC AR4 First Draft.