In today’s post, I will look at a new Naturemag climate reconstruction claiming unprecedentedness (h/t Bishop Hill): “Evolution of the Southern Annular Mode during the past millennium” (Abram et al Nature 2014, pdf). Unfortunately, it is marred by precisely the same sort of data mining and spurious multivariate methodology that has been repeatedly identified in Team paleoclimate studies.

The flawed reconstruction has been breathlessly characterized at the Conversation by Guy Williams, an Australian climate academic, as a demonstration that, rather than indicating lower climate sensitivity, the recent increase in Antarctic sea ice is further evidence that things are worse than we thought. Worse it seems than previously imagined even by Australian climate academics.

the apparent paradox of Antarctic sea ice is telling us that it [climate change] is real and that we are contributing to it. The Antarctic canary is alive, but its feathers are increasingly wind-ruffled.

**A Quick Review of Multivariate Errors**

Let me start by assuming that CA readers understand the basics of multivariate data mining. In an extreme case, if you do a multiple regression of a sine wave against a large enough network of white noise, you can achieve arbitrarily high correlations. (See an early CA post on this here discussing an example from Phillips 1998.)
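To see the mechanism concretely, here is a minimal sketch (my own illustration, not from Phillips 1998; the series length, seed, and network sizes are arbitrary) regressing a pure sine wave on progressively larger networks of white noise:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
target = np.sin(np.linspace(0, 4 * np.pi, n))   # the "signal": a pure sine wave

def fit_r2(k):
    """In-sample R^2 of an OLS fit of the sine wave on k white-noise series."""
    X = rng.standard_normal((n, k))             # white-noise "proxy" network
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    tc = target - target.mean()
    return 1 - (resid @ resid) / (tc @ tc)

results = {k: fit_r2(k) for k in (5, 25, 99)}
for k, r2 in results.items():
    print(k, round(r2, 3))                      # R^2 climbs toward 1 with k
```

With 99 noise predictors and 100 observations the in-sample fit is essentially perfect, even though the predictors contain no signal whatsoever.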

At the other extreme, if you really do have a network of proxies with a common signal, the signal is readily extracted through averaging without any ex post screening or correlation weighting with the target.
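A toy illustration of this contrast (again my own sketch; proxy counts and noise levels are arbitrary): when every series really does contain a common signal, a plain unweighted average recovers it with no screening or weighting at all:

```python
import numpy as np

rng = np.random.default_rng(1)
n_proxy, n_years = 50, 500
signal = 0.2 * np.cumsum(rng.standard_normal(n_years))   # common "climate" signal

# Each proxy = the common signal plus independent unit-variance noise
proxies = signal + rng.standard_normal((n_proxy, n_years))

# No screening, no correlation weighting: just average the network
recon = proxies.mean(axis=0)
r = np.corrcoef(recon, signal)[0, 1]
print(round(r, 3))
```

The noise cancels at roughly the square root of the network size, so the simple composite tracks the common signal closely.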

As discussed on many occasions, there are many seemingly “sensible” multivariate methods that produce spurious results when applied to modern trends. In our original articles on Mann et al 1998-1999, Ross and I observed that short-centered principal components on networks of red noise is strongly biased to the production of hockey sticks. A related effect is that screening large networks based on correlation to modern trends is also biased to the production of hockey sticks. This has been (more or less independently) observed at numerous climate blogs, but is little known in academic climate literature. (Ross and I noted the phenomenon in our 2009 PNAS comment on Mann et al 2008, citing an article by David Stockwell in an Australian mining newsletter, though the effect had been previously noted at CA and other blogs).
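A sketch of the screening effect (my own simulation, not taken from any of the papers discussed; the AR(1) coefficient, threshold, and sizes are arbitrary): screen a network of pure red noise on correlation with a modern "instrumental" trend, then average the survivors:

```python
import numpy as np

rng = np.random.default_rng(2)
n_series, n_years, cal = 1000, 600, 100

# A network of AR(1) red noise containing no climate signal at all
shocks = rng.standard_normal((n_series, n_years))
proxies = np.empty_like(shocks)
proxies[:, 0] = shocks[:, 0]
for t in range(1, n_years):
    proxies[:, t] = 0.9 * proxies[:, t - 1] + shocks[:, t]

# "Instrumental" target: an upward trend over the calibration period
trend = np.linspace(0.0, 1.0, cal)
r = np.array([np.corrcoef(p[-cal:], trend)[0, 1] for p in proxies])

# Ex post screening: keep only series that "pass" against the modern trend
survivors = proxies[r > 0.3]
recon = survivors.mean(axis=0)

blade = recon[-cal // 2:].mean()    # late calibration period
handle = recon[:-cal].mean()        # everything before calibration
print(len(survivors), round(blade, 2), round(handle, 2))
```

The average of the survivors is flat before the calibration period and bends sharply upward within it: a hockey stick manufactured entirely from noise.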

Weighting proxies by correlation to target temperature is the sort of thing that “makes sense” to climate academics, but is actually even worse than ex post correlation screening. It is equivalent to Partial Least Squares regression of the target against a network (e.g. here for a discussion). Any regression against a large number of predictors is vulnerable to overfitting, a phenomenon well understood with Ordinary Least Squares regression, but also applicable to Partial Least Squares regression. Hegerl et al 2007 (cited by Abram et al as an authority) explicitly weighted proxies by correlation to target temperature. See the CA post here for a comparison of methods.
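The equivalence is easy to verify numerically. In this sketch (my own code, not the authors'; the 25-proxy/39-year dimensions echo the setup discussed in this post), a correlation-weighted composite and the first Partial Least Squares component turn out to be the same series up to a scale factor:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 39, 25                     # short calibration period, many proxies
X = rng.standard_normal((n, p))   # proxy network over the calibration period
y = rng.standard_normal(n)        # target index

Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized proxies
ys = (y - y.mean()) / y.std()               # standardized target

# Hegerl-style composite: weight each proxy by its correlation with the target
r = Xs.T @ ys / n
composite = Xs @ r

# First PLS component: weight vector proportional to X'y
w = Xs.T @ ys
pls_score = Xs @ w

match = np.corrcoef(composite, pls_score)[0, 1]
print(round(match, 6))            # identical up to a scale factor
```

Since `r` is just `w / n`, the two composites differ only by a constant, so correlation weighting inherits all the overfitting behavior of one-component PLS regression.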

If one unpacks the linear algebra of Mann et al 1998-1999, an enterprise thus far neglected in academic literature, one readily sees that its regression phase in the AD1400 and AD1000 steps boils down to weighting proxies by correlation to the target (see here) – this is different from the bias in the principal components step that has attracted more publicity.

At Climate Audit, I’ve consistently argued that relatively simple averaging can recover the “signal” from networks with a common signal (which, by definition “proxies” ought to have). I’ve argued in favor of working from large population networks of like proxies without ex post screening or ex post correlation weighting.

**The Proxy Network of Abram et al 2014**

Abram et al used a network of 25 proxies, some very short (5 begin only in the mid-19th century) with only 6 reaching back to AD1000, the start of their reconstruction. They calibrated this network to the target SAM index over a calibration period of 1957-1995 (39 years.)

The network consists of 14 South American tree ring chronologies, 1 South American lake pigment series, one ice core isotope series from the Antarctic Peninsula and 9 ice core isotope series from the Antarctic continent. The Antarctic and South American networks are both derived from the previous PAGES2K networks, using the subset of South American proxies located south of 30S. (This eliminates the Quelccaya proxies, both of which were used upside down in the PAGES2K South American reconstruction.)

Abram et al described their proxy selection as follows:

We also use temperature-sensitive proxy records for the Antarctic and South America continental regions [5 – PAGES2k] to capture the full mid-latitude to polar expression of the SAM across the Drake Passage transect. The annually resolved proxy data sets compiled as part of the PAGES2k database are published and publically available [5]. For the South American data set we restrict our use to records south of 30 S and we do not use the four shortest records that are derived from instrumental sources. Details of the individual records used here and their correlation with the SAM are given in Supplementary Table 1.

However, their network of 14 South American tree ring chronologies is actually the product of heavy prior screening of an ex ante network of 104 (!!) chronologies. (One of the ongoing methodological problems in this field is the failure of authors to properly account for prior screening and selection).

The PAGES2K South American network was contributed by Neukom, the co-lead author of Gergis et al 2012. Neukom’s multivariate work is an almost impenetrable maze of ex post screening and ex post correlation weighting. If Mannian statistics is Baroque, Neukom’s is Rococo. CA readers will recall that non-availability of data deselected by screening was an issue in Gergis et al. (CA readers will recall that David Karoly implausibly claimed that Neukom and Gergis “independently” discovered the screening error in Gergis et al 2012 on the same day that Jean S reported it at Climate Audit.) Although Neukom’s proxy network has become increasingly popular in multiproxy studies, I haven’t been able to parse his tree ring chronologies as Neukom has failed to archive much of the underlying data and refused to provide it when requested.

Neukom’s selection/screening of these 14 chronologies was done in Neukom et al 2011 (Clim Dyn) using a highly non-standard algorithm which rated thousands of combinations according to verification statistics. While not a regression method per se, it is an ex post method and, if eventually parsed, will be subject to similar considerations as regression methods – the balloon is still being squeezed.

**The Multivariate Methodology of Abram et al 2014**

Abram et al used a methodology equivalent to the regression methodology of the AD1400 and AD1000 steps of Mann et al 1998-1999 – a methodology later used (apparently without awareness of the equivalence) in Hegerl et al 2007, who are cited by Abram et al.

In this methodology, proxies are weighted by their correlation coefficient with the resulting composite scaled to the target. Abram et al 2014 described their multivariate method as follows (BTW “CPS” normally refers to unweighted composites):

We employ the widely used composite plus scale (CPS) methodology [5- PAGES2K,11 – Jones et al 2009, 12 – Hegerl et al 2007] with nesting to account for the varying length of proxies making up the reconstruction. For each nest the contributing proxies were normalized relative to the AD 1957-1995 calibration interval…

The normalized proxy records were then combined with a weighting [12 – Hegerl et al 2007] based on their correlation coefficient (r) with the SAM during the calibration interval (Supplementary Table 1). The combined record was then scaled to match the mean and standard deviation of the instrumental SAM index during the calibration interval. Finally, nests were spliced together to provide the full 1,008-year SAM reconstruction.

Although Abram et al (and their reviewers) were apparently unaware, this methodology is formally equivalent to the MBH99 regression methodology and to Partial Least Squares regression. Right away, one can see the potential for calibration-period overfitting when a network of 25 proxies is fitted over a calibration period of only 39 years. Such overfitting is particularly bad when proxies are flipped over (see another old CA post here – I am unaware of anything equivalent in academic climate literature).
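The scale of the in-sample overfitting is easy to simulate. This is a sketch under assumptions of my own (pure-noise proxies and a standard normal target; the 25 proxies and 39-year calibration window match the paper's dimensions, nothing else does):

```python
import numpy as np

rng = np.random.default_rng(4)
n_cal, n_proxy = 39, 25    # calibration length and network size as in the paper

def calibration_r():
    """In-sample correlation of a correlation-weighted noise composite with its target."""
    X = rng.standard_normal((n_cal, n_proxy))   # pure-noise "proxies"
    y = rng.standard_normal(n_cal)              # the target index
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_proxy)])
    composite = X @ r   # correlation weighting; negatively correlated series get flipped
    return np.corrcoef(composite, y)[0, 1]

mean_r = float(np.mean([calibration_r() for _ in range(200)]))
print(round(mean_r, 2))
```

Noise alone routinely "calibrates" at r well above 0.5 in-sample; the apparent skill is an artifact of fitting 25 weights to 39 observations.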

**The Abram/PAGES2K South American Tree Ring Network**

The Abram/PAGES2K South American tree ring network is an almost classic example of what not to do. Below is an excerpt from their Supplementary Table 1 listing their South American proxies, together with their correlation (r) to the target SAM index and the supposed “probability” of the correlation:

Right away you should be able to see the absurdity of this table. The average correlation of chronologies in the tree ring network to the target SAM index is a Mannian -0.01, with correlations ranging from -0.289 to +0.184.

There’s an irony in the average correlation being so low. Villalba et al 2012, also in Nature Geoscience, considered a large network of Patagonian tree ring chronologies (many of which were identical to Neukom et al 2011 sites), showing a very noticeable decline in ring widths over the 20th century (with declining precipitation) and a significant negative correlation to the Southern Annular Mode (specifically discussed in the article). It appears to me that Neukom’s prior screening of South American tree ring chronologies according to temperature (reducing the network from 104 to 14) made the network much less suitable for reconstruction of the Southern Annular Mode (which is almost certainly more clearly reflected in precipitation proxies.)

The distribution of correlation coefficients in Abram et al is inconsistent with the network being a network of proxies for SAM. Instead of an average correlation of ~0, a network of actual **proxies** should have a significant positive (or negative) correlation, and, in a “good” network of proxies of the same type (e.g. Patagonian tree ring chronologies), all correlations will have the same sign.

Nonetheless, Abram et al claim that chronologies with the most extreme correlation coefficients within the network (both positive and negative) are also the most “significant” (as measured by their p-value.) They obtained this perverse result as follows: the “significance” of their correlations “were assessed relative to 10000 simulations on synthetic noise series with the same power spectrum as the real data [31 – Ebisuzaki, J. Clim 1997]”. Thus both upward-trending and downward-trending series were assessed as more “significant” within the population of tree ring chronologies and given higher weighting in the reconstruction.
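For readers who want to see the mechanics, here is a sketch of an Ebisuzaki-style phase-randomization test (my own implementation, not the authors' code; sizes and noise levels are arbitrary). Because the test is on |r|, a strongly negative correlation comes out just as "significant" as a strongly positive one:

```python
import numpy as np

rng = np.random.default_rng(5)

def phase_randomize(x):
    """Surrogate series with the same power spectrum as x but random phases."""
    f = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(f))
    phases[0] = np.angle(f[0])          # preserve the mean term
    return np.fft.irfft(np.abs(f) * np.exp(1j * phases), n=len(x))

def p_value(x, y, n_sim=500):
    """Two-sided p-value of corr(x, y) against phase-randomized surrogates of x."""
    r_obs = np.corrcoef(x, y)[0, 1]
    sims = np.array([abs(np.corrcoef(phase_randomize(x), y)[0, 1])
                     for _ in range(n_sim)])
    return float(np.mean(sims >= abs(r_obs)))

n = 39                                   # a calibration-length series
y = rng.standard_normal(n)
x_pos = y + 0.3 * rng.standard_normal(n)     # strongly positive correlation
x_neg = -y + 0.3 * rng.standard_normal(n)    # strongly negative correlation

pp, pn = p_value(x_pos, y), p_value(x_neg, y)
print(pp, pn)                            # both come out "significant"
```

Both series are flagged at small p-values, which is exactly the behavior that lets opposite-signed chronologies both receive heavy weight in the reconstruction.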

The statistical reference of Abram et al was designed for a different problem. Their calculations of significance are done incorrectly. Neither their network of tree ring chronologies nor their multivariate method is suitable for their task. The coefficients clearly show the unsuitability.

**Conclusion**

A reconstruction using the methods of Abram et al 2014, especially accumulating the previous screening of Neukom et al 2011, is completely worthless for estimating prior Southern Annular Mode. This is different from being “WRONG!”, the adjective that is too quickly invoked in some skeptic commentary.

Despite my criticism, I think that proxies along the longitudinal transect of South America are extremely important and that the BAS Antarctic Peninsula ice core isotope series from James Ross Island is of great importance (and that it meets virtually all CA criteria for an ex ante “good” proxy.)

However, Abram et al is about as far from a satisfactory analysis of such proxies as one can imagine. It is too bad that Naturemag appears unequal to identifying even elementary methodological errors in articles that claim unprecedentedness. Perhaps they should reflect on their choice of peer reviewers for paleoclimate articles.

http://climateaudit.org/2006/04/01/natures-statistical-checklist-for-authors/

## 103 Comments

Who says climate science doesn’t advance, step by step? :)

What’s not so simple is to unpick many different forms of sophistication that find spurious signal. This CA has consistently done. That it still needs doing – and that Abram et al is even ‘worse than we thought’ – is a scandal.

Given that there is so much knowledge of this in other academic subjects – see for example the exposition in economist David Friedman’s response to this blog post https://plus.google.com/u/0/117663015413546257905/posts/BprqVQWaFKa – is there reason to suspect that many of the researchers who go into Climate Science, or indeed Mathematical Physics (since the author of the post, John Baez, is one of these), are lacking in the basics of the scientific method?

Steve: I discourage readers from overeditorializing about topics like “basics of the scientific method”. Nor do I think that the subject of this post is related to the exposition to which you link (which was about calculating trends in a univariate series). The authors of Abram et al are British Antarctic Survey ice core specialists. I don’t think that skill in extraction of oxygen isotopes from ice cores is a qualification for statistical analysis of a proxy network. Merely adding a statistician who didn’t understand the data isn’t necessarily a solution, as we’ve seen with, for example, Tingley and Huybers’ uncritical use of the contaminated portion of the Tiljander series (upside down, needless to say.)

Dr Nerilie Abram did her B.Sc. and Ph.D. in Australia which makes me think she is Australian. She did work for the British Antarctic Survey between 2004 and 2011, but is back down under now. http://rses.anu.edu.au/people/nerilie-abram

On June 26-27, 2014, Nerilie Abram will be joining Joelle Gergis at the 3rd PAGES2K AUS workshop at the Australian Bureau of Meteorology, 700 Collins St, Melbourne, Australia (see here). One of the topics on the 2nd day will be:

It’s hard to see how either Abram or Gergis will be able to shed much light on these topics.

CA readers may recall that the 1st PAGES2K AUS workshop was at the University of Western Australia and was attended by both Christopher Turney of the Ship of Fools and Stephan Lewandowsky.

“lacking in the basics of the scientific method?”

But how good are they at the basics of the fund-gathering, policy-based method?

Steve, the following 3 sentences in your introduction to this thread hit on the very basic problems with most if not all published temperature reconstructions. Sentence 3 below covers a “way out” for these reconstructions that is never used nor even considered. Of course, if there is not a reasonably consistent and distinguishable temperature signal in the proxy response, the averaging will not find it either.

I hope that people who read these blogs will take the time to truly understand the critical importance of what you are saying here. I suspect those people more associated with advocacy on the matter of climate are inclined to accept, with little or no questioning, any methodology – failed or otherwise – that yields what they already “know” must be true. On the other hand, I think any number of those people in the more skeptical camp fail to grasp the true significance of these basic flaws in temperature reconstruction approaches because it is difficult to believe that such basic errors could be made without question, and evidently continue into the present time.

“A related effect is that screening large networks based on correlation to modern trends is also biased to the production of hockey sticks. This has been (more or less independently) observed at numerous climate blogs, but is little known in academic climate literature.”

“Weighting proxies by correlation to target temperature is the sort of thing that “makes sense” to climate academics, but is actually even worse than ex post correlation screening.”

“At Climate Audit, I’ve consistently argued that relatively simple averaging can recover the “signal” from networks with a common signal (which, by definition “proxies” ought to have). I’ve argued in favor of working from large population networks of like proxies without ex post screening or ex post correlation weighting.”

Kenneth, one of the few academic journal acknowledgements that ex post screening creates problems was in PAGES2K, which cited Esper et al 2009:

However, Esper et al 2009 nowhere mentions the problem of ex post screening. Indeed, the word “screening” doesn’t even occur in the article. An actual academic journal reference for the phenomenon is McIntyre and McKitrick 2009 here, where we (wryly) cited an article by David Stockwell in an Australian mining newsletter. The phenomenon had been, of course, also discussed independently at CA, Lucia’s, Jeff Condon’s and Lubos’. Not that anyone would expect a climate academic to cite Mc-Mc when they had the alternative of citing an irrelevant article.

Readers may recall that Mann flatly repudiated that ex post screening introduced bias as follows:

It is true enough that the procedure is “common” in Team reconstructions and that its bias is undiscussed in academic journals. However, the phenomenon is real enough and, as noted above, has been independently observed at several technical skeptic blogs.

Observing the phenomenon is obviously not evidence of “unfamiliarity” with the concept of “screening/regression”. On the contrary, unawareness of the phenomenon on the part of Team climate academics shows weaknesses in their own comprehension of mathematical methods, a shortcoming that will hardly surprise CA readers.

PAGES2K purported to deal with the potential screening problems as follows:

However, these protocols were completely irrelevant to Neukom’s screening which occurred prior to the presentation of the network to PAGES2K and Naturemag. All the various “alternatives” began with the screened network (screened from 104 chronologies to 14.) If one reads the text as it is written, it is totally clueless.

I haven’t the paper here just now, but have read that the basis for screening tree ring proxies against local temperatures is that the response of trees is immutable, so that finding a tree that tracks temperatures now implies it has tracked temperature over its entire lifetime.

This assumption powers the entire field of tree ring proxies and may be the basis for Mann’s huffy rejection of criticism. It doesn’t take much reading into the biology of trees to discover that the ‘immutable response’ assumption is meritless.

The immutable response assumption has oozed over into other proxy evaluations as well, such as those of corals, where it’s not even plausibly justified.

Yes, for example, the Siberian larch’s growth cycle seems to be controlled by the maturation of the seed. As soon as this is complete, the needles drop (the larch is a deciduous conifer). A warmer growth season would mean earlier maturation and defoliation, that is, a shorter growth season rather than a longer one. This botanical fact would seem to eliminate this species from consideration as a “treemometer”. But it is doubtful that climate scientists look very deeply into the botanical aspects of trees.

@ Pat Frank

“I haven’t the paper here just now, but have read that the basis for screening tree ring proxies against local temperatures is that the response of trees is immutable, so that finding a tree that tracks temperatures now implies it has tracked temperature over its entire lifetime.”

Not actually knowing anything about tree ring proxies other than what you said above, my going in assumption has always been that there are so many factors affecting tree ring growth that with regard to temperature they are essentially white (or some color) noise.

As such, if you select a whole bunch of trees and compare a subset of their rings to actual measured temperatures you will inevitably find a tree whose rings are highly correlated with the temperature record. And a whole bunch in the same general area whose rings are not.

How a climate scientist can postulate that ALL the rings of a tree with a subset of rings that are highly correlated with a relatively short thermometer record are equally well correlated with historic temperatures is a mystery to me, given that there are presumably a lot of trees in the same area whose rings are NOT highly correlated, but it is apparently SOP.

I am not technically qualified to reject the idea out of hand, but it sure appears problematical to me, especially since the resulting ‘historical record’ is being taken as gospel and used to justify enormously disruptive political actions.

That is, of course, assuming that such use is actually based on unawareness, and not on willfulness. Based on the number of times we’ve seen it, I’m not willing to assume “unawareness” or “weaknesses in comprehension” on the part of The Team.

The Jeffs agree on this. However, I’m sometimes surprised at what people can’t understand.

Let’s make it three Jeffs.

It’s a shame they do not show the same skill in quality data management that they do in data torturing, making it tell them what they want.

In the ‘old days’, the need to use proxies known to be ‘problematic’ was not an issue: everyone accepted that the problems existed, there was no alternative, and since the problems had no real impact outside of research, no one really cared. They only became a problem with the grand claims of ‘settled science’, with a great deal of real-world impact riding on their numbers.

In other words, they went from ‘about a foot’, which was good enough when judging a distance, to claiming values accurate to three decimal places on a life and death issue, without ever actually improving the accuracy of their measurements in the first place.

“…the balloon is still being squeezed”

Steve – thanks for another delightful piece of writing. Your technical expertise is only matched by your writing skill.

I’m only a canary in a gilded balloon.

==========

I don’t believe “annular” is the same thing as “annual”…

Das ist nicht nur nicht richtig, es ist nicht einmal falsch!

Translation: “That is not only not right, it’s not even wrong!”

Heh, some of my German still works 35 years later.

This was a very refreshing read. The amount of statistical abuse still perplexes me, even though it’s been some 6 years since I first started to have a closer look at the hockey stick and started to read your postings here on Climate Audit.

Trying to talk some sense seems like Sisyphean work, though – Mann is using PAGES2K for what it’s worth these days, promoting it as “The study #HockeyStick deniers like #JudithCurry don’t want you to know about” in a tweet. (I responded that “The PAGES2K study is well known among hockey stick heretics. I’m not impressed, it contains proxies used upside-down…” – I got tagged with “#TinFoilHats #BlackHelicopters” for that, and then blocked by Mann while a few others took over his defense and bullying)

I’ve not been on Twitter for a while so it’s fun to hear how the #emptyinsults and mighty blocking ego are going. Well done.

Sampling on the Dependent Variable – UCLA

http://gabrielr.bol.ucla.edu/soc210a_f09/w9.pdf

210A Week 9 Notes

Sampling on the Dependent Variable

Technically, sampling on the dependent variable is when you select cases on the basis of meeting a criteria and then use those cases as evidence for the criteria. Since we’re usually more interested in associations than distributions we can broaden this problem to something like “sampling on theory affirmation.”

Steve: you’ve spotted highly relevant discussions. “Selecting on the dependent variable” appears to cover pretty much the same sins as ex post correlation screening and is probably more familiar to third parties.

“Thus both upward-trending and downward-trending series were assessed as more “significant” within the population of tree ring chronologies and given higher weighting in the reconstruction.”

===========

they selected cases based on their trend, and then used the result to argue that there was a trend in the data.

Case Selection and Causal Inference in Qualitative Research

Thomas Plümpera, Vera E. Troegerb, and Eric Neumayerc

We show that causal inference from qualitative research becomes more reliable when researchers select cases from a larger sample, maximize the variation in the variable of interest, simultaneously minimize variation of the confounding factors, and ignore all information on the dependent variable. We also demonstrate that causal inferences from qualitative research become much less reliable when the variable of interest is strongly correlated with confounding factors, when the effect of the variable of interest becomes small relative to the effect of the confounding factors, and when researchers analyze dichotomous dependent variables.

ex post screening or ex post correlation weighting

=============

isn’t ex post screening equivalent to ex post correlation weighting using weights of 0 or 1 exclusively?

in either case, if you are interested in studying temperature, if you use temperature to select your samples, you have created a statistically invalid sample.

“Based on a Random Sample” is the underlying assumption in statistics. Screening removes the randomness from the sample, leaving you with a mathematically invalid assumption.
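The 0/1-weighting equivalence raised above is easy to verify (an illustrative sketch only; the sizes and threshold are arbitrary): a screened average is exactly a weighted composite whose weights are indicator variables.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 30))   # 30 candidate series, 100 time steps
y = rng.standard_normal(100)         # screening target

r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(30)])

# Ex post screening: average only the series passing the threshold
passed = r > 0.0
screened_mean = X[:, passed].mean(axis=1)

# The same operation written as a weighted composite with 0/1 weights
w = passed.astype(float)
weighted = X @ w / w.sum()

same = np.allclose(screened_mean, weighted)
print(int(passed.sum()), same)
```

So screening is just the hard-threshold limit of correlation weighting, and both select on the dependent variable.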

I recently performed an audit of a study of wildlife habitat for a species in which spurious correlation was present. I was able to show that over the 11 study sites, the accuracy of the fitted multivariate predictive model declined linearly with number of nest sites observed (R^2=.85) from an “excellent” model with 70 nest sites to “poor” at 500 nest sites. I expect to have trouble with reviewers not understanding spurious correlation.

Coming from a natural resource background, I somehow learned the dangers of overfitting and spurious relationships sometime during grad school (and about how to handle outliers, as well). It baffles me that this concept is so hard for people.

Correlations as in the table above are exactly what you get from a shotgun pattern of random data. Try it.

Craig Loehle says:


I like that method, showing that the fit gets progressively worse as N goes up … definitely gonna steal that one.

w.

In a general analysis class in engineering grad school many eons ago, the professor gave us a simple assignment:

1) Using SAS on the IBM 370, generate two sets of pseudo-random numbers. IIRC, about 100 points each.

2) Perform a least-squares fit.

It was pretty eye-opening. Rsquare was something like .6, as I recall.

“The average correlation of chronologies in the tree ring network to the target SAM index is a Mannian -0.01”

And despite the layers of confusion this creates, it is important to note that the authors have likely pre-picked these proxies for the best temperature signal series. They achieved a zero correlation by ignoring piles of less preferred data. The average correlation should call the whole thing into question before it starts.

Reviewers and authors have some fast talking to do if they claim to misunderstand the problems multivariate data mining methods create. It doesn’t matter one lick which preferred MV technique they use: unless it breaks down to basically an average of the predictors, they all lead to the same place – increased correlation with temperature in recent years creating the blade, and repressed historic signals making the stick.

“It is equivalent to Partial Least Squares regression of the target against a network (e.g. here for a discussion).”

Keep writing it and writing it until it sinks in! It’s all the same thing with minor variations! EXCEPT for the decentered PCA mess that started Steve into the HS discussion which people mix up all the time.

There are several articles in print which support these conclusions; all a reviewer needs to do is read a little. I frankly can’t believe that they haven’t done that level of required reading. I also have difficulty understanding how anyone doing enough work to publish a paper is so mathematically challenged that they can’t, or at this point haven’t, figured this out.

Re: screening. As a post-doc, a biologist brought me his data. It was a relationship of leaf area vs leaf length for an allometric model. There was a clear cluster of outlier points. He asked if he could drop them as being “no good”. I said you can’t just drop data because it doesn’t “look right”. I asked if he had done anything differently. He went back and checked and came and told me that in fact these were all leaves where insects had taken big bites out and he had estimated the missing part. I told him to either go remeasure those leaves, to check his math, or to not try to infill (!) missing areas.

In biometrics, people say anything less than an R^2 of .7 is not good enough to do anything with, and don’t count p values as meaningful. If something has a small correlation, it is called “weak”, as it should be.

As far as proof of something, I had a paper a few years ago (Loehle, C. 2006. Ecology 87:2221-2226) showing that species abundance distributions (with R^2>.9) could be derived from multiple underlying assumptions, so that even an outrageously high GOF does not necessarily “prove” what one might think.

I had to write a paper and present it when I was working in a pathology lab. I got the same statistical lesson in the dangers of spurious relationships. I fixed my paper and got a decent reception for it. That was more than 30 years ago. I wonder what is being taught these days.

I remember something similar in doing statistical analyses of multivariate geochemical data for my Masters thesis, in the early 70s. This was when “canned” statistical routines such as Statpac were popular, and students would try everything that looked like it might work, and see what gave “interesting” results, knowing little about the techniques.

It doesn’t seem like too much has changed in the last 40 years.

PDTillman

Other than that modern computing and packages like MatLab allow you to process much more data more rapidly. Still mainly the same ‘black box’ thinking – dump all the data into various different stats functions and see if anything sticks.

As I am sure you are aware, most of our geology colleagues are not the most numerate (even many with postgrad qualifications), other than when it comes to knowing how many beers to order. Heck, remember Phil Jones’s reputed difficulty in producing a simple chart in Excel…

My suspicion is that most of the failure of both authors and reviewers in perpetuating the stats errors is that there are relatively few highly numerate scientists in the palaeoclimate community, so the method development is left to the few like Mann who are confident in their maths and stats skills; if they say this is the way to calculate things, much of the rest of the community has neither the knowledge nor the confidence to pull them up on any errors. Add in some confirmation bias (i.e. ‘liking’ the hockey stick results), and you have a situation where a few people can really dominate the discourse and steer it whichever way they want.

Here is the correlation between two random vectors of 40 numbers, repeated 10 times:

{0.393675, 0.234809, -0.151743, 0.0792625, 0.0634328, -0.0649641, 0.103304, 0.135903, 0.12292, -0.0789473}

sure looks like the table above.

Craig,

A neat demo but one that should be second nature to anyone used to stats.

Why did they not abort the exercise when they saw the correlation coefficient low values?

A waste of time and money (some mine?).

I was going to ask whether there had been a rigorous survey made of the statistical techniques used in these reconstructions. However, I think that my question was answered in the negative by the section quoted above. Given the importance of this work, I find it surprising that there has been little mathematical investigation into its basics. Am I correct in my impression that the techniques used are just ad hoc methods created by the researchers themselves, with no input from mathematicians on their utility?

Steve, I wonder if it would be worthwhile writing up these cautions and examples for a statistics journal, perhaps with McKitrick and/or Matt Briggs. Or with a volunteer in the audience? It’s good to see the blog postings, but if you hope to have an impact in the academic world, you need something citeable.

Science does (eventually) self-correct, and a formal publication, even well outside the climatology world, might help. Just a thought.

As always, thanks for the amazing and insightful work you do here.

Cheers — Pete Tillman

Professional geologist, advanced-amateur paleoclimatologist

I have a some quick questions about the paper under discussion in this thread:

1. The SAM index is a measure of zonal sea level pressure differences between high and lower latitudes in the southern hemisphere. How does one get from sea level pressure differences to temperature anomalies and tree ring proxies?

2. The authors do reconstructions by combining proxy data in several ways using correlation selection (5 of 26 proxies qualify) and by weighting by correlation. I did not follow how all the correlations of the proxy reconstruction and instrumental data were used to determine the 4 reconstruction variations listed at the bottom of Table 1 in the SI.

3. Table 1 in the SI, and shown in part in this thread, shows 26 proxies with mostly low correlations with some negative and some positive. Then at the bottom of the SI are 4 renditions of the SAM reconstruction (I assume by using weighting and selection based on correlation of the proxies to the SAM index) with relatively high correlations including the largest at 0.750. On the face of this result we should be talking about the miracle of averaging.

4. How do you get degrees N in longitude and degrees E in latitude? See the column labels in Table 1 in the SI.

Am I correct in assuming that the proxy data from Table 1 in the SI is supposed by the authors to be sensitive to temperature and further that the SAM index will tend to produce a colder Antarctica interior and a warmer Antarctica Peninsula and lower latitudes in its positive phase and vice versa in the negative phase? If this is correct then correlating the supposed temperature proxies with the SAM index requires something more than an averaging of all the proxy responses.

Kenneth,

Despite reading the paper and Steve’s post twice, and mulling over the SI, I am confused as well. Warm water to the west of S. America yields more precipitation, and the SAM reconstruction roughly follows Kim Cobb’s Palmyra coral isotope (and so ENSO) record, with a prominent maximum/minimum at 1460-70 AD. But I can’t quite sort out what the correlation statistics represent – are we correlating to the SAM or to temperature? When I try it either way I cannot understand how they could have screened the proxies ex post and still ended up with an average zero correlation.

Steve: Matt, you’re entirely correct to wonder how an author can both have ex post screening and an average zero correlation to SAM. Here’s a (partial/preliminary) answer. The underlying chronologies appear to be mostly responsive to precipitation, which has a sharp gradient. Neukom’s original selection was to fit temperature: it wasn’t correlation screening, but presumably was sort of like correlation screening. The ex ante data is unavailable and Neukom refused to provide it. This reduced the data from 104 to 14. Abram then correlated the reduced data set to SAM – thus the zero. Ironically, they might have done better with the original data, which appears to be correlated with SAM. It’s a real clusterf.

The SAM index is an annular measure of the surface pressure difference between latitudinal zones at roughly 65S or 70S and 40S. Marshall uses 6 strategically located stations to obtain a more accurate measure of the SAM index, and it was that index the authors of the paper under discussion on this thread used for correlation with the proxy responses. It is also known that the positive and negative phases have counter effects on the 40S and 65/70S temperatures, as the latitudinal pressure difference affects the atmospheric patterns and thus temperatures. Knowing those facts, I can only conclude that the correlation of the temperature proxies had to be applied separately to the proxy responses based on latitudinal zone of origin. I would suppose that one could merely take a difference between the average responses (temperatures?) from the SA tree ring proxies (shown on a pink blocked background in Table 1 of the SI) and the Antarctica ice cores (shown in Table 1 of the SI on a blue blocked background). The Antarctic Peninsula is blocked in a lime color and, being at 64S and known to warm/cool counter to the Antarctic mainland, is probably included with the SA tree ring proxies in any averaging. I believe the Peninsula was given its unique color block because its data were new with this paper.

Assuming what I describe above is reasonably close to what the authors did, I would continue to have problems with the final high correlation of the SAM to proxy (zonal differences), since within each zone there is a mix of negative and positive correlations. What I find very suspicious about this paper is the lack of discussion of exactly how these correlations were made.

Were the editors and reviewers paying much attention when Table 1 shows the Latitude and Longitude columns incorrectly labeled?

Steve: the reason for the high correlation of the reconstruction despite the indifferent correlations is simple: think about what happens when you do an OLS linear regression of 39 observations against 25 nearly independent variables. You’re going to get a very high correlation even with white or red noise. PLS is a little different – BUT – and this is a point that people don’t think about – if there is very little common “signal” in the independent variables, as is too often the case with tree rings, then the rotation matrix (X^T X)^{-1} in OLS regression is “nearly” (in some sense) orthogonal and thus the PLS coefficients will be “closer” to the OLS coefficients. I.e. you get an overfitting problem in PLS regression that is “close” in some sense to the overfitting problem in OLS regression. I don’t know how to do the algebra to document this, but I am 100% certain of the result from the geometry.
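Steve’s 39-observations-versus-25-noise-variables point is easy to check numerically. A minimal Python sketch (my own synthetic data and variable names, not the paper’s):

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_pred = 39, 25  # the dimensions cited above

y = rng.standard_normal(n_obs)            # white-noise "target"
X = rng.standard_normal((n_obs, n_pred))  # 25 independent white-noise "proxies"

# OLS fit: 25 free coefficients against only 39 observations soak up
# most of the variance of pure noise
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = np.corrcoef(y, X @ beta)[0, 1]
print(f"calibration r = {r:.2f}")  # high despite the data being noise
```

With dimensions like these the expected in-sample R² is roughly p/n, so a calibration r near 0.8 from pure noise is typical.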

Steve, my point here is that, given the known underlying relationships between differences in sea level pressures (SAM) and temperatures by latitudinal zone, the only correlation that makes sense for supposedly temperature-sensitive proxy responses with SAM is to use the difference between the average proxy response for the Antarctica interior ice cores and the average proxy response of the Peninsula ice core and the SA tree rings. If the authors combine all these proxy responses and then correlate with the SAM index (in the calibration/validation period), they may obtain a high but spurious correlation, as you imply, but that would tell me that the authors are not only abusing statistics but are also making a nonsensical correlation.

It is easy enough to show the ease of obtaining a spurious correlation by simulations – which I think I’ll do here for my own benefit.

I collected all the Abram paper data for the SAM index and proxies from 1957-1995 into Excel and then into R for my calculations. I standardized all these series by subtracting the series mean and dividing by the standard deviation of the anomaly series. I detrended the series and modeled the residual series for the best fit to an ARMA model by AIC score and avoidance of ar and ma coefficients too close to a unit root. The models tested were ARMA(0,0), ARMA(1,0), ARMA(2,0), ARMA(1,1) and ARMA(0,1).

I used the models (red/white noise) for the SAM index and 25 proxies to simulate and calculate the correlation between the SAM index simulation and each of the 25 proxy simulations, along with the corresponding p.values. From the correlations, I took the 5 highest r values and made a composite weighted by the value of the correlation, following the method used in the Abram paper, and calculated the correlation of this composite with the SAM index simulation. I recorded the 100 r values, p.values and the absolute values of the r values. I also used various ranges of trends selected randomly for the proxy simulations and found that the trends did not change the results. The average absolute value of the correlations was 0.54 and the standard deviation was 0.075. The average p.value was 0.002.

These results show that a selection routine as used in the Abram paper could take series of pure white/red noise and find a composite of 5 selected proxies that results in a significant correlation of this composite with a white noise reference or white noise reference with a trend with a 95% CI range that could approach r=0.69.

Finally I looked at simulations as above, except here I drew the 5 best correlations from a selection of 100 simulations instead of 25. I did this to demonstrate how using 25 proxies as in Abram, which could well have been selected from a larger population, can affect the composite of 5 correlations. In this case I obtained an average r=0.65 and a p.value=0.00006, with a 95% CI range for r of 0.54 to 0.76.
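For readers without R, the flavor of the screen-then-weight exercise described above can be reproduced in a few lines of Python. This is a rough sketch with my own parameter choices, not Kenneth’s actual script:

```python
import numpy as np

rng = np.random.default_rng(2)
n_yr, n_proxy, n_sims = 39, 25, 500  # years, proxies, simulations

composite_r = []
for _ in range(n_sims):
    sam = rng.standard_normal(n_yr)                 # white-noise "SAM index"
    proxies = rng.standard_normal((n_proxy, n_yr))  # white-noise "proxies"
    r = np.array([np.corrcoef(sam, p)[0, 1] for p in proxies])
    keep = np.argsort(-np.abs(r))[:5]               # screen: 5 best |r|
    # weight the survivors by their (signed) calibration correlation
    comp = (r[keep, None] * proxies[keep]).sum(axis=0)
    composite_r.append(np.corrcoef(sam, comp)[0, 1])

print(f"mean screened-composite r: {np.mean(composite_r):.2f}")
```

On pure white noise the screened, correlation-weighted composite typically correlates with the noise target at around r = 0.5, in line with the 0.54 reported above.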

I am in the process of sending my R code, results and input data to SteveM.

SteveM, I plan to put the R code that I used to model, simulate and estimate r and p.values in good form and send it to you. I am entertaining grandkids this weekend so it may not be sent for a few days.

To complete my exercise here, I evaluated the standardized 25 proxy series and the SAM index series from 1957-1995 for autocorrelation and best ARMA fit. It turns out that the SAM series and some proxy series were essentially white noise, while some proxy series fit an ARMA(1,0) model with an ar1 coefficient varying from approximately -0.35 to +0.35. I did not model all 25 proxy series and judged that using a white noise model for the SAM and proxy simulations, with standard deviations of 0.60 and 0.85 respectively, would give a conservative estimate of the 25 correlations between the SAM simulation and the 25 proxy simulations.

The 25 correlations and p.values are given in the table below. Taking the 5 proxies with the best (lowest) p.value scores and then weighting those proxies by the correlation values as indicated by Abram, I obtained r=0.66 and p.value=0.0000054. Those values, obtained using white noise simulations, are very close to what I obtained using the proxy and SAM data from the Abram paper and what the authors indicated was their method.

While I could do several more of these simulations, and might in the future, in conclusion I agree with the comments made here by SteveM and others that the Abram paper looks like an exercise in manipulating white noise series – or at least is indistinguishable from one.

Correlation p.value
[1,] -0.093584817 0.570939700
[2,] -0.091791102 0.578374741
[3,] -0.163639126 0.319538009
[4,] -0.374221249 0.018926836
[5,] -0.017344765 0.916533230
[6,] -0.302895697 0.060883738
[7,] 0.005644802 0.972793366
[8,] 0.371790445 0.019778471
[9,] 0.089741850 0.586921996
[10,] 0.153393991 0.351175438
[11,] -0.130053958 0.430040770
[12,] -0.158393918 0.335505034
[13,] -0.255354867 0.116660907
[14,] 0.010473145 0.949544771
[15,] -0.031280840 0.850062143
[16,] -0.022911088 0.889890658
[17,] 0.016384593 0.921138908
[18,] -0.145965719 0.375262026
[19,] 0.181251567 0.269477959
[20,] 0.041380909 0.802492375
[21,] -0.048882315 0.767600852
[22,] -0.490362748 0.001529344
[23,] -0.015759319 0.924139547
[24,] 0.076486140 0.643512034
[25,] -0.145018654 0.378401661

Kenneth: looks like an excellent analysis in true CA spirit. Can you post up or email your scripts so that I can verify? Better that you only report 2-3 digits.

I took all the proxy data and the SAM index used by Abram and made correlation estimates with p.values for all proxies for the period 1957-1995. While the values I obtained were not exactly those of the authors the results were reasonably close. I then did a composite-plus-scale to combine all the proxy data and calculated a correlation between the proxy composite and SAM for the period 1957-1995 and obtained a correlation of 0.15 and a p.value of 0.44.

Evidently the authors did some other calculation. They talk about weighting and selecting proxies by correlation of the individual proxy response to the SAM index. I might try that next – even though this method makes no sense to me on a couple levels.

I did not use na.rm=TRUE (to remove NAs from a couple of proxies for the later years in the series) when I calculated the row means for the composite of the proxies, and thus the correlation with SAM and the p.value that I reported previously are only for the period 1957-1983. For the period 1957-1995 the correlation is 0.04 and the p.value is 0.82. The conclusion is even more emphatic when using the entire calibration period, i.e. one has to do more to the composite of the proxies to get the higher correlations and lower p.values obtained by the authors.

Steve: in my article above, I was focusing on the tree ring network. Overall, the average correlation to SAM among the 25 Abram proxies is an even more Mannian (supermannian ?) 0.00616 – remarkably similar to the verification r2 of his AD1600 MBH reconstruction.

I calculated a correlation of SAM versus the selected and weighted proxy composite as described in the Abram paper. I used the 5 proxies with p.values equal to or less than 0.10 and weighted the proxies by the correlations found in Abram. (My own calculations only yielded 2 proxies with correlation p.values equal to or less than 0.10.) I used the sign of the correlation in weighting, i.e. a negative correlation requires a negative weight. Under those conditions I obtained correlation and p.values close to what the authors reported for the SAM to proxy composite: r=0.63 and p.value=0.00001, respectively.

I also calculated SAM to composite correlation and p.value for the period 1957-1995 by a method that makes sense to me. I did a composite on the SA tree ring and Antarctica Peninsula ice core and a composite on the Antarctica interior ice cores and then after differencing those 2 series I did the correlation with that differenced series and the SAM index series. For that calculation I obtained r=0.21 and p.value=0.20.

It is therefore evident that for Abram to obtain significant SAM to proxy correlations requires selecting 5 of 25 proxies and then in turn weighting those proxies by correlation. After weighting, Abram used, in effect, about 2.5 proxies out of 25.

I now believe I have the information required to do simulations with white and red noise proxies.

Kenneth, re question 4 on Table 1 above: flip Longitude and Latitude. Negative latitude values are south of the equator; negative longitude values are west of Greenwich, UK. (Funny to walk through Greenwich and find 3 clocks with different times!)

42 deg S latitude and 69 deg W longitude is Patagonia.

>However, their network of 14

This ‘however’ is unnecessary. I was suspicious as soon as I saw ‘temperature-sensitive’.

Slightly off topic – Steve, while your expertise resides with math and statistics, can you address the divergence issue in Briffa’s proxies? Granted, there are numerous factors that affect tree growth rates: temperature, precipitation, sunlight, nutrition, etc.

As a general rule, trees grow slowest at colder temps and grow faster as temps rise, up until the point at which the optimum temp is reached. As temps continue to rise, the growth rate slows until it becomes too warm for growth – similar to a bell curve. The question is whether this issue has been addressed with Briffa’s proxies and, if so, what kind of response resulted. Secondly, has the issue been addressed as a possible reason for the ability and/or inability of the proxies to pick up warmer temps in the 1,000-year temp reconstructions?

Thanks. (I concur with your prior comments that the reconstructions provide little scientific insight.)

Did you try searching the blog?

thanks mn – found several good references

http://climateaudit.org/2008/11/30/criag-loehle-on-the-divergence-problem/

This is so sad: that PhD scientists would countenance such statistical malpractice, when everyone with any sense knows it is wrong, and when doing it right is so easy. If you think tree rings in the Upper Slobovian Plateau are a proxy for temperature, randomly subdivide them into two groups. Use the first group for hypothesis generation, compute a coefficient and weight from it, and then use the other group for validation.

This is a recapitulation of a delusional and self-serving period in the oncology literature, where cancer drugs were tested in large numbers of patients, and then analysis was done only on those (usually a minority) with “chemo-sensitive” disease. If it is wrong when a radiologist measures a tumor, it is wrong when a dendroclimatologist measures a tree ring.

Does every field have to learn this lesson individually?
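The split-sample protocol suggested above can be sketched in a few lines of Python (synthetic data and a hypothetical helper name; not anyone’s published method):

```python
import numpy as np

def split_sample_check(proxies, target, rng):
    """Screen/weight proxies on a random half of the record, then test
    the resulting composite on the held-out half."""
    n = len(target)
    idx = rng.permutation(n)
    cal, val = idx[: n // 2], idx[n // 2:]
    # hypothesis generation: correlation weights from the calibration half only
    w = np.array([np.corrcoef(p[cal], target[cal])[0, 1] for p in proxies])
    comp = (w[:, None] * proxies).sum(axis=0)
    # validation: does the composite still correlate on unseen data?
    return np.corrcoef(comp[val], target[val])[0, 1]

rng = np.random.default_rng(3)
# with pure-noise "proxies" the validation correlation hovers near zero,
# unlike the inflated calibration-period fit
r_val = split_sample_check(rng.standard_normal((25, 40)),
                           rng.standard_normal(40), rng)
print(round(r_val, 2))
```

The point of the design is exactly the one made above: weights chosen on the calibration half carry no guarantee into the validation half unless a real signal is present.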

The use of advanced statistics by some climate scientists is like the weekend golfer trying to hit a 2-iron: just pull out a 5-iron and get it close. The authors could do a better job with fewer proxies and simpler techniques.

Again I am embarrassed by this poor science coming from fellow Australians. There is enough evidence by now that there are educational failings in parts of Australia’s tertiary and related institutions receiving government grants for research. These problem types were not apparent in my career years. They seem to have become common in the last 2 decades and are often in association with the global warming theme.

So can I please ask again for readers not to assume that all Australian science is poor. I can’t produce hard evidence in support, but I can note for example that there is discussion about world class research on medical/immunology themes at present.

What can be done? Not much from where I sit, years into retirement and absent a once-productive network. More critical evaluation of grant applications would seem to be one way. Most of the relevant grants are from government. We have a reforming Prime Minister and a Minister for Environment who is being manhandled by a bureaucracy that has been overfunded for a decade. Here is a letter from a top departmental advisor, a person whose advice the Minister takes – apparently knowing no better. (I have kept it anonymous because I have not asked the parties for permission to reproduce it.) http://www.geoffstuff.com/EnvirLetter5May_2014.docx

This letter is about as appalling as the statistical errors in Steve’s thread. There needs to be more penetration of the message that the science is poor, into the Ministry and the Departments. Has anyone found ways beyond those on useful blogs to get the message about poor science to those who make policy and can control funding?

Re: Geoff Sherrington (Jun 17 03:17),

Guys, Geoff’s first link is just a Word doc. Right-click and choose save as xxx.docx, or use I.E. The second is password protected.

Thank you, Charles. Medical distractions happening here. Apologies.

Geoff.

Nobody should broad-brush a whole country’s scientific community because of the failings of a few, especially if they are concentrated in one or two topics. Just off the top of my head, the high-precision lasers used in the international attempt to detect gravitational waves (LIGO) are largely developed in Perth.

Sorry, Geoff, but for me all your link provides is machine language

ianl8888

Apologies. Site maintenance issue. Try https://cpanel37.syra.net.au:2083/cpsess6836888530/frontend/crazycloud/filemanager/showfile.html?file=CaptureDepEnvir.JPG&fileop=&dir=%2Fhome3%2Fgeoffstu%2Fhome2%2Fgeoffstu%2Fpublic_html&dirop=&charset=&file_charset=&baseurl=&basedir=

No again – demands a passworded login

Oh well … :)

Geoff,

Don’t worry, Australia has too much competition for dopiest academics to just waltz off with the cup…..to give a little love to Aussie science I recently saw a presentation from these guys on how to handle nuclear waste:

http://www.austceram.com/Synroc-prgress.htm

They found certain mineral combinations that naturally contain radioactive materials safely and for extremely long time periods; figured out how to create these structures using hot isostatic pressing; and so encase the waste in these synthetic rocks. Extremely clever…

If the statistics work in this paper and many others is as bad as you say, there ought to be a doozy of a paper in it for you, Steve. Why not write it up? Journals love publishing the big noisy contrarian paper that makes waves: it’s what they live for.

Steve: the climate science community has an obligation not to use bad statistical methods – whether or not I choose to comment on it either here at Climate Audit or in academic journals. Quality control in the field should not be dependent on whether I write a “doozy of a paper” or not. I think that there are a number of topics that I’ve written about on the blog that would make interesting longer articles, but, for some reason, my submissions (or ones associated with me) have tended to provoke extreme animosity among reviewers. If I were younger or had more energy, I would spend more time on this, but due to limited time and energy, I haven’t spent as much time on academic journals as I would have liked. But, at the end of the day, it’s up to the publishing scientists to do a proper job in the first place. As to your use of the term “contrarian”, I’m not sure why you apply it to me: I’ve made technical statistical criticisms of specialist papers, criticisms that I believe to be well founded. My statistics and recommended protocols are completely conventional and even “consensus”. The authors who are flouting consensus statistical protocols are (too often) the climate scientists, though I view their statistical practice as incompetent and would not use the term “contrarian statistics” to describe their work.

don’t know if this article has been discussed here before…. mostly focused on biomedical research, but it takes note of some problems with weak statistics and lack of (incentives for) replication:

http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble

Ioannidis has now been appointed at Stanford U. with a new program to examine problems in scientific research. Might be good for those who discuss these issues at a high level to consider sending examples with analysis to this group at Stanford:

http://www.economist.com/news/science-and-technology/21598944-sloppy-researchers-beware-new-institute-has-you-its-sights-metaphysicians

interview with Ioannidis…. his article at PLoS Medicine has passed one million views! Perhaps a parallel article in PLoS One (their most general, multi-disciplinary journal) could achieve similar attention to failings of statistics in climate related research:

http://blogs.plos.org/speakingofmedicine/2014/06/23/one-one-million-article-views-qa-author-john-ioannidis/

“Journals love publishing the big noisy contrarian paper that makes waves…”

Oh yeah, that’s exactly what we’ve seen all these years, haven’t we?

Sven.

I detected sarcasm in claimsguy’s post.

Journals love “novel” results, new instruments/measurements and “worse than we thought” but are very conservative when it comes to showing that ulcers are caused by something other than stress or that the big results in climate change are horrifically badly done.

So for contrarian read competent. For consensus read clusterf.

I think you may be slightly out of your depth, claimsguy. This site focuses on the mathematics, and in particular the statistics, used by the climate science community. Outside of that community and its supporters there is no real notion of “consensus” or “contrarians”. In other sciences, papers are written to be discussed and, in 99% of cases, dismissed, or at least found to have flaws, by the scientific community in general. In climate science it appears that papers are put forward to prove the hypothesis, or to prove that Mann’s hockey stick paper is right – a paper that has been proved comprehensively flawed by Steve and others – which has seen some parts of the climate science community double down in trying to prove Mann was correct (he wasn’t and isn’t).

In this case the paper under discussion is using statistics, as most of them do, to develop a picture of past climate, and it is these statistics that are under discussion. What you’re witnessing is people who are expert in the statistical methods of extracting signals from noisy data discussing the methods used in the Abram paper; as they are all pretty hard-bitten, experienced men and women who’ve made, or are making, a living out of identifying the signal in the noise in various disciplines, they are passing on their knowledge to improve the science. That’s hardly contrarian, and indeed outside the realms of “consensus” science it would be welcomed rather than scorned.

In short, this isn’t about “global warming”; it’s simply about a paper that is seriously flawed in its statistical analysis, and not for the first time has been demonstrated to be so on this site. The Gergis et al 2013 paper was three years in production, was found to have serious problems with its statistical analysis within a few days of being reviewed on this site, and had to be withdrawn; that’s how good the consensus scientists are at using statistics.

Partial Least Squares (PLS) was notably used in Kinnard et al, Nature 24 Nov 2011, pp. 509-512, discussed on CA 12/3/11 “Kinnard and the DArrigo-Wilson Chronologies,” http://climateaudit.org/2011/12/03/kinnard-and-the-darrigo-wilson-chronologies , and 12/5/11, “The Kinnard Arctic O18 Series,” http://climateaudit.org/2011/12/05/kinnard-arctic-o18-series/ .

Although PLS is essentially data mining on steroids, there is actually a statistical literature on it, cited in the Kinnard et al. paper. Supposedly it is considered to be a reputable methodology in “chemometrics”, though it is not clear to me if this view is general or just restricted to a handful of promoters. There is a cross-validation “Q-squared” procedure that is supposed to compensate for the data mining aspect, but I am skeptical.

The issue of where valid empiricism ends and invalid data mining begins is an important one that permeates all applied statistics. Steve has done everyone a great service to keep raising this issue. At this point I don’t have the answers, but at least I’m aware of a lot more questions than I used to be.

Steve: I tend to use Partial Least Squares in the same sense as Borga and to consider one-factor PLS (i.e. correlation weighting). PLS with more factors becomes more complicated – in my older posts, I generally took care in noting the number of factors and ought to have done so in this post as well. In the proxy setup that I’m interested in, proxies by definition are supposed to be the signal plus low-order noise; thus one-factor methodology ought to suffice. My own perspective on these methods is strongly influenced by the highly geometric perspective of Borga and also Stone and Brooks, where the OLS coefficients are related to PLS coefficients through a path in coefficient space. From that perspective, the variously mind-numbing alternative multivariate methodologies merely appear as different paths in the coefficient space (netting out forests of matrix algebra).
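Steve’s identification of one-factor PLS with correlation weighting can be checked numerically: for standardized data, the first PLS weight vector X^T y is proportional to the vector of proxy-target correlations. A minimal Python sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 10

# standardized synthetic "proxies" and "target"
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)
y = rng.standard_normal(n)
y = (y - y.mean()) / y.std()

# the one-factor PLS weight vector is proportional to X^T y ...
w_pls = X.T @ y

# ... which, for standardized columns, is n times the vector of
# proxy-target correlations
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
print(np.allclose(w_pls / n, r))  # True
```

So screening-plus-correlation-weighting and one-factor PLS give the same composite up to a scale factor, which is why the same overfitting critique applies to both.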

Having been exposed to econometrics, it always amazes me how little justification is offered for ignoring the Gauss-Markov theorem. When the conditions of the G-M theorem don’t hold, there are specific techniques that have been developed to deal with each deviation.

All the bizarre off-brand multivariate techniques cooked up and institutionalized in stats packages may be great for exploratory data analyses but are just not legitimate for confirmatory purposes. They don’t necessarily even plim to the true coefficients as sample size goes to infinity. You have multicollinear data in a correctly specified model? Tough! Nature is not running the experiment you want. Come up with a better research design or different data. Using gobbledygook estimators that give a “significant” answer even though they don’t converge to the true value is not adding enlightenment.

Steve: the sort of matrix that you want for calibration-reconstruction is opposite to what you want in regression. Your assertion “You have multicollinear data in a correctly specified model? Tough!” is diametrically wrong for reconstruction. If you have a network of “proxies” which actually are “proxies” i.e. signal plus noise, then you will have very strong multicollinearity. I.e. simple averaging methods ought to work. OLS regression is the worst.
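Steve’s point – that for genuine signal-plus-noise proxies the collinearity is a feature, simple averaging works, and OLS is the worst – can be illustrated with a toy simulation (a Python sketch with my own parameter choices, not any paper’s method):

```python
import numpy as np

rng = np.random.default_rng(5)
n_cal, n_rec, p = 39, 200, 25

# proxies = common signal + heavy proxy-specific noise -> strongly collinear
signal = rng.standard_normal(n_cal + n_rec)
proxies = signal[:, None] + 2 * rng.standard_normal((n_cal + n_rec, p))

cal = slice(0, n_cal)      # short "instrumental" overlap used for calibration
rec = slice(n_cal, None)   # held-out "reconstruction" period

# simple average: the shared signal survives, the noise averages out
avg = proxies.mean(axis=1)

# OLS calibration on the short overlap
beta, *_ = np.linalg.lstsq(proxies[cal], signal[cal], rcond=None)
ols = proxies @ beta

for name, est in [("average", avg), ("OLS", ols)]:
    r_cal = np.corrcoef(est[cal], signal[cal])[0, 1]
    r_rec = np.corrcoef(est[rec], signal[rec])[0, 1]
    print(f"{name}: calibration r = {r_cal:.2f}, held-out r = {r_rec:.2f}")
```

The average needs no calibration at all and holds its skill out of sample, while the OLS fit buys its calibration-period correlation with 25 fitted coefficients that degrade outside the overlap.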

Not sure what you mean here–multicollinearity is only a “problem” in the first place if your goal is to separately identify a coefficient on each of the collinear independent variables, which clearly would not be the case in the situation you describe. If what you’re interested in is the collective implication of the whole set of proxies then there’s no need for multiple regression in the first place. You could just average up the proxies somehow and correlate those with temperature.

But my understanding of what some folks are doing is that they are trying to tease out the differential impact of different instrumental measurements on proxies, and some of these co-vary a lot so that their influence can’t be distinguished. In which case, there isn’t any statistical “trick” that can tease out information that isn’t actually in the observed pattern of independent variables. If, say, it’s always hot in your data when it’s rainy (and vice versa), then you can’t tell what happens when it’s hot and dry or rainy and cool.

There is no penalty to pay for data mining (data snooping is a better term for what is done in climate science) and even post-fact selection, as long as the experimenter can do a controlled experiment with new data. In the hard sciences, where controlled experiments can be performed conveniently in the laboratory or even in the field, there is no statistical penalty to pay.

It is when the experiments cannot be controlled and repeated that post fact selection becomes a statistical problem and that happens in climate science and economics, for that matter. It has been my experience that some hard scientists have a difficult time appreciating the problem with data snooping outside their field. It is probably because they can run controlled experiments in their field and are unaware of the statistical issues in soft science fields.

I once participated in a blog on investing strategies where a criterion was applied to buying and selling stocks based on past (in-sample) data. The criterion never made much sense from a financial or business perspective, but the in-sample results were, of course, always very high past returns, and the originator of the strategy could, after the fact, come up with some rather amazing financial rationale for the criterion used. Failed strategies from in-sample data would of course never see the light of day, and thus one did not even have an accounting of how many different models were analyzed before making the final selection. What I found was that any number of hard scientists with otherwise apparent high intelligence never “got” the problem with these strategies. I was there not to invest but to point out the flaws. It did not help that the time period was when momentum stocks were in their glory. I never went back to gloat when the market crashed.

Steve: the lessons of financial models have very much influenced my thinking on ex post multivariate methods. As you observe, there is a remarkable obtuseness to the problems among many otherwise intelligent people. I’ve discussed this for years with Rob Wilson, who, for example, remains unconvinced that there is any problem with ex post screening and ex post correlation weighting.

Wouldn’t a topic of this importance be subject to rigorous mathematical investigation rather than being a matter of opinion?

You don’t need math to understand that rejected data can be used to refute a study’s conclusions.

Steve’s reply to Hu has been repeated as a theme here in multiple forms and is very helpful in understanding what most proxy papers are really made of. Plenty of others here have backgrounds in MV methods that I didn’t. So the same lesson, said many ways here, has helped me form a sort of intuitive feel for a wide variety of math problems that a BS in engineering is not typically exposed to. Learning those lessons in particular has been quite entertaining for me over the years.

Steve: my initial exposure to linear algebra was very pure math and barely mentioned matrices. It focused on eigenvalues not as singular value decomposition but as solutions to Av = lambda*v, and it was all about dual spaces. It was very “geometric”. I like thinking about coefficients as a vector in dual space. I also really like the Stone-Brooks idea of ridge regression providing a path in dual space. The same concept applies when one regularizes solutions to (X^T X)^{-1} by using 1 to n principal component/eigenvector pairs. This also gives a path (of line segments) in coefficient space. At the end of the day, multivariate methods are going to yield a point, or perhaps a one-parameter path, in coefficient space. It is a very unifying concept. It’s not particularly complicated, though it doesn’t seem to be an approach familiar to people whose interests are more applied. My math is just strong enough to have a picture of this, but I’m at full stretch.
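The picture of a one-parameter path in coefficient space can be sketched numerically (illustrative data only): the ridge solutions trace a continuous curve as the penalty varies, while truncated principal-component regression gives a sequence of points (the vertices of a segmented path), and both arrive at the full least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 1.0]) + rng.normal(size=n)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Ridge: a continuous path in coefficient space as lambda decreases.
ridge_path = np.array([
    Vt.T @ ((s / (s**2 + lam)) * (U.T @ y)) for lam in np.logspace(3, -3, 25)
])

# Truncated PC regression: discrete points for k = 1..p retained components,
# joined by line segments in coefficient space.
pc_path = np.array([
    Vt[:k].T @ ((U[:, :k].T @ y) / s[:k]) for k in range(1, p + 1)
])

# Both paths terminate at the ordinary least-squares solution.
print(ridge_path[-1])
print(pc_path[-1])
```

Each row of either array is one point in the 5-dimensional coefficient space, which is exactly the "path" picture described above.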

I’ve noticed a recent trend in the postings of the cultists and their useful trolls at Climate Etc and WUWT: when McIntyre is mentioned, it is to state that he has (insert variation of “gone around the bend”).

It’s no wonder when you relentlessly point out their incompetence/transgressions.

Please continue the great work!

I just stumbled on this article dealing with statistical errors in medical research. Some people more skilled in statistics than me may find it useful. http://www.isdbweb.org/documents/file/855_11.pdf

JD

Hmm, some confirmation of things that I was introduced to back in the 1980s when I was working in a pathology lab. The more things change, the more they stay the same. One would think that people would study history, or extend their research back far enough to see what had been said at least a generation before them.

This is just people not understanding what they are doing, and doing it in a computer package. Knowing how to do an analysis is not the same as understanding what is being done.

There may be some truth to that. Among the things driving this is, quite possibly, dependence on computers, and the related loss of mathematical intuition. Hamming must be rolling in his grave over the way so many use computers as a substitute for understanding and intuition rather than an enhancement.

Good work Steve

It’s interesting that, despite Gergis et al having been found wanting, state-sponsored “scientists” would return unbowed with further claims. It seems to me that there is a complete lack of internal control at a number of Australian institutions when it comes to releasing AGW material. Is it because:

1) Insufficient diversity amongst the pool

2) Poor internal controls and no external/higher review

3) They really believe and therefore subjectively discriminate the outcomes

4) The institutions are adhering to the IPCC line and the pool follows the party line

5) A function of the long term decline in mathematics education in Australia

Whatever it is, each time an event such as this occurs, it corrodes the reputation and legitimacy of the institution which from a real science perspective is a detrimental outcome. It certainly suggests that there should be an independent gateway such as an Office of Statistical Review that all such material should pass through before being forced upon us by “our” ABC.

The problems discussed in this thread are not an Australian problem, but rather a climate science problem. I would venture to guess that these problems exist to some extent in other areas of the softer sciences. Hard sciences would not be affected by these problems – at least where controlled experiments are available.

I’ll agree with your statement. Perhaps I’d change “softer sciences” to “slower sciences”.

Steve — I finally read the thing — You didn’t tell me that it was a CPS answer sucker.

“The normalized proxy records were then combined with a weighting based on their correlation” – nice stuff.

I wonder why they didn’t choose an obscure MV method that leaves the authors and reviewers with a defensive layer of pure confusium to hide behind when arguing its veracity. It leaves me wondering if mayhaps they don’t have even that level of math chops.

Steve: Although they say that their method was “CPS”, CPS denotes to me a method with simple averages or averages weighted by area in some way – not one with ex post correlation weighting. So I don’t think that it was CPS. They do mention alternative variants (unarchived) that do use “CPS”. I haven’t parsed these yet. One of the variants includes yet another screening step, so it’s not even network CPS.

So in some real sense they get the hockey stick blade for free, by simply cherry-picking and up-weighting the records that match, and they get the straight shaft for free as well, since the noisy data average each other out to no signal at all in the historical period where no selective weighting is applied. It’s such a statistical black box, even when data and software *are* released, that one can barely argue against it except as a voice of authority, minus the academic standing to claim such authority. That’s why the simplicity of Marcott 2013 is so crucial: anybody looking at a simple plot of the input data, which lacks a blade, can fully understand that the blade was a pure artifact of simple data drop-off at the end due to re-dating of some of the input data. The published and widely promoted blade was so extreme that anybody can graphically comprehend that it’s just not to be found in the input data.
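The blade-for-free mechanism is easy to demonstrate with pure noise (a toy sketch, not the Abram et al method): weight white-noise "proxies" by their correlation with a rising instrumental trend, and the composite acquires spurious skill in the calibration window, while a plain unweighted average shows none.

```python
import numpy as np

rng = np.random.default_rng(7)
n_proxies, n_years, n_cal = 100, 1000, 100
proxies = rng.normal(size=(n_proxies, n_years))  # pure white noise, no climate signal

calib = np.linspace(0.0, 1.0, n_cal)             # rising "instrumental" target
cors = np.array([np.corrcoef(p[-n_cal:], calib)[0, 1] for p in proxies])

weighted = cors @ proxies / np.abs(cors).sum()   # ex post correlation weighting
plain = proxies.mean(axis=0)                     # simple unweighted average

# The weighted composite tracks the trend in the calibration window even
# though no signal exists; the plain average does not. Both are flat noise
# in the pre-calibration "shaft".
print(np.corrcoef(weighted[-n_cal:], calib)[0, 1])
print(np.corrcoef(plain[-n_cal:], calib)[0, 1])
```

Nothing here depends on the proxies containing any climate information at all, which is exactly the problem with ex post correlation weighting.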

Nic,

No hockey sticks here, although the logic in the paper is very diffuse and hard to follow. The one-sentence summary would actually be “we have finally nailed down the natural climate cycles of the SH over the last millennium, and we can clearly show koyaanisqatsi caused by greenhouse gases in the 20th century.” This is the part they hope to exploit going forward:

“If the ENSO-SAM relationship that exists on interannual timescales also influences the mean state of these climate modes, then the maximum in Niño3.4 SSTs during the fifteenth century (26) could have contributed to the SAM minimum at this time (Fig. 4b). The positive trend in Antarctic Peninsula temperature and the SAM during the sixteenth to eighteenth centuries, and the reversal of these changes during the nineteenth century, also closely mirror changes in mean Niño3.4 SST. However, tropical Pacific climate seems to become a secondary influence on SAM trends during the twentieth century, when the positive trend in Niño3.4 SST (ref. 26) would be expected to have imposed a negative forcing on the mean state of SAM in the Drake Passage sector. A recent study highlighted that the positive trend in summer SAM during the twentieth century has emerged above the opposing interannual forcing by ENSO (ref. 27). Our findings extend this perspective over the past millennium and suggest that tropical Pacific SST trends could have acted in a way that has muted the impact of increasing greenhouse gases and ozone depletion on SAM during the twentieth century.”

The proverbial pea is being shifted around on many levels. Note that reference (26) links back to the Palmyra corals (which, suspiciously, were not used in this study – most likely so that they could supply “independent corroboration” of the ENSO/SAM link) and has Michael Mann as coauthor. The overarching goal is to carve down the significance of 20th Century Antarctic cooling to a size that can be drowned in a bathtub.

Am I alone in detesting language like “on interannual timescales” when “from year to year” would do?

Wide error bars and a bit of wiggle can’t hide the blade that breaks out to form a hockey stick indeed:

I appreciate the clarification about Antarctic cooling. This reminds me of Steig too, except now the Peninsula has been extended all the way up the coast of South America.

This article will become notorious for popularizing (and coining?) the word “unprecedentedness”. It doesn’t yet exist, so are you giving it any meaning different from “unique” or “first”?

In Climate ScienceTM, any novel finding (or “worse than we thought”) is described as unprecedented. Therefore “unprecedentedness” is a valid adjective. It is like an inverse barometer of time. The greater the degree of “unprecedentedness”, the shorter the period of history considered by the researcher(s).

Hector, another potential noun-form was proposed in a CA “poetry” contest in 2008:

http://climateaudit.org/2008/07/08/bull-dogs-have-little-dogs/ as follows:

Other offerings which I thought worthwhile:

And of Hansen’s twin bulldogs Tamino and Gavin:

The thread is worth re-reading.

Oops. Unprecedented is an adjective, unprecedentedness is a noun. Mea Culpa.

I sent this post to a friend of mine who is a statistics PhD and does very sophisticated modelling of equity/options/derivatives markets for a major bank. He has no view on global warming one way or the other.

Here’s his reply:

Thanks Sean – very clear & salient write-up. It’s the first paper I’ve read that delves into the underlying statistics that frame the debate.

So I replied:

Do you have any interest in reading the underlying paper and seeing if his analysis is correct?

And his reply:

Yes indeed, although from what he’s written I’m willing to bet that he’s right, because he’s pointed out some fairly common and widespread misspecifications of statistical models amongst non-statisticians.

I sent him the article and will try to get him to review it.

You don’t even need stats. Just looking at the individual proxies is enough to identify the problem. If only one of them, or none, shows a hockey stick beforehand, then combining them by whatever method shouldn’t produce one either. It’s just common sense!
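The commonsense point has a simple numerical converse (toy numbers only): when the proxies genuinely share a common signal, a plain unweighted average recovers it without any screening or correlation weighting, which is the benchmark against which the ex post methods should be judged.

```python
import numpy as np

rng = np.random.default_rng(3)
n_proxies, n_years = 50, 500
signal = np.sin(np.linspace(0, 4 * np.pi, n_years))   # a common "climate" signal
# Each proxy sees the signal buried under noise twice its amplitude.
proxies = signal + 2.0 * rng.normal(size=(n_proxies, n_years))

recovered = proxies.mean(axis=0)   # plain average: no screening, no weighting
print(np.corrcoef(recovered, signal)[0, 1])   # high: the signal emerges
```

If a real signal only appears after ex post selection or weighting, that is a warning sign, not evidence of skill.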

It is puzzling that there are statisticians, and others familiar with statistical techniques and methods, who coauthor climate science papers; it is thus difficult to believe that the problems with climate science shown in this thread result from a dearth of people in the field with statistical backgrounds.

Could it be that these people lack the courage to speak out or can it be that even statistically minded people can have a blind spot when it comes to these selection processes used in temperature and index reconstructions?

I agree. But when you talk to a so-called expert they have multiple reasons why “it’s more complicated than that” and therefore common sense effectively doesn’t apply. So having a real but disinterested expert validate common sense is always helpful.

Off topic, but this was interesting:

http://klimazwiebel.blogspot.com/2014/06/misrepresentation-of-bray-and-von.html?showComment=1403506008877#c3987302369502607948

“@ Mike R

I would be happy to send the raw data and the code book to anyone that makes a request. Simply send the request by email: dennis.bray@hzg.de. I don’t have time, however, to calculate crosstabs for the entire data set, not at this time, anyway.”

As I mentioned over there, I’m not competent to do it, but I would like to know if the same scientists who are skeptical on one point are skeptical on all, or if a much larger group of scientists are skeptical on at least one critical component of CAGW. Would anyone here like to look at the raw data?

Update from Dennis Bray over there:

“The full raw data set in Excel format with code book is now available at https://www.academia.edu/7454421/CliSci_2013_Data_Set_Excel_Format_with_code_book. The report of descriptive statistics can be found at

https://www.academia.edu/5211187/A_survey_of_the_perceptions_of_climate_scientists_2013

It would be appreciated if any user would share their findings on this blog.”

SteveM, if you are attending your blog, I have a post that is in moderation – I think because it was posted immediately after my previous post. It certainly is not controversial or rude, nor does it use any forbidden words.

## 2 Trackbacks

[…] http://climateaudit.org/2014/06/15/abram-et-al-2014-and-the-southern-annual-mode/ […]

[…] Again I am embarrassed by this poor science coming from fellow Australians. There is enough evidence by now that there are educational failings in parts of Australia’s tertiary and related institutions receiving government grants for research. These problem types were not apparent in my career years. They seem to have become common in the last 2 decades and are often in association with the global warming theme. So can I please ask again for readers not to assume that all Australian science is poor. I can’t produce hard evidence in support, but I can note for example that there is discussion about world class research on medical/immunology themes at present. (Geoff Sherrington) […]