Industrial Strength Voodoo Correlations

This is a very pretty example, though the problem is endemic: Mann et al 2008 uses Mannomatic methods to produce industrial strength voodoo correlations.

This one came up in the course of trying to replicate Mannian confidence intervals from original data – an effort which promptly foundered. My CPS emulation, which after much effort finally worked on the 1850-1995 period, failed as soon as I tried the “late-miss” (1850-1949) and “early-miss” (1896-1995) calibrations. So it was back to step-by-step reconciliation, obviously a lot easier now that a lot of modules are working. UC had saved the Matlab intermediates, so we had something to work with.

Here’s what happened.

In making my “late-miss” emulation for AD1000, I got all the “right” proxies and the calculations all worked (with a little editing). But the orientation of the Socotra dO18 speleothem was inverted in the late-miss version (this is the speleothem series that we looked at in some detail a month ago, but that’s just a coincidence).

Huh?? Why would the orientation be right in one emulation and wrong in another emulation? The answer was very timely in terms of “voodoo correlations”.

The “low-frequency” correlation was 0.476 for the 1850-1995 period: a “significant” correlation. The “low-frequency” correlation for the subperiod 1850-1949 was also “significant” – but it had the opposite sign, with an absolute value of 0.580.

In my initial attempt to emulate Mann’s “late-miss” recon, I had oriented the series using the sign of the correlation for the 1850-1995 period. Silly me. It appears that Mann’s alignment is so opportunistic that the same series is used in different orientations depending on whether it is a “late miss” or “early miss” version. Industrial strength Mannomatic.

Ooga in:

ooga=load(strcat('cps_results/',upper('NH'),'_ea100',expe,'_newgrid_',iseries,'_gbeachyear.mat'));

Chaka out.

Ooga chaka.
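By way of illustration, here is a minimal R sketch of the mechanism – emphatically not Mann’s code, with made-up series and no attempt to replicate his smoothing or screening – showing how calibration-dependent orientation can flip a weakly correlated proxy:

# Minimal sketch (not Mann's code): a proxy is oriented by the sign of its
# correlation with the instrumental target over whatever calibration period
# is in use. Both series below are made-up stand-ins.
set.seed(123)
yr     <- 1850:1995
target <- cumsum(rnorm(length(yr)))                 # stand-in "instrumental" series
proxy  <- 0.1 * target + rnorm(length(yr), sd = 5)  # weakly related stand-in proxy

orient <- function(proxy, target, yr, calib) {
  idx <- yr >= calib[1] & yr <= calib[2]
  sign(cor(proxy[idx], target[idx]))                # ex post orientation: +1 or -1
}

orient(proxy, target, yr, c(1850, 1995))   # full calibration
orient(proxy, target, yr, c(1850, 1949))   # "late-miss" calibration
# Nothing ties the two together: a weakly correlated proxy can come out
# right-side-up under one calibration window and upside-down under the other.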

NOAA versus NASA: US Data

Anthony has a post reporting NOAA’s 2008 results, with NOAA reporting:

For 2008, the average temperature of 53.0 degrees F was 0.2 degree above the 20th Century average.

Anthony showed the following image from NOAA:

Readers need to keep in mind that there is a substantial “divergence” between NOAA US and NASA US temperatures, as shown in the graphic below. Since 1940, NOAA’s US series has risen relative to NASA’s at a rate of 0.39 deg C/century – about 0.27 deg C accumulated over the period.


Figure 2. Difference (deg C) between NOAA US and NASA US temperature anomalies.

At present, we don’t know very much about the NOAA calculation. To my knowledge, they make no effort to make a UHI adjustment along the lines of NASA GISS. As I’ve mentioned before, in my opinion, the moral of the surfacestations.org project in the US is mainly that it gives a relatively objective means of deciding between these two discrepant series. As others have observed, the drift in the GISS results looks like it’s going to be relatively small compared to results from CRN1-2 stations – a result that has caused some cackling in the blogosphere. IMO, such cackling is misplaced. The surfacestations results give an objective reason to view the NOAA result as biased. They also confirm that adjustments for UHI are required. Outside the US, the GISS meta-data on population and rural-ness is so screwed up and obsolete that their UHI “adjustment” is essentially random and its effectiveness in the ROW is very doubtful. Neither NOAA nor CRU even bothers with such adjustments, relying instead on various hokey “proofs” that UHI changes over the 20th century do not “matter”.
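For anyone who wants to check the drift figure, the calculation is nothing more than a trend through the NOAA-minus-GISS difference series. A sketch along the following lines would do it, assuming the two annual US anomaly series have already been downloaded and aligned (the function and variable names here are mine, not from either agency):

# Sketch of the drift calculation, assuming noaa and giss are annual US
# anomaly series (deg C) already aligned on the years in yr. The download
# step is omitted - the NOAA and GISS US files each need their own parsing.
drift <- function(noaa, giss, yr, start = 1940) {
  idx   <- yr >= start
  d     <- (noaa - giss)[idx]              # NOAA minus GISS difference series
  fit   <- lm(d ~ yr[idx])
  slope <- unname(coef(fit)[2])            # trend in deg C per year
  c(per_century = 100 * slope,             # ~0.39 deg C/century per the figures above
    since_start = slope * (max(yr) - start))  # ~0.27 deg C accumulated since 1940
}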

Voodoo Correlations and Correlation Picking

Ex post selection based on correlation has been a long-standing issue at this blog (and has been discussed at other blogs from time to time – Luboš, Jeff Id and David Stockwell have all written on it independently). The issue came back into focus with Mann 2008, in which there is industrial strength correlation picking. While the problem is readily understood (other than by IPCC scientists), it’s hard to find specific references. Even here, surprisingly, my mentions of this have mostly been in passing – in part, because I’d worked on this in pre-blog days. We mention the issue in our PNAS comment, using Stockwell (AIG News, 2006) as a reference, as I mentioned before.

“Spurious” correlations are old hat to anyone familiar with the stock market, whereas people coming from applied math and physics seem much quicker to reify correlations and much less wary of the possibility of self-deception.

Reader Jonathan brings to our attention an interesting new study entitled “Voodoo Correlations in Social Neuroscience”, which was discussed in Nature here.

The problem seems to be highly similar to ex post selection of proxies by correlation. Vul et al write (and touch on other issues familiar to CA readers):

The implausibly high correlations are all the more puzzling because social-neuroscience method sections rarely contain sufficient detail to ascertain how these correlations were obtained. We surveyed authors of 54 articles that reported findings of this kind to determine the details of their analyses. More than half acknowledged using a strategy that computes separate correlations for individual voxels, and reports means of just the subset of voxels exceeding chosen thresholds. We show how this non-independent analysis grossly inflates correlations, while yielding reassuring-looking scattergrams. This analysis technique was used to obtain the vast majority of the implausibly high correlations in our survey sample. In addition, we argue that other analysis problems likely created entirely spurious correlations in some cases. We outline how the data from these studies could be reanalyzed with unbiased methods to provide the field with accurate estimates of the correlations in question. We urge authors to perform such reanalyses and to correct the scientific record.

In their running text, they observe:

in half of the studies we surveyed, the reported correlation coefficients mean almost nothing, because they are systematically inflated by the biased analysis.

They illustrate “voodoo correlations” with one more example of spurious correlation (echoing our reconstruction of temperature with principal components of tech stock prices):

It may be easier to appreciate the gravity of the non-independence error by transposing it outside of neuroimaging. We (the authors of this paper) have identified a weather station whose temperature readings predict daily changes in the value of a specific set of stocks with a correlation of r=-0.87. For $50.00, we will provide the list of stocks to any interested reader. That way, you can buy the stocks every morning when the weather station posts a drop in temperature, and sell when the temperature goes up. Obviously, your potential profits here are enormous. But you may wonder: how did we find this correlation? The figure of -.87 was arrived at by separately computing the correlation between the readings of the weather station in Adak Island, Alaska, with each of the 3315 financial instruments available for the New York Stock Exchange (through the Mathematica function FinancialData) over the 10 days that the market was open between November 18th and December 3rd, 2008. We then averaged the correlation values of the stocks whose correlation exceeded a high threshold of our choosing, thus yielding the figure of -.87. Should you pay us for this investment strategy? Probably not: Of the 3,315 stocks assessed, some were sure to be correlated with the Adak Island temperature measurements simply by chance – and if we select just those (as our selection process would do), there was no doubt we would find a high average correlation. Thus, the final measure (the average correlation of a subset of stocks) was not independent of the selection criteria (how stocks were chosen): this, in essence, is the non-independence error. The fact that random noise in previous stock fluctuations aligned with the temperature readings is no reason to suspect that future fluctuations can be predicted by the same measure, and one would be wise to keep one’s money far away from us, or any other such investment advisor. [Footnote: See Taleb (2004) for a sustained and engaging argument that this error, in subtler and more disguised form, is actually a common one within the world of market trading and investment advising.]
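Their Adak Island example is easy to reproduce with nothing but random numbers. Here is a short R sketch along the same lines (made-up data throughout – no actual stock prices or temperatures involved):

# The Vul et al stock-picking illustration with pure noise: 3315 random
# "stocks" over 10 "days", one random "temperature" series; keep only the
# stocks whose correlation beats a threshold, then report the average
# correlation of the survivors.
set.seed(1)
ndays  <- 10
nstock <- 3315
temp   <- rnorm(ndays)                          # stand-in Adak Island temperatures
stocks <- matrix(rnorm(ndays * nstock), ndays)  # entirely random "stock" returns
r      <- cor(temp, stocks)[1, ]                # one correlation per stock
sum(r < -0.8)                                   # how many clear the threshold by luck
mean(r[r < -0.8])                               # the "impressive" reported average
# With only ten observations and thousands of candidates, a handful of
# pure-noise correlations will clear almost any threshold, and averaging the
# survivors guarantees a large figure - the non-independence error.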

Nature’s summary states:

They particularly criticize a ‘non-independence error’, in which bias is introduced by selecting data using a first statistical test and then applying a second non-independent statistical test to those data. This error, they say, arises from selecting small volumes of the brain, called voxels, on the basis of their high correlation with a psychological response, and then going on to report the magnitude of that correlation. “At present, all studies performed using these methods have large question marks over them,” they write.

The scientists under criticism say that the criticisms do not matter because they apply appropriate corrections:

Appropriate corrections ensure that the correlations between the selected voxels and psychological responses are likely to be real, and not noise,

Interestingly, these criticisms are said to have an “iconoclastic tone” and to have been widely covered in blogs, much to the annoyance of the scientists defending their correlations. Nature:

The iconoclastic tone has attracted coverage on many blogs, including that of Newsweek. Those attacked say they have not had the chance to argue their case in the normal academic channels. “I first heard about this when I got a call from a journalist,” comments neuroscientist Tania Singer of the University of Zurich, Switzerland, whose papers on empathy are listed as examples of bad analytical practice. “I was shocked — this is not the way that scientific discourse should take place.”

Hansen's Digits

Both Luboš and David Stockwell have drawn attention today to the distribution of digits in Hansen’s GISS, suggesting that the distribution is, to borrow an expression, a fingerprint of anthropogenic impact on the calculations.

I disagree with both Luboš and David and don’t see anything remarkable in the distribution of digits.

I don’t disagree with the distribution of digits reported by Luboš. I replicated his results as follows:

url="http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt"  # monthly GLB land-ocean
working=readLines(url)
temp=is.na(as.numeric(substr(working,1,4)));sum(temp)   # flag header/junk lines (coercion warnings are harmless)
working=working[!temp]                                  # keep only rows starting with a year
working=substr(working,1,65)                            # year plus 12 monthly columns
widths1=cbind( seq(0,60,5)+1,seq(5,65,5))               # fixed-width column positions
giss=array(NA,dim=c(length(working),13))
for(i in 1:13) giss[,i]= as.numeric(substr(working,widths1[i,1],widths1[i,2]))
giss=c(t(giss[,2:13]))                                  # drop the year column; string the months out in time order
x= (giss%%10)                                           # anomalies (in 0.01 deg C) modulo 10
(test=tapply(!is.na(x),x,sum) )                         # count of each digit 0-9
# 0 1 2 3 4 5 6 7 8 9
# 186 173 142 140 127 170 150 148 165 147

As a simple statistic to test the deviation from a uniform distribution, I did the following:

test=test/ (length(giss)/1000)   # rescale the counts to a notional series of 1000 values
K=length(test); mu=1000/K        # K=10 digits, so mu=100 expected per digit under uniformity
sum( (test/mu- 1)^2 ) # 0.1220880
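In other words, the statistic is S = Σ (n_d/μ − 1)² summed over the ten digits, where n_d is the count of digit d rescaled to a series of 1000 values and μ = 100 is the expected count if all ten digits were equally likely – a simple sum of squared relative deviations from uniformity.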

I did a Monte Carlo test of the distribution of this statistic in a similar way (it is different from Luboš’ test, and it is this difference that accounts for the difference in our results). I generated random sequences of length 1000 with sd 100 and rounded their absolute values, then calculated the digit distributions and the same statistic as above.

stat=rep(NA,1000)
for(i in 1:1000) {
x=rnorm(1000,sd=100)             # random series of length 1000, sd 100
y=round(abs(x))%%10              # last digit of the rounded absolute values
test=tapply(!is.na(y),y,sum)     # digit counts
stat[i]=sum((test/mu-1)^2)       # same statistic as above (mu=100 from the previous block); one such draw gave 0.1278
}

This generated the following distribution. GISS came in at about the 75th percentile – nothing to get excited about outside Team-World.

quantile(stat)
# 0% 25% 50% 75% 100%
#0.0058 0.0610 0.0877 0.1214 0.3416

The only people who could take a different view would be Mannians, who might say that this was “75% significant” (see Wahl and Ammann). But I discount that sort of Mannianism elsewhere and don’t support it against Hansen either.

Lucia on Model E's Viscous Dissipation

Lucia has an interesting post on how GISS Model E deals with heat from viscous dissipation here. This is an excellent and technical discussion of a specific modeling issue.

It is precisely the sort of discussion that I think is instructive and useful in this field: a specific issue about a specific model. If the models were properly documented (in an engineering sense), there would be no need to figure out (i.e., speculate about) how climate models do things, as it would all be set out in long, boring reference manuals, as is done in other fields. (Lucia mentions nuclear plants.) Climate science has, for the most part, eschewed reference manuals, preferring toy articles in “high-impact” journals. As a result, there is a niche for technical blogs: discussions such as Lucia’s go some way toward filling the gap left by the absence of the reference manuals that would exist in other fields.

People sometimes complain that I’m not “interested” in the physics of AGW – opining this because of the lack of coverage of physics topics here. It’s not that I’m uninterested in the “physics” – it’s that I’m uninterested in personal opinions and pet theories. I’m interested in technical discussion of the articles and models relied upon by IPCC. But 99.99% of the people who want to discuss the “physics” are not interested in discussing viscous heat dissipation in Model E; they want to discuss Miskolczi or Beck or Svensmark. I have no interest in such discussions.

Lucia’s topic is dry. It’s technical. I don’t personally understand the issues, but I like seeing people discuss them in the hope that maybe I will. It’s a “Climate Audit” sort of topic. At least there’s a fighting chance that people entering the discussion can end up finding some foothold of common understanding on a technical point, as opposed to merely venting opinions past one another.

More Changes at the Mann 2008 SI

Mannian confidence intervals have always been a mystery, with MBH99 confidence interval methodology in particular having defeated all reverse engineering (and engineering) efforts by UC, Jean S and myself to date (though we haven’t picked up this file in a while).

I was very interested to see how Mann 2008 calculated confidence intervals. You can’t really tell from the running text or the SI. And while, to his credit, Mann archived a lot of source code – a fairly brave undertaking, given that on a scale of 1 to needles-in-your-eyeballs the code rates about the same as GISTEMP – he didn’t archive the source code for the confidence interval calculations. So we had another mystery.

I emailed Gerry North, said to have reviewed Mann 2008, seeking an explanation, but he told me to “move on”. Not very helpful. (But again in fairness to Gerry North, I’ll bet that our PNAS comment was sent to him for screening and he’s the sort of person who would be fair.)

In our comment (submitted on Dec 8, 2008), we commented on the calculation of confidence intervals, noting that, contrary to assurances in Mann et al 2008, the source code did not contain their calculation of their Figure 3 confidence intervals.

Jean S just noticed that, on Dec 15, 2008, Mann added what appears to be the relevant source code as http://www.meteo.psu.edu/~mann/supplements/MultiproxyMeans07/code/codeveri/calc_error.m. Although their cover webpage reports some other changes, this change was not reported there.

UC and Jean S have taken a quick look at this code and we should have something to report in the next week or two.

One thing that seems “very likely” to me: I’ll bet that, in their Reply to our Comment, Mann will say that we were “wrong” in our statement that the archived source code did not contain this calculation because if you go to their website, you can see that the calculation is there [disregarding the fact that it was added after the fact].

Note: Compare to http://www.climateaudit.org/?p=4449 – see UC comment below.

Update: Their reply, needless to say, did not acknowledge that they placed the code online AFTER our comment:

The method of uncertainty estimation (use of calibration/validation residuals) is conventional (3, 4) and was described explicitly in ref. 2 (also in ref. 5), and Matlab code is available at http://www.meteo.psu.edu/~mann/supplements/MultiproxyMeans07/code/codeveri/calc_error.m.
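For readers wondering what the “conventional” use of calibration/validation residuals amounts to in practice, here is a generic R sketch of that style of calculation – emphatically not a claim about what calc_error.m actually does, which is precisely the question at issue:

# Generic sketch: uncertainty from verification residuals. The reconstruction
# is compared to the instrumental target over a withheld verification period
# and the spread of those residuals is carried back over the reconstruction.
ci_from_residuals <- function(recon, target, valid_idx, level = 0.95) {
  resid <- recon[valid_idx] - target[valid_idx]   # verification-period residuals
  half  <- qnorm(1 - (1 - level)/2) * sd(resid)   # ~1.96 * sd for a 95% interval
  data.frame(recon = recon, lower = recon - half, upper = recon + half)
}
# The auditing questions are which residuals get used (calibration or
# validation, and over which sub-period) and whether autocorrelation is
# accounted for.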

What's the red dot?

Note: I’m having trouble publishing new posts at WUWT, so since it has been awhile since I posted at CA, I thought I’d share this puzzle with CA readers while I wait for the issue to be resolved. – Anthony

A simple question: what is that red dot on the map? I was looking at the CONUS map browser depicting the 2008 temperature departure from normal, provided by NOAA’s High Plains Regional Climate Center, and noticed something odd:

[HPRCC map: last 12 months temperature departure from normal, shaded CONUS]

Note the red dot in Arizona, which is the only one in the USA. Truly an anomaly. At first I thought it might be University of Arizona Tucson and its famous parking lot station, but that is further southeast.

The other map depiction HPRCC offers also shows it, and narrows it to a single data point.

Weblog Awards 2008

Most of you are aware of the voting right now for the 2008 Weblog Awards. Anthony is winning handily; Climate Audit is also doing well, running a strong third. (2008 Logo on right leads to vote.) Anthony is a friend of mine and I’m very pleased on his behalf, though, truth be told, I’d be just as happy if Climate Audit won again. (Hey, I’m a squash player and, if a ball is thrown up, I like to compete.) So I appreciate the votes for Climate Audit; but, if you’re a reader here and have voted for Anthony, that’s OK. Hey, if you’re a reader here and voted for realclimate or one of the other blogs, that’s OK too.

I’ve voted in a few other categories for blogs that have spoken kindly of Climate Audit in the past: Luboš (Reference Frame) in Best European Blog; Jennifer Marohasy in Best Online Community and Kate (Small Dead Animals) in Best Conservative Blog. Luboš is winning easily; Jennifer has an uphill fight while Kate is in a pretty close battle. Occasionally speaking kindly of Climate Audit is not necessarily the most meaningful metric of the quality of these blogs relative to their competition, other than perhaps indicating discernment on their part. 🙂

When voting started, the antagonism towards Anthony and me at realclimate, climateprogress and pharyngula was palpable.

RC started their post on the Weblog Awards by observing:

Science … is not generally marked by … the persistent cherry-picking of datasets to bolster pre-existing opinions.

This is a view that I definitely share, but it seems odd coming from the Bristlecone Masters. They went on to say:

[science is] not generally marked by … accusations of bad faith, fraud and conspiracy…

These are views that I share, and policies at this blog prohibit such accusations. I ask readers not to use such language and, for the most part, this has resulted in more moderate language here than in many online blogs. I moderate after the fact (I’m not online 24/7) and delete or snip such comments when I see them. I’m gratified when this effort at ensuring civility in expression is recognized, as it was by a recent commenter at Tom Yulsman’s who said:

Any fair reading of McIntyre’s website gives the overwhelming impression of someone who bends over backward not to engage in ad hominem attacks, to the point of cutting off commenters to his blog who move in that direction.

Quite so. I’ve drawn a firmer line on this as time has passed. I appreciate it when people draw my attention to comments that breach blog policies so that I can deal with them.

I often wonder whether there are any mirrors in Team-World. As RC says, science is not generally marked by “accusations of bad faith, fraud and conspiracy”. Yet it was a realclimate coauthor who made the following accusation here:

This claim by MM is just another in  a series of disingenuous (off the record: plainly dishonest) allegations by them about our work.

Or Michael Tobis at Tamino here (in one of countless quotes from Tamino’s site):

There is a strong case that the game McIntyre et al is playing is not honest.

In the very thread in which realclimate authors uttered the above pious thoughts, NASA employee Gavin Schmidt or one of his associates approved the following comment for publication:

McFraudit and Watts-up-my-A** provide a very useful service of giving the tin-hat crowd the illusion of doing science.

That RC do not disassociate themselves from Hansen’s comments on coal trains, crematoria or the prosecution of business leaders speaks for itself.

RC goes on to opine piously:

Science blogging can play a role in improving science … but the kind of vituperative tone that dominates some blogs greatly diminishes any positive contribution they might make.

Quite so. But they don’t seem to see any problem with PZ Myers at Pharyngula calling me an “undeserving mouthpiece for right-wing hackery” or with Myers’ subsequent rant:

And then, of course, what’s bringing you and your fellow naive whiners here is the need to defend the climate change denialist, McIntyre — so many of you, after carping that I’m not meeting your demands, are protesting that he’s not a denialist, and you aren’t denialists, and you’re all here in the cause of good science.

Bullshit.

My expertise is not in climate, but in biology, and I’m familiar with his type — it’s a common strategy among creationists, who do dearly love to collect complaints. There are people who put together a coherent picture of a scientific issue, who review lots of evidence and assemble a rational synthesis. They’re called scientists. Then there are the myopic little nitpickers, people who scurry about seeking little bits of garbage in the fabric of science (and of course, there are such flaws everywhere), and when they find some scrap of rot, they squeak triumphantly and hold it high and declare that the science everywhere is similarly corrupt. They lack perspective. They ignore everything that doesn’t fit their search criterion, and of course, they’re focused only on putrescence. They aren’t scientists, they’re more like rats.

And the worst of the rats are the sanctimonious ones that declare that they’re just ‘policing’ science. They aren’t. They’re just providing fodder for their fellow denialists, and like them all, have nothing of value to contribute to advance the conversation. You can quit whining that you and McIntyre are finding valid errors; it doesn’t matter, since you’re simultaneously spreading a plague of lies and ignorance as you go.

So bugger off, denialists. I am not impressed.

Tom Yulsman: The Gadfly and the Dim-Witted Horse

I had a pleasant interview yesterday afternoon with Tom Yulsman of the Center for Environmental Journalism in Colorado. He also posted an article yesterday reporting on an interview with Roger Pielke Jr in which Yulsman described me as a “gadfly”. I don’t know whether this was posted before or after our interview; he didn’t mention it.

My first reaction was that this was a “pejorative” term but, with a little research, it turned out that Wikipedia cites Socrates as a “gadfly” who was a goad to a “slow and dimwitted horse”.

“Gadfly” is a term for people who upset the status quo by posing upsetting or novel questions, or just being an irritant.

The term “gadfly” (Gk. muopa)[1] was used by Plato in the Apology[2] to describe Socrates’ relationship of uncomfortable goad to the Athenian political scene, which he compared to a slow and dimwitted horse … During his defense when on trial for his life, Socrates, according to Plato’s writings, pointed out that dissent, like the tiny (relative to the size of a horse) gadfly, was easy to swat, but the cost to society of silencing individuals who were irritating could be very high. “If you kill a man like me, you will injure yourselves more than you will injure me,” because his role was that of a gadfly, “to sting people and whip them into a fury, all in the service of truth.”

In modern and local politics, gadfly is a term used to describe someone who persistently challenges people in positions of power, the status quo or a popular position.[3] The word may be uttered in a pejorative sense, while at the same time be accepted as a description of honorable work or civic duty.[4]

In the article itself, Yulsman quotes Pielke Jr making points about the limitations of peer review that are very familiar to Climate Audit readers, and, in the process, praising the blogs for peer review services:

Q: Have we put too much faith in the peer review system? And should we seek sources outside the usual scientific circles?

A: Peer review is simply a cursory check on the plausibility of a study. It is not a rigorous replication and it is certainly not a stamp of correctness of results. Many studies get far more rigorous peer review on blogs after publication than in journals. I use our own blog for the purpose of getting good review before publication for some of my work now, because the review on blogs is often far better and more rigorous than from journals. This is not an indictment of peer review or journals, just an open-eyed recognition of the realities.
It is hard to say who is outside and who is inside scientific circles anymore.  McIntyre now publishes regularly in the peer reviewed literature.  [Pielke is speaking of Steve McIntyre, whom I would describe as a climate change gadfly; he publishes a blog called “Climate Audit”] Gavin Schmidt blogs and participates in political debates.  [Schmidt is a NASA earth scientist who conducts climate research.] Lucia Liljegren works at Argonne National Lab as an expert in fluid dynamics and blogs quite well on climate predictions for fun. She is preparing a paper for publication based on her work, but she has never done climate work before.  I am a political scientist who publishes in the Journal of Climate and Nature Geoscience and blogs. Who is to say who is ‘outside’ and who is ‘inside’?  Is participation in IPCC the union card?  How about having a PhD?  Publishing in the literature?  Testifying before Congress? 

The Wikipedia article refers to a BBC piece on gadflies:

The term ‘gadfly’ is usually pejorative and is often bestowed by organizations or persons who are on the receiving end of the gadfly’s attentions. It implies that the gadfly is an intellectual lightweight whose only intent is to annoy, thereby gaining attention for himself…Being a gadfly is generally a thankless task. People with something to hide will go to almost any length to discredit one who brings their behaviour to light.

Back to the “slow and dim witted horse”. I’m a little surprised that Yulsman likens the climate science community to a “slow and dim witted horse”, but who are we mere gadflies to argue? If so, I think that we can safely say that certain body parts are already spoken for.

R – the choice for serious analysis

While Steve is a little “under the weather” (it must be all the snow that Al Gore sent him), I thought I’d mention an interesting article in the New York Times which sings the praises of the programming language R.

R is similar to other programming languages, like C, Java and Perl, in that it helps people perform a wide variety of computing tasks by giving them access to various commands. For statisticians, however, R is particularly useful because it contains a number of built-in mechanisms for organizing data, running calculations on the information and creating graphical representations of data sets.

Some people familiar with R describe it as a supercharged version of Microsoft’s Excel spreadsheet software that can help illuminate data trends more clearly than is possible by entering information into rows and columns.

What makes R so useful — and helps explain its quick acceptance — is that statisticians, engineers and scientists can improve the software’s code or write variations for specific tasks. Packages written for R add advanced algorithms, colored and textured graphs and mining techniques to dig deeper into databases.

So there you have it: R is the future.

Give up the Fortran, Mike and the Hockey Team, and join the 21st Century. Uncle Steve does know best.

R can be downloaded at http://www.r-project.org/ and is available for Windows, Linux and Mac OS X. And it’s free.
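For anyone who hasn’t tried it, a few lines are enough to get the flavour (a toy series, nothing to do with any real dataset):

# Toy example: make a noisy trending series, fit a linear trend, and plot it.
year <- 1880:2008
anom <- 0.005 * (year - 1880) + rnorm(length(year), sd = 0.15)
fit  <- lm(anom ~ year)
summary(fit)$coefficients              # slope, standard error, t-value, p-value
plot(year, anom, type = "l", xlab = "", ylab = "anomaly (deg C)")
abline(fit, col = "red", lwd = 2)      # fitted linear trend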