A new and interesting paper by Carl Wunsch is online here (thanks again to Eduardo Zorita for the reference and link). The abstract says:

The human eye and brain are powerful pattern detection instruments. Coupled with the clear human need to perceive the world as deterministic and understandable, and the often counter-intuitive results of probability theory, it is easy to go astray in making inferences. In particular, many examples exist where attention was called to apparent extreme behavior, whether in time or space series, or in the appearance of unusual patterns, that are just happenstance.

Wunsch quotes the following “verse” as his text:

My eye is better than any statistical test.

Well-known paleoceanographer, circa 2001.

To which anyone who attended the NAS Panel presentations in 2006 cannot help but add:

“I am not a statistician”

Michael Mann to NAS Panel, March 2006

There’s an interesting tie-in to AR4 in one of the citations in this article. Wunsch quotes an example from Wunsch (1999), which includes a demolition of the statistics in Trenberth and Hurrell, 1997. Ironically, in response to Ross McKitrick’s criticism of the significance testing for trends in chapter 3 of AR4, IPCC reviewers (also proving that they are not statisticians) invoked an *unprecedented* use of the Durbin-Watson test, a usage unknown in the statistical literature off the Island. (The Durbin-Watson test itself is fine; it’s just that the IPCC usage was nonsensical. See posts last summer on this topic.) As purported justification for this, they cited the Trenberth and Hurrell reply to Wunsch (1999), which Wunsch’s response rebutted.
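For readers who have not met it: the Durbin-Watson statistic is computed from regression residuals e_t as d = Σ(e_t − e_{t−1})² / Σe_t². A value near 2 indicates little lag-one autocorrelation in the residuals; values well below 2 indicate positive autocorrelation. The sketch below is a minimal illustration, not the IPCC's or anyone else's code: it fits an ordinary least-squares trend and computes d from the residuals. The function names and the synthetic AR(1) test series are hypothetical.

```python
import random


def ols_trend_residuals(y):
    """Fit y = a + b*t by ordinary least squares and return the residuals."""
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return [yi - (a + b * ti) for ti, yi in zip(t, y)]


def durbin_watson(resid):
    """Durbin-Watson d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def ar1_series(n, phi, seed=0):
    """Hypothetical AR(1) noise x_t = phi*x_{t-1} + e_t, for illustration only."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    white = ar1_series(500, 0.0)
    red = ar1_series(500, 0.8)
    print("white noise:   d =", round(durbin_watson(ols_trend_residuals(white)), 2))
    print("AR(1), phi=0.8: d =", round(durbin_watson(ols_trend_residuals(red)), 2))
```

For residuals with lag-one autocorrelation ρ, d is roughly 2(1 − ρ), so the φ = 0.8 series should land far below 2. The statistic tests for first-order autocorrelation in OLS residuals; it does not by itself estimate a noise model.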

Ross McKitrick’s criticism was as follows:

3-1132 A 116:55 116:56 The sentence beginning, “Nevertheless, the results depend ” is vague, disputatious and incorrect. It applies more to the REML results, which are presented without such caveat in the chapter. No citation to any literature is given to defend the implication that fractionally-integrated estimators are less physically-realistic than the linear regression models used elsewhere. Persistency models were developed in hydrology precisely to improve physical realism, so as to provide a better match between the stochastic model and the geophysical phenomena. As for transparency, the lack of transparency of GCM’s or other numerical models is never regarded as a deficiency in IPCC documents. And there is no sense in which fractional-integration models lack transparency–the methods are well-known and code is published. They are not trivial, but that doesn’t mean they are not transparent. The sentence is wrong, unnecessary and should be removed. [Ross McKitrick (Reviewers comment ID #: 174-13)]

To which the IPCC reviewers replied:

Fractionally-integrated estimators have not been shown to be good models or fits to the data. On the contrary some examples exist where it is demonstrated they are not (e.g. Trenberth, K. E., and J. W. Hurrell, 1999: Comment on [Wunsch 1999]: The interpretation of short climate records with comments on the North Atlantic and Southern Oscillations. Bull. Amer. Met. Soc., 80, 2721-2722).

The Trenberth and Hurrell comment was not an exposition of statistics by renowned statisticians, but an exchange sparked by Carl Wunsch’s 1999 criticism, covering somewhat similar ground as the present article. Although AMS publications are mostly online, Trenberth and Hurrell 1999 is not online (though it is in the paper copies of the journal).

So when Wunsch (2007) rebuts Trenberth and Hurrell one more time, it is in a debate that has seemingly been going on for a decade without acceptance of Wunsch’s points by rank-and-file IPCC climate scientists who are not statisticians.

## 85 Comments

Money quote:

I recommend taking the time to read the Wunsch paper. As a layperson Climate Audit reader who does not deal with statistics every day, I also found it extraordinarily informative and helpful in better understanding many of the issues that Steve M. raises with some of the current “climate science” research papers.

As I learned to my deep regret in grad school, a badly specified statistical test can be misleading or, worse, mask a more subtle and correct effect. Wunsch’s paper needs to be read and understood by the people who are premising billions of dollars in public money and radical shifts in economic behaviour on interpretations of data which amount to little more than wiggle matching.

I also recommend Wunsch’s paper. It’s very well written and very relevant to paleoclimate studies.

Thanks for posting the Wunsch paper. While I agree that his main point is correct — even modestly complex systems offer many opportunities to go astray statistically — IMHO climate scientists often exacerbate this problem by defiantly sticking to bad statistical practices and results, such as the hockey stick, long after their statistical shortcomings have been identified. I do not know whether this is due to lack of comprehension, self-deception, desire to deceive others, or something else. But it does impede progress.

Incidentally, Nassim Nicholas Taleb has written two interesting and very readable books (here and here) on human deficiencies with respect to drawing inferences about complex systems, in this case focusing on financial markets.

Delightful paper by Wunsch. It reminds me of Martin Gardner’s “Mathematical Games” columns from when Scientific American had repute.

Please don’t think “I told you so”, but I have written on CA about the dangers of discarding information-rich outliers, as in Wunsch.

I have also expressed frustration about using low correlation coefficients and relatives. The Wunsch 99.9% confidence for drug evaluation is a long way from levels used in some climate science. If you have to get it right to get your next pay check, then you prefer high confidence.

Gardner also wrote of lags inserted to make patterns match (in the CA context, dendro correlations back to present year? Bulloides shell profiles?) and on causation as a required part of explaining correlation, no examples needed.

Steve, you are miles ahead of most of us in your math skills and I am in no way trying to lecture or make a point. It simply gives a person like me some comfort that experiences of a career are shared by others with more rigid and colourful analysis.

May your weblog popularity numbers increase. You are almost in second place now, because you select articles like this.

Lies, Damn Lies and Statistics…

How to Lie With Statistics by Darrell Huff.

1. The Sample with the Built-in Bias

2. The Well-Chosen Average

3. The Little Figures That Are Not There

4. Much Ado about Practically Nothing

5. The Gee-Whiz Graph

6. The One-Dimensional Picture

7. The Semi-attached Figure

8. Post Hoc Rides Again

9. How to Statisticulate

10. How to Talk Back to a Statistic

I feel great trepidation in invading an area where I have an interest but where I am acutely aware that I have little formal training.

For what it’s worth, it seems to me that the problem often arises where the human eye and brain try to make sense of a mess of figures in a time series where there is a lot of noise and not very much signal, or where it is unclear whether or not there is a signal.

I have had to meet this in my former career as an industrial chemist, looking at a time series of chemical analysis results which by their nature will always have an imprecision. I was taught that time series cannot be assessed by simple parametric tests, because most of the elementary statistical tests rely on the assumption that each result is independent of every other. If you believe there are step changes or trends, that immediately weakens or invalidates this independence. I also feel uncomfortable with the use of moving averages, wondering whether they will remove a cyclic variation or artificially impose one (perhaps this just shows my ignorance, which I freely admit).

A way I found to look more dispassionately at time series results and to gain a bit more rigour in the independence stakes was to use CUSUM charting and analysis. I wonder why this is not used more often in analysing palaeoclimate time series measurements and proxies.

I see advantages in making a messy data set more easily grasped in a visual sense, but without having preconceptions. Moving averages are unnecessary and no assumptions are made in doing the original charting. If the CUSUM plot is a straight line with a positive slope, the mean over that stretch is above the reference value (typically the overall mean); a negative slope means it is below. A change of slope marks a change in the mean, and a curved line suggests a longer-term trend up or down.
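A plain CUSUM of the kind described here just accumulates each value's deviation from a reference, usually the overall mean. The slope of the CUSUM over any stretch equals that stretch's mean minus the reference, which is why a change of slope marks a change in mean. A minimal sketch, assuming the overall mean as the reference; the function names and the demo data are hypothetical, and this is not the full BS 5703 procedure with decision intervals or V-masks:

```python
def cusum(values, reference=None):
    """Running sum of deviations from a reference value (by default the series mean)."""
    if reference is None:
        reference = sum(values) / len(values)
    total, path = 0.0, []
    for v in values:
        total += v - reference
        path.append(total)
    return path


def largest_cusum_excursion(values):
    """Index where |CUSUM| about the overall mean peaks: a simple first guess at a change point."""
    path = cusum(values)
    return max(range(len(path)), key=lambda i: abs(path[i]))


if __name__ == "__main__":
    # Hypothetical series: five values at 10, then five at 12 (no noise, to keep the output readable).
    data = [10] * 5 + [12] * 5
    print(cusum(data))  # [-1.0, -2.0, -3.0, -4.0, -5.0, -4.0, -3.0, -2.0, -1.0, 0.0]
    print(largest_cusum_excursion(data))  # 4
```

The CUSUM falls by 1 per point while the data sit below the overall mean and climbs back once they sit above it; the turning point is where the mean shifts. Because the reference is the overall mean, the path always ends at zero.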

The significance of an apparent change can be assessed with simple test statistics (although to be fair, I have yet to get to grips with the test for counted data).

While CUSUM analysis does not give the complete answer (!!) it gives another way of simplifying a set of messy data without imposing any preconceptions.

For those who have access to it, British Standard BS5703-2003 in 4 parts is the British way of doing it.

I used CUSUM analysis (with other methods) to assess the landfall hurricane data for USA, 1850-2006 for a project for my Open University degree.

(Disclaimer: I own no stocks or shares in any CUSUM-based company. /humour)

Alan

A climate scientist has an equal probability of writing a good or a bad paper. He writes two papers, the first one is good. What is the probability that the second one will be bad?

Carl Wunsch:

No! Really? I hadn’t noticed…

Tamino is giving a quick class in statistics.

Interesting read, but my statistics skills are poor. Could people from here read and comment?

7

You have some reason for that, if the simplest form of MA calculation is used (a rectangular window). However, if a recursive approach is used (MA[n+1] = (1 - a)*MA[n] + a*y[n+1]), you are pretty safe. Of course, the high-frequency variations are lost, but that is the intention.
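The recursive formula in this comment is an exponentially weighted moving average (EWMA). Unrolled, it gives the observation k steps back a weight a(1 − a)^k, so the weights taper geometrically instead of stopping at a hard window edge. A minimal sketch, assuming the filter is started at the first observation (the comment does not say how to initialise it); the function names are hypothetical:

```python
def ewma(values, a):
    """Recursive moving average m[n+1] = (1 - a)*m[n] + a*y[n+1], started at m[0] = y[0]."""
    if not 0.0 < a <= 1.0:
        raise ValueError("smoothing constant a must lie in (0, 1]")
    smoothed = [values[0]]  # start-up choice is an assumption; the comment does not specify one
    for y in values[1:]:
        smoothed.append((1.0 - a) * smoothed[-1] + a * y)
    return smoothed


def ewma_weight(a, k):
    """Weight given to the observation k steps back once start-up effects have died away."""
    return a * (1.0 - a) ** k


if __name__ == "__main__":
    print(ewma([0.0, 0.0, 1.0, 1.0, 1.0], 0.5))  # [0.0, 0.0, 0.5, 0.75, 0.875]
    print([ewma_weight(0.5, k) for k in range(4)])  # [0.5, 0.25, 0.125, 0.0625]
```

The step input also shows the price of smoothing: after the jump, the EWMA approaches the new level gradually, lagging the data.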

11, I’m not a statistician, but I have had a little exposure to DSP, and one thing bothers me about rolling averages: while they may be simple and computationally efficient, their frequency response is pretty crude. If you have some idea of the time or frequency scales of the noise in a signal, wouldn’t it do a much better job of separating signal from noise if you used a real filter? It’s not as if the amount of computing involved is prohibitive, as it was in previous years.
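The "crude frequency response" of a rolling average can be made concrete. An M-point running mean has gain sin(πfM) / (M sin(πf)) at frequency f (cycles per sample), once its constant delay is removed: it is exactly zero at f = 1/M, 2/M, and so on, and between those zeros it has sidelobes where the gain is negative, i.e. those frequencies come out sign-reversed. The recursive EWMA, by contrast, has a gain that falls monotonically with frequency. The sketch below just evaluates both transfer functions; M = 11 and a = 2/(M + 1) (a common rule of thumb for pairing the two filters) are arbitrary illustrative choices, and the function names are hypothetical.

```python
import cmath
import math


def running_mean_gain(M, f):
    """Complex gain of an M-point running mean at frequency f (cycles per sample)."""
    return sum(cmath.exp(-2j * math.pi * f * k) for k in range(M)) / M


def running_mean_signed_gain(M, f):
    """Real gain after removing the filter's linear phase; negative values mean sign reversal."""
    return (running_mean_gain(M, f) * cmath.exp(1j * math.pi * f * (M - 1))).real


def ewma_gain(a, f):
    """Complex gain of the recursive filter m[n] = (1 - a)*m[n-1] + a*y[n]."""
    return a / (1.0 - (1.0 - a) * cmath.exp(-2j * math.pi * f))


if __name__ == "__main__":
    M = 11
    a = 2.0 / (M + 1)
    for f in (0.0, 0.05, 1 / M, 0.13, 2 / M, 0.25, 0.5):
        print(f"f={f:.3f}  running mean {running_mean_signed_gain(M, f):+.3f}  "
              f"EWMA |gain| {abs(ewma_gain(a, f)):.3f}")
```

The negative sidelobes are one concrete way a simple moving average can distort, rather than merely damp, cyclic components in a series.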

RE 12.

I think in some ways we are hindered by comparisons to engineered communication channels.

#8:

To a realclimate scientist, 0%. To a superstitious gambler, 100%. To those of us who have taken even the most basic statistics course: still 50%. (Unless I’m making a total fool of myself by misunderstanding the question.)

13, that’s why I phrased that as a question. I don’t know if that’s a reasonable question, or a distraction from more important issues.

One might ask, why were statistics even invented? Because it is too easy to collect 3 data points and, judging by eye, say you have proved your point. Looking at the hockey stick, we can ask: if the model is in fact false, and does not properly represent tree response to temperature (and is maybe precipitation-driven), what would a reconstruction look like? The answer is white noise, which, averaged and plotted over time, is nothing but a flat line: the shaft of the hockey stick. Stats would tell you what you must do to make sure that you do in fact have a model and are not accepting white noise as if it were a signal.
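The "flat shaft" argument can be checked numerically: if each series is independent unit-variance white noise, the pointwise average of N of them has standard deviation about 1/√N, so with many series it hugs zero. A minimal sketch under that white-noise assumption (if the noise is correlated across series, the average does not shrink this way); the function name and sizes are hypothetical:

```python
import random
import statistics


def average_of_noise_series(num_series, length, seed=0):
    """Pointwise average of num_series independent unit-variance white-noise series."""
    rng = random.Random(seed)
    series = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(num_series)]
    return [statistics.fmean(column) for column in zip(*series)]


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        avg = average_of_noise_series(n, 500)
        print(f"{n:5d} series: std of average {statistics.pstdev(avg):.3f}, 1/sqrt(N) = {n ** -0.5:.3f}")
```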

There’s an interesting issue to re-visit with MBH in connection with the limited degrees of freedom issue.

Remember the “adjustment” for CO2 fertilization in MBH99? Why do you suppose that Mann made the “adjustment”? The AD1000 PC1 is too HS-shaped and yields a lousy RE because it overshoots. Look at the post Reply to Huybers #2 for an example of such overshooting.

So what Mann did was “fix” the PC1 so that the recon didn’t overshoot any more, getting a favorable RE statistic. In effect, there is one degree of freedom in the RE statistic here and Mann used it up in his “adjustment”. MBH is a never-ending source of, shall we say, mann-dacity.

Steve:

Has Wunsch commented explicitly on Mann and the Hockey Stick in other forums? There is no reference in his very compelling paper.

I will say I disagree with Wunsch on one or two of his examples, such as the basketball player. If the difference were that player A hit 50% of his shots instead of 35% and B hit 25% instead of 30%, it’d be a different story. With two rather closely rated players, it’s just too likely that external circumstances make a difference (a good night’s sleep, an argument with a girlfriend, an upcoming test). The same is true with the pilot landing example. The placebo effect has been proven too many times to be discounted.

#10

I think it’s a good read but wanted more when he discussed testing relative to red noise.

Re: #14 Jeremy Friesen

If you had read that excellent little paper that’s the subject of this thread or had taken the advanced statistics course, you would know that the answer is 2/3.

Re: 14

That requires the assumption that the quality of the second paper is independent of the quality of the first one.

If people learn from mistakes, then a bad first paper may imply a higher-quality second paper (and a good first paper may imply a lower-quality second paper, as no learning would have taken place).

Or, the scientist may exhibit hysteresis.

Sinan

re: #21 Reference,

You’ve made a mistake. Either in your English or your math. Since you’ve specified that the first paper is good, it won’t allow you to make the same assumptions as result in the 2/3 figure. Now you might have MEANT to say, You read one of them and it’s good. But that isn’t what you said. Sometimes we say “the first one” meaning “the first one I read” but that should be made explicit. The ambiguity of assuming others will read your question the same as you renders your argument invalid.

Correction to 22:

18, Carl Wunsch is a little schizoid in that he’s very critical of abuse of statistics and models, but is still beholden to the AGW party line. After some of his commentary was used in the Swindle movie, the next morning Lubos Motl (who was at Harvard at the time, and aware of Wunsch’s reputation) predicted that he would complain vociferously. Sure enough, within hours he was denouncing the use of his own words about the problems with models in the movie.

I think it’s what you have to do in a Boston university to get along socially.

#25: By happy coincidence I was seated next to Carl Wunsch during dinner at a small conference on Thursday evening, and we had a long conversation. He is definitely not beholden to any school of thought or party line. He objects to it when people on either side of the climate debate exaggerate their understanding of the climate and claim certainty about what’s going to happen. I guess both sides can appeal to his writings for ammunition against the other, though from my perspective, since the IPCC side has invested so heavily in its claims to 90% certainty etc., etc., his message sounds more troubling for them than for their critics. Still, he is worried about CO2 emissions and the fact that we have so much trouble knowing what will happen. I came away wishing my environment minister had been at the table taking notes. It’s not like his views would directly imply more or less aggressive greenhouse gas policies, but they do clarify the reality that decisions have to be made in a context of genuine uncertainty, rather than pretending to know stuff that we don’t.

In his presentation he didn’t talk about the hockey stick except briefly in passing to note the dispute about what the paleo record could teach us.

re: #23 correction,

I was just sitting here musing and realized I have an error in my analysis. It doesn’t matter if it’s the first one written or the first one picked or any other sort of “first.” Once you’ve identified a particular paper, the other paper is a separate entity and the chance of it being good or bad is 50%. It’s only stating that there are two papers and that (at least) one is good which allows you to estimate that the other is bad 2/3rds of the time.

This means that you, “Reference”, made a logical / math error rather than a linguistic error.

#7

I think there has been some discussion about this here; try googling the Slutsky-Yule theorem. Correlations between two series may change quite randomly as well; see

http://www.climateaudit.org/?p=980#comment-75608
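A quick way to see how easily two unrelated series can appear correlated is to simulate independent pairs and count how often |r| exceeds a threshold. The sketch below compares white noise, a 10-point running mean of white noise, and random walks (Yule's classic "nonsense correlation" case). The series length, threshold, trial count and function names are arbitrary illustrative choices, not taken from the thread or the linked comment.

```python
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def white_noise(n, rng):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def smoothed_noise(n, rng, m=10):
    """m-point running mean of white noise, trimmed to length n."""
    raw = white_noise(n + m - 1, rng)
    return [sum(raw[i:i + m]) / m for i in range(n)]


def random_walk(n, rng):
    total, out = 0.0, []
    for step in white_noise(n, rng):
        total += step
        out.append(total)
    return out


def share_exceeding(generator, threshold=0.5, n=100, trials=1000, seed=42):
    """Fraction of independent pairs whose |r| exceeds threshold."""
    rng = random.Random(seed)
    hits = sum(abs(pearson_r(generator(n, rng), generator(n, rng))) > threshold
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for gen in (white_noise, smoothed_noise, random_walk):
        print(f"{gen.__name__:15s} share of |r| > 0.5: {share_exceeding(gen):.3f}")
```

Independent white-noise pairs of length 100 essentially never reach |r| > 0.5, while smoothed noise sometimes does and random walks often do, even though every pair is unrelated by construction.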

#12

The problem seems to be that there are no ideas about the frequency scales of noise or signal. Tamino seems to be in the ‘noise is AR(1)’ phase. AR(1) + white sequence for noise would be too much, of course ;) It eventually leads to nonsense like the MannLees96 paper.

The frequency response of the rolling average was discussed here.

Some of the ideas discussed in the paper are also discussed in this paper:

Judgment under Uncertainty: Heuristics and Biases (PDF)

First paper read: Good or bad? Second paper read: Good or bad?

Now, here are the questions. Who wrote it? How many papers do they have? If 0 and the first one is good, what are the chances that somebody who can write one good paper can write another? Probably pretty good, but how do you quantify an unknown like that? You can’t. The answer to the question is “unknown”, not some percentage.

Now if the author has 99 papers, and they’re all good (or all bad) what’s the chance they’ll continue in the same way? Probably pretty good. How do you quantify it? You can’t. Maybe the next paper, that 100th, will be a fluke.

So, like a lot of things, the question simply doesn’t have an answer to it.

Plus, those aren’t the only two answers. Fantastic. Horrible. Just okay. Mostly correct. Mostly incorrect. How are you distinguishing a “bad” paper from a “good” paper, or from one that’s just mediocre or simply average?

Uh, Sam.

This is a math puzzle, not literary criticism. The salient points are that 50% of the papers by this person are good, 50% bad, and that there are exactly 2 papers we’re concerned with. Reference took the old “A man has two children. At least one is a son. What are the odds the other child is a daughter?” and converted it to a question about science papers, but stated it wrongly.

BTW, I remember arguing with someone who made the same mistake at least 12 years ago on CompuServe.

#27 Dave Dardinger

My apologies; however, if you accept the meaning of ‘the first one’ as ‘the first one selected’ from the two papers, then the answer of 2/3 is correct. A more concise formulation would of course be ‘one’. Ain’t English onederful :)

Forms of Self-Deception

Wunsch’s first two examples of self-deception have to do with looking too narrowly at the signal, in this case the last 100 or so years of the climate pattern, when it takes at least a thousand years to make any sense of it. It’s like listening to the last few bars of the aria without hearing all the variations that came before.

Curious how many climate scientists seem to have come to doubt their own findings. What happened to John Eddy?

re: #32 Reference,

That’s what I was thinking at first but was wrong about. Look at it like this: there are 4 possible combinations of Good / Bad after you’ve fixed your definition of the terms, assured that the occurrence of them in the universe of papers under consideration is 50% G & 50% B, and “selected” two papers randomly (note that this is different from labeling / selecting one of the papers and reporting whether it is good or bad). Whatever method you use to decide which paper to select first, you’ll have the following possibilities:

| Paper 1 | Paper 2 |
|---|---|
| G | G |
| G | B |
| B | G |
| B | B |

Now each of them will have a 25% probability of being the situation in the actual pair of papers at hand. If I tell you that the first paper (i.e. #1) is G, then you can easily see that there’s a 50% chance each that the second paper is G or B (the same if paper 2 is selected). But if I merely tell you one of the papers is G (possibly both), then the only possibility excluded is the last one, B B. Of the remaining three, two have a B in them, thus the 2/3 chance of the other paper being bad. This has nothing to do, IOW, with the language you used per se, but with the fact that you stated that one paper was selected or read. That changes the logic of the situation and thus the odds. Not unlike cherry-picking proxies, to bring it back to one of the main topics of this blog.
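The enumeration above can be automated. Assuming, as the puzzle does, that each paper is independently good or bad with probability 1/2, the sketch below lists every equally likely G/B sequence, keeps those consistent with what we are told, and counts. It also covers the three-paper variant raised in the thread, reading "two of them are good" as "at least two are good". The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def conditional_probability(n, condition, event):
    """P(event | condition) over all 2**n equally likely G/B sequences of n papers."""
    outcomes = list(product("GB", repeat=n))
    kept = [o for o in outcomes if condition(o)]
    hits = [o for o in kept if event(o)]
    return Fraction(len(hits), len(kept))


# Told only that at least one of two papers is good: chance the pair contains a bad one.
at_least_one_good = conditional_probability(2, lambda o: "G" in o, lambda o: "B" in o)
# Told that paper #1 specifically is good: chance paper #2 is bad.
first_is_good = conditional_probability(2, lambda o: o[0] == "G", lambda o: o[1] == "B")
# Three papers, told at least two are good: chance one of the three is bad.
three_papers = conditional_probability(3, lambda o: o.count("G") >= 2, lambda o: "B" in o)

print(at_least_one_good, first_is_good, three_papers)  # 2/3 1/2 3/4
```

The difference between the first two results is exactly the distinction drawn above: identifying a particular paper leaves the other at 1/2, while "at least one is good" only rules out B B.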

Alan Bates

I’m a big fan of CUSUM and Exponentially Weighted Moving Average (EWMA) charts, as in #11 above, for time series analysis. They are the best at picking up small changes in process means, not so good for sudden large changes, though. X-bar and R charts are much better at spotting outliers. You would think that researchers looking for changes in trends would be more aware of these tools. I’ve also tried to apply the CUSUM technique (Poisson statistics) to North Atlantic hurricane numbers here and a few other posts.
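For counted data such as annual landfall numbers, a common form is the one-sided tabular CUSUM for Poisson counts: S_t = max(0, S_{t−1} + x_t − k), with reference value k = (μ1 − μ0)/(ln μ1 − ln μ0) for detecting an upward shift in mean from μ0 to μ1, and a signal when S_t exceeds a decision interval h. The sketch below assumes that form; μ0, μ1, h and the counts in the example are made-up values, not the hurricane figures from the commenter's posts, and in practice h is chosen from average-run-length tables.

```python
import math


def poisson_cusum_upper(counts, mu0, mu1, h):
    """One-sided (upward) tabular CUSUM for Poisson counts.

    mu0: in-control mean; mu1: shifted mean to detect (mu1 > mu0); h: decision interval.
    Returns the reference value k, the CUSUM path, and the indices where it exceeds h.
    The sum is not reset after a signal.
    """
    if not mu1 > mu0 > 0:
        raise ValueError("need mu1 > mu0 > 0")
    k = (mu1 - mu0) / (math.log(mu1) - math.log(mu0))  # Poisson log-likelihood-ratio constant
    s, path, alarms = 0.0, [], []
    for i, x in enumerate(counts):
        s = max(0.0, s + x - k)
        path.append(s)
        if s > h:
            alarms.append(i)
    return k, path, alarms


if __name__ == "__main__":
    # Hypothetical counts: three periods near the old mean, then three elevated ones.
    k, path, alarms = poisson_cusum_upper([2, 2, 2, 6, 6, 6], mu0=2.0, mu1=4.0, h=5.0)
    print(alarms)  # [4, 5]
```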

I don’t agree that I’ve made a mistake, Dave. I think what you are missing is that if you create an artificial notional situation in which this person writes papers and exactly half are good and exactly half are bad, there’s not enough information to solve it. Here are some scenarios in the hypothetical world where a paper writer is like that.

1. The person writes exactly two papers and that’s it. If the first is good, the second is 100% due to be bad, in the situation of exactly 50/50 papers.

2. The person writes two bad then two good, alternating. If the author already has two bad papers, the first has to be good, and so does the other one. 100% both are good.

3. The papers alternate good/bad. If the first is good, the second has to be bad, 100%

4. The order is totally random. So we can have good/good, bad/bad, bad/good, good/bad. Whatever the first one is, 50% the second is one of the two choices….

5. There are three papers. If one is good and one is bad for the first two, the third one could be either, 50% Unless there’s a pattern…

Or maybe I just disproved myself, and I did make an error. It’s either 100% or 50% the second will be bad and 100% or 50% the second will be good. I don’t have enough info to answer one or the other though. So it’s unknown, other than it’s either 100% or 50% (So the median and mean is 75%?)

Rethinking that, if 2 good + 2 bad + 2 good + 2 bad, it could be that after that string, it changes to reverse itself, in which case the next two would be both bad, or the pattern could fall apart and start alternating.

See?

Of course, we could always look at the next n papers and start getting into permutations and combinations. Anyway, in real life, I suppose the answer is 50%, since past indicators are not predictions of future actions, right?

Re #18, agree on the basketball player one. If people hit FTs at a 30-35% clip, then by basketball standards it’s almost “chance” when they make one. Neither is worth a wager. Furthermore, I’d be inclined to think that a 30% shooter who has hit 8 in a row has to have an explanation (playing on his “home goal”, where he shoots much better than elsewhere, either because of the surroundings, confidence, familiarity with possibly skewed dimensions/standards, etc.).

#17: My thanks also for Carl Wunsch’s paper. It’s very worthwhile reading. I especially liked the reproduction of Hogg and Owens’ Figure 6b, showing the wild trajectories of neutrally buoyant floats off the coast of Brazil, contradicting the presumed laminar flow of deep North Atlantic currents. My impression of Wunsch is that he’s brilliant and honest.

Wunsch also referred to a paper by J. Murray Mitchell, Jr., “An Overview of Climatic Variability and Its Causal Mechanisms,” which makes an interesting case that much of climate variability is driven by internal stochastic processes. On getting that paper, I found that the same issue of Quaternary Research (vol. 6, 1976) had another paper making a similar case: Edward N. Lorenz, “Nondeterministic Theories of Climatic Change.”

Here’s the abstract of the Lorenz paper:

Edward N. Lorenz, “Nondeterministic Theories of Climatic Change”, QR 6, 495-506 (1976)

Abstract: “A basic assumption in some climatic theories is that, given the physical properties of the atmosphere and the underlying ocean and land, specified environmental parameters (amount of solar heating, etc.) would determine a unique climate and that climatic changes therefore result from changes in the environment. The possibility that no such unique climate exists and that nondeterministic factors are wholly or partly responsible for long-period fluctuations of the atmosphere-ocean-earth system, is considered. A simple difference equation is used to illustrate the phenomena of transitivity, intransitivity, and almost-intransitivity. Numerical models of moderate size suggest that almost-intransitivity might lead to persistence of atmospheric anomalies for a whole season. The effect of this persistence could be to allow substantial anomalies to build up in the underlying ocean or land, perhaps as abnormal temperatures or excessive snow or ice. These anomalies could subsequently influence the atmosphere, leading to long-period fluctuations. The implications of this possibility for the numerical modeling of climate, and for the interpretation of the output of numerical models, are discussed.”

This idea is roundly ignored in circles of AGW science.

#39

Pat:

When you believe that the world is made of nails, you go looking for a hammer. Since many who believe in AGW already believed that we were killing the planet, why would anybody be surprised that they found a single mechanism to explain climate change.

My God, does anyone else remember Einstein?

#34

Dave, language is critical in this case, changing the language changes the problem. By saying that the ‘first one’ is good determines that the question is about the remaining paper, not the set of two papers.

Consider that our climate scientist has written a third paper, and we find that two of them are good. The chance that the third one is bad is now 0.75, not the 0.5 one may intuitively expect. This may nicely help to explain why the majority of prolific climate scientists’ papers are not referenced in the literature :)

(For new readers:) Indeed, in MBH99 Figure 1 the ITRDB PC #1 series is “corrected”, while in the next figure he uses an incorrect reference series (http://www.climateaudit.org/?p=647; Jean S summarized it in #189). I haven’t been able to reconstruct Fig 3b.

MBH98 Figure 3 is a mess, Figure 4 is based on heavy overfitting (calibration with q=112 , p=11, N=79 + variance matching, you can fit an elephant with these ). Figure 5a RPC no.5 starts at 1650, unreported step. Figure 5b uncertainties are completely underestimated (degrees of freedom forgotten, effect of scaling errors misunderstood..). Figure 7 is based on Mann’s own statistical method, evolving multivariate regression, which leaves even math pros speechless. All this before we go to the actual text ;)

Recent British commentary questioning AGW

http://www.telegraph.co.uk/earth/main.jhtml?xml=/earth/2007/11/04/eaclimate104.xml

Mentions 2 unnamed Canadians.

#17/#43: Slightly OT, but I noticed that the submission version of MBH99 has an interesting line (the last line; removed from the published version along with the Briffa reference) in “Data and Method” (the ITRDB PC#1 discussion):

So here we have MBH not only acknowledging the “divergence problem” but also speculating a reason for it. Of course, now they have moved on.

Bill says

“Wunsch’s first two examples of self-deception have to do with looking too narrowly at the signal, in this case, the last 100 or so years of the climate pattern, when it takes at least a thousand years to make any sense of it. Like listening to the last few bars of the aria, without hearing all the variations that came in front.”

Yet we still hear the AGW community say that “30 years is enough to see a trend”.

It appears the only reason some AGW papers are put out is to discount the MWP or LIA, just so that the current weather changes can be “unprecedented”.

Just like the glacier that retreated to “unprecedented” levels, only to find tree stumps there…

Great post, Jean S. Can you say Mannipulation (sometimes confused with capitulation)?

re: #42 Reference,

Yes, but are you agreeing with me or not? Your wording for the 3-paper case is OK, since you don’t specify which two papers are good. If you’d said instead, “you read two papers and they’re good, what are the odds the third paper is good?”, then you’d be back to 50% instead of 75%. BTW, at this point we might start discussing the Monty Hall example from the paper, which is really equivalent to the two-paper example. Once he opens a door he converts what looks like a 3-door problem, where you have a 1/3 chance of winning, into a two-door problem where you know one of the remaining doors is a goat and therefore the other door is a car. Since you selected your door under the 1/3 regime, you know the other door is 2/3 likely to be the car.

Hmmm… Now I’m not sure the situation is equivalent despite the mathematical likeness. So here’s your homework. If we had a 4-door Monty Hall game, and he opened two doors with goats, would the odds of the 4th door being the car be 3/4 or not?

Ref 48

Dave,

I believe that the Monte Hall example is a beautiful demonstration of how information changes strategy.

The contestant first makes a random selection, choosing one door from three options, but she has a two in three chance of making the wrong choice. Therefore, two out of three times the host is forced to open the other wrong door. He can neither duplicate her choice nor open the door that contains the prize, so he deliberately makes the other wrong choice. Because this happens two out of three times, the contestant’s best strategy is then to switch, and two times out of three she will win the prize.

Link

re 48: going a little OT

If we had a 4-door Monty Hall game, and he opened two doors with goats, would the odds of the 4th door being the car be 3/4 or not?

Yep. The Monty Hall problem is a simple 1/n odds problem complicated needlessly by irrelevant door openings. Your original odds are 1 over n (the number of doors). Monty gets the rest. Since there is only 1 car, Monty will always have at least n-2 goats. The door openings are not random. Monty selects n-2 doors to open, revealing goats that you already knew he had. Monty’s odds haven’t changed from (n-1)/n. Switching would be good.
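The argument can be checked by simulation. The sketch assumes the host knows where the car is, always opens all but one of the unpicked doors, and never reveals the car; when the contestant has already picked the car, the host leaves a random goat door closed. Function names and trial counts are arbitrary.

```python
import random


def play_round(n_doors, switch, rng):
    """One round of an n-door game with a host who knows where the car is.

    The host opens every unpicked door except one and never reveals the car.
    """
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    if pick == car:
        # Every unpicked door hides a goat; the host leaves one of them closed at random.
        left_closed = rng.choice([d for d in range(n_doors) if d != pick])
    else:
        # The host is forced to leave the car's door closed.
        left_closed = car
    final = left_closed if switch else pick
    return final == car


def win_rate(n_doors, switch, trials=50_000, seed=0):
    rng = random.Random(seed)
    wins = sum(play_round(n_doors, switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    for n in (3, 4, 10):
        print(f"{n} doors: stay {win_rate(n, False):.3f} (1/n = {1 / n:.3f}), "
              f"switch {win_rate(n, True):.3f} ((n-1)/n = {(n - 1) / n:.3f})")
```

Staying should come out near 1/n and switching near (n − 1)/n, e.g. about 0.75 for the four-door game asked about in #48.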

re: #49 Philip,

Quite right, but just now I’d like to know if the Monty Hall paradox is mathematically equivalent to the other problem involving two children (or two science papers). It’s again a case of information changing strategy, but there may be more than one of these sorts of situations which may yield the same result in one particular case but not in general. I haven’t set up the problem on paper and solved it yet, to give others a chance to do so without my biasing the results. I suspect they’re the same under the hood, but wouldn’t be particularly surprised if they aren’t. And I’m sure there’s some mathematical researcher somewhere who’s an expert on this sort of thing and has published a thesis or book on the subject.

All of which brings up two things which have been often mentioned on this blog. First, cherry picking is always a danger in the proxy studies and has not been properly accounted for. Second, don’t re-invent the wheel! Mann et al. should have consulted with statisticians who could have given them advice about how to set up their studies to avoid things like the Monty Hall paradox.

re: #50 Jax,

Thanks! That sounds right. So let’s try an intermediate case and then set a question for the statisticians. It’s obvious that any doors opened with knowledge increase the odds of your getting the car if you switch. Contrariwise, doors opened without knowledge don’t increase the odds. So if we had 10 doors, one of which had a car behind it, and we select a door and then Monty opens 3 doors knowing which door the car is behind, our odds of winning increase from 1/10 to 1/7 if we switch to another door. But if Monty merely opens three doors at random, not knowing whether the car is behind them or not, our odds don’t increase by switching, but I think now our odds of winning by holding are 1/7, not 1/10. Hmmm… Now I’m not so sure again. I think I do need to go away and double-check your result and see if it matches the results of using a boolean table like I did in #34.
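This intermediate case can be settled by exact enumeration instead of a boolean table. The sketch below is one way to set it up; the function name and the door-choice rules (stated in the docstring) are assumptions of the sketch, not something specified in the paper or the thread. It weights every combination of car position and opened doors and counts wins for holding versus switching.

```python
from fractions import Fraction
from itertools import combinations


def stay_and_switch(n, k, host_knows):
    """Exact win probabilities (stay, switch) when the host opens k of the n - 1 unpicked doors.

    The contestant holds door 0 and the car is equally likely to be behind any door.
    host_knows=True: the host opens k goat doors, chosen uniformly among the goat doors.
    host_knows=False: the host opens k unpicked doors uniformly at random, and we condition
    on all of them turning out to be goats.
    Switching means moving to a uniformly chosen unopened door other than door 0.
    """
    if not 0 <= k <= n - 2:
        raise ValueError("need 0 <= k <= n - 2 so that a door remains to switch to")
    others = list(range(1, n))
    stay = switch = total = Fraction(0)
    for car in range(n):
        allowed = [d for d in others if d != car] if host_knows else others
        choices = list(combinations(allowed, k))
        for opened in choices:
            if car in opened:
                continue  # random host revealed the car: excluded by the conditioning
            p = Fraction(1, n) * Fraction(1, len(choices))
            unopened = [d for d in others if d not in opened]
            total += p
            if car == 0:
                stay += p
            switch += p * Fraction(sum(d == car for d in unopened), len(unopened))
    return stay / total, switch / total


if __name__ == "__main__":
    for n, k in ((3, 1), (4, 2), (10, 3)):
        know_stay, know_switch = stay_and_switch(n, k, host_knows=True)
        rand_stay, rand_switch = stay_and_switch(n, k, host_knows=False)
        print(f"{n} doors, {k} opened: knowing host stay={know_stay} switch={know_switch}; "
              f"random host stay={rand_stay} switch={rand_switch}")
```

Under these assumptions the 10-door case gives 1/10 for holding and 3/20 for switching when the host knows, i.e. (9/10) spread over the 6 doors left to switch to; when he opens doors at random and happens to reveal only goats, holding and switching are both 1/7. The 4-door game with two goats shown gives 3/4 for switching.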

IMO, the Wunsch example that comes closest to the cherry picking issue is the one referencing the ratio of the mass of a proton to the mass of an electron. A pattern may simply be a “pretty” pattern with no additional meaning.

Question:

In three-card Monte, or find the lady, I assume the logic of the Monty Hall paradox holds if the con artist shows you a non-queen?

#53, in the case of find the lady the odds are always zero that you will win. That’s why it’s played by ‘con’ artists. The ‘game’ depends on switching the queen when the con artist turns the cards over.

Just FYI.

Re #49 – 54

Perhaps hunting for Waldo is analogous to the Monty Hall example. As Steve opens each door revealing yet another goat, the excitement builds as we correctly expect the noxious polluting car to appear.

re: Monty Hall Paradox

Monty’s opening of doors is irrelevant to the decision. It’s just game show machinations to disguise the real offer you are given, which is to keep your 1 door or take all the remaining doors. If Monty phrased the offer like that, there would be no suspense. So he opens all his doors but 1, revealing all non-winners. Now you appear to be equal to Monty, with 1 door apiece. What should you do?

But there is only 1 car. Monty has multiple doors. Therefore all his doors but 1 must have goats. He peeks behind the doors and opens all but 1 to show you the goats that had to be there. Nothing new has been learned. Nothing has changed. Monty’s remaining door has an (n-1)/n chance of having the car.

His selectively discarding non-winners and offering you his 1 remaining door gives the same odds as offering to trade your 1 door for all of his doors; the straight trade just has less drama. But that doesn’t make a good game show or paradox. Oh, and you should switch doors.

jax:

Nicely put: the kind of 2-for-1 deal that is hard to resist.

The interesting thing is how many of us would decline to switch and how one can benefit by knowing how few people have figured it out!

The Monty Hall Paradox is a specific example of the general Principle of Restricted Choice. It also applies to Contract Bridge.

The Monty Hall Paradox is just a straightforward probability problem, nothing more. Do you want your 1 door or all of his? You have a 1/n chance of winning. Monty has an (n-1)/n chance of winning. His action of opening n-2 of his n-1 doors, all losers, and then offering you his remaining door is an unnecessary complexity. The offer is the same as offering you all of his doors initially and letting you open them.

Monty has the greater probability of winning from the outset. Nothing in the problem can change his or your odds, and nothing would alter your decision to accept his offer to switch.
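The “your 1 door versus all of his” argument can be checked with a quick Monte Carlo run. This sketch assumes the host knows where the car is and always leaves exactly one of his doors closed; `play_round`, `win_rate`, the trial count and the seed are illustrative choices. Switching should win close to (n-1)/n of the time and holding close to 1/n.

```python
import random


def play_round(n_doors, switch, rng):
    """Play one round; the host knows where the car is and opens all but one
    of the doors you did not pick, never revealing the car."""
    car = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    if pick != car:
        # The host may not open the car, so his one closed door must hide it.
        host_last = car
    else:
        # You already hold the car; the host leaves some goat door closed.
        host_last = rng.choice([d for d in range(n_doors) if d != pick])
    final = host_last if switch else pick
    return final == car


def win_rate(n_doors, switch, trials=100_000, seed=1):
    """Fraction of rounds won; a fixed seed makes runs reproducible."""
    rng = random.Random(seed)
    wins = sum(play_round(n_doors, switch, rng) for _ in range(trials))
    return wins / trials


for n in (3, 10):
    print(f"{n} doors: hold {win_rate(n, False):.3f}, switch {win_rate(n, True):.3f}")
```

The host’s peek only decides which of his doors stays closed; the switcher wins exactly when the original pick was a goat, which happens (n-1)/n of the time.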

Funny aside. One day I’m testing YASH (yet another stupid hypothesis) and my friend sits down.

F= friend. M = me.

F: “Weren’t you testing that data last week?”

M: “Bonus points, Einstein! 6 more points and you get to ride the short bus to school.”

F: “Well?”

M: “The hypothesis failed.”

F: “So what are you doing now?”

M: “Testing the old data with a new hypothesis, with a new method specified by the principal investigator, who refused to listen about the power of tests or factorial designs or ANYTHING at the start of this fiasco.”

F: “Whiner.”

M: “[self snip]”

F: “How many different tests have you looked at?”

M: “I looked at every test, six ways from Sunday.”

F: “Did you adjust your confidence level accordingly?”

M: “Huh?”

At which point he explains Wunsch’s point to me about multiple tests on the same data.

M: “Mumble mumble. Makes no sense. I JUST LOOKED AT THE DATA, I DIDN’T TOUCH IT.”

Funny story. Nobody died.
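The friend’s question can be put in numbers. A minimal sketch, assuming k independent tests each run at level alpha with every null hypothesis true; k = 20 is a hypothetical stand-in for “six ways from Sunday.” With 20 looks at the 5% level, the chance of at least one spurious “significant” result is roughly 64%. The Bonferroni level alpha/k keeps that chance at or below alpha; the union bound behind Bonferroni does not need independence, although the exact formula used here does, and tests run on the same data are usually correlated.

```python
def familywise_error(alpha, k):
    """P(at least one false positive) in k independent tests at level alpha,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_level(alpha, k):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / k


k = 20  # hypothetical number of tests tried on the same data
print(f"{k} tests at 0.05: P(some spurious hit) = {familywise_error(0.05, k):.3f}")
print(f"Bonferroni level {bonferroni_level(0.05, k):.4f} gives "
      f"{familywise_error(bonferroni_level(0.05, k), k):.4f}")
```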

I got one of the best pieces of advice sitting in the Borgata poker room in Atlantic City.

There was a doctor next to me whom I had been talking with, and he told me that at his medical school commencement the dean of his school made a simple statement that we all need to keep in perspective…

“As you leave this institution, remember, 50% of everything we believe to be true today will be proven wrong in your lifetime.”

There are some climate clergymen that need to be reminded of what science really is.

re: #56 JAX,

Good way to put it, but I still want to find out how Monty Hall relates to the sex-of-the-children puzzle, if at all. Obviously there is one big difference: in the Monty Hall Paradox we know that there is a car among all the goats. If you’re trying to figure out the sexes of N children, however, there’s no guarantee that there’s a boy among a bunch of girls, for instance. Of course you can force the children puzzle to be a Monty Hall Paradox, e.g.

A man comes up to you and says, “I have 10 siblings, all but one of whom are girls. One of my siblings lives in Chicago and the rest live on the hill. Oh, here come eight of my remaining siblings down the hill now, and they’re all girls, as you can see!” What are the odds the man has a brother living up on the hill? Actually, to make it a true Monty Hall puzzle you’d have to pick the sibling in Chicago first in some way, but let’s assume that. But I don’t know that that matches the usual puzzle well.

#34

We should trust our intuition. The correct answer is what we always thought: 1/2. The brain is playing tricks. You have ascribed a property to the population (G, B) which does not exist, that is, columns 1 and 2. Instead of combinations, think of outcomes. The combinations g,b and b,g are identical and the same outcome. No one goes around with No1 or No2 on their back. There are therefore, in reality, only 3 combinations, not 4, which makes the probability of the other one being B 1/2 (note I say the other one, not the second one).

re: 62

What makes the Monty Hall problem a paradox is the appearance of new information when there isn’t any. You make 1 choice. Monty selectively shows you losers from the remaining choices, keeping one unrevealed. Since there is only 1 winner, selecting losers to show you doesn’t tell you anything you didn’t already know.

Initially, the odds are 1/10 the brother lives in Chicago. To make the problem a Monty Hall paradox, the man would have to ask 8 sisters to come down the hill. The odds haven’t changed. No matter where the brother lives, you know that 8 sisters must live on the hill. The paradox is Monty (or the man) selecting losers to show you and the self-deception that you must have learned something that changed the odds. If 8 siblings came down the hill randomly and they were all sisters, then the odds would change. Not saying the sisters are losers, but, but I’ll stop now…

re: #63 Gary,

Not true. One doesn’t have to have a number on one’s back to nonetheless have ordering. Take one coin and flip it two times, noting the results. Now take 2 coins, shake both and let them go at once. Same results: you are twice as likely to get a head and a tail as to get two heads.
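The coin claim can be checked by listing outcomes. This sketch assumes fair, independent coins. Dropping the order afterwards (sorting each result, as if the coins carried no number on their backs) shows the unordered outcome “one head, one tail” carrying twice the weight of “two heads.”

```python
from collections import Counter
from itertools import product

# Every ordered result of two fair coins (or two flips of one coin) is equally likely.
ordered = list(product("HT", repeat=2))  # HH, HT, TH, TT
# Forget the order: sort each result so "TH" and "HT" become the same outcome.
unordered = Counter("".join(sorted(o)) for o in ordered)
print(unordered)
```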

63

It sounds like a few commenters need to take some probability and stats courses :) We do go around with a number on our back: it’s the order we were born in. (g,b) and (b,g) are not the same; otherwise it would be as likely for a family with two kids to have two girls as to have one girl and one boy.

Sam, this is a simple math problem, we’re not talking about an actual author. There is an assumption that we know the exact probability distribution, but of course in reality we would never know anything exactly. But if you can’t understand the simple problems, the real life problems become impossible.
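The birth-order point can be made concrete for the two-children version. This sketch assumes, as the textbook puzzle does, that each child is independently a boy or a girl with probability 1/2; the helper names are illustrative. Being told that at least one child is a boy leaves three equally likely families, so the other child is a boy with probability 1/3; being told that a specific child (say the elder) is a boy gives 1/2, which is the answer intuition usually reaches for.

```python
from fractions import Fraction
from itertools import product

# Two-child families as (elder, younger); under the textbook assumption each
# child is independently a boy (B) or girl (G) with probability 1/2, so the
# four ordered families are equally likely.
FAMILIES = list(product("BG", repeat=2))


def conditional(event, given):
    """P(event | given), counting equally likely families."""
    kept = [f for f in FAMILIES if given(f)]
    return Fraction(sum(1 for f in kept if event(f)), len(kept))


def both_boys(f):
    return f == ("B", "B")


def at_least_one_boy(f):
    return "B" in f


def elder_is_boy(f):
    return f[0] == "B"


print(conditional(both_boys, at_least_one_boy))  # 1/3
print(conditional(both_boys, elder_is_boy))      # 1/2
```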

All combinations have identities; it’s just that some different identities are identical. A good example is in organic chemistry, where, for example, meta-xylene is the same stuff whether the methyl groups are at 12 and 4 o’clock, or 8 and 12. The tricky thing in organic is to not fool yourself into thinking that two molecules are different when they can be re-oriented into the same thing. It gets even trickier with stereoisomers, which look the same but are non-interchangeable mirror images.

You guys seem to be confusing combinations with permutations.

Mark

Ouch

Dave Dardinger’s description in post #34 is spot on, and the best explanation you’ll get (with clear enumeration).

But the confusion on this thread demonstrates exactly how statistics can be counter-intuitive…

I agree with Spence and Dave in 34, btw, though I haven’t read deep enough to understand why there’s a discussion going on regarding good and bad papers or boys and girls? ;)

Mark

#65 and #66

Mike, Dave

The problem says nothing about ordering or who is the eldest, etc. It is, as Mark T says, permutations, not combinations, that should be defined. In this case b,g is the same as g,b. In this and Monty’s paradox, Wunsch has given you an answer (2/3) and you have strived to derive it, and therefore it must be true. He has done this on purpose to show how you can justify a preconceived answer, 2/3 (or AGW), by working backwards, and because it shows how 2/3 can be derived, the calcs must be true. In fact they are nonsense. The answer to the 3 doors is 1/2, and both your door and the unopened door have the same probability. The number of combinations is irrelevant to the calculation of chance. The only determinant of chance is the number of doors. You start at 1/3, but it increases to 1/2 when a goat is revealed. If you want to stick to the combination method, then you must realise that when the goat is revealed, 2 combinations immediately become impossible. The number of possible combinations decreases with more data.

I’m beginning to see the value of this thread. Helping people learn careful, logical analysis is an important goal. And doing so without the additional pressure of the Hot Topic is at least entertaining for onlookers!

Here’s some questions for readers who see Gary’s logic as compelling:

Does it matter that you have already chosen a door?

Does it matter that Monty knows the answer, i.e. Monty knows whether you have chosen the car already?

You picked a door when there were N choices available.

Gary is saying the probability that your choice is correct increases as the other non-winners are selected.

Wouldn’t it be nice if it really worked that way??!! Let’s jack up the odds a bit.

Mosher’s Hitman offers you your choice of a million lottery tickets. He guarantees one is a winner.

You pick one. You know your odds of winning are one in a million, correct?

Now he tosses 999,998 losing tickets out of the pile. Now there’s the ticket you already picked, and the one remaining. Two tickets total.

By Gary’s logic, now the odds have changed: 50% odds that the ticket you are already holding is the winning ticket.

I’m kind of amazed it is not obvious to everyone, as several have stated, that the real offer here is your one ticket or ALL the other tickets: 1 or 999,999.

Scenarios (3 doors)

Door 1 Door 2 Door 3

Car Goat Goat

Goat Car Goat

Goat Goat Car

(Y = you choose, H = he opens, ? = unopened door)

For scenario 1 (last column: outcome if you switch)

Y H ? Lose

Y ? H Lose

H Y ? X (He cannot reveal Car)

? Y H Win

H ? Y X (He cannot reveal Car)

? H Y Win

For scenario 2

Y H ? X (He cannot reveal Car)

Y ? H Win

H Y ? Lose

? Y H Lose

H ? Y Win

? H Y X (He cannot reveal Car)

For scenario 3

Y H ? Win

Y ? H X (He cannot reveal Car)

H Y ? Win

? Y H X (He cannot reveal Car)

H ? Y Lose

? H Y Lose

Scenarios( 4 Doors )

Door 1 Door 2 Door 3 Door 4

Car Goat Goat Goat

Goat Car Goat Goat

Goat Goat Car Goat

Goat Goat Goat Car

(Y = you choose, H = he opens, ? = unopened door)

For scenario 1 (last column: outcome if you switch)

Y H H ? Lose

Y H ? H Lose

Y ? H H Lose

H Y H ? X

H Y ? H X

? Y H H Win

H H Y ? X

H ? Y H X

? H Y H Win

H H ? Y X

H ? H Y X

? H H Y Win

For scenario 2

Y H H ? X

Y H ? H X

Y ? H H Win

H Y H ? Lose

H Y ? H Lose

? Y H H Lose

H H Y ? X

H ? Y H Win

? H Y H X

H H ? Y X

H ? H Y Win

? H H Y X

For scenario 3

Y H H ? X

Y H ? H Win

Y ? H H X

H Y H ? X

H Y ? H Win

? Y H H X

H H Y ? Lose

H ? Y H Lose

? H Y H Lose

H H ? Y Win

H ? H Y X

? H H Y X

For scenario 4

Y H H ? Win

Y H ? H X

Y ? H H X

H Y H ? Win

H Y ? H X

? Y H H X

H H Y ? Win

H ? Y H X

? H Y H X

H H ? Y Lose

H ? H Y Lose

? H H Y Lose

Looks like it’s 50/50 to me. Where did I go wrong?

Re #73, PeteW

I think you need to weight the probability of actual outcomes to reflect the rejection of disallowed outcomes.

For example, where you say, at the end of the first block:

You are counting this as one outcome for ‘Win’ out of four available, where it is actually two outcomes of the six available.
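The weighting point can be made concrete by generating PeteW’s rows programmatically. In this sketch (the function name is hypothetical) each (car, your pick) pair has probability 1/n², and when the host has several legal ways to open doors, that probability is split among them. Counting rows equally reproduces PeteW’s 50/50; weighting them gives 2/3 for three doors and 3/4 for four.

```python
from fractions import Fraction
from itertools import combinations


def switch_win(n_doors):
    """Return (share of rows won by switching, probability of winning by switching).

    Rows are (car, your pick, doors the host opens); the host opens n_doors - 2
    of the doors you did not pick and never the car, as in PeteW's tables.
    """
    rows = wins = 0
    p_win = Fraction(0)
    for car in range(n_doors):
        for pick in range(n_doors):
            others = [d for d in range(n_doors) if d != pick]
            openings = [s for s in combinations(others, n_doors - 2) if car not in s]
            for opened in openings:
                last_closed = next(d for d in others if d not in opened)
                rows += 1
                if last_closed == car:  # switching wins on this row
                    wins += 1
                    # each (car, pick) pair has probability 1/n^2, shared among
                    # however many openings the host could choose from
                    p_win += Fraction(1, n_doors ** 2) / len(openings)
    return Fraction(wins, rows), p_win


for n in (3, 4):
    naive, weighted = switch_win(n)
    print(f"{n} doors: row count says {naive}, probability weighting says {weighted}")
```

The rows where you picked the car come in several host variants each, which is why treating every row as equally likely inflates the losing side.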

Ref 73

Pete

Think about the probability of outcome (how to maximize your chance of winning) rather than the probability of choice. The issues are Scenarios, Order of Play, Host’s Degree of Freedom, Host’s Knowledge, and Contestant’s Strategy.

You are invited to play the three door game. From a careful study of the history of this game, you have established that the following rules of play apply:

Rule 1. The contestant always goes first.

Rule 2. The host cannot duplicate the contestant’s choice.

Rule 3. The host’s choice never reveals the prize (for this to always happen, the host must know where the prize is located).

Rule 4. The contestant cannot switch to the host’s revealed choice (thereby deliberately losing the game).

The contestant has two playable strategies, Strategy Stick or Strategy Switch (other strategies, such as selling my option to choose, are not allowed).

For a three door game, the contestant has a two in three chance of making the wrong choice; therefore, two times out of three the host is forced to eliminate the other wrong door because he cannot reveal the prize. With Strategy Stick you win one time in three; with Strategy Switch you win two times in three.

Hi,

Thanks for the replies. Isn’t this just smoke-and-mirrors stuff? The contestant is left with a choice of 2 doors, behind one of which is the car. His initial choice is not relevant; it just adds to the tension of the occasion. He will always be left with a choice of 2, one of which is the car.

No. The doors are not equal.

Hi,

OK, I get it now, thanks for the insights.

Gary:

I think the interesting point is illustrated by Mr Pete’s example. However, to add to the fun, there is a new game on TV that appears to have some of the same dimensions as the three door game, with briefcases that have different amounts of cash in them. I am not sure of the rules, so perhaps someone else can extend the analysis to this game. The wrinkle is that the prizes behind the different doors vary in value. Does the same logic hold?

Statistical analyses of “runs” in good athletic performance have evidently ruled against the existence of the “hot hand”. While, in general, that might be the case, I would swear that I have experienced being in a zone, where I felt success was building on success. It may, however, truly not occur sufficiently often to have a significant presence in these studies.

To determine whether my feelings might have some merit, I searched and found this link, where the author discusses the issue of the hot hand in basketball shooting and the confusion in testing Bernoulli trials in athletic endeavors over whether the hot hand is an autocorrelation phenomenon or one of nonstationarity. I would like to think that my experiences, referenced above, fit the case for nonstationarity in that they occurred very infrequently and consisted of runs whose length, by my simplistic binomial-trial calculations and estimates of probable success rates, would be expected to occur by chance only once in a hundred thousand to tens of millions of times.

My feeling of being in the zone, with success at long odds of occurring by chance, came on three occasions: once in basketball shooting, once in a baseball throwing precision experience, and once in bowling. None of these was repeated before or after that one experience. Have others posting here had similar experiences?

I am certainly not beyond being shown that my intuition here is a form of self-deception.

http://www.stat.wisc.edu/~wardrop/papers/tr1007.pdf
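For the streak calculations, one subtlety is worth showing: the chance of a run of k successes at a moment chosen in advance (p**k) is much smaller than the chance of such a run turning up somewhere in a long series of attempts. The sketch below computes the latter exactly for independent trials, i.e. under the no-hot-hand null; the success rate, streak length and number of attempts are hypothetical, and it says nothing about the nonstationarity alternative.

```python
from fractions import Fraction


def prob_run(n, k, p):
    """Exact P(at least one run of k or more consecutive successes in n
    independent trials, each succeeding with probability p).

    Pass p as a Fraction (or a string such as "0.45") to keep it exact.
    """
    p = Fraction(p)
    # alive[j] = P(no run of length k so far and the current streak has length j)
    alive = [Fraction(0)] * k
    alive[0] = Fraction(1)
    hit = Fraction(0)
    for _ in range(n):
        nxt = [Fraction(0)] * k
        for j, pr in enumerate(alive):
            nxt[0] += pr * (1 - p)       # a miss resets the streak
            if j + 1 == k:
                hit += pr * p            # the streak just reached k
            else:
                nxt[j + 1] += pr * p     # the streak grows by one
        alive = nxt
    return hit


p = Fraction(1, 2)  # hypothetical success rate
k = 10              # hypothetical streak length
print(float(p ** k))                # a streak of k at one pre-chosen moment
print(float(prob_run(1000, k, p)))  # a streak of k somewhere in 1000 attempts
```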

re:73 PeteW

All your outcomes X (he cannot reveal the car) are really wins if you switched. Monty has the foreknowledge and freedom to switch to another door to open. If he did not have the freedom to open either of his doors every time, it would be a 50/50 game. But this isn’t a game where you pick 1, Monty picks 1, and the other is randomly tossed out. It’s you pick 1, Monty gets the rest, and he can pick either of his doors to open.

Hi,

I think the crux is assigning a probability to the remaining door: seeing it as ALL the non-contestant-selected doors rolled into one is what gets you to the correct answer (see MrPete).

Since there are a lot of people on the contest thread complaining that we need to get back to science, I thought I’d amplify something I said earlier. This sort of puzzle / game / paradox is a good example of the importance of knowledge.

I’m sure that anyone, even if they haven’t ‘got it’ on these puzzles, will agree that if the contestant were able to peek behind the curtain, that would drastically change the odds on whether to switch or hold. And since even small differences in odds, such as 1/2 vs 2/3, matter substantially, this should make people more careful about the sort of cherry picking which Steve has been discussing ever since this blog started. Small changes in how proxies are chosen can make big differences in whether final results are significant or not.
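A small illustration of how selection alone can manufacture apparent significance. Everything here is hypothetical: 1000 white-noise “proxies” of 50 points each are correlated with a linear trend, and the best one is kept. With 50 points, a naive two-sided 5% test flags |r| above roughly 0.28; a series chosen in advance rarely gets there, but the best of 1000 noise series typically clears it comfortably. This is a sketch of the screening problem, not a reconstruction of any particular study.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_noise(n_series=1000, length=50, seed=0):
    """Correlate pure-noise 'proxies' with a trend; return (first, best) r."""
    rng = random.Random(seed)
    target = list(range(length))  # hypothetical calibration target: a linear trend
    rs = [pearson([rng.gauss(0, 1) for _ in range(length)], target)
          for _ in range(n_series)]
    return rs[0], max(rs)


first, best = screen_noise()
print(f"one series chosen in advance: r = {first:.2f}")
print(f"best of 1000 noise series:    r = {best:.2f}")
```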

OK Dave, I’ll go for the bait. The doors-and-goats thread seems to have kept people from commenting on several other examples of counter-intuition raised in the paper. However, IMHO, Wunsch has not sufficiently explained why and how these examples are relevant.

The reference given is to a result in a well-known book by the probabilist William Feller. The result concerns the distribution of the proportion of time that accumulated winnings stay positive. Intuition says that, since winning and losing are equally likely, the proportion should be close to one-half. Feller shows that in fact proportions close to 0 or to 1 are each far more likely than proportions near one-half, a very non-intuitive result.

In a second example, I think that the author sort of misses the point:

The answer given is “The second is more probable.” This is neither surprising nor a demonstration that intuition may be wrong. A more common presentation of such a problem looks at it in the following way:

Let us call the distribution of marbles A-B-C-D-E where some person gets A marbles, another B, another C, etc. (but, unlike the example, the particular person getting a given number of marbles is not specified). Here the numbers are always written in non-increasing order. Game 1 has distribution 5-4-4-4-3 and the second game 4-4-4-4-4. This is similar to the way bridge players talk about the number of cards in each suit in a bridge hand. Which distribution is more likely: 4-4-4-4-4 or 5-4-4-4-3? The “so-called law of small numbers” quoted by the author seems to be violated here. (There are extra points for giving the ratio of the two probabilities. ;) )
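For anyone wanting the extra points, here is one way to compute the ratio. It assumes the 20 marbles are dealt independently, each to one of the 5 people with equal probability (a multinomial model; the problem as quoted above does not spell this out). For a specific assignment to named people, the even 4-4-4-4-4 split is the more likely, by 5 to 4; once the people are left unnamed, 5-4-4-4-3 can be arranged in 20 ways and becomes 16 times as likely.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def assignment_prob(counts, children=5):
    """P(child i gets counts[i] marbles) when each marble independently goes to
    one of `children` children with equal probability (multinomial model)."""
    n = sum(counts)
    ways = factorial(n)
    for c in counts:
        ways //= factorial(c)  # stays exact: every partial quotient is an integer
    return Fraction(ways, children ** n)


def pattern_prob(pattern):
    """P(the pattern occurs, whichever children get which counts)."""
    arrangements = set(permutations(pattern))
    return sum(assignment_prob(a) for a in arrangements)


uneven, even = (5, 4, 4, 4, 3), (4, 4, 4, 4, 4)
print(assignment_prob(uneven) / assignment_prob(even))  # 4/5 (named children)
print(pattern_prob(uneven) / pattern_prob(even))        # 16 (unnamed, as in bridge)
```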

And intuition here is correct. The long-term expectation for the proportion is one half. But the question of how long the series stays on either side of the 0 line during any given finite time interval is a totally separate question. The longer the observation interval, the more severe the expected departure from the zero line, and the longer it takes for the series to return to the expected value. This is a well-known property of the so-called random walk/drunkard’s walk.
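Feller’s result can be reproduced exactly for short games. The sketch uses the discrete arcsine law for a fair ±1 random walk, in the form Feller gives it, with a step counted as “ahead” when its segment of the path lies above the zero line, and checks it against brute-force enumeration of every walk. The function names are illustrative. For 20 tosses, spending all or none of the time ahead is each roughly three times as likely as an even 10-10 split, even though the expected proportion is one half.

```python
from fractions import Fraction
from itertools import product
from math import comb


def u(m):
    """P(a fair +/-1 walk is back at 0 after 2m steps) = C(2m, m) / 4**m."""
    return Fraction(comb(2 * m, m), 4 ** m)


def arcsine_exact(n):
    """Discrete arcsine law: entry k is P(2k of the 2n steps lie above the axis)."""
    return [u(k) * u(n - k) for k in range(n + 1)]


def arcsine_brute(n):
    """Same distribution by listing all 4**n walks of 2n fair steps."""
    counts = [0] * (n + 1)
    for steps in product((1, -1), repeat=2 * n):
        s = above = 0
        for step in steps:
            if s + (s + step) > 0:  # this step's segment lies above the axis
                above += 1
            s += step
        counts[above // 2] += 1     # the number of steps above is always even
    return [Fraction(c, 4 ** n) for c in counts]


dist = arcsine_exact(10)  # 20 tosses
for k, pr in enumerate(dist):
    print(f"{2 * k:2d} of 20 steps ahead: {float(pr):.3f}")
```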