<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Greene: I am not, nor have I ever been a member of a data-mining discipline</title>
	<atom:link href="http://climateaudit.org/2006/09/04/greene-i-am-not-nor-have-i-ever-been-a-member-of-a-data-mining-discipline/feed/" rel="self" type="application/rss+xml" />
	<link>http://climateaudit.org/2006/09/04/greene-i-am-not-nor-have-i-ever-been-a-member-of-a-data-mining-discipline/</link>
	<description>by Steve McIntyre</description>
	<lastBuildDate>Sat, 18 May 2013 17:16:05 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: John Hekman</title>
		<link>http://climateaudit.org/2006/09/04/greene-i-am-not-nor-have-i-ever-been-a-member-of-a-data-mining-discipline/#comment-62952</link>
		<dc:creator><![CDATA[John Hekman]]></dc:creator>
		<pubDate>Tue, 05 Sep 2006 15:34:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=805#comment-62952</guid>
		<description><![CDATA[There is one other distinction to be made in this type of modelling:

When you throw more and more variables into a regression without theory to back them up in order to get a significant result, that is called a fishing expedition.

When (a la Mann) you also select which data series to allow into the estimation in addition to throwing in variables at will, that is called shooting fish in a barrel.]]></description>
		<content:encoded><![CDATA[<p>There is one other distinction to be made in this type of modelling:</p>
<p>When you throw more and more variables into a regression without theory to back them up in order to get a significant result, that is called a fishing expedition.</p>
<p>When (a la Mann) you also select which data series to allow into the estimation in addition to throwing in variables at will, that is called shooting fish in a barrel.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sinan</title>
		<link>http://climateaudit.org/2006/09/04/greene-i-am-not-nor-have-i-ever-been-a-member-of-a-data-mining-discipline/#comment-62951</link>
		<dc:creator><![CDATA[Sinan]]></dc:creator>
		<pubDate>Tue, 05 Sep 2006 12:17:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=805#comment-62951</guid>
		<description><![CDATA[At least back in 1988 when I took my first Econometrics class, &quot;data mining&quot; had a specific meaning.

It is used to refer to the practice of looking at the data (graphing, computing descriptive statistics, running regressions, reading previous studies using the same data set etc) and *then* coming up with hypotheses that match the patterns in the data.

Since the hypotheses are formed to conform to whatever is seen in the data, there is no chance they can be rejected. Now, if the data set were used to form ideas about a model to be tested, then a separate but equally applicable data set were obtained and the model were tested against those data, then that would be a proper application.

&lt;em&gt;If you torture a data set enough, it will confess to what you want to hear&lt;/em&gt;.

When I look at climate related work, I do not see many testable hypotheses. I see a lot of maintained hypotheses. And, an alarming tendency to equate correlation with causation.

The idea seems to be to search for ways to construct series that correlate with human activity without seriously questioning the connection between actual temperature movements and the constructed series.

Sinan]]></description>
		<content:encoded><![CDATA[<p>At least back in 1988 when I took my first Econometrics class, &#8220;data mining&#8221; had a specific meaning.</p>
<p>It is used to refer to the practice of looking at the data (graphing, computing descriptive statistics, running regressions, reading previous studies using the same data set etc) and *then* coming up with hypotheses that match the patterns in the data.</p>
<p>Since the hypotheses are formed to conform to whatever is seen in the data, there is no chance they can be rejected. Now, if the data set were used to form ideas about a model to be tested, then a separate but equally applicable data set were obtained and the model were tested against those data, then that would be a proper application.</p>
<p><em>If you torture a data set enough, it will confess to what you want to hear</em>.</p>
<p>When I look at climate related work, I do not see many testable hypotheses. I see a lot of maintained hypotheses. And, an alarming tendency to equate correlation with causation.</p>
<p>The idea seems to be to search for ways to construct series that correlate with human activity without seriously questioning the connection between actual temperature movements and the constructed series.</p>
<p>Sinan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sonicfrog</title>
		<link>http://climateaudit.org/2006/09/04/greene-i-am-not-nor-have-i-ever-been-a-member-of-a-data-mining-discipline/#comment-62950</link>
		<dc:creator><![CDATA[sonicfrog]]></dc:creator>
		<pubDate>Tue, 05 Sep 2006 04:32:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=805#comment-62950</guid>
		<description><![CDATA[To me, the issue isn&#039;t so much data mining as it is output mining.]]></description>
		<content:encoded><![CDATA[<p>To me, the issue isn&#8217;t so much data mining as it is output mining.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John S</title>
		<link>http://climateaudit.org/2006/09/04/greene-i-am-not-nor-have-i-ever-been-a-member-of-a-data-mining-discipline/#comment-62949</link>
		<dc:creator><![CDATA[John S]]></dc:creator>
		<pubDate>Mon, 04 Sep 2006 21:16:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=805#comment-62949</guid>
		<description><![CDATA[The term &#039;data mining&#039; has been used in econometrics for quite some time. I suspect it predates the definitions above by quite some time. It was in use pejoratively in the early 1990s at least.

Guess I better go over and use the power of Wiki to fix that erroneous information.]]></description>
		<content:encoded><![CDATA[<p>The term &#8216;data mining&#8217; has been used in econometrics for quite some time. I suspect it predates the definitions above by quite some time. It was in use pejoratively in the early 1990s at least.</p>
<p>Guess I better go over and use the power of Wiki to fix that erroneous information.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve McIntyre</title>
		<link>http://climateaudit.org/2006/09/04/greene-i-am-not-nor-have-i-ever-been-a-member-of-a-data-mining-discipline/#comment-62948</link>
		<dc:creator><![CDATA[Steve McIntyre]]></dc:creator>
		<pubDate>Mon, 04 Sep 2006 21:10:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=805#comment-62948</guid>
		<description><![CDATA[Any articles on &quot;data mining&quot; usually distinguish between senses in which it is used.

&quot;Data snooping&quot; is another term which may be more apt for what&#039;s going on with the multiproxy studies.  The point is that Osborn and Briffa or Hegerl know in advance what particular proxies look like and so the data has been &quot;snooped&quot; in advance and just as in the econometric cases, statistical tests no longer apply as they would to fresh data.]]></description>
		<content:encoded><![CDATA[<p>Any articles on &#8220;data mining&#8221; usually distinguish between senses in which it is used.</p>
<p>&#8220;Data snooping&#8221; is another term which may be more apt for what&#8217;s going on with the multiproxy studies.  The point is that Osborn and Briffa or Hegerl know in advance what particular proxies look like and so the data has been &#8220;snooped&#8221; in advance and just as in the econometric cases, statistical tests no longer apply as they would to fresh data.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hans Erren</title>
		<link>http://climateaudit.org/2006/09/04/greene-i-am-not-nor-have-i-ever-been-a-member-of-a-data-mining-discipline/#comment-62947</link>
		<dc:creator><![CDATA[Hans Erren]]></dc:creator>
		<pubDate>Mon, 04 Sep 2006 21:02:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=805#comment-62947</guid>
		<description><![CDATA[Steve, it&#039;s spelled Oerlemans not Oerle-&lt;strong&gt;mann&lt;/strong&gt;-s. :-)

The term Data-mining I am familiar with, is a method to scan corporate archives for useful legacy data. Wiki describes misuse of statistical data as &quot;data-dredging&quot; or &quot;data-fishing&quot;.
&lt;blockquote&gt;Data mining has been defined as &quot;the nontrivial extraction of implicit, previously unknown, and potentially useful information from data&quot; [1] and &quot;the science of extracting useful information from large data sets or databases&quot; [2].

Used in the technical context of data warehousing and analysis, the term &quot;data mining&quot; is neutral. However, it sometimes has a more pejorative usage that implies imposing patterns (and particularly causal relationships) on data where none exist.[citation needed] This imposition of irrelevant, misleading or trivial attribute correlation is more properly criticized as &quot;data dredging&quot; in the statistical literature. Another term for this misuse of statistics is data fishing.&lt;/blockquote&gt;
http://en.wikipedia.org/wiki/Data_mining]]></description>
		<content:encoded><![CDATA[<p>Steve, it&#8217;s spelled Oerlemans not Oerle-<strong>mann</strong>-s. <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>The term Data-mining I am familiar with, is a method to scan corporate archives for useful legacy data. Wiki describes misuse of statistical data as &#8220;data-dredging&#8221; or &#8220;data-fishing&#8221;.</p>
<blockquote><p>Data mining has been defined as &#8220;the nontrivial extraction of implicit, previously unknown, and potentially useful information from data&#8221; [1] and &#8220;the science of extracting useful information from large data sets or databases&#8221; [2].</p>
<p>Used in the technical context of data warehousing and analysis, the term &#8220;data mining&#8221; is neutral. However, it sometimes has a more pejorative usage that implies imposing patterns (and particularly causal relationships) on data where none exist.[citation needed] This imposition of irrelevant, misleading or trivial attribute correlation is more properly criticized as &#8220;data dredging&#8221; in the statistical literature. Another term for this misuse of statistics is data fishing.</p></blockquote>
<p><a href="http://en.wikipedia.org/wiki/Data_mining" rel="nofollow">http://en.wikipedia.org/wiki/Data_mining</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pat Frank</title>
		<link>http://climateaudit.org/2006/09/04/greene-i-am-not-nor-have-i-ever-been-a-member-of-a-data-mining-discipline/#comment-62946</link>
		<dc:creator><![CDATA[Pat Frank]]></dc:creator>
		<pubDate>Mon, 04 Sep 2006 20:42:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=805#comment-62946</guid>
		<description><![CDATA[#2, I can only speak from my own experience in chemistry, which is that people put their own ideas ahead of professional friendship. Among the professionally ethical, their ideas must be analytically defensible. Among the less ethical (often those with the most ego-driven ambitions), their ideas are defended, whether ultimately analytically defensible or not. I&#039;ve seen intense debates, even publicly in conferences, between prominent scientists who knew one another well, but were contesting mutually exclusive interpretations. (I&#039;ve had a couple such debates myself.)

So, it often doesn&#039;t matter whether people know one another or not, or have met at conferences. It matters whether someone is advancing ideas that contradict one&#039;s own, and what one thinks about the meaning of the relevant data. Ideas are, after all, the currency of science. Those who have the best ones are the richest. That&#039;s what the fight&#039;s all about, and the passions are real.

People of good personality, like Rob Wilson, almost always will gracefully concede if they&#039;re shown wrong through an unarguable analysis. Others will show bad grace, right or wrong. The personal lesson from practicing science is that it is always better to not be ego-involved in the outcome. The challenge is to be thoughtful, not to be right. Nature decides who&#039;s right, and nature is always trickier than anyone knows.

The critical ingredient leading to clarity, as always, is that data and methods be fully exposed. I&#039;ve never, ever, seen anything in chemistry like the deliberate and dishonest obscurantism that infects climate science.]]></description>
		<content:encoded><![CDATA[<p>#2, I can only speak from my own experience in chemistry, which is that people put their own ideas ahead of professional friendship. Among the professionally ethical, their ideas must be analytically defensible. Among the less ethical (often those with the most ego-driven ambitions), their ideas are defended, whether ultimately analytically defensible or not. I&#8217;ve seen intense debates, even publicly in conferences, between prominent scientists who knew one another well, but were contesting mutually exclusive interpretations. (I&#8217;ve had a couple such debates myself.)</p>
<p>So, it often doesn&#8217;t matter whether people know one another or not, or have met at conferences. It matters whether someone is advancing ideas that contradict one&#8217;s own, and what one thinks about the meaning of the relevant data. Ideas are, after all, the currency of science. Those who have the best ones are the richest. That&#8217;s what the fight&#8217;s all about, and the passions are real.</p>
<p>People of good personality, like Rob Wilson, almost always will gracefully concede if they&#8217;re shown wrong through an unarguable analysis. Others will show bad grace, right or wrong. The personal lesson from practicing science is that it is always better to not be ego-involved in the outcome. The challenge is to be thoughtful, not to be right. Nature decides who&#8217;s right, and nature is always trickier than anyone knows.</p>
<p>The critical ingredient leading to clarity, as always, is that data and methods be fully exposed. I&#8217;ve never, ever, seen anything in chemistry like the deliberate and dishonest obscurantism that infects climate science.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve McIntyre</title>
		<link>http://climateaudit.org/2006/09/04/greene-i-am-not-nor-have-i-ever-been-a-member-of-a-data-mining-discipline/#comment-62945</link>
		<dc:creator><![CDATA[Steve McIntyre]]></dc:creator>
		<pubDate>Mon, 04 Sep 2006 20:17:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=805#comment-62945</guid>
		<description><![CDATA[It&#039;s common sense that you become less critical of people when you know them. I&#039;ve corresponded with Rob Wilson; he&#039;s been pleasant to me and I&#039;d be pretty reluctant to get hard-edged with him.

Wouldn&#039;t all the climate conferences have an impact? When I went to university many years ago, I&#039;m sure that professors weren&#039;t haring off to conferences all over the world every few weeks. Now they see each other a lot and that must surely smooth things over.]]></description>
		<content:encoded><![CDATA[<p>It&#8217;s common sense that you become less critical of people when you know them. I&#8217;ve corresponded with Rob Wilson; he&#8217;s been pleasant to me and I&#8217;d be pretty reluctant to get hard-edged with him.</p>
<p>Wouldn&#8217;t all the climate conferences have an impact? When I went to university many years ago, I&#8217;m sure that professors weren&#8217;t haring off to conferences all over the world every few weeks. Now they see each other a lot and that must surely smooth things over.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pat Frank</title>
		<link>http://climateaudit.org/2006/09/04/greene-i-am-not-nor-have-i-ever-been-a-member-of-a-data-mining-discipline/#comment-62944</link>
		<dc:creator><![CDATA[Pat Frank]]></dc:creator>
		<pubDate>Mon, 04 Sep 2006 20:05:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=805#comment-62944</guid>
		<description><![CDATA[There is a climatologically apt irony present in the immediately preceding paper in the same issue of the journal.  Here&#039;s the abstract of the paper by Pagan and Veall, with the relevant parts bolded:

&quot;&lt;em&gt;We maintain that the actions of researchers show that data mining is a necessary part of econometric inquiry. We analyse this phenomenon using the analogy of an industry producing a product (econometric analyses). There is a risk of selective reporting as Mayer indicates but we argue that &lt;strong&gt;other researchers (competition) will ensure that the sensitivity of truly important findings is checked. Hence, initial researchers have an incentive to analyse sensitivity from the beginning and so produce a quality product&lt;/strong&gt;. Some suggestions are made towards encouraging this process. The &#039;general to specific&#039; approach to data mining as promoted by Hoover and Perez can be valuable but it is premature to eliminate other strategies.&lt;/em&gt;&quot;

The bolded part immediately elucidates the problem brought to proxy climatology by Michael Mann. His obscure language in publication, his secrecy concerning methodology, and his continuing obstructionism have frustrated the &quot;competition&quot; that should have had the opportunity to check his results. (One might say the same about Phil Jones.)

This analogy has the further benefit of putting Wegman&#039;s social network analysis in its proper light. That is, by roping most proxy workers into a collaborative relationship, Mann (deliberately or not) ensured there is no competition at all. He produced in the proxy community the business-equivalent of an interlocking directorate. The result was monopoly and restraint of trade. The &quot;&lt;em&gt;incentive to analyse sensitivity from the beginning and so produce a quality product&lt;/em&gt;&quot; was entirely short-circuited.

As a result, the product was not quality, as you and Ross have shown.

Unfortunately for us all, Mann&#039;s product had an addictive ingredient that has enslaved the minds of his consumer base. They are very angry about being exposed and very threatened by the incipient removal of a product that eases so much (psychological) pain, and are going to great lengths to keep it in the market.

A. R. Pagan and M. R. Veall (2000) &quot;Data mining and the econometrics industry: comments on the papers of Mayer and of Hoover and Perez&quot;  Journal of Economic Methodology 7(2) 211-216]]></description>
		<content:encoded><![CDATA[<p>There is a climatologically apt irony present in the immediately preceding paper in the same issue of the journal.  Here&#8217;s the abstract of the paper by Pagan and Veall, with the relevant parts bolded:</p>
<p>&#8220;<em>We maintain that the actions of researchers show that data mining is a necessary part of econometric inquiry. We analyse this phenomenon using the analogy of an industry producing a product (econometric analyses). There is a risk of selective reporting as Mayer indicates but we argue that <strong>other researchers (competition) will ensure that the sensitivity of truly important findings is checked. Hence, initial researchers have an incentive to analyse sensitivity from the beginning and so produce a quality product</strong>. Some suggestions are made towards encouraging this process. The &#8216;general to specific&#8217; approach to data mining as promoted by Hoover and Perez can be valuable but it is premature to eliminate other strategies.</em>&#8221;</p>
<p>The bolded part immediately elucidates the problem brought to proxy climatology by Michael Mann. His obscure language in publication, his secrecy concerning methodology, and his continuing obstructionism have frustrated the &#8220;competition&#8221; that should have had the opportunity to check his results. (One might say the same about Phil Jones.)</p>
<p>This analogy has the further benefit of putting Wegman&#8217;s social network analysis in its proper light. That is, by roping most proxy workers into a collaborative relationship, Mann (deliberately or not) ensured there is no competition at all. He produced in the proxy community the business-equivalent of an interlocking directorate. The result was monopoly and restraint of trade. The &#8220;<em>incentive to analyse sensitivity from the beginning and so produce a quality product</em>&#8221; was entirely short-circuited.</p>
<p>As a result, the product was not quality, as you and Ross have shown.</p>
<p>Unfortunately for us all, Mann&#8217;s product had an addictive ingredient that has enslaved the minds of his consumer base. They are very angry about being exposed and very threatened by the incipient removal of a product that eases so much (psychological) pain, and are going to great lengths to keep it in the market.</p>
<p>A. R. Pagan and M. R. Veall (2000) &#8220;Data mining and the econometrics industry: comments on the papers of Mayer and of Hoover and Perez&#8221;  Journal of Economic Methodology 7(2) 211-216</p>
]]></content:encoded>
	</item>
</channel>
</rss>
