<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Porting RegEM to R #1</title>
	<atom:link href="http://climateaudit.org/2009/02/17/porting-regem-to-r-1/feed/" rel="self" type="application/rss+xml" />
	<link>http://climateaudit.org/2009/02/17/porting-regem-to-r-1/</link>
	<description>by Steve McIntyre</description>
	<lastBuildDate>Tue, 21 May 2013 05:19:05 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Jeff Id</title>
		<link>http://climateaudit.org/2009/02/17/porting-regem-to-r-1/#comment-177241</link>
		<dc:creator><![CDATA[Jeff Id]]></dc:creator>
		<pubDate>Wed, 18 Feb 2009 20:10:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=5252#comment-177241</guid>
		<description><![CDATA[Re: &lt;a href=&quot;#comment-328359&quot; rel=&quot;nofollow&quot;&gt;Ryan O (#40)&lt;/a&gt;,

For sure.  Not only is there no way to know, there is no effort to check.]]></description>
		<content:encoded><![CDATA[<p>Re: <a href="#comment-328359" rel="nofollow">Ryan O (#40)</a>,</p>
<p>For sure.  Not only is there no way to know, there is no effort to check.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ryan O</title>
		<link>http://climateaudit.org/2009/02/17/porting-regem-to-r-1/#comment-177240</link>
		<dc:creator><![CDATA[Ryan O]]></dc:creator>
		<pubDate>Wed, 18 Feb 2009 15:43:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=5252#comment-177240</guid>
		<description><![CDATA[Re: &lt;a href=&quot;#comment-328352&quot; rel=&quot;nofollow&quot;&gt;Mike B (#39)&lt;/a&gt;, There&#039;s no way to know.  In the case of the AWS recon, it is very possible to have undetected, spurious correlations because some of the station data is so short.  For longer series - like the manned stations - this is less likely because a spurious correlation is unlikely to persist for long periods.  But with the shorter duration of many of the AWS stations, it could definitely be an issue.]]></description>
		<content:encoded><![CDATA[<p>Re: <a href="#comment-328352" rel="nofollow">Mike B (#39)</a>, There&#8217;s no way to know.  In the case of the AWS recon, it is very possible to have undetected, spurious correlations because some of the station data is so short.  For longer series &#8211; like the manned stations &#8211; this is less likely because a spurious correlation is unlikely to persist for long periods.  But with the shorter duration of many of the AWS stations, it could definitely be an issue.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike B</title>
		<link>http://climateaudit.org/2009/02/17/porting-regem-to-r-1/#comment-177239</link>
		<dc:creator><![CDATA[Mike B]]></dc:creator>
		<pubDate>Wed, 18 Feb 2009 15:33:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=5252#comment-177239</guid>
		<description><![CDATA[Re: &lt;a href=&quot;#comment-328229&quot; rel=&quot;nofollow&quot;&gt;Jeff C. (#23)&lt;/a&gt;,

Thanks Jeff.  And I get that part (i.e. the algorithm implicitly &quot;recognizes&quot; the higher correlation between two stations 50 miles apart than between two stations 950 miles apart).

But my poorly posed question(s) are slightly different:

a) What if station B, via error or spurious correlation, is more highly correlated with station C 950 miles away than it is with station A 50 miles away?

and more generally:

b) Even if things are working well, you&#039;re still stuck (in your hypothetical) with 950 miles of guesswork between two stations.  And RegEM seems to crank along indifferent to this.]]></description>
		<content:encoded><![CDATA[<p>Re: <a href="#comment-328229" rel="nofollow">Jeff C. (#23)</a>,</p>
<p>Thanks Jeff.  And I get that part (i.e. the algorithm implicitly &#8220;recognizes&#8221; the higher correlation between two stations 50 miles apart than between two stations 950 miles apart).</p>
<p>But my poorly posed question(s) are slightly different:</p>
<p>a) What if station B, via error or spurious correlation, is more highly correlated with station C 950 miles away than it is with station A 50 miles away?</p>
<p>and more generally:</p>
<p>b) Even if things are working well, you&#8217;re still stuck (in your hypothetical) with 950 miles of guesswork between two stations.  And RegEM seems to crank along indifferent to this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bender</title>
		<link>http://climateaudit.org/2009/02/17/porting-regem-to-r-1/#comment-177238</link>
		<dc:creator><![CDATA[bender]]></dc:creator>
		<pubDate>Wed, 18 Feb 2009 13:54:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=5252#comment-177238</guid>
		<description><![CDATA[Re: &lt;a href=&quot;#comment-328295&quot; rel=&quot;nofollow&quot;&gt;Alan S. Blue (#32)&lt;/a&gt;,
I just asked the same question in the other thread. Why infill if your sole purpose is to estimate a trend? Unless you are infilling for one data type based on patterns in another data type ...]]></description>
		<content:encoded><![CDATA[<p>Re: <a href="#comment-328295" rel="nofollow">Alan S. Blue (#32)</a>,<br />
I just asked the same question in the other thread. Why infill if your sole purpose is to estimate a trend? Unless you are infilling for one data type based on patterns in another data type &#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bender</title>
		<link>http://climateaudit.org/2009/02/17/porting-regem-to-r-1/#comment-177237</link>
		<dc:creator><![CDATA[bender]]></dc:creator>
		<pubDate>Wed, 18 Feb 2009 13:50:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=5252#comment-177237</guid>
		<description><![CDATA[Re: &lt;a href=&quot;#comment-328305&quot; rel=&quot;nofollow&quot;&gt;Jeff Id (#33)&lt;/a&gt;,

&lt;blockquote&gt;It seems like statistics should have a frequency based r val.&lt;/blockquote&gt;

For periodic time-series you use cross-spectral coherence. It is interpreted like Pearson&#039;s r with respect to a specific frequency range. But for aperiodic time-series the spectral methods (frequency domain) are not so good. There is nothing illegal about filtering and correlating based on the filtered component series. This is not a statistical issue. It&#039;s a signal processing issue. The idea is that weather noise, climate noise, and forcing signals occur on different characteristic timescales and you filter on that basis. This is not a statistical proposition; it is a climatological proposition. So it&#039;s out of the hands of the statisticians. You can ask a statistician&#039;s opinion, but they are going to tell you that it makes sense, if that&#039;s what the physics dictates. Two statisticians I would trust on this are Bloomfield and Nychka. (And of course, Wegman.)]]></description>
		<content:encoded><![CDATA[<p>Re: <a href="#comment-328305" rel="nofollow">Jeff Id (#33)</a>,</p>
<blockquote><p>It seems like statistics should have a frequency based r val.</p></blockquote>
<p>For periodic time-series you use cross-spectral coherence. It is interpreted like Pearson&#8217;s r with respect to a specific frequency range. But for aperiodic time-series the spectral methods (frequency domain) are not so good. There is nothing illegal about filtering and correlating based on the filtered component series. This is not a statistical issue. It&#8217;s a signal processing issue. The idea is that weather noise, climate noise, and forcing signals occur on different characteristic timescales and you filter on that basis. This is not a statistical proposition; it is a climatological proposition. So it&#8217;s out of the hands of the statisticians. You can ask a statistician&#8217;s opinion, but they are going to tell you that it makes sense, if that&#8217;s what the physics dictates. Two statisticians I would trust on this are Bloomfield and Nychka. (And of course, Wegman.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Craig Loehle</title>
		<link>http://climateaudit.org/2009/02/17/porting-regem-to-r-1/#comment-177236</link>
		<dc:creator><![CDATA[Craig Loehle]]></dc:creator>
		<pubDate>Wed, 18 Feb 2009 13:47:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=5252#comment-177236</guid>
		<description><![CDATA[Re: &lt;a href=&quot;#comment-328289&quot; rel=&quot;nofollow&quot;&gt;Molon Labe (#31)&lt;/a&gt;, Oh I just love all the tech jokes.  You had me completely sucked in for a minute.  I notice when the discussion gets really mathematical the trolls stay away: is math like wolfsbane or garlic?]]></description>
		<content:encoded><![CDATA[<p>Re: <a href="#comment-328289" rel="nofollow">Molon Labe (#31)</a>, Oh I just love all the tech jokes.  You had me completely sucked in for a minute.  I notice when the discussion gets really mathematical the trolls stay away: is math like wolfsbane or garlic?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: MrPete</title>
		<link>http://climateaudit.org/2009/02/17/porting-regem-to-r-1/#comment-177235</link>
		<dc:creator><![CDATA[MrPete]]></dc:creator>
		<pubDate>Wed, 18 Feb 2009 09:47:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=5252#comment-177235</guid>
		<description><![CDATA[Re: &lt;a href=&quot;#comment-328295&quot; rel=&quot;nofollow&quot;&gt;Alan S. Blue (#32)&lt;/a&gt;, that sounds like a great methodology to me. But what do I know, I&#039;m certainly no statistician. (Note: Alan&#039;s comment must be read carefully; many times when referring to &quot;average&quot; or &quot;weight&quot; he&#039;s referring to average or weight &lt;i&gt;of the errors&lt;/i&gt;, not of temperature.)]]></description>
		<content:encoded><![CDATA[<p>Re: <a href="#comment-328295" rel="nofollow">Alan S. Blue (#32)</a>, that sounds like a great methodology to me. But what do I know, I&#8217;m certainly no statistician. (Note: Alan&#8217;s comment must be read carefully; many times when referring to &#8220;average&#8221; or &#8220;weight&#8221; he&#8217;s referring to average or weight <i>of the errors</i>, not of temperature.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Geoff Sherrington</title>
		<link>http://climateaudit.org/2009/02/17/porting-regem-to-r-1/#comment-177234</link>
		<dc:creator><![CDATA[Geoff Sherrington]]></dc:creator>
		<pubDate>Wed, 18 Feb 2009 09:02:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=5252#comment-177234</guid>
		<description><![CDATA[Re your graphs in header:

I&#039;m not too sure of the validity of correlation coefficients on annual data averaged from daily data. Using years 2004 (less a day for leap year) and 2005 for Macquarie Island, the CORREL Excel function gives 0.555 for Tmax and 0.453 for Tmin. Years chosen at random.

The stats package for year 2005 daily gives for Tmax

Mean	7.039452055
Standard Error	0.116673752
Median	6.9
Mode	6.6
Standard Deviation	2.229048909
Sample Variance	4.96865904
Kurtosis	0.086984527
Skewness	-0.312414358
Range	13.8
Minimum	-0.2
Maximum	13.6
Sum	2569.4
Count	365

and for Tmin

Mean	3.536191781
Standard Error	0.142589332
Median	3.9
Mode	3.0
Standard Deviation	2.724165366
Sample Variance	7.421076941
Kurtosis	-0.452491633
Skewness	-0.457868522
Range	13.1
Minimum	-4.3
Maximum	8.8
Sum	1290.71
Count	365

How do you get correlations of 0.75 in your graphs when taking a favourable case I can only manage 0.45-0.55? Am I missing a square root terminologically somewhere? If not, surely correlation coefficients should decrease as you go further out on a data limb?]]></description>
		<content:encoded><![CDATA[<p>Re your graphs in header:</p>
<p>I&#8217;m not too sure of the validity of correlation coefficients on annual data averaged from daily data. Using years 2004 (less a day for leap year) and 2005 for Macquarie Island, the CORREL Excel function gives 0.555 for Tmax and 0.453 for Tmin. Years chosen at random.</p>
<p>The stats package for year 2005 daily gives for Tmax</p>
<p>Mean	7.039452055<br />
Standard Error	0.116673752<br />
Median	6.9<br />
Mode	6.6<br />
Standard Deviation	2.229048909<br />
Sample Variance	4.96865904<br />
Kurtosis	0.086984527<br />
Skewness	-0.312414358<br />
Range	13.8<br />
Minimum	-0.2<br />
Maximum	13.6<br />
Sum	2569.4<br />
Count	365</p>
<p>and for Tmin</p>
<p>Mean	3.536191781<br />
Standard Error	0.142589332<br />
Median	3.9<br />
Mode	3.0<br />
Standard Deviation	2.724165366<br />
Sample Variance	7.421076941<br />
Kurtosis	-0.452491633<br />
Skewness	-0.457868522<br />
Range	13.1<br />
Minimum	-4.3<br />
Maximum	8.8<br />
Sum	1290.71<br />
Count	365</p>
<p>How do you get correlations of 0.75 in your graphs when taking a favourable case I can only manage 0.45-0.55? Am I missing a square root terminologically somewhere? If not, surely correlation coefficients should decrease as you go further out on a data limb?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Id</title>
		<link>http://climateaudit.org/2009/02/17/porting-regem-to-r-1/#comment-177233</link>
		<dc:creator><![CDATA[Jeff Id]]></dc:creator>
		<pubDate>Wed, 18 Feb 2009 07:36:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=5252#comment-177233</guid>
		<description><![CDATA[Re: &lt;a href=&quot;#comment-328284&quot; rel=&quot;nofollow&quot;&gt;bender (#30)&lt;/a&gt;,

I was hoping for a rule or something.  It seems like statistics should have a frequency based r val.  I&#039;m just an engineer though and have not yet attempted to prove anything with a correlation.  Things either work or they don&#039;t and in my experience the god of physics pretty well lets me know when I messed up.

By the way, I did some slightly more complete PC analysis of the Sat data.  I&#039;m completely exhausted but if I didn&#039;t mess it up, Dr. Steig is going to have a headache tomorrow.

http://noconsensus.wordpress.com/2009/02/18/the-three-pcs-of-the-antarctic/]]></description>
		<content:encoded><![CDATA[<p>Re: <a href="#comment-328284" rel="nofollow">bender (#30)</a>,</p>
<p>I was hoping for a rule or something.  It seems like statistics should have a frequency based r val.  I&#8217;m just an engineer though and have not yet attempted to prove anything with a correlation.  Things either work or they don&#8217;t and in my experience the god of physics pretty well lets me know when I messed up.</p>
<p>By the way, I did some slightly more complete PC analysis of the Sat data.  I&#8217;m completely exhausted but if I didn&#8217;t mess it up, Dr. Steig is going to have a headache tomorrow.</p>
<p><a href="http://noconsensus.wordpress.com/2009/02/18/the-three-pcs-of-the-antarctic/" rel="nofollow">http://noconsensus.wordpress.com/2009/02/18/the-three-pcs-of-the-antarctic/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alan S. Blue</title>
		<link>http://climateaudit.org/2009/02/17/porting-regem-to-r-1/#comment-177232</link>
		<dc:creator><![CDATA[Alan S. Blue]]></dc:creator>
		<pubDate>Wed, 18 Feb 2009 06:08:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=5252#comment-177232</guid>
		<description><![CDATA[I&#039;m still missing the fundamental reason for infilling or inferring data in the first place.

First break the spatial area into gridcells (which is already done). Then determine &quot;The Gridcell Temperature&quot; in each gridcell at each time. Gridcells with no data are exactly that - gridcells with no data. Gridcells with 30 datapoints at a given time mean that you&#039;ve probably substantially reduced the error on that particular datapoint.

When it becomes time to estimate the average temperature of the entire area, you&#039;re doing an average weighted by the errors on the individual gridcells. The &#039;30 point&#039; cell is going to get a substantially heavier weight than the zero data cell (which would actually get a zero weight). Along with the &quot;Temperature Maps&quot; that we keep getting, you&#039;d be able to show an &quot;Error Map.&quot;

At the next level, when you&#039;re moving from individual times to trend across the years, you&#039;ve got individual error bars on each time. If there&#039;s very little data in 1973, you can see it flat out - because the error for the 1973 data would be enormous. Whereas when you&#039;re doing the same thing for a gridcell in 2003 with full coverage, you should have nice tight error bars.

So in fitting a trend, you also get to weight the yearly data by its own individual merit. Rock solid data after 2000? Excellent, strongly weighted. All data from any random year missing? That&#039;s fine - you&#039;ve made no assumptions about missing data whatsoever.]]></description>
		<content:encoded><![CDATA[<p>I&#8217;m still missing the fundamental reason for infilling or inferring data in the first place.</p>
<p>First break the spatial area into gridcells (which is already done). Then determine &#8220;The Gridcell Temperature&#8221; in each gridcell at each time. Gridcells with no data are exactly that &#8211; gridcells with no data. Gridcells with 30 datapoints at a given time mean that you&#8217;ve probably substantially reduced the error on that particular datapoint.</p>
<p>When it becomes time to estimate the average temperature of the entire area, you&#8217;re doing an average weighted by the errors on the individual gridcells. The &#8217;30 point&#8217; cell is going to get a substantially heavier weight than the zero data cell (which would actually get a zero weight). Along with the &#8220;Temperature Maps&#8221; that we keep getting, you&#8217;d be able to show an &#8220;Error Map.&#8221;</p>
<p>At the next level, when you&#8217;re moving from individual times to trend across the years, you&#8217;ve got individual error bars on each time. If there&#8217;s very little data in 1973, you can see it flat out &#8211; because the error for the 1973 data would be enormous. Whereas when you&#8217;re doing the same thing for a gridcell in 2003 with full coverage, you should have nice tight error bars.</p>
<p>So in fitting a trend, you also get to weight the yearly data by its own individual merit. Rock solid data after 2000? Excellent, strongly weighted. All data from any random year missing? That&#8217;s fine &#8211; you&#8217;ve made no assumptions about missing data whatsoever.</p>
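<p>A minimal sketch of the error-weighted average described above, in Python. It assumes that weighting by the errors means inverse-variance weights (1/sigma^2), which is one common choice and not necessarily the only one that fits the description. The function name <code>weighted_mean</code> and the sample numbers are hypothetical, made up purely for illustration. A gridcell with no data simply drops out, so nothing is infilled.</p>

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its standard error.

    values: gridcell temperatures; None marks a cell with no data.
    sigmas: one-sigma error for each cell (ignored where the value is None).
    """
    # Keep only cells that actually have data; empty cells get zero weight.
    pairs = [(v, s) for v, s in zip(values, sigmas) if v is not None]
    if not pairs:
        return None, None  # no data at all: no estimate, no assumptions
    # Assumed weighting scheme: weight = 1 / sigma^2.
    weights = [1.0 / (s * s) for _, s in pairs]
    total = sum(weights)
    mean = sum(w * v for w, (v, _) in zip(weights, pairs)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical numbers: a well-sampled cell, a sparse cell, an empty cell.
cells = [-10.0, -14.0, None]
errors = [0.5, 2.0, None]
mean, err = weighted_mean(cells, errors)
```

<p>With these made-up numbers the well-sampled cell dominates: the result sits close to -10.0 and the combined error comes out slightly below 0.5. The same weights could be reused when fitting a trend across years, so that a sparsely sampled year counts for less.</p>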
]]></content:encoded>
	</item>
</channel>
</rss>
