<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Hansen Step 1</title>
	<atom:link href="http://climateaudit.org/2007/09/18/hansen-step-1/feed/" rel="self" type="application/rss+xml" />
	<link>http://climateaudit.org/2007/09/18/hansen-step-1/</link>
	<description>by Steve McIntyre</description>
	<lastBuildDate>Tue, 18 Jun 2013 04:29:58 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Steve McIntyre</title>
		<link>http://climateaudit.org/2007/09/18/hansen-step-1/#comment-106958</link>
		<dc:creator><![CDATA[Steve McIntyre]]></dc:creator>
		<pubDate>Sat, 22 Sep 2007 13:51:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=2083#comment-106958</guid>
		<description><![CDATA[#33. yes we&#039;re on  the same page.]]></description>
		<content:encoded><![CDATA[<p>#33. yes we&#8217;re on  the same page.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: BarryW</title>
		<link>http://climateaudit.org/2007/09/18/hansen-step-1/#comment-106957</link>
		<dc:creator><![CDATA[BarryW]]></dc:creator>
		<pubDate>Sat, 22 Sep 2007 13:42:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=2083#comment-106957</guid>
		<description><![CDATA[Re #32

Sorry to bother, but my reading of the subroutine is that if no record is ranked higher than &#039;UNKNOWN&#039; then the first record with the longest length is used.  If there is at least one record with a rank greater than &#039;UNKNOWN&#039; then the first record with the highest rank is used regardless of length.]]></description>
		<content:encoded><![CDATA[<p>Re #32</p>
<p>Sorry to bother, but my reading of the subroutine is that if no record is ranked higher than &#8216;UNKNOWN&#8217; then the first record with the longest length is used.  If there is at least one record with a rank greater than &#8216;UNKNOWN&#8217; then the first record with the highest rank is used regardless of length.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve McIntyre</title>
		<link>http://climateaudit.org/2007/09/18/hansen-step-1/#comment-106956</link>
		<dc:creator><![CDATA[Steve McIntyre]]></dc:creator>
		<pubDate>Sat, 22 Sep 2007 02:36:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=2083#comment-106956</guid>
		<description><![CDATA[Here&#039;s Hansen&#039;s ranking subroutine from comb_records.py. Picking the right-hand series only works when there&#039;s an MCDW record (which is always on the right). John, in the cases that you&#039;ve identified, looking at this script: neither a USHCN nor an MCDW record applies, so the ranks are tied, and the lengths are tied as well. Examining the details of the program: because the second record is not strictly longer than the first, the first record keeps priority. The ordering of records here is taken over from GHCN. I asked GHCN whether they had any concordance of the sources of their versions and there is none online; they said that there may be some information offline somewhere and they are looking into it (even if they give an answer that I don&#039;t like, the NOAA folks do so pleasantly, as opposed to Hansen&#039;s crowd).

Anyway this confirms John G&#039;s point and I need to modify my algorithm a little. As to why the ordering should seemingly be non-commutative in its impact on trends (if indeed that is the case), that should prove another worthy topic for investigation.

&lt;blockquote&gt;def get_best(records):
    ranks = {&#039;MCDW&#039;: 4, &#039;USHCN&#039;: 3, &#039;SUMOFDAY&#039;: 2, &#039;UNKNOWN&#039;: 1}
    best = 1
    for rec_id, record in records.items():
        source = record[&#039;dict&#039;][&#039;source&#039;]
        rank = ranks[source]
        if rank &gt; best:
            best = rank
            best_rec = record
            best_id = rec_id
    if best &gt; 1:
        return best_rec, best_id
    longest = 0
    for rec_id, record in records.items():
        length = record[&#039;length&#039;]
        if length &gt; longest:
            longest = length
            longest_rec = record
            longest_id = rec_id
    return longest_rec, longest_id
&lt;/blockquote&gt;]]></description>
		<content:encoded><![CDATA[<p>Here&#8217;s Hansen&#8217;s ranking subroutine from comb_records.py. Picking the right-hand series only works when there&#8217;s an MCDW record (which is always on the right). John, in the cases that you&#8217;ve identified, looking at this script: neither a USHCN nor an MCDW record applies, so the ranks are tied, and the lengths are tied as well. Examining the details of the program: because the second record is not strictly longer than the first, the first record keeps priority. The ordering of records here is taken over from GHCN. I asked GHCN whether they had any concordance of the sources of their versions and there is none online; they said that there may be some information offline somewhere and they are looking into it (even if they give an answer that I don&#8217;t like, the NOAA folks do so pleasantly, as opposed to Hansen&#8217;s crowd).</p>
<p>Anyway this confirms John G&#8217;s point and I need to modify my algorithm a little. As to why the ordering should seemingly be non-commutative in its impact on trends (if indeed that is the case), that should prove another worthy topic for investigation.</p>
<blockquote><p>def get_best(records):<br />
    ranks = {'MCDW': 4, 'USHCN': 3, 'SUMOFDAY': 2, 'UNKNOWN': 1}<br />
    best = 1<br />
    for rec_id, record in records.items():<br />
        source = record['dict']['source']<br />
        rank = ranks[source]<br />
        if rank &gt; best:<br />
            best = rank<br />
            best_rec = record<br />
            best_id = rec_id<br />
    if best &gt; 1:<br />
        return best_rec, best_id<br />
    longest = 0<br />
    for rec_id, record in records.items():<br />
        length = record['length']<br />
        if length &gt; longest:<br />
            longest = length<br />
            longest_rec = record<br />
            longest_id = rec_id<br />
    return longest_rec, longest_id
</p></blockquote>
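<p>To make the tie case concrete, here is a minimal sketch that restates the routine quoted above and runs it on two made-up stations. The record ids and lengths are hypothetical, not actual GHCN data, and the sketch assumes the records are iterated in insertion order (as Python 3.7+ dicts do); whether the original script saw the records in that order is an assumption.</p>

```python
def get_best(records):
    """Restatement of the routine quoted above."""
    ranks = {'MCDW': 4, 'USHCN': 3, 'SUMOFDAY': 2, 'UNKNOWN': 1}
    best = 1
    for rec_id, record in records.items():
        source = record['dict']['source']
        rank = ranks[source]
        # Strict comparison: a record must outrank the current best to replace it.
        if rank > best:
            best, best_rec, best_id = rank, record, rec_id
    if best > 1:
        return best_rec, best_id
    longest = 0
    for rec_id, record in records.items():
        length = record['length']
        # Strict comparison again: an equally long later record does not win.
        if length > longest:
            longest, longest_rec, longest_id = length, record, rec_id
    return longest_rec, longest_id


# Hypothetical station: two UNKNOWN-source records of equal length.
tied = {
    'rec0': {'dict': {'source': 'UNKNOWN'}, 'length': 40},
    'rec1': {'dict': {'source': 'UNKNOWN'}, 'length': 40},
}
_, tied_id = get_best(tied)
print(tied_id)

# Hypothetical station: a short MCDW record beats a longer UNKNOWN record.
ranked = {
    'rec0': {'dict': {'source': 'UNKNOWN'}, 'length': 60},
    'rec1': {'dict': {'source': 'MCDW'}, 'length': 15},
}
_, ranked_id = get_best(ranked)
print(ranked_id)
```

<p>In the tied case the first record encountered is returned, because neither loop ever replaces the current choice on an equal rank or an equal length.</p>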
]]></content:encoded>
	</item>
	<item>
		<title>By: John Goetz</title>
		<link>http://climateaudit.org/2007/09/18/hansen-step-1/#comment-106955</link>
		<dc:creator><![CDATA[John Goetz]]></dc:creator>
		<pubDate>Fri, 21 Sep 2007 02:51:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=2083#comment-106955</guid>
		<description><![CDATA[I wrote Steve a little while ago pointing out that we still have an error in the ordering used to combine scribal records. When I ran Steve&#039;s code against all Russian stations, I noticed the bias calculated tended to be warm. This was counter to what I had observed when I randomly looked at GISS graphs for the Russian stations. I found it very easy to find stations where the early record was biased cool, but quite difficult to find ones in which the early record was biased warm. I don&#039;t tend to be streaky with my luck, so this sounded some alarm bells.

I took a quick look at one of the stations with two records and a large warm bias applied to one of the records: CEKUNDA (GISS ID 22231532000). If you compare the two graphs &lt;a href=&quot;http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=222315320000&amp;data_set=0&amp;num_neighbors=1&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=222315320000&amp;data_set=1&amp;num_neighbors=1&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;, you can see that the record is obviously biased downward (this is why Al Gore invented tabbed browsing). So something about our assumptions regarding the first record is incorrect.

Two other stations quickly jumped out and I think help point to an answer: BARGUZIN (GISS ID 22230636000) and BORZJA (GISS ID 22230965000). I see that the first record is at least tied for the latest record, if it is not clearly the latest record. If it is tied for the latest record, I noticed it is also at least tied for the longest record, as measured by the number of valid annual averages. Interestingly, Borzja&#039;s second record is longer than the first, as measured in months, but equal as measured in years. The first record seems to have been selected as the starting point in the Step 1 process. Interesting...because it has more estimated values than the second record (a whole &#039;nother topic).

At any rate, the ordering seems to be:
1) sort by record with the latest annual data
2) if tie, sort by record with the longest annual record
3) if tie, use the record with the smallest record index.

Now, in Russia (and the rest of the world outside the US), the only way the first record index has a snowball&#039;s chance in an AGW world is if no MCDW (post 1990) record exists for the station. Well, that seems to happen a lot in the rest of the world, and it happened with Borzja, Cekunda, and Barguzin. And a bunch of others.

What I find &lt;em&gt;really&lt;/em&gt; interesting, and will investigate further, is that the first record for those three stations (which &quot;ceased operation&quot; c1990) is cooler than the other records. That means the other records for that station are biased downward to the first record. However, for many, many stations in Russia that continued reporting past 1990, the first record is warmer than most, if not all, later records that overlap it. Thus, the first record is biased downward toward the other records. I find it interesting that, no matter how we slice it, the early Russian record gets biased downward. I didn&#039;t think they were &lt;em&gt;that&lt;/em&gt; oppressed.]]></description>
		<content:encoded><![CDATA[<p>I wrote Steve a little while ago pointing out that we still have an error in the ordering used to combine scribal records. When I ran Steve&#8217;s code against all Russian stations, I noticed the bias calculated tended to be warm. This was counter to what I had observed when I randomly looked at GISS graphs for the Russian stations. I found it very easy to find stations where the early record was biased cool, but quite difficult to find ones in which the early record was biased warm. I don&#8217;t tend to be streaky with my luck, so this sounded some alarm bells.</p>
<p>I took a quick look at one of the stations with two records and a large warm bias applied to one of the records: CEKUNDA (GISS ID 22231532000). If you compare the two graphs <a href="http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=222315320000&amp;data_set=0&amp;num_neighbors=1" rel="nofollow">here</a> and <a href="http://data.giss.nasa.gov/cgi-bin/gistemp/gistemp_station.py?id=222315320000&amp;data_set=1&amp;num_neighbors=1" rel="nofollow">here</a>, you can see that the record is obviously biased downward (this is why Al Gore invented tabbed browsing). So something about our assumptions regarding the first record is incorrect.</p>
<p>Two other stations quickly jumped out and I think help point to an answer: BARGUZIN (GISS ID 22230636000) and BORZJA (GISS ID 22230965000). I see that the first record is at least tied for the latest record, if it is not clearly the latest record. If it is tied for the latest record, I noticed it is also at least tied for the longest record, as measured by the number of valid annual averages. Interestingly, Borzja&#8217;s second record is longer than the first, as measured in months, but equal as measured in years. The first record seems to have been selected as the starting point in the Step 1 process. Interesting&#8230;because it has more estimated values than the second record (a whole &#8216;nother topic).</p>
<p>At any rate, the ordering seems to be:<br />
1) sort by record with the latest annual data<br />
2) if tie, sort by record with the longest annual record<br />
3) if tie, use the record with the smallest record index.</p>
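<p>The ordering conjectured above can be sketched as a three-part sort key. This is only an illustration of the guess, not the GISS code; the field names <code>index</code>, <code>last_year</code> and <code>n_years</code> and all the numbers are hypothetical.</p>

```python
from typing import NamedTuple


class Record(NamedTuple):
    index: int      # position of the record in the station's list (hypothetical field)
    last_year: int  # latest year with a valid annual average (hypothetical field)
    n_years: int    # number of valid annual averages (hypothetical field)


def pick_first(records):
    """Return the record the conjectured ordering would start from."""
    # Latest data first, then longest annual record, then smallest index.
    return min(records, key=lambda r: (-r.last_year, -r.n_years, r.index))


# Made-up station: records 0 and 1 tie on both the latest year and the length.
records = [
    Record(index=0, last_year=1990, n_years=40),
    Record(index=1, last_year=1990, n_years=40),
    Record(index=2, last_year=1985, n_years=55),
]
chosen = pick_first(records)
print(chosen.index)
```

<p>With these made-up values the tie on both criteria falls through to the smallest index, so record 0 is chosen even though record 2 is longer.</p>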
<p>Now, in Russia (and the rest of the world outside the US), the only way the first record index has a snowball&#8217;s chance in an AGW world is if no MCDW (post 1990) record exists for the station. Well, that seems to happen a lot in the rest of the world, and it happened with Borzja, Cekunda, and Barguzin. And a bunch of others.</p>
<p>What I find <em>really</em> interesting, and will investigate further, is that the first record for those three stations (which &#8220;ceased operation&#8221; c1990) is cooler than the other records. That means the other records for that station are biased downward to the first record. However, for many, many stations in Russia that continued reporting past 1990, the first record is warmer than most, if not all, later records that overlap it. Thus, the first record is biased downward toward the other records. I find it interesting that, no matter how we slice it, the early Russian record gets biased downward. I didn&#8217;t think they were <em>that</em> oppressed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Geoff Sherrington</title>
		<link>http://climateaudit.org/2007/09/18/hansen-step-1/#comment-106954</link>
		<dc:creator><![CDATA[Geoff Sherrington]]></dc:creator>
		<pubDate>Thu, 20 Sep 2007 11:56:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=2083#comment-106954</guid>
		<description><![CDATA[Re 15, 16, 19, 20 etc

Thank you for your patience re my #14. Occasionally, very occasionally, there is a Eureka moment but this was not one of them. You guys are far better at that and far closer to analysis of the developments.

I continue to have serious misgivings about (a) adjusting a data string based on an overlap (why up, why down, which one?) instead of simply averaging the overlap pairs; (b) infilling missing values by interpolations that make dubious assumptions about constancy of weather over time; (c) inappropriate assignment of weights to weighted averages; (d) using a search range of 1000 km and a linear drop-off as well; (e) rejecting outliers instead of using them for valuable metadata interpretations; and (f) as John G says, using a method that favours one dataset through order of selection over another when no apparent basis for preference exists.

Can I please ask another basic question? Why does one not take a simple arithmetic average of all available values at a point at a time, using the only value if missing data reduce you to just that one? (and using no value if one does not exist).]]></description>
		<content:encoded><![CDATA[<p>Re 15, 16, 19, 20 etc</p>
<p>Thank you for your patience re my #14. Occasionally, very occasionally, there is a Eureka moment but this was not one of them. You guys are far better at that and far closer to analysis of the developments.</p>
<p>I continue to have serious misgivings about (a) adjusting a data string based on an overlap (why up, why down, which one?) instead of simply averaging the overlap pairs; (b) infilling missing values by interpolations that make dubious assumptions about constancy of weather over time; (c) inappropriate assignment of weights to weighted averages; (d) using a search range of 1000 km and a linear drop-off as well; (e) rejecting outliers instead of using them for valuable metadata interpretations; and (f) as John G says, using a method that favours one dataset through order of selection over another when no apparent basis for preference exists.</p>
<p>Can I please ask another basic question? Why does one not take a simple arithmetic average of all available values at a point at a time, using the only value if missing data reduce you to just that one? (and using no value if one does not exist).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: steven mosher</title>
		<link>http://climateaudit.org/2007/09/18/hansen-step-1/#comment-106953</link>
		<dc:creator><![CDATA[steven mosher]]></dc:creator>
		<pubDate>Wed, 19 Sep 2007 18:33:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=2083#comment-106953</guid>
		<description><![CDATA[re 27&amp;28.

 Exactly!]]></description>
		<content:encoded><![CDATA[<p>re 27&amp;28.</p>
<p> Exactly!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SteveSadlov</title>
		<link>http://climateaudit.org/2007/09/18/hansen-step-1/#comment-106952</link>
		<dc:creator><![CDATA[SteveSadlov]]></dc:creator>
		<pubDate>Wed, 19 Sep 2007 17:52:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=2083#comment-106952</guid>
		<description><![CDATA[RE: #27 - I&#039;d add that (this is to all you estudiantes out there, especially ones trying to come up with Master&#039;s thesis or PhD dissertation topics) doing such an exercise would, in and of itself, constitute valuable original research and a welcome contribution to the general knowledge and experience base of Software Engineering and Quality studies. A wonderful case study. Especially for those of you interested in root cause and corrective action analysis, forensics, etc.]]></description>
		<content:encoded><![CDATA[<p>RE: #27 &#8211; I&#8217;d add that (this is to all you estudiantes out there, especially ones trying to come up with Master&#8217;s thesis or PhD dissertation topics) doing such an exercise would, in and of itself, constitute valuable original research and a welcome contribution to the general knowledge and experience base of Software Engineering and Quality studies. A wonderful case study. Especially for those of you interested in root cause and corrective action analysis, forensics, etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: SteveSadlov</title>
		<link>http://climateaudit.org/2007/09/18/hansen-step-1/#comment-106951</link>
		<dc:creator><![CDATA[SteveSadlov]]></dc:creator>
		<pubDate>Wed, 19 Sep 2007 17:48:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=2083#comment-106951</guid>
		<description><![CDATA[RE: #26 - in combination with a state machine diagram (to depict the specific logic including all its branches and erroneous race conditions and dead ends) that would be a very powerful analysis step.]]></description>
		<content:encoded><![CDATA[<p>RE: #26 &#8211; in combination with a state machine diagram (to depict the specific logic including all its branches and erroneous race conditions and dead ends) that would be a very powerful analysis step.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: steven mosher</title>
		<link>http://climateaudit.org/2007/09/18/hansen-step-1/#comment-106950</link>
		<dc:creator><![CDATA[steven mosher]]></dc:creator>
		<pubDate>Wed, 19 Sep 2007 17:09:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=2083#comment-106950</guid>
		<description><![CDATA[SteveMC.

 I think a &quot;data flow&quot; diagram will help people understand the &quot;maze o data&quot;

 A picture is worth 988.6784679 words]]></description>
		<content:encoded><![CDATA[<p>SteveMC.</p>
<p> I think a &#8220;data flow&#8221; diagram will help people understand the &#8220;maze o data&#8221;</p>
<p> A picture is worth 988.6784679 words</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Leon Palmer</title>
		<link>http://climateaudit.org/2007/09/18/hansen-step-1/#comment-106949</link>
		<dc:creator><![CDATA[Leon Palmer]]></dc:creator>
		<pubDate>Wed, 19 Sep 2007 17:06:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.climateaudit.org/?p=2083#comment-106949</guid>
		<description><![CDATA[Has anyone generated a set of truth data to test with, other than rerunning sets of data that exhibit the various flaws that these step 1 algorithms are supposed to fix?  That is,

There are some data sets that are much more continuous and reliable, without the flaws that the step 1 algorithms are supposed to fix. They could be artificially modified to exhibit the flaws and then run through the step 1 algorithms: are the flaws fixed, by comparison to the original flawless data set?

This would illuminate whether the step 1 algorithms are creating any of the postulated problems / biases that have been mentioned in various comments (e.g., #4). It would be a valuable addition to distributing the step 1 algorithms, to distribute these test data sets.]]></description>
		<content:encoded><![CDATA[<p>Has anyone generated a set of truth data to test with, other than rerunning sets of data that exhibit the various flaws that these step 1 algorithms are supposed to fix?  That is,</p>
<p>There are some data sets that are much more continuous and reliable, without the flaws that the step 1 algorithms are supposed to fix. They could be artificially modified to exhibit the flaws and then run through the step 1 algorithms: are the flaws fixed, by comparison to the original flawless data set?</p>
<p>This would illuminate whether the step 1 algorithms are creating any of the postulated problems / biases that have been mentioned in various comments (e.g., #4). It would be a valuable addition to distributing the step 1 algorithms, to distribute these test data sets.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
