Cedarville Sausage

In May I began a quest to better understand how GISS does its homogeneity adjustment, also known as GISS Step 2. Steve McIntyre took the ball from that scrum and ran with it, producing a set of R tools that nearly replicate the GISS method. Some of the endpoint cases continue to confound those of us trying to understand the source code and how it reconciles – or doesn’t – with the peer-reviewed literature.

As this was going on, Anthony Watts pinged me several times, asking that I look at the Cedarville, CA adjustment to better understand why GISS would apply an urban adjustment to an obviously rural station, a topic which he explored in a previous post. I hesitated, because Cedarville had a lot of “nearby” (as defined by GISS) rural stations, and I wanted something simpler to look at. However, I did not forget his request and I took occasional peeks at the station and its neighbors. Below is an overall site view of the Cedarville station. GISS assigns a “night lights” value of 2 to this station, which is what causes it to go through the homogenization process.

cedarville.JPG

Here is a Google Earth image of Cedarville and the surrounding area with the NASA City Lights image overlay enabled. I am not sure what the NASA sensors are picking up to assign Cedarville the “2” rating.

cedarvillenight.JPG

Anthony says this in his post about Cedarville: “a place with a good record and little in the way of station moves”. Generally this may be true, but I personally am suspicious about the fidelity of a station’s record when I see Batman lurking in the 1930s:

cedarville.gif

OK, let’s assume for the moment that Cedarville’s record is beyond reproach. Let’s further assume that the Cedarville station is urban, and is cursed with the typical frailties of an urban station: lots of asphalt, little vegetation, and placement near an air-conditioner, strip mall, or jet engine. Certainly the surrounding rural stations are of such pristine fidelity that they can be used to remove the urban noise from Cedarville. Let’s take a closer look at those stations and that homogeneity adjustment.

Below is a Google Earth view of all of the rural stations within 500km of Cedarville that are used to completely determine the station’s urban homogeneity adjustment. The Oregon stations seem to be well-represented:
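To make the 500km selection concrete, here is a minimal sketch in Python. It is not the GISS code: the function names, the `is_rural` flag, and the approximate Cedarville coordinates are my own assumptions for illustration. The neighbor coordinates are the rounded values that appear in the GISS station lists quoted in the comments.

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def rural_neighbors(center, stations, radius_km=500.0):
    """Return (name, distance_km) for rural stations within radius_km, nearest first."""
    lat0, lon0 = center
    found = []
    for name, lat, lon, is_rural in stations:
        d = great_circle_km(lat0, lon0, lat, lon)
        if is_rural and d <= radius_km:
            found.append((name, d))
    return sorted(found, key=lambda item: item[1])

# Approximate location of Cedarville, CA (an assumption, rounded to 0.1 degree).
CEDARVILLE = (41.5, -120.2)

# (name, lat, lon, is_rural); the rural flags here are illustrative assumptions.
STATIONS = [
    ("GOLCONDA", 41.0, -117.5, True),
    ("ORLEANS", 41.3, -123.5, True),
    ("WILLOWS 6W", 39.5, -122.3, True),
    ("ELECTRA PH", 38.3, -120.7, True),
    ("RENO/INT., NV", 39.5, -119.8, False),  # treated as urban for this sketch
    ("DEATH VALLEY", 36.5, -116.9, True),    # rural, but beyond 500 km
]

for name, dist in rural_neighbors(CEDARVILLE, STATIONS):
    print(f"{name:12s} {dist:6.0f} km")
```

The distance returned for each station is also what the later weighting step needs, so it is convenient to keep it alongside the name.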

cedarvillerurals.jpg

In the next plot I have color-coded the markers to reflect each station’s trend: white is neutral, reds are warming, blues are cooling (the darker the color, the greater the trend). Unfortunately, the red circle I used to indicate the 500km radius gives the white markers a red cast. Orleans and Electra stick out like sore thumbs with sharp cooling trends, while other stations generally exhibit a flat or warming trend.

ruralstrend.jpg

I then went through the GISS Step 2 process of combining the rural stations into a single “rural record”. This is done by starting with the station with the longest record – Golconda – and combining the remaining stations one by one from longest to shortest record. Without going deep into the details, each station is first adjusted (biased) such that its mean matches the mean of the combined record. Then, the station’s record is averaged in with the combined record using a weight that decreases linearly with distance from the urban station at the center (in this case, of course, the metropolis of Cedarville).
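The combining step can be sketched roughly as follows. This is my simplified reading of the procedure, not a port of the GISS Fortran: I assume annual values keyed by year, I compute the bias over the years where the station overlaps the combined record, and the names `distance_weight`, `combine_rural`, and `min_overlap` are hypothetical.

```python
def distance_weight(d_km, radius_km=500.0):
    """Weight that falls linearly from 1 at the center to 0 at radius_km."""
    return max(0.0, 1.0 - d_km / radius_km)

def combine_rural(stations, radius_km=500.0, min_overlap=1):
    """Combine rural records into one series.

    stations: list of (distance_km, {year: value}) pairs.
    Returns {year: combined value}.
    """
    # Longest record first, then the rest from longest to shortest.
    ordered = sorted(stations, key=lambda s: len(s[1]), reverse=True)
    d0, first = ordered[0]
    w0 = distance_weight(d0, radius_km)
    combined = dict(first)
    weight = {year: w0 for year in first}

    for d, record in ordered[1:]:
        w = distance_weight(d, radius_km)
        overlap = [y for y in record if y in combined]
        if w <= 0.0 or len(overlap) < min_overlap:
            continue  # nothing to anchor the bias on, or zero weight
        # Shift the station so its mean matches the combined mean over the overlap.
        mean_combined = sum(combined[y] for y in overlap) / len(overlap)
        mean_station = sum(record[y] for y in overlap) / len(overlap)
        bias = mean_combined - mean_station
        for year, value in record.items():
            shifted = value + bias
            if year in combined:
                total = weight[year] + w
                combined[year] = (combined[year] * weight[year] + shifted * w) / total
                weight[year] = total
            else:
                combined[year] = shifted
                weight[year] = w
    return combined
```

One consequence of this ordering is visible in the sketch: every later station is shifted to agree with whatever has already been combined, so the earliest (longest) records set the level that the others are matched to.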

The next plot shows the difference between the Golconda record and the final combined rural record. While Golconda has an influence on the final record, it does not appear overwhelming.

gminusr.gif

I did notice a big difference when the fourth station, Orleans, was combined (Mina is the second and Willows the third). Comparing the difference between Orleans and the final combined rural record, I saw the slope go slightly negative but close to zero, and the extrema pull in much closer to the final value. I am not sure how to determine which station of the 29 has the greatest influence on the combined record, but my instinct is telling me Orleans is the one.

ominusr.gif

Here is a comparison of the combined Golconda, Mina, Willows, and Orleans record with that of all 29 stations combined. Clearly the first four rural stations of the 29 get us very close to the final solution:

otoa.gif

The next part of GISS step 2 takes the difference between the Cedarville record and that of the combined rural stations. Following is a plot comparing those two records:

candr.gif

The difference between Cedarville and the combined rurals is shown in the next plot. Also shown is the adjustment value that GISS calculates from the difference. I would have expected an adjustment that looked less like a lower envelope value and more like an average value. The adjustment result indicates all values before 1910 should be adjusted upward and all values after 1910 should be adjusted downward.

cminusrvsa.gif

The adjustment shown above in red is then added back into the Cedarville record to produce the homogenized result. The next plot compares the homogenized version of Cedarville with the original. It clearly shows that values before 1910 are adjusted up, and values since are adjusted down.
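For readers who want to experiment, here is a rough sketch of the difference-and-adjust step. As I understand the published description, GISS fits a two-legged line (two straight segments joined at a knee year) to the difference between the rural and urban records. Everything else here is my assumption: the sign convention (rural minus urban, so that the fit can be added to the urban record), the brute-force knee search, and all function names. The real code has additional rules that are not reproduced.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None  # singular system
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / M[r][r]
    return x

def fit_two_legs(years, values, knee):
    """Least-squares fit of value = a + b*min(y - knee, 0) + c*max(y - knee, 0)."""
    rows = [(1.0, min(y - knee, 0.0), max(y - knee, 0.0)) for y in years]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(3)]
    coef = solve3(A, b)
    if coef is None:
        return None
    fitted = [sum(c * x for c, x in zip(coef, r)) for r in rows]
    sse = sum((f - v) ** 2 for f, v in zip(fitted, values))
    return sse, fitted

def homogenize(urban, rural):
    """Return (homogenized urban record, adjustment), both keyed by year."""
    years = sorted(set(urban) & set(rural))
    diff = [rural[y] - urban[y] for y in years]
    best = None
    for knee in years[2:-2]:  # keep a few points on each side of the knee
        result = fit_two_legs(years, diff, knee)
        if result is not None and (best is None or result[0] < best[0]):
            best = result
    if best is None:
        raise ValueError("not enough overlapping years to fit an adjustment")
    adjustment = dict(zip(years, best[1]))
    homogenized = {y: urban[y] + adjustment[y] for y in years}
    return homogenized, adjustment
```

With this convention, a station that warms faster than its rural neighbors after some year gets a negative adjustment after that year, which is the general shape of a conventional urban correction.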

cvsh.gif

So what do I make of all this? In the simplest terms, I see Orleans having a rather large influence on the adjustment made to Cedarville. Should Cedarville be adjusted? Well, it certainly is not urban, so the standard GISS urban adjustment seems inappropriate. But the fact that Batman lurks in the 1930s indicates to me that something is amiss and needs to be (as Delbert Grady says in The Shining) “corrected”.

Is Orleans an appropriate adjuster? Certainly it is rural, and the station history indicates it has not moved. However, when I look at the plot of the Orleans data, I see something happened around 1929, and my guess is that it was not sudden global cooling:

orleans.gif

I don’t think this is necessarily a situation where garbage in equals garbage out. Rather, I think it is a situation in which a bunch of trimmings are thrown together and mixed to produce a kind of adjustment sausage. It is not necessarily something that accurately reflects the initial ingredients (inputs), but the output sure is tasty, especially after it has been cooked for a while.


54 Comments

  1. steven mosher
    Posted Jul 19, 2008 at 7:28 AM | Permalink

    John take care with GISS and northern california. Hansen throws out data from some of the stations you have included: from hansen 2001.

    The strong cooling that exists in the unlit station data in the northern California region is not found in either
    the periurban or urban stations either with or without any of the adjustments. Ocean temperature data for the same
    period, illustrated below, has strong warming along the entire West Coast of the United States. This suggests the
    possibility of a flaw in the unlit station data for that small region. After examination of all of the stations in this
    region, five of the USHCN station records were altered in the GISS analysis because of inhomogeneities with
    neighboring stations (data prior to 1927 for Lake Spaulding, data prior to 1929 for Orleans, data prior to 1911 for
    Electra Ph, data prior to 1906 for Willows 6W, and all data for Crater Lake NPS HQ were omitted), so these
    apparent data flaws would not be transmitted to adjusted periurban and urban stations.

  2. steven mosher
    Posted Jul 19, 2008 at 7:45 AM | Permalink

    John if you look in the gisstemp source files you will find a list of those stations that the program excludes and the time periods it excludes.
    This is all done prior to processing. I can point you the exact file.

    So for example, you may find data for crater lake at USHCN and assume that
    GISS uses it, but Hansen dumps it in his first step. same with the early
    20th century data for Orleans, lake spaulding, electra, willows.

    A little known quirk in the sausage machine, but one I have been bugging about
    for a year or so

  3. Dishman
    Posted Jul 19, 2008 at 8:44 AM | Permalink

    Steve #2,
    Are you referring to Ts.strange.RSU.list.IN and Ts.discont.RS.alter.IN?
    or is there something more that I’m missing?

  4. John Goetz
    Posted Jul 19, 2008 at 9:02 AM | Permalink

    Steve, please point me to the file.

    The v2.inv file that is used in STEP 2 includes the stations you have listed, so I am not convinced they are actually discarded.

  5. Gary
    Posted Jul 19, 2008 at 9:05 AM | Permalink

    Perhaps the cooling at Cedarville is real? This story on the only growing glaciers on the West Coast appeared last week.

    http://news.yahoo.com/s/ap/20080708/ap_on_re_us/growing_glaciers

    Mapquest says we can drive east to Cedarville in less than 3 hours. I drove part of that route CA 299 in the floods before Christmas in 1964 and apart from having to ford five streams it was a 60 mph roadway then.

    My old DOS program, Mapexpert says it’s less than 115 crow fly miles almost due east of Mt. Shasta(downwind?).

    As a check on the last 20+ years of temperatures, the Pacific Northwest Cooperative Agricultural Weather Network has had a station, http://www.usbr.gov/pn/agrimet/agrimetmap/cedcda.html, about 3 miles due north since May 1985. (That may also indicate a significant irrigation effect in the area as they report daily Evapotranspiration demand for grains, alfalfa, pasture, onions, and garlic.)

  6. Dishman
    Posted Jul 19, 2008 at 9:30 AM | Permalink

    John, take a look at the two files I named. They’re in STEP0/input_files (and duplicated in STEP1/input_files).

    “The v2.inv file that is used in STEP 2 includes the stations you have listed, so I am not convinced they are actually discarded.”

    As I’ve noted elsewhere, I’m investigating the Software Assurance process used on GISTEMP. I have found no evidence of any such formal process for either GISTEMP or GISS GCM. Neither GISTEMP nor GISS GCM source code contains any artifacts indicating a formal process.

    It is not safe to assume that GISTEMP functions as described in documentation, including the peer reviewed papers.

  7. John Goetz
    Posted Jul 19, 2008 at 10:19 AM | Permalink

    The stations around Cedarville are not listed in the files referenced by Dishman.

  8. Bernie
    Posted Jul 19, 2008 at 11:15 AM | Permalink

    I looked at the Cedarville data at the NW Reclamation Site. The data series cover more variables but are a lot shorter (roughly 25 years), but the Cedarville data indicates a zero trend. These look to be rural sites by definition. Could this data be used to compare to the GISS (and satellite) data sets and used as controls? It will take someone more adept than me to do this.

  9. Bernie
    Posted Jul 19, 2008 at 11:15 AM | Permalink

    Sorry, the link is http://www.usbr.gov/pn/agrimet/webarcread.html

  10. Dishman
    Posted Jul 19, 2008 at 11:51 AM | Permalink

    John, it may be that we have here an example of the reason why Software Assurance is essential.

    It appears to me that Mosher’s understanding (from the documentation) is that some sites were not used. It appears that the code does not reflect that.

    Both views are internally consistent, and correct as far as they go. The code appears to test out against itself, and the documentation is peer reviewed. The problem is that they do not match.

    Regarding testing: “The cumulative defect-detection rate is often less than 60%.” Jones, C. 1986 “Programming Productivity”
    Cite chosen in part because it pre-dates Hansen’s testimony.

  11. steven mosher
    Posted Jul 19, 2008 at 12:33 PM | Permalink

    It’s in the source files, John

    these are the stations completely removed.

    115624640010 HURGHADA lat,lon 27.3 33.8 omit: 0-9999
    134652010000 LAGOS/IKEJA lat,lon 6.6 3.3 omit: 0-9999
    134652360000 WARRI lat,lon 5.5 5.7 omit: 0-9999
    134652430000 LOKOJA lat,lon 7.8 6.7 omit: 0-9999
    205549450010 JUXIAN lat,lon 35.6 118.8 omit: 0-9999
    207433330002 PORT BLAIR lat,lon 11.7 92.7 omit: 0-9999
    210476960010 YOKOSUKA lat,lon 35.3 139.7 omit: 0-9999
    219415600005 PARACHINAR lat,lon 33.9 70.1 omit: 0-9999
    303824000000 FERNANDO DE N lat,lon -3.9 -32.4 omit: 0-9999
    314804440000 CIUDAD BOLIVA lat,lon 8.2 -63.5 omit: 0-9999
    403717300040 RUEL,ON lat,lon 47.3 -81.4 omit: 0-9999
    414762200010 CIUDAD GUERRERO,CHIHUAHUA lat,lon 28.6 -107.5 omit: 0-9999
    414762580020 QUIRIEGO, SONORA lat,lon 27.5 -109.2 omit: 0-9999
    414763730000 TEPEHUANES,DG lat,lon 25.4 -105.7 omit: 0-9999
    414766950010 CHAMPOTON, CAMPECHE lat,lon 19.4 -90.7 omit: 0-9999
    414767750030 CANTON, OAXACA lat,lon 18.0 -96.3 omit: 0-9999
    440785260010 ANNAS HOPE, ST.CROIX VIRG lat,lon 17.7 -66.7 omit: 0-9999
    425724910030 HOLLISTER USA lat,lon 36.8 -121.4 omit: 0-9999
    425725130010 FREELAND lat,lon 41.0 -75.9 omit: 0-9999
    425725210010 PHILO 3SW lat,lon 39.8 -81.9 omit: 0-9999
    425725970120 CRATER LAKE NPS HQ lat,lon 42.9 -122.1 omit: 0-9999
    425726710020 ROCK SPRINGS FAA AP lat,lon 41.6 -109.1 omit: 0-9999
    501947880000 KEMPSEY lat,lon -31.0 152.8 omit: 0-9999

    these are the stations some period removed

    122637720000 LAMU lat,lon -2.3 40.8 omit: 1914/07
    148628400000 MALAKAL lat,lon 9.6 31.7 omit: 1990/08
    148628400000 MALAKAL lat,lon 9.6 31.7 omit: 1991/03
    303838210000 IGUAPE lat,lon -24.7 -47.5 omit: 1985/04
    403718670006 THE PAS,MAN. lat,lon 54.0 -101.1 omit: 1995/05
    425722190010 COVINGTON lat,lon 33.6 -83.9 omit: 1882/11
    425722250030 TALBOTTON lat,lon 32.7 -84.5 omit: 1988/06
    425722250030 TALBOTTON lat,lon 32.7 -84.5 omit: 1988/07
    425722780090 WICKENBURG lat,lon 34.0 -112.7 omit: 1908/11
    425723760020 PRESCOTT lat,lon 34.6 -112.4 omit: 1904/08
    425724220030 FRANKFORT LOCK 4 lat,lon 38.2 -84.9 omit: 1881/09
    425724690010 DILLON 1E lat,lon 39.6 -106.0 omit: 1910/07
    425724710050 LOA lat,lon 38.4 -111.6 omit: 1903/07
    425725190010 AUBURN lat,lon 42.9 -76.5 omit: 1849/07
    425725490030 ALGONA 3W lat,lon 43.1 -94.3 omit: 1863/06
    425727560040 PEMBINA lat,lon 49.0 -97.2 omit: 1887/07
    425745300040 LAMAR lat,lon 38.1 -102.6 omit: 1895/12
    425746470060 JEFFERSON lat,lon 36.7 -97.8 omit: 1899/07
    501947190020 GILGANDRA POST OFFICE lat,lon -31.7 148.7 omit: 1920/07
    651032920010 SCARBOROUGH UK lat,lon 54.2 -.4 omit: 1912/08
    651032920010 SCARBOROUGH UK lat,lon 54.2 -.4 omit: 1912/09
    623160900003 VERONA/VILLAF lat,lon 45.4 10.9 omit: 1987/04
    623161580004 PISA/S.GIUST lat,lon 43.7 161.5 omit: 1987/10
    113655550000 BOUAKE lat,lon 7.7 -5.1 omit: 0-1954
    115624500010 SUEZ lat,lon 29.9 32.6 omit: 0-1888
    140632500000 BARDERA lat,lon 2.4 42.3 omit: 0-1919
    150617010000 BATHURST/YUNDUM lat,lon 13.4 -16.7 omit: 0-1939
    155674750003 KASAMA lat,lon -10.2 31.1 omit: 0-1932
    205526520000 ZHANGYE lat,lon 38.9 100.4 omit: 0-1944
    205528360002 DULAN lat,lon 36.3 98.1 omit: 0-1949
    205535640010 XINGXIAN lat,lon 38.5 111.1 omit: 0-1929
    205538630000 JIEXIU lat,lon 37.1 111.9 omit: 0-1950
    205560800020 LINXIA lat,lon 35.6 103.2 omit: 0-1951
    205565710002 XICHANG lat,lon 27.9 102.3 omit: 0-1938
    205565860010 LEIBO lat,lon 28.3 103.6 omit: 0-1954
    205573480000 FENGJIE lat,lon 31.1 109.5 omit: 0-1956
    205577990000 JI’AN lat,lon 27.1 115.0 omit: 0-1934
    207428070003 CALCUTTA/ALIP lat,lon 22.5 88.3 omit: 0-1859
    207432790003 MADRAS/MINAMB lat,lon 13.0 80.2 omit: 0-1859
    224434660003 COLOMBO lat,lon 6.9 79.9 omit: 0-1869
    302853650000 YACUIBA lat,lon -21.9 -63.6 omit: 0-1935
    315814050000 CAYENNE/ROCHA lat,lon 4.8 -52.4 omit: 0-1911
    403717140040 SHAWINIGAN,QU lat,lon 46.6 -72.7 omit: 0-1918
    414765560010 MASCOTA, JALISCO lat,lon 20.5 -104.8 omit: 0-1940
    414767260010 CUAUTLA, MORELOS lat,lon 18.8 -98.9 omit: 0-1953
    425702710000 GULKANA/INTL. lat,lon 62.2 -145.4 omit: 0-1930
    425722130010 WAYCROSS 4NE lat,lon 31.3 -82.3 omit: 0-1884
    425722330030 AMITE lat,lon 30.7 -90.5 omit: 0-1885
    425722530000 SAN ANTONIO/I lat,lon 29.5 -98.5 omit: 0-1868
    425722530010 ENCINAL lat,lon 28.0 -99.4 omit: 0-1913
    425722570050 TEMPLE lat,lon 31.1 -97.3 omit: 0-1885
    425722710030 LUNA RS lat,lon 33.8 -108.9 omit: 0-1902
    425722740020 TUCSON U OF AZ lat,lon 32.2 -110.9 omit: 0-1889
    425722970060 TUSTIN IRVINE RANCH lat,lon 33.7 -117.8 omit: 0-1909
    425723120000 GREENVILLE/GR lat,lon 34.9 -82.2 omit: 0-1907
    425723270050 BOWLING GREEN FAA AP lat,lon 37.0 -86.4 omit: 0-1884
    425723400040 BRINKLEY lat,lon 34.9 -91.2 omit: 0-1889
    425723600020 SPRINGER lat,lon 36.4 -104.6 omit: 0-1899
    425723650020 LOS LUNAS 3SSW lat,lon 34.8 -106.7 omit: 0-1894
    425723710020 KANAB lat,lon 37.1 -112.5 omit: 0-1909
    425723710030 ZION NATIONAL PARK lat,lon 37.2 -113.0 omit: 0-1917
    425723760020 PRESCOTT lat,lon 34.6 -112.4 omit: 0-1879
    425724170010 DALE ENTERPRISE lat,lon 38.5 -78.9 omit: 0-1891
    425724280060 KENTON lat,lon 40.7 -83.6 omit: 0-1879
    425724620040 CHAMA lat,lon 36.9 -106.6 omit: 0-1900
    425724710040 BEAVER lat,lon 38.3 -112.6 omit: 0-1912
    425724830010 DAVIS EXP FARM 2WSW lat,lon 38.5 -121.8 omit: 0-1909
    425724880000 RENO/INT., NV lat,lon 39.5 -119.8 omit: 0-1889
    425724920010 LODI lat,lon 38.1 -121.3 omit: 0-1890
    425725070070 TAUNTON lat,lon 41.9 -71.1 omit: 0-1883
    425725250010 NEW CASTLE 1N lat,lon 41.0 -80.4 omit: 0-1900
    425725250050 FRANKLIN lat,lon 41.4 -79.8 omit: 0-1894
    425725510070 DAVID CITY lat,lon 41.3 -97.1 omit: 0-1894
    425725540030 WEEPING WATER lat,lon 40.9 -96.1 omit: 0-1905
    425725830050 GOLCONDA lat,lon 41.0 -117.5 omit: 0-1889
    425725910004 RED BLUFF/MUN lat,lon 40.2 -122.2 omit: 0-1889
    425725910020 WILLOWS 6W lat,lon 39.5 -122.3 omit: 0-1905
    425725940010 FORT BRAGG 5N lat,lon 39.5 -123.8 omit: 0-1934
    425725940020 ORLEANS lat,lon 41.3 -123.5 omit: 0-1928
    425726080020 WOODLAND lat,lon 45.2 -67.4 omit: 0-1921
    425726170040 LAKE PLACID 2S lat,lon 44.3 -74.0 omit: 0-1919
    425726510040 ALEXANDRIA lat,lon 43.7 -97.8 omit: 0-1884
    425726700090 HEBGEN DAM lat,lon 44.9 -111.3 omit: 0-1912
    425742010020 PORT TOWNSEND lat,lon 48.1 -122.7 omit: 0-1874
    425742010090 BLAINE lat,lon 49.0 -122.7 omit: 0-1899
    425744800060 UTICA lat,lon 43.1 -75.2 omit: 0-1879
    425744900070 LAWRENCE lat,lon 42.7 -71.2 omit: 0-1859
    425745000030 MARYSVILLE lat,lon 39.2 -121.6 omit: 0-1897
    425745010010 ELECTRA PH lat,lon 38.3 -120.7 omit: 0-1910
    425745010040 LAKE SPAULDING lat,lon 39.3 -120.6 omit: 0-1926
    425745090010 LOS GATOS USA lat,lon 37.2 -122.0 omit: 0-1890
    425745090040 LIVERMORE lat,lon 37.7 -121.8 omit: 0-1881
    425745160060 HEALDSBURG lat,lon 38.6 -122.9 omit: 0-1899
    425745330020 FORT COLLINS lat,lon 40.6 -105.1 omit: 0-1874
    425746190010 DEATH VALLEY lat,lon 36.5 -116.9 omit: 0-1921
    425746300010 SOCORRO lat,lon 34.1 -106.9 omit: 0-1889
    425746350030 CANYON-DE-CHELLY lat,lon 36.2 -109.5 omit: 0-1929
    425746620040 FARMINGTON lat,lon 37.8 -90.4 omit: 0-1890
    425747350010 SNYDER lat,lon 32.7 -100.9 omit: 0-1899
    425747950010 FORT PIERCE lat,lon 27.5 -80.3 omit: 0-1879
    432788970000 LE RAIZET,GUA lat,lon 16.3 -61.5 omit: 0-1940
    501943330000 BOULIA lat,lon -22.9 139.9 omit: 0-1899
    501943660010 BOWEN POST OFFICE lat,lon -20.0 148.3 omit: 0-1909
    501945660000 GYMPIE (FORES lat,lon -26.1 152.6 omit: 0-1909
    501945890000 YAMBA lat,lon -29.4 153.4 omit: 0-1899
    501947840000 TAREE lat,lon -31.9 152.5 omit: 0-1909
    501948420000 CAPE OTWAY lat,lon -38.8 143.5 omit: 0-1900
    501949330000 GABO ISLAND lat,lon -37.6 149.9 omit: 0-1899
    501949370000 MORUYA HEADS lat,lon -35.9 150.2 omit: 0-1898
    523969950000 CHRISTMAS ISL lat,lon -10.4 105.7 omit: 0-1970
    615075100003 BORDEAUX/MERI lat,lon 44.8 -.7 omit: 0-1879
    615076900003 NICE lat,lon 43.7 7.2 omit: 0-1859
    623160900003 VERONA/VILLAF lat,lon 45.4 10.9 omit: 0-1879
    623161400000 BOLOGNA/BORGO lat,lon 44.5 11.3 omit: 0-1879
    636085060002 HORTA (ACORES lat,lon 38.5 -28.6 omit: 0-1916
    649170300004 SAMSUN lat,lon 41.3 36.3 omit: 0-1879
    651039170003 BELFAST/ALDER lat,lon 54.7 -6.2 omit: 0-1878
    205544710010 GAIXIAN XIONGYUE lat,lon 40.2 122.2 omit: 1920-1930
    205567780004 KUNMING lat,lon 25.0 102.7 omit: 1940-1945
    207425150003 CHERRAPUNJI lat,lon 25.3 91.7 omit: 1991-1993
    425723830030 TEJON RANCHO lat,lon 35.0 -118.7 omit: 1909-1912
    425726720010 DIVERSION DAM lat,lon 43.2 -108.9 omit: 1988-1989
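Since the lines above follow a regular layout, they are easy to read by program. Here is a small sketch that parses one line and tests whether a given year and month would be omitted. The regular expression, the `OmitRule` type, and the function names are my own inventions, and I am assuming that a range such as `0-1928` is inclusive at both ends, which is at least consistent with the “data prior to 1929 for Orleans” wording quoted in comment 1.

```python
import re
from typing import NamedTuple, Optional

LINE_RE = re.compile(
    r"^(\d{12})\s+(.+?)\s+lat,lon\s+(-?[\d.]+)\s+(-?[\d.]+)\s+omit:\s+(\S+)$"
)

class OmitRule(NamedTuple):
    station_id: str
    name: str
    lat: float
    lon: float
    first_year: int
    last_year: int
    month: Optional[int]  # set only for single-month rules such as 1914/07

def parse_omit_line(line):
    """Parse one 'ID NAME lat,lon LAT LON omit: SPEC' line into an OmitRule."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    station_id, name, lat, lon, spec = m.groups()
    if "/" in spec:
        # Single month, written as year/month.
        year, mon = spec.split("/")
        first = last = int(year)
        month = int(mon)
    else:
        # Year range, written as first-last (assumed inclusive).
        lo, hi = spec.split("-")
        first, last, month = int(lo), int(hi), None
    return OmitRule(station_id, name, float(lat), float(lon), first, last, month)

def is_omitted(rule, year, month):
    """True if the rule removes the value for the given year and month."""
    if not rule.first_year <= year <= rule.last_year:
        return False
    return rule.month is None or rule.month == month

rule = parse_omit_line("425725940020 ORLEANS lat,lon 41.3 -123.5 omit: 0-1928")
print(rule.name, is_omitted(rule, 1928, 6), is_omitted(rule, 1929, 1))
```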

  12. steven mosher
    Posted Jul 19, 2008 at 12:40 PM | Permalink

    just unzip the source and you will see these two files.

    I think in step1 there is python routine called “drop strange” or something like that, where these sites are dropped or segments are dropped.

  13. Dishman
    Posted Jul 19, 2008 at 12:50 PM | Permalink

    I can’t find that in my version of the source code, downloaded directly from GISS about a month ago.
    What’s the file name?

  14. Dishman
    Posted Jul 19, 2008 at 12:51 PM | Permalink

    Here’s my version of Ts.strange.RSU.list.IN:

    122637720000 LAMU lat,lon -2.3 40.8 omit: 1914/07
    148628400000 MALAKAL lat,lon 9.6 31.7 omit: 1990/08
    148628400000 MALAKAL lat,lon 9.6 31.7 omit: 1991/03
    303838210000 IGUAPE lat,lon -24.7 -47.5 omit: 1985/04
    403718670006 THE PAS,MAN. lat,lon 54.0 -101.1 omit: 1995/05
    501947190020 GILGANDRA POST OFFICE lat,lon -31.7 148.7 omit: 1920/07
    651032920010 SCARBOROUGH UK lat,lon 54.2 -.4 omit: 1912/08
    651032920010 SCARBOROUGH UK lat,lon 54.2 -.4 omit: 1912/09
    623160900003 VERONA/VILLAF lat,lon 45.4 10.9 omit: 1987/04
    623161580004 PISA/S.GIUST lat,lon 43.7 161.5 omit: 1987/10
    113655550000 BOUAKE lat,lon 7.7 -5.1 omit: 0-1954
    115624500010 SUEZ lat,lon 29.9 32.6 omit: 0-1888
    140632500000 BARDERA lat,lon 2.4 42.3 omit: 0-1919
    150617010000 BATHURST/YUNDUM lat,lon 13.4 -16.7 omit: 0-1939
    155674750003 KASAMA lat,lon -10.2 31.1 omit: 0-1932
    205526520000 ZHANGYE lat,lon 38.9 100.4 omit: 0-1944
    205528360002 DULAN lat,lon 36.3 98.1 omit: 0-1949
    205535640010 XINGXIAN lat,lon 38.5 111.1 omit: 0-1929
    205538630000 JIEXIU lat,lon 37.1 111.9 omit: 0-1950
    205560800020 LINXIA lat,lon 35.6 103.2 omit: 0-1951
    205565710002 XICHANG lat,lon 27.9 102.3 omit: 0-1938
    205565860010 LEIBO lat,lon 28.3 103.6 omit: 0-1954
    205573480000 FENGJIE lat,lon 31.1 109.5 omit: 0-1956
    205577990000 JI’AN lat,lon 27.1 115.0 omit: 0-1934
    302853650000 YACUIBA lat,lon -21.9 -63.6 omit: 0-1935
    315814050000 CAYENNE/ROCHA lat,lon 4.8 -52.4 omit: 0-1911
    403717140040 SHAWINIGAN,QU lat,lon 46.6 -72.7 omit: 0-1918
    414765560010 MASCOTA, JALISCO lat,lon 20.5 -104.8 omit: 0-1940
    414767260010 CUAUTLA, MORELOS lat,lon 18.8 -98.9 omit: 0-1953
    425702710000 GULKANA/INTL. lat,lon 62.2 -145.4 omit: 0-1930
    425725910004 RED BLUFF/MUN lat,lon 40.2 -122.2 omit: 0-1889
    425745090010 LOS GATOS USA lat,lon 37.2 -122.0 omit: 0-1890
    432788970000 LE RAIZET,GUA lat,lon 16.3 -61.5 omit: 0-1940
    501943330000 BOULIA lat,lon -22.9 139.9 omit: 0-1899
    501943660010 BOWEN POST OFFICE lat,lon -20.0 148.3 omit: 0-1909
    501945660000 GYMPIE (FORES lat,lon -26.1 152.6 omit: 0-1909
    501945890000 YAMBA lat,lon -29.4 153.4 omit: 0-1899
    501947840000 TAREE lat,lon -31.9 152.5 omit: 0-1909
    501948420000 CAPE OTWAY lat,lon -38.8 143.5 omit: 0-1900
    501949330000 GABO ISLAND lat,lon -37.6 149.9 omit: 0-1899
    501949370000 MORUYA HEADS lat,lon -35.9 150.2 omit: 0-1898
    523969950000 CHRISTMAS ISL lat,lon -10.4 105.7 omit: 0-1970
    636085060002 HORTA (ACORES lat,lon 38.5 -28.6 omit: 0-1916
    205544710010 GAIXIAN XIONGYUE lat,lon 40.2 122.2 omit: 1920-1930
    205567780004 KUNMING lat,lon 25.0 102.7 omit: 1940-1945
    207425150003 CHERRAPUNJI lat,lon 25.3 91.7 omit: 1991-1993
    115624640010 HURGHADA lat,lon 27.3 33.8 omit: 0-9999
    134652010000 LAGOS/IKEJA lat,lon 6.6 3.3 omit: 0-9999
    134652360000 WARRI lat,lon 5.5 5.7 omit: 0-9999
    134652430000 LOKOJA lat,lon 7.8 6.7 omit: 0-9999
    205549450010 JUXIAN lat,lon 35.6 118.8 omit: 0-9999
    207433330002 PORT BLAIR lat,lon 11.7 92.7 omit: 0-9999
    210476960010 YOKOSUKA lat,lon 35.3 139.7 omit: 0-9999
    219415600005 PARACHINAR lat,lon 33.9 70.1 omit: 0-9999
    303824000000 FERNANDO DE N lat,lon -3.9 -32.4 omit: 0-9999
    314804440000 CIUDAD BOLIVA lat,lon 8.2 -63.5 omit: 0-9999
    403717300040 RUEL,ON lat,lon 47.3 -81.4 omit: 0-9999
    414762200010 CIUDAD GUERRERO,CHIHUAHUA lat,lon 28.6 -107.5 omit: 0-9999
    414762580020 QUIRIEGO, SONORA lat,lon 27.5 -109.2 omit: 0-9999
    414763730000 TEPEHUANES,DG lat,lon 25.4 -105.7 omit: 0-9999
    414766950010 CHAMPOTON, CAMPECHE lat,lon 19.4 -90.7 omit: 0-9999
    414767750030 CANTON, OAXACA lat,lon 18.0 -96.3 omit: 0-9999
    440785260010 ANNAS HOPE, ST.CROIX VIRG lat,lon 17.7 -66.7 omit: 0-9999
    425724910030 HOLLISTER USA lat,lon 36.8 -121.4 omit: 0-9999
    501947880000 KEMPSEY lat,lon -31.0 152.8 omit: 0-9999
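Comparing two versions of the list by eye is tedious, so here is a minimal sketch that reports which omit rules appear in one listing but not the other. It keys each line on the station ID and the omit specification only. The function names are hypothetical, and the two short sample texts are excerpts chosen for illustration, not the full files.

```python
def rule_keys(text):
    """Collect (station_id, omit_spec) pairs from lines that contain 'omit:'."""
    keys = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and "omit:" in parts:
            keys.add((parts[0], parts[-1]))
    return keys

def only_in_first(first_text, second_text):
    """Rules present in first_text but missing from second_text, sorted."""
    return sorted(rule_keys(first_text) - rule_keys(second_text))

older = """
425725940020 ORLEANS lat,lon 41.3 -123.5 omit: 0-1928
425725910004 RED BLUFF/MUN lat,lon 40.2 -122.2 omit: 0-1889
501947880000 KEMPSEY lat,lon -31.0 152.8 omit: 0-9999
"""

newer = """
425725910004 RED BLUFF/MUN lat,lon 40.2 -122.2 omit: 0-1889
501947880000 KEMPSEY lat,lon -31.0 152.8 omit: 0-9999
"""

for station_id, spec in only_in_first(older, newer):
    print(station_id, spec)
```

Pasting the two full listings from this thread into `older` and `newer` would give the complete set of rules that differ between them.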

  15. steven mosher
    Posted Jul 19, 2008 at 1:13 PM | Permalink

    one file name I had was “list.of.stations.someperiod.removed.”

    I’ll go re download.

  16. steven mosher
    Posted Jul 19, 2008 at 1:39 PM | Permalink

    The previous version of GISSTEMP eliminated certain norcal records as documented in hansen 2001 and I confirmed from my first download of the code.

    The current committed code by hansen doesn’t appear to eliminate the records
    that he eliminated in 2001.

    AND there is more python. which means a newcomer is working on it.

    hmm. grad student? old dog learning new tricks? idle speculation I know

  17. steven mosher
    Posted Jul 19, 2008 at 1:43 PM | Permalink

    re 14. I think maybe the current release has forgotten to address the issue
    of stations with partial records removed. It was there in the first release
    of code, but I can’t find it in the current commit.

  18. Dishman
    Posted Jul 19, 2008 at 2:14 PM | Permalink

    The CVS archives for GISTEMP were already on the list for my next FOIA request.

  19. Barney Frank
    Posted Jul 19, 2008 at 3:51 PM | Permalink

    Can’t help with any statistical stuff but I am familiar with all three spots; Electra being about ten miles from my house.
    Electra and Orleans are both located right next to rivers (Electra PH being virtually in the Mokelumne River) which might affect any temperature trends. And both rivers are subject to hydroelectric and irrigation flow manipulations.

  20. Scott-in-WA
    Posted Jul 19, 2008 at 4:47 PM | Permalink

    Old lawyer’s proverb: “Those who love both sausage and the law shouldn’t watch either one being made.”

  21. John Goetz
    Posted Jul 19, 2008 at 5:21 PM | Permalink

    Steven Mosher: I have two versions of the programs, one from last September and the other from June of this year. Neither has the stations listed in them. Neither had “list.of.stations.someperiod.removed”. What is the time stamp on the .tar file you have with those stations listed?

  22. Dishman
    Posted Jul 19, 2008 at 6:46 PM | Permalink

    The version of source code I downloaded last June had Ts.strange.RSU.list.IN with a timestamp of 8/25/2007.
    The gzip file was created 6/24/2008.

  23. John Goetz
    Posted Jul 19, 2008 at 7:17 PM | Permalink

    Dishman…you and I are looking at the same Ts.strange.RSU.list.IN file then.

    Steven Mosher…I can’t read Python (OK, I have not tried very hard to read it). Is this something special with the Python routines?

  24. Posted Jul 19, 2008 at 9:34 PM | Permalink

    Interesting. It appears that I have three gistemp versions, for the first one contains the Ts.strange.RSU.list.IN file, which has a timestamp of 08/25/2007 1:59PM, and the *.tar.gz versions have different file sizes with version 1 being 2,201,165 bytes as reported by MS Windows Vista (and the MD5 hash is 496f97150a9b585e0a8a4e7c77014ab). Version 2 does not have the Ts.strange.RSU.list.IN file, and the *.tar.gz being 2,182,299 bytes as reported by MS Windows Vista (and the MD5 hash is 62d0c749de2d24aa4d041cc3c95d3558). Version 3 also does not have the Ts.strange.* file and the *.tar.gz file size is 1,980,434 bytes as reported by Windows Vista (and the MD5 hash is 9d5f149acd01551cfc23e68fb57f6241). The MD5 hashes reported by Windows Vista may vary from machine to machine.
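For anyone who wants to compare their own downloads against the sizes and hashes above, here is a small sketch using Python's standard `hashlib`. The function name and the chunk size are arbitrary choices of mine. An MD5 digest is computed from the file's bytes alone, so two byte-identical archives should give the same 32-character hex string on any machine, and different digests mean the files differ.

```python
import hashlib
import os

def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hex_digest) for the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()
```

Calling `file_fingerprint` on each `*.tar.gz` and comparing the pairs is enough to tell whether two people are looking at the same release.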

  25. fred
    Posted Jul 20, 2008 at 1:45 AM | Permalink

    Read in conjunction with SM’s last post on Administrative Law, this is truly crazy. These are very obviously no longer adjustments, they are simply changes to the observational record. It’s one thing if you have reason to believe there are instrumental errors, but we are no longer making one set of changes based on this, we seem to be changing the past several times a year, for no well publicized or experimentally verified reasons. I can’t even see why we have any reason to doubt the validity of the ‘batman’ peaks. They look remarkable, but it doesn’t mean they are wrong. Once you start changing your raw data you lose all credibility.

  26. Geoff Sherrington
    Posted Jul 20, 2008 at 5:07 AM | Permalink

    Re # 11 Steven Mosher. Deleted stations.

    Every station has a story. Whether it is relevant to climate reconstruction is another matter.

    205567780004 KUNMING lat,lon 25.0 102.7 omit: 1940-1945

    When Japan threatened China in WWII, many eastern universities were moved or part-moved to Kunming, which became a type of academic city. Kunming was part of the Flying Tigers airlift from India by Chennault. Instinct would say that the record would be well kept, both for the hazardous aviation and because of the influx of scholarly people. The reason for the deletion has me beat.

    Likewise Christmas Island, deleted 1908-1970. This was mined for phosphate for some decades and so there were technical people there. One can only guess at the reason for deletion over these years.

    It is becoming possible to get some meta data for Australian stations (incl Christmas Is, I think) but having seen only a subset, I’m not sure that the answer lies in the soil.

    Sorry for another post with more questions than answers.

  27. steven mosher
    Posted Jul 20, 2008 at 6:45 AM | Permalink

    re 23. there’s nothing special about the Python stuff. It just indicates somebody
    new working on the stuff.

    I still can’t figure out where the file with certain periods removed has disappeared to. But if you look at my post above you’ll see the list of stations
    and the dates. That comes from my download back in september.

    maybe grep the latest download for one of those text lines
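For those who would rather not unpack the archive first, the same search can be sketched in Python with the standard `tarfile` module. The function name is hypothetical, and the archive path in the usage note is a placeholder for whatever your download is called. I decode as Latin-1 simply so that no byte sequence raises an error.

```python
import tarfile

def find_in_archive(archive_path, needle, encoding="latin-1"):
    """Return (member_name, line_number) for each line containing needle."""
    hits = []
    with tarfile.open(archive_path, "r:*") as tar:  # "r:*" auto-detects gzip
        for member in tar.getmembers():
            if not member.isfile():
                continue
            fh = tar.extractfile(member)
            if fh is None:
                continue
            for lineno, raw in enumerate(fh, start=1):
                if needle in raw.decode(encoding, errors="replace"):
                    hits.append((member.name, lineno))
    return hits
```

For example, `find_in_archive("GISTEMP_sources.tar.gz", "ORLEANS")` would list every file and line in that archive that mentions Orleans; an empty list means the string is not there at all.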

  28. steven mosher
    Posted Jul 20, 2008 at 7:01 AM | Permalink

    john the date code on my first tar was 9/8/2007

  29. Scott-in-WA
    Posted Jul 20, 2008 at 7:04 AM | Permalink

    Fred #25: These are very obviously no longer adjustments, they are simply changes to the observational record.

    For these “changes” to be considered “adjustments” there has to be:

    — A documented basis in science or observational method which justifies a particular modification to a particular instance of observed raw data

    — A documented process for making the data modifications which maintains a record of the observed and modified data values

    — A reliable and repeatable mechanism for tying each individual data modification to both the processes which performed the modification and to the justifying scientific and/or methodological reasons

    — A reliable and repeatable mechanism for tracking the historical sequence of changes to any individual data instance, and which maintains the proper references to both the processes which performed the modifications and to the justifying scientific and/or methodological reasons

    — A means of monitoring the process to determine if data modifications are consistent with both the documented process and the stated reasons for the modifications.

    I’ve said this before: for government-supported climate data and research, this area is fertile ground for a GAO audit of how climate information is being managed and modified.

    The GAO auditors themselves do not need to be climate scientists. The requirements listed above are only what should be considered normal and professional practice in managing any body of important scientific data.

    What is needed most in performing such an audit is for the GAO to assist the Congress and the interested public in bringing these data modifications into the sunlight where those who are most qualified to comment on the validity of the modification processes can get a look at what is actually being done to the data, and why.

    However, that being said, I suspect that a GAO audit would reveal that few, if any, of the above-listed practices are being followed in government-supported climate data management.

    The absence of a disciplined climate data management process automatically casts doubt upon the ability of any diligently-pursued peer review process to determine the validity of the methods and processes being used to modify this climate data — and hence upon the validity of the modified climate data itself.

  30. Cliff Huston
    Posted Jul 20, 2008 at 7:25 AM | Permalink

    The download of GISTEMP_SOURCES that I got on Sept. 13, 2007 contains the files:
    list.of.stations.completely.removed.txt -Stamp:Sep 7, 2007 4:17 PM
    list.of.stations.someperiod.removed.txt -Stamp:Sep 7, 2007 4:17 PM

    Also: gistemp.txt -Stamp:Sep 10, 2007 10:57 AM; in which I found the following statement:

    The files “list.of.stations.completely/someperiod.removed.txt” had to be
    revised after switching to the newest USHCN file:
    all USHCN station corrections could be removed, since USHCN either omitted
    or corrected the parts listed in these files. For the currently used versions see
    STEP0/input_files/Ts.strange.RSU.list.IN .

    The file: STEP0/input_files/Ts.strange.RSU.list.IN -Stamp: Aug 25 2007 1:59PM

    Cliff

  31. steven mosher
    Posted Jul 20, 2008 at 7:45 AM | Permalink

    re 30. thanks cliff. My sept 8 download had the exact same txt files.
    I thought I had taken crazy pills or something.

  32. John Adams
    Posted Jul 20, 2008 at 7:52 AM | Permalink

    haha that is crazy! Thanks for that.

  33. steven mosher
    Posted Jul 20, 2008 at 8:02 AM | Permalink

    RE 30. So, if we are to make sense of this. In hansen 2001, hansen deleted
    certain portions of norcal records ( orleans, lake spaulding, electra etc)
    because they showed an anomalous cooling trend that didn’t match the SST records

    why trust the SST record?

    Anyways, sometime after that USHCN went in and either “fixed” these records
    or deleted them. So now since USHCN has adjusted these records (pre 1929 orleans for example) GISS no longer has to truncate certain time periods for norcal
    records. It’s turtles all the way down. what was this odd early 20th century
    NorCal cooling and how has USHCN adjusted it?

    For Norcal I don’t think the 1915 eruption of Mount Lassen had much of an effect,
    so that can’t be it.

    more oddities.

  34. Francois Ouellette
    Posted Jul 20, 2008 at 9:57 AM | Permalink

    #29 Reading your comment and others reminded me of my days in the lab. Doing experimental work is a tedious process. You can (and should) repeat an experiment dozens, if not hundreds of times, and you end up with tons of measurement data. A lot of it is bad, for a multitude of reasons, which are not always explainable (oftentimes it’s just a gut feeling that something went wrong).

    So it’s always tempting to throw away measurements that don’t behave the way you would like them to do. Doing that, you end up “homogenizing” your results. In the end, if you follow that course, more often than not you demonstrate what you wanted to demonstrate in the first place.

    But what I’ve learned from thousands of solitary hours in the lab is that it is the anomaly that is most interesting. You should always watch out for anomalies. And try to see if they’re repeatable, and then find an explanation. More often than not, you will have found something new and unexpected.

    So the process of “homogenizing” is dangerous in two respects: it is a way to convince yourself of your pre-determined conclusion, and it prevents you from improving your knowledge of the phenomena at play.

  35. Dishman
    Posted Jul 20, 2008 at 10:05 AM | Permalink

    So, collecting up the data above:
    Mosher’s first download, generated 9/8/2007 does not contain changes made on 8/25/2007.

    There is a version control problem with GISTEMP. This is one of the most basic elements of Software Assurance.

    Proper Software Assurance process is demonstrably not in place. According to current Best Practices, the product is not reliable.

  36. John Goetz
    Posted Jul 20, 2008 at 2:00 PM | Permalink

    Sorry to belabor this but I want to make sure I am looking at the right file Steve Mosher intended above.

    My first GISStemp tar file is dated 9/8/2007 at 9:06 AM. The Ts.strange.RSU.list.IN file is stamped 8/25/2007 at 4:59 PM. This file is present in the STEP 0 and 1 input file directory. It does not appear to be used by any STEP 2 program that I know of, so I doubt it was meant to be included.

    In my copy of Ts.strange.RSU.list.IN I do not see entries for Orleans, Lake Spaulding, Electra, or Willows as Steve Mosher mentions in #2.

    I am confused!

  37. Cliff Huston
    Posted Jul 20, 2008 at 4:33 PM | Permalink

    John,

    I would guess that Steve Mosher’s download included the ‘STEP0/input_files/Ts.strange.RSU.list.IN’ [Stamp: Aug 25 2007 1:59PM] file. I think the point of confusion is the ‘list.of.stations.completely.removed.txt’ and ‘list.of.stations.someperiod.removed.txt’ files, which, if I understand the note in the ‘gistemp.txt’ file, are not used (being replaced with the ‘Ts.strange.RSU.list.IN’ file, due to changes made to USHCN data) and are only included for reference. It is in those unused files that Steve Mosher saw Orleans, Lake Spaulding, Electra, and Willows – and assumed that the files were being used to exclude the stations from the process.

    Cliff

  38. steven mosher
    Posted Jul 20, 2008 at 5:35 PM | Permalink

    re 37. Exactly. The data in those txt files correspond to the description
    given in Hansen 2001. ( exclude this time period for orleans, that time period for lake spaulding etc etc ) HOWEVER, the current state of the code doesn’t access these files.

    John note this. in 2001 Hansen eliminated ALL orleans data up until 1929 because
    it was abnormally cold. now look at the orleans data before 1929.

    I have extra crazy pills if you want some

  39. Posted Jul 20, 2008 at 6:13 PM | Permalink

    I have looked at the files that I have again. I have three GISTEMP versions. The first dropped on or about the 9/10/2007 date I downloaded it. This gzipped file is 2,201,165 bytes as reported by MS Windows Vista. The .tar file within it is timestamped 2007-09-07 19:02. This archive version contains the list.of.stations.completely.removed.txt (1702 bytes) and the list.of.stations.someperiod.removed.txt (9066 bytes) files in the root directory and these are timestamped 2007-09-07 18:17. This version has the Ts.strange.RSU.list.IN file in the STEP0/input_files directory (4829 bytes and timestamped 2007-08-25 15:59) and in the STEP1/input_files directory (same size and timestamp).

    The second dropped on or about the 9/17/2007 date I downloaded it. This gzipped file is 2,182,299 bytes as reported by MS Windows Vista. The .tar file within it is timestamped 2007-09-10 13:09. This archive version does contain the list.of.stations.completely.removed.txt and the list.of.stations.someperiod.removed.txt files in the root directory (same size and timestamp as the first version). This version also has the Ts.strange.RSU.list.IN file in the STEP0/input_files directory and in the STEP1/input_files directory (same sizes and timestamps as the 9/7/2007 version).

    The third dropped on or about the 6/23/2008 date I downloaded it. This gzipped file is 1,980,434 bytes as reported by Windows Vista. The .tar file within it is timestamped 2008-06-09 13:17. This archive version does not contain the list.of.stations.completely.removed.txt and list.of.stations.someperiod.removed.txt files. It does have the Ts.strange.RSU.list.IN file in the STEP0/input_files directory and in the STEP1/input_files directory (same sizes and timestamps as the 9/7/2007 version).

    I have not been able to compile and run the source yet (I have the Windows POSIX subsystem, a cygwin environment, an emulated Linux environment and a Linux environment on a virtual machine available for testing). If/when I get these sources to compile and run, I plan to post it on my own blog. I enjoy the challenge of porting code (I cut my teeth on FORTRAN IV), but this batch of code is a monstrosity that should be a case study for CS students in how not to develop code.

    I am not sure where I made the mistake in my post #24, so here is my mea culpa, mea maxima culpa.
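
    The kind of side-by-side comparison described above (which files each archive version contains, with what size and timestamp) can be scripted rather than done by eye. The following is only a minimal sketch using Python's standard tarfile module. The function names and the tiny in-memory demo archives are invented for illustration; the sizes and timestamps in the demo are made up and are not the real GISTEMP values.

```python
import io
import tarfile
from datetime import datetime, timezone


def tar_inventory(fileobj):
    """Map each regular file in a (possibly gzipped) tar archive to (size, mtime)."""
    inventory = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive.getmembers():
            if member.isfile():
                stamp = datetime.fromtimestamp(member.mtime, tz=timezone.utc)
                inventory[member.name] = (member.size, stamp.strftime("%Y-%m-%d %H:%M"))
    return inventory


def compare_inventories(old, new):
    """Return (removed, added, changed) file names between two inventories."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = sorted(name for name in set(old) & set(new) if old[name] != new[name])
    return removed, added, changed


def make_demo_archive(files):
    """Build a small in-memory .tar.gz from {name: (content, mtime)}. Demo data only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, (content, mtime) in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            info.mtime = mtime
            archive.addfile(info, io.BytesIO(content))
    buffer.seek(0)
    return buffer


# Two stand-in archives: the "newer" one drops one of the list files.
shared = {"STEP0/input_files/Ts.strange.RSU.list.IN": (b"y" * 20, 1188057540)}
older = make_demo_archive(
    {"list.of.stations.someperiod.removed.txt": (b"x" * 10, 1189189020), **shared}
)
newer = make_demo_archive(shared)

removed, added, changed = compare_inventories(tar_inventory(older), tar_inventory(newer))
print("removed:", removed)
print("added:", added)
print("changed:", changed)
```

    For real downloads, the same tar_inventory function could be given a file opened in binary mode instead of the in-memory demo buffer.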

  40. TheDude
    Posted Jul 20, 2008 at 6:28 PM | Permalink

    sausage, lool

  41. steven mosher
    Posted Jul 20, 2008 at 7:53 PM | Permalink

    re 36. The files I listed in #11 are the text files noted in Cliff’s #30

    “The download of GISTEMP_SOURCES that I got on Sept. 13, 2007 contains the files:
    list.of.stations.completely.removed.txt -Stamp:Sep 7, 2007 4:17 PM
    list.of.stations.someperiod.removed.txt -Stamp:Sep 7, 2007 4:17 PM”

    and not the same as Ts.strange.RSU.list.IN

    So. to reconstruct things. Long ago back in 2001 when Hansen wrote his paper
    he excluded portions of certain norcal records ( see his paper) THIS is
    confirmed by “list.of.stations.someperiod.removed.txt”
    SOMETIME after the publication of H2001 , USHCN got around to fixing the data from these sites. AGAIN, as cliff writes in #30 citing giss documents

    “The files “list.of.stations.completely/someperiod.removed.txt” had to be
    revised after switching to the newest USHCN file:
    all USHCN station corrections could be removed, since USHCN either omitted
    or corrected the parts listed in these files. For the currently used versions see STEP0/input_files/Ts.strange.RSU.list.IN .”

    Sometime after the original publication of the code nasa removed the files
    “list.of.stations.completely/someperiod.removed.txt” because they are essentially not referenced in the code or used, BUT at one point they were.

    So. in Hansen 2001 the data for Orleans was truncated prior to 1929. But now
    that data has been resurrected by USHCN

  42. John Goetz
    Posted Jul 20, 2008 at 7:58 PM | Permalink

    #41 Steve Mosher

    Thank goodness they keep an audit trail for this kind of thing. Right, Dishman?

  43. John Goetz
    Posted Jul 20, 2008 at 7:59 PM | Permalink

    #40 I had been grilling Johnsonville brats earlier in the evening, and I thought Cedarville sausage had a nice ring to it.

  44. steven mosher
    Posted Jul 20, 2008 at 8:25 PM | Permalink

    re 43. tell me that after you grilled them you soaked them in beer.

  45. Len van Burgel
    Posted Jul 21, 2008 at 1:49 AM | Permalink

    Geoff Sherrington #26

    Likewise Christmas Island, deleted 1908-1970. This was mined for phosphate for some decades and so there were technical people there. One can only guess at the reason for deletion over these years.

    Christmas Island (Indian Ocean) is the rocky outcrop of a submarine mountain. Its elevation extends up to 361m.
    Prior to 1973, the observations are from Rocky Point (elevation 17m), near the settlement, right on the coast. From 1973 the observations are taken at a location 4km south at the airport (elevation 261m). Understandably the mean annual maximum temperature is about 3C lower at the new site. Although the two locations are only 4km apart, it would be almost impossible to merge them into a continuous record.

  46. Geoff Sherrington
    Posted Jul 21, 2008 at 5:07 AM | Permalink

    Re # 34 Francois Ouellette

    But what I’ve learned from thousands of solitary hours in the lab is that it is the anomaly that is most interesting. You should always watch out for anomalies. And try to see if they’re repeatable, and then find an explanation. More often than not, you will have found something new and unexpected.

    Thank you for re-emphasising this point, with which I thoroughly agree.

    Re # 45 Len van Burgel Christmas Island.

    Thank you for this simple explanation. Do you know why 50 or so years of potentially good data were taken out? Did you find this explanation in a place where other deletion puzzles could be solved?

  47. Len van Burgel
    Posted Jul 21, 2008 at 6:10 AM | Permalink

    Re # 46 Geoff Sherrington
    It is by ‘local’ knowledge. Not too local, for I have never been to Christmas Island.

    I did ask a few people, but was sidetracked by the fact that Christmas Island weather records are under “Christmas Island Aero”, when the earlier records are labelled under “Rocky Point”. This is confusing since there are many places called “Rocky Point” in Australia. When you check Christmas Island climate averages on http://www.bom.gov.au/climate/averages/ it is shown as “nearest alternative site” 4km away and that can be checked on any Christmas Island map.

    Temperature data for Rocky Point is only shown for 1959-1972 whilst rainfall is from 1901 to 1973.
    That doesn’t mean temperature data doesn’t exist. It may not have been entered in the database. That means going to the archives to search the records. Checking the Australian National Archives web page shows they hold “Christmas Island weather observations books” for 1964-1966. That is not the end of it either, for Christmas Island is considered to be under the Bureau of Meteorology’s Western Australian Regional Office responsibility and some of their data is also stored in the Western Australian State Archives.

    I presume your interest doesn’t go further, so I will give the archives a miss.

    To solve other deletion problems, ask a local.

  48. John S.
    Posted Jul 21, 2008 at 3:46 PM | Permalink

    Three questions arise concerning Cedarville sausage (a brilliant phrase that ought to be trademarked):

    1. Does homogenization improve the accuracy of the record? Methinks not.

    2. Does it adequately rein in the suspected “Batman” flaw? Possibly.

    3. Does it introduce a positive recent trend into the data, where there was none before? Certainly!

    From the perspective of an analysis of yearly deviations from the 20th-century means of vetted, intact records in this climate area, the “Batman” feature appears to be no flaw. The average result shows very stable temps, with 1926, 1934, and 1931 by far the warmest years of the century, with deviations well in excess of 2 sigmas. I see nothing in the Cedarville record to impeach it.

    The remarkable thing about the GISS sausage machine, where anomalies are computed from homogenized data, is that the 1951-1980 “norm” is also changed thereby. Thus not only a false trend, but also a bias goes into the computational legerdemain. Moreover, from a climatic perspective, the use of stations from a wholly different climate area (e.g. Mina NV) just because they happen to fall within a certain distance is indefensible. As the widely divergent trends at neighboring stations clearly show, this topographically convoluted area is a quiltwork of microclimates.

    But none of this seems to matter to the crew of spaceship GISS, who find their sausage to be very tasty.
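
    The point about the 1951-1980 “norm” moving when anomalies are computed from adjusted data can be illustrated with a toy calculation. This is only a sketch on synthetic numbers: the flat 10.0-degree record and the linear ramp adjustment are invented for illustration and are not the GISS algorithm or the Cedarville data.

```python
def baseline_mean(series, start=1951, end=1980):
    """Mean of the values whose year falls in the baseline period (inclusive)."""
    values = [temp for year, temp in series.items() if start <= year <= end]
    return sum(values) / len(values)


def anomalies(series, start=1951, end=1980):
    """Deviation of each year from the baseline-period mean of the same series."""
    norm = baseline_mean(series, start, end)
    return {year: temp - norm for year, temp in series.items()}


# Synthetic, perfectly flat record: 10.0 degrees C every year, 1900 through 2000.
raw = {year: 10.0 for year in range(1900, 2001)}

# Hypothetical adjustment (NOT the GISS method): cool earlier years along a
# linear ramp of 0.01 degrees per year that reaches zero in 2000.
adjusted = {year: temp - 0.01 * (2000 - year) for year, temp in raw.items()}

print("raw baseline:     ", round(baseline_mean(raw), 3))
print("adjusted baseline:", round(baseline_mean(adjusted), 3))
print("raw anomaly 2000:     ", round(anomalies(raw)[2000], 3))
print("adjusted anomaly 2000:", round(anomalies(adjusted)[2000], 3))
```

    In this toy case the adjustment both tilts the series and lowers the baseline mean, so the year 2000 shows a positive anomaly even though its own value was never changed.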

  49. Scott Lurndal
    Posted Jul 23, 2008 at 1:43 PM | Permalink

    Re #24

    The only thing that can cause the MD5 hash to vary between machines is that the data being hashed (the gzip’ed tarball) has changed. The purpose of MD5 (Message Digest Algorithm #5) is to ‘digest’ a document (text or binary, it doesn’t matter) into a small, cryptographically secure hash (i.e. the chances of similar data producing the same MD5 hash are quite small) that can be used to detect changes (from a single bit to whole blocks of data) in a dataset.

    Of course, if the implementation of the algorithm to produce the hash is flawed (not necessarily unlikely with Microsoft software), then it could vary from what a Linux MD5 (md5sum) implementation produces, but Microsoft would be forced to fix that immediately as the security community would go nuts.

    That said, the literature does show methods that can be used to ‘craft’ a false text that will produce the same MD5 sum, which has caused cryptographers to use other digest algorithms (e.g. SHA-256).
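
    For anyone who wants to compute a digest the same way on any platform, here is a minimal sketch using Python's standard hashlib module. The function name file_digest and the temporary demo file are illustrative only; the demo hashes the three bytes "abc", whose MD5 and SHA-256 values are the standard published test vectors, rather than the actual GISTEMP tarball.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hash a file's raw bytes in chunks so large archives need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:  # binary mode: no newline translation
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file containing the bytes b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

md5_hex = file_digest(demo_path, "md5")
sha256_hex = file_digest(demo_path, "sha256")
os.remove(demo_path)

print("MD5:    ", md5_hex)
print("SHA-256:", sha256_hex)
```

    Because the file is read in binary mode, the result depends only on the bytes in the file and not on the operating system doing the hashing.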

  50. crosspatch
    Posted Jul 23, 2008 at 1:51 PM | Permalink

    There are things such as what constitutes an end of a line that differ between operating systems and possibly versions of the same operating system. Some editors can also leave “artifacts” in a file. So you can have two files that look identical on, say, a Windows and a Linux machine that might have completely different MD5 results.
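
    A small sketch of that effect, using Python's hashlib on made-up text (the two sample lines are placeholders, not real station data): converting Unix line endings to Windows line endings changes the bytes, and therefore the MD5, even though the text looks the same in an editor.

```python
import hashlib

# Placeholder text with Unix (LF) line endings.
unix_text = b"station line 1\nstation line 2\n"

# The same text as a Windows editor might save it (CRLF line endings).
windows_text = unix_text.replace(b"\n", b"\r\n")

unix_md5 = hashlib.md5(unix_text).hexdigest()
windows_md5 = hashlib.md5(windows_text).hexdigest()

# Converting the line endings back restores the original bytes and hash.
normalized_md5 = hashlib.md5(windows_text.replace(b"\r\n", b"\n")).hexdigest()

print("same text, different hash:", unix_md5 != windows_md5)
print("hash restored after normalizing:", normalized_md5 == unix_md5)
```

    This applies to text files that have been opened and re-saved. A gzipped tarball that is downloaded and hashed without being unpacked or edited is compared byte for byte, so line-ending conventions do not come into it.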

  51. Scott Lurndal
    Posted Jul 23, 2008 at 2:35 PM | Permalink

    Re #44:

    Simmer them in beer (and onions), then grill them.

  52. Scott Lurndal
    Posted Jul 23, 2008 at 2:37 PM | Permalink

    Re #50:

    If the eol and/or eof character changes, or any other ‘editor artifact’ is left, then your dataset has been compromised.

    If the MD5 sums don’t match, you can’t trust the data. Period.

  53. Posted Jul 23, 2008 at 6:56 PM | Permalink

    I can only report the MD5 hash that Windows reports. I can only assume that Windows would report a hash value generated by Unix properly, but there is no guarantee of 100% certainty that someone using an older Windows version would see the same hash value that I see. Microsoft has had cryptographic issues in the past, IIRC; but I do believe that all of the modern NT derived Windows kernels would correctly report a Unix generated MD5 hash (Vista Ultimate has a POSIX subsystem that you can install and use most of the modern Unix tools). Posting the hash value I could see would allow others to compare their hashes and be reasonably sure that their versions and mine match.

  54. John S.
    Posted Jul 24, 2008 at 4:46 PM | Permalink

    As a follow-up to the gist of #48, I patched the Cedarville and Golconda records and performed a cross-spectrum analysis.

    It turns out that the two records show insignificant coherence at frequencies below the Wolf sunspot cycle (~11yrs). At these low frequencies, Cedarville is significantly coherent with Redding CA, while Golconda is similarly related to Mina and Lovelock NV. Thus Golconda provides no credible basis for any trend-altering “correction” of the Cedarville record.

    In areas with steep climatic gradients, the longest rural record within a certain radius of the station to be “homogenized” does not necessarily provide an unbiased backbone for any adjustment. This ad hoc criterion is simply a device of computational convenience, rather than of climatological reasoning.
