Quantifying the Hansen Y2K Error

I observed recently that Hansen’s GISS series contains an apparent error in which Hansen switched the source of GISS raw from USHCN adjusted to USHCN raw for all values from January 2000 on. For Detroit Lakes MN, this introduced an error of 0.8 deg C. I’ve collated GISS raw minus USHCN adjusted for all USHCN sites (using the data scraped from the GISS site, for which I was most criticized in Rabett-world). Figure 1 below shows a histogram of the January 2000 step for the 1221 stations (calculated here as the average of the GISS-minus-USHCN difference after January 2000 less its average over the 1990-1999 period). The largest step occurred in Douglas AZ, where the Hansen error is 1.75 deg C! There is obviously a bimodal distribution.
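The step statistic just described can be sketched in a few lines. This is a minimal illustration only; the function and toy data below are hypothetical, and the original collation would have been done in R against the scraped station files.

```python
# Sketch of the per-station Y2K step statistic: the mean of the
# GISS-raw-minus-USHCN-adjusted difference for Jan 2000 onward,
# less its mean over 1990-1999. Data here are illustrative only.

def y2k_step(diffs):
    """diffs: list of (year, month, giss_minus_ushcn) tuples."""
    before = [d for (y, m, d) in diffs if 1990 <= y <= 1999]
    after = [d for (y, m, d) in diffs if y >= 2000]
    return sum(after) / len(after) - sum(before) / len(before)

# Toy series: difference is 0.0 through 1999, then jumps by 0.8
# (the Detroit Lakes MN case described above).
toy = [(y, m, 0.0) for y in range(1990, 2000) for m in range(1, 13)]
toy += [(y, m, 0.8) for y in range(2000, 2007) for m in range(1, 13)]
print(round(y2k_step(toy), 2))  # 0.8
```

Applied to all 1221 stations, the resulting steps are what the histogram in Figure 1 summarizes.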

hansen40.gif

Next here is a graph showing the difference between GISS raw and USHCN adjusted by month (with a smooth) for unlit stations (which are said to define the trends). The step in January 2000 is clearly visible and results in an erroneous upward step of about 0.18-0.19 deg C in the average of all unlit stations. I presume that a corresponding error would be carried forward into the final GISS estimate of US lower 48 temperature and that this widely used estimate would be incorrect by a corresponding amount. The 2000s are warm in this record with or without this erroneous step, but this is a non-negligible error relative to (say) the amounts contested in the satellite record disputes.

hansen41.gif

Aug 7 UPDATE:
On the weekend, I notified Hansen and Ruedy of their Y2K error as follows:

Dear Sirs,
In your calculation of the GISS “raw” version of USHCN series, it appears to me that, for series after January 2000, you use the USHCN raw version whereas in the immediately prior period you used USHCN time-of-observation or adjusted version. In some cases, this introduces a seemingly unjustified step in January 2000.

I am unaware of any mention of this change in procedure in any published methodological descriptions and am puzzled as to its rationale. Can you clarify this for me?

In addition, could you provide me with any documentation (additional to already published material) providing information on the calculation of GISS raw and adjusted series from USHCN versions, including relevant source code.

Thank you for your attention,
Stephen McIntyre

Today I received the following response:

Dear Sir,

As to the question about documentation, the basic “GISS Surface Temperature Analysis” page starts with a “Background” section whose first paragraph contains the sentence: “Input data for the analysis, …, is the unadjusted data of GHCN, except that the USHCN station records were replaced by a later corrected version”. A similar statement appears in the “Abstract” and the “Introduction” section of our 2001 paper (JGR Vol 106, pg 23,947-23,948). The Introduction explains the above statement in more detail.

In 2000, USHCN provided us with a file with corrections not contained in the GHCN data. Unlike the GHCN data, that product is not kept current on a regular basis. Hence we used (as you noticed) the GHCN data to extend those data in our further updates (2000-present).

I agree with you that this simple procedure creates an artificial step if some new corrections were applied to the newest data, rather than bringing the older data in sync with the latest measurements – as I naively assumed. Comparing the 1999 data in both data sets showed that in about half the cases where the 1999 data were changed, the GHCN data were higher than the USHCN data and in the other half it was the other way round with the plus-corrections slightly outweighing the minus-corrections.

Although trying to eliminate those steps should have little impact on the US temperature trend (much less the global trend), it seems a good idea to do so and I’d like to thank you for bringing this oversight to our attention.

When we did our monthly update this morning, an offset based on the last 10 years of overlap in the two data sets was applied and our on-line documentation was changed correspondingly with an acknowledgment of your contribution. This change and its effect will be noted in our next paper on temperature analysis and in our end-of-year temperature summary.

The effect on global means and all our tables was less than 0.01 C. In the display most sensitive to that change – the US-graph of annual means – the anomalies decreased by about 0.15 C in the years 2000-2006.

Respectfully,

Reto A Ruedy

Well, my estimate of the impact on the US temperature series was about 0.18-0.19 deg C, a little bit more than Ruedy’s 0.15 deg C. My estimate added a small negative offset going into 2000 to the positive offset of about 0.15-0.16 after 2000 – I suspect that Ruedy is not counting both parts, thereby slightly minimizing the impact. However, I think that you’ll agree that my estimate of the impact was pretty good, given that I don’t have access to their particular black box.
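The reconciliation of the two estimates is simple arithmetic. The specific offsets below are assumptions chosen to fall inside the ranges quoted above, not measured values:

```python
# Illustrative reconciliation of the two step estimates. The numbers
# are assumptions matching the ranges quoted above, not measurements.
pre_2000_offset = -0.03    # small negative offset going into 2000
post_2000_offset = 0.155   # positive offset after 2000 (~0.15-0.16)

# Counting both parts gives the full size of the step.
full_step = post_2000_offset - pre_2000_offset
print(round(full_step, 3))  # 0.185, i.e. within the 0.18-0.19 range
```

Counting only the post-2000 part, as Ruedy appears to have done, yields the smaller 0.15 figure.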

Needless to say, they were totally unresponsive to my request for source code. They shouldn’t be surprised if they get an FOI request. I’ll post some more after I get a chance to cross-check their reply.

As to the impact on NH and global data, I’ve noted long before this exchange that the non-US data in GHCN looks more problematic to me than the US data and it would be really nice if surfacestations.org started getting some international feedback. Ruedy’s reply was copied to Hansen and to Gavin Schmidt. I’m not sure what business it is of Gavin’s other than his “private capacity” involvement in a prominent blog.

Eureka

Well, I found it. Eureka that is, but what I found was rather depressing. I visited the USHCN climate stations of record, both old and new, at Eureka, California on July 30th and I can’t decide which site is more out of compliance with siting standards.

The original location was on the 4th story roof of the main Post office in downtown Eureka and had been there since before 1900:
Eureka Post Office
The weather station was on the elevated scaffold above the roof; it is still visible today.

But the station was moved to the National Weather Service Office on Woodley Island in 1994. They had a chance then to do a good site setup and to adhere to published siting standards, but alas, no such luck:

Eureka NWS Office

The site at the Eureka Weather Service Office has several problems: Asphalt parking lot within 100 feet, buildings within 100 feet, and a line of large shade trees about 25 feet to the west and north, making significant shade for about 1/3 to 1/2 of the day, depending on the time of year, plus wind shelter. Plus there is the concrete under the Stevenson Screen, the 3 concrete pillars for rain gauges that can act as heat sinks, and the crushed rock around the station site. “Crushed rock” isn’t generally found in Humboldt county as a natural surface, so the surface under the weather station isn’t representative of the surrounding area.

Eureka NWS Office Detail

A complete photo essay is available here on www.surfacestations.org

If the NWS doesn’t see fit to make the sites at their own offices comply with published site standards, is it any wonder that so many of the other climate stations of record are so far out of compliance?

More on Asphalt

WMO guidelines state that weather stations should be at least 100 feet from paved areas. As we see the USHCN pictures unfold, we’re obviously seeing one site after another in non-compliance with this requirement, a point notably made in connection with the Tucson (University of Arizona) site, where the violation was particularly gross, but the problem is seemingly pervasive. While many of these pictures also show air conditioners, my guess is that the asphalt pavement may prove to be a more substantial problem than the air conditioners.

I notice that GISS apologist Eli Rabett has another post arguing that traditional quality control doesn’t matter – this time arguing that heat rises and thus, for example, nearby air conditioners don’t matter. Perhaps so, perhaps not. Eli’s implication is that WMO policies don’t “matter”, that, in effect, the practical WMO people are just fuddy-duddies, making pointless QC demands that are unnecessary when Hansen’s on the scene with magic adjustment software. While Eli’s implied criticism of WMO policies may be borne out, my own guess is that the WMO guidelines were created for a reason and that they embody useful practical knowledge – that there’s a reason why, for example, WMO guidelines require that weather stations be 100 feet from pavement and perhaps there are even reasons not to locate them near air conditioners.

But today a little more on pavement and specifically asphalt pavement and why it’s not a good idea to locate weather stations within 100 feet of pavement. The radius is relevant since the pavement strongly re-radiates IR and will affect weather stations that are not directly above it.

The USHCN Basketball Team

Odessa_basketball.jpg

This is the climatological station of record for Odessa, Washington. It is at the residence of a COOP weather observer administered by NOAA. The photo was taken by surfacestations.org volunteer surveyor Bob Meyer.

In addition to the proximity to the house and asphalt closer than the published 100-foot NOAA standard, we have a basketball goal nearby. This is a first as far as I know. I don’t know if any studies or standards exist that describe what effects, if any, having the MMTS sensor whacked by errant basketballs might have.

Speaking from my own electronic design experience though, transient and numerous G forces applied to electronic sensors don’t generally allow for sustained accuracy and reliability.

The complete photo album for this station is available on www.surfacestations.org

Trends in Peterson 2003

Peterson 2003 stated:

Contrary to generally accepted wisdom, no statistically significant impact of urbanization could be found in annual temperatures.

Last week, Peterson sent me a list of the 289 sites used in this study, together with the classification into urban and rural. As I noted previously, there are many puzzles in the allocation of sites to urban and rural with many “urban” sites seemingly being at best very small towns and, in some cases, rural themselves. So, in that sense, it would seem unsurprising if Peterson didn’t observe any difference between the two networks.

(Lead-in posts are here and here.)

Assuming nothing, I downloaded raw daily data for 282 of the 289 sites. (The other 7 sites either had id number discrepancies or were not online at GHCND.) From this, I calculated average monthly TMAX and TMIN temperatures for all the sites and then calculated 1961-1990 anomalies. I then calculated simple averages of the “raw” anomalies for the two networks BEFORE any jiggery-pokery. Even if all the subsequent adjustments are terrific, from a statistical point of view it’s always a good idea to see what your data looks like at the start. Here is a plot (with a 24-month smooth).
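The pipeline just described – daily readings to monthly means to 1961-1990 anomalies to a simple network average – can be sketched as follows. The data structures and function names are hypothetical, and the original calculation would have been done in R:

```python
# Sketch of the anomaly pipeline described above: daily -> monthly
# means -> 1961-1990 baseline anomalies. Data here are illustrative.
from collections import defaultdict

def monthly_means(daily):
    """daily: list of (year, month, temp) readings -> {(year, month): mean}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for y, m, v in daily:
        sums[(y, m)] += v
        counts[(y, m)] += 1
    return {k: sums[k] / counts[k] for k in sums}

def anomalies(monthly, base=(1961, 1990)):
    """Subtract each calendar month's base-period mean from every value."""
    clim = defaultdict(list)
    for (y, m), v in monthly.items():
        if base[0] <= y <= base[1]:
            clim[m].append(v)
    clim = {m: sum(vs) / len(vs) for m, vs in clim.items()}
    return {(y, m): v - clim[m] for (y, m), v in monthly.items()}

# Toy check: a station 1 degree warmer after the base period
# shows a +1.0 anomaly.
daily = [(y, 1, 10.0) for y in range(1961, 1991) for d in range(31)]
daily += [(2000, 1, 11.0)] * 31
anom = anomalies(monthly_means(daily))
print(round(anom[(2000, 1)], 2))  # 1.0
```

Averaging the per-station anomalies within the “urban” and “rural” networks then gives the two series compared in the figure below.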

As you see in the bottom panel, there is an observable trend in the difference between Peterson-urban and Peterson-rural sites. The delta over 100 years is just under 0.7 deg C.

peters26.gif
Figure 1. Peterson 2003 Network Averages. Top -“urban”; middle – “rural” ; bottom – difference.

You would think that this would have been one of the first tests that Peterson carried out; his failure either to perform the test or to report its results is noticeable.

Peterson’s article describes a series of adjustments: for elevation, latitude, time-of-observation and MMTS. Not all of these adjustments are relevant to an anomaly-based comparison. For example, the adjustment for elevation and latitude is relevant to a direct comparison of urban and rural absolute temperatures, but not for a comparison of anomaly trends. Peterson cites literature (Quayle et al 1991) which states that MMTS introduction has minimal effect on averages (although it increases the TMIN and reduces the TMAX). So this would not account for the difference.

Peterson reported on TOB as follows:

The percentage of stations reading in the afternoon is about the same for rural (33%) as urban (35%). However, rural stations have a higher percentage of a.m. readers (53% versus 37%) and a lower percentage of midnight readers (14% versus 27%) than urban stations.

Again for trends, the salient point is the change in proportions, rather than the specific mechanism. The implication of Peterson’s analysis would seem to be that the 0.7 deg C delta in Peterson urban-Peterson rural differential is not due to the effect of urbanization on the urban sites but related somehow to the higher present proportion of morning to midnight readers in the rural network.

Readers should note that Peterson does not carry out TOB adjustments based on documented changes in observation time (which USHCN users might assume). Instead Peterson has used a procedure attributed to DeGaetano BAMS 2000, which purports to estimate observation time based on the properties of the data itself. The DeGaetano procedure, as with so many of these recipes, is not a statistical procedure known to statistical civilization off the island. You can’t go to a statistics textbook and learn its properties. There is no systematic presentation of DeGaetano-adjusted TOBS series against USHCN adjusted series.

However, regardless of the merits of the DeGaetano adjustment, I think that it’s incorrect for Peterson to say that there is no observable difference in urban and rural trends in his network. There is a substantial difference in trends in the “raw” data, which should have been reported. He believes that this difference is due to TOBS changes based on De Gaetano adjustments, but it’s possible that there is some other explanation for the difference, including the obvious candidate – UHI.

UPDATE

This comparison actually gets a little worse.

In the figure below, I’ve calculated the average unadjusted temperature for actual cities, rather than places like Snoqualmie Falls. My criterion for inclusion in this calculation is whether the city has a major league sports franchise; the sample includes a variety of mostly small-market cities: Milwaukee, Sacramento, Orlando, San Antonio, Cincinnati, San Diego, Seattle, Salt Lake City, New Orleans, plus a couple of larger places: Detroit, Philadelphia, Dallas. To my knowledge, no sports franchises are considering relocation or expansion to Snoqualmie Falls, Hankinson, Pine Bluff or the various other supposedly “urban” sites that dilute the Peterson network.

In this data set, which supposedly shows the following:

Contrary to generally accepted wisdom, no statistically significant impact of urbanization could be found in annual temperatures.

actual cities have a very substantial trend of over 2 deg C per century relative to the rural network – and this assumes that there are no problems with the rural network, something that is obviously not true, since there are undoubtedly microsite and other problems. At the very end of the graphic, the difference levels off – I wonder if that might indicate increased settlement effects at rural sites.

peters27.gif
Figure 2. Comparison of Peterson Sites with Major League Sports Franchises to Rural Network

Now this doesn’t prove anything one way or the other about other networks – other than that there is a need to be wary. However, the notion that Peterson 2003 is a sustainable authority for the IPCC proposition that “rural station trends were almost indistinguishable from series including urban sites” seems increasingly difficult to accept.

Peterson’s “Urban” Sites

I posted up the list of 289 sites from Peterson 2003 purporting to show that the difference between “urban” and rural sites was negligible. (See related posts here and here.)

As noted before, the definition of “urban” includes such metropolises as Wahpeton ND and Hankinson ND. Cottage Grove 1S OR is classified as “urban”, while Cottage Grove Dam OR is classified as “rural”. Some “clusters” lack rural comparanda. Some clusters lack any locations in the Peterson supporting table. The locations are said to be derived from Gallo and David 1999, which had 28 “urban” stations. While most of the sites recur somewhere in this list, the “urban” sites are blended with rural sites before the comparison. Oddly, one of the “urban” Gallo and David sites (Columbia MO) is classified as “rural” in Peterson 2003.

I’ve collated the urban and rural sites in a long table for your edification. Only about 25% of these sites even occur in the USHCN network or GHCN network (and thus in the gridcell composites). It’s hard to see exactly what a comparison of the left column sites and right column sites has to do with whether the urbanization affects the CRU, NOAA or GISS composites.

Hansen’s Y2K Error

Eli Rabett and Tamino have both advocated faith-based climate science with respect to USHCN and GISS adjustments. They say that the climate “professionals” know what they’re doing; yes, there are problems with siting and many sites do not meet even minimal compliance standards, but Hansen’s software is able to “fix” the defects in the surface sites. “Faith-based” because they do not believe that Hansen has any obligation to provide anything other than a cursory description of his software or, for that matter, the software itself. But if the adjustments start from data known to be bad, then critical examination of the adjustment software becomes integral to the integrity of the record – as there is obviously little integrity in much of the raw data.

Eli Rabett has recently discussed the Detroit Lakes MN series as an example where the GISS adjustment software has supposedly triumphed over adversity and recovered signal from noise. And yet this same series displays a Hansen adjustment that should leave anyone “gobsmacked”.

Unthreaded #17

Continuation of Unthreaded #16

Peterson (2003)

Peterson (2003) online here is an influential study cited by IPCC AR4 purporting to show that the urbanization effect is negligible. It concluded:

Using satellite night-lights-derived urban/rural metadata, urban and rural temperatures from 289 stations in 40 clusters were compared using data from 1989 to 1991. Contrary to generally accepted wisdom, no statistically significant impact of urbanization could be found in annual temperatures.

AR4 said of this study:

Over the conterminous USA, after adjustment for time-of-observation bias and other changes, rural station trends were almost indistinguishable from series including urban sites (Peterson, 2003; Figure 3.3)… One possible reason for the patchiness of UHIs is the location of observing stations in parks where urban influences are reduced (Peterson, 2003).

The 289 stations are not listed in the article and no SI is available. One of our readers inquired about the stations and I wrote to Peterson asking him for the information as follows:

Dear Dr Peterson, could you please provide me a list of the USHCN id numbers of the 289 stations used in Peterson 2003, together with information on how they are allocated to the 40 clusters and how they are classified rural-urban? It would be helpful if you provided an SI with this sort of information. Thanks, Steve McIntyre

As I anticipated, Peterson responded promptly with the list, which I’ve posted up here with some additional information that I’ve collated. The first 5 columns are from Peterson. Peterson did not provide lat-longs; I looked these up in the state files at http://www.wrcc.dri.edu/inventory/sodca.html (CA) etc. I also compared the ID numbers to USHCN id numbers and marked the USHCN-overlapping series. Peterson added the following additional comments:

a) If you read the article, you will notice that I did not specifically use USHCN stations in the analysis.

b) If you read the article, you will notice that I did not classify the stations as rural or urban. That was done by Owen et al. (1998) based on night lights data and used by Gallo and Owen (1999). Kevin Gallo provided me with the metadata he used in Gallo and Owen. I’m sure Dr. Gallo would be happy to answer any questions you might have that are not already addressed in those two papers.

c) The attached file is a list of the 289 stations I evaluated and processed for Peterson 2003. The first number is the group, 1-40; the second column is the rural/urban satellite night lights based metadata classification; the third column is the station number and the fourth the station name. If you read Peterson and Owen (2005) you will come across this statement: “The mean urban minus rural difference is 0.03°C using adjusted data and 0.00 with the modified adjusted data. The first result differs slightly from the 0.04°C reported in Peterson (2003) with the difference due to correcting a processing error in the metadata assignments at a few of the stations.” The metadata I provided you are the corrected metadata.

Of the 289 stations, only 63 came from the USHCN network.

In the article, Peterson said that 85 of these stations were rural, 191 urban, and 13 suburban. In the file, there were 84 rural, 6 suburban and 199 urban. Although Peterson attributed the 40 clusters to Gallo and Owen 1999, that article only used 28 clusters and not all of the 28 clusters could be identified in the Peterson list. So some additional selection procedure has been applied.

I did some cross-analyses of the 63 USHCN stations in the Peterson network – the USHCN network being said elsewhere to be mostly “rural”. Of these stations, 13 were rural, 2 suburban and 48 urban.

On an earlier occasion, I did a concordance of USHCN identification numbers to GISS lights – such concordances take a surprising amount of time – and I used this information to cross-check GISS lights against Peterson’s network – with the usual surprising results. Of the 63 USHCN stations in the Peterson network, 9 had GISS lights of 0. However, 3 of the 9 sites with lights=0 were classed by Peterson as urban (Fort Yates ND; Utah Lake Lehi UT; Fort Valley AZ) while 6 were classified as rural.

Of the 13 Peterson USHCN sites classified as “rural”, the GISS lights were as high as 19. Checking the 48 Peterson USHCN sites classified as “urban”, 15 had GISS lights less than or equal to 19 (including 3 with lights=0 as noted above.)

One of the Peterson clusters is in California, where surfacestations.org has a strong survey presence. Here Peterson compared 7 sites classified as “urban”: USHCN sites DAVIS 2 WSW EXP FARM, Lodi; non-USHCN urban sites: Placerville, Sacramento FAA AP, Sacramento WSO City, Antioch Pump Plant, Folsom Dam; against one rural non-USHCN non-GHCN site: Camp Pardee. I’m not sure what exactly this proves. If the USHCN is supposed to be a “high quality” network, it’s puzzling that so many sites in the Peterson network do not come from either this network or the GHCN network.

In a follow-up email, Peterson said that the data could be obtained from http://www.ncdc.noaa.gov/oa/mpp/digitalfiles.html which unfortunately is a pay-per-view site. JerryB observed that most of the identification numbers could be matched in the GHCND station inventory. I tested this and matched all but 3 sites (Kissimmee, New Orleans Audubon and Red Rock NV) to GHCND identifications. With a change of one digit, New Orleans Audubon and Kissimmee can be matched to GHCND series. The GHCND station inventory seems to be missing NV stations from the last half of the alphabet. Daily data for each of these stations can be downloaded from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/all by adding a prefix and suffix to the station id. For example:

usid=457473
loc=file.path("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/all", paste("42500", usid, ".dly", sep=""))
download.file(loc, "temp.dat")
test=readLines("temp.dat")
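Once a .dly file is in hand, each line can be parsed by fixed-width positions. Per NOAA’s GHCN-Daily readme, a record is: station id in columns 1-11, year 12-15, month 16-17, element 18-21, then 31 groups of (value, mflag, qflag, sflag) in 8-character fields, with TMAX/TMIN values in tenths of a degree C and -9999 marking a missing day. The sketch below (in Python, for illustration; the snippet above is R) is a minimal parser under those assumptions:

```python
# Minimal parser for one line of a GHCN-Daily .dly record, following
# the fixed-width layout in NOAA's GHCN-Daily readme.

def parse_dly_line(line):
    rec = {
        "id": line[0:11],
        "year": int(line[11:15]),
        "month": int(line[15:17]),
        "element": line[17:21],
        "values": [],
    }
    for day in range(31):
        start = 21 + day * 8       # 8-char field: 5-char value + 3 flags
        v = int(line[start:start + 5])
        rec["values"].append(None if v == -9999 else v / 10.0)  # tenths of deg C
    return rec

# Toy record: TMAX of 25.0 C on day 1, missing thereafter.
line = "USC00457473" + "2000" + "01" + "TMAX" + "  250   " + "-9999   " * 30
rec = parse_dly_line(line)
print(rec["element"], rec["values"][0], rec["values"][1])  # TMAX 25.0 None
```

From parsed records like this, the monthly TMAX and TMIN averages used above follow directly.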

I haven’t done any further tests on this yet, but spot checking of some clusters seems quite practical, though the basis for selection of these comparisons seems very unclear. (See related posts here and here.)

Reference:
Peterson, T.C., 2003: Assessment of urban versus rural in situ surface temperatures in the contiguous United States: no difference found. J. Clim., 16, 2941-2959.

Gerry North's Suggested Reading on Climate Models

Obviously the big issue in climate is the impact of 2 times CO2. While my own issues tend to be statistical, this site obviously attracts many readers who want to jump straight to discussion of thermodynamics and theories of atmospheric physics. Articles like G and T obviously inflame such tendencies. For some reason, people who don’t necessarily feel competent to speak up on statistical issues, somehow feel able to opine on complicated issues of atmospheric physics. Although the issues are ultimately much more important than the statistical issues, the quality of discussion, in my opinion, is consistently much lower. This is the only reason that I’ve discouraged discussion of these topics.

However, despite my efforts to discourage such topics, such discussion has become an increasing proportion of posts here and, in my opinion, this has detracted from the average quality of comments here.

I don’t particularly blame readers for their struggles in trying to fully understand exactly how increased CO2 results in higher temperatures. I believe that most readers would like an articulate exposition of how increased CO2 translates into increased temperature. I believe that IPCC had an obligation to provide such an exposition and failed to do so. Their failure to provide such an exposition has left fallow ground for articles like G and T.

As some of you are aware, I’ve regularly asked critics of this blog for suggestions on an exposition of how increased CO2 translates into increased temperature and have little to show for such requests – this in itself is surprising or should be surprising. Yesterday, I asked Gerry North, the Chairman of the NAS Panel for a suggested reference and he’s sent me an article and covering letter.

The purpose of this blog is to analyze articles in detail and not to provide a platform for venting. For the next couple of weeks, I would like to declare a moratorium on people venting their own opinions about climate models, atmospheric physics and thermodynamics, however meritorious these opinions may be. If you want to comment on the model set out in the article below and identify shortcomings and defects in it, fine, otherwise please hold your fire on these topics.