On Feb 20, 2008, I wrote a post reviewing the provenance of various versions of an individual USHCN station (Lampasas), observing that a much more recent version was available at NOAA than at CDIAC (the source used by NASA). I made the following recommendation:
Regardless of whether these station histories “matter”, surely there’s no harm in NASA (and GHCN) adopting rational approaches to their handling of the USHCN network. To that end, I would make several recommendations to NASA:
1. Use the NOAA USHCN version rather than the stale CDIAC and/or GHCN versions.
2. Lose the splice and the patch.
3. Use USHCN interpolations rather than Hansenizing interpolations.
4. Use TOBS or perhaps MMTS, and if MMTS is used, ensure that NOAA places this online.
A reader observed that, on March 1, 2008, NASA announced the implementation of the first recommendation.
March 1, 2008: Starting with our next update, USHCN data will be taken from NOAA’s ftp site rather than from CDIAC’s web site. The file will be uploaded each time a new full year is made available. These updates will also automatically include changes to data from previous years that were made by NOAA since our last upload. The publicly available source codes were modified to automatically deal with additional years.
No mention was made of the CA recommendation (although they seemed to be aware of the CA discussions, as they altered the legend on one of their station inventory webpages, inserting a caption for a series then under discussion here).
They didn’t mention anything about the patch – recommendation 2. It will be worth checking to see how they implemented the new version. The most logical approach (recommended by CA) would have been to use the NOAA USHCN Filnet version, which did away with any need for a patch (this patch arose out of the Y2K correction – they calculated patches for all 1221 USHCN stations so that there was no Y2K step between the GHCN Raw version and CDIAC SHAP/Filnet versions). If they use the NOAA USHCN Filnet version consistently, then there is no need for the calculation of a patch; plus, their series will be traceable back to its sources.
It is, of course, possible that they’ve continued to pull post-2005 data from USHCN Raw (the up-to-date NOAA version) and pre-2005 data from USHCN Filnet, and continued to estimate a patch between the two. It would be pretty silly if they did, and I hope that they don’t. (The new method is not yet implemented in the online database.)
They’ve slightly edited their methodology page to reflect the changed procedure. As a source, instead of CDIAC, they now show:
For US: USHCN – ftp://ftp.ncdc.noaa.gov/pub/data/ushcn
hcn_doe_mean_data.Z
station_inventory
This doesn’t make clear which USHCN version they use – something that should be shown on this page. They continue to describe a splicing step. This may simply be an oversight, or they may plan to continue splicing.
Replacing USHCN-unmodified by USHCN-corrected data:
The reports were converted from F to C and reformatted; data marked as being filled in using interpolation methods were removed. USHCN-IDs were replaced by the corresponding GHCN-ID. The latest common 10 years for each station were used to compare corrected and uncorrected data. The offset obtained in this way was subtracted from the corrected USHCN reports to match any new incoming GHCN reports for that station (GHCN reports are updated monthly; in the past, USHCN data used to lag by 1-5 years).
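For concreteness, here is a minimal sketch (in Python, with hypothetical names and annual rather than monthly values) of the splice as I read the quoted description: average the difference over the latest common ten years and subtract that offset from the corrected series before appending newer GHCN reports. This is my reading of the paragraph above, not NASA’s actual code.

    def splice_offset(ushcn_corrected, ghcn_raw, n_years=10):
        """Estimate the offset removed in a GISS-style splice.

        ushcn_corrected, ghcn_raw: dicts mapping year -> annual mean (deg C).
        The names and the annual resolution are illustrative assumptions,
        not the actual GISS data structures.
        """
        common = sorted(set(ushcn_corrected) & set(ghcn_raw))
        latest = common[-n_years:]                      # latest common years
        diffs = [ushcn_corrected[y] - ghcn_raw[y] for y in latest]
        return sum(diffs) / len(diffs)                  # offset to subtract

    def spliced_series(ushcn_corrected, ghcn_raw, n_years=10):
        """Subtract the offset from the corrected data, then append newer
        GHCN reports that postdate the corrected series (the 1-5 year lag)."""
        offset = splice_offset(ushcn_corrected, ghcn_raw, n_years)
        out = {y: v - offset for y, v in ushcn_corrected.items()}
        newest_corrected = max(ushcn_corrected)
        for y, v in ghcn_raw.items():
            if y > newest_corrected:
                out[y] = v
        return out

If they instead use the NOAA USHCN Filnet version consistently, as recommended, no such offset step is needed at all.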
As to the headline – NASA did not credit CA in announcing the changed procedure. Perhaps the timing was simply coincidence. In any event, the net result is a slight improvement in NASA’s methodology and so we can all take some comfort in that.
35 Comments
Your, well someone’s, tax dollars at work.
================
Steve,
If you keep finding their mistakes and giving them advice, they will have to put you on the payroll.
Kudos Steve; it is stuff like this that proves that CA does have clout in the scientific community.
Maybe NASA really is warming up to CA and other less popular, but correct venues.
Steve,
Do you have a “scorecard” anywhere on the site so we can have a summary of changes being implemented (by NASA or otherwise) as a result of the activities/discoveries here?
Are you sure that NASA didn’t announce it on Jan 16th?
Are you sure they didn’t announce it in 1972?
Scroll about three quarters down the NASA page and you’ll see the March 1, 2008 update.
Tamino wrote earlier:
😉
congrats. It’s nice to see due diligence being done, even if it required arm-twisting.
I don’t think it matters if they (NASA) “warm up to CA” anymore. Copies should be kept of all these procedures for future reference.
#10
Right, the main point is good science, and CA is doing that.
Does “adjusted” data, either positive or negative, exist for the sites with no input data?
Steve, great work, keep up the scrutinizing. An intense scrute never hurt anybody.
Joking aside, I notice this sentence in the quote:
“These updates will also automatically include changes to data from previous years that were made by NOAA since our last upload.”
Does this mean what I read it to mean, that data from previous years might be changed? Surely, once a year’s data is set, it should be set in stone, not adjusted again for who knows what reason. How can anyone work with data that changes from year to year? In fact, to my mind, there should not be any changes to the early years’ data, since it is probably as good as you are going to get. Recent changes (microsite, UHI, etc.) might require reworking, but this should be done on the affected data, not on the historical data.
Tony –
It might get colder in the past, and if they didn’t adjust then how would we know?
This is an unsound practice. In engineering practice, if a change in data is required, the older version would be preserved and a new version created to contain the modified data. A change notice would be issued to detail the reason for and origin of the changes.
Automatic updates to the input data could cause changes to outputs that cannot be traced and justified.
This is yet another example of why AGW research should be taken out of the hands of academic scientists and given to people who know how to manage very critical projects.
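To make the change-control point concrete, here is a minimal sketch of the kind of input snapshotting being described, assuming the data file is pulled from the NOAA ftp site listed in the post; the filenames and directory layout are illustrative only, not anyone’s actual procedure.

    import datetime
    import hashlib
    import pathlib
    import shutil

    def archive_input(path, archive_dir="input_archive"):
        """Snapshot an input data file before it is used, so that any later
        change in the outputs can be traced to a specific input version."""
        src = pathlib.Path(path)
        digest = hashlib.sha256(src.read_bytes()).hexdigest()
        stamp = datetime.date.today().isoformat()
        dest_dir = pathlib.Path(archive_dir)
        dest_dir.mkdir(exist_ok=True)
        dest = dest_dir / f"{src.name}.{stamp}.{digest[:12]}"
        if not dest.exists():              # keep only genuinely new versions
            shutil.copy2(src, dest)
        return digest                      # record this alongside the outputs

    # e.g. archive_input("hcn_doe_mean_data") before each processing run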
re 16
I should have added that any data change notice would have to be approved by an official in charge of maintaining the data. That official would be responsible for the data and would have to answer for any discrepancies and/or errors. This would include the possibility of dismissal.
Well, it’s NOAA that’s making the changes, not the ‘academic scientists’, who are just using the data provided. If you want to see the provenance of any changes, check with them. I’m sure Steve McI thought of that when he made his recommendation, although if he wants credit for that he should have put it in writing to Hansen.
Re 17.
Phil, they are not just “using the data provided” – nice try. The GISTEMP program would barely get an F in any engineering class. The documentation is non-existent or pathetic. The coding practice is ripe for disaster (witness the Y2K debacle): changes are made without documentation. New versions are posted without announcement. It’s a code base that has pathetically lame legacy Fortran code and poorly structured patches written in Python. If you don’t see the engineering danger in that, then the sky is chartreuse and I am a Leprechaun.
Stop defending a practice that you know is substandard and demand more accountability and quality from public servants. For the record, after the quality is brought up to speed, AGW will still be the best explanation for the warming we see.
I think documentation is the key to whether there are problems. It matters not if the original code was written in HP Basic, Cobol, or Pascal: if it was well documented, then it can be flow-charted, and then it can be replicated in any language you want. If the code was in C++, but without documentation, it would be just as obscure. The medium is not the message here. MilSpec standards of software development have been around a lot longer than NASA and Hansen. It is appalling that those standards were not applied.
re 17
This is not an answer to the issue. Data is being accepted from some source for a calculation. Whether or not this data is to be used is not the problem of the provider but of the program that is going to be doing the calculation. It is up to them to verify the suitability and accuracy of any inputs. If the results of their calculation change because of changes in data inputs, then it is still their responsibility to certify the calculation’s accuracy.
A medical program uses uncalibrated instruments to perform an experiment. Whose problem is it if faulty results are obtained: the research group or the instrument manufacturer?
LadyGray
I understand your point, but you’ll have to admit C++ is capable of being, to a large degree, self-documenting, especially with well-chosen naming conventions. Beyond that, from my experience, the actual algorithms used to implement the code are best described in peer-reviewed journals. When I publish any of my models, they are always done in such a fashion that others can replicate my code and results without needing access to my source code. Beyond that, I often have ancillary documentation that spells out the manuscript results in more detail (including discussions of specific algorithms, sometimes including tutorials on things such as the effect of the window choice in the time-windowing of data).
However, I can’t imagine trying to embed that level of detail into a computer code. Wouldn’t these various documentation standards require that excruciating level of detail, though?
I’m pretty sure the code itself would cease to be readable. (The basic problem here is that the algorithm is often expressed in relatively few lines of code, even when the details of the algorithm remain very complex.) I would imagine this holds true in spades for 3-d hydrodynamic codes. The solution I’ve found that works best for me is to reference the relevant equation number of the manuscript in the code, then use variable names that have an easy correspondence to the mathematical symbols used in the technical document.
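As a minimal sketch of that convention (the equation numbers, symbols and windowing example here are hypothetical, not taken from any actual manuscript):

    import numpy as np

    def windowed_psd(x, dt):
        """Power spectral density of a sampled signal with a Hann window.

        Illustrates the convention described above: the comments point at a
        (hypothetical) manuscript equation, and the variable names mirror the
        symbols used there (x = sampled signal, dt = sample interval,
        W = window, X = windowed transform, U = window normalization).
        """
        N = len(x)
        W = np.hanning(N)                        # window, "Eq. (6)"
        X = np.fft.rfft(x * W)                   # windowed DFT
        U = np.sum(W ** 2) / N                   # window power normalization
        return (dt / (N * U)) * np.abs(X) ** 2   # "Eq. (7)"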
Not sure about the Milspec standards. Aren’t these a bit out of date, unless you’re stuck with an old pre-object-oriented language that is? I’ve never used it, but have heard a lot of good things about the IEEE standard.
It’s like examining their balls with a microscope.
I wonder who will win in the end: Hansen’s Zen, or Steve’s motorcycle maintenance?
When Hansen’s Zen embraces Steve’s motorcycle maintenance, we all win.
OK, this is weird. This morning I wanted to revisit my investigation of CRATER LAKE.
H2001 deletes the Crater Lake record, but makes no substantive argument why it should be deleted. It’s just deleted. I looked at Crater Lake a while back and compared it to its neighbors at Prospect, Ashland and Klamath Falls. I saw nothing odd. The deletion of Crater Lake, while inconsequential to the global record, remained an oddity to me.
In the course of getting files from USHCN, site by site, the whole web site crashed.
Weird.
If recent measurements of the urban heat island effect are the correct order of magnitude (and why did no one except amateurs ever try to measure the urban heat island effect directly?), then this whole issue is moot, because the urban heat island effect is so much larger than global warming that any attempt to measure global warming through weather stations is going to have huge errors. If the results are not absurdly different from satellite, sea surface, and sea ice area measurements, this probably indicates that the compilers kept jiggering their methods till they got the result that they expected.
To get an indication of global warming through weather stations, one has to select those sites that are truly rural, and always have been, and are a reasonable distance from the operator’s house, and always have been.
The satellite data not only gives global temperatures, but local. It would be interesting, and doubtless publishable, to compare known good weather stations with satellite measurements over time.
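A minimal sketch of what such a comparison might look like, assuming one already has a rural station series and a co-located satellite anomaly series for the same years (the arrays below are placeholders, not real data):

    import numpy as np

    def trend_per_decade(years, anomalies):
        """Least-squares linear trend, in degrees per decade."""
        slope, _ = np.polyfit(years, anomalies, 1)
        return slope * 10.0

    # Placeholder series: substitute a genuinely rural station record and the
    # satellite lower-troposphere anomalies for the same grid cell and years.
    years = np.arange(1979, 2008)
    station = 0.015 * (years - 1979) + np.random.normal(0.0, 0.3, years.size)
    satellite = 0.013 * (years - 1979) + np.random.normal(0.0, 0.2, years.size)

    print("station trend   (C/decade):", round(trend_per_decade(years, station), 3))
    print("satellite trend (C/decade):", round(trend_per_decade(years, satellite), 3))
    print("correlation:", round(float(np.corrcoef(station, satellite)[0, 1]), 2))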
At a certain point, Steve’s work for NASA may prove to be invoiceable. May I suggest a name for your consulting firm–Jesterworks.
And I apologize, my earlier comment 12 is posted in the wrong thread. Anthony, can you move the post about no data stations to the No Data For Some NASA Stations thread.
Steven Mo.,
Re. Crater Lake, see Ellis’ comment here.
RE 29. I’ve asked questions about Crater Lake at RC over the past six months.
Its removal is a mystery.
RE: #8
“Compensated”?
The problems, or the climate scientists?
One cannot replicate code without having the code. Your algorithm may say “solve this PDE”, but the code for actually doing so varies greatly from modeler to modeler. You could be running on a 32-bit machine while somebody else is running on a 64-bit machine; POD sizes will be different, round-off will be different, and so forth. Even with an arsenal of standardized test suites it’s very difficult to say two different code bases are equivalent, even if they are following the same requirements and top-level designs.
Because of rounding issues, even if I take the exact same source modules, moving from a 32-bit machine to a 64-bit machine will cause the compiled code to produce different outputs.
If you are getting significantly different numbers in going from 32 bits to 64 bits, then you need to take a hard look at your algorithm, and use a different approach.
Any algorithm that iterates is going to accumulate rounding errors quickly.
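A minimal illustration of the accumulation point, comparing single- and double-precision accumulators in a naive running sum (this shows precision-driven drift in general, not the specific 32- vs 64-bit machine differences discussed above):

    import numpy as np

    # Repeatedly add 0.1 to an accumulator in single and double precision.
    # The exact answer is 100000; the single-precision total drifts visibly,
    # because each addition is rounded to roughly 7 significant decimal digits.
    n = 1_000_000
    total32 = np.float32(0.0)
    total64 = np.float64(0.0)
    step32, step64 = np.float32(0.1), np.float64(0.1)

    for _ in range(n):
        total32 += step32
        total64 += step64

    print("float32 total:", total32)
    print("float64 total:", total64)
    print("drift        :", float(total64) - float(total32))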