Dan Hughes on Software Validation

I [Dan Hughes] posted a short discussion of some software Verification and Validation issues on another thread. Here are some additional thoughts.

I have a few questions for anyone who has answers. I consider these issues to be essentially show-stoppers as far as use of the results of any of the AOLBCGCM codes, and of all supporting codes used in any aspect of climate-change analyses, is concerned; whether for (1) archival peer-reviewed publications, (2) providing true insight into the phenomena and processes that are modeled, or, most importantly, (3) decision-making relative to public policies. Any professional software developer would absolutely require that all of the issues mentioned below be sufficiently addressed and documented before using any software for applications in the analysis areas for which it was designed.

In no particular order, as each of the following is very important, can anyone provide documented information about:

(1) Audited Software Quality Assurance (SQA) Plans for any of the computer software that is used in all aspects of climate-change analyses.

(2) Documentation of Maintenance under audited and approved SQA procedures of the 'frozen' versions that are used for production-level applications.

(3) Documentation of the Qualifications of the users of the software to apply the software to the analyses that they perform.

(4) Documentation of independent Verification that the source coding is correct relative to the code-specification documents.

(5) Documentation of independent Verification that the equations in the code are solved correctly and the order of convergence of the solutions of the discrete equations to the continuous equations has been determined.

(6) Sufficient information from which the software and its applications and results can be independently replicated by personnel not associated with the software.

(7) It is my impression that the use of ensemble averages of several computer calculations that are based on deterministic models and equations is unique to the climate-change community in all of science and engineering. I can be easily corrected on this point if anyone can provide a reference that shows that the procedure is used in any other applications. (The use of Monte Carlo methods to solve the model equations is not the same thing.) The use of ensemble averaging and the resulting graphs of the results makes it very difficult to gain an understanding of the calculated results; rough long-term trends are about all that can be discerned from the plots.

(8) Documentation that shows that the codes always calculate physically realistic numbers. For example, that the time rate of change of temperature is always consistent with the energy equations and is not the result of numerical instabilities or other problems with the numerical solution methods.

(9) Documentation in which the mathematical properties (characteristics, proper boundary condition specifications, well- (or ill-) posedness, etc.) of all the continuous equations used in a code have been determined. Do attractors exist, for example?

(10) Documentation in which it has been shown analytically that the system of continuous equations used in any AOLBCGCM model has the chaotic properties that seem to be invoked by association rather than by complete analysis. Strange-looking output from computer codes does not prove that the system of continuous equations possesses chaotic characteristics. Output from computer codes might very well be the result of modeling problems, mistakes, solution errors, and/or numerical instabilities.

Invoking or appealing to an analogy with the Lorenz continuous equations is not appropriate for other model systems. The Lorenz model equations are a severely truncated approximation of an already overly simplified model. The wide range of physical time constants and the potential phase errors in the numerical solutions almost guarantee that aperiodic behavior will be calculated.

This is especially true considering the next item.

(11) Documentation in which it has been determined that the discrete equations and numerical solution method are consistent and stable, and thus that convergence of the solution of the discrete equations to the continuous equations is assured. Actually, I understand that the large AOLBCGCM codes are known to be unable to demonstrate that their results are independent of the discrete approximations used in the numerical solution methods. The calculated results are in fact known to be functions of the spatial and temporal representations used in the numerical solutions. This characteristic means that convergence cannot be demonstrated. Consistency and stability remain open questions. (A toy illustration of the kind of step-refinement check involved is sketched just after this list.)

(12) Documentation in which it is shown that the models/codes/calculations have been Validated for applications to the analyses for which it has been designed.
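
To make items (5) and (11) concrete, here is a minimal sketch, in R, of the kind of step-refinement check I have in mind, applied to a toy ordinary differential equation with a known exact solution. Nothing in it is taken from any GCM; the problem, the scheme and the step sizes are purely illustrative.

# Illustrative only: verify the observed order of convergence of a discrete
# scheme against a problem with a known exact solution.
# Toy problem: dy/dt = -y*(1 + y), y(0) = 1, exact solution y(t) = 1/(2*exp(t) - 1),
# solved with the explicit Euler scheme (formally first-order accurate).
euler_solve <- function(h, tmax = 1) {
  n <- round(tmax / h)
  y <- 1
  for (i in 1:n) y <- y + h * (-y * (1 + y))
  y                                        # value at t = tmax
}
exact  <- 1 / (2 * exp(1) - 1)             # exact y(1)
steps  <- c(0.1, 0.05, 0.025, 0.0125)
errors <- sapply(steps, function(h) abs(euler_solve(h) - exact))
# Observed order from successive step halvings: p = log2(e(2h)/e(h)).
observed_order <- log2(errors[-length(errors)] / errors[-1])
data.frame(h = steps, error = errors, order = c(NA, round(observed_order, 3)))

For a consistent, stable first-order scheme the observed order settles near 1 as the step is refined. A code whose answers keep changing as the spatial or temporal resolution changes has not passed even this elementary test, which is the point of item (11).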

All software, each and every one, that is used for analyses of applications the results of which might influence decisions that affect the health and safety of the public will have addressed all these issues in detail.

If my understanding of the status of these critical issues is correct I can only conclude:

(1) The software used in the climate-change community does not meet the most fundamental requirements of software used in almost all other areas of science and engineering. Almost none of the basic elements of accepted software design and applications for production-level software are applied to climate-change software.

(2) The calculated results cannot be accepted as correct and valid additions to the peer-reviewed literature of technical journals.

(3) The software should never be used in attempts to predict the effects of changes in public policy (fuel sources for energy production, say) on the climate, whether short- or long-range.

(4) The calculated results are highly likely not correct relative to physical reality.

I will say that item (11) describes a totally unacceptable characteristic for any engineering or scientific software. The results from any codes that have this property would be rejected for publication by many professional engineering organizations. I can be easily corrected if anyone can point me to calculated results from any other area of science and engineering in which numerical methods known to be not converged are accepted as, well, acceptable practice. Buildings, airplanes, bridges, elevators, flight-control systems: nothing, in fact, is designed under this approach.

Actually all professional software development projects require far more than the information that I discuss above. Any textbook on software development can be consulted for a more complete listing and detailed discussions. Almost all of the large, complex AOLBCGCM codes have evolved over decades from software that was significantly simpler than the present versions. These codes have not been designed and built 'from scratch' on a 'clean piece of paper'. Newly built software, designed and constructed under SQA plans and associated procedures, requires significantly more documentation and independent review, Verification, and Validation than that mentioned above.

Several have mentioned that source listings for some of the AOLBCGCM codes are available on the Web. This is very true. And it is equally true that in theory the source coding could be used to Verify the coding. However, in order for this to be a useful exercise we need some kind of specification of what was intended to be coded. In the absence of this information we cannot develop objective metrics for judging that the coding is correct.

The level of detail needed for Verification of the coding is generally many times greater than that typically available for many software products. Because the objective is Verification of the coding, a specification of all the equations in the code is needed. For legacy software that has evolved over decades, this information is usually contained in a theory and numerical methods manual in which the continuous equations, the discrete approximations to them, and the numerical solution methods used to solve the discrete equations are described in detail. A computer code manual in which the structure of the code is described in sufficient detail that independent outside interests can understand the source code would also be helpful in any attempts to Verify the coding. I have not been successful in finding such manuals for AOLBCGCM codes.

As someone mentioned, as taxpayer-funded software, this documentation should in fact be readily available. It is not.

The level of documentation detail required for an independent V&V and SQA effort is enormous. Chip H above has mentioned documentation required for development and Verification of the coding. In order to get an even better handle on the exact nature of the models, methods, and codes and their applications, other documentation should be available. This documentation includes:

Volume 1: Model Theory and Solution Methods Manual. A theory manual in which the derivations of each and every equation, continuous and discrete, are given in sufficient detail that the final equations for the models can be obtained by independent interests.

Volume 2: Computer Code Manual. A computer code manual in which the code is described in sufficient detail that independent outside interests can understand the source code.

Volume 3: User’s Manual. A user’s manual that describes how to develop the input for the code, perform the calculations, and understand the results from the calculations.

Volume 4: Verification and Validation Manual. A manual or reports in which the verification and validation of the basic functions of the software and example applications are given.

Volume 5: Qualification Manual. Additional manuals or reports in which the models and methods, software and user are demonstrated to be qualified for application to analyses of the intended application areas.

Other reports and papers can be used to supplement the above documentation. These then become a part of the official record of the software relative to independent V&V and SQA efforts.

The coding cannot be Verified given the level of documentation that I have found so far. This was only a first attempt, and by someone not familiar with the codes. However I think it does give a first glimpse into the lack of sufficient documentation for verifying the coding.

55 Comments

  1. Reid
    Posted Dec 2, 2006 at 7:00 PM | Permalink

    Dan,

    If you have the time and inclination, I suggest you create a GCM Audit blog. It is nothing less than a fiasco that GCM’s are informing public policy and opinion.

    GCM = mathematical conjuring, modern divination

    Let us consult the Oracle at NCAR.

  2. Armand MacMurray
    Posted Dec 2, 2006 at 8:24 PM | Permalink

    Dan,
    You make statements such as:

    All software, each and every one, that is used for analyses of applications the results of which might influence decisions that affect the health and safety of the public will have addressed all these issues in detail.

    and:

    Actually all professional software development projects require far more than the information that I discuss above.

    These are certainly “gold-standard” goals. However, they certainly are not all met by software used in biology, even in pharmaceutical research. In addition, I don’t know of any operating systems in general use which would satisfy the criteria. Thus, I’m curious: what fields actually apply these standards in real life? Are there any OS’s that meet the criteria?
    Items such as (8) seem to be commonly met, but what level of analysis would be required by something like (4) — a proof of correctness (or perhaps automatic code generation from a specification), or just a second programmer reading over the code?

  3. Paul Penrose
    Posted Dec 2, 2006 at 8:46 PM | Permalink

    Armand,
    All class-1 medical devices meet these stringent standards. This includes pacemakers and implantable defibrillators. Even class-3 devices (external monitors, etc.) follow all or most of these standards. I don’t think any of the general purpose OTSS (off the shelf software) OSes adhere to these standards, but real-time OSes like uC/OS have been fully validated for use in medical devices and avionics.

    Just because Windows or CableTV set-top boxes aren’t developed using these standards does not make it impossible, or even unreasonable. Personally I don’t blame the developers of the AOGCMs for not using these standards because I believe they developed them more as an experiment in order to help with their understanding of climate. The real sin is when they are used to justify the results of other researchers (like the paleoclimate community) or to inform public policy (IPCC).

  4. Posted Dec 2, 2006 at 10:46 PM | Permalink

    Software verification and validation are an integral part of the software engineering process. Verification establishes the correspondence between a software product and its specification. Validation establishes the fitness of a software product for its operational mission: in this case, to model the climate.

    In the 1980s I worked with a great group of engineers at TRW to develop some of the most capable weapon-systems software verification and validation tools in the free world. We were developing software for military applications, including flight controls. One mistake and the software could turn a pilot and the crew into a greasy spot on the ground.

    One missed decimal point caused a TRW spacecraft to crash rather than land on the moon. As a result, TRW became a leader in the development of verification and validation tools, and in a SW development process that reduced the risk of critical errors. In many cases we built the tools and validated the software developed by other firms, giving the V&V process third-party independence. Our job was to find errors. We fed those errors back to the developers until the product performed as specified.

    I highly endorse the verification and validation of GCMs. However, this is a complex and time-consuming process. Who will pay? Given that these models are being used to establish policies which will cost taxpayers billions, and national economies trillions, it should be worth the time and effort. I suggest that all future grants for GCMs include a V&V requirement clause, including any add-on funding to existing GCMs. No V&V, and the GCM output cannot be used to make policy.

  5. Paul Linsay
    Posted Dec 3, 2006 at 9:39 AM | Permalink

    This is all well and good, but it would be possible to write software and documentation that passes all eleven tests above and still be wrong. Until the underlying physical basis of the climate is understood, even perfect numerical algorithms and documentation aren’t going to make a GCM correct.

  6. Posted Dec 3, 2006 at 10:29 AM | Permalink

    Paul, you are right. It is possible to build a perfectly bad well documented model. But, having a well documented model makes it easier to find “the bad”.

  7. RichardT
    Posted Dec 3, 2006 at 2:22 PM | Permalink

    What is AOLBCGCM supposed to be an acronym for? Google has it on this page and one other only.

    There is a great difference between the validation necessary for climate models and mission critical code in health care, avionics and weapons systems. If the latter fail, someone dies (or doesn’t). If a climate model starts to behave strangely, the code can be checked, corrected and the model re-run. Arguing that climate models need to be perfectly validated before they can be used is to argue never to use them.

  8. Stan Palmer
    Posted Dec 3, 2006 at 2:37 PM | Permalink

    from 7

    If a climate model starts to behave strangely, the code can be checked, corrected and the model re-run. Arguing that climate models need to be perfectly validated before they can be used is to argue never to use them.

    How would one know that it was behaving strangely if the code had not been validated?

  9. Stan Palmer
    Posted Dec 3, 2006 at 2:43 PM | Permalink

    re 7

    I should have also added that validation is not verification. It is not checking that the code runs as described but that the system is doing something useful. In the case of a climate model, validation would entail, among other things, that the design (not just the code) was based on a sound underlying theory.

  10. Ed
    Posted Dec 3, 2006 at 2:56 PM | Permalink

    NASA’s Space Shuttle software engineering team is known for using stringent methodologies for verification and validation. When something is wrong, NASA’s team not only fixes the problem, but goes back to look at the process that created the error and then finds a fix to the process. Thus, their software development process itself improves over time, delivering fewer errors.

    I think far too many have been accustomed to poor software quality and are unaware that methodologies are available to build more reliable software from the beginning. And when errors are found, there are methods to fix the process, not just the error.

    Most mass produced software strives for “good enough” rather than “perfection”, accounting for cost-benefits and what the market will bear. With the present state of the art, it is possibly too expensive to strive for perfection in say, word processing software – but “good enough” can satisfy most of the people, most of the time.

    I would lean towards “striving for perfection” in GCMs since the benefit of perfection could be quite high relative to spending trillions of dollars on climate mitigation.

    I agree with the comments above that it is quite possible to have software that perfectly meets all design requirements and has zero bugs and still produces the wrong answer. It is, however, easier to eliminate the cause of the wrong answer if you actually know what the software is doing versus the traditional approach to software: “Hey, the kharma feels right – this software is ready to ship!”

  11. Frank Scammell
    Posted Dec 3, 2006 at 3:37 PM | Permalink

    Dan Hughes- I completely agree with you on verification and validation. I am particularly interested in your observations in (10) Documentation … And I’m concerned about some other model issues:

    1) I believe (but can’t prove) that all models are “tuned” (by heat flux adjustments) to provide the required match to recent history. This produces the match to the jagged “instrumental record” of which modelers are so proud. I think these heat flux adjustments are so large as to defy physical reality, and maybe, in aggregate, larger than the heat flux imbalance attributed to mankind’s contribution to the Earth’s imbalance (AGW), and our current descent into catastrophe. Why do all of the adjustments disappear when projecting perhaps a hundred years into the future? Where has the natural variability gone? Some physics I’m unaware of? How do we know it is all AGW, and more importantly, is there anything that man can do that will change it? Perhaps adapt, which is certainly more economically viable.

    2) The “instrumental record” is also wrong because it is adjusted temperature readings, adjusted for Urban Heat Island effects, but with no specified way to separate the measurements from the adjustments, and thus get the “true” instrumental record. It is wrong because it has an upward drift (bias) that is not reflected in the satellite records. It is also probably wrong because it is primarily urban, Northern hemisphere dominated, with a substantial reduction in number of reporting stations, and thus not representative of even Northern hemisphere, to say nothing of global.

    3) The effect of this temperature bias is somewhat subtle. It goes into the CO2 climate sensitivity (to a doubling of CO2). Remembering that we are talking about highly dynamic models, the method of introducing the CO2 climate sensitivity is important. A step function of a doubling (all at once) will be similar to hitting a bell with a hammer. A ramp will be somewhat better. (Best would be the introduction as an S curve – equivalent to minimizing jerk [the first derivative of acceleration].) This information doesn’t seem to be available. Does the CO2 climate sensitivity take into account CO2 absorption nonlinearity (saturation)? We are less than halfway to a doubling of CO2, but have already experienced 3/4 of the effect on temperature (0.6 deg. C). Why should we expect more than another 0.6 deg. C at most? (A rough forcing calculation along these lines is sketched at the end of this comment.)

    4) The whole concept of climate sensitivity is suspect. It implies linearity. If we totally crash the global economy, and go to only 1.5x instead of 2.0x, will we save the world? From what? Modelers seem to delight in telling us how complex their models are: non-linear, perhaps chaotic (highly sensitive to initial conditions). How do we know that all models are using the right initial conditions? And the same initial conditions?

    5) All of the models seem to produce exponentially rising 100-year forecasts. Do they just throw out the exponentially decreasing ones? (Obviously wrong! Our “physics” or religious belief tells us so.) How do we know that the results are not just error propagation?

    6) Lately, we have been told that the ’06 hurricane forecast had been thrown off by an unexpected El Nino. What? Can’t we even forecast ENSO events less than a year in advance?

    To me, at least, these are important model related issues. I make no claim about my understanding of them all. I hope the readers of CA will help me out. Note that proving any one of them wrong doesn’t prove they are all wrong. I believe they are all independent.
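
    To put rough numbers on the saturation point in (3), here is a back-of-envelope sketch in R, assuming the commonly quoted logarithmic approximation for CO2 forcing and round concentrations of 280 ppm (pre-industrial) and 380 ppm (today); these are illustrative values only, not taken from any model.

    # How far along, in forcing terms, is the rise from 280 ppm to 380 ppm,
    # relative to a full doubling? Uses the common logarithmic approximation
    # dF = 5.35 * ln(C/C0) W/m^2.
    C0 <- 280                                  # assumed pre-industrial CO2, ppm
    C1 <- 380                                  # assumed present-day CO2, ppm
    dF_so_far   <- 5.35 * log(C1 / C0)         # ~1.6 W/m^2 to date
    dF_doubling <- 5.35 * log(2)               # ~3.7 W/m^2 for doubled CO2
    c(concentration_fraction = (C1 - C0) / C0,           # ~0.36 of a doubling in ppm terms
      forcing_fraction       = dF_so_far / dF_doubling)  # ~0.44 of a doubling's forcing

    On that approximation we are already nearly halfway to a doubling in forcing terms even though the concentration increase is only about a third of a doubling, which is the saturation effect I am asking about.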

  12. Steve Bloom
    Posted Dec 3, 2006 at 5:42 PM | Permalink

    The material raised here is the obvious focus for the next set of hearings by Barton and Inhofe.

  13. Dave Dardinger
    Posted Dec 3, 2006 at 5:59 PM | Permalink

    re: #12

    Alright! Who let the troll in?

  14. Reid
    Posted Dec 3, 2006 at 6:47 PM | Permalink

    Re #12: “The material raised here is the obvious focus for the next set of hearings by Barton and Inhofe.”

    Obviously Steve. Especially as year after year of real-world observations don’t agree with the GCM’s and hysterical AGW predictions.

    The blowback against the AGW hypothesis is just seriously starting. As time proceeds in the political battle, the AGW hypothesis will increasingly be seen as another in a long series of false alarms by eco-theist scientists and their big government allies.

  15. Willis Eschenbach
    Posted Dec 3, 2006 at 7:13 PM | Permalink

    Frank, I appreciate your raising important points above. Among them you say:

    1) I believe (but can’t prove) that all models are “tuned” by (heat flux adjustments) to provide the required match to recent history.

    There is no question about this, the modelers freely admit that they are tuned. However, a wide variety of things are tuned for, using a wide variety of adjustments, not just adjustments to heat flux. For example, over at UnrealClimate, Gavin Schmidt (a modeller) says:

    [Response: … Secondly, the actual mean albedo is still not that well known – true, recent estimates are lower than the earlier numbers which most GCMs were tuned against, but I would be hesitant in assuming that a 1% change in global albedo will suddenly make a big difference in response. That is not our experience at all. – gavin]

    Let me point out (again) that this tuning negates one of the main arguments for AGW. The argument is that they tune the GCMs, including all their forcings, to match the historical record. Then they remove the CO2 forcing, and since the GCM no longer is able to hindcast the historical record, they say “See? This proves that CO2 forcing has to be there, so it must be real!”.

    I’m sure you can see the flaw in that logic.

    w.

    PS – a 1% change in albedo is a 3.4W/m2 change in solar forcing. Since this is about equivalent to a doubling of CO2, I’m not sure why Gavin says it won’t make much difference …
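
    For reference, the arithmetic behind that figure is just the albedo change times the solar constant divided by four; a one-line check in R, using a nominal solar constant of about 1366 W/m2 (round numbers only):

    S0       <- 1366           # nominal solar constant, W/m^2
    d_albedo <- 0.01           # a 1% (absolute) change in planetary albedo
    dF <- d_albedo * S0 / 4    # divide by 4: the sphere's area is 4x its cross-section
    dF                         # ~3.4 W/m^2 averaged over the Earth's surface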

  16. Dan Hughes
    Posted Dec 3, 2006 at 7:38 PM | Permalink

    re: #12

    I am a private citizen employed by no one and can state with certainty that no members of either the House or Senate are even aware of my existence. BTW, Barton and Inhofe are out and two members of another party are in.

    Next I’ll address the technical issues that I discussed that you found to be incorrect.

    hmmmmm … there appear not to be any.

  17. Neil Haven
    Posted Dec 3, 2006 at 9:11 PM | Permalink

    “In the absence of … we cannot develop objective metrics for judging that the coding is correct.”
    IMHO, this is the crux of the matter. How do we judge a climate model? We do not know what sets of equations one should solve. We do not know what initial conditions one should use. We do not agree on what quantities the models should predict. We do not know what the standards for falsification of a climate model would be. These are the hallmarks of a research project in progress, not a usable technology.

  18. Neil Haven
    Posted Dec 3, 2006 at 9:22 PM | Permalink

    Dan,
    You have made a number of very important points. Taken as a whole, your bill of particulars is a powerful indictment of the uses to which climate modeling software has been put. Perhaps playing devil’s advocate a bit will help sharpen the debate:

    Lack of Audited Software Quality Assurance Plans: The academic analogues here are conferences, grant reviews, and peer reviews of publications. Academics might maintain this is a more stringent review process than formal SQA.

    Lack of Version Control, Maintenance Documentation: The academic analogue of a 'frozen' software version barely exists. In a field with fundamental unanswered questions, such as climate modeling, if you’re not making changes you’re not doing research. Many (most?) reported results are from new algorithms newly-implemented.

    User qualifications: Qualifications are judged by the research community as a whole and discounted or honored in proportion to reputation. Academics might maintain this is a more stringent qualification process than a formal documentation process.

    Independent replication: Due to limitations on computational resources, independent replication is difficult; however, since the codes are publicly available, nothing prevents an interested party with sufficient resources from undertaking replication experiments.

    Ensemble averages: A similar method of long pedigree in the protein structure prediction community is the use of “consensus methods” and “prediction meta-servers” for assigning putative folding structures to unknown proteins. These consensus methods, etc. are not ensemble averaging methods, but they are based on finding agreements between multiple deterministic models for the same phenomenon.

    Physically realistic numbers, etc.: Published results from climate models clearly exhibit unphysical behavior (see, for example, here). Since it is unknown which sets of initial conditions are physically consistent, it is unknown whether such unphysical behavior is desirable in a model.

  21. Steve Bloom
    Posted Dec 3, 2006 at 10:38 PM | Permalink

    Re #14: “The blowback against the AGW hypothesis is just seriously starting. As time proceeds in the political battle, the AGW hypothesis will increasingly be seen as another in a long series of false alarms by eco-theist scientists and their big government allies.”

    Reid, I’m curious as to your evidence for this statement. As far as I can see it the trend seems to be exactly the opposite.

  22. bender
    Posted Dec 3, 2006 at 10:52 PM | Permalink

    Is it not possible that both trends are occurring, just in different circles? Does it make sense to generalize so? I have to agree with Bloom that the AGW movement is far from exhausting its momentum.

  23. Dave Dardinger
    Posted Dec 3, 2006 at 11:16 PM | Permalink

    re: #20. It may be that Steve McIntyre is the William F Buckley of the Climate Skeptic movement. Just as Buckley funded National Review out of his own funds mostly at first, so Steve M has with CA. And just as US Political Conservatives took to having an outlet for their ideas and aspirations, so serious skeptics have flocked to CA to have a place where they can seriously discuss the issues without being squelched. Will there be a Ronald Reagan of the Climate Skeptic movement who can turn the grassroots movement into a cultural success? That remains to be seen.

    But perhaps even if the movement fails, Steve can look forward to having a long running TV program on PBS or the Canadian equivalent called “Audit Line.”

  24. RMI
    Posted Dec 4, 2006 at 12:28 AM | Permalink

    Regarding the questions of modeling and climate predictability: has anyone ever
    run a test on a climate variable to determine if its time series’ Lyapunov
    exponent demonstrates non-linear, chaotic, behavior? It seems such a test
    could end the bickering that has long plagued this issue.

    And if climate is non-linear, is it possible to predict it using GCMs, even
    disregarding the sensitivity to initial conditions issue? My point is that the
    GCMs use approximate numerical integration methods to solve the governing
    differential equations in space and time. But these methods assume linear
    evolution of the integrated variables over each integration step. By making the
    integration intervals sufficiently small can such linear solutions reveal
    non-linear effects? Or do these linearizations forever preclude non-linear
    predictions?

  25. bender
    Posted Dec 4, 2006 at 12:30 AM | Permalink

    It’s also worth pointing out that it takes quite some time for new science to reach up to the political spheres. Sometimes years. The recent Mann review proves that there’s still some pre-M&M (i.e. chapter one) warming literature in the pipe. Expect this backlog to keep fuelling the alarmist agenda for some time to come.

    Readers at CA often question Mann’s unrelenting approach. I suspect the strategy is to ignore the real uncertainty and keep the faith, hoping that the new out-of-sample data in 10-20 years’ time will match the AAGW model predictions, thus vindicating all previous work. When he chose to ignore M&M he went “all in”, to borrow a poker term. Now he’s waiting on the “river card”, hoping to pair his ace. Preaching “precaution”, praying for catastrophe.

    The hope is that M&M are a ripple in the road, that this road doesn’t lead anywhere steep & rocky. What happens, of course, all depends on the actual value of A in AGW, which is currently unknown to us. Mann seems to think it’s a sure bet A > 0.5. Skeptics are not convinced A > 0.2. One day we will know.

  26. bender
    Posted Dec 4, 2006 at 12:57 AM | Permalink

    Re #22 I’ve done this myself. I can post the R script tomorrow if you like.

    But … I don’t think this tells you what you think it tells you. The Lyapunov exponent (λ) refers to the rate of divergence away from an attractor given an arbitrarily small difference in initial conditions. Reconstructing a strange chaotic attractor, such as Lorenz’s (1963) butterfly attractor, takes a lot of data. Short time-series yield very poor estimates of the true λ.

    I don’t think there’s much “bickering” on the issue. Weather is nonlinear chaotic in much the way that Lorenz described, as a result of heat transfer in 3-D space. I would be very surprised if meteorologists hadn’t followed up Lorenz, providing precise parameter estimates for his coupled heat transfer equations, thus yielding approximate analytical solutions for λ (as opposed to estimates derived from crude time-series). References anyone?

    (That’s a quick answer. A good answer may be forthcoming.)
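
    For anyone who wants to try something before then, here is a bare-bones sketch in R of the two-trajectory approach (not the script I mentioned): integrate the Lorenz (1963) system twice from nearly identical starting points and fit the early growth rate of the log separation. The step size, perturbation and fit window are arbitrary illustrative choices, and this is a crude estimate, not a substitute for proper attractor reconstruction.

    # Crude estimate of the largest Lyapunov exponent of the Lorenz (1963) system
    # by tracking the divergence of two nearby trajectories (classical parameters).
    lorenz_rhs <- function(x, sigma = 10, rho = 28, beta = 8/3) {
      c(sigma * (x[2] - x[1]),
        x[1] * (rho - x[3]) - x[2],
        x[1] * x[2] - beta * x[3])
    }
    rk4_step <- function(x, h) {              # one fourth-order Runge-Kutta step
      k1 <- lorenz_rhs(x)
      k2 <- lorenz_rhs(x + h/2 * k1)
      k3 <- lorenz_rhs(x + h/2 * k2)
      k4 <- lorenz_rhs(x + h   * k3)
      x + h/6 * (k1 + 2*k2 + 2*k3 + k4)
    }
    h     <- 0.01
    nstep <- 5000
    x1    <- c(1, 1, 1)
    for (i in 1:1000) x1 <- rk4_step(x1, h)   # spin up onto the attractor first
    x2    <- x1 + c(1e-8, 0, 0)               # arbitrarily small perturbation
    logsep <- numeric(nstep)
    for (i in 1:nstep) {
      x1 <- rk4_step(x1, h)
      x2 <- rk4_step(x2, h)
      logsep[i] <- log(sqrt(sum((x1 - x2)^2)))
    }
    t   <- (1:nstep) * h
    fit <- lm(logsep[t < 15] ~ t[t < 15])     # fit only the early, linear regime
    coef(fit)[2]                              # slope = estimate of the largest exponent

    With these parameters the slope should come out somewhere near 0.9 (in units of inverse model time), in line with the value usually quoted for the largest exponent of the Lorenz system; short or noisy series, as I said, will do much worse.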

  27. bender
    Posted Dec 4, 2006 at 1:00 AM | Permalink

    P.S. I believe weather forecasting is OT for this blog. (I think even GCMs are borderline.) The focus here is on multiproxy reconstructions.

  28. Rev Jackson
    Posted Dec 4, 2006 at 1:10 AM | Permalink

    Re #19: Mr Bloom. You can fool some of the people all of the time, and you can fool all of the people some of the time, but you can’t fool all of the people all of the time!

  29. Paul Linsay
    Posted Dec 4, 2006 at 6:42 AM | Permalink

    RE: Lyapunov exponents. There isn’t going to be just one for these models, there are going to be millions because of the number of independent variables. One per variable.

  30. Steve Sadlov
    Posted Dec 4, 2006 at 11:54 AM | Permalink

    RE: #12 – this is classic software quality engineering. Mr. Bloom, can I surmise you have not been exposed to this previously?

  31. Steve Bloom
    Posted Dec 4, 2006 at 3:42 PM | Permalink

    Re #20: That amounts to a pretty good set of reasons why the funders (the NSF and Congress here in the U.S.) won’t pay for all the stuff Dan suggests. Lacking infinite resources, the effort will continue to go into advancing the state of the science rather than fine-tuning things which will be out of date before the fine-tuning process can even be completed. The moment you hear modelers begin to claim they have something approaching a final version (a point which is decades away), that can and likely will change. In the meantime, public policy can and will be made based on partial results.

  32. Steve Bloom
    Posted Dec 4, 2006 at 3:44 PM | Permalink

    Re #28: That’s why it’s important to keep linkin’ to reliable sources, Rev!

  33. Barclay E. MacDonald
    Posted Dec 4, 2006 at 4:09 PM | Permalink

    Mr. Bloom, more correctly I think you mean “In the meantime, public policy can and will [continue to] be made based on …[erroneous] results.”

  34. Willis Eschenbach
    Posted Dec 4, 2006 at 5:17 PM | Permalink

    Steve B., you say:

    Re #20: That amounts to a pretty good set of reasons why the funders (the NSF and Congress here in the U.S.) won’t pay for all the stuff Dan suggests. Lacking infinite resources, the effort will continue to go into advancing the state of the science rather than fine-tuning things which will be out of date before the fine-tuning process can even be completed. The moment you hear modelers begin to claim they have something approaching a final version (a point which is decades away), that can and likely will change. In the meantime, public policy can and will be made based on partial results.

    Estimates of climate spending in the US are on the order of $40 billion (with a “b”) over the last 20 years. If we have money to create the models, we have money to be sure that they are working, particularly when we are betting trillions (with a “t”) of dollars on the result.

    You seem to think that software validation and quality assurance is somehow “fine-tuning” … that was my best laugh this morning, almost snorted my coffee up my nose …

    You say that “public policy can and will be made on partial results”, which is no different than it has ever been. I have no problem with that.

    What I have a problem with is making trillion dollar public policy decisions based on incorrect results, particularly incorrect results which are claimed to be “science”.

    w.

  35. Steve Bloom
    Posted Dec 4, 2006 at 5:53 PM | Permalink

    Willis, as you well know the majority of that funding is for gathering data (mainly via satellites) that has other uses.

    Your assertion that the science is incorrect is where you started, I think. Models aside, there is plenty of data that is more than sufficient cause for concern. That you ignore it, in addition to the model work, etc., says everything necessary.

  36. Steve Bloom
    Posted Dec 4, 2006 at 5:57 PM | Permalink

    Re #35: Reid, most of that is so wrong that it just serves to embarrass you.

  37. Reid
    Posted Dec 4, 2006 at 6:05 PM | Permalink

    Re #37: “Reid, most of that is so wrong that it just serves to embarrass you.”

    Embarrass me with a refutation instead of trash talk.

  38. welikerocks
    Posted Dec 4, 2006 at 6:39 PM | Permalink

    “there is plenty of data that is more than sufficient cause for concern”

    Oh no, not the “other evidences” argument again! SteveM made a whole area for that discussion some time back, and we still don’t know what they are.

  39. Willis Eschenbach
    Posted Dec 4, 2006 at 7:17 PM | Permalink

    Steve B, you say above:

    Willis, as you well know the majority of that funding is for gathering data (mainly via satellites) that has other uses.

    Your assertion that the science is incorrect is where you started, I think. Models aside, there is plenty of data that is more than sufficient cause for concern. That you ignore it in addition to the model work, etc. says everything necessary.

    I’m a bit confused here. How do you get so confident about what I know, what I don’t know, and what I “well know”?

    In any case, we don’t have enough money to make bad decisions when we are talking about spending trillions of dollars on the results from untested models. That will be much more expensive than testing the models.

    And I see you are back to the perennial and oh-so-convincing “there’s so much evidence for AGW that I’m not going to cite any of it” argument … you keep talking as though the science were settled. If it were settled, the debate would be over … but we’re still debating. Someday you’ll have to face the fact that a lot of smart, educated folks around the world think that the human influence on the climate is a) not all that large, and b) so poorly understood that it is foolish to go haring off after a “solution” until we know if there’s a problem and, if so, how we might best deal with it.

    Part of knowing what we’re doing, obviously, is to be able to model it well. The GISS ModelE GCM, their latest and greatest, underestimates cloud cover by 13% and has a 20 W/m2 radiation error over the tropics, and that’s only two of many known errors. Those are not my figures; that’s data from the people running the models.

    Even without software verification and quality assurance, using a model with errors that size to forecast the effect of an approximately 2-3 watt/m2 change a century from now, and making trillion dollar decisions based on those forecasts, is the height of hubris, or the depth of financial idiocy, or both.

    w.

  40. Steve Bloom
    Posted Dec 4, 2006 at 8:05 PM | Permalink

    Re #38: Oh why not.

    “30 years ago: The climate models claim earth is heading towards a new ice age due to human activity.”

    Reference for that? Model predictions, not some idiot reporter.

    “15 years ago: The climate models claim earth is heading towards a meltdown warming due to human activity.”

    Reference for model predictions using the term “meltdown”? Again, no idiot reporters.

    “8 years ago: Y2K – Life as we know it will end on Jan. 1, 2000”

    Interesting if so. Were any climate scientists or models involved in this prediction? If not, it’s irrelevant.

    “Global warming ended in 1998. Not a single GCM cited by the IPCC in the 1990’s predicted that. They all predicted rising temps.”

    This is correct except for the first bit about global warming ending in 1998, regarding which claiming that the trend ended at the most recent high is utterly bogus. Please cite to an academic statistical source stating that such an approach is valid. Regarding the 1998 GCM “failure,” note that they do not predict El Ninos.

    “Temps have been flat but if I used Mannian statistics I could claim there has been a step decline in temps over the last 8 years.”

    I don’t think anybody could pull that off. Let’s see the statistical work-up.

    “The antarctic is cooling and it’s ice mass is growing. Satellite data on this is irrefutable yet the GCM’s claim this isn’t be happening.”

    I’m afraid you’re misinformed in several regards. See here, here and here (noting in particular the quote from the TAR in comment 1). The upshot is that there is no clear trend in the Antarctic (other than the Peninsula, which is warming like crazy). BTW, if satellite data showing increased mass is irrefutable, why do the GRACE results show otherwise?

    “How many GCM’s predicted the sudden sharp cooling in sea surface temperature. I don’t know but I would guess it zero.”

    Wrong. There was no sudden sharp drop in sea surface temperatures. Source for your claim, please.

  41. Earle Williams
    Posted Dec 4, 2006 at 8:20 PM | Permalink

    You all really should resist Steve (The Artful Dodger) Bloom’s thread hijacking power.

    Back to software verification and validation – is ISO 9001:2000 an adequate first step?

  42. Jim Mitroy
    Posted Dec 4, 2006 at 10:33 PM | Permalink

    Given the nature of this topic, I thought I would resubmit
    something I contributed about 6 months ago. 476 syntax
    warnings seem excessive for something that may impact
    the future of the world economy.

    ********************** Old post *******************

    Since the economy of the free world apparently depends on the
    hockey stick, I decided to run the Mann code multiproxy.f
    through the FORTRAN syntax checker ftnchek.

    There were close to 100 screen fulls of warnings concerning
    mixed mode (mainly REAL*4/REAL*8) arithmetic. The tail-end
    of the output is enclosed below.

    This is what I got

    ……………………………………………………..
    ……………………………………………………..
    Warning near line 2960 col 13 file multiproxy.f: real G
    promoted to real*8 S: may not give desired precision
    3012 570 RETURN
    Error near line 3012 file multiproxy.f: missing END statement
    inserted at end of file

    2 syntax errors detected in file multiproxy.f
    476 warnings issued in file multiproxy.f

    Warning: Subprogram CSVD argument usage mismatch at position 6:
    Dummy arg IP in module CSVD line 2749 file multiproxy.f is used before set
    Actual arg IP0 in module %MAIN line 1138 file multiproxy.f is not set

    ************************** END ftnchek output ********************

    I really would be happier if decisions concerning the future of
    the world economy did not depend on whether the compiler sets
    uninitialized variables to zero.

  43. Paul Penrose
    Posted Dec 4, 2006 at 10:54 PM | Permalink

    I just love it when some clueless twit who has never written software using a requirements/validation type process states that it’s too expensive or would take too long. How would they know? Hey, clueless twits: I’ve done it for many years and I can attest that it works and is no more than 10-20% more expensive than traditional methods; however, the code is of a much higher quality. While this may seem like alchemy, it is not, because so much time is saved in the debugging and testing phases. Yes, it actually makes testing easier. So there are really no excuses for not doing it this way, except ignorance.

  44. welikerocks
    Posted Dec 5, 2006 at 5:44 AM | Permalink

    #41 (everyone else ignore, sorry Earle)

    Bloom says OT

    Wrong. There was no sudden sharp drop in sea surface temperatures. Source for your claim, please.

    new data show ocean cooling
    GEOPHYSICAL RESEARCH LETTERS, VOL. 33, L18604, doi:10.1029/2006GL027033, 2006
    “Recent Cooling of the Upper Ocean”
    John M. Lyman, Josh K. Willis, and Gregory C. Johnson

    Abstract. We observe a net loss of 3.2 (± 1.1) × 10^22 J of heat from the upper ocean between 2003 and 2005. Using a broad array of in situ ocean measurements, we present annual estimates of global upper-ocean heat content anomaly from 1993 through 2005. Including the recent downturn, the average warming rate for the entire 13-year period is 0.33 ± 0.23 W/m2 (of the Earth’s total surface area). A new estimate of sampling error in the heat content record suggests that both the recent and previous global cooling events are significant and unlikely to be artifacts of inadequate ocean sampling.

    Just google “ocean cooling”

  45. Richard Greenacre
    Posted Dec 5, 2006 at 9:36 AM | Permalink

    Re #43

    Jim,

    The implications from your syntax checker output are dreadful. I hate to think what has happened to the accuracy of any input data with all of the truncation that may be going on. As for the uninitialised variables – words fail me.

    Re #44

    Paul,

    I completely agree with you. Quality software is usually much cheaper in the long run than most of the ‘kitchen table’ stuff. It’s also much easier to correct if you find that some of your initial assumptions are in error. As has been said elsewhere in this thread it is the validation and verification processes that are the key components.

  46. Steve Sadlov
    Posted Dec 5, 2006 at 10:36 AM | Permalink

    RE: #44 and 46 – Indeed, the decrement in total lifecycle cost of ownership of high-quality software pays for the development cost increment and then some. And ethically, software used to determine macroeconomic taxation schemes and regulations ought also to be of high quality, simply because the matter at hand is so grave.

  47. George Crews
    Posted Dec 5, 2006 at 11:06 AM | Permalink

    Hi Dan,

    There has been considerable scientific software written and used for the DOE’s Yucca Mountain Project — the proposed nuclear waste repository located 100 miles north of Las Vegas, Nevada. The quality assurance requirements imposed on the scientific software important to waste isolation, such as the finite-element climate and water infiltration codes for example, are very similar to what you have listed above.

    As can be seen from examining some of the massive amount of documentation such as emails and QA audits publicly available for the project, adoption of these software quality assurance requirements by our national lab scientists has not been completely without difficulty.

    Of course, this is not surprising. For example, most of us would say that the pencil was the greatest tool ever invented. Only a few of us would say it was the eraser.

    George Crews

  48. Mike T
    Posted Dec 5, 2006 at 11:07 AM | Permalink

    RMI says:

    And if climate is non-linear, is it possible to predict it using GCMs, even
    disregarding the sensitivity to initial conditions issue? My point is that the
    GCMs use approximate numerical integration methods to solve the governing
    differential equations in space and time. But these methods assume linear
    evolution of the integrated variables over each integration step. By making the
    integration intervals sufficiently small can such linear solutions reveal
    non-linear effects? Or do these linearizations forever preclude non-linear
    predictions?

    There are many methods to solve nonlinear ODEs/PDEs; some of them use linear approximations in time/space and some do not. Even if the methods are linear, they can still track any nonlinear effects, although in general the step size will have to be smaller (in space and/or time). Of course there is no guarantee that your numerical method is as accurate as you think, so more work needs to be done to ensure you’re getting a reasonable solution.
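
    As a toy illustration of that point (it has nothing to do with any actual GCM scheme): explicit Euler is linear over each step, yet it reproduces the nonlinear S-curve of the logistic equation once the step is small enough. A short sketch in R, with step sizes chosen arbitrarily for illustration:

    # dy/dt = r*y*(1 - y): Euler linearizes each step, but still captures the
    # nonlinear saturation behaviour as the step size h is refined.
    logistic_euler <- function(h, r = 3, y0 = 0.01, tmax = 2) {
      n <- round(tmax / h)
      y <- y0
      for (i in 1:n) y <- y + h * r * y * (1 - y)
      y                                           # value at t = tmax
    }
    logistic_exact <- function(r = 3, y0 = 0.01, t = 2) {
      1 / (1 + (1 / y0 - 1) * exp(-r * t))
    }
    h_values <- c(0.5, 0.1, 0.02, 0.004)
    errors   <- sapply(h_values, function(h) abs(logistic_euler(h) - logistic_exact()))
    data.frame(h = h_values, error = errors)      # error shrinks as h is refined

    The coarse step badly misses the transition; once the step is reasonably small the error shrinks roughly in proportion to h, which is exactly the kind of behaviour a convergence check looks for.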

    One other question: does anyone think R follows this process? If it does not, does that invalidate the analysis done by Steve?

  49. Earle Williams
    Posted Dec 5, 2006 at 12:18 PM | Permalink

    re #45
    rocks,

    Steve B’s one valid point in his prior post #41 was that issue. What he so eloquently didn’t say was that you are conflating the upper ocean with the surface sea temperature. They are distinct measures and one showed significant cooling (Lyman et al, 2006) and the other apparently has not. But rather than advance clarity in a dialog he instead lets things wander off course…

  50. Loki on the run
    Posted Dec 5, 2006 at 12:27 PM | Permalink

    Re 45:

    And here is the politically correct take on ocean cooling: Short-Term Ocean Cooling Suggests Global Warming ‘Speed Bump’

    “This research suggests global warming isn’t always steady, but happens with occasional ‘speed bumps’,” said Josh Willis, a co-author of the study at NASA’s Jet Propulsion Laboratory, Pasadena, Calif. “This cooling is probably natural climate variability. The oceans today are still warmer than they were during the 1980s, and most scientists expect the oceans will eventually continue to warm in response to human-induced climate change.”

  51. Dan Hughes
    Posted Dec 5, 2006 at 3:53 PM | Permalink

    The statements that V&V and SQA cannot be successfully completed are simply incorrect. As others have pointed out, these activities are SOP for many software projects. Additionally, these activities are themselves an industry, taught in universities and the subject of seminars, reports, peer-reviewed papers, and books. And they employ thousands of people throughout many different industries. All software the results of which might impact the health and safety of the public, or of selected individuals performing dangerous activities, is required to have these activities applied. Several examples have been noted in the comments above. If the results of AOLGCM models/codes are to be included in decision processes about consumption of fossil-based fuels, something that will impact the health and safety of millions of people, V&V and SQA should be required.

    As for statements to the effect that AOLGCM models/codes are just too complex for these activities to be applied, consider that operation/control software for some aircraft runs to millions of lines of code. No one has said that it is an easy job. The planning alone for how to Validate complex software in which inherently complex physical phenomena and processes are the subject is not an easy task. Yet it is done all the time.

    Some have mentioned that the AOLGCM models/codes are basically research codes in the sense that they are constantly under development and changing. All useful software is constantly under development and changing; software for which this is not the case is software that is not being used. An extremely simple solution is available that successfully handles this situation. One version of the software, to which independent V&V has been applied and which is maintained and applied under the appropriate SQA plans, provides the production version of the code, oftentimes called the frozen version. Other versions are used as the research/development tools. After the models and methods that are under development are ready, a new production version is produced. It happens all the time, and you’ve seen it in action in the software that you use on your personal computer. Not that a high level of V&V and SQA is always applied to PC software; however, some PC software is held to very high standards. Of course many models/codes that at one time needed the big-iron mainframes now run on PCs. Those that required V&V and SQA still require these.

    Models/codes that have had V&V and SQA applied will always have fewer errors than codes that have not been subjected to these activities. Additionally, as others have noted, the absence of a requirement for V&V and SQA will always mean that maintaining the code and correcting errors will be more expensive, and in the case of some legacy software sometimes impossible. Application of formal independent V&V and SQA procedures and plans would very likely reduce the number of entries here, for example, and cut some of the pages in an appendix devoted to code problems, as shown in a draft that I can no longer find.

    Someone mentioned the R code. It is up to the customer to decide what level of V&V and SQA is necessary for their application. Also notice that R follows the two-version approach: it is being constantly updated, yet there is a version for production use. If R users decide that formal independent V&V is necessary for their applications, they can get in touch with the R developers. One of the first inquiries of the customers to the developers might be to see the SQA plan for the R software.

    And this brings up other points. It is not just the large AOLGCM models/codes that should be subjected to formal independent V&V and SQA. All software the results of which might affect decisions about public policy, such as the health and safety of the public, will ultimately be required to undergo formal independent V&V and be maintained under an approved SQA plan. Software used in the reduction and analysis of measured data, for example software used in reconstructions of past temperatures, is another case in point.

  52. welikerocks
    Posted Dec 5, 2006 at 5:19 PM | Permalink

    #50

    What he so eloquently didn’t say was that you are conflating the upper ocean with the surface sea temperature.

    Thank you Earle 🙂 hey Bloom see how fair and balanced! 🙂
    But which ocean specifically are you talking about? [j/k] LOL. I am OT again and I have more to say/ask on the subject, but I’ll save it for the Road Map when it’s open again or something. Cheers!

  53. RMI
    Posted Dec 6, 2006 at 1:38 AM | Permalink

    Re 26, Bender, thank you for your answer and offer of the script. I once tried this
    myself on missile telemetry data and encountered data problems.
    It does take a lot of data with, I believe, well-understood noise content.

    By “bickering” I was referring to arguments sometimes seen (at RC, I believe)
    that while it’s agreed that weather is chaotic, it’s claimed that climate
    is not, and is therefore predictable. Is there a definitive conclusion to this discussion?

  54. Steve Sadlov
    Posted Dec 6, 2006 at 9:54 AM | Permalink

    RE: #51 – I see, said the visually impaired person (a bit of self-deprecating humor – I get my first eyewear prescription next week …. not bad for an over-the-hill 40-something … I digress) …..

    So let me get this straight. The reputed speed bump of ocean cooling can be explained away by innate variation of the system, but the reputed killer AGW, which, based on all credible results, is within the envelope of historical innate variation of the system, cannot be considered to be *possibly* due to a mechanism (or set of them) within the innately varying subsystems of the system? Fascinating.

    Here is a paper that some young and ambitious (and, slightly masochistic! LOL) PhD candidate ought to consider. It is a two part deal:
    1) A massive MSA (Measurement System Analysis) using Six Sigma best practices, on the instrument network (present and going back into the past) and all the various proxies. Ideally, world renowned experts from the Six Sigma community (e.g. Black Belts) would be brought in to assist and as coauthors.
    2) A thorough characterization, using the MSA in 1 above, along with the actual data in the instrument record and proxies, of the expected level of innate variation in selected parameters thought to indicate the behavior of the climate system. Temperature, precipitation, others – areal global distributions and variations thereof, temporal variations, etc. As part of this, a good amount of auditing of the instrument record and of claimed grid cell maps (I’m thinking of Steve M’s Rain in Spain, or is it Maine, or, La Seine? as an example …).

    This is actually a huge research project, possibly providing dissertation fodder for multiple PhD candidates. It could really change the world.

  55. Stan Palmer
    Posted Dec 6, 2006 at 12:53 PM | Permalink

    re 52 and some others

    Critics of the V&V idea commonly say that it is unnecessary and expensive. GCMs and other software can be written and observed in operation to see if they are functioning properly. There was a recent example on this blog in which the difficulties with this practice were exposed. Dr. Juckes ran some of Steve McIntyre’s code and obtained anomalous results. He asked Steve why his code did not produce the charts that Steve indicated it did. Others tried to duplicate this, and while most obtained Steve’s results, others did not. Testing indicated that this was because of some odd behavior of R with some Linux distributions. Dr. Juckes later reported that he had discovered that the Linux distribution he was using had difficulty accessing some data off the Internet and, instead of properly dealing with it, returned some arbitrary and incorrect values and carried on. Dr. Juckes received incorrect results not because of his errors but because of an unreported bug in a piece of software that he was relying on.
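
    (As an aside, the defensive checks that would catch that sort of silent failure are cheap to write. A minimal sketch in R; the URL and the sanity limits below are placeholders, not a real data source.)

    # Fetch a data file and refuse to continue if the download fails or the
    # values are implausible, instead of carrying on with arbitrary numbers.
    fetch_series <- function(src) {
      dat <- tryCatch(
        read.table(src, header = TRUE),
        error   = function(e) stop("download/parse failed: ", conditionMessage(e)),
        warning = function(w) stop("download/parse warning treated as fatal: ",
                                   conditionMessage(w))
      )
      stopifnot(nrow(dat) > 0,                 # refuse empty or implausible data
                all(is.finite(dat[[2]])),
                all(abs(dat[[2]]) < 100))      # e.g. temperature anomalies in deg C
      dat
    }
    # proxy <- fetch_series("http://example.org/proxy_series.txt")  # placeholder URL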

    Now in this case, the error was noticed and handled by a collective effort. However, consider now a large-scale GCM with a similar bug deep within some of its components. Incorrect values are returned and erroneous calculations are reported. There is no indication to the user of this system that anything is amiss. Results of these erroneous calculations are then used to inform policy decisions that will affect large portions of the world economy.

    This is a very dangerous situation. We are relying on software to make critical decisions. This software has not been proven to be safe but is being relied on for potentially critical decisions. An effective V&V system is not an expensive waste of time. It is an essential element.

    There is also the story of the Mars Rovers Spirit and Opportunity. NASA sent them to Mars with a memory leak in their operating system. One rolled over and died soon after it got to Mars, and the other was on the verge of doing so. Luckily, NASA stumbled onto a solution that saved these missions. Years of work by scientists and engineers and $250 million for each mission were placed at critical risk because an effective V&V operation was not performed. The mission was supposed to be “better, faster and cheaper”. Cheap in the sense that a necessary function was eliminated for cost reasons, with total loss of the mission being avoided only by good luck. An obvious and easily fixable bug was not caught.