John A writes: After a brief search, I found the paper “Global Warming: Forecasts by Scientists versus Scientific Forecasts”.
This paper came to my attention via an article in the Sydney Morning Herald. It was written by two experts on scientific forecasting, who audit Chapter 8 of WG1 in the latest IPCC report.
The authors, Armstrong and Green, begin with a bombshell:
In 2007, a panel of experts established by the World Meteorological Organization and the United Nations Environment Programme issued its updated, Fourth Assessment Report, forecasts. The Intergovernmental Panel on Climate Change’s Working Group One Report predicts dramatic and harmful increases in average world temperatures over the next 92 years. We asked, are these forecasts a good basis for developing public policy? Our answer is “no”.
So where is the problem? The problem, according to the authors, is that the IPCC and everyone else do not distinguish between forecasts based on the opinions of experts and scientific forecasts (with emphasis added):
Much research on forecasting has shown that experts’ predictions are not useful. Rather, policies should be based on forecasts from scientific forecasting methods. We assessed the extent to which long-term forecasts of global average temperatures have been derived using evidence-based forecasting methods. We asked scientists and others involved in forecasting climate change to tell us which scientific articles presented the most credible forecasts. Most of the responses we received (30 out of 51) listed the IPCC Report as the best source. Given that the Report was commissioned at an enormous cost in order to provide policy recommendations to governments, the response should be reassuring. It is not. The forecasts in the Report were not the outcome of scientific procedures. In effect, they present the opinions of scientists transformed by mathematics and obscured by complex writing. We found no references to the primary sources of information on forecasting despite the fact these are easily available in books, articles, and websites. We conducted an audit of Chapter 8 of the IPCC’s WG1 Report. We found enough information to make judgments on 89 out of the total of 140 principles. The forecasting procedures that were used violated 72 principles. Many of the violations were, by themselves, critical. We have been unable to identify any scientific forecasts to support global warming. Claims that the Earth will get warmer have no more credence than saying that it will get colder.
Armstrong and Green further point out that those principles of forecasting sometimes run counter to what most people, scientists included, expect. They also point to various failings of scientists who regard themselves as experts (with some emphasis added):
…here are some of the well-established generalizations for situations involving long-range forecasts of complex issues where the causal factors are subject to uncertainty (as with climate):
• Unaided judgmental forecasts by experts have no value. This applies whether the opinions are expressed by words, spreadsheets, or mathematical models. It also applies regardless of how much scientific evidence is possessed by the experts. Among the reasons for this are:
a) Complexity: People cannot assess complex relationships through unaided observations.
b) Coincidence: People confuse correlation with causation.
c) Feedback: People making judgmental predictions typically do not receive unambiguous feedback they can use to improve their forecasting.
d) Bias: People have difficulty in obtaining or using evidence that contradicts their initial beliefs. This problem is especially serious for people who view themselves as experts.
• Agreement among experts is weakly related to accuracy. This is especially true when the experts communicate with one another and when they work together to solve problems. (As is the case with the IPCC process).
• Complex models (those involving nonlinearities and interactions) harm accuracy because their errors multiply. That is, they tend to magnify one another. Ascher (1978), refers to the Club of Rome’s 1972 forecasts where, unaware of the research on forecasting, the developers proudly proclaimed, “in our model about 100,000 relationships are stored in the computer.” (The first author [Armstrong] was aghast not only at the poor methodology in that study, but also at how easy it was to mislead both politicians and the public.) Complex models are also less accurate because they tend to fit randomness, thereby also providing misleading conclusions about prediction intervals. Finally, there are more opportunities for errors to creep into complex models and the errors are difficult to find. Craig, Gadgil, and Koomey (2002) came to similar conclusions in their review of long-term energy forecasts for the US made between 1950 and 1980.
• Given even modest uncertainty, prediction intervals are enormous. For example, prediction intervals expand rapidly as time horizons increase so that one is faced with enormous intervals even when trying to forecast a straightforward thing such as automobile sales for General Motors over the next five years.
• When there is uncertainty in forecasting, forecasts should be conservative. Uncertainty arises when data contain measurement errors, when the series is unstable, when knowledge about the direction of relationships is uncertain, and when a forecast depends upon forecasts of related (causal) variables. For example, forecasts of no change have been found to be more accurate for annual sales forecasts than trend forecasts when there was substantial uncertainty in the trend lines (e.g., Schnaars & Bavuso 1986). This principle also implies that forecasters [should revert] to long-term trends when such trends have been firmly established, they do not waver, and there are no firm reasons to suggest that the trends will change. Finally, trends should be damped toward no change as the forecast horizon increases.
Of course, this isn’t the behavior that a lot of us have seen from the IPCC. A lot of the criticism leveled at the IPCC was that the forecasts were too conservative, rather than the reverse.
Armstrong and Green don’t exactly endorse the notion of “scientific consensus”, since it is clear to them that such a consensus, when it arises in close groups of people working in the same general field, tends to reinforce bias rather than remove it. I seem to remember Edward Wegman saying much the same thing about group reinforcement.
What of forecasting by experts? Well, it turns out that this appears to be no better a guide to the future than asking your mates down the pub:
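To put rough numbers on two of the points in the quoted list (prediction intervals widening with the horizon, and trends being damped toward no change), here is a minimal Python sketch. It is an illustration only and is not taken from Armstrong and Green’s paper. It assumes a random-walk series, so that forecast error variance grows linearly with the horizon, and it uses a standard damped-trend formula with a hypothetical damping parameter `phi`. The function names and example values are made up for the purpose.

```python
import math


def interval_half_width(sigma, horizon, z=1.96):
    """Half-width of an approximate 95% prediction interval.

    Assumption: a random walk whose one-step errors are independent with
    standard deviation sigma, so the h-step error variance is h * sigma**2.
    """
    return z * sigma * math.sqrt(horizon)


def damped_trend_forecast(level, trend, phi, horizon):
    """Forecast `horizon` steps ahead, shrinking the trend by phi each step.

    phi = 1 gives an undamped linear trend; phi = 0 gives a no-change forecast.
    """
    damping = sum(phi ** k for k in range(1, horizon + 1))
    return level + damping * trend


if __name__ == "__main__":
    # Columns: horizon, interval half-width (sigma = 1), damped forecast
    # starting from level 0 with trend 1 and the illustrative phi = 0.8.
    for h in (1, 5, 25, 100):
        print(h,
              round(interval_half_width(1.0, h), 2),
              round(damped_trend_forecast(0.0, 1.0, 0.8, h), 2))
```

Under these assumptions the interval at a horizon of 100 steps is ten times as wide as at one step, while the damped forecast with `phi = 0.8` levels off at about four times the initial trend instead of climbing indefinitely.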
The first author’s [Armstrong’s] review of empirical research on this problem led to the “Seer-sucker theory,” stating that, “No matter how much evidence exists that seers do not exist, seers will find suckers” (Armstrong 1980). The amount of expertise does not matter beyond a basic minimum level. There are exceptions to the Seer-sucker Theory: When forecasters get substantial amounts of well-summarized feedback about the accuracy of their forecasts and about the reasons why the forecasts were or were not accurate, they can improve their forecasts. This situation applies for short-term (e.g., up to five days) weather forecasts, but it does not apply to long-term climate forecasts.
Research since 1980 has added support to the Seer-sucker Theory. In particular, Tetlock (2005) recruited 284 people whose professions included, “commenting or offering advice on political and economic trends.” He asked them to forecast the probability that various situations would or would not occur, picking areas (geographic and substantive) within and outside their areas of expertise. By 2003, he had accumulated over 82,000 forecasts. The experts barely if at all outperformed non-experts and neither group did well against simple rules.
This method of forecasting by expert opinion was very popular in the 1970s in climate science:
In the mid-1970s, there was a political debate raging about whether the global climate was changing. The United States’ National Defense University addressed this issue in their book, Climate Change to the Year 2000 (NDU 1978). This study involved 9 man-years of effort by Department of Defense and other agencies, aided by experts who received honoraria, and a contract of nearly $400,000 (in 2007 dollars). The heart of the study was a survey of experts. It provided them with a chart of “annual mean temperature, 0-80° N. latitude,” that showed temperature rising from 1870 to early 1940 then dropping sharply up to 1970. The conclusion, based primarily on 19 replies weighted by the study directors, was that while a slight increase in temperature might occur, uncertainty was so high that “the next twenty years will be similar to that of the past” and the effects of any change would be negligible. Clearly, this was a forecast by scientists, not a scientific forecast. However, it proved to be quite influential. The report was discussed in The Global 2000 Report to the President (Carter) and at the World Climate Conference in Geneva in 1979.
Such was the state of the art back then, but now with the advent of personal computers, canvassing experts to report their impressions of data has been transformed through the use of computer models. But are they any better at forecasting?
The methodology used in the past few decades has shifted from surveys of experts’ opinions to the use of computer models. However, based on the explanations that we have seen, such models are, in effect, mathematical ways for the experts to express their opinions. To our knowledge, there is no empirical evidence to suggest that presenting opinions in mathematical terms rather than in words will contribute to forecast accuracy. For example, Keepin and Wynne (1984) wrote in the summary of their study of the IIASA’s “widely acclaimed” projections for global energy that, “Despite the appearance of analytical rigour… [they] are highly unstable and based on informal guesswork”.
All right, that was the 1980s. What about much more recently?
Carter, et al. (2006) examined the Stern Review (Stern 2007). They concluded that the Report authors made predictions without any reference to scientific forecasting.
I’m sure there’s lots more to be said about Stern’s methodology in other areas, but we must press on:
Pilkey and Pilkey-Jarvis (2007) concluded that the long-term climate forecasts that they examined were based only on the opinions of the scientists. The opinions were expressed in complex mathematical terms. There was no validation of the methodologies. They referred to the following quote as a summary on their page 45: “Today’s scientists have substituted mathematics for experiments, and they wander off through equation after equation and eventually build a structure which has no relation to reality. (Nikola Tesla, inventor and electrical engineer, 1934.)”
I assume the reference to Nikola Tesla isn’t meant to be complimentary.
Carter (2007) examined evidence on the predictive validity of the general circulation models (GCMs) used by the IPCC scientists. He found that while the models included some basic principles of physics, scientists had to make “educated guesses” about the values of many parameters because knowledge about the physical processes of the earth’s climate is incomplete. In practice, the GCMs failed to predict recent global average temperatures as accurately as simple curve-fitting approaches (Carter 2007, pp. 64–65) and also forecast greater warming at higher altitudes when the opposite has been the case (p. 64). Further, individual GCMs produce widely different forecasts from the same initial conditions and minor changes in parameters can result in forecasts of global cooling (Essex and McKitrick, 2002). Interestingly, modeling results that project global cooling are often rejected as “outliers” or “obviously wrong” (e.g., Stainforth et al., 2005).
Was Stainforth et al. a reference to that ridiculous modelling exercise where they emphasized the top-end 11°C rise without mentioning all of the runs that fell into deep cooling? Yes, it was. Obviously Stainforth knows which ones are outliers and therefore “obviously wrong” and which are not, because he’s an expert.
Taylor (2007) compared seasonal forecasts by New Zealand’s National Institute of Water and Atmospheric Research with outcomes for the period May 2002 to April 2007. He found NIWA’s forecasts of average regional temperatures for the season ahead were, at 48% correct, no more accurate than chance. That this is a general result was confirmed by New Zealand climatologist Dr Jim Renwick, who observed that NIWA’s low success rate was comparable to that of other forecasting groups worldwide. He added that “Climate prediction is hard, half of the variability in the climate system is not predictable, so we don’t expect to do terrifically well.” Dr Renwick is an author on Working Group I of the IPCC 4th Assessment Report, and also serves on the World Meteorological Organisation Commission for Climatology Expert Team on Seasonal Forecasting. His expert view is that current GCM climate models are unable to predict future climate any better than chance.
Now clearly this is a serious problem with climate modelling on a regional level, but is it being reported that regional climate forecasts for even three months ahead do no better than flipping a coin?
Then there’s the Hurricane Forecasting Débacle of 2006:
…the US National Hurricane Center’s report on hurricane forecast accuracy noted, “No routinely-available early dynamical model had skill at 5 days” (Franklin 2007). This comment probably refers to forecasts for the paths of known, individual storms, but seasonal storm ensemble forecasts are clearly no more accurate. For example, the NHC’s forecast for the 2006 season was widely off the mark. On June 7, Vice Admiral Conrad C. Lautenbacher, Jr. of the National Oceanic and Atmospheric Administration gave the following testimony before the Committee on Appropriations Subcommittee on Commerce, Justice and Science of the United States Senate (Lautenbacher 2006, p. 3):
“NOAA’s prediction for the 2006 Atlantic hurricane season is for 13-16 tropical storms, with eight to 10 becoming hurricanes, of which four to six could become major hurricanes. … We are predicting an 80 percent likelihood of an above average number of storms in the Atlantic Basin this season. This is the highest percentage we have ever issued.”
By the beginning of December, Gresko (2006) was able to write “The mild 2006 Atlantic hurricane season draws to a close Thursday without a single hurricane striking the United States”.
That’s just in the first seven pages. On page 8 they begin their audit of scientific forecasting at the IPCC, and it goes downhill from there.