Willis E on Hansen and Model Reliability

Another interesting post from Willis:

James Hansen of NASA has a strong defense of model reliability here. In this paper, he argues that the model predictions which have been made were in fact skillful (although he doesn’t use that word). In support of this, he shows the following figure:

(Original caption) Fig. 1: Climate model calculations reported in Hansen et al. (1988).

The three scenarios, A, B, and C, are described in the text as follows:

Scenario A has a fast growth rate for greenhouse gases. Scenarios B and C have a moderate growth rate for greenhouse gases until year 2000, after which greenhouse gases stop increasing in Scenario C. Scenarios B and C also included occasional large volcanic eruptions, while scenario A did not. The objective was to illustrate the broad range of possibilities in the ignorance of how forcings would actually develop. The extreme scenarios (A with fast growth and no volcanos, and C with terminated growth of greenhouse gases) were meant to bracket plausible rates of change. All of the maps of simulated climate change that I showed in my 1988 testimony were for the intermediate scenario B, because it seemed the most likely of the three scenarios.

I became curious about how that prediction had held up in the years since his defense of modeling was written (January 1999). So I started looking more closely at the figure.

The first thing that I noted is that the four curves (Scenarios A, B, C, and Observations) don’t start from the same point. All three scenarios start from the same point, but the observations start well above that point … hmmm.

In any case, I overlaid his figure with the very latest, hot off the presses, HadCRUT3 data from Phil Jones at the CRU … and in this case, I started the HadCRUT3 curve at the same point where the scenarios started. Here’s the result:

Fig. 2: Climate model calculations reported in Hansen et al. (1988), along with HadCRUT3 data.

A few things are worthy of note here. One is that starting the scenarios off at the same point gives a very different result from Hansen’s.
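
The re-alignment used for Fig. 2 amounts to adding one constant to the observed series so that its first value coincides with the scenarios' common first value. A minimal sketch in Python; the function name `align_to_start` and the sample numbers are illustrative assumptions, not digitized Hansen values or HadCRUT3 data:

```python
def align_to_start(series, target_start):
    """Shift a series by a constant so its first value equals target_start."""
    offset = target_start - series[0]
    return [value + offset for value in series]


# Hypothetical anomalies (deg C), invented for illustration only.
scenario = [0.00, 0.05, 0.12, 0.20]
observed = [0.10, 0.12, 0.15, 0.22]

# Start the observations where the scenario starts.
aligned = align_to_start(observed, scenario[0])

# The year-to-year changes are untouched; only the level moves.
steps_before = [b - a for a, b in zip(observed, observed[1:])]
steps_after = [b - a for a, b in zip(aligned, aligned[1:])]
```

Because a single constant is applied, the shape of the observed curve is preserved and only its vertical position relative to the scenarios changes.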

The second is the size of the divergence. Scenario C, where greenhouse gases stop increasing in 2000, can be ignored; obviously, that didn’t happen. Looking at the other scenarios, the observed temperature in 2005 is a quarter of a degree C below Scenario B, and 0.6°C below Scenario A.

Finally, the observations have mostly been below all of the scenarios since the start of the record in 1958. Since, according to Hansen, Scenarios A and C were "meant to bracket plausible rates of change", I would say that they have not done so.

A final note: I am absolutely not accusing James Hansen of either a scam or intellectual dishonesty, he clearly believes in what he is saying. However, he has shaded his original conclusions by starting the observational record well above where the three scenarios all start.

Mainly, the problem is that the world has not continued to heat up as was expected post 1998, while his Scenarios A and B did continue to warm. The post-1998 climate actually is acting very much like his Scenario C … except, of course, that the CO2 emissions didn’t level off in 2000 as in Scenario C.

UPDATE:
The values for Hansen’s scenarios are not archived anywhere. Willis obtained them by digitizing the graphic in the pdf file; the values are provided in comment #63 below. Willis reports that he downloaded the HadCRUT3 dataset from http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/monthly

and the GISTEMP dataset is from http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt

Hansen aligned all three scenarios at the same starting point as noted by Willis, who aligned the two temperature series at the same starting value as used by Hansen. (See comment 63.) This procedure has been criticized by Tim Lambert.


174 Comments

  1. John A
    Posted Aug 26, 2006 at 4:30 PM | Permalink

    A final note: I am absolutely not accusing James Hansen of either a scam or intellectual dishonesty, he clearly believes in what he is saying

    Then Hansen is deluded. Who’s going to tell him?

    However, he has shaded his original conclusions by starting the observational record well above where the three scenarios all start.

    Not for the first time in his career James Hansen has cherrypicked his end-dates and cut-offs for maximum effect.

    Mainly, the problem is that the world has not continued to heat up as was expected post 1998, while his Scenarios A and B did continue to warm. The post-1998 climate actually is acting very much like his Scenario C … except, of course, that the CO2 emissions didn’t level off in 2000 as in Scenario C.

    ..or maybe the reality is that his models are far too sensitive to carbon dioxide forcing and have insufficient negative feedbacks.

    In any case, the observations have invalidated his model forecasts. In normal science, this would lead to a retraction or a modification of his theory in mitigation. But this isn’t normal science.

  2. nanny_govt_sucks
    Posted Aug 26, 2006 at 5:36 PM | Permalink

    Even if you graft the hadCRUT3 data to the end of Hansen’s observed data (red line) you still get observations that fall below Scenario “C”, as the latter shows a sharp increase in 2006 while observations don’t.

  3. Ian Castles
    Posted Aug 26, 2006 at 5:55 PM | Permalink

    Re #2. True. In fact the GISS observations (“Table of global-mean monthly, annual and seasonal land-ocean temperature index”) show mean temperature in the first seven months of 2006 as 0.10° C lower than in the corresponding months of 2005. That’s a big decrease – half the size of the cells in the vertical grid in Figs. 1 & 2.

  4. David Smith
    Posted Aug 26, 2006 at 6:03 PM | Permalink

    Any idea why Hansen’s “observed” and HADCRUT3, while similar, aren’t identical?

    Also, is Hansen et al 1988, or is it 1998?

    Thanks

  5. Phil B.
    Posted Aug 26, 2006 at 6:36 PM | Permalink

    Recently, on the Discovery Channel here in the states, there was a program with Tom Brokaw called “What you need to know about Climate Change”. Tom interviewed Jim Hansen and during that interview, Hansen essentially stated without any caveats that the (his team?) modeling predicted that half the world’s species would be extinct by the turn of the century. Tom nodded knowingly and said nothing. I didn’t tape the show, so I am paraphrasing what Hansen said, but it appears that Jim has “moved on” in his modeling.

  6. Willis Eschenbach
    Posted Aug 26, 2006 at 7:49 PM | Permalink

    Re #4, the original Hansen forecasts were in 1988, followed by the paper cited above, in January of 1999.

    As an aside, the GISS dataset is very, very different from the HadCRUT3 dataset, and I have no idea why that might be.

    In particular, the difference for 2005 is very striking, with the GISS dataset showing it as much warmer than 1998, while the HadCRUT3 dataset shows it as much cooler … go figure.

    w.

  7. joshua corning
    Posted Aug 26, 2006 at 9:43 PM | Permalink

    Just for good due diligence purposes, what is the HADCRUT3 and why did you choose it?

  8. Willis Eschenbach
    Posted Aug 26, 2006 at 9:53 PM | Permalink

    Re #7, Joshua, thank you for the excellent due diligence question. HadCRUT3 is the latest version of the global temperature record maintained by Phil Jones. It is a joint project, as I understand it, of the Hadley Centre of the UK Met Office and the Climate Research Unit at East Anglia, UK. See http://www.cru.uea.ac.uk/cru/data/temperature/ for full details and data access.

    I picked it because it was not the other main temperature record, the GISSTemp dataset, maintained by among other people, James Hansen … Full details and data access for the GISSTemp dataset are available at http://data.giss.nasa.gov/gistemp/

    w.

  9. Willis Eschenbach
    Posted Aug 27, 2006 at 12:12 AM | Permalink

    Re #5, Phil, thanks for the posting, wherein you note that:

    Hansen essentially stated without any caveats that the (his team?) modeling predicted that half the world’s species would be extinct by the turn of the century.

    These predictions of extinctions are among the most bogus forecasts arising out of the models, although they originally came, not from the models, but from estimates of tropical deforestation.

    I wrote a paper on this subject which is available here. I found when researching the subject that there have been very few bird or mammal extinctions on the continents: only nine in the last 500 years, six birds and three mammals. This was a great surprise to me; with all of the hype, I expected many more.

    In any case, my paper shows that there has been no increase in extinctions … have a read, it’s an interesting paper.

    w.

  10. Peter Hearnden
    Posted Aug 27, 2006 at 5:40 AM | Permalink

    Re fig’s 1 and 2, I don’t see the point of going from a graph which uses the GISS (Hansen et al) data to another which uses another updated data set – but leave the Hansen data as was. It is to compare apples with oranges.

    Surely it’s better, and simply less confusing, to update the observed data using updated GISS records and show that graph as Fig 2?

    Here’s one I found some time ago – C or B I’d say.

  11. MarkR
    Posted Aug 27, 2006 at 6:05 AM | Permalink

    Re#10 The graphic you show still appears to have observed temperatures starting higher than model temperatures by about 0.1C, enough to bring the smoothed line of the observable temperature to below the zero emissions plot.

  12. per
    Posted Aug 27, 2006 at 7:03 AM | Permalink

    if the prediction was published in 1988, it is worth noting that the temperatures for the three scenarios and the giss data seem to be quite close in 1988.

    I have to say I am not yet convinced by this prediction. From 1988 to 2007 it predicts a rise of ~0.3C, but from 1988 to 2019 it will be ~0.6C.

    If you put a ruler through the same temperature data you would get the same answer; I think you need quite robust testing to have confidence in a model.

    By the way, if the modelling software was so predictive in 1988, why have we changed it? Surely future iterations of the modelling software will give even better predictions?

    yours
    per

  13. Dan Hughes
    Posted Aug 27, 2006 at 7:15 AM | Permalink

    The observed and calculated initial values will be the same only if the calculated value equals the observed value. Clearly this is not the case. Additionally, the calculated initial values for the Global Average Temperature (GAT) anomalies could be determined in at least two different ways. The first would be the difference between the calculated ensemble-average (GAT)ea at the end of the control runs and the individual (GAT)cj values; these would all be different. The second would be the difference between the individual model/code (GAT)cj values at the end of the control runs and its (GAT)cj; and so would be 0.0.

    One of many questions is should the initial values on the graph be shifted so as to correspond? If it is shifted then the graph does not represent the actual results of the calculations. And if the calculations had used the value corresponding to the shifted value the calculated response would have been different.

    Whatever the case, it seems to me that only the very, very rough trends of the calculations are the same as the observations; a very long-term increase. And even that doesn’t look so hot to me. The variability shown in the observations is not at all captured, even over periods of a decade. What then are the effects of this clearly evident lack of agreement on the calculated responses of local environmental, social, and health issues such as biomass, long-range weather, ice, ocean temperatures, etc.

    Finally, someone has noted that some (GAT) calculations are done with the sea surface temperatures fixed. Given the recent observations on the changes in the energy content of the oceans, this does not seem to be a good assumption.

  14. Jeff Weffer
    Posted Aug 27, 2006 at 7:26 AM | Permalink

    The starting point for the data and the models is, of course, very important because the data is plotted in a comparison chart.

    If the model runs start in 1958?, then the observed temperature data trend should start at 0 in 1958 and so should the models.

    However, it is possible that the model runs actually start earlier than 1958 (where the temperature trend started at 0 initially as well) but Hansen did not show the earlier than 1958 simulations and observations.

    I don’t know if anyone can answer this, but why do the HadCRUT3 and GISSTemp datasets differ so much in recent years?

  15. David Smith
    Posted Aug 27, 2006 at 7:49 AM | Permalink

    What would a plot, of A,B,C, HADCRUT3 and GISS, zeroed in 1988 as noted, look like?

    It would be nice to understand why the two sets of observations (GISS and HADCRUT3) differ.

  16. Dan Hughes
    Posted Aug 27, 2006 at 7:56 AM | Permalink

    Willis E, a fine piece of Validation work. Thanks for your efforts.

  17. Posted Aug 27, 2006 at 8:31 AM | Permalink

    HADCRUT3 measures the anomaly from the 1961-1990 mean, while Hansen’s scenarios used the anomaly from the 1951-1980 mean. The 1961-1990 mean is higher than the 1951-1980 mean. Furthermore, Willis has not plotted HADCRUT3 accurately – all his numbers are too low by about 0.05 degrees.

    When you compare apples with apples and not Willis’s oranges, Hansen’s model has been eerily correct.
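
The baseline point raised in this comment can be made concrete. An anomaly series is re-expressed on a different base period by subtracting its own mean over that period. A minimal sketch, assuming annual values; `rebaseline` is a hypothetical helper, and the series is a synthetic steady warming invented for illustration, not HadCRUT3 or GISS data:

```python
def rebaseline(years, anomalies, base_start, base_end):
    """Re-express anomalies relative to their own mean over [base_start, base_end]."""
    base = [a for y, a in zip(years, anomalies) if base_start <= y <= base_end]
    if not base:
        raise ValueError("no data inside the requested base period")
    base_mean = sum(base) / len(base)
    return [a - base_mean for a in anomalies]


# Synthetic series: a steady 0.01 deg C/yr rise over 1951-1990, constructed
# so that its mean over 1961-1990 is zero (i.e. it is on a 1961-1990 baseline).
years = list(range(1951, 1991))
anoms_6190 = [0.01 * (y - 1975.5) for y in years]

# The same series expressed relative to 1951-1980.
anoms_5180 = rebaseline(years, anoms_6190, 1951, 1980)

# Every value moves by the same constant (about +0.1 deg C for this synthetic series).
shift = anoms_5180[0] - anoms_6190[0]
```

In a warming series the later base period has the higher mean, so the same observations read lower on a 1961-1990 baseline than on a 1951-1980 one. The size of that shift for the real datasets is not computed here.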

  18. Tim Ball
    Posted Aug 27, 2006 at 9:58 AM | Permalink

    Thank you Willis;
    The answer to #15 is we need Jones to disclose his method but he has consistently refused to do so.
    This is only the start of the problems with the models. A few years ago I noted the discrepancy between the GISS and HADCRUT3 models was 0.5°C in one year and that this was equivalent to Jones’ estimate for increase of the GAT in approximately 130 years. Nobody was interested, the models were in full flight scaring the world and driving government policy. The data problem is only the first of many severe limitations of the models, but probably the most significant because they are the basis of the entire cubic construct. Here you are only talking about the surface data (actually Stevenson Screen measures are not the surface and in many instances above the critical boundary layer), consider the problems with anything above the surface where there is virtually no data.
    Consider this quote from an article in Science Vol.313 (4 August 2006) “Waiting for the Monsoons” about an attempt by the models to forecast precipitation for Africa, “One obvious problem is a lack of data. Africa’s network of 1152 weather watch stations, which provide real-time data and supply international climate archives, is just one-eighth the minimum density recommended by the World Meteorological Organization (WMO). Furthermore, the stations that do exist often fail to report.” As I keep repeating, probably ad nauseam for some, but I won’t stop until the severe problems are exposed, Canada has fewer weather stations now than in 1960 and many of those retained were equipped with unreliable Automatic Weather Observing Stations (AWOS). Warwick Hughes has documented the data problem more completely than anyone to my knowledge. I suspect you can add most of Asia, South America and Antarctica to this list not to mention the oceans. Indeed, I would like to know what percentage of the globe meets the WMO minimum density requirement. Maybe now thanks to your work more people will begin to examine the complete inadequacy of the models because of the data base on which they are constructed and then move on to look at the unjustified assumptions, grossly inadequate mechanisms incorporated and manipulations apparently more designed to achieve a predetermined outcome.
    Before anyone starts bleating about the value of models let me say they have a place in the laboratory where there is a scientific responsibility. I would argue that what Willis is exposing appears to suggest this is not being met. The problem becomes worse when you go public and let the policymakers believe the models work, have credibility, and can be the basis of global policy. And again before anyone starts bleating about the warnings put on the model output let me say that most of the public and the politicians have no understanding of the process or the warnings, especially when they are accompanied by ‘end of the world’ messages.

  19. David Smith
    Posted Aug 27, 2006 at 10:34 AM | Permalink

    An interesting chart would be to average each data set over, say, five-year periods, and, using 1959-63 as the zero point, see what a combined plot looks like.

    I would do this but I cannot read the data points on the graphs very well.

    The data groups would be updated GISS, HADCRUT3, scenarioA, scenarioB and scenarioC.

    I tried this for 1985-89 and 2000-2004 and got actuals running a bit below scenarioC (the no-CO2-growth scenario). But, again it is hard to read the points.
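
The five-year averaging proposed in this comment is straightforward once the values are available as numbers. A minimal sketch, assuming annual anomalies keyed by year; the helper names are hypothetical and the series is a synthetic 0.02 deg C/yr ramp invented for illustration, not any of the five data groups listed above:

```python
def period_mean(years, values, start, end):
    """Mean of the values whose year falls in [start, end] inclusive."""
    picked = [v for y, v in zip(years, values) if start <= y <= end]
    if not picked:
        raise ValueError("no data in the requested period")
    return sum(picked) / len(picked)


def change_since_reference(years, values, period, reference=(1959, 1963)):
    """Mean over `period` minus the mean over the reference period."""
    return period_mean(years, values, *period) - period_mean(years, values, *reference)


# Synthetic annual anomalies, invented for illustration only.
years = list(range(1958, 2006))
values = [0.02 * (y - 1958) for y in years]

# Change from the 1959-63 zero point to the 2000-2004 average.
delta = change_since_reference(years, values, (2000, 2004))
```

Running the same function over each data group with the same reference period would put all of them on a common zero point, which is what the combined plot requires.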

  20. Mark H.
    Posted Aug 27, 2006 at 12:21 PM | Permalink

    Obviously I am not getting it – the charted trendlines all seem to start around 1958, and seem pretty close – I don’t see the selective bias.

    In any case, of more importance (to me) are the trend lines. It would seem the current models track one data set somewhat OK (GISS, “B”) and, presumably, their range of assumptions could be tweaked to follow HADCRUT3 as well. The questions that remains: 1) What is the “real” data set and 2) Are the slopes of increase between the two sets the same?

    These two data sets seem to be the range of reasonable prediction?

  21. per
    Posted Aug 27, 2006 at 12:54 PM | Permalink

    I have plotted out giss, hadcrut and crutem, and the difference between giss and the two hadley datasets. There is a bigger difference opening up between giss and hadcrut, presumably because giss and crutem are just over land, whereas hadcrut includes sea temperatures.
    I have posted the graphic on to steve if he wants to incorporate it.
    yours
    per

  22. David Smith
    Posted Aug 27, 2006 at 1:13 PM | Permalink

    Re #20 To me, the chart presented by Hansen, taken at a glance, indicates that temperature rises are running at about a scenario B rate, indicating the models have it “about right”.
    If the base is adjusted such that all data starts at the same point that chart, at a glance, would show GISS running below B and show HADCRUT3 running well below B. The visual impact is different.
    As a side note, I’m surprised that scenario C (apparently) levels out only five years after CO2 growth ends. I thought there is a longer lag, due to the oceans.

  23. Willis Eschenbach
    Posted Aug 27, 2006 at 1:19 PM | Permalink

    Re # 17, Tim, thanks for your contribution. You say:

    HADCRUT3 measures the anomaly from the 1961-1990 mean, while Hansen’s scenarios used the anomaly from the 1951-1980 mean. The 1961-1990 mean is higher than the 1951-1980 mean. Furthermore, Willis has not plotted HADCRUT3 accurately – all his numbers are too low by about 0.05 degrees.

    When you compare apples with apples and not Willis’s oranges, Hansen’s model has been eerily correct.

    As I stated very clearly, I have not plotted the HadCRUT3 data inaccurately. Instead, I have started the observational data at the same place the scenarios start. You could say this puts the HadCRUT data too low … or you could say that puts the scenarios too low … but if you want to compare them, you have to start everyone off at the same place.

    Otherwise, you could just pick your offset to suit your purposes, as Hansen has done. To compare apples with apples means that we start everyone out at the same temperature, and see where the observations and the scenarios go from there. That is what I have done.

    Remember, these are anomalies, not actual temperatures. The starting point is arbitrary. We are interested in comparing the trends, not the absolute values, and to do that, we need to start all of them from the same point.

    This is implicitly shown by the fact that all three scenarios start from the same identical temperature. If your analysis were correct, Tim, they would all start from different temperatures. Hansen is the one comparing apples and oranges, by starting off from a much warmer temperature than the scenarios.

    w.
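
The statement that the starting point is arbitrary as far as trends are concerned can be checked numerically: adding a constant to every value leaves a least-squares slope unchanged. A minimal sketch; `ols_slope` is a hypothetical helper and the anomalies are invented for illustration:

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical anomalies (deg C), invented for illustration only.
years = [1958, 1959, 1960, 1961, 1962]
anoms = [0.08, 0.06, 0.01, 0.08, 0.04]

# Re-zero the series so that its first value is 0.
shifted = [a - anoms[0] for a in anoms]

slope_original = ols_slope(years, anoms)
slope_shifted = ols_slope(years, shifted)  # same slope as slope_original
```

The offset does, however, change the gap between the observations and a scenario in any given year, which is the quantity the figures display.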

  24. Willis Eschenbach
    Posted Aug 27, 2006 at 2:12 PM | Permalink

    Re 10, Peter, you raise an interesting question when you present Hansen’s updated graph, and ask why not use that one?

    Three reasons:

    1) That updated graph repeats the earlier error of starting too high. This puts it way out of position.

    2) The GISS dataset contains an anomalously high value for 2005. This was because they (coincidentally?) changed their dataset at the end of 2005.

    Jay Lawrimore, chief of NOAA’s climate monitoring branch, believes 2005 will be very close to 1998, the warmest year on record for the nation.

    “In fact it’s likely to only be second warmest according to the data set we are currently using as our operational version,” he told National Geographic. “(But) an improved data set for global analyses currently undergoing final evaluation will likely show 2005 slightly warmer than 1998.

    Me, I’m suspicious of “improved” datasets that (coincidentally?) say that things are even warmer than we thought … especially when they say:

    Our analysis differs from others by including estimated temperatures up to 1200 km from the nearest measurement station.

    Right … I take the temperature in London … and from that you can estimate the temperature in Italy … right …

    The third reason is that the model which Hansen is using is tuned to the GISS dataset. Therefore, we should not be surprised that it reproduces the data up until 1988, as it is tuned to do so. But that close correspondence proves nothing. I wanted to see how it would compare to the other major temperature dataset, HadCRUT3.

    That’s the three reasons I don’t use Hansen’s graph …

    w.

    PS – A final reason … the GISS temperature dataset is designed, analysed, evaluated, and maintained by James Hansen. Using that temperature dataset to verify Hansen’s claims about his models leads to the obvious inferences of data manipulation, particularly given their drive to have 2005 be the “warmest ever”. (Hansen had predicted in February 2005 that 2005 would be the warmest year ever … then, the old analysis showed that it wasn’t the warmest ever … and they changed to the new analysis which showed it was the warmest ever. Coincidence? Perhaps.)

    While these inferences may not be true, I thought it best to use another dataset for the analysis, to avoid such problems.

  25. Willis Eschenbach
    Posted Aug 27, 2006 at 2:14 PM | Permalink

    Re #11, per, you say:

    if the prediction was published in 1988, it is worth noting that the temperatures for the three scenarios and the giss data seem to be quite close in 1988.

    No, it means nothing, because the model is tuned to reproduce that data.

    w.

  26. per
    Posted Aug 27, 2006 at 2:14 PM | Permalink

    RE: #22
    i think one point is that there is a different trend on land, versus (land + sea). If you compare the average difference between giss and crutem3 from 61 to 05, it is +0.067+/-0.064, whereas the difference for giss and hadcrut3 is 0.13 +/-0.06.

    it is clear that for global temperature that you want to have sea +land, but since giss is only for land, that is the data set hansen has, and therefore uses.
    per
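
Figures of the form "mean difference +/- spread" quoted in this comment can be produced as below. This is a sketch under assumptions: it uses the sample standard deviation, which may or may not be what per used, and the two short series are invented for illustration rather than taken from GISS, CRUTEM3 or HadCRUT3:

```python
from statistics import mean, stdev


def difference_summary(series_a, series_b):
    """Mean and sample standard deviation of the year-by-year difference a - b."""
    if len(series_a) != len(series_b):
        raise ValueError("series must cover the same years")
    diffs = [a - b for a, b in zip(series_a, series_b)]
    return mean(diffs), stdev(diffs)


# Hypothetical anomalies (deg C) for the same five years, invented for illustration.
dataset_a = [0.10, 0.20, 0.15, 0.30, 0.25]
dataset_b = [0.05, 0.10, 0.10, 0.15, 0.20]

avg_diff, sd_diff = difference_summary(dataset_a, dataset_b)
print(f"{avg_diff:+.3f} +/- {sd_diff:.3f}")
```

Both series must be on the same base period before differencing; otherwise the mean difference mostly reflects the baseline offset rather than any divergence between the datasets.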

  27. per
    Posted Aug 27, 2006 at 2:19 PM | Permalink

    No, it means nothing, because the model is tuned to reproduce that data

    I do not know how Hansen zeroed his predictions, and the giss data set. If there are facts here, I welcome them. What I was suggesting was that Hansen set the zero and the models to be very close in 1988, which is when the prediction was made. I am not suggesting that “means anything”, merely this is what his zeroing convention might be.

    yours
    per

  28. Willis Eschenbach
    Posted Aug 27, 2006 at 2:26 PM | Permalink

    Per, you say (#20)

    There is a bigger difference opening up between giss and hadcrut, presumably because giss and crutem are just over land, whereas hadcrut includes sea temperatures.

    Per, GISS maintains 2 datasets, one of just land, and one of combined land/sea. The land/sea dataset is available at http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt

    w.

  29. Roger Pielke, Jr.
    Posted Aug 27, 2006 at 2:28 PM | Permalink

    All- We discussed this over on our blog a few months ago. Hansen himself has admitted that his 1988 predictions of climate forcings overshot, hence so too did his temperature predictions. This is obvious from Figures 5A and B in this publication by Hansen et al.:

    http://pubs.giss.nasa.gov/docs/1998/1998_Hansen_etal_1.pdf

    Hansen reissued his 1988 predictions in the 1999 paper, because they overshot (otherwise why reissue them!). As well it looks like (to date at least) that his 1999 updated predictions have also overshot in terms of predicted change in aggregate forcings, though it is very early for such judgments.

    When evaluating predictions it is not enough to look only at the predicted temperature change, but also predictions for changes in various components of climate forcings that enter into the prediction. A good prediction will get the right answer and for the right reasons.

  30. Willis Eschenbach
    Posted Aug 27, 2006 at 2:41 PM | Permalink

    OK, a number of people wanted to see what the graph looks like with the GISTEMP data rather than the HadCRUT data … so by popular demand, here it is.

    Note that there are a number of small year-by-year differences between the original GISTEMP dataset and the new analysis GISTEMP dataset. The net result is the same as with the HadCRUT3 dataset, however: the various scenarios are all too high.

    w.

  31. John A
    Posted Aug 27, 2006 at 2:47 PM | Permalink

    Hansen reissued his 1988 predictions in the 1999 paper, because they overshot (otherwise why reissue them!). As well it looks like (to date at least) that his 1999 updated predictions have also overshot in terms of predicted change in aggregate forcings, though it is very early for such judgments.

    Will we have to wait until 2009 before Hansen admits that he’s been flat wrong for all of this time?

  32. Willis Eschenbach
    Posted Aug 27, 2006 at 2:55 PM | Permalink

    Re 28, Roger, always a pleasure to have you contribute on any thread. Let me take this opportunity to recommend your excellent blog to anyone interested in climate science.

    However, you say that “Hansen himself has admitted that his 1988 predictions of climate forcings overshot, hence so did his temperature predictions”, and you say this is shown in the paper you cited. However, in the paper Hansen says:

    The climate index is strongly correlated with global
    surface temperature, which has increased as rapidly as projected
    by climate models in the 1980s.

    So it appears that he agrees that the forcing predictions were too high, but he still says the temperature rose “as rapidly as predicted” … go figure …

    Thanks again,

    w.

  33. Willis Eschenbach
    Posted Aug 27, 2006 at 3:00 PM | Permalink

    Per, thanks for the observation where you say:

    I do not know how Hansen zeroed his predictions, and the giss data set. If there are facts here, I welcome them. What I was suggesting was that Hansen set the zero and the models to be very close in 1988, which is when the prediction was made. I am not suggesting that “means anything”, merely this is what his zeroing convention might be.

    However, examination of the chart shows that the three scenarios are far apart in 1988, and are the same in 1958. This agrees with the reference given by Roger Pielke Jr. above, which shows that Hansen’s forcings start out together in 1958, and diverge from there.

    Thus, we need to start our comparison with observations in 1958.

    w.

  34. per
    Posted Aug 27, 2006 at 3:01 PM | Permalink

    Per, GISS maintains 2 datasets,…

    mea culpa, my ignorance.

    I replotted the data, but zeroing everything at 1988 arbitrarily. If you do this, all three data sets (GISS, Crutem3, Hadcrut3) come out very close in 2005. However, they are then divergent in 1961.

    per

  35. Steve McIntyre
    Posted Aug 27, 2006 at 3:04 PM | Permalink

    Here is

  36. Martin Ringo
    Posted Aug 27, 2006 at 3:08 PM | Permalink

    Let me ask a Cohn and Lins type question (maybe exactly their question; I don’t have the article so I can’t say). Suppose we do a semi-naïve (or semi-sophisticated, depending on your point of view) time-series forecast, with trend estimation, of temperature. Make the forecast coincident with the time of a GCM forecast. (Cohn and Lins, if I recall, include drivers, the forcing variables, in their analysis. What I am suggesting here is more naïve: just do the simple time-series forecast, say an ARMA(3,3), and look at the results in that context.) For example for the 1988 Hansen forecast one would use the 1880-1988 data set and forecast 1989-2005. Put the forecast confidence intervals in and compare to the actual temperature (of the specific series) and the model. Then test if the model better predicts than the time-series forecast. (Somebody wants to tell me where Hansen’s model forecast data is and I will do it.)

    From looking at the pattern of temperature and the forecast confidence interval — roughly +/- 0.3 degrees C — it does not appear that standard forecast evaluation tests on levels would lead one to pick the model over the time series forecast. However, if you include an evaluation of turning points, the results might be more interesting (particularly with the Hadley data that produces a very large AR(2) terms with insignificant AR(1) and AR(3) terms, producing a strong sawtooth pattern to the forecast).
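
The benchmark test described in this comment has a simple skeleton: fit a statistical forecast on data up to the forecast date, then compare its out-of-sample error with the model's. The sketch below is not the ARMA(3,3) suggested above; a straight-line trend stands in for it to keep the example dependency-free, and confidence intervals are omitted. All numbers are invented for illustration and say nothing about how Hansen's scenarios actually score:

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares (intercept, slope) of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical data, invented for illustration only.
train_years = [1980, 1981, 1982, 1983, 1984]
train_obs = [0.10, 0.12, 0.14, 0.16, 0.18]
test_years = [1985, 1986, 1987]
test_obs = [0.21, 0.22, 0.25]
model_forecast = [0.25, 0.30, 0.35]  # stand-in for a GCM scenario

# Fit on the training window only, then extrapolate over the test window.
intercept, slope = fit_line(train_years, train_obs)
trend_forecast = [intercept + slope * y for y in test_years]

trend_rmse = rmse(trend_forecast, test_obs)
model_rmse = rmse(model_forecast, test_obs)
model_beats_benchmark = model_rmse < trend_rmse
```

With real data the benchmark would be fitted on 1880-1988 and scored on 1989-2005, as the comment proposes, and an ARMA fit would replace `fit_line`.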

  37. John Creighton
    Posted Aug 27, 2006 at 3:23 PM | Permalink

    #135 I bet time series work better than computer models because we have a better grasp of what the confidence intervals are for the predictions and we are less likely to make a coding error.

  38. Kenneth Blumenfeld
    Posted Aug 27, 2006 at 3:26 PM | Permalink

    Willis,

    Do you know if the delta-T is with respect to a climate “normals” period (e.g., 1931-1960, 1951-1980 etc.), or wrt a specific year? Since the standard in climatology is to use at least a 30-year averaging period, that might explain why the plot of the annual observations doesn’t start where you think it should. I really do not know, but I do think it would be odd to use an arbitrary year (given interannual variability) as a baseline.

    The validity of this entire discussion rests on whether each plot is supposed to have the same origin. I am not certain that they are. Again, if a climatological baseline is used for comparison, then it would make sense that the first delta-t observation has a nonzero value. And since the model scenarios don’t appear to start at zero (and I can’t squint well enough to see that they even have the same starting point, as you claim), then there may be more going on than you suspect. In which case, you would have to move your “correction” back up to where Hansen put it.

    The answers may be in the 1988 paper. Maybe someone here has read it and knows the baseline?

  39. Kenneth Blumenfeld
    Posted Aug 27, 2006 at 3:34 PM | Permalink

    I have glanced at (and I mean that literally,

  40. Willis Eschenbach
    Posted Aug 27, 2006 at 4:14 PM | Permalink

    Kenneth, you say:

    I really do not know, but I do think it would be odd to use an arbitrary year (given interannual variability) as a baseline.

    Odd or not, Hansen has started his forecast in 1958. All three of the scenarios start in 1958, at exactly the same temperature. If we are to compare these scenarios to observations, what justification is there for using any starting point other than that temperature?

    w.

  41. Kenneth Blumenfeld
    Posted Aug 27, 2006 at 4:36 PM | Permalink

    Willis,

    If the temperature baseline is from a climate normals period, then there is no reason that the delta-T for the first observation should be zero. It would simply be the difference between the observed value and the climate normals value (in the case of 1958, it is positive).

    Your methodology only makes sense if the delta-t is with respect to 1958. Note that I am not saying that it is or is not. All I am saying is before you go changing a graph, you should be absolutely certain that you understand the context of it. I am admitting I do not; do you?
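
    A minimal sketch of the distinction being argued here, using invented temperatures (not Hansen's or the thread's data) and a hypothetical helper named anomalies: the same series gives a nonzero 1958 anomaly against a multi-year normals mean, and exactly zero against 1958 itself.

```python
# Hypothetical annual temperatures (deg C); values are invented for illustration.
temps = {1951: 14.00, 1952: 14.10, 1953: 13.90, 1958: 14.20, 1959: 14.10, 1960: 14.05}

def anomalies(series, base_years):
    """Return each year's departure from the mean over base_years."""
    base = sum(series[y] for y in base_years) / len(base_years)
    return {year: round(value - base, 3) for year, value in series.items()}

# Baseline = mean of a multi-year "normals" period
# (1951-1953 stands in for a full 30-year period such as 1951-1980).
vs_normals = anomalies(temps, [1951, 1952, 1953])

# Baseline = the single start year, which forces the 1958 anomaly to zero.
vs_start = anomalies(temps, [1958])

print(vs_normals[1958])  # 0.2
print(vs_start[1958])    # 0.0
```

    The two versions differ only by a constant offset, so the choice of baseline moves a curve up or down without changing its shape or trend.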

  42. Kenneth Blumenfeld
    Posted Aug 27, 2006 at 4:43 PM | Permalink

    BTW, apologies for (current) #38. I was saying that a glance at the 1998 paper (linked by RP Jr) indicates they used the 1951-1980 normals period for their index values. But it really only was a glance, and I can not commit to giving it more than that right now.

  43. Tim Ball
    Posted Aug 27, 2006 at 5:08 PM | Permalink

    #22 Willis
    You quote me but it is not anything I said in #17.

  44. David Smith
    Posted Aug 27, 2006 at 5:21 PM | Permalink

    Perhaps, in the end, what counts are the slopes. What I’ve eyeballed, from the graph start to the final five years, are the following delta Ts:

    Scenario A: +0.92
    Scenario B: +0.62
    Scenario C: +0.55
    adjusted GISS: +0.55 (using the starting point chosen by Hansen)
    HADCRUT3: +0.42

    So, GISS and HADCRUT3 are running at or below scenario C (the one with the end of CO2 growth as of 2000).

    Besides all of that, I have wondered how the models were able to predict a multi-year dip in temperatures around 1964 which came true. That is remarkable. Perhaps it is just chance.

    I am also surprised that temperatures seem to level off so soon after 2000, in scenario C. I expected a longer lag.

    David

  45. Willis Eschenbach
    Posted Aug 27, 2006 at 5:57 PM | Permalink

    I got to thinking about the Hansen scenarios, and I thought I’d compare their statistics to the actual observations.

    One of the comparisons we can make regards the year-to-year change in temperature. The climate models are supposed to reproduce the basic climate metrics, and the average year-to-year change in temperature is an important one. Here are the results:

    These are “boxplots”. The dark horizontal line is the median, the notches represent the 95% confidence intervals on the medians, the box heights show the interquartile ranges, the whiskers show the data extent, and the circles are outliers.

    It is obvious that the “Scenarios” do a very poor job at reproducing the year-to-year changes of the real world. Whether or not they can reproduce the trend (they don’t, but that’s a separate question), they do not reproduce the basic changes of the climate system. They should be disqualified on that basis alone.
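
    For anyone who wants to repeat this kind of comparison, here is a rough sketch of the underlying calculation. It assumes the comparison is of first differences of annual values, uses invented series rather than the real data, and reports only the median and interquartile range (not the notches, whiskers, or outliers of a full boxplot). The helper names are hypothetical.

```python
import statistics

def yearly_changes(series):
    """First differences: the change from each year to the next."""
    return [b - a for a, b in zip(series, series[1:])]

def spread(changes):
    """Median and interquartile range of a list of changes."""
    q1, q2, q3 = statistics.quantiles(changes, n=4)
    return q2, q3 - q1

# Invented anomaly series (deg C), for illustration only.
observed = [0.00, 0.15, -0.10, 0.20, 0.05, 0.30, 0.10, 0.35]
modelled = [0.00, 0.04, 0.08, 0.10, 0.15, 0.18, 0.24, 0.27]

obs_median, obs_iqr = spread(yearly_changes(observed))
mod_median, mod_iqr = spread(yearly_changes(modelled))

print(f"observed: median {obs_median:+.2f}, IQR {obs_iqr:.2f}")
print(f"modelled: median {mod_median:+.2f}, IQR {mod_iqr:.2f}")
```

    Both invented series rise by a similar amount overall; what differs is the size of the year-to-year steps, which is the property under discussion.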

    w.

  46. Willis Eschenbach
    Posted Aug 27, 2006 at 6:05 PM | Permalink

    Re 42, Tim Ball, I think the numbers have been messed up by retro-spanking … the post was actually from Tim Lambert. I’ll always use last names from now on.

    w.

  47. Willis Eschenbach
    Posted Aug 27, 2006 at 7:06 PM | Permalink

    Re 43, David Smith, I appreciate your contributions. You ask:

    Besides all of that, I have wondered how the models were able to predict a multi-year dip in temperatures around 1964 which came true. That is remarkable. Perhaps it is just chance.

    The answer, of course, is that up until 1988 the models were tuned to the reality. It is only for the post-1988 period that they are actually forecasting rather than hindcasting.

    I just noted another oddity about the scenarios versus the data. The lag-1 autocorrelation of the scenarios is incredibly high:

    GISTEMP        0.79
    HadCRUT3       0.82

    Scenario A     0.98
    Scenario B     0.96
    Scenario C     0.94
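
    As a sketch of how such a figure can be computed (the thread does not say which estimator or what detrending was used, so the formula below is an assumption, and the two series are invented):

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation: lagged covariance divided by total variance."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

# Invented series: a steady ramp, and a similar rise with year-to-year noise.
smooth = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
jagged = [0.0, 0.3, -0.1, 0.4, 0.0, 0.5, 0.1, 0.6]

print(f"smooth: {lag1_autocorrelation(smooth):.2f}")
print(f"jagged: {lag1_autocorrelation(jagged):.2f}")
```

    A series that changes smoothly from one year to the next scores much higher than one with large interannual swings, even when the overall rise is similar.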

    I remind everyone that the modeler’s claim is that they don’t need to reproduce the actual decadal swings, that it is enough to reproduce the statistical parameters of the climate system. To quote again from Alan Thorpe:

    However the key is that climate predictions only require the average and statistics of the weather states to be described correctly and not their particular sequencing.

    In this case, they’ve done a very poor job at describing the average and statistics …

    w.

  48. Posted Aug 27, 2006 at 8:44 PM | Permalink

    Kenneth, they did use 1951-1980 as a base line for comparisons. Willis’s use of 1958 instead is erroneous, and the reason why he gets the result he does. If you do the comparisons the normal way, scenarios B and C are very close to what happened, whether you use GISS or HADCRUT3.

  49. Michael Jankowski
    Posted Aug 28, 2006 at 7:48 AM | Permalink

    Re#10-Here’s one I found some time ago – C or B I’d say.

    and Re#21 –

    Re #20 To me, the chart presented by Hansen, taken at a glance, indicates that temperature rises are running at about a scenario B rate, indicating the models have it “about right”.

    So you are picking the “correct” model scenario(s) based on the results? That’s inappropriate. The “correct” scenario is the one which meets the scenario assumption criteria, not the one that most closely matches the observed results. One has to take the scenario A, B, or C which has occurred and compare those model results with the observed temperature.

    So the question is…which greenhouse growth scenario has most closely occurred since 1988 – A, B, or C? That’s the ONLY scenario that should be compared to the observations.
    RE#14-

    If the model runs start in 1958?, then the observed temperature data trend should start at 0 in 1958 and so should the models.

    However, it is possible that the model runs actually start earlier than 1958 (where the temperature trend started at 0 initially as well) but Hansen did not show the earlier than 1958 simulations and observations.

    I’m curious how the pre-1988 results look.

    Other posters here have argued for zeroing the observed values to match the model predictions of 1988, and I think that’s valid – especially since the y-axis is not T but delta T.

  50. Willis Eschenbach
    Posted Aug 28, 2006 at 3:07 PM | Permalink

    Re #47, Michael, you ask an interesting question when you say:

    So the question is…which greenhouse growth scenario has most closely occurred since 1988 – A, B, or C? That’s the ONLY scenario that should be compared to the observations.

    Unfortunately, there is no easy answer to your question. Hansen’s scenarios assumed the following:

    Scenario A

    CH4 0.5% annual emissions increase
    N2O 0.25% annual emissions increase
    CO2 3% annual emissions increase in developing countries, 1% in developed

    Scenario B

    CH4 0.25% annual emissions increase
    N2O 0.25% annual emissions increase
    CO2 2% annual emissions increase in developing countries, 0% in developed

    Scenario C

    CH4 0.0% annual emissions increase
    N2O 0.25% annual emissions increase
    CO2 1.6 ppm annual emissions increase

    and the results were

    Actual

    CH4 0.5% annual emissions increase
    N2O 0.9% annual emissions increase
    Developed countries CO2 emissions increase 1988-1998 -0.2%
    Developing countries CO2 emissions increase 1988-1998 4.4%
    CO2 atmospheric increase PPM (Mauna Loa 1998-2003) 1.6 ppmv

    In the event, Scenario A was closest for methane, C was closest for CO2 (up until 2000, when it leveled off), and B wasn’t closest for anything …

    However, life is never that simple. The problem is that we are mixing apples and oranges here, because we are looking at both emissions (methane, N2O, and CO2 in Scenarios A and B) and atmospheric levels (CO2 in Scenario C).

    As one example of the problem that causes, Scenario A had the highest estimate (0.5%) for the annual increase in methane emissions. This was also the best estimate.

    Now, methane emissions have been increasing radically, going from about 0.1% growth during the 1990s to over 1.0% from 2000-2005. Atmospheric methane growth, on the other hand, has been dropping steadily, from about 0.5% growth per year in the early ’90s, to about zero now. The concentration of atmospheric methane is currently about stable, neither increasing nor decreasing.

    So … which of the three scenarios best captures that change in methane? We can’t tell, because we don’t know how the model responded to the assumed growth in methane. I doubt greatly, however, that the drop in atmospheric methane to the current steady state was forecast by any of the scenarios …

    And the simple answer to your question about which scenario was closest to reality? … We don’t know.

    w.

  51. jae
    Posted Aug 28, 2006 at 4:13 PM | Permalink

    Bender and others ragged me so much about overfitting that I think I finally understand it. And it looks to me like these models are just a very complex type of overfitting. If you have to change a model every few years to reflect reality, then it seems to me that you definitely are engaging in overfitting. If the model stands the test of time, then maybe you are on to something.

  52. jae
    Posted Aug 28, 2006 at 4:19 PM | Permalink

    50. IOW, some of the scenarios almost predicted the right temperature with the WRONG DATA. LOL.

  53. Gerhard W.
    Posted Aug 28, 2006 at 4:29 PM | Permalink

    One question:
    How do the contributions of Thomas R. Karl and Patrick J. Michaels from the 2002 U.S. National Climate Change Assessment: “Do the Climate Models Project a Useful Picture of Regional Climate?” fit into this picture? The link to the hearing was recently posted here, but there were no comments…
    ~ghw

  54. Willis Eschenbach
    Posted Aug 28, 2006 at 4:52 PM | Permalink

    Re 49, Michael, you say:

    Other posters here have argued for zeroing the observed values to match the model predictions of 1988, and I think that’s valid – especially since the y-axis is not T but delta T.

    We can do that, this is a full-service thread … here are the results:

    For short-term estimates, we have a couple of comparisons that we can make. The first is just to compare the scenarios to the observations. As you can see, the observations have been running cooler than all of the scenarios for almost the whole period.

    (Curiously, Hansen said that the three scenarios were picked to “illustrate the broad range of possibilities”, but post 1988, the B and C scenarios are very similar … but I digress …)

    The other comparison we can make is to a straight linear extension of the trend of the previous decade. If the models are any good, they should perform with more skill than a straight extension of the trend.

    However, they do not perform with more skill. The results are:

    Scenario A     r^2 = 0.58
    Scenario B     r^2 = 0.34
    Scenario C     r^2 = 0.54
    Trend Only    r^2 = 0.60

    In other words, the straight line trend is better correlated with the outcome than any of the scenarios. Not exactly a resounding vote of confidence for the model and the scenarios thereof …
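
    A hedged sketch of the two pieces involved, assuming r^2 here means the squared Pearson correlation between a forecast and the later observations, and that "trend only" means an ordinary least-squares line fitted to the earlier period and extended forward. The data are invented and the function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(a, b):
    """Squared Pearson correlation between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov * cov / (var_a * var_b)

# Invented data: fit a trend to an earlier period, extend it, compare with later values.
past_years = [0, 1, 2, 3, 4]
past_obs = [0.00, 0.03, 0.01, 0.06, 0.05]
future_years = [5, 6, 7, 8, 9]
future_obs = [0.09, 0.06, 0.12, 0.10, 0.15]

slope, intercept = linear_fit(past_years, past_obs)
trend_forecast = [intercept + slope * y for y in future_years]
r2 = r_squared(trend_forecast, future_obs)
print(f"trend-only r^2: {r2:.2f}")
```

    One property worth keeping in mind: r^2 is unchanged if the forecast is shifted by a constant or rescaled, so it measures how well the wiggles line up rather than how close the forecast is in degrees.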

    w.

  55. Posted Aug 28, 2006 at 5:48 PM | Permalink

    #54 Willis, the r^2 is a bit unfair. B gets low r^2 because it’s in antiphase with observations, but it is still the closest to obs. The problem is that observations include stuff such as the Pinatubo eruption and the anomalous 1998 El Niño that cannot be predicted by the models (maybe eventually El Niño, but not volcanic eruptions). Wouldn’t a better comparison use the linear trends from the 3 scenarios r-squared with observations? What is typically used as a measure of model skill?

    #51 you make an interesting point. If the models are changed all the time, then we can never really know if they are “better” than the previous versions. The skill of the model is in predicting the unknown, not reproducing the known. One could then argue that we never know if we are “improving” the models or not. That’s a kind of fundamental philosophical argument against models, similar to what Hank Tennekes did, following Karl Popper.

    Does anyone know how many parameters are needed to “tune” the model? An ideal model would have no free parameter. Can we evaluate the skill of a model by how few free parameters it has?

  56. Bob K
    Posted Aug 28, 2006 at 6:58 PM | Permalink

    The model being sponsored by the BBC (ClimatePrediction.net) has 34 adjustable parameters (3 values each). Not sure how comparable it is to other models. That sure seems to be an amazing number of places to tune the model.

    Here’s a link to their parameters. http://www.climateprediction.net/science/parameters.php

  57. Willis Eschenbach
    Posted Aug 28, 2006 at 7:17 PM | Permalink

    RE 55, you ask, “how many parameters are needed to “tune” the model? An ideal model would have no free parameter.”

    I am reminded of Freeman Dyson’s story of taking his results to Enrico Fermi for evaluation:

    . . .[Fermi] delivered his verdict in a quiet, even voice: “There are two ways of doing calculations in theoretical physics”, he said. “One way, and this is the way I prefer, is to have a clear physical picture of the process that you are calculating. The other way is to have a precise and self-consistent mathematical formalism. You have neither.”

    . . .”To reach your calculated results, you had to introduce arbitrary cut-off procedures that are not based either on solid physics or on solid mathematics.” In desperation, I asked Fermi whether he was not impressed by the agreement between our calculated numbers and his measured numbers. “How many arbitrary parameters did you use for your calculations?” I thought for a moment about our cut-off procedures and said “Four.” He said “I remember my friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” With that, the conversation was over.

    So yes, the models can reproduce the past, and the elephant can wiggle his trunk …

    w.

  58. Jeff Weffer
    Posted Aug 28, 2006 at 7:44 PM | Permalink

    The models just predict a 0.1 to 0.4 C increase per decade. They are just computer models that spit out a particular increasing trend-line. The fact that observed temperatures happen to increase at something like the same rate is a spurious relationship.

    The models do not have the right coefficients for GHG effects or for their secondary effects. The models do not explain the past history of the climate (including ice ages and the climate of the distant past.)

    Other factors that affect the climate are more important (or at least not sufficiently taken into account) in these models. It is a fluke that we are even examining predictions versus actuals.

  59. Willis Eschenbach
    Posted Aug 28, 2006 at 8:20 PM | Permalink

    Re 56, Francois, thanks for your questions. You say:

    Willis, the r^2 is a bit unfair. B gets low r^2 because it’s in antiphase with observations, but it is still the closest to obs. The problem is that observations include stuff such as the Pinatubo eruption and the anomalous 1998 El Niño that cannot be predicted by the models (maybe eventually El Niño, but not volcanic eruptions). Wouldn’t a better comparison use the linear trends from the 3 scenarios r-squared with observations? What is typically used as a measure of model skill?

    Actually, the scenarios include random volcanic eruptions, but you are right regarding the general effect of unpredictable future events. As a full-service thread, I will show the results, but I don’t put too much stock in them. Why? Because:

    1) the models are supposed to do better than just a straight-line trend … otherwise, why have models? and

    2) it’s quite possible, especially when we reduce a complex situation (climate changes) to one single number (trend per decade), to get the right answer for the wrong reasons. I have already shown that the models are far from lifelike, in that their year-to-year changes in temperature are much smaller than reality. They are very poor at modeling the statistics of the climate.

    Given all that, here’s the trends and the RMS error:

    The RMS errors are:

    Scenario A RMS = 1.05°C
    Scenario B RMS = 0.44°C
    Scenario C RMS = 0.56°C
    Extended 1979-1988 trend: RMS = 0.48°C
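
    For reference, a minimal sketch of an RMS error calculation, assuming it means the root-mean-square difference between forecast and observed values over the same years (the thread does not spell out exactly what was differenced). The numbers are invented.

```python
import math

def rms_error(forecast, observed):
    """Root-mean-square difference between two equal-length series."""
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Invented values: a forecast that runs a constant 0.3 deg C above the observations.
forecast = [0.5, 0.6, 0.7]
observed = [0.2, 0.3, 0.4]

print(f"RMS error: {rms_error(forecast, observed):.2f} deg C")
```

    Unlike r^2, this measure does penalize a constant offset, so it depends directly on how the series were aligned at the start.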

    Note that despite the claim by Hansen that the scenarios should give the range of possibilities, the trend line of the observations is below the trend line of all of the scenarios.

    w.

  60. Pat Frank
    Posted Aug 28, 2006 at 8:50 PM | Permalink

    #56 — Bob, do they anywhere give the uncertainties in these parameters?

  61. Steve McIntyre
    Posted Aug 28, 2006 at 9:59 PM | Permalink

    Willis, can you post up some of the digital series for these models so that people can do simple time series work on them. Also what the url is for the source of the digital version. Thx

  62. hswiseman
    Posted Aug 28, 2006 at 10:36 PM | Permalink

    This entire analysis and the analysis of the analysis (and indeed the entire debate) relies on temperature data that is highly suspect. Just because the temperature data is “all we have” doesn’t make it particularly useful. Any number of signals could be buried in the assumptions, discontinuities and their smooth-outs. The failure to zero in Hansen’s graph is probably not malicious; more likely a “close enough for government work” mentality. The comments on weather station scarcity are most telling.

    I guess it is just more fun to drill holes in the ice or trees and play with satellites than undertake actual observations of the data. Instead of trillions in CO2 control, how about some nice instruments all over the place? Gather data for a statistically significant period, plot it against other real data from other real instruments and see what we have. Even in the worst case scenario, if it takes 15 years to get valid data, you only have a 1C temp. increase. This might keep the polar bears out of Churchill, and chop a day or two off the ski season at Telluride. We need about a billion dollars a year to maintain, collect, and analyze. You might even be able to tell how far apart you can maintain your sensors before they lose validity. “Oh, that is too much like real science. Too much like work.” The fact that no one with any power is screaming for more observations tells you everything you need to know about the debate.

  63. Willis Eschenbach
    Posted Aug 28, 2006 at 11:00 PM | Permalink

    Re 61, Steve M., here’s the data. There’s no URL (these guys rarely publish their results); it is digitized from the Hansen graph. I checked it by using Excel to generate graphs of the digitized data, then overlaying those graphs on the original Hansen graph to make sure there were no errors. I’ve made the data comma-delimited, to make it easy to import into Excel or R. The GISTEMP and HadCRUT3 data have been aligned at the start with the three scenarios.

    Year,GISTEMP,HadCRUT3,Scenario A,Scenario B,Scenario C
    1958,-0.030,-0.030,-0.030,-0.030,-0.030
    1959,-0.050,-0.090,-0.005,-0.065,-0.065
    1960,-0.120,-0.140,-0.020,0.075,0.075
    1961,-0.030,-0.040,-0.005,0.030,0.030
    1962,-0.070,-0.040,-0.005,0.010,0.010
    1963,-0.030,-0.020,-0.090,-0.085,-0.085
    1964,-0.320,-0.320,-0.165,-0.165,-0.165
    1965,-0.220,-0.240,-0.110,-0.165,-0.165
    1966,-0.140,-0.170,-0.045,-0.130,-0.130
    1967,-0.110,-0.170,-0.035,-0.095,-0.095
    1968,-0.150,-0.180,0.000,-0.070,-0.070
    1969,-0.030,-0.030,-0.090,-0.105,-0.105
    1970,-0.080,-0.090,0.045,0.040,0.040
    1971,-0.210,-0.210,0.050,-0.035,-0.035
    1972,-0.110,-0.080,0.125,0.060,0.060
    1973,0.030,0.060,0.175,0.200,0.200
    1974,-0.190,-0.230,0.217,0.130,0.130
    1975,-0.160,-0.190,0.075,0.075,0.075
    1976,-0.270,-0.270,0.213,0.100,0.100
    1977,0.020,0.000,0.140,0.120,0.120
    1978,-0.090,-0.080,0.209,0.070,0.070
    1979,-0.020,0.030,0.250,0.110,0.110
    1980,0.070,0.060,0.320,0.150,0.065
    1981,0.160,0.100,0.380,0.140,0.105
    1982,-0.060,-0.010,0.320,0.090,0.085
    1983,0.150,0.160,0.105,-0.055,-0.145
    1984,-0.020,-0.040,0.065,0.045,-0.005
    1985,-0.050,-0.060,0.215,0.175,0.060
    1986,0.020,0.010,0.275,0.270,0.120
    1987,0.160,0.160,0.400,0.280,0.125
    1988,0.200,0.160,0.440,0.345,0.221
    1989,0.080,0.080,0.425,0.390,0.325
    1990,0.270,0.230,0.495,0.435,0.325
    1991,0.240,0.190,0.495,0.400,0.355
    1992,0.020,0.040,0.505,0.455,0.350
    1993,0.030,0.090,0.640,0.440,0.335
    1994,0.130,0.150,0.680,0.445,0.305
    1995,0.270,0.250,0.710,0.405,0.335
    1996,0.190,0.120,0.860,0.275,0.213
    1997,0.290,0.330,0.860,0.305,0.365
    1998,0.460,0.530,0.925,0.395,0.410
    1999,0.210,0.280,0.915,0.525,0.435
    2000,0.220,0.250,0.820,0.560,0.515
    2001,0.370,0.390,0.855,0.645,0.515
    2002,0.450,0.440,0.940,0.610,0.540
    2003,0.440,0.450,0.905,0.675,0.555
    2004,0.380,0.420,0.955,0.725,0.510
    2005,0.520,0.450,1.025,0.695,0.630
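
    The table also reads straight into Python, for those who prefer it to Excel or R. The sketch below embeds only three rows copied from the table to stay short; the variable names are arbitrary, and in practice you would paste the whole table or read it from a file.

```python
import csv
import io

# Three rows copied from the table above; paste the full table for real work.
excerpt = """Year,GISTEMP,HadCRUT3,Scenario A,Scenario B,Scenario C
1958,-0.030,-0.030,-0.030,-0.030,-0.030
1988,0.200,0.160,0.440,0.345,0.221
2005,0.520,0.450,1.025,0.695,0.630
"""

rows = list(csv.DictReader(io.StringIO(excerpt)))
years = [int(r["Year"]) for r in rows]
columns = {name: [float(r[name]) for r in rows] for name in rows[0] if name != "Year"}

# Change from the first to the last row for each series.
for name, values in columns.items():
    print(f"{name}: {values[-1] - values[0]:+.3f}")
```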

    My best to everyone, have fun,

    w.

  64. Posted Aug 29, 2006 at 7:30 AM | Permalink

    Proving a climate model even with a 0.6 Celsius error that accumulated in four decades requires a really skillful scientist. ;-)

    Maybe the explanation is that the actual weather is bribed by the oil industry. :-)

  65. Paul Penrose
    Posted Aug 29, 2006 at 7:44 AM | Permalink

    Remember the old saw “All models are wrong; some models are useful”. But that begs the question “Useful for what?” I think for pure research, to try to ferret out all the complexities of the physics of climate and weather, the GCMs (and the coupled versions) are interesting and probably useful. For the purpose of trying to say something meaningful about the future state of the climate, either locally or globally, they are pretty much useless.

    Just think about the initial conditions problem: these models compute the flow of heat (flux) through the different layers of a gas (or liquid for the oceans) and try to account for the mixing of the different layers, etc. They do this by breaking up the surface of the planet into cells, and the newest ones slice these cells vertically as well. Now before they can start a model run they need to seed all the cells with data, but how do they know what the right values are for any particular starting point in time? There’s a lot of data for each cell, and most of it must be in the form of a vector – for example it’s not good enough to just know how much heat is in a cell, you have to keep track of which direction it is flowing in. As complex as these models are it would seem to me that even small changes in the initial conditions could change the outcome drastically.
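
    The general phenomenon is easy to demonstrate with a toy system. The sketch below uses the logistic map, a textbook chaotic iteration that has nothing to do with any actual GCM, purely to show how a one-in-a-million difference in the starting value can grow; whether and how strongly climate models behave this way is a separate question.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r*x*(1-x), a textbook chaotic map (not a climate model)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_orbit(0.400000, 50)
b = logistic_orbit(0.400001, 50)  # start differs by one part in a million

gap = [abs(x - y) for x, y in zip(a, b)]
print(f"initial gap: {gap[0]:.1e}, largest gap over 50 steps: {max(gap):.2f}")
```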

    The flux adjustment issue is a whole other can of worms. Some modelers believe that the drift they are seeing in the outputs is caused by fundamental errors in the models, with improper cloud modeling at the top of the suspect list. Others think that the model resolution is just too low and the drift is a consequence of subtle data losses that accumulate over simulation time. Newer, faster computers will solve the problem, they think. Either way, this problem introduces unknown, and maybe unknowable, errors into the results.

    I won’t even go into the Linearity Assumption and whether it can even be proven; the known problems with the models already show why the results should not be quoted outside the lab.

    Finally, Jae is right. Comparing these model outputs to observation is a massive exercise in overfitting.

  66. bender
    Posted Aug 29, 2006 at 9:47 AM | Permalink

    Re #65
    This is the precise reason I came to this blog. To hear a discussion – any discussion, pro or con – on this point:

    Comparing these model outputs to observation is a massive exercise in overfitting.

    I am inherently skeptical of the GCMs, but (being unqualified to judge for myself) I am prepared to shift my view based on an objective analysis of model structure, function, & performance. Unfortunately, what I am seeing from the various web-based resources (pro and con) only serves to fuel my skepticism, not quell it. Chief among my concerns is – as alluded to in #60 – the pretense that there is either no uncertainty in model parameters, or that any uncertainty that does exist is not worth studying formally.

    Again, I ask – as with hurricane frequency data, as with the tree-ring proxies – are policy-makers intentionally being fed uncertainty-free pablum? If so, then why? And is this acceptable?

    I look forward to following this thread very closely.

  67. Mark T.
    Posted Aug 29, 2006 at 9:55 AM | Permalink

    Proving a climate model even with a 0.6 Celsius error that accumulated in four decades requires a really skillful scientist.

    Proving even the 0.6 C is problematic when your error is +/- 1.0 C at best.

    Mark

  68. TAC
    Posted Aug 29, 2006 at 10:36 AM | Permalink

    #57 Willis, that is one of the best stories about modeling I’ve seen. Thank you!

  69. bender
    Posted Aug 29, 2006 at 11:14 AM | Permalink

    That is a great quote, and I was going to mention it in the “overfitting” discussion with jae … along with another one by a French mathematician (who escapes my memory): “give me enough parameters and I will give you a universe”. Unfortunately, that’s a paraphrase. I couldn’t track down the original.

  70. Posted Aug 29, 2006 at 11:18 AM | Permalink

    Willis,

    On climateprediction.net, they say :

    By using each model to produce a ‘hindcast’ for 1920-2000, and then comparing the spread of forecasts with observations of what actually happened, we will get an idea of how good our range of models is – do most of them do a good job of replicating what actually happened? This will also let us ‘rank’ models according to how well they do. All the models will also be used to produce a forecast for the future – until 2080. When this experiment finishes, we will have a range of forecasts for 21st century climate.

    I’m wondering how they do that (couldn’t find the details). A sensible method would be to start with the period 1920-1960, and find the “best” parameters. Then test the period 1960-2000 with those same parameters, and evaluate the skill. Now conversely, one should do the same exercise starting with the 1960-2000 period, and then using it to test skill on the 1920-1960 period. Ideally, the set of parameters that best fits one period should be the same that best fits the other, and the predictive skills for those symmetrical experiments should be the same. That would give a lot of confidence in the choice of parameters. But then there is another question: how many different sets of parameters give an acceptable fit? With that many parameters, it is likely that there are a great many combinations of parameter values giving similar results. How do you choose between them?
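
    A bare-bones sketch of that symmetrical test, with heavy assumptions: the "model" is a single slope through the origin, the forcing/response numbers are invented, and fit_slope and rms are hypothetical helpers. Nothing here reflects how climateprediction.net actually ranks its runs.

```python
def fit_slope(xs, ys):
    """Least-squares slope of a line through the origin (a one-parameter 'model')."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def rms(xs, ys, slope):
    """RMS error of the one-parameter model on a given period."""
    return (sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

# Invented forcing/response pairs standing in for the two periods.
early_x, early_y = [1, 2, 3, 4], [0.11, 0.19, 0.32, 0.40]
late_x, late_y = [5, 6, 7, 8], [0.52, 0.58, 0.71, 0.79]

slope_early = fit_slope(early_x, early_y)
slope_late = fit_slope(late_x, late_y)

# Calibrate on one period, score on the other, then swap.
print("fit early, test late:", round(rms(late_x, late_y, slope_early), 3))
print("fit late, test early:", round(rms(early_x, early_y, slope_late), 3))
print("fitted slopes:", round(slope_early, 3), round(slope_late, 3))
```

    If the two fitted parameter values and the two out-of-period errors come out close, that is some evidence the fit is not just an artifact of one period. With dozens of parameters the same check applies, but far more combinations can pass it by accident.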

    Of course, there is also the problem that maybe some unknown parameters are not included in the models. I see that most parameters are related to cloud formation. If cloud formation is, as some believe, related to the amount of cosmic rays, then that is a parameter that is not, to my knowledge, in any of the current GCMs. Thus it is impossible to know if there would be a better fit with its inclusion. Good skill from current models would be pure luck, and predictive skill nonexistent. For those interested, ClimateScience has a good post today on how algae in the Pacific have a significant effect on SST, something that, until recently, was not included in models.

  71. Bob K
    Posted Aug 29, 2006 at 11:28 AM | Permalink

    re #60
    Pat,
    To my knowledge uncertainty in values isn’t quantified by them. They do say they have three values available for use with each of their 34 parameters. I got the impression they have low, medium and high values for each. Sort of like an electric stove. These parameters don’t even cover the external forcings. I don’t know how they handle them.

    Had a discussion in a previous thread with a fellow named Carl that works for CP.net. Found here. http://www.climateaudit.org/?p=635#comments
    Starting around comment #174. He wasn’t at all helpful in answering questions. Although he did post a link to this letter they had in Nature.

    http://www.climateprediction.net/science/pubs/nature_first_results.pdf

    I’m not the sharpest tack in the box, since my formal education ended with HS in the mid 60’s. But I critiqued it and found it to be extremely biased in their selection of model runs to discard. Essentially almost all that showed negative temps in the training period.

  72. bender
    Posted Aug 29, 2006 at 11:52 AM | Permalink

    If you blindly produce a million random models with 34 parameters each, then choose the one that best fits the data, you are going to get an overfit which may perform as well as an insightfully constructed (yet overfit) model. This is, in some ways, the worst kind of overfit because the selection process is unpenalized re: the number of candidates tested. i.e. The larger the number of candidates tested, the higher the likelihood of getting a spurious fit.
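
    A toy illustration of that selection effect, under loud assumptions: the candidates here are just random number series rather than 34-parameter models, the "observations" are pure noise, and r_squared is a hypothetical helper. The point is only that the best of many unskilled candidates can look good on the data used to pick it.

```python
import random

def r_squared(a, b):
    """Squared Pearson correlation between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov * cov / (var_a * var_b)

random.seed(0)  # fixed seed so the run is repeatable
n_fit, n_test, n_candidates = 20, 20, 1000
length = n_fit + n_test

# The "observations" are pure noise, so no candidate has any real skill.
target = [random.gauss(0, 1) for _ in range(length)]
candidates = [[random.gauss(0, 1) for _ in range(length)] for _ in range(n_candidates)]

# Pick the candidate that best matches the first half of the record...
in_sample = [r_squared(c[:n_fit], target[:n_fit]) for c in candidates]
best = max(range(n_candidates), key=in_sample.__getitem__)

# ...then score that same candidate on the half it was not selected on.
out_of_sample = r_squared(candidates[best][n_fit:], target[n_fit:])

print(f"average in-sample r^2:     {sum(in_sample) / n_candidates:.2f}")
print(f"best in-sample r^2:        {in_sample[best]:.2f}")
print(f"same candidate, held out:  {out_of_sample:.2f}")
```

    Raising n_candidates makes the best in-sample score climb, which is the unpenalized search described above.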

    If you are a serious college football fan …

    you will understand immediately the analogous problem of figuring out which teams are the top 8 teams that deserve to go to the BCS. You have hundreds of teams, but very few head-to-head experiments to analyse (especially among top teams, which actively seek to avoid the toughest competitions!), so you are woefully ill-equipped to determine win-probabilities from what is a heavily under-determined data matrix. Enter the “process-models” (QB ratings, special teams performance stats, etc.) to solve the fundamental data inadequacy problem. But these models (there are dozens of them) perform poorly, and worse, are regionally biased. This is so well-recognized that rather than pick one, the BCS method averages them all in a great melting pot of bad math. (Sounding familiar?) And finally, the “computer rankings” are weighted as only a minor component compared to “expert opinion” – where opinion in year t is heavily correlated with opinion in year t-1. (It is an AR(1) process.)

    Climate modeling is about as flaky as the BCS college football performance modeling system. But whereas we are free to debate the BCS, there are many who believe we are unqualified to ask questions of the GCMs.

  73. John Creighton
    Posted Aug 29, 2006 at 12:32 PM | Permalink

    I think the parameters in the model should not be identified by using the computer model to find the best fit, as the computer models are far too complex. Things like cloud feedback can be estimated from satellite and instrumental data. I assume other experiments could be done to describe the various energy exchanges between phases of matter. As for fitting the initial conditions, I would recommend using Bayesian probability. We have some prior idea of the probability of the initial states, p(a), based on principles like maximum entropy. We have a set of measurements b, and we want to find the expectation of a given b, using the joint distribution p(a,b) = p(b|a)p(a).
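
    Spelled out, and assuming a is a continuous quantity with prior density p(a) and measurement likelihood p(b|a), the expectation being described is the posterior mean:

```latex
\[
p(a \mid b) = \frac{p(b \mid a)\,p(a)}{\int p(b \mid a')\,p(a')\,da'},
\qquad
\mathbb{E}[a \mid b] = \int a\,p(a \mid b)\,da
  = \frac{\int a\,p(b \mid a)\,p(a)\,da}{\int p(b \mid a)\,p(a)\,da}.
\]
```

    The denominator is just p(b), the normalizing constant that turns the joint distribution p(a,b) into a conditional one.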

  74. Bob K
    Posted Aug 29, 2006 at 12:37 PM | Permalink

    Right on point Bender.

    In that old thread, I mentioned that if you give me the Daily Racing Form data from last year, I could write a program that would show a profit when run over that data. It wouldn’t have any predictive power tho’.

    Additionally, if there was a program that accurately predicted the outcome of sporting events, it would quickly become useless as the general public started using it, due to the odds being adjusted to compensate for better public knowledge. All gambling programs lose any edge they might have had once they are made public.

  75. bender
    Posted Aug 29, 2006 at 12:50 PM | Permalink

    In financial forecasting, there is always a reckoning when the out-of-sample validation test data come in. A judgement day, if you will. The pain on that day is palpable. In climate modeling science, do you know what they do to avoid that judgement day? They “move on“.

  76. nanny_govt_sucks
    Posted Aug 29, 2006 at 1:26 PM | Permalink

    Timmy-me-boy has some comments over at Deltoid to the effect that a 30-year average should be used instead of lining up the starting points at 1958. To me this doesn’t make sense as you have to run the models first to get their averages. Anyway, wouldn’t the averages be different for the A, B, and C scenarios? Seems like just a lot of confusion thrown in to give the Alarmists something to grasp on to.

    Can you comment Willis? Thanks.

  77. Willis Eschenbach
    Posted Aug 29, 2006 at 1:38 PM | Permalink

    Re # 70, Francois, because the models are tuned (not to mention overfitted) to the past trends, how well they do in replicating the past trends is meaningless.

    However, there are a number of other measures that we can use to determine how well the models are doing, measures to which they are not tuned. See for example my post #45 in this thread, which examined the models’ performance regarding ΔT, the monthly change in temperature. That analysis made it clear that Hansen’s model did not hindcast anywhere near the natural range of temperatures.

    Other valuable measures include the skew, kurtosis, normality, autocorrelation, and second derivatives (ΔΔT) of the temperature datasets. Natural temperatures tend to have negative kurtosis (they are “fat tailed”), for example. Let me go calculate the Hansen data … I’m going out on a limb with this prediction, I haven’t done the calculations before on this … OK, here it is. From the detrended datasets, we find the kurtosis to be:

    GISTEMP , -0.85
    HadCRUT3 , -0.50
    Scenario A , 0.68
    Scenario B , 0.15
    Scenario C , 1.14

    You see what I mean, that although the models are tuned to duplicate the trend, they do not replicate the actual characteristics of the natural world. In this case, unlike natural data, the scenarios all exhibit positive kurtosis.
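
    For readers who want to try this kind of check themselves, here is a minimal sketch of computing kurtosis on a detrended series. It assumes a linear detrend and the population "excess" kurtosis (fourth standardized moment minus 3); the comment does not say which detrending method or kurtosis convention was actually used, and the two series below are synthetic stand-ins, not GISTEMP, HadCRUT3 or the scenario output. Under this convention a Gaussian scores 0 and a flat, uniform-like distribution scores about -1.2.

```python
import numpy as np

def detrend(series):
    """Remove the least-squares linear trend from a 1-D series."""
    y = np.asarray(series, dtype=float)
    t = np.arange(y.size)
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)

def excess_kurtosis(series):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    y = np.asarray(series, dtype=float)
    d = y - y.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    return m4 / m2 ** 2 - 3.0

# Synthetic stand-in data (hypothetical, NOT the real temperature series):
# the same small trend plus two differently shaped noise distributions.
rng = np.random.default_rng(0)
t = np.arange(600)
uniform_like = 0.001 * t + rng.uniform(-0.5, 0.5, t.size)
laplace_like = 0.001 * t + rng.laplace(0.0, 0.3, t.size)

for name, data in [("uniform noise", uniform_like), ("Laplace noise", laplace_like)]:
    print(name, round(excess_kurtosis(detrend(data)), 2))
```

    Two series with the same trend can differ in the sign of their kurtosis, which is why the statistic is independent of how well the trend itself is matched.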

    Re the comment that a “30 year trend” should be used instead of lining up the starting points … why 30 years? Why not 10, or 35? Each one will give you a different answer. It seems to me that the answer we are looking for is “where will we be in X years from date Y?” To find that out, we have to line up starting points at date Y.

    But first, we’d have to have models that I call “lifelike”, that is to say, models whose results stand and move and turn like the real data does, they match observations in kurtosis, and average and standard deviation of ΔT, and a bunch of other measures. Then we need models that don’t have parameters and flux adjustments to keep them in line.

    At that point, we can talk about starting points and thirty year averages. Until then, the models are only valuable in the lab in limited situations.

    w.

  78. Willis Eschenbach
    Posted Aug 29, 2006 at 1:51 PM | Permalink

    Re 73, John Creighton, you have a good idea when you say:

    We have some prior idea of the probability of the initial states p(a) based on principles like maximum entropy. We have a set of measurements b we want to find the expectation of a given the distribution p(a,b)=p(b|a)p(a).

    Unfortunately, our ground based data are abysmal, far too poor to do what you suggest. Africa, for instance, has 1152 weather stations, many of them unreliable. That’s one station for every 26,000 square km if they were evenly spaced, which they’re not. Garbage in, garbage out …
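
    The per-station figure can be checked with one line of arithmetic, assuming a land area for Africa of roughly 30 million square km (that area is an outside figure, not one given in the comment):

```latex
\frac{3.0 \times 10^{7}\ \text{km}^2}{1152\ \text{stations}} \approx 2.6 \times 10^{4}\ \text{km}^2 \text{ per station}
```

    That is a square about 160 km on a side for each station, and only if the stations were evenly spread.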

    w.

  79. Jeff Norman
    Posted Aug 29, 2006 at 2:13 PM | Permalink

    Just an observation. Willis quoted the original article as saying:

    Scenarios B and C also included occasional large volcanic eruptions, while scenario A did not.

    If I recall correctly, since Pinatubo in 1991 there haven’t been any large volcanic eruptions. This suggests again that perhaps the AGW forcing is overstated in the model(s).

  80. John Creighton
    Posted Aug 29, 2006 at 7:08 PM | Permalink

    #78 Whether the combination of instrumental and satellite data currently available is enough to initialize the models to get meaningful runs from a reasonable number of Monte Carlo trials is secondary to the question of whether the proper approach is to try and initialize the models using as much of the current instrumental and satellite measurements as possible.

    Of course given a model with fine resolution (a large number of initial states) it is necessary to use every measurement possible for initialization. Additionally without some prior understanding about the statistical distribution of the initial states (e.g. maximum entropy, seasonal variability, day/night variations, latitude and altitude variations, inland vs coastal variations) it is impossible to initialize the initial states beyond the resolution of the measurement system in any kind of meaningful way.

  81. Pat Frank
    Posted Aug 29, 2006 at 8:48 PM | Permalink

    #65 — another can of worms to add to your list, Paul, is that the data calls in the GCMs, while running a simulation, can act like a periodicity in climate and end up accelerating some momenta artifactually. I remember reading a paper warning about this a while back.

    #71 — Thanks for your thoughts, Bob. I recall Carl from ClimatePrediction. Calling him ‘non-helpful’ is granting him far too much grace. As I recall, Willis critiqued the CP Nature paper here in CA. After that crushing exposé, it was easy to see the paper was worthless. It then became easier to conceptualize something that had been bothering me in a visceral sort of way, namely that Nature isn’t really a science journal. It’s an editorializing magazine specializing in science and its more expert imitators.

  82. Nathan Kurz
    Posted Aug 30, 2006 at 1:38 AM | Permalink

    Thanks for your analysis, Willis. I’m enjoying it. But a question about #77:

    But first, we’d have to have models that I call “lifelike”, that is to say, models whose results stand and move and turn like the real data does, they match observations in kurtosis, and average and standard deviation of ΔT, and a bunch of other measures.

    Could you explain in more detail why it is necessary for the model to be “lifelike”? I’m not sure that it is necessarily an indication of a good model. I’d guess that the statistical characteristics of the historical temperature record are due in large part to the exact sampling method used. Unless the model is artificially simulating the same measurement errors, why should it produce data with the same characteristics?

    Since the purpose of the model is to show the general trend, wouldn’t a smoother, less chaotic, model be more useful? For that matter, why is the year-to-year noise so large for the models? Wouldn’t the predictions be smoothed out over multiple runs of the model? Or is the model only run once? Along those lines, I wonder if any model that too closely matches the historical record should be viewed suspiciously, as the ability to track (what are most likely) random fluctuations seems like it must be evidence of overfitting.

  83. Posted Aug 30, 2006 at 2:12 AM | Permalink

    Very interesting thread — I’ve been studying the global warming issue some time as something of a hobby and am continually struck by the utter lack of appreciation for the essential variability of global climate as best as we can determine it from paleontological records over very long time scales. Ice ages. Drought and desert. Humans not present at all for most of it.

    The data presented above is of course a snapshot of the merest flicker of time. It correlates well with almost any upwardly trended number (with suitable one or two parameter adjustment) — the rate of growth in the GDP, for example. However, correlation is not causality is a standard adage of statistics.

    Correlation can make one think of causality, however. The most plausible explanation I have seen for climate variations over time periods of centuries has to be that given here:

    http://www.john-daly.com/solar/solar.htm

    There is lovely and (I think) compelling correlation between various direct and proxy measures of solar activity stretching back over at least 1000 years and historical records of temperature including the medieval optimum. To me the most interesting thing about it (and the reason that I bring it up now) is that it is pretty much the only single predictor that very clearly reproduces the “bobble” in temperatures that occurred in the 50’s through the 70’s, that provoked doom-and-gloom warnings at that time not of global warming but of the impending Ice Age!

    Figure 3 is a pretty picture of the trend through the late 80’s — I don’t know of anyone that has extended the result out through the present although that would clearly be a useful thing to do. Plots like this, either lagged 3-5 years or otherwise smoothed, exhibit correlations in the 0.9 and up range, including all the significant variations. I have no idea what goes into the multiparameter fits like Hansen’s, but a one parameter fit that beats them hands down over a far longer time scale is obviously a much better candidate for a causal connection.

    Beyond the statistics, I have to say (as a physicist) that I find solar dynamics to be an appealing dynamical mechanism for climate variation as well. To begin with, I’m far from convinced that the physics of the greenhouse effect for CO_2 supports a linear relationship between CO_2 concentration and heat trapping. Is an actual greenhouse with thick glass panes better at trapping radiation than one with thin ones? I think not, and that isn’t even considering the quantum mechanics (where the efficiency of CO_2 as a greenhouse gas actually drops as the mean temperature of the blackbody radiator rises and shifts the curve peak away from the relevant resonances, as I understand it).

    Then there is the monotonic increase of CO_2 over hundreds of years but the clearly visible and significant variations in temperature with no observed fluctuation correlations with this trend. Clearly CO_2 can be no more than a factor in global temperature, and equally clearly solar irradiance variations must be another. Of course that leaves one requiring a two or more parameter fit, and one can either make CO_2 dominant (and explain the fluctuations with solar irradiance) or make solar irradiance dominant (and throw the CO_2 contribution out altogether). Occam would prefer the latter, but the former could be right, if one looks at only 200 years of data.

    If you look all the way back one to two thousand years, however, then things like accurate estimations of global temperature become paramount. CO_2 from human sources (at least — there are many sources of CO_2 and it isn’t clear that the dynamics of CO_2 itself in the ecosystem is well understood or entirely predictable) was clearly negligible over most of this time, yet historically there appear to have been periods when the temperature was as high as it is today if not higher.

    Finally there is the nastiness of politics. I’ve just finished reading the papers that absolutely shred the statistical analysis behind the “approved” view of global warming with its “hockey stick” end and shoddy (and dare I say ignorant) statistical methodology. This isn’t science — this is about money, getting votes, saving rain forests, pursuing secondary agendas. Science is an open process, and one that doesn’t tailor results and ignore (or worse, hide in a mistreatment of a dizzying array of numbers) “inconvenient” truths — like the fact that an unbiased examination of global temperatures over long time scales suggests that our current climate is “nothing special”

  84. Dino
    Posted Aug 30, 2006 at 4:57 AM | Permalink

    Would you care to comment on this?

    http://scienceblogs.com/deltoid/2006/08/climate_fraudit.php

  85. Dino
    Posted Aug 30, 2006 at 5:02 AM | Permalink

    Would you care to comment on this?

    http://tinyurl.com/rrbdt

    Someone seems to think you are doctoring the data.

  86. Dino
    Posted Aug 30, 2006 at 5:10 AM | Permalink

    eh nevermind I see you commented. That’s what I get for posting right after I wake up. :-p

  87. KevinUK
    Posted Aug 30, 2006 at 7:47 AM | Permalink

    #83 RB

    Can I ask how and where you started your research on AGW and what caused you to look into it in the first place? It sounds like you’ve had a similar experience to me. In my case Tim Lambert’s dissing of someone who I respect caused me to wonder whether there was something not quite right and sure enough (as you’ve seen for yourself) it isn’t. Like me, I hope you have lots of friends who are also scientists and engineers. Are any of them also waking up to the poor science that underpins the AGW myth?

    KevinUK

  88. Peter Hearnden
    Posted Aug 30, 2006 at 8:39 AM | Permalink

    Re #83 and Fig 3 of the SWFG article, I wonder what you think of this?

  89. jae
    Posted Aug 30, 2006 at 8:50 AM | Permalink

    Another example of the models not reflecting past reality.

  90. Posted Aug 30, 2006 at 9:18 AM | Permalink

    Solar variation should have an excellent correlation with global temperature except where other forcings rear their ugly head, in particular volcanic eruptions (the effects of which clear in a few years) and perturbation of greenhouse gases and aerosols by human emissions. There is an early anthropogenic hypothesis that the rise of civilization has had an effect over the past 8K years, but that would have created a fairly steady state so that until about 200 years ago solar would still have dominated any changes.

    That being so, the argument that there is an excellent correlation of solar activity vs. global temperature over the past 1000 years is irrelevant. What one has to do is compare solar activity with global temperature over the past 200 years. If you do so you find that solar activity becomes a worse proxy, the closer to the present one comes, and that you cannot explain measurements over the past 50 years or so without either waving your hands or making very, shall we say, contested claims about various mechanisms. This can even be seen in Soon and Baliunas (Ap J 472 (1996) 472), summarized in this graph. Note that the greenhouse gas effect takes off in 1950 and dominates in later years. The paper was severely criticised (well as those things go) in the IPCC TAR for not having enough variability.

    Which brings up the basic criticism of what was done in this post. Since both models and observations have real variability (in the latter case both observational and climate related), one should set baselines by averaging over a fairly long and relatively quiescent period. Moreover it should be the same period so that one compares ducks with ducks and not chickens, similar beasts, different tastes.
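
    A minimal sketch of the common-baseline approach described above, assuming annual values stored as year-to-value dictionaries. The function name `rebaseline`, the three-year window and all the numbers are made up for illustration; a real comparison would use a much longer period.

```python
def rebaseline(series, start, end):
    """Subtract the mean over [start, end] so that period averages to zero."""
    window = [v for y, v in series.items() if start <= y <= end]
    if not window:
        raise ValueError("no data in baseline period")
    base = sum(window) / len(window)
    return {y: v - base for y, v in series.items()}

# Hypothetical annual values (deg C), illustration only
model = {1958: 0.0, 1959: 0.2, 1960: 0.1, 1961: 0.3}
obs = {1958: 0.3, 1959: 0.1, 1960: 0.2, 1961: 0.4}

# Both series are zeroed on the SAME period before being compared
model_b = rebaseline(model, 1958, 1960)
obs_b = rebaseline(obs, 1958, 1960)
print(model_b[1961], obs_b[1961])
```

    Because each series is shifted by a single constant, year-to-year differences within a series are unchanged; only the relative vertical position of the two curves depends on the chosen period.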

    Finally, per has a good point. HadCrut3 includes sea surface temperatures, GISSTEMP does not. The 1988 Hansen graph is for “Annual mean global surface air temperatures from the model compared with the Hansen and Lebedeff GISSTEMP series”. By the way it is constructed, the latter does not cover most of the ocean. Therefore comparing HadCrut3 with the 1988 calculations is comparing ducks with geese, even if all of the sets were properly zeroed. If you are going to use a Hadley Center product Crutem3 is a much better choice. It is quite clear from looking at the data that HadCrut3 has much less variability than Crutem3 which, of course, is no big surprise.

    There are more fowl arguments here for later, but we do have to forego the pleasure for other tasks.

  91. Willis Eschenbach
    Posted Aug 30, 2006 at 11:10 AM | Permalink

    Re #89, Eli, thank you for your posting. Among other points which I will answer when time permits, you say:

    Finally, per has a good point. HadCrut3 includes sea surface temperatures, GISSTEMP does not. The 1988 Hansen graph is for “Annual mean global surface air temperatures from the model compared with the Hansen and Lebedeff GISSTEMP series”. By the way it is constructed, the latter does not cover most of the ocean. Therefore comparing HadCrut3 with the 1988 calculations is comparing ducks with geese, even if all of the sets were properly zeroed. If you are going to use a Hadley Center product Crutem3 is a much better choice. It is quite clear from looking at the data that HadCrut3 has much less variability than Crutem3 which, of course, is no big surprise.

    Eli, I fear you are in error about the GISTEMP datasets. GISTEMP has two datasets, one of which includes SST, and one of which does not. The 1988 Hansen graph uses the dataset which includes the SST, which is why it correlates well with the HadCRUT3 dataset. In short, the HadCRUT3 dataset is the proper one for comparison.

    I have used both the GISTEMP + SST and the HadCRUT3 datasets in my analysis, since they are quite similar, and which one you choose doesn’t make much difference to the outcomes or the statistics.

    w.

  92. John Creighton
    Posted Aug 30, 2006 at 12:21 PM | Permalink

    Sorry for the double post, I posted this in the wrong thread first:
    #89 I think a good part of temperature variation cannot be explained with simple linear models as a result of the combined forcing agents, solar, volcanic and CO2. I believe the dynamics of the earth induce randomness in the earth climate system. I’ve tried doing regression with a simultaneous identification of the noise here:

    http://www.climateaudit.org/?p=692#comment-39708

    I don’t think the results are that different from standard multiple regression for the estimates of the deterministic coefficients, but it does show that the system can be fit very well to an ARMA model plus a colored noise input. Regardless of what regression technique you use, a large part of the temperature variance is not explained by the standard three forcing agents alone. Possible other forcing agents (sources of noise) could be convection, evaporation, clouds, jet streams and ocean currents.

  93. John Creighton
    Posted Aug 30, 2006 at 12:22 PM | Permalink

    Oh, gosh and somehow I messed up my html twice in a row:

  94. Steve McIntyre
    Posted Aug 30, 2006 at 12:29 PM | Permalink

    #63. Willis, could you provide a description of the exact methods used to align HadCRUT3 and GISS and the exact URLs for the data sets that you used. I’ll update the post to add in this as well as the data source.

    Also in the original article when Hansen started out in 1958, what was his explanation for the starting value?

  95. John Creighton
    Posted Aug 30, 2006 at 12:38 PM | Permalink

    Here is an idea of how good the above fit is:

    The fit uses an AR (4,4) model for each input. The standard deviation of the residual y-Ax-S shows how close the fit is with the estimated noise, while the standard deviation of residual y-Ax shows how good the deterministic part of the fit is.

    In this case the estimation of the noise did not affect the deterministic fit much because there were enough measurements for the noise and the deterministic output to be nearly orthogonal. We see that around 60% of the standard deviation is due to noise with an AR(4,4) fit. Looking at the above plot for a standard regression fit AR(1,0) we see that there is much more noise than signal. I suspect the same results with a standard regression fit for AR(4,4) because in the algorithm I used the first iteration only fits a small part of the noise and thus should not significantly affect the regression parameters. If I had used fewer measurements, though, I would expect a difference between my algorithm and the standard regression fit. I’ll compare that later.

  96. Posted Aug 30, 2006 at 1:17 PM | Permalink

    To Eli:

    I fundamentally disagree that the last 200 years is in any way crucial to the discussion, or rather, it is crucial only because it covers the entirety of the era for which we have approximately reliable thermometers and hence have any real (non-proxied) opportunity for a discussion. However, if one grants only one thing — that we do not quantitatively understand the dynamics of global climate from first principles, period — then everything else is reduced to curve fitting and models.

    I’m far from an expert on solar dynamics, but I know a fair bit about curve fitting and models where one has no real quantitative basis for the model forms in use. It is clear that there are fluctuations in global temperature with very, very long time scales compared to a year, or even 200 years. Very significant fluctuations. Furthermore, there are fluctuations on a time scale of hundreds of years evident in various proxy data that stretch our knowledge of the EXISTENCE of these fluctuations (but not their magnitude, which is inferrable only from e.g. tree ring proxies and the like via extrapolation subject to many unprovable assumptions). Depending on whether you “favor” Mann et al. or McIntyre and McKitrick alone, there is or is not anything vaguely approximating a temperature anomaly to consider, let alone one that could be forced by human sources of greenhouse gases.

    Are bristlecone pines a good or bad proxy? I couldn’t say, but it is pretty clear that Mann et al. did a lousy, sloppy job of extrapolating their proxy data and that (in my opinion, based on reading much of the debate) it is almost certainly true that it was as warm or warmer seven hundred years ago as it is today.

    There are similarly questions about the contemporary data that is used to develop the proxies in the first place. Since both sides in the issue (with the exception of e.g. M&M) seem to have abandoned all pretext of real scientific objectivity and refuse to open up the process of just how to compute global average temperatures anyway to a public scientific debate, there is basically an unknown source of noise, very likely (politically) biased noise, superimposed on the data being used to fuel the public political debate.

    These factors make it entirely possible that ALL attempts to fit the short time scale data are ignoring long time scale or chaotic dynamic factors that in fact dominate but are omitted from all models. None of the models have a plausible (or if you prefer, verifiable) explanation for the causes of ice ages, for example, or for a geologic period where (from observed peak ocean levels recorded by various proxies and so forth) it was even warmer (globally) than it is today. We don’t know how to fit the signal. We don’t even know how to separate signal from noise. We KNOW that the signal has contributions of more or less unknown strength from many time scales. And yet people are asserting that their models are valid on the basis of short time scale fits with many, many adjustable parameters.

    Even atmospheric CO_2 concentrations are not particularly well understood, from what I’ve been able to tell. For one thing, instead of increasing monotonically ONLY in the last 200 years, there is proxy data derived from multiple sources and as far as I know not challenged that asserts that global CO_2 levels have varied on a geological scale almost precisely with VERY coarse grained temperature, right through the last ice age, to a point back in the last warm interlude when they were as high as they are today. And (I would guess) it was pretty much as warm as it is today. And (I would further guess) that the pre-humans of those days were not affecting CO_2 levels much.

    One perfectly reasonable explanation is that CO_2 levels are forced by global temperature, which in turn is forced by solar dynamics on a very long time scale — possibly even true solar variability due to some serious physics going on in the core that people are just beginning to be able to understand or model or predict — as I presume that no one will seriously suggest that CO_2 levels force solar activity as an alternative. This is equally evident in shorter time scale CO_2 fluctuations from the recent past — there are clear trend correlations with temperature (IIRC, I don’t have time right now to dig for the figures I’ve seen in past digs).

    My primary conclusion isn’t that human forced global warming is or isn’t true. It is that it is absurd to claim that we can even THINK of answering the question at this point in time, and that there exists substantial evidence to the contrary, with solar dynamics known and unknown being a very plausible contender as primary agent for global climate with CO_2 and other greenhouse gases quite possibly being driven by it (with positive feedback effects, sure) rather than the other way around. Or it might be something else — tidal effects, magnetic effects — there are lots of sources of free energy out there and we don’t have a very accurate understanding of how they interact with the very complex system that produces weather, let alone climate.

    rgb

  97. Posted Aug 30, 2006 at 2:31 PM | Permalink

    Which, by the way, agrees qualitatively with John’s observation that he can fit the recent data decently by a multiparameter model with significant “noise” that can be interpreted as “missing dynamics” in the model. Indeed, it extends it, as some of the sources of noise could have very long timescales compared to 500 or so years.

    I do have a question about the fit. Some of the sources of noise one might expect to be driven by global temperatures and hence possess some covariance with model parameters. If (say) cloud coverage was related to CO_2 concentrations via the possibly multi-timescale lagged effect of the former on temperature and hence oceanic evaporation, there are opportunities galore for chaotic (nonlinear) mesoscale fluctuations. Ditto, by the way, if (say) solar dynamics drives temperature drives CO_2 concentrations which feeds back onto temperature which feeds back onto cloud coverage which affects the temperature and maybe the rate of CO_2 being scrubbed into e.g. the oceanic reservoir… one would expect a highly non-Markovian description of the actual true energy dynamics.

    Would your model account for any portion of that sort of dependent covariance, or would it only account for effects with a straightforward (e.g. linear or near linear) non-delayed effect on temperature? Outside of the noise, I mean.

    The entire existence of ice ages seems to suggest that either the Sun has serious long period variability (sufficient to significantly lower global temperatures for hundreds of thousands of years stably) or there exist multiple attractors in a chaotic model driven by a more uniform heat source, with transitions that are perhaps initiated by fluctuations of one sort or another to different stable modes. I know there are people looking at this, but I don’t know that they’ve gotten good answers. A chaotic model might well exhibit “interesting” behavior on a whole spectrum of time scales, though, as various parts of the dynamic cycle undergo epicyclic oscillations.

    rgb

  98. Willis Eschenbach
    Posted Aug 30, 2006 at 3:49 PM | Permalink

    Steve M, you ask:

    #63. Willis, could you provide a description of the exact methods used to align HadCRUT3 and GISS and the exact URLs for the data sets that you used. I’ll update the post to add in this as well as the data source.

    Also in the original article when Hansen started out in 1958, what was his explanation for the starting value?

    The HadCRUT3 dataset is from http://hadobs.metoffice.com/hadcrut3/diagnostics/global/nh+sh/monthly

    The GISTEMP dataset is from http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts+dSST.txt

    I don’t know how Hansen picked the 1958 starting value.

    I aligned the two temperature datasets by adding the same amount to each datapoint, in a manner such that their 1958 values matched the 1958 values for the scenarios.
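
    The alignment just described amounts to a constant offset. Here is a minimal sketch of that step, assuming annual values keyed by year; the helper name `align_to_reference` and all the numbers are hypothetical, not the actual HadCRUT3, GISTEMP or scenario values.

```python
def align_to_reference(series, reference_value, year=1958):
    """Shift every value by one constant so that series[year] equals reference_value."""
    offset = reference_value - series[year]
    return {y: v + offset for y, v in series.items()}

# Made-up annual anomalies (deg C), illustration only
observed = {1958: 0.10, 1959: 0.05, 1960: 0.00}
scenario_start = 0.00  # hypothetical 1958 value shared by the scenarios

aligned = align_to_reference(observed, scenario_start)
print(aligned)
```

    The shape of the observed curve is untouched; only its vertical position changes, and that position depends entirely on the single 1958 value.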

    w.

  99. Posted Aug 30, 2006 at 6:14 PM | Permalink

    #98 since the original model prediction work was done in 1988, he presumably did a 30 year run on the models, thus the 1958 start.

  100. John Creighton
    Posted Aug 30, 2006 at 6:22 PM | Permalink

    #97 In my identification all inputs including the noise have the same system poles. The system poles could be due to various feedbacks like cloud cover or CO_2 feedback. The noise which is identified is a consequence of state changes in the poles that cannot be explained by external inputs. You could predict outside the region of fit for a limited range based on the system dynamics given past inputs and past noise estimates.

    As for ice age effects I’ve read something about the ice ages correlating with the precession of the earth. I am not completely sure how it works. I am not sure if colder winters and warmer summers cause more ice or warmer winters and cooler summers cause more ice.

  101. TAC
    Posted Aug 30, 2006 at 7:02 PM | Permalink

    #97 Robert, you might want to google Demetris Koutsoyiannis (or look here) on long-memory (LTP) processes and climate. Koutsoyiannis has given a lot of thought to this problem.

  102. John A
    Posted Aug 31, 2006 at 2:42 AM | Permalink

    Robert G. Brown:

    My primary conclusion isn’t that human forced global warming is or isn’t true. It is that it is absurd to claim that we can even THINK of answering the question at this point in time, and that there exists substantial evidence to the contrary, with solar dynamics known and unknown being a very plausible contender as primary agent for global climate with CO_2 and other greenhouse gases quite possibly being driven by it (with positive feedback effects, sure) rather than the other way around. Or it might be something else – tidal effects, magnetic effects – there are lots of sources of free energy out there and we don’t have a very accurate understanding of how they interact with the very complex system that produces weather, let alone climate.

    That’s exactly my view, but the problem is that in climate science to posit such a fundamental lack of knowledge is called "denialism" rather than "ignorance". We are dealing with people like James Hansen who have spent their entire scientific careers on a single hypothesis and nothing, but nothing will stand in the way of that. This means of course, that the Greenhouse hypothesis as expressed in these "scenarios" is immune to falsification.

    To my mind, to attempt to model climate down to one variable (temperature) or two (temperature and precipitation) from hundreds of thousands of poorly controlled and poorly understood parameters is not simply overfitting but wishful thinking on a grand scale using computers.

  103. TAC
    Posted Aug 31, 2006 at 8:12 AM | Permalink

    #96 and #102 Robert and JohnA: I share your pessimism about climate modeling (poor data; uncertain physics; shoddy mathematical and statistical methods; etc.). Yet, I’m not sure I would disparage models that reduce “climate down to one variable.” Doesn’t it depend on what you’re trying to accomplish? For example, imagine a model for temperature combining some kind of (log)linear deterministic (physically based — get the physicists involved) predictor related to CO2 (if that is what we are interested in testing) with a realistic stochastic component for natural variability (I would look to the statisticians or Koutsoyiannis for this). No more than 3 fitted parameters altogether. Such parsimonious models are hard to build and easily “falsified” (everyone knows “all models are wrong,” and with small-dimension models you can easily see it), but they are much easier to interpret and understand. Though naive, such models might provide real insight. Alternatively, they might demonstrate that, given the complexity of background variability, we aren’t going to be able to say much. That would be interesting, too.
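
    As a rough sketch of what such a three-parameter model could look like, assume temperature is a log-linear response to CO2 plus AR(1) noise: T_t = a·ln(C_t/C_0) + e_t, with e_t = phi·e_{t-1} + sigma·w_t. The functional form, the function names and every number below are assumptions made for illustration; none of them comes from the thread or from any fitted dataset.

```python
import numpy as np

def simulate_temperature(co2, a, phi, sigma, seed=0):
    """T_t = a*ln(C_t/C_0) + e_t, with AR(1) noise e_t = phi*e_{t-1} + sigma*w_t."""
    co2 = np.asarray(co2, dtype=float)
    rng = np.random.default_rng(seed)
    shocks = rng.normal(0.0, sigma, co2.size)
    noise = np.zeros(co2.size)
    for t in range(1, co2.size):
        noise[t] = phi * noise[t - 1] + shocks[t]
    return a * np.log(co2 / co2[0]) + noise

def fit_log_sensitivity(co2, temps):
    """Least-squares estimate of a (no intercept); ignores the noise autocorrelation."""
    co2 = np.asarray(co2, dtype=float)
    x = np.log(co2 / co2[0])
    y = np.asarray(temps, dtype=float)
    return float(x @ y / (x @ x))

# Hypothetical inputs: a smooth CO2 ramp (ppm) and made-up parameter values
co2 = np.linspace(315.0, 380.0, 50)
temps = simulate_temperature(co2, a=3.0, phi=0.6, sigma=0.1)
print(round(fit_log_sensitivity(co2, temps), 2))
```

    Re-running with different seeds shows how far the estimate of a wanders when the noise is persistent, which is one way to see whether the background variability leaves anything to be said.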

    Whatever. Just an early-morning thought; likely it has already been tried…
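
    For what it's worth, the three-parameter model sketched in #103 could look something like the following. This is only a minimal sketch under stated assumptions: the AR(1) noise is a stand-in for the "realistic stochastic component" (it is not the long-term-persistence process Koutsoyiannis works with), the CO2 path and all parameter values are made up, and the function names are hypothetical.

```python
import numpy as np

def simulate_series(n_years, a, b, phi, sigma, co2, rng):
    """T_t = a + b*ln(co2_t/co2_0) + e_t, with AR(1) noise e_t = phi*e_{t-1} + w_t."""
    noise = np.zeros(n_years)
    shocks = rng.normal(0.0, sigma, n_years)
    for t in range(1, n_years):
        noise[t] = phi * noise[t - 1] + shocks[t]
    return a + b * np.log(co2 / co2[0]) + noise

def fit_three_parameters(temps, co2):
    """Estimate (a, b) by least squares, then phi from the lag-1 residual autocorrelation."""
    X = np.column_stack([np.ones_like(co2), np.log(co2 / co2[0])])
    coef, *_ = np.linalg.lstsq(X, temps, rcond=None)
    resid = temps - X @ coef
    phi = float(resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1]))
    return float(coef[0]), float(coef[1]), phi

rng = np.random.default_rng(0)
co2 = np.linspace(280.0, 380.0, 150)   # hypothetical concentration path (ppm), not real data
temps = simulate_series(150, a=0.0, b=3.0, phi=0.6, sigma=0.1, co2=co2, rng=rng)
a_hat, b_hat, phi_hat = fit_three_parameters(temps, co2)
print(a_hat, b_hat, phi_hat)
```

    With only three fitted numbers it is easy to see how far the estimates drift from the values used to generate the data, which is the "easily falsified" property mentioned in #103.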

  104. Tim Ball
    Posted Aug 31, 2006 at 8:48 AM | Permalink

    #96 Robert G Brown
    I have tried for years to say what you have said here so succinctly, especially in the last sentence. It is hard when you are inside the debate to keep perspective, especially when computer models are so glorified and have been so dominant. It is also complicated by the strident and constant noise generated to distract rather than clarify as you will have already noticed on this site. (The dictum that we can disagree but not be disagreeable is too often replaced by the belief that being disagreeable is tantamount to disagreeing.) It is refreshing to have someone from outside summarize and speak directly to the major inconsistencies and problems. Thank you.

  105. Steve McIntyre
    Posted Aug 31, 2006 at 9:23 AM | Permalink

    #103. TAC – I agree entirely. I’d love to see some more articulated 1D and 2D models. While I can’t prove it mathematically, my entire instinct is that, if the metric is NH (or global) average temperature, there is a latent 1D model that approximates the 3D model to negligible difference.

    The 1D “explanations” of AGW are interesting and I wish that more effort was spent in explaining and expanding them. The only exposition of how AGW works in all three (four) IPCC reports is a short section in IPCC 2AR responding to skeptic Jack Barrett, arguing that increased CO2 will lead to absorption and re-radiation in weak lines at higher (colder) altitudes.

    This is the line of reasoning in radiative-convective models (which would be interesting to consider from a calculus of variations approach for someone with requisite skills). Some of Ramanathan’s articles from the 1970s and 1980s are quite illuminating and, if I ever get to it, I should post up some comments on them. Many of them are now online at AMS Journal of Climate.

  106. bender
    Posted Aug 31, 2006 at 9:34 AM | Permalink

    Re: #103

    Such parsimonious models are hard to build and easily “falsified” (everyone knows “all models are wrong,” and with small-dimension models you can easily see it)

    Just a minor clarification. By “falsification” I assume you mean “invalidation”. Because invalidation of a model is equivalent to falsification of a hypothesis if and only if the model adequately characterizes the hypothesis one is intending to test. That is the question: do simple models give a good and fair test, or are they so easily invalidated that the invalidation does not constitute a refutation?

    And this is the problem, I imagine, with parsimonious models: they may be adequate to represent some aspects of climate theory, but not others. I am no expert, but I think the reason GCMs have grown in importance is because the more parsimonious models fail to include some of the local (i.e. non-global) processes (such as feedback processes) that the vast majority of climate scientists feel are important. e.g. Water vapour cycling is not a global process.

    Re: #96,104 If there is uncertainty in the science, then does this completely erode the much-sought-after “consensus” on AGW? Or does it just weaken it somewhat?

  107. Posted Aug 31, 2006 at 9:47 AM | Permalink

    To TAC #97, you are absolutely correct. The preprint linked to this site was a pleasure to read, and contained absolutely precisely the figures I was visualizing and most of the relevant remarks I sought to make, but with full detail. To bring it back to this thread (and indeed this entire site): Figures 1 and 4 of this preprint say it all. They should be required viewing for anybody who wants to even THINK of building a predictive model for this problem. Figure 4, in particular, is relevant to the M&M debate vs Mann, which in turn is ABSOLUTELY relevant to the Hansen graph even if one assumes that the “global temperatures” portrayed therein and being fit are actually meaningful instead of the method-error laden, biased garbage that they almost certainly are.

    Fitting in any short “window” onto the geologic time-temperature series leaves one absolutely, cosmically blind to underlying functional behavior of longer time scales. The only way one has even a hope of extracting meaningful longer-time behavior from such a fit is if one knows PRECISELY what the actual underlying functional form is that one must fit to, and only then if that form possesses certain useful properties, like projective orthogonality. Without wanting to review all of functional analysis or Fourier transforms or the like to prove this: it should be obvious or else you shouldn’t be participating in this debate.

    However there are some very simple statistical measures one can use to at least characterize things, positing a state of utter ignorance about the functional behavior and viewing the entire temperature series as “just a bunch of data”. For example, all the statistical moments (cumulants) — mean, variance, skew, kurtosis. Thus the importance of M&M vs Mann et al. If the temperature around the 1300’s was in fact as warm as it is today, the scientific debate about “global warming” as anything but speculative fiction is over. Period. It relies on the temperatures we observe being “unusual”. In a purely statistical sense, according to my good friend the central limit theorem, that means “unlikely to be observed in a random iid sample drawn from the distribution” where in turn “unlikely” is a matter of taste — a p-value of 0.05 might suffice for some, even though of course it occurs one year in 20, others might want it to be less than 0.01 (although that happens too).

    However, if two extensive excursions, lasting decades or more, of warm weather like we are currently experiencing, are observed in a mere seven hundred year sample, I don’t even need to do a computation to know that this isn’t a 0.01 event on a millennial-plus scale. More like a 0.1 or 0.2 event. It also utterly, cosmically confounds attempts to fit human-generated CO_2 as THE primary controlling variable in temperature excursions from the mean, as human generated CO_2 is surely a monotonic function on the entire interval of human history.

    The sun, however, remains not only a plausible causal agent as the primary determinant in the observed variation (given that it IS, after all, the most significant source of free energy for the planet, dwarfing all other sources except possibly tidal heating due to the moon and radioactive heating of the earth’s core, and MAYBE cosmic rays (don’t know what the total power flux is due to cosmic rays but I’ll guess that it equals or exceeds the release of non-solar free energy sources by humans by a few orders of magnitude), averaged over the entire 4\pi solid angle…

    Besides, a number of measures of solar activity are strongly correlated with global temperature over at least a couple of thousand years, as best as proxies (including plain old history books) can tell us. Here the problem is complicated tremendously by the silliness of what one expects out of the fits and the massaging of the data that everybody does to perform them (effectively ignoring the stochastic richness of our ignorance of all causes and delayed differential effects).

    First of all, one needs to make it clear that we Do Not Understand Mr. Sun. We are working on it, sure, but there is considerable ignorance concerning the actual magnetohydrodynamics models being proposed. These all have to be validated with horribly incomplete information about interior state (initial conditions, if you will) from observations primarily of exterior state, where it is KNOWN that there are very long time scales to consider: a major fluctuation that arrives at the surface to significantly alter solar irradiance and this and that might have actually occurred a rather long time ago. Then we don’t understand all aspects of how the sun transmits energy to the planet or affects the rate and ways energy is absorbed or retained except via ordinary E&M radiation, maybe. There is tidal heating, magnetic heating, and cosmic rays and charged particles affecting cloud formation, which in turn alters the Earth’s mean albedo (in a way many orders of magnitude more significant than CO_2; clouds are a hell of a greenhouse “gas”).

    We do know that the Sun orbits the center of mass of the solar system, and that its orbit is highly irregular. We do know that there appears to be a transfer effect between orbital angular momentum and the sun’s visible spin, and that altering the latter is a process involving truly stupendous amounts of energy with the consequent release of heat. We know that the Sun does all sorts of strange magnetic things as it proceeds through these irregularly spaced (but predictable) orbital events. We know that they are correlated with, but not absolutely predictive of, things like sunspot count and interval. And we know that these things are correlated very strongly with events in the Earth’s global temperature series, not just over the last 200 years where we have something approximating temperature readings but over the entire range of times where we can deduce temperatures via any sort of proxies.

    It is equally obvious that it is one of MANY parametric inputs (the obviously dominant one, in my opinion) and that our ignorance of many of a near infinity of these inputs (fine grained, they go down to that damnable butterfly in Brazil, after all) requires that a HALFWAY believable model be something like a Langevin equation (solved via a Fokker-Planck equation approach, maybe), if not a full-blown non-Markovian, stochastic, integrodifferential equation. Quasi-linear parametric models (“if CO_2 goes up, global temperatures go up”) are a joke, not serious science.

    But don’t mind me, this is just one of the things I “do” in other contexts where results are easily falsifiable and where we actually have a handle on the microscopic dynamics.

    One other thing I do is neural networks, which in the context of problems like this can be viewed as generalized nonlinear function approximators. The result of building a neural network is of course anathema to most model builders, who want to trumpet things like “This parameter is causing this output to happen the way that it is”. A neural network utterly obscures the true functional relationships it discovers. It also can handle input covariance transparently, and is perfectly happy in a high dimensional multivariate problem describing non-separable input relationships (ones that cannot be expressed as an outer product of independent functions of those variables, ones that e.g. encode an exclusive or or more complex relationship for example).

    They are, however, useful in two ways to the unbiased, even for addressing problems like this. For example, it would be interesting to try training a NN to predict “this year’s mean temperature” from (say) the last twenty-five years’ sunspot data, the last twenty-five years’ global temperature data (to give it a mechanism for doing the integro-differential non-Markovian part of the solution) and with an input for some measure of the total volcanic activity for the last twenty-five years. Nothing else. Train it over selected subsets of the last 250 years’ data (it needs to include examples of the bounded variance of the inputs and outputs). Then apply it and see how it does.
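
    To make the input layout concrete, here is a rough sketch of how the training set for such a network might be assembled. It only builds the lagged inputs and targets; it does not train anything. The series are random placeholders rather than real sunspot, temperature or volcanic data, the function name is hypothetical, and feeding the volcanic series year by year (rather than as a single 25-year total) is an assumption of the sketch.

```python
import numpy as np

def build_lagged_dataset(temps, sunspots, volcanic, window=25):
    """Each row holds the previous `window` years of all three series;
    the target is the temperature of the year that follows them."""
    rows, targets = [], []
    for t in range(window, len(temps)):
        past = slice(t - window, t)   # strictly past years, no peeking at year t
        rows.append(np.concatenate([temps[past], sunspots[past], volcanic[past]]))
        targets.append(temps[t])
    return np.array(rows), np.array(targets)

rng = np.random.default_rng(1)
n_years = 250
temps = rng.normal(size=n_years)      # placeholder series, not real data
sunspots = rng.normal(size=n_years)
volcanic = rng.normal(size=n_years)

X, y = build_lagged_dataset(temps, sunspots, volcanic)
print(X.shape, y.shape)   # (225, 75) (225,)
```

    With 250 years and a 25-year window that leaves 225 examples of 75 inputs each, which is not a lot of data for a network with many weights.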

    One can then compare the performance of such a network to one built e.g. only with sunspots as input (no non-Markovian inputs). Only with volcanos. With sunspots and volcanos. Only with CO_2. With CO_2 and temperature. Etc. Abstracting information from this process is tedious, but it does not depend on any particular assumptions concerning functional dependence of the inputs. If they are useful, the network will use them. If not, it will ignore them. If they are strongly covariant with other inputs, it won’t matter. Even so, one can sometimes obtain very interesting information about the underlying process dependencies. Oh, and one will likely end up with a quantitative model that is more accurate a predictor than ANY existing model is today by far, but that is besides the point. In fact, if the markovian model proved accurate (and I’ll bet it would:-) one could use the network recursively with simulated inputs drawn from the expected distribution of e.g. sunspots and get an actual prediction of global future temperatures some years in the future that might even accurately describe the hidden feedback stabilization mechanisms that are obviously absent from quasi-linear (monotonic and non-delayed) models.

    To John A: I also completely agree, especially about the wishful thinking aspect of things. The above is the merest outline of how one might actually attack the problem in a way that didn’t necessarily beg all sorts of questions. It still leaves one with the very substantial problem of the base data one attempts to fit: the temperature series one tries to fit is a patchwork of changing methodologies, technologies, locations, and worse, with urbanization alone providing systematic errors that are corrected for more or less arbitrarily (so that what one measures depends on who is doing the correcting). The patches don’t even fit together well where they overlap: there are weather stations all over the planet that show no statistically significant global warming EVEN in the last decade, and current measures of global temperature are either being “renormalized” by individuals with a stake in the discussion or are being based with extraordinary weight given to parts of the world where there IS NO reliable data that covers hundreds of years to which current measurements can be compared (and where it is easy to argue that the current data itself isn’t terribly reliable).

    GIGO. The very first problem one has to address, unless/until one agrees to use only satellite data and radiosonde data and dump surface measurements altogether as the load of crap they probably are. That still leaves one with accurately normalizing to the scale of the RELIABLE measurements we do have that stretch back over 250 years, but that is a more manageable problem.

    In the meantime, I find it overwhelmingly amusing, in a sick sort of way, that Time Magazine published in one month a cover story on the sun that clearly describes the immense complexity of solar dynamics, discusses the high probability that things like its irradiance and magnetic field and solar wind are significant contributors to things like earthbound weather and how important it is to understand them, and then JUST A MONTH OR THREE LATER publishes a cover story on global warming that utterly ignores the sun! No mention of Maunder minima, no glimpse of the Medieval Optimum, no look at temperature variation on a truly geological time scale, just “it is now a known and accepted scientific FACT that human generated CO_2 is about to destroy the planet as we know it”.

    The greatest tragedy associated with this is that the people betting on this horse had damn well better be right. If (for example) the current sunspot minimum is in fact the edge of another Maunder minimum (as the solar theories seem to suggest) and we’re about to enter a downturn that lasts some thirty years, it will (rightfully) damage the credibility of all scientists, everywhere, for the next fifty years. Chicken Little on a grand scale. As a scientist, I find the sheer probability of this very disturbing, as most ordinary citizens are utterly incapable of differentiating a scientist being enthusiastic about a pet theory from Darwin or Newton, where the “pet theory” is falsifiable and so overwhelmingly validated by observation that nobody sane expects it to ever be proven false.

    rgb

  108. Posted Aug 31, 2006 at 10:19 AM | Permalink

    Sorry, I’m not totally together this morning. I meant that the “non-Markovian” neural network would likely prove very predictive in this problem, not the “Markovian” one (which would not include any delayed data). I also failed to point out the virtue of including precisely 25 years’ worth of historical inputs on both temperature and sunspots to the NN. Current models all rely on “smoothing” at least e.g. sunspot data over some preselected interval, and often smooth temperature as well. This eliminates some obvious stochastic noise that would be very difficult to fit, but it also makes the resulting model depend on the particular interval one smooths over, and (in some cases) whether the smoothing includes FUTURE data as well as PAST (kind of tough to build a deterministic model that way, huh). Somehow nobody seems to ever do a comparative study of just how many years one SHOULD smooth over or why they choose “ten years” instead of “two years” or “fifty years”. Tough call, given that good old problem with time scales of unknown length and projective orthogonality (how much of figure 1 DOES one need to be able to extract an accurate Fourier component to this pure sine function plus noise, hmmm).
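
    The FUTURE versus PAST point is easy to see on a toy series. The sketch below compares a trailing moving average (past data only) with a centered one (which needs future values). The ramp input, the window length of five and the function names are all illustrative assumptions, not anything taken from an actual climate model.

```python
import numpy as np

def trailing_mean(x, window):
    """Average of the current value and the previous window-1 values (past data only)."""
    out = np.full(len(x), np.nan)
    for t in range(window - 1, len(x)):
        out[t] = x[t - window + 1 : t + 1].mean()
    return out

def centered_mean(x, window):
    """Average over a window centered on t (uses future values too); window must be odd."""
    half = window // 2
    out = np.full(len(x), np.nan)
    for t in range(half, len(x) - half):
        out[t] = x[t - half : t + half + 1].mean()
    return out

x = np.arange(10, dtype=float)    # a pure ramp, for illustration only
print(trailing_mean(x, 5))        # lags the ramp by two steps
print(centered_mean(x, 5))        # tracks the ramp, but is undefined for the last two points
```

    The centered version follows the ramp exactly but cannot say anything about the most recent two years, which is precisely where a forecast has to start.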

    By including as inputs all the actual numbers over the last 25 years, one eliminates having to think about this in a way that biases any particular window, at least for projective components determinable within the window. If the NN decides it wants to smooth linearly exactly the last eight years, it will learn to ignore the previous seventeen. If it wants to “smooth” by morphing all the inputs through a nonlinear function with support from both the last five years and a second window of five years centered eleven years or twenty two years earlier, it has the data to do so over at least two full cycles.

    This also is wide enough to permit a physically expected lag of some years between a change in WHATEVER mechanism in solar energy transfer or loss inherent in the nonlinear function the network discovers and the resultant changes in NH temperatures. The earth has a certain “thermal inertia”, just like my house, and setting the thermostat up or down (especially to try to bring about a large change) doesn’t happen right away, especially when the earth has all sorts of “windows” it can open to let energy in or out while the AC or furnace is running.

    The other brainless thing I did was attribute the solar and global warming articles to Time Magazine. I meant National Geographic, sorry, specifically July 2004 and September 2004. Pretty amazing, really.

    rgb

  109. bender
    Posted Aug 31, 2006 at 10:20 AM | Permalink

    Re #107:
    On the utility/futility of neural networks.

    The result of building a neural network is of course anathema to most model builders, who want to trumpet things like “This parameter is causing this output to happen the way that it is”.

    Now why do you think that people would like to understand the relationship between input and output?

    A neural network utterly obscures the true functional relationships it discovers. It also can handle input covariance transparently, and is perfectly happy in a high dimensional multivariate problem describing non-separable input relationships (ones that cannot be expressed as an outer product of independent functions of those variables, ones that e.g. encode an exclusive or or more complex relationship for example).

    This sounds like a research proposal. Are you sure you’re being objective here?

    one will likely end up with a quantitative model that is more accurate a predictor than ANY existing model is today by far, but that is besides the point

    Neural network models are overfit models. They are as likely to fail in an out-of-sample validation test as any other overfit model. No?

  110. charles
    Posted Aug 31, 2006 at 10:39 AM | Permalink

    Dano/Bloom,

    What do you have to say about Dr. Brown’s posts?

    I find it very refreshing when someone knowledgeable puts things into perspective.

    That’s why I love this blog.

  111. charles
    Posted Aug 31, 2006 at 10:42 AM | Permalink

    Bender,

    “This sounds like a research proposal. Are you sure you’re being objective here?” talking about Dr. Brown’s post.

    If true this puts Brown into the same position as all the grant funded climate scientists. Bloom/Dano are constantly telling us we should trust grant funded scientists because they are objective – receive no FF monies.

  112. Barney Frank
    Posted Aug 31, 2006 at 11:13 AM | Permalink

    As for ice age effects I’ve read something about the ice ages correlating with the precession of the earth.

    I think you’re referring to Milankovich(vic?)

    I believe it is highly correlated to the recent glacial-interglacial periods but doesn’t help too much in explaining ice age-non ice age periods. Sorry no link, just my memory.

  113. bender
    Posted Aug 31, 2006 at 11:17 AM | Permalink

    Re #111
    He’s criticizing the traditional approach to climate modeling. I want to know what kind of weapons he’s packing. Could be interesting. I say ‘leave the trolls out of this’.

  114. John A
    Posted Aug 31, 2006 at 11:41 AM | Permalink

    Robert and JohnA: I share your pessimism about climate modeling (poor data; uncertain physics; shoddy mathematical and statistical methods; etc.). Yet, I’m not sure I would disparage models that reduce “climate down to one variable.” Doesn’t it depend on what you’re trying to accomplish? For example, imagine a model for temperature combining some kind of (log)linear deterministic (physically based; get the physicists involved) predictor related to CO2 (if that is what we are interested in testing) with a realistic stochastic component for natural variability (I would look to the statisticians or Koutsoyiannis for this). No more than 3 fitted parameters altogether.

    The only problem is that with three or more degrees of freedom, chaos ensues.

  115. John Creighton
    Posted Aug 31, 2006 at 12:33 PM | Permalink

    There has been some discussion here of how averaging affects the model fit. Averaging can be corrected for in a limited way. For instance, in audio systems speakers use sin(x)/x compensation to compensate for the distortion that results from the sampling process. In the case of temperature, averaging causes a more serious problem because the total power dissipated from the earth is proportional to the fourth power of the temperature, yet we average temperature and not the fourth power of the temperature. To compensate for this fact I suggest that one of the forcing terms be -<T^4>^(1/4).

    The regression coefficient for this term should be roughly proportional to the third power of temperature and we expect close agreement with the equations for black body radiation. I also suggest that we try to quantify other forcing terms like thermal transfer of energy through convection and evaporation. I wonder if any of these forcing components can be estimated through satellite data.
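
    A quick numerical illustration of the fourth-power averaging issue raised above, as a minimal sketch: the four local temperatures are invented for the example, and the only physics assumed is black body emission, sigma*T^4.

```python
import numpy as np

SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

temps = np.array([220.0, 260.0, 290.0, 310.0])   # hypothetical local temperatures (K)

mean_temp = temps.mean()                         # ordinary average temperature
effective_temp = (temps ** 4).mean() ** 0.25     # fourth root of the mean fourth power

flux_from_mean = SIGMA * mean_temp ** 4          # emission computed from the averaged temperature
mean_of_flux = (SIGMA * temps ** 4).mean()       # average of the local emissions

print(mean_temp, effective_temp)
print(flux_from_mean, mean_of_flux)
```

    The average of the local emissions always comes out at or above the emission computed from the average temperature, and the gap grows with the spread of the local values. The slope of sigma*T^4 with respect to T is 4*sigma*T^3, which is where a third-power dependence in a regression coefficient would come from.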

  116. Steve McIntyre
    Posted Aug 31, 2006 at 12:50 PM | Permalink

    RGB – nice to have you posting. Very interesting comments. I think that I’ll put some threads on solar topics.

    TAC or RGB – would either of you like to do a post on Demetris’ article linked in 101 clipping out the key Figure? If so, email it to me or post it at Road Map and I’ll transfer.

  117. bender
    Posted Aug 31, 2006 at 3:14 PM | Permalink

    Re: #101
    Anyone interested in red noise processes and scaling of uncertainty should read papers on 1/f noise as well.

  118. bender
    Posted Aug 31, 2006 at 3:19 PM | Permalink

    Re 114:

    The only problem is that with three or more degrees of freedom, chaos ensues

    Sure, Lorenz (1963). Of course it is possible that the chaos that emerges at one time-space-scale is embedded in higher-order patterns – non-chaotic patterns – that emerge at larger time-space-scales. No?
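
    For anyone who wants to see the three-variable case in action, here is a small sketch of the Lorenz system with the usual textbook parameter values (sigma = 10, rho = 28, beta = 8/3). The fixed-step RK4 integrator, the step size, the starting point and the size of the perturbation are all choices made for this illustration only.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the three Lorenz equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(state + 0.5 * dt * k1)
    k3 = lorenz_rhs(state + 0.5 * dt * k2)
    k4 = lorenz_rhs(state + dt * k3)
    return state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def max_separation(n_steps=4000, dt=0.01, eps=1e-8):
    """Largest distance reached between two runs whose starts differ by eps in x."""
    a = np.array([1.0, 1.0, 1.0])
    b = a + np.array([eps, 0.0, 0.0])
    widest = 0.0
    for _ in range(n_steps):
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        widest = max(widest, float(np.linalg.norm(a - b)))
    return widest

print(max_separation(n_steps=10))   # still tiny after a handful of steps
print(max_separation())             # comparable to the size of the attractor by the end
```

    Both runs stay on the same butterfly-shaped attractor even though the individual trajectories part company, which is one simple picture of order at a larger scale sitting on top of chaos at a smaller one.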

  119. srp
    Posted Aug 31, 2006 at 4:37 PM | Permalink

    Bender: I read Baum’s book What is Thought? and he describes a theorem by Vapnik and somebody else (whose name regrettably slips my mind) that precisely describes the fitting/overfitting behavior of NNs in a binary classification context. The basic idea, loosely, seems to be that if a) the data generating process is stationary and b) the NN is trained on historical samples, then c) the NN’s predictive power is inversely related to the number of free parameters the training has searched over. This is all defined in a precise mathematical way, but that’s the gist of it–if you find a good fit with one or two parameters, then it’s likely to work out of sample, but if you splined the data, there’s no reason to think your fit is going to be good on the next instance.
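
    The gist can be illustrated without any neural network at all. The sketch below fits polynomials of increasing degree to noisy samples of a one-parameter process and reports in-sample and out-of-sample error. Everything in it (the straight-line "truth", the noise level, the sample sizes, the choice of polynomials instead of a NN) is an assumption made for illustration, and it is not the theorem itself.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(2)
x_train = np.linspace(-1.0, 1.0, 20)
x_test = np.linspace(-0.95, 0.95, 20)

def truth(x):
    return 0.5 * x   # hypothetical one-parameter generating process

y_train = truth(x_train) + rng.normal(0.0, 0.2, x_train.size)
y_test = truth(x_test) + rng.normal(0.0, 0.2, x_test.size)

def errors_for_degree(degree):
    """Fit a polynomial on the training points; return (train MSE, test MSE)."""
    coeffs = P.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((P.polyval(x_train, coeffs) - y_train) ** 2))
    test_mse = float(np.mean((P.polyval(x_test, coeffs) - y_test) ** 2))
    return train_mse, test_mse

for degree in (1, 3, 9, 12):
    print(degree, errors_for_degree(degree))
```

    The in-sample error can only go down as parameters are added, since each lower-degree fit is a special case of the higher-degree one. The out-of-sample error is under no such obligation, and with this much noise it typically gets worse at the high-degree end.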

    That said, I also recall reading that NNs are no panacea in that training and convergence are often painfully slow and ineffective if the problem structure isn’t “friendly” to the NN’s architecture. It’s probably worth a try, though.

  120. Posted Aug 31, 2006 at 8:29 PM | Permalink

    RGB,

    Cosmic rays presumably act on cloud formation, so there’s a huge amplifying mechanism there. No need for a lot of energy. Cosmic rays are in antiphase with the sun’s activity, since the sun’s magnetic field shields us from them. On the other hand, there is mounting evidence that the recent warming can be accounted for by a decreasing albedo over the past 25 years, apparently due to a lack of clouds, especially low-level clouds. We now have satellites observing cloud cover, so we have about 20 years of data. The radiation budget (incoming short wave radiation minus outgoing long wave radiation) has been observed to be much more variable over decadal time scales than is predicted by any model. However, the decreasing albedo trend has apparently reversed over the past 2-3 years. Interestingly, the ocean temperature has also started to drop at the same time, and has lost 20% of the heat it had accumulated over the past 20 years. None of these observations has been predicted by the models. But that’s not surprising since cloud formation is the one thing we don’t understand, and it has to be fully parametrized in the models, with more or less guess values. There is clearly a picture emerging from all this that the sun’s activity, and its direct and indirect effects, are the main driver of the climate dynamics. It could very well be that in the end, we find that CO2 has a very small role in all this.

    Question for Steve M. and John A.: Is there a directory somewhere where one could upload interesting papers, so that we can reference them in our posts and we are certain to find them? I download a lot of papers, but don’t always keep the url where I got them, so whenever I want to post a link, I have to search for them again, and sometimes they’re not there any more, and in any case it’s quite time consuming. I also have papers that were sent to me directly by the authors, and are not available on the web. You could build a bank of relevant papers for the blog.

  121. John Creighton
    Posted Aug 31, 2006 at 9:32 PM | Permalink

    #119 I doubt that a neural network’s predictive power is always inversely proportional to the number of free parameters. There should be an optimal number of parameters. I’ve tried doing noise removal in speech before with a predictive MA filter, and what I found was that the number of parameters had to be large enough to describe at least one or two cycles of the vowel frequency.

  122. Barclay E. MacDonald
    Posted Aug 31, 2006 at 9:37 PM | Permalink

    Francois, there may be better choices, but I just started using Google Notebook. It may be a very useful solution. Check it out here

  123. Steve McIntyre
    Posted Aug 31, 2006 at 9:45 PM | Permalink

    #120. Francois, esnips.com (used by jae) has free storage that would be a good place to upload papers. I’m thinking of using it for pdf storage.

  124. John Creighton
    Posted Aug 31, 2006 at 11:25 PM | Permalink

    #120 you may be right. Perhaps cosmic rays are the biggest drivers of temperature changes because of the reaction in the atmosphere that creates clouds. If you look at figure 12 on:

    http://www.dsri.dk/~hsv/SSR_Paper.pdf

    You see the plot of C14 anomalies looks like the mirror image of the low frequency information of what we believe the climate looked like for the last 1000 years. C14 is created by the interaction of the cosmic rays with the atmosphere. The paper also shows a very strong relationship between cosmic radiation and the amount of cloud cover. Unfortunately we don’t have data going back very far. Maybe the C14 data could be used to fill in the past data where the records are missing for cosmic rays and cloud cover.

    More papers on cosmic radiation and cloud cover can be found here:

    http://www.dsri.dk/~hsv/

  125. John Creighton
    Posted Sep 1, 2006 at 12:29 AM | Permalink

    I’m looking at the instrumental temperature record from 1600 to 2000 and I’m looking at the C14 over that same period and I can’t help but think that the C14, which is an indicator of cosmic rays, fits the temperature data much better than solar, greenhouse gases and volcanic eruptions. I wonder if the other drivers are significant at all. Unfortunately the C14 does not explain the high frequency data so I am still left to wonder what causes this.

    Perhaps the high frequency data has to do with when the clouds were formed. If the clouds were formed evenly throughout the year there would be less cooling, I think, than if it was concentrated within a shorter period of time. Or perhaps it has something to do with the global distribution of cloud cover. More clouds at the equator would have a greater cooling effect than at the poles, I think. There is also the altitude distribution of clouds. I confess I haven’t read the paper yet.

  126. Pat Frank
    Posted Sep 1, 2006 at 1:48 AM | Permalink

    #96: “My primary conclusion isn’t that human forced global warming is or isn’t true. It is that it is absurd to claim that we can even THINK of answering the question at this point in time, and that there exists substantial evidence to the contrary, with solar dynamics known and unknown being a very plausible contender as primary agent for global climate…”

    Virtually every scientist who has posted a considered ‘global’ opinion of the state of climate science here, with the exception of John Hunter, has offered a similar view.

  127. James Lane
    Posted Sep 1, 2006 at 3:48 AM | Permalink

    Re #107

    Could someone refer me to the preprint that rgb is discussing?

  128. Demesure
    Posted Sep 1, 2006 at 3:58 AM | Permalink

    Here is a plot of Willis’ data #63: http://opelinjection.free.fr/imagesforum/gw_models.jpg
    HadCrudt3 and Giss provide the same temps over the last decades, except for notable exceptions of 1998 (highest temp recorded for HadCrudt3) and 2005 (highest temp recorded for Giss).
    A new puzzle for climate science.

  129. TAC
    Posted Sep 1, 2006 at 5:08 AM | Permalink

    #127 I think the preprint you refer to is of one of Koutsoyiannis’s latest papers (here). It was cited in #101. I (and/or RGB) will likely be sending SteveM a post on it.

  130. TAC
    Posted Sep 1, 2006 at 5:13 AM | Permalink

    I also like this paper (here or here) on the difficulty of determining trend significance in the presence of possible long-term persistence.

  131. John Finn
    Posted Sep 1, 2006 at 5:55 AM | Permalink

    Just a comment on the Hansen graph

    In his article Hansen writes

    Scenarios B and C also included occasional large volcanic eruptions

    This would suggest that the plots of B and C are ‘lower’ than they might have been (without volcanos). The last major eruption was in 1991 (Pinatubo) which according to NASA was responsible for a temperature drop of up to 0.5 deg C and affected global temps for up to 3 years.

    If Hansen were to re-run his model with just the 1991 volcanic eruption the observed temperatures would be running well below both scenarios B & C.

    On the discrepancies between GISS and HADCRUT. Tim Lambert is correct about the different anomaly periods, but the ‘discrepancy’ seems to be growing (it should be reasonably constant). I think it may be due to the way the ocean temperatures are measured.

  132. BKC
    Posted Sep 1, 2006 at 8:22 AM | Permalink

    Re. #120 and #124

    Here’s another interesting paper (abstract) that provides very strong evidence (IMHO) that GCRs modulate cloud cover.

  133. Posted Sep 1, 2006 at 8:44 AM | Permalink

    OK, I should, of course, be doing something like actual work but this is more fun so I’ll see what I can do about the questions/suggestions above.

    Lessee…

    a) Currently I am completely unfunded by federal money, and while I have been so funded in the past, the agency (ARO) that funded me could give a rodent’s furry behind for the entire weather debate one way or the other. In fact, I’d say that is true for pretty much all physics funding. Physics has its own problems with politicization of the funding process, but thankfully they are vastly smaller than Climate. So ain’t nobody holding a gun to my head here, I have no vested economic interest in this discussion, and although I have no way of “proving” this I am an ethical person — an ex-boy scout, a university professor and student advisor, a beowulf computing guy, open source fanatic, husband, father, and have hardly ever been accused of crimes major or minor over the years. I did get caught driving with an expired registration once, if that counts. So: “Everything I state in this discussion is my actual, unpaid for opinion based on doing half-assed web-based research and applying whatever I have learned for better or worse in 30 years or so of doing and teaching theoretical physics, advanced computation, statistical mechanical simulation, and in the course of starting up a company that does predictive modelling with neural networks (currently defunct, so no that is not a vested interest either).” Good enough disclaimer?

    b) Re: Neural networks. I have fairly extensive experience with neural networks (having written a very advanced one that uses a whole bunch of stuff derived from physics/stat mech to accelerate the optimization process for problems of very high input dimensionality). As I said, one of the best ways of viewing a NN is as a generalized multivariate nonlinear function approximator. In commercial predictive modelling, the multivariate nonlinear function one is attempting to model is the probability distribution function, usually used as a binary classification tool (will he/won’t he e.g. purchase the following product if an offer is made, based on demographic inputs). However, NNs can equally well be trained to just plain model nonlinear complex (in the Santa Fe Institute sense, not complex number sense though of course they can do that as well) functions presented with noisy training data pulled from that function, not necessarily in an iid sense since the support of a 100 dimensional function may well live in a teeny weeny subvolume and there isn’t enough time in the universe to actually meaningfully sample it (even with only binary inputs, let alone real number inputs).

    For very high dimensional functions with unknown analytic structure, I personally think that NNs are pretty much the only game in town. To use anything else (except possibly Parzen-Bayes networks or various projective trees for certain classes of problems) restricts the result to preordained projective subspaces of the actual problem space. The process of projection, especially onto separable/orthogonal multivariate bases, can completely erase the significant multivariate features in the data, hiding key relationships and costing you accuracy.

    NNs have a variety of “interesting” properties (in the sense of the Chinese curse:-). For one, constructing one with a “canned” program (even a commercial canned program) is likely to fail immediately for a novice user, leaving them with the impression that they don’t work. Building a successful NN for a difficult problem (one that really DOES have high dimensionality and nontrivial internal correlations between input degrees of freedom and the output desired) is as much art as it is science, simply because the construction process involves solving an NP-complete optimization problem and one needs to bring heavy guns to bear on it to have good success. Training takes as long as days or weeks, not minutes, and may require trying several different structures of network to succeed, facts that elude a lot of casual appliers of canned NNs. Having the source to the NN in question is an excellent idea, because one may need to actually use human intuition, insight, and a devilishly deep understanding of “how it all works” to rebuild the NN on the spot at the source code level to manage certain problems.

    NNs also have a built in “heisenberg”-like process that resembles in some ways the quantum physics one (which is based on vector identities in a functional linear vector space or the properties of the fourier transform that links position and momentum descriptions as you prefer). If one builds a network with “too much power”, a NN overfits the data, effectively “memorizing specific instances” from the training set and using its internal structure to identify them, but then it does a poor job of interpolating or extrapolating. If one builds a network with too little “power” (too few hidden layer neurons, basically) then the network cannot encode all of the nonlinear structures that actually contribute to the solution and it again fails to reach its extrapolatory/interpolatory optimum. So in addition to building a NN (solving a rather large optimization problem in and of itself) one has to also optimize the structure of the NN itself in terms of number of hidden layer neurons, precisely what inputs to use from a potentially huge set of input variables (building a NN with more than order 100 inputs starts to get very sketchy even with a cluster, even with NN’s significant scaling advantage in searching/sampling the available space, even for my program), and then there are a number of other control variables and possibilities for customized structure to optimize on top of that for truly difficult problems. Finally, NNs perform well for certain mappings of the input numbers into “neural form” and perform absolutely terribly for other encodings of the same input data. One has to actually understand how NNs work and what the data represents to present inputs to the network in a way that makes it relatively “easy” to find a halfway decent optimum (noting that the system is nearly ALWAYS sufficiently complex that you won’t find THE optimum, just one that is better than (say) 99.999999% of the local optima one might find by naive monte carlo sampling followed by e.g. a conjugate gradient or worse, a simple regression fit).
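    The trade-off between too little and too much network “power” described above can be illustrated with a toy sketch. This is not rgb's program: it assumes a random-feature network (hidden weights fixed at random, only the output layer fitted by least squares) standing in for a fully trained NN, and the target function, noise level, sample sizes and all names are made up for illustration.

```python
import numpy as np

def make_data(n, rng, noise=0.2):
    """Noisy samples of a smooth target (sin) on [-3, 3]. Purely synthetic."""
    x = rng.uniform(-3.0, 3.0, size=n)
    y = np.sin(x) + noise * rng.standard_normal(n)
    return x, y

def hidden_features(x, w, b):
    """tanh hidden layer plus a constant column acting as the output bias."""
    h = np.tanh(np.outer(x, w) + b)
    return np.column_stack([np.ones_like(x), h])

def fit_and_score(n_hidden, w, b, train, test):
    """Fit only the output weights by least squares; return (train MSE, test MSE)."""
    x_tr, y_tr = train
    x_te, y_te = test
    f_tr = hidden_features(x_tr, w[:n_hidden], b[:n_hidden])
    f_te = hidden_features(x_te, w[:n_hidden], b[:n_hidden])
    coef, *_ = np.linalg.lstsq(f_tr, y_tr, rcond=None)
    mse_tr = float(np.mean((f_tr @ coef - y_tr) ** 2))
    mse_te = float(np.mean((f_te @ coef - y_te) ** 2))
    return mse_tr, mse_te

rng = np.random.default_rng(0)
train = make_data(25, rng)    # small training set
test = make_data(500, rng)    # held-out data from the same function
w = rng.normal(0.0, 2.0, size=40)   # random hidden weights (assumption)
b = rng.normal(0.0, 2.0, size=40)   # random hidden biases (assumption)

# Smaller networks reuse a prefix of the same hidden units, so the
# feature sets are nested and training error cannot go up with size.
sizes = [1, 3, 10, 40]
scores = {h: fit_and_score(h, w, b, train, test) for h in sizes}
for h in sizes:
    print(f"hidden={h:2d}  train MSE={scores[h][0]:.4f}  test MSE={scores[h][1]:.4f}")
```

    Training error can only fall as hidden units are added, which is why it is useless as a guide to structure. The held-out error is the number to watch: it is typically poor for the smallest network and tends to degrade again once the network has more free parameters than training points.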

    Oops! I forgot to mention the entire problem of selecting an adequate training set! This problem is coupled to a number of the others above — for example, finding the right resolving power for the network — and above all, is related to the problem of being able to extrapolate any model beyond the range of the functional data used to build the model. This problem is discussed in a separate section below, so I won’t say more about it now, but at that time you will see that NNs aren’t any more or less capable of solving this problem in the strict sense of Jaynes (the world’s best description of axiomatic probability, in my humble opinion) and maximum likelihood, Polya’s urn, entropy etc for people who know what they are. Extrapolation necessarily involves making assumptions about the underlying functional form (even if they are only “it is an analytic function and hence smoothly extensible”) and one can always find or define an infinite number (literally) of possible exceptions where the assumption breaks down. This leads to some pretty heavy stuff, mathematically speaking, concerning the dimensionality of the actual underlying function, the way that dimensionality projects into the actual dimensions you are fitting which may be non-orthogonal curvilinear transforms of the true dimensions or worse, and the log of the missing information — the information lost in the process of projection. WAY more than we want to cover here, way more than I CAN cover here — read Jaynes and prosper.

    From this you may imagine that I don’t think much of published conclusions concerning NNs — they tend to be based on simple feed forward/back propagation networks applied to simple problems or naively applied to problems that they cannot possibly solve, although there are exceptions. The exceptions are worth a lot of money, though, so people who work out truly notable exceptions do not necessarily publish at all.

    With regard to the specific problem at hand — building NNs to model a nonlinear function “T(i)” (temperature as a function of some given vector of “presumed climatological variables that might be either causal agent descriptors or functions of causal agent descriptions”) — the problem itself is actually simple enough that I think that NNs would do fairly well, subject to several constraints. The most important constraint is that the network needs to be trained on valid data. NNs aren’t “magical” any more than human brains are — they suffer from GIGO as much as any program on earth. They are simply far, far better at searching high dimensional spaces in a mostly UNSTRUCTURED way that makes the fewest possible assumptions about the underlying form of the function being fit, in comparison to using a power series, a fourier representation, a (yuk!) multivariate logistic representation, or fitting anything with a quasi-linear few-variable form that effectively separates contributions from the different dimensions into a simple product of monotonic forms (assumptions that are absolutely not justified in the current problem, by the way).

    As previously noted, this leads us back to the M&M vs Mann result, and the lovely papers with links provided by TAC. I don’t really see how to uplink and embed jpgs grabbed from the figures in these papers and I have to teach graduate E&M in a couple of hours (which does require some prep:-) so I’ll try to summarize the basic idea in (my own) general terms. I STRONGLY urge people to at least skim the actual papers by following the links TAC provided as they use simpler (although perhaps less precise) language than that below and besides, have simply LOVELY figures that say it all.

    c) Suppose you have a completely deterministic function (equation of motion) with (say) 10^whatever degrees of freedom, where whatever is “large”, 10^whatever is “very large” and numbers like 10^whatever! are “good friends with infinity, who lives just past the end of their street” (the latter number figures prominently in various computations of probability in this sort of system). OK, so we cannot think about actually solving such a system, so we go from the actual equation of motion to its Generalized Master Equation. The way this works is that one selects a subset of the degrees of freedom or a functional transform thereof and performs a projection from the actual degrees of freedom to the new ones. To do this one has to embrace a statistical description of the underlying process and average over the neglected degrees of freedom. This results in the appearance of new, coarse grained degrees of freedom (like “temperature”, which is a proxy as it were for the average internal energy per degree of freedom in a multiparticle physical system at equilibrium, but there may be many others as well) and a new equation of motion for the quantities that remain in your microscopic description. See in particular links on the master equation page for Chapman-Kolmogorov and Fokker-Planck and Langevin.

    Note well that at the CK level, solutions on the projective subspace are most generally going to be the result of solving non-Markovian integrodifferential equations with a kernel that makes the time derivative of the quantities of interest at the current time (say, the joint probability distribution for global temperatures at all the measurement stations around the planet as a function of time) a function of not only the current values of those states and other input variables describing the current values of projective (coarse grained averaged) quantities, but the values of those variables at a continuum of times into the past that has to be integrated over. The system has a “memory”, and the dynamics are no longer local in time. Note well that the MICROSCOPIC dynamics IS time-local, but in the process of projection and coarse-grain averaging the new variables “forget” critical information from earlier times that would have been encoded on the degrees of freedom averaged over, and that is still at least weakly encoded on previous-time values of those average degrees of freedom. This sort of evolution is a generalized master equation, and alas wikipedia doesn’t yet have a reference for it but they are in use in physics in a number of places in e.g. the quantum optics of open systems.
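    For readers who want to see the shape of such an equation, a generic textbook form of a generalized master equation (a sketch in standard Nakajima-Zwanzig style notation, not taken from any paper cited in this thread) is given below. Here P(x,t) is the probability distribution of the retained coarse-grained variables x, K is the memory kernel, and I is an inhomogeneous term carrying the initial state of the degrees of freedom that were averaged over; all three symbols are notation assumed for this illustration.

```latex
% Non-Markovian (generalized) master equation: the rate of change at time t
% depends on the whole history of P through the memory kernel K.
\frac{\partial P(x,t)}{\partial t}
  = \int_{0}^{t} \mathrm{d}s \int \mathrm{d}x' \,
    K(x, x'; t - s)\, P(x', s) \;+\; I(x,t)

% Markov approximation: the kernel is taken to be local in time,
% K(x, x'; t - s) \approx W(x, x')\,\delta(t - s), and I is dropped, giving
\frac{\partial P(x,t)}{\partial t}
  = \int \mathrm{d}x' \, W(x, x')\, P(x', t)
```

    The second equation is the ordinary time-local master equation. Passing from the first to the second is exactly the step that throws away the “memory” discussed above.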

    Sorry, I don’t know of any easier way of describing this, as this is the actual bare bones underlying mathematical structure of the actual physical problem one is trying to solve. One CAN (and obviously most people DO) just naively say “hey, global temperature (however that is defined) might be a smooth function of CO_2 concentration (however that is defined) with the following (presumed) parametric form and here is the best parametric fit to that form in comparison to the data” without even thinking about those hidden degrees of freedom and non-Markovian effects, but that is, really, a pretty silly thing to do WITHOUT even explicitly acknowledging the limitations on the likely meaning of the result.

    The point is that there are many physical systems where the local dynamics depends on the particular history of how one arrived at the current state, not just the values of its variables at that state. Pretty much any non-equilibrium, open system in statistical mechanics, for example.

    This gives you the merest glimpse at the true complexity of the problem at hand. The neglected degrees of freedom of the coarse grained variables are responsible for the colored stochastic noise that appears in the actual distribution one is trying to time-evolve or state-evolve, the state evolution is generally non-Markovian so that making a Markov approximation and time evolving it only on the basis of current state is itself an additional source of error and erases the POSSIBILITY of certain kinds of dynamics in the result, etc.

    Now, again for the specific problem at hand, the data being fit is a coarse-grained average of temperature measurements. Those measurements were recorded over hundreds of years and in a very, very inhomogeneous distribution of locations. Those measurements themselves (as measurements always are) are subject to errors both systematic and random — the former become what amounts to an unknown functional transformation of the results from each measurement apparatus and the latter appears as noise of one sort or another.

    The number of locations, the site of the locations, the unknown transformation of the measurements from these locations, all themselves have varied in ways both known and unknown over time. Finally, the results from those locations are THEMSELVES transformed in a specific way into the single number we are calling (e.g.) “global average temperature for the year 1853” (or 2004, or 1932). Note that different transforms are required for all of those years simply because of HUGE differences in the profile of the contributing sites over decadal timescales. Note also that there are those that accuse the transform itself being used of significant bias, at least in recent years. I cannot address this — it really isn’t necessary. The point is that there is an UNCERTAINTY in T(t) for any given year, and that the deviation is almost certainly not going to be a standard normal error but rather an unknown systematic bias with a superimposed variance that is not normal. Curve fitting without an error estimate on the points is already an interesting exercise that we will not now examine, especially when that error estimate is not presumed normal — suffice it to say that this unknown error SHOULD cause us to reduce the confidence we place in the resulting fit.

    This, then, is the data that is extended by proxies by Mann et. al. to produce a temperature estimate back roughly 1000 years, T(t) for t in the general range of 1000-200* CE. Let us call this curve T_mea(t) (for Mann et. al.). MEA used tree rings as proxies, and as I’m sure everybody who is reading this site is aware, used a statistical weighting mechanism while effectively normalizing the T(t) for the last couple of hundred years to current tree rings that de facto gave a huge weight to just two species of tree in their sample, both with highly questionable growth ring patterns in modern times that may or may not be related to temperature at all and that are not CONSISTENT with patterns observed in the growth rings of hundreds of other neighboring species over the same interval. By doing what amounted to a terrible job with the actual process of proxy extrapolation, they completely erased a warm period back in the 1300-1400’s that is well-documented historically and by all the OTHER tree ring proxies they claimed to use.

    All of this computation was hidden, of course. M&M, by means of some brilliant detective work and bulldog-like dedication to task, had to DEDUCE that this was what had been done by attempting to reproduce their result and by means of limited private communications with MEA who as of the last M&M paper I read still have not disclosed all of their actual code.

    Steve, it would actually be lovely if you would drop a copy of the basic figure from one of your papers in here — one with and one without the hockey stick form (or if you like, one with and one without bristlecone pines being anomalously weighted).

    Two more remarks and then I have to go teach, sorry — maybe I’ll get back to this if there is still interest in people hearing more.

    First, at the VERY least this means that there is YET ANOTHER layer of systematic error AND stochastic noise on top of the projection of T(t) back over 1000 years via proxies. In my opinion, having read M&M and understanding what they are talking about, at least one source of systematic error has been uncovered by them and can be resolved by simply shifting from T_mea(t) to T_mm(t), which shows that temperatures a mere 700 years ago were as warm or warmer than they are today, in the complete absence of anything LIKE the anthropogenic CO_2 that everybody is worried about today.

    Second, an interval 200 years long, or 1000 years long, is still tiny on a geological scale and we KNOW that there are significant geological scale variations in global temperature. Forget possible causes — remember the process of projection and coarse graining/averaging that is implicit in any such description. Think instead about the TAC-referenced papers. We could well be in the position of the ant, living on the side of Mount Everest, who is trying to decide how to climb to the top of the world. To figure out which way to go, he wanders around a bit in his immediate neighborhood, and sees a few grains of sand, a gum wrapper, and — look! An acorn sitting just downhill of him! The shortsighted ant climbs to the top of the acorn and proclaims himself king of the world.

    Right.

    BEFORE talking about causes, curves, fits, ALL of the above has to be understood and the data itself has to be reliable. Error estimates have to be attached to the data being fit, and those estimates have to include the effects of convolving the measurement process with the actual values both systematic and random. Finally, NO method for fitting the data to ANY control parameters will succeed if the data has significant variation on timescales larger than the window being fit that are not represented in the basis of the fit.
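    The last point, that a fit cannot succeed when the data varies on timescales longer than the window being fit, can be seen in a toy sketch. The “signal” here is an invented 1000-sample sinusoid, not climate data, and the 100-sample window and straight-line basis are arbitrary choices made for illustration.

```python
import numpy as np

# A slow oscillation with a period ten times longer than the fitting window.
period = 1000.0
t_all = np.arange(0.0, 1000.0)
signal = np.sin(2.0 * np.pi * t_all / period)

# Fit a straight line using only the first 100 samples.
window = slice(0, 100)
slope, intercept = np.polyfit(t_all[window], signal[window], 1)
fit = slope * t_all + intercept

# Inside the window the line looks like an excellent model ...
in_window_err = float(np.max(np.abs(fit[window] - signal[window])))
# ... but half a period later it is badly wrong.
out_of_window_err = float(abs(fit[500] - signal[500]))

print(f"max error inside window : {in_window_err:.3f}")
print(f"error at t = 500        : {out_of_window_err:.3f}")
```

    Inside the window the straight line is nearly indistinguishable from the true curve, so no goodness-of-fit measure computed there can warn that the extrapolation is meaningless.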

    Sigh. Have to go. Perhaps I’ll return later to the actual point, which was NNs vs the data span and their likely ability to predict. In an acorn-shell, if the training data spans the REAL range of data in the periods of interest, it should be as good at capturing internal multivariate projective variation as anything. Nothing can extrapolate without assumptions, there is no way to validate those assumptions without still more data. Period.

    rgb

  134. Demesure
    Posted Sep 1, 2006 at 9:00 AM | Permalink

    John Finn said: On the discrepancies between GISS and HADCRUT. Tim Lambert is correct about the different anomaly periods, but the ‘discrepancy’ seems to be growing (it should be reasonably constant).

    It is growing… or it is lessening, depending on the year. Maybe the amplitude of the discrepancies depends on the number of conventions per year that climatologists hold to “correct” the data. Maybe when Hansen is no longer head of GISS, GISTEMP will decrease. Who knows.

  135. KevinUK
    Posted Sep 1, 2006 at 9:19 AM | Permalink

    #133, RGB

    Hard going but fascinating stuff. Slightly off thread but keep it up.

    Kevin

  136. Dave Dardinger
    Posted Sep 1, 2006 at 10:15 AM | Permalink

    re: 133,

    It’s such fun reading messages like this. It’s just slightly over my head such that I can reasonably judge it to be correct since in those places where I clearly understand it’s correct and where I can’t there are no glaring problems.

    And interestingly, it’s not as far over my head in one sense as Steve M is often since it’s a more general discussion rather than one relying on detailed technical complexities. Still I wish I could take a semester long class on NNs with you as I’m sure I’d learn lots. For that matter it’s too bad I didn’t have someone like you as a teacher when I took E&M (though it was just an undergrad class.) My teacher may have known what he was talking about but we were in a small college and there were only 4 students in the class but he still lectured like he was talking to 100. Very off-putting.

  137. Posted Sep 1, 2006 at 1:04 PM | Permalink

    Whew! I just reread what I wrote and it is far too much, sorry.

    Let me wrap up (since I have a couple of minutes before I have to do my next daily chore) with the following. This isn’t my field, and I have no idea where to get actual data sets, e.g. T_mm(t), T_mea(t), CCO_2(t). I did find sunspots on the web from 1755 on, or so it appears, which unfortunately doesn’t extend back to 1300. There are solar dynamics theories that try to extend patterns back that far and there may even be comparable data of some sort as sunspots were observed a LONG time ago by this culture or that, but I don’t know where to get that data if it exists.

    If anybody can direct me at the data, I’d be happy to build a genetically optimized NN (with bells and whistles added as needed) to see what it can do with various inputs to predict e.g. T(t,i) where i are the input vectors and where t may or may not be explicitly included (probably not, actually).

    What this can do is give one a reason to believe that a nonlinear functional mapping exists between the inputs used and the target in the different cases. It will not tell one what that relationship is, or whether the relationship is direct or indirect, only that with a certain set of inputs one “can” build a good network and without them one “cannot”, all things being equal. It will not be able to address, in all probability, whether or not e.g. CO_2 or sunspots or the cost of futures in orange juice is “the” best or worst variable to use as an input if, as is not unreasonable, it is discovered that all three are correlated in their variation to T() (quite possibly as effect, not cause, in some cases). But it might be fun just to see what one can see.

    rgb

  138. John Creighton
    Posted Sep 1, 2006 at 1:15 PM | Permalink

    Well here is the data Michael Mann used. I got it from the Nature website.

    http://www.geocities.com/s243a/warm/code/m_fig7_solar.txt

    http://www.geocities.com/s243a/warm/code/m_fig7_volcanic.txt

    http://www.geocities.com/s243a/warm/code/nhmean.txt

    http://www.geocities.com/s243a/warm/code/fig7_co2.txt

    Sometime I am going to try to find the data in a more raw form and rebuild it but not today.

    I don’t know if the following links will be helpful but there is more data that you might want to incorporate.

    http://www.ngdc.noaa.gov/mgg/geology/geologydata.html

    http://www.ngdc.noaa.gov/stp/SOLAR/solarda3.html

    http://wdc.cricyt.edu.ar/paleo/datalist.html

    I also found some length of day data which I posted here:

    http://www.climateaudit.org/?p=692#comment-43941

    Let me try this again. For some reason if I put too many links the spam filter gets me.

  139. John Creighton
    Posted Sep 1, 2006 at 1:15 PM | Permalink

    Well here is the data Michael Mann used. I got it from the Nature website.

    http://www.geocities.com/s243a/warm/code/m_fig7_solar.txt

    http://www.geocities.com/s243a/warm/code/m_fig7_volcanic.txt

    http://www.geocities.com/s243a/warm/code/nhmean.txt

    http://www.geocities.com/s243a/warm/code/fig7_co2.txt

    Sometime I am going to try to find the data in a more raw form and rebuild it but not today.

  140. John Creighton
    Posted Sep 1, 2006 at 1:17 PM | Permalink

    I don’t know if the following links will be helpful but there is more data that you might want to incorporate.

    http://www.ngdc.noaa.gov/mgg/geology/geologydata.html

    http://www.ngdc.noaa.gov/stp/SOLAR/solarda3.html

    http://wdc.cricyt.edu.ar/paleo/datalist.html

  141. John Creighton
    Posted Sep 1, 2006 at 1:18 PM | Permalink

    I also found some length of day data which I posted here:

    http://www.climateaudit.org/?p=692#comment-43941

    P.S. I would have put this all in one post but the spam filter doesn’t seem to like me putting a lot of links in one post.

  142. Posted Sep 1, 2006 at 4:56 PM | Permalink

    Thanks, John. I recorded links and looked at a bunch of data back to 1000 or so. It is really amazing that in Mann’s T_mea(t) the medieval optimum has just plain disappeared, and the Maunder and Spörer events are almost invisible. How could anybody take this seriously? I pulled just a couple of proxies (e.g. some African lake data, Chinese river data) and they show beyond any question an extended warm spell in the 1100-1300 range that was clearly global in scope. I thought that this was visible in nearly all the tree ring data on the planet — but I see that now I can look for myself (if I can figure out how — there is a LOT of tree ring data, and of course (sigh) tree growth is itself multivariate and not a trivial or even a monotonic function of temperature).

    This leaves me with the usual problem — what to fit and how large a range to try to fit (or rather model with a predictive model, not exactly the same thing). There are sunspot proxies that go back over the 1000 year period — I don’t know exactly how that works but they are there. I’ll have to look at a couple of papers on solar dynamics and see if I can improve on this with perhaps orbital or magnetic data.

    I did look over the data on the variation of earth’s rotational period. Difficult to know what to make of this, as I’m not sure what this data reflects physically. There is transfer of angular momentum to and from the earth via e.g. tides and the moon and sun and other planets, there can also be CONSERVED angular momentum but a change in the earth’s moment of inertia due to internal mass rearrangements (upwelling magma? plate tectonics? something large scale). I’d expect none of these to be elastic processes and for there to be a large release of heat, in particular, accompanying anything but uniform motion as internal forces work to keep the essentially fluid (on this sort of time scale) body rotating roughly homogeneously.

    I’ll have to do a back of the envelope calculation to see if the energy changes that might be associated with the variation are of an order that could affect the temperature of the earth’s crust itself. Geodynamics is of course another potential heat source that may or may not be constant. I’d assumed that it was constant or very slowly varying, but this data suggests that it might not be.
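    One possible version of that back-of-the-envelope estimate is sketched below. It is only an order-of-magnitude illustration under loudly stated assumptions: angular momentum is taken as exactly conserved (so the whole change in length of day comes from a change in moment of inertia), the constants are approximate textbook values, the 1 ms change is an arbitrary example rather than a value read from the posted data, and all names are hypothetical. It says nothing about where, or how quickly, such energy would appear as heat.

```python
import math

# Approximate textbook values (assumptions for this sketch).
EARTH_MOMENT_OF_INERTIA = 8.04e37   # kg m^2, about the polar axis
SIDEREAL_DAY = 86164.1              # s, one rotation
SOLAR_CONSTANT = 1361.0             # W/m^2 at the top of the atmosphere
EARTH_RADIUS = 6.371e6              # m
ALBEDO = 0.3                        # fraction of sunlight reflected

omega = 2.0 * math.pi / SIDEREAL_DAY
rotational_energy = 0.5 * EARTH_MOMENT_OF_INERTIA * omega ** 2

def energy_change(delta_lod_seconds):
    """First-order change in rotational kinetic energy (J) for a change in
    length of day, with angular momentum L held fixed.
    E = L * omega / 2, so dE / E = d(omega) / omega = -dT / T."""
    return -rotational_energy * delta_lod_seconds / SIDEREAL_DAY

# Sunlight absorbed by the whole planet, for scale.
absorbed_sunlight = SOLAR_CONSTANT * math.pi * EARTH_RADIUS ** 2 * (1.0 - ALBEDO)

delta_e = energy_change(1.0e-3)   # a 1 ms lengthening of the day
print(f"rotational energy       : {rotational_energy:.3e} J")
print(f"change for +1 ms LOD    : {delta_e:.3e} J")
print(f"equivalent sunlight time: {abs(delta_e) / absorbed_sunlight / 3600.0:.1f} hours")
```

    The useful quantity is the ratio in the last line, which puts the rotational energy change on the same scale as the planet's ordinary radiative input.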

    Anyway, it will take me some time to do the computations, so I’ll probably be quiet now until I have something concrete to say (if the thread survives until then:-).

    rgb

  143. Martin Ringo
    Posted Sep 1, 2006 at 10:32 PM | Permalink

    A FYI and follow-up on earlier posts about trends versus model scenarios

    Because there was an early discussion in this thread of the relative merits of a simple time trend versus the Hansen Scenarios, I thought it would be of interest to run a Diebold-Mariano predictive accuracy test.
    This test allows testing for different objective values — absolute value difference, squared differences, weighted, etc. — and accounts for the autocorrelations and heteroskedasticity of forecast errors (predicted minus actual values): characteristics missing in most common tests of forecast efficiency. The tests are a “forecast” comparison of the Scenarios versus both a simple linear trend and a quadratic trend with an ARMA(3,3) structure, and were run for both absolute value and squared differences from the actual anomalies with both the GISS and Hadley series of anomalies. All series were centered on the means of the 1958-1988 common data set. Out of the 24 comparisons there were 12 rejections of the “no statistical difference” hypothesis between the forecasts, and in each case the time series trend forecast was closer to the actuals.

    The results shown below [how does one post a table into a comment?] should not be interpreted as saying that a simple time series forecast is really superior. Rather, it should more modestly be considered to say that the Hansen scenarios offer no more predictive accuracy over what can be seen from a naïve or semi-semi extrapolation of the series.

    ABSOLUTE VALUE OF DIFFERENCES FROM ACTUAL
    Scenario  Trend           Data    Rejection  t-stat  % Prob
    A         Linear          GISS    Yes         -3.52    0.16%
    B         Linear          GISS    No           0.42   33.92%
    C         Linear          GISS    No           1.42    8.84%
    A         ARMA+quadratic  GISS    Yes         -5.55    0.00%
    B         ARMA+quadratic  GISS    Yes         -2.34    1.67%
    C         ARMA+quadratic  GISS    No           0.14   44.71%

    Scenario  Trend           Data    Rejection  t-stat  % Prob
    A         Linear          HADLEY  Yes         -7.44    0.00%
    B         Linear          HADLEY  No           0.35   36.46%
    C         Linear          HADLEY  No           1.26   11.27%
    A         ARMA+quadratic  HADLEY  Yes         -8.06    0.00%
    B         ARMA+quadratic  HADLEY  Yes         -2.75    0.74%
    C         ARMA+quadratic  HADLEY  No           0.90   19.09%

    SQUARED DIFFERENCES FROM ACTUAL
    Scenario  Trend           Data    Rejection  t-stat  % Prob
    A         Linear          GISS    Yes         -3.57    0.14%
    B         Linear          GISS    No           0.39   35.09%
    C         Linear          GISS    No           1.28   10.96%
    A         ARMA+quadratic  GISS    Yes         -6.44    0.00%
    B         ARMA+quadratic  GISS    Yes         -2.70    0.82%
    C         ARMA+quadratic  GISS    No          -0.72   24.13%

    Scenario  Trend           Data    Rejection  t-stat  % Prob
    A         Linear          HADLEY  Yes         -9.61    0.00%
    B         Linear          HADLEY  No           0.73   23.86%
    C         Linear          HADLEY  No           1.47    8.12%
    A         ARMA+quadratic  HADLEY  Yes         -5.64    0.00%
    B         ARMA+quadratic  HADLEY  Yes         -2.10    2.64%
    C         ARMA+quadratic  HADLEY  No           0.27   29.42%
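    For anyone wanting to try this kind of comparison, a minimal sketch of a Diebold-Mariano statistic follows. It is not the code behind the table: the comment does not say which variance estimator, lag length or p-value convention was used, so the Bartlett-weighted (Newey-West style) long-run variance, the normal approximation and the two-sided p-value here are all assumptions, as are the function name and the synthetic series.

```python
import math
import numpy as np

def diebold_mariano(actual, forecast_1, forecast_2, power=2, max_lag=0):
    """Diebold-Mariano statistic for equal predictive accuracy.

    Loss is |error| ** power (power=1: absolute, power=2: squared).
    A negative statistic means forecast_1 has the smaller average loss.
    Returns (statistic, two-sided p-value from the normal approximation).
    """
    actual = np.asarray(actual, dtype=float)
    e1 = np.asarray(forecast_1, dtype=float) - actual
    e2 = np.asarray(forecast_2, dtype=float) - actual
    d = np.abs(e1) ** power - np.abs(e2) ** power   # loss differential
    n = d.size
    d_bar = d.mean()
    dc = d - d_bar

    # Long-run variance of d: lag-0 variance plus Bartlett-weighted
    # autocovariances, which allows for autocorrelated forecast errors.
    long_run_var = np.dot(dc, dc) / n
    for k in range(1, max_lag + 1):
        gamma_k = np.dot(dc[k:], dc[:-k]) / n
        long_run_var += 2.0 * (1.0 - k / (max_lag + 1)) * gamma_k

    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return float(stat), p_value

# Synthetic illustration only: an invented trend with noise, a forecast
# that follows the trend, and a forecast that runs 0.5 too warm.
rng = np.random.default_rng(1)
years = np.arange(30)
actual = 0.02 * years + 0.1 * rng.standard_normal(30)
trend_forecast = 0.02 * years
warm_forecast = 0.02 * years + 0.5

stat, p_value = diebold_mariano(actual, trend_forecast, warm_forecast,
                                power=2, max_lag=2)
print(f"DM statistic = {stat:.2f}, two-sided p = {p_value:.4f}")
```

    With this sign convention a significantly negative statistic says the first forecast is closer to the actuals. Whether that matches the sign convention used in the table above is not stated in the comment.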

  144. Martin Ringo
    Posted Sep 1, 2006 at 10:38 PM | Permalink

    Re: RGB #133 and preceding: Neural Networks
    Suppose a Neural Net is used to make a univariate reconstruction or forecast (from some K number of “forcing,” “exogenous” or driving explanatory variables — no causal relationship assumed). How is the distribution of the prediction — presume it is some series — known? Your description of NN “fitting” as “generalized nonlinear approximators” is more than apposite. And generally the problem with nonlinear estimators is that one has to resort to asymptotic methods to determine the distributional properties, which leaves us poor souls living in the finite sample universe often practicing statistics on faith. Anyway, I was fascinated by your comments and curious how you dealt with the statistical properties of NN.

    [Confession: I tried writing a NN program when I was first learning C++. The exercise scared me off object oriented programming and neural nets ever since.]

  145. John Creighton
    Posted Sep 1, 2006 at 11:24 PM | Permalink

    Speaking of C14, has anyone heard about the Suess effect? Apparently galactic cosmic rays are not the only thing that affects the C14 concentration; so does the burning of C14-depleted fossil fuels. I’m curious though: wouldn’t it be C14-rich fuels that should affect the C14 concentration the most? I wonder how difficult it would be to correct the C14 concentration for fossil fuel burning so we could isolate the solar effects. I also wonder if forest fires affect the C14 concentration at all.

  146. John Creighton
    Posted Sep 1, 2006 at 11:31 PM | Permalink

    Oh, I understand it now. If the fuels we burn have a lower C14 concentration than the atmosphere, we dilute it. Fossil fuels, since they are older, should have a lower C14 concentration. Trees should have a C14 concentration much closer to that of the atmosphere. Thus forest fires should have a much less significant effect on C14 concentration than fossil fuel burning. If we want to use C14 as a solar proxy we have to correct for the Suess effect.

    http://groups.google.ca/group/sci.chem/tree/browse_frm/thread/cbeb77c657f28598/1fa7f4bd65aa2f62?rnum=61&hl=en&q=seuss+effect+c14&_done=%2Fgroup%2Fsci.chem%2Fbrowse_frm%2Fthread%2Fcbeb77c657f28598%2Fce9ccb1a9c8e2554%3Flnk%3Dst%26q%3Dseuss+effect+c14%26rnum%3D5%26hl%3Den%26#doc_ce9ccb1a9c8e2554

  147. Posted Sep 2, 2006 at 5:36 PM | Permalink

    Re #144

    I just don’t worry about the statistical properties of NNs — I view them as “practical” predictive agents, not as formal statistical fits. In fact, I feel the same way about modelling in general in cases where the underlying model is effectively completely unknown, so one gets no help from Bayes and no help from a knowledge of functional forms.

    I’m working currently on a major random number testing program (GPL) called “dieharder”, that incorporates all of the old diehard RNG tests AND will (eventually) incorporate all the STS/NIST tests, some of the Knuth tests, and more as I think of things or find them in the literature. A truly universal RNG testing shell with a fairly flexible scheme for running RNG tests.

    In this context, I can speak precisely about statistical properties. Random number testing works by taking a distribution of some sort that one can generate from random numbers and that has some known property (ideally one that is e.g. normally distributed with known mean and variance). One generates the distribution, evaluates the statistic, compares the mean result to the expected mean, computes e.g. chisq, transforms that to a p-value for the null hypothesis, and I can then state “the probability of getting THIS result from a PERFECT RNG is 0.001” or whatever.
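
    A minimal sketch of that generate, evaluate, compare, convert-to-p-value loop, assuming the simplest possible statistic: the mean of uniform(0, 1) draws, which for a perfect generator is approximately normal with mean 1/2 and variance 1/(12n). This is not a dieharder test; the function name and the deliberately biased generator are hypothetical and only illustrate the logic.

```python
import math
import random


def mean_test_p_value(samples):
    """Two-sided p-value for the mean of supposedly uniform(0, 1) draws.

    Null hypothesis: a perfect generator, so the sample mean is roughly
    normal with mean 1/2 and variance 1/(12 n).
    """
    n = len(samples)
    sample_mean = sum(samples) / n
    z = (sample_mean - 0.5) / math.sqrt(1.0 / (12.0 * n))
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Python's built-in generator with a fixed seed (illustrative choice).
rng = random.Random(12345)
good = [rng.random() for _ in range(100_000)]
z_good, p_good = mean_test_p_value(good)

# Hypothetical broken generator: same draws squeezed into [0.02, 1.0],
# which shifts the mean to about 0.51.
biased = [0.02 + 0.98 * x for x in good]
z_biased, p_biased = mean_test_p_value(biased)

print(f"good:   z = {z_good:+.2f}, p = {p_good:.3g}")
print(f"biased: z = {z_biased:+.2f}, p = {p_biased:.3g}")
```

    The p-value for the biased stream comes out vanishingly small, which is the sense in which one can say a perfect RNG would almost never produce that result. The whole procedure works only because the null distribution is known in advance.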

    Now, just how can one do that when one is pulling samples from an unknown distribution, with unknown mean, variance, kurtosis, skew, and other statistical moments, where the long time scale, short time scale, intermediate time scale behavior is not known, and where we KNOW that the underlying system is chaotic, with a high dimensionality to the primary causally correlated variables and clear mechanisms for nonlinear feedback? The answer is, we can’t. The basic point is that nobody can make a statement about p-value associated with any of the fits being discussed, which is why using them in public policy discussions is absurd.

    What we CAN do by simple inspection of the data is note that if the data range over 1000 years contains two warm excursions like the one we are just finishing this year, a simple maximum entropy assignment of probability (a la Polya’s urn) for such excursions occurring in any given century is something like 10% to 20%, or in any given 100 year interval it is not unlikely to find at least one or two decades of similarly warm weather. To make a statement beyond that requires accurate and similarly normalized data that stretches back over a longer time, and we observe (when we attempt to do so via proxies) long term temperature variability that vastly exceeds any of that observed in the tiny fraction of geological time since the invention of the thermometer.

    So when I build NNs for this problem, it will not be so that I can “succeed” and build one that is highly predictive with some set of inputs and then say “Aha, now we understand the problem” or “Clearly these are the important inputs”. That’s impossible, given that there are lots of inputs and that their EFFECT is clearly all mixed up by feedback mechanisms so that they are all covariant in various ways. For example, there are clear long term variations in CO_2 (evident in e.g. ice core) that seem to occur with temperature. Is this cause? Effect? Both (via positive feedback through any of several positive mechanisms)?

    There are many worthwhile questions for science in all of this, but it is absolutely essential to separate out the real science, which SHOULD have a healthy amount of self-doubt and a strong requirement for validation and falsifiability (and hence for challenge from those that respectfully disagree) from the politics and public policy. Performing statistical studies is by far the least “meaningful” of all approaches to science, because correlation is not causality. It is easy in so many problems to show correlation. Most introductory stats books (at the college level) contain whole chapters with admonitory examples of how one can falsely claim that smoking causes premarital sex and other nonsense from observing correlations in the populations.

    Alas, people who become politicians, nay, who become presidents of the United States, may well be “C-” students in general and have never taken a single stats course (let alone an advanced one), and besides, one of the lovely things about misusing statistics is that it is a fabulous vehicle for politicians and con artists both to make their pitches. “Help prevent teen pregnancy! Don’t let your daughter smoke!”

    I personally view statistical surveys and models of the sort being bandied about in this entire debate as being PRELIMINARY work one would do in the process of building a real science — exploring the correlations, trying to determine what needs to be explained and what CAN be MAYBE explained, and eventually connecting this back to “theoretical models” in the scientific sense. However, the PHYSICS of current models is so overwhelmingly underdone that I just think that it is absurd that anyone thinks that they can make any sort of statement at all about what causes what. The models contain parameters that are at best estimated and where the estimates can be off by a factor of two or more! They are missing entire physical mechanisms, or include them only by weak proxies (e.g. “solar activity” measures instead of “solar magnetic field” measures instead of “flux of cosmic rays”, where all of the above may vary in related ways, but with noise and quite possibly with additional significant functional variation from other systematic, neglected, mechanism).

    So seriously, the NN is “just for fun” and to see if one can read the tea leaves it produces for some insight, not to be able to make a statement with some degree of actual confidence (in the statistical sense). Sorry about your C++ experience — I actually don’t like C++ either as you can see from my website, where I have the “fake interview” on C++ that is pretty funny, if you are (like me) a C person…;-)

    rgb

  148. John Creighton
    Posted Sep 2, 2006 at 9:15 PM | Permalink

    Robert,
    Best of luck with the neural network fits. You may also want to look at fitting to satellite data as opposed to instrumental data. I say this since Steve’s thread:

    http://www.climateaudit.org/?p=300

    suggests that it is easier to fit the satellite data to an ARMA model than the instrumental data. I haven’t looked at this too closely but it could save you some grief.

  149. Martin Ringo
    Posted Sep 4, 2006 at 12:16 PM | Permalink

    Re # 147
    I’m all for reading the tea leaves with a hope of insight. When I look at the various ARMA estimates I have made of annual or even monthly data, I don’t see any common structure to the patterns. I contrast this with daily and hourly temperature data which has been pretty consistent, at least for the data sets I have seen in the US. So maybe NNs catch a pattern which gives an insight which … And maybe the Laplacian efforts to model climate may come to fruition.

    If you have it, could you post the exact location of the “interview” regarding C++? I can’t really claim to be a C person — I haven’t written a line of C in over 10 years — but I still think K&R is, if not the greatest, then the cleanest book written on programming. But then the language is pretty clean also.

  150. Posted Sep 4, 2006 at 4:05 PM | Permalink

    Martin, GIYF but among other places it is here: rgb’s C++ Rant (and fake “interview”). Lest this trigger a flame war (always fun, but not necessarily for this venue) let me hasten to point out that while I personally prefer C to C++ for some of the reasons humorously given here, I really think that language preferences of this sort are a matter of religious taste and not worth really arguing about.

    John: There are really two NN projects that seem to be implicitly possible — one that uses the extremely accurate satellite data as you suggest (which alas, doesn’t extend back very far at all on the 1000 year scale) to model short time fluctuations, which almost certainly won’t extrapolate but which should have really good basic data, and one that uses the infinitely arguable T(t) from proxies as the model target and SOME sort of input related to e.g. solar activity.

    The problem with NN predictive models is that one has to be very careful picking one’s inputs to avoid certain obvious problems. For example, using a single input of t (the year) would permit a network to be built that pretty much approximates T(t) via interpolation and limited extrapolation. However, this isn’t desirable — one could do as well with a Fourier transform and looking for important frequencies. Indeed, one hopes that this latter thing has already been done, as it would certainly yield important information.
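
    A minimal sketch of the "Fourier transform and look for important frequencies" alternative, assuming an evenly sampled series. The data here are synthetic (an 11-step cycle plus a weaker 22-step one), not a temperature record, and the function names are hypothetical. A naive O(n^2) transform keeps the example dependency-free.

```python
import cmath
import math


def power_spectrum(series):
    """Naive discrete Fourier transform; returns power at k = 0 .. n//2."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the mean (k = 0 term)
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        powers.append(abs(coeff) ** 2 / n)
    return powers


def dominant_period(series):
    """Period (in samples) of the strongest non-zero frequency."""
    powers = power_spectrum(series)
    k_best = max(range(1, len(powers)), key=lambda k: powers[k])
    return len(series) / k_best


# Synthetic stand-in for T(t): 220 samples, two superimposed cycles.
n_samples = 220
synthetic = [
    math.sin(2 * math.pi * t / 11) + 0.3 * math.sin(2 * math.pi * t / 22)
    for t in range(n_samples)
]

print("dominant period:", dominant_period(synthetic))
```

    On this clean synthetic input the strongest peak sits at the 11-step cycle. Real proxy series are noisy and unevenly dated, so a spectrum of that kind of data would need far more care than this sketch suggests.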

    Yet the SPECIFIC aspects of solar dynamics that may be “the primary variable” in determining T may not be precisely reflected in “just” sunspot count, and sunspot counts per year only go back so far (roughly 1600’s), at least accurately. So one is tempted to extend them with extrapolated patterns, which in turn beg several questions about e.g. long term Gleissberg-type fluctuations and which CAN become nothing but transforms of t if one isn’t careful. This kind of thing has been done many times by Friis-Christensen (who doesn’t JUST look at sunspot intervals but attempts to find evidence of deeper patterns in the sun’s irregular but predictable orbital behavior and its connection with its rotation and magnetic properties). The solar models seem to be getting there, but are still largely incomplete and to some extent phenomenological.

    The good thing about the NN in this context is that IF there is a relatively simple (e.g. fivefold) pattern in the underlying forcing/response, a suitably “stupid” NN will be forced to find a nonlinear model for it in order to end up with a good performance. One of my favorite demo problems, for example, is building a NN that can recognize binary integers that are divisible by e.g. 7, presented bitwise on its inputs. The amazing thing is that there exist networks that will “solve” this problem with something like 95% accuracy after being trained with only 25% or so of the data, given that a NN knows nothing about “division” and that the input neurons aren’t even ordinally labelled or asymmetric in any way. There is reason to hope for relations like those proposed by FC to be abstractable if the network has the right inputs.

    rgb

    Noting that all of this reflects the basic problems with all the other kinds of models one might try to build. Since the invention of NASA and weather satellites, we have increasingly accurate and complete data on global weather. Before that we have accurate and complete data only from a tiny fraction of the world, for an appallingly short period of time on a geological scale, making it extremely dangerous to jump to any sort of dynamical model conclusions.

    rgb

  151. John Creighton
    Posted Sep 4, 2006 at 11:05 PM | Permalink

    With regards to sunspot number, I plotted the sunspot number and looked at Mann’s graph of solar activity (okay, I forget what he called it). Anyway, Mann’s graph looked like the sunspot number put through a low-pass filter. In the paper Mann referenced, the figure was constructed from many solar indicators.

    So say Mann’s graph is an indication of solar flux while the sunspot number (with the mean subtracted) better correlates with solar magnetism and clouds. Then a neural network with suitably chosen nonlinearities may be able to extract a good deal of information from sunspot number alone. Additionally, sunspot number extends through most of the years that Mann did his figure 7 correlations for.

  152. John Creighton
    Posted Sep 5, 2006 at 9:24 PM | Permalink

    Two topics of interest are how to combine high frequency data with low frequency data and how to compare estimates measured at different sampling rates. The problems arise, for instance, because of different types of data, either measurements or proxies.

    My initial thoughts on the issue are that if the systems aren’t too stiff then it may be okay to compare the estimates by finding a transformation from one controllable canonical form to another. Otherwise more numerically stable representations of the state space equations should be found. In either case I would suggest choosing a common form in which to compare the estimates obtained from both sets of data (e.g. state space, diagonalized, Jordan form, Schur decomposition).

    In the case where a controllable canonical form is chosen as the form to compare the estimates, an initial estimate can be obtained by estimating the ARMA coefficients from one set of data and then transforming those coefficients to the form of the other set of data. It is important to map the uncertainties as well as the estimates, because this information will be used as a priori information in the improved estimate, which incorporates this a priori information plus the other set of data.

    It should be noted that recursive least squares is equivalent to the Bayesian estimate where the a priori information is obtained from the previous estimates via the RLS algorithm. I bring this up to point out that there is a wide variety of theory about how to recursively incorporate new sets of data to improve an estimate, and that these approaches often give the same result.
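
    A minimal sketch of that equivalence, assuming the simplest case: a scalar model y = theta * x + noise with unit noise variance and a Gaussian prior on theta. The function names and the four data points are hypothetical. Feeding the data to RLS one point at a time should land on the same mean and variance as the closed-form Bayesian posterior computed from all the data at once.

```python
def rls_scalar(xs, ys, theta0, p0):
    """Recursive least squares for y = theta * x + noise (unit noise variance).

    theta0 and p0 are the prior mean and prior variance of theta.
    """
    theta, p = theta0, p0
    for x, y in zip(xs, ys):
        gain = p * x / (1.0 + x * p * x)
        theta = theta + gain * (y - x * theta)  # correct by the innovation
        p = p - gain * x * p                    # shrink the variance
    return theta, p


def bayes_batch(xs, ys, theta0, p0):
    """Closed-form Gaussian posterior using all the data at once."""
    precision = 1.0 / p0 + sum(x * x for x in xs)
    mean = (theta0 / p0 + sum(x * y for x, y in zip(xs, ys))) / precision
    return mean, 1.0 / precision


# Hypothetical data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

theta_rls, p_rls = rls_scalar(xs, ys, theta0=0.0, p0=10.0)
theta_bayes, p_bayes = bayes_batch(xs, ys, theta0=0.0, p0=10.0)

print("RLS:  ", theta_rls, p_rls)
print("Bayes:", theta_bayes, p_bayes)
```

    Splitting the data into two batches and passing the first batch's (theta, p) as the prior for the second gives the same answer again, which is the mechanism for carrying estimates and their uncertainties from one data set to the next.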

    The advantage of using a controllable canonical form as the basis of comparison is that only the model estimate due to one of the sets of data has to be transformed. A transformation can introduce numerical error and statistical bias. As the uncertainty in the transformation approaches zero, the bias introduced by the transformation approaches zero. The problem with controllable canonical form is that it may not represent stiff systems in a numerically stable way. If the poles are well separated the system can be put in diagonalized form, but a diagonalized form becomes a Jordan form when there are repeated poles. A Schur decomposition is a numerically stable alternative to Jordan form, but not as computationally efficient.

    I bring these issues up because there is a lot of talk here about the statistical issues of fits. Robert points out how difficult it is to provide meaningful statistical results, and this gets ever harder in the presence of numerical instabilities. The procedures I describe retain the prospect of calculating error bars but open up the question of how much numerical error affects these error bars and whether the algorithms proposed properly account for this error.

  153. John Creighton
    Posted Sep 6, 2006 at 1:14 PM | Permalink

    I was thinking about the orthogonality of the proxies and my first thought was it is probably a precipitation index. Precipitation indexes can be related to cloud cover, which plays a big part in warming. Of course low clouds cause cooling and high clouds cause warming, so precipitation is not directly related to warming. I then recall one of Robert’s posts,

    “I pulled just a couple of proxies (e.g. some african lake data, chinese river data) and they show beyond any question an extended warm spell in the 1100-1300 range that was clearly global in scope. I thought that this was visible in nearly all the tree ring data on the planet - but I see that now I can look for myself (if I can figure out how - there is a LOT of tree ring data, and of course (sigh) tree growth is itself multivariate and not a trivial or even a monotonic function of temperature).”

    http://www.climateaudit.org/?p=796#comment-44086

    and I then wonder if maybe Mann took care to select the worst of the tree proxies. I am not sure if Robert was saying the MWP and LIA were in most tree data or not. Interestingly enough, tree proxies are supposedly best for high frequency information, so if low frequency proxies are first used to identify the low frequency model, and the low frequency part of the signal is then removed by an inverse filter (similar to differencing), then maybe trees will provide a more robust method of identifying the high frequency part of the signal.

    Anyway, we may be able to use trees to get low frequency information but we have better ways of doing it. I think tree proxies should only be used where they are supposed to excel.

  154. Mike Hollinshead
    Posted Sep 28, 2006 at 10:58 AM | Permalink

    Re: #6

    Willis,

    If memory serves me right, GISS does not use ship based SSTs, only buoys, whereas Hadley does use them. Their SSTs are therefore likely very different.

    Mike

  155. Steve McIntyre
    Posted Sep 28, 2006 at 11:10 AM | Permalink

    GISS has SST back to the 1870s for 0.5N 159E in the Hansen PNAS study, while HadCRU has virtually no values around 1900 – so something else must be going on as well.

  156. Willis Eschenbach
    Posted Sep 28, 2006 at 11:33 AM | Permalink

    Re 155, the HadISST (Ice and Sea Surface Temperature) database goes back to 1870. I believe that’s the data Hansen used … but like so many things in climate “science”, who knows?

    w.

  157. Willis Eschenbach
    Posted Oct 26, 2006 at 6:00 AM | Permalink

    I’ve been having a bit of a discussion with a rather “in-your-face” gentleman called Eli Rabett on another blog not to be named, about the changes in CO2 in Hansen’s different forcing scenarios. Eli’s claim is that there is “not a tit’s worth” of difference in CO2 in Scenarios B and C until the year 2000. In support of this, he provided the following chart:

    He does not say where the data for the chart comes from … or how he did the calculations … or anything. He just claims that’s the truth. But it can’t be, because if the difference were only a few parts per million between the observations and all three scenarios, why are the scenarios so different from each other and from the observations?

    (And in a bizarre twist, he has the Scenario C levels flattening out in 2000, whereas Hansen states that “Slow growth [Scenario C] assumes that the annual increment of airborne CO2 will average 1.6 ppm until 2025, after which it will decline linearly to zero in 2100.”)

    Now, I’ve been putting off actually doing the exercise of figuring out the CO2 levels in Hansen’s scenarios, because it’s somewhat complex. The problem comes from Hansen’s specification of the inputs to the models, which are as follows:

    Scenario A
    CO2 3% annual emissions increase in developing countries, 1% in developed.

    Scenario B
    CO2 2% annual emissions increase in developing countries, 0% in developed.

    Scenario C
    CO2 1.6 ppm annual atmospheric increase until 2025, and decreasing linearly to zero by 2100.

    (He also specifies changes in methane and nitrous oxide, but as he says, “Comparable assumptions are made for the minor greenhouse gases. These have little effect on the results.”)

    As you can see, the CO2 inputs are in different units, with A and B given as emission changes, and C given as a change in atmospheric ppmv. That’s the difficulty.

    However, I am nothing if not persistent, so I tackled the problem.

    I got the historical carbon emission data by country from the CDIAC for 1958, the start of the run. I divided it into developed and developing countries. It breaks down like this (in gigatonnes of carbon emitted):

    World : 2.33 GtC
    Developed : 1.90 GtC
    Developing : 0.43 GtC

    That was the laborious part, splitting out the emission data. Next, the atmospheric data. Not all of the CO2 that is emitted stays in the atmosphere. To account for this, I had to calculate the percentage that remained in the atmosphere each year. This varies from year to year. To compute this, I took the Mauna Loa data for the change in CO2 year by year. Knowing the atmospheric concentration and the amount emitted, I then calculated the amount retained by the atmosphere. Over the period, this varied in the range of 80% to 40%.

    I was then ready to do the calculations. For Scenarios A and B, I calculated each succeeding year’s increased emissions, multiplied that by the percentage retained in the atmosphere, and then used that to calculate the new atmospheric concentration. Scenario C was much easier: it was a straight 1.6 ppmv increase annually.
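
    The procedure just described can be sketched roughly as follows. The 1958 emissions and the growth rates are the figures given above; everything else is an assumption made for illustration: a single constant airborne fraction of 0.55 (the comment reports a year-by-year value between 40% and 80%), a starting concentration of about 315 ppmv, and the usual conversion of roughly 2.13 GtC per ppmv. The function names are hypothetical, and this is not the spreadsheet actually used.

```python
GTC_PER_PPM = 2.13        # assumed: ~2.13 GtC of carbon per ppmv of CO2
START_PPM = 315.0         # assumed: approximate 1958 concentration
AIRBORNE_FRACTION = 0.55  # assumed constant; the text reports 40% to 80%


def project_from_emissions(developed, developing, g_developed, g_developing, years):
    """Grow emissions each year and add the retained share to the air."""
    ppm = START_PPM
    path = [ppm]
    for _ in range(years):
        developed *= 1.0 + g_developed
        developing *= 1.0 + g_developing
        retained_gtc = (developed + developing) * AIRBORNE_FRACTION
        ppm += retained_gtc / GTC_PER_PPM
        path.append(ppm)
    return path


def project_fixed_increment(increment_ppm, years):
    """Scenario C style: a flat ppmv increase every year."""
    return [START_PPM + increment_ppm * k for k in range(years + 1)]


YEARS = 42  # 1958 through 2000
scenario_a = project_from_emissions(1.90, 0.43, 0.01, 0.03, YEARS)
scenario_b = project_from_emissions(1.90, 0.43, 0.00, 0.02, YEARS)
scenario_c = project_fixed_increment(1.6, YEARS)

for name, path in (("A", scenario_a), ("B", scenario_b), ("C", scenario_c)):
    print(name, round(path[-1], 1))
```

    The level each path reaches depends heavily on the assumed airborne fraction, so the output is only a guide to the shape of the calculation, not a reproduction of the results.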

    Here are the final results:

    A couple things of note. First, Scenarios A, B and C are fairly indistinguishable until about 1980. This is also visible in the model results, where those scenarios do not diverge significantly until 1980.

    Second, all of the scenarios assume a higher rate of CO2 growth than actually occurred.

    Finally, since these scenarios were designed by Hansen to encompass the range of high and low possibilities, this sure doesn’t say much for the scenarios …

    w.

  158. Willis Eschenbach
    Posted Oct 27, 2006 at 6:16 PM | Permalink

    As I mentioned, I’ve been discussing this issue on another blog. Tim Lambert kindly pointed out to me that I was looking at the wrong specification for the forcings of the models. (This was followed by a very nasty response from Eli Rabett). Lambert was right. Here is my response.

    ________________________________________________________________________________________________

    Tim, thank you for pointing out this error. You are 100% correct, Hansen describes two sets of scenarios A, B, and C in his paper. One is for the 1988 graph, and one is for the 2006 graph. Guess the Rabett was a far-sighted lagomorph after all.

    However, none of this changes a couple of things.

    1) The CO2 projections by Hansen are quite good up until 1988, which makes sense, because the paper was written in 1988 and the scenarios were designed, understandably, to fit the history.

    And as Eli Rabett pointed out, all three CO2 scenarios are identical until the C scenario goes flat in 2000.

    However, after 1988, all three scenarios show more CO2 than observations. C drops off the charts when it goes flat in 2000, but A and B continue together, and they continue to be higher than observations. The CO2 forcings from A and B are higher than observations every year after 1988 to the present.

    2) Including 4 of the other 5 major GHGs (CH4, N2O, CFC-11, and CFC-12) gives scenarios that diverge before 1988. A and B diverge immediately, and C diverges from B around 1980.

    Hansen’s claim that the scenarios were accurate can only be maintained by tiny graphs that don’t show the details. Once the details are seen, it is obvious that the forcings from the scenarios are all, every one of them, higher than observations, and the scenarios are still diverging from the observations to this day.

    Here is Hansen’s graph …

    And here is a clear graph of the same data …

    The actual 5 gases forcings based on observations follow B very closely until 1988. Again, this is no surprise, B was designed to be as close as possible to observations, with A above and C below. But after 1988, once again the observations diverged from all three scenarios. By 1998, just ten years into the experiment, observations were below all three scenarios. And the distance between them continued to increase right up to the present.

    A few more thoughts on the Hansen paper. He says:

    The standard deviation about the 100-year mean for the observed surface air temperature change of the past century (which has a strong trend) is 0.20°C; it is 0.12°C after detrending [Hansen et al., 1981]. The 0.12°C detrended variability of observed temperatures was obtained as the average standard deviation about the ten 10-year means in the past century; if, instead, we compute the average standard deviation about the four 25-year means, this detrended variability is 0.13°C.
    For the period 1951-1980, which is commonly used as a reference period, the standard deviation of annual temperature about the 30-year mean is 0.13°C. … We conclude that, on a time scale of a few decades or less, a warming of about 0.4°C is required to be significant at the 3σ level (99% confidence level).
    There is no obviously significant warming trend in either the model or observations for the period 1958-1985. During the single year 1981, the observed temperature nearly reached the 0.4°C level of warming, but in 1984 and 1985 the observed temperature was no greater than in 1958. Early reports show that the observed temperature in 1987 again approached the 0.4°C level [Hansen and Lebedeff, 1988], principally as a result of high tropical temperatures associated with an El Nino event which was present for the full year. Analyses of the influence of previous El Ninos on northern hemisphere upper air temperatures [Peixoto and Oort, 1984] suggest that global temperature may decrease in the next year or two.

    The model predicts, however, that within the next several years the global temperature will reach and maintain a 3σ level of global warming, which is obviously significant. Although this conclusion depends upon certain assumptions, such as the climate sensitivity of the model and the absence of large volcanic eruptions in the next few years, as discussed in Section 6, it is robust for a very broad range of assumptions about CO2 and trace gas trends, as illustrated in Figure 3.

    Now, is this all true? Are we in the midst of “significant”, unusual warming? The answer requires a short detour into the world of statistics.

    “Standard deviation” is a measure of the average size of the short-term variations in a measurement, such as yearly measurement of temperature. A “3σ” (three sigma) level of significance means that the odds of such an event occurring by chance are about one in a thousand. However, there are a couple caveats …

    1) All of these types of standard statistical calculations, such as Hansen used above, are only valid for what are called “stationary i.i.d. datasets”. “Stationary” means that there is no trend in the data. If there is a trend in the data, all bets are off.

    For example, suppose we are measuring the depth of a swimming pool with someone swimming in it, and we can measure the depth of the water every second. Since someone is swimming in the pool, we get different numbers every second for the depth. After a while, we can determine the standard deviation (average size) of the waves that the person makes. We can then say that if the depth of the water is less than the average depth minus three times the standard deviation (average size) of the waves, this is a “three sigma” event, one that is unusual. It means, perhaps, that someone has jumped in the pool.

    Now suppose that we pull the plug on the pool, and the water level slowly starts to fall. Sooner or later, the trough of one of the waves from the swimmer will be less than the three sigma depth … does this mean that someone has jumped in the pool?

    No. It just means that initially we were dealing with “stationary” (trendless) data, so we could analyze the situation statistically. But once we started emptying the pool, we introduced a trend into the data, and at that point, we can no longer use standard statistics. In other words, all bets are off.
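
    The pool example can be sketched in a few lines. Everything in it is made up for illustration: a 2 m pool, 5 cm sinusoidal "waves" standing in for the swimmer, one reading per second, and a drain rate of half a millimetre per second. The three sigma threshold is calibrated on the stationary pool and then applied to both the stationary and the draining one.

```python
import math
import statistics


def pool_depth(t, drain_rate=0.0):
    """Depth in metres at second t: 2 m mean, 5 cm waves, optional drain."""
    return 2.0 + 0.05 * math.sin(0.7 * t) - drain_rate * t


# Calibrate on ten minutes of the stationary pool (plug in).
calibration = [pool_depth(t) for t in range(600)]
mean_depth = statistics.mean(calibration)
sigma = statistics.pstdev(calibration)
threshold = mean_depth - 3.0 * sigma  # the "someone jumped in" alarm level


def first_alarm(drain_rate, seconds=3600):
    """First second at which the depth drops below the threshold, if any."""
    for t in range(seconds):
        if pool_depth(t, drain_rate) < threshold:
            return t
    return None


stationary_alarm = first_alarm(drain_rate=0.0)
draining_alarm = first_alarm(drain_rate=0.0005)

print("threshold (m):", round(threshold, 3))
print("stationary pool alarm at:", stationary_alarm)
print("draining pool alarm at:  ", draining_alarm)
```

    With the plug in, the alarm never fires over the hour simulated. With the plug pulled it fires within a few minutes even though the waves are exactly the same size, so the threshold crossing reflects the trend rather than any unusual event.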

    The same is true for temperature: it always has a trend. As we know from the history of the world, temperature is never stable. It has trends on scales from months to millennia. Because of this, the analysis Hansen did is meaningless.

    2) “i.i.d” stands for “independent identically distributed”. “Independent” means that the numbers in the dataset are not related to each other, that one does not depend on another.

    But this is not true of temperature data. A scorching hot month is not usually followed by a freezing month, for example. This type of dependence on the previous data point is called “autocorrelation”. In other words, the temperatures are not independent of each other, so we can’t use standard statistical methods as Hansen did. We need to use different statistical methods when a dataset is autocorrelated.

    One of the effects of autocorrelation is that it increases the standard deviation. Hansen observes (above) that the standard deviation during the 1951-1980 period was 0.13°C, which makes a 3 sigma event three times that, or 0.39°C. But the temperature record is autocorrelated, which increases the standard deviation.

    Adjusted for autocorrelation, the standard deviation for the ’51-’80 period increases to 0.19°C, which makes a 3 sigma event 0.57°C, not 0.39°C.

    Now, the average temperature anomaly from 1951-1980 was -0.11°C. The average anomaly 1996-2005 was 0.39°C. So, despite Hansen’s dire 1988 predictions, and even ignoring the fact that the global temperature dataset is not stationary, it has not happened that “within the next several years the global temperature will reach and maintain a 3σ level of global warming” as Hansen predicted.
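
    The arithmetic behind that comparison, as a short sketch. The four temperature figures are the ones quoted above, and the warming is assumed to be the difference between the two period averages. The adjustment method is not stated, so the two helper functions show one common approach (a lag-1 autocorrelation and the AR(1) effective sample size n(1 - r)/(1 + r)) purely as an assumption; they are hypothetical and are not claimed to reproduce the 0.19°C figure.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of an evenly spaced series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    numerator = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


def effective_sample_size(n, r1):
    """Common AR(1) approximation: n_eff = n (1 - r1) / (1 + r1)."""
    return n * (1.0 - r1) / (1.0 + r1)


# Figures quoted in the comment (deg C).
baseline_anomaly = -0.11  # 1951-1980 average
recent_anomaly = 0.39     # 1996-2005 average
sigma_raw = 0.13          # Hansen's standard deviation
sigma_adjusted = 0.19     # after the autocorrelation adjustment

warming = recent_anomaly - baseline_anomaly
threshold_raw = 3 * sigma_raw
threshold_adjusted = 3 * sigma_adjusted
exceeds_raw = warming > threshold_raw
exceeds_adjusted = warming > threshold_adjusted

print(f"warming            = {warming:.2f}")
print(f"3 sigma (raw)      = {threshold_raw:.2f}, exceeded: {exceeds_raw}")
print(f"3 sigma (adjusted) = {threshold_adjusted:.2f}, exceeded: {exceeds_adjusted}")
```

    On these figures the 0.50°C rise clears the unadjusted 0.39°C threshold but falls short of the adjusted 0.57°C one, so the conclusion rests on the autocorrelation adjustment.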

    Will we see such an event? Almost assuredly … because we can’t actually ignore the fact that the temperature is not stationary. Because of the trend, even if we adjust for autocorrelation, we cannot say that a particular data point in a series containing a trend is significant at any level. So sooner or later, we will see a three sigma event, which because of the trend won’t mean anything at all … but we haven’t seen it yet.

    My best to everyone,

    w.

  159. MarkR
    Posted Oct 28, 2006 at 1:11 AM | Permalink

    From Steve Milloy, JunkScience.com

    Check the link out for maths of exaggerated warming trend forecast.

    “Real-world measures suggest moderate to strong negative feedback, currently unnamed and un-quantified, mitigates the Earth’s thermal response to additional radiative forcing from both human activity and natural variation. Justification for amplification factors >2.5 for unmitigated positive feedback mechanisms is not evident in empirical measures. It is not clear whether any amplification factor should be applied or even what sign any such factor should be. Nor is there evidence to support such large ? values in GCMs. Division of real-world measures continue to exhibit the same surface thermal response derived by Idso for contemporary local, regional and global climate, for ancient climate under a younger, weaker sun and for Earth’s celestial neighbors, Mars and Venus.

    In the absence of support for amplification factors and in view of their erroneously large ? values it is apparent that the wiggle fitting so far achieved with climate model output is accidental or that these models contain equally large opposing errors in other portions of their calculations such that a comedy of errors produce seemingly plausible results in the short-term. In either case no confidence is inspired.

    On balance of available evidence then the current model-estimated range of warming from a doubling of atmospheric carbon dioxide should probably be reduced from 1.4 – 5.8 °C to about 0.4 °C to suit observations or 0.8 °C to accommodate theoretical warming, and that’s including ΔF of 3.7 Wm-2 from a doubling of pre-Industrial Revolution atmospheric carbon dioxide levels, a figure we suspect is also inflated.

    The bottom line is that climate models are programmed to overstate potential warming response to enhanced greenhouse forcing by a huge margin. The median estimate 3.0 °C warming cited by the IPCC for a doubling of atmospheric carbon dioxide is physically implausible.”

    “We do not know why modelers persist in using their 2.5 times amplification factor when empirical measure repeatedly demonstrates 0.5 to be the correct ratio. We would like to think competition for a share of the multi-billion-dollar global warming research largesse had nothing to do with it but we can see how difficult it would be to get published in such a frenetic field with results reflecting trivial response. With such a large cash cow to roast we expect heat settings to remain on “high” for the foreseeable future.”

    Link

  160. Posted Oct 28, 2006 at 3:24 AM | Permalink

    # 158

    Careful with the definitions. SteveM should put formal definitions of statistical terms somewhere, so we could talk in the same language (in my case, equations + bad English :). See e.g. #156 in http://www.climateaudit.org/?p=833#comments . (I’m not a statistician, but here’s what I think; please correct me if I’m wrong.)

    1) All of these types of standard statistical calculations, such as Hansen used above, are only valid for what are called “stationary i.i.d. datasets”. “Stationary” means that there is no trend in the data. If there is a trend in the data, all bets are off.

    i.i.d. is enough. An i.i.d. process is a stationary process, but not necessarily vice versa. However, talking about 3-sigmas with 99 % confidence implies that Hansen means Gaussian i.i.d. So, he shows that it is very unlikely that global temperature is a realization of a Gaussian i.i.d. process. I agree with that.

    We can then say that if the depth of the water is less than the average depth minus three times the standard deviation (average size) of the waves, this is a “three sigma” event, one that is unusual.

    This is a good example. Calling a 3-sigma event unusual presumes a Gaussian distribution. But we can assume that it is normal that sometimes somebody jumps in the pool. Then the 3-sigma event is not very rare. But neither is the distribution Gaussian. Hmm, am I getting confusing again? ;)

    But once we started emptying the pool, we introduced a trend into the data, and at that point, we can no longer use standard statistics. In other words, all bets are off.

    But now here is a chance for statistical inference: after we get 5-sigmas we can drop the ‘Gaussian i.i.d.’ hypothesis.

    One of the effects of autocorrelation is that it increases the standard deviation.

    Yes, in autocorrelated case, sample standard deviation from small sample will usually underestimate the process standard deviation.

    In short: Hansen shows that global temperature is not a Gaussian i.i.d. process.
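
    The point about the sample standard deviation can be illustrated with a small simulation. This is a sketch under stated assumptions: the process is a stationary AR(1), x[k+1] = alpha * x[k] + w[k] with unit-variance Gaussian w, and alpha = 0.8 is an arbitrary illustrative value, not something estimated from a temperature record. The helper names are made up for the example.

```python
import math
import random
import statistics


def ar1_sample(n, alpha, rng):
    """Draw n consecutive values from a stationary AR(1) process
    x[k+1] = alpha * x[k] + w[k], with w[k] ~ N(0, 1)."""
    process_sd = 1.0 / math.sqrt(1.0 - alpha ** 2)
    x = rng.gauss(0.0, process_sd)  # start in the stationary distribution
    values = []
    for _ in range(n):
        values.append(x)
        x = alpha * x + rng.gauss(0.0, 1.0)
    return values


def mean_sample_sd(n, alpha, trials=2000, seed=0):
    """Average the sample standard deviation over many samples of length n."""
    rng = random.Random(seed)
    sds = [statistics.stdev(ar1_sample(n, alpha, rng)) for _ in range(trials)]
    return statistics.fmean(sds)


ALPHA = 0.8  # illustrative value only
true_sd = 1.0 / math.sqrt(1.0 - ALPHA ** 2)
short = mean_sample_sd(30, ALPHA)
long_ = mean_sample_sd(300, ALPHA)
print(f"process sd {true_sd:.2f}, n=30: {short:.2f}, n=300: {long_:.2f}")
```

    In this sketch the average sample standard deviation sits below the process value for both lengths, and the shortfall is larger for the shorter sample.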

  161. Willis Eschenbach
    Posted Oct 28, 2006 at 3:39 AM | Permalink

    Thanks, UC, for the clarification.

    My understanding, open to correction, is that “stationary” and “iid” are different things. All iid means is that they are not autocorrelated, and that they have the same distribution (gaussian, poisson, etc.). These data points might or might not contain a trend. Adding a linear trend to gaussian data merely produces a new distribution, let me call it “trended gaussian”. As long as all of the data points are “trended gaussian” in distribution, are they not “identically distributed”?

    Also, you say:

    Yes, in autocorrelated case, sample standard deviation from small sample will usually underestimate the process standard deviation.

    This is true regardless of the size of the sample.

    w.

  162. Posted Oct 28, 2006 at 3:59 AM | Permalink

    “stationary” and “iid” are different things

    Yes.

    All iid means is that they are not autocorrelated..

    Even more, i.e. they are independent. For Gaussian random variables uncorrelatedness implies independence, but not necessarily for other distributions.

    These data points might or might not contain a trend.

    Need to think about this.. Short samples can show a trend, but generally no trend.

    As long as all of the data points are “trended gaussian” in distribution, are they not “identically distributed”?

    And trended Gaussian is not stationary either, because the first moment (mean) changes over time.

    Read with caution, these are open to correction as well.

  163. bender
    Posted Oct 28, 2006 at 6:55 AM | Permalink

    “identical” means the distribution doesn’t change. If there is a trend, then a key parameter of the distribution – the mean – is changing.

    “non-autocorrelated” and “independent” are synonymous.

    “stationary” typically means first order *and* second order moments (mean, variance) do not change.
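
    A short sketch of why a “trended gaussian” series fails both tests: the mean differs between the two halves of the series, and the trend also inflates the standard deviation computed over the whole series. The slope, noise level, and function name are illustrative assumptions, not values fitted to any temperature record.

```python
import random
import statistics


def trended_gaussian(n, slope, sd, seed=0):
    """Gaussian noise with a linear trend added: x[t] = slope * t + noise."""
    rng = random.Random(seed)
    return [slope * t + rng.gauss(0.0, sd) for t in range(n)]


# Illustrative numbers only.
NOISE_SD = 0.13
series = trended_gaussian(n=200, slope=0.02, sd=NOISE_SD)
first_half, second_half = series[:100], series[100:]

mean_first = statistics.fmean(first_half)
mean_second = statistics.fmean(second_half)
sd_whole = statistics.stdev(series)

print(f"mean, first half:   {mean_first:.2f}")
print(f"mean, second half:  {mean_second:.2f}")
print(f"noise sd:           {NOISE_SD:.2f}")
print(f"sd of whole series: {sd_whole:.2f}")
```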

  164. Posted Oct 28, 2006 at 8:37 AM | Permalink

    There might be many definitions, but

    “non-autocorrelated” and “independent” are synonymous

    Is this true? For independent variables F(x1,x2)=F(x1)F(x2); for uncorrelated variables E(x1x2)=E(x1)E(x2), where F is the distribution function. Being uncorrelated is a weaker property than being independent.

    “stationary” typically means first order *and* second order moments (mean, variance) do not change.

    E(x) does not change over time, and autocorrelation can be expressed as R(delta_t); I think that is the definition of a weak-sense stationary process (?)

  165. bender
    Posted Oct 28, 2006 at 12:05 PM | Permalink

    UC, if x1 is dependent on x2, and so on, then xi is, by definition, autocorrelated. Now, the ability to infer autocorrelation based on a sample autocorrelation statistic, that’s an issue. It’s nigh impossible if the series is very short. The dependency among xi needs to be somewhat persistent before the sample autocorrelation coefficient becomes significantly different from zero. The dependency may exist, but not be detectable via a (small) sample autocorrelation coefficient.

    Similarly, if the autocorrelation relationship varies from one (or some) xi to the next, then a dependency is always there, but it will not yield a significant autocorrelation coefficient, because it is not one, homogeneous dependency. This illustrates your point.

    But then I ask you: how would you characterize this moving dependency? You can’t. Therefore you have the same problem with the term “dependence” as you do with “autocorrelation” – whether the series is short, or whether the dependency varies.

    Sure, the terms can mean different things; but they are synonymous in the context you and Willis were using them.

  166. cytochrome_sea
    Posted Oct 28, 2006 at 3:18 PM | Permalink

    Just a nitpick, I don’t think Lambert got it “100%” right, as he said, “You have given the definition for the one starting in 2006, not the one in his 1988 paper” but it was first outlined in the 98 paper.

  167. Willis Eschenbach
    Posted Oct 28, 2006 at 4:53 PM | Permalink

    Man, I love this blog. I learn more here in one day than I can even process. Thanks, guys.

    My original post that led to this discussion was originally intended for a less mathematically knowledgeable blog, so I tried to simplify the math, describing the standard deviation as the “average size” of the residuals, which is not strictly true, etc.

    w.

  168. Posted Oct 29, 2006 at 3:02 AM | Permalink

    UC, if x1 is dependent on x2, and so on, then xi is, by definition, autocorrelated.

    What is your definition for autocorrelation? I would use ‘E(x1x2)=E(x1)E(x2) means no autocorrelation’. My point is, if we criticize Hansen about faulty stats, we should be quite accurate with our terms then. (I know what Willis means and I agree with him)

    Let’s see if I find an example of dependent but non-autocorrelated process.. Change AR1 x(k+1)=alpha*x(k)+w(k) to x(k+1)=alpha*x(k)*w(k), would that do?
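
    The proposed multiplicative process can be tried numerically. This is a sketch, assuming a single step from x(k) ~ N(0, 1), zero-mean i.i.d. Gaussian w(k), and alpha = 1; all of these are illustrative choices, and `correlation` is a small helper written for the example. The values come out nearly uncorrelated, while their squares are clearly correlated, which is one sign of dependence.

```python
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)
ALPHA = 1.0  # illustrative choice
N = 20000    # number of independent (x(k), x(k+1)) pairs

x_now = [rng.gauss(0.0, 1.0) for _ in range(N)]
w = [rng.gauss(0.0, 1.0) for _ in range(N)]           # zero mean, i.i.d.
x_next = [ALPHA * x * wk for x, wk in zip(x_now, w)]  # x(k+1) = alpha*x(k)*w(k)

r_values = correlation(x_now, x_next)
r_squares = correlation([x * x for x in x_now], [x * x for x in x_next])
print(f"corr of values:  {r_values:+.3f}")
print(f"corr of squares: {r_squares:+.3f}")
```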

  169. bender
    Posted Oct 29, 2006 at 8:19 AM | Permalink

    #168 is consistent with #165.

    Of course, the nature of w(k) matters. The correlation in x(k) will degrade as the variance in w(k) increases. That does not mean x(k) is not autocorrelated. It means the autocorrelation coefficient is a weak model for describing the autoregressive effect of alpha.

    Your caution about definitions, I imagine, stemmed from this line in #158:

    A scorching hot month is not usually followed by a freezing month, for example. This type of dependence on the previous data point is called “autocorrelation“.

    If you “agree with Willis” on this point, then why raise the issue about definitions, particularly the distinction between “dependence” and “autocorrelation”?

    His statement is accurate enough for a blog and accurate enough to make his case. If he wanted to be more accurate he might have said:

    This type of dependence on the previous data point is what leads to “autocorrelation”.

    But then we’re splitting hairs here. And I just don’t think it’s necessary. Last post.

  170. Posted Oct 29, 2006 at 9:11 AM | Permalink

    #168 is consistent with #165.

    I don’t agree, E[x(k)x(k+1)]=0 and that means no autocorrelation (to me.) w(k) matters, that’s true, should add E[w(k)]=0, w(k) i.i.d.
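
    The step behind E[x(k)x(k+1)]=0 can be written out, assuming w(k) is independent of x(k) with E[w(k)]=0 and that E[x(k)^2] is finite:

```latex
\begin{aligned}
E[x_k\,x_{k+1}] &= E[x_k \cdot \alpha\,x_k\,w_k]
                 = \alpha\,E[x_k^2]\,E[w_k] = 0, \\
E[x_{k+1}]      &= \alpha\,E[x_k]\,E[w_k] = 0, \\
\text{so}\quad E[x_k\,x_{k+1}] &= E[x_k]\,E[x_{k+1}] .
\end{aligned}
```

    That matches the ‘E(x1x2)=E(x1)E(x2)’ definition of no autocorrelation used in #168, even though x(k+1) plainly depends on x(k).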

    Your caution about definitions, I imagine, stemmed from this line in #158:

    My caution about definitions is in #160. You added

    “non-autocorrelated” and “independent” are synonymous.

    which I didn’t agree with.

    Last post.

    OK.

  171. Posted Dec 12, 2007 at 8:50 AM | Permalink

    Out of curiosity, have Hansen et al 2006 ever provided a source for the values in their charts? I’d like to see the number in the graph.

    FWIW, I think the difficulty with lining everything up in 1958 is that the initial conditions (IC) for the runs were probably midnight, Dec. 31, 1957. Hansen et al. doesn’t say this, but one must provide initial conditions to a run, and setting the IC to match that particular time is the only thing that makes any real sense.

    The annual average temperatures in 1958 did rise, and, however they initialized the model, the model’s didn’t.

    It does make complete sense to put HADCRUT and GISS on the same time basis, so what Willis does makes sense there. You need to normalize everything to the same year.
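
    Normalizing to the same year amounts to subtracting each series’ value in that year. A minimal sketch follows; the two series hold made-up numbers purely for illustration (they are not HadCRUT, GISS, or model output), and `rebase` is a hypothetical helper name.

```python
def rebase(series, base_year):
    """Shift an anomaly series so that its value in base_year is zero."""
    offset = series[base_year]
    return {year: round(value - offset, 2) for year, value in series.items()}


# Made-up anomalies in °C, for illustration only.
dataset_a = {1958: 0.08, 1959: 0.06, 1960: 0.01}
dataset_b = {1958: -0.02, 1959: -0.05, 1960: -0.09}

a_rebased = rebase(dataset_a, 1958)
b_rebased = rebase(dataset_b, 1958)
print(a_rebased)
print(b_rebased)
```

    Rebasing on a single year passes that year’s noise into the offset; averaging over a multi-year reference period is a common alternative.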

    I’m actually not sure quite what is correct to do about matching or not matching start points.

    I could be wrong, but it seems to me there are challenges revolving around setting initial conditions. You can’t set them for a full average year; you must set them for a precise time. How well can any modeler know everything on Dec. 31, 1957? Whatever choices are made have some effects on climate. Some choices (for example, individual storms) may have short-term effects on predicted climate; others, long-term effects. (Anomalously high or low amounts of stored heat in the oceans could have a quite long-term effect.)

    The sensitivity to these initial conditions is not discussed in the 1988 papers, and I don’t run these models, so I don’t know.

    But… anyway, did anyone ever find the data for the Hansen graph on line? I’m hankerin’ for the unshifted stuff, and I’d like 2006 and 2007!

  172. Sam Urbinto
    Posted Dec 12, 2007 at 1:15 PM | Permalink

    I’d say the case of “If August was 90 F, September won’t be -30 F” has little or nothing to do with statistics, and it’s certainly not iid. It’s a pattern of nature. (Or so I say!)

    If you use statistics to infer something about an unknown aspect of some sample, you can use the z-test to see if the difference between that sample mean and the population mean is large enough to be significant. In order to satisfy the central limit theorem (the mean of enough observations of variables with a finite variance will be approximately normally distributed, i.e. Gaussian or bell-curve), the observations are considered beforehand to be i.i.d. by default.
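
    For concreteness, a z-test for a sample mean can be sketched as below. It assumes i.i.d. observations and a known population standard deviation; the function name and the example numbers are hypothetical and chosen only to exercise the calculation.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, population_mean, population_sd, n):
    """Two-sided z-test for a sample mean, assuming i.i.d. observations
    and a known population standard deviation."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers, for illustration only.
z, p = z_test(sample_mean=0.5, population_mean=0.0, population_sd=1.0, n=36)
print(f"z = {z:.2f}, p = {p:.4f}")
```

    If the observations are autocorrelated, the standard error used here is too small, so the test overstates significance.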

    A collection of random variables is i.i.d. (independent and identically distributed) if each has the same probability distribution and they are all independent of each other. If observations in a sample are assumed to be iid for statistical inference, it simplifies the underlying math, but may not be realistic from a practical standpoint.

    Examples of iid:

    Spinning a roulette wheel
    Rolling a die
    Flipping a coin

    Ceteris paribus of course.

    (A statement about a causal connection between two variables should rule out the other factors which could offset or replace the relationship between the antecedent (first half of the hypothetical proposition, in this case throwing a die) and the consequent (the second half, in this case that the die will land without influence that would make the throw be not iid in the sample, such as weighting one side of it before throwing it).)

  173. John M
    Posted Jan 1, 2008 at 1:46 PM | Permalink

    FWIW, here is my crude (sorry, I had to use Paint) update of the Hansen plot. For the sake of argument, I have done it on the “apples to apples” basis supported by Peter Hearnden in #10. I have estimated the GISS anomaly for 2007 to be +0.74, based on the J-N data and the fact that it looks like Dec will be about the same or cooler than Nov.

  174. Posted Jul 11, 2008 at 6:39 AM | Permalink

    It would appear that Hansen’s 1988 climate models are beginning to diverge from the actual temperature observations.

    The latest GISS readings are shown in the diagram below:

    (Figure: Scenarios A, B and C Compared with Measured GISS Surface Station and Land-Ocean Temperature Data)

    The original diagram can be found in Fig 2 of Hansen (2006) and the latest temperature data can be obtained from GISS. The red line in the diagram denotes the Surface Station data and the black line the Land-Ocean data. My estimate for 2008 is based on the first six months of the year.

    Scenarios A and C are upper and lower bounds. Scenario A is “on the high side of reality” with an exponential increase in emissions. Scenario C has “a drastic curtailment of emissions”, with no increase in emissions after 2000. Scenario B is described as “most plausible” and closest to reality.

    Hansen (2006) states that the best temperature data for comparison with climate models is probably somewhere between the Surface Station data and the Land-Ocean data. A good agreement between Hansen’s premise and measured data is evident for the period from 1988 to circa 2005; especially if the 1998 El Nino is ignored and the hypothetical volcanic eruption in 1995, assumed in Scenarios B and C, were moved to 1991 when the actual Mount Pinatubo eruption occurred.

    However, the post-2005 temperature trend is below the zero-emissions Scenario C and it is apparent that a drastic increase in global temperature would be required in 2009 and 2010 for there to be a return to the “Most-Plausible” Scenario B.

    Will global warming resume in 2009-2010, as predicted by the CO2 forcing paradigm, or will there be a stabilisation of temperatures and/or global cooling, as predicted by the solar-cycle/cosmic-ray fraternity?

    Watch this space!

    P.S.: It would be very interesting to run an “Actual Emissions” scenario on the Hansen model to compare it with actual measurements. The only comment that I can glean from a literature survey is that Scenario B is closest to reality; however, it would appear that CO2 measurements are above this scenario while, unexpectedly, methane emissions are significantly below it. Does anyone have the source code and/or input data to enable this run?

One Trackback

  1. By Thoughts on Hansen et al 1988 « Climate Audit on Jan 19, 2012 at 12:42 PM

    [...] In 1988, Hansen made a famous presentation to Congress, including predictions from then current Hansen et al (JGR 1988) online here . This presentation has provoked a small industry of commentary. Lucia has recently re-visited the topic in an interesting post ; Willis discussed it in 2006 on CA here . [...]
