Verification r2 Revealed!!!

For the first time, a member of the Hockey Team (Ammann and Wahl) has admitted that the verification r2 for the early steps of MBH98 is catastrophic. The results confirm our calculations – as we predicted. They have not explained the justification for issuing a press release declaring all our claims "unfounded", and UCAR has not retracted the press release. I’ve left this as a sticky for a little while since it’s rather fun.

Ammann and Wahl is now in press. When I saw him in San Francisco, Ammann was not going to report the verification r2. I urged him, in as nice a way as possible, not to "replicate" Mann to the extent of once again withholding the verification r2 statistic. He didn’t seem so inclined. That’s one of the reasons why I’ve turned up the heat on Ammann (who seems nice enough, but who has fallen in with a rough crowd).

Guess what – buried deep in Ammann and Wahl [2006] are the verification r2 scores for their MBH emulation. Maybe our complaints to UCAR and publicity at the blog made them decide that wisdom was the better part of valor – aside from the risk of scientific misconduct. Or maybe the "provisional acceptance" by Climatic Change included a requirement that they disclose these results. Regardless, the results are on the table. (How long did it take?) They completely vindicate our claims in GRL. The verification r2 for the 15th century step reported by A&W is 0.018. Some steps are even worse. Here is their table. I’ll parse through their commentary a little later and post some more news. These are the guys that issued a UCAR press release saying that all our results were "unfounded".

Wahl and Ammann Table 1S.


  1. John A
    Posted Mar 6, 2006 at 10:46 AM | Permalink

    What the hell happened in the early 18th Century? It must be the lowest r2 ever published in statistical history.

    And didn’t Ammann talk to Mann about getting the story right? r2 either does not matter and shouldn’t be bothered with (Mann at NAS Panel), or does matter and should be reported.

  2. bob
    Posted Mar 6, 2006 at 10:48 AM | Permalink

    I noticed that the article (page 43) contained an acknowledgement to panel member D. Nychka (and someone else) of NCAR for statistical and programming aid.

  3. John Davis
    Posted Mar 6, 2006 at 11:04 AM | Permalink

    I see that there is also a figure 1S (or S1) and a paragraph of text, which purports to demonstrate that R2 is useless and RE is great. I’m not a statistician so I can’t comment, except to note that the comparisons shown are between the “reconstruction” and the “actual” results – which seems to me to be a difficult trick to pull off in real life.

  4. Steve McIntyre
    Posted Mar 6, 2006 at 11:05 AM | Permalink

    Mann cited Ammann and Wahl during his presentation as vindicating him. Nychka did not mention that he’d helped with this article.

    More importantly, even though he’s being relied upon as one of the few statistically oriented members of the panel, he failed to ask Mann about the verification r2 statistics. Another panelist who was not a statistician asked and Mann avoided the question saying that it would have been “silly” to calculate the r2 statistic. Nychka did not intervene. I think that he should have stepped up.

    Nychka came up and talked to me – he seems like a very decent guy who I’d like in 99.9% of all situations. However, I think that it’s a very bad decision for him to be involved in this, especially now that it’s turned up that he worked on Ammann’s article. He didn’t report this in his online bio put out for comment. As I say, I liked him, he just shouldn’t be doing this.

    BTW I told North that I withdrew my concern over Cuffey. Cuffey went to the effort on Thursday of identifying me before we presented and re-assuring me that he distinguished between the issue of the validity of the multiproxy studies and the larger issue of GW and AGW and would not let his opinions on the one prevent a proper evaluation of the other. He then proved to be the most lively questioner and a real force on the panel.

  5. Ray Soper
    Posted Mar 6, 2006 at 11:13 AM | Permalink

    Steve, for us statistical dullards, newbies to the site, and lurkers, could you please explain (in simple terms) what a low R2 actually means? What does a high R2 mean? Below what threshold does R2 tell us that the statistical significance is so low that the study is worthless? Thanks

  6. John Lish
    Posted Mar 6, 2006 at 11:24 AM | Permalink

    Ray – I use an online statistics glossary to follow the arguments.

    Multiple Regression Correlation Coefficient

    The multiple regression correlation coefficient (R2) is a measure of the proportion of variability explained by, or due to, the regression (linear relationship) in a sample of paired data. It is a number between zero and one, and a value close to zero suggests a poor model.
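    As a rough numerical illustration of that glossary definition (my own sketch on synthetic series, nothing from the papers under discussion):

```python
import numpy as np

# R2 as the squared correlation between paired series: near 1 for a
# strong linear relationship, near 0 when there is none.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y_related = x + 0.1 * rng.normal(size=100)   # strong linear relationship
y_unrelated = rng.normal(size=100)           # no relationship at all

r2_related = np.corrcoef(x, y_related)[0, 1] ** 2      # close to 1
r2_unrelated = np.corrcoef(x, y_unrelated)[0, 1] ** 2  # close to 0
```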

  7. Douglas Hoyt
    Posted Mar 6, 2006 at 11:35 AM | Permalink

    Usually statistical packages will give error bars for R2 and CE. I wonder what they are? It would tell you if, for example, R2 is or is not statistically different than zero.
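    A back-of-the-envelope check of the "different from zero" question (my own sketch: it ignores autocorrelation, and the ~48-year verification period length is an assumption for illustration):

```python
import math

# t-statistic for testing whether a correlation differs from zero:
# t = r * sqrt((n - 2) / (1 - r^2)), compared against roughly 2.0 for
# 5% significance. Autocorrelation would weaken this further.
def corr_t_stat(r2, n):
    r = math.sqrt(r2)
    return r * math.sqrt((n - 2) / (1 - r2))

t = corr_t_stat(0.018, 48)   # the reported 15th-century verification r2
# t comes out below 1, nowhere near significance
```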

  8. Posted Mar 6, 2006 at 11:44 AM | Permalink

    Steve, while you are getting work orders, could you please explain how the validation statistics are obtained outside the range of the temperature measurements? I can understand an r2 of reconstruction vs observed temperatures, but what if the temperatures are not observed?

  9. Ray Soper
    Posted Mar 6, 2006 at 11:44 AM | Permalink

    re #6: Thanks John. I hadn’t thought of using an on-line statistical glossary – good idea.

    In his report on the thread “One observer’s report on the NAS panel” Ned comments as follows: “John Christy asked Mann about the r2 statistic. Mann said it was an inappropriate measure for these types of analyses.” On what basis could Mann say that? Could he have a point? If not, why not?

  10. john lichtenstein
    Posted Mar 6, 2006 at 12:51 PM | Permalink

    Such extreme differences between the calibration and verification tests indicate overfitting. They should have gone back to the drawing board and tried to build a less fancy model.
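    The overfitting point is easy to sketch numerically (a toy example of my own; nothing here is MBH’s actual procedure): regress a target on many pure-noise "proxies" over a short calibration window, and the calibration fit looks impressive while out-of-sample skill vanishes.

```python
import numpy as np

# Toy overfitting demo: 25 noise "proxies" fitted to a 40-point
# "calibration" target give a high calibration R2, but the same
# coefficients show essentially no verification skill.
rng = np.random.default_rng(11)
n, p = 40, 25
X_cal, X_ver = rng.normal(size=(n, p)), rng.normal(size=(n, p))
y_cal, y_ver = rng.normal(size=n), rng.normal(size=n)  # unrelated targets

beta, *_ = np.linalg.lstsq(X_cal, y_cal, rcond=None)

def r2(a, b):
    return np.corrcoef(a, b)[0, 1] ** 2

cal_r2 = r2(X_cal @ beta, y_cal)   # inflated by the 25 free coefficients
ver_r2 = r2(X_ver @ beta, y_ver)   # collapses out of sample
```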

  11. Steve McIntyre
    Posted Mar 6, 2006 at 1:33 PM | Permalink

    #7. MBH98 reported that the 99% significance benchmark for r2 was 0.34 in the present circumstances. One can argue about effects of autocorrelation, but I’d be content with using 0.34 as a benchmark for now.

    #9. One point against Mann’s position is that he said he used the verification r2 statistic in MBH98 and has used it elsewhere. The r2 statistic is much more widely used than the RE statistic, and disclosure of results like the ones shown here would probably have caused everyone to laugh at MBH98. Now the positions are entrenched, so one sees different arguments emerging. The Hockey Team are scrambling like crazy. Wahl and Ammann [Clim Chg 2006] purport to argue why r2 should not be used. I think that their argument would have been laughed out of court if there were no entrenched positions, and I expect it to be laughed out of court anyway.

    #10. The whole she-bang is driven by bristlecones. There is a spurious fit in the calibration period through huge overfitting (they should have read my notes on the linear algebra of MBH98), and then a spurious RE because the verification period is not really independent of the calibration period when you have big autocorrelations in both bristlecones and temperature.

    I’ve just gone through W&A in detail. It’s going to be very hard for me not to use a lot of adjectives to describe these guys.

  12. Martin Ringo
    Posted Mar 6, 2006 at 1:40 PM | Permalink

    Re #1: When you do Ordinary Least Squares regression (OLS) without a constant term or the equivalent (e.g. a set of dummy variables), you can get a negative R-squared. The reason is that the explanatory power of the independent variables (without the constant term) is less than the explanatory power of the sample mean. With OLS models with a constant term, the constant term acts (with the other regressors) as an estimate of the sample mean of the dependent variable: average(Y) = constant^ + b^ * average(X), where the “^” denotes a regression estimate. Hence the regression can do no worse than the sample mean, implying the R-squared must be equal to or greater than zero.

    The R-squared of a regression is just the squared correlation of the predicted value of the dependent variable (using the regression coefficient estimates) and its actual value. This same statistic, R2 if you like, can be applied to periods other than the one the regression was calculated on (e.g. the “calibration” and “verification” periods). When the r-squared (using the lower case “r” to denote the “out of sample” calculation) is calculated for a verification period, its range is -1 to +1 even if there is a constant in the regression model. The reason for this is that the constant (again, with the other regressors) is not guaranteed to capture the sample mean of the verification sample, as is the case with the R-squared, with a constant, in the estimation sample.

    Re #5: Ray, I presume that you are interested in the R2 for a verification test. And there the answer depends on the context. Suppose you were an analyst for the CIA during the Cold War, and estimated a model of the Soviet economy. Your boss said we need a verification. And because you thought ahead, you kept part of your data out of your estimation so as to use it for testing your model. You test it and you come up with an R2 of zero! Your boss fires you and goes back to the previous estimations. Unfortunately, the CIA would have been much better off with your model with its zero R2, because it would have been a reasonable estimate of the mean (which, if you recall the Cold War history, the CIA managed to blow by a factor of roughly two).

    However, climate reconstruction ain’t Cold War economic forecasting. It is true that we want estimates of the past mean temperatures, but we also want to see the patterns of variability. An R2 of near zero gives us little confidence in our ability to predict the magnitude of major changes in temperature. The climate crowd might argue that if they successfully test against the low frequency (long period) changes, that is good enough. I find this hard to believe, because if one gets the major changes, i.e. the low frequency changes, right, then getting a near zero R2 implies a strong negative correlation with the high frequency changes: something that could happen by coincidence but isn’t likely, in comparison to a successful test on low frequency having a bias. (Note that this is not a statistical theorem, just an old practitioner’s judgment on model testing.)

    So in summary, for the reconstruction of temperature a common sense interpretation of the needed R2 is probably as good as an expert’s. For annual data, something at or over 20% seems reasonable to me. [Note: if you look at von Storch’s PowerPoint presentation for NAS, his statistic that measures the change in the mean of two subsamples (e.g. two 25 year periods) over the standard deviation of those periods can be viewed as a dependent variable, and one can calculate a quasi lower-frequency, R2-type statistic, which should get bigger as the period increases. I don’t know if von Storch has provided estimated rates-of-change, holding-significance-constant tables, but they could be calculated if one wanted to argue a more statistically valid low-frequency explanatory power test.] In the absence of a demonstration of the minimum R2 (or RE or whatever) needed to capture swings of, say, 0.5 degree C — or maybe even with it — just ask the following question: if we have a model that forecasts future temperature changes of say 2 or 3 degrees C, how much explanatory power should that model have in a test (i.e. not used in estimation) period with 2 to 3 degree changes? And if I make the test period with only 0.5 to 1.0 degree C changes, would the R2 from the previous question be good enough?
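    Martin’s first point (negative R-squared without a constant term) is easy to see numerically — a sketch of my own, not anyone’s actual model:

```python
import numpy as np

# OLS without a constant term can yield a negative R-squared: the
# no-intercept fit is forced through the origin, so it can do far
# worse than simply predicting the sample mean.
rng = np.random.default_rng(7)
x = rng.normal(size=100)
y = 10 + rng.normal(size=100)      # large mean, unrelated to x

b = np.sum(x * y) / np.sum(x * x)  # no-intercept least-squares slope
resid = y - b * x
r2_no_const = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
# r2_no_const is strongly negative here
```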

  13. fFreddy
    Posted Mar 6, 2006 at 1:45 PM | Permalink

    Re Steve, #11

    “…and I expect to be laughed out of court anyway”

    Umm, you might want to insert an “it” there…

    Would it be too cynical to wonder why this paper became available today, instead of, say, last Wednesday ?

  14. per
    Posted Mar 6, 2006 at 1:50 PM | Permalink

    an interesting paper, wouldn’t touch some of the logic with a barge pole 🙂
    did M manage to cite this paper with a straight face ? How did it pass peer-review ?


  15. Steve McIntyre
    Posted Mar 6, 2006 at 2:05 PM | Permalink

    Not only did he cite it with a straight face, he’s relying on it. Also it’s not just Mann who cited it. Sir John Houghton cited it to a Senate Committee as supposedly showing that our claims were all “unfounded”. Also IPCC.

    As to peer reviewers, I was a peer reviewer for the first submission of the article. I wrote a very critical review. Obviously I was in controversy, but I usually make points pretty objectively. One of the things that I objected to was this whole notion that we were presenting an “alternative reconstruction” (“our” reconstruction being a bristlecone-free version of MBH98). A&W and Schneider both knew that this paper misrepresents this but don’t seem to have cared.

    In the first version, Ammann tried to finesse the entire R2-RE issue by withholding the adverse R2 results. I asked for the values as a reviewer but he refused. I guess that the pounding eventually succeeded. Ammann’s lucky that he finally disclosed these results. At least he doesn’t have quite as much of a mess to deal with as he would have if he’d tried to bluff it out as he wanted to do.

    I notice that UCAR has not retracted their claim that all our claims were “unfounded”. Buncha…

  16. jae
    Posted Mar 6, 2006 at 3:14 PM | Permalink

    And you STILL don’t believe in conspiracies?

  17. Martin Ringo
    Posted Mar 6, 2006 at 3:17 PM | Permalink

    Re #8: David, and Steve correct me if I am wrong, the Proxy Network MBH period refers to the set of proxies that cover (go back through) that period, and not the time period for a verification. There are different groups because not all proxies extend back the same length. My presumption is that the verification takes place in the same pre-estimation period for all of the proxy groupings (subject to limitations from missing observations in that period), but the particular set of proxies and their coefficients (from the PCA and calibration) that make up the predictor change by period. They are then all spliced together by some process of weighting which I still do not get.

  18. Steve McIntyre
    Posted Mar 6, 2006 at 3:42 PM | Permalink

    #17 – there are 11 MBH “steps”, each with a different roster of proxies, ranging from 22 in the AD1400 step to 112 in the AD1820 step. The reconstruction in the 1854-1980 period varies for each step and has different statistics. Mann never reported the unspliced versions (and has not to this day.) A long time ago, I simply wanted to apply some simple econometric tests to the AD1400 reconstruction, checking for autocorrelated errors etc. However, Mann, NSF and Nature all refused to provide this information, and here we are.

    After all this time, we finally have a Hockey Team admission that the verification r2 is exactly what we said. Nonetheless, UCAR has not retracted their claim that all our claims are “unfounded”. We have written to them about this, but Anthes blew us off.

  19. Andre Bijkerk
    Posted Mar 7, 2006 at 8:25 AM | Permalink

    I wonder how Buerger and Cubasch and all the decision-making fit into all this (Thread nr 511). Their paper is not in the reference list.

  20. Ross McKitrick
    Posted Mar 7, 2006 at 9:38 AM | Permalink

    At NAS someone asked R. D’Arrigo (as I recall) about using calibration period residuals to calculate confidence intervals, and what the confidence intervals would be if, instead, you used the verification period residuals and the r2 was very low. She said, more or less, “they’d go from the floor to the ceiling”–which is of course correct. When Mann was being prodded with soft pillows over whether they calculated the r2 he said “No, that would be silly”. Putting it all together: they didn’t report the r2 over the verification interval because it would be silly to show everyone that the confidence intervals go from the floor to the ceiling.
    I hope someone on the Nature editorial board is willing to look at the Table above and ask the forensic question: Could MBH have published their paper and made the claims they did, if these numbers were public knowledge at the time?

    • Skiphil
      Posted Mar 1, 2013 at 3:46 PM | Permalink

      WOW….. I’m not asking for room service but wondering if anyone still arriving at this and related threads can think of ways to address these problems now. Even after so many years, can a public case be made that MBH98 and MBH99 should be retracted? It seems that they were always sold under false pretences….

  21. Ross McKitrick
    Posted Mar 7, 2006 at 10:13 AM | Permalink

    Might have been G. Hegerl who first said “floor to ceiling.” Steve also used the phrase when he was presenting.

  22. Posted Mar 7, 2006 at 10:17 AM | Permalink

    Hey Steve, I was hoping to duplicate my analysis of random series with the RE statistic for my blog today but I am having a terrible time finding it. The web link for the supplementary information on MBH98 at Nature no longer works (!!!), the paper by Cook 1994 is not available online through the UC library, MBH98 only mentions a ‘resolved variance’ statistic which seems to not be the same, and the equation is not in Wahl and Ammann 2006. Would you be able to help? Thanks
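    For what it’s worth, the RE definition usually quoted in this literature can be sketched as follows (my own summary — check the Cook 1994 paper mentioned above for the authoritative form and its variants):

```python
import numpy as np

# RE = 1 - SSE(reconstruction) / SSE(calibration-period mean): the
# benchmark "prediction" is simply the calibration-period mean, and
# RE measures improvement over that benchmark.
def reduction_of_error(actual, recon, calib_mean):
    actual, recon = np.asarray(actual, float), np.asarray(recon, float)
    sse = np.sum((actual - recon) ** 2)
    ref = np.sum((actual - calib_mean) ** 2)
    return 1 - sse / ref

# A perfect reconstruction gives RE = 1; predicting the calibration
# mean itself gives RE = 0; anything worse than the mean goes negative.
```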

  23. Posted Mar 7, 2006 at 10:37 AM | Permalink

    For anyone interested, I found what I think is a public version in Climate Field Reconstruction under Stationary and Nonstationary Forcing, S. Rutherford and M. E. Mann, T. L. Delworth and R. J. Stouffer, 2003.

  24. John Hekman
    Posted Mar 7, 2006 at 12:42 PM | Permalink

    Steve and Ross

    What you have accomplished is truly stunning. It is a genuine David v. Goliath story. It seems that enough seeds have been planted with open-minded scientists now so that the examination of these paleoclimate studies will continue.

    One question. Maybe I missed it in the posts that you have made so far, but did anyone out and out ask Mann about Bristlecone pines in any meaningful way, or was there a discussion of it in your session that hinted at the message getting through?

    Best wishes for more success.

  25. Steve McIntyre
    Posted Mar 7, 2006 at 12:50 PM | Permalink

    #24. I’m writing up my notes. We discussed this in explicit detail. We showed Mann’s claim that his reconstruction was robust to the presence/absence of “all dendroclimatic indicators”, the non-robustness to bristlecones and the CENSORED files. Next day, no one on the panel asked how he reconciled his claims of robustness with the actual non-robustness.

  26. Pat Frank
    Posted Mar 7, 2006 at 1:33 PM | Permalink

    #25 “Next day, no one on the panel asked how he reconciled his claims of robustness with the actual non-robustness.”

    I’ll bet they were too embarrassed. Putting Mann so publicly on the spot may have been too psychologically painful for them. A commendable sentiment, wrongly indulged.

  27. Armand MacMurray
    Posted Mar 7, 2006 at 2:21 PM | Permalink

    I’ve been reading over AW to see how they justify dismissing verification r2 (section 2.3 in their paper). It seems that their argument is as follows:
    1) Even if a temperature reconstruction lacks verification period skill when evaluated on a high-frequency (e.g. annual) basis, it may be fine when evaluated on a low-frequency (e.g. multi-decadal) basis.
    2) For their climate change research purposes, low-frequency temperature reconstructions are sufficient, because they are interested in long-term changes in mean temps.
    3) Thus, requiring verification period skill in the high-frequency domain may exclude useful information by eliminating those reconstructions with low-frequency domain skill, but lacking high-frequency domain skill. (this is where they use the Fig S1 examples)
    4) As they write,

    Thus, in this analysis the most generally appropriate temporal criterion from those listed is (3), a primary focus on low frequency fidelity in validation.


    This criterion ensures that objectively validated low frequency performance, at the time scale of primary interest, is not rejected because it is coupled with poor high frequency performance.


    Based on the above, one would expect AW to choose statistics for both calibration and verification period evaluation that are sensitive to low-frequency performance, but not high-frequency performance. However, they choose to use RE, which they note “…registers a combination of interannual[=high-frequency] and mean[=low-frequency] reconstruction performance…” !

    Isn’t it self-inconsistent to justify excluding statistics sensitive to high-frequency reconstruction skill and then to go ahead and select a statistic that is sensitive to that very aspect of reconstruction skill?

  28. jae
    Posted Mar 7, 2006 at 3:15 PM | Permalink

    I’m still astounded by the “cherry picking” — selecting only those series that show positive correlation with temperature. How can anyone with any knowledge of statistics justify that? Isn’t randomness still a necessary assumption for these statistical procedures? Have any of those guys tried to justify this? This procedure, alone, invalidates the studies, it seems to me. Am I missing something?

  29. Steve McIntyre
    Posted Mar 7, 2006 at 3:41 PM | Permalink

    #28. I haven’t posted up my notes on D’Arrigo. She put up a slide entitled “cherry picking”, then said that you have to cherry pick if you want to make cherry pie.

  30. Steve McIntyre
    Posted Mar 7, 2006 at 3:51 PM | Permalink

    #27. I’ve been going through A&W as well. All the stuff on RE and r2 has been added since the first draft, after which I was expelled as a reviewer (although the changes are in response to things that I said.) In Hughes’ presentation, he argued that you had to use annually resolved data – a program that he’s been advocating since Hughes and Diaz 1994 – which ends up promoting the primacy of tree rings. (As soon as you go through some of Moberg’s hairy series, you see some advantages to the well-dated annual series.) Hughes said that you need annual data to get degrees of freedom. If you have 50-year bins pace A&W, you have about 1 degree of freedom and can’t use any statistics.

    Look at their “statistical references” – there aren’t any. I exclude Wahl [2004] and Lytle and Wahl [2005] as not being third-party statistical references and not having anything to do with time series.

    And aside from all that, Mann said that he used the verification r2. Can you imagine this piffle appearing in the original Nature article – even Nature has its limits.
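    The degrees-of-freedom point can be illustrated with a common rule of thumb (my own sketch, using the AR(1) approximation n_eff = n(1-r1)/(1+r1); nothing here is from A&W):

```python
import numpy as np

# With strong lag-1 autocorrelation, the effective number of
# independent observations is far below the nominal sample size.
def effective_n(series):
    r1 = np.corrcoef(series[:-1], series[1:])[0, 1]  # lag-1 autocorrelation
    return len(series) * (1 - r1) / (1 + r1)

rng = np.random.default_rng(3)
white = rng.normal(size=500)       # ~500 effective observations
ar1 = np.empty(500)
ar1[0] = rng.normal()
for t in range(1, 500):
    ar1[t] = 0.9 * ar1[t - 1] + rng.normal()  # strongly autocorrelated
# effective_n(ar1) collapses to a few dozen
```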

  31. Hans Erren
    Posted Mar 7, 2006 at 4:00 PM | Permalink

    re 27: I would start with a low pass filter on the proxies and temperature and do my entire work on that.

    Unfortunately, most of the data “cleaning” on tree rings is very influential on low frequencies. How about population shifts of geographically narrowly distributed species?

  32. John Hekman
    Posted Mar 7, 2006 at 4:25 PM | Permalink

    re: #30, I’m pretty sure that choosing annual observations instead of less frequent ones “to get degrees of freedom” is invalid statistically. Ed Leamer, a prominent econometrician at UCLA, refers to this fallacy of using more frequent observations as “counting your wealth in small change.”

  33. John Hekman
    Posted Mar 7, 2006 at 4:28 PM | Permalink

    (adding to previous comment) when you go from, e.g. annual observations to monthly ones, the test of the validity of the change is not the r-squared statistic but the F statistic, and in my experience this move from annual to monthly data often raises the r-squared but lowers the F.

  34. Edward McCann
    Posted Mar 7, 2006 at 6:43 PM | Permalink

    Re 27: If one looks at any time history curve in the frequency domain, it is easy to see how the individual frequency curves make up the total. If one splits these curves up into low and high frequency curves, it is possible that the high frequency components have positive and negative maxes that could have a significant effect on the total curve. If the high frequency components are not reliable for climate-related time history curves, then how does one know whether something is missing? What do you think?

  35. Paul Penrose
    Posted Mar 7, 2006 at 10:38 PM | Permalink

    All this “cherry picking”, or selecting “appropriate” trees to sample would make sense if you were doing it on a biological basis. In other words, if you really understood, down to the cellular level, how these different species of trees react to temperature, CO2, moisture, and other factors, you might be able to select those trees which were most sensitive to temperature. You might also be able to reduce the effect of these other confounding factors in your analysis, thus producing a decent proxy for temperature. Short of that, I don’t see how any honest scientist could justify cherry picking. Not to mention cherry picking for the type of signal you are expecting (or wanting) to find!

  36. jae
    Posted Mar 8, 2006 at 10:28 AM | Permalink

    Folks: I plan to submit a proposal for a grant from the Federal Govt. to do a temperature reconstruction, and I would appreciate your comments on my approach. I plan to get cores from 1,000 river deltas from around the world and look at isotope levels in the yearly sediment layers. I figure I will get at least 100 cores that show some sort of correlation with surface temperatures since 1850. I will select these cores and throw the others out, since they would obviously have too much noise. I will then train the data on temperatures between 1850-1974. I will assume a linear relationship between isotope levels and temperature and I will also assume the population is normally distributed, so I can use my favorite statistical techniques. I may do some PC analyses, also, to see if I can get better relationships. I will then validate my series by comparing it to temperatures from 1975 to 1995 (I don’t like the data from 1995 to 2006, so I don’t plan to use it). If my correlation coefficient (r2) is poor, I plan to move on to the RE statistic. I figure I can get at least 0.2, which ain’t bad for paleoclimatology. If so, I plan to publish my results in Science or Nature. Since I am relatively new to the field and am somewhat wary of the peer review system, I will invite some of my more famous buddies to be co-authors, since they have buddies that are doing the peer reviewing. BTW, I will refuse to archive my data or methods, after spending so much Government money and time on this project. Also, if my R2 statistic is embarrassing, I will refuse to provide that, also. Don’t y’all think this would be a valuable scientific addition to the field?

  37. Posted Mar 8, 2006 at 10:59 AM | Permalink

    Reviewers Comments: This is an interesting and worthwhile study, but due to limited funds it is suggested that you reduce your suggested budget by 75% and resubmit. Based on previous work, it is possible that the objectives of your study would not be compromised if the field collection component was eliminated and the raw data you would have collected replaced with randomly generated numbers. This would produce a proposal within our budgetary constraints.

  38. Posted Mar 8, 2006 at 8:54 PM | Permalink

    Jae, your proposal will be rejected because your method is not sufficiently robust. You must not only throw away R2 if it is too bad. You must also be ready to throw away RE in the hypothetical case that it is bad instead of R2, and/or throw away deltas that are identified to insufficiently confirm global warming theory. Because there are 5 other projects that are much more careful in suppressing all doubts and in creating scary scenarios, your mediocre proposal cannot be accepted. 😉

    More seriously, this debate is very interesting. Could someone repeat Mann’s arguments why RE is so much better than R2 here?

  39. John Hunter
    Posted Mar 8, 2006 at 9:52 PM | Permalink

    A Tribute to Steve McIntyre

    Sorry, this is so good that I had to put it in the top post, but apologies if it has already appeared on this blog.

    It was gleaned from “Deltoid” at:

    and is the following posting by Ken Miles in response to:

    “McIntyre is in fact, an industry shill and has a powerful motive for promulgating the climate change pseudoscience he’s become famous for. If you think otherwise, you’re going to have to do better than this.”

    Ken’s response was spot on:

    “I’ve got to (partially) stick up for McIntyre here. Compared with most (all?) of the clowns who make the ranks of global warming skeptics, McIntyre is far and away the best of them. He does publish in peer reviewed journals (not just Energy & Environment) and he has raised some genuine concerns (such as access to data – I’m staying away from the more technical arguments as I don’t know enough to judge).

    He does tend to overplay his hand a bit, and climateaudit is a cesspool of idiots, but still credit should be given where it’s due.”

    I particularly liked the last sentence …..

  40. Posted Mar 8, 2006 at 10:02 PM | Permalink

    OK, it may be useful to publish their arguments why RE is supposed to be better than R2.

    Figure 1S Relationships of r2 and the “Reduction of Error” statistic (RE) to reconstruction performance, highlighting: (a and b) how r2 is sensitive to the interannual tracking of two time series, yet is insensitive to the relationship between the means of the same series; and (c) how r2 is insensitive to amplitude scaling. RE, on the other hand rewards accurate performance of capturing changes in mean state even when interannual tracking is quite poor (b), while it penalizes lack of capturing changes in mean state when interannual tracking is perfect (a). RE also does not inappropriately reward insensitivity to amplitude scaling even when the mean state is accurately captured (c). The time series presented are arbitrary. Statistics are calculated over years 0-49 only. (Adapted from Rutherford et al., 2005).
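    The caption’s cases (a) and (b) are easy to reproduce numerically — a sketch with made-up series following the caption’s logic, not Rutherford’s actual data:

```python
import numpy as np

def verif_stats(actual, recon, calib_mean):
    """Verification r2 (squared correlation) and RE (1 - SSE over the
    squared error of predicting the calibration-period mean)."""
    r2 = np.corrcoef(actual, recon)[0, 1] ** 2
    re = 1 - np.sum((actual - recon) ** 2) / np.sum((actual - calib_mean) ** 2)
    return r2, re

rng = np.random.default_rng(1)
t = np.arange(50)
actual = np.sin(t / 5) + 0.5   # verification-period mean shifted up by 0.5
calib_mean = 0.0               # mean state of the calibration period

# case (a): perfect interannual tracking, but the wrong mean state
r2_a, re_a = verif_stats(actual, np.sin(t / 5) - 0.5, calib_mean)
# case (b): right mean state, poor interannual tracking
r2_b, re_b = verif_stats(actual, 0.5 + 0.1 * rng.normal(size=50), calib_mean)

# r2_a = 1 yet re_a < 0; r2_b is near zero yet re_b > 0
```

    Whether RE alone is the right yardstick is, of course, exactly the point in dispute in the comments above.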

  41. Ed Snack
    Posted Mar 8, 2006 at 10:06 PM | Permalink

    Oddly enough, in my experience, Climate Audit has its share of extreme and sometimes odd posters, but is far more reasonable and measured in both the posts and the comments than Deltoid has ever been. Good to see you slumming in the cesspool though John.

    As for the absolutely typical slur about Steve being an industry “shill”, my giddy aunt, can’t any of the religious left come up with something more original ! I use religious quite deliberately as it takes that sort of absolutist mindset to make such accusations and think that it has some meaning.

  42. Louis Hissink
    Posted Mar 8, 2006 at 10:14 PM | Permalink

    Re #39

    And we can’t really decide whether Ken Miles is lying or not, because post-modernists don’t understand it. They think it is an alternative narrative. So identifying him as such would serve no purpose.

    Sometimes I think life under the Taliban might be easier if the Ken Miles of this world gained political power.

    At least one thing is certain – that Deltoid and its clockwork mice need to use this form of argument basicallys their case is lost.

  43. Louis Hissink
    Posted Mar 8, 2006 at 10:15 PM | Permalink

    it’s happened again – basically means
    sorry about that 🙂

  44. Posted Mar 8, 2006 at 10:16 PM | Permalink

    If my feeling for the statistics is correct, the main difference is that you get a better RE score even when the agreement between the datasets follows merely from slow autocorrelations. When R2 is evaluated, on the other hand, the positive effect of this inertia and autocorrelation is removed.

    This difference is analogous to the difference between “independent temperature variations for every year” on one side and a “random walk” on the other. Because the truth is somewhere in between – the critical exponents for the autocorrelation lie between the “independent random” and “random walk” exponents – I would guess that some kind of compromise between R2 and RE would be the most appropriate measure of the reliability of the climate reconstructions, too.

    R2 is the quantity most sensitive to the individual numbers for every year, which is why it can be the finest measure of a model’s accuracy if the model is really exact and works from year to year. RE is more appropriate if you admit that the model is not exact year-by-year and that, instead, the tree rings in year XY are affected by several previous years.

    Do you share my feeling, Steve? Could you create a new kind of statistic that interpolates between RE and R2? I mean not just some dumb average of the results, but some measure that is more tolerant of very short-term annual fluctuations than R2 is, yet still treats the individual annual data as important, unlike RE?
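
    To make the contrast concrete, here is a toy calculation (my own synthetic series, not data from any of the papers discussed) of both statistics, using the usual definitions: r2 is the squared Pearson correlation, and RE = 1 − SSE/SS, where SS is measured around the calibration-period mean:

```python
import numpy as np

def r2(obs, pred):
    """Squared Pearson correlation between observed and predicted series."""
    return np.corrcoef(obs, pred)[0, 1] ** 2

def re(obs, pred, cal_mean):
    """Reduction of Error: 1 - SSE / (sum of squares around the calibration mean)."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - cal_mean) ** 2)

t = np.arange(49)                          # 49 verification years = 7 cycles of period 7
wiggle = 0.1 * np.sin(2 * np.pi * t / 7)
cal_mean = 0.0                             # assumed calibration-period mean

# Case A: perfect interannual tracking, but the prediction misses the mean by 0.5.
obs_a, pred_a = wiggle, wiggle + 0.5
r2_a, re_a = r2(obs_a, pred_a), re(obs_a, pred_a, cal_mean)   # r2 = 1, RE strongly negative

# Case B: the prediction captures a 0.5 mean shift but none of the year-to-year detail.
obs_b = 0.5 + wiggle
pred_b = 0.5 + 0.1 * np.cos(2 * np.pi * t / 7)                # uncorrelated wiggle
r2_b, re_b = r2(obs_b, pred_b), re(obs_b, pred_b, cal_mean)   # r2 ~ 0, RE near 1
```

    Case A behaves like panel (a) of the Figure 1S caption quoted above (RE punishes the missed mean despite perfect tracking) and Case B like panel (b) (RE rewards the captured mean shift despite zero interannual skill) – exactly the gap a compromise statistic would have to bridge.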


  45. Posted Mar 8, 2006 at 10:22 PM | Permalink

    Let me say it differently. Essentially, I propose that R2 is the correct starting point, but one should evaluate the correlations between the proxy at year XY and the calculated temperatures at several years before XY, with some exponentially decaying damping with lifetime T. Calculate it for several values of T – of order a few years – and try to pick some relative local peak. In this sense, you would be calculating the R2 of a more complicated model that admits a delay in the effect of the temperatures on the proxies. It may be that RE is close to the generalized R2 with very large values of T.
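
    A rough sketch of this proposal (the function names and the test series are my own, purely for illustration): build an exponentially damped sum of past temperatures for each lifetime T, correlate the proxy against it, and scan T for a peak:

```python
import numpy as np

def damped_r2(proxy, temp, lifetime):
    """r2 between the proxy and an exponentially damped sum of past temperatures.

    lifetime = 0 is the ordinary same-year r2; large lifetimes approach a
    heavily smoothed, RE-like comparison."""
    if lifetime == 0:
        smoothed = temp.astype(float)
    else:
        w = np.exp(-np.arange(len(temp)) / lifetime)    # weights for lags 0, 1, 2, ...
        smoothed = np.array([np.sum(w[:k + 1][::-1] * temp[:k + 1])
                             for k in range(len(temp))])
    return np.corrcoef(proxy, smoothed)[0, 1] ** 2

# Toy test: a "proxy" that integrates temperature with a 3-year memory.
t = np.arange(200)
temp = np.sin(2 * np.pi * t / 11)
w = np.exp(-np.arange(200) / 3)
proxy = np.array([np.sum(w[:k + 1][::-1] * temp[:k + 1]) for k in range(200)])

# Scanning T should peak at the true 3-year lifetime.
scores = {T: damped_r2(proxy, temp, T) for T in (0, 1, 3, 5, 10)}
```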

  46. Steve McIntyre
    Posted Mar 8, 2006 at 10:45 PM | Permalink

    A couple of things on r2 versus RE. First of all, before one even worries about which is “better”, let’s simply start with what Mann said he did. He said in MBH98 that their reconstruction had verification statistical skill and that they consulted RE, r and r2 statistics in the verification period. IPCC TAR followed up by saying that the reconstruction had skill in verification statistics. So regardless of what Mann may feel now, the starting point is what he said then. The warranties of skill were not incidental to MBH; they were instrumental to its acceptance.

    RE is not a statistic that is used much outside the tree ring business. Econometricians facing similar issues tend to use r2 supplemented by a range of other statistics – Durbin-Watson, Lagrange multiplier, etc.

    Ammann and Wahl purport to justify RE over R2, but a couple of points: (1) it’s pretty late in the day to be trying to cooper this up; (2) they provide NO third-party statistical references to support their position. Their only references are to Wahl [2004] and Lytle and Wahl [2005] for articles about pollen counts, not time series.

    Also – and I’m going to post up in detail on this – if you have a negligible r2, your standard errors explode and your confidence intervals, in Hegerl’s phrase, go from the floor to the ceiling. I told Ammann about this in San Francisco and he complained that I hadn’t mentioned it in my review. Well, I wrote a long review and ran out of steam listing all the Ammann problems. However, Ammann then proceeded to ignore this problem. So while they claimed to “exactly” replicate MBH, they didn’t carry this forward to the confidence intervals.
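
    The explosion can be read straight off the textbook formula for the standard error of a regression prediction (the numbers below are hypothetical, chosen only to illustrate the scaling): se ≈ s_y · sqrt(1 − r2), so as r2 → 0 the prediction interval widens to roughly ±2 standard deviations of the target itself – floor to ceiling:

```python
import math

def prediction_se(sd_target, r2, n):
    """Textbook standard error of estimate: sd_y * sqrt(1 - r2),
    with a small-sample (n - 2) correction."""
    return sd_target * math.sqrt((1.0 - r2) * (n - 1) / (n - 2))

sd_y = 0.25   # hypothetical std dev (deg C) of the verification-period target
n = 48        # hypothetical number of verification years

se_good = prediction_se(sd_y, 0.75, n)     # a respectable r2
se_mbh = prediction_se(sd_y, 0.018, n)     # the 15th-century step reported by A&W
# With r2 = 0.018 the standard error is essentially the target's own spread:
# the reconstruction adds almost no information beyond the climatological mean.
```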

    While I think of it, there’s a conundrum here. In the “low frequency” approach that they are advocating, think about how you would calculate confidence intervals. They define low frequency as 50-year bins. You have 2.5 bins. So you can’t get any confidence intervals other than floor-to-ceiling. You get the same thing with wavelets, where the confidence intervals become floor-to-ceiling at lower-frequency scales.

    Nychka presumably knows all this. I wonder what statistical services he provided to Ammann. I hope that he didn’t approve the final article or he’ll be laughed out of statistical organizations. I can’t figure out why he’s staying on the NAS panel. He’s already occupying a place that should have been occupied by an independent statistician, and he failed to question Mann about statistical issues that he knew to be wrong.

  47. jae
    Posted Mar 9, 2006 at 6:37 AM | Permalink

    It seems to me that the important issue isn’t the correlation statistics that are used, it’s the sampling method they used. It is my understanding that you can’t use these statistical procedures with cherry-picked data.

  48. jae
    Posted Mar 9, 2006 at 6:42 AM | Permalink

    This whole thing just blows my mind. I could easily demonstrate global COOLING, if you let me cherry-pick the series for my proxy. Isn’t that the central issue here?

  49. kim
    Posted Mar 9, 2006 at 6:52 AM | Permalink

    jae, review Mickey’s Apprentice in Fantasia. Policy wrought from ‘the stick’ and IPCC have already made a horrible mess.

  50. Peter Hearnden
    Posted Mar 9, 2006 at 6:56 AM | Permalink

    Re #49, jae, no, the central issue is you simply don’t understand how data is selected. Go away and do some learning before you spout off. Learn how you have to eliminate the proxies that are affected by issues other than temperature and thus get left with the ones where temperature is the determiner. Of course you won’t, because you (somehow) know better than those who have done the learning (yep, beats me how too).

    That said, I’d love to see your global cooling graph. Go for it and then let us know where we can see it.

  51. Paul Gosling
    Posted Mar 9, 2006 at 7:11 AM | Permalink


    A couple of points you should be able to clear up for me.

    The calibration period compares a proxy to the instrumental record (or a younger proxy with “known” temperatures). The model of this proxy is then validated using a different period of the known instrumental or proxy record? So if you have a 100-year instrumental record, you calibrate your proxy on the first 50 years and validate it on the second 50 years, for example?

    My limited understanding of statistics tells me that r2 is very sensitive to outliers, is RE less so?

  52. per
    Posted Mar 9, 2006 at 7:32 AM | Permalink

    Re: #52

    the central issue is you simply don’t understand how data is selected. Go away and do some learning before you spout off. Learn how you have to eliminate the proxies that are effected by other issues than temperature and thus get left with the ones where temperature is the determiner.

    Dear Peter,
    forgive me if I am failing to understand some subtlety, but are you serious about what you have written?
    Are you aware that there is a problem with your approach?

  53. Peter Hearnden
    Posted Mar 9, 2006 at 7:42 AM | Permalink

    David, enlighten me.

  54. Steve McIntyre
    Posted Mar 9, 2006 at 8:02 AM | Permalink

    #51. In tree ring practice (Fritts), they raise the issue of overfitting and thus test their models on a withheld verification period. Now if you’ve only got 120 years or so and the series are highly autocorrelated, you can’t get truly “independent” periods to start with. But let’s leave that for now.

    Fritts (a standard tree ring text) and even Cook et al 1994 propose a series of statistics to be used to check the validity of the model in the verification period – including BOTH the r2 and RE statistics. Prior to the present dispute, there was NO article in the literature advocating RE over r2.

    If you have a classic spurious regression with unrelated trends, and you ex post divide the data into two periods, you will get a high RE statistic. Thus, my stocks-and-white-noise example has a higher RE statistic than MBH. So it’s possible to have spurious RE statistics.
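
    The stocks-and-white-noise point is easy to reproduce with purely synthetic series (everything below is my own toy construction, not anyone’s actual data): give two otherwise unrelated series a shared slow trend, calibrate on the late period, and the verification RE comes out high even though the detrended, year-to-year r2 is essentially zero:

```python
import numpy as np

t = np.arange(100)
# Two series that share only a slow trend; their interannual wiggles are unrelated.
target = 0.01 * t + 0.1 * np.sin(2 * np.pi * t / 7)
proxy = 0.01 * t + 0.1 * np.cos(2 * np.pi * t / 7)

cal, ver = t >= 50, t < 50        # ex post split: late calibration, early verification

# OLS calibration of target against proxy over the late period.
b, a = np.polyfit(proxy[cal], target[cal], 1)
pred = a + b * proxy[ver]

sse = np.sum((target[ver] - pred) ** 2)
ss = np.sum((target[ver] - target[cal].mean()) ** 2)
re_ver = 1.0 - sse / ss           # high: the shared trend does all the work

def detrend(x, tt):
    """Remove an OLS linear trend from x."""
    return x - np.polyval(np.polyfit(tt, x, 1), tt)

# Interannual (detrended) verification r2: essentially zero.
r2_ver = np.corrcoef(detrend(target[ver], t[ver]),
                     detrend(proxy[ver], t[ver]))[0, 1] ** 2
```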

    In the von Storch-Zorita comment on our article, they did not report the verification r2, but it looks like it was high. In the Cook et al 2004 reconstructions of drought (using many MBH tree ring series), they have both high RE and r2 statistics.

    But regardless, Mann said in MBH98 that he also used the r2 test and that his reconstruction had statistical skill. People relied on this article. You can’t just come along 8 years later and say that the r2 statistic is piffle. BTW, when NWT (Natuurwetenschap &amp; Techniek) asked him about the r2 dispute – NWT pointing out that the r2 statistic had been used in Mann and Jones 2003, and other examples abound – he told them that their reconstruction passed the r2 test. Now he says that he didn’t calculate it – that would be “silly”.

  55. Paul Penrose
    Posted Mar 9, 2006 at 8:36 AM | Permalink

    John Hunter,
    You do realize that by posting here you have included yourself with the “idiots” in the “cesspool”. While I agree that there are some individuals who engage in ad hom. attacks here (on both sides of the argument), characterizing every poster here as an “idiot” and the blog as a “cesspool” is beyond disgusting – and your gleeful support of that statement is certainly a black mark against your character. You should be ashamed of yourself.

  56. Peter Hearnden
    Posted Mar 9, 2006 at 9:01 AM | Permalink

    Re #55, you have a well developed sense of irony. How would you characterise accusing known scientists of fraud? You approve of that? You think it brings credit to this place when the accusation comes from this place’s number two?

  57. per
    Posted Mar 9, 2006 at 9:08 AM | Permalink

    Dear Peter
    I did ask you two questions, but you have answered neither.

    If you simply start with a set of random data sets which have no relationship to temperature, if you discard those which go down, and those which stay level, you will be left with those which go up, i.e. those which “show a relationship with temperature” whether there is such a relationship or not.

    The same argument applies to tree samples. You should be able to tell whether the trees are temperature-related before you do the sampling; and presumably you do not deliberately sample trees that you know will have a “bad” relationship with temperature. If you have to do the sampling, and then inspect the data to see if it goes up or down before you can tell if it is “temperature-related”, you are just cherry-picking the data that shows what you want, and ignoring the data that is necessary for the correct picture.
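
    This selection effect can be simulated directly (a toy simulation of my own, with arbitrary sizes, not anyone’s actual proxy network): generate pure random walks containing no temperature signal, keep only those that rise over the “calibration” interval, and the average of the survivors ends with a spurious uptick:

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, n_years = 500, 200

# Random walks: by construction they contain no temperature information at all.
walks = np.cumsum(rng.normal(size=(n_series, n_years)), axis=1)

# "Screening": keep only the series that rise over the last 50 "calibration" years.
rises = walks[:, -1] - walks[:, -50]
selected = walks[rises > 0]

all_mean = walks[:, -1].mean()       # full population: no systematic trend
sel_mean = selected[:, -1].mean()    # screened subset: ends spuriously high
# The screened average ends well above the population average - an uptick
# manufactured purely by the selection rule.
```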

    I hope I am putting this at a level that you can understand.


  58. Steve McIntyre
    Posted Mar 9, 2006 at 9:34 AM | Permalink

    #57. Fortunately this problem is on the table for the NAS panel. The large population sample of “temperature sensitive” sites from Schweingruber shows a decline in ring widths and MXD. From this large population, individual sites can be selected which have late 20th century upticks: Yamal, Sol Dav (Mongolia), bristlecones.

    Other than Briffa et al 2001 on MXD, all the multiproxy studies use tiny samples of about 10 sites. So we see repetitive use of Yamal, bristlecones, Sol Dav and lo and behold a hockey stick.

    This is going to be a real quandary. Mann was asked whether tree ring proxies could be relied upon to register a warm period in view of this problem. I’m going to post more on this. His instinct was to pick a graph showing 3 already cherry-picked series (Osborn and Briffa, I think) and say that these are still in the linear range. Hopefully the panel sees through this.

    Since the average of the 387 sites is down in the last half of the 20th century, you cannot obtain the hockey sticks of the multiproxy studies by a random selection.

  59. Posted Mar 9, 2006 at 9:45 AM | Permalink

    #57 Actually, to select temperature-sensitive sites you don’t need to discard the downward-sloping ones. The calibration step will take care of the orientation for you and flip them to upward-sloping.
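
    This is worth seeing concretely (a toy example of my own): in a regression calibration, a proxy that runs opposite to temperature simply gets a negative coefficient, so the “reconstruction” slopes the right way regardless of the proxy’s own orientation:

```python
import numpy as np

t = np.arange(49)
temp = np.sin(2 * np.pi * t / 7)                   # "instrumental" record
proxy = -temp + 0.3 * np.cos(2 * np.pi * t / 7)    # anti-correlated proxy plus noise

# Calibrate temperature against the proxy by OLS.
b, a = np.polyfit(proxy, temp, 1)
recon = a + b * proxy

flipped = b < 0                                    # calibration flips the orientation
fit = np.corrcoef(recon, temp)[0, 1]               # reconstruction tracks temp anyway
```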

  60. Paul Penrose
    Posted Mar 9, 2006 at 9:57 AM | Permalink

    Re: #56
    Another logical fallacy from Peter: the Red Herring. Just because I post on here occasionally, and just because I generally agree that there are problems with many of the AGW theories and their “proofs”, does not mean I agree with everything posted here. In fact, I have been critical from time to time of the content of some of the comments – to the point that I actually stood up for you once or twice. Of course you know this, Peter, and your Red Herring was an attempt to draw attention away from John Hunter’s disgusting remark in a kind of “well, other people do it” argument. But just so there’s no confusion, and even though I’ve never lent any kind of support to name-calling of any kind: I don’t agree with the statements that AGW supporters are “religious zealots”, or are “fraudulent”, or have “committed scientific misconduct”. I have some serious misgivings about some of the data and methods used, and am very disturbed at the unwillingness to make all data and computer code freely available, but I don’t make claims that I can’t back up.

    So you’ll have to pardon me if I’m offended at being called an “idiot”. In fact, I demand an apology. Not that I’m expecting one, mind you, but I still want it.

  61. Peter Hearnden
    Posted Mar 9, 2006 at 10:24 AM | Permalink

    Re #60. I don’t think calling people idiots is nice, tbh. I’m not sure that’s what John Hunter meant either, but clearly, idiots are few and far between here. I was trying for some context. However, John Hunter can speak for himself.

    David, re #57. I’m not a dendrochronologist or climatologist, and neither are you. The difference is I don’t think I know their trade better than they do.

    Btw, your last sentence is very typical of you at your condescending worst; otoh, at your best I’ve a good deal of time for what you say 🙂 . To get your questions fully answered, ask a dendrochronologist/climatologist – I would be interested in the answers, for I only answered with how I understand things are.

  62. jae
    Posted Mar 9, 2006 at 10:33 AM | Permalink

    Paul: Peter is just trolling for an argument. Join the “ignore Peter” club.

  63. jae
    Posted Mar 9, 2006 at 10:39 AM | Permalink

    One big trouble with picking series that are supposedly affected (positively) by temperature is that you really don’t know whether it was temperature or some of the many other variables that affected tree growth or wood density. Moisture or CO2 are much more likely than temperature to influence ring widths and latewood density. I am simply astounded that these procedures are presented by these guys with a straight face. “You have to pick cherries to make cherry pie.” WOW! If I were on the NAS Panel, I would condemn the whole lot based on this alone.

  64. David H
    Posted Mar 9, 2006 at 10:46 AM | Permalink

    Peter, look at

    Dr. Mann received his undergraduate degrees in Physics and Applied Math from the University of California at Berkeley, an M.S. degree in Physics from Yale University, and a Ph.D. in Geology & Geophysics from Yale University

    So where is his Climatology qualification? When was the first degree in climatology awarded to anyone?

    In any case so far as I can see the issues are to do with biology and statistics. He has no biology degree and tells us he is not a statistician.

  65. per
    Posted Mar 9, 2006 at 10:52 AM | Permalink

    Dear Peter
    I hope I am putting this at a level you can understand.
    In #50, you told off jae for his poor comprehension of data selection. You told him how you have to discard those which don’t give the right signal, so that you are left with the real signal.

    When I put it to you that what you were suggesting was an obvious error, and gave my reason clearly, you immediately abandoned any argument and resorted to sneering that I don’t know what I am talking about.

    The record shows that I have justified what I said, and explained my logic. Not only are you unable (or unwilling) to justify what you are saying, it also appears that you are quite happy to browbeat others when you are manifestly unable (or unwilling) to justify the words you are using.
    And for the record, I have set out a line of logic, and some facts. If you are able to show me why that line of thought has problems, I will entertain them. You have not done so.

  66. John Lish
    Posted Mar 9, 2006 at 11:40 AM | Permalink

    #61 – Peter you said:

    Difference is I don’t think I know their trade better than they do.

    How about applying the same logic to statisticians?

  67. Posted Mar 9, 2006 at 3:22 PM | Permalink

    RE: #50

    While it may be a bit amateurish: since the infamous spaghetti graph (available on the home page of realclimate) is based on northern hemisphere temperatures, I graphed southern hemisphere temps using the same data sets (I will post it on my blog, which unfortunately is down at the moment). Interestingly, it does show a cooling trend…..

  68. John Hunter
    Posted Mar 9, 2006 at 4:00 PM | Permalink

    #39, #55, #56, #60, #61: Sorry, I don’t apologise for being open. Ken put my feelings into words so well that I felt impelled to post them here. Remember it was headed “A Tribute to Steve McIntyre” — hopefully it shows that I have a certain respect for the man (I find it interesting that everyone seems to have ignored that bit of Ken’s message — not much to stick the knife into, I guess).

    However, for once I disagree with Peter who said “I don’t think calling people idiots is nice tbh. I’m not sure if that’s what John Hunter meant either, but clearly, idiots are few and far between here.”.

    Sorry, it is exactly what I meant (and, as should be obvious, I did not imply ALL climatesceptics posters are idiots as I am a poster myself).

  69. jae
    Posted Mar 9, 2006 at 4:09 PM | Permalink

    Wow, Hunter really shows a superiority complex. Might be the sign of an inferiority complex.

  70. Ed Snack
    Posted Mar 9, 2006 at 4:10 PM | Permalink

    Just to put some context on my earlier comment. I agree with Paul Penrose that labelling all AGW believers as “religious” is both untrue and unhelpful. However there is a certain subset of the AGW crowd to whom the label can be justly applied. Included in that subset IMO are those who automatically classify any person who doubts any part of the current “scientific consensus” as an “industry shill”. The same group also tend to studiously ignore any evidence that potentially undermines any part of that “consensus” regardless of how well researched or solidly based it is. That is part of a quasi-religious belief set, and a foolish one at that.

    Mind you, there does seem at times to be a certain subset of the skeptical side that has a similar quasi-religious belief that there cannot be any AGW.

  71. Steve McIntyre
    Posted Mar 9, 2006 at 4:23 PM | Permalink

    The word “religious” is for the purposes of this blog going to be added to the spam list.

  72. Armand MacMurray
    Posted Mar 9, 2006 at 4:41 PM | Permalink

    OK, John, you’ve got me confused again. You note

    and, as should be obvious, I did not imply ALL climatesceptics posters are idiots as I am a poster myself)

    Did you mean:
    1) That you, yourself, are a “climatesceptic”?
    2) That you believe the Yahoo “climatesceptics” group is a cesspool of idiots?
    3) Or that, distracted perhaps by the extensive time and effort you spend on disseminating the data/code/methods from your published papers, you mistook the Climate Audit site for the “climatesceptics” Yahoo group?

  73. John Hunter
    Posted Mar 9, 2006 at 5:34 PM | Permalink

    #73: Sorry Armand, early morning muddled head — I meant “climateaudit” not “climatesceptics”.

    But, now you come to mention it, the Yahoo group “climatesceptics” would fit just as well …..

  74. Ed Snack
    Posted Mar 9, 2006 at 6:13 PM | Permalink

    Apologies then Steve, for bringing it up. Unthinking attribution tends to irritate. Sorry.

  75. Posted Mar 10, 2006 at 7:42 AM | Permalink

    Dear Steve,

    thanks for your explanation. I completely agree that computing some complicated formulae when you have 2.5 bins is complete nonsense. This is what always strikes me about the very description of the 20th century climate. It only exhibits trends on a 30-year basis, and these periods go up-down-up, and one of them essentially disagrees with the prediction of a very convoluted and conspiratory model. A pretty bad result.

    Of course I think, on physical grounds, that if the tree rings and other proxies are a good estimate of the temperature in a given year, it is exactly the high-frequency, year-by-year detailed information that should be used to find the correct models and dependencies. The only concern is that one must be careful whether the thickness of a tree ring is assigned to year X or X+1, etc., and this finer job of building the model – or allowing the temperature to affect several subsequent years – should be done very carefully. After a good model like that is built, the R2 that eliminates most of the impact of autocorrelations and long-term persistence should of course be viewed as the criterion for whether a correlation between the proxies and the temperatures exists.

    All the best

  76. Paul Penrose
    Posted Mar 10, 2006 at 8:13 AM | Permalink

    John Hunter:
    I’ve tried very hard not to hurl insults, and to be objective, rational, and polite in all my postings, but you are really straining my patience. It’s obvious from the hateful invective you use that you are not interested in a reasoned debate, but are just trying to get a shouting match going. I am not going to accommodate you.

    You, sir, are no gentleman and have totally discredited yourself in my eyes. Therefore I will no longer be reading your tripe nor responding to it. Have a nice life.

  77. Steve McIntyre
    Posted Mar 10, 2006 at 8:24 AM | Permalink

    #various insults – I dislike deleting and editing this stuff. I’m going to leave up the exchange of insults for another day and then trim this back. No more clutter please.

  78. Posted Mar 10, 2006 at 8:49 AM | Permalink

    Incidentally, sometimes the hockey stick graphs may be real, such as this one:

  79. Peter Hearnden
    Posted Mar 10, 2006 at 8:58 AM | Permalink

    Re #77. “I’ve tried very hard not to hurl insults, to be objective, rational, and polite in all my postings”

    So have I (though I doubt you’ll accept that – ‘believers’ are, somehow, a different sort of human being here, am I right? I bet I am 😦 ). Where has it got me? I’ll tell you: Insults. Snide asides about my intelligence. Insults. Snide asides about my intelligence. Snide asides about my ‘unwillingness to enter the debate’ (this is the one you normally get after you quote the IPCC – because that just gets dismissed). Insults. Accusations of being religious. When I’m open about my amateur status, insults about that. Accusations of being religious. Insults. A campaign to ignore me (but, heck, anyone who disagrees here gets that – see your post for the most recent example). Snide asides about my intelligence. Dismissal with something along the lines of ‘you lack the comprehension’. Insults etc etc

    Disagree with the POV of this place and – *I GUARANTEE* – (if you stay long enough) one of the above is what you get. Dano, John Hunter, John Cross, Steve Bloom, Michael Seward, myself, plus a long list of scientists who don’t even participate but are hated all the same, all get one or all of such treatments. I know this, I’ve read pretty much every comment here – I’ve been here since the start! I tell you, it’s quite amazing to watch it all unfold and be able to predict where it’s all going!

    Such treatment can get to people – your post shows this (and you agree with this place; try disagreeing!!!). Me, I’ve taken so much it just feels normal.

    So, before you carp, just stand back a little.

  80. Steve McIntyre
    Posted Mar 10, 2006 at 9:07 AM | Permalink

    Peter, I don’t want this thread to turn into everyone telling their sad stories. Speaking for myself, I think that I’ve been consistently more polite to all of the above people, including yourself, than they have been to me. Please don’t respond to this observation. I won’t delete it, but please don’t. But let’s talk about the topic of the post, rather than proving Ken Blumenfeld’s observation that every post on climate science in every forum converges to the same discussion.

  81. ggh
    Posted Mar 10, 2006 at 2:22 PM | Permalink

    Fun to see scientists (I think) duke it out.
    If a group mentioned earlier refuses to disclose their raw data, doesn’t that pretty well shoot down their credibility? The data may be “good” but is probably open to lots of looking over – I can’t even find some receipts from last year, and you guys are arguing over stuff that happened before the Romans (you must be good).

  82. per
    Posted Mar 10, 2006 at 3:10 PM | Permalink

    Just out of interest, almost all of the statistics in the above table are given to three decimal places. Why then do they go to 5 decimal places for 1700-1729? Or is it that an r2 of 0.000 looks embarrassing?

  83. kim
    Posted Mar 11, 2006 at 6:52 AM | Permalink

    I resemble that ‘idiot’ remark, and I demand extemporation.

    I don’t see cherry trees listed among the species sampled and that symbolizes the irrelevancy of all the tree ring data. Moved on, sure, but to what is in reality an undetected signal. Whence now? I suspect better models will be more productive than better proxies, simply because imagining the chaotic elephant is preferable to being deluded by touching isolated, unrepresentative, phenomenological extremities created by chaos. And yet, it persists.

  84. BradH
    Posted Mar 11, 2006 at 7:21 AM | Permalink


    You’ve posted up heaps of things in the last week, so I might have missed this, but just to clarify things: you and Ross have been hammering away at the R2 issue for ages. Every editor of every journal (except Woodwind Sonatas Monthly) must be aware of your arguments. I don’t think it could have slipped past the batter.

    So, have you and Ross decided what you might do, now that Schneider has deforested his high moral ground? What are the options? Do you have a right of reply? Will you submit a new article dealing with A&W issues?

    As a final question, do the IPCC publicize their reviewers in advance? Given your last experience, I wouldn’t expect that you (or Bellamy) will be reviewers, but it’d be interesting to know who will be.

  85. Steve McIntyre
    Posted Mar 11, 2006 at 8:29 AM | Permalink

    As far as I understand the process, “expert” review of IPCC 4AR is over. They’ve done a revision which goes to the governments.

    We weren’t offered an opportunity to issue a Reply. We’ll probably submit one, but it’s pretty frustrating wading through all the sludge – especially since the sludge should have been removed editorially, so that whatever germ of an actual point remained could be dealt with.

  86. Rufus
    Posted Mar 11, 2006 at 6:32 PM | Permalink

    I really don’t understand this argument. So what if (as you claim) 20th century temperatures are lower/higher than the medieval warm period. That proves nothing. What matters, NOW, is that our 25+ year temperature increase is due to greenhouse gases. That’s what the models show, and it doesn’t matter how things stack up against the MWP. So all this argumentation is for nought….

  87. Dave Dardinger
    Posted Mar 11, 2006 at 7:35 PM | Permalink


    Your post would be good if it were meant as satire; unfortunately I suspect it isn’t. The models are designed to show what’s believed to be how things work, not to generate climate from first principles. Consequently, if the beliefs behind the models are shown to be wrong, the models are worthless.

  88. Paul Penrose
    Posted Mar 11, 2006 at 7:50 PM | Permalink

    The models (GCMs) are incapable of proving or disproving the AGW theory. They have been designed with the a priori assumption that AGW is true and are only an attempt to understand what the possible range of effects is. This is not my belief, by the way, but what the designers of these models have stated.

    So once you take the models out of the equation, the past temperature reconstructions become a key element in the AGW theory. This is why the work that Steve and Ross (and others) are doing in this area is so important and is attracting more attention.

  89. jae
    Posted Mar 12, 2006 at 12:37 AM | Permalink

    87. Wow! I thought I would hear some kind of scientific pronouncement from the guru at Realclimate. But what do I read–no facts, but another sermon! LOL. You guys are really in denial and are running with your tails between your legs.

  90. Steve Bloom
    Posted Mar 12, 2006 at 2:46 AM | Permalink

    Re #90: Oh, jae, please watch the details. The spelling details of names, in this case. Even if a post is signed *Rasmus* as opposed to Rufus, remember that this here is the internet and anonymous posters can use any name they want. Nigel Persaud, e.g. (but there I go being mean to Steve again, even though he’s never once criticized me for using a sock puppet). Plus I’m being pompously mean to you, too. 🙂

  91. Hopalong
    Posted Mar 12, 2006 at 9:41 AM | Permalink

    Not sure this is the place to put this off-the-wall thread from a different universe, but here goes…

    Years ago, a fellow I’m intimately familiar with, while doing research at a national laboratory in Tennessee, had the technical lead role in a study of nuclear power plant mechanical equipment (pumps, valves, etc.) failure rates. This was done in support of the U.S. Nuclear Regulatory Commission’s program to develop guidelines for extending the licenses of operating stations. The putative goal was to identify age/failure-rate relationships.

    The approach developed was to analyze individual component failures from a relatively large failure database (thousands of records), using narrative descriptions developed by the maintenance and engineering staff involved at the particular facility. Particular emphasis was placed on how the failures were detected (planned test or inspection, failure during operation, etc.), the affected component part (packing, bearing, shaft, etc.), and the extent of degradation (minor operational inconvenience for the component, significant degradation of the system it served, etc.). Crosscuts by age, specific system, manufacturer, and others were completed.

    The sponsoring agency was simultaneously being lobbied by another research organization to develop a regulatory framework around an Arrhenius-based aging equation [F = A*exp(-B/T)]. The concept was that for every component, all one would need to do is define the magnitudes of A and B, and out would pop a failure rate vs. age that could be used to establish clearly-defined requirements for overhaul, additional monitoring, etc. Neither the espouser of this model nor the regulators who ultimately bought into it had been contaminated by real-world hands-on experience with industrial equipment (the proverbial armchair engineers).

    As it turned out, the failure record studies showed that the combination of the method of detection and affected part revealed some very clear patterns that pointed out weaknesses in existing monitoring efforts. For example, bearing degradation was, by and large, not being detected until the equipment had already failed or was on the verge of it. The existing monitoring regulations, which had been in place for decades, only required periodic bearing temperature and overall vibration amplitude monitoring. Neither technique will indicate bearing degradation until failure is imminent, so the finding was not surprising. However, over the years since regulations were originally invoked, technology (e.g., vibration-based bearing flaw detection methods that employ frequency-domain analysis) had developed sufficiently to address this issue. By the early 1990’s, virtually all plants were using these methods on equipment that was important to production, but not necessarily on the safety-related equipment covered by regulation.

    On the other hand, the relationship between failure rate and age was incredibly mixed and weak, essentially stochastic for a very large part of the population. As it turns out, this mixed relationship has been found to be true in other industries (aircraft, for example), as well. That was not universally true, as components at a few stations that saw particularly harsh service environments clearly did have an age-related failure pattern.

    So it became obvious that Arrhenius-based modeling (or any other single mathematical relationship) would not provide any sort of engineering-based support for generic guidelines.

    To summarize this long-winded narrative, two end results were interesting:
    1. These study results (and, as noted, similar data from other industries) clearly refuted the Arrhenius model approach. In spite of that, the attractiveness of a mathematical model won the day. The laboratory in Tennessee was not asked to study additional components; the modeling organization garnered considerable additional research funds.
    2. The observations from the study that were eminently practical in nature were not acted upon by the regulators. Fortunately, the industry saw the merits of more broadly applied diagnostic techniques and has pretty much adopted them on a voluntary basis where it makes sense to do so.

    The real world is messy – all it needs to make it better is a model, and if that model can focus on only one or two variables, all the better.

  92. jae
    Posted Mar 12, 2006 at 11:12 AM | Permalink

    Bloom: OK, I guess I got my Rufus and my Rasmus mixed up. But not my logic, LOL.

  93. Brooks Hurd
    Posted Mar 12, 2006 at 11:56 AM | Permalink

    Re: 92

    The real world is messy – all it needs to make it better is a model, and if that model can focus on only one or two variables, all the better.

    As you point out, it can be problematic when a model is accepted because of its simplicity in spite of empirical data which shows that the model is a poor predictor.

    In the case of climate, we now have so many models that there is a wide range of predictions for future temperature. Depending on the model selected, the prediction can be for higher or lower temperatures, with a wide range of predictions between the extremes. Therefore, when the BBC states in “Polar ice sheets show net loss”:

    Mass changes in the ice sheets match predictions from computer models of global climate change, they say.

    “they,” the researchers, have simply selected a model which matches their conclusions.

  94. John A
    Posted Mar 12, 2006 at 12:12 PM | Permalink

    Why do these climate modellers remind me so much of psychics with their ex post facto rationalisations? I can’t tell if I’m watching climate science or “Crossing Over with John Edward”.

    When was the last time that a climate modeller emerged from a set of modelling runs to tell us that the results were nowhere as bad as feared?

  95. Brooks Hurd
    Posted Mar 12, 2006 at 1:04 PM | Permalink

    RE: 95

    When was the last time that a climate modeller emerged from a set of modelling runs to tell us that the results were nowhere as bad as feared?

    Whenever they want their funding (from the politicians) to be drastically reduced.

  96. ggh
    Posted Mar 13, 2006 at 7:41 AM | Permalink

    Greenland – Stone ruins of Greenland, when it was green, are right on the coast (not way up the slope). Thankfully, when the ice cap melts the seas shouldn’t get too high – r2 better than tree rings.

  97. JEM
    Posted Mar 16, 2006 at 10:45 AM | Permalink

    Re 94: (Re: 92)

    Therefore, when the BBC states in “Polar ice sheets show net loss”:

    Mass changes in the ice sheets match predictions from computer models of global climate change, they say.

    “they,” the researchers, have simply selected a model which matches their conclusions.

    It’s worse than that. Far worse than that.

    A small application of simple arithmetic:

    That BBC report on polar ice sheets does make interesting reading — as an example of how to generate a panic when there’s no panic to generate.

    We are told, “…a US team says that 20 billion tonnes of water are added to oceans each year.”

    Now, it’s not clear if this is due to Greenland melting only, or from Antarctic melting as well. But I’ll assume for now it’s Greenland only. That’s because the report is more significant if that’s so.

    Well, in round numbers the area of the Greenland ice sheet is two million square kilometres. Therefore we are being told that 10,000 tonnes of ice are lost per square kilometre per year. Given that ice or water weighs approximately one tonne per cubic metre, that translates to 10,000 cubic metres of melting per square kilometre.

    Wow! That sounds really terrifying, let’s all panic!

    But wait. What that really means is that the ice sheet is sinking at the prodigious rate of… (wait for the bad news…) (contain your panic…) 0.001 mm per year.

    Gee, at this rate, if the ice cap is two kilometres thick on average, Greenland will be all melted away in a mere 200 million years!

    There are two worthwhile observations to be made about this report.

    One is that the general public and most of the people working for the media, and politicians as a class, and probably the entire Green movement, have no concept of the true meaning of large numbers.

    The second is that if this is the most accurate report available from scientific measurements, of changes in Greenland ice cap melting, any increase or decrease in thickness is too small to be measured in a meaningful way. For this report to be taken as evidence of global warming would require a faith in the accuracy of measurement that would make even Mann blush.

    And even then, there’s clearly no rush…

  98. Dave Dardinger
    Posted Mar 16, 2006 at 11:06 AM | Permalink

    JEM, you need to watch your digits. 10,000 cu m / km² / yr is one cm. [1 km² = 1000 m x 1000 m = 10^6 sq m; 10^4 / 10^6 = 10^-2 m = 1 cm.] So it’d only take 200,000 years to melt the cap at that rate. Therefore our 6000-great-grandchildren might truly see green land!
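    The corrected arithmetic is easy to verify with the figures quoted upthread (20 billion tonnes per year, two million square kilometres, and the commenters' assumed 2 km average thickness); a quick Python check:

```python
# Figures as quoted in the thread; the 2 km average thickness is the
# commenters' assumption, not a measured value.
mass_loss_tonnes_per_yr = 20e9
area_km2 = 2e6
thickness_m = 2000.0

# 1 tonne of water is about 1 cubic metre; 1 km^2 = 1e6 m^2.
volume_m3_per_yr = mass_loss_tonnes_per_yr * 1.0
lowering_m_per_yr = volume_m3_per_yr / (area_km2 * 1e6)

print(f"surface lowering: {lowering_m_per_yr * 100:.1f} cm/yr")   # -> 1.0 cm/yr
print(f"time to melt: {thickness_m / lowering_m_per_yr:,.0f} yr") # -> 200,000 yr
```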

  99. ET SidViscous
    Posted Mar 16, 2006 at 11:18 AM | Permalink

    Well, guys, don’t forget: if we do a simple linear extrapolation from the end of the hockey stick straight up, as they do to show temperatures in 50-100 years, in ~10,000 years the Earth will be at a few million degrees C and will presumably undergo fusion of the atmosphere.

    That might melt the icecaps sooner.

    Of course, if doing a straight linear extrapolation is not the proper way to do that, then never mind.

  100. JEM
    Posted Mar 16, 2006 at 11:23 AM | Permalink

    Re 99:

    you need to watch your digits. 10,000 cu M / Km2 / yr is one cm. [1 Km2 = 1000m x 1000m = 10^6 sq m. 10^4 / 10^6 = 10^-2 m = 1 cm.]

    Whoops, yes. It seems I used a cubic km instead of a square km in my calculation. That was very silly; sorry.

    But–as you imply–the substantial point holds, I think: based on this report, there is no Greenland ice cap melting going on worth getting worked up about.

  101. John G. Bell
    Posted Mar 16, 2006 at 11:32 AM | Permalink

    Could we use the bomb tests as a marker to detect and date a particular layer of ice and measure the accumulation above? Do this in enough locations and you might pin Greenland down. This wouldn’t work along the coasts where the melting is going on but might work well in the interior.

  102. Steve Sadlov
    Posted Mar 16, 2006 at 5:40 PM | Permalink

    RE: Arrhenius

    OK for early semiconductors, but long since disproven for most everything else.

    Indeed, a highly cautionary tale!

  103. BradH
    Posted Mar 20, 2006 at 6:09 AM | Permalink

    Re: # 99

    Kind of like CO2 “doubling” (wow! sounds scary), from 0.00028 parts per cubic metre, to 0.00056 per cubic metre of atmosphere in a century.

    Phew! I’m glad there’s nothing else in the atmosphere as variable as CO2. Else, we’d all be ru’ned.

  104. Jon-Anders Grannes
    Posted Mar 20, 2006 at 8:49 AM | Permalink

    The pressure is building up?

    I wonder what air pressure we have had the last 1 billion years?


  105. JerryB
    Posted Mar 20, 2006 at 10:05 AM | Permalink

    Re #104,


    Atmospheric CO2 is commonly measured in parts per million by volume (ppmv), but there are many more than 10^6 molecules of air per cubic meter, more than 10^20 near sea level.
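    JerryB's figure is easy to check from the ideal gas law, n = P/(k_B·T); a sketch, assuming a near-surface temperature of 288 K:

```python
# Number density of air near sea level via the ideal gas law, n = P / (k_B * T).
k_B = 1.380649e-23   # Boltzmann constant, J/K
P = 101325.0         # standard sea-level pressure, Pa
T = 288.0            # assumed near-surface temperature, K

n = P / (k_B * T)    # molecules per cubic metre
print(f"air: ~{n:.2e} molecules/m^3")            # ~2.5e25, far above 1e20

# At roughly 380 ppmv (a mid-2000s value), CO2 molecules per cubic metre:
print(f"CO2: ~{n * 380e-6:.2e} molecules/m^3")
```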

  106. BradH
    Posted Mar 20, 2006 at 9:57 PM | Permalink

    Thanks for pointing that out, Jerry.

    Any way you look at it, the CO2 increase is very, very small in comparison with the other constituents of the atmosphere.

  107. ET SidViscous
    Posted Mar 20, 2006 at 10:12 PM | Permalink

    That’s why the entire AGW precept is based on positive feedback.

    1. A small amount of CO2 increases temperature, which increases the atmosphere’s water capacity (humidity).

    2. Since H2O is the primary greenhouse gas, this gives us more warming, which allows for more humidity.

    3. Goto step 2.

    It’s not talked about much, but it is the basis for the concept. The thought (incorrect, in my opinion) is that with a small amount of CO2 the atmosphere runs away to Venus by a week from next Tuesday (slight exaggeration).
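    The loop in steps 1-3 is a geometric series: if each pass adds a fixed fraction (the feedback gain) of the previous increment, the total converges to direct/(1 - gain) for gain < 1 and runs away for gain >= 1. A toy sketch with made-up numbers:

```python
def total_warming(direct, feedback_gain, iterations=200):
    """Iterate the CO2 -> warming -> humidity -> more warming loop described
    above. Each pass adds feedback_gain times the previous increment; for
    gain < 1 this converges to direct / (1 - gain) (a geometric series),
    for gain >= 1 it runs away. All numbers here are purely illustrative."""
    total, increment = 0.0, direct
    for _ in range(iterations):
        total += increment
        increment *= feedback_gain
    return total

direct = 1.0          # hypothetical direct CO2 warming, deg C
for gain in (0.3, 0.5, 0.9):
    print(f"gain {gain}: total ~ {total_warming(direct, gain):.2f} "
          f"(closed form {direct / (1 - gain):.2f})")
```

    Whether the real-world gain is small, large, or even net positive is of course exactly what the argument above is disputing.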

  108. Tim Ball
    Posted Mar 21, 2006 at 4:34 PM | Permalink

    In response to the question, “What the hell happened in the early 18th century?”: notice that the preceding decade is the second lowest. In response to the question, “Is there global warming?”, I reply: yes, but most recently it has been going on since 1680. I have pointed out for years that the nadir of the Little Ice Age was in the 1680s, extending to about 1730. Ice on the Thames in the Year of the Great Frost was in 1683. I don’t think it is just coincidence that the Hudson’s Bay Company got their charter in 1670 as demand for furs increased. There are at least 4 reports of Inuit (Eskimos) in kayaks being sighted off the coast of Scotland, one even making it round to Aberdeen, in the period from 1700 to 1725. This suggests the sea ice was extensive and stretched almost continuously across the Atlantic to provide a ‘shoreline’ for people hunting seal. An added incentive was the fact that the Inuit had migrated across the north in the Medieval Warm Period just a couple of hundred years earlier. For them to travel further east is logical, especially if they knew Europeans had made the same trip bringing attractive items and materials. It clearly was a very difficult period for tree growth, and this is possibly reflected in the statistical problems seen with the r2. I would surmise there were several years in which growth did not occur, especially at some more difficult sites.

  109. jae
    Posted Mar 21, 2006 at 5:33 PM | Permalink

    RE: 109. I just find it incredible that many scientists are now denying (or ignoring) these historical facts, as well as much of the proxy data, and insisting that temperatures were relatively constant for the last one or two thousand years. What is even more incredible is that the “consensus position,” before the HS hype, was that temperatures were very cyclic and that there WAS a significant LIA and MWP. Oh, well, maybe by the time the Democrats take over the White House, it will be getting colder and we can comment on AGC theories.

  110. ET SidViscous
    Posted Mar 21, 2006 at 5:45 PM | Permalink


    Are you insinuating it will be a cold day in Hell before there is a Democrat in the White House again?


  111. John A
    Posted Mar 21, 2006 at 6:08 PM | Permalink

    Without wishing to be political: the last time there was a Democratic White House, they couldn’t get a climate treaty past the Senate, which passed a resolution to block all climate treaties that damage the US economy. The vote was 95-0, so the White House didn’t even try.

    I suspect that the horrendous cost of treaties like Kyoto versus the benefits (none) will force pragmatism to the top of the agenda.

  112. MarkR
    Posted Mar 22, 2006 at 12:36 AM | Permalink

    Hi All

    I’ve been reading this and other websites for a while, and it is clear to me that something/everything is very wrong with the Hockey Stick research.

    The question is what to do about it.

    The journals of record don’t seem to want to know, or are plain obstructionist.

    The NAS asked a different set of questions from those they were instructed to address by Congress.

    This is such an important issue because public policy is being corrupted and huge sums of public money are being wasted on the back of this research.

    When people don’t play the game, I like to look at the rule book.

    There don’t seem to be any state laws against what the Hockey Team are doing, but the researchers are regulated by their own educational establishments’ codes of conduct.

    For example, Michael Mann is at Penn State, and their “Guideline RAG16: THE RESPONSIBLE CONDUCT OF RESEARCH” is an interesting read. I would guess that Mann is in breach of almost every regulation he could possibly be.

    Perhaps this offers the route to retraction and/or correction by Mann and his Hockey Team members?

    And as with a house of cards, take one card away and they all fall down.

  113. MarkR
    Posted Mar 22, 2006 at 12:55 AM | Permalink

    Oops sorry for the double posting above, but I was having some trouble getting it to upload.

    Also the link was wrong.

    It should be:

  114. John A
    Posted Mar 23, 2006 at 4:35 AM | Permalink


    Much as I’d like to see Michael Mann properly account for his behavior as a climate scientist (and no, the NAS Panel didn’t do that), it’s improbable that Penn State would have the ability to investigate Mann’s work done at UMass. The IPCC, it appears, does not hold anyone to account.

  115. John Finn
    Posted Mar 23, 2006 at 4:54 AM | Permalink

    Re: #109

    Tim Ball says

    “I have pointed out for years that the nadir of the Little Ice Age was in the 1680s extending to about 1730”

    This ties in quite well with the CET (Central England Temperature) record, which shows temperatures dropping until the 1690s before rising again through to the 1730s. A few stats:

    Average annual temps in the 1690s were around 2 deg C cooler than in the 1730s.

    The average annual temperature in 1695 was 7.22 deg C. In 1733 it was 10.47 deg C. (In 2005 it was 10.44 deg C)

    Anyone reading previous comments by me may think I have some kind of obsession with the CET – sorry about that, but it is the only temperature record we have which covers this Maunder Minimum period (c. 1645-1715).

  116. Pat Frank
    Posted Apr 24, 2006 at 8:54 PM | Permalink

    #117 “I don’t know what the hell this r2 is?”

    It’s r^2 – ‘r-squared.’ It’s a goodness-of-fit statistic. The closer it is to 1.0, the better the fit reproduces the data. It is a measure of goodness-of-fit for any linear or non-linear least-squares fit to data. In a fit from theory, as opposed to a purely empirical fit, anything less than about 0.9 becomes suspicious.

    In empirical fits (those not constrained by a valid theory), people usually cite the r^2 fraction as saying that the fit explains ‘blah’ percent of the data wiggles. When someone is using a phenomenological model to fit some data, as in quantity of women’s lingerie sold to men correlated with the demographics of religious fundamentalism, for example, one would explain a fit with r^2 = 0.5 by saying that fundamentalist religious beliefs explain 50% of the sales.
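    A minimal sketch of the statistic itself (plain Python, no libraries): r^2 computed as 1 - SS_res/SS_tot, i.e. the fraction of the variance in the observations that the fit accounts for:

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res/SS_tot. For an ordinary
    least-squares linear fit this equals the squared Pearson correlation."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot

obs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(r_squared(obs, obs))                              # perfect fit -> 1.0
print(r_squared(obs, [3.0] * 5))                        # predicting the mean -> 0.0
print(round(r_squared(obs, [1.2, 1.9, 3.3, 3.8, 5.1]), 3))  # close fit -> 0.981
```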

  117. john lichtenstein
    Posted Apr 24, 2006 at 11:41 PM | Permalink

    JohnA, #117 is spam.

  118. Carl Wolk
    Posted Nov 23, 2007 at 12:29 PM | Permalink

    Hi, I’m a high-school kid with no background in science or statistics, so this sentence has left me bewildered. This is what I don’t understand:
    “RE score,” a
    “null” RE score,
    what the values of RE mean,
    what “red noise” is.

    Is an RE score the same as r2? I was wondering if anyone could explain this to me.

    “It overstated the explanatory power of the model when checked against a null RE score (RE=0), since red noise fed into Mann’s PC algorithm yielded a much higher null value (RE>0.5) due to the fact that the erroneous PC algorithm ‘bends’ the PC1 to fit the temperature data.”

    Also, in the future, is there a reference source that I could use for these types of situations?

  119. Ross McKitrick
    Posted Nov 23, 2007 at 4:12 PM | Permalink

    Carl, I think you meant to post this on the thread here–
    But thanks for posting your question anyhoo. I will try to decode.

    The proxy climate reconstruction problem involves taking proxy data and temperature data during an interval where they overlap (called the calibration interval), and working out a set of coefficients that map the two together, so that given the proxy data for an earlier interval you could estimate what the temperature data would have been. It’s easy to correlate any pair of data, so to test that you’re not just generating gibberish you do a “verification” test. You shorten the calibration interval a bit so it no longer covers the whole overlap. Now you have a calibration interval and a verification interval. Then you compute the mapping coefficients and estimate the temperatures during the verification interval. That gives you estimated temperatures, but also you have the observed temperatures for that period. The tests we are talking about boil down to asking how well your model estimates the observed temperatures during the verification interval.

    The r2 test is the simplest. It is the square of the correlation between them, and can be interpreted as the fraction of the variation in the observed data that the estimated data overlaps with. There are 2 other tests, the CE and the RE test. They are conceptually similar, but change slightly the point of comparison. The RE test, for instance, asks how much better your statistical model did than if you had just used the mean of the temperature data during the calibration interval.

    Each of these tests gives you a number, say 0.15. Now you have to decide if that’s a “pass” or a “fail”. In statistics, we decide pass and fail by asking if your model did better than if you had just used a bunch of random numbers. In some cases, tests come in standard forms so there are tables you can look up to tell you if 0.15 is a large number or not (for that test). But some tests, like the RE score, don’t have tables since the “marking scheme” so to speak changes with each data sample.

    In that case you can come up with the pass/fail cut-off by “Monte Carlo analysis”. You estimate the model using random numbers instead of proxy data and see what kind of RE score you get. If you do this a thousand times and in 95% of the cases you get an RE score of 0.2 or less, then we would say that your proxy data has to yield an RE score of more than 0.2, or you’re not significantly better than random numbers.
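    The benchmarking procedure just described can be sketched in plain Python. Everything here is a toy: the "temperature" series, the white-noise pseudo-proxies (a real benchmark would use red noise), and the simple one-proxy regression:

```python
import random

def re_score(observed, predicted, calib_mean):
    """RE = 1 - SS(prediction errors) / SS(deviations from the calibration mean)."""
    ss_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_ref = sum((o - calib_mean) ** 2 for o in observed)
    return 1.0 - ss_err / ss_ref

def fit_slope_intercept(x, y):
    """Ordinary least squares for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

random.seed(0)
temp = [0.02 * t + random.gauss(0, 0.3) for t in range(100)]  # toy "temperature"
calib, verif = slice(0, 70), slice(70, 100)
calib_mean = sum(temp[calib]) / 70

# Calibrate 1000 random pseudo-proxies, score each on the verification
# interval, and take the 95th percentile of the RE scores as the benchmark.
scores = []
for _ in range(1000):
    proxy = [random.gauss(0, 1) for _ in range(100)]
    a, b = fit_slope_intercept(proxy[calib], temp[calib])
    pred = [a + b * p for p in proxy[verif]]
    scores.append(re_score(temp[verif], pred, calib_mean))
scores.sort()
print("95% RE benchmark:", round(scores[int(0.95 * len(scores))], 3))
```

    With white-noise proxies the benchmark comes out near zero here; the M&M point discussed in this thread is that red noise fed through the decentered PC step yields a far higher null value.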

    In the MBH case, they got an RE score for the longest portion of their model of 0.51. They said that the Monte Carlo experiment yielded a pass/fail cut-off of 0.0, so therefore they have significant explanatory power. However, we showed that they did not take account of the distortion induced by their decentered principal component method, and if they had done so, the RE pass/fail mark would move up to 0.56 (IIRC). We also claimed that their r2 score fails the significance test, and they knew it at the time but never reported it. This thread was started when 2 of Mann’s later coauthors published a paper in which they (reluctantly) reported the full suite of hitherto-unpublished test scores, showing failing grades for most of the reconstruction experiments.

    Hope this helps.

  120. Willis Eschenbach
    Posted Nov 23, 2007 at 5:49 PM | Permalink

    Carl, first, welcome to ClimateAudit, and congratulations on finding a spot for real science.

    Regarding your ideas, Ross M. has done a good job of explaining most of your questions, I hope. One that got missed was the question about “red noise”.

    “Noise” is any information in your data that is not the signal you are looking for. It could be literally anything, from instrumental variations, to the effect of other variables, to typographical errors in the collation of the dataset.

    There are three flavors of noise: white noise, red noise, and blue noise. White noise is the simplest, because it is totally random. There are some different flavors of random, such as Gaussian or normally distributed, binomially distributed, Poisson distributed and the like. All of these types of white noise are called “IID”, or independent and identically distributed. This means that they all have the same distribution (Gaussian, Poisson, etc.), and that they are all independent of each other.

    Red and blue noise are not IID, because the individual data points are not independent of each other. Instead, a given value depends in some way upon the previous value. For example, a very hot day is more likely to be followed by another hot day than by a very cold day. Today’s temperature depends in part on yesterday’s temperature, so they are not independent. This kind of signal, where a high value is commonly followed by another high value and vice versa, is called “red noise”. Datasets with this kind of structure are called “autocorrelated”.

    Red noise is extremely common in climate data. We may, for example, be looking for a temperature signal in tree rings. However, the tree ring width may also be affected by say precipitation. From our perspective, the precipitation information is “noise”, because we are looking for a temperature signal. It is not white noise, though, because the precipitation signal is autocorrelated — a very dry year is more commonly followed by another dry year than by a very wet year.

    There is also the possibility that one data point depends negatively on the previous data point. For example, because of ocean cycles, good fishing years may alternate, with one year being good and the next one bad. In the case of noise, this type of data is called “blue noise”. Datasets with this structure are called “negatively autocorrelated”.

    Processes that generate red or blue noise are generally called “AR”, or autoregressive, processes. The simplest have the form

    X(t) = aX(t-1) + e

    where:

    X = data value

    t = time

    X(t) = data value at time “t”

    e = a random error, with some mean and standard deviation.

    What does this mean? It means that the value of “X” at time “t” is equal to some number “a” times the previous value X(t-1), plus some random number “e”. The variable “a” can take any value from -1 to 1. If “a” is negative, then you get blue noise, and if it is positive, you get red noise. If “a” is zero, the point doesn’t depend on the last point, and you get white noise.

    Finally, why is this important? Well, it turns out that the statistics for AR processes are different, sometimes very different, from standard (IID) statistics. In particular, long trends and wide excursions from the mean are much more common as the value of “a” increases towards 1. This means that many things that look like they represent some real trend are in fact just natural swings in an AR process.
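    The AR process above is easy to play with in a few lines of Python (a toy sketch, not anyone's published code); the lag-1 autocorrelation of the generated series tracks the coefficient “a”:

```python
import random

def ar1(n, a, sd=1.0, seed=0):
    """Generate n points of X(t) = a*X(t-1) + e, with e ~ N(0, sd).
    a > 0 gives red noise, a < 0 blue noise, a = 0 white noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, sd)
        out.append(x)
    return out

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a series."""
    m = sum(series) / len(series)
    num = sum((series[i] - m) * (series[i - 1] - m) for i in range(1, len(series)))
    den = sum((v - m) ** 2 for v in series)
    return num / den

for a in (-0.6, 0.0, 0.9):
    r = lag1_autocorr(ar1(5000, a))
    print(f"a = {a:+.1f}: lag-1 autocorrelation ~ {r:+.2f}")
```

    Plotting a few runs with a near 1 makes the point in the last paragraph vivid: long "trends" and wide excursions appear with no trend-generating process behind them.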

    For a good exposition of this, see Cohn and Lins, “Naturally Trendy”, along with the discussion of Cohn and Lins here on CA.

    Best of luck, keep following the science, you can’t go wrong.


    PS – I’ve posted a list of acronyms used on this blog here

  121. Carl Wolk
    Posted Nov 25, 2007 at 9:34 PM | Permalink

    Thank you very much, Dr. McKitrick and Dr. Eschenbach. I felt somewhat guilty for posting possibly the least sophisticated comment I have ever seen appear on this site, but your answers were very helpful.

    I have a somewhat unrelated question for Dr. McKitrick with regard to his article, “Does a Global Temperature Exist?”: if you could establish that a separate, non-temperature-related series of data fits well (has a high r2?) to the temperature data, could you not say that the temperature data then has some validity and relevance? The reason I am asking is because solar intensity and global temperatures follow the same trend rather well until around the ’70s, so couldn’t we independently verify the temperature data from a specific form of data analysis by saying that we see a correlation between two sets of data, one of which is known to be reliable?

  122. Kevin Meyer
    Posted Nov 30, 2009 at 2:33 PM | Permalink

    snip – policy

3 Trackbacks

  1. By The Great Satan on Mar 9, 2006 at 11:11 AM

    Breaking the “Hockey Stick”

    David Stockwell exposes something of great importance In honor of the National Research Council of the National Academies committee to study “Surface Temperature Reconstructions for the Past 1,000-2,000 Years” meeting at this moment, I offer my own…

  2. […] Not only do blogs enable an aggressive falsification program, they enable individuals to defend themselves against stones thrown from ivory towers. On May 11, 2005, on the day that Ross McKitrick and Steve McIntyre were presenting their results debunking the hockey stick in Washington, UCAR issued a press release announcing that one of its scientists, Caspar Ammann, and one of its former post-doc fellows, Eugene Wahl, had supposedly demonstrated that their criticisms of the hockey stick were “unfounded”. M&M have used the blog medium masterfully to reveal that a crucial unfavorable r2 verification statistic was withheld from the Nature publications, thus proving the UCAR accusations not only unwarranted but totally unfounded. Claims that scientists have been ‘harassed’ about archiving their data have been shown false by posting all relevant correspondence on the web. […]

  3. […] Lubos Motl, 8 March 2006. Theoretical physicist, Harvard. “Verification r2 revealed” […]
