Data "Snooping"

Add this phrase from economics to your vocabulary to describe the "other" studies where proxies with known hockey-stick (HS) shapes, like bristlecones and Yamal, are used time after time. Here’s a website with some links. They cite Sullivan, Timmermann and White (1999) and White (2000) for the following definition:

"Data-snooping occurs when a given set of data is used more than once for purposes of inference or model selection."


The topic is actively being researched in econometrics and the methods need to be applied to multiproxy studies.
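
To make the definition concrete for the proxy case, here is a minimal sketch of what re-using the same data for both selection and inference can do. Everything in it is an assumption made for illustration: the "proxies" are pure white noise, the target is a made-up linear trend, and the function name `screening_demo` and its parameters are hypothetical. It is not taken from any of the cited papers or from any actual multiproxy study.

```python
import numpy as np

def screening_demo(n_series=1000, n_years=200, n_calib=50, keep=50, seed=0):
    """Screen pure-noise 'proxies' against a trend, then score the composite
    on the very same calibration window (the snooped step)."""
    rng = np.random.default_rng(seed)
    # Hypothetical proxies: white noise with no signal of any kind.
    proxies = rng.standard_normal((n_series, n_years))
    # Hypothetical target: a rising trend over the calibration window.
    target = np.linspace(0.0, 1.0, n_calib)
    calib = proxies[:, -n_calib:]
    # Correlation of each noise series with the target in the calibration window.
    corr = np.array([np.corrcoef(row, target)[0, 1] for row in calib])
    # Selection step: keep the series that happen to correlate best.
    chosen = np.argsort(corr)[-keep:]
    selected_composite = proxies[chosen].mean(axis=0)
    all_composite = proxies.mean(axis=0)
    # Inference step on the SAME window that was used for selection.
    r_selected = np.corrcoef(selected_composite[-n_calib:], target)[0, 1]
    # Reference: the same statistic with no selection at all.
    r_all = np.corrcoef(all_composite[-n_calib:], target)[0, 1]
    return r_selected, r_all
```

Calling `screening_demo()` returns two correlations: the screened composite against the target, and the unscreened average against the target. In this toy setup the first should come out high even though no series contains any signal, which is the sense in which using the data twice contaminates the inference.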


  1. George
    Hmmm. “data snooping is a dangerous practice to be avoided, but in fact it is endemic.” Not just in econometrics!

    “The main problem has been a lack of sufficiently simple practical methods capable of assessing the potential dangers of data snooping in a given situation.” Multiple choice question – What’s the ‘main problem’ in climate science? (1) the data sets give the desired results, so why switch data?; (2) researchers A, B, and W used those data sets, and published their results in refereed journals, so they must be correct; (3) new data which would provide alternate explanations are ignored; (4) none of the above; (5) all of the above.

  2. Steve McIntyre
    There are several articles in the Journal of Economic Methodology, 2000 on data snooping which have lots to say about multiproxy studies. I’m going to post up some excerpts when I have time.

  3. bender
    Data snooping is how you build hypotheses. It’s a valid practice at the start of the iterative circle that is the scientific method. The problem is that at some point these fanciful conjectures have to be confronted by truth-telling experiments … and climatology (probably because it’s so challenging) has failed to grow beyond the initial exploratory mode of model-building (and code testing, which is not the same thing as model testing). Along comes a desperate policy need, and like unripened fruit on a tree, whatever science that exists is snapped up and used, making for a very bitter cherry pie. In summary, the science is currently more limited than we wish it to be.

  4. kim
    Can you bake a cherry pie, Charming Mike, Charming Mike,
    Can you bake a cherry pie, Charming Mikey?
    I can bake a cherry pie,
    All my peers think it’s to die,
    But the stats leave a lot of people hungry.

  5. KevinUK
    #1, George

    Clearly the answer is 5) all of the above.

    I’ll leave others to fill in appropriate HT members’ names against 1), 2), 3) and 4).

    Kim, nice poem. Did you write it or Charming Mike (aka Mann)?


  6. Peter Hearnden
    Re #5, OK, if ‘5’ includes ‘new data which would provide alternate explanations are ignored’, then why aren’t there several alternate ‘sceptic’ paleoclimate NH/SH/global temperature reconstructions? I’ve said, several times, I’d love to see it/them. I really would!

  7. bender
    Re #6: PH, your answer is in post #233 here

  8. Mark T.
    “I’ve said, several times, I’d love to see it/them. I really would!”

    The onus is on those that produced the original, flawed reconstructions to revise their data, and methods, and prove it can be done.

    Quite frankly, I do not think we can know with _any reasonable_ certainty what past temperatures were. There needs to be a magic bullet proxy that records temperature throughout the year AND, somewhere along the line the notion of “global mean temperature” needs to really be defined in a way that makes some sense.


  9. UC
    #6 Here is one. I bet it is ‘sceptic’ (consistent).

  10. KevinUK
    #9 UC,

    What on earth do condoms have to do with global warming? Looks like another one to be added to the “Complete list of things caused by Global Warming” here.


  11. Steve Sadlov
    RE: #8 – Temperature is probably a fairly sucky characteristic to be attempting to measure / guess. My own personal favorite is power. I’d love to see global P(rms).

  12. Hans Erren
    My attention was directly focused on this weird reference:


    Messrs R. Air and S.E.A. Gravimeter didn’t sound real, and indeed the full citation should read:


    M. Abbasi, J.P. Barriot, J. Verdun et H. Duquenne
    Bureau Gravimétrique International (BGI), UMR5562, Observatoire Midi-Pyrenées, 31400, Toulouse, France

    And on reading the paper, it is not about data snooping; these French guys should have used the term “data acquisition” in their title.

    Hey, that’s what happens to the error bars if you try to track AR1 and lose the measurements! You can blame the math. I’m not saying that it is the best reconstruction ever, but it is alternate and sceptic, as requested.

  14. 2dogs
    As I understand it, “data snooping” occurs where the same observations are used for two or more of the following:

    a. the initial observations that prompted the investigation;

    b. observations used to calibrate models; and

    c. observations used to test and validate models.

    However, again as I understand it, it is relatively okay to re-use observations to test as many alternative hypotheses as you like, so you can have a null hypothesis and alternative hypotheses 1-10, all tested on the same data, with the best overall hypothesis selected. Is this correct?
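
A rough way to probe the question above is to simulate it. The sketch below is an illustration under stated assumptions, not an answer from the cited literature: the data and all ten "hypotheses" are independent white noise, the 5% threshold uses the large-sample approximation 1.96/sqrt(n) for a correlation, and the function name `false_positive_rates` is hypothetical. It compares how often a single pre-specified hypothesis looks significant with how often the best of ten does when nothing is really there.

```python
import numpy as np

def false_positive_rates(n_obs=100, n_hypotheses=10, n_trials=2000, seed=0):
    """Estimate how often noise passes a nominal 5% test, for one
    pre-specified predictor versus the best of several tried on the same data."""
    rng = np.random.default_rng(seed)
    # Approximate two-sided 5% critical value for |r| (normal approximation).
    crit = 1.96 / np.sqrt(n_obs)
    single_hits = 0
    best_hits = 0
    for _ in range(n_trials):
        y = rng.standard_normal(n_obs)                  # the one shared data set
        x = rng.standard_normal((n_hypotheses, n_obs))  # candidate predictors, all noise
        yc = (y - y.mean()) / y.std()
        xc = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
        r = np.abs(xc @ yc) / n_obs                     # |correlation| for each candidate
        single_hits += int(r[0] > crit)                 # hypothesis fixed in advance
        best_hits += int(r.max() > crit)                # hypothesis picked after looking
    return single_hits / n_trials, best_hits / n_trials
```

`false_positive_rates()` returns the two rates in that order. In this setup the first should sit near the nominal 5% while the second is several times larger, which suggests that picking the best of many hypotheses on one data set is fine only if the test afterwards accounts for the picking. That accounting is, as far as I can tell, what the White (2000) paper cited in the post is about.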

  15. Steve Sadlov
    RE: #9 – That is proper thinking. A very realistic view.

