Spurious Significance #1

I’ve had a number of requests to explain some statistical topics and tests of significance. I’d rather not get involved in an explanation of general statistical concepts, which are perfectly well covered in many other places. However, I am going to post some notes up on “spurious significance”, which, after all, was part of the title of our GRL article "Hockey Sticks, Principal Components and Spurious Significance", although most of the attention has been spent on principal components.

"Spurious significance" is a term in statistics used to describe a situation when a statistic returns a value which is "statistically significant", when it is impossible that there is any significance. It’s a topic, which sounds easy, but quickly gets difficult. Phillips [1998] introduces the topic as follows (anyone who doubts the quick descent into complexity need only to look at the article itself). I’m going to reference Phillips frequently because his original 1986 article on the topic was a remarkable tour de force, which framed this entire matter in a very sophisticated way. ) Phillips:

Spurious regressions or nonsense correlations as they were originally called have a long history in statistics, dating back at least to Yule [1926]. Textbooks and the literature of statistics and econometrics abound with interesting examples, many of them quite humorous. One is the high correlation between the number of ordained ministers and the rate of alcoholism in Britain in the nineteenth century. Another is that of Yule [1926] reporting a correlation of 0.95 between the proportion of Church of England marriages to all marriages and the mortality rate over the period 1866-1911. Yet another is the econometric example of alchemy reported by Hendry [1980] between the price level and cumulative rainfall in the U.K. The latter “relation” proved resilient to many econometric diagnostic tests and was humorously advanced by its author as a new “theory” of inflation. With so many well known examples like these, the pitfalls of regression and correlation studies are now common knowledge, even to nonspecialists. The situation is especially difficult in cases where the data are trending, as indeed they are in the examples above — because “third” factors that drive the trends come into play into the behaviour of the regression, although these factors may not be at all evident in the data.

Another set of examples is here, including a regression of:

Egyptian infant mortality rate (Y), 1971-1990, annual data, on Gross aggregate income of American farmers (I) and Total Honduran money supply (M), where the values of the key statistics are: R^2 = 0.918, F = 95.17.

Ultimately, where I’m going with this is a consideration of a couple of different situations, one with which I’m very familiar and one which is new to me. The old one is the regression of MBH98 proxies in the MBH98 calibration step against MBH98 temperature PCs, and, in particular, our two old favorites: the Gaspé tree ring series and the NOAMER PC1 against the temperature PC1. The other one is going to be the regression of the satellite GLB monthly series against a time trend.

I’m writing this little guide through the technical literature mostly for my own reference. None of our published results rely on understanding this literature. However, my intuition is that it’s relevant to some of the issues that are worrying me, so I’m trying to master the literature. I’ll try to write it relatively simply, mostly so that I’m sure that I understand things myself, but I make no promises about sugarcoating it.

The discussions will refer to linear regression. Recently, there has been rather a fashion in multiproxy studies to propose “scaling” proxy series to the mean and variance of the target series in the calibration period as an alternative to regression (e.g. Esper et al [GRL 2005] and references). However, it seems intuitively clear to me that, whatever the merits of this approach, it does not circumvent issues of spurious relationships. This can be seen by recognizing that “scaling” as practiced by paleoclimatologists is simply “constrained” regression, i.e. regression with a restriction on the coefficients. The equivalence can be demonstrated with a trivial Lagrange multiplier argument, which I’m thinking of submitting somewhere. For now, interested parties should bear with me and accept that scaling as practiced in paleoclimate is simply a form of constrained regression and not a magic bullet for avoiding problems of spurious significance.
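For concreteness, here is a minimal numerical sketch of that equivalence (my own illustration, not anything from Esper et al; the simulated proxy, target and seed are arbitrary). It checks that scaling a proxy to the mean and variance of a target over a calibration period yields exactly the same fitted series as least-squares regression with the slope constrained so that the fitted variance equals the target variance:

import numpy as np

rng = np.random.default_rng(0)
n = 120                                        # hypothetical calibration period length
x = rng.normal(size=n).cumsum()                # "proxy" (autocorrelated, purely illustrative)
y = 0.4 * x + rng.normal(scale=2.0, size=n)    # "target" (stands in for an instrumental series)

# (1) Scaling as practiced: re-centre and re-scale the proxy to the target's mean and variance.
scaled = (x - x.mean()) / x.std() * y.std() + y.mean()

# (2) Constrained regression: minimize sum((y - a - b*x)^2) subject to var(a + b*x) = var(y),
#     i.e. |b| = sd(y)/sd(x).  Given b, the best intercept is a = mean(y) - b*mean(x);
#     the SSE-minimizing sign of b is the sign of the correlation.
b_mag = y.std() / x.std()
fits = []
for b in (b_mag, -b_mag):
    a = y.mean() - b * x.mean()
    fits.append((((y - (a + b * x)) ** 2).sum(), a, b))
sse, a, b = min(fits)

print(np.allclose(scaled, a + b * x))          # True: scaling reproduces the constrained fit

Under the constraint, the only thing the data get to choose is the sign of the slope; its magnitude is pinned at sd(y)/sd(x), whereas unconstrained least squares would shrink it by the correlation coefficient.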

These notes will focus heavily on autocorrelated series. Statistics as presented in curricula always starts from the concept of independent draws, but the behaviour of even simple statistics like the mean and variance is quite different in highly autocorrelated series. It turns out that these issues are intimately involved with spurious regression: one of the main causes of spurious statistics is a massive under-estimation of the standard deviation (and hence the variance) of autocorrelated series when the standard ordinary-least-squares (OLS) formulas are applied. A variety of technologies have been proposed in econometrics for dealing with this problem, but the issue doesn’t seem to have surfaced in paleoclimatology.
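As a foretaste of why this matters, here is a small Monte Carlo sketch (my own illustration; the AR(1) coefficient of 0.9, the series length and the seed are arbitrary choices) of how badly the usual i.i.d. formula s/sqrt(n) understates the true variability of the sample mean of an autocorrelated series:

import numpy as np

rng = np.random.default_rng(1)
n, reps, phi = 200, 2000, 0.9          # series length, replications, AR(1) coefficient

means, naive_se = [], []
for _ in range(reps):
    e = rng.normal(size=n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]   # AR(1) process
    means.append(x.mean())
    naive_se.append(x.std(ddof=1) / np.sqrt(n))   # textbook i.i.d. standard error of the mean

print("true sd of the sample mean:", round(float(np.std(means)), 3))
print("average naive (iid) s.e.  :", round(float(np.mean(naive_se)), 3))

With an AR(1) coefficient of 0.9, the true standard deviation of the mean comes out roughly four times the naive estimate (the asymptotic inflation factor is sqrt((1+phi)/(1-phi)), about 4.4 here), so confidence intervals built from the naive formula are far too narrow.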

References:
Phillips, P. [1998]. New Tools for Understanding Spurious Regressions, Econometrica, 66, 1299-1325. http://cowles.econ.yale.edu/P/cp/p09b/p0966.pdf
Phillips, P. [1986]. Understanding Spurious Regressions in Econometrics, Journal of Econometrics, 33, 311-340. http://cowles.econ.yale.edu/P/cp/p06b/p0667.pdf
Esper, J., Frank, D.C., Wilson, R.J.S. and Briffa, K.R. [2005]. Effect of scaling and regression on reconstructed temperature amplitude for the past millennium, Geophysical Research Letters, 32, doi:10.1029/2004GL021236. http://www.wsl.ch/staff/jan.esper/publications/GRL_Esper_2005.pdf

42 Comments

  1. TCO
    Posted Aug 20, 2005 at 8:50 AM | Permalink

    Did you actually use this level of sophisticated understanding when working for mining companies? I mean Lagrange multipliers! I’m at a “science based” F500 company. And I bet that 95% of researchers, research management, and general management would get eye glaze from a comment about Lagrange multipliers. Are you exercising a long-held frustration to use more sophisticated methods?

  2. DAV
    Posted Aug 20, 2005 at 9:21 AM | Permalink

    Most people’s eyes glaze when encountering “x/y.” What’s your point? The topic here is math. Lagrange multipliers are germane as they are one of the most prominent methods to account for constraints in optimizations.

  3. TCO
    Posted Aug 20, 2005 at 12:01 PM | Permalink

    You’ve misread the post and think it means a criticism of the site. My question is as stated to Steve, for general discussion/interest (on the topic cited).

  4. Steve McIntyre
    Posted Aug 20, 2005 at 1:20 PM | Permalink

    TCO, I got your other comment in the spirit intended.

    No, I didn’t use Lagrange multipliers in mining. I did business and not technical stuff. This is from deep in my resume – I suppose that I learned it in 2nd year university in the 60s. I had to try to remember how they worked, but my math is getting better all the time, as I work at it. I don’t have the athleticism that I had when I was young. But the multiproxy math is trivial; it’s not much more than accounting.

    The Lagrange multiplier thing is more a criticism of Esper and that crowd. If they understood constraints or Lagrange multipliers, they wouldn’t have gotten all excited about a supposed difference between scaling and regression. You’d think that some young sharp mathematicians or statisticians would start trying to deconstruct the climate stuff. Although I think that I know the answer – some of the issues are so simple that there’s no statistical interest in them. Really what you need are a bunch of sharp undergraduates. Very few climate articles are more advanced than a 2nd year or 3rd year essay. MBH98 is itself rather a laboratory of statistical horrors, but none that are beyond undergraduate level.

  5. TCO
    Posted Aug 20, 2005 at 7:00 PM | Permalink

    Undergrad in statistics? In physics? And of what level of sophistication of the student? I assume you must be talking about a rather elite student. The average ChemE that I know does not really even have a thoughtful turn of mind to go after things…

  6. Steve McIntyre
    Posted Aug 20, 2005 at 9:54 PM | Permalink

    This was a pure math program for very elite (math contest calibre) students, which is what I was in a younger incarnation. (I stood first in Canada in my senior year in the high school math contest). Here’s the sort of papers that two of the guys (Ed Bierstone, John Scherk) that I competed with have written. Both have had their names attached to theorems that are cited. I marvel at the calibre of the papers. While I obviously don’t have any academic accomplishment remotely approaching theirs, you can also see why I’m not especially impressed by Mannian bluster or goofy paleoclimatologist statistical practices. I had lunch with Scherk on the day of the Wall Street Journal article: he was quite amused by my sudden celebrity.

    E. BIERSTONE and P.D. MILMAN, Composite differentiable functions, Ann. of Math., 116 (1982), 541-558. Zbl 0519.58003
    E. BIERSTONE and P.D. MILMAN, The Newton diagram of an analytic morphism, and applications to differentiable functions, Bull. Amer. Math. Soc. (N.S.), 9 (1983), 315-318. Zbl 0548.58004
    E. BIERSTONE and G.W. SCHWARZ, Continuous linear division and extension of C∞ functions, Duke Math. J., 50 (1983), 233-271

    Vijaya Kumar Murty and John Scherk: Effective versions of the Chebotarev density theorem for function fields, C. R. Acad. Sci. Paris, Série I Math., t. 319, 1994, pp. 523–528.
    Scherk, J.: On the monodromy theorem for isolated hypersurface singularities. Invent. math. 58, 289–301 (1980).
    Scherk, J., Steenbrink, J. H. M.: On the mixed Hodge structure on the cohomology of the Milnor fibre. Math. Ann. 271, 641–665 (1985).
    J. Scherk, The ramification polygon for a curve over a finite field, Cdn Math Bull 46 (2003)

    I went to a course reunion in May and sat beside Jim Arthur, a really eminent mathematician, who was so charming that he feigned interest in what I was doing. In the lineup for dinner, John Polanyi, an equally eminent scientist, joined the conversation – me with these two remarkable scientists talking about my views on climate change! It was very heady stuff and cheered me enormously.

  7. TCO
    Posted Aug 20, 2005 at 10:06 PM | Permalink

    Any regrets that you did not spend the last several years using your brains in more academic or scientific things? I sometimes wonder which way I should go, although years are going by and I’ve done both bizness weenie stuff and hard core science.

  8. Martin Ringo
    Posted Aug 21, 2005 at 11:03 AM | Permalink

    Regarding spurious regression (or significance) and the Phillips (1998) article

    My (I guess naive) understanding of spurious regression was that it was similar to the fallacy of the multiple t-tests: run twenty t-tests on sets of random noise and you can expect to get one 5% significant result. Or life is full of coincidences. Phillips seems to go further and say something to the effect that any vector of random noise can be explained (presumably meaning something other than the trivial “you pick the significance of the explanation, and I pick an explaining hyperplane”).

    What (among the many things) I don’t understand is the empirical meaning of Phillips’s article. What is the “new tool” if you like?

    Say we have regressed Y on X and found a significant coefficient for the slope. If we have a “model” (in the general sense of the term) of Y = f(X), with f’>0 and our slope is positive, we might say the data is consistent with (has not falsified) our hypothesis. What is Phillips saying that we must do to further check this conclusion?

    In the exploratory use of regression, i.e. without the model, such as running simple linear trends, most empiricists understand that all one has done is provide a summary of the data. For instance, a linear trend says the data is tilted up or down to some degree. Without the theoretical comfort of the model, the empiricist would (OK, make that “should”) be more wary of coincidence, as in the many warnings to physicians about relying on individual retrospective studies. But exploratory regression should be done in the search for the suggestion of a hypothesis, which then has to be developed and tested against alternative data. That is, it is descriptive, not inferential. Is Phillips suggesting that this is not useful or legitimate? If so, is that not the equivalent of telling the astronomers to stop looking at the stars? So what is Phillips recommending as to exploratory regression?

    PS: Even if you can’t help me here, I am glad you have kept your brain, shall we say, agile enough to tackle stuff like Phillips. Your postings are consistently interesting. But I shall blanch at the thought of Loeve-Karhunen expansions.

  9. Steve McIntyre
    Posted Aug 21, 2005 at 12:48 PM | Permalink

    Re #8: no it’s a different and more subtle problem. The problem is that the t-test rejects the null hypothesis (say) 75% of the time rather than 5% of the time and that it gets worse as N increases. Phillips [1998] is not the place to start. I’ll post up some more accessible versions.
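    In the meantime, a bare-bones simulation in the spirit of Granger and Newbold conveys the flavour (this is only an illustrative sketch of my own; the sample sizes, replication count and seed are arbitrary). Two independent random walks are regressed on each other and the nominal 5% t-test on the slope rejects far too often, and more often as the sample grows:

import numpy as np

rng = np.random.default_rng(2)

def reject_rate(n, reps=1000):
    """Share of regressions of one independent random walk on another with |t| > 1.96."""
    count = 0
    for _ in range(reps):
        x = rng.normal(size=n).cumsum()              # two independent random walks
        y = rng.normal(size=n).cumsum()
        X = np.column_stack([np.ones(n), x])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta
        s2 = resid @ resid / (n - 2)
        se_slope = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        if abs(beta[1] / se_slope) > 1.96:           # nominal two-sided 5% test
            count += 1
    return count / reps

for n in (50, 100, 400):
    print(n, reject_rate(n))

    The rejection rates typically come out well above one-half rather than the nominal 0.05, and they rise with n rather than settling down.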

  10. Peter Hartley
    Posted Aug 21, 2005 at 2:26 PM | Permalink

    It so happens that I have been working on some economic data that I think illustrates the spurious regression phenomenon under discussion quite well. The analysis involves US real GDP data (available for example from http://research.stlouisfed.org/fred2/series/GDPC96/18 ). It is interesting to compare two sets of regression results using this data. First, if you regress the natural log of real GDP on time, time^2 and time^3 (I used an index starting 0.25 for quarter 1 of 1947 and ending at 58.5 for quarter 2 of 2005), all three time variables have coefficients statistically significantly different from zero. The R^2 of the regression is 0.997384, the coefficients and estimated standard errors are:

    constant 7.3249 (0.0077)
    time 0.0434 (0.0011)
    time^2 -0.00026 (4.49E-05)
    time^3 1.73E-06 (5.02E-07)

    It is not “crazy” that the quadratic and cubic terms might be significant. Essentially, they fit the facts that US GDP growth slowed down in the 70’s and early 80’s, but was higher in the 60’s and 90’s. If we suppose this model is valid, the first difference of y=ln(GDP) (an approximation to the growth rate) should satisfy a quadratic. Specifically, if we let

    y(t) = a + b*t + c*t^2 + d*t^3

    then

    y(t-1) = a + b*(t-1) + c*(t-1)^2 + d*(t-1)^3 = (a-b+c-d) + (b-2c+3d)*t + (c-3d)*t^2 +d*t^3

    so that

    y(t) – y(t-1) = (b-c+d) + (2c-3d)*t + 3d*t^2

    However, if you regress the first difference on t and t^2 you get:

    constant 0.01002 (0.002008) estimated coefficient roughly 1/4 the implied value from the trend model
    time -8E-05 (0.00016) p-value = 0.612, estimated coefficient roughly 1/6 the implied value from the trend model
    time^2 6.28E-07 (2.58E-06) p-value = 0.808, estimated coefficient roughly 1/8 the implied value from the trend model

    R^2 = 0.0055

    The second regression implies that the strong deterministic trend present in the levels data is spurious. The problem is that real GDP has a non-stationary (i.e. random walk) component (that Steve discussed in a previous post). Taking first differences results in a stationary series (the innovations in the random walk). Leaving the random walk in the error term, however, results in one finding a spurious time trend. Is something similar happening with climate data that has high-AR-coefficient (i.e. “near random walk”) behavior?
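    The same contrast shows up in a simple simulation. Here is a minimal sketch using a random walk with drift in place of the actual GDP series, so the numbers are purely illustrative (the drift, the noise level and the scaled time index are arbitrary choices):

import numpy as np

rng = np.random.default_rng(3)
n = 234                                        # roughly the number of quarters 1947-2005
t = np.linspace(0.0, 1.0, n)                   # scaled time index, for numerical convenience
y = 7.3 + np.cumsum(0.01 + 0.01 * rng.normal(size=n))   # random walk with drift, like log GDP

def ols_t_stats(X, y):
    """Return OLS coefficients and their conventional t-statistics."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

# Levels regression: y on 1, t, t^2, t^3
X_lvl = np.column_stack([np.ones(n), t, t**2, t**3])
print("levels t-stats    :", np.round(ols_t_stats(X_lvl, y)[1], 2))

# Difference regression: dy on 1, t, t^2
dy, td = np.diff(y), t[1:]
X_dif = np.column_stack([np.ones(n - 1), td, td**2])
print("difference t-stats:", np.round(ols_t_stats(X_dif, dy)[1], 2))

    The levels regression typically reports very large t-statistics on the trend terms, because the random-walk component is left sitting in its error term and the usual standard errors are far too small, while the differenced regression, whose errors really are close to white noise, generally shows little beyond the constant.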

  11. Ross McKitrick
    Posted Aug 21, 2005 at 6:03 PM | Permalink

    Herman Bierens of Penn State (econometrics) gives away a terrific time-series regression package called EasyReg. You can get it here: http://econ.la.psu.edu/~hbierens/EASYREG.HTM. It takes some getting used to but it’s point-and-click Windows based and does some stuff that is hard to find in other packages. One nice feature is in the tools menu: Teaching Tools: Spurious regression. You can select samples of unit root data series up to n=1000 and it will run a regression of 2 independent series on each other, give you the graphs and coefficients, and you can verify that most times the t stat is well over 3.0, even though there is no relationship between the two. The issue is that both data series are generated as unit root processes, e.g. x(t) = x(t-1) + e(t), where e(t) is a normal(0,1) error. This was the phenomenon flagged by Granger and Newbold and explained by Phillips.

  12. Peter Hartley
    Posted Aug 21, 2005 at 7:01 PM | Permalink

    Further to #10, I was bothered about the large differences between the two sets of estimated coefficients (the problem should not be that bad!). I realized that I made a mistake. The way I have defined time, the first difference is y(t)-y(t-0.25). This leads to implied coefficients in the difference regression of:

    constant = 0.25*(b-(c/4)+(d/16)) = 0.0109, which is still about 1.09 times the estimated constant in the differenced model
    time coeff = 0.5*(c-(3/8)d) = -0.00013, which is still about 1.66 times the estimated linear term in the differenced model
    time^2 coeff = 0.75*d = 1.3E-06, which is still about 2.07 times the estimated quadratic term in the differenced model

    We can still see that the levels model gives artificially large coefficients on the time trend terms, but the differences I had before were a bit too large!

  13. John Hunter
    Posted Aug 21, 2005 at 11:30 PM | Permalink

    Steve (#4): You say:

    “Very few climate articles are more advanced than a 2nd year or 3rd year essay. MBH98 is itself rather a laboratory of statistical horrors, but none that are beyond undergraduate level.”

    You never seem to learn that it is very difficult for a climate scientist to discuss technical issues with someone who seems so intent on pissing climate scientists off.

    Couldn’t you just try and cool it for a while? Or can’t you help yourself?

  14. ET Sid Viscous
    Posted Aug 21, 2005 at 11:43 PM | Permalink

    So now you want to come into his site and tell him what he can and can’t say on it?

    Climate science is a broad field, requiring many different skills and tools, of which statistics is only one. It would not be expected that they were as good with statistics as someone to whom it is their primary devotion, particularly since climatology is a science far removed from pure math. Those for whom statistics is their daily bread and butter might be expected to know it better than someone for whom it is just a tool to examine certain portions of their field.

    I’m willing to bet an EE would know the design of the microchip in the computer better than the climatologist who uses it to run programs.

    Regardless, it’s his site, he has plenty of people focusing a lot of time in ad hominem attacks against him on their sites, it’s only reasonable that he would vent a little from time to time. And I don’t see that comment as particularly vitriolic.

  15. GASman
    Posted Aug 22, 2005 at 4:22 AM | Permalink

    JH (#13) Rather than just being pissed off in what to me seems a childish playground sort of way, wouldn’t it help if climate researchers recognised that when they use statistical methods it is necessary to do so in an appropriate manner? Your comment “Or can’t you help yourself?” is just a plain ad hom; can’t you help it?

  16. TCO
    Posted Aug 22, 2005 at 7:55 AM | Permalink

    I agree Steve. In general, you seem to have some right on your side. And in a perfect world, it shouldn’t matter if you write things that are provocative. An ideal researcher (a Feynman) would share results with you regardless. Either he would be confident in his case and unworried about your attempts to audit it… or he would be a “real scientist” who values correction from others… to ensure truth. However, Mann is probably not in that camp. He’s a young (now middle-aged) Turk with some nice grants and standing in a politicized field.

    I would refrain from the ad hominem if you want to get data/methods/help, and if you want to push the discussion onto the content rather than getting too wrapped up in the hubbub. You need to keep the high moral ground, so that people who are independent wonder what the heck Mann is scared of, for playing hide and seek.

  17. TCO
    Posted Aug 22, 2005 at 8:03 AM | Permalink

    And in actuality, you are a bit over the top, with the comments about undergrad level reasoning. Since, when pushed, what you are really looking for is the brightest of bright in a specialized program.

  18. John A
    Posted Aug 22, 2005 at 8:33 AM | Permalink

    “Very few climate articles are more advanced than a 2nd year or 3rd year essay. MBH98 is itself rather a laboratory of statistical horrors, but none that are beyond undergraduate level.”

    You never seem to learn that it is very difficult for a climate scientist to discuss technical issues with someone who seems so intent on pissing climate scientists off.

    Not all climate scientists, just a select grouping of them who thoroughly deserve it, because of their hubris and their ill-gotten fame.

  19. Steve McIntyre
    Posted Aug 22, 2005 at 9:51 AM | Permalink

    OK, my language was certainly politically incorrect. Let me try to state the nuance a little better. There’s nothing technically hard in the statistics of multiproxy climate studies – by technically hard, I mean something like the articles by my old classmates Scherk or Bierstone (see #6), or the articles by Phillips or by Kiefer and Vogelsang. Every Hockey Team article that I’ve read is, in my opinion, really lousy in statistical terms. Maybe (John Hunter) that won’t win me any friends, but that’s what I think. For Ph.D. statisticians, the errors are not bad in "interesting" ways, so a self-respecting post-doc statistician is unlikely to go analyzing Hockey Team articles. However, for someone whose interests are a little less elevated (e.g. a bright undergraduate), there’s plenty of material to make terrific papers. I could design a course for about 10 3rd-year students based around analyzing Hockey Team articles that would probably be very stimulating for them. I’m not saying that it would work for average students; I’m thinking of bright students that will later be technically very skilled.

    As to MBH98 being a "laboratory of horrors", I didn’t mean that in a purely pejorative way. You could design a nice seminar on MBH98 itself. Every step of the article involves some statistical issue, which is usually handled in a less than ideal way. Not every issue "matters", but it is a true "laboratory". After doing the seminar, the students would understand how the nooks and crannies contribute to a result.

  20. fFreddy
    Posted Aug 22, 2005 at 10:47 AM | Permalink

    Now, that sounds like something for which it might be worth submitting a grant application …

  21. John Hunter
    Posted Aug 22, 2005 at 7:27 PM | Permalink

    Steve (#19): I could probably go over the work of a typical exploration geologist and say that hardly any of it was “more advanced than a 2nd year or 3rd year essay” and that none of it was “beyond undergraduate level”. If I pre-selected the geologist, it would probably be true but it wouldn’t be very helpful. I still use lots of stuff that I learned as an undergraduate — we all do.

  22. Louis Hissink
    Posted Aug 22, 2005 at 10:47 PM | Permalink

    Hunter makes an interesting comment about exploration geologists, presumably on their use of statistics?

    We use geostatistics quite a lot, and are keenly aware of the problem defined by Wellmer, or Koch and Link, or even Agterberg’s Geomathematics and others as “sample support” and the “sample-volume-variance” effect, apart from the distinction between intensive and extensive variables, and their misuse in statistics.

    I am not going to get too detailed, but a few years back Essex and McKitrick wrote a concise little book that deals with this problem in the Dr Thermometer chapter.

    Hence the Hadley Centre’s method of computing a global mean temperature by adding station data in grid cells defined by latitude and longitude is fundamentally flawed and will produce an inaccurate estimate of that global temperature. The correct procedure is identical to the methodology of polygonal ore-reserve estimates, where each temperature station is assigned a unique polygon of area defined by what type of land it is, and obeying also the rule that each polygon boundary marks the halfway point between adjacent stations. One could assume that the third dimension, thickness, is constant, so the basic extensive variable is area that has to be factored by the intensive variables temperature and specific heat.

    Once individual polygonal areas are assigned to each temperature station, that area is multiplied by the mean temperature of the station for the period under study, times the specific heat for air at whatever humidity is representative for that same period of study, presumably 1 year. And so on – all of these products are added, and then divided by the sum of the area*specific heat terms to compute the final global mean.

    Essentially it is the method of mixtures calculation used in physics 101 texts (that is, if they teach the method any more); a small numerical sketch is given at the end of this comment.

    The Hadley centre method computes the global mean temperature of an imaginary spherical surface that has no physical meaning or any basis in reality.

    Mind you, if the Hadley Centre methodology were applied to the computation of a mineral ore reserve, where the third dimension is important and hence it is volume which is the primary extensive variable that the intensive variable metal% has to factor (in climate it is the extensive variable area that is factored by the intensive variables temperature and specific heat etc), then their estimate would be quite inaccurate.
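    For what it is worth, here is a minimal numerical sketch of the method-of-mixtures average described above, with entirely hypothetical station polygon areas, mean temperatures and specific heats (the polygon construction itself is left out):

import numpy as np

# Hypothetical values for four stations: polygon area (km^2), mean temperature (deg C)
# over the period of study, and specific heat of air at a representative humidity (kJ/kg/K).
area = np.array([1.2e5, 3.4e5, 0.8e5, 2.1e5])
temp = np.array([14.2, 9.8, 21.5, 17.1])
cp   = np.array([1.006, 1.006, 1.012, 1.009])

weights = area * cp                              # extensive variable times specific heat
mixture_mean = np.sum(weights * temp) / np.sum(weights)
naive_mean = temp.mean()                         # simple unweighted station average

print("method-of-mixtures mean:", round(float(mixture_mean), 2))
print("unweighted station mean:", round(float(naive_mean), 2))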

  23. Armand MacMurray
    Posted Aug 22, 2005 at 10:57 PM | Permalink

    Re: #21

    John, you left out Steve’s point: the stuff learned as an undergraduate needs to be used *correctly*.
    I know that biologists often consult/collaborate with statisticians when doing experiments involving more than minor use of statistics; perhaps this practice would be useful for climate research.

  24. John Hunter
    Posted Aug 23, 2005 at 10:04 PM | Permalink

    Louis Hissink (#22): It is very easy to “do a McIntyre” and to say things like “the Hadley Centre’s method of computing a global mean temperature by adding station data in grid cells defined by latitude and longitude is fundamentally flawed and will produce an inaccurate estimate of that global temperature”.

    Firstly I would ask what you mean by “inaccurate”. Presumably you mean that “your” method would yield results that have a lower uncertainty than the “Hadley Centre” method. Well, in that case you need to prove your assertion, presumably by doing the analysis using both methods, with consistent error budgets.

    Secondly, I wonder why you say “the correct procedure is identical to the methodology of polygonal ore-reserve estimates, where each temperature station is assigned a unique polygon of area defined by what type of land it is, and obeying also the rule that each polygon boundary marks the halfway point between adjacent stations”. I would agree that this is ONE method of doing an average over a set of discrete points, but surely you know that there are numerous methods of doing this. So why is “your” method the “correct” one?

    Thirdly, the question of whether averaged temperature, internal energy or enthalpy should be used for the “global average” is not trivial. For example, what property is carried around by the wind? — temperature, internal energy or enthalpy — or something else? (I leave it to you to find the answer to that.) If I put a cake in an oven, does it take on the temperature of the oven or its internal energy? Is the long-wave radiation emitted by the surface of the Earth dependent on the temperature of the Earth or its internal energy? Why do you claim that “the global mean temperature of an imaginary spherical surface ….. has no physical meaning or any basis in reality”? There are lots of questions and no simplistic answer.

    So — if you think that “the Hadley Centre’s method of computing a global mean temperature ….. is fundamentally flawed”, make a contribution: take the historical temperature data, do your own analysis and publish the results.

  25. Steve McIntyre
    Posted Aug 23, 2005 at 10:14 PM | Permalink

    Louis, I’m familiar with ore reserve calculations. I don’t see the problem with the way that CRU does averages – in effect, they use gridcell areas the way that you use polygons. What’s unreasonable about that? I think that there are other issues with CRU, but I don’t get the complaints about averaging.

  26. Jeff Norman
    Posted Aug 24, 2005 at 8:49 AM | Permalink

    I have always disagreed with the contention in the Essex and McKitrick book that average temperature is a meaningless concept.

    Jeff

  27. Michael Jankowski
    Posted Aug 24, 2005 at 9:18 AM | Permalink

    Firstly I would ask what you mean by “inaccurate”. Presumably you mean that “your” method would yield results that have a lower uncertainty than the “Hadley Centre” method. Well, in that case you need to prove your assertion, presumably by doing the analysis using both methods, with consistent error budgets.

    Narrowing the uncertainty range would make the methodology more precise, but not necessarily more accurate.

    A more accurate methodology would more closely approximate the average global temperature. Proving that would be quite difficult (if not impossible) since we don’t have a magical instrument that can measure the average global temp to compare to different weighting methodologies.

  28. John Hunter
    Posted Aug 24, 2005 at 4:31 PM | Permalink

    Michael Jankowski (#27): You say “narrowing the uncertainty range would make the methodology more precise, but not necessarily more accurate.”.

    And there I was thinking that uncertainty only included precision and not accuracy — silly me. Remind me to never accept an uncertainty estimate from you.

  29. Michael Jankowski
    Posted Aug 24, 2005 at 6:10 PM | Permalink

    RE#28 – Example: A long time ago, a guy wanted to time his walks from his new house to the village down the road to get an idea of how long it takes him to get there. He leaves at noon every day, as soon as the large hand on the clock hits the 12. As soon as he arrives around a half-hour later, he checks the clock in the village square. Sitting beneath the clock is the village idiot, and wanting a second opinion as to the actual time because his vision is fading somewhat he asks the village idiot what time it is. “Eight-oh-six and forty-five seconds after noon!” the idiot replies. So every day for a few weeks, the man repeats the process. Each time he takes a reading from the village clock, and each time he asks the village idiot and gets the same response: “Eight-oh-six and forty-five seconds after noon!” According to the clock readings the man takes in the village square, it takes him 28.50 +/- 1.50 minutes each day to walk from his house to the village. On the other hand, according to the village idiot, it takes him 486.75 +/- 0.00 minutes each day to walk from his house to the village.

    So tell me, please…is the method with the smallest uncertainty also the most accurate in this case? Are you going to go with the village idiot’s methodology and result?

    As I said, a narrower uncertainty is “not necessarily more accurate.”

  30. JP
    Posted Aug 24, 2005 at 6:31 PM | Permalink

    Perhaps a daft question – but why does so much seem to hang on ‘average global temperature’ in the first place? Climate change could result in catastrophic weather conditions without any change in average temperature (or rainfall or whatever), just large changes at a local level that may themselves average out to ‘no change’ over time.

  31. John Hunter
    Posted Aug 24, 2005 at 7:42 PM | Permalink

    Michael Jankowski (#29): I’m not sure if this is even worth replying to. If you base your estimates on poor data or do the analysis wrong, then it is very easy to estimate the uncertainty incorrectly. If you do things right, then the estimate with the lowest uncertainty will give you the most accurate estimate of the answer. I don’t generally trust the “village idiot” (in a real village or, metaphorically, on this site) to give me trustworthy data.

  32. Posted Aug 24, 2005 at 11:09 PM | Permalink

    Physics simulations often produce time series with high autocorrelation, with the same problem of obtaining appropriate error estimates on time-averaged means of various quantities. One accepted technique in that community is Flyvbjerg-Petersen block averaging; see J Chem Phys 91:461, 1989. It might be worth checking out to see if it can be applied to the autocorrelated time series of interest here.

  33. John A
    Posted Aug 25, 2005 at 12:51 AM | Permalink

    Re: #30

    JP: those questions are discussed at length in “Taken by Storm” by Essex and McKitrick.

    From my reading, the movement of the “global temperature” statistic by climate scientists is taken to be a proxy for “global climate change”. Needless to say, this linkage is taken to be axiomatic, without any fundamental argument being employed to explain it. Essex and McKitrick argue that without a theory of climate, such discussions of “global temperature” are meaningless.

  34. Michael Jankowski
    Posted Aug 25, 2005 at 6:11 AM | Permalink

    If you do things right, then the estimate with the lowest uncertainty will give you the most accurate estimate of the answer.

    That’s a huge “if.” Remember, the original statement of mine that you had such a big problem with: “narrowing the uncertainty range would make the methodology more precise, but not necessarily more accurate.”

    The case made in #22 (that you responded to in #24) was that the Hadley Centre methodology was “fundamentally flawed,” which in my mind means “not doing things right.” You said a more accurate methodology would reduce the uncertainty below the Hadley Centre’s uncertainty. As I have shown and you have admitted, a narrower uncertainty can only imply greater accuracy over another method if things are done “right” in both methods. So if the Hadley Centre’s uncertainty in #22 is produced by incorrect methodology, why would an alternative methodology need to have a smaller uncertainty in order to be considered more accurate?

    In the end, it all comes down to the fact that we don’t have a direct measurement of average global temperature to compare the method results to, so there is no way to determine which methodology produces the most accurate estimate. But it should be clear by now that narrowing the uncertainty by itself wouldn’t imply greater accuracy.

  35. Steve McIntyre
    Posted Aug 25, 2005 at 7:54 AM | Permalink

    Re #32: thanks for the reference. I’ll look at it.

  36. JP
    Posted Aug 25, 2005 at 1:24 PM | Permalink

    RE33 Thanks John – It’s on my shelf with other ‘must reads’! I agree that a theory of climate would be useful in discussing climate change(!) I didn’t realise there were none. Presumably there are, but climate being so complex I guess there’s little agreement over the theories? I would also guess that since there’s so much we don’t know we are a long way off agreeing a theory. But I disagree that ‘without a theory of climate discussions on global temperature are meaningless’. With or without a theory discussions could be equally meaningless (i.e. the theory could be wrong), but, knowledge always being incomplete, that shouldn’t stop us having discussions about things that concern us, and hypothesising about cause-effect, as long as we remain humble.

  37. John A
    Posted Aug 25, 2005 at 1:50 PM | Permalink

    With or without a theory discussions could be equally meaningless (i.e. the theory could be wrong), but, knowledge always being incomplete, that shouldn’t stop us having discussions about things that concern us, and hypothesising about cause-effect, as long as we remain humble.

    You may have noticed that humility amongst climate modellers is a vanishingly rare commodity.

  38. JP
    Posted Aug 29, 2005 at 3:43 AM | Permalink

    RE 37
    Unfortunately I’ve noticed that humility is rare, full stop. E.g. our capacity to inflict damage on the environment, and on our fellow man, seems to be ‘increasing faster than our ability to predict its consequences.’

  39. Willis Eschenbach
    Posted Oct 24, 2006 at 3:49 PM | Permalink

    Re 35, Steve M., I just found a good description of the Flyvbjerg-Petersen algorithm here

    All the best,

    w.

  40. Willis Eschenbach
    Posted Oct 24, 2006 at 9:42 PM | Permalink

    I just did a trial run with the Flyvbjerg-Petersen algorithm, comparing it to the Nychka algorithm. I used both to calculate the standard error of the mean.

    I tried it with both the raw data and the detrended data (n=128). In both cases, the Flyvbjerg-Petersen algorithm gave a larger standard error than the Nychka algorithm.

    With the detrended data, the Nychka algorithm SEM is about 80% of the F-P algorithm (SEM 0.027 vs 0.033)

    With the raw data, the Nychka algorithm SEM is about 33% of the F-P algorithm (SEM 0.034 vs 0.10)

    This would seem to indicate that the Nychka algorithm underestimates the actual effect of autocorrelation.

    I repeated the experiment with Jones monthly data, which has a larger autocorrelation, and with a longer time series (n=512). Both the raw and detrended data showed similar numbers. The Nychka algorithm SEM from the raw data was about 20% of the F-P algorithm. The detrended data Nychka algorithm was about 63% of the F-P algorithm.

    Again, this suggests that the Nychka algorithm underestimates the true effect of the autocorrelation. The greater the autocorrelation, the greater the difference between the two methods. The Nychka algorithm is an ad-hoc method, whereas the F-P algorithm is based on a mathematical derivation …

    All of this means that the effect of autocorrelation on temperature data series is worse than I had estimated using the Nychka algorithm …

    w.

  41. bender
    Posted Oct 24, 2006 at 9:49 PM | Permalink

    Willis, how can you assume that the Flyvbjerg-Petersen algorithm is better? And are you sure it’s been coded correctly?

  42. Willis Eschenbach
    Posted Oct 25, 2006 at 4:40 AM | Permalink

    I don’t know if it’s better, bender, but it’s mathematically provable, according to the paper, while Nychka’s is a heuristic method, as far as I know.

    Regarding coding it correctly, I did it manually so I could check each step, assuming I understood it correctly. Basically, the F-P algorithm works by collapsing the data into smaller and smaller sizes, by using “blocks” of data. Each block is made by averaging adjacent data pairs in the previous dataset, to make a dataset half the size of the original, with each element in the new dataset representing a block of twice the size. This incrementally reduces the autocorrelation while leaving the mean unchanged. The relevant formulas are:

    x_{i}^{'} = \frac{1}{2}\left(x_{2i-1}+x_{2i}\right)

    n^{'} = \frac{1}{2}n

    This is repeated until the autocorrelation disappears or the process cannot continue, and then the variance of the mean is determined in the usual fashion. The paper I cited above gives the relevant mathematical derivation of the method.
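    Here is a minimal sketch of the blocking procedure as I read it (my own coding against a synthetic AR(1) series, so treat the details as an illustration rather than a verified transcription of the paper):

import numpy as np

def fp_blocking(x, min_blocks=4):
    """Blocking in the Flyvbjerg-Petersen style: (level, n_blocks, naive SEM) at each stage."""
    x = np.asarray(x, dtype=float)
    out, level = [], 0
    while len(x) >= min_blocks:
        n = len(x)
        out.append((level, n, np.sqrt(np.var(x, ddof=1) / n)))   # i.i.d.-formula SEM at this level
        if n % 2:
            x = x[:-1]                       # drop a trailing point if the length is odd
        x = 0.5 * (x[0::2] + x[1::2])        # one blocking step: average adjacent pairs
        level += 1
    return out

# Synthetic AR(1) series with coefficient 0.9: the SEM estimate should climb with the
# blocking level and roughly plateau once the blocks are effectively independent.
rng = np.random.default_rng(4)
n, phi = 4096, 0.9
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

for level, nb, sem in fp_blocking(x):
    print(level, nb, round(float(sem), 4))

    The plateau value, not the unblocked level-0 number, is the estimate of the standard error of the mean; the last few levels become noisy simply because so few blocks remain.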

    However, experimentation with larger datasets shows that the F-P method sometimes gives smaller answers, and sometimes larger answers, than the Nychka method. In addition, because F-P reduces the size of the dataset by a factor of two each time, the autocorrelation changes by large steps in the final stages, and the zero point falls between two steps. This introduces a margin of error into the F-P algorithm.

    Finally, this algorithm is designed for very large datasets (the smallest example in the paper has 62 thousand data points, and the largest, 31 million). On small datasets, the F-P algorithm seems not to converge entirely on non-detrended datasets before it stops because no more blocking operations can be done. In addition, the difference between the Nychka and the F-P algorithm decreases as the dataset size n increases. On a dataset of 1024 data points, the difference on the detrended dataset is quite small, although it is still relatively large on the un-detrended dataset.

    Because of this, I reckon that the Nychka algorithm does a better job for the typical size of climate datasets (a few thousand points max) than the F-P algorithm.

    w.

One Trackback

  1. […] and Newbold observed that, although the classic spurious regressions (see Spurious #1) had very high R2 statistics, they had very low (under 1.5) Durbin-Watson (DW) statistics. (The DW […]