The reason for looking at the form of stochastic process that best suits the gridcell (and hemispheric) temperatures is that the statistical behavior of a random walk (one type of stochastic process) is very different from that of independent draws from a normal distribution.
The equation for a random walk is: y[t] = y[t-1] + σ * ε[t]. If the random walk is a coin-tossing game, then σ = 1 and ε[t] is drawn from {+1, -1}. If ε[t] is N[0,1], then it is a form of discrete Brownian motion. This differs from independent draws (i.i.d.), which are represented by y[t] = σ * ε[t]. Between random walks and i.i.d., you get ARMA processes, simple versions of which are as follows:
AR1: y[t] = ρ * y[t-1] + σ * ε[t]
ARMA(1,1): y[t] = ρ * y[t-1] + σ * (ε[t] + θ * ε[t-1])
A random walk and an i.i.d. process are equivalent to AR1 processes with ρ = 1 and ρ = 0 respectively. Processes with high AR1 coefficients (say > 0.9) behave like random walks (often called "integrated processes") in some ways. So when we see that so many gridcell temperature series are highly autocorrelated, with AR1 coefficients > 0.9 (especially in tropical oceans), it becomes more realistic to apply statistical methods based on "near integrated processes" than on i.i.d. It is therefore a little surprising to see climate scientists present analyses of temperature data using i.i.d. assumptions – see for example a recent realclimate post here.
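To make the family of processes above concrete, here is a minimal simulation sketch using only the Python standard library. The function names, the zero starting values and the sample size of 2000 are my own choices for illustration, and the AR coefficient is written as rho; none of this is taken from any particular climate data set.

```python
import random


def simulate_arma11(n, rho, theta=0.0, sigma=1.0, seed=0):
    """Simulate y[t] = rho * y[t-1] + sigma * (eps[t] + theta * eps[t-1]).

    eps[t] is drawn from N(0, 1). The starting values y[-1] and eps[-1]
    are set to 0, which is an initialisation choice made for illustration.
    rho = 0, theta = 0 gives i.i.d. draws; rho = 1, theta = 0 gives a random walk.
    """
    rng = random.Random(seed)
    series = []
    y_prev = 0.0
    eps_prev = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        y_t = rho * y_prev + sigma * (eps + theta * eps_prev)
        series.append(y_t)
        y_prev, eps_prev = y_t, eps
    return series


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a series."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


# Same seed, so all three series are built from the same innovations.
iid = simulate_arma11(2000, rho=0.0)
ar1 = simulate_arma11(2000, rho=0.9)
walk = simulate_arma11(2000, rho=1.0)

for name, series in (("i.i.d.", iid), ("AR1, rho=0.9", ar1), ("random walk", walk)):
    print(name, round(lag1_autocorrelation(series), 3))
```

Because the three series share their innovations, the random walk is simply the running sum of the i.i.d. series, which makes it easy to plot them side by side and see how differently they wander.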
The behaviour of i.i.d. series is relatively well understood; the behavior of integrated processes is not. I’ve provided here some extensive quotes from a well-known mathematician, W.A. Feller [1966], An Introduction to Probability Theory and Its Applications (see pp. 71-85 passim), which is far from being elementary despite its name. Feller’s comments are surprisingly florid for a mathematical text and express his astonishment at the extraordinary persistence of runs in a Peter-Paul game:
If +1 stands for heads, then s[k] equals the (positive or negative) excess of the accumulated number of heads over tails at the conclusion of the k-th trial. The classical description introduces the fictitious gambler Peter who at each trial wins or loses a unit amount. The sequence s[1], s[2],…s[n] then represents Peter’s successive cumulative gains. It will be seen presently that they are subject to chance fluctuations of a totally unexpected character. The picturesque language of gambling should not detract from the general importance of the coin-tossing model. In fact, the model may serve as first approximation to many more complicated chance-dependent processes in physics, economics and learning theory… (p.71)
Even the simple coin-tossing model leads to surprising, indeed to shocking, results. They are of practical importance because they show that, contrary to generally accepted views, the laws governing a prolonged series of individual observations will show patterns and averages far removed from those derived for a whole population…
The results are startling. According to widespread beliefs, a so-called law of averages should ensure that in a long coin-tossing game each player will be on the winning side for about half the time and that the lead will pass not infrequently from one player to the other. Imagine then a huge sample of records of ideal coin-tossing games, each consisting of exactly 2n trials. We pick one at random and observe the epoch 2k of the last tie (i.e. the number of the last trial at which the accumulated number of heads and tails were equal.) ….
With probability ½, no equalization occurred in the second half of the game, regardless of the length of the game. Furthermore, the probabilities near the endpoints are greatest; the most probable values of k are the extremes 0 and n. These results show that intuition leads to an erroneous picture of the probable effects of chance fluctuations.
Feller shows that the distribution of k is approximated closely by an arc sine distribution, (2/π) arc sin (x^0.5), with x = k/n. In a long coin-tossing game, one of the players remains practically the whole time on the winning side, the other on the losing side. Feller gives some numbers: In 20 tossings, the probability that the lead never passes from one player to the other is about 0.352. The probability that each player leads 10 times is only 0.06. Let n be large. With probability 0.20, one player is in the lead 97.6% of the time. In one of 10 cases, one player is in the lead 99.4% of the time.
Feller presents the following table for an experiment in which a coin is tossed once per second every day for 365 days – calculating the length of time that the less fortunate player is in the lead.
p       Less fortunate player leads for less than
0.9     153.95 days
0.8     126.1 days
0.7     99.65 days
0.6     75.23 days
0.5     53.45 days
0.4     34.85 days
0.3     19.89 days
0.2     8.93 days
0.1     2.24 days
0.05    13.5 hours
0.02    2.16 hours
0.01    32.4 minutes
Table 1. Period of Time in Lead in Feller’s Coin-Tossing Game
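The entries in Table 1 can be reproduced from the arc sine law. The sketch below assumes my reading of the table: p is the probability that the less fortunate player leads for less than a fraction x of the game, so that p = (4/π) arc sin (x^0.5) for x up to one half (twice the arc sine distribution, since either player may be the unlucky one). Inverting gives x = sin²(πp/4). The function name is hypothetical.

```python
import math


def less_fortunate_lead_fraction(p):
    """Fraction x of the game such that, with probability p, the less
    fortunate player is in the lead for less than x of the time.

    Uses the limiting arc sine law: p = (4 / pi) * arcsin(sqrt(x)),
    inverted to x = sin(pi * p / 4) ** 2.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.sin(math.pi * p / 4.0) ** 2


TOTAL_DAYS = 365.0  # one toss per second for 365 days, as in Feller's example

for p in (0.9, 0.5, 0.1, 0.05, 0.01):
    days = TOTAL_DAYS * less_fortunate_lead_fraction(p)
    print(f"p = {p:<4}  {days:9.3f} days  = {days * 24:9.2f} hours")
```

Note that the length of the game only enters as a scale factor: the fractions are the same for any sufficiently long game.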
Statistical significance tests are usually at the 1% or 5% levels. So even if one player was ahead for all but 13.5 hours, one could not exclude a fair game at a 95% level of significance. Feller also provided some remarkable statistics on runs.
The theoretical study of chance fluctuations confronts us with many paradoxes. For example, one should expect naively that in a prolonged coin-tossing game the observed number of changes of lead should increase roughly in proportion to the duration of the game. In a game that lasts twice as long, Peter should lead about twice as often. This intuitive reasoning is false. … the number of changes of lead in n trials increases only as n^0.5: in 100n trials, one should expect only 10 times as many changes of lead as in n trials. This proves once more that the waiting times between successive equalizations are likely to be fantastically long.
The probability of r changes of sign in n trials decreases with r. The probabilities for exactly r changes of sign in 99 trials are given by Feller as follows:
r  Probability 
0  0.1592 
1  0.1529 
2  0.1412 
3  0.1252 
4  0.1066 
5  0.0873 
6  0.0686 
7  0.0517 
8  0.0375 
9  0.0260 
10  0.0174 
11  0.0111 
12  0.0068 
13  0.0040 
Table 2. Probability of Exactly r Lead Changes in a 99-Toss Coin-Tossing Game
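The probabilities in Table 2 can be checked with a few lines of code. The sketch below assumes the standard result for a fair coin that, in an odd number of tosses n, the probability of exactly r changes of sign is twice the probability that the cumulative sum ends at 2r + 1, i.e. 2 * C(n, (n + 2r + 1)/2) / 2^n. The function name is my own.

```python
from math import comb


def prob_sign_changes(r, n_trials=99):
    """Probability of exactly r changes of lead in n_trials fair coin tosses.

    Assumes n_trials is odd and uses 2 * P(S_n = 2r + 1), where S_n is the
    excess of heads over tails after n_trials tosses.
    """
    if n_trials % 2 == 0:
        raise ValueError("this formula is stated for an odd number of trials")
    if r < 0:
        raise ValueError("r must be non-negative")
    heads = (n_trials + 2 * r + 1) // 2  # number of heads giving S_n = 2r + 1
    if heads > n_trials:
        return 0.0
    return 2 * comb(n_trials, heads) / 2 ** n_trials


for r in range(14):
    print(r, round(prob_sign_changes(r), 4))
```

The most likely outcome is no change of lead at all, and the probabilities over all possible r sum to one.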
The process underlying the tropospheric temperature curve has an extremely high AR1 coefficient (ρ = 0.9215 instead of ρ = 1) and it has a significant MA1 coefficient. We know that the process is not i.i.d., not least because it has a very low Durbin-Watson statistic. If you look at this graph from the perspective of Feller’s Peter-Paul game, rather than from the deceptive perspective of i.i.d. processes, it’s hard to feel that there is a statistically "significant" trend. The number of crossings seems high relative to Feller’s example (even allowing for process differences).
Figure 1. GLB Satellite Temperature
If the process has a very high AR1 coefficient (as seems to be the case), intuitions about what is to be expected should be sought from gambling runs (with the process suitably modified) rather than unrealistic i.i.d. models. In fact, it seems to me that climate scientists would be much better off if they unlearned almost everything that they think they know about means and variances (drawn from i.i.d. situations) and started with autocorrelated series as a base case (with i.i.d. as a special case of little interest).
97 Comments
This is a useful discussion. Your point here, that if we suppose that the satellite record (and the “good” CRU cell records) are accurate measures, the relatively high AR(1) coefficients imply we should be careful about measuring trends, is a good one. A related point is that one needs a relatively long sample to make confident statements about the statistical properties of a time series with an AR(1) coefficient that is close to 1. Intuitively, the high autocorrelation means that one has fewer independent observations of the series than one would think given the sample size. Some climate change studies that have made statements based on a relatively small sample have been overturned when a longer sample is used.
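A rough way to put a number on "fewer independent observations" is the common rule of thumb for the mean of an AR(1) series, n_eff = n * (1 - ρ) / (1 + ρ). This is only a sketch under that assumption; the sample size of 300 below is a made-up figure for illustration, not the length of any series discussed here.

```python
def effective_sample_size(n, rho):
    """Approximate number of independent observations in an AR(1) series
    of length n with lag-1 coefficient rho (rule of thumb for the mean)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n * (1.0 - rho) / (1.0 + rho)


# Hypothetical example: 300 monthly values with the AR1 coefficient quoted in the post.
print(round(effective_sample_size(300, 0.9215), 1))
```

With a coefficient that high, a few hundred monthly values carry roughly the information of a dozen independent ones, which is why the uncertainty on any estimated mean or trend balloons.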
When you presented the ARMA models of the CRU data, however, I thought you might have been heading in another direction. There is also a literature in econometrics discussing how discrete jumps in the mean of an otherwise stationary process (resulting for example from changes in measurement techniques, or in this case instruments, station location, vegetation around the recording site etc) can produce a series that looks like a random walk. The very high AR coefficients in some of the CRU grid cells might then indicate inhomogeneities in the record for those cells.
A bit OT, but following the link to Realclimate, I see our commentor TCO claiming to have been censored and indeed banned from this blog (dated 7 August) and his comments deleted.
TCO’s post (No 54) Quote “Please don’t ban me. I already got kicked off (and comments deleted) from the climate auditor site. And I wasn’t misbehaving. If anything I was more on their side than yours, but I just wanted to dig a little into the methodology (well wanted them to do a first cut reanalysis…not of your work…but starting from scratch). Even if they are right and you are wrong, it seems a little wierd how much time they spend on someone else’s work. you would think with all the effort exerted, they would get interested and start doing original work.
Anyhow…if you ban me too, I will feel really bad. How about puitting a word in for me instead with your colleagues over there?
And I’m not really a troll. Well…actually I have an AWFUL tendancy to troll. But I thought I didn’t really do any…yet.”
I have certainly seen his comments in other threads, and I haven’t seen anyone else banned, although SpamKarma may be working ? Query, has poster TCO had comments removed or been banned, or are we seeing a SK effect. I hate to see inaccurate comments. TCO, feel free to comment on your remarks if you see this, if you are not too embarrassed that is.
This was an old post. TCO got on the wrong side of Spam Karma for some reason and his posts weren’t going through. He drew that to our attention and they were recovered from purgatory. After a while, Spam Karma recognized him as not trolling. We do get some Type II errors, but we’ve had 7437 spam rejections in the last month and you really need this type of software to function.
Steve,
I am no statistician but I am intrigued by the trend line in the GLB Satellite Temperature graphic you show.
Can you please show what the trend line would look like if you started the data set at approximately June 1980 and finished it at about Dec 2004?
It looks rather flat to me.
If you had a peak similar to that of 1998/9 at the start of this graph then it would also look rather flat.
If the data set had started in about Dec 1984 and finished about Dec 1998 then the trend would be sharply upwards.
I know you can’t subjectively pick and choose what data helps your cause, but given the huge amount of variability in the graph, how accurate would any trend be?
Anyway a trend of 0.123K doesn’t appear any greater than that of the early 20th century which can’t be attributed to CO2 increases.
You’re missing the point. The concept of a "trend line" presupposes independent errors. Because the data is so autocorrelated, it’s not clear that the "trend line" has any more significance than a trend line of net winnings at a casino. What one is really trying to test is whether the game is "fair" or "unfair", i.e. an "unfair" game is one with a trend. Steve: “normal” corrected to “independent”.
My very first post (number 2 on Preisendorfer) is still missing, but it’s irrelevant since the discussion and my repeating points covers it.
OK, yes, I rather thought that it might be a SpamKarma effect; I forgot the Preisendorfer thread problems. TCO, you might want to just correct that post on Realclimate then, in the interests of giving a “true and fair” view.
Apart from an eerie fascination with these statistics on temperature, I think there is a huge assumption that the greenhouse warming camp is making – that climate response to extra CO2 in the atmosphere is linear. It is not, as otherwise the planet would have gone into heat death eons ago when the level was over 100 times higher than at present.
A long time ago Angstrom pointed out that there was already enough CO2 in the atmosphere to “saturate” its two absorption bands – adding more effectively makes no difference.
What am I missing in this?
A good illustration of spurious trend in random walks is a plot of multiple random walks starting from a single origin point. The outermost trajectories diverge indefinitely, spreading out and appearing to trend from the origin.
Anyway I wanted to thank you for the reference by Umberto Triacca on problems with Granger-causes and follow up on something.
Given ARMA has not been accepted and the above is unlikely what is a good strategy? It seems to me you have been very successful in mm05(grl) with the strategy:
1. replicate the published results (A)
2. correct the flaws in method and/or data
3. new results (B) conflict with (A)
This strategy (a kind of audit) is very appealing and convincing. More so, say, than Triacca, where two possible interpretations of the results were shown. In that case, the reader can simply choose their preferred interpretation, so it doesn’t really rebut the claim. It is also more appealing than the Storch et al. (Nature 2004) approach because it uses the real data, not simulated data (though there is a role for simulated data). It seems with the audit strategy only three options are possible: 1. you are right that the original research is flawed, 2. the original results were produced by some other OK method, or 3. the fact that it is flawed doesn’t matter. In all cases it puts the ‘ball back in the opponent’s court’ rather than just weakly suggesting they should not be so confident.
If you were going to take on another topic would you adopt a similar strategy? i.e. replicate the global surface temperature average results of others, fix the errors in the data, prove the data is not IID and use ARMA to estimate parameters for comparison?
Steve (#5):
You say: “The concept of a “trend line” presupposes normal errors.”
I assume that you mean “independent errors” rather than “normal errors” (they are quite different). I can’t see anywhere in the derivation of trend estimation where one resorts to an assumption that the “errors” are normally distributed (but correct me if I’m wrong).
You also say: “Because the data is so autocorrelated, it’s not clear that the “trend line” has any more significance than a trend line of net winnings at a casino.”
This whole thread seems to be based on the suggestion that climate scientists do not allow for autocorrelation when estimating the uncertainty in a trend. Well, I can assure you that I and my coworkers do take autocorrelation into account and estimate the uncertainty in the trend appropriately. There is also much information on the web and elsewhere which tell us how to do this.
You also say in your original article:
“It is therefore a little surprising to see climate scientists present analyses of temperature data using i.i.d. assumptions – see for example a recent realclimate post here.”
This seems a rather surprising statement given that, in one place, the author states quite clearly that “further analysis showed that the absolute monthly maximum/minimum temperature was poorly correlated with that of the previous month, ruling out dependency in time (this is also true for monthly mean temperature – hence, ‘seasonal forecasting’ is very difficult in this region)”; i.e. the author DID test for independence.
So I don’t really see the point of this thread, except perhaps to blind the reader with statistics and cast another slur on climate science.
RE #8: they don’t assume that the impact of CO2 is linear: this is generally regarded as logarithmic, enhanced by water vapor feedback (the net result of which is more or less linear although the processes are nonlinear).
Re #9: It seems to be relatively easy to spot gridcell records that don’t make any sense, mostly for non-homogeneity. Do these low quality cells have an impact on their results? It would be worth checking. If Pfizer or Boeing were relying on this data set for a drug or an airplane, they would long ago have put a “red team” or “tiger team” on it to see how good it is.
We’re not going to get into the temperature data. This is really just a diversion. There are several reasons. You’d need to see raw data which is inaccessible. It’s a much bigger data set than any of the proxy data sets and would need a lot of work.
Re #10: Obviously the post discusses independent errors and makes no discussion of normal errors. Indeed a Peter-Paul game is binomial rather than normal (leaving Central Limit Theorem aside). However, I did indeed inadvertently use “normal” instead of “independent” in my comment reply and I have corrected it.
John H., first, as a matter of interest, how do you allow for autocorrelation in your calculation of trends? what processes do you use for analysis of residuals? I’m quite happy to discuss this on a constructive basis and leave aside past attempts to score debating points.
Secondly, with respect to Benestad, as I read his article (and I did so quickly), it seems to me that he stated explicitly that he used i.i.d. assumptions. So the fact that you and other workers account for autocorrelation does not mean that Benestad did. Indeed he says that he didn’t.
You pointed out that Benestad did purport to justify an i.i.d. assumption. In one sense, you must think that this argument is invalid, since your stated practice is to allow for autocorrelation. Moreover, if you look at the results which I’ve posted up with ARMA(1,1) models for the GLB tropospheric temperature data set and the graphic showing ARMA(1,1) coefficients for the CRU dataset by gridcell, it’s pretty evident that the CRU monthly data set is very strongly autocorrelated, especially in the tropical oceans. One other important tweak: most quick analyses of autocorrelation are against an alternative AR1 process (or ARMA(1,0)). Granger and Newbold [1986] point this out and specifically note the shortcomings of the Durbin-Watson test (the usual measure) in this respect. I’ve noticed that the ARMA(1,1) models are markedly superior to the AR1 models and usually have high AR1 coefficients. The AR1 coefficient of the CRU monthly series in an ARMA(1,1) model is 0.9215, which is obviously inconsistent with Benestad’s claim of no dependency. This is not just an artifact of the GLB average, but is observable in individual gridcells.
I’m trying to keep some caveats on my statements on the temperature data set (which I’ve said at the outset of this topic, but can’t repeat in every post or comment). I have a couple of years’ work on the proxy data sets and anything that I say there, I’m saying more categorically. But with that caveat, do you think that the i.i.d. assumption is justified?
Re Benestad at realclimate. Ironically, Benestad [2003] does briefly refer to Feller’s text quoted here as follows:
Benestad complains that some are “too theoretical” and some are “much more complicated than required”. Give me a break. They are only “too complicated” if the events are not i.i.d. If you look at the conclusions of Benestad [2003] and Benestad [2004], they are cautiously expressed in terms of rejecting a null hypothesis of i.i.d. I’m not sure why anyone would have proposed that. Where this goes off the rails is in something like the realclimate post where they post up a trend line with i.i.d. errors compared to a line with no trend with i.i.d. errors. Actual series don’t look like the trend line with i.i.d. errors – they look like series generated from stochastic processes (e.g. ARMA(1,1) with high AR1 coefficients). So of course the null i.i.d. hypothesis will be rejected.
I’ve looked at Benestad [2003] and Benestad [2004]; these are bizarre articles and I’ll probably do a post on them. Every time I turn over a stone, I get distracted.
Re:#10
It is indeed nice to see a message from you which actually engages the subject matter of this site. But I now have a question for you, which somewhat mimics the sort of question you’ve specialized in here in the past. So I apologize in advance if the question is regarded by you as trivial or intrusive.
You say, “I can assure you that I and my coworkers do take autocorrelation into account….” I wonder 1) who these coworkers are and 2) if you might have actually meant “Other workers in the field of climate change regardless of whether or not I’ve ever collaborated with them or not”?
I ask this because just as you felt it appropriate to correct Steve concerning a technical word usage, I feel it’s appropriate to determine if you were overgeneralizing the term ‘coworkers’ and would care to be more clear, or, if you did indeed only mean to refer to individuals you’ve worked with in research, why you think this is useful to bring up? Thus you might have said, “I don’t know about what other researchers do, but…” or “I and my coworkers… and we are quite certain all other groups do the same.” Now, of course, if Steve’s remarks had been directed at a paper with your name on it, you wouldn’t need to provide such additional caveats, but since they were more general, and AFAIK didn’t include your work, I think you should clear up just what you meant.
Re: #11
The reason I am interested in your audit approach is that it does get results. There is no shortage of examples of misspecified models in the literature, or of better ways to do things; the problem is in gaining traction with the debunking (witness #10). But who can really argue against a thorough audit? There would be enthusiasm for it if there was a high probability that the extensive work required would lead to a high payoff in terms of actually ‘proving’ something or, more important, convincing people. I have seen some time series papers and thought “Yeah, it’s fun to crunch numbers and produce alternative explanations”, but how to get to the larger game, make it through reviewers, and make it worthwhile?
What do you mean the raw data is inaccessible? How much work?
As an aside there is a lot of accessible core data that would be available.
MBH are trying hard …
There’s lots that could be done and there’s no question that auditing would get results. I don’t think it would do any harm even if there are no gotcha’s. Business audits are not generally gotcha exercises. I wasn’t expecting anything very much from auditing Mann’s work – it just seemed like something that should be done, since no one had done it and I approached it like a big crossword puzzle. Some of the issues really surprised me.
I haven’t done an inventory of what’s available and what’s missing in the CRU data. I do know that Warwick Hughes, who is familiar with the data set, has tried to get station data from Phil Jones and been rebuffed. Jones’ answer was: “We’ve got 25 years invested in this. Why should we let you see it when your only objective is to find flaws?” I posted up the exact words on a previous occasion. If you’re trying to replicate calculations of the gridcells, you need the station data as used, especially if the methods are sparsely (or even inaccurately) described as they were in the multiproxy studies. Without exact data and/or exact methodology, it’s hard to get a foothold. Even with a foothold, it takes time, as witness how long it’s taken to get the full story on MBH98, which is still incomplete even with the latest source dump, which leaves many questions unanswered. Maybe Barton’s crowd will take an interest in this problem and get the raw data up for inspection here.
I once tried to get the dataset used in Jones’ study on UHI, published in Nature and cited by IPCC. Jones said that it was on a diskette somewhere, but he didn’t know where and it was now superseded.
I’ve had some approaches from a journal specifically interested in replication, although the topics tend to be interested in replication plus robustness/sensitivity analysis, which is fine as well. I think that there is a market for studies. Email me offline if you want some suggestions.
If it’s a random walk, shouldn’t you take first differences as the first part of the ARIMA process (the I in ARIMA)?
Re #18: who knows what it “really” is? It’s not i.i.d., but it’s not a random walk either. The AR1 coefficients are very high, but are less than 1. So it’s more that random walks can give some insight into the behavior. But the forms that I’ve modeled are ARMA(1,1) so even in a random walk scenario, the innovation term would need to be an MA type term, which would affect run behavior pretty dramatically. I’m experimenting a bit here and I suspect that if you combine a trend with various sorts of ARMA noise, you might very well get series that are hard to distinguish from other ARMA noise.
Re (#10) As I recall OLS gives a consistent trend coefficient even in autocorrelated data. But a trend coefficient–consistent or not–on its own is not of interest, what we want to know is (a) whether it is significant, and (b) whether the data are stationary or not. Deciding both issues is necessary to permit conclusions about whether the data follow a nonzero trend or not, even if the sample looks like it does.
Modeling autocorrelated residuals is certainly the starting point but there’s a lot more to it. I think Steve is justified in complaining about the way time series data are handled in some climate papers. For instance in the IPCC Report chapter 2 there’s a table (2.1) of trend coefficients and some are labeled "significant" but the only description of the estimation method is "restricted maximum likelihood".
What does this mean? It’s not enough just to say a correction for autocorrelation was used; this may just imply a simple AR1 model. Inferences on trend coefficients require that unit roots be tested and ruled out (or handled properly if present) and ARMA and ARCH processes controlled so the residuals are truly IID. (BTW IID = independent and identically distributed, a prerequisite for the t and F stats being valid). In a time series context it is hard to compute unbiased variances and t-stats.
Nonstationarity can be hard to distinguish from strong autocorrelation but represents a discrete change in the meaning of the underlying process, and in the climate context would imply a fundamentally different interpretation of the physical mechanism that generated the observations. But I agree there are lots of papers presenting sophisticated treatments of climate time series.
The papers I’ve seen on temperature (e.g. Tsonis, Karner, Woodward & Gray, Kaufmann and Stern etc) differ on whether temperature data is stationary or nonstationary, because it appears to be at the boundary and either interpretation has supporting evidence. But it matters (hugely) which it is and I would think this should be a frontline research question. Precipitation data sometimes appears to follow a Levy process, which is nonstationary and has no finite variance, undermining the conventional meaning of ‘extreme’ events. A mathematician wrote to Steve and me after our first paper came out to say he’d looked at the ice core data we had on our web site and it tests as a Levy process.
John writes: Edited for clarity. What do Canadians have against paragraphs?
Beware that the sought for trend in the data is small, between 60 and 180 mK/decade, which is much smaller than the signal (noise?) caused by ENSO and volcanics.
http://home.casema.nl/errenwijlens/co2/howmuch.htm
Steve:
Re(#4,#5)
As I said I am not a statistician. I understand what you’re saying (I think) about the gambling comparison.
I guess what I was wanting to know is: does the starting and ending point of the dataset affect the trend?
I ask this because when I look at the graph, apart from about 1998, everything is in between +0.4 and -0.4 above and below the zero point. So is it just coincidence that the start of the trend line is near the starting point of the data and the end of the trend line near the end point? Or, if we trimmed a little of the data off either end so that the graph starts on a high and ends on a low, would this affect the trend line?
I accept that this is purely hypothetical and that in either case the trend line may be statistically insignificant. I am simply asking the question as an amateur trying to better understand what I am looking at.
It wouldn’t make much difference to an OLS trend, where all points contribute. However, for a gambling run (random walk), intuitively, the endpoints do matter, as the path wouldn’t seem to matter as much, other than for the information it provides on variance and distribution of the innovations (individual bets).
Steve (#12): You say:
“I did indeed inadvertently use ‘normal’ instead of ‘independent’ in my comment reply and I have corrected it”.
Well, I guess that is the privilege of being the owner of this site. Now, would you like to show a little good faith and correct my “local” to “global” in my response #1 to the 6/21/2005 thread “WSJ Editorial”. My mistake was just as “inadvertent” as yours, but you saw fit to blow it into a conspiracy theory in thread “IPCC 1[1990] – Comment #1” (6/24/2005), which subsequently attracted 72 postings. You probably do not like the reference to a “conspiracy theory”, but please remember that you did say “I think that I’ve seen the same mischaracterization elsewhere”. You would also gain a bit of my trust if you deleted that whole thread.
However, let’s talk about technicalities. You ask “how do you allow for autocorrelation in your calculation of trends” with the rider “I’m quite happy to discuss this on a constructive basis and leave aside past attempts to score debating points”. Well, firstly, I do not trust that you won’t drag this out into an inquisition, as you have done with other people on other occasions. So I will give you the bare details of what we do and leave it at that.
Firstly, we estimate uncertainties in different ways so as to convince ourselves that the techniques are reasonably robust. Secondly there are two obvious ways to proceed. One method is to calculate the trend using all the data points and to estimate the uncertainty assuming a number of degrees of freedom derived from the record length divided by the integral time scale (rather than from the number of data points). The other method is to initially (i.e. prior to calculating the trend) average the data into adjacent temporal bins that are sufficiently large for the residual to be statistically uncorrelated. The uncertainty in the trend may then be calculated by basing the number of degrees of freedom on the number of bins. The test for non-correlation may be based on the Durbin-Watson statistic, although simply putting an upper limit on the lag-1 correlation yields virtually the same result.
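For readers who want to see what the first method might look like in practice, here is a minimal sketch. It is not the commenter's actual procedure: it assumes the integral time scale can be approximated from the lag-1 autocorrelation r1 of the OLS residuals as (1 + r1) / (1 - r1) sampling intervals (the AR1 value), and the function name and the synthetic test series are invented for illustration.

```python
import math
import random


def trend_with_ar1_adjustment(y):
    """OLS trend of y against t = 0, 1, ..., n-1, with a naive standard error
    and one based on an autocorrelation-reduced number of observations."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y[t] - y_mean) for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    resid = [y[t] - (intercept + slope * t) for t in range(n)]
    sse = sum(e * e for e in resid)
    if sse == 0.0:
        raise ValueError("residuals are all zero; nothing to estimate")
    # Lag-1 autocorrelation of the residuals (their mean is zero by construction).
    r1 = sum(resid[t] * resid[t - 1] for t in range(1, n)) / sse
    # Assumed AR1 integral time scale, in sampling intervals (never below 1).
    time_scale = max(1.0, (1.0 + r1) / (1.0 - r1))
    n_eff = n / time_scale
    if n_eff <= 2.0:
        raise ValueError("too few effective observations for a standard error")
    naive_se = math.sqrt(sse / (n - 2) / sxx)
    adjusted_se = math.sqrt(sse / (n_eff - 2.0) / sxx)
    return {"slope": slope, "r1": r1, "n_eff": n_eff,
            "naive_se": naive_se, "adjusted_se": adjusted_se}


# Synthetic example: trendless AR1 noise with coefficient 0.9 (illustrative only).
rng = random.Random(1)
noise, prev = [], 0.0
for _ in range(300):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    noise.append(prev)

result = trend_with_ar1_adjustment(noise)
for key, value in result.items():
    print(key, round(value, 4))
```

On strongly autocorrelated noise the adjusted standard error comes out several times larger than the naive one, which is the practical content of the disagreement in this thread. Whether an AR1 time scale is adequate when the residuals look more like ARMA(1,1) is a separate question that this sketch does not address.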
And that’s the level of detail I’d put in a publication.
Finally, you ask “do you think that the i.i.d. assumption is justified?”. I think I’ve indicated how I would test for noncorrelation in the above discussion.
Dave Dardinger (#14):
My Oxford Handy Dictionary says: “coworker ….. one who collaborates with another”. That is what the word means and that is what I meant.
Steve said “normal” when he meant “independent”. The words have completely different meaning, but I was quite happy for him to admit his mistake and to correct it. I just wish he’d done the same for me.
Sticking to matters at hand, estimation of uncertainties is a subject that really interests me in paleoclimate and one where I’m really having trouble understanding practices. 1) You say that you estimate uncertainties in “different ways”: what different ways do you use? 2) How do you estimate “integral time scale”? 3) How do you decide the number of bins? 4) How do you calculate the upper limit to prescribe when you “put an upper limit on the lag-1 correlation”?
If there are statistical authorities that you primarily rely on for this methodology, I’d be quite happy to consult them, as opposed to worrying you with an exposition of texts. All of these would be fair questions for someone to ask of a publication.
Re#22,
I did a quick trendline using MS Excel and the data downloaded from UAH. Maybe I was using an old data set or the wrong starting point, because I came up with just under 0.11 deg C/decade for all the data, not the reported 0.123. At any rate, using the trendline I came up with, the effect of your first suggestion (June 1980-Dec 2004) only slightly flattened the line to nearly 0.10 deg C/decade. The effect of your second suggestion (Dec 1984-Dec 1998) raised it to 0.17.
Steve (#26):
You ask:
> 1) You say that you estimate uncertainties in “different ways”: what different ways do you use?
I described three in #24.
> 2) How do you estimate “integral time scale”?
It’s in the standard texts — e.g. “Data Analysis Methods in Physical Oceanography” by William J. Emery and Richard E. Thomson, Pergamon Press, 1998.
> 3) How do you decide the number of bins?
I explained: I “average the data into adjacent temporal bins that are sufficiently large for the residual to be statistically uncorrelated”.
> 4) How do you calculate the upper limit to prescribe when you “put an upper limit on the lag-1 correlation”?
I can’t remember the value I use (I’m not in my office) — but you can pick a value that agrees with one of your “pet” methods of testing for autocorrelation — a nice little experiment for you to do — and then you can please tell me the answer.
Michael Jankowski (#27): I think you’ve just discovered what is often colloquially called the “bleeding obvious” to anyone who has spent any time at all doing linear regressions — i.e. that the actual value of slope is quite sensitive to what you actually do. This spread of results is, of course, one component of the uncertainty of the trend.
re #15
Nice evasion, John, but I wasn’t concerned with your latest argument with Steve. I was wondering why you brought up what you and your coworkers did? Go back and read what I wrote and then you can respond appropriately. If you wish. Unlike how a lot of your messages to Steve sound, I’m not demanding you do what I ask just because I ask it. If you don’t want us to know who your coworkers are or why your internal rules for error estimation are significant vis-à-vis the procedures of Mann et al. and other climate proxy researchers, well then, that’s fine.
BTW, just what is the technical meaning for "normal" errors? I understand that "independent" errors are ones which have no particular common source (& some random distribution), but, and I assume this is what happened with Steve, "normal" has the common meaning of ‘ordinary’, ‘run-of-the-mill’ and systemic errors would be taken by the lay reader as being non-normal. And I don’t think "normal" in this case has the mathematical meaning of orthogonal (at right angles). Is it referring to the shape of the error distribution?
Steve: “normal” means a Gaussian (bell-curve) distribution, with density (2 * pi)^(-1/2) * exp(-x^2 / 2) for N(0,1). If you have autocorrelation, you can have Gaussian errors, but the results from one sample will be related to the previous sample, hence not “independent”. I think that the origin of the term “normal” applying to Gaussian errors derives from some sort of orthogonality, but I don’t recall what offhand.
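To make the distinction between "normal" and "independent" concrete, here is a small Python sketch (the function names and the AR(1) coefficient of 0.9 are illustrative choices, not taken from anyone's analysis). Both series are built from Gaussian draws, but only the first is independent; the second carries each value over into the next.

```python
import math
import random


def normal_pdf(x):
    """Density of the standard normal distribution N(0,1)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def lag1_correlation(x):
    """Sample lag-1 autocorrelation of a series."""
    mean = sum(x) / len(x)
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(len(x) - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def gaussian_iid(n, rng):
    """Independent N(0,1) draws: normal AND independent."""
    return [rng.gauss(0, 1) for _ in range(n)]


def gaussian_ar1(n, rng, phi=0.9):
    """AR(1) series with Gaussian shocks: normal but NOT independent."""
    series, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0, 1)
        series.append(prev)
    return series


if __name__ == "__main__":
    rng = random.Random(0)
    print("lag-1, iid :", lag1_correlation(gaussian_iid(2000, rng)))
    print("lag-1, AR1 :", lag1_correlation(gaussian_ar1(2000, rng)))
```

The lag-1 correlation of the first series should sit near zero and that of the second near the chosen coefficient, even though both are Gaussian.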
Re#29,
I’ve done thousands of linear regressions. I “discovered” the “bleeding obvious” you refer to a long time ago. I was simply doing a quick task another poster had requested. I thought it was worth a few minutes of my time to satisfy the curiosity of an obviously interested poster and thought it was worth noting that the trend using all the data I had was not exactly the reported value, along with providing two possible causes.
John H, re #28.
1) I thought that the “first” and “second” in your paragraph referred to different topics and got wrong-footed by that. As I understand it now, one method is consulting integral time scales; the second method is to consult the results from binning: is that what you had in mind?
2) I appreciate the reference to Emery and Thomson, which I’ll look at some time. You will understand that it’s a text that I would have not thought of consulting as a statistical reference, since it is trade specific. I’m not sniping about this, just noting that I wouldn’t have been able to guess that was a “standard” text.
4) I don’t see the point of guessing games. If you get a chance later, I’d appreciate some info on the upper limit policy.
I’m not in favor of deleting threads, comments, etc. Much better to issue corrections in a later post. If they are interpolated (which is obviously more work) then it should be clear that a change was made.
This is a discussion and a record of discussion, not a finished or draft publication. It should be clear what process is going on. Editing previous comments will lead to confusion.
Also, we need to better disaggregate the “you made a mistake, no I didn’t” (i.e. clarifications) from the disagreements on the issues themselves.
Also, I don’t understand the hesitancy to disclose all details to critics. If they find a math error, great. If they are sophists, well, that’s a danger worth accepting, to allow all possible critics to examine the work. Plus, you might actually learn something from having someone look at the data/methods with a critical eye. And at the end of the day, this is a BIG signal to noise problem (hence all the complicated stats), so digging into the nitty gritty is reasonable here, where it might not be on another problem (like say voltage versus current for a resistor).
But I give John H., more credit for honesty than others criticized for at least hanging on the side and engaging.
Re: #28 pts #1-3
John H — it’s nice to see reasoned info sharing; thanks!
I guess I have a big problem with using studies that lead to policy decisions, while refusing to disclose details of the studies. As a decision-maker, I view that kind of behavior as inappropriate and not giving me a “warm fuzzy” about the study-maker’s confidence in his work. AND CONGRESS is a decision-maker. And the FEDERAL GOVERNMENT does fund climate research, so if they want to nose around in the work, TOUGH TITTIES. And yes, they will grandstand and talk for about 10 minutes and question for about 2. Deal with it. Everyone else who gets funded has to. And heck, the requested disclosure from Mann was notable for how detailed it was from a technical perspective. If you’re a real scientist, you welcome that. Bring it on!
Sure, some opponents will try to twist whatever you have to argue against it. So what? But that is no reason to hide details. I mean the anti-smoking types are not scared to reveal all their science, knowing that tobacco companies will mount whatever argument that they can. The danger in keeping this stuff hidden is that faults will NOT be exposed. That’s not science, that’s PR!
How would you feel about buying a house from someone who told you, you couldn’t do a home inspection, “because everyone uses them to drive the negotiation”? I mean, he’d be right! but so what? That’s still no reason to not allow an inspection. And it would give me the willies to buy a house from someone who took that stance.
Oh…and by the way, I’ve got my “union card”. Have several peer-reviewed publications in the experimental natural science literature. I don’t know whether the earth is undergoing AGW. It very likely could. I do know that the scientists advocating it are not behaving properly, if they deny their critics full freedom to pick apart the studies (and without any requirements of who is “qualified” to look at it). Sheesh! Dick Feynman would be amazed at y’all here.
Re: #28 pt #4, #29
But it would be great to avoid these kinds of comments, as they don’t help further the discussion, but just reflect poorly on the author.
Re #36
At the risk of spinning way off-topic, you really don’t want to cite the anti-smokers as paragons of scientific rectitude. The current campaigns for total smoking bans have involved some abuses of statistics that make MBH look like saints.
Steve (#32):
> 1) I thought that the “first” and “second” in your paragraph referred to different topics and got wrong-footed by that. As I understand it now, one method is consulting integral time scales; the second method is to consult the results from binning: is that what you had in mind?
Yes. And I gave two ways of testing the “binning” method for non-correlation.
> 4) I don’t see the point of guessing games. If you get a chance later, I’d appreciate some info on the upper limit policy.
I’m in my office now, so I have the reference. The limit for the lag-1 correlation that I use is 0.3. See Ostrom, C.W., Jr., 1990. Time Series Analysis, Regression Techniques, Second Edition: Quantitative Applications in the Social Sciences, v. 07-009; Newbury Park, Sage Publications. Also a “trade specific” publication, I’m afraid. I got the reference from http://www.ltrr.arizona.edu/~dmeko/notes_11.pdf.
Also, I wasn’t setting up a “guessing game”. I simply said “you can pick a value that agrees with one of your ‘pet’ methods of testing for autocorrelation - a nice little experiment for you to do - and then you can please tell me the answer”. I’d actually be interested to get your help and agreement on at least something. It’s called collaboration.
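The binning method with a lag-1 limit of 0.3 can be tried out with a short Python sketch like the one below. It is an illustration under stated assumptions, not John's procedure: the function names are invented, the search simply widens the bins one sample at a time, and the requirement of at least 10 bins (min_bins) is an arbitrary guard added here so that the correlation estimate is not based on a handful of points. The Durbin-Watson statistic is reported alongside, since it is roughly 2 * (1 - r1).

```python
import random


def detrend(y):
    """Residuals of a least-squares line fitted to y against 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return [v - y_mean - slope * (t - t_mean) for t, v in enumerate(y)]


def lag1_correlation(x):
    """Sample lag-1 autocorrelation of a series."""
    mean = sum(x) / len(x)
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(len(x) - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 for uncorrelated residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)


def bin_means(y, width):
    """Average y into adjacent bins; an incomplete last bin is dropped."""
    return [sum(y[i:i + width]) / width
            for i in range(0, len(y) - width + 1, width)]


def smallest_uncorrelated_bin(y, limit=0.3, min_bins=10):
    """Smallest bin width whose detrended bin means have lag-1 <= limit.

    Returns (width, number of bins, lag-1 correlation, Durbin-Watson),
    or None if no width leaving at least min_bins bins qualifies.
    """
    for width in range(1, len(y) // min_bins + 1):
        residuals = detrend(bin_means(y, width))
        r1 = lag1_correlation(residuals)
        if r1 <= limit:
            return width, len(residuals), r1, durbin_watson(residuals)
    return None


if __name__ == "__main__":
    rng = random.Random(3)
    noise, series = 0.0, []
    for t in range(600):
        noise = 0.7 * noise + rng.gauss(0, 1)
        series.append(0.001 * t + noise)
    print(smallest_uncorrelated_bin(series))
```

The number of bins returned is what would then be used as the basis for the degrees of freedom of the trend uncertainty.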
Dave Dardinger (#30): Sorry if I ignored most of your ramblings in #14 — I couldn’t see any point in your question and nor did I see any point in attempting a response. As regards your later question of “what is the technical meaning for ‘normal’ errors”, I am rather surprised that you were participating in this thread and still didn’t know about Normal distributions (what DO they teach in high schools nowadays?). But thanks, Steve, for explaining.
Michael Jankowski (#31): You may have “done thousands of linear regressions” but you have not apparently yet learned that quoting a trend without an uncertainty estimate is pretty meaningless (for example how do I know that all the estimates in your posting #27 aren’t statistically indistinguishable?). However you’ll be glad to know (in a perverse sort of way) that this problem is exhibited widely by both practicing scientists and climate contrarians – sometimes out of slackness and sometimes to “prove” a point that isn’t really there at all. This issue was the main point of my work on sea level at Tuvalu, although many commentators seem to have missed it.
Anyway, let’s hope that this thread will make some positive contribution to the science and not just serve as a way of increasing Steve’s tally of “gotchas”.
Let’s collaborate with John H. and drag him over to the side of mathematical rectitude. We can get some publication count out of it too and you know raises and all that. ;)
RE: #27
Michael thanks for that. It seems you copped a bit of flak for it. I apologise for that.
Re: #41
Thanks John I appreciate that you need uncertainty estimates for trends. I guess when I look at the graph and how it is determined I have a large degree of uncertainty that we can determine any meaningful trend.
As I have stated I am a layman and am in no way criticising the work of the UAH team that produced it. But just from my layman’s point of view here are some of my reasons for uncertainty.
This graph represents (in my layman’s terms) the difference in the calculated average global temperature for the troposphere.
So to get the average you take temperature readings over many areas then use some (from my point of view) complicated calculation to work out an average. Then chart the difference in this average over a period of time.
This chart then shows (apart from one spike in about 1998) that this fluctuates in a range of +0.4 to -0.4.
From my point of view I would have thought that most min and max temperatures would fit into +40c and -40c, a range of 80c (I don’t know how this extrapolates to the troposphere). So when we see a fluctuation of an average in a range of just 0.8c and a trend of 0.123c this is a very small difference. If I then extrapolate that back to where I live (Mount Dandenong, Victoria, Australia) which has a max/min range of +39c to -2c that fluctuation of 0.8c is just 2% of my range (41c) and the trend of 0.123c is just 0.3%. These are very small values and almost unmeasurable in a “real” world.
As I said earlier this in no way repudiates the work of the UAH team it is simply my way of determining “Uncertainty”. I personally can’t stake my future and that of my children on values and trends which are so small as to be unperceptible in the real world
I just want some frigging palm trees and alligators in Virginia. Pronto. What is taking so long with getting the Endless Summer?
Ross McNaughton (#44): It is difficult to argue statistics with someone who doesn’t want to believe statistics. However, when you say “these are very small values and almost unmeasurable in a ‘real’ world”, perhaps you should note the word “almost”. This sentiment was echoed by Thomas Crowley in the recent presentations of Bradley, Crowley and Ammann (6 April 2005; see http://www.ucar.edu/webcasts/ and the thread “UCAR Webcast of Bradley, Crowley, Ammann …..”) where he said “….. these warmings are already poking their head above (indistinct word) natural variability of the system”.
Unfortunately, you are inconsistent and finish with: “I personally can’t stake my future and that of my children on values and trends which are so small as to be unperceptible in the real world”. I assume you mean “imperceptible”, but I don’t think either word means “almost unmeasurable”.
Perhaps you just don’t want to believe that the trend is significant.
TCO (#45): I know your comments are generally lightweight, but I find #45 in extraordinarily bad taste. While there may be both winners and losers due to global warming in the coming centuries, the losers will probably have it so bad that I don’t think it is a joke.
Don’t cry. See my previous comments on Realclimate for proof of my gator interest. I totally understand people who release them in the hope that they will take. oh…and maybe I’m just expressing that although I’m almost 40, I’m not noticing the climate change in my daily life.
To get serious, you can obviously argue both sides of benefit of warming (or cooling). I don’t have a real dog in the fight although I wish that US middle atlantic was warmer in winter.
And if someone dies from GW, sure I care. But, it’s all…I’m tired need to sleep now. Later.
I mean, I believe that the GW is happening, it just seems so slow that it doesn’t do much good.
;)
(bed now)
Re #40
Well, I don’t know what they teach now, but I assure you that in the early ’60s in a small school in Ohio, they didn’t teach statistics, at least not in the College prep classes. Maybe there was some statistics in a business class. Anyway because of the particular programs I was in, ACS approved Chemistry as an undergrad and Biochemistry in Grad School, I never had a chance to take a statistics class. I do have a couple of statistics texts around the house, however, and find them of use sometimes. But I didn’t feel like digging one out or doing a search on Google, so I figured it was a placating gesture to ask an easy to answer question.
You’re being nearly polite on this thread, so I won’t bug you further.
Re: #46
Thank you John for your reply. I am sorry that I gave the impression that I don’t believe the statistics.
Yes, I see a trend. Whether it is statistically significant I do not know, and I will certainly defer to you if you say it is.
I guess where I was trying to come from is that if I look at it in, say, 5-year blocks I see the following:
Years 1-5: trend is down
Years 1-10: trend is flat
Years 1-15: trend is up but less than now
Years 1-20: trend is higher than now
Years 1-25: trend is same as now
Years 1-30: trend is up
But apart from 1998 everything is inside +0.4 to -0.4, and if we had a downward spike of similar magnitude to the 1998 up spike in the next 5 years then the trend would become flat. I know this is a what-if, but I think I am looking at natural variability, not a significant trend. There is also the accuracy of the underlying data to consider.
Quote from the UAH team:
“The result is that the satellite temperature measurements are accurate to within three one-hundredths of a degree Centigrade (0.03 C)”
This 0.03c makes up about 25% of the trend, so the trend is between 0.09c and 0.15c. I would have thought as well that the computations required to convert the data into a global mean could only compound this variation in accuracy.
My reading of what Steve is saying is that, based on the type of movement we are seeing in this graph, the use of a trend line does not help shed any real understanding of the underlying processes which create it.
Please don’t interpret this as me being a “contrarian” trying to refute a scientific study. I am just trying to put my ideas on it out there to see what others think.
For Pete’s sake, someone wanted to know what would happen to the slope of the trendline if the starting and endpoints changed, and I told him. I’m not submitting my analysis for publication, using it to form the basis of a dispute, etc. Grow up.
John,
I don’t wish anyone worse. I guess I think that most likely the climate is changing because of CO2, but it’s happening very slow. Things like this (same deal with peak oil or with end of the frontier) are just big trains that are going to move down the tracks and have effects. I would rather be a bit chipper about it. And seriously gators and palm trees would be nice. I really like subtropical climate.
From the “Shorter Oxford Dictionary”:
To trend: Have or assume a general direction or tendency
Tendency: Movement toward or in the direction of something
I also can see no tendency in this graph. Slapping a straight line on it, no matter how carefully crafted, and calling it a “trend” is being misleading, to say the least.
Ross McNaughton (#52), Stephen H (#55) and others: I can only repeat Thomas Crowley’s statement (see #46; I only wish I could decipher the “indistinct word”):
“….. these warmings are already poking their head above (indistinct word) natural variability of the system”.
Climate scientists are detecting a small signal in a lot of noise. To do this you have to be a bit sophisticated - so just looking at the data and saying “it doesn’t look like a trend to me” isn’t good enough. It’s a bit like a doctor diagnosing a hazy shadow on an X-ray image of your lung. If he thinks he can see something significant, I think you would be a bit daft to say “it doesn’t look significant to me, so I’ll ignore it”. The doctor has previous experience and a lot of knowledge on his side.
I think you would agree that in such cases you would both apply the precautionary principle.
I agree with John on this one.
Re: 56:
Classic “Appeal to authority” fallacy. You can’t make this stuff up.
But you’ve got to love the analogy to medicine. Medical care is a prescribed good and you don’t have normal markets or prices for prescribed goods. The classic quote from the thief er doctor is “Your money or your life?” If you are Jack Benny you say “I’m thinking, I’m thinking” This seems to be the case in the climate change prescription, “Your prosperity or your life?”. Let’s just say “I’m auditing, I’m auditing”
Steven Hales (#58): Sorry, I haven’t yet seen anything you’ve audited, but perhaps I missed it — I thought Steve did the auditing and you guys cheered.
Re #56 re Crowley: Let’s suppose that a doctor recommends an operation to remove your lung or something like that. Before doing so, you decide that you want to have a second opinion and ask the doctor for a copy of the X-ray image so that you can independently analyze it or have it independently analyzed. Let’s just say that you have questions about whether the image is even an image of a lung – you think that it looks like an image of a kidney or perhaps an image of a leg. After 26 inquiries, the doctor says that he has misplaced the original image and gives you a blurred photocopy of the X-ray image. When you then ask the doctor a few questions about the blurred photocopy, the doctor refuses to answer.
Home run, Steve M!
You know, one thing I’ve observed is that people on the left (and while climate change is not per se a left/right thing, as a practical matter it has largely fallen out that way) have a difficulty with analogies. I don’t know how many times I’ve made an analogy to someone and they’ve simply not understood it. And when they do try one themselves as John H did, they don’t see that they’re often setting themselves up to be pinned. I’m sure John will come back with something he thinks rebuts you, but he’d be smarter if he’d admit he was bested this match.
Steve and Dave (#60 and #61): I’m not interested in playing smart games. I gave the analogy to try and indicate why you have to be a bit sophisticated in detecting a signal among significant noise - that’s all. I would tend to either do an “expert” analysis or ask for “expert” advice - I wouldn’t just trust my knee-jerk reaction. Now if you guys want to make alternative points by extending the analogy then that’s fine - but that’s a different discussion.
Dave Dardinger :
John Hunter :
Looks like the same discussion to me. Come on, John, prove Dave wrong.
fFreddy (#63): Oh dear. As I said, “I’m not interested in playing smart games”.
Please see my next post and see if you can get your head around some actual technicalities.
The technicality of moving forward in time to read your next post is above and beyond my capabilities. I’m out.
For anyone who wants to play around with linear regression:
Here’s a way for some of you to get a feel for linear regression in the presence of noise. The idea is to generate your own “synthetic” trend, contaminate it with autocorrelated noise and see how the trend (obtained by linear regression) varies with different “realisations” of the noise (keeping the statistics of the noise the same). Here is a recipe for making time series with (visually at least) similar statistics to those of Figure 1 in the main posting:
1. Generate a random series of 300 “monthly” values, with a standard deviation of 0.75. This is 25 years long or about the same length as Figure 1. For example, taking the simplest of possible distributions of random numbers, choose randomly (with a 50% probability for each) from +0.75 or -0.75.
2. Smooth (1) with a running average which is 13 points wide (i.e. 13 months), omitting the 6 smoothed values at each end which do not have enough surrounding points to “fill” the whole averaging width. The resultant series contains 288 points and is now autocorrelated.
3. Add a constant trend of 0.001 units/month (or 0.12 units/decade) to (2). This will look similar to Figure 1, in a statistical sense.
4. Do a linear regression of (3) and note the apparent trend.
Repeat (1)-(4) (with different random numbers) many (e.g. 100) times.
If you do this you will get a feel for:
(a) The spread of the trend estimates for different sets of random numbers.
(b) How a signal in which the trend has apparently been completely swamped by noise can still retain some statistically significant measure of the trend.
For the above example, and 100 realisations of the time series, I obtained trends distributed as 0.001 +/- 0.0005, which are significantly different from 0 at about the 98% confidence level. However, if I look at any single time series, the trend is not always visually apparent.
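The recipe above (steps 1 to 4, repeated many times) can be sketched in a few lines of Python. This is one possible reading of the recipe rather than the code actually used: the function names and the fixed random seed are illustrative choices, and the trend is counted from the first retained (smoothed) point.

```python
import random
import statistics


def ols_slope(y):
    """Least-squares slope of y against t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    return sxy / sxx


def one_realisation(rng, n=300, sd=0.75, width=13, trend=0.001):
    """Steps 1-3: coin-toss noise, 13-point running mean, added trend."""
    raw = [rng.choice((sd, -sd)) for _ in range(n)]            # step 1
    half = width // 2
    smoothed = [sum(raw[i - half:i + half + 1]) / width        # step 2
                for i in range(half, n - half)]                # 288 points
    return [v + trend * k for k, v in enumerate(smoothed)]     # step 3


def trend_spread(runs=100, seed=0):
    """Step 4 repeated: mean and standard deviation of the fitted slopes."""
    rng = random.Random(seed)
    slopes = [ols_slope(one_realisation(rng)) for _ in range(runs)]
    return statistics.mean(slopes), statistics.stdev(slopes)


if __name__ == "__main__":
    mean_slope, sd_slope = trend_spread()
    print("mean slope:", mean_slope, "spread:", sd_slope)
```

Changing the seed or the number of runs gives a feel for how much the estimated spread itself moves around from one batch of realisations to the next.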
I think I’ll take you up on this. However it will take a day or two as I just downloaded R yesterday and am still learning it. OTOH a lot of what you suggest is covered in the example in appendix A of the manual, so I already know how to add noise to a series. BTW, does it make a difference if you add the trend before or after you smooth the data? (I don’t think so, but maybe I’m overlooking something.)
You can ‘cherry-pick’ a series from the above process that does show a trend, but that does not provide a basis for belief that the satellite series has a trend. The statistics of the synthetic and natural data should differ anyway. I don’t know if the results would go like this, but say they did, wouldn’t it convince people that there is no rational basis for belief that the trend is non-zero?
1. Replicate the published regression methods to get accepted trend and significance.
2. Show the points are not IID, hence use of regression is flawed.
3. Fit a model with better log-likelihood (i.e. ARMA)
4. Show trend is no longer significant with robust tests.
For good measure you would throw in ARMA analysis of the Hadley model and JH simulations above to show the simulations don’t represent reality, ruling them out as possible explanations. I haven’t seen a paper that is as simple and direct as this in all the series stuff, if it is the case, too much figure skating.
David,
The signal to noise ratio anthropogenic/natural is about 0.15/1.0 so we are talking about extreme low fidelity here. I can see that ARMA is a good tool to test time series quality, but I fail to see how this method can be of use as a predictor of physical processes.
There are four mechanisms dominating the temperature record: volcanoes, ENSO, Sun, and GHG. The first three can easily be derived by linear multivariate analysis; the magnitude of the last one is unknown. However, if we also assume linearity, as is proven for the first three mechanisms, a linear trend between 65 and 173 mK/decade is the result. It is possible to detect this trend in the data. But attributing the full remaining trend to CO2 alone is tricky because of the uncertainty in the solar signal (see e.g. Wilson, ftp://ftp.ngdc.noaa.gov/STP/SOLAR_DATA/SOLAR_IRRADIANCE/ERBS2003.DOC) or the unknown multidecadal oscillations, for which no mechanism is known.
Re #56:
Climate scientists are detecting a small signal in a lot of noise. To do this you have to be a bit sophisticated - so just looking at the data and saying “it doesn’t look like a trend to me” isn’t good enough. It’s a bit like a doctor diagnosing a hazy shadow on an X-ray image of your lung. If he thinks he can see something significant, I think you would be a bit daft to say “it doesn’t look significant to me, so I’ll ignore it”. The doctor has previous experience and a lot of knowledge on his side.
Except that the “Doctors” predicted a coming Ice Age in the 1970’s due to a perceived cooling trend. You would think they would learn their lesson, and not make wild predictions based on short term trends.
Perhaps the lesson they learned was “if we make enough noise we’ll get more funding” ?
cheers,
Robert
Hans, Yes estimating the size of mechanisms is important. The drivers could be nonzero and give a zero temperature trend through cancellation (e.g. sulphates and CO2). The question at hand, though, is whether the trend in temperature data is a false rejection of the null hypothesis of no trend. It seems to be well known that both trend and significance by linear regression are biased estimators with this type of autocorrelated data and exaggerate the trend and significance.
But, since you are concerned with the drivers, estimators of these would be biased in any multivariate setting as well, including estimation of size of the coefficients for the physical drivers. Wouldn’t you need to know the size of this bias before making pronouncements about the actual size of the coefficients with any certainty? Any source that has done this would be appreciated. I think Stern and Kaufman (1997) have done something like this, and the CO2 coefficient vanishes under the time series model, and reappears weakly with cointegration.
Then, after parameterization of a time series model including the physical drivers the possible futures could be simulated in the usual fashion, giving predictions of mean and variations in possible future temperature comparable with CGCMs or linear trends. It seems simple enough.
Re#66 (or anyone else interested in the exercise),
Let’s say the instrument getting your readings is accurate to +/- 0.1…or +/- 0.5…or +/- 1.0. What happens in these cases? I won’t be around for a few days, but I look forward to seeing your results as soon as I return.
re #71
Here are my favourite references:
Michaels, P.J. and P.C. Knappenberger. 2000. Natural signals in the MSU lower tropospheric temperature record. Geophysical Research Letters 27:2905-2908.
Douglass, David H, and B. David Clader, 2002, Determination of the Climate Sensitivity of the Earth to Solar Irradiance. Geophysical Research Letters 10 , doi:10.1029/2002GL015345.
Douglass, David H, B. David Clader, and R.S. Knox , 2004, Climate sensitivity of Earth to solar irradiance: update. Physics, abstract physics/0411002. http://citebase.eprints.org/cgibin/citations?id=oai:arXiv.org:physics/0411002
last one is online.
Dave Dardinger (#67): You ask “does it make a difference if you add the trend before or after you smooth the data?”
No it shouldn’t, as the running mean of a trend is the same as the trend (if you ignore end-effects, which you should have done by rejecting the ends, according to my “recipe”). But I don’t see why you would want to try and smooth the trend anyway - you are trying to autocorrelate the noise, not the trend.
David Stockwell (#68):
You say that “you can cherry-pick a series from the above process that does show a trend”. Well, I think I showed that, for the particular parameters chosen in my example, there would be only around 2 realisations in 100 that would show a negative trend (i.e. there is about 98% confidence of a positive trend). So you don’t have to cherry-pick - by far the majority of realisations would show a positive trend. But that’s what my “recipe” is for - you can play games.
You also say “show the points are not IID, hence use of regression is flawed”. This would be a wrong conclusion. Linear regression is O.K. for both correlated and non-stationary data, so long as you do it properly. If the points are correlated, then you have to adjust the number of degrees of freedom when you estimate the uncertainty in the trend (and other regression parameters). If the data is non-stationary, then the analysis should allow for the variation of a priori errors (if known) using an appropriate weighting technique.
Robert (#70): It would be nice if you could show that the broad consensus of scientists “predicted a coming Ice Age in the 1970’s” and urged governments to do something about it (e.g. through a process as comprehensive as the IPCC process). There is a big difference between a few climate scientists talking about possibilities, and most climate scientists talking about probabilities.
Michael Jankowski (#72): You can include instrumental errors in the “experiments” if you like. But, in general, environmental variability has a far greater effect on the final uncertainty than instrumental errors.
re #74
Well, I wasn’t talking about smoothing the trend, of course. I said “smooth the data” if you’ll look. And in this case the ‘noise’ is the data. Of course in the real world there are things like volcanic eruptions, el Ninos, cycles of various sorts as well as various sorts of ‘true’ noise.
I have some things to say about your ‘plaything’ once I get it working, but I’ll hold off for now.
Dave Dardinger (#78):
You said in #67: “does it make a difference if you add the trend before or after you smooth the data?”.
I assume by “data” you mean:
(a) “noise+trend” if you add the trend before smoothing, or
(b) just the “noise” if you add the trend after smoothing.
So, if you add the trend BEFORE you smooth the data, then you smooth the noise and you smooth the trend (case (a), above), because smoothing by taking a running average is a linear operator, so:
smoothed (noise + trend) = smoothed (noise) + smoothed (trend)
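The linearity argument above, and the earlier point that a running mean leaves a straight-line trend unchanged away from the ends, can both be checked numerically. The Python sketch below uses a made-up helper name (running_mean) and the 13-point width and 0.001 units/month trend from the recipe in #66.

```python
import random


def running_mean(x, width=13):
    """Centred running average, dropping width // 2 points at each end."""
    half = width // 2
    return [sum(x[i - half:i + half + 1]) / width
            for i in range(half, len(x) - half)]


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.choice((0.75, -0.75)) for _ in range(300)]
    trend = [0.001 * t for t in range(300)]

    # Case (a): add the trend first, then smooth everything.
    before = running_mean([n + t for n, t in zip(noise, trend)])
    # Case (b): smooth the noise only, then add the (shifted) trend.
    after = [s + trend[i + 6] for i, s in enumerate(running_mean(noise))]

    # The largest difference should be at floating-point rounding level.
    print(max(abs(a - b) for a, b in zip(before, after)))
```

The index shift of 6 in case (b) reflects the six points dropped at the start of the smoothed series.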
I agree with John, and Steve will back him up. Of course, one can’t dismiss signal within noise. I’m a former sub guy; I wouldn’t ignore a contact at several negative dB. The key issue is the statistical significance, and that ends up being a highly mathematical thing. It is where MBH has gotten gigged by MM and Storch.
The problem with linear trends is that climate is an oscillating system. All finite-length data sets have a low-frequency limit. The lowest frequency that can be captured is defined by the length of the data set, which in this case is 300 months. IOW, your data have to run at least as long as the period of the lowest frequency to ensure you don’t get a meaningless trend. If, for example, there is an oscillation in the system that is 1200 months long, only a quarter cycle will be present in the data set. There is no way around this. If the data are noisy, as they are in John’s example, there is no way to tell if the trend is a “trend” or just an artifact of the short sample period. Using the procedure and values (which can be changed) from John’s post #66, I put together an Excel spreadsheet to demonstrate this problem. It includes 4 graphs (all filtered).
1) 300 month linear trend + noise
2) 300 month sine trend + noise
3) 600 month linear trend + noise
4) 600 month sine trend + noise
Each graph has a linear trend line for a quick visual comparison. At 300 months the sine is at its maximum value; at 600 months the sine is at a zero crossing (1200 month period). Recalculating the sheet (F9) generates a new set of random numbers.
P.S. The spread sheet is Excel 2000, if anybody needs version 97 I will upload one.
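For anyone without Excel, the core of the quarter-cycle problem can be sketched in a few lines of Python. This is not the spreadsheet itself: it leaves out the noise and the filtering to isolate the effect, and the sine amplitude of 0.3 is a made-up value chosen only for illustration.

```python
import math

def ols_slope(y):
    """Least-squares slope of y against time steps 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    return sum((t - t_mean) * v for t, v in enumerate(y)) / denom

AMPLITUDE = 0.3   # hypothetical amplitude, not taken from the spreadsheet
PERIOD = 1200     # months, as in the example above

sine = [AMPLITUDE * math.sin(2 * math.pi * t / PERIOD) for t in range(600)]

slope_300 = ols_slope(sine[:300])  # quarter cycle: rises from zero to the maximum
slope_600 = ols_slope(sine)        # half cycle: ends back at a zero crossing

print(slope_300, slope_600)
```

With this amplitude the quarter-cycle record yields a fitted slope on the order of 0.001 per month, while the half-cycle record yields a slope close to zero, even though the underlying process has no long-term trend at all.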
Re#66,
John Hunter,
Thank you. This is a great learning opportunity. I have some questions. Why are we smoothing over 13 months? How do you (or anyone else) select the period of smoothing? Should the standard deviation be “approximately” 0.75 since the numbers are random and will change?
I did the exercise and got 0.001 ± 0.001 (95% confidence) over 100 runs.
I also tried something a little different. I extended the random series out and generated 3,000 monthly values (250 years) with a standard deviation of ~0.75. I then calculated a long series of 25 year trends and found that about >15% of the time there were trends greater than 0.0005 (up to 0.0015) and that none of these trends could be reliably used to forecast the subsequent 25 year trend.
Jeff
Greg F (#81):
You say “the problem with linear trends is that climate is an oscillating system”.
I would hope that we all know this very well by now (climate scientists certainly do). So I hope it is no surprise to anyone — it is certainly no surprise to me. One of the favourite tricks of the contrarians is to pick data records that are too short. For example, see: http://www.trump.net.au/~greenhou/ and search down for:
“LongTerm Sea Level Estimates from Tide Gauge Records: A General Statement”
and:
“Use Short Data Sets or Data From a Small Region”
The contrarian literature is full of this variant of “cherry-picking”.
Jeff Norman (#82):
> Why are we smoothing over 13 months?
I just adjusted the smoothing period (and the standard deviation) to get a record that looked (statistically) like Figure 1 (i.e. the time scales and variance looked about right). I leave it to you to play around with other possibilities.
> Should the standard deviation be “approximately” 0.75 since the numbers are random and will change?
I think you misunderstand. The random numbers may be selected from any reasonable distribution which has a standard deviation of 0.75 (or whatever you choose). Something called the “Central Limit Theorem” ensures that, after smoothing, the results do not depend much on the shape of the original probability distribution. I just gave you an example of a very simple distribution which consists of a 50% chance of the result being -0.75 and a 50% chance of the result being +0.75 (which of course has a standard deviation of 0.75).
> I extended the random series out and generated 3,000 monthly values (250 years) with a standard deviation of ~0.75.
Because you only looked at adjacent 25-year trends, all you have done is to effectively repeat my recipe 10 times (which is quite poor statistically; I suggested that you repeat it 100 times).
> I then calculated a long series of 25 year trends and found that about >15% of the time there were trends greater than 0.0005 (up to 0.0015)
I’m not sure you have this right. The trends should be centred on 0.001, so about 50% should be above 0.001 — so I can’t see how only 15% could be above .0005.
I found the trends to be distributed as 0.001 +/- 0.0005 which means that they would be greater than 0.002 for about 2% of the trials, greater than 0.0015 for about 16% of the trials, less than 0.0005 for about 16% of the trials and less than 0.0 for about 2% of the trials.
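These percentages are just the one- and two-standard-deviation tails of a normal distribution. A short check, assuming the trends really are normally distributed with mean 0.001 and standard deviation 0.0005 (the helper name normal_tail_above is invented for this sketch):

```python
import math

def normal_tail_above(x, mean, sd):
    """P(X > x) for X normally distributed with the given mean and sd."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))

MEAN, SD = 0.001, 0.0005

p_above_0020 = normal_tail_above(0.002, MEAN, SD)        # two sd above the mean
p_above_0015 = normal_tail_above(0.0015, MEAN, SD)       # one sd above the mean
p_below_0005 = 1 - normal_tail_above(0.0005, MEAN, SD)   # one sd below the mean
p_below_zero = 1 - normal_tail_above(0.0, MEAN, SD)      # two sd below the mean

for label, p in [("> 0.002", p_above_0020), ("> 0.0015", p_above_0015),
                 ("< 0.0005", p_below_0005), ("< 0.0", p_below_zero)]:
    print(f"{label}: {100 * p:.1f}%")
```

The one-sd tails come out at 15.9% and the two-sd tails at 2.3%, which round to the "about 16%" and "about 2%" quoted above.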
> none of these trends could be reliably used to forecast the subsequent 25 year trend
I’m not sure what you mean by “reliably”. The statistics of the trend are 0.001 +/- 0.0005; is that “reliable” or not? I don’t think it matters. The point of this exercise is to get a feel for the effect of noise on trends estimated by linear regression.
John Hunter (#84)
Sorry, I meant to say I extended out the unbiased random series to show that you could get a 25 year 0.001 trend at any time in the series. Does that make more sense?
Jeff
Jeff (#85): Not really, I’m afraid. However you pick your 25 years from the 250 years (or longer preferably), if you pick enough of them, the trends should have a distribution given by 0.001 +/- 0.0005 (where the last number refers to the standard deviation).
John Hunter (#86)
In the first part of my experiment I repeated your experiment with the 0.001 trend bias added to the 100 individual random runs. I calculated an average trend of 0.001 plus a standard deviation of 0.0005 which I multiplied by two to derive the “0.001 ± 0.001 (95% confidence)” that I reported above. So I got the same results as you, surprise.
In the second part of my experiment I calculated a number (twenty or so) of unbiased runs extended out to 3,012 “months” or 250 “years” of running averages. I did this because I wanted to see how often I would get a 25 “year” trend of +0.001 in an unbiased series. It was on average more than 15% of the time (min ~8%, max ~22%). The highest trend was >+0.0015, but the average of the 250 unbiased trend points was ~0.0000.
Jeff
John H, I’ve not had an opportunity yet to look at your references, but will do so. I have looked at your examples. They don’t illustrate the situation described in the post. Your examples show: if there is a true trend, OLS methods can find it despite quite considerable noise. However, the issue at hand is one of “false positives”, i.e. identifying a trend when there is no trend in the data. The development of techniques to avoid false positives is not easy.
I would urge you (and others) to read or reread the following paper by Carl Wunsch (an oceanographer even) touching on very similar issues, even citing Feller, quoted extensively in this post. Wunsch, C., 1999, The Interpretation of Short Climate Records, With Comments on the North Atlantic and Southern Oscillations (pdf) http://ocean.mit.edu/~cwunsch/papersonline/bamsrednoise.pdf
For others, there are many interesting papers at Wunsch’s website.
For a different issue on trends (which does not apply to the tropospheric series) but worth noting, people might be interested in D. B. Percival and D. A. Rothrock (2005), “‘Eyeballing’ Trends in Climate Time Series: A Cautionary Note”, Journal of Climate, 18, no. 6, pp. 886-891. http://faculty.washington.edu/dbp/PDFFILES/eyeballingtrends.pdf
Jeff Norman (#87): Thanks, now I understand — for your “second” experiment, you used a zero trend to generate your data.
For the first experiment, when you applied a trend of 0.001, linear regression returned trends distributed as 0.001 ± 0.0005 (standard deviation) — i.e. the same result that I found.
If you had, instead, applied a zero trend to the “first” experiment, you would have found that linear regression returned trends distributed as 0.000 ± 0.0005 (standard deviation) — i.e. the spread of the trends would have been the same as the spread for the original “first” experiment. This is an important result and perhaps one that I should have indicated earlier — that the spread of the regression trend depends on the variance and temporal scale of the noise and not on the specified trend.
Now, as I indicated in #84, your “second” experiment is not really different from your “first” experiment, except that the specified trend was different. Therefore the spread of trends should be the same (although I note that you say “I calculated a number (twenty or so) of unbiased runs extended out to ….. 250 years of running averages” — which means that some of your records must overlap and hence are not independent). I generated 1000 realisations of 25 year records with zero specified trend, and found the following distribution of regression trends (the figures in brackets are the theoretical values for a standard deviation of 0.0005 and a normal distribution):
0.0005 14.5% (15.9%)
> 0.001 2.0% (2.3%)
which agree well with the theoretical values.
If you didn’t get results similar to the above, then I suspect you did something wrong.
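For readers who want to reproduce this kind of experiment outside a spreadsheet, here is a rough Python sketch. It is a reconstruction, not the actual code behind the numbers above: it assumes the recipe of #66 amounts to noise of ±0.75, a centred 13-month running average, a 300-month record and an ordinary least-squares fit, and all function names are invented for the illustration.

```python
import random
import statistics

N_MONTHS, WINDOW, NOISE_SD = 300, 13, 0.75  # assumed recipe constants

def running_mean(series, window):
    """Centred running average; keeps only points with a full window."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]

def ols_slope(y):
    """Least-squares slope of y against time steps 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    return sum((t - t_mean) * v for t, v in enumerate(y)) / denom

def trend_distribution(true_trend, n_runs=1000, seed=1):
    """Regression trends from n_runs independent synthetic records."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_runs):
        raw = [rng.choice((-NOISE_SD, NOISE_SD))
               for _ in range(N_MONTHS + WINDOW - 1)]
        smooth = running_mean(raw, WINDOW)            # 300 smoothed values
        record = [s + true_trend * t for t, s in enumerate(smooth)]
        slopes.append(ols_slope(record))
    return slopes

zero = trend_distribution(0.0)      # no specified trend
biased = trend_distribution(0.001)  # same noise, trend of 0.001 added

sd_zero = statistics.pstdev(zero)
sd_biased = statistics.pstdev(biased)
frac_above = sum(s > 0.0005 for s in zero) / len(zero)
frac_below = sum(s < -0.0005 for s in zero) / len(zero)
print(sd_zero, sd_biased, frac_above, frac_below)
```

Because both runs use the same seed, the two sets of trends differ only by the specified 0.001, so their spreads are identical. That is the point made above: the spread depends on the noise, not on the specified trend. The exact tail fractions will vary with the seed and the number of runs.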
Re. #87 and #89 (for Jeff Norman, but also Steve please note!):
The “Comment Submitter” seems to make a stuff-up when I input a table which includes percentages (or perhaps “less than” or “greater than” symbols), so here is the table at the end of posting #89 again:
less than -0.001: 2.7 percent (2.3 percent)
less than -0.0005: 17.5 percent (15.9 percent)
greater than 0.0005: 14.5 percent (15.9 percent)
greater than 0.001: 2.0 percent (2.3 percent)
Steve (#88): You say my examples “don’t illustrate the situation described in the post” and that “the issue at hand is one of ‘false positives’, i.e. identifying a trend when there is no trend in the data”.
I disagree. I have indicated (and hopefully clarified in #89) how to derive the statistical distribution of the trend from synthesised data which uses a prescribed trend and autocorrelated noise. From that distribution you can easily calculate the probability of getting a false positive. For example, taking the case described in #89 and #90, if the trend is really exactly zero, then the probability of deriving a trend greater than 0.001 (one possible definition of a “false positive”) is about 2 percent.
It is easy to make linear regression sound overcomplicated and I think much of your article at the beginning of this thread does just that. A regression trend, in its simplest form, is just a weighted average of the input data (having first removed the simple average from the input data), where the weights are just a linear trend (negative at the start of the record and positive at the end). So if you know the statistics of the input data, it is just “first year” statistics to derive the statistics of the trend.
So if you think the statistics of an average are simple, then so too are the statistics of a regression trend. If, on the other hand, you think the statistics of an average are complicated, then so too are the statistics of a regression trend.
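The "weighted average" description can be written out directly. The sketch below is illustrative only: it assumes equally spaced monthly data, the function names are invented, and the test series is an arbitrary made-up one.

```python
import math

def trend_weights(n):
    """Regression weights: a straight line through zero, negative at the
    start of the record and positive at the end, scaled so that the
    weighted sum of the data is the least-squares slope."""
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    return [(t - t_mean) / denom for t in range(n)]

def slope_from_weights(y):
    """Slope as a weighted sum of the data. The weights sum to zero, so
    removing the simple average of y first would change nothing."""
    return sum(w * v for w, v in zip(trend_weights(len(y)), y))

def slope_textbook(y):
    """Usual covariance / variance formula, for comparison."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    cov = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var

# Arbitrary test series: offset + trend of 0.001 per month + a wiggle.
y = [0.5 + 0.001 * t + 0.2 * math.sin(t) for t in range(300)]
slope_a = slope_from_weights(y)
slope_b = slope_textbook(y)

# For UNCORRELATED noise of sd 0.75, the sd of the slope follows from the
# weights alone: sd_slope = 0.75 * sqrt(sum of squared weights).
white_noise_slope_sd = 0.75 * math.sqrt(sum(w * w for w in trend_weights(300)))
print(slope_a, slope_b, white_noise_slope_sd)
```

For 300 months of uncorrelated noise with standard deviation 0.75, the last line gives a slope standard deviation of about 0.0005. Correlated noise needs the autocorrelation included in the variance of the weighted sum, so this figure is only indicative for smoothed records.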
As regards the quote from Wunsch, you should note that he refers to data that “appears visually interesting”. The pitfalls inherent in making VISUAL inferences from time series are what my series of postings is really all about; see a following posting on this.
Linear Regression 102:
For those who have completed Linear Regression 101 (posting #66), here is the next assignment: :)
Do the recipe of posting #66 for a range of specified trends (e.g. -0.003, -0.002, -0.001, -0.0005, 0, 0.0005, 0.001, 0.002, 0.003) but with the other constants as defined in the recipe. For each of these specified trends produce plots of, say, 10 different time series (each of which is the result of a different set of random noise). Select plots randomly from this total of 90 plots and show them to an observer, asking them the question “what is the sign of the trend in this plot?”. From this, derive the success rate of the observers for different “signal to noise” ratios (i.e. different values of the magnitude of the prescribed trend). Now do the same experiment except, instead of using real observers, use linear regression to estimate the sign of the trend (i.e. use the results of step (4) of the recipe).
Which is more successful at detecting the correct sign of the trend — the visual observer or linear regression?
The purpose of this exercise is to illustrate how a linear regression is a much more sensitive way of determining a trend than visual inspection of a plot.
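The regression half of this exercise can be sketched in Python; the visual half needs real observers and is not attempted here. The constants are assumptions based on what has been said about the recipe in this thread (noise of ±0.75, 13-month centred smoothing, 300 months), the zero-trend case is skipped because it has no sign to detect, and 200 series per trend are used instead of 10 to get steadier success rates.

```python
import random

N_MONTHS, WINDOW, NOISE_SD = 300, 13, 0.75  # assumed recipe constants
TRENDS = (-0.003, -0.002, -0.001, -0.0005, 0.0005, 0.001, 0.002, 0.003)

def running_mean(series, window):
    """Centred running average; keeps only points with a full window."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]

def ols_slope(y):
    """Least-squares slope of y against time steps 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    return sum((t - t_mean) * v for t, v in enumerate(y)) / denom

def regression_sign_success(true_trend, n_series=200, seed=3):
    """Fraction of synthetic records whose fitted slope has the right sign."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_series):
        raw = [rng.choice((-NOISE_SD, NOISE_SD))
               for _ in range(N_MONTHS + WINDOW - 1)]
        smooth = running_mean(raw, WINDOW)
        record = [s + true_trend * t for t, s in enumerate(smooth)]
        if ols_slope(record) * true_trend > 0:
            hits += 1
    return hits / n_series

success = {trend: regression_sign_success(trend) for trend in TRENDS}
for trend, rate in success.items():
    print(f"specified trend {trend:+.4f}: correct sign in {100 * rate:.0f}% of series")
```

The success rates printed here are the benchmark to compare against whatever the human observers manage on the same plots.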
Spoon feeding time: What values should one put into the top two equations (AR1 and ARMA) above, to produce dummies of the global satellite time series? In particular, what SD should one use for the random component?
Steve: See http://www.climateaudit.org/?p=300 for ARMA coefficients. I’ll post up sd later. Remind me if I forget.
RE#72, 77,
I’m still waiting for the calculated response(s) from you. To just claim that it doesn’t matter much doesn’t hold much water without a statistical demonstration to justify it (interesting: dismissing a statistical analysis as part of a statistical demonstration). The +/-1.0 spec I suggested could be ignored, as it would seem almost impossible for such an instrument to produce readings limited to the -0.75 to +0.75 range over such a large number of readings. I suggest revising them to +/- 0.10, 0.25, and 0.50.
Maybe a better question would be to have asked what the maximum instrumental error could be for the trend to remain significantly different from zero at various confidence intervals.
I hope to see something by the end of the week, although it really shouldn’t take much time at all for a veteran such as yourself. I think it would be a tremendous addition to your Lin Reg 101 (or 102, if you feel it is that advanced).
Michael Jankowski (#94): I don’t normally spend a lot of time responding to your questions as you, like Steve, seem continually on the offensive. In this thread I have suggested numerical experiments that you can do to get a feel for the effect of noise on a regression trend. If you seriously want to learn, then go ahead and do the experiments — I am not going to do them for you. You can add (uncorrelated) “instrumental” noise between steps (2) and (3) in the recipe of posting #66. It will have a similar effect to correlated noise but, for a given magnitude, will contribute less to the spread in the trend estimates.
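As a starting point for anyone who does want to try it, here is one possible sketch of that two-noise experiment. It is an illustration under stated assumptions, not anyone's actual analysis: it assumes steps (2) and (3) of the recipe are the smoothing and the addition of the trend, it models the instrumental error as Gaussian (the thread does not specify a distribution), and it uses the instrumental magnitudes 0.10, 0.25 and 0.50 suggested in #94.

```python
import random
import statistics

N_MONTHS, WINDOW, ENV_SD, TREND = 300, 13, 0.75, 0.001  # assumed constants

def running_mean(series, window):
    """Centred running average; keeps only points with a full window."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]

def ols_slope(y):
    """Least-squares slope of y against time steps 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    return sum((t - t_mean) * v for t, v in enumerate(y)) / denom

def trend_spread(instrument_sd, n_runs=500, seed=2):
    """Standard deviation of fitted trends when uncorrelated instrumental
    noise is added after smoothing and before the trend is added."""
    env_rng = random.Random(seed)        # same environmental noise every call
    inst_rng = random.Random(seed + 1)   # separate stream for the instrument
    slopes = []
    for _ in range(n_runs):
        raw = [env_rng.choice((-ENV_SD, ENV_SD))
               for _ in range(N_MONTHS + WINDOW - 1)]
        smooth = running_mean(raw, WINDOW)
        noisy = [s + inst_rng.gauss(0.0, instrument_sd) for s in smooth]
        record = [v + TREND * t for t, v in enumerate(noisy)]
        slopes.append(ols_slope(record))
    return statistics.pstdev(slopes)

spreads = {sd: trend_spread(sd) for sd in (0.0, 0.10, 0.25, 0.50)}
for sd, spread in spreads.items():
    print(f"instrumental sd {sd:.2f}: spread of fitted trends {spread:.5f}")
```

For uncorrelated noise the contribution to the spread can also be worked out directly: noise of standard deviation s over 300 months adds roughly s/1500 to the trend uncertainty, in quadrature with the environmental part. After 13-month smoothing the environmental noise in this sketch has a standard deviation of only about 0.2, yet it produces a trend spread near 0.0005, which is the sense in which correlated noise matters more for a given magnitude.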
My statement in posting #77 that “in general, environmental variability has a far greater effect on the final uncertainty than instrumental errors” is quite obviously true in many instances. For example, take a look at the sea level record for Tuvalu at:
http://www.antcrc.utas.edu.au/~johunter/tuvalu.pdf
Figure 5 shows that the monthly sea level has an environmental variability with an amplitude of at least 10 cm and a correlation scale of years. However, the instrumental error for a modern tide gauge such as this is down at the 1 cm level and the correlation scale is considerably less. The environmental variability is clearly dominant. But don’t take my word for it: do the numerical experiment by synthesising records that statistically look like the Tuvalu record. That’s what posting #66 was all about.
Re#95,
I thought you were presenting an exercise in linear regression for the “cheerleaders” here. It would be shameful to ignore something like instrumental error in such a case. After all, it’s so improper to present the value of a trendline without including its error, even during casual conversation. So why would it be acceptable to ignore any portion of the error in the statistical analysis?
Your work with Tuvalu is certainly not representative of all environmental measurement. And regardless of how frequently environmental variability is much larger than instrumental error, instrumental error needs to be accounted for without simply being disregarded. And I thought we were talking about generating statistically significant trendlines with noisy data in a general sense, not specifically Tuvalu or necessarily an instance where instrumental error was relatively insignificant. Maybe this isn’t very applicable with Tuvalu or even the satellite measurements, but I would say it’s quite important when you’re talking about something like tree ring proxies from hundreds of years ago.
Thank you for granting me the permission to do your exercises. As I said in another post, I’ve done thousands of linear regressions in my lifetime and don’t need a refresher course. I simply wanted to see how you specifically treated instrumental error and have it demonstrated for the cheerleaders. I apologize for making such a demanding request. Clearly, your time is better served arguing over the application of terms like “aftermarket” to climate change and when one can properly use phrases such as “more advanced than a 2nd year or 3rd year essay” and “beyond undergraduate degree.”
Michael Jankowski (#96): The purpose of my “recipe” in #66 was “for some of you to get a feel for linear regression in the presence of noise”. Jeff Norman took it as simply as that (saying it was “a great learning opportunity”), had a go, came back with some serious questions (which I hope I answered satisfactorily) and hopefully learned something useful. It was just an exercise, to see the effect of noise on the estimation of a trend. If you want that “noise” to represent environmental noise, then that’s fine. If you want that “noise” to represent instrumental noise, then that’s fine also. If you want to introduce two sources of noise (as I indicated in #95), then again that’s fine; it’s up to you. If you don’t want to have a go at it then I’m quite happy with that, but please don’t waste any more of my time trying to have an argument where none exists.
On 8/18 these were “serious technical issues” and now they are just “exercises.”
I fail to see how my question/request wasn’t “serious,” relevant, and important and wouldn’t have provided “something useful” for people here to learn.