## A Red Noise Spaghetti Diagram

I notice that the phrase "spaghetti diagram" is catching on a little. In connection with MM05, we archived 100 simulated PC1s, using the weird method of MBH98. Just for fun, here’s a spaghetti diagram from 6 of these, chosen from the 100 at random, with 25-year smooths. You’ll see why I’m pretty unimpressed with spaghetti diagrams as a statistical technology. Given the proliferation of spaghetti diagrams, this obviously needs to be written up. I can splice the instrumental record and it will probably look better than the current round of spaghetti diagrams. The underlying issue is that Mann’s PC methodology is only one form of cherry-picking, although it is a mechanized and very efficient form of cherry-picking.

1. John A.

When my daughter reaches maturity, she’ll never believe how backward and primitive we once were to believe such nonsense.

2. Posted Mar 2, 2005 at 3:50 AM

I suspect that the statement "Just for fun, here’s a spaghetti diagram from 6 of these, chosen from the 100 at random ….." is untrue. If we are to believe MM05 (and I’m not saying I do), of the 10000 simulations, 99% were 1-sigma hockeysticks, of which approximately ONE HALF were upside-down hockeysticks and the other half were the right way up (as we would expect from a symmetric analysis technique). Now the spaghetti diagram of 6 series selected from 100 "at random" only shows "right way up" hockeysticks – a rather unlikely event (1 in 2^6, I believe) – or has someone been cooking the books (or at least not admitted that "oh, we did throw away the other 6 series that didn’t fit")? Or is this rather trivial error just the tip of an iceberg of more serious shenanigans …..?

Steve’s comment: PCs are unoriented and I believe, in group-theoretic terms, would be a coset of the two oppositely-oriented forms in a quotient group. The 100 examples, as described in the article, were all from the upside-up branch, but there is no intrinsic orientation. The salient point is the generation of the bend. The MBH99 NOAMER PC1, which drives MBH99, is an upside-down form which Mann inverts. I’m not objecting to Mann inverting the PC1. Is this something that bothers you?
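To make the orientation point concrete, here is a minimal numpy sketch. It uses synthetic white noise rather than the archived MM05 series, and `orient` is a hypothetical helper, not anything from MBH98 or MM05. It only illustrates that a PC and its negative are interchangeable, so an "upside-up" branch is a matter of convention.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))      # 200 "years" x 10 synthetic series (assumption)
Xc = X - X.mean(axis=0)                 # conventional column centring

# SVD: Xc = U @ diag(s) @ Vt; PC1 scores are U[:, 0] * s[0], loadings are Vt[0]
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1, load1 = U[:, 0] * s[0], Vt[0]

# Flipping both signs gives an equally valid PC1: same variance, same rank-1 fit
same_fit = np.allclose(np.outer(pc1, load1), np.outer(-pc1, -load1))
same_var = np.isclose(pc1.var(), (-pc1).var())

def orient(pc, n_late=50):
    """Hypothetical convention: flip so the late-period mean is not below the overall mean."""
    return pc if pc[-n_late:].mean() >= pc.mean() else -pc

pc1_up = orient(pc1)
```

Any fixed convention like `orient` puts every simulated PC1 on the same branch without changing its shape or its explained variance.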

However, the point illustrated here needs to be read in context with my post on Jacoby. Jacoby picked the 10 “most temperature sensitive” chronologies out of 36. I pointed out that Mann’s PC method was one way of generating hockey sticks and Jacoby’s was another. In fact, if you apply Jacoby methods to cherry-pick random data with the persistence properties of the North American network, you get series looking rather like the upside-up branch of Mann’s PCs. I had these examples readily at hand. It’s interesting that non-cherrypicking northern treeline studies like Wilmking have chronologies going both up and down. Also that Jacoby suppressed a later version of the Gaspe series which did not have a hockey stick shape.

When I write this up in more detail, I’ll make this clearer.

3. Posted Mar 2, 2005 at 6:16 PM

O.K. Steve. Perhaps I misunderstood. Because you said, "I can splice the instrumental record and it will probably look better than the current round of spaghetti diagrams" I assumed that you were suggesting that the spaghetti diagram you showed was representative of actual reconstructions using random data. However, they are only representative of actual reconstructions if PC1 is the ONLY PC selected for the reconstruction AND the random data actually selects (through the loadings) a "right-way up" hockeystick rather than an "upside-down" hockeystick – which it would only do approximately half the time. But surely one would select ONLY PC1 if PC1 accounts for the bulk of the variance – and for random input data, I am sure it doesn’t.

It seems as if you are suggesting that, if I do a multiple linear regression and one of the dependent variables is a parabola, then I am "mining" the data for parabolae — which is only true if I ONLY fit a parabola and NO OTHER basis functions.

I’m sorry, but I just find your article very misleading. It suggests to the non-expert that Mann et al. "mine" for hockeystick reconstructions, whereas they MAY mine for hockeystick PCs — which is a VERY different thing.

Finally may I thank you for this little discussion. Prior to it, I had some reservations about whether or not Mann et al. "mine" for hockeystick reconstructions. With your help, my mind is now much more at ease.

Steve: It doesn’t look to me like you know much about principal components.

Steve again: John Hunter sent me an offline email saying that my last response was too snippy. It was snippy mostly because of the tone of his prior email. However, in fairness, underneath the unpleasant language of his postings, I can see that there is a legitimate inquiry here in relation to the comment about linear regression and parabolas. One big difference between linear regression algorithms and principal components algorithms is that linear transformations of data have a big effect in PC algorithms, but not in regressions. Most people are used to thinking in regression terms, where standardizations only affect coefficients. This is not the case in PC algorithms so your premise is wrong. I’ve tried to explain this as clearly as possible in the various articles – there’s additional information in the E&E article.
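A small numpy sketch of the contrast, on made-up data (all names and numbers here are illustrative assumptions): rescaling and shifting predictors leaves the regression fit unchanged, since only the coefficients adjust, while the same rescaling changes which series PC1 loads on.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = rng.standard_normal((n, 4))
y = X @ np.array([1.0, -0.5, 0.25, 0.0]) + 0.3 * rng.standard_normal(n)

# Linear transformation of the predictors: rescale column 0, shift column 1
X2 = X.copy()
X2[:, 0] *= 10.0
X2[:, 1] += 5.0

def ols_fitted(A, y):
    """Fitted values from least squares with an intercept."""
    A1 = np.column_stack([np.ones(len(A)), A])
    beta, *_ = np.linalg.lstsq(A1, y, rcond=None)
    return A1 @ beta

def pc1_loadings(A):
    """Loadings of the first principal component of column-centred data."""
    B = A - A.mean(axis=0)
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[0]

# Regression: same fitted values before and after the transformation
fit_same = np.allclose(ols_fitted(X, y), ols_fitted(X2, y))

# PCA: after rescaling, PC1 is essentially just the rescaled column
load_original = pc1_loadings(X)
load_rescaled = pc1_loadings(X2)
```

Conventional centring makes the PCA indifferent to the shift in column 1. The choice of what to subtract before the decomposition matters just as much as the scaling, which is where the MBH98 centring convention comes in.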

The MBH98 algorithm definitely “mines” for hockey sticks. Whether this was done by accident or intentionally (and not disclosed) is really unknown right now. The more generous assumption is surely that it was done accidentally. If it was done intentionally and misrepresented in the original article, in most walks of life, this would be treated more severely.

4. Posted Mar 3, 2005 at 5:14 AM

Steve, you still say "The MBH98 algorithm definitely “mines” for hockey sticks" — I think you still mean that it mines for hockeystick PCs and NOT for hockeystick reconstructions (assuming you have included sufficient PCs in the reconstruction).

Steve: There are two modules in MBH98 that need to be distinguished: the PC module and the regression module – and my language here wasn’t as precise as it should have been. The MBH98 PC algorithm mines for hockey stick shaped series. It’s a flawed algorithm since it nearly always produces a hockey stick with a significant bend from networks that have the autocorrelation properties of the North American network. William Connolley’s observation that the simulated hockey sticks don’t have much of a bend is simply wrong. I’ve archived 100 simulated PC1s with bends as big as the MBH98 temperature reconstruction.

This observation about the flaw in their PC algorithm has become established pretty fast once GRL accepted our article. Mann et al. are trying to argue that this error, like their numerous other errors, “doesn’t matter”.

This is going to be pretty hard to sustain. There’s been some disinformation about how we use this observation.

What we did was to follow what the flawed method did in the actual MBH98 network, which contained flawed proxies: most notably the bristlecone pines, which have been noted to have an anomalous 20th century growth pulse not related to temperature (posited by Graybill and Idso to be due to CO2 fertilization, but other possibilities are canvassed in our E&E article). The flawed method picked out the most flawed proxies. The interaction was lethal.

If you fix the method, you still have the flawed proxies. If you use the number of retained PCs listed in MBH98, then the bristlecones are in a lower PC and the impact of the flawed proxies is much diminished. Mann et al. argue for applying Preisendorfer’s Rule N as a test for retention of PCs. The use of this rule is not evidenced in other networks (see my post on Was Preisendorfer’s Rule N Used?), and it appears to be an entirely ad hoc method introduced to enable them to retain 5 PCs, which in turn enables the flawed proxies to dominate the regression reconstruction from the PC4 position. If you think about it, any non-climatic trend will show up in a Preisendorfer-type test, so this doesn’t prove that the proxy is any good.

The flaws should have turned up in the verification statistics. If MBH98 were picking up a real climate signal, it would have a significant R2 as well as a significant RE statistic. However, if there was a non-climatic trend, then you get a spurious RE statistic along the lines of our GRL article simulations. The MBH98 R2 for the 15th century step is about 0.0. They withheld this statistic in the initial reporting and have continued to refuse to provide this statistic or to provide their digital reconstruction for this step from which it could be calculated. Try justifying that. You can’t.
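For readers who haven't met the two statistics, here is a hedged sketch using the usual textbook definitions (RE measured against the calibration-period mean, R2 as the squared Pearson correlation). The numbers are invented to show the mechanism and are not MBH98 values: an estimate that gets the verification-period level right but whose year-to-year wiggles are unrelated to the observations can score a high RE with an R2 near zero.

```python
import numpy as np

def re_stat(obs_ver, est_ver, cal_mean):
    """Reduction of Error: skill relative to always guessing the calibration mean."""
    sse = np.sum((obs_ver - est_ver) ** 2)
    ref = np.sum((obs_ver - cal_mean) ** 2)
    return 1.0 - sse / ref

def r2_stat(obs_ver, est_ver):
    """Squared Pearson correlation over the verification period."""
    return np.corrcoef(obs_ver, est_ver)[0, 1] ** 2

rng = np.random.default_rng(2)
n_ver = 48          # verification window length (illustrative assumption)
cal_mean = 0.0      # calibration-period mean of the observations (assumption)

# Observations sit 0.3 below the calibration mean; the estimate matches that
# level, but its wiggles are independent noise.
obs = -0.3 + 0.1 * rng.standard_normal(n_ver)
est = -0.3 + 0.1 * rng.standard_normal(n_ver)

re_val = re_stat(obs, est, cal_mean)
r2_val = r2_stat(obs, est)
```

In this setup RE rewards the shared offset alone, which is why a trend or level shift unrelated to climate can produce an apparently passing RE while R2 stays low.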

5. Posted Mar 3, 2005 at 4:34 PM

Steve, you say "Try justifying that. You can’t.".

I’m not trying to justify anything (the "hockeystick" work is not mine to justify). What I’m trying to do is to understand, and you continually seem to dodge away from the central issue of my original posting, which was:

1. PCs are DIFFERENT from reconstructions — you seem to want to blur this distinction in your promotions (e.g. in the "spaghetti" article above).

2. Mining for hockeysticks in PCs is DIFFERENT from mining for hockeysticks in reconstructions.

3. There is no obvious reason why a "mined" hockeystick in a PC should be prominent in the reconstruction, if you select the PCs properly and calculate the loadings properly.

4. As far as I know, you have NOT showed that the method of MBH98 mines for hockeysticks in the reconstructions.

Steve: I agree that mining for hockeysticks in the PC algorithm is different than the effect in the MBH98 reconstruction. In our GRL article and elsewhere, we don’t say that the NH hockeystick is "simply an artifact" of the erroneous PC method. However, we do argue that a valid temperature reconstruction should out-perform these simulated PC1s in a NH temperature reconstruction. Hardly anyone has paid attention to this argument, but I think that it’s really very significant.

Our E&E article is where we discuss the impact on the regression module. You’ll see that the NOAMER PC1 and Gaspe series are outliers from the other statistics and singlehandedly change the 15th century from high to low.

While there may be no "obvious" reason why 1-2 hockey stick series should dominate the NH reconstruction, under MBH98 methods, this is empirically what happens. If you look at my Replication posts, especially on the Reconstructed PC1s, you’ll see that I’ve got some serious questions about what’s going on in the 15th century portion of their regression module as the results are inconsistent with other periods. Source code – please. Maybe you’ll have better luck with Mann. Have you tried?

In one of the two outliers, the MBH98 "method" is nothing more complicated than editing the data to add a hockey stick series (and does not involve PCs). In this case, they did not report the editing and misrepresented the start date of the series so no one knew about the editing. This is a unique example of this sort of editing. This sure sets off alarm bells for me as something that is really hard to justify.

If you increase from 2 to 5 PCs, the bristlecones as a PC4 will still be an influential outlier and together with Gaspe overrule the other series.

What happens is that two hockey stick series are given far more weighting than the other 20 series. This exemplifies the lack of robustness. Remember MBH98 claims that their results were "robust" to the exclusion of dendroclimatic indicators. This is flat out false, as we discuss in EE. It is false on the present admissions of MBH98 – if their results were "robust", they wouldn’t depend on whether the PC4 was in or out. This should be self-evident at this point.

6. Posted Mar 3, 2005 at 9:29 PM

Steve, you ask: “Source code – please. Maybe you’ll have better luck with Mann. Have you tried?”.

No, I have no reason to. I have long believed that if you give me any scientific paper, I can find flaws in it. However, science works by being self-correcting, so that the results of these flaws (if not the actual details of the flaws) generally get recognised in the course of time. I could have spent my career writing comments on papers pointing out these flaws (after I had wasted the time of the authors asking for their data and programs). And how would that have helped the progress of the science? — very little, I suspect.

My question to you, Steve, is this: with your obvious mathematical and analytical abilities, why do you not do something constructive (such as actually improving on the reconstructions of Mann et al.) rather than follow the rather futile, negative and destructive course you are on now?

For your interest, I have indicated one aspect of the scientific significance of your present critique of Mann et al. in a posting to realclimate (www.realclimate.org/index.php?p=128#comments, posting 23, with a minor correction in posting 27 — yes I make mistakes too!).

7. Ed Snack

John, just try this instead of carping. Reconstruct the MBH results without the BCP series, and see if you get the same results. I urge you, try it, it might just change your mind. One other direct (and related) question: do you believe that the BCP responses in the 20th C are primarily temperature related? If not, is it reasonable to include them as a temperature proxy? Now, IF the MBH method minus the BCPs does not produce a "hockey stick" like result, is it a robust reconstruction? One further question John, with your obvious mathematical and analytical abilities, why don’t you do something constructive (such as actually improving or perhaps correcting the reconstructions of Mann et al. and M&M) rather than follow the negative and destructive course you are now on?

8. Michael Mayson

John writes: “My question to you, Steve, is this: with your obvious mathematical and analytical abilities, why do you not do something constructive (such as actually improving on the reconstructions of Mann et al.) ….”

But M&M’s correction of errors and flaws in the MBH reconstructions is an improvement.

9. Posted Mar 5, 2005 at 12:07 AM

Steve and Ed: again the discussion has drifted away from my original concern, which was summarised by my statement (posting #5):

“1. PCs are DIFFERENT from reconstructions — you seem to want to blur this distinction in your promotions (e.g. in the “spaghetti” article above).

2. Mining for hockeysticks in PCs is DIFFERENT from mining for hockeysticks in reconstructions.

3. There is no obvious reason why a “mined” hockeystick in a PC should be prominent in the reconstruction, if you select the PCs properly and calculate the loadings properly.

4. As far as I know, you have NOT showed that the method of MBH98 mines for hockeysticks in the reconstructions.”

Now, if Steve could show the results of RECONSTRUCTIONS based on the random data used to produce the PC1s in the spaghetti diagram that started this article (using an acceptable method of selecting PCs and loadings) AND if these showed a hockeystick shape, then his little article on “spaghetti diagrams” might have a point (I would also be happy for Steve to “splice the instrumental record”!). And if that turns out to be the case, then my next question would be: “why did Steve not show the simulated reconstructions rather than the PC1s in the first place?”.

10. Ed Snack

Is it correct to paraphrase your point now, John, that Steve is essentially correct in all he has said (bar the snippiness maybe), but that he has been a bit misleading in his original item by presenting what are simulated PCs as full reconstructions? A point, BTW, that didn’t bother me originally, as I understood the diagram to show PC1s, as the third sentence in fact states.

I suggest that your point 3 is generally correct, but actually in this case it is part of the issue. And it is probably true that if you preferentially mine for HS’s in PC’s, that you will preferentially end up with an overall HS. This does bring up the BCP records again of course, if one went by the logic of your #3, then the BCP’s would of course be excluded anyway. And that is important, critical even, but maybe OT for the narrow point you wish to discuss here ?

11. Posted Mar 6, 2005 at 5:34 PM

Ed, I think we’ve really come to the end of this discussion — it seems pretty clear to me that Steve is not going to produce a set of hockeystick reconstructions produced from random data using the method of MBH98 — that’s all I was looking for.

As for your comment that “it is probably true that if you preferentially mine for HS’s in PC’s, that you will preferentially end up with an overall HS” — does this mean that you also believe that Fourier analysis “mines for sinusoidal waves”?

12. Ed Snack

John, what a curious comment, so you think what Mann et al are doing is analogous to using Fourier analysis ? And, in the absence of any constructive comment against, I take it that you agree with Steve’s general comments.

13. John Hunter

Ed:

> so you think what Mann et al are doing is analogous to using Fourier analysis?

Fitting basis functions is the common theme, once you have chosen the basis functions (in Mann et al’s case, the PCs) ….

> in the absence of any constructive comment against, I take it that you agree with

Nope — I just choose not to spend my time discussing them — they were off my original point. I try not to waste time being dragged into diversionary discussions …

14. Steve McIntyre

A couple of comments. The idea that Mann’s method of mining for hockey stick shaped series is somehow analogous to Fourier analysis "mining" for sine waves or linear regression "mining" for straight lines is spreading in the pro-Mann blogosphere. It seems like such a ridiculous idea that I hardly know where to begin, but I guess I’ll have to spend some time on it. But one obvious comment is that Mann said that he used "conventional" PC methods, which do not mine for hockey sticks. If he intentionally transformed his data so that the PC procedure mined for hockey sticks, then he had an obligation to disclose this and let referees and specialist readers at the time decide whether this was a good procedure or not.

Secondly, John Hunter has misconstrued the way that we used the simulated hockeysticks in our article and the way that I used the simulated hockeysticks in the quick illustration here. We regressed the simulated PC1s against NH temperature to show that they yielded spurious RE statistics. It seems self-evident to me that the MBH98 reconstruction should be able to out-perform these series generated from random data. For reconstructions which actually contain a climate signal, there should be a significant R2 statistic as well. Mann has refused to disclose the R2 statistic for his 15th century reconstruction – which will be about 0.0 when it’s finally revealed.

I didn’t see the necessity of extending the simulations to the full NH temperature reconstruction, since we’d proved the point about spurious RE statistics without it. I’m planning to do some new simulations to illustrate cherrypicking methods and Mann’s regression module would probably be useful. While I have an exact replication of Mann’s PC calculations, I have only an approximate replication of Mann’s NH reconstruction and was reluctant to do a big simulation exercise until enough information was provided to permit an exact replication.

On the spaghetti diagram, again, my point was merely to show, in a mathematical sense, that you can produce spaghetti diagrams from random data. This does not prove that the usual spaghetti graphs were produced with random data. What I’m trying to get to, and I’m thinking “out loud” both here and in the initial post, is that, if scientists want to argue that a spaghetti graph means something, then one would like to see some form of statistical quantification of the point that is being argued.

My own take on the actual spaghetti graphs, and I’m going to work this concept up formally, is that their “common” features are a product of the use of a couple of stereotyped proxies (particularly bristlecone pines) and stepwise methods, which introduce instrumental “proxies” later in the record, together with a majority of proxies which make a negligible contribution to the reconstruction, but give the illusion of lots of stuff going on.

15. Spence_UK

This is me thinking out loud on one of the topics here, hope you don’t mind.

Does the MBH98 method mine for hockey stick shapes in real data as well as noise? I believe the answer is yes, although this is from reasoning out what causes the "mining" rather than conducting any experiments. The reason for this is related to the way in which PCA works. PCA cannot distinguish between "mean offset" and a "variance" – which is why it is common practice to use mean-centred PCAs, otherwise you introduce "variance" which isn’t really there.

The method applied to the proxies in the North American network offsets each record using an average taken over only a short period, the training data set (1902-1980). Let us assume that a record in the set contains values in that period similar to the rest of the record (say 1000-1902). This record will then have a mean close to zero, and a low "variance" (from the PCA viewpoint).

Let us consider another record, which has a level in the period 1902-1980 that differs from the rest of the record (1000-1902). This record will be offset, creating a large mean value for the duration of the record. To the PCA, this record will appear to have a "large" variance in all except the period 1902-1980.

Now let us consider a large data set of records. Let us imagine a data set in which 90% of the records have a low value in the period 1902-1980, and 10% which have a large value of arbitrary sign. The decentring process is applied, in which the 10% are offset with a large mean error. This is treated by the PCA process as "variance", and hence these records are identified as the records with the greatest variance, and promoted into a high principal component – with more than their fair share of "variance explained" (as the offset error will count as "variance explained") in the eigenvalues.

Even worse – note I mentioned "arbitrary sign" – if a record has an offset with an opposite sign, PCA will recognise it as a negative correlation, and will invert the temperature record (!!!) before adding it into the mix. So a small number of "large offset" records of arbitrary sign will combine to dominate the variance explained with consistent sign, producing the "hockey stick" consistently from noise or signal, with exaggerated variance explained, particularly when signal is present.

If the normalised data is used to reconstruct the principal components, the hockey stick will appear with 1902-1980 period with a mean of zero and the rest with an offset. If the pre-normalised data is used, the reverse is true.

This explanation is quite technical, but I hope it helps clarify the problems with the decentred PCA, and why it would apply to a signal as well as noise. An exercise for the reader is to consider what the consequence of the exaggerated "variance explained" is for the reconstruction.
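The thought experiment above can be sketched in a few lines of numpy. Everything here is an assumption made for illustration: white noise stands in for the proxies, the network size and step height are arbitrary, and `pc1` is a hypothetical helper rather than the MBH98 code. Ten percent of the records get a late-period shift of arbitrary sign, and PC1 is computed once with conventional centring and once with centring on the last 79 "years" only.

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, n_series, n_late = 581, 70, 79   # illustrative sizes (assumption)
n_offset, step = 7, 2.0                   # 10% of records get a late-period shift

X = rng.standard_normal((n_years, n_series))     # white-noise stand-in for proxies
signs = rng.choice([-1.0, 1.0], size=n_offset)   # "arbitrary sign" offsets
X[-n_late:, :n_offset] += step * signs           # shift the last 79 years

def pc1(data, centre_rows):
    """PC1 via SVD after subtracting column means taken over centre_rows only."""
    B = data - data[centre_rows].mean(axis=0)
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    frac = s[0] ** 2 / np.sum(s ** 2)            # PC1 share of total sum of squares
    return U[:, 0] * s[0], Vt[0], frac

full = slice(None)             # conventional centring on the whole record
late = slice(-n_late, None)    # short centring on the last 79 years only

_, load_full, frac_full = pc1(X, full)
_, load_short, frac_short = pc1(X, late)

# Share of PC1's squared loadings carried by the shifted records (1.0 = all of it)
weight_short = np.sum(load_short[:n_offset] ** 2)

# The loadings follow the sign of each record's shift, so oppositely shifted
# records are flipped and reinforce one another in PC1.
aligned = load_short[:n_offset] * signs
consistent = bool(np.all(aligned > 0) or np.all(aligned < 0))
```

In this setup `frac_short` should come out well above `frac_full`: the offset left in the pre-1902 portion is counted as "variance explained", which is the exaggeration described above.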

Steve: you’ll be interested in the diagram in our EE article which shows exactly such a reversal of signs. It’s really hard to think of any justification for using PC methods for extracting a “signal” from noisy records. I’ve been experimenting with this and a simple mean seems to always work better.

16. Spence_UK

Steve,

I had a quick scan of the document before but I missed it; having checked again, I now see the discussion on MM05 (EE) page 77. I would go further than your discussion on this page – not only does this problem occur when using the peculiar decentring convention applied by Mann, it could occur in a conventional centred PCA as well (although your tests suggest that in this case, it hasn’t). It should be easy to test for, as the top PCs will contain (significantly large) negative values in the eigenvectors if a negative correlation is present.

My view is still that the PCA is not an appropriate tool for reconstructing temperatures in this way. I was interested to read in Ian Jolliffe’s texts that when he used PCA to assess temperatures, he would use a large mean offset (e.g. express the records in degrees C) which would minimise the chances of negative values in the eigenvectors and tend to cause the average temperature to be promoted into PC1. PC1 then becomes (essentially) an "average" of the data sets. Unfortunately this is not practical when dealing with proxy records, because they cannot be easily related to a scale such as degrees C without injecting considerable additional noise into the process.

Steve: I agree with you 100%.

BTW in the tree ring networks, the series have all been standardized to nondimensional units (with a mean of 1), so these networks are different from more heterogeneous networks (although they are still pretty heterogeneous).

I’ve been doing some experiments in which I take a “signal” and mix varying proportions of noise with the signal and then carry out PC analyses, using a network of 70 pseudoproxies. A centered PC always underperforms a simple mean in extracting the signal. I agree with you that PC doesn’t seem like an appropriate method. In fact, it’s really hard to think of a valid reason for using it. However, it is a really good method of mining for hockeysticks. I’ve been looking closely at what happens when signal extraction starts to fail. The PC methods seem to get pulled off-signal by outliers and the signal gets spread out over the first few PCs.
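A rough sketch of this kind of experiment, under simplifying assumptions that are not the original setup: white noise instead of red noise, one common signal entering every pseudoproxy with equal weight, and an arbitrary noise level. It compares how well the simple mean and the centred PC1 track the known signal, averaged over a number of trials.

```python
import numpy as np

rng = np.random.default_rng(4)
n_years, n_proxies = 581, 70
noise_sd, n_trials = 10.0, 20       # noise level and trial count are assumptions

def abs_corr(a, b):
    """Absolute correlation, since the sign of a PC is arbitrary."""
    return abs(np.corrcoef(a, b)[0, 1])

mean_scores, pc1_scores = [], []
for _ in range(n_trials):
    signal = rng.standard_normal(n_years)
    noise = noise_sd * rng.standard_normal((n_years, n_proxies))
    proxies = signal[:, None] + noise             # same signal in every pseudoproxy

    centred = proxies - proxies.mean(axis=0)      # conventional centring
    U, s, _ = np.linalg.svd(centred, full_matrices=False)

    mean_scores.append(abs_corr(proxies.mean(axis=1), signal))
    pc1_scores.append(abs_corr(U[:, 0], signal))

mean_skill = float(np.mean(mean_scores))
pc1_skill = float(np.mean(pc1_scores))
```

With equal weights and independent noise the simple mean is the natural estimator, while PC1 has to estimate its weights from noisy data, so in this setup it should trail the mean on average. Other noise structures or unequal signal strengths could change the size of the gap.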

17. David Ball

> I notice that my phrase "spaghetti diagram" is catching on a little.

There’s not a little hubris in that comment. I guess the fact that so-called "spaghetti diagrams" are used extensively when working with ensemble forecast output escaped your notice.

> In connection with MM05, we archived 100 simulated PC1s, using the weird method of MBH98. Just for fun, here’s a spaghetti diagram from 6 of these, chosen from the 100 at random, with 25-year smooths.

How much variance is explained by each of these? I repeated your red noise experiment and the PCs accounted for a whopping 2 to 3% of the variance. That is completely different than for MBH, whose PCs account for 50+%.

Steve: I was only referring to the phrase “spaghetti diagram”, not the form of illustration itself. But now that you mention it, I’m not sure what is proved statistically by ensemble spaghetti diagrams either.

As to variance, if you apply a flawed method to flawed data, you get particularly flawed results. Following the flawed method to see what it picks turns out to be an excellent way of finding flawed data in the case of MBH98.

The MBH98 PC1 uncannily picks out the flawed bristlecone pine data (problems with which are acknowledged in MBH99, Hughes and Funkhouser [2003] and elsewhere). A high eigenvalue under the flawed method does not prove that the bristlecone ring width data has any relationship to temperature. Mann’s grabbed the wrong end of the stick here. Here it only highlights the most flawed data.

It’s also interesting to look at R2 stats. If MBH98 were actually recovering a climate "signal" (as opposed to the bristlecone pine fertilization effect), then it would obviously have a significant R2 for its 15th century portion of the reconstruction. MBH98 did not report the R2 (because it is almost certainly about 0.0). Everyone would have laughed at it. Now when I try to get the R2 or the supporting calculations for the 15th century portion to calculate their R2, we get diatribes against the statistic. The time for these diatribes was in 1998. The R2 statistic should have been reported then and readers left to judge for themselves whether they wanted to accept a reconstruction with a 0.0 R2 statistic.

18. Spence_UK

> How much variance is explained by each of these? I repeated your red noise experiment and the PCs accounted for a whopping 2 to 3% of the variance. That is completely different than for MBH, whose PCs account for 50+%.

Steve has already answered this comment, but I thought I’d add in my two pence worth (4 cents if US/Can ;))

Something I teach my up and coming grads is that half of the battle of statistical analyses is "asking the right question". Here Prof. Mann and Steve are asking entirely different questions, and it is important to understand exactly what the questions are and what the results mean.

The test that Mann (and you) are applying here is a test of statistical significance, whereas the test Steve is applying is one of data interpretation. They are two distinct tests, and have their own uses.

The statistical significance test is only useful if you know how to interpret what the PCA has (significantly) extracted from the data.

Prof. Mann on realclimate seems to have used the significance test as a diversionary tactic, diverting people’s attention from the flawed interpretation aspect of the test. But if you can’t interpret the data, what is the point in having any kind of significance? The R squared statistic simply highlights that the temperature interpretation of the PCs is plain wrong.

Steve: thanks for the comments, thinking about heterogeneity of variance was one of the things on my list to consider, it would be interesting to look into these aspects a little closer.

Steve: Preisendorfer’s Rule N is hardly written in stone. If you look at Overland and Preisendorfer [1982], all they claim is that this is a necessary but not a sufficient condition for significance. In typical realclimate fashion, this gets transposed so that it is now argued as a “mathematical error” not to acquiesce in data mining results which have some slight “significance” under Preisendorfer’s Rule N. If you think about it, the implication of their use of Rule N is that you could turn ordinary regression calculations into principal component calculations and, under Preisendorfer’s Rule N, every spurious regression would magically become significant. What surprises me is the traction that this argument has had in the climate science community, when statistically it doesn’t hold water.
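For anyone unfamiliar with the rule, here is a hedged sketch of a Rule N style test as it is commonly described: normalised eigenvalues of the data are compared, rank by rank, with those from Monte Carlo white-noise matrices of the same dimensions. The function `rule_n`, its parameters and the planted trend are all illustrative assumptions, not Mann's implementation.

```python
import numpy as np

def rule_n(data, n_sims=200, pct=95, seed=0):
    """Return (leading PCs retained, normalised eigenvalues, noise thresholds)."""
    rng = np.random.default_rng(seed)
    n, p = data.shape

    def norm_eigs(a):
        a = (a - a.mean(axis=0)) / a.std(axis=0)       # standardise each column
        s = np.linalg.svd(a, compute_uv=False)
        return s ** 2 / np.sum(s ** 2)                 # eigenvalue shares, descending

    obs = norm_eigs(data)
    sims = np.array([norm_eigs(rng.standard_normal((n, p))) for _ in range(n_sims)])
    thresh = np.percentile(sims, pct, axis=0)          # per-rank noise threshold
    passed = obs > thresh
    # Keep the leading run of PCs that beat the noise threshold
    n_keep = p if passed.all() else int(np.argmin(passed))
    return n_keep, obs, thresh

rng = np.random.default_rng(5)
n, p = 300, 30
trend = np.linspace(-1.0, 1.0, n) ** 3                 # arbitrary non-climatic shape
contaminated = rng.standard_normal((n, p))
contaminated[:, :6] += 4.0 * trend[:, None]            # shared by 6 of the 30 series

n_keep, eigs, thresh = rule_n(contaminated)
```

The planted trend passes comfortably, although nothing about it is climatic: the test only says that some series share a pattern stronger than white noise would produce, which is the necessary-but-not-sufficient point.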

BTW it’s not clear that Mann actually used Preisendorfer’s Rule N in his tree ring networks. I can’t replicate the selections in other networks; it looks to me like it might be an ad hoc method to grab the PC4 in the North American network. This may be one reason why Mann doesn’t want to show his source code.

19. David Ball

I couldn’t help but notice you didn’t address the problem of the variance explained. When you run your red noise experiment you find PC’s that explain a very tiny fraction of the variance, usually about 3%. You seem to be making an unwarranted assertion that this result can be extended to the Mann et al case where their PC’s explain 50+%.

The simple fact of the matter is that there is nothing wrong with the PCA technique employed by MBH, provided it is done properly and frankly, there is some doubt whether your application of the technique is appropriate. For example, you make the claim that following the MBH method increases the variance of upward trending series. That simply isn’t the case. If you follow MBH’s methodology the variance of upward trending series is reduced.

Steve: First of all, the variance issue does not show that the bristlecones are a proxy for temperature. All it shows is that the bristlecones are exceptionally hockey stick shaped. Indeed, that requires an explanation. The very dominance of the bristlecones in the MBH98 PC1 points to a problem.

I disagree with your estimate of variance as well – the 3% figure that you attribute to me is not from any of my calculations: can you give me a page reference where I give that figure?

In our article, we talk about the de-centering of the MBH98 method as causing a re-allocation of variance so that hockey stick shaped series are over-weighted. The idea that the variance is “reduced” is simply incorrect. You can tell that you’ve misunderstood the point merely by looking at the highly weighted MBH98 series: the most hockey stick shaped series are the most overweighted in their PC1. This is a matter of fact, not of speculation.

I think that I’m pretty careful. Can you point to a case where a “simple error in addition and subtraction” has escaped my notice?

Have you sorted out with Dr Mann the many odd errors in MBH98 – now described by Cubasch as a “can of worms” and von Storch as “shoddiness”?
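The re-allocation of variance described in this reply can be sketched numerically. What follows is only a minimal illustration under stated assumptions, not the MM05 code: the red noise is plain AR(1) with a made-up coefficient of 0.9, the network size (70 series, 581 years, 79-year calibration window) is borrowed from figures mentioned in this thread, the series are only re-centered (not re-scaled), and the helper names `ar1_noise`, `pc1` and `hockey_stick_index` are hypothetical.

```python
import numpy as np

def ar1_noise(n_years, n_series, phi, rng):
    """Red noise: x[t] = phi * x[t-1] + white noise (assumed AR(1) model)."""
    x = np.zeros((n_years, n_series))
    eps = rng.standard_normal((n_years, n_series))
    for t in range(1, n_years):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def pc1(data, centre_rows):
    """PC1 after subtracting column means computed over centre_rows only."""
    centred = data - data[centre_rows].mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    return u[:, 0] * s[0], vt[0]  # PC1 time series, series weights

def hockey_stick_index(series, n_cal):
    """|mean of last n_cal values - overall mean| in units of overall std."""
    return abs(series[-n_cal:].mean() - series.mean()) / series.std()

rng = np.random.default_rng(0)
n_years, n_series, n_cal = 581, 70, 79  # illustrative sizes only

full_hsi, short_hsi = [], []
for _ in range(50):
    x = ar1_noise(n_years, n_series, phi=0.9, rng=rng)
    p_full, _ = pc1(x, slice(None))            # centred on the whole period
    p_short, _ = pc1(x, slice(-n_cal, None))   # centred on the last n_cal years
    full_hsi.append(hockey_stick_index(p_full, n_cal))
    short_hsi.append(hockey_stick_index(p_short, n_cal))

print("mean hockey stick index, full centering :", np.mean(full_hsi))
print("mean hockey stick index, short centering:", np.mean(short_hsi))
```

In this toy setup the only thing that changes between the two runs is the period over which the mean is taken, so any difference in the PC1 shape comes from the centering choice rather than from the data.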

20. Spence_UK

I couldn’t help but notice you didn’t address the problem of the variance explained.

I couldn’t help but notice that you only focus on the significance issue, and ignore the fact that the interpretation is all wrong. The trouble is, for the PCA to be useful you need both to be right.

This argument, promulgated by RealClimate, has been repeated so often now I feel the need to start making analogies. Picture the scene at the coffee shop…

Customer: I’d like a café latté, please, to go
Vendor: Certainly sir, straight away
(Vendor promptly pours shot of espresso in a cup and hands it to customer)
Customer: Errmmm.. excuse me. There is a problem with this café latté. It has no milk in it.
Vendor: But it contains the finest espresso shot money can buy!
Customer: I don’t care. It has to have milk in it to be a café latté.
Vendor: I couldn’t help but notice you didn’t address the issue of the espresso.

21. Steve McIntyre

Spence, I don’t know which is stranger: the realclimate argument or the traction that it’s got with climate scientists. For example: if you took the 50 non-bristlecone pine series in the North American AD1400 network and added in a set of stock prices for 20 dot.coms for 581 days from 1996-1998 and then ran a principal components analysis, be it by the MBH98 method or done correctly, you’d undoubtedly get out a PC representing the dot.coms, which would meet a Preisendorfer test. Obviously that would not mean that dot.com prices from 1996-1998 were a meaningful proxy for world temperature history from 1400-1980.

If someone told you that you had a data set of 70 series, but that it included 20 spurious inter-related series, that you didn’t know which ones, but you needed an instant guess as to which ones, PC would probably be a pretty good way of locating the spurious series.
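The thought experiment above can be tried on synthetic data. This is a rough sketch under loud assumptions: the 50 "proxies" are white noise, the 20 "dot.com" series are a shared random walk plus noise, and the significance check is a simplified Rule N-style Monte Carlo comparison applied only to the leading eigenvalue (the function name `noise_eig_threshold` is made up for illustration). None of it uses real tree ring or stock data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_noise, n_spurious = 581, 50, 20

# 50 unrelated series (stand-ins for the non-bristlecone proxies).
noise = rng.standard_normal((n_obs, n_noise))

# 20 inter-related series sharing one common factor (stand-ins for dot.com prices).
factor = np.cumsum(rng.standard_normal(n_obs))
factor = (factor - factor.mean()) / factor.std()
spurious = factor[:, None] + 0.5 * rng.standard_normal((n_obs, n_spurious))

data = np.hstack([noise, spurious])          # columns 50..69 are the spurious block
z = (data - data.mean(axis=0)) / data.std(axis=0)

u, s, vt = np.linalg.svd(z, full_matrices=False)
explained = s**2 / np.sum(s**2)              # fraction of variance per PC
weights = np.abs(vt[0])                      # absolute PC1 weight of each series

def noise_eig_threshold(n_obs, n_vars, n_sims, pct, rng):
    """Percentile of the leading explained-variance fraction for pure white noise."""
    top = np.empty(n_sims)
    for i in range(n_sims):
        w = rng.standard_normal((n_obs, n_vars))
        w = (w - w.mean(axis=0)) / w.std(axis=0)
        sv = np.linalg.svd(w, compute_uv=False)
        top[i] = sv[0]**2 / np.sum(sv**2)
    return np.percentile(top, pct)

threshold = noise_eig_threshold(n_obs, data.shape[1], 100, 95, rng)
flagged = np.argsort(weights)[-n_spurious:]  # the 20 most heavily weighted series

print("PC1 explained fraction:", explained[0], "noise threshold:", threshold)
print("series flagged by PC1 weights:", sorted(flagged.tolist()))
```

The PC1 passes the noise test comfortably and its heaviest weights pick out the inter-related block, which says something about the structure of the data set and nothing about whether that block measures temperature.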

22. Spence_UK

I couldn’t agree more with your comment in #21, Steve.

Back to the original point of this post, I also concur with the problems of the spaghetti diagram. Take this example in Wikipedia which is often cited. Looking closely at the graph, the proxies follow the temperature record quite closely in the period late 19th to early 20th century. This isn’t really surprising as this is the calibration period. The values seem to range between around -0.7 and 0.

From about 1500 to 1800 the lines are just complete noise between around -0.8 and -0.2. As this is broadly within the minimum and maximum of the calibration period, it suggests to me that whatever proxies are being used to generate the records are still, to within around 1/4 of the noise level, within the minimum and maximum levels, and do not really correlate to anything as a function of time.

The exception is the medieval warm period, which strangely shows quite strongly in all but one of the records that go back that far, values typically ranging from -0.3 to 0. However, given the nature of the 1500-1800 period I’m still not inclined to read anything into it.

The splice of the instrumental record has all kinds of other problems, as we know, such as assumptions of extrapolated linear response in the proxies etc. etc., as you have touched on elsewhere in the site.

What these graphs say to me is that we are still struggling to draw any meaningful conclusions on what historical temperature was. But I guess in some ways they are kind of like the “ink blot” diagrams psychiatrists use. Perhaps they tell us more about what is in someone’s mind than what is on the page.

23. Steve McIntyre

Spence, your dialogue about the caffe latte reminds me of a dialogue that I was thinking about to express some of my own frustration with realclimate logic. For example, one of Mann’s arguments to supposedly “refute” the point that you can generate hockey stick shaped series from red noise is that Rutherford et al. get a hockey stick shaped series using a different method. What Rutherford’s method has to do with criticisms of the MBH98 method eludes me: perhaps the method of Rutherford et al. is valid, but that doesn’t save MBH98.

Do you remember the famous Monty Python scene where the customer tries to return a stuffed parrot and John Cleese insists that the bird is alive? I feel like that customer. You can picture the dialogue as Cleese says that Rutherford’s parrot is alive, so this proves that the parrot that he sold me is alive.

The other scene that I’ve pictured is Cleese with a pigeon decked out in a few feathers being sold as a parrot. The pigeon is the bristlecone pine series; a “parrot” is a multiproxy study in which all proxies matter. I try to return the poor little pigeon as not being a parrot. So Cleese goes behind a screen and re-emerges with the pigeon dressed up in slightly different plumage (e.g. 5 “significant” PC series) and tells me that it’s a parrot. Or he goes behind another screen and returns the same pigeon with peacock feathers (e.g. 80 of 95 proxies being US tree ring series and no PCs) and tells me it’s a parrot.

The logic at realclimate doesn’t seem any different to me: civilians get thrown off by mentions of Preisendorfer’s rules and principal components, but it’s really just pigeons and parrots.

24. Spence_UK

Now, being a Python fan, that analogy does appeal to me a lot. I can picture it now…

“This hockey stick is dead. It has passed on. It has expired and gone to meet its maker. Bereft of life, it rests in peace. It has shuffled off this mortal coil and gone to meet the choir invisible. This hockey stick is no more.

… it … is … an … ex … hockey stick!”

“Well, I’d better replace it then… I’m afraid we’re right out of hockey sticks. I’ve got this Preisendorfer Rule N…”

PS. In the original sketch, the “alternate offering” was a slug! I’m nit-picking now, but a slug seems to describe the bristlecone pines so much more accurately than a pigeon…

It is frustratingly difficult to get across an understanding of statistics to non-statisticians, partly because the field is so different to “normal” scientific analysis. It makes me laugh that people believe non-climate scientists have nothing useful to add to climate science but climate scientists are free to re-write the rule book on statistical methods.

There are some interesting articles on the importance of study protocol, data treatment and management with regard to medical studies (not much seems to exist with regard to climate science, strange that!) plus some other examples of “when statistics go bad”. One is a very poor study from some years back on the MMR “triple-vaccine” linking it to autism, which contained so many of the classic errors (data selection, lack of control experiments, etc.), with the consequence that the UK is now suffering an increase in problems due to parents opting out of the vaccine after the media frenzy that followed. I must dig some of them up at some point; I suspect they have a lot of read-across to the problems associated with the hockey stick.

25. Steve McIntyre

I agree with your thought on medical studies. I’ve done some research on procedural recommendations in a programme termed “evidence-based medicine”, which sounds like an unnecessary phrase. The proponents point out the importance of review articles [read: multiproxy studies] for clinicians [read: policy-makers] and the tremendous selection bias in review articles, especially towards intervention approaches.

Ontario has been a center of this programme, although it has got international recognition. If you google “Guyatt Sackett”, you’ll see typical literature. Guyatt is from a squash-playing family, I know his brother and father and possibly him.

I’ve tried to get information on the “clear a priori” criteria said to be a virtue of MBH98 proxy selection, but you can imagine how far I’ve gotten with this request.

26. Spence_UK

I checked some of the sites I looked at, and came across the name Guyatt a few times. Unfortunately, most of the sites I have come across so far really only discuss topics with regard to medical science, and often comparison of distributions (which is of most interest to them), so it is difficult for non-statisticians to see the read across.

This link covers some basic material, including the importance of study protocol, with such things as “Criteria for managing missing and messy data should be discussed before problems are encountered.” and “Who is going to back-up data? Seemingly mundane elements of data processing must be worked out in advance of the study.”

This link, the little handbook of statistics, has a fair amount of stuff in it, so much so I haven’t trawled through much of it yet!

This site again stresses the importance of the study protocol, with this little insight:

If you play coin toss with someone, no matter how far you fall behind, there will come a time when you are one ahead. Most people would agree that to stop the game then would not be a fair way to play. So it is with research. If you make it inevitable that you will (eventually) get an apparently positive result you will also make it inevitable that you will be misleading yourself about the justice of your case.

This sort of thing highlights the importance of having clear reasons for, and archiving of, rejected and unused series and exactly why it is essential in statistical analysis to know these things, otherwise there can be no confidence in the results.
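The coin toss point in the quoted passage is easy to check by simulation. The sketch below is an assumption-laden toy (fair coin, a cap of 10,000 tosses per game, and invented helper names `first_time_ahead` and `final_score`): it compares a player who is allowed to stop the moment they are one ahead with a player who must play a fixed number of tosses.

```python
import random

def first_time_ahead(max_tosses, rng):
    """Return the toss at which the running score first reaches +1, else None."""
    score = 0
    for toss in range(1, max_tosses + 1):
        score += 1 if rng.random() < 0.5 else -1
        if score == 1:
            return toss
    return None

def final_score(n_tosses, rng):
    """Score after a fixed number of fair tosses, with no option to stop early."""
    return sum(1 if rng.random() < 0.5 else -1 for _ in range(n_tosses))

rng = random.Random(0)
n_games, max_tosses = 2000, 10_000

# Rule A: stop as soon as you are one ahead.
stops = [first_time_ahead(max_tosses, rng) for _ in range(n_games)]
ahead_fraction = sum(s is not None for s in stops) / n_games

# Rule B: the stopping point (101 tosses) is fixed before the game starts.
fixed_fraction = sum(final_score(101, rng) > 0 for _ in range(n_games)) / n_games

print(f"stop-when-ahead rule: finished ahead in {ahead_fraction:.1%} of games")
print(f"fixed-length rule   : finished ahead in {fixed_fraction:.1%} of games")
```

The coin is identical in both cases; only the stopping rule differs, which is why a record of what was tried and discarded matters as much as the result that was kept.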

If you don’t get a chance to look at any of those though, I would urge a quick check of the four paragraphs under the title “Research has consequences!” at this link, it includes some real gems!

27. Spence_UK

Oops, I missed this link off, with a few interesting comments in it, such as “If the authors have used obscure statistical tests, why have they done so and have they referenced them?”, more stuff on protocol and more references to Guyatt!

28. John A

Re: #26:

If you play coin toss with someone, no matter how far you fall behind, there will come a time when you are one ahead. Most people would agree that to stop the game then would not be a fair way to play. So it is with research. If you make it inevitable that you will (eventually) get an apparently positive result you will also make it inevitable that you will be misleading yourself about the justice of your case.

This sort of thing highlights the importance of having clear reasons for, and archiving of, rejected and unused series and exactly why it is essential in statistical analysis to know these things, otherwise there can be no confidence in the results

This is the part that bothers me most about climate models and the results thereof, which are subject to publication bias of the modeller. In the real world of modelling, the climate models hinge on non-physical “fudge factors” in order to constrain a model to a given result (ie warming at a constant rate). The IPCC TAR was very explicit in explaining that climate models are “not falsifiable in the Popperian sense” because of these human-based factors.

29. Spence_UK

Indeed, John, and given that people like Peter Hearnden are so flippant about the need to do this in statistical studies, I am not sure if the right pressure will come to bear on the climate scientists to tidy up their act in the way that happens with medical science (where mistakes have a more immediate and measurable effect).

The last link in #26 refers to this paragraph in particular:

One might seek comfort from the knowledge that the scientific method is based on replication. Faulty results will not replicate and they’ll be found out. However, the first report in any area often receives special attention. If its results are incorrect because of faulty study design, many further studies will be required before the original study is adequately refuted. If the data are expensive to obtain, or if the original report satisfies a particular political agenda, replication may never take place.

This page was written originally by Gerard Dallal in 1998. Gerard clearly has some great insight!

30. Hans Erren

See how one hundred years after Arrhenius nobody had checked his infrared spectrum
Langley infrared observations (1890) revisited
When did Arrhenius lower his calculations ?
More here:
http://hanserren.cwhoutwijk.nl/cooling.htm

31. McCall

Steve — I found the ClimateAudit site a few months ago, but I’m still catching up on some of the older threads (like “Medievel” last week).

FYI: I had noticed something else about the Wikipedia/Connelley spaghetti chart link in post 22 (was recently in discussion with Tim Lambert about his accusing me of “misrepresenting Moberg” (7-Aug)). Anyway, this comes from my post 70 of http://timlambert.org/2005/07/barton3/all-comments/#comments

Specifically, I didn’t know if you or Spence_UK had already noticed the poor (as in misleading/deceptive) choice of colors of the multiproxy reconstructions of that Wikipedia/Connelley plot: the RED of Moberg’05 vs. the RED-ORANGE of Huang’04, under zoom. The hockey team supporter position is that even Moberg’s multiproxy reconstruction exceeds its own MWP peaks! False, as you can see that the 1979 end of the Moberg’05 PROXY RECONSTRUCTIONS (RED 1-1979) is ~0.1 °C less than the PEAK temp values found between 1000-1150. It is the Huang’04 reconstruction (RED-ORANGE 1500-1980) that PEAKS in 1980 near the Moberg’05 PEAK values found between 1000-1150; Moberg’s multiproxy reconstruction in 1979 never reaches the 2 peak temperatures of the MWP, even though the color choices in the spaghetti make it appear so.

NOTE: This is different from the other obvious flaw of the HockeyTeam supporters’ constant reliance on the actual observations (in this case, BLACK, from Hadley Centre) to seamlessly append to the multiproxy reconstructions to form the “blade,” regardless of whether or not the MWP or the LIA is in evidence.

If you already have posted this, sorry to repeat it.

32. McCall

Correction: The hockey team supporter position LETS STAND THE APPEARANCE that even Moberg’s multiproxy reconstruction exceeds its own MWP peaks!

Perhaps my post was confusing, but I couldn’t get Tim Lambert to even acknowledge the misleading color scheme of this spaghetti. Again, like their comfort with ambiguously using “M&M” to help with their obfuscations, I’m sure they love this color scheme, second only to the BLACK Hadley plot being changed BLOOD-RED?

33. McCall

Steve — it looks like you’re very busy posting on several subjects.

I forgot that I summarized my problems with the wikipedia spaghetti on
at post 75.

34. Knut Knutsen

re #23 and #24, if the Python accuracy is paralleled in the rest of the comments, it tells a sad story. Anybody slightly familiar with Monty Python would know that John Cleese is the customer, not the salesperson in the Parrot sketch… Again: facts turned upside down.

35. John A

Re: #34

Is this the place for an argument?

36. fFreddy

I can see the Deltoid post now …

!!! MCINTYRE SCREWS UP AGAIN !!!
This is a Cleese (tall, irascible, …). This is a Palin (short, amiable …). Clearly any serious person can tell the difference.

So, there you go. Anthropogenic Global Warming must be true – QED.
(C) Tim Lambert, 2005

37. McCall

Followed by the obligatory Wikipedia citation:

and a vehement defense by proxy of the now extinct, Norwegian Blue Parrot*.

* was actually the now extinct Norwegian Blue Penguin — could talk and sit on a perch, but is no more due to (well, you know)?

38. Steve McIntyre

One version of the skit ends with a rousing rendition of the Lumberjack Song. Let’s all sing together: “We’re the Hockey Team, we’re Ok…”

39. fFreddy

OK, I’m game :

We’re the hockey team and we’re okay
You’d better do … whatever we say

We measure trees. We eat your lunch.
We run the I, PCC
We want a bigger budget, and

Open for peer review …

40. Steve McIntyre

Maybe the Lumberjack Song would be appropriate. Look at the bio pictures of M and H, in which they chose to be photographed in front of large cylindrical sections. http://www.climate2003.com/blog/hockey_team.htm

41. Ed Snack

Knut has the attribution correct, something I decided wasn’t worth the comment to point out, but this serious error that undermines all of Steve’s work to date seems luckily to have evaded Tim Lambert’s attention so far. I do like the comparison, and John_A, I’ve told you once!

42. TCO

Please stop with the trackback whoring, EMN. Combined with the comments you’ve posted on trying to make money from your blog, it’s highly suggestive that you are abusing the community here.

43. Greg F

RE:44
Chill TCO, it’s David Stockwell’s blog.

44. Posted Apr 27, 2006 at 5:58 PM | Permalink | Reply

I’m sorry. I was just editing some tags and damn thing sends off pingbacks to all the links in the article.

45. TCO

Re 2: I have a problem with his inverting the PC1 (if he did). How can one know which way to invert it? If one is really being objective?
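As background to the orientation question, here is a small sketch of why a PC has no intrinsic sign. It uses random data and assumes, purely for illustration, that the PC is later used as a regressor against some target series; it does not reproduce any step of MBH98.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((200, 5))
x = x - x.mean(axis=0)

u, s, vt = np.linalg.svd(x, full_matrices=False)
pc1, weights = u[:, 0] * s[0], vt[0]

# Flipping the PC and its weights together reproduces the same rank-one
# piece of the data, so the decomposition cannot prefer one orientation.
same_component = np.allclose(np.outer(pc1, weights), np.outer(-pc1, -weights))

# Hypothetical target series, constructed here only for the demonstration.
target = 0.5 * pc1 + rng.standard_normal(200)

def fitted(predictor, y):
    """Least-squares fitted values of y on a single predictor plus intercept."""
    slope, intercept = np.polyfit(predictor, y, 1)
    return slope * predictor + intercept

# The regression slope changes sign with the PC, so the fitted values agree.
same_fit = np.allclose(fitted(pc1, target), fitted(-pc1, target))

print("same rank-one component:", same_component)
print("same regression fit    :", same_fit)
```

Under that assumption the flip is absorbed by the sign of the regression coefficient, so the orientation is set by whatever the PC is calibrated against rather than by the PCA itself. Whether that calibration is meaningful is a separate question.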

46. Tony Mach

Has the image moved? The old URL was:
http://climateaudit.org/wp-content/spaghe2.gif

Is this the image?
http://www.climateaudit.info/data/climate2003/images/spaghe2.gif