Groveman and Landsberg

Jean S pointed out the following quote from the NAS Report and suggested that this be discussed:

"The first systematic, statistically based synthesis of multiple climate proxies was carried out in 1998 by M.E. Mann, R.S. Bradley and M.K. Hughes (Mann et al. 1998); their study focused on temperature for the last 600 years in the Northern Hemisphere. The analysis was later extended to cover the last 1,000 years (Mann et al. 1999), and the results were incorporated into the 2001 report of the Intergovernmental Panel on Climate Change (IPCC 2001)."

Jean S drew attention to the following comment from my report on Hughes at NAS:

"Hughes said that the calculation of a global or hemispheric mean was a ‘somewhat tedious afterthought’ to these spatiotemporal maps. He cited Groveman 1979; and Bradley and Jones 1993 as originating the practice."

Here’s a discussion of Groveman and Landsberg 1979; I’ll discuss Bradley and Jones 1993 on another occasion. The conclusion is that MBH is NOT the "first systematic, statistically based synthesis of multiple climate proxies", as both the studies mentioned by Hughes would fit that description. I’ll discuss what the distinctive "contribution" of MBH actually was on another occasion.

MBH98 itself did not claim to be the "first" multiproxy study; they described themselves only as providing a “new statistical approach” and actually cited Bradley and Jones and several other studies (but not Groveman and Landsberg 1979):

"A variety of studies [Barnett et al 1995; Mann et al 1995; Bradley and Jones 1993; Hughes and Diaz 1994; Diaz and Pulwarty 1994] have sought to use a ‘multiproxy’ approach to understand long-term climate variations, by analysing a widely distributed set of proxy and instrumental climate indicators to yield insights into long-term global climate variations. Building on such past studies, we take a new statistical approach to reconstructing global patterns of annual temperature back through the 15th century, based on the calibration of multiproxy data networks by the dominant patterns of temperature variability in the instrumental record."

Jean S observed that Groveman and Landsberg 1979 is not cited in Bradley and Jones (1993), but is cited in Bradley and Jones (1995) as follows:

"The resulting time series [Bradley and Jones 1993] back to 1400 provides the best reconstruction of Northern Hemisphere "summer" conditions currently available. Many other review papers have intercompared reconstructions in the past (e.g. Williams and Wigley, 1983), but only one other composite series for the last few centuries had been published (Groveman and Landsberg, 1979). However, this series combines a number of records that are poorly calibrated in terms of climate response, leading to a composite series that is difficult to interpret."

Crowley, who, as one of the peer reviewers for NAS, presumably passed on the above statement by the NAS Panel, considered both the Groveman and Landsberg 1979 and Bradley-Jones 1993 NH reconstructions in Crowley and Kim (GRL 1996).

Groveman and Landsberg 1979 was recently re-examined by Thejll and Schmith 2005 in the context of distinguishing OLS and Cochrane-Orcutt methods. The Groveman and Landsberg NH reconstruction has also been used in some solar correlation studies.

Jean S concludes that Groveman and Landsberg is "pretty much similar to HT methods, so this should get the ‘pioneering status’ out from the hockey team."

Groveman and Landsberg 1979
So what did Groveman and Landsberg 1979 do? They asked whether the NH mean temperature could be reconstructed from a small subset of long records. They used 20 series, of which all but 3 were instrumental series. (The only other proxy study to use a lot of instrumental temperature series as “proxies” is, ahem, MBH98, which used 12 instrumental series.) Most of the G-L instrumental series are attributed to Borzenkova et al (1976 – Meteorologiya i Gidrologiya 7, 27-35). The three G-L proxy series were Alaska tree ring widths (Karlstrom 1961, Ann NY Acad Sci 95, 290), a Finnish tree ring index (Siren 1961 – Comm Inst Forestalis Fenniae 54) and Tokyo winter temperatures (Gray 1974 – Weather 29, 103).

Their multivariate method can hardly be recommended. They do separate calibrations for each step in which the network changes. For each step, they do a multiple linear regression of the NH temperature index against the available proxies, retaining “significant” values. In the 1872-1880 step, 8 “predictors” are selected for the reconstruction. In the earliest step (1579-1658), 3 predictors are available and used. Thejll and Schmith say that 28 such intervals occur. Groveman and Landsberg describe the procedure as follows (a toy sketch of the stepped calibration follows the quotes):

A number of long time series showing the highest correlations with the NH temperature for the simultaneous periods of record 1881-1960 were selected…Selections from these series were combined in multiple linear regressions maximizing the explained variance. These regression equations were then used to reconstruct earlier hemispheric temperatures. For the first interval prior to 1881 (1872-1880) eight [predictors] were chosen. The multiple correlation coefficient of this regression was r=0.882, corresponding to an explained variance of nearly 78%, each independent variable contributing a significant portion of the variance…

The choice of independent variables is not only dictated by their statistical significance but also by the length of record prior to 1881, to permit as much reconstruction backward as possible. This procedure resulted in other combinations of variables to reduce the error of estimate and enhance the correlation in various time segments of the reconstruction…With a shrinking data base from 8 independent variables (1872-1880) to three (1579-1658) the standard error of estimate rose from 0.110 to 0.162. The presently available data prior to 1579 would explain less than 50% of the variance and hence no further reconstruction was attempted.
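As I read their description, each step is nothing more than an OLS calibration on the 1881-1960 overlap using whatever predictors reach back far enough, with the fitted equation then applied backward. A minimal sketch (my own toy code and variable names, not theirs):

```python
import numpy as np

def gl_step(nh_cal, X_cal, X_early):
    """One G-L calibration step: regress the NH index on the predictors
    available in this interval over the calibration overlap, then apply
    the fitted equation to the earlier period. G-L repeated this for
    each interval in which the network changes (28 in all, per Thejll
    and Schmith)."""
    A = np.column_stack([np.ones(len(X_cal)), X_cal])   # add intercept
    coef, *_ = np.linalg.lstsq(A, nh_cal, rcond=None)
    resid = nh_cal - A @ coef
    # standard error of estimate, the statistic G-L report for each step
    se = np.sqrt(resid @ resid / (len(nh_cal) - A.shape[1]))
    A_early = np.column_stack([np.ones(len(X_early)), X_early])
    return A_early @ coef, coef, se
```

Note that nothing in the procedure constrains the coefficient on a given station to be stable from one step to the next, which matters for the coefficients shown below.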

Whether or not one puts any credence in their standard errors (I don’t), they reported them 20 years before MBH98 attached standard errors to their reconstruction. They concluded:

This reconstruction has demonstrated the feasibility of estimating mean hemispheric temperatures by using a statistical approach.

I’ve only had access to Groveman and Landsberg 1979, which reports coefficients from only one step (1872-1880). This is interesting because it illustrates an important point — the lack of regularity in the coefficients:

Table 1. Groveman and Landsberg Regression Coefficients, 1872-1880 Step

Site                   Coefficient
Vienna                 -0.164
Berlin                 +0.222
De Bilt                -0.103
Finland tree rings     +0.0003
Innsbruck              +0.109
Godthaab, Greenland    +0.036
Montreal               +0.050
Reykjavik              +0.055

I don’t know what units the Finnish tree rings are in. But note that two instrumental series (Vienna, De Bilt) have negative coefficients. It is obviously unreasonable, in some sense, that Vienna temperatures should be believed to have a negative correlation with NH temperature while Innsbruck temperatures have a positive one. We are simply getting spurious fits, and this illustrates why multiple inverse regression is not a good idea (a toy illustration follows). Please note the behavior of these coefficients, as I’m going to discuss this in the context of VZ pseudoproxies.
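Here is a toy illustration of how the opposite signs arise (simulated data of my own devising, not the G-L series): two nearly collinear “stations” carry the same regional signal, and OLS splits the weight between them almost arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
n, flips, trials = 80, 0, 1000
for _ in range(trials):
    nh = rng.normal(size=n)                  # stand-in "NH temperature"
    region = nh + 0.7 * rng.normal(size=n)   # shared regional signal
    # two nearly collinear "stations", e.g. Vienna and Innsbruck
    s1 = region + 0.05 * rng.normal(size=n)
    s2 = region + 0.05 * rng.normal(size=n)
    A = np.column_stack([np.ones(n), s1, s2])
    coef, *_ = np.linalg.lstsq(A, nh, rcond=None)
    if coef[1] * coef[2] < 0:                # opposite-signed station weights
        flips += 1
print(f"opposite-signed coefficients in {100 * flips / trials:.0f}% of trials")
```

The net weight on the pair is well determined; the split between the two stations is not, so one of them frequently goes negative. Peter Hartley makes the same point in the comments below.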

It’s hard to figure out why Thejll and Schmith re-visited Groveman and Landsberg 1979, other than perhaps because it was non-controversial. They re-examined the GL regressions and observed that the residuals from the regression were autocorrelated. Their illustration of the GL reconstruction and their version using Cochrane-Orcutt methods is shown below.


From Thejll and Schmith, 2005.
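For readers who haven’t run into it, Cochrane-Orcutt is a standard iterative correction for AR(1) residuals: estimate the lag-1 autocorrelation of the OLS residuals, quasi-difference the data by that amount, and re-fit until the estimate settles. A minimal sketch (my own toy code, not Thejll and Schmith’s):

```python
import numpy as np

def cochrane_orcutt(y, X, n_iter=25):
    """OLS with an AR(1) correction: estimate rho from the residuals,
    quasi-difference y and the design matrix, and re-fit."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    for _ in range(n_iter):
        e = y - A @ beta
        rho = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])  # lag-1 autocorrelation
        y_star = y[1:] - rho * y[:-1]               # quasi-differencing
        A_star = A[1:] - rho * A[:-1]               # intercept column becomes 1-rho
        beta, *_ = np.linalg.lstsq(A_star, y_star, rcond=None)
    return beta, rho
```

The point of the exercise is less the coefficients than the standard errors: with autocorrelated residuals, plain OLS overstates the significance of the predictors.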

In my opinion, the inconsistent OLS regression coefficients are conclusive evidence of overfitting. (MBH also has some negative weights for instrumental series!) I don’t know what the Thejll and Schmith coefficients are and don’t plan to do any more work on this data set. I’ll be posting on OLS methods as applied to a VZ pseudoproxy network, where inconsistent coefficients are also obtained. Keep the inconsistent G-L coefficients in mind. I’m surprised that this study has been used in solar correlation studies. To the extent that studies of forcing factors have relied on Groveman and Landsberg 1979, they need to be re-visited; the series should no longer be quoted, as it has poor statistical properties.

32 Comments

  1. Peter Hartley
    Posted Jul 1, 2006 at 5:47 PM | Permalink

    The negative coefficients might be the result of high collinearity between variables. For example, we might expect Vienna and Innsbruck to be highly correlated with each other, while Berlin and De Bilt may be another highly correlated pair. In these situations, one often finds opposite-signed coefficients on the highly correlated pairs of variables. In addition, the coefficients move around a lot as the set of regressors is changed, which I gather also happened in this circumstance. By the way, the figure (?) from Thejll and Schmith did not render in my browser.

  2. Peter Hartley
    Posted Jul 1, 2006 at 5:56 PM | Permalink

    Further to the previous comment, if we assume as a “benchmark” that the four series Berlin, De Bilt, Vienna and Innsbruck were identical, the “net coefficient” on that variable would be .222+.109-.164-.103 = .064, which is of the same order of magnitude as the other temperature series Godthaab, Montreal and Reykjavik. The larger coefficients (in absolute value) on the four European series, which nevertheless net out to what is perhaps a more reasonable value, are another indicator that these variables might all be highly correlated.

  3. Steve McIntyre
    Posted Jul 1, 2006 at 6:12 PM | Permalink

    I agree 100% that the opposite signing is undoubtedly due to collinearity. I’m going to discuss this type of effect in more detail in connection with the VZ pseudoproxy network, as IMHO this is much more intimately connected with VZ variance attenuation than has been discussed in the strident discussions so far. A remarkable related feature is the empirical lack of collinearity in the MBH network.

    The whole idea of multivariate inverse regression, in which you regress temperature against networks of proxies, is something that I have trouble persuading myself is a useful method – as opposed to the other extreme: simply taking an average.

  4. John Creighton
    Posted Jul 1, 2006 at 6:42 PM | Permalink

    #3 If you use the singular value decomposition based pseudo-inverse, then you get the minimum-norm, minimum mean squared error solution. The minimum-norm property tends to prevent collinear series from having alternating coefficients.
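    A toy sketch of the effect (my own code; the rcond cutoff is an arbitrary choice of mine):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n = 80
    nh = rng.normal(size=n)
    region = nh + 0.7 * rng.normal(size=n)
    # two nearly collinear predictors
    X = np.column_stack([region + 0.05 * rng.normal(size=n) for _ in range(2)])

    # plain least squares: unstable, possibly opposite-signed weights
    beta_ols, *_ = np.linalg.lstsq(X, nh, rcond=None)

    # truncated-SVD pseudo-inverse: the tiny singular value belonging to the
    # "difference" direction is discarded, so the two series share the weight
    beta_svd = np.linalg.pinv(X, rcond=1e-1) @ nh

    print(beta_ols, beta_svd)
    ```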

  5. Peter Hartley
    Posted Jul 1, 2006 at 6:42 PM | Permalink

    I agree that a simple average of the collinear variables may make more sense than a weighted average produced by the above type of regression. This is especially so when the variables you are averaging are themselves temperatures. When there are true “proxies”, however, one may want to determine some coefficient relating the proxy to the variable of interest. An analogous situation in economics involves extending GDP statistics back in time to periods before WWII, when we didn’t have national accounts. People have noticed, for example, that industrial production indices for various sectors (e.g. steel, construction etc.) in the post-WWII period are highly positively correlated with GDP. Such indices were compiled in earlier periods. Hence, people have used the observed relationship between industrial production indices and GDP in the post-WWII era to construct proxy measures of GDP in earlier periods. One issue in doing this is that industrial production is more variable than GDP, so if you simply averaged the indices from different industries you would get an approximation for GDP that is too variable. The post-WWII regressions down-weight the indices to give a constructed GDP that has variance more similar to post-WWII GDP.
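    A tiny numerical illustration of the down-weighting (toy numbers of my own, not actual GDP data):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n = 200
    gdp = rng.normal(size=n)                                 # target, unit variance
    indices = gdp[:, None] + 1.5 * rng.normal(size=(n, 3))   # noisier "production indices"

    avg = indices.mean(axis=1)
    print("variance of simple average:", avg.var())          # > 1: too variable

    # calibration regression down-weights the average, pulling its variance
    # closer to the target's (though attenuated somewhat below it)
    slope = np.cov(avg, gdp)[0, 1] / avg.var()
    fitted = slope * (avg - avg.mean()) + gdp.mean()
    print("variance of regression-fitted proxy:", fitted.var())
    ```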

  6. TCO
    Posted Jul 1, 2006 at 6:58 PM | Permalink

    citation?

  7. John Creighton
    Posted Jul 1, 2006 at 7:08 PM | Permalink

    #6 Try this paper:

  8. David Smith
    Posted Jul 1, 2006 at 7:32 PM | Permalink

    Steve or John, my computer is now showing the dreaded “expanded sidebar”, which makes this site almost useless (unless I am able to guess about 25% of the words).

    If this is a permanent situation, could you publish hints as to what I can do to overcome it?

    By the way, I’ve never seen this happen on other websites.

    Thanks in advance,

    david

    John replies: John Creighton didn’t get the memo

  9. Posted Jul 1, 2006 at 7:33 PM | Permalink

    Sorry, I don’t get your argument. The problem with the coefficients above is that the values are so low they are probably not significant in an autocorrelated context. Therefore they can be positive or negative just by chance. Even if they were very significant, and negative and positive, I would have no problem with indicators being inversely related to temperature. So whether the correlations are strong or weak, the lack of regularity says nothing about overfitting.

    I think the whole question of which is the best model to use, which is what you seem to be getting into, should be seen in a cost/benefit framework. For example, the cost function could be over model degrees of freedom, flavors of analysis, and increase in uncertainty, while the benefits are accuracy on an independent test. There is no one right way, nor are there an arbitrary 64 flavors. There is a constrained optimization that I sense tends to favor simple averaging, but you would have to look at it more formally.

  10. TCO
    Posted Jul 1, 2006 at 7:40 PM | Permalink

    Looking for a citation to the article that is referenced here. A pdf would be great, but a citation just means the actual article description, enough that you could look it up in a library. Author-year is not a citation. Creighton, I’ll take a look at what you gave, but I’m not looking for a miscellaneous article. I want the one that is the subject of discussion.

  11. John Creighton
    Posted Jul 1, 2006 at 7:53 PM | Permalink

    #9 You are probably right that the values are too low to be significant. Didn’t Mann claim a much higher correlation?

  12. John Creighton
    Posted Jul 1, 2006 at 8:05 PM | Permalink

    #9 As far as the best model goes, you are right that you must trade off the cost between the model order and how well it fits the information. One way to do this is the Akaike information criterion.

    http://en.wikipedia.org/wiki/Akaike_information_criterion

    This technique effectively balances the number of fitted parameters against how well the model fits the data.
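    For a Gaussian regression it reduces to a one-liner (a sketch; the n*log(RSS/n) form drops additive constants that cancel when comparing models on the same data):

    ```python
    import numpy as np

    def aic_ols(y, X):
        """AIC for an OLS fit with Gaussian errors: n*log(RSS/n) + 2k.
        Lower is better; the 2k term penalizes extra predictors."""
        A = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = np.sum((y - A @ beta) ** 2)
        n, k = len(y), A.shape[1]
        return n * np.log(rss / n) + 2 * k
    ```

    Comparing the AIC of a stepwise multi-predictor fit against a simple average of the predictors would be one formal way to ask whether the extra coefficients earn their keep.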

  13. Steve McIntyre
    Posted Jul 1, 2006 at 10:41 PM | Permalink

    #9. David and others, I’m going to put up a post showing what happens to the coefficients in VZ pseudoproxies, which is interesting. I also have an interesting diagram in inventory showing how various multivariate methods interconnect. Until you spend some time foraging through the multivariate literature, it’s hard to really see how interconnected the methods are. Stone and Brooks (1990) have a pretty diagram linking OLS to partial least squares through "continuum regression", i.e. ridge regression is a continuum between the two. But this is only the start: canonical correspondence analysis links to partial least squares and ordinary least squares through changing a regularization parameter.

    The problem with negative coefficients is that there’s no reason why one temperature gridcell (Vienna) should have a negative coefficient while Innsbruck doesn’t. The model doesn’t make sense.

    Everyone in the regression world is so used to viewing multicollinearity as their enemy and something to be feared that it’s easy to lose sight of the fact that multicollinearity is exactly what you want when you have a network of pseudoproxies and orthogonality is your enemy.

  14. Peter Hartley
    Posted Jul 2, 2006 at 7:52 AM | Permalink

    Multicollinearity is an “enemy” (or a “problem”) when your theory suggests that two variables X1 and X2 ought to have different effects on the dependent variable Y, but it just so happens that over the sample period X1 and X2 were so highly correlated that you cannot distinguish their effects. Hence, you cannot test the theory. In the above example, however, the natural hypothesis is that Vienna, Innsbruck, Berlin and De Bilt temperatures are likely to be fairly highly correlated and to have a similar relationship to the northern hemisphere average temperature. They can be thought of as noisy measures of the same “Western European” temperature signal. Evidence of multicollinearity is consistent with that hypothesis and certainly not a “problem” for it. In short, whether or not multicollinearity is a problem depends on what hypothesis you are trying to test.

  15. TCO
    Posted Jul 2, 2006 at 8:13 AM | Permalink

    Similarly, changing both off-centering and standard deviation at the same time in your test case does not allow you to tell which is wrong or whether both are flaws. [/broken record]

  16. Steve McIntyre
    Posted Jul 2, 2006 at 10:01 AM | Permalink

    As a method, the bias comes almost entirely from the de-centering. The use of “detrended” standard deviations contributes somewhat to it but is very secondary (Zorita’s implementation does not have this second aspect).

    While I am reluctant to call any method “right” or “wrong”, in the decentered case one can, unusually, call the method “wrong” because of its bias. This position was endorsed by von Storch/Zorita and by NAS.

    Neither covariance nor correlation is per se a “flaw”. In the MBH98 network, because bristlecones have lower standard deviations than other chronologies (usually considered a sign of lack of climate sensitivity), covariance and correlation PCs yield somewhat different results – but neither is “right”, a point that somehow eludes you. Huybers was wrong to argue that use of covariance PCs was “biased” the other way from MBH98 and that correlation PCs somehow went down the middle. Any result needs to be proven as a temperature proxy. The NAS panel endorsed our position (although they did not explicitly rebuke Huybers).
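    To make the covariance/correlation distinction concrete, here is a toy sketch (my own code, not the MBH algorithm; in particular it has none of their de-centering):

    ```python
    import numpy as np

    def pc1_loadings(X, use_correlation=False):
        """Loadings of the leading principal component of the columns of X."""
        Xc = X - X.mean(axis=0)            # center each series
        if use_correlation:
            Xc = Xc / Xc.std(axis=0)       # correlation PCs: unit variance
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        return vt[0]

    # A low-standard-deviation series (cf. the bristlecones) is down-weighted
    # in covariance PCs but sits on an equal footing in correlation PCs, which
    # is why the two conventions give different results on the MBH98 network.
    ```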

    The instability in results does not occur in a “tame” network and points to problems in the network, which are associated with bristlecones. This is not a mixing of methodological and proxy issues. Statisticians testing robustness use a variety of exploratory methods to try to identify outliers and problem series. But once the outliers are identified (Hampel), it is a scientific and not a statistical issue. In this case, the problematic series (bristlecones) were identified by noting differing results under different PC methods, and their inclusion/exclusion becomes a scientific, not statistical, issue. The pre-Mann scientific opinion had been to exclude them. We pointed this out and the NAS panel has endorsed their exclusion.

    TCO, you may think that our Reply to Huybers was too argumentative and perhaps it was. But the points were still much better than his points on the other side.

  17. TCO
    Posted Jul 2, 2006 at 10:54 AM | Permalink

    It doesn’t “elude me”. Reread my remarks. You’re the one who is trapped in a box of thinking that H was attacking you and that you must respond. My point is that when you want to look at the impact of one effect, you change that effect. If you change two things at the same time, it is obtuse, ugly and potentially misleading. I won’t back down. I won’t back down.

  18. TCO
    Posted Jul 2, 2006 at 11:09 AM | Permalink

    I am fine with the idea that certain methods are susceptible to problems with certain types of data. I’d like it mathematically defined, though. Call me Lord Kelvin! 🙂

    Regarding the bristlecones and the NAS position: you make too much of an appeal to authority. NAS was wrong about “Mann being first”. They are no great shakes. I care about explicit reasoned arguments. Not how many diplomas they have or how many feathers coming out of their ass. Show me the money, the meat. The correlation coefficients. That’s what I hunger for. Finally, don’t conflate the possible tainting of the bristlecones as data with methods that respond to distinct shapes. MECE baby, MECE.

  19. TCO
    Posted Jul 2, 2006 at 11:22 AM | Permalink

    Thanks for the remarks about H. What I really liked about it was the clarification that the covariance matrix was two degrees of separation from Mann. Not the issue of correlation versus covariance, where I’m agnostic, but the issue of clarity of effect versus cause, where I am strident. But it’s cool, man.

  20. Ken Fritsch
    Posted Jul 2, 2006 at 2:18 PM | Permalink

    ref #18

    Regarding the bristlecones and the NAS position: you make too much of an appeal to authority. NAS was wrong about “Mann being first”. They are no great shakes. I care about explicit reasoned arguments. Not how many diplomas they have or how many feathers coming out of their ass.

    TCO, is this appeal-to-authority critique becoming as obsessive as the appeal for Steve M to publish and, if so, do you see anything contradictory here? When one publishes, one cites other authorities on a specific point without consideration that that authority may have been incorrect on another, unrelated point. Or would you have Steve M publish without references (to authority)?

  21. TCO
    Posted Jul 2, 2006 at 2:35 PM | Permalink

    I see citations very differently from you. Citations show traceability and benefit my readers if they wanted to check things or to dig into the context, detail or closely related issue of the subject more thoroughly. This is the case with citations of method, citations of introduction (previous relevant work) and citations of comparative studies.

    I’m actually blown away that you see citations more as a case of dressing papers up. Maybe that’s how you write?

  22. Ken Fritsch
    Posted Jul 2, 2006 at 4:19 PM | Permalink

    re #21

    I see citations very differently from you.

    I doubt it. I see Steve M’s reference to NAS in his post in the same manner as I would view a citation in a published paper but less formally and every bit within your context. Self-deleted by poster: a further discussion of why publishing a paper can be construed as a healthy process of appeal to authority.

  23. TCO
    Posted Jul 2, 2006 at 4:52 PM | Permalink

    Could you expand, Ken? Last post was garbled.

    Let me say how I look at references and blather on a bit since this is actually something that I care about. At least there will be some new content, not just refrains to get Steve to do something.

    When I read a paper, I look at references. If it’s an interesting paper, I’ll go and get all the references. (copying is cheap, time is precious.) Then while trying to read the paper very thoroughly, I can look at the references and better understand what the paper is trying to say. (science papers can be very clipped, can be a bit pretentious, etc.) In addition, I get a feel for the broader context. When I write a paper, I try to do the same thing and give the reader those references that would help him. Some are basic “mini-lit review” items from the introduction. Some are references to comparable work that is specifically mentioned in the given article, but where there is not sufficient room to recap. (For example, we found a correlation of .8 to precip in lollypop trees, this compares to a .6 correlation by Fritts in 1960.) And in some cases, there is some detail of method which for brevity, it is not desired to repeat. For instance, we preserved the specimen using the method of Douglass (citation).

    Things that annoy me in citations:
    (1) Any clerical inaccuracy. I knew an on-the-cusp-of-Nobel scientist who would go to the library and hand-check all the references in papers written by his grad students. When you write a paper, you are contributing to the archived literature. People will look at and use your paper 40 years from now. Don’t screw up the citation. (This is why it irks me when Steve makes references to imprecise citations like VS04 or the like.)
    (2) A bogus citation. Something where a description of method was needed, but the author could not find it and could not be bothered to recap it himself. So he puts in something that looks like a ref to a procedure, but when you track it down, it’s not. I knew a research group that did this with about 10 of their papers. Someone got lazy on the first one and then they just kept pushing it forward. I found it and that stopped being repeated.
    (3) Overly pushing your own work or slighting competitors. Pretty obvious that you should not do this sort of thing. That said, I do think there is a good point to referencing most of your own similar papers done in an area of research. Not for patting yourself on the back, but so that future readers can pull all the relevant papers and follow the general thread of work.
    (4) Lazy citations. This is a weaker form of error 2, but think of the heartbeat paper in the paper that I criticized (my thread). They wanted a general paper on autocorrelation of time series. Instead of picking a textbook or some classical paper (Mandelbrot, Nile, classical econ paper, etc.), they went with a single, not so important paper on heartbeats. I BET this was a lazy grad student, just checking a box, not thinking about things from the perspective of a future reader.

    —————–

    The concept that you have of citing papers as authorities to bolster weak work is a strange one. I am actually not that familiar with it. I take more of a necessary-but-not-sufficient regard towards the literature. I’ve read plenty of papers and know that they vary in quality. I don’t like (nor am I used to, in the physical sciences) someone buttressing a weak argument by reference to a paper as an authority. I like (and usually see) papers referencing other papers as sources of information (well documented) and as part of an integrated argument or inference. Just as someone would refer to a link on the net for more information as part of an argument, but not as obviating the need to tie everything together logically.

  24. Ken Fritsch
    Posted Jul 2, 2006 at 5:55 PM | Permalink

    re #23

    The concept that you have of citing papers as authorities to bolster weak work is a strange one.

    Even though I may periodically require a good lecture, I think you are wasting space doing it here. I agree in essence with your comments on references. I certainly did not even imply that anyone should cite papers to bolster weak work. Here is what I mean: the process of publishing a paper is in a way an appeal to authority, in that if one thinks one has something important to say to a wide audience, publishing requires submitting to an authorized source, where the appeal to the authority of that publication gets you that audience. It does not mean you even have an enduring respect for that organization.

    A paper reference assumes firstly an appeal to authority, as in “leading authorities” in such and such a technical area, or at least someone who has been authorized to publish. The worth of that reference to an individual reader is then finally determined by the content of that reference and that individual’s evaluation of it (not by an offhand comment by a TCO). In this specific case I think some of us see the multiple personalities of NAS (and perhaps the reasons for them having that condition) and we judge that we are fully capable of determining which is operating (in agreement in this case with Steve M).

  25. Ken Fritsch
    Posted Jul 2, 2006 at 6:09 PM | Permalink

    When I was posting about data mining at an investment web site, the issue of AGW was broached as an off-topic discussion, and I wondered at that time how much potential climate studies and relationships had for data mining. I knew that computer models were programmed initially using known physical phenomena, but that with the assumptions and fine tuning required, overfitting the data was possible. I did not have the time or background to research the current state of computer programs that were being calibrated on past temperatures.

    I was at that time able to find the 1991 study published in Science by E. Friis-Christensen and K. Lassen that closely correlated the variation in the length of sunspot cycles with temperature, for both 130 years of instrumentally recorded temperatures up to 1989 and back to 1550 using very incomplete temperature proxies. In later papers I believe I have seen a critique of the completeness of their instrumental temperature records. I believe the sunspot data was passed through a low-pass filter and some “optimizing” of fit was accomplished by it.

    Variation in sunspot number had been correlated previously with temperature but suffered from evidence that it lacked a reasonable cause-and-effect relationship. The sunspot cycle length variation and temperature correlation was criticized for lack of convincing physical evidence.

    Finally, one of the original authors, Lassen, and a new colleague, P. Thejll, reported in 2000 that the fit of sunspot cycle length and temperature was diverging dramatically after 1980 and explained the divergence as a potential case for an AGW signal now being seen above that from the sunspot cycle length variation. W. Soon criticized the authors for basing their claims on predictions of future cycle lengths and for the original hypothesis lacking a physical explanation. Soon did not reject the influence of the sun on the recent temperature increases, but simply pointed to the potential statistical problems with this study and its predecessors.

    To my mind, this study appears very analogous to an investment strategy that has been derived from data mining, with the overfit nature of the in-sample results becoming apparent only after sufficient years of out-of-sample results. The close correlation claimed in the study made the “divergence” more apparent. The claims and methods for the HS certainly show much evidence of data mining, but since in their case the in-sample results show little or no correlation, out-of-sample results will be of little evaluative worth.

  26. Ken Fritsch
    Posted Jul 2, 2006 at 6:26 PM | Permalink

    Re #25

    Holy moly, TCO, none of this philosophical discussion on references helped me one bit to make my links. The first one worked (and if I can remember, I will have learned the proper procedure), but there are 2 links in paragraph 4 one starting after P. Thejll and the other after W. Soon.

    John replies: I have fixed the links.

  27. Steve McIntyre
    Posted Jul 2, 2006 at 7:29 PM | Permalink

    #25. Ken, because of so many years in stock-market oriented business, I am very much aware of the analogies that you mention. Indeed, the failure to guard against such behavior on the part of multiproxy authors is very frustrating. Ferson et al 2003 is nice on the specific interaction between data mining and autocorrelation in a stock market context. I’ve discussed this on the blog and the points in Ferson et al 2003 are worth bearing in mind for proxy authors. We asked the NAS Panel to consider this.

    Actually, one of our objections to NAS about the composition of the panel was that there was no one on the panel who had this viewpoint (NAS panels are supposed to cover a broad range of viewpoints). While there’s much to be content with in the NAS findings, I feel that a more representative and even more independent panel might have gone a little further, but, hey, one step at a time. Also, Ken, you didn’t mention it, but the Lassen study that you linked to used the Groveman and Landsberg series mentioned here.

  28. TCO
    Posted Jul 2, 2006 at 7:33 PM | Permalink

    Sure, we’ve talked about overfitting and the divergence problem. That an intricate, very “fitted” theory then diverges in the future ought to make one worry about any extension backwards in time as well. No disagreement.

  29. TCO
    Posted Jul 2, 2006 at 7:45 PM | Permalink

    Off topic, but do you want to know how to be crowned as a stock genius and make a lot of money?

    Pick 4 high-beta stocks: A, B, C, D. Take a list of 64,000 names from a dial-for-dollars mailing list, divide into 4 groups of 16,000. Mail out letters to each group: Hi, I’m Steve M., the stock genius. I want you to try my paid newsletter for “stock locks”. Just to show you that I can pick them, I’ll tell you free that stock A (B, C, D) is a lock to go up 20% in the next month.

    Now, throw away the names of anyone who was told a stock that did not lock and rock. Let’s assume one stock (B) worked. Keep the B names, chuck the others. Pick a new group of stocks: E, F, G, H. Divide the B list of 16,000 into 4 groups. Rinse, lather, repeat.

    When you’ve done it 3 times, you’ll have 1,000 people ready to have your babies if you’ll just keep sending them picks. Sign them up for the newsletter at $395.00/year. Publish some blather. Let them churn their accounts with their brokers.

    Repeat with another list of 64,000 names. It’s a big country…

  30. TCO
    Posted Jul 2, 2006 at 7:48 PM | Permalink

    Oh… and if you want me to stop blathering about Steve publishing, don’t bring it up. My take is basically that publishing does not assure that one has clear logic; it makes it much more likely.

  31. Ken Fritsch
    Posted Jul 2, 2006 at 8:25 PM | Permalink

    re #29

    That one’s been around the block more than a few times.

    re #30

    My take is basically that publishing does not assure that one has clear logic; it makes it much more likely.

    Not that there is a major concern here, but in that case could you give us a clear indication of your publication bibliography? Why would it not surprise me if it turns out you published 30 articles on nearly the same topic?

  32. TCO
    Posted Jul 2, 2006 at 8:28 PM | Permalink

    Much less than that.