Richard Smith on PC Retention

Richard Smith, a prominent statistician, has recently taken an interest in multiproxy reconstructions, publishing a comment on Li, Nychka and Ammann 2010 (JASA) here and submitting another article here. I’ll try to comment more on another occasion.

Today I want to express my frustration at the amount of ingenuity expended by academics on more and more complicated multivariate methods, without any attention to the characteristics of the actual data. For example, Smith (2010) attempts to analyze the MBH98 North American tree ring network without even once mentioning bristlecones.

Smith starts off as follows:

In this discussion, we use principal components analysis, regression, and time series analysis, to reconstruct the temperature signal since 1400 based on tree rings data. Although the “hockey stick” shape is less clear cut than in the original analysis of Mann, Bradley, and Hughes (1998, 1999), there is still substantial evidence that recent decades are among the warmest of the past 600 years.

Smith refers to MM2003, MM2005a and MM2005b, describing only one of a number of issues raised in those articles – Mannian principal components. Smith describes the network as follows:

The basic dataset consists of reconstructed temperatures from 70 trees for 1400–1980, in the North American International Tree Ring Data Base (ITRDB).

The dataset is located at Nychka’s website here and is a mirror image of the MBH tree ring network that we archived in connection with MM2005a. 20 of these are Graybill strip bark chronologies – the ones that were left out in the CENSORED directory.

Pause for a minute here. Leaving aside the quibble that we are talking about tree ring chronologies rather than “70 trees”, Smith has, without reflection, taken for granted that the 70 tree ring chronologies are 70 examples of “reconstructed temperature”. They aren’t. They are indices of tree growth at these 70 sites, which, in many cases, are more responsive to precipitation than temperature. Academics in this field are far too quick to assume that things are “proxies” when this is something that has to be shown.

The underlying question in this field is whether Graybill strip bark bristlecone chronologies have a unique capability of measuring world temperature. We discussed this in MM2005b as follows:

While our attention was drawn to bristlecone pines (and to Gaspé cedars) by methodological artifices in MBH98, ultimately, the more important issue is the validity of the proxies themselves. This applies particularly for the 1000–1399 extension of MBH98 contained in Mann et al. [1999]. In this case, because of the reduction in the number of sites, the majority of sites in the AD1000 network end up being bristlecone pine sites, which dominate the PC1 in Mann et al. [1999] simply because of their longevity, not through a mathematical artifice (as in MBH98).

Given the pivotal dependence of MBH98 results on bristlecone pines and Gaspé cedars, one would have thought that there would be copious literature proving the validity of these indicators as temperature proxies. Instead the specialist literature only raises questions about each indicator which need to be resolved prior to using them as temperature proxies at all, let alone considering them as uniquely accurate stenographs of the world’s temperature history.

Most “practical” readers of this blog have no difficulty in understanding this point, whereas academics in this field prefer to consider the matter via abstract policies on PC retention, with Smith being no exception.

Smith’s approach was to regress world temperature against principal components of the MBH tree ring network (with all 20 Graybill chronologies), varying the number of retained principal components and examining the fit. Smith described the problem as one of inverse regression, i.e., regressing the “cause” (world temperature, y) against the “effects” (the proxies, denoted x).

While Smith says that this is a “natural” way to look at the data, I don’t think that OLS regression of a cause against a very large number of series is “natural” at all. On the contrary, if one looks at this methodology even with relatively simple pseudoproxies, it is a very poor method. (There’s a 2006 CA post on these issues that IMO is a very good treatment.)
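To illustrate, here is a minimal pseudoproxy sketch in R (simulated data throughout – nothing here is the actual MBH network): regressing a pure-noise “temperature” on 70 pure-noise “proxies” yields an impressive-looking calibration fit from overfitting alone.

    set.seed(6)
    temp  <- rnorm(96)                        # pseudo "temperature": pure noise
    noise <- matrix(rnorm(96 * 70), 96, 70)   # 70 pseudoproxies with no signal at all
    summary(lm(temp ~ noise))$r.squared       # roughly 0.7, from overfitting alone

A calibration R2 of about 0.7 with zero signal is exactly the sort of result that should make one suspicious of fit statistics from this kind of regression.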

In my opinion, if the tree ring series truly contain a “signal”, a much more “natural” approach is to calculate an average – an alternative that is seldom considered by academics in this field.
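For concreteness, the averaging alternative in a minimal R sketch (the data objects are simulated stand-ins, and rescaling the composite against the instrumental record is only one of several possible calibrations):

    set.seed(1)
    n_yr    <- 581                                   # 1400-1980
    proxies <- matrix(rnorm(n_yr * 70), n_yr, 70)    # stand-in for the 70 chronologies
    temp    <- rnorm(96)                             # stand-in instrumental record
    calib   <- (n_yr - 95):n_yr                      # last 96 years as calibration period

    composite <- rowMeans(scale(proxies))            # equal-weight mean of standardized series
    fit   <- lm(temp ~ composite[calib])             # rescale composite to temperature units
    recon <- coef(fit)[1] + coef(fit)[2] * composite # "reconstruction" back to 1400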

Reducing the number of proxy series in the X matrix makes the OLS regression problem less severe. Smith characterizes the OLS problem as one of overfitting and says that a “standard method for dealing with this problem” is to transform the data into principal components. Smith then turns to the problem of how many principal components to retain.
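As I read Smith’s description, the procedure amounts to something like the following R sketch (again with simulated stand-ins; I have not attempted to reproduce his likelihood or autocorrelation adjustments):

    set.seed(2)
    n_yr    <- 581
    proxies <- matrix(rnorm(n_yr * 70), n_yr, 70)    # stand-in proxy matrix
    calib   <- (n_yr - 95):n_yr
    temp    <- rnorm(length(calib))                  # stand-in instrumental record

    pcs <- prcomp(proxies, scale. = TRUE)$x          # principal component series
    aic <- sapply(1:10, function(K)                  # regress on first K PCs, varying K
      AIC(lm(temp ~ pcs[calib, 1:K, drop = FALSE])))
    which.min(aic)                                   # the K minimizing AIC (Smith reported K = 8)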

I don’t think that one can assume that principal components applied to the NOAMER tree ring network will automatically lead to good results.

Preisendorfer, a leading authority on principal components who was cited in MBH98, provided the following advice in his text – advice quoted at CA here:

The null hypothesis of a dominant variance selection rule [such as Rule N] says that Z is generated by a random process of some specified form, for example a random process that generates equal eigenvalues of the associated scatter [covariance] matrix S…

One may only view the rejection of a null hypothesis as an attention getter, a ringing bell, that says: you may have a non-random process generating your data set Z. The rejection is a signal to look deeper, to test further. One looks deeper, for example, by drawing on one’s knowledge and experience of how the map of e[i] looks under known real-life synoptic situations or through exhaustive case studies of e[i]’s appearance under carefully controlled artificial data set experiments. There is no royal road to the successful interpretation of selected eigenmaps e[i] or principal time series a[j] for physical meaning or for clues to the type of physical process underlying the data set Z. The learning process of interpreting [eigenvectors] e[i] and principal components a[j] is not unlike that of the intern doctor who eventually learns to diagnose a disease from the appearance of the vital signs of his patient. Rule N in this sense is, for example, analogous to the blood pressure reading in medicine. The doctor, observing a significantly high blood pressure, would be remiss if he stops his diagnosis at this point of his patient’s examination. … (p. 269)

A ringing bell.
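For readers unfamiliar with Rule N, here is a back-of-envelope Monte Carlo version in R (my sketch of the idea, not Preisendorfer’s own recipe): compare the observed eigenvalues, rank by rank, against eigenvalues from same-sized white-noise matrices. On Preisendorfer’s own account, an eigenvalue exceeding the benchmark is only the bell, not the diagnosis.

    set.seed(3)
    X    <- matrix(rnorm(581 * 70), 581, 70)         # stand-in for the proxy matrix
    obs  <- prcomp(X, scale. = TRUE)$sdev^2          # observed eigenvalues
    sim  <- replicate(500,                           # eigenvalues of white-noise matrices
      prcomp(matrix(rnorm(581 * 70), 581, 70), scale. = TRUE)$sdev^2)
    crit <- apply(sim, 1, quantile, probs = 0.95)    # 95% noise benchmark, rank by rank
    sum(obs > crit)                                  # how many PCs ring the bell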

Applying Preisendorfer’s advice, the next scientific task is to determine whether Graybill bristlecone chronologies truly have a unique ability to measure world temperatures and, if so, why – a step urged on the field in MM2005b.

Instead of grasping this nettle – one that has been outstanding for a long time – Smith, like Mann and Wahl and Ammann before him, purported to argue that inclusion of bristlecones could be mandated “statistically” without the need to examine whether the proxies had any merit.

Smith’s approach was a little different from the similar arguments of Mann and of Wahl and Ammann. Smith did a series of such regressions varying K, calculating the Akaike Information Criterion and other similar criteria for each regression, ultimately recommending 8 PCs – still giving a HS, though one not as bent as the original. Smith begged off consideration of the bristlecones as follows:

I have confined this discussion to statistical aspects of the reconstruction, not touching on the question of selecting trees for the proxy series (extensively discussed by M&M, Wegman, Scott, and Said and Ammann/Wahl) nor the apparent recent “divergence” of the relationship between tree ring reconstructions and measured temperatures (see, e.g., NRC 2006, pp. 48–52). I regard these as part of the wider scientific debate about dendroclimatology but not strictly part of the statistical discussion, though it would be possible to apply the same methods as have been given here to examine the sensitivity of the analysis to different constructions of the proxy series or to different specifications of the starting and ending points of the analysis.

I strongly disagree with Smith’s acquiescence in failing to grasp the nettle of the Graybill chronologies. The non-robustness of results to the presence/absence of bristlecones should have been clearly reported and discussed.

Ron Broberg commented on Smith (2010) here. Broberg referred to a number of my posts on principal components and commented acidly on my failure to propose a “good rule for the retention of PCs”:

I’m listing some of Steve McIntyre’s posts on the number of PCs to retain. If, after reading these, you still don’t know what McIntyre believes to be a good rule for the retention of PCs, then at least I know I’m not alone. If I have missed something, please let me know.

While Broberg may be frustrated, our original approach to the problem was one of auditing and verification, i.e., beginning with an examination of the MBH policy for retention of principal components. We tried strenuously to figure out what Mann had done and were unable to do so. Mann’s criteria for PC retention remain unexplained and unknown to this day. Broberg may be frustrated, but I can assure readers that I am far more frustrated that this important step in MBH remains unexplained to this day.

In the case at hand, until one resolves whether Graybill bristlecone chronologies are a valid temperature proxy, I do not see the point of trying to opine on the “right” number of retained principal components. It seems to me that Smith begged the question with his initial statement that the 70 series in the NOAMER network were “reconstructed temperatures”. Maybe they are, maybe they aren’t. Surely that needs to be demonstrated scientifically, rather than assumed.

47 Comments

  1. Posted Jun 9, 2011 at 12:52 PM | Permalink

    Absolutely maddening!

    Quick response: While it’s clear that such basic issues haven’t been considered or addressed, perhaps it would be worthwhile to use Smith’s methods to show the resulting graphs with and without bristlecone/Gaspé cedar. Use his methods to demonstrate how dependent the resulting curve is on inclusion of those specific “proxies”.

    Bruce

  2. Steven Mosher
    Posted Jun 9, 2011 at 1:24 PM | Permalink

    The new Smith paper has at least this:

    “In summary, recent papers published in this field have highlighted the sensitivity of paleoclimatic data reconstructions to choices in both data selection and statistical methodology. The fact that statistical methods that seem at first sight extremely logical, applied to well-documented datasets, can produce results totally at odds with previous literature, is an important warning against the automated use of statistical methods without consideration of the data to which they are being applied.”

  3. John T
    Posted Jun 9, 2011 at 1:42 PM | Permalink

    The best post-processing in the world can’t make dirty data clean.
    The best statistics in the world can’t make bad data good.

    If your analysis found either one of those statements to be false, you did something wrong.

    The hardest part of research is designing the experiment so that you get good, clean data. If you can do that, the rest is easy.

  4. Posted Jun 9, 2011 at 1:59 PM | Permalink

    Mann’s criteria for PC retention remain unexplained and unknown to this day. Broberg may be frustrated, but I can assure readers that I am far more frustrated that this important step in MBH remains unexplained to this day.

    Indeed. You keep coming back to the same old points. Thanks.

    Sad to see that ‘a prominent statistician’ is joining the expanded team – but not at all surprising. The way they’re going they’ll need to bring the whole of statistics over to keep this show on the road. But it all makes for a useful test of intellectual integrity, as one considers which statisticians to listen to in the future.

  5. Gary
    Posted Jun 9, 2011 at 2:01 PM | Permalink

    None of these people are biologists. Any conclusions they draw are invalid until they adequately address the biological questions involved with using data from biological specimens. Get some plant physiologists and morphologists to work on the strip-bark growth and divergence problems before doing any more statistical speculation.

    • Steve McIntyre
      Posted Jun 9, 2011 at 2:10 PM | Permalink

      Actually, we talked to one of the leading world experts on strip bark in 2004, who, by coincidence, is at the University of Guelph and has actually studied the botany of strip bark formation in long-lived cedars. He said that cedars liked cool, damp conditions, making the Gaspé series that much more improbable.

    • Posted Jun 20, 2011 at 1:52 PM | Permalink

      Although “strip barkedness” is *not* an important issue in dendro science as a whole (because such trees are a small part of the total dendro tree sample), I much agree with your overall point. You can only do so much with statistical improvements before you must revert to addressing fundamental subject matter issues that remain unresolved, and which may well be more limiting to the overall advancement of the science.

      And there certainly are some, no question. But there *are* research groups or individuals working on (at least some of) them. However, rather than the causes of strip-bark, the critical questions relate to other biological issues. These include, at a minimum, the following:

      1. The relative (and interacting) roles of various climatic limiting factors on ring parameters, and their relevant time scales, including all of the various topics related to the divergence phenomenon.
      2. The effect of carbon dioxide fertilization on tree growth, their interactions with climatic factors, and the effect of these on climate calibrations and reconstructions.
      3. The effect of tree size/age on ring parameters, optimal methods for series detrending, and the directly related issues of optimal field sampling methodology (a big, relatively unaddressed problem!).
      4. The effectiveness of a multivariate response variable approach.
      5. Interactions among the topics above.

  6. Craig Loehle
    Posted Jun 9, 2011 at 2:18 PM | Permalink

    It is hilarious to me that someone attempts to determine the “right” number of PCs to retain using screwed-up data. This also begs the question of whether principal components analysis is even the right approach when there are so many reasons for spurious correlations between the growth of subsets of trees and recent climate.

    • srp
      Posted Jun 9, 2011 at 8:46 PM | Permalink

      It’s a bit like having a general rule for which root of a quadratic equation should be taken as the answer.

  7. Pat Frank
    Posted Jun 9, 2011 at 2:24 PM | Permalink

    Steve, given your discussion, if Prof. Smith had grasped the nettle of “proving the validity of these indicators as temperature proxies“, his analysis would have had no point except as a sterile statistical exercise, at best, and he’d have had no paper.

    As someone thoroughly familiar with written academese, and with the equivocation sometimes indulged in when knowledge is lacking or when facts prove a little uncomfortable to the analysis, I find Prof. Smith’s comment that, “I have confined this discussion to statistical aspects of the reconstruction, not touching on the question of selecting trees for the proxy series (extensively discussed by M&M, Wegman, Scott, and Said and Ammann/Wahl) nor the apparent recent “divergence” of the relationship between tree ring reconstructions and measured temperatures (see, e.g., NRC 2006, pp. 48–52). I regard these as part of the wider scientific debate about dendroclimatology but not strictly part of the statistical discussion” a clear example of this.

    The statement is a give-away that he knows there’s a basic problem of validating a physical correspondence of tree rings and temperature, and this is his way of allowing his analysis to go forward into publication without dealing with the underlying problem. If there are protests, he can now say he acknowledged the problem.

    Anyone who reads his paper and is familiar with academic waffling, coming to that statement, will immediately understand that the published analysis really is a sterile statistical exercise and is scientifically worthless, and that this is implicitly admitted by the author.

    Scientists interested in advancing the field would probably stop reading right there, unless they were also interested in learning the statistical techniques for their own sake.

    • Posted Jun 10, 2011 at 2:26 AM | Permalink

      The statement is a give-away that he knows there’s a basic problem of validating a physical correspondence of tree rings and temperature, and this is his way of allowing his analysis to go forward into publication without dealing with the underlying problem. If there are protests, he can now say he acknowledged the problem.

      Anyone who reads his paper and is familiar with academic waffling, coming to that statement, will immediately understand that the published analysis really is a sterile statistical exercise and is scientifically worthless, and that this is implicitly admitted by the author.

      But Smith still gets published and it’s the number of published articles, is it not, that is the key metric in the evaluation of scientists, leading to such little matters as whether they get the next grant or go out of business?

      Isn’t that just a little bit unhelpful?

      Cameron Neylon, the biophysicist and open science guru, was telling me on Wednesday about one of his current projects, on evaluation – how the evaluation of scientists has to be transformed to take account of the greater goods of openness. At the moment, as he explained on the second panel of the Royal Society do at the Festival Hall, if it’s a choice between writing up your work in three months to get published in Nature (the editor was right there on the panel) or cleaning up your code, your data and your metadata to make them useful to the wider community, there is no incentive at all to do the latter.

      I wondered whose smart money was going into such an important topic. Some of the more right-wing CA readers will be delighted to learn that it’s George Soros’ Open Society Institute that’s funding Neylon on this. Well, good on them.

  8. Posted Jun 9, 2011 at 2:33 PM | Permalink

    there is still substantial evidence that recent decades are among the warmest of the past 600 years.

    I certainly hope so!

    Still a far cry from saying modern temps are unprecedented for thousands of years. Essentially this new and improved hockey stick tells us what we already knew, that it was colder during the LIA.

  9. oneuniverse
    Posted Jun 9, 2011 at 4:51 PM | Permalink

    Richard Smith says that he has “confined his discussion to statistical aspects of the reconstruction”. He chooses not to discuss the (un)suitability of the tree-rings as proxies for temperature. So how does he conclude that “The results support an overall conclusion that the temperatures in recent decades have been higher than at any previous time since 1400” ?

  10. Frank
    Posted Jun 9, 2011 at 6:29 PM | Permalink

    You wrote: “In the case at hand, until one resolves whether Graybill bristlecone chronologies are a valid temperature proxy, I do not see the point of trying to opine on the “right” number of retained principal components. It seems to me that Smith begged the question with his initial statement that the 70 series in the NOAMER network were “reconstructed temperatures”. Maybe they are, maybe they aren’t. Surely that needs to be demonstrated scientifically, rather than assumed.”

    Isn’t this what calibration and validation periods are supposed to do?

  11. Mambo
    Posted Jun 9, 2011 at 6:35 PM | Permalink

    Craig Loehle: “It is hilarious to me that someone attempts to determine the “right” number of PCs to retain”

    I agree. My understanding of PCA is that the different extracted components are orthogonal (if necessary after appropriate rotation), therefore there really should be no more than a single component representing temperature (and this may not necessarily be the first one).

    Could someone explain why so many components are used for making the Hockey Stick?

    • Posted Jun 20, 2011 at 3:25 PM | Permalink

      The axes are indeed orthogonal but that doesn’t mean the causative factors that drive the variation across those axes are necessarily related to one and only one axis.

  12. EdeF
    Posted Jun 9, 2011 at 9:08 PM | Permalink

    I plotted White Mtn bristlecone pine ring width and density vs temperature at Bishop, CA, the closest site (with altitude compensation thrown in), and couldn’t find anything that looked “linear”. I thought the definition of linear was “line-like”, “having the shape of a straight line”. Couldn’t find anything close. The plots looked like someone had shot up the side of a barn with a shotgun. I know that you can put the data into a regression program that will give you an equation, but, really, how good is it if the original data doesn’t look to the naked eye at least a bit linear?

  13. Geoff Sherrington
    Posted Jun 10, 2011 at 1:25 AM | Permalink

    Pat Frank – “Scientists interested in advancing the field would probably stop reading right there, unless they were also interested in learning the statistical techniques for their own sake.”

    The other course would be for interested scientists to do more and more calibrations of tree properties and lake sediment properties with plausible instrumental temperatures and observations of other variables.
    Tasmania’s huon pine, Lagarostrobos franklinii, has several properties that could make it a good study candidate. Unfortunately, it is distant from historical temperature recording stations that can give a long and accurate correlation. Yet, I suppose that its results to date are still included in global reconstructions because there are so few data points in the Southern hemisphere. It could even appear as that red dot on Steve’s reproduced graph in “Richard Smith on PC Retention” posted later than this post.

    Steve – It’s Ed Cook’s Tasmania huon pine series – which is regularly used in Team temperature reconstructions.

  14. Geoff Sherrington
    Posted Jun 10, 2011 at 1:27 AM | Permalink

    Correction to my above – Steve’s graph in “McShane and Wyner Weights”, not “Richard Smith on PC Retention”.

  15. UC
    Posted Jun 11, 2011 at 1:50 PM | Permalink

    While Smith says that this is a “natural” way to look at the data, I don’t think that OLS regression of cause against a very large number series is “natural” at all.

    “natural”, as in “natural calibration”? i.e. it is guaranteed that the unknown past temperature is distributed as the observed temperature?

  16. Hu McCulloch
    Posted Jun 12, 2011 at 10:19 PM | Permalink

    While of course if the data is Garbage In, any results are Garbage Out, I think it’s legitimate of Smith, as a statistician, to do a purely statistical analysis of this data and in particular to address the issue of how many PC’s to include.

    Unfortunately, Smith relies entirely on “inverse calibration”, regressing temperature on the proxies as if the proxies were the cause of temperature, instead of “classical calibration,” regressing proxies on temperature and then inverting. Smith’s approach gives an inconsistent estimate of the transfer function, an excessively attenuated reconstruction, and inadequate confidence intervals. For the sake of argument, however, I’ll just take his approach as given.

    Although Smith provides an indirect preliminary statistical test for the significance of the PC’s as temperature indicators “without allowing for autocorrelation”, when he does get around to adjusting for autocorrelation, he does not provide even this indirect statistical evidence.

    The indirect preliminary test he provides can be inferred from the AIC statistics in his Table 1. Since AIC is (usually) minus the log likelihood plus two times the number of parameters, a decline in AIC indicates that the likelihood ratio statistic (twice the change in log likelihood) for the newly included variable exceeds 4, and therefore the chi-square(1) 5% critical value of 3.84. Since AIC generally declines from K = 1 to K = 8, most of the tests for individual significance would come out rejecting insignificance at the 5% level. It would have been better if he had started from K = 0 and had provided a joint test of the hypothesis that all 8 are collectively significant, but from the AIC values it looks like that would have been the case. (Some authors divide AIC by the sample size, which alters the way the LR statistic could be inferred from it. Smith should have made it clear how he was computing it.)

    However, when he goes on to acknowledge that there is important autocorrelation present, he does not repeat Table 1 using the autocorrelation-adjusted likelihoods. Therefore there is not even an indirect test of collective or individual significance that can be constructed from his paper.

    If one is to try correlating PCs with temperature, one should first confirm how many arise above white noise, using something like Preisendorfer’s Rule N, or at least an eyeball scree criterion. Unfortunately, Smith does not do this.

    Also, if this data does contain a hemispheric temperature signal, it is not clear why this would show up in more than one of the PCs. Smith does not address this issue.

    Steve has often pointed out that PCA and regression analysis can place either sign on a series, so that a “temperature reconstruction” may end up using a series with the sign opposite the theory that led one to use it to indicate temperature. I would like to suggest that one way this could be controlled for would be by splitting each PC into two components, according to the signs it attaches to the (presumably positive) temperature indicators. The total variance assigned to the PC could then be split between the two components according to their total weights, and the signed PCs then sorted again by their assigned variance. As they enter the regression in order of importance, the sign of their regression coefficient can be checked and taken into account appropriately. If only variables with the “right” sign are admitted, the final reconstruction will only use the “right” signs on the underlying series, and no apparent explanatory power will be coming from “wrong” signs.
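    In code, the splitting step might look roughly like this (a toy sketch of the idea; the variance-allocation rule shown is only one possibility):

      set.seed(4)
      X   <- scale(matrix(rnorm(200 * 10), 200, 10))  # toy proxy matrix
      v   <- prcomp(X)$rotation[, 1]                  # loadings of PC1
      pos <- pmax(v, 0); neg <- pmin(v, 0)            # split the loadings by sign
      comp_pos <- X %*% pos                           # sub-component from "right"-signed series
      comp_neg <- X %*% neg                           # sub-component from "wrong"-signed series
      w <- c(sum(pos^2), sum(neg^2)); w / sum(w)      # share of PC1 weight on each sign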

    Smith does mention in his last paragraph that there is some (unspecified) controversy (associated with M&M and Wegman) over the selection of series. This important disclaimer should have been put up front in the introductory “Brief Summary of the Controversy”, and not tacked on at the very end.

    Smith is also misleading on the period that is relevant for the HS controversy. Although he repeatedly refers to MBH99 and in his second paragraph phrases the problem as “what is the probability that the 1990s were the warmest decade of the [1000–2000] millennium?” (his brackets), the data he gives go only back to 1400, as in MBH98. Everyone grants that 1400–1900 was a fairly cool period, downright cold in the LIA portion, so it is not controversial to find that the 20th century is warmer than this period. The big issue is whether the MWP existed and, if it did, whether it was comparable to (or even warmer than?) the recent period.

    • Posted Jun 13, 2011 at 11:18 AM | Permalink

      The dataset is located at Nychka’s website here and is a mirror image of the MBH tree ring network that we archived in connection with MM2005a. 20 of these are Graybill strip bark chronologies – the ones that were left out in the CENSORED directory.

      Steve — Which 20 of Mann/Nychka’s 70 are the stripbark chronologies that should be excluded?

      The link in the left margin to “Steve’s Public Data archive”, at 38.114.169.124/data, is no longer active. Perhaps the URL got changed during the 2009 changeover?

      Also, M&M2005E&E says that an SI is available at http://www.climate2003.com, but that domain comes up as “for sale”.

      Nychka’s file only goes back to 1400AD, but Mann’s PC1 goes back to 1000AD. Is there a file that conveniently carries them back to their beginning, or at least to 1000?

      Steve: most things from the old website are at http://www.climateaudit.info/data/climate2003/

      The AD1000 network is different from the AD1400 network. See http://www.climateaudit.info/data/mbh99/ for AD1000 info.

      The censored Graybill chronologies were:

      # [1] "az510" "ca528" "ca529" "ca530" "ca533"
      # [6] "ca534" "co511" "co522" "co523" "co524"
      #[11] "co525" "co535" "co545" "co547" "nv510"
      #[16] "nv511" "nv512" "nv513" "nv514" "nv516"

    • UC
      Posted Jun 13, 2011 at 3:02 PM | Permalink

      Hu:

      Unfortunately, Smith relies entirely on “inverse calibration”, regressing temperature on the proxies as if the proxies were the cause of temperature, instead of “classical calibration,” regressing proxies on temperature and then inverting. Smith’s approach gives an inconsistent estimate of the transfer function, excessively attenuated reconstruction, and inadequate confidence intervals.

      I think in some recent paper it was said that

      It does not seem to be generally realized that the fitting should be done in terms of the deviations which actually represent ‘error’.

      … unless the training sample (observed temperatures) and the past temperatures together form a random sample from a population. And if the past is like the present, there cannot be anything unprecedented in the present temperatures. (Wait a second – if one doesn’t use proxies after 1980, he can then claim that the training sample is like the past temperature, but unprecedentedness came after 1980?)

      • Hu McCulloch
        Posted Jun 14, 2011 at 10:26 AM | Permalink

        Good point, UC — If ICE (“Inverse Calibration Estimation,” ie regressing temperature on proxies) is used with a calibration period ending in 1980, the reconstruction will tend to look like pre-1980 temperatures, since the implicit Bayesian prior that yields ICE is that the past is a draw from the same distribution that produced the calibration sample. High post-1980 temperatures that are unprecedented in the calibration period will therefore tend to become unprecedented in the entire reconstruction as well.

        “Tend” is an important word, however, since it is not impossible, with a sufficiently extreme value of the proxy, for the reconstructed temperatures to end up outside or even well outside the calibration period. It’s just a lot harder for this to happen than it would be with consistent CCE (“Classical Calibration Estimation”, ie regressing proxies on temperatures and then inverting). Likewise, ICE CIs will tend to exclude a lot of possibilities that are consistent with CCE.

        Tierney et al (authors’ reply to comment by Willis and me on their Nature Geoscience 5/10/2010 article) defend ICE, citing Tellinghuisen, Fresenius J Anal. Chem (2000) 368: 585-588. Tellinghuisen argues that ICE has a smaller RMSE in small samples. This could well be true, since CCE has an undefined population mean and infinite second moment, while a completely uninformative estimator (such as all past temperature = 10) can have a finite RMSE relative to every conceivable actual past temperature. A better criterion is the size distortion of confidence intervals, which often is very bad for ICE.

        Unfortunately, I haven’t yet figured out how to interpret simulated size distortions for the Hunter-Lamboy CCE confidence intervals. CCE can be motivated with an uninformative diffuse prior on past temperatures. There should be no expected distortion when the true parameters are drawn from the prior in question, but when this is an improper distribution I get lost.

        (Fieller Classical intervals unfortunately put some of the probability out in limbo, which isn’t good, either.)
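        A toy R illustration of the attenuation point, under an assumed linear proxy model (simulated data only):

          set.seed(5)
          T_cal <- rnorm(100)                          # calibration temperatures
          P_cal <- T_cal + rnorm(100)                  # proxy = temperature + noise (b = 1)

          ice <- lm(T_cal ~ P_cal)                     # inverse calibration (ICE)
          cce <- lm(P_cal ~ T_cal)                     # classical calibration (CCE)

          P_new <- 4                                   # an "extreme" proxy value
          predict(ice, data.frame(P_cal = P_new))      # ICE: pulled toward calibration mean (~2)
          (P_new - coef(cce)[1]) / coef(cce)[2]        # CCE: invert the transfer function (~4)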

        • UC
          Posted Jun 14, 2011 at 2:17 PM | Permalink

          Good point, UC — If ICE (“Inverse Calibration Estimation,” ie regressing temperature on proxies) is used with a calibration period ending in 1980, the reconstruction will tend to look like pre-1980 temperatures, since the implicit Bayesian prior that yields ICE is that the past is a draw from the same distribution that produced the calibration sample. High post-1980 temperatures that are unprecedented in the calibration period will therefore tend to become unprecedented in the entire reconstruction as well.

          Yes, and smoothing methods ( http://www.youtube.com/watch?v=pCgziFNo5Gk etc ) can be used to boost the unprecedented part even more.

          Tierney et al (authors’ reply to comment by Willis and me on their Nature Geoscience 5/10/2010 article) defend ICE, citing Tellinghuisen, Fresenius J Anal. Chem (2000) 368: 585-588. Tellinghuisen argues that ICE has a smaller RMSE in small samples.

          Interesting, and the NAS panel used the argument that we cannot control temperatures, so ‘controlled calibration’ (CCE) is inappropriate:

          The temperature reconstruction problem does not fit into this framework because both temperature and proxy values are not controlled.

          I’d look at this another way: if we cannot guarantee that the unknown values are like the calibration sample (the reason can be, for example, that the calibration points were fixed – controlled – when designing the experiment), then we need to take CCE as a starting point. If we have an equation

          T=b*P+n, temperature is scale times proxy plus noise

          and the scale b just happens to be zero, we still get proper CIs, as the noise process n is assumed to be known. That’s how the prior kicks in with ICE. But if we have

          P=b*T+n, proxy is scale times temperature plus noise

          with pure ‘noise-proxy’, b=0, there is no way to predict temperature. Cannot divide by zero. No information about temperature at, say year 1000, before we take a look at the real proxies. Sounds better to me, as the reason to make these reconstructions is to see whether the past temperatures are like the present temperatures or not – using ICE would lead to circular reasoning.

        • UC
          Posted Jun 22, 2011 at 3:49 PM | Permalink

          Good point, UC — If ICE (“Inverse Calibration Estimation,” ie regressing temperature on proxies) is used with a calibration period ending in 1980, the reconstruction will tend to look like pre-1980 temperatures, since the implicit Bayesian prior that yields ICE is that the past is a draw from the same distribution that produced the calibration sample. High post-1980 temperatures that are unprecedented in the calibration period will therefore tend to become unprecedented in the entire reconstruction as well.

          As Mann08 is topical again these figures might be of interest (found from my Mann08 folder)

          [image: a1]

          [image: a2]

          Can’t remember exactly where these were originally posted, but the conclusion was that regEM is more or less ICE. Kemp11 brings the SSA smoothing back ( see https://climateaudit.org/2009/07/08/rahmstorf-et-al-reject-ipcc-procedure/ , I think the CI problem is still there), so my comment

          Yes, and smoothing methods ( http://www.youtube.com/watch?v=pCgziFNo5Gk etc ) can be used to boost the unprecedented part even more.

          might apply to this new ‘case’ as well:

          [image: a3]

        • UC
          Posted Jun 23, 2011 at 2:06 AM | Permalink

          “Tend” is an important word, however, since it is not impossible, with a sufficiently extreme value of the proxy, for the reconstructed temperatures to end up outside or even well outside the calibration period. It’s just a lot harder for this to happen than it would be with consistent CCE (“Classical Calibration Estimation”, ie regressing proxies on temperatures and then inverting).

          Then you need proxies that really are hockey-stick shaped, with the calibration-period mean unprecedented. Like the Tiljanders. Why does Mann08 Fig. S8b start at year 500?

        • UC
          Posted Jun 23, 2011 at 3:17 PM | Permalink

          Why does Mann08 Fig. S8b start at year 500?

          Hmm, removing the first 3 proxies seems to make a big difference

          [image: nt1]

        • UC
          Posted Jun 28, 2011 at 10:52 AM | Permalink

          Why does Mann08 Fig. S8b start at year 500?

          I forgot that Mann08 SI says

          As a further safeguard against potentially nonrobust results, a minimum of seven predictors in a given hemisphere was required in implementing the EIV procedure.

          … and if I remove the 4 Tiljanders there are only 6 predictors remaining for this step.

        • Posted Jun 28, 2011 at 11:09 AM | Permalink

          UC (Jun 28, 2011 at 10:52 AM) —

          Also see Mann09 SI Fig. S8, which shows both no-dendro/yes-Tilj (blue line) and no-dendro/no-Tilj (green line) EIV reconstructions.

          The green line is dashed prior to 1500 AD, signifying that this recon failed Mann09’s validation test at the 95% level. Unfortunately, the trace is so faint that it is hard to visualize.

          In the third graphic at this recent post, I overwrote the green dashes with a solid green line. This makes it easy to contrast no-dendro/yes-Tilj and no-dendro/no-Tilj — the most informative comparison, in my opinion.

        • UC
          Posted Jun 28, 2011 at 4:23 PM | Permalink

          Interesting – I haven’t read Mann09 yet (it takes about 5 years to replicate one Mann paper, and there are still some small issues with MBH98). Mann09 S8 also starts at AD 500 – does the same robust-7-threshold rule apply?

        • Posted Jun 28, 2011 at 5:39 PM | Permalink

          Yes, the Mann09 recons start at 500 AD.

          I don’t know about the 7-threshold rule. The approach of Mann09 builds on Mann08, but does not appear to be identical. Mann09’s summary of their methods is

          We employ the global proxy data set used by [Mann08] comprising more than a thousand tree-ring, ice core, coral, sediment, and other assorted proxy records spanning the ocean and land regions of both hemispheres over the past 1500 years. The surface temperature field is reconstructed by calibrating the proxy network against the spatial information contained within the instrumental annual mean surface temperature field [Brohan et al, JGRA, 2006] over a modern period of overlap between proxy and instrumental data (1850 to 1995) using the RegEM CFR procedure [Mann et al, JGRA, 2007] with additional minor modifications.

          Links to Mann09 text and SI at the post linked in my prior comment.

        • UC
          Posted Jul 5, 2011 at 7:33 AM | Permalink

          I don’t know about the 7-threshold rule.

          I’m trying to find a rational explanation. Hopefully it is not

          10 minus 4 Tiljanders is less than 7

        • Posted Jul 5, 2011 at 10:41 AM | Permalink

          Oh.

          This might be relevant, then: Tiljander’s thicknessmm, lightsum, and darksum passed Mann08’s validation test. A statistical metric (R or R^2 IIRC) had to be above a certain threshold. XRD did not pass.

          For some analyses, “all four” proxies appear to be in the count, while for others, the count of proxies that have passed validation is used. That would be these three Tiljanders.

          So maybe “10 minus 3 Tiljanders is 7” is what you are looking for?

  17. Posted Jun 13, 2011 at 8:00 AM | Permalink

    I’ve read some comments by climate scientists at blogs recently which have led me to believe that they really aren’t getting how spatial autocorrelation of similar series is preferentially selected by PCA. This is especially true when a subset of highly correlated data is oversampled heavily. Bristlecones, with their prominent blade, are well correlated (see the dendrograms by Eschenbach). It has taken a long time to come to grips with what is going on in the field, but last year’s conference (in Scotland?) was full of PCA papers with abstracts discussing patterns in temperature data of one sort or another. In proxy papers where the networks have 70 trees and NO effort is made to weight for area of coverage (I haven’t read Smith yet, but the practice is standard in paleo data-mashes), the result is far more likely to be garbage than not.

    In the Antarctic, the peninsula, with its heavy trend (blade), was heavily oversampled with numerous surface-station data points. It was far more analogous to the bristlecone mess than some may realize.

  18. Posted Jun 13, 2011 at 8:01 AM | Permalink

    Oh, and ditto on the averaging vs regression comment in the head post.

  19. Matt Skaggs
    Posted Jun 14, 2011 at 9:07 AM | Permalink

    It seems to me that the single input parameter of temperature can, theoretically at least, produce more than one PC. That is because temperature can correlate to length of growing season, nutrient uptake, and insolation, all of which can vary across the sampled population based upon local weather. Now of course if you are unsure about temperature correlation to PC1, the rest is really in the weeds. It has been a long time since I delved into dendro, but I recall that bristlecones do not necessarily produce a single growth ring at drier sites such as the White Mountains. A dry early season followed by an intense summer rain can produce a second set of rings in a single season. In general, if the site gets enough precipitation for adequate tree growth, bristlecone is replaced by fir and spruce, another reason why a wise dendro treads cautiously in the thin air of the American southwest.

    • Posted Jun 20, 2011 at 3:45 PM | Permalink

      Yes, I agree with your first point and in fact just made the same point in response to someone else’s comment above.

      Your last point, though, is general – this same thing (replacement by other species under a more favorable climate regime) is common to all treeline conditions, but it is not just an issue of precipitation amount. There are definite temperature acclimations/adaptations/differentiations involved between treeline and subalpine forest trees. There is no question about that. Nevertheless, there does need to be more attention paid to the effects of soil moisture on growth response in sites that in the past (or sometimes, assumedly in the present as well) were limited primarily by growing season temperature.

  20. Hu McCulloch
    Posted Jun 15, 2011 at 7:35 AM | Permalink

    A further elementary factor that Smith neglects, but which should be taken into account in any study of how many (if any) tree-ring PCs to include in a temperature reconstruction, is CO2. The Law Dome record goes back to 1010 AD and up to 1975 and is available from CDIAC, so it can be spliced onto Mauna Loa, which begins in 1958. MBH99 themselves acknowledge that there could be a CO2 fertilization effect, even though their “correction” for it was in truth simply a bodge to adjust the shape of the HS shaft that made no actual numerical use of the CO2 data. Since CO2 correlates with instrumental temperatures, there is a good chance that including it will kill any apparent significance of the TR data (even with stripbarks), but this should be checked.
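    The check is easy to set up; a minimal R sketch with simulated stand-ins (not the archived series):

      set.seed(7)
      co2  <- 280 + cumsum(runif(96))                 # stand-in for a spliced Law Dome/Mauna Loa record
      pcs  <- matrix(rnorm(96 * 8), 96, 8)            # stand-in for 8 retained tree-ring PCs
      temp <- 0.01 * co2 + rnorm(96, sd = 0.2)        # "temperature" driven here by CO2 alone

      anova(lm(temp ~ co2), lm(temp ~ co2 + pcs))     # do the PCs add anything once CO2 is in?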

    To Smith’s credit, he does actually reference M&M 2003, 2005GRL and 2005EE. An earlier paper by Li, Nychka and Ammann, which I discussed at https://climateaudit.org/2008/04/07/more-on-li-nychka-and-ammann/ , acknowledges that “several specific objections” to MBH98 had been raised, but then does not actually cite the papers in question – which of course are the M&M papers. Instead, it cites 6 papers which supposedly meet the objections raised by the phantom papers. At least statisticians still have citation standards!

    • Posted Jun 20, 2011 at 7:10 PM | Permalink

      “Since CO2 correlates with instrumental temperatures, there is a good chance that including it will kill any apparent significance of the TR data (even with stripbarks), but this should be checked.”

      Not nearly that simple, although it certainly is a complicating factor. Hard to follow exactly what you mean, however.

  21. UC
    Posted Jul 5, 2011 at 2:22 PM | Permalink

    AMac

    For some analyses, “all four” proxies appear to be in the count, while for others, the count of proxies that have passed validation is used. That would be these three Tiljanders.

    So maybe “10 minus 3 Tiljanders is 7” is what you are looking for?

    It takes some time to figure out what screening I used. But the figure above ( http://www.climateaudit.info/data/uc/notilj.png ) shows the step 19 result without the following proxies (in red):

    proxy19 1 : tiljander_2003_darksum
    proxy19 2 : tiljander_2003_lightsum
    proxy19 3 : tiljander_2003_thicknessmm
    proxy19 4 : tiljander_2003_xraydenseave

    That is not a real hockey stick, because regEM practically gives the calibration mean as the reconstruction (the reason seems to be that there is no good correlation at all with the target). If you add the Tiljanders, you get a hockey stick (as Hu observed above, even ICE results can go outside the calibration range with some extreme proxy on board). But in my replication result there are 10 proxies in step 19. 10 minus 4 is 6, and this wouldn’t pass the 7-threshold.

    • Posted Jul 5, 2011 at 4:04 PM | Permalink

      I’m afraid I am not much help here.

      If you want to visualize the proxies or retrieve the data as archived by Tiljander, check my blog for the relevant posts.

      Jeff Id of the Air Vent did a bunch of Mann08 reconstruction emulations. Perhaps it would be useful to email him, or check his 2009/10 archives.

      By the way, the post-1985 Tiljander numbers as used by Mann08 weren’t archived by Tiljander — those files only went to 1985. So Mann08 “infilled” the 1986–1995 data with RegEM. I am hazy on this, but as far as I can tell, “infilling” meant “extrapolating”.

      Generating reasonable-looking data to fill in gaps in the reconstruction period doesn’t seem like a good idea. But generating data for use at one end of the calibration period seems much riskier, to my unschooled eye.

      • Steve McIntyre
        Posted Jul 5, 2011 at 5:41 PM | Permalink

        AMac, the most notorious infilling was spotted early – Mann et al deleted the post-1960 portion of the Briffa MXD data and infilled it, thus “hide the decline”. No one in climate science seems to mind.

      • Posted Jul 16, 2011 at 3:46 PM | Permalink

        Catching up a bit: the data set used in Kemp11, glhad_eiv_composite.mat, is almost like glglfulhad_smxx – non-screened proxies with a global temperature target. Here’s where I am now with the replication:

        [image: a1]

        I was confused by the use of non-screened proxies, but Mann08 explains:

        For the EIV approach, which does not require that proxy data represent local temperature variations, results are compared by using several alternative data-selection schemes, including one that employs all available proxy records, another that employs only proxy records contained within the target hemisphere, and another that employs only the proxy data within that hemisphere that pass the temperature-screening analysis mentioned above.

        MAKEPROXY.m ‘screening on raw data (old)’ would result in a reconstruction that is sometimes outside the 2-sigma uncertainties of Kemp11 Fig. 4A, so that must be an oldish, incorrect screening method. The glglhadfulsm20 that is used for the no-Tiljander example in http://www.meteo.psu.edu/~mann/supplements/MultiproxyMeans07/ seems to be a non-screened EIV result as well.