This post got deleted in the server crash. I typically post up from written text, but usually do a final edit as I input onto the blog, so the present note will probably differ a little from what I posted up yesterday, but probably not materially.
Mann did not present anything germane that had not already been presented at realclimate or Ammann-Wahl. As a result, in my opinion, we had fully anticipated and dealt with all his points in our presentation the previous day. Certainly, he said nothing that caused me any concern about the validity of our points. Perhaps the most surprising aspect of his presentation was his response to a question about the verification r2 statistic, in which he denied ever calculating the statistic as that would have been "silly and incorrect reasoning". Given that we’d presented specific evidence that he had calculated this statistic, this seemed to me to be an unpromising line of defence for him.
Mann did not attend any of Thursday’s presentations. As I mentioned before, I did not have an opportunity to meet him as he had “moved on” before the Friday morning session even ended. I have reasonable notes on his presentation, but I am not a great note taker and there are definitely gaps in my notes. [Update: Mann PPT here]
His first slide was a figure showing the location of MBH98 proxies on a world map, based on a similar figure in MBH98. He said that “we have come a long way in 10 years”. As an editorial comment, while people often say this, mostly as a means of distancing themselves from MBH98, I’m a little puzzled as to what exactly the advances are. Hockey Team studies are mostly types of averages of small subsets of proxies chosen without any reported selection criteria; where’s the advance even from Bradley and Jones 1993?
Mann then said that the current emphasis was on reconstructing spatiotemporal patterns and then showed a pretty animation from the Mann et al. interactive presentation, which showed world temperature maps from 1750 to 1980 based on about half a dozen reconstructed EOFs. (SM: 6 EOFs do generate pretty pictures. However, in the early portions of MBH98 and in MBH99, there is only one reconstructed EOF and thus no reconstructed spatiotemporal pattern. Whether the EOFs are stationary is a big question. The pretty pictures from 1750 on have nothing to do with the controversial periods. I barely paid any attention to this animation since I knew that it was irrelevant to any issue in dispute. But some observers thought it was very slick.)
While he presented the animation, Mann said the calculation of the NH average was “scientifically the least interesting”, replicating an ennui about NH mean temperature that we’d previously heard from Alley and Hughes. I’m sure that any policy-maker, or even any civilian, on the panel would have wondered at the disconnect between their pronouncements of NH mean temperature (in, say, press releases) and the present ennui. Nobody on the panel questioned this ennui (nor did they for Alley or Hughes).
Mann then said that it was “specious” to say that IPCC was based on the hockey stick. We were soon getting the full panoply of Mannian vocabulary: “silly”, “incorrect”, “completely wrong” and “not legitimate” all followed, plus at least one more “specious”. I think that he missed “spurious”, but I could be wrong.
Mann then reported that MBH was the first reconstruction to describe “self-consistent errors”, a point that we had made in somewhat different terms: we had described the calculation of confidence intervals in MBH98 as one of its main selling points, contributing to the impression of new levels of statistical accomplishment relative to other multiproxy studies.
Mann went on to say that the “error bars were based on the spectrum of calibration residuals”. My notes show that he then said “that would be completely wrong”, which is a very Mannian turn of phrase, but my notes don’t say what precisely was “completely wrong”. He mentioned that the residuals were “fairly red” and in some cases “significantly red” using a 95% confidence level. This presumably refers to the discussion of confidence intervals in MBH99. I’ve posted on this before, reporting that the calculations are incomprehensible (not just to me, but to a time series specialist who asked to look at the matter, and also, as reported to the NAS panel, to von Storch). MBH99 contains no statistical reference for its confidence interval calculations. I would be surprised if any reviewer of MBH99, either for GRL or for IPCC TAR, understood what he meant, and the matter has accordingly been glossed over so far. Anyway, no one on the panel asked Mann to explain this methodology or provide a reference, which is too bad, because it would be nice to know what he actually did.
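MBH99 gives no reference, so what follows is not their procedure. As a generic textbook illustration of what “red” residuals mean: estimate the lag-1 autocorrelation of the residuals and, under an AR(1) assumption, the approximate effective number of independent values for estimating a mean, n(1 - r1)/(1 + r1). The function names, the synthetic AR(1) residuals and the 79-value length (chosen to match a 1902-1980 calibration period) are illustrative assumptions.

```python
import random


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation: a simple measure of how 'red' a series is."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


def ar1_effective_n(n, r1):
    """Approximate number of independent values among n AR(1) values with lag-1 autocorrelation r1."""
    return n * (1 - r1) / (1 + r1)


def simulate_ar1(n, phi, seed=0):
    """Synthetic AR(1) series x[t] = phi * x[t-1] + white noise (illustrative only)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


# Hypothetical "calibration residuals": 79 synthetic values with phi = 0.6.
resid = simulate_ar1(79, 0.6)
r1 = lag1_autocorrelation(resid)
print(f"lag-1 r = {r1:.2f}, effective n = {ar1_effective_n(79, r1):.1f} of 79")
```

The redder the residuals, the fewer independent values they contain, and the wider any honest interval built from them has to be.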
Mann then turned to an increasingly frequent talking point (one raised on this blog a few days ago by Tas). He said that IPCC WG1 did not conclude that late 20th century temperatures were “very likely” the warmest of the millennium, only “likely”, i.e. 60-70% confidence. The premise of this observation is that, if IPCC elsewhere made claims that exceeded the WG1 confidence, then the scientists of WG1 could not be held responsible for that. I’m not sure that this is correct, but that’s not the issue that interests me. My question is whether WG1 could even claim that Mannian confidence intervals were "likely". For example, MBH99 confidence intervals were calculated using calibration residuals (from an overfitted methodology), while confidence intervals based on verification residuals would presumably be larger (since the verification r2 is directly related to the standard error of the residuals). We had already drawn the panel’s attention specifically to this point, but no one on the panel asked Mann about this.
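The calibration-versus-verification point can be put in numbers. For an in-sample least-squares fit, the root-mean-square residual is sd_y * sqrt(1 - r2); out of sample, any bias or mis-scaling of the reconstruction can only add to the error, so sd_y * sqrt(1 - r2_verification) is a floor on the verification RMS error. A minimal sketch comparing the +/- 2-sigma half-widths implied by a calibration r2 and a verification r2; the values 0.2 deg C, 0.7 and 0.1 are hypothetical, not MBH figures.

```python
import math


def rms_floor(sd_target, r2):
    """Residual RMS of a least-squares fit, sd_target * sqrt(1 - r2).
    Applied to a verification-period r2, this is only a lower bound."""
    return sd_target * math.sqrt(1 - r2)


sd_target = 0.2        # hypothetical standard deviation of the target series, deg C
r2_calibration = 0.7   # hypothetical calibration-period r2
r2_verification = 0.1  # hypothetical verification-period r2

cal_band = 2 * rms_floor(sd_target, r2_calibration)
ver_band = 2 * rms_floor(sd_target, r2_verification)
print(f"+/- {cal_band:.2f} deg C from calibration vs at least +/- {ver_band:.2f} from verification")
```

With a low verification r2, the verification-based band cannot be much narrower than two standard deviations of the target series itself.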
Mann showed the Wiki spaghetti graph. He observed that Esper and Moberg were saying almost opposite things about centennial-scale results. No one on the panel followed up to ask what this inconsistency between Esper and Moberg at centennial scale meant for suggestions that the spaghetti graph demonstrated some sort of broad consistency. Mann pointed out that Oerlemans’ results from glaciers suggested a less cold LIA than some of the other reconstructions in the spaghetti graph (supporting the MBH low-amplitude reconstruction).
Mann said that the Wiki results demonstrated that the HS was not an artifact of tree ring network. Cuffey observed that there was some substantial sharing among networks. Mann replied that Moberg doesn’t share. (This is not strictly true, as there is overlap between Moberg and, for example, Mann and Jones or Crowley.) However, Moberg is relatively independent. (SM: Moberg undeservedly received relatively little criticism. Moberg contains no explicit criteria for proxy selection and is based on a tiny sample of 11 proxies, not all of which are well-chosen. As I’ve posted elsewhere, if you apple-pick a small subset, instead of cherry-pick, you can get a high MWP from the proxy population.)
One of the panellists asked Mann what the reconstruction tolerance was in AD 1000. Mann said that the scatter is 0.4 deg C; that one can “assume” that the results are independent; and thus that the tolerance had to be 0.4/sqrt(df), i.e. smaller than 0.4 deg C for the NH average. Mann was asked: don’t you have to consider error bars in the reconstructions? Mann: only one that has them is Esper (SM: I don’t understand this comment and my note here may be inaccurate). Mann said that the jackknife uncertainty was less than 0.4 deg C.
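The arithmetic behind “0.4/sqrt(df)” rests entirely on the independence assumption. If n reconstructions share a common pairwise correlation rho (for example, because they share proxies, as Cuffey noted), the variance of their mean is (scatter^2 / n) * (1 + (n - 1) * rho), so the effective number of independent reconstructions is n / (1 + (n - 1) * rho). A minimal sketch, treating the 0.4 deg C scatter as a standard deviation; n = 9 and the rho values are chosen purely for illustration.

```python
import math


def se_of_mean(scatter, n, rho=0.0):
    """Standard error of the mean of n estimates with standard deviation `scatter`
    and a common pairwise correlation rho (rho = 0 is the independence assumption)."""
    n_eff = n / (1 + (n - 1) * rho)
    return scatter / math.sqrt(n_eff)


for rho in (0.0, 0.25, 0.5, 1.0):
    print(f"rho = {rho:.2f}: tolerance ~ {se_of_mean(0.4, 9, rho):.3f} deg C")
```

At rho = 1 (fully shared information) the tolerance is back to the full 0.4 deg C scatter; independence is what buys the division by sqrt(n).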
Cuffey noted that the reconstructions were mostly based on tree rings except Moberg, and wondered again about 20th century divergence.
Mann presented a graphic showing reconstructions from forcing. He pointed out that the von Storch results were an outlier and said that he would recommend leaving out the VS simulation.
Applying a talking point recently used by Bradley, Mann said that they were extremely aware of the uncertainties in MBH99 – that’s why they used the word in the title. He then showed a slide with the title of MBH99, which included the word “Uncertainties”. He said that they had emphasized the uncertainties of MBH99 all along, implying that, if others did not, then MBH could hardly be blamed. As an editorial note here, I would merely observe that the press release that was issued for MBH99 was hardly a model of caution. (In business, you are responsible for promotional language in press releases, even if more cautious language is used in a prospectus.) The original MBH99 press release, which was widely disseminated (e.g. by AGU), said:
1998 was warmest year of millennium, climate researchers report
AMHERST MA — Researchers at the Universities of Massachusetts and Arizona who study global warming have released a report strongly suggesting that the 1990s were the warmest decade of the millennium, with 1998 the warmest year so far.
"Temperatures in the latter half of the 20th century were unprecedented," said Bradley.
The latest reconstruction supports earlier theories that temperatures in medieval times were relatively warm, but "even the warmer intervals in the reconstruction pale in comparison with mid-to-late 20th-century temperatures," said Hughes.
Read in its entirety, MBH99 used its confidence interval calculations to get to the result that 1998 was the warmest year of the millennium. They calculated 2-sigma error bars for the reconstruction (the error bars being incorrectly calculated, in my opinion) and then observed that 1998 was above the level of any prior year – thus the conclusion that it was the warmest year. (Note that there was no comparison to proxies for 1998, as the proxies had not been brought up to date.)
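The “warmest year” logic reduces to a simple comparison, which makes plain how much it depends on the width of the error bars. The sketch below asks whether a single value lies above the upper k-sigma bound of every reconstructed year. The three reconstructed values, the 0.6 deg C comparison value and both sigmas are made-up toy numbers; the comparison value stands in for an instrumental figure, since the proxies were not brought up to date for 1998.

```python
def exceeds_all_upper_bounds(recon, sigma, value, k=2.0):
    """True if `value` is above recon[i] + k * sigma for every reconstructed year i."""
    return all(value > r + k * sigma for r in recon)


recon = [-0.1, 0.05, 0.2]  # toy reconstructed anomalies, deg C (hypothetical)
value = 0.6                # toy comparison value, deg C (hypothetical)

for sigma in (0.15, 0.25):  # narrower vs wider hypothetical error bars
    print(sigma, exceeds_all_upper_bounds(recon, sigma, value))
```

In this toy case, widening sigma from 0.15 to 0.25 deg C is enough to flip the answer, which is why the way the error bars are calculated matters to the conclusion.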
Mann then said (perhaps in reply to a query about bristlecones – my notes are unclear) that the western US was important to the EOF1; it was a “sweet spot” for estimating the NH mean. No one on the panel asked for a further explanation of this. I checked this claim and was unable to confirm it; indeed, my calculations show the opposite. Five of the six EOF1 coefficients for the six gridcells from 112.5 to 122.5W and 37.5-42.5N (which include the California bristlecone/foxtail sites) are less than the median, and the sixth is barely above the median. A “sweet spot” is at 7.5N; 52.5-67.5E. The lowish weights in the western US gridcells are observable in the color-coded diagram in MBH98 itself.
Mann then proceeded to show a new reconstruction graphic (which, as I recall, was based on 7 series cherry-picked from Mann and Jones, 2003). It had a somewhat high MWP, but, as always, the MWP levels are just below corresponding modern levels. Turekian asked: “Would you sign your name to this?” Mann said that he couldn’t decide between this and MBH: “I like chocolate and I like mint.”
There was some discussion of a mixed temperature/precipitation signal, but my notes are unclear.
Mann said that he was “more than aware” of CO2 fertilization issues. He said that D’Arrigo and Jacoby et al. was “remarkably consistent” up to 1800 with the North American tree ring PC1, which was “adjusted to remove CO2.” On the previous day, we had presented a graphic showing that there was no adjustment to MBH98 figures. No one on the panel asked Mann about the adjustment. We had also shown a graphic from Biondi et al. in which the bristlecones up to 1800 were said to be remarkably consistent with Biondi’s Idaho reconstruction, which, unlike D’Arrigo and Jacoby, did not go up in the 20th century (but which was not selected by Mann for comparison). No one asked why one was chosen and not the other. No one asked about the physical basis for the adjustment – Mann’s adjustment implies that CO2 fertilization at 3000 m kicks in at about 160 ppm and is saturated at about 175 ppm. [This is from memory of some calculations that I did and I'll check and edit.]
Mann then said that Ammann and Wahl had shown that MM were “without statistical or climatological merit”, were “completely specious, not legitimate” and that a reconstruction excluding “key proxies” “completely fails verification.” He must have been going through withdrawal, as by this time he’d gone about 20 minutes without saying “specious”. However, he made up for this in very short order.
No one on the panel challenged his interpretation of our results. This was frustrating as we had explicitly stated on the previous day that we had NOT presented an alternate reconstruction, but had shown the impact of various alternatives, in particular, the effect of excluding bristlecones or reducing their impact by centered PC calculations. Draw a deep breath and consider for a moment what Mann (and Ammann) are actually saying: an MBH98-type reconstruction without bristlecones is "without statistical or climatological merit". You know what – we agree with that. Except that we ask: if a reconstruction without bristlecones is "without statistical or climatological merit", what does this imply about MBH98-type reconstructions and all the other proxies? It suggests to me that either the reconstruction method is no good or the proxies are no good or both. Of course, we also say that an MBH98-type reconstruction additionally using bristlecones is "without statistical merit" – a conclusion which is not rebutted by Mann observing that the reconstruction without bristlecones is also without statistical merit. We had explicitly raised this issue and also explicitly stated that the issue was not whether the reconstruction without bristlecones passes or fails an RE test, but whether the MBH98 reconstruction passes an r2 test, citing in addition Bürger and Cubasch on the inappropriateness of using an RE test (supposedly reserved for verification) as a means of choosing between models. The panel let Mann’s observations pass and did not ask him about these issues.
Mann went on to say that you “get same answer if you use a full data set” or if you use “correct PC retention”; “if you don’t use PCA, you get the same”; “as long as you use all the data”. He said that the bristlecones were in the PC4, which needed to be kept using “objective selection rules” (Preisendorfer’s Rule N). He said that MM “eliminate key proxy data”. In keeping with realclimate practice, everything is in code words – “bristlecone” is not mentioned in this context; it’s always “key data”. On the previous day, we had explicitly discussed all these issues, noting the inconsistency between MBH claims of robustness to the presence/absence of dendroclimatic indicators and the lack of robustness to bristlecones. Again, no one on the panel raised any questions here and Mann forged on.
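Why centering matters for which series dominate a PC calculation comes down to a simple identity: the sum of squares of a series about any constant c equals its sum of squares about its own mean plus n * (mean - c)^2. If, for illustration, a non-centered convention subtracts only the mean of the final calibration segment, a series whose calibration-period level differs from its long-term mean picks up extra apparent “variance”, and so extra weight in the PCs, while a series with no such offset is unaffected. The toy series are hypothetical, and the sketch leaves out the standardization steps of any actual MBH98 calculation.

```python
def sum_sq_about(x, center):
    """Sum of squared deviations of x about a given center."""
    return sum((v - center) ** 2 for v in x)


def full_vs_short_centering(x, cal_len):
    """Sum of squares about the full-period mean vs about the mean of the last cal_len values."""
    full_mean = sum(x) / len(x)
    short_mean = sum(x[-cal_len:]) / cal_len
    return sum_sq_about(x, full_mean), sum_sq_about(x, short_mean)


hockey_stick = [0.0] * 20 + [1.0] * 5  # toy series: flat, then a step up at the end
no_offset = [1.0, -1.0] * 10           # toy series: same mean in every segment

print(full_vs_short_centering(hockey_stick, 5))
print(full_vs_short_centering(no_offset, 4))
```

Here the step-shaped toy series has its sum of squares inflated fivefold (from 4 to 20) by short centering, exactly the n * (mean - c)^2 term, while the series with no offset is unchanged.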
Christy did ask Mann: “Did you calculate R2?” Mann’s answer was: “We didn’t calculate it. That would be silly and incorrect reasoning.” Whenever I replay this statement, the following phrase runs through my mind: "I did not have r2 with that statistic, Miss Lewinsky".
We had discussed the verification r2 issue in considerable depth on the previous day, even showing a graphic in which Mann had shown verification r2 for the AD1820 step. However, no one on the panel challenged Mann either about his claim that they did not calculate the r2 statistic or about why it would be “silly and incorrect reasoning” to calculate the r2 statistic – a point which is not only not self-evident, but incorrect. Perhaps the non-statistical panelists were reluctant to step into an area where they were not experts, given Mann’s aggressive and dismissive response to Christy. However, Nychka and Bloomfield, as statisticians, should have stepped in here. I’ve pointed out Nychka’s association with Ammann (he is acknowledged in Wahl and Ammann); Nychka is a decent guy, but he should have made way for an independent statistician.
Cuffey asked Mann about the divergence problem – is it possible that the proxies are nonlinear and at a threshold? Mann responded by showing 3 series with high late 20th century values (probably from Osborn and Briffa, I’m not sure) and said that these showed no threshold, thus this was evidence that we were not yet at a threshold. No one challenged him on whether these were unrepresentative series, picked from a larger population (as of course they were), although cherry picking had been raised as an issue on the previous day.
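The concern about picking a few series from a larger population can be illustrated with pure noise. The sketch below generates random walks (a hypothetical stand-in for proxy series, with no climate signal at all), keeps the k with the highest late-period mean, and compares that with the population average. By construction the picks run high at the end, so a hand-picked handful says little about the network as a whole. The series count, length, k and seed are arbitrary choices.

```python
import random


def random_walks(n_series, length, seed=1):
    """Generate n_series Gaussian random walks of the given length (pure noise)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(n_series):
        x, path = 0.0, []
        for _ in range(length):
            x += rng.gauss(0.0, 1.0)
            path.append(x)
        walks.append(path)
    return walks


def late_mean(series, late):
    """Mean of the last `late` values of a series."""
    return sum(series[-late:]) / late


def group_late_mean(group, late):
    """Average late-period mean across a group of series."""
    return sum(late_mean(s, late) for s in group) / len(group)


def pick_top(series_list, k, late):
    """Keep the k series with the highest late-period mean (a selection rule, not a test)."""
    return sorted(series_list, key=lambda s: late_mean(s, late), reverse=True)[:k]


population = random_walks(50, 200)
picks = pick_top(population, 3, 20)
print(f"population late mean {group_late_mean(population, 20):.2f}, "
      f"picked late mean {group_late_mean(picks, 20):.2f}")
```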
Cuffey asked: Do you know the temperature a thousand years ago within half a degree? Mann said that it was known “within 0.1-0.2 degree on a century scale.” He was far more optimistic about confidence intervals than anyone else.
My notes show that Mann then said that the RE statistic was “favored by most statisticians; that statisticians don’t use r2”. I don’t have notes on the question. Neither Nychka nor any other panelist challenged him on this point, although statisticians (and the panelists) use the r2 statistic all the time.
Roberts observed that the pre-1000 proxies were sparse.
My next notes are sparse. I show Mann as now discussing the RegEM method, saying that the “RegEM method” was “not subjective”, and that Rutherford et al. (using RegEM) did what Bürger and Cubasch asked. We’ve had a little discussion on the blog about RegEM, but the panel was not in a position to contest any of this. At some point, Mann said that they had stopped using the MBH98 method more than 5 years ago and were now using RegEM (as Science mentioned). It should be noted that RegEM has nothing to do with tree ring PC calculations – it’s simply an alternative to the multivariate method in which the proxy network is regressed against the temperature PCs. As an editorial note, Rutherford et al. applied RegEM to the exact same network as MBH98, thus with the flawed PC series.
RegEM is a different multivariate method; its statistical properties, in particular its ability to yield confidence intervals, are unknown. I’ve not parsed through this yet, but I’m certain that they use calibration residuals again, if they calculate confidence intervals. I’ve pointed out that Rutherford et al. incorrectly collated the instrumental record into their calculations. It’s interesting that the methodology is so “robust” that it’s insensitive to whether the instrumental record is collated correctly.
My notes show that Mann then said that he calculated the RE and CE statistics; that the r2 statistic was not “good”, “not sensible”. No one challenged him on this. He continued by saying: “I don’t claim to be a statistician”.
In passing, MBH98 did not report CE results either. It fails the CE test, as well as the r2 test as we had pointed out the previous day. No one on the panel asked about this.
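For readers who have not followed the RE/CE/r2 discussion, the standard verification statistics in the dendroclimatic literature are RE = 1 - SSE / sum((y - ybar_cal)^2), CE = 1 - SSE / sum((y - ybar_ver)^2), and r2, the squared correlation between observed and reconstructed values over the verification period, where SSE is the sum of squared verification errors. RE benchmarks against the calibration-period mean, so a reconstruction that merely gets the shift in level between calibration and verification right can score well on RE while failing CE and r2. The sketch implements these textbook definitions, not any particular MBH98 code, on four made-up values.

```python
def _sse(obs, pred):
    """Sum of squared errors between observations and predictions."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred))


def re_stat(obs, pred, cal_mean):
    """Reduction of error: benchmark is the calibration-period mean."""
    return 1 - _sse(obs, pred) / sum((o - cal_mean) ** 2 for o in obs)


def ce_stat(obs, pred):
    """Coefficient of efficiency: benchmark is the verification-period mean."""
    ver_mean = sum(obs) / len(obs)
    return 1 - _sse(obs, pred) / sum((o - ver_mean) ** 2 for o in obs)


def r2_stat(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    vo = sum((o - mo) ** 2 for o in obs)
    vp = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (vo * vp)


obs = [-0.4, -0.6, -0.4, -0.6]       # toy verification-period observations
pred = [-0.45, -0.45, -0.55, -0.55]  # toy reconstruction: right level, wrong wiggles
cal_mean = 0.0                       # toy calibration-period mean

print(f"RE = {re_stat(obs, pred, cal_mean):.2f}")
print(f"CE = {ce_stat(obs, pred):.2f}")
print(f"r2 = {r2_stat(obs, pred):.2f}")
```

In this toy case RE is about 0.95, while CE is -0.25 and r2 is zero: a high RE alone does not show that the reconstruction tracks year-to-year variation.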
My notes show that Mann once again said that you had to “look at spectrum of unresolved variance”, but I didn’t record the question. Unfortunately, no one asked him for a statistical reference for this procedure, as I’d like to see what the procedure is and what the reference says about it.
North concluded the session by saying: “Thanks, Mike.”
So where were we after this? For anyone familiar with our work, Mann didn’t lay a glove on us. I thought that the previous day’s evidence from Alley and Schrag was unhelpful to the Hockey Team, with some surprising admissions. I thought that D’Arrigo’s musings about cherry picking and cherry pie conjured an image that must surely trouble the panel. The issue of the Divergence Problem grew legs and clearly did trouble the panel. Hegerl was hard to understand, but admitted that confidence intervals for low-correlation reconstructions went from the "floor to the ceiling". Von Storch was severely critical on many counts, especially replication. Our presentation was severe as well.
And yet someone like Kerr perceived the proceedings differently. He reported that there were two reconstructions that supported Mann (D’Arrigo and Hegerl) and that Mann had moved on from MBH98 methods over 5 years ago.
Who knows what the panel will ultimately report. I doubt that the panel will end up really drawing a line in the sand against the Hockey Team, but, based on the record of the presentations that I saw, I see only downside for the Hockey Team.