Goodell and Deflategate Science

Yesterday, Roger Goodell released his decision on the Brady appeal.

Most of the early discussion has been about Brady’s destruction of his cell phone. Brady has contested the NFL’s characterization of this incident here (see cover here), saying that he had replaced a broken phone; that they had already told the NFL that Brady was not going to turn over his cell phone and that Brady had no obligation to do so under the labour agreement; that they provided the NFL with records from the carrier of all calls and texts; that he had “never written, texted, emailed to anybody at anytime, anything related to football air pressure before this issue was raised at the AFC Championship game in January”; that Wells already had Jastremski and McNally’s phones (on which there were no communications from Brady until after the AFC Championship game).  More on this below.

My specific interest in the decision was how the scientific issues were dealt with, given that there were serious statistical and scientific defects in the Exponent report. There isn’t very much in the Goodell decision about the science and statistics: Goodell adopted the Exponent report in toto. It also looks to me like Brady’s side did a totally ineffective job of confronting the Exponent report.

Goodell accepted Exponent’s finding that the full extent of the decline could not “be explained” and that a “substantial part of the decline” was due to tampering. Goodell says that the Brady side submitted “alternative scientific analyses (including the study presented by economists from the American Enterprise Institute)” and, as an expert witness, produced Dean Edward Snyder of the Yale School of Management, described as an “economist who specializes in industrial organization”.    Against them, the “Management Council” produced two Exponent scientists (Caligiuri and Steffey) and the Princeton professor who had originally reviewed the Exponent study.

The salient section is as follows (with two footnotes):

I find that the full extent of the decline in pressure cannot be explained by environmental, physical or other factors. Instead, at least a substantial part of the decline was the result of tampering….

I took into account Dean Snyder’s opinion that the Exponent analysis had ignored timing… Dr Caligiuri and Dr Steffey both explained how timing was, in fact, taken into account in both their experimental and statistical analysis. They concluded based on physical experiments that timing of the measurements did have an effect on the pressure but that the timing in and of itself could not account for the full extent of the pressure declines that the Patriot balls experienced. Dean Snyder, in contrast, performed no independent analysis or experiments, nor did he take issue with the results of the Exponent experimental work that incorporated considerations of timing and were addressed in detail in the testimony of Caligiuri and Steffey.

I also considered Dean Snyder’s other two “key findings”, as well as the arguments summarized in the NFLPA’s post-hearing brief, including criticism of the steps taken in the Officials Locker Room at halftime to measure and record the pressure of game balls[1]. I was more persuaded by the testimony of Caligiuri, Steffey and Marlow and the fact that the conclusions of their statistical analysis were confirmed by the simulations and other experiments conducted by Exponent. Those simulations and other experiments were described by Prof Marlow as a “first-class piece of work”.[2]

[1] There was argument at the hearing about which of the two pressure gauges Anderson used to measure the pressure in the game balls prior to the game. The NFLPA and Snyder opined that Mr Anderson had used the so-called logo gauge. On this issue, I find unassailable the logic of the Wells Report that the non-logo gauge was used, because otherwise neither the Colts’ ball nor the Patriots’ balls when tested by Anderson would have measured consistently with the pressures at which each team had set their footballs prior to delivery to the game officials, 13 and 12.5 psi respectively. Mr Wells’ testimony was confirmed by that of Caligiuri and Marlow. As Marlow testified, “There’s ample evidence that the non-logo gauge was used”.

[2] For similar reasons, I reject the arguments advanced in the AEI Report. The testimony provided by the Exponent witnesses and Professor Marlow demonstrated that none of the arguments presented in that report diminish or undermine the reliability of Exponent’s conclusions.

If Snyder’s testimony was as represented, he was a singularly poor choice of expert witness. There are major errors, defects and adverse assumptions throughout the Exponent report and Snyder should have taken issue with them. Why the Brady side wouldn’t have challenged the 67 deg F assumption of the simulations or the apparent gross error in Figures 26 and 30 (at CA here; also at ^ here) is beyond me.

It’s also hard to understand why the Brady side would have produced an expert witness who hadn’t gone to the trouble of doing his own independent analysis.   As represented by Goodell, Snyder focused on the single issue of “timing”, claiming that Exponent had “ignored” timing.  While there are issues with how Exponent handled timing, it is ludicrous to say that they “ignored” timing issues.  Yes, their “statistical analysis” in Appendix A, from which ludicrous claims of “statistical analysis” are derived, ignored timing issues, but timing issues were front and center in the simulations and claims that Exponent “ignored” timing – if Snyder made such claims – are easily refuted.

On the other hand, Goodell’s and Exponent’s characterizations are always shadow boxing with reality. Goodell said: “Dr Caligiuri and Dr Steffey both explained how timing was, in fact, taken into account in both their experimental and statistical analysis.” This isn’t true either. Timing was taken into account in their experimental analysis, but not in the statistical analysis (in Appendix A). (By the way, I haven’t written except in passing about the statistical analysis in Appendix A, as the simulations seemed to me to be the core of the prosecution case, while the statistical analysis was so stupidly irrelevant and pointless as to be worthless. But the fact that it is referred to here as a factor in the decision may cause me to revisit this.)

One of the most important, if not the most important, arguments in trying to make sense of events was the scenario in which referee Anderson used the Logo gauge for measuring Patriot balls and the Non-Logo gauge for measuring Colt balls, inattentively changing gauges between measurements – as NFL officials also did during half-time, despite the heightened scrutiny. This scenario neatly reconciles a lot of otherwise discordant information, as discussed in previous posts. This scenario was raised in the AEI article as well and in an early response by the Patriots. If Goodell has correctly characterized evidence from Snyder and the NFLPA, they botched this issue as well. According to Goodell, they argued that Anderson had used the Logo gauge for measuring both Patriot and Colt balls, raising the problem of the approximate pregame match of Colt pressures and Anderson’s measurements. This argument is moot if, as seems entirely possible, Anderson inattentively changed gauges. Then the issue is how Anderson’s pregame measurement of Patriot balls (if done with the Logo gauge) could be reconciled with Patriot pregame measurements.

On this narrower issue, there are a couple of possibilities: (1) the Patriot (Jastremski) gauge might have had a similar bias to Anderson’s Logo gauge. Exponent’s analysis of gauge variation is wildly irrelevant to the problem, as they limited their analysis to other examples of new Non-Logo gauges. Also, the NFL appears to have been in possession of the Jastremski gauge at half-time and could have tested its calibration, but it didn’t do so, apparently not keeping track of the gauge. (2) while Exponent has plausibly shown that the additional pressure arising from Patriot gloving protocols would have worn off by the time of Anderson’s measurements, it also appears possible that Patriot pregame measurements were done while the balls were still impacted by gloving.

The AEI report had raised the issue of switching gauges, but did not carry out the more detailed analysis of the implications of that scenario on the transients and simulations. The Brady side needed more than was provided in the AEI report, but the switching scenario cannot be trivially dismissed either. Goodell stated: “The testimony provided by the Exponent witnesses and Professor Marlow demonstrated that none of the arguments presented in that report diminish or undermine the reliability of Exponent’s conclusions.” I don’t see how anyone can responsibly assert that the switching scenario does not “diminish or undermine the reliability of Exponent’s conclusions.” It’s an important possibility that really does call into question the validity of Exponent’s claim that the decline in pressure cannot be accounted for by environmental and physical factors.

I noticed that Goodell’s decision added the word “substantial” in saying that a “substantial part of the decline” was due to tampering. This word is new in the Goodell decision and is not actually stated in the Wells Report, which said instead that the decline “cannot be explained completely” by environmental and physical factors. As I reported previously, Exponent said that pressures in Exponent’s Game Day simulations were “noticeably higher” than observed Patriot pressures, but did not use the word “substantial” – undoubtedly because the difference was only 0.1-0.24 psi (see Figure 30 and Exponent page 62).

It seems to me that the use of the word “substantial” changes the hurdle. Would the difference of 0.1-0.24 psi, described as “noticeable” in the Exponent Report, also be fairly described as “substantial”? I don’t think so. Read carefully, I do not believe that the Exponent Report, even on its own terms, supports the term “substantial” (as opposed to, say, “detectable”).

Raymond Bradley and the Grand Old Duke of York

In today’s post, I’ll return to more typical Climate Audit programming.  Upside-down Mann’s mentor, Raymond Bradley, has somewhat surprisingly published an article (Balascio et al 2015) that supports a longstanding Climate Audit criticism of varve proxies. Bradley and coauthors did not report that their interpretation of an important Baffin Island series is upside-down to the orientation used in PAGES2K and numerous AR5 vintage multiproxy reconstructions.  It seems that proxies used by the Team are like the Grand Old Duke of York:

And when they were up, they were up,
And when they were down, they were down,
And when they were only half-way up,
They were neither up nor down.

Ruling out high deflation scenarios

Further to my series of posts on Deflategate, reader chrimony observed that my statistical analysis had shown that it was possible that there had been no tampering, but had not excluded the possibility of tampering. This is a sensible observation, but raises the question of whether and how one could use the available statistical information to exclude tampering. This is an analysis that ought to have been done in the Wells Report. I’ve done the analysis in this post and the results are sharper than I’d anticipated.

For Logo initialization, any manual deflation exceeding a de minimis amount of, say, 0.1 psi can be excluded by observations. For Non-Logo initialization, statistical information rules out “high” deflation scenarios, i.e. deflation by more than the inter-gauge bias of 0.38 psi plus uncertainty, including the deflation levels of ~0.76 psi reported in Exponent’s deflation simulations. Remarkably, for Non-Logo initialization, the only manual deflation that is not precluded is an amount equal (within uncertainty) to the inter-gauge bias of ~0.38 psi. That the Patriots would have deflated balls by an amount almost exactly equal to the bias between referee Anderson’s gauges would be a bizarre coincidence, to say the least. I think that one can safely say that it is “more probable than not” that referee Anderson used the Logo gauge, rather than that such an implausible coincidence occurred.
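The logic of the preceding paragraph can be restated as a simple decision rule. The sketch below is only a restatement of the stated conclusions, not a reproduction of the underlying statistical analysis. The 0.38 psi bias and the 0.1 psi de minimis threshold come from the text; the function name and the default `uncertainty_psi` value are hypothetical and chosen purely for illustration.

```python
INTER_GAUGE_BIAS_PSI = 0.38  # Logo vs Non-Logo bias quoted in the text
DE_MINIMIS_PSI = 0.1         # "say 0.1 psi" threshold quoted in the text


def deflation_not_excluded(deflation_psi, gauge, uncertainty_psi=0.1):
    """Return True if a hypothesized manual deflation is NOT ruled out.

    gauge is the gauge assumed for pregame initialization: "logo" or
    "non-logo". uncertainty_psi is an illustrative placeholder, not a
    value taken from the analysis.
    """
    if deflation_psi < 0:
        raise ValueError("deflation must be non-negative")
    if gauge == "logo":
        # Only de minimis deflation survives under Logo initialization.
        return deflation_psi <= DE_MINIMIS_PSI
    if gauge == "non-logo":
        # Only deflation close to the inter-gauge bias survives.
        return abs(deflation_psi - INTER_GAUGE_BIAS_PSI) <= uncertainty_psi
    raise ValueError("gauge must be 'logo' or 'non-logo'")


if __name__ == "__main__":
    for gauge in ("logo", "non-logo"):
        for d in (0.0, 0.38, 0.76):
            print(gauge, d, deflation_not_excluded(d, gauge))
```

With these placeholder settings, the ~0.76 psi level from Exponent’s deflation simulations is excluded under either gauge assumption, while ~0.38 psi survives only under Non-Logo initialization.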

Exponent’s Transients: Bodge or Botch?

In my first writeup, I observed that Exponent’s Logo transients appeared to be bodged too high, even with their unwarranted and adverse use of 67 deg F initialization (Exponent’s “temperature trick”). In today’s post, I’ve taken a closer look at the seemingly questionable calculation of the transients at 67 deg F, showing that the Patriot transients make sense only if the transients purporting to show Logo Gauge initialization were not actually initialized at 12.5 psi using the Logo Gauge (as stated and as is the purpose of the diagram). My reverse engineering shows that the Patriot dry transient in Figure 27 only makes sense if the Logo Gauge read 12.81 psi at initialization or if the Master Gauge (not the stated Logo Gauge) was erroneously used for initialization. If I’m correct, this is a very significant error – a botch, rather than a bodge – for which one would expect a prompt corrigendum, if not retraction, of the corresponding calculations. In a postscript to today’s post, I’ve attached a note on conversion from Logo and Non-Logo Gauge scale to correctly calibrated Master Gauge scale.

NFL Officials Over-Inflated Patriot Balls

One of the ironies of the NFL’s conduct in this affair is that it can be established that NFL officials (under the supervision of NFL Executive Vice President Troy Vincent) over-inflated Patriot balls at half-time, the only proven tampering with Patriot balls. Brady and the Patriots were unaffected by the overinflation by NFL officials, as they destroyed the Colts in the second half.

Exponent must have noticed the over-inflation by officials, as it is implied by the post-game measurements, but failed to report or comment on it. Their avoidance becomes all the more conspicuous because many of the texts at issue in the Wells Report pertain to an earlier incident in which NFL officials had over-inflated Patriot balls, much to Brady’s frustration and annoyance at the time.

More on Deflategate

By converting football pressures to ball temperatures under the Ideal Gas Law, it is possible to conveniently show Colt and Patriot information – transients, simulations and observations – on a common scale. I’ve done this in the diagram shown below, and, in my opinion, it neatly summarizes the actual information. Commentary follows the figure.
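As a rough illustration of the conversion described above, the following sketch turns an observed gauge pressure into the ball temperature implied by the Ideal Gas Law. It assumes constant ball volume, no loss of air, and a standard atmospheric pressure of 14.7 psi. The function name and the example pressures are illustrative; the 12.5 psi and 67 deg F initialization values are used only as sample inputs.

```python
ATM_PSI = 14.7             # assumed standard atmospheric pressure, psi
RANKINE_OFFSET = 459.67    # deg F to absolute (Rankine) offset


def implied_temp_f(p_init_gauge, t_init_f, p_obs_gauge, atm=ATM_PSI):
    """Temperature (deg F) at which a ball set to p_init_gauge at t_init_f
    would read p_obs_gauge, assuming constant volume and no air loss.

    Gauge pressures are converted to absolute pressures before applying
    P1 / T1 = P2 / T2 with absolute temperatures.
    """
    t_init_abs = t_init_f + RANKINE_OFFSET
    ratio = (p_obs_gauge + atm) / (p_init_gauge + atm)
    return t_init_abs * ratio - RANKINE_OFFSET


if __name__ == "__main__":
    # Sample observed gauge pressures (illustrative, not measured data).
    for p_obs in (12.5, 12.0, 11.5, 11.0):
        t = implied_temp_f(12.5, 67.0, p_obs)
        print(f"{p_obs:5.2f} psig -> {t:5.1f} deg F")
```

Expressing every observation as an implied temperature puts balls initialized at different pressures (such as Colt and Patriot balls) on the same axis.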

Deflategate and Errors in the Wells Report

Readers in the U.S. are doubtless aware of the “Deflategate scandal”, in which the NFL alleged that Tom Brady, the greatest quarterback of his generation, had conspired with an equipment manager and locker room attendant to release a microscopic amount of pressure from footballs in the AFC championship game. The NFL seemed to be completely taken by surprise by the Ideal Gas Law and the fact that outside temperatures below calibration temperatures would result in much larger deflation without tampering.
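The size of the Ideal Gas Law effect can be sketched in a few lines. This sketch assumes constant ball volume, dry air, no loss of air, and a standard atmospheric pressure of 14.7 psi. The 48 deg F outdoor temperature in the example is a placeholder chosen for illustration, not a value taken from this post.

```python
ATM_PSI = 14.7             # assumed standard atmospheric pressure, psi
RANKINE_OFFSET = 459.67    # deg F to absolute (Rankine) offset


def gauge_pressure_after_temp_change(p_init_gauge, t_init_f, t_final_f,
                                     atm=ATM_PSI):
    """Gauge pressure (psi) after the ball equilibrates at t_final_f.

    Applies P1 / T1 = P2 / T2 to absolute pressure and absolute
    temperature, then converts back to gauge pressure.
    """
    t_init_abs = t_init_f + RANKINE_OFFSET
    t_final_abs = t_final_f + RANKINE_OFFSET
    p_final_abs = (p_init_gauge + atm) * t_final_abs / t_init_abs
    return p_final_abs - atm


if __name__ == "__main__":
    p0 = 12.5  # psi, pregame setting
    p1 = gauge_pressure_after_temp_change(p0, 67.0, 48.0)
    print(f"final gauge pressure: {p1:.2f} psi, drop: {p0 - p1:.2f} psi")
```

Under these assumptions, a ball set to 12.5 psi indoors loses roughly 1 psi from cooling alone, without any tampering.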

The findings depend on the interpretation of statistical data by decision-makers – a topic that interests me.   I found the technical report by Exponent, Wells’ technical consultants, to be very unsatisfactory on numerous counts:

  • although they were reported by Wells to have considered “all permutations”, they hadn’t.  On important occasions, they omitted highly plausible possibilities that indicated no tampering and, on other occasions, they only considered assumptions that were most adverse to the Patriots;
  • on key occasions, it seemed to me that Exponent failed to properly characterize exculpatory results.

At the end of my analysis, I concluded that their key technical findings were simply incorrect and wrote up my analysis, now online here.

I watched both the AFC championship and the final. I have no fan commitment to the Patriots. As someone who’s played sports all his life and whose play has always been rushed, I am amazed at how time seems to stand still for great athletes such as Brady.

The summary is as follows.

Implications of recent multimodel attribution studies for climate sensitivity

Last year, a paper of mine (Lewis 2014) showed that the approach used in Frame et al (2005), which argued for using a uniform prior for estimating equilibrium (strictly, effective) climate sensitivity (ECS), in fact led to a unique, objective Bayesian estimate for ECS upon undertaking a simple transformation (change) of variables. The estimate was lower, and far better constrained at the upper end, than the one resulting from use of a uniform prior in ECS, as recommended in Frame et al (2005) when estimating ECS. The only uniform priors involved were those for estimating posterior probability density functions (PDFs) for observational variables with Gaussian (normally distributed) data uncertainties, where they are totally noninformative and their use is uncontroversial. I wrote an article about Lewis (2014) at the time, and a version of the paper is available here.

I’ve now had a new paper that uses an essentially identical method to Lewis (2014), but with updated, higher quality data, published by Climate Dynamics, here. A copy of the accepted version is available on my web page, here.

Scientific American article: “How to Misinterpret Climate Change Research”

A Scientific American article concerning Bjorn Stevens’ recent paper “Rethinking the lower bound on aerosol radiative forcing” has led to some confusion. The article states, referring to a blog post of mine at Climate Audit, “The misinterpretation of Stevens’ paper began with Nic Lewis, an independent climate scientist.” My blog post showed how climate sensitivity estimates given in Lewis and Curry (2014) (LC14) would change if the estimate for aerosol forcing from Stevens’ recent paper were used instead of the estimate thereof given in the IPCC 5th Assessment Working Group 1 report (AR5 WG1). To clarify, Bjorn Stevens has never suggested that my blog post misinterpreted or misrepresented his paper.

The article also states, paraphrasing rather than quoting, “Lewis had used an extremely rudimentary, some would even say flawed, climate model to derive his estimates, Stevens said.” LC14 used a simple energy budget climate model, described in AR5 WG1, to estimate equilibrium climate sensitivity (ECS) from estimates of climate system changes over the last 150 years or so. An essentially identical method was used to estimate ECS in Otto et al (2013), a paper of which Bjorn Stevens was an author, along with thirteen other AR5 WG1 lead authors (and myself). Energy budget models actually estimate an approximation to ECS, effective climate sensitivity, not ECS itself, which some people may regard as a flaw. AR5 WG1 states that “In some climate models ECS tends to be higher than the effective climate sensitivity”; this is certainly true. Since the climate system takes many centuries to equilibrate, it is not known whether or not this is the case in the real climate system. LC14 discussed the issues involved in some detail, and my Climate Audit blog post referred to estimating “equilibrium/effective climate sensitivity”.

I sent Bjorn Stevens a copy of the above wording and he has responded, saying the following:

“Dear Nic,

because I have reservations about estimates of ocean heat uptake used in the ‘energy-balance approaches’, and because of a number of issues (which you allude to) regarding differences between effective climate sensitivity estimates from the historical record and ECS, I am not ready to draw the inference from my study that ECS is low. That said, I do think what you write in the two paragraphs above is a fair characterization of the situation and of your important contributions to the scientific debate. The Ringberg meeting also made me confident that the open issues are ones we can resolve in the next few years.

Feel free to quote me on this.

Best wishes, Bjorn”
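For readers unfamiliar with the energy budget approach discussed above, the standard relation is ECS ≈ F_2x × ΔT / (ΔF − ΔQ), where ΔT, ΔF and ΔQ are the changes in global surface temperature, radiative forcing and heat uptake rate between a base and a final period. The sketch below is a minimal illustration of that relation only. The input values are arbitrary placeholders, not the LC14 or Otto et al (2013) estimates, and the F_2x value of 3.71 W/m² is the commonly cited figure, assumed here.

```python
F2X_WM2 = 3.71  # W/m^2, forcing from a doubling of CO2 (assumed value)


def effective_climate_sensitivity(delta_t, delta_f, delta_q, f2x=F2X_WM2):
    """Energy budget estimate of effective climate sensitivity (deg C).

    delta_t: change in global mean surface temperature (deg C)
    delta_f: change in radiative forcing (W/m^2)
    delta_q: change in planetary heat uptake rate (W/m^2)
    """
    net = delta_f - delta_q
    if net <= 0:
        raise ValueError("delta_f must exceed delta_q")
    return f2x * delta_t / net


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    ecs = effective_climate_sensitivity(delta_t=0.7, delta_f=2.0, delta_q=0.4)
    print(f"effective climate sensitivity: {ecs:.2f} deg C")
```

The formula also shows why the aerosol forcing estimate matters: a less negative aerosol forcing raises ΔF, which lowers the resulting sensitivity estimate for the same ΔT and ΔQ.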

Update 26 April 2015

Gayathri Vaidyanathan tells me that the article has been changed at ClimateWire. Certainly, the title has been changed, and I presume the text has been amended per the version she sent me, which no longer suggests misinterpretation. But Scientific American is still showing the original version, so the situation is not very satisfactory.

Update 28 April 2015

The text of the article has now been changed at Scientific American, although the title is unaltered. The sentence referring to misinterpretation now reads “Stevens’ paper was analyzed by Nic Lewis, an independent climate scientist.*” At the foot of the article is the note:

Correction: A previous version of this story did not accurately reflect Lewis’ work. Lewis used Stevens’ study in an analysis that was used by some media outlets to throw doubt on global warming.

Pitfalls in climate sensitivity estimation: Part 3

A guest post by Nicholas Lewis

In Part 1 I introduced the talk I gave at Ringberg 2015, explained why it focussed on estimation based on warming over the instrumental period, and covered problems relating to aerosol forcing and bias caused by the influence of the AMO. In Part 2 I dealt with poor Bayesian probabilistic estimation and summarized the state of observational, instrumental period warming based climate sensitivity estimation. In this third and final part I discuss arguments that estimates from that approach are biased low, and that GCM simulations imply ECS is higher, partly because in GCMs effective climate sensitivity increases over time. I’ve incorporated one new slide here to help explain this issue.

Slide 19


