The “Blade” of Ocean2K

I’ve had a longstanding interest in high-resolution ocean proxies (with posts as early as 2005 – see the Ocean Sediment tag) and had already written detailed reviews of many of the individual high-resolution series used in Ocean2K (e.g. here, here, here, here, here and here). In those prior discussions, the divergence between 20th century proxy data and 20th century instrumental data had been a major issue.  The non-bladedness of the Ocean2K data was therefore unsurprising to me.

For their main figures, the Ocean2K authors made the questionable decision to degrade their data, both into 200-year bins and from deg C to SD units. In their Supplementary Information, however, they identified a network of 21 series with high resolution extending into the 20th century, showing results in 25-year bins, but only for the short period 1850-2000 and once again re-scaled, this time using only six values (25-year bins) for each series.

In my first post, I had undertaken to examine their data in higher resolution and will do so today using their high-resolution network – without the needless re-scaling and over the entire 0-2000 interval. The results clearly confirm the absence of a 20th century blade.  The Ocean2K authors were singularly uninformative about this obvious result; I’ll show how they worked around this “problem”.   I’ll also discuss untrue claims by Ken Rice (ATTP) and other ClimateBallers that the Ocean2K data “finishes in 1900” or is otherwise too low resolution to permit identification of a concealed blade.

Background: the Ocean Proxy “Divergence Problem”

I had initially become interested in high-resolution ocean data (especially alkenone and Mg/Ca, rather than δ18O) because, unlike tree rings, these proxies are directly calibrated in deg C according to standard equations (not ex post correlations).

Alkenone series are based on the ratio of C37:2 to C37:3 alkenones in coccolithophores, while Mg/Ca series are based on ratios in foraminifera, with surface-dwelling foraminifera (especially G. ruber) being of most interest.  During the past 20 years, and especially the past 10 years, alkenone samples have been widely collected throughout the world’s oceans, and coretop and sediment-trap calibrations yield sensible maps of ocean temperature without jiggling. In deep time, they also yield “sensible” results. Alkenone series constitute 15 of the 21 high-resolution series in the Ocean2K dataset (26 of 57 overall) and also the majority of the Marcott ocean data (31 of 60), with foraminifera Mg/Ca the second-largest fraction.

Alkenone and Mg/Ca series had originally been collected to shed light on “deep time”, but there were occasional box cores which both preserved the most recent sediments (a sampling problem with piston cores) and had been sampled at sufficiently high resolution to shed light on the past two millennia.  I’ve made a practice of regularly examining the NOAA and Pangaea archives for potentially relevant new data and, over the past 10 years, had already noticed and separately discussed many of the series in the high-resolution Ocean2K dataset (e.g. here, here, here, here, here and here).

Here, for example, is a figure from Leduc et al 2010, previously shown at CA here, showing dramatic decreases in alkenone SST at two sites: Morocco and Benguela.  (Both sites are included in the Ocean2K high-resolution dataset, each with more than thirty 20th century values.)  Numerous other CA posts on the topic can be found under the following tags: Ocean sediment; Alkenone.

Figure 1. From Leduc et al 2010. Both locations are in the Ocean2K high-resolution network.

In a number of CA posts, I had raised the “alkenone divergence problem”, the term alluding to the notorious divergence between instrumental temperatures and tree ring density proxies that had given rise to various “tricks” to “hide the decline” in Mann’s section of IPCC TAR and other articles, in order not to “dilute the message”.  In important ways, the alkenone divergence problem is even more troubling because (1) there is a physical calibration of alkenone proxies, whereas tree ring densities are merely correlated after the fact; and (2) alkenone proxies have “sensible” properties in deep time.

The “problem” arising from divergence between a proxy reconstruction and instrumental temperature is that such divergence makes it impossible to have confidence in the proxy reconstructions in earlier periods without reconciling the divergence.   Mann, for example, has always insisted that his reconstructions have statistical “skill” in calibration and verification periods, though the validity of such claims has obviously been at issue.

A Reconstruction from the Ocean2K “High-Resolution” Dataset

The Ocean2K data consisted of 57 series of wildly differing resolution: nine series had fewer than 20 values, while twelve series had more than 100 values.  In geophysics, specialists always use high-resolution data where available and use low-resolution data only where better data is unavailable.  In contrast, in their main figures, the Ocean2K authors degraded all their data into 200-year bins and made composites of the data only after blurring.

In their Supplementary Information, the Ocean2K authors identified a subset of 21 high-resolution series: twelve of the 21 had more than 20 values in the 20th century, seven had more than forty 20th century values, and all but one had more than eight.  In Figure S10, they showed a high-resolution composite in 25-year bins, but only for the 1850-2000 period and only in SD units (standardized over 1850-2000 after binning).

Because the underlying proxy data is already in deg C, it is trivially easy (easier, in fact) to do the Ocean2K calculations in deg C rather than SD units, and it’s hard to believe that the Ocean2K authors hadn’t already done so.  Figure 2 below shows the composite of high-resolution ocean cores in 25-year bins over the 0-2000 period (rather than 1850-2000) and in deg C (rather than SD units).  For comparison, I’ve also shown instrumental HadSST (black) and the composite from the full network using the Ocean2K technique with 200-year bins (but retaining deg C).  Expressed in deg C, there is a major divergence in the 20th century between instrumental temperature and the proxy reconstruction.  Even late 20th century proxy values are clearly below medieval values.


Figure 2. Red – 25-year bin composite of Ocean2K high-resolution ocean cores (excluding one incongruous singleton coral series), retaining deg C. Magenta – composite for the full network, calculated as in the Ocean2K composite but retaining deg C throughout. Black – HadSST global (since the ERSST global series only begins in 1880).
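The binning-and-compositing calculation is simple enough to sketch in a few lines. The function below is my own minimal illustration, not the Ocean2K authors’ code: it assumes each series is a (years, values) pair already in deg C, and it centers each series on its own 0-2000 mean rather than re-scaling to SD units.

```python
import numpy as np

def bin_composite(series, bin_width=25, start=0, end=2000):
    """Average each series into fixed-width bins, then composite across
    series in original units (deg C). Each element of `series` is a
    (years, values) pair; anomalies are taken relative to each series'
    own mean, so no conversion to SD units is needed."""
    edges = np.arange(start, end + bin_width, bin_width)
    centers = edges[:-1] + bin_width / 2.0
    binned = np.full((len(series), len(centers)), np.nan)
    for i, (years, values) in enumerate(series):
        idx = np.digitize(np.asarray(years), edges) - 1
        vals = np.asarray(values, dtype=float)
        for b in range(len(centers)):
            in_bin = idx == b
            if in_bin.any():
                binned[i, b] = vals[in_bin].mean()
    # center each series on its own mean, then average across series
    anomalies = binned - np.nanmean(binned, axis=1, keepdims=True)
    return centers, np.nanmean(anomalies, axis=0)
```

A real application would have to handle series that don’t span the full interval and pick a common reference period, but the point stands: nothing in the calculation requires degrading deg C into SD units.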

McGregor et al made no mention of this dramatic divergence in their main text, instead asserting that “the composite of reconstructions from tropical regions are in qualitative agreement with historical SST warming at the same locations”:

Although assessment of significance is limited by the number and resolution of the reconstructions, and by the small amount of overlap with historical SST estimates, we find that the composite of reconstructions from tropical regions are in qualitative agreement with historical SST warming at the same locations (Supplementary Fig. S10). Upwelling processes recorded at a number of the sites may also influence the twentieth-century composite (Supplementary Sections 1 and 8).

Even if the tropical composite were in “qualitative” agreement (a point that I will examine in a future article), this implies that the extratropical divergence must be that much worse in order to yield the actual overall divergence.  It is very misleading for the authors to claim “qualitative agreement” in the tropics without disclosing the overall divergence.

Deep in their Supplementary Information (page 44), they quietly conceded that the high-resolution composite did not yield the warming trend of the instrumental data, but there is no hint of this important result in the text of the article:

The 21_O2k and 21_Kaplan composites are non-significantly correlated (r2 = 0.17, df = 4, p = 0.42), with the warming trend in the 21_Kaplan not reproduced in the 21_O2k composite (Supplementary Fig. S10).

They illustrated this with the following graphic (in 1850-2000 SD units after binning). While the use of SD Units degrades the data, even this figure ought to have been sufficient to dispel the speculation of some ClimateBallers that the 1800-2000 bin might combine low 19th century values and high 20th century values, thereby concealing a blade.


Figure 3. Excerpt from Ocean 2K SI Figure S10, showing Kaplan SST (top panel) and Ocean2K high-resolution composite, both expressed in 1850-2000 SD Units (after binning).
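For what it’s worth, the quoted r2 = 0.17 with df = 4 is easy to verify as non-significant. Below is a stdlib-only sketch; the function and the synthetic six-bin example are my own, and 2.776 is the standard two-sided 5% critical value of the t-distribution with 4 degrees of freedom.

```python
import math

def corr_test(x, y, t_crit=2.776):
    """Pearson correlation between two binned composites, with a
    two-sided t-test at the 5% level. With six 25-year bins,
    df = n - 2 = 4; r^2 = 0.17 gives |t| ~ 0.9, far below 2.776."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, abs(t) > t_crit  # (correlation, significant at 5%?)
```

With only six bins, even a moderate correlation cannot clear the significance hurdle, which is exactly what the SI quietly reports.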

While the SD units of the 200-year bin and 25-year bin figures are not the same, I think that it is still instructive to show the two panels with consistent centering.  In the figure below, I’ve centered the panel showing the 25-year bins so that it matches the reference level of the final (1800-2000) bin of the 200-year reconstruction, further illustrating that its final bin does not contain a concealed blade.


Figure 4. Left panel – Ocean2K in 200-year bins (PAGES2K FAQ version from here); right – bottom panel of SI Figure S10a, with its zero value aligned to the value of the 1800-2000 bin in the left panel.  Both panels are in SD units (not deg C).  I’ve been able to closely emulate the results in the left panel, but not as closely in the right panel.

The Supplementary Information carries out corresponding analyses on subsets of the high-resolution data: tropical vs extratropical, upwelling vs non-upwelling, alkenone vs Mg/Ca.  Trying to analyse the divergence through such stratification is entirely justified, though the actual statistical analysis carried out by the Ocean2K authors falls far short of a professional standard. I’ll discuss these analyses in a separate post.  For now, I’ll note that similar concerns have been raised about alkenone data in a Holocene context, even by Ocean2K authors.  Lorenz et al 2006 (discussed at CA in early 2007 here) had contrasted trends in tropical vs extratropical alkenone data over the Holocene; in my commentary, I had pointed out the prevalence of upwelling locations in the tropical data.

Postscript: False ClimateBaller Claims that the Data “Finishes in 1900” 

In reaction to my first post, Ken Rice (ATTP) and other ClimateBallers argued that there was no reason to expect the Ocean2K data to have a blade, since the data supposedly ended in 1900 or was otherwise too low resolution.  Such claims were made at David Appell’s here, at Rice’s blog here and on Twitter.

As I observed in my post, the Ocean2K data archive is excellent and the measurement counts are easily calculated. A barplot of measurements (grey) and cores (red) is shown below.  Not only does the data not end in 1900: the number of individual measurements from the 20th century is larger than from any previous century.  Nor is the sample too small to permit analysis.  Twenty-one series is considerably more than the number of medieval proxies in many canonical multiproxy studies that are uncontested by IPCC or ClimateBallers. While it would be nice to have more data (especially in the Southern Ocean), there is easily enough 20th century data to be worth discussing.


Figure 5. Number of measurements in the Ocean2K dataset by 20-year period (grey); number of contributing series by 20-year period (red).
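The counts in Figure 5 are straightforward to reproduce from the archive. A sketch follows; the function name and input layout are my own, with each core represented simply by the array of its sample ages.

```python
import numpy as np

def counts_by_period(cores, width=20, start=0, end=2000):
    """Number of individual measurements, and of contributing cores,
    in each fixed-width period (20-year bins, as in the barplot)."""
    edges = np.arange(start, end + width, width)
    meas = np.zeros(len(edges) - 1, dtype=int)
    ncores = np.zeros(len(edges) - 1, dtype=int)
    for years in cores:
        idx = np.digitize(np.asarray(years), edges) - 1
        idx = idx[(idx >= 0) & (idx < len(meas))]
        np.add.at(meas, idx, 1)      # every measurement counts
        ncores[np.unique(idx)] += 1  # each core counted once per bin
    return edges[:-1], meas, ncores
```

Applied to the archived series, a tabulation like this is all that is needed to see that the 20th century is the best-sampled century in the dataset, not an empty one.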

Now consider various assertions about the data made by Rice and others. Shortly after my original article, Rice stated (here and here) that the data ended in 1900 and thus there was no reason to expect a blade.

“As far as I’m aware, it finsishes in 1900 and the paper has “pre-industrial” in the title. So why would we expect it to have a blade?”

Rice even accused me of “misread[ing]” the x-axis:

Can’t quite work out how you’ve managed to misread the x-axis so badly?

I informed Rice in a comment at Appell’s that his belief that the data ended “in 1900” was incorrect, as follows:

Ken says: “As far as I’m aware, it finsishes in 1900 and the paper has “pre-industrial” in the title. So why would we expect it to have a blade?”  The data doesn’t end in 1900. There are more measurements in the 20th century than in any previous century. The 20th century data doesn’t have a Hockey Stick either, as you can see in their Figure S10a.

I had also posted a Twitter comment highlighting that, even in the Ocean2K step graph, the final (1800-2000) bin extended to 2000.  Rather than defend his false claims, Rice made a Gavinesque exit, but not before making an unsupported allegation that I was spreading “misinformation” about the Ocean2K study.


Nonetheless, a few days later, Rice returned to the topic in a blog article on Sept 13, re-iterating his untrue claim that the Ocean2K data ended “in 1900”:

Steve McIntyre (who was involved in the discussion on David Appell’s blog) seems to be highlighting that the recent Ocean2K reconstruction does not have a blade. Well, the data appears to end in 1900 and the paper title is Robust global ocean cooling trend for the pre-industrial Common Era, so why would we expect there to be a blade.

This time, one of his readers (improbably, Sou) pointed out to Rice that the 1800-2000 bin must include 20th century data. Sou speculated that the 200-year bin could conceal a blade through a combination of cold 19th century values and warm late 20th century values – apparently unaware that this possibility had already been foreclosed by Supplementary Figure S10:

I don’t know that the recent ocean2k paper ended in 1900. I think what it did was end in the 1801 to 2000 “bin”, which would have included the coldest years of the past 2,000 years, as well as whatever proxy records were included up to 2000. The boxes in Figure 2 showed a lot of things, including the median for each 200 year bin, the latest of which was centred on 1900 – but went from 1801 to 2000.

Rice amended his post to say that his prior assertion (that the data ended in 1900) wasn’t “strictly correct”:

 What I say here isn’t strictly correct.

However, the issue is not that his original assertion wasn’t “strictly correct”; the issue was that it was unambiguously wrong.

The Ocean2K “Hockey Stick”

The long-awaited (and long overdue) PAGES2K synthesis of 57 high-resolution ocean sediment series (OCEAN2K) was published a couple of weeks ago (see here and here). Co-author Michael Evans’ announcement made the results sound like the latest and perhaps most dramatic Hockey Stick yet:

“Today, the Earth is warming about 20 times faster than it cooled during the past 1,800 years,” said Michael Evans, second author of the study and an associate professor in the University of Maryland’s Department of Geology and Earth System Science Interdisciplinary Center (ESSIC). “This study truly highlights the profound effects we are having on our climate today.”

A couple of news outlets announced its release with headlines like “1,800 years of global ocean cooling halted by global warming”, but the event passed unnoticed at realclimate and the newest “Hockey Stick” was somehow omitted from David Appell’s list of bladed objects.


Op Ed on Deflategate

In the Financial Post here. My submitted version was a little harsher. For related blog posts, see the Deflategate tag here.


Did McNally Inflate One Football in the Washroom?

In today’s post, I’m going to show the Deflategate data from a new perspective.  Rather than arguing about whether the Patriots used the Logo gauge, I’ve assumed, for the sake of argument, the NFL’s conclusion that the Non-Logo gauge was used, but gone further (as they ought to have done): I’ve “guessed” the amount of deflation that would be required to yield the observations. And, instead of considering only the overall average, I’ve plotted each data point and how the “guessed” deflation would reconcile it.

Some very surprising results emerged, one of which raises the question in the title: did McNally inflate one football in the washroom?  If the question doesn’t seem to make sense, read on.

Rather than one guess being applicable to all measurements, I ended up needing four different groups, each with a different guessed deflation.  A “good” guess (i.e. one that “worked”) for the majority of balls (7) was 0.38 psi – an interesting number that I’ll discuss in the post.  A good guess for two balls was zero deflation.  But for ball #7, it was necessary to assume that it had been inflated by approximately 0.5 psi in the washroom. One ball was lower than the others (by 0.76 psi) and remains hard to explain.  The Wells Report reasonably drew attention to variability, but did not address the details of actual variability other than by arm-waving, and did not actually show that erratic washroom deflation was a plausible explanation for the observed variability.

While the approach in today’s post may not appear conceptual, statistical algorithms, including linear regression, typically solve inverse problems, and the spirit of today’s post is to approach Deflategate as an inverse problem.  In doing so, I am aware (as Carrick has forcefully observed) that the underlying physical conditions were poorly defined, but people still need to make decisions using the available information as best they can.  I think that the approach in today’s post provides a much more plausible and satisfying explanation of the variation in Patriot pressures than those presented by either Exponent or Snyder or, for that matter, my own previous commentary.

Bear with the explanation of context, as the results are interesting.
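The mechanics of the “guessing” amount to inverting the gas law: predict the halftime pressure from the pregame pressure and the temperature drop, and attribute the residual to manual deflation (or, for ball #7, inflation). A sketch, with illustrative temperatures rather than the Wells Report’s exact values:

```python
def implied_deflation(p_pregame, p_halftime,
                      t_pregame_f=71.0, t_field_f=48.0, atm=14.7):
    """Manual deflation (psi) needed to reconcile one ball's readings:
    gas-law prediction at field temperature minus the observed halftime
    reading. Positive => deflation beyond the gas law; negative => the
    ball reads higher than physics alone predicts. The temperatures
    here are illustrative assumptions, not the report's exact values."""
    def rankine(deg_f):
        return deg_f + 459.67
    # absolute pressure scales with absolute temperature (Gay-Lussac)
    p_pred = (p_pregame + atm) * rankine(t_field_f) / rankine(t_pregame_f) - atm
    return p_pred - p_halftime
```

Running each ball’s measurements through a function like this, ball by ball rather than on the overall average, is what produces the four groups described above.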


Letter to Daniel Marlow on Exponent Error

On June 29, I sent a letter to Ted Wells notifying him of the erroneous description of key figures in the Exponent report, but did not receive any acknowledgement.  On the presumption that Daniel Marlow of Princeton is more likely to be concerned about the erroneous research record (as well as having an obligation to ensure that the research record is properly presented), I sent him a similar letter today, copying lawyers Daniel Goldberg and Jeffrey Kessler.


Exponent’s Trick to Exaggerate the Decline

In an earlier article, I pointed out that essential figures in the Exponent report contained what appeared to be an important misrepresentation: transients purporting to represent Logo gauge initialization had not really been initialized with the Logo gauge.  The same point was later (and independently) made in a technically oriented sports blog.  Exponent’s misrepresentation (“trick”) exaggerated the “decline” – a word that climate readers will be amused to find in discussion of this topic.  In my opinion, exposing it was perhaps the most important job for Brady’s technical experts in defending their client. And, of all the technical issues, because this one involved a misrepresentation – be it unintentional or intentional – it was arguably the issue with the biggest upside for Brady’s legal team, since misrepresentations have an entirely different legal weight than errors.  In today’s post, I’ll look at the transcript to see how Snyder (the technical expert) and Kessler (the lead lawyer) did on this issue. I’ll also suggest a face-saving solution for Goodell.

Who “Told” Exponent Not to Consider Switching Scenario?

The transcript of the Brady appeal before Goodell has been released and it’s astonishing to see how the sausage was made.  It raises many issues, one of which I’ll discuss in today’s post.

Goodell and Deflategate Science

Yesterday, Roger Goodell released his decision on the Brady appeal.

Most of the early discussion has been about Brady’s destruction of his cell phone. Brady has contested the NFL’s characterization of this incident here (see cover here), saying that he had replaced a broken phone; that they had already told the NFL that Brady was not going to turn over his cell phone and that Brady had no obligation to do so under the labour agreement; that they provided the NFL with records from the carrier of all calls and texts; that he had “never written, texted, emailed to anybody at anytime, anything related to football air pressure before this issue was raised at the AFC Championship game in January”; that Wells already had Jastremski and McNally’s phones (on which there were no communications from Brady until after the AFC Championship game).  More on this below.

My specific interest in the decision was how the scientific issues were dealt with, given that there were serious statistical and scientific defects in the Exponent report.   There isn’t very much in the Goodell decision about the science and statistics: Goodell adopted the Exponent report in its entirety.  It also looks to me like Brady’s side did a totally ineffective job of confronting the Exponent report.

Goodell accepted Exponent’s finding that the full extent of the decline could not “be explained” and that a “substantial part of the decline” was due to tampering. Goodell says that the Brady side submitted “alternative scientific analyses (including the study presented by economists from the American Enterprise Institute)” and, as an expert witness, produced Dean Edward Snyder of the Yale School of Management, described as an “economist who specializes in industrial organization”.    Against them, the “Management Council” produced two Exponent scientists (Caligiuri and Steffey) and the Princeton professor who had originally reviewed the Exponent study.

The salient section is as follows (with two footnotes):

I find that the full extent of the decline in pressure cannot be explained by environmental, physical or other factors. Instead, at least a substantial part of the decline was the result of tampering….

I took into account Dean Snyder’s opinion that the Exponent analysis had ignored timing… Dr Caligiuri and Dr Steffey both explained how timing was, in fact, taken into account in both their experimental and statistical analysis. They concluded based on physical experiments that timing of the measurements did have an effect on the pressure but that the timing in and of itself could not account for the full extent of the pressure declines that the Patriot balls experienced.  Dean Snyder, in contrast, performed no independent analysis or experiments, nor did he take issue with the results of the Exponent experimental work that incorporated considerations of timing and were addressed in detail in the testimony of Caligiuri and Steffey.

I also considered Dean Snyder’s other two “key findings”, as well as the arguments summarized in the NFLPA’s post-hearing brief, including criticism of the steps taken in the Officials Locker Room at halftime to measure and record the pressure of game balls[1]. I was more persuaded by the testimony of Caligiuri, Steffey and Marlow and the fact that the conclusions of their statistical analysis were confirmed by the simulations and other experiments conducted by Exponent. Those simulations and other experiments were described by Prof Marlow as a “first-class piece of work”.[2]

[1] There was argument at the hearing about which of the two pressure gauges Anderson used to measure the pressure in the game balls prior to the game. The NFLPA and Snyder opined that Mr Anderson had used the so-called logo gauge.  On this issue, I find unassailable the logic of the Wells Report that the non-logo gauge was used, because otherwise neither the Colts’ balls nor the Patriots’ balls, when tested by Anderson, would have measured consistently with the pressures at which each team had set their footballs prior to delivery to the game officials, 13 and 12.5 psi respectively. Mr Wells’ testimony was confirmed by that of Caligiuri and Marlow. As Marlow testified, “There’s ample evidence that the non-logo gauge was used”.

[2] For similar reasons, I reject the arguments advanced in the AEI Report. The testimony provided by the Exponent witnesses and Professor Marlow demonstrated that none of the arguments presented in that report diminish or undermine the reliability of Exponent’s conclusions.

If Snyder’s testimony was as represented, he was a singularly poor choice of expert witness.  There are major errors, defects and adverse assumptions throughout the Exponent report, and Snyder should have taken issue with them.  Why they wouldn’t have challenged the 67 deg F assumption of the simulations or the apparent gross error in Figures 26 and 30 (at CA here; also here) is beyond me.

It’s also hard to understand why the Brady side would have produced an expert witness who hadn’t gone to the trouble of doing his own independent analysis.   As represented by Goodell, Snyder focused on the single issue of “timing”, claiming that Exponent had “ignored” timing.  While there are issues with how Exponent handled timing, it is ludicrous to say that they “ignored” timing issues.  Yes, the “statistical analysis” in Appendix A ignored timing, but timing issues were front and center in the simulations, and claims that Exponent “ignored” timing – if Snyder made such claims – are easily refuted.

On the other hand, Goodell’s and Exponent’s characterizations are always shadow-boxing with reality.  Goodell said: “Dr Caligiuri and Dr Steffey both explained how timing was, in fact, taken into account in both their experimental and statistical analysis.”  This isn’t true either. Timing was taken into account in their experimental analysis, but not in the statistical analysis (in Appendix A).   (By the way, I haven’t written except in passing about the statistical analysis in Appendix A, as the simulations seemed to me to be the core of the prosecution case, while the statistical analysis was so stupidly irrelevant and pointless as to be worthless; the fact that it is referred to here as a factor in the decision may cause me to revisit this.)

One of the most important arguments, if not the most important, in trying to make sense of events was the scenario in which referee Anderson used the Logo gauge for measuring Patriot balls and the Non-Logo gauge for measuring Colt balls, inattentively changing gauges between measurements – as NFL officials also did during half-time, despite the heightened scrutiny.  This scenario neatly reconciles a lot of otherwise discordant information, as discussed in previous posts.  It was raised in the AEI article as well, and in an early response by the Patriots.  If Goodell has correctly characterized the evidence from Snyder and the NFLPA, they botched this issue as well. According to Goodell, they argued that Anderson had used the Logo gauge for measuring both Patriot and Colt balls, raising the problem of the approximate pregame match between Colt pressures and Anderson’s measurements.  This argument is moot if, as seems entirely possible, Anderson inattentively changed gauges.  The issue then becomes how Anderson’s pregame measurement of Patriot balls (if done with the Logo gauge) could be reconciled with Patriot pregame measurements.   On this narrower issue, there are a couple of possibilities: (1) the Patriot (Jastremski) gauge might have had a similar bias to Anderson’s Logo gauge.  Exponent’s analysis of gauge variation is wildly irrelevant to this problem, as they limited their analysis to other examples of new Non-Logo gauges. Also, the NFL appears to have been in possession of the Jastremski gauge at half-time and could have tested its calibration, but didn’t do so, apparently not keeping track of the gauge.  (2) While Exponent has plausibly shown that the additional pressure arising from Patriot gloving protocols would have worn off by the time of Anderson’s measurements, it also appears possible that Patriot pregame measurements were done while the balls were still impacted by gloving.
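The arithmetic behind the gauge scenarios is simple. Below is a sketch of the pregame-consistency check; the 0.38 psi Logo-gauge bias is the figure used in these posts, and the function itself is my own illustration.

```python
def gauge_scenario_residuals(team_set_psi, anderson_reading_psi,
                             logo_bias=0.38):
    """Residual between a team's set pressure and Anderson's pregame
    reading under two assumptions about which gauge he used. A small
    residual makes a scenario plausible; the Logo gauge is assumed to
    read ~0.38 psi high relative to the Non-Logo gauge."""
    return {
        "non_logo": anderson_reading_psi - team_set_psi,
        "logo": (anderson_reading_psi - logo_bias) - team_set_psi,
    }
```

On the NFL’s account, near-zero residuals for both teams under the Non-Logo assumption are what made that scenario attractive; the switching scenario instead applies different rows to different teams, which is why the Jastremski gauge’s (untested) calibration matters.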

The AEI report had raised the issue of switching gauges, but did not carry out a more detailed analysis of the implications of that scenario for the transients and simulations.  The Brady side needed more than was provided in the AEI report, but the switching scenario cannot be trivially dismissed either. Goodell stated: “The testimony provided by the Exponent witnesses and Professor Marlow demonstrated that none of the arguments presented in that report diminish or undermine the reliability of Exponent’s conclusions.”  I don’t see how anyone can responsibly assert that the switching scenario does not “diminish or undermine the reliability of Exponent’s conclusions”. It’s an important possibility that really does call into question the validity of Exponent’s claim that the decline in pressure cannot be accounted for by environmental and physical factors.

I noticed that Goodell’s decision added the word “substantial” in saying that a “substantial part of the decline” was due to tampering. This word is new in the Goodell decision and does not actually appear in the Wells Report, which said instead that the decline “cannot be explained completely” by environmental and physical factors. As I reported previously, Exponent said that pressures in Exponent’s Game Day simulations were “noticeably higher” than observed Patriot pressures, but did not use the word “substantial” – undoubtedly because the difference was only 0.1-0.24 psi (see Figure 30 and Exponent page 62).

It seems to me that the use of the word “substantial” raises the hurdle. Would the difference of 0.1-0.24 psi, described as “noticeable” in the Exponent Report, also be fairly described as “substantial”? I don’t think so. Read carefully, the Exponent Report, even on its own terms, does not support the term “substantial” (as opposed to, say, detectable).

Raymond Bradley and the Grand Old Duke of York

In today’s post, I’ll return to more typical Climate Audit programming.  Upside-Down Mann’s mentor, Raymond Bradley, has somewhat surprisingly published an article (Balascio et al 2015) that supports a longstanding Climate Audit criticism of varve proxies. Bradley and coauthors did not report that their interpretation of an important Baffin Island series is upside-down relative to the orientation used in PAGES2K and numerous AR5-vintage multiproxy reconstructions.  It seems that proxies used by the Team are like the Grand Old Duke of York:

And when they were up, they were up,
And when they were down, they were down,
And when they were only half-way up,
They were neither up nor down.


Ruling out high deflation scenarios

Further to my series of posts on Deflategate, reader chrimony observed that my statistical analysis had shown that it was possible that there had been no tampering, but had not excluded the possibility of tampering.  This is a sensible observation, but it raises the question of whether and how one could use the available statistical information to exclude tampering. This is analysis that ought to have been done in the Wells Report.  I’ve done it in this post, and the results are sharper than I’d anticipated.

For Logo initialization, any manual deflation exceeding a de minimis amount of, say, 0.1 psi can be excluded by the observations.  For Non-Logo initialization, the statistical information rules out “high” deflation scenarios, i.e. deflation by more than the inter-gauge bias of 0.38 psi plus uncertainty, including the deflation levels of ~0.76 psi reported in Exponent’s deflation simulations.  Remarkably, for Non-Logo initialization, the only manual deflation amounts that are not precluded are those equal (within uncertainty) to the inter-gauge bias of ~0.38 psi.  That the Patriots would have deflated balls by an amount almost exactly equal to the bias between referee Anderson’s gauges would be a bizarre coincidence, to say the least.  I think that one can safely say that it is “more probable than not” that referee Anderson used the Logo gauge than that such an implausible coincidence occurred.
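The exclusion logic can be expressed compactly. In the sketch below (my own illustration), the candidate deflation amounts are the values discussed in this post (0, 0.38 and 0.76 psi), and the tolerance is an illustrative uncertainty allowance, not a derived one.

```python
def surviving_deflations(observed_residual,
                         candidates=(0.0, 0.38, 0.76), tol=0.15):
    """Which hypothesized manual-deflation amounts (psi) are NOT
    excluded by an observed residual? Under Non-Logo initialization the
    observed residual is ~0.38 psi, so only deflation approximately
    equal to the inter-gauge bias survives the comparison."""
    return [d for d in candidates if abs(observed_residual - d) <= tol]
```

The point of the exercise: under Non-Logo initialization, the survivor is precisely the 0.38 psi amount that happens to equal the bias between Anderson’s two gauges, which is the coincidence discussed above.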