For Logo initialization, any manual deflation exceeding a de minimis amount of, say, 0.1 psi can be excluded by observations. For Non-Logo initialization, statistical information rules out “high” deflation scenarios, i.e. deflation by more than the inter-gauge bias of 0.38 psi plus uncertainty, including the deflation levels of ~0.76 psi reported in Exponent’s deflation simulations. Remarkably, for Non-Logo initialization, the only manual deflation amounts that are not precluded are those equal (within uncertainty) to the inter-gauge bias of ~0.38 psi. That the Patriots would have deflated balls by an amount almost exactly equal to the bias between referee Anderson’s gauges would be a bizarre coincidence, to say the least. I think that one can safely say that it is “more probable than not” that referee Anderson used the Logo gauge than that such an implausible coincidence occurred.

As discussed in a previous post (see here), half-time ball pressures can be converted to ball temperatures using the Ideal Gas Law and knowledge and/or assumptions of pre-game initialization conditions.
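As a rough check on the arithmetic, the conversion can be sketched in a few lines of Python. This is an illustrative sketch (function names are mine, not the script used for the figures), using the Colt initialization of 13.0 psig at 71 deg F and the Colt half-time Non-Logo average of ~12.325 psig reported later in this post:

```python
# Illustrative only: convert a half-time pressure reading (psig) to an implied
# ball temperature via the Ideal Gas Law at constant volume and fixed air mass.
ATM = 14.7  # psi; gauge pressure (psig) is relative to atmospheric

def f2k(f):  # deg F -> kelvin
    return (f - 32) * 5 / 9 + 273.15

def k2f(k):  # kelvin -> deg F
    return (k - 273.15) * 9 / 5 + 32

def implied_temp(p_halftime, p_pregame, t_pregame_F):
    """T2 = T1 * P2/P1 in absolute units, since n and V are fixed."""
    return k2f(f2k(t_pregame_F) * (p_halftime + ATM) / (p_pregame + ATM))

# Colt balls: 13.0 psig initialization at 71 deg F, half-time Non-Logo
# average of ~12.325 psig (reported later in this post)
print(round(implied_temp(12.325, 13.0, 71), 1))  # -> 58.1
```

This reproduces the ~58.1 deg F implied Colt half-time ball temperature discussed below.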

The half-time ambient temperature was 48 deg F (black solid dot). The average Colt half-time pressures (using the relatively unbiased Non-Logo gauge) convert to an average ball temperature of approximately 58.1 deg F (blue + sign). Based on the information that the referees measured only 4 Colt balls because they were “running out of time”, I’ve estimated the average Colt measurement time at 12.5 minutes, just before the end of half-time at 13.5 minutes. This yields the negative exponential transient shown below. Because the Patriots had substantially more ball possession, especially towards the end of the first half, their mix of balls would be wetter than the Colt mix and would thus, if anything, lie below the Colt transient. Note that this transient is for a mix of wet and dry balls – not dry balls or wet balls.

*Figure 1. Half-time ball temperatures. Dry transient is a fitted negative exponential to the Colt half-time average at an estimated average measurement time of 8 minutes. Wet transient differential is based on information in Figure 27. Implied ball temperatures for average Patriot half-time pressure measurements are shown for three cases: Logo initialization and no deflation; Non-Logo initialization and no deflation; Non-Logo initialization and 0.72 psi deflation – matching average deflation in Exponent simulations of rapid deflation.*

Exponent’s simulations of surreptitious deflation all yielded an average deflation of ~0.76 psi (with very little variability – see note in Appendix). If Patriot balls had been deflated after measurement by the same amount as in Exponent’s deflation simulations (0.76 psi) – a plausible comparison – then the implied ball temperature for the Patriot half-time average pressure of 11.11 psi (Non-Logo gauge) is 58.4 deg F – **higher** than the corresponding temperature for Colt balls measured later in the half-time, and well above the transient at plausible average Patriot measurement times. The implied *hiatus* contradicts the possibility of surreptitious deflation in the amount of the Exponent simulations. The only deflation values (Non-Logo gauge initialization) that are consistent with the transient to observed Colt values are values in an interval centered (curiously) at ~0.38 psi – the precise value of the inter-gauge bias.
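The 58.4 deg F figure can be reproduced with a short sketch (my own calculation, not Exponent’s): deflation after measurement removes air from the ball, which is equivalent to having initialized at a lower pressure, so the implied temperature for a given half-time reading rises.

```python
# Sketch: implied half-time ball temperature when post-measurement deflation
# of d psi is assumed; equivalent to initialization at (p_init - d) psig.
ATM = 14.7  # psi; gauge pressure is relative to atmospheric

def f2k(f): return (f - 32) * 5 / 9 + 273.15
def k2f(k): return (k - 273.15) * 9 / 5 + 32

def implied_temp(p_obs, p_init, t_init_F, deflation=0.0):
    # Ideal Gas Law at fixed volume and (post-deflation) fixed air mass
    return k2f(f2k(t_init_F) * (p_obs + ATM) / (p_init - deflation + ATM))

# Non-Logo initialization (12.5 psig at 71 deg F), observed 11.11 psig:
print(round(implied_temp(11.11, 12.5, 71, deflation=0.76), 1))  # -> 58.4 (too high)
print(round(implied_temp(11.11, 12.5, 71), 1))                  # -> 43.9 (too low)
```

With the Exponent-scale 0.76 psi deflation the implied temperature sits above the Colt transient; with no deflation at all it falls below even the 48 deg F ambient, bracketing the narrow admissible interval around ~0.38 psi.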

For comparison, I’ve also shown the corresponding ball temperature assuming Logo gauge initialization at 71 deg F (red + sign), almost exactly on the Colt temperature transient. If, after Logo gauge initialization, there had been manual deflation of 0.38 psi, the ball temperature corresponding to 11.11 psi at half-time would be almost exactly equal to the 0.76 psi deflation case for Non-Logo initialization – contradicted by the resulting hiatus.

**Discussion**

These results are considerably sharper than the results in earlier discussions. For Logo gauge initialization of Patriot balls, it is not just that observed values are consistent with Logo initialization, but that any *noticeable* (in some sense) manual deflation would yield ball temperatures at 11.11 psi (Non-Logo) that were too high relative to the Colt measurements later in the half-time. Any manual deflation greater than ~0.1 psi or so would be inconsistent with observations. Exponent’s simulations did not show that such small deflation could be achieved, nor is there any sensible reason why anyone would bother trying to deflate footballs by ~0.1 psi.

For Non-Logo initialization, any manual deflation greater than ~0.38 psi plus an uncertainty allowance yields, at the observed half-time pressure of 11.11 psi (Non-Logo), ball temperatures that are too high in comparison with the later Colt measurements and is therefore precluded. Similarly, manual deflation less than ~0.38 psi minus an uncertainty allowance yields ball temperatures that are too low in comparison with the later Colt measurements.

The remarkable result is that the only manual deflation (Non-Logo initialization) that is not precluded is an amount equal, within uncertainty, to the inter-gauge bias of ~0.38 psi. It surely passes all understanding why the Patriots would set a deflation target that so exactly matched the inter-gauge bias of referee Anderson’s two gauges – and then executed a surreptitious deflation program exactly implementing this implausible objective. Or why they would bother with ~0.38 psi deflation rather than a more substantial deflation of 1-1.5 psi or more.

I think that one can safely say that it is “more probable than not” that referee Anderson used the Logo gauge than that such an implausible coincidence.

**Appendix – Exponent’s Deflation Simulations**

Exponent’s deflation simulations involved three different employees attempting to deflate 11 footballs in 1 minute 40 seconds. The results were very consistent: each employee deflated the balls by an average of ~0.76 psi, with very narrow deflation ranges both intra-employee and inter-employee. The average deflation ranged from 0.75 to 0.79 psi, with standard deviations of ~0.1 psi. Note that the Wells Report had arm-wavingly attributed Patriot pressure variability to variability in deflation, but their own deflation simulations did not yield anything other than negligible variability – an inconsistency not addressed in the report.


**Figure 27 – Simulations with Logo Initialization**

The critical simulations pertain to initialization with the Logo gauge. These simulations yield the transients in Figures 26, 27 and 30 that are used for comparison of “models” and observations. As previously noted, Exponent carried out these simulations at 67 deg F – the temperature most disadvantageous to the Patriots. Their reasons for this adverse assumption were poorly supported and, in my opinion, completely invalid. I’ve also observed a puzzling discrepancy between the transients shown in these figures and calculations using the Ideal Gas Law.

First, here is an annotated version of Wells Report Figure 27, showing dry (solid) and wet (dashed) transients for Colt (blue) and Patriot (red) footballs, supposedly initialized at 13.0 and 12.5 psig respectively at 67 deg F using the Logo gauge. In each case, I’ve shown my reverse-engineered estimates of the actual pressure reading (Logo Gauge) at initialization required to yield the reported dry transients: 12.81 psig (Patriot) and 13.26 psig (Colt) – 0.31 and 0.26 psi respectively above the Logo readings said to have been used. In each case, I’ve shown the location of transients displaced downwards by these amounts. I’ve also shown (solid dots) the average Logo half-time measurements (converted to Master scale using Exponent’s formulas), each coinciding with the horizontal lines in the diagram. Below the diagram, I’ll explain these calculations and speculate on how the discrepancy may have occurred.

*Figure 1. Annotation of Wells Report Figure 27 as explained in text.*

The solid dots showing the average Logo half-time measurements converted to Master scale (both shown at plausible times) confirm that the conversion formulas have been accurately implemented.

I digitized the Colt dry transient (solid steelblue) and fit it with a negative exponential to an asymptote. The corresponding y-intercept is 11.90 psig (Master). The pre-game pressure (Master) at 67 deg F initialization required to yield this pressure at 48 deg F ambient is 12.90 psig (Master), which, in turn, is equivalent to 13.26 psig (Logo) using Exponent’s conversion formula. At the stated initialization pressure of 13.0 psig (Logo) at 67 deg F, the corresponding y-intercept is 11.67 psig, about 0.23 psi lower than the y-intercept in Figure 27. A downward translation of the dry transient by this difference of 0.23 psi yields the 13.0 psig transient shown in thicker blue (together with wet transient a further 0.45 psi lower in dashed blue overlapping the Patriot dry transient).
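The reverse-engineering step is simple Ideal Gas Law arithmetic: scale the absolute pressure at the 48 deg F y-intercept by the ratio of absolute temperatures. A minimal sketch (function name mine):

```python
# Sketch: pre-game pressure (psig) at the initialization temperature needed
# to yield a given y-intercept pressure at 48 deg F ambient.
ATM = 14.7  # psi; gauge pressure is relative to atmospheric

def f2k(f): return (f - 32) * 5 / 9 + 273.15

def pregame_pressure(p_at_48F, t_init_F=67):
    # absolute pressure scales with absolute temperature at fixed n, V
    return (p_at_48F + ATM) * f2k(t_init_F) / f2k(48) - ATM

# Colt dry y-intercept of 11.90 psig (Master) implies:
print(round(pregame_pressure(11.90), 2))  # -> 12.9 psig (Master) at 67 deg F
```

The subsequent conversion of 12.90 psig (Master) to 13.26 psig (Logo) uses Exponent’s gauge conversion formula, whose coefficients are not reproduced here.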

Similarly, the y-intercept of the digitized Patriot dry transient in Figure 27 is ~11.43 psig. The pre-game pressure (Master) at 67 deg F required to yield this pressure at 48 deg F is 12.47 psig (Master), equivalent to 12.81 psig on the Logo scale according to Exponent’s conversion formula, ~0.31 psi greater than the 12.5 psig (Logo) said to have been used. A downward translation of the Patriot dry transient by this difference of 0.31 psi yields the 12.5 psig transient shown in thicker red (together with a wet transient a further 0.45 psi lower in dashed red). **Note that the Figure 27 Patriot transient is consistent with calculation at 12.5 psig initialization using the Master Gauge, but not with 12.5 psig initialization using the stated Logo Gauge.**

**Figure 25 – Simulations with Non-Logo Initialization**

For comparison, the corresponding calculations for the Non-Logo simulations (Figure 25) are analysed identically below. In this case, the transients in Figure 25 are consistent with Colt calculations based on 13.1 psig (Non-Logo; 13.05 Master), as opposed to the stated 13.0 psig, and Patriot calculations based on 12.66 psig (Non-Logo; 12.61 Master), as opposed to the stated 12.5 psig, with re-calculated transients translated downwards by a corresponding amount as shown below.

*Figure 2. Annotation of Wells Report Figure 25 (Non-Logo) as explained in text.*

The wet differentials for both Colts and Patriots in this figure (~0.2 psi) are considerably less than the wet differential shown in the Logo diagram (~0.45 psi). For reference, wet differentials (dashed lines) using this larger differential are also shown in this figure. As in Figure 1, the average Non-Logo half-time measurements converted to Master scale are shown (solid dots) at plausible times.

**Discussion**

Exponent clearly stated that its Figure 27 simulations were initialized with the Logo Gauge rather than the Master Gauge:

In recognition of the remaining uncertainty as to which gauge was used to measure the footballs pre-game and in the interest of completeness, similar tests were run using the Logo Gauge.

The Logo Gauge was used to set the pressure of two balls to 12.50 psig (representative of the Patriots) and two balls to 13.00 psig (representative of the Colts). From each set (corresponding to each team), one ball remained dry while exposed to the game temperature and the other was wet.

The above Figure 1 shows that this simply isn’t possible. The correctness of this observation can be confirmed by recalling that initialization at 67 deg F and 12.5 psig yielded half-time pressures at 48 deg F ambient of ~11.5 psig, a value mentioned on several occasions in the Wells Report and observable in Table 10. The Patriot dry transient in Figure 27 is consistent with initialization using the Master Gauge, but not with initialization using the Logo Gauge, as stated and as was supposedly the point of the comparison.

This seems like a botch, rather than a bodge, on Exponent’s part and, if so, ought to require a corrigendum, if not retraction.

These differences of 0.3 psi may not seem like much but, since the amount in dispute is only ~0.3 psi, they are highly material. Also note that any lowering of the transients typically increases the time window in which observations are consistent with transients. Using 71 deg F initialization, all transients move lower still.

**Postscript: Conversion to Master Gauge**

Before comparing “models” to observations, there are some thorny instrumentation problems that need to be addressed first. Not only was the Logo Gauge biased relative to the Non-Logo Gauge, but it experienced very large drift during Exponent’s studies. Reading between the lines of the report, it looks like they didn’t think about intra-study drift until after it had occurred, very much complicating and somewhat, in my opinion, compromising their conversion of gauge readings to correct (Master Gauge) pressures.

As a preamble, the gap between Logo and Non-Logo gauge measurements at half-time and post-game averaged ~0.38 psi, with no discernible trend between 10.5 and 13.25 psig – see the black “+” signs in the figure at right below (overplotted onto an excerpt from Wells Report Figure 12). This ~0.38 psi Logo bias can be seen in Figure 11 of the Wells Report (see excerpt in left panel below) as the bias at 13 psi in the first (Initial – lightblue) tests; the corresponding test for the Non-Logo Gauge showed no bias. The left panel diagram shows that the Logo Gauge bias had increased to ~0.75 psi by the time of the Final test, with the Non-Logo Gauge by then biased low by ~0.15 psi.

Figure 12 of the Wells Report (excerpt in right panel) shows two calibrations (V1, V2) of the Logo and Non-Logo gauges over a range of pressures. Both the V1 and V2 calibrations show considerably larger differences between Logo and Non-Logo readings than were actually observed on Game Day, and thus even the V1 tests appear already to reflect drift from the Game Day readings.

Exponent presented formulas for conversion from the Logo and Non-Logo scales to the Master scale, with the formulas said to have been calculated from an “early” set of measurements. The differences from the Master scale arising from these formulas are shown as the dotted black (Logo) and dotted green (Non-Logo) lines in the right panel. The gap between the lines given by the formulas is noticeably less than the consistent ~0.38 psi gap observed on Game Day, strongly suggesting that some drift had already taken place by the time that Exponent calculated its calibration formulas. The failure of their conversion formulas to preserve this observed difference is a real count against this aspect of their technical work and a frustration to comparative analysis.

*Figure 3. Left – excerpt from Wells Report Figure 11, showing Logo and Non-Logo gauge readings in successive tests: note drift upwards of Logo Gauge. Right – excerpt from Wells Report Figure 12. See text for further explanation.*


Exponent must have noticed the over-inflation by officials, as it is implied by the post-game measurements, but failed to report or comment on it. Their avoidance becomes all the more conspicuous because many of the texts at issue in the Wells Report pertain to an earlier incident in which NFL officials had over-inflated Patriot balls, much to Brady’s frustration and annoyance at the time.

**Post-Game Measurements Show Over-Inflation**

After the AFC Championship game, the NFL retrieved all balls. But instead of measuring all balls, they only measured 4 Patriot and 4 Colt balls, said to have been randomly sampled. Paired measurements were made using the Logo and Non-Logo gauges. Once again, they did not keep track of the time of the measurements or record the temperature of the balls at the time of measurement. Nonetheless, some information can be gleaned from the results.

The average Colt measurement using the Non-Logo scale was 12.35 psig (12.7125 Logo). These results were almost identical to the average Colt measurements at half-time (Non-Logo 12.3250; Logo 12.7375), from which one can deduce that the ball temperatures at the time of the post-game measurement were almost exactly equal to the ball temperatures at the time of the Colt half-time measurement.

Even without this information, based on a starting Colt pressure of 13 psig at a pregame temperature of 71 deg F, the Ideal Gas Law implies a ball temperature at the time of the post-game measurements of ~58.55 deg F (using Non-Logo measurements in both cases).
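This is a one-line Ideal Gas Law calculation, sketched below in Python (function names mine):

```python
# Sketch: ball temperature implied by the Colt post-game Non-Logo average of
# 12.35 psig, given initialization at 13.0 psig and 71 deg F.
ATM = 14.7  # psi; gauge pressure is relative to atmospheric

def f2k(f): return (f - 32) * 5 / 9 + 273.15
def k2f(k): return (k - 273.15) * 9 / 5 + 32

def implied_temp(p_obs, p_init=13.0, t_init_F=71):
    # constant volume and air mass: T2 = T1 * P2/P1 in absolute units
    return k2f(f2k(t_init_F) * (p_obs + ATM) / (p_init + ATM))

print(round(implied_temp(12.35), 2))  # -> 58.55
```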

It appears almost certain that the Patriot post-game measurements were taken at more or less the same time – and therefore, ball temperature – as the Colt post-game measurements. Assume that this was done within 1 deg F of the Colt post-game temperature. Applying the Ideal Gas Law, this yields a range of 13.69-13.8 psig (Non-Logo Gauge) at an equilibrium temperature of 71 deg F (the pregame temperature used in this calculation) – all values that are above the NFL maximum of 13.5 psig. (Measurements using the biased Logo Gauge would have read ~0.38 psi higher.)

This surprising “violation” of NFL rules by the league’s own referees and officials surely merited comment in Exponent’s report. However, Exponent completely evaded the topic, not even reporting the post-game measurements (these were reported only by the lawyers in the text of the Wells Report itself). Exponent excused this omission on the supposed grounds that post-game information was “significantly less certain” than the corresponding pre-game information:

Based on information from Paul, Weiss, we understand that shortly after the end of the AFC Championship Game, four Patriots footballs and four Colts footballs were also measured by the two game officials who had conducted the halftime tests, using the same two gauges used at halftime. Although we understand that these measurements were also recorded in writing, information concerning the timing of these measurements, the pressure levels at which these eight footballs started the second half and the identity of the four Colts footballs tested after the game (specifically, whether they were the same footballs that had been tested at halftime) was significantly less certain, especially as compared with the information about similar issues concerning the pre-game period. As a result, we did not believe that the post-game measurements provided a scientifically reasonable basis on which to conduct further analysis.

The Wells Report itself regurgitated this piffle from Exponent almost verbatim as follows:

Although these measurements were recorded in conditions similar to those present during halftime, information concerning the timing of these measurements, the pressure levels at which these eight footballs started the second half and the identity of the four Colts footballs tested after the game (specifically, whether they were the same footballs that had been tested at halftime) is significantly less certain than the information about similar issues concerning the pre-game or halftime periods. As a result, our experts concluded that that the post-game measurements did not provide a scientifically reasonable basis on which to conduct a comparative analysis similar to that performed using the pre-game and halftime measurements.

However, none of these arguments withstands scrutiny. The four post-game Colt footballs were said to have been randomly chosen, and their identity with the half-time footballs would only matter if the four Colt half-time footballs had not been randomly chosen – a topic that the Wells Report does not discuss. Nor does it matter at what pressure level the Colt footballs started the second half.

Nor, especially, is the post-game information “significantly less certain” than pre-game information. There are paired post-game measurements for each ball, from which one can reasonably deduce the actual pressure and temperature at the time of measurement. In contrast, there are no paired pre-game measurements and one is forced to guess which gauge was used. If the post-game information was insufficient to provide a “scientifically reasonable basis on which to conduct further analysis”, then all the more so for pre-game.

**Previous NFL Over-Inflation in the October 2014 Patriot-Jet Game**

Curiously, a considerable portion of the Wells Report is devoted to Patriot texts about a previous incident of referee over-inflation.

According to texts in the Wells Report, the referees of a Thursday night game between the Patriots and Jets on October 16, 2014 had over-inflated some (or more) of the footballs to nearly 16 psig, far above the maximum of 13.5 psig. Brady noticed the over-inflation and complained angrily to Patriot equipment manager John Jastremski during the game. During half-time, Jastremski texted an unidentified recipient about Brady’s complaints and, like Climategate correspondent Raymond Bradley rolling his eyes at Michael Mann’s whining, said that he was “ready to vomit”:

Jastremski: Tom is acting crazy about balls… Ready to vomit.

Recipient: K … He saying there[they’re] not good enough??

Jastremski: Tell later.

The Patriots almost lost to an abysmal Jet team, winning only because of a blocked field goal at the end of the game. The next morning, Jastremski checked the balls and determined that Brady’s complaints had been justified. He texted his fiancee (“Panda”) around 8:05 a.m.:

Ugh. Tom was right… I just measured some of the balls. They supposed to be 13 lbs [psi]. They were like 16 [psi]. Felt like bricks.

Jastremski also texted Jim McNally, the part-time Patriot employee responsible for the officials’ room and subsequently accused of deflating balls in a washroom prior to the AFC Championship game, that the referees had over-inflated some of the balls to nearly 16 psi, adding the additional information that the referees had apparently pumped additional air into the footballs, but not deflated them with the gauge to regulation pressure:

Jastremski: I checked some of the balls this morn… The refs fucked us…a few of then were at almost 16… They didnt recheck then after they put air in them

Such “rechecking” is required because, practically, footballs are inflated by pump above the target pressure and air is then released by gauge needle to get to the desired pressure. The procedure was described for the AFC Championship pregame in the Wells Report as follows:

According to Anderson, two of the game balls provided by the Patriots measured below the 12.5 psi threshold. Yette used the air pump provided by the Patriots to inflate those footballs, explaining that he “purposefully overshot” the range (because it is hard to be precise when adding air), and then gave the footballs back to Anderson, who used the air release valve on his gauge to reduce the pressure down to 12.5 psi.

Ironically, by testing the balls after they had arrived at equilibrium, Jastremski had carried out an elementary test either neglected or not reported by the NFL in respect to the intercepted ball (which had not been reflated at half-time and which had been kept in NFL possession) or in respect to the Colt balls (whose exact opening pressure with known gauges was not known.)

Brady’s complaints about over-inflation led to several ragging exchanges between McNally, who evidently felt unappreciated by Brady, and Jastremski during the following week. These texts constitute the majority of the texts adduced as supposed evidence in the Wells Report, though none of them ever refer to tampering with footballs after official inspection, let alone Brady condoning such tampering.


*Figure 1. Transients as digitized from Figures 25 and 27, converted to temperature transients using the Ideal Gas Law. Red – Patriot; blue – Colt; thick – dry; thin – wet; solid – Logo; dashed – Non-Logo. Simulations shown as open circles: large – Logo 67 deg F initialization; small – Non-Logo 71 deg F initialization. Observed averages: solid circle – Non-Logo; + – Logo.*

I’ve shown the opening 48 deg F value as a black triangle for reference.

The dry and wet transients of Figures 25 and 27 were digitized and fitted with negative exponentials to an asymptote. For each permutation (pregame Colt pressure 13 psig, Patriot 12.5 psig; Logo bias of 0.38 psi; pregame 71 deg F for Non-Logo; 67 deg F for Logo), the moles of air contained in the football at pregame initialization were calculated. From the prescribed moles, the corresponding temperature transients are calculable under the Ideal Gas Law. There is no reason why the temperature transients for dry footballs (thick lines) should differ, and yet there is about a 5 deg F range. Theoretically, it seems that the temperature transients should start at 48 deg F, but the fitted curves all start higher. Curiously, the range for wet footballs (thin lines) is much narrower. I don’t exclude the possibility of an error in my calculations, as these calculations are not routine for me, but I’ve checked carefully and they seem right.
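The moles/temperature conversion can be sketched in Python as follows (a simplified stand-in for the R script in the Appendix; note it uses 273.15 K for 0 deg C where the R script uses 273, so values can differ slightly in the second decimal):

```python
# Sketch of the pV = nRT bookkeeping: fix the moles of air from the pregame
# initialization, then convert any transient pressure to a temperature.
R_GAS = 8.31441    # gas constant, J K^-1 mol^-1
V_BALL = 0.004237  # football volume, m^3 (4237 cm^3)
PSI = 6894.76      # Pa per psi
ATM = 14.7         # psi; gauge pressure is relative to atmospheric

def f2k(f): return (f - 32) * 5 / 9 + 273.15
def k2f(k): return (k - 273.15) * 9 / 5 + 32

def moles(p_psig, t_F):
    """Moles of air in the ball at pressure p_psig and temperature t_F."""
    return PSI * (p_psig + ATM) * V_BALL / (R_GAS * f2k(t_F))

def temp_from_pressure(p_psig, n):
    """Invert pV = nRT for temperature (deg F) at fixed moles."""
    return k2f(PSI * (p_psig + ATM) * V_BALL / (n * R_GAS))

n_colt = moles(13.0, 71)  # Colt Non-Logo initialization
print(round(n_colt, 4))                             # -> 0.3301 mol
print(round(temp_from_pressure(11.80, n_colt), 1))  # -> 48.0 (pressure at ambient)
```

With this bookkeeping, any fitted transient pressure maps directly to a temperature, which is how the digitized curves were placed on the temperature axis above.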

Next, I’ve converted the simulation results (Non-Logo 71 deg F initialization – large circles; Logo 67 deg F initialization – small circles) to temperature. Again, there is no reason why there should be any difference in temperature transients, but there is.

Finally, I’ve plotted the observed half-time average measurements at plausible average measurement times (Patriot 3.75 minutes; Colt 8 minutes) based on Logo initialization (+) and Non-Logo initialization (solid dot). Note that these calculations do not depend on which gauge was used for **half-time** measurement, only which gauge was used for pre-game initialization.

From the figure, the implied temperature of Patriot balls at half-time based on Logo initialization and of Colt balls at half-time based on Non-Logo initialization both correspond well with simulation and transient calculations and with each other, while the implied temperature of Patriot balls based on Non-Logo initialization and Colt balls based on Logo initialization are both inconsistent with the other information.

I think that this single figure encapsulates all of the relevant information in Figures 24, 25, 26, 27, 29 and 30 in a single location.

**Appendix**

I’ve used the following scripts to produce the above figure. In this iteration, I didn’t use Exponent’s conversion formulas to Master Gauge scale, as the failure of these formulas to conserve the differential between gauges troubles me.

```r
#IDEAL GAS LAW RELATIONS pV=nRT
pascal = 6894.76   #pascals per 1 psi
R = 8.31441        #gas constant J K-1 mol-1
Vf = .004237       #m3 volume of football (4237 cm3)
cent = function(x) (x-32)*5/9
fahr = function(x) 9/5*x + 32
coltp = 13.0
bias = .38         #between Logo and Non-Logo gauges

#function to calculate moles from pregame pressure (psig) and temperature (deg F)
Nx = function(targetp, pregame=71) (6894.76*(14.7+targetp))*Vf/(R*(273+cent(pregame)))
Nc = (6894.76*(14.7+coltp))*Vf/(R*(273+cent(71))); Nc  # .3308858
Nplogo = Nx(12.5-bias)  #Patriot Logo initialization (definition missing from original listing)

#function to calculate pressure in psig from temperature in deg F and moles
Pf = function(tem, n=Nc) (n*R*(273+cent(tem))/Vf)/6894.76 - 14.7
Pf(48)  #11.84 at 13.05

#function to calculate temperature in deg F from pressure in psig and moles
Tf = function(P, n=Nc) fahr( (Vf*((P+14.7)*6894.76))/(n*R) - 273 )
Tf(10.9567, n=Nplogo)     #47.99
Tf(11.49-bias, n=Nplogo)  #51.02595

##INPUT FIT TO DIGITIZED TRANSIENTS IN FIGURES 25 (Non-Logo) AND 27 (Logo)
#setwd("d:/2015/football")
B = read.csv("http://www.climateaudit.info/data/football/transient_coef.csv")
B$pregame = rep(c(71,67), each=4)
B$P = 12.5 + 0.5*(B$team=="Colt") - 0.38*(B$gauge=="Logo")
B$n = (6894.76*(14.7+B$P))*Vf/(R*(273+cent(B$pregame)))

##PLOT FIGURE
ind = seq(0,13.5,.25); o = F
if(o) png("implied_ball_temperatures.png", w=1080, h=810)
if(o) par(list(font.lab=2, font.axis=2, cex.lab=1.4, cex.axis=1.4, cex.main=2))
if(o) cex0=3 else cex0=2
plot(0, type="n", xlim=c(-.25,14), ylim=c(42,75), axes=F,
     xlab="Halftime Minutes", ylab="Ball Temperature (deg F)")
axis(side=1); axis(side=2, las=1); box()
#dry transients
for(k in c(1,3,5,7)) {
  f = function(x) B$A[k] + B$B[k]*exp(-B$C[k]*x)
  lines(ind, Tf(f(ind), n=B$n[k]), col=6-2*as.numeric(B$team[k]), lwd=3, lty=as.numeric(B$gauge[k]))
}
#wet transients
for(k in 1+c(1,3,5,7)) {
  f = function(x) B$A[k] + B$B[k]*exp(-B$C[k]*x)
  lines(ind, Tf(f(ind), n=B$n[k]), col=6-2*as.numeric(B$team[k]), lwd=1, lty=as.numeric(B$gauge[k]))
}
#plot observed averages converted to temperature
points(0, 48, pch=17, cex=cex0)
#Patriot
points(3.5, Tf(11.11, n=Nx(12.5-bias, pregame=71)), pch="+", col=2, cex=cex0)  #51.02595
points(3.5, Tf(11.11, n=Nx(12.5, pregame=71)), pch=19, col=2, cex=cex0)
#Colt ('foot' is the half-time measurement data frame from an earlier post)
points(8, Tf(mean(foot$Nonlogo[12:15]), n=Nx(coltp-bias, pregame=71)), pch="+", col=4, cex=cex0)
points(8, Tf(mean(foot$Nonlogo[12:15]), n=Nx(coltp, pregame=71)), pch=19, col=4, cex=cex0)
#plot simulations converted to temperature ('sim' is the simulation summary data frame)
points(sim$Pat_avg, Tf(sim$Pat_pressure, n=Nx(12.5-bias*(sim$Case=="Logo"), sim$Pregame_T)),
       pch=1, cex=2*as.numeric(sim$Case), col=2)
points(sim$Colt_avg, Tf(sim$Colt_pressure, n=Nx(coltp-bias*(sim$Case=="Logo"), sim$Pregame_T)),
       pch=1, cex=2*as.numeric(sim$Case), col=4)
legend("topleft", lty=c(1,2), lwd=c(3,3), legend=c("Logo","Non-Logo"), cex=2)
legend("bottomright", fill=c(2,4), legend=c("Patriot","Colt"), border=F, cex=2)
title("Implied Half-time Ball Temperatures")
if(o) dev.off()
```


The findings depend on the interpretation of statistical data by decision-makers – a topic that interests me. I found the technical report by Exponent, Wells’ technical consultants, to be very unsatisfactory on numerous counts:

- although they were reported by Wells to have considered “all permutations”, they hadn’t. On important occasions, they omitted highly plausible possibilities that indicated no tampering and, on other occasions, they only considered assumptions that were most adverse to the Patriots;
- on key occasions, it seemed to me that Exponent failed to properly characterize exculpatory results.

At the end of my analysis, I concluded that their key technical findings were simply incorrect and wrote up my analysis, now online here.

I watched both the AFC championship and the final. I have no fan commitment to the Patriots. As someone who’s played sports all his life and whose play has always been rushed, I am amazed at how time seems to stand still for great athletes such as Brady.

The summary is as follows.

**Summary of Analysis of Wells Report**

The conclusions of the Wells Report ultimately depend on statistical and technical analysis carried out by Exponent, their technical consultants. The original problem, as framed by Exponent, was whether the observed pressure drop of Patriot balls could be explained by physical or environmental factors, including temperature changes and selection of pregame gauges:

We then sought to determine whether any combination of the factors listed in 7a through 7d [temperatures at pre-game, on the field and at half-time; timing of half-time measurements; wetness; pre-game gauge use] above (within ranges defined as realistic by Paul, Weiss) suggested pressure levels that matched those recorded on Game Day. If those factors could be set in such a way that the pressures suggested by the transient experiments matched the Game Day measurements, then we could conclude that the Game Day measurements could be explained by physical or environmental factors.

Exponent studied a number of permutations of factors, claiming that none of these combinations accounted for the additional loss of air pressure in Patriot balls or the difference in pressure loss in respect to Colt balls:

Exponent concluded that, within the range of likely game conditions and circumstances studied, they could identify no set of credible environmental or physical factors that completely accounts for the Patriots halftime measurements or for the additional loss in air pressure exhibited by the Patriots game balls, as compared to the loss in air pressure exhibited by the Colts game balls. Dr. Marlow agreed with this and all of Exponent’s conclusions. This absence of a credible scientific explanation for the Patriots halftime measurements tends to support a finding that human intervention may account for the additional loss of pressure exhibited by the Patriots balls.

In this article, I show that these factors can, in fact, be set “in such a way that the pressures suggested by the transient experiments matched the Game Day measurements” as follows:

- Pre-game temperature around 71 deg F
- Logo measurement of Patriot balls and Non-Logo measurement of Colt balls

It is therefore possible to unequivocally say that the “Game Day measurements could be explained by physical or environmental factors”, contradicting the key technical finding of the Wells Report. The corollary is that the Wells Report provides no technical basis for concluding that the Patriot balls had even been out of compliance with NFL regulations during the AFC Championship.

In previous discussions of the Wells Report, Prof MacKinnon and Hassett et al identified the important possibility that referee Anderson had not used the same gauge for pre-game measurements of both teams – an inconsistency that also occurred in the half-time measurements under the supervision of NFL Executive Vice President Vincent. The present article extends their work to include analysis of Exponent’s simulations and transients, showing that all relevant issues raised in the Wells Report can be fully explained by “physical and environmental factors”.

The Wells Report also revealed remarkable chaos and inefficiency in NFL protocols and procedures, even in connection with half-time measurements under the additional scrutiny of NFL Executive Vice President Vincent and other senior NFL officials. Had their protocols met reasonable standards, much, if not most, of the present, seemingly false, controversy could have been avoided.


I’ve now had a new paper published by Climate Dynamics, here, that uses an essentially identical method to Lewis (2014) but with updated, higher quality data. A copy of the accepted version is available on my web page, here.

As in many climate sensitivity studies, the method involves comparing observationally-based and model-simulated temperature data at many differing settings of model parameters; here a simple global energy balance model (EBM) with a diffusive ocean is used. But, unusually, surface temperature observational data were not used directly. Like Lewis (2014), my new paper uses observationally-constrained estimates of global mean warming attributable purely to greenhouse gases, separated using detection and attribution methods from temperature changes with other causes, and treats them as “observable” data. Effective heat capacity, the ratio of ocean etc. heat uptake to the change in global mean surface temperature (GMST), is used as a second observable. It is estimated using the AR5 planetary heat uptake estimates spanning 1958–2011 and HadCRUT4v2 GMST data.
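
Reduced to essentials, the effective heat capacity estimate is just the slope of planetary heat content against GMST. A minimal numerical sketch, using synthetic placeholder series rather than the AR5 heat uptake or HadCRUT4v2 data:

```python
import numpy as np

# Sketch of the effective heat capacity (EHC) estimate: the slope of
# planetary heat content regressed against GMST. Both series here are
# synthetic placeholders, not the real AR5 or HadCRUT4v2 data.
rng = np.random.default_rng(0)
gmst = np.linspace(0.0, 0.6, 54)                      # K, synthetic 1958-2011 warming
heat = 16.0 * gmst + rng.normal(0.0, 0.2, gmst.size)  # W yr m^-2, true slope 16

ehc, _ = np.polyfit(gmst, heat, 1)  # EHC in W yr m^-2 K^-1
print(round(ehc, 1))
```

The regression form is preferable to a simple end-point ratio because it uses all years of data rather than just the first and last.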

Detection and attribution studies involve coupled 3D global climate model (GCM) simulation runs with different categories of forcing. They use multiple-regression techniques to estimate what scaling factors to apply to the GCM-simulated spatiotemporal temperature response patterns for the various categories of forcing in order to best match their sum with observational data. The scaling factors (being the regression coefficients) adjust for the GCM(s) under- or over-estimating the responses to the various categories of forcing and/or the forcing strengths. So the estimates of GHG-attributable warming they produce, used as input data in my study, are fully constrained by gridded observational temperature records. This approach is potentially better able to isolate aerosol forcing, the biggest cause of uncertainty when estimating ECS from warming over the instrumental period, than methods using low dimensional models. I used estimates from the same multimodel detection and attribution studies that underlay the main anthropogenic attribution statements in the IPCC fifth assessment Working Group 1 report (AR5), Gillett et al (2013) (open access) and Jones et al (2013), based on their longest analysis periods (respectively 1861-2010 and 1901-2010).
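
The regression at the heart of detection and attribution can be sketched in toy form. The patterns and scaling factors below are invented, and ordinary least squares stands in for the optimal fingerprint machinery (which weights by a noise covariance estimated from control runs):

```python
import numpy as np

# Toy version of the detection-and-attribution regression: find scaling
# factors beta_i so that sum_i beta_i * pattern_i best matches the
# observations. Real studies use optimal fingerprinting with a noise
# covariance from control runs; plain least squares is used here only
# to show the structure. All patterns and values are invented.
rng = np.random.default_rng(1)
n = 100
ghg_pattern = np.linspace(0.0, 1.2, n)                # GCM-simulated GHG response
oth_pattern = 0.3 * np.sin(np.linspace(0.0, 3.0, n))  # other-forcings response

# Synthetic "observations" built with true scaling factors 0.8 and 1.1
obs = 0.8 * ghg_pattern + 1.1 * oth_pattern + rng.normal(0.0, 0.05, n)

X = np.column_stack([ghg_pattern, oth_pattern])
beta, *_ = np.linalg.lstsq(X, obs, rcond=None)
ghg_attributable = beta[0] * ghg_pattern[-1]  # scaled GHG-attributable warming
print(beta.round(2))
```

Here a scaling factor below one for the GHG pattern would mean the model had overestimated the GHG response, and the GHG-attributable warming is the model pattern scaled by the regression coefficient.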

The abstract from my paper reads as follows:

Equilibrium Climate Sensitivity (ECS) is inferred from estimates of instrumental-period warming attributable solely to greenhouse gases (AW), as derived in two recent multi-model Detection and Attribution (D&A) studies that apply optimal fingerprint methods with high spatial resolution to 3D global climate model simulations. This approach minimises the key uncertainty regarding aerosol forcing without relying on low-dimensional models. The “observed” AW distributions from the D&A studies together with an observationally-based estimate of effective planetary heat capacity (EHC) are applied as observational constraints in (AW, EHC) space. By varying two key parameters – ECS and effective ocean diffusivity – in an energy balance model forced solely by greenhouse gases, an invertible map from the bivariate model parameter space to (AW, EHC) space is generated. Inversion of the constrained (AW, EHC) space through a transformation of variables allows unique recovery of the observationally-constrained joint distribution for the two model parameters, from which the marginal distribution of ECS can readily be derived. The method is extended to provide estimated distributions for Transient Climate Response (TCR). The AW distributions from the two D&A studies produce almost identical results. Combining the two sets of results provides best estimates [5–95% ranges] of 1.66 [0.7 – 3.2] K for ECS and 1.37 [0.65 – 2.2] K for TCR, in line with those from several recent studies based on observed warming from all causes but with tighter uncertainty ranges than for some of those studies. Almost identical results are obtained from application of an alternative profile likelihood statistical methodology.

The posterior probability density functions (PDFs) for the two ECS estimates are shown in Figure 1. The exact match of best estimates and uncertainty bounds using the alternative frequentist profile likelihood method confirms that the objective Bayesian method used provides frequentist probability-matching.

**Figure 1**. The box plots indicate boundaries, to the nearest grid value, for the percentiles 5–95 (vertical bar at ends), 17–83 (box-ends), and 50 (vertical bar in box), and allow for off-graph probability lying between *S* = 5 K and *S* = 20 K. Solid line box plots reflect the percentile points of the CDF corresponding to the plotted PDF. Dashed line box plots give confidence intervals derived using the SRLR profile likelihood method (the vertical bar in the box showing the likelihood profile peak).

The revised best (median) estimate for ECS in Lewis (2014) using the objective Bayesian approach, after correcting data handling errors, was 2.2°C. It seems likely that estimate was biased high by the use of temperature data spanning just the 20th century, which started with two anomalously cool decades.

The new study’s best estimate for ECS is almost identical to that of 1.64°C obtained in Lewis and Curry (2014). That study used a simple single-equation energy budget model to compare, between periods spanning 1859–2011, the rise in GMST with forcing and heat uptake estimates given in AR5. As it relied on the expert assessment of aerosol forcing given in AR5, which spans a very wide range, the ECS estimate upper uncertainty bound was higher, at 4.05°C, than in my new study.

The ECS estimate in my new study is also very similar to that in Lewis (2013). That study compared the evolution of surface temperatures in four latitude zones with simulations spanning 1860–2001 by the MIT 2D global climate model (GCM). Many simulations were performed with differing parameter settings and hence varying model values of equilibrium/effective climate sensitivity (ECS), ocean effective vertical diffusivity (*K*_{v}) and aerosol forcing – which can be tightly constrained when zonal rather than GMST data is used. The parameter combination that best fitted the observational data gave a median estimate for ECS of 1.64°C. With non-aerosol forcing etc. uncertainties adequately allowed for, the 5–95% uncertainty range was 1.0–3.0°C.

Figure 2 shows posterior PDFs for the two TCR estimates from my new study. The best estimates are within 0.05°C of each other. Their average is 1.37°C, with a 5–95% range of 0.65–2.2°C. This is within a few percent of the best estimates for TCR in Lewis and Curry (2014), and of those given in Otto et al (2013), of which I was a co-author alongside fourteen AR5 lead authors.

**Figure 2**. Estimated marginal PDFs for transient climate response derived, upon integrating out *K*_{v}, using the transformation of variables method. The box plots indicate boundaries, to the nearest grid value, for the percentiles 5–95 (vertical bar at ends), 17–83 (box-ends), and 50 (vertical bar in box). They reflect the percentile points of the CDF corresponding to the plotted PDF.


The article also states, paraphrasing rather than quoting, “Lewis had used an extremely rudimentary, some would even say flawed, climate model to derive his estimates, Stevens said.” LC14 used a simple energy budget climate model, described in AR5 WG1, to estimate equilibrium climate sensitivity (ECS) from estimates of climate system changes over the last 150 years or so. An essentially identical method was used to estimate ECS in Otto et al (2013), a paper of which Bjorn Stevens was an author, along with thirteen other AR5 WG1 lead authors (and myself). Energy budget models actually estimate an approximation to ECS, effective climate sensitivity, not ECS itself, which some people may regard as a flaw. AR5 WG1 states that “In some climate models ECS tends to be higher than the effective climate sensitivity”; this is certainly true. Since the climate system takes many centuries to equilibrate, it is not known whether or not this is the case in the real climate system. LC14 discussed the issues involved in some detail, and my Climate Audit blog post referred to estimating “equilibrium/effective climate sensitivity”.

I sent Bjorn Stevens a copy of the above wording and he has responded, saying the following:

“Dear Nic,

because I have reservations about estimates of ocean heat uptake used in the ‘energy-balance approaches’, and because of a number of issues (which you allude to) regarding differences between effective climate sensitivity estimates from the historical record and ECS, I am not ready to draw the inference from my study that ECS is low. That said, I do think what you write in the two paragraphs above is a fair characterization of the situation and of your important contributions to the scientific debate. The Ringberg meeting also made me confident that the open issues are ones we can resolve in the next few years.

Feel free to quote me on this.

Best wishes, Bjorn”

*Update 26 April 2015*

Gayathri Vaidyanathan tells me that the article has been changed at ClimateWire. Certainly, the title has been changed, and I presume the text has been amended per the version she sent me, which no longer suggests misinterpretation. But Scientific American is still showing the original version, so the situation is not very satisfactory.

*Update 28 April 2015*

The text of the article has now been changed at Scientific American, although the title is unaltered. The sentence referring to misinterpretation now reads “Stevens’ paper was analyzed by Nic Lewis, an independent climate scientist.*” At the foot of the article is the note:

“* Correction: A previous version of this story did not accurately reflect Lewis’ work. Lewis used Stevens’ study in an analysis that was used by some media outlets to throw doubt on global warming.*“


In Part 1 I introduced the talk I gave at Ringberg 2015, explained why it focussed on estimation based on warming over the instrumental period, and covered problems relating to aerosol forcing and bias caused by the influence of the AMO. In Part 2 I dealt with poor Bayesian probabilistic estimation and summarized the state of observational, instrumental period warming based climate sensitivity estimation. In this third and final part I discuss arguments that estimates from that approach are biased low, and that GCM simulations imply ECS is higher, partly because in GCMs effective climate sensitivity increases over time. I’ve incorporated one new slide here to help explain this issue.

**Slide 19**

I’ll start with an easy target: claims that reduced instrumental period warming based ECS estimates that have been published over the last few years reflected the hiatus in warming over the last decade. Such claims are demonstrably false. The main effect of using data extending past 2000 is to provide better constrained ECS estimates, as the anthropogenic signal rose further above background noise.

Most recent studies that give results using data for different periods actually show lower, not higher, ECS median estimates when data extending only to circa 2000 is used. Skeie 2014 is an exception. I attribute this to the less strong observational constraints available from such data being unable to counteract its excessively negative aerosol forcing prior distribution.

**Slide 20**

Now for some genuine issues. First, in a 2014 paper Drew Shindell argued that inhomogeneous forcing, principally by aerosols, with a greater concentration in the northern hemisphere – particularly in the extratropics – than homogeneous GHG forcing, would have a greater effect on transient global warming. That is principally because the northern hemisphere has more land and warms more rapidly. That aerosol forcing reached a peak level some time ago, unlike for GHGs, also contributes to the effect. The result would be that TCR, and hence ECS, estimates based on observed global warming were biased down.

I think there is in principle something in Shindell’s argument, but I regard his GCM-based estimate of the magnitude of the bias as absurdly high. Based on a simple model and observational constraints as to the ratios of transient warming for various latitude zones, I obtain a best estimate for the bias of no more than about 5%. It would be difficult to reconcile a significant bias with estimates from the non-energy budget ‘good’ studies being in line with energy-budget based estimates. Good non energy-budget studies should be unaffected by this issue due to their use of models that resolve forcing and temperature by hemisphere, and within each hemisphere by latitude zone and/or land vs ocean.

In his Ringberg talk, Gavin Schmidt stated that in the GISS-E2-R AOGCM, the transient responses (over ten years) to aerosol forcing and land use were respectively 1.33x and 3.65x as large as that to GHG forcing. From this he deduced that TCR and ECS estimated from the model’s historical run were biased low by about 20% and 30% respectively. Picking (over-high) median estimates based on historical period unadjusted forcing, of 1.6°C for TCR and 1.9°C for ECS, he claims that these go up by respectively 35% and 60% when adjusted for forcing-specific ‘transient efficacy’.

I am at a loss to understand how the diagnosed increases of 20% and 30% turned into claimed increases of 37% and 63% – maybe this was achieved by using uniform priors. Moreover, the very large estimated land use forcing transient efficacy shown in Gavin Schmidt’s slide is based on an unphysical regression line that implies a very large GMST increase with zero land use forcing. In view of these oddities the findings shown seem questionable.

If, despite my doubts, the results Gavin Schmidt presented are correct for the GISS-E2-R model, they would support Drew Shindell’s argument in relation to that model. But it would not follow that similar biases arise in other models or in the real world. I am aware of only two other AOGCMs for which transient efficacies have likewise been diagnosed using single-forcing simulations (Shindell 2014 used the standard CMIP5 simulations, which is much less satisfactory). One of those models shows a significantly lower transient efficacy for aerosol forcing than for GHG (Ocko et al 2014), behaviour that implies TCR and ECS estimates based on historical warming would be biased up, not down. The other model also appears to show that behaviour, albeit based only on preliminary analysis.

In the light of the available evidence, I think it very doubtful that aerosol and land use forcing have caused a significant downwards bias in observationally-based estimation of TCR or ECS.

The next two bullet points in slide 20 concern arguments that the widely-used HadCRUT4 surface temperature dataset understates the historical rise in GMST. However, over the satellite era, for which lower troposphere temperature estimates with virtually complete coverage are available, HadCRUT4 shows a larger global mean increase than does UAH and, even more so, RSS. It seems quite likely that upward biases arising from land surface changes (UHI, etc.) and the destabilisation of the nocturnal boundary layer (McNider et al 2012) exceed any downwards bias resulting from a deficit of coverage in the Arctic.

For land surface changes, AR5 gives a negative best estimate for albedo forcing but states that overall forcing is as likely positive as negative. On that basis it is inappropriate to include negative land surface forcing values when estimating TCR and ECS from historical warming. Those studies (probably the majority) which include that forcing will therefore tend to slightly overestimate TCR and ECS.

The final point in this slide concerns the argument, put quite strongly at Ringberg (e.g., see here) that climate feedback strength declines over time, so that ECS – equilibrium climate sensitivity – exceeds the effective climate sensitivity approximation to it estimated from changes in GMST, forcing and radiative imbalance (or its counterpart, ocean etc. heat uptake) over the instrumental period. As explained in Part 1, in many but not all CMIP5 models global climate feedback strength declines over time, usually starting about 20-30 years after the (GHG) forcing is imposed. I address this issue in the next slide.

**Slide 21**

As running AOGCMs to equilibrium takes so long, their ECS values are generally diagnosed by regressing their top of atmosphere (TOA) radiative imbalance N – the planetary heat absorption rate – on dT, their rise in GMST, during a period of, typically, 150 years following a simulated abrupt quadrupling in CO_{2} concentration. The regression line in such a ‘Gregory plot’ is extrapolated to N = 0, indicating an equilibrium state. ECS is given by half the dT value at the N = 0 intercept. That is because CO_{2} forcing increases logarithmically with concentration, and a quadrupling equates to two doublings.
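
A minimal numerical sketch of this diagnosis, using an invented abrupt-4xCO2 run with constant feedback (forcing 7.4 W m⁻² and feedback parameter 1.0 W m⁻² K⁻¹, so the true ECS is 3.7 K; the numbers are illustrative, not from any particular model):

```python
import numpy as np

# Synthetic abrupt-4xCO2 'Gregory plot': regress TOA imbalance N on the
# GMST rise dT, extrapolate to N = 0, and halve the intercept temperature
# (a quadrupling is two doublings). Forcing and feedback values are
# illustrative, not taken from any particular AOGCM.
rng = np.random.default_rng(2)
F = 7.4                                    # 4xCO2 forcing, W m^-2
lam = 1.0                                  # feedback parameter, W m^-2 K^-1
years = np.arange(1, 151)
dT = (F / lam) * (1.0 - np.exp(-years / 30.0))       # warming trajectory, K
N = F - lam * dT + rng.normal(0.0, 0.1, years.size)  # TOA imbalance, W m^-2

slope, intercept = np.polyfit(dT, N, 1)  # fit N ~= intercept + slope * dT
dT_equilibrium = -intercept / slope      # dT at which N = 0
ecs = dT_equilibrium / 2.0               # halve: 4xCO2 = two doublings
print(round(ecs, 2))
```

With constant feedback the plot is a straight line and the regression window does not matter; the next slides concern what happens when it is curved.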

Slide 21, not included in my Ringberg talk, illustrates the potential bias in estimating ECS from observed warming over the instrumental period. It is a Gregory plot for the MPI-ESM-LR model (chosen in honour of the Ringberg hosts). The grey open circles show annual mean data, that closest to the top LH corner being for year 1. The magenta blobs and line show pentadal mean data, which I have used to derive linear fits (using ordinary least squares regression). The curvature in the magenta line (a reduction in slope after about year 30) indicates that climate feedback strength (given by the slope of the line) is decreasing over time.

CMIP5 model ECS values given in AR5 were based on regressions over all 150 years of data available, as for the blue line in the slide. I have compared ECS values estimated by regressing over years 21-150 (orange line), as in Andrews et al (2014), with ECS values estimated from the first 35 years (green line). Since the growth in forcing to date approximates to a 70-year linear ramp, and at the end of a ramp the average period since each year’s increase in forcing is half the ramp period, 35 years from an abrupt forcing increase is fairly representative of the observable data. As can be seen, the ECS estimate implied by the orange years 21-150 regression line is higher than that implied by the blue year 1-150 regression line, which in turn exceeds that implied by the green years 1-35 regression line. This indicates an increase over time in effective climate sensitivity.
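
The window dependence can be reproduced with a synthetic run whose Gregory plot is curved: here a two-timescale response, a fast strong-feedback mode plus a slow weak-feedback mode, with all parameters invented for illustration rather than fitted to any model:

```python
import numpy as np

# Curved synthetic Gregory plot: a fast mode with strong feedback plus a
# slow mode with weak feedback. Regressing over different year windows
# then yields different diagnosed ECS values. Parameters are invented.
years = np.arange(1, 151)
F = 7.4  # 4xCO2 forcing, W m^-2
dT = 3.0 * (1 - np.exp(-years / 4.0)) + 3.0 * (1 - np.exp(-years / 250.0))
N = F * (0.6 * np.exp(-years / 4.0) + 0.4 * np.exp(-years / 250.0))

def ecs_from_window(lo, hi):
    """Diagnose ECS by regressing N on dT over years lo..hi."""
    sel = (years >= lo) & (years <= hi)
    slope, intercept = np.polyfit(dT[sel], N[sel], 1)
    return (-intercept / slope) / 2.0  # dT at N = 0, halved for 2 doublings

ecs_1_35 = ecs_from_window(1, 35)      # early-window estimate
ecs_21_150 = ecs_from_window(21, 150)  # late-window estimate
print(round(ecs_1_35, 2), round(ecs_21_150, 2))
```

Because the slope of N against dT flattens once the fast mode has decayed, the years 21–150 regression extrapolates to a higher equilibrium temperature than the years 1–35 regression, mirroring the ordering described above.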

On average, ECS diagnosed for CMIP5 models by regressing over years 21-150 of their abrupt 4x CO_{2} Gregory plots exceeds that diagnosed from years 1-35 data by 19%. However, excluding models with a year 21-150 based ECS exceeding 4°C reduces the difference to 12%. This is fairly minor. The difference is not nearly large enough to reconcile the best estimates of ECS from observed warming over the instrumental period with most CMIP5 model ECS values. And it is not relevant to differences between observationally-based TCR estimates and generally higher AOGCM TCR values.

It is, moreover, unclear that higher AOGCM ECS values diagnosed by Gregory plot regression over years 21-150 are more realistic than those starting from year one. Andrews et al (2014) showed, by running the HadGEM2-ES abrupt 4x CO_{2} simulation for 1290 years (to fairly near equilibrium), that the ECS diagnosed for it from regressing over years 21-150 appears to be substantially excessive. The true model ECS appears to be closer to the estimate based on regressing over years 1-35, which is 27% lower.

Importantly, an increase in effective climate sensitivity over time, if it exists, is almost entirely irrelevant when considering warming from now until the final decades of this century. The extent of such warming, for a given increase in GHG levels, is closely dependent on TCR, irrespective of ECS. Even if effective climate sensitivity does increase over time, that would not bias estimation of TCR from observed historical warming. And the projected effect on warming from effective climate sensitivity increasing in line with a typical CMIP5 model would be small even over 300 years – only about 5% for a ramp increase in forcing, if one excludes HadGEM2-ES and the Australian models (two of which are closely related to it, with the third being an outlier).

**Slide 22**

Although the increase in effective climate sensitivity, due to a reduction in climate feedback strength, with time in many CMIP5 models appears to have little practical importance, at least on a timescale of up to a few centuries, finding out why it occurs is relevant to gaining a better scientific understanding of the climate system.

In a model-based study, Andrews et al (2014) linked the time-variation to changing patterns of sea-surface temperature (SST), principally involving the tropical Pacific. In current AOGCMs, after an initial delay of a few years, on a multidecadal timescale the eastern tropical Pacific warms significantly more than the western part and the tropical warming pattern becomes more El-Nino like, affecting cloud feedback.

The two LH panels in slide 22, from Tim Andrews’ paper and talk, show the CMIP5 model ensemble mean patterns of surface warming during the first 20 and the subsequent 130 years after an abrupt quadrupling of CO_{2}. The colours show the rate of local increase relative to that in GMST. It can be seen that even during the first 20 years, warming is strongly enhanced across the equatorial Pacific.

The RH panels, taken from a different paper, show observed and modelled patterns of warming over 1981–2010. The CMIP5 ensemble mean trend (bottom RH panel) shows a pattern in the tropical Pacific fairly consistent with that over the first 20 years of the abrupt 4x CO_{2} experiment, as one might expect. But the observed trend pattern (top RH panel) is very different, with cooling over most of the eastern tropical Pacific, including the equatorial part.

So observations to date do not appear consistent with the mean evolution of eastern tropical Pacific SST predicted by CMIP5 models. Given Tim Andrews’ finding that weakening of climate feedback strength over time in CMIP5 models is strongly linked to evolving eastern tropical Pacific SST patterns, that must cast considerable doubt on whether effective climate sensitivity increases over time in the real world.

**Slide 23**

There are other reasons for doubting the realism of the changing SST patterns in CMIP5 models that Andrews et al (2014) found to be linked to increasing climate sensitivity.

The strong warming in the deep tropics across the Pacific over years 21–150 is linked to positive longwave (LW) cloud feedback, which in CMIP5 models strengthens and spreads further after years 1–20. But is this behaviour realistic? In parallel with MPI’s main new CMIP6 model MPI-ESM2 (ECHAM6 plus an ocean module), Thorsten Mauritsen has been developing a variant with a LW iris, an effect posited by Dick Lindzen some years ago (Lindzen et al 2001). The slides for Thorsten Mauritsen’s Ringberg talk, which explained the Iris variant and compared it with the main model, are not available, but slide 23 comes from a previous talk he gave about this work. It shows the equilibrium position; so far only simulations by the fast-equilibrating slab-ocean version of the Iris model have been run. [Note: the related paper, Mauritsen and Stevens 2015, has now been published.]

As the top panels show, unlike the main ECHAM6/MPI-ESM2 model, the Iris version exhibits no positive LW cloud feedback in the deep tropical Pacific. And the bottom panels show that, accordingly, warming in the central and eastern tropical Pacific remains modest. This suggests that, if the Iris effect is real, any increase in effective climate sensitivity over time would likely be much lower than CMIP5 model ensemble mean behaviour implies. The Iris version also has a lower ECS than the main model, although not as low as might be expected from the difference in LW cloud feedback, as this is partially offset by a more positive SW cloud feedback.

**Slide 24**

Slide 24 lists methods of estimating ECS other than those based on observed multidecadal warming. I explained in Part 1 that I concurred with AR5’s conclusions that estimating ECS from short term responses involving solar or volcanic forcing or TOA radiation changes was unreliable, and that true uncertainty in paleoclimate estimates was larger than for instrumental period warming based estimates. That implies that combining paleo ECS estimates with those based on instrumental period warming would not change the latter very much.

I also showed, in Part 2, that the model most widely used for Perturbed Physics/Parameter Ensemble studies, HadCM3/SM3, could not successfully be constrained by observations of mean climate and/or climate change, and so was unsuitable for use in estimating ECS or TCR. (Such use nevertheless underlies UKCP09, the official UK 21st century climate change projections.)

The other main source of ECS estimates involves GCMs more directly. Distributions for ECS and TCR can be derived from estimated model ECS and actual model TCR values. A 5–95% ECS range for CMIP5 models, of 2–4.5°C, was given in Figure 1, Box 12.2 of AR5. Feedbacks exhibited by GCMs can also be analysed, and to some extent compared with observations. But although development of GCMs is informed by observations, their characteristics are not determined by observational constraints. If the climate system were thoroughly understood and AOGCMs accurately modelled its physical processes on all scales that mattered, one would expect all aspects of their behaviour to be fairly similar, and the ECS and TCR values they exhibited might then be regarded as reliable estimates. However, those requirements are far from being satisfied.

Since AOGCMs tend to be similar in many respects, it is moreover highly doubtful that a statistically-valid uncertainty range for ECS or TCR can be derived from CMIP5 model ECS and TCR values. If some key aspect of climate system behaviour is misrepresented in (or unrepresented by) one CMIP5 model, the same problem is likely to be common to many if not all CMIP5 models.

In this connection, I’ll finish by highlighting two areas relevant to climate sensitivity where model behaviour seems unsatisfactory across almost all CMIP5 models.

**Slide 25**

Slide 25 compares tropical warming by pressure level (altitude) in CMIP5 models and radiosonde observations over 1979-2012. Most models not only show excessive near-surface warming, by a factor of about two on average, but a much greater increase with height than observations indicate. This is the ‘missing hot-spot’ problem. The ratio of tropical mid-troposphere to surface warming would be expected to be smaller in a model with a LW iris than in one that does not, a point in favour of such a feature.

Figure 9.9 in AR5 showed much the same discrepancy – an average factor of about 3x – between observed and modelled temperature trends in the tropical lower troposphere over 1988-2012. Observations in that case were based on satellite MSU datasets and reanalyses that used models to assimilate data.

**Slide 26**

A lot of the discussion at Ringberg 2015 concerned clouds, one of the most important and least well understood elements of the climate system. Their behaviour significantly affects climate sensitivity.

Slide 26 shows errors in cloud fraction by latitude for twelve CMIP5 GCMs. (TCF)_{sat} is the average per MODIS and ISCCP2 observations. It can be seen that most models have too little cloud cover in the tropics and, particularly southern, mid latitudes, and too much at high latitudes. Errors of this magnitude indicate that reliance should not be placed on cloud feedbacks exhibited by current climate models.

Models also appear to have excessive low cloud liquid water path, optical depth and albedo, which may result in negative optical depth climate feedback being greatly underestimated in models (Stephens 2010).

**Slide 27**

My concluding slide reiterates some of the main points in my talk. Assuming Bjorn Stevens’ revised estimate of aerosol forcing is correct, then the 95% uncertainty *bounds* on ECS and TCR from observed multidecadal warming are well below the *mean* ECS and TCR values of CMIP5 models. It will be very interesting to see how these discrepancies between models and observations are resolved, as I think is likely to occur within the next decade.

*Additional references*

Timothy Andrews, Jonathan M. Gregory, and Mark J. Webb (2015): The Dependence of Radiative Forcing and Feedback on Evolving Patterns of Surface Temperature Change in Climate Models. *J. Climate*, **28**, 1630–1648

Lindzen, RS, M-D Chou, AY Hou (2001) Does the Earth have an adaptive infrared iris? *Bull. Amer. Meteor. Soc.*, **82**, 417-432

Mauritsen, T and B Stevens (2015) Missing iris effect as a possible cause of muted hydrological change and high climate sensitivity in models. *Nature Geoscience* doi:10.1038/ngeo2414

McNider, R. T., et al. (2012) Response and sensitivity of the nocturnal boundary layer over land to added longwave radiative forcing. *J. Geophys. Res.*, **117**, D14106

Ocko IB, V Ramaswamy and Y Ming (2014) Contrasting Climate Responses to the Scattering and Absorbing Features of Anthropogenic Aerosol Forcings *J. Climate*, **27**, 5329–5345

Rogelj J, Meinshausen M, Sedlácek J, Knutti R (2014) Implications of potentially lower climate sensitivity on climate projections and policy. *Environ Res Lett* 9. doi:10.1088/1748-9326/9/3/031003

Shindell, DT (2014) Inhomogeneous forcing and transient climate sensitivity. *Nature Clim Chg*: DOI: 10.1038/NCLIMATE2136

Stephens, GL (2010) Is there a missing low cloud feedback in current climate models? *GEWEX News*, 20, 1, 5-7.

*Update 21 April 2015*

The Mauritsen and Stevens paper about the new MPI Iris model has just been published. I have added a reference to it, and to a couple of inadvertently omitted references.


In Part 1 I introduced the talk I gave at Ringberg 2015, explained why it focussed on estimation based on warming over the instrumental period, and covered problems relating to aerosol forcing and bias caused by the influence of the AMO. I now move on to problems arising when Bayesian probabilistic approaches are used, and then summarize, as I see it, the state of observationally-based climate sensitivity estimation from instrumental period warming. I explained in Part 1 why other approaches to estimating ECS appear to be less reliable.

The AR4 report gave probability density functions (PDFs) for all the ECS estimates it presented, and AR5 did so for most of them. PDFs for unknown parameters are a Bayesian probabilistic concept. Under Bayes’ theorem – a variant on the conditional probability lemma – one starts by choosing a prior PDF for the unknown parameter, then multiplies it by the relative probability of having obtained the actual observations at each value of the parameter (the likelihood function), thus obtaining, upon normalising the result to unit total probability, a posterior PDF representing the new estimate of the parameter.

The posterior PDF melds any existing information about the parameter from the prior with information provided by the observations. If multiple parameters are being estimated, a joint prior and a joint likelihood function are required, and marginal posterior PDFs for individual parameters are obtained by integrating out the other parameters from the joint posterior PDF.
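The prior-times-likelihood recipe just described can be sketched numerically on a grid. A minimal illustration, with an invented Gaussian likelihood and made-up numbers rather than any study's actual data:

```python
import numpy as np

# Grid of candidate ECS values (degC); all numbers here are illustrative only
ecs = np.linspace(0.5, 10.0, 2000)
dx = ecs[1] - ecs[0]

# Hypothetical Gaussian likelihood: the observations are most probable if ECS = 2.0
likelihood = np.exp(-0.5 * ((ecs - 2.0) / 0.8) ** 2)

# Uniform-in-ECS prior: equal weight at every grid value
prior = np.ones_like(ecs)

# Bayes' theorem on a grid: multiply, then normalise to unit total probability
posterior = prior * likelihood
posterior /= posterior.sum() * dx

# Posterior CDF and median (the 50% probability point)
cdf = np.cumsum(posterior)
cdf /= cdf[-1]
median = ecs[np.searchsorted(cdf, 0.5)]
```

With several parameters the same multiply-and-normalise step is applied to a joint prior and joint likelihood, and marginal PDFs follow by summing the joint posterior over the other parameters' grid dimensions.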

Uncertainty ranges derived from percentage points of the integral of the posterior PDF, the posterior cumulative probability distribution (CDF), are known as credible intervals (CrI). The frequentist statistical approach instead gives confidence intervals (CIs), which are conceptually different from CrIs. In general, a Bayesian CrI cannot be exactly equivalent to a frequentist CI no matter what prior is selected. However, for some standard cases they can be the same, and it is typically possible to derive a prior (a probability matching prior) which results in CrIs being close to the corresponding CIs. That is critical if assertions based on a Bayesian CrI are to be true with the promised reliability.

Almost all the PDFs for ECS presented in AR4 and AR5 used a ‘subjective Bayesian’ approach, under which the prior is selected to represent the investigator’s views as to how likely it is the parameter has each possible value. A judgemental or elicited ‘expert prior’ that typically has a peaked distribution indicating a most likely value may be used. Or the prior may be a diffuse, typically uniform, distribution spread over a wide range, intended to convey ignorance and/or with a view to letting the data dominate the posterior PDF. Unfortunately, a diffuse prior does not in fact convey ignorance or let the data dominate parameter inference.

AR4 stated that all its PDFs for ECS were presented on a uniform-in-ECS prior basis, although the AR4 authors were mistaken in two cases. In AR5, most ECS PDFs were derived using either uniform or expert priors for ECS (and for other key unknown parameters being estimated alongside ECS).

When the data is weak (limited, with high uncertainty) the prior can have a major influence on the posterior PDF. Unlike in many areas of physics, that is the situation in climate science, certainly so far as ECS and TCR estimation is concerned. Moreover, the relationships between the principal observable variables (changes in atmospheric and ocean temperatures) and the parameters being estimated – which typically also include ocean effective vertical diffusivity (*K*_{v}) when ECS is the target parameter – are highly non-linear.

In these circumstances, use of uniform priors for ECS and *K*_{v} (or its square root) greatly biases posterior PDFs for ECS, raising their medians and fattening their upper tails. On the other hand, use of an expert prior typically results in the posterior PDF resembling the prior more than it reflects the data.

Some studies used, sometimes without realising it, the alternative ‘objective Bayesian’ approach, under which a mathematically-derived noninformative prior is used. Although in most cases it is impossible to formulate a prior that has no influence at all on the posterior PDF, the form of a noninformative prior is calculated so that it allows even weak data to dominate the posterior PDF for the parameter being estimated. Noninformative priors are typically judged by how good the probability-matching properties of the resulting posterior PDFs are.

Noninformative priors do not represent how likely the parameter is to take any particular value and they have no probabilistic interpretation. Noninformative priors are simply weight functions that convert data-based likelihoods into parameter posterior PDFs with desirable characteristics, typically as regards probability matching. This is heresy so far as the currently-dominant Subjective Bayesian school is concerned. In typical ECS and TCR estimation cases, noninformative priors are best regarded as conversion factors between data and parameter spaces.

For readers wanting insight as to why noninformative priors have no probability meaning, contrary to the standard interpretation of Bayes’ theorem, and regarding problems with Bayesian methods generally, I recommend Professor Don Fraser’s writings, perhaps starting with this paper.

The Lewis (2013) and Lewis (2014) studies employed avowedly objective Bayesian approaches, involving noninformative priors. The Andronova and Schlesinger (2001), Gregory et al (2002), Otto et al (2013), and Lewis & Curry (2014) studies all used sampling methods that equated to an objective Bayesian approach. Studies using profile likelihood methods, a frequentist approach that yields approximate CIs, also achieve objective estimation (Allen et al 2009, Lewis 2014).

**Slide 10**

I will illustrate the effect of using a uniform prior for TCR estimation, that being a simpler case than ECS estimation. Slide 10 shows estimated distributions from AR4 and AR5 for anthropogenic forcing, up to respectively 2005 and 2011. These are Bayesian posterior PDFs. They are derived by sampling from estimated uncertainty distributions for each forcing component, and I will assume for the present purposes that they can be considered to be objective.

Slide 11 shows posterior PDFs for TCR derived from the AR4 and AR5 PDFs for anthropogenic forcing, Δ*F*, by making certain simplifying approximations. I have assumed that the generic-TCR formula given in AR5 holds; that uncertainty in the GMST rise attributable to anthropogenic forcing, Δ*T* , and in *F*_{2xCO2}, the forcing from a doubling of CO_{2}, is sufficiently small relative to uncertainty in Δ*F* to be ignored; and that in both cases Δ*T* = 0.8°C and *F*_{2xCO2} = 3.71 W/m^{2}.

On this basis, posterior PDFs for TCR follow from a transformation of variables approach. One simply changes variable from Δ*F* to TCR (the other factors in the equation being assumed constant). The PDF for TCR at any value TCR_{a} therefore equals the PDF for Δ*F* at Δ*F* = *F*_{2xCO2} ⨯ Δ*T* / TCR_{a} , multiplied by the standard Jacobian factor: the absolute derivative of Δ*F* with respect to TCR at TCR_{a}. That factor equals, up to proportionality, 1/TCR^{2}.
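The change of variables can be made concrete with a toy calculation. Here I assume, purely for illustration, a Gaussian PDF for Δ*F* with a median of 2.3 W/m² and a 0.5 W/m² standard deviation; the actual AR4 and AR5 distributions differ in shape and spread:

```python
import numpy as np

F2X, DT = 3.71, 0.8  # F_2xCO2 (W/m2) and attributable warming (degC), as in the text

def pdf_dF(f, mu=2.3, sd=0.5):
    """Illustrative Gaussian PDF for anthropogenic forcing dF (W/m2)."""
    return np.exp(-0.5 * ((f - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

tcr = np.linspace(0.3, 6.0, 5001)
dt = tcr[1] - tcr[0]

# Change of variables: dF = F2X*DT/TCR, with Jacobian |d(dF)/d(TCR)| = F2X*DT/TCR^2
pdf_tcr = pdf_dF(F2X * DT / tcr) * F2X * DT / tcr**2

area = pdf_tcr.sum() * dt  # ~1: the Jacobian factor conserves total probability
cdf = np.cumsum(pdf_tcr)
cdf /= cdf[-1]
median_tcr = tcr[np.searchsorted(cdf, 0.5)]  # equals F2X*DT / median(dF)
```

Quantiles transform directly, with no Jacobian (the median of TCR is simply F2X·DT divided by the median of Δ*F*), because the mapping is monotonic; only the density needs the Jacobian factor.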

Suppose one regards the posterior PDFs for Δ*F* as having been derived using uniform priors. This is accurate in so far as components of Δ*F* have symmetrical uncertainty distributions, but overall it is only an approximation since the most uncertain component, aerosol forcing, is assumed to have an asymmetrical distribution. However, the AR4 and AR5 PDFs for Δ*F* are not greatly asymmetrical.

On the basis that the posterior PDFs for Δ*F* correspond to the normalised product of a uniform prior for Δ*F* and a likelihood function, the PDFs for TCR derived in slide 11 correspond to the normalised product of the same likelihood function (now expressed in terms of TCR) and a prior having the form 1/TCR^{2}. Unlike PDFs, likelihood functions do not depend on which variable they are expressed in terms of. That is because, unlike a PDF, a likelihood function represents a density for the observed data, not for the variable in terms of which it is expressed.

The solid lines in slide 12 show, on the foregoing basis, what the effect is on the AR4- and AR5-forcing based posterior PDFs for TCR of substituting a uniform-in-TCR prior for the mathematically correct 1/TCR^{2} prior applying in slide 11 (the PDFs from which are shown dotted). The median (50% probability point), which is the appropriate best estimate to use for a skewed distribution, increases substantially, doubling in the AR4 case. The top of the 17–83% ‘likely’ range more than quadruples in both cases. The distortion for ECS estimates would be even larger.
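The prior substitution can be mimicked with the same kind of toy Gaussian forcing likelihood (all numbers invented for illustration): express the likelihood in TCR terms, weight it by either the 1/TCR² prior or a uniform prior, and compare quantiles.

```python
import numpy as np

F2X, DT = 3.71, 0.8

def likelihood(tcr, mu=2.3, sd=0.5):
    """Toy Gaussian forcing likelihood re-expressed in TCR terms.
    No Jacobian here: likelihoods are invariant under a change of variable."""
    f = F2X * DT / tcr
    return np.exp(-0.5 * ((f - mu) / sd) ** 2)

tcr = np.linspace(0.3, 20.0, 20001)

def quantile(weights, q):
    cdf = np.cumsum(weights)
    return tcr[np.searchsorted(cdf / cdf[-1], q)]

post_jac = likelihood(tcr) / tcr**2  # mathematically consistent 1/TCR^2 prior
post_uni = likelihood(tcr)           # uniform-in-TCR prior

med_jac, med_uni = quantile(post_jac, 0.5), quantile(post_uni, 0.5)
top_jac, top_uni = quantile(post_jac, 0.83), quantile(post_uni, 0.83)
# Reweighting by TCR^2 is monotonically increasing, so every quantile of the
# uniform-prior posterior lies above the corresponding 1/TCR^2-prior quantile
```

In this toy the uniform-prior posterior would not even normalise on an unbounded range, since the likelihood tends to a positive constant as TCR grows; the fat upper tail is the bounded-range version of that behaviour.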

I cut slide 12a out of my talk to shorten it. It shows the computed joint noninformative prior for ECS and sqrt(*K*_{v}) from Lewis (2013). Noninformative priors can be quite complex in form when multiple parameters are involved.

Ignore the irregularities and the rise in the front RH corner, which are caused by model noise. Note how steeply the prior falls with sqrt(*K*_{v}), which is lowest at the rear, particularly at high ECS levels (towards the left). The value of the prior reflects how informative the data is about the parameters at each point in parameter space. The plot is probability-averaged over all values for aerosol forcing, which was also being estimated. I believe the fact that aerosol forcing is being estimated accounts for the turndown in the prior at low ECS values; when ECS is very low, temperatures change little and the data conveys less information about aerosol forcing.

Slide 13 summarises serious problems in instrumental period warming based ECS studies, ordered by year of publication, breaking problems down between seven factors. Median ECS estimates are shown by the green bars at the left.

Blank rectangles imply no significant problem in the area concerned; solid yellow or red rectangles signify respectively a significant and a serious problem; a rectangle with vertical yellow bars, which may look like solid pale yellow, indicates a minor problem.

Red/yellow diagonal bars (may look like a solid orange shade of red) in rectangles across ‘Internal variability influence’ and ‘High input Aerosol forcing’ mean that, due to use of global-only data, internal variability (the AMO) has led to an overly negative estimate for aerosol forcing within the study concerned, and hence to an overestimate of ECS. Yellow or red horizontal bars across those factors for the Frame et al (2005) and Allen et al (2009) studies mean that internal variability appears to have caused respectively significant or serious misestimation of aerosol forcing in the detection and attribution study that was the source of the (GHG-attributable) warming estimate used by the ECS study involved, and hence to upwards bias in that estimate (reflected in a yellow or red rectangle for ‘Other input data dubious’).

The blue/yellow horizontal bar across ‘High input Aerosol forcing’ and ‘Other input data dubious’ for the Skeie et al (2014) study means that problems in these two areas largely cancelled. Skeie’s method estimated aerosol forcing using hemispherically-resolved model-simulation and observational data. An extremely negative prior for aerosol forcing was used, overlapping so little with the observational data-based likelihood function that the posterior estimate was biased significantly negative. However, the simultaneous use of three ocean heat content observational datasets appears to have led to the negatively biased aerosol forcing being reflected in lower modelled than observed NH warming rather than a higher ECS estimate.

The ‘Data don’t constrain model fit’ red entries for the Forest studies are because, from my experience, warming over the model-simulation run using the claimed best-fit parameter values is substantially greater than per the observational dataset. The same entry for Knutti et al (2002) is because a very weak, pass/fail, statistical test was used in that study.

The ‘Model biased or faulty’ red rectangle for Andronova and Schlesinger (2001) reflects a simple coding error that appears to have significantly biased up its ECS estimation: see Table 3 in Ring et al (2012).

A more detailed analysis of problems with individual ECS studies is available here.

To summarise: all pre-2012 instrumental-period-warming studies had one or more serious problems, and their median ECS estimates varied widely. Most studies from 2012 on do not appear to have serious problems, and their estimates agree quite closely. (The Schwartz 2012 study’s estimate was a composite of five estimates based on different forcing series; the highest ECS estimate comes from a poor-quality regression obtained from one of the series.)

Slide 14 gives similar information to slide 13, but for TCR rather than ECS studies. As for ECS, all pre-2012 studies had one or more serious problems that make their TCR estimates unreliable, whilst most later studies do not have serious problems apparent and their median TCR estimates are quite close to one another.

Rogelj et al (2012)’s high TCR estimate is not genuinely observationally-based; it is derived from an ECS distribution chosen to match the AR4 best estimate and ‘likely’ range for ECS; the same goes for the Meinshausen et al (2009) estimate. The reason for the high TCR estimate from Harris et al (2013) is shown in the next slide.

A more detailed analysis of problems with individual TCR studies is available here.

This slide came later in my talk, but rather than defer it to Part 3 I have moved it here as it relates to a PPE (perturbed physics/parameter ensemble) study, Harris et al (2013), mentioned in the previous slide. Although this slide considers ECS estimates, the conclusions reached imply that the Harris et al TCR estimate shown in the previous slide is seriously biased up relative to what observations imply.

The plot is of joint distributions for aerosol forcing and ECS; the solid contours enclose ‘likely’ regions, of highest posterior probability density, containing 66% of total probability. Median estimates are shown by crosses; the four Ring et al (2012) estimates based on different surface temperature datasets are shown separately. The black contour is very close to that for Lewis and Curry (2014).

The grey dashed (dotted) vertical lines show the AR5 median estimate and ‘likely’ range for aerosol forcing, expressed both from 1750 (preindustrial) and from 1860; aerosol forcing in GCMs is normally estimated as the change between 1850 or 1860 and 2000 or 2005. The thick grey curve shows how one might expect the median estimate for ECS using an energy budget approach, based on AR5 non-aerosol forcing best estimates and a realistic estimate for ocean heat uptake, to vary with the estimate used for aerosol forcing.

The median estimates from the studies not using GCMs cluster around the thick grey curve, and their likely regions are orientated along it: under an energy budget or similar model, high ECS estimates are associated with strongly negative aerosol forcing estimates. But the likely regions for the Harris study are orientated very differently, with less negative aerosol forcing being associated with higher, not lower, ECS. Its estimated prior distribution ‘likely’ region (dotted green contour) barely overlaps the posterior regions of the other studies: the study simply does not explore the region of low-to-moderately-negative aerosol forcing and low-to-moderate ECS that the other studies indicate observations best support. It appears that the HadCM3/SM3 model has structural rigidities that make it unable to explore this region no matter how its key parameters are varied. So it is unsurprising that the Harris et al (2013) estimates for ECS, and hence also for TCR, are high: they cannot be regarded as genuinely observationally-based.

Further information on the problems with the Harris et al (2013) study is available here: see Box 1.

This slide shows what I regard as the least-flawed ECS estimates based on observed warming over the instrumental period, and compares them with ECS values exhibited by the RCP8.5 simulation ensemble of CMIP5 models. I should arguably have included the Schwartz (2012) and Masters (2014) estimates, but I have some concerns about the GCM-derived forcing estimates they use.

The violins span 5–95% ranges; their widths indicate how PDF values vary with ECS. Black lines show medians, red lines span 17–83% ‘likely’ ranges. Published estimates based directly on observed warming are shown in blue. Unpublished estimates of mine based on warming attributable to greenhouse gases inferred by two recent detection and attribution studies are shown in green. CMIP5 models are shown in salmon.

The observational ECS estimates have broadly similar medians and ‘likely’ ranges, all of which are far below the corresponding values for the CMIP5 models.

The ‘Aldrin ECS^{-2}’ violin is for its estimate that uses a uniform prior for 1/ECS, which equates to an ECS^{-2} prior for ECS. I believe that to be much closer to a noninformative prior than is the uniform-in-ECS prior used for the main Aldrin et al (2012) results. The Lewis (Forest) estimate is based on the Lewis (2013) preferred main ECS estimate with added non-aerosol forcing uncertainty, as shown in the study’s supplemental information.

This slide is like the previous one, but relates to TCR not ECS.

As for ECS, the observational TCR estimates have broadly similar medians and ‘likely’ ranges, all of which are well below the corresponding values for the CMIP5 models.

The Schwartz (2012) TCR estimate, which has been omitted for no good reason, has a median of 1.33°C and a 5–95% range of 0.83–2.0°C.

The Lewis (Forest) estimate uses the same formula as in Libardoni and Forest (2011), which also uses the MIT 2D GCM, to derive model TCR from combinations of model ECS and *K*_{v} values.

The main cause of long tails in ECS and TCR studies based on observed multidecadal warming is uncertainty as to the strength of aerosol forcing (*F*_{aer}). I’ll end this part with a pair of slides that show how well constrained the Lewis and Curry (2014) energy-budget main ECS and TCR estimates would be if they were recalculated using the distribution for aerosol forcing implicit in Bjorn Stevens’ recent study instead of the wide AR5 aerosol forcing distribution. (For some reason these slides appear much later, out of order, in the PDF version of my slides on the Ringberg 2015 website.)

The median ECS estimate reduces modestly from 1.64°C to 1.45°C, but the 95% uncertainty bound falls dramatically, from 4.05°C to 2.2°C.

The picture is similar for TCR, although somewhat less dramatic. The median TCR estimate reduces modestly from 1.33°C to 1.21°C, but the 95% uncertainty bound falls much more, from 2.50°C to 1.65°C.
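The mechanism behind this asymmetric narrowing can be illustrated with a toy Monte Carlo energy-budget calculation. All the distributions below are hypothetical stand-ins, not the actual Lewis and Curry (2014) or Stevens (2015) ones: because aerosol forcing enters the denominator, shrinking its spread cuts the upper ECS bound far more than it moves the median.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
F2X, dT, dQ = 3.71, 0.8, 0.5          # illustrative round numbers
dF_other = rng.normal(2.9, 0.3, N)    # non-aerosol forcing (hypothetical)

def ecs_quantiles(aer_sd):
    """Median and 95% bound of energy-budget ECS for a given aerosol spread."""
    dF_aer = rng.normal(-0.9, aer_sd, N)  # aerosol forcing, negative (hypothetical)
    ecs = F2X * dT / (dF_other + dF_aer - dQ)
    return np.percentile(ecs[ecs > 0], [50, 95])  # drop unphysical sign flips

wide = ecs_quantiles(0.55)    # wide, AR5-like aerosol uncertainty
narrow = ecs_quantiles(0.25)  # narrower, Stevens-like aerosol uncertainty
# Medians are nearly identical; the 95% bound falls sharply in the narrow case
```

The skew arises because samples with near-zero total forcing map to very high ECS values; narrowing the aerosol spread removes most such samples.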

*Additional references*

Allen MR, Frame DJ, Huntingford C, Jones CD, Lowe JA, Meinshausen M, Meinshausen N (2009) Warming caused by cumulative carbon emissions towards the trillionth tonne. *Nature*, 458, 1163–6.

Frame DJ, Booth BBB, Kettleborough JA, Stainforth DA, Gregory JM, Collins M, Allen MR (2005) Constraining climate forecasts: The role of prior assumptions. *Geophys. Res. Lett.*, 32, L09702, doi:10.1029/2004GL022241.

Harris, G.R., D.M.H. Sexton, B.B.B. Booth, M. Collins, and J.M. Murphy, 2013. Probabilistic projections of transient climate change. *Clim. Dynam.*, doi:10.1007/s00382-012-1647-y.

Lewis N (2014) Objective Inference for Climate Parameters: Bayesian, Transformation-of-Variables, and Profile Likelihood Approaches. *J. Climate*, 27, 7270-7284.

Masters T (2014) Observational estimate of climate sensitivity from changes in the rate of ocean heat uptake and comparison to CMIP5 models. *Clim Dynam* 42:2173-2181 doi:10.1007/s00382-013-1770-4

Sexton, D.M.H., J.M. Murphy, M. Collins, and M.J. Webb, 2012. Multivariate probabilistic projections using imperfect climate models part I: outline of methodology. *Clim. Dynam.*, 38: 2513–2542.

Stevens, B (2015) Rethinking the lower bound on aerosol radiative forcing. *J. Climate*, in press. doi: http://dx.doi.org/10.1175/JCLI-D-14-00656.1


As many readers will be aware, I attended the WCRP Grand Challenge Workshop: Earth’s Climate Sensitivities at Schloss Ringberg in late March. Ringberg 2015 was a very interesting event, attended by many of the best known scientists involved in this field and in areas of research closely related to it – such as the behaviour of clouds, aerosols and heat in the ocean. Many talks were given at Ringberg 2015; presentation slides are available here. It is often difficult to follow presentations just from the slides, so I thought it was worth posting an annotated version of the slides relating to my own talk, “Pitfalls in climate sensitivity estimation”. To make it more digestible and focus discussion, I am splitting my presentation into three parts. I’ve omitted the title slide and reinstated some slides that I cut out of my talk due to the 15 minute time constraint.

**Slide 2**

In this part I will cover the first bullet point and one of the major problems that cause bias in climate sensitivity estimates. In the second part I will deal with one or two other major problems and summarize the current position regarding observationally-based climate sensitivity estimation. In the final part I will deal with the third bullet point.

In a nutshell, I will argue that:

- Climate sensitivity is most reliably estimated from observed warming over the last ~150 years
- Most of the sensitivity estimates cited in the latest IPCC report had identifiable, severe problems
- Estimates from observational studies that are little affected by such problems indicate that climate sensitivity is substantially lower than in most global climate models
- Claims that the differences are due to substantial downwards bias in estimates from these observational studies have little support in observations.

**Slide 3**

ECS refers to equilibrium climate sensitivity: the increase in global mean surface temperature (GMST) that a doubling of atmospheric CO_{2} concentration leads to, once the ocean has fully equilibrated. The ECS of a coupled atmosphere-ocean general circulation climate model (AOGCM, or just GCM) can be determined by running it to equilibrium in a 2x CO_{2} experiment, but that takes thousands of simulation years. All non-palaeoclimate real-world ECS estimates reflect effective climate sensitivity, which depends on the strength of climate feedbacks over the analysis period involved (although in a few cases estimates are calibrated to ECS in one or more AOGCMs).

In many but not all current generation (CMIP5) AOGCMs, effective climate sensitivity estimates based on transient forced warming fall short of ECS, to an extent depending on the model, the estimation period, the forcing profile and the method used. It is unknown whether effective and equilibrium climate sensitivity differ much in the real world.

A shorter term measure of sensitivity, transient climate response (TCR) represents the increase in GMST over a 70 year period during which CO_{2} concentration increases at 1% p.a., thereby doubling. The focus at Ringberg 2015 was mainly on ECS, which can be related, at least approximately, to the physical concepts of (a) the effective radiative forcing (ERF) that a doubling of CO_{2} concentration produces and (b) the sum of the climate feedbacks to surface warming. Although TCR also depends on ocean heat uptake characteristics and thus does not have a simple physical interpretation, it is more relevant than ECS to warming over this century.

The first three bullet points of Slide 3 reiterate what the IPCC fifth assessment WG1 report (AR5) said in Chapters 10 and 12 about palaeoclimate ECS estimates and those based on short timescales or non-greenhouse gas (GHG) forcings, and its implicit conclusion that estimates based on multidecadal warming during the instrumental period (since about 1850) were likely to prove most reliable and provide the narrowest uncertainty bounds.

ECS estimates based on multidecadal warming typically use simple or intermediate complexity climate models driven by estimated forcing timeseries, and measure how well simulated surface temperatures and ocean heat uptake compare with observations as model parameters are adjusted. Energy budget methods are also used. These involve deriving ECS and/or TCR directly from estimated changes in forcing and measured changes in GMST and ocean heat content, usually between decadal or longer periods at the start and end of the multidecadal analysis period. Alternatively, regression-based slope estimates are sometimes used.

Attribution scaling methods refers to the use of scaling factors (multiple regression coefficients) derived from detection and attribution analyses that match observed warming to the sum of scaled AOGCM responses to different categories of forcing, based on their differing spatiotemporal fingerprints. The derived scaling factor for warming attributable to GHG can then be used to estimate ECS and/or TCR, using a simple model. This hybrid method of observationally-estimating ECS and TCR appears to work better than the ‘PPE’ approach, which involves varying AOGCM model parameters.

Various studies have combined palaeoclimate and instrumental ECS estimates using subjective Bayesian methods. I do not believe that such methods are appropriate for ECS estimation, as the results are sensitive to the subjective choice of prior distribution. It is however possible to use objective methods – both Bayesian and frequentist – to combine probabilistic ECS estimates, provided that the estimates are independent – which palaeo and instrumental estimates are usually assumed to be. However, since palaeo ECS estimates are normally less precise than good instrumental ones, combination estimates are usually dominated by the underlying instrumental estimate.
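Objective combination of independent estimates amounts to multiplying their likelihoods before applying a single prior. A toy grid sketch (Gaussian shapes and all numbers invented for illustration) shows why the more precise instrumental estimate dominates the combination:

```python
import numpy as np

ecs = np.linspace(0.5, 10.0, 2000)
dx = ecs[1] - ecs[0]

# Hypothetical independent likelihoods: a tight instrumental estimate and a
# much less precise palaeo estimate
L_instr = np.exp(-0.5 * ((ecs - 1.8) / 0.5) ** 2)
L_paleo = np.exp(-0.5 * ((ecs - 3.0) / 1.5) ** 2)

# Independence: multiply the likelihoods, then apply one prior (uniform here,
# purely for simplicity) and normalise
post = L_instr * L_paleo
post /= post.sum() * dx

cdf = np.cumsum(post)
cdf /= cdf[-1]
median = ecs[np.searchsorted(cdf, 0.5)]
# Precision weighting pulls the combined median close to the instrumental 1.8
```

For Gaussian likelihoods this is just inverse-variance weighting: the broad palaeo estimate shifts the combined median only slightly towards its own centre.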

**Slide 4**

My talk concentrates on ECS estimates based on warming observed during the instrumental period, as they are thought to be able to provide the most reliable, best constrained observational estimates. Slide 4 shows a version of Box 12.2, Figure 1 from AR5 with all other types of ECS estimate removed. The bars represent 5–95% uncertainty ranges, with blobs showing the best (median) estimates.

For the Lewis (2013) study, the dashed range should be ignored and the solid range widened to 1.0–3.0°C (with unchanged median) to reflect non-aerosol forcing uncertainty, as discussed in that paper.

Although the underlying forcing and temperature data should be quite similar in all these studies, the estimates vary greatly.

**Slide 5**

The relevant forcing concept here is ERF, denoted here by Δ*F*. AR5 defines it as follows: “ERF is the change in net TOA [top of atmosphere] downward radiative flux after allowing for atmospheric temperatures, water vapour and clouds to adjust, but with surface temperature or a portion of surface conditions unchanged.”

Δ*T* refers to the change in GMST resulting from a change in ERF, and Δ*Q* to a change in the planetary heating rate, mainly (>90%) reflected in ocean heat uptake.

Without using ocean heat content (OHC) data to estimate Δ*Q*, ECS tends to be ill-constrained.

Δ*Q* is not relevant to estimating TCR: the equivalent equation for generic TCR given in AR5 Chapter 10 is TCR = *F*_{2xCO2} ⨯ Δ*T* / Δ*F*.
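Plugging illustrative round numbers into this relation and its energy-budget ECS analogue, ECS = *F*_{2xCO2} ⨯ Δ*T* / (Δ*F* − Δ*Q*), the form used in energy-budget studies such as Lewis and Curry (2014), shows why Δ*Q* matters for ECS but not TCR (the input values below are round-number assumptions, not any study's actual estimates):

```python
# Illustrative round-number inputs (not the values of any particular study)
F2X = 3.71   # ERF from a doubling of CO2, W/m2
dT = 0.8     # change in GMST, degC
dF = 2.3     # change in total ERF, W/m2
dQ = 0.5     # change in planetary heating rate (mostly ocean heat uptake), W/m2

tcr = F2X * dT / dF          # generic TCR formula from AR5 Chapter 10
ecs = F2X * dT / (dF - dQ)   # energy-budget ECS: heat uptake offsets part of
                             # the forcing, so the same warming implies ECS > TCR
print(round(tcr, 2), round(ecs, 2))  # prints 1.29 1.65
```

Because Δ*Q* appears only in the ECS denominator, its uncertainty feeds into the ECS estimate but leaves TCR untouched, which is why ECS is ill-constrained without OHC data.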

Pre-2006 ECS studies almost all used the Levitus (2000) OHC data, which – due apparently to an uncorrected arithmetic error – gave substantially excessive values for Δ*Q*. Lin 2010 also used an excessive estimate for Δ*Q*, taken from the primarily model-based Hansen et al (2005) study. Moreover, many of the studies make no allowance for Δ*Q* being non-negligibly positive at the start of the instrumental period, as the Earth continued its recovery from the Little Ice Age. Gregory et al (2013) gives estimates of steric sea-level rise from 1860 on, derived from a naturally-forced model simulation starting in 850. Converting these to the planetary energy imbalance, and scaling down by 40% to allow for the model ECS of 3°C being high, gives Δ*Q* values of 0.15 W/m^{2} over 1860-1880 and 0.2 W/m^{2} over 1915-1960 (Δ*Q* being small in the intervening period due to high volcanism).

Multidecadal variability, represented by the quasi-periodic Atlantic Multidecadal Oscillation (AMO) in particular, means that the analysis period chosen is important. The AMO seems to be a genuine internal mode of variability, not as has been argued a forced pattern caused by anthropogenic aerosols.

The NOAA AMO index exhibits 60–70 year cycles over the instrumental period, peaking in the 1870s, around 1940, and in the 2000s. The AMO affects GMST, with a stronger influence in the northern hemisphere. As well as altering heat exchange between the ocean and atmosphere, the AMO also appears to modulate internal forcing through changing clouds – a little recognised point. As I will explain, the AMO can distort ECS estimation more seriously than its influence on GMST – of maybe ~0.2°C peak-to-peak – suggests.

**Slide 6**

Taking the last point first, the Meinshausen et al (2009) and Rogelj et al (2012) TCR distributions featured in AR5 Figure 10.20(a) as estimated from observational constraints were actually based on ECS distributions selected simply to match the AR4 2-4.5°C ECS range and 3°C best estimate. They should therefore be regarded as estimates based primarily on expert opinion, not observations.

Uncertainty as to the change in aerosol forcing occurring during the instrumental period, Δ*F*_{aero}, is the most important source of uncertainty in most ECS and TCR estimates based on multidecadal warming. Chapter 8 of AR5 gives a 1.8 W/m^{2} wide 5-95% range for Δ*F*_{aero} over 1750–2011, about as large as the best estimate for (Δ*F – *Δ*Q*). The Lewis and Curry (2014) energy budget based study used the AR5 best estimate and uncertainty range for aerosol forcing (as well as other forcings), and hence its ECS and TCR estimates have 95% bounds that are much higher than their median values.

Otto et al (2013), although likewise using an energy budget method, used estimated forcings in CMIP5 AOGCMs (Forster et al 2013), which exhibit a narrower uncertainty range than AR5 gives, and adjusted their central value to reflect the difference between AOGCM and AR5 aerosol forcing estimates. Its resulting median estimate for TCR was accordingly almost identical to that in Lewis and Curry (2014), but its 95% bound based on the most recent data was lower.

It is possible to estimate Δ*F*_{aero} with considerably less uncertainty than that stated in AR5, using “inverse methods” that infer Δ*F*_{aero} from hemispherically- or zonally-resolved surface temperature data. This takes advantage of the latitudinally-inhomogeneous, northern hemisphere dominated, distribution of anthropogenic aerosol emissions, using a latitudinally-resolving model to estimate the spatial pattern of temperature changes at varying Δ*F*_{aero} levels. Of the ECS studies featured in AR4 and AR5, Andronova and Schlesinger (2001), Forest et al (2002 and 2006), Libardoni and Forest (2011/13), Ring, Schlesinger et al (2012), Aldrin et al (2012) and Lewis (2013) used this approach; so did Skeie et al (2014).

However, inverse estimates of Δ*F*_{aero} are very unreliable if only GMST data is used. At a global level the evolution of Δ*F*_{aero} and Δ*F*_{GHG} is very highly correlated (*r* = 0.98 for the AR5 best estimate timeseries). Moreover, the divergence of the growth rates of the two series after the 1970s, when aerosol emissions flattened out, coincides with, and gets conflated with, the AMO upswing.

**Slide 7**

The AMO index smoothed pattern is shown by the red curve in the inset at the top of the LH panel of Slide 7, and can be seen to resemble the detrended GMST with shorter term natural signals removed (blue curve). The RH panel is not relevant to my argument, and can be ignored. Zhou and Tung may be overestimating the influence of the AMO on GMST (their range is in fact over 0.4 K, not 0.3 K as per the slide); Delsole et al (2011) estimate it to be about half as strong. However, even at one-quarter of the level shown it is enough to bias estimation of Δ*F*_{aero} up by a factor of two or more, with an accompanying upwards bias of 20% or more in the estimate of warming attributable to GHG (and hence in TCR estimates; ECS estimates are even worse affected).

The problem is that a combination of a strongly negative estimate for Δ*F*_{aero} and a high estimate for ECS can mimic the effect on GMST of a factor (the AMO) not represented in the estimation model used. The slight fall in GMST between the 1940s and the early 1970s is matched by selecting a strongly growing negative Δ*F*_{aero} that counters increasingly positive Δ*F*_{GHG}, whilst the fast rise in GMST from the late 1970s on is matched by a high ECS (and hence high TCR), operating on a strong rise in Δ*F*_{GHG} that is no longer countered by strengthening Δ*F*_{aero}.
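This aliasing can be reproduced in a toy calculation (my construction, not any of the studies' actual models): generate GMST from a modest “true” response to stylized GHG and aerosol forcings plus an AMO-like 65-year oscillation peaking around 1940 and 2005, then fit forcing response coefficients with no AMO term in the model. The post-1970s AMO upswing, coinciding with accelerating GHG forcing while aerosol forcing is flat, gets absorbed into the GHG response coefficient, biasing it high.

```python
import numpy as np

# Toy GMST record: a modest "true" forced response plus an AMO-like
# oscillation (all series stylized and illustrative).
yr = np.arange(1900, 2011)
f_ghg = 2.0 * ((yr - 1900) / 110.0) ** 2             # accelerating GHG forcing, W/m^2
f_aero = -np.minimum(yr - 1900, 75) / 75.0           # aerosol forcing, flat after ~1975
amo = 0.12 * np.sin(2 * np.pi * (yr - 1990) / 65.0)  # peaks ~1940 and ~2005, trough ~1973

lam_true = 0.4                                       # "true" response, K per W/m^2
gmst = lam_true * (f_ghg + f_aero) + amo

# Least-squares fit of GMST on the two forcing series plus an intercept,
# with no AMO term in the estimation model:
X = np.column_stack([np.ones_like(yr, dtype=float), f_ghg, f_aero])
coef, *_ = np.linalg.lstsq(X, gmst, rcond=None)
a_ghg, b_aero = coef[1], coef[2]
print(f"true lambda = {lam_true}, fitted GHG coef = {a_ghg:.3f}, "
      f"fitted aerosol coef = {b_aero:.3f}")
```

The fitted GHG response coefficient comes out above the true value of 0.4, with the unmodelled AMO variability redistributed between the two forcing coefficients.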

The ECS and TCR studies that use only the evolution of GMST (along with data pertaining to Δ*Q*) to estimate Δ*F*_{aero} jointly with ECS therefore usually reach a much more negative estimate for Δ*F*_{aero}, and a higher estimate for ECS, than studies that are able to estimate Δ*F*_{aero} from the differential evolution of hemispherically- or zonally-resolved surface temperature data. Studies affected by this problem include, for ECS, Knutti et al (2002), Tomassini et al (2007) and Olson et al (2012) and, for TCR, Knutti and Tomassini (2008) and Padilla et al (2011).

Using hemispherically- or zonally-resolved temperature data to estimate aerosol forcing fails to avoid contamination by the AMO when the analysis period is insufficiently long. Many AR4-era ECS and TCR studies used the 20th century as their analysis period. The 1900s started with the AMO low and ended with it high. Gillett et al (2012) found that, despite its use of spatiotemporal patterns, their detection and attribution study’s estimate of warming attributable to GHG was biased ~40% high when based on 1900s data compared with when the longer 1851–2010 period was used. ECS studies affected by this problem include Gregory et al (2002), Frame et al (2005) and Allen et al (2009). The Stott and Forest (2007) TCR estimate is also affected.

The Gregory and Forster (2008) TCR estimate, while avoiding the AMO’s influence on aerosol forcing estimation, is significantly biased up by the AMO’s direct enhancement of the GMST trend over the short 1970–2006 analysis period used.

I will leave it there for Part 1; in Part 2 I will move on to problems with Bayesian approaches to climate sensitivity estimation.

*References*

Aldrin, M., M. Holden, P. Guttorp, R.B. Skeie, G. Myhre, and T.K. Berntsen, 2012. Bayesian estimation of climate sensitivity based on a simple climate model fitted to observations of hemispheric temperatures and global ocean heat content. *Environmetrics*, 23: 253–271.

Andronova, N.G. and M.E. Schlesinger, 2001. Objective estimation of the probability density function for climate sensitivity. *J. Geophys. Res*.,106 (D19): 22605–22611.

DelSole, T., M. K. Tippett, and J. Shukla, 2011: A significant component of unforced multidecadal variability in the recent acceleration of global warming. *J. Clim.,* 24, 909–926.

Forest, C.E., P.H. Stone, A.P. Sokolov, M.R. Allen and M.D. Webster, 2002. Quantifying uncertainties in climate system properties with the use of recent climate observations. *Science*, 295: 113–117.

Forest, C.E., P.H. Stone and A.P. Sokolov, 2006. Estimated PDFs of climate system properties including natural and anthropogenic forcings. *Geophys. Res. Lett.*, 33: L01705.

Forster, P.M., T. Andrews, P. Good, J.M. Gregory, L.S. Jackson, and M. Zelinka, 2013. Evaluating adjusted forcing and model spread for historical and future scenarios in the CMIP5 generation of climate models. *J. Geophys. Res*., 118: 1139–1150.

Frame, D.J., B.B.B. Booth, J.A. Kettleborough, D.A. Stainforth, J.M. Gregory, M. Collins and M.R. Allen, 2005. Constraining climate forecasts: The role of prior assumptions. *Geophys. Res. Lett.*, 32: L09702.

Fyfe, J.C., N.P. Gillett, and F.W. Zwiers, 2013. Overestimated global warming over the past 20 years. *Nature Clim. Change*, 3(9): 767–769.

Gillett, N.P., V.K. Arora, G.M. Flato, J.F. Scinocca, and K. von Salzen, 2012. Improved constraints on 21st-century warming derived using 160 years of temperature observations. *Geophys. Res. Lett.*, 39: L01704, doi:10.1029/2011GL050226.

Gregory, J.M., R.J. Stouffer, S.C.B. Raper, P.A. Stott, and N.A. Rayner, 2002. An observationally based estimate of the climate sensitivity. *J. Clim.*,15: 3117–3121.

Gregory, J. M., and P. M. Forster, 2008: Transient climate response estimated from radiative forcing and observed temperature change. *J. Geophys. Res. Atmos.*, 113, D23105.

Knutti, R., T.F. Stocker, F. Joos, and G.-K. Plattner, 2002. Constraints on radiative forcing and future climate change from observations and climate model ensembles. *Nature*, 416: 719–723.

Knutti, R., and L. Tomassini, 2008: Constraints on the transient climate response from observed global temperature and ocean heat uptake. *Geophys. Res. Lett*., 35, L09701.

Levitus, S., J. Antonov, T. Boyer, and C. Stephens, 2000. Warming of the world ocean. *Science*, 287: 2225–2229.

Lewis, N., 2013. An objective Bayesian, improved approach for applying optimal fingerprint techniques to estimate climate sensitivity. *J. Clim.*, 26: 7414–7429.

Lewis, N. and J.A. Curry, 2014. The implications for climate sensitivity of AR5 forcing and heat uptake estimates. *Clim. Dyn.*, doi:10.1007/s00382-014-2342-y.

Libardoni, A.G. and C. E. Forest, 2011. Sensitivity of distributions of climate system properties to the surface temperature dataset. *Geophys. Res. Lett.*; 38, L22705. Correction, 2013: doi:10.1002/grl.50480.

Lin, B., et al., 2010: Estimations of climate sensitivity based on top-of-atmosphere radiation imbalance. *Atmos. Chem. Phys.*, 10: 1923–1930.

Meinshausen, M., et al., 2009: Greenhouse-gas emission targets for limiting global warming to 2 °C. *Nature*, 458, 1158–1162.

Olson, R., R. Sriver, M. Goes, N.M. Urban, H.D. Matthews, M. Haran, and K. Keller, 2012. A climate sensitivity estimate using Bayesian fusion of instrumental observations and an Earth System model. *J. Geophys. Res. Atmos.,*117: D04103.

Otto, A., et al., 2013. Energy budget constraints on climate response. *Nature Geoscience*, 6: 415–416.

Padilla, L.E., G.K. Vallis, and C.W. Rowley, 2011. Probabilistic estimates of transient climate sensitivity subject to uncertainty in forcing and natural variability. *J. Clim.*, 24: 5521–5537.

Ring, M.J., D. Lindner, E.F. Cross, and M.E. Schlesinger, 2012. Causes of the global warming observed since the 19th century. *Atmos. Clim. Sci.*, 2: 401–415.

Rogelj, J., M. Meinshausen, and R. Knutti, 2012: Global warming under old and new scenarios using IPCC climate sensitivity range estimates. *Nature Clim. Change*, 2, 248–253.

Schwartz, S.E., 2012. Determination of Earth’s transient and equilibrium climate sensitivities from observations over the twentieth century: Strong dependence on assumed forcing. *Surv. Geophys.*, 33: 745–777.

Tomassini, L., P. Reichert, R. Knutti, T.F. Stocker, and M.E. Borsuk, 2007. Robust Bayesian uncertainty analysis of climate system properties using Markov chain Monte Carlo methods. *J. Clim.*, 20: 1239–1254.

Zhou, J., and K.-K. Tung, 2013. Deducing multidecadal anthropogenic global warming trends using multiple regression analysis. *J. Atmos. Sci*., 70, 3–8.
