In 2012, the then much ballyhooed Australian temperature reconstruction of Gergis et al 2012 mysteriously disappeared from Journal of Climate after being criticized at Climate Audit. Now, more than four years later, a successor article has finally been published. Gergis says that the only problem with the original article was a “typo” in a single word. Rather than “taking the easy way out” and simply correcting the “typo”, Gergis instead embarked on a program that ultimately involved nine rounds of revision, 21 individual reviews, two editors and took longer than the American involvement in World War II. However, rather than Gergis et al 2016 being an improvement on or confirmation of Gergis et al 2012, it is one of the most extraordinary examples of data torture (Wagenmakers, 2011, 2012) that any of us will ever witness.
A guest post by Nic Lewis
Introduction and Summary
In a recently published paper (REA16), Mark Richardson et al. claim that recent observation-based energy budget estimates of the Earth’s transient climate response (TCR) are biased substantially low, with the true value some 24% higher. This claim is based purely on simulations by CMIP5 climate models. As I shall show, observational evidence points to any bias actually being small. Moreover, the related claims made by Kyle Armour, in an accompanying “news & views” opinion piece, fall apart upon examination.
The main claim in REA16 is that, in models, surface air-temperature warming over 1861-2009 is 24% greater than would be recorded by HadCRUT4 because it preferentially samples slower-warming regions and water warms less than air. About 15 percentage points of this excess result from masking to HadCRUT4v4 geographical coverage. The remaining 9 percentage points are due to HadCRUT4 blending air and sea surface temperature (SST) data, and arise partly from water warming less than air over the open ocean and partly from changes in sea ice redistributing air and water measurements.
REA16 infer an observation-based best estimate for TCR of 1.66°C, 24% higher than the value of 1.34°C if based on HadCRUT4v4. Since the scaling factor used is based purely on simulations by CMIP5 models, rather than on observations, the estimate is only valid if those simulations realistically reproduce the spatiotemporal pattern of actual warming for both SST and near-surface air temperature (tas), and changes in sea-ice cover. It is clear that they fail to do so. For instance, the models simulate fast warming, and retreating sea-ice, in the sparsely observed southern high latitudes. The available evidence indicates that, on the contrary, warming in this region has been slower than average, pointing to the bias due to sparse observations over it being in the opposite direction to that estimated from model simulations. Nor is there good observational evidence that air over the open ocean warms faster than SST. Therefore, the REA16 model-based bias figure cannot be regarded as realistic for observation-based TCR estimates.
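The arithmetic behind these figures can be checked in a few lines. This is only a sketch using the numbers quoted above; the variable names are illustrative and do not come from REA16.

```python
# Figures quoted in the text above; variable names are illustrative only.
masking_pp = 15.0     # percentage points attributed to HadCRUT4 coverage masking
blending_pp = 9.0     # percentage points attributed to blending air and SST data
tcr_hadcrut4 = 1.34   # TCR estimate (deg C) based on HadCRUT4

total_pp = masking_pp + blending_pp      # total model-derived excess, in percent
scale = 1.0 + total_pp / 100.0           # scaling factor applied to the estimate
tcr_scaled = tcr_hadcrut4 * scale

print(f"total excess: {total_pp:.0f}%")
print(f"scaled TCR: {tcr_scaled:.2f} C")
```

The two components sum to the 24% excess, and scaling 1.34°C by 1.24 reproduces the 1.66°C figure to two decimal places.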
I’ve submitted an article entitled “New Light on Deflategate: Critical Technical Errors” (pdf) to Journal of Sports Analytics. It identifies and analyzes a previously unnoticed scientific error in the technical analysis included in the Wells Report on Deflategate. The article shows precisely how the “unexplained” deflation occurred prior to Anderson’s measurement and disproves the possibility of post-measurement tampering. At present, there is insufficient information to determine whether the scientific error arose because the law firm responsible for the investigation (Paul, Weiss) omitted essential information in their instructions to their technical consultants (Exponent) or whether the technical consultants failed to incorporate all relevant information in their analysis. In either event, the error was missed by the NFL consultant Daniel Marlow of the Princeton University Department of Physics, by the authors of the Wells Report and by the NFL.
In my most recent post, I discussed yet another incident in the long running dispute about the inconsistency between models and observations in the tropical troposphere – Gavin Schmidt’s twitter mugging of John Christy and Judy Curry. Included in Schmidt’s exchange with Curry was a diagram with a histogram of model runs. In today’s post, I’ll parse the diagram presented to Curry, first discussing the effect of some sleight-of-hand and then showing that Schmidt’s diagram, after removing the sleight-of-hand and when read by someone familiar with statistical distributions, confirms Christy rather than contradicting him.
In the past few weeks, I’ve been re-examining the long-standing dispute over the discrepancy between models and observations in the tropical troposphere. My interest was prompted in part by Gavin Schmidt’s recent attack on a graphic used by John Christy in numerous presentations (see recent discussion here by Judy Curry). Schmidt made the sort of offensive allegations that he makes far too often:
As a result, Curry decided not to use Christy’s graphic in her recent presentation to a congressional committee. In today’s post, I’ll examine the validity (or lack) of Schmidt’s critique.
Schmidt’s primary dispute, as best as I can understand it, was about Christy’s centering of model and observation data to achieve a common origin in 1979, the start of the satellite period, a technique which (obviously) shows a greater discrepancy at the end of the period than if the data had been centered in the middle of the period. I’ll show support for Christy’s method from his long-time adversary, Carl Mears, whose own comparison of models and observations used a short early centering period (1979-83) “so the changes over time can be more easily seen”. Whereas both Christy and Mears provided rational arguments for their baseline decision, Schmidt’s argument was little more than shouting.
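The effect of the baseline choice on the apparent end-of-period discrepancy can be illustrated with synthetic data. The sketch below assumes two purely hypothetical linear trends (they are not Christy's, Mears' or Schmidt's series) and compares a short early baseline with one spanning the whole period.

```python
import numpy as np

years = np.arange(1979, 2016)
# Hypothetical linear trends (deg C per year), chosen only for illustration.
model = 0.030 * (years - 1979)
obs = 0.015 * (years - 1979)

def end_gap(base_start, base_end):
    """Model-minus-observation gap in the final year, after centering both
    series on their own mean over the chosen baseline years."""
    base = (years >= base_start) & (years <= base_end)
    centered_model = model - model[base].mean()
    centered_obs = obs - obs[base].mean()
    return centered_model[-1] - centered_obs[-1]

gap_early = end_gap(1979, 1983)   # short early baseline
gap_full = end_gap(1979, 2015)    # baseline over the whole period

print(f"end gap, 1979-83 baseline: {gap_early:.2f}")
print(f"end gap, full-period baseline: {gap_full:.2f}")
```

The difference in trend between the two series is the same in both cases. Only the year in which the curves are made to coincide changes, and with it the size of the gap displayed at the end of the record.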
A guest article by Nic Lewis
In a recent article I discussed Bayesian parameter inference in the context of radiocarbon dating. I compared Subjective Bayesian methodology based on a known probability distribution, from which one or more values were drawn at random, with an Objective Bayesian approach using a noninformative prior that produced results depending only on the data and the assumed statistical model. Here, I explain my proposals for incorporating, using an Objective Bayesian approach, evidence-based probabilistic prior information about a fixed but unknown parameter taking continuous values. I am talking here about information pertaining to the particular parameter value involved, derived from observational evidence pertaining to that value. I am not concerned with the case where the parameter value has been drawn at random from a known actual probability distribution, that being an unusual case in most areas of physics. Even when evidence-based probabilistic prior information about a parameter being estimated does exist and is to be used, results of an experiment should be reported without as well as with that information incorporated. It is normal practice to report the results of a scientific experiment on a stand-alone basis, so that the new evidence it provides may be evaluated.
In principle the situation I am interested in may involve a vector of uncertain parameters, and multi-dimensional data, but for simplicity I will concentrate on the univariate case. Difficult inferential complications can arise where there are multiple parameters and only one or a subset of them are of interest. The best noninformative prior to use (usually Bernardo and Berger’s reference prior) may then differ from Jeffreys’ prior.
Where there is an existing parameter estimate in the form of a posterior PDF, the standard Bayesian method for incorporating (conditionally) independent new observational information about the parameter is “Bayesian updating”. This involves treating the existing estimated posterior PDF for the parameter as the prior in a further application of Bayes’ theorem, and multiplying it by the data likelihood function pertaining to the new observational data. Where the parameter was drawn at random from a known probability distribution, the validity of this procedure follows from rigorous probability calculus. Where it was not so drawn, Bayesian updating may nevertheless satisfy the weaker Subjective Bayesian coherency requirements. But is standard Bayesian updating justified under an Objective Bayesian framework, involving noninformative priors?
A noninformative prior varies depending on the specific relationships the data values have with the parameters and on the data-error characteristics, and thus on the form of the likelihood function. Noninformative priors for parameters therefore vary with the experiment involved; in some cases they may also vary with the data. Two studies estimating the same parameter using data from experiments involving different likelihood functions will normally give rise to different noninformative priors. On the face of it, this leads to a difficulty in using objective Bayesian methods to combine evidence in such cases. Using the appropriate, individually noninformative, prior, standard Bayesian updating would produce a different result according to the order in which Bayes’ theorem was applied to data from the two experiments. In both cases, the updated posterior PDF would be the product of the likelihood functions from each experiment, multiplied by the noninformative prior applicable to the first of the experiments to be analysed. That noninformative priors and standard Bayesian updating may conflict, producing inconsistency, is a well known problem (Kass and Wasserman, 1996).
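The order dependence can be written out explicitly. In the notation below, which is introduced here only for illustration, experiments A and B have data y_A and y_B, likelihood functions L_A and L_B, and individually noninformative priors π_A and π_B for the parameter θ.

```latex
\begin{align*}
p_{A \to B}(\theta \mid y_A, y_B) &\propto \pi_A(\theta)\, L_A(\theta)\, L_B(\theta), \\
p_{B \to A}(\theta \mid y_A, y_B) &\propto \pi_B(\theta)\, L_B(\theta)\, L_A(\theta).
\end{align*}
```

The likelihood product is the same in both lines, so the two posteriors coincide only if π_A and π_B are proportional to one another.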
Modifying standard Bayesian updating
My proposal is to overcome this problem by applying Bayes’ theorem once only, to the joint likelihood function for the experiments in combination, with a single noninformative prior being computed for inference from the combined experiments. This is equivalent to the modification of Bayesian updating proposed in Lewis (2013a). It involves rejecting the validity of standard Bayesian updating for objective inference about fixed but unknown continuously-valued parameters, save in special cases. Such special cases include where the new data is obtained from the same experimental setup as the original data, or where the experiments involved are different but the same form of prior is noninformative in both cases.
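A numerical sketch may make the contrast concrete. It assumes, purely for illustration, that the noninformative prior in each case is Jeffreys' prior (the square root of Fisher information) and uses the fact that Fisher information from independent experiments adds. The two experiments, their error levels and the observed values are all hypothetical, and the code is not taken from Lewis (2013a).

```python
import numpy as np

theta = np.linspace(0.01, 5.0, 2000)   # grid for the (positive) parameter
dtheta = theta[1] - theta[0]

# Hypothetical experiments: A observes theta itself, B observes theta**2,
# each with Gaussian measurement error.
sigma_a, sigma_b = 1.0, 4.0
y_a, y_b = 2.0, 6.0

def gauss_like(y, mean, sigma):
    """Gaussian likelihood (up to a constant) of observing y."""
    return np.exp(-0.5 * ((y - mean) / sigma) ** 2)

like_a = gauss_like(y_a, theta, sigma_a)
like_b = gauss_like(y_b, theta ** 2, sigma_b)

# Fisher information for each experiment: (d mean / d theta)**2 / sigma**2.
info_a = np.full_like(theta, 1.0 / sigma_a ** 2)
info_b = (2.0 * theta / sigma_b) ** 2
prior_a, prior_b = np.sqrt(info_a), np.sqrt(info_b)   # individual Jeffreys priors
prior_joint = np.sqrt(info_a + info_b)                # Jeffreys prior, combined

def normalise(density):
    return density / (density.sum() * dtheta)

post_a_first = normalise(prior_a * like_a * like_b)    # standard updating, A then B
post_b_first = normalise(prior_b * like_a * like_b)    # standard updating, B then A
post_joint = normalise(prior_joint * like_a * like_b)  # Bayes' theorem applied once

def median(post):
    return theta[np.searchsorted(np.cumsum(post) * dtheta, 0.5)]

for name, post in [("A then B", post_a_first), ("B then A", post_b_first),
                   ("joint", post_joint)]:
    print(f"{name}: posterior median = {median(post):.3f}")
```

The two sequential orderings give different posterior medians, whereas the single joint analysis gives one answer regardless of the order in which the data arrived.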
A guest article by Nic Lewis
I reported in a previous post, here, a number of serious problems that I had identified in Marvel et al. (2015): Implications for climate sensitivity from the response to individual forcings. This Nature Climate Change paper concluded, based purely on simulations by the GISS-E2-R climate model, that estimates of the transient climate response (TCR) and equilibrium climate sensitivity (ECS) based on observations over the historical period (~1850 to recent times) were biased low.
I followed up my first article with an update that concentrated on land use change (LU) forcing. Inter alia, I presented regression results that strongly suggested the Historical simulation forcing (iRF) time series used in Marvel et al. omitted LU forcing. Gavin Schmidt of GISS responded on RealClimate, writing:
“Lewis in subsequent comments has claimed without evidence that land use was not properly included in our historical runs…. These are simply post hoc justifications for not wanting to accept the results.”
In fact, not only had I presented strong evidence that the Historical iRF values omitted LU forcing, but I had concluded:
“I really don’t know what the explanation is for the apparently missing Land use forcing. Hopefully GISS, who alone have all the necessary information, may be able to provide enlightenment.”
When I responded to the RealClimate article, here, I inter alia presented further evidence that LU forcing hadn’t been included in the computed value of the total forcing applied in the Historical simulation: there was virtually no trace of LU forcing in the spatial pattern for Historical forcing. I wasn’t suggesting that LU forcing had been omitted from the forcings applied during the Historical simulations, but rather that it had not been included when measuring them.
Yesterday, a climate scientist friend drew my attention to a correction notice published by Nature Climate Change, reading as follows:
“Corrected online 10 March 2016
In the version of this Letter originally published online, there was an error in the definition of F2×CO2 in equation (2). The historical instantaneous radiative forcing time series was also updated to reflect land use change, which was inadvertently excluded from the forcing originally calculated from ref. 22. This has resulted in minor changes to data in Figs 1 and 2, as well as in the corresponding main text and Supplementary Information. In addition, the end of the paragraph beginning ‘Scaling ΔF for each of the single-forcing runs…’ should have read ‘…the CO2-only runs’ (not ‘GHG-only runs’). The conclusions of the Letter are not affected by these changes. All errors have been corrected in all versions of the Letter. The authors thank Nic Lewis for his careful reading of the original manuscript that resulted in the identification of these errors.”
A guest article by Nic Lewis
In April 2014 I published a guest article about statistical methods applicable to radiocarbon dating, which criticised existing Bayesian approaches to the problem. A standard – subjective Bayesian – method of inference about the true calendar age of a single artefact from a radiocarbon date determination (measurement) involved using a uniform-in-calendar-age prior. I argued that this did not, as claimed, equate to not including anything but the radiocarbon dating information, and was not a scientifically sound method for inference about isolated examples of artefacts.
My article attracted many comments, not all agreeing with my arguments. This article follows up and expands on points in my original article, and discusses objections raised.
First, a brief recap. Radiocarbon dating involves determining the radiocarbon age of (a sample from) an artefact and then converting that determination to an estimate of the true calendar age t, using a highly nonlinear calibration curve. It is this nonlinearity that causes the difficulties I focussed on. Both the radiocarbon determination and the calibration curve are uncertain, but errors in them are random and in practice can be combined. A calibration program is used to derive estimated calendar age probability density functions (PDFs) and uncertainty ranges from a radiocarbon determination.
The standard calibration program OxCal that I concentrated on uses a subjective Bayesian method with a prior that is uniform over the entire calibration period, where a single artefact is involved. Calendar age uncertainty ranges for an artefact whose radiocarbon age is determined (subject to measurement error) can be derived from the resulting posterior PDFs. They can be constructed either from one-sided credible intervals (finding the values at which the cumulative distribution function (CDF) – the integral of the PDF – reaches the two uncertainty bound probabilities), or from highest probability density (HPD) regions containing the total probability in the uncertainty range.
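The two ways of constructing an uncertainty range can be sketched on a gridded posterior. The density below is a hypothetical skewed shape chosen for illustration, not an OxCal output, and the 5% and 95% bounds are just an example choice.

```python
import numpy as np

t = np.linspace(0.0, 10.0, 5001)   # calendar-age grid (arbitrary units)
dt = t[1] - t[0]

# Hypothetical right-skewed posterior PDF, for illustration only.
pdf = t ** 2 * np.exp(-1.5 * t)
pdf /= pdf.sum() * dt

cdf = np.cumsum(pdf) * dt          # CDF as the running integral of the PDF

def quantile(p):
    """First grid value at which the CDF reaches probability p."""
    return t[np.searchsorted(cdf, p)]

# Range bounded by the 5% and 95% one-sided credible bounds.
lo, hi = quantile(0.05), quantile(0.95)

# 90% HPD region: take grid points in order of decreasing density
# until they contain 90% of the probability.
order = np.argsort(pdf)[::-1]
mass = np.cumsum(pdf[order]) * dt
in_hpd = np.zeros_like(pdf, dtype=bool)
in_hpd[order[: np.searchsorted(mass, 0.90) + 1]] = True
hpd_lo, hpd_hi = t[in_hpd].min(), t[in_hpd].max()

print(f"5-95% range: {lo:.2f} to {hi:.2f}")
print(f"90% HPD region: {hpd_lo:.2f} to {hpd_hi:.2f}")
```

For a skewed density the two ranges differ: the HPD region is the narrower of the two and is shifted towards the mode. For a multimodal density the HPD region may consist of several disjoint pieces, which the minimum and maximum reported here would not show.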
In the subjective Bayesian paradigm, probability represents a purely personal degree of belief. That belief should reflect existing knowledge, updated by new observational data. However, even if that body of knowledge is common to two people, their probability evaluations are not required to agree, and may for neither of them properly reflect the knowledge on which they are based. I do not regard this as a satisfactory paradigm for scientific inference.
I advocated taking instead an objective Bayesian approach, based on using a computed “noninformative prior” rather than a uniform prior. I used as my criterion for judging the two methods how well they performed upon repeated use, hypothetical or real, in relation to single artefacts. In other words, when estimating the value of a fixed but unknown parameter and giving uncertainty ranges for its value, how accurately would the actual proportions of cases in which the true value lies within each given range correspond to the indicated proportion of cases? That is to say, how good is the “probability matching” (frequentist coverage) of the method? I also examined use of the non-Bayesian signed root log-likelihood ratio (SRLR) method, judging it by the same criterion.
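A probability matching check of this kind can be sketched by simulation. The example below assumes a hypothetical monotonic calibration curve and a Gaussian measurement error, neither of which is taken from the original article or from OxCal, and uses Jeffreys' prior (proportional to the slope of the curve) as the noninformative prior.

```python
import numpy as np

rng = np.random.default_rng(0)

t = np.linspace(0.0, 20.0, 2001)   # calendar-age grid (arbitrary units)
g = t + 0.8 * np.sin(t)            # hypothetical monotonic calibration curve
g_slope = 1.0 + 0.8 * np.cos(t)    # its derivative, positive everywhere
sigma = 0.5                        # measurement error, in the units of g
t_true = 10.0                      # fixed but unknown true calendar age
g_true = t_true + 0.8 * np.sin(t_true)

def interval(prior, y, p_lo=0.05, p_hi=0.95):
    """5-95% range read off the gridded posterior CDF."""
    post = prior * np.exp(-0.5 * ((y - g) / sigma) ** 2)
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    return t[np.searchsorted(cdf, p_lo)], t[np.searchsorted(cdf, p_hi)]

def coverage(prior, n_trials=2000):
    """Fraction of simulated measurements whose range contains t_true."""
    hits = 0
    for y in g_true + sigma * rng.standard_normal(n_trials):
        lo, hi = interval(prior, y)
        hits += lo <= t_true <= hi
    return hits / n_trials

cov_uniform = coverage(np.ones_like(t))   # uniform-in-calendar-age prior
cov_jeffreys = coverage(g_slope)          # Jeffreys prior, proportional to dg/dt

print(f"uniform prior: {cov_uniform:.3f}  Jeffreys prior: {cov_jeffreys:.3f}")
```

In this simple monotonic setting the range obtained with Jeffreys' prior should contain the true value in close to 90% of trials, up to simulation and grid error. Coverage under the uniform prior need not equal 90% and depends on where the true age sits on the curve.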
Gerry Browning writes:
The Correct System of Equations for Climate and Weather Models
The system of equations numerically approximated by both weather and climate models is called the hydrostatic system. Using a scale analysis for mid-latitude large scale motions in the atmosphere (motions with a horizontal length scale of 1000 km and time scale of a day), Charney (1948) showed that hydrostatic balance, i.e., balance between the vertical pressure gradient and gravitational force, is satisfied to a high degree of accuracy by these motions. Because the fine balance between these terms was difficult to calculate numerically, and in order to remove fast vertically propagating sound waves so that numerical integration could use a larger time step, he introduced the hydrostatic system, which assumes exact balance between the vertical pressure gradient and the gravitational force. This system leads to a columnar (function of altitude) equation for the vertical velocity called Richardson’s equation.
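In symbols, the balance in question is the following. The notation is standard but is introduced here rather than taken from the papers cited: p is pressure, ρ density, g gravitational acceleration, z height and w vertical velocity, with Coriolis and friction terms neglected in the vertical momentum equation.

```latex
\begin{align*}
\frac{Dw}{Dt} &= -\frac{1}{\rho}\frac{\partial p}{\partial z} - g
  && \text{(vertical momentum equation)} \\
\frac{\partial p}{\partial z} &= -\rho g
  && \text{(hydrostatic balance, assumed to hold exactly)}
\end{align*}
```

Assuming exact balance amounts to dropping the vertical acceleration term on the left of the first equation, so that vertical velocity is no longer advanced in time by its own equation and has to be diagnosed from the other variables.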
A scale analysis of the equations of atmospheric motion assumes that the motion will retain those characteristics for the period of time indicated by the choice of the time scale (Browning and Kreiss, 1986). This means that the initial data must be smooth (have spatial derivatives consistent with a length scale of the order of 1000 km) and must lead to time derivatives consistent with a time scale of the order of a day. To satisfy the latter constraint, the initial data must satisfy the elliptic constraints determined by ensuring a number of time derivatives are of the order of a day. If all of these conditions are satisfied, then the solution can be ensured to evolve smoothly, i.e., on the spatial and time scales used in the scale analysis. This latter mathematical theory for hyperbolic systems is called “The Bounded Derivative Theory” (BDT) and was introduced by Professor Kreiss (Kreiss, 1979, 1980).
Instead of assuming exact hydrostatic balance (which leads to a number of mathematical problems discussed below), Browning and Kreiss (1986) introduced the idea of slowing down the vertically propagating waves instead of removing them completely, thus retaining the desirable mathematical property of hyperbolicity of the unmodified system. This modification was proved mathematically to accurately describe the large scale motions of interest and, subsequently, also to describe smaller scales of motion in the mid-latitudes (Browning and Kreiss, 2002). In this manuscript, the correct elliptic constraints to ensure smoothly evolving solutions are derived. In particular the elliptic equation for the vertical velocity is three dimensional, i.e., not columnar, and the horizontal divergence must be derived from the vertical velocity in order to ensure a smoothly evolving solution.
It is now possible to see why the hydrostatic system is not the correct reduced system (the system that correctly describes the smoothly evolving solution to a first degree of approximation). The columnar vertical velocity equation (Richardson’s equation) leads to columnar heating that is not spatially smooth. This is called rough forcing and leads to the physically unrealistic generation of large amounts of energy in the highest wave numbers of a model (Browning and Kreiss, 1994; Page, Fillion, and Zwack, 2007). This energy requires large amounts of nonphysical numerical dissipation in order to keep the model from becoming unstable, i.e., blowing up. We also mention that the boundary layer interacts very differently with a three dimensional elliptic equation for the vertical velocity than with a columnar equation (Gravel, Browning, and Kreiss).
Browning, G. L., and H.-O. Kreiss, 1986: Scaling and computation of smooth atmospheric motions. Tellus, 38A, 295–313.
——, and ——, 1994: The impact of rough forcing on systems with multiple time scales. J. Atmos. Sci., 51, 369-383.
——, and ——, 2002: Multiscale bounded derivative initialization for an arbitrary domain. J. Atmos. Sci., 59, 1680-1696.
Charney, J. G., 1948: On the scale of atmospheric motions. Geofys. Publ., 17, 1–17.
Kreiss, H.-O., 1979: Problems with different time scales for ordinary differential equations. SIAM J. Num. Anal., 16, 980–998.
——, 1980: Problems with different time scales for partial differential equations. Commun. Pure Appl. Math, 33, 399–440.
Gravel, Sylvie et al.: The relative contributions of data sources and forcing components to the large-scale forecast accuracy of an operational model. This web site
Page, Christian, Luc Fillion, and Peter Zwack, 2007: Diagnosing summertime mesoscale vertical motion: implications for atmospheric data assimilation. Monthly Weather Review, 135, 2076-2094.