Objective Bayesian parameter estimation: Incorporating prior information

A guest article by Nic Lewis


In a recent article I discussed Bayesian parameter inference in the context of radiocarbon dating. I compared Subjective Bayesian methodology based on a known probability distribution, from which one or more values were drawn at random, with an Objective Bayesian approach using a noninformative prior that produced results depending only on the data and the assumed statistical model. Here, I explain my proposals for incorporating, within an Objective Bayesian approach, evidence-based probabilistic prior information about a fixed but unknown parameter taking continuous values. I am talking here about information pertaining to the particular parameter value involved, derived from observational evidence relating to that value. I am not concerned with the case where the parameter value has been drawn at random from a known actual probability distribution, which is an unusual case in most areas of physics. Even when evidence-based probabilistic prior information about a parameter being estimated does exist and is to be used, the results of an experiment should be reported both with and without that information incorporated. It is normal practice to report the results of a scientific experiment on a stand-alone basis, so that the new evidence it provides may be evaluated.

In principle the situation I am interested in may involve a vector of uncertain parameters, and multi-dimensional data, but for simplicity I will concentrate on the univariate case. Difficult inferential complications can arise where there are multiple parameters and only one or a subset of them are of interest. The best noninformative prior to use (usually Bernardo and Berger’s reference prior)[1] may then differ from Jeffreys’ prior.

Bayesian updating

Where there is an existing parameter estimate in the form of a posterior PDF, the standard Bayesian method for incorporating (conditionally) independent new observational information about the parameter is “Bayesian updating”. This involves treating the existing estimated posterior PDF for the parameter as the prior in a further application of Bayes’ theorem, and multiplying it by the data likelihood function pertaining to the new observational data. Where the parameter was drawn at random from a known probability distribution, the validity of this procedure follows from rigorous probability calculus.[2] Where it was not so drawn, Bayesian updating may nevertheless satisfy the weaker Subjective Bayesian coherency requirements. But is standard Bayesian updating justified under an Objective Bayesian framework, involving noninformative priors?
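The mechanics of standard Bayesian updating can be sketched numerically. This is a toy Gaussian example of my own devising, not drawn from any of the studies discussed: an existing posterior on a grid is treated as the prior and multiplied by the likelihood of conditionally independent new data.

```python
import numpy as np

# Toy example: grid-based Bayesian updating for a single continuous
# parameter theta; all numbers are illustrative.
theta = np.linspace(0.0, 10.0, 1001)
dtheta = theta[1] - theta[0]

def normalize(density):
    """Scale a density on the grid so it integrates to 1."""
    return density / (density.sum() * dtheta)

def gaussian_likelihood(grid, obs, sigma):
    """Likelihood of observing obs with Gaussian error sigma."""
    return np.exp(-0.5 * ((obs - grid) / sigma) ** 2)

# Posterior from a first experiment (here under a uniform prior)...
posterior1 = normalize(gaussian_likelihood(theta, obs=4.0, sigma=1.0))
# ...treated as the prior and multiplied by the likelihood of
# conditionally independent new data.
posterior2 = normalize(posterior1 * gaussian_likelihood(theta, obs=6.0, sigma=1.0))

# With two equally precise observations at 4 and 6, the updated
# posterior is centred midway between them.
mean2 = (theta * posterior2).sum() * dtheta
```

In this simple location-parameter case the result is order-independent; as discussed below, that convenient property does not survive once the noninformative prior differs between experiments.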

A noninformative prior varies depending on the specific relationships the data values have with the parameters and on the data-error characteristics, and thus on the form of the likelihood function. Noninformative priors for parameters therefore vary with the experiment involved; in some cases they may also vary with the data. Two studies estimating the same parameter using data from experiments involving different likelihood functions will normally give rise to different noninformative priors. On the face of it, this leads to a difficulty in using objective Bayesian methods to combine evidence in such cases. Using the appropriate, individually noninformative, prior, standard Bayesian updating would produce a different result according to the order in which Bayes’ theorem was applied to data from the two experiments. In both cases, the updated posterior PDF would be the product of the likelihood functions from each experiment, multiplied by the noninformative prior applicable to the first of the experiments to be analysed. That noninformative priors and standard Bayesian updating may conflict, producing inconsistency, is a well known problem (Kass and Wasserman, 1996).[3]
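A standard textbook illustration of this point (my choice of example, not one from the article): for binomial sampling with fixed trial count, Jeffreys' prior for the success probability p differs from Jeffreys' prior under negative-binomial sampling (trials continued until a fixed number of successes), even though the parameter is the same.

```python
import numpy as np

# Two experimental designs estimating the same probability p:
#   binomial (fixed n):          Jeffreys prior ~ p^-1/2 (1-p)^-1/2
#   negative binomial (fixed r): Jeffreys prior ~ p^-1  (1-p)^-1/2
p = np.linspace(0.01, 0.99, 99)

def jeffreys_binomial(p):
    return p ** -0.5 * (1 - p) ** -0.5

def jeffreys_neg_binomial(p):
    return p ** -1.0 * (1 - p) ** -0.5

# The two priors are not proportional, so the noninformative prior
# depends on the experiment even though the parameter is the same.
ratio = jeffreys_binomial(p) / jeffreys_neg_binomial(p)   # equals p**0.5
```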

Modifying standard Bayesian updating

My proposal is to overcome this problem by applying Bayes' theorem once only, to the joint likelihood function for the experiments in combination, with a single noninformative prior being computed for inference from the combined experiments. This is equivalent to the modification of Bayesian updating proposed in Lewis (2013a).[4] It involves rejecting the validity of standard Bayesian updating for objective inference about fixed but unknown continuously-valued parameters, save in special cases. Such special cases include those where the new data is obtained from the same experimental setup as the original data, or where the experiments involved are different but the same form of prior is noninformative in both cases. Continue reading
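A toy numerical sketch of the order-dependence problem and the proposed remedy, under the assumption (mine, for concreteness) that one experiment uses binomial sampling and the other negative-binomial sampling of the same probability p, so that their individually noninformative Jeffreys priors differ:

```python
import numpy as np

# Two experiments estimating the same probability p, with different
# designs and hence different Jeffreys priors (illustrative data).
p = np.linspace(0.001, 0.999, 999)
dp = p[1] - p[0]

def normalize(density):
    return density / (density.sum() * dp)

lik_A = p ** 3 * (1 - p) ** 7            # binomial: 3 successes in 10 trials
prior_A = p ** -0.5 * (1 - p) ** -0.5    # Jeffreys prior, binomial design
lik_B = p ** 2 * (1 - p) ** 4            # neg. binomial: 2nd success after 4 failures
prior_B = p ** -1.0 * (1 - p) ** -0.5    # Jeffreys prior, neg-binomial design

# Sequential updating retains whichever prior belonged to the FIRST
# experiment analysed, so the result depends on the order:
post_A_first = normalize(prior_A * lik_A * lik_B)
post_B_first = normalize(prior_B * lik_B * lik_A)
order_dependent = not np.allclose(post_A_first, post_B_first)

# Applying Bayes' theorem once, to the joint likelihood with a single
# prior computed for the combined experiment, removes the ambiguity.
joint_prior = prior_A   # placeholder: in practice derived for the combination
post_joint = normalize(joint_prior * lik_A * lik_B)
```

The `joint_prior` line is a placeholder: the whole point of the proposal is that a single noninformative prior would be computed for the combined experiment, whatever the order in which the data arrived.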

Marvel et al.: GISS did omit land use forcing

A guest article by Nic Lewis

I reported in a previous post, here, a number of serious problems that I had identified in Marvel et al. (2015): Implications for climate sensitivity from the response to individual forcings. This Nature Climate Change paper concluded, based purely on simulations by the GISS-E2-R climate model, that estimates of the transient climate response (TCR) and equilibrium climate sensitivity (ECS) based on observations over the historical period (~1850 to recent times) were biased low.

I followed up my first article with an update that concentrated on land use change (LU) forcing. Inter alia, I presented regression results that strongly suggested the Historical simulation forcing (iRF) time series used in Marvel et al. omitted LU forcing. Gavin Schmidt of GISS responded on RealClimate, writing:

“Lewis in subsequent comments has claimed without evidence that land use was not properly included in our historical runs…. These are simply post hoc justifications for not wanting to accept the results.”

In fact, not only had I presented strong evidence that the Historical iRF values omitted LU forcing, but I had concluded:

“I really don’t know what the explanation is for the apparently missing Land use forcing. Hopefully GISS, who alone have all the necessary information, may be able to provide enlightenment.”

When I responded to the RealClimate article, here, I inter alia presented further evidence that LU forcing hadn’t been included in the computed value of the total forcing applied in the Historical simulation: there was virtually no trace of LU forcing in the spatial pattern for Historical forcing. I wasn’t suggesting that LU forcing had been omitted from the forcings applied during the Historical simulations, but rather that it had not been included when measuring them.

Yesterday, a climate scientist friend drew my attention to a correction notice published by Nature Climate Change, reading as follows:

Corrected online 10 March 2016

In the version of this Letter originally published online, there was an error in the definition of F2×CO2 in equation (2). The historical instantaneous radiative forcing time series was also updated to reflect land use change, which was inadvertently excluded from the forcing originally calculated from ref. 22. This has resulted in minor changes to data in Figs 1 and 2, as well as in the corresponding main text and Supplementary Information. In addition, the end of the paragraph beginning ‘Scaling ΔF for each of the single-forcing runs…’ should have read ‘…the CO2-only runs’ (not ‘GHG-only runs’). The conclusions of the Letter are not affected by these changes. All errors have been corrected in all versions of the Letter. The authors thank Nic Lewis for his careful reading of the original manuscript that resulted in the identification of these errors. Continue reading

Bayesian parameter estimation: Radiocarbon dating re-examined

A guest article by Nic Lewis


In April 2014 I published a guest article about statistical methods applicable to radiocarbon dating, which criticised existing Bayesian approaches to the problem. A standard – subjective Bayesian – method of inference about the true calendar age of a single artefact from a radiocarbon date determination (measurement) involved using a uniform-in-calendar-age prior. I argued that this did not, as claimed, equate to not including anything but the radiocarbon dating information, and was not a scientifically sound method for inference about isolated examples of artefacts.[1]

My article attracted many comments, not all agreeing with my arguments. This article follows up and expands on points in my original article, and discusses objections raised.

First, a brief recap. Radiocarbon dating involves determining the radiocarbon age of (a sample from) an artefact and then converting that determination to an estimate of the true calendar age t, using a highly nonlinear calibration curve. It is this nonlinearity that causes the difficulties I focussed on. Both the radiocarbon determination and the calibration curve are uncertain, but errors in them are random and in practice can be combined. A calibration program is used to derive estimated calendar age probability density functions (PDFs) and uncertainty ranges from a radiocarbon determination.
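To see why the nonlinearity matters, here is a schematic sketch using an invented wiggly calibration curve (not IntCal, and the numbers are arbitrary): a single radiocarbon determination can be consistent with several disjoint calendar ages, producing a multimodal likelihood in calendar age.

```python
import numpy as np

# Schematic only: an invented nonlinear "calibration curve".
t = np.linspace(0.0, 1000.0, 2001)        # calendar age grid (years)
c = 0.8 * t + 60.0 * np.sin(t / 50.0)     # radiocarbon age as a function of t
sigma = 25.0                              # combined determination + curve error
determination = 400.0                     # measured radiocarbon age

# Likelihood of calendar age t given the determination:
likelihood = np.exp(-0.5 * ((determination - c) / sigma) ** 2)

# Count local maxima above half the peak height: the wiggles in c(t)
# make the likelihood multimodal in calendar age.
interior = likelihood[1:-1]
is_peak = (interior > likelihood[:-2]) & (interior > likelihood[2:]) \
          & (interior > 0.5 * likelihood.max())
n_modes = int(is_peak.sum())
```

With a linear calibration curve the likelihood in calendar age would be a single Gaussian; it is the non-monotone sections of the curve that split it into multiple modes.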

The standard calibration program OxCal that I concentrated on uses a subjective Bayesian method with a prior that is uniform over the entire calibration period, where a single artefact is involved. Calendar age uncertainty ranges for an artefact whose radiocarbon age is determined (subject to measurement error) can be derived from the resulting posterior PDFs. They can be constructed either from one-sided credible intervals (finding the values at which the cumulative distribution function (CDF) – the integral of the PDF – reaches the two uncertainty bound probabilities), or from highest probability density (HPD) regions containing the total probability in the uncertainty range.
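The two constructions can be sketched numerically. This uses a generic skewed posterior of my own choosing, not OxCal output, to show how an equal-tailed range from the CDF differs from an HPD region at the same probability level.

```python
import numpy as np

# Illustrative skewed posterior PDF on a grid (a Gamma(2,1) shape).
x = np.linspace(0.0, 20.0, 4001)
dx = x[1] - x[0]
pdf = x * np.exp(-x)
pdf /= pdf.sum() * dx

# (a) Equal-tailed 90% range from the CDF: values at which the
# cumulative probability reaches 5% and 95%.
cdf = np.cumsum(pdf) * dx
lo = x[np.searchsorted(cdf, 0.05)]
hi = x[np.searchsorted(cdf, 0.95)]

# (b) 90% highest probability density (HPD) region: take grid points in
# order of decreasing density until they hold 90% of the probability.
order = np.argsort(pdf)[::-1]
mass = np.cumsum(pdf[order]) * dx
in_hpd = np.zeros(pdf.size, dtype=bool)
in_hpd[order[: np.searchsorted(mass, 0.90) + 1]] = True
hpd_lo, hpd_hi = x[in_hpd].min(), x[in_hpd].max()

# For a skewed PDF the HPD region is shorter than the equal-tailed range.
```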

In the subjective Bayesian paradigm, probability represents a purely personal degree of belief. That belief should reflect existing knowledge, updated by new observational data. However, even if that body of knowledge is common to two people, their probability evaluations are not required to agree,[2] and neither evaluation need properly reflect the knowledge on which it is based. I do not regard this as a satisfactory paradigm for scientific inference.

I advocated taking instead an objective Bayesian approach, based on using a computed “noninformative prior” rather than a uniform prior. I used as my criterion for judging the two methods how well they performed upon repeated use, hypothetical or real, in relation to single artefacts. In other words, when estimating the value of a fixed but unknown parameter and giving uncertainty ranges for its value, how accurately would the actual proportions of cases in which the true value lies within each given range correspond to the indicated proportion of cases? That is to say, how good is the “probability matching” (frequentist coverage) of the method. I also examined use of the non-Bayesian signed root log-likelihood ratio (SRLR) method, judging it by the same criterion. Continue reading
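The probability-matching criterion can be made concrete with a deliberately simple toy case of my own (a Gaussian observation of a fixed parameter, for which a uniform prior happens to be exactly probability matching): simulate repeated use and count how often the stated 90% interval contains the true value.

```python
import numpy as np

# Repeated-use check of a 90% credible interval's frequentist coverage.
rng = np.random.default_rng(0)
true_theta, sigma = 3.0, 1.0
n_trials = 5000

hits = 0
for _ in range(n_trials):
    obs = rng.normal(true_theta, sigma)
    # Under a uniform prior the posterior is N(obs, sigma^2), so the
    # 90% equal-tailed credible interval is obs +/- 1.645*sigma.
    if abs(obs - true_theta) <= 1.645 * sigma:
        hits += 1
coverage = hits / n_trials   # close to 0.90 for a probability-matching method
```

For nonlinear problems such as radiocarbon calibration, the coverage of a uniform-prior method can depart substantially from the stated probability, which is the basis of the criticism.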

Gerry Browning: In Memory of Professor Heinz Kreiss

Gerry Browning writes:

The Correct System of Equations for Climate and Weather Models

The system of equations numerically approximated by both weather and climate models is called the hydrostatic system. Using a scale analysis for mid-latitude large-scale motions in the atmosphere (motions with a horizontal length scale of 1000 km and a time scale of a day), Charney (1948) showed that hydrostatic balance, i.e., balance between the vertical pressure gradient and the gravitational force, is satisfied to a high degree of accuracy by these motions. Because the fine balance between these terms was difficult to calculate numerically, and in order to remove fast vertically propagating sound waves and so allow numerical integration with a larger time step, he introduced the hydrostatic system, which assumes exact balance between the vertical pressure gradient and the gravitational force. This system leads to a columnar (function-of-altitude) equation for the vertical velocity called Richardson’s equation.

A scale analysis of the equations of atmospheric motion assumes that the motion will retain those characteristics for the period of time indicated by the choice of the time scale (Browning and Kreiss, 1986). This means that the initial data must be smooth (varying on spatial scales on the order of 1000 km) so as to produce time derivatives on the order of a day. To satisfy the latter constraint, the initial data must satisfy the elliptic constraints obtained by requiring that a number of time derivatives be of the order of a day. If all of these conditions are satisfied, then the solution can be ensured to evolve smoothly, i.e., on the spatial and time scales used in the scale analysis. This mathematical theory for hyperbolic systems is called “The Bounded Derivative Theory” (BDT) and was introduced by Professor Kreiss (Kreiss, 1979, 1980).

Instead of assuming exact hydrostatic balance (leads to a number of mathematical problems discussed below), Browning and Kreiss (1986) introduced the idea of slowing down the vertically propagating waves instead of removing them completely, thus retaining the desirable mathematical property of hyperbolicity of the unmodified system. This modification was proved mathematically to accurately describe the large scale motions of interest and, subsequently, also to describe smaller scales of motion in the mid-latitudes (Browning and Kreiss, 2002). In this manuscript, the correct elliptic constraints to ensure smoothly evolving solutions are derived. In particular the elliptic equation for the vertical velocity is three dimensional, i.e., not columnar, and the horizontal divergence must be derived from the vertical velocity in order to ensure a smoothly evolving solution.

It is now possible to see why the hydrostatic system is not the correct reduced system (the system that correctly describes the smoothly evolving solution to a first degree of approximation). The columnar vertical velocity equation (Richardson’s equation) leads to columnar heating that is not spatially smooth. This is called rough forcing and leads to the physically unrealistic generation of large amounts of energy in the highest wave numbers of a model (Browning and Kreiss, 1994; Page, Fillion, and Zwack, 2007). This energy requires large amounts of nonphysical numerical dissipation in order to keep the model from becoming unstable, i.e., blowing up. We also mention that the boundary layer interacts very differently with a three-dimensional elliptic equation for the vertical velocity than with a columnar equation (Gravel, Browning, and Kreiss).

Browning, G. L., and H.-O. Kreiss, 1986: Scaling and computation of smooth atmospheric motions. Tellus, 38A, 295–313.
——, and ——, 1994: The impact of rough forcing on systems with multiple time scales. J. Atmos. Sci., 51, 369–383.
——, and ——, 2002: Multiscale bounded derivative initialization for an arbitrary domain. J. Atmos. Sci., 59, 1680–1696.
Charney, J. G., 1948: On the scale of atmospheric motions. Geofys. Publ., 17, 1–17.
Kreiss, H.-O., 1979: Problems with different time scales for ordinary differential equations. SIAM J. Numer. Anal., 16, 980–998.
——, 1980: Problems with different time scales for partial differential equations. Commun. Pure Appl. Math., 33, 399–440.
Gravel, S., G. L. Browning, and H.-O. Kreiss: The relative contributions of data sources and forcing components to the large-scale forecast accuracy of an operational model. This web site.
Page, C., L. Fillion, and P. Zwack, 2007: Diagnosing summertime mesoscale vertical motion: implications for atmospheric data assimilation. Mon. Wea. Rev., 135, 2076–2094.

Disappearing the MWP at Icefields, Alberta

In today’s post, I’m going to critically examine another widely used tree ring chronology: the Icefields (Alberta) MXD RCS chronology of Luckman and Wilson (2005 pdf), used most recently in Wilson et al 2016.

I’ll show that the RCS technique used in the LW2005 MXD chronology eliminated high medieval values as a tautology of their method, not as a property of the data, and that the Icefields data provide equal (or greater) justification for MXD RCS chronologies with elevated medieval values. The measure of potential difference is previewed in Figure 1 below, which compares the reported LW2005 chronology (top panel) to an MXD chronology with elevated medieval values, calculated using an equally (or more) plausible RCS methodology; several other similar variations are shown in the post.


Figure 1.  Top – MXD RCS chronology from Luckman and Wilson 2005 Figure 2; bottom – MXD RCS chronology calculated under alternative assumptions – see Figure 3 below. 

I will use random effects statistical techniques both to analyse the data and to place prior analysis in a more formal statistical context. Because LW2005 coauthor Rob Wilson stands alone for civility in the paleoclimate world, and because the present post is critical of past analysis, in some ways I would have preferred to use another example. On the other hand, it is also possible that Rob will recognize that the techniques applied in the present post – techniques that are unfamiliar to dendros – can yield fresh and important insight into the data, and will look at the Icefields data a little differently.

Although the article in consideration was published more than a decade ago, the analysis in today’s article was impossible until relatively recently, because coauthor Luckman withheld the relevant data for over a decade.

Continue reading

New Light on Gulf of Alaska

Last week, I posted on the effect of ex post site selection on the Gulf of Alaska tree ring chronology used in Wilson et al 2016 (from Wiles et al 2014). An earlier incarnation of this chronology (in D’Arrigo et al 2006) had had a severe divergence problem, a problem that Wiles et al had purported to mitigate. However, their claimed mitigation depended on ex post selection of modern sites that were 800 km away from the original selection.

Even with this ex post selection, I could not replicate the supposed mitigation of the divergence problem, because Wiles et al had not archived all of their data: it appeared that their ex post result depended on still-unarchived data. Subsequent to my post, Greg Wiles commendably archived the previously missing Wright Mountain data (but, unfortunately, neglected to archive the update to Eyak Mountain, leaving the data less incomplete but still incomplete).

The new data confirms my suspicion that the “missing” Wright Mountain data was inhomogeneous: it turns out that the average Wright Mountain RW was 21% higher than the average RW value from other modern sites.  Because Wiles et al used RCS with a single regional curve, the inclusion of this inhomogeneous data results in higher recent values.  By itself, the Wright Mountain data doesn’t actually go up in the last half of the 20th century, but inclusion of the inhomogeneous data translates the chronology upwards relative to subfossil data.   In the Calvinball world of RCS chronologies, the handling of inhomogeneous data is determined after the fact, with RCS chronologers seeming to be extremely alert to inhomogeneities that yield high medieval values (e.g. Polar Urals), but rather obtuse to inhomogeneities that yield high modern values, with decisions on site inhomogeneity always being made ex post. All too often, medieval-modern comparisons rest on Calvinball decisions, rather than the integrity of the data.
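The arithmetic of that upward translation can be sketched with a deliberately stylized toy (my own invented numbers; real RCS standardization aligns trees by cambial age and fits a smooth regional curve, which this skips):

```python
import numpy as np

# Stylized RCS-like toy: all trees share one age span and one regional
# curve, so pooling a higher-growth site inflates that site's indices.
rng = np.random.default_rng(1)
ages = np.arange(1, 101)
true_curve = 2.0 * np.exp(-ages / 60.0)    # ring width declining with age

def simulate_site(n_trees, scale):
    noise = 1.0 + 0.05 * rng.standard_normal((n_trees, ages.size))
    return scale * true_curve * noise

homogeneous = simulate_site(30, scale=1.0)
inhomogeneous = simulate_site(30, scale=1.21)   # ~21% higher mean ring width

# Single regional curve estimated from ALL trees pooled together:
pooled = np.vstack([homogeneous, inhomogeneous])
regional_curve = pooled.mean(axis=0)
indices = pooled / regional_curve

# Trees from the higher-growth site end up with systematically higher
# indices, translating their part of the chronology upward even though
# their ring widths show no trend of their own.
mean_homog_idx = indices[:30].mean()
mean_inhomog_idx = indices[30:].mean()
```

In the sketch the higher-growth site's indices sit about 21% above the rest, purely because the two populations share one regional curve; handling the site separately (or scaling it) would remove the offset.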

In today’s post, I’ll use the random effects techniques (consistently recommended at CA) to try to provide a structured consideration of this example.  Continue reading

Marvel et al. – Gavin Schmidt admits key error but disputes everything else

A guest article by Nicholas Lewis


Gavin Schmidt has finally provided, at the GISS website, the iRF and ERF forcing values for a doubling of CO2 (F2xCO2) in GISS-E2-R and, related to this, has made wholesale corrections to the results of Marvel et al. 2015 (MEA15). He has coupled this with a criticism at RealClimate of my appraisal of MEA15, writing about it: “As is usual when people try too hard to delegitimise an approach or a paper, the criticisms tend to be a rag-bag of conceptual points, trivialities and, often, confused mis-readings – so it proves in this case”. Personally, I think this fits better as a description of Gavin Schmidt’s own article, which contains multiple mistakes and misconceptions that I think it is worth setting the record straight on. Continue reading

A Return to Polar Urals: Wilson et al 2016

Wilson et al 2016, like D’Arrigo et al 2006, includes a ‘Polar Urals’ chronology as one of its components.  Tree ring chronologies from Polar Urals and Yamal have long been a contentious issue at Climate Audit, dating back to the earliest days (see tags Yamal, Polar Urals).

Whereas the D’Arrigo et al 2006 version had one of the biggest blades in the entire D’Arrigo et al 2006 portfolio (ending at nearly 4 sigma, with the smooth at 2.5 sigma), the new version, while slightly elevated in the 20th century, remains under 1 sigma.


Figure 1. ‘Polar Urals’ chronologies (scaled to SD Units): top – from D’Arrigo et al 2006 (RCS); bottom – Wilson et al 2016. The top series is actually the Yamal chronology (Briffa, 2000), while the bottom series is a Polar Urals MXD version from Briffa et al (2013).

In today’s post, I’ll first discuss the dramatic reduction in the blade of the ‘Polar Urals’ chronology from D’Arrigo et al 2006 to Wilson et al 2016 (which uses a variation of the Polar Urals MXD chronology from Briffa et al 2013).  This doesn’t take long to explain.

I’ll then use the rest of the post to discuss the bewildering thicket of adjustments to the Polar Urals MXD chronology in Briffa et al 2013. I would be astonished if any practising paleoclimatologist (including the lead author of Wilson et al 2016) has ever made any attempt to figure out what Briffa (or, more likely, Melvin) actually did in Briffa et al 2013. I doubt that the editor and peer reviewers did either. Continue reading

Picking Cherries in the Gulf of Alaska

The bias arising from ex post selection of sites for regional tree ring chronologies has been a long standing issue at Climate Audit, especially in connection with Briffa’s chronologies for Yamal and Polar Urals (see tag.)  I discussed it most recently in connection with the Central Northwest Territories (CNWT) regional chronology of D’Arrigo et al 2006,  in which I showed a remarkable example of ex post selection.

In today’s post, I’ll show a third vivid example of the impact of ex post site selection on the divergence problem in Gulf of Alaska regional chronologies. I did not pick this chronology as a particularly lurid example after examining multiple sites. This chronology is the first column in the Wilson et al 2016 N-TREND spreadsheet and was the first site in that collection that I examined closely. It is also a site for which most (but not all) of the relevant data has been archived and which can therefore be examined. Unfortunately, data for many of the Wilson et al 2016 sites has not been archived and, if past experience is any guide, it might take another decade to become available (by which time we will have all “moved on”). Continue reading

Cherry-Picking by D’Arrigo

One of the longest standing Climate Audit issues with paleoclimate reconstructions is ex post decisions on inclusion/exclusion of data, of which ex post decisions on inclusion/exclusion of sites/data in “regional [treering] chronologies” is one important family.  This was the issue in the original Yamal controversy, in response to which Briffa stated that they “would never select or manipulate data in order to arrive at some preconceived or unrepresentative result”. However, Briffa and associates have never set out ex ante criteria for site inclusion/exclusion, resulting in the methodology for Briffa regional reconstructions seeming more like Calvinball than science, as discussed in many CA posts.

Unlike Briffa, D’Arrigo has candidly admitted to the selection of data to arrive at a preconceived result. At the 2006 NAS panel workshop, Rosanne D’Arrigo famously told the surprised panelists that you had to pick cherries if you wanted to make cherry pie. Again in 2009 (though not noticed at the time), D’Arrigo et al 2009 stated that they could “partially circumvent” the divergence problem by only using data that went up:

The divergence problem can be partially circumvented by utilizing tree-ring data for dendroclimatic reconstructions from sites where divergence is either absent or minimal. (Wilson et al., 2007; Buntgen et al., in press; Youngblut and Luckman, in press).

Portfolio managers would love to have a similar option in constructing portfolios: if, after the fact, you could pick stocks that went up, it would be trivially easy to “circumvent” market downturns. That paleoclimatologists seem so obtuse to this simple observation is a major puzzlement.

In today’s post, I’ll show an absolutely breathtaking example of biased ex post picking by D’Arrigo et al in the D’Arrigo et al 2006 CNWT chronology.  It was impossible for anyone to identify the full measure of this bias at the time or for many years afterwards, as D’Arrigo and coauthors failed to archive data at the time and refused to provide it when requested. They were supported in their refusal by IPCC WG1 Co-Chair Susan Solomon, who, as CA readers are aware, threatened me with expulsion as an IPCC AR4 reviewer for seeking supporting data for D’Arrigo et al 2006 (then cited in preprint by AR4).   The data showing the cherry picking only became available in 2014 as part of a belated archiving program in the final year of Gordon Jacoby’s life.  

Continue reading