Because in my view prior information about a fixed but unknown parameter needs to be conveyed by a likelihood function, not by a prior distribution. I believe that standard Bayesian theory in this respect does not in general lead to objective inference (at least in long-run probability-matching terms). The true function of the prior, if you are looking for objective inference, is to convert a density in data space – the likelihood function, whether from one experiment or from two or more experiments combined – into a density in parameter space. That is, in essence, what a noninformative prior does.

Thus they converge at the same rate, but above a certain sample size the mle is almost surely closer to the true value than the posterior mean.
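For what it is worth, a claim like this can be probed numerically. The sketch below is purely illustrative, assuming a normal model with known variance and a conjugate normal prior whose mean differs from the true parameter value; it estimates how often the mle (the sample mean) lands closer to the true value than the posterior mean:

```python
import random

random.seed(0)

def mle_wins_fraction(n, theta=1.0, m0=0.0, s0=1.0, sigma=1.0, reps=5000):
    """Fraction of replications in which the mle (sample mean) lands
    closer to theta than the conjugate-normal posterior mean.
    All parameter values are illustrative assumptions."""
    lam0 = 1.0 / s0**2                    # prior precision
    lam = n / sigma**2                    # data precision
    wins = 0
    for _ in range(reps):
        # Draw the sample mean directly from its sampling distribution.
        xbar = random.gauss(theta, sigma / n**0.5)
        post_mean = (lam0 * m0 + lam * xbar) / (lam0 + lam)
        if abs(xbar - theta) < abs(post_mean - theta):
            wins += 1
    return wins / reps

for n in (10, 100, 1000):
    print(n, mle_wins_fraction(n))
```

Running it for increasing n shows how the comparison behaves as the posterior mean's shrinkage toward the (wrong) prior mean becomes negligible.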

About this: *Bayesian parameter estimation isn’t focussed on obtaining a point estimate*

Everyone in science wants accurate point estimates. Invariance under transformations is generally disparaged by Bayesians when it is applied by frequentists. It’s the posterior mean about which you can prove it has a smaller mse than the mle, in appropriate cases.

Why use noninformative priors if the goal, as stated in your title, is to incorporate prior information?

These critical comments aside, I respect your work.

Just to clarify, I have focussed on estimation of parameter values rather than hypothesis testing.

The prior information that I am thinking of is information derived from observations, so is of the same nature as information derived from the new dataset being analysed. So logically their reliability should be tested in similar ways.

Bayesian parameter estimation isn’t focussed on obtaining a point estimate, but when one is wanted it is generally better to use the posterior median, which is invariant under reparameterisation, than the posterior mean (or the posterior mode), which is not.
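The invariance claim is easy to check on simulated draws. Assuming a hypothetical posterior sample for a positive parameter (lognormal here purely for illustration), the median commutes with a monotone reparameterisation while the mean does not:

```python
import math
import random

random.seed(1)

# Hypothetical posterior sample for a positive parameter theta
# (the lognormal shape is an illustrative assumption).
samples = [random.lognormvariate(0.0, 1.0) for _ in range(10001)]

def median(xs):
    ys = sorted(xs)
    return ys[len(ys) // 2]   # odd-length sample: the exact middle element

def mean(xs):
    return sum(xs) / len(xs)

# Monotone reparameterisation phi = log(theta):
log_samples = [math.log(x) for x in samples]

# The median commutes with the transformation ...
print(math.log(median(samples)), median(log_samples))   # same value

# ... but the mean does not:
print(math.log(mean(samples)), mean(log_samples))       # different values
```

The same holds for any strictly increasing reparameterisation, since sorting order, and hence the middle sample, is preserved.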

“Unless the prior has a mean equal to the true value, the mle will converge to the true value faster than the mean of the corresponding posterior distribution.”

If a noninformative reference prior is used (being Jeffreys prior, in the univariate case) – which will often have no finite mean – then I believe the mle and the posterior distribution will converge at the same rate to the true value of the parameter. See Bernardo and Smith, 1994, Appendix B.4.4.
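As a concrete illustration of the same-rate point, take the Bernoulli model, where Jeffreys prior is Beta(1/2, 1/2). The posterior mean then differs from the mle by O(1/n), which is negligible relative to their common O(1/√n) sampling error (the numbers below are illustrative):

```python
# Bernoulli model with Jeffreys prior Beta(1/2, 1/2): the posterior after
# k successes in n trials is Beta(k + 1/2, n - k + 1/2).
def mle(k, n):
    return k / n

def jeffreys_post_mean(k, n):
    return (k + 0.5) / (n + 1.0)

# The two estimators differ by O(1/n), so they share the same
# O(1/sqrt(n)) convergence rate to the true parameter value.
p = 0.3  # illustrative true success probability
for n in (10, 100, 1000, 10000):
    k = round(p * n)  # a representative realisation
    print(n, abs(mle(k, n) - jeffreys_post_mean(k, n)))
```

The printed gap shrinks roughly tenfold for each tenfold increase in n, consistent with an O(1/n) difference between the two estimators.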

May I recommend the book “A Comparison of Frequentist and Bayesian Methods of Estimation” by Prof Francisco Samaniego?

Short take-home message: the asserted superiority of Bayesian methods depends on the accuracy of the prior. He calls it a “threshold” that must be met for the Bayesian method to be better than the frequentist. He provides a Bayesian metric for assessing goodness.

Also, it may or may not be true that a 0 value for a parameter is highly unlikely; but it is demonstrably true in many settings that parameter values really, really close to 0 are sufficiently accurate. Take, for example, the effects of most gene expression levels on most measures of health and disease: it might be called a vast desert of true nil null hypotheses.

I read CA now and then. You write at a uniformly high standard. Please keep up the good work.

Your title says: “Incorporating prior information”. Do you think that “prior information” ought to have been subjected to stringent testing, like most other claims about the world? Has anyone shown that the mean of the posterior distribution calculated using a Jeffreys prior converges to the true value faster than the maximum likelihood estimator from the same likelihood function? Unless the prior has a mean equal to the true value, the mle will converge to the true value faster than the mean of the corresponding posterior distribution.

Unfortunately, in many cases in climate science the data is poor and it is not practicable to improve it by obtaining more data – unless someone invents a time machine!

You say “reasonable people can engage in scientific discourse to discuss their beliefs and understandings of the prior.” However, I believe that the only valid purpose of the prior is to convert the likelihood function into a posterior PDF. If there is genuine existing knowledge, it needs to be represented as a likelihood function (to be combined with the likelihood function for the experiment being analysed, with which it may also be compared), not as a prior PDF. This view is, of course, anathema to most Bayesians, but I am confident that it will in time be seen to be correct.
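A minimal numerical sketch of this view, assuming both information sources can be summarised by normal likelihoods (all numbers illustrative): multiplying the “prior information” likelihood by the new experiment’s likelihood gives the familiar precision-weighted combination, numerically identical to a conjugate Bayesian update that treats the first likelihood as the prior.

```python
# Two independent normal "experiments" about the same parameter theta,
# each summarised by an estimate and a standard error (illustrative values).
est1, se1 = 2.0, 0.5   # existing knowledge, itself derived from data
est2, se2 = 3.0, 1.0   # the new experiment being analysed

def combine(e1, s1, e2, s2):
    """Multiply two normal likelihoods: precision-weighted combination."""
    w1, w2 = 1.0 / s1**2, 1.0 / s2**2
    return (w1 * e1 + w2 * e2) / (w1 + w2), (w1 + w2) ** -0.5

est, se = combine(est1, se1, est2, se2)
print(est, se)
```

Cast this way, the two information sources enter symmetrically, so their mutual consistency can be checked (e.g. by comparing the two estimates against their combined standard errors) before they are combined.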

I don’t disagree that climate scientists may often have overly simplistic data models. But in the area of climate sensitivity estimation that I am most involved with, differences in estimates have mainly arisen either a) because different data sources, with notably different values, have been used (e.g., for aerosol forcing); or b) because different statistical methods have been used – mainly subjective Bayesian methods with a uniform or “expert” prior on the one hand, or either frequentist (including likelihood ratio) methods or (in the case of my studies in particular) objective Bayesian methods with a noninformative prior on the other hand.

BTW, the correct date for the Zellner text I was citing is indeed 1971.
