PostsAbout me

Bayesian inference: Approaching certainty through sampling

Bayesian Data Analysis from Gelman et al. (2013), in section 3.7, presents the statistical analysis of a bioassay experiment. The parameters of the model are , and we draw samples from the numerically calculated posterior. Then the authors write:

All of the 1000 simulation draws had positive values of , so the posterior probability that is roughly estimated to exceed 0.999.

I thought this 0.999 figure is an overestimate; I analyze this question in this post.


The event “” is a Bernoulli-distributed random variable; let’s denote it with . If we draw samples from (and denote the results with ), the conditional probability distribution of is described by the following directed graphical model:

Bayes net for x_i and theta

The node for is filled because it’s observed, and the plate represents copies of this node (with ranging from to ).

If (resp. ) denote the number of samples where is true (resp. false), the likelihood is described by:

[p({x_i} \given \theta) = \text{Binomial}(n_1 \given n = S, p = \theta).]

We can assume a noninformative uniform prior on the probability on the unit interval. A Beta prior is conjugate to the Bernoulli likelihood, and , and this results in the following posterior:

[p(\theta \given {x_i}) = \text{Beta}(\theta \given \alpha_0 + n_0, \beta_0 + n_1).]

With and , this amounts to a distribution, whose pdf is as such:

Pdf of Beta(1001,1)

As expected, most of the probability mass is close to 1.0. But that graph is not very legible, so let’s zoom in on the right end of the x axis:

Pdf of Beta(1001,1) in [.99,1.0] interval

The red line marks the mean of the distribution, which is approximately , but not nearly all of the probability mass is on the right side of . Using the cdf of the posterior, we have that

[P(\theta > 0.999) = 0.63,]

meaning there’s still a 1 in 3 chance that the posterior probability that does not exceed . To be fair, is still good for a “rough estimate”, unless one has a strong prior for . (Given the nature of the experiment and the meaning of the parameter — the toxicity of a compound —, a flat prior on “” is reasonable.)

Presidential elections

A similar statement was made for 1988 pre-election polls, on page 70:

All of the 1000 simulations ; thus, the estimated posterior probability that Bush had more support than Dukakis in the survey population is over 99.9%.

When a presidential election is won “by a landslide”, that rarely means more than a 60-40% results; so in this case, I would rather use a prior that puts more mass on results close to 50-50%, for example :

Pdf of Beta(10,10)

This results in the following posterior:

Pdf of Beta(1010,10)

So in this case, the crude estimate does does not suffice, and we should rather be only 98% certain. (This is a 20-fold difference, , and a well-calibrated superforecaster could tell them apart.) If the stakes are high, then refine your model, and draw more samples.


The meaning of 1000 true + 0 false simulations depends on your prior beliefs: the posterior mean could be 0.999 (with a uniform prior), or anything less than 0.99 (with a prior weighted more towards the center or zero).

I love BDA3; I’m nowhere near finished, but even the first chapters have taught me new ideas and proofs (e.g. the Bayesian cookbook in section 3.8, or modeling normal data with unknown mean and variance). The examples and exercises are a great combination of applications and theory. As you can see from this post, all I can do is nitpick some tiny details. A quick intro to practical Bayesian modeling is a presentation from Andrew Gelman.

Did you like this post, did I make a mistake, or do you know a BDA3 discussion group? Let me know in the comments below!


import math
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
from IPython.display import set_matplotlib_formats

%matplotlib inline
posterior = st.beta(1001,1)
def plot_beta(xs, rv, alpha=1001, beta=1, **plot_kwargs):
    plt.plot(xs, rv.pdf(xs), label='pdf', **plot_kwargs)
    plt.ylabel("Probability density function")
    plt.title(f"Pdf of Beta(θ | α = {alpha}, β = {beta})")
    plt.grid(True, alpha=0.5)
plot_beta(np.linspace(0,1,1000), posterior)
plt.gca().set_xlim(-0.01, 1.01)


def ramanujan(n):
    """Series that converges to 1/π at an exponential rate,
    by Srinivasa Ramanujan"""
    return 8**.5 / 9801 * sum(math.factorial(4*k)
                               / math.factorial(k)**4
                               / 396**(4*k)
                               * (1103 + 26390*k)
                              for k in range(0, n))

for i in range(1,4):
    print(f"1/ramanujan({i}) - π ≈ {1/ramanujan(i) - math.pi:.2e}")

# Easter egg. Thanks for reading!

1/ramanujan(1) - π ≈ 7.64e-08

1/ramanujan(2) - π ≈ 4.44e-16

1/ramanujan(3) - π ≈ 0.00e+00

plot_beta(np.linspace(0.990,1.0,1000), posterior)
plt.vlines(posterior.mean(), 0, plt.gca().get_ylim()[1], color='r', lw=1, label='mean')
plt.gca().legend(loc='upper left')




print('P(θ > 0.999) = {:d}%'.format(int(100*(1-posterior.cdf(0.999)))))

P(θ > 0.999) = 63%

print('P(θ > 0.998) = {:d}%'.format(int(100*(1-posterior.cdf(0.998)))))

P(θ > 0.998) = 86%


Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin. 2013. Bayesian Data Analysis: Third Edition. Official webpage