
An Introduction to Bayesian Inference


Bayesian inference is an important method in statistics that tells us how to update our belief in a hypothesis based on new information or data. To understand Bayesian inference, we first need to understand Bayes’ Theorem. Familiarity with basic probability theory is assumed.

Bayes’ Theorem

Bayes’ Theorem lets us use knowledge or belief that we already have, called the prior, to help calculate the probability of a related event. In other words, it is a tool for calculating conditional probabilities. Mathematically, Bayes’ Theorem states that

\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
\]

where A and B are our events and \(P(B) > 0\). Here, P(A) is our prior, and we can use it to find our desired probability P(A|B).


Suppose you want to find the probability that you have a certain disease given that you tested positive, or P(Disease|+). Using Bayes’ Theorem, where event A is having the disease and event B is testing positive, we can calculate this from the known values P(+|Disease), P(Disease), and P(+): the probability of testing positive given that you have the disease, the overall probability of having the disease, and the overall probability of testing positive.
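As a rough sketch of this calculation, the Python snippet below plugs numbers into Bayes’ Theorem. The specific values (a 1% prevalence, a 95% chance of testing positive when you have the disease, and a 5% chance when you do not) are hypothetical and chosen only for illustration, and the function name is made up for this example. Because P(+) is rarely given directly, the sketch assumes we obtain it from the law of total probability.

```python
def posterior_given_positive(prior, p_pos_given_disease, p_pos_given_healthy):
    """Return P(Disease | +) using Bayes' Theorem."""
    # P(+) by the law of total probability:
    # P(+) = P(+|Disease) P(Disease) + P(+|No disease) P(No disease)
    p_positive = (p_pos_given_disease * prior
                  + p_pos_given_healthy * (1.0 - prior))
    # Bayes' Theorem: P(Disease|+) = P(+|Disease) P(Disease) / P(+)
    return p_pos_given_disease * prior / p_positive


# Hypothetical numbers, for illustration only.
p_disease = 0.01            # P(Disease)
p_pos_given_disease = 0.95  # P(+|Disease)
p_pos_given_healthy = 0.05  # P(+|No disease)

p = posterior_given_positive(p_disease, p_pos_given_disease, p_pos_given_healthy)
print(f"P(Disease | +) = {p:.3f}")
```

With these made-up inputs the result is roughly 0.16: even after a positive test, the low prior keeps the probability of disease well below the 95% figure.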

For an excellent visualization of this example, see Brown University’s Seeing Theory.

Bayesian Inference

Instead of using events A and B, let’s now use models, as we would in a real-world example. Let our model be a Gaussian distribution with parameters \(\theta=\{\mu, \sigma\}\), and let our new data or observations be \(y=\{y_1, y_2, \ldots, y_n\}\).

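In model form, Bayes’ Theorem is:

\[
P(\theta \mid y) = \frac{P(y \mid \theta)\,P(\theta)}{P(y)}
\]

Here, \(P(\theta)\) is our prior distribution, and \(P(\theta \mid y)\) is our posterior distribution, which accounts for our new observations. The denominator \(P(y)\) does not depend on \(\theta\) and only acts as a normalizing constant. The only tricky term left is \(P(y \mid \theta)\). It turns out that this is the likelihood: the probability of the observed data under the model with parameters \(\theta\).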
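To make the likelihood concrete, here is one way it could be written for the Gaussian model above. This sketch adds an assumption that the post does not state: that the observations \(y_1, \ldots, y_n\) are independent draws from the same Gaussian. Under that assumption, the likelihood of the data \(y\) given \(\theta=\{\mu, \sigma\}\) is the product of the individual densities:

\[
P(y \mid \theta) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(y_i-\mu)^2}{2\sigma^2}\right)
\]

Viewed as a function of \(\theta\) with the data held fixed, this expression is large for parameter values that explain the observations well and small for those that do not.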

To visualize what’s really happening in Bayesian inference, here’s an example:

Here, the gold curve represents the likelihood, and the blue curve represents the prior distribution. After we multiply the two curves point by point and normalize the result so that it integrates to 1, we end up with our posterior distribution, shown below in pink:
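The multiply-and-normalize step can be sketched numerically on a grid of candidate values. The example below is a simplified illustration, not the procedure used to produce the figures: it assumes \(\sigma\) is known so that only \(\mu\) is unknown, puts a Gaussian prior on \(\mu\), and uses made-up observations. All names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(data, grid, prior_mean, prior_sd, sigma):
    """Posterior density of mu on an evenly spaced grid (sigma assumed known)."""
    step = grid[1] - grid[0]
    unnormalized = []
    for mu in grid:
        prior = normal_pdf(mu, prior_mean, prior_sd)
        # Likelihood of the data as a function of mu: product over observations.
        likelihood = math.prod(normal_pdf(y_i, mu, sigma) for y_i in data)
        unnormalized.append(prior * likelihood)
    # Normalize so the density integrates to 1 over the grid (Riemann sum).
    total = sum(unnormalized) * step
    return [u / total for u in unnormalized]


# Hypothetical observations and prior, for illustration only.
y = [4.8, 5.6, 5.1, 6.0, 5.4]
grid = [i * 0.01 for i in range(1001)]  # candidate mu values from 0 to 10
posterior = grid_posterior(y, grid, prior_mean=3.0, prior_sd=2.0, sigma=1.0)

step = grid[1] - grid[0]
posterior_mean = sum(mu * p for mu, p in zip(grid, posterior)) * step
print(f"Posterior mean of mu: {posterior_mean:.3f}")
```

With these inputs the posterior mean comes out near 5.27, between the prior mean (3.0) and the sample mean (5.38) and much closer to the data, since five observations carry more information here than the fairly wide prior.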

Now you should have a basic understanding of the statistical tool that drives much of the modern-day information you see!

This post is licensed under CC BY 4.0 by the author.