Bayes' Theorem

Updating Beliefs with Evidence

Bayes' Theorem describes how to update probabilities based on new evidence. It provides a mathematical framework for reasoning under uncertainty and is fundamental to modern statistics, machine learning, and scientific inference. This elegant theorem reveals how prior knowledge combines with new information to form updated beliefs.

Bayes' Theorem

P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}

The probability of A given B equals the probability of B given A times the prior probability of A, divided by the probability of B.

P(A|B)

The posterior probability of A given evidence B.

P(B|A)

The likelihood of observing B given that A is true.

P(A)

The prior probability of A, before observing evidence.

P(B)

The marginal probability of B, the normalization constant.
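To make the roles of these four quantities concrete, here is a minimal Python sketch (not part of the original article; the function name bayes_posterior is ours) that computes a posterior from a given prior, likelihood, and marginal:

```python
def bayes_posterior(prior: float, likelihood: float, marginal: float) -> float:
    """Posterior P(A|B) from the prior P(A), likelihood P(B|A), and marginal P(B)."""
    if marginal <= 0:
        raise ValueError("P(B) must be positive for P(A|B) to be defined.")
    return likelihood * prior / marginal

# Numbers borrowed from the medical-diagnosis example later in the article:
print(bayes_posterior(prior=0.01, likelihood=0.95, marginal=0.059))  # ≈ 0.161
```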

Prerequisites & Learning Path

Before diving deep into Bayes' Theorem, it's helpful to have a solid understanding of foundational concepts. Here are the key prerequisites that will help you master this identity.

Probability Basics

Understanding of basic probability concepts.

Required

Data Analysis

Understanding of how to organize and analyze data.

Required

Conditional Probability

Understanding of conditional probability and the product rule.

Recommended

Historical Timeline

17th century
Pascal and Fermat develop probability theory.
18th century
Bayes develops Bayesian probability theory.
19th century
Gauss and others develop statistical methods and the normal distribution.
20th century
Modern statistics emerges with Fisher, Pearson, and others.

History of Bayes' Theorem

The theorem is named after Thomas Bayes (1701-1761), an English Presbyterian minister and mathematician. Bayes discovered the theorem but never published it during his lifetime. It was published posthumously by Richard Price in 1763 in "An Essay towards solving a Problem in the Doctrine of Chances," which appeared in the Philosophical Transactions of the Royal Society.

The theorem remained relatively obscure for over a century, primarily of interest to a small group of statisticians. It wasn't until the 20th century that it found widespread applications. The development of decision theory, Bayesian statistics, and later machine learning brought Bayes' Theorem to the forefront of modern data science.

Today, Bayesian methods are fundamental to artificial intelligence, medical diagnosis, spam filtering, scientific inference, and countless other applications. The theorem provides a principled way to update beliefs in light of new evidence, making it one of the most important results in probability theory.

What each symbol means: a deep dive

P(A|B)

This is the posterior probability—the probability of event A after observing evidence B. It represents our updated belief about A given the new information. This is what we want to calculate.

P(B|A)

This is the likelihood—the probability of observing evidence B given that A is true. It measures how well the evidence supports hypothesis A. In medical testing, this might be the probability of a positive test given that the patient has the disease.

P(A)

This is the prior probability—our initial belief about the probability of A before observing any evidence. It represents what we know (or assume) about A from previous experience, background knowledge, or general principles.

P(B)

This is the marginal probability or normalization constant—the total probability of observing evidence B under all possible hypotheses. It ensures that the posterior probabilities sum to 1. It can be calculated using the law of total probability: P(B) = P(B|A)P(A) + P(B|\neg A)P(\neg A).
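Because P(B) is rarely given directly, the sketch below (illustrative only; posterior_from_rates is a name we chose) expands it with the law of total probability for a single binary hypothesis before applying the theorem:

```python
def posterior_from_rates(prior: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) when the marginal P(B) is expanded via the law of total probability."""
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)  # P(B)
    return p_b_given_a * prior / p_b

# Values matching the worked visualization example later in the article:
print(posterior_from_rates(prior=0.25, p_b_given_a=1.00, p_b_given_not_a=0.50))  # 0.4
```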

Probability tree diagram

This tree diagram shows all possible paths through the probability space. Follow the branches to see how prior probabilities combine with likelihoods to produce the final posterior probability. Each path shows the probability of that specific outcome.


The Proof: Step-by-Step Derivation

Bayes' Theorem follows directly from the definition of conditional probability and the product rule. The proof is elegant and straightforward.

Step 1: Definition of conditional probability

The conditional probability of A given B is defined as:

P(A|B) = \frac{P(A \cap B)}{P(B)}

Step 2: The same for P(B|A)

Similarly, the conditional probability of B given A is:

P(B|A) = \frac{P(A \cap B)}{P(A)}

Step 3: Solve for the intersection

From the second equation, we can solve for P(A \cap B):

P(A \cap B) = P(B|A) \cdot P(A)

Step 4: Substitute into the first equation

Substituting this expression for P(A \cap B) into the first equation gives Bayes' Theorem:

P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}

This completes the proof. The theorem shows how to "reverse" conditional probabilities: if we know P(B|A), we can calculate P(A|B) using our prior knowledge P(A) and the normalization constant P(B).
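As a numerical sanity check of the derivation, the sketch below uses a made-up joint distribution over A and B, computes both conditional probabilities from their definitions, and confirms that the rearranged formula reproduces P(A|B):

```python
# Hypothetical joint distribution over (A, B); the four cells sum to 1.
p_a_and_b, p_a_and_not_b = 0.12, 0.18          # so P(A) = 0.30
p_not_a_and_b, p_not_a_and_not_b = 0.28, 0.42  # so P(not A) = 0.70

p_a = p_a_and_b + p_a_and_not_b  # P(A) = 0.30
p_b = p_a_and_b + p_not_a_and_b  # P(B) = 0.40

p_a_given_b = p_a_and_b / p_b    # definition of conditional probability (Step 1)
p_b_given_a = p_a_and_b / p_a    # definition of conditional probability (Step 2)

# Bayes' Theorem reconstructs P(A|B) from P(B|A), P(A), and P(B):
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
print(p_a_given_b)  # 0.3
```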

The result

This is a fundamental identity in probability and statistics! Bayes' Theorem provides the mathematical foundation for updating beliefs with evidence, forming the basis for machine learning, medical diagnosis, scientific inference, and decision-making under uncertainty.

Examples for beginners

Let's work through some concrete examples to see Bayes' Theorem in action.

Example 1: Medical diagnosis

Suppose a disease affects 1% of the population. A test for the disease is 95% accurate (95% of people with the disease test positive, and 95% of people without the disease test negative). If someone tests positive, what is the probability they actually have the disease?

Let A = "has the disease" and B = "tests positive".

We know:

  • P(A) = 0.01 (prior: 1% have the disease)
  • P(B|A) = 0.95 (95% of diseased test positive)
  • P(B|\neg A) = 0.05 (5% of healthy test positive)

We need P(B):

P(B) = P(B|A)P(A) + P(B|\neg A)P(\neg A)
P(B) = 0.95 \times 0.01 + 0.05 \times 0.99 = 0.0095 + 0.0495 = 0.059

Now applying Bayes' Theorem:

P(A|B) = \frac{0.95 \times 0.01}{0.059} = \frac{0.0095}{0.059} \approx 0.161

So even with a positive test, there's only about a 16% chance the person actually has the disease! This counterintuitive result shows why Bayes' Theorem is so important—it corrects for the base rate (prior probability).
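A short Python check of the arithmetic above (values taken directly from the example; the variable names are ours):

```python
p_disease = 0.01            # prior P(A)
p_pos_given_disease = 0.95  # sensitivity, P(B|A)
p_pos_given_healthy = 0.05  # false-positive rate, P(B|not A)

# Law of total probability, then Bayes' Theorem:
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_pos, 4))                # 0.059
print(round(p_disease_given_pos, 3))  # 0.161
```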

Example 2: Spam filtering

Suppose 20% of emails are spam. A spam filter identifies 90% of spam correctly and incorrectly flags 5% of legitimate emails as spam. If an email is flagged as spam, what's the probability it's actually spam?

Let A = "email is spam" and B = "flagged as spam".

We have:

  • P(A) = 0.20
  • P(B|A) = 0.90
  • P(B|\neg A) = 0.05

P(B) = 0.90 \times 0.20 + 0.05 \times 0.80 = 0.18 + 0.04 = 0.22

P(A|B) = \frac{0.90 \times 0.20}{0.22} = \frac{0.18}{0.22} \approx 0.818

So there's about an 82% chance a flagged email is actually spam.
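The same pattern verifies the spam numbers (again a sketch with our own variable names):

```python
p_spam = 0.20             # prior P(A)
p_flag_given_spam = 0.90  # P(B|A)
p_flag_given_ham = 0.05   # P(B|not A)

p_flag = p_flag_given_spam * p_spam + p_flag_given_ham * (1 - p_spam)  # 0.22
p_spam_given_flag = p_flag_given_spam * p_spam / p_flag

print(round(p_spam_given_flag, 3))  # 0.818
```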

Common Mistakes

When working with Bayes' Theorem, students often encounter several common pitfalls. Understanding these mistakes can help you avoid them and apply the identity correctly.

Forgetting to check domain restrictions

Incorrect:

Applying Bayes' Theorem when a conditioning event has zero probability (for example, when P(B) = 0), so the conditional probabilities involved are not defined.

Correct approach:

Check that P(B) > 0 (and P(A) > 0 when using the likelihood P(B|A)) before applying the formula; conditional probability is only defined when the conditioning event has positive probability.

Why this matters: The theorem is built from ratios of probabilities, so its implicit domain restriction is that no denominator is zero.

Order of operations errors

Incorrect:

Computing the posterior with terms grouped incorrectly, for example forgetting the true-positive term P(B|A)P(A) when expanding the denominator, or multiplying by P(B) instead of dividing.

Correct approach:

Compute the numerator P(B|A) \cdot P(A) first, then divide by the full marginal P(B) = P(B|A)P(A) + P(B|\neg A)P(\neg A).

Why this matters: The posterior is a ratio; grouping the numerator and the normalization constant correctly is what guarantees the result is a valid probability between 0 and 1.
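A small sketch, reusing the medical-example values from earlier, contrasts the correct grouping with the mis-grouping described above (variable names are illustrative):

```python
prior, sensitivity, false_pos = 0.01, 0.95, 0.05  # medical-example values

# Correct grouping: numerator first, then divide by the full marginal P(B).
numerator = sensitivity * prior
marginal = sensitivity * prior + false_pos * (1 - prior)
posterior = numerator / marginal
print(round(posterior, 3))  # 0.161

# Mis-grouped: forgetting the true-positive term in the denominator.
wrong = numerator / (false_pos * (1 - prior))
print(round(wrong, 3))      # 0.192 (not the posterior)
```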

Quantum Implications

While Bayes' Theorem is a classical probability result, its principles connect to quantum mechanics and quantum information theory in profound ways.

  • Quantum Measurement: When we measure a quantum system, we update our knowledge about its state. This process is analogous to Bayesian updating, where the measurement outcome is the evidence and the quantum state is the hypothesis being updated.
  • Quantum Bayesianism (QBism): This interpretation of quantum mechanics treats quantum probabilities as personal degrees of belief that are updated using rules similar to Bayes' Theorem. In QBism, quantum states represent an agent's knowledge, not objective reality.
  • Quantum Information Theory: Bayesian inference principles are used in quantum error correction, quantum state estimation, and quantum machine learning algorithms.
  • Wave Function Collapse: The collapse of the wave function upon measurement can be viewed as a Bayesian update, where the probability distribution (wave function) is updated based on the measurement outcome.
  • Quantum Computing: Quantum algorithms often involve updating probability amplitudes (which are complex numbers, not classical probabilities) based on measurement results, following principles analogous to Bayesian updating.

Note: The direct application of Bayes' Theorem in quantum mechanics is nuanced because quantum probabilities involve complex amplitudes and interference. However, the conceptual framework of updating beliefs based on evidence is central to both classical and quantum probability theory.

Philosophical Implications

Bayes' Theorem raises fundamental questions about the nature of probability and how we should reason under uncertainty.

  • Bayesian vs. Frequentist: Bayesians interpret probability as a degree of belief or confidence, which can be updated with new evidence. Frequentists interpret probability as the long-run frequency of events. This philosophical divide has shaped the development of statistics for over a century.
  • Subjective Probability: The Bayesian interpretation allows for subjective priors, meaning different people can start with different beliefs and update them based on the same evidence. This raises questions about objectivity in science.
  • Inductive Reasoning: Bayes' Theorem provides a mathematical framework for inductive reasoning—inferring general principles from specific observations. This addresses David Hume's problem of induction by showing how evidence can rationally update beliefs.
  • Scientific Method: Bayesian updating formalizes the scientific method: we start with hypotheses (priors), make observations (evidence), and update our beliefs (posteriors). This makes the process of scientific discovery mathematically precise.

The theorem's power lies in its ability to combine prior knowledge with new evidence in a principled way, making it a cornerstone of rational decision-making under uncertainty.

Patents and practical applications

Machine Learning

Naive Bayes classifiers, Bayesian networks, and probabilistic models use the theorem for classification and prediction.

Medical Diagnosis

Updating disease probabilities based on test results, symptoms, and patient history.

Spam Filtering

Classifying emails as spam or legitimate based on word frequencies and other features.

Scientific Inference

Updating hypotheses based on experimental evidence in fields from physics to psychology.

Legal Reasoning

Evaluating evidence in court cases and updating beliefs about guilt or innocence.

Search Engines

Ranking search results based on relevance probabilities.

Recommendation Systems

Predicting user preferences and recommending products, movies, or content.

Is it fundamental?

Is Bayes' Theorem fundamental to rational thought, or does it describe how we should think given our probabilistic framework? It provides a universal framework for updating beliefs in light of new evidence, applicable across virtually every domain of human knowledge. But is this fundamentality about the theorem itself, or about the way we've chosen to model uncertainty?

Its elegance lies in its simplicity: just four probabilities combine to give us an answer to "what should I believe now?" The theorem appears in machine learning (naive Bayes, Bayesian networks), medical diagnosis, spam filtering, scientific inference, legal reasoning, and countless other applications. Yet we might question: does this ubiquity suggest deep truth about rational thought, or does it reflect that we've built our frameworks around this theorem? Are we discovering fundamental structure, or creating useful conventions?

The theorem reveals counterintuitive truths (like the medical diagnosis example), showing that our intuitive reasoning about probability is often flawed. But is this a correction toward fundamental truth, or simply a correction toward mathematical consistency? Does Bayes' Theorem describe how the universe works, or how we should work with uncertainty? The distinction matters for understanding its fundamentality.

Open questions and research frontiers

Prior Selection

How should we choose prior probabilities when we have little or no prior knowledge? This is the problem of "uninformative priors" and remains an active area of research in Bayesian statistics.

Computational Methods

Calculating posterior probabilities can be computationally expensive for complex models. Research into Markov Chain Monte Carlo (MCMC) methods, variational inference, and other approximation techniques continues to advance.

Bayesian Machine Learning

Integrating Bayesian principles into deep learning and neural networks is an active research area, including Bayesian neural networks and uncertainty quantification.

Quantum Bayesianism

The QBist interpretation of quantum mechanics continues to be developed and debated, exploring how Bayesian principles might explain quantum phenomena.

Causal Inference

Extending Bayesian methods to causal reasoning, going beyond correlation to understand causation, is a frontier in statistics and machine learning.

People and milestones

  • Thomas Bayes (1701-1761): Discovered the theorem but never published it. His work was published posthumously by Richard Price in 1763.
  • Richard Price (1723-1791): Published Bayes' work and recognized its importance, adding an introduction that discussed its applications to probability and induction.
  • Pierre-Simon Laplace (1749-1827): Independently discovered and popularized the theorem, developing it further and applying it to problems in astronomy and statistics.
  • Harold Jeffreys (1891-1989): Developed the theory of Bayesian inference in the 20th century, creating methods for choosing uninformative priors.
  • Dennis Lindley (1923-2013): Promoted Bayesian methods in statistics, helping to establish them as a major approach alongside frequentist methods.
  • Judea Pearl (1936-present): Extended Bayesian methods to causal reasoning, developing Bayesian networks and the do-calculus, earning the Turing Award in 2011.


[Interactive visualization: Before Evidence → Evidence (Positive Test) → After Evidence (Bayesian Update)]

Worked example with the default settings: prior P(A) = 25.0%, true positives P(B|A) × P(A) = 100% × 25.0%, false positives P(B|¬A) × P(¬A) = 50% × 75.0%, giving P(B) = 62.50%.

P(A|B) = (P(B|A) × P(A)) / P(B)

= (1.00 × 0.250) / 0.625 = 0.400

Posterior probability: 40.0%

Visualization Guide:

━ Blue bar (top) — The prior probability P(A), representing our initial belief before observing any evidence.

━ Green section — True positives: P(B|A) \times P(A), the probability of observing evidence B when A is true.

━ Red section — False positives: P(B|\neg A) \times P(\neg A), the probability of observing evidence B when A is false.

━ Orange line — The total probability P(B), which is the sum of true and false positives. This is the normalization constant in Bayes' Theorem.

━ Purple bar (bottom) — The posterior probability P(A|B), our updated belief after observing evidence B. This is calculated as the green section divided by the orange line.

Controls: Adjust the prior probability, sensitivity (true positive rate), and specificity (true negative rate) using the sliders. Observe how the posterior probability changes, often in counterintuitive ways, especially when the prior is very small.
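For readers who want to experiment outside the interactive widget, here is a minimal sketch mirroring the three sliders (the function and parameter names are ours, not part of the visualization):

```python
def bayesian_update(prior: float, sensitivity: float, specificity: float) -> float:
    """Posterior P(A|B) after a positive test, from the three slider quantities."""
    false_positive_rate = 1 - specificity  # P(B|not A)
    p_b = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_b

# The worked example above: prior 25%, sensitivity 100%, specificity 50%.
print(bayesian_update(0.25, 1.00, 0.50))             # 0.4

# A very small prior keeps the posterior low even for an accurate test:
print(round(bayesian_update(0.001, 0.99, 0.95), 3))  # ≈ 0.019
```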