Bayes’ Theorem Introduction



Bayes’ Theorem is an important concept in Data Science. It is widely used in Machine Learning, most notably as the foundation of the Naive Bayes classifier. So what is Bayes’ Theorem?

Bayes’ Theorem builds on some basic concepts of probability, which we cover in an earlier article in this series.

For further reading, there is also an article at Data Flair called Bayes’ Theorem – The Forecasting Pillar of Data Science.

Bayes’ Theorem involves four quantities:

  • P(A): Prior probability
  • P(A|B): Posterior probability
  • P(B|A): Likelihood
  • P(B): Evidence

Posterior and Prior Probability

In Bayesian statistics, prior probability refers to the probability of an event before new data is collected. Posterior probability is the updated probability of that event after taking the new data into account. Bayes’ Theorem states that for any two events A and B, the probability of A given B equals the probability of A multiplied by the probability of B given A, divided by the probability of B.

Bayes’ Theorem gives us a method of calculating the posterior probability: the probability of an event occurring after taking new information into consideration.
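In symbols, for any two events A and B (with P(B) greater than zero):

P(A|B) = \dfrac{P(B|A) * P(A)}{P(B)}

Here P(A) is the prior, P(B|A) is the likelihood, P(B) is the evidence, and P(A|B) is the posterior.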

A Spam Example

Suppose we have a dataset of emails, and we’ve identified which ones are spam and which are not. We also observe that spam emails have certain characteristics; for example, they seem to contain the word “money” more often than non-spam emails. We want to predict whether new incoming emails are spam.

Specifically, let’s determine the probability that an email is spam given that a specific word appears in it. For this example, we’ll use the word “money.” Suppose we know the following:

  • The probability of an email being spam is 10%. P(spam)
  • The probability that the word “money” appears in an email is 15%. P(money)
  • The probability that the word “money” appears in a spam email is 40%. P(money|spam). That is, looking at all of our spam emails, the word “money” appears in 40% of them. This is also known as the likelihood.

Applying Bayes’ Theorem:

P(A|B) = \dfrac{P(B|A) * P(A)}{P(B)}

Putting our variable names into the formula:

P(spam|money) = \dfrac{P(money|spam) * P(spam)}{P(money)}

Now let’s plug in the numbers from the spam example above:

P(spam|money) = \dfrac{0.4 * 0.1}{0.15} = 0.2667

The probability that an email is spam, given that it contains the word “money,” is 26.67%.
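As a quick check, here is a minimal Python sketch of the same calculation (the variable names are ours; the values are the example numbers above):

# Example values from the spam scenario above
p_spam = 0.10              # P(spam): prior probability that an email is spam
p_money = 0.15             # P(money): probability that "money" appears in any email
p_money_given_spam = 0.40  # P(money|spam): the likelihood

# Bayes' Theorem: P(spam|money) = P(money|spam) * P(spam) / P(money)
p_spam_given_money = p_money_given_spam * p_spam / p_money
print(f"P(spam|money) = {p_spam_given_money:.4f}")  # prints P(spam|money) = 0.2667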

Medical and Other Examples

A typical use of Bayes’ Theorem in the medical field is to calculate the probability that a person who tests positive on a screening test for a particular disease actually has the disease.
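As a quick illustration with hypothetical numbers (invented here for the example): suppose a disease affects 1% of the population, a screening test detects 90% of true cases, and it falsely flags 5% of healthy people. Then:

P(disease|positive) = \dfrac{P(positive|disease) * P(disease)}{P(positive)} = \dfrac{0.9 * 0.01}{0.9 * 0.01 + 0.05 * 0.99} \approx 0.1538

Even with a fairly accurate test, the posterior probability is only about 15%, because the disease itself is rare. The denominator expands the evidence P(positive) over both the diseased and healthy groups.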

Online retailers use Bayesian algorithms to predict whether or not users will like certain products and services.

Naive Bayes is a supervised classification algorithm based on Bayes’ Theorem. It can be used when you are okay with the (“naive”) assumption that the predictor variables (the X variables) are independent of one another.
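Here is a minimal sketch of a Naive Bayes spam classifier in Python (assuming scikit-learn is available; the tiny dataset below is invented for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented training set: 1 = spam, 0 = not spam
emails = [
    "send money now",
    "claim your money prize",
    "meeting agenda attached",
    "are we still on for lunch",
]
labels = [1, 1, 0, 0]

# Word counts become the predictor (X) variables that Naive Bayes treats as independent
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

model = MultinomialNB()
model.fit(X, labels)

# Classify a new email
X_new = vectorizer.transform(["free money offer"])
print(model.predict(X_new))  # [1] -> predicted spam

MultinomialNB suits word-count features; scikit-learn also offers GaussianNB and BernoulliNB for other feature types.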

