Naive Bayes Introduction
Naive Bayes is a supervised classification algorithm based on Bayes’ theorem, with a “naive” assumption that the predictors are independent of one another. In statistics, Naive Bayes classifiers are a family of simple probabilistic classifiers built on this strong independence assumption between features.
Bayes’ theorem gives us a method of calculating the posterior probability: the likelihood of an event occurring after new information has been taken into account. In other words, when you calculate the probability of something happening, you factor in relevant observations.
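For reference, the theorem itself can be written as follows, where A is the event of interest (for example, a customer churning) and B is the observed evidence:

$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
$$

Here, P(A | B) is the posterior probability we want, P(A) is the prior probability of the event, P(B | A) is the likelihood of the evidence given the event, and P(B) is the overall probability of the evidence.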
Naive Bayes models are simple, fast, and often surprisingly good predictors. In certain situations, Naive Bayes is even known to outperform much more advanced classification methods. And even when a more advanced model is ultimately required, a Naive Bayes model can be a great starting point.
Bank Churn Example
Suppose you have a dataset of bank data where each row represents a customer. You have a few columns in your data, one of which is called “Exited”: if the customer has left the bank, this column holds a 1; otherwise, it holds a 0.
When working in Python, we first split the data into features and a target variable, and then into training data and test data. Let’s assign our predictive features to a variable called X, and the Exited column, our target, to a variable called y. Then, we can split into training and test data.
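Here is a minimal sketch of those steps, assuming the data has already been loaded into a pandas DataFrame. The file name churn_data.csv and the 25% test split are illustrative choices, not part of the original example:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file name; assumes one row per customer with an "Exited" column.
churn_df = pd.read_csv("churn_data.csv")

# Features: every column except the target.
X = churn_df.drop(columns=["Exited"])
y = churn_df["Exited"]

# Hold out 25% of the rows for testing; stratify preserves the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
```

Passing stratify=y keeps the proportion of exited customers roughly the same in the training and test sets, which matters when churners are a minority class.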
I am jumping right in here after having made several assumptions. In a real-world project, we would first have planned the project and then performed thorough exploratory data analysis (EDA). We would also have performed feature engineering. What is that? Feature engineering is the second part of the analyze phase of Google’s PACE framework.