Are you working in Python? Do you want to build a decision tree?
Let’s work through this workflow with a particular dataset in mind. We’ll use a bank churn dataset found on Kaggle.
Python Workflow
- Understand the Objective
- Know your data
- Import statements
- Read the Data
- Exploratory Data Analysis (EDA)
- Select an Evaluation Metric
- Feature Engineering
- Split the Data
- Train a Baseline Model
- Tune the Model (GridSearchCV)
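To make the later steps of the list concrete, here is a minimal sketch of splitting, training a baseline tree, and tuning with GridSearchCV. It runs on synthetic data from make_classification rather than the bank churn dataset, and the parameter grid, the F1 metric, and the 75/25 split are illustrative assumptions, not choices taken from this article.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for a churn dataset (assumption: ~20% positives).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Split the data, stratifying so both sets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Train a baseline model with default hyperparameters.
baseline = DecisionTreeClassifier(random_state=0)
baseline.fit(X_train, y_train)
baseline_f1 = f1_score(y_test, baseline.predict(X_test))

# Tune the model: the grid values here are hypothetical starting points.
param_grid = {"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, scoring="f1", cv=5
)
search.fit(X_train, y_train)
tuned_f1 = f1_score(y_test, search.best_estimator_.predict(X_test))

print("Baseline F1:", round(baseline_f1, 3))
print("Tuned F1:", round(tuned_f1, 3))
print("Best parameters:", search.best_params_)
```

On real data, the tuned tree is not guaranteed to beat the baseline on the test set, so compare both scores before choosing a model.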
Your objective may be to predict consumer behavior. Perhaps you work at a bank and want to predict whether a customer will churn. Churn means the customer leaves the company. The bank wants to avoid that, and if it can predict which customers are likely to leave, it can take action to reduce that probability.
Know your data by reading the data dictionary, if one exists.
Import the Python libraries you need into your project.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# This function displays the splits of the tree
from sklearn.tree import plot_tree
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.metrics import recall_score, precision_score, f1_score, accuracy_score
Perform EDA on the data. You can start with a few Python and pandas commands on the DataFrame, such as head(), info(), describe().
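As a rough illustration of those first EDA commands, the sketch below runs them on a tiny hand-made DataFrame. The column names (CreditScore, Age, Balance, Exited) and the values are hypothetical placeholders, not the actual Kaggle file, so swap in your own DataFrame once the data is loaded.

```python
import pandas as pd

# Hypothetical stand-in for the churn data; names and values are assumptions.
df = pd.DataFrame({
    "CreditScore": [619, 608, 502, 699, 850],
    "Age": [42, 41, 42, 39, 43],
    "Balance": [0.0, 83807.86, 159660.80, 0.0, 125510.82],
    "Exited": [1, 0, 1, 0, 0],  # assumed target: 1 = churned, 0 = stayed
})

# First rows: a quick look at what each column holds.
print(df.head())

# Column dtypes and non-null counts (info() prints directly).
df.info()

# Summary statistics for the numeric columns.
print(df.describe())

# Share of each class in the assumed target column.
class_balance = df["Exited"].value_counts(normalize=True)
print(class_balance)
```

Checking the class balance early is useful because an imbalanced target affects which evaluation metric makes sense later in the workflow.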