Decision Tree Workflow


This entry is part 3 of 6 in the series Decision Trees

Are you working in Python? Do you want to build a decision tree?

Let’s work through this workflow with a particular dataset in mind. We’ll work with a bank churn dataset found on Kaggle.

Python Workflow

  1. Understand the Objective
  2. Know your data
  3. Import statements
  4. Read the Data
  5. Exploratory Data Analysis (EDA)
  6. Select an Evaluation Metric
  7. Feature Engineering
  8. Split the Data
  9. Train a Baseline Model
  10. Tune the Model (GridSearchCV)
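
As a preview, steps 8 through 10 might look roughly like the sketch below. It is not the series' actual code: the DataFrame is synthetic, the column names (age, balance, num_products, churned) are hypothetical stand-ins for the Kaggle columns, and F1 is assumed as the evaluation metric purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data; all column names are hypothetical.
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "balance": rng.uniform(0, 200_000, size=n).round(2),
    "num_products": rng.integers(1, 5, size=n),
})
# Toy target: older customers churn more often in this made-up data.
churn_prob = np.where(df["age"] > 50, 0.6, 0.15)
df["churned"] = (rng.random(n) < churn_prob).astype(int)

X = df.drop(columns="churned")
y = df["churned"]

# Step 8: hold out a test set, stratified to preserve the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Step 9: baseline tree with default hyperparameters.
baseline = DecisionTreeClassifier(random_state=42)
baseline.fit(X_train, y_train)
baseline_f1 = f1_score(y_test, baseline.predict(X_test))

# Step 10: tune with a cross-validated grid search (F1 is an assumed metric).
param_grid = {"max_depth": [2, 3, 5, None], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)
tuned_f1 = f1_score(y_test, search.best_estimator_.predict(X_test))

print(f"Baseline F1: {baseline_f1:.3f}")
print(f"Tuned F1: {tuned_f1:.3f}")
print("Best parameters:", search.best_params_)
```

The test set is touched only once per model, after fitting. GridSearchCV does its own cross-validation inside the training set, so the held-out rows stay unseen during tuning.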

Your objective may be to predict consumer behavior. Perhaps you work at a bank and want to predict whether a customer will churn. Churn means leaving the company. The bank doesn’t want that, and if it can predict the type of person who is likely to leave, it can take action to reduce that probability.

Know your data by reading the data dictionary, if one exists.

Import your needed Python libraries into your Python project.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# This function displays the splits of the tree
from sklearn.tree import plot_tree

from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.metrics import recall_score, precision_score, f1_score, accuracy_score

Perform EDA on the data. You can start with a few pandas methods on the DataFrame, such as head(), info(), and describe().
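
Here is a minimal sketch of those first EDA calls. The DataFrame is a tiny hand-made stand-in, not the Kaggle file, and its column names (age, balance, country, churned) are hypothetical. With the real data you would load the file with pd.read_csv() instead.

```python
import pandas as pd

# Tiny hypothetical stand-in for the churn data; real column names will differ.
df = pd.DataFrame({
    "age": [42, 35, 58, 29, 61, 47],
    "balance": [0.0, 83807.86, 159660.80, 0.0, 125510.82, 113755.78],
    "country": ["France", "Spain", "France", "Germany", "Spain", "France"],
    "churned": [1, 0, 1, 0, 0, 1],
})

print(df.head())       # first five rows
df.info()              # column dtypes and non-null counts (prints directly)
print(df.describe())   # summary statistics for the numeric columns

# Two further checks that are often useful before modelling.
missing_per_column = df.isna().sum()   # count of missing values per column
churn_rate = df["churned"].mean()      # share of rows in the positive class
print(missing_per_column)
print(f"Churn rate: {churn_rate:.2f}")
```

The churn rate is worth checking early because a heavily imbalanced target affects which evaluation metric makes sense later in the workflow.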
