This is a simple example of building a decision tree model on a very small dataset of only six rows. With so few rows we can easily visualize the tree and understand how it was built. Let's dive right in to the Python code. To keep things simple, I have defined the data directly in the code rather than importing it from an external file.
import pandas as pd
import matplotlib.pyplot as plt
# from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# This function displays the splits of the tree
from sklearn.tree import plot_tree
# To manually create a DataFrame, start with a dictionary of equal-length lists.
# The data has been slightly modified from the Udemy course
data = {"J": [1, 1, 0, 1, 0, 1],
        "K": [1, 1, 0, 0, 1, 1],
        "L": [1, 0, 1, 0, 1, 0],
        "Class": ['A', 'A', 'B', 'B', 'A', 'B']}
df = pd.DataFrame(data)
df
- We are trying to predict the class. Notice that K is the best feature for predicting it.
- When K = 1 it is class A three out of four times, which makes K the best predictor.
- When L = 1 it is class A two out of three times, which is a weaker predictor.
- When J = 1 it is class A half of the time, making J a poor predictor.
- So, in order of predictive power, they are K, L, and J.
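The intuition above can be checked numerically. The sketch below (an illustration, not part of the original notebook) computes the weighted Gini impurity of splitting on each feature, which is the criterion `DecisionTreeClassifier` uses by default; a lower value means a better split, and K comes out best:

```python
import pandas as pd

# Recreate the six-row dataset from above
data = {"J": [1, 1, 0, 1, 0, 1],
        "K": [1, 1, 0, 0, 1, 1],
        "L": [1, 0, 1, 0, 1, 0],
        "Class": ['A', 'A', 'B', 'B', 'A', 'B']}
df = pd.DataFrame(data)

def weighted_gini(df, feature, target='Class'):
    """Weighted Gini impurity after splitting df on a binary feature."""
    total = len(df)
    impurity = 0.0
    for value, subset in df.groupby(feature):
        p = subset[target].value_counts(normalize=True)
        gini = 1.0 - (p ** 2).sum()          # Gini of this branch
        impurity += (len(subset) / total) * gini  # weight by branch size
    return impurity

for feature in ['J', 'K', 'L']:
    print(feature, round(weighted_gini(df, feature), 3))
```

Splitting on K gives 0.25, on L about 0.444, and on J 0.5, matching the K, L, J ordering argued above.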
# Define the y (target) variable
y = df['Class']

# Define the X (predictor) variables
X = df.copy()
X = X.drop('Class', axis=1)
X
y
Instantiate the Model
# Instantiate the model
decision_tree = DecisionTreeClassifier(random_state=42)

# We will use ALL of the data to train the model.
# Fit the model to the training data
decision_tree.fit(X, y)
print(X.columns)
print('\n')
X.info()
# Sort the class names so they match the order the classifier uses
# (decision_tree.classes_); a plain set() does not guarantee the order.
cn = sorted(df['Class'].unique())
cn
# Plot the tree
plt.figure(figsize=(6, 4))
plot_tree(decision_tree, max_depth=4, fontsize=9,
          feature_names=['J', 'K', 'L'], class_names=cn, filled=True)
plt.show()
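If the plotted figure is hard to read, scikit-learn's `export_text` prints the same splits as plain text. The sketch below refits the same model on the same six rows so it runs on its own:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same six-row dataset as above
data = {"J": [1, 1, 0, 1, 0, 1],
        "K": [1, 1, 0, 0, 1, 1],
        "L": [1, 0, 1, 0, 1, 0],
        "Class": ['A', 'A', 'B', 'B', 'A', 'B']}
df = pd.DataFrame(data)
X = df.drop('Class', axis=1)
y = df['Class']

tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Text rendering of the fitted tree; the root split is on K
print(export_text(tree, feature_names=['J', 'K', 'L']))
```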
Test Dataset One – Perfect K
# To manually create a DataFrame, start with a dictionary of equal-length lists.
# The data has been slightly modified from the Udemy course
data_test = {"J": [1, 0, 0],
             "K": [1, 1, 0],
             "L": [1, 0, 1],
             "Class": ['A', 'A', 'B']}
df_test = pd.DataFrame(data_test)
df_test
# Define the y (target) variable
y_test = df_test['Class']

# Define the X (predictor) variables
X_test = df_test.copy()
X_test = X_test.drop('Class', axis=1)
X_test
from sklearn.metrics import classification_report, confusion_matrix

predictions = decision_tree.predict(X_test)
print(confusion_matrix(y_test, predictions))
print('\n')
print(classification_report(y_test, predictions))
Test Dataset Two – Imperfect K
# To manually create a DataFrame, start with a dictionary of equal-length lists.
# The data has been slightly modified from the Udemy course
data_test2 = {"J": [1, 0, 0],
              "K": [1, 0, 0],
              "L": [1, 0, 1],
              "Class": ['A', 'A', 'B']}
df_test2 = pd.DataFrame(data_test2)
df_test2
# Define the y (target) variable
y_test2 = df_test2['Class']

# Define the X (predictor) variables
X_test2 = df_test2.copy()
X_test2 = X_test2.drop('Class', axis=1)
X_test2
predictions2 = decision_tree.predict(X_test2)
print(confusion_matrix(y_test2, predictions2))
print('\n')
print(classification_report(y_test2, predictions2))
[[1 1]
 [0 1]]

              precision    recall  f1-score   support

           A       1.00      0.50      0.67         2
           B       0.50      1.00      0.67         1

    accuracy                           0.67         3
   macro avg       0.75      0.75      0.67         3
weighted avg       0.83      0.67      0.67         3
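To read the matrix: with the classes in sorted order (A, B), rows are the true labels and columns are the predictions, so one A was predicted correctly, one A was mistaken for B, and the single B was predicted correctly. As a quick sanity check (a sketch assuming NumPy is available), the 0.67 accuracy in the report is just the diagonal of the matrix over the total:

```python
import numpy as np

# Confusion matrix from the imperfect-K test:
# rows = true class (A, B), columns = predicted class
cm = np.array([[1, 1],
               [0, 1]])

# Accuracy = correct predictions (diagonal) / total predictions
accuracy = np.trace(cm) / cm.sum()
print(round(accuracy, 2))
```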