Decision Tree Python Algorithm


This entry is part 2 of 6 in the series Decision Trees

Here’s a brief summary of what we need to do to build a decision tree in Python. I’m assuming we’re working in a Jupyter Notebook. There’s a lot more to learn about decision trees; this is just a basic overview of how they are built in Python.

In Python, you start by importing the packages you need. Next, load the data into a pandas DataFrame and do exploratory data analysis (EDA). Based on the data, decide on an appropriate evaluation metric. Then prepare the data for modeling: we may drop features that are not predictive, and we might dummy encode a column or two, creating Boolean columns from a categorical predictor column. Split the data into training and test sets using the train_test_split function, stratifying on the target. Now train a baseline decision tree model: instantiate the classifier, set the random state, and assign it to a variable called decision_tree. Next, fit it to the training data. Then use the predict method of the tree we just grew to make predictions on the X test data, assigning the results to a variable called dt_pred. Score those predictions with the evaluation metric functions we imported, and inspect the confusion matrix of the decision tree’s predictions. Finally, examine the splits of the tree using the plot_tree function that we imported.
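The data preparation part of that workflow can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: the toy DataFrame, its column names (customer_id, age, plan, churned), the 25% test size, and the random_state value are all assumptions made up for the example. In practice the data would more likely come from a file via pd.read_csv.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame({
    "customer_id": range(1, 13),  # identifier only, carries no predictive signal
    "age": [25, 34, 45, 23, 52, 41, 36, 29, 60, 48, 33, 27],
    "plan": ["basic", "premium", "basic", "basic", "premium", "standard",
             "standard", "basic", "premium", "standard", "basic", "premium"],
    "churned": [0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1],  # target
})

# Quick EDA: peek at the rows and check the class balance of the target.
print(df.head())
print(df["churned"].value_counts(normalize=True))

# Drop the column that is not predictive.
df = df.drop(columns=["customer_id"])

# Dummy encode the categorical predictor into indicator columns.
df = pd.get_dummies(df, columns=["plan"], drop_first=True)

# Separate the predictors from the target.
X = df.drop(columns=["churned"])
y = df["churned"]

# Split into training and test sets, stratifying on the target so both
# sets keep roughly the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```

The class balance printed during EDA is one input into choosing the evaluation metric: with a heavily imbalanced target, accuracy alone can be misleading.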

  1. Import packages
  2. Load the data into a pandas DataFrame
  3. Do exploratory data analysis (EDA)
  4. Decide on an appropriate evaluation metric
  5. Drop features that are not predictive, if any
  6. Dummy encode categorical columns where needed
  7. Split the data into training and test sets using the train_test_split function, stratifying on the target
  8. Train a baseline decision tree model
  9. Instantiate the classifier and set the random state
  10. Fit it to the training data
  11. Use the predict method of the tree we just grew to make predictions on the X test data
  12. Score the predictions with the evaluation metric functions we imported
  13. Inspect the confusion matrix of the decision tree’s predictions
  14. Examine the splits of the tree using the plot_tree function that we imported
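The modeling steps at the end of the list might look something like the sketch below, using scikit-learn. The variable names decision_tree and dt_pred come from the description above; everything else is an assumption for illustration, including the synthetic data from make_classification, the placeholder feature names, the choice of metrics, and the random_state values.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Synthetic binary classification data standing in for a prepared dataset.
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Instantiate the classifier with a fixed random state, then fit it.
decision_tree = DecisionTreeClassifier(random_state=0)
decision_tree.fit(X_train, y_train)

# Predict on the held-out test data.
dt_pred = decision_tree.predict(X_test)

# Score the predictions with a few common classification metrics.
print("Accuracy: ", accuracy_score(y_test, dt_pred))
print("Precision:", precision_score(y_test, dt_pred))
print("Recall:   ", recall_score(y_test, dt_pred))
print("F1:       ", f1_score(y_test, dt_pred))

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, dt_pred, labels=decision_tree.classes_)
print(cm)

# Plot the top levels of the tree to examine its splits.
# In a Jupyter Notebook the figure renders inline; when running this as a
# script, call plt.show() afterwards.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names
fig, ax = plt.subplots(figsize=(12, 6))
plot_tree(decision_tree, max_depth=2, feature_names=feature_names,
          class_names=["0", "1"], filled=True, ax=ax)
```

Limiting the plot with max_depth=2 only affects what is drawn, not the fitted tree. A baseline tree grown without any depth limit is usually too large to read in full.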

I did not include hyperparameter tuning in this list. What is hyperparameter tuning?
