Decision Trees and Random Forests



A random forest is an ensemble of decision trees whose individual predictions are aggregated into one final prediction. Random forests are powerful because this aggregation limits overfitting (variance) without substantially increasing error due to bias.
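
To make the aggregation concrete, here is a minimal sketch using scikit-learn's RandomForestClassifier on a synthetic dataset; the dataset and parameter values are illustrative only. Note that scikit-learn aggregates by averaging the trees' predicted class probabilities rather than by a hard majority vote.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative synthetic dataset.
X, y = make_classification(n_samples=200, random_state=0)

# Fit a forest of 100 trees.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Aggregate by hand: average each fitted tree's predicted class probabilities.
avg_proba = np.mean([tree.predict_proba(X[:1]) for tree in rf.estimators_], axis=0)

print(avg_proba)                 # manual aggregation
print(rf.predict_proba(X[:1]))   # matches the forest's own aggregation

The steps below tune such a forest with cross-validated grid search (GridSearchCV).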

  1. Instantiate the random forest classifier rf and set the random state.
  2. Create a dictionary cv_params mapping any of the following hyperparameters to the candidate values to tune over. The more hyperparameters and candidate values you include, the more thoroughly the search can fit the model to the data, but the longer it will take to run.
    • max_depth
    • max_features
    • max_samples
    • min_samples_leaf
    • min_samples_split
    • n_estimators
  3. Define a set scoring of scoring metrics for GridSearchCV to capture (precision, recall, F1 score, and accuracy).
  4. Instantiate the GridSearchCV object rf1. Pass to it as arguments:
    • estimator=rf
    • param_grid=cv_params
    • scoring=scoring
    • cv: define the number of cross-validation folds you want (cv=_)
    • refit: indicate which evaluation metric you want to use to select the model (refit=_), as shown in the sketch after this list
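
Here is a minimal sketch of steps 1 through 4, assuming a scikit-learn workflow; the hyperparameter values, cv=5, and refit='f1' below are illustrative choices, not prescriptions.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# 1. Instantiate the random forest classifier and set the random state.
rf = RandomForestClassifier(random_state=0)

# 2. Dictionary of hyperparameters and candidate values to tune (examples).
cv_params = {
    'max_depth': [2, 3, 4, None],
    'max_features': ['sqrt', 'log2'],
    'max_samples': [0.7, 0.9],
    'min_samples_leaf': [1, 2, 3],
    'min_samples_split': [2, 3, 4],
    'n_estimators': [75, 100, 125],
}

# 3. Set of scoring metrics for GridSearchCV to capture.
scoring = {'precision', 'recall', 'f1', 'accuracy'}

# 4. Instantiate the GridSearchCV object.
rf1 = GridSearchCV(estimator=rf,
                   param_grid=cv_params,
                   scoring=scoring,
                   cv=5,          # number of cross-validation folds
                   refit='f1')    # metric used to select the final model

# Fit on training data (X_train and y_train are assumed to exist):
# rf1.fit(X_train, y_train)

After fitting, rf1.best_params_ holds the winning hyperparameter combination and rf1.best_estimator_ is the model refit on the full training set with those values.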

Here is an article on Medium called Decision Trees and Random Forests.
