A widely used technique for improving model performance after an initial model has been built is hyperparameter tuning. Hyperparameters are settings chosen before the model is trained, as opposed to the parameters the model learns from the data. Because they directly affect how the model is fit to the data, they can be tuned, or changed, to improve model performance. Hyperparameter tuning is the process of adjusting these settings to find the values that produce the best-performing model.
Decision Tree Hyperparameters
Two commonly used hyperparameters are max depth and min samples leaf. The max depth hyperparameter sets a limit on how deep a decision tree can grow. The depth of a decision tree is the number of levels between the root node and the node farthest from it, with the root node itself being level zero. Setting a value for max depth can help reduce overfitting by limiting how deep the tree will go, and it can also reduce the computational cost of training and using the model.
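To make the definition of depth concrete, here is a minimal sketch in Python. The `Node` class and the `tree_depth` and `can_split` helpers are hypothetical names made up for this illustration, not part of any library. The sketch assumes a binary tree and counts the root as level zero, as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; a node with no children is a leaf."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_depth(node: Node) -> int:
    """Number of levels between this node and its farthest descendant (a lone root is 0)."""
    children = [child for child in (node.left, node.right) if child is not None]
    if not children:
        return 0
    return 1 + max(tree_depth(child) for child in children)


def can_split(current_level: int, max_depth: Optional[int]) -> bool:
    """A node may split only if its children would still be within max_depth."""
    return max_depth is None or current_level < max_depth


# The root has a leaf on the left and an internal node on the right,
# so the farthest leaf sits two levels below the root.
tree = Node(left=Node(), right=Node(left=Node(), right=Node()))
depth = tree_depth(tree)
```

With max depth set to 2, `can_split(1, 2)` is true but `can_split(2, 2)` is false, so the tree above could not grow any further.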
The min samples leaf hyperparameter defines the minimum number of samples that each leaf node must contain. A split will only happen if each of the resulting nodes would contain at least that many samples.
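The rule can be sketched as a simple check. The function name `split_allowed` and the sample counts below are illustrative assumptions, and the sketch assumes a binary split into a left and a right node.

```python
def split_allowed(n_left: int, n_right: int, min_samples_leaf: int) -> bool:
    """Allow a split only if both resulting nodes keep at least min_samples_leaf samples."""
    return n_left >= min_samples_leaf and n_right >= min_samples_leaf


# A node holding 20 samples, with min samples leaf set to 5:
balanced = split_allowed(12, 8, min_samples_leaf=5)  # both sides have at least 5 samples
lopsided = split_allowed(17, 3, min_samples_leaf=5)  # the right side would have only 3
```

Here the 12/8 split is accepted, while the 17/3 split is rejected because one of the resulting nodes would be too small.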
Grid search is a useful tool for hyperparameter tuning. It systematically checks every combination of hyperparameter values to identify which set produces the best results according to the selected metric. When performing a grid search, the first step is to specify which hyperparameters you want to tune, and then the set of values you want to search over for each one. For example, if we want to tune max depth and min samples leaf, we would define candidate values for each of them. The algorithm then checks every combination of values to see which pair gives the best evaluation metric. Of course, the more combinations there are, the more computing time the search takes: the number of combinations is the product of the number of candidate values for each hyperparameter, so three values for each of two hyperparameters means nine combinations to evaluate.
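The procedure can be sketched in a few lines of Python. This is a bare-bones illustration, not a production implementation: `grid_search` is a hypothetical helper, the candidate values are arbitrary, and `toy_score` is a made-up stand-in for a real evaluation step, which would train a decision tree with the given values and return its metric on validation data.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate score_fn on every combination in param_grid and return the best one."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(**params), params))
    best_score, best_params = max(results, key=lambda result: result[0])
    return best_params, best_score, results


def toy_score(max_depth, min_samples_leaf):
    """Placeholder metric (an assumption for this sketch, not a real model evaluation)."""
    return 1.0 - 0.05 * abs(max_depth - 4) - 0.01 * abs(min_samples_leaf - 5)


# Candidate values to search over for each hyperparameter.
param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 10]}
best_params, best_score, results = grid_search(param_grid, toy_score)
```

With three candidate values for each of the two hyperparameters, `results` holds nine evaluated combinations. Libraries such as scikit-learn provide a ready-made version of this idea (`GridSearchCV`), which also handles cross-validation.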