# Decision [[binary-tree|Tree]]
- Components
- Each **node** specifies an evaluation of a feature.
- Each **branch** corresponds to a possible value of that feature.
- Each **leaf** signifies a categorical decision (label).
- Procedure
1. Select the best feature for the root.
2. Construct a branch for each possible feature value.
3. Split the data along each branch.
4. [[recursion|Recurse]] on each branch's subset.
5. Terminate in a leaf node once the partition is pure enough (or another stopping criterion is met).
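The steps above can be sketched as a minimal ID3-style recursion; this is an illustrative sketch (the dict-of-rows representation and function names are assumptions, not from the original), using entropy minimization as the selection criterion:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # H = -sum over classes of p * log2(p)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, features):
    # Step 5: terminate with a leaf once the partition is pure
    # (or no features remain); the leaf holds the majority label.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]

    # Step 1: select the feature whose split minimizes weighted entropy.
    def weighted_entropy(f):
        total = 0.0
        for v in {row[f] for row in rows}:
            sub = [lbl for row, lbl in zip(rows, labels) if row[f] == v]
            total += len(sub) / len(labels) * entropy(sub)
        return total

    best = min(features, key=weighted_entropy)

    # Steps 2-4: branch on each value of the chosen feature,
    # split the data, and recurse on each subset.
    tree = {}
    for v in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == v]
        tree[v] = build_tree([rows[i] for i in idx],
                             [labels[i] for i in idx],
                             [f for f in features if f != best])
    return (best, tree)
```

On a toy dataset where feature `"a"` perfectly predicts the label, the sketch roots the tree at `"a"` and attaches one pure leaf per value.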
- Selecting the best feature = minimizing the _[[entropy|information entropy]]_
after partitioning.
- _Information Gain_ (IG)
- Features that perfectly partition the data should give maximal information
- Unrelated attributes should give no information
- Defined as expected reduction in entropy by partitioning according to the
particular feature.
- Calculate the entropy of each partition, weight it by the partition's probability, and sum; subtracting this weighted sum from the parent's entropy gives the IG.
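That weighted sum can be made concrete; a small sketch (function names are illustrative) that also demonstrates the two boundary cases above — a perfectly partitioning feature and an unrelated one:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # H = -sum over classes of p * log2(p)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    # IG = H(parent) - sum over values v of P(v) * H(partition_v)
    parent = entropy(labels)
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        part = [lbl for x, lbl in zip(values, labels) if x == v]
        remainder += len(part) / n * entropy(part)
    return parent - remainder

labels = ["yes", "yes", "no", "no"]          # parent entropy H = 1 bit
perfect = ["a", "a", "b", "b"]               # splits classes exactly
unrelated = ["a", "b", "a", "b"]             # each partition stays 50/50

print(information_gain(perfect, labels))     # maximal: equals parent entropy, 1.0
print(information_gain(unrelated, labels))   # no information: 0.0
```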
- Prevent overfitting
- Pre-pruning: halt tree construction early.
- Post-pruning: remove branches from a "fully-grown" tree.
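Pre-pruning amounts to a stopping check evaluated before each split during construction; a sketch with hypothetical thresholds (`max_depth` and `min_samples` are illustrative, not from the original):

```python
def should_stop(labels, depth, max_depth=3, min_samples=2):
    # Pre-pruning: halt construction early if any stopping criterion fires.
    if len(set(labels)) == 1:       # partition is already pure
        return True
    if depth >= max_depth:          # tree has grown deep enough
        return True
    if len(labels) < min_samples:   # too few samples to split reliably
        return True
    return False
```

When this returns `True`, the recursion emits a leaf instead of splitting further.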