# Decision [[binary-tree|Tree]]

- Components
    - Each **node** specifies an evaluation of a feature.
    - Each branch corresponds to a feature value.
    - Each **leaf** signifies a categorical decision (label).
- Procedure
    1. Select the best feature for the root.
    2. Construct a branch for each possible feature value.
    3. Split the data along each branch.
    4. [[recursion|Recurse]] for each branch.
    5. Terminate in a leaf node once performance is adequate.
- Selecting the best feature = minimizing the _[[entropy|information entropy]]_ after partitioning.
    - _Information Gain_ (IG)
        - Features that perfectly partition the data should give maximal information.
        - Unrelated attributes should give no information.
        - Defined as the expected reduction in entropy from partitioning on a particular feature.
        - Calculate the entropy of each partition, weight it by the partition's probability, and sum.
- Preventing overfitting
    - Pre-pruning: halt tree construction early.
    - Post-pruning: remove branches from a "fully-grown" tree.
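In symbols (standard definitions): with $S$ the labels at a node, $p_c$ the fraction of $S$ with class $c$, and $S_v$ the subset where feature $A$ takes value $v$:

$$H(S) = -\sum_{c} p_c \log_2 p_c$$

$$IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$

The second sum is exactly the "entropy of each partition, weighted by probability" step above; the best feature is the one maximizing $IG$.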
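The procedure can be sketched as a minimal ID3-style builder. This is an illustrative sketch, not a library implementation: rows are assumed to be dicts of categorical feature values, and the `entropy`/`information_gain`/`build_tree` names are mine.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Expected reduction in entropy from partitioning on `feature`:
    parent entropy minus the probability-weighted entropy of each partition."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum((len(part) / len(labels)) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    """Select the best feature, branch on each of its values,
    split the data, and recurse; terminate in a majority-label leaf."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    rest = [f for f in features if f != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        tree[best][value] = build_tree(list(sub_rows), list(sub_labels), rest)
    return tree
```

A feature that perfectly separates the labels yields IG equal to the parent entropy (maximal); an unrelated feature yields IG of 0. The tree is a nested dict mapping each chosen feature to its value branches, with leaves as plain labels.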