ml - Samuel's Vault

# Machine Learning

- **Supervision:** Training data objects and their features are accompanied by labels.
  - Classification (Categorical)
  - Regression (Numerical)
- Unsupervised Learning
  - [[clustering|Clustering]]
  - Pattern / Association Mining

> Given $N$ _training objects_ $\{(x_1, y_1), \dots, (x_N, y_N)\}$, such that
> $x_i$ is the _feature vector_ of the $i$th _object_ and $y_i$ is its _label_.
> A learning algorithm seeks a function $g: X \to Y$, where $X$ is the _feature_
> space and $Y$ is the _label_ space.

- Data Splitting: Training/Validation (Dev)/Test Set
- Overfitting
  - The model fits the training data too closely/exactly
  - Starts "memorizing" instead of learning and generalizing
- Cross-validation
  - Partition the dataset into $k$ subsets (usually 5 or 10)
  - Pick a different subset for testing each time, and train on the remaining $k - 1$ subsets
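
The cross-validation partitioning can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `k_fold_splits` and its `seed` parameter are hypothetical, and shuffling before splitting is an assumption these notes do not state.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    # Assumption: shuffle once so the folds do not depend on the data order.
    random.Random(seed).shuffle(indices)

    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                    # held-out subset
        train = indices[:start] + indices[start + size:]      # the other k - 1 subsets
        yield train, test
        start += size


for fold, (train, test) in enumerate(k_fold_splits(10, k=5)):
    print(f"fold {fold}: train={sorted(train)} test={sorted(test)}")
```

Each object lands in the test subset exactly once across the $k$ rounds, so averaging the $k$ test scores uses every labelled object for evaluation without ever testing on data the model was trained on in that round.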