# Machine Learning
- **Supervised Learning:** The training objects and their features are accompanied by
  labels.
  - Classification (categorical labels)
  - Regression (numerical labels)
- **Unsupervised Learning:** The training objects have no labels.
  - [[clustering|Clustering]]
  - Pattern / Association Mining
> Given $N$ _training objects_ $\{(x_1, y_1), \dots, (x_N, y_N)\}$, where
> $x_i$ is the _feature vector_ of the $i$th _object_ and $y_i$ is its _label_,
> a learning algorithm seeks a function $g: X \to Y$, where $X$ is the _feature_
> space and $Y$ is the _label_ space.
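
To make the definition above concrete, here is a minimal sketch of one possible learning algorithm, a 1-nearest-neighbour classifier. The function name `fit_1nn` and the toy data are assumptions made for illustration; the notes do not prescribe a particular algorithm.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_1nn(training_objects):
    """Return a function g: X -> Y built from (feature_vector, label) pairs.

    g predicts the label of the training object closest to x
    (1-nearest-neighbour; chosen here only because it is short).
    """
    def g(x):
        _, nearest_label = min(training_objects, key=lambda pair: dist(pair[0], x))
        return nearest_label
    return g


# Hypothetical training objects (x_i, y_i): 2-D feature vectors, categorical labels.
training_objects = [
    ((0.0, 0.0), "a"),
    ((0.0, 1.0), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 5.0), "b"),
]

g = fit_1nn(training_objects)
print(g((0.5, 0.2)))  # nearest training object is (0.0, 0.0)
print(g((5.5, 4.0)))  # both nearest training objects carry the same label
```

Because the labels here are categorical, this is a classification example; with numerical labels the same setup would be a regression problem.
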
- Data Splitting: Training/Validation (Dev)/Test Set
- Overfitting
  - The model fits the training data too closely, or even exactly
  - It starts "memorizing" the training data instead of learning patterns that generalize to unseen data
- Cross-validation
  - Partition the dataset into $k$ subsets (usually $k = 5$ or $10$)
  - In each of the $k$ rounds, hold out a different subset for testing and train on the remaining $k - 1$
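
The cross-validation procedure above can be sketched as an index generator. This is a rough illustration using only the standard library; the name `k_fold_splits` and its `seed` parameter are hypothetical, not part of any library.

```python
import random


def k_fold_splits(n_objects, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_objects))
    random.Random(seed).shuffle(indices)  # shuffle so folds do not follow the data order
    # Striding by k gives k disjoint folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


for train_idx, test_idx in k_fold_splits(n_objects=10, k=5):
    print(sorted(test_idx), len(train_idx))
```

In a typical use, a model would be trained on `train_idx` and scored on `test_idx` in each round, and the $k$ scores would then be averaged to estimate how well the model generalizes.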