# Support Vector Machine

- Finding a _separating hyperplane_
  - Must correctly classify the data
  - The closest points should have the greatest possible distance to it, i.e. maximize the margin width
- Advantages
  - Insensitive to high dimensionality
  - Insensitive to class imbalance
- Disadvantages
  - Data may not have a clear boundary
  - A linear separating hyperplane doesn't exist when the data is not linearly separable
- `sklearn.svm.LinearSVC`: the parameter `C` dictates the tradeoff between a wide margin and correct classification. The larger `C`, the narrower the margin (see the sketch at the end of this section).
- Use a _kernel function_ to map data points to a higher dimensionality, so that they can be separated. A feature transformation $\boldsymbol x \to \phi(\boldsymbol x)$ is needed; the kernel computes inner products in the transformed space without computing $\phi$ explicitly. Common kernels (formulas at the end of this section):
  - Polynomial
  - Radial basis function (RBF), or Gaussian kernel
  - Sigmoid kernel

## Problem Definition

For a binary classification task

- Notations
  - $\boldsymbol x_i = (x_{i1}, x_{i2}, \dots, x_{in})$, where $\boldsymbol{x}_i \in \mathbb R^n$
  - $y_i \in \{-1, +1\}$
  - Derive a function $f: \boldsymbol x \to y$
- The hyperplane is given as $\boldsymbol{wx} + b = 0$
  - where $\boldsymbol w = (w_1, w_2, \dots, w_n)$ is the weight vector and $b$ is the bias
- To use the SVM, we require
  - The margin boundaries are given by $\boldsymbol{wx}^\pm + b = \pm 1$
  - If $y_i = +1$, then $\boldsymbol{wx}_i + b \ge 1$
  - If $y_i = -1$, then $\boldsymbol{wx}_i + b \le -1$
  - i.e. $y_i(\boldsymbol{wx}_i + b) \ge 1$
  - Margin width is given by $2/\lVert\boldsymbol w\rVert$ (derived below)

$
\begin{align}
\min \quad & \boldsymbol w^\intercal\boldsymbol w / 2 \\
\textrm{s.t.} \quad & y_i(\boldsymbol{wx}_i + b) \ge 1 \quad \forall (\boldsymbol x_i, y_i)
\end{align}
$
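
Where the margin width $2/\lVert\boldsymbol w\rVert$ comes from: a point $\boldsymbol x^\pm$ on either boundary satisfies $\boldsymbol{wx}^\pm + b = \pm 1$, and its distance to the hyperplane $\boldsymbol{wx} + b = 0$ is

$
\begin{align}
d^\pm = \frac{\lvert\boldsymbol{wx}^\pm + b\rvert}{\lVert\boldsymbol w\rVert} = \frac{1}{\lVert\boldsymbol w\rVert},
\qquad
\text{margin width} = d^+ + d^- = \frac{2}{\lVert\boldsymbol w\rVert}
\end{align}
$

so maximizing the margin is equivalent to minimizing $\boldsymbol w^\intercal\boldsymbol w / 2$.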
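
For reference, the kernels listed above take the following common forms (as parameterized in scikit-learn/libsvm), where $K(\boldsymbol x, \boldsymbol x') = \phi(\boldsymbol x)\,\phi(\boldsymbol x')$ is the inner product in the transformed space:

$
\begin{align}
K_{\text{poly}}(\boldsymbol x, \boldsymbol x') &= (\gamma\,\boldsymbol{x}\boldsymbol{x}' + r)^d \\
K_{\text{RBF}}(\boldsymbol x, \boldsymbol x') &= \exp\!\left(-\gamma\lVert\boldsymbol x - \boldsymbol x'\rVert^2\right) \\
K_{\text{sigmoid}}(\boldsymbol x, \boldsymbol x') &= \tanh(\gamma\,\boldsymbol{x}\boldsymbol{x}' + r)
\end{align}
$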
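
Below is a minimal, illustrative scikit-learn sketch of the points above. The dataset (`make_moons`) and the specific `C`, `degree`, and `gamma` values are assumptions chosen for demonstration only, not part of the notes.

```python
# Minimal sketch: the C tradeoff in LinearSVC and kernel choice in SVC.
# Dataset and hyperparameter values are illustrative assumptions.
from sklearn.datasets import make_moons
from sklearn.svm import SVC, LinearSVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Larger C penalizes margin violations more heavily -> narrower margin.
narrow_margin = LinearSVC(C=100.0).fit(X, y)
wide_margin = LinearSVC(C=0.01).fit(X, y)

# Kernels implicitly map the points to a higher-dimensional feature space.
models = {
    "poly": SVC(kernel="poly", degree=3),
    "rbf": SVC(kernel="rbf", gamma="scale"),
    "sigmoid": SVC(kernel="sigmoid"),
}
for name, model in models.items():
    model.fit(X, y)
    print(name, model.score(X, y))
```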