# Support Vector Machine
- Finding a _separating hyperplane_
- Must correctly classify the data
- Among all such hyperplanes, choose the one whose closest points are farthest
from it -- maximize the margin width
- Advantages
- Insensitive to high dimensionality
- Insensitive to class imbalance
- Disadvantages
- Data may not have a clear boundary
- A linear separating hyperplane may not exist for non-linearly separable data.
- `sklearn.svm.LinearSVC` exposes a parameter `C` that controls the tradeoff between
margin width and correct classification: the larger `C`, the narrower the margin
(a sketch using both `LinearSVC` and a kernelized `SVC` follows this list).
- Use a _kernel function_ to map data points into a higher-dimensional space where
they become separable; this corresponds to a feature transformation
$\boldsymbol x \to \phi(\boldsymbol x)$. Common kernels:
- Polynomial
- Radial basis function (RBF), or Gaussian kernel
- Sigmoid kernel
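
A minimal sketch of the two points above, assuming scikit-learn; the two-moons dataset and all parameter values are illustrative choices, not from these notes:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Synthetic, non-linearly separable data (illustrative assumption)
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear SVM: larger C penalizes misclassification more, so the margin gets narrower
linear_clf = LinearSVC(C=1.0).fit(X_train, y_train)

# Kernelized SVM: the RBF kernel implicitly applies a feature map x -> phi(x)
rbf_clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

print("LinearSVC accuracy:", linear_clf.score(X_test, y_test))
print("RBF SVC accuracy:", rbf_clf.score(X_test, y_test))
```
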
## Problem Definition
For a binary classification task
- Notations
- $\boldsymbol x_i = (x_{i1}, x_{i2}, \dots, x_{in})$, where
$\boldsymbol{x}_i\in \mathbb R^n$, is the feature vector of the $i$-th example
- $y_i \in \{-1, 1\}$ is the corresponding label
- Goal: learn a function $f: \boldsymbol x \mapsto y$ from the training data.
- The hyperplane is given as $\boldsymbol{wx} + b = 0$
- Where $\boldsymbol w = (w_1, w_2, \dots, w_n)$ is the weight vector and $b$
is the bias.
- To use the SVM, we have
- The margin boundaries, passing through the closest points $\boldsymbol x^\pm$
(the support vectors), are given by $\boldsymbol{wx}^\pm + b = \pm 1$
- If $y_i = +1$, then $\boldsymbol{wx}_i + b \ge 1$
- If $y_i = -1$, then $\boldsymbol{wx}_i + b \le -1$
- i.e. $y_i(\boldsymbol{wx}_i + b) \ge 1$
- Margin width is given by $2/\lVert\boldsymbol w\rVert$ (see the derivation below)
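
A short worked derivation of the width, assuming $\boldsymbol x^+$ and $\boldsymbol x^-$ are points lying exactly on the two boundaries: subtracting the boundary equations and projecting onto the unit normal $\boldsymbol w/\lVert\boldsymbol w\rVert$ gives

$$
\begin{aligned}
\boldsymbol{w}(\boldsymbol x^+ - \boldsymbol x^-) &= (+1 - b) - (-1 - b) = 2 \\
\text{width} = \frac{\boldsymbol w}{\lVert\boldsymbol w\rVert}(\boldsymbol x^+ - \boldsymbol x^-) &= \frac{2}{\lVert\boldsymbol w\rVert}
\end{aligned}
$$

Maximizing $2/\lVert\boldsymbol w\rVert$ is therefore equivalent to minimizing $\boldsymbol w^\intercal\boldsymbol w / 2$, giving the primal problem: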
$$
\begin{aligned}
\min_{\boldsymbol w,\, b} \quad & \boldsymbol w^\intercal\boldsymbol w / 2 \\
\textrm{s.t.} \quad & y_i(\boldsymbol{wx}_i + b) \ge 1 \quad
\forall (\boldsymbol x_i, y_i)
\end{aligned}
$$
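
A minimal numerical check of this formulation, assuming scikit-learn and a linearly separable synthetic dataset; a large `C` is used to approximate the hard-margin problem, and all values are illustrative:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters; remap labels to {-1, +1}
X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.6, random_state=0)
y = np.where(y == 0, -1, 1)

# A very large C approximates the hard-margin objective above
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# All constraints y_i (w.x_i + b) >= 1 should hold (up to solver tolerance)
print("min_i y_i (w.x_i + b):", np.min(y * (X @ w + b)))
print("margin width 2/||w||:", 2 / np.linalg.norm(w))
```
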