# $k$-Meas
- Minimize sum of squared errors (against representative point in cluster)
- Representative points
- _Centroid_ = average of points in cluster
- _Medoid_ = similar to centroid but exists in the dataset
- _Median_ (take median on different axis to create new object)
- _Mode_ (similar to above)
## Algorithm
- Algorithms
- Global optimum: Minimize sum of squared errors (SSE)
- $k$-Means, Medoids, Medians, Modes
- Steps
- Randomly select $k$ points as initial centroids
- Form $k$ clusters by assigning points to nearest centroid
- Move each centroid to the mean of the cluster
- Repeat until convergence
## Validation
- _The Elbow Criterion_ - determine number of clusters
- Run $k$-means with different $k$
- Choose the smallest $k$ with a low SSE
- Plotting the graph and find the elbow
## Pros and Cons
- Pros
- Easy to compute
- Easy to interpret
- Efficient - $O(tkN)$, where $t$ is num of iterations
- Cons
- Results sensitive to distance measures
- Sensitive to initialized centroids
- Need to specify num of clusters
- Results sensitive to outliers and noise
- Not suitable to discover clusters with arbitrary shapes