# $k$-Meas - Minimize sum of squared errors (against representative point in cluster) - Representative points - _Centroid_ = average of points in cluster - _Medoid_ = similar to centroid but exists in the dataset - _Median_ (take median on different axis to create new object) - _Mode_ (similar to above) ## Algorithm - Algorithms - Global optimum: Minimize sum of squared errors (SSE) - $k$-Means, Medoids, Medians, Modes - Steps - Randomly select $k$ points as initial centroids - Form $k$ clusters by assigning points to nearest centroid - Move each centroid to the mean of the cluster - Repeat until convergence ## Validation - _The Elbow Criterion_ - determine number of clusters - Run $k$-means with different $k$ - Choose the smallest $k$ with a low SSE - Plotting the graph and find the elbow ## Pros and Cons - Pros - Easy to compute - Easy to interpret - Efficient - $O(tkN)$, where $t$ is num of iterations - Cons - Results sensitive to distance measures - Sensitive to initialized centroids - Need to specify num of clusters - Results sensitive to outliers and noise - Not suitable to discover clusters with arbitrary shapes