K-means Clustering
An interactive visual comparison of k-means clustering performance on circular Gaussian blobs vs. non-circular moon-shaped data.
Introduction
K-means clustering is one of the most widely used unsupervised learning algorithms. It partitions data into k groups by iteratively assigning each point to the nearest cluster center and then updating the cluster centers to be the mean of the assigned points.
While k-means works beautifully on data with roughly spherical clusters, it can fail dramatically when clusters are non-spherical, because its core assumptions no longer hold. To make this concrete, I built an interactive artifact that runs k-means side by side on two very different datasets:
- Left: Circular Gaussian blobs (k-means-friendly)
- Right: Moon-shaped, non-circular data (k-means-unfriendly)
You can step through the algorithm iteration-by-iteration or watch it run automatically, adjusting parameters to see how the results change.
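If you want to generate comparable datasets yourself, scikit-learn's make_blobs and make_moons produce exactly these two shapes. A minimal sketch (the parameter values here are illustrative, not the demo's exact settings):

```python
from sklearn.datasets import make_blobs, make_moons

# Circular Gaussian blobs: roughly spherical clusters, k-means-friendly.
X_blobs, y_blobs = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Two interleaving half-moons: curved clusters, k-means-unfriendly.
X_moons, y_moons = make_moons(n_samples=300, noise=0.08, random_state=42)
```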
[Interactive demo: K-means Clustering Comparison. Compare k-means performance on circular vs. non-circular data side by side.]
Key learning points from the two panels:
- Circular Data: spherical clusters that k-means handles well. Clean separation is achieved in a few steps.
- Non-circular Data: non-spherical clusters that k-means struggles with. The algorithm forces linear boundaries onto curved structures.
How K-means Works
The k-means algorithm repeats two main steps until it converges:
- Assignment step: each data point is assigned to the cluster whose center is closest in Euclidean distance:

  $$c_i = \arg\min_{j \in \{1, \dots, k\}} \lVert x_i - \mu_j \rVert^2$$

- Update step: each cluster center is updated to be the mean of all points assigned to it:

  $$\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i$$

Here $x_i$ is a data point, $\mu_j$ is the center of cluster $j$, and $C_j$ is the set of points currently assigned to cluster $j$.
The algorithm starts with k initial centers (chosen randomly or with the smarter k-means++ method) and stops when the centers move very little between iterations or after a fixed number of steps.
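For readers who prefer code, here is a minimal NumPy sketch of this loop. It is illustrative only, not the demo's implementation; the function name, random initialization, and defaults are my own choices:

```python
import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, rng=None):
    """Plain k-means: a minimal sketch, not the demo's code."""
    rng = np.random.default_rng(rng)
    # Initialize centers by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: label each point with the index of its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        # (keep a center in place if its cluster happens to be empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop when the centers barely move.
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels
```

Swapping the random initialization for k-means++ only changes how the initial centers are chosen; the loop itself is identical.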
Measuring Progress: SSE
In our demo, we display the Sum of Squared Errors (SSE) at each iteration:
- A lower SSE means points are closer to their assigned center (tighter clusters).
- K-means never increases SSE from one iteration to the next: the assignment step can only move a point to a closer center, and the update step sets each center to the mean of its points, which minimizes their squared distances.
- But low SSE does not always mean a good clustering — especially on non-spherical data.
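Concretely, SSE is the total squared distance from every point to its assigned center, in the same notation as the update rules above:

$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$$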
What You’ll See in the Demo
- Initialization: the cluster centers (× marks) appear in their initial positions.
- Iteration: each step reassigns points to the nearest center, recoloring them, then moves the centers toward the mean of their assigned points (a sketch after this list mimics this stepping offline).
- Convergence: on the left, clusters quickly form around the Gaussian blobs; on the right, k-means still forces linear boundaries, slicing the curved clusters awkwardly.
- Experimentation: adjust k and the initialization method to see how the initial positions affect the final result, and how k-means struggles with non-circular shapes.
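To mimic this stepping behavior offline, the earlier sketch can be reworked as a generator that yields the state and SSE after every assignment step. Again a sketch under my own assumptions (it reuses the numpy import from above):

```python
def kmeans_steps(X, k, n_iters=20, rng=None):
    """Yield (centers, labels, sse) once per iteration, like the demo's step button."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: nearest center per point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # SSE for the current assignment.
        sse = (dists[np.arange(len(X)), labels] ** 2).sum()
        yield centers.copy(), labels, sse
        # Update step: recompute centers (keep a center in place if its cluster is empty).
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
```

Iterating over kmeans_steps(X_moons, 2) yields a non-increasing SSE even while the moon shapes are being sliced apart, which is exactly the mismatch the demo visualizes.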
Why the Difference?
K-means assumes:
- Clusters are spherical in shape
- Clusters have similar size
- Clusters are linearly separable
When these assumptions hold (e.g., Gaussian blobs), the algorithm works great.
When they don’t (e.g., moons, rings), k-means’ rigid boundaries can’t adapt to the actual structure of the data.
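One way to quantify the difference is to compare the k-means labels with the generating labels using the adjusted Rand index. A sketch reusing the datasets generated above (the exact scores depend on the random seed, but the blobs typically score near 1 while the moons score far lower):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Fit k-means with the true number of clusters for each dataset.
labels_blobs = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_blobs)
labels_moons = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_moons)

# Compare against the generating labels: expect a score near 1 on the blobs
# and a much lower score on the moons, even though k-means converges on both.
print("blobs ARI:", adjusted_rand_score(y_blobs, labels_blobs))
print("moons ARI:", adjusted_rand_score(y_moons, labels_moons))
```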
With this interactive artifact, you can watch exactly how k-means evolves and see — in real time — both its strengths and limitations.