Clustering colors in an image

Quick introduction

The applet demonstrates how the k-means clustering algorithm operates on a dataset which represents the different colors in an image.
Colors are represented as three-dimensional vectors of floating point values. Each component represents the amount of Red, Green and Blue in the color (RGB). Note that using this representation, if two vectors are "near", the colors they represents will be similar; also, the average of two vectors, particularly if they are not very different, represents a color perceived as roughly "halfway" between the original ones. This holds also when averaging multiple RGB vectors. There are infinite other ways to represent a color with a 3D vector, but only a subset of them can enjoy these properties. See below for details on how the choice of the color representation affects results of the clustering process.
  1. The first step is building the dataset the k-means algorithm will work on: this is done randomly sampling several thousands pixels from the image, for performance reasons (at the used 200x120 resolution, there's plenty of data to use).
  2. Then, an RGB representation of each pixel is created: the resulting dataset is an array of 3-vectors.
  3. The k-means algorithm is applied on the dataset, returning k cluster centers. They represent the k colors we'll use to draw the image.
  4. Every pixel of the image is converted to an RGB vector, and is drawn using the color represented by the nearest cluster center.
The resulting image is made up of only k different colors, which do their best to represent the image minimizing the error.
The problem is part of a broader class named "vector quantization", which has a number of applications in lossy data compression; indeed, reducing the number of colors in an image is useful to compress it: assuming to represent each color component with 8 bits, a small 100x100 image would need10000*24 bits (29.3 kBytes); if we manage to accettably represent the image using only 32 different colors, we can encode each with just 5 bits, and reduce the needed memory to 6.1 kBytes, plus the space required to define the 32 used colors (32*24 bits).
compression example

Instructions

Operating the applet is extremely simple: the top border holds the control buttons, then you can see the original image (above) and the clustered one (below); in the bottom there is a one-line status bar, which describes what's happening, and reports the total error in the current clustered image (if any).
Press the "Compress to X colors" button to start the k-means algorithm on the original image. Use the slider on the right to change the value of X.
Use the "Next Image" button to work on the next image of the provided set; allow the time for the image to be downloaded and resized.
While the k-means algorithm runs, intermediate results are shown.
The applet should work with Java 1.5 on all platforms. It currently does not work with previous releases.

Image summary






Considerations and experiments

Effects of the color representation on the clustering algorithm

We already noted that a color can be represented in many ways through 3D vectors: just think of using another triple of different colors.
However, radically different models can be used: HSV for example uses the three components Hue, Saturation and Value, that very roughly speaking can be thought of as a color's tint, vivacity and brightness (the terms used here have a very precise meaning in the color theory).
As an example, look at this "slice" of 3D HSV space:
hsv slice
On the x direction you see increasing Value; selected color is has an almost maximum "Value" parameter. On the y axis there is saturation: note that colors in the bottom are grayscale (0 saturation); selected color has an high saturation. The slice of the HSV cube (where the z axis is Hue), is taken at the Hue value of a Blue tint (you can see this in the right indicator);
Note that a couple of colors that are near in the HSV cube are similar. Is the baricenter of a couple of colors a good compromise between the two? If the two colors are not so far, we see that it should be.
But are we sure that "distance" between two HSV vectors represents the perceived distance between the two colors? A couple of colors distant in the Hue and saturation parameters, both with a value around 0, surely look very similar (almost black), but they are very far in the HSV cube!
This is why color theory states that HSV is not a perceptually uniform color representation: in a perceptually uniform representation, the euclidean distance between two color vectors is directly proportional to the perceived distance by a human being. As a side note, a perfectly p.u. representation is yet to be found; also, perceptual uniformity depends on the subject: for color blind people, for example, some colors perceived by others as quite different can look very similar. The applet uses an RGB representation modified to deal with the logarithmic response of our vision to brightness; it is quite p.u., but not up to the state-of-art LAB color model. A RGB to LAB conversion of a color is possible but not straightforward, and it does not bring noticeable improvements in this context. Color theory is a surprisingly complex field: in-depth information on perceptual uniformity and color models is found here, while a good starting point for color theory is the Color FAQ.
Perceptual uniformity is important when we apply clustering algorithms on colors, because that ensures that the euclidean distance between color vectors is close to real perceived color distance; for example, applying the kmeans algorithm on colors encoded in HSV, will give lower quality results: if there are two distinct clusters far in Hue and Saturation but  both with a very small Value, the algorithm will tend to assign a center to each: a waste! Infact, in a p.u. representation, those two clusters would be merged into one, since all these colors are almost black, and so their vectors would be similar; this would leave more centers for other clusters.

Data cloud examples

A scatter plot of the dataset originated by the first images helps to better visualize the problem: the representation used is RGB modified towards perceptual uniformity. X asis is the Red component, Y is Green, while Z is blue. Here is how (a random subset of) the pixels of the second image looks like:



Note that the points are distributed in a surface: infact, you can identify the three colors white, red and blue with the points of coordinates (1,1,1), (1,0,0) and (0,1,0). Other colors are combinations of the three given, generating a plane, which gets distorted to a curve surface because of the correction towards perceptual uniformity.

Here you can see a larger animated scatter plot for the first image (1.1 MB): the distribution is much more complex this time.

Additional notes

The code which implements the k-means algorithm has been modularized in a simple and minimalistic non-OO java library: download it here (source and documentation included).
It also supports restarting the algorithm with different initializations, then return the best result (a feature which is not used in this application); a callback-driven notification feature can be optionally used, that updates an observer with the cluster center positions and total error at every iteration.
The algorithm is rather fast: the applet spends most of the time redrawing the image with the new colors, rather than in the kmeans algorithm.

Tools

The applet is written using Eclipse and Sun's JDK 1.5.0.
Images are either photographs or sketched using the Gimp.
Scatter plots are generated by xd3d, and animated by gifsicle.

Feedback

Any feedback is appreciated!
Your email:
Message:


Alessandro Giusti
giusti (at) leet (dot) it
last edit: mar 2 2005

Return to my main page.

Creative Commons License
This work is licensed under a Creative Commons License.