# Image Retrieval: Color Coherence Vector

It’s recommended to have a look at the previous post first. It’s an introduction to Image Retrieval, and some of its terms and expressions are used in this post.

In the last post we talked about two common color descriptors: the Global Color Histogram (GCH) and the Local Color Histogram (LCH). We then discussed the main problem of GCH, namely that it carries no information about the spatial distribution of colors, looked at how LCH attempts to solve this problem, and finally showed some drawbacks of LCH.

Color descriptors are used to differentiate between images and compute their similarities by describing their colors.

Now we’ll discuss one of the most efficient color descriptors that does contain information about the spatial distribution of colors: the Color Coherence Vector (CCV).

**Color Coherence Vector**

Color Coherence Vector (CCV) is a more complex method than the color histogram. It classifies each pixel as either coherent or incoherent. A coherent pixel is part of a big connected component (CC) of the same color, while an incoherent pixel is part of a small connected component. Of course, we first need to define the criterion used to decide whether a connected component is big or not.

**Feature extraction algorithm**

1. Blur the image (by replacing each pixel’s value with the average value of the 8 adjacent pixels surrounding that pixel).

2. Discretize the color space (the image’s colors) into n distinct colors.

3. Classify each pixel as either coherent or incoherent. This is computed as follows:

- Find the connected components for each discretized color.
- Determine tau’s value (tau is a user-specified threshold, normally about 1% of the image’s size).
- If a connected component has a number of pixels greater than or equal to tau, its pixels are considered coherent; otherwise they are incoherent.

4. For each color compute two values (C and N).

- C is the number of coherent pixels.
- N is the number of incoherent pixels.

It’s clear that the sum of C and N over all colors equals the total number of pixels.
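Steps 2–4 above can be sketched in pure Python. This is a minimal illustration, not the post’s own code: it assumes the image has already been blurred (step 1) and discretized into integer color indices, and it uses 8-connectivity for the components (the function name `extract_ccv` and the sample grid are assumptions for the example).

```python
from collections import Counter, deque

def extract_ccv(image, n_colors, tau):
    """Compute a Color Coherence Vector for a 2-D grid of
    already-discretized color indices (steps 3 and 4)."""
    rows, cols = len(image), len(image[0])
    visited = [[False] * cols for _ in range(rows)]
    coherent, incoherent = Counter(), Counter()
    for r in range(rows):
        for c in range(cols):
            if visited[r][c]:
                continue
            color = image[r][c]
            # BFS over the 8-connected component of this color.
            size = 0
            queue = deque([(r, c)])
            visited[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not visited[ny][nx]
                                and image[ny][nx] == color):
                            visited[ny][nx] = True
                            queue.append((ny, nx))
            # Components with at least tau pixels are coherent.
            if size >= tau:
                coherent[color] += size
            else:
                incoherent[color] += size
    # Feature vector: one (C_i, N_i) pair per discretized color.
    return [(coherent[i], incoherent[i]) for i in range(n_colors)]

# A small 4x4 image discretized into 3 colors, with tau = 4:
grid = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [2, 2, 0, 1],
    [2, 2, 0, 0],
]
print(extract_ccv(grid, n_colors=3, tau=4))  # [(9, 0), (0, 3), (4, 0)]
```

Note how color 1 forms a single 3-pixel component, which falls below tau and is therefore counted as incoherent.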

**Matching function**

To compare two images a and b:

Ci : the number of coherent pixels of color i.

Ni : the number of incoherent pixels of color i.
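A matching function commonly used with CCVs is the L1 distance over the (C, N) pairs, d(a, b) = sum over i of |Ci_a - Ci_b| + |Ni_a - Ni_b|. A minimal sketch (the function name `ccv_distance` and the sample vectors are assumptions for illustration):

```python
def ccv_distance(ccv_a, ccv_b):
    """L1 distance between two Color Coherence Vectors, where each
    CCV is a list of (C_i, N_i) pairs, one per discretized color."""
    return sum(abs(ca - cb) + abs(na - nb)
               for (ca, na), (cb, nb) in zip(ccv_a, ccv_b))

# Identical vectors give distance 0.
print(ccv_distance([(8, 0), (8, 0), (6, 3)],
                   [(8, 0), (8, 0), (6, 3)]))  # 0

# These two vectors have identical plain color histograms
# (C_i + N_i per color), but the coherence split differs,
# so the CCV distance is nonzero.
print(ccv_distance([(8, 0), (8, 0), (6, 3)],
                   [(8, 0), (4, 4), (9, 0)]))  # 14
```

The second call shows the point of the method: a plain histogram distance would be 0 for those two vectors, while the CCV distance still separates them.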

Let’s take an example to make the algorithm’s steps clear.

Assume the image has 30 colors instead of 16,777,216 colors (256*256*256).

Now we’ll discretize the colors into only three (0:9, 10:19, 20:29).

Assume that our tau is 4.

- For color 0 we have 2 CCs (8 coherent pixels).
- For color 1 we have 1 CC (8 coherent pixels).
- For color 2 we have 2 CCs (6 coherent pixels and 3 incoherent pixels).

So finally our feature vector is <(8, 0), (8, 0), (6, 3)>, one (C, N) pair per discretized color.

**Drawbacks of Color Coherence Vector**

Now we see that the Color Coherence Vector method captures information about the spatial distribution of colors through its coherent components. But the method has some drawbacks. The remaining part of this post will discuss its two main drawbacks.

Coherent pixels in CCV represent pixels that belong to remarkable (big) components in the image. But what if we merged all of these components into a single component? We would have only one component, and the number of its pixels would obviously equal the total number of pixels in the original remarkable components, so the CCV would not change.

To make this clear, look at these pictures, assuming tau equals 8.

Although they are different pictures, they have the same CCV.
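This drawback can be reproduced with a small sketch (the 4x6 images below are hypothetical stand-ins for the post’s missing pictures, and `ccv` is a compact re-implementation of the extraction step, both assumptions for illustration). Image A has one big 16-pixel component of color 0; image B splits those same 16 pixels into two 8-pixel components, each of which still passes tau = 8, so the two CCVs come out identical:

```python
from collections import Counter, deque

def ccv(image, n_colors, tau):
    # Compact CCV extraction over a grid of discretized color indices.
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    coh, inc = Counter(), Counter()
    for r in range(rows):
        for s in range(cols):
            if seen[r][s]:
                continue
            color, size = image[r][s], 0
            q = deque([(r, s)])
            seen[r][s] = True
            while q:
                y, x = q.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and image[ny][nx] == color):
                            seen[ny][nx] = True
                            q.append((ny, nx))
            (coh if size >= tau else inc)[color] += size
    return [(coh[i], inc[i]) for i in range(n_colors)]

# Image A: one 16-pixel component of color 0 beside a color-1 block.
a = [[0, 0, 0, 0, 1, 1]] * 4
# Image B: the same 16 color-0 pixels split into two 8-pixel components.
b = [[0, 0, 1, 1, 0, 0]] * 4

# With tau = 8, both color-0 components in B still count as coherent,
# so the two different images produce the same feature vector.
print(ccv(a, 2, 8))  # [(16, 0), (8, 0)]
print(ccv(b, 2, 8))  # [(16, 0), (8, 0)]
```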

Another problem we may encounter is the positions of these remarkable connected components relative to each other.

These pictures have the same CCV but a different appearance.

There are many solutions to these problems. Most of them add another dimension to the feature vector: the components’ positions relative to each other. This dimension is then used in the comparison to differentiate between pictures that have the same CCV.

Here you’ll find a fast Matlab implementation on GitHub.

SA Tarek. Thanks for this nice summary, and the other ones. Good work indeed!

Concerning the last paragraph, there are two approaches:

1- Build a universal model that captures both color and shape (color distribution) features.

2- Keep the color model simple, and use another model to capture the other features such as shape or texture. Then combine all the models in the distance function.

Most research works use the 2nd alternative. It gives you more flexibility in building the feature vector. Such a decision should, however, take into account the target application. For example, in medical CBIR the spatial distribution of colors is of a great importance. So, it should be incorporated into the model.

Another approach, which is frequently used in medical CBIR is to compute the color features of only the interesting parts in the image, rather than the whole image. These are, traditionally, the pathology bearing regions. This approach requires image segmentation, as a pre-processing.

Hmmm, interesting. Thanks Dr. Mahmoud for your comment 🙂