
Step-by-Step Guide to Spectral Clustering

Spectral clustering is a widely known technique that clusters data by transforming its affinity matrix (or similarity matrix, described below) into another space, usually a lower-dimensional one, where similar instances are grouped in the same region. So instead of looking at the data in the original high-dimensional space, you look at it in a lower-dimensional space that preserves how close your data points are to each other. Clustering with K-means in that space often becomes much more accurate.

So what are the steps to perform this transformation and start clustering?

First we need to define the term affinity matrix. You can simply think of it as an adjacency matrix that represents a graph. In this graph each node is an instance of your data, and each edge carries the similarity between two nodes. The similarity could be any function that expresses how similar two instances (nodes) are. For example, you can derive it from the Euclidean distance, or use the most common choice, the Gaussian kernel.

So finally you get a matrix W where $W_{ij}$ is the similarity between node i and node j.
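As a minimal sketch of building such a matrix with a Gaussian kernel: the function name `gaussian_affinity`, the bandwidth `sigma`, and the choice to zero the diagonal are assumptions made for illustration, not something the post specifies.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Return W with W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops (an assumed convention)
    return W

# Two nearby points and one far away point.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
W = gaussian_affinity(points, sigma=1.0)
```

Nearby points get a similarity close to 1 and distant points a similarity close to 0, so the far away point is almost disconnected from the other two.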

Secondly, we need to create a graph Laplacian matrix. There are several variants used by eigenvector-based techniques, such as:

Unnormalized Laplacian: $L = D - W$

Normalized (symmetric) Laplacian: $L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$

Random walk Laplacian: $L_{rw} = D^{-1} L = I - D^{-1} W$

Where D is an n*n diagonal degree matrix with $D_{ii} = \sum_j W_{ij}$, and W is the affinity matrix. I’d also recommend reading section 8.5 and the Quora answer to understand the effect of the different Laplacians.

Thirdly, we compute the eigenvectors of the Laplacian matrix and keep the k eigenvectors with the largest or smallest eigenvalues, depending on which type of Laplacian matrix we used.

In this post I’ll consider only the unnormalized Laplacian, for which we choose the k eigenvectors with the smallest eigenvalues.

Now let’s look at two very simple examples.

Suppose our data has separable connected components (in other words, its clusters are completely disconnected). Here my affinity matrix (W) is very simple: it contains only zeros and ones.



Now D equals


Laplacian Matrix L = D – W


Finally, these are the eigenvectors


As you can see, in this case choosing only one dimension (the first eigenvector) is good enough, because each entry is either -0.5774 or 0. So we can simply threshold this dimension to get our 2 clusters. The second example is a little more complex.
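The first example can be checked numerically with a small sketch. The affinity matrix below (two fully connected groups of three nodes) is an assumption chosen to match the description, since the original figure is not available. Note that a numerical solver may return any orthonormal basis for the zero eigenvalues, so the entries are not guaranteed to be exactly -0.5774 and 0, but nodes in the same component always get identical rows.

```python
import numpy as np

# Assumed affinity matrix: two fully connected groups of three nodes,
# with no edges between the groups.
block = np.ones((3, 3)) - np.eye(3)
W = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])

D = np.diag(W.sum(axis=1))      # degree matrix
L = D - W                       # unnormalized Laplacian

# eigh returns eigenvalues in ascending order for symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(L)

# The number of (numerically) zero eigenvalues equals the number of
# connected components.
num_components = int(np.sum(np.isclose(eigvals, 0.0, atol=1e-9)))
null_space = eigvecs[:, :num_components]
```

Each row of `null_space` is the low-dimensional representation of one node, and running k-means on these rows separates the two groups.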

In the second example we have the same graph adjacency matrix but we’ll add a single connection between the two separated clusters. So now they’re not separated anymore.


Now D equals


Laplacian Matrix = D – W



These are the eigenvectors


As you can see, every entry of the first eigenvector has the same absolute value, 0.4082 (which is 1/sqrt(6)). This value depends only on the number of nodes (6 in this case): when the graph is connected, that is, when it has no separable components, the eigenvector for eigenvalue zero is constant.

The second eigenvector is the interesting one in our case. As in the first example, we can threshold this vector to cluster our data, so we get 3 negative numbers and 3 positive numbers. If we rearrange our weight matrix according to this eigenvector (so that the nodes of each cluster are placed next to each other), we get


There’s an interesting observation here. The data points separated by the eigenvector have almost the same structure. What’s happening is that the eigenvectors reduce the Laplacian matrix to sets of nodes that are strongly related to each other, like the first three columns and the last three columns.
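Here is a small sketch of the second example. The graph is an assumption: the same two groups of three nodes, with a single bridging edge placed between nodes 2 and 3 (the post does not say which nodes were connected). It thresholds the second eigenvector at zero and reorders W accordingly.

```python
import numpy as np

block = np.ones((3, 3)) - np.eye(3)
W = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])
W[2, 3] = W[3, 2] = 1.0         # assumed bridge between the two groups

L = np.diag(W.sum(axis=1)) - W  # unnormalized Laplacian
eigvals, eigvecs = np.linalg.eigh(L)

first = eigvecs[:, 0]           # constant vector, entries +-1/sqrt(6)
second = eigvecs[:, 1]          # the eigenvector we threshold
labels = (second > 0).astype(int)

# Reorder W so that nodes of the same cluster sit next to each other.
order = np.argsort(second)
W_sorted = W[np.ix_(order, order)]
```

The sign of an eigenvector is arbitrary, so which group gets label 0 or 1 may differ between runs or libraries. Only the grouping itself is meaningful.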

Now some notes on how to get the clusters from your eigenvectors. The best-known and easiest way is to run k-means on the k selected eigenvectors.

Another idea is based on the recursive best cut. As in the last example, we choose the second eigenvector. Now sort its entries and cut the sorted list to form two clusters, using this equation to decide how many elements to take from the beginning.


Recursively divide each cluster into two clusters in the same way. The stopping condition is when you find no more good clusters.

The last idea, which is the recommended one, is to choose the k dimensions based on the eigenvalues. Sort the eigenvalues (neglecting any eigenvalue equal to zero), then choose the number k such that the first k eigenvalues are all very small but eigenvalue number k+1 is relatively large.

For example in our second case, the eigenvalues are

Here we can see that only the second eigenvector is good (Remember that we neglect any zero eigenvalue).
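A rough sketch of this eigenvalue-based choice of k follows. Picking the single largest gap is a simplification of "small, then relatively large", and the helper name `choose_k_by_eigengap`, the tolerance `tol`, and the example graph (two triangles joined by one assumed edge) are illustrative assumptions.

```python
import numpy as np

def choose_k_by_eigengap(eigvals, tol=1e-9):
    """Return k so that the k smallest non-zero eigenvalues sit before the largest gap."""
    vals = np.sort(np.asarray(eigvals, dtype=float))
    vals = vals[vals > tol]            # neglect zero eigenvalues
    if vals.size < 2:
        return int(vals.size)          # nothing to compare against
    gaps = np.diff(vals)               # gaps[i] = vals[i + 1] - vals[i]
    return int(np.argmax(gaps)) + 1

# Assumed graph: two triangles joined by a single edge.
block = np.ones((3, 3)) - np.eye(3)
W = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])
W[2, 3] = W[3, 2] = 1.0
L = np.diag(W.sum(axis=1)) - W

eigvals = np.linalg.eigvalsh(L)        # ascending eigenvalues
k = choose_k_by_eigengap(eigvals)
```

For this graph there is one small non-zero eigenvalue followed by a clear jump, so a single eigenvector (the second one overall) is kept.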

Robot motion planning based on Voronoi


  • Voronoi
  • Why Voronoi
  • Motion Planning


1)    Defining an obstacle

  • Black and white problem.
  • Color gradient.

2)    Detecting all obstacle’s edges

  • Gradient Magnitude.

3)    Discretizing all continuous edges into points

  • Start & end nodes problem.
  • How?!
  • Boundary points problem.

4)    Voronoi Part

  • Saving and reading from files.

5)    Deleting non-safe paths (or edges)

6)    Dijkstra

  • Start & end nodes problem.

7)    Delete all other edges

8)    Decimation

  • Robot Commands.


  • Voronoi: From Wikipedia: in mathematics, a Voronoi diagram is a way of dividing space into a number of regions. A set of points (called seeds, sites, or generators) is specified beforehand, and for each seed there is a corresponding region consisting of all points closer to that seed than to any other (as shown in this figure).


  • Why: Our goal is to use this diagram to plan the safest route for a robot to go from one location to another. As shown in the figure, the edges are as far as possible from all obstacles (points).
  • Motion Planning: This part of the project is about converting an image with obstacles into input for the Voronoi diagram, and then calculating the safest shortest path for the robot.


1)    Defining the obstacles: After taking the input from the user (an image containing obstacles), we want to define the obstacles that the robot should avoid. (Note that we only need to know the obstacles; we don’t search for the possible paths, because everything else is considered passable.)

  • The first problem is the color of the obstacle. In the first phase we assumed a black and white image (where black means an obstacle).
  • Then we decide whether each pixel counts as black or not using the color distance (Euclidean distance): if the distance between a pixel’s color and black is <= 200, the pixel is considered an obstacle (200 is the default value; it can be changed by the user).
  • After that we generalized the obstacle color so that the user can change it (controlled by the same color distance metric).
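The color distance test can be sketched as follows. The project itself was written in C# and C++, so this Python version and the name `obstacle_mask` are only illustrative assumptions. The default threshold of 200 and the default obstacle color (black) come from the description above.

```python
import numpy as np

def obstacle_mask(image, obstacle_color=(0, 0, 0), threshold=200.0):
    """Return a boolean mask that is True where a pixel counts as an obstacle."""
    pixels = np.asarray(image, dtype=float)
    target = np.asarray(obstacle_color, dtype=float)
    # Euclidean distance in RGB space between each pixel and the obstacle color.
    distance = np.sqrt(np.sum((pixels - target) ** 2, axis=-1))
    return distance <= threshold

# A tiny 2 x 2 RGB image: black, white, mid gray, dark brown.
image = np.array([[[0, 0, 0], [255, 255, 255]],
                  [[100, 100, 100], [30, 20, 10]]], dtype=np.uint8)
mask = obstacle_mask(image)
```

Only the white pixel is farther than 200 from black here, so it is the only passable one.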


2)    Detecting obstacles’ edges: Now that we know the obstacles, we need to convert them to points in order to construct the Voronoi diagram. Our approach is to convert the obstacles into edges and then convert these edges into points that represent the shape of the obstacles.

  • We used the gradient magnitude algorithm to convert obstacles to edges.
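One possible way to get edges from the gradient magnitude is sketched below. The post does not say which gradient operator was used, so the central differences computed by `np.gradient` and the function name `gradient_magnitude_edges` are assumptions.

```python
import numpy as np

def gradient_magnitude_edges(mask):
    """Mark pixels where the obstacle mask changes value in x or y."""
    values = np.asarray(mask, dtype=float)
    gy, gx = np.gradient(values)       # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    return magnitude > 0

# A 3 x 3 square obstacle in the middle of a 7 x 7 image.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
edges = gradient_magnitude_edges(mask)
```

The interior of the obstacle and the empty background have zero gradient, so only pixels around the obstacle border are marked.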


3)    Discretizing all edges into points: Here we need to discretize all edges into points.

  • Before discussing this problem, there’s an important note: the start and end positions of the robot should also be represented as points. The reason will become clear in the next steps.
  • The problem in this step is that the distance between any two neighboring points must be less than the robot’s size, to prevent the algorithm from accepting a path that goes through an obstacle.
  • How: our approach is to keep a 2D matrix representing the edges from step 2. We iterate over each point and erase any point near it with distance <= robot size. (Look at the figure.)
  • Boundary points: The borders of the image are also considered obstacles, so we discretized them with the same technique used for the other points.
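The erasing step can be sketched as a greedy thinning pass. The name `thin_points` and the parameter `min_spacing` are assumptions, and the spacing would have to be chosen so that the remaining gaps satisfy the robot-size requirement described above.

```python
import math

def thin_points(points, min_spacing):
    """Greedily keep points so that no kept point is within min_spacing of another."""
    kept = []
    for p in points:
        # Keep p only if it is farther than min_spacing from every kept point.
        if all(math.hypot(p[0] - q[0], p[1] - q[1]) > min_spacing for q in kept):
            kept.append(p)
    return kept

# A straight edge with one point per pixel.
edge_pixels = [(x, 0) for x in range(10)]
samples = thin_points(edge_pixels, min_spacing=2)
```

This simple version compares every point with every kept point, which is fine for a sketch but would be slow on large images.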


4)    Voronoi Part: Now the input is organized and ready for constructing the Voronoi diagram.

  • The Voronoi diagram was implemented in C++ (to be more efficient).
  • We save the size of the image and the points in a file, then execute a C++ exe file which reads the input and saves the output in another file.
  • After that the main program (C#) reads the output and deletes the input, output and exe files.


5)    Deleting non-safe paths (or edges): After receiving the output from the C++ Voronoi program, there are some edges that the robot cannot use, as shown in the figure.

  • This is done by iterating over each point and deleting every edge whose distance to that point is <= robot size.


6)    Dijkstra: Now everything is ready to compute the shortest (safest) path. In this stage we run Dijkstra’s algorithm over all safe edges to find the shortest path from the start node to the end node.

  • Note: we connected the start & end points with all points in their convex polygon.
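A minimal sketch of this stage, assuming the safe Voronoi edges are given as pairs of point indices and weighted by their Euclidean length. The function name `dijkstra` and the data layout are illustrative, not the project’s actual code.

```python
import heapq
import math

def dijkstra(points, edges, start, goal):
    """points: list of (x, y); edges: list of (i, j) index pairs.
    Returns the shortest path as a list of point indices, or None."""
    adjacency = {i: [] for i in range(len(points))}
    for i, j in edges:
        w = math.dist(points[i], points[j])   # edge length
        adjacency[i].append((j, w))
        adjacency[j].append((i, w))

    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue                          # stale queue entry
        for v, w in adjacency[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if goal not in dist:
        return None                           # goal is unreachable
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

points = [(0, 0), (1, 0), (2, 0), (1, 5)]
edges = [(0, 1), (1, 2), (0, 3), (3, 2)]
path = dijkstra(points, edges, start=0, goal=2)
```

Because the unsafe edges were already removed in step 5, the shortest path over the remaining edges is also a safe one.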


7)    Delete all other edges: After calculating the final path, we discard all other unused edges.


8)    Decimation (optional step): Finally we downsample the final path so that it can be sent to the robot as commands (to reduce the number of commands).

  • Each command describes the distance and orientation that the robot should take.
  • We make a straight line segment every 5 units along the x-axis (this is the default value; it can be changed by the user).
  • Details are shown in the example.
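One way the decimation could look is sketched below. The exact rule used in the project is not given, so keeping a point whenever x has moved by at least `step` units, and expressing each command as a (distance, heading in degrees) pair, are assumptions.

```python
import math

def decimate(path, step=5.0):
    """Keep the first and last points, plus a point every `step` units along x."""
    kept = [path[0]]
    for p in path[1:-1]:
        if abs(p[0] - kept[-1][0]) >= step:
            kept.append(p)
    if len(path) > 1:
        kept.append(path[-1])
    return kept

def to_commands(path):
    """Turn consecutive points into (distance, heading in degrees) pairs."""
    commands = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        distance = math.hypot(x1 - x0, y1 - y0)
        heading = math.degrees(math.atan2(y1 - y0, x1 - x0))
        commands.append((distance, heading))
    return commands

path = [(float(x), 0.0) for x in range(13)]   # 13 points along the x-axis
short = decimate(path, step=5.0)
commands = to_commands(short)
```

The 13 original points shrink to 4, which gives the robot 3 commands instead of 12.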


Source code can be found here:

For the executable version click here

Image Retrieval: Color Coherence Vector

It’s recommended to have a look at this post first. It’s an introduction to Image Retrieval, and some of its terms and expressions are used in this post.

In the last post we talked about two common color descriptors, Global Color Histogram (GCH) and Local Color Histogram (LCH). We discussed the main problem of GCH, which is that it has no information about color spatial distribution. After that we discussed an attempt to solve this problem using LCH. Finally we showed some drawbacks of LCH.

Color descriptors are used to differentiate between images and compute their similarities by describing their colors.

Now we’ll discuss one of the most efficient color descriptors that contains information about color spatial distribution, called the Color Coherence Vector (CCV).

Color Coherence Vector

Color Coherence Vector (CCV) is a more complex method than Color Histogram. It classifies each pixel as either coherent or incoherent. A coherent pixel is part of a big connected component (CC) of the same color, while an incoherent pixel is part of a small connected component. Of course, we first have to define the criterion for deciding whether a connected component is big or not.

Feature extraction algorithm

1. Blur the image (by replacing each pixel’s value with the average value of the 8 adjacent pixels surrounding that pixel).
2. Discretize the color space (the image’s colors) into n distinct colors.
3. Classify each pixel as either coherent or incoherent. This is computed as follows:

  • Find the connected components for each discretized color.
  • Determine tau’s value (tau is a user-specified value, normally about 1% of the image’s size).
  • If a connected component has a number of pixels greater than or equal to tau, its pixels are considered coherent; otherwise they are incoherent.

4. For each color compute two values (C and N).

  • C is the number of coherent pixels.
  • N is the number of incoherent pixels.

It’s clear that the sum of C and N over all colors equals the number of pixels.
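Steps 3 and 4 can be sketched as below for an image that has already been blurred and discretized. The use of 8-connectivity, the function name `ccv`, and the small example image are assumptions (the post’s own example image is not available).

```python
from collections import deque
import numpy as np

def ccv(labels, n_colors, tau):
    """labels: 2D int array of discretized colors. Returns [(C, N), ...] per color."""
    labels = np.asarray(labels)
    h, w = labels.shape
    seen = np.zeros((h, w), dtype=bool)
    coherent = [0] * n_colors
    incoherent = [0] * n_colors
    for r in range(h):
        for c in range(w):
            if seen[r, c]:
                continue
            color = int(labels[r, c])
            # Breadth-first search over the 8-connected component of this color.
            size = 0
            queue = deque([(r, c)])
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny, nx]
                                and labels[ny, nx] == color):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            # Components with at least tau pixels are coherent.
            if size >= tau:
                coherent[color] += size
            else:
                incoherent[color] += size
    return list(zip(coherent, incoherent))

labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 1, 1, 0],
                   [2, 2, 0, 0]])
feature = ccv(labels, n_colors=3, tau=4)
```

In this small image color 0 has one coherent component of 4 pixels and one incoherent component of 3 pixels, color 1 has a single coherent component of 6 pixels, and color 2 has only an incoherent component of 3 pixels.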

Matching function

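To compare 2 images a, b:
Ci : number of coherent pixels in color i.
Ni : number of incoherent pixels in color i.

The distance used in the original CCV paper sums the absolute differences of both counts over all n colors:

$D(a, b) = \sum_{i=1}^{n} \left( |C_i^a - C_i^b| + |N_i^a - N_i^b| \right)$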
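As a small sketch, a matching function over two CCVs can be written as a sum of absolute differences of the coherent and incoherent counts, which is the distance proposed in the original CCV paper. The function name `ccv_distance` and the list-of-pairs layout are assumptions.

```python
def ccv_distance(ccv_a, ccv_b):
    """Sum of |Ci_a - Ci_b| + |Ni_a - Ni_b| over all colors i."""
    return sum(abs(ca - cb) + abs(na - nb)
               for (ca, na), (cb, nb) in zip(ccv_a, ccv_b))

# Each entry is (C, N) for one discretized color.
image_a = [(4, 3), (6, 0), (0, 3)]
image_b = [(7, 0), (6, 0), (0, 3)]
distance = ccv_distance(image_a, image_b)
```

The two images have the same number of pixels of every color, so their plain color histograms are identical, but the CCV distance is non-zero because the pixels of color 0 are split differently between coherent and incoherent.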


Let’s take this example to make the algorithm’s steps clear.
Assume that the image has 30 colors instead of 16777216 colors (256*256*256).


Now we’ll discretize the colors into only three colors (0:9, 10:19, 20:29).


Assuming that our tau is 4
For color 0 we have 2 CC (8 coherent pixels)
For color 1 we have 1 CC (8 coherent pixels)
For color 2 we have 2 CC (6 coherent pixels and 3 incoherent pixels)
So finally our feature vector is


Drawbacks of Color Coherence Vector

Now we see that the Color Coherence Vector method takes into account information about the spatial distribution of colors through its coherent components. But this method has some drawbacks. The remaining part of this post discusses the two main ones.

Coherent pixels in CCV represent the pixels that are inside sizable components of the image. But what if we combined all of these components into one component? We would have only one component, and its number of pixels would obviously equal the total number of pixels in the original components.

To make it clear, look at these pictures, assuming tau equals 8.


Although they are different pictures, they have the same CCV.
Another problem we may encounter is the position of these components relative to each other.

These pictures have the same CCV but a different appearance.


There are many solutions to these problems. Most of them add another dimension to the feature vector, which is each component’s position relative to the others. This dimension is then used in the comparison to differentiate between pictures that have the same CCV.

Here you’ll find a fast Matlab implementation on Github.

Image Retrieval: Global and Local Color Histogram

In the previous post we talked about Image Retrieval and Image Descriptors. Now we will introduce one of the most common and important descriptors that doesn’t include information about color spatial distribution: the Color Histogram.

Color Histogram

A Color Histogram is a representation of the distribution of colors in an image (from Wikipedia).
A color histogram represents the image from another perspective: it counts similar pixels and stores the counts in bins, in order to describe the number of pixels in each range of colors (or bin) independently.
Note: Color Histogram is a color descriptor, and as we know from the previous post, each descriptor contains a feature extraction algorithm and a matching function.

Color Histogram is divided into:

  • Global Color Histogram (GCH).
  • Local Color Histogram (LCH).

Global Color Histogram

GCH is the best-known color histogram used to detect similar images.
Feature extraction algorithm:

  1. Discretize your color space (the image’s colors) into n colors (you may use just 8*8*8 = 512 colors instead of 256*256*256 = 16777216 colors).
  2. Create a bin for each color.
  3. Count the number of pixels of each color and store it in the histogram’s bins.

Matching function:
The most common matching function for this method is the Euclidean distance.
To compare 2 images A, B:

A(R,G,B) : the number of pixels with color (R,G,B) (for example, A(6,2,4) is the number of pixels whose discretized color is R=6, G=2 and B=4).
D : the Euclidean distance between the two histograms.

$D(A, B) = \sqrt{\sum_{R,G,B} \left( A(R,G,B) - B(R,G,B) \right)^2}$

Remember: the larger the distance value, the less similar the images are.
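The GCH extraction and matching steps can be sketched as follows. Normalizing the histogram by the number of pixels is an assumption (the post does not say whether raw counts or fractions were used), and the function names are illustrative.

```python
import numpy as np

def global_color_histogram(image, bins=8):
    """image: H x W x 3 uint8 array. Returns a normalized histogram of length bins**3."""
    pixels = np.asarray(image).reshape(-1, 3).astype(int)
    q = pixels * bins // 256                  # map 0..255 to bin index 0..bins-1
    index = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]
    hist = np.bincount(index, minlength=bins ** 3).astype(float)
    return hist / hist.sum()                  # fractions instead of raw counts (assumed)

def gch_distance(hist_a, hist_b):
    """Euclidean distance between two histograms."""
    return float(np.sqrt(np.sum((hist_a - hist_b) ** 2)))

black, white = [0, 0, 0], [255, 255, 255]
a = np.array([[black, white], [black, white]], dtype=np.uint8)
b = np.array([[white, black], [white, black]], dtype=np.uint8)   # same colors, moved
c = np.array([[black, black], [black, black]], dtype=np.uint8)

d_ab = gch_distance(global_color_histogram(a), global_color_histogram(b))
d_ac = gch_distance(global_color_histogram(a), global_color_histogram(c))
```

Images `a` and `b` contain the same colors in different positions, so their GCH distance is zero even though they look different.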

Look at this example


Here C has the same color histogram as B, but A is different from them.

Using the Euclidean distance for these color histograms, we find that D(A,C) = D(A,B) and D(B,C) = 0. But there’s a problem here: B and C are not similar at all, so D(B,C) shouldn’t be zero, and D(A,C) should be smaller than D(A,B), because A and C have the same pixels except for only two.

That’s why we say GCH doesn’t include information about color spatial distribution.

There’s an attempt to solve this problem, which is the next part of this post.

Local Color Histogram

LCH includes information about the color distribution in different regions. It’s the same as GCH, but first we divide the image into blocks. Each pair of corresponding blocks (one from the first image and one from the second) is compared separately using GCH. The total distance between the two images is then the sum of all the GCH distances between their blocks.


Feature extraction algorithm:

  1. Split the image into m blocks.
  2. Compute the GCH for each pair of blocks, as shown in the figure.

Matching function:
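To compare 2 images a, b:
All we need to do is sum up the distances computed by GCH for each pair of corresponding blocks.

D : the sum of the Euclidean distances.

$D(a, b) = \sum_{k=1}^{m} D_{GCH}(a_k, b_k)$

where $a_k$ and $b_k$ are the k-th blocks of the two images.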
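A sketch of LCH matching, assuming a `grid` x `grid` split into blocks and per-block normalized histograms. Both choices, as well as the function names, are assumptions for illustration.

```python
import numpy as np

def histogram(block, bins=8):
    """Normalized color histogram of one block (H x W x 3, values 0..255)."""
    q = np.asarray(block).reshape(-1, 3).astype(int) * bins // 256
    index = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]
    hist = np.bincount(index, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def lch_distance(image_a, image_b, grid=2, bins=8):
    """Sum of Euclidean histogram distances over corresponding blocks."""
    a = np.asarray(image_a)
    b = np.asarray(image_b)
    total = 0.0
    # array_split also handles sizes that do not divide evenly by `grid`.
    for rows_a, rows_b in zip(np.array_split(a, grid, axis=0),
                              np.array_split(b, grid, axis=0)):
        for blk_a, blk_b in zip(np.array_split(rows_a, grid, axis=1),
                                np.array_split(rows_b, grid, axis=1)):
            diff = histogram(blk_a, bins) - histogram(blk_b, bins)
            total += float(np.sqrt(np.sum(diff ** 2)))
    return total

black, white = [0, 0, 0], [255, 255, 255]
a = np.array([[black, white], [black, white]], dtype=np.uint8)
rotated = np.rot90(a, 2)                      # the same image rotated by 180 degrees

d_same = lch_distance(a, a)
d_rotated = lch_distance(a, rotated)
```

The rotated copy has exactly the same global histogram as the original, yet its LCH distance is large, because every block now holds the opposite color. The images must be at least `grid` pixels wide and tall so that no block is empty.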


Using LCH, the distances are now more reasonable: D(A,B) = 1.768, D(A,C) = 0.707, D(B,C) = 1.768.

So sometimes LCH is more effective than GCH. But when the image is rotated we may get a very different output.

Look at this example:


In this example the distance between the 2 images using LCH = 0


Here the distance between the 2 images using LCH = 4, although they are the same image and the second one is just rotated. This problem is the main disadvantage of Local Color Histogram.

Image Retrieval

The aim of this post is to talk about:

  1. Image retrieval and its classifications.
  2. Image Descriptors.
  3. Color moment descriptor.

Short overview:

Image retrieval is an old research topic in computer science. It’s about how to retrieve (or search for) images from a database of images by extracting some distinctive features from each image.

Image retrieval is used in image processing and computer vision. One of the most famous applications of this topic is Google’s Search by Image.

Image Retrieval is classified into:

  • Tag-based image retrieval.
  • Content-based image retrieval.

Tag-based image retrieval:
Searching for images relying on the metadata and tags associated with them. This approach depends on human intervention to provide a description of the image content.
Content-based image retrieval (CBIR):
Searching for images relying on their actual content, based on their similarities instead of a textual description (List of CBIR systems).

A CBIR system is divided into:

  • Data insertion, which is responsible for extracting features and information from images.
  • Query processing, which is responsible for retrieving images depending on the specified queries.

Image Descriptors:

An Image Descriptor consists of a feature extraction algorithm and a matching function. (The matching function is a similarity measure used to compare images. With the Euclidean distance, for example, the larger the distance value, the less similar the images are.)

Feature extraction maps the image pixels into the feature space (data insertion).
The matching function compares a given image with the database images (query processing).

Image Descriptors are classified into:

  • Color descriptors.
  • Shape descriptors.
  • Texture descriptors.

For now we will talk about color descriptors.

Color descriptors are divided into two groups:

  • Contains information about color spatial distribution.
  • Doesn’t contain information about color spatial distribution.

Color spatial distribution means that the color descriptor takes into account information about the colors’ positions in the image.
For example, look at these images. Both have a similar color distribution (amount of each color), although they have different appearances.

In the next posts the difference between these groups will become clearer.

The first color descriptor we will talk about is Color Moments.

Color Moments

This method depends on statistical moments such as the mean, variance and skewness.

Feature extraction algorithm:

  1. Separate the 3 color channels of the image (R, G, B).
  2. Compute the mean, variance and skewness of each color channel.

The combination of these moments is a good descriptor for differentiating between images’ color distributions.

Matching function:
To compare 2 images a, b:

Assuming that
r : number of color channels (which in our case is 3: red, green and blue).
Ei : mean of color i.
Vi : variance of color i.
Si : skewness of color i.
D : similarity distance.

$D(a, b) = \sum_{i=1}^{r} \left( W_1 |E_i^a - E_i^b| + W_2 |V_i^a - V_i^b| + W_3 |S_i^a - S_i^b| \right)$

Where W1, W2 and W3 are user-specified weights.
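A sketch of the color moments descriptor and a weighted matching function follows. Taking skewness as the cube root of the third central moment is an assumption (a convention often seen for color moments; the standardized statistical skewness is another option), and the function names and example weights are illustrative.

```python
import numpy as np

def color_moments(image):
    """Return a 3 x 3 array: one row per channel (R, G, B),
    columns are mean, variance and skewness."""
    pixels = np.asarray(image, dtype=float).reshape(-1, 3)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    variance = (centered ** 2).mean(axis=0)
    # Assumed convention: cube root of the third central moment.
    skewness = np.cbrt((centered ** 3).mean(axis=0))
    return np.stack([mean, variance, skewness], axis=1)

def moment_distance(m_a, m_b, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of absolute moment differences over all channels."""
    w = np.asarray(weights, dtype=float)      # (W1, W2, W3)
    return float(np.sum(np.abs(m_a - m_b) * w))

img_a = np.full((2, 2, 3), 10, dtype=np.uint8)   # uniform image, value 10
img_b = np.full((2, 2, 3), 20, dtype=np.uint8)   # uniform image, value 20

d_equal = moment_distance(color_moments(img_a), color_moments(img_b))
d_weighted = moment_distance(color_moments(img_a), color_moments(img_b),
                             weights=(2.0, 1.0, 1.0))
```

For two uniform images only the means differ, so the distance is the mean difference summed over the three channels and scaled by W1.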

In the next posts we will talk about more efficient color descriptors. Stay tuned.