Spectral Clustering is widely known technique used to cluster the data by transforming its affinity matrix (or similarity matrix which will be described later) to another space usually a lower-dimensional space where the similar instance in the data are grouped to a certain portion in the new space. So instead of looking at the data in the high dimensional space, you can look at it in the lower space that preserves the distances between your data. Then clustering using K-means becomes much accurate.
So what are the steps to perform this transformation and start clustering?
First we need to define the term affinity matrix. Simply you can think of it as an adjacency matrix that represents a graph. In this graph each node is an instance of your data. Each edge is a similarity distance between two nodes. Similarity distance could be any function that can express the similarity between two instances (nodes). For example you can use “Euclidian distance” function or the most commonly used “Gaussian Kernel”.
So finally you get a matrix where is the similarity between node i and node j.
Secondly, we need to create a Graph Laplacian Matrix. There’re many different eigenvector-based techniques like:
Random Walk Laplacian
Thirdly, we compute the eigenvectors of the Laplacian Matrix and choose the highest or lowest k eigenvectors depending on what type of Laplacian matrix we used.
In this post I’ll consider only Unnormalized Laplacian and in this type we choose the lowest k eigenvectors.
Now let’s see two a very simple example.
Consider we have data that has separable connected component (in other words its clusters are completely disconnected). Here my Affinity matrix is very simple it contains either zero or one
Now D equals to
Laplacian Matrix L = D – W
Finally, These are the eigenvectors
As you can see, in this case if we only chose 1 Dimension (the first eigenvector) this will be good enough because it’s either -0.5774 or 0. So we can simply threshold this dimension to get our 2 clusters. The second example is a little bit complex.
In the second example we have the same graph adjacency matrix but we’ll add a single connection between the two separated clusters. So now they’re not separated anymore.
Now D equals to
Laplacian Matrix = D – W
These are the eigenvectors
As you can see the first eigenvector has the same absolute value 0.4082 (which is 1/sqrt(6) ). This is a single number related to the number of connected components (which is 6 in this case). And this case happens when the graph does not have separable components.
The second eigenvector is the one that is interesting in our case. It like the first example we can threshold this vector to cluster our data. So we get 3 negative number and 3 positive numbers. If we rearrange our weight matrix according to this eigenvector (so we put the clusters above each other). We get
There’s an interesting observation here. The separated data points (using eigenvectors) have almost the same structure. So what’s happening is that the eigenvectors reduce the Laplacian Matrix into set of nodes that are strongly related to each other like the first three columns and the second three columns.
Now there are some notes about how to get the clusters from your eigenvectors. The famous and easiest way is to use k-mean with top k eigenvectors.
Another idea is based on the recursive best cut. As in the last example we chose the second dimension. Now sort this dimension and cut it to form two clusters using this equation to decide how much elements you should take from its beginning.
Recursively divide each cluster the same way to two clusters. Your stopping condition is when you find no more good clusters.
Last idea that is recommended is to get k dimensions from the eigenvectors based on the eigenvalue. By sorting the eigenvalues (neglect any eigenvalue = zero) then iteratively choose the number k such that all eigenvalues are very small, but the eigenvalue number k+1 is relatively large.
For example in our second case, the eigenvalues are
Here we can see that only the second eigenvector is good (Remember that we neglect any zero eigenvalue).