SCIENTIFIC NEWS AND
INNOVATION FROM ÉTS
Efficient Clustering with SLK - By : Imtiaz Masud Ziko, Éric Granger, Ismail Ben Ayed,

Efficient Clustering with SLK


Imtiaz Masud Ziko
Imtiaz Masud Ziko Author profile
Imtiaz Masud Ziko is a PhD student in Machine Learning in the Department of Systems Engineering at ÉTS.

Éric Granger
Éric Granger Author profile
Éric Granger is a professor in the Systems Engineering Department at ÉTS. His main research interests are machine learning, pattern recognition, computer vision, and adaptive and intelligent systems.

Ismail Ben Ayed
Ismail Ben Ayed Author profile
Ismail Ben Ayed is a professor in the Systems Engineering Department at ÉTS. His work focuses on the design of efficient algorithms for processing, analysis and interpretation of medical images.

Machine learning

Purchased on Istock.com. Copyright.

SUMMARY

In AI, to make machines learn automatically without any given supervision through examples is called unsupervised learning. Cluster analysis is a very common unsupervised learning method which aims to gather similar instances in the same group from a given pool of unlabeled instances. In large scale scenarios, clustering methods can suffer in terms of clustering solutions and scalability issues. We propose an efficient clustering method Scalable Laplacian K-modes (SLK): which combines both the centroid and pairwise clustering objectives in a single formulation and optimizes efficiently and rapidly the distributed algorithm, while being better in clustering tasks than most existing methods.

Organizing Huge Amounts of Data

The advancement of data acquisition technologies through the Internet, digital media, social networks, video surveillance and growing business industries have produced massive amounts of high dimensional data. Manipulation of these huge volumes of data is crucial to build a resourceful model specific to related applications. Thus, the need for automatic data analysis is obvious. Clustering algorithms appeared extensively in literature as a tool to analyze data through extracting similar patterns from data distribution in an unsupervised manner.  These techniques have been used in a wide range of data analysis applications such as business data analysis, market analysis, social network analysis and many other applications related to AI. However, organizing a large amount of highdimensional data in some meaningful clusters is very challenging due to scalability and efficiency issues. For example popular photosharing websites like Facebook or Instagram where the need to handle millions of uploaded photos per day is obvious.

Clustering algorithms to classify images

Images classified according to similar characteristics

Numerous clustering algorithms are proposed in the literature. However, one particular clustering algorithm successful in a specific application may not necessarily succeed in other related applications. We propose Scalable Laplacian K-modes: a clustering method which combines two well-known clustering models in a single objective and provides a smart way of optimizing it for distributed, faster and better clustering solutions than popular existing clustering methods. This research work was accepted as a spotlight article in Neural Information Processing Systems (Neurips), arguably considered the best venue in AI and machine learning.

Scalable Laplacian K-modes (SLK) Clustering

Scalable Laplacian K-modes

As said, Laplacian K-modes combines two models in a single objective: K-modes and the well known Laplacian. K-modes find the mode of each cluster, as in the very popular mean-shift method. The term mode roughly means the most representative sample corresponding to the peak of a distribution. The Laplacian term brings consistency in clusters by imposing same cluster assignments for nearby data samples and thereby helps to find manifold structured clusters such as spiral clusters shown in the following image.

Spiral clusters

 

Although, the model is promising, there is a very challenging optimization problem due to discrete constraints and Laplacian. We propose a concave-convex relaxation of the formulation and an iterative bound optimization method which solves the clustering problem parallel at each iteration. Also our proposed solution provides the representative mode of each cluster as a byproduct in linear time complexity which we call SLK-BO. This makes the algorithm faster than the variant SLK-MS which finds the mode with an iterative algorithm. 

We report comprehensive experiments on various datasets mostly containing images or numeric data. We report Normalized Mutual Information (NMI) and Clustering accuracy (ACC) as performance measures. Higher values mean better performance. We also evaluate the time in seconds to organize the data.   Lower time mean better efficiency.

Clustering results for different databanks

Table: Clustering results as NMI/ACC in the upper half and average elapsed time in seconds (s) in the lower half.

 In the Table, it can be observed that the SLK-MS/SLK-BO method offers much better accuracy and timing. For example, in the MNIST (code) dataset, 95% images were automatically organized correctly according to their similarity in an unsupervised way. Also in terms of timing, SLK is much faster than the other methods particularly for high-dimensional datasets such as LabelMe (GIST) and YTF and scalable for large Reuters dataset. 

Additional Information

More technical details about SLK clustering can be found in the Neurips paper: https://papers.nips.cc/paper/8208-scalable-laplacian-k-modes.pdf. And the python code is available at: https://github.com/imtiazziko/SLK.

Imtiaz Masud Ziko

Author's profile

Imtiaz Masud Ziko is a PhD student in Machine Learning in the Department of Systems Engineering at ÉTS.

Program : Automated Manufacturing Engineering 

Research chair : ETS Research Chair on Artificial Intelligence in Medical Imaging 

Research laboratories : LIVIA – Imaging, Vision and Artificial Intelligence Laboratory 

Author profile

Éric Granger

Author's profile

Éric Granger is a professor in the Systems Engineering Department at ÉTS. His main research interests are machine learning, pattern recognition, computer vision, and adaptive and intelligent systems.

Program : Automated Manufacturing Engineering 

Research laboratories : LiNCS - Cognitive and Semantic Interpretation Engineering Laboratory  LIVIA – Imaging, Vision and Artificial Intelligence Laboratory 

Author profile

Ismail Ben Ayed

Author's profile

Ismail Ben Ayed is a professor in the Systems Engineering Department at ÉTS. His work focuses on the design of efficient algorithms for processing, analysis and interpretation of medical images.

Program : Automated Manufacturing Engineering 

Research chair : ETS Research Chair on Artificial Intelligence in Medical Imaging 

Research laboratories : LIVIA – Imaging, Vision and Artificial Intelligence Laboratory 

Author profile


comments

    Leave a Reply

    Your email address will not be published. Required fields are marked *