K-MEANS CLUSTERING AND ITS REAL USE-CASES IN THE SECURITY DOMAIN!!!

Kalla kruparaju
3 min readSep 26, 2021

K means is one of the most popular Unsupervised Machine Learning Algorithms Used for Solving Classification Problems. K Means segregates the unlabeled data into various groups, called clusters, based on having similar features, common patterns.

What is clustering?

Clustering is one of the most common exploratory data analysis technique used to get an intuition about the structure of the data. It can be defined as the task of identifying subgroups in the data such that data points in the same subgroup (cluster) are very similar while data points in different clusters are very different.
Unlike supervised learning, clustering is considered an unsupervised learning method since we don’t have the ground truth to compare the output of the clustering algorithm to the true labels to evaluate its performance. We only want to try to investigate the structure of the data by grouping the data points into distinct subgroups.

What Is K-Means Algorithm ?
K-means Algorithm is an Iterative algorithm that divides a group of n datasets into k subgroups /clusters based on the similarity and their mean distance from the centroid of that particular subgroup/ formed.Or K-Means clustering is an unsupervised learning algorithm. There is no labeled data for this clustering, unlike in supervised learning. K-Means performs the division of objects into clusters that share similarities and are dissimilar to the objects belonging to another cluster.

The term ‘K’ is a number. You need to tell the system how many clusters you need to create. For example, K = 2 refers to two clusters. There is a way of finding out what is the best or optimum value of K for a given data.

For a better understanding of k-means, let’s take an example from cricket. Imagine you received data on a lot of cricket players from all over the world, which gives information on the runs scored by the player and the wickets taken by them in the last ten matches. Based on this information, we need to group the data into two clusters, namely batsman and bowlers.

Use-case of K-means Clustering in Security Domain

The K-means clustering algorithm is used to find groups which have not been explicitly labeled in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets.

k-means can typically be applied to data that has a smaller number of dimensions, is numeric, and is continuous. think of a scenario in which you want to make groups of similar things from a randomly distributed collection of things k-means is very suitable for such scenarios. some examples

Behavioral segmentation

Segment by purchase history

Segment by activities on application, website, or platform

Define personas based on interests

Create profiles based on activity monitoring

Inventory categorization

Group inventory by sales activity

Group inventory by manufacturing metrics

Sorting sensor measurements

Detect activity types in motion sensors

Group images

Separate audio

Identify groups in health monitoring

Detecting bots or anomalies.

Separate valid activity groups from bots.

Group valid activity to clean up outlier detection

--

--