K-means Clustering and Its Real Use Cases in the Security Domain
What is k-means clustering?
Clustering is the task of dividing a population or set of data points into groups such that the points in one group are more similar to each other than to the points in other groups. In simple words, the aim is to segregate items with similar traits and assign them to clusters.
The goal of the k-means algorithm is to find groups in the data, with the number of groups represented by the variable k. The algorithm works iteratively to assign each data point to one of the k clusters based on the features that are provided. In the reference image below, k = 3: three clusters are identified from the source dataset. K-means is an unsupervised learning method, so it needs no labeled training data. It is sometimes confused with k-nearest neighbors (k-NN), which is a supervised, lazy-learning method; the two are different algorithms.
K-means works best on data that is numeric and continuous and has a relatively small number of dimensions. Think of a scenario in which you want to make groups of similar things from a randomly distributed collection of things; k-means is very suitable for such scenarios.
How does the K-means algorithm work?
The k-means algorithm works in the following steps:
Step-1: Choose the value of K, the number of clusters to be formed.
Step-2: Select K random points to act as the initial centroids.
Step-3: Assign each data point to its nearest centroid, based on its distance from each centroid. This forms the K initial clusters.
Step-4: Recompute the centroid of each cluster as the mean of the points assigned to it.
Step-5: Repeat Step-3, reassigning each data point to the closest of the new centroids.
Step-6: If any point was reassigned, go to Step-4; otherwise go to Step-7.
Step-7: FINISH
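The steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not a production implementation: the function name kmeans, its parameters, and the synthetic demo data are all assumptions made for this example. Initial centroids are drawn from the data points themselves, which is one common way to do Step-2.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means following Steps 1-7 (hypothetical helper for this article)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its nearest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        new_labels = distances.argmin(axis=1)
        # Step 6: stop as soon as no point changes cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Synthetic demo data (an assumption): three blobs of 50 points each.
demo_rng = np.random.default_rng(42)
data = np.vstack([
    demo_rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    demo_rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    demo_rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])
centroids, labels = kmeans(data, k=3)
print(centroids)
```

Because the starting centroids are random, different seeds can give different final clusters, so in practice the algorithm is often run several times and the best result is kept.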
Use cases of k-means clustering in the security domain
1. Identifying crime localities
Given crime data for specific localities in a city, clustering on the category of each crime, the area where it occurred, and the association between the two can give quality insight into the crime-prone areas within a city or locality.
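As a rough sketch of how clustered crime data might be summarized, the snippet below counts incidents per cluster and finds the most frequent category in each one. The incident records and cluster labels are made up for illustration; in a real analysis the labels would come from running k-means on the incident locations.

```python
from collections import Counter, defaultdict

# Hypothetical records: (cluster label from k-means, crime category).
incidents = [
    (0, "theft"), (0, "theft"), (0, "burglary"),
    (1, "assault"),
    (2, "theft"), (2, "vandalism"), (2, "vandalism"), (2, "vandalism"),
]

# Count crime categories inside each cluster.
by_cluster = defaultdict(Counter)
for label, category in incidents:
    by_cluster[label][category] += 1

# For each cluster: (number of incidents, most frequent category).
summary = {
    label: (sum(counts.values()), counts.most_common(1)[0][0])
    for label, counts in by_cluster.items()
}

# The cluster with the most incidents is the candidate crime-prone area.
hotspot = max(summary, key=lambda label: summary[label][0])
print(summary)
print("Busiest cluster:", hotspot)  # cluster 2 in this made-up data
```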
2. Insurance fraud detection
Machine learning plays a critical role in fraud detection, with numerous applications in automobile, healthcare, and insurance fraud. Using historical data on fraudulent claims, new claims can be isolated based on their proximity to clusters that indicate fraudulent patterns. Because insurance fraud can have a multi-million-dollar impact on a company, the ability to detect it is crucial.
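The proximity idea can be sketched as follows. Everything here is hypothetical: the two features, the centroid values, the choice of which cluster counts as fraud-heavy, and the helper name nearest_cluster are assumptions for illustration only.

```python
import numpy as np

# Hypothetical centroids learned from historical claims. Features:
# (claim amount in thousands of dollars, days between policy start and claim).
centroids = np.array([
    [2.0, 400.0],   # cluster 0: small claims on long-standing policies
    [3.5, 250.0],   # cluster 1: moderate claims
    [40.0, 10.0],   # cluster 2: large claims filed soon after policy start
])
# Assumption: analysts found cluster 2 to be dominated by confirmed fraud.
fraud_clusters = {2}

def nearest_cluster(claim, centroids):
    """Return (index of the closest centroid, Euclidean distance to it)."""
    distances = np.linalg.norm(centroids - claim, axis=1)
    idx = int(distances.argmin())
    return idx, float(distances[idx])

new_claims = np.array([[38.0, 14.0], [2.5, 380.0]])
for claim in new_claims:
    cluster, dist = nearest_cluster(claim, centroids)
    status = "REVIEW" if cluster in fraud_clusters else "ok"
    print(f"claim {claim.tolist()} -> cluster {cluster} (distance {dist:.2f}): {status}")
```

The raw features here are on very different scales, so in practice they would usually be standardized before clustering; otherwise the feature with the largest range dominates the distance.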
3. Cyber-profiling criminals
Cyber profiling is the process of collecting data about individuals and groups to identify significant correlations. The idea is derived from criminal profiling, which gives the investigation division information for classifying the types of criminals who were at the crime scene.
So that’s all for today’s technical practical. Thanks for reading.