K-means clustering and its use cases in security

Bhanudas Rane
4 min read · Jul 20, 2021

Hello Guys,
Today we will look at K-means clustering and its applications in security. So let's start !!

✨What is Clustering?
Clustering is the process of dividing the entire data into groups (also known as clusters) based on the patterns in the data. In simple words, the aim is to segregate groups with similar traits and assign them into clusters.

✨Uses of Clustering :-
Marketing: Helping marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs.

Land use: Identifying areas of similar land use in an earth observation database.

Insurance: Identifying groups of motor insurance policy holders with a high average claim cost.

City-planning: Identifying groups of houses according to their house type, value, and geographical location.

Earthquake studies: Observed earthquake epicenters should be clustered along continental faults.

✨K-means Clustering :-
K-means is one of the algorithms used to solve clustering problems. It is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity. The results of the K-means clustering algorithm are:
1) The centroids of the K clusters, which can be used to label new data
2) Labels for the training data (each data point is assigned to a single cluster)
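
To make the first result concrete, here is a minimal sketch of how centroids can label new data: each new point simply gets the label of its nearest centroid. The centroid values, the sample points, and the helper name `nearest_centroid` are made up for illustration; Euclidean distance is assumed.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids, as if produced by an earlier K-means run with K = 2.
centroids = [(1.0, 1.0), (8.0, 9.0)]

# New, unseen data points that we want to label.
new_points = [(0.5, 2.0), (9.0, 8.0)]
labels = [nearest_centroid(p, centroids) for p in new_points]
print(labels)  # one cluster index per new point
```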

Choosing K usually takes some trial and error (e.g. 3, 4, 5). The figure above shows the flow for selecting the value of K. The Elbow Technique is one of the techniques used to determine the value of K. Once we have K's value, the system places that many centroids at random and measures the distance of each data point from these centroids.
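
The Elbow Technique can be sketched roughly like this: run K-means for several values of K, record the within-cluster sum of squared distances (WCSS) for each, and pick the K where the curve stops dropping sharply. This is only a sketch under assumptions: the nine sample points are made up, and to keep the run reproducible it uses a deterministic farthest-point initialization (`init_farthest`, a helper invented here) instead of random centroids.

```python
import math

def init_farthest(points, k):
    """Deterministic start (assumption, not random): begin with the first point,
    then keep adding the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def wcss_for_k(points, k, max_iter=100):
    """Run K-means for one value of K and return the within-cluster sum of squares."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # nothing moved, so we have converged
            break
        labels = new_labels
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

# Made-up sample data: three small, well-separated groups.
points = [(1, 1), (1, 2), (2, 1),
          (8, 8), (8, 9), (9, 8),
          (1, 8), (1, 9), (2, 8)]

wcss = {k: wcss_for_k(points, k) for k in range(1, 6)}
for k, value in wcss.items():
    print(k, round(value, 2))
```

On this toy data the WCSS falls steeply up to K = 3 and only slightly after that, so the "elbow" sits at K = 3. On real data the bend is often less obvious.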

Accordingly, it assigns each point to the centroid from which its distance is minimum, so every data point ends up with the centroid closest to it. This gives us K initial clusters. For the newly formed clusters, it calculates the new centroid position as the mean of the points in each cluster, so the centroid moves away from its randomly allocated position. Once again, the distance of each point is measured from the new centroids. If required, the data points are reassigned to the new centroids, and the mean position (the new centroid) is calculated once again. This cycle of assignment and centroid update repeats until the assignments stop changing.
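
The whole cycle can be written in a few lines of plain Python. This is a minimal sketch, not a production implementation: the function name `kmeans`, the `seed` parameter, and the six sample points are assumptions for illustration, and the initial centroids are picked at random from the data points themselves.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-means loop: assign points, recompute centroids, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids taken from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster, so stop
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Made-up sample data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Which group gets label 0 and which gets label 1 depends on the random start, so only the grouping itself is meaningful, not the label numbers.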

✨Use cases in the Security Domain :-

1) Cyber-Profiling Criminals :-
Cyber-profiling is the process of collecting data from individuals and groups to identify significant correlations. The idea of cyber-profiling is derived from criminal profiling, which gives the investigation division information to classify the types of criminals who were at the crime scene. Results from applying the K-means algorithm to log analysis datasets show that the algorithm can group activity based on the websites that internet users visited. The grouping is divided into three: low, medium, and high visits.
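
As a rough illustration of the low / medium / high grouping, here is a one-dimensional K-means sketch with K = 3 on visit counts. Everything in it is hypothetical: the site names, the counts, and the helper `kmeans_1d`. To keep the result reproducible, the starting centroids are the minimum, median, and maximum counts rather than random values.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """K-means on plain numbers; `centroids` holds the starting positions."""
    centroids = list(centroids)  # work on a copy
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assign each value to the nearest centroid.
        new_labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [v for v, l in zip(values, labels) if l == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels

# Hypothetical visit counts per website, e.g. extracted from proxy logs.
visits = {"site-a": 3, "site-b": 5, "site-c": 4,
          "site-d": 40, "site-e": 45,
          "site-f": 200, "site-g": 220}

counts = list(visits.values())
# Deterministic start (assumption): minimum, median, and maximum count.
start = [min(counts), sorted(counts)[len(counts) // 2], max(counts)]
centroids, labels = kmeans_1d(counts, start)

# The start values are ascending, so cluster 0 is low, 1 is medium, 2 is high here.
names = ["low", "medium", "high"]
groups = {site: names[label] for site, label in zip(visits, labels)}
print(groups)
```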

2) Call Detail Record Analysis
A call detail record (CDR) is the information captured by telecom companies during a customer's call, SMS, and internet activity. This information provides greater insight into the customer's needs when used with customer demographics. You can cluster customer activity over 24 hours using the unsupervised K-means clustering algorithm to understand segments of customers by their hourly usage.

3) Insurance Fraud Detection
Machine learning has a critical role to play in fraud detection and has numerous applications in automobile, healthcare, and insurance fraud detection. Using historical data on fraudulent claims, it is possible to isolate new claims based on their proximity to clusters that indicate fraudulent patterns. Since insurance fraud can potentially have a multi-million dollar impact on a company, the ability to detect fraud is crucial. One example is using clustering in automobile insurance to detect fraud.
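
The "proximity to fraudulent clusters" idea could look something like the sketch below. It assumes K-means has already been run on historical claims and that some clusters were tagged as fraud-heavy afterwards. The cluster names, centroid values, feature choice, and the function `review_needed` are all invented for illustration.

```python
import math

# Hypothetical centroids from a K-means run on historical claims.
# Assumed features, both scaled to the range 0..1:
#   (claim amount, time between policy start and claim)
centroids = {
    "small_late_claims": (0.2, 0.7),
    "medium_claims": (0.5, 0.5),
    "large_early_claims": (0.9, 0.1),
}

# Clusters that contained many confirmed fraud cases in the past (assumption).
fraud_clusters = {"large_early_claims"}

def review_needed(claim, centroids, fraud_clusters):
    """Return (flag, cluster name) for the cluster whose centroid is nearest to the claim."""
    nearest = min(centroids, key=lambda name: math.dist(claim, centroids[name]))
    return nearest in fraud_clusters, nearest

# Two made-up new claims.
print(review_needed((0.85, 0.15), centroids, fraud_clusters))
print(review_needed((0.25, 0.60), centroids, fraud_clusters))
```

A flag here only means the claim resembles past fraudulent patterns and deserves a closer look; it is not proof of fraud.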

These are some of the cases where K-means is used. Hope you enjoyed reading this blog.
🎇Thank You, Bye !!
