Confusion Matrices and Cyber Case

Bhanudas Rane
6 min read · Jun 6, 2021

Hello Learners..

Welcome to my new blog on cyber crime cases and the confusion matrix. Friends, this is a very interesting topic. If you want to know more about the confusion matrix, invest a few minutes here to read on.

We divide the topic into two parts:

  1. Introduction to the confusion matrix and its errors.
  2. Cyber attack cases with the confusion matrix.

Introduction to the Confusion Matrix and Its Errors

Confusion Matrix:

A confusion matrix is a technique for summarizing the performance of a classification algorithm. It is a table that is often used to describe the performance of a classification model or classifier on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.

The figure above shows the confusion matrix. It has four cases:

✔ True Positive (TP): the model predicted True and the actual value is True.

✔ True Negative (TN): the model predicted False and the actual value is False.

✔ False Positive (FP): the model predicted True but the actual value is False.

✔ False Negative (FN): the model predicted False but the actual value is True.
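To make the four cases concrete, here is a minimal Python sketch that counts them from two lists of labels. The function name `confusion_counts` and the sample labels are made up for illustration; they do not come from any dataset discussed in this blog.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (True means positive)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1      # predicted True, actually True
        elif not p and not a:
            tn += 1      # predicted False, actually False
        elif p and not a:
            fp += 1      # predicted True, actually False
        else:
            fn += 1      # predicted False, actually True
    return tp, tn, fp, fn


# Hypothetical labels, for illustration only
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

counts = confusion_counts(actual, predicted)
print(counts)  # (2, 2, 1, 1) -> TP, TN, FP, FN
```

Every prediction falls into exactly one of the four cells, so the four counts always add up to the number of samples.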

Confused? Let us understand this matrix with an example.

Take the example of a cricket match between India and Australia.

True Positive:

You predicted positive, and it turned out to be true. For example, you predicted that India would win the World Cup, and it won.

True Negative:

You predicted negative, and it turned out to be true. You predicted that Australia would not win, and it lost.

False Positive:

You predicted positive, and it turned out to be false. You predicted that Australia would win, but it lost.

False Negative:

You predicted negative, and it turned out to be false. You predicted that India would not win, but it won.

The confusion matrix gives two errors:

From the example above, we can see that False Positive (FP) and False Negative (FN) are the errors in the matrix. They are also called Type I error and Type II error, respectively.

From our confusion matrix, we can calculate five different metrics measuring the validity of our model.

  1. Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)
  2. Misclassification (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)
  3. Precision (true positives / predicted positives) = TP / (TP + FP)
  4. Sensitivity aka Recall (true positives / all actual positives) = TP / (TP + FN)
  5. Specificity (true negatives / all actual negatives) = TN / (TN + FP)
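The five formulas above can be sketched in a few lines of Python. The function name `metrics` and the counts passed to it are hypothetical and chosen only to keep the arithmetic easy to follow. The sketch also assumes that no denominator is zero.

```python
def metrics(tp, tn, fp, fn):
    """Compute the five metrics from the four confusion matrix cells.

    Assumes every denominator is non-zero.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification": (fp + fn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called sensitivity
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, for illustration only
result = metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and misclassification always add up to 1, because every sample is either classified correctly or incorrectly.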

Cyber Attack Cases with the Confusion Matrix

The examples below come from cyber crime cases in Elazığ province, Turkey. Researchers published a paper on them titled “Cyber-attack method and perpetrator prediction using machine learning algorithms”, so let's discuss how they use the confusion matrix to get accuracy, recall, precision and so on. The authors summarize their work as follows:

Cyber-attacks have become one of the biggest problems of the world. They cause serious financial damages to countries and people every day. The increase in cyber-attacks also brings along cyber-crime. The key factors in the fight against crime and criminals are identifying the perpetrators of cyber-crime and understanding the methods of attack. Detecting and avoiding cyber-attacks are difficult tasks.

However, researchers have recently been solving these problems by developing security models and making predictions through artificial intelligence methods. A high number of methods of crime prediction are available in the literature. On the other hand, they suffer from a deficiency in predicting cyber-crime and cyber-attack methods. This problem can be tackled by identifying an attack and the perpetrator of such attack, using actual data. The data include the type of crime, gender of perpetrator, damage and methods of attack. The data can be acquired from the applications of the persons who were exposed to cyber-attacks to the forensic units.

In this paper, we analyze cyber-crimes in two different models with machine-learning methods and predict the effect of the defined features on the detection of the cyber-attack method and the perpetrator. We used eight machine-learning methods in our approach and concluded that their accuracy ratios were close. The Support Vector Machine Linear was found out to be the most successful in the cyber-attack method, with an accuracy rate of 95.02%. In the first model, we could predict the types of attacks that the victims were likely to be exposed to with a high accuracy. The Logistic Regression was the leading method in detecting attackers with an accuracy rate of 65.42%. In the second model, we predicted whether the perpetrators could be identified by comparing their characteristics. Our results have revealed that the probability of cyber-attack decreases as the education and income level of victim increases.

We believe that cyber-crime units will use the proposed model. It will also facilitate the detection of cyber-attacks and make the fight against these attacks easier and more effective.

The study aims to analyze the data collected about incidents correctly, to avoid crimes and to catch the perpetrators. The main subject of this paper is to draw conclusions from the analyzed data and combat crimes based on the outcome. These results will reveal and shed light on the investigations carried out by law enforcement officers and any concealed facts. Based on the information on the victim and the method of the cyber-crime, and whether the perpetrator is identified or not, machine-learning methods may be used to determine if the same perpetrator carried out the cyber-attack. The damages suffered by the victims in cyber incidents in Elazığ province have been discovered over the years through various methods. The sum of monetary damages suffered by each victim in the dataset was obtained by summing over the years. It is thought that the decrease in such incidents, observed especially after 2017, results from deterrence secured by the laws and awareness activities. The amount of economic losses due to cyber-attacks is profoundly serious in Elazığ. The damage mentioned above is enough to show the importance of dealing with cyber security and attack methods.

During the experiment, all algorithms were first trained and tested on the dataset. Accuracy and evaluation criteria were also adopted. Accuracy, precision, recall and F1 score values were obtained by comparing the predicted values with the test data.
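The F1 score has not been defined so far in this blog. It is the harmonic mean of precision and recall, so it can be computed from the same confusion matrix cells. Below is a minimal sketch. The function name `f1_score` and the counts are hypothetical and are not taken from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall.

    Assumes tp > 0 so that no denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not the paper's data)
score = f1_score(tp=40, fp=5, fn=10)
print(round(score, 3))  # 0.842
```

Note that true negatives do not appear in the formula, so F1 focuses on how well the positive class is handled.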

Comparing the models in terms of precision, recall and F1-scores, the best result was also obtained with the SVML algorithm, albeit by a small margin. While LR, SVMK, DT, RF and XGBoost gave results above 92%, their performances were close to each other.

To compute accuracy, precision, recall and F1-score, they use the confusion matrix. By these measures, the SVML algorithm gives the best result, with around 95% accuracy.

That's all for this blog. I hope you now understand the confusion matrix and its uses. Good Day ✨!!

Thank You 😊
