Confusion Matrix and Intrusion Detection System (IDS)

amit soni
4 min read · Jun 6, 2021

In this article, I give a short overview of the confusion matrix and intrusion detection systems, and explain how the confusion matrix is used to evaluate an IDS.

What is an Intrusion Detection System?

Intrusion Detection Systems (IDS) are a popular defensive strategy for protecting entire enterprise networks, hosts in a network, or processes on a host from cyber attacks.

The main task of an IDS is the analysis and classification of network activity, with the aim of discriminating normal network events from intrusions. An intrusion can be software activity, such as malware (e.g., spyware, a virus, or a rootkit), or human activity aimed at the illegitimate exploitation of network resources.
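To make the classification framing concrete, here is a minimal sketch of an IDS-style binary classifier in Python. The feature names, the synthetic data, and the choice of a decision tree are all illustrative assumptions, not part of any real IDS:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical connection features: [duration_s, bytes_sent, failed_logins]
rng = np.random.default_rng(42)
normal = rng.normal(loc=[10.0, 500.0, 0.0], scale=[5.0, 200.0, 0.5], size=(500, 3))
attack = rng.normal(loc=[2.0, 5000.0, 4.0], scale=[1.0, 1000.0, 1.0], size=(500, 3))

X = np.vstack([normal, attack])
y = np.array([0] * 500 + [1] * 500)  # 0 = normal, 1 = intrusion

# Train a simple classifier to discriminate normal events from intrusions
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict([[3.0, 4800.0, 5.0]]))  # -> [1], flagged as an intrusion
```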

What is a Confusion Matrix?

A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.

For a binary classification problem, we would have a 2 x 2 matrix as shown below with 4 values:
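Laid out as text, with actual values in the columns and predicted values in the rows, it looks like this:

                     Actual: Positive    Actual: Negative
Predicted: Positive         TP                  FP
Predicted: Negative         FN                  TN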

Let’s decipher the matrix:

  • The target variable has two values: Positive or Negative
  • The columns represent the actual values of the target variable
  • The rows represent the predicted values of the target variable

Understanding True Positive, True Negative, False Positive, and False Negative in a Confusion Matrix

True Positive (TP)

  • The predicted value matches the actual value
  • The actual value was positive and the model predicted a positive value

True Negative (TN)

  • The predicted value matches the actual value
  • The actual value was negative and the model predicted a negative value

False Positive (FP) — Type 1 error

  • The model’s prediction was incorrect
  • The actual value was negative but the model predicted a positive value
  • Also known as the Type 1 error

False Negative (FN) — Type 2 error

  • The model’s prediction was incorrect
  • The actual value was positive but the model predicted a negative value
  • Also known as the Type 2 error
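As a quick sketch of these four outcomes, here is a small Python helper; the function name and the 0/1 encoding are illustrative assumptions, not from the original definitions:

```python
def outcome(actual: int, predicted: int) -> str:
    """Label one prediction; 1 = positive class, 0 = negative class."""
    if actual == 1 and predicted == 1:
        return "TP"  # correctly predicted positive
    if actual == 0 and predicted == 0:
        return "TN"  # correctly predicted negative
    if actual == 0 and predicted == 1:
        return "FP"  # Type 1 error: negative misclassified as positive
    return "FN"      # Type 2 error: positive misclassified as negative

print(outcome(actual=0, predicted=1))  # -> FP
```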

Let me give you an example to better understand this. Suppose we had a classification dataset with 1000 data points. We fit a classifier on it and get the below confusion matrix:

                     Actual: Positive    Actual: Negative
Predicted: Positive        560 (TP)            60 (FP)
Predicted: Negative         50 (FN)           330 (TN)

Example of a confusion matrix

The different values of the Confusion matrix would be as follows:

  • True Positive (TP) = 560; meaning 560 positive class data points were correctly classified by the model
  • True Negative (TN) = 330; meaning 330 negative class data points were correctly classified by the model
  • False Positive (FP) = 60; meaning 60 negative class data points were incorrectly classified as belonging to the positive class by the model
  • False Negative (FN) = 50; meaning 50 positive class data points were incorrectly classified as belonging to the negative class by the model

This turned out to be a pretty decent classifier for our dataset, given the relatively large number of true positives and true negatives.
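To put a number on “pretty decent”, here is a minimal Python check using the counts above (the variable names are mine):

```python
tp, tn, fp, fn = 560, 330, 60, 50

total = tp + tn + fp + fn
assert total == 1000  # matches the 1000 data points in the example

accuracy = (tp + tn) / total
print(accuracy)  # -> 0.89
```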

Why is the Confusion Matrix used to evaluate an IDS?

The confusion matrix is one of the best-established methods for evaluating an IDS.

It relies on several measurements to determine the performance of the model: the number of records classified correctly and the number of records classified incorrectly.

A confusion matrix for a two-class classifier can therefore be used to evaluate the performance of an IDS. As above, each column of the matrix represents the instances of an actual class, while each row represents the instances of a predicted class; in the IDS setting, the positive class corresponds to attacks and the negative class to normal traffic.

IDSs are typically evaluated using the following standard performance measures:

True Positive Rate (TPR): calculated as the ratio between the number of correctly detected attacks and the total number of attacks. If all intrusions are detected, the TPR is 1, which is extremely rare for a real IDS. TPR is also called the Detection Rate (DR) or Sensitivity. Mathematically:

TPR = TP / (TP + FN)

False Positive Rate (FPR): calculated as the ratio between the number of normal instances incorrectly classified as attacks and the total number of normal instances:

FPR = FP / (FP + TN)

False Negative Rate (FNR): a false negative occurs when the detector fails to identify an anomaly and classifies it as normal. Mathematically:

FNR = FN / (FN + TP) = 1 - TPR

Classification Rate (CR) or Accuracy: the CR measures how accurate the IDS is in detecting normal or anomalous traffic behavior. It is the fraction of correctly predicted instances among all instances:

CR = (TP + TN) / (TP + TN + FP + FN)
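Putting the four measures together, here is a short sketch that computes them from confusion-matrix counts, applied to the example matrix above (the function and variable names are mine, purely for illustration):

```python
def ids_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard IDS performance measures from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),                   # detection rate / sensitivity
        "FPR": fp / (fp + tn),                   # false alarms on normal traffic
        "FNR": fn / (fn + tp),                   # missed attacks
        "CR": (tp + tn) / (tp + tn + fp + fn),   # accuracy
    }

# Counts from the example above: TP=560, TN=330, FP=60, FN=50
print(ids_metrics(560, 330, 60, 50))
# -> {'TPR': 0.918, 'FPR': 0.1538, 'FNR': 0.0819, 'CR': 0.89} (approx.)
```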

References:

https://www.sciencedirect.com/science/article/pii/S1319157817304287?via%3Dihub

https://cybersecurity.springeropen.com/articles/10.1186/s42400-019-0038-7
