What is a Confusion Matrix?

A confusion matrix is a performance evaluation tool for classification models. It summarizes prediction results, showing how well the model predicts each class and where it makes mistakes. This is especially useful for binary and multi-class classification problems.


Structure of a Confusion Matrix

For binary classification, the confusion matrix is a 2×2 table:

                 | Predicted: Positive | Predicted: Negative
Actual: Positive | True Positive (TP)  | False Negative (FN)
Actual: Negative | False Positive (FP) | True Negative (TN)
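
For readers who want to compute one in code, here is a minimal sketch using scikit-learn; the y_true and y_pred values are made up for illustration. Note that with labels=[0, 1] scikit-learn puts the negative class first, so its layout is [[TN, FP], [FN, TP]] rather than the positive-first layout in the table above.

```python
# A minimal sketch, assuming scikit-learn is installed; labels are hypothetical.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual classes (1 = positive, 0 = negative)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

# Rows = actual class, columns = predicted class, ordered as labels=[0, 1]:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
# [[3 1]
#  [1 3]]
```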

Explaining the Terms

1. True Positive (TP)

Definition: The model correctly identifies a positive case.
Example: A test for a disease correctly identifies a sick patient as positive.
Significance: Indicates successful prediction for the positive class.

2. True Negative (TN)

Definition: The model correctly identifies a negative case.
Example: A test correctly identifies a healthy patient as negative.
Significance: Reflects accurate prediction for the negative class.

3. False Positive (FP)

Definition: The model incorrectly identifies a negative case as positive.
Example: A test incorrectly identifies a healthy patient as sick.
Significance: Represents “false alarms,” which can lead to unnecessary actions.

4. False Negative (FN)

Definition: The model incorrectly identifies a positive case as negative.
Example: A test fails to detect a disease in a sick patient.
Significance: Indicates “missed detections,” which can have serious consequences in critical applications.
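
The four counts can also be tallied directly from (actual, predicted) pairs, which mirrors the definitions above. The label lists below are hypothetical, with 1 = positive and 0 = negative.

```python
# Tally TP, TN, FP, FN by comparing each actual label with its prediction.
# The label lists are hypothetical (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)  # correctly flagged positives
tn = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 0)  # correctly flagged negatives
fp = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 1)  # false alarms
fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)  # missed detections
print(tp, tn, fp, fn)  # 3 3 1 1
```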


Example of a Confusion Matrix

Imagine a model designed to detect spam emails. Here’s the confusion matrix based on 100 emails:

                 | Predicted: Spam | Predicted: Not Spam
Actual: Spam     | 40 (TP)         | 10 (FN)
Actual: Not Spam | 5 (FP)          | 45 (TN)
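
As a small sketch, the same table can be stored as a 2×2 array (rows = actual, columns = predicted), which makes the totals easy to check:

```python
import numpy as np

# The spam example's counts; rows = actual class, columns = predicted class.
cm = np.array([[40, 10],   # actual spam:     40 TP, 10 FN
               [ 5, 45]])  # actual not spam:  5 FP, 45 TN
print(cm.sum())  # 100 emails in total
```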

Metrics Derived from a Confusion Matrix

Using the values in the confusion matrix, we can calculate important metrics:

1. Accuracy

Measures the overall correctness of the model.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Example:
Accuracy = (40 + 45) / (40 + 45 + 5 + 10) = 0.85 (85%)
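
The same arithmetic in Python, using the spam example's counts:

```python
# Accuracy from the spam example's counts.
tp, tn, fp, fn = 40, 45, 5, 10
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.85
```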

2. Precision

Focuses on how many predicted positives were correct.
Precision = TP / (TP + FP)
Example:
Precision = 40 / (40 + 5) ≈ 0.89 (89%)
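
In code, again using the spam example's counts:

```python
# Precision from the spam example's counts.
tp, fp = 40, 5
precision = tp / (tp + fp)
print(round(precision, 2))  # 0.89
```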

3. Recall (Sensitivity)

Measures how many actual positives were correctly identified.
Recall = TP / (TP + FN)
Example:
Recall = 40 / (40 + 10) = 0.80 (80%)
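
And the corresponding recall calculation:

```python
# Recall from the spam example's counts.
tp, fn = 40, 10
recall = tp / (tp + fn)
print(recall)  # 0.8
```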

4. F1-Score

Balances precision and recall using their harmonic mean.
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
Example:
F1-Score = 2 * (0.89 * 0.80) / (0.89 + 0.80) ≈ 0.84 (84%)
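
Computing F1 from the unrounded precision and recall gives essentially the same result:

```python
# F1-score as the harmonic mean of precision and recall (spam example's counts).
tp, fp, fn = 40, 5, 10
precision = tp / (tp + fp)                           # ≈ 0.889
recall = tp / (tp + fn)                              # 0.8
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.84
```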


Why is a Confusion Matrix Important?

A confusion matrix provides a detailed view of model performance, highlighting not just overall accuracy but also the types of errors the model makes. This allows practitioners to:

  • Understand the trade-offs between precision and recall.
  • Identify issues like class imbalance or bias.
  • Optimize the model to reduce specific errors (e.g., false positives or false negatives).