Accuracy of a Model
In machine learning, accuracy is a metric that evaluates how often a classification model predicts outcomes correctly. It is the proportion of correct predictions (both positive and negative) among all predictions made by the model.
Definition of Accuracy
Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions. Mathematically, it is expressed as:
Accuracy Formula:
\(\text{Accuracy} = \dfrac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}\)
- TP: True Positives (correctly predicted positive cases)
- TN: True Negatives (correctly predicted negative cases)
- FP: False Positives (incorrectly predicted positive cases)
- FN: False Negatives (incorrectly predicted negative cases)
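As a minimal sketch, the formula can be written as a small Python function. The function name `accuracy`, its parameter names, and the sample counts are illustrative assumptions, not part of any library or of the examples in this article.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        # With no predictions at all, the ratio is undefined.
        raise ValueError("accuracy is undefined when there are no predictions")
    return (tp + tn) / total


# Hypothetical counts chosen only for illustration.
print(accuracy(tp=30, tn=50, fp=10, fn=10))  # 80 correct out of 100 -> 0.8
```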
Importance of Accuracy
Accuracy is a straightforward and intuitive metric for understanding a model’s performance. However, it is most suitable for datasets with balanced classes. In imbalanced datasets, accuracy can be misleading: a model that always predicts the majority class scores high even though it never identifies the minority class. Accuracy also weights both error types (FP and FN) equally, so it cannot show which kind of error the model makes.
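To illustrate the imbalance problem, consider a hypothetical dataset (the numbers are invented for this sketch) with 990 negative cases and 10 positive cases, and a trivial model that always predicts "negative". Recall, the share of actual positives the model finds, is shown next to accuracy for contrast.

```python
# Hypothetical imbalanced dataset: 990 negatives, 10 positives.
# A trivial model that always predicts "negative" yields these counts:
tp, fn = 0, 10    # every actual positive is missed
tn, fp = 990, 0   # every actual negative is predicted correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)  # share of actual positives that were found

print(f"accuracy = {accuracy:.2%}")  # accuracy = 99.00%
print(f"recall   = {recall:.2%}")    # recall   = 0.00%
```

The model looks nearly perfect by accuracy alone, yet it detects none of the positive cases.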
Components in a Confusion Matrix
The confusion matrix shows how predictions are distributed among the actual classes:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
The components (TP, TN, FP, FN) are used in the accuracy formula to compute the proportion of correct predictions.
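The four components can be counted directly from paired lists of actual and predicted labels. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 0:
            tn += 1
        elif a == 0 and p == 1:
            fp += 1
        else:
            # Remaining case under the 0/1 assumption: actual 1, predicted 0.
            fn += 1
    return tp, tn, fp, fn


# Hypothetical labels for eight cases.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)           # 3 3 1 1
print((tp + tn) / len(actual))  # 0.75
```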
Detailed Examples with Steps to Calculate Accuracy
Below are six examples that illustrate model accuracy in real-world settings. Each example includes a scenario, the confusion matrix counts, and the steps to calculate the accuracy.
Example 1 – Accuracy of Medical Diagnosis Model
Scenario: A model predicts whether patients have a specific disease.
- Total patients: 200
- TP (correctly diagnosed positive): 50
- TN (correctly diagnosed negative): 120
- FP (false positives): 10
- FN (false negatives): 20
Steps:
- Compute total correct predictions: TP + TN = 50 + 120 = 170.
- Compute total predictions: TP + TN + FP + FN = 200.
- Calculate accuracy: Accuracy = 170 / 200 = 0.85 (85%).
Example 2 – Accuracy of Spam Email Detection Model
Scenario: A model predicts whether an email is spam.
- Total emails: 500
- TP: 120
- TN: 350
- FP: 10
- FN: 20
Steps:
- Correct predictions: TP + TN = 120 + 350 = 470.
- Total predictions: 500.
- Accuracy: Accuracy = 470 / 500 = 0.94 (94%).
Example 3 – Accuracy of Fraud Detection Model
Scenario: A model predicts whether a transaction is fraudulent.
- Total transactions: 1,000
- TP: 80
- TN: 850
- FP: 20
- FN: 50
Steps:
- Correct predictions: TP + TN = 80 + 850 = 930.
- Total predictions: 1,000.
- Accuracy: Accuracy = 930 / 1,000 = 0.93 (93%).
Example 4 – Accuracy of Cancer Detection Model
Scenario: A model predicts whether patients have cancer.
- Total patients: 500
- TP: 40
- TN: 420
- FP: 20
- FN: 20
Steps:
- Correct predictions: TP + TN = 40 + 420 = 460.
- Total predictions: 500.
- Accuracy: Accuracy = 460 / 500 = 0.92 (92%).
Example 5 – Accuracy of Loan Default Prediction Model
Scenario: A model predicts whether a customer will default on a loan.
- Total customers: 800
- TP: 100
- TN: 650
- FP: 30
- FN: 20
Steps:
- Correct predictions: TP + TN = 100 + 650 = 750.
- Total predictions: 800.
- Accuracy: Accuracy = 750 / 800 = 0.9375 (93.75%).
Example 6 – Accuracy of Defect Detection Model
Scenario: A model predicts whether a product is defective.
- Total products: 1,000
- TP: 50
- TN: 900
- FP: 20
- FN: 30
Steps:
- Correct predictions: TP + TN = 50 + 900 = 950.
- Total predictions: 1,000.
- Accuracy: Accuracy = 950 / 1,000 = 0.95 (95%).
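The arithmetic in the six examples can be checked in one pass. This is only a sketch: the dictionary layout and the short labels are illustrative choices, while the counts are taken from the examples above.

```python
# (TP, TN, FP, FN) for the six examples above.
examples = {
    "Medical diagnosis": (50, 120, 10, 20),
    "Spam detection": (120, 350, 10, 20),
    "Fraud detection": (80, 850, 20, 50),
    "Cancer detection": (40, 420, 20, 20),
    "Loan default": (100, 650, 30, 20),
    "Defect detection": (50, 900, 20, 30),
}

results = {}
for name, (tp, tn, fp, fn) in examples.items():
    total = tp + tn + fp + fn          # should match the stated dataset size
    results[name] = (tp + tn) / total  # accuracy = correct / total
    print(f"{name:<18} total={total:>5}  accuracy={results[name]:.2%}")
```

The printed accuracies are 85.00%, 94.00%, 93.00%, 92.00%, 93.75%, and 95.00%, and each total matches the dataset size stated in its example.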
Conclusion
Model accuracy is an essential performance metric, particularly in balanced datasets where all classes are equally significant. It provides a clear and intuitive measure of a model’s overall correctness. However, for imbalanced datasets, it is recommended to complement accuracy with other metrics such as precision, recall, and F1 score to gain a deeper understanding of the model’s performance.
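As a rough sketch of those complementary metrics, the function below computes precision, recall, and F1 score from the same confusion matrix counts, using their standard definitions. The function name `precision_recall_f1` is hypothetical, and the sketch assumes the denominators are nonzero. It is applied to the counts from Example 3 (fraud detection), where accuracy was 93%.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1); assumes nonzero denominators."""
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Counts from Example 3 (fraud detection).
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=50)
print(f"precision = {precision:.3f}")  # precision = 0.800
print(f"recall    = {recall:.3f}")     # recall    = 0.615
print(f"f1        = {f1:.3f}")         # f1        = 0.696
```

Despite 93% accuracy, this model finds only about 62% of the fraudulent transactions, which accuracy alone does not reveal.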