F1-Score of a Model
The F1-score is a performance metric for evaluating machine learning models, especially when there is a class imbalance in the dataset. It is the harmonic mean of precision and recall, providing a balanced measurement of a model’s ability to identify positive cases while minimizing false positives and false negatives.
Definition of F1-Score
The F1-score combines both precision and recall into a single metric by calculating their harmonic mean. It is especially useful when you need to balance the trade-off between precision and recall.
Mathematically, the F1-score is expressed as:
F1-Score Formula:
\( \text{F1} = \dfrac { 2 \times \text{Precision} \times \text{Recall} } { \text{Precision} + \text{Recall} } \)
- Precision: The ratio of true positive predictions to all positive predictions.
- Recall: The ratio of true positive predictions to all actual positive cases.
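Read literally, the formula is just a harmonic mean; a minimal Python sketch (the function name `f1_score` here is our own, not a library call) makes the computation explicit:

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # common convention when both precision and recall are 0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.8182, 0.9))  # ~0.8571
```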
Importance of F1-Score
The F1-score is valuable when you have an imbalanced dataset, meaning one class is more frequent than the other. In such cases, traditional accuracy can be misleading, as a model might simply predict the majority class more frequently to achieve high accuracy, but still perform poorly in identifying the minority class.
For example, in fraud detection, predicting “no fraud” more often could lead to higher accuracy, but it would miss fraudulent transactions. F1-score balances precision and recall, providing a more informative performance measure.
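The point is easy to demonstrate. The sketch below uses invented numbers (990 legitimate transactions, 10 fraudulent) to show that a model which always predicts "no fraud" reaches 99% accuracy yet earns an F1-score of 0:

```python
# Invented example: 990 legitimate transactions (0), 10 fraudulent (1)
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000  # a naive model that always predicts "no fraud"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy)  # 0.99 -- looks excellent
print(f1)        # 0.0  -- every fraudulent transaction was missed
```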
Position of Components in a Confusion Matrix
The confusion matrix helps visualize the distribution of predictions:
| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
To calculate the F1-score, we need both precision and recall, which can be derived from the confusion matrix.
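Since only TP, FP, and FN appear in the formula (true negatives never enter it), a small helper can compute F1 straight from the matrix. This is a sketch with names of our choosing, and it assumes `tp + fp` and `tp + fn` are both nonzero:

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Compute the F1-score from confusion-matrix counts.

    TN is deliberately absent: the F1-score ignores true negatives.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Each of the worked examples below can be checked by calling this helper with its TP, FP, and FN counts.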
Detailed Examples with Steps to Calculate F1-Score
Below are nine real-world examples that explain F1-score calculation step-by-step:
Example 1 – F1-Score of a Cancer Detection Model
Scenario: A model predicts whether a patient has cancer.
- True Positives (TP): 90
- False Positives (FP): 20
- False Negatives (FN): 10
- True Negatives (TN): 80
Steps:
- Calculate precision:
Precision = TP / (TP + FP) = 90 / (90 + 20) = 0.8182 (81.82%)
- Calculate recall:
Recall = TP / (TP + FN) = 90 / (90 + 10) = 0.9 (90%)
- Calculate F1-score:
F1 = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.8182 * 0.9) / (0.8182 + 0.9) = 0.8571 (85.71%)
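The same three steps, run through Python with this example's counts (a sketch for verification only):

```python
tp, fp, fn = 90, 20, 10
precision = tp / (tp + fp)   # 0.8182
recall = tp / (tp + fn)      # 0.9
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.8182 0.9 0.8571
```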
Example 2 – F1-Score of a Fraud Detection Model
Scenario: A model predicts whether a transaction is fraudulent.
- TP: 70
- FP: 30
- FN: 30
- TN: 90
Steps:
- Precision:
Precision = 70 / (70 + 30) = 0.7 (70%)
- Recall:
Recall = 70 / (70 + 30) = 0.7 (70%)
- F1-score:
F1 = 2 * (0.7 * 0.7) / (0.7 + 0.7) = 0.7 (70%)
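In practice you would let a library do this. Assuming scikit-learn is installed, the counts from this example can be expanded into label arrays and fed to `sklearn.metrics.f1_score` (a sketch, not code from the source):

```python
from sklearn.metrics import f1_score

# Expand the counts TP=70, FP=30, FN=30, TN=90 into label arrays
y_true = [1] * 70 + [0] * 30 + [1] * 30 + [0] * 90
y_pred = [1] * 70 + [1] * 30 + [0] * 30 + [0] * 90

print(f1_score(y_true, y_pred))  # 0.7
```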
Example 3 – F1-Score of a Spam Email Detection Model
Scenario: A model detects whether an email is spam.
- TP: 50
- FP: 20
- FN: 10
- TN: 80
Steps:
- Precision:
Precision = 50 / (50 + 20) = 0.7143 (71.43%)
- Recall:
Recall = 50 / (50 + 10) = 0.8333 (83.33%)
- F1-score:
F1 = 2 * (0.7143 * 0.8333) / (0.7143 + 0.8333) = 0.7692 (76.92%)
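Because precision and recall are rounded before being combined, exact fractions are a useful cross-check here. With Python's fractions module (a sketch):

```python
from fractions import Fraction

precision = Fraction(50, 70)   # 5/7
recall = Fraction(50, 60)      # 5/6
f1 = 2 * precision * recall / (precision + recall)
print(f1, float(f1))  # 10/13 0.7692307692307693
```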
Example 4 – F1-Score of a Loan Default Prediction Model
Scenario: A model predicts whether a customer will default on a loan.
- TP: 100
- FP: 25
- FN: 50
- TN: 125
Steps:
- Precision:
Precision = 100 / (100 + 25) = 0.8 (80%)
- Recall:
Recall = 100 / (100 + 50) = 0.6667 (66.67%)
- F1-score:
F1 = 2 * (0.8 * 0.6667) / (0.8 + 0.6667) = 0.7273 (72.73%)
Example 5 – F1-Score of a Product Defect Detection Model
Scenario: A model detects defective products.
- TP: 120
- FP: 30
- FN: 40
- TN: 150
Steps:
- Precision:
Precision = 120 / (120 + 30) = 0.8 (80%)
- Recall:
Recall = 120 / (120 + 40) = 0.75 (75%)
- F1-score:
F1 = 2 * (0.8 * 0.75) / (0.8 + 0.75) = 0.7742 (77.42%)
Example 6 – F1-Score of an Object Detection Model
Scenario: A model detects cars in images.
- TP: 200
- FP: 50
- FN: 25
- TN: 150
Steps:
- Precision:
Precision = 200 / (200 + 50) = 0.8 (80%)
- Recall:
Recall = 200 / (200 + 25) = 0.8889 (88.89%)
- F1-score:
F1 = 2 * (0.8 * 0.8889) / (0.8 + 0.8889) = 0.8421 (84.21%)
Example 7 – F1-Score of a Disease Diagnosis Model
Scenario: A model diagnoses whether a patient has a disease.
- TP: 90
- FP: 20
- FN: 10
- TN: 80
Steps:
- Precision:
Precision = 90 / (90 + 20) = 0.8182 (81.82%)
- Recall:
Recall = 90 / (90 + 10) = 0.9 (90%)
- F1-score:
F1 = 2 * (0.8182 * 0.9) / (0.8182 + 0.9) = 0.8571 (85.71%)
Example 8 – F1-Score of a Sentiment Analysis Model
Scenario: A model analyzes text to predict sentiment (positive or negative).
- TP: 80
- FP: 20
- FN: 30
- TN: 100
Steps:
- Precision:
Precision = 80 / (80 + 20) = 0.8 (80%)
- Recall:
Recall = 80 / (80 + 30) = 0.7273 (72.73%)
- F1-score:
F1 = 2 * (0.8 * 0.7273) / (0.8 + 0.7273) = 0.7619 (76.19%)
Example 9 – F1-Score of a Movie Recommendation System
Scenario: A recommendation system predicts whether a user will like a movie.
- TP: 70
- FP: 30
- FN: 20
- TN: 100
Steps:
- Precision:
Precision = 70 / (70 + 30) = 0.7 (70%)
- Recall:
Recall = 70 / (70 + 20) = 0.7778 (77.78%)
- F1-score:
F1 = 2 * (0.7 * 0.7778) / (0.7 + 0.7778) = 0.7368 (73.68%)
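All nine examples repeat the same three steps, so one loop verifies every F1-score above (a sketch; TN is omitted because the formula never uses it):

```python
# (TP, FP, FN) counts from Examples 1-9, in order
examples = [
    (90, 20, 10),   # 1: cancer detection
    (70, 30, 30),   # 2: fraud detection
    (50, 20, 10),   # 3: spam detection
    (100, 25, 50),  # 4: loan default
    (120, 30, 40),  # 5: product defects
    (200, 50, 25),  # 6: object detection
    (90, 20, 10),   # 7: disease diagnosis
    (80, 20, 30),   # 8: sentiment analysis
    (70, 30, 20),   # 9: movie recommendation
]

for i, (tp, fp, fn) in enumerate(examples, start=1):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"Example {i}: F1 = {f1:.4f}")
```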
Conclusion
The F1-score is an important evaluation metric when dealing with class imbalance. It provides a more balanced measure of a model’s performance by accounting for both precision and recall, making it ideal for scenarios where both false positives and false negatives need to be minimized.