F1-Score of a Model

The F1-score is a performance metric for evaluating machine learning models, especially when the dataset is class-imbalanced. It is the harmonic mean of precision and recall, providing a balanced measure of a model’s ability to identify positive cases while minimizing false positives and false negatives.


Definition of the F1-Score of a Model

The F1-score combines both precision and recall into a single metric by calculating their harmonic mean. It is especially useful when you need to balance the trade-off between precision and recall.

Mathematically, the F1-score is expressed as:

\( \text{F1} = \dfrac { 2 \times \text{Precision} \times \text{Recall} } { \text{Precision} + \text{Recall} } \)

  • Precision: The ratio of true positive predictions to all positive predictions.
  • Recall: The ratio of true positive predictions to all actual positive cases.
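To make the formula concrete, here is a minimal Python sketch (the helper name f1_from_precision_recall is our own, not a library function) that computes the harmonic mean of precision and recall:

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example 1 below: precision = 90/110, recall = 90/100 -> F1 ≈ 0.8571
print(round(f1_from_precision_recall(90 / 110, 90 / 100), 4))  # 0.8571
```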

Importance of F1-Score

The F1-score is valuable when you have an imbalanced dataset, meaning one class is more frequent than the other. In such cases, traditional accuracy can be misleading, as a model might simply predict the majority class more frequently to achieve high accuracy, but still perform poorly in identifying the minority class.

For example, in fraud detection, predicting “no fraud” more often could lead to higher accuracy, but it would miss fraudulent transactions. F1-score balances precision and recall, providing a more informative performance measure.
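As a quick sketch of this effect (assuming scikit-learn is installed; the labels below are synthetic, not real fraud data), a model that always predicts “no fraud” on a 95/5 split scores high accuracy but an F1 of zero:

```python
from sklearn.metrics import accuracy_score, f1_score

# Synthetic imbalanced labels: 95 legitimate (0) and 5 fraudulent (1) transactions.
y_true = [0] * 95 + [1] * 5
# A naive "model" that always predicts the majority class.
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))             # 0.95 -- looks impressive
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0  -- catches no fraud at all
```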


Position of Components in a Confusion Matrix

The confusion matrix helps visualize the distribution of predictions:

                     Predicted Positive      Predicted Negative
Actual Positive      True Positive (TP)      False Negative (FN)
Actual Negative      False Positive (FP)     True Negative (TN)

To calculate the F1-score, we need both precision and recall, which can be derived from the confusion matrix.
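As a sketch, all three quantities can be computed directly from the matrix counts; note that TN never enters the F1-score. The helper below is our own illustration, not a library API:

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Precision, recall, and F1 from confusion-matrix counts (TN is not used)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Matches Example 2 below: TP=70, FP=30, FN=30 -> F1 = 0.7
print(round(f1_from_counts(70, 30, 30), 4))  # 0.7
```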


Detailed Examples with Steps to Calculate F1-Score

Below are nine real-world examples that walk through the F1-score calculation step by step:


Example 1 – F1-Score of a Cancer Detection Model

Scenario: A model predicts whether a patient has cancer.

  • True Positives (TP): 90
  • False Positives (FP): 20
  • False Negatives (FN): 10
  • True Negatives (TN): 80

Steps:

  1. Precision: Precision = TP / (TP + FP) = 90 / (90 + 20) = 0.8182 (81.82%).
  2. Recall: Recall = TP / (TP + FN) = 90 / (90 + 10) = 0.9 (90%).
  3. F1-score: F1 = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.8182 * 0.9) / (0.8182 + 0.9) = 0.8571 (85.71%).
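To double-check this arithmetic, one can rebuild label vectors with the same four counts and ask scikit-learn for the scores (a sketch, assuming scikit-learn is available):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Rebuild labels matching TP=90, FP=20, FN=10, TN=80.
y_true = [1] * 100 + [0] * 100          # 100 actual positives, 100 actual negatives
y_pred = [1] * 90 + [0] * 10 + [1] * 20 + [0] * 80

print(round(precision_score(y_true, y_pred), 4))  # 0.8182
print(round(recall_score(y_true, y_pred), 4))     # 0.9
print(round(f1_score(y_true, y_pred), 4))         # 0.8571
```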

Example 2 – F1-Score of a Fraud Detection Model

Scenario: A model predicts whether a transaction is fraudulent.

  • TP: 70
  • FP: 30
  • FN: 30
  • TN: 90

Steps:

  1. Precision: Precision = 70 / (70 + 30) = 0.7 (70%).
  2. Recall: Recall = 70 / (70 + 30) = 0.7 (70%).
  3. F1-score: F1 = 2 * (0.7 * 0.7) / (0.7 + 0.7) = 0.7 (70%).

Example 3 – F1-Score of a Spam Email Detection Model

Scenario: A model detects whether an email is spam.

  • TP: 50
  • FP: 20
  • FN: 10
  • TN: 80

Steps:

  1. Precision: Precision = 50 / (50 + 20) = 0.7143 (71.43%).
  2. Recall: Recall = 50 / (50 + 10) = 0.8333 (83.33%).
  3. F1-score: F1 = 2 * (0.7143 * 0.8333) / (0.7143 + 0.8333) = 0.7692 (76.92%).

Example 4 – F1-Score of a Loan Default Prediction Model

Scenario: A model predicts whether a customer will default on a loan.

  • TP: 100
  • FP: 25
  • FN: 50
  • TN: 125

Steps:

  1. Precision: Precision = 100 / (100 + 25) = 0.8 (80%).
  2. Recall: Recall = 100 / (100 + 50) = 0.6667 (66.67%).
  3. F1-score: F1 = 2 * (0.8 * 0.6667) / (0.8 + 0.6667) = 0.7273 (72.73%).

Example 5 – F1-Score of a Product Defect Detection Model

Scenario: A model detects defective products.

  • TP: 120
  • FP: 30
  • FN: 40
  • TN: 150

Steps:

  1. Precision: Precision = 120 / (120 + 30) = 0.8 (80%).
  2. Recall: Recall = 120 / (120 + 40) = 0.75 (75%).
  3. F1-score: F1 = 2 * (0.8 * 0.75) / (0.8 + 0.75) = 0.7742 (77.42%).

Example 6 – F1-Score of an Object Detection Model

Scenario: A model detects cars in images.

  • TP: 200
  • FP: 50
  • FN: 25
  • TN: 150

Steps:

  1. Precision: Precision = 200 / (200 + 50) = 0.8 (80%).
  2. Recall: Recall = 200 / (200 + 25) = 0.8889 (88.89%).
  3. F1-score: F1 = 2 * (0.8 * 0.8889) / (0.8 + 0.8889) = 0.8421 (84.21%).

Example 7 – F1-Score of a Disease Diagnosis Model

Scenario: A model diagnoses whether a patient has a disease.

  • TP: 90
  • FP: 20
  • FN: 10
  • TN: 80

Steps:

  1. Precision: Precision = 90 / (90 + 20) = 0.8182 (81.82%).
  2. Recall: Recall = 90 / (90 + 10) = 0.9 (90%).
  3. F1-score: F1 = 2 * (0.8182 * 0.9) / (0.8182 + 0.9) = 0.8571 (85.71%).

Example 8 – F1-Score of a Sentiment Analysis Model

Scenario: A model analyzes text to predict sentiment (positive or negative).

  • TP: 80
  • FP: 20
  • FN: 30
  • TN: 100

Steps:

  1. Precision: Precision = 80 / (80 + 20) = 0.8 (80%).
  2. Recall: Recall = 80 / (80 + 30) = 0.7273 (72.73%).
  3. F1-score: F1 = 2 * (0.8 * 0.7273) / (0.8 + 0.7273) = 0.7619 (76.19%).

Example 9 – F1-Score of a Movie Recommendation System

Scenario: A recommendation system predicts whether a user will like a movie.

  • TP: 70
  • FP: 30
  • FN: 20
  • TN: 100

Steps:

  1. Precision: Precision = 70 / (70 + 30) = 0.7 (70%).
  2. Recall: Recall = 70 / (70 + 20) = 0.7778 (77.78%).
  3. F1-score: F1 = 2 * (0.7 * 0.7778) / (0.7 + 0.7778) = 0.7368 (73.68%).
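To tie the nine walkthroughs together, the sketch below recomputes every F1-score from its (TP, FP, FN) counts in a single loop; TN is omitted because the F1-score never uses it:

```python
# (name, TP, FP, FN) for Examples 1-9; TN does not affect the F1-score.
examples = [
    ("Cancer detection",     90, 20, 10),   # Example 1
    ("Fraud detection",      70, 30, 30),   # Example 2
    ("Spam detection",       50, 20, 10),   # Example 3
    ("Loan default",        100, 25, 50),   # Example 4
    ("Defect detection",    120, 30, 40),   # Example 5
    ("Object detection",    200, 50, 25),   # Example 6
    ("Disease diagnosis",    90, 20, 10),   # Example 7
    ("Sentiment analysis",   80, 20, 30),   # Example 8
    ("Movie recommendation", 70, 30, 20),   # Example 9
]

for name, tp, fp, fn in examples:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{name}: F1 = {f1:.4f}")
# Expected output: 0.8571, 0.7000, 0.7692, 0.7273, 0.7742,
#                  0.8421, 0.8571, 0.7619, 0.7368
```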

Conclusion

The F1-score is an important evaluation metric when dealing with class imbalance. It provides a more balanced measure of a model’s performance by accounting for both precision and recall, making it ideal for scenarios where both false positives and false negatives need to be minimized.