Logistic Regression in Machine Learning
Logistic Regression is a supervised learning algorithm used for binary classification problems, where the output can only belong to one of two classes (e.g., “yes” or “no,” “spam” or “not spam”).
Despite its name, logistic regression is not a regression algorithm but rather a classification technique. It predicts the probability that a given input belongs to a particular class.
How Does Logistic Regression Work?
Step 1: Linear Combination of Inputs
Logistic regression begins by calculating a linear combination of the input features:
\( z = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n \)
Here:
- \( z \): Weighted sum of the features
- \( x_1, x_2, \dots, x_n \): Input features
- \( \beta_0 \): Intercept term (bias)
- \( \beta_1, \beta_2, \dots, \beta_n \): Coefficients or weights of the features
This step is similar to linear regression, but here the raw score \( z \) represents the log-odds of the positive class rather than the target value itself.
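As a quick illustration, the snippet below computes \( z \) for a single sample with two hypothetical features; the coefficients are made-up numbers for demonstration, not values from a fitted model:

```python
import numpy as np

# Illustrative values only: two features with made-up coefficients.
beta_0 = -1.5                    # intercept (bias)
beta = np.array([0.8, 0.05])     # weights beta_1, beta_2
x = np.array([3.0, 40.0])        # one input sample (x_1, x_2)

# z = beta_0 + beta_1 * x_1 + beta_2 * x_2
z = beta_0 + np.dot(beta, x)
print(z)  # -1.5 + 2.4 + 2.0 = 2.9
```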
Step 2: Sigmoid Function
The raw output \( z \) is then passed through the sigmoid (logistic) function, which maps the value to a range between 0 and 1:
\( h(z) = \dfrac{1}{1 + e^{-z}} \)
The sigmoid function ensures that the output can be interpreted as a probability:
- \( h(z) \approx 1 \) when \( z \) is large and positive (indicating high confidence for Class 1)
- \( h(z) \approx 0 \) when \( z \) is large and negative (indicating high confidence for Class 0)
- \( h(z) = 0.5 \) when \( z = 0 \) (indicating no preference between the classes)
This step introduces nonlinearity by squashing the unbounded score \( z \) into the (0, 1) range, making logistic regression suitable for binary classification tasks.
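A minimal sketch of the sigmoid in Python (using NumPy) reproduces the three cases above:

```python
import numpy as np

def sigmoid(z):
    """Map a raw score z to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(6.0))   # ~0.998 -> high confidence for Class 1
print(sigmoid(-6.0))  # ~0.002 -> high confidence for Class 0
print(sigmoid(0.0))   # 0.5    -> no preference between the classes
```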
Step 3: Classification
To assign a class label, the predicted probability \( h(z) \) is compared against a decision threshold (typically 0.5):
- If \( h(z) \geq 0.5 \), the input is classified as Class 1.
- If \( h(z) < 0.5 \), the input is classified as Class 0.
In practice, the threshold can be adjusted based on the specific application. For example:
- A lower threshold (e.g., 0.3) might be used to reduce false negatives in sensitive cases like medical diagnosis.
- A higher threshold (e.g., 0.7) might be used to reduce false positives in applications like fraud detection.
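The thresholding step itself is a one-liner. The sketch below uses made-up probabilities to show how changing the threshold changes the assigned labels:

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    """Convert predicted probabilities h(z) into class labels 0 or 1."""
    return (np.asarray(probabilities) >= threshold).astype(int)

probs = [0.92, 0.35, 0.51, 0.08]       # hypothetical model outputs
print(classify(probs))                 # [1 0 1 0] with the default 0.5 threshold
print(classify(probs, threshold=0.3))  # [1 1 1 0] a lower threshold flags more positives
```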
Step 4: Model Training
During training, the algorithm adjusts the coefficients (\( \beta_0, \beta_1, \dots, \beta_n \)) to minimize the error in predictions. The cost function used in logistic regression is the log-loss function (or binary cross-entropy), defined as:
\( J(\beta) = -\dfrac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(h(z_i)) + (1 - y_i) \log(1 - h(z_i)) \right] \)
Here:
- \( n \): Number of training samples
- \( y_i \): Actual class label (0 or 1) for the \(i^{th}\) sample
- \( h(z_i) \): Predicted probability for the \(i^{th}\) sample
Optimization algorithms like Gradient Descent are used to minimize the cost function and find the best-fit coefficients.
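The sketch below implements this training loop from scratch with plain batch gradient descent on a tiny synthetic dataset; it is meant to make the mechanics concrete, not to replace a library implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, epochs=5000):
    """Minimize the log-loss with batch gradient descent."""
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)   # feature weights beta_1 .. beta_n
    beta_0 = 0.0                  # intercept beta_0
    for _ in range(epochs):
        p = sigmoid(X @ beta + beta_0)          # predicted probabilities h(z_i)
        error = p - y                           # gradient of the log-loss w.r.t. z
        beta -= lr * (X.T @ error) / n_samples  # update feature weights
        beta_0 -= lr * error.mean()             # update intercept
    return beta_0, beta

# Tiny synthetic dataset: one feature, class 1 when the feature value is large.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
beta_0, beta = fit_logistic_regression(X, y)
print(beta_0, beta)   # learned intercept and weight
```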
Types of Logistic Regression
- Binary Logistic Regression: Used for problems with two classes, such as “yes” or “no.”
- Multinomial Logistic Regression: Handles classification problems with three or more classes that are not ordered.
- Ordinal Logistic Regression: Used for multi-class problems where the classes have a natural order, such as “low,” “medium,” and “high.”
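In practice these variants are usually provided by libraries rather than written by hand. For example, scikit-learn's LogisticRegression handles both the binary and the multinomial case automatically, while ordinal logistic regression typically requires a separate package such as statsmodels. A minimal sketch, assuming scikit-learn is installed:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris has three unordered classes, so this is a multinomial problem.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

print(clf.predict_proba(X[:2]))  # one probability per class for each sample
print(clf.predict(X[:2]))        # predicted class labels
```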
Key Characteristics of Logistic Regression
- Output Interpretation: Logistic regression outputs probabilities, making it easy to interpret the confidence of predictions.
- Decision Boundary: The decision boundary is linear in the input space, making it suitable for linearly separable data (see the note after this list).
- Cost Function: Logistic regression uses the log-loss (cross-entropy) function to measure error:
- \( J(\beta) = -\dfrac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(h(z_i)) + (1 - y_i) \log(1 - h(z_i)) \right] \)
- Here, \( y_i \) is the actual label, \( h(z_i) \) is the predicted probability, and \( n \) is the number of training examples.
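For example, with two features the decision boundary sits where the model is exactly undecided, i.e. \( h(z) = 0.5 \), which happens when \( z = 0 \):

\( \beta_0 + \beta_1x_1 + \beta_2x_2 = 0 \)

This is a straight line in the \( (x_1, x_2) \) plane; with more features it becomes a hyperplane.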
Advantages of Logistic Regression
- Simplicity: Easy to understand and implement.
- Efficiency: Works well for small to medium-sized datasets.
- Probability Outputs: Provides probabilistic predictions, which are useful for decision-making.
- Feature Importance: Coefficients can indicate the importance of each feature.
Limitations of Logistic Regression
- Linearity Assumption: Assumes a linear relationship between input features and the log-odds of the output, which may not hold for complex datasets.
- Not Suitable for Non-linear Problems: Logistic regression cannot capture non-linear decision boundaries without transformations or feature engineering.
- Sensitive to Outliers: Outliers can affect the model’s performance if not handled properly.