Linear Regression in Machine Learning

Linear Regression is a supervised learning algorithm that models the relationship between one or more independent variables (features) and a dependent variable (target) by fitting a straight line to the data.

Linear regression can help us answer questions like:

  • “How much will a house cost based on its size and location?”
  • “What will the temperature be tomorrow based on historical data?”

How Does Linear Regression Work?

Mathematical Model:

Linear regression establishes a relationship between the input features and the output target by assuming that the target can be approximated as a linear combination of the input features.

For Simple Linear Regression, it uses the equation of a straight line:

\( y = mx + b \)

where \( m \) is the slope of the line and \( b \) is the intercept.

For Multiple Linear Regression, the model generalizes to:

\( y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n \)

Here, y is the predicted value, x₁, x₂, ... are the input features, β₀ is the intercept, and β₁, β₂, ... are the coefficients or weights of the features.

The goal of the model is to find the optimal values for these coefficients.
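
To make the linear combination concrete, here is a minimal NumPy sketch with made-up coefficient values (the numbers are purely illustrative):

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration
beta_0 = 1.5                       # intercept
beta = np.array([2.0, -0.5, 3.0])  # weights for features x1, x2, x3

# One input example with three feature values
x = np.array([4.0, 10.0, 1.0])

# y = beta_0 + beta_1*x1 + beta_2*x2 + beta_3*x3
y = beta_0 + np.dot(beta, x)
print(y)  # 1.5 + 2.0*4.0 - 0.5*10.0 + 3.0*1.0 = 7.5
```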

Training the Model:

The model is trained by minimizing the cost function, which measures the difference between the predicted values and the actual target values.

The most commonly used cost function is the Mean Squared Error (MSE), defined as:

\( \text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 \)

Here:

  • \( y_i \): Actual value of the target variable for the \(i^{th}\) data point
  • \( \hat{y}_i \): Predicted value for the \(i^{th}\) data point
  • \( n \): Number of data points
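
As a hedged sketch, the MSE can be computed with a few lines of NumPy (the sample values below are made up):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Illustrative values: actual targets vs. model predictions
print(mean_squared_error([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))  # ≈ 0.4167
```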

The training process involves finding the values of the coefficients (β₀, β₁, β₂, ...) that minimize the MSE. This is typically achieved using techniques like Gradient Descent, which iteratively updates the coefficients as:

\( \beta_j \leftarrow \beta_j - \alpha \frac{\partial}{\partial \beta_j} \text{MSE} \)

Where:

  • \( \beta_j \): The \(j^{th}\) coefficient
  • \( \alpha \): Learning rate (step size)
  • \( \dfrac{\partial}{\partial \beta_j} \text{MSE} \): Partial derivative of the cost function with respect to \(\beta_j\)
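
A minimal batch gradient descent sketch of this training loop, assuming a small NumPy feature matrix X and target vector y (the toy data and hyperparameters are illustrative only):

```python
import numpy as np

def fit_linear_regression(X, y, alpha=0.01, n_iters=5000):
    """Learn the intercept and coefficients by gradient descent on the MSE."""
    n_samples, n_features = X.shape
    beta_0 = 0.0                 # intercept
    beta = np.zeros(n_features)  # feature coefficients

    for _ in range(n_iters):
        y_pred = beta_0 + X @ beta
        error = y_pred - y
        # Partial derivatives of the MSE with respect to each coefficient
        grad_beta_0 = 2.0 * np.mean(error)
        grad_beta = 2.0 * (X.T @ error) / n_samples
        # Update step: beta_j <- beta_j - alpha * dMSE/dbeta_j
        beta_0 -= alpha * grad_beta_0
        beta -= alpha * grad_beta
    return beta_0, beta

# Toy data generated from y = 1 + 2x, so we expect beta_0 ≈ 1 and beta_1 ≈ 2
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
print(fit_linear_regression(X, y))
```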

Prediction:

Once trained, the model predicts the target value for new inputs by applying the learned coefficients to the input features using the linear equation:

\( \hat{y} = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n \)

Here, \( \hat{y} \) is the predicted value, and x₁, x₂, ... are the feature values for the new input.
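
In practice, this fit-then-predict workflow is often delegated to a library. Below is a short sketch using scikit-learn's LinearRegression on made-up two-feature data (the targets follow y = 2x₁ + 3x₂, purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data: two features per example, target follows y = 2*x1 + 3*x2
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_train = np.array([8.0, 7.0, 18.0, 17.0])

model = LinearRegression()
model.fit(X_train, y_train)            # learns beta_0 and beta_1..beta_n

print(model.intercept_, model.coef_)   # learned beta_0 and [beta_1, beta_2]

# Apply the learned coefficients to a new, unseen input
X_new = np.array([[5.0, 5.0]])
print(model.predict(X_new))            # ≈ [25.0]
```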


Types of Linear Regression

As outlined in the mathematical model above, there are two types of linear regression.

1. Simple Linear Regression

Models the relationship between one independent variable and one dependent variable.

Example: Predicting house price based solely on size.

Here house size is the feature, and house price is the target.
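
A minimal sketch of this single-feature case, fitting a straight line with NumPy's polyfit on invented size/price figures:

```python
import numpy as np

# House sizes in square metres (feature) and prices in $1000s (target); made-up data
size = np.array([50.0, 70.0, 90.0, 110.0, 130.0])
price = np.array([150.0, 200.0, 260.0, 300.0, 360.0])

# A degree-1 polynomial fit returns the slope m and intercept b of price = m*size + b
m, b = np.polyfit(size, price, 1)
print(f"price ≈ {m:.2f} * size + {b:.2f}")
print("predicted price for a 100 m² house:", m * 100 + b)
```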

2. Multiple Linear Regression

Models the relationship between multiple independent variables and one dependent variable.

Example: Predicting house price based on size, location, and number of bedrooms.

Here the features are house size, house location, and number of bedrooms (three in total), and the target is house price.
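
A hedged sketch of this three-feature setup with scikit-learn (the feature values and the numeric encoding of location are invented for illustration; a real pipeline would encode location more carefully):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: size (m²), location score (illustrative numeric encoding), bedrooms
X = np.array([
    [60.0,  3.0, 2.0],
    [80.0,  5.0, 3.0],
    [100.0, 2.0, 3.0],
    [120.0, 4.0, 4.0],
    [150.0, 5.0, 5.0],
])
y = np.array([220.0, 330.0, 310.0, 400.0, 520.0])  # prices in $1000s, made up

model = LinearRegression().fit(X, y)
print("intercept:", model.intercept_)
print("coefficients (size, location, bedrooms):", model.coef_)
print("predicted price for [110 m², location 4, 3 bedrooms]:",
      model.predict(np.array([[110.0, 4.0, 3.0]])))
```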


Advantages of Linear Regression

  • Simplicity: Easy to understand, implement, and interpret.
  • Efficiency: Works well on small to medium-sized datasets.
  • Insights: Helps identify and quantify relationships between variables.

Limitations of Linear Regression

  • Assumes Linearity: Linear regression assumes a straight-line relationship between features and target, which may not always hold.
  • Sensitive to Outliers: Extreme values can significantly impact the model’s accuracy.
  • Feature Dependency: Performance degrades when features are highly correlated with one another (multicollinearity) or irrelevant to the target.