Wednesday, January 8, 2025

Why Polynomial Models Have High Variance

Polynomial models tend to exhibit high variance because they are more complex and flexible than linear models, which makes them sensitive to fluctuations in the training data.


1. Overfitting to Training Data

Nature of the Model:
Polynomial models fit the data by adding higher-degree terms (x², x³, …):

           y = w₀ + w₁x + w₂x² + … + wₙxⁿ


Limitation:
With higher degrees, the model can fit even minor noise in the data, leading to overfitting.
  • The model performs well on the training set.
  • It fails to generalize to unseen data (test/validation sets).
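
To see this concretely, here is a minimal sketch using scikit-learn on a made-up noisy sine dataset (the data, seed, and degrees are all invented for illustration). The training error keeps falling as the degree grows, while the test error typically rises once the model starts chasing noise:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.2, size=60)   # noisy sine "truth"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 3, 10, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(X_tr))  # keeps falling
    test_mse = mean_squared_error(y_te, model.predict(X_te))   # typically rises
    print(f"degree {degree:2d}  train={train_mse:.3f}  test={test_mse:.3f}")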

2. Sensitivity to Small Data Fluctuations
  • Reason:
    A high-degree polynomial has more parameters (weights) to optimize, making it sensitive to small changes in the data.
  • Effect:
    Small variations in the training data can result in drastically different polynomial curves.
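
A quick way to see this sensitivity: nudge a single training point and compare the refit curves. In this sketch (synthetic data, NumPy's polyfit; NumPy may warn about conditioning at high degree), the degree-12 fit shifts far more than the straight line:

import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, 20))
y = np.sin(3 * x) + rng.normal(0, 0.1, 20)

y_nudged = y.copy()
y_nudged[10] += 0.3                      # move one observation slightly

grid = np.linspace(-1, 1, 9)
for degree in (1, 12):
    c_a = np.polyfit(x, y, degree)
    c_b = np.polyfit(x, y_nudged, degree)
    shift = np.max(np.abs(np.polyval(c_a, grid) - np.polyval(c_b, grid)))
    print(f"degree {degree:2d}: max prediction shift = {shift:.3f}")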

3. Increased Model Complexity

  • Impact of Higher Degrees:
    As the degree of the polynomial increases:
    • The model gains flexibility to fit the training data.
    • It loses stability when generalizing to new data.

  • Result:
    This complexity results in a model with high variance.


4. Lack of Smoothness in Predictions

  • Nature of Polynomial Curves:
    High-degree polynomial curves can oscillate wildly, especially near the edges of the data range.
    • This oscillation, known as the Runge phenomenon, is a classic symptom of overfitting.
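
The oscillation is easy to reproduce with the classic Runge function 1/(1 + 25x²). A small sketch, assuming equally spaced nodes and NumPy's polyfit (which may warn about conditioning at this degree):

import numpy as np

def runge(x):
    return 1.0 / (1.0 + 25.0 * x ** 2)    # the classic Runge function

nodes = np.linspace(-1, 1, 15)            # 15 equally spaced nodes
coeffs = np.polyfit(nodes, runge(nodes), deg=14)   # interpolating polynomial

edge = np.linspace(-1.0, -0.9, 5)         # points near the interval edge
print(np.polyval(coeffs, edge))           # swings far from the true curve
print(runge(edge))                        # true values stay small and smooth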

5. Bias-Variance Tradeoff

  • Polynomial models reduce bias by fitting the training data well, but this comes at the cost of increased variance.
  • The high variance makes predictions unstable and less reliable on unseen data.
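
The tradeoff can be estimated empirically: refit the model on many simulated training sets and decompose the error at a fixed grid into bias² and variance. A sketch under assumed settings (sine truth, Gaussian noise, 300 refits):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x_grid = np.linspace(-1, 1, 40)
truth = np.sin(3 * x_grid)                 # assumed "true" function on the grid

for degree in (1, 4, 12):
    fits = []
    for _ in range(300):                   # 300 independent training sets
        X = rng.uniform(-1, 1, size=(40, 1))
        y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.3, size=40)
        m = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        fits.append(m.fit(X, y).predict(x_grid.reshape(-1, 1)))
    fits = np.array(fits)
    bias2 = np.mean((fits.mean(axis=0) - truth) ** 2)    # low for high degrees
    var = np.mean(fits.var(axis=0))                      # high for high degrees
    print(f"degree {degree:2d}: bias^2 = {bias2:.4f}, variance = {var:.4f}")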


Example

Dataset: Predicting house prices based on size.

  • Linear Model: Assumes a straight-line relationship and underfits the data (high bias).
  • 2nd Degree Polynomial: Fits the data better, reducing bias but slightly increasing variance.
  • 10th Degree Polynomial: Captures all nuances in the training data, including noise, leading to high variance and overfitting.
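
A hedged sketch of this example with synthetic data (the sizes, price formula, and noise level are invented for illustration; features are standardized so the degree-10 fit stays numerically stable). Test error is typically lowest at degree 2:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
size = rng.uniform(50, 250, size=(80, 1))                 # square meters
price = (50_000 + 1_200 * size[:, 0] + 3.0 * size[:, 0] ** 2
         + rng.normal(0, 20_000, size=80))                # mildly curved truth

s_tr, s_te, p_tr, p_te = train_test_split(size, price, random_state=0)
for degree in (1, 2, 10):
    model = make_pipeline(StandardScaler(), PolynomialFeatures(degree),
                          LinearRegression())
    model.fit(s_tr, p_tr)
    print(degree, mean_squared_error(p_te, model.predict(s_te)))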

Why Linear Models Have Higher Bias


Linear models tend to exhibit higher bias because they make simplistic assumptions about the relationship between the input features and the target variable. 


1. Assumption of Linearity

Nature of the Model:
Linear models assume a linear relationship between the input features (x) and the output (y):
           y = w₁x₁ + w₂x₂ + … + wₙxₙ + b

Limitation:
Real-world data often exhibits complex, non-linear relationships that a linear model cannot capture. As a result:

  • The predictions are far from the true values, leading to high bias.
  • The model underfits the data because it oversimplifies the problem.
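
The underfitting shows up as large error even on the training set. A minimal sketch, assuming a quadratic ground truth with tiny noise:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, size=(100, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.1, size=100)   # quadratic truth, tiny noise

line = LinearRegression().fit(X, y)
print(mean_squared_error(y, line.predict(X)))     # large even on training data
print(line.coef_, line.intercept_)                # slope near 0: the line is flat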


2. Reduced Model Complexity

  • Simple Parameterization:
    Linear models have relatively few parameters (weights and biases), which limits their flexibility.

  • Limitation:
    They cannot adapt to intricate patterns in the data, especially in cases with high feature interactions or non-linear dependencies.


3. Lack of Feature Interactions

  • Nature of Interactions:
    Linear models do not automatically account for interactions between features (e.g., the combined effect of x₁ and x₂ on y).

  • Limitation:
    In many real-world problems, feature interactions play a critical role. Ignoring these interactions increases the model's bias.
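
A pure interaction target makes the point starkly: if y = x₁·x₂, a plain linear model finds nothing, while adding the product feature recovers the signal. A sketch with synthetic data, using scikit-learn's PolynomialFeatures with interaction_only=True:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = X[:, 0] * X[:, 1]                      # target depends only on x1 * x2

plain = LinearRegression().fit(X, y)
print(r2_score(y, plain.predict(X)))       # near 0: nothing linear to find

X_int = PolynomialFeatures(degree=2, interaction_only=True,
                           include_bias=False).fit_transform(X)
rich = LinearRegression().fit(X_int, y)
print(r2_score(y, rich.predict(X_int)))    # near 1: interaction term added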
 

4. High Bias by Design

  • Simplified Decision Boundaries:
    Linear models create straight-line decision boundaries (e.g., in classification tasks). These boundaries may not accurately separate complex data distributions.

  • Example:
    In image classification, linear models fail to capture spatial and hierarchical patterns, leading to underperformance.
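
For instance, on scikit-learn's concentric-circles toy dataset, logistic regression (a straight-line boundary) scores near chance, while an RBF-kernel SVM (used here just as a non-linear stand-in) separates the classes easily:

from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

linear = LogisticRegression().fit(X, y)
print(linear.score(X, y))        # near 0.5: barely better than guessing

rbf = SVC(kernel="rbf").fit(X, y)
print(rbf.score(X, y))           # near 1.0: a curved boundary fits easily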

5. Robustness vs. Bias

  • Intention:
    Linear models are intentionally designed to be robust and interpretable but at the cost of higher bias.

  • Tradeoff:
    They avoid overfitting (low variance) but underfit the data due to their simplicity.


When Are Linear Models Useful Despite High Bias?

When Relationships Are Actually Linear:

  • If the underlying relationship really is (close to) linear, linear models work well.
  • Example: Predicting house prices when price grows roughly in proportion to square footage.

When Interpretability Is Key:

  • Linear models are easier to interpret compared to complex non-linear models.

When Data Is Limited:

  • Linear models generalize better when there isn’t enough data to support more complex models.
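
A closing sketch of that last point, with an invented 12-point training set drawn from a nearly linear truth: the straight line generalizes, while a degree-10 polynomial chases the noise and posts a much larger test error:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(6)
X_tr = rng.uniform(-1, 1, size=(12, 1))                 # tiny training set
y_tr = 2 * X_tr[:, 0] + rng.normal(0, 0.3, size=12)     # near-linear truth
X_te = rng.uniform(-1, 1, size=(200, 1))                # plenty of test data
y_te = 2 * X_te[:, 0] + rng.normal(0, 0.3, size=200)

for degree in (1, 10):
    m = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    m.fit(X_tr, y_tr)
    print(degree, mean_squared_error(y_te, m.predict(X_te)))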