Wednesday, January 8, 2025

Why Polynomial Models Have High Variance

Polynomial models tend to exhibit high variance because they are more complex and flexible than low-degree models such as linear regression, which makes them sensitive to fluctuations in the training data.


1. Overfitting to Training Data

Nature of the Model:
Polynomial models fit the data by adding higher-degree terms (x², x³, …):

           y = w₀ + w₁x + w₂x² + … + wₙxⁿ


Limitation:
With higher degrees, the model can fit even minor noise in the data, leading to overfitting.
  • The model performs well on the training set.
  • It fails to generalize to unseen data (test/validation sets).

2. Sensitivity to Small Data Fluctuations
  • Reason:
    A high-degree polynomial has more parameters (weights) to optimize, making it sensitive to small changes in the data.
  • Effect:
    Small variations in the training data can result in drastically different polynomial curves.
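A quick way to see this sensitivity is to fit the same high-degree polynomial to two training sets that differ only in their noise. The sketch below uses assumed synthetic data (a noisy sine curve) and NumPy's polyfit:

```python
# Sketch with assumed synthetic data: fit the same degree-10 polynomial to two
# noisy samples of one underlying function and compare the resulting curves.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
true_y = np.sin(2 * np.pi * x)

# Two training sets that differ only in the noise draw.
y_a = true_y + rng.normal(scale=0.1, size=x.shape)
y_b = true_y + rng.normal(scale=0.1, size=x.shape)

coef_a = np.polyfit(x, y_a, deg=10)
coef_b = np.polyfit(x, y_b, deg=10)

x_grid = np.linspace(0, 1, 200)
gap = np.abs(np.polyval(coef_a, x_grid) - np.polyval(coef_b, x_grid))

# A large curve-to-curve gap despite nearly identical data is high variance.
print("max gap between the two fitted curves:", gap.max())
```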

3. Increased Model Complexity

  • Impact of Higher Degrees:
    As the degree of the polynomial increases:
    • The model gains flexibility to fit the training data.
    • It loses stability when generalizing to new data.

  • Result:
    This complexity results in a model with high variance.


4. Lack of Smoothness in Predictions

  • Nature of Polynomial Curves:
    High-degree polynomial curves can oscillate wildly, especially near the edges of the data range.
    • This behavior, known as the Runge phenomenon, is a sign of overfitting.
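A minimal sketch of this effect, using the classic Runge function 1 / (1 + 25x²) and NumPy (assumed synthetic setup):

```python
# Interpolate f(x) = 1 / (1 + 25x^2) on equally spaced points with a
# high-degree polynomial; the fit oscillates strongly near the interval edges.
import numpy as np

def runge(x):
    return 1.0 / (1.0 + 25.0 * x ** 2)

x_train = np.linspace(-1, 1, 15)                   # equally spaced nodes
coef = np.polyfit(x_train, runge(x_train), deg=14)

x_grid = np.linspace(-1, 1, 400)
error = np.abs(np.polyval(coef, x_grid) - runge(x_grid))

# Error is small in the middle of the range but blows up near x = -1 and x = 1.
print("max error over the whole range:", error.max())
print("max error in the middle       :", error[np.abs(x_grid) < 0.5].max())
```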

5. Bias-Variance Tradeoff

  • Polynomial models reduce bias by fitting the training data well, but this comes at the cost of increased variance.
  • The high variance makes predictions unstable and less reliable on unseen data.
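For squared-error loss, this tradeoff is captured by the standard bias-variance decomposition of the expected prediction error at an input x:

           E[(y − ŷ(x))²] = Bias[ŷ(x)]² + Var[ŷ(x)] + σ²

where σ² is the irreducible noise in the data. High-degree polynomials shrink the bias term but inflate the variance term.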


Example

Dataset: Predicting house prices based on size.

  • Linear Model: Assumes a straight-line relationship and underfits the data (high bias).
  • 2nd Degree Polynomial: Fits the data better, reducing bias but slightly increasing variance.
  • 10th Degree Polynomial: Captures all nuances in the training data, including noise, leading to high variance and overfitting.
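The sketch below reproduces this comparison on assumed synthetic size/price data (size in thousands of square feet, price in thousands of dollars) using scikit-learn; exact numbers will vary, but degree 10 typically shows the lowest training RMSE and the highest test RMSE:

```python
# Compare polynomial degrees 1, 2, and 10 by their train vs. test RMSE on
# synthetic house-price data (size in 1000 sq ft, price in $1000s).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
size = rng.uniform(0.5, 3.5, size=60).reshape(-1, 1)
price = 100 + 150 * size[:, 0] + 20 * size[:, 0] ** 2 + rng.normal(0, 25, 60)

X_train, X_test, y_train, y_test = train_test_split(size, price, random_state=0)

for degree in (1, 2, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    rmse_train = mean_squared_error(y_train, model.predict(X_train)) ** 0.5
    rmse_test = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"degree {degree:2d}: train RMSE {rmse_train:6.1f}, test RMSE {rmse_test:6.1f}")
```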

Why Linear Models Have Higher Bias


Linear models tend to exhibit higher bias because they make simplistic assumptions about the relationship between the input features and the target variable. 


1. Assumption of Linearity

Nature of the Model:
Linear models assume a linear relationship between the input features (x) and the output (y):
           y = w₁x₁ + w₂x₂ + … + wₙxₙ + b

Limitation:
Real-world data often exhibits complex, non-linear relationships that a linear model cannot capture. As a result:

  • The predictions are far from the true values, leading to high bias.
  • The model underfits the data because it oversimplifies the problem.
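A minimal sketch of this underfitting, with assumed synthetic data whose true relationship is quadratic:

```python
# A straight line fitted to a quadratic relationship keeps a large error even
# on its own training data, which is the signature of high bias / underfitting.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=100)   # true relation: y = x^2

linear = LinearRegression().fit(X, y)
print("R^2 on the training data itself:", round(linear.score(X, y), 3))  # near 0
```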


2. Reduced Model Complexity

  • Simple Parameterization:
    Linear models have relatively few parameters (weights and biases), which limits their flexibility.

  • Limitation:
    They cannot adapt to intricate patterns in the data, especially in cases with high feature interactions or non-linear dependencies.


3. Lack of Feature Interactions

  • Nature of Interactions:
    Linear models do not automatically account for interactions between features (e.g., the combined effect of x₁ and x₂ on y).

  • Limitation:
    In many real-world problems, feature interactions play a critical role. Ignoring these interactions increases the model's bias.
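The sketch below (assumed synthetic data where the target depends on the product x₁·x₂) shows a plain linear model missing the interaction, while explicitly adding the x₁·x₂ feature recovers it:

```python
# The target depends on the product x1 * x2; a plain linear model misses this,
# while adding the interaction term as an explicit feature fixes it.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(200, 2))
y = 3 * X[:, 0] * X[:, 1] + rng.normal(scale=0.05, size=200)

plain = LinearRegression().fit(X, y)
with_interaction = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LinearRegression(),
).fit(X, y)

print("R^2 without interaction:", round(plain.score(X, y), 3))             # near 0
print("R^2 with x1*x2 feature :", round(with_interaction.score(X, y), 3))  # near 1
```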
 

4. High Bias by Design

  • Simplified Decision Boundaries:
    Linear models create straight-line decision boundaries (e.g., in classification tasks). These boundaries may not accurately separate complex data distributions.

  • Example:
    In image classification, linear models fail to capture spatial and hierarchical patterns, leading to underperformance.

5. Robustness vs. Bias

  • Intention:
    Linear models are intentionally designed to be robust and interpretable but at the cost of higher bias.

  • Tradeoff:
    They avoid overfitting (low variance) but underfit the data due to their simplicity.


When Are Linear Models Useful Despite High Bias?

When Relationships Are Actually Linear:
    • If the underlying relationship is simple, linear models work well.
    • Example: Predicting house prices based on square footage.

When Interpretability Is Key:

  • Linear models are easier to interpret compared to complex non-linear models.

When Data Is Limited:

  • Linear models generalize better when there isn’t enough data to support more complex models.







Monday, December 23, 2024

Validation Error

What is Validation Error?

 Validation Error refers to the error (or loss) calculated on a validation dataset during the training process of a machine learning model. The validation dataset is a subset of data that is not used for training but is used to evaluate the model's performance after each training iteration (epoch).

 

Purpose of Validation Error

  • Prevent Overfitting:
    Overfitting occurs when the model learns the noise or irrelevant details in the training data, leading to poor generalization. If validation error increases while training error decreases, it is a sign of overfitting.

  • Monitor Model Performance:
    Validation error helps in choosing hyperparameters like learning rate, number of layers, and nodes.

  • Enable Early Stopping:
    Early stopping halts training when validation error stops decreasing, saving computation time and preventing overfitting.
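A minimal early-stopping sketch using scikit-learn's MLPRegressor, which holds out an internal validation split and stops once the validation score stops improving (synthetic data assumed):

```python
# Early stopping in scikit-learn: training halts once the score on an internal
# validation split has not improved for n_iter_no_change epochs.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(500, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.1, size=500)

model = MLPRegressor(
    hidden_layer_sizes=(32, 32),
    early_stopping=True,        # hold out part of the training data for validation
    validation_fraction=0.2,    # 20% of the training data becomes the validation set
    n_iter_no_change=10,        # stop after 10 epochs without improvement
    max_iter=500,
    random_state=0,
)
model.fit(X, y)
print("epochs actually run:", model.n_iter_)
```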


Root Mean Square Error (RMSE)


Definition:
Root Mean Square Error (RMSE) is a commonly used metric that measures the difference between the values predicted by a model and the actual observed values. It represents the standard deviation of the residuals (prediction errors).


Key Properties of RMSE:

  1. Units: RMSE is in the same units as the target variable, making it interpretable.
  2. Sensitivity to Outliers: RMSE penalizes large errors more heavily because the errors are squared before averaging.
  3. Best for Continuous Variables: Used mainly for regression problems to measure accuracy.
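For reference, RMSE computed directly from its definition (the square root of the mean of the squared residuals), on a small made-up set of values:

```python
import numpy as np

actual    = np.array([200.0, 250.0, 310.0, 400.0])
predicted = np.array([210.0, 240.0, 330.0, 380.0])

# Residuals are squared, averaged, then the square root is taken.
rmse = np.sqrt(np.mean((predicted - actual) ** 2))
print(rmse)   # ~15.81, in the same units as the target variable
```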






Generalization

 


Generalization in machine learning refers to a model's ability to perform well on unseen data (data it has not encountered during training). A well-generalized model captures the underlying patterns in the training data without overfitting to its noise or specific details, enabling it to make accurate predictions on new, unseen datasets.




Key Aspects of Generalization

  1. Training Performance vs. Test Performance

    • A well-generalized model has similar performance on both the training data and test data.
    • Poor generalization often leads to:
      • Overfitting: Performs well on training data but poorly on unseen data.
      • Underfitting: Performs poorly on both training and unseen data due to oversimplification.
  2. Bias-Variance Tradeoff

    • Achieving good generalization often involves balancing bias and variance:
      • Low bias ensures the model captures complex patterns.
      • Low variance ensures the model doesn’t overfit the training data.
  3. Evaluation Metrics

    • Generalization is measured using metrics like accuracy, precision, recall, or RMSE, evaluated on validation/test datasets.

Improving Generalization

  • Regularization: Adds constraints to the model (e.g., L1/L2 regularization).
  • Dropout: Randomly deactivates some neurons during training to prevent overfitting.
  • Cross-Validation: Validates the model's performance on multiple subsets of the data.
  • Early Stopping: Stops training when the validation error starts increasing.
  • Data Augmentation: Increases training data diversity to improve robustness.
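As a small illustration of two of these ideas (assumed synthetic data), the sketch below combines L2 regularization (Ridge) with cross-validation on a high-degree polynomial fit:

```python
# L2 regularization (Ridge) tames a degree-10 polynomial fit; cross_val_score
# checks generalization across several folds instead of a single split.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=40)

def degree_10(regressor):
    return make_pipeline(PolynomialFeatures(10), StandardScaler(), regressor)

for name, reg in [("no regularization", LinearRegression()),
                  ("ridge, alpha=1.0  ", Ridge(alpha=1.0))]:
    scores = cross_val_score(degree_10(reg), X, y, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {scores.mean():.3f}")
```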

Generalization Gap

The generalization gap is the difference between a model's performance on the training dataset and its performance on the validation or test dataset. It provides a measure of how well the model can generalize from the training data to unseen data.
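A minimal helper for measuring this gap (hypothetical function name; works with any fitted estimator that has a predict method):

```python
from sklearn.metrics import mean_squared_error

def generalization_gap(model, X_train, y_train, X_test, y_test):
    """Test error minus training error; a larger gap means worse generalization."""
    train_error = mean_squared_error(y_train, model.predict(X_train))
    test_error = mean_squared_error(y_test, model.predict(X_test))
    return test_error - train_error
```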