1. Overfitting to Training Data
- Nature of the Model: Polynomial models fit the data by adding higher-degree terms:
y = w_0 + w_1 x + w_2 x^2 + \cdots + w_n x^n
With higher degrees, the model can fit even minor noise in the data, leading to overfitting.
- The model performs well on the training set.
- It fails to generalize to unseen data (test/validation sets).
- Reason:
A high-degree polynomial has more parameters (weights) to optimize, making it sensitive to small changes in the data.
- Effect:
Small variations in the training data can result in drastically different polynomial curves.
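To make this concrete, here is a minimal sketch in NumPy (the sine-plus-noise data generator, sample sizes, and degrees are illustrative assumptions, not taken from the text above): training error falls as the degree grows, while test error rises.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Assumed toy target: a sine curve plus Gaussian noise.
    x = rng.uniform(-1, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)

x_train, y_train = make_data(30)
x_test, y_test = make_data(300)

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}: train MSE {mse(x_train, y_train):.3f}, "
          f"test MSE {mse(x_test, y_test):.3f}")
```

You should see the degree-15 fit push training MSE below the noise floor while its test MSE exceeds the degree-3 fit's: the classic train/test gap of overfitting.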
2. Increased Model Complexity
- Impact of Higher Degrees:
As the degree of the polynomial increases:
- The model gains flexibility to fit the training data.
- It loses stability when generalizing to new data.
- Result:
The added complexity yields a model with high variance.
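Variance can be measured directly with a short sketch (same assumed sine-plus-noise setup as above): refit each degree on many freshly drawn training sets and check how much the prediction at one fixed point spreads across fits.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 20)

def prediction_spread(degree, trials=200, x0=0.9):
    # Std. deviation of the prediction at x0 across `trials` refits,
    # each on an independent noisy sample of the same sine curve.
    preds = []
    for _ in range(trials):
        y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)
        preds.append(np.polyval(np.polyfit(x, y, degree), x0))
    return np.std(preds)

for degree in (1, 3, 6, 10):
    print(f"degree {degree:2d}: prediction spread at x = 0.9 is {prediction_spread(degree):.3f}")
```

The spread grows sharply with the degree; that spread is the model's variance.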
3. Lack of Smoothness in Predictions
- Nature of Polynomial Curves:
High-degree polynomial curves can oscillate wildly, especially near the edges of the data range.
- This behavior, known as the Runge phenomenon, is a classic symptom of overfitting.
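This is easy to reproduce with Runge's classic textbook example, interpolating 1 / (1 + 25x^2) at equally spaced points (a standard demonstration, assumed here rather than taken from the text above):

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + 25.0 * x ** 2)   # Runge's function

grid = np.linspace(-1, 1, 1001)

for degree in (5, 10, 15):
    nodes = np.linspace(-1, 1, degree + 1)        # equispaced interpolation nodes
    coeffs = np.polyfit(nodes, f(nodes), degree)  # interpolating polynomial through the nodes
    err = np.abs(np.polyval(coeffs, grid) - f(grid))
    print(f"degree {degree:2d}: max error {err.max():.2f} at x = {grid[err.argmax()]:+.2f}")
```

The worst error lands near x = ±1 and grows with the degree, exactly the edge oscillation described above.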
4. Bias-Variance Tradeoff
- Polynomial models reduce bias by fitting the training data well, but this comes at the cost of increased variance.
- The high variance makes predictions unstable and less reliable on unseen data.
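Both terms can be estimated numerically by decomposing the error at a single test point over many simulated training sets (a sketch; the true function, noise level, and evaluation point x0 = 0.5 are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 20)
true_f = lambda t: np.sin(2 * np.pi * t)   # assumed ground-truth curve
x0 = 0.5                                   # fixed evaluation point

for degree in (1, 3, 10):
    # Prediction at x0 from 500 fits, each on a fresh noisy training set.
    preds = np.array([
        np.polyval(np.polyfit(x, true_f(x) + rng.normal(0, 0.2, x.size), degree), x0)
        for _ in range(500)
    ])
    bias_sq = (preds.mean() - true_f(x0)) ** 2
    print(f"degree {degree:2d}: bias^2 {bias_sq:.4f}, variance {preds.var():.4f}")
```

Low degrees show a large bias^2 term with small variance; high degrees flip the balance, which is the tradeoff in action.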
Example
Dataset: Predicting house prices based on size.
- Linear Model: Assumes a straight-line relationship and underfits the data (high bias).
- 2nd Degree Polynomial: Fits the data better, reducing bias but slightly increasing variance.
- 10th Degree Polynomial: Captures all nuances in the training data, including noise, leading to high variance and overfitting.
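A sketch of this example on synthetic data (the gently curved "true" price function, the noise level, and the size rescaling are all assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample(n):
    size = rng.uniform(50, 250, n)                                 # size in m^2
    price = 1000 * size + 2 * size ** 2 + rng.normal(0, 2e4, n)    # assumed true curve + noise
    return (size - 150.0) / 100.0, price                           # rescale size for a stable fit

x_train, p_train = sample(15)
x_test, p_test = sample(200)

for degree in (1, 2, 10):
    coeffs = np.polyfit(x_train, p_train, degree)
    rmse = lambda x, p: np.sqrt(np.mean((np.polyval(coeffs, x) - p) ** 2))
    print(f"degree {degree:2d}: train RMSE {rmse(x_train, p_train):,.0f}, "
          f"test RMSE {rmse(x_test, p_test):,.0f}")
```

With only 15 training houses, you should see the degree-10 fit achieve the lowest training RMSE but a much higher test RMSE than the degree-2 fit, mirroring the bullets above.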
