Understanding the bias–variance trade-off is essential for you as a statistician, because it comes up all the time in your work, especially in exploratory work. Keeping it in mind will help you avoid over- or underfitting your models.
Wikipedia explains the principle very well:
“In statistics and machine learning, the bias–variance tradeoff is the property of a model that the variance of the parameter estimates across samples can be reduced by increasing the bias in the estimated parameters. The bias–variance dilemma or bias–variance problem is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set:
- The bias error is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
- The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
This trade-off is universal: It has been shown that a model that is asymptotically unbiased must have unbounded variance”
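To make the quoted definition concrete, here is a minimal Python sketch in a deliberately simple setting that is an assumption for illustration, not something taken from the episode. We estimate the mean μ of a normal distribution with the shrunken estimator c·x̄, where x̄ is the sample mean and 0 < c ≤ 1. Shrinking (c < 1) adds bias (c − 1)μ but cuts the variance across samples to c²σ²/n. The function name `simulate_shrinkage` and all default parameter values are hypothetical choices.

```python
import random
import statistics


def simulate_shrinkage(mu=2.0, sigma=3.0, n=10, shrink=1.0, reps=20000, seed=1):
    """Monte Carlo estimate of bias, variance and MSE of the estimator
    shrink * sample_mean, using `reps` independent samples of size n
    drawn from a normal distribution N(mu, sigma^2)."""
    rng = random.Random(seed)  # fixed seed: every call sees the same samples
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(shrink * statistics.fmean(sample))
    mean_est = statistics.fmean(estimates)
    bias = mean_est - mu  # systematic error of the estimator
    variance = statistics.pvariance(estimates, mu=mean_est)  # spread across samples
    mse = statistics.fmean((e - mu) ** 2 for e in estimates)  # equals bias^2 + variance
    return bias, variance, mse


# Compare no shrinkage, moderate shrinkage and heavy shrinkage.
for c in (1.0, 0.8, 0.5):
    bias, var, mse = simulate_shrinkage(shrink=c)
    print(f"shrink={c:.1f}  bias^2={bias**2:.3f}  variance={var:.3f}  MSE={mse:.3f}")
```

With these defaults, theory gives bias² = ((c − 1)μ)² and variance c²σ²/n, i.e. roughly 0 and 0.9 for c = 1, 0.16 and 0.576 for c = 0.8, and 1.0 and 0.225 for c = 0.5, so the simulated values should land close to those. A little bias (c = 0.8) lowers the total error below that of the unbiased estimator, while too much bias (c = 0.5) makes it worse again: the trade-off in miniature.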
In this episode, we dive into three examples of it:
- Subgroup analyses
- Cluster analysis
- Regression analysis
Wikipedia: Bias–variance tradeoff