Bias / Variance
Table of contents
- Introduction
- Visual Intuition: The Three Classifiers
- Diagnosing Bias and Variance with Error Metrics
- Four Classic Scenarios
- Decision Framework
- High Bias + High Variance: How Is This Possible?
- The Role of Bayes Error
- Summary: Two-Step Diagnostic Process
- Important Caveats
- Key Takeaways

Introduction
Understanding bias and variance is a critical skill that separates good machine learning practitioners from great ones. While the concepts are easy to learn, they’re surprisingly difficult to master - there are nuances that reveal themselves only with experience.
The Deep Learning Era Shift
In traditional machine learning, people often discussed the “bias-variance tradeoff” - the idea that reducing bias increases variance and vice versa. However, in the deep learning era:
We still talk about bias and variance, but the tradeoff has become less restrictive.
With modern techniques (more data, regularization, bigger networks), we can often reduce bias without increasing variance, or reduce variance without increasing bias. Let’s explore how to diagnose and fix these issues.
Visual Intuition: The Three Classifiers
2D Classification Example
Consider a binary classification problem with features $x_1$ and $x_2$:
[Figure: three decision boundaries on the same data: a straight line that underfits (left), a smooth curve that fits well (middle), and a highly wiggly boundary that overfits (right)]
Characteristics:
| Classifier Type | Complexity | Fits Training Data | Generalizes Well | Problem |
|---|---|---|---|---|
| High Bias | Too simple (e.g., linear) | ❌ No | N/A | Underfitting |
| Just Right | Appropriate | ✅ Yes | ✅ Yes | None |
| High Variance | Too complex | ✅ Yes | ❌ No | Overfitting |
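To make the table concrete, here is a minimal sketch (NumPy only, on synthetic data invented for illustration) that fits polynomials of three degrees to the same noisy quadratic data - a regression analogue of the three classifiers. Degree 1 underfits, degree 2 is about right, and degree 15 overfits:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: quadratic trend plus noise, split into train and dev
x_train = rng.uniform(-3, 3, 30)
y_train = x_train**2 + rng.normal(0, 1, 30)
x_dev = rng.uniform(-3, 3, 30)
y_dev = x_dev**2 + rng.normal(0, 1, 30)

for degree in (1, 2, 15):
    coefs = np.polyfit(x_train, y_train, degree)  # fit polynomial of this degree
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    dev_mse = np.mean((np.polyval(coefs, x_dev) - y_dev) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:5.2f}, dev MSE {dev_mse:5.2f}")
```

Expect the degree-1 fit to do poorly on both splits (high bias) and the degree-15 fit to reach near-zero training error while doing much worse on the dev set (high variance).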
The Challenge in High Dimensions
In 2D, you can visualize the decision boundary and see bias/variance issues directly. But in high-dimensional problems (e.g., cat classification with millions of pixels), you can’t plot the data.
Solution: Use quantitative metrics instead of visual inspection.
Diagnosing Bias and Variance with Error Metrics
The Two Key Numbers
To diagnose bias and variance, compare these errors:
- Training set error: How well does your model fit the training data?
- Dev set error: How well does your model generalize to new data?
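In code, these two numbers are just misclassification rates on each split. A minimal sketch, assuming hard 0/1 predictions (the toy arrays below are invented purely to show the computation; in practice they would come from `model.predict` on your train and dev sets):

```python
import numpy as np

def error_rate(y_pred, y_true):
    """Fraction of examples the classifier gets wrong."""
    return float(np.mean(np.asarray(y_pred) != np.asarray(y_true)))

# Toy labels/predictions for illustration only
y_train_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_train_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])  # 1 mistake -> 10%
y_dev_true = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 1])
y_dev_pred = np.array([0, 1, 0, 1, 1, 0, 1, 1, 0, 1])    # 3 mistakes -> 30%

print(f"train error: {error_rate(y_train_pred, y_train_true):.0%}")  # 10%
print(f"dev error:   {error_rate(y_dev_pred, y_dev_true):.0%}")      # 30%
```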
Assumptions for This Analysis
Important: This analysis assumes that human-level performance (or Bayes error) is nearly 0%.
For tasks like recognizing cats in clear photos, humans achieve ~0% error. We’ll discuss more complex cases later.
Four Classic Scenarios
Scenario 1: High Variance
| Metric | Value | Interpretation |
|---|---|---|
| Training error | 1% | Excellent fit to training data |
| Dev error | 11% | Poor generalization |
| Diagnosis | High Variance | Overfitting |
Analysis:
- Model performs well on training set (1% error)
- Performance degrades significantly on dev set (11% error)
- Gap of 10% indicates the model memorized training data rather than learning general patterns
Visual analogy: This is like the rightmost plot - the complex, wiggly decision boundary.
Scenario 2: High Bias
| Metric | Value | Interpretation |
|---|---|---|
| Training error | 15% | Poor fit even to training data |
| Dev error | 16% | Slightly worse on dev set |
| Diagnosis | High Bias | Underfitting |
Analysis:
- Model can’t even fit the training set well (15% error)
- Dev set performance is only slightly worse (16% error)
- Small gap (1%) means model is consistently underperforming
- Model is too simple to capture underlying patterns
Visual analogy: This is like the leftmost plot - the straight line that doesn’t fit the data.
Scenario 3: High Bias AND High Variance (Worst Case!)
| Metric | Value | Interpretation |
|---|---|---|
| Training error | 15% | Poor fit to training data |
| Dev error | 30% | Much worse on dev set |
| Diagnosis | High Bias + High Variance | Worst of both worlds |
Analysis:
- Model doesn’t fit training data well (15% error) → High bias
- Model generalizes even worse (30% error) → High variance
- This happens when the model underfits some regions of the input space while overfitting others (e.g., it misses the overall trend but chases a few outliers)
How is this possible? See the next section for explanation.
Scenario 4: Low Bias AND Low Variance (Ideal!)
| Metric | Value | Interpretation |
|---|---|---|
| Training error | 0.5% | Excellent fit to training data |
| Dev error | 1% | Excellent generalization |
| Diagnosis | Low Bias + Low Variance | Optimal performance |
Analysis:
- Model fits training data nearly perfectly (0.5% error)
- Model generalizes well (1% error)
- Small gap (0.5%) indicates good generalization
- This is your goal!
Decision Framework
Here’s a quick decision tree for diagnosis:
\[\begin{align} \text{Training error high?} &\implies \text{High Bias (Underfitting)} \\ \text{Dev error} \gg \text{Training error?} &\implies \text{High Variance (Overfitting)} \\ \text{Both conditions true?} &\implies \text{High Bias + High Variance} \\ \text{Both conditions false?} &\implies \text{Low Bias + Low Variance (Good!)} \end{align}\]
Concrete Thresholds (Cat Classification Example)

Assuming human-level performance ≈ 0%:
| Training Error | Dev Error | Dev - Train Gap | Diagnosis |
|---|---|---|---|
| < 2% | < 3% | < 1% | ✅ Low bias, low variance |
| < 2% | > 5% | > 3% | ⚠️ Low bias, high variance |
| > 5% | Similar to train | < 1% | ⚠️ High bias, low variance |
| > 5% | Much higher | > 5% | 🚫 High bias, high variance |
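The decision tree and thresholds translate directly into a small helper. A sketch, still assuming Bayes error ≈ 0% (the cutoffs mirror the table above but are judgment calls, not universal constants), applied to the four scenarios from the previous section:

```python
def diagnose(train_error, dev_error, bias_cutoff=0.05, variance_cutoff=0.02):
    """Classify a model's failure mode from its train/dev errors (Bayes error ~ 0)."""
    high_bias = train_error > bias_cutoff
    high_variance = (dev_error - train_error) > variance_cutoff
    if high_bias and high_variance:
        return "high bias + high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "low bias, low variance"

# The four classic scenarios: (train error, dev error)
for train, dev in [(0.01, 0.11), (0.15, 0.16), (0.15, 0.30), (0.005, 0.01)]:
    print(f"train {train:.1%}, dev {dev:.1%} -> {diagnose(train, dev)}")
```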
High Bias + High Variance: How Is This Possible?
2D Example (Appears Contrived)
Imagine a classifier that is:
- Mostly linear (underfits most of the data) → High bias
- Extremely flexible in some regions (overfits noise/outliers) → High variance
[Figure: a mostly linear decision boundary with small local wiggles where it bends to fit individual outliers]
Characteristics:
- Linear portion underfits the quadratic shape (high bias)
- Small wiggles overfit individual noisy examples (high variance)
- Needs a smooth quadratic curve, not a linear function with bumps
Why This Matters in High Dimensions
While this 2D example might seem contrived, in high-dimensional problems this is quite common:
Example: Image classification
- Model might be too simple for certain features (underfitting face detection) → High bias
- Model might be too complex for other features (overfitting background patterns) → High variance
- Different parts of the input space can have different bias/variance characteristics
In neural networks:
- Early layers might underfit (high bias)
- Later layers might overfit (high variance)
- Different neurons can specialize in different ways
The Role of Bayes Error
What is Bayes Error?
Bayes error (or optimal error) is the best possible error rate achievable by any classifier, even with infinite data. It represents irreducible error due to:
- Label noise (mislabeled examples)
- Overlapping classes (impossible to distinguish)
- Information loss (insufficient features)
- Inherent randomness
When Bayes Error Affects Your Analysis
Standard assumption: Bayes error ≈ 0% (humans can perform the task perfectly)
But what if the task is inherently difficult?
Example: Blurry image classification
If images are so blurry that even humans misclassify 15% of them (~15% error):
| Metric | Value | Standard Analysis | Corrected Analysis |
|---|---|---|---|
| Bayes error | 15% | (Assumed 0%) | (Actual) |
| Training error | 15% | High bias? | Actually optimal! |
| Dev error | 16% | High bias? | Low bias, low variance |
Key insight: Compare your errors to Bayes error, not to 0%.
Practical Implications
When Bayes error is significant:
\[\text{Bias} = \text{Training error} - \text{Bayes error}\] \[\text{Variance} = \text{Dev error} - \text{Training error}\]
Example with Bayes error = 15%:
Training error: 15%, Dev error: 16%
- Bias = 15% - 15% = 0% (low bias!)
- Variance = 16% - 15% = 1% (low variance!)
We’ll cover this in more detail in later lessons on human-level performance.
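In code, correcting for Bayes error is a one-line change: measure bias against the best achievable error instead of 0%. A sketch using the blurry-image numbers above:

```python
def decompose(train_error, dev_error, bayes_error=0.0):
    """Split error into a bias component and a variance component."""
    bias = train_error - bayes_error    # gap to the best achievable error
    variance = dev_error - train_error  # gap between dev and train
    return bias, variance

bias, variance = decompose(train_error=0.15, dev_error=0.16, bayes_error=0.15)
print(f"bias: {bias:.0%}, variance: {variance:.0%}")  # bias: 0%, variance: 1%
```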
Summary: Two-Step Diagnostic Process
Step 1: Check Training Error (Measures Bias)
\[\text{Training error vs Bayes error} \implies \text{Bias assessment}\]
- Training error close to Bayes error: Low bias ✅
- Training error much higher than Bayes error: High bias ⚠️
Step 2: Check Dev Error Gap (Measures Variance)
\[\text{Dev error vs Training error} \implies \text{Variance assessment}\]
- Small gap (< 1-2%): Low variance ✅
- Large gap (≫ 2%): High variance ⚠️
Combined Diagnosis Table
| Train Error | Dev - Train Gap | Diagnosis | Action Needed |
|---|---|---|---|
| Low | Low | ✅ Good model | Deploy! |
| Low | High | 🟡 High variance | More data, regularization |
| High | Low | 🟡 High bias | Bigger model, more features |
| High | High | 🔴 Both problems | Address bias first |
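Read as a playbook, the table combines with the Bayes-corrected checks into one lookup. A sketch (actions condensed from the table; cutoffs again illustrative):

```python
# (high_bias, high_variance) -> next step, per the table above
ACTIONS = {
    (False, False): "Good model: deploy",
    (False, True): "High variance: get more data or add regularization",
    (True, False): "High bias: try a bigger model or more features",
    (True, True): "Both: address bias first, then re-check variance",
}

def next_step(train_error, dev_error, bayes_error=0.0,
              bias_cutoff=0.02, variance_cutoff=0.02):
    high_bias = (train_error - bayes_error) > bias_cutoff
    high_variance = (dev_error - train_error) > variance_cutoff
    return ACTIONS[(high_bias, high_variance)]

print(next_step(0.15, 0.30))                    # both problems: address bias first
print(next_step(0.15, 0.16, bayes_error=0.15))  # good model once Bayes-corrected
```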
Important Caveats
This analysis assumes:
- ✅ Bayes error is small (task is feasible)
- ✅ Train and dev sets drawn from same distribution (no distribution shift)
If either assumption is violated, you need more sophisticated analysis (covered in later lessons).
Key Takeaways
- Critical skill: All great ML practitioners deeply understand bias and variance
- Two metrics: Training error and dev error tell you everything you need
- High bias (underfitting): Model too simple, can’t fit training data well
- High variance (overfitting): Model too complex, doesn’t generalize to dev set
- Both possible: High-dimensional models can have high bias AND high variance
- Training error: Measures how well you fit the data (bias indicator)
- Dev-train gap: Measures how well you generalize (variance indicator)
- Bayes error matters: Compare to optimal error, not always 0%
- Blurry images: When task is inherently hard, Bayes error is high
- Distribution assumption: Train and dev must be from same distribution
- Weaker tradeoff: Deep learning techniques can often reduce bias and variance independently
- Visual intuition: Linear = high bias, wiggly = high variance, smooth curve = just right
- High-dim complexity: Bias/variance can vary across different input regions
- Diagnostic first: Always diagnose before trying to fix