Neural Networks Overview
Table of contents
- Introduction
- From Logistic Regression to Neural Networks
- Neural Network Structure
- How Each Node Works
- Neural Network Computation
- Matrix Dimensions
- Comparison: Logistic Regression vs Neural Network
- Key Takeaways
Introduction
In the previous week, we learned about logistic regression and how to vectorize it for efficiency. Now we’ll extend these concepts to build neural networks with hidden layers.
Goal: Understand the basic structure and components of a neural network.
From Logistic Regression to Neural Networks
Logistic Regression Review
Recall logistic regression:
\(z = w^T x + b\) \(a = \sigma(z)\)
Components:
- Input: $x$
- Parameters: $w$, $b$
- Linear combination: $z$
- Activation: $a = \sigma(z)$
- Output: $\hat{y} = a$
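To make this concrete, here is a minimal NumPy sketch of the logistic regression forward step (the values and sizes below are illustrative assumptions, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

# Illustrative example with 3 input features
x = np.array([0.5, -1.2, 3.0])   # input features
w = np.array([0.1, 0.4, -0.2])   # weights
b = 0.3                          # bias

z = np.dot(w, x) + b             # linear combination: z = w^T x + b
a = sigmoid(z)                   # activation: a = sigma(z) = y-hat
```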
What is a Neural Network?
A neural network is essentially many logistic regression units stacked together.
Key idea: Instead of going directly from input to output, we add hidden layers that learn intermediate representations.
Neural Network Structure
![Neural network architecture diagram showing three representations: top shows a simple single-layer network with three inputs x1, x2, x3 connecting to output ŷ equals a; middle shows a two-layer network with three input nodes x1, x2, x3 connecting to hidden layer nodes marked with superscript [1], which connect to output ŷ equals a superscript [1]; bottom shows detailed computational flow with layers labeled, displaying forward propagation equations z[1] = W[1]x + b[1], a[1] = σ(z[1]), z[2] = W[2]a[1] + b[2], a[2] = σ(z[2]), and backward propagation with red arrows indicating derivatives dz, da flowing from loss function L(a,y) back through the network. Annotations in blue indicate parameters W[1], b[1], W[2], b[2] at each layer. Credit line shows Andrew Ng name at bottom right. The diagram illustrates progressive complexity from single neuron to multi-layer network architecture.](/assets/images/deep-learning/neural-networks/week-3/neural_network_architecture_diagram.png)
Single Hidden Layer Network
A neural network with one hidden layer has three layers of nodes (by convention it is called a 2-layer network, since the input layer is not counted):
- Input Layer: Contains the input features
- Hidden Layer: Intermediate layer that learns representations
- Output Layer: Produces the final prediction
Example Architecture
Input Layer (Layer 0):
- Features: $x_1, x_2, x_3$
Hidden Layer (Layer 1):
- Nodes: $a^{[1]}_1, a^{[1]}_2, a^{[1]}_3, a^{[1]}_4$
Output Layer (Layer 2):
- Node: $a^{[2]} = \hat{y}$
Notation Convention
Superscript $[l]$: Indicates layer number
- $a^{[1]}$ = activations in layer 1 (hidden layer)
- $a^{[2]}$ = activations in layer 2 (output layer)
- $W^{[1]}$, $b^{[1]}$ = parameters for layer 1
- $W^{[2]}$, $b^{[2]}$ = parameters for layer 2
Superscript $(i)$: Indicates training example number
- $x^{(i)}$ = $i$-th training example
Subscript: Indicates node/unit number
- $a^{[1]}_1$ = first node in layer 1
- $a^{[1]}_2$ = second node in layer 1
How Each Node Works
Each node in a neural network performs two steps:
Step 1: Linear Combination
\[z = w^T x + b\]
Step 2: Activation Function
\[a = \sigma(z)\]
For hidden layer node $j$:
\(z^{[1]}_j = w^{[1]T}_j x + b^{[1]}_j\) \(a^{[1]}_j = \sigma(z^{[1]}_j)\)
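In code, a single hidden node is just this two-step computation with that node's own parameters. A minimal sketch, assuming 3 input features and 4 hidden units (sizes chosen for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, n_1 = 3, 4                    # assumed sizes: 3 features, 4 hidden units
x = np.random.randn(n_x)           # one training example
W1 = np.random.randn(n_1, n_x)     # row j holds w^[1]T_j for hidden node j
b1 = np.zeros(n_1)

j = 0                              # first hidden node, i.e. a^[1]_1
z1_j = np.dot(W1[j], x) + b1[j]    # step 1: linear combination
a1_j = sigmoid(z1_j)               # step 2: activation
```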
Why Hidden Layers?
Problem with logistic regression: Can only learn linear decision boundaries.
Solution with hidden layers:
- Each hidden layer learns more complex features
- Combines input features in non-linear ways
- Enables learning of complex patterns
Example:
- Input: Raw pixel values
- Hidden layer: Learns edges, simple shapes
- Output: Classifies the image
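A classic illustration of this point (not from the lecture; the weights below are hand-picked for clarity, not learned): XOR is not linearly separable, so a single logistic regression unit cannot fit it, but two hidden units combined by an output unit can:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hand-picked weights: hidden unit 1 acts like OR, hidden unit 2 like AND
W1 = np.array([[20., 20.],    # OR-like unit
               [20., 20.]])   # AND-like unit
b1 = np.array([-10., -30.])
W2 = np.array([[20., -20.]])  # output: OR AND (NOT AND) = XOR
b2 = np.array([-10.])

for x in ([0., 0.], [0., 1.], [1., 0.], [1., 1.]):
    a1 = sigmoid(W1 @ np.array(x) + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    print(x, np.round(a2, 3))  # prints ~0, ~1, ~1, ~0: a non-linear boundary
```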
Neural Network Computation
Forward Propagation
Computing hidden layer:
For each node $j$ in hidden layer:
\(z^{[1]}_j = w^{[1]T}_j x + b^{[1]}_j\) \(a^{[1]}_j = \sigma(z^{[1]}_j)\)
Computing output:
\(z^{[2]} = w^{[2]T} a^{[1]} + b^{[2]}\) \(a^{[2]} = \sigma(z^{[2]}) = \hat{y}\)
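Putting the two layers together, a node-by-node sketch of the full forward pass for one example (sizes are assumptions; the parameters here are random, not trained):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x, n_1 = 3, 4                       # assumed: 3 features, 4 hidden units
x = np.random.randn(n_x)
W1, b1 = np.random.randn(n_1, n_x), np.zeros(n_1)
W2, b2 = np.random.randn(1, n_1), np.zeros(1)

# Hidden layer, one node at a time
a1 = np.zeros(n_1)
for j in range(n_1):
    z1_j = np.dot(W1[j], x) + b1[j]   # z^[1]_j = w^[1]T_j x + b^[1]_j
    a1[j] = sigmoid(z1_j)             # a^[1]_j = sigma(z^[1]_j)

# Output layer
z2 = W2 @ a1 + b2                     # z^[2] = W^[2] a^[1] + b^[2]
y_hat = sigmoid(z2)                   # a^[2] = y-hat
```

The vectorized form below replaces the explicit loop with a single matrix product.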
Vectorized Computation
Instead of computing each node separately, we can vectorize. Stacking the row vectors $w^{[1]T}_j$ into a matrix $W^{[1]}$ handles all hidden nodes at once, and stacking training examples as columns of $X$ handles all examples at once.
Hidden layer (all nodes at once):
\(Z^{[1]} = W^{[1]} X + b^{[1]}\) \(A^{[1]} = \sigma(Z^{[1]})\)
Output layer:
\(Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}\) \(A^{[2]} = \sigma(Z^{[2]})\)
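A vectorized sketch of the same forward pass over a batch of $m$ examples (sizes assumed; NumPy broadcasting adds $b^{[l]}$ to every column):

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

n_x, n_1, m = 3, 4, 5                 # assumed sizes
X = np.random.randn(n_x, m)           # m examples stacked as columns
W1, b1 = np.random.randn(n_1, n_x), np.zeros((n_1, 1))
W2, b2 = np.random.randn(1, n_1), np.zeros((1, 1))

Z1 = W1 @ X + b1                      # (n_1, m); b1 broadcasts across columns
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2                     # (1, m)
A2 = sigmoid(Z2)                      # y-hat for all m examples at once
```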
Matrix Dimensions
Understanding dimensions is crucial for implementation.
For One Training Example
Input: $x \in \mathbb{R}^{n_x}$ (column vector)
Hidden layer:
- $W^{[1]} \in \mathbb{R}^{n^{[1]} \times n_x}$ where $n^{[1]}$ = number of hidden units
- $b^{[1]} \in \mathbb{R}^{n^{[1]}}$
- $z^{[1]} \in \mathbb{R}^{n^{[1]}}$
- $a^{[1]} \in \mathbb{R}^{n^{[1]}}$
Output layer:
- $W^{[2]} \in \mathbb{R}^{1 \times n^{[1]}}$ (for binary classification)
- $b^{[2]} \in \mathbb{R}$
- $z^{[2]} \in \mathbb{R}$
- $a^{[2]} \in \mathbb{R}$
For $m$ Training Examples
We stack examples horizontally (as columns):
Input: $X \in \mathbb{R}^{n_x \times m}$
Hidden layer:
- $Z^{[1]} \in \mathbb{R}^{n^{[1]} \times m}$
- $A^{[1]} \in \mathbb{R}^{n^{[1]} \times m}$
Output layer:
- $Z^{[2]} \in \mathbb{R}^{1 \times m}$
- $A^{[2]} \in \mathbb{R}^{1 \times m}$
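Dimension mismatches are a common implementation bug, so one useful habit is to assert the expected shapes. A self-contained sketch (repeating the batched forward pass above with assumed sizes):

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

n_x, n_1, m = 3, 4, 5                 # assumed sizes for illustration

X = np.random.randn(n_x, m)           # inputs: one column per example
W1, b1 = np.random.randn(n_1, n_x), np.zeros((n_1, 1))
W2, b2 = np.random.randn(1, n_1), np.zeros((1, 1))

Z1 = W1 @ X + b1
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

assert Z1.shape == A1.shape == (n_1, m)   # hidden layer: (n^[1], m)
assert Z2.shape == A2.shape == (1, m)     # output layer: (1, m)
```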
Comparison: Logistic Regression vs Neural Network
| Aspect | Logistic Regression | Neural Network |
|---|---|---|
| Layers | 2 (input, output) | 3+ (input, hidden(s), output) |
| Parameters | $w$, $b$ | $W^{[1]}$, $b^{[1]}$, $W^{[2]}$, $b^{[2]}$, … |
| Complexity | Linear decision boundary | Non-linear decision boundaries |
| Representations | Uses raw features | Learns feature representations |
| Computation | Single step | Multiple steps (layers) |
Key Takeaways
- Neural networks stack multiple logistic regression-like units
- Hidden layers learn intermediate representations
- Notation: Use $[l]$ for layer number, $(i)$ for example number
- Each node: Computes $z = w^T x + b$, then $a = \sigma(z)$
- Forward propagation: Compute activations layer by layer
- Vectorization: Process all examples and nodes simultaneously
- More layers: Enable learning of more complex patterns
- Matrix dimensions: Critical for correct implementation
Remember: A neural network is just organized logistic regression units that learn hierarchical representations!