Neural Network Representation
Table of contents
- Introduction
- Anatomy of a Neural Network
- Notation for Activations
- Notation Conventions
- Parameters for Each Layer
- What’s Next
- Key Takeaways
Introduction
In this lesson, we’ll explore the structure and notation of neural networks, focusing on a simple architecture with one hidden layer (called a 2-layer neural network). Understanding how to properly represent and describe neural networks is fundamental for implementing them.
Anatomy of a Neural Network
A neural network consists of organized layers of nodes (neurons). Let’s break down the components:
Three Types of Layers
Consider a neural network with 3 input features and 4 hidden units:
![A 2-layer neural network diagram showing three input nodes (x1, x2, x3) on the left as the input layer, four hidden nodes in the middle connected by arrows representing the hidden layer, and one output node on the right producing prediction ŷ. Handwritten annotations label each layer: input layer with a^[0] equals X, hidden layer with four activation nodes a1^[1] through a4^[1], and output layer with a^[2] equals ŷ. Additional notes show the weight matrices W^[1] and W^[2], bias vectors b^[1] and b^[2], and indicate this is a 2 layer neural network. All nodes are connected with directed edges showing forward propagation flow from left to right. Mathematical notation on the right shows the activation vector a^[1] containing four elements and emphasizes the layer indexing convention using square brackets.](/assets/images/deep-learning/neural-networks/week-3/2_layer_neural_network_visual.png)
- Input Layer (Layer 0)
  - Contains the input features: $x_1, x_2, x_3$
  - Stacked vertically as a column vector
  - Denoted as $a^{[0]} = X$
  - Not counted when describing network depth
- Hidden Layer (Layer 1)
  - Contains intermediate computations
  - In this example: 4 nodes (hidden units)
  - Called “hidden” because values aren’t directly observed in training data
  - We only see inputs ($X$) and outputs ($y$), not these intermediate values
- Output Layer (Layer 2)
  - Produces the final prediction $\hat{y}$
  - In this example: 1 node for binary classification
  - Generates the predicted value
Note: This is called a 2-layer neural network because we don’t count the input layer. The convention counts only layers with learnable parameters (hidden + output layers).
Notation for Activations
The term activations refers to the values that each layer computes and passes to the next layer.
Layer 0 (Input Layer)
\[a^{[0]} = X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\]
The input features can be denoted as either $X$ or $a^{[0]}$.
Layer 1 (Hidden Layer)
Each node in the hidden layer computes an activation:
\[a^{[1]} = \begin{bmatrix} a^{[1]}_1 \\ a^{[1]}_2 \\ a^{[1]}_3 \\ a^{[1]}_4 \end{bmatrix}\]
where:
- $a^{[1]}_1$ = activation of first hidden unit
- $a^{[1]}_2$ = activation of second hidden unit
- $a^{[1]}_3$ = activation of third hidden unit
- $a^{[1]}_4$ = activation of fourth hidden unit
In Python (NumPy), this is a $(4, 1)$ vector (4 rows, 1 column).
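To make the shapes concrete, here is a minimal NumPy sketch; the activation values are made up purely for illustration:

```python
import numpy as np

# Input features stacked as a column vector: a^[0], shape (3, 1)
a0 = np.array([[0.5],
               [1.2],
               [-0.3]])

# Hidden-layer activations a^[1], one entry per hidden unit, shape (4, 1)
a1 = np.array([[0.71],
               [0.02],
               [0.88],
               [0.40]])

print(a0.shape)   # (3, 1)
print(a1.shape)   # (4, 1)
print(a1[2, 0])   # a^[1]_3, the activation of the 3rd hidden unit
```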
Layer 2 (Output Layer)
\[a^{[2]} = \hat{y}\]
The output layer produces a single real number (for binary classification), which is our prediction.
Notation Conventions
Superscript Notation Summary
| Notation | Meaning | Example |
|---|---|---|
| $a^{[l]}$ | Activations from layer $l$ | $a^{[1]}$ = hidden layer activations |
| $a^{(i)}$ | Value for training example $i$ | $x^{(i)}$ = features of example $i$ |
| $a^{[l]}_j$ | Activation of unit $j$ in layer $l$ | $a^{[1]}_3$ = 3rd hidden unit |
Key distinction:
- Square brackets $[l]$: layer number
- Round brackets $(i)$: training example number
- Subscript $j$: unit/node number within a layer
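To see how these conventions map onto array indexing, here is a small NumPy sketch. It assumes, purely for illustration, that the training examples are stored as columns of a matrix (batching examples this way is not formally introduced in this lesson):

```python
import numpy as np

n_x, m = 3, 5                  # 3 input features, 5 training examples (illustrative)
X = np.random.randn(n_x, m)    # column i holds x^(i), the features of training example i

x_2 = X[:, 1:2]                # round brackets (i): features of the 2nd example, shape (3, 1)

a1 = np.random.randn(4, 1)     # square brackets [l]: activations of layer 1, a^[1]
a1_3 = a1[2, 0]                # subscript j: 3rd unit of layer 1, i.e. a^[1]_3
```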
Why Logistic Regression Had No Brackets
In logistic regression, we wrote $\hat{y} = a$ without a superscript because the model has only a single layer of computation, so there is no ambiguity about which layer an activation belongs to. With neural networks, we need the $[l]$ notation to distinguish between layers.
Parameters for Each Layer
Each layer with computations (hidden and output) has associated parameters:
Hidden Layer (Layer 1) Parameters
- $W^{[1]}$: weight matrix, shape $(4, 3)$
  - 4 rows: one for each hidden unit
  - 3 columns: one for each input feature
- $b^{[1]}$: bias vector, shape $(4, 1)$
  - 4 rows: one bias for each hidden unit
Output Layer (Layer 2) Parameters
- $W^{[2]}$: weight matrix, shape $(1, 4)$
  - 1 row: one output unit
  - 4 columns: one for each hidden unit
- $b^{[2]}$: bias scalar, shape $(1, 1)$
  - Single bias for the output
Parameter Dimensions Summary
For a network with $n_x$ input features, $n^{[1]}$ hidden units, and $n^{[2]}$ output units:
| Parameter | Dimensions | Example |
|---|---|---|
| $W^{[1]}$ | $(n^{[1]}, n_x)$ | $(4, 3)$ |
| $b^{[1]}$ | $(n^{[1]}, 1)$ | $(4, 1)$ |
| $W^{[2]}$ | $(n^{[2]}, n^{[1]})$ | $(1, 4)$ |
| $b^{[2]}$ | $(n^{[2]}, 1)$ | $(1, 1)$ |
We’ll explore these dimensions in more detail when implementing forward propagation.
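As a quick sanity check on the table above, here is a minimal NumPy sketch that creates parameters with exactly these shapes; the small random initialization is only a placeholder to make the shapes concrete, not a recommendation:

```python
import numpy as np

n_x, n_1, n_2 = 3, 4, 1   # input features, hidden units, output units

# Parameters for the hidden layer (layer 1)
W1 = np.random.randn(n_1, n_x) * 0.01   # shape (4, 3)
b1 = np.zeros((n_1, 1))                 # shape (4, 1)

# Parameters for the output layer (layer 2)
W2 = np.random.randn(n_2, n_1) * 0.01   # shape (1, 4)
b2 = np.zeros((n_2, 1))                 # shape (1, 1)

for name, p in [("W1", W1), ("b1", b1), ("W2", W2), ("b2", b2)]:
    print(name, p.shape)
# W1 (4, 3)
# b1 (4, 1)
# W2 (1, 4)
# b2 (1, 1)
```

Printing the shapes like this is an easy way to catch dimension mistakes before writing any forward-propagation code.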
What’s Next
Now that we understand the structure and notation of a neural network, the next step is to understand the computations: how does the network transform inputs $X$ all the way through to predictions $\hat{y}$? We’ll cover this in the next lesson on computing neural network output.
Key Takeaways
- A 2-layer neural network has one hidden layer (input layer not counted)
- Hidden layers are called “hidden” because their true values aren’t observed in training data
- Activations $a^{[l]}$ are the outputs computed by layer $l$
- Notation conventions:
- $[l]$ for layer number
- $(i)$ for training example
- $j$ subscript for unit within a layer
- Each layer has parameters $W^{[l]}$ (weights) and $b^{[l]}$ (bias)
- Parameter dimensions depend on the number of units in current and previous layers