PyTorch: Part 3 - Building Neural Networks

What is a Neural Network?

In Part 1 we learned about tensors, and in Part 2 we learned about autograd. Now let’s put them together to build neural networks.

A neural network is basically a mathematical function with adjustable parameters (called weights and biases). We adjust these parameters during training to make better predictions.

Think of it like this:

Input → Hidden Layers → Output

Each layer applies a transformation: output = activation(weights × input + bias)

Don’t worry if this sounds complex. We’ll build one step by step and it will make sense!

The nn.Module Class

In PyTorch, we build neural networks by creating a class that inherits from nn.Module.

Here’s the basic structure:

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        # Define layers here

    def forward(self, x):
        # Define how data flows through layers
        return x

Two important things:

__init__: Define all your layers here
forward: Define how data flows through the layers

Building Our First Network

Let’s build a simple network with:

Input: 10 features
Hidden layer: 5 neurons
Output: 2 classes

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        # Linear layer: input_features → output_features
        self.fc1 = nn.Linear(10, 5)   # 10 inputs → 5 outputs
        self.fc2 = nn.Linear(5, 2)    # 5 inputs → 2 outputs

    def forward(self, x):
        # x goes through first layer with ReLU activation
        x = torch.relu(self.fc1(x))
        # Then through second layer
        x = self.fc2(x)
        return x

# Create the network
model = SimpleNet()
print(model)

Output:

SimpleNet(
  (fc1): Linear(in_features=10, out_features=5, bias=True)
  (fc2): Linear(in_features=5, out_features=2, bias=True)
)

We just created a neural network! Here’s what it looks like:

Network Architecture Visualization

Let’s break down what each part does.

Understanding Linear Layers

nn.Linear(in_features, out_features) is the most common layer type. It’s also called a fully connected layer or dense layer.

What it does:

output = input × weights + bias

For nn.Linear(10, 5):

Takes input with 10 features
Produces output with 5 features
Has 10×5 = 50 weights + 5 biases = 55 parameters

# Check the number of parameters
layer = nn.Linear(10, 5)
print(f"Weights shape: {layer.weight.shape}")   # [5, 10]
print(f"Bias shape: {layer.bias.shape}")        # [5]
print(f"Total parameters: {layer.weight.numel() + layer.bias.numel()}")  # 55

Activation Functions

Activation functions add non linearity to our network. Without them, stacking multiple linear layers would just be one big linear function.

ReLU (Most Common)

ReLU is simple: if the value is negative, make it 0. Otherwise, keep it.

x = torch.tensor([-1.0, 0.0, 1.0, 2.0])
relu = torch.relu(x)
print("ReLU:", relu)  # tensor([0., 0., 1., 2.])

Sigmoid (Output 0 to 1)

Good for probabilities or binary classification.

x = torch.tensor([-1.0, 0.0, 1.0])
sigmoid = torch.sigmoid(x)
print("Sigmoid:", sigmoid)  # tensor([0.2689, 0.5000, 0.7311])

Softmax (For Classification)

Converts scores to probabilities that sum to 1.

logits = torch.tensor([2.0, 1.0, 0.1])
softmax = torch.softmax(logits, dim=0)
print("Softmax:", softmax)  # tensor([0.6590, 0.2424, 0.0986])
print("Sum:", softmax.sum())  # tensor(1.)

When to use what?

ReLU: Use in hidden layers (most common choice)
Sigmoid: Use for binary classification output (yes/no)
Softmax: Use for multi class classification output (choose one of many)

The Forward Pass

The forward method defines how data flows through the network.

model = SimpleNet()

# Create random input (batch_size=3, features=10)
x = torch.randn(3, 10)
print("Input shape:", x.shape)

# Forward pass
output = model(x)
print("Output shape:", output.shape)
print("Output:\n", output)

Output:

Input shape: torch.Size([3, 10])
Output shape: torch.Size([3, 2])
Output:
 tensor([[-0.3456,  0.2134],
        [ 0.1234, -0.5678],
        [-0.2341,  0.4567]], grad_fn=<AddmmBackward0>)

Notice:

We put in 3 samples with 10 features each
We got out 3 samples with 2 values each (our 2 classes)
The output has grad_fn because PyTorch is tracking gradients

Never call model.forward(x) directly! Always use model(x). This ensures PyTorch’s hooks and other features work correctly.

Common Layer Types

Here are other layer types you’ll encounter:

Convolutional Layer (for images)

# For image processing
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

Dropout (prevents overfitting)

# Randomly sets 50% of neurons to 0 during training
dropout = nn.Dropout(p=0.5)

Batch Normalization (stabilizes training)

# Normalizes layer outputs
bn = nn.BatchNorm1d(num_features=5)

Checking Model Parameters

You can see all trainable parameters:

model = SimpleNet()

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params}")

# See all parameters
for name, param in model.named_parameters():
    print(f"{name}: {param.shape}")

Output:

Total parameters: 67
fc1.weight: torch.Size([5, 10])
fc1.bias: torch.Size([5])
fc2.weight: torch.Size([2, 5])
fc2.bias: torch.Size([2])

Training vs Evaluation Mode

Neural networks behave differently during training and evaluation. Some layers like Dropout and BatchNorm act differently.

model = SimpleNet()

# Training mode (default)
model.train()
print(model.training)  # True

# Evaluation mode
model.eval()
print(model.training)  # False

Always set the correct mode!

Use model.train() when training
Use model.eval() when making predictions

Moving Model to GPU

If you have a GPU, you can speed up training:

# Check if GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Move model to GPU
model = SimpleNet()
model = model.to(device)

# Don't forget to move data too!
x = torch.randn(3, 10).to(device)
output = model(x)

Quick Reference

Method	What It Does
`model.parameters()`	Returns all trainable parameters
`model.train()`	Set to training mode
`model.eval()`	Set to evaluation mode
`model.to(device)`	Move model to CPU or GPU
`model.state_dict()`	Get all parameters as dictionary
`model.load_state_dict()`	Load saved parameters

Key Takeaways

Build networks by inheriting from nn.Module
Define layers in __init__, use them in forward
nn.Linear is the basic fully connected layer
ReLU is the most common activation function
Use model.train() for training, model.eval() for inference
Always use model(x) not model.forward(x)
Move model and data to GPU with .to(device)

What’s Next?

We now know how to build neural networks. But how do we actually train them? In Part 4, we’ll learn about the Training Loop. We’ll cover:

Loss functions
Optimizers
The complete training loop
Making predictions

See you in Part 4!

PyTorch: Part 3 - Building Neural Networks

What is a Neural Network?

The nn.Module Class

Building Our First Network

Understanding Linear Layers

Activation Functions

ReLU (Most Common)

Sigmoid (Output 0 to 1)

Softmax (For Classification)

The Forward Pass

Common Layer Types

Convolutional Layer (for images)

Dropout (prevents overfitting)

Batch Normalization (stabilizes training)

Checking Model Parameters

Training vs Evaluation Mode

Moving Model to GPU

Quick Reference

Key Takeaways

What’s Next?

📚 Series: PyTorch

Comments