2-Stage Backpropagation in Python

A practical, step-by-step tutorial explaining 2-Stage Backpropagation with PyTorch code examples for better convergence and generalization in training neural networks.

⚡ intermediate
⏱️ 25 minutes
👤 SuperML Team


📋 Prerequisites

  • Python
  • PyTorch Basics
  • Neural Networks

🎯 What You'll Learn

  • Understand the concept of 2-Stage Backpropagation
  • Implement 2-Stage Backpropagation in PyTorch
  • Learn to improve convergence and generalization in neural networks

Looking to improve your model’s performance and generalization? 🚀 This tutorial will guide you through 2-Stage Backpropagation in a practical, step-by-step manner using PyTorch, helping you understand and implement this advanced technique with clarity.

🔹 What is 2-Stage Backpropagation?

2-Stage Backpropagation is a training strategy for neural networks that can improve both convergence and generalization. The idea is simple: split your network into two parts, first train the feature-extraction stage, then train the classification stage on top of those (frozen) features, and finally fine-tune the whole model end to end.

import torch
import torch.nn as nn

class TwoStageNetwork(nn.Module):
    def __init__(self):
        super(TwoStageNetwork, self).__init__()
        self.stage1 = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 128)
        )
        self.stage2 = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = self.stage1(x)
        x = self.stage2(x)
        return x
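
As a quick sanity check, assuming MNIST-style inputs flattened to 784 features, you can confirm that the two stages chain together as expected:

model = TwoStageNetwork()
dummy_batch = torch.randn(32, 784)    # a batch of 32 flattened 28x28 images
logits = model(dummy_batch)
print(logits.shape)                   # torch.Size([32, 10])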

🔹 Stage 1: Learning Robust Features

The first stage is all about helping your network discover meaningful, robust features from your data. This part usually involves convolutional or dense layers that extract useful patterns and representations. In the sketch below, Stage 1 is pretrained with a simple self-supervised reconstruction objective: a small auxiliary decoder tries to rebuild the input from the 128-dimensional features, so the criterion you pass in should be a reconstruction loss such as nn.MSELoss().

def train_stage1(model, decoder, dataloader, optimizer, criterion, epochs):
    # Pretrain Stage 1 with a simple self-supervised reconstruction objective:
    # an auxiliary decoder (e.g. nn.Linear(128, 784)) tries to rebuild the input
    # from the Stage 1 features, and criterion is a reconstruction loss such as
    # nn.MSELoss(). The optimizer should cover both stage1 and decoder parameters.
    for epoch in range(epochs):
        for inputs, _ in dataloader:
            optimizer.zero_grad()
            features = model.stage1(inputs)
            reconstruction = decoder(features)
            loss = criterion(reconstruction, inputs)
            loss.backward()
            optimizer.step()

🔹 Stage 2: Final Classification

Now, the features your model learned in Stage 1 are put to work! In this stage, the network uses those features to perform the main classification task—typically using fully connected layers to make predictions.

def train_stage2(model, dataloader, optimizer, criterion, epochs):
    for epoch in range(epochs):
        for inputs, labels in dataloader:
            optimizer.zero_grad()
            # detach() stops gradients from flowing back into Stage 1,
            # so only the classification head is updated here
            features = model.stage1(inputs).detach()
            outputs = model.stage2(features)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

🔹 Freezing Learned Features

When training Stage 2, it’s helpful to “freeze” the parameters from Stage 1. This locks in the learned features and lets the classification layers do the learning. Note that train_stage2 above already detaches the Stage 1 output, so freezing mainly matters when you construct the optimizer or run the full model end to end.

def freeze_stage1(model):
    for param in model.stage1.parameters():
        param.requires_grad = False

def unfreeze_stage1(model):
    for param in model.stage1.parameters():
        param.requires_grad = True
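
As a small usage sketch (assuming a model instance and a learning rate of your choosing), freezing Stage 1 pairs naturally with building the Stage 2 optimizer over only the parameters that still require gradients:

freeze_stage1(model)

# Only the Stage 2 parameters remain trainable, so the optimizer skips Stage 1 entirely
stage2_optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3,
)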

🔹 Fine-Tuning for Best Performance

Once both stages are trained, it’s time for the magic touch: fine-tune the entire network together! Unfreeze Stage 1 and train all layers jointly, typically with a smaller learning rate so the pretrained features are refined rather than overwritten.

def finetune(model, dataloader, optimizer, criterion, epochs):
    unfreeze_stage1(model)
    for epoch in range(epochs):
        for inputs, labels in dataloader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
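
Putting the pieces together, here is a minimal end-to-end sketch of the three phases. The auxiliary decoder, the train_loader, the epoch counts, and the learning rates are illustrative assumptions rather than a fixed recipe:

model = TwoStageNetwork()
decoder = nn.Linear(128, 784)  # auxiliary decoder used only for Stage 1 pretraining

# Phase 1: self-supervised feature pretraining
stage1_optimizer = torch.optim.Adam(
    list(model.stage1.parameters()) + list(decoder.parameters()), lr=1e-3
)
train_stage1(model, decoder, train_loader, stage1_optimizer, nn.MSELoss(), epochs=5)

# Phase 2: train the classifier on frozen features
freeze_stage1(model)
stage2_optimizer = torch.optim.Adam(model.stage2.parameters(), lr=1e-3)
train_stage2(model, train_loader, stage2_optimizer, nn.CrossEntropyLoss(), epochs=5)

# Phase 3: fine-tune everything with a smaller learning rate
finetune_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
finetune(model, train_loader, finetune_optimizer, nn.CrossEntropyLoss(), epochs=3)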

🔹 Using Learning Rate Schedulers

Smart learning rate scheduling can make a big difference. Adjusting the learning rate at the right time helps your model converge faster and more reliably—let’s see how to set that up.

from torch.optim.lr_scheduler import StepLR

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = StepLR(optimizer, step_size=5, gamma=0.1)  # multiply the LR by 0.1 every 5 epochs

for epoch in range(epochs):
    train_epoch(model, dataloader, optimizer, criterion)  # train_epoch stands in for one pass of any training loop above
    scheduler.step()

🔹 Boosting Stage 1 with Data Augmentation

Want even better features? Try data augmentation during Stage 1! By transforming your input data in creative ways, your model learns to generalize and handle real-world variations.

from torch.utils.data import DataLoader
from torchvision import transforms

# Single-channel (MNIST-style) images, so we stick to brightness/contrast jitter
stage1_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

stage1_dataset = CustomDataset(transform=stage1_transforms)  # CustomDataset is a placeholder for your own Dataset
stage1_dataloader = DataLoader(stage1_dataset, batch_size=32, shuffle=True)

🔹 Stabilizing Training with Gradient Clipping

Ever had your training go haywire? Gradient clipping is a simple trick to keep your training stable—especially useful during fine-tuning when gradients can get out of control.

def train_with_gradient_clipping(model, dataloader, optimizer, criterion, clip_value):
    for inputs, labels in dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        # Rescale gradients so their global norm does not exceed clip_value
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)
        optimizer.step()

🔹 Avoiding Overfitting with Early Stopping

Don’t let your model overtrain! Early stopping helps you halt training at just the right moment, preventing overfitting and saving you time.

class EarlyStopping:
    def __init__(self, patience=5, min_delta=0):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.best_loss = float('inf')

    def __call__(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                return True
        return False
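
Here’s a minimal sketch of how you might wire this into the training flow, assuming a val_loader for validation data, a max_epochs upper bound, and reusing evaluate_model, which is defined later in this tutorial:

early_stopping = EarlyStopping(patience=5, min_delta=1e-4)

for epoch in range(max_epochs):
    finetune(model, train_loader, optimizer, criterion, epochs=1)  # one training pass per outer epoch
    _, val_loss = evaluate_model(model, val_loader, criterion)     # evaluate_model is defined below
    if early_stopping(val_loss):
        print(f"Stopping early after epoch {epoch + 1}")
        break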

🔹 Understanding What Your Network Learns

Curious about what your network is actually “seeing”? Visualizing feature maps gives you a peek into the inner workings of Stage 1, showing the features your model has learned.

import matplotlib.pyplot as plt

def visualize_feature_maps(model, input_image):
    # Assumes a convolutional Stage 1 whose output has shape (1, C, H, W);
    # the fully connected TwoStageNetwork above returns a flat vector, so use
    # this with a model like the ResNet-based network shown later.
    model.eval()
    with torch.no_grad():
        features = model.stage1(input_image.unsqueeze(0))

    fig, axes = plt.subplots(4, 4, figsize=(12, 12))
    for i, ax in enumerate(axes.flat):
        if i < features.shape[1]:
            ax.imshow(features[0, i].cpu().numpy(), cmap='viridis')
            ax.axis('off')
    plt.tight_layout()
    plt.show()

🔹 Track Your Training Progress

Stay on top of your training! Monitoring your model’s progress (for example, with TensorBoard) helps you spot issues early and keep your experiments organized.

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/experiment_1')

def train_with_monitoring(model, dataloader, optimizer, criterion, epoch):
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(dataloader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        if i % 100 == 99:
            writer.add_scalar('training loss',
                              running_loss / 100,
                              epoch * len(dataloader) + i)
            running_loss = 0.0
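
While training runs, you can launch the dashboard from a terminal with tensorboard --logdir=runs and watch the loss curve update live.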

🔹 Using Transfer Learning with 2-Stage Backprop

Want to leverage pre-trained models? You can combine 2-Stage Backpropagation with transfer learning—using a powerful, pre-trained feature extractor as your Stage 1.

import torchvision.models as models

class TransferTwoStageNetwork(nn.Module):
    def __init__(self, num_classes):
        super(TransferTwoStageNetwork, self).__init__()
        resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pretrained=True is deprecated in recent torchvision
        self.stage1 = nn.Sequential(*list(resnet.children())[:-1])
        self.stage2 = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.stage1(x)
        x = x.view(x.size(0), -1)
        x = self.stage2(x)
        return x
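
A short usage sketch, assuming a train_loader of 3-channel images and reusing the freezing helper from earlier; once the classifier converges, you can call unfreeze_stage1 and fine-tune with a smaller learning rate:

model = TransferTwoStageNetwork(num_classes=10)
freeze_stage1(model)  # keep the pre-trained ResNet features fixed at first

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.stage2.parameters(), lr=1e-3)

for inputs, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()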

🔹 Handling Class Imbalance

Imbalanced datasets? No problem! With a few tweaks to your loss function, you can help your model learn fairly from all classes—even the rare ones.

def calculate_class_weights(dataset, num_classes):
    # Weight each class by the inverse of its frequency so rare classes count more
    class_counts = torch.zeros(num_classes)
    for _, label in dataset:
        class_counts[label] += 1
    return 1.0 / class_counts

class_weights = calculate_class_weights(train_dataset, num_classes=10)
criterion = nn.CrossEntropyLoss(weight=class_weights)

def train_with_weighted_loss(model, dataloader, optimizer, criterion):
    for inputs, labels in dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

🔹 Evaluating Your Model

After all that work, it’s time to see how your model performs. Let’s evaluate accuracy and loss on a test set to get a clear picture of its real-world power.

def evaluate_model(model, test_loader, criterion):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0
    
    with torch.no_grad():
        for inputs, labels in test_loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()
    
    accuracy = 100 * correct / total
    average_loss = total_loss / len(test_loader)
    
    return accuracy, average_loss
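
For example, assuming a test_loader built from your held-out test set:

test_accuracy, test_loss = evaluate_model(model, test_loader, criterion)
print(f"Test accuracy: {test_accuracy:.2f}% | Test loss: {test_loss:.4f}")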

🔹 Want to Dive Deeper?

Ready to keep learning? Check out these resources to explore 2-Stage Backpropagation and related deep learning techniques:

  1. “Deep Learning” by Goodfellow, Bengio, and Courville - Available at: https://www.deeplearningbook.org/
  2. “Curriculum Learning” by Bengio et al. (2009) - ArXiv: https://arxiv.org/abs/0904.2425
  3. “Progressive Neural Networks” by Rusu et al. (2016) - ArXiv: https://arxiv.org/abs/1606.04671
  4. “An Overview of Multi-Task Learning in Deep Neural Networks” by Ruder (2017) - ArXiv: https://arxiv.org/abs/1706.05098