📋 Prerequisites
- Python
- PyTorch Basics
- Neural Networks
🎯 What You'll Learn
- Understand the concept of 2-Stage Backpropagation
- Implement 2-Stage Backpropagation in PyTorch
- Learn to improve convergence and generalization in neural networks
Looking to improve your model’s performance and generalization? 🚀 This tutorial will guide you through 2-Stage Backpropagation in a practical, step-by-step manner using PyTorch, helping you understand and implement this advanced technique with clarity.
🔹 What is 2-Stage Backpropagation?
2-Stage Backpropagation is a powerful method for training neural networks that helps boost convergence and generalization. The idea is simple: split your network into two parts, train them separately, and then fine-tune the whole model for the best results.
import torch
import torch.nn as nn

class TwoStageNetwork(nn.Module):
    def __init__(self):
        super(TwoStageNetwork, self).__init__()
        # Stage 1: feature extractor
        self.stage1 = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 128)
        )
        # Stage 2: classifier head
        self.stage2 = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        x = self.stage1(x)
        x = self.stage2(x)
        return x
🔹 Stage 1: Learning Robust Features
The first stage is all about helping your network discover meaningful, robust features from your data. This part usually involves convolutional or dense layers that dig deep to extract useful patterns and representations.
def train_stage1(model, dataloader, optimizer, criterion, epochs):
    # Stage 1 needs a label-free objective. As one simple example, we use a
    # denoising-consistency loss: the features of a noisy input should match
    # the features of the clean input (use e.g. nn.MSELoss as the criterion).
    model.train()
    for epoch in range(epochs):
        for inputs, _ in dataloader:  # labels are ignored in Stage 1
            optimizer.zero_grad()
            noisy = inputs + 0.1 * torch.randn_like(inputs)
            features = model.stage1(noisy)
            target = model.stage1(inputs).detach()  # clean features as target
            loss = criterion(features, target)
            loss.backward()
            optimizer.step()
🔹 Stage 2: Final Classification
Now, the features your model learned in Stage 1 are put to work! In this stage, the network uses those features to perform the main classification task—typically using fully connected layers to make predictions.
def train_stage2(model, dataloader, optimizer, criterion, epochs):
    model.train()
    for epoch in range(epochs):
        for inputs, labels in dataloader:
            optimizer.zero_grad()
            # Detach so no gradients flow back into the (frozen) Stage 1
            features = model.stage1(inputs).detach()
            outputs = model.stage2(features)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
🔹 Freezing Learned Features
When training Stage 2, it’s helpful to “freeze” the parameters from Stage 1. This means you lock in those useful features and let the classification layers do the learning, ensuring your features stay intact.
def freeze_stage1(model):
    for param in model.stage1.parameters():
        param.requires_grad = False

def unfreeze_stage1(model):
    for param in model.stage1.parameters():
        param.requires_grad = True
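One practical detail: once Stage 1 is frozen, it is cleanest to build the Stage 2 optimizer over only the parameters that still require gradients. A minimal sketch (the learning rate here is just an example):

freeze_stage1(model)

# Give the Stage 2 optimizer only the parameters that still require gradients
stage2_optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)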
🔹 Fine-Tuning for Best Performance
Once both stages are trained, it’s time for the magic touch: fine-tune the entire network together! This step helps your model reach its full potential by letting all layers learn in harmony.
def finetune(model, dataloader, optimizer, criterion, epochs):
    unfreeze_stage1(model)
    for epoch in range(epochs):
        for inputs, labels in dataloader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
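Putting the pieces together, the overall flow looks roughly like this. The dataloader, loss choices, learning rates, and epoch counts below are placeholders you would tune for your own data:

model = TwoStageNetwork()
criterion_stage1 = nn.MSELoss()           # label-free objective for Stage 1
criterion_stage2 = nn.CrossEntropyLoss()  # classification objective for Stage 2

# Phase 1: learn features
opt1 = torch.optim.Adam(model.stage1.parameters(), lr=1e-3)
train_stage1(model, train_loader, opt1, criterion_stage1, epochs=5)

# Phase 2: train the classifier on frozen features
freeze_stage1(model)
opt2 = torch.optim.Adam(model.stage2.parameters(), lr=1e-3)
train_stage2(model, train_loader, opt2, criterion_stage2, epochs=5)

# Phase 3: fine-tune end to end (finetune() unfreezes Stage 1 itself)
opt3 = torch.optim.Adam(model.parameters(), lr=1e-4)  # smaller LR for fine-tuning
finetune(model, train_loader, opt3, criterion_stage2, epochs=3)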
🔹 Using Learning Rate Schedulers
Smart learning rate scheduling can make a big difference. Adjusting the learning rate at the right time helps your model converge faster and more reliably—let’s see how to set that up.
from torch.optim.lr_scheduler import StepLR

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = StepLR(optimizer, step_size=5, gamma=0.1)  # decay LR by 10x every 5 epochs

for epoch in range(epochs):
    train_epoch(model, dataloader, optimizer, criterion)  # your per-epoch training loop
    scheduler.step()
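A related trick during fine-tuning is to give the two stages different learning rates via optimizer parameter groups, so the already-learned Stage 1 features move more slowly than the classifier. A minimal sketch (the specific learning rates are just illustrative):

finetune_optimizer = torch.optim.Adam([
    {"params": model.stage1.parameters(), "lr": 1e-4},  # gentle updates for learned features
    {"params": model.stage2.parameters(), "lr": 1e-3},  # larger updates for the classifier
])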
🔹 Boosting Stage 1 with Data Augmentation
Want even better features? Try data augmentation during Stage 1! By transforming your input data in creative ways, your model learns to generalize and handle real-world variations.
from torchvision import transforms
from torch.utils.data import DataLoader

stage1_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # saturation is only meaningful for RGB inputs
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# CustomDataset stands in for whatever Dataset class you are using
stage1_dataset = CustomDataset(transform=stage1_transforms)
stage1_dataloader = DataLoader(stage1_dataset, batch_size=32, shuffle=True)
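With the augmented loader in place, Stage 1 training simply swaps in that dataloader (the optimizer and epoch count below are illustrative, and the dataset is assumed to yield (input, label) pairs):

opt1 = torch.optim.Adam(model.stage1.parameters(), lr=1e-3)
train_stage1(model, stage1_dataloader, opt1, nn.MSELoss(), epochs=5)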
🔹 Stabilizing Training with Gradient Clipping
Ever had your training go haywire? Gradient clipping is a simple trick to keep your training stable—especially useful during fine-tuning when gradients can get out of control.
def train_with_gradient_clipping(model, dataloader, optimizer, criterion, clip_value):
    for inputs, labels in dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)
        optimizer.step()
🔹 Avoiding Overfitting with Early Stopping
Don’t let your model overtrain! Early stopping helps you halt training at just the right moment, preventing overfitting and saving you time.
class EarlyStopping:
    def __init__(self, patience=5, min_delta=0):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.best_loss = float('inf')

    def __call__(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                return True
        return False
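In practice you call the stopper once per epoch with the validation loss. A rough sketch, where max_epochs, train_loader, and val_loader are placeholders you would define yourself, and evaluate_model is the evaluation helper shown later in this post:

early_stopping = EarlyStopping(patience=5)

for epoch in range(max_epochs):
    train_with_gradient_clipping(model, train_loader, optimizer, criterion, clip_value=1.0)
    _, val_loss = evaluate_model(model, val_loader, criterion)
    if early_stopping(val_loss):
        print(f"Stopping early at epoch {epoch}")
        break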
🔹 Understanding What Your Network Learns
Curious about what your network is actually “seeing”? Visualizing feature maps gives you a peek into the inner workings of Stage 1, showing the features your model has learned.
import matplotlib.pyplot as plt

def visualize_feature_maps(model, input_image):
    # Assumes Stage 1 is convolutional and outputs spatial maps of shape
    # (N, C, H, W); for a purely linear Stage 1 there are no maps to plot.
    model.eval()
    with torch.no_grad():
        features = model.stage1(input_image.unsqueeze(0))
    fig, axes = plt.subplots(4, 4, figsize=(12, 12))
    for i, ax in enumerate(axes.flat):
        if i < features.shape[1]:
            ax.imshow(features[0, i].cpu().numpy(), cmap='viridis')
        ax.axis('off')
    plt.tight_layout()
    plt.show()
🔹 Track Your Training Progress
Stay on top of your training! Monitoring your model’s progress (for example, with TensorBoard) helps you spot issues early and keep your experiments organized.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/experiment_1')

def train_with_monitoring(model, dataloader, optimizer, criterion, epoch):
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(dataloader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 100 == 99:  # log the average loss every 100 mini-batches
            writer.add_scalar('training loss',
                              running_loss / 100,
                              epoch * len(dataloader) + i)
            running_loss = 0.0
🔹 Using Transfer Learning with 2-Stage Backprop
Want to leverage pre-trained models? You can combine 2-Stage Backpropagation with transfer learning—using a powerful, pre-trained feature extractor as your Stage 1.
import torchvision.models as models

class TransferTwoStageNetwork(nn.Module):
    def __init__(self, num_classes):
        super(TransferTwoStageNetwork, self).__init__()
        # On newer torchvision, use weights=models.ResNet18_Weights.DEFAULT
        resnet = models.resnet18(pretrained=True)
        self.stage1 = nn.Sequential(*list(resnet.children())[:-1])  # everything except the final FC layer
        self.stage2 = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.stage1(x)
        x = x.view(x.size(0), -1)  # flatten pooled features to (N, 512)
        x = self.stage2(x)
        return x
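Because this model exposes the same stage1/stage2 attributes, the freezing and fine-tuning helpers from earlier carry over. One plausible recipe (the class count, learning rates, epoch counts, and train_loader of RGB images are all illustrative):

model = TransferTwoStageNetwork(num_classes=10)

# Keep the pre-trained backbone fixed while the new head learns
freeze_stage1(model)
head_optimizer = torch.optim.Adam(model.stage2.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(3):
    for inputs, labels in train_loader:
        head_optimizer.zero_grad()
        outputs = model(inputs)  # backbone runs frozen; only the head updates
        loss = criterion(outputs, labels)
        loss.backward()
        head_optimizer.step()

# Then optionally fine-tune everything end to end with a small learning rate
ft_optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
finetune(model, train_loader, ft_optimizer, criterion, epochs=2)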
🔹 Handling Class Imbalance
Imbalanced datasets? No problem! With a few tweaks to your loss function, you can help your model learn fairly from all classes—even the rare ones.
def calculate_class_weights(dataset, num_classes):
    class_counts = torch.zeros(num_classes)
    for _, label in dataset:
        class_counts[label] += 1
    # Avoid division by zero for classes that never appear
    return 1.0 / class_counts.clamp(min=1)

class_weights = calculate_class_weights(train_dataset, num_classes=10)
criterion = nn.CrossEntropyLoss(weight=class_weights)

def train_with_weighted_loss(model, dataloader, optimizer, criterion):
    for inputs, labels in dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
🔹 Evaluating Your Model
After all that work, it’s time to see how your model performs. Let’s evaluate accuracy and loss on a test set to get a clear picture of its real-world power.
def evaluate_model(model, test_loader, criterion):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()
    accuracy = 100 * correct / total
    average_loss = total_loss / len(test_loader)
    return accuracy, average_loss
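Calling it is a one-liner; test_loader here is whatever test DataLoader you built:

accuracy, avg_loss = evaluate_model(model, test_loader, nn.CrossEntropyLoss())
print(f"Test accuracy: {accuracy:.2f}% | average loss: {avg_loss:.4f}")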
🔹 Want to Dive Deeper?
Ready to keep learning? Check out these resources to explore 2-Stage Backpropagation and related deep learning techniques:
- “Deep Learning” by Goodfellow, Bengio, and Courville - https://www.deeplearningbook.org/
- “Curriculum Learning” by Bengio et al. (2009) - arXiv: https://arxiv.org/abs/0904.2425
- “Progressive Neural Networks” by Rusu et al. (2016) - arXiv: https://arxiv.org/abs/1606.04671
- “An Overview of Multi-Task Learning in Deep Neural Networks” by Ruder (2017) - arXiv: https://arxiv.org/abs/1706.05098