πŸ“– Lesson ⏱️ 75 minutes

Residuals and Normalizations Combined

Understanding residual connections and normalization techniques

Introduction

As networks grow deeper, training becomes challenging due to vanishing gradients and unstable activation distributions across layers.

βœ… Residual connections and normalization techniques address these issues, allowing deep networks to train effectively and improve performance.


1️⃣ The Vanishing Gradient Problem

In deep networks:

βœ… Gradients can become very small during backpropagation.
βœ… Layers close to the input learn very slowly.
βœ… This hinders the ability to train deep architectures effectively.


2️⃣ What are Residual Connections?

Residual connections (also called skip connections) add the input of a block directly to its output:

[ y = F(x) + x ]

where:

  • (x): Input to the block.
  • (F(x)): The output after passing through several layers.
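
The benefit for gradient flow follows directly from this formula. Differentiating the block output with respect to its input gives:

[ \frac{\partial y}{\partial x} = \frac{\partial F(x)}{\partial x} + I ]

so even if (\frac{\partial F(x)}{\partial x}) becomes very small, the identity term (I) preserves a direct gradient path back to earlier layers.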

Benefits:

βœ… Allow gradients to flow directly through skip paths, mitigating vanishing gradients.
βœ… Enable training of very deep networks (e.g., ResNets with 50+ layers).
βœ… Simplify learning by letting layers learn the residual mapping instead of the entire transformation.


3️⃣ What is Normalization?

Normalization techniques help stabilize and speed up training by:

βœ… Reducing internal covariate shift.
βœ… Smoothing the loss landscape for more stable optimization.


Batch Normalization

Batch Normalization normalizes the output of a layer across the current mini-batch.

Formula: [ \hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} ]

where:

  • (\mu): Mean of the batch.
  • (\sigma^2): Variance of the batch.
  • (\epsilon): Small constant to prevent division by zero.
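
To make the formula concrete, here is a minimal sketch (with arbitrary tensor shapes) that normalizes a random mini-batch by hand and then applies the Keras BatchNormalization layer, which additionally learns a scale (\gamma) and shift (\beta) on top of the normalized values:

import tensorflow as tf

batch = tf.random.normal((32, 8))             # 32 samples, 8 features
mu = tf.reduce_mean(batch, axis=0)            # per-feature batch mean
var = tf.math.reduce_variance(batch, axis=0)  # per-feature batch variance
eps = 1e-5
normalized = (batch - mu) / tf.sqrt(var + eps)

# Each feature of `normalized` now has (approximately) zero mean and unit variance.
print(tf.reduce_mean(normalized, axis=0).numpy())

# The built-in layer performs the same computation per batch (in training mode)
# and also learns gamma (scale) and beta (shift) parameters.
bn = tf.keras.layers.BatchNormalization(epsilon=eps)
print(bn(batch, training=True).shape)         # (32, 8)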

Benefits:

βœ… Allows higher learning rates.
βœ… Acts as a form of regularization.
βœ… Reduces the need for careful weight initialization.


4️⃣ Example: Using Residual Connections and Batch Normalization

TensorFlow Example

import tensorflow as tf
from tensorflow.keras import layers

# Residual block with batch normalization
def residual_block(x, units):
    # Save the block input so it can be added back after the transformations.
    # Note: `units` must match the feature dimension of `x` for the addition to work.
    shortcut = x
    # Dense -> BatchNorm -> ReLU (normalization applied before the activation).
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    # Skip connection: y = F(x) + x, followed by the final activation.
    x = layers.Add()([x, shortcut])
    x = layers.Activation('relu')(x)
    return x

inputs = tf.keras.Input(shape=(128,))
x = residual_block(inputs, 128)
x = layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=x)
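
To turn this into a trainable model, compile it as usual; the optimizer, loss, and training call below are placeholder choices, and x_train / y_train stand in for your own dataset:

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
# model.fit(x_train, y_train, epochs=10, batch_size=32)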

5️⃣ Practical Tips

βœ… Use batch normalization after dense/convolutional layers and before activation functions.
βœ… Add residual connections in blocks of layers for deeper models.
βœ… Combine both techniques for stability and faster convergence in deep architectures.


Conclusion

βœ… Residual connections and normalization techniques are critical for building deep, trainable, and effective networks.
βœ… They help mitigate vanishing gradients and improve convergence.


What’s Next?

βœ… Experiment with adding residual blocks and batch normalization in your projects.
βœ… Explore deeper architectures like ResNet and DenseNet.
βœ… Continue your structured learning on superml.org.


Join the SuperML Community to share your experiments and learn collaboratively.


Happy Learning! πŸͺ