πŸ“– Lesson ⏱️ 60 minutes

Normalization Techniques

Exploring normalization techniques in deep learning

Introduction

Normalization techniques in deep learning help stabilize and accelerate training by reducing internal covariate shift and keeping layer activations within a reasonable range.


1️⃣ What is Normalization?

Normalization in deep learning refers to adjusting and scaling the activations of layers so that:

βœ… The distribution of inputs to each layer remains stable during training.
βœ… Training converges faster.
βœ… Models become less sensitive to initialization.


2️⃣ Why is Normalization Important?

Without normalization:

βœ… The distribution of layer inputs can change during training (internal covariate shift).
βœ… Training may be unstable and slow.
βœ… The model may get stuck in poor local minima.

Normalization helps:

βœ… Use higher learning rates safely.
βœ… Improve gradient flow.
βœ… Regularize the model slightly, reducing the need for dropout.


3️⃣ Types of Normalization

Batch Normalization

Normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation.

Formula:

\[
\hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}
\]

where \(\mu\) is the batch mean, \(\sigma^2\) is the batch variance, and \(\epsilon\) is a small constant added for numerical stability. The normalized value is then scaled and shifted by learnable parameters \(\gamma\) and \(\beta\): \( y = \gamma \hat{x} + \beta \).

βœ… Often used in CNNs and MLPs.
βœ… Applied before or after the activation function, depending on the implementation.
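
As a quick illustration, here is a small NumPy sketch (the batch values are made up for demonstration) that applies the formula above by hand and checks that each feature ends up with roughly zero mean and unit standard deviation:

import numpy as np

# A toy batch of 4 samples with 3 features each (illustrative values)
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 2.0],
              [3.0, 5.0, 7.0]])

mu = x.mean(axis=0)            # per-feature batch mean
var = x.var(axis=0)            # per-feature batch variance
eps = 1e-5                     # small constant for numerical stability
x_hat = (x - mu) / np.sqrt(var + eps)

print(x_hat.mean(axis=0))      # approximately 0 for every feature
print(x_hat.std(axis=0))       # approximately 1 for every feature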


Layer Normalization

Normalizes across the features for each sample instead of across the batch.

βœ… Useful in RNNs and transformer models, where batch normalization is less effective because batch statistics vary with batch size and sequence length.
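
As a minimal sketch, Keras provides layers.LayerNormalization; the snippet below drops it into a small sequence model (the input shape and layer sizes are only illustrative):

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(None, 64)),      # variable-length sequences of 64-dim features (example shape)
    layers.LSTM(128),                    # recurrent layer; output shape (batch, 128)
    layers.LayerNormalization(),         # normalize across the 128 features of each sample
    layers.Dense(10, activation='softmax')
])

Because layer normalization computes statistics per sample, this model behaves the same at batch size 1 as at batch size 256.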


Other Normalization Techniques:

  • Instance Normalization: Common in style transfer tasks.
  • Group Normalization: Divides channels into groups for normalization, helpful in small-batch training.

4️⃣ Example: Using Batch Normalization in TensorFlow

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(784,)),              # example input size (e.g., flattened 28x28 images)
    layers.Dense(128),                       # fully connected layer without activation
    layers.BatchNormalization(),             # normalize the dense outputs over the batch
    layers.Activation('relu'),               # activation applied after normalization
    layers.Dense(10, activation='softmax')   # output layer for 10 classes
])

5️⃣ Best Practices

βœ… For CNNs, place batch normalization after the convolution and before the activation (see the sketch after this list).
βœ… For MLPs, use batch normalization after dense layers.
βœ… Experiment with layer normalization for sequence models.
βœ… Adjust learning rates as normalization often allows for higher rates.
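
To make the placement advice concrete, here is a minimal convolutional block that follows the convolution β†’ batch normalization β†’ activation ordering (the input shape and filter counts are illustrative):

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),       # e.g., 28x28 grayscale images (example shape)
    layers.Conv2D(32, 3, padding='same'),  # convolution first, without activation
    layers.BatchNormalization(),           # then normalize the convolution output
    layers.Activation('relu'),             # activation applied after normalization
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])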


Conclusion

Normalization is a powerful tool in deep learning that:

βœ… Stabilizes training.
βœ… Speeds up convergence.
βœ… Improves generalization.

Learning to implement and tune normalization in your networks will make your models more robust and efficient.


What’s Next?

βœ… Try adding batch normalization to your existing models and observe its effect on training.
βœ… Explore advanced normalization techniques for specialized architectures like transformers.
βœ… Continue your structured deep learning journey on superml.org.


Join the SuperML Community to share experiments and learn collaboratively.


Happy Learning! ✨