Press ESC to exit fullscreen
📖 Lesson ⏱️ 60 minutes

Normalization Techniques

Exploring normalization techniques in deep learning

Introduction

Normalization techniques in deep learning help stabilize and accelerate training by addressing internal covariate shift and ensuring activations remain within a reasonable range during training.


1️⃣ What is Normalization?

Normalization in deep learning refers to adjusting and scaling the activations of layers so that:

✅ The distribution of inputs to each layer remains stable during training.
✅ Training converges faster.
✅ Models become less sensitive to initialization.


2️⃣ Why is Normalization Important?

Without normalization:

✅ The distribution of layer inputs can change during training (internal covariate shift).
✅ Training may be unstable and slow.
✅ The model may get stuck in poor local minima.

Normalization helps: ✅ Use higher learning rates safely.
✅ Improve gradient flow.
✅ Act as a regularizer, reducing the need for dropout.


3️⃣ Types of Normalization

Batch Normalization

Normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation.

Formula: [ \hat{x} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} ] where (\mu) = batch mean, (\sigma^2) = batch variance.

✅ Often used in CNNs and MLPs.
✅ Applied before or after the activation function depending on implementation.


Layer Normalization

Normalizes across the features for each sample instead of across the batch.

✅ Useful in RNNs and transformer models where batch normalization is less effective due to varying batch sizes.


Other Normalization Techniques:

  • Instance Normalization: Common in style transfer tasks.
  • Group Normalization: Divides channels into groups for normalization, helpful in small-batch training.

4️⃣ Example: Using Batch Normalization in TensorFlow

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(128),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dense(10, activation='softmax')
])

5️⃣ Best Practices

✅ For CNNs, place batch normalization after convolution and before activation.
✅ For MLPs, use batch normalization after dense layers.
✅ Experiment with layer normalization for sequence models.
✅ Adjust learning rates as normalization often allows for higher rates.


Conclusion

Normalization is a powerful tool in deep learning that:

✅ Stabilizes training.
✅ Speeds up convergence.
✅ Improves generalization.

Learning to implement and tune normalization in your networks will make your models more robust and efficient.


What’s Next?

✅ Try adding batch normalization to your existing models and observe its effect on training.
✅ Explore advanced normalization techniques for specialized architectures like transformers.
✅ Continue your structured deep learning journey on superml.org.


Join the SuperML Community to share experiments and learn collaboratively.


Happy Learning! ✨