Vanishing and Exploding Gradients in Deep Learning

Understand what vanishing and exploding gradients are, why they occur in deep networks, and practical strategies to mitigate them during training.

🔰 beginner
⏱️ 40 minutes
👤 SuperML Team


📋 Prerequisites

  • Basic understanding of neural networks and backpropagation

🎯 What You'll Learn

  • Understand what vanishing and exploding gradients are
  • Learn why these issues occur in deep networks
  • Discover their impact on model training
  • Explore practical strategies to mitigate these problems

Introduction

Training deep neural networks can be challenging due to vanishing and exploding gradients, which can slow or even completely stop learning.


1️⃣ What are Vanishing and Exploding Gradients?

Vanishing Gradients:

As gradients are propagated backward through many layers, they can shrink toward zero, causing:

✅ Early layers to learn very slowly or not at all.
✅ Stagnation in loss reduction.

Exploding Gradients:

Gradients become excessively large during backpropagation, causing:

✅ Unstable training.
✅ Excessively large weight updates, resulting in NaN values or model divergence.


2️⃣ Why Do These Issues Occur?

In deep networks, the chain rule makes each layer's gradient a product of many per-layer factors (weight matrices and activation derivatives):

✅ If these factors are consistently smaller than 1 in magnitude, gradients shrink exponentially with depth (vanishing).
✅ If they are consistently larger than 1 in magnitude, gradients grow exponentially (exploding).

Saturating activation functions like sigmoid and tanh contribute further to vanishing gradients, since their derivatives are small (at most 0.25 for sigmoid) and approach zero when the units saturate.
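
To see how quickly repeated multiplication compounds, here is a tiny numeric sketch in plain Python; the per-layer factors 0.5 and 1.5 and the depth of 50 are illustrative values, not taken from any real network:

# Repeated multiplication of per-layer factors over 50 layers
depth = 50

vanishing = 0.5 ** depth   # factor < 1: about 8.9e-16, the gradient effectively disappears
exploding = 1.5 ** depth   # factor > 1: about 6.4e8, the gradient blows up

print(f"factor 0.5 over {depth} layers: {vanishing:.2e}")
print(f"factor 1.5 over {depth} layers: {exploding:.2e}")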


3️⃣ Effects on Model Training

  • Vanishing gradients prevent lower layers from learning, leading to poor performance.
  • Exploding gradients cause instability and divergence during training.

4️⃣ Strategies to Mitigate These Issues

a) Proper Weight Initialization

  • Xavier/Glorot Initialization for sigmoid and tanh activations.
  • He Initialization for ReLU activations.
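
As a sketch of how these initializers can be selected in Keras (the layer width of 128 is just illustrative):

import tensorflow as tf

# Glorot (Xavier) initialization pairs well with sigmoid/tanh layers
tanh_layer = tf.keras.layers.Dense(
    128, activation='tanh', kernel_initializer='glorot_uniform')

# He initialization pairs well with ReLU layers
relu_layer = tf.keras.layers.Dense(
    128, activation='relu', kernel_initializer='he_normal')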

b) Using ReLU or Variants

ReLU activations help mitigate vanishing gradients because their derivative is exactly 1 for positive inputs, so gradients pass through unscaled; variants such as Leaky ReLU and ELU additionally keep a small non-zero gradient for negative inputs.
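
A quick way to check this claim is to compare the gradients of sigmoid and ReLU at a single positive input using tf.GradientTape; the input value 2.0 is arbitrary:

import tensorflow as tf

x = tf.constant(2.0)

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)                     # x is a constant, so it must be watched explicitly
    y_sigmoid = tf.nn.sigmoid(x)
    y_relu = tf.nn.relu(x)

print(tape.gradient(y_sigmoid, x).numpy())  # about 0.10 -- sigmoid's derivative is at most 0.25
print(tape.gradient(y_relu, x).numpy())     # 1.0 -- ReLU passes the gradient through unchanged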

c) Gradient Clipping

Clipping gradients during training prevents them from exceeding a threshold, avoiding exploding gradients.

# Example in TensorFlow: clipnorm rescales each gradient so its L2 norm is at most 1.0
import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)

d) Batch Normalization

Normalizing activations helps maintain stable gradients across layers.
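
A minimal sketch of adding BatchNormalization between layers in Keras; the architecture below is purely illustrative:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, input_shape=(784,)),
    tf.keras.layers.BatchNormalization(),   # normalize the pre-activations
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])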

e) Residual Connections

Using skip connections allows gradients to flow directly, mitigating vanishing gradients in very deep networks.
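
A minimal sketch of a residual block with the Keras functional API; the input size of 128 and the two Dense layers are illustrative choices:

import tensorflow as tf

inputs = tf.keras.Input(shape=(128,))
x = tf.keras.layers.Dense(128, activation='relu')(inputs)
x = tf.keras.layers.Dense(128)(x)
# Skip connection: add the block's input to its output so gradients
# have a direct path around the transformed layers
x = tf.keras.layers.Add()([x, inputs])
outputs = tf.keras.layers.Activation('relu')(x)

residual_block = tf.keras.Model(inputs, outputs)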


5️⃣ Practical Example: Gradient Clipping

import tensorflow as tf

# Assumes `model` is an already-defined tf.keras model.
# clipvalue=1.0 clips each gradient element to the range [-1, 1].
model.compile(optimizer=tf.keras.optimizers.Adam(clipvalue=1.0),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
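
For finer control, gradients can also be clipped by their combined (global) norm inside a custom training loop. The sketch below assumes model, loss_fn, and optimizer are already defined elsewhere; the threshold of 1.0 is illustrative:

import tensorflow as tf

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        loss = loss_fn(y, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    # Rescale all gradients together so their global L2 norm is at most 1.0
    grads, _ = tf.clip_by_global_norm(grads, 1.0)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss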

Conclusion

✅ Vanishing and exploding gradients are common issues in training deep networks.
✅ Understanding these concepts helps you design and train stable, effective models.
✅ Using proper initialization, ReLU, gradient clipping, batch normalization, and residual connections can mitigate these problems.


What’s Next?

✅ Experiment with gradient clipping in your models.
✅ Explore residual networks and advanced architectures that handle these challenges.
✅ Continue structured deep learning on superml.org.


Join the SuperML Community to discuss your experiments and get practical help.


Happy Learning! ⚡
