📖 Lesson ⏱️ 75 minutes

Understanding Nonlinearities

Activation functions and their importance in neural networks

Introduction

Neural networks rely on nonlinearities to learn and model complex patterns in data.

Without nonlinearities:

✅ A neural network would collapse to a single linear model regardless of depth, because a composition of linear layers is itself linear.
✅ It could not model the complex, non-linear relationships found in real-world data.
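
To make the first point concrete, here is a minimal sketch with one input and two stacked layers. The weights are arbitrary illustrative values (not taken from any real model), and the function names are made up for this example. Without an activation in between, the two layers reduce to one linear layer; with ReLU in between, they no longer do.

```python
# Two stacked linear (affine) layers with no activation in between.
# The weights below are arbitrary illustrative values.
W1, B1 = 2.0, -1.0   # first layer:  h = W1 * x + B1
W2, B2 = -3.0, 0.5   # second layer: y = W2 * h + B2


def two_linear_layers(x):
    """Apply both layers in sequence, with no nonlinearity."""
    h = W1 * x + B1
    return W2 * h + B2


# The same mapping written as ONE linear layer.
W_MERGED = W2 * W1        # merged weight
B_MERGED = W2 * B1 + B2   # merged bias


def single_linear_layer(x):
    """One linear layer that reproduces the two stacked layers."""
    return W_MERGED * x + B_MERGED


def two_layers_with_relu(x):
    """Same weights, but with ReLU between the layers."""
    h = max(0.0, W1 * x + B1)
    return W2 * h + B2


for x in (-2.0, 0.0, 1.5):
    print(x, two_linear_layers(x), single_linear_layer(x), two_layers_with_relu(x))
```

The first two columns of output always agree, while the ReLU version differs wherever the hidden unit is switched off.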


1️⃣ What is Nonlinearity?

Nonlinearity in deep learning comes from activation functions, which are applied to each neuron's weighted sum of inputs.

It enables:

  • Learning of complex, non-linear patterns in data.
  • Stacking of multiple layers while retaining expressive power.

2️⃣ Why Nonlinearities are Important

✅ Enable deep networks to approximate complex functions.
✅ Allow neural networks to classify, detect, and predict complex data.
✅ Without nonlinearities, adding more layers would add no expressive power.
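
A classic illustration is XOR, which no purely linear model can represent because its two classes are not linearly separable. The sketch below is a tiny two-unit ReLU network with hand-picked weights (chosen for illustration, not learned), and `xor_net` is a hypothetical name.

```python
def relu(z):
    """Rectified linear unit for a single number."""
    return max(0.0, z)


def xor_net(x1, x2):
    """A 2-unit hidden layer with ReLU, followed by a linear output."""
    # Hidden layer: weights and biases are hand-picked, not trained.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    # Linear output layer combining the two hidden units.
    return h1 - 2.0 * h2


for x1, x2 in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print((x1, x2), xor_net(x1, x2))
```

Removing the two `relu` calls turns `xor_net` into a linear function of `x1` and `x2`, which cannot produce the XOR pattern.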


3️⃣ Common Activation Functions

ReLU (Rectified Linear Unit)

[ f(x) = \max(0, x) ]

✅ Most commonly used in hidden layers due to simplicity and effectiveness.
✅ Mitigates vanishing gradients compared to sigmoid or tanh, because its gradient is exactly 1 for every positive input.
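
As a minimal sketch in plain Python, ReLU and its gradient can be written as below. The value used for the gradient at exactly x = 0 is a convention; 0 is assumed here.

```python
def relu(x):
    """ReLU: pass positive inputs through, clamp the rest to zero."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU: 1 for x > 0, 0 for x < 0 (0 assumed at x == 0)."""
    return 1.0 if x > 0 else 0.0


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(f"x={x:5.1f}  relu={relu(x):4.1f}  grad={relu_grad(x):3.1f}")
```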


Sigmoid

[ \sigma(x) = \frac{1}{1 + e^{-x}} ]

✅ Squashes input between 0 and 1, useful for binary classification outputs.
✅ Can cause vanishing gradients in deep networks, because its derivative never exceeds 0.25 and approaches 0 for inputs of large magnitude.
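
The sketch below shows where the vanishing gradient comes from. It uses only the standard library, and the ten-layer figure ignores the weight factors that also appear in backpropagation, so treat it as a rough illustration rather than an exact analysis.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow in exp(-x) for very negative x
    return z / (1.0 + z)


def sigmoid_grad(x):
    """Derivative sigma(x) * (1 - sigma(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.5f}")

# Backpropagation multiplies one such factor per sigmoid layer.
# Even in the best case (every pre-activation exactly 0), ten layers give:
best_case_10_layers = sigmoid_grad(0.0) ** 10
print(best_case_10_layers)  # 0.25 ** 10, roughly 9.5e-07
```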


Tanh (Hyperbolic Tangent)

[ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} ]

✅ Squashes input between -1 and 1. Its output is zero-centered, so it is often preferred over sigmoid in hidden layers.
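
As a quick check of the formula, the sketch below compares it with `math.tanh` and with the identity tanh(x) = 2σ(2x) − 1, which shows that tanh is a rescaled, shifted sigmoid. The helper names are illustrative, and the direct formula is only used on moderate inputs because `math.exp` overflows for very large ones.

```python
import math


def sigmoid(x):
    """Logistic sigmoid (fine for the moderate inputs used here)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_definition(x):
    """tanh written exactly as in the formula above."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    direct = tanh_from_definition(x)
    library = math.tanh(x)
    via_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0
    print(f"x={x:5.1f}  formula={direct:.6f}  math.tanh={library:.6f}  2*sigmoid(2x)-1={via_sigmoid:.6f}")
```

The outputs are symmetric around zero (tanh(-x) = -tanh(x)), which is what "zero-centered" refers to.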


Leaky ReLU

[ f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases} ]

✅ Allows a small gradient (the slope \alpha, a small positive constant such as 0.01) when the unit is not active, addressing the “dying ReLU” problem.
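
A minimal sketch of Leaky ReLU next to plain ReLU is shown below. The default `alpha=0.01` is an assumption made for illustration; the lesson does not fix a value, and libraries differ in their defaults.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: identity for x > 0, small slope alpha otherwise."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.01):
    """Gradient of Leaky ReLU: 1 for x > 0, alpha otherwise."""
    return 1.0 if x > 0 else alpha


def relu_grad(x):
    """Gradient of plain ReLU, for comparison."""
    return 1.0 if x > 0 else 0.0


for x in (-2.0, -0.5, 0.5, 2.0):
    print(f"x={x:5.1f}  leaky={leaky_relu(x):7.3f}  "
          f"leaky_grad={leaky_relu_grad(x):.2f}  relu_grad={relu_grad(x):.2f}")
```

For negative inputs the plain ReLU gradient is 0, so the unit receives no update signal, while the leaky version still passes back a gradient of `alpha`.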


4️⃣ Example in Python

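import tensorflow as tf

# A simple classifier: ReLU in the hidden layer, softmax on the output
model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),                     # 100 input features
    tf.keras.layers.Dense(64, activation='relu'),     # hidden layer with ReLU nonlinearity
    tf.keras.layers.Dense(10, activation='softmax'),  # probabilities over 10 classes
])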

5️⃣ Choosing the Right Activation Function

✅ Use ReLU or its variants in hidden layers for most deep learning tasks.
✅ Use sigmoid for binary classification outputs.
✅ Use softmax for multi-class classification outputs.
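
To see how the two output activations relate, here is a small standard-library sketch. The logits are made-up numbers, and the helper functions are illustrative rather than any library's API. Softmax turns a list of scores into probabilities that sum to 1, and with two classes it reduces to sigmoid.

```python
import math


def sigmoid(z):
    """Probability of the positive class in binary classification."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Probabilities over several classes; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Multi-class output: three made-up logits.
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))

# Two-class softmax over [z, 0] gives the same probability as sigmoid(z).
z = 0.8
print(sigmoid(z), softmax([z, 0.0])[0])
```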


Conclusion

✅ Nonlinearities allow neural networks to learn complex patterns in data.
✅ Activation functions like ReLU, sigmoid, and tanh provide this nonlinearity.
✅ Understanding these will help you build better deep learning models.


What’s Next?

✅ Experiment with different activation functions in your first deep learning models.
✅ Learn about advanced nonlinearities like Swish and GELU for modern architectures.
✅ Continue your structured deep learning journey on superml.org.




Happy Learning! ⚡