πŸ“– Lesson ⏱️ 75 minutes

Understanding Nonlinearities

Activation functions and their importance in neural networks

Introduction

Neural networks rely on nonlinearities to learn and model complex patterns in data.

Without nonlinearities:

βœ… Neural networks would be equivalent to a linear model, regardless of depth.
βœ… They would not be able to model complex, real-world relationships.


1️⃣ What is Nonlinearity?

Nonlinearity in deep learning comes from activation functions applied to neurons.

It enables:

  • Learning of complex, nonlinear patterns in data.
  • Stacking of multiple layers while retaining expressive power; without an activation between them, stacked linear layers collapse into a single linear map, as the sketch below shows.
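
A quick way to see the collapse is to stack two purely linear layers and check that the result is still a single linear map. Below is a minimal NumPy sketch; the layer sizes and random weights are arbitrary choices for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation function in between
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

x = rng.normal(size=(3,))

# Composing the two linear layers...
two_layer_output = W2 @ (W1 @ x)

# ...is identical to one linear layer whose weights are W2 @ W1
collapsed = (W2 @ W1) @ x

print(np.allclose(two_layer_output, collapsed))  # True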

2️⃣ Why Nonlinearities are Important

βœ… Enable deep networks to approximate complex functions.
βœ… Allow neural networks to classify, detect, and predict complex data.
βœ… Without nonlinearities, adding more layers would have no added benefit.


3️⃣ Common Activation Functions

ReLU (Rectified Linear Unit)

\[ f(x) = \max(0, x) \]

βœ… Most commonly used in hidden layers due to simplicity and effectiveness.
βœ… Helps avoid vanishing gradient problems compared to sigmoid or tanh.


Sigmoid

\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]

βœ… Squashes input between 0 and 1, useful for binary classification outputs.
βœ… Can cause vanishing gradients in deep networks.


Tanh (Hyperbolic Tangent)

\[ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \]

βœ… Squashes input between -1 and 1, zero-centered, often preferred over sigmoid in hidden layers.


Leaky ReLU

\[ f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases} \]

βœ… Allows a small gradient when the unit is not active, addressing the β€œdying ReLU” problem.


4️⃣ Example in Python

import tensorflow as tf

# Using ReLU in a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(100,)),
    tf.keras.layers.Dense(10, activation='softmax')
])
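
Activations that carry their own parameters, such as Leaky ReLU, are typically added as separate layers rather than as string arguments. A minimal sketch, reusing the same arbitrary layer sizes as above:

import tensorflow as tf

# Leaky ReLU applied as its own layer after a linear Dense layer
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(100,)),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Dense(10, activation='softmax')
])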

5️⃣ Choosing the Right Activation Function

βœ… Use ReLU or its variants in hidden layers for most deep learning tasks.
βœ… Use sigmoid for binary classification outputs.
βœ… Use softmax for multi-class classification outputs.


Conclusion

βœ… Nonlinearities allow neural networks to learn complex patterns in data.
βœ… Activation functions like ReLU, sigmoid, and tanh provide this nonlinearity.
βœ… Understanding these will help you build better deep learning models.


What’s Next?

βœ… Experiment with different activation functions in your first deep learning models.
βœ… Learn about advanced nonlinearities like Swish and GELU for modern architectures.
βœ… Continue your structured deep learning journey on superml.org.


Join the SuperML Community to share your experiments and deepen your learning.


Happy Learning! ⚑