📋 Prerequisites
- Basic understanding of neural networks
- Basic math (algebra and functions)
🎯 What You'll Learn
- Understand what nonlinearity means in the context of deep learning
- Learn why activation functions are essential
- Explore common activation functions with examples
- Build confidence in using nonlinearities in your first models
Introduction
Neural networks rely on nonlinearities to learn and model complex patterns in data.
Without nonlinearities:
- Neural networks would be equivalent to a linear model, regardless of depth.
- They would not be able to model complex, real-world relationships.
1️⃣ What is Nonlinearity?
Nonlinearity in deep learning comes from activation functions applied to neurons.
It enables:
- Learning of complex, non-linear patterns in data.
- Stacking of multiple layers while retaining expressive power.
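To make this concrete, here is a minimal NumPy sketch of a single neuron (the weights, bias, and inputs are made up for illustration): a linear pre-activation followed by a ReLU activation, which is where the nonlinearity enters.

```python
import numpy as np

# A single neuron: weighted sum of inputs plus bias (the pre-activation),
# followed by a nonlinear activation function (here, ReLU).
x = np.array([0.5, -1.2, 3.0])   # illustrative input features
w = np.array([0.8, 0.1, -0.4])   # illustrative weights
b = 0.2                          # illustrative bias

z = np.dot(w, x) + b             # linear pre-activation
a = np.maximum(0.0, z)           # ReLU: the nonlinear step

print(f"pre-activation z = {z:.3f}, activation a = {a:.3f}")
```

Without the final `np.maximum` step, the neuron would just be a linear function of its inputs.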
2️⃣ Why Nonlinearities are Important
✅ Enable deep networks to approximate complex functions.
✅ Allow neural networks to classify, detect, and predict complex data.
✅ Without nonlinearities, adding more layers would have no added benefit.
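The last point is easy to verify numerically. The NumPy sketch below (with arbitrary random weights) stacks two purely linear layers and shows that they collapse into a single equivalent linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked linear layers with no activation in between:
# y = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x + b1) + b2

# The same mapping written as a single linear layer: y = W @ x + b
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True: the extra layer added no expressive power
```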
3️⃣ Common Activation Functions
ReLU (Rectified Linear Unit)
\[ f(x) = \max(0, x) \]
✅ Most commonly used in hidden layers due to simplicity and effectiveness.
✅ Suffers less from the vanishing gradient problem than sigmoid or tanh, since its gradient is 1 for all positive inputs.
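As a quick illustration, the formula above can be implemented in a couple of lines of NumPy (the sample inputs are arbitrary):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))  # negative inputs are clipped to 0, positive inputs pass through unchanged
```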
Sigmoid
\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
✅ Squashes input between 0 and 1, useful for binary classification outputs.
✅ Can cause vanishing gradients in deep networks.
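The sketch below implements the formula in NumPy and also evaluates the derivative, σ'(x) = σ(x)(1 − σ(x)), on a few arbitrary inputs to show why gradients can vanish: for large |x| the derivative is close to zero.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
s = sigmoid(x)
grad = s * (1.0 - s)   # derivative of the sigmoid

print(s)     # outputs squashed into (0, 1)
print(grad)  # near zero for large |x| -- repeated across layers, gradients can vanish
```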
Tanh (Hyperbolic Tangent)
\[ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \]
✅ Squashes input between -1 and 1, zero-centered, often preferred over sigmoid in hidden layers.
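A short NumPy check on arbitrary sample values, showing the zero-centered output range:

```python
import numpy as np

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
t = np.tanh(x)   # squashes inputs into (-1, 1)

print(t)         # symmetric inputs map to outputs centered around 0
```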
Leaky ReLU
\[ f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases} \]
✅ Allows a small gradient when the unit is not active, addressing the “dying ReLU” problem.
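A minimal NumPy sketch of the formula, using an illustrative slope of α = 0.01:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # f(x) = x for x > 0, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

x = np.array([-4.0, -1.0, 0.0, 2.0])
print(leaky_relu(x))  # negative inputs are scaled by alpha instead of being zeroed out
```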
4️⃣ Example in Python
```python
import tensorflow as tf

# Using ReLU in a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(100,)),
    tf.keras.layers.Dense(10, activation='softmax')
])
```
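To see this model run end to end, you could compile and fit it on placeholder data. The random arrays below are purely illustrative stand-ins for a real dataset:

```python
import numpy as np

# Illustrative random data: 32 samples, 100 features, 10 classes
x_train = np.random.rand(32, 100).astype("float32")
y_train = np.random.randint(0, 10, size=(32,))

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=8)
```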
5️⃣ Choosing the Right Activation Function
✅ Use ReLU or its variants in hidden layers for most deep learning tasks.
✅ Use sigmoid for binary classification outputs.
✅ Use softmax for multi-class classification outputs.
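Putting these guidelines together, here is a minimal Keras sketch of the two common output-layer setups; the 100-feature input and layer sizes are illustrative only:

```python
import tensorflow as tf

# Binary classification: a single sigmoid output unit
binary_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(100,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Multi-class classification (e.g. 10 classes): a softmax output layer
multiclass_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(100,)),
    tf.keras.layers.Dense(10, activation='softmax')
])
```

The binary model would typically be paired with a binary cross-entropy loss, and the multi-class model with a (sparse) categorical cross-entropy loss.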
Conclusion
✅ Nonlinearities allow neural networks to learn complex patterns in data.
✅ Activation functions like ReLU, sigmoid, and tanh provide this nonlinearity.
✅ Understanding these will help you build better deep learning models.
What’s Next?
✅ Experiment with different activation functions in your first deep learning models.
✅ Learn about advanced nonlinearities like Swish and GELU for modern architectures.
✅ Continue your structured deep learning journey on superml.org.
Join the SuperML Community to share your experiments and deepen your learning.
Happy Learning! ⚡