Understanding Nonlinearities
Activation functions and their importance in neural networks
Introduction
Neural networks rely on nonlinearities to learn and model complex patterns in data.
Without nonlinearities:
✅ Neural networks would be equivalent to a linear model, regardless of depth.
✅ They would not be able to model complex, real-world relationships.
1️⃣ What is Nonlinearity?
Nonlinearity in deep learning comes from activation functions applied to neurons.
It enables:
- Learning of complex, non-linear patterns in data.
- Stacking of multiple layers while retaining expressive power (see the short sketch below).
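To make this concrete, here is a minimal NumPy sketch of a single neuron: the weighted sum w·x + b is purely linear, and the activation function applied on top is what introduces the nonlinearity. The weights and inputs are made-up illustration values.
import numpy as np

# Made-up weights, bias, and input for one neuron (illustration only)
w = np.array([0.5, -1.2, 0.8])
b = 0.1
x = np.array([1.0, 2.0, -0.5])

z = np.dot(w, x) + b        # linear pre-activation: w . x + b
a = np.maximum(0.0, z)      # nonlinearity: ReLU applied to the pre-activation
print(z, a)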
2️⃣ Why Nonlinearities Are Important
✅ Enable deep networks to approximate complex functions.
✅ Allow neural networks to classify, detect, and predict complex data.
✅ Without nonlinearities, adding more layers would have no added benefit, as the short demonstration below shows.
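The last point can be verified directly: without activation functions, two stacked weight matrices collapse into a single matrix. A short NumPy demonstration with random, illustrative shapes:
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100,))        # a random input vector
W1 = rng.normal(size=(64, 100))    # "layer 1" weights, no activation
W2 = rng.normal(size=(10, 64))     # "layer 2" weights, no activation

two_layers = W2 @ (W1 @ x)         # output of the two stacked linear layers
one_layer = (W2 @ W1) @ x          # output of a single equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True: the extra layer added no expressive power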
3️⃣ Common Activation Functions
ReLU (Rectified Linear Unit)
[ f(x) = \max(0, x) ]
✅ Most commonly used in hidden layers due to its simplicity and effectiveness.
✅ Helps avoid vanishing gradient problems compared to sigmoid or tanh.
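A quick way to see the formula in action is to apply it element-wise to a small array (a minimal NumPy sketch, separate from the TensorFlow example further below):
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
relu = np.maximum(0.0, x)   # f(x) = max(0, x), applied element-wise
print(relu)                 # negatives become 0, positives pass through unchanged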
Sigmoid
[ \sigma(x) = \frac{1}{1 + e^{-x}} ]
✅ Squashes input between 0 and 1, useful for binary classification outputs.
✅ Can cause vanishing gradients in deep networks.
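The same formula can be written in a couple of lines of NumPy; note how large negative and positive inputs saturate near 0 and 1, which is where the vanishing-gradient issue comes from:
import numpy as np

def sigmoid(x):
    # squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))  # roughly [0.007, 0.5, 0.993]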
Tanh (Hyperbolic Tangent)
[ \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} ]
✅ Squashes input between -1 and 1, zero-centered, often preferred over sigmoid in hidden layers.
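NumPy ships tanh directly, so a sketch is even shorter:
import numpy as np

x = np.array([-2.0, 0.0, 2.0])
print(np.tanh(x))  # roughly [-0.96, 0.0, 0.96]; outputs are centered around zero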
Leaky ReLU
[ f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases} ]
✅ Allows a small gradient when the unit is not active, addressing the "dying ReLU" problem.
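A minimal NumPy sketch of Leaky ReLU, assuming a typical small slope of α = 0.01 (the exact value is a tunable choice, not fixed by the formula):
import numpy as np

def leaky_relu(x, alpha=0.01):
    # keep a small, nonzero slope for negative inputs so the gradient never dies
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -1.0, 0.0, 2.0])
print(leaky_relu(x))  # [-0.03, -0.01, 0., 2.]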
4️⃣ Example in Python
import tensorflow as tf
# Using ReLU in a simple model
model = tf.keras.Sequential([
    # Hidden layer: 64 units with ReLU nonlinearity, expecting 100 input features
    tf.keras.layers.Dense(64, activation='relu', input_shape=(100,)),
    # Output layer: 10 units with softmax for multi-class probabilities
    tf.keras.layers.Dense(10, activation='softmax')
])
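Continuing the model above, one typical way to prepare it for training is to compile it with an optimizer and a loss; the choices below are common defaults for a 10-class problem, not requirements of the lesson:
# Compile with a common optimizer/loss pairing for multi-class classification
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()  # prints the layer shapes and parameter counts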
5️⃣ Choosing the Right Activation Function
✅ Use ReLU or its variants in hidden layers for most deep learning tasks.
✅ Use sigmoid for binary classification outputs.
✅ Use softmax for multi-class classification outputs (see the example below).
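As a sketch of the last two guidelines, here are illustrative output layers for the two cases; the layer sizes and input shapes are arbitrary examples:
import tensorflow as tf

# Binary classification: a single sigmoid unit outputs a probability in (0, 1)
binary_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Multi-class classification: one softmax unit per class, outputs sum to 1
multiclass_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(5, activation='softmax')
])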
Conclusion
✅ Nonlinearities allow neural networks to learn complex patterns in data.
✅ Activation functions like ReLU, sigmoid, and tanh provide this nonlinearity.
✅ Understanding these will help you build better deep learning models.
What's Next?
✅ Experiment with different activation functions in your first deep learning models.
✅ Learn about advanced nonlinearities like Swish and GELU for modern architectures.
✅ Continue your structured deep learning journey on superml.org.
Join the SuperML Community to share your experiments and deepen your learning.
Happy Learning! ⚡