Course Content
Structure of Convolutions
Understanding the structure and operation of convolutions
Introduction
Convolutions are the core building blocks of Convolutional Neural Networks (CNNs), enabling efficient learning from image and signal data by capturing spatial hierarchies.
1οΈβ£ What is a Convolution?
A convolution is a mathematical operation that slides a filter (kernel) across the input data, computing dot products at each position to extract features like edges, textures, and patterns.
2οΈβ£ Key Concepts
Filters (Kernels)
Small matrices (e.g., 3x3, 5x5) with learnable weights that scan over the input.
Stride
Determines how many pixels the filter moves at each step.
β A stride of 1 scans pixel by pixel. β A stride of 2 skips every other pixel, reducing output size.
Padding
Adds borders of zeros around the input to:
β Preserve input size (βsameβ padding). β Allow filters to fully scan border areas.
3οΈβ£ Why Convolutions?
β
Reduce the number of parameters compared to fully connected layers.
β
Preserve spatial relationships in data.
β
Capture local patterns and build hierarchical representations in deeper layers.
4οΈβ£ Convolution Operation Intuition
For a 3x3 filter sliding over a 5x5 input:
- Multiply overlapping elements.
- Sum them up.
- Output a single value per position.
This process is repeated across the entire input to produce a feature map.
5οΈβ£ Example in TensorFlow
import tensorflow as tf
# Example input: grayscale image of size 28x28
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu', input_shape=(28, 28, 1)),
tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(10, activation='softmax')
])
6οΈβ£ Pooling Layers
Pooling layers often follow convolutional layers to:
β
Reduce spatial dimensions (downsampling).
β
Retain important features while reducing computation.
β
Common: Max Pooling (takes the max value in each region).
7οΈβ£ Stacking Convolutions
By stacking multiple convolutional layers:
β
Lower layers learn edges and textures.
β
Higher layers learn shapes and complex patterns.
β
CNNs build hierarchical feature representations essential for image tasks.
Conclusion
β
Convolutions allow CNNs to efficiently process and learn from spatial data.
β
Understanding filters, strides, padding, and pooling will help you design effective CNNs for image classification, object detection, and beyond.
Whatβs Next?
β
Experiment with different filter sizes, strides, and padding.
β
Visualize feature maps to see what your CNN is learning.
β
Continue your deep learning journey on superml.org
to explore advanced CNN architectures.
Join the SuperML Community to share your CNN experiments and get feedback.
Happy Learning! π·