Building Your First Neural Network Project
Complete project building a neural network from scratch
This project walks you through building a complete neural network from scratch for handwritten digit classification. You'll cover the entire deep learning pipeline, from data preparation to model evaluation.
Project Overview
We'll build a neural network to classify handwritten digits (0-9) using the famous MNIST dataset. This project covers all essential aspects of deep learning development.
What You'll Accomplish
- Load and preprocess the MNIST dataset
- Design a neural network architecture
- Implement forward and backward propagation
- Train the model with proper optimization
- Evaluate performance and visualize results
- Save and deploy your trained model
Step 1: Project Setup
First, let's set up our environment and import the necessary libraries:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import seaborn as sns
# Set random seed for reproducibility
np.random.seed(42)
Step 2: Data Loading and Exploration
Load the MNIST dataset and explore its structure:
# Load MNIST dataset
print("Loading MNIST dataset...")
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target.astype(int)
print(f"Dataset shape: {X.shape}")
print(f"Labels shape: {y.shape}")
print(f"Unique labels: {np.unique(y)}")
# Visualize sample images
def plot_sample_images(X, y, num_samples=10):
fig, axes = plt.subplots(2, 5, figsize=(12, 6))
axes = axes.ravel()
for i in range(num_samples):
img = X[i].reshape(28, 28)
axes[i].imshow(img, cmap='gray')
axes[i].set_title(f'Label: {y[i]}')
axes[i].axis('off')
plt.tight_layout()
plt.show()
plot_sample_images(X, y)
Step 3: Data Preprocessing
Prepare the data for neural network training:
# Normalize pixel values to [0, 1] range
X = X / 255.0
# One-hot encode labels
def one_hot_encode(y, num_classes=10):
encoded = np.zeros((len(y), num_classes))
for i, label in enumerate(y):
encoded[i, label] = 1
return encoded
y_encoded = one_hot_encode(y)
# Split into train, validation, and test sets
X_train, X_temp, y_train, y_temp = train_test_split(
X, y_encoded, test_size=0.3, random_state=42, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp.argmax(axis=1)
)
print(f"Training set: {X_train.shape}")
print(f"Validation set: {X_val.shape}")
print(f"Test set: {X_test.shape}")
Step 4: Neural Network Implementation
Build a neural network class from scratch:
class NeuralNetwork:
def __init__(self, layers):
"""
Initialize neural network with specified layer sizes
layers: list of layer sizes [input_size, hidden1_size, hidden2_size, output_size]
"""
self.layers = layers
self.num_layers = len(layers)
# Initialize weights and biases
self.weights = []
self.biases = []
for i in range(1, self.num_layers):
# He initialization (variance scaled by 2/fan_in, suited to ReLU layers)
w = np.random.randn(layers[i-1], layers[i]) * np.sqrt(2.0 / layers[i-1])
b = np.zeros((1, layers[i]))
self.weights.append(w)
self.biases.append(b)
def sigmoid(self, z):
"""Sigmoid activation function"""
return 1 / (1 + np.exp(-np.clip(z, -250, 250)))
def sigmoid_derivative(self, z):
"""Derivative of sigmoid function"""
s = self.sigmoid(z)
return s * (1 - s)
def relu(self, z):
"""ReLU activation function"""
return np.maximum(0, z)
def relu_derivative(self, z):
"""Derivative of ReLU function"""
return (z > 0).astype(float)
def softmax(self, z):
"""Softmax activation for output layer"""
exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
return exp_z / np.sum(exp_z, axis=1, keepdims=True)
def forward_propagation(self, X):
"""Forward pass through the network"""
self.z_values = []
self.a_values = [X]
# Hidden layers with ReLU
for i in range(len(self.weights) - 1):
z = np.dot(self.a_values[-1], self.weights[i]) + self.biases[i]
a = self.relu(z)
self.z_values.append(z)
self.a_values.append(a)
# Output layer with softmax
z_output = np.dot(self.a_values[-1], self.weights[-1]) + self.biases[-1]
a_output = self.softmax(z_output)
self.z_values.append(z_output)
self.a_values.append(a_output)
return a_output
def compute_loss(self, y_true, y_pred):
"""Compute cross-entropy loss"""
m = y_true.shape[0]
loss = -np.sum(y_true * np.log(y_pred + 1e-8)) / m
return loss
def backward_propagation(self, X, y_true, y_pred):
"""Backward pass to compute gradients"""
m = X.shape[0]
# Initialize gradients
dW = [np.zeros_like(w) for w in self.weights]
db = [np.zeros_like(b) for b in self.biases]
# Output layer gradient
dz = y_pred - y_true
dW[-1] = np.dot(self.a_values[-2].T, dz) / m
db[-1] = np.sum(dz, axis=0, keepdims=True) / m
# Hidden layers gradients
for i in range(len(self.weights) - 2, -1, -1):
dz = np.dot(dz, self.weights[i + 1].T) * self.relu_derivative(self.z_values[i])
dW[i] = np.dot(self.a_values[i].T, dz) / m
db[i] = np.sum(dz, axis=0, keepdims=True) / m
return dW, db
def update_parameters(self, dW, db, learning_rate):
"""Update weights and biases"""
for i in range(len(self.weights)):
self.weights[i] -= learning_rate * dW[i]
self.biases[i] -= learning_rate * db[i]
def train(self, X_train, y_train, X_val, y_val, epochs=100, learning_rate=0.001, batch_size=32):
"""Train the neural network"""
train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []
for epoch in range(epochs):
# Mini-batch training
for i in range(0, len(X_train), batch_size):
X_batch = X_train[i:i+batch_size]
y_batch = y_train[i:i+batch_size]
# Forward pass
y_pred = self.forward_propagation(X_batch)
# Backward pass
dW, db = self.backward_propagation(X_batch, y_batch, y_pred)
# Update parameters
self.update_parameters(dW, db, learning_rate)
# Calculate losses and accuracies
train_pred = self.forward_propagation(X_train)
val_pred = self.forward_propagation(X_val)
train_loss = self.compute_loss(y_train, train_pred)
val_loss = self.compute_loss(y_val, val_pred)
train_acc = self.calculate_accuracy(y_train, train_pred)
val_acc = self.calculate_accuracy(y_val, val_pred)
train_losses.append(train_loss)
val_losses.append(val_loss)
train_accuracies.append(train_acc)
val_accuracies.append(val_acc)
if epoch % 10 == 0:
print(f"Epoch {epoch}: Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}, "
f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
return train_losses, val_losses, train_accuracies, val_accuracies
def predict(self, X):
"""Make predictions"""
probabilities = self.forward_propagation(X)
return np.argmax(probabilities, axis=1)
def calculate_accuracy(self, y_true, y_pred):
"""Calculate accuracy"""
y_true_labels = np.argmax(y_true, axis=1)
y_pred_labels = np.argmax(y_pred, axis=1)
return np.mean(y_true_labels == y_pred_labels)
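A quick note on the gradient that backward_propagation starts from: because the output layer combines softmax with cross-entropy loss, the gradient of the loss with respect to the output pre-activations collapses to a simple difference,

∂L/∂z_output = y_pred − y_true   (per example; the division by the batch size m happens when dW and db are averaged).

This is why the code can set dz = y_pred - y_true directly instead of multiplying the softmax Jacobian by the loss gradient term by term.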
Step 5: Model Training
Train your neural network:
# Create and train the model
model = NeuralNetwork([784, 128, 64, 10])
print("Starting training...")
train_losses, val_losses, train_accs, val_accs = model.train(
X_train, y_train, X_val, y_val,
epochs=100, learning_rate=0.01, batch_size=64
)
print("Training completed!")
Step 6: Model Evaluation
Evaluate your model's performance:
# Test set evaluation
test_predictions = model.predict(X_test)
test_accuracy = np.mean(test_predictions == np.argmax(y_test, axis=1))
print(f"Test Accuracy: {test_accuracy:.4f}")
# Plot training history
def plot_training_history(train_losses, val_losses, train_accs, val_accs):
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
# Loss plot
ax1.plot(train_losses, label='Training Loss')
ax1.plot(val_losses, label='Validation Loss')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Training and Validation Loss')
ax1.legend()
ax1.grid(True)
# Accuracy plot
ax2.plot(train_accs, label='Training Accuracy')
ax2.plot(val_accs, label='Validation Accuracy')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.set_title('Training and Validation Accuracy')
ax2.legend()
ax2.grid(True)
plt.tight_layout()
plt.show()
plot_training_history(train_losses, val_losses, train_accs, val_accs)
# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report
y_test_labels = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_test_labels, test_predictions)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
# Classification report
print("\nClassification Report:")
print(classification_report(y_test_labels, test_predictions))
Step 7: Visualize Predictions
Visualize some predictions to understand model behavior:
def visualize_predictions(X, y_true, predictions, num_samples=10):
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
axes = axes.ravel()
# Get some random samples
indices = np.random.choice(len(X), num_samples, replace=False)
for i, idx in enumerate(indices):
img = X[idx].reshape(28, 28)
true_label = y_true[idx]
pred_label = predictions[idx]
color = 'green' if true_label == pred_label else 'red'
axes[i].imshow(img, cmap='gray')
axes[i].set_title(f'True: {true_label}, Pred: {pred_label}', color=color)
axes[i].axis('off')
plt.tight_layout()
plt.show()
visualize_predictions(X_test, y_test_labels, test_predictions)
Step 8: Error Analysis
Analyze where the model goes wrong; the patterns you find here will point toward the improvements listed under Next Steps:
def analyze_errors(y_true, predictions):
"""Analyze prediction errors"""
errors = y_true != predictions
error_indices = np.where(errors)[0]
print(f"Total errors: {len(error_indices)} out of {len(y_true)}")
print(f"Error rate: {len(error_indices)/len(y_true)*100:.2f}%")
# Most confused classes
error_combinations = {}
for idx in error_indices:
true_pred_pair = (y_true[idx], predictions[idx])
error_combinations[true_pred_pair] = error_combinations.get(true_pred_pair, 0) + 1
print("\nMost common errors:")
sorted_errors = sorted(error_combinations.items(), key=lambda x: x[1], reverse=True)
for (true_label, pred_label), count in sorted_errors[:5]:
print(f"True: {true_label}, Predicted: {pred_label}, Count: {count}")
analyze_errors(y_test_labels, test_predictions)
Project Summary
Congratulations! You've built a complete neural network project. Here's what you accomplished:
Key Achievements
- Built a neural network from scratch using only NumPy
- Implemented forward and backward propagation
- Trained on the MNIST dataset with proper data preprocessing
- Achieved high classification accuracy on handwritten digits
- Evaluated model performance with multiple metrics
- Analyzed errors to understand model limitations
Next Steps
- Experiment with architectures: Try different layer sizes and depths
- Add regularization: Implement dropout or L2 regularization
- Try different optimizers: Implement Adam or RMSprop
- Data augmentation: Rotate, scale, or shift images for better generalization
- Deploy your model: Save the trained parameters and create a simple web interface (see the sketch after this list)
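For the last point, a minimal sketch of saving and reloading the trained parameters with plain NumPy might look like this. The filename and restore logic are illustrative assumptions, since the NeuralNetwork class above has no built-in save or load methods; the snippet simply reads and writes its weights and biases attributes:
# Save the trained parameters to a single file (example filename)
np.savez('mnist_mlp_params.npz',
         layers=np.array(model.layers),
         **{f'W{i}': w for i, w in enumerate(model.weights)},
         **{f'b{i}': b for i, b in enumerate(model.biases)})

# Later: rebuild the network and restore its parameters
params = np.load('mnist_mlp_params.npz')
restored = NeuralNetwork(list(params['layers']))
restored.weights = [params[f'W{i}'] for i in range(len(restored.weights))]
restored.biases = [params[f'b{i}'] for i in range(len(restored.biases))]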
Portfolio Addition
This project demonstrates:
- Deep learning fundamentals
- Mathematical implementation skills
- Data preprocessing and evaluation
- Performance analysis and visualization
Save your code, results, and analysis in a GitHub repository to showcase your deep learning skills to potential employers or for academic purposes.
Conclusion
Building a neural network from scratch gives you invaluable insight into how deep learning works under the hood. You now understand the mathematical foundations and can confidently move on to high-level frameworks like PyTorch and TensorFlow with a solid grasp of what's happening behind the scenes.
Keep experimenting and building more complex projects to deepen your deep learning expertise!