πŸ—οΈ Project ⏱️ 240 minutes

Building Your First Neural Network Project

Complete project building a neural network from scratch

This comprehensive project will guide you through building a complete neural network from scratch for handwritten digit classification. You’ll learn the entire deep learning pipeline from data preparation to model evaluation.

Project Overview

We’ll build a neural network to classify handwritten digits (0-9) using the famous MNIST dataset. This project covers all essential aspects of deep learning development.

What You’ll Accomplish

  • βœ… Load and preprocess the MNIST dataset
  • βœ… Design a neural network architecture
  • βœ… Implement forward and backward propagation
  • βœ… Train the model with proper optimization
  • βœ… Evaluate performance and visualize results
  • βœ… Save and deploy your trained model

Step 1: Project Setup

First, let’s set up our environment and import necessary libraries:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)

Step 2: Data Loading and Exploration

Load the MNIST dataset and explore its structure:

# Load MNIST dataset
print("Loading MNIST dataset...")
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target.astype(int)

print(f"Dataset shape: {X.shape}")
print(f"Labels shape: {y.shape}")
print(f"Unique labels: {np.unique(y)}")

# Visualize sample images
def plot_sample_images(X, y, num_samples=10):
    fig, axes = plt.subplots(2, 5, figsize=(12, 6))
    axes = axes.ravel()
    
    for i in range(num_samples):
        img = X[i].reshape(28, 28)
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(f'Label: {y[i]}')
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

plot_sample_images(X, y)

Step 3: Data Preprocessing

Prepare the data for neural network training:

# Normalize pixel values to [0, 1] range
X = X / 255.0

# One-hot encode labels
def one_hot_encode(y, num_classes=10):
    encoded = np.zeros((len(y), num_classes))
    for i, label in enumerate(y):
        encoded[i, label] = 1
    return encoded

y_encoded = one_hot_encode(y)

# Split into train, validation, and test sets
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y_encoded, test_size=0.3, random_state=42, stratify=y
)

X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp.argmax(axis=1)
)

print(f"Training set: {X_train.shape}")
print(f"Validation set: {X_val.shape}")
print(f"Test set: {X_test.shape}")

Step 4: Neural Network Implementation

Build a neural network class from scratch:

class NeuralNetwork:
    def __init__(self, layers):
        """
        Initialize neural network with specified layer sizes
        layers: list of layer sizes [input_size, hidden1_size, hidden2_size, output_size]
        """
        self.layers = layers
        self.num_layers = len(layers)
        
        # Initialize weights and biases
        self.weights = []
        self.biases = []
        
        for i in range(1, self.num_layers):
            # He initialization (scaled for the ReLU hidden layers)
            w = np.random.randn(layers[i-1], layers[i]) * np.sqrt(2.0 / layers[i-1])
            b = np.zeros((1, layers[i]))
            
            self.weights.append(w)
            self.biases.append(b)
    
    def sigmoid(self, z):
        """Sigmoid activation function"""
        return 1 / (1 + np.exp(-np.clip(z, -250, 250)))
    
    def sigmoid_derivative(self, z):
        """Derivative of sigmoid function"""
        s = self.sigmoid(z)
        return s * (1 - s)
    
    def relu(self, z):
        """ReLU activation function"""
        return np.maximum(0, z)
    
    def relu_derivative(self, z):
        """Derivative of ReLU function"""
        return (z > 0).astype(float)
    
    def softmax(self, z):
        """Softmax activation for output layer"""
        exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)
    
    def forward_propagation(self, X):
        """Forward pass through the network"""
        self.z_values = []
        self.a_values = [X]
        
        # Hidden layers with ReLU
        for i in range(len(self.weights) - 1):
            z = np.dot(self.a_values[-1], self.weights[i]) + self.biases[i]
            a = self.relu(z)
            
            self.z_values.append(z)
            self.a_values.append(a)
        
        # Output layer with softmax
        z_output = np.dot(self.a_values[-1], self.weights[-1]) + self.biases[-1]
        a_output = self.softmax(z_output)
        
        self.z_values.append(z_output)
        self.a_values.append(a_output)
        
        return a_output
    
    def compute_loss(self, y_true, y_pred):
        """Compute cross-entropy loss"""
        m = y_true.shape[0]
        loss = -np.sum(y_true * np.log(y_pred + 1e-8)) / m
        return loss
    
    def backward_propagation(self, X, y_true, y_pred):
        """Backward pass to compute gradients"""
        m = X.shape[0]
        
        # Initialize gradients
        dW = [np.zeros_like(w) for w in self.weights]
        db = [np.zeros_like(b) for b in self.biases]
        
        # Output layer gradient
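        # With softmax outputs and cross-entropy loss, the gradient with respect
        # to the pre-softmax activations simplifies to (predictions - targets)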
        dz = y_pred - y_true
        dW[-1] = np.dot(self.a_values[-2].T, dz) / m
        db[-1] = np.sum(dz, axis=0, keepdims=True) / m
        
        # Hidden layers gradients
        for i in range(len(self.weights) - 2, -1, -1):
            dz = np.dot(dz, self.weights[i + 1].T) * self.relu_derivative(self.z_values[i])
            dW[i] = np.dot(self.a_values[i].T, dz) / m
            db[i] = np.sum(dz, axis=0, keepdims=True) / m
        
        return dW, db
    
    def update_parameters(self, dW, db, learning_rate):
        """Update weights and biases"""
        for i in range(len(self.weights)):
            self.weights[i] -= learning_rate * dW[i]
            self.biases[i] -= learning_rate * db[i]
    
    def train(self, X_train, y_train, X_val, y_val, epochs=100, learning_rate=0.001, batch_size=32):
        """Train the neural network"""
        train_losses = []
        val_losses = []
        train_accuracies = []
        val_accuracies = []
        
        for epoch in range(epochs):
            # Mini-batch training
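            # Note: batches are taken in a fixed order each epoch; reshuffling the
            # training data at the start of every epoch is a common refinement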
            for i in range(0, len(X_train), batch_size):
                X_batch = X_train[i:i+batch_size]
                y_batch = y_train[i:i+batch_size]
                
                # Forward pass
                y_pred = self.forward_propagation(X_batch)
                
                # Backward pass
                dW, db = self.backward_propagation(X_batch, y_batch, y_pred)
                
                # Update parameters
                self.update_parameters(dW, db, learning_rate)
            
            # Calculate losses and accuracies
            train_pred = self.forward_propagation(X_train)
            val_pred = self.forward_propagation(X_val)
            
            train_loss = self.compute_loss(y_train, train_pred)
            val_loss = self.compute_loss(y_val, val_pred)
            
            train_acc = self.calculate_accuracy(y_train, train_pred)
            val_acc = self.calculate_accuracy(y_val, val_pred)
            
            train_losses.append(train_loss)
            val_losses.append(val_loss)
            train_accuracies.append(train_acc)
            val_accuracies.append(val_acc)
            
            if epoch % 10 == 0:
                print(f"Epoch {epoch}: Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}, "
                      f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
        
        return train_losses, val_losses, train_accuracies, val_accuracies
    
    def predict(self, X):
        """Make predictions"""
        probabilities = self.forward_propagation(X)
        return np.argmax(probabilities, axis=1)
    
    def calculate_accuracy(self, y_true, y_pred):
        """Calculate accuracy"""
        y_true_labels = np.argmax(y_true, axis=1)
        y_pred_labels = np.argmax(y_pred, axis=1)
        return np.mean(y_true_labels == y_pred_labels)
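
Before training on the full dataset, it can be reassuring to confirm that backward_propagation matches numerical gradients. The sketch below is an optional sanity check and not part of the original steps: gradient_check, check_model, and the tiny layer sizes are illustrative additions that use random data.

# Optional: numerical gradient check on a tiny network (illustrative only)
def gradient_check(model, X, y, epsilon=1e-5):
    y_pred = model.forward_propagation(X)
    dW, _ = model.backward_propagation(X, y, y_pred)
    
    # Perturb a few first-layer weights and compare analytic vs numerical gradients
    for idx in [(0, 0), (1, 2), (3, 4)]:
        original = model.weights[0][idx]
        
        model.weights[0][idx] = original + epsilon
        loss_plus = model.compute_loss(y, model.forward_propagation(X))
        
        model.weights[0][idx] = original - epsilon
        loss_minus = model.compute_loss(y, model.forward_propagation(X))
        
        model.weights[0][idx] = original  # restore the weight
        numerical = (loss_plus - loss_minus) / (2 * epsilon)
        print(f"Weight {idx}: analytic {dW[0][idx]:.6f}, numerical {numerical:.6f}")

check_model = NeuralNetwork([4, 5, 3])
X_check = np.random.randn(8, 4)
y_check = one_hot_encode(np.random.randint(0, 3, size=8), num_classes=3)
gradient_check(check_model, X_check, y_check)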

Step 5: Model Training

Train your neural network:

# Create and train the model
model = NeuralNetwork([784, 128, 64, 10])

print("Starting training...")
train_losses, val_losses, train_accs, val_accs = model.train(
    X_train, y_train, X_val, y_val,
    epochs=100, learning_rate=0.01, batch_size=64
)

print("Training completed!")

Step 6: Model Evaluation

Evaluate your model’s performance:

# Test set evaluation
test_predictions = model.predict(X_test)
test_accuracy = np.mean(test_predictions == np.argmax(y_test, axis=1))
print(f"Test Accuracy: {test_accuracy:.4f}")

# Plot training history
def plot_training_history(train_losses, val_losses, train_accs, val_accs):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Loss plot
    ax1.plot(train_losses, label='Training Loss')
    ax1.plot(val_losses, label='Validation Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.set_title('Training and Validation Loss')
    ax1.legend()
    ax1.grid(True)
    
    # Accuracy plot
    ax2.plot(train_accs, label='Training Accuracy')
    ax2.plot(val_accs, label='Validation Accuracy')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.set_title('Training and Validation Accuracy')
    ax2.legend()
    ax2.grid(True)
    
    plt.tight_layout()
    plt.show()

plot_training_history(train_losses, val_losses, train_accs, val_accs)

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report

y_test_labels = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_test_labels, test_predictions)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# Classification report
print("\nClassification Report:")
print(classification_report(y_test_labels, test_predictions))

Step 7: Visualize Predictions

Visualize some predictions to understand model behavior:

def visualize_predictions(X, y_true, predictions, num_samples=10):
    fig, axes = plt.subplots(2, 5, figsize=(15, 6))
    axes = axes.ravel()
    
    # Get some random samples
    indices = np.random.choice(len(X), num_samples, replace=False)
    
    for i, idx in enumerate(indices):
        img = X[idx].reshape(28, 28)
        true_label = y_true[idx]
        pred_label = predictions[idx]
        
        color = 'green' if true_label == pred_label else 'red'
        
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(f'True: {true_label}, Pred: {pred_label}', color=color)
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

visualize_predictions(X_test, y_test_labels, test_predictions)

Step 8: Error Analysis

Analyze the prediction errors to see where the model struggles; the findings can guide the improvements listed under Next Steps:

def analyze_errors(y_true, predictions):
    """Analyze prediction errors"""
    errors = y_true != predictions
    error_indices = np.where(errors)[0]
    
    print(f"Total errors: {len(error_indices)} out of {len(y_true)}")
    print(f"Error rate: {len(error_indices)/len(y_true)*100:.2f}%")
    
    # Most confused classes
    error_combinations = {}
    for idx in error_indices:
        true_pred_pair = (y_true[idx], predictions[idx])
        error_combinations[true_pred_pair] = error_combinations.get(true_pred_pair, 0) + 1
    
    print("\nMost common errors:")
    sorted_errors = sorted(error_combinations.items(), key=lambda x: x[1], reverse=True)
    for (true_label, pred_label), count in sorted_errors[:5]:
        print(f"True: {true_label}, Predicted: {pred_label}, Count: {count}")

analyze_errors(y_test_labels, test_predictions)

Project Summary

Congratulations! You’ve built a complete neural network project. Here’s what you accomplished:

βœ… Key Achievements

  • Built a neural network from scratch using only NumPy
  • Implemented forward and backward propagation
  • Trained on the MNIST dataset with proper data preprocessing
  • Achieved strong test accuracy on handwritten digit classification
  • Evaluated model performance with multiple metrics
  • Analyzed errors to understand model limitations

πŸš€ Next Steps

  1. Experiment with architectures: Try different layer sizes and depths
  2. Add regularization: Implement dropout or L2 regularization
  3. Try different optimizers: Implement Adam or RMSprop
  4. Data augmentation: Rotate, scale, or shift images for better generalization
  5. Deploy your model: Save the model and create a simple web interface (see the saving/loading sketch after this list)
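
For item 5, the sketch below shows one way the trained parameters could be saved and restored with NumPy. The helper names (save_model, load_model) and the file name mnist_model.npz are illustrative choices, not part of the original project.

# Save and restore the trained parameters (a minimal sketch)
def save_model(model, path="mnist_model.npz"):
    params = {}
    for i, (w, b) in enumerate(zip(model.weights, model.biases)):
        params[f"W{i}"] = w
        params[f"b{i}"] = b
    np.savez(path, layers=np.array(model.layers), **params)

def load_model(path="mnist_model.npz"):
    data = np.load(path)
    restored = NeuralNetwork(list(data["layers"]))
    restored.weights = [data[f"W{i}"] for i in range(restored.num_layers - 1)]
    restored.biases = [data[f"b{i}"] for i in range(restored.num_layers - 1)]
    return restored

save_model(model)
reloaded = load_model()
print(f"Reloaded model test accuracy: "
      f"{np.mean(reloaded.predict(X_test) == np.argmax(y_test, axis=1)):.4f}")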

πŸ“š Portfolio Addition

This project demonstrates:

  • Deep learning fundamentals
  • Mathematical implementation skills
  • Data preprocessing and evaluation
  • Performance analysis and visualization

Save your code, results, and analysis in a GitHub repository to showcase your deep learning skills to potential employers or for academic purposes.

Conclusion

Building a neural network from scratch gives you invaluable insights into how deep learning works under the hood. You now understand the mathematical foundations and can confidently move on to using high-level frameworks like PyTorch and TensorFlow with a solid understanding of what’s happening behind the scenes.
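
If you later move to a framework, the same architecture takes only a few lines. The sketch below is for comparison only and assumes PyTorch is installed; it is not part of the from-scratch project.

# The same 784-128-64-10 architecture declared in PyTorch (comparison only)
import torch
import torch.nn as nn

torch_model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),  # nn.CrossEntropyLoss applies log-softmax internally
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(torch_model.parameters(), lr=0.01)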

Keep experimenting and building more complex projects to deepen your deep learning expertise!