πŸ—οΈ Project ⏱️ 240 minutes

Building Your First Neural Network Project

Complete project building a neural network from scratch

This comprehensive project will guide you through building a complete neural network from scratch for handwritten digit classification. You’ll learn the entire deep learning pipeline from data preparation to model evaluation.

Project Overview

We’ll build a neural network to classify handwritten digits (0-9) using the famous MNIST dataset. This project covers all essential aspects of deep learning development.

What You’ll Accomplish

  • βœ… Load and preprocess the MNIST dataset
  • βœ… Design a neural network architecture
  • βœ… Implement forward and backward propagation
  • βœ… Train the model with proper optimization
  • βœ… Evaluate performance and visualize results
  • βœ… Save and deploy your trained model

Step 1: Project Setup

First, let’s set up our environment and import necessary libraries:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)

Step 2: Data Loading and Exploration

Load the MNIST dataset and explore its structure:

# Load MNIST dataset
print("Loading MNIST dataset...")
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target.astype(int)

print(f"Dataset shape: {X.shape}")
print(f"Labels shape: {y.shape}")
print(f"Unique labels: {np.unique(y)}")

# Visualize sample images
def plot_sample_images(X, y, num_samples=10):
    fig, axes = plt.subplots(2, 5, figsize=(12, 6))
    axes = axes.ravel()
    
    for i in range(num_samples):
        img = X[i].reshape(28, 28)
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(f'Label: {y[i]}')
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

plot_sample_images(X, y)

Step 3: Data Preprocessing

Prepare the data for neural network training:

# Normalize pixel values to [0, 1] range
X = X / 255.0

# One-hot encode labels
def one_hot_encode(y, num_classes=10):
    encoded = np.zeros((len(y), num_classes))
    for i, label in enumerate(y):
        encoded[i, label] = 1
    return encoded

y_encoded = one_hot_encode(y)

# Split into train, validation, and test sets
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y_encoded, test_size=0.3, random_state=42, stratify=y
)

X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp.argmax(axis=1)
)

print(f"Training set: {X_train.shape}")
print(f"Validation set: {X_val.shape}")
print(f"Test set: {X_test.shape}")

Step 4: Neural Network Implementation

Build a neural network class from scratch:

class NeuralNetwork:
    def __init__(self, layers):
        """
        Initialize neural network with specified layer sizes
        layers: list of layer sizes [input_size, hidden1_size, hidden2_size, output_size]
        """
        self.layers = layers
        self.num_layers = len(layers)
        
        # Initialize weights and biases
        self.weights = []
        self.biases = []
        
        for i in range(1, self.num_layers):
            # He initialization (scaled for the ReLU hidden layers)
            w = np.random.randn(layers[i-1], layers[i]) * np.sqrt(2.0 / layers[i-1])
            b = np.zeros((1, layers[i]))
            
            self.weights.append(w)
            self.biases.append(b)
    
    def sigmoid(self, z):
        """Sigmoid activation function"""
        return 1 / (1 + np.exp(-np.clip(z, -250, 250)))
    
    def sigmoid_derivative(self, z):
        """Derivative of sigmoid function"""
        s = self.sigmoid(z)
        return s * (1 - s)
    
    def relu(self, z):
        """ReLU activation function"""
        return np.maximum(0, z)
    
    def relu_derivative(self, z):
        """Derivative of ReLU function"""
        return (z > 0).astype(float)
    
    def softmax(self, z):
        """Softmax activation for output layer"""
        exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)
    
    def forward_propagation(self, X):
        """Forward pass through the network"""
        self.z_values = []
        self.a_values = [X]
        
        # Hidden layers with ReLU
        for i in range(len(self.weights) - 1):
            z = np.dot(self.a_values[-1], self.weights[i]) + self.biases[i]
            a = self.relu(z)
            
            self.z_values.append(z)
            self.a_values.append(a)
        
        # Output layer with softmax
        z_output = np.dot(self.a_values[-1], self.weights[-1]) + self.biases[-1]
        a_output = self.softmax(z_output)
        
        self.z_values.append(z_output)
        self.a_values.append(a_output)
        
        return a_output
    
    def compute_loss(self, y_true, y_pred):
        """Compute cross-entropy loss"""
        m = y_true.shape[0]
        loss = -np.sum(y_true * np.log(y_pred + 1e-8)) / m
        return loss
    
    def backward_propagation(self, X, y_true, y_pred):
        """Backward pass to compute gradients"""
        m = X.shape[0]
        
        # Initialize gradients
        dW = [np.zeros_like(w) for w in self.weights]
        db = [np.zeros_like(b) for b in self.biases]
        
        # Output layer gradient
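        # With softmax outputs and cross-entropy loss, the gradient with respect
        # to the pre-softmax activations simplifies to (predictions - targets)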
        dz = y_pred - y_true
        dW[-1] = np.dot(self.a_values[-2].T, dz) / m
        db[-1] = np.sum(dz, axis=0, keepdims=True) / m
        
        # Hidden layers gradients
        for i in range(len(self.weights) - 2, -1, -1):
            dz = np.dot(dz, self.weights[i + 1].T) * self.relu_derivative(self.z_values[i])
            dW[i] = np.dot(self.a_values[i].T, dz) / m
            db[i] = np.sum(dz, axis=0, keepdims=True) / m
        
        return dW, db
    
    def update_parameters(self, dW, db, learning_rate):
        """Update weights and biases"""
        for i in range(len(self.weights)):
            self.weights[i] -= learning_rate * dW[i]
            self.biases[i] -= learning_rate * db[i]
    
    def train(self, X_train, y_train, X_val, y_val, epochs=100, learning_rate=0.001, batch_size=32):
        """Train the neural network"""
        train_losses = []
        val_losses = []
        train_accuracies = []
        val_accuracies = []
        
        for epoch in range(epochs):
            # Mini-batch training
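            # Note: batches are taken in a fixed order each epoch; reshuffling the
            # training data at the start of every epoch is a common refinement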
            for i in range(0, len(X_train), batch_size):
                X_batch = X_train[i:i+batch_size]
                y_batch = y_train[i:i+batch_size]
                
                # Forward pass
                y_pred = self.forward_propagation(X_batch)
                
                # Backward pass
                dW, db = self.backward_propagation(X_batch, y_batch, y_pred)
                
                # Update parameters
                self.update_parameters(dW, db, learning_rate)
            
            # Calculate losses and accuracies
            train_pred = self.forward_propagation(X_train)
            val_pred = self.forward_propagation(X_val)
            
            train_loss = self.compute_loss(y_train, train_pred)
            val_loss = self.compute_loss(y_val, val_pred)
            
            train_acc = self.calculate_accuracy(y_train, train_pred)
            val_acc = self.calculate_accuracy(y_val, val_pred)
            
            train_losses.append(train_loss)
            val_losses.append(val_loss)
            train_accuracies.append(train_acc)
            val_accuracies.append(val_acc)
            
            if epoch % 10 == 0:
                print(f"Epoch {epoch}: Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}, "
                      f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
        
        return train_losses, val_losses, train_accuracies, val_accuracies
    
    def predict(self, X):
        """Make predictions"""
        probabilities = self.forward_propagation(X)
        return np.argmax(probabilities, axis=1)
    
    def calculate_accuracy(self, y_true, y_pred):
        """Calculate accuracy"""
        y_true_labels = np.argmax(y_true, axis=1)
        y_pred_labels = np.argmax(y_pred, axis=1)
        return np.mean(y_true_labels == y_pred_labels)
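
Before training on the full dataset, it can be reassuring to confirm that backward_propagation matches numerical gradients. The sketch below is an optional sanity check and not part of the original steps: gradient_check, check_model, and the tiny layer sizes are illustrative additions that use random data.

# Optional: numerical gradient check on a tiny network (illustrative only)
def gradient_check(model, X, y, epsilon=1e-5):
    y_pred = model.forward_propagation(X)
    dW, _ = model.backward_propagation(X, y, y_pred)
    
    # Perturb a few first-layer weights and compare analytic vs numerical gradients
    for idx in [(0, 0), (1, 2), (3, 4)]:
        original = model.weights[0][idx]
        
        model.weights[0][idx] = original + epsilon
        loss_plus = model.compute_loss(y, model.forward_propagation(X))
        
        model.weights[0][idx] = original - epsilon
        loss_minus = model.compute_loss(y, model.forward_propagation(X))
        
        model.weights[0][idx] = original  # restore the weight
        numerical = (loss_plus - loss_minus) / (2 * epsilon)
        print(f"Weight {idx}: analytic {dW[0][idx]:.6f}, numerical {numerical:.6f}")

check_model = NeuralNetwork([4, 5, 3])
X_check = np.random.randn(8, 4)
y_check = one_hot_encode(np.random.randint(0, 3, size=8), num_classes=3)
gradient_check(check_model, X_check, y_check)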

Step 5: Model Training

Train your neural network:

# Create and train the model
model = NeuralNetwork([784, 128, 64, 10])

print("Starting training...")
train_losses, val_losses, train_accs, val_accs = model.train(
    X_train, y_train, X_val, y_val,
    epochs=100, learning_rate=0.01, batch_size=64
)

print("Training completed!")

Step 6: Model Evaluation

Evaluate your model’s performance:

# Test set evaluation
test_predictions = model.predict(X_test)
test_accuracy = np.mean(test_predictions == np.argmax(y_test, axis=1))
print(f"Test Accuracy: {test_accuracy:.4f}")

# Plot training history
def plot_training_history(train_losses, val_losses, train_accs, val_accs):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Loss plot
    ax1.plot(train_losses, label='Training Loss')
    ax1.plot(val_losses, label='Validation Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.set_title('Training and Validation Loss')
    ax1.legend()
    ax1.grid(True)
    
    # Accuracy plot
    ax2.plot(train_accs, label='Training Accuracy')
    ax2.plot(val_accs, label='Validation Accuracy')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.set_title('Training and Validation Accuracy')
    ax2.legend()
    ax2.grid(True)
    
    plt.tight_layout()
    plt.show()

plot_training_history(train_losses, val_losses, train_accs, val_accs)

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report

y_test_labels = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_test_labels, test_predictions)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# Classification report
print("\nClassification Report:")
print(classification_report(y_test_labels, test_predictions))

Step 7: Visualize Predictions

Visualize some predictions to understand model behavior:

def visualize_predictions(X, y_true, predictions, num_samples=10):
    fig, axes = plt.subplots(2, 5, figsize=(15, 6))
    axes = axes.ravel()
    
    # Get some random samples
    indices = np.random.choice(len(X), num_samples, replace=False)
    
    for i, idx in enumerate(indices):
        img = X[idx].reshape(28, 28)
        true_label = y_true[idx]
        pred_label = predictions[idx]
        
        color = 'green' if true_label == pred_label else 'red'
        
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(f'True: {true_label}, Pred: {pred_label}', color=color)
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

visualize_predictions(X_test, y_test_labels, test_predictions)

Step 8: Error Analysis

Analyze the prediction errors to see where the model struggles; the findings can guide the improvements listed under Next Steps:

def analyze_errors(y_true, predictions):
    """Analyze prediction errors"""
    errors = y_true != predictions
    error_indices = np.where(errors)[0]
    
    print(f"Total errors: {len(error_indices)} out of {len(y_true)}")
    print(f"Error rate: {len(error_indices)/len(y_true)*100:.2f}%")
    
    # Most confused classes
    error_combinations = {}
    for idx in error_indices:
        true_pred_pair = (y_true[idx], predictions[idx])
        error_combinations[true_pred_pair] = error_combinations.get(true_pred_pair, 0) + 1
    
    print("\nMost common errors:")
    sorted_errors = sorted(error_combinations.items(), key=lambda x: x[1], reverse=True)
    for (true_label, pred_label), count in sorted_errors[:5]:
        print(f"True: {true_label}, Predicted: {pred_label}, Count: {count}")

analyze_errors(y_test_labels, test_predictions)

Project Summary

Congratulations! You’ve built a complete neural network project. Here’s what you accomplished:

βœ… Key Achievements

  • Built a neural network from scratch using only NumPy
  • Implemented forward and backward propagation
  • Trained on the MNIST dataset with proper data preprocessing
  • Achieved strong test accuracy on handwritten digit classification
  • Evaluated model performance with multiple metrics
  • Analyzed errors to understand model limitations

πŸš€ Next Steps

  1. Experiment with architectures: Try different layer sizes and depths
  2. Add regularization: Implement dropout or L2 regularization
  3. Try different optimizers: Implement Adam or RMSprop
  4. Data augmentation: Rotate, scale, or shift images for better generalization
  5. Deploy your model: Save the model and create a simple web interface (see the saving/loading sketch after this list)
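
For item 5, the sketch below shows one way the trained parameters could be saved and restored with NumPy. The helper names (save_model, load_model) and the file name mnist_model.npz are illustrative choices, not part of the original project.

# Save and restore the trained parameters (a minimal sketch)
def save_model(model, path="mnist_model.npz"):
    params = {}
    for i, (w, b) in enumerate(zip(model.weights, model.biases)):
        params[f"W{i}"] = w
        params[f"b{i}"] = b
    np.savez(path, layers=np.array(model.layers), **params)

def load_model(path="mnist_model.npz"):
    data = np.load(path)
    restored = NeuralNetwork(list(data["layers"]))
    restored.weights = [data[f"W{i}"] for i in range(restored.num_layers - 1)]
    restored.biases = [data[f"b{i}"] for i in range(restored.num_layers - 1)]
    return restored

save_model(model)
reloaded = load_model()
print(f"Reloaded model test accuracy: "
      f"{np.mean(reloaded.predict(X_test) == np.argmax(y_test, axis=1)):.4f}")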

πŸ“š Portfolio Addition

This project demonstrates:

  • Deep learning fundamentals
  • Mathematical implementation skills
  • Data preprocessing and evaluation
  • Performance analysis and visualization

Save your code, results, and analysis in a GitHub repository to showcase your deep learning skills to potential employers or for academic purposes.

Conclusion

Building a neural network from scratch gives you invaluable insights into how deep learning works under the hood. You now understand the mathematical foundations and can confidently move on to using high-level frameworks like PyTorch and TensorFlow with a solid understanding of what’s happening behind the scenes.
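
If you later move to a framework, the same architecture takes only a few lines. The sketch below is for comparison only and assumes PyTorch is installed; it is not part of the from-scratch project.

# The same 784-128-64-10 architecture declared in PyTorch (comparison only)
import torch
import torch.nn as nn

torch_model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),  # nn.CrossEntropyLoss applies log-softmax internally
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(torch_model.parameters(), lr=0.01)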

Keep experimenting and building more complex projects to deepen your deep learning expertise!