Building Your First Neural Network Project

Build a complete neural network project from scratch, including data preparation, model design, training, and evaluation for image classification

🔰 beginner
⏱️ 240 minutes
👤 SuperML Team · Deep Learning · 7 min read

📋 Prerequisites

  • Understanding of neural network basics
  • Python programming with NumPy
  • Basic knowledge of deep learning concepts

🎯 What You'll Learn

  • Build a complete neural network project from start to finish
  • Implement data preprocessing for image classification
  • Design and train a neural network architecture
  • Evaluate model performance and make improvements
  • Create a project suitable for your portfolio

This comprehensive project will guide you through building a complete neural network from scratch for handwritten digit classification. You’ll learn the entire deep learning pipeline from data preparation to model evaluation.

Project Overview

We’ll build a neural network to classify handwritten digits (0-9) using the famous MNIST dataset. This project covers all essential aspects of deep learning development.

What You’ll Accomplish

  • ✅ Load and preprocess the MNIST dataset
  • ✅ Design a neural network architecture
  • ✅ Implement forward and backward propagation
  • ✅ Train the model with proper optimization
  • ✅ Evaluate performance and visualize results
  • ✅ Save and deploy your trained model

Step 1: Project Setup

First, let’s set up our environment and import necessary libraries:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(42)

Step 2: Data Loading and Exploration

Load the MNIST dataset and explore its structure:

# Load MNIST dataset
print("Loading MNIST dataset...")
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target.astype(int)

print(f"Dataset shape: {X.shape}")
print(f"Labels shape: {y.shape}")
print(f"Unique labels: {np.unique(y)}")

# Visualize sample images
def plot_sample_images(X, y, num_samples=10):
    fig, axes = plt.subplots(2, 5, figsize=(12, 6))
    axes = axes.ravel()
    
    for i in range(num_samples):
        img = X[i].reshape(28, 28)
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(f'Label: {y[i]}')
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

plot_sample_images(X, y)

Step 3: Data Preprocessing

Prepare the data for neural network training:

# Normalize pixel values to [0, 1] range
X = X / 255.0

# One-hot encode labels
def one_hot_encode(y, num_classes=10):
    encoded = np.zeros((len(y), num_classes))
    for i, label in enumerate(y):
        encoded[i, label] = 1
    return encoded

y_encoded = one_hot_encode(y)

# Split into train, validation, and test sets
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y_encoded, test_size=0.3, random_state=42, stratify=y
)

X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp.argmax(axis=1)
)

print(f"Training set: {X_train.shape}")
print(f"Validation set: {X_val.shape}")
print(f"Test set: {X_test.shape}")

Step 4: Neural Network Implementation

Build a neural network class from scratch:

class NeuralNetwork:
    def __init__(self, layers):
        """
        Initialize neural network with specified layer sizes
        layers: list of layer sizes [input_size, hidden1_size, hidden2_size, output_size]
        """
        self.layers = layers
        self.num_layers = len(layers)
        
        # Initialize weights and biases
        self.weights = []
        self.biases = []
        
        for i in range(1, self.num_layers):
            # He initialization (scaled by sqrt(2 / fan_in), well suited to ReLU layers)
            w = np.random.randn(layers[i-1], layers[i]) * np.sqrt(2.0 / layers[i-1])
            b = np.zeros((1, layers[i]))
            
            self.weights.append(w)
            self.biases.append(b)
    
    def sigmoid(self, z):
        """Sigmoid activation function"""
        return 1 / (1 + np.exp(-np.clip(z, -250, 250)))
    
    def sigmoid_derivative(self, z):
        """Derivative of sigmoid function"""
        s = self.sigmoid(z)
        return s * (1 - s)
    
    def relu(self, z):
        """ReLU activation function"""
        return np.maximum(0, z)
    
    def relu_derivative(self, z):
        """Derivative of ReLU function"""
        return (z > 0).astype(float)
    
    def softmax(self, z):
        """Softmax activation for output layer"""
        exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)
    
    def forward_propagation(self, X):
        """Forward pass through the network"""
        self.z_values = []
        self.a_values = [X]
        
        # Hidden layers with ReLU
        for i in range(len(self.weights) - 1):
            z = np.dot(self.a_values[-1], self.weights[i]) + self.biases[i]
            a = self.relu(z)
            
            self.z_values.append(z)
            self.a_values.append(a)
        
        # Output layer with softmax
        z_output = np.dot(self.a_values[-1], self.weights[-1]) + self.biases[-1]
        a_output = self.softmax(z_output)
        
        self.z_values.append(z_output)
        self.a_values.append(a_output)
        
        return a_output
    
    def compute_loss(self, y_true, y_pred):
        """Compute cross-entropy loss"""
        m = y_true.shape[0]
        loss = -np.sum(y_true * np.log(y_pred + 1e-8)) / m
        return loss
    
    def backward_propagation(self, X, y_true, y_pred):
        """Backward pass to compute gradients"""
        m = X.shape[0]
        
        # Initialize gradients
        dW = [np.zeros_like(w) for w in self.weights]
        db = [np.zeros_like(b) for b in self.biases]
        
        # Output layer gradient
        dz = y_pred - y_true
        dW[-1] = np.dot(self.a_values[-2].T, dz) / m
        db[-1] = np.sum(dz, axis=0, keepdims=True) / m
        
        # Hidden layers gradients
        for i in range(len(self.weights) - 2, -1, -1):
            dz = np.dot(dz, self.weights[i + 1].T) * self.relu_derivative(self.z_values[i])
            dW[i] = np.dot(self.a_values[i].T, dz) / m
            db[i] = np.sum(dz, axis=0, keepdims=True) / m
        
        return dW, db
    
    def update_parameters(self, dW, db, learning_rate):
        """Update weights and biases"""
        for i in range(len(self.weights)):
            self.weights[i] -= learning_rate * dW[i]
            self.biases[i] -= learning_rate * db[i]
    
    def train(self, X_train, y_train, X_val, y_val, epochs=100, learning_rate=0.001, batch_size=32):
        """Train the neural network"""
        train_losses = []
        val_losses = []
        train_accuracies = []
        val_accuracies = []
        
        for epoch in range(epochs):
            # Shuffle the training data each epoch so mini-batches differ between passes
            permutation = np.random.permutation(len(X_train))
            X_shuffled = X_train[permutation]
            y_shuffled = y_train[permutation]
            
            # Mini-batch training
            for i in range(0, len(X_train), batch_size):
                X_batch = X_shuffled[i:i+batch_size]
                y_batch = y_shuffled[i:i+batch_size]
                
                # Forward pass
                y_pred = self.forward_propagation(X_batch)
                
                # Backward pass
                dW, db = self.backward_propagation(X_batch, y_batch, y_pred)
                
                # Update parameters
                self.update_parameters(dW, db, learning_rate)
            
            # Calculate losses and accuracies
            train_pred = self.forward_propagation(X_train)
            val_pred = self.forward_propagation(X_val)
            
            train_loss = self.compute_loss(y_train, train_pred)
            val_loss = self.compute_loss(y_val, val_pred)
            
            train_acc = self.calculate_accuracy(y_train, train_pred)
            val_acc = self.calculate_accuracy(y_val, val_pred)
            
            train_losses.append(train_loss)
            val_losses.append(val_loss)
            train_accuracies.append(train_acc)
            val_accuracies.append(val_acc)
            
            if epoch % 10 == 0:
                print(f"Epoch {epoch}: Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}, "
                      f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
        
        return train_losses, val_losses, train_accuracies, val_accuracies
    
    def predict(self, X):
        """Make predictions"""
        probabilities = self.forward_propagation(X)
        return np.argmax(probabilities, axis=1)
    
    def calculate_accuracy(self, y_true, y_pred):
        """Calculate accuracy"""
        y_true_labels = np.argmax(y_true, axis=1)
        y_pred_labels = np.argmax(y_pred, axis=1)
        return np.mean(y_true_labels == y_pred_labels)

Step 5: Model Training

Train your neural network:

# Create and train the model
model = NeuralNetwork([784, 128, 64, 10])

print("Starting training...")
train_losses, val_losses, train_accs, val_accs = model.train(
    X_train, y_train, X_val, y_val,
    epochs=100, learning_rate=0.01, batch_size=64
)

print("Training completed!")

Step 6: Model Evaluation

Evaluate your model’s performance:

# Test set evaluation
test_predictions = model.predict(X_test)
test_accuracy = np.mean(test_predictions == np.argmax(y_test, axis=1))
print(f"Test Accuracy: {test_accuracy:.4f}")

# Plot training history
def plot_training_history(train_losses, val_losses, train_accs, val_accs):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Loss plot
    ax1.plot(train_losses, label='Training Loss')
    ax1.plot(val_losses, label='Validation Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.set_title('Training and Validation Loss')
    ax1.legend()
    ax1.grid(True)
    
    # Accuracy plot
    ax2.plot(train_accs, label='Training Accuracy')
    ax2.plot(val_accs, label='Validation Accuracy')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.set_title('Training and Validation Accuracy')
    ax2.legend()
    ax2.grid(True)
    
    plt.tight_layout()
    plt.show()

plot_training_history(train_losses, val_losses, train_accs, val_accs)

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report

y_test_labels = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_test_labels, test_predictions)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# Classification report
print("\nClassification Report:")
print(classification_report(y_test_labels, test_predictions))

Step 7: Visualize Predictions

Visualize some predictions to understand model behavior:

def visualize_predictions(X, y_true, predictions, num_samples=10):
    fig, axes = plt.subplots(2, 5, figsize=(15, 6))
    axes = axes.ravel()
    
    # Get some random samples
    indices = np.random.choice(len(X), num_samples, replace=False)
    
    for i, idx in enumerate(indices):
        img = X[idx].reshape(28, 28)
        true_label = y_true[idx]
        pred_label = predictions[idx]
        
        color = 'green' if true_label == pred_label else 'red'
        
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(f'True: {true_label}, Pred: {pred_label}', color=color)
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

visualize_predictions(X_test, y_test_labels, test_predictions)

Step 8: Error Analysis

Analyze your model's mistakes to see where it struggles and where improvements will pay off:

def analyze_errors(y_true, predictions):
    """Analyze prediction errors"""
    errors = y_true != predictions
    error_indices = np.where(errors)[0]
    
    print(f"Total errors: {len(error_indices)} out of {len(y_true)}")
    print(f"Error rate: {len(error_indices)/len(y_true)*100:.2f}%")
    
    # Most confused classes
    error_combinations = {}
    for idx in error_indices:
        true_pred_pair = (y_true[idx], predictions[idx])
        error_combinations[true_pred_pair] = error_combinations.get(true_pred_pair, 0) + 1
    
    print("\nMost common errors:")
    sorted_errors = sorted(error_combinations.items(), key=lambda x: x[1], reverse=True)
    for (true_label, pred_label), count in sorted_errors[:5]:
        print(f"True: {true_label}, Predicted: {pred_label}, Count: {count}")

analyze_errors(y_test_labels, test_predictions)

Project Summary

Congratulations! You’ve built a complete neural network project. Here’s what you accomplished:

✅ Key Achievements

  • Built a neural network from scratch using only NumPy
  • Implemented forward and backward propagation
  • Trained on the MNIST dataset with proper data preprocessing
  • Achieved strong accuracy on handwritten digit classification (a network of this size typically reaches roughly 97-98% on MNIST)
  • Evaluated model performance with multiple metrics
  • Analyzed errors to understand model limitations

🚀 Next Steps

  1. Experiment with architectures: Try different layer sizes and depths
  2. Add regularization: Implement dropout or L2 regularization (a minimal L2 sketch follows this list)
  3. Try different optimizers: Implement Adam or RMSprop
  4. Data augmentation: Rotate, scale, or shift images for better generalization
  5. Deploy your model: Save the model and create a simple web interface (a save-and-reload sketch follows this list)
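
For item 2, here is a minimal sketch of L2 regularization (weight decay) applied during the update step, written as a standalone helper so you can try it without modifying the class; the function name and the lambda_reg hyperparameter are illustrative choices, not part of the original project.

def update_parameters_with_l2(model, dW, db, learning_rate, lambda_reg=0.001):
    """Gradient step with an added L2 weight-decay term (illustrative sketch)"""
    for i in range(len(model.weights)):
        # lambda_reg * weights is the gradient of the penalty (lambda_reg / 2) * ||W||^2,
        # so large weights are pulled toward zero on every update
        model.weights[i] -= learning_rate * (dW[i] + lambda_reg * model.weights[i])
        model.biases[i] -= learning_rate * db[i]

Calling this in place of self.update_parameters(...) inside train penalizes large weights and typically narrows the gap between training and validation accuracy.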
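
For item 5, here is a minimal sketch of saving and restoring the trained parameters with NumPy's .npz archive format; the filename mnist_model.npz and the W0/b0 key names are arbitrary choices for this example.

# Save every weight matrix and bias vector into a single .npz archive
np.savez('mnist_model.npz',
         **{f'W{i}': w for i, w in enumerate(model.weights)},
         **{f'b{i}': b for i, b in enumerate(model.biases)})

# Restore the parameters into a freshly constructed network for inference
checkpoint = np.load('mnist_model.npz')
restored_model = NeuralNetwork([784, 128, 64, 10])
restored_model.weights = [checkpoint[f'W{i}'] for i in range(len(restored_model.weights))]
restored_model.biases = [checkpoint[f'b{i}'] for i in range(len(restored_model.biases))]

# Sanity check: the restored model should reproduce the original test accuracy
restored_predictions = restored_model.predict(X_test)
print(f"Restored model accuracy: {np.mean(restored_predictions == np.argmax(y_test, axis=1)):.4f}")

Once the weights can be reloaded like this, wrapping restored_model.predict in a small web endpoint is all a simple deployment needs.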

📚 Portfolio Addition

This project demonstrates:

  • Deep learning fundamentals
  • Mathematical implementation skills
  • Data preprocessing and evaluation
  • Performance analysis and visualization

Save your code, results, and analysis in a GitHub repository to showcase your deep learning skills to potential employers or for academic purposes.

Conclusion

Building a neural network from scratch gives you invaluable insights into how deep learning works under the hood. You now understand the mathematical foundations and can confidently move on to using high-level frameworks like PyTorch and TensorFlow with a solid understanding of what’s happening behind the scenes.
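
For comparison, here is how the same 784-128-64-10 architecture looks in PyTorch (a sketch that assumes torch is installed); note that nn.CrossEntropyLoss applies softmax internally, so the final layer outputs raw logits.

import torch
import torch.nn as nn

# The architecture you built by hand, expressed with high-level building blocks
torch_model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),  # raw logits; softmax is handled inside the loss function
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(torch_model.parameters(), lr=1e-3)

Everything the framework handles here (forward pass, gradients, parameter updates) is exactly what you implemented by hand above, just heavily optimized and GPU-ready.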

Keep experimenting and building more complex projects to deepen your deep learning expertise!

