📖 Lesson ⏱️ 45 minutes

Setting Up Your Development Environment

Configure Maven/Gradle and IDE for SuperML development

Setting Up Your Java ML Development Environment

A properly configured development environment is crucial for productive machine learning development with SuperML Java. This guide will walk you through setting up everything you need to start building ML applications.

Prerequisites

Before we begin, ensure you have:

  • Java 11 or higher installed on your system (the pom.xml in this guide compiles for Java 11)
  • Basic familiarity with Java development
  • An IDE (IntelliJ IDEA, Eclipse, or VS Code recommended)
  • Git for version control

Java Development Kit (JDK) Setup

Verify Java Installation

java -version
javac -version

You should see output similar to:

java version "11.0.12" 2021-07-20 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.12+8-LTS-237)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.12+8-LTS-237, mixed mode)

Recommended Java versions:

  • Java 11: Excellent balance of features and stability
  • Java 17: LTS release with improved performance
  • Java 21: Newer LTS release with more recent language features
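
The same check can be done from inside the JVM, which is handy when the IDE and the terminal pick up different JDKs. The sketch below is illustrative only: the class name JavaVersionCheck and the helper parseMajor are made up for this example, and it assumes the Java 11 baseline used by the pom.xml in this guide.

```java
// Sketch: report the running JVM's major version and compare it with Java 11.
final class JavaVersionCheck {

    /** Converts a java.specification.version value ("1.8", "11", "17") to a major version. */
    static int parseMajor(String specVersion) {
        String[] parts = specVersion.split("\\.");
        // Java 8 and earlier report "1.x"; Java 9 and later report the major number directly.
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        int major = parseMajor(System.getProperty("java.specification.version"));
        System.out.println("Running on Java " + major);
        if (major < 11) {
            System.err.println("This guide's pom.xml targets Java 11; please use a newer JDK.");
            System.exit(1);
        }
    }
}
```

Running it from the IDE and again from the command line shows whether both use the JDK you expect.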

Installing Java (if needed)

On macOS:

# Using Homebrew
brew install openjdk@11

# Or download from Oracle/OpenJDK website

On Ubuntu/Debian:

sudo apt update
sudo apt install openjdk-11-jdk

On Windows: Download from Oracle JDK or OpenJDK.

Maven Setup

SuperML Java is distributed through Maven Central, making Maven the recommended build tool.

Installing Maven

On macOS:

brew install maven

On Ubuntu/Debian:

sudo apt install maven

On Windows:

  1. Download Maven from https://maven.apache.org/download.cgi
  2. Extract to a directory (e.g., C:\Program Files\Apache\maven)
  3. Add Maven's bin directory to your PATH

Verify Maven Installation

mvn -version

Expected output:

Apache Maven 3.8.4 (9b656c72d54e5bacbed989b64718c159fe39b537)
Maven home: /usr/local/Cellar/maven/3.8.4/libexec
Java version: 11.0.12, vendor: Eclipse Adoptium

Project Structure

Creating a New Maven Project

mvn archetype:generate \
  -DgroupId=com.example.ml \
  -DartifactId=superml-demo \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DinteractiveMode=false

cd superml-demo

Project Directory Structure

superml-demo/
├── pom.xml
├── src/
│   ├── main/
│   │   ├── java/
│   │   │   └── com/example/ml/
│   │   │       └── App.java
│   │   └── resources/
│   │       └── data/
│   └── test/
│       └── java/
│           └── com/example/ml/
│               └── AppTest.java
├── target/ (generated during build)
└── data/ (for datasets)
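
The generated project may not contain every directory shown above (for example, the resources folder may need to be created by hand). A small sketch like the following can list what is missing. The class name ProjectLayoutCheck and the choice of paths to check are assumptions made for this example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Sketch: report which expected project paths are absent under a project root.
final class ProjectLayoutCheck {

    // Paths taken from the directory structure above, relative to the project root.
    private static final String[] EXPECTED = {
        "pom.xml", "src/main/java", "src/main/resources", "src/test/java"
    };

    static List<String> missingPaths(Path projectRoot) {
        List<String> missing = new ArrayList<>();
        for (String relative : EXPECTED) {
            if (!Files.exists(projectRoot.resolve(relative))) {
                missing.add(relative);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Uses the first argument as the project root, or the current directory.
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        List<String> missing = missingPaths(root);
        System.out.println(missing.isEmpty() ? "Layout looks complete" : "Missing: " + missing);
    }
}
```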

Maven Configuration

Basic pom.xml Setup

Create or update your pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
         http://maven.apache.org/xsd/maven-4.0.0.xsd">
    
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example.ml</groupId>
    <artifactId>superml-demo</artifactId>
    <version>2.1.0</version>
    <packaging>jar</packaging>
    
    <name>SuperML Demo Project</name>
    <description>Machine Learning with SuperML Java</description>
    
    <properties>
        <maven.compiler.source>11</maven.compiler.source>
        <maven.compiler.target>11</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <superml.version>2.1.0</superml.version>
        <junit.version>5.8.2</junit.version>
    </properties>
    
    <dependencies>
        <!-- SuperML Java Framework -->
        <dependency>
            <groupId>org.superml</groupId>
            <artifactId>superml-bundle-all</artifactId>
            <version>${superml.version}</version>
        </dependency>
        
        <!-- Logging -->
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.2.11</version>
        </dependency>
        
        <!-- Testing -->
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter</artifactId>
            <version>${junit.version}</version>
            <scope>test</scope>
        </dependency>
        
        <!-- CSV Processing (optional) -->
        <dependency>
            <groupId>com.opencsv</groupId>
            <artifactId>opencsv</artifactId>
            <version>5.6</version>
        </dependency>
    </dependencies>
    
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.10.1</version>
                <configuration>
                    <source>11</source>
                    <target>11</target>
                </configuration>
            </plugin>
            
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.0.0-M7</version>
            </plugin>
            
            <!-- Executable JAR -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.3.0</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.example.ml.App</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
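
The pom.xml above keeps version numbers in the properties block and refers to them with ${...} placeholders. To see how those placeholders map to concrete coordinates, here is a rough sketch that reads a POM with the JDK's built-in DOM parser and prints each dependency as groupId:artifactId:version. The class name PomDependencyLister is hypothetical, and the resolution logic is deliberately simplified: it only handles a version that consists of a single placeholder and is not a substitute for Maven's own model resolution.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: list dependency coordinates from a POM, resolving simple ${property} versions.
final class PomDependencyLister {

    static List<String> listDependencies(String pomXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pomXml)));

            // Collect the children of the first <properties> element.
            Map<String, String> props = new HashMap<>();
            NodeList propBlocks = doc.getElementsByTagName("properties");
            if (propBlocks.getLength() > 0) {
                NodeList children = propBlocks.item(0).getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    Node n = children.item(i);
                    if (n.getNodeType() == Node.ELEMENT_NODE) {
                        props.put(n.getNodeName(), n.getTextContent().trim());
                    }
                }
            }

            // Note: this matches every <dependency> element in the file, wherever it appears.
            List<String> coords = new ArrayList<>();
            NodeList deps = doc.getElementsByTagName("dependency");
            for (int i = 0; i < deps.getLength(); i++) {
                Element dep = (Element) deps.item(i);
                String version = text(dep, "version");
                if (version.startsWith("${") && version.endsWith("}")) {
                    String key = version.substring(2, version.length() - 1);
                    version = props.getOrDefault(key, version);
                }
                coords.add(text(dep, "groupId") + ":" + text(dep, "artifactId") + ":" + version);
            }
            return coords;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse POM", e);
        }
    }

    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        listDependencies(Files.readString(Path.of("pom.xml"))).forEach(System.out::println);
    }
}
```

For an authoritative view, including transitive dependencies, `mvn dependency:tree` remains the better tool; the sketch only illustrates how the property indirection works.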

Install Dependencies

mvn clean install

IDE Configuration

IntelliJ IDEA

  1. Open Project: File → Open → Select your project directory
  2. Import Maven Project: IntelliJ should automatically detect and import
  3. Set Project SDK: File → Project Structure → Project → Project SDK (Java 11+)

Recommended Plugins:

  • Maven Helper
  • Rainbow Brackets
  • CodeGlance
  • Key Promoter X

Code Style Settings:

  • File → Settings → Editor → Code Style → Java
  • Set indent: 4 spaces
  • Enable auto-format on save

Eclipse IDE

  1. Import Project: File → Import → Existing Maven Projects
  2. Set Java Build Path: Right-click project → Properties → Java Build Path
  3. Refresh Maven Configuration: Right-click project → Maven → Update Project

Recommended Plugins:

  • M2E (Maven Integration)
  • EGit (Git Integration)
  • PMD
  • SpotBugs

VS Code

  1. Install Extensions:

    • Extension Pack for Java
    • Maven for Java
    • Test Runner for Java
  2. Open Project: File → Open Folder → Select project directory

VS Code Settings (settings.json):

{
    "java.jdt.ls.java.home": "/path/to/your/java",
    "maven.executable.path": "/path/to/maven/bin/mvn",
    "java.format.settings.url": "https://raw.githubusercontent.com/google/styleguide/gh-pages/eclipse-java-google-style.xml"
}

Development Tools Setup

Git Configuration

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

# Initialize repository
git init
git add .
git commit -m "Initial commit"

.gitignore File

# Maven
target/
pom.xml.tag
pom.xml.releaseBackup
pom.xml.versionsBackup
pom.xml.next
release.properties
dependency-reduced-pom.xml
buildNumber.properties
.mvn/timing.properties

# IDEs
.idea/
*.iml
.project
.classpath
.settings/
.metadata/
.vscode/

# OS
.DS_Store
Thumbs.db

# Logs
*.log

# Data files (optional - exclude large datasets)
data/large_datasets/
*.csv
*.parquet

Logging Configuration

Create src/main/resources/logback.xml:

<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    
    <logger name="org.superml" level="INFO"/>
    
    <root level="INFO">
        <appender-ref ref="STDOUT"/>
    </root>
</configuration>

Verification Setup

Create a Test Application

src/main/java/com/example/ml/App.java:

package com.example.ml;

import org.superml.linear_model.LinearRegression;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class App {
    private static final Logger logger = LoggerFactory.getLogger(App.class);
    
    public static void main(String[] args) {
        logger.info("Starting SuperML Java Demo");
        
        try {
            // Create sample data
            double[][] X = {{1, 2}, {2, 3}, {3, 4}, {4, 5}};
            double[] y = {3, 5, 7, 9}; // y = x1 + x2
            
            // Create and train model
            LinearRegression model = new LinearRegression();
            model.fit(X, y);
            
            // Make prediction
            double prediction = model.predict(new double[]{5, 6});
            logger.info("Prediction for [5, 6]: {}", prediction);
            
            // Should predict approximately 11
            assert Math.abs(prediction - 11.0) < 0.1 : "Prediction should be close to 11";
            
            logger.info("Setup verification successful!");
            
        } catch (Exception e) {
            logger.error("Setup verification failed", e);
            System.exit(1);
        }
    }
}
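
Note that the Java assert statement used in App.java is only evaluated when the JVM is started with the -ea flag, so by default the check is silently skipped. One way to make the verification independent of JVM flags is an explicit comparison that throws. The helper below is a sketch with made-up names (SetupChecks, requireClose) and is not part of SuperML.

```java
// Sketch: a tolerance check that runs regardless of whether assertions are enabled.
final class SetupChecks {

    /** Throws if actual differs from expected by more than tolerance (or is NaN). */
    static void requireClose(double actual, double expected, double tolerance) {
        if (Double.isNaN(actual) || Math.abs(actual - expected) > tolerance) {
            throw new IllegalStateException(
                "Expected " + expected + " +/- " + tolerance + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Stand-in value; in App.java this would be the model's prediction.
        double prediction = 11.02;
        requireClose(prediction, 11.0, 0.1);
        System.out.println("Check passed");
    }
}
```

In App.java, a call such as requireClose(prediction, 11.0, 0.1) could replace the assert line, and the existing catch block would then log the failure and exit with a non-zero status.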

Create a Unit Test

src/test/java/com/example/ml/AppTest.java:

package com.example.ml;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.DisplayName;
import org.superml.linear_model.LinearRegression;
import static org.junit.jupiter.api.Assertions.*;

class AppTest {
    
    @Test
    @DisplayName("SuperML Linear Regression Basic Test")
    void testLinearRegression() {
        // Arrange
        double[][] X = {{1, 1}, {2, 2}, {3, 3}};
        double[] y = {2, 4, 6}; // y = x1 + x2
        
        // Act
        LinearRegression model = new LinearRegression();
        model.fit(X, y);
        double prediction = model.predict(new double[]{4, 4});
        
        // Assert
        assertEquals(8.0, prediction, 0.1, "Prediction should be close to 8.0");
    }
}

Run Verification

# Compile and run tests
mvn clean test

# Run the application
mvn exec:java -Dexec.mainClass="com.example.ml.App"

# Or build and run JAR
mvn clean package
java -jar target/superml-demo-2.1.0.jar

Performance Optimization

JVM Options for ML Development

Add to your IDE run configurations or command line:

java -Xmx4g \
     -Xms1g \
     -XX:+UseG1GC \
     -XX:+UseStringDeduplication \
     -jar your-application.jar
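
To confirm that the heap settings actually reached the JVM (IDE run configurations are easy to misconfigure), one option is to print what the runtime reports. A minimal sketch, with the class name HeapReport chosen only for illustration:

```java
// Sketch: print the heap limits the running JVM reports, in MiB.
final class HeapReport {

    static long toMebibytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx limit (or the JVM default if none was given).
        System.out.println("Max heap (MiB):   " + toMebibytes(rt.maxMemory()));
        System.out.println("Total heap (MiB): " + toMebibytes(rt.totalMemory()));
        System.out.println("Free heap (MiB):  " + toMebibytes(rt.freeMemory()));
    }
}
```

The reported maximum can differ slightly from the -Xmx value depending on the garbage collector, so treat it as a sanity check and not an exact match.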

Maven Memory Settings

Create .mvn/jvm.config:

-Xmx2g
-Xms512m

Data Directory Setup

Organize Your Data

mkdir -p data/{raw,processed,models,results}

Directory structure:

data/
├── raw/           # Original datasets
├── processed/     # Cleaned/preprocessed data
├── models/        # Trained model files
└── results/       # Prediction results
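
The mkdir command above relies on shell brace expansion, which is not available in every shell (for example, the Windows command prompt). As a portable alternative, the same layout can be created from Java. This is a sketch; the class name DataDirs is an assumption for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: create the data/ subdirectories used in this guide.
final class DataDirs {

    private static final String[] SUBDIRS = {"raw", "processed", "models", "results"};

    static void create(Path dataRoot) {
        for (String name : SUBDIRS) {
            try {
                // createDirectories also creates missing parents and is a no-op if the directory exists.
                Files.createDirectories(dataRoot.resolve(name));
            } catch (IOException e) {
                throw new UncheckedIOException("Could not create " + name, e);
            }
        }
    }

    public static void main(String[] args) {
        create(Path.of("data"));
        System.out.println("Data directories are ready under ./data");
    }
}
```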

Sample Data Download Script

scripts/download-sample-data.sh:

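#!/bin/bash
# Stop on the first failed command so a broken download is not reported as success
set -euo pipefail

mkdir -p data/raw

# Download Iris dataset (-f fails on HTTP errors, -L follows redirects)
curl -fL -o data/raw/iris.csv https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv

# Download Boston Housing (if available)
# curl -o data/raw/housing.csv https://example.com/housing.csv

echo "Sample datasets downloaded successfully!"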
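
After the script runs, it is worth confirming that the file is a well-formed CSV and not, say, an HTML error page. The sketch below counts data rows and checks that every row has as many columns as the header. It uses only the JDK with a plain comma split, so it assumes a simple CSV without quoted fields; the class name CsvSanityCheck is made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: basic structural check for a simple comma-separated file with a header row.
final class CsvSanityCheck {

    /** Returns the number of data rows; throws if a row's column count differs from the header. */
    static int countRows(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String header = reader.readLine();
            if (header == null) {
                throw new IllegalArgumentException("File is empty");
            }
            int columns = header.split(",", -1).length;
            int rows = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) {
                    continue; // tolerate blank lines, e.g. a trailing newline
                }
                if (line.split(",", -1).length != columns) {
                    throw new IllegalArgumentException(
                        "Data row " + (rows + 1) + " does not have " + columns + " columns");
                }
                rows++;
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path csv = Path.of("data/raw/iris.csv");
        System.out.println(csv + " has " + countRows(Files.newBufferedReader(csv)) + " data rows");
    }
}
```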

Common Issues and Solutions

Issue 1: Java Version Mismatch

Problem: java.lang.UnsupportedClassVersionError
Solution: Ensure your runtime Java version matches or exceeds the compilation version.
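
When tracking down this error, it helps to know which Java release a given .class file was compiled for. Every class file starts with the magic number 0xCAFEBABE followed by a minor and a major version; major version 52 corresponds to Java 8, 55 to Java 11, 61 to Java 17, and 65 to Java 21. The sketch below reads that header. The class name ClassFileVersion is illustrative only.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: read the class-file header to find the Java release it targets.
final class ClassFileVersion {

    /** Returns the Java release a class file targets, e.g. 11 for major version 55. */
    static int targetRelease(InputStream classBytes) {
        try (DataInputStream in = new DataInputStream(classBytes)) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("Not a class file");
            }
            in.readUnsignedShort();              // minor version, not needed here
            int major = in.readUnsignedShort();  // e.g. 52 = Java 8, 55 = Java 11
            return major - 44;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassFileVersion.java target/classes/com/example/ml/App.class
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            System.out.println("Compiled for Java " + targetRelease(in));
        }
    }
}
```

If the reported release is higher than the version printed by java -version, the runtime is too old for that class.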

Issue 2: Maven Dependencies Not Found

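Problem: SuperML Java not found in repository
Solution: Confirm that the groupId, artifactId, and version in pom.xml match the published artifact, check your internet connectivity (including any proxy or mirror configured in ~/.m2/settings.xml), and rerun with mvn -U clean install to force Maven to re-check remote repositories.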

Issue 3: IDE Not Recognizing SuperML Classes

Problem: Import errors for SuperML classes
Solution:

  1. Refresh Maven project
  2. Rebuild project
  3. Check Maven dependencies

Issue 4: Memory Issues with Large Datasets

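Problem: OutOfMemoryError
Solution: Increase the JVM heap size (for example with -Xmx, as in the JVM options above) and process data in a streaming fashion instead of loading entire datasets into memory.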
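
As an illustration of the streaming idea, the sketch below computes the mean of one numeric CSV column while holding only a single line in memory, using an incremental mean update. It relies on the JDK alone and assumes a simple comma-separated file with a header row and no quoted fields; the class name StreamingColumnMean is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: one-pass column mean that never loads the whole file into memory.
final class StreamingColumnMean {

    static double mean(Reader source, int columnIndex) {
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine();                  // skip the header row
            double mean = 0.0;
            long count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) {
                    continue;
                }
                double value = Double.parseDouble(line.split(",")[columnIndex].trim());
                count++;
                mean += (value - mean) / count; // incremental update, no list of values kept
            }
            if (count == 0) {
                throw new IllegalArgumentException("No data rows");
            }
            return mean;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Column 0 of the Iris file downloaded earlier is sepal_length.
        Path csv = Path.of("data/raw/iris.csv");
        System.out.println("Mean of column 0: " + mean(Files.newBufferedReader(csv), 0));
    }
}
```

The same pattern (read a row, update a running statistic, discard the row) extends to sums, minima, maxima, and variance.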

Next Steps

Now that your environment is set up:

  1. Explore the API: Browse SuperML Java documentation at http://superml-java.superml.org/
  2. Try Examples: Check out examples at https://github.com/supermlorg/superml-java/tree/master/superml-examples
  3. Load Real Data: Learn about data loading and preprocessing
  4. Build Your First Model: Start with linear regression or classification

Summary

In this setup guide, we covered:

  • Java and Maven installation and configuration
  • Project structure and Maven dependencies
  • IDE configuration for optimal development
  • Development tools and logging setup
  • Verification of your installation
  • Performance optimization tips
  • Common troubleshooting

Your development environment is now ready for building machine learning applications with SuperML Java. The next tutorial will cover data loading and preprocessing techniques.