Build Your First Machine Learning Model: Linear Regression with Python

Learn how to build, train, and evaluate your first linear regression model using Python and scikit-learn in this beginner-friendly guide.

🔰 beginner
⏱️ 45 minutes
👤 SuperML Team

📋 Prerequisites

  • Basic knowledge of Python
  • Familiarity with Python data structures
  • Understanding of variables and functions

🎯 What You'll Learn

  • Understand what linear regression is and when to use it
  • Load and explore data using pandas
  • Train a linear regression model using scikit-learn
  • Evaluate model performance with metrics
  • Visualize predictions and model performance

Introduction

Linear regression is one of the most fundamental algorithms in machine learning. It models the relationship between one or more input variables and a continuous target as a straight line, which lets us predict numeric outcomes such as a test score.
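
For the single-feature case used in this tutorial, the fitted model can be written as follows. The symbols are notation chosen here for illustration, not taken from the tutorial’s code:

```latex
\hat{y} = \beta_0 + \beta_1 x,
\qquad
(\beta_0, \beta_1) = \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr)^2
```

Here x is the input (hours studied), ŷ is the predicted score, β₀ is the intercept, and β₁ is the slope. Ordinary least squares, which scikit-learn’s LinearRegression uses, picks the intercept and slope that minimize the sum of squared differences between the actual and predicted values.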

In this tutorial, you’ll implement linear regression in Python with pandas, scikit-learn, and matplotlib. By the end, you will be able to build, train, and evaluate your first machine learning model.

Prerequisites

  • Basic knowledge of Python
  • Installed libraries: pandas, scikit-learn, matplotlib

You can install these using:

pip install pandas scikit-learn matplotlib

Step 1: Import Libraries

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

Step 2: Load and Explore Data

For this tutorial, we’ll use a simple dataset with hours studied vs. scores achieved.

data = {
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95]
}
df = pd.DataFrame(data)
print(df.head())
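
Beyond head(), a few more pandas calls help with exploring the data. The snippet below is a minimal sketch that rebuilds the same dataset so it runs on its own; the variable names (n_rows, summary, missing, correlation) are chosen here for illustration.

```python
import pandas as pd

# Same toy dataset as above, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})

n_rows, n_cols = df.shape                      # number of rows and columns
summary = df.describe()                        # count, mean, std, min, quartiles, max
missing = df.isnull().sum()                    # missing values per column
correlation = df['Hours'].corr(df['Scores'])   # Pearson correlation coefficient

print(f"Rows: {n_rows}, Columns: {n_cols}")
print(summary)
print(missing)
print(f"Correlation between Hours and Scores: {correlation:.3f}")
```

A correlation close to 1 suggests a strong positive linear relationship, which is the situation where a linear model is a reasonable first choice.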

Step 3: Visualize the Data

A scatter plot shows at a glance whether hours studied and scores follow a roughly linear trend.

plt.scatter(df['Hours'], df['Scores'], color='blue')
plt.title('Hours vs Scores')
plt.xlabel('Hours Studied')
plt.ylabel('Score')
plt.show()

Step 4: Prepare Data for Training

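Separate the feature (Hours) from the label (Scores), then hold out 20% of the rows as a test set. X is selected with double brackets so it stays a two-dimensional DataFrame, which is the shape scikit-learn expects for features.

# Features must be 2-D (rows x columns); the label is a 1-D Series
X = df[['Hours']]
y = df['Scores']

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)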
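
To confirm what the split produced, it can help to print the shapes of the resulting pieces. This is a small self-contained sketch that repeats the dataset and the split from this step:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Rebuild the tutorial's dataset so this snippet runs on its own.
df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})

X = df[['Hours']]   # 2-D feature table
y = df['Scores']    # 1-D label column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# With 10 rows and test_size=0.2, 8 rows go to training and 2 to testing.
print("Training features:", X_train.shape)
print("Test features:", X_test.shape)
print("Training labels:", y_train.shape)
print("Test labels:", y_test.shape)
```

With only two test rows, the evaluation metrics later in the tutorial will be rough estimates; a larger dataset gives more trustworthy numbers.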

Step 5: Train the Linear Regression Model

Now, initialize and train your linear regression model.

model = LinearRegression()
model.fit(X_train, y_train)
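
Once the model is fitted, its learned slope and intercept are available as attributes, and it can predict a score for an unseen number of hours. The sketch below repeats the earlier steps so it runs on its own; the 9.25-hour input is a hypothetical value chosen for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Rebuild the dataset and the split from the previous steps.
df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})
X = df[['Hours']]
y = df['Scores']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

slope = model.coef_[0]         # change in score per additional hour studied
intercept = model.intercept_   # predicted score at zero hours

# Hypothetical new input: a student who studied 9.25 hours.
new_hours = pd.DataFrame({'Hours': [9.25]})
predicted = model.predict(new_hours)[0]

print(f"Slope: {slope:.2f}")
print(f"Intercept: {intercept:.2f}")
print(f"Predicted score for 9.25 hours: {predicted:.2f}")
```

The prediction is simply intercept + slope × hours, so reading the two attributes is a quick way to sanity-check what the model has learned.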

Step 6: Evaluate the Model

Generate predictions on the test set, then measure them with mean squared error (MSE) and the R² score.

y_pred = model.predict(X_test)

print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))
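
MSE is expressed in squared units, so it is often easier to read its square root, the root mean squared error (RMSE), which is in the same units as the scores. The sketch below derives RMSE from MSE with NumPy and also recomputes R² by hand to show what the metric measures. It repeats the earlier steps so it runs on its own; ss_res, ss_tot, and r2_manual are names chosen here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Rebuild the dataset, split, and model from the previous steps.
df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})
X = df[['Hours']]
y = df['Scores']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # same units as Scores

# R² = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MSE: {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"R2 (manual): {r2_manual:.3f}")
print(f"R2 (scikit-learn): {r2_score(y_test, y_pred):.3f}")
```

An R² close to 1 means the model explains most of the variation in the test scores, while a lower RMSE means predictions land closer to the actual values.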

Step 7: Visualize Predictions

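Plot the held-out test points against the model’s predictions to see how well the fitted line matches them. Because test_size=0.2 leaves only two test rows from this ten-row dataset, the plot will be sparse.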

plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.title('Actual vs Predicted Scores')
plt.xlabel('Hours Studied')
plt.ylabel('Score')
plt.legend()
plt.show()

Conclusion

🎉 Congratulations! You have successfully:

✅ Loaded and visualized your dataset.
✅ Trained a linear regression model using scikit-learn.
✅ Evaluated and visualized your model’s performance.

What’s Next?

  • Try using a larger, real-world dataset.
  • Explore polynomial regression for non-linear relationships.
  • Read our Intermediate Tutorials to learn classification, hyperparameter tuning, and model deployment.

If you have questions or want to share your results, join our SuperML Community to learn and grow together!
