📋 Prerequisites
- Basic knowledge of Python
- Familiarity with Python data structures
- Understanding of variables and functions
🎯 What You'll Learn
- Understand what linear regression is and when to use it
- Load and explore data using pandas
- Train a linear regression model using scikit-learn
- Evaluate model performance with metrics
- Visualize predictions and model performance
Introduction
Linear regression is one of the most fundamental algorithms in machine learning. It helps us understand the relationship between variables and predict continuous outcomes.
In this tutorial, you’ll learn how to implement linear regression in Python with pandas, scikit-learn, and matplotlib. By the end, you will be able to build, train, and evaluate your first machine learning model.
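For readers who want the underlying formula, here is a short sketch of the model for a single feature. The symbols are notation assumed for this illustration and do not appear in the tutorial itself: \hat{y} is the predicted score, x is hours studied, \beta_0 is the intercept, and \beta_1 is the slope. scikit-learn's LinearRegression fits these by ordinary least squares, that is, by minimizing the sum of squared differences between actual and predicted values.

```latex
\hat{y} = \beta_0 + \beta_1 x,
\qquad
(\hat{\beta}_0, \hat{\beta}_1)
  = \arg\min_{\beta_0,\, \beta_1}
    \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2
```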
Prerequisites
- Basic knowledge of Python
- Installed libraries: pandas, scikit-learn, matplotlib
You can install these using:
pip install pandas scikit-learn matplotlib
Step 1: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
Step 2: Load and Explore Data
For this tutorial, we’ll use a simple dataset with hours studied vs. scores achieved.
data = {
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95]
}
df = pd.DataFrame(data)
print(df.head())
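Beyond head(), a few more pandas calls can help with the "explore" part of this step. The sketch below is one possible set of checks, not a required routine; the variable names summary and corr are chosen here for illustration.

```python
import pandas as pd

# Same toy dataset as in the tutorial
data = {
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
}
df = pd.DataFrame(data)

# Number of rows and columns, column types, and missing-value counts
print(df.shape)
print(df.dtypes)
print(df.isna().sum())

# Summary statistics: count, mean, std, min, quartiles, max
summary = df.describe()
print(summary)

# Pearson correlation between hours and scores; a value close to 1
# suggests a strong positive linear relationship
corr = df['Hours'].corr(df['Scores'])
print(f"Correlation: {corr:.3f}")
```

A correlation close to 1 is a good sign that a straight line is a reasonable model for this data.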
Step 3: Visualize the Data
Plotting the data helps you see the relationship between hours studied and scores before fitting a model.
plt.scatter(df['Hours'], df['Scores'], color='blue')
plt.title('Hours vs Scores')
plt.xlabel('Hours Studied')
plt.ylabel('Score')
plt.show()
Step 4: Prepare Data for Training
Split your data into features and labels, and then into training and testing sets.
X = df[['Hours']]
y = df['Scores']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
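It can be worth confirming what the split actually produced. The sketch below rebuilds the same data and prints the shapes of each piece; with 10 rows and test_size=0.2, 8 rows go to training and 2 to testing. Note that X uses double brackets so it stays a two-dimensional DataFrame, which is the shape scikit-learn estimators expect for features.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})

X = df[['Hours']]  # 2-D feature table (DataFrame with one column)
y = df['Scores']   # 1-D target (Series)

# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Train features:", X_train.shape)
print("Test features: ", X_test.shape)
print("Train labels:  ", y_train.shape)
print("Test labels:   ", y_test.shape)
```

With only two test rows, any metric computed on the test set will be noisy, which is one reason to try a larger dataset later.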
Step 5: Train the Linear Regression Model
Now, initialize and train your linear regression model.
model = LinearRegression()
model.fit(X_train, y_train)
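Once the model is fitted, you can inspect what it learned. The sketch below reads the slope and intercept from the coef_ and intercept_ attributes and predicts a score for a new student. The value 6.5 hours and the names slope, intercept, new_hours, and predicted are made up for this illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})
X = df[['Hours']]
y = df['Scores']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# coef_ holds one slope per feature; intercept_ is the predicted
# score at zero hours
slope = model.coef_[0]
intercept = model.intercept_
print(f"Score = {intercept:.2f} + {slope:.2f} * Hours")

# Predict for a hypothetical student who studied 6.5 hours.
# Using a DataFrame with the same column name as the training data
# avoids a feature-name warning.
new_hours = pd.DataFrame({'Hours': [6.5]})
predicted = model.predict(new_hours)[0]
print(f"Predicted score for 6.5 hours: {predicted:.1f}")
```

The slope can be read as the expected change in score for each additional hour studied.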
Step 6: Evaluate the Model
Generate predictions on the test set, then evaluate them with mean squared error (MSE) and the R² score.
y_pred = model.predict(X_test)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))
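MSE is expressed in squared score units, so it is often easier to read its square root, the RMSE, which is in the same units as the scores. The sketch below computes RMSE with NumPy (an extra import assumed here; it is installed alongside pandas) and also recomputes R² by hand to show what r2_score measures. The names mse, rmse, ss_res, ss_tot, and r2_manual are chosen for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})
X = df[['Hours']]
y = df['Scores']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# RMSE: square root of MSE, in the same units as the target
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print("Root Mean Squared Error:", rmse)

# R² by hand: 1 - (residual sum of squares / total sum of squares)
ss_res = ((y_test - y_pred) ** 2).sum()
ss_tot = ((y_test - y_test.mean()) ** 2).sum()
r2_manual = 1 - ss_res / ss_tot
print("R2 (manual):", r2_manual)
print("R2 (sklearn):", r2_score(y_test, y_pred))
```

An R² near 1 means the model explains most of the variation in the test scores; a value near 0 means it does little better than always predicting the mean.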
Step 7: Visualize Predictions
Visualize how well your model fits the data.
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.title('Actual vs Predicted Scores')
plt.xlabel('Hours Studied')
plt.ylabel('Score')
plt.legend()
plt.show()
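Another common way to judge the fit is a residual plot, which shows the gap between each actual score and its prediction. This is an optional extra, sketched here under the same setup as the earlier steps; the residuals variable and the plot styling are choices made for this illustration. Points scattered around zero with no clear pattern suggest a straight line is a reasonable fit.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})
X = df[['Hours']]
y = df['Scores']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Residual = actual score minus predicted score, for every row
residuals = y - model.predict(X)

plt.scatter(df['Hours'], residuals, color='purple')
plt.axhline(0, color='gray', linestyle='--')  # zero-error reference line
plt.title('Residuals vs Hours')
plt.xlabel('Hours Studied')
plt.ylabel('Residual (Actual - Predicted)')
plt.show()
```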
Conclusion
🎉 Congratulations! You have successfully:
✅ Loaded and visualized your dataset.
✅ Trained a linear regression model using scikit-learn.
✅ Evaluated and visualized your model’s performance.
What’s Next?
- Try using a larger, real-world dataset.
- Explore polynomial regression for non-linear relationships.
- Read our Intermediate Tutorials to learn classification, hyperparameter tuning, and model deployment.
If you have questions or want to share your results, join our SuperML Community to learn and grow together!