📖 Lesson ⏱️ 90 minutes

Linear Regression Advanced

Polynomial regression, regularization, and advanced techniques

Introduction

Linear regression is one of the most fundamental algorithms in machine learning. It helps us understand the relationship between variables and predict continuous outcomes.

In this tutorial, you’ll learn how to implement linear regression using Python with pandas, scikit-learn, and matplotlib. By the end of this tutorial, you will be able to build, train, and evaluate your first machine learning model.

Prerequisites

  • Basic knowledge of Python
  • Installed libraries: pandas, scikit-learn, matplotlib

You can install these using:

pip install pandas scikit-learn matplotlib

Step 1: Import Libraries

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

Step 2: Load and Explore Data

For this tutorial, we’ll use a simple dataset with hours studied vs. scores achieved.

data = {
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95]
}
df = pd.DataFrame(data)
print(df.head())
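
Beyond `head()`, a few quick checks can confirm that the data looks the way you expect before you model it. The following is a minimal sketch using the same dataset; the variable name `correlation` is introduced here only for illustration.

```python
import pandas as pd

data = {
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95]
}
df = pd.DataFrame(data)

# Number of rows and columns
print(df.shape)

# Summary statistics (count, mean, std, min, quartiles, max) per column
print(df.describe())

# Pearson correlation between the two columns; a value close to 1
# suggests a strong positive linear relationship
correlation = df['Hours'].corr(df['Scores'])
print("Correlation:", correlation)
```

A correlation close to 1 is a hint that a straight line is a reasonable model for this data.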

Step 3: Visualize the Data

Visualizing the data helps you understand the relationship between hours studied and scores.

plt.scatter(df['Hours'], df['Scores'], color='blue')
plt.title('Hours vs Scores')
plt.xlabel('Hours Studied')
plt.ylabel('Score')
plt.show()

Step 4: Prepare Data for Training

Split your data into features and labels, and then into training and testing sets.

X = df[['Hours']]
y = df['Scores']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
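
To see what the split produced, you can print the shapes of each piece. This sketch repeats the setup so it runs on its own. Note the double brackets in `df[['Hours']]`: they keep `X` two-dimensional, which is the shape scikit-learn expects for features.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95]
})

X = df[['Hours']]  # double brackets: a DataFrame with shape (10, 1)
y = df['Scores']   # single brackets: a Series with shape (10,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("X shape:", X.shape, "y shape:", y.shape)
# test_size=0.2 holds out 20% of the 10 rows for testing
print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))
```

Setting `random_state` makes the shuffle reproducible, so you get the same split every time you run the code.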

Step 5: Train the Linear Regression Model

Now, initialize and train your linear regression model.

model = LinearRegression()
model.fit(X_train, y_train)
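
Once the model is trained, you can inspect the line it learned and use it for a new prediction. This is a minimal sketch that repeats the earlier steps so it runs on its own; the 5.5 hours value and the names `slope`, `intercept`, `new_hours`, and `predicted` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95]
})
X = df[['Hours']]
y = df['Scores']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# The fitted line is: Score = slope * Hours + intercept
slope = model.coef_[0]
intercept = model.intercept_
print(f"Score = {slope:.2f} * Hours + {intercept:.2f}")

# Predict for a hypothetical student who studied 5.5 hours.
# Using a DataFrame with the same column name as the training features
# keeps the input consistent with what the model was fitted on.
new_hours = pd.DataFrame({'Hours': [5.5]})
predicted = model.predict(new_hours)[0]
print(f"Predicted score for 5.5 hours: {predicted:.1f}")
```

The slope tells you roughly how many points the predicted score rises for each additional hour studied.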

Step 6: Evaluate the Model

Evaluate your model on the test set using its predictions, the mean squared error (MSE), and the R² score.

y_pred = model.predict(X_test)

print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))
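
MSE is expressed in squared units (points squared here), which can be hard to interpret. If you prefer an error in the same units as the scores, one option is the root mean squared error (RMSE), the square root of the MSE. The sketch below computes it with NumPy, an extra import not used elsewhere in this tutorial, and repeats the earlier steps so it runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95]
})
X = df[['Hours']]
y = df['Scores']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # same units as the target (score points)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)
print("R2 Score:", r2)
```

Keep in mind that the test set here contains only two rows, so these numbers are a rough indication rather than a reliable estimate.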

Step 7: Visualize Predictions

Visualize how well your model fits the data.

plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.title('Actual vs Predicted Scores')
plt.xlabel('Hours Studied')
plt.ylabel('Score')
plt.legend()
plt.show()

Conclusion

🎉 Congratulations! You have successfully:

✅ Loaded and visualized your dataset.
✅ Trained a linear regression model using scikit-learn.
✅ Evaluated and visualized your model’s performance.

What’s Next?

  • Try using a larger, real-world dataset.
  • Explore polynomial regression for non-linear relationships.
  • Read our Intermediate Tutorials to learn classification, hyperparameter tuning, and model deployment.
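
As a starting point for the polynomial regression suggestion above, here is a minimal sketch of one common approach in scikit-learn: expand the feature with `PolynomialFeatures`, then fit an ordinary `LinearRegression` on the expanded features. The data is synthetic and noise-free (y = x²), chosen only to illustrate a non-linear relationship; it is not part of this tutorial's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic toy data for illustration only: y = x^2
X = np.arange(1, 11).reshape(-1, 1)
y = (X ** 2).ravel()

# Straight-line fit
linear_model = LinearRegression().fit(X, y)

# Degree-2 polynomial fit: PolynomialFeatures adds an x^2 column,
# then LinearRegression fits a linear model on the expanded features
poly_model = make_pipeline(
    PolynomialFeatures(degree=2),
    LinearRegression()
).fit(X, y)

linear_r2 = linear_model.score(X, y)
poly_r2 = poly_model.score(X, y)

print("Linear R2:", linear_r2)
print("Polynomial R2:", poly_r2)
```

On real data, a higher degree can overfit, so compare models on a held-out test set rather than on the training data as done in this toy example.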

If you have questions or want to share your results, join our SuperML Community to learn and grow together!