Introduction
Linear regression is one of the most fundamental algorithms in machine learning. It helps us understand the relationship between variables and predict continuous outcomes.
In this tutorial, you'll learn how to implement linear regression using Python with pandas, scikit-learn, and matplotlib. By the end of this tutorial, you will be able to build, train, and evaluate your first machine learning model.
Prerequisites
- Basic knowledge of Python
- Installed libraries: pandas, scikit-learn, matplotlib
You can install these using:
pip install pandas scikit-learn matplotlib
Step 1: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
Step 2: Load and Explore Data
For this tutorial, we'll use a simple dataset with hours studied vs. scores achieved.
data = {
'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95]
}
df = pd.DataFrame(data)
print(df.head())
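Beyond head(), a quick numeric summary can confirm that the data looks as expected. The snippet below is a minimal sketch that rebuilds the same DataFrame so it runs on its own; the correlation check is an extra exploration step, not something the later steps depend on.

```python
import pandas as pd

# Same toy dataset as above, repeated so this snippet is self-contained.
df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})

print(df.shape)       # (number of rows, number of columns)
print(df.describe())  # count, mean, std, min, quartiles, and max per column

# Pearson correlation between the two columns; a value near 1 suggests
# a strong positive linear relationship.
corr = df['Hours'].corr(df['Scores'])
print("Correlation:", round(corr, 3))
```

A correlation close to 1 here is a hint that a straight line is a reasonable model for this data.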
Step 3: Visualize the Data
Plotting the data helps you see the relationship between hours studied and scores before fitting a model.
plt.scatter(df['Hours'], df['Scores'], color='blue')
plt.title('Hours vs Scores')
plt.xlabel('Hours Studied')
plt.ylabel('Score')
plt.show()
Step 4: Prepare Data for Training
Split your data into features and labels, and then into training and testing sets.
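X = df[['Hours']]  # features: double brackets keep X two-dimensional, as scikit-learn expects
y = df['Scores']   # labels (target values)
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)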
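To see what the split actually produced, it can help to print the sizes of each part. This sketch repeats the setup so it runs standalone; with 10 rows and test_size=0.2, 8 rows go to training and 2 to testing.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})
X = df[['Hours']]
y = df['Scores']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Row counts in each part of the split
print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))

# The held-out rows; the model will not see these during training
print(X_test.join(y_test))
```

With only two test rows, the evaluation metrics in the later steps are rough indicators at best, which is one reason to try a larger dataset afterwards.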
Step 5: Train the Linear Regression Model
Now, initialize and train your linear regression model.
model = LinearRegression()
model.fit(X_train, y_train)
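After fitting, the learned line can be read directly from the model. The sketch below repeats the earlier setup and then inspects coef_ and intercept_, which are standard attributes of a fitted LinearRegression. The value of 6.5 hours is a made-up input chosen only for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})
X = df[['Hours']]
y = df['Scores']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# The fitted line is: Score = slope * Hours + intercept
slope = model.coef_[0]
intercept = model.intercept_
print(f"Score = {slope:.2f} * Hours + {intercept:.2f}")

# Predict for a hypothetical student who studied 6.5 hours.
# A DataFrame with the same column name matches the training input.
new_hours = pd.DataFrame({'Hours': [6.5]})
predicted = model.predict(new_hours)[0]
print(f"Predicted score for 6.5 hours: {predicted:.1f}")
```

The slope can be read as the expected change in score for each additional hour studied.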
Step 6: Evaluate the Model
Evaluate your model on the test set using mean squared error (MSE) and the R² score.
y_pred = model.predict(X_test)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R2 Score:", r2_score(y_test, y_pred))
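MSE is expressed in squared score units, so its square root (RMSE) is often easier to interpret because it is in the same units as the scores. The following sketch repeats the setup, derives RMSE with NumPy, and recomputes both metrics by hand to show what they measure. The manual formulas are included for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})
X = df[['Hours']]
y = df['Scores']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # same units as the scores
r2 = r2_score(y_test, y_pred)

# Manual versions of the same metrics
residuals = y_test.to_numpy() - y_pred
mse_manual = np.mean(residuals ** 2)                      # mean of squared errors
ss_res = np.sum(residuals ** 2)                           # unexplained variation
ss_tot = np.sum((y_test.to_numpy() - y_test.mean()) ** 2)  # total variation
r2_manual = 1 - ss_res / ss_tot

print("RMSE:", rmse)
print("MSE (library vs manual):", mse, mse_manual)
print("R2  (library vs manual):", r2, r2_manual)
```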
Step 7: Visualize Predictions
Visualize how well your model fits the data.
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.title('Actual vs Predicted Scores')
plt.xlabel('Hours Studied')
plt.ylabel('Score')
plt.legend()
plt.show()
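Alongside the plot, a small table of actual versus predicted values makes the individual errors explicit. This is a sketch that repeats the setup; the column names Actual, Predicted, and Residual are chosen here for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Scores': [10, 20, 30, 40, 50, 55, 65, 75, 85, 95],
})
X = df[['Hours']]
y = df['Scores']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# One row per test sample: input, true score, and model prediction
comparison = pd.DataFrame({
    'Hours': X_test['Hours'].to_numpy(),
    'Actual': y_test.to_numpy(),
    'Predicted': y_pred,
})
# Residual = actual minus predicted; positive means the model under-predicted
comparison['Residual'] = comparison['Actual'] - comparison['Predicted']

print(comparison.sort_values('Hours').round(2))
```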
Conclusion
Congratulations! You have successfully:
- Loaded and visualized your dataset.
- Trained a linear regression model using scikit-learn.
- Evaluated and visualized your model's performance.
What's Next?
- Try using a larger, real-world dataset.
- Explore polynomial regression for non-linear relationships.
- Read our Intermediate Tutorials to learn classification, hyperparameter tuning, and model deployment.
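As a starting point for the polynomial regression suggestion above, one common approach in scikit-learn is to combine PolynomialFeatures with LinearRegression in a pipeline. The sketch below uses hypothetical, noise-free curved data (not the hours and scores dataset) purely to show the mechanics; the degree of 2 is an assumption that matches how the sample data was generated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical curved data for illustration: y = 2 * x^2 + 3
x = np.arange(1, 11).reshape(-1, 1)
y = 2 * x.ravel() ** 2 + 3

# A plain straight-line fit, for comparison
linear_model = LinearRegression().fit(x, y)
linear_r2 = linear_model.score(x, y)

# Expand x into [x, x^2], then fit an ordinary linear regression on those columns
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_model.fit(x, y)
poly_r2 = poly_model.score(x, y)

print("R2 with a straight line:", round(linear_r2, 4))
print("R2 with degree-2 features:", round(poly_r2, 4))

# Predict for an unseen input
pred = poly_model.predict(np.array([[11]]))[0]
print("Prediction for x = 11:", round(pred, 2))
```

The model is still linear in its coefficients; only the input features are transformed, which is why the same LinearRegression class can be reused.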
If you have questions or want to share your results, join our SuperML Community to learn and grow together!