📖 Lesson ⏱️ 90 minutes

Ensemble Methods

Combining models for better performance

Introduction

Ensemble methods combine multiple machine learning models to create a stronger overall model. They help improve accuracy, stability, and robustness, often outperforming individual models.


Why Use Ensemble Methods?

✅ Reduce variance and overfitting (e.g., bagging).
✅ Reduce bias and improve predictive power (e.g., boosting).
✅ Leverage the complementary strengths of multiple models (e.g., stacking).


Types of Ensemble Methods

1️⃣ Bagging (Bootstrap Aggregating)

  • Trains multiple models on bootstrap samples of the data (drawn with replacement) and averages or votes on their predictions; see the hand-rolled sketch below.
  • Example: Random Forest.
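
To make the mechanism concrete, here is a minimal hand-rolled sketch of bagging: draw bootstrap samples, train one decision tree per sample, and combine the trees by majority vote. The synthetic make_classification data is purely illustrative and is not the lesson's dataset.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data (not the lesson's dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(10):
    # Bootstrap sample: draw len(X) row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Majority vote across the 10 trees
votes = np.stack([tree.predict(X) for tree in trees])
y_vote = (votes.mean(axis=0) >= 0.5).astype(int)
print("Bagged majority-vote accuracy:", (y_vote == y).mean())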

2️⃣ Boosting

  • Trains models sequentially, where each model tries to correct the errors of the previous one; see the reweighting sketch below.
  • Examples: AdaBoost, Gradient Boosting, XGBoost.
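
The reweighting idea can be sketched in a few lines: train a weak learner, increase the weights of the points it misclassifies, and train the next learner on the reweighted data. This is a deliberately simplified illustration; real AdaBoost uses a specific weight-update formula and a weighted vote over all rounds.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data (not the lesson's dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Round 1: a decision stump trained with uniform sample weights
w = np.full(len(X), 1.0 / len(X))
stump1 = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y, sample_weight=w)

# Double the weight of every point the first stump got wrong
wrong = stump1.predict(X) != y
w[wrong] *= 2.0
w /= w.sum()

# Round 2: the next stump focuses on the previously misclassified points
stump2 = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y, sample_weight=w)
print("Round-1 errors:", int(wrong.sum()))
print("Round-2 errors on those points:", int((stump2.predict(X[wrong]) != y[wrong]).sum()))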

3️⃣ Stacking

  • Combines multiple models (base learners) and uses another model (a meta-learner) to learn how best to combine their predictions; see the scikit-learn sketch below.
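
As a quick illustration, scikit-learn's StackingClassifier wires this up directly: the base learners' cross-validated predictions become the training features for the meta-learner. The synthetic data and the particular choice of base learners here are just one possible setup.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Illustrative synthetic data (not the lesson's dataset)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Base learners feed their out-of-fold predictions to the meta-learner
stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
                ('svc', SVC(random_state=0))],
    final_estimator=LogisticRegression(),  # the meta-learner
)
stack.fit(X, y)
print("Stacking training accuracy:", stack.score(X, y))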

Example: Implementing Ensemble Methods in Python

Import Libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score

Sample Data

# A tiny illustrative dataset: two numeric features and a binary label
data = {
    'Feature1': [5, 10, 15, 20, 25, 30, 35, 40],
    'Feature2': [2, 4, 7, 10, 14, 18, 21, 25],
    'Label': [0, 0, 0, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)

X = df[['Feature1', 'Feature2']]
y = df['Label']

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

Using Random Forest (Bagging)

# Bagging: 100 decision trees trained on bootstrap samples, predictions combined by voting
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, y_pred_rf))

Using AdaBoost (Boosting)

# Boosting: 50 weak learners trained sequentially, each reweighting the errors of the last
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
y_pred_ada = ada.predict(X_test)
print("AdaBoost Accuracy:", accuracy_score(y_test, y_pred_ada))

Conclusion

🎉 You have learned:

✅ What ensemble methods are and why they are useful.
✅ The differences between bagging, boosting, and stacking.
✅ How to implement Random Forest (bagging) and AdaBoost (boosting) in scikit-learn.
✅ How to evaluate ensemble models.


What's Next?

  • Explore Gradient Boosting and XGBoost for advanced boosting methods.
  • Learn hyperparameter tuning to improve ensemble performance.
  • Continue with the model deployment tutorials to put your trained models into production.

Join our SuperML Community to share your ensemble experiments, ask questions, and continue your learning journey!