📖 Lesson ⏱️ 90 minutes

Ensemble Methods

Combining models for better performance

Introduction

Ensemble methods combine multiple machine learning models to create a stronger overall model. They help improve accuracy, stability, and robustness, and they often outperform any single member model, especially when the members make different errors.
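To make "stronger overall model" concrete, here is a small worked calculation. It rests on an idealized assumption: n classifiers that are each correct with probability p and make fully independent errors. It computes the chance that a strict majority vote is correct. The function name `majority_vote_accuracy` is illustrative only.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent classifiers,
    each correct with probability p, votes for the right label.
    Idealized: assumes the classifiers' errors are independent."""
    need = n // 2 + 1  # smallest number of correct votes that forms a strict majority
    # Binomial sum over every outcome with at least `need` correct votes
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 3, 5):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

With p = 0.7, three voters reach 0.784 and five reach about 0.837. Real models trained on the same data make correlated errors, so actual gains are smaller. If p < 0.5, voting makes things worse.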


Why Use Ensemble Methods?

✅ Reduce variance and overfitting (e.g., bagging).
✅ Reduce bias and improve predictive power (e.g., boosting).
✅ Combine the complementary strengths of different model types (e.g., stacking).
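Why does averaging reduce variance? Consider n models f_1, …, f_n whose predictions at a point x each have variance σ², with every pair correlated by ρ. The standard variance-of-an-average identity then gives:

```latex
\bar{f}(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad
\operatorname{Var}\big[\bar{f}(x)\big]
  = \frac{1}{n^2}\Big(n\sigma^2 + n(n-1)\rho\sigma^2\Big)
  = \rho\sigma^2 + \frac{1-\rho}{n}\,\sigma^2 .
```

As n grows, the second term shrinks toward zero but ρσ² remains. Averaging therefore helps most when the models' errors are weakly correlated. This is why Random Forest also samples a random subset of features at each split, which decorrelates its trees.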


Types of Ensemble Methods

1️⃣ Bagging (Bootstrap Aggregating)

  • Trains multiple models, each on a bootstrap sample (rows drawn at random with replacement), and combines their predictions by averaging (regression) or voting (classification).
  • Example: Random Forest.
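The bagging recipe can be sketched from scratch in a few lines. This minimal version fits one decision tree per bootstrap sample and combines the trees by majority vote. It assumes binary 0/1 labels and NumPy arrays, and the helper names `bagging_fit` and `bagging_predict` are hypothetical. scikit-learn's `BaggingClassifier` packages the same idea.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Hypothetical helper: fit one tree per bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # bootstrap: n row indices drawn with replacement
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X[idx], y[idx])
        models.append(tree)
    return models


def bagging_predict(models, X):
    """Hypothetical helper: majority vote over the trees (assumes 0/1 labels)."""
    votes = np.stack([m.predict(X) for m in models])  # shape: (n_estimators, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)


X, y = make_classification(n_samples=200, n_features=5, random_state=0)
models = bagging_fit(X, y)
pred = bagging_predict(models, X)
print("Training accuracy:", (pred == y).mean())
```

An odd number of estimators (25 here) avoids tied votes between the two classes.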

2️⃣ Boosting

  • Trains models sequentially, where each new model tries to correct the errors of the ensemble built so far.
  • Examples: AdaBoost, Gradient Boosting, XGBoost.
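One concrete way to "correct errors from the previous models" is gradient boosting with squared-error loss. Each new shallow tree is fit to the residuals the current ensemble still gets wrong. The sketch below uses synthetic sine-wave data chosen only for illustration, and it is not how AdaBoost works: AdaBoost reweights misclassified samples instead. scikit-learn's `GradientBoostingRegressor` is the library counterpart.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

learning_rate = 0.1
n_rounds = 50
pred = np.full_like(y, y.mean())  # start from a constant model
trees = []
mse_history = [np.mean((y - pred) ** 2)]

for _ in range(n_rounds):
    residuals = y - pred  # errors the current ensemble still makes
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)  # the next model learns to predict those errors
    pred += learning_rate * tree.predict(X)  # shrunken correction step
    trees.append(tree)
    mse_history.append(np.mean((y - pred) ** 2))

print(f"MSE before boosting: {mse_history[0]:.3f}, after: {mse_history[-1]:.3f}")
```

The small learning rate makes each tree fix only part of the remaining error. This trades more rounds for a smoother, less overfit ensemble.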

3️⃣ Stacking

  • Combines multiple models (base learners) and uses another model (meta-learner) to learn how to best combine their predictions.
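scikit-learn implements this pattern as `StackingClassifier`. In this sketch, a random forest, k-nearest neighbors, and a shallow decision tree serve as base learners, and logistic regression is the meta-learner. The synthetic dataset and the choice of models are illustrative assumptions, not a recommended setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Diverse base learners tend to make different errors
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
]

stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),  # the meta-learner
    cv=5,  # meta-learner is trained on 5-fold out-of-fold predictions of the base learners
)
stack.fit(X_train, y_train)
acc = stack.score(X_test, y_test)
print("Stacking accuracy:", acc)
```

Training the meta-learner on out-of-fold predictions keeps it from simply trusting whichever base learner overfits the training data most.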

Example: Implementing Ensemble Methods in Python

Import Libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score

Sample Data

# Toy dataset: 8 samples, 2 numeric features, binary label
data = {
    'Feature1': [5, 10, 15, 20, 25, 30, 35, 40],
    'Feature2': [2, 4, 7, 10, 14, 18, 21, 25],
    'Label': [0, 0, 0, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)

# Features (X) and target (y)
X = df[['Feature1', 'Feature2']]
y = df['Label']

# Hold out 25% of the rows (2 of the 8) as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

Using Random Forest (Bagging)

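# 100 decision trees, each fit on a bootstrap sample; the forest averages their predicted class probabilities
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, y_pred_rf))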

Using AdaBoost (Boosting)

# 50 boosting rounds; by default each weak learner is a depth-1 decision tree (a "stump")
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
y_pred_ada = ada.predict(X_test)
print("AdaBoost Accuracy:", accuracy_score(y_test, y_pred_ada))
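With test_size=0.25, the held-out set above contains only 2 of the 8 rows, so each accuracy can only be 0.0, 0.5, or 1.0. A slightly more stable check is k-fold cross-validation with scikit-learn's `cross_val_score`, sketched here on the same toy data. Even so, 8 samples are far too few to draw real conclusions about either model.

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Same toy data as in the example above
data = {
    'Feature1': [5, 10, 15, 20, 25, 30, 35, 40],
    'Feature2': [2, 4, 7, 10, 14, 18, 21, 25],
    'Label': [0, 0, 0, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
X = df[['Feature1', 'Feature2']]
y = df['Label']

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=42),
}

results = {}
for name, model in models.items():
    # For classifiers, cv=3 means stratified 3-fold CV: every row is tested exactly once
    scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.2f} (folds: {scores.round(2)})")
```

Three folds is the most this toy data allows with stratification, because the minority class (label 0) has only 3 samples.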

Conclusion

🎉 You have learned:

✅ What ensemble methods are and why they are useful.
✅ The differences between bagging, boosting, and stacking.
✅ How to implement Random Forest (bagging) and AdaBoost (boosting) in scikit-learn.
✅ How to evaluate ensemble models on a held-out test set with accuracy_score.


What’s Next?

  • Explore Gradient Boosting and XGBoost for advanced boosting methods.
  • Learn hyperparameter tuning to improve ensemble performance.
  • Continue with the model deployment tutorials to put your trained models into production.

Join our SuperML Community to share your ensemble experiments, ask questions, and continue your learning journey!