Ensemble Methods

Introduction

Ensemble methods combine the predictions of multiple machine learning models into a single, stronger model. They help improve accuracy, stability, and robustness, and often outperform any individual model in the ensemble.

Why Use Ensemble Methods?

✅ Reduce variance and overfitting (e.g., bagging).
✅ Reduce bias and improve predictive power (e.g., boosting).
✅ Leverage multiple model strengths (e.g., stacking).

Types of Ensemble Methods

1️⃣ Bagging (Bootstrap Aggregating)

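Trains multiple models on different bootstrap samples of the training data (rows drawn with replacement) and combines their predictions by averaging (regression) or majority vote (classification).
Example: Random Forest, which also considers a random subset of features at each split.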
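
To make the idea concrete, here is a minimal hand-rolled sketch of bagging. It is an illustration only, not how Random Forest is implemented internally: the synthetic dataset from make_classification, the choice of 25 trees, and all variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely illustrative (not part of the tutorial's dataset)
X_bag, y_bag = make_classification(n_samples=200, n_features=5, random_state=0)

rng = np.random.default_rng(0)
n_models = 25  # odd number, so a binary majority vote cannot tie
models = []
for _ in range(n_models):
    # Bootstrap sample: draw as many row indices as there are rows, with replacement
    idx = rng.integers(0, len(X_bag), size=len(X_bag))
    tree = DecisionTreeClassifier(random_state=0).fit(X_bag[idx], y_bag[idx])
    models.append(tree)

# Aggregate: majority vote across the individual trees
votes = np.stack([m.predict(X_bag) for m in models])  # shape (n_models, n_samples)
bagged_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Bagged training accuracy:", (bagged_pred == y_bag).mean())
```

In practice you would likely reach for sklearn.ensemble.BaggingClassifier or RandomForestClassifier, which handle the resampling and voting for you.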

2️⃣ Boosting

Trains models sequentially, where each new model tries to correct the errors made by the ensemble built so far.
Examples: AdaBoost, Gradient Boosting, XGBoost.
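
The sequential "correct the remaining errors" idea can be sketched with a simplified gradient-boosting-style loop for regression. Note that this is a swapped-in illustration: it fits each new tree to the current residuals, which is not AdaBoost's sample-reweighting scheme. The noisy sine data, the learning rate of 0.5, and the 20 rounds are assumptions chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, purely illustrative
rng = np.random.default_rng(0)
X_boost = np.linspace(0, 6, 200).reshape(-1, 1)
y_boost = np.sin(X_boost).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.5
prediction = np.full_like(y_boost, y_boost.mean())  # start from a constant model
errors = []
for _ in range(20):
    # Residuals are what the ensemble so far still gets wrong
    residual = y_boost - prediction
    # Fit a weak learner (a depth-1 tree) to those residuals
    stump = DecisionTreeRegressor(max_depth=1).fit(X_boost, residual)
    # Add a damped version of its correction to the running prediction
    prediction += learning_rate * stump.predict(X_boost)
    errors.append(np.mean((y_boost - prediction) ** 2))

print("Training MSE after round 1: ", errors[0])
print("Training MSE after round 20:", errors[-1])
```

Each round shrinks the training error a little; libraries such as scikit-learn's GradientBoostingClassifier and XGBoost build on the same principle with many refinements.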

3️⃣ Stacking

Combines multiple models (base learners) and uses another model (meta-learner) to learn how to best combine their predictions.
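
A minimal sketch of stacking with scikit-learn's StackingClassifier follows. The choice of base learners (a random forest and k-nearest neighbors), the logistic regression meta-learner, and the synthetic dataset are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, purely illustrative
X_stack, y_stack = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_stack, y_stack, test_size=0.25, random_state=0
)

stack = StackingClassifier(
    estimators=[  # base learners
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),  # meta-learner
    cv=5,  # meta-learner is trained on out-of-fold base predictions
)
stack.fit(X_tr, y_tr)
stack_accuracy = stack.score(X_te, y_te)
print("Stacking Accuracy:", stack_accuracy)
```

The cv argument matters: training the meta-learner on out-of-fold predictions keeps it from simply trusting base learners that have memorized the training rows.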

Example: Implementing Ensemble Methods in Python

Import Libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score

Sample Data

data = {
    'Feature1': [5, 10, 15, 20, 25, 30, 35, 40],
    'Feature2': [2, 4, 7, 10, 14, 18, 21, 25],
    'Label': [0, 0, 0, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)

X = df[['Feature1', 'Feature2']]
y = df['Label']

# Hold out 25% of the rows (2 of 8) for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

Using Random Forest (Bagging)

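# 100 decision trees, each trained on a bootstrap sample of the training data
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
# Predict on the held-out rows and compare with the true labels
y_pred_rf = rf.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, y_pred_rf))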

Using AdaBoost (Boosting)

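# Up to 50 weak learners (depth-1 decision trees by default), trained sequentially
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
# Predict on the held-out rows and compare with the true labels
y_pred_ada = ada.predict(X_test)
print("AdaBoost Accuracy:", accuracy_score(y_test, y_pred_ada))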
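
Because the sample dataset has only eight rows, the test split holds just two of them, so a single accuracy number says very little. One common way to get a steadier estimate is k-fold cross-validation. The sketch below uses a larger synthetic dataset from make_classification, which is an assumption for illustration and not part of the tutorial's data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Larger synthetic dataset, purely illustrative
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)

candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=42),
}

cv_means = {}
for name, model in candidates.items():
    # 5-fold cross-validation: every row is used for testing exactly once
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the mean together with the spread across folds makes it easier to tell a real difference between two ensembles from noise.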

Conclusion

🎉 You have learned:

✅ What ensemble methods are and why they are useful.
✅ The differences between bagging, boosting, and stacking.
✅ How to implement Random Forest (bagging) and AdaBoost (boosting) in scikit-learn.
✅ How to evaluate ensemble models with accuracy_score on a held-out test set.

What’s Next?

Explore Gradient Boosting and XGBoost for advanced boosting methods.
Learn hyperparameter tuning to improve ensemble performance.
Continue with model deployment tutorials to deploy your trained models.

Join our SuperML Community to share your ensemble experiments, ask questions, and continue your learning journey!
