Ensemble Methods
Random forests, boosting, and ensemble techniques
Introduction
Ensemble methods combine multiple machine learning models to create a stronger overall model. They help improve accuracy, stability, and robustness, often outperforming individual models.
Why Use Ensemble Methods?
- Reduce variance and overfitting (e.g., bagging).
- Reduce bias and improve predictive power (e.g., boosting).
- Leverage the strengths of multiple models (e.g., stacking).
Types of Ensemble Methods
1. Bagging (Bootstrap Aggregating)
- Trains multiple models on different bootstrap samples of the data (drawn with replacement) and averages their predictions.
- Example: Random Forest.
2. Boosting
- Trains models sequentially, where each model tries to correct errors from the previous one.
- Examples: AdaBoost, Gradient Boosting, XGBoost.
3. Stacking
- Combines multiple models (base learners) and trains another model (a meta-learner) to learn how best to combine their predictions; see the sketch just after this list.
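Stacking is not covered by the main code example below, so here is a minimal sketch using scikit-learn's StackingClassifier; the choice of a logistic regression and a decision tree as base learners, with another logistic regression as the meta-learner, is purely illustrative.
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Base learners produce predictions; the meta-learner learns how to combine them
# (the specific models chosen here are illustrative assumptions)
base_learners = [
    ('lr', LogisticRegression()),
    ('dt', DecisionTreeClassifier(random_state=42))
]
stack = StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression())
# Train and evaluate it like any scikit-learn model, e.g. stack.fit(X_train, y_train)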
Example: Implementing Ensemble Methods in Python
Import Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score
Sample Data
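# A tiny hand-made dataset: two numeric features and a binary label (for illustration only)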
data = {
'Feature1': [5, 10, 15, 20, 25, 30, 35, 40],
'Feature2': [2, 4, 7, 10, 14, 18, 21, 25],
'Label': [0, 0, 0, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
X = df[['Feature1', 'Feature2']]
y = df['Label']
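# Hold out 25% of the rows (2 of the 8 samples) for testing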
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
Using Random Forest (Bagging)
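# Bagging: 100 decision trees trained on bootstrap samples, predictions averaged across trees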
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, y_pred_rf))
Using AdaBoost (Boosting)
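# Boosting: 50 weak learners trained sequentially, each focusing on previously misclassified samples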
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
y_pred_ada = ada.predict(X_test)
print("AdaBoost Accuracy:", accuracy_score(y_test, y_pred_ada))
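Note that with only eight samples, the 25% hold-out above leaves just two test points, so the printed accuracies can only come out as 0%, 50%, or 100%. As a rough sketch, k-fold cross-validation on the full toy data gives a more stable estimate; 3 folds are used here so the minority class appears in every fold.
from sklearn.model_selection import cross_val_score

# 3-fold cross-validation: each fold's test set keeps one sample of the minority class
scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=42), X, y, cv=3)
print("Cross-validated Random Forest accuracy:", scores.mean())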
Conclusion
You have learned:
- What ensemble methods are and why they are useful.
- The differences between bagging, boosting, and stacking.
- How to implement Random Forest (bagging) and AdaBoost (boosting) in scikit-learn.
- How to evaluate ensemble models.
What's Next?
- Explore Gradient Boosting and XGBoost for advanced boosting methods (a minimal Gradient Boosting sketch follows this list).
- Learn hyperparameter tuning to improve ensemble performance.
- Continue with model deployment tutorials to deploy your trained models.
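To preview those next steps, here is a minimal sketch that trains a Gradient Boosting model on the same split used above and then runs a small grid search. The parameter grid is an illustrative assumption, not a tuned recommendation, and 3-fold cross-validation is chosen so each fold of this tiny dataset still contains both classes.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Gradient Boosting on the same train/test split used above
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb.fit(X_train, y_train)
print("Gradient Boosting Accuracy:", accuracy_score(y_test, gb.predict(X_test)))

# Small, illustrative hyperparameter grid; real searches usually cover wider ranges
param_grid = {'n_estimators': [50, 100], 'learning_rate': [0.05, 0.1]}
grid = GridSearchCV(GradientBoostingClassifier(random_state=42), param_grid, cv=3)
grid.fit(X, y)
print("Best parameters:", grid.best_params_)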
Join our SuperML Community to share your ensemble experiments, ask questions, and continue your learning journey!