Types of Ensemble Learning
Last Updated : 17 May, 2025
Ensemble learning is a machine learning technique that combines multiple models, often called weak learners, into a single, more effective predictive model. It is used to improve accuracy, reduce variance and limit overfitting. Here we will look at the main ensemble techniques and the algorithms that use them.
Bagging (Bootstrap Aggregating)
Bagging is a technique that involves creating multiple versions of a model and combining their outputs to improve overall performance.
In bagging, several base models are trained on different subsets of the training data and their predictions are aggregated to make the final decision. The subsets are created using bootstrapping, a statistical technique in which samples are drawn with replacement, so a data point can appear more than once in a subset.
The final prediction from the ensemble is typically made by either:
- Averaging the predictions (for regression problems), or
- Majority voting (for classification problems).
This approach helps to reduce variance, especially with models that are prone to overfitting, such as decision trees.
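To make the idea concrete, here is a minimal hand-rolled sketch of bagging, assuming a synthetic binary-classification dataset from scikit-learn and decision trees as the base learners; the dataset, ensemble size and model choices are illustrative assumptions, not part of any particular library's API.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy binary-classification data (an assumption; any tabular dataset works the same way)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

n_models = 25
rng = np.random.default_rng(42)
all_preds = []

for _ in range(n_models):
    # Bootstrapping: draw row indices with replacement, so some rows repeat
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    all_preds.append(tree.predict(X))

# Majority vote over the 0/1 labels predicted by the individual trees
ensemble_pred = (np.mean(all_preds, axis=0) >= 0.5).astype(int)
print("Training accuracy of the bagged ensemble:", (ensemble_pred == y).mean())
```

For regression, the same loop would simply average the trees' numeric predictions instead of taking a majority vote.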
Common Algorithms Using Bagging
1. Random Forest
- Random forest is an ensemble method based on decision trees. Multiple decision trees are trained using different bootstrapped samples of the data.
- In addition to bagging, Random Forest also introduces randomness by selecting a random subset of features at each node, further reducing variance and overfitting.
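As a quick illustration, the snippet below trains a random forest with scikit-learn's RandomForestClassifier; the synthetic dataset and the hyperparameter values (n_estimators, max_features) are assumptions chosen only for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators: number of bootstrapped trees
# max_features="sqrt": random subset of features considered at each split
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```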
2. Bagged Decision Trees
- In Bagged Decision Trees, multiple decision trees are trained using bootstrapped samples of the data.
- Each tree is trained independently, and the final prediction is made by averaging the trees' outputs (for regression) or taking a majority vote over them (for classification), as in the sketch below.
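A comparable example uses scikit-learn's BaggingClassifier to wrap plain decision trees; it assumes scikit-learn 1.2 or newer (older releases name the estimator parameter base_estimator), and the dataset and ensemble size are again illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree sees its own bootstrap sample; predictions are aggregated by voting
bagged_trees = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,
    random_state=0,
)
bagged_trees.fit(X_train, y_train)
print("Test accuracy:", bagged_trees.score(X_test, y_test))
```

Unlike a random forest, the trees here consider all features at every split (with the default settings); only the rows differ between ensemble members.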
Boosting
Boosting is an ensemble technique where multiple models are trained sequentially, with each new model attempting to correct the errors made by the previous ones.
Boosting focuses on adjusting the weights of incorrectly classified data points, so the next model pays more attention to those difficult cases. By combining the outputs of these models, boosting typically improves the accuracy of the final prediction.
Each new model is added to the ensemble in a way that emphasizes the mistakes made by previous models, and the individual predictions are then combined with weights to form the final output.
The final prediction from the ensemble is typically made by:
- Weighted sum (for regression problems), or
- Weighted majority vote (for classification problems).
This approach helps to reduce bias, especially when using weak learners, by focusing on the misclassified points.
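To show what "adjusting the weights of misclassified points" looks like in practice, here is a stripped-down, AdaBoost-style loop. It is a sketch for intuition only, assuming binary labels recoded to -1/+1 and depth-1 decision trees (stumps) as the weak learners.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
y_signed = np.where(y == 1, 1, -1)            # recode labels to {-1, +1}

n_rounds = 10
weights = np.full(len(X), 1 / len(X))         # start with uniform sample weights
models, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y_signed, sample_weight=weights)
    pred = stump.predict(X)
    err = np.sum(weights * (pred != y_signed)) / np.sum(weights)
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))   # weight of this model in the vote
    # Misclassified points get larger weights, so the next stump focuses on them
    weights *= np.exp(-alpha * y_signed * pred)
    weights /= weights.sum()
    models.append(stump)
    alphas.append(alpha)

# Weighted majority vote across all rounds
scores = sum(a * m.predict(X) for a, m in zip(alphas, models))
print("Training accuracy:", np.mean(np.sign(scores) == y_signed))
```

Library implementations add safeguards (early stopping on zero error, multi-class handling) that this sketch leaves out.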
Common Algorithms Using Boosting
1. AdaBoost (Adaptive Boosting)
- AdaBoost works by adjusting the weights of misclassified instances and combining the predictions of weak learners (usually shallow decision trees, or "stumps"). Each subsequent model is trained to correct the mistakes of the previous ones.
- AdaBoost can significantly improve the performance of weak models, especially when used for classification problems.
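In practice AdaBoost is usually run through a library; the example below uses scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision tree. The synthetic data and hyperparameters are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# learning_rate scales each model's contribution; smaller values need more estimators
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
ada.fit(X_train, y_train)
print("Test accuracy:", ada.score(X_test, y_test))
```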
2. Gradient Boosting
- Gradient Boosting is a more general approach to boosting that builds models sequentially, with each new model fitting the residual errors of the previous model.
- The models are trained to minimize a loss function, which can be chosen to suit the specific task.
- Gradient Boosting can be used for both regression and classification tasks, as the sketch below shows for regression.
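Below is a small regression example using scikit-learn's GradientBoostingRegressor; the loss name "squared_error" assumes scikit-learn 1.0 or newer, and the data and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new tree fits the residual errors (the negative gradient of the loss)
# of the ensemble built so far; `loss` can be swapped for the task at hand.
gbr = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=3,
    loss="squared_error",
    random_state=0,
)
gbr.fit(X_train, y_train)
print("Test R^2:", gbr.score(X_test, y_test))
```

For classification, GradientBoostingClassifier follows the same pattern with a classification loss such as log loss.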
3. XGBoost (Extreme Gradient Boosting)
- XGBoost is an optimized version of gradient boosting. It includes regularization to prevent overfitting and supports parallelization to speed up training.
- XGBoost has become a popular choice in machine learning competitions due to its high performance.
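A minimal usage sketch is shown below; it assumes the separate xgboost package is installed and uses its scikit-learn-compatible XGBClassifier wrapper. The hyperparameter choices (reg_lambda for L2 regularization, n_jobs for parallel tree construction) are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # requires `pip install xgboost`

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# reg_lambda adds L2 regularization; n_jobs=-1 parallelizes tree construction
xgb = XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=4,
    reg_lambda=1.0,
    n_jobs=-1,
    random_state=0,
)
xgb.fit(X_train, y_train)
print("Test accuracy:", xgb.score(X_test, y_test))
```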
Stacking
Stacking (Stacked Generalization) combines multiple models (base learners) of different types, where each model makes independent predictions and a meta-model is trained to combine these predictions. Instead of simply averaging or voting, as in bagging and boosting, stacking trains a higher-level model (meta-model) to learn how to best combine the predictions of the base models.
In stacking, the base models are trained on the original data and their predictions are then used as features for the meta-model, which learns how to combine them effectively. The final prediction is made by the meta-model based on the combined outputs of all the base models.
The final prediction from the ensemble is made by the meta-model, which learns how to combine the predictions of the base models to generate the final output.
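To illustrate the data flow, here is a hand-rolled stacking sketch that holds out half of the data for the meta-model; the simple two-way split, the choice of base learners and the logistic-regression meta-model are assumptions made only for demonstration (library implementations, shown later, use cross-validation instead).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.5, random_state=0)

# Level 0: heterogeneous base learners trained on one half of the data
base_models = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]
for model in base_models:
    model.fit(X_base, y_base)

# Level 1: base-model predictions on held-out data become the meta-model's features
meta_features = np.column_stack([m.predict_proba(X_meta)[:, 1] for m in base_models])
meta_model = LogisticRegression().fit(meta_features, y_meta)
print("Meta-model accuracy on held-out predictions:",
      meta_model.score(meta_features, y_meta))
```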
Common Algorithms Using Stacking
1. Generalized Stacking
- In Generalized Stacking, multiple different models (e.g., decision trees, logistic regression, neural networks) are trained on the same dataset.
- A meta-model (such as logistic regression or another decision tree) is then trained on the predictions made by these base models.
- The meta-model learns how to combine the predictions of the base models to make the final prediction.
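scikit-learn packages this pattern as StackingClassifier (and StackingRegressor); the example below is a minimal sketch with assumed base learners and an assumed logistic-regression meta-model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Different model families as base learners, logistic regression as the meta-model
stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(probability=True)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_train, y_train)
print("Test accuracy:", stack.score(X_test, y_test))
```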
2. Stacking with Cross-Validation
- In stacking with cross-validation, the base models are trained using cross-validation and their predictions on the validation set are used to train the meta-model.
- This reduces the risk of overfitting and ensures that the meta-model is trained on out-of-fold predictions rather than on predictions the base models made for their own training data.
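In scikit-learn's StackingClassifier this behaviour is controlled by the cv parameter, as in the brief sketch below (the value cv=5 and the choice of models are assumptions for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# cv=5: the meta-model is fit on out-of-fold predictions of the base models,
# so it never sees predictions a base model made on its own training rows
stack_cv = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack_cv.fit(X, y)
```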
3. Multi-Layer Stacking
- Multi-layer stacking involves multiple levels of base models, where the outputs of the first level of models are fed into a second level of base models and so on, before reaching the meta-model.
- This approach creates a more complex ensemble that can capture a wider variety of patterns in the data.
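One way to sketch this (an assumption about structure, not a standard recipe) is to nest scikit-learn stacking estimators, using a second StackingClassifier as the final estimator of the first:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Second level: models trained on the first level's predictions,
# topped by a logistic-regression meta-model
level_two = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
)

# First level: base learners on the raw features; their predictions feed level two
multi_layer = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=level_two,
)
multi_layer.fit(X, y)
print("Training accuracy:", multi_layer.score(X, y))
```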