Implementation of Stacking - ML
Last Updated : 15 May, 2025
Stacking is an ensemble learning technique that improves model performance by combining the predictions of multiple models. In this article, we will see how to implement a Stacking Classifier on a classification dataset using Python.
For a better understanding of stacking, refer to: Stacking in Machine Learning
Before implementing it, we need to install the required packages using the following commands:
pip install mlxtend
pip install pandas
pip install -U scikit-learn
Step 1: Importing the required Libraries
We will import pandas, matplotlib, mlxtend and scikit-learn for this.
```python
import pandas as pd
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_confusion_matrix
from mlxtend.classifier import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
```
Step 2: Loading the Dataset
You can download the dataset from this link: Heart Dataset.
```python
df = pd.read_csv('heart.csv')
X = df.drop('target', axis=1)
y = df['target']
df.head()
```
Output:

Step 3: Split the Data into Training and Testing Sets
- test_size = 0.2: Specifies that 20% of the data should be used for testing, leaving 80% for training.
- random_state = 42: Ensures reproducibility by setting a fixed seed for random number generation.
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
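If the target classes were imbalanced, one could additionally pass stratify=y so that both splits keep the same class proportions. A minimal optional variant (not part of the original setup):

```python
# stratified variant: preserves the target's class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```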
Step 4: Standardize the Data
In this step, the data is standardized using StandardScaler so that the selected features have a mean of 0 and a standard deviation of 1.
- var_transform: Specifies the list of feature columns that need to be standardized.
- X_train[var_transform]: Applies the fit_transform method to standardize the selected columns in the training data.
- X_test[var_transform]: Applies the transform method to standardize the corresponding columns in the test data using the scaling parameters from the training data.
```python
sc = StandardScaler()
var_transform = ['thalach', 'age', 'trestbps', 'oldpeak', 'chol']
X_train[var_transform] = sc.fit_transform(X_train[var_transform])
X_test[var_transform] = sc.transform(X_test[var_transform])
print(X_train.head())
```
Output:

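As a quick sanity check (not in the original article), we can confirm that the transformed columns now have roughly zero mean and unit standard deviation on the training set:

```python
# means should be ~0 and standard deviations ~1 after scaling
print(X_train[var_transform].mean().round(3))
print(X_train[var_transform].std().round(3))
```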
Step 5: Build First Layer Estimators
The first layer consists of base models. For this example, we'll use a K-Nearest Neighbors classifier and a Naive Bayes classifier.
```python
KNC = KNeighborsClassifier()
NB = GaussianNB()
```
Step 6: Training and Evaluating KNeighborsClassifier
Let's train and evaluate the KNeighborsClassifier.
```python
model_kNeighborsClassifier = KNC.fit(X_train, y_train)
pred_knc = model_kNeighborsClassifier.predict(X_test)
```
Evaluation:
```python
acc_knc = accuracy_score(y_test, pred_knc)
print('accuracy score of KNeighbors Classifier is:', acc_knc * 100)
```
Output:
accuracy score of KNeighbors Classifier is: 80.48780487804879
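A single hold-out split can be noisy, so this number will vary with random_state. As an optional sanity check, here is a minimal sketch using scikit-learn's cross_val_score on the training data:

```python
from sklearn.model_selection import cross_val_score

# 5-fold cross-validated accuracy of the KNN base model
# (for simplicity this reuses the already-scaled X_train; a Pipeline
# would be the cleaner way to avoid leakage across folds)
cv_scores = cross_val_score(KNeighborsClassifier(), X_train, y_train, cv=5)
print('mean CV accuracy:', cv_scores.mean() * 100)
```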
Step 7: Training and Evaluating Naive Bayes Classifier
```python
model_NaiveBayes = NB.fit(X_train, y_train)
pred_nb = model_NaiveBayes.predict(X_test)
```
Evaluation:
```python
acc_nb = accuracy_score(y_test, pred_nb)
print('Accuracy of Naive Bayes Classifier:', acc_nb * 100)
```
Output:
Accuracy of Naive Bayes Classifier: 80.0
Step 8: Implementing the Stacking Classifier
Now, we combine the base models using mlxtend's StackingClassifier, which we imported in Step 1. The meta-model is a logistic regression that takes the class-probability predictions of KNN and Naive Bayes as input (hence use_probas=True).
```python
lr = LogisticRegression()  # meta-model

clf_stack = StackingClassifier(classifiers=[KNC, NB],
                               meta_classifier=lr,
                               use_probas=True)
```
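For reference, scikit-learn ships its own stacking implementation with a slightly different interface. A roughly equivalent sketch using sklearn.ensemble.StackingClassifier (which cross-validates the base models internally):

```python
from sklearn.ensemble import StackingClassifier as SkStackingClassifier

# scikit-learn's built-in stacking; stack_method='predict_proba'
# feeds class probabilities (rather than labels) to the meta-model
sk_stack = SkStackingClassifier(
    estimators=[('knn', KNeighborsClassifier()), ('nb', GaussianNB())],
    final_estimator=LogisticRegression(),
    stack_method='predict_proba'
)
```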
Step 9: Training the Stacking Classifier
```python
model_stack = clf_stack.fit(X_train, y_train)
pred_stack = model_stack.predict(X_test)
```
Evaluating the Stacking Classifier:
```python
acc_stack = accuracy_score(y_test, pred_stack)  # evaluating accuracy
print('accuracy score of Stacked model:', acc_stack * 100)
```
Output:
accuracy score of Stacked model: 83.90243902439025
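Since Step 1 already imports confusion_matrix and mlxtend's plot_confusion_matrix, here is a minimal sketch of visualizing the stacked model's test predictions:

```python
# confusion matrix of the stacked model on the test set
cm = confusion_matrix(y_test, pred_stack)
fig, ax = plot_confusion_matrix(conf_mat=cm, figsize=(5, 5))
plt.show()
```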
Both individual models score an accuracy of about 80%, while the stacked model reaches about 84%. By combining the two models, we obtained a noticeable performance improvement.