XGBClassifier

Last Updated : 25 Jun, 2025

XGBClassifier is an efficient machine learning estimator provided by the XGBoost (Extreme Gradient Boosting) library. It is widely used for classification problems such as predicting whether an email is spam, whether a customer will churn or whether a transaction is fraudulent. XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.


Parameters of XGBClassifier

  1. n_estimators: Defines the number of boosting rounds. More trees can increase accuracy but also raise the risk of overfitting and lengthen training time.
  2. learning_rate: Controls how much each tree contributes to the final prediction. Lower values make the model more robust but require more trees.
  3. max_depth: Limits the maximum depth of each decision tree. Deeper trees can capture more patterns but may overfit the data.
  4. subsample: Specifies the fraction of training instances to be used for growing each tree. Helps prevent overfitting.
  5. colsample_bytree: Fraction of features to be used when building each tree. Reduces correlation between trees and prevents overfitting.
  6. gamma: Minimum loss reduction required to make a further partition on a leaf node. Acts as a regularization term to control tree complexity.
  7. reg_alpha (L1 regularization) and reg_lambda (L2 regularization): These help prevent overfitting by adding penalties for large weights (coefficients). L1 can lead to sparsity (feature selection), while L2 reduces weight size.
  8. objective: Specifies the learning task and the corresponding loss function.
  9. scale_pos_weight: Helps with imbalanced classification tasks by giving more importance to the minority class. It’s typically set to the ratio of negative to positive samples.
  10. early_stopping_rounds: Used during training with validation data to stop the training process once the evaluation metric stops improving. A sketch putting several of these parameters together follows this list.
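
As a rough illustration, the snippet below configures an XGBClassifier with several of the parameters above. The specific values are illustrative assumptions, not recommended defaults, and should be tuned for your dataset; setting objective is optional, since XGBClassifier infers it from the labels at fit time.

Python
from xgboost import XGBClassifier

# All values below are illustrative assumptions, not recommended defaults.
model = XGBClassifier(
    n_estimators=200,       # number of boosting rounds
    learning_rate=0.05,     # shrink each tree's contribution
    max_depth=4,            # cap tree depth to limit overfitting
    subsample=0.8,          # fraction of rows sampled per tree
    colsample_bytree=0.8,   # fraction of features sampled per tree
    gamma=0.1,              # minimum loss reduction needed to split
    reg_alpha=0.0,          # L1 penalty on leaf weights
    reg_lambda=1.0,         # L2 penalty on leaf weights
    objective="binary:logistic",  # loss for binary classification
)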

Implementation in Python

  • This code demonstrates how to use XGBClassifier from the XGBoost library for a multiclass classification task using the Iris dataset.
  • First, it loads the Iris dataset and splits it into training and testing sets (70% training, 30% testing).
  • Then, it initializes the XGBClassifier model and trains it on the training data.
  • After training, it predicts the class labels for the test set and finally prints the accuracy of the model by comparing the predicted labels to the actual labels in the test set.
Python
from xgboost import XGBClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load a sample dataset
data = load_iris()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Create and train model
model = XGBClassifier()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Output:

Accuracy: 0.9333333333333333
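
The basic fit above trains for the default number of rounds. A hedged sketch of early stopping with a held-out validation set follows; note that recent xgboost releases take early_stopping_rounds as a constructor argument, while older releases accepted it as a fit() keyword instead.

Python
from xgboost import XGBClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
# Hold out a validation set to monitor the evaluation metric.
X_train, X_val, y_train, y_val = train_test_split(
    data.data, data.target, test_size=0.3)

# Recent xgboost versions take early_stopping_rounds in the constructor;
# older versions expected it as a fit() keyword instead.
model = XGBClassifier(n_estimators=500, eval_metric="mlogloss",
                      early_stopping_rounds=10)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

print("Stopped at boosting round:", model.best_iteration)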

Use cases of XGBClassifier

  1. Credit Scoring and Risk Prediction: Banks and financial institutions use XGBClassifier to predict whether a loan applicant is likely to default. Its high accuracy and handling of imbalanced data make it ideal for credit risk modelling.
  2. Fraud Detection: In domains like banking and e-commerce, it helps detect fraudulent transactions by identifying subtle patterns in large, complex datasets; a sketch of the scale_pos_weight heuristic for such imbalanced data follows this list.
  3. Customer Churn Prediction: Telecom, SaaS and subscription-based businesses use it to identify which customers are likely to cancel their service, enabling proactive retention strategies.
  4. Medical Diagnosis: Used in healthcare for disease classification by analyzing patient data. It can handle missing values and imbalanced datasets often found in medical records.
  5. Spam Email Detection: It trains on labelled email data to classify whether incoming emails are spam or not, often with better accuracy than traditional models.
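
For imbalanced tasks like fraud detection, a common heuristic (mentioned under scale_pos_weight above) is to weight the positive class by the negative-to-positive ratio. The sketch below uses made-up labels purely for illustration.

Python
import numpy as np
from xgboost import XGBClassifier

# Hypothetical labels: 1 marks the rare positive class (e.g. fraud).
y = np.array([0] * 950 + [1] * 50)

# Heuristic from the parameter list: negatives divided by positives.
ratio = (y == 0).sum() / (y == 1).sum()
print("scale_pos_weight =", ratio)  # 19.0 for these counts

model = XGBClassifier(scale_pos_weight=ratio)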

Advantages

  1. Gradient Boosting Framework: XGBClassifier is based on gradient boosting where trees are built sequentially to minimize a loss function using gradient descent.
  2. Ensemble of Decision Trees: The model combines the predictions of many weak learners (decision trees) to form a strong classifier.
  3. Additive Training: New trees are added to correct the errors made by previous trees, gradually improving the model's accuracy (see the sketch after this list).
  4. Regularization: Uses both L1 (Lasso) and L2 (Ridge) regularization to control model complexity and prevent overfitting.
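
To see the additive behaviour directly, the sketch below reuses model, X_test and y_test from the iris example and scores predictions using only the first k boosting rounds; it assumes an xgboost version (1.4 or later) whose predict() accepts iteration_range. Accuracy typically improves as more rounds are included.

Python
from sklearn.metrics import accuracy_score

# Assumes `model`, `X_test`, `y_test` from the iris example above and
# xgboost >= 1.4, where predict() accepts iteration_range.
for k in (1, 5, 20, 100):
    y_pred = model.predict(X_test, iteration_range=(0, k))
    print(f"first {k:3d} rounds -> accuracy {accuracy_score(y_test, y_pred):.3f}")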

Disadvantages

  1. Complex Hyperparameter Tuning: XGBClassifier has a large number of hyperparameters such as max_depth, learning_rate, subsample, colsample_bytree, gamma and the regularization terms. Finding the right combination for a specific problem can be time-consuming and computationally expensive; a grid-search sketch follows this list.
  2. Risk of Overfitting: Although XGBoost includes regularization to prevent overfitting, it's still vulnerable if not tuned properly. Overfitting results in excellent training accuracy but poor generalization on unseen test data.
  3. Less Interpretable: XGBClassifier is essentially a black box model. While tools like SHAP and LIME can help explain predictions, they add another layer of complexity. Compared to simpler models such as decision trees or logistic regression, understanding why the model made a particular prediction is more difficult, which can be a concern in domains where explainability is important.
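
Because XGBClassifier follows the scikit-learn estimator API, standard tools such as GridSearchCV can automate part of the tuning burden described above. The grid below is deliberately tiny and illustrative; real searches are larger and correspondingly more expensive.

Python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# A deliberately small, illustrative grid.
param_grid = {
    "max_depth": [3, 5],
    "learning_rate": [0.05, 0.1],
    "subsample": [0.8, 1.0],
}

# X_train, y_train as in the iris example above.
search = GridSearchCV(XGBClassifier(), param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)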
