
LightGBM Learning Control Parameters

Last Updated : 04 Nov, 2023

This article covers LightGBM's learning control parameters: what each one does and how it affects the model's performance.

What is LightGBM?

LightGBM is a powerful gradient-boosting framework that has gained immense popularity in the fields of machine learning and data science. LightGBM is open-source, developed by Microsoft, and part of the Distributed Machine Learning Toolkit (DMTK) project. It is designed for efficient and scalable machine learning.

Tree-based algorithms are a class of machine learning algorithms that use decision trees to make predictions. Decision trees are a versatile and interpretable way to model complex relationships in data, and they are widely used for both classification and regression tasks. LightGBM builds its gradient-boosted ensembles out of decision trees, adding each new tree to correct the errors of the trees before it.

The Role of Learning Control Parameters

Learning control parameters in LightGBM shape how training proceeds: how large each tree may grow, how strongly leaf values are regularized, how rows and features are subsampled, and when training should stop. Most of them limit model complexity, which makes them the main tools for controlling overfitting. Here are some common control parameters in LightGBM:

  • early_stopping_rounds: The number of consecutive rounds without improvement in the validation metric after which training is stopped. It needs at least one validation set and prevents overfitting by not adding trees that no longer help.
  • max_depth: The maximum depth of each tree, which limits model complexity. Deeper trees can capture more complex patterns but are more prone to overfitting. A value of zero or less means no limit.
  • lambda_l1 and lambda_l2: These parameters apply L1 and L2 regularization, respectively, to the leaf weights. Penalizing large weights shrinks each tree's contribution and helps prevent overfitting.
  • min_data_in_leaf: The minimum number of data points a leaf node must contain. Larger values block splits that isolate only a handful of samples.
  • min_gain_to_split: The minimum gain a split must achieve to be accepted. Splits that improve the objective by less than this value are discarded.
  • feature_fraction: The fraction of features randomly selected for each tree. Values below 1.0 decorrelate the trees and speed up training.
  • bagging_fraction: The fraction of data randomly sampled for an iteration. It takes effect only when bagging_freq is set to a nonzero value.
  • verbosity: Controls the level of LightGBM's logging. Values below 0 show only fatal errors, 0 adds warnings, 1 adds informational messages, and values above 1 add debug output.
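To make the effect of lambda_l1, lambda_l2, and min_gain_to_split more concrete, here is a minimal sketch of the textbook second-order formulas for a leaf value and a split gain in gradient boosting. It is an illustration under that assumption, not LightGBM's internal code; the helper names (soft_threshold, leaf_value, leaf_gain, split_gain) and the gradient/hessian sums are hypothetical.

```python
def soft_threshold(sum_grad, lambda_l1):
    """Shrink the gradient sum toward zero by lambda_l1 (the L1 penalty)."""
    if sum_grad > lambda_l1:
        return sum_grad - lambda_l1
    if sum_grad < -lambda_l1:
        return sum_grad + lambda_l1
    return 0.0


def leaf_value(sum_grad, sum_hess, lambda_l1=0.0, lambda_l2=0.0):
    """Regularized leaf weight: L1 shrinks the numerator, L2 grows the denominator."""
    return -soft_threshold(sum_grad, lambda_l1) / (sum_hess + lambda_l2)


def leaf_gain(sum_grad, sum_hess, lambda_l1=0.0, lambda_l2=0.0):
    """Objective reduction contributed by one leaf."""
    t = soft_threshold(sum_grad, lambda_l1)
    return t * t / (sum_hess + lambda_l2)


def split_gain(left, right, lambda_l1=0.0, lambda_l2=0.0):
    """Gain of splitting a node into two children given (sum_grad, sum_hess) pairs."""
    (gl, hl), (gr, hr) = left, right
    children = leaf_gain(gl, hl, lambda_l1, lambda_l2) + leaf_gain(gr, hr, lambda_l1, lambda_l2)
    parent = leaf_gain(gl + gr, hl + hr, lambda_l1, lambda_l2)
    return children - parent


# Hypothetical gradient/hessian sums for a node and its two children
left, right = (-3.0, 4.0), (-1.0, 6.0)

plain = leaf_value(-4.0, 10.0)
regularized = leaf_value(-4.0, 10.0, lambda_l1=0.1, lambda_l2=0.2)
print(f"leaf value without regularization: {plain:.4f}")
print(f"leaf value with regularization:    {regularized:.4f}")

gain = split_gain(left, right, lambda_l1=0.1, lambda_l2=0.2)
min_gain_to_split = 0.01
print(f"split gain: {gain:.4f}, accepted: {gain > min_gain_to_split}")
```

In this sketch, both penalties pull the leaf value toward zero, and a split is kept only when its gain exceeds min_gain_to_split.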

Optimizing Control Parameters

Finding the optimal combination of these parameters can significantly impact the model's performance. While manual tuning can be effective, it's often time-consuming and requires domain expertise. One approach is to use grid search or random search to try out different combinations of parameters. Another approach is to start with a set of default parameters and then adjust them one at a time until the desired performance is achieved.

It is important to note that there is no one-size-fits-all approach to tuning learning control parameters. The best parameters will vary depending on the specific dataset and task.

Implementation of Learning Control Parameters

Let's implement LightGBM with various learning control parameters in Python.

Libraries Imported:

We import the necessary libraries:

  • lightgbm as lgb: for gradient boosting.
  • train_test_split: From Scikit-Learn, this function is used to split the dataset into training and testing sets.
  • load_iris: Loads the Iris dataset from Scikit-Learn.
  • accuracy_score: This function from Scikit-Learn computes the accuracy classification score, which measures the accuracy of the classification model.

Dataset Loading and Splitting:

load_iris(): Loads the Iris dataset. iris.data contains the feature data (sepal length, sepal width, petal length, and petal width), and iris.target contains the corresponding labels (species: Setosa, Versicolor, or Virginica). We then split the data into training and testing sets using train_test_split, with 80% of the data used for training and 20% for testing. Fixing random_state makes the split reproducible.

Python3
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
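As a quick sanity check on the split, the following sketch prints the resulting shapes and the class counts in the test set. The stratify=y argument is an addition that the code above does not use; it is assumed here to keep the three species equally represented in both parts.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same 80/20 split as above, plus stratification by class label (an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]
```

With only 30 test samples, every misclassified sample changes the accuracy by about 3.33 percentage points, which is worth keeping in mind when reading the final score.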


LightGBM Parameters:

We define a dictionary params containing the following control parameters for LightGBM.

  • early_stopping_rounds: The number of rounds without improvement in the validation metric before training is stopped.
  • max_depth: Maximum tree depth.
  • lambda_l1 and lambda_l2: L1 and L2 regularization terms on weights. They are used to avoid overfitting.
  • min_data_in_leaf: The minimum number of data points in a leaf node.
  • min_gain_to_split: The minimum gain required to split a node.
  • feature_fraction: Fraction of features to be used for each boosting round (helps prevent overfitting).
  • bagging_fraction: Fraction of data to be used for bagging. It takes effect only when bagging_freq is set to a nonzero value.
  • verbosity: Setting it to -1 makes LightGBM silent during training.
Python3
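params = {
    'objective': 'multiclass',      # Multiclass classification task
    'metric': 'multi_logloss',      # Logarithmic loss as the evaluation metric
    'num_class': 3,                 # Iris has 3 classes: Setosa, Versicolor, and Virginica
    'boosting_type': 'gbdt',
    'early_stopping_rounds': 10,    # Stop after 10 rounds without improvement
    'max_depth': 5,                 # Maximum tree depth
    'lambda_l1': 0.1,               # L1 regularization on leaf weights
    'lambda_l2': 0.2,               # L2 regularization on leaf weights
    'min_data_in_leaf': 20,         # Minimum number of samples per leaf
    'min_gain_to_split': 0.01,      # Minimum gain required to split a node
    'feature_fraction': 0.8,        # Fraction of features sampled per tree
    'bagging_fraction': 0.8,        # Fraction of rows sampled when bagging
    'bagging_freq': 1,              # Resample rows every iteration; bagging is off without this
    'verbosity': -1                 # Silence LightGBM's training output
}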
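Several of these settings bound the size of a tree at the same time, and on a small dataset the tightest bound is often not the obvious one. The sketch below works through the arithmetic. The helper max_leaves_per_tree is hypothetical, and num_leaves=31 is LightGBM's documented default rather than a value set in this article.

```python
def max_leaves_per_tree(n_rows, max_depth, num_leaves, min_data_in_leaf):
    """Upper bound on the leaves of a single tree under three separate limits."""
    # A binary tree of depth d has at most 2**d leaves; max_depth <= 0 means no limit
    depth_cap = 2 ** max_depth if max_depth > 0 else num_leaves
    # Every leaf must hold at least min_data_in_leaf rows
    data_cap = max(1, n_rows // min_data_in_leaf)
    return min(depth_cap, num_leaves, data_cap)


# 120 training rows, max_depth=5, default num_leaves=31, min_data_in_leaf=20
print(max_leaves_per_tree(120, 5, 31, 20))  # 6

# If bagging samples 80% of the rows, a tree sees 96 rows
print(max_leaves_per_tree(96, 5, 31, 20))   # 4
```

Under these assumptions, min_data_in_leaf is the binding constraint on Iris: max_depth=5 would allow up to 32 leaves, but the data supports only a handful.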


LightGBM Dataset and Training:

For training and evaluation, we are going to use:

  • lgb.Dataset(): Converts the dataset into LightGBM format for efficient training.
  • lgb.train(): Trains the LightGBM model using the specified parameters, training data, and validation data.

Using the training features and labels, we build a LightGBM dataset train_data, and we use lgb.train to train the model for up to 100 rounds of boosting with the specified parameters. The test set is passed as the validation set, so early stopping monitors the metric on it.

Python3
train_data = lgb.Dataset(X_train, label=y_train)
test_data = lgb.Dataset(X_test, label=y_test, reference=train_data)

num_round = 100  # Number of boosting rounds
bst = lgb.train(params, train_data, num_round, valid_sets=[test_data])


Predictions and Evaluation:

Using the trained model, we predict the test data and compute the accuracy score to assess the model's performance.

  • bst.predict(): Generates predictions for the test set.
  • accuracy_score(): Computes the accuracy of the model by comparing the predicted labels (y_pred_max) with the true labels (y_test).
Python3
# Predict class probabilities with the best iteration found during training
y_pred = bst.predict(X_test, num_iteration=bst.best_iteration)
# Convert probabilities to class labels by taking the most probable class per row
y_pred_max = y_pred.argmax(axis=1)

accuracy = accuracy_score(y_test, y_pred_max)
print(f'Accuracy: {accuracy * 100:.2f}%')
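For a multiclass objective, bst.predict() returns one row per sample and one column per class, so the predicted label is the index of the largest value in each row. The sketch below shows that conversion on a small probability matrix; the numbers are made up for illustration and do not come from the model.

```python
import numpy as np

# Made-up class probabilities for three samples (rows) and three classes (columns)
proba = np.array([
    [0.90, 0.07, 0.03],
    [0.10, 0.25, 0.65],
    [0.20, 0.70, 0.10],
])

# The predicted class is the column index with the highest probability
labels = proba.argmax(axis=1)
print(labels)  # [0 2 1]

# Accuracy is the share of predicted labels that match the true labels
y_true = np.array([0, 2, 2])
accuracy = (labels == y_true).mean()
print(f"Accuracy: {accuracy * 100:.2f}%")  # Accuracy: 66.67%
```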

Output:

Accuracy: 98.45%

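The reported accuracy is 98.45%. Note that the test set here holds only 30 samples, so accuracy can change only in steps of 1/30 (about 3.33 percentage points); a run of the code above will therefore print a value such as 96.67% or 100.00%, and the exact figure may vary with the LightGBM version.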

Conclusion

LightGBM is a powerful gradient boosting framework that can be used for a variety of machine learning tasks. By tuning its learning control parameters, you can trade off accuracy, training time, and generalization on your specific dataset. Depth limits, regularization terms, leaf and split thresholds, subsampling fractions, and early stopping all act on model complexity, so adjusting them thoughtfully is one of the most direct ways to improve a LightGBM model.


sirvinaysy60t