Decision Tree Algorithms

Last Updated : 02 Jun, 2025

Decision trees are widely used machine learning algorithms that can be applied to both classification and regression tasks. These models work by repeatedly splitting the data into subsets based on feature values; each internal node represents such a decision, each leaf node provides a prediction, and the splits together form a tree-like structure. Decision trees are popular because they are easy to interpret and visualize, which makes the decision-making process transparent.

In machine learning, there are various types of decision tree algorithms. In this article, we'll explore these types so that you can choose the most appropriate one for your task.

Types of Decision Tree Algorithms

Six decision tree algorithms, shown in the diagram below, are commonly distinguished. Each has its own advantages and limitations. Let's go through them one by one:

Figure: Types of Decision Tree Algorithms

1. ID3 (Iterative Dichotomiser 3)

ID3 is a classic decision tree algorithm commonly used for classification tasks. It works by greedily choosing the feature that maximizes the information gain at each node. It calculates entropy and information gain for each feature and selects the feature with the highest information gain for splitting.

Entropy: It measures the impurity of the dataset. For a dataset D with n classes it is denoted H(D) and calculated as:

H(D) = -\sum_{i=1}^{n} p_{i}\, log_{2}(p_{i})

where p_i is the proportion of examples belonging to class i.

Information gain: It quantifies the reduction in entropy after splitting the dataset on a feature with V distinct values:

Information\;Gain = H(D) - \sum_{v=1}^{V} \frac{|D_{v}|}{|D|} H(D_{v})

ID3 recursively splits the dataset using the feature with the highest information gain until all examples in a node belong to the same class or no features remain to split on. After the tree is constructed, branches that don't significantly improve accuracy can be pruned to reduce overfitting. Even so, ID3 tends to overfit the training data and cannot directly handle continuous attributes. These issues are addressed by later algorithms such as C4.5 and CART.
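The entropy and information-gain calculations above can be sketched in a few lines of NumPy. This is an illustrative implementation of the two formulas, not a full ID3 tree builder:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(D) = -sum(p_i * log2(p_i)) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    """Entropy of the parent minus the weighted entropy of the child subsets."""
    total = len(labels)
    weighted_child_entropy = sum(
        (feature_values == v).sum() / total * entropy(labels[feature_values == v])
        for v in np.unique(feature_values)
    )
    return entropy(labels) - weighted_child_entropy

# Toy example: a binary feature that perfectly separates two classes
y = np.array([0, 0, 1, 1])
x = np.array(["a", "a", "b", "b"])
print(information_gain(y, x))  # 1.0 -- the split removes all uncertainty
```

ID3 would evaluate `information_gain` for every candidate feature at a node and split on the one with the largest value.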

For its implementation you can refer to the article: Iterative Dichotomiser 3 (ID3) Algorithm From Scratch

2. C4.5

C4.5 uses a modified version of information gain called the gain ratio to reduce the bias towards features with many values. The gain ratio divides the information gain by the split information (also called intrinsic information), which measures the entropy of the partition produced by the attribute's values:

Gain\;Ratio = \frac{Information\;Gain}{Split\;Information}
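As a sketch of this idea (helper names here are illustrative, not C4.5's actual code), the gain ratio for a categorical feature can be computed like this:

```python
import numpy as np

def _entropy(proportions):
    """Entropy of a probability vector, ignoring zero entries."""
    p = proportions[proportions > 0]
    return -np.sum(p * np.log2(p))

def gain_ratio(labels, feature_values):
    """Information gain divided by split information (intrinsic information)."""
    values, counts = np.unique(feature_values, return_counts=True)
    weights = counts / counts.sum()

    def label_entropy(subset):
        _, c = np.unique(subset, return_counts=True)
        return _entropy(c / c.sum())

    child_entropy = sum(w * label_entropy(labels[feature_values == v])
                        for v, w in zip(values, weights))
    info_gain = label_entropy(labels) - child_entropy
    split_info = _entropy(weights)  # grows when the feature has many values
    return info_gain / split_info if split_info > 0 else 0.0

y = np.array([0, 0, 1, 1])
print(gain_ratio(y, np.array(["a", "a", "b", "b"])))  # 1.0: IG 1.0 / split info 1.0
print(gain_ratio(y, np.array(["a", "b", "c", "d"])))  # 0.5: same IG, penalized split
```

Both features separate the classes perfectly (information gain 1.0), but the four-valued feature is penalized by its larger split information, which is exactly the bias correction C4.5 is after.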

  • It addresses several limitations of ID3 including its inability to handle continuous attributes and its tendency to overfit the training set. It handles continuous attributes by first sorting the attribute values and then selecting the midpoint between adjacent values as a potential split point. The split that maximizes information gain or gain ratio is chosen.
  • It can also generate rules from the decision tree by converting each path from the root to a leaf into a rule, which can be used to make predictions on new data.
  • This algorithm improves accuracy and reduces overfitting by using gain ratio and post-pruning. While effective for both discrete and continuous attributes, C4.5 may still struggle with noisy data and large feature sets.

C4.5 has limitations:

  • It can be prone to overfitting, especially on noisy datasets, even when pruning is applied.
  • Performance may degrade when dealing with datasets that have many features.

3. CART (Classification and Regression Trees)

CART is a widely used decision tree algorithm that is used for classification and regression tasks.

  • For classification, CART splits data based on the Gini impurity, which measures the probability that a randomly selected sample would be misclassified according to the node's class distribution. The feature that minimizes the Gini impurity is selected for splitting at each node. The formula is:

Gini(D) = 1 - \sum_{i=1}^{n} p^2_{i}

where p_i is the probability of class i in dataset D.

  • For regression, CART builds trees by minimizing the variance (equivalently, the mean squared error) of the target variable within each subset. The split that reduces the variance the most is chosen.

To reduce overfitting, CART applies cost-complexity pruning after the tree is constructed: it minimizes a cost function that adds a complexity penalty, proportional to the number of leaves, to the impurity measure. CART builds binary trees, where each internal node has exactly two children, which simplifies the splitting process and makes the resulting tree easier to interpret.
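A minimal sketch of both ideas, assuming scikit-learn is available: the `gini` helper mirrors the formula above, and `ccp_alpha` is scikit-learn's cost-complexity pruning parameter.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

X, y = load_iris(return_X_y=True)
print(gini(y))  # three equal classes: 1 - 3*(1/3)^2 ~= 0.667

# ccp_alpha > 0 enables cost-complexity pruning after the tree is grown
full = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.02,
                                random_state=0).fit(X, y)
print(full.get_n_leaves(), pruned.get_n_leaves())  # pruned tree has no more leaves
```

Raising `ccp_alpha` increases the per-leaf penalty, so subtrees whose impurity reduction does not justify their size get collapsed.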

For its implementation you can refer to the article: Implementing CART (Classification And Regression Tree) in Python

4. CHAID (Chi-Square Automatic Interaction Detection)

CHAID uses chi-square tests to determine the best splits, especially for categorical variables. It recursively divides the data into smaller subsets until each subset contains only data points of the same class or within a specified range of values. At each node it chooses the feature with the highest chi-square statistic, which indicates the strongest relationship with the target variable. This approach is particularly useful for analyzing large datasets with many categorical features. The chi-square statistic is:

X^2 = \Sigma \frac{(O_{i} - E_{i})^2}{E_{i}}

Where:

  • O_i represents the observed frequency
  • E_i represents the expected frequency in each category.

The test compares the observed distribution to the expected distribution to determine whether there is a significant difference. CHAID can be applied to both classification and regression tasks. In classification, the algorithm assigns a class label to a new data point by following the tree from the root to a leaf and assigning that leaf's class label. In regression, it predicts the target variable by averaging the values at the leaf node.
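The chi-square computation CHAID relies on can be illustrated with SciPy. This is a sketch with made-up counts, not a full CHAID implementation; `correction=False` disables Yates' continuity correction so the result matches the plain formula above:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = feature categories, cols = class labels
observed = np.array([[30, 10],
                     [10, 30]])

chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

# Same statistic computed directly from X^2 = sum((O - E)^2 / E)
manual = np.sum((observed - expected) ** 2 / expected)
print(chi2, manual)  # both 20.0; the p-value is far below 0.001
```

A large statistic (small p-value) means the feature's categories are strongly associated with the class labels, so CHAID would favor splitting on it.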

5. MARS (Multivariate Adaptive Regression Splines)

MARS is an extension of the CART algorithm. It uses splines to model non-linear relationships between variables. It constructs a piecewise linear model where the relationship between the input and output variables is linear but with variable slopes at different points, known as knots. It automatically selects and positions these knots based on the data distribution and the need to capture non-linearities.

Basis Functions: Each basis function in MARS is a simple piecewise linear function defined relative to a knot. For a knot t, the function is:

h(x) = \begin{cases} x - t & \text{if } x > t \\ t - x & \text{if } x \leq t \end{cases}

Where:

  • x is a predictor variable
  • t is the knot location.

Knot Function: The knots are the points where the piecewise linear functions connect. MARS places these knots to best represent the data's non-linear structure.

MARS begins with a simple model containing a single piece (an intercept) and then applies forward stepwise selection to iteratively add basis functions that reduce the error. The process continues until the model reaches a desired complexity. It is particularly effective for modeling complex relationships in data and is widely used in regression tasks.
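In practice each knot contributes a mirrored pair of truncated "hinge" functions, max(0, x - t) and max(0, t - x), which can be written directly in NumPy. This is an illustrative sketch of the basis functions only; dedicated packages implement the full forward/backward MARS fitting procedure:

```python
import numpy as np

def hinge_pair(x, t):
    """The two mirrored hinge basis functions for a knot at t."""
    return np.maximum(0.0, x - t), np.maximum(0.0, t - x)

x = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
right, left = hinge_pair(x, 5.0)
print(right)  # [0. 0. 0. 2. 4.]
print(left)   # [4. 2. 0. 0. 0.]

# A fitted MARS model is a weighted sum of such hinges plus an intercept,
# e.g. (coefficients here are made up for illustration):
y_hat = 2.0 + 0.5 * right - 0.3 * left
```

Because each hinge is zero on one side of its knot, the weighted sum is linear between knots with a slope that changes at each knot, which is exactly the piecewise linear structure described above.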

6. Conditional Inference Trees

Conditional Inference Trees use statistical tests to choose splits based on the relationship between features and the target variable. They use permutation tests to select the splitting feature while minimizing selection bias.

The algorithm follows a recursive approach. At each node it evaluates the statistical significance of potential splits using tests like the Chi-squared test for categorical features and the F-test for continuous features. The feature with the strongest relationship to the target is selected for the split. The process continues until the data cannot be further split or meets predefined stopping criteria.
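A toy permutation test conveys the core idea. This is only a sketch using absolute correlation as the test statistic; real implementations (such as R's partykit `ctree`) use a more general conditional inference framework:

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_p_value(feature, target, n_perm=2000):
    """P-value for feature/target association, using |correlation| as the statistic."""
    observed = abs(np.corrcoef(feature, target)[0, 1])
    exceed = sum(
        abs(np.corrcoef(feature, rng.permutation(target))[0, 1]) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)  # add-one to avoid a zero p-value

x = np.arange(30, dtype=float)
y_strong = x + rng.normal(0, 1, 30)   # strongly associated with x
y_noise = rng.normal(0, 1, 30)        # unrelated to x

p_strong = permutation_p_value(x, y_strong)
p_noise = permutation_p_value(x, y_noise)
print(p_strong, p_noise)  # the associated target gets a far smaller p-value
```

At each node, a conditional inference tree would compute such a p-value for every candidate feature and split on the most significant one, stopping when no test is significant, which is what makes the splits unbiased.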

Summarizing all Algorithms

Here’s a short summary of all decision tree algorithms we have learned so far:

  1. ID3: Uses information gain to split data and works well for classification, but it is prone to overfitting and struggles with continuous data.
  2. C4.5: An advanced version of ID3 that uses the gain ratio and handles both discrete and continuous data, but it can struggle with noisy data.
  3. CART: Used for both classification and regression tasks. It minimizes Gini impurity for classification and MSE for regression, with pruning to prevent overfitting.
  4. CHAID: Uses chi-square tests for splitting and is effective for large categorical datasets, but not for continuous data.
  5. MARS: An extension of CART that uses piecewise linear functions to model non-linear relationships, but it is computationally expensive.
  6. Conditional Inference Trees: Use statistical hypothesis testing for unbiased splits and handle various data types, but are slower than the others.

Decision tree algorithms provide a flexible approach to both classification and regression tasks. Each algorithm brings its own strengths, and understanding how each works is important for selecting the one that gives the best accuracy on a given problem.

