Iterative Dichotomiser 3 (ID3) Algorithm From Scratch

Last Updated : 02 Jan, 2024

In machine learning and data mining, decision trees are versatile tools for classification and prediction tasks. The ID3 (Iterative Dichotomiser 3) algorithm is one of the foundational pillars on which decision tree learning is built. Developed by Ross Quinlan in the 1980s, ID3 remains a fundamental algorithm and is the direct predecessor of C4.5; it is usually discussed alongside related tree-based methods such as CART (Classification and Regression Trees).

Introduction to Decision Trees

Decision trees are machine learning models that recursively partition the input data according to feature values in order to reach a decision. Every internal node represents a test on a feature, and every branch denotes a possible outcome of that test, which makes the tree structure simple to interpret and visualize. Every leaf node produces a final decision or prediction. At each step of construction, the best feature is chosen so as to maximize information gain or, equivalently, minimize impurity. Decision trees are adaptable and can be used for both regression and classification tasks. Although they can overfit, this is frequently mitigated by strategies such as pruning.

Decision Trees

Before delving into the intricacies of the ID3 algorithm, let's grasp the essence of decision trees. Picture a tree-like structure where each internal node represents a test on an attribute, each branch signifies an outcome of that test, and each leaf node denotes a class label or a decision. Decision trees mimic human decision-making processes by recursively splitting data based on different attributes to create a flowchart-like structure for classification or regression.

ID3 Algorithm

The Iterative Dichotomiser 3 (ID3) algorithm is a well-known decision tree learning method. It recursively constructs a tree by choosing, at each node, the attribute that yields the highest information gain and partitioning the data on that attribute. The goal is to make the resulting subsets as homogeneous as possible, i.e. to maximally reduce entropy at every split. The procedure continues until a stopping criterion is met, such as all instances in a node sharing the same class, a minimum subset size, or a maximum tree depth. Although ID3 is a foundational method, later algorithms such as C4.5 and CART have addressed several of its limitations.

How ID3 Works

The ID3 algorithm is specifically designed for building decision trees from a given dataset. Its primary objective is to construct a tree that best explains the relationship between attributes in the data and their corresponding class labels.

1. Selecting the Best Attribute

  • ID3 employs the concept of entropy and information gain to determine the attribute that best separates the data. Entropy measures the impurity or randomness in the dataset.
  • The algorithm evaluates each attribute by the information gain that splitting on it would produce and selects the attribute with the largest gain for splitting the data.

2. Creating Tree Nodes

  • The chosen attribute is used to split the dataset into subsets based on its distinct values.
  • For each subset, ID3 recurses to find the next best attribute to further partition the data, forming branches and new nodes accordingly.

3. Stopping Criteria

  • The recursion continues until one of the stopping criteria is met, such as when all instances in a branch belong to the same class or when all attributes have been used for splitting.

4. Handling Missing Values

  • Basic ID3 does not define a built-in mechanism for missing attribute values; in practice they are handled by strategies such as attribute mean/mode substitution or assigning the majority value, and successors such as C4.5 handle them more systematically.

5. Tree Pruning

  • Pruning is a technique to prevent overfitting. While not directly included in ID3, post-processing techniques or variations like C4.5 incorporate pruning to improve the tree's generalization.
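
Putting the steps above together, the core of classic ID3 (multiway splits on categorical attributes, no pruning) can be sketched in a few lines of Python. This is a simplified illustration, not the article's implementation; the full NumPy-based version with binary threshold splits follows later, and the attribute names and toy data here are made up for the example.

Python3
from collections import Counter
import math

def entropy(labels):
    # H(S) = -sum(p_i * log2(p_i)) over the class proportions
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def id3(rows, labels, attributes):
    # All examples share one class: return a leaf with that class
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to split on: return the majority class
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    def info_gain(attr):
        # Entropy before the split minus the weighted entropy of the subsets
        remainder = 0.0
        for v in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    # Choose the attribute with the highest information gain and branch on each of its values
    best = max(attributes, key=info_gain)
    tree = {best: {}}
    for v in set(row[best] for row in rows):
        sub_rows = [row for row in rows if row[best] == v]
        sub_labels = [lab for row, lab in zip(rows, labels) if row[best] == v]
        tree[best][v] = id3(sub_rows, sub_labels, [a for a in attributes if a != best])
    return tree

# Toy categorical data (illustrative): whether to play based on the outlook
rows = [{"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "sunny"}, {"outlook": "overcast"}]
labels = ["no", "yes", "no", "yes"]
print(id3(rows, labels, ["outlook"]))
# {'outlook': {'sunny': 'no', 'rain': 'yes', 'overcast': 'yes'}}  (key order may vary)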

Mathematical Concepts of ID3 Algorithm

Now let's examine the formulas linked to the main theoretical ideas in the ID3 algorithm:

1. Entropy

Entropy measures the disorder or uncertainty in a set of data. In ID3 it quantifies the impurity of a dataset: the objective is to split the data into subsets that are as homogeneous as possible, i.e. subsets with low entropy.

For a set S with classes {c1, c2, ..., cn}, the entropy is calculated as:

H(S) = -\sum_{i=1}^{n} p_i \log_2(p_i)

where p_i is the proportion of instances of class c_i in the set S.
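
As a quick worked example (the numbers are illustrative, not taken from the article's dataset), consider a set of 10 instances with 6 in one class and 4 in the other:

Python3
import numpy as np

# Class proportions for a set of 10 instances: 6 positive, 4 negative
p_pos, p_neg = 6 / 10, 4 / 10
H = -(p_pos * np.log2(p_pos) + p_neg * np.log2(p_neg))
print(round(H, 4))  # 0.971 bits -- close to 1 because the two classes are nearly balanced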

2. Information Gain

Information Gain measures how much a given attribute reduces uncertainty. At each stage, ID3 splits the data on the attribute that maximizes Information Gain, which is computed as the difference between the entropy before the split and the weighted entropy after it.

Information Gain measures the effectiveness of an attribute A in reducing uncertainty in set S.

IG(A, S) = H(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} \cdot H(S_v)

where |S_v| is the size of the subset of S for which attribute A has the value v.
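
Continuing the illustrative example above, suppose an attribute A splits those 10 instances into a subset of 6 (5 positive, 1 negative) and a subset of 4 (1 positive, 3 negative). The counts are hypothetical and chosen only to show the calculation:

Python3
import numpy as np

def entropy_from_counts(counts):
    # Entropy of a class distribution given raw class counts
    probs = np.array(counts) / np.sum(counts)
    return -np.sum([p * np.log2(p) for p in probs if p > 0])

H_parent = entropy_from_counts([6, 4])   # entropy before the split
H_left = entropy_from_counts([5, 1])     # subset where A = v1 (6 instances)
H_right = entropy_from_counts([1, 3])    # subset where A = v2 (4 instances)
IG = H_parent - (6 / 10) * H_left - (4 / 10) * H_right
print(round(IG, 4))  # about 0.256 -- the split removes roughly a quarter of a bit of uncertainty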

3. Gain Ratio

Gain Ratio is a refinement of Information Gain that corrects for its bias towards attributes with many distinct values. It normalizes the gain by the intrinsic information (split information) of the attribute.

GR(A, S) = \frac{IG(A, S)}{-\sum_{v \in values(A)} \frac{|S_v|}{|S|} \cdot \log_2\left(\frac{|S_v|}{|S|}\right)}
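
For the same hypothetical split used above, the denominator (the split information) and the resulting Gain Ratio can be computed as follows:

Python3
import numpy as np

IG = 0.2564                          # information gain from the previous example (rounded)
proportions = np.array([6, 4]) / 10  # |S_v| / |S| for the two subsets
split_info = -np.sum(proportions * np.log2(proportions))  # intrinsic information of the split
gain_ratio = IG / split_info
print(round(gain_ratio, 4))  # about 0.264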

Iterative Dichotomiser 3 (ID3) Implementation using Python

Let's create a simplified version of the ID3 algorithm from scratch using Python.

Importing Libraries

Importing the necessary libraries:

Python3
from collections import Counter
import numpy as np
  • collections for the Counter class to count occurrences.
  • numpy as np for numerical operations and array handling.

Defining Node Class

Python3
class Node:
    def __init__(self, feature=None, value=None, results=None, true_branch=None, false_branch=None):
        self.feature = feature            # Feature to split on
        self.value = value                # Value of the feature to split on
        self.results = results            # Stores class labels if node is a leaf node
        self.true_branch = true_branch    # Branch for values that are True for the feature
        self.false_branch = false_branch  # Branch for values that are False for the feature

The provided Python code defines a class called Node for constructing nodes in a decision tree. Each node encapsulates information crucial for decision-making within the tree. The feature attribute signifies the feature used for splitting, while value stores the specific value of that feature for the split. In the case of a leaf node, results holds class labels. The node also has branches, with true_branch representing the path for values evaluating to True for the feature, and false_branch for values evaluating to False. This class forms a fundamental building block for creating decision trees, enabling the representation of decision points and outcomes in a hierarchical structure.

Entropy Calculation Function

Python3
def entropy(data):
    counts = np.bincount(data)
    probabilities = counts / len(data)
    entropy = -np.sum([p * np.log2(p) for p in probabilities if p > 0])
    return entropy

The entropy function calculates the entropy of a given dataset using the formula for information entropy. It first computes the counts of occurrences for each unique element in the dataset using np.bincount. Then, it calculates the probabilities of each element and uses these probabilities to compute the entropy with the standard formula -\sum_i p_i \cdot \log_2(p_i). The function ensures that the logarithm is not taken for zero probabilities, avoiding mathematical errors. The result is the entropy value for the input dataset, reflecting its degree of disorder or uncertainty.
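
As a quick sanity check (assuming the entropy function above and the numpy import have been run), it can be applied to the toy labels used later in the article:

Python3
print(entropy(np.array([1, 1, 0, 0])))  # 1.0 -- two classes in equal proportions
print(entropy(np.array([1, 1, 1, 1])))  # -0.0, i.e. zero -- a pure set has no uncertainty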

Splitting Data Function

Python3
def split_data(X, y, feature, value):
    true_indices = np.where(X[:, feature] <= value)[0]
    false_indices = np.where(X[:, feature] > value)[0]
    true_X, true_y = X[true_indices], y[true_indices]
    false_X, false_y = X[false_indices], y[false_indices]
    return true_X, true_y, false_X, false_y

The split_data function divides a dataset into two subsets based on a specified feature and threshold value. It uses NumPy to identify indices where the feature values satisfy the condition (<= value for the true branch and > value for the false branch). Then, it extracts the corresponding subsets for features (true_X and false_X) and labels (true_y and false_y). The function returns these subsets, enabling the partitioning of data for further use in constructing a decision tree.
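
For example (using the toy dataset defined later in the article and assuming numpy is imported as np), splitting on feature 0 with a threshold of 0 separates the two classes cleanly:

Python3
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 0, 0])

# Rows where X[:, 0] <= 0 go to the "true" side, the rest to the "false" side
true_X, true_y, false_X, false_y = split_data(X, y, feature=0, value=0)
print(true_y)   # [0 0] -- the samples whose first feature is 0
print(false_y)  # [1 1] -- the samples whose first feature is 1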

Building the Tree Function

Python3
def build_tree(X, y):
    # If all labels are identical, return a leaf node with that class
    if len(set(y)) == 1:
        return Node(results=y[0])

    best_gain = 0
    best_criteria = None
    best_sets = None
    n_features = X.shape[1]

    current_entropy = entropy(y)

    # Try every (feature, value) pair and keep the split with the highest information gain
    for feature in range(n_features):
        feature_values = set(X[:, feature])
        for value in feature_values:
            true_X, true_y, false_X, false_y = split_data(X, y, feature, value)
            true_entropy = entropy(true_y)
            false_entropy = entropy(false_y)
            p = len(true_y) / len(y)
            gain = current_entropy - p * true_entropy - (1 - p) * false_entropy

            if gain > best_gain:
                best_gain = gain
                best_criteria = (feature, value)
                best_sets = (true_X, true_y, false_X, false_y)

    if best_gain > 0:
        true_branch = build_tree(best_sets[0], best_sets[1])
        false_branch = build_tree(best_sets[2], best_sets[3])
        return Node(feature=best_criteria[0], value=best_criteria[1],
                    true_branch=true_branch, false_branch=false_branch)

    # No split improves the entropy: return a leaf with the majority class
    return Node(results=Counter(y).most_common(1)[0][0])

The build_tree function recursively constructs a decision tree using the ID3 algorithm. It first checks if the labels in the current subset are homogenous; if so, it creates a leaf node with the corresponding class label. Otherwise, it iterates through all features and values, calculating information gain for each split and identifying the one with the highest gain. The function then recursively calls itself to build the true and false branches using the best split criteria. The resulting decision tree is constructed and returned. The process continues until further splits do not yield positive information gain, resulting in the creation of leaf nodes.
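
A small helper like the one below (not part of the article's original code; the name print_tree is just illustrative) can make it easier to inspect the structure that build_tree returns:

Python3
def print_tree(node, indent=""):
    # Leaf node: print the stored class label
    if node.results is not None:
        print(indent + f"Predict: {node.results}")
        return
    # Internal node: print the split condition and recurse into both branches
    print(indent + f"Is feature {node.feature} <= {node.value}?")
    print(indent + "--> True:")
    print_tree(node.true_branch, indent + "    ")
    print(indent + "--> False:")
    print_tree(node.false_branch, indent + "    ")

Calling print_tree(decision_tree) on the tree built in the next section would show a single split on feature 0 with two leaf nodes.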

Prediction Function

Python3
def predict(tree, sample):
    if tree.results is not None:
        return tree.results
    else:
        branch = tree.false_branch
        if sample[tree.feature] <= tree.value:
            branch = tree.true_branch
        return predict(branch, sample)

The predict function uses a trained decision tree to predict the class label for a given sample. It recursively navigates the tree, checking whether the current node is a leaf (indicated by a non-None results attribute). If it is a leaf, it returns the stored class label. Otherwise, it determines which branch to traverse by comparing the sample's value for the node's feature against the node's splitting threshold, then calls itself on that branch until a leaf node is reached, yielding the final predicted class label for the input sample.
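
Since predict handles one sample at a time, a thin wrapper (a hypothetical helper, not part of the original article) can be used to label a whole feature matrix:

Python3
def predict_batch(tree, X):
    # Apply the single-sample predict function to every row of the feature matrix
    return np.array([predict(tree, sample) for sample in X])

For the toy dataset below, predict_batch(decision_tree, X) would return [1 1 0 0], matching the training labels.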

Dataset and Tree Building

Python3
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 0, 0])

# Building the tree
decision_tree = build_tree(X, y)

The code creates a dataset X with binary features and their corresponding labels y. Then, it constructs a decision tree using the build_tree function, which recursively builds the tree using the ID3 algorithm based on the provided dataset. The resulting decision_tree is the root node of the constructed decision tree.

Prediction

Python3
sample = np.array([1, 0])
prediction = predict(decision_tree, sample)
print(f"Prediction for sample {sample}: {prediction}")

Output:

Prediction for sample [1 0]: 1
  • Predicts the class label for the sample using the built decision tree and prints the prediction.
  • If we want to predict the class label for the sample [1, 0], the algorithm traverses the decision tree from the root node. The learned split is on feature 0 with threshold 0; because the sample's feature 0 is 1, which is greater than 0, the false branch is followed and the prediction is 1 (class 1).

Advantages and Limitations of ID3

Advantages

  • Interpretability: Decision trees generated by ID3 are easily interpretable, making them suitable for explaining decisions to non-technical stakeholders.
  • Handles Categorical Data: ID3 can effectively handle categorical attributes without requiring explicit data preprocessing steps.
  • Computationally Inexpensive: The algorithm is relatively straightforward and computationally less expensive compared to some complex models.

Limitations

  • Overfitting: ID3 tends to create complex trees that may overfit the training data, impacting generalization to unseen instances.
  • Sensitive to Noise: Noise or outliers in the data can lead to the creation of non-optimal or incorrect splits.
  • Limited Split Types: classic ID3 performs multiway splits on categorical attributes and does not handle continuous features directly; the simplified implementation above uses only binary threshold splits, which limits its ability to represent more complex relationships in the data within a single split.

Conclusion

The ID3 algorithm laid the groundwork for decision tree learning, providing a robust framework for understanding attribute selection and recursive partitioning. Despite its limitations, ID3's simplicity and interpretability have paved the way for more sophisticated algorithms that address its drawbacks while retaining its essence.

As machine learning continues to evolve, the ID3 algorithm remains a crucial piece in the mosaic of tree-based methods, serving as a stepping stone for developing more advanced and accurate models in the quest for efficient data analysis and pattern recognition.

