Tree Entropy in R Programming

Last Updated : 25 Aug, 2020

In decision trees, entropy is a measure of the impurity, or uncertainty, present in the data. It is the deciding quantity when a decision tree splits the data. A pure sample, in which every observation belongs to the same class, has an entropy of zero, while a sample split equally between two classes has an entropy of one (using base-2 logarithms). Two major factors considered when choosing a split are information gain (IG) and entropy.

Formula :

Entropy = - \sum_{x} p(x) \log_{2} p(x)

where p(x) is the probability (proportion of observations) of class x
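The formula can be sketched in R as a small helper. The function name `entropy` and the use of `table()` to estimate class proportions are illustrative assumptions, not something the article prescribes.

```r
# Shannon entropy (base 2) of a vector of class labels.
# `entropy` is a hypothetical helper name chosen for this sketch.
entropy <- function(labels) {
  p <- table(labels) / length(labels)  # proportion of each class
  p <- p[p > 0]                        # drop empty classes; 0 * log2(0) is treated as 0
  -sum(p * log2(p))
}

pure_sample <- entropy(c("bad", "bad", "bad", "bad"))    # one class only
even_split  <- entropy(c("good", "good", "bad", "bad"))  # two classes, equal shares

print(pure_sample)
print(even_split)
```

The two calls illustrate the boundary cases: a sample containing a single class has entropy 0, and a sample divided equally between two classes has entropy 1.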

For example, consider a school data set of a decision tree whose entropy needs to be calculated.

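| Library available | Coaching joined | Parent's education | Student's performance |
|-------------------|-----------------|--------------------|-----------------------|
| yes               | yes             | uneducated         | bad                   |
| yes               | no              | uneducated         | bad                   |
| no                | no              | educated           | good                  |
| no                | no              | uneducated         | bad                   |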

In this data set, a student's performance is taken to depend on three factors: library available, coaching joined, and parent's education. A decision tree can be built from these three variables to predict the student's performance, so they are called predictor variables. A variable that carries more information about the outcome is a better candidate for splitting the decision tree.

To calculate the entropy of the parent node, Student's performance, the entropy formula above is used, but the class probabilities must be calculated first.

There are four values in the Student's performance column, of which one is good and three are bad.

P_{good} = \frac{\text {good performance in sample}}{\text {total performance in sample}} = \frac{1}{4} = 0.25 \newline \newline P_{bad} = \frac{\text {bad performance in sample}}{\text {total performance in sample}} = \frac{3}{4} = 0.75

Hence, total entropy of parent can be calculated as below

Entropy = - \sum p(x) \log_{2} p(x) = - (P_{good} \log_{2}{P_{good}} + P_{bad} \log_{2}{P_{bad}}) = - (0.25 \log_{2}{0.25} + 0.75 \log_{2}{0.75}) = 0.811
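The parent entropy can be checked with a few lines of R. This is a minimal sketch: the data frame `school`, its column names, and the helper `entropy` are names assumed here for illustration, while the values come from the table above.

```r
# The school data set from the table, with illustrative column names.
school <- data.frame(
  library_available = c("yes", "yes", "no", "no"),
  coaching_joined   = c("yes", "no", "no", "no"),
  parent_education  = c("uneducated", "uneducated", "educated", "uneducated"),
  performance       = c("bad", "bad", "good", "bad")
)

# Shannon entropy (base 2) of a vector of class labels (hypothetical helper).
entropy <- function(labels) {
  p <- table(labels) / length(labels)  # class proportions
  p <- p[p > 0]                        # ignore empty classes
  -sum(p * log2(p))
}

class_probs    <- table(school$performance) / nrow(school)  # bad = 0.75, good = 0.25
parent_entropy <- entropy(school$performance)

print(class_probs)
print(round(parent_entropy, 3))  # 0.811
```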

Information Gain using Entropy

Information gain is the quantity used to decide the best variable for splitting the data at each node of the decision tree. The IG of every predictor variable is calculated, and the variable with the highest IG is chosen to split the node, starting with the root node.

Formula:
Information Gain (IG) = Entropy_{parent} - (\text{weighted average} \times Entropy_{children})

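To calculate the IG of the predictor variable coaching joined, first split the parent node according to this variable.

The split produces two parts: the left part holds the three students who did not join coaching, and the right part holds the one student who did. The entropy of each part is calculated individually.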

The entropy of the left part

There are two possible outcomes, good and bad. The left part has three outcomes in total, two bad and one good. Hence, P_{good} and P_{bad} are calculated again as follows:

P_{good} = \frac {1}{3} \approx 0.333 \newline \newline P_{bad} = \frac {2}{3} \approx 0.667 \newline \newline Entropy_{left} = -(0.667 \log_2{0.667} + 0.333 \log_2{0.333}) = 0.918 \approx 0.9 \text{ (rounded to one decimal place)}

The entropy of the right part


The right part contains only one outcome, i.e., bad performance. Hence, its probability is one, and the entropy is 0 because there is only one category to which the output can belong.
 

Calculating the weighted average with Entropy of children

\text{weighted average} \times Entropy_{children} = \frac{\text{no. of outcomes in left child node}}{\text{total outcomes in parent node}} \times {entropy_\text{left node}} + \frac{\text{no. of outcomes in right child node}} {\text{total outcomes in parent node}} \times {entropy_{\text{right node}}}

There are 3 outcomes in the left child node and 1 in the right child node. Entropy_{left node} has been calculated as 0.9 and Entropy_{right node} is 0. Substituting these values into the formula above gives the weighted average for this example:
 

\text{weighted average} \times Entropy_{children} = \frac{3}{4} \times 0.9 + \frac{1} {4} \times 0 = 0.675
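The split and the weighted average can be reproduced in R as a rough check. The vector names and the helper `entropy` are assumptions made for this sketch. The code keeps the left-node entropy unrounded (about 0.918), so it gives a weighted average of about 0.689 rather than the 0.675 obtained from the rounded value 0.9.

```r
# Outcome and splitting variable, taken from the school table.
performance     <- c("bad", "bad", "good", "bad")
coaching_joined <- c("yes", "no", "no", "no")

# Shannon entropy (base 2) of a vector of class labels (hypothetical helper).
entropy <- function(labels) {
  p <- table(labels) / length(labels)  # class proportions
  p <- p[p > 0]                        # ignore empty classes
  -sum(p * log2(p))
}

# Split the outcomes by the predictor: "no" is the left part, "yes" the right.
children <- split(performance, coaching_joined)

child_entropy <- sapply(children, entropy)                      # entropy of each part
child_weight  <- sapply(children, length) / length(performance) # share of outcomes in each part

weighted_entropy <- sum(child_weight * child_entropy)

print(round(child_entropy, 3))     # no: 0.918, yes: 0
print(child_weight)                # no: 0.75,  yes: 0.25
print(round(weighted_entropy, 3))  # 0.689
```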

Calculating IG


Substituting the calculated weighted average into the IG formula gives the IG of 'coaching joined'.
 

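IG(coaching joined) = Entropy_{parent} - (\text{weighted average} \times Entropy_{children})

IG(coaching joined) = 0.811 - 0.675 = 0.136

This value relies on the left-node entropy rounded to 0.9. With the unrounded entropy of about 0.918, the weighted average is about 0.689 and the IG is about 0.123.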


Using the same steps and formula, the IG of the other predictor variables is calculated and compared, and the variable with the highest IG is selected for splitting the data at that node.
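The comparison across predictors can be sketched in R as follows. The names `school`, `entropy`, and `information_gain`, as well as the column names, are hypothetical choices for this example. The code does not round intermediate values, so its figures can differ slightly from hand calculations that round along the way.

```r
# The school data set from the table, with illustrative column names.
school <- data.frame(
  library_available = c("yes", "yes", "no", "no"),
  coaching_joined   = c("yes", "no", "no", "no"),
  parent_education  = c("uneducated", "uneducated", "educated", "uneducated"),
  performance       = c("bad", "bad", "good", "bad")
)

# Shannon entropy (base 2) of a vector of class labels (hypothetical helper).
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# IG = parent entropy minus the weighted average of the child entropies.
information_gain <- function(data, predictor, target) {
  children <- split(data[[target]], data[[predictor]])
  weights  <- sapply(children, length) / nrow(data)
  entropy(data[[target]]) - sum(weights * sapply(children, entropy))
}

predictors <- c("library_available", "coaching_joined", "parent_education")
gains <- sapply(predictors, information_gain, data = school, target = "performance")

print(round(gains, 3))
best_predictor <- names(which.max(gains))
print(best_predictor)
```

On this four-row table, splitting on parent's education separates the good and bad outcomes completely, so it has the highest gain and would be chosen for the root split.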
 

