Complement Naive Bayes (CNB) Algorithm

Last Updated : 10 Apr, 2023
Naive Bayes algorithms are a family of popular Machine Learning algorithms used for classification. There are several variants, such as Gaussian Naive Bayes and Multinomial Naive Bayes. To learn more about the basics of Naive Bayes, you can follow this link.

Complement Naive Bayes (CNB) is an adaptation of the standard Multinomial Naive Bayes algorithm. Multinomial Naive Bayes does not perform very well on imbalanced datasets, that is, datasets where some classes have many more examples than others. Such datasets can be difficult to work with because a model can easily become biased toward the class with more examples.

How CNB works:

Complement Naive Bayes is particularly suited to imbalanced datasets. Instead of calculating the probability of an item belonging to a certain class, we calculate the probability of the item belonging to all the other classes, i.e. to the complement of that class. This is the literal meaning of the word "complement", and hence the name Complement Naive Bayes.

A step-by-step high-level overview of the algorithm (without any involved mathematics):
  • For each class calculate the probability of the given instance not belonging to it.
  • After calculation for all the classes, we check all the calculated values and select the smallest value.
  • The smallest value (lowest probability) is selected because it is the lowest probability that it is NOT that particular class. This implies that it has the highest probability to actually belong to that class. So this class is selected.
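The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: the function names `complement_scores` and `predict` and the `alpha` smoothing parameter are assumptions made for this sketch, and dividing by the class prior follows the usual CNB formulation. The toy data is the Apples/Bananas word-count example used in this article.

```python
import numpy as np

def complement_scores(X, y, query, alpha=0.0):
    """Return {class: complement score} for one query count vector.

    alpha is an optional additive smoothing term (hypothetical default of 0
    so the numbers can be checked by hand; in practice smoothing is used).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    query = np.asarray(query, dtype=float)
    scores = {}
    for c in np.unique(y):
        # Word counts summed over every sentence NOT in class c.
        comp_counts = X[y != c].sum(axis=0) + alpha
        comp_probs = comp_counts / comp_counts.sum()
        prior = np.mean(y == c)
        # How well the query fits the complement of c, divided by the prior.
        scores[str(c)] = float(np.prod(comp_probs ** query) / prior)
    return scores

def predict(X, y, query, alpha=0.0):
    """Pick the class whose complement explains the query worst."""
    scores = complement_scores(X, y, query, alpha)
    return min(scores, key=scores.get)

# Columns: Round, Red, Long, Yellow, Soft
X = [[2, 1, 1, 0, 0],
     [1, 1, 0, 9, 5],
     [2, 1, 0, 0, 1]]
y = ["Apples", "Bananas", "Apples"]
query = [1, 1, 0, 0, 1]

print(complement_scores(X, y, query))
print(predict(X, y, query))
```

The class with the smallest score is returned because a small score means the sentence fits poorly with everything outside that class.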
Note: We don’t select the class with the highest value because we are calculating the complement of the probability. The class with the highest value is the least likely to be the class the item belongs to.

Now, let us consider an example. Say we have two classes, Apples and Bananas, and we have to classify whether a given sentence is related to apples or bananas, given the frequencies of certain words in it. Here is a tabular representation of the simple dataset:
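| Sentence Number | Round | Red | Long | Yellow | Soft | Class |
|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 0 | 0 | Apples |
| 2 | 1 | 1 | 0 | 9 | 5 | Bananas |
| 3 | 2 | 1 | 0 | 0 | 1 | Apples |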
Total word count in class ‘Apples’ = (2 + 1 + 1) + (2 + 1 + 1) = 8
Total word count in class ‘Bananas’ = (1 + 1 + 9 + 5) = 16

So, the probability of a sentence belonging to the class ‘Apples’ is [Tex]\Large p(y = Apples) = {2 \over 3}[/Tex]

Similarly, the probability of a sentence belonging to the class ‘Bananas’ is [Tex]\Large p(y = Bananas) = {1 \over 3}[/Tex]

In the above table, each column gives the frequency of a word in a given sentence, and the last column gives the class the sentence belongs to.

Before we begin, you must first know about Bayes’ Theorem. Bayes’ Theorem is used to find the probability of an event, given that another event has occurred. The formula is:

[Tex]\Large P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}[/Tex]

where A and B are events, P(A) is the probability of occurrence of A, and P(A|B) is the probability of A occurring given that event B has already occurred. P(B), the probability of event B occurring, cannot be 0 since B has already occurred. If you want to learn more about regular Naive Bayes and Bayes’ Theorem, you can follow this link.

Now let us see how Naive Bayes and Complement Naive Bayes work. The regular (Multinomial) Naive Bayes decision rule is

[Tex]\Large argmax_y \ p(y) \bullet \prod_i p(w_i \mid y)^{f_i}[/Tex]

where [Tex]f_i[/Tex] is the frequency of some attribute, for example the number of times a certain word occurs in a sentence. In Complement Naive Bayes, the rule is

[Tex]\Large argmin_y \ {1 \over p(y)} \bullet \prod_i p(w_i \mid \hat y)^{f_i}[/Tex]

where [Tex]\hat y[/Tex] denotes the complement of class y, i.e. all classes other than y, so [Tex]p(w_i \mid \hat y)[/Tex] is estimated from the word counts of every sentence that does not belong to y.

If you take a closer look at the formulae, you will see that Complement Naive Bayes mirrors regular Naive Bayes: the word probabilities are estimated from the complement of each class, the prior moves to the denominator, and argmax becomes argmin. In Naive Bayes, the class with the largest value obtained from the formula is the predicted class. In Complement Naive Bayes, the value measures how well the sentence fits everything except the class, so the class with the smallest value is the predicted class.

Now, let us take an example and try to predict it using our dataset and CNB,
| Round | Red | Long | Yellow | Soft | Class |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 1 | ? |
So, we need to decide which of [Tex]\Large p(y = Apples|w_1 = Round, w_2 = Red, w_3 = Soft)[/Tex] and [Tex]\Large p(y = Bananas|w_1 = Round, w_2 = Red, w_3 = Soft)[/Tex] is larger. Complement Naive Bayes does this by computing, for each class, the score [Tex]{1 \over p(y)} \bullet \prod_i p(w_i \mid \hat y)^{f_i}[/Tex], where the word probabilities come from the complement of the class, and selecting the class with the smaller score. That is, if the score for (y = Apples) is smaller, the class is predicted as Apples, and if the score for (y = Bananas) is smaller, the class is predicted as Bananas.

For Apples, the complement is the Bananas sentence (16 words in total, with Round = 1, Red = 1, Soft = 5):

[Tex]\Large score(Apples) = {1 \over {2 \over 3}} \bullet \left({1 \over 16}\right)^{1} \bullet \left({1 \over 16}\right)^{1} \bullet \left({5 \over 16}\right)^{1} \approx 0.0018[/Tex]

For Bananas, the complement is the two Apples sentences (8 words in total, with Round = 4, Red = 2, Soft = 1):

[Tex]\Large score(Bananas) = {1 \over {1 \over 3}} \bullet \left({4 \over 8}\right)^{1} \bullet \left({2 \over 8}\right)^{1} \bullet \left({1 \over 8}\right)^{1} \approx 0.0469[/Tex]

Now, since 0.0018 < 0.0469, the predicted class is Apples. We DON’T use the class with the higher score because a higher score means that it is more likely that a sentence with those words does NOT belong to the class. This is exactly why this algorithm is called Complement Naive Bayes.

When to use CNB?
  • When the dataset is imbalanced: If the dataset on which classification is to be done is imbalanced, Multinomial and Gaussian Naive Bayes may give low accuracy, especially on the under-represented classes. Complement Naive Bayes is designed for this case and often gives relatively higher accuracy.
  • For text classification tasks: Complement Naive Bayes often outperforms Multinomial Naive Bayes in text classification tasks, where the features are word counts or similar frequencies.
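Claims like these are easy to check on your own data by fitting both classifiers on the same split. The sketch below does this on a synthetic, imbalanced word-count dataset; the data generation (20 hypothetical "words", a 900/100 class split, 30 words per document) is an assumption made purely for illustration, and the result on real data may differ. Balanced accuracy is used so that the minority class counts as much as the majority class.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import ComplementNB, MultinomialNB

rng = np.random.default_rng(0)
n_words = 20

# Each class draws its word counts from its own (random) word distribution.
p_major = rng.dirichlet(np.ones(n_words))
p_minor = rng.dirichlet(np.ones(n_words))
X = np.vstack([
    rng.multinomial(30, p_major, size=900),  # majority class
    rng.multinomial(30, p_minor, size=100),  # minority class
])
y = np.array([0] * 900 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit both models on the same split and record balanced accuracy.
results = {}
for name, model in [("MultinomialNB", MultinomialNB()),
                    ("ComplementNB", ComplementNB())]:
    model.fit(X_train, y_train)
    results[name] = balanced_accuracy_score(y_test, model.predict(X_test))

print(results)
```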
Implementation of CNB in Python:

For this example, we will use the wine dataset, which is slightly imbalanced. The task is to determine the origin of a wine from various chemical parameters. To know more about this dataset, you can check this link. To evaluate our model, we will check the accuracy on the test set and the classification report of the classifier. We will use the scikit-learn library to implement the Complement Naive Bayes algorithm.

Code:
# Import required modules
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.naive_bayes import ComplementNB
  
# Loading the dataset 
dataset = load_wine()
X = dataset.data
y = dataset.target
  
# Splitting the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.15, random_state = 42)
  
# Creating and training the Complement Naive Bayes Classifier
classifier = ComplementNB()
classifier.fit(X_train, y_train)
  
# Evaluating the classifier
prediction = classifier.predict(X_test)
prediction_train = classifier.predict(X_train)
  
print(f"Training Set Accuracy : {accuracy_score(y_train, prediction_train) * 100} %\n")
print(f"Test Set Accuracy : {accuracy_score(y_test, prediction) * 100} % \n\n")
print(f"Classifier Report : \n\n {classification_report(y_test, prediction)}")
                      
                       
Output:

Training Set Accuracy : 65.56291390728477 %

Test Set Accuracy : 66.66666666666666 %

Classifier Report :

               precision    recall  f1-score   support

           0       0.64      1.00      0.78         9
           1       0.67      0.73      0.70        11
           2       1.00      0.14      0.25         7

    accuracy                           0.67        27
   macro avg       0.77      0.62      0.58        27
weighted avg       0.75      0.67      0.61        27
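As a quick check of the class balance mentioned earlier, the label counts of the wine dataset can be inspected directly. This is a small sketch that reuses the same `load_wine()` call as the code above; the variable name `class_counts` is just for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine

dataset = load_wine()

# Count how many samples carry each class label (0, 1, 2).
class_counts = np.bincount(dataset.target)

for name, count in zip(dataset.target_names, class_counts):
    print(f"{name}: {count} samples")
```

The three classes have 59, 71 and 48 samples respectively, which is why the dataset is described as only slightly imbalanced.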

We get an accuracy of 65.56% on the training set and 66.67% on the test set. The two are close, so the model is not overfitting, but the absolute accuracy is modest; in particular, the recall for class 2 is only 0.14. One likely reason is that Complement Naive Bayes is designed for count-like features such as word frequencies, whereas the wine features are continuous chemical measurements on very different scales.

Conclusion:

Now that you know what Complement Naive Bayes classifiers are and how they work, next time you come across an imbalanced dataset, you can try using Complement Naive Bayes.

References:
  • scikit-learn documentation.

