Bernoulli Naive Bayes

Last Updated : 30 May, 2025

Bernoulli Naive Bayes is a variant of the Naive Bayes algorithm. It is used when the data is binary: it models the occurrence of each feature with a Bernoulli distribution, so it suits features that take values such as 'Yes' or 'No', '1' or '0', 'True' or 'False'. As in every Naive Bayes model, the features are assumed to be conditionally independent of one another given the class. In this article we discuss the algorithm in more detail.

Mathematics Behind Bernoulli Naive Bayes

The core of Bernoulli Naive Bayes is Bayes' Theorem, which gives the conditional probability of a class y given data x = (x_1, x_2, ..., x_n). The Bernoulli Naive Bayes model assumes that each feature is conditionally independent given the class y, which means the likelihood of each individual feature can be calculated as:

P(x_i \mid y) = P(i \mid y)\,x_i + (1 - P(i \mid y))(1 - x_i)

  • Here, P(x_i \mid y) is the probability of feature i taking the value x_i given that class y has occurred.
  • P(i \mid y) is the probability of feature i being present (equal to 1) given class y.
  • x_i holds a binary value, either 0 or 1.
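
To make this formula concrete, here is a minimal sketch (our own illustration, not code from the article) that evaluates the per-feature likelihood:

Python
def feature_likelihood(x_i, p_i_given_y):
    # P(x_i | y) = P(i | y) * x_i + (1 - P(i | y)) * (1 - x_i)
    return p_i_given_y * x_i + (1 - p_i_given_y) * (1 - x_i)

# Suppose a word appears in 75% of messages of class y:
print(feature_likelihood(1, 0.75))  # word present -> 0.75
print(feature_likelihood(0, 0.75))  # word absent  -> 0.25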

Since Bernoulli Naive Bayes is built on the Bernoulli distribution, let us look at that distribution next.

Bernoulli distribution

The Bernoulli distribution is a discrete probability distribution that models a single trial with exactly two outcomes, success or failure. The random variable takes the value 1 with probability p and the value 0 with probability 1 - p.

Its probability mass function is:

f(x)=\begin{cases} p^x(1-p)^{1-x} & \text{if } x \in \{0, 1\} \\ 0 & \text{otherwise} \end{cases}

Substituting x = 1 into this function gives f(x) = p, and substituting x = 0 gives f(x) = 1 - p. Here p denotes the probability of success of the event.
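
As a quick check (a sketch using SciPy, which the article's own code does not use), the same two values fall out of scipy.stats.bernoulli:

Python
from scipy.stats import bernoulli

p = 0.75  # probability of success
print(bernoulli.pmf(1, p))  # f(1) = p     -> 0.75
print(bernoulli.pmf(0, p))  # f(0) = 1 - p -> 0.25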

Example:

To understand how Bernoulli Naive Bayes works, here's a simple binary classification problem.

| Message ID | Message Text        | Class    |
|------------|---------------------|----------|
| M1         | "buy cheap now"     | Spam     |
| M2         | "limited offer buy" | Spam     |
| M3         | "meet me now"       | Not Spam |
| M4         | "let's catch up"    | Not Spam |

1. Vocabulary

Extract all unique words from the training data:

\text{Vocabulary} = \{\text{buy, cheap, now, limited, offer, meet, me, let's, catch, up}\}

Vocabulary size V = 10
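
A minimal sketch of this step (assuming simple whitespace tokenization, which the article does not specify):

Python
messages = ["buy cheap now", "limited offer buy", "meet me now", "let's catch up"]
labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

# Collect every unique word across the training messages
vocabulary = sorted({word for msg in messages for word in msg.split()})
print(vocabulary)
print(len(vocabulary))  # 10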

2. Binary Feature Matrix (Presence = 1, Absence = 0)

Each message is represented using binary features indicating the presence (1) or absence (0) of a word.

| ID | buy | cheap | now | limited | offer | meet | me | let's | catch | up | Class    |
|----|-----|-------|-----|---------|-------|------|----|-------|-------|----|----------|
| M1 | 1   | 1     | 1   | 0       | 0     | 0    | 0  | 0     | 0     | 0  | Spam     |
| M2 | 1   | 0     | 0   | 1       | 1     | 0    | 0  | 0     | 0     | 0  | Spam     |
| M3 | 0   | 0     | 1   | 0       | 0     | 1    | 1  | 0     | 0     | 0  | Not Spam |
| M4 | 0   | 0     | 0   | 0       | 0     | 0    | 0  | 1     | 1     | 1  | Not Spam |
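
The same matrix can be produced with a short sketch (again assuming whitespace tokenization):

Python
messages = ["buy cheap now", "limited offer buy", "meet me now", "let's catch up"]
vocabulary = ["buy", "cheap", "now", "limited", "offer", "meet", "me", "let's", "catch", "up"]

# One row per message: 1 if the word occurs in it, 0 otherwise
X = [[1 if word in msg.split() else 0 for word in vocabulary] for msg in messages]
for row in X:
    print(row)
# [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  -> M1
# [1, 0, 0, 1, 1, 0, 0, 0, 0, 0]  -> M2
# [0, 0, 1, 0, 0, 1, 1, 0, 0, 0]  -> M3
# [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  -> M4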

3. Apply Laplace Smoothing

P(w_i = 1 \mid C) = \frac{\text{count}(w_i, C) + 1}{N_C + 2}

where count(w_i, C) is the number of class-C messages containing word w_i and N_C = 2 for both classes (two messages per class). The +1 in the numerator and +2 in the denominator are the Laplace smoothing terms (one for each of the two possible feature values), so the denominator becomes 4.

4. Word Probabilities

For Spam class:

  • P(\text{buy} \mid \text{Spam}) = \frac{2+1}{4} = 0.75
  • P(\text{cheap} \mid \text{Spam}) = \frac{1+1}{4} = 0.5
  • P(\text{now} \mid \text{Spam}) = \frac{1+1}{4} = 0.5
  • P(\text{limited} \mid \text{Spam}) = \frac{1+1}{4} = 0.5
  • P(\text{offer} \mid \text{Spam}) = \frac{1+1}{4} = 0.5
  • P(\text{others} \mid \text{Spam}) = \frac{0+1}{4} = 0.25

For Not Spam class:

  • P(\text{now} \mid \text{Not Spam}) = \frac{1+1}{4} = 0.5
  • P(\text{meet} \mid \text{Not Spam}) = \frac{1+1}{4} = 0.5
  • P(\text{me} \mid \text{Not Spam}) = \frac{1+1}{4} = 0.5
  • P(\text{let's} \mid \text{Not Spam}) = \frac{1+1}{4} = 0.5
  • P(\text{catch} \mid \text{Not Spam}) = \frac{1+1}{4} = 0.5
  • P(\text{up} \mid \text{Not Spam}) = \frac{1+1}{4} = 0.5
  • P(\text{others} \mid \text{Not Spam}) = \frac{0+1}{4} = 0.25
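
These values can be reproduced with a minimal sketch of the smoothing step (variable names are ours, not the article's):

Python
messages = ["buy cheap now", "limited offer buy", "meet me now", "let's catch up"]
labels = ["Spam", "Spam", "Not Spam", "Not Spam"]
vocabulary = ["buy", "cheap", "now", "limited", "offer", "meet", "me", "let's", "catch", "up"]

def word_probs(cls):
    docs = [m.split() for m, l in zip(messages, labels) if l == cls]
    n_c = len(docs)  # number of documents in this class
    # Laplace-smoothed P(w = 1 | class) = (count(w, class) + 1) / (N_C + 2)
    return {w: (sum(w in d for d in docs) + 1) / (n_c + 2) for w in vocabulary}

print(word_probs("Spam")["buy"])      # 0.75
print(word_probs("Spam")["cheap"])    # 0.5
print(word_probs("Not Spam")["buy"])  # 0.25 ("others" for the Not Spam class)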

5. Classify Message "buy now"

The message contains the words "buy" and "now", so the feature vector is:

\text{buy}=1, \quad \text{now}=1, \quad \text{others}=0

  • For Spam:

P(\text{Spam} \mid d) \propto P(\text{Spam}) \cdot P(\text{buy}=1 \mid \text{Spam}) \cdot P(\text{now}=1 \mid \text{Spam}) = 0.5 \cdot 0.75 \cdot 0.5 = 0.1875

  • For Not Spam:

P(\text{Not Spam} \mid d) \propto P(\text{Not Spam}) \cdot P(\text{buy}=1 \mid \text{Not Spam}) \cdot P(\text{now}=1 \mid \text{Not Spam}) = 0.5 \cdot 0.25 \cdot 0.5 = 0.0625

6. Final Classification

P(\text{Spam} \mid d) = 0.1875,\quad P(\text{Not Spam} \mid d) = 0.0625

Since P(\text{Spam} \mid d) > P(\text{Not Spam} \mid d), the message is classified as: \boxed{\text{Spam}}
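
Note that, for simplicity, the example multiplies only the probabilities of the words that are present; a full Bernoulli Naive Bayes likelihood would also multiply a factor (1 - P(w \mid C)) for every vocabulary word that is absent. Here is a minimal sketch reproducing the simplified calculation above (our own code, not from the article):

Python
priors = {"Spam": 0.5, "Not Spam": 0.5}
# Smoothed word probabilities from step 4; words not listed default to 0.25
probs = {
    "Spam":     {"buy": 0.75, "now": 0.5},
    "Not Spam": {"buy": 0.25, "now": 0.5},
}

message = ["buy", "now"]
for cls in priors:
    score = priors[cls]
    for word in message:
        score *= probs[cls].get(word, 0.25)
    print(cls, score)
# Spam 0.1875
# Not Spam 0.0625 -> "buy now" is classified as Spam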

Implementing Bernoulli Naive Bayes

To perform classification with Bernoulli Naive Bayes, we use an email dataset.

The dataset has four columns: Unnamed: 0, label, label_num and text. The label is either ham or spam; in label_num, ham is assigned 0 and spam is assigned 1. The text column contains the body of the email. The dataset has 5171 rows.

The dataset can be downloaded from here.

1. Importing Libraries

We start by importing the necessary libraries: numpy, pandas and scikit-learn. BernoulliNB is part of the sklearn.naive_bayes module.

Python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import BernoulliNB
from sklearn.feature_extraction.text import CountVectorizer

2. Data Analysis

Next we take a quick look at the data: we read the CSV file, print its shape and column names, and drop the unnecessary Unnamed: 0 column.

Python
df = pd.read_csv("spam_ham_dataset.csv")
print(df.shape)
print(df.columns)

# Drop the unneeded index column
df = df.drop(['Unnamed: 0'], axis=1)

Output:

(5171, 4)
Index(['Unnamed: 0', 'label', 'text', 'label_num'], dtype='object')

3. Count Vectorizer

Since the classifier cannot be trained on raw text, we use CountVectorizer to convert each message into a numeric feature vector.

Python
x = df["text"].values
y = df["label_num"].values

# Convert the raw text into a document-term count matrix
cv = CountVectorizer()
x = cv.fit_transform(x)
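
CountVectorizer produces word counts rather than binary indicators; in the next step, BernoulliNB(binarize=0.0) turns every count greater than 0 into a 1. An equivalent alternative (a sketch, not the article's code) is to binarize at vectorization time:

Python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

df = pd.read_csv("spam_ham_dataset.csv")

# Produce 0/1 presence features directly, then tell
# BernoulliNB that the input is already binary.
cv = CountVectorizer(binary=True)
x = cv.fit_transform(df["text"].values)
bnb = BernoulliNB(binarize=None)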

4. Data Splitting, Model Training and Prediction

Python
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hold out 20% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=0)

# binarize=0.0 maps every count greater than 0 to 1 before fitting
bnb = BernoulliNB(binarize=0.0)
model = bnb.fit(X_train, y_train)
y_pred = bnb.predict(X_test)

print(classification_report(y_test, y_pred))

Output:

              precision    recall  f1-score

           0       0.84      0.98      0.91
           1       0.92      0.56      0.70

    accuracy                           0.86

The classification report shows that for class 0 (not spam) precision, recall and F1-score are 0.84, 0.98 and 0.91 respectively; for class 1 (spam) they are 0.92, 0.56 and 0.70. Recall for the spam class drops because spam is the minority class in the dataset. The overall accuracy of the model is 86%, which is reasonable for such a simple model.

Beyond spam detection, Bernoulli Naive Bayes is used for text classification and sentiment analysis, wherever the features record whether a certain word is present in a document or not.

Difference Between Naive Bayes Models

| Aspect | Gaussian Naive Bayes | Multinomial Naive Bayes | Bernoulli Naive Bayes |
|--------|----------------------|-------------------------|-----------------------|
| Feature Type | Continuous (real-valued features) | Discrete (count data or frequency-based features) | Binary (presence or absence of features) |
| Assumption | Assumes data follows a Gaussian (normal) distribution | Assumes data follows a multinomial distribution | Assumes data follows a Bernoulli (binary) distribution |
| Common Use Case | Suitable for continuous features like height, weight, etc. | Suitable for text classification (word counts) | Suitable for binary classification tasks (e.g., spam detection) |
| Data Representation | Features are treated as continuous variables | Features are treated as discrete counts or frequencies | Features are treated as binary (0 or 1) values |
| Mathematical Model | Uses Gaussian distribution (mean and variance) for each feature | Uses the multinomial distribution for word counts in text classification | Uses Bernoulli distribution (probability of a feature being present) |
| Example | Predicting whether an email is spam based on numeric features | Predicting whether a document is spam based on word counts | Classifying a document as spam or not based on word presence |

The table above gives a quick comparison of the three types of Naive Bayes: Gaussian Naive Bayes, Multinomial Naive Bayes and Bernoulli Naive Bayes.

Bernoulli Naive Bayes is a simple yet effective algorithm for binary classification tasks. Its efficiency in handling binary data makes it suitable for applications like spam detection and sentiment analysis, and its simplicity and speed make it a good fit for real-time classification problems.

