Naive Bayes Classifiers

Last Updated : 21 May, 2025

Naive Bayes is a classification algorithm that uses probability to predict which category a data point belongs to, under the assumption that all features are independent of one another. This article gives an overview of Naive Bayes as well as its more advanced uses and implementation in machine learning.

[Figure] Illustration behind the Naive Bayes algorithm: P(x_α|y) is estimated independently in each dimension (middle two images), and an estimate of the full data distribution is obtained by assuming conditional independence, P(x|y) = ∏_α P(x_α|y) (rightmost image).

Key Features of Naive Bayes Classifiers

The main idea behind the Naive Bayes classifier is to use Bayes' Theorem to classify data based on the probabilities of the different classes given the features of the data. It is mostly used for high-dimensional text classification.

  • The Naive Bayes classifier is a simple probabilistic classifier with very few parameters, so the resulting models can make predictions faster than many other classification algorithms.
  • It is a probabilistic classifier because it assumes that each feature in the model is independent of the existence of every other feature; in other words, each feature contributes to the prediction with no relation to the others.
  • The Naive Bayes algorithm is used in spam filtering, sentiment analysis, article classification and many other tasks.

Why Is It Called Naive Bayes?

It is called "Naive" because it assumes that the presence of one feature does not affect the other features. The "Bayes" part of the name refers to its basis in Bayes' Theorem.

Consider a fictional dataset that describes the weather conditions for playing a game of golf. Given the weather conditions, each tuple classifies the conditions as fit ("Yes") or unfit ("No") for playing golf. Here is a tabular representation of our dataset.

 #   Outlook    Temperature   Humidity   Windy   Play Golf
 0   Rainy      Hot           High       False   No
 1   Rainy      Hot           High       True    No
 2   Overcast   Hot           High       False   Yes
 3   Sunny      Mild          High       False   Yes
 4   Sunny      Cool          Normal     False   Yes
 5   Sunny      Cool          Normal     True    No
 6   Overcast   Cool          Normal     True    Yes
 7   Rainy      Mild          High       False   No
 8   Rainy      Cool          Normal     False   Yes
 9   Sunny      Mild          Normal     False   Yes
10   Rainy      Mild          Normal     True    Yes
11   Overcast   Mild          High       True    Yes
12   Overcast   Hot           Normal     False   Yes
13   Sunny      Mild          High       True    No

The dataset is divided into two parts, namely, feature matrix and the response vector.

  • The feature matrix contains all the vectors (rows) of the dataset, where each vector holds the values of the independent features. In the dataset above, the features are 'Outlook', 'Temperature', 'Humidity' and 'Windy'.
  • The response vector contains the value of the class variable (the prediction or output) for each row of the feature matrix. In the dataset above, the class variable is 'Play Golf'.
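This split is a one-liner in pandas. The sketch below (assuming pandas is available; only the first three rows of the dataset are shown for brevity) separates the feature matrix X from the response vector y:

```python
import pandas as pd

# First three rows of the golf dataset from the table above
df = pd.DataFrame({
    "Outlook":     ["Rainy", "Rainy", "Overcast"],
    "Temperature": ["Hot", "Hot", "Hot"],
    "Humidity":    ["High", "High", "High"],
    "Windy":       [False, True, False],
    "Play Golf":   ["No", "No", "Yes"],
})

X = df.drop(columns="Play Golf")  # feature matrix (independent features)
y = df["Play Golf"]               # response vector (class variable)
print(X.shape, y.shape)           # (3, 4) (3,)
```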

Assumption of Naive Bayes

The fundamental assumptions of Naive Bayes are:

  • Feature independence: This means that when we are trying to classify something, we assume that each feature (or piece of information) in the data does not affect any other feature.
  • Continuous features are normally distributed: If a feature is continuous, then it is assumed to be normally distributed within each class.
  • Discrete features have multinomial distributions: If a feature is discrete, then it is assumed to have a multinomial distribution within each class.
  • Features are equally important: All features are assumed to contribute equally to the prediction of the class label.
  • No missing data: The data should not contain any missing values.

Introduction to Bayes' Theorem

Bayes’ Theorem provides a principled way to reverse conditional probabilities. It is defined as:

P(y|X) = \frac{P(X|y) \cdot P(y)}{P(X)}

Where:

  • P(y|X): Posterior probability, probability of class y given features X
  • P(X|y): Likelihood, probability of features X given class y
  • P(y): Prior probability of class y
  • P(X): Marginal likelihood or evidence
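As a quick sanity check, the theorem can be written as a one-line function; the numbers below are hypothetical, chosen only to illustrate the formula:

```python
def bayes_posterior(likelihood, prior, evidence):
    """Bayes' Theorem: P(y|X) = P(X|y) * P(y) / P(X)."""
    return likelihood * prior / evidence

# Hypothetical values: P(X|y) = 0.6, P(y) = 0.5, P(X) = 0.4
print(bayes_posterior(0.6, 0.5, 0.4))  # 0.75
```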

How Naive Bayes Works

1. Terminology

Consider a classification problem (like predicting if someone plays golf based on weather). Then:

  • y is the class label (e.g. "Yes" or "No" for playing golf)
  • X = (x_1, x_2, ..., x_n) is the feature vector (e.g. Outlook, Temperature, Humidity, Wind)

A sample row from the dataset:

X = \text{(Rainy, Hot, High, False)}, \quad y = \text{No}

This represents:

What is the probability that someone will not play golf given that the weather is Rainy, Hot, High humidity, and No wind?

2. The Naive Assumption

The "naive" in Naive Bayes comes from the assumption that all features are independent given the class. That is:

P(x_1, x_2, ..., x_n | y) = P(x_1 | y) \cdot P(x_2 | y) \cdots P(x_n | y)

Thus, Bayes' theorem becomes:

P(y|x_1, ..., x_n) = \frac{P(y) \cdot \prod_{i=1}^{n} P(x_i | y)}{P(x_1, x_2, ..., x_n)}

Since the denominator is constant for a given input, we can write:

P(y|x_1, ..., x_n) \propto P(y) \cdot \prod_{i=1}^{n} P(x_i | y)

3. Constructing the Naive Bayes Classifier

We compute the posterior for each class y and choose the class with the highest probability:

\hat{y} = \arg\max_{y} P(y) \cdot \prod_{i=1}^{n} P(x_i | y)

This becomes our Naive Bayes classifier.
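The decision rule above can be sketched in a few lines of Python. The probability tables here are hypothetical stand-ins; in practice they would be estimated from training data:

```python
def nb_predict(priors, conditionals, x):
    """Pick the class y maximizing P(y) * prod_i P(x_i | y).

    priors: {class: P(y)}
    conditionals: {class: list of {feature value: P(value | y)}, one dict per feature}
    """
    def score(y):
        s = priors[y]
        for i, xi in enumerate(x):
            s *= conditionals[y][i].get(xi, 0.0)
        return s
    return max(priors, key=score)

# Hypothetical two-feature example
priors = {"A": 0.5, "B": 0.5}
conditionals = {
    "A": [{"hot": 0.8, "cold": 0.2}, {"dry": 0.7, "wet": 0.3}],
    "B": [{"hot": 0.1, "cold": 0.9}, {"dry": 0.4, "wet": 0.6}],
}
print(nb_predict(priors, conditionals, ("hot", "dry")))  # "A": 0.28 beats 0.02
```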

4. Example: Weather Dataset

Let’s take a dataset used for predicting if golf is played based on:

  • Outlook: Sunny, Rainy, Overcast
  • Temperature: Hot, Mild, Cool
  • Humidity: High, Normal
  • Wind: True, False
[Figure] Example tables for Naive Bayes

Example Input: X = (Sunny, Hot, Normal, False)

Goal: Predict if golf will be played (Yes or No).

5. Pre-computation from Dataset

Class Probabilities:

From the dataset of 14 rows:

  • P(\text{Yes}) = \frac{9}{14}
  • P(\text{No}) = \frac{5}{14}

Conditional Probabilities (Tables 1–4), counted from the dataset:

Feature        Value    P(Value | Yes)   P(Value | No)
Outlook        Sunny    3/9              2/5
Temperature    Hot      2/9              2/5
Humidity       Normal   6/9              1/5
Wind           False    6/9              2/5

6. Calculate Posterior Probabilities

For Class = Yes:

P(\text{Yes | today}) \propto \frac{3}{9} \cdot \frac{2}{9} \cdot \frac{6}{9} \cdot \frac{6}{9} \cdot \frac{9}{14}

P(\text{Yes | today}) \approx 0.02116

For Class = No:

P(\text{No | today}) \propto \frac{2}{5} \cdot \frac{2}{5} \cdot \frac{1}{5} \cdot \frac{2}{5} \cdot \frac{5}{14}

P(\text{No | today}) \approx 0.00457

7. Normalize Probabilities

To compare:

P(\text{Yes | today}) = \frac{0.02116}{0.02116 + 0.00457} \approx 0.822

P(\text{No | today}) = \frac{0.00457}{0.02116 + 0.00457} \approx 0.178

8. Final Prediction

Since:

P(\text{Yes | today}) > P(\text{No | today})

The model predicts: Yes (Play Golf)
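The whole worked example can be reproduced by counting directly from the 14-row dataset; this sketch estimates every probability from the table above and normalizes the two scores:

```python
# Each row: (Outlook, Temperature, Humidity, Windy, Play Golf)
data = [
    ("Rainy", "Hot", "High", False, "No"),
    ("Rainy", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Sunny", "Mild", "High", False, "Yes"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Rainy", "Mild", "High", False, "No"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Sunny", "Mild", "High", True, "No"),
]

def posterior_probs(x):
    """Estimate P(y | x) by counting, then normalize over the two classes."""
    scores = {}
    for label in ("Yes", "No"):
        rows = [r for r in data if r[-1] == label]
        score = len(rows) / len(data)  # prior P(y)
        for i, xi in enumerate(x):
            # conditional P(x_i | y), estimated by counting matching rows
            score *= sum(r[i] == xi for r in rows) / len(rows)
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

probs = posterior_probs(("Sunny", "Hot", "Normal", False))
print(probs)  # P(Yes) ≈ 0.82, P(No) ≈ 0.18 -> predict "Yes"
```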

Naive Bayes for Continuous Features

For continuous features, we assume a Gaussian distribution:

P(x_i | y) = \frac{1}{\sqrt{2\pi\sigma^2_y}} \exp\left( -\frac{(x_i - \mu_y)^2}{2\sigma^2_y} \right)

Where:

  • \mu_y is the mean of feature x_i for class y
  • \sigma^2_y is the variance of feature x_i for class y

This leads to what is called Gaussian Naive Bayes.
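As a minimal sketch, the Gaussian likelihood can be computed directly; μ_y and σ²_y would normally be estimated per class from training data, so the values below are hypothetical:

```python
import math

def gaussian_likelihood(x, mu, var):
    """P(x_i | y) under a Gaussian with class mean mu and class variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Feature value 1.0 under a hypothetical class with mean 0 and unit variance
print(gaussian_likelihood(1.0, 0.0, 1.0))  # ≈ 0.2420
```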

Types of Naive Bayes Model

There are three types of Naive Bayes models:

1. Gaussian Naive Bayes

In Gaussian Naive Bayes, the continuous values associated with each feature are assumed to follow a Gaussian distribution. A Gaussian distribution is also called a Normal distribution; when plotted, it gives a bell-shaped curve that is symmetric about the mean of the feature values.

2. Multinomial Naive Bayes

Multinomial Naive Bayes is used when features represent the frequency of terms (such as word counts) in a document. It is commonly applied in text classification, where term frequencies are important.

3. Bernoulli Naive Bayes

Bernoulli Naive Bayes deals with binary features, where each feature indicates whether a word appears in a document or not. It is suited to scenarios where the presence or absence of terms matters more than their frequency. Both the multinomial and Bernoulli models are widely used in document classification tasks.
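All three variants are available in scikit-learn (assumed installed here); the toy data below is made up purely to match each model's assumptions:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Gaussian NB: continuous features, one cluster per class
X_cont = np.array([[0.1, -0.2], [-0.3, 0.1], [0.0, 0.3],
                   [3.1, 2.9], [2.8, 3.2], [3.0, 3.0]])
y_cont = [0, 0, 0, 1, 1, 1]
print(GaussianNB().fit(X_cont, y_cont).predict([[2.8, 3.1]]))  # [1]

# Multinomial NB: count features, e.g. word counts per document
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
y_counts = [0, 0, 1, 1]
print(MultinomialNB().fit(X_counts, y_counts).predict([[0, 5, 1]]))  # [1]

# Bernoulli NB: binary presence/absence features
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y_counts).predict([[0, 1, 1]]))  # [1]
```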

Advantages of Naive Bayes Classifier

  • Easy to implement and computationally efficient.
  • Effective in cases with a large number of features.
  • Performs well even with limited training data.
  • Performs well in the presence of categorical features.
  • For numerical features, only a per-class mean and variance need to be estimated (under the assumption that the data comes from a normal distribution), which keeps training simple.

Disadvantages of Naive Bayes Classifier

  • Assumes that features are independent, which may not always hold in real-world data.
  • Can be influenced by irrelevant attributes.
  • May assign zero probability to feature values unseen during training, leading to poor generalization (this is typically mitigated with Laplace smoothing).
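The zero-probability issue from the last point can be seen, and fixed, with Laplace smoothing. This is a sketch of the smoothed estimate (alpha=1 is also scikit-learn's default):

```python
def smoothed_prob(count, total, num_values, alpha=1.0):
    """Laplace-smoothed estimate of P(value | y).

    count:      times this value occurred with class y in training data
    total:      number of training rows with class y
    num_values: number of distinct values the feature can take
    """
    return (count + alpha) / (total + alpha * num_values)

# A value never seen with class y still gets a small non-zero probability:
print(smoothed_prob(0, 9, 3))           # 1/12 ≈ 0.083
print(smoothed_prob(0, 9, 3, alpha=0))  # 0.0 -> would zero out the whole product
```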

Applications of Naive Bayes Classifier

  • Spam Email Filtering: Classifies emails as spam or non-spam based on features.
  • Text Classification: Used in sentiment analysis, document categorization, and topic classification.
  • Medical Diagnosis: Helps in predicting the likelihood of a disease based on symptoms.
  • Credit Scoring: Evaluates creditworthiness of individuals for loan approval.
  • Weather Prediction: Classifies weather conditions based on various factors.

Author: kartik