Negative Binomial Distribution using rnbinom in R

Last Updated : 19 Sep, 2024

This article covers the theory behind the Negative Binomial Distribution and shows how to use rnbinom() in the R Programming Language, with examples of generating random numbers, visualizing the distribution, and fitting a model to simulated count data.

Negative Binomial Distribution

The Negative Binomial Distribution is a probability distribution used for modeling count data where the variance exceeds the mean, a property known as overdispersion. It describes the number of failures that occur before a specified number of successes is reached in a sequence of independent Bernoulli trials. In R, the function rnbinom() is used to generate random numbers following the Negative Binomial Distribution.
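For r successes and success probability p, the number of failures has mean r(1 - p)/p and variance r(1 - p)/p^2, so the variance is always larger than the mean when p < 1. The following is a minimal sketch that checks these formulas against simulated draws; the values r = 5 and p = 0.3, the seed, and the variable names are illustrative assumptions.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

r <- 5    # target number of successes (illustrative value)
p <- 0.3  # success probability per trial (illustrative value)

# Theoretical moments of the number of failures before the r-th success
theo_mean <- r * (1 - p) / p
theo_var  <- r * (1 - p) / p^2

# Empirical moments from a large simulated sample
draws <- rnbinom(n = 100000, size = r, prob = p)
emp_mean <- mean(draws)
emp_var  <- var(draws)

# In both rows the variance exceeds the mean (overdispersion)
print(rbind(theoretical = c(mean = theo_mean, var = theo_var),
            empirical   = c(mean = emp_mean,  var = emp_var)))
```

With a sample this large, the empirical row should sit close to the theoretical one.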

rnbinom() in R

The rnbinom() function takes the following arguments:

rnbinom(n, size, prob, mu)

Where,

  • n: Number of observations to generate.
  • size: The target number of successes (the parameter r). It must be strictly positive and need not be an integer.
  • prob: The probability of success in each trial (the parameter p).
  • mu: The mean of the distribution, an alternative to prob. Supply either prob or mu, not both.
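rnbinom() also accepts a mean argument mu in place of prob, which is how Example 3 below calls it. The two forms describe the same distribution when prob = size / (size + mu). The sketch below checks this with dnbinom(), the density function of the same family; the parameter values are illustrative assumptions.

```r
size <- 5    # illustrative value
prob <- 0.3  # illustrative value

# Mean implied by (size, prob)
mu <- size * (1 - prob) / prob

# Probabilities of 0..10 failures under each parametrization
pmf_prob <- dnbinom(0:10, size = size, prob = prob)
pmf_mu   <- dnbinom(0:10, size = size, mu = mu)

# TRUE when the two parametrizations agree up to floating-point tolerance
print(isTRUE(all.equal(pmf_prob, pmf_mu)))

# Converting back from the mean to the success probability
print(size / (size + mu))
```

The mu form is convenient in regression settings, where the mean depends on predictors.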

Example 1: Generate Random Numbers Using rnbinom()

Let’s generate 1000 random numbers from a Negative Binomial Distribution with 5 successes and a success probability of 0.3 using rnbinom.

R
# Set seed for reproducibility
set.seed(123)

# Generate random numbers from Negative Binomial Distribution
neg_binom_data <- rnbinom(n = 1000, size = 5, prob = 0.3)

# Display the first few numbers
head(neg_binom_data)

Output:

[1] 11 19 16  8  6 22

Example 2: Visualizing the Negative Binomial Distribution

We can visualize the generated data using a histogram to see the shape of the distribution.

R
# Load necessary library
library(ggplot2)

# Create a histogram
ggplot(data = data.frame(x = neg_binom_data), aes(x = x)) +
  geom_histogram(binwidth = 1, fill = "blue", color = "black") +
  labs(title = "Histogram of Negative Binomial Distribution",
       x = "Number of Failures",
       y = "Frequency") +
  theme_minimal()

Output:

Figure: Visualizing the Negative Binomial Distribution (histogram; x-axis: Number of Failures, y-axis: Frequency)

The histogram shows the distribution of the number of failures before achieving the specified number of successes. The shape of the distribution is skewed to the right, typical of count data with a low probability of success.
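The histogram can also be checked numerically. The sketch below compares the observed proportion of each count with the theoretical probability from dnbinom(); the cut-off at 30 failures and the variable names are assumptions made for illustration.

```r
# Same simulation as in Example 1
set.seed(123)
neg_binom_data <- rnbinom(n = 1000, size = 5, prob = 0.3)

# Empirical proportion of each count from 0 to 30
# (larger counts are ignored in this comparison)
counts <- 0:30
empirical <- as.numeric(table(factor(neg_binom_data, levels = counts))) /
  length(neg_binom_data)

# Theoretical probability of each count
theoretical <- dnbinom(counts, size = 5, prob = 0.3)

comparison <- data.frame(failures = counts,
                         empirical = empirical,
                         theoretical = round(theoretical, 4))
print(head(comparison, 10))
```

The two columns should track each other, with small differences due to sampling noise.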

Example 3: Fitting a Negative Binomial Model to Simulated Data using glm.nb

In real-world scenarios, the Negative Binomial Distribution is often used to model overdispersed count data. Let’s simulate some overdispersed data with rnbinom() and fit a Negative Binomial model using the glm.nb() function from the MASS package.

R
# Load the MASS package for the glm.nb function
library(MASS)

# Simulate overdispersed data
set.seed(456)
x <- rnorm(100)
y <- rnbinom(100, mu = exp(1 + 0.5 * x), size = 2)

# Fit a Negative Binomial model to the data
nb_model <- glm.nb(y ~ x)

# Summarize the model
summary(nb_model)

Output:

Call:
glm.nb(formula = y ~ x, init.theta = 2.653492838, link = log)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.01214    0.09002  11.244  < 2e-16 ***
x            0.48632    0.08750   5.558 2.73e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(2.6535) family taken to be 1)

Null deviance: 144.79 on 99 degrees of freedom
Residual deviance: 111.86 on 98 degrees of freedom
AIC: 439.23

Number of Fisher Scoring iterations: 1


Theta: 2.653
Std. Err.: 0.742

2 x log-likelihood: -433.227

  • The glm.nb() function fits a Negative Binomial regression model with a log link.
  • In this example, we simulate overdispersed count data using rnbinom(), with mean exp(1 + 0.5 * x) and size = 2, and fit the model with x as the predictor.
  • The estimated intercept (1.012) and slope (0.486) are close to the values 1 and 0.5 used in the simulation, and both are statistically significant. Theta (2.653) is the estimate of the size parameter, which was set to 2.
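The quantities discussed above can also be pulled out of the fitted object directly instead of being read off the printed summary. The sketch below repeats the simulation from Example 3 so that it runs on its own; the Wald intervals are a simple approximation added for illustration.

```r
library(MASS)  # provides glm.nb()

# Same simulation as in Example 3
set.seed(456)
x <- rnorm(100)
y <- rnbinom(100, mu = exp(1 + 0.5 * x), size = 2)
nb_model <- glm.nb(y ~ x)

# Regression coefficients on the log scale (the simulation used 1 and 0.5)
print(coef(nb_model))

# Estimated dispersion parameter theta and its standard error
# (the simulation used size = 2)
print(c(theta = nb_model$theta, se = nb_model$SE.theta))

# Approximate 95% Wald intervals for the coefficients
est <- coef(summary(nb_model))
wald_ci <- cbind(lower = est[, "Estimate"] - 1.96 * est[, "Std. Error"],
                 upper = est[, "Estimate"] + 1.96 * est[, "Std. Error"])
print(wald_ci)
```

Because the coefficients are on the log scale, exp(coef(nb_model)) gives multiplicative effects on the expected count.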

Example 4: Comparing Poisson and Negative Binomial Models

In practice, you may want to compare the Poisson and Negative Binomial models to assess which fits better. One common way to do this is the Akaike Information Criterion (AIC).

R
# Fit a Poisson model
poisson_model <- glm(y ~ x, family = "poisson")

# Compare AIC values
aic_values <- AIC(poisson_model, nb_model)
print(aic_values)

Output:

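              df      AIC
poisson_model  2 480.6543
nb_model       3 439.2270

  • Poisson model: The Poisson model assumes that the variance equals the mean.
  • Negative Binomial model: The Negative Binomial model adds a dispersion parameter (theta) to account for overdispersion, which is why it has one more parameter (df = 3 instead of 2).

The model with the lower AIC value is preferred. Here the Negative Binomial model (AIC 439.23) fits better than the Poisson model (AIC 480.65).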
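Besides AIC, a quick check on the Poisson assumption is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom. The sketch below repeats the simulation from Example 3 so that it runs on its own; the variable names are illustrative.

```r
# Same simulation as in Example 3
set.seed(456)
x <- rnorm(100)
y <- rnbinom(100, mu = exp(1 + 0.5 * x), size = 2)
poisson_model <- glm(y ~ x, family = poisson)

# Sum of squared Pearson residuals divided by residual degrees of freedom
pearson_resid <- residuals(poisson_model, type = "pearson")
dispersion <- sum(pearson_resid^2) / df.residual(poisson_model)

# Values well above 1 suggest overdispersion
print(dispersion)
```

A value near 1 is consistent with the Poisson model, while a clearly larger value points toward the Negative Binomial model.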

Conclusion

The Negative Binomial Distribution is an important tool for modeling overdispersed count data, where the variance is larger than the mean. In R, you can use the rnbinom() function to generate random numbers from this distribution, and the glm.nb() function from the MASS package to fit models. Understanding when to use the Negative Binomial Distribution and how to implement it in R can greatly improve the analysis of count data in fields such as epidemiology, ecology, and social sciences.


nidhi_biet