Negative Binomial Distribution using rnbinom in R
Last Updated : 19 Sep, 2024
This article covers the theory behind the Negative Binomial Distribution, shows how to use rnbinom() in R, and provides examples of generating random numbers, visualizing the distribution, and fitting a Negative Binomial model to overdispersed count data using the R programming language.
Negative Binomial Distribution
The Negative Binomial Distribution is a discrete probability distribution often used to model count data whose variance exceeds its mean, a property known as overdispersion. It describes the number of failures that occur before a specified number of successes is reached in a sequence of independent Bernoulli trials. In R, the function rnbinom() generates random numbers from this distribution.
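To make the overdispersion claim concrete, the sketch below compares the standard closed-form mean and variance of the number of failures, r(1 - p)/p and r(1 - p)/p^2, with values computed numerically from dnbinom(). The parameter values r = 5 and p = 0.3 and the variable names are chosen only for illustration.

```r
# r = target number of successes, p = success probability per trial
r <- 5
p <- 0.3

# Closed-form moments of the number of failures before the r-th success
theo_mean <- r * (1 - p) / p       # about 11.67
theo_var  <- r * (1 - p) / p^2     # about 38.89

# Numerical check: sum over the probability mass function.
# The upper bound 1000 is an arbitrary cutoff; the omitted tail is negligible here.
k   <- 0:1000
pmf <- dnbinom(k, size = r, prob = p)
num_mean <- sum(k * pmf)
num_var  <- sum((k - num_mean)^2 * pmf)

c(theo_mean = theo_mean, num_mean = num_mean)
c(theo_var = theo_var, num_var = num_var)

# The variance exceeds the mean whenever p < 1
theo_var > theo_mean
```

Because the variance equals the mean divided by p, it is always larger than the mean for p below 1, which is what distinguishes this distribution from the Poisson.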
rnbinom() in R
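The rnbinom() function draws random counts from the Negative Binomial Distribution. Its syntax is as follows:
rnbinom(n, size, prob, mu)
Where,
- n: Number of observations to generate.
- size: The target number of successes (the parameter r). It must be strictly positive but need not be an integer; when mu is used, it acts as the dispersion parameter.
- prob: The probability of success in each trial (the parameter p).
- mu: The mean of the distribution, an alternative to prob. Specify either prob or mu, not both.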
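Besides prob, rnbinom() and the related dnbinom() accept a mean argument mu, which Example 3 relies on. The two parametrizations are linked by prob = size / (size + mu). A minimal sketch of that equivalence, with illustrative values and variable names:

```r
size <- 5
prob <- 0.3

# Mean implied by (size, prob); equivalently prob = size / (size + mu)
mu <- size * (1 - prob) / prob

# Evaluate the probability mass function under both parametrizations
k <- 0:50
pmf_prob <- dnbinom(k, size = size, prob = prob)
pmf_mu   <- dnbinom(k, size = size, mu = mu)

# The two vectors agree up to floating-point error
same_dist <- isTRUE(all.equal(pmf_prob, pmf_mu))
same_dist
```

In the mu parametrization the variance is mu + mu^2 / size, so smaller values of size mean stronger overdispersion.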
Example 1: Generate Random Numbers Using rnbinom()
Let’s generate 1000 random numbers from a Negative Binomial Distribution with a target of 5 successes and a success probability of 0.3 using rnbinom(). Each generated value is the number of failures observed before the 5th success.
    # Set seed for reproducibility
    set.seed(123)

    # Generate random numbers from Negative Binomial Distribution
    neg_binom_data <- rnbinom(n = 1000, size = 5, prob = 0.3)

    # Display the first few numbers
    head(neg_binom_data)
Output:
[1] 11 19 16 8 6 22
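As a quick sanity check on the simulated values, the sample mean and variance can be compared with the theoretical ones, r(1 - p)/p and r(1 - p)/p^2. The sketch below regenerates the same draws so that it runs on its own; the variable names are illustrative.

```r
set.seed(123)
draws <- rnbinom(n = 1000, size = 5, prob = 0.3)

# Theoretical moments for size = 5, prob = 0.3
theo_mean <- 5 * (1 - 0.3) / 0.3
theo_var  <- 5 * (1 - 0.3) / 0.3^2

# Sample moments should be close to, but not exactly equal to, the theoretical ones
round(c(sample_mean = mean(draws), theoretical_mean = theo_mean), 2)
round(c(sample_var = var(draws), theoretical_var = theo_var), 2)
```

With 1000 draws the sample values should land near the theoretical ones, and the sample variance should be well above the sample mean.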
Example 2: Visualizing the Negative Binomial Distribution
We can visualize the generated data using a histogram to see the shape of the distribution.
    # Load necessary library
    library(ggplot2)

    # Create a histogram
    ggplot(data = data.frame(x = neg_binom_data), aes(x = x)) +
      geom_histogram(binwidth = 1, fill = "blue", color = "black") +
      labs(title = "Histogram of Negative Binomial Distribution",
           x = "Number of Failures",
           y = "Frequency") +
      theme_minimal()
Output:
Visualizing the Negative Binomial Distribution
The histogram shows the distribution of the number of failures before achieving the specified number of successes. The distribution is skewed to the right, which is typical of count data with a low probability of success.
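The shape of the histogram can also be checked numerically by comparing the observed relative frequencies with the theoretical probabilities from dnbinom(). This is a base-R sketch with illustrative variable names; it regenerates the draws from Example 1 so that it is self-contained.

```r
set.seed(123)
draws <- rnbinom(n = 1000, size = 5, prob = 0.3)

# Support to compare on: 0 up to the largest observed count
k <- 0:max(draws)

# tabulate() counts the values 1..nbins, so shift by 1 to include zero
empirical   <- tabulate(draws + 1, nbins = length(k)) / length(draws)
theoretical <- dnbinom(k, size = 5, prob = 0.3)

comparison <- data.frame(
  failures    = k,
  empirical   = empirical,
  theoretical = round(theoretical, 4)
)
head(comparison, 10)
```

The two columns should track each other closely, with small differences due to sampling noise.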
Example 3: Fitting a Negative Binomial Model to Simulated Data using glm.nb()
In real-world scenarios, the Negative Binomial Distribution is often used to model overdispersed count data. Let’s simulate some overdispersed data and fit a Negative Binomial model using the MASS package.
    # Load the MASS package for the glm.nb function
    library(MASS)

    # Simulate overdispersed data
    set.seed(456)
    x <- rnorm(100)
    y <- rnbinom(100, mu = exp(1 + 0.5 * x), size = 2)

    # Fit a Negative Binomial model to the data
    nb_model <- glm.nb(y ~ x)

    # Summarize the model
    summary(nb_model)
Output:
Call:
glm.nb(formula = y ~ x, init.theta = 2.653492838, link = log)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.01214    0.09002  11.244  < 2e-16 ***
x            0.48632    0.08750   5.558 2.73e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(2.6535) family taken to be 1)
Null deviance: 144.79 on 99 degrees of freedom
Residual deviance: 111.86 on 98 degrees of freedom
AIC: 439.23
Number of Fisher Scoring iterations: 1
Theta: 2.653
Std. Err.: 0.742
2 x log-likelihood: -433.227
- The glm.nb() function is used to fit a Negative Binomial regression model.
- In this example, we simulate overdispersed count data using rnbinom() with the mean parametrization (mu and size) and fit the model with a linear predictor involving x.
- The summary reports the coefficient estimates and their significance, the model fit, and the estimated dispersion parameter Theta. The estimates (about 1.01 and 0.49) are close to the values 1 and 0.5 used in the simulation.
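Individual pieces of the fit can be pulled out of the model object rather than read off the printed summary. The sketch below repeats the simulation from this example so that it runs on its own, then extracts the coefficients, the estimated theta, and predicted mean counts; the data frame new_x is a hypothetical input introduced for illustration.

```r
library(MASS)

# Same simulation as in the example: true intercept 1, true slope 0.5, size 2
set.seed(456)
x <- rnorm(100)
y <- rnbinom(100, mu = exp(1 + 0.5 * x), size = 2)
nb_model <- glm.nb(y ~ x)

# Estimated coefficients next to the values used in the simulation
cbind(estimate = coef(nb_model), true = c(1, 0.5))

# Estimated dispersion parameter (the data were simulated with size = 2)
nb_model$theta

# Predicted mean counts at x = 0 and x = 1, on the response (count) scale
new_x <- data.frame(x = c(0, 1))
predicted_means <- predict(nb_model, newdata = new_x, type = "response")
predicted_means
```

Because the model uses a log link, a one-unit increase in x multiplies the expected count by exp() of the slope coefficient.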
Example 4: Comparing Poisson and Negative Binomial Models
In practice, you may want to compare the Poisson and Negative Binomial models to assess which fits better. A common way to do this is with the Akaike Information Criterion (AIC), which rewards goodness of fit and penalizes additional parameters.
    # Fit a Poisson model
    poisson_model <- glm(y ~ x, family = "poisson")

    # Compare AIC values
    aic_values <- AIC(poisson_model, nb_model)
    print(aic_values)
Output:
              df      AIC
poisson_model  2 480.6543
nb_model       3 439.2270
- Poisson model: The Poisson model assumes that the mean equals the variance.
- Negative Binomial model: The Negative Binomial model accounts for overdispersion.
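The AIC values help in model comparison; the model with the lower AIC value is preferred. Here the Negative Binomial model (AIC 439.23) is preferred over the Poisson model (AIC 480.65), which is expected because the data were simulated with overdispersion.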
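A complementary check, not part of the comparison above, is the Pearson dispersion statistic of the Poisson fit: the sum of squared Pearson residuals divided by the residual degrees of freedom. Values well above 1 suggest overdispersion. The sketch below uses base R only and repeats the simulation from Example 3 so that it runs on its own.

```r
# Same simulated data as in Example 3
set.seed(456)
x <- rnorm(100)
y <- rnbinom(100, mu = exp(1 + 0.5 * x), size = 2)

# Poisson regression, which assumes variance equal to the mean
poisson_model <- glm(y ~ x, family = poisson)

# Pearson dispersion statistic: roughly 1 if the Poisson assumption holds
pearson_resid <- residuals(poisson_model, type = "pearson")
dispersion <- sum(pearson_resid^2) / df.residual(poisson_model)
dispersion
```

A ratio clearly greater than 1 points in the same direction as the AIC comparison: the Poisson model understates the variability in the counts.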
Conclusion
The Negative Binomial Distribution is an important tool for modeling overdispersed count data, where the variance is larger than the mean. In R, you can use the rnbinom() function to generate random numbers from this distribution, and the glm.nb() function from the MASS package to fit models. Understanding when to use the Negative Binomial Distribution and how to implement it in R can greatly improve the analysis of count data in fields such as epidemiology, ecology, and social sciences.