
Permutation Hypothesis Test in R Programming

Last Updated : 24 Nov, 2020

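In simple terms, a permutation hypothesis test compares a numerical variable between two groups. It repeatedly shuffles the observed values across the two groups to see how unusual the observed difference would be if group membership did not matter. The permutation hypothesis test is an alternative to: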

  • Independent two-sample t-test 
  • Mann-Whitney U aka Wilcoxon Rank-Sum Test

Let's implement this test in R programming.

Why use the Permutation Hypothesis Test? 

  • The sample size is small.
  • The assumptions of the parametric approach (for example, normality) are not met.
  • You want to test something other than the classic comparison of means or medians.
  • The standard error (SE) of the test statistic is difficult to estimate.

Permutation Hypothesis Test Steps

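  1. Specify a hypothesis
  2. Choose a test statistic (e.g., mean, median)
  3. Determine the distribution of the test statistic under the null hypothesis by repeatedly shuffling the observations between the groups
  4. Convert the observed test statistic to a p-value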
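
The four steps above can be sketched as a small reusable function. This is only a minimal illustration: the name perm_test, its arguments, and the toy data are assumptions made for this example and are not part of the article's code.

```r
# Hypothetical helper (not from the article): two-group permutation test.
perm_test <- function(values, groups, stat_fun = mean, P = 10000) {
  groups <- factor(groups)
  stopifnot(nlevels(groups) == 2, length(values) == length(groups))
  first <- groups == levels(groups)[1]

  # Step 2: the chosen test statistic, |stat(group 1) - stat(group 2)|
  test_stat <- function(v) abs(stat_fun(v[first]) - stat_fun(v[!first]))
  observed <- test_stat(values)

  # Step 3: null distribution from P random reshuffles of the values
  permuted <- replicate(P, test_stat(sample(values)))

  # Step 4: share of permuted statistics at least as large as the observed one
  list(observed = observed, p.value = mean(permuted >= observed))
}

set.seed(1)
# Toy data, invented for illustration only
x <- c(12, 15, 11, 19, 14, 30, 28, 35, 27, 31)
g <- rep(c("a", "b"), each = 5)
res <- perm_test(x, g, stat_fun = mean, P = 2000)
res$observed   # |14.2 - 30.2| = 16
res$p.value
```

Passing a different stat_fun, such as median or var, is one way to test something other than a difference in means.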

Note: P-value = (number of permutations whose test statistic is greater than or equal to the observed test statistic) / (total number of permutations).
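
As a small worked example of this ratio, the sketch below applies it to the observed statistic and the first 15 permuted statistics printed in the Output section. The variable names are chosen for this example. The last line shows a commonly used "+1" adjustment, which is an addition here and not something the article's code uses.

```r
# Observed |difference in means| and the first 15 permuted values,
# copied from the article's output (rounded to one decimal there)
observed <- 46.67424
perm_stats <- c(17.1, 32.4, 17.6, 47.1, 56.1, 28.9, 31.0, 40.8,
                6.8, 13.8, 9.1, 46.5, 28.9, 50.9, 32.7)

# Count the permutations at least as extreme as the observed statistic
n_extreme <- sum(perm_stats >= observed)    # 3 (47.1, 56.1, 50.9)

# P-value as defined in the note above
p_value <- n_extreme / length(perm_stats)   # 3 / 15 = 0.2

# Assumption, not in the article: a common variant that counts the
# observed arrangement itself, so the estimate is never exactly 0
p_adjusted <- (n_extreme + 1) / (length(perm_stats) + 1)   # 4 / 16 = 0.25
```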

Implementation in R

  • Dataset: Chicken Diet Data. This dataset is a subset of the "chickwts" data in R's built-in "datasets" package, restricted to the casein and meatmeal diets. The code below reads it from a file named "ChickData.csv".
  • Null hypothesis: The weight of a chicken is independent of the type of diet.
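
If the CSV file is not at hand, an equivalent data frame can probably be rebuilt from the built-in chickwts data. The sketch below assumes that ChickData.csv contains exactly the casein and meatmeal rows of chickwts, which is what the printed output of the article suggests.

```r
# Assumption: ChickData.csv holds the casein and meatmeal rows of chickwts
d <- subset(chickwts, feed %in% c("casein", "meatmeal"))
d$feed <- droplevels(d$feed)   # drop the four unused diet levels
rownames(d) <- NULL            # renumber the rows 1..23

str(d)
table(d$feed)
```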

Test Statistics

  • Test statistic #1: The absolute value of the difference in mean weights for the two diets, | Mean1 - Mean2 |. This is the quantity that the independent two-sided two-sample t-test is built on; the t-test additionally divides it by a standard error.
  • Test statistic #2: The absolute value of the difference in median weights for the two diets, | Median1 - Median2 |
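
Both statistics can also be computed compactly with tapply(). The sketch below is one possible idiom, not the article's code, and it assumes the data are the casein and meatmeal rows of the built-in chickwts data set.

```r
# Assumption: same rows as ChickData.csv, taken from the built-in chickwts
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))

# Per-diet summaries, named by feed level
group_means   <- tapply(d$weight, d$feed, mean)
group_medians <- tapply(d$weight, d$feed, median)

# Test statistic #1: absolute difference in means
test.stat1 <- abs(group_means[["casein"]] - group_means[["meatmeal"]])

# Test statistic #2: absolute difference in medians
test.stat2 <- abs(group_medians[["casein"]] - group_medians[["meatmeal"]])

test.stat1   # 46.67424
test.stat2   # 79
```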
R
# R program to illustrate
# Permutation Hypothesis Test

# load the data set
# (stringsAsFactors = TRUE keeps feed a factor in R >= 4.0,
#  so that levels(d$feed) below works)
d <- read.table(file = "ChickData.csv",
                header = TRUE, sep = ",",
                stringsAsFactors = TRUE)

# print the dataset
print(d)

# check the names
names(d)
levels(d$feed)

# how many observations in each diet?
table(d$feed)

# let's look at a boxplot of weight gain by those 2 diets
boxplot(d$weight ~ d$feed, las = 1,
        ylab = "weight (g)",
        xlab = "feed",
        main = "Weight by Feed")

# calculate the difference in sample MEANS
mean(d$weight[d$feed == "casein"])    # mean for casein
mean(d$weight[d$feed == "meatmeal"])  # mean for meatmeal

# let's calculate the absolute diff in means
test.stat1 <- abs(mean(d$weight[d$feed == "casein"]) -
                  mean(d$weight[d$feed == "meatmeal"]))
test.stat1

# calculate the difference in sample MEDIANS
median(d$weight[d$feed == "casein"])    # median for casein
median(d$weight[d$feed == "meatmeal"])  # median for meatmeal

# let's calculate the absolute diff in medians
test.stat2 <- abs(median(d$weight[d$feed == "casein"]) -
                  median(d$weight[d$feed == "meatmeal"]))
test.stat2

# Permutation Test

# for reproducibility of results
set.seed(1979)

# the number of observations to sample
n <- length(d$feed)

# the number of permutation samples to take
P <- 100000

# the variable we will resample from
variable <- d$weight

# initialize a matrix to store the permutation data
PermSamples <- matrix(0, nrow = n, ncol = P)

# each column is a permutation sample of data
# now, get those permutation samples, using a loop
# sample() without replacement shuffles the weights, which amounts
# to randomly reassigning the diet labels
for (i in 1:P)
  {
    PermSamples[, i] <- sample(variable,
                               size = n,
                               replace = FALSE)
  }

# we can take a quick look at the first 5 columns of PermSamples
PermSamples[, 1:5]

# initialize vectors to store all of the Test-stats
Perm.test.stat1 <- Perm.test.stat2 <- rep(0, P)

# loop thru, and calculate the test-stats
for (i in 1:P)
  {
    # calculate the perm-test-stat1 and save it
    Perm.test.stat1[i] <- abs(mean(PermSamples[d$feed == "casein", i]) -
                              mean(PermSamples[d$feed == "meatmeal", i]))

    # calculate the perm-test-stat2 and save it
    Perm.test.stat2[i] <- abs(median(PermSamples[d$feed == "casein", i]) -
                              median(PermSamples[d$feed == "meatmeal", i]))
  }

# before going too far with this,
# let's remind ourselves of
# the TEST STATS
test.stat1; test.stat2

# and, take a look at the first 15
# permutation-TEST STATS for 1 and 2
round(Perm.test.stat1[1:15], 1)
round(Perm.test.stat2[1:15], 1)

# and, let's calculate the permutation p-value
# notice how we can ask R a true/false question
(Perm.test.stat1 >= test.stat1)[1:15]

# and if we ask for the mean of all of those,
# it treats 0 = FALSE, 1 = TRUE
mean((Perm.test.stat1 >= test.stat1)[1:15])

# Calculate the p-value, for all P = 100,000
mean(Perm.test.stat1 >= test.stat1)

# and, let's calculate the p-value for
# option 2 of the test statistic (abs diff in medians)
mean(Perm.test.stat2 >= test.stat2)

Output:

> print(d)
   weight     feed
1     325 meatmeal
2     257 meatmeal
3     303 meatmeal
4     315 meatmeal
5     380 meatmeal
6     153 meatmeal
7     263 meatmeal
8     242 meatmeal
9     206 meatmeal
10    344 meatmeal
11    258 meatmeal
12    368   casein
13    390   casein
14    379   casein
15    260   casein
16    404   casein
17    318   casein
18    352   casein
19    359   casein
20    216   casein
21    222   casein
22    283   casein
23    332   casein
> names(d)
[1] "weight" "feed"
> levels(d$feed)
[1] "casein"   "meatmeal"
> table(d$feed)

  casein meatmeal
      12       11
Output Graph
> mean(d$weight[d$feed == "casein"]) # mean for casein
[1] 323.5833
> mean(d$weight[d$feed == "meatmeal"]) # mean for meatmeal
[1] 276.9091
> test.stat1
[1] 46.67424
> median(d$weight[d$feed == "casein"]) # median for casein
[1] 342
> median(d$weight[d$feed == "meatmeal"]) # median for meatmeal
[1] 263
> test.stat2
[1] 79
> PermSamples[, 1:5]
      [,1] [,2] [,3] [,4] [,5]
 [1,]  379  283  380  352  206
 [2,]  380  303  258  260  380
 [3,]  257  206  379  380  153
 [4,]  283  242  222  404  359
 [5,]  222  260  325  258  258
 [6,]  315  352  153  379  263
 [7,]  352  263  263  325  325
 [8,]  153  325  315  359  216
 [9,]  368  379  344  242  260
[10,]  344  258  368  368  257
[11,]  359  257  206  257  315
[12,]  206  153  404  222  303
[13,]  404  344  303  390  390
[14,]  325  318  318  303  352
[15,]  242  404  332  263  404
[16,]  390  380  257  206  379
[17,]  260  332  216  315  318
[18,]  303  359  352  344  368
[19,]  263  222  242  283  222
[20,]  332  368  260  332  344
[21,]  318  315  283  318  283
[22,]  216  390  390  153  332
[23,]  258  216  359  216  242
> test.stat1; test.stat2
[1] 46.67424
[1] 79
> round(Perm.test.stat1[1:15], 1)
 [1] 17.1 32.4 17.6 47.1 56.1 28.9 31.0 40.8  6.8 13.8  9.1 46.5 28.9 50.9 32.7
> round(Perm.test.stat2[1:15], 1)
 [1] 61.0 75.0  4.5 59.0 78.0 17.0 62.0 38.5  4.5 16.0 23.0 60.5 63.5 75.0 37.0
> (Perm.test.stat1 >= test.stat1)[1:15]
 [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
> mean((Perm.test.stat1 >= test.stat1)[1:15])
[1] 0.2
> mean(Perm.test.stat1 >= test.stat1)
[1] 0.09959
> mean(Perm.test.stat2 >= test.stat2)
[1] 0.05407
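
For comparison, the two classical alternatives named at the start of the article can be run on the same data. This is only a sketch: it assumes the data are the casein and meatmeal rows of the built-in chickwts data set, and it relies on the defaults of t.test() (Welch's unequal-variance version) and wilcox.test().

```r
# Assumption: same rows as ChickData.csv, taken from the built-in chickwts
d <- droplevels(subset(chickwts, feed %in% c("casein", "meatmeal")))

# Independent two-sample t-test (two-sided, Welch by default)
tt <- t.test(weight ~ feed, data = d)

# Mann-Whitney U, also known as the Wilcoxon rank-sum test
wt <- wilcox.test(weight ~ feed, data = d)

# p-values to set beside the permutation p-values above
c(t_test = tt$p.value, wilcoxon = wt$p.value)
```

Large disagreements between these p-values and the permutation p-values would be worth investigating, since all three approaches address a closely related question.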

samrat2825