Pearson Correlation Testing in R Programming

Last Updated : 01 Oct, 2024

Correlation is a statistical measure that indicates how strongly two variables are related, and it can also be computed for every pair among several variables. For instance, if one is interested in knowing whether there is a relationship between the heights of fathers and sons, a correlation coefficient can be calculated to answer this question. A correlation coefficient always lies between -1 and +1. It is a scaled version of covariance: dividing the covariance by the product of the two standard deviations removes the units and leaves a value that describes both the direction and the strength of the relationship. This article shows how to compute and test the Pearson correlation coefficient using the R Programming Language.

Pearson Correlation Testing in R

There are mainly two types of correlation: 

  1. Parametric Correlation: It measures the linear dependence between two variables (x and y). The Pearson correlation test is called parametric because it depends on the distribution of the data.
  2. Non-Parametric Correlation: Rank-based correlation coefficients, such as Spearman's rho and Kendall's tau, which do not assume a particular distribution of the data.
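A quick way to see the difference between the two types is to compare Pearson's coefficient with Spearman's rank-based coefficient on a relationship that is monotonic but not linear. The data below (y = x^3) are an illustrative assumption, not taken from this article; the variable names are arbitrary.

```r
# Illustrative data (an assumption): y always increases with x, but not linearly
x <- 1:10
y <- x^3

# Parametric: measures only the linear part of the dependence
r_pearson <- cor(x, y, method = "pearson")

# Non-parametric: works on ranks, so any strictly increasing relationship gives 1
rho_spearman <- cor(x, y, method = "spearman")

cat("Pearson: ", r_pearson, "\n")
cat("Spearman:", rho_spearman, "\n")
```

Spearman's coefficient is 1 because the ranks of x and y agree perfectly, while Pearson's coefficient stays below 1 (roughly 0.93) because the points do not lie on a straight line.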

The Pearson correlation coefficient is probably the most widely used measure of the linear relationship between two normally distributed variables, and is therefore often simply called "the correlation coefficient". The formula for calculating the Pearson correlation coefficient is as follows:

\displaystyle r = \frac{\sum (x - m_x)(y - m_y)}{\sqrt{\sum (x - m_x)^2 \sum (y - m_y)^2}}

where, 

  • r: Pearson correlation coefficient
  • x and y: two vectors of length n
  • m_x and m_y: the means of x and y, respectively
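The formula above can be translated line by line into R and compared with the built-in cor() function. This is a minimal sketch: pearson_r is a hypothetical helper name, and the mtcars columns wt and mpg are used only as convenient test data.

```r
# Direct translation of the Pearson formula (pearson_r is a hypothetical helper)
pearson_r <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  mx <- mean(x)                                     # m_x
  my <- mean(y)                                     # m_y
  num <- sum((x - mx) * (y - my))                   # numerator
  den <- sqrt(sum((x - mx)^2) * sum((y - my)^2))    # denominator
  num / den
}

# Compare with R's built-in cor() on the mtcars data set
manual  <- pearson_r(mtcars$wt, mtcars$mpg)
builtin <- cor(mtcars$wt, mtcars$mpg, method = "pearson")
cat("Formula:", manual, " cor():", builtin, "\n")
```

Both values agree up to floating-point rounding; cor() is preferable in practice because it also handles matrices and other methods.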

Note:

  • r takes a value between -1 (perfect negative linear correlation) and 1 (perfect positive linear correlation).
  • r = 0 means no linear correlation; the variables may still be related in a non-linear way.
  • It cannot be applied to ordinal variables.
  • A moderate sample size (20-30 or more) is recommended for a good estimate.
  • It is not robust to outliers: a few extreme points can produce a misleading value.
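Two of these notes can be checked directly in R. The data below are made up for illustration (an assumption, not from the article): a U-shaped relationship whose Pearson coefficient is essentially 0, and a perfect straight line that a single extreme point pushes to a negative coefficient.

```r
# Note on r = 0: y is fully determined by x, but the relationship is U-shaped
x <- -3:3
y <- x^2
r_curve <- cor(x, y)      # essentially 0

# Note on outliers: ten points on a perfect straight line ...
x1 <- 1:10
y1 <- 1:10
r_line <- cor(x1, y1)     # 1, up to rounding

# ... plus one extreme point
x2 <- c(x1, 11)
y2 <- c(y1, -30)
r_outlier <- cor(x2, y2)  # the sign flips to negative

cat("U-shape:       ", r_curve, "\n")
cat("Line:          ", r_line, "\n")
cat("Line + outlier:", r_outlier, "\n")
```

Plotting the data before computing r (as in the scatter plot section of this article) is the simplest way to catch both situations.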

Implementation in R

R provides two functions for calculating the Pearson correlation coefficient: cor() and cor.test(). cor() returns only the correlation coefficient, whereas cor.test() tests for association between paired samples and returns both the correlation coefficient and the significance level (p-value) of the correlation.

Syntax:

cor(x, y, method = "pearson")
cor.test(x, y, method = "pearson")

Parameters: 

  • x, y: numeric vectors of the same length
  • method: the correlation method, one of "pearson" (the default), "kendall" or "spearman"
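Besides two vectors, cor() also accepts a numeric matrix or data frame, in which case it returns the coefficient for every pair of columns. This sketch uses four columns of the built-in mtcars data set, the same ones examined later with rcorr(); the variable names vars and cor_mat are arbitrary.

```r
# Passing a data frame instead of two vectors returns a correlation matrix
vars <- c("mpg", "wt", "hp", "disp")
cor_mat <- cor(mtcars[, vars], method = "pearson")

# The matrix is symmetric with 1s on the diagonal; round it for display
print(round(cor_mat, 3))
```

cor() gives only the coefficients; p-values for each pair require cor.test() or a helper such as Hmisc's rcorr().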

1: Correlation Coefficient in R Using the cor() Method

The cor() method computes the Pearson coefficient for two numeric vectors:

R
# R program to illustrate
# Pearson correlation testing
# using cor()

# Taking two numeric vectors
# with the same length
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

# Calculating the correlation coefficient
# using the cor() method
result <- cor(x, y, method = "pearson")

# Print the result
cat("Pearson correlation coefficient is:", result)

Output: 

Pearson correlation coefficient is: 0.5357143
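As a check on this output, the formula can be evaluated by hand for the seven pairs. Both vectors happen to have mean 4 (y is a reordering of x), so the sums of squared deviations are equal:

```latex
\begin{aligned}
m_x &= m_y = 4 \\
x - m_x &= (-3, -2, -1, 0, 1, 2, 3), \qquad y - m_y = (-3, -1, 2, -2, 3, 0, 1) \\
\sum (x - m_x)(y - m_y) &= 9 + 2 - 2 + 0 + 3 + 0 + 3 = 15 \\
\sum (x - m_x)^2 &= 9 + 4 + 1 + 0 + 1 + 4 + 9 = 28 \\
\sum (y - m_y)^2 &= 9 + 1 + 4 + 4 + 9 + 0 + 1 = 28 \\
r &= \frac{15}{\sqrt{28 \times 28}} = \frac{15}{28} \approx 0.5357143
\end{aligned}
```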

2: Correlation Test in R Using the cor.test() Method

The cor.test() method computes the same coefficient and also tests whether it differs significantly from zero:

R
# R program to illustrate
# Pearson correlation testing
# using cor.test()

# Taking two numeric vectors
# with the same length
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

# Calculating the correlation coefficient
# using the cor.test() method
result <- cor.test(x, y, method = "pearson")

# Print the result
print(result)

Output: 

Pearson's product-moment correlation

data:  x and y
t = 1.4186, df = 5, p-value = 0.2152
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3643187  0.9183058
sample estimates:
      cor
0.5357143

In the output above:

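  • t is the value of the test statistic (t = 1.4186).
  • df is the degrees of freedom, n - 2 = 5.
  • p-value is the probability of a test statistic at least this extreme if the true correlation were 0 (p-value = 0.2152). Because it is above 0.05, this correlation is not statistically significant at the 5% level.
  • alternative hypothesis is a character string describing the alternative hypothesis (true correlation is not equal to 0).
  • 95 percent confidence interval is the interval estimate for the true correlation (-0.3643 to 0.9183); it contains 0, which agrees with the large p-value.
  • sample estimates gives the correlation coefficient, labelled cor for the Pearson method (cor = 0.5357).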
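The components listed above can also be read directly from the object returned by cor.test(), which is useful when results feed into further code. The sketch below also recomputes the t statistic from r with the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2) and the two-sided p-value from the t distribution; the variable names are arbitrary.

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)
res <- cor.test(x, y, method = "pearson")

# Individual components of the returned "htest" object
r      <- unname(res$estimate)   # sample correlation (cor)
t_stat <- unname(res$statistic)  # t
dof    <- unname(res$parameter)  # degrees of freedom, n - 2
p      <- res$p.value            # two-sided p-value
ci     <- res$conf.int           # 95 percent confidence interval

# Recompute t and the two-sided p-value from r
n <- length(x)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

cat("t:", t_stat, "vs", t_manual, "\n")
cat("p:", p, "vs", p_manual, "\n")
```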

Testing for Statistical Significance

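It is important to check whether an observed correlation is statistically significant. The rcorr() function from the Hmisc package computes a correlation matrix together with a matrix of p-values, so several pairs of variables can be tested at once.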

R
library(ggplot2)
library(corrplot)
library(Hmisc)

# Load the dataset
data("mtcars")

# Calculate correlations with significance levels
cor_test <- rcorr(as.matrix(mtcars[, c("mpg", "wt", "hp", "disp")]),
                  type = "pearson")

# Display the correlation coefficients
cor_test$r

# Display the p-values
cor_test$P

Output:

            mpg         wt         hp       disp
mpg   1.0000000 -0.8676594 -0.7761684 -0.8475514
wt   -0.8676594  1.0000000  0.6587479  0.8879799
hp   -0.7761684  0.6587479  1.0000000  0.7909486
disp -0.8475514  0.8879799  0.7909486  1.0000000

              mpg           wt           hp         disp
mpg            NA 1.293958e-10 1.787835e-07 9.380328e-10
wt   1.293958e-10           NA 4.145827e-05 1.222311e-11
hp   1.787835e-07 4.145827e-05           NA 7.142679e-08
disp 9.380328e-10 1.222311e-11 7.142679e-08           NA

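  • p < 0.05: the correlation is statistically significant at the 5% level.
  • p > 0.05: the correlation is not statistically significant at the 5% level.

All six pairs above have p-values far below 0.05, so every correlation among mpg, wt, hp and disp is significant.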
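If the Hmisc package is not available, a similar table can be built in base R by running cor.test() on every pair of columns. This is a minimal sketch that uses only base R; the column names var1, var2, r and p_value are illustrative choices, not part of any package API.

```r
# Base-R sketch: pairwise Pearson tests without Hmisc
vars <- c("mpg", "wt", "hp", "disp")

# All unordered pairs of variable names (6 pairs for 4 variables)
pairs <- combn(vars, 2, simplify = FALSE)

# One row per pair: coefficient and two-sided p-value from cor.test()
pairwise <- do.call(rbind, lapply(pairs, function(p) {
  res <- cor.test(mtcars[[p[1]]], mtcars[[p[2]]], method = "pearson")
  data.frame(var1 = p[1], var2 = p[2],
             r = unname(res$estimate),
             p_value = res$p.value)
}))

print(pairwise)
```

A long table like this is often easier to filter or sort than a pair of square matrices.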

Visualizing Relationships Using ggplot2

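It is helpful to visualize the relationship between two variables with a scatter plot, which shows whether the relationship actually looks linear and whether any outliers are present before relying on the Pearson coefficient.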

R
library(ggplot2)

# Scatter plot with a regression line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue", size = 2) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Scatter Plot with Pearson Correlation",
       x = "Weight (wt)", y = "Miles Per Gallon (mpg)") +
  theme_minimal()

Output:

[Figure: "Scatter Plot with Pearson Correlation", Miles Per Gallon (mpg) against Weight (wt) with a red linear regression line]

geom_smooth(method = "lm"): Adds a linear regression line to visualize the linear relationship; se = FALSE hides the confidence band around the line.

Conclusion

The Pearson correlation coefficient is a powerful tool for understanding the linear relationship between two continuous variables. In R, calculating and interpreting it is straightforward with the built-in cor() and cor.test() functions, together with packages such as Hmisc for testing a whole correlation matrix and ggplot2 for visualization. By following the methods outlined in this article, you can perform Pearson correlation testing, visualize relationships, and assess statistical significance.


AmiyaRanjanRout