Sentiment Analysis for Customer Reviews in R

Last Updated : 24 Jun, 2025

Sentiment analysis, also known as opinion mining, computationally identifies and categorizes the opinions expressed in text. It estimates the polarity (positive, negative or neutral) of a piece of text to gauge the attitude of its author. For customer reviews, sentiment analysis helps businesses understand how customers perceive their products or services. This article walks through sentiment analysis of customer reviews using the R Programming Language.

Understanding the Dataset

The dataset used in this project contains TripAdvisor hotel reviews, where each row represents one customer review. The dataset includes the following key columns:

  • S.No.: Serial number of the review.
  • Review: The actual customer review text.
  • Rating: Customer rating (usually on a scale of 1 to 5, representing their experience).

We will focus on the Review column, which contains the text we analyze for sentiment. The Rating column can serve as a reference to check how well the sentiment scores match the numerical ratings.
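One way to use the Rating column as a reference is sketched below on made-up values: map each rating to an expected polarity and check how often the sign of a sentiment score agrees with it. The mapping (1-2 negative, 3 neutral, 4-5 positive) and the names `rating`, `score` and `rating_polarity` are assumptions for illustration, not part of the dataset.

```r
# Hypothetical ratings and sentiment scores for six reviews (made-up values)
rating <- c(5, 4, 1, 3, 2, 5)
score  <- c(2.1, 0.4, -1.3, 0.0, 0.6, 1.8)

# Assumed mapping: 1-2 -> negative (-1), 3 -> neutral (0), 4-5 -> positive (1)
rating_polarity <- ifelse(rating >= 4, 1, ifelse(rating <= 2, -1, 0))

# Share of reviews where the sign of the score matches the rating polarity
agreement <- mean(sign(score) == rating_polarity)
print(agreement)

# Cross-tabulate to see where the two disagree
print(table(rating = rating_polarity, sentiment = sign(score)))
```

On real data, the same comparison could be run with data$Rating and any of the sentiment vectors computed later in the article.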

You can download the dataset from here: TripAdvisor

1. Installing and Loading Required Packages

We need to install and load the required R packages.

  • tm: Provides text mining functions
  • SnowballC: Implements stemming for text data
  • syuzhet: Provides sentiment analysis functions
  • tidyverse: A collection of packages for data manipulation
  • wordcloud: Used for visualizing word frequencies
  • ggplot2: Used for data visualization
R
# Install the packages once, then load them for this session
install.packages(c("tm", "SnowballC", "syuzhet", "tidyverse", "wordcloud", "ggplot2"))

library(tm)
library(SnowballC)
library(syuzhet)
library(tidyverse)
library(wordcloud)
library(ggplot2)

2. Loading the Dataset

Next, we will load the CSV file containing the reviews. The str() function will display the structure of the dataframe, showing the data types of each column and a preview of the data.

R
# Read the reviews; adjust the path to wherever the CSV is stored
data <- read.csv("/content/tripadvisor.csv", header = TRUE)
str(data)

Output:

[Figure: Loading the Dataset]
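Before going further, it can help to check the loaded table for empty reviews and out-of-range ratings. The sketch below runs on a tiny in-memory stand-in for the CSV; the three rows and the names `csv_text`, `reviews` and `reviews_clean` are hypothetical.

```r
# Stand-in for the CSV: a tiny in-memory table with the same three columns
csv_text <- 'S.No.,Review,Rating
1,"nice hotel, great staff",4
2,"dirty room and rude service",1
3,,3'
reviews <- read.csv(text = csv_text, header = TRUE)

# Column names as R sees them
print(names(reviews))

# Flag reviews that are missing or contain only whitespace
is_empty <- is.na(reviews$Review) | trimws(reviews$Review) == ""
n_empty <- sum(is_empty)
print(n_empty)

# Ratings are expected to fall between 1 and 5
print(all(reviews$Rating %in% 1:5))

# Keep only the rows that have review text
reviews_clean <- reviews[!is_empty, ]
```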

3. Creating and Inspecting the Corpus

We convert the review text to UTF-8, replacing any bytes that cannot be converted with their hex codes, and then create a corpus for text processing.

R
# Convert the text to UTF-8 (unconvertible bytes become <xx>), then build a corpus
corpus <- iconv(data$Review, to = "UTF-8", sub = "byte")
corpus <- Corpus(VectorSource(corpus))

inspect(corpus[1:5])

Output:

[Figure: Corpus]
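To see what sub = "byte" does, here is a minimal sketch on a single hypothetical string that contains a byte which is not valid UTF-8. Unlike the call above, it sets `from` explicitly so the result does not depend on the local encoding; that choice is an assumption made for the demonstration.

```r
# A string containing the byte 0xE9, which is not valid UTF-8 on its own
raw_text <- "caf\xe9 was lovely"
print(validUTF8(raw_text))

# sub = "byte" replaces each unconvertible byte with its hex code in angle brackets
fixed <- iconv(raw_text, from = "UTF-8", to = "UTF-8", sub = "byte")
print(fixed)
print(validUTF8(fixed))
```

Without `sub`, iconv() returns NA for a string it cannot convert, which would silently drop the whole review.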

4. Cleaning the Corpus

We will clean the text by converting it to lowercase, removing punctuation, numbers and stopwords, collapsing extra whitespace and applying stemming.

R
cleaned_corpus <- tm_map(corpus, content_transformer(tolower))        # lowercase
cleaned_corpus <- tm_map(cleaned_corpus, removePunctuation)           # drop punctuation
cleaned_corpus <- tm_map(cleaned_corpus, removeNumbers)               # drop digits
cleaned_corpus <- tm_map(cleaned_corpus, removeWords, stopwords('english'))
cleaned_corpus <- tm_map(cleaned_corpus, stripWhitespace)             # collapse spaces
cleaned_corpus <- tm_map(cleaned_corpus, stemDocument)                # stem words

inspect(cleaned_corpus[1:5])

Output:

[Figure: Cleaning the Corpus]
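The effect of these steps on one sentence can be approximated in base R, as sketched below. This is not how tm implements them: the stopword list is a tiny hypothetical one, stemming is left out, and `clean_text` is an illustrative helper.

```r
# Tiny, hypothetical stopword list (tm's stopwords("english") is much longer)
stop_words <- c("the", "was", "and", "a", "we", "in")

clean_text <- function(x, stop_words) {
  x <- tolower(x)                            # lowercase
  x <- gsub("[[:punct:]]+", "", x)           # drop punctuation
  x <- gsub("[[:digit:]]+", "", x)           # drop numbers
  # Remove stopwords only when they appear as whole words
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, "", x, perl = TRUE)
  x <- gsub("\\s+", " ", x)                  # collapse repeated whitespace
  trimws(x)
}

cleaned <- clean_text("The room was CLEAN, and we stayed 3 nights in May!", stop_words)
print(cleaned)
```

Whitespace is collapsed after stopword removal because deleting words leaves gaps behind, which is also why stripWhitespace comes late in the tm pipeline.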

5. Sampling the Data

We will draw a random sample of 200 reviews to make the analysis more manageable, setting a seed so the same sample is drawn on every run.

R
set.seed(123)  # fixed seed for a reproducible sample

sampled_reviews <- sample(data$Review, 200)
sampled_corpus <- Corpus(VectorSource(iconv(sampled_reviews, to = "UTF-8", sub = "byte")))
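A variation on this step, sketched on a hypothetical toy table: sampling row positions instead of the text keeps each sampled review aligned with its rating, and capping the sample size avoids an error when fewer than 200 rows are available. The names `reviews`, `idx` and `sampled_ratings` are illustrative.

```r
# Toy stand-in for the reviews table (hypothetical values)
reviews <- data.frame(
  Review = paste("review", 1:50),
  Rating = rep(1:5, times = 10)
)

set.seed(123)
n_sample <- min(200, nrow(reviews))       # never ask for more rows than exist
idx <- sample(nrow(reviews), n_sample)    # sample row positions, not the text

sampled_reviews <- reviews$Review[idx]
sampled_ratings <- reviews$Rating[idx]    # stays aligned with sampled_reviews
print(head(data.frame(sampled_reviews, sampled_ratings)))
```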

6. Cleaning the Sampled Corpus

We now clean the sampled corpus with the same steps we applied to the full corpus.

R
cleaned_sampled_corpus <- tm_map(sampled_corpus, content_transformer(tolower))
cleaned_sampled_corpus <- tm_map(cleaned_sampled_corpus, removePunctuation)
cleaned_sampled_corpus <- tm_map(cleaned_sampled_corpus, removeNumbers)
cleaned_sampled_corpus <- tm_map(cleaned_sampled_corpus, removeWords, stopwords('english'))
cleaned_sampled_corpus <- tm_map(cleaned_sampled_corpus, stripWhitespace)
cleaned_sampled_corpus <- tm_map(cleaned_sampled_corpus, stemDocument)

7. Creating Sparse Term Document Matrix

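We will build a Term Document Matrix (TDM) from the cleaned sample, weighting each entry by TF-IDF instead of raw counts. tm stores the TDM in a sparse format, which saves memory. The as.matrix() call then converts it to an ordinary dense matrix, which is manageable for 200 reviews.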

R
# Rows are terms, columns are documents, entries are TF-IDF weights
tdm_sparse <- TermDocumentMatrix(cleaned_sampled_corpus, control = list(weighting = weightTfIdf))
tdm_m_sparse <- as.matrix(tdm_sparse)
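To make the weighting concrete, the sketch below computes TF-IDF by hand on a toy count matrix, using term frequency normalised by document length and a base-2 log for the inverse document frequency. The counts and the names `counts`, `tf`, `idf` and `tfidf` are made up for illustration.

```r
# Toy term-document count matrix: rows are terms, columns are documents
counts <- matrix(
  c(2, 0, 1,
    1, 1, 1,
    0, 3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("clean", "hotel", "noisy"), c("d1", "d2", "d3"))
)

# Term frequency, normalised by document length (column totals)
tf <- sweep(counts, 2, colSums(counts), "/")

# Inverse document frequency: log2(number of docs / docs containing the term)
idf <- log2(ncol(counts) / rowSums(counts > 0))

# TF-IDF weight per term and document (idf is applied row by row)
tfidf <- tf * idf
print(round(tfidf, 3))
```

Note that "hotel" appears in every document, so its idf is log2(1) = 0 and its weight vanishes. Words that occur in nearly all reviews are down-weighted in the same way.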

8. Analyzing Term Frequencies

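We sum the TF-IDF weights of each term across all sampled reviews and display the terms with the largest totals. Because of the weighting, these totals are importance scores, not raw word counts.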

R
# Total weight per term, sorted from highest to lowest
term_freq <- rowSums(tdm_m_sparse)
term_freq_sorted <- sort(term_freq, decreasing = TRUE)
tdm_d_sparse <- data.frame(word = names(term_freq_sorted), freq = term_freq_sorted)

head(tdm_d_sparse, 5)

Output:

[Figure: Term Frequencies]

9. Performing Sentiment Analysis

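We score every review with three lexicon-based methods available in get_sentiment(): syuzhet, bing and afinn. Each method uses a different word list and scale, so the raw scores are not directly comparable across methods.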

R
# Use the same UTF-8 conversion as for the corpus
text <- iconv(data$Review, to = "UTF-8", sub = "byte")

syuzhet_vector <- get_sentiment(text, method = "syuzhet")
cat("Syuzhet method:", head(syuzhet_vector), "\n")

bing_vector <- get_sentiment(text, method = "bing")
cat("Bing method:", head(bing_vector), "\n")

afinn_vector <- get_sentiment(text, method = "afinn")
cat("Afinn method:", head(afinn_vector), "\n")

Output:

[Figure: Sentiment Analysis]
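The general idea behind lexicon-based scoring can be sketched in a few lines of base R: look up each word in a dictionary of scored words and add up the matches. This is a simplified illustration, not syuzhet's implementation, and the mini-lexicon and the `score_text` helper are hypothetical.

```r
# Hypothetical mini-lexicon: word -> score (real lexicons have thousands of entries)
lexicon <- c(great = 1, friendly = 1, clean = 1, dirty = -1, rude = -1, noisy = -1)

score_text <- function(x, lexicon) {
  words <- strsplit(tolower(x), "[^a-z]+")[[1]]   # split on anything but letters
  sum(lexicon[words], na.rm = TRUE)               # unknown words contribute nothing
}

reviews <- c("Great location and friendly staff",
             "Dirty room, rude staff, noisy street",
             "We stayed two nights")
scores <- vapply(reviews, score_text, numeric(1), lexicon = lexicon, USE.NAMES = FALSE)
print(scores)
```

A review with no lexicon words scores 0, which is why neutral scores are common for short or purely factual reviews.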

10. Comparing Sentiment Methods

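Because the three methods use different scales, we compare only the sign of each score (1 for positive, -1 for negative, 0 for neutral) for the first six reviews.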

R
# One row per method, one column per review
rbind(
  sign(head(syuzhet_vector)),
  sign(head(bing_vector)),
  sign(head(afinn_vector))
)

Output:

[Figure: Comparing Sentiment Methods]
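The sign comparison can be taken one step further by measuring how often the methods agree. The sketch below uses made-up scores; the vector names and values are hypothetical and stand in for the vectors computed in the previous step.

```r
# Hypothetical scores from three methods for six reviews (made-up values)
syuzhet_scores <- c(1.5, -0.4, 0.0, 2.2, -1.1, 0.3)
bing_scores    <- c(2, -1, 1, 3, -2, 0)
afinn_scores   <- c(4, -2, 0, 6, -3, 1)

signs <- rbind(
  syuzhet = sign(syuzhet_scores),
  bing    = sign(bing_scores),
  afinn   = sign(afinn_scores)
)

# TRUE where all three methods assign the same polarity to a review
all_agree <- apply(signs, 2, function(col) length(unique(col)) == 1)
print(all_agree)

# Share of reviews on which the three methods fully agree
print(mean(all_agree))
```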

Visualization of Sentiment Analysis for Customer Reviews in R

We will now visualize the results with a Word Cloud, a Sentiment Histogram, an Emotion Bar Plot, a Bar Plot of the Most Popular Words and a Pie Chart of Sentiment Distribution.

1. Word Cloud

We create a word cloud to visualize the most frequent terms in the reviews. A word cloud provides a quick and intuitive way to visualize the most common words in a text corpus, making it easier to identify patterns and trends.

R
wordcloud(words = tdm_d_sparse$word, freq = tdm_d_sparse$freq,
          min.freq = 5, max.words = 100, colors = brewer.pal(8, "Dark2"))

Output:

[Figure: Word Cloud]

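Words with higher values appear larger and more prominent in the word cloud. Because the matrix was built with TF-IDF weighting, the freq values are summed TF-IDF weights, not raw counts, and min.freq = 5 is applied to those weights. With the default settings, wordcloud assigns the palette colors by frequency, from the least to the most frequent words, so a color marks a frequency band instead of an individual word.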

2. Sentiment Histogram

We create a histogram to visualize the distribution of sentiment scores using the Syuzhet method. A histogram allows for a quick assessment of the overall sentiment distribution within the sampled text data.

R
# Score the sampled reviews with the syuzhet lexicon
text_sampled <- iconv(sampled_reviews, to = "UTF-8", sub = "byte")
syuzhet_vector_sampled <- get_sentiment(text_sampled, method = "syuzhet")

ggplot(data.frame(syuzhet_vector_sampled), aes(x = syuzhet_vector_sampled)) +
  geom_histogram(binwidth = 0.1, fill = "blue", color = "black") +
  labs(title = "Sentiment Distribution using Syuzhet Method (Sampled Data)",
       x = "Sentiment Score", y = "Frequency") +
  theme_minimal()

Output:

[Figure: Sentiment Histogram]

Each bar in the histogram represents a range of sentiment scores and the height of the bar indicates the frequency of occurrence of sentiment scores within that range.
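The binning behind a histogram can be reproduced by hand, as in the sketch below, which groups made-up scores into bins of a fixed width and counts each bin. The scores, the bin width of 0.5 and the variable names are assumptions for illustration; ggplot2 positions its bins differently by default, so its counts may not match these exactly.

```r
# Made-up sentiment scores
scores <- c(-1.25, -0.3, 0.05, 0.4, 0.45, 0.9, 1.6, 2.35)
binwidth <- 0.5

# Bin edges that cover the full range of the scores
breaks <- seq(floor(min(scores) / binwidth) * binwidth,
              ceiling(max(scores) / binwidth) * binwidth,
              by = binwidth)

# Assign each score to a bin and count the scores per bin
bins <- cut(scores, breaks = breaks, include.lowest = TRUE)
counts <- table(bins)
print(counts)
```

A smaller bin width shows more detail but makes the counts noisier, which matters with only 200 sampled reviews.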

3. Bar Plot of Emotions

We use get_nrc_sentiment() to count the words in each sampled review that the NRC lexicon associates with eight emotions, then plot the totals per emotion with ggplot2.

R
# Score each sampled review against the NRC lexicon (one column per category)
nrc_sampled <- get_nrc_sentiment(text_sampled)

# Total each category across all reviews
nrct_sampled <- data.frame(t(nrc_sampled))
nrcs_sampled <- data.frame(rowSums(nrct_sampled))
nrcs_sampled <- cbind("sentiment" = rownames(nrcs_sampled), nrcs_sampled)
rownames(nrcs_sampled) <- NULL
names(nrcs_sampled)[1] <- "sentiment"
names(nrcs_sampled)[2] <- "frequency"

# Keep the eight emotions (rows 9-10 are negative/positive), then compute shares
nrcs2_sampled <- nrcs_sampled[1:8, ]
colnames(nrcs2_sampled)[1] <- "emotion"
nrcs2_sampled <- nrcs2_sampled %>% mutate(percent = frequency / sum(frequency))

ggplot(nrcs2_sampled, aes(x = reorder(emotion, -frequency), y = frequency,
                          fill = emotion)) +
  geom_bar(stat = "identity") +
  labs(title = "Emotion Distribution (Sampled Data)", x = "Emotion", y = "Frequency") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3")

Output:

[Figure: Emotion Bar Plot]

The bar plot shows the distribution of emotions based on sentiment analysis using the NRC lexicon on the sampled dataset. Each bar represents a different emotion and the height of the bar indicates the frequency of that emotion within the text data. The colors of the bars are determined by the specified color palette, allowing for easy visualization of different emotions.
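The aggregation behind this plot can be sketched more compactly with colSums() on a toy data frame of per-review emotion counts. The numbers are made up, only three of the eight emotions are shown, and the names `emotions`, `totals` and `summary_df` are illustrative.

```r
# Toy data frame shaped like per-review emotion counts (made-up numbers)
emotions <- data.frame(
  anger = c(0, 1, 0), joy = c(2, 0, 1), trust = c(1, 0, 2),
  negative = c(0, 2, 0), positive = c(3, 0, 2)
)

# Keep only the emotion columns, dropping the two polarity columns
emotion_cols <- setdiff(names(emotions), c("negative", "positive"))
totals <- colSums(emotions[, emotion_cols])

# One row per emotion with its total and its share of all emotion words
summary_df <- data.frame(
  emotion   = names(totals),
  frequency = as.numeric(totals),
  percent   = as.numeric(totals) / sum(totals)
)
summary_df <- summary_df[order(-summary_df$frequency), ]
print(summary_df)
```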

4. Bar Plot of Most Popular Words

Creating a bar plot of the most popular words in a text dataset involves visualizing the frequency distribution of words within the corpus. This visualization helps in identifying the most common words in the text data.

R
# Keep the top 10 terms in a separate data frame so tdm_d_sparse stays intact
top_words <- tdm_d_sparse[1:10, ]
top_words$word <- reorder(top_words$word, top_words$freq)

ggplot(top_words, aes(x = word, y = freq, fill = word)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Most Popular Words", x = "Word", y = "Frequency") +
  theme_minimal()

Output:

[Figure: Bar Plot of Most Popular Words]

The horizontal bar plot shows the 10 highest-ranked words in the sampled reviews. Each bar represents a word and its length shows the word's total weight across the sample (summed TF-IDF weights, since the matrix was built with weightTfIdf). Each word gets its own fill color, which separates the bars visually but carries no additional information.

5. Pie Chart of Sentiment Distribution

Creating a pie chart of sentiment distribution involves visualizing the proportion of different sentiment categories within a dataset.

R
library(ggplot2)
library(RColorBrewer)

# Count positive, negative and neutral reviews by the sign of the syuzhet score
sentiment_df <- data.frame(
  sentiment = c("Positive", "Negative", "Neutral"),
  count = c(sum(syuzhet_vector_sampled > 0), sum(syuzhet_vector_sampled < 0),
            sum(syuzhet_vector_sampled == 0))
)

# A single stacked bar in polar coordinates gives a pie chart
ggplot(sentiment_df, aes(x = "", y = count, fill = sentiment)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(title = "Sentiment Distribution", x = "", y = "") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3")

Output:

[Figure: Pie Chart of Sentiment Distribution]

The pie chart shows the distribution of sentiment categories within the dataset. Each segment of the pie chart represents a sentiment category ("Positive", "Negative", "Neutral") and the size of each segment corresponds to the count of that sentiment category in the dataset. The colors of the segments are determined by the specified color palette, allowing for easy differentiation between sentiment categories.
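Since segment sizes are hard to read from a pie chart, it can be useful to print the shares as percentages as well. The sketch below does this for made-up scores; the `scores` vector and the other names are hypothetical.

```r
# Made-up sentiment scores
scores <- c(1.2, 0.4, -0.8, 0, 2.1, -0.1, 0.6, 0)

# Label each score by its sign, keeping a fixed order of categories
labels <- factor(
  ifelse(scores > 0, "Positive", ifelse(scores < 0, "Negative", "Neutral")),
  levels = c("Positive", "Negative", "Neutral")
)

# Counts per category and their share of all reviews, in percent
counts <- table(labels)
shares <- round(100 * prop.table(counts), 1)
print(shares)
```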

Conclusion

From our analysis, we can see that the majority of the sampled TripAdvisor reviewers described a positive hotel experience, most often expressing trust, joy and anticipation.

