Encoding Categorical Data in R

Last Updated : 26 Apr, 2025

Encoding is the process of converting categorical data into numerical values. Categorical data is data that can be classified into categories or groups (such as colors or job titles). Because many statistical and machine learning models require numeric input, encoding is needed to represent categorical variables in a format those models can process.

Different Techniques to Encode Categorical Data

Categorical data can be encoded in R using a variety of techniques. We'll go over three of the most popular approaches: one-hot encoding, label encoding, and frequency encoding.

1. One-Hot Encoding

One-hot encoding is a technique used to convert categorical data into a binary matrix. Each unique category value in a variable is assigned its own column in the matrix. For each row, if a category value is present, the corresponding column is marked with a 1, while all other columns for that row are set to 0. This technique ensures that categorical values are represented numerically, allowing them to be used in machine learning models.

In this example, we create a sample dataset and convert the gender column, which is a categorical column, to a numerical format using one-hot encoding.

R
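# Sample data with one categorical column (gender)
gender <- c("male", "female", "male", "male", "female")
age    <- c(23, 34, 52, 21, 19)
income <- c(50000, 70000, 80000, 45000, 55000)
df     <- data.frame(gender, age, income)

# "- 1" removes the intercept so every gender level gets its own 0/1 column
encoded_gender <- model.matrix(~ gender - 1, data = df)
print(encoded_gender)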

Output:

[Image: One-hot Encoding]
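The code above stores the indicator matrix in encoded_gender but leaves df unchanged. Below is a minimal sketch of attaching the indicator columns to the data frame in place of the original text column; the name df_encoded is a hypothetical choice for illustration, not part of the original example.

```r
# Same sample data as in the example above
gender <- c("male", "female", "male", "male", "female")
age    <- c(23, 34, 52, 21, 19)
income <- c(50000, 70000, 80000, 45000, 55000)
df     <- data.frame(gender, age, income)

# One 0/1 column per gender level (genderfemale, gendermale)
encoded_gender <- model.matrix(~ gender - 1, data = df)

# Keep the numeric columns and append the indicator columns;
# df_encoded is a hypothetical name chosen for this sketch
df_encoded <- cbind(df[, c("age", "income")], encoded_gender)
print(df_encoded)
```

Dropping the original gender column avoids keeping the same information twice in text and numeric form.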


2. Label Encoding

Label encoding assigns an integer to each distinct value of a categorical variable. For instance, the values 1, 2, and 3 might be assigned to a categorical variable with the three unique values "red", "green", and "blue", respectively. In R, the factor() function converts a categorical variable into a factor, which can then be converted to integers with the as.integer() function. Note that factor() sorts the levels alphabetically by default, so "blue", "green", and "red" become 1, 2, and 3 unless the levels argument specifies a different order.

In this example, the data frame contains a categorical column, color. We label encode it by converting it to a factor with the factor() function and then to integers with the as.integer() function.

R
color <- c("red", "green", "blue", "blue", "red")
df    <- data.frame(color)

# Levels are sorted alphabetically: blue = 1, green = 2, red = 3
df$color <- as.integer(factor(df$color))
print(df)

Output:

[Image: Label Encoding]
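Because factor() orders levels alphabetically by default, the resulting codes may not reflect a meaningful order. For an ordinal variable, a minimal sketch is to pass the levels explicitly; the size variable below is hypothetical example data.

```r
# Hypothetical ordinal variable
size <- c("medium", "low", "high", "medium", "low")

# Default factor(): levels sorted alphabetically (high = 1, low = 2, medium = 3)
default_codes <- as.integer(factor(size))

# Explicit levels: codes follow the natural order (low = 1, medium = 2, high = 3)
ordered_codes <- as.integer(factor(size, levels = c("low", "medium", "high")))

print(default_codes)  # [1] 3 2 1 3 2
print(ordered_codes)  # [1] 2 1 3 2 1
```

Only the second version preserves the ranking low < medium < high in the integer codes.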


3. Frequency Encoding

In frequency encoding, each distinct value of a categorical variable is replaced by the number of times it occurs in the data. For example, if a categorical variable has three distinct values (red, green, and blue) that appear three, four, and two times respectively, they are encoded as 3, 4, and 2.

In this example, we will frequency encode the color column of the data frame.

R
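color <- c("red", "green", "blue", "blue", "red")
df    <- data.frame(color)

# Count how often each color occurs: blue = 2, green = 1, red = 2
freq_count <- table(df$color)

# Replace each color by its count (look up the table by name)
df$color <- as.integer(freq_count[df$color])
print(df)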

Output:

[Image: Frequency Encoding]
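To check the worked example from the description above, here is a minimal sketch with hypothetical data in which red appears three times, green four times, and blue twice; each value should be encoded as its count.

```r
# Hypothetical data matching the description: red x3, green x4, blue x2
color <- c(rep("red", 3), rep("green", 4), rep("blue", 2))

# table() counts each distinct value
freq_count <- table(color)

# Look up each observation's count by name
color_freq <- as.integer(freq_count[color])

# One row per distinct color with its frequency code
print(unique(data.frame(color, color_freq)))
```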

Choosing an Encoding Method

The choice of encoding method depends on the type of analysis or model being used and the characteristics of the data. One-Hot Encoding is commonly used for nominal variables with a small number of unique values, because it adds one column per category. Label Encoding suits ordinal variables whose categories have a natural order, and Frequency Encoding is often used for variables with many unique values, where one-hot encoding would produce too many columns.

It's important to note that Label Encoding and Frequency Encoding can introduce an unintended order or hierarchy into the data, which may affect the validity of an analysis or machine learning model. In such cases, One-Hot Encoding may be a more suitable choice.
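The column-count trade-off can be sketched as follows. The city variable is hypothetical simulated data (200 draws from up to 50 names); the point is only that one-hot encoding produces one column per distinct value, while frequency encoding always produces a single column.

```r
set.seed(42)
# Hypothetical high-cardinality variable: 200 observations of up to 50 cities
city <- sample(paste0("city_", 1:50), 200, replace = TRUE)

# One-hot: one indicator column per distinct city
one_hot <- model.matrix(~ city - 1)

# Frequency encoding: a single numeric column
city_freq <- as.integer(table(city)[city])

cat("Distinct cities:", length(unique(city)), "\n")
cat("One-hot columns:", ncol(one_hot), "\n")
cat("Frequency-encoded columns:", NCOL(city_freq), "\n")
```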

Differences Between the Methods

| Encoding Method | Description | When to Use |
|---|---|---|
| One-Hot Encoding | Converts each category into a binary vector in which one element is 1 and all others are 0. | When there is no inherent order between categories. Best suited to variables with a small-to-moderate number of unique values, since each category adds a column. |
| Frequency Encoding | Replaces each category with the number of times it occurs in the dataset. | When there are many categories and you want to retain information about how often each category occurs. Compact for large datasets, but categories with the same count become indistinguishable, and the counts may imply an unintended hierarchy. |
| Label Encoding | Assigns a unique integer to each category (with factor(), in alphabetical order of the levels unless the levels are specified). | When there is an ordinal relationship between categories (e.g., low, medium, high). Suitable for variables with a limited number of categories. Not recommended for nominal data, as it may create an artificial ranking. |
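As a side-by-side summary, the following sketch applies all three methods to the color data used earlier. The one-hot step here builds the indicator columns with sapply() instead of model.matrix(), an alternative chosen only so the columns are named after the colors.

```r
color <- c("red", "green", "blue", "blue", "red")

# Label encoding: alphabetical factor levels (blue = 1, green = 2, red = 3)
label_enc <- as.integer(factor(color))

# Frequency encoding: number of occurrences of each value
freq_enc <- as.integer(table(color)[color])

# One-hot encoding: one 0/1 column per distinct value
one_hot <- sapply(sort(unique(color)), function(lv) as.integer(color == lv))

comparison <- data.frame(color, label = label_enc, frequency = freq_enc, one_hot)
print(comparison)
```

Note that red and blue both receive the frequency code 2, which illustrates how frequency encoding can merge distinct categories.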

In this article, we discussed three encoding methods (One-Hot Encoding, Frequency Encoding, and Label Encoding) and when to use each, based on the nature of the categorical data and the requirements of the analysis or model.


prthmsh7
Article Tags :
  • R Language
@GeeksforGeeks, Sanchhaya Education Private Limited, All rights reserved