Pandas – get_dummies() method

Last Updated : 03 Dec, 2024

In Pandas, the get_dummies() function converts categorical variables into dummy/indicator variables (known as one-hot encoding). This method is especially useful when preparing data for machine learning algorithms that require numeric input.

Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

The function returns a DataFrame in which each unique category in the original data becomes a separate column. By default the values are booleans, True (for presence) or False (for absence); pandas versions before 2.0 returned 0/1 integers (uint8) instead.

Encoding a Pandas DataFrame

Let’s look at an example of how to use the get_dummies() method to perform one-hot encoding.

Python
import pandas as pd

data = {
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red'],
    'Size': ['Small', 'Large', 'Medium', 'Small', 'Large']
}

df = pd.DataFrame(data)
print('Original DataFrame')
print(df)

# Perform one-hot encoding on every string column
df_encoded = pd.get_dummies(df)
print('\nDataFrame after performing One-hot Encoding')
print(df_encoded)

Output:

[Figure: Sample DataFrame]

[Figure: DataFrame after performing One-Hot Encoding]

In the output, each unique category in the Color and Size columns has been transformed into a separate binary (True or False) column. The new columns indicate whether the respective category is present in each row.
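
The example above encodes every string column. As a minimal sketch of the columns, prefix and prefix_sep parameters from the syntax line, the snippet below encodes only one column and renames the generated columns. The extra Price column and the prefix "c" are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical data: 'Price' is added only to show that numeric columns pass through
df = pd.DataFrame({
    'Color': ['Red', 'Blue', 'Green'],
    'Size': ['Small', 'Large', 'Medium'],
    'Price': [10, 20, 15],
})

# Encode only 'Color'; 'Size' and 'Price' are left unchanged
encoded = pd.get_dummies(df, columns=['Color'], prefix='c', prefix_sep='-')
print(encoded.columns.tolist())
```

The untouched columns come first, followed by one new column per category, each named prefix + prefix_sep + category.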

To get the output as 0 and 1 instead of True and False, set the dtype parameter to int (or to float for 0.0 and 1.0).

Python
# Perform one-hot encoding with integer (0/1) output
df_encoded = pd.get_dummies(df, dtype=int)
print('\nDataFrame after performing One-hot Encoding')
print(df_encoded)

Output:

[Figure: Pandas DataFrame after performing One-Hot Encoding (0s and 1s)]
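
The syntax line also lists drop_first, which is not shown in the examples here. As a rough sketch, assuming the same Color values as the sample data, setting drop_first=True removes the first category's column (in sorted order), so that category is represented by a row of all zeros:

```python
import pandas as pd

df = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red']})

# All three categories get a column
full = pd.get_dummies(df, dtype=int)

# The first category ('Blue') is dropped; a Blue row is now all zeros
reduced = pd.get_dummies(df, drop_first=True, dtype=int)

print(full.columns.tolist())
print(reduced.columns.tolist())
print(reduced)
```

This is often used with linear models, where keeping every category column makes the columns perfectly correlated.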

Encoding a Pandas Series

Python
import pandas as pd

# Series with days of the week
days = pd.Series(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Monday'])
print(pd.get_dummies(days, dtype='int'))

Output:
   Friday  Monday  Thursday  Tuesday  Wednesday
0       0       1         0        0          0
1       0       0         0        1          0
2       0       0         0        0          1
3       ...

In this example, each unique day of the week is transformed into a dummy variable, where a 1 indicates the presence of that day.
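
One way to check this property, sketched here with the same days Series, is to confirm that every row contains exactly one 1 and that the name of that column gives back the original value. Using idxmax for the reverse lookup is an illustrative choice, not something the original example does.

```python
import pandas as pd

days = pd.Series(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Monday'])
dummies = pd.get_dummies(days, dtype=int)

# Each row has exactly one column set to 1
row_sums = dummies.sum(axis=1).tolist()
print(row_sums)

# The column holding the 1 in each row is the original day
recovered = dummies.idxmax(axis=1).tolist()
print(recovered == days.tolist())
```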

Converting NaN Values into a Dummy Variable

The dummy_na=True option can be used when dealing with missing values. It creates a separate column indicating whether the value is missing or not.

Python
import pandas as pd
import numpy as np

# List with color categories and NaN
colors = ['Red', 'Blue', 'Green', np.nan, 'Red', 'Blue']
print(pd.get_dummies(colors, dummy_na=True, dtype='int'))

Output:
   Blue  Green  Red  NaN
0     0      0    1    0
1     1      0    0    0
2     0      1    0    0
3     0      0    0    1
4     0      0    1    0
5     1      0    0    0

The dummy_na=True parameter adds a column for missing values (NaN), indicating where the NaN values were originally present.
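
For comparison, here is a short sketch of the default behaviour (dummy_na=False) on a smaller, made-up Series. Without the extra column, a missing value simply becomes a row of zeros and cannot be told apart from "none of the categories":

```python
import numpy as np
import pandas as pd

colors = pd.Series(['Red', 'Blue', np.nan])

# Default: no NaN column, so the missing row is all zeros
default = pd.get_dummies(colors, dtype=int)

# With dummy_na=True: an extra NaN column marks the missing row
with_na = pd.get_dummies(colors, dummy_na=True, dtype=int)

print(default)
print(with_na)
```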




author
romy421kumari
