Pandas dataframe.groupby() Method

Last Updated : 03 Dec, 2024

The Pandas groupby() method splits a DataFrame into groups based on the values in one or more columns so that each group can be analyzed or aggregated separately. It follows a "split-apply-combine" strategy: the data is divided into groups, a function is applied to each group, and the results are combined into a new Series or DataFrame. For example, given a dataset of sales transactions, you can use groupby() to group the rows by product category and calculate the total sales for each category.

[Figure: Pandas DataFrame groupby() method]

The illustration above computes the total sales for each product category, which demonstrates the core idea: group the data, then apply an aggregation function to each group.
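As a minimal sketch of the sales scenario described above, the snippet below groups a small table by product category and sums the amounts. The DataFrame, its column names (category, amount), and its values are made up for illustration and do not come from the article.

```python
import pandas as pd

# Hypothetical sales transactions (invented values, for illustration only)
sales = pd.DataFrame({
    "category": ["Electronics", "Clothing", "Electronics", "Grocery", "Clothing"],
    "amount": [1200, 300, 800, 150, 200],
})

# Split by category, sum the amounts in each group, combine into a Series
total_by_category = sales.groupby("category")["amount"].sum()
print(total_by_category)
```

The result is a Series indexed by category, with the group keys sorted alphabetically by default.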

How to Use Pandas GroupBy Method?

The groupby() function in Pandas involves three main steps: Splitting, Applying, and Combining.

  • Splitting: This step involves dividing the DataFrame into groups based on some criteria. The groups are defined by unique values in one or more columns.
  • Applying: In this step, a function is applied to each group independently. You can apply various functions to each group, such as:
    • Aggregation: Calculate summary statistics (e.g., sum, mean, count) for each group.
    • Transformation: Modify the values within each group.
    • Filtering: Keep or discard groups based on certain conditions.
  • Combining: Finally, the results of the applied function are combined into a new DataFrame or Series.
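The three steps above can be sketched by hand and compared with a single groupby() call. This is only an illustration: the df_demo table and its team and score columns are hypothetical, and the manual version is not how pandas implements grouping internally.

```python
import pandas as pd

# Hypothetical data, for illustration only
df_demo = pd.DataFrame({
    "team": ["A", "B", "A", "B", "A"],
    "score": [10, 20, 30, 60, 50],
})

# Splitting: iterating over a GroupBy object yields (key, sub-DataFrame) pairs
pieces = {key: group for key, group in df_demo.groupby("team")}

# Applying: compute a statistic on each piece independently
means = {key: group["score"].mean() for key, group in pieces.items()}

# Combining: assemble the per-group results into one Series
manual = pd.Series(means, name="score").sort_index()

# The same three steps in a single call
built_in = df_demo.groupby("team")["score"].mean()
print(built_in)
```

Both manual and built_in hold a mean of 30.0 for team A and 40.0 for team B.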

The groupby method has several parameters that can be customized:

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)

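Parameters:

  • by: Column label(s), mapping, or function used to determine the groups. The default is None, but either by or level must be supplied.
  • axis: Optional, specifies the axis to group by (default is 0 for rows). This parameter is deprecated since pandas 2.1.
  • level: Optional, used for grouping by a certain level in a MultiIndex.
  • as_index: Optional, whether to use the group labels as the index of the result (default is True).
  • sort: Optional, whether to sort the group keys (default is True).
  • group_keys: Optional, whether to add the group keys to the index when calling apply() (default is True).
  • observed: Optional, applies only when grouping by categorical columns; if True, only the category values that actually occur in the data are shown (default is False in the signature above).
  • dropna: Optional, whether to drop rows whose group key is missing (default is True). Set it to False to keep missing keys as their own group.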
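The effect of a few of these parameters can be seen on a small table. The sketch below uses a hypothetical df_params DataFrame (the city and sales columns and their values are invented) and compares the defaults with as_index=False, sort=False, and dropna=False.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing group key
df_params = pd.DataFrame({
    "city": ["Pune", "Delhi", np.nan, "Delhi", "Pune"],
    "sales": [5, 7, 9, 1, 4],
})

# Defaults: keys become the index, keys are sorted, rows with a NaN key are dropped
default = df_params.groupby("city")["sales"].sum()

# as_index=False keeps the key as a regular column and returns a DataFrame
flat = df_params.groupby("city", as_index=False)["sales"].sum()

# sort=False keeps the groups in order of first appearance
unsorted = df_params.groupby("city", sort=False)["sales"].sum()

# dropna=False keeps rows with a missing key as their own group
with_nan = df_params.groupby("city", dropna=False)["sales"].sum()

print(default, flat, unsorted, with_nan, sep="\n\n")
```

Here default has two groups (Delhi, Pune), while with_nan has three because the row with the missing city is kept.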

Example 1: Grouping by a Single Column

In this example, we will demonstrate how to group data by a single column using the groupby method. We will work with an NBA dataset that contains information about NBA players, including their teams, positions, and salaries. We'll group the data by the Team column and print the first entry of each group.

Python
import pandas as pd

df = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# Group the rows by team
team = df.groupby('Team')

# Print the first entry of each group
print(team.first())

Output:

[Figure: output of Pandas dataframe.groupby() grouped by Team]

Note: This is only a snapshot of the output; not all rows are shown.
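A GroupBy object can also be inspected before anything is aggregated. The sketch below is a self-contained illustration: the players table is a made-up stand-in that only mimics a few column names of the NBA dataset (Name, Team, Salary), and its rows are placeholders rather than real data.

```python
import pandas as pd

# Made-up stand-in rows; names, teams, and salaries are placeholders
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "Team": ["Team X", "Team Y", "Team X", "Team Y", "Team X"],
    "Salary": [5.0, 7.0, 2.0, 3.0, 1.0],
})

by_team = players.groupby("Team")

n_teams = by_team.ngroups                  # number of distinct teams
rows_per_team = by_team.size()             # number of rows in each group
team_x = by_team.get_group("Team X")       # all rows belonging to one group
first_rows = by_team.first()               # first non-null value per column, per group

print(n_teams)
print(rows_per_team)
print(team_x)
print(first_rows)
```

With this data there are 2 groups, Team X has 3 rows, and first() returns Player A for Team X and Player B for Team Y.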

Example 2: Grouping by Multiple Columns

Grouping by multiple columns allows you to break down the data into finer categories and compute statistics for each unique combination of those columns. Let's use the same NBA dataset, group the data by both Team and Position, and print the first entry for each position within each team.

Python
import pandas as pd

df = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# Group by every (Team, Position) combination
grouping = df.groupby(['Team', 'Position'])

# Print the first entry of each group
print(grouping.first())

Output:

[Figure: Grouping by Multiple Columns]

Note: This is only a snapshot of the output; not all rows are shown.
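Grouping by several columns produces a result indexed by a MultiIndex with one level per key. The sketch below shows one way to read and reshape such a result; the roster table and its values are invented placeholders that reuse the Team, Position, and Salary column names.

```python
import pandas as pd

# Made-up stand-in rows, for illustration only
roster = pd.DataFrame({
    "Team": ["Team X", "Team X", "Team X", "Team Y", "Team Y"],
    "Position": ["PG", "PG", "C", "PG", "C"],
    "Salary": [4.0, 6.0, 3.0, 5.0, 2.0],
})

# One total per (Team, Position) pair; the index has two levels
totals = roster.groupby(["Team", "Position"])["Salary"].sum()

# Look up a single combination with a tuple
x_pg_total = totals.loc[("Team X", "PG")]   # 10.0

# Turn the index levels back into ordinary columns
flat_totals = totals.reset_index()

# Pivot the Position level into columns: one row per team
per_team = totals.unstack("Position")

print(totals)
print(flat_totals)
print(per_team)
```

reset_index() is convenient when the grouped result feeds into further column-based processing, while unstack() gives a compact team-by-position table.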

Example 3: Applying Aggregation with GroupBy

Aggregation is one of the most common operations when using groupby. After grouping the data, you can apply functions like sum(), mean(), min(), max(), and more.

  • sum(): Calculates the total sum of values for each group. Useful when you need to know the total amount of a numeric column grouped by specific categories.
  • mean(): Computes the average value of each group, helpful for understanding trends and patterns within grouped data.
  • count(): Counts the non-null entries in each group.
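How these functions treat missing values is easiest to see on a tiny table. The following sketch uses an invented pay DataFrame with one missing salary; it also calls size(), which is not listed above but is a standard GroupBy method that counts all rows, including those with missing values.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing salary
pay = pd.DataFrame({
    "Team": ["Team X", "Team X", "Team Y", "Team Y"],
    "Salary": [4.0, np.nan, 5.0, 1.0],
})

grouped_pay = pay.groupby("Team")["Salary"]

total = grouped_pay.sum()      # NaN is skipped: Team X -> 4.0, Team Y -> 6.0
average = grouped_pay.mean()   # NaN is skipped: Team X -> 4.0, Team Y -> 3.0
non_null = grouped_pay.count() # non-null entries: Team X -> 1, Team Y -> 2
n_rows = grouped_pay.size()    # all rows: Team X -> 2, Team Y -> 2

print(pd.DataFrame({"sum": total, "mean": average, "count": non_null, "size": n_rows}))
```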

Let's continue with the same NBA dataset and demonstrate how aggregation works in practice using the sum(), mean(), and count() functions. The example will group the data by both Team and Position, and apply all three aggregation functions to understand the total salary, average salary, and the number of players in each group.

Python
import pandas as pd

df = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# Named aggregation: output_column=(input_column, function)
aggregated_data = df.groupby(['Team', 'Position']).agg(
    total_salary=('Salary', 'sum'),
    avg_salary=('Salary', 'mean'),
    player_count=('Name', 'count')
)

print(aggregated_data)

Output:

[Figure: Applying Aggregation with GroupBy]

Example 4: How to Apply Transformation Methods?

Transformation functions return an object that is indexed the same as the original data. Unlike aggregation, which reduces each group to a single row, a transformation applies a group-specific operation, such as normalization or standardization within groups, while keeping the original shape of the dataset.

For example, let's rank players within their teams by salary, which helps identify the highest- and lowest-paid players in each team.

Python
import pandas as pd

df = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# Rank players within each team by Salary (highest salary gets rank 1)
df['Rank within Team'] = df.groupby('Team')['Salary'].transform(
    lambda x: x.rank(ascending=False)
)
print(df)

Output:

[Figure: Apply Transformation Methods]

We grouped by Team and used rank() to rank salaries within each team, with the highest salary receiving a rank of 1. Rows with a missing value in the Salary column receive NaN as their rank.
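Normalization within groups, mentioned earlier as a typical use of transformations, can be sketched in the same way. The squad table below is invented for illustration; the new column names are arbitrary choices, not part of the NBA dataset.

```python
import pandas as pd

# Made-up stand-in rows, for illustration only
squad = pd.DataFrame({
    "Team": ["Team X", "Team X", "Team Y", "Team Y"],
    "Salary": [2.0, 6.0, 10.0, 30.0],
})

salary_by_team = squad.groupby("Team")["Salary"]

# transform() broadcasts each group's statistic back to every row of that group
team_mean = salary_by_team.transform("mean")   # [4.0, 4.0, 20.0, 20.0]
team_sum = salary_by_team.transform("sum")     # [8.0, 8.0, 40.0, 40.0]

# Group-relative columns with the same length as the original DataFrame
squad["Salary vs team mean"] = squad["Salary"] - team_mean
squad["Share of team payroll"] = squad["Salary"] / team_sum

print(squad)
```

Because the result of transform() has the same index as the input, it can be assigned directly as a new column without any merge step.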

Example 5: Filtering Groups Using Filtration Methods

Filtration allows you to drop entire groups from a GroupBy object based on a condition. This method helps in cleaning data by removing groups that do not meet specific criteria, thus focusing analysis on relevant subsets. For example: Let’s demonstrate how to filter out groups where the average salary of players is below a certain threshold.

Python
import pandas as pd

df = pd.read_csv("https://media.geeksforgeeks.org/wp-content/uploads/nba.csv")

# Keep only the teams whose average Salary is >= 1 million
filtered_df = df.groupby('Team').filter(lambda x: x['Salary'].mean() >= 1000000)
print(filtered_df)

Output:

            Name            Team  ...    College     Salary
0  Avery Bradley  Boston Celtics  ...      Texas  7730337.0
1    Jae Crowder  Boston Celtics  ...  Marquette  ...

Note: This is only an excerpt of the output; not all rows are shown.
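To see exactly which rows filter() keeps, it helps to run it on a table small enough to check by hand. In the sketch below the staff table and the threshold of 4.0 are invented for illustration; the second approach, a boolean mask built with transform(), is an alternative technique that gives the same rows here and is not taken from the article.

```python
import pandas as pd

# Made-up stand-in rows, for illustration only
staff = pd.DataFrame({
    "Team": ["Team X", "Team X", "Team Y", "Team Y", "Team Z"],
    "Salary": [2.0, 6.0, 10.0, 30.0, 1.0],
})

# Group means: Team X -> 4.0, Team Y -> 20.0, Team Z -> 1.0
# filter() keeps every row of a group when the function returns True for it
kept = staff.groupby("Team").filter(lambda g: g["Salary"].mean() >= 4.0)

# Alternative: broadcast the group mean to each row and use it as a mask
mask = staff.groupby("Team")["Salary"].transform("mean") >= 4.0
kept_via_mask = staff[mask]

print(kept)
```

All rows of Team X and Team Y survive, the single Team Z row is dropped, and the original row order and index are preserved.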


Shubham__Ranjan