
Create a Pipeline in Pandas

Last Updated : 17 Jan, 2022

Pipelines are useful for transforming and manipulating large amounts of data. A pipeline is a sequence of data processing steps in which the output of one step becomes the input of the next. The pandas pipe feature lets us chain user-defined Python functions together to build such a pipeline. There are two ways to create a pipeline in pandas: by calling the .pipe() method, or by using the pdpipe package.

With the pandas .pipe() method, we can chain several functions in a single expression, each one receiving the result of the previous one. Let’s understand and create a pipeline by using the .pipe() method.
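
Before the full examples, here is a minimal sketch of what .pipe() does: df.pipe(func, ...) passes the dataframe as the first argument to func. The helper add_to_column and the small dataframe are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical helper (not part of pandas): adds a constant to one column
def add_to_column(dataframe, col, amount):
    out = dataframe.copy()      # work on a copy, leave the input untouched
    out[col] = out[col] + amount
    return out

df = pd.DataFrame({"age": [31, 32, 19]})

# These two calls are equivalent
direct = add_to_column(df, "age", 1)
piped = df.pipe(add_to_column, "age", 1)

print(piped["age"].tolist())    # [32, 33, 20]
print(piped.equals(direct))     # True
```

Because each function returns a dataframe, further .pipe() calls can be appended to the same expression.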

Below are various examples that depict how to create a pipeline using pandas.

Example 1:

Python3

# importing pandas library
import pandas as pd
 
# Create empty dataframe
df = pd.DataFrame()
 
# Creating a simple dataframe
df['name'] = ['Reema', 'Shyam', 'Jai',
              'Nimisha', 'Rohit', 'Riya']
df['gender'] = ['Female', 'Male', 'Male',
                'Female', 'Male', 'Female']
df['age'] = [31, 32, 19, 23, 28, 33]
 
# View dataframe
df
                      
                       

Output:

Now, creating functions for data processing.

Python3

# function to find mean
def mean_age_by_group(dataframe, col):
   
    # groups the data by a column and returns the mean
    # of the numeric columns (here, age) per group;
    # numeric_only=True skips text columns such as 'name',
    # which recent pandas versions refuse to average
    return dataframe.groupby(col).mean(numeric_only=True)
   
# function to convert to uppercase
def uppercase_column_name(dataframe):
   
    # Returns a copy with all column names in uppercase,
    # leaving the input dataframe unchanged
    return dataframe.rename(columns=str.upper)
                      
                       

Now, creating a pipeline using .pipe() function.

Python3

# Chain both functions created above; each .pipe() call
# passes the current dataframe as the first argument
result = df.pipe(mean_age_by_group, col='gender').pipe(uppercase_column_name)
 
# display the resulting dataframe
result
                      
                       

Output:
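
The chained .pipe() call is equivalent to nesting the two function calls. The following self-contained sketch checks this on the same data. It assumes a recent pandas version, so numeric_only=True is passed to mean() to skip the text column 'name'; uppercase_column_name is written here with rename() so that it returns a copy.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Reema", "Shyam", "Jai", "Nimisha", "Rohit", "Riya"],
    "gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "age": [31, 32, 19, 23, 28, 33],
})

def mean_age_by_group(dataframe, col):
    # numeric_only=True averages only numeric columns (here, age)
    return dataframe.groupby(col).mean(numeric_only=True)

def uppercase_column_name(dataframe):
    # returns a copy with uppercase column names
    return dataframe.rename(columns=str.upper)

# Pipeline form: reads left to right
piped = df.pipe(mean_age_by_group, col="gender").pipe(uppercase_column_name)

# Nested form: same result, but reads inside out
nested = uppercase_column_name(mean_age_by_group(df, col="gender"))

print(piped.equals(nested))          # True
print(piped.loc["Female", "AGE"])    # 29.0
```

The pipeline form keeps the steps in the order they run, which is easier to follow as more steps are added.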

Now, let’s understand and create a pipeline by using the pdpipe package.

The pdpipe Python package provides a concise interface for building pandas pipelines that have pre-conditions. It is a pre-processing pipeline package for pandas DataFrames. The pdpipe API makes it easy to break down or compose complex pandas processing pipelines in a few lines of code.

We can install this package by simply writing:

pip install pdpipe

Example 2:

Python3

# importing the package
import pdpipe as pdp
import pandas as pd
 
# creating an empty dataframe named dataset
dataset = pd.DataFrame()
 
# Creating a simple dataframe
dataset['name'] = ['Reema', 'Shyam', 'Jai',
                   'Nimisha', 'Rohit', 'Riya']
 
dataset['gender'] = ['Female', 'Male', 'Male',
                     'Female', 'Male', 'Female']
 
dataset['age'] = [31, 32, 19, 23, 28, 33]
 
dataset['department'] = ['Accounts', 'Management',
                         'IT', 'IT', 'Management',
                         'Advertising']
 
dataset['index'] = [1, 2, 3, 4, 5, 6]
 
# View dataframe
dataset
                      
                       

Output:

Removing a column from dataframe using pdpipe.

Python3

# creating a pipeline and
# dropping the unwanted column
dropCol = pdp.ColDrop("index").apply(dataset)
 
# display the new dataframe
# after column drop
dropCol
                      
                       

Output:

There is another way to drop columns through pdpipe.

Python3

# creating a pipeline and
# dropping the unwanted column
dropCol2 = pdp.ColDrop("index")
 
# applying the ColDrop to dataframe
df2 = dropCol2(dataset)
 
# display dataframe
df2
                      
                       

Output:

Here, the column is dropped in two steps. In the first step, we created a pipeline stage, and in the second step, we applied it to the dataframe by calling it like a function.
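
For comparison, the same column drop can be sketched with plain pandas and .pipe(), without pdpipe. The helper drop_column is hypothetical, and the dataframe is a shortened version of the one used in this example.

```python
import pandas as pd

dataset = pd.DataFrame({
    "name": ["Reema", "Shyam", "Jai"],
    "age": [31, 32, 19],
    "index": [1, 2, 3],
})

# Hypothetical helper: returns a copy without the given column
def drop_column(dataframe, col):
    return dataframe.drop(columns=col)

df2 = dataset.pipe(drop_column, "index")

print(list(df2.columns))        # ['name', 'age']
print(list(dataset.columns))    # ['name', 'age', 'index'] (original unchanged)
```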

Example 3:

Now we are dropping rows by value from a dataframe using pdpipe.

Python3

# importing the package
import pdpipe as pdp
import pandas as pd
 
# creating an empty dataframe named dataset
dataset = pd.DataFrame()
 
# Creating a simple dataframe
dataset['name'] = ['Reema', 'Shyam', 'Jai',
                   'Nimisha', 'Rohit', 'Riya']
 
dataset['gender'] = ['Female', 'Male', 'Male',
                     'Female', 'Male', 'Female']
 
dataset['age'] = [31, 32, 19, 23, 28, 33]
 
dataset['department'] = ['Accounts', 'Management',
                         'IT', 'IT', 'Management',
                         'Advertising']
 
dataset['index'] = [1, 2, 3, 4, 5, 6]
 
# View dataframe
dataset
                      
                       

Output:

Now, dropping the rows whose department value is 'IT' from the dataframe.

Python3

# dropping the rows whose 'department' is 'IT' using ValDrop
df3 = pdp.ValDrop(['IT'], 'department').apply(dataset)
 
# display dataframe
df3
                      
                       

 
 

Output:

The rows containing the 'IT' value in the department column are dropped.
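
As a rough plain-pandas counterpart, the value-based row drop can be combined with a column drop in one .pipe() chain. The helpers drop_rows_with_values and drop_column are hypothetical names introduced for this sketch; they are not part of pandas or pdpipe.

```python
import pandas as pd

dataset = pd.DataFrame({
    "name": ["Reema", "Shyam", "Jai", "Nimisha", "Rohit", "Riya"],
    "department": ["Accounts", "Management", "IT", "IT",
                   "Management", "Advertising"],
    "index": [1, 2, 3, 4, 5, 6],
})

# Hypothetical helper: keeps only rows whose value in col is not in values
def drop_rows_with_values(dataframe, values, col):
    return dataframe[~dataframe[col].isin(values)]

# Hypothetical helper: returns a copy without the given column
def drop_column(dataframe, col):
    return dataframe.drop(columns=col)

df3 = (
    dataset
    .pipe(drop_rows_with_values, ["IT"], "department")
    .pipe(drop_column, "index")
)

print(df3["name"].tolist())     # ['Reema', 'Shyam', 'Rohit', 'Riya']
```

Both rows with 'IT' in the department column (Jai and Nimisha) are removed, and the index column is dropped in the same expression.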


 



neelutiwari