Skip to content
geeksforgeeks
  • Courses
    • DSA to Development
    • Get IBM Certification
    • Newly Launched!
      • Master Django Framework
      • Become AWS Certified
    • For Working Professionals
      • Interview 101: DSA & System Design
      • Data Science Training Program
      • JAVA Backend Development (Live)
      • DevOps Engineering (LIVE)
      • Data Structures & Algorithms in Python
    • For Students
      • Placement Preparation Course
      • Data Science (Live)
      • Data Structure & Algorithm-Self Paced (C++/JAVA)
      • Master Competitive Programming (Live)
      • Full Stack Development with React & Node JS (Live)
    • Full Stack Development
    • Data Science Program
    • All Courses
  • Tutorials
    • Data Structures & Algorithms
    • ML & Data Science
    • Interview Corner
    • Programming Languages
    • Web Development
    • CS Subjects
    • DevOps And Linux
    • School Learning
  • Practice
    • Build your AI Agent
    • GfG 160
    • Problem of the Day
    • Practice Coding Problems
    • GfG SDE Sheet
  • Contests
    • Accenture Hackathon (Ending Soon!)
    • GfG Weekly [Rated Contest]
    • Job-A-Thon Hiring Challenge
    • All Contests and Events
  • Python Tutorial
  • Interview Questions
  • Python Quiz
  • Python Glossary
  • Python Projects
  • Practice Python
  • Data Science With Python
  • Python Web Dev
  • DSA with Python
  • Python OOPs
Open In App
Next Article:
Python | Pandas dataframe.eq()
Next article icon

Pandas DataFrame

Last Updated : 28 Nov, 2024
Comments
Improve
Suggest changes
Like Article
Like
Report

Pandas DataFrame is two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. Pandas DataFrame consists of three principal components, the data, rows, and columns.

Pandas Dataframe

Creating a Pandas DataFrame

Pandas DataFrame will be created by loading the datasets from existing storage, storage can be SQL Database, CSV file, and Excel file. Pandas DataFrame can be created from the lists, dictionary, and from a list of dictionary etc.

Here are some ways by which we create a dataframe:

Creating a dataframe using List: DataFrame can be created using a single list or a list of lists.

Python
import pandas as pd   # list of strings lst = ['Geeks', 'For', 'Geeks', 'is',              'portal', 'for', 'Geeks']   # Calling DataFrame constructor on list df = pd.DataFrame(lst) print(df) 

Output:

Output

Creating DataFrame from dict of ndarray/lists: To create DataFrame from dict of narray/list, all the narray must be of same length. If index is passed then the length index should be equal to the length of arrays. If no index is passed, then by default, index will be range(n) where n is the array length.

Python
# Python code demonstrate creating  # DataFrame from dict narray / lists  # By default addresses.   import pandas as pd   # intialise data of lists. data = {'Name':['Tom', 'nick', 'krish', 'jack'],         'Age':[20, 21, 19, 18]}   # Create DataFrame df = pd.DataFrame(data)   # Print the output. print(df) 


Output:

 
For more details refer to Creating a Pandas DataFrame

Table of Content

  • Dealing with Rows and Columns
  • Indexing and Selecting Data
  • Selecting a single row
  • Working with Missing Data
  • Iterating over rows and columns


Dealing with Rows and Columns in Pandas DataFrame

A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. We can perform basic operations on rows/columns like selecting, deleting, adding, and renaming.

Column Selection: In Order to select a column in Pandas DataFrame, we can either access the columns by calling them by their columns name.

Python
# Import pandas package import pandas as pd   # Define a dictionary containing employee data data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],         'Age':[27, 24, 22, 32],         'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],         'Qualification':['Msc', 'MA', 'MCA', 'Phd']}   # Convert the dictionary into DataFrame  df = pd.DataFrame(data)   # select two columns print(df[['Name', 'Qualification']]) 


Output:

Row Selection: Pandas provide a unique method to retrieve rows from a Data frame. DataFrame.loc[] method is used to retrieve rows from Pandas DataFrame. Rows can also be selected by passing integer location to an iloc[] function.

Note: We’ll be using nba.csv file in below examples.

Python
# importing pandas package import pandas as pd   # making data frame from csv file data = pd.read_csv("nba.csv", index_col ="Name")   # retrieving row by loc method first = data.loc["Avery Bradley"] second = data.loc["R.J. Hunter"]     print(first, "\n\n\n", second) 


Output:
As shown in the output image, two series were returned since there was only one parameter both of the times.

For more Details refer to Dealing with Rows and Columns

Indexing and Selecting Data in Pandas

Indexing in pandas means simply selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each of the rows and columns. Indexing can also be known as Subset Selection.

Indexing a Dataframe using indexing operator [] 

Indexing operator is used to refer to the square brackets following an object. The .loc and .iloc indexers also use the indexing operator to make selections. In this indexing operator to refer to df[].

In order to select a single column, we simply put the name of the column in-between the brackets

Python
# importing pandas package import pandas as pd   # making data frame from csv file data = pd.read_csv("nba.csv", index_col ="Name")   # retrieving columns by indexing operator first = data["Age"]       print(first) 


Output:

Indexing a DataFrame using .loc[ ]

This function selects data by the label of the rows and columns. The df.loc indexer selects data in a different way than just the indexing operator. It can select subsets of rows or columns. It can also simultaneously select subsets of rows and columns.

In order to select a single row using .loc[], we put a single row label in a .loc function.

Python
# importing pandas package import pandas as pd   # making data frame from csv file data = pd.read_csv("nba.csv", index_col ="Name")   # retrieving row by loc method first = data.loc["Avery Bradley"] second = data.loc["R.J. Hunter"]     print(first, "\n\n\n", second) 


Output:
As shown in the output image, two series were returned since there was only one parameter both of the times.

Indexing a DataFrame using .iloc[ ] 

This function allows us to retrieve rows and columns by position. In order to do that, we’ll need to specify the positions of the rows that we want, and the positions of the columns that we want as well. The df.iloc indexer is very similar to df.loc but only uses integer locations to make its selections.

In order to select a single row using .iloc[], we can pass a single integer to .iloc[] function.

Python
import pandas as pd   # making data frame from csv file data = pd.read_csv("nba.csv", index_col ="Name")     # retrieving rows by iloc method  row2 = data.iloc[3]        print(row2) 


Output:

For more Details refer

  • Indexing and Selecting Data with Pandas
  • Boolean Indexing in Pandas

Working with Missing Data

Missing Data can occur when no information is provided for one or more items or for a whole unit. Missing Data is a very big problem in real life scenario. Missing Data can also refer to as NA(Not Available) values in pandas.

Checking for missing values using isnull() and notnull() :


In order to check missing values in Pandas DataFrame, we use a function isnull() and notnull(). Both function help in checking whether a value is NaN or not. These function can also be used in Pandas Series in order to find null values in a series.

Python
# importing pandas as pd import pandas as pd   # importing numpy as np import numpy as np   # dictionary of lists dict = {'First Score':[100, 90, np.nan, 95],         'Second Score': [30, 45, 56, np.nan],         'Third Score':[np.nan, 40, 80, 98]}   # creating a dataframe from list df = pd.DataFrame(dict)   # using isnull() function   df.isnull() 


Output:

Filling missing values using fillna(), replace() and interpolate()


In order to fill null values in a datasets, we use fillna(), replace() and interpolate() function these function replace NaN values with some value of their own. All these function help in filling a null values in datasets of a DataFrame. Interpolate() function is basically used to fill NA values in the dataframe but it uses various interpolation technique to fill the missing values rather than hard-coding the value.

Python
# importing pandas as pd import pandas as pd   # importing numpy as np import numpy as np   # dictionary of lists dict = {'First Score':[100, 90, np.nan, 95],         'Second Score': [30, 45, 56, np.nan],         'Third Score':[np.nan, 40, 80, 98]}   # creating a dataframe from dictionary df = pd.DataFrame(dict)   # filling missing value using fillna()   df.fillna(0) 


Dropping missing values using dropna()


In order to drop a null values from a dataframe, we used dropna() function this function drop Rows/Columns of datasets with Null values in different ways.

Python
# importing pandas as pd import pandas as pd   # importing numpy as np import numpy as np   # dictionary of lists dict = {'First Score':[100, 90, np.nan, 95],         'Second Score': [30, np.nan, 45, 56],         'Third Score':[52, 40, 80, 98],         'Fourth Score':[np.nan, np.nan, np.nan, 65]}   # creating a dataframe from dictionary df = pd.DataFrame(dict)     df 


Output:


Now we drop rows with at least one Nan value (Null value).

Python
# importing pandas as pd import pandas as pd   # importing numpy as np import numpy as np   # dictionary of lists dict = {'First Score':[100, 90, np.nan, 95],         'Second Score': [30, np.nan, 45, 56],         'Third Score':[52, 40, 80, 98],         'Fourth Score':[np.nan, np.nan, np.nan, 65]}   # creating a dataframe from dictionary df = pd.DataFrame(dict)   # using dropna() function   df.dropna() 


Output:

For more Details refer to Working with Missing Data in Pandas
 

Iterating over rows and columns

Iteration is a general term for taking each item of something, one after another. Pandas DataFrame consists of rows and columns so, in order to iterate over dataframe, we have to iterate a dataframe like a dictionary.

Iterating over rows


In order to iterate over rows, we can use three function iteritems(), iterrows(), itertuples() . These three function will help in iteration over rows.

Python
# importing pandas as pd import pandas as pd    # dictionary of lists dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],         'degree': ["MBA", "BCA", "M.Tech", "MBA"],         'score':[90, 40, 80, 98]}   # creating a dataframe from a dictionary  df = pd.DataFrame(dict)   print(df) 


Output:

Now we apply iterrows() function in order to get a each element of rows.

Python
# importing pandas as pd import pandas as pd    # dictionary of lists dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],         'degree': ["MBA", "BCA", "M.Tech", "MBA"],         'score':[90, 40, 80, 98]}   # creating a dataframe from a dictionary  df = pd.DataFrame(dict)   # iterating over rows using iterrows() function  for i, j in df.iterrows():     print(i, j)     print() 


Output:

Iterating over Columns


In order to iterate over columns, we need to create a list of dataframe columns and then iterating through that list to pull out the dataframe columns.

Python
# importing pandas as pd import pandas as pd     # dictionary of lists dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],         'degree': ["MBA", "BCA", "M.Tech", "MBA"],         'score':[90, 40, 80, 98]}    # creating a dataframe from a dictionary  df = pd.DataFrame(dict)   print(df) 


Output:

Now we iterate through columns in order to iterate through columns we first create a list of dataframe columns and then iterate through list.

Python
# creating a list of dataframe columns columns = list(df)   for i in columns:       # printing the third element of the column     print (df[i][2]) 


Output:

 
For more Details refer to Iterating over rows and columns in Pandas DataFrame

DataFrame Methods:

FUNCTIONDESCRIPTION
index()Method returns index (row labels) of the DataFrame
insert()Method inserts a column into a DataFrame
add()Method returns addition of dataframe and other, element-wise (binary operator add)
sub()Method returns subtraction of dataframe and other, element-wise (binary operator sub)
mul()Method returns multiplication of dataframe and other, element-wise (binary operator mul)
div()Method returns floating division of dataframe and other, element-wise (binary operator truediv)
unique()Method extracts the unique values in the dataframe
nunique()Method returns count of the unique values in the dataframe
value_counts()Method counts the number of times each unique value occurs within the Series
columns()Method returns the column labels of the DataFrame
axes()Method returns a list representing the axes of the DataFrame
isnull()Method creates a Boolean Series for extracting rows with null values
notnull()Method creates a Boolean Series for extracting rows with non-null values
isin()Method extracts rows from a DataFrame where a column value exists in a predefined collection
dtypes()Method returns a Series with the data type of each column. The result’s index is the original DataFrame’s columns
astype()Method converts the data types in a Series
values()Method returns a Numpy representation of the DataFrame i.e. only the values in the DataFrame will be returned, the axes labels will be removed
sort_values()- Set1, Set2Method sorts a data frame in Ascending or Descending order of passed Column
sort_index()Method sorts the values in a DataFrame based on their index positions or labels instead of their values but sometimes a data frame is made out of two or more data frames and hence later index can be changed using this method
loc[]Method retrieves rows based on index label
iloc[]Method retrieves rows based on index position
ix[]Method retrieves DataFrame rows based on either index label or index position. This method combines the best features of the .loc[] and .iloc[] methods
rename()Method is called on a DataFrame to change the names of the index labels or column names
columns()Method is an alternative attribute to change the coloumn name
drop()Method is used to delete rows or columns from a DataFrame
pop()Method is used to delete rows or columns from a DataFrame
sample()Method pulls out a random sample of rows or columns from a DataFrame
nsmallest()Method pulls out the rows with the smallest values in a column
nlargest()Method pulls out the rows with the largest values in a column
shape()Method returns a tuple representing the dimensionality of the DataFrame
ndim()Method returns an ‘int’ representing the number of axes / array dimensions.
Returns 1 if Series, otherwise returns 2 if DataFrame
dropna()Method allows the user to analyze and drop Rows/Columns with Null values in different ways
fillna()Method manages and let the user replace NaN values with some value of their own
rank()Values in a Series can be ranked in order with this method
query()Method is an alternate string-based syntax for extracting a subset from a DataFrame
copy()Method creates an independent copy of a pandas object
duplicated()Method creates a Boolean Series and uses it to extract rows that have duplicate values
drop_duplicates()Method is an alternative option to identifying duplicate rows and removing them through filtering
set_index()Method sets the DataFrame index (row labels) using one or more existing columns
reset_index()Method resets index of a Data Frame. This method sets a list of integer ranging from 0 to length of data as index
where()Method is used to check a Data Frame for one or more condition and return the result accordingly. By default, the rows not satisfying the condition are filled with NaN value

More on Pandas

  1. Python | Pandas Series
  2. Python | Pandas Working With Text Data
  3. Python | Pandas Working with Dates and Times
  4. Python | Pandas Merging, Joining, and Concatenating.

Next Article
Python | Pandas dataframe.eq()

A

anuragtriarna
Improve
Article Tags :
  • Pandas
  • AI-ML-DS
  • Python pandas-dataFrame

Similar Reads

  • Pandas Dataframe Index
    Index in pandas dataframe act as reference for each row in dataset. It can be numeric or based on specific column values. The default index is usually a RangeIndex starting from 0, but you can customize it for better data understanding. You can easily access the current index of a dataframe using th
    3 min read
  • Pandas Access DataFrame
    Accessing a dataframe in pandas involves retrieving, exploring, and manipulating data stored within this structure. The most basic form of accessing a DataFrame is simply referring to it by its variable name. This will display the entire DataFrame, which includes all rows and columns. [GFGTABS] Pyth
    3 min read
  • Slicing Pandas Dataframe
    Slicing a Pandas DataFrame is a important skill for extracting specific data subsets. Whether you want to select rows, columns or individual cells, Pandas provides efficient methods like iloc[] and loc[]. In this guide we’ll explore how to use integer-based and label-based indexing to slice DataFram
    3 min read
  • Python | Pandas dataframe.eq()
    Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric python packages. Pandas is one of those packages and makes importing and analyzing data much easier. Pandas dataframe.eq() is a wrapper used for the flexible comparison. It provides a con
    3 min read
  • Python | Pandas Dataframe.at[ ]
    Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier. Pandas at[] is used to return data in a dataframe at the passed location. The passed l
    2 min read
  • Python | Pandas DataFrame.ix[ ]
    Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier. Pandas DataFrame.ix[ ] is both Label and Integer based slicing technique. Besides pure
    2 min read
  • Creating a Pandas DataFrame
    Pandas DataFrame comes is a powerful tool that allows us to store and manipulate data in a structured way, similar to an Excel spreadsheet or a SQL table. A DataFrame is similar to a table with rows and columns. It helps in handling large amounts of data, performing calculations, filtering informati
    3 min read
  • Pandas DataFrame.loc[] Method
    Pandas DataFrame is a two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects. This is the primary data structure o
    6 min read
  • Python | Pandas Dataframe.iat[ ]
    Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier. Pandas iat[] method is used to return data in a dataframe at the passed location. The
    2 min read
  • Python | Pandas dataframe.all()
    Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric python packages. Pandas is one of those packages and makes importing and analyzing data much easier. DataFrame.all() method checks whether all elements are True, potentially over an axis.
    3 min read
geeksforgeeks-footer-logo
Corporate & Communications Address:
A-143, 7th Floor, Sovereign Corporate Tower, Sector- 136, Noida, Uttar Pradesh (201305)
Registered Address:
K 061, Tower K, Gulshan Vivante Apartment, Sector 137, Noida, Gautam Buddh Nagar, Uttar Pradesh, 201305
GFG App on Play Store GFG App on App Store
Advertise with us
  • Company
  • About Us
  • Legal
  • Privacy Policy
  • In Media
  • Contact Us
  • Advertise with us
  • GFG Corporate Solution
  • Placement Training Program
  • Languages
  • Python
  • Java
  • C++
  • PHP
  • GoLang
  • SQL
  • R Language
  • Android Tutorial
  • Tutorials Archive
  • DSA
  • Data Structures
  • Algorithms
  • DSA for Beginners
  • Basic DSA Problems
  • DSA Roadmap
  • Top 100 DSA Interview Problems
  • DSA Roadmap by Sandeep Jain
  • All Cheat Sheets
  • Data Science & ML
  • Data Science With Python
  • Data Science For Beginner
  • Machine Learning
  • ML Maths
  • Data Visualisation
  • Pandas
  • NumPy
  • NLP
  • Deep Learning
  • Web Technologies
  • HTML
  • CSS
  • JavaScript
  • TypeScript
  • ReactJS
  • NextJS
  • Bootstrap
  • Web Design
  • Python Tutorial
  • Python Programming Examples
  • Python Projects
  • Python Tkinter
  • Python Web Scraping
  • OpenCV Tutorial
  • Python Interview Question
  • Django
  • Computer Science
  • Operating Systems
  • Computer Network
  • Database Management System
  • Software Engineering
  • Digital Logic Design
  • Engineering Maths
  • Software Development
  • Software Testing
  • DevOps
  • Git
  • Linux
  • AWS
  • Docker
  • Kubernetes
  • Azure
  • GCP
  • DevOps Roadmap
  • System Design
  • High Level Design
  • Low Level Design
  • UML Diagrams
  • Interview Guide
  • Design Patterns
  • OOAD
  • System Design Bootcamp
  • Interview Questions
  • Inteview Preparation
  • Competitive Programming
  • Top DS or Algo for CP
  • Company-Wise Recruitment Process
  • Company-Wise Preparation
  • Aptitude Preparation
  • Puzzles
  • School Subjects
  • Mathematics
  • Physics
  • Chemistry
  • Biology
  • Social Science
  • English Grammar
  • Commerce
  • World GK
  • GeeksforGeeks Videos
  • DSA
  • Python
  • Java
  • C++
  • Web Development
  • Data Science
  • CS Subjects
@GeeksforGeeks, Sanchhaya Education Private Limited, All rights reserved
We use cookies to ensure you have the best browsing experience on our website. By using our site, you acknowledge that you have read and understood our Cookie Policy & Privacy Policy
Lightbox
Improvement
Suggest Changes
Help us improve. Share your suggestions to enhance the article. Contribute your expertise and make a difference in the GeeksforGeeks portal.
geeksforgeeks-suggest-icon
Create Improvement
Enhance the article with your expertise. Contribute to the GeeksforGeeks community and help create better learning resources for all.
geeksforgeeks-improvement-icon
Suggest Changes
min 4 words, max Words Limit:1000

Thank You!

Your suggestions are valuable to us.

What kind of Experience do you want to share?

Interview Experiences
Admission Experiences
Career Journeys
Work Experiences
Campus Experiences
Competitive Exam Experiences