
Working with Missing Data in Pandas

Last Updated : 02 Jun, 2025

In Pandas, missing data refers to values that were never recorded or were not collected properly. These missing values are represented as:

  • None: A Python object used to represent missing values in object-type arrays.
  • NaN: A special floating-point value ("Not a Number"), available in NumPy as np.nan, which is recognized by all systems that use the IEEE 754 floating-point standard.
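
The difference between the two markers can be seen in a small sketch. The Series names and values below are illustrative only; the object dtype is requested explicitly so that None is stored unchanged.

```python
import numpy as np
import pandas as pd

# In a numeric Series, None is converted to NaN and the dtype becomes float64.
numeric = pd.Series([1, None, 3])

# With dtype=object, None is stored unchanged as a Python object.
text = pd.Series(["a", None, "c"], dtype=object)

print(numeric.dtype)            # float64
print(text[1] is None)          # True

# Both markers are reported as missing.
print(numeric.isna().tolist())  # [False, True, False]
print(text.isna().tolist())     # [False, True, False]

# NaN never compares equal to itself, so test with isna()/isnull(), not ==.
print(np.nan == np.nan)         # False
```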

In this article we will see how to detect, fill and drop missing values in a DataFrame so that the data is clean and ready for analysis.

Checking Missing Values in Pandas

Pandas provides two functions, isnull() and notnull(), that detect whether the values in a DataFrame or Series are missing. Both make data cleaning and preprocessing easier and are described below:

1. Using isnull()

isnull() returns a DataFrame of Boolean values in which True marks missing data (NaN or None). This is useful when we want to find and fill missing data in a dataset.

Example 1: Finding Missing Values in a DataFrame

We will use the NumPy and Pandas libraries for this implementation.

Python
import pandas as pd
import numpy as np

d = {'First Score': [100, 90, np.nan, 95],
     'Second Score': [30, 45, 56, np.nan],
     'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(d)

mv = df.isnull()

print(mv)

Output
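
Beyond the Boolean mask itself, a common follow-up is to summarise it. The sketch below rebuilds the same DataFrame and counts the missing values; the variable names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# True counts as 1, so summing the mask gives the missing values per column.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())

# Rows that contain at least one missing value.
rows_with_missing = df[df.isnull().any(axis=1)]

print(missing_per_column)
print(total_missing)                     # 3
print(rows_with_missing.index.tolist())  # [0, 2, 3]
```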

Example 2: Filtering Data Based on Missing Values

Here we use a sample employee dataset stored in a CSV file (employees.csv). The isnull() function is applied to the "Gender" column to filter and print the rows in which the gender is missing.

Python
import pandas as pd
d = pd.read_csv("/content/employees.csv")

bool_series = pd.isnull(d["Gender"])
missing_gender_data = d[bool_series]
print(missing_gender_data)

Output

2. Checking for Non-Missing Values Using notnull()

The notnull() function returns a DataFrame of Boolean values in which True indicates non-missing (valid) data. It is useful when we want to focus only on the rows that have valid values.

Example 1: Identifying Non-Missing Values in a DataFrame

Python
import pandas as pd
import numpy as np

d = {'First Score': [100, 90, np.nan, 95],
     'Second Score': [30, 45, 56, np.nan],
     'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(d)

nmv = df.notnull()

print(nmv)

Output
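
As a rough check of how notnull() relates to isnull(), the following sketch uses the same scores data. It shows that the two masks are element-wise opposites and that count() gives the number of non-missing values per column. Variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

non_missing = df.notnull()

# notnull() is the element-wise negation of isnull().
same_as_negation = non_missing.equals(~df.isnull())

# count() reports the number of non-missing values in each column.
counts = df.count()

# Keep only the rows in which every column has a value.
complete_rows = df[non_missing.all(axis=1)]

print(same_as_negation)              # True
print(counts.tolist())               # [3, 3, 3]
print(complete_rows.index.tolist())  # [1]
```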

Example 2: Filtering Data with Non-Missing Values

The notnull() function is applied to the "Gender" column to filter and print only the rows in which the gender is present (not missing).

Python
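import pandas as pd
d = pd.read_csv("/content/employees.csv")

# Boolean mask: True where "Gender" is present
nmg = pd.notnull(d["Gender"])

# Keep only the rows with a recorded gender
nmgd = d[nmg]

# print() works in plain scripts as well as in notebooks
print(nmgd)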

Output

Filling Missing Values in Pandas

The following functions let us replace missing values with a specified value or estimate them by interpolation.

1. Using fillna()

fillna() replaces missing values (NaN) with a given value. Let's look at several examples.

Example 1: Fill Missing Values with Zero

Python
import pandas as pd
import numpy as np

d = {'First Score': [100, 90, np.nan, 95],
     'Second Score': [30, 45, 56, np.nan],
     'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(d)

df.fillna(0)

Output
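
A constant such as 0 is not always a sensible replacement. As a hedged sketch, fillna() can also take one value per column: a Series such as the column means, or a dictionary. The choice of mean and median here is only an illustration, not a recommendation from the article.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# df.mean() is a Series indexed by column name, so each column
# is filled with its own mean.
filled_mean = df.fillna(df.mean())

# A dictionary fills only the listed columns; 'Third Score' is left untouched.
filled_custom = df.fillna({'First Score': 0,
                           'Second Score': df['Second Score'].median()})

print(filled_mean)
print(filled_custom)
```

Neither call changes df itself; both return a new DataFrame.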

Example 2: Fill with Previous Value (Forward Fill)

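The ffill() method (forward fill) fills each missing value with the previous valid value in the same column. Older code writes this as fillna(method='pad'), but the method argument of fillna() is deprecated since pandas 2.1. A missing value with no valid value before it stays NaN.

Python
# Forward fill; replaces the deprecated df.fillna(method='pad')
df.ffill()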

Output

Example 3: Fill with Next Value (Backward Fill)

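The bfill() method (backward fill) fills each missing value with the next valid value in the same column. Older code writes this as fillna(method='bfill'), which is deprecated since pandas 2.1. A missing value with no valid value after it stays NaN.

Python
# Backward fill; replaces the deprecated df.fillna(method='bfill')
df.bfill()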

Output
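
To compare the two directions, the sketch below applies forward and backward fill to the scores DataFrame used earlier. It uses the ffill() and bfill() methods; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

forward = df.ffill()    # copy the previous valid value downwards
backward = df.bfill()   # copy the next valid value upwards

# The first 'Third Score' has no earlier value, so forward fill leaves it NaN.
print(forward)

# The last 'Second Score' has no later value, so backward fill leaves it NaN.
print(backward)

# Chaining both directions covers leading and trailing gaps.
both = df.ffill().bfill()
print(int(both.isnull().sum().sum()))  # 0
```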

Example 4: Fill NaN Values with 'No Gender'

Python
import pandas as pd
import numpy as np
d = pd.read_csv("/content/employees.csv")

d[10:25]

Output

Now we fill all the null values in the "Gender" column with "No Gender".

Python
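# Assign the result back to the column; calling fillna(..., inplace=True)
# on a column selection triggers a chained-assignment warning in recent pandas
d["Gender"] = d["Gender"].fillna('No Gender')

d[10:25]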

Output
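
Since employees.csv is not reproduced here, the same step can be sketched on a small stand-in DataFrame. The names and values below are invented for illustration. The result is assigned back to the column, which avoids relying on inplace=True on a column selection.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for employees.csv; the rows are made up.
staff = pd.DataFrame({'First Name': ['Asha', 'Ben', 'Chen'],
                      'Gender': ['Female', np.nan, 'Male']})

# Replace the missing gender and store the filled column back in the frame.
staff['Gender'] = staff['Gender'].fillna('No Gender')

print(staff)
```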

2. Using replace()

Use replace() function to replace NaN values with a specific value.

Example

Python
import pandas as pd
import numpy as np

data = pd.read_csv("/content/employees.csv")
data[10:25]

Output

Now we replace every NaN value in the DataFrame with -99.

Python
data.replace(to_replace=np.nan, value=-99) 

Output
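
The behaviour of replace() can be checked on a small numeric DataFrame. The column names and values are hypothetical and stand in for the CSV data. Note that replace() returns a new DataFrame and leaves the original unchanged unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data standing in for the CSV file.
pay = pd.DataFrame({'Salary': [52000.0, 61000.0, np.nan],
                    'Bonus': [np.nan, 4.5, 6.0]})

replaced = pay.replace(to_replace=np.nan, value=-99)

print(replaced)

# The original still contains its NaN values.
print(int(pay.isnull().sum().sum()))  # 2
```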

3. Using interpolate()

The interpolate() function fills missing values using interpolation techniques such as the linear method.

Example

Python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

print(df)

Output

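Let's interpolate the missing values using the linear method. This method ignores the index and treats the values as equally spaced. With limit_direction='forward', a missing value at the start of a column has nothing before it to interpolate from and stays NaN.

Python
df.interpolate(method='linear', limit_direction='forward')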

Output
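
The arithmetic behind linear interpolation can be followed on the same DataFrame. The sketch below repeats the call and checks a few values by hand: a single gap gets the midpoint of its neighbours, and a gap of two is split into equal steps.

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

result = df.interpolate(method='linear', limit_direction='forward')

# A: the gap lies between 5 and 1, so it becomes (5 + 1) / 2 = 3.0
# C: the gap lies between 16 and 3, so it becomes (16 + 3) / 2 = 9.5
# D: two gaps between 3 and 6 are filled in equal steps: 4.0 and 5.0
# B: the leading NaN stays NaN; the trailing NaN takes the last valid value, 3.0
print(result)
```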

Dropping Missing Values in Pandas

The dropna() function removes rows or columns that contain NaN values. It can drop data based on different conditions.

1. Dropping Rows with At Least One Null Value

Remove rows that contain at least one missing value.

Example

Python
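import pandas as pd
import numpy as np

# Named "data" rather than "dict" to avoid shadowing the built-in dict type
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, 40, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(data)

# Keep only the rows that have no missing values
df.dropna()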

Output

2. Dropping Rows with All Null Values

We can drop rows where all values are missing using dropna(how='all').

Example

Python
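import pandas as pd
import numpy as np

# Named "data" rather than "dict" to avoid shadowing the built-in dict type
data = {'First Score': [100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, np.nan, 80, 98],
        'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(data)

# Drop only the rows in which every value is missing (row 1 here)
df.dropna(how='all')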

Output

3. Dropping Columns with At Least One Null Value

To remove columns that contain at least one missing value we use dropna(axis=1).

Example

Python
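import pandas as pd
import numpy as np

# Named "data" rather than "dict" to avoid shadowing the built-in dict type
data = {'First Score': [100, np.nan, np.nan, 95],
        'Second Score': [30, np.nan, 45, 56],
        'Third Score': [52, np.nan, 80, 98],
        'Fourth Score': [60, 67, 68, 65]}
df = pd.DataFrame(data)

# axis=1 drops columns; only 'Fourth Score' has no missing values
df.dropna(axis=1)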

Output
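
dropna() also accepts subset and thresh arguments, which the examples in this section do not use. As a minimal sketch on the scores data from the first dropna() example, subset limits the check to chosen columns and thresh keeps rows that have at least a given number of non-missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, np.nan, 45, 56],
                   'Third Score': [52, 40, 80, 98],
                   'Fourth Score': [np.nan, np.nan, np.nan, 65]})

# Drop a row only if 'First Score' is missing; other columns are ignored.
by_subset = df.dropna(subset=['First Score'])

# Keep rows that have at least 3 non-missing values.
by_thresh = df.dropna(thresh=3)

print(by_subset.index.tolist())  # [0, 1, 3]
print(by_thresh.index.tolist())  # [0, 3]
```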

4. Dropping Rows with Missing Values in CSV Files

When working with CSV files, we can drop rows with missing values using dropna().

Example

Python
import pandas as pd
d = pd.read_csv("/content/employees.csv")

nd = d.dropna(axis=0, how='any')

print("Old data frame length:", len(d))
print("New data frame length:", len(nd))
print("Rows with at least one missing value:", (len(d) - len(nd)))

Output:

[Image: Drop Rows with NaN]

The difference is 236, so 236 rows had at least one null value in some column. With the functions covered in this article we can detect, fill and drop missing values.
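
The same count can be obtained without dropping anything, by combining isnull() with any(). The sketch below uses a small made-up DataFrame instead of employees.csv, so the numbers differ from the 236 reported above.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the CSV file.
d = pd.DataFrame({'Salary': [50000, np.nan, 72000, 65000],
                  'Bonus': [5.0, 3.5, np.nan, 4.0]})

# Count rows with at least one missing value directly from the mask.
rows_with_missing = int(d.isnull().any(axis=1).sum())

# The same number, computed as in the article by comparing lengths.
dropped = len(d) - len(d.dropna(axis=0, how='any'))

print(rows_with_missing)  # 2
print(dropped)            # 2
```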


Author: abhishek1