NumPy | Replace NaN values with average of columns

Last Updated : 09 Feb, 2024

Data visualization is one of the most important steps in machine learning and data analytics. 

Before data can be visualized, it has to be cleaned and arranged. Data sets often contain NaN (Not a Number) values, which mark missing or undefined entries, propagate through arithmetic, and are unusable for data visualization.

One common way to solve this problem is to replace each NaN value with the average (mean) of the column it belongs to.
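The value substituted for a NaN in every method below is the mean of the non-NaN entries in the same column. Written out, with x_{ij} the entry in row i and column j, and V_j the set of rows where column j is not NaN (notation introduced here for illustration):

```latex
\bar{x}_j = \frac{1}{|V_j|} \sum_{i \in V_j} x_{ij},
\qquad V_j = \{\, i : x_{ij} \text{ is not NaN} \,\}
```

For example, the last column of the sample array used in the examples holds 5.5, 6.5 and one NaN, so that NaN is replaced by (5.5 + 6.5) / 2 = 6.0.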

Below are a few methods for doing this:

  • Using np.nanmean and np.take
  • Using np.ma and np.where
  • Using a naive loop and zip()
  • Using list comprehension and built-in functions
  • Using zip() and lambda

Each method is explained below with a Python example:

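Using np.nanmean and np.take

We use NumPy's nanmean() function to compute the mean of each column while ignoring NaN values. We then locate the NaN entries with np.where(np.isnan(...)), which returns their row and column indices, and use the take() function to look up the mean of each NaN's column and write it into that position.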

Example:

```python
# Python code to demonstrate
# how to replace NaN values
# with the average of columns
import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# column mean, ignoring NaN values
col_mean = np.nanmean(ini_array, axis=0)

# printing column mean
print("columns mean", str(col_mean))

# find indices where NaN values are present
inds = np.where(np.isnan(ini_array))

# replace each NaN with the mean of its column
ini_array[inds] = np.take(col_mean, inds[1])

# printing final array
print("final array", ini_array)
```

Output:

```
initial array [[1.3 2.5 3.6 nan]
 [2.6 3.3 nan 5.5]
 [2.1 3.2 5.4 6.5]]
columns mean [2.  3.  4.5 6. ]
final array [[1.3 2.5 3.6 6. ]
 [2.6 3.3 4.5 5.5]
 [2.1 3.2 5.4 6.5]]
```
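To see what np.where() and np.take() each contribute in this method, the intermediate values can be printed on their own. This is a small sketch reusing the same sample array; the names rows, cols and fill_values are chosen here for illustration.

```python
import numpy as np

# Same 3x4 sample array as in the example above
arr = np.array([[1.3, 2.5, 3.6, np.nan],
                [2.6, 3.3, np.nan, 5.5],
                [2.1, 3.2, 5.4, 6.5]])

# Per-column mean that skips NaN entries
col_mean = np.nanmean(arr, axis=0)

# With a single argument, np.where returns a tuple (row_indices, col_indices)
rows, cols = np.where(np.isnan(arr))
print(rows, cols)        # [0 1] [3 2]

# np.take looks up the mean of the column that each NaN sits in
fill_values = np.take(col_mean, cols)
print(fill_values)       # [6.  4.5]
```

Only the column indices are needed for the lookup; the row indices matter only when writing the values back with arr[rows, cols] = fill_values.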

Using np.ma and np.where 

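We use the np.ma module, whose np.ma.array() function creates a masked array in which the NaN values are masked out, so that mean(axis=0) averages only the valid entries of each column. We then use np.where() to take that column mean wherever the original array holds NaN and to keep the original value everywhere else.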

Example:

```python
# Python code to demonstrate
# how to replace NaN values
# with the average of columns
import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# replace NaN with column means computed on a masked array
res = np.where(np.isnan(ini_array),
               np.ma.array(ini_array, mask=np.isnan(ini_array)).mean(axis=0),
               ini_array)

# printing final array
print("final array", res)
```

Output:

```
initial array [[1.3 2.5 3.6 nan]
 [2.6 3.3 nan 5.5]
 [2.1 3.2 5.4 6.5]]
final array [[1.3 2.5 3.6 6. ]
 [2.6 3.3 4.5 5.5]
 [2.1 3.2 5.4 6.5]]
```
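A possible refinement, sketched under the assumption that some column may contain no valid values at all: np.ma.masked_invalid() masks every NaN (and infinity), and calling .filled(np.nan) on the masked column means returns a plain ndarray in which an all-NaN column simply keeps NaN. The sample array here is made up for illustration.

```python
import numpy as np

# Hypothetical sample: the last column has no valid values at all
arr = np.array([[1.0, np.nan, np.nan],
                [3.0, 4.0, np.nan]])

# masked_invalid masks every NaN (and +/-inf) entry
masked = np.ma.masked_invalid(arr)

# Column means over unmasked values; an all-masked column gives a masked
# entry, which filled() turns into NaN in a plain ndarray
col_mean = masked.mean(axis=0).filled(np.nan)

# Replace each NaN with its column's mean (broadcast across rows)
res = np.where(np.isnan(arr), col_mean, arr)
print(res)
```

Here the first two columns are filled with 2.0 and 4.0, while the third column stays NaN because there is no mean to substitute.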

Using a naive loop and zip()

np.where() returns the row indices and the column indices of the NaN values as two separate arrays. We unpack them into zip(), which pairs them up into a (row, column) index for each NaN value. We then loop over these pairs and replace each NaN with the mean of the non-NaN values in its column.

Example:

```python
# Python code to demonstrate
# how to replace NaN values
# with the average of columns
import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# indices where values are NaN in the array
indices = np.where(np.isnan(ini_array))

# iterate over the (row, col) pairs and replace each NaN
# with the mean of the non-NaN values in its column
for row, col in zip(*indices):
    ini_array[row, col] = np.mean(
        ini_array[~np.isnan(ini_array[:, col]), col])

# printing final array
print("final array", ini_array)
```

Output:

```
initial array [[1.3 2.5 3.6 nan]
 [2.6 3.3 nan 5.5]
 [2.1 3.2 5.4 6.5]]
final array [[1.3 2.5 3.6 6. ]
 [2.6 3.3 4.5 5.5]
 [2.1 3.2 5.4 6.5]]
```
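The loop above recomputes a column mean every time it meets a NaN. A column-wise restructuring (not the original author's code) computes each mean once from the values that are not NaN and assigns it to all NaN positions in that column through a boolean mask. As an added assumption, columns that are entirely NaN are left unchanged.

```python
import numpy as np

arr = np.array([[1.3, 2.5, 3.6, np.nan],
                [2.6, 3.3, np.nan, 5.5],
                [2.1, 3.2, 5.4, 6.5]])

# Visit each column once instead of once per NaN
for col in range(arr.shape[1]):
    nan_mask = np.isnan(arr[:, col])
    # Skip columns with no NaN, and all-NaN columns (nothing to average)
    if nan_mask.any() and not nan_mask.all():
        # Mean of the non-NaN entries, computed before anything is overwritten
        arr[nan_mask, col] = arr[~nan_mask, col].mean()

print(arr)
```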

Using list comprehension and built-in functions

This approach works on a plain Python list of lists in which missing values are stored as None rather than as NaN. It first computes the column means with a list comprehension over zip(*arr), using filter() to skip the None entries. It then replaces each None with the corresponding column mean using another list comprehension with enumerate(), and finally returns the modified list.

Algorithm:

1. Compute the column means.
2. Replace the None values in the array with the corresponding column means using list comprehension and built-in functions.
3. Return the modified list.

```python
def replace_nan_with_mean(arr):
    # mean of each column, ignoring None entries
    col_means = [sum(filter(lambda x: x is not None, col))
                 / len(list(filter(lambda x: x is not None, col)))
                 for col in zip(*arr)]
    # replace each None with the mean of its column
    for i in range(len(arr)):
        arr[i] = [col_means[j] if x is None else x
                  for j, x in enumerate(arr[i])]
    return arr


arr = [[1.3, 2.5, 3.6, None],
       [2.6, 3.3, None, 5.5],
       [2.1, 3.2, 5.4, 6.5]]
print(replace_nan_with_mean(arr))
```

Output
[[1.3, 2.5, 3.6, 6.0], [2.6, 3.3, 4.5, 5.5], [2.1, 3.2, 5.4, 6.5]]  
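The function above treats only None as missing, so a list holding float('nan') would not be filled. A sketch of a variant for that case uses math.isnan() to detect missing entries. It assumes every entry is a float (math.isnan() raises TypeError on None) and, as a further assumption, leaves an all-NaN column as NaN. The name replace_float_nan_with_mean is hypothetical.

```python
import math

def replace_float_nan_with_mean(arr):
    """Return new rows with each float NaN replaced by its column mean."""
    col_means = []
    for col in zip(*arr):
        # Keep only real numbers; math.isnan detects float('nan')
        valid = [x for x in col if not math.isnan(x)]
        # Assumption: a column with no valid values keeps NaN as its "mean"
        col_means.append(sum(valid) / len(valid) if valid else math.nan)
    return [[col_means[j] if math.isnan(x) else x for j, x in enumerate(row)]
            for row in arr]


nan = float("nan")
data = [[1.3, 2.5, 3.6, nan],
        [2.6, 3.3, nan, 5.5],
        [2.1, 3.2, 5.4, 6.5]]
print(replace_float_nan_with_mean(data))
# [[1.3, 2.5, 3.6, 6.0], [2.6, 3.3, 4.5, 5.5], [2.1, 3.2, 5.4, 6.5]]
```

Unlike the original, this version builds a new list instead of modifying the input in place.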

Using zip() and lambda

Compute the column means, excluding None values, by looping over the transposed array zip(*arr). Then replace the None values with the column means using map() and a lambda function.

Algorithm

1. Initialize an empty list means to store the column means.
2. Loop over the transposed array zip(*arr) to iterate over columns.
3. For each column, filter out None values and compute the mean of the remaining values. If there are no remaining values, set the mean to 0.
4. Append the mean to the means list.
5. Use map() and lambda functions to replace None values with the corresponding column mean in each row of the array arr.
6. Return the modified array arr.

```python
# initial array
arr = [[1.3, 2.5, 3.6, None],
       [2.6, 3.3, None, 5.5],
       [2.1, 3.2, 5.4, 6.5]]

# compute column means, ignoring None entries
means = []
for col in zip(*arr):
    values = [x for x in col if x is not None]
    means.append(sum(values) / len(values) if values else 0)

# replace None values with column means
arr = list(map(lambda row: [means[j] if x is None else x
                            for j, x in enumerate(row)], arr))

# print final array
print(arr)
```

Output
[[1.3, 2.5, 3.6, 6.0], [2.6, 3.3, 4.5, 5.5], [2.1, 3.2, 5.4, 6.5]]  
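To exercise the "set the mean to 0" rule from step 3, the same steps can be wrapped in a function and run on a column that has no values. The function name fill_none_with_column_mean and the sample data are hypothetical.

```python
def fill_none_with_column_mean(arr):
    """Hypothetical helper: the zip() + lambda steps wrapped in a function."""
    means = []                                   # step 1
    for col in zip(*arr):                        # step 2: iterate over columns
        values = [x for x in col if x is not None]
        # steps 3-4: mean of the non-None values, or 0 if the column is empty
        means.append(sum(values) / len(values) if values else 0)
    # step 5: swap each None for the mean of its column
    return list(map(lambda row: [means[j] if x is None else x
                                 for j, x in enumerate(row)], arr))


# Made-up sample whose middle column holds no values at all
sample = [[1.0, None, 4.0],
          [3.0, None, None]]
print(fill_none_with_column_mean(sample))   # [[1.0, 0, 4.0], [3.0, 0, 4.0]]
```

Filling an empty column with 0 keeps the result free of None, but it also inserts a value that was never observed; whether that is acceptable depends on the data.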

garg_ak0109