Unnest (Explode) Multiple List Columns In A Pandas Dataframe
Last Updated : 24 Apr, 2025
Pandas is an open-source data manipulation library for Python. Have you ever encountered a dataset in which some columns hold a list in every cell? Most Pandas operations, such as filtering, grouping and aggregating, expect one scalar value per cell, so such columns usually need to be unnested (exploded) so that each list element gets its own row. In this article, we will discuss how to unnest or explode multiple list columns in a Pandas data frame.
What is Pandas?
Pandas is an open-source data manipulation and analysis tool built on top of the Python programming language. It provides powerful data structures, such as DataFrame and Series, that allow users to easily manipulate and analyze data.
What are nested list columns?
Nested list columns are columns in a DataFrame where each cell contains a list of values, rather than a single scalar value. This occurs when the data is structured hierarchically, with each cell representing a collection of related sub-values.
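As a rough illustration, the sketch below builds a small data frame with two nested list columns and picks them out by checking whether every cell holds a Python list. The column names and values are made up for this example, and the "every cell is a list" test is only one possible heuristic.

```python
import pandas as pd

# Hypothetical data: 'tags' and 'scores' hold a list in every cell
df = pd.DataFrame({
    'id': [1, 2],
    'tags': [['a', 'b'], ['c']],
    'scores': [[10, 20], [30]],
})

# Treat a column as nested if every one of its cells is a Python list
nested_cols = [
    col for col in df.columns
    if df[col].apply(lambda v: isinstance(v, list)).all()
]
print(nested_cols)
```

A check like this can help decide which column names to pass to the unnesting methods described later.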
Why unnest multiple list columns?
Unnesting multiple list columns in a data frame can be useful for several reasons:
- Data simplification: Unnesting converts complex nested data into a simpler tabular form, making it easier to understand and manipulate.
- Improved analysis: Unnested data can be better analyzed with Pandas and other data analysis tools. This allows data to be more easily combined, filtered and processed.
- Improved visualization: Unnested data can be visualized more effectively, allowing better understanding to be conveyed through charts and graphs.
- Compatibility: Unnested data is often needed for certain types of analysis, such as machine learning modeling, which typically requires tabular data as input.
- Data integration: Unnesting can facilitate the integration of data from different sources or systems by aligning the data structure with a more standard table format.
- Normalization: Unnesting can be a step towards data normalization that can improve data quality and reduce redundancy.
Ways to unnest multiple list columns in a Pandas dataframe:
- Using the explode function
- Using pandas.Series.explode function
- Using pandas.Series with lambda function
Using the explode function
The explode function of a data frame flattens list-like columns by turning each list element into its own row and repeating the values of the other columns. Since Pandas 1.3.0, it accepts a list of column names, provided the lists in each row have the same length across those columns. In this method, we will see how we can unnest multiple list columns using the explode function.
Syntax:
df = df.explode(['column-1', 'column-2']).reset_index(drop=True)
Here,
- column-1, column-2: These are the columns that you want to unnest.
- df: It is the data frame that has those nested columns.
- reset_index(drop=True): It replaces the repeated index labels left by explode with a fresh 0-based index.
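The index handling can be sketched on a tiny, made-up frame: by default explode repeats the original row label for every element, and its ignore_index=True parameter (available since Pandas 1.1.0) is an alternative to chaining reset_index(drop=True). The frame and variable names below are illustrative only.

```python
import pandas as pd

# Hypothetical frame with a single list column
df = pd.DataFrame({'name': ['x', 'y'], 'items': [[1, 2], [3]]})

# Default: each exploded row keeps the label of the row it came from
default_idx = df.explode('items')

# ignore_index=True: rows are relabelled 0, 1, 2, ...
fresh_idx = df.explode('items', ignore_index=True)

print(list(default_idx.index))  # [0, 0, 1]
print(list(fresh_idx.index))    # [0, 1, 2]
```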
Implementations:
In this example, we have created a dataset, which has three columns, Name, Favourite Ice-cream and Favourite Soft-Drink, out of which Favourite Ice-cream and Favourite Soft-Drink columns are nested. We have unnested those columns using the explode function.
Python3
# Import the Pandas library
import pandas as pd

# Create a data frame that has nested columns
df = pd.DataFrame({
    'Name': ['Arun', 'Aniket', 'Ishita', 'Raghav', 'Vinayak'],
    'Favourite Ice-cream': [['Strawberry', 'Choco-chips'],
                            ['Vanilla', 'Black Currant'],
                            ['Butterscotch', 'Chocolate'],
                            ['Mango', 'Choco-chips'],
                            ['Kulfi', 'Kaju-Kishmish']],
    'Favourite Soft-Drink': [['Coca Cola', 'Lemonade'],
                             ['Thumbs Up', 'Sprite'],
                             ['Moutain Dew', 'Fanta'],
                             ['Mirinda', 'Maaza'],
                             ['7Up', 'Sprite']]})

# Print the actual data frame
print('Actual dataframe:\n', df)

# Unnest the nested columns
df = df.explode(['Favourite Ice-cream',
                 'Favourite Soft-Drink']).reset_index(drop=True)

# Print the unnested data frame
print('\nDataframe after unnesting:\n', df)
Output:
Actual dataframe:
      Name        Favourite Ice-cream   Favourite Soft-Drink
0     Arun  [Strawberry, Choco-chips]  [Coca Cola, Lemonade]
1   Aniket   [Vanilla, Black Currant]    [Thumbs Up, Sprite]
2   Ishita  [Butterscotch, Chocolate]   [Moutain Dew, Fanta]
3   Raghav       [Mango, Choco-chips]       [Mirinda, Maaza]
4  Vinayak     [Kulfi, Kaju-Kishmish]          [7Up, Sprite]

Dataframe after unnesting:
      Name  Favourite Ice-cream  Favourite Soft-Drink
0     Arun           Strawberry             Coca Cola
1     Arun          Choco-chips              Lemonade
2   Aniket              Vanilla             Thumbs Up
3   Aniket        Black Currant                Sprite
4   Ishita         Butterscotch           Moutain Dew
5   Ishita            Chocolate                 Fanta
6   Raghav                Mango               Mirinda
7   Raghav          Choco-chips                 Maaza
8  Vinayak                Kulfi                   7Up
9  Vinayak        Kaju-Kishmish                Sprite
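One caveat worth sketching: when several columns are exploded together, the lists in each row need the same number of elements, otherwise explode raises a ValueError. The frame below is a made-up example in which the second row breaks that rule.

```python
import pandas as pd

# Hypothetical frame: in the second row, 'a' has 1 element but 'b' has 2
df = pd.DataFrame({
    'name': ['x', 'y'],
    'a': [[1, 2], [3]],
    'b': [['p', 'q'], ['r', 's']],
})

try:
    df.explode(['a', 'b'])
    error_raised = False
except ValueError as exc:
    # Row-wise element counts differ, so the columns cannot be paired up
    error_raised = True
    print('explode failed:', exc)
```

If the lists can legitimately differ in length, the columns have to be padded or exploded separately before they are combined.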
Using pandas.Series.explode function
The pandas.Series.explode function splits a Series containing list-like values into multiple rows, one for each element in the list, repeating the index label for every element. Applying it to every column of the data frame, after moving the scalar column into the index, unnests all the list columns at once. In this method, we will see how we can unnest multiple list columns using the pandas.Series.explode function.
Syntax:
df = df.set_index(['column-3']).apply(pd.Series.explode).reset_index()
Here,
- column-3: It is the column that is not nested, i.e., it holds scalar values.
- df: It is the data frame that has those nested columns.
Implementations:
In this example, we have created a dataset, which has three columns, Name, Favourite Ice-cream and Favourite Soft-Drink, out of which Favourite Ice-cream and Favourite Soft-Drink columns are nested. We have unnested those columns using the pandas.Series.explode function.
Python3
# Import the Pandas library
import pandas as pd

# Create a data frame that has nested columns
df = pd.DataFrame({
    'Name': ['Arun', 'Aniket', 'Ishita', 'Raghav', 'Vinayak'],
    'Favourite Ice-cream': [['Strawberry', 'Choco-chips'],
                            ['Vanilla', 'Black Currant'],
                            ['Butterscotch', 'Chocolate'],
                            ['Mango', 'Choco-chips'],
                            ['Kulfi', 'Kaju-Kishmish']],
    'Favourite Soft-Drink': [['Coca Cola', 'Lemonade'],
                             ['Thumbs Up', 'Sprite'],
                             ['Moutain Dew', 'Fanta'],
                             ['Mirinda', 'Maaza'],
                             ['7Up', 'Sprite']]})

# Print the actual data frame
print('Actual dataframe:\n', df)

# Unnest the nested columns
df = df.set_index(['Name']).apply(pd.Series.explode).reset_index()

# Print the unnested data frame
print('\nDataframe after unnesting:\n', df)
Output:
Actual dataframe:
      Name        Favourite Ice-cream   Favourite Soft-Drink
0     Arun  [Strawberry, Choco-chips]  [Coca Cola, Lemonade]
1   Aniket   [Vanilla, Black Currant]    [Thumbs Up, Sprite]
2   Ishita  [Butterscotch, Chocolate]   [Moutain Dew, Fanta]
3   Raghav       [Mango, Choco-chips]       [Mirinda, Maaza]
4  Vinayak     [Kulfi, Kaju-Kishmish]          [7Up, Sprite]

Dataframe after unnesting:
      Name  Favourite Ice-cream  Favourite Soft-Drink
0     Arun           Strawberry             Coca Cola
1     Arun          Choco-chips              Lemonade
2   Aniket              Vanilla             Thumbs Up
3   Aniket        Black Currant                Sprite
4   Ishita         Butterscotch           Moutain Dew
5   Ishita            Chocolate                 Fanta
6   Raghav                Mango               Mirinda
7   Raghav          Choco-chips                 Maaza
8  Vinayak                Kulfi                   7Up
9  Vinayak        Kaju-Kishmish                Sprite
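To see why the scalar column is moved into the index first, it may help to look at a single Series in isolation. In this minimal sketch (the Series and its labels are made up), explode repeats the index label for every list element, which is how each name ends up next to each of its values once the index is reset.

```python
import pandas as pd

# Hypothetical Series of lists, indexed by name
s = pd.Series([['Mango', 'Kulfi'], ['Vanilla']], index=['Raghav', 'Arun'])

# One row per list element; the index label is repeated as needed
exploded = s.explode()
print(exploded)
```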
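Using pandas.Series with lambda function
A lambda function is an anonymous function that can take any number of arguments but can only have one expression. Here, the lambda converts each list in a column into a Series, with one column per list position, and stack() then turns those positions into rows. In this method, we will see how we can unnest multiple list columns using pandas.Series with a lambda function.
Syntax:
df = df.set_index('column-3').apply(lambda x: x.apply(pd.Series).stack()).reset_index().drop(columns='level_1')
Here,
- column-3: It is the column that is not nested, i.e., it holds scalar values.
- df: It is the data frame that has those nested columns.
- level_1: It is the helper index level created by stack(), which holds each element's position within its list. It is dropped at the end using the columns keyword, because Pandas 2.0 and later no longer accept the axis as a positional argument, as in drop('level_1', 1).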
Implementations:
In this example, we have created a dataset, which has three columns, Name, Favourite Ice-cream and Favourite Soft-Drink, out of which Favourite Ice-cream and Favourite Soft-Drink columns are nested. We have unnested those columns using pandas.Series with a lambda function.
Python3
# Import the Pandas library
import pandas as pd

# Create a data frame that has nested columns
df = pd.DataFrame({
    'Name': ['Arun', 'Aniket', 'Ishita', 'Raghav', 'Vinayak'],
    'Favourite Ice-cream': [['Strawberry', 'Choco-chips'],
                            ['Vanilla', 'Black Currant'],
                            ['Butterscotch', 'Chocolate'],
                            ['Mango', 'Choco-chips'],
                            ['Kulfi', 'Kaju-Kishmish']],
    'Favourite Soft-Drink': [['Coca Cola', 'Lemonade'],
                             ['Thumbs Up', 'Sprite'],
                             ['Moutain Dew', 'Fanta'],
                             ['Mirinda', 'Maaza'],
                             ['7Up', 'Sprite']]})

# Print the actual data frame
print('Actual dataframe:\n', df)

# Unnest the nested columns: expand each list into a Series, stack the
# list positions into rows, then drop the helper level that stack() adds
# (columns= is used because positional axis fails on Pandas 2.0+)
df = df.set_index('Name').apply(
    lambda x: x.apply(pd.Series).stack()).reset_index().drop(columns='level_1')

# Print the unnested data frame
print('\nDataframe after unnesting:\n', df)
Output:
Actual dataframe:
      Name        Favourite Ice-cream   Favourite Soft-Drink
0     Arun  [Strawberry, Choco-chips]  [Coca Cola, Lemonade]
1   Aniket   [Vanilla, Black Currant]    [Thumbs Up, Sprite]
2   Ishita  [Butterscotch, Chocolate]   [Moutain Dew, Fanta]
3   Raghav       [Mango, Choco-chips]       [Mirinda, Maaza]
4  Vinayak     [Kulfi, Kaju-Kishmish]          [7Up, Sprite]

Dataframe after unnesting:
      Name  Favourite Ice-cream  Favourite Soft-Drink
0     Arun           Strawberry             Coca Cola
1     Arun          Choco-chips              Lemonade
2   Aniket              Vanilla             Thumbs Up
3   Aniket        Black Currant                Sprite
4   Ishita         Butterscotch           Moutain Dew
5   Ishita            Chocolate                 Fanta
6   Raghav                Mango               Mirinda
7   Raghav          Choco-chips                 Maaza
8  Vinayak                Kulfi                   7Up
9  Vinayak        Kaju-Kishmish                Sprite