Skip to content
geeksforgeeks
  • Tutorials
    • Python
    • Java
    • Data Structures & Algorithms
    • ML & Data Science
    • Interview Corner
    • Programming Languages
    • Web Development
    • CS Subjects
    • DevOps And Linux
    • School Learning
    • Practice Coding Problems
  • Courses
    • DSA to Development
    • Get IBM Certification
    • Newly Launched!
      • Master Django Framework
      • Become AWS Certified
    • For Working Professionals
      • Interview 101: DSA & System Design
      • Data Science Training Program
      • JAVA Backend Development (Live)
      • DevOps Engineering (LIVE)
      • Data Structures & Algorithms in Python
    • For Students
      • Placement Preparation Course
      • Data Science (Live)
      • Data Structure & Algorithm-Self Paced (C++/JAVA)
      • Master Competitive Programming (Live)
      • Full Stack Development with React & Node JS (Live)
    • Full Stack Development
    • Data Science Program
    • All Courses
  • Python – Data visualization
  • Pandas
  • Seaborn
  • Matplotlib
  • Plotly
  • Altair
  • Bokeh
  • Pygal
  • Exploratory Data Analysis
  • Power BI
  • Tableau
  • Data Analysis with Python
  • Python Interview Questions
  • Machine Learning
  • Deep Learning
  • Natural Language Processing
  • Data Science
  • R Programming
Open In App
Next Article:
Detect and Remove the Outliers using Python
Next article icon

Box Plot

Last Updated : 12 Mar, 2024
Comments
Improve
Suggest changes
Like Article
Like
Report

Box Plot is a graphical method to visualize data distribution for gaining insights and making informed decisions. Box plot is a type of chart that depicts a group of numerical data through their quartiles.

In this article, we are going to discuss components of a box plot, how to create a box plot, uses of a Box Plot, and how to compare box plots.

Table of Content

  • What is a Box Plot?
  • How to create a box plots?
  • Uses of a Box Plot
  • How to compare box plots?

What is a Box Plot?

The idea of box plot was presented by John Tukey in 1970. He wrote about it in his book "Exploratory Data Analysis" in 1977. Box plot is also known as a whisker plot, box-and-whisker plot, or simply a box-and whisker diagram. Box plot is a graphical representation of the distribution of a dataset. It displays key summary statistics such as the median, quartiles, and potential outliers in a concise and visual manner. By using Box plot you can provide a summary of the distribution, identify potential and compare different datasets in a compact and visual manner.

Elements of Box Plot

A box plot gives a five-number summary of a set of data which is-

  • Minimum - It is the minimum value in the dataset excluding the outliers.
  • First Quartile (Q1) - 25% of the data lies below the First (lower) Quartile.
  • Median (Q2) - It is the mid-point of the dataset. Half of the values lie below it and half above.
  • Third Quartile (Q3) - 75% of the data lies below the Third (Upper) Quartile.
  • Maximum - It is the maximum value in the dataset excluding the outliers.

Note: The box plot shown in the above diagram is a perfect plot with no skewness. The plots can have skewness and the median might not be at the center of the box.

The area inside the box (50% of the data) is known as the Inter Quartile Range. The IQR is calculated as -

IQR = Q3-Q1 

Outlies are the data points below and above the lower and upper limit. The lower and upper limit is calculated as - 

Lower Limit = Q1 - 1.5*IQR Upper Limit = Q3 + 1.5*IQR 

The values below and above these limits are considered outliers and the minimum and maximum values are calculated from the points which lie under the lower and upper limit.

How to create a box plots?

Let us take a sample data to understand how to create a box plot.

Here are the runs scored by a cricket team in a league of 12 matches - 100, 120, 110, 150, 110, 140, 130, 170, 120, 220, 140, 110.

To draw a box plot for the given data first we need to arrange the data in ascending order and then find the minimum, first quartile, median, third quartile and the maximum.

Ascending Order  100, 110, 110, 110, 120, 120, 130, 140, 140, 150, 170, 220  Median (Q2) = (120+130)/2 = 125; Since there were even values  

To find the First Quartile we take the first six values and find their median.

Q1 = (110+110)/2 = 110 

For the Third Quartile, we take the next six and find their median.

Q3 = (140+150)/2 = 145 

Note: If the total number of values is odd then we exclude the Median while calculating Q1 and Q3. Here since there were two central values we included them. Now, we need to calculate the Inter Quartile Range.

IQR = Q3-Q1 = 145-110 = 35 

We can now calculate the Upper and Lower Limits to find the minimum and maximum values and also the outliers if any.

Lower Limit = Q1-1.5*IQR = 110-1.5*35 = 57.5 Upper Limit = Q3+1.5*IQR = 145+1.5*35 = 197.5   

So, the minimum and maximum between the range [57.5,197.5] for our given data are - 

Minimum = 100 Maximum = 170   

The outliers which are outside this range are - 

Outliers = 220 

Now we have all the information, so we can draw the box plot which is as below-

We can see from the diagram that the Median is not exactly at the center of the box and one whisker is longer than the other. We also have one Outlier.

Use-Cases of Box Plot

  • Box plots provide a visual summary of the data with which we can quickly identify the average value of the data, how dispersed the data is, whether the data is skewed or not (skewness).
  • The Median gives you the average value of the data.
  • Box Plots shows Skewness of the data-
a) If the Median is at the center of the Box and the whiskers are almost the     same on both the ends then the data is Normally Distributed. b) If the Median lies closer to the First Quartile and if the whisker at the lower    end is shorter (as in the above example) then it has a Positive Skew (Right Skew). c) If the Median lies closer to the Third Quartile and if the whisker at the    upper end is shorter than it has a Negative Skew (Left Skew).   
  • The dispersion or spread of data can be visualized by the minimum and maximum values which are found at the end of the whiskers.
  • The Box plot gives us the idea of about the Outliers which are the points which are numerically distant from the rest of the data.

How to compare box plots?

As we have discussed at the beginning of the article that box plots make comparing characteristics of data between categories very easy. Let us have a look at how we can compare different box plots and derive statistical conclusions from them.

Let us take the below two plots as an example: -

  • Compare the Medians — If the median line of a box plot lies outside the box of the other box plot with which it is being compared, then we can say that there is likely to be a difference between the two groups. Here the Median line of the plot B lies outside the box of Plot A.
  • Compare the Dispersion or Spread of data — The Inter Quartile range (length of the box) gives us an idea about how dispersed the data is. Here Plot A has a longer length than Plot B which means that the dispersion of data is more in plot A as compared to plot B. The length of whiskers also gives an idea of the overall spread of data. The extreme values (minimum &maximum) give the range of data distribution. Larger the range more scattered the data. Here Plot A has a larger range than Plot B.
  • Comparing Outliers — The outliers give the idea of unusual data values which are distant from the rest of the data. More number of Outliers means the prediction will be more uncertain. We can be more confident while predicting the values for a plot which has less or no outliers.
  • Compare Skewness — Skewness gives us the direction and the magnitude of the lack of symmetry. We have discussed above how to identify skewness. Here Plot A is Positive or Right Skewed and Plot B is Negative or Left Skewed.

This is all for Box Plots. Now you might have got the idea of Box Plots how to make them and how to derive information from them. For any queries do leave a comment down below.


Next Article
Detect and Remove the Outliers using Python

S

shristikotaiah
Improve
Article Tags :
  • Machine Learning
  • AI-ML-DS
  • Data Visualization
  • ML-EDA
  • ML-plots
Practice Tags :
  • Machine Learning

Similar Reads

    Data Analysis (Analytics) Tutorial
    Data Analytics is a process of examining, cleaning, transforming and interpreting data to discover useful information, draw conclusions and support decision-making. It helps businesses and organizations understand their data better, identify patterns, solve problems and improve overall performance.
    4 min read

    Prerequisites for Data Analysis

    Exploratory Data Analysis (EDA) with NumPy, Pandas, Matplotlib and Seaborn
    Exploratory Data Analysis (EDA) serves as the foundation of any data science project. It is an essential step where data scientists investigate datasets to understand their structure, identify patterns, and uncover insights. Data preparation involves several steps, including cleaning, transforming,
    4 min read
    SQL for Data Analysis
    SQL (Structured Query Language) is a powerful tool for data analysis, allowing users to efficiently query and manipulate data stored in relational databases. Whether you are working with sales, customer or financial data, SQL helps extract insights and perform complex operations like aggregation, fi
    6 min read
    Python | Math operations for Data analysis
    Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.There are some important math operations that can be performed on a pandas series to si
    2 min read
    Python - Data visualization tutorial
    Data visualization is a crucial aspect of data analysis, helping to transform analyzed data into meaningful insights through graphical representations. This comprehensive tutorial will guide you through the fundamentals of data visualization using Python. We'll explore various libraries, including M
    7 min read
    Free Public Data Sets For Analysis
    Data analysis is a crucial aspect of modern decision-making processes across various domains, including business, academia, healthcare, and government. However, obtaining high-quality datasets for analysis can be challenging and costly. Fortunately, there are numerous free public datasets available
    5 min read

    Data Analysis Libraries

    Pandas Tutorial
    Pandas is an open-source software library designed for data manipulation and analysis. It provides data structures like series and DataFrames to easily clean, transform and analyze large datasets and integrates with other Python libraries, such as NumPy and Matplotlib. It offers functions for data t
    6 min read
    NumPy Tutorial - Python Library
    NumPy (short for Numerical Python ) is one of the most fundamental libraries in Python for scientific computing. It provides support for large, multi-dimensional arrays and matrices along with a collection of mathematical functions to operate on arrays.At its core it introduces the ndarray (n-dimens
    3 min read
    Data Analysis with SciPy
    Scipy is a Python library useful for solving many mathematical equations and algorithms. It is designed on the top of Numpy library that gives more extension of finding scientific mathematical formulae like Matrix Rank, Inverse, polynomial equations, LU Decomposition, etc. Using its high-level funct
    6 min read

    Understanding the Data

    What is Data ?
    Data is a word we hear everywhere nowadays. In general, data is a collection of facts, information, and statistics and this can be in various forms such as numbers, text, sound, images, or any other format.In this article, we will learn about What is Data, the Types of Data, Importance of Data, and
    9 min read
    Understanding Data Attribute Types | Qualitative and Quantitative
    When we talk about data mining , we usually discuss knowledge discovery from data. To learn about the data, it is necessary to discuss data objects, data attributes, and types of data attributes. Mining data includes knowing about data, finding relations between data. And for this, we need to discus
    6 min read
    Univariate, Bivariate and Multivariate data and its analysis
    In this article,we will be discussing univariate, bivariate, and multivariate data and their analysis. Univariate data: Univariate data refers to a type of data in which each observation or data point corresponds to a single variable. In other words, it involves the measurement or observation of a s
    5 min read
    Attributes and its Types in Data Analytics
    In this article, we are going to discuss attributes and their various types in data analytics. We will also cover attribute types with the help of examples for better understanding. So let's discuss them one by one. What are Attributes?Attributes are qualities or characteristics that describe an obj
    4 min read

    Loading the Data

    Pandas Read CSV in Python
    CSV files are the Comma Separated Files. It allows users to load tabular data into a DataFrame, which is a powerful structure for data manipulation and analysis. To access data from the CSV file, we require a function read_csv() from Pandas that retrieves data in the form of the data frame. Here’s a
    6 min read
    Export Pandas dataframe to a CSV file
    When working on a Data Science project one of the key tasks is data management which includes data collection, cleaning and storage. Once our data is cleaned and processed it’s essential to save it in a structured format for further analysis or sharing.A CSV (Comma-Separated Values) file is a widely
    2 min read
    Pandas - Parsing JSON Dataset
    JSON (JavaScript Object Notation) is a popular way to store and exchange data especially used in web APIs and configuration files. Pandas provides tools to parse JSON data and convert it into structured DataFrames for analysis. In this guide we will explore various ways to read, manipulate and norma
    2 min read
    Exporting Pandas DataFrame to JSON File
    Pandas a powerful Python library for data manipulation provides the to_json() function to convert a DataFrame into a JSON file and the read_json() function to read a JSON file into a DataFrame.In this article we will explore how to export a Pandas DataFrame to a JSON file with detailed explanations
    2 min read
    Working with Excel files using Pandas
    Excel sheets are very instinctive and user-friendly, which makes them ideal for manipulating large datasets even for less technical folks. If you are looking for places to learn to manipulate and automate stuff in Excel files using Python, look no further. You are at the right place.In this article,
    7 min read

    Data Cleaning

    What is Data Cleaning?
    Data cleaning, also known as data cleansing or data scrubbing, is the process of identifying and correcting (or removing) errors, inconsistencies, and inaccuracies within a dataset. This crucial step in the data management and data science pipeline ensures that the data is accurate, consistent, and
    12 min read
    ML | Overview of Data Cleaning
    Data cleaning is a important step in the machine learning (ML) pipeline as it involves identifying and removing any missing duplicate or irrelevant data. The goal of data cleaning is to ensure that the data is accurate, consistent and free of errors as raw data is often noisy, incomplete and inconsi
    13 min read
    Best Data Cleaning Techniques for Preparing Your Data
    Data cleaning, also known as data cleansing or data scrubbing, is the process of identifying and correcting errors, inconsistencies, and inaccuracies in datasets to improve their quality, accuracy, and reliability for analysis or other applications. It involves several steps aimed at detecting and r
    6 min read

    Handling Missing Data

    Working with Missing Data in Pandas
    In Pandas, missing data occurs when some values are missing or not collected properly and these missing values are represented as:None: A Python object used to represent missing values in object-type arrays.NaN: A special floating-point value from NumPy which is recognized by all systems that use IE
    5 min read
    Drop rows from Pandas dataframe with missing values or NaN in columns
    We are given a Pandas DataFrame that may contain missing values, also known as NaN (Not a Number), in one or more columns. Our task is to remove the rows that have these missing values to ensure cleaner and more accurate data for analysis. For example, if a row contains NaN in any specified column,
    4 min read
    Count NaN or missing values in Pandas DataFrame
    In this article, we will see how to Count NaN or missing values in Pandas DataFrame using isnull() and sum() method of the DataFrame. 1. DataFrame.isnull() MethodDataFrame.isnull() function detect missing values in the given object. It return a boolean same-sized object indicating if the values are
    3 min read
    ML | Handling Missing Values
    Missing values are a common issue in machine learning. This occurs when a particular variable lacks data points, resulting in incomplete information and potentially harming the accuracy and dependability of your models. It is essential to address missing values efficiently to ensure strong and impar
    12 min read
    Working with Missing Data in Pandas
    In Pandas, missing data occurs when some values are missing or not collected properly and these missing values are represented as:None: A Python object used to represent missing values in object-type arrays.NaN: A special floating-point value from NumPy which is recognized by all systems that use IE
    5 min read
    ML | Handle Missing Data with Simple Imputer
    SimpleImputer is a scikit-learn class which is helpful in handling the missing data in the predictive model dataset. It replaces the NaN values with a specified placeholder. It is implemented by the use of the SimpleImputer() method which takes the following arguments : missing_values : The missing_
    2 min read
    How to handle missing values of categorical variables in Python?
    Machine Learning is the field of study that gives computers the capability to learn without being explicitly programmed. Often we come across datasets in which some values are missing from the columns. This causes problems when we apply a machine learning model to the dataset. This increases the cha
    4 min read
    Replacing missing values using Pandas in Python
    Dataset is a collection of attributes and rows. Data set can have missing data that are represented by NA in Python and in this article, we are going to replace missing values in this article We consider this data set: Dataset data set In our data contains missing values in quantity, price, bought,
    2 min read

    Outliers Detection

    Box Plot
    Box Plot is a graphical method to visualize data distribution for gaining insights and making informed decisions. Box plot is a type of chart that depicts a group of numerical data through their quartiles. In this article, we are going to discuss components of a box plot, how to create a box plot, u
    7 min read
    Detect and Remove the Outliers using Python
    Outliers are data points that deviate significantly from other data points in a dataset. They can arise from a variety of factors such as measurement errors, rare events or natural variations in the data. If left unchecked it can distort data analysis, skew statistical results and impact machine lea
    8 min read
    Z score for Outlier Detection - Python
    Z score (or standard score) is an important concept in statistics. It helps to understand if a data value is greater or smaller than the mean and how far away it is from the mean. More specifically, the Z score tells how many standard deviations away a data point is from the mean. Z score = (x -mean
    3 min read
    Clustering-Based approaches for outlier detection in data mining
    Clustering Analysis is the process of dividing a set of data objects into subsets. Each subset is a cluster such that objects are similar to each other. The set of clusters obtained from clustering analysis can be referred to as Clustering. For example: Segregating customers in a Retail market as a
    6 min read

    Exploratory Data Analysis

    What is Exploratory Data Analysis?
    Exploratory Data Analysis (EDA) is a important step in data science as it visualizing data to understand its main features, find patterns and discover how different parts of the data are connected. In this article, we will see more about Exploratory Data Analysis (EDA).Why Exploratory Data Analysis
    8 min read
    EDA - Exploratory Data Analysis in Python
    Exploratory Data Analysis (EDA) is a important step in data analysis which focuses on understanding patterns, trends and relationships through statistical tools and visualizations. Python offers various libraries like pandas, numPy, matplotlib, seaborn and plotly which enables effective exploration
    6 min read

    Time Series Data Analysis

    Time Series Analysis & Visualization in Python
    Time series data consists of sequential data points recorded over time which is used in industries like finance, pharmaceuticals, social media and research. Analyzing and visualizing this data helps us to find trends and seasonal patterns for forecasting and decision-making. In this article, we will
    6 min read
    What is a trend in time series?
    Time series data is a sequence of data points that measure some variable over ordered period of time. It is the fastest-growing category of databases as it is widely used in a variety of industries to understand and forecast data patterns. So while preparing this time series data for modeling it's i
    3 min read
    Basic DateTime Operations in Python
    Python has an in-built module named DateTime to deal with dates and times in numerous ways. In this article, we are going to see basic DateTime operations in Python. There are six main object classes with their respective components in the datetime module mentioned below: datetime.datedatetime.timed
    12 min read
    How to deal with missing values in a Timeseries in Python?
    It is common to come across missing values when working with real-world data. Time series data is different from traditional machine learning datasets because it is collected under varying conditions over time. As a result, different mechanisms can be responsible for missing records at different tim
    9 min read
geeksforgeeks-footer-logo
Corporate & Communications Address:
A-143, 7th Floor, Sovereign Corporate Tower, Sector- 136, Noida, Uttar Pradesh (201305)
Registered Address:
K 061, Tower K, Gulshan Vivante Apartment, Sector 137, Noida, Gautam Buddh Nagar, Uttar Pradesh, 201305
GFG App on Play Store GFG App on App Store
Advertise with us
  • Company
  • About Us
  • Legal
  • Privacy Policy
  • In Media
  • Contact Us
  • Advertise with us
  • GFG Corporate Solution
  • Placement Training Program
  • Languages
  • Python
  • Java
  • C++
  • PHP
  • GoLang
  • SQL
  • R Language
  • Android Tutorial
  • Tutorials Archive
  • DSA
  • Data Structures
  • Algorithms
  • DSA for Beginners
  • Basic DSA Problems
  • DSA Roadmap
  • Top 100 DSA Interview Problems
  • DSA Roadmap by Sandeep Jain
  • All Cheat Sheets
  • Data Science & ML
  • Data Science With Python
  • Data Science For Beginner
  • Machine Learning
  • ML Maths
  • Data Visualisation
  • Pandas
  • NumPy
  • NLP
  • Deep Learning
  • Web Technologies
  • HTML
  • CSS
  • JavaScript
  • TypeScript
  • ReactJS
  • NextJS
  • Bootstrap
  • Web Design
  • Python Tutorial
  • Python Programming Examples
  • Python Projects
  • Python Tkinter
  • Python Web Scraping
  • OpenCV Tutorial
  • Python Interview Question
  • Django
  • Computer Science
  • Operating Systems
  • Computer Network
  • Database Management System
  • Software Engineering
  • Digital Logic Design
  • Engineering Maths
  • Software Development
  • Software Testing
  • DevOps
  • Git
  • Linux
  • AWS
  • Docker
  • Kubernetes
  • Azure
  • GCP
  • DevOps Roadmap
  • System Design
  • High Level Design
  • Low Level Design
  • UML Diagrams
  • Interview Guide
  • Design Patterns
  • OOAD
  • System Design Bootcamp
  • Interview Questions
  • Inteview Preparation
  • Competitive Programming
  • Top DS or Algo for CP
  • Company-Wise Recruitment Process
  • Company-Wise Preparation
  • Aptitude Preparation
  • Puzzles
  • School Subjects
  • Mathematics
  • Physics
  • Chemistry
  • Biology
  • Social Science
  • English Grammar
  • Commerce
  • World GK
  • GeeksforGeeks Videos
  • DSA
  • Python
  • Java
  • C++
  • Web Development
  • Data Science
  • CS Subjects
@GeeksforGeeks, Sanchhaya Education Private Limited, All rights reserved
We use cookies to ensure you have the best browsing experience on our website. By using our site, you acknowledge that you have read and understood our Cookie Policy & Privacy Policy
Lightbox
Improvement
Suggest Changes
Help us improve. Share your suggestions to enhance the article. Contribute your expertise and make a difference in the GeeksforGeeks portal.
geeksforgeeks-suggest-icon
Create Improvement
Enhance the article with your expertise. Contribute to the GeeksforGeeks community and help create better learning resources for all.
geeksforgeeks-improvement-icon
Suggest Changes
min 4 words, max Words Limit:1000

Thank You!

Your suggestions are valuable to us.

What kind of Experience do you want to share?

Interview Experiences
Admission Experiences
Career Journeys
Work Experiences
Campus Experiences
Competitive Exam Experiences