
Scatter Plot Matrix

Last Updated : 23 May, 2024

For a dataset with k variables/columns (X1, X2, ..., Xk), the scatter plot matrix plots all the pairwise scatter plots between the variables in the form of a matrix.

A scatter plot matrix helps answer the following questions:

  • Are there any pair-wise relationships between different variables? And if there are relationships, what is the nature of these relationships?
  • Are there any outliers in the dataset?
  • Is there any clustering by groups present in the dataset on the basis of a particular variable?

For k variables in the dataset, the scatter plot matrix contains k rows and k columns. Each cell of the matrix holds a single scatter plot. Each individual plot (i, j) can be defined as:

  • Vertical Axis: Variable Xj
  • Horizontal Axis: Variable Xi
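
The k-by-k layout described above can be checked with a minimal sketch. It assumes synthetic data with three made-up columns (X1, X2, X3) and a headless matplotlib backend; neither comes from the article. It only shows that `pd.plotting.scatter_matrix` returns one matplotlib Axes per (row, column) pair.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): k = 3 numeric variables
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "X1": rng.normal(size=100),
    "X2": rng.normal(size=100),
    "X3": rng.normal(size=100),
})

# scatter_matrix returns a k x k NumPy array of matplotlib Axes,
# one cell for every pair of variables
axes = pd.plotting.scatter_matrix(df, figsize=(6, 6))

k = df.shape[1]
grid_shape = axes.shape
print(grid_shape)  # (3, 3)
```

In a notebook, the `matplotlib.use("Agg")` line can be dropped so the figure is displayed inline.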

Below are some important factors we consider when plotting the scatter plot matrix:

  • The plot on the diagonal is just a 45-degree line because there we are plotting Xi vs Xi. Instead, we can plot the histogram of Xi on the diagonal or leave it blank.
  • Since Xi vs Xj is equivalent to Xj vs Xi with the axes reversed, we can also omit the plots below the diagonal.
  • Overlaying a fitted line or curve on the scattered points can make the relationship in each plot easier to read.
  • The idea of the pair-wise plot can also be extended to other plots, such as quantile-quantile plots or bihistograms.
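
The first three factors above can be sketched with pandas and matplotlib alone. This is an illustrative sketch, not the article's implementation. The synthetic columns, the headless backend, and the choice of a least-squares straight line as the overlay are all assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic data (an assumption): X2 depends linearly on X1, X3 is independent
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
df = pd.DataFrame({
    "X1": x1,
    "X2": 2.0 * x1 + rng.normal(scale=0.5, size=200),
    "X3": rng.normal(size=200),
})

# Histograms on the diagonal instead of the 45-degree Xi-vs-Xi line
axes = pd.plotting.scatter_matrix(df, diagonal="hist", figsize=(7, 7))

k = df.shape[1]
cols = list(df.columns)
for i in range(k):
    for j in range(k):
        ax = axes[i, j]
        if i > j:
            # Below the diagonal: same pair as the mirrored cell above, so hide it
            ax.set_visible(False)
        elif i < j:
            # Above the diagonal: overlay a least-squares straight line
            x = df[cols[j]].to_numpy()
            y = df[cols[i]].to_numpy()
            slope, intercept = np.polyfit(x, y, deg=1)
            xs = np.linspace(x.min(), x.max(), 50)
            ax.plot(xs, slope * xs + intercept, color="red")

# Count the hidden cells: k * (k - 1) / 2 of them lie below the diagonal
hidden = sum(not axes[i, j].get_visible() for i in range(k) for j in range(k))
```

pandas places the axis labels on the bottom row and left column, so hiding the lower cells also hides most of those labels. A figure meant for readers would probably need them re-enabled on the remaining cells.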

Implementation

  • For this implementation, we will use the Titanic dataset, which can be downloaded from Kaggle. Before plotting the scatter matrix, we will perform some preprocessing operations on the dataframe to get it into the desired form.
```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# In a Jupyter notebook, also run: %matplotlib inline

# load titanic dataset
titanic_dataset = pd.read_csv('tested.csv.xls')
print(titanic_dataset.head())

# Drop some unimportant columns in the dataset.
titanic_dataset.drop(['Name', 'Ticket', 'Cabin', 'PassengerId'],
                     axis=1, inplace=True)

# check for different data types
print(titanic_dataset.dtypes)

# print unique values of the categorical columns
print(titanic_dataset['Embarked'].unique())
print(titanic_dataset['Sex'].unique())

# Replace NAs in the numeric columns with the column mean
# (numeric_only=True skips the string columns, which have no mean)
titanic_dataset.fillna(titanic_dataset.mean(numeric_only=True), inplace=True)

# convert some columns into integer codes for representation in the
# scatter matrix; object columns must be cast to 'category' first
titanic_dataset["Sex"] = titanic_dataset["Sex"].astype("category").cat.codes
titanic_dataset["Embarked"] = titanic_dataset["Embarked"].astype("category").cat.codes

print(titanic_dataset.head())

# plot scatter matrix using pandas and matplotlib
survive_colors = {0: 'orange', 1: 'blue'}
pd.plotting.scatter_matrix(titanic_dataset, figsize=(20, 20), grid=True,
                           marker='o',
                           c=titanic_dataset['Survived'].map(survive_colors))

# plot scatter matrix using seaborn
sns.set_theme(style="ticks")
sns.pairplot(titanic_dataset, hue='Survived')
plt.show()
```
Output (first five rows of the raw dataset):

   PassengerId  Survived  Pclass  Name                                          Sex     Age   SibSp  Parch  Ticket   Fare     Cabin  Embarked
0  892          0         3       Kelly, Mr. James                              male    34.5  0      0      330911   7.8292   NaN    Q
1  893          1         3       Wilkes, Mrs. James (Ellen Needs)              female  47.0  1      0      363272   7.0000   NaN    S
2  894          0         2       Myles, Mr. Thomas Francis                     male    62.0  0      0      240276   9.6875   NaN    Q
3  895          0         3       Wirz, Mr. Albert                              male    27.0  0      0      315154   8.6625   NaN    S
4  896          1         3       Hirvonen, Mrs. Alexander (Helga E Lindqvist)  female  22.0  1      1      3101298  12.2875  NaN    S

Output (data types):

PassengerId      int64
Survived         int64
Pclass           int64
Sex             object
Age            float64
SibSp            int64
Parch            int64
Fare           float64
Embarked        object
dtype: object

Output (first five rows after preprocessing):

   Survived  Pclass  Sex  Age   SibSp  Parch  Fare     Embarked
0  0         3       1    34.5  0      0      7.8292   1
1  1         3       0    47.0  1      0      7.0000   2
2  0         2       1    62.0  0      0      9.6875   1
3  0         3       1    27.0  0      0      8.6625   2
4  1         3       0    22.0  1      1      12.2875  2
Matplotlib Scatter matrix
Seaborn Scatter matrix

References:

  • NIST handbook

pawangfg