Data Visualisation in Python using Matplotlib and Seaborn

Last Updated : 09 Nov, 2022

It may seem easy to scan a set of data points and draw insights from them, but this approach rarely yields reliable results and can leave a lot undiscovered. Moreover, most real-world data sets are too large to analyze manually. This is where data visualization steps in.

Data visualization presents data, however complex, in pictorial form so that trends and relationships among variables are easier to analyze.

The following are the advantages of data visualization:

  • Easier representation of complex data
  • Highlights good and bad performing areas
  • Explores relationships between data points
  • Identifies data patterns even in large data sets

While building a visualization, it is good practice to keep the following points in mind:

  • Use shapes, colors, and sizes appropriately
  • Plots and graphs drawn on a coordinate system convey information more clearly
  • Choosing a plot that suits the data types brings more clarity to the information
  • Labels, titles, legends, and pointers help convey information seamlessly to a wider audience

Python Libraries

Many Python libraries can be used to build visualizations, such as matplotlib, vispy, bokeh, seaborn, pygal, folium, plotly, cufflinks, and networkx. Of these, matplotlib and seaborn are very widely used for basic to intermediate visualizations.

Matplotlib

Matplotlib is a multi-platform data visualization library for 2D plots of arrays, built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in the year 2002. Some of its benefits and features are:

  • It is fast and efficient because it is based on NumPy, and plots are easy to build
  • It has undergone many improvements from the open source community since its inception, and therefore offers advanced features as well
  • Its well-maintained, high-quality graphical output draws a lot of users to it
  • Basic as well as advanced charts can be built very easily
  • Its large community makes resolving issues and debugging much easier

Seaborn

Conceptualized and originally built at Stanford University, this library sits on top of matplotlib. It retains some of matplotlib's flavor while offering a higher-level interface, more polished default styling, and additional features. Its advantages are:

  • Built-in themes aid better visualization
  • Statistical functions aiding better data insights
  • Better aesthetics and built-in plots
  • Helpful documentation with effective examples

Nature of Visualization

Depending on the number and type of variables being plotted, different types of charts can be used to understand the relationships in the data. Based on the count of variables, we could have

  • Univariate plot (involves only one variable)
  • Bivariate plot (involves two variables)

A univariate plot of a continuous variable shows its spread and distribution, while for a discrete variable it shows the count of each value.

Similarly, a bivariate plot of two continuous variables can reveal essential statistics such as correlation, while a plot of a continuous variable against a discrete one shows how the data is distributed across the levels of a categorical variable. A bivariate plot between two discrete variables can also be drawn.

Box plot

A box plot, also known as a box and whisker plot, displays a box and two whiskers, as in the image below. It is a very good visual representation of how data is distributed, because it clearly shows the median, the quartiles, and the outliers. Understanding the data distribution is an important factor in building better models. If the data has outliers, a box plot is a recommended way to identify them so that the necessary action can be taken.

Syntax: seaborn.boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, fliersize=5, linewidth=None, whis=1.5, ax=None, **kwargs)

Parameters: 
x, y, hue: Inputs for plotting long-form data. 
data: Dataset for plotting. If x and y are absent, this is interpreted as wide-form. 
color: Color for all of the elements.

Returns: It returns the Axes object with the plot drawn onto it. 

The box and whiskers chart shows how data is spread out. For a horizontal box plot, five pieces of information are generally included in the chart

  1. The minimum is shown at the far left of the chart, at the end of the left ‘whisker’
  2. The first quartile, Q1, is the left edge of the box
  3. The median is shown as a line inside the box
  4. The third quartile, Q3, is the right edge of the box
  5. The maximum is shown at the far right of the chart, at the end of the right ‘whisker’

When outliers are drawn as separate points, as seaborn does by default, the whiskers end at the most extreme values that are not classed as outliers.
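The quantities listed above can be computed directly. The following is a minimal sketch using NumPy on a small made-up sample (the values are hypothetical). It also applies the 1.5 * IQR rule, which corresponds to the whis=1.5 default in the seaborn.boxplot signature, to flag outliers.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
values = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])

# Quartiles, using NumPy's default linear interpolation
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range, i.e. the length of the box

# Limits beyond which a point is treated as an outlier (1.5 * IQR rule)
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside the limits would be drawn as individual markers
outliers = values[(values < lower_limit) | (values > upper_limit)]

print("min:", values.min(), "Q1:", q1, "median:", median,
      "Q3:", q3, "max:", values.max())
print("outliers:", outliers)
```

Different libraries may interpolate quartiles slightly differently, so the box edges in a plot can differ a little from these numbers on small samples.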

As could be seen in the below representations and charts, a box plot could be plotted for one or more than one variable providing very good insights to our data.

Representation of box plot.

Box plot representing multi-variate categorical variables
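The snippets in this article operate on a DataFrame named diabetes, which is never defined in the text. To run them without the original file, one option is a synthetic stand-in. The sketch below is an assumption, not the author's data: the column names are taken from the snippets, but every value is randomly generated and carries no real meaning.

```python
import numpy as np
import pandas as pd

# Seeded generator so the placeholder data is reproducible
rng = np.random.default_rng(0)
n_rows = 100

# Hypothetical stand-in for the 'diabetes' DataFrame used in the snippets.
# Column names come from the article; the values are random placeholders.
diabetes = pd.DataFrame({
    'BloodPressure': rng.normal(70, 12, n_rows).round(),
    'SkinThickness': rng.normal(20, 8, n_rows).clip(min=0).round(),
    'BMI': rng.normal(32, 6, n_rows).round(1),
    'DiabetesPedigreeFunction': rng.uniform(0.1, 2.0, n_rows).round(3),
    'Outcome': rng.integers(0, 2, n_rows),  # binary class label: 0 or 1
})

print(diabetes.head())
```

With real data, this block would be replaced by whatever call loads the actual file into a DataFrame of the same name.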


Python3




# import required modules
import matplotlib.pyplot as plt
import seaborn as sns
 
# 'diabetes' is assumed to be a pandas DataFrame that has already been
# loaded and contains the columns 'Outcome' and 'BloodPressure'
 
# Box plot and violin plot for Outcome vs BloodPressure
_, axes = plt.subplots(1, 2, sharey=True, figsize=(10, 4))
 
# box plot illustration
sns.boxplot(x='Outcome', y='BloodPressure', data=diabetes, ax=axes[0])
 
# violin plot illustration
sns.violinplot(x='Outcome', y='BloodPressure', data=diabetes, ax=axes[1])
 
 

Output for Box Plot and Violin Plot

Python3




# Box plot for all the numerical variables
sns.set(rc={'figure.figsize': (16, 5)})
 
# multiple box plot illustration
sns.boxplot(data=diabetes.select_dtypes(include='number'))
 
 

Output Multiple Box Plot

Scatter Plot

A scatter plot, or scatter graph, is a bivariate plot that resembles a line graph in the way it is built. A line graph uses a line on an X-Y axis to plot a continuous function, while a scatter plot uses dots to represent individual data points. These plots are very useful for checking whether two variables are correlated. A scatter plot can be two-dimensional or three-dimensional.

Syntax: seaborn.scatterplot(x=None, y=None, hue=None, style=None, size=None, data=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=True, style_order=None, x_bins=None, y_bins=None, units=None, estimator=None, ci=95, n_boot=1000, alpha='auto', x_jitter=None, y_jitter=None, legend='brief', ax=None, **kwargs)
Parameters:
x, y: Input data variables that should be numeric.

data: Dataframe where each column is a variable and each row is an observation.

size: Grouping variable that will produce points with different sizes.

style: Grouping variable that will produce points with different markers.  

palette: Colors to use for the different levels of the hue variable.  

markers: Object determining how to draw the markers for different levels.

alpha: Proportional opacity of the points.

Returns: This method returns the Axes object with the plot drawn onto it.

Advantages of a scatter plot

  • Displays correlation between variables
  • Suitable for large data sets
  • Easier to find data clusters
  • Better representation of each data point

 

Python3




# import module
import matplotlib.pyplot as plt
 
# scatter plot illustration
plt.scatter(diabetes['DiabetesPedigreeFunction'], diabetes['BMI'])
 
 

Output 2D Scatter Plot

Python3




# import required modules
import matplotlib.pyplot as plt
import seaborn as sns
# registers the 3D projection on older matplotlib versions
from mpl_toolkits.mplot3d import Axes3D
 
# assign axis values
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [5, 6, 2, 3, 13, 4, 1, 2, 4, 8]
z = [2, 3, 3, 3, 5, 7, 9, 11, 9, 10]
 
# adjust size of plot
sns.set(rc={'figure.figsize': (8, 5)})
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, c='r', marker='o')
 
# assign labels
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
 
# display illustration
plt.show()
 
 

Output 3D Scatter Plot

Histogram

Histograms display counts of data and are therefore similar to bar charts. A histogram can also show how close a data distribution is to a normal curve, which matters because many statistical methods assume data that is normally, or close to normally, distributed. However, histograms are univariate in nature, while bar charts are bivariate.

A bar graph charts actual counts against categories, e.g. the height of a bar indicates the number of items in that category, whereas a histogram groups the values of a continuous variable into bins and shows the count in each bin.

Bins are an integral part of building a histogram: each bin covers a range of values and collects the data points that fall within it. A commonly used choice is somewhere between 5 and 20 bins, although the right number depends entirely on the data at hand.
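To make the role of bins concrete, the sketch below counts values per bin with NumPy's histogram function, which performs the same kind of grouping a histogram plot displays. The readings are made-up values and the choice of 4 bins is arbitrary, both purely for illustration.

```python
import numpy as np

# Hypothetical measurements, for illustration only
readings = np.array([62, 64, 66, 70, 72, 72, 74, 77, 80, 82, 88, 90])

# Split the observed range into 4 equal-width bins and count per bin
counts, edges = np.histogram(readings, bins=4)

# Report each bin's range together with the number of values in it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count}")
```

Changing bins to a larger or smaller number shows how the same data can look smoother or more jagged, which is why the bin count deserves some experimentation.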

Python3




# illustrate histogram
features = ['BloodPressure', 'SkinThickness']
diabetes[features].hist(figsize=(10, 4))
 
 

Output Histogram

Countplot

A countplot is a plot between a categorical variable and a continuous variable, where the continuous variable is the number of times each category occurs, i.e. its frequency. In that sense, a count plot is closely related to a histogram or a bar graph.

Syntax : seaborn.countplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, dodge=True, ax=None, **kwargs)
Parameters: This method accepts the following parameters:
 

  • x, y: (optional) Names of variables in data, or vector data. Inputs for plotting long-form data.
  • hue: (optional) Column name for colour encoding.
  • data: (optional) DataFrame, array, or list of arrays. Dataset for plotting. If x and y are absent, this is interpreted as wide-form. Otherwise it is expected to be long-form.
  • order, hue_order: (optional) Lists of strings. Order to plot the categorical levels in; otherwise the levels are inferred from the data objects.
  • orient: (optional) “v” | “h”. Orientation of the plot (vertical or horizontal). This is usually inferred from the dtype of the input variables but can be used to specify when the “categorical” variable is numeric or when plotting wide-form data.
  • color: (optional) Matplotlib color. Color for all of the elements, or seed for a gradient palette.
  • palette: (optional) Palette name, list, or dict. Colors to use for the different levels of the hue variable. Should be something that can be interpreted by color_palette(), or a dictionary mapping hue levels to matplotlib colors.
  • saturation: (optional) Float. Proportion of the original saturation to draw colors at. Large patches often look better with slightly desaturated colors, but set this to 1 if you want the plot colors to perfectly match the input color spec.
  • dodge: (optional) Bool. When hue nesting is used, whether elements should be shifted along the categorical axis.
  • ax: (optional) Matplotlib Axes. Axes object to draw the plot onto; otherwise uses the current Axes.
  • kwargs: Key, value mappings. Other keyword arguments are passed through to matplotlib.axes.Axes.bar().

Returns: Returns the Axes object with the plot drawn onto it.

It simply shows the number of occurrences of an item within each category. In Python, we can create a count plot using the seaborn library. Seaborn is a Python module built on top of matplotlib and used for visually appealing statistical plots.
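The bar heights in a count plot are just per-category frequencies. As a minimal sketch, the standard library's Counter computes the same numbers. The outcome labels below are hypothetical values made up for illustration.

```python
from collections import Counter

# Hypothetical binary outcome labels, for illustration only
outcomes = [0, 1, 0, 0, 1, 0, 1, 0]

# Frequency of each category, i.e. the height of each bar in a count plot
counts = Counter(outcomes)

for level in sorted(counts):
    print(f"category {level}: {counts[level]} occurrences")
```

Checking these frequencies first is a quick way to spot class imbalance before drawing the plot.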

Python3




# import required modules
import matplotlib.pyplot as plt
import seaborn as sns
 
# assign required values
_, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))
 
# illustrate count plots
sns.countplot(x='Outcome', data=diabetes, ax=axes[0])
sns.countplot(x='BloodPressure', data=diabetes, ax=axes[1])
 
 

Output Countplot

Correlation plot

A correlation plot is a multi-variate analysis that is very handy for examining relationships between variables. While a scatter plot helps to understand how one variable relates to another, a correlation plot summarizes many such pairwise relationships at once. Correlation measures how strongly two variables move together; it does not by itself show that one variable causes the other.

Correlation can be calculated between two variables, or one variable can be compared against many, as in the plot below. Correlation can be positive, negative, or neutral, and its mathematical range is from -1 to 1. Understanding the correlations can have a very significant effect on the model building stage and on interpreting the model outputs.
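The three cases mentioned above (positive, negative, and neutral) can be illustrated with a minimal sketch that computes the Pearson correlation coefficient using NumPy. The arrays are made-up values constructed so that each case is unambiguous, and the helper name pearson is hypothetical.

```python
import numpy as np

# Hypothetical data, constructed for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_positive = 2 * x + 1                              # rises exactly with x
y_negative = -3 * x + 10                            # falls exactly with x
y_unrelated = np.array([2.0, 5.0, 1.0, 5.0, 2.0])   # no linear trend in x


def pearson(a, b):
    """Return the Pearson correlation coefficient of two 1-D arrays."""
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
    return float(np.corrcoef(a, b)[0, 1])


r_positive = pearson(x, y_positive)
r_negative = pearson(x, y_negative)
r_unrelated = pearson(x, y_unrelated)

print(r_positive, r_negative, r_unrelated)
```

Real data rarely produces values this extreme; coefficients closer to 0 indicate a weaker linear relationship.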

Python3




# Finding and plotting the correlation between
# the numerical variables
 
# import required module
import seaborn as sns
 
# adjust plot
sns.set(rc={'figure.figsize': (14, 5)})
 
# illustrate heat map of the pairwise correlations
# ('diabetes' is assumed to be an already loaded pandas DataFrame)
sns.heatmap(diabetes.select_dtypes(include='number').corr(),
            cmap=sns.cubehelix_palette(20, light=0.95, dark=0.15))
 
 

Output Correlation Plot

Heat Maps

A heat map is a multi-variate data representation. The color intensity of each cell is the key to understanding the magnitude of the underlying values. Heat maps are easy to understand and easy to explain. When it comes to data analysis using visualization, it's very important that the desired message gets conveyed with the help of plots.

Syntax: 

seaborn.heatmap(data, *, vmin=None, vmax=None, cmap=None, center=None, robust=False, annot=None, fmt='.2g', annot_kws=None, linewidths=0, linecolor='white', cbar=True, cbar_kws=None, cbar_ax=None, square=False, xticklabels='auto', yticklabels='auto', mask=None, ax=None, **kwargs)

Parameters: This method accepts the following parameters:
 

  • data: Rectangular (2D) dataset that can be coerced into an ndarray. If a pandas DataFrame is provided, its index and column information is used to label the rows and columns.
  • vmin, vmax: (optional) Values to anchor the colormap; otherwise they are inferred from the data.
  • cmap: (optional) Matplotlib colormap name or object, or list of colors, used to map data values to colors.
  • center: (optional) The value at which to center the colormap when plotting divergent data.
  • annot: (optional) If True, write the data value in each cell.
  • fmt: (optional) String formatting code to use when adding annotations.
  • linewidths, linecolor: (optional) Width and color of the lines that divide the cells.
  • cbar: (optional) Whether to draw a colorbar.
  • xticklabels, yticklabels: (optional) Tick labels for the columns and rows. If an integer, every n-th label is plotted; if False, no labels are plotted.
  • mask: (optional) If passed, data will not be shown in cells where mask is True.
  • ax: (optional) Matplotlib Axes to draw the plot onto; otherwise uses the current Axes.
  • kwargs: Other keyword arguments are passed through to matplotlib.axes.Axes.pcolormesh().

Returns: Returns the Axes object with the plot drawn onto it.

Python3




# import required module
import seaborn as sns
import numpy as np
 
# assign data
data = np.random.randn(50, 20)
 
# illustrate heat map
ax = sns.heatmap(data, xticklabels=2, yticklabels=False)
 
 

Output Heat Map

Pie Chart

A pie chart is a univariate analysis, typically used to show percentage or proportional data. The percentage share of each class in a variable is shown next to the corresponding slice of the pie. In Python, pie charts are built with matplotlib's pyplot.pie function; seaborn does not provide a pie chart function of its own, although its color palettes can be passed to matplotlib.

Syntax: matplotlib.pyplot.pie(data, explode=None, labels=None, colors=None, autopct=None, shadow=False)

Parameters:
data represents the array of data values to be plotted; the fractional area of each slice is data/sum(data). In recent matplotlib versions the values are always normalized this way unless normalize=False is passed; in that case, if sum(data) < 1, the values are used as the fractional areas directly and the resulting pie has an empty wedge of size 1-sum(data).
explode is an optional sequence that sets how far each wedge is offset from the center.
labels is a sequence of strings that sets the label of each wedge.
colors is a sequence of colors used for the wedges.
autopct is a format string or function used to label each wedge with its numerical value.
shadow is used to draw a shadow beneath the pie.
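The data/sum(data) rule can be checked by hand. The sketch below reuses the cars values from this article's pie chart example and computes each slice's share in plain Python, without drawing anything.

```python
# Values taken from the pie chart example in this article
cars = ['AUDI', 'BMW', 'FORD', 'TESLA', 'JAGUAR', 'MERCEDES']
data = [23, 17, 35, 29, 12, 41]

# Each slice's fractional area is its value divided by the total
total = sum(data)
fractions = [value / total for value in data]

# Show each share as a percentage, similar to what autopct displays
for name, fraction in zip(cars, fractions):
    print(f"{name}: {fraction:.1%}")

# The largest share identifies the biggest wedge
largest = cars[fractions.index(max(fractions))]
print("largest slice:", largest)
```

Because the fractions always sum to 1, a pie chart only makes sense when the categories together form a meaningful whole.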

Below are the advantages of a pie chart

  • Easier visual summarization of large data points
  • Effect and size of different classes can be easily understood
  • Percentage points are used to represent the classes in the data points

Python3




# import required module
import matplotlib.pyplot as plt
 
# Creating dataset
cars = ['AUDI', 'BMW', 'FORD', 'TESLA', 'JAGUAR', 'MERCEDES']
data = [23, 17, 35, 29, 12, 41]
 
# Creating plot
fig = plt.figure(figsize=(10, 7))
plt.pie(data, labels=cars)
 
# Show plot
plt.show()
 
 

Output Pie Chart

Python3




# Import required module
import matplotlib.pyplot as plt
import numpy as np
 
# Creating dataset
cars = ['AUDI', 'BMW', 'FORD', 'TESLA', 'JAGUAR', 'MERCEDES']
data = [23, 17, 35, 29, 12, 41]
 
# Creating explode data
explode = (0.1, 0.0, 0.2, 0.3, 0.0, 0.0)
 
# Creating color parameters
colors = ("orange", "cyan", "brown", "grey", "indigo", "beige")
 
# Wedge properties
wp = {'linewidth': 1, 'edgecolor': "green"}
 
# Creating autopct arguments
def func(pct, allvalues):
    absolute = int(pct / 100.*np.sum(allvalues))
    return "{:.1f}%\n({:d} g)".format(pct, absolute)
 
# Creating plot
fig, ax = plt.subplots(figsize=(10, 7))
wedges, texts, autotexts = ax.pie(data, autopct=lambda pct: func(pct, data), explode=explode, labels=cars,
                                  shadow=True, colors=colors, startangle=90, wedgeprops=wp,
                                  textprops=dict(color="magenta"))
 
# Adding legend
ax.legend(wedges, cars, title="Cars", loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))
plt.setp(autotexts, size=8, weight="bold")
ax.set_title("Customizing pie chart")
 
# Show plot
plt.show()
 
 

Output

Error Bars

Error bars could be defined as a line through a point on a graph, parallel to one of the axes, which represents the uncertainty or error of the corresponding coordinate of the point. These types of plots are very handy to understand and analyze the deviations from the target. Once errors are identified, it could easily lead to deeper analysis of the factors causing them.

  • Deviations of data points from a threshold can be captured easily
  • Deviations remain easy to spot even across a large set of data points
  • It conveys the variability of the underlying data
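In practice, the lengths of error bars are usually derived from repeated measurements rather than drawn at random. The following is a minimal sketch, assuming a made-up table of three conditions with four repeats each, that computes the mean and the standard error of the mean for every condition. The standard error is only one common choice; standard deviation or confidence intervals are alternatives.

```python
import numpy as np

# Hypothetical repeated measurements: 3 conditions (rows) x 4 repeats (columns)
measurements = np.array([
    [9.8, 10.1, 10.0, 10.1],
    [4.9, 5.2, 5.1, 4.8],
    [2.1, 1.9, 2.0, 2.0],
])

# Central value plotted for each condition
means = measurements.mean(axis=1)

# Sample standard deviation (ddof=1) and standard error of the mean
std = measurements.std(axis=1, ddof=1)
sem = std / np.sqrt(measurements.shape[1])

for mean, err in zip(means, sem):
    print(f"{mean:.2f} +/- {err:.3f}")
```

The resulting arrays can be passed to matplotlib as ax.errorbar(x, means, yerr=sem) to draw one vertical bar per condition.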

Python3




# Import required module
import matplotlib.pyplot as plt
import numpy as np
 
# Assign axes
x = np.linspace(0,5.5,10)
y = 10*np.exp(-x)
 
# Assign errors regarding each axis
xerr = np.random.random_sample(10)
yerr = np.random.random_sample(10)
 
# Adjust plot
fig, ax = plt.subplots()
ax.errorbar(x, y, xerr=xerr, yerr=yerr, fmt='-o')
 
# Assign labels
ax.set_xlabel('x-axis')
ax.set_ylabel('y-axis')
ax.set_title('Line plot with error bars')
 
# Illustrate error bars
plt.show()
 
 

Output Error Plot



author
digitarun27
Improve
Article Tags :
  • AI-ML-DS
  • Data Visualization
  • AI-ML-DS With Python
  • Python Data Visualization
  • Python-matplotlib
  • Python-Seaborn

Similar Reads

  • SQL for Data Science
    Mastering SQL (Structured Query Language) has become a fundamental skill for anyone pursuing a career in data science. As data plays an increasingly central role in business and technology, SQL has emerged as the most essential tool for managing and analyzing large datasets. Data scientists rely on
    7 min read
  • Introduction to SQL

    • What is SQL?
      SQL stands for Structured Query Language. It is a standardized programming language used to manage and manipulate relational databases. It enables users to perform a variety of tasks such as querying data, creating and modifying database structures, and managing access permissions. SQL is widely use
      11 min read

    • Difference Between RDBMS and DBMS
      Database Management System (DBMS) is a software that is used to define, create, and maintain a database and provides controlled access to the data. Why is DBMS Required?Database management system, as the name suggests, is a management system that is used to manage the entire flow of data, i.e, the i
      4 min read

    • Difference between SQL and NoSQL
      Choosing between SQL (Structured Query Language) and NoSQL (Not Only SQL) databases is a critical decision for developers, data engineers, and organizations looking to handle large datasets effectively. Both database types have their strengths and weaknesses, and understanding the key differences ca
      6 min read

    • SQL Data Types
      SQL Data Types are very important in relational databases. It ensures that data is stored efficiently and accurately. Data types define the type of value a column can hold, such as numbers, text, or dates. Understanding SQL Data Types is critical for database administrators, developers, and data ana
      6 min read

    • SQL | DDL, DML, TCL and DCL
      Data Definition Language (DDL), Data Manipulation Language (DML), Transaction Control Language (TCL), and Data Control Language (DCL) form the backbone of SQL. Each of these languages plays a critical role in defining, managing, and controlling data within a database system, ensuring both structural
      6 min read

    Setting Up the Environment

    • Install PostgreSQL on Windows
      Installing PostgreSQL on your Windows 10 machine is straightforward with the PostgreSQL installer. In this article, we'll walk you through installing PostgreSQL version 11.3, ensuring a smooth setup process. Steps to Install PostgreSQL on WindowsThere are three crucial steps for the installation of
      2 min read

    • How to Install SQL Server Client on Windows?
      The Client / Server Application is a computer program that allows users to access what is stored on the server. Of course, both computers can be workstations running the same type of operating system. In most network environments, the server contains a database that requires users to access this dat
      2 min read

    • How to Create a Database Connection?
      Java Database Connectivity is a standard API or we can say an application interface present between the Java programming language and the various databases like Oracle, SQL, PostgreSQL, MongoDB, etc. It basically connects the front end(for interacting with the users) with the backend for storing dat
      5 min read

    SQL Basics

    • Relational Model in DBMS
      The Relational Model represents data and their relationships through a collection of tables. Each table also known as a relation consists of rows and columns. Every column has a unique name and corresponds to a specific attribute, while each row contains a set of related data values representing a r
      11 min read

    • SQL SELECT Query
      The select query in SQL is one of the most commonly used SQL commands to retrieve data from a database. With the select command in SQL, users can access data and retrieve specific records based on various conditions, making it an essential tool for managing and analyzing data. In this article, we’ll
      4 min read

    • SQL Data Types
      SQL Data Types are very important in relational databases. It ensures that data is stored efficiently and accurately. Data types define the type of value a column can hold, such as numbers, text, or dates. Understanding SQL Data Types is critical for database administrators, developers, and data ana
      6 min read

    • SQL | WITH Clause
      SQL queries can sometimes be complex, especially when you need to deal with multiple nested subqueries, aggregations, and joins. This is where the SQL WITH clause also known as Common Table Expressions (CTEs) comes in to make life easier. The WITH Clause is a powerful tool that simplifies complex SQ
      6 min read

    • SQL | GROUP BY
      The SQL GROUP BY clause is a powerful tool used to organize data into groups based on shared values in one or more columns. It's most often used with aggregate functions like SUM, COUNT, AVG, MIN, and MAX to perform summary operations on each group helping us extract meaningful insights from large d
      5 min read

    • PHP | MySQL LIMIT Clause
      In MySQL the LIMIT clause is used with the SELECT statement to restrict the number of rows in the result set. The Limit Clause accepts one or two arguments which are offset and count.The value of both the parameters can be zero or positive integers. Offset:It is used to specify the offset of the fir
      3 min read

    • SQL LIMIT Clause
      The LIMIT clause in SQL is used to control the number of rows returned in a query result. It is particularly useful when working with large datasets, allowing us to retrieve only the required number of rows for analysis or display. Whether we're looking to paginate results, find top records, or just
      5 min read

    • SQL Distinct Clause
      The SQL DISTINCT keyword is used in queries to retrieve unique values from a database. It helps in eliminating duplicate records from the result set. It ensures that only unique entries are fetched. Whether you're analyzing datasets or performing data cleaning, the DISTINCT keyword is Important for
      4 min read

    SQL Operators

    • SQL Comparison Operators
      SQL Comparison Operators are used to compare two values and check if they meet the specific criteria. Some comparison operators are = Equal to, > Greater than , < Less than, etc. Comparison Operators in SQLThe below table shows all comparison operators in SQL : OperatorDescription=The SQL Equa
      3 min read

    • SQL - Logical Operators
      SQL Logical Operators are essential tools used to test the truth of conditions in SQL queries. They return boolean values such as TRUE, FALSE, or UNKNOWN, making them invaluable for filtering, retrieving, or manipulating data. These operators allow developers to build complex queries by combining, n
      9 min read

    • SQL | Arithmetic Operators
      Prerequisite: Basic Select statement, Insert into clause, Sql Create Clause, SQL Aliases We can use various Arithmetic Operators on the data stored in the tables. Arithmetic Operators are: + [Addition] - [Subtraction] / [Division] * [Multiplication] % [Modulus] Addition (+) : It is used to perform a
      5 min read

    • SQL | String functions
      SQL String Functions are powerful tools that allow us to manipulate, format, and extract specific parts of text data in our database. These functions are essential for tasks like cleaning up data, comparing strings, and combining text fields. Whether we're working with names, addresses, or any form
      8 min read

    • SQL Wildcard Characters
      SQL wildcard characters are powerful tools that enable advanced pattern matching in string data. They are especially useful when working with the LIKE and NOT LIKE operators, allowing for efficient searches based on partial matches or specific patterns. By using SQL wildcard characters, we can great
      6 min read

    • SQL AND and OR Operators
      The SQL AND and OR operators are used to filter data based on multiple conditions. These logical operators allow users to retrieve precise results from a database by combining various conditions in SELECT, INSERT, UPDATE, and DELETE statements. In this article, we'll learn the AND and OR operators,
      3 min read

    • SQL | Concatenation Operator
      The SQL concatenation operator (||) is a powerful feature that allows us to merge two or more strings into a single output. It is widely used to link columns, character strings, and literals in SQL queries. This operator makes it easier to format and present data in a user-friendly way, combining mu
      3 min read

    • SQL | MINUS Operator
      The Minus Operator in SQL is used with two SELECT statements. The MINUS operator is used to subtract the result set obtained by first SELECT query from the result set obtained by second SELECT query. In simple words, we can say that MINUS operator will return only those rows which are unique in only
      2 min read

    • SQL | DIVISION
      Division in SQL is typically required when you want to find out entities that are interacting with all entities of a set of different types of entities. The division operator is used when we have to evaluate queries that contain the keyword 'all'. When to Use the Division OperatorYou typically requi
      4 min read

    • SQL NOT Operator
      The SQL NOT Operator is a logical operator used to negate or reverse the result of a condition in SQL queries. It is commonly used with the WHERE clause to filter records that do not meet a specified condition, helping you exclude certain values from your results. In this article, we will learn ever
      3 min read

    • SQL | BETWEEN & IN Operator
      In SQL, the BETWEEN and IN operators are widely used for filtering data based on specific criteria. The BETWEEN operator helps filter results within a specified range of values, such as numbers, dates, or text, while the IN operator filters results based on a specific list of values. Both operators
      5 min read

    Working with Data

    • SQL | WHERE Clause
      The SQL WHERE clause allows to filtering of records in queries. Whether you're retrieving data, updating records, or deleting entries from a database, the WHERE clause plays an important role in defining which rows will be affected by the query. Without it, SQL queries would return all rows in a tab
      4 min read

    • SQL ORDER BY
      The ORDER BY clause in SQL is a powerful feature used to sort query results in either ascending or descending order based on one or more columns. Whether you're presenting data to users or analyzing large datasets, sorting the results in a structured way is essential. In this article, we’ll explain
      5 min read

    • SQL INSERT INTO Statement
      The SQL INSERT INTO statement is one of the most commonly used commands for adding new data into a table in a database. Whether you're working with customer data, products, or user details, mastering this command is crucial for efficient database management. Let’s break down how this command works,
      6 min read

    • SQL UPDATE Statement
      In SQL, the UPDATE statement is used to modify existing records in a table. Whether you are updating a single record or multiple records at once, SQL provides the necessary functionality to make these changes. Whether you are working with a small dataset or handling large-scale databases, the UPDATE
      6 min read

    • SQL DELETE Statement
      The SQL DELETE statement is one of the most commonly used commands in SQL (Structured Query Language). It allows you to remove one or more rows from the table depending on the situation. Unlike the DROP statement, which removes the entire table, the DELETE statement removes data (rows) from the tabl
      4 min read

    • SQL Data Types
      SQL Data Types are very important in relational databases. They ensure that data is stored efficiently and accurately. Data types define the type of value a column can hold, such as numbers, text, or dates. Understanding SQL Data Types is critical for database administrators, developers, and data ana
      6 min read

    • ALTER (RENAME) in SQL
      In SQL, making structural changes to a database is often necessary. Whether it's renaming a table or a column, adding new columns, or modifying data types, the SQL ALTER TABLE command plays a critical role. This command provides flexibility to manage and adjust database schemas without affecting the
      5 min read

    • SQL ALTER TABLE
      The SQL ALTER TABLE statement is a powerful tool that allows you to modify the structure of an existing table in a database. Whether you're adding new columns, modifying existing ones, deleting columns, or renaming them, the ALTER TABLE statement enables you to make changes without losing the data s
      5 min read

    SQL Queries

    • SQL | Subquery
      In SQL, subqueries are one of the most powerful and flexible tools for writing efficient queries. A subquery is essentially a query nested within another query, allowing users to perform operations that depend on the results of another query. This makes it invaluable for tasks such as filtering, cal
      6 min read

    • Nested Queries in SQL
      Nested queries, also known as subqueries, are an essential tool in SQL for performing complex data retrieval tasks. They allow us to embed one query within another, enabling us to filter, aggregate, and perform sophisticated calculations. Whether we're handling large datasets or performing advanced
      8 min read

    • Joining Three or More Tables in SQL
      SQL joins are an essential part of relational database management, allowing users to combine data from multiple tables efficiently. When the required data is spread across different tables, joining these tables efficiently is necessary. In this article, we’ll cover everything we need to know about j
      5 min read

    • Inner Join vs Outer Join
      Inner Join and Outer Join are two types of join. An inner join returns the rows common to the two tables, whereas an outer join returns the inner join's rows plus the rows that have no match. Let's discuss both of them in detail in this artic
      9 min read

    • SQL | Join (Cartesian Join & Self Join)
      In SQL, CARTESIAN JOIN (also known as CROSS JOIN) and SELF JOIN are two distinct types of joins that help combine rows from one or more tables based on certain conditions. While both joins may seem similar, they serve different purposes. Let’s explore both in detail. CARTESIAN JOIN: A Cartesian Join o
      4 min read

    • How to Get the Names of the Table in SQL
      Retrieving table names in SQL is a common task that aids in effective database management and exploration. Whether we are dealing with a single database or multiple databases, knowing how to retrieve table names helps streamline operations. SQL provides the INFORMATION_SCHEMA.TABLES view, which offe
      3 min read

    • How to Fetch Duplicate Rows in a Table?
      Identifying duplicate rows in a database table is a common requirement, especially when dealing with large datasets. Duplicates can arise due to data entry errors, system migrations, or batch processing issues. In this article, we will explain efficient SQL techniques to identify and retrieve duplic
      3 min read

    Data Manipulation

    • SQL Joins (Inner, Left, Right and Full Join)
      SQL joins are fundamental tools for combining data from multiple tables in relational databases. Joins allow efficient data retrieval, which is essential for generating meaningful observations and solving complex business queries. Understanding SQL join types, such as INNER JOIN, LEFT JOIN, RIGHT JO
      6 min read

    • SQL Inner Join
      SQL INNER JOIN is a powerful and frequently used operation in relational databases. It allows us to combine two or more tables based on a related column, returning only the records that satisfy the join condition. This article will explore the fundamentals of INNER JOIN, its syntax, practical example
      4 min read

    • SQL Outer Join
      SQL Outer Joins allow retrieval of rows from two or more tables based on a related column. Unlike inner Joins, they also include rows that do not have a corresponding match in one or both of the tables. This capability makes Outer Joins extremely useful for comprehensive data analysis and reporting,
      4 min read

    • SQL Self Join
      A Self Join in SQL is a powerful technique that allows one to join a table with itself. This operation is helpful when you need to compare rows within the same table based on specific conditions. A Self Join is often used in scenarios where there is hierarchical or relational data within the same ta
      4 min read

    • How to Group and Aggregate Data Using SQL?
      In SQL, grouping and aggregating data are essential techniques for analyzing datasets. When dealing with large volumes of data, we often need to summarize or categorize it into meaningful groups. The combination of the GROUP BY clause and aggregate functions like COUNT(), SUM(), AVG(), MIN(), and MA
      4 min read

    • SQL HAVING Clause with Examples
      The HAVING clause in SQL is used to filter query results based on aggregate functions. Unlike the WHERE clause, which filters individual rows before grouping, the HAVING clause filters groups of data after aggregation. It is commonly used with functions like SUM(), AVG(), COUNT(), MAX(), and MIN().
      4 min read

    Data Analysis

    • CTE in SQL
      In SQL, a Common Table Expression (CTE) is an essential tool for simplifying complex queries and making them more readable. By defining temporary result sets that can be referenced multiple times, a CTE in SQL allows developers to break down complicated logic into manageable parts. CTEs help with hi
      6 min read

    • Window Functions in SQL
      SQL window functions are essential for advanced data analysis and database management. They enable calculations across a specific set of rows, known as a "window," while retaining the individual rows in the dataset. Unlike traditional aggregate functions that summarize data for the entire group, win
      7 min read

    • Pivot and Unpivot in SQL
      In SQL, PIVOT and UNPIVOT are powerful operations used to transform data and make it more readable, efficient, and manageable. These operations allow us to manipulate tables by switching between rows and columns, which can be crucial for summarizing data, reporting, and data analysis. Understanding
      4 min read

    • Data Preprocessing in Data Mining
      Data preprocessing is the process of preparing raw data for analysis by cleaning and transforming it into a usable format. In data mining it refers to preparing raw data for mining by performing tasks like cleaning, transforming, and organizing it into a format suitable for mining algorithms. Goal i
      6 min read

    • SQL Functions (Aggregate and Scalar Functions)
      SQL Functions are built-in programs that are used to perform different operations on the database. There are two types of functions in SQL: Aggregate Functions and Scalar Functions. SQL Aggregate Functions: SQL Aggregate Functions operate on a data group and return a singular output. They are mostly used wit
      4 min read

    • MySQL Date and Time Functions
      Handling date and time data in MySQL is essential for many database operations, especially when it comes to handling timestamps, scheduling tasks, or generating time-based reports. MySQL provides a variety of date and time functions that help users work with date values, perform calculations, and format the
      6 min read

    • SQL | Date Functions (Set-1)
      SQL Date Functions are essential for managing and manipulating date and time values in SQL databases. They provide tools to perform operations such as calculating date differences, retrieving current dates and times and formatting dates. From tracking sales trends to calculating project deadlines, w
      5 min read

    • SQL | Date Functions (Set-2)
      SQL Date Functions are powerful tools that allow users to manipulate, extract, and format date and time values within SQL databases. These functions simplify handling temporal data, making them indispensable for tasks like calculating intervals, extracting year or month values, and formatting dates
      5 min read

    • SQL | Numeric Functions
      SQL Numeric Functions are essential tools for performing mathematical and arithmetic operations on numeric data. These functions allow you to manipulate numbers, perform calculations, and aggregate data for reporting and analysis purposes. Understanding how to use SQL numeric functions is important
      4 min read

    • SQL Aggregate functions
      SQL Aggregate Functions are used to perform calculations on a set of rows and return a single value. These functions are particularly useful when we need to summarize, analyze, or group large datasets in SQL databases. Whether you're working with sales data, employee records, or product inventories,
      4 min read

    Data Visualization

    • What is Data Visualization and Why is It Important?
      Data visualization is the graphical representation of information. In this guide, we will study what data visualization is and why it is important, with use cases. Understanding Data Visualization: Data visualization translates complex data sets into visual formats that are easier for the human brain to unde
      4 min read

    • Export SQL Server Data From Table to CSV File
      SQL Server is a very popular relational database because of its versatility in exporting data in Excel, CSV, and JSON formats. This feature helps with the portability of data across multiple databases. Here, we will learn how to export SQL Server Data from a table to a CSV file. Tools like Azure Dat
      3 min read

    • Data Visualisation in Python using Matplotlib and Seaborn
      It may sometimes seem easier to go through a set of data points and build insights from it but usually this process may not yield good results. There could be a lot of things left undiscovered as a result of this process. Additionally, most of the data sets used in real life are too big to do any an
      14 min read

@GeeksforGeeks, Sanchhaya Education Private Limited, All rights reserved