Data Analysis with Python

Last Updated : 20 Jan, 2025

In this article, we will discuss how to do data analysis with Python: analyzing numerical data with NumPy, analyzing tabular data with Pandas, visualizing data with Matplotlib, and performing exploratory data analysis.

Data Analysis With Python 

Data Analysis is the technique of collecting, transforming, and organizing data to make future predictions and informed data-driven decisions. It also helps to find possible solutions for a business problem. There are six steps for Data Analysis. They are: 

  • Ask or Specify Data Requirements
  • Prepare or Collect Data
  • Clean and Process
  • Analyze
  • Share
  • Act or Report
Data Analysis with Python 

Note: To know more about these steps refer to our Six Steps of Data Analysis Process tutorial. 

Analyzing Numerical Data with NumPy

NumPy is an array-processing package in Python that provides a high-performance multidimensional array object and tools for working with these arrays. It is the fundamental package for scientific computing with Python.

Arrays in NumPy

A NumPy array is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, the number of dimensions of the array is called the rank of the array. A tuple of integers giving the size of the array along each dimension is known as the shape of the array.
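As a quick, minimal sketch of rank and shape (this example is not from the original article):

Python
import numpy as np

# A rank-2 array (two dimensions) built from a list of lists
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.ndim)   # rank: 2
print(arr.shape)  # shape: (2, 3)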

Creating NumPy Array

NumPy arrays can be created in multiple ways and with various ranks. They can also be created from different data types like lists, tuples, etc. The type of the resulting array is deduced from the type of the elements in the sequence. NumPy also offers several functions to create arrays with initial placeholder content; these minimize the necessity of growing arrays, an expensive operation.

Create Array using numpy.empty(shape, dtype=float, order='C')

Python
import numpy as np

a = np.empty([2, 2], dtype=int)
print("\nMatrix a : \n", a)

b = np.empty(2, dtype=int)
print("Matrix b : \n", b)

Output
Matrix a : 
 [[     94655291709206                   0]
 [3543826506195694713   34181816989462323]]
Matrix b : 
 [-4611686018427387904         206158462975]

Create Array using numpy.zeros(shape, dtype=None, order='C')

Python
import numpy as np

a = np.zeros([2, 2], dtype=int)
print("\nMatrix a : \n", a)

b = np.zeros(2, dtype=int)
print("Matrix b : \n", b)

c = np.zeros([3, 3])
print("\nMatrix c : \n", c)

Output
Matrix a : 
 [[0 0]
 [0 0]]
Matrix b : 
 [0 0]

Matrix c : 
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

Operations on Numpy Arrays

Arithmetic Operations

  • Addition: 
Python
import numpy as np

a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Add two arrays with the + operator
add_ans = a + b
print(add_ans)

# The equivalent ufunc call
add_ans = np.add(a, b)
print(add_ans)

# The + operator chains naturally over more than two arrays
c = np.array([1, 2, 3, 4])
add_ans = a + b + c
print(add_ans)

# Note: np.add() takes only two input arrays; a third positional
# argument is the `out` parameter, so this computes a + b and stores
# the result in c -- it does NOT add all three arrays
add_ans = np.add(a, b, c)
print(add_ans)

Output
[  7  77  23 130]
[  7  77  23 130]
[  8  79  26 134]
[  7  77  23 130]


  • Subtraction:
Python
import numpy as np

a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

sub_ans = a - b
print(sub_ans)

sub_ans = np.subtract(a, b)
print(sub_ans)

Output
[ 3 67  3 70]
[ 3 67  3 70]


  • Multiplication:
Python
import numpy as np

a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

mul_ans = a * b
print(mul_ans)

mul_ans = np.multiply(a, b)
print(mul_ans)

Output
[  10  360  130 3000]
[  10  360  130 3000]


  • Division:
Python
import numpy as np

a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

div_ans = a / b
print(div_ans)

div_ans = np.divide(a, b)
print(div_ans)

Output
[ 2.5        14.4         1.3         3.33333333]
[ 2.5        14.4         1.3         3.33333333]


For more information, refer to our NumPy – Arithmetic Operations Tutorial

NumPy Array Indexing

Indexing can be done in NumPy by using an array as an index. In the case of a slice, a view (shallow copy) of the array is returned, but with an index array, a copy of the original array is returned. NumPy arrays can be indexed with other arrays or any other sequence, with the exception of tuples. The last element is indexed by -1, the second last by -2, and so on.

Python NumPy Array Indexing

Python
import numpy as np

# A sequential array with a negative step
a = np.arange(10, 1, -2)
print("\n A sequential array with a negative step: \n", a)

# Indexing with an array of indices returns a copy
newarr = a[np.array([3, 1, 2])]
print("\n Elements at these indices are:\n", newarr)

Output
 A sequential array with a negative step: 
 [10  8  6  4  2]

 Elements at these indices are:
 [4 8 6]
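The view-versus-copy distinction described above can be checked directly. Below is a minimal sketch (not from the original article) showing that a basic slice reflects later changes to the source array while an index-array result does not:

Python
import numpy as np

a = np.arange(5)

s = a[1:4]               # basic slice: a view
f = a[np.array([1, 3])]  # index array: a copy

a[1] = 99
print(s)  # [99  2  3] -- the view sees the change
print(f)  # [1 3]      -- the copy does not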

NumPy Array Slicing

Consider the syntax x[obj], where x is the array and obj is the index. The slice object is the index in the case of basic slicing. Basic slicing occurs when obj is:

  • a slice object of the form start:stop:step
  • an integer
  • or a tuple of slice objects and integers

All arrays generated by basic slicing are always views of the original array.

Python
import numpy as np

a = np.arange(20)
print("\n Array is:\n ", a)

# Slice from index -8 (i.e. 12) up to 17 with step 1
print("\n a[-8:17:1] = ", a[-8:17:1])

# Slice from index 10 to the end
print("\n a[10:] = ", a[10:])

Output
 Array is:
  [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

 a[-8:17:1] =  [12 13 14 15 16]

 a[10:] =  [10 11 12 13 14 15 16 17 18 19]
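The tuple-of-slices form mentioned above applies to multidimensional arrays. A minimal sketch (not from the original article):

Python
import numpy as np

a = np.arange(12).reshape(3, 4)

# A tuple of slice objects: rows 0-1, columns 1-2
print(a[0:2, 1:3])
# [[1 2]
#  [5 6]]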

Ellipsis can also be used along with basic slicing. Ellipsis (...) expands to the number of : objects needed to make a selection tuple of the same length as the number of dimensions of the array.

Python
import numpy as np

b = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])

# Ellipsis stands in for the leading ':' objects
print(b[..., 1])

Output
[[ 2  5]
 [ 8 11]]

NumPy Array Broadcasting

The term broadcasting refers to how NumPy treats arrays with different dimensions during arithmetic operations. Subject to certain constraints, the smaller array is broadcast across the larger array so that the two have compatible shapes.

Let's assume that we have a large data set where each datum is a list of parameters. In NumPy we have a 2-D array, where each row is a datum and the number of rows is the size of the data set. Suppose we want to apply some sort of scaling to all this data: every parameter gets its own scaling factor, or, put differently, every parameter is multiplied by some factor.

Just to have a clear understanding, let’s count calories in foods using a macro-nutrient breakdown. Roughly put, the caloric parts of food are made of fats (9 calories per gram), protein (4 CPG), and carbs (4 CPG). So if we list some foods (our data), and for each food list its macro-nutrient breakdown (parameters), we can then multiply each nutrient by its caloric value (apply scaling) to compute the caloric breakdown of every food item.

NumPy Array Broadcasting

With this transformation, we can now compute all kinds of useful information. For example, the total number of calories present in some food or, given a breakdown of my dinner, how many calories I got from protein, and so on.

Let’s see a naive way of producing this computation with Numpy:

Python
import numpy as np

# Each row is a food item; the columns are grams of
# fat, protein and carbs per serving
macros = np.array([
    [0.8, 2.9, 3.9],
    [52.4, 23.6, 36.5],
    [55.2, 31.7, 23.9],
    [14.4, 11, 4.9]
])

# Calories per gram of fat (9), protein (4) and carbs (4),
# matching the values quoted above
cal_per_macro = np.array([9, 4, 4])

# Broadcasting stretches the 1-D array across every row
result = macros * cal_per_macro

print(result)

Output
[[  7.2  11.6  15.6]
 [471.6  94.4 146. ]
 [496.8 126.8  95.6]
 [129.6  44.   19.6]]

Broadcasting Rules: Broadcasting two arrays together follows these rules:

  • If the arrays don't have the same rank, then prepend the shape of the lower-rank array with 1s until both shapes have the same length.
  • The two arrays are compatible in a dimension if they have the same size in that dimension or if one of the arrays has size 1 in it.
  • The arrays can be broadcast together if they are compatible in all dimensions.
  • After broadcasting, each array behaves as if it had a shape equal to the element-wise maximum of the shapes of the two input arrays.
  • In any dimension where one array had size 1 and the other array had a size greater than 1, the first array behaves as if it were copied along that dimension.
Python
import numpy as np

v = np.array([12, 24, 36])
w = np.array([45, 55])

# Outer product: reshape v to (3, 1) so it broadcasts against w
print(np.reshape(v, (3, 1)) * w)

X = np.array([[12, 22, 33], [45, 55, 66]])

# v has shape (3,), X has shape (2, 3): v is added to each row of X
print(X + v)

# w has shape (2,): transpose X so the shapes line up, then transpose back
print((X.T + w).T)

# A scalar broadcasts against every element
print(X * 2)

Output
[[ 540  660]
 [1080 1320]
 [1620 1980]]
[[ 24  46  69]
 [ 57  79 102]]
[[ 57  67  78]
 [100 110 121]]
[[ 24  44  66]
 [ 90 110 132]]

Note: For more information, refer to our Python NumPy Tutorial.

Analyzing Data Using Pandas

Python Pandas is used for relational or labeled data and provides various data structures for manipulating such data and time series. This library is built on top of the NumPy library. The module is generally imported as:

import pandas as pd

Here, pd is an alias for Pandas. It is not necessary to import the library under an alias, but doing so helps write less code every time a method or property is called. Pandas generally provides two data structures for manipulating data:

  • Series
  • Dataframe

Series: 

Pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index. A Pandas Series is essentially a single column of an Excel sheet. Labels need not be unique but must be of a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index.

Pandas Series 

It can be created using the Series() function, either by loading a dataset from existing storage (a SQL database, a CSV file, an Excel file, etc.) or from data structures like lists, dictionaries, and so on.

Python Pandas Creating Series

Python
import pandas as pd
import numpy as np

# An empty Series
ser = pd.Series(dtype="object")
print(ser)

# A Series built from a NumPy array
data = np.array(['g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
print(ser)

Output
Series([], dtype: object)
0    g
1    e
2    e
3    k
4    s
dtype: object

Dataframe:

Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns), i.e., the data is aligned in a tabular fashion in rows and columns. A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.

Pandas Dataframe 

It can be created using the DataFrame() function and, just like a Series, it can also be created from different file types and data structures.

Python Pandas Creating Dataframe

Python
import pandas as pd

# An empty DataFrame
df = pd.DataFrame()
print(df)

# A DataFrame built from a list
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']
df = pd.DataFrame(lst, columns=['Words'])
print(df)

Output
Empty DataFrame
Columns: []
Index: []
    Words
0   Geeks
1     For
2   Geeks
3      is
4  portal
5     for
6   Geeks

Creating Dataframe from CSV

We can create a dataframe from a CSV file using the read_csv() function.

Python Pandas read CSV

Python
import pandas as pd

# Read the csv file into a dataframe
df = pd.read_csv("Iris.csv")

df.head()

Output:

head of a dataframe

Filtering DataFrame

The Pandas dataframe.filter() function is used to subset rows or columns of a dataframe according to labels in the specified index. Note that this routine does not filter the dataframe on its contents; the filter is applied to the labels of the index.

Python Pandas Filter Dataframe

Python
import pandas as pd

df = pd.read_csv("Iris.csv")

# Subset the listed columns by label
df.filter(["Species", "SepalLengthCm", "SepalWidthCm"]).head()

Output:

Applying filter on dataset 

Sorting DataFrame

To sort a data frame in Pandas, the sort_values() function is used. It can sort the data frame in ascending or descending order, as in the sketch below.

Python Pandas Sorting Dataframe in Ascending Order
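The code for this step is not shown in the extracted article, so the snippet below is a minimal sketch of sorting the Iris dataframe used throughout this article by its SepalLengthCm column:

Python
import pandas as pd

df = pd.read_csv("Iris.csv")

# Sort the rows by sepal length, smallest first
df.sort_values(by="SepalLengthCm", ascending=True).head()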

Sorted dataset based on a column value 

Pandas GroupBy

Groupby is a pretty simple concept: we can create a grouping of categories and apply a function to each category. In real data science projects you'll be dealing with large amounts of data and trying things over and over, so for efficiency we use the groupby concept. Groupby mainly refers to a process involving one or more of the following steps:

  • Splitting: we split the data into groups by applying some conditions to the dataset.
  • Applying: we apply a function to each group independently.
  • Combining: we combine the different results after applying groupby into a single data structure.

The following image will help in understanding the process involved in the groupby concept.

1. Group the unique values from the Team column

Pandas Groupby Method 

2. Now there’s a bucket for each group

3. Toss the other data into the buckets

Pandas GroupBy

4. Apply a function on the weight column of each bucket.

Applying a function on the weight column of each bucket 

Python Pandas GroupBy:

Python
import pandas as pd

data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi',
                  'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32,
                 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj',
                     'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd',
                           'B.Tech', 'B.com', 'Msc', 'MA']}

df = pd.DataFrame(data1)

print("Original Dataframe")
print(df)

# Group the rows by the Name column
gk = df.groupby('Name')

# first() returns the first row of each group
print("After Creating Groups")
gk.first()

Output:

pandas groupby 

Applying function to group:

After splitting the data into groups, we apply a function to each group. The main operations are (a short sketch of the last two follows this list):

  • Aggregation: compute a summary statistic (or statistics) about each group; for example, compute group sums or means.
  • Transformation: perform some group-specific computations and return a like-indexed object; for example, fill NAs within groups with a value derived from each group.
  • Filtration: discard some groups according to a group-wise computation that evaluates to True or False; for example, filter out data based on the group sum or mean.
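Only aggregation is demonstrated below, so here is a minimal sketch of transformation and filtration on a small example frame; the dataframe and column choices are illustrative assumptions, not from the original article:

Python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Anuj', 'Jai', 'Princi'],
                   'Age': [27, 24, 22, 32]})

grp = df.groupby('Name')

# Transformation: replace each Age with its group mean,
# returning a like-indexed Series
print(grp['Age'].transform('mean'))

# Filtration: keep only the groups with more than one row
print(grp.filter(lambda g: len(g) > 1))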

Pandas Aggregation

Aggregation is a process in which we compute a summary statistic about each group. An aggregating function returns a single aggregated value for each group. After splitting the data into groups using the groupby() function, several aggregation operations can be performed on the grouped data.

Python Pandas Aggregation

Python
import pandas as pd

data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi',
                  'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32,
                 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj',
                     'Jaunpur', 'Kanpur', 'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd',
                           'B.Tech', 'B.com', 'Msc', 'MA']}

df = pd.DataFrame(data1)

grp1 = df.groupby('Name')

# Sum the ages within each Name group
result = grp1['Age'].aggregate('sum')
print(result)

Output:

Use of sum aggregate function on dataset 

Concatenating DataFrame

To concatenate dataframes, we use the concat() function. It does all the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.

Python Pandas Concatenate Dataframe

Python
import pandas as pd

data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age': [27, 24, 22, 32]}

data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}

df = pd.DataFrame(data1)
df1 = pd.DataFrame(data2)

# Concatenate column-wise (axis=1); rows are aligned on the index
res = pd.concat([df, df1], axis=1)
print(res)

Output:

Merging DataFrame

When we need to combine very large DataFrames, joins serve as a powerful way to perform these operations swiftly. Joins can only be done on two DataFrames at a time, denoted as left and right tables. The key is the common column that the two DataFrames will be joined on. It’s a good practice to use keys that have unique values throughout the column to avoid unintended duplication of row values. Pandas provide a single function, merge(), as the entry point for all standard database join operations between DataFrame objects.

There are four basic ways to handle the join (inner, left, right, and outer), depending on which rows must retain their data.

Merge Dataframe Python Pandas

Python Pandas Merge Dataframe

Python
import pandas as pd

data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age': [27, 24, 22, 32]}

data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}

df = pd.DataFrame(data1)
df1 = pd.DataFrame(data2)

# display() is IPython-only; print() works everywhere
print(df)
print(df1)

# Merge on the shared 'key' column (an inner join by default)
res = pd.merge(df, df1, on='key')
print(res)

Output:

Merging two datasets 
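The example above relies on merge()'s default inner join. A minimal sketch of the other three join types on two small frames (the data here is an illustrative assumption, not from the original article):

Python
import pandas as pd

df = pd.DataFrame({'key': ['K0', 'K1'], 'Name': ['Jai', 'Princi']})
df1 = pd.DataFrame({'key': ['K0', 'K2'], 'Address': ['Nagpur', 'Kanpur']})

# Left join: keep every row of the left frame
print(pd.merge(df, df1, on='key', how='left'))

# Right join: keep every row of the right frame
print(pd.merge(df, df1, on='key', how='right'))

# Outer join: keep rows from both frames, filling gaps with NaN
print(pd.merge(df, df1, on='key', how='outer'))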

Joining DataFrame

To join dataframes, we use the .join() function, which combines the columns of two potentially differently-indexed DataFrames into a single result DataFrame.

Python Pandas Join Dataframe

Python
import pandas as pd

data1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
         'Age': [27, 24, 22, 32]}

data2 = {'Address': ['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'],
         'Qualification': ['MCA', 'Phd', 'Bcom', 'B.hons']}

df = pd.DataFrame(data1, index=['K0', 'K1', 'K2', 'K3'])
df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4'])

# Join on the index (a left join by default)
res = df.join(df1)
print(res)

Output:

Joining two datasets 

For more information, refer to our Pandas Merging, Joining, and Concatenating tutorial

For a complete guide on Pandas refer to our Pandas Tutorial.

Visualization with Matplotlib

Matplotlib is an easy-to-use and amazing visualization library in Python. It is built on NumPy arrays, designed to work with the broader SciPy stack, and supports several kinds of plots such as line, bar, scatter, and histogram.

Pyplot

Pyplot is a Matplotlib module that provides a MATLAB-like interface. Pyplot provides functions that interact with the figure, i.e., create a figure, decorate the plot with labels, and create a plotting area in a figure.

Python
import matplotlib.pyplot as plt

# Plot y = x**2 for x = 1..4 and fix the axis limits
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.axis([0, 6, 0, 20])
plt.show()

Output:

Bar chart

A bar plot or bar chart is a graph that represents categories of data with rectangular bars whose lengths or heights are proportional to the values they represent. Bar plots can be drawn horizontally or vertically. A bar chart describes comparisons between discrete categories. It can be created using the bar() method.

Python Matplotlib Bar Chart (here we will use the Iris dataset again)

Python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("Iris.csv")

# One bar per species, with the sepal length on the y-axis
plt.bar(df['Species'], df['SepalLengthCm'])

plt.title("Iris Dataset")
plt.legend(["bar"])
plt.show()

Output:

Bar chart using matplotlib library 

Histograms

A histogram is basically used to represent data in the form of groups. It is a type of bar plot where the X-axis represents the bin ranges while the Y-axis gives the frequency. To create a histogram, the first step is to create bins of the ranges, then distribute the whole range of values into a series of intervals and count the values which fall into each interval. Bins are clearly identified as consecutive, non-overlapping intervals of a variable. The hist() function is used to compute and create a histogram of x.

Python Matplotlib Histogram

Python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("Iris.csv")

# hist() bins the sepal lengths and counts the values in each bin
plt.hist(df["SepalLengthCm"])

plt.title("Histogram")
plt.legend(["SepalLengthCm"])
plt.show()

Output:

Histplot using matplotlib library 

Scatter Plot

Scatter plots are used to observe relationships between variables, using dots to represent the relationship between them. The scatter() method in the matplotlib library is used to draw a scatter plot.

Python Matplotlib Scatter Plot

Python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("Iris.csv")

plt.scatter(df["Species"], df["SepalLengthCm"])

plt.title("Scatter Plot")
plt.legend(["SepalLengthCm"])
plt.show()

Output:

Scatter plot using matplotlib library 

Box Plot

A boxplot, also known as a box-and-whisker plot, is a very good visual representation when it comes to measuring the distribution of the data. It clearly plots the median values, the outliers, and the quartiles. Understanding the data distribution is another important factor which leads to better model building. If the data has outliers, a box plot is a recommended way to identify them and take the necessary actions. The box-and-whiskers chart shows how the data is spread out. Five pieces of information are generally included in the chart:

  • The minimum is shown at the far left of the chart, at the end of the left 'whisker'
  • The first quartile, Q1, is the left edge of the box
  • The median is shown as a line in the center of the box
  • The third quartile, Q3, is the right edge of the box
  • The maximum is shown at the far right of the chart, at the end of the right 'whisker'

Representation of box plot

Inter quartile range 

Illustrating box plot 

Python Matplotlib Box Plot

Python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("Iris.csv")

plt.boxplot(df["SepalWidthCm"])

plt.title("Box Plot")
plt.legend(["SepalWidthCm"])
plt.show()

Output:

Boxplot using matplotlib library 

Correlation Heatmaps

A 2-D heatmap is a data visualization tool that represents the magnitude of a phenomenon as colors. A correlation heatmap is a heatmap that shows a 2D correlation matrix between two discrete dimensions, using colored cells, usually from a monochromatic scale. The values of the first dimension appear as the rows of the table, and those of the second dimension as the columns. The color of each cell is proportional to the corresponding correlation value. This makes correlation heatmaps ideal for data analysis, since they make patterns easily readable and highlight differences and variations in the data. Like a regular heatmap, a correlation heatmap is assisted by a colorbar, making the data easily readable and comprehensible.

Note: The data here has to be passed through the corr() method to generate a correlation matrix. corr() only considers the numeric columns, eliminating those that are of no use when generating a correlation heatmap (in recent pandas versions this must be requested explicitly with numeric_only=True).

Python Matplotlib Correlation Heatmap

Python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("Iris.csv")

# numeric_only=True restricts corr() to the numeric columns
# (required in recent pandas versions)
plt.imshow(df.corr(numeric_only=True), cmap='autumn', interpolation='nearest')

plt.title("Heat Map")
plt.show()

Output:

Heatmap using matplotlib library 

For more information on data visualization refer to our below tutorials – 

  • Data Visualization using Matplotlib
  • Data Visualization with Python Seaborn
  • Data Visualisation in Python using Matplotlib and Seaborn
  • Using Plotly for Interactive Data Visualization in Python
  • Interactive Data Visualization with Bokeh

Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a technique for analyzing data using visual techniques. With it, we can get detailed information about the statistical summary of the data, deal with duplicate values and outliers, and spot trends or patterns present in the dataset.

Note: We will be using Iris Dataset.

Getting Information about the Dataset

We will use the shape attribute to get the shape of the dataset.

Shape of Dataframe 

df.shape

Output:

(150, 6)

We can see that the dataframe contains 6 columns and 150 rows.

Now, let's also see the columns and their data types. For this, we will use the info() method.

Information about Dataset 

df.info()

Output:

information about the dataset 

We can see that only one column has categorical data and all the other columns are of the numeric type with non-Null entries.

Let's get a quick statistical summary of the dataset using the describe() method. The describe() function applies basic statistical computations to the dataset, like extreme values, the count of data points, the standard deviation, etc. Any missing or NaN values are automatically skipped. The describe() function gives a good picture of the distribution of the data.

Description of dataset 

df.describe()

Output:

Description about the dataset 

We can see the count of each column along with their mean value, standard deviation, minimum and maximum values.

Checking Missing Values

We will check if our data contains any missing values or not. Missing values can occur when no information is provided for one or more items or for a whole unit. We will use the isnull() method.

python code for missing value

df.isnull().sum()

Output:

Missing values in the dataset 

We can see that no column has any missing value.

Checking Duplicates

Let's see if our dataset contains any duplicates or not. The Pandas drop_duplicates() method helps in removing duplicates from the data frame.

Pandas function for dropping duplicates 

data = df.drop_duplicates(subset="Species")
data

Output:

Dropping duplicate value in the dataset 

We can see that there are only three unique species. Let's see if the dataset is balanced or not, i.e., whether all the species contain an equal number of rows. We will use the Series.value_counts() function, which returns a Series containing the counts of unique values.

Python code for value counts in the column 

df.value_counts("Species")

Output:

value count in the dataset 

We can see that all the species contain an equal number of rows, so we should not delete any entries.

Relation between variables

We will see the relationship between the sepal length and sepal width and also between petal length and petal width.

Comparing Sepal Length and Sepal Width

Python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("Iris.csv")

sns.scatterplot(x='SepalLengthCm', y='SepalWidthCm',
                hue='Species', data=df)

# Place the legend outside the axes
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()

Output:

Scatter plot using matplotlib library 

From the above plot, we can infer that – 

  • Species Setosa has smaller sepal lengths but larger sepal widths.
  • Versicolor Species lies in the middle of the other two species in terms of sepal length and width
  • Species Virginica has larger sepal lengths but smaller sepal widths.

Comparing Petal Length and Petal Width

Python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("Iris.csv")

sns.scatterplot(x='PetalLengthCm', y='PetalWidthCm',
                hue='Species', data=df)

plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()

Output:

scatter plot of petal length 

From the above plot, we can infer that – 

  • The species Setosa has smaller petal lengths and widths.
  • Versicolor Species lies in the middle of the other two species in terms of petal length and width
  • Species Virginica has the largest petal lengths and widths.

Let's plot the pairwise relationships of all the columns using a pairplot. It can be used for multivariate analysis.

Python code for pairplot 

Python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("Iris.csv")

# Drop the Id column and plot every pairwise relationship,
# colored by species
sns.pairplot(df.drop(['Id'], axis=1), hue='Species', height=2)
plt.show()

Output:

Pairplot for the dataset 

We can see many types of relationships from this plot, such as the species Setosa having the smallest petal widths and lengths. It also has the smallest sepal lengths but larger sepal widths. Such information can be gathered about any other species.

Handling Correlation

Pandas dataframe.corr() is used to find the pairwise correlation of all columns in the dataframe. Any NA values are automatically excluded, and non-numeric columns are ignored (in recent pandas versions this must be requested with numeric_only=True).

Example:

data.corr(method='pearson', numeric_only=True)

Output:

correlation between columns in the dataset 

Heatmaps

The heatmap is a data visualization technique that is used to analyze the dataset as colors in two dimensions. Basically, it shows the correlation between all numerical variables in the dataset. In simpler terms, we can plot the above-found correlation using a heatmap.

python code for heatmap 

Python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("Iris.csv")

# Correlate the numeric columns, then drop the Id row and column
# from the matrix before plotting
sns.heatmap(df.corr(method='pearson', numeric_only=True)
              .drop(['Id'], axis=1).drop(['Id'], axis=0),
            annot=True)
plt.show()

Output:

Heatmap for correlation in the dataset 

From the above graph, we can see that –

  • Petal width and petal length have high correlations.
  • Petal length and sepal width have good correlations.
  • Petal Width and Sepal length have good correlations.

Handling Outliers

An outlier is a data item/object that deviates significantly from the rest of the (so-called normal) objects. Outliers can be caused by measurement or execution errors. The analysis for outlier detection is referred to as outlier mining. There are many ways to detect outliers, and the removal process is the same as removing any other data item from the pandas dataframe.

Let’s consider the iris dataset and let’s plot the boxplot for the SepalWidthCm column.

python code for Boxplot 

Python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('Iris.csv')

sns.boxplot(x='SepalWidthCm', data=df)
plt.show()

Output:

Boxplot for sepalwidth column 

In the above graph, the values above 4 and below 2 are acting as outliers.

Removing Outliers

To remove an outlier, one must follow the same process as removing any entry from the dataset, using its exact position in the dataset, because all of the above detection methods end with a list of the data items that satisfy the outlier definition according to the method used.

We will detect the outliers using IQR and then we will remove them. We will also draw the boxplot to see if the outliers are removed or not.

Python
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.read_csv('Iris.csv')

# First and third quartiles of the sepal width
# (older NumPy spells the keyword interpolation= instead of method=)
Q1 = np.percentile(df['SepalWidthCm'], 25, method='midpoint')
Q3 = np.percentile(df['SepalWidthCm'], 75, method='midpoint')
IQR = Q3 - Q1

print("Old Shape: ", df.shape)

# Row positions outside the 1.5 * IQR fences
upper = np.where(df['SepalWidthCm'] >= (Q3 + 1.5 * IQR))
lower = np.where(df['SepalWidthCm'] <= (Q1 - 1.5 * IQR))

# Drop them (positions match labels here because of the default RangeIndex)
df.drop(upper[0], inplace=True)
df.drop(lower[0], inplace=True)

print("New Shape: ", df.shape)
sns.boxplot(x='SepalWidthCm', data=df)

Output:

boxplot using seaborn library 

For more information about EDA, refer to our below tutorials – 

  • What is Exploratory Data Analysis ?
  • Exploratory Data Analysis in Python | Set 1
  • Exploratory Data Analysis in Python | Set 2
  • Exploratory Data Analysis on Iris Dataset

