
Descriptive Statistics

Last Updated: 24 Jul, 2025

Statistics is the foundation of data science. Descriptive statistics are simple tools that help us understand and summarize data: they describe the basic features of a dataset, such as the average, the highest and lowest values, and how spread out the numbers are. Computing them is the first step in making sense of the data.


Types of Descriptive Statistics

Descriptive statistics are conventionally divided into three categories, each serving a different purpose in summarizing and describing data. They help us understand:

  1. Where the data centers (Measures of Central Tendency)
  2. How spread out the data is (Measures of Variability)
  3. How the data is distributed (Measures of Frequency Distribution)

1. Measures of Central Tendency

Measures of central tendency are statistical values that describe the central position within a dataset. There are three main measures:


Mean: The sum of all observations divided by the number of observations; in other words, the familiar average.

\bar{x}=\frac{\sum x}{n}

 where, 

  • x = Observations
  • n = number of terms

Let's look at an example of how we can find the mean of a dataset in Python. Some basic familiarity with NumPy and SciPy helps before diving into the implementation.

Python
import numpy as np

# Sample data
arr = [5, 6, 11]

# Mean
mean = np.mean(arr)

print("Mean = ", mean)

Output
Mean =  7.333333333333333 

Mode: The most frequently occurring value in the dataset. It’s useful for categorical data and in cases where knowing the most common choice is crucial.

Python
import scipy.stats as stats

# Sample data
arr = [1, 2, 2, 3]

# Mode
mode = stats.mode(arr)
print("Mode = ", mode)

Output
Mode = ModeResult(mode=array([2]), count=array([2]))

(The exact format depends on your SciPy version; recent releases return scalars instead of arrays, e.g. ModeResult(mode=2, count=2).)

Median: The median is the middle value of a sorted dataset. If the number of values is odd, it is the center value; if even, it is the average of the two middle values. The median is often a better summary than the mean for skewed data.

Python
import numpy as np

# Sample data
arr = [1, 2, 3, 4]

# Median
median = np.median(arr)

print("Median = ", median)

Output
Median =  2.5 
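
Since the odd and even cases behave differently, here is the odd-count case for comparison (a minimal sketch; with an odd number of values, np.median returns the single middle element):

Python
import numpy as np

# Odd number of observations: the single middle element is the median
arr = [1, 2, 3, 4, 5]
print("Median = ", np.median(arr))  # prints: Median =  3.0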

Note: Most of the implementations here use the NumPy library in Python (the mode example uses SciPy).

Central tendency measures are the foundation for understanding data distribution and spotting anomalies. For example, the mean can reveal overall trends, while a large gap between the mean and the median signals a skewed distribution.
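
To make that concrete, here is a minimal sketch (with made-up numbers) of how a single extreme value pulls the mean while the median barely moves:

Python
import numpy as np

# Hypothetical incomes (in thousands); the last value is an outlier
incomes = [30, 32, 35, 38, 40, 200]

print("Mean   = ", np.mean(incomes))    # 62.5  -- dragged up by the outlier
print("Median = ", np.median(incomes))  # 36.5  -- robust to the outlier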

2. Measures of Variability

Knowing where the data centers is only half the picture; how it spreads out matters too. Measures of variability, also called measures of dispersion, describe the spread or distribution of observations in a dataset. They help identify outliers, assess model assumptions, and understand data variability in relation to the mean. The key measures of variability include:

1. Range: The difference between the largest and smallest data points in the dataset. The bigger the range, the greater the spread of the data, and vice versa. While easy to compute, the range is sensitive to outliers, so it provides a quick sense of the spread but should be complemented with other statistics.

Range = Largest data value - smallest data value 

Python
# Sample data
arr = [1, 2, 3, 4, 5]

# Finding max
Maximum = max(arr)
# Finding min
Minimum = min(arr)

# Difference of max and min
Range = Maximum - Minimum
print("Maximum = {}, Minimum = {} and Range = {}".format(
    Maximum, Minimum, Range))

Output
Maximum = 5, Minimum = 1 and Range = 4 
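
NumPy also provides a one-liner for this: np.ptp ("peak to peak") returns the maximum minus the minimum directly (a minimal sketch):

Python
import numpy as np

arr = [1, 2, 3, 4, 5]
print("Range = ", np.ptp(arr))  # Range =  4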

2. Variance: The average squared deviation from the mean. It is calculated by taking the difference between each data point and the mean, squaring those differences, summing them, and dividing by the number of data points in the dataset.

\sigma ^ 2 = \frac{\sum\left(x-\mu \right )^2}{N}

where,

  • x = Observation under consideration
  • N = number of terms
  • mu = Mean
Python
import statistics

# Sample data
arr = [1, 2, 3, 4, 5]

# Variance
print("Var = ", (statistics.variance(arr)))

Output
Var =  2.5 
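
One caveat worth flagging: statistics.variance computes the sample variance (dividing by n - 1), whereas the formula above divides by N (the population variance). A minimal sketch of the population version, using statistics.pvariance and np.var:

Python
import statistics
import numpy as np

arr = [1, 2, 3, 4, 5]

# Population variance: divides by N, matching the formula above
print("pvariance = ", statistics.pvariance(arr))  # 2.0
print("np.var    = ", np.var(arr))                # 2.0 (np.var divides by N by default)

# Sample variance: divides by n - 1, as in the example above
print("variance  = ", statistics.variance(arr))   # 2.5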

3. Standard deviation: Standard deviation is widely used to measure the extent of variation or dispersion in data. It is especially important when assessing model performance (e.g., residuals) or comparing datasets with different means.

It is defined as the square root of the variance: find the mean, subtract it from each value, square the results, average those squared differences over the number of terms, and take the square root.

\sigma = \sqrt{\frac{\sum \left(x-\mu \right )^2}{N}} 

where, 

  • x = Observation under consideration
  • N = number of terms 
  • mu = Mean
Python
import statistics

# Sample data
arr = [1, 2, 3, 4, 5]

# Standard deviation
print("Std = ", (statistics.stdev(arr)))

Output
Std =  1.5811388300841898 
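
As with variance, statistics.stdev is the sample standard deviation (sqrt(2.5) ≈ 1.581). The population version defined by the formula above is sqrt(2) ≈ 1.414, available via statistics.pstdev or np.std (a minimal sketch):

Python
import statistics
import numpy as np

arr = [1, 2, 3, 4, 5]

# Population standard deviation: matches the formula with N in the denominator
print("pstdev = ", statistics.pstdev(arr))  # 1.4142135623730951
print("np.std = ", np.std(arr))             # 1.4142135623730951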

Variability measures are important in residual analysis to check how well a model fits the data.

3. Measures of Frequency Distribution

A frequency distribution table is a compact way to summarize how data points are distributed across different categories or intervals. It helps identify patterns, outliers, and the overall structure of the dataset, and it is often the first step in understanding the data before applying more advanced analytical methods or creating visualizations such as histograms or pie charts.

A frequency distribution table typically includes (see the sketch after this list):

  • Data intervals or categories
  • Frequency counts
  • Relative frequencies (percentages)
  • Cumulative frequencies when needed
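
A minimal sketch of building such a table in Python with pandas (the data and column names here are made up for illustration):

Python
import pandas as pd

# Hypothetical categorical data
data = ["red", "blue", "red", "green", "blue", "red"]

counts = pd.Series(data).value_counts()           # frequency counts per category
table = pd.DataFrame({
    "Frequency": counts,
    "Relative Frequency": counts / counts.sum(),  # share of the total
    "Cumulative Frequency": counts.cumsum(),      # running total of counts
})
print(table)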

For histograms, bar graphs, frequency polygons, and pie charts, read the article: Frequency Distribution – Table, Graphs, Formula


