Count-Min Sketch in Python

Last Updated : 20 May, 2024

Count-Min Sketch is a probabilistic data structure that approximates the frequency of items in a data stream. Its memory footprint is fixed when the sketch is created and does not grow with the amount of data processed; the trade-off is that it returns approximate rather than exact counts. In this post, we'll explore the idea behind Count-Min Sketch, how to implement it in Python, and its uses and drawbacks.

What is Count-Min Sketch?

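Count-Min Sketch is a probabilistic data structure used to estimate how often each item occurs in a large stream of data, without keeping a separate counter for every item. Because different items can share a counter, an estimate may be higher than the true frequency, but when counts are only ever incremented it is never lower.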

The idea behind Count-Min Sketch is to use several hash functions and a two-dimensional array (matrix) of counters to store item frequencies compactly. Each row of the array corresponds to one hash function, and each column corresponds to a bucket. When an item's frequency is updated or queried, each hash function selects which bucket in its row to increment or read.
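To make the row-and-column layout concrete, here is a minimal sketch of how one item maps to exactly one bucket in each row. The helper name `bucket_indices` and the hashing scheme (prefixing the item with the row number before applying SHA-256) are assumptions chosen for illustration, not part of the implementation given later in this article.

```python
import hashlib


def bucket_indices(item, rows, cols):
    """Return one column index per row for `item` (hypothetical helper)."""
    indices = []
    for row in range(rows):
        # Assumption: salting the input with the row number stands in for
        # having a separate hash function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        indices.append(int(digest, 16) % cols)
    return indices


rows, cols = 3, 10
matrix = [[0] * cols for _ in range(rows)]

# Recording one occurrence of "apple" touches exactly one counter per row.
for row, col in enumerate(bucket_indices("apple", rows, cols)):
    matrix[row][col] += 1

print(bucket_indices("apple", rows, cols))  # three column indices, one per row
print([sum(r) for r in matrix])  # → [1, 1, 1]
```

The same item always maps to the same buckets, which is what lets a later query find the counters that an earlier update incremented.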

Key Operations in Count-Min Sketch:

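Initialization: Choose the number of rows (one per hash function) and columns (buckets) for the Count-Min Sketch, and set every counter to zero.

Update: To record an occurrence of an element, hash it with each hash function and increment the corresponding bucket in each row.

Query: To estimate an element's frequency, hash it with each hash function and take the lowest count among the corresponding buckets. The minimum is used because collisions can only inflate a counter, so the smallest value is the closest to the true count.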
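The article does not say how to pick the number of rows and columns at initialization. One commonly cited rule from the standard Count-Min Sketch analysis is to use ceil(e / epsilon) columns and ceil(ln(1 / delta)) rows, so that an estimate exceeds the true count by more than epsilon times the total number of updates with probability at most delta. The sketch below applies that rule; the function name `sketch_dimensions` and its parameter names are hypothetical.

```python
import math


def sketch_dimensions(epsilon, delta):
    """Suggest (rows, cols) for a target error (hypothetical helper).

    Assumption: the standard bound, where an estimate overshoots the true
    count by more than epsilon * N (N = total updates) with probability
    at most delta.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must be in (0, 1)")
    cols = math.ceil(math.e / epsilon)     # width controls the size of the error
    rows = math.ceil(math.log(1 / delta))  # depth controls how often it is exceeded
    return rows, cols


print(sketch_dimensions(epsilon=0.01, delta=0.01))  # → (5, 272)
```

The bound assumes hash functions that behave independently across rows, so treat the result as a starting point rather than a guarantee for any particular choice of hashes.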

Implementation of Count-Min Sketch in Python:

The implementation below uses MD5, SHA-1, and SHA-256 from hashlib as the hash functions, one per row:

Python
import hashlib


class CountMinSketch:
    def __init__(self, rows, cols):
        available = [hashlib.md5, hashlib.sha1, hashlib.sha256]
        # Each row needs its own hash function, so rows cannot exceed the list above.
        if not 1 <= rows <= len(available):
            raise ValueError(f"rows must be between 1 and {len(available)}")
        self.rows = rows
        self.cols = cols
        self.count_matrix = [[0] * cols for _ in range(rows)]
        self.hash_functions = available[:rows]

    def _bucket_index(self, hash_func, element):
        # Interpret the hex digest as an integer and map it to a column.
        hash_value = int(hash_func(element.encode()).hexdigest(), 16)
        return hash_value % self.cols

    def update(self, element):
        # Increment one bucket in every row.
        for i, hash_func in enumerate(self.hash_functions):
            self.count_matrix[i][self._bucket_index(hash_func, element)] += 1

    def query(self, element):
        # The smallest bucket is the one least inflated by collisions.
        return min(
            self.count_matrix[i][self._bucket_index(hash_func, element)]
            for i, hash_func in enumerate(self.hash_functions)
        )


# Example usage
cms = CountMinSketch(rows=3, cols=10)
data_stream = ["apple", "banana", "apple", "orange", "apple", "banana", "banana"]
for element in data_stream:
    cms.update(element)
print("Frequency of 'apple':", cms.query("apple"))

Output
Frequency of 'apple': 3 
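The estimate in this output matches the true count, but collisions can inflate it. The following sketch illustrates that effect by comparing estimates against exact counts from `collections.Counter`. It uses a compact stand-in class, `TinySketch` (a hypothetical name), with a row-salted SHA-256 hash as an assumption, and deliberately forces every item into the same bucket by using a single column.

```python
import hashlib
from collections import Counter


class TinySketch:
    """Compact Count-Min Sketch used only for this demonstration (hypothetical)."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.table = [[0] * cols for _ in range(rows)]

    def _index(self, item, row):
        # Assumption: salt with the row number to get a different hash per row.
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.cols

    def update(self, item):
        for row in range(self.rows):
            self.table[row][self._index(item, row)] += 1

    def query(self, item):
        return min(self.table[row][self._index(item, row)] for row in range(self.rows))


stream = ["apple", "banana", "apple", "orange", "apple", "banana", "banana"]
exact = Counter(stream)

# With a single column every item shares one bucket per row, so every
# estimate collapses to the length of the stream.
crowded = TinySketch(rows=3, cols=1)
for item in stream:
    crowded.update(item)
print(crowded.query("orange"), "vs exact", exact["orange"])  # → 7 vs exact 1

# A wider sketch has fewer collisions, and it still never underestimates.
wide = TinySketch(rows=3, cols=1000)
for item in stream:
    wide.update(item)
print(all(wide.query(item) >= exact[item] for item in exact))  # → True
```

The one-column case is the worst possible configuration; adding columns reduces collisions, and adding rows makes it less likely that an item collides in every row at once.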

Count-Min Sketch provides memory-efficient approximations of element frequencies in massive data streams. Its estimates can overcount when different elements collide in the same buckets, so it is best suited to workloads that tolerate a small error. Python developers can use it to address a range of frequency estimation problems, provided they understand its concepts, uses, and limits.



ashish_rao_2373
Article Tags :
  • Hash
  • Matrix
  • DSA
  • Python-DSA