Detecting Delimiter in Text using detect_delimiter in Python

Last Updated : 28 Apr, 2022

When working with a large corpus of text, we sometimes need to determine which character acts as the delimiter. Detecting it automatically is a useful utility when the data is too large to inspect by hand. This article shows one way to solve the problem using the Python library detect_delimiter.

Installation

To install this module, type the command below in the terminal.

pip install detect_delimiter

detect() works in stages. It first checks which whitelist characters are present in the input text; if any are found, the most frequent of them is returned, ignoring any character in the blacklist if one is provided. If no whitelist character is present, the most frequent character that is not blacklisted is returned as the delimiter. If no delimiter is found this way, default is returned if it was provided; otherwise None is returned.
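The stages described above can be sketched in plain Python. This is only an illustration of the description, not the library's actual source: the function name sketch_detect and the constant names are hypothetical, and it assumes that a caller-supplied blacklist is added to the default exclusions (letters, digits and the full stop).

```python
from collections import Counter
import string

DEFAULT_WHITELIST = [",", ";", ":", "|", "\t"]
# Assumption: letters, digits and the full stop are never reported as delimiters.
DEFAULT_EXCLUDED = set(string.ascii_letters + string.digits + ".")


def sketch_detect(text, default=None, whitelist=None, blacklist=None):
    """Hypothetical re-creation of the staged lookup described in the article."""
    whitelist = DEFAULT_WHITELIST if whitelist is None else whitelist
    # Assumption: a provided blacklist extends the default exclusions.
    excluded = DEFAULT_EXCLUDED | set(blacklist or [])
    counts = Counter(text)

    # Stage 1: whitelisted characters present in the text win; pick the most frequent.
    preferred = [c for c in whitelist if c in counts and c not in excluded]
    if preferred:
        return max(preferred, key=lambda c: counts[c])

    # Stage 2: otherwise pick the most frequent character that is not excluded.
    others = [c for c in counts if c not in excluded]
    if others:
        return max(others, key=lambda c: counts[c])

    # Stage 3: nothing qualified, so fall back to the default (None if not given).
    return default


print(sketch_detect("Geeksforgeeks-is-best-for-geeks"))
print(sketch_detect("Geeksforgeeks.is.best.for.geeks", default="@"))
```

The real detect() may break ties or combine the two lists differently, so treat the sketch as a reading aid rather than a drop-in replacement.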

Syntax: detect(text:str, default=None, whitelist=[',', ';', ':', '|', '\t'], blacklist=None)
text : The input string to test for a delimiter.
default : The value to return if no valid delimiter is found.
whitelist : The characters checked first; if any of them occur in the text, one of them is treated as the delimiter. Useful when the set of possible delimiters is known in advance. Defaults to [',', ';', ':', '|', '\t'].
blacklist : Characters that must never be reported as delimiters. By default, all digits, letters and the full stop are excluded; pass this parameter to exclude additional characters.
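Because the result can be None when no default is supplied, code that goes on to split the text should handle that case explicitly. The helper below is a minimal sketch; split_on is a hypothetical name, not part of detect_delimiter, and it takes the delimiter as a plain argument so it runs without the library installed.

```python
def split_on(text, delimiter):
    """Split text on a detected delimiter; None means no delimiter was found."""
    if delimiter is None:
        # Keep the text whole instead of letting str.split fall back to whitespace.
        return [text]
    return text.split(delimiter)


print(split_on("Geeksforgeeks-is-best-for-geeks", "-"))
print(split_on("Geeksforgeeks.is.best.for.geeks", None))
```

The explicit check matters because calling text.split(None) does not raise an error; it silently splits on whitespace, which is rarely what is wanted here.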

Example 1: Working with detect() and default

This example demonstrates detecting the delimiter in a few strings, with and without the default parameter.

Python3

from detect_delimiter import detect
 
# simple example
print("The found delimiter [base example] : ")
print(detect("Geeksforgeeks-is-best-for-geeks"))
 
# simple example without default and no delimiter
# . is not considered as delim
print("The found delimiter [no default] : ")
print(detect("Geeksforgeeks.is.best.for.geeks"))
 
# simple example with default
# . is not considered as delim
# No delim is found, hence, default is printed
print("The found delimiter [with default] : ")
print(detect("Geeksforgeeks.is.best.for.geeks", default='@'))
                      
                       

Output : 

Working with detect() and default

Example 2: Using blacklist and whitelist parameters

Passing the whitelist parameter gives priority to the listed delimiters even when a non-whitelisted character occurs more often. The blacklist parameter makes detect() ignore specific characters.

Python3

from detect_delimiter import detect
 
# simple example
# ',' is picked because it is in the default whitelist
# [',', ';', ':', '|', '\t']
print("The found delimiter [default whitelist] : ")
print(detect("Geeksforgeeks$is-best,for-geeks"))
 
# simple example with whitelist
# ! prioritized
print("The found delimiter [provided whitelist] : ")
print(detect("Geeksforgeeks-is-best-for!geeks",
             whitelist=['@', "!"]))
 
# simple example with blacklist
# default blacklist overridden
print("The found delimiter [provided blacklist] : ")
print(detect("Geeksforgeeks-is-best-for!geeks",
             blacklist=['@', "-", 'e']))
                      
                       

Output : 

Examples with blacklist and whitelist Parameters.



author
manjeet_04
