Python RegEx

Last Updated : 19 Jul, 2022

In this tutorial, you'll learn about RegEx and understand various regular expressions.

  • Regular Expressions
  • Why Regular Expressions
  • Basic Regular Expressions
  • More Regular Expressions
  • Compiled Regular Expressions

A RegEx (regular expression) is a powerful tool for matching text against a pre-defined pattern. It can detect the presence or absence of text that fits the pattern, and it can also split a pattern into one or more sub-patterns. The Python standard library provides the re module for regular expressions. Its primary function is search, which takes a regular expression and a string and returns either a match object for the first match or None.

Python3
import re

text = 'GeeksforGeeks: A computer science portal for geeks'
match = re.search(r'portal', text)
print(match)
print(match.group())
print('Start Index:', match.start())
print('End Index:', match.end())

Output
<re.Match object; span=(34, 40), match='portal'>
portal
Start Index: 34
End Index: 40

Here the r prefix (r'portal') stands for raw, not RegEx. A raw string differs slightly from a regular string: Python does not interpret the \ character in it as an escape character. This matters because the regular expression engine uses the \ character for its own escaping.
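To make the difference concrete, the short sketch below compares a regular and a raw string literal. The sample strings ('room 7', a Windows-style path) are chosen only for illustration and are not part of the original example.

```python
import re

# In a regular string literal, '\n' is a single newline character.
regular = '\n'
# In a raw string literal, r'\n' is two characters: a backslash and 'n'.
raw = r'\n'
print(len(regular), len(raw))

# r'\d' reaches the regex engine unchanged, where it means "a digit".
digit = re.search(r'\d', 'room 7')
print(digit.group())

# A literal backslash is written as two backslashes in a raw pattern.
backslash = re.search(r'\\', r'C:\temp')
print(backslash.start())
```

Without the r prefix, the last pattern would need four backslashes, which is why raw strings are the usual choice for patterns.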

Before starting with the Python regex module let's see how to actually write RegEx using metacharacters or special sequences. 

MetaCharacters

Metacharacters are characters with a special meaning in a regular expression. They are the building blocks of the patterns passed to the functions of the re module. Below is the list of metacharacters.

MetaCharacter   Description
\               Drops the special meaning of the character following it
[]              Represents a character class
^               Matches the beginning of the string
$               Matches the end of the string
.               Matches any character except newline
|               Means OR (matches either of the patterns separated by it)
?               Matches zero or one occurrence
*               Matches any number of occurrences (including 0 occurrences)
+               Matches one or more occurrences
{}              Indicates the number of occurrences of the preceding RegEx to match
()              Encloses a group of RegEx
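A few of these metacharacters can be tried out directly. The following is a minimal sketch; the sample strings are made up for illustration.

```python
import re

# '.' matches any single character except a newline.
any_char = re.findall(r'a.c', 'abc a-c ac')
print(any_char)

# A backslash drops the special meaning, so '\.' matches only a literal dot.
literal_dot = re.findall(r'a\.c', 'abc a.c')
print(literal_dot)

# '|' means OR between the alternatives on either side.
either = re.findall(r'cat|dog', 'cat, dog, cow')
print(either)

# '{}' sets how many times the preceding item must repeat.
doubled = re.findall(r'o{2}', 'foo boooo')
print(doubled)
```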

In the first example above, the group method returns the matched string, and the start and end methods return the starting and ending index of the match. A match object has other methods as well, which we will discuss later.

Why RegEx?

Let's take a moment to understand why we should use Regular expression.

  1. Data Mining: Regular expressions are well suited to data mining. They pick out text from a large body of text by checking it against a pre-defined pattern. Common scenarios are identifying an email address, URL, or phone number in a pile of text.
  2. Data Validation: Regular expressions can validate data against a format. Defining different patterns covers a wide range of validation rules. A few examples are validating phone numbers, emails, etc.
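Both uses can be sketched in a few lines. The email pattern below is deliberately simple and is not a complete email grammar, and the ten-digit phone format and the helper name is_valid_phone are assumptions made for this illustration only.

```python
import re

text = 'Contact alice@example.com or bob@example.org for details.'

# Data mining: pull every email-like token out of free text.
# Simplified pattern, assumed for illustration only.
EMAIL = r'[\w.+-]+@[\w-]+\.[\w.]+'
emails = re.findall(EMAIL, text)
print(emails)


def is_valid_phone(value):
    """Data validation: accept exactly ten digits (a hypothetical format)."""
    return re.fullmatch(r'\d{10}', value) is not None


print(is_valid_phone('9876543210'))
print(is_valid_phone('98765-43210'))
```

re.fullmatch is used for validation because the whole input has to fit the pattern, whereas re.findall scans for matches anywhere in the text.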

Basic RegEx

Let's understand some of the basic regular expressions. They are as follows:

  • Character Classes
  • Ranges
  • Negation
  • Shortcuts
  • Beginning and End of String
  • Any Character

Character Classes

A character class matches a single character against a set of possible characters. You write a character class within square brackets. Let's consider an example that matches a word regardless of the case of its first letter.

Python3
import re

text = 'GeeksforGeeks: A computer science portal for geeks'
print(re.findall(r'[Gg]eeks', text))

Output
['Geeks', 'Geeks', 'geeks']

Ranges

A range provides the flexibility to match text with a range pattern such as a range of digits (0 to 9), a range of letters (A to Z), and so on. The hyphen character within the character class represents a range.

Python3
import re

print('Range', re.search(r'[a-zA-Z]', 'x'))

Output
Range <re.Match object; span=(0, 1), match='x'>
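Ranges can be combined inside one class, as the following sketch shows. The input strings are arbitrary examples.

```python
import re

# A single range of digits.
digits = re.findall(r'[0-9]', 'A1b2C3')
print(digits)

# A single range of uppercase letters.
upper = re.findall(r'[A-Z]', 'A1b2C3')
print(upper)

# Several ranges combined in one character class.
alnum = re.findall(r'[a-zA-Z0-9]', 'x-7_Q')
print(alnum)

# A hyphen placed last in the class is a literal hyphen, not a range.
with_hyphen = re.findall(r'[a-c-]', 'a-d')
print(with_hyphen)
```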

Negation

Negation inverts a character class. A ^ placed right after the opening bracket makes the class match any character except the characters or ranges listed in it.

Python3
import re

print(re.search(r'[^a-z]', 'c'))

Output
None

In the above case, we have inverted the character class that ranges from a to z. If we try to match a character within the mentioned range, the regular expression engine returns None.

Let's consider another example

Python3
import re

print(re.search(r'G[^e]', 'Geeks'))

Output
None

Here the pattern requires a G followed by any character other than e. In 'Geeks' the G is followed by e, so there is no match.

List of special sequences 

Special Sequence   Example Pattern   Example Strings            Description
\A                 \Afor             for geeks, for the world   Matches if the string begins with the given pattern.
\b                 \bge              geeks, get                 Matches if the word begins or ends with the given pattern. \b(string) checks for the beginning of the word and (string)\b checks for the ending of the word.
\B                 \Bge              together, forge            The opposite of \b, i.e. the pattern must not be at the start or end of a word.
\d                 \d                123, gee1                  Matches any decimal digit; equivalent to the set class [0-9].
\D                 \D                geeks, geek1               Matches any non-digit character; equivalent to the set class [^0-9].
\s                 \s                gee ks, a bc a             Matches any whitespace character.
\S                 \S                a bd, abcd                 Matches any non-whitespace character.
\w                 \w                123, geeKs4                Matches any alphanumeric character or underscore; equivalent to the class [a-zA-Z0-9_].
\W                 \W                >$, gee<>                  Matches any non-alphanumeric character.
\Z                 ab\Z              abcdab, abababab           Matches if the string ends with the given regex.
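Some of the special sequences listed above can be checked with short calls. This is a sketch only; it reuses a few of the example strings from the table and joins some of them into one string for convenience.

```python
import re

# \A anchors the pattern at the very start of the string.
starts = bool(re.search(r'\Afor', 'for geeks'))
not_starts = bool(re.search(r'\Afor', 'geeks for'))
print(starts, not_starts)

# \b is a word boundary: here 'ge' must begin a word.
word_start = re.findall(r'\bge\w*', 'geeks together get')
print(word_start)

# \B is the opposite: here 'ge' must not begin a word.
inside_word = re.findall(r'\Bge\w*', 'geeks together forge')
print(inside_word)

# \d matches digits and \D matches everything else.
only_digits = re.findall(r'\d', 'gee1k2')
non_digits = re.findall(r'\D', 'g1k2')
print(only_digits, non_digits)

# \Z anchors the pattern at the very end of the string.
ends = bool(re.search(r'ab\Z', 'abcdab'))
print(ends)
```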

Shortcuts

Let's discuss some of the shortcuts provided by the regular expression engine.

  • \w - matches a word character
  • \d - matches a digit character
  • \s - matches a whitespace character (space, tab, newline, etc.)
  • \b - matches a word boundary (a zero-length position, not a character)

Python3
import re

print('Geeks:', re.search(r'\bGeeks\b', 'Geeks'))
print('GeeksforGeeks:', re.search(r'\bGeeks\b', 'GeeksforGeeks'))

Output
Geeks: <re.Match object; span=(0, 5), match='Geeks'>
GeeksforGeeks: None

Beginning and End of String

The ^ character anchors a match at the beginning of a string and the $ character anchors it at the end of a string.

Python3
import re

# Beginning of String
match = re.search(r'^Geek', 'Campus Geek of the month')
print('Beg. of String:', match)

match = re.search(r'^Geek', 'Geek of the month')
print('Beg. of String:', match)

# End of String
match = re.search(r'Geeks$', 'Computer science portal-GeeksforGeeks')
print('End of String:', match)

Output
Beg. of String: None
Beg. of String: <re.Match object; span=(0, 4), match='Geek'>
End of String: <re.Match object; span=(32, 37), match='Geeks'>

Any Character

Outside a bracketed character class, the . character represents any single character except a newline.

Python3
import re

print('Any Character', re.search(r'p.th.n', 'python 3'))

Output
Any Character <re.Match object; span=(0, 6), match='python'>
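The behaviour of the . character at its edges is easy to check. The following sketch uses made-up strings to show that it matches exactly one character, skips newlines, and loses its special meaning inside a character class.

```python
import re

# '.' stands in for exactly one character, whatever it is.
one_char = re.findall(r'b.t', 'bat bit b@t bt')
print(one_char)

# By default '.' does not match a newline.
across_newline = re.search(r'a.b', 'a\nb')
print(across_newline)

# Inside a character class, '.' is a literal dot.
dots = re.findall(r'[.]', 'v1.2.3')
print(dots)
```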

More RegEx

Some of the other regular expressions are as follows:

  • Optional Characters
  • Repetition
  • Shorthand
  • Grouping
  • Lookahead
  • Substitution

Optional Characters

The regular expression engine lets you mark a character as optional with the ? character. The preceding character or character class may appear once or not at all. Let's consider the example of a word with an alternative spelling - color or colour.

Python3
import re

print('Color', re.search(r'colou?r', 'color'))
print('Colour', re.search(r'colou?r', 'colour'))

Output
Color <re.Match object; span=(0, 5), match='color'>
Colour <re.Match object; span=(0, 6), match='colour'>
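The same idea applies to a character class as well as a single character. A small sketch with arbitrary sample strings:

```python
import re

# The trailing 's' is optional, so both spellings are found.
schemes = re.findall(r'https?', 'http https ftp')
print(schemes)

# '?' after a character class makes the whole class optional.
files = re.findall(r'[Ff]?ile', 'File file ile')
print(files)
```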

Repetition

Repetition lets you match the same character or character class several times in a row. Consider a date that consists of day, month, and year. Let's use a regular expression to identify the date (dd-mm-yyyy).

Python3
import re

print('Date{dd-mm-yyyy}:', re.search(r'[\d]{2}-[\d]{2}-[\d]{4}',
                                     '18-08-2020'))

Output
Date{dd-mm-yyyy}: <re.Match object; span=(0, 10), match='18-08-2020'>

Here, the regular expression engine checks for two consecutive digits. Upon finding them, it moves to the hyphen character. Then it checks for the next two consecutive digits and a hyphen, and finally for four consecutive digits.

Let's discuss three other regular expressions under repetition.

Repetition ranges

A repetition range is useful when you have to accept more than one length. Consider a scenario where both three-digit and four-digit numbers are accepted. Let's have a look at the regular expression.

Python3
import re

print('Three Digit:', re.search(r'[\d]{3,4}', '189'))
print('Four Digit:', re.search(r'[\d]{3,4}', '2145'))

Output
Three Digit: <re.Match object; span=(0, 3), match='189'>
Four Digit: <re.Match object; span=(0, 4), match='2145'>

Open-Ended Ranges

There are scenarios where there is no upper limit on the repetition of a character. In such scenarios, you can leave the upper limit open by omitting it, as in {1,}. A common example is matching street addresses. Let's have a look:

Python3
import re

address = ('5th Floor, A-118, '
           'Sector-136, Noida, Uttar Pradesh - 201305')
print(re.search(r'[\d]{1,}', address))

Output
<re.Match object; span=(0, 1), match='5'>
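re.search stops at the first match, which is why only the 5 is reported. As a sketch of how an open-ended range behaves across the whole address, re.findall collects every run of digits. The variable names are chosen for this illustration.

```python
import re

address = '5th Floor, A-118, Sector-136, Noida, Uttar Pradesh - 201305'

# {1,} accepts one or more digits, so runs of any length are matched.
all_numbers = re.findall(r'\d{1,}', address)
print(all_numbers)

# Raising the lower bound to 3 drops the single-digit floor number.
long_numbers = re.findall(r'\d{3,}', address)
print(long_numbers)
```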

Shorthand

Shorthand characters allow you to use the + character to specify one or more ({1,}) and the * character to specify zero or more ({0,}).

Python3
import re

address = ('5th Floor, A-118, '
           'Sector-136, Noida, Uttar Pradesh - 201305')
print(re.search(r'[\d]+', address))

Output
<re.Match object; span=(0, 1), match='5'>
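The example above covers + only. The sketch below, with a made-up sample string, contrasts + with * and checks that each shorthand gives the same result as its brace form.

```python
import re

sample = 'ac abc abbbc adc'

# '*' allows zero or more 'b', so 'ac' is included.
star = re.findall(r'ab*c', sample)
print(star)

# '+' requires at least one 'b', so 'ac' is excluded.
plus = re.findall(r'ab+c', sample)
print(plus)

# The shorthands agree with their brace equivalents.
print(star == re.findall(r'ab{0,}c', sample))
print(plus == re.findall(r'ab{1,}c', sample))
```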

Grouping

Grouping is the process of separating an expression into groups by using parentheses, and it allows you to fetch each individual matching group.  

Python3
import re

grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})', '26-08-2020')
print(grp)

Output
<re.Match object; span=(0, 10), match='26-08-2020'>

Let's see some of its functionality.

Return the entire match

The re module allows you to return the entire match using the group() method.

Python3
import re

grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})', '26-08-2020')
print(grp.group())

Output
26-08-2020

Return a tuple of matched groups

You can use the groups() method to return a tuple that holds the individual matched groups.

Python3
import re

grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})', '26-08-2020')
print(grp.groups())

Output
('26', '08', '2020')

Retrieve a single group

By passing an index to the group method, you can retrieve just a single group. Group indexes start at 1; index 0 refers to the entire match.

Python3
import re

grp = re.search(r'([\d]{2})-([\d]{2})-([\d]{4})', '26-08-2020')
print(grp.group(3))

Output
2020

Name your groups

The re module allows you to name your groups. Let's look into the syntax.

Python3
import re

match = re.search(r'(?P<dd>[\d]{2})-(?P<mm>[\d]{2})-(?P<yyyy>[\d]{4})',
                  '26-08-2020')
print(match.group('mm'))

Output
08

Individual match as a dictionary

We have seen how a match object provides a tuple of individual groups. It can also provide the individual matches as a dictionary in which the name of each group acts as the dictionary key.

Python3
import re

match = re.search(r'(?P<dd>[\d]{2})-(?P<mm>[\d]{2})-(?P<yyyy>[\d]{4})',
                  '26-08-2020')
print(match.groupdict())

Output
{'dd': '26', 'mm': '08', 'yyyy': '2020'}
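One practical use of groupdict() is to hand the named parts to other code. The sketch below turns a dd-mm-yyyy string into a datetime.date; the helper name parse_date is hypothetical, and the use of re.fullmatch (so the whole string must be a date) is a choice made for this illustration.

```python
import re
from datetime import date

PATTERN = r'(?P<dd>\d{2})-(?P<mm>\d{2})-(?P<yyyy>\d{4})'


def parse_date(text):
    """Return a date for a dd-mm-yyyy string, or None if the shape is wrong."""
    match = re.fullmatch(PATTERN, text)
    if match is None:
        return None
    parts = match.groupdict()
    # date() itself raises ValueError for impossible dates such as 99-99-2020.
    return date(int(parts['yyyy']), int(parts['mm']), int(parts['dd']))


print(parse_date('26-08-2020'))
print(parse_date('2020-08-26'))
```

The regular expression only checks the shape of the input; range checks on day and month are left to the date constructor.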

Lookahead

A negated character class must still consume a character, so it won't match when there is no character left to check against it, for example at the end of the string. We can overcome this case by using a lookahead; it accepts or rejects a match based on the presence or absence of the content that follows, without consuming that content.

Python3
import re

print('negation:', re.search(r'n[^e]', 'Python'))
print('lookahead:', re.search(r'n(?!e)', 'Python'))

Output
negation: None
lookahead: <re.Match object; span=(5, 6), match='n'>

A lookahead can also reject the match unless it is followed by a particular character. This is called a positive lookahead, and it is written by replacing the ! character with the = character.

Python3
import re

print('positive lookahead', re.search(r'n(?=e)', 'jasmine'))

Output
positive lookahead <re.Match object; span=(5, 6), match='n'>
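Lookaheads also work with longer content than a single character. The following sketch uses a made-up price list and word list to show that the looked-ahead text decides the match but is never part of it.

```python
import re

prices = 'apple 30USD, pear 45EUR, plum 12USD'

# Positive lookahead: keep digits only when 'USD' follows them.
usd = re.findall(r'\d+(?=USD)', prices)
print(usd)

# The lookahead is zero-width: the match ends before 'USD'.
first = re.search(r'\d+(?=USD)', prices)
print(first.group(), first.end())

# Negative lookahead: a 'q' that is not followed by 'u'.
q_positions = [m.start() for m in re.finditer(r'q(?!u)', 'Iraq quit qatar')]
print(q_positions)
```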

Substitution

The re.sub method replaces the text matched by a regular expression and returns the resulting string. It is useful when you want to remove characters such as /, -, ., etc. before storing a value in a database. It takes three required arguments:

  • the regular expression
  • the replacement string
  • the source string being searched

Let's have a look at the code below, which removes the - characters from a credit card number.

Python3
import re

print(re.sub(r'([\d]{4})-([\d]{4})-([\d]{4})-([\d]{4})', r'\1\2\3\4',
             '1111-2222-3333-4444'))

Output
1111222233334444
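A few variations on re.sub are sketched below with made-up inputs: removing several separator characters with one character class, reordering groups with backreferences, and limiting the number of replacements with the optional count argument.

```python
import re

# One character class removes all three separators at once.
digits_only = re.sub(r'[-/.]', '', '1111-2222.3333/4444')
print(digits_only)

# Backreferences can reorder the captured groups (dd-mm-yyyy to yyyy-mm-dd).
iso = re.sub(r'(\d{2})-(\d{2})-(\d{4})', r'\3-\2-\1', '26-08-2020')
print(iso)

# count=1 replaces only the first occurrence.
first_only = re.sub(r'-', '', '1111-2222-3333-4444', count=1)
print(first_only)
```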

Compiled RegEx

The re.compile function returns a compiled regular expression (RegEx) object. This object has its own search and sub methods, so a developer can reuse the same pattern whenever it is needed.

Python3
import re

regex = re.compile(r'([\d]{2})-([\d]{2})-([\d]{4})')

# search method
print('compiled reg expr', regex.search('26-08-2020'))

# sub method
print(regex.sub(r'\1.\2.\3', '26-08-2020'))

Output
compiled reg expr <re.Match object; span=(0, 10), match='26-08-2020'>
26.08.2020
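Reuse is where a compiled object pays off. The sketch below applies one compiled pattern to several strings; the name DATE and the sample lines are assumptions made for this illustration.

```python
import re

# Compile once, then reuse the same object for every input.
DATE = re.compile(r'(\d{2})-(\d{2})-(\d{4})')

lines = ['due 26-08-2020', 'no date here', 'paid 01-09-2020']

# search returns None for lines without a date, so those are skipped.
found = [m.group() for m in map(DATE.search, lines) if m]
print(found)

# The compiled object also offers findall and sub.
pairs = DATE.findall('26-08-2020 and 01-09-2020')
print(pairs)

dotted = DATE.sub(r'\1.\2.\3', 'paid 01-09-2020')
print(dotted)
```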

Summary

RegEx is a powerful tool for data mining and data validation. However, avoid regular expressions when a more straightforward solution exists. Also, when you have to deal with complex structures such as a non-trivial document format, prefer libraries designed for that format.


sonugeorge