
Remove duplicate words from Sentence using Regular Expression

Last Updated : 09 Apr, 2024

Given a string str that represents a sentence, the task is to remove consecutive duplicate words from it using a regular expression, in programming languages such as C++, Java, C#, Python, and JavaScript.

Examples of Removing Duplicate Words from Sentences

Input: str = "Good bye bye world world" 
Output: Good bye world 
Explanation: We remove the second occurrence of bye and world from Good bye bye world world

Input: str = "Ram went went to to to his home" 
Output: Ram went to his home 
Explanation: We remove the second occurrence of went and the second and third occurrences of to from Ram went went to to to his home.

Input: str = "Hello hello world world" 
Output: Hello world 
Explanation: We remove the second occurrence of hello and world from Hello hello world world. The match is case-insensitive, so hello counts as a repeat of Hello.

Approach

1. Get the sentence.
2. Form a regular expression that matches a word followed by one or more repetitions of itself:

regex = "\\b(\\w+)(?:\\W+\\1\\b)+";

The backslashes are doubled because the pattern is written as a string literal. The parts of this regular expression are:

  • "\\b": A word boundary. It keeps a match from starting in the middle of a word. For example, in "My thesis is great", the "is" at the end of "thesis" is not treated as the first of two repeated words.
  • "(\\w+)": A capturing group that matches one or more word characters ([a-zA-Z_0-9]), that is, a whole word.
  • "(?:\\W+\\1\\b)+": A non-capturing group (denoted by (?:...)) that matches the repetitions of the captured word. It breaks down as follows:
    • "\\W+": One or more non-word characters (anything that is not a word character, such as spaces or punctuation).
    • "\\1": A backreference to the first capturing group (\\w+). It matches the same text that the group captured, so only a repetition of the same word can match.
    • "\\b": Another word boundary, which ensures that the repeated word is a whole word and not the beginning of a longer one.
    • "+": A quantifier that lets the non-capturing group (?:\\W+\\1\\b) match one or more times, so any number of consecutive repetitions is covered.

3. Find every match of the regex in the sentence and replace it with the first captured group, so that only one copy of the word remains. In Java, a Matcher for the sentence is obtained with Pattern.matcher().
4. Return the modified sentence.
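To make the role of each part of the regex concrete, here is a minimal Python sketch. The helper name show_matches and the NO_BOUNDARY pattern are illustrative assumptions and not part of the original approach. It lists what the whole pattern matches and what the first group captures, then shows what can go wrong when the leading word boundary is dropped.

```python
import re

# Same pattern as in the approach, written as a Python raw string.
PATTERN = re.compile(r"\b(\w+)(?:\W+\1\b)+", re.IGNORECASE)

# Hypothetical variant without the leading \b, for comparison only.
NO_BOUNDARY = re.compile(r"(\w+)(?:\W+\1\b)+", re.IGNORECASE)


def show_matches(sentence):
    """Return (full_match, captured_word) pairs for each run of repeats."""
    return [(m.group(0), m.group(1)) for m in PATTERN.finditer(sentence)]


print(show_matches("Ram went went to to to his home"))
# [('went went', 'went'), ('to to to', 'to')]

# With the leading \b, the "is" inside "thesis" cannot start a match.
print(show_matches("My thesis is great"))
# []

# Without it, "is is" is matched across "thesis is" and the sentence is damaged.
print(NO_BOUNDARY.sub(r"\1", "My thesis is great"))
# My thesis great
```

Replacing each full match with its captured word is what steps 3 and 4 describe.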

Below is the implementation of the above approach:

C++
// C++ program to remove duplicate words
// using Regular Expression or ReGex.
#include <iostream>
#include <regex>
#include <string>
using namespace std;

// Function to remove consecutive duplicate
// words from the sentence
string removeDuplicateWords(const string& s)
{
    // Regex matching a word followed by its repetitions.
    const regex pattern("\\b(\\w+)(?:\\W+\\1\\b)+",
                        regex_constants::icase);

    // Replace each run of repeated words with the first
    // captured word ($1). Replacing at the match position
    // avoids touching identical text elsewhere in the string.
    return regex_replace(s, pattern, "$1");
}

// Driver Code
int main()
{
    // Test Case: 1
    string str1 = "Good bye bye world world";
    cout << removeDuplicateWords(str1) << endl;

    // Test Case: 2
    string str2 = "Ram went went to to his home";
    cout << removeDuplicateWords(str2) << endl;

    // Test Case: 3
    string str3 = "Hello hello world world";
    cout << removeDuplicateWords(str3) << endl;

    return 0;
}

// This code is contributed by yuvraj_chandra
Java
// Java program to remove duplicate words
// Using Regular Expression or ReGex.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Driver Class
class GFG {
    // Function to remove consecutive duplicate
    // words from the sentence
    public static String removeDuplicateWords(String input)
    {
        // Regex matching a word followed by its repetitions.
        String regex = "\\b(\\w+)(?:\\W+\\1\\b)+";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        // Pattern class contains matcher() method
        // to match the given sentence against
        // the regular expression.
        Matcher m = p.matcher(input);

        // Replace every run of repeated words with the
        // first captured word ($1). This avoids treating
        // the matched text itself as a regex.
        return m.replaceAll("$1");
    }

    // Driver code
    public static void main(String args[])
    {
        // Test Case: 1
        String str1 = "Good bye bye world world";
        System.out.println(removeDuplicateWords(str1));

        // Test Case: 2
        String str2 = "Ram went went to to his home";
        System.out.println(removeDuplicateWords(str2));

        // Test Case: 3
        String str3 = "Hello hello world world";
        System.out.println(removeDuplicateWords(str3));
    }
}
Python3
# Python program to remove duplicate words
# using Regular Expression or ReGex.
import re


# Function to remove consecutive duplicate
# words from the sentence
def removeDuplicateWords(input):

    # Regex matching a word followed by its repetitions
    regex = r'\b(\w+)(?:\W+\1\b)+'

    # Replace each run of repeated words with the first captured word
    return re.sub(regex, r'\1', input, flags=re.IGNORECASE)


# Driver Code

# Test Case: 1
str1 = "Good bye bye world world"
print(removeDuplicateWords(str1))

# Test Case: 2
str2 = "Ram went went to to his home"
print(removeDuplicateWords(str2))

# Test Case: 3
str3 = "Hello hello world world"
print(removeDuplicateWords(str3))

# This code is contributed by yuvraj_chandra
C#
using System;
using System.Text.RegularExpressions;

class Program
{
    // Function to remove consecutive duplicate
    // words from the sentence
    static string RemoveDuplicateWords(string s)
    {
        // Regex matching a word followed by its repetitions.
        Regex pattern = new Regex(@"\b(\w+)(?:\W+\1\b)+", RegexOptions.IgnoreCase);

        // Replace each run of repeated words with the
        // first captured word ($1).
        return pattern.Replace(s, "$1");
    }

    // Driver Code
    static void Main()
    {
        // Test Case: 1
        string str1 = "Good bye bye world world";
        Console.WriteLine(RemoveDuplicateWords(str1));

        // Test Case: 2
        string str2 = "Ram went went to to his home";
        Console.WriteLine(RemoveDuplicateWords(str2));

        // Test Case: 3
        string str3 = "Hello hello world world";
        Console.WriteLine(RemoveDuplicateWords(str3));
    }
}
JavaScript
// Function to remove duplicate words using Regular Expression
function removeDuplicateWords(input) {
    // Regular expression to match repeated words
    let regex = /\b(\w+)(?:\W+\1\b)+/gi;

    // Replace duplicate words with the first occurrence
    return input.replace(regex, '$1');
}

// Test cases
// Test Case: 1
let str1 = "Good bye bye world world";
console.log(removeDuplicateWords(str1));

// Test Case: 2
let str2 = "Ram went went to to his home";
console.log(removeDuplicateWords(str2));

// Test Case: 3
let str3 = "Hello hello world world";
console.log(removeDuplicateWords(str3));

Output
Good bye world
Ram went to his home
Hello world
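Two behaviours of this regex are easy to miss, so here is a small hedged sketch of them in Python. The function name remove_duplicate_words and the two sample sentences are illustrative assumptions and do not come from the test cases above.

```python
import re


def remove_duplicate_words(sentence):
    # Collapse each run of adjacent repeated words into its first word.
    return re.sub(r"\b(\w+)(?:\W+\1\b)+", r"\1", sentence, flags=re.IGNORECASE)


# Punctuation between the repeats is part of the match, so it is removed too.
print(remove_duplicate_words("Hello, hello world"))   # Hello world

# Only consecutive repeats are matched; a word repeated later is kept.
print(remove_duplicate_words("the cat and the dog"))  # the cat and the dog
```

If repeats anywhere in the sentence need to be removed, this pattern is not sufficient on its own, because \W+ only allows non-word characters between the word and its repetition.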

Complexity of the above Programs

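Time Complexity : O(n) for typical sentences, where n is the length of the string. Backtracking regex engines do not guarantee linear time in general for patterns that use backreferences.
Auxiliary Space : O(n) for the resulting string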


prashant_srivastava