Python Program to Split joined consecutive similar characters
Last Updated : 18 May, 2023
Given a string, the task is to write a Python program that splits it wherever the character changes, so that each run of identical consecutive characters becomes a separate string.
Input : test_str = 'ggggffggisssbbbeessssstt'
Output : ['gggg', 'ff', 'gg', 'i', 'sss', 'bbb', 'ee', 'sssss', 'tt']
Explanation : All similar consecutive characters are converted to separate strings.
Input : test_str = 'ggggffgg'
Output : ['gggg', 'ff', 'gg']
Explanation : All similar consecutive characters are converted to separate strings.
Method #1 : Using join() + list comprehension + groupby()
In this method, groupby() groups consecutive identical characters, join() rebuilds each group into a string, and a list comprehension iterates over the groups to collect the results.
```python
# Python3 code to demonstrate working of
# Split joined consecutive similar characters
# Using join() + list comprehension + groupby()
from itertools import groupby

# initializing string
test_str = 'ggggffggisssbbbeessssstt'

# printing original string
print("The original string is : " + str(test_str))

# groupby groups the elements, join joins consecutive groups
res = ["".join(group) for ele, group in groupby(test_str)]

# printing result
print("Consecutive split string is : " + str(res))
```

Output
```
The original string is : ggggffggisssbbbeessssstt
Consecutive split string is : ['gggg', 'ff', 'gg', 'i', 'sss', 'bbb', 'ee', 'sssss', 'tt']
```
Time Complexity: O(n)
Auxiliary Space: O(n)
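To make the grouping step more concrete, the sketch below returns both the key and the joined group that groupby() yields for each run. The helper name `show_runs` is an assumption made for illustration and is not part of the original code.

```python
from itertools import groupby


def show_runs(text):
    """Return (key, run) pairs for each run of identical consecutive characters.

    `show_runs` is a hypothetical helper name used only for this illustration.
    """
    # groupby() yields a (key, iterator) pair each time the character changes;
    # each iterator is consumed by join() before the next group is requested.
    return [(key, "".join(group)) for key, group in groupby(text)]


pairs = show_runs("ggggffgg")
print(pairs)  # [('g', 'gggg'), ('f', 'ff'), ('g', 'gg')]

# Keeping only the runs reproduces the result of Method #1.
print([run for _, run in pairs])  # ['gggg', 'ff', 'gg']
```

The key is the character that defines the run, which is why the original comprehension can ignore it and keep only the joined group.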
Method #2 : Using finditer() + regex + list comprehension
In this method, a regular expression matches each run of consecutive identical characters, and finditer() returns an iterator over those matches in the string.
```python
# Python3 code to demonstrate working of
# Split joined consecutive similar characters
# Using finditer() + regex + list comprehension
import re

# initializing string
test_str = 'ggggffggisssbbbeessssstt'

# printing original string
print("The original string is : " + str(test_str))

# list comprehension iterates over all the runs found by the regex
# \D matches any non-digit character, so digits are skipped;
# use \d instead if runs of consecutive digits need to be found
res = [iters.group(0) for iters in re.finditer(r"(\D)\1*", test_str)]

# printing result
print("Consecutive split string is : " + str(res))
```

Output
```
The original string is : ggggffggisssbbbeessssstt
Consecutive split string is : ['gggg', 'ff', 'gg', 'i', 'sss', 'bbb', 'ee', 'sssss', 'tt']
```
The time and space complexity of both methods above is:
Time Complexity: O(n)
Space Complexity: O(n)
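Because \D matches only non-digit characters, the pattern in Method #2 skips any digits in the input. The sketch below contrasts it with (.)\1* compiled with re.DOTALL, which accepts every character, including digits and newlines. The function name `split_runs_regex` and the sample string are assumptions for illustration.

```python
import re

# (.) captures any single character (re.DOTALL lets "." match "\n" too);
# \1* then extends the match over every immediate repeat of that character.
ANY_RUN = re.compile(r"(.)\1*", re.DOTALL)


def split_runs_regex(text):
    """Split text into runs of identical characters (hypothetical helper)."""
    return [match.group(0) for match in ANY_RUN.finditer(text)]


sample = "aa11b222"  # illustrative input that mixes letters and digits

# The \D pattern from Method #2 drops the digit runs.
print([m.group(0) for m in re.finditer(r"(\D)\1*", sample)])  # ['aa', 'b']

# The "." pattern keeps them.
print(split_runs_regex(sample))  # ['aa', '11', 'b', '222']
```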
Method #3 : Using a for loop
This approach uses a for loop to iterate over the characters in the string and a temporary string to store the consecutive similar characters. The time and space complexity of this approach is O(n).
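```python
def split_consecutive_characters(test_str):
    # an empty string has no runs (and test_str[0] would raise IndexError)
    if not test_str:
        return []
    n = len(test_str)
    result = []
    temp = test_str[0]  # the run currently being built
    for i in range(1, n):
        if test_str[i] == test_str[i-1]:
            # same character as the previous one: extend the current run
            temp += test_str[i]
        else:
            # character changed: store the finished run and start a new one
            result.append(temp)
            temp = test_str[i]
    # store the final run
    result.append(temp)
    return result

test_str = 'ggggffggisssbbbeessssstt'
print(split_consecutive_characters(test_str))
```

Output
```
['gggg', 'ff', 'gg', 'i', 'sss', 'bbb', 'ee', 'sssss', 'tt']
```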
Time Complexity: O(n), where n is the length of the input string.
Auxiliary Space: O(n)
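A variation on the loop above tracks the start index of the current run and slices the run out once it ends, rather than growing a temporary string one character at a time. This is only a sketch of that idea; the name `split_runs_by_index` is illustrative and does not come from the original code.

```python
def split_runs_by_index(text):
    """Split text into runs of identical characters using slice boundaries.

    Hypothetical helper; returns [] for an empty string.
    """
    result = []
    start = 0  # index where the current run begins
    for i in range(1, len(text) + 1):
        # A run ends at the end of the string or when the character changes.
        if i == len(text) or text[i] != text[start]:
            result.append(text[start:i])
            start = i
    return result


print(split_runs_by_index("ggggffgg"))  # ['gggg', 'ff', 'gg']
print(split_runs_by_index(""))          # []
```

Each character is copied exactly once, when its run is sliced out, and the empty string needs no special case because the loop body never runs.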
Method #4 : Using re.findall()
In this approach, re.findall() finds every run of consecutive identical characters in the input string. In the pattern r"((\w)\2*)", the inner group (\w) captures one word character (a letter, digit, or underscore), \2* extends the match over its immediate repeats, and the outer group captures the whole run. Because the pattern has two groups, findall() returns one tuple per match. We take the first element of each tuple, which is the run itself, and return these as a list. Characters that \w does not match, such as spaces and punctuation, are skipped.
- Define a regular expression pattern to match consecutive characters.
- Use re.findall() method to extract all the groups of consecutive characters from the input string.
- Return a list of the matched groups.
```python
import re

def split_consecutive_chars(test_str):
    pattern = r"((\w)\2*)"
    groups = re.findall(pattern, test_str)
    return [group[0] for group in groups]

test_str = 'ggggffggisssbbbeessssstt'
print(split_consecutive_chars(test_str))
```

Output
```
['gggg', 'ff', 'gg', 'i', 'sss', 'bbb', 'ee', 'sssss', 'tt']
```
Time Complexity: O(n), where n is the length of the input string. For this pattern, each character is consumed once as part of a run or rejected once as a non-word character, so the scan is linear.
Space Complexity: O(n), where n is the length of the input string. This is because we need to store the matched groups in a list. The regular expression pattern itself does not require any extra space.
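The reason for the group[0] step is easier to see by looking at what re.findall() returns when a pattern contains more than one capturing group: a list of tuples with one element per group. The sketch below shows this on a short input, and also that \w skips characters such as spaces. The sample strings are assumptions chosen for illustration.

```python
import re

pattern = r"((\w)\2*)"

# With two capturing groups, findall() returns (outer, inner) tuples:
# outer is the whole run, inner is the single repeated character.
raw = re.findall(pattern, "ggggffgg")
print(raw)  # [('gggg', 'g'), ('ff', 'f'), ('gg', 'g')]

# Keeping only the outer group gives the list of runs.
print([outer for outer, _ in raw])  # ['gggg', 'ff', 'gg']

# \w matches letters, digits and underscore only, so the spaces are skipped.
print([outer for outer, _ in re.findall(pattern, "aa  bb")])  # ['aa', 'bb']
```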
Method #5 : Using recursion
Algorithm:
- Check if the input string is empty. If yes, return an empty list.
- Set the variable 'first' to the first character of the input string.
- Set the variable 'i' to 1.
- While 'i' is less than the length of the input string and the 'i-th' character of the input string is equal to the 'first' character, increment 'i'.
- Create a new list with the current consecutive characters, which is the 'first' character repeated 'i' times, and concatenate it with the result of recursively calling the function on the remaining string after the current consecutive characters.
- Return the final list.
```python
def split_consecutive_characters(test_str):
    if not test_str:
        return []
    first = test_str[0]
    i = 1
    while i < len(test_str) and test_str[i] == first:
        i += 1
    return [first * i] + split_consecutive_characters(test_str[i:])

test_str = 'ggggffggisssbbbeessssstt'

# printing original string
print("The original string is : " + str(test_str))
print(split_consecutive_characters(test_str))
# This code is contributed by Jyothi Pinjala
```

Output
```
The original string is : ggggffggisssbbbeessssstt
['gggg', 'ff', 'gg', 'i', 'sss', 'bbb', 'ee', 'sssss', 'tt']
```
Time Complexity: O(n^2) in the worst case, where n is the length of the input string. Finding the run boundaries touches each character once, but every recursive call also copies the rest of the string with test_str[i:] and builds a new list with +. When no two adjacent characters are equal there are n calls, each doing O(n) work. For inputs with only a few long runs the cost stays close to O(n).
Space Complexity: O(n) for the returned list, which holds up to n runs when no two adjacent characters are equal. The recursion depth equals the number of runs, so it can also reach n, because each call shortens the string by at least one character. Each pending call keeps its own slice of the string, so the worst-case auxiliary space is O(n^2), and an input with more runs than Python's recursion limit (1000 by default in CPython) raises RecursionError.
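Each run adds one recursive call, so the recursion depth equals the number of runs in the input. The sketch below repeats the recursive function so that it runs on its own and wraps it in a hypothetical helper, `safe_split`, that reports when Python's recursion limit is exceeded. The wrapper and the alternating test input are assumptions for illustration.

```python
import sys


def split_consecutive_characters(test_str):
    # Same recursive function as above, repeated so this sketch is self-contained.
    if not test_str:
        return []
    first = test_str[0]
    i = 1
    while i < len(test_str) and test_str[i] == first:
        i += 1
    return [first * i] + split_consecutive_characters(test_str[i:])


def safe_split(test_str):
    """Return the runs, or None if the recursion limit is hit (hypothetical wrapper)."""
    try:
        return split_consecutive_characters(test_str)
    except RecursionError:
        return None


# Few runs: the recursion stays shallow and the result is returned normally.
print(safe_split("ggggffgg"))

# An alternating string has one run per character, so the required depth
# (twice the recursion limit here) cannot be reached and None is returned.
long_input = "ab" * sys.getrecursionlimit()
print(safe_split(long_input) is None)
```

For inputs that may contain many short runs, the iterative or groupby()-based methods avoid this limit entirely.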