Python – Remove Non-English characters Strings from List

Last Updated : 10 Apr, 2023

Given a list of strings, remove every string that contains non-English characters.

Input : test_list = ['Good| ????', '??Geeks???']
Output : []
Explanation : Both contain non-English characters.

Input : test_list = ["Gfg", "Best"]
Output : ["Gfg", "Best"]
Explanation : Both are valid English words.
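
As a minimal baseline for this task, the sketch below assumes that "English" can be approximated by "ASCII-only", which the article does not state. It relies on str.isascii() (available from Python 3.7). The function name remove_non_ascii_strings and the sample strings are illustrative only.

```python
def remove_non_ascii_strings(strings):
    """Keep only the strings made up entirely of ASCII characters."""
    return [s for s in strings if s.isascii()]


# Non-English characters are written as \u escapes so the sample survives
# encoding problems; these sample values are hypothetical.
samples = ["Gfg", "Good| \u597d\u7684", "Best", "\u4f60\u597dGeeks"]

print(remove_non_ascii_strings(samples))  # ['Gfg', 'Best']
```

Note that this baseline also keeps digits and punctuation, since they are ASCII too.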

Method #1 : Using regex + findall() + list comprehension

In this method, we build a regex character class that matches any character outside a set of allowed Unicode ranges, and keep only the strings in which findall() finds no such character.

  • Initialize a list called test_list with sample strings, some of which contain non-English characters.
  • Print the original list using print().
  • For each string in test_list, call re.findall() to look for characters outside the allowed ranges.
  • The regular expression "[^\u0000-\u05C0\u2100-\u214F]+" matches any run of characters outside the ranges \u0000-\u05C0 and \u2100-\u214F. The first range covers Basic Latin (ASCII) and the Latin extensions, but also Greek, Cyrillic, Armenian, and part of the Hebrew block; the second is the Letterlike Symbols block. The filter is therefore broader than English: it rejects scripts such as CJK, Arabic, or Devanagari, but accepts Greek or Cyrillic text.
  • The list comprehension builds a new list, res, containing only the strings for which findall() returns an empty list.
  • Print the filtered list using print().

Below is the implementation of the above approach:

Python3




# Python3 code to demonstrate working of
# Remove Non-English characters Strings from List
# Using regex + findall() + list comprehension
import re
 
# initializing list
test_list = ['Gfg', 'Good| ????', "for",  '??Geeks???']
 
# printing original list
print("The original list is : " + str(test_list))
 
# using findall() to neglect unicode of Non-English alphabets
res = [idx for idx in test_list if not re.findall("[^\u0000-\u05C0\u2100-\u214F]+", idx)]
 
# printing result
print("The extracted list : " + str(res))
 
 
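Output
The original list is : ['Gfg', 'Good| ????', 'for', '??Geeks???']
The extracted list : ['Gfg', 'Good| ????', 'for', '??Geeks???']

Note that the ? characters in this sample are plain ASCII question marks, which fall inside the allowed ranges, so no string is removed here. A string containing characters outside those ranges (for example, CJK ideographs) would be dropped.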

Time complexity: O(n*k), where n is the length of the input list and k is the average length of the strings in the list.
Auxiliary space: O(m), where m is the length of the output list.
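
To see which strings the character class from Method #1 actually rejects, the following sketch runs it on samples written with explicit \u escapes. The helper name keep_allowed and the sample strings are assumptions made for illustration; the pattern itself is the one used above, precompiled and used with search() instead of findall().

```python
import re

# Same character class as Method #1. In a raw string the \uXXXX escapes
# are interpreted by the regex engine rather than by the Python parser.
DISALLOWED = re.compile(r"[^\u0000-\u05C0\u2100-\u214F]")


def keep_allowed(strings):
    """Keep strings that contain no character outside the allowed ranges."""
    return [s for s in strings if not DISALLOWED.search(s)]


samples = [
    "Gfg",                                 # plain ASCII
    "Good| \u597d\u7684",                  # contains CJK ideographs
    "\u041f\u0440\u0438\u0432\u0435\u0442",  # Cyrillic word
    "caf\u00e9",                           # Latin letter with an accent
]

kept = keep_allowed(samples)
print(kept)
```

Only the string with CJK ideographs is dropped; the Cyrillic and accented Latin strings pass, which shows that this range check is wider than English.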

Method #2 : Using regex + search() + filter() + lambda

In this method, we use search() to look for any character that is not an English letter or whitespace, and keep only the strings in which no such character is found. filter() with a lambda applies this check to every element of the list.

Python3




# Python3 code to demonstrate working of
# Remove Non-English characters Strings from List
# Using regex + search() + filter() + lambda
import re
 
# initializing list
test_list = ['Gfg', 'Good| ????', "for",  '??Geeks???']
 
# printing original list
print("The original list is : " + str(test_list))
 
# using search() to keep only strings with no character
# other than English letters and whitespace
res = list(filter(lambda ele: re.search(r"[^a-zA-Z\s]", ele) is None, test_list))
 
# printing result
print("The extracted list : " + str(res))
 
 
Output
The original list is : ['Gfg', 'Good| ????', 'for', '??Geeks???']
The extracted list : ['Gfg', 'for']

Time Complexity: O(n*k), where n is the length of the input list and k is the average length of the strings in the list.
Auxiliary Space: O(n)
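
A related way to express the same kind of check is re.fullmatch(), which succeeds only if the whole string matches the pattern, whereas re.search() succeeds as soon as any part of the string matches. The sketch below is an illustrative variant, not the article's method; the name only_english and the sample strings are assumptions.

```python
import re

# One or more English letters or whitespace characters, nothing else.
ENGLISH_ONLY = re.compile(r"[A-Za-z\s]+")


def only_english(strings):
    """Keep strings consisting solely of English letters and whitespace."""
    return [s for s in strings if ENGLISH_ONLY.fullmatch(s)]


samples = ["Gfg", "Best of all", "Good| \u597d\u7684", "\u597d\u7684Geeks"]

print(only_english(samples))  # ['Gfg', 'Best of all']

# search() on the same pattern matches any string that contains
# at least one English letter, so mixed strings would slip through.
print(bool(ENGLISH_ONLY.search("\u597d\u7684Geeks")))  # True
```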

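Method #3: Using for loop

Unlike the two methods above, which drop whole strings, this method keeps every string and removes the non-English characters from it. We loop over the characters of each string and keep a character only if it is an English letter.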

Python3




# Python3 code to demonstrate working of
# Remove Non-English characters Strings from List
 
# initializing list
test_list = ['Gfg', 'Good| ????', "for", '??Geeks???']
 
# printing original list
print("The original list is : " + str(test_list))
loweralphabets="abcdefghijklmnopqrstuvwxyz"
upperalphabets="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
x=loweralphabets+upperalphabets
res=[]
for i in test_list:
    a=""
    for j in i:
        if j in x:
            a+=j
    res.append(a)
             
# printing result
print("The extracted list : " + str(res))
 
 
Output
The original list is : ['Gfg', 'Good| ????', 'for', '??Geeks???']
The extracted list : ['Gfg', 'Good', 'for', 'Geeks']

Time complexity: O(n*m), where n is the length of the input list and m is the maximum length of a string in the list.
Auxiliary space: O(n*m), as we are creating a new list to store the filtered strings.
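
The same character-by-character filtering can be sketched more compactly with string.ascii_letters and str.join(). This is one possible restructuring, not the article's code; the function name strip_non_english and the sample strings are illustrative.

```python
import string

# A frozenset gives constant-time membership tests for the 52 English letters.
ENGLISH_LETTERS = frozenset(string.ascii_letters)


def strip_non_english(strings):
    """Return copies of the strings with everything but A-Z and a-z removed."""
    return ["".join(ch for ch in s if ch in ENGLISH_LETTERS) for s in strings]


samples = ["Gfg", "Good| \u597d\u7684", "for", "\u770bGeeks\u4e0d"]

print(strip_non_english(samples))  # ['Gfg', 'Good', 'for', 'Geeks']
```

Building the result with join() avoids repeated string concatenation inside the loop.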

Method 4: Using the unicodedata library

Step-by-step approach:

  • Import the unicodedata library.
  • Define a function is_english that takes a character as input and returns True if the character is a letter whose Unicode name starts with LATIN, otherwise False. This covers the English letters, but also accented Latin letters such as é.
  • Define a function remove_non_english that takes a list of strings as input, and returns a new list with only the accepted letters from the original strings.
  • In the remove_non_english function, iterate through each string in the input list using a for loop.
  • For each string, convert it to a list of characters using the list function.
  • Use the filter function with the is_english function as the filter condition to keep only the accepted letters in the list.
  • Use the join function to convert the filtered list of characters back into a string.
  • Append the filtered string to the output list.
  • Return the output list.

Below is the implementation of the above approach:
 

Python3




import unicodedata
 
def is_english(c):
    return c.isalpha() and unicodedata.name(c).startswith(('LATIN', 'COMMON'))
 
def remove_non_english(lst):
    output = []
    for s in lst:
        filtered = filter(is_english, list(s))
        english_str = ''.join(filtered)
        output.append(english_str)
    return output
 
# initializing list
test_list = ['Gfg', 'Good| ????', "for", '??Geeks???']
 
# printing original list
print("The original list is : " + str(test_list))
 
# printing result
print("The extracted list : " + str(remove_non_english(test_list)))
 
 
Output
The original list is : ['Gfg', 'Good| ????', 'for', '??Geeks???']
The extracted list : ['Gfg', 'Good', 'for', 'Geeks']

Time complexity: O(nk) where n is the length of the input list and k is the length of the longest string in the input list. 
Auxiliary space: O(nk) since we are storing the filtered strings in the output list.
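
To make the name-based test more concrete, the sketch below prints the Unicode names that unicodedata.name() reports for a few characters and applies a LATIN-prefix check to them. The helper name is_latin_letter and the chosen characters are assumptions for illustration; passing a default to unicodedata.name() avoids a ValueError for characters that have no name.

```python
import unicodedata


def is_latin_letter(ch):
    """True if ch is alphabetic and its Unicode name starts with LATIN."""
    # The second argument is returned instead of raising ValueError
    # when the character has no name.
    return ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")


for ch in ["G", "\u00e9", "\u0416", "\u597d", "|"]:
    print(repr(ch), unicodedata.name(ch, "<unnamed>"), is_latin_letter(ch))

results = [is_latin_letter(ch) for ch in ["G", "\u00e9", "\u0416", "\u597d", "|"]]
print(results)  # [True, True, False, False, False]
```

The accented letter passes the check, so a name-based filter of this kind keeps Latin-script letters in general rather than only A-Z and a-z.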

Method #5: Using the ord() function

Use the ord() function to determine whether a character is an English letter. English letters have ASCII values from 65 to 90 (uppercase) and from 97 to 122 (lowercase).

Here’s the step-by-step approach:

  1. Define a function is_english(c) that takes a character as input and returns True if the character is an English alphabet and False otherwise. We can use the ord() function to get the ASCII value of the character and compare it with the ASCII values of English alphabets.
  2. Define a function remove_non_english(lst) that takes a list of strings as input and returns a list of strings with non-English characters removed. We can iterate through each string in the input list and iterate through each character in the string. If a character is English, we add it to a new string. If not, we skip it. We append the new string to an output list.
  3. Initialize a list test_list with some sample input strings.
  4. Call the remove_non_english() function with the test_list as input.
  5. Print the original and extracted lists.

Python3




def is_english(c):
    ascii_value = ord(c)
    return (ascii_value >= 65 and ascii_value <= 90) or (ascii_value >= 97 and ascii_value <= 122)
 
def remove_non_english(lst):
    output = []
    for s in lst:
        english_str = ""
        for c in s:
            if is_english(c):
                english_str += c
        output.append(english_str)
    return output
 
# initializing list
test_list = ['Gfg', 'Good| ????', "for", '??Geeks???']
 
# printing original list
print("The original list is : " + str(test_list))
 
# printing result
print("The extracted list : " + str(remove_non_english(test_list)))
 
 
Output
The original list is : ['Gfg', 'Good| ????', 'for', '??Geeks???']
The extracted list : ['Gfg', 'Good', 'for', 'Geeks']

Time Complexity: O(n*m), where n is the number of strings in the input list and m is the length of the longest string in the list.
Auxiliary Space: O(n*m), where n is the number of strings in the input list and m is the length of the longest string in the list. 
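
The range test above can also be written with Python's chained comparisons, as in the sketch below. It is an illustrative rewrite under the same assumption that only A-Z and a-z count as English letters; the names is_english_letter and strip_non_english are hypothetical.

```python
def is_english_letter(ch):
    """Return True for A-Z and a-z only, based on their code points."""
    code = ord(ch)
    return 65 <= code <= 90 or 97 <= code <= 122


def strip_non_english(strings):
    """Remove every character that is not an English letter."""
    return ["".join(ch for ch in s if is_english_letter(ch)) for s in strings]


# Boundary code points of the two ranges.
print(ord("A"), ord("Z"), ord("a"), ord("z"))  # 65 90 97 122

print(strip_non_english(["Good| \u597d\u7684", "caf\u00e9"]))  # ['Good', 'caf']
```

For single characters, the check gives the same answer as ch.isascii() and ch.isalpha(), which may read more clearly in some code bases.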

Method #6: Using the translate() method

Step-by-step approach:

  • Initialize a translation table that will be used to remove unwanted characters from the strings. This is done using the str.maketrans() method with three arguments: the first two are empty strings, meaning that no character is mapped to another, and the third lists the characters (digits, punctuation, and Latin-1 symbols and accented letters) that are mapped to None, i.e. deleted. Characters that are not listed in the third argument, including letters from other scripts, are left unchanged.
  • Initialize a list called result to store the modified strings.
  • Iterate over each string in the test_list using a for loop.
  • Apply the translation table to the current string using the translate() method and passing the translation table as an argument.
  • Append the modified string to the result list.
  • Print the resulting list using the print() function and passing the string representation of result.

Below is the implementation of the above approach:

Python3




# Python3 code to demonstrate working of
# Remove Non-English characters Strings from List
 
# initializing list
test_list = ['Gfg', 'Good| ????', "for", '??Geeks???']
 
# printing original list
print("The original list is : " + str(test_list))
 
# create a translation table to remove non-English characters
non_english = str.maketrans("", "", "0123456789!@#$%^&*()_+-=[]{}\\|;:'\",./<>?`~¡¢£¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ")
 
# initialize an empty list to store modified strings
result = []
 
# iterate over each string in the test_list
for string in test_list:
    # apply the translation table to remove non-English characters
    modified_string = string.translate(non_english)
    # append the modified string to the result list
    result.append(modified_string)
 
# print the resulting list
print("The extracted list : " + str(result))
 
 
Output
The original list is : ['Gfg', 'Good| ????', 'for', '??Geeks???']
The extracted list : ['Gfg', 'Good ', 'for', 'Geeks']

Time complexity: O(n*m), where n is the length of the test_list and m is the maximum length of a string in the list. 
Auxiliary space: O(n*m), where n is the length of the test_list and m is the maximum length of a string in the list. 
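
The three-argument form of str.maketrans() can be seen more easily on a small deletion list. The sketch below uses a short, hypothetical set of characters to delete and a helper named delete_listed; it also shows that a character missing from the list is left in place, which is the main limitation of a deletion list.

```python
# Three-argument form: no replacements, only deletions.
table = str.maketrans("", "", "|?!0123456789")

# The table maps the code point of each listed character to None.
print(table[ord("|")])  # None


def delete_listed(strings, table):
    """Apply the translation table to every string in the list."""
    return [s.translate(table) for s in strings]


print(delete_listed(["Good| ??", "Gfg123"], table))  # ['Good ', 'Gfg']

# A character that is not in the deletion list is kept as-is.
print("Good| \u597d".translate(table) == "Good \u597d")  # True
```

Because only listed characters are removed, this approach suits cases where the set of unwanted characters is known in advance.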



author
manjeet_04