Lancaster Stemming Technique in NLP

Last Updated: 19 Dec, 2024

The Lancaster Stemmer, also known as the Paice-Husk Stemmer, is a rule-based algorithm used in natural language processing to reduce words to their root forms (stems). Developed by C.D. Paice in 1990, it applies its rules aggressively to strip suffixes such as "-ing" or "-ed".

Prerequisites: NLP Pipeline, Stemming

Implementing Lancaster Stemming

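You can run the Lancaster Stemmer from Python. The example below uses the third-party 'stemming' library, whose 'stemming.paicehusk' module implements the Paice-Husk algorithm, together with NLTK for tokenization. Both can be installed with the following command (drop the leading "!" when running it in a terminal instead of a Jupyter notebook):

!pip install stemming nltk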

Now, proceed with the implementation:

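```python
import nltk
from nltk.tokenize import word_tokenize
from stemming.paicehusk import stem

# Download the tokenizer models used by word_tokenize (only needed once)
nltk.download('punkt_tab')

text = "The cats are running swiftly."

# Split the sentence into word and punctuation tokens
words = word_tokenize(text)

# Apply the Paice-Husk (Lancaster) stemmer to each token
stemmed_words = [stem(word) for word in words]

print("Original words:", words)
print("Stemmed words:", stemmed_words)
```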

Output:

Original words: ['The', 'cats', 'are', 'running', 'swiftly', '.']

Stemmed words: ['Th', 'cat', 'ar', 'run', 'swiftli', '.']

How the Lancaster Stemmer Works

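The Lancaster Stemmer works iteratively. Its rules are grouped by the last letter of a word, and each rule names an ending, how many letters to remove, optional letters to append, and whether to continue or stop afterwards. Some rules fire only if the word is still intact, that is, if no rule has been applied to it yet. A matching rule is applied only if the remaining stem passes an acceptability check: a stem that begins with a vowel must keep at least two letters, and a stem that begins with a consonant must keep at least three letters, including a vowel or "y". The stemmer keeps applying rules until none matches or a rule tells it to stop. For example, "running" first loses "-ing" to become "runn" (and "runner" loses "-er" to the same "runn"), and a second rule then turns the final "nn" into "n", giving "run". This repeated stripping is what makes the algorithm aggressive.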
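To make the loop concrete, here is a minimal Python sketch of a Paice-Husk-style rule engine. It is not the code of the 'stemming' package or of NLTK: it uses only a small subset of rules, written in the compact notation of Paice's rule table (the same notation NLTK's LancasterStemmer uses for its default rules), and a simplified form of the acceptability check described above. The names RULES, parse_rules, acceptable and stem are hypothetical, chosen for this illustration.

```python
import re

# A small SUBSET of Paice-Husk rules in compact notation:
# reversed ending, optional "*" (word must still be intact), number of letters
# to remove, optional letters to append, then ">" (continue) or "." (stop).
# The real table has over 100 rules; only a few are included here.
RULES = [
    "de2>",    # -ed   -> remove, continue
    "e1>",     # -e    -> remove, continue
    "gni3>",   # -ing  -> remove, continue
    "nn1.",    # -nn   -> -n, stop
    "re2>",    # -er   -> remove, continue
    "rae0.",   # -ear  -> protect, stop
    "ra2.",    # -ar   -> remove, stop
    "ssen4>",  # -ness -> remove, continue
    "ss0.",    # -ss   -> protect, stop
    "s*1>",    # -s    -> remove, only if no rule has fired yet
    "s0.",     # any other -s -> protect, stop
    "yl2>",    # -ly   -> remove, continue
    "yti3>",   # -ity  -> remove, continue
]

RULE_PATTERN = re.compile(r"^([a-z]+)(\*?)(\d)([a-z]*)([.>])$")
VOWELS = "aeiou"


def parse_rules(rules):
    """Group parsed rules by the last letter of the ending they match,
    keeping their original order within each group."""
    table = {}
    for rule in rules:
        rev_ending, star, remove, append, action = RULE_PATTERN.match(rule).groups()
        ending = rev_ending[::-1]
        table.setdefault(ending[-1], []).append(
            (ending, star == "*", int(remove), append, action == ">")
        )
    return table


TABLE = parse_rules(RULES)


def acceptable(stem_candidate):
    """Simplified acceptability check on the stem left after removal."""
    if not stem_candidate:
        return False
    if stem_candidate[0] in VOWELS:
        return len(stem_candidate) >= 2
    # Consonant start: at least 3 letters, including a vowel or "y"
    return len(stem_candidate) >= 3 and any(ch in VOWELS + "y" for ch in stem_candidate)


def stem(word):
    word = word.lower()
    intact = True
    while word:
        for ending, needs_intact, remove, append, cont in TABLE.get(word[-1], []):
            if needs_intact and not intact:
                continue
            if not word.endswith(ending):
                continue
            base = word[: len(word) - remove]
            if not acceptable(base):
                continue  # rule would leave an unacceptable stem: try the next rule
            word = base + append
            intact = False
            if not cont:
                return word  # "." rule: stop immediately
            break  # ">" rule: restart from the new last letter
        else:
            return word  # no rule matched: stemming is finished
    return word


for w in ["running", "cats", "are", "the", "swiftly", "university", "universe"]:
    print(w, "->", stem(w))
```

With this subset, "running" becomes "run", "cats" becomes "cat", "are" becomes "ar" and "swiftly" becomes "swift", while "the" is left unchanged because "th" fails the acceptability check. Both "university" (via "-ity") and "universe" (via "-e") end up as "univers", the kind of conflation discussed under the limitations below.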

Key Features and Benefits of the Lancaster Stemmer

  • Each step is a lookup in a rule table keyed by the word's last letter, so the stemmer is fast and suits large datasets.
  • By conflating different forms of a word into a single stem, it reduces the number of distinct terms, which makes search and indexing more efficient and lets a query match more variants of a word.
  • Its table of over 100 rules covers many English suffixes that stemmers with fewer rules may miss.
  • Ready-made implementations exist, such as NLTK's LancasterStemmer class and the 'stemming' package, so it is easy to use even for beginners.
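The search benefit comes from stemming documents and queries the same way, so that different word forms land on one index key. The sketch below shows this with a toy inverted index. It rests on stated assumptions: simple_stem is a hypothetical stand-in that only strips "-ing", "-ed" and "-s" (it is not the Lancaster rule set) so the example runs without extra packages, and build_index and search are illustrative names. In practice you would pass a real Lancaster implementation as the stem argument.

```python
import re
from collections import defaultdict


def simple_stem(word, suffixes=("ing", "ed", "s"), min_len=3):
    """Hypothetical stand-in stemmer: strip the first matching suffix,
    keeping at least `min_len` letters. Replace with a real Lancaster stemmer."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text and keep only runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs, stem):
    """Map each stem to the set of document ids containing a word with that stem."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[stem(token)].add(doc_id)
    return index


def search(index, query, stem):
    """Return ids of documents that contain every query word, compared by stem."""
    results = None
    for token in tokenize(query):
        ids = index.get(stem(token), set())
        results = set(ids) if results is None else results & ids
    return results or set()


docs = {
    0: "The cat jumped",
    1: "Cats are jumping",
    2: "Dogs bark",
}
index = build_index(docs, simple_stem)
vocab = {t for text in docs.values() for t in tokenize(text)}

print(search(index, "jumps", simple_stem))  # -> {0, 1}
print(search(index, "cat", simple_stem))    # -> {0, 1}
print(len(vocab), "distinct words,", len(index), "distinct stems")  # -> 8 distinct words, 6 distinct stems
```

Because the query goes through the same stemmer as the documents, "jumps" matches both "jumped" and "jumping", and the eight distinct words in this toy corpus collapse to six index keys. A more aggressive stemmer such as Lancaster merges more forms, which raises recall but also increases the risk of merging unrelated words.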

Limitations of the Lancaster Stemmer

  • The aggressive nature of the algorithm can produce stems that are not real words, such as "univers", to which both "university" (by removing "-ity") and "universe" (by removing "-e") are reduced.
  • Its rules are written for English, so it is not suited to other languages as-is.
  • Because it strips so aggressively, it can conflate words with different meanings into the same stem (over-stemming), which introduces ambiguity.



ayushimalm50