POS(Parts-Of-Speech) Tagging in NLP

Last Updated : 03 Jan, 2024

One of the core tasks in Natural Language Processing (NLP) is Parts of Speech (PoS) tagging, which assigns each word in a text a grammatical category such as noun, verb, adjective, or adverb. By exposing the structure and meaning of phrases, this technique helps machines analyze and understand human language more accurately.

In many NLP applications, including machine translation, sentiment analysis, and information retrieval, PoS tagging is essential. PoS tagging serves as a link between language and machine understanding, enabling the creation of complex language processing systems and serving as the foundation for advanced linguistic analysis.

What is POS(Parts-Of-Speech) Tagging?

Parts of Speech tagging is a linguistic activity in Natural Language Processing (NLP) wherein each word in a document is given a particular part of speech (adverb, adjective, verb, etc.) or grammatical category. Through the addition of a layer of syntactic and semantic information to the words, this procedure makes it easier to comprehend the sentence's structure and meaning.

In NLP applications, POS tagging is useful for machine translation, named entity recognition, and information extraction, among other things. It also helps resolve ambiguity in words with multiple meanings and reveals a sentence's grammatical structure.

Default tagging is the most basic form of part-of-speech tagging. In NLTK it is performed with the DefaultTagger class, which takes a single argument, 'tag', and assigns that tag to every token. NN is the tag for a singular noun, and it is the usual choice: a DefaultTagger works best with the most common part-of-speech tag, and nouns are the most frequent category.

Example of POS Tagging
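Before looking at a properly tagged sentence, here is what the DefaultTagger baseline described above produces. This is a minimal sketch; the token list is supplied by hand so that no tokenizer data has to be downloaded.

```python
from nltk.tag import DefaultTagger

# A DefaultTagger assigns the same tag to every token it sees;
# "NN" (singular noun) is the usual choice because nouns are so common.
default_tagger = DefaultTagger("NN")

tokens = ["The", "quick", "brown", "fox", "jumps"]
tagged = default_tagger.tag(tokens)
print(tagged)
# [('The', 'NN'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NN')]
```

Because every token receives NN, this tagger is wrong on most words. In practice it is typically used as a backoff: the tag a more capable tagger falls back on when it has no better answer.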

Consider the sentence: "The quick brown fox jumps over the lazy dog."

After performing POS Tagging:

  • "The" is tagged as determiner (DT)
  • "quick" is tagged as adjective (JJ)
  • "brown" is tagged as adjective (JJ)
  • "fox" is tagged as noun (NN)
  • "jumps" is tagged as verb (VBZ)
  • "over" is tagged as preposition (IN)
  • "the" is tagged as determiner (DT)
  • "lazy" is tagged as adjective (JJ)
  • "dog" is tagged as noun (NN)

By offering insights into the grammatical structure, this tagging aids machines in comprehending not just individual words but also the connections between them inside a phrase. For many NLP applications, like text summarization, sentiment analysis, and machine translation, this kind of data is essential.
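The tags listed above can also serve as a tiny gold standard. A minimal sketch that scores the DefaultTagger("NN") baseline against them; the nine (word, tag) pairs are copied from the list, and accuracy is simply the fraction of tokens whose predicted tag matches.

```python
from nltk.tag import DefaultTagger

# Gold-standard (word, tag) pairs from the example sentence
gold = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]

words = [word for word, _ in gold]
predicted = DefaultTagger("NN").tag(words)

# Count tokens where the predicted tag equals the gold tag
correct = sum(p_tag == g_tag for (_, p_tag), (_, g_tag) in zip(predicted, gold))
accuracy = correct / len(gold)
print(f"{correct}/{len(gold)} correct, accuracy = {accuracy:.2f}")
# 2/9 correct, accuracy = 0.22
```

Only "fox" and "dog" are nouns, so the baseline gets 2 of 9 tokens right, which is why default tagging is only a starting point.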

Workflow of POS Tagging in NLP

A typical part-of-speech (POS) tagging workflow in natural language processing (NLP) consists of the following steps:

  • Tokenization: Divide the input text into discrete tokens, which are usually units of words or subwords. The first stage in NLP tasks is tokenization.
  • Loading Language Models: To utilize a library such as NLTK or SpaCy, be sure to load the relevant language model. These models offer a foundation for comprehending a language's grammatical structure since they have been trained on a vast amount of linguistic data.
  • Text Processing: If required, preprocess the text to handle special characters, convert it to lowercase, or remove unnecessary information. Clean text helps the tagger assign correct labels, although lowercasing can also remove useful cues such as the capitalization of proper nouns.
  • Linguistic Analysis: To determine the text's grammatical structure, use linguistic analysis. This entails understanding each word's purpose inside the sentence, including whether it is an adjective, verb, noun, or other.
  • Part-of-Speech Tagging: Assign a POS tag (for example, NN or VBZ) to each token using the loaded model, which considers both the word itself and its surrounding context.
  • Results Analysis: Verify the accuracy and consistency of the PoS tagging findings with the source text. Determine and correct any possible problems or mistagging.
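The steps above can be sketched end to end in plain Python. This is a toy under stated assumptions: a small hand-written lexicon (the hypothetical LEXICON below) stands in for the trained model that NLTK or spaCy would load, linguistic analysis is folded into the lookup, and the function names are illustrative rather than library APIs.

```python
import re

# Hypothetical mini-lexicon standing in for a trained language model
LEXICON = {
    "the": "DT", "a": "DT", "fox": "NN", "dog": "NN",
    "quick": "JJ", "lazy": "JJ", "jumps": "VBZ", "over": "IN",
}


def tokenize(text):
    """Tokenization: split the text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def preprocess(tokens):
    """Text processing: lowercase a copy of each token for lexicon lookup."""
    return [tok.lower() for tok in tokens]


def tag_tokens(tokens, normalized):
    """Tagging: look up each word; punctuation gets ".", unknown words default to "NN"."""
    tagged = []
    for tok, norm in zip(tokens, normalized):
        if not norm.isalnum():
            tagged.append((tok, "."))
        else:
            tagged.append((tok, LEXICON.get(norm, "NN")))
    return tagged


def review(tagged, normalized):
    """Results analysis: list words the lexicon did not cover, so they can be checked."""
    return [tok for (tok, tag), norm in zip(tagged, normalized)
            if tag != "." and norm not in LEXICON]


text = "The quick brown fox jumps over the lazy dog."
tokens = tokenize(text)
normalized = preprocess(tokens)
tagged = tag_tokens(tokens, normalized)
print(tagged)
print("Needs review:", review(tagged, normalized))
```

Here "brown" is missing from the lexicon, so it falls back to NN (it should be JJ) and the results-analysis step flags it for review. The original capitalization is kept in the output because only a lowercased copy is used for lookup.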

Implementation of Parts-of-Speech tagging using NLTK in Python

Installing packages

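!pip install nltk

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
# Recent NLTK releases (3.9 and later) look for 'punkt_tab' and 'averaged_perceptron_tagger_eng'; download those if the resources above are reported as missing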

Implementation

# Importing the NLTK tokenizer and tagger
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Sample text
text = "NLTK is a powerful library for natural language processing."

# Splitting the text into word tokens
words = word_tokenize(text)

# Performing PoS tagging on the tokens
pos_tags = pos_tag(words)

# Displaying the PoS tagged result in separate lines
print("Original Text:")
print(text)

print("\nPoS Tagging Result:")
for word, tag in pos_tags:
    print(f"{word}: {tag}")

Output:

Original Text:
NLTK is a powerful library for natural language processing.
PoS Tagging Result:
NLTK: NNP
is: VBZ
a: DT
powerful: JJ
library: NN
for: IN
natural: JJ
language: NN
processing: NN
.: .

Import the NLTK library and its modules for tokenization. Tokenize the input text into words using word_tokenize. Use the pos_tag function from NLTK to perform part-of-speech tagging on the tokenized words. Print the original text and the resulting POS tags in separate lines, showing each word along with its corresponding part-of-speech tag.
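The NLTK tags above come from the Penn Treebank tagset, while spaCy's token.pos_ uses the coarser Universal POS tags. A minimal sketch of relating the two with a hand-written table; the table is an illustrative assumption that covers only the tags in this output and is not an official mapping. (NLTK can also produce a coarse tagset itself through pos_tag(..., tagset='universal') once the universal_tagset resource is downloaded.)

```python
# Illustrative Penn Treebank -> Universal POS mapping (only the tags used above)
PENN_TO_UPOS = {
    "NNP": "PROPN",
    "NN": "NOUN",
    "VBZ": "VERB",
    "DT": "DET",
    "JJ": "ADJ",
    "IN": "ADP",
    ".": "PUNCT",
}

# The NLTK result shown above
nltk_output = [
    ("NLTK", "NNP"), ("is", "VBZ"), ("a", "DT"), ("powerful", "JJ"),
    ("library", "NN"), ("for", "IN"), ("natural", "JJ"),
    ("language", "NN"), ("processing", "NN"), (".", "."),
]

# Tags missing from the table map to "X", the Universal POS "other" category
coarse = [(word, PENN_TO_UPOS.get(tag, "X")) for word, tag in nltk_output]
for word, upos in coarse:
    print(f"{word}: {upos}")
```

Note that "is" comes out as VERB here, whereas spaCy labels it AUX: a table keyed only on the Penn tag cannot separate auxiliaries from main verbs, because VBZ covers both.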

Implementation of Parts-of-Speech tagging using Spacy in Python

Installing Packages

!pip install spacy
!python -m spacy download en_core_web_sm

Implementation

# Importing libraries
import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

# Sample text
text = "SpaCy is a popular natural language processing library."

# Process the text with SpaCy
doc = nlp(text)

# Display the PoS tagged result
print("Original Text: ", text)
print("PoS Tagging Result:")
for token in doc:
    print(f"{token.text}: {token.pos_}")

Output:

Original Text:  SpaCy is a popular natural language processing library.
PoS Tagging Result:
SpaCy: PROPN
is: AUX
a: DET
popular: ADJ
natural: ADJ
language: NOUN
processing: NOUN
library: NOUN
.: PUNCT

Import the SpaCy library and load the English language model "en_core_web_sm" using spacy.load("en_core_web_sm"). Process the sample text using the loaded SpaCy model to obtain a Doc object containing linguistic annotations. Print the original text and iterate through the tokens in the processed Doc, displaying each token's text and its associated part-of-speech tag (token.pos_).

Types of POS Tagging in NLP

Assigning grammatical categories to words in a text is known as Part-of-Speech (PoS) tagging, and it is an essential aspect of Natural Language Processing (NLP). Different PoS tagging approaches exist, each with a unique methodology. Here are a few typical kinds:

1. Rule-Based Tagging

Rule-based part-of-speech (POS) tagging involves assigning words their respective parts of speech using predetermined rules, contrasting with machine learning-based POS tagging that requires training on annotated text corpora. In a rule-based system, POS tags are assigned based on specific word characteristics and contextual cues.

For instance, a rule-based POS tagger could assign the "noun" tag to words ending in "-tion" or "-ment," since these are common noun-forming suffixes. This approach is transparent and interpretable: every tag can be traced back to an explicit rule, and no training data is required.

Let's consider an example of how a rule-based part-of-speech (POS) tagger might operate:
Rule: Assign the POS tag "noun" to words ending in "-tion" or "-ment."

Text: "The presentation highlighted the key achievements of the project's development."

Rule-based tags:

  • "The" - Determiner (DET)
  • "presentation" - Noun (N)
  • "highlighted" - Verb (V)
  • "the" - Determiner (DET)
  • "key" - Adjective (ADJ)
  • "achievements" - Noun (N)
  • "of" - Preposition (PREP)
  • "the" - Determiner (DET)
  • "project's" - Noun (N)
  • "development" - Noun (N)

In this instance, the rule-based POS tagger labels words by following its predetermined rules. The suffix rule tags "presentation," "achievements," and "development" as nouns ("achievements" matches once the plural -s is allowed), while the remaining tags would come from additional rules, such as a word list for determiners. Despite the simplicity of this example, rule-based taggers can handle a broad variety of linguistic patterns by combining many rules, which keeps the tagging process transparent and easy to follow.
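NLTK ships a RegexpTagger class that takes a list of (regular expression, tag) pairs and assigns the tag of the first pattern that matches a token. A minimal sketch on the example sentence; the rule list itself is an illustrative assumption, extending the "-tion"/"-ment" rule to plurals and adding a few rules for determiners, past-tense verbs, and "of". Tokens are split by hand to avoid a tokenizer download.

```python
from nltk.tag import RegexpTagger

# Rules are tried in order; the first matching pattern decides the tag
rules = [
    (r"^[Tt]he$", "DT"),           # determiner
    (r".*(tion|ment)s?$", "NN"),   # noun-forming suffixes, plural allowed
    (r".*ed$", "VBD"),             # past-tense verb
    (r"^of$", "IN"),               # preposition
]
rule_tagger = RegexpTagger(rules)

tokens = ["The", "presentation", "highlighted", "the", "key",
          "achievements", "of", "the", "project's", "development", "."]
tagged = rule_tagger.tag(tokens)
for word, tag in tagged:
    print(word, tag)
```

Words that no rule covers ("key", "project's", ".") receive None, which shows the main limitation of a purely rule-based tagger: it only knows what its rules describe.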

2. Transformation Based tagging

Transformation-based tagging (TBT), best known from Eric Brill's tagger, starts from an initial tagging of the text and then applies an ordered set of transformation rules that change tags based on context. It differs from rule-based POS tagging, which assigns tags directly from predefined rules, and from statistical POS tagging, which uses trained models to predict tags probabilistically.

In TBT, the rules are learned from an annotated corpus by repeatedly choosing the rule that corrects the most remaining tagging errors. A rule could, for example, change a word's tag from verb to noun if it comes directly after a determiner like "the." The rules are applied to the text one after another, and the tags are updated after each transformation.

Compared with rule-based tagging, TBT can provide higher accuracy, especially on complex grammatical structures. To reach good performance, however, it may need a large rule set and considerable computation, particularly during training.

Consider the transformation rule: Change the tag of a verb to a noun if it follows a determiner like "the."

Text: "They enjoyed the run"

Initial tags (assuming the initial tagger gives each word its most frequent tag, so "run" starts out as a verb):

  • "They" - Pronoun (PRON)
  • "enjoyed" - Verb (V)
  • "the" - Determiner (DET)
  • "run" - Verb (V)

Transformation rule applied:

Change the tag of "run" from Verb (V) to Noun (N) because it follows the determiner "the."

Updated tags:

  • "They" - Pronoun (PRON)
  • "enjoyed" - Verb (V)
  • "the" - Determiner (DET)
  • "run" - Noun (N)

In this instance, the TBT system changed the tag of "run" from verb to noun because it matched the contextual pattern in the rule: a verb directly after a determiner. "enjoyed" keeps its verb tag because it follows a pronoun, not a determiner. The rules are applied in sequence and the tags are updated after each one. Although this example is simple, given a well-defined set of transformation rules, TBT systems can handle more complex grammatical patterns.
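A minimal sketch of the rule "change a verb to a noun if it follows a determiner" in plain Python. The most-frequent-tag lexicon is hypothetical and the rule is hand-written; a real TBT system such as NLTK's Brill tagger (nltk.tag.brill) learns its rules from a tagged corpus instead. Two sentences are tagged to show when the rule fires and when it does not.

```python
# Hypothetical initial tagger: each word gets an assumed most frequent tag
MOST_FREQUENT_TAG = {
    "they": "PRON", "enjoyed": "V", "the": "DET", "run": "V",
    "cat": "N", "chased": "V", "mouse": "N",
}

# Each transformation rule: (from_tag, to_tag, required previous tag)
RULES = [("V", "N", "DET")]


def initial_tagging(words):
    # Unknown words default to "N"
    return [MOST_FREQUENT_TAG.get(w.lower(), "N") for w in words]


def apply_transformations(words, rules):
    tags = initial_tagging(words)
    # Rules are applied one after another; each rule scans left to right
    # and updates the tags in place
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(words, tags))


print(apply_transformations(["They", "enjoyed", "the", "run"], RULES))
print(apply_transformations(["The", "cat", "chased", "the", "mouse"], RULES))
```

In the first sentence "run" is corrected to a noun. In the second, "chased" stays a verb because the word before it is "cat", not a determiner, so the rule's context never matches.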

3. Statistical POS Tagging

Statistical part-of-speech (POS) tagging is a computational linguistics technique that uses probabilistic models to assign grammatical categories to the words in a text. Whereas rule-based tagging relies on hand-written rules, statistical tagging uses machine learning, training its models on large annotated corpora.

To capture the statistical regularities of language, these models learn the probability distribution of word-tag sequences. Hidden Markov Models (HMMs) and conditional random fields (CRFs) are popular models for statistical POS tagging. During training, the model learns from labeled examples how likely each tag is given the current word and its context.
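For an HMM, "learning from labeled samples" can be as simple as counting. A minimal sketch of maximum-likelihood estimation of transition and emission probabilities; the three-sentence corpus is hypothetical, and real taggers use large corpora plus smoothing for unseen words and tag pairs, which this sketch omits.

```python
from collections import Counter, defaultdict

# A tiny hypothetical annotated corpus of (word, tag) sentences
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

transition_counts = defaultdict(Counter)  # previous tag -> next tag counts
emission_counts = defaultdict(Counter)    # tag -> word counts

for sentence in corpus:
    prev = "<s>"  # sentence-start pseudo-tag
    for word, tag in sentence:
        transition_counts[prev][tag] += 1
        emission_counts[tag][word] += 1
        prev = tag


def normalize(counts):
    # Maximum-likelihood estimate: count divided by the total for that context
    probs = {}
    for context, c in counts.items():
        total = sum(c.values())
        probs[context] = {item: n / total for item, n in c.items()}
    return probs


transition_prob = normalize(transition_counts)
emission_prob = normalize(emission_counts)

print(transition_prob["<s>"])  # {'DET': 1.0}
print(emission_prob["NOUN"])
```

Here "dog" appears twice as a NOUN and "cat" once, so P(dog | NOUN) is 2/3 and P(cat | NOUN) is 1/3.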

The trained model is then used to predict the most likely tags for text it has not seen before. Because it learns from data rather than fixed rules, statistical POS tagging handles linguistic ambiguity well and can pick up subtle usage patterns, which can make it a good fit for languages with complex grammatical structures.

  • Hidden Markov Model POS tagging: Hidden Markov Models (HMMs) serve as a statistical framework for part-of-speech (POS) tagging in natural language processing (NLP). In HMM-based POS tagging, the model undergoes training on a sizable annotated text corpus to discern patterns in various parts of speech. Leveraging this training, the model predicts the POS tag for a given word based on the probabilities associated with different tags within its context.
    Comprising states for potential POS tags and transitions between them, the HMM-based POS tagger learns transition probabilities and word-emission probabilities during training. To tag new text, the model, employing the Viterbi algorithm, calculates the most probable sequence of POS tags based on the learned probabilities.
    Widely applied in NLP, HMMs excel at modeling intricate sequential data, yet their performance may hinge on the quality and quantity of annotated training data.
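The Viterbi decoding step can be sketched in plain Python. The three tags, the toy vocabulary, and all probabilities below are invented for illustration; a real HMM tagger would estimate them from a corpus and would work with log probabilities to avoid numerical underflow on long sentences. "can" and "fish" are deliberately ambiguous between verb and noun.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    # best[t][tag]: probability of the best tag sequence ending in `tag` at position t
    best = [{}]
    back = [{}]
    for tag in tags:
        best[0][tag] = start_p[tag] * emit_p[tag].get(words[0], 0.0)
        back[0][tag] = None
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for tag in tags:
            # Choose the previous tag that maximizes the path probability
            prev_tag, score = max(
                ((p, best[t - 1][p] * trans_p[p][tag]) for p in tags),
                key=lambda pair: pair[1],
            )
            best[t][tag] = score * emit_p[tag].get(words[t], 0.0)
            back[t][tag] = prev_tag
    # Pick the most probable final tag, then follow the back-pointers
    last_tag = max(tags, key=lambda s: best[-1][s])
    path = [last_tag]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return list(zip(words, path)), best[-1][last_tag]


# Toy HMM parameters (assumed values, not learned from data)
tags = ["PRON", "VERB", "NOUN"]
start_p = {"PRON": 0.6, "VERB": 0.1, "NOUN": 0.3}
trans_p = {
    "PRON": {"PRON": 0.1, "VERB": 0.8, "NOUN": 0.1},
    "VERB": {"PRON": 0.2, "VERB": 0.3, "NOUN": 0.5},
    "NOUN": {"PRON": 0.1, "VERB": 0.6, "NOUN": 0.3},
}
emit_p = {
    "PRON": {"they": 1.0},
    "VERB": {"can": 0.5, "fish": 0.5},
    "NOUN": {"can": 0.4, "fish": 0.6},
}

tagged, prob = viterbi(["they", "can", "fish"], tags, start_p, trans_p, emit_p)
print(tagged, prob)
```

With these numbers the best path is they/PRON can/VERB fish/NOUN with probability of about 0.072; the alternative reading ending in fish/VERB scores only about 0.036.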

Advantages of POS Tagging

There are several advantages of Parts-Of-Speech (POS) Tagging including:

  • Text Simplification: Breaking complex sentences down into their constituent parts makes the material easier to understand and easier to simplify.
  • Information Retrieval: POS tagging enhances information retrieval systems by allowing more precise indexing and search based on grammatical categories.
  • Named Entity Recognition: POS tagging helps identify entities such as names, locations, and organizations in text and is often used as a preprocessing step for named entity recognition.
  • Syntactic Parsing: It facilitates syntactic parsing, which analyzes phrase structure and identifies how words relate to one another.
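As an illustration of indexing by grammatical category, here is a minimal sketch that builds a search index from nouns only. The two tagged documents are hypothetical and already carry Penn Treebank tags; the helper name build_noun_index is illustrative.

```python
from collections import defaultdict

# Hypothetical documents that have already been POS-tagged (Penn Treebank tags)
tagged_docs = {
    "doc1": [("NLTK", "NNP"), ("tags", "VBZ"), ("words", "NNS")],
    "doc2": [("spaCy", "NNP"), ("parses", "VBZ"), ("text", "NN"), ("quickly", "RB")],
}


def build_noun_index(docs):
    # Index only nouns: tags starting with "NN" (NN, NNS, NNP, NNPS)
    index = defaultdict(set)
    for doc_id, tagged in docs.items():
        for word, tag in tagged:
            if tag.startswith("NN"):
                index[word.lower()].add(doc_id)
    return index


index = build_noun_index(tagged_docs)
print(sorted(index))  # ['nltk', 'spacy', 'text', 'words']
```

"tags" is not indexed because it is used as a verb (VBZ) in doc1; a plain keyword index could not make that distinction.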

Disadvantages of POS Tagging

Some common disadvantages in part-of-speech (POS) tagging include:

  • Ambiguity: Language is inherently ambiguous, and the same word can play different roles depending on context, which can lead to incorrect tags. For example, "book" is a noun in "read a book" but a verb in "book a flight."
  • Idiomatic Expressions: Slang, colloquialisms, and idiomatic phrases can be problematic for POS tagging systems since they don't always follow formal grammar standards.
  • Out-of-Vocabulary Words: Words that did not appear in the training corpus can be difficult to handle, since the model has no direct evidence for assigning them the correct POS tag.
  • Domain Dependence: POS tagging models trained on a single domain may not generalize well to other domains, so good results often require domain-specific training data.
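NLTK's UnigramTagger, which tags each word with the tag it most often had in training, makes the out-of-vocabulary problem easy to see, and it also shows the backoff role of the DefaultTagger. A minimal sketch; the two training sentences and the unseen word "axolotl" are made up for illustration.

```python
from nltk.tag import DefaultTagger, UnigramTagger

# A tiny hypothetical training set of tagged sentences
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

# Without a backoff, words never seen in training get the tag None
unigram = UnigramTagger(train_sents)
plain = unigram.tag(["the", "axolotl", "sleeps"])
print(plain)
# [('the', 'DT'), ('axolotl', None), ('sleeps', 'VBZ')]

# With DefaultTagger as a backoff, unseen words fall back to "NN"
unigram_with_backoff = UnigramTagger(train_sents, backoff=DefaultTagger("NN"))
with_backoff = unigram_with_backoff.tag(["the", "axolotl", "sleeps"])
print(with_backoff)
# [('the', 'DT'), ('axolotl', 'NN'), ('sleeps', 'VBZ')]
```

The backoff only guarantees that every word gets some tag; whether NN is correct for an unseen word is still a guess.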



mohit gupta_omg :)
Article Tags :
  • Python
  • NLP
  • AI-ML-DS
  • Python-nltk
  • Natural-language-processing
