Token Classification in Natural Language Processing

Last Updated: 12 Jun, 2025

Token classification is a core task in Natural Language Processing (NLP) in which each token (typically a word or sub-word) in a sequence is assigned a label. By labeling tokens with categories such as entity types, parts of speech, or syntactic chunks, it extracts structured information from unstructured text and helps models understand both the semantic meaning and the grammatical structure of language.

What is a Token?

Tokens are the smallest units of a text, typically separated by whitespace (such as spaces or newlines) and punctuation marks (such as commas or periods). For example, consider the sentence:

"The quick brown fox jumps over the lazy dog."

The tokens for this sentence would be:

"The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."

Word-level vs Sub-word-level Tokenization (BERT-style)

1. In traditional NLP pipelines, tokenization is done at the word level, meaning each token corresponds directly to a word in the sentence.

Text: "running"

Tokens: ["running"]

This approach is simple, but it struggles with:

  • Rare or misspelled words
  • Morphological variation (changes in the structure and formation of words)
  • Out-of-vocabulary terms

2. Modern transformer-based models like BERT, RoBERTa, and GPT use sub-word tokenization algorithms such as WordPiece, Byte Pair Encoding (BPE), or Unigram. These break words into smaller, meaningful fragments.

Text: "running"

Tokens: ["run", "##ing"]

"##ing" shows that it is a continuation of a previous sub-word. This method helps with:

  • Efficiently handling rare or unseen words
  • Sharing sub-word representations across different words (e.g., “runner”, “running”, “run”)
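
To see this in practice, Hugging Face's BERT tokenizer can be queried directly. A minimal sketch (the exact splits depend on the model's learned vocabulary):

Python
from transformers import AutoTokenizer

# the WordPiece tokenizer that ships with BERT
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# continuation pieces are prefixed with "##"
print(tokenizer.tokenize("The runner was running untiringly."))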

BIO/IOB tagging scheme

The IOB tagging scheme, also referred to as BIO (Beginning, Inside, Outside), is a widely used labeling format in token classification tasks such as Named Entity Recognition (NER), chunking, and semantic role labeling. This format helps models identify spans of consecutive tokens that together represent a meaningful entity (like a person’s name, organization, or location).

Components of IOB/BIO Tags

Each token is assigned one of these tags:

Tag   Meaning
B     Beginning of a chunk/entity
I     Inside a chunk/entity
O     Outside of any chunk/entity
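
For example, with PER and LOC entity types, the sentence "Mary lives in New York" would be tagged:

Mary → B-PER
lives → O
in → O
New → B-LOC
York → I-LOC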

Applications of Token Classification

1. Named Entity Recognition (NER)

NER identifies named entities in text and classifies them into categories such as person names, locations, organizations, and dates. The task assigns a label to each token indicating the type of entity it belongs to, or a neutral label (often "O") if the token is not part of any named entity.

Code Example:

Python
from transformers import pipeline

# load the default pre-trained NER pipeline
classifier = pipeline("ner")
classifier("Hello I'm Mary and I live in Amsterdam.")

Output:

[Output for NER model]

Interpretation:

  1. "Mary" labeled as a Person (I-PER) with 99.8% confidence
  2. "Amsterdam" labeled as a Location (I-LOC) with 99.85% confidence

2. Part-of-Speech (POS) Tagging

POS tagging is a classic token classification task in which the goal is to assign a grammatical category (such as noun, verb, or adjective) to each word in a sentence. The model processes the input text and labels each token based on its syntactic role within the sentence.

Code Example:

Python
from transformers import pipeline

# load a BERT model fine-tuned for POS tagging
classifier = pipeline("token-classification", model="vblagoje/bert-english-uncased-finetuned-pos")
classifier("Hello I'm Mary and I live in Amsterdam.")

Output:

[Output for POS model]

Interpretation:

Token       POS Tag   Meaning                           Confidence
hello       INTJ      Interjection (e.g., greetings)    99.68%
i           PRON      Pronoun                           99.94%
'           AUX       Auxiliary verb (part of "I'm")    99.67%
m           AUX       Auxiliary verb ("am")             99.65%
mary        PROPN     Proper noun                       99.85%
and         CCONJ     Coordinating conjunction          99.92%
i           PRON      Pronoun                           99.95%
live        VERB      Main verb                         99.86%
in          ADP       Adposition (preposition)          99.94%
amsterdam   PROPN     Proper noun                       99.88%
.           PUNCT     Punctuation                       99.96%

Challenges in Token Classification

  • Sub-word Token Handling: Sub-word tokenization splits words into smaller units, complicating token-level labeling.
    For example, "playing" → ["play", "##ing"]. The challenge lies in aligning word-level labels with sub-word tokens. A common solution is to assign the label only to the first sub-word and mask the rest during training and evaluation (see the sketch after this list).
  • Imbalanced Label Distribution: In tasks like NER, the "O" class dominates, biasing the model toward non-entity predictions and hurting performance on rare labels. Class weighting or oversampling of minority labels helps ensure the model pays attention to rare classes.
  • Domain Adaptation: Pretrained models often fail on specialized domains like medical or legal texts due to unfamiliar terms. This mismatch reduces performance. Fine-tuning or using domain-adapted models like BioBERT or LegalBERT helps bridge the vocabulary and context gap.
  • Label Ambiguity: Words like "Apple" may refer to a company or a fruit, depending on context. Such ambiguity confuses models in the absence of strong contextual clues. Transformer models like BERT use contextual embeddings to disambiguate meanings, but errors still occur in subtle or low-resource contexts.
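
A minimal sketch of the first-sub-word labeling strategy described above, assuming a Hugging Face fast tokenizer (word_ids() requires one). The label ids below are hypothetical, and the -100 value follows PyTorch's cross-entropy convention of ignoring masked positions:

Python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# one word-level label id per word (hypothetical ids for illustration)
words = ["Mary", "lives", "in", "Amsterdam"]
word_labels = [1, 0, 0, 2]  # e.g., 1 = B-PER, 0 = O, 2 = B-LOC

encoding = tokenizer(words, is_split_into_words=True)

aligned_labels = []
previous_word_id = None
for word_id in encoding.word_ids():
    if word_id is None:
        aligned_labels.append(-100)                  # mask special tokens ([CLS], [SEP])
    elif word_id != previous_word_id:
        aligned_labels.append(word_labels[word_id])  # label only the first sub-word
    else:
        aligned_labels.append(-100)                  # mask continuation sub-words
    previous_word_id = word_id

print(encoding.tokens())
print(aligned_labels)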
