Conditional Random Fields (CRFs) for POS tagging in NLP

Last Updated : 30 Apr, 2025

Part-of-speech tagging is a classic sequence labeling task in Natural Language Processing. In this article, we will learn about one method that can be used for POS tagging: Conditional Random Fields. But before that, let us understand what POS tagging is.

What is POS tagging?

Part-of-speech (POS) tagging is the process of assigning grammatical categories, such as nouns, verbs, adjectives, etc., to each word in a sentence. POS tagging is a fundamental task in Natural Language Processing (NLP) and is used in various applications, such as machine translation, sentiment analysis, and text-to-speech synthesis.

Here's an example of POS tagging for the sentence "She likes to read books":

Word  | POS Tag
She   | PRON
likes | VERB
to    | PART
read  | VERB
books | NOUN

In this example, the word "She" is tagged as a pronoun, "likes" is tagged as a verb, "to" is tagged as a particle, "read" is tagged as a verb, and "books" is tagged as a noun. The POS tags provide information about the syntactic structure of the sentence, which can be used in downstream tasks, such as parsing or sentiment analysis.
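A tagged sentence like the one above is usually stored as a list of (word, tag) pairs, which is also the format of the corpus used later in this article. The snippet below is a minimal sketch of that representation; the variable names are illustrative only.

```python
from collections import Counter

# The example sentence as a list of (word, tag) pairs
tagged_sentence = [
    ("She", "PRON"),
    ("likes", "VERB"),
    ("to", "PART"),
    ("read", "VERB"),
    ("books", "NOUN"),
]

# Separate the words (model input) from the tags (model output)
words, tags = zip(*tagged_sentence)
print(words)
print(tags)

# Count how often each tag occurs in the sentence
tag_counts = Counter(tags)
print(tag_counts)
```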

Conditional Random Fields

A Conditional Random Field (CRF) is a type of probabilistic graphical model often used in Natural Language Processing (NLP) and computer vision tasks. It is a variant of a Markov Random Field (MRF), which is a type of undirected graphical model.

  • CRFs are used for structured prediction tasks, where the goal is to predict a structured output based on a set of input features. For example, in NLP, a common structured prediction task is Part-of-Speech (POS) tagging, where the goal is to assign a part-of-speech tag to each word in a sentence. CRFs can also be used for Named Entity Recognition (NER), chunking, and other tasks where the output is a structured sequence.
  • CRFs are trained using maximum likelihood estimation, which involves optimizing the parameters of the model to maximize the probability of the correct output sequence given the input features. This optimization problem is typically solved using iterative algorithms like gradient descent or L-BFGS.
  • The formula for a Conditional Random Field (CRF) is similar to that of a Markov Random Field (MRF) but with the addition of input features that condition the probability distribution over output sequences.

Let X be the input features and Y be the output sequence. The conditional probability distribution of a linear-chain CRF is given by:

P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{i} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, x_i) \right)

where:

  • Z(X) is the normalization factor that ensures the distribution sums to 1 over all possible output sequences.
  • λ_k are the learned model parameters.
  • f_k(y_{i-1}, y_i, x_i) are the feature functions that take as input the current output state y_i, the previous output state y_{i-1}, and the input features x_i. These functions can be binary or real-valued, and capture dependencies between the input features and the output sequence.
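To make the formula concrete, here is a minimal sketch that evaluates P(Y | X) for a two-word input by enumerating every possible tag sequence. The tag set, the start marker and all weights are made-up values chosen for illustration; real CRF libraries learn the weights from data and compute Z(X) with dynamic programming rather than brute force.

```python
import itertools
import math

TAGS = ["NOUN", "VERB"]
START = "<s>"  # hypothetical marker standing in for y_0

# Hand-picked weights lambda_k, one per binary feature f_k (illustrative values)
WEIGHTS = {
    ("emit", "books", "NOUN"): 2.0,
    ("emit", "read", "VERB"): 1.5,
    ("emit", "read", "NOUN"): 0.5,
    ("trans", "VERB", "NOUN"): 1.0,
    ("trans", "NOUN", "NOUN"): -0.5,
}


def score(words, tags):
    """Sum over positions i and features k of lambda_k * f_k(y_{i-1}, y_i, x_i)."""
    total = 0.0
    prev = START
    for word, tag in zip(words, tags):
        total += WEIGHTS.get(("emit", word, tag), 0.0)   # word/tag feature
        total += WEIGHTS.get(("trans", prev, tag), 0.0)  # tag/tag feature
        prev = tag
    return total


def conditional_probabilities(words):
    """Return P(Y | X) for every tag sequence Y by brute-force enumeration."""
    sequences = list(itertools.product(TAGS, repeat=len(words)))
    unnormalized = {seq: math.exp(score(words, seq)) for seq in sequences}
    z = sum(unnormalized.values())  # normalization factor Z(X)
    return {seq: value / z for seq, value in unnormalized.items()}


probs = conditional_probabilities(["read", "books"])
for seq, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(seq, round(p, 3))
```

With these weights the sequence (VERB, NOUN) has the highest score, so it also receives the highest probability. Enumeration grows exponentially with sentence length, which is why it is only practical for a toy example like this one.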

Here's an example of using Conditional Random Fields (CRFs) for POS tagging in Python using the sklearn_crfsuite library. First, you'll need to install the sklearn_crfsuite library using 'pip':

pip install sklearn-crfsuite

'sklearn-crfsuite' is a Python library that provides an interface to the CRFsuite implementation of Conditional Random Fields (CRFs), a popular machine learning algorithm for sequence labeling tasks such as Part-Of-Speech (POS) tagging and named entity recognition (NER). The library is a thin wrapper around python-crfsuite that exposes a scikit-learn-compatible estimator API.

Python3
import nltk
import sklearn_crfsuite
from sklearn_crfsuite import metrics

Then, you can load a dataset of tagged sentences. For example:

Python3
# Load the Penn Treebank corpus
nltk.download('treebank')
corpus = nltk.corpus.treebank.tagged_sents()
print(corpus)

Output:

[nltk_data] Downloading package treebank to /root/nltk_data...
[nltk_data]   Package treebank is already up-to-date!
[[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB')......

This article uses the Treebank corpus bundled with NLTK (a sample of the Penn Treebank), but you can substitute your own dataset of tagged sentences.

Define the Feature Function

To convert a sentence into a sequence of features that can be used as input to a CRF model, you can define a feature function that extracts relevant information from each word in the sentence. Here's an example feature function that extracts the following features for each word:

  • The word itself.
  • Whether the word is the first word in the sentence.
  • Whether the word is the last word in the sentence.
  • Whether the word starts with a capital letter.
  • Whether the word is all uppercase.
  • Whether the word is all lowercase.
  • The prefixes of the word of length 1, 2 and 3.
  • The suffixes of the word of length 1, 2 and 3.
  • The previous word in the sentence.
  • The next word in the sentence.
  • Whether the word contains a hyphen.
  • Whether the word is numeric.
  • Whether the word contains a capital letter after its first character.
Python3
# Define a function to extract features for each word in a sentence.
# `sentence` is a list of (word, tag) pairs; only the word is used here.
def word_features(sentence, i):
    word = sentence[i][0]
    features = {
        'word': word,
        'is_first': i == 0,                  # word is the first in the sentence
        'is_last': i == len(sentence) - 1,   # word is the last in the sentence
        # first character is uppercase (also True for digits and punctuation,
        # which have no case)
        'is_capitalized': word[0].upper() == word[0],
        'is_all_caps': word.upper() == word,     # word is all uppercase
        'is_all_lower': word.lower() == word,    # word is all lowercase

        # prefixes of the word
        'prefix-1': word[0],
        'prefix-2': word[:2],
        'prefix-3': word[:3],

        # suffixes of the word
        'suffix-1': word[-1],
        'suffix-2': word[-2:],
        'suffix-3': word[-3:],

        # previous word ('' at the start of the sentence)
        'prev_word': '' if i == 0 else sentence[i-1][0],

        # next word ('' at the end of the sentence)
        'next_word': '' if i == len(sentence)-1 else sentence[i+1][0],
        'has_hyphen': '-' in word,       # word contains a hyphen
        'is_numeric': word.isdigit(),    # word consists only of digits
        'capitals_inside': word[1:].lower() != word[1:]  # capital after first char
    }
    return features
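To see what such a feature function returns, it can help to call it on a short sentence. The sketch below uses a trimmed copy of the function with only a subset of the features so that the snippet runs on its own; the toy sentence and its tags are made up for illustration.

```python
def word_features(sentence, i):
    """Trimmed copy of the feature function above (subset of the features)."""
    word = sentence[i][0]
    return {
        'word': word,
        'is_first': i == 0,
        'is_last': i == len(sentence) - 1,
        'is_capitalized': word[0].upper() == word[0],
        'suffix-2': word[-2:],
        'prev_word': '' if i == 0 else sentence[i - 1][0],
        'next_word': '' if i == len(sentence) - 1 else sentence[i + 1][0],
        'is_numeric': word.isdigit(),
    }


# Toy sentence in the same (word, tag) format as the corpus
toy_sentence = [('She', 'PRP'), ('likes', 'VBZ'), ('books', 'NNS')]

# Features for the middle word "likes"
features = word_features(toy_sentence, 1)
for name, value in features.items():
    print(f"{name}: {value!r}")
```

Each word becomes one dictionary, and a sentence becomes a list of such dictionaries, which is the input format the CRF model expects.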

Note that this is just an example feature function and the features you extract may vary depending on your specific use case. You can customize this function to extract any features that you think will be relevant to your sequence labeling task. The next step is to extract features for every sentence and split the dataset into a training set and a test set.

Python3
# Extract features for each sentence in the corpus
X = []
y = []
for sentence in corpus:
    X_sentence = []
    y_sentence = []
    for i in range(len(sentence)):
        X_sentence.append(word_features(sentence, i))
        y_sentence.append(sentence[i][1])
    X.append(X_sentence)
    y.append(y_sentence)

# Split the data into training and testing sets
split = int(0.8 * len(X))
X_train = X[:split]
y_train = y[:split]
X_test = X[split:]
y_test = y[split:]
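The split above keeps the corpus order, so the test set is simply the last 20% of the sentences. If you would rather draw the test sentences at random, one possible variant is sketched below using only the standard library. The helper name shuffled_split, its parameters and the toy data are assumptions made for this example and are not part of the original code.

```python
import random


def shuffled_split(X, y, train_fraction=0.8, seed=42):
    """Shuffle sentence indices, then split features and labels consistently."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of sentences")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    split = int(train_fraction * len(indices))
    train_idx, test_idx = indices[:split], indices[split:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: ten one-word "sentences" with matching labels
X_toy = [[{"word": f"w{i}"}] for i in range(10)]
y_toy = [[f"T{i}"] for i in range(10)]

X_tr, X_te, y_tr, y_te = shuffled_split(X_toy, y_toy)
print(len(X_tr), len(X_te))
```

Because features and labels are selected with the same shuffled indices, every sentence stays paired with its own tag sequence.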

Now, let's train the CRF model.

Python3
# Train a CRF model on the training data
crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,
    c2=0.1,
    max_iterations=100,
    all_possible_transitions=True
)
crf.fit(X_train, y_train)

# Make predictions on the test data and evaluate the performance
y_pred = crf.predict(X_test)

print(metrics.flat_accuracy_score(y_test, y_pred))

Output:

0.9631718149608264
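The score printed above is a token-level accuracy: the per-sentence tag lists are flattened into one long list and the fraction of matching tags is reported. The following sketch re-implements that idea in plain Python for illustration. It is not the library's own code, and the gold and predicted tags are made-up toy data rather than model output.

```python
from itertools import chain


def flat_accuracy(y_true, y_pred):
    """Token-level accuracy over all sentences, ignoring sentence boundaries."""
    true_flat = list(chain.from_iterable(y_true))
    pred_flat = list(chain.from_iterable(y_pred))
    if len(true_flat) != len(pred_flat):
        raise ValueError("y_true and y_pred must contain the same number of tags")
    if not true_flat:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(true_flat, pred_flat))
    return correct / len(true_flat)


# Toy gold and predicted tags for two sentences
gold = [["PRP", "VBZ", "NNS"], ["DT", "NN"]]
pred = [["PRP", "VBZ", "NN"], ["DT", "NN"]]

accuracy = flat_accuracy(gold, pred)
print(accuracy)  # 4 of the 5 tags match
```

Because every token counts equally, long sentences and frequent tags dominate this metric, so it is worth inspecting per-tag results as well when rare tags matter.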

'sklearn_crfsuite.CRF()' is a class in the sklearn-crfsuite Python library that represents a Conditional Random Fields (CRF) model. It is used to train and evaluate CRF models for sequence labeling tasks such as Part-Of-Speech (POS) tagging and named entity recognition (NER).

The CRF() class constructor takes several parameters:

  • algorithm: The optimization algorithm to use for training the CRF model. Possible values are 'lbfgs', 'l2sgd', 'ap', 'pa', and 'arow'. The default is 'lbfgs'.
  • c1: The L1 regularization coefficient for the CRF model. The documented default is 0.
  • c2: The L2 regularization coefficient for the CRF model. The documented default is 1.0.
  • max_iterations: The maximum number of iterations to run the optimization algorithm. The default depends on the training algorithm (unlimited for 'lbfgs').
  • all_possible_transitions:  Whether to include all possible state transitions in the CRF model. The default is False.
  • verbose:  Whether to output progress messages during training. The default is False.

Another way to train a CRF model is to use 'pycrfsuite.Trainer()', which is part of the python-crfsuite library (installed as a dependency of sklearn-crfsuite). Let's see its implementation:

Python3
import pycrfsuite

# Train a CRF model using pycrfsuite
trainer = pycrfsuite.Trainer(verbose=False)
for xseq, yseq in zip(X_train, y_train):
    trainer.append(xseq, yseq)
trainer.set_params({
    'c1': 1.0,
    'c2': 1e-3,
    'max_iterations': 50,
    'feature.possible_transitions': True
})
trainer.train('pos.crfsuite')

# Tag a new sentence
tagger = pycrfsuite.Tagger()
tagger.open('pos.crfsuite')
tokens = 'Geeksforgeeks is a best platform for students.'.split()
# word_features() reads the word from index 0 of each item, as in the
# (word, tag) pairs of the corpus, so wrap each token in a 1-tuple
sentence = [(token,) for token in tokens]
features = [word_features(sentence, i) for i in range(len(sentence))]
tags = tagger.tag(features)
print(list(zip(tokens, tags)))

This prints a list of (token, tag) pairs, one per token in the sentence. The tags themselves depend on the trained model.

The 'pycrfsuite.Tagger()' is used for applying the trained model for prediction.

Conclusion

CRFs have been shown to be effective for POS tagging in various languages, including English, Chinese, and Arabic. They are also used in other NLP tasks, such as named entity recognition and syntactic parsing.


ram9119
Article Tags :
  • NLP
  • AI-ML-DS
