
Transfer Learning with Fine-Tuning in NLP

Last Updated : 17 Feb, 2025

In this article, we will explore the principles of Transfer Learning and Fine-Tuning in the context of Natural Language Processing (NLP). We will fine-tune a pre-trained model, BERT, to perform sentiment analysis.

By following this guide, you will understand how to use Hugging Face's transformers library to fine-tune a pretrained BERT model for text classification.

BERT (Bidirectional Encoder Representations from Transformers) is designed to understand the context of a word from all of its surrounding words, rather than only the words to its left or only the words to its right, as unidirectional language models do.

The bidirectional approach allows BERT to capture deep contextual relationships and meanings within sentences or documents, making it highly effective for a variety of NLP tasks.

However, BERT is pre-trained on a large general-purpose corpus with self-supervised objectives (masked language modeling and next-sentence prediction) rather than on any particular downstream task.
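To see this bidirectional, self-supervised pre-training in action, you can query BERT's masked-language-modeling head directly through Hugging Face's fill-mask pipeline. This is a small illustrative sketch, separate from the fine-tuning walkthrough below:

Python
from transformers import pipeline

# Illustrative only: BERT predicts the masked word using context on BOTH sides of [MASK]
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("The movie was absolutely [MASK], I loved every minute."):
    print(prediction["token_str"], round(prediction["score"], 3))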

Why Fine-Tune BERT?

  • Fine-tuning allows us to leverage this pre-trained knowledge by adapting the model to a particular NLP task using a smaller, task-specific dataset.
  • This fine-tuning process not only saves computational resources but also enhances the model's ability to generalize to new and unseen data. (A minimal sketch of the related option of freezing the encoder and training only the classification head follows this list.)
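A related transfer-learning option, useful when data or compute is very limited, is to freeze the pre-trained encoder and train only the newly added classification head. The walkthrough below fine-tunes all of BERT's weights instead, but a minimal sketch of the frozen-encoder variant looks like this:

Python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Freeze the pre-trained encoder; only the classification head remains trainable
for param in model.bert.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")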

Fine-Tuning BERT Model for Sentiment Analysis

Let's begin with the implementation.

Step 1: Install and Import Required Libraries

First, install the necessary libraries if you haven’t already:

!pip install transformers torch

Importing necessary libraries:

Python
import torch
import transformers
from transformers import AdamW, BertTokenizer, BertForSequenceClassification
from torch.utils.data import DataLoader, TensorDataset
import torch.nn.functional as F


Step 2: Load the Pre-Trained BERT Model and Tokenizer

We will use bert-base-uncased, a pretrained BERT model, and its tokenizer.

  • BertTokenizer.from_pretrained(pretrained_model_name): Loads the tokenizer for tokenizing text into input IDs.
  • BertForSequenceClassification.from_pretrained(pretrained_model_name, num_labels=2): Loads a BERT model for binary classification.
Python
pretrained_model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(pretrained_model_name)
model = BertForSequenceClassification.from_pretrained(pretrained_model_name,
                                                      num_labels=2)


Move the model to GPU if available:

Python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

Step 3: Prepare the Training Dataset

We create a labeled dataset for sentiment analysis. Here, 1 represents positive sentiment and 0 represents negative sentiment.

Python
train_texts = [
    "I love this product, it's amazing!",  # Positive
    "Absolutely fantastic experience, will buy again!",  # Positive
    "Worst purchase ever. Completely useless.",  # Negative
    "I hate this item, it doesn't work!",  # Negative
    "The quality is top-notch, highly recommend!",  # Positive
    "Terrible service, never coming back.",  # Negative
    "This is the best thing I've ever bought!",  # Positive
    "Very disappointing. Waste of money.",  # Negative
    "Superb! Exceeded all my expectations.",  # Positive
    "Not worth the price at all.",  # Negative
]
train_labels = torch.tensor([1, 1, 0, 0, 1, 0, 1, 0, 1, 0]).to(device)


Tokenize the Dataset

  • padding=True: Ensures all input sequences have the same length.
  • truncation=True: Truncates any sentence longer than max_length=128 tokens.
Python
encoded_train = tokenizer(train_texts,
                          padding=True,
                          truncation=True,
                          max_length=128,
                          return_tensors='pt')
train_input_ids = encoded_train['input_ids'].to(device)
train_attention_masks = encoded_train['attention_mask'].to(device)
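As an optional sanity check (not part of the original walkthrough), you can inspect the tokenizer output to confirm that every sentence was padded to a common length:

Python
# Each row is one sentence; all rows share the same padded length
print(train_input_ids.shape)       # e.g. torch.Size([10, 14]); the width depends on the longest sentence
print(train_attention_masks[0])    # 1s over real tokens, 0s over padding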

Step 4: Create a DataLoader for Efficient Training

Convert the data into a PyTorch DataLoader:

  • TensorDataset(): Combines input IDs, attention masks, and labels into a dataset.
  • DataLoader(): Loads data in mini-batches to improve efficiency.
Python
train_dataset = TensorDataset(train_input_ids, train_attention_masks, train_labels)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

Step 5: Define the Training Loop

Define the optimizer:

Python
optimizer = AdamW(model.parameters(), lr=2e-5) 
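Note that AdamW imported from transformers has been deprecated for some time and may be unavailable in newer releases of the library. If the import fails in your environment, PyTorch's own implementation is a drop-in replacement here:

Python
from torch.optim import AdamW  # PyTorch's AdamW optimizer

optimizer = AdamW(model.parameters(), lr=2e-5)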


Train the model:

Python
epochs = 5
model.train()

for epoch in range(epochs):
    total_loss = 0
    correct = 0
    total = 0

    for batch in train_loader:
        batch_input_ids, batch_attention_masks, batch_labels = batch

        optimizer.zero_grad()
        outputs = model(input_ids=batch_input_ids,
                        attention_mask=batch_attention_masks,
                        labels=batch_labels)

        loss = outputs.loss
        logits = outputs.logits

        total_loss += loss.item()
        loss.backward()
        optimizer.step()

        preds = torch.argmax(F.softmax(logits, dim=1), dim=1)
        correct += (preds == batch_labels).sum().item()
        total += batch_labels.size(0)

    avg_loss = total_loss / len(train_loader)
    accuracy = correct / total * 100
    print(f"Epoch {epoch+1} - Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%")


The model computes the loss and backpropagates gradients, and the loop tracks the accuracy for each epoch.
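A common refinement when fine-tuning BERT, not used in the loop above, is to decay the learning rate over training with a warmup schedule from transformers. A minimal sketch (call scheduler.step() right after optimizer.step()):

Python
from transformers import get_linear_schedule_with_warmup

# Optional: linear learning-rate decay with warmup over the whole training run
total_steps = len(train_loader) * epochs
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=0,
                                            num_training_steps=total_steps)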

Step 6: Save and Load the Fine-Tuned Model

Save the model:

Python
torch.save(model.state_dict(), "fine_tuned_bert.pth") 


Load the fine-tuned model later:

Python
model.load_state_dict(torch.load("fine_tuned_bert.pth"))
model.to(device)
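Alternatively (not shown in the original code), Hugging Face models can be saved and reloaded with save_pretrained / from_pretrained, which stores the model configuration and lets you keep the tokenizer alongside the weights:

Python
# Save weights, config, and tokenizer to a directory
model.save_pretrained("fine_tuned_bert")
tokenizer.save_pretrained("fine_tuned_bert")

# Reload later
model = BertForSequenceClassification.from_pretrained("fine_tuned_bert").to(device)
tokenizer = BertTokenizer.from_pretrained("fine_tuned_bert")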


Step 7: Evaluate on the Test Dataset

Define Test Samples

Python
test_texts = [
    "This is a great product, I love it!",  # Positive
    "Horrible experience, I want a refund!",  # Negative
    "Highly recommended! Five stars.",  # Positive
    "Not worth it. I regret buying this.",  # Negative
]
test_labels = torch.tensor([1, 0, 1, 0]).to(device)


Tokenize test data:

Python
encoded_test = tokenizer(test_texts,
                         padding=True,
                         truncation=True,
                         max_length=128,
                         return_tensors='pt')
test_input_ids = encoded_test['input_ids'].to(device)
test_attention_masks = encoded_test['attention_mask'].to(device)


Step 8: Make Predictions

Python
model.eval()
with torch.no_grad():
    outputs = model(input_ids=test_input_ids,
                    attention_mask=test_attention_masks)
    predicted_labels = torch.argmax(outputs.logits, dim=1)

test_accuracy = (predicted_labels == test_labels).sum().item() / len(test_labels) * 100
print(f"\nTest Accuracy: {test_accuracy:.2f}%")

for text, label in zip(test_texts, predicted_labels):
    print(f'Text: {text}\nPredicted Label: {label.item()}\n')

Output:

Epoch 1 - Loss: 0.8377, Accuracy: 50.00%
Epoch 2 - Loss: 0.6050, Accuracy: 50.00%
Epoch 3 - Loss: 0.4371, Accuracy: 90.00%
Epoch 4 - Loss: 0.3349, Accuracy: 100.00%
Epoch 5 - Loss: 0.2301, Accuracy: 100.00%

Test Accuracy: 75.00%
Text: This is a great product, I love it!
Predicted Label: 1

Text: Horrible experience, I want a refund!
Predicted Label: 1

Text: Highly recommended! Five stars.
Predicted Label: 1

Text: Not worth it. I regret buying this.
Predicted Label: 0
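To try the fine-tuned model on new text, a small helper like the sketch below wraps tokenization and prediction for a single sentence (the predict_sentiment name is illustrative and not part of the original code):

Python
def predict_sentiment(text):
    # Tokenize one sentence and run it through the fine-tuned model
    encoded = tokenizer(text, truncation=True, max_length=128, return_tensors='pt').to(device)
    model.eval()
    with torch.no_grad():
        logits = model(**encoded).logits
    return "Positive" if logits.argmax(dim=1).item() == 1 else "Negative"

print(predict_sentiment("The packaging was damaged and the item never worked."))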

Complete Code:

Python
import torch
import transformers
from transformers import AdamW, BertTokenizer, BertForSequenceClassification
from torch.utils.data import DataLoader, TensorDataset, random_split
import torch.nn.functional as F

# Load Pretrained BERT Tokenizer & Model
pretrained_model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(pretrained_model_name)
model = BertForSequenceClassification.from_pretrained(pretrained_model_name, num_labels=2)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Define the Training Dataset
train_texts = [
    "I love this product, it's amazing!",  # Positive
    "Absolutely fantastic experience, will buy again!",  # Positive
    "Worst purchase ever. Completely useless.",  # Negative
    "I hate this item, it doesn't work!",  # Negative
    "The quality is top-notch, highly recommend!",  # Positive
    "Terrible service, never coming back.",  # Negative
    "This is the best thing I've ever bought!",  # Positive
    "Very disappointing. Waste of money.",  # Negative
    "Superb! Exceeded all my expectations.",  # Positive
    "Not worth the price at all.",  # Negative
]
train_labels = torch.tensor([1, 1, 0, 0, 1, 0, 1, 0, 1, 0]).to(device)  # 1 = Positive, 0 = Negative

# Tokenize Training Data
encoded_train = tokenizer(train_texts, padding=True, truncation=True, max_length=128, return_tensors='pt')
train_input_ids = encoded_train['input_ids'].to(device)
train_attention_masks = encoded_train['attention_mask'].to(device)

# Create PyTorch Dataset & DataLoader
train_dataset = TensorDataset(train_input_ids, train_attention_masks, train_labels)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)  # Mini-batches of size 2

# Training Parameters
epochs = 5
optimizer = AdamW(model.parameters(), lr=2e-5)

# Training Loop with Mini-Batch Processing
model.train()
for epoch in range(epochs):
    total_loss = 0
    correct = 0
    total = 0

    for batch in train_loader:
        batch_input_ids, batch_attention_masks, batch_labels = batch

        optimizer.zero_grad()
        outputs = model(input_ids=batch_input_ids, attention_mask=batch_attention_masks, labels=batch_labels)

        loss = outputs.loss
        logits = outputs.logits

        total_loss += loss.item()
        loss.backward()
        optimizer.step()

        # Compute Training Accuracy
        preds = torch.argmax(F.softmax(logits, dim=1), dim=1)
        correct += (preds == batch_labels).sum().item()
        total += batch_labels.size(0)

    avg_loss = total_loss / len(train_loader)
    accuracy = correct / total * 100
    print(f"Epoch {epoch+1} - Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%")

# Save Fine-tuned Model
torch.save(model.state_dict(), "fine_tuned_bert.pth")

# Switch to Evaluation Mode
model.eval()

# Test Dataset
test_texts = [
    "This is a great product, I love it!",  # Positive
    "Horrible experience, I want a refund!",  # Negative
    "Highly recommended! Five stars.",  # Positive
    "Not worth it. I regret buying this.",  # Negative
]
test_labels = torch.tensor([1, 0, 1, 0]).to(device)

# Tokenize Test Data
encoded_test = tokenizer(test_texts, padding=True, truncation=True, max_length=128, return_tensors='pt')
test_input_ids = encoded_test['input_ids'].to(device)
test_attention_masks = encoded_test['attention_mask'].to(device)

# Run Model on Test Data
with torch.no_grad():
    outputs = model(input_ids=test_input_ids, attention_mask=test_attention_masks)
    predicted_labels = torch.argmax(outputs.logits, dim=1)

# Compute Test Accuracy
test_accuracy = (predicted_labels == test_labels).sum().item() / len(test_labels) * 100
print(f"\nTest Accuracy: {test_accuracy:.2f}%")

# Print Predictions
for text, label in zip(test_texts, predicted_labels):
    print(f'Text: {text}\nPredicted Label: {label.item()}\n')

In this tutorial, we fine-tuned a pretrained BERT model using transfer learning for sentiment analysis. The step-by-step process included:

  1. Loading the BERT model and tokenizer.
  2. Preparing a training dataset.
  3. Fine-tuning using mini-batch training.
  4. Evaluating the test accuracy.

This approach allows BERT to learn domain-specific knowledge while leveraging its powerful language understanding capabilities.

