
Transfer Learning with Fine-Tuning in NLP

Last Updated : 17 Feb, 2025

In this article, we will explore the principles of Transfer Learning and Fine-Tuning in the context of Natural Language Processing (NLP). We will fine-tune a pre-trained model, BERT, to perform sentiment analysis.

By following this guide, you will understand how to use Hugging Face's transformers library to fine-tune a pretrained BERT model for text classification.

BERT (Bidirectional Encoder Representations from Transformers) is designed to understand the context of a word from all of its surrounding words, rather than only the words to its left or only the words to its right, as unidirectional language models do.

The bidirectional approach allows BERT to capture deep contextual relationships and meanings within sentences or documents, making it highly effective for a variety of NLP tasks.

However, BERT is pre-trained on a large general-purpose corpus with self-supervised objectives (masked language modeling and next-sentence prediction) rather than on any particular downstream task.
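To see this bidirectional, self-supervised pre-training in action, you can query BERT's masked-language-modeling head directly through Hugging Face's fill-mask pipeline. This is a small illustrative sketch, separate from the fine-tuning walkthrough below:

Python
from transformers import pipeline

# Illustrative only: BERT predicts the masked word using context on BOTH sides of [MASK]
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("The movie was absolutely [MASK], I loved every minute."):
    print(prediction["token_str"], round(prediction["score"], 3))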

Why Fine-Tune BERT?

  • Fine-tuning allows us to leverage this pre-trained knowledge by adapting the model to a particular NLP task using a smaller, task-specific dataset.
  • This fine-tuning process not only saves computational resources but also enhances the model's ability to generalize to new and unseen data. (A minimal sketch of the related option of freezing the encoder and training only the classification head follows this list.)
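A related transfer-learning option, useful when data or compute is very limited, is to freeze the pre-trained encoder and train only the newly added classification head. The walkthrough below fine-tunes all of BERT's weights instead, but a minimal sketch of the frozen-encoder variant looks like this:

Python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Freeze the pre-trained encoder; only the classification head remains trainable
for param in model.bert.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")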

Fine-Tuning BERT Model for Sentiment Analysis

Let's begin with the implementation.

Step 1: Install and Import Required Libraries

First, install the necessary libraries if you haven’t already:

!pip install transformers torch

Importing necessary libraries:

Python
import torch
import transformers
from transformers import AdamW, BertTokenizer, BertForSequenceClassification
from torch.utils.data import DataLoader, TensorDataset
import torch.nn.functional as F


Step 2: Load the Pre-Trained BERT Model and Tokenizer

We will use bert-base-uncased, a pretrained BERT model, and its tokenizer.

  • BertTokenizer.from_pretrained(pretrained_model_name): Loads the tokenizer for tokenizing text into input IDs.
  • BertForSequenceClassification.from_pretrained(pretrained_model_name, num_labels=2): Loads a BERT model for binary classification.
Python
pretrained_model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(pretrained_model_name)
model = BertForSequenceClassification.from_pretrained(pretrained_model_name,
                                                      num_labels=2)


Move the model to GPU if available:

Python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

Step 3: Prepare the Training Dataset

We create a labeled dataset for sentiment analysis. Here, 1 represents positive sentiment and 0 represents negative sentiment.

Python
train_texts = [
    "I love this product, it's amazing!",  # Positive
    "Absolutely fantastic experience, will buy again!",  # Positive
    "Worst purchase ever. Completely useless.",  # Negative
    "I hate this item, it doesn't work!",  # Negative
    "The quality is top-notch, highly recommend!",  # Positive
    "Terrible service, never coming back.",  # Negative
    "This is the best thing I've ever bought!",  # Positive
    "Very disappointing. Waste of money.",  # Negative
    "Superb! Exceeded all my expectations.",  # Positive
    "Not worth the price at all.",  # Negative
]
train_labels = torch.tensor([1, 1, 0, 0, 1, 0, 1, 0, 1, 0]).to(device)


Tokenize the Dataset

  • padding=True: Ensures all input sequences have the same length.
  • truncation=True: Truncates any sentence longer than max_length=128 tokens.
Python
encoded_train = tokenizer(train_texts,
                          padding=True,
                          truncation=True,
                          max_length=128,
                          return_tensors='pt')
train_input_ids = encoded_train['input_ids'].to(device)
train_attention_masks = encoded_train['attention_mask'].to(device)
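As an optional sanity check (not part of the original walkthrough), you can inspect the tokenizer output to confirm that every sentence was padded to a common length:

Python
# Each row is one sentence; all rows share the same padded length
print(train_input_ids.shape)       # e.g. torch.Size([10, 14]); the width depends on the longest sentence
print(train_attention_masks[0])    # 1s over real tokens, 0s over padding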

Step 4: Create a DataLoader for Efficient Training

Convert the data into a PyTorch DataLoader:

  • TensorDataset(): Combines input IDs, attention masks, and labels into a dataset.
  • DataLoader(): Loads data in mini-batches to improve efficiency.
Python
train_dataset = TensorDataset(train_input_ids, train_attention_masks, train_labels)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

Step 5: Define the Training Loop

Define the optimizer:

Python
optimizer = AdamW(model.parameters(), lr=2e-5) 
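Note that AdamW imported from transformers has been deprecated for some time and may be unavailable in newer releases of the library. If the import fails in your environment, PyTorch's own implementation is a drop-in replacement here:

Python
from torch.optim import AdamW  # PyTorch's AdamW optimizer

optimizer = AdamW(model.parameters(), lr=2e-5)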


Train the model:

Python
epochs = 5
model.train()

for epoch in range(epochs):
    total_loss = 0
    correct = 0
    total = 0

    for batch in train_loader:
        batch_input_ids, batch_attention_masks, batch_labels = batch

        optimizer.zero_grad()
        outputs = model(input_ids=batch_input_ids,
                        attention_mask=batch_attention_masks,
                        labels=batch_labels)

        loss = outputs.loss
        logits = outputs.logits

        total_loss += loss.item()
        loss.backward()
        optimizer.step()

        preds = torch.argmax(F.softmax(logits, dim=1), dim=1)
        correct += (preds == batch_labels).sum().item()
        total += batch_labels.size(0)

    avg_loss = total_loss / len(train_loader)
    accuracy = correct / total * 100
    print(f"Epoch {epoch+1} - Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%")


The model computes the loss and backpropagates gradients, and the loop tracks the accuracy for each epoch.
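A common refinement when fine-tuning BERT, not used in the loop above, is to decay the learning rate over training with a warmup schedule from transformers. A minimal sketch (call scheduler.step() right after optimizer.step()):

Python
from transformers import get_linear_schedule_with_warmup

# Optional: linear learning-rate decay with warmup over the whole training run
total_steps = len(train_loader) * epochs
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=0,
                                            num_training_steps=total_steps)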

Step 6: Save and Load the Fine-Tuned Model

Save the model:

Python
torch.save(model.state_dict(), "fine_tuned_bert.pth") 


Load the fine-tuned model later:

Python
model.load_state_dict(torch.load("fine_tuned_bert.pth"))
model.to(device)
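Alternatively (not shown in the original code), Hugging Face models can be saved and reloaded with save_pretrained / from_pretrained, which stores the model configuration and lets you keep the tokenizer alongside the weights:

Python
# Save weights, config, and tokenizer to a directory
model.save_pretrained("fine_tuned_bert")
tokenizer.save_pretrained("fine_tuned_bert")

# Reload later
model = BertForSequenceClassification.from_pretrained("fine_tuned_bert").to(device)
tokenizer = BertTokenizer.from_pretrained("fine_tuned_bert")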


Step 7: Evaluate on the Test Dataset

Define Test Samples

Python
test_texts = [
    "This is a great product, I love it!",  # Positive
    "Horrible experience, I want a refund!",  # Negative
    "Highly recommended! Five stars.",  # Positive
    "Not worth it. I regret buying this.",  # Negative
]
test_labels = torch.tensor([1, 0, 1, 0]).to(device)


Tokenize test data:

Python
encoded_test = tokenizer(test_texts,
                         padding=True,
                         truncation=True,
                         max_length=128,
                         return_tensors='pt')
test_input_ids = encoded_test['input_ids'].to(device)
test_attention_masks = encoded_test['attention_mask'].to(device)


Step 8: Make Predictions

Python
model.eval()
with torch.no_grad():
    outputs = model(input_ids=test_input_ids,
                    attention_mask=test_attention_masks)
    predicted_labels = torch.argmax(outputs.logits, dim=1)

test_accuracy = (predicted_labels == test_labels).sum().item() / len(test_labels) * 100
print(f"\nTest Accuracy: {test_accuracy:.2f}%")

for text, label in zip(test_texts, predicted_labels):
    print(f'Text: {text}\nPredicted Label: {label.item()}\n')

Output:

Epoch 1 - Loss: 0.8377, Accuracy: 50.00%
Epoch 2 - Loss: 0.6050, Accuracy: 50.00%
Epoch 3 - Loss: 0.4371, Accuracy: 90.00%
Epoch 4 - Loss: 0.3349, Accuracy: 100.00%
Epoch 5 - Loss: 0.2301, Accuracy: 100.00%

Test Accuracy: 75.00%
Text: This is a great product, I love it!
Predicted Label: 1

Text: Horrible experience, I want a refund!
Predicted Label: 1

Text: Highly recommended! Five stars.
Predicted Label: 1

Text: Not worth it. I regret buying this.
Predicted Label: 0
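To try the fine-tuned model on new text, a small helper like the sketch below wraps tokenization and prediction for a single sentence (the predict_sentiment name is illustrative and not part of the original code):

Python
def predict_sentiment(text):
    # Tokenize one sentence and run it through the fine-tuned model
    encoded = tokenizer(text, truncation=True, max_length=128, return_tensors='pt').to(device)
    model.eval()
    with torch.no_grad():
        logits = model(**encoded).logits
    return "Positive" if logits.argmax(dim=1).item() == 1 else "Negative"

print(predict_sentiment("The packaging was damaged and the item never worked."))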

Complete Code:

Python
import torch
import transformers
from transformers import AdamW, BertTokenizer, BertForSequenceClassification
from torch.utils.data import DataLoader, TensorDataset, random_split
import torch.nn.functional as F

# Load Pretrained BERT Tokenizer & Model
pretrained_model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(pretrained_model_name)
model = BertForSequenceClassification.from_pretrained(pretrained_model_name, num_labels=2)

# Move model to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Define the Training Dataset
train_texts = [
    "I love this product, it's amazing!",  # Positive
    "Absolutely fantastic experience, will buy again!",  # Positive
    "Worst purchase ever. Completely useless.",  # Negative
    "I hate this item, it doesn't work!",  # Negative
    "The quality is top-notch, highly recommend!",  # Positive
    "Terrible service, never coming back.",  # Negative
    "This is the best thing I've ever bought!",  # Positive
    "Very disappointing. Waste of money.",  # Negative
    "Superb! Exceeded all my expectations.",  # Positive
    "Not worth the price at all.",  # Negative
]
train_labels = torch.tensor([1, 1, 0, 0, 1, 0, 1, 0, 1, 0]).to(device)  # 1 = Positive, 0 = Negative

# Tokenize Training Data
encoded_train = tokenizer(train_texts, padding=True, truncation=True, max_length=128, return_tensors='pt')
train_input_ids = encoded_train['input_ids'].to(device)
train_attention_masks = encoded_train['attention_mask'].to(device)

# Create PyTorch Dataset & DataLoader
train_dataset = TensorDataset(train_input_ids, train_attention_masks, train_labels)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)  # Mini-batches of size 2

# Training Parameters
epochs = 5
optimizer = AdamW(model.parameters(), lr=2e-5)

# Training Loop with Mini-Batch Processing
model.train()
for epoch in range(epochs):
    total_loss = 0
    correct = 0
    total = 0

    for batch in train_loader:
        batch_input_ids, batch_attention_masks, batch_labels = batch

        optimizer.zero_grad()
        outputs = model(input_ids=batch_input_ids, attention_mask=batch_attention_masks, labels=batch_labels)

        loss = outputs.loss
        logits = outputs.logits

        total_loss += loss.item()
        loss.backward()
        optimizer.step()

        # Compute Training Accuracy
        preds = torch.argmax(F.softmax(logits, dim=1), dim=1)
        correct += (preds == batch_labels).sum().item()
        total += batch_labels.size(0)

    avg_loss = total_loss / len(train_loader)
    accuracy = correct / total * 100
    print(f"Epoch {epoch+1} - Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%")

# Save Fine-tuned Model
torch.save(model.state_dict(), "fine_tuned_bert.pth")

# Switch to Evaluation Mode
model.eval()

# Test Dataset
test_texts = [
    "This is a great product, I love it!",  # Positive
    "Horrible experience, I want a refund!",  # Negative
    "Highly recommended! Five stars.",  # Positive
    "Not worth it. I regret buying this.",  # Negative
]
test_labels = torch.tensor([1, 0, 1, 0]).to(device)

# Tokenize Test Data
encoded_test = tokenizer(test_texts, padding=True, truncation=True, max_length=128, return_tensors='pt')
test_input_ids = encoded_test['input_ids'].to(device)
test_attention_masks = encoded_test['attention_mask'].to(device)

# Run Model on Test Data
with torch.no_grad():
    outputs = model(input_ids=test_input_ids, attention_mask=test_attention_masks)
    predicted_labels = torch.argmax(outputs.logits, dim=1)

# Compute Test Accuracy
test_accuracy = (predicted_labels == test_labels).sum().item() / len(test_labels) * 100
print(f"\nTest Accuracy: {test_accuracy:.2f}%")

# Print Predictions
for text, label in zip(test_texts, predicted_labels):
    print(f'Text: {text}\nPredicted Label: {label.item()}\n')

In this tutorial, we fine-tuned a pretrained BERT model using transfer learning for sentiment analysis. The step-by-step process included:

  1. Loading the BERT model and tokenizer.
  2. Preparing a training dataset.
  3. Fine-tuning using mini-batch training.
  4. Evaluating the test accuracy.

This approach allows BERT to learn domain-specific knowledge while leveraging its powerful language understanding capabilities.

