
Text Summarization in NLP

Last Updated : 22 Jan, 2025

Automatic text summarization is a key technique in Natural Language Processing (NLP) that uses algorithms to condense long texts while preserving their essential information. Although it receives less attention than some other machine learning breakthroughs, text summarization technology has improved steadily. By extracting key concepts while keeping the original meaning, these systems can help fields such as banking, law, and healthcare process documents faster, supporting quicker decision-making and information retrieval.

There are two primary types of text summarization techniques:

  1. Extractive Summarization
  2. Abstractive Summarization

Extractive Summarization

Extractive summarization algorithms automatically generate summaries by selecting and combining key passages from the original text. Rather than writing new sentences as a human summarizer might, these models pick out the most important sentences and reuse them verbatim. The goal is to preserve the meaning of the original text while condensing it.
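To make the "select, don't generate" idea concrete, here is a minimal sketch of a very simple extractive baseline. It is not TextRank: it scores each sentence by the average corpus frequency of its words and keeps the top `k` sentences verbatim, in their original order. The naive regex sentence splitter, the tiny stop-word list, and the function names are assumptions made for illustration only.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use a fuller list)
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
              "be", "can", "with", "on", "for", "that", "this", "it", "as"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def words(sentence):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def frequency_summary(text, k=2):
    """Return the k highest-scoring sentences, verbatim and in original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in words(s))

    def score(sentence):
        toks = words(sentence)
        # Average word frequency, so long sentences are not automatically favoured
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


text = ("Cats are small mammals. Cats like to sleep. "
        "Dogs bark loudly. Cats and dogs can live together.")
print(frequency_summary(text, k=2))
# → ['Cats are small mammals.', 'Cats and dogs can live together.']
```

Every sentence in the output appears unchanged in the input, which is the defining property of extractive summarization; only the choice of which sentences to keep depends on the scoring method.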

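The TextRank algorithm is widely used for extractive summarization. It represents the text as a graph in which each sentence is a node and edges are weighted by how similar two sentences are, then applies a PageRank-style ranking so that sentences similar to many other important sentences score highest. The top-ranked sentences form a concise summary. Let's explore how this works with a sample text.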
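The ranking idea can be sketched in plain Python. This is a minimal, dependency-free sketch of sentence-level TextRank, assuming the word-overlap similarity from the original TextRank formulation (shared words divided by the sum of the logs of the two sentence lengths, here computed on unique words) and a damping factor of 0.85; the regex sentence splitter and all function names are illustrative assumptions. PyTextRank, used below, works somewhat differently: it ranks key phrases first and uses them to choose sentences.

```python
import math
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokens(sentence):
    """Set of lower-cased words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def similarity(a, b):
    """Word overlap normalised by the log of each sentence's unique word count."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank(sentences, d=0.85, iterations=50):
    """Weighted PageRank over a sentence-similarity graph (power iteration)."""
    toks = [tokens(s) for s in sentences]
    n = len(sentences)
    # Symmetric edge weights; no self-loops
    w = [[similarity(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours, split by their edge weights
        scores = [
            (1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Return the k highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with the rest of the text gets no incoming weight and keeps the minimum score of `1 - d`, so off-topic sentences naturally drop out of the summary.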

Utilizing TextRank Algorithm for Extractive Text Summarization

TextRank is not built into spaCy itself; it is provided by PyTextRank, a spaCy extension that adds TextRank as a pipeline component, so we can apply it to a document with just a few lines of code. Because the method is extractive, the summary consists of sentences taken verbatim from the original text; it does not generate new content.

Prerequisites:

  1. spaCy: A Python library for NLP tasks.
  2. PyTextRank: A spaCy extension that implements the TextRank algorithm.

To install spaCy and the required language model, run the following commands (the leading ! runs a shell command from a Jupyter or Colab notebook cell; omit it in a terminal):

!pip install spacy
!python3 -m spacy download en_core_web_lg

To install PyTextRank, run:

!pip install pytextrank

Here's a simple implementation of automatic text summarization with spaCy and PyTextRank. The code loads the spaCy language model, adds the TextRank component to the pipeline, and processes a long passage of text. It then builds a summary guided by the two top-ranked key phrases and returns at most two sentences.

Python
import spacy
import pytextrank  # registers the "textrank" pipeline component with spaCy

nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("textrank")

example_text = """Deep learning (also known as deep structured learning) is part of a
broader family of machine learning methods based on artificial neural networks with
representation learning. Learning can be supervised, semi-supervised or unsupervised.
Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning,
recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing,
machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to
and in some cases surpassing human expert performance. Artificial neural networks (ANNs) were inspired by information processing and distributed communication nodes in biological systems. ANNs have various differences from biological brains. Specifically,
neural networks tend to be static and symbolic, while the biological brain of most living organisms is dynamic (plastic) and analogue. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Early work showed that a linear perceptron cannot be a universal classifier,
but that a network with a nonpolynomial activation function with one hidden layer of unbounded width can. Deep learning is a modern variation which is concerned with an unbounded number of layers of bounded size,
which permits practical application and optimized implementation, while retaining theoretical universality
under mild conditions.
In deep learning the layers are also permitted to be heterogeneous and to deviate widely
from biologically informed connectionist models, for the sake of efficiency, trainability and understandability,
whence the structured part."""

# Length of the input in characters
print('Original Document Size:', len(example_text))
doc = nlp(example_text)

# Use the top 2 ranked phrases to score sentences; return at most 2 sentences
for sent in doc._.textrank.summary(limit_phrases=2, limit_sentences=2):
    print(sent)
    # len() of a spaCy Span counts tokens, not characters
    print('Summary Length:', len(sent))

Output:

Original Document Size: 1808
Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance.
Summary Length: 76
Specifically, neural networks tend to be static and symbolic, while the biological brain of most living organisms is dynamic (plastic) and analogue.
Summary Length: 27

Abstractive Summarization

Abstractive summarization generates entirely new sentences to convey key ideas from the original text. Unlike extractive summarization, which selects and rearranges sentences from the original content, abstractive methods rephrase information in a more concise and coherent manner, often using new vocabulary that wasn't present in the original.

Abstractive summarization has gained prominence with the advent of Transformer models. Early neural summarizers were built on recurrent neural networks (RNNs), which process text one token at a time. Transformers instead use self-attention to relate every token to every other token in parallel, which makes them better at capturing long-range context and more efficient to train on large corpora.

Note: Not all Transformer models are designed for text summarization. One of the most notable models built for it is PEGASUS, which achieved state-of-the-art results on a range of summarization benchmarks when it was introduced.

PEGASUS: A Transformer Model for Text Summarization

PEGASUS is a Transformer-based encoder-decoder model designed specifically for abstractive summarization. Its pre-training objective, called Gap Sentences Generation, removes whole sentences that are important to the document, replaces them with a mask token, and trains the model to generate the missing sentences from the remaining text. Because this task closely resembles summarization, the pre-trained model adapts well when fine-tuned on summarization datasets.
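The way a Gap Sentences Generation training pair is built can be sketched as follows. This is a simplified sketch, not the PEGASUS code: it roughly follows the "score each sentence independently against the rest of the document" idea, using a hand-rolled unigram F1 as a stand-in for ROUGE-1. The mask token string, the 30% gap ratio default, the regex sentence splitter, and all function names are assumptions for illustration.

```python
import re
from collections import Counter

MASK = "<mask_1>"  # placeholder sentence-mask token; name assumed for illustration


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def unigram_f1(candidate, reference):
    """Simplified ROUGE-1 F1 on lower-cased word counts."""
    c = Counter(re.findall(r"[a-z]+", candidate.lower()))
    r = Counter(re.findall(r"[a-z]+", reference.lower()))
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def make_gsg_example(text, gap_ratio=0.3):
    """Mask the most 'central' sentences; the target is the masked sentences."""
    sentences = split_sentences(text)
    m = max(1, round(len(sentences) * gap_ratio))
    # Score each sentence against the rest of the document
    scores = [
        unigram_f1(s, " ".join(sentences[:i] + sentences[i + 1:]))
        for i, s in enumerate(sentences)
    ]
    chosen = set(sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:m])
    source = " ".join(MASK if i in chosen else s for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in sorted(chosen))
    return source, target
```

The model would be trained to read `source` and produce `target`, so it learns to write sentences that capture what the rest of the document is about, which is essentially what a summary does.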

To use the PEGASUS model for text summarization, install the Hugging Face Transformers library, the PyTorch backend, and SentencePiece (which the PEGASUS tokenizer requires):

!pip install transformers sentencepiece torch

Once the dependencies are installed, you can summarize text with PEGASUS. The example below uses the Hugging Face Transformers library to load the google/pegasus-xsum checkpoint and its tokenizer, tokenize the input text, generate and decode a summary, and then produce a second summary through the higher-level summarization pipeline.

Python
from transformers import pipeline
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Pick model (PEGASUS fine-tuned on XSum, a dataset of one-sentence summaries)
model_name = "google/pegasus-xsum"
# Load pretrained tokenizer
pegasus_tokenizer = PegasusTokenizer.from_pretrained(model_name)

example_text = """Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning.
Learning can be supervised, semi-supervised or unsupervised. Deep-learning architectures such as
deep neural networks, deep belief networks, deep reinforcement learning,
recurrent neural networks and convolutional neural networks have been applied to
fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis,
material inspection and board game programs, where they have produced results
comparable to and in some cases surpassing human expert performance.
Artificial neural networks (ANNs) were inspired by information processing and
distributed communication nodes in biological systems. ANNs have various differences
from biological brains. Specifically, neural networks tend to be static and symbolic, while the biological brain of most living organisms is dynamic (plastic) and analogue. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Early work showed that a linear perceptron cannot be a universal classifier,
but that a network with a nonpolynomial activation function with one hidden layer of
unbounded width can. Deep learning is a modern variation which is concerned with an
unbounded number of layers of bounded size, which permits practical application and
optimized implementation, while retaining theoretical universality under mild conditions.
In deep learning the layers are also permitted to be heterogeneous and to deviate widely
from biologically informed connectionist models, for the sake of efficiency, trainability
and understandability, whence the structured part."""

print('Original Document Size:', len(example_text))

# Define PEGASUS model
pegasus_model = PegasusForConditionalGeneration.from_pretrained(model_name)

# Create tokens (inputs longer than the model's maximum length are truncated)
tokens = pegasus_tokenizer(example_text, truncation=True, padding="longest", return_tensors="pt")

# Generate the summary
encoded_summary = pegasus_model.generate(**tokens)

# Decode the summarized text
decoded_summary = pegasus_tokenizer.decode(encoded_summary[0], skip_special_tokens=True)

# Print the summary
print('Decoded Summary :', decoded_summary)

# The same model wrapped in the high-level summarization pipeline
summarizer = pipeline(
    "summarization",
    model=model_name,
    tokenizer=pegasus_tokenizer,
    framework="pt"
)

summary = summarizer(example_text, min_length=30, max_length=150)
# In a notebook, the value of the last expression is displayed
summary[0]["summary_text"]

Output:

Original Document Size: 1825
Decoded Summary : Deep learning is a branch of computer science that deals with the study and training of machine learning.
'Deep learning is a branch of computer science which deals with the study and training of complex systems such as speech recognition, natural language processing, machine translation and medical image analysis. Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks and neuralal networks have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance.'

Conclusion

The future of text summarization looks promising, with advances in both extractive methods and abstractive methods powered by models like PEGASUS. As these techniques mature, they are likely to produce more accurate and natural summaries, changing how we process large amounts of information. This progress highlights the growing potential of AI in enhancing human comprehension and knowledge management.
ashish_rao_2373