TF-IDF Representations in TensorFlow

Last Updated : 12 Feb, 2025

Text data is one of the most common forms of unstructured data, and converting it into a numerical representation is essential for machine learning models.

Term Frequency-Inverse Document Frequency (TF-IDF) is a widely used text vectorization technique that scores each word in a document by how important it is to that document relative to a collection (corpus) of documents. It consists of two components:

  1. Term Frequency (TF): Measures how often a word appears in a document.
    $$\text{TF}(w) = \frac{\text{number of times } w \text{ appears in the document}}{\text{total number of words in the document}}$$
  2. Inverse Document Frequency (IDF): Measures how rare, and therefore how informative, a word is across the documents of the corpus.
    $$\text{IDF}(w) = \log \left( \frac{\text{total number of documents}}{\text{number of documents containing } w} + 1 \right)$$

The final TF-IDF score is calculated as:

$$\text{TF-IDF}(w) = \text{TF}(w) \times \text{IDF}(w)$$

Words that appear frequently in a document but are rare across the corpus will have higher TF-IDF scores.
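To make the formulas concrete, here is a minimal pure-Python sketch that applies them to two pre-tokenized toy documents. It assumes the natural logarithm (the IDF formula does not fix a base), and the function name `tf_idf` is illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """TF-IDF using the TF and IDF formulas above.

    Assumptions: each document is already a list of words, and log is the
    natural logarithm. Returns one {word: score} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)  # total number of words in this document
        scores.append({
            word: (count / total) * math.log(n_docs / df[word] + 1)
            for word, count in counts.items()
        })
    return scores


docs = [
    ["machine", "learning", "is", "fun"],
    ["deep", "learning", "is", "a", "subset", "of", "machine", "learning"],
]
scores = tf_idf(docs)
for row in scores:
    print({word: round(score, 3) for word, score in row.items()})
```

Here "fun", which occurs only in the first document, scores higher than "machine", which occurs in both. Because of the +1 inside the logarithm, a word that appears in every document still keeps a small positive IDF (log 2) instead of dropping to zero.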

Implementing TF-IDF in TensorFlow

TensorFlow's Keras preprocessing layers cover common text preprocessing steps, including TF-IDF encoding. We will use the tf.keras.layers.TextVectorization layer to compute TF-IDF features.

Step 1: Import Required Libraries

```python
import tensorflow as tf
import numpy as np
```


Step 2: Prepare the Dataset

```python
corpus = [
    "TensorFlow is an open-source machine learning framework.",
    "Machine learning models improve by training on data.",
    "Deep learning is a subset of machine learning.",
    "TF-IDF helps in text vectorization for NLP tasks.",
]
```

Step 3: Create a TextVectorization Layer with TF-IDF Mode

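TensorFlow’s TextVectorization layer can compute TF-IDF values automatically: adapt() learns the vocabulary and the document frequency of each token, and calling the layer returns one TF-IDF vector per input string. Its weighting differs slightly from the formulas above: TF is the raw count of a token in the document (not divided by the document length), and, in current Keras versions, IDF is computed as log(1 + N / (1 + df)), where N is the number of documents and df is the number of documents containing the token.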

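```python
vectorizer = tf.keras.layers.TextVectorization(
    output_mode="tf_idf",  # emit TF-IDF weights instead of integer token ids
    ngrams=None,           # unigrams only (the default)
)

# Adapting the vectorizer to the corpus builds the vocabulary and the IDF weights
vectorizer.adapt(corpus)
```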
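TextVectorization also standardizes and splits the text before counting. The sketch below reproduces in plain Python what the layer is assumed to compute for this corpus with its default settings: lowercasing and punctuation stripping (standardize="lower_and_strip_punctuation"), whitespace splitting (split="whitespace"), raw term counts, and IDF = log(1 + N / (1 + df)). It is a hypothetical cross-check with made-up helper names, not the layer's actual code, and it ignores the layer's vocabulary ordering and its reserved out-of-vocabulary column.

```python
import math
import re
import string
from collections import Counter

# Approximates the default standardize="lower_and_strip_punctuation":
# lowercase the text, then delete ASCII punctuation ("TF-IDF" -> "tfidf").
_PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")


def standardize_and_split(text):
    """Lowercase, strip punctuation, then split on whitespace."""
    return _PUNCT_RE.sub("", text.lower()).split()


def keras_style_tf_idf(corpus):
    """Hypothetical re-implementation of the assumed "tf_idf" weighting.

    Returns (weights, idf): one {token: raw_count * idf} dict per document,
    plus the {token: log(1 + N / (1 + df))} table learned from the corpus.
    """
    docs = [standardize_and_split(text) for text in corpus]
    n_docs = len(docs)
    # Document frequency: how many documents contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(1 + n_docs / (1 + count)) for tok, count in df.items()}
    weights = [
        {tok: count * idf[tok] for tok, count in Counter(doc).items()}
        for doc in docs
    ]
    return weights, idf


corpus = [
    "TensorFlow is an open-source machine learning framework.",
    "Machine learning models improve by training on data.",
    "Deep learning is a subset of machine learning.",
    "TF-IDF helps in text vectorization for NLP tasks.",
]
weights, idf = keras_style_tf_idf(corpus)
print(weights[2])  # "learning" occurs twice in this document
```

Note the effect of punctuation stripping: "open-source" and "TF-IDF" become the tokens "opensource" and "tfidf".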

Step 4: Convert Text to TF-IDF Representation

```python
tfidf_matrix = vectorizer(corpus)
tfidf_matrix_np = tfidf_matrix.numpy()

# Print the TF-IDF matrix
print(tfidf_matrix_np)
```

Output:

[Image: the printed TF-IDF matrix]

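Each row of the TF-IDF matrix corresponds to one document in the corpus, and each column corresponds to one vocabulary entry, in the order returned by vectorizer.get_vocabulary() (the first column is reserved for the out-of-vocabulary token "[UNK]"). Each value is the TF-IDF weight of that token in that document; tokens that do not occur in a document get a weight of zero.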
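To see which words drive a document's vector, pair a row of the matrix with `vectorizer.get_vocabulary()`, whose entries label the columns in order. The helper `top_terms` below is a hypothetical sketch; the toy vocabulary and scores are made-up stand-ins, not output from the code above.

```python
def top_terms(row, vocabulary, k=3):
    """Return the k highest-scoring (term, score) pairs for one document row.

    `row` is one row of the TF-IDF matrix (a list or NumPy array) and
    `vocabulary` is the list of column labels, e.g. vectorizer.get_vocabulary().
    Zero-weight terms (absent from the document) are skipped.
    """
    pairs = [(term, float(score)) for term, score in zip(vocabulary, row) if score > 0]
    # Highest score first; ties keep vocabulary order because sort is stable.
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:k]


# Made-up stand-ins for vectorizer.get_vocabulary() and tfidf_matrix_np[i].
vocab = ["[UNK]", "learning", "machine", "deep", "subset"]
row = [0.0, 1.39, 0.69, 1.10, 1.10]
print(top_terms(row, vocab, k=2))  # [('learning', 1.39), ('deep', 1.1)]
```

With the real layer, `top_terms(tfidf_matrix_np[2], vectorizer.get_vocabulary())` would list the strongest terms of the third document.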

Advantages of Using TensorFlow for TF-IDF

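  • Scalability: adapt() accepts a tf.data.Dataset as well as in-memory data, and the layer can be mapped over batched datasets, so large corpora can be streamed in batches instead of being loaded all at once.
  • Ease of Integration: Works with other TensorFlow components such as tf.data pipelines and Keras models.
  • Customization: The standardize, split, ngrams, and max_tokens arguments control lowercasing, punctuation stripping, tokenization, n-gram generation, and vocabulary size, and the resulting TF-IDF features can be fed directly into deep learning models.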

TF-IDF is a fundamental technique for representing text in a way that emphasizes important words. TensorFlow’s TextVectorization layer simplifies TF-IDF computation, making it a great choice for NLP applications. With this approach, you can efficiently preprocess text and feed it into machine learning models for tasks like classification, clustering, and information retrieval.
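For information retrieval, one common choice (not something the layer does itself) is to rank documents by the cosine similarity between their TF-IDF vectors and a query vector produced by the same adapted layer. A minimal pure-Python sketch, with hypothetical helper names and made-up 3-term vectors:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, doc) for doc in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Made-up vectors; in practice these would be rows of the layer's output.
doc_vecs = [[1.0, 0.0, 0.5], [0.0, 2.0, 0.0], [0.9, 0.1, 0.4]]
query_vec = [1.0, 0.0, 0.5]
print(rank_documents(query_vec, doc_vecs))  # [0, 2, 1]
```

Cosine similarity is unaffected by uniformly scaling a vector, so a document is not ranked higher merely because all of its raw counts are larger.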


sanjulika_sharma
Article Tags :
  • NLP
  • AI-ML-DS
  • Tensorflow
  • AI-ML-DS With Python
