Object Detection using TensorFlow

Last Updated : 28 Apr, 2025

Detecting and locating objects within images or videos is a key task in computer vision. It is critical in a variety of applications, ranging from autonomous vehicles and surveillance systems to augmented reality and medical imaging. TensorFlow, Google's open-source machine learning framework, provides a robust collection of tools for developing and deploying object detection models.

In this article, we will go over the fundamentals of using TensorFlow for object detection. We load a model pre-trained on the COCO dataset, run it on an image, and draw the predicted bounding boxes and class labels on the result.

What is Object detection?

Object detection is a computer vision task that involves identifying and locating multiple objects within an image or video. The goal is not just to classify what is in the image but also to precisely outline and pinpoint where each object is located.

Key Concepts in Object Detection:

  • Bounding Boxes
    • Object detection involves drawing bounding boxes around detected objects. A bounding box is a rectangle that encloses an object and is defined by its coordinates—typically, (x_min, y_min) for the top-left corner and (x_max, y_max) for the bottom-right corner.
  • Object Localization
    • Localization is the process of determining the object's location within the image. It involves predicting the coordinates of the bounding box that encapsulates the object.
  • Class Prediction
    • Object detection not only locates objects but also categorizes them into different classes (e.g., person, car, dog). Each object is assigned a class label, providing information about what the object is.
  • Model Architectures
    • Numerous architectures are used for object detection, such as SSD (Single Shot Multibox Detector), Faster R-CNN (Region-based Convolutional Neural Network), and YOLO (You Only Look Once). These models differ in their approach to balancing speed and accuracy.
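The concepts above can be tied together in a small data structure. The following is a minimal sketch, not part of any TensorFlow API: the `Detection` class and the example values are hypothetical and only illustrate that one detection combines a bounding box, a class label, and a score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: a bounding box, a class label and a confidence score."""
    x_min: float  # left edge (top-left corner)
    y_min: float  # top edge (top-left corner)
    x_max: float  # right edge (bottom-right corner)
    y_max: float  # bottom edge (bottom-right corner)
    label: str
    score: float

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min

    @property
    def area(self) -> float:
        return self.width * self.height


# Hypothetical example: a dog occupying a 200 x 150 pixel region.
dog = Detection(x_min=50, y_min=80, x_max=250, y_max=230, label="dog", score=0.91)
print(dog.label, dog.width, dog.height, dog.area)  # dog 200 150 30000
```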

Object Detection using TensorFlow

Setting Up TensorFlow

Begin by installing TensorFlow using pip:

!pip install tensorflow

The leading ! runs the command from a notebook cell; in a terminal, drop it. If you have a compatible GPU, make sure the GPU libraries required by your TensorFlow version are installed, since GPU acceleration speeds up both training and inference.
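After installing, it can help to confirm which TensorFlow version is active and whether a GPU is visible. The sketch below is one way to check; the helper name `describe_tensorflow` is hypothetical, and it falls back to a plain message when TensorFlow is not installed.

```python
import importlib.util


def describe_tensorflow():
    """Return a short summary of the TensorFlow installation, if there is one."""
    if importlib.util.find_spec("tensorflow") is None:
        return "TensorFlow is not installed"
    import tensorflow as tf

    # list_physical_devices returns one entry per GPU that TensorFlow can see.
    gpus = tf.config.list_physical_devices("GPU")
    return f"TensorFlow {tf.__version__}, GPUs visible: {len(gpus)}"


print(describe_tensorflow())
```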

Choosing a Pre-trained Model

The TensorFlow 2 Detection Model Zoo provides models pre-trained on large datasets such as COCO (Common Objects in Context). These models can be used directly for inference or as a starting point for transfer learning. Available architectures include SSD (Single Shot MultiBox Detector), Faster R-CNN, EfficientDet, and CenterNet. For this tutorial we will use the ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 model.

Understanding the ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 Model

  • SSD (Single Shot Multibox Detector): SSD is a popular object detection algorithm known for its speed and accuracy. It's designed to detect objects of different scales and aspect ratios in a single pass.
  • MobileNetV2: MobileNetV2 is a lightweight neural network architecture optimized for mobile and edge devices. It strikes a balance between efficiency and performance, making it ideal for real-time applications.
  • 640x640: This denotes the input image size the model expects. Larger input sizes often yield more accurate results but require more computation, so inference takes longer. The models also differ widely in size on disk. For example:
    • centernet_hg104_1024x1024_coco17_tpu-32 is about 1.33 GB, and inference in Google Colab takes around 42 s
    • efficientdet_d1_coco17_tpu-32 (for 640x640 images) is about 50 MB, and inference takes around 4 s
    • ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 is about 19 MB, and inference takes around 0 s
    • In this comparison, the larger the model, the longer the inference time.
  • COCO (Common Objects in Context) Dataset: The COCO dataset is a large-scale dataset for object detection, segmentation, and captioning. It encompasses a diverse range of object categories and is widely used for training and evaluating computer vision models.
  • TPU-8 (Tensor Processing Unit - 8): TPUs are Google's custom hardware accelerators for machine learning workloads. The "8" refers to the number of TPU cores used to train the model. A TPU is not required to run the model; inference also works on a CPU or GPU.
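The naming scheme described above can be made concrete with a small parser. This is only an illustrative sketch: `parse_model_name` and its regular expression are hypothetical helpers written for the three model names mentioned in this article, not an official naming specification.

```python
import re

# Pattern assumed from the model names in this article:
# <architecture>[_<width>x<height>]_<dataset>_tpu-<cores>
MODEL_NAME_PATTERN = re.compile(
    r"^(?P<arch>.+?)"
    r"(?:_(?P<width>\d+)x(?P<height>\d+))?"
    r"_(?P<dataset>coco\d+)"
    r"_tpu-(?P<tpu_cores>\d+)$"
)


def parse_model_name(name):
    """Split a model name into architecture, input size, dataset and TPU core count."""
    match = MODEL_NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name}")
    parts = match.groupdict()
    for key in ("width", "height", "tpu_cores"):
        if parts[key] is not None:
            parts[key] = int(parts[key])
    return parts


print(parse_model_name("ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8"))
```

Names without an explicit input size, such as efficientdet_d1_coco17_tpu-32, yield None for the width and height.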

Now that we have everything needed, let's begin with the code:

Step 1: Import Libraries

First let's import the necessary libraries for TensorFlow, NumPy, OpenCV, Pillow, and Matplotlib.

Python3
import tensorflow as tf
import numpy as np
import cv2
from PIL import Image
from matplotlib import pyplot as plt
from random import randint

Step 2: Download, Extract and Load the Pre-trained Model

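Now download the model archive, extract it, and load the pre-trained model from TensorFlow's SavedModel format. The lines starting with ! are shell commands and work in a notebook environment such as Google Colab.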

Python3
# Download and extract the model archive (shell commands, notebook only)
!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz
!tar -xzvf ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz

# Load the extracted SavedModel
model = tf.saved_model.load("ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model")
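Outside a notebook, the ! shell commands are not available. The following sketch shows one way to do the same download and extraction with Python's standard library. The helper names `saved_model_dir` and `download_and_extract` are hypothetical; only the URL comes from the step above. Nothing is downloaded until `download_and_extract` is called.

```python
import tarfile
import urllib.request
from pathlib import Path

MODEL_URL = (
    "http://download.tensorflow.org/models/object_detection/tf2/20200711/"
    "ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz"
)


def saved_model_dir(url, root="."):
    """Derive the path of the extracted saved_model folder from the archive URL."""
    archive_name = url.rsplit("/", 1)[-1]
    model_name = archive_name[: -len(".tar.gz")]
    return Path(root) / model_name / "saved_model"


def download_and_extract(url, root="."):
    """Download the archive and extract it, skipping the work if already done."""
    target = saved_model_dir(url, root)
    if target.exists():
        return target
    archive_path = Path(root) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(root)
    return target


# Only the path is computed here; no network access happens.
print(saved_model_dir(MODEL_URL))
```

The returned path can then be passed to tf.saved_model.load, for example `tf.saved_model.load(str(download_and_extract(MODEL_URL)))`.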

Step 3: Load and Preprocess Image

In this step, we load an image, convert it to a NumPy array, and add a batch dimension. The model cannot work on an image file directly, so we convert the array into a uint8 tensor of shape (1, height, width, 3).

Python3
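# Force three color channels so grayscale or RGBA files also work
image = Image.open("detect.jpg").convert("RGB")
image_np = np.array(image)

# Add a batch dimension: (H, W, 3) -> (1, H, W, 3)
input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.uint8)

image  # Displays the image in a notebook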
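The shape handling in this step can be illustrated without loading an image. The sketch below works on shape tuples only; `batched_shape` is a hypothetical helper, and the requirement of exactly three channels is an assumption based on the RGB input used in this tutorial.

```python
def batched_shape(image_shape):
    """Return the shape after adding a batch axis, or raise if the image is not (H, W, 3)."""
    if len(image_shape) != 3 or image_shape[2] != 3:
        raise ValueError(f"expected an (H, W, 3) RGB image, got shape {image_shape}")
    # A single image becomes a batch of one.
    return (1, *image_shape)


print(batched_shape((480, 640, 3)))  # (1, 480, 640, 3)
```

A grayscale shape such as (480, 640) or an RGBA shape such as (480, 640, 4) raises a ValueError here, which is the kind of mismatch worth catching before calling the model.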

Output:

[Image: the input image, detect.jpg]

Step 5: Perform Object Detection

Here we use the loaded model to perform object detection on the input image and extract bounding box coordinates, class IDs, and scores.

Python3
detection = model(input_tensor)

# Parse the detection results
boxes = detection['detection_boxes'].numpy()
classes = detection['detection_classes'].numpy().astype(int)
scores = detection['detection_scores'].numpy()
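The three arrays are parallel: entry i of each one describes the same detected object, and each array has a leading batch axis of size 1. The sketch below shows this layout with made-up values in nested lists; `unpack_detections` is a hypothetical helper, and the numbers are not real model output.

```python
# Mock data with the same nesting as the model output: [batch][detection][...]
mock_boxes = [[[0.10, 0.20, 0.60, 0.70], [0.05, 0.05, 0.30, 0.25], [0.40, 0.50, 0.90, 0.95]]]
mock_classes = [[18, 1, 3]]
mock_scores = [[0.92, 0.64, 0.12]]


def unpack_detections(boxes, classes, scores):
    """Drop the batch axis and zip the three parallel arrays into per-object records."""
    return [
        {"box": tuple(box), "class_id": int(class_id), "score": float(score)}
        for box, class_id, score in zip(boxes[0], classes[0], scores[0])
    ]


records = unpack_detections(mock_boxes, mock_classes, mock_scores)
best = max(records, key=lambda record: record["score"])
print(len(records), best["class_id"], best["score"])  # 3 18 0.92
```

The same function should also accept the NumPy arrays from the step above, since it only relies on indexing and iteration.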

Step 6: Add the COCO Labels

These are the labels for the COCO dataset, which map class IDs to class names.

The model only returns the integer class IDs it was trained on, so we need this list to translate them into meaningful class names. The IDs follow the original COCO numbering, which runs from 1 to 90 and skips ten values (12, 26, 29, 30, 45, 66, 68, 69, 71 and 83). The list keeps an 'N/A' placeholder at each unused position so that labels[class_id] lines up with the ID the model returns.

Python3
labels = ['__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
          'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
          'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
          'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A', 'handbag',
          'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
          'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
          'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana',
          'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut',
          'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A', 'N/A',
          'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
          'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book', 'clock',
          'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
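Indexing a list with a class ID fails if the ID is larger than the list. The sketch below shows a defensive lookup; `class_name_for` and the shortened `sample_labels` list are hypothetical and only serve to illustrate the idea.

```python
def class_name_for(class_id, labels):
    """Return the class name for an ID, or a readable fallback if the ID is out of range."""
    if 0 <= class_id < len(labels):
        return labels[class_id]
    return f"unknown (id {class_id})"


# Shortened list for illustration; use the full labels list in practice.
sample_labels = ['__background__', 'person', 'bicycle', 'car']
print(class_name_for(3, sample_labels))   # car
print(class_name_for(42, sample_labels))  # unknown (id 42)
```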

Before going further let's learn about some concepts:

  • Confidence
    • Confidence in object detection represents how certain the model is about its predictions. It's like a measure of how sure the model is that it correctly identified an object in an image. Confidence values range from 0 to 1, where 1 means the model is very confident in its prediction.
  • Normalized Coordinates
    • Normalized coordinates are a way to describe the location of an object in an image in a standardized manner. Instead of using pixel values, which can vary based on image size, normalization scales coordinates to a consistent range, usually between 0 and 1.

Let's understand with an Analogy

Think of a treasure map. Instead of saying "walk 50 steps north," which depends on the map's size, you say "walk halfway up the map." Normalized coordinates provide a universal language for pinpointing locations.
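Both concepts can be shown with a small worked example. The sketch below uses made-up boxes and scores, and the helper names `to_pixel_box` and `confident_pixel_boxes` are hypothetical. It assumes boxes are given as (ymin, xmin, ymax, xmax) in normalized form, which is the order used in the visualization code in this article.

```python
def to_pixel_box(normalized_box, image_width, image_height):
    """Scale a normalized (ymin, xmin, ymax, xmax) box to integer pixel coordinates."""
    ymin, xmin, ymax, xmax = normalized_box
    return (
        int(xmin * image_width),
        int(ymin * image_height),
        int(xmax * image_width),
        int(ymax * image_height),
    )


def confident_pixel_boxes(boxes, scores, image_width, image_height, threshold=0.5):
    """Keep boxes whose score is above the threshold and convert them to pixels."""
    return [
        to_pixel_box(box, image_width, image_height)
        for box, score in zip(boxes, scores)
        if score > threshold
    ]


mock_boxes = [(0.25, 0.5, 0.75, 1.0), (0.1, 0.1, 0.2, 0.2)]
mock_scores = [0.9, 0.3]
print(confident_pixel_boxes(mock_boxes, mock_scores, image_width=640, image_height=480))
# [(320, 120, 640, 360)]
```

The second box is dropped because its score of 0.3 is below the threshold. The first covers the right half of the image horizontally and the middle half vertically, whatever the image size.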

Step 7: Visualize the detected objects

We iterate through the detected objects, filter out low-confidence detections, convert the normalized coordinates to pixels, look up the class names, and draw the result with randomly colored boxes. Adjust the confidence threshold (0.5 in this case) and other parameters as needed.

Python3
# Image size is the same for every detection, so read it once
h, w, _ = image_np.shape

for i in range(classes.shape[1]):
    class_id = int(classes[0, i])
    score = scores[0, i]

    if score > 0.5:  # Filter out low-confidence detections
        ymin, xmin, ymax, xmax = boxes[0, i]

        # Convert normalized coordinates to image coordinates
        xmin = int(xmin * w)
        xmax = int(xmax * w)
        ymin = int(ymin * h)
        ymax = int(ymax * h)

        # Get the class name from the labels list
        class_name = labels[class_id]

        # randint includes both endpoints, so 255 is the largest valid channel value
        random_color = (randint(0, 255), randint(0, 255), randint(0, 255))

        # Draw bounding box and label on the image
        cv2.rectangle(image_np, (xmin, ymin), (xmax, ymax), random_color, 2)
        label = f"Class: {class_name}, Score: {score:.2f}"
        cv2.putText(image_np, label, (xmin, ymin - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, random_color, 2)

# Display the result
plt.imshow(image_np)
plt.axis('off')
plt.show()
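Random colors change on every run and can differ between two objects of the same class. As an optional variation, the sketch below derives a repeatable color from the class ID. The helper `color_for_class` is hypothetical and is not part of the code above.

```python
import random


def color_for_class(class_id):
    """Return a repeatable (R, G, B) color for a class ID."""
    # Seeding a private generator with the class ID makes the color deterministic.
    rng = random.Random(class_id)
    return tuple(rng.randint(0, 255) for _ in range(3))


# The same class always gets the same color.
print(color_for_class(1) == color_for_class(1))  # True
```

In the drawing loop, `color_for_class(class_id)` could replace the random color so that, for example, every person box shares one color.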

Output:

[Image: the input image with the detected objects outlined by bounding boxes and labelled with class name and score]

Applications of object detection:

Object detection finds applications in diverse fields, including:

  • Autonomous Vehicles: Identifying pedestrians, other vehicles, and obstacles.
  • Surveillance Systems: Monitoring and tracking objects in real-time.
  • Medical Imaging: Detecting anomalies or specific structures in medical images.
  • Retail Analytics: Tracking products and customer behavior in stores.
  • Augmented Reality: Overlaying digital information on real-world objects.

Conclusion

Object detection with models like these opens doors to a myriad of applications. From autonomous vehicles and surveillance systems to retail analytics and augmented reality, the impact is profound. As technology advances, we can anticipate further developments in model architectures, dataset diversity, and real-time deployment, ushering in a new era of intelligent visual perception.


as904465