Object Detection with YOLO using TensorFlow

Last Updated : 15 Nov, 2024

YOLO, or "You Only Look Once," is a family of deep learning models that enable real-time object detection by treating the task as a single regression problem. Unlike traditional methods that apply detection across multiple regions of an image, YOLO detects objects in one pass, which makes it fast and efficient. The YOLOv3 model improves over earlier versions by introducing multi-scale predictions and a more powerful backbone, called Darknet-53.

In this article, we’ll explore how to implement object detection with YOLOv3 using TensorFlow.

Pre-requisites: Convolutional Neural Networks (CNNs), ResNet, TensorFlow

Key Features of YOLOv3 include:

  • Speed: Fast enough for real-time applications.
  • Accuracy: Maintains good accuracy while running at high speed.
  • Multi-scale Detection: Detects objects at different scales, making it robust for small and large objects.

Before diving into the implementation, let's have a look at the components of YOLOv3.

Key Components of YOLOv3

The key components of YOLOv3 are:

  • Darknet-53 Backbone: A feature extraction network composed of 53 convolutional layers.
  • Detection Heads: Three detection layers that enable multi-scale predictions.
  • Anchor Boxes: Predefined bounding boxes of different sizes used to detect objects at various scales.

Implementing Object Detection using YOLOv3 and TensorFlow

Step 1: Import Necessary Libraries

To build the YOLOv3 model, we first set up our environment with the libraries needed for data handling, image processing, model creation, and visualization.

You can install the libraries using the following commands:

pip install opencv-python
pip install tensorflow
pip install numpy pandas matplotlib

Let's go over these imports:

  • Data Handling: Libraries for data manipulation (numpy, pandas) and image file processing (cv2, os, glob).
  • Annotation Parsing: xml.etree.ElementTree for reading bounding box data in XML.
  • Visualization: matplotlib.pyplot to display images with bounding boxes.
  • TensorFlow & Keras: Key layers, regularizers, and losses for building and training the YOLOv3 model architecture.
Python
import glob
import os
import xml.etree.ElementTree as ET

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import (
    Add, Concatenate, Conv2D,
    Input, Lambda, LeakyReLU,
    MaxPool2D, UpSampling2D, ZeroPadding2D
)
from tensorflow.keras.regularizers import l2
from tensorflow.keras.losses import (
    binary_crossentropy,
    sparse_categorical_crossentropy
)

With these libraries set up, we are ready to start defining the architecture of our YOLOv3 model in the next steps.

Step 2: Define YOLOv3 Layers, Anchors, and Anchor Masks

In this step, we set up the key components for our YOLOv3 model:

  1. YOLOV3_LAYER_LIST: Key layer names for loading weights and managing the YOLOv3 architecture.
  2. yolo_anchors: Predefined bounding box sizes, normalized for three scales to detect small, medium, and large objects.
  3. yolo_anchor_masks: Groups of anchors for each detection scale, helping match objects of different sizes.
Python
YOLOV3_LAYER_LIST = [
    'yolo_darknet',       # Darknet feature extraction backbone
    'yolo_conv_0',        # Convolutional layers for detection head 0
    'yolo_output_0',      # Output layer for detection head 0
    'yolo_conv_1',        # Convolutional layers for detection head 1
    'yolo_output_1',      # Output layer for detection head 1
    'yolo_conv_2',        # Convolutional layers for detection head 2
    'yolo_output_2',      # Output layer for detection head 2
]

yolo_anchors = np.array([
    (10, 13), (16, 30), (33, 23),      # Small-scale anchor boxes
    (30, 61), (62, 45), (59, 119),     # Medium-scale anchor boxes
    (116, 90), (156, 198), (373, 326)  # Large-scale anchor boxes
], np.float32) / 416  # Normalize by dividing by input size (416)

# Anchor indices used by each detection head (coarsest grid first)
yolo_anchor_masks = np.array([[6, 7, 8], [3, 4, 5], [0, 1, 2]])
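
To make the masks concrete, here is a minimal sketch in plain Python (no TensorFlow needed) of which anchors each detection head ends up with. The grid sizes 13, 26, and 52 assume a 416x416 input with strides of 32, 16, and 8; the names anchors_px, grid_sizes, and anchors_per_head are illustrative and are not part of the article's code.

```python
# Anchor sizes in pixels for a 416x416 input, as listed above.
anchors_px = [
    (10, 13), (16, 30), (33, 23),
    (30, 61), (62, 45), (59, 119),
    (116, 90), (156, 198), (373, 326),
]
masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]

# Assumption: 416x416 input, so the three heads see 13x13, 26x26 and 52x52 grids.
grid_sizes = [13, 26, 52]

# Normalize to the [0, 1] range, mirroring the division by 416 above.
anchors = [(w / 416, h / 416) for w, h in anchors_px]

# Pair each grid with the anchors its mask selects.
anchors_per_head = {}
for grid, mask in zip(grid_sizes, masks):
    anchors_per_head[grid] = [anchors_px[i] for i in mask]
    print(f"{grid}x{grid} grid -> anchors (px): {anchors_per_head[grid]}")
```

The coarsest grid (13x13) is paired with the largest anchors and the finest grid (52x52) with the smallest, which is why the first mask is [6, 7, 8] rather than [0, 1, 2].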

Step 3: Define Class Names

This list contains the class labels for the 80 common object categories YOLOv3 can detect. Each entry represents a type of object the model is trained to recognize.

Python
class_names = [
    'person', 'bicycle', 'car', 'motorbike', 'aeroplane', 'bus', 'train', 'truck', 'boat',
    'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench',
    'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra',
    'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
    'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
    'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
    'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'sofa',
    'pottedplant', 'bed', 'diningtable', 'toilet', 'tvmonitor', 'laptop', 'mouse',
    'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
    'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
    'hair drier', 'toothbrush'
]

Step 6: Load Pretrained Darknet Weights

The load_darknet_weights function transfers pretrained weights from the original Darknet model into our TensorFlow/Keras YOLOv3 model. This allows us to use the knowledge learned by YOLOv3 during training without starting from scratch.

You can download the model weights from here.

Python
def load_darknet_weights(model, weights_file):
    # The context manager closes the file even if loading fails part-way
    with open(weights_file, 'rb') as wf:
        # Header: five int32 values
        major, minor, revision, seen, _ = np.fromfile(wf, dtype=np.int32, count=5)

        layers = YOLOV3_LAYER_LIST  # Sub-model names defined in Step 2

        for layer_name in layers:
            sub_model = model.get_layer(layer_name)
            for i, layer in enumerate(sub_model.layers):
                if not layer.name.startswith('conv2d'):
                    continue
                # A conv layer is followed by batch norm unless it is an output conv
                batch_norm = None
                if i + 1 < len(sub_model.layers) and \
                        sub_model.layers[i + 1].name.startswith('batch_norm'):
                    batch_norm = sub_model.layers[i + 1]

                filters = layer.filters
                size = layer.kernel_size[0]
                in_dim = layer.input.shape[-1]  # Number of input channels

                if batch_norm is None:
                    conv_bias = np.fromfile(wf, dtype=np.float32, count=filters)
                else:
                    # Darknet order: [beta, gamma, mean, variance]
                    bn_weights = np.fromfile(wf, dtype=np.float32, count=4 * filters)
                    # Keras order: [gamma, beta, mean, variance]
                    bn_weights = bn_weights.reshape((4, filters))[[1, 0, 2, 3]]

                # Darknet stores kernels as (out_dim, in_dim, height, width)
                conv_shape = (filters, in_dim, size, size)
                # np.prod replaces np.product, which was removed in NumPy 2.0
                conv_weights = np.fromfile(wf, dtype=np.float32, count=np.prod(conv_shape))
                # Keras expects (height, width, in_dim, out_dim)
                conv_weights = conv_weights.reshape(conv_shape).transpose([2, 3, 1, 0])

                if batch_norm is None:
                    layer.set_weights([conv_weights, conv_bias])
                else:
                    layer.set_weights([conv_weights])
                    batch_norm.set_weights(bn_weights)

        assert len(wf.read()) == 0, 'failed to read all data'
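
As a rough sanity check on the layout that the loader walks through, the sketch below counts how many float32 values one convolution occupies in the weights file. It only mirrors the count arguments passed to np.fromfile above; darknet_conv_param_count is a hypothetical helper, not part of the article's code or of any library.

```python
def darknet_conv_param_count(filters, in_dim, size, has_batch_norm):
    """Number of float32 values read for one conv layer (hypothetical helper)."""
    kernel = filters * in_dim * size * size
    if has_batch_norm:
        # beta, gamma, moving mean, moving variance: 4 values per filter
        return 4 * filters + kernel
    # plain bias: 1 value per filter
    return filters + kernel


# First backbone convolution: 32 filters, 3x3 kernel, RGB input, with batch norm.
first_conv = darknet_conv_param_count(32, 3, 3, has_batch_norm=True)

# Final 1x1 conv of the first detection head for 80 classes:
# 3 anchors * (80 + 5) = 255 filters, 1024 input channels, bias only.
output_conv = darknet_conv_param_count(255, 1024, 1, has_batch_norm=False)

print(first_conv, output_conv)
```

If the model definition and the weights file disagree about any layer, these counts drift apart and the final assertion in load_darknet_weights ('failed to read all data') is typically where the mismatch surfaces.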

Step 7: Calculate Intersection over Union (IoU) with Broadcasting

The broadcast_iou function calculates the Intersection over Union (IoU) between two sets of bounding boxes. IoU measures the overlap between two boxes, helping to determine how well the predicted box matches the actual object.

Python
def broadcast_iou(box_1, box_2):
    # Prepare for broadcasting
    box_1 = tf.expand_dims(box_1, -2)
    box_2 = tf.expand_dims(box_2, 0)

    # Determine the new shape to broadcast both box sets
    new_shape = tf.broadcast_dynamic_shape(tf.shape(box_1), tf.shape(box_2))
    box_1 = tf.broadcast_to(box_1, new_shape)
    box_2 = tf.broadcast_to(box_2, new_shape)

    # Calculate intersection width and height
    int_w = tf.maximum(tf.minimum(box_1[..., 2], box_2[..., 2]) -
                       tf.maximum(box_1[..., 0], box_2[..., 0]), 0)
    int_h = tf.maximum(tf.minimum(box_1[..., 3], box_2[..., 3]) -
                       tf.maximum(box_1[..., 1], box_2[..., 1]), 0)
    int_area = int_w * int_h  # Intersection area

    # Calculate the area of each box
    box_1_area = (box_1[..., 2] - box_1[..., 0]) * (box_1[..., 3] - box_1[..., 1])
    box_2_area = (box_2[..., 2] - box_2[..., 0]) * (box_2[..., 3] - box_2[..., 1])
    return int_area / (box_1_area + box_2_area - int_area)
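
For intuition, the same arithmetic can be worked through for a single pair of boxes. This is a plain-Python sketch without broadcasting; the function name iou and the sample boxes are illustrative only.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]), 0)
    inter_h = max(min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]), 0)
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 4 + 4 - 1 = 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))

# Disjoint boxes: the clamped intersection is 0, so the IoU is 0.
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))
```

broadcast_iou performs this computation for every pairing of a box in box_1 with a box in box_2 at once.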

Step 8: Freeze Model Layers

The freeze_all function sets the trainable property of the model's layers, allowing us to freeze or unfreeze them. It is useful in transfer learning when you want to keep certain weights constant during training.

Python
def freeze_all(model, frozen=True):
    model.trainable = not frozen
    # Recurse into nested models so every sub-layer is frozen as well
    if isinstance(model, tf.keras.Model):
        for l in model.layers:
            freeze_all(l, frozen)

Step 9: Draw Bounding Boxes on the Image

The draw_outputs function overlays bounding boxes, class names, and confidence scores onto the image. This is crucial for visualizing the detection results produced by the YOLOv3 model.

Python
def draw_outputs(img, outputs, class_names):
    boxes, objectness, classes, nums = outputs
    boxes, objectness, classes, nums = boxes[0], objectness[0], classes[0], nums[0]
    wh = np.flip(img.shape[0:2])  # (width, height) of the image

    for i in range(nums):
        # Convert normalized box coordinates to integer pixel values
        # (plain Python ints, which every OpenCV version accepts)
        x1y1 = tuple(int(v) for v in np.array(boxes[i][0:2]) * wh)
        x2y2 = tuple(int(v) for v in np.array(boxes[i][2:4]) * wh)

        # Draw the bounding box
        img = cv2.rectangle(img, x1y1, x2y2, (255, 0, 0), 2)

        # Label the box with class name and confidence score
        img = cv2.putText(img, '{} {:.4f}'.format(
            class_names[int(classes[i])], objectness[i]),
            x1y1, cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, (0, 0, 255), 2)
    return img

Step 10: Preprocess Images for YOLOv3 Model

The transform_images function resizes images to the input size required by the YOLOv3 model and scales pixel values from [0, 255] to [0, 1], preparing the images for processing.

Python
def transform_images(x_train, size):
    x_train = tf.image.resize(x_train, (size, size))
    x_train = x_train / 255  # Scale pixel values to [0, 1]
    return x_train

Step 11: Transform Target Labels for YOLOv3 Output

The transform_targets_for_output and transform_targets functions convert ground truth bounding boxes into a format compatible with the YOLOv3 output. This transformation aligns bounding boxes with specific grid cells and anchors in the model's output, essential for training.

transform_targets_for_output

This function aligns each bounding box with a grid cell and anchor, creating a target output that matches the YOLOv3 grid format.

  • Grid Cell Matching: Assigns bounding boxes to appropriate grid cells.
  • Anchor Matching: Matches each bounding box with the best-fitting anchor box.
  • Labeling: Adds the class label and presence flag to each box in the grid.
Python
@tf.function
def transform_targets_for_output(y_true, grid_size, anchor_idxs, classes):
    # y_true: (N, boxes, (x1, y1, x2, y2, class, best_anchor))
    N = tf.shape(y_true)[0]

    # y_true_out: (N, grid, grid, anchors, (x1, y1, x2, y2, obj, class))
    y_true_out = tf.zeros(
        (N, grid_size, grid_size, tf.shape(anchor_idxs)[0], 6))
    anchor_idxs = tf.cast(anchor_idxs, tf.int32)
    indexes = tf.TensorArray(tf.int32, 1, dynamic_size=True)
    updates = tf.TensorArray(tf.float32, 1, dynamic_size=True)
    idx = 0
    for i in tf.range(N):
        for j in tf.range(tf.shape(y_true)[1]):
            # Skip padding entries (boxes whose x2 is 0)
            if tf.equal(y_true[i][j][2], 0):
                continue
            anchor_eq = tf.equal(
                anchor_idxs, tf.cast(y_true[i][j][5], tf.int32))
            if tf.reduce_any(anchor_eq):
                box = y_true[i][j][0:4]
                box_xy = (y_true[i][j][0:2] + y_true[i][j][2:4]) / 2
                anchor_idx = tf.cast(tf.where(anchor_eq), tf.int32)
                grid_xy = tf.cast(box_xy // (1/grid_size), tf.int32)
                # Index order is (batch, grid row, grid column, anchor)
                indexes = indexes.write(
                    idx, [i, grid_xy[1], grid_xy[0], anchor_idx[0][0]])
                updates = updates.write(
                    idx, [box[0], box[1], box[2], box[3], 1, y_true[i][j][4]])
                idx += 1
    return tf.tensor_scatter_nd_update(
        y_true_out, indexes.stack(), updates.stack())
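
The grid cell matching step is easy to check by hand. The following plain-Python sketch finds the cell that contains a box center; grid_cell is a hypothetical helper, and it uses int(center * grid_size) in place of the floor division in the function above, which is the same idea.

```python
def grid_cell(box, grid_size):
    """Return the (column, row) of the grid cell containing the box center.

    box is (x1, y1, x2, y2) with coordinates normalized to [0, 1].
    """
    center_x = (box[0] + box[2]) / 2
    center_y = (box[1] + box[3]) / 2
    return int(center_x * grid_size), int(center_y * grid_size)


box = (0.2, 0.3, 0.4, 0.7)  # center at roughly (0.3, 0.5)
for grid_size in (13, 26, 52):
    print(grid_size, grid_cell(box, grid_size))
```

Note that the TensorFlow version writes the row (y) index before the column (x) index, because the output tensor is laid out as (batch, row, column, anchor).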

transform_targets

This function applies the transform_targets_for_output transformation across multiple detection scales. YOLOv3 detects objects at three scales (small, medium, large), each with its own grid size and set of anchor boxes.

  • Multi-Scale Matching: Applies the target transformation across three scales to detect objects of various sizes.
  • IoU Calculation: Calculates Intersection over Union (IoU) to determine the best anchor for each bounding box.
  • Return Format: Produces a tuple of transformed targets, one for each detection scale in YOLOv3.
Python
def transform_targets(y_train, anchors, anchor_masks, classes):
    y_outs = []
    grid_size = 13  # Coarsest grid for a 416x416 input

    # Find the best-matching anchor for each box by comparing shapes only
    anchors = tf.cast(anchors, tf.float32)
    anchor_area = anchors[..., 0] * anchors[..., 1]
    box_wh = y_train[..., 2:4] - y_train[..., 0:2]
    box_wh = tf.tile(tf.expand_dims(box_wh, -2), (1, 1, tf.shape(anchors)[0], 1))
    box_area = box_wh[..., 0] * box_wh[..., 1]
    intersection = tf.minimum(box_wh[..., 0], anchors[..., 0]) * \
        tf.minimum(box_wh[..., 1], anchors[..., 1])
    iou = intersection / (box_area + anchor_area - intersection)
    anchor_idx = tf.cast(tf.argmax(iou, axis=-1), tf.float32)
    anchor_idx = tf.expand_dims(anchor_idx, axis=-1)
    y_train = tf.concat([y_train, anchor_idx], axis=-1)

    # Apply transformation for each scale's grid size and anchors
    for anchor_idxs in anchor_masks:
        y_outs.append(transform_targets_for_output(
            y_train, grid_size, anchor_idxs, classes))
        grid_size *= 2
    return tuple(y_outs)
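
The anchor matching above compares shapes only: the box and each anchor are treated as if they shared a corner, so the intersection is simply min(width) * min(height). Here is a small plain-Python sketch of that rule; best_anchor and the sample box sizes are illustrative assumptions.

```python
anchors_px = [
    (10, 13), (16, 30), (33, 23),
    (30, 61), (62, 45), (59, 119),
    (116, 90), (156, 198), (373, 326),
]
anchors = [(w / 416, h / 416) for w, h in anchors_px]
masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]


def best_anchor(box_w, box_h):
    """Index of the anchor whose shape overlaps the box the most."""
    best_idx, best_iou = -1, -1.0
    for idx, (a_w, a_h) in enumerate(anchors):
        inter = min(box_w, a_w) * min(box_h, a_h)
        union = box_w * box_h + a_w * a_h - inter
        if inter / union > best_iou:
            best_idx, best_iou = idx, inter / union
    return best_idx


# A box covering 25% x 20% of the image (104 x 83.2 px at 416 px).
idx = best_anchor(0.25, 0.2)
# Which detection head (position in masks) owns that anchor?
head = next(h for h, mask in enumerate(masks) if idx in mask)
print(idx, head)
```

A box is then written only into the output of the head whose mask contains its best anchor, which is what the tf.reduce_any(anchor_eq) check in transform_targets_for_output enforces.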

Step 12: Define Custom Batch Normalization Layer

The BatchNormalization class subclasses the standard Keras batch normalization layer so that a frozen layer (trainable set to False) always runs in inference mode, using its stored moving mean and variance, even when the model is called with training=True. This keeps the statistics of frozen layers fixed during transfer learning.

Python
class BatchNormalization(tf.keras.layers.BatchNormalization):
    @tf.function
    def call(self, x, training=False):
        if training is None:
            training = tf.constant(False)
        # Use batch statistics only when training AND the layer is trainable
        training = tf.logical_and(training, self.trainable)
        return super(BatchNormalization, self).call(x, training=training)

Step 13: Build the Darknet Backbone for YOLOv3

The Darknet backbone is a series of convolutional blocks and residual connections designed for feature extraction. This backbone, initially developed for YOLO, is important for capturing spatial hierarchies in the input images.

DarknetConv: Convolutional Layer with Optional Batch Normalization

This function defines a convolutional layer with customizable parameters, including batch normalization and activation.

  • Customizable Convolution: Includes options for stride, padding, and batch normalization.
  • Activation: Uses LeakyReLU with a slope of 0.1, which lets a small fraction of each negative value through instead of zeroing it.
Python
def DarknetConv(x, filters, size, strides=1, batch_norm=True):
    if strides == 1:
        padding = 'same'
    else:
        x = ZeroPadding2D(((1, 0), (1, 0)))(x)  # top left half-padding
        padding = 'valid'
    x = Conv2D(filters=filters, kernel_size=size,
               strides=strides, padding=padding,
               use_bias=not batch_norm, kernel_regularizer=l2(0.0005))(x)
    if batch_norm:
        x = BatchNormalization()(x)
        x = LeakyReLU(alpha=0.1)(x)
    return x
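
The strided branch is worth a quick arithmetic check: padding one row and column on the top and left, then applying a 'valid' 3x3 convolution with stride 2, halves an even input size exactly. A minimal sketch follows, assuming a 416x416 input; strided_conv_output_size is a hypothetical helper.

```python
def strided_conv_output_size(n, kernel=3, stride=2):
    """Spatial size after top-left zero padding of 1 and a 'valid' strided conv."""
    padded = n + 1  # one extra row/column on the top/left only
    return (padded - kernel) // stride + 1


size = 416
sizes = [size]
for _ in range(5):  # the backbone downsamples five times
    size = strided_conv_output_size(size)
    sizes.append(size)

print(sizes)
```

The last three sizes (52, 26, 13) are the grids on which the three detection heads make their predictions.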

DarknetResidual: Residual Block

A residual block consists of two convolutions and a skip connection that adds the block's input to its output, which allows the model to combine newly computed features with information retained from previous layers.

Python
def DarknetResidual(x, filters):
    prev = x
    x = DarknetConv(x, filters // 2, 1)
    x = DarknetConv(x, filters, 3)
    x = Add()([prev, x])  # Skip connection: shapes of prev and x match
    return x

DarknetBlock: Stack of Residual Blocks

This function stacks multiple residual blocks, increasing feature extraction depth.

Python
def DarknetBlock(x, filters, blocks):
    x = DarknetConv(x, filters, 3, strides=2)  # Downsample by 2
    for _ in range(blocks):
        x = DarknetResidual(x, filters)
    return x


Darknet: Construct the Full Darknet Model

Combines DarknetConv and DarknetBlock layers to build the full backbone.

Python
def Darknet(name=None):
    x = inputs = Input([None, None, 3])
    x = DarknetConv(x, 32, 3)
    x = DarknetBlock(x, 64, 1)
    x = DarknetBlock(x, 128, 2)
    x = x_36 = DarknetBlock(x, 256, 8)  # skip connection to the finest head
    x = x_61 = DarknetBlock(x, 512, 8)  # skip connection to the middle head
    x = DarknetBlock(x, 1024, 4)
    return tf.keras.Model(inputs, (x_36, x_61, x), name=name)
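
The layer count and output resolutions of this backbone can be tallied from the block configuration alone. The sketch below does the bookkeeping in plain Python; block_config simply restates the DarknetBlock arguments used above, and the 416x416 input is an assumption.

```python
# (filters, number of residual blocks) for each DarknetBlock call above.
block_config = [(64, 1), (128, 2), (256, 8), (512, 8), (1024, 4)]

conv_layers = 1  # the initial 3x3 convolution with 32 filters
stride = 1
feature_maps = []
for filters, blocks in block_config:
    # One downsampling conv plus two convs per residual block.
    conv_layers += 1 + 2 * blocks
    stride *= 2
    feature_maps.append((filters, stride, 416 // stride))

print("convolutions:", conv_layers)
for filters, stride, grid in feature_maps[2:]:
    print(f"{filters} channels at stride {stride} -> {grid}x{grid} grid")
```

This tally gives 52 convolutions. As far as I know, the 53rd layer in the name is the fully connected classification layer of the original Darknet-53 classifier, which the detector does not use.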

Step 14: Build YOLOv3 Head and Model

The YOLOv3 model requires specialized components for handling multi-scale detections, including convolutional layers with skip connections, output layers for generating predictions, and non-max suppression for refining final detections.

Here’s how these components are defined:

1. YoloConv: Convolutional Layers with Skip Connections

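This function builds a stack of alternating 1x1 and 3x3 convolutional layers. When it receives a tuple, it first upsamples the coarser feature map by a factor of 2 and concatenates it with the skip connection from the backbone, so that each head combines coarse and fine features.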

Python
def YoloConv(x_in, filters, name=None):
    if isinstance(x_in, tuple):  # Skip connection
        inputs = Input(x_in[0].shape[1:]), Input(x_in[1].shape[1:])
        x, x_skip = inputs
        # Upsample the coarser map and concatenate with the backbone features
        x = DarknetConv(x, filters, 1)
        x = UpSampling2D(2)(x)
        x = Concatenate()([x, x_skip])
    else:
        x = inputs = Input(x_in.shape[1:])
    x = DarknetConv(x, filters, 1)
    x = DarknetConv(x, filters * 2, 3)
    x = DarknetConv(x, filters, 1)
    x = DarknetConv(x, filters * 2, 3)
    x = DarknetConv(x, filters, 1)
    return Model(inputs, x, name=name)(x_in)

2. YoloOutput: Final Prediction Layer

This layer produces the YOLOv3 output format, including bounding box coordinates, objectness scores, and class probabilities.

Python
def YoloOutput(x_in, filters, anchors, classes, name=None):
    x = inputs = Input(x_in.shape[1:])
    x = DarknetConv(x, filters * 2, 3)
    # One (x, y, w, h, objectness, class scores) vector per anchor
    x = DarknetConv(x, anchors * (classes + 5), 1, batch_norm=False)
    x = Lambda(lambda x: tf.reshape(
        x, (-1, tf.shape(x)[1], tf.shape(x)[2], anchors, classes + 5)))(x)
    return tf.keras.Model(inputs, x, name=name)(x_in)
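
A short calculation shows what these output shapes amount to for the 80-class model. This is a sketch assuming a 416x416 input; the variable names are illustrative.

```python
num_classes = 80
anchors_per_head = 3
values_per_anchor = num_classes + 5  # x, y, w, h, objectness + class scores

# Number of filters in the final 1x1 convolution of each head.
output_filters = anchors_per_head * values_per_anchor

# Assumption: 416x416 input, giving 13x13, 26x26 and 52x52 grids.
grid_sizes = [13, 26, 52]
shapes = [(g, g, anchors_per_head, values_per_anchor) for g in grid_sizes]
total_boxes = sum(g * g * anchors_per_head for g in grid_sizes)

print(output_filters)
print(shapes)
print(total_boxes)
```

Every one of these candidate boxes is scored, which is why a filtering step such as non-max suppression is needed before results are shown.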

3. yolo_boxes: Extract Bounding Boxes, Objectness, and Class Scores

This function decodes the YOLO output into bounding box coordinates, objectness scores, and class probabilities.

Python
def yolo_boxes(pred, anchors, classes):
    '''pred: (batch_size, grid, grid, anchors, (x, y, w, h, obj, ...classes))'''
    grid_size = tf.shape(pred)[1]
    box_xy, box_wh, objectness, class_probs = tf.split(
        pred, (2, 2, 1, classes), axis=-1)

    box_xy = tf.sigmoid(box_xy)
    objectness = tf.sigmoid(objectness)
    class_probs = tf.sigmoid(class_probs)
    pred_box = tf.concat((box_xy, box_wh), axis=-1)  # original xywh for loss

    grid = tf.meshgrid(tf.range(grid_size), tf.range(grid_size))
    grid = tf.expand_dims(tf.stack(grid, axis=-1), axis=2)  # [gx, gy, 1, 2]

    # Cell offset + cell index, normalized to [0, 1]
    box_xy = (box_xy + tf.cast(grid, tf.float32)) / \
        tf.cast(grid_size, tf.float32)
    box_wh = tf.exp(box_wh) * anchors

    box_x1y1 = box_xy - box_wh / 2
    box_x2y2 = box_xy + box_wh / 2
    bbox = tf.concat([box_x1y1, box_x2y2], axis=-1)
    return bbox, objectness, class_probs, pred_box
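
The decoding can be traced for a single prediction. Below is a plain-Python sketch of the same formulas (sigmoid for the center offset, exponential for the size); decode_box and the chosen cell and anchor are illustrative assumptions rather than part of the article's code.

```python
import math


def sigmoid(v):
    return 1 / (1 + math.exp(-v))


def decode_box(t_xy, t_wh, cell, grid_size, anchor):
    """Turn raw outputs for one anchor in one cell into (x1, y1, x2, y2)."""
    # Center: sigmoid offset inside the cell plus the cell index, normalized.
    cx = (sigmoid(t_xy[0]) + cell[0]) / grid_size
    cy = (sigmoid(t_xy[1]) + cell[1]) / grid_size
    # Size: the anchor scaled by exp of the raw output.
    w = math.exp(t_wh[0]) * anchor[0]
    h = math.exp(t_wh[1]) * anchor[1]
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# All-zero raw outputs in the middle cell of a 13x13 grid, anchor (116, 90) px.
anchor = (116 / 416, 90 / 416)
box = decode_box((0.0, 0.0), (0.0, 0.0), cell=(6, 6), grid_size=13, anchor=anchor)
print(box)
```

With all raw outputs at zero, the sigmoid gives 0.5 and the exponential gives 1, so the decoded box sits at the center of its cell and has exactly the anchor's width and height.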

4. yolo_nms: Non-Max Suppression for Final Output

This function applies non-max suppression to filter overlapping boxes, retaining only the most confident predictions.

Python
def yolo_nms(outputs, anchors, masks, classes):
    '''boxes, conf, type'''
    b, c, t = [], [], []

    # Flatten the grid and anchor dimensions of every scale
    for o in outputs:
        b.append(tf.reshape(o[0], (tf.shape(o[0])[0], -1, tf.shape(o[0])[-1])))
        c.append(tf.reshape(o[1], (tf.shape(o[1])[0], -1, tf.shape(o[1])[-1])))
        t.append(tf.reshape(o[2], (tf.shape(o[2])[0], -1, tf.shape(o[2])[-1])))

    bbox = tf.concat(b, axis=1)
    confidence = tf.concat(c, axis=1)
    class_probs = tf.concat(t, axis=1)

    scores = confidence * class_probs
    boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
        boxes=tf.reshape(bbox, (tf.shape(bbox)[0], -1, 1, 4)),
        scores=tf.reshape(
            scores,
            (tf.shape(scores)[0], -1, tf.shape(scores)[-1])
        ),
        max_output_size_per_class=100,
        max_total_size=100,
        iou_threshold=0.5,
        score_threshold=0.5
    )
    return boxes, scores, classes, valid_detections
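
To illustrate the idea behind non-max suppression, here is a simplified greedy version in plain Python. It is a class-agnostic sketch for a single image and is not how tf.image.combined_non_max_suppression is implemented (that function works per class and per batch); greedy_nms and the sample boxes are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def greedy_nms(boxes, scores, iou_threshold=0.5, score_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    # Drop low-scoring boxes, then visit the rest from best to worst.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [
    (0.10, 0.10, 0.50, 0.50),  # strong detection
    (0.12, 0.12, 0.52, 0.52),  # near-duplicate of the first box
    (0.60, 0.60, 0.90, 0.90),  # separate object
    (0.00, 0.00, 0.20, 0.20),  # low score, filtered out
]
scores = [0.90, 0.80, 0.70, 0.30]
kept = greedy_nms(boxes, scores)
print(kept)
```

The thresholds mirror the iou_threshold and score_threshold values of 0.5 passed to the TensorFlow call above.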

YoloV3: Construct the Full YOLOv3 Model

Combines the Darknet backbone, YoloConv, and YoloOutput layers to construct the full YOLOv3 model.

Python
def YoloV3(size=None, channels=3, anchors=yolo_anchors,
           masks=yolo_anchor_masks, classes=80, training=False):
    x = inputs = Input([size, size, channels])

    x_36, x_61, x = Darknet(name='yolo_darknet')(x)

    # Head 0: coarsest grid, largest anchors
    x = YoloConv(x, 512, name='yolo_conv_0')
    output_0 = YoloOutput(x, 512, len(masks[0]), classes, name='yolo_output_0')

    # Head 1: middle grid
    x = YoloConv((x, x_61), 256, name='yolo_conv_1')
    output_1 = YoloOutput(x, 256, len(masks[1]), classes, name='yolo_output_1')

    # Head 2: finest grid, smallest anchors
    x = YoloConv((x, x_36), 128, name='yolo_conv_2')
    output_2 = YoloOutput(x, 128, len(masks[2]), classes, name='yolo_output_2')

    if training:
        return Model(inputs, (output_0, output_1, output_2), name='yolov3')

    boxes_0 = Lambda(lambda x: yolo_boxes(x, anchors[masks[0]], classes),
                     name='yolo_boxes_0')(output_0)
    boxes_1 = Lambda(lambda x: yolo_boxes(x, anchors[masks[1]], classes),
                     name='yolo_boxes_1')(output_1)
    boxes_2 = Lambda(lambda x: yolo_boxes(x, anchors[masks[2]], classes),
                     name='yolo_boxes_2')(output_2)

    outputs = Lambda(lambda x: yolo_nms(x, anchors, masks, classes),
                     name='yolo_nms')((boxes_0[:3], boxes_1[:3], boxes_2[:3]))

    return Model(inputs, outputs, name='yolov3')

YOLOv3 Loss Function

The YoloLoss function computes the loss for the YOLOv3 model. It takes into account the localization (bounding box coordinates), confidence (objectness score), and classification errors.

Python
def YoloLoss(anchors, classes=80, ignore_thresh=0.5):
    def yolo_loss(y_true, y_pred):
        # 1. Transform predicted outputs
        pred_box, pred_obj, pred_class, pred_xywh = yolo_boxes(y_pred, anchors, classes)
        pred_xy = pred_xywh[..., 0:2]
        pred_wh = pred_xywh[..., 2:4]

        # 2. Transform true outputs
        true_box, true_obj, true_class_idx = tf.split(
            y_true, (4, 1, 1), axis=-1)
        true_xy = (true_box[..., 0:2] + true_box[..., 2:4]) / 2
        true_wh = true_box[..., 2:4] - true_box[..., 0:2]
        # give higher weights to small boxes
        box_loss_scale = 2 - true_wh[..., 0] * true_wh[..., 1]

        # 3. Adjust true outputs to match predicted grid format
        grid_size = tf.shape(y_true)[1]
        grid = tf.meshgrid(tf.range(grid_size), tf.range(grid_size))
        grid = tf.expand_dims(tf.stack(grid, axis=-1), axis=2)
        true_xy = true_xy * tf.cast(grid_size, tf.float32) - \
            tf.cast(grid, tf.float32)
        true_wh = tf.math.log(true_wh / anchors)
        true_wh = tf.where(tf.math.is_inf(true_wh), tf.zeros_like(true_wh), true_wh)

        # 4. Compute masks for objects and ignore regions
        obj_mask = tf.squeeze(true_obj, -1)
        true_box_flat = tf.boolean_mask(true_box, tf.cast(obj_mask, tf.bool))
        best_iou = tf.reduce_max(broadcast_iou(
            pred_box, true_box_flat), axis=-1)
        ignore_mask = tf.cast(best_iou < ignore_thresh, tf.float32)

        # 5. Calculate individual losses
        xy_loss = obj_mask * box_loss_scale * \
            tf.reduce_sum(tf.square(true_xy - pred_xy), axis=-1)
        wh_loss = obj_mask * box_loss_scale * \
            tf.reduce_sum(tf.square(true_wh - pred_wh), axis=-1)
        obj_loss = binary_crossentropy(true_obj, pred_obj)
        obj_loss = obj_mask * obj_loss + \
            (1 - obj_mask) * ignore_mask * obj_loss
        # Could also use binary_crossentropy instead
        class_loss = obj_mask * sparse_categorical_crossentropy(
            true_class_idx, pred_class)

        # 6. Sum up the losses
        xy_loss = tf.reduce_sum(xy_loss, axis=(1, 2, 3))
        wh_loss = tf.reduce_sum(wh_loss, axis=(1, 2, 3))
        obj_loss = tf.reduce_sum(obj_loss, axis=(1, 2, 3))
        class_loss = tf.reduce_sum(class_loss, axis=(1, 2, 3))
        return xy_loss + wh_loss + obj_loss + class_loss
    return yolo_loss
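
Steps 2 and 3 of the loss convert a ground-truth box into the same raw form as the network output, so the two can be compared directly. The sketch below reproduces that conversion for one box in plain Python; encode_target and the sample values are illustrative assumptions.

```python
import math


def encode_target(box, cell, grid_size, anchor):
    """Convert a normalized (x1, y1, x2, y2) box into loss targets."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    w = box[2] - box[0]
    h = box[3] - box[1]
    # Offset of the center inside its grid cell, in [0, 1).
    t_xy = (cx * grid_size - cell[0], cy * grid_size - cell[1])
    # Log-ratio of the box size to the anchor size (inverse of exp(t) * anchor).
    t_wh = (math.log(w / anchor[0]), math.log(h / anchor[1]))
    # Small boxes get a weight close to 2, large boxes a weight close to 1.
    box_loss_scale = 2 - w * h
    return t_xy, t_wh, box_loss_scale


anchor = (116 / 416, 90 / 416)
t_xy, t_wh, scale = encode_target(
    (0.4, 0.4, 0.6, 0.6), cell=(6, 6), grid_size=13, anchor=anchor)
print(t_xy, t_wh, scale)
```

The xy targets are compared against the sigmoid of the raw center outputs, and the wh targets against the raw size outputs, which matches the pred_xywh tensor returned by yolo_boxes.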

Instantiate and Summarize the Model

This builds and displays the YOLOv3 model, showing layer details and the number of parameters, ready for training or inference.

Python
yolo = YoloV3(size=416, classes=80)
yolo.summary()

Output:

yolov3
Total params: 62,001,757 (236.52 MB)
Trainable params: 61,949,149 (236.32 MB)
Non-trainable params: 52,608 (205.50 KB)

Step 15: Visualize the YOLOv3 Model Architecture

To better understand the model structure, we can generate a visual representation of the YOLOv3 architecture using plot_model from TensorFlow’s Keras utilities. This will produce an image that shows the layers and connections within the model.

Python
from tensorflow.keras.utils import plot_model

plot_model(
    yolo, rankdir='TB',
    to_file='yolo_model1.png',
    show_shapes=False,
    show_layer_names=True,
    expand_nested=False
)

Output:

YOLO Model

Step 16: Load Weights and Make Predictions with YOLOv3

This step involves loading pretrained weights into the YOLOv3 model and creating a predict function to make predictions on new images. The function processes an image, performs object detection, and visualizes the results.

Python
load_darknet_weights(yolo, '/content/yolov3.weights')

def predict(image_file, visualize=True, figsize=(16, 16)):
    # Load the image and preprocess it for the model
    img = tf.image.decode_image(open(image_file, 'rb').read(), channels=3)
    img = tf.expand_dims(img, 0)
    img = transform_images(img, 416)

    boxes, scores, classes, nums = yolo.predict(img)

    # Reload the original image in RGB order and draw the detections on it
    img = cv2.cvtColor(cv2.imread(image_file), cv2.COLOR_BGR2RGB)
    img = draw_outputs(img, (boxes, scores, classes, nums), class_names)
    if visualize:
        fig, axes = plt.subplots(figsize=figsize)
        plt.imshow(img)
        plt.show()
    return boxes, scores, classes, nums

image_file = '/content/Dog and Cat Image.jpg'

Step 18: Update the Prediction Function with Scaled Bounding Boxes

This updated predict function enhances the visualization by scaling bounding boxes back to pixel coordinates, providing a more accurate display of the detection results on the original image.

Python
import tensorflow as tf
import cv2
import matplotlib.pyplot as plt

# Updated prediction function with scaled bounding boxes
def predict(image_file, visualize=True, figsize=(16, 16)):
    # Load and preprocess the image
    img = tf.image.decode_image(open(image_file, 'rb').read(), channels=3)
    img = tf.expand_dims(img, 0)
    img = transform_images(img, 416)

    # Get predictions
    boxes, scores, classes, nums = yolo.predict(img)

    # Convert the original image for plotting
    img = cv2.cvtColor(cv2.imread(image_file), cv2.COLOR_BGR2RGB)
    img_height, img_width, _ = img.shape

    # Plotting the results with scaled boxes
    if visualize:
        plt.figure(figsize=figsize)

        for i in range(nums[0]):
            # Extract and scale box coordinates to pixel values.
            # yolo_boxes emits boxes as (x1, y1, x2, y2), and NMS keeps that order.
            x1, y1, x2, y2 = boxes[0][i]
            x1 = int(x1 * img_width)
            y1 = int(y1 * img_height)
            x2 = int(x2 * img_width)
            y2 = int(y2 * img_height)

            # Draw bounding box
            cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), 2)

            # Prepare label with class and score
            class_name = class_names[int(classes[0][i])]
            score = scores[0][i]
            label = f"{class_name}: {score:.2f}"

            # Display label above the box
            cv2.putText(img, label, (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)

        # Show the final image with annotations
        plt.imshow(img)
        plt.axis('off')
        plt.show()

    return boxes, scores, classes, nums

# Define the path to the image
image_file = '/content/lamb-8287365_1280.jpg'

# Run prediction and plot
predict(image_file)

Output:

Detected Objects in the Image

Complete Code

You can download the complete code from here.

Python
import numpy as np
import pandas as pd
import cv2, os, glob
import xml.etree.ElementTree as ET
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import (
    Add, Concatenate, Conv2D,
    Input, Lambda, LeakyReLU,
    MaxPool2D, UpSampling2D, ZeroPadding2D
)
from tensorflow.keras.regularizers import l2
from tensorflow.keras.losses import (
    binary_crossentropy,
    sparse_categorical_crossentropy
)

YOLOV3_LAYER_LIST = [
    'yolo_darknet',
    'yolo_conv_0',
    'yolo_output_0',
    'yolo_conv_1',
    'yolo_output_1',
    'yolo_conv_2',
    'yolo_output_2',
]

yolo_anchors = np.array([
    (10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
    (59, 119), (116, 90), (156, 198), (373, 326)],
    np.float32) / 416

yolo_anchor_masks = np.array([[6, 7, 8], [3, 4, 5], [0, 1, 2]])

class_names = [
    'person', 'bicycle', 'car', 'motorbike', 'aeroplane', 'bus', 'train', 'truck', 'boat',
    'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench',
    'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra',
    'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
    'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
    'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
    'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'sofa',
    'pottedplant', 'bed', 'diningtable', 'toilet', 'tvmonitor', 'laptop', 'mouse',
    'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
    'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
    'hair drier', 'toothbrush'
]

def load_darknet_weights(model, weights_file):
    # The context manager closes the file even if loading fails part-way
    with open(weights_file, 'rb') as wf:
        major, minor, revision, seen, _ = np.fromfile(wf, dtype=np.int32, count=5)

        layers = YOLOV3_LAYER_LIST  # Sub-model names defined above

        for layer_name in layers:
            sub_model = model.get_layer(layer_name)
            for i, layer in enumerate(sub_model.layers):
                if not layer.name.startswith('conv2d'):
                    continue
                batch_norm = None
                if i + 1 < len(sub_model.layers) and \
                        sub_model.layers[i + 1].name.startswith('batch_norm'):
                    batch_norm = sub_model.layers[i + 1]

                filters = layer.filters
                size = layer.kernel_size[0]
                in_dim = layer.input.shape[-1]  # Number of input channels

                if batch_norm is None:
                    conv_bias = np.fromfile(wf, dtype=np.float32, count=filters)
                else:
                    bn_weights = np.fromfile(wf, dtype=np.float32, count=4 * filters)
                    bn_weights = bn_weights.reshape((4, filters))[[1, 0, 2, 3]]

                conv_shape = (filters, in_dim, size, size)
                # np.prod replaces np.product, which was removed in NumPy 2.0
                conv_weights = np.fromfile(wf, dtype=np.float32, count=np.prod(conv_shape))
                conv_weights = conv_weights.reshape(conv_shape).transpose([2, 3, 1, 0])

                if batch_norm is None:
                    layer.set_weights([conv_weights, conv_bias])
                else:
                    layer.set_weights([conv_weights])
                    batch_norm.set_weights(bn_weights)

        assert len(wf.read()) == 0, 'failed to read all data'

def broadcast_iou(box_1, box_2):
    # broadcast boxes
    box_1 = tf.expand_dims(box_1, -2)
    box_2 = tf.expand_dims(box_2, 0)
    # new_shape: (..., N, (x1, y1, x2, y2))
    new_shape = tf.broadcast_dynamic_shape(tf.shape(box_1), tf.shape(box_2))
    box_1 = tf.broadcast_to(box_1, new_shape)
    box_2 = tf.broadcast_to(box_2, new_shape)
    int_w = tf.maximum(tf.minimum(box_1[..., 2], box_2[..., 2]) -
                       tf.maximum(box_1[..., 0], box_2[..., 0]), 0)
    int_h = tf.maximum(tf.minimum(box_1[..., 3], box_2[..., 3]) -
                       tf.maximum(box_1[..., 1], box_2[..., 1]), 0)
    int_area = int_w * int_h
    box_1_area = (box_1[..., 2] -
box_1[..., 0]) * (box_1[..., 3] - box_1[..., 1])     box_2_area = (box_2[..., 2] - box_2[..., 0]) * (box_2[..., 3] - box_2[..., 1])     return int_area / (box_1_area + box_2_area - int_area)    def freeze_all(model, frozen = True):     model.trainable = not frozen     if isinstance(model, tf.keras.Model):         for l in model.layers:             freeze_all(l, frozen)  def draw_outputs(img, outputs, class_names):     boxes, objectness, classes, nums = outputs     boxes, objectness, classes, nums = boxes[0], objectness[0], classes[0], nums[0]     wh = np.flip(img.shape[0:2])     for i in range(nums):         x1y1 = tuple((np.array(boxes[i][0:2]) * wh).astype(np.int32))         x2y2 = tuple((np.array(boxes[i][2:4]) * wh).astype(np.int32))         img = cv2.rectangle(img, x1y1, x2y2, (255, 0, 0), 2)         img = cv2.putText(img, '{} {:.4f}'.format(             class_names[int(classes[i])], objectness[i]),             x1y1, cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, (0, 0, 255), 2)     return img  def transform_images(x_train, size):     x_train = tf.image.resize(x_train, (size, size))     x_train = x_train / 255     return x_train   @tf.function def transform_targets_for_output(y_true, grid_size, anchor_idxs, classes):     N = tf.shape(y_true)[0]     y_true_out = tf.zeros(         (N, grid_size, grid_size, tf.shape(anchor_idxs)[0], 6))     anchor_idxs = tf.cast(anchor_idxs, tf.int32)     indexes = tf.TensorArray(tf.int32, 1, dynamic_size=True)     updates = tf.TensorArray(tf.float32, 1, dynamic_size=True)     idx = 0     for i in tf.range(N):         for j in tf.range(tf.shape(y_true)[1]):             if tf.equal(y_true[i][j][2], 0):                 continue             anchor_eq = tf.equal(                 anchor_idxs, tf.cast(y_true[i][j][5], tf.int32))             if tf.reduce_any(anchor_eq):                 box = y_true[i][j][0:4]                 box_xy = (y_true[i][j][0:2] + y_true[i][j][2:4]) / 2                 anchor_idx = tf.cast(tf.where(anchor_eq), tf.int32)      
           grid_xy = tf.cast(box_xy // (1/grid_size), tf.int32)                 indexes = indexes.write(                     idx, [i, grid_xy[1], grid_xy[0], anchor_idx[0][0]])                 updates = updates.write(                     idx, [box[0], box[1], box[2], box[3], 1, y_true[i][j][4]])                 idx += 1     return tf.tensor_scatter_nd_update(         y_true_out, indexes.stack(), updates.stack())  def transform_targets(y_train, anchors, anchor_masks, classes):     y_outs = []     grid_size = 13     anchors = tf.cast(anchors, tf.float32)     anchor_area = anchors[..., 0] * anchors[..., 1]     box_wh = y_train[..., 2:4] - y_train[..., 0:2]     box_wh = tf.tile(tf.expand_dims(box_wh, -2), (1, 1, tf.shape(anchors)[0], 1))     box_area = box_wh[..., 0] * box_wh[..., 1]     intersection = tf.minimum(box_wh[..., 0], anchors[..., 0]) * tf.minimum(box_wh[..., 1], anchors[..., 1])     iou = intersection / (box_area + anchor_area - intersection)     anchor_idx = tf.cast(tf.argmax(iou, axis=-1), tf.float32)     anchor_idx = tf.expand_dims(anchor_idx, axis=-1)     y_train = tf.concat([y_train, anchor_idx], axis=-1)     for anchor_idxs in anchor_masks:         y_outs.append(transform_targets_for_output(             y_train, grid_size, anchor_idxs, classes))         grid_size *= 2     return tuple(y_outs)     class BatchNormalization(tf.keras.layers.BatchNormalization):     @tf.function     def call(self, x, training=False):         if training is None:             training = tf.constant(False)         training = tf.logical_and(training, self.trainable)         return super(BatchNormalization, self).call(x, training=training)  def DarknetConv(x, filters, size, strides=1, batch_norm=True):     if strides == 1:         padding = 'same'     else:         x = ZeroPadding2D(((1, 0), (1, 0)))(x)  # top left half-padding         padding = 'valid'     x = Conv2D(filters=filters, kernel_size=size,                strides=strides, padding=padding,                use_bias=not 
batch_norm, kernel_regularizer=l2(0.0005))(x)     if batch_norm:         x = BatchNormalization()(x)         x = LeakyReLU(alpha=0.1)(x)     return x  def DarknetResidual(x, filters):     prev = x     x = DarknetConv(x, filters // 2, 1)     x = DarknetConv(x, filters, 3)     x = Add()([prev, x])  # Ensure addition is valid     return x  # Return the tensor  def DarknetBlock(x, filters, blocks):     x = DarknetConv(x, filters, 3, strides=2)     for _ in range(blocks):         x = DarknetResidual(x, filters)  # Ensure residual connection     return x  # Return the tensor  def Darknet(name=None):     x = inputs = Input([None, None, 3])     x = DarknetConv(x, 32, 3)     x = DarknetBlock(x, 64, 1)     x = DarknetBlock(x, 128, 2)  # skip connection     x = x_36 = DarknetBlock(x, 256, 8)  # skip connection     x = x_61 = DarknetBlock(x, 512, 8)     x = DarknetBlock(x, 1024, 4)     return tf.keras.Model(inputs, (x_36, x_61, x), name=name)    def YoloConv(x_in, filters, name=None):     if isinstance(x_in, tuple):         inputs = Input(x_in[0].shape[1:]), Input(x_in[1].shape[1:])         x, x_skip = inputs         # concat with skip connection         x = DarknetConv(x, filters, 1)         x = UpSampling2D(2)(x)         x = Concatenate()([x, x_skip])     else:         x = inputs = Input(x_in.shape[1:])     x = DarknetConv(x, filters, 1)     x = DarknetConv(x, filters * 2, 3)     x = DarknetConv(x, filters, 1)     x = DarknetConv(x, filters * 2, 3)     x = DarknetConv(x, filters, 1)     return Model(inputs, x, name=name)(x_in)    def YoloOutput(x_in, filters, anchors, classes, name=None):     x = inputs = Input(x_in.shape[1:])     x = DarknetConv(x, filters * 2, 3)     x = DarknetConv(x, anchors * (classes + 5), 1, batch_norm=False)     x = Lambda(lambda x: tf.reshape(x, (-1, tf.shape(x)[1], tf.shape(x)[2], anchors, classes + 5)))(x)     return tf.keras.Model(inputs, x, name=name)(x_in)    def yolo_boxes(pred, anchors, classes):     '''pred: (batch_size, grid, grid, anchors, 
(x, y, w, h, obj, ...classes))'''     grid_size = tf.shape(pred)[1]     box_xy, box_wh, objectness, class_probs = tf.split(         pred, (2, 2, 1, classes), axis=-1)     box_xy = tf.sigmoid(box_xy)     objectness = tf.sigmoid(objectness)     class_probs = tf.sigmoid(class_probs)     pred_box = tf.concat((box_xy, box_wh), axis=-1)  # original xywh for loss     grid = tf.meshgrid(tf.range(grid_size), tf.range(grid_size))     grid = tf.expand_dims(tf.stack(grid, axis=-1), axis=2)  # [gx, gy, 1, 2]     box_xy = (box_xy + tf.cast(grid, tf.float32)) / \         tf.cast(grid_size, tf.float32)     box_wh = tf.exp(box_wh) * anchors     box_x1y1 = box_xy - box_wh / 2     box_x2y2 = box_xy + box_wh / 2     bbox = tf.concat([box_x1y1, box_x2y2], axis=-1)     return bbox, objectness, class_probs, pred_box    def yolo_nms(outputs, anchors, masks, classes):     '''boxes, conf, type'''     b, c, t = [], [], []     for o in outputs:         b.append(tf.reshape(o[0], (tf.shape(o[0])[0], -1, tf.shape(o[0])[-1])))         c.append(tf.reshape(o[1], (tf.shape(o[1])[0], -1, tf.shape(o[1])[-1])))         t.append(tf.reshape(o[2], (tf.shape(o[2])[0], -1, tf.shape(o[2])[-1])))     bbox = tf.concat(b, axis=1)     confidence = tf.concat(c, axis=1)     class_probs = tf.concat(t, axis=1)     scores = confidence * class_probs     boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(         boxes=tf.reshape(bbox, (tf.shape(bbox)[0], -1, 1, 4)),         scores=tf.reshape(             scores,             (tf.shape(scores)[0], -1, tf.shape(scores)[-1])         ),         max_output_size_per_class=100,         max_total_size = 100,         iou_threshold = 0.5,         score_threshold = 0.5     )     return boxes, scores, classes, valid_detections    def YoloV3(size=None, channels=3, anchors=yolo_anchors, masks=yolo_anchor_masks, classes=80, training=False):     x = inputs = Input([size, size, channels])     x_36, x_61, x = Darknet(name='yolo_darknet')(x)     x = 
YoloConv(x, 512, name='yolo_conv_0')     output_0 = YoloOutput(x, 512, len(masks[0]), classes, name='yolo_output_0')     x = YoloConv((x, x_61), 256, name='yolo_conv_1')     output_1 = YoloOutput(x, 256, len(masks[1]), classes, name='yolo_output_1')     x = YoloConv((x, x_36), 128, name='yolo_conv_2')     output_2 = YoloOutput(x, 128, len(masks[2]), classes, name='yolo_output_2')     if training:         return Model(inputs, (output_0, output_1, output_2), name='yolov3')     boxes_0 = Lambda(lambda x: yolo_boxes(x, anchors[masks[0]], classes),                      name='yolo_boxes_0')(output_0)     boxes_1 = Lambda(lambda x: yolo_boxes(x, anchors[masks[1]], classes),                      name='yolo_boxes_1')(output_1)     boxes_2 = Lambda(lambda x: yolo_boxes(x, anchors[masks[2]], classes),                      name='yolo_boxes_2')(output_2)     outputs = Lambda(lambda x: yolo_nms(x, anchors, masks, classes),                      name='yolo_nms')((boxes_0[:3], boxes_1[:3], boxes_2[:3]))     return Model(inputs, outputs, name='yolov3')    def YoloLoss(anchors, classes=80, ignore_thresh=0.5):     def yolo_loss(y_true, y_pred):         # 1. transform all pred outputs         # y_pred: (batch_size, grid, grid, anchors, (x, y, w, h, obj, ...cls))         pred_box, pred_obj, pred_class, pred_xywh = yolo_boxes(y_pred, anchors, classes)         pred_xy = pred_xywh[..., 0:2]         pred_wh = pred_xywh[..., 2:4]                  # 2. transform all true outputs         # y_true: (batch_size, grid, grid, anchors, (x1, y1, x2, y2, obj, cls))         true_box, true_obj, true_class_idx = tf.split(             y_true, (4, 1, 1), axis=-1)         true_xy = (true_box[..., 0:2] + true_box[..., 2:4]) / 2         true_wh = true_box[..., 2:4] - true_box[..., 0:2]         # give higher weights to small boxes         box_loss_scale = 2 - true_wh[..., 0] * true_wh[..., 1]                  # 3. 
inverting the pred box equations         grid_size = tf.shape(y_true)[1]         grid = tf.meshgrid(tf.range(grid_size), tf.range(grid_size))         grid = tf.expand_dims(tf.stack(grid, axis=-1), axis=2)         true_xy = true_xy * tf.cast(grid_size, tf.float32) - \             tf.cast(grid, tf.float32)         true_wh = tf.math.log(true_wh / anchors)         true_wh = tf.where(tf.math.is_inf(true_wh), tf.zeros_like(true_wh), true_wh)                  # 4. calculate all masks         obj_mask = tf.squeeze(true_obj, -1)         # ignore false positive when iou is over threshold         true_box_flat = tf.boolean_mask(true_box, tf.cast(obj_mask, tf.bool))         best_iou = tf.reduce_max(broadcast_iou(             pred_box, true_box_flat), axis=-1)         ignore_mask = tf.cast(best_iou < ignore_thresh, tf.float32)                  # 5. calculate all losses         xy_loss = obj_mask * box_loss_scale * \             tf.reduce_sum(tf.square(true_xy - pred_xy), axis=-1)         wh_loss = obj_mask * box_loss_scale * \             tf.reduce_sum(tf.square(true_wh - pred_wh), axis=-1)         obj_loss = binary_crossentropy(true_obj, pred_obj)         obj_loss = obj_mask * obj_loss + \             (1 - obj_mask) * ignore_mask * obj_loss         # Could also use binary_crossentropy instead         class_loss = obj_mask * sparse_categorical_crossentropy(             true_class_idx, pred_class)                  # 6. 
sum over (batch, gridx, gridy, anchors) => (batch, 1)         xy_loss = tf.reduce_sum(xy_loss, axis=(1, 2, 3))         wh_loss = tf.reduce_sum(wh_loss, axis=(1, 2, 3))         obj_loss = tf.reduce_sum(obj_loss, axis=(1, 2, 3))         class_loss = tf.reduce_sum(class_loss, axis=(1, 2, 3))         return xy_loss + wh_loss + obj_loss + class_loss     return yolo_loss    yolo = YoloV3(size=416, classes=80) yolo.summary()  from tensorflow.keras.utils import plot_model  plot_model(     yolo, rankdir = 'TB',     to_file = 'yolo_model1.png',     show_shapes = False,     show_layer_names = True,     expand_nested = False )  load_darknet_weights(yolo, '/content/yolov3.weights') def predict(image_file, visualize = True, figsize = (16, 16)):     img = tf.image.decode_image(open(image_file, 'rb').read(), channels=3)     img = tf.expand_dims(img, 0)     img = transform_images(img, 416)     boxes, scores, classes, nums = yolo.predict(img)     img = cv2.cvtColor(cv2.imread(image_file), cv2.COLOR_BGR2RGB)     img = draw_outputs(img, (boxes, scores, classes, nums), class_names)     if visualize:         fig, axes = plt.subplots(figsize = figsize)         plt.imshow(img)         plt.show()     return boxes, scores, classes, nums image_file = '/content/Dog and Cat Image.jpg'  import tensorflow as tf import cv2 import matplotlib.pyplot as plt  # Updated prediction function with scaled bounding boxes def predict(image_file, visualize=True, figsize=(16, 16)):     # Load and preprocess the image     img = tf.image.decode_image(open(image_file, 'rb').read(), channels=3)     img = tf.expand_dims(img, 0)     img = transform_images(img, 416)          # Get predictions     boxes, scores, classes, nums = yolo.predict(img)          # Convert the original image for plotting     img = cv2.cvtColor(cv2.imread(image_file), cv2.COLOR_BGR2RGB)     img_height, img_width, _ = img.shape          # Plotting the results with scaled boxes     if visualize:         plt.figure(figsize=figsize)                
  for i in range(nums[0]):             # Extract and scale box coordinates to pixel values             y1, x1, y2, x2 = boxes[0][i]             x1 = int(x1 * img_width)             y1 = int(y1 * img_height)             x2 = int(x2 * img_width)             y2 = int(y2 * img_height)                          # Draw bounding box             cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), 2)                          # Prepare label with class and score             class_name = class_names[int(classes[0][i])]             score = scores[0][i]             label = f"{class_name}: {score:.2f}"                          # Display label above the box             cv2.putText(img, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)                  # Show the final image with annotations         plt.imshow(img)         plt.axis('off')         plt.show()          return boxes, scores, classes, nums  # Define the path to the image image_file = '/content/lamb-8287365_1280.jpg'  # Run prediction and plot predict(image_file) 
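
To make the box-decoding step in yolo_boxes easier to follow, here is a minimal pure-Python sketch of the same arithmetic for a single grid cell and a single anchor. The helper names sigmoid and decode_cell and the sample input are hypothetical, chosen only for illustration; the listing above performs this computation on whole tensors with TensorFlow.

```python
import math

def sigmoid(v):
    """Logistic function, squashing v into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def decode_cell(raw_xywh, cell_x, cell_y, grid_size, anchor_wh):
    """Decode one raw (tx, ty, tw, th) prediction into a normalized
    (x1, y1, x2, y2) box, using the same arithmetic as yolo_boxes."""
    tx, ty, tw, th = raw_xywh
    # Centre: sigmoid offset inside the cell, shifted by the cell index,
    # then divided by the grid size so the result is relative to the image.
    cx = (sigmoid(tx) + cell_x) / grid_size
    cy = (sigmoid(ty) + cell_y) / grid_size
    # Size: the anchor (already divided by 416) scaled by exp(raw value).
    w = math.exp(tw) * anchor_wh[0]
    h = math.exp(th) * anchor_wh[1]
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# Hypothetical input: all-zero raw outputs in the centre cell of the 13x13 grid,
# paired with the (116, 90) anchor from yolo_anchors.
anchor = (116 / 416, 90 / 416)
box = decode_cell((0.0, 0.0, 0.0, 0.0), cell_x=6, cell_y=6,
                  grid_size=13, anchor_wh=anchor)
print(box)
```

With all-zero raw outputs the sigmoid gives 0.5 and the exponential gives 1, so the decoded box sits at the centre of the image and has exactly the anchor's width and height. This is why the anchors act as the default box shapes that the network only needs to adjust.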

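Both the non-max suppression step (iou_threshold = 0.5 in yolo_nms) and the ignore mask in YoloLoss depend on intersection over union (IoU). As a rough illustration of what broadcast_iou computes for each pair of boxes, the following sketch handles two boxes in plain Python. The function name iou and the sample boxes are hypothetical, and the zero-union guard is an addition that the TensorFlow version above does not have.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Overlap along each axis, clamped at zero when the boxes do not touch.
    inter_w = max(min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]), 0.0)
    inter_h = max(min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]), 0.0)
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    # Guard against degenerate (zero-area) boxes; an assumption of this sketch.
    return inter / union if union > 0 else 0.0

# Hypothetical normalized boxes
a = (0.0, 0.0, 0.5, 0.5)
b = (0.25, 0.25, 0.75, 0.75)
c = (0.6, 0.6, 0.9, 0.9)

overlap = iou(a, b)    # partial overlap
same = iou(a, a)       # identical boxes
disjoint = iou(a, c)   # no overlap
print(overlap, same, disjoint)
```

Here the partially overlapping pair shares an area of 0.0625 out of a union of 0.4375, giving an IoU of 1/7. That is well below the 0.5 threshold, so non-max suppression would keep both boxes as separate detections.
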
Conclusion

In this article, we built the YOLOv3 architecture in TensorFlow, loaded pretrained Darknet weights into it, and visualized the resulting detections on sample images. Because YOLOv3 predicts boxes at three scales in a single forward pass, it is a practical choice for real-time object detection.



arundhutichak