
Computer Vision with PyTorch

Last Updated : 28 Mar, 2024

PyTorch is a deep learning framework that is widely used for computer vision. This article lists the PyTorch features that help developers build and train neural networks for vision tasks, and then walks through a hands-on image classification example.

Computer vision is one of the core technologies used in AI, and PyTorch makes it easy to customize models and experiment with them, which helps researchers and developers find good solutions to difficult vision problems. Common computer vision tasks include:

  1. Image Classification: Determining which object or scene (for instance, a dog, a car, or a beach) an image shows.
  2. Object Detection: Locating the objects in an image (for example, people, cars, and traffic signs) and drawing bounding boxes around them.
  3. Image Segmentation: Dividing an image into regions that correspond to distinct parts, such as the background, the foreground objects, and the parts of an object.
  4. Video Processing: Analyzing video, for example to identify activities (such as whether a person is walking, running, or dancing), recognize the kind of content (such as sports, news, or entertainment), or track an object as it moves.

PyTorch Capabilities for Computer Vision Tasks

  1. Torchvision is a PyTorch library that provides pre-trained models, datasets, and tools designed specifically for computer vision tasks. It gives researchers access to popular deep learning models such as ResNet, VGG, and DenseNet, which they can use as a starting point for their own models.
  2. PyTorch makes it easy to load and prepare image datasets for training. Torchvision includes loaders for standard datasets such as ImageNet, CIFAR, and COCO, and the same Dataset and DataLoader utilities can be used with custom datasets.
  3. Data augmentation is supported through torchvision transforms, which apply random transformations such as cropping, resizing, and color adjustments to images during training. This helps the model generalize better.
  4. PyTorch integrates with CUDA, allowing users to run training on a GPU. This can speed up training considerably, especially for large datasets and complex architectures.
  5. PyTorch uses a dynamic computation graph, which is built at runtime and can change from one forward pass to the next. This flexibility lets users experiment with different model architectures and control flow easily, which is useful for rapid development in computer vision.
  6. Automatic differentiation is provided by PyTorch's autograd engine. It efficiently computes the gradients needed to train models, so users do not have to derive them by hand, even for complex neural network architectures.
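The last three points (CUDA, dynamic graphs, and autograd) can be illustrated with a small sketch. The function `dynamic_forward` and its toy arithmetic are hypothetical and chosen only for illustration; the point is that ordinary Python control flow decides the shape of the graph at runtime, and autograd still differentiates through whatever was executed.

```python
import torch

# Use a GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def dynamic_forward(x, weight):
    """Hypothetical toy function: a Python loop decides how deep the graph is."""
    out = x * weight
    steps = 0
    # The number of doubling steps depends on the runtime value of `out`.
    while out.abs().sum() < 100 and steps < 10:
        out = out * 2
        steps += 1
    return out.sum(), steps


x = torch.tensor([1.0, 2.0, 3.0], device=device)
weight = torch.tensor([0.5, 0.5, 0.5], device=device, requires_grad=True)

loss, steps = dynamic_forward(x, weight)
loss.backward()  # autograd walks the graph that was just recorded

print("doubling steps taken:", steps)
print("gradient w.r.t. weight:", weight.grad)
```

With these inputs the loop doubles the values six times, so the gradient of the sum with respect to each weight is the matching input multiplied by 64.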

Computer Vision Hands on with PyTorch

In this section, we will use the CIFAR-10 dataset, a popular dataset for image classification. It contains 60,000 32x32 color images in 10 classes, with 6,000 images per class. We'll load the dataset, prepare data loaders, build a simple convolutional neural network (CNN) as a baseline model, train it, and evaluate it.

  • We import the necessary libraries including torch for PyTorch functionalities and torchvision for datasets and transformations.
  • We define transformations to normalize the data using transforms.Compose.

Step 1: Loading the Dataset

We load the CIFAR-10 dataset using torchvision.datasets.CIFAR10 and create data loaders for the training and test sets using torch.utils.data.DataLoader. We also define the names of the ten classes.

Python
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.optim as optim

# Step 1: Loading the CIFAR-10 dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize to [-1, 1]
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Step 2: Defining the Model

With the data loaders and class names in place, we define a simple CNN model (Net) by subclassing nn.Module. It has two convolutional layers, each followed by max pooling, and three fully connected layers.

Python
# Step 2: Defining the CNN model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
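The input size 16 * 5 * 5 of the first fully connected layer follows from the layer arithmetic: a 5x5 convolution without padding shrinks each side by 4, and 2x2 pooling halves it. The sketch below traces a dummy 32x32 input through stand-alone copies of the same layers to confirm the shapes; the `shapes` list is only a helper for this illustration.

```python
import torch
import torch.nn as nn

# Stand-alone copies of the layers used in Net.
conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.zeros(1, 3, 32, 32)  # one dummy CIFAR-10 sized image
shapes = [("input", tuple(x.shape))]

x = conv1(x)
shapes.append(("conv1", tuple(x.shape)))    # 32 - 5 + 1 = 28
x = pool(x)
shapes.append(("pool", tuple(x.shape)))     # 28 / 2 = 14
x = conv2(x)
shapes.append(("conv2", tuple(x.shape)))    # 14 - 5 + 1 = 10
x = pool(x)
shapes.append(("pool", tuple(x.shape)))     # 10 / 2 = 5
x = x.view(-1, 16 * 5 * 5)
shapes.append(("flatten", tuple(x.shape)))  # 16 * 5 * 5 = 400

for name, shape in shapes:
    print(f"{name:>7}: {shape}")
```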

Step 3: Defining Loss Function and optimizer

Next, we define the loss function (nn.CrossEntropyLoss) and the optimizer (optim.SGD) used to train the baseline CNN.

Python
# Step 3: Defining loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
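nn.CrossEntropyLoss takes the raw, unnormalized scores (logits) from the last layer together with integer class labels, which is why Net has no softmax at the end. The sketch below, using made-up logits for a three-class toy problem, checks that it gives the same value as applying log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()

# Made-up logits for 2 samples and 3 classes (illustration only).
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])  # integer class indices, not one-hot vectors

loss = criterion(logits, labels)

# Equivalent computation in two explicit steps.
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print("CrossEntropyLoss:      ", loss.item())
print("log_softmax + nll_loss:", manual.item())
```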

Step 4: Model Training Process

Now we train the model for two epochs. Each iteration loads a mini-batch, clears the old gradients, runs the forward pass, computes the loss, backpropagates, and updates the weights. The average loss is printed every 2,000 mini-batches.

Python
# Step 4: Training the model
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        optimizer.zero_grad()

        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

Output:

[1,  2000] loss: 2.140
[1, 4000] loss: 1.808
[1, 6000] loss: 1.638
[1, 8000] loss: 1.562
[1, 10000] loss: 1.505
[1, 12000] loss: 1.441
[2, 2000] loss: 1.378
[2, 4000] loss: 1.356
[2, 6000] loss: 1.343
[2, 8000] loss: 1.330
[2, 10000] loss: 1.282
[2, 12000] loss: 1.292
Finished Training
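The loop above runs on the CPU. To use the CUDA support mentioned earlier, the model and every batch have to be moved to the same device. The following is a minimal sketch of one device-aware training step; it uses a tiny stand-in model and a random batch (both assumptions for illustration) so that it runs without the dataset, and the article's Net would be moved with .to(device) in the same way.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tiny stand-in model (an assumption, used only to keep the sketch small).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Random batch standing in for one (inputs, labels) pair from trainloader.
inputs = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

weights_before = model[1].weight.detach().clone()

# One training step: move the batch, then forward, loss, backward, update.
inputs, labels = inputs.to(device), labels.to(device)
optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()

weights_changed = not torch.equal(weights_before, model[1].weight.detach())
print("device:", device)
print("loss:", loss.item())
print("weights updated:", weights_changed)
```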

Step 5: Model Evaluation

In this step, we evaluate the network on the test set by iterating through the test data loader and counting how many predictions match the true labels.

Python
# Step 5: Evaluating the model on the test set
correct = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for data in testloader:
        images, labels = data
        outputs = net(images)
        # The predicted class is the index of the highest score
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Output:

Accuracy of the network on the 10000 test images: 54 %
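A single overall number hides which classes the model finds hard. As a possible next step, the sketch below computes accuracy per class from a tensor of predictions and a tensor of labels. The helper `per_class_accuracy` and the small example tensors are hypothetical; in practice the `predicted` and `labels` tensors would be collected from the evaluation loop above.

```python
import torch

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


def per_class_accuracy(predicted, labels, num_classes):
    """Hypothetical helper: fraction of correct predictions for each true class."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c                         # samples whose true class is c
        total[c] = mask.sum()
        correct[c] = (predicted[mask] == c).sum()
    # Classes with no samples are reported as 0 instead of dividing by zero.
    return correct / total.clamp(min=1)


# Made-up labels and predictions for six samples (illustration only).
labels = torch.tensor([0, 0, 1, 1, 1, 3])
predicted = torch.tensor([0, 2, 1, 1, 0, 3])

acc = per_class_accuracy(predicted, labels, len(classes))
for name, value in zip(classes, acc.tolist()):
    print(f"{name:>5}: {100 * value:.0f} %")
```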

author
0902cs2py8d
Article Tags :
  • Deep Learning
  • Dev Scripter
  • AI-ML-DS
  • Python-PyTorch
  • Dev Scripter 2024
  • AI-ML-DS With Python
