
Computer Vision Tutorial

Last Updated : 06 Aug, 2025

Computer Vision (CV) is a branch of Artificial Intelligence (AI) that enables computers to interpret and understand visual information, much as humans do. This tutorial is designed for both beginners and experienced professionals and covers key concepts such as Image Processing, Feature Extraction, Object Detection, Image Segmentation and other core techniques in CV.

Before moving into computer vision, it is recommended to have a foundational understanding of:

  1. Machine Learning
  2. Deep Learning
  3. OpenCV

These areas form the foundation of computer vision and help us apply its techniques and algorithms more effectively. If you are unfamiliar with any of these topics, we recommend checking out their respective tutorials first.

Mathematical Prerequisites for Computer Vision

A foundational understanding of the following mathematical concepts is also helpful:

1. Linear Algebra

  • Linear Algebra
  • Vectors
  • Matrices and Tensors
  • Eigenvalues and Eigenvectors
  • Singular Value Decomposition

2. Probability and Statistics

  • Probability and Statistics
  • Probability Distributions
  • Bayesian Inference and Bayes' Theorem
  • Markov Chains
  • Kalman Filters

3. Signal Processing

  • Signal Processing
  • Image Filtering and Convolution
  • Discrete Fourier Transform (DFT)
  • Fast Fourier Transform (FFT)
  • Principal Component Analysis (PCA)
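Image filtering by convolution, listed above, can be illustrated with a small sketch. The function name `convolve2d` and the sample image are illustrative assumptions, and plain Python lists are used so that no library is required; in practice a library such as OpenCV or NumPy would normally do this work.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D convolution of a list-of-lists image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # Flipping the kernel in both axes is what distinguishes convolution
    # from cross-correlation.
    flipped = [row[::-1] for row in kernel[::-1]]
    output = []
    for r in range(out_h):
        out_row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * flipped[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 4x4 grayscale image.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# A 3x3 box-blur kernel: every output pixel is the mean of its neighbourhood.
box_blur = [[1 / 9] * 3 for _ in range(3)]
blurred = convolve2d(image, box_blur)
```

Because no padding is applied, the output here is 2x2: each side shrinks by the kernel size minus one.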

Key Concepts in Computer Vision

1. Image Processing

Image processing refers to techniques for manipulating and analyzing digital images. Common image processing tasks include:

1. Image Transformation

  • Image Transformation
  • Geometric Transformations
  • Fourier Transform
  • Intensity Transformation

2. Image Enhancement

  • Image Enhancement
  • Histogram Equalization
  • Contrast Enhancement
  • Image Sharpening
  • Color Correction
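As a rough illustration of histogram equalization from the list above, the sketch below remaps gray levels through the cumulative histogram so that they spread over the full intensity range. The function name `equalize_histogram` and the sample pixel values are assumptions made for this example only.

```python
def equalize_histogram(pixels, levels=256):
    """Equalize a flat list of integer gray levels in the range [0, levels - 1]."""
    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution: number of pixels at or below each level.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(value for value in cdf if value > 0)
    if total == cdf_min:
        # Constant image: there is nothing to stretch.
        return list(pixels)

    # Map each level so the cumulative distribution becomes roughly linear.
    scale = levels - 1
    denom = total - cdf_min
    mapping = [round(max(c - cdf_min, 0) * scale / denom) for c in cdf]
    return [mapping[p] for p in pixels]


# Hypothetical low-contrast pixels squeezed into a narrow band of gray levels.
low_contrast = [10, 11, 12, 13, 14, 15]
stretched = equalize_histogram(low_contrast)
# stretched == [0, 51, 102, 153, 204, 255]
```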

3. Noise Reduction Techniques

  • Noise Reduction Techniques
  • Median Filtering
  • Bilateral Filtering
  • Wavelet Denoising
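Median filtering, the first technique listed above, replaces each pixel with the median of its neighbourhood, which removes isolated "salt and pepper" outliers. The following is a minimal sketch; the name `median_filter`, the choice to leave border pixels unchanged, and the sample image are assumptions of this example.

```python
from statistics import median


def median_filter(image, size=3):
    """Apply a size x size median filter; border pixels are left unchanged."""
    half = size // 2
    height, width = len(image), len(image[0])
    output = [row[:] for row in image]
    for r in range(half, height - half):
        for c in range(half, width - half):
            # Collect the neighbourhood centred on (r, c).
            window = [
                image[r + dr][c + dc]
                for dr in range(-half, half + 1)
                for dc in range(-half, half + 1)
            ]
            output[r][c] = median(window)
    return output


# Hypothetical flat image with a single bright noise pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
denoised = median_filter(noisy)
```

Unlike a mean filter, the outlier value 255 does not leak into neighbouring pixels, because it never becomes the median of any window.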

4. Morphological Operations

  • Morphological Operations
  • Erosion and Dilation
  • Opening
  • Closing
  • Morphological Gradient
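The morphological operations listed above can be sketched on a binary mask with a 3x3 square structuring element. All function names and the sample mask are illustrative assumptions. Pixels outside the image are treated as foreground for erosion and as background for dilation, so that the border itself does not change the result.

```python
def _apply(image, reducer, pad_value):
    """Slide a 3x3 window over a binary image and reduce each window."""
    height, width = len(image), len(image[0])
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                image[r + dr][c + dc]
                if 0 <= r + dr < height and 0 <= c + dc < width
                else pad_value
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            row.append(reducer(window))
        result.append(row)
    return result


def erode(image):
    # A pixel survives only if its whole neighbourhood is foreground.
    return _apply(image, min, 1)


def dilate(image):
    # A pixel becomes foreground if any neighbour is foreground.
    return _apply(image, max, 0)


def opening(image):
    # Erosion followed by dilation: removes small specks.
    return dilate(erode(image))


def closing(image):
    # Dilation followed by erosion: fills small holes.
    return erode(dilate(image))


def morphological_gradient(image):
    # Dilation minus erosion: keeps the outline of each object.
    d, e = dilate(image), erode(image)
    return [[dv - ev for dv, ev in zip(dr, er)] for dr, er in zip(d, e)]


# Hypothetical mask: a 3x3 object plus one isolated noise pixel on the right.
noisy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
cleaned = opening(noisy_mask)
```

In this example the opening removes the isolated pixel and restores the 3x3 object to its original shape.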

2. Feature Extraction

Feature extraction involves identifying distinctive elements within an image, such as edges, corners and textures, for further analysis. Common techniques include:

1. Edge Detection Techniques

  • Computer Vision Algorithms
  • Edge Detection Techniques
  • Canny Edge Detector
  • Sobel Operator
  • Laplacian of Gaussian (LoG)
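To make the Sobel operator from the list above concrete, here is a minimal sketch that computes the gradient magnitude at each interior pixel. The function name `sobel_magnitude` and the test image are assumptions of this example. The kernels are applied without flipping (as a correlation), which affects only the sign of the gradients and not the magnitude.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude for interior pixels; border pixels are set to 0."""
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            output[r][c] = math.hypot(gx, gy)
    return output


# Hypothetical image with a vertical step edge between columns 1 and 2.
step_edge = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(step_edge)
```

The response is large next to the step and zero in flat regions, which is the behaviour an edge detector such as Canny then builds on.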

2. Corner and Interest Point Detection

  • Harris Corner Detection

3. Feature Descriptors

  • Feature Descriptors
  • SIFT (Scale-Invariant Feature Transform)
  • SURF (Speeded-Up Robust Features)
  • ORB (Oriented FAST and Rotated BRIEF)
  • HOG (Histogram of Oriented Gradients)

How Does Computer Vision Work?

  1. Computer Vision works much like the human eye and brain. First, our eyes capture an image and send the visual data to the brain. The brain then processes this information and turns it into a meaningful interpretation, recognizing and categorizing the object based on its properties.
  2. In a similar way, Computer Vision uses a camera (acting like the human eye) to capture images. Algorithms then process the visual data to recognize and identify objects based on patterns they have learned. Before the system can recognize objects in new images, however, it needs to be trained on a large dataset of labeled images. This training enables the system to associate visual patterns with their corresponding labels.
  3. The same idea applies to other kinds of signals. For example, imagine providing a computer with thousands of bird song recordings. The system learns by analyzing features like pitch, rhythm and duration. Once trained, it can recognize whether a new sound resembles a bird song or not. A vision system does the same with visual features such as edges, shapes and textures.
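The train-then-recognize loop described above can be sketched with one of the simplest possible learners, a nearest-centroid classifier. This is an illustration only, not how production vision systems are built: the two features (mean brightness and edge density), the labels and all function names are hypothetical.

```python
import math


def train_centroids(samples):
    """Average the feature vectors of each label.

    samples is a list of (feature_vector, label) pairs.
    """
    sums, counts = {}, {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical labeled training data: (mean brightness, edge density) per image.
training_data = [
    ((0.9, 0.1), "sky"),
    ((0.8, 0.2), "sky"),
    ((0.3, 0.7), "forest"),
    ((0.2, 0.8), "forest"),
]
model = train_centroids(training_data)
label = predict(model, (0.7, 0.3))  # features of a new, unseen image
```

Deep learning models replace the hand-picked features and the centroid rule with learned ones, but the overall pattern of fitting on labeled examples and then predicting on new inputs is the same.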

For more details you can refer to: Steps in Computer Vision

Popular Libraries for Computer Vision

To implement computer vision tasks effectively, various libraries are used:

  1. OpenCV: The most widely used open-source library for computer vision tasks such as image processing, video capture and real-time applications.
  2. TensorFlow: A popular deep learning framework that includes tools for building and training computer vision models.
  3. PyTorch: Another deep learning framework, valued for its flexibility in computer vision research and development.
  4. scikit-image: Part of the scientific Python ecosystem (built on NumPy and SciPy), this library provides algorithms for image processing and computer vision.

For more details you can refer to: Computer Vision Libraries

Deep Learning for Computer Vision

Deep learning has greatly enhanced computer vision by allowing machines to learn directly from visual data. Key deep learning models include:

1. Convolutional Neural Networks (CNNs)

Convolutional Neural Networks are designed to learn spatial hierarchies of features from images. Background topics and key components include:

  • Deep Learning for Computer Vision
  • Deep learning
  • Convolutional Neural Networks
  • Convolutional Layers
  • Pooling Layers
  • Fully Connected Layers
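The sketch below shows, under simplifying assumptions, what happens after a convolutional layer has produced a feature map: an activation (ReLU), a 2x2 max pooling step and a fully connected layer. The feature map values, weights and function names are made up for illustration; real networks learn their weights and use a framework such as TensorFlow or PyTorch.

```python
def relu(feature_map):
    """Set negative activations to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ))
        pooled.append(row)
    return pooled


def fully_connected(inputs, weights, biases):
    """One dense layer: a weighted sum of all inputs per output unit."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + b
        for unit_weights, b in zip(weights, biases)
    ]


# Hypothetical 4x4 output of a convolutional layer.
feature_map = [
    [1, -2, 3, 0],
    [4, 5, -6, 7],
    [-1, 0, 2, 2],
    [3, -3, 1, 8],
]
pooled = max_pool2x2(relu(feature_map))           # [[5, 7], [3, 8]]
flat = [v for row in pooled for v in row]         # [5, 7, 3, 8]
# Hypothetical weights for two output units.
scores = fully_connected(flat, [[1, 0, 0, 0], [0, 1, 0, 1]], [0, -1])
```

Pooling halves each spatial dimension here, which is how CNNs reduce computation while keeping the strongest responses.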

2. Generative Adversarial Networks (GANs)

A GAN consists of two networks, a generator and a discriminator, that are trained against each other so that the generator learns to create realistic images. There are various types of GANs, each designed for specific tasks or improvements:

  • Generative Adversarial Networks (GANs)
  • Deep Convolutional GAN (DCGAN)
  • Conditional GAN (cGAN)
  • Cycle-Consistent GAN (CycleGAN)
  • Super-Resolution GAN (SRGAN)
  • StyleGAN

3. Variational Autoencoders (VAEs)

VAEs are a probabilistic version of autoencoders: the model is forced to learn a distribution over the latent space rather than a single fixed point for each input. Autoencoder variants used in computer vision include:

  • Autoencoders
  • Variational Autoencoders (VAEs)
  • Denoising Autoencoders (DAE)
  • Convolutional Autoencoder (CAE)

4. Vision Transformers (ViT)

Vision transformers are inspired by transformer models from natural language processing. They treat an image as a sequence of patches and process those patches with self-attention mechanisms. Common vision transformers include:

  • Vision Transformers (ViT)
  • Swin Transformer
  • CvT (Convolutional Vision Transformer)
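The first step of a vision transformer, turning an image into a sequence of patches, can be sketched as follows. The function name `split_into_patches` and the tiny 4x4 image are assumptions for illustration; an actual ViT would go on to project each flattened patch to an embedding and add position information before applying self-attention.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image into flattened, non-overlapping square patches.

    Patches are returned in row-major order, which gives the sequence
    a transformer would consume.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[top + i][left + j]
                for i in range(patch_size)
                for j in range(patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical 4x4 single-channel image with pixel values 0..15.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
sequence = split_into_patches(image, patch_size=2)
```

A 4x4 image with 2x2 patches yields a sequence of four tokens, each of length four.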

5. Vision Language Models

Vision language models integrate visual and textual information to perform tasks that require both image understanding and natural language understanding.

  • Vision language models
  • CLIP (Contrastive Language-Image Pre-training)
  • ALIGN (A Large-scale ImaGe and Noisy-text embedding)
  • BLIP (Bootstrapping Language-Image Pre-training)

Computer Vision Tasks

1. Image Classification

It involves analyzing an image and assigning it a specific label or category based on its content such as identifying whether an image contains a cat, dog or car.

Common techniques and related topics include:

  • Computer Vision Tasks
  • Image Classification
  • Image Classification using Support Vector Machine (SVM)
  • Image Classification using RandomForest
  • Image Classification using CNN
  • Image Classification using TensorFlow
  • Image Classification using PyTorch Lightning

Types of image classification include:

  • Multiclass classification
  • Multilabel classification
  • Zero-shot classification

To learn about the datasets used for these tasks, see the article on Dataset for Image Classification.

2. Object Detection

It involves identifying and locating objects within an image by drawing bounding boxes around them.

Common techniques and related topics include:

  • Top Computer Vision Models
  • Object Detection
  • YOLO (You Only Look Once)
  • SSD (Single Shot MultiBox Detector)
  • Region-Based Convolutional Neural Networks (R-CNNs)
  • Fast R-CNN
  • Faster R-CNN
  • Mask R-CNN
  • Object Detection using TensorFlow
  • Object Detection using PyTorch

Key object detection concepts include:

  • Bounding Box Regression
  • Intersection over Union (IoU)
  • Region Proposal Networks (RPN)
  • Non-Maximum Suppression (NMS)
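Two of the concepts above, Intersection over Union and Non-Maximum Suppression, are short enough to sketch directly. The box format `(x1, y1, x2, y2)`, the function names, the 0.5 threshold and the sample detections are assumptions for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two overlapping boxes on one object, one elsewhere.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, and the indices kept are 0 and 2.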

3. Image Segmentation

It involves partitioning an image into distinct regions or segments to identify objects or boundaries at a pixel level.

Types of image segmentation are:

  • Image Segmentation
  • Semantic Segmentation
  • Instance Segmentation
  • Panoptic Segmentation

We can perform image segmentation using the following methods:

  • Image Segmentation using K Means Clustering
  • Image Segmentation using UNet
  • Image Segmentation using TensorFlow
  • Image Segmentation with Mask R-CNN
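As a small illustration of the first method above, the sketch below segments pixels by clustering their gray levels with K-means. It works on a flat list of intensities and uses a deterministic initialization (centers spread evenly between the darkest and brightest pixel) so the result is reproducible; the function names, that initialization choice and the sample values are assumptions of this example.

```python
def assign_labels(pixels, centers):
    """Label each pixel with the index of its nearest cluster center."""
    return [
        min(range(len(centers)), key=lambda j: abs(p - centers[j]))
        for p in pixels
    ]


def kmeans_segment(pixels, k=2, iterations=10):
    """Cluster gray levels into k segments (k >= 2); returns (labels, centers)."""
    lo, hi = min(pixels), max(pixels)
    # Deterministic start: centers evenly spaced over the intensity range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        labels = assign_labels(pixels, centers)
        for j in range(k):
            members = [p for p, label in zip(pixels, labels) if label == j]
            if members:
                # Move each center to the mean of the pixels assigned to it.
                centers[j] = sum(members) / len(members)
    return assign_labels(pixels, centers), centers


# Hypothetical flattened image: dark background pixels and a bright object.
pixels = [12, 10, 11, 200, 205, 198, 9, 202]
labels, centers = kmeans_segment(pixels, k=2)
```

Reshaping the labels back to the image grid gives a segmentation mask, here with label 0 for the background and label 1 for the object.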

Need for Computer Vision

  1. High Demand in the Job Market: Critical for careers in AI, machine learning and data science across industries like healthcare, automotive and robotics.
  2. Revolutionizing Industries: Powers advancements in self-driving cars, medical diagnostics, agriculture and manufacturing by automating visual tasks.
  3. Solving Real-World Problems: Enhances safety, improves medical imaging and optimizes industrial processes.
  4. Improving Accessibility: It helps people with disabilities through image recognition and sign language translation.
  5. Enhancing Consumer Experiences: It personalizes shopping and improves customer service in retail and entertainment.

Applications of Computer Vision

  1. Healthcare: Used for disease detection and medical image analysis (X-rays, MRIs).
  2. Automotive: Helps self-driving cars with object detection, lane keeping and traffic sign recognition.
  3. Retail: It helps with inventory management, theft prevention and customer behavior analysis.
  4. Agriculture: It is used for crop monitoring and disease detection.
  5. Security and Surveillance: It recognizes faces and flags suspicious activity in security footage.

For more details you can refer to: Applications of Computer Vision



kumar_satyam
Article Tags :
  • Computer Vision
  • AI-ML-DS
  • Tutorials

Similar Reads

    Computer Vision Tutorial
    Computer Vision (CV) is a branch of Artificial Intelligence (AI) that helps computers to interpret and understand visual information much like humans. This tutorial is designed for both beginners and experienced professionals and covers key concepts such as Image Processing, Feature Extraction, Obje
    7 min read

    Introduction to Computer Vision

    Computer Vision - Introduction
    Computer Vision (CV) in artificial intelligence (AI) help machines to interpret and understand visual information similar to how humans use their eyes and brains. It involves teaching computers to analyze and understand images and videos, helping them "see" the world. From identifying objects in ima
    4 min read
    A Quick Overview to Computer Vision
    Computer vision means the extraction of information from images, text, videos, etc. Sometimes computer vision tries to mimic human vision. It’s a subset of computer-based intelligence or Artificial intelligence which collects information from digital images or videos and analyze them to define the a
    3 min read
    Applications of Computer Vision
    Have you ever wondered how machines can "see" and understand the world around them, much like humans do? This is the magic of computer vision—a branch of artificial intelligence that enables computers to interpret and analyze digital images, videos, and other visual inputs. From self-driving cars to
    6 min read
    Fundamentals of Image Formation
    Image formation is an analog to digital conversion of an image with the help of 2D Sampling and Quantization techniques that is done by the capturing devices like cameras. In general, we see a 2D view of the 3D world.In the same way, the formation of the analog image took place. It is basically a co
    7 min read
    Satellite Image Processing
    Satellite Image Processing is an important field in research and development and consists of the images of earth and satellites taken by the means of artificial satellites. Firstly, the photographs are taken in digital form and later are processed by the computers to extract the information. Statist
    2 min read
    Image Formats
    Image formats are different types of file types used for saving pictures, graphics, and photos. Choosing the right image format is important because it affects how your images look, load, and perform on websites, social media, or in print. Common formats include JPEG, PNG, GIF, and SVG, each with it
    5 min read

    Image Processing & Transformation

    Digital Image Processing Basics
    Digital Image Processing means processing digital image by means of a digital computer. We can also say that it is a use of computer algorithms, in order to get enhanced image either to extract some useful information. Digital image processing is the use of algorithms and mathematical models to proc
    7 min read
    Difference Between RGB, CMYK, HSV, and YIQ Color Models
    The colour spaces in image processing aim to facilitate the specifications of colours in some standard way. Different types of colour models are used in multiple fields like in hardware, in multiple applications of creating animation, etc. Let’s see each colour model and its application. RGBCMYKHSV
    3 min read
    Image Enhancement Techniques using OpenCV - Python
    Image enhancement is the process of improving the quality and appearance of an image. It can be used to correct flaws or defects in an image, or to simply make an image more visually appealing. Image enhancement techniques can be applied to a wide range of images, including photographs, scans, and d
    15+ min read
    Image Transformations using OpenCV in Python
    In this tutorial, we are going to learn Image Transformation using the OpenCV module in Python. What is Image Transformation? Image Transformation involves the transformation of image data in order to retrieve information from the image or preprocess the image for further usage. In this tutorial we
    5 min read
    How to find the Fourier Transform of an image using OpenCV Python?
    The Fourier Transform is a mathematical tool used to decompose a signal into its frequency components. In the case of image processing, the Fourier Transform can be used to analyze the frequency content of an image, which can be useful for tasks such as image filtering and feature extraction. In thi
    5 min read
    Python | Intensity Transformation Operations on Images
    Intensity transformations are applied on images for contrast manipulation or image thresholding. These are in the spatial domain, i.e. they are performed directly on the pixels of the image at hand, as opposed to being performed on the Fourier transform of the image. The following are commonly used
    5 min read
    Histogram Equalization in Digital Image Processing
    A digital image is a two-dimensional matrix of two spatial coordinates, with each cell specifying the intensity level of the image at that point. So, we have an N x N matrix with integer values ranging from a minimum intensity level of 0 to a maximum level of L-1, where L denotes the number of inten
    5 min read
    Python - Color Inversion using Pillow
    Color Inversion (Image Negative) is the method of inverting pixel values of an image. Image inversion does not depend on the color mode of the image, i.e. inversion works on channel level. When inversion is used on a multi color image (RGB, CMYK etc) then each channel is treated separately, and the
    4 min read
    Image Sharpening using Laplacian, High Boost Filtering in MATLAB
    Image sharpening is a crucial process in digital image processing, aimed at improving the clarity and crispness of visual content. By emphasizing the edges and fine details in a picture, sharpening transforms dull or blurred images into visuals where objects stand out more distinctly from their back
    3 min read
    Wand sharpen() function - Python
    The sharpen() function is an inbuilt function in the Python Wand ImageMagick library which is used to sharpen the image. Syntax: sharpen(radius, sigma) Parameters: This function accepts four parameters as mentioned above and defined below: radius: This parameter stores the radius value of the sharpn
    2 min read
    Python OpenCV - Smoothing and Blurring
    In this article, we are going to learn about smoothing and blurring with python-OpenCV. When we are dealing with images at some points the images will be crisper and sharper which we need to smoothen or blur to get a clean image, or sometimes the image will be with a really bad edge which also we ne
    7 min read
    Python PIL | GaussianBlur() method
    PIL is the Python Imaging Library which provides the python interpreter with image editing capabilities. The ImageFilter module contains definitions for a pre-defined set of filters, which can be used with the Image.filter() method. PIL.ImageFilter.GaussianBlur() method create Gaussian blur filter.
    1 min read
    Apply a Gauss filter to an image with Python
    A Gaussian Filter is a low-pass filter used for reducing noise (high-frequency components) and for blurring regions of an image. This filter uses an odd-sized, symmetric kernel that is convolved with the image. The kernel weights are highest at the center and decrease as you move towards the periphe
    2 min read
    Spatial Filtering and its Types
    Spatial Filtering technique is used directly on pixels of an image. Mask is usually considered to be added in size so that it has specific center pixel. This mask is moved on the image such that the center of the mask traverses all image pixels. Classification on the basis of Linearity There are two
    3 min read
    Python PIL | MedianFilter() and ModeFilter() method
    PIL is the Python Imaging Library which provides the python interpreter with image editing capabilities. The ImageFilter module contains definitions for a pre-defined set of filters, which can be used with the Image.filter() method. PIL.ImageFilter.MedianFilter() method creates a median filter. Pick
    1 min read
    Python | Bilateral Filtering
    A bilateral filter is used for smoothening images and reducing noise, while preserving edges. This article explains an approach using the averaging filter, while this article provides one using a median filter. However, these convolutions often result in a loss of important edge information, since t
    2 min read
    Python OpenCV - Morphological Operations
    Python OpenCV Morphological operations are one of the Image processing techniques that processes image based on shape. This processing strategy is usually performed on binary images.  Morphological operations based on OpenCV are as follows:ErosionDilationOpeningClosingMorphological GradientTop hatBl
    5 min read
    Erosion and Dilation of images using OpenCV in Python
    Morphological operations modify images based on the structure and arrangement of pixels. They apply kernel to an input image for changing its features depending on the arrangement of neighboring pixels. Morphological operations like erosion and dilation are techniques in image processing, especially
    3 min read
    Introduction to Resampling methods
    While reading about Machine Learning and Data Science we often come across a term called Imbalanced Class Distribution, which generally happens when observations in one of the classes are much higher or lower than in other classes. As Machine Learning algorithms tend to increase accuracy by reducing
    8 min read
    Python | Image Registration using OpenCV
    Image registration is a digital image processing technique that helps us align different images of the same scene. For instance, one may click the picture of a book from various angles. Below are a few instances that show the diversity of camera angles.Now, we may want to "align" a particular image
    3 min read

    Feature Extraction and Description

    Feature Extraction Techniques - NLP
    Introduction : This article focuses on basic feature extraction techniques in NLP to analyse the similarities between pieces of text. Natural Language Processing (NLP) is a branch of computer science and machine learning that deals with training computers to process a large amount of human (natural)
    10 min read
    SIFT Interest Point Detector Using Python - OpenCV
    SIFT (Scale Invariant Feature Transform) Detector is used in the detection of interest points on an input image. It allows the identification of localized features in images which is essential in applications such as:   Object Recognition in ImagesPath detection and obstacle avoidance algorithmsGest
    4 min read
    Feature Matching using Brute Force in OpenCV
    In this article, we will do feature matching using Brute Force in Python by using OpenCV library. Prerequisites: OpenCV OpenCV is a python library which is used to solve the computer vision problems. OpenCV is an open source Computer Vision library. So computer vision is a way of teaching intelligen
    13 min read
    Feature detection and matching with OpenCV-Python
    In this article, we are going to see about feature detection in computer vision with OpenCV in Python. Feature detection is the process of checking the important features of the image in this case features of the image can be edges, corners, ridges, and blobs in the images. In OpenCV, there are a nu
    5 min read
    Feature matching using ORB algorithm in Python-OpenCV
    Feature matching is an important technique that helps us find and compare similar points between images. The ORB (Oriented FAST and Rotated BRIEF) algorithm is an efficient method for feature matching. It combines FAST which detects keypoints and BRIEF which describes those keypoints. Since BRIEF st
    3 min read
    Mahotas - Speeded-Up Robust Features
    In this article we will see how we can get the speeded up robust features of image in mahotas. In computer vision, speeded up robust features (SURF) is a patented local feature detector and descriptor. It can be used for tasks such as object recognition, image registration, classification, or 3D rec
    2 min read
    Create Local Binary Pattern of an image using OpenCV-Python
    In this article, we will discuss the image and how to find a binary pattern using the pixel value of the image. As we all know, image is also known as a set of pixels. When we store an image in computers or digitally, it’s corresponding pixel values are stored. So, when we read an image to a variabl
    5 min read

    Deep Learning for Computer Vision

    Image Classification using CNN
    Image classification is a key task in machine learning where the goal is to assign a label to an image based on its content. Convolutional Neural Networks (CNNs) are specifically designed to analyze and interpret images. Unlike traditional neural networks, they are good at detecting patterns, shapes
    5 min read
    What is Transfer Learning?
    Transfer learning is a machine learning technique where a model trained on one task is repurposed as the foundation for a second task. This approach is beneficial when the second task is related to the first or when data for the second task is limited. Using learned features from the initial task, t
    8 min read
    Top 5 PreTrained Models in Natural Language Processing (NLP)
    Pretrained models are deep learning models that have been trained on huge amounts of data before fine-tuning for a specific task. The pre-trained models have revolutionized the landscape of natural language processing as they allow the developer to transfer the learned knowledge to specific tasks, e
    7 min read
    ML | Introduction to Strided Convolutions
    Let us begin this article with a basic question - "Why padding and strided convolutions are required?" Assume we have an image with dimensions of n x n. If it is convoluted with an f x f filter, then the dimensions of the image obtained are (n-f+1) x (n-f+1). Example: Consider a 6 x 6 image as shown
    2 min read
    Dilated Convolution
    Prerequisite: Convolutional Neural Networks Dilated Convolution: It is a technique that expands the kernel (input) by inserting holes between its consecutive elements. In simpler terms, it is the same as convolution but it involves pixel skipping, so as to cover a larger area of the input.  Dilated
    5 min read
    Continuous Kernel Convolution
    Continuous Kernel convolution was proposed by the researcher of Verije University Amsterdam in collaboration with the University of Amsterdam in a paper titled 'CKConv: Continuous Kernel Convolution For Sequential Data'. The motivation behind that is to propose a model that uses the properties of co
    6 min read
    CNN | Introduction to Pooling Layer
    Pooling layer is used in CNNs to reduce the spatial dimensions (width and height) of the input feature maps while retaining the most important information. It involves sliding a two-dimensional filter over each channel of a feature map and summarizing the features within the region covered by the fi
    5 min read
    CNN | Introduction to Padding
    During convolution, the size of the output feature map is determined by the size of the input feature map, the size of the kernel, and the stride. if we simply apply the kernel on the input feature map, then the output feature map will be smaller than the input. This can result in the loss of inform
    5 min read
    What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow?
    Padding is a technique used in convolutional neural networks (CNNs) to preserve the spatial dimensions of the input data and prevent the loss of information at the edges of the image. It involves adding additional rows and columns of pixels around the edges of the input data. There are several diffe
    14 min read
    Convolutional Neural Network (CNN) Architectures
    Convolutional Neural Network(CNN) is a neural network architecture in Deep Learning, used to recognize the pattern from structured arrays. However, over many years, CNN architectures have evolved. Many variants of the fundamental CNN Architecture This been developed, leading to amazing advances in t
    11 min read
    Deep Transfer Learning - Introduction
    Deep transfer learning is a machine learning technique that utilizes the knowledge learned from one task to improve the performance of another related task. This technique is particularly useful when there is a shortage of labeled data for the target task, as it allows the model to leverage the know
    8 min read
    Introduction to Residual Networks
    Recent years have seen tremendous progress in the field of Image Processing and Recognition. Deep Neural Networks are becoming deeper and more complex. It has been proved that adding more layers to a Neural Network can make it more robust for image-related tasks. But it can also cause them to lose a
    4 min read
    Residual Networks (ResNet) - Deep Learning
    After the first CNN-based architecture (AlexNet) that win the ImageNet 2012 competition, Every subsequent winning architecture uses more layers in a deep neural network to reduce the error rate. This works for less number of layers, but when we increase the number of layers, there is a common proble
    9 min read
    ML | Inception Network V1
    Inception net achieved a milestone in CNN classifiers when previous models were just going deeper to improve the performance and accuracy but compromising the computational cost. The Inception network, on the other hand, is heavily engineered. It uses a lot of tricks to push performance, both in ter
    4 min read
    Understanding GoogLeNet Model - CNN Architecture
    GoogLeNet (Inception V1) is a deep convolutional neural network architecture designed for efficient image classification. It introduces the Inception module, which performs multiple convolution operations (1x1, 3x3, 5x5) in parallel, along with max pooling and concatenates their outputs. The archite
    3 min read
    Image Recognition with Mobilenet
    Image Recognition plays an important role in many fields like medical disease analysis and many more. In this article, we will mainly focus on how to Recognize the given image, what is being displayed. What is MobilenetMobilenet is a model which does the same convolution as done by CNN to filter ima
    4 min read
    VGG-16 | CNN model
    A Convolutional Neural Network (CNN) architecture is a deep learning model designed for processing structured grid-like data such as images and is used for tasks like image classification, object detection and image segmentation.The VGG-16 model is a convolutional neural network (CNN) architecture t
    6 min read
    Autoencoders in Machine Learning
    Autoencoders are a special type of neural networks that learn to compress data into a compact form and then reconstruct it to closely match the original input. They consist of an:Encoder that captures important features by reducing dimensionality.Decoder that rebuilds the data from this compressed r
    8 min read
    How Autoencoders works ?
    Autoencoders is used for tasks like dimensionality reduction, anomaly detection and feature extraction. The goal of an autoencoder is to to compress data into a compact form and then reconstruct it to closely match the original input. The model trains by minimizing reconstruction error using loss fu
    6 min read
    Difference Between Encoder and Decoder
    Combinational Logic is the concept in which two or more input states define one or more output states. The Encoder and Decoder are combinational logic circuits. In which we implement combinational logic with the help of boolean algebra. To encode something is to convert in piece of information into
    9 min read
    Implementing an Autoencoder in PyTorch
    Autoencoders are neural networks designed for unsupervised tasks like dimensionality reduction, anomaly detection and feature extraction. They work by compressing data into a smaller form through an encoder and then reconstructing it back using a decoder. The goal is to minimize the difference betwe
    4 min read
    Generative Adversarial Network (GAN)
    Generative Adversarial Networks (GAN) help machines to create new, realistic data by learning from existing examples. It is introduced by Ian Goodfellow and his team in 2014 and they have transformed how computers generate images, videos, music and more. Unlike traditional models that only recognize
    12 min read
    Deep Convolutional GAN with Keras
    Deep Convolutional GAN (DCGAN) was proposed by a researcher from MIT and Facebook AI research. It is widely used in many convolution-based generation-based techniques. The focus of this paper was to make training GANs stable. Hence, they proposed some architectural changes in the computer vision pro
    9 min read
    StyleGAN - Style Generative Adversarial Networks
    StyleGAN is a generative model that produces highly realistic images by controlling image features at multiple levels, from overall structure to fine details like texture and lighting. It was developed by NVIDIA and builds on traditional GANs with a unique architecture that separates style from conten
    5 min read
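
The autoencoder entries in this group describe an encoder that compresses data, a decoder that reconstructs it, and training that minimizes reconstruction error. A minimal sketch of that loop is shown below in plain Python, assuming a linear encoder and decoder, a mean squared error loss and a small made-up 2D dataset. The function names, starting weights and learning rate are illustrative assumptions and are not taken from the linked articles, which use PyTorch.

```python
# Toy linear autoencoder: 2D input -> 1D code -> 2D reconstruction.
# All names, data and hyperparameters here are hypothetical.

def encode(x, w_enc):
    """Compress a 2D point into a single number (the code)."""
    return w_enc[0] * x[0] + w_enc[1] * x[1]

def decode(z, w_dec):
    """Rebuild a 2D point from the 1D code."""
    return [w_dec[0] * z, w_dec[1] * z]

def reconstruction_loss(data, w_enc, w_dec):
    """Mean squared error between inputs and their reconstructions."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, w_enc), w_dec)
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)

def train(data, w_enc, w_dec, lr=0.5, steps=2000):
    """Batch gradient descent on the reconstruction loss."""
    w_enc, w_dec = list(w_enc), list(w_dec)
    n = len(data)
    for _ in range(steps):
        g_enc = [0.0, 0.0]
        g_dec = [0.0, 0.0]
        for x in data:
            z = encode(x, w_enc)
            x_hat = decode(z, w_dec)
            err = [x_hat[0] - x[0], x_hat[1] - x[1]]
            # Gradient of the squared error with respect to the code z
            dz = 2 * (err[0] * w_dec[0] + err[1] * w_dec[1])
            for i in range(2):
                g_dec[i] += 2 * err[i] * z / n
                g_enc[i] += dz * x[i] / n
        for i in range(2):
            w_enc[i] -= lr * g_enc[i]
            w_dec[i] -= lr * g_dec[i]
    return w_enc, w_dec

# Made-up data lying on the line y = 2x, so one number can describe each point.
data = [[0.1, 0.2], [0.2, 0.4], [0.3, 0.6], [-0.1, -0.2], [-0.2, -0.4]]
w_enc0, w_dec0 = [0.5, 0.5], [0.5, 0.5]

initial_loss = reconstruction_loss(data, w_enc0, w_dec0)
w_enc, w_dec = train(data, w_enc0, w_dec0)
final_loss = reconstruction_loss(data, w_enc, w_dec)
print("loss before training:", initial_loss)
print("loss after training:", final_loss)
```

Because the toy data is one-dimensional in disguise, a single code value is enough to reconstruct it. Real autoencoders use nonlinear layers and a deep learning framework rather than hand-written gradients.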

    Object Detection and Recognition

    Detect an object with OpenCV-Python
    Object detection refers to identifying and locating objects within images or videos. OpenCV provides a simple way to implement object detection using Haar Cascades, classifiers trained to detect objects based on positive and negative images. In this article we will focus on detecting objects using i
    4 min read
    Haar Cascades for Object Detection - Python
    Haar Cascade classifiers are a machine learning-based method for object detection. They use a set of positive and negative images to train a classifier, which is then used to detect objects in new images. Positive Images: These images contain the objects that the classifier is trained to detect. Nega
    3 min read
    R-CNN - Region-Based Convolutional Neural Networks
    Traditional Convolutional Neural Networks (CNNs) with fully connected layers often struggle with object detection tasks, especially when dealing with multiple objects of varying sizes and positions within an image. A brute-force method like applying a sliding window across the image to detect object
    8 min read
    YOLO v2 - Object Detection
    In terms of speed, YOLO is one of the best models in object recognition, able to recognize objects and process frames at up to 150 FPS for small networks. However, in terms of accuracy, YOLO was not the state-of-the-art model, though it has a fairly good mean Average Precision (mAP) of 63% when
    7 min read
    Face recognition using Artificial Intelligence
    Current technology amazes people with innovations that not only make life simple but also bearable. Face recognition has over time proven to be the least intrusive and fastest form of biometric verification. The software uses deep learning algorithms to compare a live captured image to t
    15+ min read
    Deep Face Recognition
    DeepFace is the facial recognition system used by Facebook for tagging images. It was proposed by researchers at Facebook AI Research (FAIR) at the 2014 IEEE Computer Vision and Pattern Recognition Conference (CVPR). In modern face recognition there are 4 steps: Detect, Align, Represent, Classify. This app
    8 min read
    ML | Face Recognition Using Eigenfaces (PCA Algorithm)
    In 1991, Turk and Pentland suggested an approach to face recognition that uses dimensionality reduction and linear algebra concepts to recognize faces. This approach is computationally less expensive and easy to implement, and was thus used in various applications at that time such as handwritten recogni
    4 min read
    Emojify using Face Recognition with Machine Learning
    In this article, we will learn how to implement a modification app that shows an emoji resembling the expression on your face. This is a fun project based on computer vision in which we use an image classification model to classify different expressions of a person.
    7 min read
    Object Detection with Detection Transformer (DETR) by Facebook
    Facebook released its state-of-the-art object detection model on 27 May 2020. It is called DETR, which stands for Detection Transformer, as it uses transformers to detect objects. This is the first time that a transformer is used for the task of object detection along with a Convolutional Ne
    7 min read

    Image Segmentation

    Image Segmentation Using TensorFlow
    Image segmentation is a computer method that breaks up a picture into different parts. It looks at the small details of each pixel (the tiny dots that make up the image) and decides what kind of thing it is, such as a pet, the pet’s outline or the background. The main goal is to give every pixel in a pi
    5 min read
    Thresholding-Based Image Segmentation
    Image segmentation is the technique of subdividing an image into constituent sub-regions or distinct objects. The level of detail to which subdivision is carried out depends on the problem being solved. That is, segmentation should stop when the objects or the regions of interest in an application h
    7 min read
    Region and Edge Based Segmentation
    Segmentation is the separation of one or more regions or objects in an image based on a discontinuity or a similarity criterion. A region in an image can be defined by its border (edge) or its interior, and the two representations are equal. There are three prominent methods of perform
    4 min read
    Image Segmentation with Watershed Algorithm - OpenCV Python
    Image segmentation is a fundamental computer vision task that involves partitioning an image into meaningful and semantically homogeneous regions. The goal is to simplify the representation of an image or make it more meaningful for further analysis. These segments typically correspond to objects or
    9 min read
    Mask R-CNN | ML
    The article provides a comprehensive understanding of the evolution from basic Convolutional Neural Networks (CNN) to the sophisticated Mask R-CNN, exploring the iterative improvements in object detection, instance segmentation, and the challenges and advantages associated with each model. What is R
    9 min read
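
The thresholding entry in this group describes subdividing an image into regions of interest. A minimal sketch of the simplest case, a single global threshold, is shown below in plain Python. The function name `threshold_segment`, the 4x4 grayscale image and the threshold value of 128 are made-up assumptions for illustration and are not taken from the linked article.

```python
def threshold_segment(image, threshold):
    """Label each pixel 1 (foreground) if its intensity exceeds the threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]

# Hypothetical 4x4 grayscale image with intensities in the range 0-255:
# a bright object on a dark background.
image = [
    [12, 20, 200, 210],
    [15, 18, 220, 205],
    [10, 190, 215, 25],
    [14, 16, 22, 19],
]

mask = threshold_segment(image, threshold=128)
for row in mask:
    print(row)
# [0, 0, 1, 1]
# [0, 0, 1, 1]
# [0, 1, 1, 0]
# [0, 0, 0, 0]
```

A fixed threshold only works when object and background intensities are well separated. In practice the threshold is usually chosen from the image histogram, and libraries such as OpenCV provide this operation directly.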

    3D Reconstruction

    Python OpenCV - Depth map from Stereo Images
    OpenCV is a huge open-source library for computer vision, machine learning and image processing, and it now plays a major role in real-time operation, which is very important in today’s systems. Note: For more information, refer to Introduction to OpenCV. Depth Map: A depth map is a picture wher
    2 min read
    Top 7 Modern-Day Applications of Augmented Reality (AR)
    Augmented Reality (or AR), in simpler terms, means intensifying the reality of real-time objects which we see through our eyes or gadgets like smartphones. You may think, How is it trending a lot? The answer is that it can offer an unforgettable experience, either of learning, measuring the three-di
    10 min read
    Virtual Reality, Augmented Reality, and Mixed Reality
    Virtual Reality (VR): The word 'virtual' means something that is conceptual and does not exist physically and the word 'reality' means the state of being real. So the term 'virtual reality' is itself conflicting. It means something that is almost real. We will probably never be on the top of Mount E
    3 min read
    Camera Calibration with Python - OpenCV
    Prerequisites: OpenCV. A camera is an integral part of several domains like robotics and space exploration, where it plays a major role. It helps to capture each and every moment and is helpful for many analyses. In order to use the camera as a visual sensor, we should know the parameters of the cam
    4 min read
    Python OpenCV - Pose Estimation
    What is Pose Estimation? Pose estimation is a computer vision technique that is used to predict the configuration of the body (pose) from an image. The reason for its importance is the abundance of applications that can benefit from the technology. Human pose estimation localizes body key points to accu
    7 min read
    40+ Top Computer Vision Projects [2025 Updated]
    Computer Vision is a branch of Artificial Intelligence (AI) that helps computers understand and interpret the context of images and videos. It is used in domains like security cameras, photo editing, self-driving cars and robots to recognize objects and navigate the real world using machine learning. This ar
    4 min read
@GeeksforGeeks, Sanchhaya Education Private Limited, All rights reserved