Deep Convolutional GAN with Keras
Last Updated: 07 Apr, 2025
Deep Convolutional GAN (DCGAN) was proposed by researchers at indico Research and Facebook AI Research. It is widely used as the basis for many convolution-based image generation techniques. The focus of the paper was to make GAN training stable, so the authors proposed a set of architectural constraints informed by advances in computer vision. In this article, we will use a DCGAN on the Fashion-MNIST dataset to generate images of clothing.
Need for DCGANs:
DCGANs were introduced to reduce the problem of mode collapse. Mode collapse occurs when the generator becomes biased towards a few outputs and cannot produce the full variety present in the dataset. For example, take the MNIST digits dataset (digits 0 to 9): we want the generator to produce all ten digits, but sometimes it becomes biased towards two or three digits and produces only those. In turn, the discriminator becomes optimized against those particular digits only; this state is known as mode collapse. The architectural constraints used in DCGANs help mitigate this problem.
Architecture:

The generator of the DCGAN architecture takes a 100-dimensional noise vector sampled from a uniform distribution as input. First, a projection and reshape changes the dimension to 4x4x1024; then a fractionally-strided convolution (stride 1/2) is applied four times, each application doubling the spatial dimensions while reducing the number of output channels. The generated output has dimensions of (64, 64, 3). The paper proposes several architectural changes for the generator, such as the removal of all fully connected layers and the use of Batch Normalization, which helps stabilize training. The authors use the ReLU activation in all layers of the generator except the output layer, which uses tanh. We will implement a generator following similar guidelines, though not exactly the same architecture.
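To make the shape progression concrete, below is a minimal Keras sketch of a paper-style generator. The 5x5 kernels and exact filter counts here are illustrative assumptions, not the authors' released code; the point is that each stride-2 transposed convolution doubles the spatial dimensions while halving the channel count:

```python
from tensorflow import keras

# Sketch of a paper-style DCGAN generator: 100-d noise -> 4x4x1024 -> 64x64x3.
# Kernel sizes and filter counts are assumptions for illustration.
paper_generator = keras.models.Sequential([
    keras.layers.Dense(4 * 4 * 1024, input_shape=[100]),
    keras.layers.Reshape([4, 4, 1024]),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2DTranspose(512, (5, 5), strides=2, padding="same",
                                 activation="relu"),   # 4x4 -> 8x8
    keras.layers.BatchNormalization(),
    keras.layers.Conv2DTranspose(256, (5, 5), strides=2, padding="same",
                                 activation="relu"),   # 8x8 -> 16x16
    keras.layers.BatchNormalization(),
    keras.layers.Conv2DTranspose(128, (5, 5), strides=2, padding="same",
                                 activation="relu"),   # 16x16 -> 32x32
    keras.layers.Conv2DTranspose(3, (5, 5), strides=2, padding="same",
                                 activation="tanh"),   # 32x32 -> 64x64x3
])
paper_generator.summary()  # spatial dims: 4 -> 8 -> 16 -> 32 -> 64
```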
The role of the discriminator is to determine whether an image comes from the real dataset or from the generator. It can be designed like an ordinary convolutional neural network performing image classification, but the authors suggested some changes: instead of fully connected layers, they used only strided convolutions with LeakyReLU as the activation function. The input of the discriminator is a single image (real or generated) and the output is a score that determines whether the image is real or generated.
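As a counterpart, here is a minimal sketch of a paper-style discriminator for (64, 64, 3) inputs, again with assumed filter counts: each stride-2 convolution halves the spatial dimensions, and there are no intermediate fully connected layers.

```python
from tensorflow import keras

# Sketch of a paper-style DCGAN discriminator for 64x64x3 images.
# Filter counts are assumptions for illustration.
paper_discriminator = keras.models.Sequential([
    keras.layers.Conv2D(128, (5, 5), strides=2, padding="same",
                        input_shape=[64, 64, 3]),                 # 64 -> 32
    keras.layers.LeakyReLU(0.2),
    keras.layers.Conv2D(256, (5, 5), strides=2, padding="same"),  # 32 -> 16
    keras.layers.LeakyReLU(0.2),
    keras.layers.Conv2D(512, (5, 5), strides=2, padding="same"),  # 16 -> 8
    keras.layers.LeakyReLU(0.2),
    keras.layers.Conv2D(1024, (5, 5), strides=2, padding="same"), # 8 -> 4
    keras.layers.LeakyReLU(0.2),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid")  # real-vs-fake score
])
paper_discriminator.summary()
```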
Implementation:
In this section, we discuss the implementation of DCGAN in Keras. Since our dataset is Fashion-MNIST, which contains images of size (28, 28) with 1 color channel instead of (64, 64) with 3 color channels, we need to make some changes to the architecture; we will discuss these changes as we go along.
In the first step, we import the necessary libraries such as TensorFlow, Keras, Matplotlib, etc. We will be using TensorFlow 2, which provides built-in support for Keras as its default high-level API.
```python
%matplotlib inline
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from IPython import display

# Check the TensorFlow version
print('Tensorflow version:', tf.__version__)
```
Now we load the Fashion-MNIST dataset. Conveniently, it can be imported directly from the tf.keras.datasets API, so we don't need to download files manually. The dataset contains 60k training images and 10k test images, each of dimension (28, 28). Since each pixel value lies in the range [0, 255], we divide by 255 to normalize it.
```python
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train.astype(np.float32) / 255.0
x_test = x_test.astype(np.float32) / 255.0
x_train.shape, x_test.shape
```
((60000, 28, 28), (10000, 28, 28))
In the next step, we visualize some images from the Fashion-MNIST dataset using the Matplotlib library.
```python
# Plot the first 25 images of the training dataset
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
plt.show()
```
[Figure: Original Fashion-MNIST images]

Now, we define training parameters such as the batch size, divide the dataset into batches, and fill those batches by randomly shuffling the training data.
```python
batch_size = 32

def create_batch(x_train):
    # Shuffle the training data
    dataset = tf.data.Dataset.from_tensor_slices(x_train).shuffle(1000)
    # Combine consecutive elements into batches and prefetch
    # the next batch while the current one is being consumed
    dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)
    return dataset
```
Now, we define the generator architecture. This generator takes a vector of size 100, first projects and reshapes it into a (7, 7, 128) tensor, and then applies transposed convolutions combined with batch normalization. The output is a generated image of dimension (28, 28, 1). Note that the final tanh activation produces values in [-1, 1], so we will rescale the training images to this range before training.
```python
num_features = 100

generator = keras.models.Sequential([
    keras.layers.Dense(7 * 7 * 128, input_shape=[num_features]),
    keras.layers.Reshape([7, 7, 128]),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2DTranspose(64, (5, 5), (2, 2),
                                 padding="same", activation="selu"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2DTranspose(1, (5, 5), (2, 2),
                                 padding="same", activation="tanh"),
])
generator.summary()
```
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 6272) 633472
_________________________________________________________________
reshape (Reshape) (None, 7, 7, 128) 0
_________________________________________________________________
batch_normalization (BatchNo (None, 7, 7, 128) 512
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 14, 14, 64) 204864
_________________________________________________________________
batch_normalization_1 (Batch (None, 14, 14, 64) 256
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 28, 28, 1) 1601
=================================================================
Total params: 840,705
Trainable params: 840,321
Non-trainable params: 384
_________________________________________________________________
Now, we define the discriminator architecture. The discriminator takes an image of size (28, 28) with 1 color channel and outputs a scalar value representing whether the image comes from the real dataset or from the generator.
```python
discriminator = keras.models.Sequential([
    keras.layers.Conv2D(64, (5, 5), (2, 2), padding="same",
                        input_shape=[28, 28, 1]),
    keras.layers.LeakyReLU(0.2),
    keras.layers.Dropout(0.3),
    keras.layers.Conv2D(128, (5, 5), (2, 2), padding="same"),
    keras.layers.LeakyReLU(0.2),
    keras.layers.Dropout(0.3),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid")
])
discriminator.summary()
```
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 14, 14, 64) 1664
_________________________________________________________________
leaky_re_lu (LeakyReLU) (None, 14, 14, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 14, 14, 64) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 7, 7, 128) 204928
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 7, 7, 128) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 7, 7, 128) 0
_________________________________________________________________
flatten (Flatten) (None, 6272) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 6273
=================================================================
Total params: 212,865
Trainable params: 212,865
Non-trainable params: 0
_________________________________________________________________
Now we need to compile our DCGAN model (the generator and discriminator combined). We first compile the discriminator on its own, then set its trainable attribute to False before compiling the combined model, so that training the GAN updates only the generator's weights.
```python
# Compile the discriminator with binary cross-entropy loss and the Adam optimizer
discriminator.compile(loss="binary_crossentropy", optimizer="adam")
# Freeze the discriminator's weights inside the combined model
discriminator.trainable = False
# Combine the generator and the discriminator into a single model
gan = keras.models.Sequential([generator, discriminator])
# Compile the combined model (this one trains only the generator)
gan.compile(loss="binary_crossentropy", optimizer="adam")
```
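A note on this standard Keras GAN pattern: the trainable flag is taken into account when a model is compiled. Because the discriminator was compiled before we set discriminator.trainable = False, calling discriminator.train_on_batch() in the training loop still updates its weights, while the combined gan model, compiled after the flag was cleared, keeps the discriminator frozen and updates only the generator.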
Now, we define the training procedure for this GAN model. We will use the tqdm package, which we imported earlier, to visualize training progress.
```python
seed = tf.random.normal(shape=[batch_size, 100])

def train_dcgan(gan, dataset, batch_size, num_features, epochs=5):
    generator, discriminator = gan.layers
    for epoch in tqdm(range(epochs)):
        print()
        print("Epoch {}/{}".format(epoch + 1, epochs))
        for X_batch in dataset:
            # Create random noise of size batch_size x 100
            # to pass into the generator
            noise = tf.random.normal(shape=[batch_size, num_features])
            generated_images = generator(noise)
            # Take a batch of generated images and real images
            # and use them to train the discriminator
            X_fake_and_real = tf.concat([generated_images, X_batch], axis=0)
            y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)
            discriminator.trainable = True
            discriminator.train_on_batch(X_fake_and_real, y1)
            # Train the GAN: pass noise through the generator with
            # labels set to [1] so the generator learns to fool the
            # discriminator
            noise = tf.random.normal(shape=[batch_size, num_features])
            y2 = tf.constant([[1.]] * batch_size)
            discriminator.trainable = False
            gan.train_on_batch(noise, y2)
        # Generate images for the GIF as we go
        generate_and_save_images(generator, epoch + 1, seed)
    generate_and_save_images(generator, epochs, seed)
```
Now we define a function that generates and saves images from the generator during training. We will use these saved images to build the GIF later. Since the generator's tanh output lies in [-1, 1], we rescale pixel values back to [0, 255] before plotting.
```python
def generate_and_save_images(model, epoch, test_input):
    predictions = model(test_input, training=False)
    fig = plt.figure(figsize=(10, 10))
    for i in range(25):
        plt.subplot(5, 5, i + 1)
        # Rescale tanh outputs from [-1, 1] back to [0, 255]
        plt.imshow(predictions[i, :, :, 0] * 127.5 + 127.5, cmap='binary')
        plt.axis('off')
    # Save the generated images as a PNG file
    plt.savefig('image_epoch_{:04d}.png'.format(epoch))
```
Now we need to train the model, but before that we must create batches of the training data and add a channel dimension (the number of color channels). We also rescale pixel values from [0, 1] to [-1, 1] to match the generator's tanh output.
```python
# Reshape to add the channel dimension and rescale pixels to [-1, 1]
x_train_dcgan = x_train.reshape(-1, 28, 28, 1) * 2. - 1.

# Create batches
dataset = create_batch(x_train_dcgan)

# Call the training function with 10 epochs and record the time
%time train_dcgan(gan, dataset, batch_size, num_features, epochs=10)
```
  0%|          | 0/10 [00:00<?, ?it/s]
Epoch 1/10
 10%|█         | 1/10 [01:04<09:39, 64.37s/it]
Epoch 2/10
 20%|██        | 2/10 [02:10<08:39, 64.99s/it]
Epoch 3/10
 30%|███       | 3/10 [03:14<07:33, 64.74s/it]
Epoch 4/10
 40%|████      | 4/10 [04:19<06:27, 64.62s/it]
Epoch 5/10
 50%|█████     | 5/10 [05:23<05:22, 64.58s/it]
Epoch 6/10
 60%|██████    | 6/10 [06:27<04:17, 64.47s/it]
Epoch 7/10
 70%|███████   | 7/10 [07:32<03:13, 64.55s/it]
Epoch 8/10
 80%|████████  | 8/10 [08:37<02:08, 64.48s/it]
Epoch 9/10
 90%|█████████ | 9/10 [09:41<01:04, 64.54s/it]
Epoch 10/10
100%|██████████| 10/10 [10:46<00:00, 64.61s/it]
CPU times: user 7min 4s, sys: 33.3 s, total: 7min 37s
Wall time: 10min 46s
Now we define a function that collects the saved images and converts them into a GIF; this helper follows the approach used in the TensorFlow DCGAN tutorial.
```python
import imageio
import glob

anim_file = 'dcgan_results.gif'

with imageio.get_writer(anim_file, mode='I') as writer:
    filenames = sorted(glob.glob('image*.png'))
    last = -1
    for i, filename in enumerate(filenames):
        frame = 2 * i
        if round(frame) > round(last):
            last = frame
        else:
            continue
        image = imageio.imread(filename)
        writer.append_data(image)
    # Repeat the final frame so the GIF lingers on the last image
    image = imageio.imread(filename)
    writer.append_data(image)

display.Image(filename=anim_file)
```
[Figure: Generated image results]

Results and Conclusion:
To evaluate the quality of the representations learned by DCGANs for supervised tasks, the authors train the model on ImageNet-1k and then use the discriminator's convolutional features from all layers, max-pooling each layer's representation to produce a 4 × 4 spatial grid. These features are flattened and concatenated into a 28672-dimensional vector, and a regularized linear L2-SVM classifier is trained on top of them. This model is then evaluated on the CIFAR-10 dataset without ever being trained on it, and reports an accuracy of 82%, which demonstrates the robustness of the learned features.
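As a rough illustration of this evaluation idea on our much smaller network, the sketch below extracts features from the trained discriminator's convolutional layers and trains a linear SVM on them. It simplifies the paper's recipe by using global max-pooling instead of pooling to a 4 × 4 grid, and it assumes scikit-learn is available; the layer indices refer to the discriminator defined earlier in this article.

```python
import numpy as np
from tensorflow import keras
from sklearn.svm import LinearSVC  # assumed dependency for this sketch

# Feature extractor returning the activations of both conv layers
feature_model = keras.Model(
    inputs=discriminator.inputs,
    outputs=[discriminator.layers[0].output,   # (14, 14, 64)
             discriminator.layers[3].output])  # (7, 7, 128)

def extract_features(images):
    # Global max-pool each feature map over the spatial axes, then concatenate
    # (the paper instead max-pools each layer to a 4x4 grid before flattening)
    fmaps = feature_model.predict(images, verbose=0)
    return np.concatenate([f.max(axis=(1, 2)) for f in fmaps], axis=1)

# Train a regularized linear SVM on features from a subset of the training set
X_feat = extract_features(x_train_dcgan[:5000])
svm = LinearSVC(C=1.0).fit(X_feat, y_train[:5000])
print('Linear SVM train accuracy:', svm.score(X_feat, y_train[:5000]))
```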

On the Street View House Numbers (SVHN) dataset, the same approach achieved a test error of 22.48%, the state of the art at the time; notably, a purely supervised CNN with the same architecture produced a higher test error.