How to Compute Gradients in PyTorch

Last Updated: 12 Aug, 2024

PyTorch is a leading deep-learning library that offers flexibility and a dynamic computing environment, making it a preferred tool for researchers and developers. One of its most praised features is the ease of computing gradients automatically, which is crucial for training neural networks.

In this guide, we will explore how gradients can be computed in PyTorch using its autograd module.

Understanding Automatic Differentiation

Automatic differentiation is a cornerstone of modern deep learning, enabling efficient computation of gradients, that is, the derivatives of functions. PyTorch provides this through its autograd module, which automatically computes derivatives with respect to tensors that have requires_grad set to True. This feature simplifies the implementation of many machine-learning algorithms.
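
As a quick illustration (a minimal sketch; the function f is arbitrary), autograd can compute partial derivatives of a function of several tensors in a single backward call:

Python
import torch

# Two scalar tensors tracked by autograd
a = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(4.0, requires_grad=True)

# f(a, b) = a^2 * b; every operation on a and b is recorded
f = a ** 2 * b

f.backward()   # computes df/da and df/db

print(a.grad)  # df/da = 2ab -> tensor(24.)
print(b.grad)  # df/db = a^2 -> tensor(9.)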

Role of Gradients in Neural Networks

Gradients are indispensable in the training of neural networks, guiding the optimization of parameters through backpropagation:

  • Learning Mechanism: Gradients direct how parameters (weights and biases) should be adjusted to minimize prediction errors.
  • Backpropagation: Backpropagation is the algorithm at the core of training deep learning models. It consists of two main phases:
    • Forward Pass: In this phase, input data is passed through the network layer by layer until the output is produced. The output is then compared to the true value, and a loss is computed.
    • Backward Pass (Backpropagation of Errors): This is where gradients come into play. Starting from the output layer back to the input layer, gradients of the loss function are calculated with respect to each parameter. The computation uses the chain rule from calculus to propagate the error backward through the network.
  • Parameter Updates: Optimization algorithms, such as gradient descent, use these gradients to update the model parameters, steering the model toward optimal performance (a minimal manual-update sketch follows this list).
  • Efficiency and Scalability: PyTorch's automatic differentiation tools enhance training efficiency, particularly in large models.
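
To make the update step concrete, here is a minimal sketch of one gradient-descent step on a single weight, using raw autograd rather than an optimizer (the toy loss and learning rate are illustrative, not part of any particular model):

Python
import torch

w = torch.tensor(5.0, requires_grad=True)  # a single trainable weight
lr = 0.1                                   # illustrative learning rate

loss = (w - 3.0) ** 2    # toy loss with its minimum at w = 3
loss.backward()          # populates w.grad with dloss/dw = 2(w - 3) = 4.0

with torch.no_grad():    # the update itself must not be tracked by autograd
    w -= lr * w.grad     # gradient descent step: w <- w - lr * grad
    w.grad.zero_()       # clear the gradient before the next iteration

print(w)  # tensor(4.6000, requires_grad=True)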

Introduction to Gradient Computation in PyTorch

Gradients are the partial derivatives of a loss function with respect to the model parameters. They indicate the direction and rate at which the loss changes, and thus how each parameter should be adjusted to reduce it.

How to Use torch.autograd for Gradient Calculation

torch.autograd is PyTorch’s engine for automatic differentiation. Here are its key components:

  • Tensor: Tensors are the fundamental data units in PyTorch, akin to NumPy arrays and matrices. Setting the requires_grad attribute to True tells PyTorch to record operations on the tensor so that gradients can be computed.
  • Function: Each operation performed on tensors creates a function node; together these nodes form a computation graph that PyTorch builds dynamically as the code runs (illustrated in the snippet below).
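
You can see these function nodes directly: every tensor produced by a tracked operation carries a grad_fn attribute pointing into the graph, while leaf tensors do not. A small sketch:

Python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x ** 2   # recorded as a PowBackward0 node
z = y + 1    # recorded as an AddBackward0 node

print(x.grad_fn)  # None, since leaf tensors are not produced by an operation
print(y.grad_fn)  # <PowBackward0 object at ...>
print(z.grad_fn)  # <AddBackward0 object at ...>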

Basic Usage of Gradients

To compute gradients, follow these steps:

  1. Initialize a Tensor with requires_grad set to True.
  2. Perform Operations on the tensor to define the computation graph.
  3. Backward Pass: Call the backward() method on the result to compute gradients. For example, for y = x^2 with x = 2, the gradient dy/dx = 2x evaluates to 4.

Example Code for Computing Gradients

Here's how to apply this in a neural network context:

Python
import torch

# Initialize tensor with gradient tracking
x = torch.tensor([2.0], requires_grad=True)

# Define the operation
y = x ** 2

# Compute gradients
y.backward()

# Print the gradient
print(x.grad)  # Output: tensor([4.])

Output:

tensor([4.])
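
As an alternative to calling backward(), torch.autograd.grad returns the gradients directly instead of accumulating them into .grad. A minimal sketch of the same computation:

Python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x ** 2

# Returns a tuple with one gradient per input; x.grad is left untouched
(grad_x,) = torch.autograd.grad(y, x)
print(grad_x)  # tensor([4.])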

Gradient Computation in PyTorch: Guide to Training Neural Networks

Here's a more comprehensive example that includes a basic neural network with one hidden layer, a loss function, and the gradient update process using an optimizer:

Step 1: Setup Environment and Data

Python
import torch
import torch.nn as nn
import torch.optim as optim

# Example dataset: XOR problem
X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float)
y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float)

# Neural network structure
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(2, 2)  # Input layer to hidden layer
        self.fc2 = nn.Linear(2, 1)  # Hidden layer to output layer

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

# Initialize the network
net = SimpleNet()


Step 2: Define Loss Function and Optimizer

Python
# Loss function
criterion = nn.MSELoss()

# Optimizer
optimizer = optim.SGD(net.parameters(), lr=0.1)


Step 3: Training Loop

Python
# Number of epochs
epochs = 5000

for epoch in range(epochs):
    # Forward pass: compute predicted y by passing X to the model
    pred_y = net(X)

    # Compute and print loss
    loss = criterion(pred_y, y)
    if (epoch + 1) % 500 == 0:
        print(f'Epoch {epoch + 1}, Loss: {loss.item()}')

    # Zero gradients, perform a backward pass, and update the weights
    optimizer.zero_grad()  # Clear gradients accumulated from the previous step
    loss.backward()        # Backpropagation: compute gradients
    optimizer.step()       # Apply gradients

Output:

Epoch 500, Loss: 0.25002944469451904
Epoch 1000, Loss: 0.25000864267349243
Epoch 1500, Loss: 0.24999231100082397
Epoch 2000, Loss: 0.24997900426387787
Epoch 2500, Loss: 0.24996770918369293
Epoch 3000, Loss: 0.24995779991149902
Epoch 3500, Loss: 0.24994871020317078
Epoch 4000, Loss: 0.24994011223316193
Epoch 4500, Loss: 0.24993163347244263
Epoch 5000, Loss: 0.24992311000823975

Step 4: Checking Gradients

After the training loop, you may want to check the gradients of specific parameters to understand how they've been adjusted:

Python
# Example: check gradients of the first fully connected layer's weights
print("Gradients of the first layer weights:")
print(net.fc1.weight.grad)

Output:

Gradients of the first layer weights:
tensor([[-1.0688e-04, -2.0416e-04],
        [-2.1948e-05, -3.6009e-05]])
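
To inspect every parameter's gradient at once, you can iterate over named_parameters() (a small sketch that assumes the net trained above):

Python
# Print the gradient stored for each parameter of the network
for name, param in net.named_parameters():
    print(name, param.grad)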

Understanding Gradient Flow in Neural Networks

Knowing how gradients propagate through a network is crucial for debugging and optimizing the training process (a hook-based sketch follows this list):

  1. Forward Pass: Activations are computed as the signal progresses through the network.
  2. Backward Pass: Gradients are propagated back through the network using the chain rule.
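
One way to watch gradients flow is to attach a hook to an intermediate tensor; the hook fires during the backward pass, when that tensor's gradient is computed. A minimal sketch (the values are illustrative):

Python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
h = x * 3    # intermediate tensor in the graph
h.register_hook(lambda grad: print("grad at h:", grad))

loss = (h ** 2).sum()
loss.backward()   # prints: grad at h: tensor([ 6., 12.])
print(x.grad)     # chain rule: 3 * 2h -> tensor([18., 36.])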

Common Issues with Gradients

  • Vanishing Gradients: Common in deep networks with sigmoid activations, where gradients shrink layer by layer during backpropagation and hinder effective learning.
  • Exploding Gradients: Typically occur in deep networks with poor weight initialization, leading to unstable training.

Tips for Managing Gradients

  1. Normalization: Techniques like batch normalization can help stabilize gradient distributions.
  2. Initialization: Proper weight initialization can mitigate issues with vanishing and exploding gradients.
  3. Gradient Clipping: Controls the magnitude of gradients to prevent them from exploding during training (see the sketch below).
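
PyTorch implements clipping in torch.nn.utils.clip_grad_norm_. Here is a sketch of where it fits in a training step, reusing the net, criterion, and optimizer from the example above (max_norm=1.0 is an illustrative choice):

Python
import torch.nn as nn

optimizer.zero_grad()
loss = criterion(net(X), y)
loss.backward()

# Rescale gradients in place so their global norm is at most 1.0
nn.utils.clip_grad_norm_(net.parameters(), max_norm=1.0)

optimizer.step()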

Conclusion

Understanding and effectively calculating gradients is crucial in optimizing neural network performance. PyTorch provides both the tools and flexibility needed to master this essential aspect of deep learning. By familiarizing yourself with gradient computation in PyTorch, you can enhance the accuracy and efficiency of your models, paving the way for more sophisticated deep learning applications.

