How to Implement Various Optimization Algorithms in PyTorch?

Last Updated : 24 Apr, 2025

Optimization algorithms are an essential part of deep learning, and PyTorch provides a wide range of them to help us train neural networks effectively. In this article, we will explore various optimization algorithms in PyTorch and demonstrate how to implement them, using a simple neural network for the demonstration.

NOTE: If PyTorch is not installed on your system, install it by running the following command in your terminal or command prompt:

pip install torch torchvision

This will install the PyTorch module along with torchvision, which is a package that provides access to popular datasets, model architectures, and image transformations for PyTorch. Once you have installed these modules, you should be able to run the code without any errors.
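To verify that the installation worked, a quick check like the following (a sanity-check snippet, not part of the original walkthrough) should run without errors; the version numbers printed will vary by environment:

Python3
import torch
import torchvision

# Print the installed versions; exact numbers depend on your environment
print(torch.__version__)
print(torchvision.__version__)
print(torch.cuda.is_available())  # True only if a usable CUDA GPU is present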

Implementations

Import Libraries:

First, we need to import the required libraries. We will be using the PyTorch framework, so we will import the torch library. We will also use the MNIST dataset to train our neural network, so we will import the torchvision library.

Python3
import torch
import torchvision
import torchvision.transforms as transforms

Load Data:

Next, we will load the MNIST dataset and prepare it for training. We will normalize the data and create batches of data using the DataLoader class.

Python3
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                      download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)

Output:

Files already downloaded and verified

Build Neural Network Model:

We will define a simple neural network with two hidden layers, each with 128 neurons, and an output layer with 10 neurons, one for each digit. We will use the ReLU activation function for the hidden layers. The output layer returns raw logits rather than softmax probabilities, because the CrossEntropyLoss we use below applies log-softmax internally; adding an explicit softmax in forward() would apply it twice and hurt training.

Python3
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = torch.nn.Linear(784, 128)
        self.fc2 = torch.nn.Linear(128, 128)
        self.fc3 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        x = torch.nn.functional.relu(self.fc1(x))
        x = torch.nn.functional.relu(self.fc2(x))
        # Return raw logits; CrossEntropyLoss applies log-softmax internally
        return self.fc3(x)

net = Net()
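As a quick sanity check (an illustrative snippet, not part of the original article), a dummy batch can be passed through the untrained network to confirm that the shapes line up:

Python3
# Hypothetical check: a fake batch of four 28x28 grayscale images
dummy = torch.randn(4, 1, 28, 28)
print(net(dummy).shape)  # expected: torch.Size([4, 10])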

Loss Function and Optimization Algorithm:

We will use the cross-entropy loss function to train our neural network. We will also use various optimization algorithms, such as stochastic gradient descent (SGD), Adam, Adagrad, and Adadelta, to train our neural network. We will define these optimization algorithms and their hyperparameters as follows:

Python3
criterion = torch.nn.CrossEntropyLoss()

# SGD optimizer
optimizer_sgd = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# Adam optimizer
optimizer_adam = torch.optim.Adam(net.parameters(), lr=0.01, betas=(0.9, 0.999))

# Adagrad optimizer
optimizer_adagrad = torch.optim.Adagrad(net.parameters(), lr=0.01)

# Adadelta optimizer
optimizer_adadelta = torch.optim.Adadelta(net.parameters(), rho=0.9)
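In practice, a model is trained with one optimizer at a time. A common pattern, sketched here as an illustration rather than taken from the original article, is to keep the candidates in a dictionary and build the chosen one by name:

Python3
# Illustrative pattern: construct the selected optimizer on demand
optimizer_factories = {
    'sgd': lambda: torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9),
    'adam': lambda: torch.optim.Adam(net.parameters(), lr=0.01),
    'adagrad': lambda: torch.optim.Adagrad(net.parameters(), lr=0.01),
    'adadelta': lambda: torch.optim.Adadelta(net.parameters(), rho=0.9),
}
optimizer = optimizer_factories['sgd']()  # change the key to try another algorithm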

Train the Neural Network:

We will now train the network for 10 epochs and print the loss and accuracy after each epoch. Although we defined four optimizers above, they all hold references to the same parameters, so the loop should step only one of them per iteration; change the optimizer variable below to train with a different algorithm.

Python3
# Select a device and move the model onto it (the original loop referenced
# `device` without defining it)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)

# Step exactly one optimizer per iteration; stepping all four at once would
# apply four conflicting updates to the same parameters. Swap this line to
# train with a different algorithm.
optimizer = optimizer_sgd

for epoch in range(10):
    running_loss = 0.0
    correct = 0
    total = 0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        # move data and target to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Epoch: %d | Loss: %.3f | Accuracy: %.3f %%' %
          (epoch + 1, running_loss / len(trainloader), 100 * correct / total))

Output:

Epoch: 1 | Loss: 1.589 | Accuracy: 42.224 %
Epoch: 2 | Loss: 1.377 | Accuracy: 51.298 %
Epoch: 3 | Loss: 1.314 | Accuracy: 54.116 %
Epoch: 4 | Loss: 1.272 | Accuracy: 55.800 %
Epoch: 5 | Loss: 1.249 | Accuracy: 57.118 %
Epoch: 6 | Loss: 1.223 | Accuracy: 57.998 %
Epoch: 7 | Loss: 1.204 | Accuracy: 58.720 %
Epoch: 8 | Loss: 1.191 | Accuracy: 59.426 %
Epoch: 9 | Loss: 1.181 | Accuracy: 59.916 %
Epoch: 10 | Loss: 1.176 | Accuracy: 60.258 %
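To compare the algorithms fairly, each one should train its own fresh copy of the model; otherwise the runs interfere through the shared parameters. Below is a minimal sketch of such a comparison, reusing the Net class, criterion, and trainloader defined above (the one-epoch budget and the two chosen optimizers are illustrative):

Python3
# Train a fresh model per optimizer so the runs do not interfere
results = {}
for name, make_opt in [
    ('SGD', lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9)),
    ('Adam', lambda p: torch.optim.Adam(p, lr=0.01)),
]:
    model = Net()
    opt = make_opt(model.parameters())
    running_loss = 0.0
    for inputs, labels in trainloader:  # one epoch per optimizer
        opt.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        opt.step()
        running_loss += loss.item()
    results[name] = running_loss / len(trainloader)
print(results)  # average training loss per optimizer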

Use different optimization algorithms for different parts of the model

Python3
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define a neural network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Define the training dataset and data loader
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

# Move the model to the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = Net().to(device)

# One optimizer per part of the model. Note: net.parameters() cannot select a
# submodule by name, so each optimizer is given its layer's parameters
# directly; the conv layers are grouped with fc3 under SGD so that every
# parameter is covered by exactly one optimizer.
optimizers = [
    optim.SGD(list(net.conv1.parameters()) + list(net.conv2.parameters())
              + list(net.fc3.parameters()), lr=0.001, momentum=0.9),
    optim.Adagrad(net.fc2.parameters(), lr=0.001),
    optim.Adam(net.fc1.parameters(), lr=0.001),
]

# Train the neural network using different optimization algorithms
for epoch in range(10):
    running_loss = 0.0
    correct = 0
    total = 0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        # move data and target to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)
        for optimizer in optimizers:
            optimizer.zero_grad()
        outputs = net(inputs)

        # Cross-entropy loss plus L1 penalties on the fc1 and fc2 weights
        entropy_loss = nn.CrossEntropyLoss()(outputs, labels)
        fc1_loss = nn.L1Loss()(net.fc1.weight, torch.zeros_like(net.fc1.weight))
        fc2_loss = nn.L1Loss()(net.fc2.weight, torch.zeros_like(net.fc2.weight))
        total_loss = entropy_loss + fc1_loss + fc2_loss
        total_loss.backward()

        for optimizer in optimizers:
            optimizer.step()
        running_loss += total_loss.item()
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print('Epoch: %d | Loss: %.3f | Accuracy: %.3f %%' %
          (epoch + 1, running_loss / len(trainloader), 100 * correct / total))

Output:

Files already downloaded and verified
Epoch: 1 | Loss: 1.634 | Accuracy: 41.848 %
Epoch: 2 | Loss: 1.436 | Accuracy: 50.932 %
Epoch: 3 | Loss: 1.367 | Accuracy: 54.456 %
Epoch: 4 | Loss: 1.318 | Accuracy: 56.632 %
Epoch: 5 | Loss: 1.287 | Accuracy: 58.154 %
Epoch: 6 | Loss: 1.270 | Accuracy: 59.088 %
Epoch: 7 | Loss: 1.247 | Accuracy: 60.192 %
Epoch: 8 | Loss: 1.235 | Accuracy: 60.676 %
Epoch: 9 | Loss: 1.226 | Accuracy: 61.344 %
Epoch: 10 | Loss: 1.220 | Accuracy: 61.608 %
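If the goal is only to give different parts of the model different hyperparameters (rather than different algorithms), a single optimizer with per-layer parameter groups achieves this more simply; torch.optim supports this directly. The per-group learning rates below are illustrative, not from the original article:

Python3
# One SGD optimizer, but each layer gets its own parameter group;
# groups without an explicit 'lr' fall back to the default lr=0.001
optimizer = optim.SGD([
    {'params': net.conv1.parameters()},
    {'params': net.conv2.parameters()},
    {'params': net.fc1.parameters(), 'lr': 0.01},
    {'params': net.fc2.parameters(), 'lr': 0.005},
    {'params': net.fc3.parameters()},
], lr=0.001, momentum=0.9)

Mixing different algorithms, as in the example above, still requires separate optimizer objects.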

Advantages and disadvantages of implementing various optimization algorithms in PyTorch

Advantages:

  • Improved training performance: Using different optimization algorithms for different parts of the model can improve training performance by allowing each part to learn at its own optimal rate.
  • Better convergence: Some optimization algorithms work better for specific model architectures. By combining multiple optimizers, we can exploit their respective strengths to achieve better convergence.
  • Regularization: Different optimization algorithms have different regularization effects on the model, which can help prevent overfitting and improve the model's ability to generalize.

Disadvantages:

  • Increased complexity: Implementing multiple optimization algorithms increases complexity, requiring more training time and resources, and the resulting code can be harder to maintain and debug.
  • Risk of instability: Using several optimization algorithms can make the training process more unstable because different algorithms may attempt to optimize the same parameter in conflicting or oscillating ways.
