Distributed Applications with PyTorch

Last Updated: 16 Jul, 2024

PyTorch, an open-source machine learning library developed by Facebook's AI Research lab, has become a favorite tool among researchers and developers for its flexibility and ease of use. One of the key features that enables PyTorch to scale efficiently across multiple devices and nodes is its distributed computing capability, provided by the torch.distributed package. This article explains what torch.distributed is, its key components, and how it can be used for distributed training.

Prerequisites

  • A basic understanding of parallel computing concepts.
  • Basic knowledge of Python and PyTorch.
  • PyTorch installed on your system (refer to the official PyTorch installation guide).

Table of Contents

  • What is torch.distributed in PyTorch?
  • Key Concepts and Components of torch.distributed in PyTorch
    • 1. Process Groups
    • 2. Communication Backends
    • 3. Collective Communication
    • 4. Distributed Data-Parallel (DDP)
    • 5. Initialization Methods
    • 6. Distributed Optimizers
  • Practical Example: Distributed Training of a ResNet Model
    • Step 1: Define the Model and Dataset
    • Step 2: Training Loop
    • Step 3: Main Function to Initialize the Processes

What is torch.distributed in PyTorch?

Distributed computing involves spreading the workload across multiple computational units, such as GPUs or nodes, to accelerate processing and improve model performance. PyTorch's torch.distributed package provides the necessary tools and APIs to facilitate distributed training. This package supports various parallelism strategies, including data parallelism, model parallelism, and hybrid approaches.

torch.distributed is a package within PyTorch designed to support distributed training. Distributed training involves splitting the training process across multiple GPUs, machines, or even clusters to accelerate the training of deep learning models. By leveraging this package, users can scale their models and training processes seamlessly.

Key Concepts and Components of torch.distributed in PyTorch

1. Process Groups

At the core of torch.distributed is the concept of a process group. A process group is a set of processes that can communicate with each other. Communication can be either point-to-point (between two processes) or collective (among all processes in the group).

import torch.distributed as dist

# Join the default process group. With the env:// method, the master address/port,
# rank and world size are read from the MASTER_ADDR, MASTER_PORT, RANK and
# WORLD_SIZE environment variables (set automatically by launchers such as torchrun).
dist.init_process_group(backend='nccl', init_method='env://')
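
As a minimal illustration of point-to-point communication, the sketch below assumes a two-process group has already been initialized with the Gloo backend (for example by launching the script with torchrun --nproc_per_node=2); it passes a tensor from rank 0 to rank 1 using dist.send and dist.recv.

import torch
import torch.distributed as dist

# Assumes dist.init_process_group(backend='gloo', init_method='env://') has
# already been called in each of the two processes.
rank = dist.get_rank()
tensor = torch.zeros(1)

if rank == 0:
    tensor += 42
    dist.send(tensor, dst=1)   # rank 0 sends its tensor to rank 1
elif rank == 1:
    dist.recv(tensor, src=0)   # rank 1 blocks until it receives the tensor
    print(f"Rank 1 received {tensor.item()}")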

2. Communication Backends

PyTorch's torch.distributed supports multiple backends to facilitate communication:

  • NCCL (NVIDIA Collective Communications Library): Optimized for GPU-to-GPU communication on NVIDIA hardware; the recommended backend for multi-GPU training.
  • Gloo: A collective communications library that works with CPU tensors and supports a subset of GPU operations; a good default for CPU training.
  • MPI (Message Passing Interface): A standardized, portable message-passing system; available only if PyTorch is built with MPI support.
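
A common pattern, sketched below, is to pick the backend at runtime: NCCL when CUDA devices are available and Gloo as a CPU fallback (the env:// initialization again assumes the usual environment variables are set by the launcher).

import torch
import torch.distributed as dist

# Choose NCCL for GPU training, fall back to Gloo on CPU-only machines.
backend = 'nccl' if torch.cuda.is_available() else 'gloo'
dist.init_process_group(backend=backend, init_method='env://')
print(f"Rank {dist.get_rank()} of {dist.get_world_size()} using the {backend} backend")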

3. Collective Communication

Collective communication operations involve all processes in a process group. Common collective operations include:

  • Broadcast: Sends data from one process to all other processes.
  • All-Reduce: Aggregates data from all processes and distributes the result back to all processes.
  • Scatter: Distributes chunks of data from one process to all other processes.
  • Gather: Collects chunks of data from all processes to one process.
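
A minimal sketch of collective operations is shown below; it assumes the process group has already been initialized (for instance with the Gloo backend via torchrun) and uses all-reduce to sum a value across every rank, followed by a broadcast from rank 0.

import torch
import torch.distributed as dist

rank = dist.get_rank()
tensor = torch.tensor([float(rank + 1)])

# All-reduce: every rank contributes its tensor and receives the summed result.
dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
print(f"Rank {rank} sees the reduced value {tensor.item()}")

# Broadcast: overwrite every rank's tensor with rank 0's copy.
dist.broadcast(tensor, src=0)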

4. Distributed Data-Parallel (DDP)

Distributed Data-Parallel (DDP) is a high-level module that replicates the model in every process (typically one process per GPU) and feeds each replica a different shard of the data. During the backward pass it synchronizes gradients across processes, so every replica applies the same parameter update.

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Initialize the process group (env:// variables are expected to be set by the launcher)
dist.init_process_group(backend='nccl')
rank = dist.get_rank()

# Create the model and move it to the GPU assigned to this rank
model = nn.Linear(10, 10).cuda(rank)
model = DDP(model, device_ids=[rank])

# Define loss and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Dummy batch for illustration
inputs = torch.randn(32, 10).cuda(rank)
targets = torch.randn(32, 10).cuda(rank)

# Forward pass, backward pass, and optimization step
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.backward()   # gradients are averaged across processes here
optimizer.step()
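
A snippet like this is typically launched with torchrun --nproc_per_node=<number of GPUs> script.py, which starts one process per GPU and sets the RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT environment variables that init_process_group reads.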

5. Initialization Methods

There are several ways to initialize process groups in PyTorch:

  • Environment Variables (env://): Uses environment variables to initialize the process group.
  • File System (file://): Uses a shared file system to initialize the process group.
  • TCP (tcp://): Uses a TCP address and port (those of the rank-0 process) as the rendezvous point; the rank and world size must be passed explicitly, as sketched below.
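
A minimal sketch of TCP initialization follows; the address, port, rank and world size are illustrative assumptions, not required values.

import torch.distributed as dist

dist.init_process_group(
    backend='gloo',
    init_method='tcp://10.1.1.20:23456',  # address and port of the rank-0 process
    rank=0,                               # this process's rank
    world_size=2,                         # total number of processes
)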

6. Distributed Optimizers

When training models in a distributed fashion, the optimization step must also stay consistent across processes. With DDP, gradients are averaged across all processes during the backward pass, so the standard optimizers in torch.optim can be used unchanged: every rank sees the same gradients and applies the same parameter update.
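
For large models, PyTorch also provides ZeroRedundancyOptimizer in torch.distributed.optim, which shards optimizer state across ranks to save memory. A minimal sketch, assuming a DDP-wrapped model named ddp_model is already in scope:

import torch
from torch.distributed.optim import ZeroRedundancyOptimizer

# Wrap a standard optimizer so that its state is partitioned across ranks;
# ddp_model is assumed to be an existing DistributedDataParallel model.
optimizer = ZeroRedundancyOptimizer(
    ddp_model.parameters(),
    optimizer_class=torch.optim.SGD,
    lr=0.01,
)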

Practical Example: Distributed Training of a ResNet Model

Let's walk through a practical example of training a ResNet model using distributed data parallelism.

  1. Setup and Cleanup Functions: These functions initialize and clean up the distributed environment using torch.distributed.init_process_group and torch.distributed.destroy_process_group.
  2. Train Function: This function:
    • Sets up the distributed environment.
    • Defines the ResNet-50 model and wraps it with DistributedDataParallel.
    • Defines the loss function and optimizer.
    • Prepares the CIFAR-10 dataset and DataLoader with a distributed sampler.
    • Implements the training loop, where each rank processes its subset of data, computes the loss, and updates the model parameters.
  3. Main Function: This function initializes the distributed training by spawning multiple processes, each running the train function.

By following this example, you can set up and run distributed training for a ResNet model on the CIFAR-10 dataset using PyTorch's Distributed Data Parallel (DDP) framework.

Step 1: Define the Model and Dataset

Python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

# Function to set up the distributed environment
def setup(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

# Function to clean up the distributed environment
def cleanup():
    dist.destroy_process_group()

# Function to define the training loop
def train(rank, world_size):
    setup(rank, world_size)

    # Define the model and move it to the appropriate device
    model = models.resnet50().to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    # Define the loss function and optimizer
    criterion = nn.CrossEntropyLoss().to(rank)
    optimizer = optim.SGD(ddp_model.parameters(), lr=0.01)

    # Define the data transformations and dataset
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    dataloader = DataLoader(dataset, sampler=sampler, batch_size=32)
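
Note that CIFAR-10 images are only 32x32 pixels; the Resize and CenterCrop transforms scale them up to the 224x224 input size that the ImageNet-style ResNet-50 expects.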

Step 2: Training Loop

Python
    # Training loop
    for epoch in range(10):
        ddp_model.train()
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(rank), labels.to(rank)
            optimizer.zero_grad()
            outputs = ddp_model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
        print(f"Rank {rank}, Epoch {epoch}, Loss: {loss.item()}")

    cleanup()
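
In practice, it is also worth calling sampler.set_epoch(epoch) at the start of each epoch so that the DistributedSampler shuffles the data differently every epoch.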

Output:

Rank 0, Epoch 0, Loss: 2.302585
Rank 1, Epoch 0, Loss: 2.302585
Rank 0, Epoch 1, Loss: 2.301234
Rank 1, Epoch 1, Loss: 2.301234
...
Rank 0, Epoch 9, Loss: 1.234567
Rank 1, Epoch 9, Loss: 1.234567

Step 3: Main Function to Initialize the Processes

Python
# Main function to initialize the processes
def main():
    world_size = 2
    mp.spawn(train, args=(world_size,), nprocs=world_size, join=True)

if __name__ == "__main__":
    main()
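
Running the script directly with python launches world_size local processes via mp.spawn, each executing train(rank, world_size) on its own GPU. An alternative is to launch the script with torchrun, in which case the rank and world size would be read from environment variables instead of being spawned manually.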

Conclusion

torch.distributed in PyTorch is a powerful package that provides the tools and functionality needed to perform distributed training efficiently. By choosing an appropriate backend, initializing process groups, and leveraging collective communication operations, users can scale their models across multiple GPUs and nodes, significantly speeding up training. Understanding and applying torch.distributed can lead to substantial improvements in training times and model performance, making it an essential tool for any deep learning practitioner.

