Extending PyTorch with Custom Activation Functions

Last Updated : 25 Jul, 2024

In the context of deep learning and neural networks, activation functions are mathematical functions that are applied to the output of a neuron or a set of neurons. The output of the activation function is then passed on as input to the next layer of neurons.

The purpose of an activation function is to introduce non-linearity into the network. Without an activation function, a neural network would simply be a linear function of its inputs, which would severely limit its ability to model complex patterns and relationships.
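The claim above can be checked numerically. The following is a minimal sketch, not part of the original article: the layer sizes and the variable names (`fc1`, `fc2`, `merged`) are arbitrary choices made for illustration. It shows that two linear layers stacked without an activation are equivalent to one linear layer, and that inserting a non-linearity such as ReLU breaks that equivalence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two linear layers stacked with no activation in between (sizes are arbitrary)
fc1 = nn.Linear(4, 8)
fc2 = nn.Linear(8, 3)

# The equivalent single layer: W = W2 @ W1, b = W2 @ b1 + b2
merged = nn.Linear(4, 3)
with torch.no_grad():
    merged.weight.copy_(fc2.weight @ fc1.weight)
    merged.bias.copy_(fc2.weight @ fc1.bias + fc2.bias)

x = torch.randn(5, 4)
stacked_out = fc2(fc1(x))
merged_out = merged(x)

# Without an activation, the two-layer stack collapses to one linear map
same_without_activation = torch.allclose(stacked_out, merged_out, atol=1e-5)

# With a non-linearity between the layers, the outputs no longer agree
relu_out = fc2(torch.relu(fc1(x)))
same_with_activation = torch.allclose(relu_out, merged_out, atol=1e-5)

print(same_without_activation, same_with_activation)
```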

Activation functions are typically applied element-wise to the output of each neuron, meaning that the same function is applied to each element of the output vector. Common element-wise activation functions include sigmoid, tanh, and ReLU. Softmax is also widely used, but it is an exception: it normalizes across a whole vector rather than acting on each element independently.

The choice of activation function can have a significant impact on the performance and behavior of a neural network, and it is often an area of active research and experimentation. Additionally, in some cases, it may be beneficial to define and use custom activation functions that are tailored to the specific needs and characteristics of a given task or dataset.

The sections below show how to define custom activation functions in PyTorch.

Custom Activation Function 1: Softplus Function

1. Mathematical Formula :

Let's say we want to define a custom activation function called "Softplus" that takes a tensor x as input and applies the following function element-wise, where \beta > 0 is a scaling parameter:

Softplus(x) = \frac{1}{\beta}\log(1 + e^{\beta x})

This is a non-linear function that has a similar shape to the ReLU activation function, but with a smoother curve.

2. PyTorch Code :

To define this custom activation function in PyTorch, we can write:

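Python
import torch
import torch.nn as nn


class Softplus(nn.Module):
    def __init__(self, beta=1.0):
        super().__init__()
        # beta scales the input; it is fixed when the module is created
        self.beta = beta

    def forward(self, x):
        # Element-wise (1 / beta) * log(1 + exp(beta * x))
        return torch.log(1 + torch.exp(self.beta * x)) / self.beta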

Here, we define a new PyTorch module called "Softplus" that inherits from the nn.Module class. The forward method defines how the module should behave when given an input tensor x.
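As a sanity check, the formula can be compared with PyTorch's built-in `torch.nn.functional.softplus`. The sketch below is an illustration rather than part of the original article: `naive_softplus` and `stable_softplus` are hypothetical helper names. It also shows that the direct formula overflows in float32 for large inputs, and one common algebraic rewrite that avoids this.

```python
import torch
import torch.nn.functional as F


def naive_softplus(x, beta=1.0):
    # Direct translation of (1 / beta) * log(1 + exp(beta * x))
    return torch.log(1 + torch.exp(beta * x)) / beta


def stable_softplus(x, beta=1.0):
    # Uses the identity log(1 + exp(z)) = max(z, 0) + log(1 + exp(-|z|)),
    # so exp() is only ever evaluated on non-positive values
    z = beta * x
    return (torch.clamp(z, min=0) + torch.log1p(torch.exp(-torch.abs(z)))) / beta


# On a moderate range the direct formula agrees with the built-in function
x = torch.linspace(-5, 5, 100)
matches_builtin = torch.allclose(naive_softplus(x), F.softplus(x), atol=1e-6)

# For a large input, exp(100) overflows to inf in float32
big = torch.tensor([100.0])
naive_overflows = torch.isinf(naive_softplus(big)).item()
stable_value = stable_softplus(big).item()

print(matches_builtin, naive_overflows, stable_value)
```

For inputs in the range plotted in this article, the direct formula is adequate; the stable form only matters when activations can become large.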

To plot the graph of this activation function, we can use the following code:

Python
import matplotlib.pyplot as plt

# Evaluate the Softplus module defined above on 100 evenly spaced points in [-5, 5]
x = torch.linspace(-5, 5, 100)
k = Softplus()
y = k(x)

# Plot the Softplus function graph
plt.plot(x, y)
plt.grid(True)
plt.title('Softplus Function')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Output:

Figure: Softplus function

This is just one example of how to define a custom activation function in PyTorch. The process may vary depending on the specific function being defined, and it may require additional considerations such as ensuring the function is differentiable for use in backpropagation.
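When a custom activation is built only from differentiable torch operations, autograd derives the backward pass automatically, so no hand-written gradient is needed. The sketch below is a hedged illustration: the class is redefined so the snippet is self-contained, and `beta=2.0` is an arbitrary choice. It checks the autograd gradient of Softplus against its analytic derivative, which is sigmoid(beta * x).

```python
import torch
import torch.nn as nn


class Softplus(nn.Module):
    def __init__(self, beta=1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x):
        return torch.log(1 + torch.exp(self.beta * x)) / self.beta


beta = 2.0  # arbitrary value chosen for the check
x = torch.linspace(-5, 5, 11, requires_grad=True)

# Summing gives a scalar, so backward() fills x.grad with d(softplus)/dx per element
y = Softplus(beta=beta)(x)
y.sum().backward()

# Analytic derivative: d/dx [(1/beta) * log(1 + exp(beta * x))] = sigmoid(beta * x)
expected = torch.sigmoid(beta * x.detach())
grads_match = torch.allclose(x.grad, expected, atol=1e-6)

print(grads_match)
```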

Custom Activation Function 2: Sigmoid Function

1. Mathematical Formula :

The Sigmoid activation function is defined as:

\sigma(x) = \frac{1}{1+e^{-x}}

2. PyTorch Code :

To define this custom activation function in PyTorch, we can write:

Python
import torch
import torch.nn as nn


class Sigmoid(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        # Element-wise 1 / (1 + exp(-x))
        return 1 / (1 + torch.exp(-x))

To plot the graph of this activation function, we can use the following code:

Python
import matplotlib.pyplot as plt

# Evaluate the Sigmoid module on 100 evenly spaced points in [-10, 10]
x = torch.linspace(-10, 10, 100)
k = Sigmoid()
y = k(x)

# Plot the sigmoid function
plt.plot(x, y)
plt.grid(True)
plt.title('Sigmoid Function')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Output:

Figure: Sigmoid function

Custom Activation Function 3: Swish Function

1. Mathematical Formula :

The Swish activation function is defined as:

Swish(x) = x \cdot \sigma(x) = \frac{x}{1+e^{-x}}

where \sigma is the sigmoid function defined above.

2. PyTorch Code :

Here's the PyTorch code to define the Swish activation function :

Python
import torch
import torch.nn as nn


class SwishActivation(nn.Module):
    def __init__(self):
        super(SwishActivation, self).__init__()

    def forward(self, x):
        # Swish(x) = x * sigmoid(x)
        sigmoid = 1 / (1 + torch.exp(-x))
        return x * sigmoid

To plot the graph of the Swish activation function, we can use the following code:

Python
import matplotlib.pyplot as plt

# Evaluate the SwishActivation module on 100 evenly spaced points in [-10, 10]
x = torch.linspace(-10, 10, 100)
k = SwishActivation()
y = k(x)

# Plot the Swish activation function
plt.plot(x, y)
plt.grid(True)
plt.title('Swish Activation Function')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Output:

Figure: Swish activation function
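PyTorch also ships this function as `nn.SiLU`, which computes x * sigmoid(x). The following sketch (an illustration, not part of the original article) redefines the custom module so the snippet is self-contained and checks that it agrees with the built-in layer on the plotted range.

```python
import torch
import torch.nn as nn


class SwishActivation(nn.Module):
    def forward(self, x):
        sigmoid = 1 / (1 + torch.exp(-x))
        return x * sigmoid


x = torch.linspace(-10, 10, 100)
custom_out = SwishActivation()(x)
builtin_out = nn.SiLU()(x)

# The hand-written module and the built-in layer should agree up to rounding
matches_builtin = torch.allclose(custom_out, builtin_out, atol=1e-6)
print(matches_builtin)
```

A comparison like this is a convenient unit test whenever a custom activation has a built-in counterpart.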

Training a Neural Network with Custom Activation Functions using PyTorch

In this section, we use one of the custom activation functions defined above, Swish, to train a simple fully connected neural network on the MNIST dataset.

Steps:

  • Import the necessary libraries
  • Define the custom Swish activation function
  • Define the PyTorch neural network model
  • Load and prepare the MNIST dataset
  • Initialize the model and define the loss function and optimizer.
  • Train the model and plot the loss vs iterations curve.
Python
# Import the necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt


# Define the Swish activation function
class SwishActivation(nn.Module):
    def __init__(self):
        super(SwishActivation, self).__init__()

    def forward(self, x):
        sigmoid = 1 / (1 + torch.exp(-x))
        return x * sigmoid


# Define the neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.activation = SwishActivation()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # Flatten each 28x28 image into a 784-element vector
        x = x.view(-1, 784)
        x = self.activation(self.fc1(x))
        x = self.fc2(x)
        return x


# Load the MNIST dataset
train_dataset = datasets.MNIST(root='./data',
                               train=True,
                               download=True,
                               transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=128,
                                           shuffle=True)

# Initialize the neural network
model = Net()

# Define the loss function, optimizer, and learning rate
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Train the model
loss_list = []
for epoch in range(10):
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader, 0):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    # running_loss is the sum of the batch losses for this epoch
    loss_list.append(running_loss)
    print('Epoch %d loss: %.3f' % (epoch + 1, running_loss))

# Plot the loss curve (one point per epoch)
plt.plot(loss_list)
plt.title('Loss vs Iterations')
plt.xlabel('Iterations')
plt.ylabel('Loss')
plt.show()

Output: 

Epoch 1 loss: 921.342
Epoch 2 loss: 459.694
Epoch 3 loss: 281.649
Epoch 4 loss: 227.518
Epoch 5 loss: 201.599
Epoch 6 loss: 186.339
Epoch 7 loss: 176.337
Epoch 8 loss: 169.018
Epoch 9 loss: 163.496
Epoch 10 loss: 158.999
Figure: Loss vs Iterations

The learning rate is a hyperparameter that controls the step size at each iteration of gradient descent. In this code, the learning rate is set to 0.01 for the stochastic gradient descent optimizer. The choice of learning rate can significantly affect the training process of the neural network. If the learning rate is too high, the optimizer may overshoot the optimal solution, resulting in instability or divergence. If the learning rate is too low, the optimizer may converge too slowly, leading to long training times.
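The effect of the step size can be seen on a toy problem. The sketch below is an illustration under stated assumptions: the function f(w) = w**2, the helper name `gradient_descent`, and the three learning rates are all chosen for demonstration and do not come from the MNIST example. It runs plain gradient descent for 20 steps with a very small, a moderate, and an overly large learning rate.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimise f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # gradient descent update
    return w


too_low = gradient_descent(lr=0.001)    # barely moves away from the start
reasonable = gradient_descent(lr=0.1)   # approaches the minimum at w = 0
too_high = gradient_descent(lr=1.1)     # each step overshoots and |w| grows

print(too_low, reasonable, too_high)
```

Each update multiplies w by (1 - 2 * lr), so the iterates shrink towards zero only when that factor has magnitude below 1. For this particular function, that means lr must lie between 0 and 1.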

Overfitting occurs when a neural network model fits the training data too well, often because it is too complex for the amount of data, and therefore generalizes poorly to new data. Overfitting can be observed when the training loss continues to decrease but the validation loss starts to increase. In this code, we do not have a separate validation set to evaluate the model's performance on unseen data, so overfitting cannot be detected from the output above. It can be mitigated with techniques such as early stopping, dropout, or weight decay.
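One way to obtain a validation loss is to hold out part of the training data. The following is a minimal sketch, not the article's training script: random tensors stand in for MNIST so that the snippet runs without a download, `nn.SiLU` stands in for the custom Swish module, and the 800/200 split and the helper name `mean_loss` are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Synthetic stand-in for MNIST: 1000 flattened 28x28 "images" and 10 classes
features = torch.randn(1000, 784)
labels = torch.randint(0, 10, (1000,))
full_dataset = TensorDataset(features, labels)

# Hold out 20% of the samples for validation
train_set, val_set = random_split(full_dataset, [800, 200])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128)

model = nn.Sequential(nn.Linear(784, 128), nn.SiLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)


def mean_loss(loader, train):
    """Average per-batch loss over a loader; weights are updated only if train is True."""
    model.train(train)
    total = 0.0
    with torch.set_grad_enabled(train):
        for inputs, targets in loader:
            loss = criterion(model(inputs), targets)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total += loss.item()
    return total / len(loader)


# Record (training loss, validation loss) after every epoch
history = []
for epoch in range(3):
    train_loss = mean_loss(train_loader, train=True)
    val_loss = mean_loss(val_loader, train=False)
    history.append((train_loss, val_loss))
    print('Epoch %d train: %.3f val: %.3f' % (epoch + 1, train_loss, val_loss))
```

With real data, a validation loss that rises while the training loss keeps falling is the signal to stop training or add regularization.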

Underfitting occurs when a neural network model is too simple, or has not been trained long enough, to capture the underlying patterns in the data. Underfitting can be observed when the training loss and validation loss are both high. In this code, we train the neural network for ten epochs, and the training loss is still decreasing at the end (from 163.496 in epoch 9 to 158.999 in epoch 10), so the model may not yet have reached a good fit. Increasing the number of epochs or adding more layers to the neural network could potentially improve the model's performance.
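As a hedged sketch of the "adding more layers" option, the snippet below builds a deeper variant of the network with the custom Swish activation after every hidden layer. The class name `DeeperNet` and the hidden widths (256, 128) are hypothetical choices for illustration, not tuned values.

```python
import torch
import torch.nn as nn


class SwishActivation(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)


class DeeperNet(nn.Module):
    def __init__(self, hidden=(256, 128)):
        super().__init__()
        layers = []
        in_features = 784
        # One Linear + Swish pair per hidden width
        for width in hidden:
            layers += [nn.Linear(in_features, width), SwishActivation()]
            in_features = width
        # Final layer maps to the 10 digit classes
        layers.append(nn.Linear(in_features, 10))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Flatten each 28x28 image into a 784-element vector
        return self.net(x.view(-1, 784))


model = DeeperNet()
logits = model(torch.randn(4, 1, 28, 28))
print(logits.shape)
```

Whether the extra capacity helps depends on the data and the training budget, so it should be judged on a validation set rather than on the training loss alone.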

Applications of Extending PyTorch with Custom Activation Functions

  1. Novel activation functions: Extending PyTorch with custom activation functions allows researchers and practitioners to experiment with new activation functions and test their effectiveness in different applications.
  2. Customized models: Custom activation functions can be used to create customized models that better suit specific tasks or data distributions.
  3. Domain-specific models: Custom activation functions can be used to create models that are specifically designed for a particular domain or application, such as computer vision, speech recognition, or natural language processing.

shivammahajan5516