Swish Activation Function in PyTorch

Last Updated : 25 Jul, 2024

Activation functions are a fundamental component of artificial neural networks. They introduce non-linearity into the model, allowing it to learn complex relationships in the data. One such activation function, the Swish activation function, has gained attention for its unique properties and potential advantages over the widely used Rectified Linear Unit (ReLU) activation. In this article, we'll delve into the Swish activation function, provide the mathematical formula, explore its advantages over ReLU, and demonstrate its implementation using PyTorch.

Swish Activation Function

The Swish activation function, introduced by researchers at Google in 2017, is defined mathematically as follows:

Swish(x) = x * sigmoid(x)

Where:

  • x: The input value to the activation function.
  • sigmoid(x): The sigmoid function, which maps any real-valued number into the range (0, 1), transitioning smoothly from 0 to 1 as x increases.

The Swish activation combines a linear component (the input x) with a non-linear component (the sigmoid function), resulting in a smooth and differentiable activation function.
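
The general Swish from the original paper is x * sigmoid(β * x), where β may be fixed or learned; the β = 1 case shown above coincides with the SiLU function, which PyTorch ships as a built-in (torch.nn.SiLU, functional form torch.nn.functional.silu). As a quick sanity check of the formula, the sketch below compares the manual expression against the built-in:

Python
import torch
import torch.nn.functional as F

x = torch.linspace(-4, 4, 9)

manual = x * torch.sigmoid(x)  # Swish(x) = x * sigmoid(x)
builtin = F.silu(x)            # PyTorch's built-in SiLU (Swish with beta = 1)

print(torch.allclose(manual, builtin))  # True: the two match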

Where to Use Swish Activation?

Swish can be used in various neural network architectures, including feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). Its advantages become particularly apparent in deep networks where it can help mitigate the vanishing gradient problem.
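
As an illustration, here is a minimal sketch of dropping Swish into a small CNN in place of ReLU. The layer sizes are arbitrary, chosen only for the example:

Python
import torch
import torch.nn as nn

# A small CNN using SiLU (Swish with beta = 1) where ReLU would usually go;
# layer sizes here are illustrative only.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.SiLU(),                           # Swish after the first conv layer
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.SiLU(),                           # and after the second
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),
)

out = cnn(torch.randn(8, 1, 28, 28))  # batch of 8 grayscale 28x28 images
print(out.shape)                      # torch.Size([8, 10])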

Advantages of Swish Activation Function over ReLU

Now, let's explore the advantages of the Swish activation function compared to the popular ReLU activation.

Smoothness and Differentiability

Swish is smooth and differentiable everywhere thanks to its sigmoid component, which makes it well suited to gradient-based optimization techniques such as stochastic gradient descent (SGD) and backpropagation. In contrast, ReLU is not differentiable at x = 0, which can lead to optimization challenges.
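
Concretely, applying the product rule gives a derivative that is defined everywhere:

Swish'(x) = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
          = Swish(x) + sigmoid(x) * (1 - Swish(x))

This derivative varies smoothly with x, whereas ReLU's derivative jumps from 0 to 1 at x = 0.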

Improved Learning in Deep Networks

In deep neural networks, Swish can potentially enable better learning and convergence compared to ReLU. The smoothness of Swish helps gradients flow more smoothly through the network, reducing the likelihood of vanishing gradients during training. This is especially beneficial in very deep networks.
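
A quick way to see this is to compare gradients at negative inputs, where ReLU's gradient is exactly zero but Swish's is not, so some error signal still flows backward through "inactive" units:

Python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, 0.5, 2.0], requires_grad=True)

# Gradient of Swish (SiLU): small but non-zero even for negative inputs.
F.silu(x).sum().backward()
print(x.grad)

x.grad = None
# Gradient of ReLU: exactly zero wherever the input is negative.
F.relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 1., 1.])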

Similar Computational Cost

Swish is nearly as cheap to compute as ReLU. It adds one sigmoid evaluation (an exponential) per element, slightly more work than ReLU's simple max(0, x), but the overhead during training and inference is small in practice.
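
If you want to measure the difference on your own hardware, a rough micro-benchmark is easy to write (absolute timings will vary by machine; this only illustrates how to compare):

Python
import timeit
import torch
import torch.nn.functional as F

x = torch.randn(1_000_000)

t_relu = timeit.timeit(lambda: F.relu(x), number=100)
t_silu = timeit.timeit(lambda: F.silu(x), number=100)
print(f"ReLU: {t_relu:.4f}s, Swish/SiLU: {t_silu:.4f}s")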

Implementation Using PyTorch

Now, let's see how to implement the Swish activation function using PyTorch. We'll create a custom Swish module and integrate it into a simple neural network.

Let's start with importing the necessary libraries.

Python
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

With the libraries imported, we can define the custom Swish activation.

The following code defines a class that inherits from PyTorch's nn.Module base class. Inside the class, the forward method defines how the module processes input data: it takes an input tensor as an argument and returns the output tensor after applying the Swish activation.

Python
# Swish Function
class Swish(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)
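
A quick smoke test of the module (the printed values follow directly from the formula, rounded to four decimal places):

Python
swish = Swish()
x = torch.tensor([-2.0, 0.0, 2.0])
print(swish(x))  # tensor([-0.2384, 0.0000, 1.7616])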

After defining the Swish class, we proceed with defining the neural network model.

In the following code snippet, we define a neural network model in PyTorch for an image classification task.

  • The input is a 28x28 pixel image, flattened into a 784-dimensional vector.
  • The hidden layers:
    • The first hidden layer consists of 256 neurons. It takes the flattened input and applies a linear transformation.
    • The second hidden layer consists of 128 neurons; it takes the 256-dimensional output from the previous layer and produces a 128-dimensional output.
    • The Swish activation function is applied after both hidden layers to introduce non-linearity into the network.
  • The output layer consists of 10 neurons, one per class.
Python
# Define the neural network model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)
        self.swish = Swish()

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = self.fc1(x)
        x = self.swish(x)
        x = self.fc2(x)
        x = self.swish(x)
        x = self.fc3(x)
        return x

To set up the neural network for training, we create an instance of the model, define the loss function, the optimizer and data transformations.

Python
# Create an instance of the model
model = Net()

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Define data transformations
transform = transforms.Compose([
    transforms.ToTensor(),
])

With the setup complete, we can train and evaluate the model. Let's load the MNIST dataset and create data loaders for training and testing using the following code.

Python
# Load the MNIST dataset
train_dataset = datasets.MNIST('', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('', train=False, download=True, transform=transform)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

With these data loaders in place, we can proceed with the training loop, iterating through batches of training data.

In the following code, we have executed the training loop for the neural network. The loop will repeat for 5 epochs, during which the model's weights are updated to minimize the loss and improve its performance on the training data.

Python
# Training loop
num_epochs = 5
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {total_loss / len(train_loader)}")

Output:

Epoch 1/5, Loss: 1.6938323568503062
Epoch 2/5, Loss: 0.4569567457397779
Epoch 3/5, Loss: 0.3522500048557917
Epoch 4/5, Loss: 0.31695075702369213
Epoch 5/5, Loss: 0.2961081813474496

The final step is to evaluate the trained model on the test set.

Python
# Evaluation loop
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)
        outputs = model(data)
        _, predicted = torch.max(outputs.data, 1)
        total += target.size(0)
        correct += (predicted == target).sum().item()

print(f"Accuracy on test set: {100 * correct / total}%")

Output:

Accuracy on test set: 92.02%

Conclusion

The Swish activation function offers a promising alternative to traditional activation functions like ReLU. Its smoothness, differentiability, and potential to improve learning in deep networks make it a valuable tool for modern neural network architectures. By implementing Swish in PyTorch, you can harness its benefits and explore its effectiveness in various machine learning tasks.

