Softmax Regression using TensorFlow

Last Updated : 10 Mar, 2023

This article discusses the basics of Softmax Regression and its implementation in Python using the TensorFlow library.

Softmax regression

Softmax regression (or multinomial logistic regression) is a generalization of logistic regression to the case where the target column contains multiple classes. In binary logistic regression, the labels are binary: for the ith observation,

$y_i \in \{0, 1\}$

But consider a scenario where we need to classify an observation into one of three or more classes. For example, in digit classification, the possible labels are:

$y_i \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$

In such cases, we can use Softmax Regression. 

Softmax layer

Training the model directly on raw score values is difficult: the scores are unnormalized, and a cost function built on them is hard to differentiate when implementing the gradient descent algorithm. So we need a function that normalizes the logit scores and is easily differentiable. To convert the score matrix Z into probabilities, we use the softmax function. For a vector y with n components, the softmax function S(y) is defined componentwise as:

$S(y_i) = \frac{e^{y_i}}{\sum_{j=0}^{n-1} e^{y_j}}$

So, the softmax function helps us to achieve two functionalities:

1. It converts all scores to probabilities.
2. The probabilities sum to 1.
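
For example, here is a minimal NumPy sketch (not part of the article's implementation) demonstrating both properties on an arbitrary score vector:

Python3

import numpy as np

def softmax(y):
    # subtract the max score for numerical stability; the result is unchanged
    e = np.exp(y - np.max(y))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # approximately [0.659 0.242 0.099] -- scores mapped to probabilities
print(probs.sum())  # 1.0 -- the probabilities sum to one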

Recall that in binary logistic regression we used the sigmoid function for the same task; the softmax function is a generalization of the sigmoid. The softmax function computes the probability that the ith training sample belongs to class j (out of k classes), given the logits vector $Z_i$, as:

$P(y = j \mid Z_i) = \left[S(Z_i)\right]_j = \frac{e^{Z_{ij}}}{\sum_{p=1}^{k} e^{Z_{ip}}}$

In vector form, we can simply write:

$P(y = j \mid Z_i) = \left[S(Z_i)\right]_j$

For simplicity, let $S_i$ denote the softmax probability vector for the ith observation.

Cost function

Now we need a cost function that compares the softmax probabilities with the one-hot encoded target vector. We use cross-entropy for this. Cross-entropy is a distance-like measure between the probabilities produced by the softmax function and the one-hot target matrix: it is small when the predicted probability of the true class is high, and large when it is low. We define the cross-entropy $D(S_i, T_i)$ for the ith observation, with softmax probability vector $S_i$ and one-hot target vector $T_i$, as:

$D(S_i, T_i) = -\sum_{j=1}^{k} T_{ij} \log S_{ij}$

The cost function $J$ is then defined as the average cross-entropy over all $n$ observations:

$J(W, b) = \frac{1}{n} \sum_{i=1}^{n} D(S_i, T_i)$
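
As a quick worked example (a sketch, not part of the original implementation), the cross-entropy for a single observation follows directly from the formula:

Python3

import numpy as np

S_i = np.array([0.7, 0.2, 0.1])  # softmax probabilities for one observation
T_i = np.array([1.0, 0.0, 0.0])  # one-hot target: the true class is the first one

# D(S_i, T_i) = -sum_j T_ij * log(S_ij); only the true-class term survives
D = -np.sum(T_i * np.log(S_i))
print(D)  # ~0.357, i.e. -log(0.7); a confident wrong prediction would score much higher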

Let us now implement softmax regression on the MNIST handwritten-digit dataset using the TensorFlow library.

Importing Libraries and Dataset

First of all, we import the dependencies. 

Python3

import tensorflow as tf
# TF1-style (graph/session) API used for the low-level implementation below
import tensorflow.compat.v1 as tf1

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# note: if tf1.placeholder raises an eager-execution error in your TF2
# environment, you may need to call tf1.disable_eager_execution() here

TensorFlow can download and load the MNIST data for us. The code below downloads the dataset and unpacks it into training and validation arrays.

Python3

# download MNIST and split it into training and validation sets
(X_train, Y_train),\
(X_val, Y_val) = tf.keras.datasets.mnist.load_data()
print("Shape of feature matrix:", X_train.shape)
print("Shape of target matrix:", Y_train.shape)

Output:

Shape of feature matrix: (60000, 28, 28)
Shape of target matrix: (60000,)

Now, let's look at the structure of the dataset. MNIST is split into two parts: 60,000 training images and 10,000 validation images. Each image is 28 by 28 pixels, and there are 10 class labels.
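
As an optional sanity check (not part of the original code), we can confirm the label set and the validation split directly:

Python3

import numpy as np

print(np.unique(Y_train))        # [0 1 2 3 4 5 6 7 8 9] -- the 10 class labels
print(X_val.shape, Y_val.shape)  # (10000, 28, 28) (10000,)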

Python3

# visualize the data by plotting a 10x10 grid of random training images
fig, ax = plt.subplots(10, 10)
for i in range(10):
    for j in range(10):
        # pick a random training example
        k = np.random.randint(0, X_train.shape[0])
        ax[i][j].imshow(X_train[k], aspect='auto')
        ax[i][j].axis('off')  # hide axis ticks for a cleaner grid
plt.show()

Output:

Sample images from the MNIST data

Now let's define the hyperparameters in one place so that we can control them for the whole notebook. We also need to flatten the images and one-hot encode the labels to get the desired input format.
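
As a quick illustration (a sketch on a few made-up labels), pd.get_dummies converts a label column into a one-hot matrix with one column per distinct label; the actual preprocessing cell follows.

Python3

import pandas as pd

# one-hot encode three example labels
print(pd.get_dummies(pd.Series([0, 3, 9])).values.astype(int))
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]]
# only the labels present (0, 3, 9) get columns here; on Y_train all ten
# digits occur, so the result is the full 10-column matrix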

Python3

num_features = 784   # 28 * 28 pixels per image
num_labels = 10      # digits 0-9
learning_rate = 0.05
batch_size = 128
num_steps = 5001

# input data: flatten each 28x28 image into a 784-dimensional row vector
train_dataset = X_train.reshape(-1, 784)
# one-hot encode the labels (shape: (60000, 10))
train_labels = pd.get_dummies(Y_train).values
valid_dataset = X_val.reshape(-1, 784)
valid_labels = pd.get_dummies(Y_val).values

Computation Graph

Now, we create a computation graph. A computation graph declares all operations up front and executes them later inside a session; this is the TensorFlow 1.x execution style (as opposed to the eager execution that is the default in TensorFlow 2.x), which is why the tf.compat.v1 API is used below.

Python3

# initialize a tensorflow graph
graph = tf.Graph()

with graph.as_default():
    # Inputs: placeholders are fed one minibatch at a time
    tf_train_dataset = tf1.placeholder(tf.float32,
                                       shape=(batch_size, num_features))
    tf_train_labels = tf1.placeholder(tf.float32,
                                      shape=(batch_size, num_labels))
    # the validation set is small enough to keep as a constant
    tf_valid_dataset = tf.constant(valid_dataset, dtype=tf.float32)

    # Variables: the weight matrix (784 x 10) and bias vector (10) to learn
    weights = tf.Variable(
        tf.random.truncated_normal([num_features, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation: logits Z = XW + b, then the averaged
    # softmax cross-entropy J(W, b) defined above
    logits = tf.matmul(tf_train_dataset, weights) + biases
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=tf_train_labels, logits=logits))

    # Optimizer: plain gradient descent on the loss
    optimizer = tf1.train.GradientDescentOptimizer(
        learning_rate).minimize(loss)

    # Predictions for the training and validation data
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(
        tf.matmul(tf_valid_dataset, weights) + biases)
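
Note that tf.nn.softmax_cross_entropy_with_logits fuses the softmax and the cross-entropy $D(S_i, T_i)$ defined earlier into a single, numerically stable op. Here is a minimal standalone sketch verifying the equivalence (eager-mode TF2 code; run it outside the graph above, and only if eager execution has not been disabled):

Python3

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
labels = tf.constant([[1.0, 0.0, 0.0]])

# fused op vs. the manual softmax-then-cross-entropy computation
fused = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
manual = -tf.reduce_sum(labels * tf.math.log(tf.nn.softmax(logits)), axis=1)
print(float(fused[0]), float(manual[0]))  # both ~0.417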

Running the Computation Graph

Since we have already built the computation graph, now it’s time to run it through a session.

Python3

# utility function to calculate accuracy
def accuracy(predictions, labels):
    # count the rows where the predicted class (argmax of the softmax
    # probabilities) matches the true class (argmax of the one-hot labels)
    correctly_predicted = np.sum(
        np.argmax(predictions, 1) == np.argmax(labels, 1))
    acc = (100.0 * correctly_predicted) / predictions.shape[0]
    return acc

We will use the above utility function to calculate the accuracy of the model as the training goes on.
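
For example (a quick sketch with made-up arrays):

Python3

import numpy as np

preds = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])  # predicted probabilities
labels = np.array([[1, 0], [0, 1], [1, 0]])             # one-hot targets
print(accuracy(preds, labels))  # 100.0 -- every argmax matches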

Python3

with tf1.Session(graph=graph) as session:
    # initialize weights and biases
    tf1.global_variables_initializer().run()
    print("Initialized")

    for step in range(num_steps):
        # pick a randomized offset into the training set
        offset = np.random.randint(0, train_labels.shape[0] - batch_size - 1)

        # generate a minibatch
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]

        # prepare the feed dict that fills the two placeholders
        feed_dict = {tf_train_dataset: batch_data,
                     tf_train_labels: batch_labels}

        # run one step of computation
        _, l, predictions = session.run([optimizer, loss, train_prediction],
                                        feed_dict=feed_dict)

        if (step % 500 == 0):
            print("Minibatch loss at step {0}: {1}".format(step, l))
            print("Minibatch accuracy: {:.1f}%".format(
                accuracy(predictions, batch_labels)))
            # .eval() runs the validation branch of the graph in this session
            print("Validation accuracy: {:.1f}%".format(
                accuracy(valid_prediction.eval(), valid_labels)))

Output:

Initialized
Minibatch loss at step 0: 3185.3974609375
Minibatch accuracy: 7.0%
Validation accuracy: 21.1%
Minibatch loss at step 500: 619.6030883789062
Minibatch accuracy: 86.7%
Validation accuracy: 89.0%
Minibatch loss at step 1000: 247.22283935546875
Minibatch accuracy: 93.8%
Validation accuracy: 85.7%
Minibatch loss at step 1500: 2945.78662109375
Minibatch accuracy: 78.9%
Validation accuracy: 83.6%
Minibatch loss at step 2000: 337.13922119140625
Minibatch accuracy: 94.5%
Validation accuracy: 89.0%
Minibatch loss at step 2500: 409.4652404785156
Minibatch accuracy: 89.8%
Validation accuracy: 90.6%
Minibatch loss at step 3000: 1077.618408203125
Minibatch accuracy: 84.4%
Validation accuracy: 90.3%
Minibatch loss at step 3500: 986.0247802734375
Minibatch accuracy: 80.5%
Validation accuracy: 85.9%
Minibatch loss at step 4000: 467.134521484375
Minibatch accuracy: 89.8%
Validation accuracy: 85.1%
Minibatch loss at step 4500: 1007.259033203125
Minibatch accuracy: 87.5%
Validation accuracy: 87.5%
Minibatch loss at step 5000: 342.13690185546875
Minibatch accuracy: 94.5%
Validation accuracy: 89.6%

Some important points to note:

  • In every iteration, a minibatch is selected by choosing a random offset value with the np.random.randint method.
  • To feed the placeholders tf_train_dataset and tf_train_labels, we create a feed_dict like this:

feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}

  • The loss values are large because the raw pixel values (0-255) are fed in without normalization; scaling the inputs to [0, 1] would give smaller, more stable losses.

Many of the functionalities implemented from scratch here are provided out of the box by TensorFlow's higher-level APIs, but building them manually gives a better intuition for the mathematical formulas behind the softmax regression classifier.
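
For comparison, here is a sketch of the same classifier expressed with the high-level Keras API: a single Dense layer with a softmax activation is exactly softmax regression. It reuses the hyperparameters and preprocessed arrays defined above.

Python3

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(num_features,)),
    # one dense softmax layer computes S(XW + b)
    tf.keras.layers.Dense(num_labels, activation='softmax')
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
              loss='categorical_crossentropy',  # the average cross-entropy J(W, b)
              metrics=['accuracy'])
# model.fit(train_dataset, train_labels, batch_size=batch_size, epochs=5,
#           validation_data=(valid_dataset, valid_labels))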


