
Vectorization Of Gradient Descent

Last Updated : 24 Oct, 2020

In Machine Learning, regression problems can be solved in the following ways:

1. Using Optimization Algorithms – Gradient Descent

  • Batch Gradient Descent
  • Stochastic Gradient Descent
  • Mini-Batch Gradient Descent
  • Other advanced optimization algorithms (e.g., Conjugate Gradient Descent)

2. Using the Normal Equation:

  • Using the concept of linear algebra (a closed-form solution), as sketched below.
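For reference, the Normal Equation solves for the parameters in closed form: \theta=(X^TX)^{-1}X^Ty. Below is a minimal NumPy sketch (the function name is illustrative, and X is assumed to already include a bias column of ones):

# Closed-form linear regression via the Normal Equation (illustrative sketch).
import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^(-1) X^T y
    return np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

In practice, np.linalg.pinv is often preferred over np.linalg.inv, since it also handles the case where X^T X is singular.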

Let's consider the case of Batch Gradient Descent for a univariate linear regression problem.

The cost function for this regression problem is:

J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(h_{\theta}(x^i)-y^i)^2

Goal:

\underset{\theta_{0},\ \theta_{1}}{\text{minimize}}\ J(\theta)
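As a concrete check of the formula above, the cost J(\theta) can be computed with matrix operations in a few lines (a minimal sketch; the function name is illustrative, X is assumed to carry the bias column x_0 = 1, and theta is a column vector):

# Vectorized cost function J(theta) (illustrative sketch).
import numpy as np

def compute_cost(X, y, theta):
    m = len(y)
    errors = X.dot(theta) - y          # h_theta(x) - y for every sample
    return float((1 / (2 * m)) * errors.T.dot(errors))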

To solve this problem, we can either take a vectorized approach (using linear algebra) or an unvectorized approach (using for-loops).

1. Unvectorized Approach:

Here, the mathematical expressions below are evaluated using for-loops.

The expression below is the summation term of the cost function:

\sum_{i=1}^{m}(h_{\theta}(x^i)-y^i)^2

The expression below is the hypothesis:

h_{\theta}=\theta_{0}x_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\dots+\theta_{n}x_{n}

where h_{\theta} is the hypothesis.
Code: Python implementation of the unvectorized Gradient Descent approach
# Import required modules.
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
import numpy as np
import time
   
# Create and plot the data set.
x, y = make_regression(n_samples = 100, n_features = 1,
                       n_informative = 1, noise = 10, random_state = 42)
  
plt.scatter(x, y, c = 'red')
plt.xlabel('Feature')
plt.ylabel('Target_Variable')
plt.title('Training Data')
plt.show()
  
# Convert y from 1d to 2d array.
y = y.reshape(100, 1)
   
# Number of Iterations for Gradient Descent
num_iter = 1000
   
# Learning Rate
alpha = 0.01
   
# Number of Training samples.
m = len(x)
   
# Initializing Theta.
theta = np.zeros((2, 1),dtype = float)
   
# Variables
t0 = t1 = 0
Grad0 = Grad1 = 0
  
# Batch Gradient Descent.
start_time = time.time()
   
for i in range(num_iter):
    # To find Gradient 0.
    for j in range(m):
        Grad0 = Grad0 + (theta[0] + theta[1] * x[j]) - (y[j])
      
    # To find Gradient 1.
    for k in range(m):
        Grad1 = Grad1 + ((theta[0] + theta[1] * x[k]) - (y[k])) * x[k]
    t0 = theta[0] - (alpha * (1/m) * Grad0)
    t1 = theta[1] - (alpha * (1/m) * Grad1)
    theta[0] = t0
    theta[1] = t1
    Grad0 = Grad1 = 0
       
# Print the model parameters.    
print('model parameters:',theta,sep = '\n')
   
# Print the time taken for Gradient Descent to run.
print('Time Taken For Gradient Descent in Sec:',time.time()- start_time)
  
# Prediction on the same training set.
h = []
for i in range(m):
    h.append(theta[0] + theta[1] * x[i])
       
# Plot the output.
plt.plot(x, h)
plt.scatter(x, y, c = 'red')
plt.xlabel('Feature')
plt.ylabel('Target_Variable')
plt.title('Output')
plt.show()
Output:

model parameters:
[[ 1.15857049]
 [44.42210912]]
Time Taken For Gradient Descent in Sec: 2.482538938522339

2. Vectorized Approach:

Here, the same mathematical expressions are evaluated using matrices and vectors (linear algebra).

The expression below is the summation term of the cost function:

\sum_{i=1}^{m}(h_{\theta}(x^i)-y^i)^2

The expression below is the hypothesis:

h_{\theta}=\theta^T X

where h_{\theta} is the hypothesis, and

\theta= \begin{bmatrix} \theta_{0} \\ \theta_{1} \\ \theta_{2} \\ \vdots \\ \theta_{n} \end{bmatrix} \quad X= \begin{bmatrix} x_{0} \\ x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}
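In code, the hypothesis for one example is just a dot product, and stacking examples as rows of a matrix gives all predictions in one operation (a minimal sketch with illustrative numbers):

# Hypothesis as a dot product (illustrative sketch).
import numpy as np

theta = np.array([[1.0], [2.0]])   # [theta_0, theta_1] as a column vector
x = np.array([[1.0], [3.0]])       # one sample: x_0 = 1 (bias), x_1 = 3
h_single = theta.T.dot(x)          # theta^T . x = 1*1 + 2*3 = 7

X = np.array([[1.0, 3.0],
              [1.0, 5.0]])         # each row is one sample
h_all = X.dot(theta)               # predictions for all samples at once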

Batch Gradient Descent:

Repeat until convergence:

\theta_{j}:=\theta_{j}-\frac{\alpha}{m}\cdot\frac{\partial J(\theta)}{\partial \theta_{j}}

Let\ Gradients=\frac{\partial J(\theta)}{\partial \theta_{j}}

Concept to find Gradients using matrix operations:

X\_New= \begin{bmatrix} x_{0}^1 & x_{1}^1 \\ x_{0}^2 & x_{1}^2 \\ \vdots & \vdots \\ x_{0}^m & x_{1}^m \end{bmatrix}_{m\times 2} \quad \theta= \begin{bmatrix} \theta_{0} \\ \theta_{1} \end{bmatrix}_{2\times 1} \quad where\ x_{0}^i=1

H(\theta)=X\_New\cdot\theta= \begin{bmatrix} \theta_{0}x_{0}^1+\theta_{1}x_{1}^1 \\ \theta_{0}x_{0}^2+\theta_{1}x_{1}^2 \\ \vdots \\ \theta_{0}x_{0}^m+\theta_{1}x_{1}^m \end{bmatrix}_{m\times 1} \quad and \quad Y= \begin{bmatrix} y^1 \\ y^2 \\ \vdots \\ y^m \end{bmatrix}_{m\times 1}

H(\theta)-Y= \begin{bmatrix} \theta_{0}x_{0}^1+\theta_{1}x_{1}^1-y^1 \\ \theta_{0}x_{0}^2+\theta_{1}x_{1}^2-y^2 \\ \vdots \\ \theta_{0}x_{0}^m+\theta_{1}x_{1}^m-y^m \end{bmatrix}_{m\times 1}

X\_New^T= \begin{bmatrix} x_{0}^1 & x_{0}^2 & \cdots & x_{0}^m \\ x_{1}^1 & x_{1}^2 & \cdots & x_{1}^m \end{bmatrix}_{2\times m}

Gradients=X\_New^T\cdot(H(\theta)-Y)= \begin{bmatrix} x_{0}^1(\theta_{0}x_{0}^1+\theta_{1}x_{1}^1-y^1)+x_{0}^2(\theta_{0}x_{0}^2+\theta_{1}x_{1}^2-y^2)+\dots \\ x_{1}^1(\theta_{0}x_{0}^1+\theta_{1}x_{1}^1-y^1)+x_{1}^2(\theta_{0}x_{0}^2+\theta_{1}x_{1}^2-y^2)+\dots \end{bmatrix}_{2\times 1}

Finally, we can say:

Gradients=\frac{\partial J(\theta)}{\partial \theta_{j}}=X\_New^T\cdot(X\_New\cdot\theta-Y)

Code: Python implementation of the vectorized Gradient Descent approach
# Import required modules.
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
import numpy as np
import time
   
# Create and plot the data set.
x, y = make_regression(n_samples = 100, n_features = 1,
                       n_informative = 1, noise = 10, random_state = 42)
  
plt.scatter(x, y, c = 'red')
plt.xlabel('Feature')
plt.ylabel('Target_Variable')
plt.title('Training Data')
plt.show()
  
  
# Adding x0=1 column to x array.
X_New = np.array([np.ones(len(x)), x.flatten()]).T
  
# Convert y from 1d to 2d array.
y = y.reshape(100, 1)
   
# Number of Iterations for Gradient Descent
num_iter = 1000
   
# Learning Rate
alpha = 0.01
   
# Number of Training samples.
m = len(x)
   
# Initializing Theta.
theta = np.zeros((2, 1),dtype = float)
   
# Batch-Gradient Descent.
start_time = time.time()
   
for i in range(num_iter):
    gradients = X_New.T.dot(X_New.dot(theta) - y)
    theta = theta - (1/m) * alpha * gradients
   
# Print the model parameters.    
print('model parameters:',theta,sep = '\n')
   
# Print the time taken for Gradient Descent to run.
print('Time Taken For Gradient Descent in Sec:',time.time() - start_time)
  
# Hypothesis.
h = X_New.dot(theta) # Prediction on training data itself.
   
# Plot the output.
plt.scatter(x, y, c = 'red')
plt.plot(x, h)
plt.xlabel('Feature')
plt.ylabel('Target_Variable')
plt.title('Output')
plt.show()

Output:

model parameters:
[[ 1.15857049]
 [44.42210912]]
Time Taken For Gradient Descent in Sec: 0.019551515579223633

Observations:

  1. The vectorized implementation dramatically reduces the execution time of Gradient Descent (about 2.48 s vs. 0.02 s in the runs above), since Python loops are replaced by optimized matrix operations (efficient code).
  2. The vectorized code is shorter, and therefore easier to read and debug. A small benchmark illustrating the speed-up is sketched below.
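The timing gap can be reproduced with a small standalone benchmark (a minimal sketch; the sample size and variable names are illustrative):

# Compare loop-based vs. vectorized gradient computation (illustrative sketch).
import numpy as np
import time

m = 100000
X = np.column_stack([np.ones(m), np.random.rand(m)])
y = np.random.rand(m, 1)
theta = np.zeros((2, 1))

# Loop-based gradients.
start = time.time()
g0 = g1 = 0.0
for i in range(m):
    err = theta[0, 0] + theta[1, 0] * X[i, 1] - y[i, 0]
    g0 += err
    g1 += err * X[i, 1]
print('Loop-based gradients:', time.time() - start, 'sec')

# Vectorized gradients.
start = time.time()
gradients = X.T.dot(X.dot(theta) - y)
print('Vectorized gradients:', time.time() - start, 'sec')

Both versions compute the same two gradient values; only the execution strategy differs.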

