Gradient Descent Algorithm in R

Last Updated : 09 Sep, 2024

Gradient Descent is a fundamental optimization algorithm used in machine learning and statistics. It minimizes a function by iteratively moving in the direction of steepest descent, defined by the negative of the gradient. The goal is to find the set of parameters that yields the lowest possible error for a given model.

How Gradient Descent Works

  • The goal of Gradient Descent is to adjust a model's parameters so that the error between the model's predictions and the actual values is minimized.
  • The algorithm calculates the gradient (slope) of the cost function with respect to each parameter and updates the parameters in the direction opposite to the gradient.
  • With each iteration, the parameters move closer to a point where the error is minimized (ideally the global minimum). The general update rule is shown below.
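
In symbols, every iteration applies the same update to each parameter \( \theta_j \), where \( \alpha \) is the learning rate and \( J(\theta) \) is the cost function being minimized:

\[ \theta_j = \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j} \]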

Learning Rate and Its Effect

The learning rate (α) determines how large a step is taken during each update. It plays a critical role in the convergence of the algorithm:

  • A high learning rate may cause the algorithm to overshoot the minimum, resulting in divergence.
  • A low learning rate can lead to slow convergence, making the algorithm inefficient.
  • An optimal learning rate strikes a balance between these two extremes, allowing for faster and more stable convergence, as the short sketch after this list illustrates.
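
To see the effect concretely, here is a minimal, self-contained sketch (not part of the worked example below; the function name and learning-rate values are our own) that minimizes \( f(x) = x^2 \), whose gradient is \( 2x \), with three different learning rates:

R
# Illustrative sketch: minimize f(x) = x^2 (gradient 2x) starting from x = 10.
# The three alpha values below are arbitrary, chosen to show slow convergence,
# fast convergence, and divergence.
descend <- function(alpha, steps = 50) {
  x <- 10
  for (i in 1:steps) {
    x <- x - alpha * 2 * x  # gradient step: x - alpha * f'(x)
  }
  x
}

for (alpha in c(0.01, 0.1, 1.05)) {
  cat("alpha =", alpha, " x after 50 steps:", descend(alpha), "\n")
}
# alpha = 0.01 creeps slowly toward 0, alpha = 0.1 converges quickly,
# and alpha = 1.05 overshoots on every step and diverges.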

Types of Gradient Descent

There are three main types of Gradient Descent:

  1. Batch Gradient Descent
  2. Stochastic Gradient Descent (SGD)
  3. Mini-batch Gradient Descent

1. Batch Gradient Descent

In Batch Gradient Descent, the gradient is calculated using the entire dataset. This means that every iteration takes into account all the data points when updating the model's parameters. While this approach is accurate, it can be slow and computationally expensive, especially with large datasets.

For a linear regression model, the loss function (Mean Squared Error, written here with the conventional 1/2 factor that simplifies the gradient) is:

\[ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2 \]

where,

  • \( m \) is the number of data points.
  • \( h_{\theta}(x^{(i)}) \) is the predicted value for the \( i \)-th data point.
  • \( y^{(i)} \) is the actual value.

The gradient for each parameter \( \theta_j \) is:

\[ \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \]

The parameters are updated as follows:

\[ \theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \]

2. Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent (SGD) updates the model's parameters for each individual data point, rather than using the entire dataset at once. This makes the algorithm faster and more efficient, especially for large datasets. However, because it uses only one data point at a time, the updates can be noisy, causing the loss function to fluctuate.

The gradient for each parameter \( \theta_j \) is calculated from a single data point:

\[ \frac{\partial J(\theta)}{\partial \theta_j} = \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \]

The parameters are updated as follows:

\[ \theta_j = \theta_j - \alpha \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \]
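
The worked example later in this article uses batch updates only. For comparison, a minimal SGD sketch for the same kind of simple linear model \( y \approx mx + b \) might look like the following (the function and variable names are our own, not from the example):

R
# Minimal SGD sketch: one randomly chosen observation per parameter update.
# Assumes numeric vectors x and y of equal length are already defined.
sgd <- function(x, y, alpha = 1e-4, n_epochs = 50) {
  m <- 0
  b <- 0
  n <- length(y)
  for (epoch in 1:n_epochs) {
    for (i in sample(n)) {            # visit the data in a fresh random order
      err <- (m * x[i] + b) - y[i]    # prediction error for one point
      m <- m - alpha * err * x[i]     # one-point gradient step for the slope
      b <- b - alpha * err            # one-point gradient step for the intercept
    }
  }
  list(m = m, b = b)
}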

3. Mini-batch Gradient Descent

Mini-batch Gradient Descent is a compromise between Batch Gradient Descent and Stochastic Gradient Descent. It splits the dataset into small batches and updates the model's parameters after processing each batch. This approach balances the efficiency of SGD and the accuracy of Batch Gradient Descent, reducing the noise while still being computationally efficient.

The gradient for each parameter \( \theta_j \) is calculated over a mini-batch of data points:

\[ \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{b} \sum_{i=1}^{b} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \]

where,

  • \( b \) is the number of data points in the mini-batch.

The parameters are updated as follows:

\[ \theta_j = \theta_j - \alpha \frac{1}{b} \sum_{i=1}^{b} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \]
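
As with the SGD sketch above, the worked example does not implement this variant; a minimal sketch under the same assumptions (our own names, an arbitrary batch size) could look like this:

R
# Minimal mini-batch sketch: average the gradient over each batch of points.
# Assumes numeric vectors x and y of equal length are already defined.
minibatch_gd <- function(x, y, alpha = 1e-4, batch_size = 16, n_epochs = 50) {
  m <- 0
  b <- 0
  n <- length(y)
  for (epoch in 1:n_epochs) {
    idx <- sample(n)                          # shuffle the data each epoch
    for (start in seq(1, n, by = batch_size)) {
      batch <- idx[start:min(start + batch_size - 1, n)]
      err <- (m * x[batch] + b) - y[batch]    # errors for this mini-batch
      m <- m - alpha * mean(err * x[batch])   # averaged gradient for the slope
      b <- b - alpha * mean(err)              # averaged gradient for the intercept
    }
  }
  list(m = m, b = b)
}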

Now we implement Batch Gradient Descent step by step for a linear regression problem in the R programming language.

Step 1: Data Preparation

First, create a synthetic dataset for this example.

R
set.seed(42)  # For reproducibility

n <- 100                                         # Number of data points
x <- runif(n, min = 0, max = 100)                # Predictor values
y <- 50 * x + 100 + rnorm(n, mean = 0, sd = 10)  # True line: slope 50, intercept 100, plus noise

Step 2: Initialize Parameters

Initialize the slope (m), intercept (b), learning rate (alpha), and the number of iterations.

R
m <- 0              # Initial slope
b <- 0              # Initial intercept
alpha <- 0.00001    # Learning rate
iterations <- 1000  # Number of iterations

Step 3: Manual Gradient Descent Implementation

We will implement the gradient descent algorithm using loops.

R
gradient_descent <- function(x, y, m, b, alpha, iterations) {
  n <- length(y)                       # Number of data points
  cost_history <- numeric(iterations)  # To store the cost at each iteration

  for (i in 1:iterations) {
    # Predicted values
    y_pred <- m * x + b

    # Calculate gradients
    gradient_m <- -(2/n) * sum(x * (y - y_pred))  # Gradient for slope (m)
    gradient_b <- -(2/n) * sum(y - y_pred)        # Gradient for intercept (b)

    # Update parameters
    m <- m - alpha * gradient_m
    b <- b - alpha * gradient_b

    # Calculate and store the cost (Mean Squared Error)
    cost <- sum((y - y_pred)^2) / n
    cost_history[i] <- cost

    # Print the cost every 100 iterations
    if (i %% 100 == 0) {
      cat("Iteration:", i, " Cost:", cost, "\n")
    }
  }

  return(list(m = m, b = b, cost_history = cost_history))
}

# Run the gradient descent algorithm
result <- gradient_descent(x, y, m, b, alpha, iterations)

# Extract final slope, intercept, and cost history
final_m <- result$m
final_b <- result$b
cost_history <- result$cost_history

Output:

Iteration: 100  Cost: 2395.926
Iteration: 200  Cost: 2390.765
Iteration: 300  Cost: 2388.487
Iteration: 400  Cost: 2386.211
Iteration: 500  Cost: 2383.938
Iteration: 600  Cost: 2381.667
Iteration: 700  Cost: 2379.397
Iteration: 800  Cost: 2377.131
Iteration: 900  Cost: 2374.866
Iteration: 1000  Cost: 2372.604

Step 4: Plot the Fitted Line

Visualize the data points and the best-fit line obtained from gradient descent.

R
plot(x, y, main = "Gradient Descent: Fitted Line", xlab = "x", ylab = "y")
abline(a = final_b, b = final_m, col = "red", lwd = 2)

Output:

[Figure: scatter plot of the data points with the fitted line from gradient descent]

Step 5: Visualization of the Cost Function Over Iterations

Plot the cost function over the iterations to visualize how the algorithm converges toward the minimum.

R
plot(1:iterations, cost_history, type = "l", col = "blue", lwd = 2,
     main = "Cost Function over Iterations", xlab = "Iterations", ylab = "Cost")

Output:

[Figure: plot of the cost function decreasing over iterations]

Step 6: Summary of Results

Check the final result.

R
cat("Final Slope (m):", final_m, "\nFinal Intercept (b):", final_b, "\n") 

Output:

Final Slope (m): 51.42538 
Final Intercept (b): 1.215063
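
Note that the slope is close to the true value of 50 used to generate the data, but with this very small learning rate the intercept (true value 100) is still far from converged after 1,000 iterations. As a quick sanity check (not part of the original walkthrough), the estimates can be compared against R's built-in closed-form least-squares fit:

R
# Sanity check: compare gradient descent with the closed-form fit from lm().
fit <- lm(y ~ x)
cat("lm() Slope:", coef(fit)[2], "\nlm() Intercept:", coef(fit)[1], "\n")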

Applications and Considerations

Applications:

  • Linear and Logistic Regression: Gradient Descent is used to optimize the parameters for these models.
  • Neural Networks: It is crucial for training neural networks, where it helps adjust weights and biases to minimize error.
  • Support Vector Machines (SVMs): Gradient Descent can be used to optimize the margin in SVMs.

Considerations:

  • The learning rate must be chosen carefully. Too high a rate can cause the algorithm to overshoot the minimum, while too low a rate can make convergence very slow.
  • Gradient Descent might not always reach the global minimum, especially if the loss function has multiple minima (local minima).
  • The choice between Batch, Stochastic, and Mini-batch Gradient Descent depends on the dataset size and available computational resources.

Conclusion

Gradient Descent is a versatile and essential optimization algorithm used across various machine learning models. By understanding its different types and how to implement it in R, one can effectively optimize models for better performance. Choosing the right type of Gradient Descent and properly tuning the learning rate are critical for achieving the best results.

