
Gradient Descent in Linear Regression

Last Updated : 27 May, 2025


Gradient descent is an optimization algorithm used in linear regression to find the best-fit line for the data. It works by gradually adjusting the line's slope and intercept to reduce the difference between actual and predicted values. Each step lowers the error a little, so the model's predictions improve as the process repeats. In this article we will look at gradient descent and its core concepts in detail.

Figure: Gradient Descent in Linear Regression

The figure shows two graphs. The left one plots house prices against size and shows the errors measured by the cost function. The right one shows how gradient descent moves downhill on the cost curve, updating the parameters step by step to minimize the error.

Why Use Gradient Descent for Linear Regression?

Linear regression finds the best-fit line for a dataset by minimizing the error between the actual and predicted values. This error is measured using a cost function, usually the Mean Squared Error (MSE). The goal is to find the model parameters, i.e. the slope m and the intercept b, that minimize this cost function.

For simple linear regression, the parameters can be computed directly with a closed-form formula such as the Normal Equation. However, for large datasets or high-dimensional data this approach becomes computationally expensive due to:

Large matrix computations.
Memory limitations.
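As a point of comparison, the direct approach mentioned above can be sketched in a few lines of NumPy. This is only a minimal illustration: the toy data (a line with intercept 4 and slope 3 plus small noise) and the variable names are assumptions, not part of the original article.

```python
import numpy as np

# Hypothetical toy data: y = 4 + 3x plus a little noise (not from the article).
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=50)
y = 4.0 + 3.0 * x + rng.normal(0, 0.1, size=50)

# Design matrix with a leading column of ones for the intercept.
X_b = np.column_stack([np.ones_like(x), x])

# Normal equation: solve (X^T X) theta = X^T y.
# np.linalg.solve avoids forming the matrix inverse explicitly.
theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
intercept, slope = theta

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

With only two parameters this is cheap. The cost of building and solving the X^T X system grows with the number of features, which is the expense the list above refers to.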

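Polynomial regression is still linear in its parameters, so a closed-form solution exists in principle, but the number of features grows quickly with the polynomial degree and the matrix computations become costly. Gradient descent is an iterative alternative that avoids these computations, which makes it useful for: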

Large datasets.
Complex, high-dimensional problems.

How Does Gradient Descent Work in Linear Regression?

Let's look at the steps involved in gradient descent for linear regression:

1. Initialize Parameters: Start with random initial values for the slope (m) and intercept (b).

2. Calculate the Cost Function: Measure the error using the Mean Squared Error (MSE):

J(m, b) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (mx_i + b) \right)^2

3. Compute the Gradient: Calculate how much the cost function changes with respect to m and b.

For slope m:

\frac{\partial J}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i (y_i - (mx_i + b))

For intercept b:

\frac{\partial J}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - (mx_i + b))

4. Update Parameters: Change m and b to reduce the error:

For slope m:

m = m - \alpha \cdot \frac{\partial J}{\partial m}

For intercept b:

b = b - \alpha \cdot \frac{\partial J}{\partial b}

Here \alpha is the learning rate that controls the size of each update.

5. Repeat: Keep repeating steps 2–4 until the error stops decreasing significantly.
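The five steps can be sketched in plain Python as follows. This is a minimal illustration under some assumptions: the function names, the toy data (points on y = 2x + 1), the fixed zero starting values (chosen instead of random ones so the run is reproducible) and the stopping tolerance are not from the original article.

```python
def mse(xs, ys, m, b):
    """Step 2: mean squared error J(m, b) of the line y = m*x + b."""
    n = len(xs)
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, alpha=0.05, max_iterations=10_000, tol=1e-12):
    """Fit y = m*x + b by batch gradient descent on the MSE cost."""
    n = len(xs)
    m, b = 0.0, 0.0                      # step 1 (fixed start, an assumption)
    prev_cost = mse(xs, ys, m, b)
    for _ in range(max_iterations):
        # Residuals y_i - (m*x_i + b) appear in both partial derivatives.
        residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
        dm = -(2 / n) * sum(x * r for x, r in zip(xs, residuals))  # step 3
        db = -(2 / n) * sum(residuals)
        m -= alpha * dm                  # step 4
        b -= alpha * db
        cost = mse(xs, ys, m, b)
        if prev_cost - cost < tol:       # step 5: error no longer decreasing
            break
        prev_cost = cost
    return m, b


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
m, b = gradient_descent(xs, ys)
print(round(m, 3), round(b, 3))
```

The learning rate of 0.05 was picked to suit this small dataset. A value that is too large makes the updates overshoot and the cost grow instead of shrink.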

Implementation of Gradient Descent in Linear Regression

Let's implement linear regression step by step. To understand how gradient descent improves the model, we will first plot the line defined by arbitrary, un-optimized parameters and observe how well it fits.

We will use the NumPy, Matplotlib and scikit-learn libraries for this.

X, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42): Generates 100 data points with one feature and some noise for realism.
X_b = np.c_[np.ones((m, 1)), X]: Adds a column of ones to X to account for the intercept term in the model. In the code, m is the number of samples, not the slope.
theta = np.array([[2.0], [3.0]]): Initializes the model parameters (intercept and slope) with starting values.

Python

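import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression

# Generate 100 noisy samples with a single feature
X, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)
y = y.reshape(-1, 1)  # column vector, shape (100, 1)
m = X.shape[0]        # number of samples

# Prepend a column of ones so theta[0] acts as the intercept
X_b = np.c_[np.ones((m, 1)), X]

# Arbitrary starting parameters: [intercept, slope]
theta = np.array([[2.0], [3.0]])

# Plot the data and the line given by the un-optimized parameters
plt.figure(figsize=(10, 5))
plt.scatter(X, y, color="blue", label="Actual Data")
plt.plot(X, X_b.dot(theta), color="green", label="Initial Line (No GD)")
plt.xlabel("Feature")
plt.ylabel("Target")
plt.title("Linear Regression Without Gradient Descent")
plt.legend()
plt.show()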

Output:

Figure: Linear Regression without Gradient Descent

Here the line does not fit the data well, so the model's predictions are inaccurate. This is because the initial parameters were chosen arbitrarily and have not been optimized.
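The poor fit can also be expressed as a number rather than only read off a plot. The sketch below computes the MSE for the same starting parameters and compares it with a least-squares reference fit. To stay free of extra dependencies it uses stand-in data generated with NumPy (a true slope of 40 is an arbitrary assumption) rather than the make_regression data, so the printed values will differ from the article's dataset.

```python
import numpy as np

# Hypothetical stand-in data; the slope of 40 is an arbitrary assumption.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 1))
y = 40.0 * X + rng.normal(scale=15.0, size=(100, 1))

X_b = np.c_[np.ones((X.shape[0], 1)), X]   # prepend the intercept column
theta_initial = np.array([[2.0], [3.0]])   # same starting values as the article


def mse(X_b, y, theta):
    """Mean squared error of the predictions X_b @ theta."""
    residuals = y - X_b @ theta
    return float(np.mean(residuals ** 2))


# Least-squares parameters, used only as a reference point.
theta_best, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print("MSE with initial parameters:", mse(X_b, y, theta_initial))
print("MSE with least-squares fit: ", mse(X_b, y, theta_best))
```

The gap between the two values is the error that an optimization procedure still has to remove.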

Now we will apply gradient descent to improve the model and optimize these parameters.

learning_rate = 0.1, n_iterations = 100: Sets the learning rate and the number of iterations for which gradient descent runs.
gradients = (2 / m) * X_b.T.dot(y_pred - y): Computes the gradient of the MSE cost with respect to both parameters at once. The leading minus sign from the formulas above is absorbed by writing the residual as y_pred - y.
theta -= learning_rate * gradients: Updates the parameters by moving opposite to the gradient direction.
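One way to gain confidence in the vectorized gradient expression is to compare it with a finite-difference approximation of the cost. The following is a hedged sketch: the small synthetic dataset, the cost helper and the step size eps are assumptions made for illustration.

```python
import numpy as np

# Hypothetical small dataset, used only to exercise the formula.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1))
y = 3.0 * X + 1.0 + rng.normal(scale=0.5, size=(20, 1))
m = X.shape[0]                       # number of samples
X_b = np.c_[np.ones((m, 1)), X]
theta = np.array([[2.0], [3.0]])


def cost(theta):
    """MSE cost for parameters theta = [intercept, slope]."""
    return float(np.mean((X_b @ theta - y) ** 2))


# Analytic gradient, same expression as in the article.
analytic = (2 / m) * X_b.T @ (X_b @ theta - y)

# Central finite differences, one parameter at a time.
eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[j, 0] = eps
    numeric[j, 0] = (cost(theta + step) - cost(theta - step)) / (2 * eps)

print("analytic:", analytic.ravel())
print("numeric: ", numeric.ravel())
print("match:", np.allclose(analytic, numeric, atol=1e-5))
```

Because the cost is quadratic in theta, the central difference agrees with the analytic gradient up to floating-point rounding.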

Python

learning_rate = 0.1
n_iterations = 100

for _ in range(n_iterations):
    y_pred = X_b.dot(theta)                      # current predictions
    gradients = (2 / m) * X_b.T.dot(y_pred - y)  # gradient of the MSE cost
    theta -= learning_rate * gradients           # step against the gradient

# Plot the data and the line given by the optimized parameters
plt.figure(figsize=(10, 5))
plt.scatter(X, y, color="blue", label="Actual Data")
plt.plot(X, X_b.dot(theta), color="red", label="Optimized Line (With GD)")
plt.xlabel("Feature")
plt.ylabel("Target")
plt.title("Linear Regression With Gradient Descent")
plt.legend()
plt.show()

Output:

Figure: Linear Regression with Gradient Descent

The plot shows the line after gradient descent. By updating the parameters step by step, the model has learned a line that minimizes the difference between predicted and actual values.
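The step-by-step improvement can be made visible by recording the cost at every iteration and comparing the final parameters with a least-squares reference. This is a sketch under assumptions: it uses NumPy-generated stand-in data (true slope of 40 chosen arbitrarily) instead of make_regression, and the cost_history list is not part of the original code.

```python
import numpy as np

# Hypothetical stand-in data; the article uses make_regression instead.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 1))
y = 40.0 * X + rng.normal(scale=15.0, size=(100, 1))
m = X.shape[0]
X_b = np.c_[np.ones((m, 1)), X]

theta = np.array([[2.0], [3.0]])   # same starting values as the article
learning_rate = 0.1
n_iterations = 100

cost_history = []                  # MSE before each update
for _ in range(n_iterations):
    y_pred = X_b @ theta
    cost_history.append(float(np.mean((y_pred - y) ** 2)))
    gradients = (2 / m) * X_b.T @ (y_pred - y)
    theta -= learning_rate * gradients

# Reference solution from least squares, for comparison only.
theta_closed, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print("first cost:", cost_history[0], "last cost:", cost_history[-1])
print("max |theta_gd - theta_closed|:",
      float(np.max(np.abs(theta - theta_closed))))
```

Plotting cost_history against the iteration number gives a curve that falls steeply at first and then flattens, which is a common way to check that the learning rate is reasonable.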

As datasets grow larger and models become more complex, gradient descent will continue to help in building accurate and efficient machine learning systems.


mohit gupta_omg :)
