Linear Regression in Machine learning

Last Updated : 09 May, 2025

Linear regression is a supervised machine-learning algorithm that learns from labelled datasets and fits an optimal linear function to the data points, which can then be used to make predictions on new data.
It assumes that there is a linear relationship between the input and output, meaning the output changes at a constant rate as the input changes. This relationship is represented by a straight line.

For example, suppose we want to predict a student’s exam score based on how many hours they studied. We observe that as students study more hours, their scores go up. In this example:

  • Independent variable (input): Hours studied because it’s the factor we control or observe.
  • Dependent variable (output): Exam score because it depends on how many hours were studied.

We use the independent variable to predict the dependent variable.
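
As a minimal sketch of this idea, here is the exam-score example fit with scikit-learn; the hours/score values below are made up purely for illustration:

Python
import numpy as np
from sklearn.linear_model import LinearRegression

hours = np.array([[1], [2], [3], [4], [5]])   # independent variable (input)
scores = np.array([52, 58, 65, 70, 78])       # dependent variable (output)

model = LinearRegression().fit(hours, scores)
print(model.predict([[6]]))  # predicted score for 6 hours of study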

Why Linear Regression is Important?

Here’s why linear regression is important:

  • Simplicity and Interpretability: It’s easy to understand and interpret, making it a starting point for learning about machine learning.
  • Predictive Ability: Helps predict future outcomes based on past data, making it useful in various fields like finance, healthcare and marketing.
  • Basis for Other Models: Many advanced algorithms, like logistic regression or neural networks, build on the concepts of linear regression.
  • Efficiency: It’s computationally efficient and works well for problems with a linear relationship.
  • Widely Used: It’s one of the most widely used techniques in both statistics and machine learning for regression tasks.
  • Analysis: It provides insights into relationships between variables (e.g., how much one variable influences another).

Best Fit Line in Linear Regression

In linear regression, the best-fit line is the straight line that most accurately represents the relationship between the independent variable (input) and the dependent variable (output). It is the line that minimizes the difference between the actual data points and the predicted values from the model.

1. Goal of the Best-Fit Line

The goal of linear regression is to find a straight line that minimizes the error (the difference) between the observed data points and the predicted values. This line helps us predict the dependent variable for new, unseen data.

[Figure: Linear Regression — best-fit line through the data points]

Here Y is called the dependent or target variable and X is called the independent variable, also known as the predictor of Y. There are many types of functions that can be used for regression; a linear function is the simplest. Here, X may be a single feature or multiple features representing the problem.

2. Equation of the Best-Fit Line

For simple linear regression (with one independent variable), the best-fit line is represented by the equation

$$y = mx + b$$

Where:

  • y is the predicted value (dependent variable)
  • x is the input (independent variable)
  • m is the slope of the line (how much y changes when x changes)
  • b is the intercept (the value of y when x = 0)

The best-fit line will be the one that optimizes the values of m (slope) and b (intercept) so that the predicted y values are as close as possible to the actual data points.

3. Minimizing the Error: The Least Squares Method

To find the best-fit line, we use a method called Least Squares. The idea behind this method is to minimize the sum of squared differences between the actual values (data points) and the predicted values from the line. These differences are called residuals.

The formula for residuals is:

$$\text{Residual} = y_i - \hat{y}_i$$

Where:

  • $y_i$ is the actual observed value
  • $\hat{y}_i$ is the predicted value from the line for that $x_i$

The least squares method minimizes the sum of the squared residuals:

$$\text{Sum of squared errors (SSE)} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

This method ensures that the line best represents the data, where the sum of the squared differences between the predicted values and actual values is as small as possible.
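
The closed-form least-squares estimates for simple linear regression follow directly from this criterion. Below is a small numpy sketch (the data points are made up for illustration) that computes the slope, intercept, and the SSE the line minimizes:

Python
import numpy as np

# Toy data (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Closed-form least-squares estimates: m = cov(x, y) / var(x), b = y_bar - m * x_bar
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

residuals = y - (m * x + b)
sse = np.sum(residuals ** 2)   # the quantity the best-fit line minimizes
print(m, b, sse)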

4. Interpretation of the Best-Fit Line

  • Slope (m): The slope of the best-fit line indicates how much the dependent variable (y) changes with each unit change in the independent variable (x). For example, if the slope is 5, it means that for every 1-unit increase in x, the value of y increases by 5 units.
  • Intercept (b): The intercept represents the predicted value of y when x = 0. It’s the point where the line crosses the y-axis.

In linear regression, certain assumptions are made to ensure the reliability of the model’s results; these are covered below.

Limitations

  • Assumes Linearity: The method assumes the relationship between the variables is linear. If the relationship is non-linear, linear regression might not work well.
  • Sensitivity to Outliers: Outliers can significantly affect the slope and intercept, skewing the best-fit line.

Hypothesis function in Linear Regression

In linear regression, the hypothesis function is the equation used to make predictions about the dependent variable based on the independent variables. It represents the relationship between the input features and the target output.

For a simple case with one independent variable, the hypothesis function is:

$$h(x) = \beta_0 + \beta_1 x$$

Where:

  • $h(x)$ (or $\hat{y}$) is the predicted value of the dependent variable (y).
  • x is the independent variable.
  • $\beta_0$ is the intercept, representing the value of y when x is 0.
  • $\beta_1$ is the slope, indicating how much y changes for each unit change in x.

For multiple linear regression (with more than one independent variable), the hypothesis function expands to:

$$h(x_1, x_2, \dots, x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$

Where:

  • $x_1, x_2, \dots, x_k$ are the independent variables.
  • $\beta_0$ is the intercept.
  • $\beta_1, \beta_2, \dots, \beta_k$ are the coefficients, representing the influence of each respective independent variable on the predicted output.
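
In code, this hypothesis is just a dot product plus an intercept. A small sketch with hypothetical coefficients and inputs:

Python
import numpy as np

beta0 = 1.5                          # intercept (hypothetical value)
beta = np.array([0.8, -0.2, 2.0])    # beta_1 ... beta_k (hypothetical values)
x = np.array([3.0, 10.0, 0.5])       # x_1 ... x_k (one input row)

# h(x) = beta_0 + beta_1*x_1 + ... + beta_k*x_k
h_x = beta0 + beta @ x
print(h_x)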

Assumptions of the Linear Regression

1. Linearity: The relationship between inputs (X) and the output (Y) is a straight line.

[Figure: Linearity]

2. Independence of Errors: The errors in predictions should not affect each other.

3. Constant Variance (Homoscedasticity): The errors should have equal spread across all values of the input. If the spread changes (like fans out or shrinks), it’s called heteroscedasticity and it’s a problem for the model.

[Figure: Homoscedasticity]

4. Normality of Errors: The errors should follow a normal (bell-shaped) distribution.

5. No Multicollinearity (for multiple regression): Input variables shouldn’t be too closely related to each other.

6. No Autocorrelation: Errors shouldn’t show repeating patterns, especially in time-based data.

7. Additivity: The total effect on Y is just the sum of the effects from each X, with no mixing or interaction between them.

To understand Multicollinearity in detail refer to article: Multicollinearity.
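
One common way to screen for multicollinearity is the variance inflation factor (VIF) from statsmodels; a VIF above roughly 5–10 for a column is a common warning sign. A sketch on a hypothetical design matrix with a deliberately near-collinear column:

Python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical design matrix with three feature columns
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = 0.9 * X[:, 0] + rng.normal(scale=0.1, size=100)  # nearly collinear with column 0

# High VIF flags columns that are well explained by the other columns
for i in range(X.shape[1]):
    print(f"feature {i}: VIF = {variance_inflation_factor(X, i):.2f}")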

Types of Linear Regression

When there is only one independent feature it is known as Simple Linear Regression or Univariate Linear Regression, and when there is more than one feature it is known as Multiple Linear Regression or Multivariate Regression.

1. Simple Linear Regression

Simple linear regression is used when we want to predict a target value (dependent variable) using only one input feature (independent variable). It assumes a straight-line relationship between the two.

Formula:

$$\hat{y} = \theta_0 + \theta_1 x$$

Where:

  • $\hat{y}$ is the predicted value
  • x is the input (independent variable)
  • $\theta_0$ is the intercept (value of $\hat{y}$ when x = 0)
  • $\theta_1$ is the slope or coefficient (how much $\hat{y}$ changes with one unit of x)

Example:

Predicting a person’s salary (y) based on their years of experience (x).
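
For instance, with hypothetical values $\theta_0 = 30000$ (base salary) and $\theta_1 = 5000$ (increase per year of experience), a person with 4 years of experience would be predicted to earn:

$$\hat{y} = 30000 + 5000 \times 4 = 50000$$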

2. Multiple Linear Regression

Multiple linear regression involves more than one independent variable and one dependent variable. The equation for multiple linear regression is:

$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

where:

  • $\hat{y}$ is the predicted value
  • $x_1, x_2, \dots, x_n$ are the independent variables
  • $\theta_1, \theta_2, \dots, \theta_n$ are the coefficients (weights) corresponding to each predictor.
  • $\theta_0$ is the intercept.

The goal of the algorithm is to find the best-fit line equation that can predict the values based on the independent variables.

In regression, a set of records is available with X and Y values, and these values are used to learn a function that maps X to Y. This learned function can then be used to predict Y for an unseen X. Since the target in regression is continuous, the function must predict a continuous value of Y given X as the independent features.

Use Case of Multiple Linear Regression

Multiple linear regression allows us to analyze the relationship between multiple independent variables and a single dependent variable. Here are some use cases:

  • Real Estate Pricing: In real estate MLR is used to predict property prices based on multiple factors such as location, size, number of bedrooms, etc. This helps buyers and sellers understand market trends and set competitive prices.
  • Financial Forecasting: Financial analysts use MLR to predict stock prices or economic indicators based on multiple influencing factors such as interest rates, inflation rates and market trends. This enables better investment strategies and risk management.
  • Agricultural Yield Prediction: Farmers can use MLR to estimate crop yields based on several variables like rainfall, temperature, soil quality and fertilizer usage. This information helps in planning agricultural practices for optimal productivity.
  • E-commerce Sales Analysis: An e-commerce company can utilize MLR to assess how various factors such as product price, marketing promotions and seasonal trends impact sales.

Now that we have covered linear regression, its assumptions and its types, we will look at how to build a linear regression model.

Cost function for Linear Regression

As discussed earlier, the best-fit line is rarely obtained exactly in real-life cases, so we need to quantify the errors that affect it. A function of the difference between the predicted value $\hat{Y}$ and the true value $Y$ is called the cost function or the loss function.

In Linear Regression, the Mean Squared Error (MSE) cost function is employed, which calculates the average of the squared errors between the predicted values $\hat{y}_i$ and the actual values $y_i$. The purpose is to determine the optimal values for the intercept $\theta_1$ and the coefficient of the input feature $\theta_2$, providing the best-fit line for the given data points. The linear equation expressing this relationship is $\hat{y}_i = \theta_1 + \theta_2 x_i$.

MSE function can be calculated as:

$$\text{Cost function}(J) = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$$

Utilizing the MSE function, the iterative process of gradient descent is applied to update the values of $\theta_1$ and $\theta_2$. This ensures that the MSE value converges to the global minimum, signifying the most accurate fit of the linear regression line to the dataset.

This process involves continuously adjusting the parameters $\theta_1$ and $\theta_2$ based on the gradients calculated from the MSE. The final result is a linear regression line that minimizes the overall squared differences between the predicted and actual values, providing an optimal representation of the underlying relationship in the data.
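
A tiny numpy sketch of this MSE cost, with a toy dataset chosen so that the true parameters give zero cost:

Python
import numpy as np

def mse_cost(theta1, theta2, x, y):
    """J = (1/n) * sum((y_hat - y)^2), where y_hat = theta1 + theta2 * x."""
    y_hat = theta1 + theta2 * x
    return np.mean((y_hat - y) ** 2)

# Toy check: the data lie exactly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0])
y = 1 + 2 * x
print(mse_cost(1.0, 2.0, x, y))  # 0.0 at the true parameters
print(mse_cost(0.0, 2.0, x, y))  # > 0: worse parameters give higher cost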

Now that we have defined the loss function, we need to optimize the model to mitigate this error, which is done through gradient descent.

Gradient Descent for Linear Regression

A linear regression model can be trained using the optimization algorithm gradient descent by iteratively modifying the model’s parameters to reduce the mean squared error (MSE) on a training dataset. To update the $\theta_1$ and $\theta_2$ values in order to reduce the cost function (minimizing the MSE) and achieve the best-fit line, the model uses gradient descent. The idea is to start with random $\theta_1$ and $\theta_2$ values and then iteratively update them, reaching the minimum cost.

A gradient is simply a derivative: it describes how the output of a function changes in response to a small change in its inputs.

Let’s differentiate the cost function (J) with respect to $\theta_1$:

$$\begin{aligned} J'_{\theta_1} &= \frac{\partial J(\theta_1,\theta_2)}{\partial \theta_1} \\ &= \frac{\partial}{\partial \theta_1} \left[\frac{1}{n} \sum_{i=1}^{n}(\hat{y}_i-y_i)^2 \right] \\ &= \frac{1}{n}\sum_{i=1}^{n}2(\hat{y}_i-y_i) \frac{\partial}{\partial \theta_1}(\theta_1 + \theta_2 x_i - y_i) \\ &= \frac{1}{n}\sum_{i=1}^{n}2(\hat{y}_i-y_i)(1+0-0) \\ &= \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i) \end{aligned}$$

Let’s differentiate the cost function (J) with respect to $\theta_2$:

$$\begin{aligned} J'_{\theta_2} &= \frac{\partial J(\theta_1,\theta_2)}{\partial \theta_2} \\ &= \frac{\partial}{\partial \theta_2} \left[\frac{1}{n} \sum_{i=1}^{n}(\hat{y}_i-y_i)^2 \right] \\ &= \frac{1}{n}\sum_{i=1}^{n}2(\hat{y}_i-y_i) \frac{\partial}{\partial \theta_2}(\theta_1 + \theta_2 x_i - y_i) \\ &= \frac{1}{n}\sum_{i=1}^{n}2(\hat{y}_i-y_i)(0+x_i-0) \\ &= \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)\cdot x_i \end{aligned}$$

The objective of linear regression is to find the coefficients of a linear equation that best fits the training data. The coefficients are adjusted by moving in the direction of the negative gradient of the Mean Squared Error with respect to the coefficients. If $\alpha$ is the learning rate, the intercept and the coefficient of X are updated as follows.

[Figure: Gradient Descent]

$$\begin{aligned} \theta_1 &= \theta_1 - \alpha \, J'_{\theta_1} = \theta_1 - \alpha \left( \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)\right) \\ \theta_2 &= \theta_2 - \alpha \, J'_{\theta_2} = \theta_2 - \alpha \left(\frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)\cdot x_i\right) \end{aligned}$$
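
A plain-numpy sketch of these update rules on made-up data (the learning rate and iteration count are illustrative choices, not prescriptions):

Python
import numpy as np

def gradient_descent(x, y, alpha=0.05, iters=5000):
    """Iteratively apply the two update rules derived above."""
    theta1, theta2 = 0.0, 0.0          # start from arbitrary values
    n = len(x)
    for _ in range(iters):
        y_hat = theta1 + theta2 * x
        # Gradients of the MSE cost with respect to each parameter
        d_theta1 = (2 / n) * np.sum(y_hat - y)
        d_theta2 = (2 / n) * np.sum((y_hat - y) * x)
        theta1 -= alpha * d_theta1
        theta2 -= alpha * d_theta2
    return theta1, theta2

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3 + 2 * x                          # data generated by y = 3 + 2x
print(gradient_descent(x, y))          # approaches (3.0, 2.0)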

After optimizing our model, we evaluate the model’s accuracy to see how well it will perform in real-world scenarios.

Evaluation Metrics for Linear Regression

A variety of evaluation measures can be used to determine the strength of any linear regression model. These assessment metrics give an indication of how well the model reproduces the observed outputs.

The most common measurements are:

1. Mean Square Error (MSE)

Mean Squared Error (MSE) is an evaluation metric that calculates the average of the squared differences between the actual and predicted values for all the data points. The difference is squared to ensure that negative and positive differences don’t cancel each other out.

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2$$

Here,

  • n is the number of data points.
  • $y_i$ is the actual or observed value for the i-th data point.
  • $\hat{y}_i$ is the predicted value for the i-th data point.

MSE is a way to quantify the accuracy of a model’s predictions. MSE is sensitive to outliers as large errors contribute significantly to the overall score.

2. Mean Absolute Error (MAE)

Mean Absolute Error is an evaluation metric used to calculate the accuracy of a regression model. MAE measures the average absolute difference between the predicted values and actual values.

Mathematically, MAE is expressed as:

$$MAE = \frac{1}{n} \sum_{i=1}^{n}|Y_i - \hat{Y}_i|$$

Here,

  • n is the number of observations
  • $Y_i$ represents the actual values.
  • $\hat{Y}_i$ represents the predicted values.

A lower MAE value indicates better model performance. MAE is less sensitive to outliers than MSE because it uses absolute differences rather than squared ones.

3. Root Mean Squared Error (RMSE)

The Root Mean Squared Error is the square root of the mean of the squared residuals. It describes how closely the observed data points match the predicted values, i.e., the model’s absolute fit to the data.
In mathematical notation, it can be expressed as:

$$RMSE = \sqrt{\frac{RSS}{n}} = \sqrt{\frac{\sum_{i=1}^{n}(y^{actual}_{i} - y_{i}^{predicted})^2}{n}}$$

To obtain an unbiased estimate, the sum of squared residuals is divided by the degrees of freedom (n − 2 for simple linear regression) rather than by the total number of data points. The resulting figure is referred to as the Residual Standard Error (RSE).

In mathematical notation, it can be expressed as:

$$RSE = \sqrt{\frac{RSS}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(y^{actual}_{i} - y_{i}^{predicted})^2}{n-2}}$$

RMSE is not as comparable across problems as R-squared: its value depends on the units of the variables (it is not a normalized measure), so it fluctuates when the units of the variables vary.

4. Coefficient of Determination (R-squared)

R-Squared is a statistic that indicates how much variation the developed model can explain or capture. It is always in the range of 0 to 1. In general, the better the model matches the data, the greater the R-squared number.
In mathematical notation, it can be expressed as:

$$R^{2} = 1 - \frac{RSS}{TSS}$$

  • Residual sum of Squares (RSS): The sum of squares of the residual for each data point in the plot or data is known as the residual sum of squares, or RSS. It is a measurement of the difference between the output that was observed and what was anticipated.

$$RSS = \sum_{i=1}^{n}(y_{i} - b_{0} - b_{1}x_{i})^{2}$$

  • Total Sum of Squares (TSS): The sum of the squared deviations of the data points from the mean of the response variable is known as the total sum of squares, or TSS.

$$TSS = \sum_{i=1}^{n}(y_{i} - \overline{y})^2$$

The R² metric is a measure of the proportion of variance in the dependent variable that is explained by the independent variables in the model.

5. Adjusted R-Squared Error

Adjusted R² measures the proportion of variance in the dependent variable that is explained by the independent variables in a regression model. Adjusted R² accounts for the number of predictors in the model and penalizes the model for including irrelevant predictors that don’t contribute significantly to explaining the variance in the dependent variable.

Mathematically, adjusted R2 is expressed as:

$$\text{Adjusted } R^2 = 1 - \frac{(1-R^2)(n-1)}{n-k-1}$$

Here,

  • n is the number of observations
  • k is the number of predictors in the model
  • R² is the coefficient of determination

Adjusted R-square helps to prevent overfitting. It penalizes the model with additional predictors that do not contribute significantly to explain the variance in the dependent variable.
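
A compact sketch computing all five metrics on hypothetical actual/predicted values, using scikit-learn where it provides the metric and numpy otherwise (adjusted R² is computed manually from its formula above):

Python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical actual vs. predicted values from some fitted model
y_true = np.array([3.0, 5.0, 7.5, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 7.1, 9.4, 10.6])

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

n, k = len(y_true), 1                       # k = number of predictors (assumed here)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(mse, mae, rmse, r2, adj_r2)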

While evaluation metrics help us measure the performance of a model, regularization helps in improving that performance by addressing overfitting and enhancing generalization.

Regularization Techniques for Linear Models

1. Lasso Regression (L1 Regularization)

Lasso Regression is a technique used for regularizing a linear regression model; it adds a penalty term to the linear regression objective function to prevent overfitting.

The objective function after applying lasso regression is:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m}(\hat{y}_i - y_i)^2 + \lambda \sum_{j=1}^{n}|\theta_j|$$

  • the first term is the least squares loss, representing the squared difference between predicted and actual values.
  • the second term is the L1 regularization term, it penalizes the sum of absolute values of the regression coefficient θj.

2. Ridge Regression (L2 Regularization)

Ridge regression is a linear regression technique that adds a regularization term to the standard linear objective. Again, the goal is to prevent overfitting by penalizing large coefficients in the linear regression equation. It is useful when the dataset has multicollinearity, where predictor variables are highly correlated.

The objective function after applying ridge regression is:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m}(\hat{y}_i - y_i)^2 + \lambda \sum_{j=1}^{n}\theta_{j}^{2}$$

  • the first term is the least squares loss, representing the squared difference between predicted and actual values.
  • the second term is the L2 regularization term; it penalizes the sum of the squares of the regression coefficients θj.

3. Elastic Net Regression

Elastic Net Regression is a hybrid regularization technique that combines the power of both L1 and L2 regularization in linear regression objective.

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m}(\hat{y}_i - y_i)^2 + \alpha \lambda \sum_{j=1}^{n}|\theta_j| + \frac{1}{2}(1-\alpha)\lambda \sum_{j=1}^{n}\theta_{j}^{2}$$

  • the first term is least square loss.
  • the second term is the L1 regularization term and the third is the L2 (ridge) regularization term.
  • λ is the overall regularization strength.
  • α controls the mix between L1 and L2 regularization.
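
A brief sketch comparing the three regularizers with scikit-learn on synthetic data; note that scikit-learn calls the regularization strength alpha (the λ above) and uses l1_ratio for the L1/L2 mix:

Python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

# Synthetic data: only features 0 and 3 actually matter
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=100)

for model in (Lasso(alpha=0.1), Ridge(alpha=1.0), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    # Lasso tends to drive irrelevant coefficients exactly to zero;
    # Ridge shrinks them smoothly; ElasticNet mixes both behaviors.
    print(type(model).__name__, np.round(model.coef_, 2))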

Now that we have seen how a linear regression model is built and optimized, we will implement it in Python.

Python Implementation of Linear Regression

Import the necessary libraries:

Python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

Load the dataset and separate the input and target variables.

Here is the link for dataset: Dataset Link

Python
url = 'https://media.geeksforgeeks.org/wp-content/uploads/20240320114716/data_for_lr.csv'
data = pd.read_csv(url)

# Drop the missing values
data = data.dropna()

# training dataset and labels
train_input = np.array(data.x[0:500]).reshape(500, 1)
train_output = np.array(data.y[0:500]).reshape(500, 1)

# valid dataset and labels
test_input = np.array(data.x[500:700]).reshape(199, 1)
test_output = np.array(data.y[500:700]).reshape(199, 1)

Build the Linear Regression Model and Plot the regression line

Steps:

  • In forward propagation, the linear regression function Y = mx + c is applied, initially assigning random values to the parameters (m and c).
  • Then we write the cost function, i.e., the mean squared error between the predictions and the training labels, and use backward propagation to compute the gradients and update the parameters.
Python
class LinearRegression:
    def __init__(self):
        self.parameters = {}

    def forward_propagation(self, train_input):
        m = self.parameters['m']
        c = self.parameters['c']
        predictions = np.multiply(m, train_input) + c
        return predictions

    def cost_function(self, predictions, train_output):
        cost = np.mean((train_output - predictions) ** 2)
        return cost

    def backward_propagation(self, train_input, train_output, predictions):
        derivatives = {}
        df = (predictions - train_output)
        # dm = (2/n) * sum((predictions - actual) * input) = 2 * mean(input * df)
        dm = 2 * np.mean(np.multiply(train_input, df))
        # dc = (2/n) * sum(predictions - actual) = 2 * mean(df)
        dc = 2 * np.mean(df)
        derivatives['dm'] = dm
        derivatives['dc'] = dc
        return derivatives

    def update_parameters(self, derivatives, learning_rate):
        self.parameters['m'] = self.parameters['m'] - learning_rate * derivatives['dm']
        self.parameters['c'] = self.parameters['c'] - learning_rate * derivatives['dc']

    def train(self, train_input, train_output, learning_rate, iters):
        # Initialize random parameters
        self.parameters['m'] = np.random.uniform(0, 1) * -1
        self.parameters['c'] = np.random.uniform(0, 1) * -1

        # Initialize loss
        self.loss = []

        # Initialize figure and axis for animation
        fig, ax = plt.subplots()
        x_vals = np.linspace(min(train_input), max(train_input), 100)
        line, = ax.plot(x_vals, self.parameters['m'] * x_vals + self.parameters['c'],
                        color='red', label='Regression Line')
        ax.scatter(train_input, train_output, marker='o',
                   color='green', label='Training Data')

        # Set y-axis limits to exclude negative values
        ax.set_ylim(0, max(train_output) + 1)

        def update(frame):
            # Forward propagation
            predictions = self.forward_propagation(train_input)

            # Cost function
            cost = self.cost_function(predictions, train_output)

            # Back propagation
            derivatives = self.backward_propagation(
                train_input, train_output, predictions)

            # Update parameters
            self.update_parameters(derivatives, learning_rate)

            # Update the regression line
            line.set_ydata(self.parameters['m'] * x_vals + self.parameters['c'])

            # Append loss and print
            self.loss.append(cost)
            print("Iteration = {}, Loss = {}".format(frame + 1, cost))
            return line,

        # Create animation
        ani = FuncAnimation(fig, update, frames=iters, interval=200, blit=True)

        # Save the animation as a GIF file
        ani.save('linear_regression_A.gif', writer='ffmpeg')

        plt.xlabel('Input')
        plt.ylabel('Output')
        plt.title('Linear Regression')
        plt.legend()
        plt.show()

        return self.parameters, self.loss

Train the model and make the final prediction

Python
# Example usage
linear_reg = LinearRegression()
parameters, loss = linear_reg.train(train_input, train_output, 0.0001, 20)

Output:

Iteration = 1, Loss = 9130.407560462196
Iteration = 1, Loss = 1107.1996742908998
Iteration = 1, Loss = 140.31580932842422
Iteration = 1, Loss = 23.795780526084116
Iteration = 2, Loss = 9.753848205147605
Iteration = 3, Loss = 8.061641745006835
Iteration = 4, Loss = 7.8577116490914864
Iteration = 5, Loss = 7.8331350515579015
Iteration = 6, Loss = 7.830172502503967
Iteration = 7, Loss = 7.829814681591015
Iteration = 8, Loss = 7.829770758846183
Iteration = 9, Loss = 7.829764664327399
Iteration = 10, Loss = 7.829763128602258
Iteration = 11, Loss = 7.829762142342088
Iteration = 12, Loss = 7.829761222379141
Iteration = 13, Loss = 7.829760310486438
Iteration = 14, Loss = 7.829759399646989
Iteration = 15, Loss = 7.829758489015161
Iteration = 16, Loss = 7.829757578489033
Iteration = 17, Loss = 7.829756668056319
Iteration = 18, Loss = 7.829755757715535
Iteration = 19, Loss = 7.829754847466484
Iteration = 20, Loss = 7.829753937309139

[Figure: Animated regression line being fitted to the training data]
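
The heading above promises a final prediction; a natural next step (a sketch, using the parameters returned by train() and the test split created earlier) is to evaluate the learned line on the held-out data:

Python
# Predict on the held-out split with the learned slope m and intercept c
test_predictions = parameters['m'] * test_input + parameters['c']
test_mse = np.mean((test_output - test_predictions) ** 2)
print("Test MSE =", test_mse)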

The linear regression line provides valuable insights into the relationship between the two variables. It represents the best-fitting line that captures the overall trend of how a dependent variable (Y) changes in response to variations in an independent variable (X).

  • Positive Linear Regression Line: A positive linear regression line indicates a direct relationship between the independent variable (X) and the dependent variable (Y). This means that as the value of X increases, the value of Y also increases. The slope of a positive linear regression line is positive, meaning that the line slants upward from left to right.
  • Negative Linear Regression Line: A negative linear regression line indicates an inverse relationship between the independent variable (X) and the dependent variable (Y). This means that as the value of X increases, the value of Y decreases. The slope of a negative linear regression line is negative, meaning that the line slants downward from left to right.

Applications of Linear Regression

Linear regression is used in many different fields including finance, economics and psychology to understand and predict the behavior of a particular variable.

For example, linear regression is widely used in finance to analyze relationships and make predictions. It can model how a company’s earnings per share (EPS) influence its stock price. If the model shows that a $1 increase in EPS results in a $15 rise in stock price, investors gain insights into the company’s valuation. Similarly, linear regression can forecast currency values by analyzing historical exchange rates and economic indicators, helping financial professionals make informed decisions and manage risks effectively.

Also read – Linear Regression – In Simple Words, with real-life Examples

Advantages and Disadvantages of Linear Regression

Advantages of Linear Regression

  • Linear regression is a relatively simple algorithm, making it easy to understand and implement. The coefficients of the linear regression model can be interpreted as the change in the dependent variable for a one-unit change in the independent variable, providing insights into the relationships between variables.
  • Linear regression is computationally efficient and can handle large datasets effectively. It can be trained quickly on large datasets, making it suitable for real-time applications.
  • Linear regression’s simplicity makes problems with the fit easy to diagnose: residual plots and influence measures quickly reveal issues such as outliers or non-linearity.
  • Linear regression often serves as a good baseline model for comparison with more complex machine learning algorithms.
  • Linear regression is a well-established algorithm with a rich history and is widely available in various machine learning libraries and software packages.

Disadvantages of Linear Regression

  • Linear regression assumes a linear relationship between the dependent and independent variables. If the relationship is not linear, the model may not perform well.
  • Linear regression is sensitive to multicollinearity, which occurs when there is a high correlation between independent variables. Multicollinearity can inflate the variance of the coefficients and lead to unstable model predictions.
  • Linear regression assumes that the features are already in a suitable form for the model. Feature engineering may be required to transform features into a format that can be effectively used by the model.
  • Linear regression is susceptible to both overfitting and underfitting. Overfitting occurs when the model learns the training data too well and fails to generalize to unseen data. Underfitting occurs when the model is too simple to capture the underlying relationships in the data.
  • Linear regression provides limited explanatory power for complex relationships between variables. More advanced machine learning techniques may be necessary for deeper insights.


Next Article
Ordinary Least Squares (OLS) using statsmodels
author
mohit gupta_omg :)
Improve
Article Tags :
  • AI-ML-DS
  • Machine Learning
  • AI-ML-DS With Python
  • ML-Regression
Practice Tags :
  • Machine Learning

Similar Reads

  • Maths for Machine Learning
    Mathematics is the foundation of machine learning. Math concepts plays a crucial role in understanding how models learn from data and optimizing their performance. Before diving into machine learning algorithms, it's important to familiarize yourself with foundational topics, like Statistics, Probab
    5 min read
  • Linear Algebra and Matrix

    • Matrices
      Matrices are key concepts in mathematics, widely used in solving equations and problems in fields like physics and computer science. A matrix is simply a grid of numbers, and a determinant is a value calculated from a square matrix. Example: [Tex]\begin{bmatrix} 6 & 9 \\ 5 & -4 \\ \end{bmatr
      3 min read

    • Scalar and Vector
      Scalar and Vector Quantities are used to describe the motion of an object. Scalar Quantities are defined as physical quantities that have magnitude or size only. For example, distance, speed, mass, density, etc. However, vector quantities are those physical quantities that have both magnitude and di
      8 min read

    • Add Two Matrices - Python
      The task of adding two matrices in Python involves combining corresponding elements from two given matrices to produce a new matrix. Each element in the resulting matrix is obtained by adding the values at the same position in the input matrices. For example, if two 2x2 matrices are given as: The su
      3 min read

    • Python Program to Multiply Two Matrices
      Given two matrices, we will have to create a program to multiply two matrices in Python. Example: Python Matrix Multiplication of Two-Dimension [GFGTABS] Python matrix_a = [[1, 2], [3, 4]] matrix_b = [[5, 6], [7, 8]] result = [[0, 0], [0, 0]] for i in range(2): for j in range(2): result[i][j] = (mat
      5 min read

    • Vector Operations
      Vectors are fundamental quantities in physics and mathematics, that have both magnitude and direction. So performing mathematical operations on them directly is not possible. So we have special operations that work only with vector quantities and hence the name, vector operations. Thus, It is essent
      8 min read

    • Product of Vectors
      Vector operations are used almost everywhere in the field of physics. Many times these operations include addition, subtraction, and multiplication. Addition and subtraction can be performed using the triangle law of vector addition. In the case of products, vector multiplication can be done in two
      6 min read

    • Scalar Product of Vectors
      Two vectors or a vector and a scalar can be multiplied. There are mainly two kinds of products of vectors in physics, scalar multiplication of vectors and Vector Product (Cross Product) of two vectors. The result of the scalar product of two vectors is a number (a scalar). The common use of the scal
      9 min read

    • Dot and Cross Products on Vectors
      A quantity that has both magnitude and direction is known as a vector. Various operations can be performed on such quantities, such as addition, subtraction, and multiplication (products), etc. Some examples of vector quantities are: velocity, force, acceleration, and momentum, etc. Vectors can be m
      8 min read

    • Transpose a matrix in Single line in Python
      Transpose of a matrix is a task we all can perform very easily in Python (Using a nested loop). But there are some interesting ways to do the same in a single line. In Python, we can implement a matrix as a nested list (a list inside a list). Each element is treated as a row of the matrix. For examp
      4 min read

    • Transpose of a Matrix
      A Matrix is a rectangular arrangement of numbers (or elements) in rows and columns. It is often used in mathematics to represent data, solve systems of equations, or perform transformations. A matrix is written as: [Tex]A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \\ 7 & 8 & 9\e
      12 min read

    • Adjoint and Inverse of a Matrix
      Given a square matrix, find the adjoint and inverse of the matrix. We strongly recommend you to refer determinant of matrix as a prerequisite for this. Adjoint (or Adjugate) of a matrix is the matrix obtained by taking the transpose of the cofactor matrix of a given square matrix is called its Adjoi
      15+ min read

    • How to inverse a matrix using NumPy
      In this article, we will see NumPy Inverse Matrix in Python before that we will try to understand the concept of it. The inverse of a matrix is just a reciprocal of the matrix as we do in normal arithmetic for a single number which is used to solve the equations to find the value of unknown variable
      3 min read

    • Program to find Determinant of a Matrix
      The determinant of a Matrix is defined as a special number that is defined only for square matrices (matrices that have the same number of rows and columns). A determinant is used in many places in calculus and other matrices related to algebra, it actually represents the matrix in terms of a real n
      15+ min read

    • Program to find Normal and Trace of a matrix
      Given a 2D matrix, the task is to find Trace and Normal of matrix.Normal of a matrix is defined as square root of sum of squares of matrix elements.Trace of a n x n square matrix is sum of diagonal elements. Examples : Input : mat[][] = {{7, 8, 9}, {6, 1, 2}, {5, 4, 3}}; Output : Normal = 16 Trace =
      6 min read

    • Data Science | Solving Linear Equations
      Linear Algebra is a very fundamental part of Data Science. When one talks about Data Science, data representation becomes an important aspect of Data Science. Data is represented usually in a matrix form. The second important thing in the perspective of Data Science is if this data contains several
      9 min read

    • Data Science - Solving Linear Equations with Python
      A collection of equations with linear relationships between the variables is known as a system of linear equations. The objective is to identify the values of the variables that concurrently satisfy each equation, each of which is a linear constraint. By figuring out the system, we can learn how the
      4 min read

    • System of Linear Equations
      In mathematics, a system of linear equations consists of two or more linear equations that share the same variables. These systems often arise in real-world applications, such as engineering, physics, economics, and more, where relationships between variables need to be analyzed. Understanding how t
      8 min read

    • System of Linear Equations in three variables using Cramer's Rule
      Cramer's rule: In linear algebra, Cramer's rule is an explicit formula for the solution of a system of linear equations with as many equations as unknown variables. It expresses the solution in terms of the determinants of the coefficient matrix and of matrices obtained from it by replacing one colu
      12 min read

    • Eigenvalues and Eigenvectors
      Eigenvectors are the directions that remain unchanged during a transformation, even if they get longer or shorter. Eigenvalues are the numbers that indicate how much something stretches or shrinks during that transformation. These ideas are important in many areas of math and engineering, including
      15+ min read

    • Applications of Eigenvalues and Eigenvectors
      Eigenvalues and eigenvectors play a crucial role in a wide range of applications across engineering and science. Fields like control theory, vibration analysis, electric circuits, advanced dynamics, and quantum mechanics frequently rely on these concepts. One key application involves transforming ma
      7 min read

    • How to compute the eigenvalues and right eigenvectors of a given square array using NumPY?
      In this article, we will discuss how to compute the eigenvalues and right eigenvectors of a given square array using NumPy library. Example: Suppose we have a matrix as: [[1,2], [2,3]] Eigenvalue we get from this matrix or square array is: [-0.23606798 4.23606798] Eigenvectors of this matrix are: [[
      2 min read

    Statistics for Machine Learning

    • Descriptive Statistic
      Statistics is the foundation of data science. Descriptive statistics are simple tools that help us understand and summarize data. They show the basic features of a dataset, like the average, highest and lowest values and how spread out the numbers are. It's the first step in making sense of informat
      5 min read

    • Measures of Central Tendency
      Usually, frequency distribution and graphical representation are used to depict a set of raw data to attain meaningful conclusions from them. However, sometimes, these methods fail to convey a proper and clear picture of the data as expected. Therefore, some measures, also known as Measures of Centr
      5 min read

    • Measures of Dispersion | Types, Formula and Examples
      Measures of Dispersion are used to represent the scattering of data. These are the numbers that show the various aspects of the data spread across various parameters. Let's learn about the measure of dispersion in statistics, its types, formulas, and examples in detail. Dispersion in StatisticsDispe
      10 min read

    • Mean, Variance and Standard Deviation
      Mean, Variance and Standard Deviation are fundamental concepts in statistics and engineering mathematics, essential for analyzing and interpreting data. These measures provide insights into data's central tendency, dispersion, and spread, which are crucial for making informed decisions in various en
      10 min read

    • Calculate the average, variance and standard deviation in Python using NumPy
      Numpy in Python is a general-purpose array-processing package. It provides a high-performance multidimensional array object and tools for working with these arrays. It is the fundamental package for scientific computing with Python. Numpy provides very easy methods to calculate the average, variance
      5 min read

    • Random Variable
      Random variable is a fundamental concept in statistics that bridges the gap between theoretical probability and real-world data. A Random variable in statistics is a function that assigns a real value to an outcome in the sample space of a random experiment. For example: if you roll a die, you can a
      11 min read

    • Difference between Parametric and Non-Parametric Methods
      Statistical analysis plays a crucial role in understanding and interpreting data across various disciplines. Two prominent approaches in statistical analysis are Parametric and Non-Parametric Methods. While both aim to draw inferences from data, they differ in their assumptions and underlying princi
      8 min read

    • Probability Distribution - Function, Formula, Table
      A probability distribution describes how the probabilities of different outcomes are assigned to the possible values of a random variable. It provides a way of modeling the likelihood of each outcome in a random experiment. While a frequency distribution shows how often outcomes occur in a sample or
      15+ min read

    • Confidence Interval
      Confidence Interval (CI) is a range of values that estimates where the true population value is likely to fall. Instead of just saying The average height of students is 165 cm a confidence interval allow us to say We are 95% confident that the true average height is between 160 cm and 170 cm. Before
      9 min read

    • Covariance and Correlation
      Covariance and correlation are the two key concepts in Statistics that help us analyze the relationship between two variables. Covariance measures how two variables change together, indicating whether they move in the same or opposite directions. In this article, we will learn about the differences
      5 min read

    • Program to Find Correlation Coefficient
      The correlation coefficient is a statistical measure that helps determine the strength and direction of the relationship between two variables. It quantifies how changes in one variable correspond to changes in another. This coefficient, sometimes referred to as the cross-correlation coefficient, al
      9 min read

    • Robust Correlation
      Correlation is a statistical tool that is used to analyze and measure the degree of relationship or degree of association between two or more variables. There are generally three types of correlation: Positive correlation: When we increase the value of one variable, the value of another variable inc
      8 min read

    • Normal Probability Plot
      The probability plot is a way of visually comparing the data coming from different distributions. These data can be of empirical dataset or theoretical dataset. The probability plot can be of two types: P-P plot: The (Probability-to-Probability) p-p plot is the way to visualize the comparing of cumu
      3 min read

    • Quantile Quantile plots
      The quantile-quantile( q-q plot) plot is a graphical method for determining if a dataset follows a certain probability distribution or whether two samples of data came from the same population or not. Q-Q plots are particularly useful for assessing whether a dataset is normally distributed or if it
      8 min read

    • True Error vs Sample Error
      True Error The true error can be said as the probability that the hypothesis will misclassify a single randomly drawn sample from the population. Here the population represents all the data in the world. Let's consider a hypothesis h(x) and the true/target function is f(x) of population P. The proba
      3 min read

    • Bias-Variance Trade Off - Machine Learning
      It is important to understand prediction errors (bias and variance) when it comes to accuracy in any machine-learning algorithm. There is a tradeoff between a model’s ability to minimize bias and variance which is referred to as the best solution for selecting a value of Regularization constant. A p
      3 min read

    • Understanding Hypothesis Testing
      Hypothesis method compares two opposite statements about a population and uses sample data to decide which one is more likely to be correct.To test this assumption we first take a sample from the population and analyze it and use the results of the analysis to decide if the claim is valid or not. Su
      14 min read

    • T-test
      After learning about the Z-test we now move on to another important statistical test called the t-test. While the Z-test is useful when we know the population variance. The t-test is used to compare the averages of two groups to see if they are significantly different from each other. Suppose You wa
      11 min read

    • Paired T-Test - A Detailed Overview
      Student’s t-test or t-test is the statistical method used to determine if there is a difference between the means of two samples. The test is often performed to find out if there is any sampling error or unlikeliness in the experiment. This t-test is further divided into 3 types based on your data a
      5 min read

    • P-value in Machine Learning
      P-value helps us determine how likely it is to get a particular result when the null hypothesis is assumed to be true. It is the probability of getting a sample like ours or more extreme than ours if the null hypothesis is correct. Therefore, if the null hypothesis is assumed to be true, the p-value
      6 min read

    • F-Test in Statistics
      F test is a statistical test that is used in hypothesis testing that determines whether the variances of two samples are equal or not. The article will provide detailed information on f test, f statistic, its calculation, critical value and how to use it to test hypotheses. To understand F test firs
      6 min read

    • Z-test : Formula, Types, Examples
      After learning about inferential statistics we now move on to a more specific technique used for making decisions based on sample data – the Z-test. Studying entire populations can be time-consuming, costly and sometimes impossible. so instead you take a sample from that population. This is where th
      9 min read

    • Residual Leverage Plot (Regression Diagnostic)
      In linear or multiple regression, it is not enough to just fit the model into the dataset. But, it may not give the desired result. To apply the linear or multiple regression efficiently to the dataset. There are some assumptions that we need to check on the dataset that made linear/multiple regress
      5 min read

    • Difference between Null and Alternate Hypothesis
      Hypothesis is a statement or an assumption that may be true or false. There are six types of hypotheses mainly the Simple hypothesis, Complex hypothesis, Directional hypothesis, Associative hypothesis, and Null hypothesis. Usually, the hypothesis is the start point of any scientific investigation, I
      3 min read

    • Mann and Whitney U test
      Mann and Whitney's U-test or Wilcoxon rank-sum testis the non-parametric statistic hypothesis test that is used to analyze the difference between two independent samples of ordinal data. In this test, we have provided two randomly drawn samples and we have to verify whether these two samples is from
      5 min read

    • Wilcoxon Signed Rank Test
      The Wilcoxon Signed Rank Test is a non-parametric statistical test used to compare two related groups. It is often applied when the assumptions for the paired t-test (such as normality) are not met. This test evaluates whether there is a significant difference between two paired observations, making
      5 min read

    • Kruskal Wallis Test
      The Kruskal-Wallis test (H test) is a nonparametric statistical test used to compare three or more independent groups to determine if there are statistically significant differences between them. It is an extension of the Mann-Whitney U test, which is used for comparing two groups. Unlike the one-wa
      5 min read

    • Friedman Test
      The Friedman Test is a non-parametric statistical test used to detect differences in treatments across multiple test attempts. It is often used when the data is in the form of rankings or ordinal data, and when you have more than two related groups or repeated measures. The Friedman test is the non-
      6 min read

    • Probability Class 10 Important Questions
      Probability is a fundamental concept in mathematics for measuring of chances of an event happening By assigning numerical values to the chances of different outcomes, probability allows us to model, analyze, and predict complex systems and processes. Probability Formulas for Class 10 It says the pos
      4 min read

    Probability and Probability Distributions

    • Mathematics - Law of Total Probability
      Probability theory is the branch of mathematics concerned with the analysis of random events. It provides a framework for quantifying uncertainty, predicting outcomes, and understanding random phenomena. In probability theory, an event is any outcome or set of outcomes from a random experiment, and
      13 min read

    • Bayes's Theorem for Conditional Probability
      Bayes's Theorem for Conditional Probability: Bayes's Theorem is a fundamental result in probability theory that describes how to update the probabilities of hypotheses when given evidence. Named after the Reverend Thomas Bayes, this theorem is crucial in various fields, including engineering, statis
      9 min read

    • Mathematics | Probability Distributions Set 1 (Uniform Distribution)
      Probability distribution is a mathematical function that can be thought of as providing the probabilities of occurrence of different possible outcomes in an experiment. For instance, if the random variable X is used to denote the outcome of a coin toss ("the experiment"), then the probability distri
      3 min read

    • Mathematics | Probability Distributions Set 4 (Binomial Distribution)
      The previous articles talked about some of the Continuous Probability Distributions. This article covers one of the distributions which are not continuous but discrete, namely the Binomial Distribution. Introduction - To understand the Binomial distribution, we must first understand what a Bernoulli
      5 min read

    • Mathematics | Probability Distributions Set 5 (Poisson Distribution)
      The previous article covered the Binomial Distribution. This article talks about another Discrete Probability Distribution, the Poisson Distribution. Introduction -Suppose an event can occur several times within a given unit of time. When the total number of occurrences of the event is unknown, we c
      4 min read

    • Uniform Distribution | Formula, Definition and Examples
    • Mathematics | Probability Distributions Set 2 (Exponential Distribution)
    • Mathematics | Probability Distributions Set 3 (Normal Distribution)
    • Mathematics | Beta Distribution Model
    • Gamma Distribution Model in Mathematics
    • Chi-Square Test for Feature Selection - Mathematical Explanation
    • Student's t-distribution in Statistics
    • Python - Central Limit Theorem
    • Limits, Continuity and Differentiability
    • Implicit Differentiation

    Calculus for Machine Learning

    • Partial Derivatives in Engineering Mathematics
    • Advanced Differentiation
    • How to find Gradient of a Function using Python?
    • Optimization techniques for Gradient Descent
    • Higher Order Derivatives
    • Taylor Series
    • Application of Derivative - Maxima and Minima
    • Absolute Minima and Maxima
    • Optimization for Data Science
    • Unconstrained Multivariate Optimization
    • Lagrange Multipliers | Definition and Examples
    • Lagrange's Interpolation
    • Linear Regression in Machine learning
    • Ordinary Least Squares (OLS) using statsmodels

    Regression in Machine Learning
