Implementation of Polynomial Regression

Last Updated : 11 Jan, 2024

Polynomial Regression is a form of linear regression in which the relationship between the independent variable x and dependent variable y is modelled as an nth-degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x). In this article, we’ll go in-depth about polynomial regression.

Table of Contents

  • What is a Polynomial Regression? 
  • Why Polynomial Regression?
  • How does a Polynomial Regression work?
  • Polynomial Regression Real-Life Example
  • Polynomial Regression implementations using Python
  • Overfitting Vs Under-fitting
  • Application of Polynomial Regression
  • Advantages & Disadvantages of using Polynomial Regression

What is a Polynomial Regression? 

  • Some relationships that a researcher studies are hypothesized to be curvilinear. Such cases clearly call for a polynomial term in the model.
  • Inspection of residuals: if we fit a linear model to curved data, a scatter plot of the residuals (Y-axis) against the predictor (X-axis) shows systematic runs of same-sign residuals, for example many positive residuals in the middle of the range, instead of a random scatter around zero. In such a situation, a linear model is not appropriate.
  • A usual assumption in multiple linear regression analysis is that the independent variables are not strongly correlated with one another. In a polynomial regression model this assumption is not satisfied, because the terms x, x^2, …, x^n are all functions of the same variable and are typically highly correlated.
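The last two points can be checked numerically. The following is a minimal sketch on hypothetical, noise-free data from the concave curve y = 9 - x^2 (the data are an assumption chosen for illustration). The straight-line fit leaves positive residuals in the middle and negative residuals at both ends, while a quadratic fit leaves essentially none. It also prints the correlation between x and x^2 on the range 1 to 10, which shows why the polynomial terms are far from independent.

```python
import numpy as np

# Hypothetical, noise-free curved data: a concave parabola
x = np.linspace(-3, 3, 61)
y = 9 - x**2

# Straight-line fit; np.polyfit returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(x, y, deg=1)
linear_residuals = y - (intercept + slope * x)

# Quadratic fit on the same data
quad_coefs = np.polyfit(x, y, deg=2)
quad_residuals = y - np.polyval(quad_coefs, x)

mid = len(x) // 2  # index of x = 0
print("linear residuals at x = -3, 0, 3:",
      np.round(linear_residuals[[0, mid, -1]], 2))
print("largest |quadratic residual|:", np.abs(quad_residuals).max())

# Polynomial terms of a single variable are strongly correlated
x_pos = np.arange(1, 11, dtype=float)
print("corr(x, x^2) on 1..10:", np.corrcoef(x_pos, x_pos**2)[0, 1])
```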

Why Polynomial Regression?

Polynomial regression is a type of regression analysis used in statistics and machine learning when the relationship between the independent variable (input) and the dependent variable (output) is not linear. While simple linear regression models the relationship as a straight line, polynomial regression allows for more flexibility by fitting a polynomial equation to the data.

When the relationship between the variables is better represented by a curve rather than a straight line, polynomial regression can capture the non-linear patterns in the data.

How does a Polynomial Regression work?

If we look closely, we see that moving from linear regression to polynomial regression only requires adding higher-order terms of the independent (input) features, such as x^2, x^3, …, x^n, to the feature space. This is closely related to feature engineering: the model stays linear in its coefficients but now works with an expanded set of features.
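As a minimal sketch of this feature expansion, NumPy's np.vander can build the columns 1, x, x^2, …, x^n from a one-dimensional input. The helper name polynomial_features is hypothetical and not part of any library; for a single feature, scikit-learn's PolynomialFeatures (used later in this article) produces the same columns.

```python
import numpy as np

def polynomial_features(x, degree):
    """Hypothetical helper: build the columns [1, x, x**2, ..., x**degree]."""
    x = np.asarray(x, dtype=float)
    # increasing=True orders the columns as x**0, x**1, ..., x**degree
    return np.vander(x, N=degree + 1, increasing=True)

X_poly = polynomial_features([1, 2, 3], degree=3)
print(X_poly)  # each row is [1, x, x**2, x**3]
```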

When the relationship is non-linear, a polynomial regression model introduces higher-degree polynomial terms.

The general form of a polynomial regression equation of degree n is:

[Tex]y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \epsilon[/Tex]

where,

  • y is the dependent variable.
  • x is the independent variable.
  • [Tex]\beta_0, \beta_1, \dots, \beta_n[/Tex] are the coefficients of the polynomial terms.
  • n is the degree of the polynomial.
  • [Tex]ϵ[/Tex] represents the error term.

The basic goal of regression analysis is to model the expected value of a dependent variable y in terms of the value of an independent variable x. In simple linear regression, we use the following equation:

y = a + bx + e

Here y is the dependent variable, a is the y-intercept, b is the slope and e is the error term. In many cases this linear model does not work well. For example, if we analyze the yield of a chemical synthesis in terms of the temperature at which the synthesis takes place, a quadratic model is more appropriate:

[Tex]y = a + b_1x + b_2x^2 + e[/Tex]

Here,

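  • y is the dependent variable, which depends on x.
  • a is the y-intercept, b_1 and b_2 are the coefficients of the linear and quadratic terms, and e is the error term.

In general, we can extend the model to a polynomial of degree n:

[Tex]y = a + b_1x + b_2x^2 + \dots + b_nx^n + e[/Tex]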

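Since the regression function is linear in the unknown coefficients a, b_1, …, b_n (even though it is non-linear in x), these models are linear from the point of view of estimation. The coefficients can therefore be estimated with the least squares technique, and the fitted equation is then used to compute the response value y.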
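To see that ordinary least squares is enough, here is a minimal sketch on hypothetical, noise-free data generated from y = 1 + 2x + 3x^2. The design matrix has the columns 1, x and x^2, and np.linalg.lstsq recovers a, b_1 and b_2 up to floating-point error.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2
x = np.arange(10, dtype=float)
y = 1 + 2 * x + 3 * x**2

# Design matrix with columns [1, x, x^2]; the model is linear in a, b_1, b_2
X = np.vander(x, N=3, increasing=True)

# Ordinary least squares: minimize ||X @ coef - y||^2
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 6))  # estimates of [a, b_1, b_2]
```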

By including higher-degree terms (quadratic, cubic, etc.), the model can capture the non-linear patterns in the data.

  1. The choice of the polynomial degree (n) is a crucial aspect of polynomial regression. A higher degree allows the model to fit the training data more closely, but it may also lead to overfitting, especially if the degree is too high. Therefore, the degree should be chosen based on the complexity of the underlying relationship in the data.
  2. The polynomial regression model is trained to find the coefficients that minimize the sum of squared differences between the predicted values and the actual values in the training data.
  3. Once the model is trained, it can be used to make predictions on new, unseen data. The polynomial equation captures the non-linear patterns observed in the training data, allowing the model to generalize to non-linear relationships.

Polynomial Regression Real-Life Example

Let’s consider a real-life example to illustrate the application of polynomial regression. Suppose you work in finance and are analyzing the relationship between the number of years of experience an employee has and their salary (in dollars). You suspect that the relationship is not linear and that a higher-degree polynomial might better capture how salary progresses over time.

| Years of Experience | Salary (in dollars) |
|---|---|
| 1 | 50,000 |
| 2 | 55,000 |
| 3 | 65,000 |
| 4 | 80,000 |
| 5 | 110,000 |
| 6 | 150,000 |
| 7 | 200,000 |

Now, let’s apply polynomial regression to model the relationship between years of experience and salary. We’ll use a quadratic polynomial (degree 2) for this example.

The quadratic polynomial regression equation is:

[Tex]\text{Salary} = \beta_0 + \beta_1 \times \text{Experience} + \beta_2 \times \text{Experience}^2 + \epsilon[/Tex]

To find the coefficients that minimize the difference between the predicted salaries and the actual salaries in the dataset, we can use the method of least squares. Its objective is to minimize the sum of squared differences between the predicted values and the actual values.
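As a minimal sketch of this least-squares fit, the code below applies np.polyfit with degree 2 to the seven rows of the table above and compares its sum of squared errors (SSE) with a straight-line fit. The final prediction at 8 years is an illustrative extrapolation beyond the data, so it should be read with caution.

```python
import numpy as np

# Data from the table above
experience = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
salary = np.array([50_000, 55_000, 65_000, 80_000,
                   110_000, 150_000, 200_000], dtype=float)

# Least-squares fit of a degree-2 polynomial; np.polyfit returns the
# highest-degree coefficient first: [beta_2, beta_1, beta_0]
beta_2, beta_1, beta_0 = np.polyfit(experience, salary, deg=2)
print(f"beta_0 = {beta_0:,.2f}, beta_1 = {beta_1:,.2f}, beta_2 = {beta_2:,.2f}")

# Compare the sum of squared errors with a straight-line fit
quad_pred = beta_0 + beta_1 * experience + beta_2 * experience**2
lin_pred = np.polyval(np.polyfit(experience, salary, deg=1), experience)
sse_quad = np.sum((salary - quad_pred) ** 2)
sse_lin = np.sum((salary - lin_pred) ** 2)
print(f"SSE (quadratic): {sse_quad:,.0f}")
print(f"SSE (linear):    {sse_lin:,.0f}")

# Use the fitted curve on an unseen input (extrapolation, treat with care)
print(f"Predicted salary at 8 years: {np.polyval([beta_2, beta_1, beta_0], 8.0):,.0f}")
```

Because the quadratic model contains the linear one as a special case, its training SSE can never be larger; whether the extra term generalizes is a separate question, discussed under overfitting below.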

Polynomial Regression implementations using Python

The dataset used for this polynomial regression analysis is read from a CSV file, data.csv. First, import the required libraries and load the dataset.

Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code.

  • Pandas – This library helps to load the data frame in a 2D array format and has multiple functions to perform analysis tasks in one go.
  • Numpy – Numpy arrays are very fast and can perform large computations in a very short time.
  • Matplotlib/Seaborn – These libraries are used to draw visualizations.
  • Sklearn (scikit-learn) – This library provides ready-made implementations for tasks ranging from data preprocessing to model development and evaluation.

Python3

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
datas = pd.read_csv('data.csv')

# Display the first five rows of the dataset
print(datas.head())

Output:

[Figure: first five rows of the dataset]

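The feature variable X contains the column at index 1 (the second column), and the target variable y contains the column at index 2 (the third column). The slice 1:2 keeps X as a two-dimensional array, which is the shape scikit-learn expects for input features.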

Python3

X = datas.iloc[:, 1:2].values
y = datas.iloc[:, 2].values
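The difference between the two kinds of indexing can be seen on a small example. This is a minimal sketch with a hypothetical DataFrame; the column names id, feature and target are made up and are not the columns of data.csv.

```python
import pandas as pd

# Hypothetical stand-in for data.csv with three columns
df = pd.DataFrame({
    "id": [1, 2, 3],
    "feature": [0.0, 20.0, 40.0],
    "target": [0.1, 0.5, 1.2],
})

X = df.iloc[:, 1:2].values  # a slice keeps a 2-D array with one column
y = df.iloc[:, 2].values    # a single integer index gives a 1-D array
print(X.shape, y.shape)     # (3, 1) (3,)
```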
                      
                       

Now let’s fit a linear regression model on the data at hand.

Python3

# Features and the target variables
X = datas.iloc[:, 1:2].values
y = datas.iloc[:, 2].values
 
# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin = LinearRegression()
 
lin.fit(X, y)
                      
                       

Next, fit the polynomial regression model on X and y: expand X into polynomial features up to degree 4, then fit a linear regression on the expanded features.

Python3

# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures

# Expand X into the columns [1, x, x^2, x^3, x^4]
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)

# Fit an ordinary linear regression on the expanded features
lin2 = LinearRegression()
lin2.fit(X_poly, y)
                      
                       

In this step, we visualize the linear regression results using a scatter plot.

Python3

# Visualising the Linear Regression results
plt.scatter(X, y, color='blue')
 
plt.plot(X, lin.predict(X), color='red')
plt.title('Linear Regression')
plt.xlabel('Temperature')
plt.ylabel('Pressure')
 
plt.show()
                      
                       

Output:

[Figure: scatter plot of the feature and the target variable]

Visualize the Polynomial Regression results using a scatter plot.

Python3

# Visualising the Polynomial Regression results
plt.scatter(X, y, color='blue')
 
plt.plot(X, lin2.predict(poly.fit_transform(X)),
         color='red')
plt.title('Polynomial Regression')
plt.xlabel('Temperature')
plt.ylabel('Pressure')
 
plt.show()
                      
                       

Output:

[Figure: polynomial regression results]

Predict new results with both the linear and the polynomial regression models. Note that the input must be a 2D NumPy array, with one row per sample and one column per feature.

Python3

# Predicting a new result with Linear Regression
# after converting predict variable to 2D array
pred = 110.0
predarray = np.array([[pred]])
lin.predict(predarray)
                      
                       

Output:

array([0.20675333])

Python3

# Predicting a new result with Polynomial Regression
# after converting predict variable to 2D array
pred2 = 110.0
pred2array = np.array([[pred2]])
lin2.predict(poly.fit_transform(pred2array))
                      
                       

Output:

array([0.43295877])

Overfitting Vs Under-fitting

A common problem with polynomial regression is overfitting. As we increase the degree of the polynomial to get better and better performance on the training data, the model starts to fit the noise in that data and performs poorly on new data points.

For this reason, when using polynomial regression we often penalize the weights of the model to reduce overfitting. Regularization techniques such as Ridge regression (an L2 penalty) and Lasso regression (an L1 penalty) are used whenever the model is likely to overfit the data at hand.
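The effect of penalizing the weights can be sketched with closed-form ridge regression on polynomial features. This is a minimal NumPy sketch, not scikit-learn's Ridge implementation: the helper ridge_polyfit, the noisy sine data, the degree of 9 and the alpha values are all assumptions for illustration. The intercept is left unpenalized, and the printed norm of the remaining coefficients shrinks as alpha grows. Lasso has no closed-form solution and is usually fit iteratively, so it is not shown here.

```python
import numpy as np

def ridge_polyfit(x, y, degree, alpha):
    """Hypothetical helper: closed-form ridge on [1, x, ..., x**degree].

    Solves (X^T X + alpha * P) w = X^T y, where P is the identity matrix
    except P[0, 0] = 0, so the intercept w[0] is not penalized.
    """
    X = np.vander(np.asarray(x, dtype=float), N=degree + 1, increasing=True)
    P = np.eye(degree + 1)
    P[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + alpha * P, X.T @ y)

# Hypothetical noisy samples of a sine curve
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=x.size)

# A degree-9 polynomial on 20 points overfits; larger alpha shrinks its weights
for alpha in (0.0, 0.01, 1.0):
    w = ridge_polyfit(x, y, degree=9, alpha=alpha)
    print(f"alpha={alpha:<5} norm of non-intercept weights = {np.linalg.norm(w[1:]):.2f}")
```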

Bias Vs Variance Tradeoff

The bias-variance tradeoff is the general framework used to reason about overfitting and underfitting. A low-degree polynomial has high bias and tends to underfit, while a high-degree polynomial has high variance and tends to overfit. Applied to polynomial regression, it helps us select an appropriate degree: once the degree is increased beyond a certain level, the gap between the training and validation metrics starts to widen, which signals overfitting.
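A minimal sketch of choosing the degree from training and validation errors follows. The noisy samples of sin(2πx), the 30/10 hold-out split and the degree range 1 to 8 are assumptions chosen for illustration. The training error can only go down as the degree rises, because each model contains the previous one; the validation error is what reveals underfitting at low degrees and overfitting at high ones.

```python
import numpy as np

# Hypothetical noisy samples of sin(2*pi*x) on [0, 1]
rng = np.random.default_rng(42)
x = rng.uniform(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

# Simple hold-out split: 30 training points, 10 validation points
x_train, y_train = x[:30], y[:30]
x_val, y_val = x[30:], y[30:]

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

results = {}
for degree in range(1, 9):
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = mse(y_train, np.polyval(coefs, x_train))
    val_mse = mse(y_val, np.polyval(coefs, x_val))
    results[degree] = (train_mse, val_mse)
    print(f"degree={degree}  train={train_mse:.4f}  "
          f"val={val_mse:.4f}  gap={val_mse - train_mse:.4f}")

# Pick the degree with the lowest validation error
best_degree = min(results, key=lambda d: results[d][1])
print("degree with the lowest validation MSE:", best_degree)
```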

Application of Polynomial Regression

Polynomial regression is widely used because many real-world relationships are non-linear. When we fit a curvilinear regression line to such data, the results are often considerably better than what standard linear regression can achieve. Some use cases of polynomial regression are listed below:

  • The growth rate of tissues.
  • Progression of disease epidemics
  • Distribution of carbon isotopes in lake sediments

Advantages & Disadvantages of using Polynomial Regression

Advantages of using Polynomial Regression

  • A broad range of functions can be fit under it.
  • A polynomial can fit a wide range of curvatures.
  • A polynomial can provide a good approximation of the relationship between the dependent and independent variables.

Disadvantages of using Polynomial Regression 

  • Polynomial models are highly sensitive to outliers.
  • The presence of one or two outliers in the data can seriously affect the results of nonlinear analysis.
  • In addition, there are unfortunately fewer model validation tools for the detection of outliers in nonlinear regression than there are for linear regression.
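The effect of a single outlier can be illustrated with a minimal sketch on hypothetical data: eleven points lying exactly on y = 0.5x^2 are fitted with a quadratic, then the last point is shifted by 50 and the fit is repeated. Because least squares weighs every squared error equally, the one corrupted point moves the fitted curve across the whole range, not only near the outlier.

```python
import numpy as np

# Hypothetical clean data lying exactly on y = 0.5 * x^2
x = np.linspace(0, 10, 11)
y_clean = 0.5 * x**2

# Same data with a single outlier at the last point
y_outlier = y_clean.copy()
y_outlier[-1] += 50

clean_fit = np.polyfit(x, y_clean, deg=2)
outlier_fit = np.polyfit(x, y_outlier, deg=2)
print("coefficients without outlier:", np.round(clean_fit, 3))
print("coefficients with one outlier:", np.round(outlier_fit, 3))

# How far the fitted curve moves at each x because of one bad point
shift = np.polyval(outlier_fit, x) - np.polyval(clean_fit, x)
print("shift in fitted values:", np.round(shift, 2))
```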

Conclusion

Polynomial regression, a versatile tool, finds applications in diverse domains. While addressing non-linear relationships, it requires careful consideration of overfitting and model complexity.



Akashkumar17