
Logistic Regression With Polynomial Features

Last Updated : 27 May, 2024

Logistic regression with polynomial features is a technique used to model complex, non-linear relationships between input variables and the target variable. This approach involves transforming the original input features into higher-degree polynomial features, which can help capture intricate patterns in the data and improve the model's predictive performance.

In this article, we will look at why logistic regression with polynomial features is useful and how to implement it in scikit-learn.

Table of Content

  • Understanding Polynomial Features with Logistic Regression
  • Utilizing Logistic Regression with Polynomial Features
    • Generate polynomial features
    • Train the logistic regression model
  • Creating a Pipeline for Generating Polynomial Features
  • Advantages and Disadvantages of Logistic Regression With Polynomial Features

Understanding Polynomial Features with Logistic Regression

Polynomial features are created by transforming the original input features into a new set of features that include not only the original features but also their polynomial combinations up to a specified degree. This transformation allows logistic regression, which is inherently a linear model, to capture non-linear relationships between the input variables and the target variable.
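
For two inputs and a degree of 2, the model described above can be written out explicitly. This is only a sketch: the weight symbols w_0 to w_5 and the input names x_1, x_2 are notation introduced here for illustration and do not come from the original text.

```latex
P(y = 1 \mid x_1, x_2)
  = \sigma\!\left(w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1^2 + w_4 x_1 x_2 + w_5 x_2^2\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

The model is still linear in the weights, which is why ordinary logistic regression can fit it. The decision boundary, where the argument of the sigmoid equals zero, is a conic section in the (x_1, x_2) plane rather than a straight line.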

The degree of the polynomial determines the highest power to which the features are raised. Typically, degrees of 2 or 3 are used, as higher degrees can lead to overfitting.
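
One reason higher degrees invite overfitting is that the number of generated features grows quickly with the degree. The sketch below counts the output columns of PolynomialFeatures for two input features and checks them against the closed-form count C(n + d, d). The names n_inputs, X_dummy and counts are illustrative choices, not part of the article's code.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n_inputs = 2  # same number of input features as the example in this article
X_dummy = np.zeros((1, n_inputs))  # values are irrelevant; only the shape matters

counts = {}
for degree in range(1, 6):
    poly = PolynomialFeatures(degree=degree).fit(X_dummy)
    counts[degree] = poly.n_output_features_
    # Closed form: C(n_inputs + degree, degree), bias column included
    assert counts[degree] == comb(n_inputs + degree, degree)
    print(f"degree={degree}: {counts[degree]} features")
```

With more inputs the growth is steeper: by the same formula, 10 input features at degree 3 already give 286 columns.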

This approach is particularly useful when the classes cannot be separated by a straight line (or hyperplane) in the original feature space. The model stays linear in the expanded features, but the resulting decision boundary is curved when viewed in the original ones, so it can fit such data more accurately.
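
A minimal sketch of a non-linear decision boundary, assuming a hypothetical two-ring dataset from make_circles (this is not the article's data). It compares plain logistic regression with a degree-2 polynomial version on the same split. All variable names are illustrative.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two concentric rings: no straight line separates the classes
X_c, y_c = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_c, y_c, test_size=0.25, random_state=0)

# Plain logistic regression on the raw features
linear_clf = LogisticRegression().fit(Xc_train, yc_train)

# Logistic regression on degree-2 polynomial features
poly_clf = make_pipeline(PolynomialFeatures(degree=2),
                         LogisticRegression()).fit(Xc_train, yc_train)

linear_acc = linear_clf.score(Xc_test, yc_test)
poly_acc = poly_clf.score(Xc_test, yc_test)
print(f"Linear features:   {linear_acc:.2f}")
print(f"Degree-2 features: {poly_acc:.2f}")
```

The squared terms let the model express a circular boundary, which the raw features cannot.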

The Weierstrass approximation theorem (generalized by the Stone-Weierstrass theorem) states that any continuous function on a closed, bounded interval can be approximated arbitrarily closely by polynomials. In theory, this suggests that polynomial logistic regression with a sufficiently high degree can approximate a wide range of continuous decision boundaries. In practice, however, the degree must be chosen carefully: too high a degree overfits, and too low a degree underfits.

Utilizing Logistic Regression with Polynomial Features

To implement polynomial logistic regression in scikit-learn you need to convert your data to polynomial features using the PolynomialFeatures class, and then build your logistic regression model on these features.

The degree of the polynomial is specified when creating the PolynomialFeatures object.

  • A higher polynomial order gives a more flexible decision boundary, which can classify the data better, especially in non-linear problems.
  • However, too high an order can lead to overfitting, while too low an order can result in underfitting. Finding a suitable polynomial order is therefore important for good model performance.
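
One common way to look for a suitable polynomial order is cross-validated grid search over the degree. The sketch below assumes a hypothetical make_moons dataset and an arbitrary candidate list of degrees 1 to 5. Neither comes from the article, and the step names "poly" and "logistic" are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical dataset chosen for illustration (not the article's data)
X_m, y_m = make_moons(n_samples=200, noise=0.25, random_state=0)

pipe = Pipeline([
    ("poly", PolynomialFeatures()),
    ("logistic", LogisticRegression(max_iter=1000)),
])

# Try each candidate degree with 5-fold cross-validation
candidate_degrees = [1, 2, 3, 4, 5]
search = GridSearchCV(pipe, {"poly__degree": candidate_degrees}, cv=5)
search.fit(X_m, y_m)

best_degree = search.best_params_["poly__degree"]
print("Best degree:", best_degree)
print("Mean CV accuracy:", round(search.best_score_, 3))
```

The parameter name poly__degree follows scikit-learn's convention of joining the pipeline step name and the parameter with a double underscore.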

Steps to Implement Polynomial Logistic Regression:

  1. Transforming Features: Use PolynomialFeatures to transform the original features into polynomial features. For example, with two input variables and a degree of 2, the transformed features include a constant bias term (added by default), the original features, their squares, and their product.
  2. Fitting the Model: Fit a logistic regression model to the transformed features. This can be done using scikit-learn's LogisticRegression class.
  3. Pipeline (Optional): To streamline the process, a pipeline can be used to combine the transformation and fitting steps into a single object. This helps avoid creating intermediate objects and simplifies the workflow.

Import necessary Libraries and generate synthetic dataset

Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Generate a synthetic binary classification dataset with two features
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)

Split the dataset into training and testing sets

Python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) 

Generate polynomial features

Python
# Generate polynomial features
poly = PolynomialFeatures(degree=2)  # You can change the degree as needed
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)

Train the logistic regression model

Python
model = LogisticRegression()
model.fit(X_train_poly, y_train)

Predictions and Evaluating the Model

Python
y_pred = model.predict(X_test_poly)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

Output:

Accuracy: 1.0
Confusion Matrix:
[[10  0]
 [ 0 10]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00        10

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20
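
A score computed on only 20 test samples is a coarse estimate. As a complementary check, the sketch below runs 5-fold cross-validation on the same synthetic data. The choice of cv=5 and the use of make_pipeline are assumptions made here, not part of the article's workflow.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)

# The pipeline refits the feature transformer inside every fold
clf = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
scores = cross_val_score(clf, X, y, cv=5)

print("Fold accuracies:", scores)
print(f"Mean: {scores.mean():.3f}  Std: {scores.std():.3f}")
```

The spread across folds gives a rough sense of how much the accuracy depends on the particular split.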

Visualize the decision boundary

Python
# Optional: Visualize the decision boundary (for 2D data only)
def plot_decision_boundary(X, y, model, poly):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                         np.arange(y_min, y_max, 0.1))
    Z = model.predict(poly.transform(np.c_[xx.ravel(), yy.ravel()]))
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.show()

# Visualize the decision boundary
plot_decision_boundary(X_test, y_test, model, poly)

Output:

Figure: Decision Boundary

Creating a Pipeline for Generating Polynomial Features

Incorporating a pipeline will streamline the workflow by combining steps like preprocessing and model training into a single process.

Python
from sklearn.pipeline import Pipeline

# Create a pipeline that generates polynomial features and trains a logistic regression model
pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),  # Generate polynomial features
    ('logistic', LogisticRegression())       # Train logistic regression model
])

# Train the pipeline
pipeline.fit(X_train, y_train)
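
Once fitted, a pipeline accepts the raw features directly for prediction and scoring. The self-contained sketch below rebuilds the article's data and checks that the pipeline gives the same test predictions as the manual two-step approach. The names manual_model, manual_pred, pipeline_pred and same_predictions are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Manual two-step version: transform, then fit
poly = PolynomialFeatures(degree=2)
manual_model = LogisticRegression().fit(poly.fit_transform(X_train), y_train)
manual_pred = manual_model.predict(poly.transform(X_test))

# Pipeline version: raw features in, predictions out
pipeline = Pipeline([
    ("poly", PolynomialFeatures(degree=2)),
    ("logistic", LogisticRegression()),
])
pipeline.fit(X_train, y_train)
pipeline_pred = pipeline.predict(X_test)

same_predictions = np.array_equal(manual_pred, pipeline_pred)
print("Same predictions:", same_predictions)
print("Pipeline test accuracy:", pipeline.score(X_test, y_test))
```

Because the transformer lives inside the pipeline, there is no separate X_test_poly array to keep in sync with the model.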

Advantages and Disadvantages of Logistic Regression With Polynomial Features

Advantages of Logistic Regression With Polynomial Features

  • Polynomial logistic regression can model non-linear relationships, making it more flexible than linear logistic regression.
  • It can improve the classification performance for complex datasets.

Disadvantages of Logistic Regression With Polynomial Features

  • Higher-degree polynomials can lead to overfitting, where the model performs well on training data but poorly on unseen data.
  • A balance must be struck between the complexity of the model (degree of polynomial) and the risk of overfitting.
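
Besides lowering the degree, regularization is another lever for that balance. scikit-learn's LogisticRegression applies an L2 penalty by default, and its C parameter is the inverse of the regularization strength. The sketch below assumes a hypothetical make_moons dataset, a fixed degree of 5 and two arbitrary C values. The helper coef_norm is illustrative and not from the article.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical dataset for illustration (not the article's data)
X_m, y_m = make_moons(n_samples=200, noise=0.25, random_state=0)

def coef_norm(C):
    """Fit a degree-5 model with inverse regularization strength C; return the weight norm."""
    clf = make_pipeline(
        PolynomialFeatures(degree=5, include_bias=False),
        StandardScaler(),  # put the polynomial terms on a comparable scale
        LogisticRegression(C=C, max_iter=5000),
    )
    clf.fit(X_m, y_m)
    return float(np.linalg.norm(clf.named_steps["logisticregression"].coef_))

strong = coef_norm(0.01)   # small C: strong L2 penalty
weak = coef_norm(100.0)    # large C: weak L2 penalty
print(f"||w|| with C=0.01: {strong:.3f}")
print(f"||w|| with C=100:  {weak:.3f}")
```

A smaller C shrinks the weights and smooths the decision boundary, so a relatively high degree can sometimes be kept without fitting noise. A suitable value is usually chosen by cross-validation.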

Conclusion

In summary, polynomial logistic regression is a powerful technique for handling non-linear decision boundaries in classification tasks. By transforming input features into polynomial features, it allows logistic regression models to capture more complex patterns in the data.


akrammwiri