Logistic Regression With Polynomial Features
Last Updated : 27 May, 2024
Logistic regression with polynomial features is a technique used to model complex, non-linear relationships between input variables and the target variable. This approach involves transforming the original input features into higher-degree polynomial features, which can help capture intricate patterns in the data and improve the model's predictive performance.
In this article, we will understand the significance of logistic regression with polynomial features as well as its implementation in scikit-learn.
Understanding Polynomial Features with Logistic Regression
Polynomial features are created by transforming the original input features into a new set of features that include not only the original features but also their polynomial combinations up to a specified degree. This transformation allows logistic regression, which is inherently a linear model, to capture non-linear relationships between the input variables and the target variable.
The degree of the polynomial determines the highest power to which the features are raised. Typically, degrees of 2 or 3 are used, as higher degrees can lead to overfitting.
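As a quick standalone sketch (independent of the dataset used later in this article), a degree-2 expansion of a single sample with features x1 = 2 and x2 = 3 produces the bias term, the original features, their squares, and their product:

Python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# One sample with two features: x1 = 2, x2 = 3
X = np.array([[2, 3]])

poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))
# [[1. 2. 3. 4. 6. 9.]]  -> [1, x1, x2, x1^2, x1*x2, x2^2]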
Because the model remains linear in the transformed features, a linear decision boundary in the expanded feature space corresponds to a curved boundary in the original feature space. This makes the approach particularly useful when the classes cannot be separated by a straight line, as it helps the model fit the data more accurately.
The Stone-Weierstrass theorem states that any continuous function on a closed, bounded interval can be uniformly approximated by polynomials. In theory, then, polynomial logistic regression can approximate any continuous decision boundary. In practice, however, the choice of polynomial degree is crucial to avoid overfitting or underfitting.
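Concretely, with two inputs and degree 2, the fitted model takes the form

$$P(y = 1 \mid x_1, x_2) = \sigma\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_1 x_2 + \beta_5 x_2^2\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

The decision boundary is the curve along which the polynomial inside the sigmoid equals zero; for degree 2 this is a conic section rather than a straight line.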
Utilizing Logistic Regression with Polynomial Features
To implement polynomial logistic regression in scikit-learn, convert your data to polynomial features using the PolynomialFeatures class, then fit your logistic regression model on these features. The degree of the polynomial is specified when creating the PolynomialFeatures object.
- A higher-order polynomial produces a more flexible decision boundary and can classify data more accurately, especially for non-linear problems.
- However, too high an order can lead to overfitting, while too low an order can result in underfitting. Finding the optimal polynomial order is therefore important for achieving good model performance; cross-validation (sketched below) is a common way to choose it.
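A minimal sketch of degree selection via cross-validation, assuming X and y hold the dataset generated in the implementation below (the candidate degrees are illustrative):

Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Score each candidate degree with 5-fold cross-validation
for degree in [1, 2, 3, 4]:
    candidate = Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('logistic', LogisticRegression(max_iter=1000))  # extra iterations help convergence at higher degrees
    ])
    scores = cross_val_score(candidate, X, y, cv=5)
    print(f"degree={degree}: mean CV accuracy = {scores.mean():.3f}")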
Steps to Implement Polynomial Logistic Regression:
- Transforming Features: Use PolynomialFeatures to transform the original features into polynomial features. For example, with two input variables and a degree of 2, the transformed features would include the original features, their squares, and their product.
- Fitting the Model: Fit a logistic regression model to the transformed features. This can be done using scikit-learn's LogisticRegression class.
- Pipeline (Optional): To streamline the process, a pipeline can be used to combine the transformation and fitting steps into a single object. This helps avoid creating intermediate objects and simplifies the workflow.
Import necessary libraries and generate a synthetic dataset
Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)
Split the dataset into training and testing sets
Python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Generate polynomial features
Python
# Generate polynomial features
poly = PolynomialFeatures(degree=2)  # You can change the degree as needed
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
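To check exactly which columns the transformer produced, get_feature_names_out() (available in scikit-learn 1.0 and later) lists them; with two inputs and degree 2, the expansion contains the bias, the original features, their squares, and their product:

Python
# Inspect the names of the generated polynomial features
print(poly.get_feature_names_out())
# ['1' 'x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']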
Train the logistic regression model
Python
model = LogisticRegression()
model.fit(X_train_poly, y_train)
Predictions and Evaluating the Model
Python
y_pred = model.predict(X_test_poly)

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)
Output:
Accuracy: 1.0
Confusion Matrix:
[[10  0]
 [ 0 10]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00        10

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20
Visualize the decision boundary
Python
# Optional: Visualize the decision boundary (for 2D data only)
def plot_decision_boundary(X, y, model, poly):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                         np.arange(y_min, y_max, 0.1))
    Z = model.predict(poly.transform(np.c_[xx.ravel(), yy.ravel()]))
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.show()

# Visualize the decision boundary
plot_decision_boundary(X_test, y_test, model, poly)
Output:
Decision Boundary
Creating a Pipeline for Generating Polynomial Features and Training the Model
Incorporating a pipeline will streamline the workflow by combining steps like preprocessing and model training into a single process.
Python
from sklearn.pipeline import Pipeline

# Create a pipeline that generates polynomial features and trains a logistic regression model
pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),  # Generate polynomial features
    ('logistic', LogisticRegression())       # Train logistic regression model
])

# Train the pipeline
pipeline.fit(X_train, y_train)
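Once fitted, the pipeline applies the polynomial transformation automatically before predicting, so it can be evaluated directly on the raw (untransformed) test features:

Python
# The pipeline handles the feature transformation internally
y_pred_pipe = pipeline.predict(X_test)
print(f"Pipeline accuracy: {accuracy_score(y_test, y_pred_pipe)}")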
Advantages and Disadvantages of Logistic Regression With Polynomial Features
Advantages of Logistic Regression With Polynomial Features
- Polynomial logistic regression can model non-linear relationships, making it more flexible than linear logistic regression.
- It can improve the classification performance for complex datasets.
Disadvantages of Logistic Regression With Polynomial Features
- Higher-degree polynomials can lead to overfitting, where the model performs well on training data but poorly on unseen data.
- A balance must be struck between the complexity of the model (degree of polynomial) and the risk of overfitting.
Conclusion
In summary, polynomial logistic regression is a powerful technique for handling non-linear decision boundaries in classification tasks. By transforming input features into polynomial features, it allows logistic regression models to capture more complex patterns in the data.