Logistic Regression With Polynomial Features
Last Updated : 27 May, 2024
Logistic regression with polynomial features is a technique used to model complex, non-linear relationships between input variables and the target variable. This approach involves transforming the original input features into higher-degree polynomial features, which can help capture intricate patterns in the data and improve the model's predictive performance.
In this article, we will understand the significance of logistic regression with polynomial features as well as its implementation in scikit-learn.
Understanding Polynomial Features with Logistic Regression
Polynomial features are created by transforming the original input features into a new set of features that include not only the original features but also their polynomial combinations up to a specified degree. This transformation allows logistic regression, which is inherently a linear model, to capture non-linear relationships between the input variables and the target variable.
The degree of the polynomial determines the highest power to which the features are raised. Typically, degrees of 2 or 3 are used, as higher degrees can lead to overfitting.
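As a quick standalone sketch (independent of the dataset used later in this article), a degree-2 expansion of a single sample with features x1 = 2 and x2 = 3 produces the bias term, the original features, their squares, and their product:

Python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

# One sample with two features: x1 = 2, x2 = 3
X = np.array([[2, 3]])

poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))
# [[1. 2. 3. 4. 6. 9.]]  -> [1, x1, x2, x1^2, x1*x2, x2^2]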
Because the model remains linear in the transformed features, a linear decision boundary in the expanded feature space corresponds to a curved boundary in the original feature space. This makes the approach particularly useful when the classes cannot be separated by a straight line, as it helps the model fit the data more accurately.
The Stone-Weierstrass theorem states that any continuous function on a closed, bounded interval can be uniformly approximated by polynomials. In theory, then, polynomial logistic regression can approximate any continuous decision boundary. In practice, however, the choice of polynomial degree is crucial to avoid overfitting or underfitting.
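Concretely, with two inputs and degree 2, the fitted model takes the form

$$P(y = 1 \mid x_1, x_2) = \sigma\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_1 x_2 + \beta_5 x_2^2\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

The decision boundary is the curve along which the polynomial inside the sigmoid equals zero; for degree 2 this is a conic section rather than a straight line.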
Utilizing Logistic Regression with Polynomial Features
To implement polynomial logistic regression in scikit-learn, convert your data to polynomial features using the PolynomialFeatures class, then fit your logistic regression model on these features. The degree of the polynomial is specified when creating the PolynomialFeatures object.
- A higher-order polynomial produces a more flexible decision boundary and can classify data more accurately, especially for non-linear problems.
- However, too high an order can lead to overfitting, while too low an order can result in underfitting. Finding the optimal polynomial order is therefore important for achieving good model performance; cross-validation (sketched below) is a common way to choose it.
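A minimal sketch of degree selection via cross-validation, assuming X and y hold the dataset generated in the implementation below (the candidate degrees are illustrative):

Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Score each candidate degree with 5-fold cross-validation
for degree in [1, 2, 3, 4]:
    candidate = Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('logistic', LogisticRegression(max_iter=1000))  # extra iterations help convergence at higher degrees
    ])
    scores = cross_val_score(candidate, X, y, cv=5)
    print(f"degree={degree}: mean CV accuracy = {scores.mean():.3f}")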
Steps to Implement Polynomial Logistic Regression:
- Transforming Features: Use PolynomialFeatures to transform the original features into polynomial features. For example, with two input variables and a degree of 2, the transformed features would include the original features, their squares, and their product.
- Fitting the Model: Fit a logistic regression model to the transformed features. This can be done using scikit-learn's LogisticRegression class.
- Pipeline (Optional): To streamline the process, a pipeline can be used to combine the transformation and fitting steps into a single object. This helps avoid creating intermediate objects and simplifies the workflow.
Import necessary libraries and generate a synthetic dataset
Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)
Split the dataset into training and testing sets
Python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Generate polynomial features
Python
# Generate polynomial features
poly = PolynomialFeatures(degree=2)  # You can change the degree as needed
X_train_poly = poly.fit_transform(X_train)
X_test_poly = poly.transform(X_test)
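To check exactly which columns the transformer produced, get_feature_names_out() (available in scikit-learn 1.0 and later) lists them; with two inputs and degree 2, the expansion contains the bias, the original features, their squares, and their product:

Python
# Inspect the names of the generated polynomial features
print(poly.get_feature_names_out())
# ['1' 'x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']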
Train the logistic regression model
Python
model = LogisticRegression()
model.fit(X_train_poly, y_train)
Predictions and Evaluating the Model
Python
y_pred = model.predict(X_test_poly)

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)
Output:
Accuracy: 1.0
Confusion Matrix:
[[10  0]
 [ 0 10]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00        10

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20
Visualize the decision boundary
Python
# Optional: Visualize the decision boundary (for 2D data only)
def plot_decision_boundary(X, y, model, poly):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                         np.arange(y_min, y_max, 0.1))
    Z = model.predict(poly.transform(np.c_[xx.ravel(), yy.ravel()]))
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.show()

# Visualize the decision boundary
plot_decision_boundary(X_test, y_test, model, poly)
Output:
Decision Boundary
Creating a Pipeline for Generating Polynomial Features and Training the Model
Incorporating a pipeline will streamline the workflow by combining steps like preprocessing and model training into a single process.
Python
from sklearn.pipeline import Pipeline

# Create a pipeline that generates polynomial features and trains a logistic regression model
pipeline = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),  # Generate polynomial features
    ('logistic', LogisticRegression())       # Train logistic regression model
])

# Train the pipeline
pipeline.fit(X_train, y_train)
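Once fitted, the pipeline applies the polynomial transformation automatically before predicting, so it can be evaluated directly on the raw (untransformed) test features:

Python
# The pipeline handles the feature transformation internally
y_pred_pipe = pipeline.predict(X_test)
print(f"Pipeline accuracy: {accuracy_score(y_test, y_pred_pipe)}")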
Advantages and Disadvantages of Logistic Regression With Polynomial Features
Advantages of Logistic Regression With Polynomial Features
- Polynomial logistic regression can model non-linear relationships, making it more flexible than linear logistic regression.
- It can improve the classification performance for complex datasets.
Disadvantages of Logistic Regression With Polynomial Features
- Higher-degree polynomials can lead to overfitting, where the model performs well on training data but poorly on unseen data.
- A balance must be struck between the complexity of the model (degree of polynomial) and the risk of overfitting.
Conclusion
In summary, polynomial logistic regression is a powerful technique for handling non-linear decision boundaries in classification tasks. By transforming input features into polynomial features, it allows logistic regression models to capture more complex patterns in the data.