Isotonic Regression in Scikit Learn
Isotonic regression is a regression technique in which the fitted function is constrained to be monotonic in the predictor variable. This means that as the value of the predictor variable increases, the predicted value of the target variable either only increases or only decreases, in a consistent, non-oscillating manner.
Mathematically, isotonic regression can be formulated as an optimization problem in which the goal is to find a monotonic function that minimizes the sum of the squared errors between the predicted and observed values of the target variable.
The optimization problem can be written as follows:
minimize ∑(y_i - f(x_i))^2 subject to f(x_1) ≤ f(x_2) ≤ ... ≤ f(x_n)
where x_i and y_i are the predictor and target values for the i^{th} data point, respectively (with the data sorted so that x_1 ≤ x_2 ≤ ... ≤ x_n), and f is the monotonic function being fit to the data. The constraint ensures that the function is monotonic.
The standard way to solve this optimization problem is the pool adjacent violators algorithm (PAVA), which scans the data in order and repeatedly merges ("pools") adjacent points or blocks whose fitted values violate the monotonicity constraint, replacing them with their mean, until the entire sequence is monotonic.
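To make the pooling idea concrete, here is a minimal sketch of PAVA for the unweighted case. The helper name pava is ours, and this is an illustrative implementation rather than scikit-learn's optimized internal routine:

Python3

import numpy as np

def pava(y):
    # scan left to right, merging adjacent "blocks" whenever their
    # means violate monotonicity; each block stores its running
    # (sum, count) so the pooled means stay exact
    block_sum, block_cnt = [], []
    for v in np.asarray(y, dtype=float):
        block_sum.append(v)
        block_cnt.append(1)
        while len(block_sum) > 1 and (
            block_sum[-2] / block_cnt[-2] > block_sum[-1] / block_cnt[-1]
        ):
            block_sum[-2] += block_sum.pop()
            block_cnt[-2] += block_cnt.pop()
    # expand each pooled mean back over the points it covers
    fitted = []
    for s, c in zip(block_sum, block_cnt):
        fitted.extend([s / c] * c)
    return np.array(fitted)

print(pava([1, 3, 2, 4, 3, 5]))  # [1.  2.5 2.5 3.5 3.5 5. ]

Each point ends up assigned the mean of its block, which is exactly the least-squares monotonic solution.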
Applications of Isotonic Regression
Isotonic regression has a number of applications, including:
- Calibration of predicted probabilities: Isotonic regression can be used to adjust the predicted probabilities produced by a classifier so that they are more accurately calibrated to the true probabilities (a short sketch follows this list).
- Ordinal regression: Isotonic regression can be used to model ordinal variables, which are variables that can be ranked in order (e.g., "low," "medium," and "high").
- Non-parametric regression: Because isotonic regression does not make any assumptions about the functional form of the relationship between the predictor and target variables, it can be used as a non-parametric regression method.
- Imputing missing values: Isotonic regression can be used to impute missing values in a dataset by predicting the missing values based on the surrounding non-missing values.
- Outlier detection: Isotonic regression can be used to identify outliers in a dataset by identifying points that are significantly different from the overall trend of the data.
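As an illustration of the calibration use case mentioned above, here is a minimal sketch using scikit-learn's CalibratedClassifierCV with method="isotonic". The synthetic dataset and the choice of LogisticRegression as the base classifier are assumptions made for the example:

Python3

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic binary classification data, for illustration only
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# method="isotonic" fits an isotonic regression mapping from the
# classifier's raw scores to calibrated probabilities on held-out folds
clf = CalibratedClassifierCV(LogisticRegression(), method="isotonic", cv=3)
clf.fit(X_train, y_train)
print(clf.predict_proba(X_test)[:5])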
In scikit-learn, isotonic regression can be performed using the 'IsotonicRegression' class. This class fits a non-decreasing (by default) function to the data: the fitted values form a piecewise-constant sequence over the training points, and predictions at new points are obtained by linear interpolation between them.
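The class also exposes a few constructor options worth knowing. A brief sketch (the instance name ir_bounded is ours; the parameters are standard scikit-learn ones):

Python3

from sklearn.isotonic import IsotonicRegression

# increasing=False fits a non-increasing function instead of the
# default non-decreasing one; y_min/y_max clip the fitted values
# (handy when the targets are probabilities); out_of_bounds controls
# how predict() treats inputs outside the training range
# ("nan", "clip", or "raise")
ir_bounded = IsotonicRegression(increasing=False, y_min=0.0, y_max=1.0,
                                out_of_bounds="clip")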
Here is an example of how to use the IsotonicRegression class in scikit-learn to perform isotonic regression:
1. Create the sample data with the NumPy library
Python3

import numpy as np

# Sample dataset (no random seed is set, so the exact target values
# will differ from run to run; the outputs below are from one run)
n = 20
x = np.arange(n)
print('Input:\n', x)
y = np.random.randint(0, 20, size=n) + 10 * np.log1p(np.arange(n))
print("Target :\n", y)
Output:
Input:
 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
Target :
 [ 1. 22.93147181 20.98612289 20.86294361 27.09437912 31.91759469
 38.45910149 23.79441542 22.97224577 35.02585093 32.97895273 40.8490665
 39.64949357 45.3905733 39.08050201 43.72588722 31.33213344 36.90371758
 47.44438979 44.95732274]
2. Import IsotonicRegression from sklearn.isotonic and predict the target values
Python3

from sklearn.isotonic import IsotonicRegression

# create an instance of the IsotonicRegression class
ir = IsotonicRegression()

# fit the model and transform the data
y_ir = ir.fit_transform(x, y)
print('Isotonic Regression Predictions :\n', y_ir)
Output:
Isotonic Regression Predictions :
 [ 1. 21.59351277 21.59351277 21.59351277 27.09437912 29.28583934
 29.28583934 29.28583934 29.28583934 34.00240183 34.00240183 39.5616248
 39.5616248 39.5616248 39.5616248 39.5616248 39.5616248 39.5616248
 46.20085626 46.20085626]
This code fits an isotonic regression model to the sample data and makes predictions on the same data. Notice that the predictions form a non-decreasing sequence: wherever the raw targets dip, adjacent values have been pooled into a constant level.
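Because the fitted function interpolates linearly between training points, the model can also predict at inputs it has not seen. A quick sketch, assuming the ir instance fitted above (the query points are arbitrary):

Python3

# values between training points are linearly
# interpolated from the fitted step levels
print(ir.predict([2.5, 7.5, 15.2]))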
3. Let's use linear regression to predict from the same data.
Python3

from sklearn.linear_model import LinearRegression

# create an instance of the LinearRegression class
lr = LinearRegression()

# fit the model to the data
lr.fit(x.reshape(-1, 1), y)

# make predictions using the fitted model
y_lr = lr.predict(x.reshape(-1, 1))
print('Linear Regression Prediction :\n', y_lr)
Output:
Linear Regression Prediction :
 [17.69949296 19.24352614 20.78755933 22.33159252 23.8756257 25.41965889
 26.96369208 28.50772526 30.05175845 31.59579164 33.13982482 34.68385801
 36.2278912 37.77192438 39.31595757 40.85999076 42.40402394 43.94805713
 45.49209032 47.0361235 ]
4. Let's compare by plotting both predictions with matplotlib.
Python3

import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# vertical segments measuring the difference between each
# actual target value and its isotonic prediction
lines = [[[i, y[i]], [i, y_ir[i]]] for i in range(n)]
lc = LineCollection(lines)

plt.figure(figsize=(10, 4))
plt.plot(x, y, '.', markersize=10, label='data')
plt.plot(x, y_ir, '-', label='isotonic regression')
plt.plot(x, y_lr, '-', label='linear regression')
plt.gca().add_collection(lc)
plt.legend()  # add a legend
plt.title("Isotonic Regression")
plt.show()
Output:
[Figure: Isotonic Regression]

Here, the blue dots represent the original target values for each input. The orange line is the isotonic regression fit, which varies monotonically while tracking the actual target values, and the green line is the linear regression fit, the best straight-line fit to the input data.
Comparison with different regression algorithms:
Here is a Python code that demonstrates how isotonic regression is different from other regression techniques using a sample dataset:
Python3

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.isotonic import IsotonicRegression

# Sample dataset
n = 20
x = np.arange(n)
print('Input:\n', x)
y = np.random.randint(0, 20, size=n) + 10 * np.log1p(np.arange(n))
print("Target :\n", y)

# Fit isotonic regression model
ir = IsotonicRegression()
y_ir = ir.fit_transform(x, y)  # fit the model and transform the data

# Fit linear regression model
lr = LinearRegression()
lr.fit(x.reshape(-1, 1), y)
y_lr = lr.predict(x.reshape(-1, 1))

# Fit polynomial regression model of degree 2
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x.reshape(-1, 1))  # transform the features
lr_poly = LinearRegression()
lr_poly.fit(x_poly, y)  # fit a linear model to the transformed features
y_poly = lr_poly.predict(x_poly)

# Plot the results
plt.plot(x, y, 'o', label='data')
plt.plot(x, y_ir, label='isotonic regression')
plt.plot(x, y_lr, label='linear regression')
plt.plot(x, y_poly, label='polynomial regression')
plt.legend()
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Comparison of Regression Techniques')
plt.show()
Output:
[Figure: Comparison of different Regression Techniques]

The first block imports the necessary libraries and generates a sample dataset with twenty data points. The second block fits an isotonic regression model to the data using the IsotonicRegression class from the sklearn library; the fit_transform method both fits the model and transforms the data. The third block fits a linear regression model to the data using the LinearRegression class. The fourth block fits a polynomial regression model by first transforming the data with the PolynomialFeatures class and then fitting a linear regression model to the transformed features. The last block plots the original data along with the three fitted models using the matplotlib library.