Regression in machine learning
Last Updated : 13 Jan, 2025
Regression in machine learning refers to a supervised learning technique where the goal is to predict a continuous numerical value based on one or more independent features. It finds relationships between variables so that predictions can be made. There are two types of variables in regression:
- Dependent Variable (Target): The variable we are trying to predict, e.g., house price.
- Independent Variables (Features): The input variables that influence the prediction, e.g., locality, number of rooms.
A regression problem arises when the output variable is a real or continuous value, such as “salary” or “weight”. Many different regression models can be used, but the simplest among them is linear regression.
Types of Regression
Regression can be classified into different types based on the number of predictor variables and the nature of the relationship between variables:
1. Simple Linear Regression
Simple linear regression is one of the simplest and most widely used statistical models. It assumes a linear relationship between the independent and dependent variables: the change in the dependent variable is proportional to the change in the independent variable. For example, predicting the price of a house based on its size.
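As a minimal sketch of how this looks in code (the size/price numbers below are made up for illustration, not from a real dataset), scikit-learn fits a simple linear regression like this:
Python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: house size (sq. ft.) vs. price
X = np.array([[800], [1000], [1200], [1500], [1800]])  # one feature, 2D shape
y = np.array([100000, 120000, 150000, 185000, 210000])

model = LinearRegression()
model.fit(X, y)

# The fitted line: price = intercept + slope * size
print(model.intercept_, model.coef_[0])
print(model.predict([[1300]]))  # predicted price for a 1300 sq. ft. house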
2. Multiple Linear Regression
Multiple linear regression extends simple linear regression by using multiple independent variables to predict the target variable. For example, predicting the price of a house based on multiple features such as size, location and number of rooms.
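The same API handles several features at once; a rough sketch, again with hypothetical numbers:
Python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical features per house: [size in sq. ft., number of rooms]
X = np.array([[800, 2], [1000, 3], [1200, 3], [1500, 4], [1800, 4]])
y = np.array([100000, 125000, 150000, 190000, 215000])  # made-up prices

model = LinearRegression().fit(X, y)
# One learned coefficient per feature, plus an intercept
print(model.coef_, model.intercept_)
print(model.predict([[1300, 3]]))  # price for a 1300 sq. ft., 3-room house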
3. Polynomial Regression
Polynomial regression is used to model non-linear relationships between the dependent variable and the independent variables. It adds polynomial terms to the linear regression model to capture more complex relationships. For example, predicting a non-linear trend such as population growth over time.
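One common way to do this in scikit-learn is to feed PolynomialFeatures into an ordinary linear model; the quadratic "population" data below is synthetic, chosen only to illustrate the idea:
Python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical non-linear trend: population over time steps 0..9
t = np.arange(10).reshape(-1, 1)
population = 100 + 5 * t.ravel() + 2 * t.ravel() ** 2  # quadratic growth

# Degree-2 polynomial features turned into inputs for a linear model
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(t, population)
print(model.predict([[12]]))  # extrapolate two steps beyond the data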
4. Ridge & Lasso Regression
Ridge and lasso regression are regularized versions of linear regression that help avoid overfitting by penalizing large coefficients. We use these algorithms when there is a risk of overfitting due to too many features.
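A quick sketch of both on synthetic data where only the first of ten features actually matters; alpha is the regularization strength:
Python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))            # many (hypothetical) features
y = X[:, 0] * 3 + rng.normal(size=50)    # only the first feature matters

# alpha controls how strongly large coefficients are penalized
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can zero out weak ones
print(ridge.coef_.round(2))
print(lasso.coef_.round(2))  # note coefficients driven to exactly 0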
5. Support Vector Regression (SVR)
SVR is a regression algorithm based on the Support Vector Machine (SVM). SVM is primarily used for classification tasks, but it can be adapted for regression. Rather than minimizing squared residuals, SVR finds a function whose predictions stay within a margin (epsilon) of the actual values, ignoring errors inside that tube and penalizing only those that fall outside it.
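A minimal sketch with scikit-learn's SVR on a synthetic noisy sine curve; epsilon sets the width of the error-tolerant tube and C trades off flatness against tube violations:
Python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)  # noisy sine wave

# Errors smaller than epsilon are ignored by the loss
model = SVR(kernel='rbf', C=10.0, epsilon=0.1)
model.fit(X, y)
print(model.predict([[2.5]]))  # should be close to sin(2.5)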
6. Decision Tree Regression
Decision tree regression uses a tree-like structure to make decisions, where each branch of the tree represents a decision and the leaves represent outcomes. For example, predicting customer behavior based on features like age and income.
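A small sketch with DecisionTreeRegressor on hypothetical age/income data (the spend values are invented for illustration):
Python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical customer features: [age, income]
X = np.array([[25, 30000], [35, 50000], [45, 80000],
              [30, 40000], [50, 90000]])
y = np.array([200, 450, 800, 300, 950])  # e.g., yearly spend

# max_depth limits the tree so it doesn't memorize the training data
model = DecisionTreeRegressor(max_depth=3, random_state=0)
model.fit(X, y)
print(model.predict([[40, 60000]]))  # prediction for an unseen customer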
7. Random Forest Regression
Random forest is an ensemble method that builds multiple decision trees, each trained on a different subset of the training data. The final prediction is made by averaging the predictions of all the trees. It is used, for example, to model customer churn or sales data.
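A sketch with RandomForestRegressor on synthetic data; n_estimators is the number of trees whose predictions are averaged:
Python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                     # made-up sales features
y = X[:, 0] * 10 + X[:, 1] ** 2 + rng.normal(size=200)

# 100 trees, each fit on a bootstrap sample; outputs are averaged
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
print(model.predict(X[:3]))  # predictions for the first three rows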
Regression Evaluation Metrics
Evaluation in machine learning measures the performance of a model. Here are some popular evaluation metrics for regression (a small computation sketch follows the list):
- Mean Absolute Error (MAE): The average absolute difference between the predicted and actual values of the target variable.
- Mean Squared Error (MSE): The average squared difference between the predicted and actual values of the target variable.
- Root Mean Squared Error (RMSE): Square root of the mean squared error.
- Huber Loss: A hybrid loss function that transitions from MAE to MSE for larger errors, providing balance between robustness and MSE’s sensitivity to outliers.
- R2 Score: The proportion of variance in the target explained by the model. Higher values indicate a better fit; it typically ranges from 0 to 1, though it can be negative for models that fit worse than simply predicting the mean.
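A small sketch computing these metrics on hypothetical predictions (Huber loss is computed by hand here, since scikit-learn exposes it as a model objective in HuberRegressor rather than as a metric function):
Python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # hypothetical actual values
y_pred = np.array([2.5, 5.0, 3.0, 8.0])   # hypothetical predictions

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

# Huber loss: quadratic for small errors, linear beyond delta
delta = 1.0
err = np.abs(y_true - y_pred)
huber = np.mean(np.where(err <= delta,
                         0.5 * err ** 2,
                         delta * (err - 0.5 * delta)))

print(mae, mse, rmse, r2, huber)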
Regression Model in Machine Learning
Let's take an example of linear regression. We have a housing dataset and we want to predict the price of a house. The following is the Python code for it.
Python
import matplotlib
matplotlib.use('TkAgg')  # general-purpose backend for plots
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import linear_model

# Load dataset
df = pd.read_csv("Housing.csv")

# Extract features and target variable
Y = df['price']
X = df['lotsize']

# Reshape for compatibility with scikit-learn
X = X.to_numpy().reshape(len(X), 1)
Y = Y.to_numpy().reshape(len(Y), 1)

# Split data into training and testing sets
X_train = X[:-250]
X_test = X[-250:]
Y_train = Y[:-250]
Y_test = Y[-250:]

# Plot the test data
plt.scatter(X_test, Y_test, color='black')
plt.title('Test Data')
plt.xlabel('Size')
plt.ylabel('Price')
plt.xticks(())
plt.yticks(())

# Train linear regression model
regr = linear_model.LinearRegression()
regr.fit(X_train, Y_train)

# Plot predictions
plt.plot(X_test, regr.predict(X_test), color='red', linewidth=3)
plt.show()
Output:
[Scatter plot of the test data with the fitted regression line]
The graph shows the test data as black points; the red line is the best-fit line the model uses to predict price.
To make an individual prediction using the linear regression model:
print("Predicted price for a lot size of 5000: " + str(round(regr.predict([[5000]])[0][0])))
Applications of Regression
- Predicting prices: Used to predict the price of a house based on its size, location and other features.
- Forecasting trends: Used to forecast the sales of a product based on historical sales data.
- Identifying risk factors: Used to identify risk factors for heart disease based on patient medical data.
- Making decisions: It could be used to recommend which stock to buy based on market data.
Advantages of Regression
- Easy to understand and interpret.
- Some variants, such as Huber loss or tree-based regressors, are robust to outliers (ordinary least squares itself is sensitive to them).
- Can capture linear relationships easily and, with extensions such as polynomial or tree-based regression, non-linear ones.
Disadvantages of Regression
- Linear models assume linearity, which real-world data often violates.
- Sensitive to multicollinearity, i.e., situations where two or more independent variables are highly correlated with each other.
- May not be suitable for highly complex relationships.
Conclusion
Regression in machine learning is a fundamental technique for predicting continuous outcomes from input features. It is used in many real-world applications such as price prediction, trend analysis and risk assessment. With its simplicity and effectiveness, regression remains a core tool for understanding relationships in data.