Robust Regression for Machine Learning in Python

Last Updated : 30 Dec, 2022

Linear regression finds the best-fit line describing the linear relationship between the input variables (denoted by X) and the target variable (denoted by y). Because ordinary least squares minimizes squared residuals, a few outliers, which are common in real-world datasets, can pull the fitted line away from the bulk of the data and bias the model. Robust regression was introduced to overcome this limitation. In this article, we will look at several regression models that are robust to outliers.

One of the most widely used algorithms for robust regression is Random Sample Consensus (RANSAC). It is an iterative, non-deterministic method for estimating model parameters from a set of observed data that contains outliers. The outliers are meant to have no influence on the estimated values, so RANSAC can also be interpreted as an outlier detection method. It produces a reasonable result only with a certain probability, and this probability increases as more iterations are allowed.
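The iterative idea described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not scikit-learn's implementation: the function name `ransac_line`, its parameters, and the synthetic data are all made up for this example.

```python
import numpy as np


def ransac_line(x, y, n_trials=200, sample_size=2, threshold=1.0, seed=0):
    """Toy RANSAC for a line y = a*x + b. Returns (a, b, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(x), dtype=bool)
    for _ in range(n_trials):
        # 1. Fit a candidate line to a small random subset of the data.
        idx = rng.choice(len(x), size=sample_size, replace=False)
        if np.ptp(x[idx]) == 0:
            continue  # degenerate sample: identical x values
        a, b = np.polyfit(x[idx], y[idx], deg=1)
        # 2. Points whose residual is below the threshold form the consensus set.
        mask = np.abs(y - (a * x + b)) < threshold
        # 3. Keep the candidate with the largest consensus set.
        if mask.sum() > best_mask.sum():
            best_mask = mask
    # 4. Refit on all inliers of the best candidate.
    a, b = np.polyfit(x[best_mask], y[best_mask], deg=1)
    return a, b, best_mask


# Synthetic data (hypothetical): y = 2x + 1 with small noise, plus 10 gross outliers.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + rng.normal(scale=0.2, size=100)
y[:10] += 30

a, b, inliers = ransac_line(x, y)
print("slope:", a, "intercept:", b, "inliers:", inliers.sum())
```

More trials raise the chance that at least one random subset contains only inliers, which is why the probability of a reasonable result grows with the number of iterations.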

Importing Libraries and Dataset

Here we will import a dataset and use it with some robust linear regression models. Python libraries make it easy to handle the data and to carry out both routine and complex tasks in a few lines of code.

  • Pandas – This library loads the data into a 2D data frame and provides many functions for analysis tasks.
  • Numpy – Numpy arrays are very fast and can perform large computations in a very short time.
  • Matplotlib/Seaborn – These libraries are used to draw visualizations.
  • Sklearn – This library contains multiple modules with pre-implemented functions for tasks ranging from data preprocessing to model development and evaluation.

Python3

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# metrics is needed later for metrics.mean_absolute_error
from sklearn import datasets, metrics
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, RANSACRegressor
from sklearn.metrics import r2_score, mean_squared_error

import warnings
warnings.filterwarnings('ignore')

The sklearn.datasets module ships several small example datasets that are handy for testing ideas and for illustrations.

Python3

# Load the Boston Housing dataset
# for training
# Note: load_boston was removed in scikit-learn 1.2,
# so this call needs an older release
boston = datasets.load_boston()

# Load the columns present in the dataset
df = pd.DataFrame(boston.data)
df.columns = ['CRIM', 'ZN', 'INDUS', 'CHAS',
              'NOX', 'RM', 'AGE', 'DIS', 'RAD',
              'TAX', 'PTRATIO', 'B', 'LSTAT']
# Set target as column "MEDV"
df['MEDV'] = boston.target

# Select Avg. No of rooms per dwelling
# as the only feature
X = df['RM'].to_numpy().reshape(-1, 1)
y = df['MEDV'].to_numpy().reshape(-1, 1)

RANSAC Regressor

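This model repeatedly fits the base estimator on a small random subset of the data, labels every point whose residual is below a threshold as an inlier, and keeps the fit with the largest set of inliers. The final model is then trained on those inliers only. Training the model this way helps it learn the underlying pattern instead of the noise.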

Python3

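# Create a model
# Note: base_estimator= and loss='absolute_loss' are the spellings
# used by older scikit-learn releases; newer releases use
# estimator= and loss='absolute_error' instead.
# min_samples: points drawn per trial, max_trials: number of trials,
# residual_threshold: largest residual still counted as an inlier
model = RANSACRegressor(base_estimator=LinearRegression(),
                        min_samples=50, max_trials=100,
                        loss='absolute_loss', random_state=42,
                        residual_threshold=10)

# Fit the model
model.fit(X, y)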

Output:

RANSACRegressor(base_estimator=LinearRegression(), loss='absolute_loss',
                min_samples=50, random_state=42, residual_threshold=10)

Now let’s check the mean absolute error of the model.

Python3

y_pred = model.predict(X)
print(metrics.mean_absolute_error(y, y_pred))
                      
                       

Output:

4.475672221331006
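After fitting, RANSACRegressor exposes which samples it treated as inliers through the `inlier_mask_` attribute, and the final fitted line through `estimator_`. The sketch below illustrates this on synthetic data rather than the housing data. The data, the threshold of 2.0, and the `_demo` variable names are assumptions made for the example, and the default base estimator (a linear regression) is used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Synthetic data (hypothetical): y = 3x + 2 with noise, first 20 targets corrupted.
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 10, 200).reshape(-1, 1)
y_demo = 3.0 * X_demo.ravel() + 2.0 + rng.normal(scale=0.5, size=200)
y_demo[:20] += 40.0

# Ordinary least squares is pulled towards the corrupted points.
ols = LinearRegression().fit(X_demo, y_demo)
ols_slope = ols.coef_[0]

# RANSAC with the default base estimator (linear regression).
ransac = RANSACRegressor(residual_threshold=2.0, random_state=0)
ransac.fit(X_demo, y_demo)

inlier_mask = ransac.inlier_mask_          # True for samples used in the final fit
ransac_slope = ransac.estimator_.coef_[0]  # slope of the line refit on the inliers

print("OLS slope:   ", ols_slope)
print("RANSAC slope:", ransac_slope)
print("flagged as outliers:", int((~inlier_mask).sum()))
```

Plotting the points with `inlier_mask` as the colour is a quick way to check whether `residual_threshold` is separating the two groups sensibly.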

Theil Sen Regressor

This model is somewhat similar to a random forest, in which we train multiple decision trees and average their results to reduce overfitting. The Theil-Sen regressor likewise fits many least-squares regressions on small subsets of the training data and then combines their coefficients. The combination step uses the spatial median of the coefficients (the median generalized to several dimensions) rather than a mean, and this is exactly the step where the model becomes robust: coefficients from subsets contaminated by outliers do not drag the median the way they would drag an average.

Python3

from sklearn.linear_model import TheilSenRegressor
  
# Create a model
model = TheilSenRegressor(random_state=42)
  
# Fit the model
model.fit(X, y)
                      
                       

Output:

TheilSenRegressor(max_subpopulation=10000, random_state=42)

Now let’s check the mean absolute error of the model.

Python3

y_pred = model.predict(X)
print(metrics.mean_absolute_error(y, y_pred))
                      
                       

Output:

4.442032221450043
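For a single feature, the classical form of the Theil-Sen estimator is the median of the slopes between all pairs of points. The sketch below shows that special case only and is not how TheilSenRegressor is implemented internally. The function name `theil_sen_1d` and the synthetic data are assumptions made for this illustration.

```python
from itertools import combinations

import numpy as np


def theil_sen_1d(x, y):
    """Classical single-feature Theil-Sen: median of all pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[i] != x[j]
    ]
    slope = np.median(slopes)
    # A common choice for the intercept: median of the residual offsets.
    intercept = np.median(y - slope * x)
    return slope, intercept


# Synthetic data (hypothetical): y = 1.5x + 4 with noise, last 5 targets corrupted.
rng = np.random.default_rng(0)
x = np.arange(30, dtype=float)
y = 1.5 * x + 4.0 + rng.normal(scale=0.2, size=30)
y[-5:] -= 50.0

ts_slope, ts_intercept = theil_sen_1d(x, y)
ols_slope, ols_intercept = np.polyfit(x, y, deg=1)

print("Theil-Sen slope:", ts_slope)
print("OLS slope:      ", ols_slope)
```

Most pairs of points contain no outlier, so the median slope stays near the true value, while the least-squares slope is dragged by the corrupted targets.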

Huber Regressor

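This model minimizes the Huber loss, which is quadratic for small residuals and linear for large ones. Samples whose scaled residual exceeds the threshold epsilon (1.35 by default in HuberRegressor) are treated as outliers and contribute less to the fit than they would under squared error, although they are not discarded entirely. This again helps the model learn the underlying pattern rather than the noise present in the data.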

Python3

from sklearn.linear_model import HuberRegressor
  
# Create a model
model = HuberRegressor()
  
# Fit the model
model.fit(X, y)
                      
                       

Now let’s check the mean absolute error of the model.

Python3

y_pred = model.predict(X)
print(metrics.mean_absolute_error(y, y_pred))
                      
                       

Output:

4.437123637682936
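The reason large residuals carry less weight under the Huber loss can be seen by evaluating the loss directly. The sketch below uses the textbook form of the loss with a fixed threshold `delta`. HuberRegressor optimizes a related objective that also estimates a scale parameter, so treat this as an illustration rather than the library's exact objective. The function name `huber_loss` is made up for this example.

```python
import numpy as np


def huber_loss(residual, delta=1.35):
    """Textbook Huber loss: quadratic up to |r| = delta, linear beyond it."""
    abs_r = np.abs(residual)
    quadratic = 0.5 * abs_r ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, quadratic, linear)


residuals = np.array([0.5, 1.0, 5.0, 50.0])
huber_values = huber_loss(residuals)
squared_values = 0.5 * residuals ** 2

for r, h, s in zip(residuals, huber_values, squared_values):
    print(f"residual={r:5.1f}  huber={h:10.5f}  half squared={s:10.3f}")
```

For the two small residuals both losses agree, while for a residual of 50 the Huber loss grows only linearly and stays far below the squared-error value.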

If your dataset contains a lot of outliers, train the robust regression models described above and choose the best one by comparing their performance on a validation dataset.
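One way to carry out this comparison is sketched below. It uses synthetic data instead of the housing data, and it corrupts only the training targets so that the validation error measures how well each model recovers the clean relationship. Both choices, along with the variable names, are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import (HuberRegressor, LinearRegression,
                                  RANSACRegressor, TheilSenRegressor)
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): y = 4x - 3 with unit-variance noise.
rng = np.random.default_rng(42)
X_syn = rng.uniform(0, 10, size=(300, 1))
y_syn = 4.0 * X_syn.ravel() - 3.0 + rng.normal(scale=1.0, size=300)

X_train, X_val, y_train, y_val = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42)

# Corrupt 15% of the training targets to simulate outliers.
y_train = y_train.copy()
n_bad = int(0.15 * len(y_train))
bad_idx = rng.choice(len(y_train), size=n_bad, replace=False)
y_train[bad_idx] += 60.0

models = {
    "LinearRegression": LinearRegression(),
    "RANSACRegressor": RANSACRegressor(random_state=42),
    "TheilSenRegressor": TheilSenRegressor(random_state=42),
    "HuberRegressor": HuberRegressor(),
}

# Fit every model on the corrupted training set, score on the validation set.
scores = {}
for name, candidate in models.items():
    candidate.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_val, candidate.predict(X_val))

best_name = min(scores, key=scores.get)
for name, mae in scores.items():
    print(f"{name:18s} validation MAE = {mae:.3f}")
print("best model:", best_name)
```

If the validation set itself contains outliers, a metric that is less sensitive to them, such as the median absolute error, may give a fairer ranking than the mean absolute error.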



zawaresumedha