How do L1 and L2 regularization prevent overfitting?

Last Updated : 14 May, 2024

Overfitting is a recurring problem in machine learning that harms a model's ability to generalize to new data. Regularization is a useful tactic for addressing it, since it keeps models from becoming so complex that they are tailored too closely to the training set. L1 and L2, two widely used regularization techniques, offer different solutions to this problem. In this article, we will explore how regularization prevents overfitting.

How do we avoid Overfitting?

Overfitting occurs when a machine learning model learns the training data too well, to the extent that it starts to memorize noise and random fluctuations in the data rather than capturing the underlying patterns. This can result in poor performance when the model is applied to new, unseen data. Essentially, it's like a student who memorizes the answers to specific questions without truly understanding the material, and then struggles when faced with new questions or scenarios. Avoiding overfitting is crucial in developing robust and generalizable machine learning models.

To reduce overfitting, various techniques can be applied. These include dropout, which randomly deactivates neurons during training; adaptive regularization, which adjusts the regularization strength based on the data; early stopping, which halts training when validation performance plateaus; experimenting with different architectures; and applying L1 or L2 regularization. Here, we will focus on L1 and L2 regularization.

How do L1 and L2 regularization prevent overfitting?

L1 regularization, or Lasso regularization, introduces a penalty term based on the absolute values of the weights into the model's cost function. This penalty encourages the model to prioritize a smaller set of significant features, aiding in feature selection. By reducing feature complexity, L1 regularization helps prevent overfitting.

We can represent the modified loss function as:

L_{L1} = L_{original} + \lambda \sum_{i=1}^{n}|w_i|

Here,

  • L_{L1} is the new loss function with L1 regularization.
  • L_{original} is the original loss function without regularization.
  • \lambda is the regularization parameter.
  • n is the number of features.
  • w_i are the coefficients of the features.

The term \lambda \sum_{i=1}^{n}|w_i| penalizes large coefficients by adding their absolute values to the loss function.
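
As a quick illustration, the sketch below evaluates this L1-regularized loss with NumPy for a simple linear model. The data, the weights, and the choice of mean squared error as L_{original} are illustrative assumptions, not part of the formula itself.

import numpy as np

# Illustrative data and weights (assumptions for this sketch)
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.3])
lam = 0.1  # regularization parameter (lambda)

# Original loss: mean squared error of a linear model (one common choice)
l_original = np.mean((y - X @ w) ** 2)

# L1 penalty: lambda times the sum of absolute weights
l1_penalty = lam * np.sum(np.abs(w))

l_l1 = l_original + l1_penalty
print(f"L_original = {l_original:.4f}, L1 penalty = {l1_penalty:.4f}, L_L1 = {l_l1:.4f}")

Because the penalty grows with |w_i|, gradient-based training is pushed toward smaller, and often exactly zero, weights.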

L2 regularization, also known as Ridge regularization, incorporates a penalty term proportional to the square of the weights into the model's cost function. This encourages the model to evenly distribute weights across all features, preventing overreliance on any single feature and thereby reducing overfitting.

We can represent the modified loss function as:

L_{L2} = L_{original} + \lambda \sum_{i=1}^{n} w_i^{2}

Here,

  • L_{L2} is the new loss function with L2 regularization.
  • L_{original} is the original loss function without regularization.
  • \lambda is the regularization parameter.
  • n is the number of features.
  • w_i are the coefficients of the features.

The term \lambda \sum_{i=1}^{n} w_{i}^{2} penalizes large coefficients by adding their squared values to the loss function.
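
The same sketch carries over to L2 by squaring the weights in the penalty term (again assuming mean squared error as L_{original}):

import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.3])
lam = 0.1  # regularization parameter (lambda)

l_original = np.mean((y - X @ w) ** 2)

# L2 penalty: lambda times the sum of squared weights
l2_penalty = lam * np.sum(w ** 2)

l_l2 = l_original + l2_penalty
print(f"L_L2 = {l_l2:.4f}")

Unlike the absolute-value penalty, the squared penalty is differentiable everywhere, which is why L2 shrinks weights smoothly rather than cutting them to exactly zero.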

In essence, both L1 and L2 regularization techniques counter overfitting by simplifying the model and promoting more balanced weight distribution across features.

L1 vs L2 Regularization

Advantages

L1 Regularization (Lasso):
  • Feature selection: Encourages sparse models by driving irrelevant feature weights to zero.
  • Robust to outliers: Due to the absolute penalty, L1 regularization is less sensitive to outliers.
  • Interpretable models: Produces simpler, more interpretable models by emphasizing important features.

L2 Regularization (Ridge):
  • Smooths the model: Encourages a more balanced weight distribution across features, reducing over-reliance on any single feature.
  • Better for multicollinear features: Handles multicollinearity well by distributing weights evenly among correlated features.
  • Generally stable: Offers more stability in the presence of correlated predictors.

Disadvantages

L1 Regularization (Lasso):
  • Non-differentiable at zero: Optimization can be harder because the penalty is non-differentiable at zero, requiring specialized techniques.
  • May shrink coefficients too much: In some cases, L1 regularization may excessively shrink coefficients, leading to underfitting.
  • Works poorly with correlated features: May arbitrarily select one feature over another when features are highly correlated.

L2 Regularization (Ridge):
  • No feature selection: Does not drive any weights exactly to zero, leading to less sparse models.
  • Not robust to outliers: Can be sensitive to outliers due to the squared penalty term, potentially affecting model performance.
  • Less interpretable models: Tends to keep all features in the model, which can make interpretation more challenging.
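
To see the contrast from the table in practice, here is a minimal scikit-learn sketch. The synthetic dataset and the alpha value (scikit-learn's name for \lambda) are illustrative assumptions.

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression data where only a few of the 10 features are informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=42)

lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

print("Lasso coefficients:", lasso.coef_.round(2))
print("Ridge coefficients:", ridge.coef_.round(2))
print("Exact zeros (Lasso):", int((lasso.coef_ == 0).sum()))
print("Exact zeros (Ridge):", int((ridge.coef_ == 0).sum()))

With these settings, the Lasso fit typically zeroes out most of the uninformative features (feature selection), while the Ridge fit keeps all ten coefficients small but nonzero.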


