Feature Selection Techniques in Machine Learning

Last Updated : 11 Feb, 2025

In data science we often encounter datasets with a vast number of features, but not all of them contribute equally to prediction. This is where feature selection comes in: it helps us choose the important features while discarding the rest. In this article we will learn more about feature selection and its techniques.

Feature Selection Foundation

Feature selection is an important step in machine learning. It involves selecting a subset of relevant features from the original feature set to reduce the feature space, improve the model’s performance and cut computational cost. It is especially critical when dealing with high-dimensional data.

In real-world machine learning tasks, not all features in a dataset contribute equally to model performance. Some features may be redundant, irrelevant or even noisy. Feature selection removes these, improving the model’s accuracy and interpretability.

The algorithms used for feature selection are grouped into three main categories:

  1. Filter Methods
  2. Wrapper Methods
  3. Embedded Methods

Each one has its own strengths and trade-offs depending on the use case.

1. Filter Methods

Filter methods evaluate each feature independently against the target variable. Features with a high correlation to the target are selected, since such a relationship suggests the feature can help in making predictions. These methods are used in the preprocessing phase to remove irrelevant or redundant features based on statistical tests (such as correlation) or other criteria.


Advantages:

  • Fast and inexpensive: Can quickly evaluate features without training the model.
  • Good for removing redundant or correlated features.

Limitations: These methods don’t consider feature interactions, so they may miss feature combinations that improve model performance.

Some techniques used are:  

  • Information Gain – Measures the amount of information a feature provides about the target value, i.e. the reduction in entropy achieved by knowing that feature. The information gain of each attribute is calculated with respect to the target, and high-gain features are preferred.
  • Chi-square test – The chi-square statistic (χ²) is generally used to test the relationship between categorical variables. It compares the observed frequencies of an attribute against the frequencies expected if the attribute and the target were independent:

\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}, where O_i is the observed frequency and E_i is the expected frequency.

  • Fisher’s Score – Scores each feature independently according to the Fisher criterion; because features are evaluated in isolation, the resulting set can be suboptimal. The larger the Fisher’s score, the more discriminative the feature.
  • Correlation Coefficient – Pearson’s correlation coefficient quantifies the strength and direction of the association between two continuous variables, with values ranging from -1 to 1.
  • Variance Threshold – Removes all features whose variance does not meet a specified threshold; by default, features with zero variance are removed. The underlying assumption is that higher-variance features are likely to carry more information.
  • Mean Absolute Difference (MAD) – Similar to the variance threshold, but without the squaring: it computes the mean absolute deviation of each feature’s values from their mean.
  • Dispersion Ratio – The ratio of the arithmetic mean (AM) to the geometric mean (GM) of a feature. Since AM ≥ GM, its value ranges from 1 to ∞, and a higher dispersion ratio implies a more relevant feature. A minimal scikit-learn sketch applying two of these filters (variance threshold and the chi-square test) follows this list.
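
As a concrete illustration, here is a minimal sketch of two of these filters using scikit-learn’s VarianceThreshold and SelectKBest with the chi-square test. The Iris dataset, the 0.2 threshold and k=2 are illustrative assumptions, not part of the original article.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

X, y = load_iris(return_X_y=True)  # 4 non-negative, continuous features

# Variance threshold: drop features whose variance is below 0.2
# (the 0.2 value is an arbitrary illustrative choice)
vt = VarianceThreshold(threshold=0.2)
X_vt = vt.fit_transform(X)
print("Kept after variance threshold:", vt.get_support(indices=True))

# Chi-square filter: keep the k=2 features most dependent on the target
# (chi2 requires non-negative feature values, which Iris satisfies)
skb = SelectKBest(score_func=chi2, k=2)
X_chi2 = skb.fit_transform(X, y)
print("Chi-square scores:", skb.scores_)
print("Kept after chi-square test:", skb.get_support(indices=True))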

2. Wrapper methods

Wrapper methods are greedy algorithms that train a model on different combinations of features and evaluate how well each subset predicts the target variable; based on the results, features are added or removed. The stopping criterion for selecting the best subset is usually pre-defined by the person training the model, such as when the model’s performance starts to decrease or a specific number of features has been reached.


Advantages:

  • Can lead to better model performance since they evaluate feature subsets in the context of the model.
  • They can capture feature dependencies and interactions.

Limitations: They are computationally more expensive than filter methods, especially for large datasets.

Some techniques used are:

  • Forward selection – An iterative approach that starts with an empty set of features and, at each iteration, adds the feature that most improves the model. It stops when adding a new variable no longer improves performance.
  • Backward elimination – Also iterative, but starts with all features and removes the least significant feature at each iteration. It stops when removing a feature no longer improves the model’s performance.
  • Recursive Feature Elimination (RFE) – A greedy optimization method that recursively considers smaller and smaller sets of features. The estimator is trained on the current set and feature importances are obtained (e.g. from the feature_importances_ or coef_ attribute); the least important features are then removed, and the process repeats until the required number of features remains. A minimal RFE sketch follows this list.
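
As a sketch of a wrapper method in practice, the following applies scikit-learn’s RFE to the Iris data; the logistic-regression estimator and n_features_to_select=2 are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# RFE repeatedly fits the estimator, ranks features by the magnitude
# of their coefficients, and drops the weakest until 2 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=2, step=1)
rfe.fit(X, y)

print("Selected features mask:", rfe.support_)  # True = kept
print("Feature ranking:", rfe.ranking_)         # 1 = selected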

3. Embedded methods

Embedded methods perform feature selection during the model training process, combining the benefits of both filter and wrapper methods. Because selection is integrated into training, the model picks out the most relevant features dynamically as it learns.


Advantages:

  • More efficient than wrapper methods because the feature selection process is embedded within model training.
  • Often more scalable than wrapper methods.

Limitations: The selection is tied to a specific learning algorithm, so the chosen features might not transfer well to other models.

Some techniques used are:

  • L1 Regularization (Lasso): A regression method that applies an L1 penalty to encourage sparsity: coefficients of uninformative features are driven to exactly zero, so features with non-zero coefficients are considered important (see the sketch after this list).
  • Decision Trees and Random Forests: These algorithms naturally perform feature selection by selecting the most important features for splitting nodes based on criteria like Gini impurity or information gain.
  • Gradient Boosting: Like random forests, gradient boosting models select important features while building trees, prioritizing the features that reduce error the most.
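
A minimal sketch of embedded selection via L1 regularization, assuming scikit-learn’s Lasso on synthetic regression data; the alpha value and the dataset parameters are illustrative choices.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=42)

# The L1 penalty shrinks coefficients of uninformative features to exactly zero
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

selected = np.flatnonzero(lasso.coef_)  # indices with non-zero coefficients
print("Coefficients:", np.round(lasso.coef_, 2))
print("Selected feature indices:", selected)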

Choosing the Right Feature Selection Method

The choice of feature selection method depends on several factors:

  • Dataset Size: Filter methods are often preferred for very large datasets due to their speed.
  • Feature Interactions: Wrapper and embedded methods are better for capturing complex feature interactions.
  • Model Type: Some methods, such as Lasso or decision-tree importances, are tied to specific model families (linear models and tree-based models, respectively).

For example, filter methods like correlation or variance threshold are excellent when we have a lot of features and want to remove irrelevant ones quickly. However, if we want to maximize model performance and have the computational resources, we might explore wrapper methods like RFE or embedded methods like Lasso.

Feature selection is a critical step in building efficient and accurate machine learning models. By choosing the right features we can improve a model’s accuracy, reduce overfitting and make it more interpretable. Each feature selection method has its strengths and weaknesses, and understanding them helps us choose the right approach for our dataset and task.


