Imbalanced-Learn module in Python

Last Updated : 11 Dec, 2020

Imbalanced-Learn is a Python module that helps balance datasets which are highly skewed or biased towards some classes. It does so by resampling the classes that are over-represented or under-represented. When the imbalance ratio is high, a model trained on the data tends to be biased towards the class which has the higher number of examples. The following dependencies need to be installed to use imbalanced-learn:

  • scipy(>=0.19.1)
  • numpy(>=1.13.3)
  • scikit-learn(>=0.23)
  • joblib(>=0.11)
  • keras 2 (optional)
  • tensorflow (optional)

To install imbalanced-learn, run:

pip install imbalanced-learn

The resampling of data is done in 2 parts: 

Estimator: It implements a fit method which is derived from scikit-learn. The data is a 2D array with the shape of n_samples*n_features and the targets are a 1D array of n_samples labels.

estimator = obj.fit(data, targets)

Resampler: The fit_resample method resamples the data and targets and returns them as a pair of arrays, data_resampled and targets_resampled.

data_resampled, targets_resampled = obj.fit_resample(data, targets)

The Imbalanced-Learn module provides different algorithms for oversampling and undersampling.

We will use a synthetic dataset generated with the make_classification function of scikit-learn, which returns

  • x: a matrix of n_samples*n_features and 
  • y: an array of integer labels.


Python3




# import required modules
from sklearn.datasets import make_classification
  
# define dataset
x, y = make_classification(n_samples=10000, 
                           weights=[0.99], 
                           flip_y=0)
print('x:\n', x)
print('y:\n', y)
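Printing the raw arrays does not make the imbalance easy to see. A small sketch that counts the samples per class is given here; random_state=1 is an added assumption so that repeated runs produce the same data.

```python
from collections import Counter
from sklearn.datasets import make_classification

# random_state is an assumption added for reproducibility
x, y = make_classification(n_samples=10000,
                           weights=[0.99],
                           flip_y=0,
                           random_state=1)

# count how many samples belong to each class
counts = Counter(y)
print('shape of x:', x.shape)
print('samples per class:', counts)
print('imbalance ratio: %.1f' % (max(counts.values()) / min(counts.values())))
```

With weights=[0.99] and flip_y=0, roughly 99% of the samples belong to class 0 and the rest to class 1.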
 
 


Below are some programs which show how to apply oversampling and undersampling to the dataset:

Oversampling

  • Random Over Sampler: It is a naive method which balances the classes by randomly picking samples, with replacement, from the classes that have few examples and adding the copies to the dataset.

Syntax:

from imblearn.over_sampling import RandomOverSampler

Parameters(optional): sampling_strategy='auto', random_state=None (older releases also accepted return_indices and ratio, which have since been removed)

Implementation:
oversample = RandomOverSampler(sampling_strategy='minority')
x_over, y_over = oversample.fit_resample(x, y)

Return Type: a pair of arrays, the resampled data with the shape of n_samples_new*n_features and the resampled labels of length n_samples_new

Example:

Python3




# import required modules
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler
  
# define dataset
x, y = make_classification(n_samples=10000, 
                           weights=[0.99], 
                           flip_y=0)
  
oversample = RandomOverSampler(sampling_strategy='minority')
x_over, y_over = oversample.fit_resample(x, y)
  
# print the features and the labels
print('x_over:\n', x_over)
print('y_over:\n', y_over)
 
 


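  • SMOTE, ADASYN: Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) are 2 methods used in oversampling. Instead of duplicating samples, both generate new synthetic samples for the classes that have few examples by interpolating between a sample and its nearest neighbours. ADASYN additionally takes the local density of the distribution into account and generates more samples around the minority points which have many neighbours from the other class.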

Syntax:

from imblearn.over_sampling import SMOTE, ADASYN

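Parameters(optional): sampling_strategy='auto', random_state=None, k_neighbors=5 (SMOTE) or n_neighbors=5 (ADASYN), n_jobs=None

Implementation:
smote = SMOTE(sampling_strategy='minority')
x_smote, y_smote = smote.fit_resample(x, y)

Return Type: a pair of arrays, the resampled data with the shape of n_samples_new*n_features and the resampled labels of length n_samples_new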

Example:

Python3




# import required modules
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
  
# define dataset
x, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0)
smote = SMOTE()
x_smote, y_smote = smote.fit_resample(x, y)
  
# print the features and the labels
print('x_smote:\n', x_smote)
print('y_smote:\n', y_smote)
 
 


Undersampling

  • Edited Nearest Neighbours: This algorithm removes any sample whose label differs from the labels of its nearest neighbours.

Syntax:

from imblearn.under_sampling import EditedNearestNeighbours

Parameters(optional): sampling_strategy='auto', n_neighbors=3, kind_sel='all', n_jobs=None

Implementation:
en = EditedNearestNeighbours()
x_en, y_en = en.fit_resample(x, y)

Return Type: a pair of arrays, the resampled data with the shape of n_samples_new*n_features and the resampled labels of length n_samples_new

Example:

Python3




# import required modules
from sklearn.datasets import make_classification
from imblearn.under_sampling import EditedNearestNeighbours
  
# define dataset
x, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0)
en = EditedNearestNeighbours()
x_en, y_en = en.fit_resample(x, y)
  
# print the features and the labels
print('x_en:\n', x_en)
print('y_en:\n', y_en)
 
 


  • Random Under Sampler: It balances the classes by randomly selecting a subset of the samples from the classes that have many examples, with or without replacement.

Syntax:

from imblearn.under_sampling import RandomUnderSampler

Parameters(optional): sampling_strategy='auto', random_state=None, replacement=False

Implementation:
undersample = RandomUnderSampler()
x_under, y_under = undersample.fit_resample(x, y)

Return Type: a pair of arrays, the resampled data with the shape of n_samples_new*n_features and the resampled labels of length n_samples_new

Example:

Python3




# import required modules
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler
  
# define dataset
x, y = make_classification(n_samples=10000, 
                           weights=[0.99], 
                           flip_y=0)
undersample = RandomUnderSampler()
x_under, y_under = undersample.fit_resample(x, y)
  
# print the features and the labels
print('x_under:\n', x_under)
print('y_under:\n', y_under)
 
 




sangy987