Imbalanced-Learn module in Python
Last Updated : 11 Dec, 2020
Imbalanced-Learn is a Python module that helps balance datasets that are highly skewed or biased towards some classes. It does this by resampling: oversampling the classes that are under-represented or undersampling the classes that are over-represented. The greater the imbalance ratio, the more a model's output is biased towards the class that has the higher number of examples. The following dependencies need to be installed to use imbalanced-learn:
- scipy(>=0.19.1)
- numpy(>=1.13.3)
- scikit-learn(>=0.23)
- joblib(>=0.11)
- keras 2 (optional)
- tensorflow (optional)
To install imbalanced-learn, run:
pip install imbalanced-learn
The resampling of data is done in 2 parts:
Estimator: It implements a fit method, which is derived from scikit-learn. The data is a 2D array of shape n_samples*n_features and the targets are a 1D array of n_samples labels.
estimator = obj.fit(data, targets)
Resampler: The fit_resample method resamples the data and targets and returns them as a pair (a tuple) of data_resampled and targets_resampled.
data_resampled, targets_resampled = obj.fit_resample(data, targets)
The Imbalanced Learn module has different algorithms for oversampling and undersampling.
We will use make_classification, a scikit-learn function that generates a synthetic classification dataset and returns
- x: a matrix of n_samples*n_features and
- y: an array of integer labels.
Python3
from sklearn.datasets import make_classification

# Generate 10,000 samples, of which about 99% belong to class 0
x, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0)

print('x:\n', x)
print('y:\n', y)
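Before resampling, it helps to check how skewed the labels actually are. The sketch below counts the samples per class and computes the imbalance ratio, taken here as the majority count divided by the minority count. The list y_demo is a hypothetical stand-in for y so that the snippet runs without scikit-learn; with the generated dataset, y can be passed to Counter instead.

```python
from collections import Counter

# Hypothetical labels mimicking a 99:1 split (stand-in for the generated y)
y_demo = [0] * 9900 + [1] * 100

# Count how many samples each class has
counts = Counter(y_demo)
majority = max(counts.values())
minority = min(counts.values())

# Imbalance ratio: size of the largest class over size of the smallest
imbalance_ratio = majority / minority

print('class counts:', dict(counts))
print('imbalance ratio:', imbalance_ratio)
```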
Below are some programs that show how to apply oversampling and undersampling to the dataset:
Oversampling
- Random Over Sampler: It is a naive method that balances the dataset by randomly picking samples from the classes that have few examples and duplicating them (sampling with replacement).
Syntax:
from imblearn.over_sampling import RandomOverSampler
Parameters(optional): sampling_strategy='auto', random_state=None
Implementation:
oversample = RandomOverSampler(sampling_strategy='minority')
X_oversample, y_oversample = oversample.fit_resample(X, y)
Return Type: a pair (X_oversample, y_oversample), where X_oversample is a matrix with the shape of n_samples_new*n_features and y_oversample is the array of the corresponding n_samples_new labels
Example:
Python3
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler

x, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0)

# Duplicate minority-class samples until both classes have the same size
oversample = RandomOverSampler(sampling_strategy='minority')
x_over, y_over = oversample.fit_resample(x, y)

print('x_over:\n', x_over)
print('y_over:\n', y_over)
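To make the idea behind random oversampling concrete, here is a minimal standard-library sketch of the technique. It is not the imbalanced-learn implementation: the function random_oversample and the toy data X_demo and y_demo are hypothetical names introduced only for illustration. It duplicates randomly chosen samples until every class is as large as the largest one, which for two classes amounts to filling up the minority class.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # Indices of the samples that belong to this class
        indices = [i for i, lab in enumerate(y) if lab == label]
        # Draw the missing samples with replacement from the existing ones
        for _ in range(target - count):
            i = rng.choice(indices)
            X_res.append(X[i])
            y_res.append(label)
    return X_res, y_res

# Hypothetical toy data: four samples of class 0 and one of class 1
X_demo = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.5, 0.5], [2.0, 2.1]]
y_demo = [0, 0, 0, 0, 1]

X_over, y_over = random_oversample(X_demo, y_demo)
print(Counter(y_over))
```

Because the new rows are exact copies, no new information is added; the classes are only re-weighted.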
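- SMOTE, ADASYN: Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) are 2 methods used in oversampling. Instead of duplicating existing samples, both generate new synthetic minority samples by interpolating between a minority sample and its nearest neighbours. ADASYN also takes the local density of the distribution into account: it generates more synthetic samples around minority points whose neighbourhood is dominated by the majority class, since those are harder to learn.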
Syntax:
from imblearn.over_sampling import SMOTE, ADASYN
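Parameters(optional): sampling_strategy='auto', random_state=None, n_jobs=None, and the number of neighbours, which is k_neighbors=5 for SMOTE and n_neighbors=5 for ADASYN
Implementation:
smote = SMOTE(sampling_strategy='minority')
X_smote, y_smote = smote.fit_resample(X, y)
Return Type: a pair (X_smote, y_smote), where X_smote is a matrix with the shape of n_samples_new*n_features and y_smote is the array of the corresponding n_samples_new labels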
Example:
Python3
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

x, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0)

# Create synthetic minority samples until both classes have the same size
smote = SMOTE()
x_smote, y_smote = smote.fit_resample(x, y)

print('x_smote:\n', x_smote)
print('y_smote:\n', y_smote)
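The interpolation step at the heart of SMOTE can be sketched with the standard library alone. The code below is a simplified illustration and not the imbalanced-learn implementation: the function smote_like_samples, its parameter k and the toy points in minority_demo are hypothetical names chosen for this example. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random

def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position on the segment between base and neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class points
minority_demo = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]

new_points = smote_like_samples(minority_demo, n_new=3)
print(new_points)
```

Every synthetic point lies between two existing minority points, so the new samples stay inside the region already occupied by the minority class.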
Undersampling
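- Edited Nearest Neighbours: This algorithm cleans the dataset by removing any sample whose label differs from the labels of its nearest neighbours. By default, only samples from the classes other than the minority class are removed.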
Syntax:
from imblearn.under_sampling import EditedNearestNeighbours
Parameters(optional): sampling_strategy='auto', n_neighbors=3, kind_sel='all', n_jobs=None
Implementation:
en = EditedNearestNeighbours()
X_en, y_en = en.fit_resample(X, y)
Return Type: a pair (X_en, y_en), where X_en is a matrix with the shape of n_samples_new*n_features and y_en is the array of the corresponding n_samples_new labels
Example:
Python3
from sklearn.datasets import make_classification
from imblearn.under_sampling import EditedNearestNeighbours

x, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0)

# Remove majority-class samples whose nearest neighbours disagree with their label
en = EditedNearestNeighbours()
x_en, y_en = en.fit_resample(x, y)

print('x_en:\n', x_en)
print('y_en:\n', y_en)
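The cleaning rule can be illustrated with a small standard-library sketch. It is a simplified illustration under stated assumptions, not the imbalanced-learn implementation: the function edited_nearest_neighbours, its parameters majority_label and k, and the toy data are hypothetical. A majority-class sample is kept only if all of its k nearest neighbours share its label, which mirrors the behaviour described for kind_sel='all'.

```python
import math
from collections import Counter

def edited_nearest_neighbours(X, y, majority_label, k=3):
    """Drop majority samples that have any neighbour with a different label."""
    keep = []
    for i, (point, label) in enumerate(zip(X, y)):
        if label != majority_label:
            keep.append(i)  # samples of other classes are never removed here
            continue
        # Indices of the k nearest other samples
        order = sorted((j for j in range(len(X)) if j != i),
                       key=lambda j: math.dist(point, X[j]))[:k]
        # Keep the sample only if all k neighbours share its label
        if all(y[j] == label for j in order):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical toy data: one class-0 point at x=5.0 sits inside the class-1 cluster
X_demo = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0], [0.4, 0.0],
          [5.0, 0.0], [4.9, 0.0], [5.1, 0.0], [5.2, 0.0]]
y_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_en, y_en = edited_nearest_neighbours(X_demo, y_demo, majority_label=0)
print(Counter(y_en))
```

Unlike random undersampling, this approach does not aim for a fixed class ratio; it only removes samples that look noisy or lie on the class boundary.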
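- Random Under Sampler: It balances the dataset by randomly selecting a subset of samples from the targeted classes (by default, every class except the minority class), with or without replacement.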
Syntax:
from imblearn.under_sampling import RandomUnderSampler
Parameters(optional): sampling_strategy='auto', random_state=None, replacement=False
Implementation:
undersample = RandomUnderSampler()
X_under, y_under = undersample.fit_resample(X, y)
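Return Type: a pair (X_under, y_under), where X_under is a matrix with the shape of n_samples_new*n_features and y_under is the array of the corresponding n_samples_new labels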
Example:
Python3
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

x, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0)

# Randomly drop majority-class samples until both classes have the same size
undersample = RandomUnderSampler()
x_under, y_under = undersample.fit_resample(x, y)

print('x_under:\n', x_under)
print('y_under:\n', y_under)
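Random undersampling can likewise be sketched with the standard library. This is an illustrative sketch, not the imbalanced-learn implementation: the function random_undersample and the toy data are hypothetical. It keeps a random subset of each class, as large as the smallest class, and samples without replacement, which corresponds to replacement=False.

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Randomly keep only as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        indices = [i for i, lab in enumerate(y) if lab == label]
        # Sample without replacement, so no sample is picked twice
        keep.extend(rng.sample(indices, target))
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical toy data: eight samples of class 0 and two of class 1
X_demo = [[i] for i in range(10)]
y_demo = [0] * 8 + [1] * 2

X_under, y_under = random_undersample(X_demo, y_demo)
print(Counter(y_under))
```

The price of this simplicity is that the discarded majority samples, and whatever information they carried, are lost.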