K-Nearest Neighbor (KNN) Algorithm

Last Updated: 14 May, 2025

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm generally used for classification, but it can also be used for regression tasks. It works by finding the "k" closest data points (neighbors) to a given input and making a prediction based on the majority class (for classification) or the average value (for regression). Since KNN makes no assumptions about the underlying data distribution, it is a non-parametric, instance-based learning method.

K-Nearest Neighbors is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and performs the computation only at the time of classification.

For example, consider the following visualization of data points with two features:

[Figure: KNN algorithm working visualization]

The image shows how KNN predicts the category of a new data point based on its closest neighbours. The new point is classified as Category 2 because most of its nearest neighbors are blue squares: KNN assigns the category based on the majority of nearby points.

  • The red diamonds represent Category 1 and the blue squares represent Category 2.
  • The new data point checks its closest neighbors (circled points).
  • Since the majority of its closest neighbors are blue squares (Category 2), KNN predicts the new data point belongs to Category 2.

KNN works by using proximity and majority voting to make predictions.

What is 'K' in K Nearest Neighbour?

In the k-Nearest Neighbours algorithm, k is just a number that tells the algorithm how many nearby points (neighbors) to look at when it makes a decision.

Example: Imagine you're deciding what kind of fruit a new item is based on its shape and size. You compare it to fruits you already know.

  • If k = 3, the algorithm looks at the 3 closest fruits to the new one.
  • If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the new fruit is an apple because most of its neighbors are apples.

How to choose the value of k for KNN Algorithm?

  • The value of k in KNN decides how many neighbors the algorithm looks at when making a prediction.
  • Choosing the right k is important for good results.
  • If k is too small, the prediction is sensitive to noise and outliers; this is called overfitting. If the data has lots of noise or outliers, using a larger k can make the predictions more stable.
  • But if k is too large, the model may become too simple and miss important patterns; this is called underfitting.
  • So k should be picked carefully based on the data.

Statistical Methods for Selecting k

  • Cross-Validation: A good way to find the best value of k is k-fold cross-validation (note: the k in "k-fold" is the number of folds, not the number of neighbors). This means dividing the dataset into several parts, training the model on some of them and testing it on the remaining ones, and repeating the process for each part. The number of neighbors that gives the highest average accuracy during these tests is usually the best one to use. A sketch of this appears after this list.
  • Elbow Method: Here we plot the error rate (or accuracy) for different k values. As k increases, the error usually drops at first, but after a certain point it stops decreasing quickly. The point where the curve bends like an "elbow" is usually the best choice for k.
  • Odd Values for k: It's a good idea to use an odd number for k, especially in binary classification problems, as this helps avoid ties when deciding which class is the most common among the neighbors.
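
As a quick sketch of the cross-validation approach (assuming scikit-learn is available; the Iris dataset here is purely illustrative and any labelled dataset would do):

Python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd values of k to avoid ties in majority voting
k_values = range(1, 22, 2)
mean_scores = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: average accuracy over the folds
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores.append(scores.mean())

best_k = k_values[int(np.argmax(mean_scores))]
print("Best k:", best_k)

Plotting mean_scores against k_values also produces the elbow-style curve described above.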

Distance Metrics Used in KNN Algorithm

KNN uses distance metrics to identify the nearest neighbors; these neighbors are then used for the classification or regression task. To find the nearest neighbors we use the distance metrics below:

1. Euclidean Distance

Euclidean distance is defined as the straight-line distance between two points in a plane or space. You can think of it like the shortest path you would walk if you were to go directly from one point to another.

\text{distance}(x, X_i) = \sqrt{\sum_{j=1}^{d} (x_j - X_{i_j})^2}

2. Manhattan Distance

This is the total distance you would travel if you could only move along horizontal and vertical lines like a grid or city streets. It’s also called "taxicab distance" because a taxi can only drive along the grid-like streets of a city.

d\left ( x,y \right )={\sum_{i=1}^{n}\left | x_i-y_i \right |}

3. Minkowski Distance

Minkowski distance is like a family of distances, which includes both Euclidean and Manhattan distances as special cases.

d\left ( x,y \right )=\left ( {\sum_{i=1}^{n}\left | x_i-y_i \right |^p} \right )^{\frac{1}{p}}

From the formula above, when p=2, it becomes the same as the Euclidean distance formula and when p=1, it turns into the Manhattan distance formula. Minkowski distance is essentially a flexible formula that can represent either Euclidean or Manhattan distance depending on the value of p.
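
To make the relationship concrete, here is a small NumPy sketch of the Minkowski distance (the helper name minkowski_distance is just for illustration); with p = 1 it reproduces the Manhattan result and with p = 2 the Euclidean one:

Python
import numpy as np

def minkowski_distance(x, y, p):
    # (sum of |x_i - y_i|^p) raised to the power 1/p
    return np.sum(np.abs(np.array(x) - np.array(y)) ** p) ** (1 / p)

a, b = [1, 2], [4, 6]
print(minkowski_distance(a, b, p=1))  # 7.0 -> Manhattan distance
print(minkowski_distance(a, b, p=2))  # 5.0 -> Euclidean distance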

Working of KNN algorithm

The K-Nearest Neighbors (KNN) algorithm operates on the principle of similarity: it predicts the label or value of a new data point by considering the labels or values of its K nearest neighbors in the training dataset.

[Figure: Working of the KNN algorithm]

Step 1: Selecting the optimal value of K

  • K represents the number of nearest neighbors that need to be considered while making a prediction.

Step 2: Calculating distance

  • To measure the similarity between the target point and the training data points, Euclidean distance is used. The distance is calculated between each data point in the dataset and the target point.

Step 3: Finding Nearest Neighbors

  • The k data points with the smallest distances to the target point are the nearest neighbors.

Step 4: Voting for Classification or Taking Average for Regression

  • When you want to classify a data point into a category, like spam or not spam, the KNN algorithm looks at the K closest points in the dataset. These closest points are called neighbors. The algorithm then looks at which category the neighbors belong to and picks the one that appears most often. This is called majority voting.
  • In regression, the algorithm still finds the K closest points, but instead of voting for a class it takes the average of the values of those K neighbors. This average is the algorithm's predicted value for the new point (see the regression sketch after the classification example below).

The visualization above shows how a test point is classified based on its nearest neighbors: as the test point moves, the algorithm identifies the closest k data points (5 in this case) and assigns the test point the majority class label, which is the grey class here.

Python Implementation of KNN Algorithm

1. Importing Libraries

Counter is used to count the occurrences of elements in a list or other iterable. In KNN, after finding the k nearest neighbor labels, Counter helps count how many times each label appears.

Python
import numpy as np
from collections import Counter

2. Defining the Euclidean Distance Function

euclidean_distance calculates the Euclidean distance between two points.

Python
def euclidean_distance(point1, point2):
    return np.sqrt(np.sum((np.array(point1) - np.array(point2))**2))

3. KNN Prediction Function

  • distances.append saves how far each training point is from the test point, along with its label.
  • distances.sort sorts the list so the nearest points come first.
  • k_nearest_labels picks the labels of the k closest points.
  • Counter finds which label appears most often among those k labels; that label becomes the prediction.
Python
def knn_predict(training_data, training_labels, test_point, k):
    distances = []
    for i in range(len(training_data)):
        dist = euclidean_distance(test_point, training_data[i])
        distances.append((dist, training_labels[i]))
    distances.sort(key=lambda x: x[0])
    k_nearest_labels = [label for _, label in distances[:k]]
    return Counter(k_nearest_labels).most_common(1)[0][0]

4. Training Data, Labels and Test Point

Python
training_data = [[1, 2], [2, 3], [3, 4], [6, 7], [7, 8]]
training_labels = ['A', 'A', 'A', 'B', 'B']
test_point = [4, 5]
k = 3

5. Prediction

Python
prediction = knn_predict(training_data, training_labels, test_point, k)
print(prediction)

Output: 

A

The algorithm calculates the distances from the test point [4, 5] to all training points, selects the 3 closest points (since k = 3) and determines their labels. Since the majority of the closest points are labelled 'A', the test point is classified as 'A'.
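
The code above handles the classification case. For regression (the averaging case from Step 4), a minimal sketch with hypothetical numeric data could look like this:

Python
import numpy as np

def knn_regress(training_data, training_values, test_point, k):
    # Distance from the test point to every training point
    dists = [np.sqrt(np.sum((np.array(p) - np.array(test_point)) ** 2))
             for p in training_data]
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Prediction is the average value of the k nearest neighbors
    return np.mean([training_values[i] for i in nearest])

# Hypothetical one-feature data: feature -> numeric target
data = [[1], [2], [3], [6], [7]]
values = [1.0, 2.1, 2.9, 6.2, 7.1]
print(knn_regress(data, values, test_point=[4], k=3))  # mean of 2.9, 2.1, 6.2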

In machine learning we can also use the Scikit-Learn Python library, which has built-in functions for building a KNN model; for that, refer to Implementation of KNN classifier using Sklearn.
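
For reference, a minimal scikit-learn version of the same toy problem might look like the sketch below (KNeighborsClassifier uses Euclidean distance by default, matching the implementation above):

Python
from sklearn.neighbors import KNeighborsClassifier

training_data = [[1, 2], [2, 3], [3, 4], [6, 7], [7, 8]]
training_labels = ['A', 'A', 'A', 'B', 'B']

# n_neighbors plays the role of k
model = KNeighborsClassifier(n_neighbors=3)
model.fit(training_data, training_labels)

print(model.predict([[4, 5]]))  # ['A']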

Applications of KNN

  • Recommendation Systems: Suggests items like movies or products by finding users with similar preferences.
  • Spam Detection: Identifies spam emails by comparing new emails to known spam and non-spam examples.
  • Customer Segmentation: Groups customers by comparing their shopping behavior to others.
  • Speech Recognition: Matches spoken words to known patterns to convert them into text.

Advantages of KNN

  • Simple to use: Easy to understand and implement.
  • No training step: No need to train as it just stores the data and uses it during prediction.
  • Few parameters: Only needs to set the number of neighbors (k) and a distance method.
  • Versatile: Works for both classification and regression problems.

Disadvantages of KNN

  • Slow with large data: Needs to compare every point during prediction.
  • Struggles with many features: Accuracy drops when the data has too many features (the curse of dimensionality).
  • Can overfit: It can overfit, especially when the data is high-dimensional or noisy.

For more understanding, also check:

  • K Nearest Neighbors with Python | ML
  • Implementation of K-Nearest Neighbors from Scratch using Python
  • Mathematical explanation of K-Nearest Neighbour
  • Weighted K-NN
