Evaluation Metrics in Machine Learning

Last Updated : 05 Apr, 2025

Evaluating a model is a core part of any machine learning workflow: it tells us how well the model is likely to perform on data it has not seen. In this post, we cover the most common metrics used to evaluate classification and regression models.

Table of Contents

  • Classification Metrics
    • Accuracy
    • Logarithmic Loss
    • Area Under Curve (AUC)
    • Precision
    • Recall
    • F1 Score
    • Confusion Matrix
  • Regression Evaluation Metrics
    • Mean Absolute Error (MAE)
    • Mean Squared Error (MSE)
    • Root Mean Square Error (RMSE)
    • Root Mean Squared Logarithmic Error (RMSLE)
    • R2 – Score

Classification Metrics

In a classification task, the goal is to predict a target variable that takes discrete values. The following metrics are commonly used to evaluate such models:

  • Accuracy
  • Logarithmic Loss
  • Area Under Curve
  • Precision
  • Recall
  • F1 Score
  • Confusion Matrix

Accuracy

Accuracy is a fundamental metric for evaluating the performance of a classification model, providing a quick snapshot of how well the model is performing in terms of correct predictions. It is calculated as the ratio of correct predictions to the total number of input samples.

[Tex]\rm{Accuracy} = \frac{\rm{No.\; of\; correct \;predictions}}{\rm{Total\; number \;of \;input\; samples}}[/Tex]

Accuracy works well when the classes are roughly balanced, but it can be misleading otherwise. For example, suppose 90% of the samples in our training set belong to class A and 10% to class B. A model that predicts class A for every sample reaches 90% accuracy on that set without learning anything about class B. If we evaluate the same model on a test set with 60% of samples from class A and 40% from class B, its accuracy falls to 60%.

Accuracy is a useful starting point, but on imbalanced data it can give a false sense of high performance. The problem is that minority-class samples can be misclassified at a very high rate while the overall score barely changes.
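
The 90/10 and 60/40 example above can be reproduced with a few lines of plain Python. This is a minimal sketch: the `accuracy` helper and the label lists are illustrative names, and the "model" is simply assumed to predict class A every time.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical data sets with the class proportions from the example.
train_labels = ["A"] * 90 + ["B"] * 10
test_labels = ["A"] * 60 + ["B"] * 40

# A degenerate "model" that always predicts class A.
train_acc = accuracy(train_labels, ["A"] * len(train_labels))
test_acc = accuracy(test_labels, ["A"] * len(test_labels))

print(train_acc, test_acc)  # 0.9 0.6
```

The model never identifies a single class B sample, yet its accuracy on the imbalanced set still looks respectable.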

Logarithmic Loss

Log loss penalizes incorrect classifications, and it penalizes confident wrong predictions most heavily. It works well for multi-class classification. To use log loss, the classifier must assign a probability to every class for every sample. If there are N samples and M classes, the log loss is calculated as:

[Tex]\text{Logarithmic Loss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \cdot \log(p_{ij})[/Tex] 

Here:

  • yij is 1 if sample i belongs to class j and 0 otherwise.
  • pij is the predicted probability that sample i belongs to class j.
  • The range of log loss is [0, ∞). A log loss near 0 indicates that the predicted probabilities are close to the true labels; larger values indicate worse predictions.
  • Minimizing log loss generally tends to improve the accuracy of the classifier, although the two are not guaranteed to move together.
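
The formula can be sketched in plain Python as follows. Because yij is 1 only for the true class, the inner sum reduces to the log of the probability assigned to the true class. The function name `log_loss`, the clipping constant `eps`, and the sample probabilities are assumptions made for illustration.

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """y_true: list of true class indices.
    probs: one list of per-class probabilities for each sample."""
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip to avoid log(0) when a probability is exactly 0 or 1.
        p_true = min(max(p[label], eps), 1 - eps)
        total += math.log(p_true)
    return -total / len(y_true)

y_true = [0, 1, 2]
good_probs = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
bad_probs = [[0.05, 0.9, 0.05], [0.8, 0.1, 0.1], [0.6, 0.2, 0.2]]

good = log_loss(y_true, good_probs)
bad = log_loss(y_true, bad_probs)
print(good < bad)  # True
```

Predictions that put high probability on the wrong class are penalized much more than predictions that are merely unsure.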

Area Under Curve (AUC)

AUC is one of the most widely used metrics, mainly for binary classification. The AUC of a classifier equals the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example. Before going further into AUC, here are a few basic terms.

True Positive Rate

Also called sensitivity. True Positive Rate is the proportion of positive data points that are correctly classified as positive, out of all data points that are actually positive.

[Tex]\rm{TPR} = \frac{TP}{TP + FN}    [/Tex] 

True Negative Rate

Also called specificity. True Negative Rate is the proportion of negative data points that are correctly classified as negative, out of all data points that are actually negative.

[Tex]\rm{TNR} = \frac{TN}{TN \;+\; FP} [/Tex] 

False Positive Rate

False Positive Rate is the proportion of actual negatives that are incorrectly identified as positives. It equals 1 - TNR.

[Tex]\rm{FPR} = \frac{\rm{FP}}{\rm{FP \;+ \;TN}}[/Tex]

False Positive Rate and True Positive Rate both have values in the range [0, 1]. So what is AUC? The ROC (Receiver Operating Characteristic) curve plots True Positive Rate against False Positive Rate as the classification threshold varies, and AUC is the area under this curve, which also lies in the range [0, 1]. The greater the AUC, the better the performance of the model.

ROC Curve for Evaluation of Classification Models
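
The ranking interpretation of AUC can be checked directly on a small example. The sketch below compares every positive score with every negative score and counts how often the positive one is higher, with ties counted as half. The function name `auc_by_ranking` and the sample scores are illustrative, and the pairwise loop is only practical for small inputs.

```python
def auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_by_ranking(y_true, scores)
print(auc)  # 0.75
```

Three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75. A model that ranks at random would score about 0.5.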

Precision

Precision is a measure of a model’s performance that tells you how many of the positive predictions made by the model are actually correct.

[Tex]\rm{Precision} = \frac{TP}{TP\; +\; FP}[/Tex]

Recall

Recall is the ratio of correctly predicted positive instances to the total actual positive instances. It measures how well the model captures all relevant positive cases.

[Tex]\rm{Recall} = \frac{TP}{TP\;+\;FN}[/Tex]
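
As a minimal sketch, precision and recall can be computed by counting TP, FP and FN from paired label lists. The helper name `precision_recall`, the label encoding (1 for positive, 0 for negative) and the sample data are assumptions for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when there are no positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # 0.75 0.75
```

Here TP = 3, FP = 1 and FN = 1, so both ratios come out to 3/4.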

F1 Score

F1 Score is the harmonic mean of precision and recall. Its range is [0, 1]. This metric tells us how precise our classifier is (how many of the instances it labels positive are correct) and how robust it is (it does not miss a significant number of positive instances).

A classifier with high precision but low recall is rarely wrong when it predicts the positive class, but it misses a large number of positive instances. The F1 Score balances the two: the higher the F1 Score, the better the performance. It can be expressed mathematically as:

[Tex]F 1=2 * \frac{1}{\frac{1}{\text {precision}}+\frac{1}{\text {recall}}}[/Tex]
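
The harmonic mean simplifies to 2 · precision · recall / (precision + recall). The short sketch below uses that form and compares it with the arithmetic mean to show how F1 punishes an imbalance between the two values. The function name `f1_score` and the input values are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# High precision, low recall: the classifier misses most positives.
f1 = f1_score(1.0, 0.1)
arithmetic_mean = (1.0 + 0.1) / 2

print(round(f1, 3), arithmetic_mean)  # 0.182 0.55
```

The arithmetic mean would suggest a mediocre model, while the F1 Score stays close to the weaker of the two values.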

Confusion Matrix

A confusion matrix is an N × N table, where N is the number of classes to be predicted. Consider a binary classification problem, so N = 2 and we get a 2 × 2 matrix. Each sample belongs to either the class Yes or the class No. We build a classifier that predicts the class of a new input sample, test it on 165 samples, and get the following result.

[Tex]\begin{array}{|c|c|c|}\hline\textbf{n = 165} & \text{Predicted: NO} & \text{Predicted: YES} \\\hline\text{Actual: NO} & 50 & 10 \\\hline\text{Actual: YES} & 5 & 100 \\\hline\end{array}[/Tex]

There are 4 terms you should keep in mind: 

  1. True Positives: It is the case where we predicted Yes and the real output was also Yes.
  2. True Negatives: It is the case where we predicted No and the real output was also No.
  3. False Positives: It is the case where we predicted Yes but it was actually No.
  4. False Negatives: It is the case where we predicted No but it was actually Yes. 

The accuracy can be read off the matrix by summing the values on the main diagonal (the correct predictions) and dividing by the total number of samples, i.e.

[Tex]\begin{array}{l}\text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{Total Samples}} \\\text{Accuracy} = \frac{100 + 50}{165} \\\text{Accuracy} = 0.91\end{array}[/Tex]
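
The 165-sample example can be rebuilt in code. The sketch below constructs label lists with the same counts as the table, tallies them into a matrix whose rows are actual classes and whose columns are predicted classes, and computes accuracy from the main diagonal. The `confusion_matrix` helper and the label lists are illustrative assumptions, not a library API.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# 60 actual NO samples followed by 105 actual YES samples.
y_true = ["NO"] * 60 + ["YES"] * 105
y_pred = ["NO"] * 50 + ["YES"] * 10 + ["NO"] * 5 + ["YES"] * 100

matrix = confusion_matrix(y_true, y_pred, labels=["NO", "YES"])
diagonal = sum(matrix[i][i] for i in range(len(matrix)))
accuracy = diagonal / len(y_true)

print(matrix)              # [[50, 10], [5, 100]]
print(round(accuracy, 2))  # 0.91
```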

Regression Evaluation Metrics

In a regression task, the goal is to predict a target variable that takes continuous values. The following metrics are used to evaluate such models:

  • Mean Absolute Error
  • Mean Squared Error
  • Root Mean Square Error
  • Root Mean Square Logarithmic Error
  • R2 – Score

Mean Absolute Error (MAE)

Mean Absolute Error (MAE) is the average absolute difference between the predicted and actual values. It tells us how far, on average, the predictions are from the actual output. One limitation is that it gives no indication of the direction of the error, that is, whether the model is under-predicting or over-predicting. It can be represented mathematically as:

[Tex]\rm{MAE}=\frac{1}{N} \sum_{j=1}^{N}\left|y_{j}-\hat{y}_{j}\right|[/Tex]
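
A minimal sketch of MAE in plain Python follows. It also computes the mean signed error next to it, to illustrate the direction information that MAE discards. The function names and sample values are illustrative.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y - y_hat| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_signed_error(y_true, y_pred):
    """Average of (y - y_hat); negative means over-prediction on average."""
    return sum(t - p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
bias = mean_signed_error(y_true, y_pred)
print(mae, bias)  # 0.5 -0.25
```

The MAE of 0.5 says how large the errors are on average, but only the signed error reveals that these predictions lean toward over-estimation.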

Mean Squared Error (MSE)

MSE is similar to MAE, but it averages the squares of the differences between the predicted and actual values rather than their absolute values. One advantage of this metric is that its gradient is easy to compute, whereas the absolute value in MAE is not differentiable at zero, which makes gradient-based optimization less straightforward. Because the errors are squared, larger errors are weighted more heavily than smaller ones, so the metric emphasizes large errors. It can be expressed mathematically as:

[Tex]\rm{MSE}=\frac{1}{N} \sum_{j=1}^{N}\left(y_{j}-\hat{y}_{j}\right)^{2}[/Tex]
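
To see how squaring emphasizes large errors, the sketch below compares two hypothetical sets of predictions with the same total absolute error: one spreads the error evenly, the other concentrates it in a single sample. The function names and data are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average of (y - y_hat)^2 over all samples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0.0, 0.0, 0.0, 0.0]
spread_out = [1.0, 1.0, 1.0, 1.0]  # four errors of size 1
one_large = [0.0, 0.0, 0.0, 4.0]   # one error of size 4

print(mean_absolute_error(y_true, spread_out),
      mean_absolute_error(y_true, one_large))  # 1.0 1.0
print(mean_squared_error(y_true, spread_out),
      mean_squared_error(y_true, one_large))   # 1.0 4.0
```

MAE cannot tell the two cases apart, while MSE is four times larger when the error is concentrated in one prediction.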

Root Mean Square Error (RMSE)

RMSE is obtained by taking the square root of the MSE, which puts the error back in the same units as the target variable. Like MSE, RMSE is not robust to outliers, because squaring gives higher weight to large errors in predictions.

[Tex]\rm{RMSE}=\sqrt{\frac{\sum_{j=1}^{N}\left(y_{j}-\hat{y}_{j}\right)^{2}}{N}}[/Tex]
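
A short sketch of the RMSE formula, using only the standard library. The function name `rmse` and the sample values are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean of squared errors."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

error = rmse(y_true, y_pred)
print(round(error, 4))  # 0.6124
```

The mean squared error for this data is 0.375, and its square root is about 0.61, expressed in the same units as the target.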

Root Mean Squared Logarithmic Error (RMSLE)

There are times when the target variable varies over a wide range of values. In such cases we often care about relative rather than absolute error, and we may want to penalize underestimation of the target more heavily than overestimation. RMSLE serves this purpose: because it compares logarithms, an underestimate incurs a larger penalty than an overestimate of the same absolute size.

Applying RMSE to the logarithms of the values (with 1 added so that zeros are handled) gives the RMSLE formula shown below:

[Tex]\rm{RMSLE}=\sqrt{\frac{\sum_{j=1}^{N}\left(\log(y_{j}+1) - \log (\hat{y}_{j}+1)\right)^{2}}{N}}[/Tex]
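
The asymmetry between under- and over-estimation can be checked with a small sketch. It assumes non-negative targets and predictions and uses `math.log1p(x)`, which computes log(1 + x). The function name `rmsle` and the sample values are illustrative.

```python
import math

def rmsle(y_true, y_pred):
    """RMSE computed on log(1 + value); inputs are assumed non-negative."""
    squared = [(math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Same absolute error of 50, in opposite directions.
under = rmsle([100.0], [50.0])   # prediction is too low
over = rmsle([100.0], [150.0])   # prediction is too high

print(under > over)  # True
```

Both predictions are off by 50, but the underestimate receives the larger penalty because the gap between the logarithms is wider on the low side.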

R2 – Score

The coefficient of determination, also called the R2 score, is used to evaluate the performance of a regression model. It is the proportion of the variation in the dependent (output) variable that is predictable from the independent (input) variable(s). It measures how well the observed results are reproduced by the model, based on the ratio of the residual variation to the total variation in the observed values.

[Tex]R^2 = 1 - \frac{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2}[/Tex]
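
The formula can be sketched in plain Python as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r2_score` and the sample values are illustrative, and the sketch assumes the true values are not all identical (otherwise the denominator is zero).

```python
def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot, assuming y_true is not constant."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

r2 = r2_score(y_true, y_pred)
baseline = r2_score(y_true, [sum(y_true) / len(y_true)] * len(y_true))
print(round(r2, 3), baseline)  # 0.949 0.0
```

A model that always predicts the mean of the observed values scores 0, a perfect model scores 1, and a model worse than the mean baseline gets a negative score.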



author
amsten