
Spam Classification using OpenAI

Last Updated : 20 Mar, 2024

Most people today own a mobile phone and regularly receive messages (SMS/email) on it. Some of the messages you get may be spam rather than genuine or important communications. Scammers send out phony text messages to trick you into providing personal information such as your password, account number, or Social Security number. If they obtain this information, they may be able to access your bank, email, and other accounts. To filter out these messages, a spam filtering system marks a message as spam on the basis of its contents or sender.

In this article, we will develop a spam classification system and evaluate our model using various metrics. We will focus mainly on the OpenAI API, which can be used for this task in 2 ways: through embeddings or through text completion.

We will be using the Email Spam Classification Dataset, which has 2 main columns and 5572 rows of spam and non-spam messages.

Steps to implement Spam Classification using OpenAI

Now there are two approaches that we will be covering in this article:

1. Using Embeddings API developed by OpenAI

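Step 1: Install all the necessary libraries

# the code in this article uses the pre-1.0 openai interface (openai.Embedding, openai.Completion)
!pip install -q "openai<1.0"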

Step 2: Import all the required libraries

Python3
# necessary libraries
import openai
import pandas as pd
import numpy as np

# libraries to develop and evaluate a machine learning model
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from sklearn.metrics import confusion_matrix

Step 3: Assign your API key to the OpenAI environment

Python3
# replace "YOUR API KEY" with your generated API key
openai.api_key = "YOUR API KEY"
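Hardcoding the key in a notebook makes it easy to leak by accident. As a minimal sketch of an alternative, the helper below reads the key from an environment variable. The function name `load_api_key` is hypothetical, and `OPENAI_API_KEY` is assumed here as the variable name because it is the conventional one.

```python
import os


def load_api_key(env=os.environ, var_name="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable, or raise if it is missing."""
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Usage (assumes the variable was exported before starting Python):
# openai.api_key = load_api_key()
```

Passing the environment as a parameter keeps the helper easy to check with a plain dictionary.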

Step 4: Read the CSV file and clean the dataset

Our dataset has 3 unnamed columns with NULL values, so we will drop them.

Note: OpenAI's public API may not process more than 60 requests per minute (the exact limit depends on the account), so we are taking only the first 60 records here.

Python3
# while loading the csv, we ignore any encoding errors and skip any bad line
df = pd.read_csv('spam.csv', encoding_errors='ignore', on_bad_lines='skip')
print(df.shape)

# we have 3 columns with NULL values, to remove that we use the below line
df = df.dropna(axis=1)

# we are taking only the first 60 rows for developing the model;
# .copy() avoids modifying a view of the original DataFrame
df = df.iloc[:60].copy()

# rename the columns v1 and v2 to OUTPUT and TEXT respectively
df.rename(columns={'v1': 'OUTPUT', 'v2': 'TEXT'}, inplace=True)
print(df.shape)
df.head()

Output:

Figure: Email Spam Classification Dataset
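Rather than truncating the dataset, another way to respect a requests-per-minute limit is to pace the calls. The sketch below is one possible approach and not part of the original workflow: `Throttle` is a hypothetical helper that keeps consecutive calls at least `60 / max_per_minute` seconds apart. The clock and sleep functions are parameters only so that the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Keep consecutive calls at least 60 / max_per_minute seconds apart."""

    def __init__(self, max_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


# Usage: create one Throttle and call throttle.wait() before each API request.
throttle = Throttle(max_per_minute=60)
```

Even spacing is the simplest policy; it gives up burst capacity in exchange for never needing to handle a rate-limit error.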

Step 5: Define a function to use Open AI's Embedding API

We use OpenAI's Embedding API to generate embedding vectors and use them as features for classification. We use the "text-embedding-ada-002" model, which belongs to the second generation of embedding models developed by OpenAI. The embeddings generated by this model have length 1536.

Python3
# function to generate an embedding vector for a string
def get_embedding(text, model="text-embedding-ada-002"):
    response = openai.Embedding.create(input=[text], model=model)
    return response['data'][0]['embedding']

# apply the function to all 60 texts and store each vector as a NumPy array
df["embedding"] = df.TEXT.apply(get_embedding).apply(np.array)
df.head()

Output:

Figure: Email Spam Classification Dataset
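The function above sends one request per message. Since the `input` argument already takes a list, several texts can be sent in a single request, which uses fewer requests for the same data. The following is a sketch under that assumption: `chunked` and `embed_in_batches` are hypothetical helpers, and `embed_batch` stands for any callable that returns one vector per input string.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(texts, embed_batch, batch_size=20):
    """Embed `texts` with one request per batch instead of one per text.

    `embed_batch` is a hypothetical callable that takes a list of strings and
    returns one vector per string, in the same order.
    """
    vectors = []
    for batch in chunked(list(texts), batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


# A possible embed_batch for the legacy client used in this article (assumption):
# def embed_batch(batch):
#     response = openai.Embedding.create(input=batch, model="text-embedding-ada-002")
#     ordered = sorted(response["data"], key=lambda item: item["index"])
#     return [item["embedding"] for item in ordered]
```

With a batch size of 20, the 60 rows used here would need 3 requests instead of 60.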

Step 6: Map the classes of the output variable to 1 and 0, where 1 means "spam" and 0 means "not spam".

Python3
class_dict = {'spam': 1, 'ham': 0}
df['class_embeddings'] = df.OUTPUT.map(class_dict)
df.head()

Output:

Figure: Spam Classification DataFrame after feature engineering

Step 7: Develop a Classification model.

We split the dataset into a training set and a test set using train_test_split and train a Random Forest classification model.

Python3
# split data into train and test
X = np.array(df.embedding)
y = np.array(df.class_embeddings)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# train random forest classifier
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train.tolist(), y_train)
preds = clf.predict(X_test.tolist())

# generate a classification report involving f1-score, recall, precision and accuracy
report = classification_report(y_test, preds)
print(report)

Output:

              precision    recall  f1-score   support

           0       0.82      1.00      0.90         9
           1       1.00      0.33      0.50         3

    accuracy                           0.83        12
   macro avg       0.91      0.67      0.70        12
weighted avg       0.86      0.83      0.80        12

Step 8: Calculate the accuracy of the model

Python3
print("accuracy: ", np.round(accuracy_score(y_test, preds)*100,2), "%") 

Output:

accuracy:  83.33 %
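Accuracy is easier to judge next to a baseline. As a small sketch, the hypothetical helper below computes the accuracy of always predicting the most common class. The label list is reconstructed from the support column of the report above (9 ham, 3 spam), not taken from the dataset itself.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label in `labels`."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


# Test-set labels reconstructed from the support column: 9 ham (0), 3 spam (1).
y_test_labels = [0] * 9 + [1] * 3
baseline = majority_baseline_accuracy(y_test_labels)
print(f"baseline accuracy: {baseline * 100:.2f} %")  # prints "baseline accuracy: 75.00 %"
```

Against a 75 % baseline, 83.33 % on 12 test messages is a modest gain, which is why the per-class precision and recall are worth reading as well.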

Step 9: Print the confusion matrix for our classification model

Python3
confusion_matrix(y_test, preds) 

Output:

array([[9, 0],
       [2, 1]])
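The precision, recall, and F1 values in the classification report can be recovered from this matrix by hand. The sketch below shows the arithmetic; `per_class_metrics` is a hypothetical helper, and it assumes the row = true label, column = predicted label layout that scikit-learn's confusion_matrix uses.

```python
def per_class_metrics(matrix, positive=1):
    """Precision, recall and F1 for class `positive`.

    `matrix[i][j]` counts samples with true label i that were predicted as label j.
    """
    tp = matrix[positive][positive]
    fp = sum(row[positive] for row in matrix) - tp   # predicted positive, actually other
    fn = sum(matrix[positive]) - tp                  # actually positive, predicted other
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


cm = [[9, 0], [2, 1]]
for label in (0, 1):
    p, r, f = per_class_metrics(cm, positive=label)
    print(label, round(p, 2), round(r, 2), round(f, 2))
```

For the spam class, only 1 of the 3 spam messages was caught (recall 0.33), even though the single spam prediction made was correct (precision 1.00).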

2. Using text completion API developed by OpenAI

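Step 1: Install the OpenAI library in the Python environment

# the code in this article uses the pre-1.0 openai interface (openai.Embedding, openai.Completion)
!pip install -q "openai<1.0"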

Step 2: Import the following libraries

Python3
import openai 

Step 3: Assign your API key to the OpenAI environment

Python3
# replace "YOUR API KEY" with your generated API key
openai.api_key = "YOUR API KEY"

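Step 4: Define a function using the text completion API of OpenAI

Python3
def spam_classification(message):
    # Note: text-davinci-003 has since been retired by OpenAI; if the call fails,
    # swap in a currently available completion model (e.g. gpt-3.5-turbo-instruct).
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Classify the following message as spam or not spam:\n\n{message}\n\nAnswer:",
        temperature=0,
        max_tokens=64,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    # return only the generated answer, without surrounding whitespace
    return response['choices'][0]['text'].strip()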

Step 5: Try out the function with some examples

Example 1:

Python3
out = spam_classification("""Congratulations! You've Won a $1000 gift card from walmart.
                           Go to https://bit.ly to claim your reward.""")
print(out)

Output:

Spam

Example 2:

Python3
out = spam_classification("Hey Alex, just wanted to let you know tomorrow is an off. Thank you")
print(out)

Output:

Not spam
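The function returns free text ("Spam", "Not spam"), so comparing it with the 1/0 labels used in the first approach needs a small normalisation step. The sketch below is one way to do that; `parse_label` is a hypothetical helper, and the accepted phrasings are assumptions based on the two outputs shown above.

```python
def parse_label(answer):
    """Map a free-text model answer to 1 (spam), 0 (not spam) or None (unclear)."""
    text = answer.strip().lower().rstrip(".!")
    # Negative answers such as "Not spam" are recognised by their prefix.
    if text.startswith(("not spam", "non-spam")):
        return 0
    if text.startswith("spam"):
        return 1
    # Anything else is left undecided rather than guessed.
    return None


print(parse_label("Spam"))      # 1
print(parse_label("Not spam"))  # 0
```

Returning None for unexpected answers makes it visible when the model replies with something other than the two expected labels.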

Frequently Asked Questions

1. Which algorithm is best for spam detection?

No single algorithm consistently produces the best results. An algorithm's effectiveness depends on the type of spam, the data that is available, and the particular requirements of the problem. That said, Naive Bayes, neural networks (such as RNNs), Logistic Regression, Random Forest, and Support Vector Machines are among the most frequently used classification techniques.

2. What is embedding or word embedding?

The embedding or Word embedding is a natural language processing (NLP) technique where words are mapped into vectors of real numbers. It is a way of representing words and documents through a dense vector representation. This representation is learned from data and is shown to capture the semantic and syntactic properties of words. The words closest in vector space have the most similar meanings.
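Closeness in vector space is usually measured with cosine similarity. The sketch below illustrates the idea with made-up 3-dimensional vectors. These numbers are invented for illustration and are not real model outputs (real embeddings from the model used in this article have 1536 dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors, for illustration only.
toy_vectors = {
    "prize": [0.9, 0.1, 0.0],
    "reward": [0.8, 0.2, 0.1],
    "meeting": [0.0, 0.2, 0.9],
}
print(cosine_similarity(toy_vectors["prize"], toy_vectors["reward"]))
print(cosine_similarity(toy_vectors["prize"], toy_vectors["meeting"]))
```

In this toy setup, "prize" and "reward" point in nearly the same direction and score close to 1, while "prize" and "meeting" score close to 0.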


3. Is spam classification supervised or unsupervised?

Spam classification is supervised, since developing a model requires both the independent variable (message contents) and the target variable (the outcome, i.e., whether the email is spam or not).

4. What is spam vs ham classification?

Email that is not spam is referred to as "ham", also called "good mail" or "non-spam". "Ham" is best viewed as a shorter, snappier alternative to "non-spam". The term is used mostly by anti-spam software makers and is less common elsewhere, so the phrase "non-spam" is probably preferable in most contexts.

Conclusion

In this article, we discussed the development of a spam classifier using the OpenAI API. OpenAI offers many such APIs that can ease your daily work and help you get started with projects in the field of Artificial Intelligence. You can check out other tutorials using the OpenAI API below:

  • Generate Images With OpenAI in Python
  • PandasAI Library from OpenAI
  • How to Build a ChatGPT Like App in Android using OpenAI API?

prathamso02t4