
What is Web Content Mining?

Last Updated : 30 Nov, 2022

Pre-requisites: Web Mining

Web Content Mining is one of the three types of Web Mining, alongside Web Structure Mining and Web Usage Mining. This article discusses Web Content Mining only. The mining, extraction, and integration of useful data, information, and knowledge from Web page content is known as Web Content Mining.

It describes the discovery of useful information from web content. In simple words, it is the application of web mining that extracts relevant or useful information from the content of the Web. Web content mining is related to, but different from, other mining techniques such as data mining and text mining. Because web data is heterogeneous and lacks structure, the automated discovery of new knowledge patterns can be challenging.

Web data are generally semi-structured and/or unstructured, while data mining is primarily concerned with structured data. Web content mining scans and mines text, images, and groups of web pages according to the content of the input, for example to produce the list of results displayed by a search engine.

For example, if the user searches for a particular song, the search engine displays results and suggestions relevant to that song.

Web content mining deals with different kinds of data such as text, audio, video, image, etc.

Unstructured Web Data Mining

Unstructured data includes audio, video, and similar content. Web content mining converts this unstructured data into structured, useful information. The conversion process is shown in the following figure:

[Figure: Web Content Mining]
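
As a rough illustration of turning unstructured page content into structured information, the sketch below parses a small HTML string into a record with a title, visible text, and links. It uses only Python's standard `html.parser` module. The names `PageExtractor`, `page_to_record`, and `sample_html` are hypothetical and chosen for this example; a real pipeline would need far more robust handling.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the title, visible text, and links of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def page_to_record(html):
    """Turn raw HTML into a structured record (a plain dict)."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "text": " ".join(parser.text_parts),
        "links": parser.links,
    }


# Hypothetical sample page, used only for this demonstration.
sample_html = """
<html><head><title>Song Lyrics</title><style>p {color: red}</style></head>
<body><p>Top songs of the year.</p>
<a href="https://example.com/songs">All songs</a></body></html>
"""
record = page_to_record(sample_html)
print(record)
```

The resulting dictionary could then be stored in a table or fed to the feature extraction steps described in the next section.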
 

Unstructured Documents Feature Extraction:

1. Bag of words to represent unstructured documents

  • Takes a single word as a feature.
  • It ignores the sequence or order in which words occur.

2. Features could be:

  • Boolean: whether or not the word occurs in the document.
  • Frequency-based: the number of times the word occurs in the document.

3. Variations of the feature selection include:

  • Normalizing case and removing punctuation, infrequent words, stop words, etc.

4. Features can be reduced using different feature selection techniques:

  • Information gain: measures the difference between probability distributions, i.e., how much knowing a feature changes the distribution of the classes.
  • Stemming: it reduces words to their morphological roots.
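
The bag-of-words steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production tokenizer: the stop-word list `STOP_WORDS` is a tiny hypothetical sample, the function names are made up for this example, and stemming and information gain are not covered.

```python
import string
from collections import Counter

# A tiny, hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Lower-case the text, strip punctuation, and drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def bag_of_words(documents):
    """Return the vocabulary plus frequency and Boolean vectors per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # Frequency-based feature: how often each word occurs in the document.
    frequency = [[c[word] for word in vocabulary] for c in counts]
    # Boolean feature: 1 if the word occurs in the document, else 0.
    boolean = [[int(c[word] > 0) for word in vocabulary] for c in counts]
    return vocabulary, frequency, boolean


docs = [
    "The song is a hit, and the song is new.",
    "A new video of the song!",
]
vocabulary, frequency, boolean = bag_of_words(docs)
print(vocabulary)  # ['hit', 'new', 'song', 'video']
print(frequency)   # [[1, 1, 2, 0], [0, 1, 1, 1]]
print(boolean)     # [[1, 1, 1, 0], [0, 1, 1, 1]]
```

Word order is discarded: each document is reduced to counts over a shared vocabulary, which is exactly what makes the representation simple to compare and mine.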

Mining Techniques Using Agents and Databases:

1. Agent-Based Approaches:

  • Intelligent Search: the agent searches with a particular goal of the user in mind and returns the results that satisfy that goal.
  • Information Filtering/Categorization: filters the data, i.e., removes unwanted or redundant information using AI-based methods. Examples include HyPursuit and BO (Bookmark Organizer).
  • Sophisticated AI systems: increasingly act on behalf of users, either automatically or with user involvement. One example is Deep Learning, in which the system is trained by feeding it particular kinds of data.
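
To make the idea of information filtering concrete, here is a deliberately simple sketch that drops redundant snippets (duplicates after normalization) and unwanted ones (those that do not mention any goal keyword). It is a toy rule-based filter written for this example, not the method used by HyPursuit or Bookmark Organizer; `filter_results` and the sample snippets are hypothetical.

```python
def filter_results(snippets, goal_keywords):
    """Keep snippets that mention a goal keyword, dropping duplicates."""
    goal = {word.lower() for word in goal_keywords}
    seen = set()
    kept = []
    for snippet in snippets:
        # Normalize case and whitespace so near-identical text compares equal.
        normalized = " ".join(snippet.lower().split())
        if normalized in seen:
            continue  # redundant: the same text was already processed
        seen.add(normalized)
        if goal & set(normalized.split()):
            kept.append(snippet)  # relevant to the user's goal
    return kept


snippets = [
    "Download the new song here",
    "download   the new SONG here",
    "Cheap flights to Paris",
    "Song lyrics and chords",
]
relevant = filter_results(snippets, ["song"])
print(relevant)  # ['Download the new song here', 'Song lyrics and chords']
```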

2. Database Approaches:

Database approaches transform unstructured data into a more structured, high-level collection of resources, such as a relational database, and then use standard database querying mechanisms and data mining techniques to access and analyze this information.

  • Multilevel Databases:
    • Lowest level: semi-structured information is kept.
    • Higher levels: generalizations from the lower levels, organized into relations and objects.
  • Web Query Systems:
    • Web query systems use query languages such as SQL, as well as natural language processing, to extract data from the web.
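
The database approach can be sketched with Python's built-in `sqlite3` module: records assumed to have been extracted from web pages are stored in a relational table and then queried with ordinary SQL. The table name `pages`, its columns, and the sample rows are assumptions made for this illustration.

```python
import sqlite3

# Hypothetical records, as if already extracted from web pages.
pages = [
    ("https://example.com/a", "Song of the year", "music"),
    ("https://example.com/b", "Top music videos", "music"),
    ("https://example.com/c", "Election results", "news"),
]

conn = sqlite3.connect(":memory:")  # in-memory database for the demo
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT, category TEXT)")
conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", pages)

# Higher-level generalization: how many pages fall into each category.
summary = conn.execute(
    "SELECT category, COUNT(*) FROM pages GROUP BY category ORDER BY category"
).fetchall()

# Standard querying: titles of all pages in one category.
music_titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM pages WHERE category = ? ORDER BY title", ("music",)
    )
]
conn.close()

print(summary)       # [('music', 2), ('news', 1)]
print(music_titles)  # ['Song of the year', 'Top music videos']
```

The per-category summary plays the role of a higher level in a multilevel database, while the raw rows correspond to the lower level.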

[Figure: Web Content Mining Categorization]
 

Web Content Mining Techniques:

  1. Pre-processing 
  2. Clustering
  3. Classifying
  4. Identifying the associations
  5. Topic identification, tracking, and drift analysis

Applications of Web Content Mining:

  1. Classifying web documents into categories.
  2. Identifying the topics of web documents.
  3. Finding similar web pages across different web servers.
  4. Applications related to relevance, such as ranking search results.
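
As one possible illustration of finding similar web pages, the sketch below compares pages by the cosine similarity of their term-frequency vectors. This is a common choice for the task, but it is an assumption of this example rather than something the article prescribes. The page texts and the `server1/page` style keys are made up.

```python
import math
from collections import Counter


def term_vector(text):
    """Term-frequency vector of a text, as a Counter of lower-cased words."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical page texts hosted on three different servers.
pages = {
    "server1/page": "new song lyrics and chords",
    "server2/page": "song lyrics and guitar chords",
    "server3/page": "election results announced today",
}
query = term_vector(pages["server1/page"])
scores = {
    url: cosine_similarity(query, term_vector(text))
    for url, text in pages.items()
    if url != "server1/page"
}
most_similar = max(scores, key=scores.get)
print(scores)
print(most_similar)  # server2/page
```

A score near 1 means the two pages use almost the same words in similar proportions, while 0 means they share no words at all.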

singhankitasingh066