
Implementing web scraping using lxml in Python

Last Updated : 05 Oct, 2021

Web scraping means extracting specific pieces of information from one or more websites. Because every website has a recognizable structure of HTML elements, a program can locate the data it needs by following that structure.

Steps to perform web scraping:
1. Send a request to the URL and get the response.
2. Read the response body as a byte string.
3. Pass the byte string to the fromstring() function in lxml's html module.
4. Locate a particular element using an XPath expression.
5. Use the content as needed.
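
Steps 3 to 5 can be tried without any network access. The following is a minimal sketch, assuming a small hand-written HTML page in place of a real response body; the `sample_page` bytes and the `id="intro"` attribute are made up for illustration.

```python
from lxml import html

# Hypothetical page content standing in for response.content (step 2).
sample_page = b"""
<html>
  <body>
    <div id="intro">
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
  </body>
</html>
"""

# Step 3: parse the byte string into an element tree.
document = html.fromstring(sample_page)

# Step 4: xpath() always returns a list of matches, possibly empty.
paragraphs = document.xpath('//*[@id="intro"]/p')

# Step 5: use the content; here, collect the text of every match.
texts = [p.text_content() for p in paragraphs]
print(texts)
```

Because xpath() returns a list, the same code handles one match, many matches, or none.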
 

This task requires two third-party packages, requests and lxml. Install them with pip, which downloads prebuilt wheel (.whl) files when they are available.

pip install requests
pip install lxml

You also need the XPath of the element whose data will be scraped. An easy way to get it from the browser:

1. Right-click the element on the page that you want to scrape and choose “Inspect”.

2. Right-click the highlighted element in the source-code panel on the right.

3. Copy the XPath.
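
An XPath can also be obtained from lxml itself, which is handy for checking a path copied from the browser. The sketch below assumes a made-up snippet of markup (the `id` and `class` values are hypothetical) and uses the element tree's getpath() method to print the absolute path of an element.

```python
from lxml import html

# Hypothetical markup; the id and class names are made up for illustration.
page = b"""
<html><body>
  <div id="content">
    <p class="lead">Hello</p>
  </div>
</body></html>
"""

root = html.fromstring(page)

# Locate an element with a hand-written, attribute-based XPath.
target = root.xpath('//p[@class="lead"]')[0]

# Ask lxml for the absolute path of that element. A browser's
# "Copy XPath" may produce a different, id-based form for the same node.
absolute_path = root.getroottree().getpath(target)
print(absolute_path)

# The absolute path resolves back to an element with the same text.
same = root.xpath(absolute_path)[0]
print(same.text_content())
```

Absolute paths break as soon as the page layout shifts, so an attribute-based path such as the first one is usually the more stable choice.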

Here is a simple implementation that scrapes the GeeksforGeeks homepage:

# Python3 code implementing web scraping using lxml

import requests

# import only the html module
from lxml import html

# URL to scrape data from
url = 'https://www.geeksforgeeks.org'

# XPath to the particular element
path = '//*[@id="post-183376"]/div/p'

# get the response object
response = requests.get(url)

# get the response body as a byte string
byte_data = response.content

# parse the byte string into an element tree
source_code = html.fromstring(byte_data)

# select the elements matching the XPath (returns a list)
tree = source_code.xpath(path)

# print the text of the first element in the list
print(tree[0].text_content())
 
 

The code above scrapes the paragraph of the first article on the GeeksforGeeks homepage.
A sample output follows. Your output may differ, because the homepage content changes over time. If the XPath no longer matches anything, xpath() returns an empty list and tree[0] raises an IndexError.

Output :  

"Consider the following C/C++ programs and try to guess the output? Output of all of the above programs is unpredictable (or undefined). The compilers (implementing… Read More »"
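
Since the page content changes, an XPath tied to a specific post id can stop matching. The following is a hedged sketch of a more defensive variant, not the article's own method: `first_match_text` and `fetch_bytes` are hypothetical helper names, the 10-second timeout is an arbitrary choice, and the `sample` markup is made up so the parsing part can run offline.

```python
from lxml import html


def first_match_text(byte_data, path):
    """Return the stripped text of the first element matching path, or None."""
    document = html.fromstring(byte_data)
    matches = document.xpath(path)
    if not matches:
        return None
    return matches[0].text_content().strip()


def fetch_bytes(url, timeout=10):
    """Download url and return the raw body; raises on HTTP error statuses."""
    import requests  # imported here so the parsing helper works offline

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content


# Offline demonstration with hypothetical markup.
sample = (b'<html><body><div id="post-1"><div>'
          b'<p> Hi there </p></div></div></body></html>')
found = first_match_text(sample, '//*[@id="post-1"]/div/p')
missing = first_match_text(sample, '//*[@id="post-183376"]/div/p')
print(found)    # Hi there
print(missing)  # None
```

Against a live page the two helpers would be combined as first_match_text(fetch_bytes(url), path), with a None result signalling that the XPath needs updating.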

Here is another example, which scrapes the first paragraph of the Wikipedia article on web scraping.

import requests
from lxml import html

# URL to scrape data from
link = 'https://en.wikipedia.org/wiki/Web_scraping'

# XPath to the particular element
path = '//*[@id="mw-content-text"]/div/p[1]'

response = requests.get(link)
byte_string = response.content

# parse the byte string into an element tree
source_code = html.fromstring(byte_string)

# select the elements matching the XPath (returns a list)
tree = source_code.xpath(path)

# print the text of the first element in the list
print(tree[0].text_content())
 
 

Output : 

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites.[1] Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automate processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis. 
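
The output above still contains a footnote marker such as [1], and on some pages the first p element may be empty. The sketch below shows one possible way to handle both, assuming made-up markup that imitates such a page; the regular expression and the use of normalize-space() are illustrative choices rather than part of the original example.

```python
import re

from lxml import html

# Hypothetical markup imitating an article body whose first <p> is empty.
sample = b"""
<html><body>
  <div id="mw-content-text"><div>
    <p class="empty"> </p>
    <p>Web scraping is data scraping used for extracting data from websites.<sup>[1]</sup> It can be automated.</p>
  </div></div>
</body></html>
"""

document = html.fromstring(sample)

# p[normalize-space()] keeps only paragraphs that contain visible text.
paragraphs = document.xpath(
    '//*[@id="mw-content-text"]/div/p[normalize-space()]'
)
raw_text = paragraphs[0].text_content()

# Drop bracketed numeric footnote markers such as [1] or [23].
clean_text = re.sub(r'\[\d+\]', '', raw_text).strip()
print(clean_text)
```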
 

 



author
mohit_negi
Article Tags :
  • Python
  • Python-Miscellaneous
  • python-utility
Practice Tags :
  • python
