Python Web Scraping Tutorial

Last Updated : 02 Jan, 2025

In today’s digital world, data is the key to unlocking valuable insights, and much of this data is available on the web. But how do you gather large amounts of data from websites efficiently? That’s where Python web scraping comes in. Web scraping is the process of extracting data from websites programmatically, and it has become a standard technique for collecting information from across the internet.

In this tutorial, we’ll explore the Python libraries and modules commonly used for web scraping and look at why Python 3 is a popular choice for the task. Along the way, you will see how to use tools such as Requests, BeautifulSoup, Selenium, and lxml to fetch and parse web pages.

Essential Packages and Tools for Python Web Scraping

Python offers a rich set of libraries for web scraping, covering everything from sending HTTP requests to parsing HTML and automating a browser.

Table of Content

  • Requests Module
  • BeautifulSoup Library
  • Selenium
  • Lxml
  • Urllib Module
  • PyautoGUI
  • Schedule
  • Why Python3 for Web Scraping?

Requests Module

The requests library is used to make an HTTP request to a given URL and return the response. It provides built-in functionality for managing both the request and the response.

pip install requests

Example: Making a Request

The requests module has built-in methods for making HTTP requests to a specified URI using GET, POST, PUT, PATCH, DELETE, or HEAD. An HTTP request is meant either to retrieve data from a specified URI or to push data to a server; HTTP works as a request-response protocol between a client and a server. Here we will use a GET request, which retrieves information from the server at the given URI. Any parameters sent with a GET request are URL-encoded and appended to the URL as a query string.

Python
import requests

# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')

# Check the status of the response received
# (prints <Response [200]> on success)
print(r)

# Print the raw content of the response
print(r.content)

Output

Python requests making GET request
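
To make the point about GET parameters concrete, here is a minimal sketch that builds a GET request without sending it, so the encoded query string can be inspected. The base URL and the parameter names are placeholders chosen for illustration; they are not part of the example above.

```python
import requests

# Placeholder endpoint and parameters (assumptions for illustration only)
base_url = "https://example.com/search"
params = {"q": "web scraping", "page": 2}

# Build the request without sending it, then look at the final URL
prepared = requests.Request("GET", base_url, params=params).prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://example.com/search?q=web+scraping&page=2
```

Passing the same dictionary as requests.get(base_url, params=params) would send a request to that encoded URL.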

For more information, refer to our Python Requests Tutorial.

BeautifulSoup Library

Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. It doesn’t take much code to write an application.

Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don’t have to think about encodings unless the document doesn’t specify one and Beautiful Soup can’t detect it; in that case you just have to specify the original encoding. Beautiful Soup sits on top of popular Python parsers such as lxml and html5lib, allowing you to try different parsing strategies or trade speed for flexibility.

pip install beautifulsoup4
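
The encoding behaviour described above can be seen on a small byte string. This is only a sketch: the markup is invented for illustration, and from_encoding is used to tell Beautiful Soup the original encoding because the snippet does not declare one.

```python
from bs4 import BeautifulSoup

# Bytes encoded as Latin-1, with no encoding declared in the markup (made-up sample)
raw = "<p>café</p>".encode("latin-1")

# Tell Beautiful Soup the original encoding explicitly
soup = BeautifulSoup(raw, "html.parser", from_encoding="latin-1")

# Inside the parse tree, text is Unicode
text = soup.p.get_text()
print(text)

# On output, the default encoding is UTF-8
encoded = soup.p.encode()
print(encoded)
```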

Example

  1. Importing Libraries: The code imports the requests library for making HTTP requests and the BeautifulSoup class from the bs4 library for parsing HTML.
  2. Making a GET Request: It sends a GET request to ‘https://www.geeksforgeeks.org/python-programming-language/’ and stores the response in the variable r.
  3. Checking the Response: It prints the response object, which shows the status code (<Response [200]> on success).
  4. Parsing the HTML : The HTML content of the response is parsed using BeautifulSoup and stored in the variable soup.
  5. Printing the Prettified HTML: It prints the prettified version of the parsed HTML content for readability and analysis.
Python
import requests
from bs4 import BeautifulSoup

# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')

# Check the status of the response received
# (prints <Response [200]> on success)
print(r)

# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())

Output

Python BeautifulSoup Parsing HTML

Finding Elements by Class

Now let’s extract some useful data from the HTML content. The soup object holds the whole document as a nested structure that can be searched programmatically. The page we want to scrape contains a lot of text, so let’s scrape all of it. First, let’s inspect the page.


[Image: browser inspector view of the page being scraped]


In the above image, we can see that all the content of the page is under the div with class entry-content. We will use the find() method, which returns the first tag that matches the given name and attributes. In our case, it will find the div whose class is entry-content.

The text of the page sits in <p> tags, so next we need every p tag inside that div. For this we can call find_all() on the div returned by find(), which limits the search to that element.

Python
import requests
from bs4 import BeautifulSoup

# Making a GET request
r = requests.get('https://www.geeksforgeeks.org/python-programming-language/')

# Parsing the HTML
soup = BeautifulSoup(r.content, 'html.parser')

# Find the div that wraps the article content
s = soup.find('div', class_='entry-content')

# Find all <p> tags inside that div (not in the whole page);
# fall back to an empty list if the div is not present
content = s.find_all('p') if s else []

print(content)

Output:

find_all bs4
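
The difference between searching the whole document and searching inside one element is easier to see on a small, self-contained document. The HTML below is invented for illustration; only the class name entry-content comes from the example above.

```python
from bs4 import BeautifulSoup

# Made-up page: one paragraph outside the content div, two inside it
html_doc = """
<html><body>
  <p>Site-wide banner outside the article.</p>
  <div class="entry-content">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find() returns the first matching tag, or None if nothing matches
container = soup.find("div", class_="entry-content")

# find_all() on the container only searches inside that div
paragraphs = [p.get_text(strip=True) for p in container.find_all("p")] if container else []

# find_all() on the soup searches the whole document
all_paragraphs = soup.find_all("p")

print(paragraphs)
print(len(all_paragraphs))
```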

For more information, refer to our Python BeautifulSoup.

Selenium

Selenium is a popular Python module used for automating web browsers. It allows developers to control web browsers programmatically, enabling tasks such as web scraping, automated testing, and web application interaction. Selenium supports various web browsers, including Chrome, Firefox, Safari, and Edge, making it a versatile tool for browser automation.

Example 1: For Firefox

In this specific example, we’re directing the browser to the Google search page with the query parameter “geeksforgeeks”. The browser will load this page, and we can then proceed to interact with it programmatically using Selenium. This interaction could involve tasks like extracting search results, clicking on links, or scraping specific content from the page.

Python
# import webdriver
from selenium import webdriver

# create webdriver object
driver = webdriver.Firefox()

# open the Google search results for "geeksforgeeks"
driver.get("https://google.co.in/search?q=geeksforgeeks")

Output

for-firefox
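
A URL passed to the browser must not contain spaces, so search terms need to be encoded before they go into the query string. One way to do that, sketched here, is urlencode from the standard library. The helper names build_search_url and open_search are hypothetical, and Selenium is imported inside open_search so that the URL helper can be tried without a browser installed.

```python
from urllib.parse import urlencode


def build_search_url(query, base="https://google.co.in/search"):
    """Return a search URL with the query safely encoded."""
    return f"{base}?{urlencode({'q': query})}"


def open_search(query):
    """Open the search page in Firefox and return the page title."""
    from selenium import webdriver  # needs Selenium and Firefox installed

    driver = webdriver.Firefox()
    try:
        driver.get(build_search_url(query))
        return driver.title
    finally:
        # Always end the browser session, even if loading fails
        driver.quit()


print(build_search_url("python web scraping"))
```

Calling open_search("geeksforgeeks") starts a browser window, so it is left out of the snippet.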

Example 2: For Chrome

  1. We import webdriver and the By locator class from Selenium, and ChromeDriverManager from the webdriver_manager package (pip install webdriver-manager), which downloads a matching Chrome driver so that no path has to be set by hand.
  2. We create one Chrome browser instance, passing the driver path to it through a Service object.
  3. We loop over the first two result pages and load each one by calling get() with the page URL.
  4. On each page, we collect the elements with the class names title, price, description, and ratings using find_elements().
  5. We read the text attribute of each element and append one [title, price, description, rating] row per product to element_list.
  6. Finally, we print the collected rows and close the browser using the quit() method.
Python
# importing necessary packages
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# for holding the resultant list
element_list = []

# start one browser session and reuse it for every page
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

for page in range(1, 3):
    page_url = "https://webscraper.io/test-sites/e-commerce/static/computers/laptops?page=" + str(page)
    driver.get(page_url)
    title = driver.find_elements(By.CLASS_NAME, "title")
    price = driver.find_elements(By.CLASS_NAME, "price")
    description = driver.find_elements(By.CLASS_NAME, "description")
    rating = driver.find_elements(By.CLASS_NAME, "ratings")

    for i in range(len(title)):
        element_list.append([title[i].text, price[i].text, description[i].text, rating[i].text])

print(element_list)

# closing the browser and ending the driver session
driver.quit()
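
The inner loop above indexes four separate lists by position, which raises an IndexError if one list turns out shorter than the list of titles. A small sketch of an alternative is to combine the columns with zip. The helper name rows_from_columns is hypothetical, and the strings below are made-up stand-ins for the .text values Selenium would return.

```python
def rows_from_columns(titles, prices, descriptions, ratings):
    """Combine parallel columns into rows, stopping at the shortest column."""
    return [list(row) for row in zip(titles, prices, descriptions, ratings)]


# Made-up sample values; note that one description is missing
titles = ["Laptop A", "Laptop B", "Laptop C"]
prices = ["$295.99", "$299.00", "$306.99"]
descriptions = ["15.6 inch", "14 inch"]
ratings = ["3 reviews", "8 reviews", "1 reviews"]

rows = rows_from_columns(titles, prices, descriptions, ratings)
print(rows)
```

zip avoids the crash but cannot tell which product is missing a field, so rows may still be misaligned. Locating each product card first and reading its fields from inside that card is usually the more robust design.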

For more information, refer to our Python Selenium.

Lxml

The lxml module is a powerful Python library for processing XML and HTML documents. It provides high-performance XML and HTML parsing along with a simple, Pythonic API, and it is widely used for web scraping because of its speed, flexibility, and ease of use.

pip install lxml

Example

Here’s a simple example demonstrating how to use the lxml module for Python web scraping:

  1. We import the html module from lxml along with the requests module for sending HTTP requests.
  2. We define the URL of the website we want to scrape.
  3. We send an HTTP GET request to the website using the requests.get() function and retrieve the HTML content of the page.
  4. We parse the HTML content using the html.fromstring() function from lxml, which returns an HTML element tree.
  5. We use XPath expressions to extract specific elements from the HTML tree. In this case, we’re extracting the text content of all the <a> (anchor) elements on the page.
  6. We iterate over the extracted link titles and print them out.
Python
from lxml import html
import requests

# Define the URL of the website to scrape
url = 'https://example.com'

# Send an HTTP request to the website and retrieve the HTML content
response = requests.get(url)

# Parse the HTML content using lxml
tree = html.fromstring(response.content)

# Extract specific elements from the HTML tree using XPath
# For example, let's extract the titles of all the links on the page
link_titles = tree.xpath('//a/text()')

# Print the extracted link titles
for title in link_titles:
    print(title)

Output

More information...
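
XPath can select attributes as well as text. The following sketch runs on an inline HTML string, so it needs no network access; the markup and link targets are invented for illustration.

```python
from lxml import html

# Made-up document with two links
snippet = """
<html><body>
  <ul>
    <li><a href="/python">Python</a></li>
    <li><a href="/selenium">Selenium</a></li>
  </ul>
</body></html>
"""

tree = html.fromstring(snippet)

# Select the href attribute of every <a> element
hrefs = tree.xpath("//a/@href")

# Pair each link's text with its target
links = [(a.text_content().strip(), a.get("href")) for a in tree.xpath("//a")]

print(hrefs)
print(links)
```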

Urllib Module

The urllib module in Python is a built-in library that provides functions for working with URLs. It allows you to interact with web pages by fetching URLs (Uniform Resource Locators), opening and reading data from them, and performing other URL-related tasks like encoding and parsing. Urllib is a package that collects several modules for working with URLs, such as:

  • urllib.request for opening and reading URLs
  • urllib.parse for parsing URLs
  • urllib.error for the exceptions raised
  • urllib.robotparser for parsing robots.txt files
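
As a rough illustration of two of these modules, the sketch below uses urllib.parse to split, join, and encode URLs, and urllib.robotparser to check rules from a robots.txt file. The URLs and the robots.txt lines are made up for the example, and the rules are parsed from a list rather than downloaded.

```python
from urllib.parse import urlencode, urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Placeholder URL (assumption for illustration)
page_url = "https://example.com/articles/index.html?page=2"

# Split a URL into its components
parts = urlparse(page_url)
print(parts.netloc, parts.path, parts.query)

# Resolve a relative link against the page it was found on
next_page = urljoin(page_url, "next.html")
print(next_page)

# Encode a dictionary as a query string
query = urlencode({"q": "web scraping", "page": 3})
print(query)

# Parse made-up robots.txt rules and check two URLs against them
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
allowed_public = robots.can_fetch("*", "https://example.com/articles/")
allowed_private = robots.can_fetch("*", "https://example.com/private/data")
print(allowed_public, allowed_private)
```

For a live site, RobotFileParser also offers set_url() and read() to download the real robots.txt file instead of parsing a list of lines.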

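urllib is part of the Python standard library, so there is nothing to install. Note that urllib3, which can be installed with pip install urllib3, is a separate third-party package and is not required for the example below.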

Example

Here’s a simple example demonstrating how to use the urllib module to fetch the content of a web page:

  1. We define the URL of the web page we want to fetch.
  2. We use urllib.request.urlopen() function to open the URL and obtain a response object.
  3. We read the content of the response object using the read() method.
  4. Since the content is returned as bytes, we decode it to a string using the decode() method with ‘utf-8’ encoding.
  5. Finally, we print the HTML content of the web page.
Python
import urllib.request

# URL of the web page to fetch
url = 'https://www.example.com'

try:
    # Open the URL and read its content
    response = urllib.request.urlopen(url)

    # Read the content of the response
    data = response.read()

    # Decode the data (if it's in bytes) to a string
    html_content = data.decode('utf-8')

    # Print the HTML content of the web page
    print(html_content)

except Exception as e:
    print("Error fetching URL:", e)

Output

uutt
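
Not every page is encoded as UTF-8, so a slightly more defensive variant reads the character set from the response headers and falls back to UTF-8 only when none is given. This is a sketch under stated assumptions: fetch_text is a hypothetical helper, the User-Agent string is a placeholder, and the demo uses a data: URL (which urllib can open without network access) in place of a real web page.

```python
import urllib.request
from urllib.parse import quote


def fetch_text(url, timeout=10):
    """Fetch a URL and decode the body using the charset the response declares."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (tutorial example)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the declared charset if there is one, otherwise assume UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Offline stand-in for a web page that is encoded as ISO-8859-1
demo_url = "data:text/html;charset=iso-8859-1," + quote(
    "<p>café</p>", encoding="iso-8859-1"
)
page_text = fetch_text(demo_url)
print(page_text)
```

Replacing demo_url with an https:// address fetches a real page in the same way.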

PyautoGUI

The pyautogui module in Python is a cross-platform GUI automation library that enables developers to control the mouse and keyboard to automate tasks. While it’s not specifically designed for web scraping, it can be used in conjunction with other web scraping libraries like Selenium to interact with web pages that require user input or simulate human actions.

pip3 install pyautogui

Example

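In this example, pyautogui moves the mouse pointer to two hard-coded screen positions, taking one second for each move, and simulates a click at each position. The coordinates are specific to the screen the script was written on, so adjust them to match the element you want to click, such as a search field or a button in a browser window.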

Python
import pyautogui

# moves to (519, 1060) in 1 sec
pyautogui.moveTo(519, 1060, duration=1)

# simulates a click at the present
# mouse position
pyautogui.click()

# moves to (1717, 352) in 1 sec
pyautogui.moveTo(1717, 352, duration=1)

# simulates a click at the present
# mouse position
pyautogui.click()
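
Hard-coded coordinates such as (1717, 352) may fall outside a smaller screen. The sketch below keeps a target inside the current screen before clicking, and also shows scrolling and taking a screenshot. The function names and the file name page.png are hypothetical, and pyautogui is imported inside the functions because it needs a desktop session to load.

```python
def clamp_to_screen(x, y, width, height):
    """Return (x, y) moved inside a screen of the given width and height."""
    return (min(max(x, 0), width - 1), min(max(y, 0), height - 1))


def click_at(x, y, duration=1):
    """Move the pointer to (x, y), kept on screen, and click."""
    import pyautogui  # needs a desktop session

    width, height = pyautogui.size()
    safe_x, safe_y = clamp_to_screen(x, y, width, height)
    pyautogui.moveTo(safe_x, safe_y, duration=duration)
    pyautogui.click()


def scroll_and_capture(path="page.png", clicks=-500):
    """Scroll down (negative clicks) and save a screenshot to path."""
    import pyautogui  # needs a desktop session

    pyautogui.scroll(clicks)
    pyautogui.screenshot(path)


# The second target from the example above, on a 1366x768 screen
print(clamp_to_screen(1717, 352, 1366, 768))
```

Calling click_at(1717, 352) or scroll_and_capture() moves the real mouse pointer, so these calls are left out of the snippet.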

Schedule

The schedule module in Python is a simple library that allows you to schedule Python functions to run at specified intervals. It’s particularly useful in web scraping when you need to scrape data from a website regularly, such as hourly, daily, or weekly.

pip install schedule

Example

  • We import the schedule and time modules.
  • We define a function func() that prints "Geeksforgeeks". In a scraper, this function would contain the scraping task, for example sending a GET request, parsing the HTML with BeautifulSoup, and printing the extracted data.
  • We schedule func() to run every minute using schedule.every(1).minutes.do(func). To run a task hourly instead, use schedule.every().hour.do(func).
  • We enter a main loop that continuously checks for pending scheduled tasks using schedule.run_pending() and sleeps for 1 second between iterations to prevent the loop from consuming too much CPU.
Python
import schedule
import time

def func():
    print("Geeksforgeeks")

schedule.every(1).minutes.do(func)

while True:
    schedule.run_pending()
    time.sleep(1)

Why Python3 for Web Scraping?

Python’s popularity for web scraping stems from several factors:

Ease of Use : Python’s clean and readable syntax makes it easy to understand and write code, even for beginners. This simplicity accelerates the development process and reduces the learning curve for web scraping tasks.

Rich Ecosystem : Python boasts a vast ecosystem of libraries and frameworks tailored for web scraping. Libraries like Requests, BeautifulSoup, and Scrapy simplify fetching pages and parsing HTML, which takes much of the work out of data extraction.

Versatility : Python is a versatile language that can be used for a wide range of tasks beyond web scraping. Its flexibility allows developers to integrate web scraping seamlessly into larger projects, such as data analysis, machine learning, or web development.

Community Support : Python has a large and active community of developers who contribute to its libraries and provide support through forums, tutorials, and documentation. This wealth of resources ensures that developers have access to assistance and guidance when tackling web scraping challenges.

Conclusion

This tutorial has shown you the basics of using Python for web scraping. With the tools we’ve discussed, you can start collecting data from the web quickly. Whether you need this data for a project, research, or just for fun, Python makes it possible. Remember to always scrape data responsibly and follow the rules set by websites. If you’re excited to learn more about Python and web scraping, check out our Python Course. It’s a great resource to deepen your understanding and enhance your skills, all while having fun exploring the power of Python.

Python Web Scraping – FAQs

1. What is Python web scraping?

Python web scraping refers to the process of extracting data from websites using Python programming. It involves fetching HTML content from a web page and parsing it to gather specific information.

2. Is web scraping legal?

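Whether web scraping is legal depends on what you scrape, how you use the data, and the laws that apply to you. As a baseline, comply with the website’s terms of service, avoid scraping personal or sensitive data, and check the site’s robots.txt file to see which paths the site asks crawlers not to visit.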

3. What is the difference between BeautifulSoup and Scrapy?

BeautifulSoup is a simpler library for beginners focused on HTML parsing and extraction, whereas Scrapy is a more advanced web scraping framework that can handle complex tasks like crawling large datasets or handling pagination automatically.

4. What are some common use cases for Python web scraping?

Common use cases include extracting data for price comparison, content aggregation, job listings, real estate data, and sentiment analysis. Web scraping helps gather structured data from websites for various business and research purposes.





author
abhishek1
Article Tags :
  • AI-ML-DS
  • Python
  • Web-scraping
Practice Tags :
  • python

Similar Reads

  • Python Web Scraping Tutorial
    In today’s digital world, data is the key to unlocking valuable insights, and much of this data is available on the web. But how do you gather large amounts of data from websites efficiently? That’s where Python web scraping comes in.Web scraping, the process of extracting data from websites, has em
    12 min read
  • Introduction to Web Scraping

    • Introduction to Web Scraping
      Web scraping is a technique to fetch data from websites. While surfing on the web, many websites prohibit the user from saving data for personal use. This article will brief you about What is Web Scraping, Uses, Techniques, Tools, and challenges of Web Scraping. Table of Content What is Web Scraping
      6 min read

    • What is Web Scraping and How to Use It?
      Suppose you want some information from a website. Let’s say a paragraph on Donald Trump! What do you do? Well, you can copy and paste the information from Wikipedia into your file. But what if you want to get large amounts of information from a website as quickly as possible? Such as large amounts o
      7 min read

    • Web Scraping - Legal or Illegal?
      If you're connected with the term 'Web Scraping' anyhow, then you must come across a question - Is Web Scraping legal or illegal? Okay, so let's discuss it. If you look closely, you will find out that in today's era the biggest asset of any business is Data! Even the top giants like Facebook, Amazon
      5 min read

    • Difference between Web Scraping and Web Crawling
      1. Web Scraping : Web Scraping is a technique used to extract a large amount of data from websites and then saving it to the local machine in the form of XML, excel or SQL. The tools used for web scraping are known as web scrapers. On the basis of the requirements given, they can extract the data fr
      2 min read

    • Web Scraping using cURL in PHP
      We all have tried getting data from a website in many ways. In this article, we will learn how to web scrape using bots to extract content and data from a website. We will use PHP cURL to scrape a web page, it looks like a typo from leaving caps lock on, but that’s really how you write it. cURL is t
      2 min read

    Basics of Web Scraping

    • HTML Basics
      HTML (HyperText Markup Language) is the standard markup language for creating and structuring web pages. It defines the structure of a webpage using elements and tags.HTML is responsible for displaying text, images, and other content.It serves as the foundation for building websites and web applicat
      6 min read

    • Tags vs Elements vs Attributes in HTML
      In HTML, tags represent the structural components of a document, such as <h1> for headings. Elements are formed by tags and encompass both the opening and closing tags along with the content. Attributes provide additional information or properties to elements, enhancing their functionality or
      2 min read

    • CSS Introduction
      CSS (Cascading Style Sheets) is a language designed to simplify the process of making web pages presentable. It allows you to apply styles to HTML documents by prescribing colors, fonts, spacing, and positioning.The main advantages are the separation of content (in HTML) and styling (in CSS) and the
      5 min read

    • CSS Syntax
      CSS (Cascading Style Sheets) is a stylesheet language used to describe the presentation of a document written in HTML. Understanding CSS syntax is fundamental for creating visually appealing and well-structured web pages. Basic CSS SyntaxCSS is written as rulesets. A ruleset consists of a selector a
      6 min read

    • JavaScript Cheat Sheet - A Basic Guide to JavaScript
      JavaScript is a lightweight, open, and cross-platform programming language. It is omnipresent in modern development and is used by programmers across the world to create dynamic and interactive web content like applications and browsers JavaScript (JS) is a versatile, high-level programming language
      15+ min read

    Setting Up the Environment

    • Installing BeautifulSoup: A Beginner's Guide
      BeautifulSoup is a Python library that makes it easy to extract data from HTML and XML files. It helps you find, navigate, and change the information in these files quickly and simply. It’s a great tool that can save you a lot of time when working with web data. The latest version of BeautifulSoup i
      2 min read

    • How to Install Requests in Python - For Windows, Linux, Mac
      Requests is an elegant and simple HTTP library for Python, built for human beings. One of the most famous libraries for Python is used by developers all over the world. This article revolves around how one can install the requests library of Python in Windows/ Linux/ macOS using pip. Table of Conten
      7 min read

    • Selenium Python Introduction and Installation
      Selenium's Python Module is built to perform automated testing with Python. Selenium in Python bindings provides a simple API to write functional/acceptance tests using Selenium WebDriver. Through Selenium Python API you can access all functionalities of python selenium webdriver intuitively. Table
      4 min read

    • How to Install Python Scrapy on Windows?
      Scrapy is a web scraping library that is used to scrape, parse and collect web data. Now once our spider has scrapped the data then it decides whether to: Keep the data.Drop the data or items.stop and store the processed data items. In this article, we will look into the process of installing the Sc
      2 min read

    Extracting Data from Web Pages

    • Implementing Web Scraping in Python with BeautifulSoup
      There are mainly two ways to extract data from a website: Use the API of the website (if it exists). For example, Facebook has the Facebook Graph API which allows retrieval of data posted on Facebook.Access the HTML of the webpage and extract useful information/data from it. This technique is called
      8 min read

    • How to extract paragraph from a website and save it as a text file?
      Perquisites: Beautiful soupUrllib Scraping is an essential technique which helps us to retrieve useful data from a URL or a html file that can be used in another manner. The given article shows how to extract paragraph from a URL and save it as a text file. Modules Needed bs4: Beautiful Soup(bs4) is
      2 min read

    • Extract all the URLs from the webpage Using Python
      Scraping is a very essential skill for everyone to get data from any website. In this article, we are going to write Python scripts to extract all the URLs from the website or you can save it as a CSV file. Module Needed: bs4: Beautiful Soup(bs4) is a Python library for pulling data out of HTML and
      2 min read

    • How to Scrape Nested Tags using BeautifulSoup?
      We can scrap the Nested tag in beautiful soup with help of. (dot) operator. After creating a soup of the page if we want to navigate nested tag then with the help of. we can do it. For scraping Nested Tag using Beautifulsoup follow the below-mentioned steps. Step-by-step Approach Step 1: The first s
      3 min read

    • Extract all the URLs that are nested within <li> tags using BeautifulSoup
      Beautiful Soup is a python library used for extracting html and xml files. In this article we will understand how we can extract all the URLSs from a web page that are nested within <li> tags. Module needed and installation:BeautifulSoup: Our primary module contains a method to access a webpag
      4 min read

    • Clean Web Scraping Data Using clean-text in Python
      If you like to play with API's or like to scrape data from various websites, you must've come around random annoying text, numbers, keywords that come around with data. Sometimes it can be really complicating and frustrating to clean scraped data to obtain the actual data that we want. In this artic
      2 min read

    Fetching Web Pages

    • GET and POST Requests Using Python
      This post discusses two HTTP (Hypertext Transfer Protocol) request methods  GET and POST requests in Python and their implementation in Python.  What is HTTP? HTTP is a set of protocols designed to enable communication between clients and servers. It works as a request-response protocol between a cl
      7 min read

    • BeautifulSoup - Scraping Paragraphs from HTML
      In this article, we will discuss how to scrap paragraphs from HTML using Beautiful Soup Method 1: using bs4 and urllib. Module Needed: bs4: Beautiful Soup(bs4) is a Python library for pulling data out of HTML and XML files. For installing the module-pip install bs4.urllib: urllib is a package that c
      3 min read

    HTTP Request Methods

    • GET method - Python requests
      Requests library is one of the important aspects of Python for making HTTP requests to a specified URL. This article revolves around how one can make GET request to a specified URL using requests.GET() method. Before checking out GET method, let's figure out what a GET request is - GET Http Method T
      2 min read

    • POST method - Python requests
      Requests library is one of the important aspects of Python for making HTTP requests to a specified URL. This article revolves around how one can make POST request to a specified URL using requests.post() method. Before checking out the POST method, let's figure out what a POST request is -   POST Ht
      2 min read

    • PUT method - Python requests
      The requests library is a powerful and user-friendly tool in Python for making HTTP requests. The PUT method is one of the key HTTP request methods used to update or create a resource at a specific URI. Working of HTTP PUT Method If the resource exists at the given URI, it is updated with the new da
      2 min read

    • DELETE method- Python requests
      Requests library is one of the important aspects of Python for making HTTP requests to a specified URL. This article revolves around how one can make DELETE request to a specified URL using requests.delete() method. Before checking out the DELETE method, let's figure out what a Http DELETE request i
      2 min read

    • HEAD method - Python requests
      Requests library is one of the important aspects of Python for making HTTP requests to a specified URL. This article revolves around how one can make HEAD request to a specified URL using requests.head() method. Before checking out the HEAD method, let's figure out what a Http HEAD request is - HEAD
      2 min read

    • PATCH method - Python requests
      Requests library is one of the important aspects of Python for making HTTP requests to a specified URL. This article revolves around how one can make PATCH request to a specified URL using requests.patch() method. Before checking out the PATCH method, let's figure out what a Http PATCH request is -
      3 min read

    Searching and Extract for specific tags Beautifulsoup

    • Python BeautifulSoup - find all class
      Prerequisite:- Requests , BeautifulSoup The task is to write a program to find all the classes for a given Website URL. In Beautiful Soup there is no in-built method to find all classes. Module needed: bs4: Beautiful Soup(bs4) is a Python library for pulling data out of HTML and XML files. This modu
      2 min read

    • BeautifulSoup - Search by text inside a tag
      Prerequisites: Beautifulsoup Beautifulsoup is a powerful python module used for web scraping. This article discusses how a specific text can be searched inside a given tag. INTRODUCTION: BeautifulSoup is a Python library for parsing HTML and XML documents. It provides a simple and intuitive API for
      4 min read

    • Scrape Google Search Results using Python BeautifulSoup
      In this article, we are going to see how to Scrape Google Search Results using Python BeautifulSoup. Module Needed:bs4: Beautiful Soup(bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built-in with Python. To install this type the below command in the te
      3 min read

    • Get tag name using Beautifulsoup in Python
      Prerequisite: Beautifulsoup Installation Name property is provided by Beautiful Soup which is a web scraping framework for Python. Web scraping is the process of extracting data from the website using automated tools to make the process faster. Name object corresponds to the name of an XML or HTML t
      1 min read

    • Extracting an attribute value with beautifulsoup in Python
      Prerequisite: Beautifulsoup Installation Attributes are provided by Beautiful Soup which is a web scraping framework for Python. Web scraping is the process of extracting data from the website using automated tools to make the process faster. A tag may have any number of attributes. For example, the
      2 min read

    • BeautifulSoup - Modifying the tree
      Prerequisites: BeautifulSoup Beautifulsoup is a Python library used for web scraping. This powerful python tool can also be used to modify html webpages. This article depicts how beautifulsoup can be employed to modify the parse tree. BeautifulSoup is used to search the parse tree and allow you to m
      5 min read

    • Find the text of the given tag using BeautifulSoup
      Web scraping is a process of using software bots called web scrapers in extracting information from HTML or XML content of a web page. Beautiful Soup is a library used for scraping data through python. Beautiful Soup works along with a parser to provide iteration, searching, and modifying the conten
      2 min read

    • Remove spaces from a string in Python
      Removing spaces from a string is a common task in Python that can be solved in multiple ways. For example, if we have a string like " g f g ", we might want the output to be "gfg" by removing all the spaces. Let's look at different methods to do so: Using replace() methodTo remove all spaces from a
      2 min read

    • Understanding Character Encoding
      Ever imagined how a computer is able to understand and display what you have written? Ever wondered what a UTF-8 or UTF-16 meant when you were going through some configurations? Just think about how "HeLLo WorlD" should be interpreted by a computer. We all know that a computer stores data in bits an
      6 min read

    • XML parsing in Python
      This article focuses on how one can parse a given XML file and extract useful data from it in a structured way. XML: XML stands for eXtensible Markup Language. It was designed to store and transport data, and to be both human- and machine-readable. That's why the design goals of X
      7 min read

    • Python - XML to JSON
      A JSON file is a file that stores simple data structures and objects in JavaScript Object Notation (JSON) format, which is a standard data interchange format. It is primarily used for transmitting data between a web application and a server. A JSON object contains data in the form of a key/value pai
      4 min read
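
      The XML parsing and XML-to-JSON topics listed above can be sketched with the standard library alone. This is a minimal illustration, not the method used in the linked articles: the sample document, the helper names `element_to_dict` and `xml_to_json`, and the "@" / "#text" key conventions are all assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are made up.
XML_TEXT = """
<catalog>
    <book id="b1"><title>Web Scraping Basics</title><price>10.50</price></book>
    <book id="b2"><title>Parsing XML</title><price>8.00</price></book>
</catalog>
"""


def element_to_dict(element):
    """Recursively convert an Element into plain dicts, lists and strings."""
    # Attributes are stored under "@name" keys so they cannot clash with child tags.
    node = {f"@{key}": value for key, value in element.attrib.items()}
    children = list(element)
    if not children:
        text = (element.text or "").strip()
        # A leaf without attributes collapses to its text content.
        return text if not node else {**node, "#text": text}
    for child in children:
        # Children with the same tag are collected into a list.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return an indented JSON string."""
    root = ET.fromstring(xml_text.strip())
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


if __name__ == "__main__":
    print(xml_to_json(XML_TEXT))
```

      Every child tag maps to a list, even when it occurs once, so the output shape stays predictable. Mixed content (text interleaved with child elements) is ignored in this sketch.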

    Scrapy Basics

    • Scrapy - Command Line Tools
      Prerequisite: Implementing Web Scraping in Python with Scrapy. Scrapy is a Python library that is used for web scraping and searching content throughout the web. It uses Spiders, which crawl through the page to find the content specified in the selectors. Hence, it is a very handy tool to
      5 min read

    • Scrapy - Item Loaders
      In this article, we are going to discuss Item Loaders in Scrapy. Scrapy is used for extracting data using spiders that crawl through the website. The obtained data can also be processed in the form of Scrapy Items. The Item Loaders play a significant role in parsing the data before populating
      15+ min read

    • Scrapy - Item Pipeline
      Scrapy is a web scraping library that is used to scrape, parse and collect web data. For all these functions there is a pipelines.py file, which handles scraped data through various components (classes) that are executed sequentially. In this article, we will be learning throug
      10 min read

    • Scrapy - Selectors
      Scrapy Selectors, as the name suggests, are used to select things. CSS also has selectors, which are used to select HTML tags and text and apply CSS effects to them. In Scrapy we use selectors to specify the part of the website which is to be scraped by our spiders
      7 min read

    • Scrapy - Shell
      Scrapy is a well-organized framework, used for large-scale web scraping. Using selectors, like XPath or CSS expressions, one can scrape data seamlessly. It allows systematic crawling, and scraping the data, and storing the content in different file formats. Scrapy comes equipped with a shell, that h
      9 min read

    • Scrapy - Spiders
      Scrapy is a free and open-source web-crawling framework written purely in Python. Thus, Scrapy can be installed and imported like any other Python package. The name of the package is self-explanatory. It is derived from the word 'scraping' which literally means extracting desired substance
      11 min read

    • Scrapy - Feed exports
      Scrapy is a fast, high-level web crawling and scraping framework written in Python, used to crawl websites and extract structured data from their pages. It can be used for many purposes, from data mining to monitoring and automated testing. This article is divided into 2 sections: Creating a Simple web
      5 min read

    • Scrapy - Link Extractors
      In this article, we are going to learn about Link Extractors in Scrapy. "LinkExtractor" is a class provided by Scrapy to extract links from the response we get while fetching a website. They are very easy to use, as we'll see in the post below. Basically using the "LinkExt
      5 min read

    • Scrapy - Settings
      Scrapy is an open-source framework built with Python. It provides a robust web crawling framework that can easily extract information from a web page with the help of selectors based on XPath. We can define the behavior of Scrapy components with the help of Scrapy
      7 min read

    • Scrapy - Sending an E-mail
      Prerequisites: Scrapy. Scrapy provides its own facility for sending e-mails, which is extremely easy to use. It is implemented using Twisted non-blocking IO to avoid interfering with the non-blocking IO of the crawler. This article discusses how mail can be sent using Scrapy. For this MailSender c
      2 min read

    • Scrapy - Exceptions
      Python-based Scrapy is a robust and adaptable web scraping platform. It provides a variety of tools for systematic, effective data extraction from websites. It helps us to automate data extraction from numerous websites. Scrapy describes the spider that browses websites and gathers dat
      7 min read

    Selenium Python Basics

    • Navigating links using get method - Selenium Python
      Selenium's Python module is built to perform automated testing with Python. Selenium Python bindings provide a simple API to write functional/acceptance tests using Selenium WebDriver. Through the Selenium Python API you can access all functionalities of Selenium WebDriver in an intuitive way. This art
      2 min read

    • Interacting with Webpage - Selenium Python
      Selenium’s Python module is designed for automating web testing tasks in Python. It provides a straightforward API through Selenium WebDriver, allowing you to write functional and acceptance tests. To open a webpage, you can use the get() method for navigation. However, the true power of Selenium li
      3 min read

    • Locating single elements in Selenium Python
      Locator strategies in Selenium Python are methods that are used to locate elements on the page and perform operations on them. Selenium’s Python module is built to perform automated testing with Python. Selenium Python bindings provide a simple API to write functional/acceptance tests using
      5 min read

    • Locating multiple elements in Selenium Python
      Locator strategies in Selenium Python are methods that are used to locate single or multiple elements on the page and perform operations on them. Selenium’s Python module is built to perform automated testing with Python. Selenium Python bindings provide a simple API to write functional/accep
      5 min read

    • Locator Strategies - Selenium Python
      Locator strategies in Selenium Python are methods that are used to locate elements on the page and perform operations on them. Selenium’s Python module is built to perform automated testing with Python. Selenium Python bindings provide a simple API to write functional/acceptance tests usin
      2 min read

    • Writing Tests using Selenium Python
      Selenium's Python module is built to perform automated testing with Python. Selenium Python bindings provide a simple API to write functional/acceptance tests using Selenium WebDriver. Through the Selenium Python API you can access all functionalities of Selenium WebDriver in an intuitive way. This art
      2 min read
