How to extract images from PDF in Python?

Last Updated : 09 Sep, 2024

This article shows how to extract images from a PDF file in Python, and how to convert an image to a PDF and a PDF to images.

To extract the images from PDF files and save them, we use the PyMuPDF library. First, install PyMuPDF and Pillow using pip.

pip install PyMuPDF Pillow

PyMuPDF is used to access PDF files. To extract images from a PDF file, follow the steps below:

  • Import the necessary libraries
  • Specify the path of the file from which you want to extract images and open it
  • Iterate through all the pages of the PDF and get the images present on every page
  • Use the get_images() method to get all image objects on a page as a list of tuples
  • To get the image bytes along with additional information about the image, use extract_image()

Note: The script assumes a PDF file at the path assigned to file; replace it with the path to your own PDF.

Implementation:

Python
# STEP 1
# import libraries
import fitz  # PyMuPDF
import io
from PIL import Image

# STEP 2
# file path you want to extract images from
file = "/content/pdf_file.pdf"

# open the file
pdf_file = fitz.open(file)

# STEP 3
# iterate over PDF pages
for page_index in range(len(pdf_file)):

    # get the page itself
    page = pdf_file.load_page(page_index)  # load the page
    image_list = page.get_images(full=True)  # get images on the page

    # printing number of images found in this page
    if image_list:
        print(f"[+] Found a total of {len(image_list)} images on page {page_index}")
    else:
        print("[!] No images found on page", page_index)

    for image_index, img in enumerate(image_list, start=1):
        # get the XREF of the image
        xref = img[0]

        # extract the image bytes
        base_image = pdf_file.extract_image(xref)
        image_bytes = base_image["image"]

        # get the image extension
        image_ext = base_image["ext"]

        # save the image
        image_name = f"image{page_index+1}_{image_index}.{image_ext}"
        with open(image_name, "wb") as image_file:
            image_file.write(image_bytes)
            print(f"[+] Image saved as {image_name}")

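Output: For each page, the script prints how many images were found and saves every image in the working directory as image<page>_<index>.<ext>.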
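
A PDF can reference the same image from several pages, in which case the script above writes it more than once. The following is a minimal sketch, not part of the original script, that wraps the extraction in a function and skips any xref it has already saved. The names extract_unique_images, image_filename and out_dir are hypothetical, and fitz is imported inside the function so that the naming helper can be used on its own.

```python
from pathlib import Path


def image_filename(page_number, image_index, ext):
    """Build the same file name pattern as the script above (1-based page)."""
    return f"image{page_number}_{image_index}.{ext}"


def extract_unique_images(pdf_path, out_dir="."):
    """Save each embedded image once, even if several pages reference it."""
    import fitz  # PyMuPDF; imported here so image_filename works without it

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    seen_xrefs = set()
    saved = []

    pdf_file = fitz.open(pdf_path)
    try:
        for page_index in range(len(pdf_file)):
            page = pdf_file.load_page(page_index)
            for image_index, img in enumerate(page.get_images(full=True), start=1):
                xref = img[0]
                if xref in seen_xrefs:
                    continue  # already written from an earlier page
                seen_xrefs.add(xref)

                base_image = pdf_file.extract_image(xref)
                name = image_filename(page_index + 1, image_index, base_image["ext"])
                target = out_dir / name
                target.write_bytes(base_image["image"])
                saved.append(target)
    finally:
        pdf_file.close()
    return saved
```

Calling extract_unique_images("/content/pdf_file.pdf", "extracted") would return the list of paths that were written, which makes it easy to count or post-process the results.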

Image to PDF and PDF to Image Conversion:

Image to PDF Conversion

Note: The example assumes an image file named image.jpeg in the working directory.

Python
import fitz

doc = fitz.open()
imgdoc = fitz.open('image.jpeg')  # open image
pdfbytes = imgdoc.convert_to_pdf()
imgpdf = fitz.open("pdf", pdfbytes)
doc.insert_pdf(imgpdf)
doc.save('imagetopdf.pdf')  # save file


First, we open a blank document and then open the image.

The image is converted to PDF using the convert_to_pdf() method, which returns the PDF data as bytes.

After conversion, the resulting PDF is opened from those bytes and appended with insert_pdf() to the empty document created at the start. Finally, the document is saved.

Output: The script writes imagetopdf.pdf, a PDF containing the image.
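
The same calls extend to several images. The sketch below is an illustration under the assumption that every input file is an image PyMuPDF can open; images_to_pdf and its parameters are hypothetical names, not part of the article's script. Each image is converted with convert_to_pdf() and appended to one output document.

```python
def images_to_pdf(image_paths, output_path):
    """Combine several image files into a single PDF."""
    if not image_paths:
        # With no input there would be no pages to save, so fail early
        raise ValueError("image_paths must contain at least one file")

    import fitz  # PyMuPDF; imported after the input check

    doc = fitz.open()  # new, empty PDF
    try:
        for path in image_paths:
            imgdoc = fitz.open(path)             # open the image as a document
            pdfbytes = imgdoc.convert_to_pdf()   # PDF data as bytes
            imgdoc.close()
            imgpdf = fitz.open("pdf", pdfbytes)  # reopen those bytes as a PDF
            doc.insert_pdf(imgpdf)               # append its page(s)
            imgpdf.close()
        doc.save(output_path)
    finally:
        doc.close()
```

For example, images_to_pdf(["image.jpeg", "second.jpeg"], "imagetopdf.pdf") would place the images in the PDF in the order given; second.jpeg is a placeholder file name.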

PDF to Image Conversion

Note: We are using sample.pdf for PDF to image conversion; to get the PDF, use the link below.

https://www.africau.edu/images/default/sample.pdf - sample.pdf 

Python
import fitz

doc = fitz.open('sample.pdf')
for page in doc:
    # JPEG cannot store transparency, so render without an alpha channel
    pix = page.get_pixmap(matrix=fitz.Identity, dpi=None,
                          colorspace=fitz.csRGB, clip=None, alpha=False, annots=True)
    pix.save("samplepdfimage-%i.jpg" % page.number)  # save file


The get_pixmap() method renders each page to a pixmap (a raster image), which is then saved to an image file.

Output:

The sample.pdf is a two-page document, so two separate images are created.
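
With the identity matrix, each page is rendered at one pixel per PDF point, that is 72 pixels per inch, which can look coarse. A common approach, sketched here as an assumption rather than the article's method, is to scale the page with fitz.Matrix. The helper names zoom_for_dpi and render_pages, and the PNG output pattern, are hypothetical.

```python
def zoom_for_dpi(dpi):
    """Return the scale factor that maps 72 points per inch to the given DPI."""
    return dpi / 72


def render_pages(pdf_path, dpi=144, pattern="samplepdfimage-%i.png"):
    """Render every page of a PDF to a PNG file at the requested resolution."""
    import fitz  # PyMuPDF; imported here so zoom_for_dpi works without it

    zoom = zoom_for_dpi(dpi)
    matrix = fitz.Matrix(zoom, zoom)  # scale x and y by the same factor
    names = []
    doc = fitz.open(pdf_path)
    try:
        for page in doc:
            pix = page.get_pixmap(matrix=matrix, alpha=False)
            name = pattern % page.number  # page.number is 0-based
            pix.save(name)
            names.append(name)
    finally:
        doc.close()
    return names
```

Calling render_pages("sample.pdf", dpi=144) would double the pixel width and height of each rendered page compared with the identity matrix.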


devanshigupta1304