Introduction
Python is a versatile programming language, in large part because of its huge collection of libraries, modules, and packages for different fields. One of them is PyPDF2, a module that provides many functions for working with PDF files from Python.
PyPDF2 is the successor to pyPdf, which was released in 2005. The PyPDF2 module, released in 2016, adds some extra features on top of pyPdf.
In this tutorial, you will learn how to:
- Extract data from a PDF file.
- Rotate only one page of a PDF file.
- Rotate only specific pages of a PDF file.
- Split a PDF file.
- Merge more than one PDF file.
- Add a watermark to the PDFs.
- Encrypt a PDF file.
- Decrypt a PDF file.
Requirements
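Install PyPDF2. The programs below use the older names such as PdfFileReader and getPage, which PyPDF2 3.0 removed, so install a version below 3.0: pip install "PyPDF2<3.0"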
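Since the programs in this tutorial rely on the older camel-case names (PdfFileReader, getPage, and so on), and PyPDF2 3.0 removed those names, it helps to know which version is installed before running anything. Here is a minimal sketch that uses only the standard library. The helper names uses_legacy_api and installed_pypdf2_version are my own and are not part of PyPDF2.

```python
from importlib import metadata


def uses_legacy_api(version_string):
    """Return True if this PyPDF2 version still has PdfFileReader and friends.

    Assumption: the camel-case names were removed in PyPDF2 3.0, so any
    major version below 3 still provides them.
    """
    major = int(version_string.split(".")[0])
    return major < 3


def installed_pypdf2_version():
    """Return the installed PyPDF2 version string, or None if it is missing."""
    try:
        return metadata.version("PyPDF2")
    except metadata.PackageNotFoundError:
        return None


version = installed_pypdf2_version()
if version is None:
    print("PyPDF2 is not installed")
elif uses_legacy_api(version):
    print(f"PyPDF2 {version}: the code in this tutorial should run as written")
else:
    print(f"PyPDF2 {version}: PdfFileReader-style names are no longer available")
```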
How to read data from a PDF file
We will extract the text from one page of a PDF file using the Python program below.
Code
'''Extract the data'''
from PyPDF2 import PdfFileReader
# Open the PDF file in binary read mode ('rb')
pdfFile = open('LinuxForDevelopers.pdf', 'rb')
pdfReader = PdfFileReader(pdfFile)
# Page index 20 (indexes start at 0, so this is the 21st page)
thePage = pdfReader.getPage(20)
print(thePage.extractText())
print("Total Pages: ", pdfReader.numPages)
pdfFile.close()
Output
In the code above, the getPage(20) call selects the page from which the text is extracted. Page indexes start at 0, so index 20 is the 21st page of the file.
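Because getPage() counts pages from 0 while PDF viewers count from 1, it is easy to be off by one. The sketch below converts a viewer-style page number into an index and checks that it exists. It also opens the file in a with block so the file is closed even if extraction fails. The function names page_index and extract_page_text are hypothetical helpers of mine, and the PyPDF2 import assumes a version below 3.0.

```python
def page_index(page_number, total_pages):
    """Convert a 1-based page number to the 0-based index getPage() expects."""
    if not 1 <= page_number <= total_pages:
        raise ValueError(f"page {page_number} is outside 1..{total_pages}")
    return page_number - 1


def extract_page_text(path, page_number):
    """Return the text of one page, counting pages from 1."""
    # Imported here so page_index() can be used without PyPDF2 installed
    from PyPDF2 import PdfFileReader  # older API, PyPDF2 below 3.0

    with open(path, "rb") as pdf_file:  # closed automatically, even on errors
        reader = PdfFileReader(pdf_file)
        index = page_index(page_number, reader.numPages)
        return reader.getPage(index).extractText()


print(page_index(21, 30))  # 20, the index used in the program above
```

With this in place, extract_page_text('LinuxForDevelopers.pdf', 21) would read the same page as the original program.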
How to Rotate one page of a PDF file
Nowadays, we can easily turn photos into a PDF file by scanning them with a phone camera, and many applications are available for this task.
Suppose you are merging some pre-scanned images into one PDF file, but you captured one photo in landscape mode instead of portrait. That single wrongly oriented page can spoil the look of the whole PDF file.
In such a situation, it is tedious to fix just one or a few pages by hand. But don't worry, Python can do it in the blink of an eye. We'll write a Python program to fix this issue.
First, we will learn how to rotate a single page of a PDF file. To rotate more than one page, we need a loop that iterates over the page numbers, which the next section covers.
Code
'''Rotate only one page'''
from PyPDF2 import PdfFileReader, PdfFileWriter
pdfFile = 'LinuxForDevelopers.pdf'
pdfReader = PdfFileReader(pdfFile)
pdfWriter = PdfFileWriter()
resultPdf = open("result.pdf", 'wb')
# Page index 0 is the first page
thePage = pdfReader.getPage(0)
# For a clockwise rotation, use this instead: thePage.rotateClockwise(90)
thePage.rotateCounterClockwise(90)
# Only this one rotated page is added, so result.pdf has a single page
pdfWriter.addPage(thePage)
pdfWriter.write(resultPdf)
resultPdf.close()
Output
I've rotated the page 90 degrees counterclockwise (see the image above). To rotate clockwise instead, use the rotateClockwise(90) call that is commented out in the code.
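Both rotation methods expect an angle that is a multiple of 90 degrees, and a 90 degree counterclockwise turn gives the same result as a 270 degree clockwise turn. A small helper can make that explicit. This is only a sketch: normalize_rotation is a name I made up, and treating negative angles as counterclockwise is a convention of this example, not something PyPDF2 defines.

```python
def normalize_rotation(angle):
    """Map any multiple of 90 degrees to a clockwise angle of 0, 90, 180 or 270."""
    if angle % 90 != 0:
        raise ValueError("PDF pages can only be rotated in steps of 90 degrees")
    return angle % 360


# A negative angle means counterclockwise here (a convention of this sketch)
print(normalize_rotation(-90))   # 270
print(normalize_rotation(450))   # 90
```

In the program above, thePage.rotateClockwise(normalize_rotation(-90)) would then have the same effect as thePage.rotateCounterClockwise(90).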
Rotate more than one page of a PDF file
As I mentioned in the earlier section, sometimes we need to rotate more than one page of a PDF file.
Look closely at the image above: three pages (0, 3, and 4) need to be fixed. Let's do it with the help of a Python program.
Code
'''Rotate only a few pages'''
from PyPDF2 import PdfFileReader, PdfFileWriter
# Indexes (starting at 0) of the pages that need rotating
need_to_fix = [0, 3, 4]
pdfReader = PdfFileReader('merged_file.pdf')
pdfWriter = PdfFileWriter()
fixed_file = open('fixed_file.pdf', 'wb')
for page in range(pdfReader.getNumPages()):
    thePage = pdfReader.getPage(page)
    if page in need_to_fix:
        thePage.rotateClockwise(90)
    # Every page is copied; only the listed ones are rotated
    pdfWriter.addPage(thePage)
pdfWriter.write(fixed_file)
print("Done!")
fixed_file.close()
Output
Now all the pages are oriented correctly.
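One thing to watch for in the loop above: if need_to_fix contains an index that does not exist in the file (a typo, for example), the loop simply never matches it and nothing warns you. Below is a minimal sketch of a check you could run first. The name check_page_indexes is hypothetical.

```python
def check_page_indexes(indexes, total_pages):
    """Return the indexes sorted and de-duplicated, or raise if one is out of range."""
    bad = [i for i in indexes if not 0 <= i < total_pages]
    if bad:
        raise ValueError(f"no such page index: {bad} (file has {total_pages} pages)")
    return sorted(set(indexes))


print(check_page_indexes([4, 0, 3, 3], 10))  # [0, 3, 4]
```

In the program above, this could be called as need_to_fix = check_page_indexes(need_to_fix, pdfReader.getNumPages()) before the loop starts.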
How to split the pages of a PDF file
Here, we are going to split a multi-page PDF file into several single-page PDFs using a Python program. The code is below.
Code
'''Split a PDF file'''
from PyPDF2 import PdfFileReader, PdfFileWriter
pdfFile = 'LinuxForDevelopers.pdf'
pdfReader = PdfFileReader(pdfFile)
# Split off the pages with indexes 0 to 9 (the file needs at least 10 pages)
for page in range(0, 10):
    # A fresh writer for each page gives one single-page PDF per iteration
    pdfWriter = PdfFileWriter()
    pdfWriter.addPage(pdfReader.getPage(page))
    splitPage = f'{page}.pdf'
    resultPdf = open(splitPage, 'wb')
    pdfWriter.write(resultPdf)
    resultPdf.close()
Output
The program writes ten single-page files, 0.pdf through 9.pdf, to the current directory.
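The program above hard-codes pages 0 to 9, so it fails on a shorter file and skips the rest of a longer one. It also produces names like 10.pdf that sort before 2.pdf in many file listings. Here is a sketch of a variant that splits every page and zero-pads the file names. The helper names split_filename and split_all_pages are my own, and the PyPDF2 import assumes a version below 3.0.

```python
def split_filename(index, total_pages):
    """Build a zero-padded name such as '007.pdf' so files sort in page order."""
    width = len(str(total_pages - 1))
    return f"{index:0{width}d}.pdf"


def split_all_pages(path):
    """Write every page of `path` to its own single-page PDF."""
    from PyPDF2 import PdfFileReader, PdfFileWriter  # older API, PyPDF2 below 3.0

    reader = PdfFileReader(path)
    total = reader.getNumPages()
    for index in range(total):
        writer = PdfFileWriter()  # one writer per output file
        writer.addPage(reader.getPage(index))
        with open(split_filename(index, total), "wb") as out:
            writer.write(out)


print(split_filename(7, 120))  # 007.pdf
```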
Merge PDF files
In the previous step, we split a multi-page PDF file into several single-page PDFs. In this section, we will merge multiple PDFs (in our case, those single-page files) into one PDF file using a Python program.
Code
'''Merge PDF files'''
import PyPDF2
# List of PDF files to merge, in the order they should appear
files = ['0.pdf', '1.pdf', '2.pdf', '3.pdf']
pdfWriter = PyPDF2.PdfFileWriter()
for file in files:
    pdfReader = PyPDF2.PdfFileReader(file)
    # Each input is a single-page PDF, so only page 0 is copied
    pdfWriter.addPage(pdfReader.getPage(0))
mergePdf = open('merged_file.pdf', 'wb')
pdfWriter.write(mergePdf)
mergePdf.close()
Output
We successfully merged multiple PDFs into a single one.
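The merge program copies only page 0 of each input and relies on the list being typed in the right order. If the file names come from a directory listing instead, a plain sort puts 10.pdf before 2.pdf. The sketch below sorts names by their numeric parts and copies every page of every input. natural_key and merge_pdfs are hypothetical names of mine, and the PyPDF2 import assumes a version below 3.0.

```python
import re


def natural_key(name):
    """Split 'page10.pdf' into ['page', 10, '.pdf'] so numbers compare as numbers."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


def merge_pdfs(paths, output_path):
    """Merge all pages of all input files, taking the files in natural order."""
    from PyPDF2 import PdfFileReader, PdfFileWriter  # older API, PyPDF2 below 3.0

    writer = PdfFileWriter()
    for path in sorted(paths, key=natural_key):
        reader = PdfFileReader(path)
        for index in range(reader.getNumPages()):
            writer.addPage(reader.getPage(index))
    with open(output_path, "wb") as out:
        writer.write(out)


print(sorted(["10.pdf", "2.pdf", "1.pdf"], key=natural_key))
# ['1.pdf', '2.pdf', '10.pdf']
```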
Add Watermark in PDF using Python
To add a watermark, we first create the watermark of our choice as a separate single-page PDF, and then merge that page onto each page of the original PDF file. Here, we will do this with a Python program.
Code
'''Add watermark to a PDF'''
from PyPDF2 import PdfFileReader, PdfFileWriter
pdfFile = 'sample_file.pdf'
watermarkFile = 'watermark.pdf'
result = open('watermarked_file.pdf', 'wb')
pdfReader = PdfFileReader(pdfFile)
pdfWriter = PdfFileWriter()
wmarkReader = PdfFileReader(watermarkFile)
for pageNum in range(pdfReader.getNumPages()):
    page = pdfReader.getPage(pageNum)
    # Draw the watermark page on top of the current page
    page.mergePage(wmarkReader.getPage(0))
    pdfWriter.addPage(page)
pdfWriter.write(result)
result.close()
Output
Look, every page is now watermarked.
How to encrypt a PDF file using Python
It's wise to keep personal files encrypted to prevent unauthorized access. In this section, you'll learn to encrypt a PDF file with a password using a few lines of Python code.
Code
'''Encrypt a PDF file'''
from PyPDF2 import PdfFileReader, PdfFileWriter
# Read the PDF file
pdfFile = PdfFileReader('sample_file.pdf')
# Create a PdfFileWriter object
pdfWriter = PdfFileWriter()
# The Result file: "encrypted.pdf"
result = open('encrypted.pdf', 'wb')
password = '00001111'
for page in range(pdfFile.getNumPages()):
    pdfWriter.addPage(pdfFile.getPage(page))
# Call the encrypt function before writing the file
pdfWriter.encrypt(user_pwd=password)
pdfWriter.write(result)
result.close()
Output
Opening encrypted.pdf in a PDF viewer now asks for the password.
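The program above keeps the password in the source code, which means anyone who can read the script can read the password too. Here is a minimal sketch of one alternative: take the password from an environment variable, and prompt for it if the variable is not set. The function name read_password and the variable name PDF_PASSWORD are assumptions I picked for this example.

```python
import os
from getpass import getpass


def read_password(env_var="PDF_PASSWORD"):
    """Return the password from the environment, or prompt for it if unset."""
    password = os.environ.get(env_var)
    if password is None:
        password = getpass("PDF password: ")  # input is not echoed to the screen
    if not password:
        raise ValueError("the password must not be empty")
    return password
```

In the encryption program, the hard-coded line could then become password = read_password().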
Decrypt a password-protected PDF file using Python
In the previous section, we learned how to encrypt a PDF file with a password. Now let's look at the reverse process: opening the encrypted file with the password and saving an unprotected copy.
Code
'''Decrypt a PDF file'''
from PyPDF2 import PdfFileReader, PdfFileWriter
# Read the PDF file
pdfFile = PdfFileReader('encrypted.pdf')
pdfWriter = PdfFileWriter()
# The Result file: "decrypted.pdf"
result = open('decrypted.pdf', 'wb')
password = '00001111'
# Call the decrypt function before reading any pages
pdfFile.decrypt(password=password)
for page in range(pdfFile.getNumPages()):
    pdfWriter.addPage(pdfFile.getPage(page))
pdfWriter.write(result)
result.close()
Output
The new file, decrypted.pdf, opens without asking for a password.
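The decryption program ignores what decrypt() returns, so a wrong password only shows up later as an error while reading pages. As far as I know, in the PyPDF2 versions that still offer PdfFileReader, decrypt() returns 0 when the password failed, 1 when it matched the user password, and 2 when it matched the owner password. Treat those numbers as an assumption to verify against your installed version. The helper below, with the made-up name describe_decrypt_result, turns the value into a message.

```python
def describe_decrypt_result(result):
    """Translate the number returned by decrypt() into a readable message.

    Assumption: 0 = failed, 1 = user password, 2 = owner password.
    """
    messages = {
        0: "wrong password",
        1: "matched the user password",
        2: "matched the owner password",
    }
    return messages.get(int(result), "unknown result")


print(describe_decrypt_result(0))  # wrong password
```

In the program above, you could store the value with status = pdfFile.decrypt(password=password) and stop early when status is 0.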
Summary
In this tutorial, you've learned several ways to work with PDF files in Python: extracting data, rotating pages, splitting, merging, adding a watermark, encrypting, and decrypting.
I hope you enjoyed this tutorial. If anything is unclear, leave a comment below and you will get a reply soon.
Thanks for reading!💙
PySeek