Introduction
Python is a versatile language, and much of that comes from its large collection of libraries, modules, and packages for different fields. One of them is PyPDF2, a module that provides many functions for working with PDF files from Python.
PyPDF2 is the successor to pyPdf, which was first released in 2005. PyPDF2 started as a fork of pyPdf in 2011 and adds some extra features on top of it.
In this tutorial, you will learn how to:
- Extract data from a PDF file.
- Rotate only one page of a PDF file.
- Rotate only specific pages of a PDF file.
- Split a PDF file.
- Merge more than one PDF file.
- Add a watermark to a PDF file.
- Encrypt a PDF file.
- Decrypt a PDF file.
Requirements
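Install PyPDF2. The examples in this tutorial use the PdfFileReader and PdfFileWriter classes, which were removed in PyPDF2 3.0, so install an earlier release: pip install "PyPDF2<3"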
How to read data from a PDF file
We'll extract the text from a single page of a sample file, LinuxForDevelopers.pdf.
Code
'''Extract the data'''
from PyPDF2 import PdfFileReader
# Opening the PDF file in read and binary mode
# 'rb': read in binary
pdfFile = open('LinuxForDevelopers.pdf', 'rb')
pdfReader = PdfFileReader(pdfFile)
# Page index 20 (indexes start at 0, so this is the 21st page)
thePage = pdfReader.getPage(20)
print(thePage.extractText())
print("Total Pages: ", pdfReader.numPages)
pdfFile.close()
Output
Look at the getPage(20) call in the code above. That is where I set the page from which the text is extracted. Page indexes start at 0, so index 20 is the 21st page of the file.
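Since the page numbers shown in a PDF viewer start at 1 while getPage() counts from 0, a small conversion helper can prevent off-by-one mistakes. This is only a sketch; page_index is a name I made up for illustration and is not part of PyPDF2.

```python
def page_index(page_number, total_pages):
    """Convert a 1-based page number to a 0-based page index."""
    if not 1 <= page_number <= total_pages:
        raise ValueError(
            f"page {page_number} is outside the range 1..{total_pages}"
        )
    return page_number - 1


# Page 21 in a viewer corresponds to index 20 in the reader
print(page_index(21, 100))
```

You could pass the result straight to pdfReader.getPage(), with pdfReader.numPages as the second argument.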
How to rotate one page of a PDF file
These days it is easy to turn photos into a PDF file by scanning them with a phone camera, and many apps can do it.
Suppose you're merging some pre-scanned images into one PDF file, but one photo was captured in landscape mode instead of portrait. That single sideways page can spoil the look of the whole PDF file.
Fixing just one or a few pages by hand is tedious. But don't worry, Python can do it in the blink of an eye. We'll write a Python program to fix this issue.
First, we will learn how to rotate a single page of a PDF file. Rotating more than one page needs a loop that iterates through the page numbers, which the next section covers. Please keep reading.
Code
'''Rotate only one page'''
# Both classes can be imported from the package root
from PyPDF2 import PdfFileReader, PdfFileWriter
pdfFile = 'LinuxForDevelopers.pdf'
pdfReader = PdfFileReader(pdfFile)
pdfWriter = PdfFileWriter()
resultPdf = open("result.pdf", 'wb')
# First page (index 0)
thePage = pdfReader.getPage(0)
# To rotate clockwise instead, use this line:
# thePage.rotateClockwise(90)
thePage.rotateCounterClockwise(90)
# Only the rotated page is written to result.pdf
pdfWriter.addPage(thePage)
pdfWriter.write(resultPdf)
resultPdf.close()
Output
I've rotated the page 90 degrees counterclockwise. The other option is to rotate it clockwise (see the commented-out line in the code).
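The two rotation methods are interchangeable: a 90 degree counterclockwise turn ends up the same as a 270 degree clockwise turn. As far as I know, both methods accept only multiples of 90. The helper below is a small sketch of that arithmetic; net_clockwise is a hypothetical name, not a PyPDF2 function.

```python
def net_clockwise(angle, clockwise=True):
    """Return the equivalent clockwise rotation, from 0 to 270 degrees."""
    if angle % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    # A counterclockwise turn is a negative clockwise turn
    signed = angle if clockwise else -angle
    return signed % 360


print(net_clockwise(90, clockwise=False))
print(net_clockwise(450))
```

The first call prints 270 and the second prints 90, because a full 360 degree turn leaves the page unchanged.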
Rotate more than one page of a PDF file
As I mentioned in the earlier section, sometimes more than one page of a PDF file needs to be rotated.
Let's solve this with a loop over the page numbers.
Code
'''Rotate only a few pages'''
from PyPDF2 import PdfFileReader, PdfFileWriter
# Zero-based indexes of the pages that need rotating
need_to_fix = [0, 3, 4]
pdfReader = PdfFileReader('merged_file.pdf')
pdfWriter = PdfFileWriter()
fixed_file = open('fixed_file.pdf', 'wb')
for page in range(pdfReader.getNumPages()):
    thePage = pdfReader.getPage(page)
    # Rotate only the listed pages and copy the rest unchanged
    if page in need_to_fix:
        thePage.rotateClockwise(90)
    pdfWriter.addPage(thePage)
pdfWriter.write(fixed_file)
print("Done!")
fixed_file.close()
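Output
The pages at indexes 0, 3, and 4 are rotated 90 degrees clockwise, and every page is saved to fixed_file.pdf.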
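As a side note, typing zero-based indexes by hand gets error-prone for longer lists. Here is a rough sketch of a helper that builds a list like need_to_fix from the page numbers you see in a viewer. The parse_pages name and the '1,4-5' format are my own assumptions for illustration.

```python
def parse_pages(spec):
    """Turn a string such as '1,4-5' into sorted 0-based page indexes."""
    indexes = set()
    for part in spec.split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            # A range such as '4-5' includes both ends
            first, last = (int(value) for value in part.split('-', 1))
            if first > last:
                raise ValueError(f"invalid range: {part}")
            indexes.update(range(first - 1, last))
        else:
            indexes.add(int(part) - 1)
    if any(index < 0 for index in indexes):
        raise ValueError("page numbers start at 1")
    return sorted(indexes)


need_to_fix = parse_pages('1, 4-5')
print(need_to_fix)  # [0, 3, 4]
```

The result matches the need_to_fix list used in the rotation code of this section.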
How to split the pages of a PDF file
Now let's split a PDF file with many pages into several single-page PDFs.
Code
'''Split a PDF file'''
from PyPDF2 import PdfFileReader, PdfFileWriter
pdfFile = 'LinuxForDevelopers.pdf'
pdfReader = PdfFileReader(pdfFile)
# Split the pages from 0 to 9.
for page in range(0, 10):
    # A fresh writer per page produces one single-page file each time
    pdfWriter = PdfFileWriter()
    pdfWriter.addPage(pdfReader.getPage(page))
    splitPage = f'{page}.pdf'
    resultPdf = open(splitPage, 'wb')
    pdfWriter.write(resultPdf)
    resultPdf.close()
Output
Running the script creates ten single-page files, 0.pdf through 9.pdf, in the current directory.
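If you would rather split a file into chunks of several pages instead of single pages, the page ranges can be worked out first. The following is a minimal sketch using plain Python; chunk_ranges is a hypothetical helper, and each (start, stop) pair is meant to be used with range(start, stop) when adding pages to a writer.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (start, stop) index pairs that cover every page in order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


# Ten pages in chunks of four: the last chunk holds the two leftover pages
print(chunk_ranges(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

With a chunk size of 1, this gives the same single-page split as the code in this section.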
Merge PDF files
Code
'''Merge PDF files'''
import PyPDF2
# List of PDF files that are going to be merged.
files = ['0.pdf', '1.pdf', '2.pdf', '3.pdf']
pdfWriter = PyPDF2.PdfFileWriter()
for file in files:
    pdfReader = PyPDF2.PdfFileReader(file)
    # Each input is a single-page file, so only page 0 is added
    pdfWriter.addPage(pdfReader.getPage(0))
mergePdf = open('merged_file.pdf', 'wb')
pdfWriter.write(mergePdf)
mergePdf.close()
Output
The first page of each listed file is combined, in list order, into merged_file.pdf.
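One thing to watch when building the file list automatically: a plain alphabetical sort puts 10.pdf before 2.pdf, so the merged pages would come out in the wrong order. A possible fix is a natural sort key, sketched here with the standard re module; natural_key is my own helper name.

```python
import re


def natural_key(name):
    """Split a name into text and number chunks so that 2 sorts before 10."""
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r'(\d+)', name)
    ]


files = ['10.pdf', '2.pdf', '1.pdf', '0.pdf']
print(sorted(files))                   # ['0.pdf', '1.pdf', '10.pdf', '2.pdf']
print(sorted(files, key=natural_key))  # ['0.pdf', '1.pdf', '2.pdf', '10.pdf']
```

The sorted list can then be used in place of the hand-written files list.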
Add a watermark to a PDF using Python
To perform this task, create a watermark of your choice in a separate single-page PDF. Then merge that page with every page of the PDF file you want to watermark.
Code
'''Add watermark to a PDF'''
from PyPDF2 import PdfFileReader, PdfFileWriter
pdfFile = 'sample_file.pdf'
watermarkFile = 'watermark.pdf'
result = open('watermarked_file.pdf', 'wb')
pdfReader = PdfFileReader(pdfFile)
pdfWriter = PdfFileWriter()
wmarkReader = PdfFileReader(watermarkFile)
for pageNum in range(pdfReader.getNumPages()):
    page = pdfReader.getPage(pageNum)
    # Overlay the single watermark page on this page
    page.mergePage(wmarkReader.getPage(0))
    pdfWriter.addPage(page)
pdfWriter.write(result)
result.close()
Output
Every page of sample_file.pdf carries the watermark in the new file, watermarked_file.pdf.
How to encrypt a PDF file using Python
It's wise to keep personal files encrypted to prevent unauthorized access. In this section, you'll learn to encrypt a PDF file with a few lines of Python code.
Code
'''Encrypt a PDF file'''
from PyPDF2 import PdfFileReader, PdfFileWriter
# Read the PDF file
pdfFile = PdfFileReader('sample_file.pdf')
# Create a PdfFileWriter object
pdfWriter = PdfFileWriter()
# The Result file: "encrypted.pdf"
result = open('encrypted.pdf', 'wb')
password = '00001111'
for page in range(pdfFile.getNumPages()):
    pdfWriter.addPage(pdfFile.getPage(page))
# Call the encrypt function
pdfWriter.encrypt(user_pwd=password)
pdfWriter.write(result)
# Close the file so the encrypted data is flushed to disk
result.close()
Output
The script writes encrypted.pdf, which asks for the password when it is opened.
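A hard-coded password is fine for a demo, but in a real script you probably don't want it sitting in the source file. One possible approach, sketched below with only the standard library, is to read it from an environment variable and fall back to a prompt. The PDF_PASSWORD variable name and the read_password helper are assumptions of mine, not something PyPDF2 defines.

```python
import os
from getpass import getpass


def read_password(env_var='PDF_PASSWORD'):
    """Read the password from the environment, or prompt for it."""
    password = os.environ.get(env_var)
    if not password:
        # getpass hides the typed characters in the terminal
        password = getpass('PDF password: ')
    if not password:
        raise ValueError('an empty password is not allowed')
    return password
```

In the encryption script, password = read_password() would then replace the hard-coded string.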
Decrypt a password-protected PDF file using Python
Now we will decrypt the PDF file that we encrypted in the previous section.
Code
'''Decrypt a PDF file'''
from PyPDF2 import PdfFileReader, PdfFileWriter
# Read the PDF file
pdfFile = PdfFileReader('encrypted.pdf')
pdfWriter = PdfFileWriter()
# The Result file: "decrypted.pdf"
result = open('decrypted.pdf', 'wb')
password = '00001111'
# Call the decrypt function before reading any pages
pdfFile.decrypt(password=password)
for page in range(pdfFile.getNumPages()):
    pdfWriter.addPage(pdfFile.getPage(page))
pdfWriter.write(result)
# Close the file so the decrypted data is flushed to disk
result.close()
Output
The script writes decrypted.pdf, which opens without a password.
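The decrypt() call also returns a value that tells you whether the password worked, which the script above ignores. Assuming the integer codes documented for PdfFileReader.decrypt in PyPDF2 1.x (0 for failure, 1 for the user password, 2 for the owner password), a small lookup like this hypothetical describe_decrypt_result helper can turn the code into a readable message.

```python
def describe_decrypt_result(result):
    """Map the value returned by decrypt() to a short message."""
    messages = {
        0: 'wrong password',
        1: 'user password matched',
        2: 'owner password matched',
    }
    # int() also accepts enum-style results that behave like integers
    return messages.get(int(result), 'unknown result')


print(describe_decrypt_result(1))
```

You could call it as describe_decrypt_result(pdfFile.decrypt(password)) and stop the script early when the message says the password was wrong.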
Conclusion
In this tutorial, you've learned several ways to work with PDF files in Python: extracting data from a PDF, rotating, splitting, and merging pages, adding a watermark, and encrypting and decrypting a PDF file.
I hope you loved this tutorial. Please share your love❤️ and do comment below.
Thanks for reading!💙