How can I get the total number of pages of a PDF file using PDFMiner in Python?
In PyPDF2, pdfreader.getNumPages() gives me the total number of pages of a PDF file.
How can I get this using PDFMiner?
Solution 1:[1]
With pdfminer.six, you just need to import the high-level function extract_pages, convert the generator it returns into a list, and take its length.
from pdfminer.high_level import extract_pages
pdf_file = 'your_file.pdf'  # path to your PDF (placeholder)
print(len(list(extract_pages(pdf_file))))
Solution 2:[2]
I realize you were asking for PDFMiner. However, people coming via Google Search to this question might also be interested in alternatives to PDFMiner.
pikepdf
from pikepdf import Pdf
pdf_doc = Pdf.open('fourpages.pdf')
pdf_page_count = len(pdf_doc.pages)
Solution 3:[3]
Using pdfminer, import the necessary modules.
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
Create a PDF parser object associated with the file object.
fp = open('your_file.pdf', 'rb')
parser = PDFParser(fp)
Create a PDF document object that stores the document structure.
document = PDFDocument(parser)
Iterate over the pages yielded by PDFPage.create_pages(), incrementing a counter for each one.
num_pages = 0
for page in PDFPage.create_pages(document):
    num_pages += 1
print(num_pages)
fp.close()  # close the file once all pages have been read
Solution 4:[4]
I found PDFMiner very slow at getting the total number of pages. PyPDF2 was a cleaner and faster solution for me:
pip3 install "PyPDF2<3.0"
from PyPDF2 import PdfFileReader  # PyPDF2 < 3.0 API; PdfFileReader was removed in 3.0
def get_pdf_page_count(path):
    with open(path, 'rb') as fl:
        reader = PdfFileReader(fl)
        return reader.getNumPages()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Chris |
| Solution 2 | Peter Mortensen |
| Solution 3 | Mangohero1 |
| Solution 4 | Peter Mortensen |
