How can I get the total number of pages of a PDF file using PDFMiner in Python?

In PyPDF2, pdfreader.getNumPages() gives me the total number of pages of a PDF file.

How can I get this using PDFMiner?



Solution 1:[1]

With pdfminer.six, import the high-level function extract_pages, convert the generator it returns into a list, and take the length of that list.

from pdfminer.high_level import extract_pages

pdf_file = 'your_file.pdf'  # placeholder path to the PDF
print(len(list(extract_pages(pdf_file))))
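
Building a list keeps every analysed page in memory just to count it. A minimal sketch, assuming only the count is needed, that consumes the generator one page at a time instead. The helper names `count_items` and `count_pdf_pages` are hypothetical, and the pdfminer import is deferred so that `count_items` can be used on any iterable:

```python
def count_items(iterable):
    """Count the items of any iterable without storing them."""
    return sum(1 for _ in iterable)


def count_pdf_pages(pdf_path):
    """Count pages by consuming extract_pages() lazily.

    Assumes pdfminer.six is installed; the import is kept local so that
    count_items() stays usable without it.
    """
    from pdfminer.high_level import extract_pages
    return count_items(extract_pages(pdf_path))
```

This gives the same number as len(list(...)). Each page still goes through layout analysis, so it reduces memory use rather than run time.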

Solution 2:[2]

I realize you were asking for PDFMiner. However, people coming via Google Search to this question might also be interested in alternatives to PDFMiner.

pikepdf

from pikepdf import Pdf

pdf_doc = Pdf.open('fourpages.pdf')
pdf_page_count = len(pdf_doc.pages)  # pages is the document's page list

Solution 3:[3]

Using pdfminer, import the necessary modules.

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage

Create a PDF parser object associated with the file object.

fp = open('your_file.pdf', 'rb')
parser = PDFParser(fp)

Create a PDF document object that stores the document structure.

document = PDFDocument(parser)

Iterate over the generator returned by PDFPage.create_pages(), incrementing a counter for each page.

num_pages = 0
for page in PDFPage.create_pages(document):
    num_pages += 1
print(num_pages)
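
The steps above can be collected into one function. This is a sketch rather than a definitive implementation: the function name `count_pages_pdfminer` is hypothetical, the file is opened in a `with` block so it is closed once counting finishes, and the imports are placed inside the function only so the snippet can be loaded where pdfminer is not installed.

```python
def count_pages_pdfminer(pdf_path):
    """Return the number of pages in the PDF at pdf_path using pdfminer."""
    # Local imports: same modules as in the steps above.
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfpage import PDFPage

    with open(pdf_path, 'rb') as fp:
        parser = PDFParser(fp)            # parser bound to the open file
        document = PDFDocument(parser)    # document structure
        # Count the pages while the file is still open.
        return sum(1 for _ in PDFPage.create_pages(document))
```

Usage would look like print(count_pages_pdfminer('your_file.pdf')).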

Solution 4:[4]

I found PDFMiner very slow in getting the total number of pages. I found this to be a cleaner and faster solution:

pip3 install PyPDF2

from PyPDF2 import PdfReader  # PyPDF2 3.x removed PdfFileReader and getNumPages()

def get_pdf_page_count(path):
    with open(path, 'rb') as fl:
        reader = PdfReader(fl)
        return len(reader.pages)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Chris
Solution 2 Peter Mortensen
Solution 3 Mangohero1
Solution 4 Peter Mortensen