Read PDF metadata using PyPDF2

I tried to extract metadata with PyPDF2 and pdfminer.six. With PyPDF2, this code:

from PyPDF2 import PdfFileReader

reader = PdfFileReader("example.pdf")
info = reader.getDocumentInfo()

returns:

{'/Title': IndirectObject(38, 0), '/Author': IndirectObject(40, 0), '/Subject': IndirectObject(41, 0), '/Producer': IndirectObject(39, 0), '/Creator': IndirectObject(42, 0), '/CreationDate': IndirectObject(43, 0), '/ModDate': IndirectObject(43, 0)}

Using pdfrw

With pdfrw it worked like this:

>>> from pdfrw import PdfReader
>>> PdfReader("example.pdf").Info


Solution 1:[1]

This is now part of the PyPDF2 docs:

from PyPDF2 import PdfFileReader

reader = PdfFileReader("example.pdf")

info = reader.getDocumentInfo()

print(reader.numPages)

# All of the following could be None!
print(info.author)
print(info.creator)
print(info.producer)
print(info.subject)
print(info.title)
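
Note that PyPDF2 3.0 removed the PdfFileReader / getDocumentInfo() / numPages names used above. As a minimal sketch, assuming PyPDF2 3.x is installed (where the reader class is PdfReader and the metadata is exposed as reader.metadata), the same lookup could look like the following. The helper names summarize_metadata and read_pdf_metadata are made up for illustration and are not part of the library:

```python
# Hypothetical helpers, not part of PyPDF2. Assumes PyPDF2 >= 3.0.
FIELDS = ("title", "author", "subject", "creator", "producer")


def summarize_metadata(info):
    """Collect the common metadata fields into a plain dict.

    `info` is expected to be the object returned by `reader.metadata`,
    which can be None when the PDF has no document information at all.
    Any missing field is reported as None.
    """
    if info is None:
        return {name: None for name in FIELDS}
    return {name: getattr(info, name, None) for name in FIELDS}


def read_pdf_metadata(path):
    """Return (page_count, metadata_dict) for the PDF at `path`."""
    # Imported here so summarize_metadata stays usable without PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return len(reader.pages), summarize_metadata(reader.metadata)
```

Calling read_pdf_metadata("example.pdf") would then give the page count and a dict of plain values instead of IndirectObject references. The None check matters because a PDF without a document information dictionary yields no metadata object at all.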

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Martin Thoma