Getting errors reading PDF files using pdfminer
I am trying to read PDF files with this code on Ubuntu:
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from cStringIO import StringIO

def pdf_to_text(pdfname):
    # PDFMiner boilerplate
    rsrcmgr = PDFResourceManager()
    sio = StringIO()
    device = TextConverter(rsrcmgr, sio, codec='utf-8', laparams=LAParams())
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    # get text from file
    fp = file(pdfname, 'rb')
    for page in PDFPage.get_pages(fp):
        interpreter.process_page(page)
    fp.close()
    # Get text from StringIO
    text = sio.getvalue()
    # close objects
    device.close()
    sio.close()
    return text
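Separately from the pdfminer import errors, the function above relies on two Python 2-only names, `cStringIO` and the builtin `file()`. Neither exists in Python 3, which is what the `python3-pdfminer` package targets, so they would likely fail once the pdfminer imports succeed. A minimal sketch of the standard-library replacements follows; the helper names `read_pdf_bytes` and `collect_text` are hypothetical and only illustrate the two substitutions.

```python
from io import StringIO  # Python 3 location of StringIO; cStringIO was removed


def read_pdf_bytes(pdfname):
    """Hypothetical helper: open() replaces the Python 2 builtin file()."""
    with open(pdfname, "rb") as fp:  # the file is closed even if reading fails
        return fp.read()


def collect_text(chunks):
    """Hypothetical helper: gather str chunks in an in-memory text buffer."""
    sio = StringIO()
    for chunk in chunks:
        sio.write(chunk)
    text = sio.getvalue()
    sio.close()
    return text
```

In the original function this would amount to importing `StringIO` from `io` and replacing `file(pdfname, 'rb')` with `open(pdfname, 'rb')`, ideally inside a `with` block.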
I have also installed pdfminer:
$ sudo apt-get install -y python3-pdfminer
Reading package lists... Done
Building dependency tree
Reading state information... Done
python3-pdfminer is already the newest version (20191020+dfsg-2).
python3-pdfminer set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 72 not upgraded.
When I run the code, I get the following error:
ModuleNotFoundError: No module named 'pdfminer.pdfinterp'
And also:
ModuleNotFoundError: No module named 'pdfminer.psparser'
ModuleNotFoundError: No module named 'pdfminer.converter'
ModuleNotFoundError: No module named 'pdfminer.layout'
How do I install pdfminer so that these errors go away?
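A `ModuleNotFoundError` for a submodule such as `pdfminer.pdfinterp` can mean that the script runs under a different interpreter than the one the apt package was installed for, or that another `pdfminer` (for example a local file or directory with that name) is found first on `sys.path`. These are possibilities, not a confirmed diagnosis. The sketch below uses only the standard library to report which of the modules from the question the current interpreter can locate; `find_missing_modules` is a hypothetical helper name.

```python
import importlib.util
import sys


def find_missing_modules(names):
    """Return the dotted module names the current interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised when a parent package (e.g. "pdfminer") cannot be imported.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module names taken from the imports and error messages in the question.
    wanted = [
        "pdfminer.pdfinterp",
        "pdfminer.pdfpage",
        "pdfminer.psparser",
        "pdfminer.converter",
        "pdfminer.layout",
    ]
    print("Interpreter:", sys.executable)
    print("Missing:", find_missing_modules(wanted))
    top = importlib.util.find_spec("pdfminer")
    # Shows which file or directory "pdfminer" resolves to, if any.
    print("pdfminer resolves to:", top.origin if top else None)
```

Running it with the same command used for the failing script shows whether that interpreter sees the apt-installed package or something else named `pdfminer`.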
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
