Can't import pdftotext in Python on my Mac M1

I can't import pdftotext on my new Mac M1. The steps I took are:

  1. Install python 3.10

  2. Install command line developer tools

  3. pip3 install pdftotext from terminal

  4. Open IDLE, type import pdftotext

  5. I get this error:

    Traceback (most recent call last):
      File "<pyshell#9>", line 1, in <module>
        import pdftotext
    ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pdftotext.cpython-310-darwin.so, 0x0002): symbol not found in flat namespace '_ZN7poppler24set_debug_error_functionEPFvRKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEEPvES9'

I have already spent a few hours searching for this error message.

Any suggestions?

PS: I have tried several other pdf -> text packages, but they don't read the full pdf. For some weird reason, the pdfs I need to read are really complex and many packages don't read them fully. pdftotext does. So what I need is help to make this pdftotext work.



Solution 1:[1]

I don't think pdftotext is a good library. Use PyPDF2 instead; it is better, and here is an example:

    # PdfReader, pages and extract_text are the current PyPDF2 names;
    # PdfFileReader, getPage and extractText were removed in PyPDF2 3.0.
    from PyPDF2 import PdfReader

    # Create the reader; PdfReader accepts a file path directly
    pdfreader = PdfReader('1.pdf')

    # Number of pages in this PDF file
    x = len(pdfreader.pages)

    # Page indexing starts at 0, so valid page numbers run from 0 to x-1.
    # Collect the text of every page into one string.
    text = ''
    for pageobj in pdfreader.pages:
        text += (pageobj.extract_text() or '') + '\n'

    # Save the extracted text to a txt file.
    # Put r before the file path so backslashes are not treated as escapes,
    # and replace this path with the location of your own output file.
    with open(r"C:\Users\SIDDHI\AppData\Local\Programs\Python\Python38\1.txt", "a", encoding="utf-8") as file1:
        file1.write(text)
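
For anyone who still needs pdftotext itself, as the question asks: the missing symbol in the error belongs to poppler, so a reasonable first step is to check what the Python process can actually see. The sketch below is only a diagnostic under assumptions, not a fix. The function name `poppler_diagnostics` is made up for illustration, and I am guessing that a mismatch between the compiled extension and the installed poppler library (version or architecture) is a likely cause; the traceback alone does not confirm that.

```python
import ctypes.util
import platform
import shutil
import sys


def poppler_diagnostics():
    """Collect facts that may help explain a failed `import pdftotext`.

    Hypothetical helper for illustration; it only reports, it changes nothing.
    """
    return {
        # 'arm64' for a native Apple Silicon process,
        # 'x86_64' for an Intel build or one running under Rosetta
        "machine": platform.machine(),
        "python_version": sys.version.split()[0],
        "python_executable": sys.executable,
        # Path or name of the poppler-cpp shared library, or None if it is
        # not on the default search path (it may still be installed elsewhere)
        "poppler_cpp": ctypes.util.find_library("poppler-cpp"),
        # Poppler's own command-line tool, if it is on PATH
        "pdftotext_cli": shutil.which("pdftotext"),
        # Homebrew, if it is on PATH
        "brew": shutil.which("brew"),
    }


if __name__ == "__main__":
    for key, value in poppler_diagnostics().items():
        print(f"{key}: {value}")
```

If `poppler_cpp` comes back as `None`, or `machine` does not match the architecture poppler was built for, reinstalling poppler and then rebuilding the package with `pip3 install --no-cache-dir --force-reinstall pdftotext` may be worth trying.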

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 hmody3000