How to extract data from a PDF using Python [duplicate]
I want to know how to extract data from a PDF using Python in PyCharm. I tried importing PyPDF2 in my code, but it does not show any results.
Solution 1:[1]
PyPDF2, PyPDF3, and PyPDF4 are all unmaintained. I would recommend taking a look at this question and trying one of the many different methods discussed.
According to the PyPDF2 documentation, the extractText() method "works well for some PDF files, but poorly for others, depending on the generator used". Without seeing your code, a large factor in why your code is not working may be incompatibility with the PDF file itself.
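One rough way to check whether the PDF itself is the problem is to see which pages come back with no text at all. The sketch below assumes you already have one extracted string per page; the helper name `pages_without_text` and the sample data are made up for illustration. Pages that yield nothing are often scanned images, in which case no text extractor will help without OCR.

```python
def pages_without_text(page_texts):
    """Return the indices of pages whose extracted text is empty or whitespace."""
    return [i for i, text in enumerate(page_texts) if not (text or "").strip()]


# Hypothetical input: one string (or None) per page, as a text extractor might return.
sample = ["Invoice 42\nTotal: 10", "", "   \n", None]
print(pages_without_text(sample))  # prints [1, 2, 3]
```

If every page shows up in that list, the file is the likely culprit rather than your code.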
Solution 2:[2]
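Use this code:
# Uses the older PyPDF2 API (PdfFileReader, getPage, extractText),
# which newer PyPDF2 releases no longer provide.
from PyPDF2 import PdfFileReader

filename = "example.pdf"  # placeholder: replace with the path to your PDF
reader = PdfFileReader(filename)
num_pages = reader.getNumPages()
for page_index in range(num_pages):
    page = reader.getPage(page_index)
    page_data = page.extractText()
    print(page_data)  # nothing is shown unless the text is printed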
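For comparison, here is a minimal sketch of the same loop written against `pypdf`, the package that continues PyPDF2's development. It assumes `pypdf` is installed (`pip install pypdf`); the function names `extract_page_texts` and `read_pdf` are my own, not part of the library. Only `PdfReader`, `reader.pages`, and `page.extract_text()` come from `pypdf`.

```python
def extract_page_texts(reader):
    """Collect the text of every page from a reader object exposing `.pages`."""
    # `or ""` guards against extractors that return None for an empty page.
    return [page.extract_text() or "" for page in reader.pages]


def read_pdf(path):
    """Open the PDF at `path` with pypdf and return one string per page."""
    from pypdf import PdfReader  # assumption: pypdf is installed

    return extract_page_texts(PdfReader(path))
```

Calling `print("\n".join(read_pdf("example.pdf")))` (with your own file path in place of the placeholder) prints the text of the whole document. Keeping the page loop separate from the file opening makes it easy to try a different reader if this one gives poor results for your PDF.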
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | farzany |
| Solution 2 | Shubham Korade |
