Data is partially hidden when converting text to a DataFrame in Python
I have an issue with data being hidden. When I print the extracted data as text, everything is shown properly. Below is the code that prints the extracted data, followed by its output.
import ocrmypdf
import pdfplumber

path = "G:\\SKM.pdf"
# OCR the scanned PDF once into a searchable copy
ocrmypdf.ocr(path, "output.pdf")

# Open the OCR'd PDF and extract the text of its last page
with pdfplumber.open("output.pdf") as invoice:
    page = invoice.pages[-1]
    text = page.extract_text(x_tolerance=2)

print(text)
Output:
Order Number : 202100050 Order Date : 25.11.2021
Client Number : 145 Delivery Date : Pending
Currency : Euro Contact Perso: Martin
Payment Condition : Due Email : [email protected]
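When I convert the text to a DataFrame and print it, some values, such as the order date, delivery date and email address, are partially hidden. Here is the output:
import pandas as pd

# One row per line of extracted text, stored in a single column named 0
ds = pd.DataFrame(text.split('\n'))
print(ds)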
Output:
1 Order Number : 202100050 Order Date : ...
2 Client Number : 145 Delivery Date : Pen...
3 Currency : Euro Contact Perso: Martin
4 Payment Condition : Due Email : martin@d...
What is the reason, and how can I solve this issue?
Solution 1:[1]
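The data is not lost: when pandas prints a DataFrame, it shortens long cell values and ends them with "...". One way around this is to print the DataFrame as a Markdown table with DataFrame.to_markdown(), which relies on the tabulate package, so install it first with pip install tabulate: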
ds = pd.DataFrame(text.split('\n'))
print(ds.to_markdown())
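If you would rather not install tabulate, pandas itself can print the full values. The truncation comes from the display.max_colwidth option, which only affects how cells are printed, not what is stored. The sketch below assumes pandas 1.0 or later; the sample text is copied from the question's output, and the email address is a hypothetical placeholder because the original one is not recoverable. It reproduces the "..." with a deliberately small limit, shows that the stored string is complete, and then prints the frame without truncation in two ways.

```python
import pandas as pd

# Sample lines copied from the question's printed output. The email address
# is a hypothetical placeholder; the original one is not recoverable.
text = (
    "Order Number : 202100050 Order Date : 25.11.2021\n"
    "Client Number : 145 Delivery Date : Pending\n"
    "Currency : Euro Contact Perso: Martin\n"
    "Payment Condition : Due Email : martin@example.com"
)

ds = pd.DataFrame(text.split("\n"))

# Reproduce the symptom: with a small width limit, long cells end in "...".
with pd.option_context("display.max_colwidth", 20):
    print(ds)

# The stored value is still the complete line; only the printed view is cut.
print(ds.iloc[0, 0])

# Fix 1: lift the per-cell width limit for this print only (None = no limit).
with pd.option_context("display.max_colwidth", None):
    print(ds)

# Fix 2: to_string() applies no per-cell width limit unless max_colwidth is given.
print(ds.to_string())
```

To change the behaviour for the whole session instead of a single print, pd.set_option("display.max_colwidth", None) can be called once near the top of the script.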
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
