Data is hidden automatically when converting text to a DataFrame in Python

I have an issue with data being hidden. When I print the extracted data as text, everything is shown properly. The code below prints the extracted data, and its output follows.

import ocrmypdf
import pdfplumber

path = "G:\\SKM.pdf"
# Run OCR once and write the searchable PDF to output.pdf.
ocrmypdf.ocr(path, "output.pdf")

# Extract the text of the last page of the OCR'd PDF.
invoice = pdfplumber.open("output.pdf")
page = invoice.pages[-1]
text = page.extract_text(x_tolerance=2)
print(text)

Output:

 Order  Number  :  202100050   Order  Date  :  25.11.2021 
Client  Number  :  145  Delivery  Date  :  Pending 
Currency  :  Euro  Contact  Perso:  Martin 
Payment  Condition  :  Due  Email  :  [email protected]

When I convert the text to a DataFrame and print it, some values, such as the order date, the delivery date and the email address, are partially hidden. The output is given below.

import pandas as pd

ds = pd.DataFrame(text.split('\n'))
print(ds)

Output:

1   Order  Number  :  202100050   Order  Date  : ...
2  Client  Number  :  145  Delivery  Date  :  Pen...
3        Currency  :  Euro  Contact  Perso:  Martin 
4  Payment  Condition  :  Due  Email  :  martin@d...

What is the reason for this, and how can I solve it?



Solution 1:[1]

Try a pandas print formatter such as tabulate. Install it first with pip install tabulate, then print the formatted DataFrame with to_markdown(), which relies on tabulate:

ds = pd.DataFrame(text.split('\n'))
print(ds.to_markdown())
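
As for the reason: the data is not lost. pandas shortens long cell values when it prints them, controlled by the display.max_colwidth option (50 characters by default), and marks the cut with "...". The sketch below shows one way to check this and to print full values without tabulate, assuming pandas 1.0 or later (where None means "no limit"). The sample lines and the email address are placeholders standing in for the OCR output, not the real data.

```python
import pandas as pd

# Hypothetical sample lines standing in for the OCR text; the email is a placeholder.
text = (
    "Order  Number  :  202100050   Order  Date  :  25.11.2021\n"
    "Payment  Condition  :  Due  Email  :  martin@example.com"
)
ds = pd.DataFrame(text.split("\n"))

# The stored value is complete; only the printed representation is shortened.
first_cell = ds.iloc[0, 0]
print(first_cell)

# Lift the column-width limit only inside this block.
with pd.option_context("display.max_colwidth", None):
    full_view = str(ds)
    print(full_view)
```

To change the behaviour for the whole session instead, pd.set_option("display.max_colwidth", None) should have the same effect.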

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1