How can I get pdfplumber to recognise paragraphs in a cell?
I am using pdfplumber for table extraction. When a cell contains multiple paragraphs, pdfplumber treats each paragraph as a separate row, although they all belong to a single cell and should be returned as one row. I tried several table_settings in the extract_table function, but the output did not change. The code I used:
import pdfplumber

# PDFPfad holds the path to the PDF file
with pdfplumber.open(PDFPfad) as pdf:
    Seite = pdf.pages[4]  # fifth page (index is zero-based)
    Tabelle = Seite.extract_table()
    print(Tabelle)
Current output:
Tabelle = [
['Inhaltsstoff', 'CAS-Nr.', 'Wert', '', 'Zu', '', 'Grundlage'],
[None, None, None, None, 'überwachende', None, None],
[None, None, None, None, 'Parameter', None, None],
...
]
Desired output:
Tabelle = [
['Inhaltsstoff', 'CAS-Nr.', 'Wert', '', 'Zu \nüberwachende \nParameter', '', 'Grundlage'],
...
]
I don't know which settings in extract_table(table_settings={...}) would produce the desired output. Any help would be appreciated.
Table example: https://i.stack.imgur.com/9oERz.png
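One possible workaround that does not depend on table_settings is to post-process the extracted rows. The sketch below assumes that a continuation line always shows up as a row whose first column is None, as in the current output above; merge_continuation_rows is a hypothetical helper written for this illustration, not part of pdfplumber. It would also merge a genuine table row whose first cell happens to be empty, so the heuristic may need adjusting for other tables.

```python
def merge_continuation_rows(rows, key_col=0, sep=" \n"):
    """Fold rows whose key column is None into the preceding row.

    Hypothetical helper (not a pdfplumber function). Assumes a row with
    None in `key_col` is a continuation of the row before it.
    """
    merged = []
    for row in rows:
        is_continuation = bool(merged) and row[key_col] is None
        if not is_continuation:
            merged.append(list(row))  # copy so the input is not modified
            continue
        previous = merged[-1]
        for i, cell in enumerate(row):
            if cell:  # skip None and empty strings
                previous[i] = f"{previous[i]}{sep}{cell}" if previous[i] else cell
    return merged


# Rows as shown in the "current output" of the question
rows = [
    ['Inhaltsstoff', 'CAS-Nr.', 'Wert', '', 'Zu', '', 'Grundlage'],
    [None, None, None, None, 'überwachende', None, None],
    [None, None, None, None, 'Parameter', None, None],
]
result = merge_continuation_rows(rows)
print(result)
```

In the original script this would be applied as Tabelle = merge_continuation_rows(Seite.extract_table()).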
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
