PDFtoHTML Ligatures and Entities
I am using Poppler's pdftohtml and it generates the HTML file successfully, but I don't know how to solve the points below.
Command used in Command Prompt:
pdftohtml -c -s -enc Latin2 Sample.pdf
- Entities need named entity format like Ū instead of UTF character Ü.
- Ligature character issue: "selfl essness" should be "selflessness".
- Removing additional space at punctuation.
I am using pdftohtml version 21.04.0 on Windows 10. How can I solve the above points?
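For illustration, here is a minimal sketch of how the three points might be handled as a post-processing step on the generated HTML text, assuming it is acceptable to clean the output outside pdftohtml. It uses only the Python standard library. The function names (`to_named_entities`, `expand_ligatures`, `tighten_punctuation`) are hypothetical and not part of pdftohtml or Poppler. The entity table is Python's HTML 4 list, so characters outside it (Ū, for example) fall back to numeric references. The ligature step only expands Unicode ligature code points such as U+FB02; a stray space already present in the text, as in "selfl essness", is not repaired here and would likely need a word list.

```python
import re
from html.entities import codepoint2name

# Unicode ligature code points and their plain-letter expansions.
LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
})


def to_named_entities(text):
    """Replace non-ASCII characters with named entities where one exists
    in the HTML 4 table, otherwise with a decimal numeric reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # ASCII (including tags) is left untouched
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        else:
            out.append("&#" + str(cp) + ";")
    return "".join(out)


def expand_ligatures(text):
    """Expand single-character ligatures into their component letters."""
    return text.translate(LIGATURES)


def tighten_punctuation(text):
    """Remove spaces or tabs that appear directly before punctuation."""
    return re.sub(r"[ \t]+([,.;:!?])", r"\1", text)


def clean(text):
    """Apply the steps in order; entities go last so that ligature
    characters are expanded before any entity conversion."""
    return to_named_entities(tighten_punctuation(expand_ligatures(text)))
```

The functions take and return plain strings, so the HTML file would need to be read with the encoding it was written in (the command above requests Latin2) before calling `clean`. Note that the punctuation rule is applied to the whole text, markup included, so it is worth checking the result on a sample file first.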
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
