PDFtoHTML Ligatures and Entities

I am using Poppler's pdftohtml and have generated the HTML file successfully, but I don't know how to solve the points below.

The command I ran in Command Prompt:

pdftohtml -c -s -enc Latin2 Sample.pdf

  1. Special characters should be output as named entities (for example &Uuml;) rather than as the raw UTF-8 character Ü.
  2. Ligature issue: "selfl essness" should be "selflessness".
  3. The extra space next to punctuation should be removed.

I am using pdftohtml version 21.04.0 on Windows 10. How can I solve the above points?
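
To make the three points concrete, here is a minimal post-processing sketch in Python that would run on the HTML after pdftohtml has written it. It is not a pdftohtml feature, and every function name in it is hypothetical. It assumes the ligatures arrive as Unicode ligature characters (U+FB00 to U+FB04), that the stray spaces are ordinary spaces, and that HTML 4 named entities are enough (characters without one fall back to numeric references).

```python
import re
from html.entities import codepoint2name  # HTML 4 names, e.g. 220 -> "Uuml"

# Unicode ligature characters and their plain-letter spellings.
LIGATURES = {
    "\ufb00": "ff", "\ufb01": "fi", "\ufb02": "fl",
    "\ufb03": "ffi", "\ufb04": "ffl",
}


def join_ligature_gaps(text):
    """Heuristic: drop a space that directly follows a ligature character
    when a lowercase letter comes next. This can wrongly join real word
    boundaries (a word that ends in a ligature), so it is opt-in."""
    return re.sub(r"([\ufb00-\ufb04]) (?=[a-z])", r"\1", text)


def expand_ligatures(text):
    """Replace each ligature character with its plain letters."""
    for lig, plain in LIGATURES.items():
        text = text.replace(lig, plain)
    return text


def tighten_punctuation(text):
    """Remove spaces or tabs that sit directly before common punctuation."""
    return re.sub(r"[ \t]+([,.;:!?])", r"\1", text)


def to_named_entities(text):
    """Turn non-ASCII characters into named entities where HTML 4 defines
    one, otherwise into numeric character references. ASCII is untouched,
    so existing tags and entities in the markup are left alone."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        else:
            out.append("&#%d;" % cp)
    return "".join(out)


def postprocess(html_text, join_gaps=False):
    """Apply the steps in order; entity conversion goes last so the
    ligature characters are still recognisable in the earlier steps."""
    if join_gaps:
        html_text = join_ligature_gaps(html_text)
    html_text = expand_ligatures(html_text)
    html_text = tighten_punctuation(html_text)
    return to_named_entities(html_text)
```

To use it, read the generated HTML file with the encoding that matches the -enc option (for Latin2 that would presumably be "iso-8859-2" in Python), pass the text through postprocess(), and write the result back. Note that tighten_punctuation() runs over the whole file, markup included, so it is worth checking the output on a sample page first.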



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
