Python convert docx to html using mammoth: html, head and body tag missing
I am trying to convert a simple docx file into an HTML file using the mammoth package. However, the generated HTML is only a fragment: the html, head, and body tags are all missing from the output string.
Is there a parameter that makes mammoth produce a complete, valid HTML document?
Solution 1:[1]
I read the documentation and did not find an option to generate a full HTML document. Since the generated HTML is just a string, it is easy to wrap it in a complete HTML document yourself:
import mammoth

# Convert the docx file; mammoth returns only an HTML fragment
with open("test.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file)
    html = result.value  # The generated HTML
    messages = result.messages  # Any messages, such as warnings during conversion

# Wrap the fragment in the tags a complete document needs
full_html = (
    '<!DOCTYPE html><html><head><meta charset="utf-8"/></head><body>'
    + html
    + "</body></html>"
)

with open("test.html", "w", encoding="utf-8") as f:
    f.write(full_html)
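The code above simply prepends the doctype and the opening html, head, and body tags to the fragment and appends the matching closing tags, so the result is a complete HTML document. The meta charset declaration matches the UTF-8 encoding used when writing the file.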
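If you need this wrapping in more than one place, it can be moved into a small helper. The following is only a sketch under assumptions: `wrap_html_fragment`, its `title` and `lang` parameters, and the sample fragment are hypothetical names chosen for illustration and are not part of mammoth. The helper uses only the standard library and escapes the title so that characters such as `&` do not break the markup.

```python
from html import escape


def wrap_html_fragment(fragment: str, title: str = "Converted document", lang: str = "en") -> str:
    """Wrap an HTML fragment in a minimal, complete HTML5 document.

    Hypothetical helper: `fragment` is assumed to be body-level HTML,
    such as the string in `result.value` from mammoth.convert_to_html.
    """
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang, quote=True)}">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        # Escape the title so special characters stay valid HTML text
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"{fragment}\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Stand-in for the fragment mammoth would return
    sample = "<p>Hello, <strong>world</strong>!</p>"
    print(wrap_html_fragment(sample, title="Test & demo"))
```

With mammoth, you would pass `result.value` as the `fragment` argument and write the returned string to a file opened with `encoding="utf-8"`, as in the answer above.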
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
