Python convert docx to html using mammoth: html, head and body tags missing

I am trying to convert a simple docx file into an HTML file using the mammoth package. However, the generated HTML is only a fragment of a full HTML document: the html, head, and body tags are all missing from the generated string.

Is there a parameter that makes mammoth produce a complete, valid HTML document?

Solution 1:^[1]

I read the documentation and did not find an option to generate a full HTML document. Since the generated HTML is just a string, it is easy to wrap it into a complete document yourself:

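import mammoth

# Convert the .docx file; mammoth returns only an HTML fragment (the body content).
with open("test.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file)

html = result.value  # The generated HTML fragment
messages = result.messages  # Any messages, such as warnings during conversion

# Wrap the fragment in the tags a complete HTML document needs.
full_html = (
    '<!DOCTYPE html><html><head><meta charset="utf-8"/></head><body>'
    + html
    + "</body></html>"
)

# Write the result as UTF-8 to match the declared charset.
with open("test.html", "w", encoding="utf-8") as f:
    f.write(full_html)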

In the above code, we simply prepend the opening tags and append the closing tags so that the fragment returned by mammoth becomes a complete HTML document.
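If you need this in more than one place, the wrapping step can be pulled out into a small helper. The following is only a sketch: `wrap_html_fragment`, its `title` and `lang` parameters, and the sample fragment are names I made up for illustration and are not part of mammoth. It uses only the standard library, so it runs without mammoth installed; in real use you would pass `result.value` as the fragment.

```python
from html import escape


def wrap_html_fragment(fragment: str, title: str = "Document", lang: str = "en") -> str:
    """Wrap an HTML fragment (e.g. mammoth's result.value) in a full HTML5 page.

    Hypothetical helper, not a mammoth API. The fragment is inserted as-is,
    while the title and lang values are escaped because they are plain text.
    """
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang, quote=True)}">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"{fragment}\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Stand-in for result.value so the sketch runs on its own.
    sample_fragment = "<p>Hello from a converted document.</p>"
    print(wrap_html_fragment(sample_fragment, title="test"))
```

The helper also adds a `<title>` element, which HTML validators normally expect inside `<head>`. Write the returned string with `encoding="utf-8"` so the file matches the declared charset.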

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
