Convert an HTML string to a .txt file in Python

I have an HTML string that is guaranteed to contain only text (i.e. no images, videos, or other assets). Some of the text may carry formatting, though; for example, parts of it might be bold.

Is there a way to convert the HTML string output to a .txt file? I don't care about maintaining the formatting but I do want to maintain the spacing of the text.

Is that possible with Python?



Solution 1:[1]

I had a similar problem earlier, where I needed to write the code for ECharts (a framework for generating front-end charts) to a file. Maybe you can use it as a reference:

# Generate the HTML file.
# file_path, package_name, x_axis and y_axis are assumed to be defined earlier.
html_file = open(file_path, "w")
html_content = """
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <title>ECharts</title>
    <script src="echarts.min.js"></script>
  </head>
  <body>
    <div id="main" style="width: 1200px;height:800px;"></div>
    <script type="text/javascript">
      var myChart = echarts.init(document.getElementById('main'));

      var option = {
            title: {
              text: 'Memory Monitor'
            },
            tooltip: {
              trigger: 'axis'
            },
            legend: {
              data: ['%(package_name)s']
            },
            toolbox: {
              feature: {
                saveAsImage: {}
              }
            },
            xAxis: {
              type: 'category',
              boundaryGap: false,
              data: %(x_axis)s
            },
            yAxis: {
              type: 'value',
              scale : true,
              max : 20000,
              min : 8000,
              splitNumber : 5,
              boundaryGap : [ 0.2, 0.2 ]
            },
            dataZoom:[{
              type: 'slider',
              show: true,
              realtime: true,
              start: 0,
              end: 100
            }],
            series: [
              {
                name: '%(package_name)s',
                type: 'line',
                stack: 'Total',
                data: %(y_axis)s
              }
            ]
          };

      myChart.setOption(option);
    </script>
  </body>
</html>
""" % dict(package_name=package_name, x_axis=x_axis, y_axis=y_axis)
# Write the content to the file
html_file.write(html_content)
# Close the file
html_file.close()
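
The open/write/close pattern above applies equally to a .txt file. As a minimal sketch (not part of the original answer), the same write can be done with a context manager, which closes the file even if the write fails. The helper name write_text_file, the example path, and the sample content are all hypothetical.

```python
import os
import tempfile


def write_text_file(path, content):
    """Write content to path as UTF-8; the file is closed even if write() raises."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


# Hypothetical example data, not taken from the original answer.
example_path = os.path.join(tempfile.gettempdir(), "example_output.txt")
write_text_file(example_path, "first line\n\nsecond line\n")
```

Passing an explicit encoding avoids depending on the platform default, which matters if the text contains non-ASCII characters.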

Solution 2:[2]

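#!/usr/bin/env python3

from urllib.request import urlopen

import html2text
from bs4 import BeautifulSoup

# Download the page and parse it.
soup = BeautifulSoup(urlopen('http://example.com/page.html').read(), 'html.parser')

# Select the element that holds the content.
txt = soup.find('div', {'class': 'body'})

# html2text expects an HTML string, so convert the tag back to markup first.
print(html2text.html2text(str(txt)))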
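
When the HTML is already a string, as in the question, no download step is needed. The following is a rough sketch that swaps html2text for the standard library's html.parser.HTMLParser, so it runs without third-party packages. The class name TextExtractor, the function name html_to_text, and the choice of which tags count as line breaks are assumptions made for illustration; real-world HTML may need a different tag set.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and emits a newline after block-level elements."""

    # Tags whose end marks a line break; this set is an illustrative assumption.
    BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> has no closing tag, so the newline is emitted when it opens.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Text is kept verbatim, so spacing inside the text is preserved.
        self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string with inline formatting dropped."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


plain = html_to_text("<p>Hello <b>bold</b> world</p><p>Second  line</p>")
```

The returned string can then be written to a .txt file with an ordinary open(..., "w") call, as in Solution 1.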

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    CN-LanBao
Solution 2