How can I scrape Medium content and get all the h1 and p tags as strings?

I have been trying to scrape Medium content but was unable to get all the h1 tags. I was able to get all the p tags through to the end, but the h1 tags that should appear between them are missing.

I want to scrape all the content in order of appearance, including all the subheadings in h1 tags.
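
A minimal sketch of that goal, assuming the headings and body text are plain `<h1>` and `<p>` elements (Medium's real markup may differ and may change over time). Passing a list of tag names to `find_all` returns the matching elements in document order, so headings stay interleaved with paragraphs. `SAMPLE_HTML` and `headings_and_paragraphs` are hypothetical names; the sample string stands in for a downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a fetched article.
SAMPLE_HTML = """
<article>
  <h1>Build your first app</h1>
  <p>Intro paragraph.</p>
  <h1>Install Flask</h1>
  <p>Run pip install flask.</p>
  <p>Then create app.py.</p>
</article>
"""


def headings_and_paragraphs(html):
    """Return (tag_name, text) pairs for every h1 and p, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # A list of names matches any of them; results keep document order.
    return [(el.name, el.get_text(strip=True)) for el in soup.find_all(["h1", "p"])]


for name, text in headings_and_paragraphs(SAMPLE_HTML):
    print(f"{name}: {text}")
```

Passing `response.text` from a `requests.get` call instead of the sample string should behave the same way, provided the article text is present in the static HTML the server returns.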

This is what I have done:

```python
# import stuff
import requests
import bs4
import os
import shutil
from PIL import Image

article_URL = 'https://medium.com/bhavaniravi/build-your-1st-python-web-app-with-flask-b039d11f101c'  #@param {type:"string"}
# article_URL = 'https://www.tmz.com/2020/07/29/dr-dre-answers-wife-divorce-petition-prenup/'
response = requests.get(article_URL)
soup = bs4.BeautifulSoup(response.text, 'html.parser')
paragraphs = soup.find_all(['li', 'p', 'strong', 'em'])
title = soup.find(['h1', 'title']).get_text()
print(title)

txt_list = []
tag_list = []
with open('content2.txt', 'w') as f:
    f.write(title + '\n\n')
    for p in paragraphs:
        # skip elements that carry an href attribute
        if p.get('href'):
            continue
        # this filters out things that are most likely not part of the core article
        if len(p.get_text()) > 100:
            # print(p.href)
            tag_list.append(p.name)
            txt_list.append(p.get_text())

txt_list2 = []
tag_list2 = []
for i in range(len(txt_list)):
    # if '\n' not in txt_list[i]:
    print(txt_list[i])
    # print(len(txt_list[i]))
    # print(tag_list[i])
    print()
    # compare the first five words with the previous entry (there is none for i == 0)
    comp1 = txt_list[i].split()[0:5]
    comp2 = txt_list[i - 1].split()[0:5] if i > 0 else None
    if comp1 == comp2:
        pass  # same opening words as the previous entry: likely duplicated text
    else:
        pass
```
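
In the code above, `h1` is not in the `find_all` list, and the `> 100` character filter would drop most subheadings anyway because they are short. The `strong` and `em` tags are also usually nested inside a `p`, so their text is collected a second time, which seems to be what the `comp1 == comp2` check is trying to catch. Below is a minimal sketch of one way to handle all three, assuming subheadings are `<h1>` elements: apply the length threshold only to body tags, and skip any element nested inside another collected element. `CONTENT_TAGS`, `HEADING_TAGS`, `MIN_BODY_LEN`, `extract_content` and `SAMPLE_HTML` are hypothetical names chosen for illustration.

```python
from bs4 import BeautifulSoup

# Tags to collect in document order. MIN_BODY_LEN mirrors the question's
# "> 100 characters" filter but is applied to body tags only (an assumption).
CONTENT_TAGS = ["h1", "p", "li", "strong", "em"]
HEADING_TAGS = {"h1"}
MIN_BODY_LEN = 100


def extract_content(html, min_body_len=MIN_BODY_LEN):
    """Return (tag, text) pairs in document order without nested duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for el in soup.find_all(CONTENT_TAGS):
        # <strong>/<em> inside a <p> repeat text the <p> already holds, so skip
        # any element that has an ancestor which is also a collected tag.
        if el.find_parent(CONTENT_TAGS) is not None:
            continue
        text = el.get_text(" ", strip=True)
        # Headings are short, so only body text goes through the length filter.
        if el.name not in HEADING_TAGS and len(text) <= min_body_len:
            continue
        results.append((el.name, text))
    return results


# Hypothetical sample page: one long paragraph with bold text, one short one.
BODY = "Flask is a small web framework. " * 4
SAMPLE_HTML = (
    "<article><h1>Install Flask</h1>"
    f"<p>{BODY}<strong>Use a virtualenv.</strong></p>"
    "<p>Too short.</p>"
    "<h1>Run the app</h1></article>"
)

for tag, text in extract_content(SAMPLE_HTML):
    print(f"[{tag}] {text[:40]}")
```

Skipping nested matches avoids comparing the first five words of neighbouring entries after the fact. The `(tag, text)` pairs can then be written to `content2.txt` in order, with headings marked however you like. If the live page uses other tags for subheadings (for example `h2`), they would need to be added to `CONTENT_TAGS` and `HEADING_TAGS`.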

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
