Web scraping of coreyms.com
When I scrape the posts on coreyms.com with BeautifulSoup (the heading, date, content, and YouTube link of each post), I run into this problem: all posts except one contain a YouTube link. So after scraping, len(videolink) is 9, while len(heading), len(date), and len(content) are each 10. How can I make len(videolink) equal 10 by inserting NaN for the post that has no YouTube link?
My code, for reference:
from bs4 import BeautifulSoup
import requests

# Download the front page and parse it
page7 = requests.get('https://coreyms.com/')
page7
soup7 = BeautifulSoup(page7.content, 'html.parser')
soup7

# Each list is filled by a separate page-wide search
heading = []
for i in soup7.find_all('h2', class_='entry-title'):
    heading.append(i.text)
heading

date = []
for i in soup7.find_all('time', class_='entry-time'):
    date.append(i.text)
date

content = []
for i in soup7.find_all('div', class_='entry-content'):
    content.append(i.text)
content

# Only posts that embed a video produce a match here, so this list comes up one short
videolink = []
for i in soup7.find_all('iframe', class_='youtube-player'):
    videolink.append(i['src'])
videolink

print(len(heading), len(date), len(content), len(videolink))
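
One possible way to keep the four lists aligned is to loop over the posts one at a time and look for the iframe inside each post, recording NaN when none is found. The sketch below is only an illustration: it assumes each post is wrapped in its own `article` element (not verified against the live site), and `extract_posts` and `sample_html` are hypothetical names and markup invented for the example.

```python
import math
from bs4 import BeautifulSoup


def extract_posts(html):
    """Return one dict per post; missing fields (e.g. no video) become NaN."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Assumption: every post lives in its own <article> element.
    for post in soup.find_all("article"):
        title = post.find("h2", class_="entry-title")
        time_tag = post.find("time", class_="entry-time")
        body = post.find("div", class_="entry-content")
        iframe = post.find("iframe", class_="youtube-player")
        rows.append({
            "heading": title.get_text(strip=True) if title else math.nan,
            "date": time_tag.get_text(strip=True) if time_tag else math.nan,
            "content": body.get_text(strip=True) if body else math.nan,
            # find() returns None when the post has no matching iframe
            "videolink": iframe.get("src", math.nan) if iframe else math.nan,
        })
    return rows


# Hypothetical markup that mimics two posts, the second without a video.
sample_html = """
<article>
  <h2 class="entry-title">Post with video</h2>
  <time class="entry-time">May 1, 2020</time>
  <div class="entry-content">Some text
    <iframe class="youtube-player" src="https://www.youtube.com/embed/abc"></iframe>
  </div>
</article>
<article>
  <h2 class="entry-title">Post without video</h2>
  <time class="entry-time">April 1, 2020</time>
  <div class="entry-content">Only text here</div>
</article>
"""

posts = extract_posts(sample_html)
heading = [p["heading"] for p in posts]
videolink = [p["videolink"] for p in posts]
print(len(heading), len(videolink))
```

To try this on the real page, `requests.get('https://coreyms.com/').content` could be passed in place of `sample_html`. Because every field is looked up within the same post, all lists end up with one entry per post, whether or not that post embeds a video.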
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
