Web scraping returns []
It's a simple web scraping script, and I'm having a lot of problems with it.
It should give me all the titles in a YouTube playlist.
The HTML code:
<a id="video-title" class="yt-simple-endpoint style-scope ytd-playlist-video-renderer" href="/watch?v=hnqRXZZAqPw&list=PLojsoh8U3jSyIaRcvOhd6ecqsibcM0y2a&index=1&t=1542s" title="Annie Get Your Gun • Bernadette Peters • 1/3">
Annie Get Your Gun • Bernadette Peters • 1/3
</a>
My code:
import requests
from bs4 import BeautifulSoup
url = "https://www.youtube.com/playlist?list=PLojsoh8U3jSyIaRcvOhd6ecqsibcM0y2a"
yt = requests.get(url)
soup = BeautifulSoup(yt.text, 'html.parser')
#t = soup.find_all("a", {"class": "yt-simple-endpoint style-scope ytd-playlist-video-renderer"})
## My first idea was something like that, but it didn't work. Then a friend told me to do it like this:
t = soup.select(".yt-simple-endpoint.style-scope.ytd-playlist-video-renderer")
texts = [element.text.strip() for element in t]
titles = [element.attrs.get("title") for element in t]
print(t)
print(texts)
print(titles)
But it only returns:
[]
[]
[]
Solution 1:[1]
The HTML you get from requests.get() can differ from what you see in the browser. requests only downloads the server's initial response and does not execute JavaScript, while YouTube fills in the playlist entries with JavaScript after the page loads. The <a id="video-title"> elements are therefore missing from yt.text, and every selector returns an empty list.
[A more in-depth explanation of this part is welcome.]
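One way to check that the selector itself is fine is to run it offline against the HTML copied from the question. The sketch below does that. The names `extract_titles`, `RENDERED_HTML`, and `STATIC_HTML` are hypothetical, and `STATIC_HTML` is only a made-up stand-in for a response that lacks the rendered entries, not YouTube's actual response.

```python
from bs4 import BeautifulSoup

# Element copied from the question (href omitted for brevity):
# this is what the browser shows after the page has finished rendering.
RENDERED_HTML = """
<a id="video-title" class="yt-simple-endpoint style-scope ytd-playlist-video-renderer"
   title="Annie Get Your Gun • Bernadette Peters • 1/3">
Annie Get Your Gun • Bernadette Peters • 1/3
</a>
"""

# Made-up stand-in for a response that does not contain the playlist entries.
STATIC_HTML = "<html><body><script>/* data loaded later */</script></body></html>"


def extract_titles(html):
    """Return the title attribute of every playlist link found in html."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.select("a.yt-simple-endpoint.style-scope.ytd-playlist-video-renderer")
    return [link.get("title") for link in links]


print(extract_titles(RENDERED_HTML))
print(extract_titles(STATIC_HTML))
```

The first call finds the title and the second returns an empty list, which suggests the empty result in the question comes from the downloaded HTML rather than from the selector.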
A workaround is to use Selenium, a browser automation package that drives a real browser, so the page's JavaScript runs before you read the HTML. A minimal example:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
url = "https://www.youtube.com/playlist?list=PLojsoh8U3jSyIaRcvOhd6ecqsibcM0y2a"
# webdriver_manager downloads a ChromeDriver matching the installed Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
time.sleep(10) # wait a while so the JavaScript-rendered content has time to load
# the 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit() # close the browser once the rendered HTML has been captured
t = soup.find_all('a', {'class': 'yt-simple-endpoint style-scope ytd-playlist-video-renderer'})
texts = [element.text.strip() for element in t]
print(texts)
When you run the code, a browser window opens and loads the page, just as if someone were browsing.
Output:
[
'Annie Get Your Gun • Bernadette Peters • 1/3',
'Annie Get Your Gun • Bernadette Peters • 2/3',
...
]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | tyson.wu |
