Web scraping returns []

This is a simple web scraping task, but I'm having a lot of problems with it.

It should give me all the titles of a YouTube playlist.

The HTML code:

<a id="video-title" class="yt-simple-endpoint style-scope ytd-playlist-video-renderer" href="/watch?v=hnqRXZZAqPw&amp;list=PLojsoh8U3jSyIaRcvOhd6ecqsibcM0y2a&amp;index=1&amp;t=1542s" title="Annie Get Your Gun • Bernadette Peters • 1/3">
          Annie Get Your Gun • Bernadette Peters • 1/3
        </a>

My code:

import requests
from bs4 import BeautifulSoup

url = "https://www.youtube.com/playlist?list=PLojsoh8U3jSyIaRcvOhd6ecqsibcM0y2a"
yt = requests.get(url)

soup = BeautifulSoup(yt.text, 'html.parser')

# My first idea was something like this, which didn't work:
# t = soup.find_all("a", {"class": "yt-simple-endpoint style-scope ytd-playlist-video-renderer"})

# Then a friend told me to do it like this:

t = soup.select(".yt-simple-endpoint.style-scope.ytd-playlist-video-renderer")

texts = [element.text.strip() for element in t]  

titles = [element.attrs.get("title") for element in t]  

print(t)
print(texts)
print(titles)  

But it only returns:

[]

[]

[]



Solution 1:[1]

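From my experience, the HTML you get from requests.get() can differ from what you see in a browser. requests only downloads the server's initial response and does not run JavaScript, while YouTube builds the playlist's video list in the browser with JavaScript. The <a id="video-title"> elements visible in the browser's inspector are therefore missing from yt.text, so the selector matches nothing and every list comes back empty.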
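One way to check this is to count the title links in the HTML that requests actually received. The sketch below uses only the standard library; the names VideoTitleCounter and count_video_title_links are made up for illustration, and the raw string is a hypothetical stand-in for a response that contains scripts but no rendered links.

```python
from html.parser import HTMLParser


class VideoTitleCounter(HTMLParser):
    """Count <a id="video-title"> start tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a" and dict(attrs).get("id") == "video-title":
            self.count += 1


def count_video_title_links(html):
    counter = VideoTitleCounter()
    counter.feed(html)
    return counter.count


# Markup as the browser shows it after JavaScript has run (from the question)
rendered = (
    '<a id="video-title" class="yt-simple-endpoint" '
    'title="Annie Get Your Gun • Bernadette Peters • 1/3">'
    "Annie Get Your Gun • Bernadette Peters • 1/3</a>"
)
# Hypothetical stand-in for a raw response: scripts only, no rendered links
raw = "<html><body><script>var data = {};</script></body></html>"

print(count_video_title_links(rendered))  # 1
print(count_video_title_links(raw))       # 0
```

Calling count_video_title_links(yt.text) on the response from the question should show whether the links are present at all before any selector is blamed.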

A workaround is to use Selenium, a browser automation package that drives a real browser, so the page's JavaScript runs before you read the HTML. A minimal example:

import time
from bs4 import BeautifulSoup

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.youtube.com/playlist?list=PLojsoh8U3jSyIaRcvOhd6ecqsibcM0y2a"

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
time.sleep(10)  # wait a while so the page's JavaScript can finish rendering

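# 'lxml' requires the lxml package; 'html.parser' works here as well
soup = BeautifulSoup(driver.page_source, 'lxml')
t = soup.find_all('a', {'class': 'yt-simple-endpoint style-scope ytd-playlist-video-renderer'})
texts = [element.text.strip() for element in t]

print(texts)
driver.quit()  # close the browser when done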

When you run the code, a browser window opens and loads the page, just as if someone were browsing.

Output:

[
    'Annie Get Your Gun • Bernadette Peters • 1/3',
    'Annie Get Your Gun • Bernadette Peters • 2/3', 
    ...
]
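The fixed time.sleep(10) either waits longer than needed or not long enough on a slow connection. A possible alternative is to poll until the title links appear. The helper below is a hand-rolled sketch, not part of Selenium; wait_for_titles and its timeout and poll parameters are assumptions made for illustration. It expects a driver that has already loaded the playlist URL, and it returns as soon as the first links show up, so it does not handle entries that load later while scrolling.

```python
import time


def wait_for_titles(driver, timeout=15.0, poll=0.5):
    """Poll the page until playlist title links exist, then return their titles."""
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value behind Selenium's By.CSS_SELECTOR
        links = driver.find_elements("css selector", "a#video-title")
        if links:
            return [link.get_attribute("title") for link in links]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no playlist titles found within {timeout} seconds")
        time.sleep(poll)
```

With the example above, titles = wait_for_titles(driver) would replace the time.sleep(10) call and the BeautifulSoup parsing. Selenium also ships its own WebDriverWait class for this kind of explicit wait.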

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 tyson.wu