Beautiful Soup find() function returning None even though the element exists, and find() works for other elements on the page?

I am trying to create a web scraper using Python and BeautifulSoup4 in order to get the data from the Billboard Hot 100 charts https://www.billboard.com/charts/hot-100/ .

For some reason the find()/findAll() functions do not work for getting the artist of the #1 song, even though they work for the #1 song's title and for everything from #2-#100, so I'm not sure what I'm doing wrong.

The code that's returning None (or just an empty list when using findAll):

# GET #1 ARTIST (CURRENTLY NOT WORKING)
topArtist = soup.find("p", {"class": "c-tagline  a-font-primary-l a-font-primary-m@mobile-max lrv-u-color-black u-color-white@mobile-max lrv-u-margin-tb-00 lrv-u-padding-t-025 lrv-u-margin-r-150"})

The HTML tags are different for the artist/song at #1, which is why I'm using a separate statement from the one for the other 99 songs. But since I got the #1 song's title using the same approach, I don't know why this one isn't working.

This is the HTML tag (I want to get the "Glass Animals"):

<p class="c-tagline  a-font-primary-l a-font-primary-m@mobile-max lrv-u-color-black u-color-white@mobile-max lrv-u-margin-tb-00 lrv-u-padding-t-025 lrv-u-margin-r-150">Glass Animals</p>
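One thing worth knowing here: when you pass a multi-class string to find(), BeautifulSoup matches the class attribute against that exact string, so any difference in the HTML the server actually sends (an extra class, a different order, changed spacing) makes it return None. Matching on a single stable class succeeds whenever the tag's class list merely contains it. A minimal sketch using just the snippet above, parsed in isolation:

```python
from bs4 import BeautifulSoup

# The #1 artist tag copied from the question, parsed on its own.
html = ('<p class="c-tagline  a-font-primary-l a-font-primary-m@mobile-max '
        'lrv-u-color-black u-color-white@mobile-max lrv-u-margin-tb-00 '
        'lrv-u-padding-t-025 lrv-u-margin-r-150">Glass Animals</p>')
soup = BeautifulSoup(html, "html.parser")

# A single class name matches if the tag's class list *contains* it,
# so the surrounding styling classes can vary without breaking the match.
top_artist = soup.find("p", class_="c-tagline")
print(top_artist.text)  # Glass Animals
```

This doesn't explain why the full string failed for you specifically, but it sidesteps the whole class of problem.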

This is my working code for the 2-100 positions:

from bs4 import BeautifulSoup
import requests    

url = "https://www.billboard.com/charts/hot-100/"
result = requests.get(url)
soup = BeautifulSoup(result.text, "html.parser")

# GET ARTISTS 2-100
artist = soup.findAll("span", {"class": "c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line"
                                        "-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-"
                                        "truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only"
                               })

artist_list = []
for i in range(99):
    artist_list.append(artist[i].text)


# GET #1 SONG
song_list = []
topSong = soup.find("a", {"href": "#",
                          "class": "c-title__link lrv-a-unstyle-link"})
song_list.append(topSong.text)

# GET SONGS 2-100
song = soup.findAll("h3", {"class": "c-title a-no-trucate a-font-primary-bold-s u-letter-spacing-0021 lrv-u-font-size"
                                    "-18@tablet lrv-u-font-size-16 u-line-height-125 u-line-height-normal@mobile-max "
                                    "a-truncate-ellipsis u-max-width-330 u-max-width-230@tablet-only",
                           "id": "title-of-a-story"})

for i in range(99):
    song_list.append(song[i].text)

I've looked all over and can't find a fix; switching to a selenium webdriver didn't change anything for me. Any help would be appreciated.



Solution 1:[1]

Here's the code I used to scrape all 100 songs and their artists. This website is really awkward to scrape because it doesn't expose ids or classes in a stable, scrapable manner, so I relied (mostly) on the current structure of the page. I'm not sure exactly what was causing your problem. The page was built with a framework, so it's littered with styling classes, and your selection was fickle because it relied on those being consistent. The #1 entry is almost certainly styled differently (notice how its cover image is bigger on the actual page).

from bs4 import BeautifulSoup
import requests    
x = requests.get("https://www.billboard.com/charts/hot-100/").text
soup = BeautifulSoup(x, "html.parser")
# Each chart row lives at:
# div.chart-results-list > div.o-chart-results-list-row-container > ul.o-chart-results-list-row

songNames = [x.text for x in soup.select("div.chart-results-list > div.o-chart-results-list-row-container > ul.o-chart-results-list-row > li:nth-child(4) > ul > li:nth-child(1) h3")]
authorNames = [x.text for x in soup.select("div.chart-results-list > div.o-chart-results-list-row-container > ul.o-chart-results-list-row > li:nth-child(4) > ul > li:nth-child(1) span")]
print(songNames)
#print(authorNames)
print(len(songNames))
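Once the two lists line up one-to-one, pairing them into ranked entries is straightforward. A small sketch with placeholder data (the names below are illustrative, not freshly scraped):

```python
# Placeholder results standing in for songNames/authorNames from the scrape.
songNames = ["Heat Waves", "As It Was", "First Class"]
authorNames = ["Glass Animals", "Harry Styles", "Jack Harlow"]

# zip() pairs the i-th song with the i-th artist; enumerate() adds the rank,
# starting from 1 to match chart positions.
chart = [(rank, song, artist)
         for rank, (song, artist) in enumerate(zip(songNames, authorNames), start=1)]

for rank, song, artist in chart:
    print(f"{rank}. {song} - {artist}")  # 1. Heat Waves - Glass Animals, etc.
```

Using zip() also fails gracefully if one selector picked up fewer elements than the other: you just get the shorter list, rather than an IndexError from a hard-coded range(99).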

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source

Solution 1: JadeSpy