BeautifulSoup4 find_all not getting the results I need

I'm trying to get data from flashscore.com for a project I'm doing as part of my self-taught Python study:

    import requests
    from bs4 import BeautifulSoup

    res = requests.get("https://www.flashscore.com/")
    soup = BeautifulSoup(res.text, "lxml")
    games = soup.find_all("div", {'class': ['event__match', 'event__match--scheduled', 'event__match--twoLine']})
    print(games)

When I run this, I get an empty list: []

Why?



Solution 1:[1]

When find_all() returns an empty list, it means that no elements matching your selector exist in the HTML that was actually parsed.

Make sure the content you are trying to scrape isn't added dynamically by JavaScript, or embedded in an iframe, as is the case on some sites.
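A quick way to verify this (a minimal sketch, reusing the request from the question) is to check whether the class name you expect even appears in the raw HTML that requests receives:

    import requests

    res = requests.get("https://www.flashscore.com/")

    # If the marker class is absent from the raw HTML but visible in the
    # browser's inspector, the content is rendered client-side and
    # BeautifulSoup will never see it.
    print("event__match" in res.text)  # likely prints False here

If this prints False while your browser's DevTools show the element, the data is injected by JavaScript after the page loads, which is exactly what happens on flashscore.com (see Solution 2).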

Solution 2:[2]

The failure is due to the fact that the website uses a set of Ajax techniques: the content is added dynamically by JavaScript, a client-side scripting language. Client-side code is executed in the browser itself, not at the web server level, so its output only exists once a browser has interpreted and run it. With the BeautifulSoup library alone, as in the program you wrote, you only examine the initial HTML code. JavaScript-rendered content can be obtained, for example, with the help of the Selenium library: https://www.selenium.dev/.
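As a minimal sketch of the idea (reusing the selector from the question; the site's class names may change over time), let the browser render the page first and only then hand the HTML to BeautifulSoup:

    # Minimal sketch: let Firefox execute the JavaScript, then parse the
    # rendered HTML with BeautifulSoup.
    import time

    from selenium.webdriver import Firefox
    from selenium.webdriver.firefox.options import Options
    from bs4 import BeautifulSoup

    options = Options()
    options.add_argument("--headless")  # no visible browser window
    driver = Firefox(options=options)
    driver.get("https://www.flashscore.com/")
    time.sleep(5)  # crude wait; the full script below uses WebDriverWait

    # driver.page_source contains the HTML *after* the scripts have run.
    soup = BeautifulSoup(driver.page_source, "lxml")
    games = soup.find_all("div", {"class": "event__match"})
    print(len(games))  # should now be non-zero once the page has rendered
    driver.quit()

Below is the full code for the data that I suppose you are interested in: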


    # crawler_her_sel.py
    # -*- coding: utf-8 -*-
    
    import time

    from selenium.webdriver import Firefox
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import NoSuchElementException, TimeoutException
    from bs4 import BeautifulSoup
    import pandas as pd
    
    # Variable with the URL of the website.
    my_url = "https://www.flashscore.com/"
    
    # Prepare the browser for the work.
    options = Options()
    options.add_argument("--headless")
    driver = Firefox(options=options)
    driver.get(my_url)
    
    # Prepare the blank dictionary to fill in for pandas.
    dictionary_of_matches = {}
    
    # Lists to hold the scraped data.
    list_of_countries = []
    list_of_leagues = []
    list_of_home_teams = []
    list_of_scores_for_home = []
    list_of_scores_for_away = []
    list_of_away_teams = []
    
    # Wait for the page to fully render. If the wait times out, continue
    # anyway and scrape whatever has been rendered so far.
    try:
        WebDriverWait(driver, 20).until(
            EC.presence_of_element_located(
                (By.CLASS_NAME, "boxOverContent__bannerLink")))
    except TimeoutException:
        pass
    finally:
        # Loads the website code as the BeautifulSoup object.
        pageSource = driver.page_source
        bsObj = BeautifulSoup(pageSource, "lxml")
    
        # Determine the number of football matches with BeautifulSoup:
        # collect the home and away participant divs (the page adds a
        # "fontBold" class variant to some participants).
        games_1 = bsObj.find_all(
            "div", {"class": "event__participant event__participant--home"})
        games_2 = bsObj.find_all(
            "div",
            {"class": "event__participant event__participant--home fontBold"})
        games_3 = bsObj.find_all(
            "div", {"class": "event__participant event__participant--away"})
        games_4 = bsObj.find_all(
            "div",
            {"class": "event__participant event__participant--away fontBold"})
    
        # Collect the country headers for the listed football matches.
        countries = driver.find_elements(By.CLASS_NAME, "event__title--type")
    
        # Upper bound on the number of rows to iterate over: each header
        # is counted once and each match twice (home + away), so surplus
        # iterations below simply append empty strings.
        sum_to_iterate = (len(countries) + len(games_1) + len(games_2)
                          + len(games_3) + len(games_4))
        
        for ind in range(1, sum_to_iterate + 1):
            row_xpath = f'//div[@class="sportName soccer"]/div[{ind}]'

            # Scraping of the country names.
            try:
                country = driver.find_element(
                    By.XPATH, row_xpath + '/div[1]/div/span[1]').text
            except NoSuchElementException:
                country = ""
            list_of_countries.append(country)

            # Scraping of the league names.
            try:
                league = driver.find_element(
                    By.XPATH, row_xpath + '/div[1]/div/span[2]').text
            except NoSuchElementException:
                league = ""
            list_of_leagues.append(league)

            # Scraping of the home team names.
            try:
                home_team = driver.find_element(
                    By.XPATH, row_xpath + '/div[3]').text
            except NoSuchElementException:
                home_team = ""
            list_of_home_teams.append(home_team)

            # Scraping of the home team scores.
            try:
                score_for_home_team = driver.find_element(
                    By.XPATH, row_xpath + '/div[5]').text
            except NoSuchElementException:
                score_for_home_team = ""
            list_of_scores_for_home.append(score_for_home_team)

            # Scraping of the away team scores.
            try:
                score_for_away_team = driver.find_element(
                    By.XPATH, row_xpath + '/div[6]').text
            except NoSuchElementException:
                score_for_away_team = ""
            list_of_scores_for_away.append(score_for_away_team)

            # Scraping of the away team names.
            try:
                away_team = driver.find_element(
                    By.XPATH, row_xpath + '/div[4]').text
            except NoSuchElementException:
                away_team = ""
            list_of_away_teams.append(away_team)
    
        # Add lists with the scraped data to the dictionary in the correct 
        # order.
        dictionary_of_matches["Countries"] = list_of_countries
        dictionary_of_matches["Leagues"] = list_of_leagues
        dictionary_of_matches["Home_teams"] = list_of_home_teams
        dictionary_of_matches["Scores_for_home_teams"] = list_of_scores_for_home
        dictionary_of_matches["Scores_for_away_teams"] = list_of_scores_for_away
        dictionary_of_matches["Away_teams"] = list_of_away_teams
    
        # Build a pandas DataFrame from the scraped data.
        df_res = pd.DataFrame(dictionary_of_matches)
    
        # Save the properly formatted data to a CSV file; the date and
        # time of the scrape are embedded in the file name.
        name_of_file = "flashscore{}.csv".format(
            time.strftime("%Y%m%d-%H.%M.%S"))
        df_res.to_csv(name_of_file, encoding="utf-8")
    
        driver.quit()

The result of the script is a CSV file which, when loaded into Excel, gives a table like the following:

[Screenshot: scraping result]
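
To sanity-check the output without Excel, you can read the CSV back with pandas (the file name below is just an example of the pattern the script generates):

    import pandas as pd

    # Example name; the real file carries the timestamp of your scrape.
    df = pd.read_csv("flashscore20240101-12.00.00.csv", index_col=0)
    print(df.head())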

It is worth mentioning that you need to download the appropriate driver for your browser: https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/.
In addition, here are links to two other interesting scripts that relate to scraping the https://www.flashscore.com/ portal: How can i scrape a football results from flashscore using python and Scraping stats with Selenium.
I would also like to raise the legal issues here. The robots.txt file downloaded from https://www.flashscore.com/robots.txt looks like this:

[Screenshot: robots.txt contents]

It shows that you can scrape the home page. But the "General Terms of Use" say, quoting: "Without prior authorisation in writing from the Provider, Visitors are not authorised to copy, modify, tamper with, distribute, transmit, display, reproduce, transfer, upload, download or otherwise use or alter any of the content of the App."
This unfortunately introduces ambiguity, and ultimately it is not clear what the owner really allows. Therefore, I recommend that you do not run this script continuously, and certainly not for commercial purposes, and I ask the same of other visitors to the site. I wrote this script purely as an exercise in learning to scrape and do not intend to use it at all. The finished script can be downloaded from my GitHub.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Rami M
Solution 2: (no attribution given)