Webscraping with BeautifulSoup, can't find table within html
I am trying to webscrape the main table from this site: https://www.atptour.com/en/stats/leaderboard?boardType=serve&timeFrame=52Week&surface=all&versusRank=all&formerNo1=false Here is my code:
import requests
from bs4 import BeautifulSoup, Comment
import pandas as pd
url = "https://www.atptour.com/en/stats/leaderboard?boardType=serve&timeFrame=52Week&surface=all&versusRank=all&formerNo1=false"
request = requests.get(url).text
soup = BeautifulSoup(request, 'lxml')
divs = soup.findAll('tbody', id = 'leaderboardTable')
print(divs)
However, this is the only output:
How do I access the rest of the html? It appears to not be there when I search through the soup. I have also attached an image of the html I am seeking to access. Any help is appreciated. Thank you!
Solution 1:[1]
There is an Ajax request that fetches that data, but it is blocked by Cloudflare. There is a package, cloudscraper, that can bypass that, but it doesn't seem to work for this site.
What you'd need to do instead is use something like Selenium to let the page render first, then pull the data.
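from io import StringIO
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import pandas as pd
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
browser.get("https://www.atptour.com/en/stats/leaderboard?boardType=serve&timeFrame=52Week&surface=all&versusRank=all&formerNo1=false")
# Parse the rendered page source; the first table on the page is used
df = pd.read_html(StringIO(browser.page_source), header=0)[0]
# quit() ends the whole browser session, not just the current window
browser.quit()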
Output:
Solution 2:[2]
Your code is working as expected. The HTML you are parsing does not have any data under the table.
$ wget https://www.atptour.com/en/stats/leaderboard\?boardType\=serve\&timeFrame\=52Week\&surface\=all\&versusRank\=all\&formerNo1\=false -O page.html
$ grep -C 3 'leaderboardTable' page.html
class="stat-listing-table-content no-pagination">
<table class="stats-listing-table">
<!-- TODO: This table head will only appear on DESKTOP-->
<thead id="leaderboardTableHeader" class="leaderboard-table-header">
</thead>
<tbody id="leaderboardTable"></tbody>
</table>
</div>
You have shown a screenshot of the developer view that does contain the data. I would guess that there is JavaScript that modifies the HTML after it is loaded and puts in the rows. Your browser is able to run this JavaScript, and hence you see the rows. requests, of course, doesn't run any scripts; it only downloads the HTML.
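To see this concretely, the check below parses the static markup from the grep output above and counts the rows under leaderboardTable. It is only a minimal sketch: count_rows is a hypothetical helper name, and the HTML string is copied from the downloaded page rather than fetched live.

```python
from bs4 import BeautifulSoup

# Static markup as returned by the server (copied from the grep output above).
STATIC_HTML = """
<table class="stats-listing-table">
<thead id="leaderboardTableHeader" class="leaderboard-table-header">
</thead>
<tbody id="leaderboardTable"></tbody>
</table>
"""


def count_rows(html: str, tbody_id: str = "leaderboardTable") -> int:
    """Return the number of <tr> rows inside the tbody with the given id.

    Returns -1 when no such tbody exists (hypothetical helper for illustration).
    """
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody", id=tbody_id)
    if tbody is None:
        return -1
    return len(tbody.find_all("tr"))


# The tbody is present but has no rows, so this prints 0.
print(count_rows(STATIC_HTML))
```

A count of 0 (rather than -1) shows that the selector is right and the element exists; the rows are simply not in the HTML that requests receives.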
You can do "save as" in your browser to get the resulting HTML, or you will have to use a more advanced web module such as Selenium that can run scripts.
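Once you have the rendered HTML, whether from a saved page or from Selenium's page_source, the rows can be pulled out with BeautifulSoup as usual. The sketch below assumes each row is a tr of td cells under the tbody with id leaderboardTable. The sample markup, including the player names and numbers, is made up for illustration and is not the site's real structure or data; extract_rows is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical rendered markup: the real page's cells and classes may differ.
RENDERED_HTML = """
<table class="stats-listing-table">
<tbody id="leaderboardTable">
<tr><td>1</td><td>Player A</td><td>300.1</td></tr>
<tr><td>2</td><td>Player B</td><td>295.4</td></tr>
</tbody>
</table>
"""


def extract_rows(html: str, tbody_id: str = "leaderboardTable") -> list[list[str]]:
    """Return each table row as a list of stripped cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody", id=tbody_id)
    if tbody is None:
        # No matching tbody: nothing to extract.
        return []
    return [
        [cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in tbody.find_all("tr")
    ]


for row in extract_rows(RENDERED_HTML):
    print(row)
```

To use it on a page saved from the browser, read the file's contents into a string and pass that string in place of RENDERED_HTML.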
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | chitown88 |
| Solution 2 | Jessica |


