Python - Web Scraping Entire Page
Looking for a general way to scrape an entire web page, with this one as an example:
https://www.boxscoregeeks.com/players?sort=wins_produced&direction=desc&season=2021
Tried the following:
import requests
import pandas as pd
import bs4
html = requests.get("https://www.boxscoregeeks.com/players?sort=wins_produced&direction=desc&season=2021", headers={"User-Agent": "XY"}).content
df_list = pd.read_html(html)
soup = bs4.BeautifulSoup(html)
In both cases I get a lot of information, but none of it comes from the big table in the middle of the page.
How do I, in general, scrape an entire web page as it appears to a human user like me?
Solution 1:[1]
There is no one-size-fits-all solution; you always have to inspect the website and its behavior.
The content here is loaded dynamically via JavaScript, so you won't get it that simply with requests and BeautifulSoup. Instead, take a look at their API (such endpoints can usually be found in your browser's dev tools, under the network tab):
import pandas as pd
import requests
jsonData = requests.get('https://www.boxscoregeeks.com/api/player_seasons').json()
pd.DataFrame(jsonData)
# or sort it by wins produced
pd.DataFrame(jsonData).sort_values(by='wins_produced', ascending=False)
Output
| id | name | games | minutes | per48_position_adj_prod | wins_produced | per48_wins_produced | per48_points | per48_rebounds | per48_assists | per48_points_over_par | exact_position | team_abbreviations | firstname | lastname | is_rookie | updated_at | position | secondary_position | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 191209 | Nikola Jokic | 61 | 2020.7 | 0.688922 | 15.5068 | 0.36835 | 37.7691 | 19.9297 | 11.687 | 8.37679 | 5 | den | Nikola | Jokic | False | March 14, 2022 15:42 UTC | C | C | /players/1500-nikola-jokic |
| 191158 | Chris Paul | 58 | 1916.08 | 0.46432 | 13.3606 | 0.334697 | 21.6943 | 6.51329 | 15.5066 | 7.33018 | 1 | pho | Chris | Paul | False | March 14, 2022 15:41 UTC | PG | PG | /players/211-chris-paul |
| 190781 | Giannis Antetokounmpo | 56 | 1836.3 | 0.57629 | 12.9864 | 0.339459 | 43.5484 | 16.7554 | 8.65218 | 7.47829 | 4.35506 | mil | Giannis | Antetokounmpo | False | March 14, 2022 15:41 UTC | PF | C | /players/1344-giannis-antetokounmpo |
| 191216 | Robert Williams | 54 | 1611.42 | 0.65683 | 12.9212 | 0.384891 | 16.0257 | 15.7278 | 3.30641 | 8.8912 | 4.62594 | bos | Robert | Williams | False | March 14, 2022 15:43 UTC | C | PF | /players/3372-robert-williams |
| 191258 | Rudy Gobert | 52 | 1662.03 | 0.693075 | 12.8982 | 0.372504 | 23.162 | 22.1223 | 1.70394 | 8.50596 | 5 | uth | Rudy | Gobert | False | March 14, 2022 15:41 UTC | C | C | /players/1378-rudy-gobert |
| 191049 | Tyrese Haliburton | 62 | 2174.88 | 0.364672 | 11.3213 | 0.249862 | 20.6577 | 5.53961 | 10.704 | 4.69182 | 1.74806 | sac,ind | Tyrese | Haliburton | False | March 14, 2022 15:40 UTC | SG | PG | /players/4157-tyrese-haliburton |
| 191100 | Dejounte Murray | 58 | 2016.35 | 0.396915 | 11.2593 | 0.268031 | 28.4712 | 11.7599 | 12.9501 | 5.25687 | 1.03584 | sas | Dejounte | Murray | False | March 14, 2022 15:40 UTC | PG | SG | /players/3188-dejounte-murray |
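Once the JSON is in a DataFrame, the usual pandas operations apply for picking and ordering columns. A minimal sketch using a couple of hand-made rows in the same shape as the API output above (in practice, `jsonData` would come from the `requests.get(...).json()` call):

```python
import pandas as pd

# hand-made rows mirroring the API's JSON shape (values copied from the table above)
jsonData = [
    {"name": "Chris Paul", "games": 58, "wins_produced": 13.3606},
    {"name": "Nikola Jokic", "games": 61, "wins_produced": 15.5068},
]

# sort descending by wins_produced and keep only the columns of interest
df = pd.DataFrame(jsonData).sort_values(by="wins_produced", ascending=False)
print(df[["name", "wins_produced"]].to_string(index=False))
```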
An alternative is to use Selenium to render the website first and then scrape the rendered page_source:
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = 'https://www.boxscoregeeks.com/players?sort=wins_produced&direction=desc&season=2021'

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(10)  # give the JavaScript time to render the table

df = pd.read_html(driver.page_source)[0]
driver.quit()
df
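One caveat with the Selenium route: newer pandas versions (2.1+) deprecate passing a literal HTML string to `pd.read_html`, so it is safer to wrap `driver.page_source` in a `StringIO`. A minimal sketch with a toy HTML table standing in for the rendered page source (pandas needs lxml or bs4+html5lib installed as its HTML parser):

```python
from io import StringIO

import pandas as pd

# toy HTML standing in for driver.page_source
page_source = """
<table>
  <tr><th>name</th><th>wins_produced</th></tr>
  <tr><td>Nikola Jokic</td><td>15.5068</td></tr>
</table>
"""

# wrap the string in StringIO so newer pandas versions accept it without a FutureWarning
df = pd.read_html(StringIO(page_source))[0]
print(df)
```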
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
