Scrape site that has data saved as list
I'm trying to scrape the data from the URL below, but it seems to be stored as one massive list. I tried putting the columns I wanted into a list and concatenating them into a DataFrame, but I can't isolate them by the column headers on the page (CF/60, xGF/60, etc.). The end result should be a DataFrame resembling the layout on the webpage. I'm new at coding, and I'm sure this could be written more efficiently. See the code below:
import bs4, requests
import pandas as pd
from fake_useragent import UserAgent
ua = UserAgent()
chrome = ua.chrome
url = 'https://naturalstattrick.com/teamtable.php?fromseason=20212022&thruseason=20212022&stype=2&sit=sva&score=all&rate=y&team=all&loc=H&gpf=410&fd=&td='
r = requests.get(url, headers={'User-Agent': chrome})
soup = bs4.BeautifulSoup(r.text, 'html.parser')
table = soup.find('table', id = 'teams')
ts = pd.DataFrame()
dl = []
body = table.find_all('tbody')
for team in body:
    team_data = team.find_all('tr')
    for data in team_data:
        stats = data.find_all('td', class_='lh')
        print(stats)
    for d in team_data:
        numbers = d.find_all('td')
        for n in numbers:
            print(n.text)
Would someone be willing to help me please?
Solution 1:[1]
I recommend using pandas.read_html(), which parses every HTML table it finds at the URL and returns them as a list of DataFrames. The first DataFrame in that list is the team table:
import pandas as pd
url = 'https://naturalstattrick.com/teamtable.php?fromseason=20212022&thruseason=20212022&stype=2&sit=sva&score=all&rate=y&team=all&loc=H&gpf=410&fd=&td='
dfs = pd.read_html(url, index_col=0)
df = dfs[0]
print(df)
Team GP TOI/GP W L OTL ROW Points Point % ... LDSF% LDGF/60 LDGA/60 LDGF% LDSH% LDSV% SH% SV% PDO
1 Tampa Bay Lightning 29 47.4034 19 6 4 18 42 0.724 ... 49.98 0.38 0.40 48.75 3.57 96.25 8.43 92.55 1.010
2 Vegas Golden Knights 36 49.5505 20 13 3 19 43 0.597 ... 58.13 0.32 0.35 48.29 2.23 96.68 8.12 90.74 0.989
3 Toronto Maple Leafs 33 50.0242 24 7 2 23 50 0.758 ... 49.56 0.27 0.29 48.10 2.33 97.53 7.68 91.27 0.989
4 Washington Capitals 34 48.5299 16 13 5 14 37 0.544 ... 50.36 0.33 0.31 51.94 2.69 97.48 8.23 91.24 0.995
5 Colorado Avalanche 33 47.4298 26 4 3 24 55 0.833 ... 55.51 0.56 0.39 59.05 3.56 96.92 8.73 93.04 1.018
6 Edmonton Oilers 32 48.1698 20 12 0 17 40 0.625 ... 52.86 0.26 0.52 33.51 1.86 95.86 7.79 91.12 0.989
7 Buffalo Sabres 33 50.1535 12 16 5 10 29 0.439 ... 44.02 0.17 0.48 25.62 1.31 97.02 8.33 91.38 0.997
8 Ottawa Senators 34 48.7044 12 19 3 12 27 0.397 ... 49.99 0.34 0.26 56.07 2.51 98.03 6.71 91.95 0.987
9 Detroit Red Wings 34 48.4289 17 12 5 16 39 0.574 ... 46.62 0.35 0.41 46.09 2.98 96.96 9.21 90.61 0.998
10 Florida Panthers 32 48.0438 26 6 0 25 52 0.813 ... 58.29 0.44 0.27 61.81 3.08 97.34 9.41 91.42 1.008
11 New York Rangers 31 49.6167 22 6 3 19 47 0.758 ... 49.22 0.37 0.24 60.43 3.05 98.07 8.65 93.13 1.018
12 Carolina Hurricanes 33 47.0545 24 5 4 24 52 0.788 ... 59.27 0.46 0.31 59.42 2.88 97.14 7.85 92.86 1.007
13 Columbus Blue Jackets 34 48.9554 18 13 3 16 39 0.574 ... 44.54 0.49 0.72 40.38 4.14 95.09 9.69 90.33 1.000
14 Nashville Predators 31 46.7414 20 11 0 18 40 0.645 ... 50.35 0.21 0.39 34.77 1.67 96.82 8.29 93.10 1.014
15 Anaheim Ducks 34 47.6917 16 14 4 14 36 0.529 ... 49.46 0.33 0.46 41.55 2.86 96.06 7.21 91.67 0.989
16 Los Angeles Kings 35 48.0414 18 13 4 15 40 0.571 ... 55.52 0.21 0.11 66.61 1.57 99.02 6.76 91.66 0.984
17 New Jersey Devils 33 49.2258 16 14 3 14 35 0.530 ... 47.12 0.28 0.35 44.90 2.46 97.31 7.56 90.42 0.980
18 Philadelphia Flyers 34 48.7598 13 15 6 13 32 0.471 ... 47.42 0.38 0.41 47.84 2.90 97.15 8.27 91.56 0.998
19 Boston Bruins 32 47.5797 20 10 2 19 42 0.656 ... 58.72 0.31 0.52 37.54 1.92 95.46 6.47 89.47 0.959
20 Montreal Canadiens 33 48.4520 10 19 4 9 24 0.364 ... 47.83 0.29 0.47 38.02 2.31 96.55 6.70 91.05 0.977
21 Pittsburgh Penguins 33 48.8308 19 9 5 17 43 0.652 ... 52.20 0.19 0.31 37.51 1.43 97.40 7.96 92.09 1.000
22 San Jose Sharks 34 48.7069 16 14 4 14 36 0.529 ... 39.56 0.21 0.45 32.25 2.30 96.83 6.95 90.90 0.978
23 Calgary Flames 33 48.5475 21 6 6 20 48 0.727 ... 58.20 0.47 0.27 63.99 3.44 97.30 8.61 91.91 1.005
24 Arizona Coyotes 32 48.8130 9 22 1 9 19 0.297 ... 39.38 0.35 0.36 49.45 3.41 97.73 9.40 90.94 1.003
25 Chicago Blackhawks 32 48.2693 11 16 5 10 27 0.422 ... 44.82 0.32 0.49 39.45 2.64 96.71 7.11 90.80 0.979
26 Minnesota Wild 31 46.6710 23 7 1 21 47 0.758 ... 49.94 0.77 0.55 58.55 5.37 96.21 9.97 91.65 1.016
27 Winnipeg Jets 34 48.9397 19 13 2 18 40 0.588 ... 51.63 0.39 0.29 57.70 3.00 97.65 7.74 92.32 1.001
28 Dallas Stars 31 48.2016 21 9 1 20 43 0.694 ... 47.56 0.11 0.40 21.38 1.04 96.54 8.99 91.48 1.005
29 St Louis Blues 33 48.6293 20 9 4 19 44 0.667 ... 47.65 0.47 0.40 54.18 4.77 96.33 10.18 92.01 1.022
30 Seattle Kraken 32 48.1479 11 18 3 10 25 0.391 ... 47.02 0.55 0.57 49.01 4.74 95.62 8.19 89.43 0.976
31 Vancouver Canucks 32 48.7359 14 13 5 12 33 0.516 ... 52.24 0.35 0.08 81.75 2.65 99.35 6.47 93.90 1.004
32 New York Islanders 34 48.9333 17 13 4 17 38 0.559 ... 46.20 0.38 0.37 50.53 4.06 96.59 7.83 93.32 1.012
[32 rows x 71 columns]
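Once the table is a DataFrame, isolating columns by their headers (the part the question struggled with) is just a matter of indexing with a list of header names. Below is a minimal sketch of that step. It assumes a tiny inline HTML table as a stand-in for the real page, with only a few of the columns and two rows copied from the output above; the `html` string and the `wanted` list are hypothetical names used for illustration, not part of the original answer.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the page's HTML (r.text in the question).
# Only a handful of columns and two rows, copied from the output above.
html = """
<table id="teams">
  <thead>
    <tr><th></th><th>Team</th><th>GP</th><th>W</th><th>L</th><th>PDO</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>Tampa Bay Lightning</td><td>29</td><td>19</td><td>6</td><td>1.010</td></tr>
    <tr><td>2</td><td>Vegas Golden Knights</td><td>36</td><td>20</td><td>13</td><td>0.989</td></tr>
  </tbody>
</table>
"""

# attrs restricts the match to the table with id="teams" (the id used in the question);
# index_col=0 turns the leading rank column into the index.
df = pd.read_html(StringIO(html), attrs={"id": "teams"}, index_col=0)[0]

# Select columns by header name; on the real table this list could
# hold headers such as "CF/60" or "xGF/60" instead.
wanted = ["Team", "GP", "W", "PDO"]
subset = df[wanted]
print(subset)
```

If passing the URL directly is rejected by the server, the same call should work on the text fetched with the question's own requests code, for example pd.read_html(StringIO(r.text), attrs={"id": "teams"}, index_col=0).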
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ynjxsjmh |
