Webscraping England Hockey Python BeautifulSoup
I am trying to use BeautifulSoup to get the table found at this link: https://gms.englandhockey.co.uk/fixtures-and-results/competitions.php?comp=4154007
It's an England Hockey website. I want to download the table and put it in a DataFrame, and eventually get the fixtures as well.
Whenever I try to find the right div or table, the search returns None.
Here's what I have tried:
import requests
from bs4 import BeautifulSoup
url = "https://gms.englandhockey.co.uk/fixtures-and-results/club.php?id=Royal%20Holloway%20HC&prev=4153800"
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
I have tried to find the div the table is within, but it returns None.
bread_crumbs = soup.find("div", class_="container")
print(bread_crumbs)
Again, I try to find the table but it returns None.
bread_crumbs = soup.find("table")
print(bread_crumbs)
If anyone can suggest a way to access the table, I would be grateful! It might be that Selenium would be better for this, but I haven't used Selenium yet, so I am not sure how to start.
As you can see from the link, it's a php website, so could this be part of the reason?
Solution 1:[1]
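To access this site you must first agree to the use of cookies and accept its terms and conditions. Without that consent the table is missing from the HTML you get back, which is why the search returns None.
Replace your request with the code below, which sends the cookies in the request headers, and try again: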
import requests
from bs4 import BeautifulSoup
url = "https://gms.englandhockey.co.uk/fixtures-and-results/club.php?id=Royal%20Holloway%20HC&prev=4153800"
headers = {
'Cookie': 'visitor-id=vPF0YU5Q; visitor-id-2=bQJHxVCjcBs4Qlmoy72Wzw%3D%3D; ImportantCookie=0; consentCookie=1; ImportantCookie=0; visitor-id=vPF0YUxZ; visitor-id-2=ZrPCssshUkv7rwB6MVkM2A%3D%3D'
}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, "html.parser")
bread_crumbs = soup.find("table")
print(bread_crumbs)
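Once soup.find("table") returns a tag instead of None, the rows can be pulled out as plain lists. The following is only a minimal sketch, assuming a simple table with a single header row: SAMPLE_HTML is made-up markup standing in for page.text (the real page may be structured differently), and table_to_rows is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for page.text; the real page's markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Home Team</th><th>Away Team</th></tr>
  <tr><td>12-Mar</td><td>Royal Holloway 1 (F)</td><td>Addiscombe 1 (F)</td></tr>
  <tr><td>19-Mar</td><td>Oxted 2 (F)</td><td>Royal Holloway 1 (F)</td></tr>
</table>
"""


def table_to_rows(table):
    """Return (header, rows) as lists of cell text for one <table> tag."""
    header, rows = [], []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if tr.find("th") is not None and not header:
            # The first row containing <th> cells is treated as the header.
            header = cells
        elif cells:
            rows.append(cells)
    return header, rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table")
if table is None:
    # find() returns None when no matching tag exists in the parsed HTML.
    print("No <table> found in the returned HTML.")
else:
    header, rows = table_to_rows(table)
    print(header)
    for row in rows:
        print(row)
```

If pandas is installed, pd.DataFrame(rows, columns=header) should turn the result into a DataFrame, provided every row has as many cells as the header.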
Solution 2:[2]
You could let Selenium render the page and then pull the table, or you can send the cookie in the request headers.
Both of the solutions provided work. The difference in mine is that it uses a Session to acquire the cookies, as opposed to hard-coding them.
Code:
import pandas as pd
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
# Create the Session and access the site to acquire the cookies
s = requests.Session()
response = s.get('https://gms.englandhockey.co.uk/fixtures-and-results/', headers=headers)
# Create the cookie string to be used in the headers
cookies = response.cookies.get_dict()
cookieStr = 'consentCookie=1;'
for k, v in cookies.items():
    cookieStr += f'{k}={v};'
headers.update({'Cookie':cookieStr})
# Go get the tables
url = 'https://gms.englandhockey.co.uk/fixtures-and-results/club.php?id=Royal%20Holloway%20HC&prev=4153800'
response = s.get(url, headers=headers)
dfs = pd.read_html(response.text)
for df in dfs:
    print(df)
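As a variation on the code above, the consent cookie can be placed in the Session's cookie jar instead of being concatenated into a Cookie header by hand; requests then sends it together with any cookies the site sets. This is a sketch, not a tested replacement: it assumes consentCookie=1 (taken from the cookie string above) is what records consent, and make_session and fetch_html are hypothetical helper names. Only the offline part runs at the bottom; fetch_html is the part that would go over the network.

```python
import requests

BASE_URL = "https://gms.englandhockey.co.uk/fixtures-and-results/"
CLUB_URL = BASE_URL + "club.php?id=Royal%20Holloway%20HC&prev=4153800"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
)


def make_session():
    """Create a Session that already carries the consent cookie."""
    session = requests.Session()
    session.headers.update({"User-Agent": USER_AGENT})
    # Assumption: consentCookie=1 is the cookie that records consent.
    session.cookies.set("consentCookie", "1")
    return session


def fetch_html(session, url=CLUB_URL):
    """Visit the landing page to collect site cookies, then fetch url."""
    session.get(BASE_URL, timeout=30)
    response = session.get(url, timeout=30)
    response.raise_for_status()
    return response.text


# Offline check: build the request without sending it and inspect the header.
session = make_session()
prepared = session.prepare_request(requests.Request("GET", CLUB_URL))
print(prepared.headers["Cookie"])
```

Calling pd.read_html(fetch_html(make_session())) would then play the same role as the last lines of the code above.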
Output:
for df in dfs:
    print(df.to_string())
Date Time Competition Home Team Away Team Venue Officials
0 12-Mar 12:30 South East Women's Division 2 Oaks Royal Holloway 1 (F) Addiscombe 1 (F) NaN NaN
1 NaN 12:30 South East Men's Division 6 Oaks Fleet And Ewshot 4 (M) Royal Holloway 1 (M) Army Hockey Centre - Pitch 2 NaN
2 19-Mar 10:30 South East Women's Division 2 Oaks Oxted 2 (F) Royal Holloway 1 (F) Oxted School NaN
3 NaN 12:30 South East Men's Division 6 Oaks Royal Holloway 1 (M) Woking 6 (M) NaN NaN
4 26-Mar 12:30 South East Women's Division 2 Oaks Royal Holloway 1 (F) Epsom 4 (F) NaN NaN
5 02-Apr NaN South East Women's Division 2 Oaks Guildford 4 (F) Royal Holloway 1 (F) NaN NaN
6 NaN 12:30 South East Men's Division 6 Oaks Royal Holloway 1 (M) Cranleigh 2 (M) NaN NaN
Date Time Competition Home Team Score Away Team Venue
0 05-Mar 14:15 South East Women's Division 2 Oaks Aldershot And Farnham 3 (F) 1 : 0 Royal Holloway 1 (F) Heath End
1 NaN 12:30 South East Men's Division 6 Oaks Royal Holloway 1 (M) 2 : 2 Epsom 7 (M) NaN
2 26-Feb 12:30 South East Women's Division 2 Oaks Royal Holloway 1 (F) 1 : 7 Woking 4 (F) NaN
3 NaN 10:00 South East Men's Division 6 Oaks Kenley 3 (M) 0 : 7 Royal Holloway 1 (M) Warlingham County Secondary School
4 12-Feb 12:30 South East Women's Division 2 Oaks Reigate Priory 3 (F) NaN Royal Holloway 1 (F) St Bedes School
5 NaN 12:30 South East Men's Division 6 Oaks Royal Holloway 1 (M) 8 : 1 Reigate Priory 6 (M) NaN
6 05-Feb 12:30 South East Women's Division 2 Oaks Royal Holloway 1 (F) NaN Horley 1 (F) NaN
7 NaN 10:30 South East Men's Division 6 Oaks Horley 3 (M) 0 : 3 Royal Holloway 1 (M) Hurstpierpoint College - Pitch 2
8 29-Jan NaN South East Women's Division 2 Oaks Kenley 1 (F) (HWO) 5 : 0 Royal Holloway 1 (F) NaN
9 NaN 12:30 South East Men's Division 6 Oaks Royal Holloway 1 (M) (HWO) 5 : 0 Old Reigatian 3 (M) NaN
10 22-Jan 10:00 South East Women's Division 2 Oaks Royal Holloway 1 (F) 0 : 4 Cranleigh 1 (F) Cranleigh School - Pitch 1
11 NaN 15:30 South East Men's Division 6 Oaks Croydon & Old Whitgiftian Rustlers (M) 1 : 4 Royal Holloway 1 (M) Monks Hill Sports Centre
12 15-Jan 11:00 South East Women's Division 2 Oaks Ashford (Middlesex) 1 (F) (HWO) 5 : 0 Royal Holloway 1 (F) Ashford (Middx) Hockey Club
13 NaN 12:30 South East Men's Division 6 Oaks Royal Holloway 1 (M) 2 : 0 Sanderstead Cats (M) NaN
14 11-Dec NaN South East Women's Division 2 Oaks Royal Holloway 1 (F) 0 : 7 Guildford 4 (F) NaN
15 NaN 10:00 South East Men's Division 6 Oaks Cranleigh 2 (M) 1 : 1 Royal Holloway 1 (M) Cranleigh School - Pitch 1
16 04-Dec 09:45 South East Women's Division 2 Oaks Epsom 4 (F) 3 : 1 Royal Holloway 1 (F) Epsom HC - Old Schools Lane
17 27-Nov 12:30 South East Women's Division 2 Oaks Royal Holloway 1 (F) 1 : 3 Oxted 2 (F) NaN
18 NaN 11:00 South East Men's Division 6 Oaks Woking 6 (M) 2 : 4 Royal Holloway 1 (M) Woking HC - Pitch 2
19 20-Nov 15:30 South East Women's Division 2 Oaks Addiscombe 1 (F) 1 : 0 Royal Holloway 1 (F) Woldingham School
20 NaN 12:30 South East Men's Division 6 Oaks Royal Holloway 1 (M) 0 : 1 Fleet And Ewshot 4 (M) NaN
21 13-Nov 12:30 South East Women's Division 2 Oaks Royal Holloway 1 (F) 1 : 2 Aldershot And Farnham 3 (F) NaN
22 NaN 15:30 South East Men's Division 6 Oaks Epsom 7 (M) 0 : 2 Royal Holloway 1 (M) Therfield School
23 06-Nov 12:30 South East Women's Division 2 Oaks Woking 4 (F) (HWO) 5 : 0 Royal Holloway 1 (F) Woking HC - Pitch 2
24 NaN 12:30 South East Men's Division 6 Oaks Royal Holloway 1 (M) (HWO) 5 : 0 Kenley 3 (M) NaN
25 30-Oct 12:30 South East Women's Division 2 Oaks Royal Holloway 1 (F) 1 : 7 Reigate Priory 3 (F) NaN
26 NaN 14:00 South East Men's Division 6 Oaks Reigate Priory 6 (M) 0 : 6 Royal Holloway 1 (M) St Bedes School
27 16-Oct 12:00 South East Women's Division 2 Oaks Horley 1 (F) (HWO) 5 : 0 Royal Holloway 1 (F) Worth School
28 NaN 12:30 South East Men's Division 6 Oaks Royal Holloway 1 (M) 5 : 0 Horley 3 (M) NaN
29 09-Oct 15:00 South East Women's Division 2 Oaks Royal Holloway 1 (F) 3 : 4 Kenley 1 (F) NaN
30 NaN 13:00 South East Men's Division 6 Oaks Old Reigatian 3 (M) 0 : 4 Royal Holloway 1 (M) Royal Alexandra & Albert School
31 02-Oct 14:30 South East Women's Division 2 Oaks Cranleigh 1 (F) 5 : 0 Royal Holloway 1 (F) NaN
32 NaN 12:30 South East Men's Division 6 Oaks Royal Holloway 1 (M) 2 : 3 Croydon & Old Whitgiftian Rustlers (M) NaN
33 25-Sep NaN South East Women's Division 2 Oaks Royal Holloway 1 (F) 1 : 17 Ashford (Middlesex) 1 (F) NaN
34 NaN 14:30 South East Men's Division 6 Oaks Sanderstead Cats (M) 1 : 1 Royal Holloway 1 (M) Caterham School
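In the output above, the Date column is NaN on the second match listed for a given day. If that NaN simply means "same date as the row above" (an assumption about how the site lays out the table, not something confirmed here), the dates can be carried down with a forward fill. The small DataFrame below is a made-up sample copied from the first four fixture rows so the sketch runs without a network request.

```python
import pandas as pd

# Made-up sample mirroring the first rows of the fixtures output above.
df = pd.DataFrame(
    {
        "Date": ["12-Mar", None, "19-Mar", None],
        "Time": ["12:30", "12:30", "10:30", "12:30"],
        "Home Team": [
            "Royal Holloway 1 (F)",
            "Fleet And Ewshot 4 (M)",
            "Oxted 2 (F)",
            "Royal Holloway 1 (M)",
        ],
        "Away Team": [
            "Addiscombe 1 (F)",
            "Royal Holloway 1 (M)",
            "Royal Holloway 1 (F)",
            "Woking 6 (M)",
        ],
    }
)

# Carry the last seen date down into the blank cells.
# Assumption: a missing Date means the match is on the same day as the row above.
df["Date"] = df["Date"].ffill()

print(df.to_string())
```

Only Date is filled here; a missing Time or Venue in the output looks like genuinely absent data, so those columns are left alone.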
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | chitown88 |
