Title of webpage printing as None, BeautifulSoup [closed]
I am trying to scrape data from this website and am not able to get the title of the webpage.
My code:
import requests
from bs4 import BeautifulSoup
base_url = "https://www.stfrancismedicalcenter.com/find-a-provider/"
content = requests.get(url = base_url).content
soup = BeautifulSoup(content, "html.parser")
profile_link = soup.find("a", {"class": "flex-top-between-block-500"}).get("href")
profile_url = base_url + profile_link[1:]
profile_content = requests.get(url = profile_url).content
profile_soup = BeautifulSoup(profile_content, "html.parser")
print(profile_soup.title.string)
This is the output I am getting:
[Running] python -u "d:\Personal\CS\Web Scrapping\first.py"
None
[Done] exited with code=0 in 3.592 seconds
I'd like some suggestions on this!
Solution 1:[1]
The issue is that the concatenated profile URL is incorrect. base_url already ends in find-a-provider/ and the scraped href repeats that segment, so the result is:
https://www.stfrancismedicalcenter.com/find-a-provider/find-a-provider/adegbenga-a-adetola-md/
Instead of building on the listing-page URL, define a separate baseUrl that contains only the site root:
profile_url = 'https://www.stfrancismedicalcenter.com' + profile_link
or
baseUrl = 'https://www.stfrancismedicalcenter.com'
profile_url = baseUrl + profile_link
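The difference between the two concatenations can be checked without any network access. This is a minimal sketch, assuming the scraped href is the root-relative path `/find-a-provider/adegbenga-a-adetola-md/` (inferred from the URLs in this answer, not confirmed against the live page); `broken_url` and `fixed_url` are illustrative names.

```python
# Assumed href value; in the real script it comes from soup.find(...).get("href").
profile_link = "/find-a-provider/adegbenga-a-adetola-md/"

# Question's approach: the listing-page URL already ends in find-a-provider/,
# so appending the href (minus its leading slash) repeats that segment.
base_url = "https://www.stfrancismedicalcenter.com/find-a-provider/"
broken_url = base_url + profile_link[1:]

# Answer's approach: prepend only the site root.
baseUrl = "https://www.stfrancismedicalcenter.com"
fixed_url = baseUrl + profile_link

print(broken_url)
print(fixed_url)
print(broken_url.count("find-a-provider"), fixed_url.count("find-a-provider"))
```

Under that assumption, the first URL contains the `find-a-provider` segment twice and the second contains it once.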
Example
import requests
from bs4 import BeautifulSoup
# Listing page to scrape, and the site root used to build profile URLs
url = "https://www.stfrancismedicalcenter.com/find-a-provider"
baseUrl = 'https://www.stfrancismedicalcenter.com'
content = requests.get(url).content
soup = BeautifulSoup(content, "html.parser")
# Prepend only the site root so that find-a-provider is not duplicated
profile_link = soup.find("a", {"class": "flex-top-between-block-500"}).get("href")
profile_url = baseUrl + profile_link
profile_content = requests.get(url = profile_url).content
profile_soup = BeautifulSoup(profile_content, "html.parser")
# print() is needed when this runs as a script rather than in a notebook
print(profile_soup.title.text)
Output
Adegbenga A. Adetola MD
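As an alternative to hard-coding the site root (not part of the original answer), the standard library's `urllib.parse.urljoin` can resolve an href against the URL of the page it was found on. The sketch below is offline and uses assumed href values; only the root-relative form is suggested by the URLs in this answer, and the other two are hypothetical cases included for comparison.

```python
from urllib.parse import urljoin

# URL of the page the link was scraped from.
page_url = "https://www.stfrancismedicalcenter.com/find-a-provider/"

# Assumed href forms (hypothetical except the first, which is inferred
# from the URLs shown in this answer).
root_relative = "/find-a-provider/adegbenga-a-adetola-md/"
page_relative = "adegbenga-a-adetola-md/"
absolute = "https://www.stfrancismedicalcenter.com/find-a-provider/adegbenga-a-adetola-md/"

# urljoin resolves each form the way a browser would resolve a link.
resolved = [urljoin(page_url, href) for href in (root_relative, page_relative, absolute)]

for profile_url in resolved:
    print(profile_url)
```

All three forms resolve to the same profile URL, so the script would not need to know in advance whether the site emits relative or absolute links.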
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | HedgeHog |
