Outputting a tag from BeautifulSoup gives HTML and None
I am having trouble getting a tag from BeautifulSoup's .find() method.
Here is my code:
from collections import defaultdict

import requests
from bs4 import BeautifulSoup

# `evaluations` and `HEADERS` are defined elsewhere in my code.
url = evaluations['href']
page = requests.get(url, headers=HEADERS)
soup = BeautifulSoup(page.content, 'lxml')
evaluators = soup.find("section", class_="main-content list-content")
evaluators_list = evaluators.find("ul", class_='evaluation-list').find_all("li")
evaluators_dict = defaultdict(dict)
for evaluator in evaluators_list:
    eval_list = evaluator.find('ul', class_='highlights-list')
    print(eval_list.prettify())
This then gives the output:
<ul class="highlights-list">
<li class="eval-meta evaluator">
<b class="uppercase heading">
Evaluated By
</b>
<img alt="Andrew Ivins" height="50" src="https://s3media.247sports.com/Uploads/Assets/680/358/9358680.jpeg?fit=bounds&crop=50:50,offset-y0.50&width=50&height=50&fit=crop" title="Andrew Ivins" width="50"/>
<div class="evaluator">
<b class="text">
Andrew Ivins
</b>
<span class="uppercase">
Southeast Recruiting Analyst
</span>
</div>
</li>
<li class="eval-meta projection">
<b class="uppercase heading">
Projection
</b>
<b class="text">
First Round
</b>
</li>
<li class="eval-meta">
<b class="uppercase heading">
Comparison
</b>
<a href="https://247sports.com/Player/Charles-Woodson-76747/" target="_blank">
Charles Woodson
</a>
<span class="uppercase">
Oakland Raiders
</span>
</li>
</ul>
and the error
Traceback (most recent call last):
File "XXX", line 2, in <module>
player = Player("Travis-Hunter-46084728").player
File "XXX", line 218, in __init__
self.player = self._parse_player()
File "XXX", line 253, in _parse_player
evaluators, background, skills = self._find_scouting_report(soup)
File "XXX", line 468, in _find_scouting_report
print(eval_list.prettify())
AttributeError: 'NoneType' object has no attribute 'prettify'
As you can see, it does find the tag and prints it prettified, but then a later lookup returns None and the call to prettify() fails. What would be a way around this? Thank you in advance. The link I am using is: https://247sports.com/PlayerInstitution/Travis-Hunter-at-Collins-Hill-236028/PlayerInstitutionEvaluations/
EDIT: I tried Selenium, thinking it might be a JavaScript problem, but that did not resolve it either.
Solution 1:[1]
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0'
}


def get_soup(content):
    return BeautifulSoup(content, 'lxml')


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        r = req.get(url)
        soup = get_soup(r.content)
        goal = [list(x.stripped_strings) for x in soup.select(
            '.main-content.list-content > .evaluation-list > li > .highlights-list')]
        for i in goal:
            print(i[1:3] + i[-2:])


if __name__ == "__main__":
    main('https://247sports.com/PlayerInstitution/Travis-Hunter-at-Collins-Hill-236028/PlayerInstitutionEvaluations/')
Output:
['Andrew Ivins', 'Southeast Recruiting Analyst', 'Charles Woodson', 'Oakland Raiders']
['Andrew Ivins', 'Southeast Recruiting Analyst', 'Xavier Rhodes', 'Minnesota Vikings']
['Charles Power', 'National writer', 'Marcus Peters', 'Baltimore Ravens']
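A likely reason the question's loop prints one list and then raises: find_all("li") searches recursively by default, so it also returns the nested li elements inside each highlights-list, and those contain no ul of their own, which makes .find() return None. The selector above avoids this by using child combinators (>). The sketch below shows the same idea with find_all(..., recursive=False) plus a None guard. The HTML string is a hypothetical, simplified stand-in for the real page, and html.parser is used only to keep the example free of the lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that mirrors the nesting shown in the question.
HTML = """
<section class="main-content list-content">
  <ul class="evaluation-list">
    <li>
      <ul class="highlights-list">
        <li class="eval-meta evaluator"><b class="text">Andrew Ivins</b></li>
        <li class="eval-meta projection"><b class="text">First Round</b></li>
      </ul>
    </li>
    <li>
      <ul class="highlights-list">
        <li class="eval-meta evaluator"><b class="text">Charles Power</b></li>
      </ul>
    </li>
  </ul>
</section>
"""

soup = BeautifulSoup(HTML, "html.parser")
evaluation_list = soup.find("ul", class_="evaluation-list")

# Default recursive search: the nested <li class="eval-meta ..."> items are included.
all_items = evaluation_list.find_all("li")
# These are the items for which .find() returns None, as in the traceback.
missing = [li for li in all_items if li.find("ul", class_="highlights-list") is None]

# Direct children only: each top-level <li> wraps one highlights list.
top_level_items = evaluation_list.find_all("li", recursive=False)
highlights = []
for item in top_level_items:
    eval_list = item.find("ul", class_="highlights-list")
    if eval_list is None:  # defensive guard in case an item has no highlights list
        continue
    highlights.append(list(eval_list.stripped_strings))

print(len(all_items), len(missing), len(top_level_items))
print(highlights)
```

If the real page contains top-level items without a highlights-list, the guard skips them instead of raising AttributeError.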
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | αԋɱҽԃ αмєяιcαη |
