Outputting tag from BeautifulSoup gives HTML and None

I am having trouble getting the tag from a BeautifulSoup .find() call.

Here is my code:

from collections import defaultdict

import requests
from bs4 import BeautifulSoup

# `evaluations` and `HEADERS` are defined elsewhere in my code.
url = evaluations['href']
page = requests.get(url, headers=HEADERS)
soup = BeautifulSoup(page.content, 'lxml')
evaluators = soup.find("section", class_="main-content list-content")
evaluators_list = evaluators.find("ul", class_='evaluation-list').find_all("li")
evaluators_dict = defaultdict(dict)
for evaluator in evaluators_list:
    eval_list = evaluator.find('ul', class_='highlights-list')
    print(eval_list.prettify())

This then gives the output:

<ul class="highlights-list">
 <li class="eval-meta evaluator">
  <b class="uppercase heading">
   Evaluated By
  </b>
  <img alt="Andrew Ivins" height="50" src="https://s3media.247sports.com/Uploads/Assets/680/358/9358680.jpeg?fit=bounds&amp;crop=50:50,offset-y0.50&amp;width=50&amp;height=50&amp;fit=crop" title="Andrew Ivins" width="50"/>
  <div class="evaluator">
   <b class="text">
    Andrew Ivins
   </b>
   <span class="uppercase">
    Southeast Recruiting Analyst
   </span>
  </div>
 </li>
 <li class="eval-meta projection">
  <b class="uppercase heading">
   Projection
  </b>
  <b class="text">
   First Round
  </b>
 </li>
 <li class="eval-meta">
  <b class="uppercase heading">
   Comparison
  </b>
  <a href="https://247sports.com/Player/Charles-Woodson-76747/" target="_blank">
   Charles Woodson
  </a>
  <span class="uppercase">
   Oakland Raiders
  </span>
 </li>
</ul>

and the error

Traceback (most recent call last):
  File "XXX", line 2, in <module>
    player = Player("Travis-Hunter-46084728").player
  File "XXX", line 218, in __init__
    self.player = self._parse_player()
  File "XXX", line 253, in _parse_player
    evaluators, background, skills = self._find_scouting_report(soup)
  File "XXX", line 468, in _find_scouting_report
    print(eval_list.prettify())
AttributeError: 'NoneType' object has no attribute 'prettify'

As you can see, it does find the tag and prints it prettified, but a later iteration of the loop gets None instead of a tag. What would be a way around this? Thank you in advance. The link I am using is: https://247sports.com/PlayerInstitution/Travis-Hunter-at-Collins-Hill-236028/PlayerInstitutionEvaluations/

EDIT: I also tried Selenium, thinking it might be a JavaScript problem, but that did not resolve it either.



Solution 1:[1]

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0'
}


def get_soup(content):
    return BeautifulSoup(content, 'lxml')


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        r = req.get(url)
        soup = get_soup(r.content)
        goal = [list(x.stripped_strings) for x in soup.select(
            '.main-content.list-content > .evaluation-list > li > .highlights-list')]
        for i in goal:
            print(i[1:3] + i[-2:])


if __name__ == "__main__":
    main('https://247sports.com/PlayerInstitution/Travis-Hunter-at-Collins-Hill-236028/PlayerInstitutionEvaluations/')

Output:

['Andrew Ivins', 'Southeast Recruiting Analyst', 'Charles Woodson', 'Oakland Raiders']
['Andrew Ivins', 'Southeast Recruiting Analyst', 'Xavier Rhodes', 'Minnesota Vikings']
['Charles Power', 'National writer', 'Marcus Peters', 'Baltimore Ravens']
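
Why the original loop hits None: judging from the markup printed in the question, the likely cause is that find_all("li") searches recursively by default. It therefore returns not only the top-level li items of evaluation-list but also the li items nested inside each highlights-list, and those nested items contain no ul.highlights-list of their own, so .find() returns None for them. The sketch below reproduces this on a trimmed-down, made-up copy of that structure (the HTML string and variable names are assumptions for illustration, and html.parser is used instead of lxml only to keep the example dependency-light). Passing recursive=False limits the search to direct children, which is the same effect the ">" child combinators in the selector above achieve.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup mirroring the nesting shown in the question.
HTML = """
<section class="main-content list-content">
  <ul class="evaluation-list">
    <li>
      <ul class="highlights-list">
        <li class="eval-meta evaluator"><b class="text">Andrew Ivins</b></li>
        <li class="eval-meta projection"><b class="text">First Round</b></li>
      </ul>
    </li>
    <li>
      <ul class="highlights-list">
        <li class="eval-meta evaluator"><b class="text">Charles Power</b></li>
      </ul>
    </li>
  </ul>
</section>
"""

soup = BeautifulSoup(HTML, "html.parser")
outer = soup.find("ul", class_="evaluation-list")

# Default (recursive) search: also returns the <li> tags nested in highlights-list.
all_items = outer.find_all("li")

# The nested items have no ul.highlights-list inside them, so .find() gives None.
missing = [li for li in all_items
           if li.find("ul", class_="highlights-list") is None]

# recursive=False restricts the search to direct children of evaluation-list.
top_items = outer.find_all("li", recursive=False)

names = []
for li in top_items:
    highlights = li.find("ul", class_="highlights-list")
    if highlights is None:  # defensive guard in case an item lacks the list
        continue
    names.append(highlights.find("b", class_="text").get_text(strip=True))

print(len(all_items), len(missing), len(top_items))
print(names)
```

In the original code, changing .find_all("li") to .find_all("li", recursive=False) should be enough; the None check is an extra safeguard in case some item on the real page has no highlights-list.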

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 αԋɱҽԃ αмєяιcαη