Odd type error warning when using bs4 to obtain a value from a website

The following is a snippet from a website from which I am trying to obtain only the "Text to Capture". That text sits among a couple of div elements, which contain tables, text, etc.

<div class="rankbox">
    <div>Ranking 
        <div class="tooltip-wrapper"> ... </div>
        <div class="tooltiptext hide"> ... </div>
        **Text to Capture**
        <span class="sr-only"> of 5</span>
        <span class="rank_chip rankrect_1">&nbsp;</span>
        <span class="rank_chip rankrect_2">&nbsp;</span>
        <span class="rank_chip rankrect_3">3</span> 
        <span class="rank_chip rankrect_4">&nbsp;</span>
        <span class="rank_chip rankrect_5">&nbsp;</span>
    </div>
</div>

The oddity here is that the text to capture is not wrapped in any tag of its own. I have gotten this to work:

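rankbox = soup.find('div', attrs={'class': 'rankbox'})
# contents[0] is whitespace; contents[1] is the inner div that starts with "Ranking"
lx = [x for x in list(rankbox.contents[1])]
# index 4 is the bare text between the second tooltip div and the first span
returnvalue = str(lx[4]).strip()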
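To see why lx[4] lands on the text, it can help to print the children of the inner div. This is a minimal sketch assuming the html.parser backend and a trimmed copy of the snippet above (tooltip bodies left as "...", most rank chips dropped). bs4 stores the untagged text as a NavigableString sitting between the Tag children, and the whitespace between tags also counts as children, which is why the index is 4 and not 2.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's snippet (assumption: most rank chips omitted)
html = """
<div class="rankbox">
    <div>Ranking
        <div class="tooltip-wrapper"> ... </div>
        <div class="tooltiptext hide"> ... </div>
        **Text to Capture**
        <span class="sr-only"> of 5</span>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rankbox = soup.find("div", attrs={"class": "rankbox"})
inner = rankbox.find("div")  # the nested div that starts with "Ranking"

# Untagged text and inter-tag whitespace both show up as NavigableString children
for i, child in enumerate(inner.contents):
    print(i, type(child).__name__, repr(str(child)))
```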

However, PyCharm shows a type warning: Expected type 'Iterable[_T]' (matched generic type 'Iterable[_T]'), got 'PageElement' instead. This is because rankbox.contents[1] is a PageElement, not a list.

I am wondering whether there is a more elegant way of achieving this that also avoids the warning.
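The warning is a static-typing complaint, not a runtime error: as the message shows, PyCharm treats rankbox.contents[1] as a plain PageElement, which it does not consider iterable, even though the object is a Tag at runtime. One common remedy, sketched here as a suggestion rather than a verified fix, is to narrow the type with isinstance before using it, so the checker knows it is holding a Tag (or a NavigableString). I expect this to satisfy PyCharm, but I have not checked it against every bs4 release's type hints. The function name capture_text is hypothetical.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Trimmed copy of the question's snippet (assumption: most rank chips omitted)
html = """
<div class="rankbox">
    <div>Ranking
        <div class="tooltip-wrapper"> ... </div>
        <div class="tooltiptext hide"> ... </div>
        **Text to Capture**
        <span class="sr-only"> of 5</span>
    </div>
</div>
"""


def capture_text(markup: str) -> str:
    soup = BeautifulSoup(markup, "html.parser")
    rankbox = soup.find("div", attrs={"class": "rankbox"})
    # isinstance narrows the result to Tag, so .find and .contents type-check
    if not isinstance(rankbox, Tag):
        raise ValueError("no div.rankbox found")
    inner = rankbox.find("div")  # first nested div, the one starting with "Ranking"
    if not isinstance(inner, Tag):
        raise ValueError("no inner div found")
    # Same position as lx[4] in the question: text, div, whitespace, div, text
    node = inner.contents[4]
    if not isinstance(node, NavigableString):
        raise ValueError("expected bare text at index 4")
    return node.strip()


print(capture_text(html))
```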



Solution 1:[1]

Given this HTML source, the following is one possible solution.

The idea is to:

  1. Get the first div tag under div.rankbox
  2. Remove all div and span tags
  3. Obtain the text from the remaining source
  4. Remove the text "Ranking" at the beginning
  5. Strip the surrounding whitespace

import re
from bs4 import BeautifulSoup

html = """
<div class="rankbox">
    <div>Ranking 
        <div class="tooltip-wrapper"> ... </div>
        <div class="tooltiptext hide"> ... </div>
        **Text to Capture**
        <span class="sr-only"> of 5</span>
        <span class="rank_chip rankrect_1">&nbsp;</span>
        <span class="rank_chip rankrect_2">&nbsp;</span>
        <span class="rank_chip rankrect_3">3</span> 
        <span class="rank_chip rankrect_4">&nbsp;</span>
        <span class="rank_chip rankrect_5">&nbsp;</span>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")  # explicit parser avoids bs4's "no parser specified" warning

x = soup.select("div.rankbox div")[0]  # div starting with Ranking
# remove all divs and spans
for d in x.find_all("div"):
    d.extract()
for s in x.find_all("span"):
    s.extract()
x = x.text
x = re.sub(r"^Ranking", "", x)  # remove the leading "Ranking"
x = x.strip()

x
# '**Text to Capture**'
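Note that extract() removes the tags from soup itself, so any later lookups on the same soup no longer see the tooltips or rank chips. If that matters, a variant (my suggestion, not part of the original answer) is to run the same steps on a detached copy: copy.copy() on a bs4 Tag returns one. This sketch also removes only the direct div/span children with decompose(), which discards their nested content along with them. The helper name extract_rank_text is hypothetical, and the HTML is a trimmed copy of the snippet.

```python
import copy
import re

from bs4 import BeautifulSoup, Tag

# Trimmed copy of the question's snippet (assumption: most rank chips omitted)
html = """
<div class="rankbox">
    <div>Ranking
        <div class="tooltip-wrapper"> ... </div>
        <div class="tooltiptext hide"> ... </div>
        **Text to Capture**
        <span class="sr-only"> of 5</span>
    </div>
</div>
"""


def extract_rank_text(rank_div: Tag) -> str:
    # Work on a detached copy so the caller's tree keeps its <div> and <span> children
    work = copy.copy(rank_div)
    # Remove direct <div>/<span> children; decompose() also drops everything nested inside them
    for child in work.find_all(["div", "span"], recursive=False):
        child.decompose()
    # Drop the leading "Ranking" label, then the surrounding whitespace
    return re.sub(r"^Ranking", "", work.get_text()).strip()


soup = BeautifulSoup(html, "html.parser")
rank_div = soup.select("div.rankbox div")[0]
print(extract_rank_text(rank_div))
```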

Solution 2:[2]

The previous answer helped me find a shorter version:

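xtract = soup.find('div', attrs={'class': 'rankbox'})
# direct text children of the inner div, skipping whitespace-only strings
texts = [s.strip() for s in xtract.select('div')[0].find_all(string=True, recursive=False) if s.strip()]
x = texts[1]  # texts[0] is "Ranking"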

This works without the type error warning.
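Both answers rely on a fixed position (the first div, the nth string). If the markup keeps the shape shown in the question, another option is to anchor on the screen-reader span <span class="sr-only"> of 5</span>, which directly follows the wanted text, and read its previous_sibling. This is a sketch under that assumption only; on the live page the sibling could differ, so the code checks that it really is bare text. The name text_before_sr_only is hypothetical, and the HTML is a trimmed copy of the snippet.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the question's snippet (assumption: most rank chips omitted)
html = """
<div class="rankbox">
    <div>Ranking
        <div class="tooltip-wrapper"> ... </div>
        <div class="tooltiptext hide"> ... </div>
        **Text to Capture**
        <span class="sr-only"> of 5</span>
    </div>
</div>
"""


def text_before_sr_only(markup: str) -> str:
    soup = BeautifulSoup(markup, "html.parser")
    # Assumption: the wanted text sits immediately before span.sr-only
    marker = soup.select_one("div.rankbox span.sr-only")
    if marker is None:
        raise ValueError("no span.sr-only inside div.rankbox")
    node = marker.previous_sibling
    if not isinstance(node, NavigableString):
        raise ValueError("expected bare text before span.sr-only")
    return node.strip()


print(text_before_sr_only(html))
```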

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution sources:
Solution 1: Kota Mori
Solution 2: Quirn