Odd type error warning when using bs4 to obtain value from website

The following is a snippet from a website, from which I am trying to obtain only the "Text to Capture". That text is nested inside a couple of div elements, which also contain tables, text, etc.

<div class="rankbox">
    <div>Ranking 
        <div class="tooltip-wrapper"> ... </div>
        <div class="tooltiptext hide"> ... </div>
        **Text to Capture**
        <span class="sr-only"> of 5</span>
        <span class="rank_chip rankrect_1">&nbsp;</span>
        <span class="rank_chip rankrect_2">&nbsp;</span>
        <span class="rank_chip rankrect_3">3</span> 
        <span class="rank_chip rankrect_4">&nbsp;</span>
        <span class="rank_chip rankrect_5">&nbsp;</span>
    </div>
</div>

The oddity here is that the text to capture is not wrapped in any tag of its own. I have gotten this to work:

rankbox = soup.find('div', attrs={'class': 'rankbox'})
lx = [x for x in list(rankbox.contents[1])]
returnvalue = str(lx[4]).strip()

However, I am getting a type error warning from PyCharm: Expected type 'Iterable[_T]' (matched generic type 'Iterable[_T]'), got 'PageElement' instead. This is because rankbox.contents[1] is a PageElement, not a list.

I am wondering whether there is a more elegant way of achieving this that also avoids the warning.
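
For context, the warning seems to come from the type annotations rather than from runtime behaviour: as far as I can tell, .contents is declared as a list of PageElement, and PageElement itself does not declare __iter__, while its Tag subclass does. A minimal sketch, assuming a shortened copy of the snippet above as input, that narrows the type with isinstance before indexing (the name capture_by_index is made up for illustration):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened copy of the snippet from the question (assumed input).
HTML = """<div class="rankbox">
    <div>Ranking
        <div class="tooltip-wrapper"> ... </div>
        <div class="tooltiptext hide"> ... </div>
        **Text to Capture**
        <span class="sr-only"> of 5</span>
        <span class="rank_chip rankrect_3">3</span>
    </div>
</div>"""


def capture_by_index(html: str) -> str:
    """Hypothetical helper: same positional lookup as the code above."""
    soup = BeautifulSoup(html, "html.parser")
    rankbox = soup.find("div", attrs={"class": "rankbox"})
    if not isinstance(rankbox, Tag):
        raise ValueError("div.rankbox not found")
    inner = rankbox.contents[1]  # declared as PageElement
    if not isinstance(inner, Tag):
        raise ValueError("expected a nested <div>")
    # inner is now known to be a Tag, so its children can be indexed
    # without wrapping it in list().
    return str(inner.contents[4]).strip()


print(capture_by_index(HTML))
```

The lookup is still positional, so it depends on the exact whitespace layout of the page.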

Solution 1:^[1]

Given this HTML source, the following is one possible solution.

The idea is:

1. Get the first div tag under div.rankbox
2. Remove all div and span tags inside it
3. Obtain the text from what remains
4. Remove the text "Ranking" at the beginning
5. Remove surrounding whitespace

import re
from bs4 import BeautifulSoup

html = """
<div class="rankbox">
    <div>Ranking 
        <div class="tooltip-wrapper"> ... </div>
        <div class="tooltiptext hide"> ... </div>
        **Text to Capture**
        <span class="sr-only"> of 5</span>
        <span class="rank_chip rankrect_1">&nbsp;</span>
        <span class="rank_chip rankrect_2">&nbsp;</span>
        <span class="rank_chip rankrect_3">3</span> 
        <span class="rank_chip rankrect_4">&nbsp;</span>
        <span class="rank_chip rankrect_5">&nbsp;</span>
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning

x = soup.select("div.rankbox div")[0]  # div starting with Ranking
# remove all divs and spans
for d in x.find_all("div"):
    d.extract()
for s in x.find_all("span"):
    s.extract()
x = x.text
x = re.sub(r"^Ranking", "", x)  # remove the leading "Ranking"
x = x.strip()

x
# '**Text to Capture**'

Solution 2:^[2]

The previous answer helped me find the shortest code for this:

xtract = soup.find('div', attrs={'class': 'zr_rankbox'})
# string=True is the current spelling of the older text=True argument
x = xtract.select('div')[0].find_all(string=True, recursive=False)[1].get_text(strip=True)

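This runs without the type error warning. Note that the class name zr_rankbox and the index [1] presumably match the live page; in the snippet above the class is rankbox, and the whitespace-only text node between the two tooltip divs counts as an entry, so the target text sits at index [2] there.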
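
Because positional indexes shift whenever whitespace-only text nodes appear between tags, a variant that filters the direct text children may be more robust. This is only a sketch, assuming a shortened copy of the snippet from the question and assuming the wanted text is the last non-empty direct text node of the inner div (direct_texts is a hypothetical helper name):

```python
from typing import List

from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Shortened copy of the snippet from the question (assumed input).
HTML = """<div class="rankbox">
    <div>Ranking
        <div class="tooltip-wrapper"> ... </div>
        <div class="tooltiptext hide"> ... </div>
        **Text to Capture**
        <span class="sr-only"> of 5</span>
        <span class="rank_chip rankrect_3">3</span>
    </div>
</div>"""


def direct_texts(tag: Tag) -> List[str]:
    """Hypothetical helper: non-empty text nodes that are direct children of tag."""
    stripped = (
        str(child).strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    )
    return [s for s in stripped if s]


soup = BeautifulSoup(HTML, "html.parser")
inner = soup.select_one("div.rankbox > div")  # the div starting with "Ranking"
texts = direct_texts(inner) if inner is not None else []
result = texts[-1] if texts else None  # assumption: target is the last text node
print(result)
```

Text inside the nested div and span tags is ignored because only direct children are inspected, so nothing has to be removed from the tree first.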

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Kota Mori
Solution 2	Quirn
