Python BeautifulSoup: get the text before a certain tag

I have the following HTML that I want to parse with Python and BeautifulSoup:

<html>
<head>
<script> ... </script>
<title> ... </title>
<style> ... </style>
</head>

<body onload="nextHit()">
S. <a name="hit1"></a><span style="background-color: #FFFF00">NO</span>. 178 H. <a name="hit2"></a><span style="background-color: #FFFF00">NO</span>. 1323 / 46 OG <a name="hit3"></a><span style="background-color: #FFFF00">No</span>. 12, 5977 (December, 1950)
<center>
<h2>...</h2>
<h3>...</h3>    
</center>
<br>
....Lines omitted for brevity (more brs, divs, prs)...
</body>

I only want the text at the beginning of the body tag, just before the first center tag, like so:

S. NO. 178 H. NO. 1323 / 46 OG No. 12, 5977 (December, 1950)

I have tried:

ogsourcing = soup.find('center').previousSibling

But I am getting just the last part like so:

. 12, 5977 (December, 1950)

python beautifulsoup

Solution 1: [1]

(Version 2; based on OP's comment)

find() the <center> element
Use previous_siblings to get an iterator over all the siblings that precede it
Loop over them, appending each sibling's .text to a list
Reverse the list, since previous_siblings walks backwards from <center>
''.join() the list to get the desired string

from bs4 import BeautifulSoup

html = """
<html>
    <head>
        <script></script>
        <title></title>
        <style></style>
    </head>

    <body onload="nextHit()">
        S. <a name="hit1"></a><span style="background-color: #FFFF00">NO</span>. 178 H. <a name="hit2"></a><span style="background-color: #FFFF00">NO</span>. 1323 / 46 OG <a name="hit3"></a><span style="background-color: #FFFF00">No</span>. 12, 5977 (December, 1950)
        <center>
        <h2>foo</h2>
        <h3>bar</h3>
        </center>
        <br>
        <em>test</em>
        <div>
            <em>test</em>
        </div>
    </body>
</html>
"""

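soup = BeautifulSoup(html, 'html.parser')

# previous_siblings yields the nodes before <center>, nearest one first
res = []
for sibling in soup.find('center').previous_siblings:
    res.append(sibling.text)

# Restore document order, then join the pieces into one string
res.reverse()
res = ''.join(res)

print(res)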

The above print() will output:

S. NO. 178 H. NO. 1323 / 46 OG No. 12, 5977 (December, 1950)

The result keeps the leading and trailing whitespace and newlines from the markup, so you might want to call .strip() on it.
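As a rough sketch of how the steps and the .strip() suggestion could be combined, the helper below wraps them in one function. The name text_before and the shortened sample markup are hypothetical and not part of Beautiful Soup or the original question. It also returns an empty string when the tag is missing, which is an added assumption about the desired behaviour.

```python
from bs4 import BeautifulSoup, NavigableString


def text_before(soup, tag_name):
    """Return the stripped text of every sibling preceding the first <tag_name>.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    target = soup.find(tag_name)
    if target is None:
        # Assumption: a missing tag should yield an empty string, not an error.
        return ""

    parts = []
    for sibling in target.previous_siblings:
        if isinstance(sibling, NavigableString):
            # String nodes: take their content directly.
            parts.append(str(sibling))
        else:
            # Tags such as <a> or <span>: collect the text they contain.
            parts.append(sibling.get_text())

    # previous_siblings walks backwards from the target, so restore document order.
    parts.reverse()
    return "".join(parts).strip()


# Shortened sample markup (hypothetical, modelled on the question's HTML).
sample = (
    '<body>S. <a name="hit1"></a><span>NO</span>. 178 H. '
    '<a name="hit2"></a><span>NO</span>. 1323\n'
    '<center><h2>foo</h2></center><br><em>test</em></body>'
)

soup = BeautifulSoup(sample, "html.parser")
result = text_before(soup, "center")
print(result)
```

Calling .strip() only on the joined string, rather than on each piece, keeps the spaces between the fragments intact.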

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
