Beautiful Soup - get the desired text from tags
I'm very new to Beautiful Soup. I'm attempting to get the text between tags.
databs.txt
<p>$343,343</p><h3>Single</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$101,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Multi</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$201,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Single</h3>
Python
#!/usr/bin/python
from bs4 import BeautifulSoup
# Read the saved HTML; the with block closes the file automatically.
with open("databs.txt", "r") as f:
    text = f.read()
soup = BeautifulSoup(text, 'html.parser')
page1 = soup.find('p').getText()
print("P1:",page1)
page2 = soup.find('h3').getText()
print("H3:",page2)
Question:
- How do I get the text "$101,900, Multi, $201,900, Single"?
Solution 1:[1]
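If you want only the tags that have attributes, you can pass a lambda function to find_all as follows. Beautiful Soup treats a non-dictionary attrs value as a filter on the class attribute, so this matches the tags that have a class: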
from bs4 import BeautifulSoup
html = """
<p>$343,343</p>
<h3>Single</h3>
<p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$101,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Multi</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$201,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Single</h3>
"""
soup = BeautifulSoup(html, 'lxml')
tags_with_attribute = soup.find_all(attrs=lambda x: x is not None)
clean_text = ", ".join([tag.get_text() for tag in tags_with_attribute])
The value of clean_text would look like:
'$101,900, Multi, $201,900, Single'
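The `=3D` sequences in the sample look like quoted-printable encoding of `=`, which suggests the HTML was saved from an email body. Assuming that is the case, a minimal sketch is to decode the text with the standard-library `quopri` module first and then select every tag that has a class. The shortened `raw` string below is a stand-in for databs.txt with the style attributes left out, not the original file.

```python
import quopri
from bs4 import BeautifulSoup

# Hypothetical shortened sample in the same shape as databs.txt
# (style attributes omitted for readability).
raw = (
    "<p>$343,343</p><h3>Single</h3>"
    "<p class=3D'highlight-price'>$101,900</p>"
    "<h3 class=3D\"highlight-title\">Multi</h3>"
    "<p class=3D'highlight-price'>$201,900</p>"
    "<h3 class=3D\"highlight-title\">Single</h3>"
)

# Assumption: the text is quoted-printable, so "=3D" decodes to "=".
decoded = quopri.decodestring(raw.encode("utf-8")).decode("utf-8")

soup = BeautifulSoup(decoded, "html.parser")

# class_=True matches any tag that has a class attribute, in document order.
tags = soup.find_all(class_=True)
clean_text = ", ".join(tag.get_text() for tag in tags)
print(clean_text)
```

After decoding, the class values are plain names such as highlight-price, so a more specific filter like `soup.find_all("p", class_="highlight-price")` should also work if only the prices are needed.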
Solution 2:[2]
Use the find_all method to find all matching tags, then pair each p with the h3 at the same position:
for p, h3 in zip(soup.find_all('p'), soup.find_all('h3')):
    print("P:", p.getText())
    print("H3:", h3.getText())
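The loop above prints every p/h3 pair, including the first one ($343,343 and Single), which the question does not ask for. A possible sketch for getting exactly the requested string is to collect the pairs and drop the first. The simplified `html` sample without attributes and the assumption that the unwanted pair always comes first are illustrative, not taken from the original answer.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the question's markup (attributes omitted).
html = (
    "<p>$343,343</p><h3>Single</h3>"
    "<p>$101,900</p><h3>Multi</h3>"
    "<p>$201,900</p><h3>Single</h3>"
)
soup = BeautifulSoup(html, "html.parser")

# Pair each <p> with the <h3> at the same position in the document.
pairs = [
    (p.get_text(), h3.get_text())
    for p, h3 in zip(soup.find_all("p"), soup.find_all("h3"))
]

# Assumption: the first pair is the one to skip.
wanted = pairs[1:]

# Flatten the remaining pairs into one comma-separated string.
flat = ", ".join(text for pair in wanted for text in pair)
print(flat)
```

Note that zip stops at the shorter list, so a p without a matching h3 (or the reverse) is silently dropped.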
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rustam Garayev |
| Solution 2 | |
