Extracting specific text / attribute value using BeautifulSoup
The following code:
results=(soup.find_all(type="folder"))
print(results)
gives me the following output:
<object name="aaa" type="folder"/>
<object name="bbb" type="folder"/>
<object name="ccc" type="folder"/>
<object name="ddd" type="folder"/>
I only want to print:
aaa bbb ccc ddd
How can this be achieved?
Solution 1:[1]
You have to iterate over the ResultSet and read the name attribute of each tag.
Example
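from bs4 import BeautifulSoup

h = '''
<object name="aaa" type="folder"/>
<object name="bbb" type="folder"/>
<object name="ccc" type="folder"/>
<object name="ddd" type="folder"/>
'''
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(h, 'html.parser')
# Each result is a Tag; .get() returns the attribute value (or None if absent)
for r in soup.find_all(type="folder"):
    print(r.get('name'))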
->
aaa
bbb
ccc
ddd
Or use a list comprehension to get a list:
results = [r.get('name') for r in soup.find_all(type="folder")]
print(results)
->
['aaa', 'bbb', 'ccc', 'ddd']
To join the list into a single string:
' '.join([r.get('name') for r in soup.find_all(type="folder")])
-> aaa bbb ccc ddd
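If some of the tags might not carry a name attribute, r.get('name') returns None for them and ' '.join() will then fail. A minimal sketch of one way around that, assuming it is acceptable to skip such tags; the helper name folder_names and the sample markup are made up for illustration:

```python
from bs4 import BeautifulSoup

# Sample markup; the last element deliberately has no name attribute.
markup = '''
<object name="aaa" type="folder"/>
<object name="bbb" type="folder"/>
<object type="folder"/>
'''

def folder_names(markup):
    """Return the name attribute of every type="folder" tag that has one."""
    soup = BeautifulSoup(markup, "html.parser")
    # "name" is also find_all's first parameter (the tag name), so a filter
    # on the name attribute goes in the attrs dict. True means "is present".
    tags = soup.find_all(attrs={"type": "folder", "name": True})
    return [tag["name"] for tag in tags]

print(" ".join(folder_names(markup)))
```

Because the filter already guarantees the attribute exists, tag["name"] is safe here and the result contains only strings.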
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
