Extracting specific text / attribute value using BeautifulSoup

The following lines of code:

results = soup.find_all(type="folder")
print(results)

give me the following output:

<object name="aaa" type="folder"/>
<object name="bbb" type="folder"/>
<object name="ccc" type="folder"/>
<object name="ddd" type="folder"/>

I only want to print:

aaa bbb ccc ddd

How can this be achieved?



Solution 1:[1]

find_all() returns a ResultSet of tags, so you have to iterate over it and read the name attribute of each tag.

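Example:

from bs4 import BeautifulSoup

h = '''
<object name="aaa" type="folder"/>
<object name="bbb" type="folder"/>
<object name="ccc" type="folder"/>
<object name="ddd" type="folder"/>
'''

# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(h, 'html.parser')
for r in soup.find_all(type="folder"):
    # .get() returns the attribute value, or None if the tag has no such attribute
    print(r.get('name'))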

->
aaa
bbb
ccc
ddd

Or use a list comprehension to collect the names in a list:

results = [r.get('name') for r in soup.find_all(type="folder")]
print(results)

->
['aaa', 'bbb', 'ccc', 'ddd']
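
The same list can probably also be built with a CSS attribute selector instead of the keyword filter. This is only a sketch of an alternative, not part of the original answer; it assumes a Beautiful Soup 4 version with select() support and the built-in html.parser. Note that tag["name"] raises a KeyError when the attribute is missing, unlike .get().

```python
from bs4 import BeautifulSoup

h = '''
<object name="aaa" type="folder"/>
<object name="bbb" type="folder"/>
<object name="ccc" type="folder"/>
<object name="ddd" type="folder"/>
'''

soup = BeautifulSoup(h, "html.parser")

# CSS attribute selector: <object> tags whose type attribute equals "folder"
results = [tag["name"] for tag in soup.select('object[type="folder"]')]
print(results)
```

->
['aaa', 'bbb', 'ccc', 'ddd']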

To get a single space-separated string, join the names:

' '.join(r.get('name') for r in soup.find_all(type="folder"))

-> aaa bbb ccc ddd
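
One caveat with the join: r.get('name') returns None for a tag that has no name attribute, and str.join raises a TypeError on None. Below is a minimal sketch of one way to guard against that. The fifth element without a name is a made-up addition for illustration and is not in the question's data. The attribute filter goes through attrs because the name keyword of find_all() refers to the tag name, not to an attribute called "name".

```python
from bs4 import BeautifulSoup

# Sample markup; the last element (no name attribute) is a hypothetical addition
h = '''
<object name="aaa" type="folder"/>
<object name="bbb" type="folder"/>
<object name="ccc" type="folder"/>
<object name="ddd" type="folder"/>
<object type="folder"/>
'''

soup = BeautifulSoup(h, "html.parser")

# "name": True keeps only tags that actually carry a name attribute
tags = soup.find_all(attrs={"type": "folder", "name": True})
names = " ".join(tag["name"] for tag in tags)
print(names)
```

->
aaa bbb ccc ddd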

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1