Extract an image link from within a div and srcset using BS4

Example div tag within html:

[<div class="event-info-and-content">
<picture content="https://img.example.image.link.here/954839">
<source sizes="720px" srcset="
                                https://img.example.image.link.here/954839 480w,
                                https://img.example.image.link.here/954839 600w,
                                https://img.example.image.link.here/954839 800w,
                                https://img.example.image.link.here/954839 1080w
                            ">
<img alt="" class="event-info-and-content" data-automation="event-hero-image"/>
</source></picture>
</div>]

Desired outcome (srcset):

https://img.example.image.link.here/954839

My function:

def extract_img_link(html):
    with open(html, 'rb') as file:
        content = BeautifulSoup(file)
        for image in content.findAll('div', attrs={'class':'event-info-and-content'}):
            print(image.get("srcset"))
            return(image)

#calling out the html and function
html = 'data/website/events.html'
print(extract_img_link(html))

My function simply returns the entire tag I was looking for, rather than the specific link within it:

 [<div class="event-info-and-content">
    <picture content="https://img.example.image.link.here/954839">
    <source sizes="720px" srcset="
                                    https://img.example.image.link.here/954839 480w,
                                    https://img.example.image.link.here/954839 600w,
                                    https://img.example.image.link.here/954839 800w,
                                    https://img.example.image.link.here/954839 1080w
                                ">
    <img alt="" class="event-info-and-content" data-automation="event-hero-image"/>
    </source></picture>
    </div>]

Solution 1:^[1]

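You forgot about an extra layer: the link is not an attribute of the div itself but of the picture tag nested inside it. Calling image.get("srcset") on the div only reads the div's own attributes, so it finds nothing.

The following worked for me.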

from bs4 import BeautifulSoup  

def extract_img_link(html):
    with open(html, 'rb') as file:
        content = BeautifulSoup(file, "html.parser")
        for image in content.find_all('div', attrs={'class':'event-info-and-content'}):
            for picture in image.find_all('picture'):
                print(picture["content"])
    
#calling out the html and function  
html = 'data/website/events.html'
extract_img_link(html)
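
To make the difference concrete, here is a minimal sketch that first repeats the original lookup on the div and then collects the links from the nested picture tags. It parses an inline string instead of the events.html file so that it runs on its own, and the helper name extract_img_links is hypothetical; it returns a list rather than printing.

```python
from bs4 import BeautifulSoup

# Inline copy of the markup from the question (shortened), used here
# only so the sketch runs without the events.html file.
SAMPLE_HTML = """
<div class="event-info-and-content">
<picture content="https://img.example.image.link.here/954839">
<source sizes="720px" srcset="
    https://img.example.image.link.here/954839 480w,
    https://img.example.image.link.here/954839 600w
">
<img alt="" class="event-info-and-content" data-automation="event-hero-image"/>
</picture>
</div>
"""

def extract_img_links(markup):
    """Hypothetical helper: return the content attribute of every
    <picture> nested in a matching <div>, instead of printing it."""
    soup = BeautifulSoup(markup, "html.parser")
    links = []
    for div in soup.find_all("div", class_="event-info-and-content"):
        for picture in div.find_all("picture"):
            link = picture.get("content")
            if link:  # skip <picture> tags without a content attribute
                links.append(link)
    return links

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", class_="event-info-and-content")

# .get() only looks at the div's own attributes, so this is None.
div_srcset = div.get("srcset")
links = extract_img_links(SAMPLE_HTML)

print(div_srcset)  # None
print(links)       # ['https://img.example.image.link.here/954839']
```

Returning a list keeps the function usable when a page has several matching divs, and an empty list signals that nothing was found.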

Solution 2:^[2]

To get the image path, change your selection so it targets the nested tag, and read the single link from the content attribute of the <picture>:

for e in soup.select('div.event-info-and-content picture'):
    print(e.get('content'))

or take the first URL listed in the srcset of the <source>:

for e in soup.select('div.event-info-and-content source'):
    print(e.get('srcset').split()[0])
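
The .split()[0] call above takes the first whitespace-separated token, which is enough here because every srcset candidate points to the same URL. If the candidates differed, one option would be to split the attribute into (url, width) pairs first. The sketch below is one way to do that; parse_srcset is a hypothetical helper, and it assumes the URLs contain no commas and that the descriptors are widths such as 480w.

```python
def parse_srcset(srcset):
    """Hypothetical helper: split a srcset value into (url, width) pairs.

    Assumes candidates are comma-separated, URLs contain no commas,
    and descriptors are widths like '480w'. Width is None otherwise.
    """
    candidates = []
    for part in srcset.split(","):
        tokens = part.split()
        if not tokens:  # skip empty pieces, e.g. after a trailing comma
            continue
        url = tokens[0]
        width = None
        if len(tokens) > 1 and tokens[1].endswith("w") and tokens[1][:-1].isdigit():
            width = int(tokens[1][:-1])
        candidates.append((url, width))
    return candidates

srcset = """
    https://img.example.image.link.here/954839 480w,
    https://img.example.image.link.here/954839 1080w
"""

candidates = parse_srcset(srcset)
# Pick the candidate with the largest declared width (missing width counts as 0).
widest = max(candidates, key=lambda c: c[1] or 0)

print(candidates)
print(widest)
```

With BeautifulSoup, the input would be e.get('srcset') from the loop above.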

Example

from bs4 import BeautifulSoup

html = '''
<div class="event-info-and-content">
<picture content="https://img.example.image.link.here/954839">
<source sizes="720px" srcset="
                                https://img.example.image.link.here/954839 480w,
                                https://img.example.image.link.here/954839 600w,
                                https://img.example.image.link.here/954839 800w,
                                https://img.example.image.link.here/954839 1080w
                            ">
<img alt="" class="event-info-and-content" data-automation="event-hero-image"/>
</source></picture>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

for e in soup.select('div.event-info-and-content picture'):
    print(e.get('content'))

Output

https://img.example.image.link.here/954839
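
When only one image is expected, the two selectors can be combined into a single lookup with a fallback. This is a sketch under assumptions, not part of the original answer: first_image_link is a hypothetical name, and the three inline strings are made-up test inputs modelled on the markup from the question.

```python
from bs4 import BeautifulSoup

def first_image_link(markup):
    """Hypothetical helper: prefer the <picture> content attribute,
    fall back to the first srcset URL, return None if neither exists."""
    soup = BeautifulSoup(markup, "html.parser")

    # select_one returns the first match or None.
    picture = soup.select_one("div.event-info-and-content picture[content]")
    if picture is not None:
        return picture["content"]

    source = soup.select_one("div.event-info-and-content source[srcset]")
    if source is not None:
        tokens = source["srcset"].split()
        if tokens:
            return tokens[0]
    return None

# Made-up inputs covering the three cases.
with_content = (
    '<div class="event-info-and-content">'
    '<picture content="https://img.example.image.link.here/954839"></picture>'
    '</div>'
)
srcset_only = (
    '<div class="event-info-and-content"><picture>'
    '<source srcset="https://img.example.image.link.here/954839 480w">'
    '</picture></div>'
)
no_image = '<div class="event-info-and-content"></div>'

print(first_image_link(with_content))
print(first_image_link(srcset_only))
print(first_image_link(no_image))
```

Returning None for the empty case lets the caller decide how to handle pages without a hero image, rather than raising inside the helper.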

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Neo
Solution 2	HedgeHog
