BeautifulSoup: find all links that contain any of the following words

I'm trying to find hrefs that contain any of a set of words, for example google, yahoo, msnbc.

I know how to find, say, all the google links, but I'm lost on the next step. Please help.

for a in elem.parent.find_all('a', string=re.compile('google')):

is what I'm using to find google.



Solution 1:[1]

Basically (at least as far as I understand it), the string parameter just checks whether a certain string appears in the text of the tag, that is, the visible link text and not the href attribute. It also doesn't do this in a particularly efficient way (no hashing magic, it simply iterates over the text of the tag).
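To make the text-versus-href difference concrete, here is a small sketch (not part of the original answer) that runs a regex filter both ways: once with string=, which BeautifulSoup tests against each tag's text, and once with href=, which it tests against the attribute value. It uses a different technique from the answer's code: BeautifulSoup accepts a compiled regular expression as an attribute filter, so joining the words with | covers all of them in a single find_all call. The sample HTML and variable names are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Sample markup, made up for illustration.
html_string = """
<a href="https://www.google.com/search">Search</a>
<a href="https://news.yahoo.com">yahoo news</a>
<a href="https://example.org">google it</a>
<a name="anchor-without-href">msnbc</a>
"""

soup = BeautifulSoup(html_string, "html.parser")
keywords = ["google", "yahoo", "msnbc"]

# One alternation pattern; re.escape guards against regex metacharacters in the words.
pattern = re.compile("|".join(re.escape(word) for word in keywords))

# string= is tested against each tag's text, so this matches on the link text.
by_text = soup.find_all("a", string=pattern)

# href= is tested against the attribute value; tags without an href are skipped.
by_href = soup.find_all("a", href=pattern)

print([a.get_text() for a in by_text])  # ['yahoo news', 'google it', 'msnbc']
print([a["href"] for a in by_href])     # ['https://www.google.com/search', 'https://news.yahoo.com']
```

Note how the text search picks up links whose URLs have nothing to do with the keywords, while the href search returns only the URLs that actually contain them.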

But you can do that yourself and write a check that fits your needs, perhaps something like this:

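from bs4 import BeautifulSoup

soup = BeautifulSoup(html_string, features='html.parser')
keywords = {'google', 'yahoo', 'msnbc'}

def check_keywords(link_text, words):
    # True if any of the words occurs as a substring of link_text
    for word in words:
        if word in link_text:
            return True
    return False

# href=True skips <a> tags without an href attribute (a['href'] would raise KeyError on them)
all_links = [a for a in soup.find_all('a', href=True) if check_keywords(a['href'], keywords)]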
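The same check can also be passed straight to find_all, because BeautifulSoup accepts a function as an attribute filter and calls it with that attribute's value (None when the tag has no such attribute). A minimal sketch, assuming made-up sample HTML; href_has_keyword is a hypothetical helper name, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

# Sample markup, made up for illustration.
html_string = """
<a href="https://www.google.com/maps">Maps</a>
<a href="https://www.msnbc.com/news">MSNBC</a>
<a href="https://example.org/about">Example</a>
<a>No href here</a>
"""

keywords = {"google", "yahoo", "msnbc"}


def href_has_keyword(href):
    # BeautifulSoup passes the href value here, or None when the tag has no href.
    return href is not None and any(word in href for word in keywords)


soup = BeautifulSoup(html_string, "html.parser")
matching = soup.find_all("a", href=href_has_keyword)
print([a["href"] for a in matching])  # ['https://www.google.com/maps', 'https://www.msnbc.com/news']
```

This keeps the filtering inside find_all instead of a separate list comprehension. It is still a plain, case-sensitive substring test, so "google" would also match a URL such as https://example.org/google-alternatives; use a stricter check (for example on the parsed hostname) if that matters.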

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nadav Porat