Python Gmail Parsing and Regex

My Python function scans Gmail messages for stock option symbols. So far it parses the subject line using the code below. I would like it to parse the message body instead, since there are often too many symbols to fit in the subject. A similar regex would probably work, but the line containing the symbols appears twice in the message body (see sample attached), so the regex should stop at the first end-of-line character.

I would appreciate your help with repurposing the regex!

import re

def parse_symbols_from_email_to_list(email_msg):
    if email_msg['Subject'].find("Following list of symbols were added") != -1:
        symbols_list = re.findall(r'\.[A-Z]+[A-Z0-9]+\.*[0-9]+', email_msg['Subject'])
        # dict.fromkeys removes duplicates while keeping the original order
        symbols_list = list(dict.fromkeys(symbols_list))
        return symbols_list


Solution 1:[1]

Change your regex to :.*?\.(.*),.*\.(.*)\..*\nA

It matches from the first colon up to the first line break that is followed by the letter A, and captures two symbols in groups 1 and 2.

With your code, it would look something like:

import re

def parse_symbols_from_email_to_list(email_msg):
    if email_msg['Subject'].find("Following list of symbols were added") != -1:
        # re.search scans the whole string; re.match only tries position 0
        match = re.search(r':.*?\.(.*),.*\.(.*)\..*\nA', email_msg['Subject'])
        if match is None:
            return []
        return [match[1], match[2]]
    return []
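The snippets so far still read email_msg['Subject'], while the question asks about the message body, where the symbol line appears twice. A minimal sketch of one way to handle that is to reuse the question's regex on the body one line at a time and stop at the first line that yields a match. It assumes email_msg is an email.message.Message (or EmailMessage) with a text/plain part; the helper names get_plain_text_body and parse_symbols_from_body and the sample symbols are hypothetical, since the attached sample is not available here.

```python
import re
from email.message import EmailMessage

# Same pattern as in the question
SYMBOL_RE = re.compile(r'\.[A-Z]+[A-Z0-9]+\.*[0-9]+')

def get_plain_text_body(email_msg):
    """Return the first text/plain part as a string ('' if there is none)."""
    for part in email_msg.walk():
        if part.get_content_type() == 'text/plain':
            payload = part.get_payload(decode=True)  # bytes, or None for containers
            if payload is None:
                continue
            charset = part.get_content_charset() or 'utf-8'
            return payload.decode(charset, errors='replace')
    return ''

def parse_symbols_from_body(email_msg):
    """Return the symbols from the first body line that contains any."""
    for line in get_plain_text_body(email_msg).splitlines():
        symbols = SYMBOL_RE.findall(line)
        if symbols:
            # Stop at the first matching line, so the repeated line is ignored
            return list(dict.fromkeys(symbols))
    return []

# Made-up sample message: the symbol line is repeated, as described in the question
sample_msg = EmailMessage()
sample_msg['Subject'] = 'Following list of symbols were added'
sample_msg.set_content(
    'Following list of symbols were added: .AAPL210115C130, .TSLA210115P800\n'
    'Following list of symbols were added: .AAPL210115C130, .TSLA210115P800\n'
)
print(parse_symbols_from_body(sample_msg))
```

Splitting the body into lines avoids having to encode the end-of-line condition in the regex itself; if the symbol line is not the first line with a match in real messages, the loop would need an extra check such as the "Following list of symbols were added" prefix.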

Solution 2:[2]

Try using a set to remove duplicates from symbols_list, so the code will look like:

def parse_symbols_from_email_to_list(email_msg):
    if email_msg['Subject'].find("Following list of symbols were added") != -1:
        symbols_list = re.findall(r'\.[A-Z]+[A-Z0-9]+\.*[0-9]+', email_msg['Subject'])
        # set() removes duplicates but does not preserve the original order
        symbols_list = list(set(dict.fromkeys(symbols_list)))
        return symbols_list
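As a small illustration of the difference between the two deduplication approaches, the sketch below compares dict.fromkeys with set on a made-up list of symbols. Both remove duplicates, but only dict.fromkeys keeps the symbols in the order they appeared in the message.

```python
symbols = ['.TSLA210115P800', '.AAPL210115C130', '.TSLA210115P800']

# dict.fromkeys keeps the first occurrence of each key, in insertion order
ordered_unique = list(dict.fromkeys(symbols))

# set() also removes duplicates, but its iteration order is arbitrary
unordered_unique = set(symbols)

print(ordered_unique)
```

If the order of the symbols matters downstream, the dict.fromkeys call from the question is already sufficient on its own.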

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Pavel Gomon
Solution 2