How to remove the extra commas from the list of email addresses
I am using Python and have a string of email addresses as shown below.
email_addr = '[email protected], [email protected], [email protected]'
The string above looks fine, but sometimes I receive data that contains blank email addresses.
For example:
email_addr = ' , , [email protected], [email protected], , , ,[email protected]'
I am using str.split(',') and a lot of error checking. Is there a better way to do this?
The final value I expect is to go from:
email_addr = ' , , [email protected], [email protected], , , ,[email protected]'
to:
email_addr = '[email protected],[email protected],[email protected]'
Solution 1:[1]
No need for regular expressions. Use .split(',') to split into a list of strings.
email_lst = email_addr.split(',')
Then join with commas, filtering out the blank values:
email_addr2 = ",".join(e.strip() for e in email_lst if e.strip())
# '[email protected],[email protected],[email protected]'
In Python 3.8+, you can use the walrus operator to avoid calling .strip() twice:
email_addr2 = ",".join(e for ee in email_lst if (e := ee.strip()))
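For reuse, the split, strip and filter steps above could be wrapped in a small helper. This is only a sketch: the function name clean_email_list and the example.com addresses are made up for illustration and are not part of the original question.

```python
def clean_email_list(raw: str) -> str:
    """Return the comma-separated addresses in raw with blank fields removed."""
    # Split on commas and trim the whitespace around each field.
    fields = (field.strip() for field in raw.split(","))
    # Keep only the fields that are non-empty after trimming.
    return ",".join(field for field in fields if field)


# Placeholder addresses, made up for this example.
raw = " , , a@example.com, b@example.com, , , ,c@example.com"
print(clean_email_list(raw))  # a@example.com,b@example.com,c@example.com
```

An empty or all-blank input simply yields an empty string, so the caller does not need any extra error checking for that case.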
Solution 2:[2]
Try:
import re
email_addr = " , , [email protected], [email protected], , , ,[email protected]"
email_addr = email_addr.replace(" ", "").strip(",")
email_addr = re.sub(r",{2,}", ",", email_addr)
print(email_addr)
Prints:
[email protected],[email protected],[email protected]
Solution 3:[3]
I'd be quite tempted to validate as you go and rely on email.utils.parseaddr, which goes some way toward ensuring that email clients will accept the addresses:
>>> from email.utils import parseaddr as parse_email_addr
>>> parse_email_addr("Foo Bar <[email protected]>")
('Foo Bar', '[email protected]')
from email.utils import parseaddr as parse_email_addr
email_addr = ' , , [email protected], [email protected], , , ,[email protected]'
result = ",".join(filter(None, (parse_email_addr(email)[1] for email in email_addr.split(","))))
# '[email protected],[email protected],[email protected]'
I'd also be tempted to account for bad fields, which may point to an input error (i.e., how did you get these? Should they be valid inputs to your program?). Comparing the comma counts shows how many fields were dropped:
>>> result
'[email protected],[email protected],[email protected]'
>>> email_addr.rstrip(",").count(",") - result.count(",")
5
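If you would rather count the dropped fields directly than compare comma counts, something along these lines might work. It is a rough sketch, not a definitive implementation: extract_addresses is a hypothetical helper name and the example.com addresses are placeholders. As far as I know, parseaddr is a lenient parser rather than a strict validator, so this only flags fields in which it finds no address at all.

```python
from email.utils import parseaddr


def extract_addresses(raw: str) -> tuple[list[str], int]:
    """Return (addresses, dropped), where dropped counts fields with no address."""
    addresses = []
    dropped = 0
    for field in raw.split(","):
        # parseaddr returns a (display name, address) pair; the address part
        # is an empty string when there is nothing to parse, e.g. a blank field.
        _, address = parseaddr(field)
        if address:
            addresses.append(address)
        else:
            dropped += 1
    return addresses, dropped


# Placeholder addresses, made up for this example.
raw = " , , a@example.com, b@example.com, , , ,c@example.com"
addresses, dropped = extract_addresses(raw)
print(",".join(addresses))  # a@example.com,b@example.com,c@example.com
print(dropped)              # 5
```

A non-zero dropped count could then be logged or raised as an error, depending on how much you trust the source of the data.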
Solution 4:[4]
If we use regex, how about getting a list of matches with [^, ]+ and then joining all the items?
[^, ] means any character except a comma or a space, and + means "1 or more".
import re
email_addr = " , , [email protected], [email protected], , , ,[email protected]"
email_cleaned = ",".join(re.findall("[^, ]+", email_addr))
print(email_cleaned)
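Note that [^, ] only excludes commas and plain spaces. If the input might also contain tabs or newlines (an assumption, the question only shows spaces), a variant with \s in the character class is one option. The addresses below are placeholders made up for the example.

```python
import re

# Placeholder addresses, made up for this example; note the tab and newline.
raw = " ,\t, a@example.com,\n b@example.com, , , ,c@example.com"

# [^,\s]+ matches runs of characters that are neither commas nor whitespace.
cleaned = ",".join(re.findall(r"[^,\s]+", raw))
print(cleaned)  # a@example.com,b@example.com,c@example.com
```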
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Andrej Kesely |
| Solution 3 | |
| Solution 4 | |
