Formatting Phone number with +1 with pandas.Series.replace
I can't find a solution online. I know this should be easy, but I can't figure out what is wrong with my regex.
Here is my code:
import pandas as pd

df = pd.DataFrame({'Company phone number': ['+1-541-296-2271', '+1-542-296-2271', '+1-543-296-2271'],
                   'Contact phone number': ['15112962271', None, '15312962271'],
                   'num_specimen_seen': [10, 2, 3]},
                  index=['falcon', 'dog', 'cat'])
df['Contact phone number'] = df['Contact phone number'].str.replace('^\d{11}$', r'\+1-\d{3}-\d{3}-\d{4}')
Desired output of df['Contact phone number']:
falcon +1-511-296-2271
dog None
cat +1-531-296-2271
It is always 11 digits with no spaces or special characters. Thanks!
Solution 1:[1]
The replacement string is not a pattern, so \d{3} has no meaning there. Capture the digit groups in the pattern and refer to them with backreferences in the replacement. You can use
df['Contact phone number'] = df['Contact phone number'].str.replace(r'^(\d)(\d{3})(\d{3})(\d+)$', r'+\1-\2-\3-\4', regex=True)
Details:
^ - start of string
(\d) - Group 1 (\1): a digit
(\d{3}) - Group 2 (\2): three digits
(\d{3}) - Group 3 (\3): three digits
(\d+) - Group 4 (\4): one or more digits (use \d{4} if you need to match exactly four digits)
$ - end of string.
Output:
>>> df['Contact phone number']
falcon +1-511-296-2271
dog None
cat +1-531-296-2271
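To see how the capture groups and backreferences behave on their own, here is a minimal sketch using only the standard-library re module instead of pandas. The helper name format_phone is hypothetical, and the sketch assumes the last group should be exactly four digits (\d{4}), as the question states the input is always 11 digits.

```python
import re

# One leading digit, then groups of 3, 3 and 4 digits, anchored at both ends.
PHONE_RE = re.compile(r'^(\d)(\d{3})(\d{3})(\d{4})$')

def format_phone(value):
    """Hypothetical helper: format an 11-digit string, pass anything else through."""
    if not isinstance(value, str):
        return value  # missing values such as None are returned unchanged
    # \1..\4 in the replacement refer to the four captured groups.
    return PHONE_RE.sub(r'+\1-\2-\3-\4', value)

print(format_phone('15112962271'))  # +1-511-296-2271
print(format_phone(None))           # None
print(format_phone('12345'))        # 12345 (no match, so left unchanged)
```

Because the pattern is anchored with ^ and $, strings that are not exactly 11 digits are left untouched rather than partially rewritten.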
Solution 2:[2]
You can use .str.extract, convert each row of results to a list, and then use .str.join (and of course concatenate a + at the beginning):
df['Contact phone number'] = '+' + df['Contact phone number'].dropna().astype(str).str.extract(r'(\d)(\d{3})(\d{3})(\d{4})').apply(list, axis=1).str.join('-')
Output:
>>> df
       Company phone number Contact phone number  num_specimen_seen
falcon      +1-541-296-2271      +1-511-296-2271                 10
dog         +1-542-296-2271                  NaN                  2
cat         +1-543-296-2271      +1-531-296-2271                  3
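The extract-then-join idea can also be traced step by step for a single value. The following is a rough sketch with the standard-library re module, not the pandas implementation; the helper name extract_and_join is hypothetical, and it assumes the last group is four digits.

```python
import re

def extract_and_join(value):
    """Hypothetical helper mirroring extract -> list -> join for one value."""
    if not isinstance(value, str):
        return None  # missing input gives a missing result
    # Like an unanchored extract, search for the first run of 1+3+3+4 digits.
    match = re.search(r'(\d)(\d{3})(\d{3})(\d{4})', value)
    if match is None:
        return None
    # match.groups() is ('1', '531', '296', '2271'); join with '-' and prepend '+'.
    return '+' + '-'.join(match.groups())

print(extract_and_join('15312962271'))  # +1-531-296-2271
print(extract_and_join(None))           # None
```

The final group has to cover all four remaining digits; with (\d{3}) there, the last digit of each number would be silently dropped.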
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
| Solution 2 | richardec |
