Formatting Phone number with +1 with pandas.Series.replace

I can't find a solution online. I know this should be easy, but I can't figure out what is wrong with my regex.

Here is my code:

import pandas as pd

df = pd.DataFrame({'Company phone number': ['+1-541-296-2271', '+1-542-296-2271', '+1-543-296-2271'],
                   'Contact phone number': ['15112962271', None,'15312962271'],
                   'num_specimen_seen': [10, 2,3]},
                  index=['falcon', 'dog','cat'])

df['Contact phone number'] = df['Contact phone number'].str.replace('^\d{11}$', r'\+1-\d{3}-\d{3}-\d{4}')

Desired output of df['Contact phone number']:

falcon    +1-511-296-2271
dog       None
cat       +1-531-296-2271

It is always 11 digits with no spaces or special characters. Thanks!



Solution 1:[1]

You can use

df['Contact phone number'] = df['Contact phone number'].str.replace(r'^(\d)(\d{3})(\d{3})(\d+)$', r'+\1-\2-\3-\4', regex=True)

Details:

  • ^ - start of string
  • (\d) - Group 1 (\1): a digit
  • (\d{3}) - Group 2 (\2): three digits
  • (\d{3}) - Group 3 (\3): three digits
  • (\d+) - Group 4 (\4): one or more digits (use \d{4} to match exactly four digits)
  • $ - end of string.

In the replacement, \1 already supplies the leading 1, so only a literal + is prepended.

Output:

>>> df['Contact phone number']
falcon    +1-511-296-2271
dog                  None
cat       +1-531-296-2271
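
To check this kind of replacement in isolation, here is a minimal self-contained sketch. It prepends a literal + and reuses the captured leading digit, and it uses \d{4} for the last group so that only strings of exactly 11 digits are rewritten. The Series name `s` and the extra `short` row are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Illustrative data: a valid 11-digit number, a missing value,
# and a 10-digit value that should not match the pattern.
s = pd.Series(['15112962271', None, '5112962271'],
              index=['falcon', 'dog', 'short'])

# Four capture groups, referenced as \1..\4 in the replacement.
formatted = s.str.replace(
    r'^(\d)(\d{3})(\d{3})(\d{4})$',
    r'+\1-\2-\3-\4',
    regex=True,
)

print(formatted)
```

The missing value stays missing, and the 10-digit string is left unchanged because the anchored pattern does not match it.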

See the regex demo.
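
The capture groups can also be tried on a single string with Python's standard re module. The sketch below uses a hypothetical helper, `format_phone`, and also shows why the attempt in the question cannot work: the replacement argument is a template that refers back to captured groups, not a second pattern, so \d{3} has no meaning there.

```python
import re

# Four capture groups: leading digit, then 3 + 3 + 4 digits.
PHONE = re.compile(r'^(\d)(\d{3})(\d{3})(\d{4})$')

def format_phone(raw):
    """Hypothetical helper: rewrite 11 bare digits as +C-AAA-PPP-LLLL."""
    # \1..\4 are backreferences to the groups captured by PHONE.
    return PHONE.sub(r'+\1-\2-\3-\4', raw)

print(format_phone('15112962271'))
print(format_phone('12345'))  # no match, so the input is returned as-is

# A replacement template containing \d is not valid; recent Python
# versions raise re.error for it instead of producing digits.
try:
    re.sub(r'^\d{11}$', r'\+1-\d{3}-\d{3}-\d{4}', '15112962271')
except re.error as exc:
    print('invalid replacement:', exc)
```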

Solution 2:[2]

You can use .str.extract, convert each row of results to a list, and then use .str.join (and of course concatenate a + at the beginning):

df['Contact phone number'] = '+' + df['Contact phone number'].dropna().astype(str).str.extract(r'(\d)(\d{3})(\d{3})(\d{4})').apply(list, axis=1).str.join('-')

Output:

>>> df
       Company phone number Contact phone number  num_specimen_seen
falcon      +1-541-296-2271      +1-511-296-2271                 10
dog         +1-542-296-2271                  NaN                  2
cat         +1-543-296-2271      +1-531-296-2271                  3
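
The extract-and-join approach can be easier to follow when split into steps. This is a sketch under a few assumptions: the variable names (`contact`, `present`, `parts`, `joined`, `result`) are illustrative, the pattern is anchored with ^ and $, and the rows are joined with `apply('-'.join, axis=1)` rather than by building lists first.

```python
import pandas as pd

contact = pd.Series(['15112962271', None, '15312962271'],
                    index=['falcon', 'dog', 'cat'])

# Step 1: drop missing values so only real strings are processed.
present = contact.dropna()

# Step 2: split each number into four columns, one per capture group.
parts = present.str.extract(r'^(\d)(\d{3})(\d{3})(\d{4})$')

# Step 3: join the four pieces of each row with '-' and prepend '+'.
joined = '+' + parts.apply('-'.join, axis=1)

# Step 4: restore the dropped rows; they come back as NaN.
result = joined.reindex(contact.index)

print(result)
```

Assigning `joined` straight back to a DataFrame column has the same effect as step 4, since pandas aligns on the index and fills the missing labels with NaN.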

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew
Solution 2 richardec