'Extracting Business Extension from a phone number in Bigquery

I want to separate phone number into 2 fields in bigquery using regexp expression

  1. PhoneNumber
  2. Business Extension

The input looks like enter image description here

The expected output should look like enter image description here



Solution 1:[1]

Consider below approach

select FullPhoneNumber, 
  regexp_replace(FullPhoneNumber, r'(?i)(.*?)(e?xt?\.?\s*\d+)$', r'\1') as PhoneNumber,
  regexp_extract(FullPhoneNumber, r'(?i)e?xt?\.?\s*(\d+)$') as Extension
from your_table           

if applied to sample/dummy data

with your_table as (
  select "408-697-4639 x. 301" FullPhoneNumber union all 
  select "563-232-1400x6740" union all 
  select "248-456-8250ext. 100" union all 
  select "(123) 456-7890"
)  

   

output is

enter image description here

Solution 2:[2]

A fixed query. An extension always has the character x.

To replace the brackets the inner Select is needed.

with tbl as (
Select "408-697-4639 x. 301" phone
Union ALL Select "563-232-1400x6740"
Union ALL Select "248-456-8250ext. 100"
Union ALL Select " (781) 512-3435")


SELECT *,REGEXP_EXTRACT(phone, r"^[0-9\s\(\)]{0,8}[0-9\-]+") as phoneNumber,
REGEXP_EXTRACT(REGEXP_EXTRACT(phone, r"[xX].*\d+$"),r"\d+$") as extension
#FROM 
#(Select regexp_replace(phone,r"\s*\(\s*(\d+)\s*\)\s*", r"\1-")  as phone # Remove brackets
From tbl
#)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mikhail Berlyant
Solution 2