Regular expression for getting the file path up to a given slash number

I have a log table with a column 'path' holding values like root/home/desktop/parent/child/grandchild. I want to group by this column based on an integer input 'n', where n is the number of slashes up to which the substring is extracted before grouping. For example, with n = 1 I would want to group by 'root/', and with n = 3 I would want to group by 'root/home/desktop/'. How can I achieve this in BigQuery? Can I use a regex for this, or is there a better way? I would appreciate a bit of explanation for whichever approach is suggested. Thanks!



Solution 1:[1]

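If you want to use a regular expression, consider the approach below. The pattern is anchored at the start of the string and repeats, n times, the group "one or more non-slash characters, then an optional slash", so it captures the first n path segments, including the slash after each segment where one exists.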

create temp function get_path(path string, n int64) as ((
  regexp_extract(path, r'(^(?:[^/]+/?){' || n || '})')
));
with your_table as (
  select 'root/home/desktop/parent/child/grandchild' path
)
select  
  get_path(path, 1) n1,
  get_path(path, 2) n2,
  get_path(path, 3) n3,
  get_path(path, 4) n4,
  get_path(path, 5) n5,
  get_path(path, 6) n6,
from your_table    

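Since the question is ultimately about grouping, here is a minimal sketch of how the regex-based function could feed a GROUP BY. The extra sample rows, the path_prefix and row_count aliases, and the fixed depth of 3 are assumptions made for illustration; the explicit cast is simply a more conservative way of writing the same concatenation.

```sql
-- Sketch only: same idea as the get_path function above,
-- with n cast to STRING explicitly before building the pattern.
create temp function get_path(path string, n int64) as ((
  regexp_extract(path, r'(^(?:[^/]+/?){' || cast(n as string) || '})')
));
with your_table as (
  -- hypothetical sample rows standing in for the log table
  select 'root/home/desktop/parent/child/grandchild' as path union all
  select 'root/home/desktop/other/file' union all
  select 'root/var/log/app'
)
select
  get_path(path, 3) as path_prefix,  -- depth 3 chosen for illustration
  count(*) as row_count
from your_table
group by path_prefix
```

With these sample rows, the first two paths should fall into the 'root/home/desktop/' group and the third into 'root/var/log/'.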

Solution 2:[2]

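The example below should need little extra explanation: split breaks the path on '/', and safe_offset picks each segment by position, returning NULL when the path has fewer segments.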

select *, 
  split(path, '/')[safe_offset(0)],
  split(path, '/')[safe_offset(1)],
  split(path, '/')[safe_offset(2)],
  split(path, '/')[safe_offset(3)],
  split(path, '/')[safe_offset(4)],
  split(path, '/')[safe_offset(5)]
from your_table    


I would like to have the splits combined in the form of a string until the last slash ...

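To get a partial path from the beginning, use the example below. It splits the path on '/', keeps the segments whose offset is below n, and joins them back together with '/'.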

create temp function get_path(path string, n int64) as ((
  select string_agg(part, '/' order by offset)
  from unnest(split(path, '/')) part with offset
  where offset < n
));
select  
  get_path(path, 1) n1,
  get_path(path, 2) n2,
  get_path(path, 3) n3,
  get_path(path, 4) n4,
  get_path(path, 5) n5,
  get_path(path, 6) n6
from your_table

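Note that the split-based function joins segments without a trailing slash, so a depth of 3 gives root/home/desktop rather than the root/home/desktop/ asked for in the question. If the trailing slash matters, one possible sketch is to append it and then group on the result. The name get_path_slash, the sample rows, and the aliases are hypothetical. This variant also adds a slash after the last segment when n reaches or exceeds the number of segments.

```sql
-- Sketch only: split-based prefix with a trailing slash appended.
-- get_path_slash is a hypothetical name, not part of the original answer.
create temp function get_path_slash(path string, n int64) as ((
  select string_agg(part, '/' order by offset) || '/'
  from unnest(split(path, '/')) part with offset
  where offset < n
));
with your_table as (
  -- hypothetical sample rows standing in for the log table
  select 'root/home/desktop/parent/child/grandchild' as path union all
  select 'root/home/desktop/other/file' union all
  select 'root/var/log/app'
)
select
  get_path_slash(path, 3) as path_prefix,  -- depth 3 chosen for illustration
  count(*) as row_count
from your_table
group by path_prefix
```

Either function can be used for the grouping; the choice mostly comes down to whether a regex or the split-and-join form is easier to read and maintain.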

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mikhail Berlyant
Solution 2