'Issue with splitting string in bigquery

I cant seem to split this string, I would like to get the last 8 numbers as 'Date' in the format of YYYY-MM-DD. I would like to use the _filename to feed through and generate the date.

select

split(split("gs://mmmm_ssss_count_detail/mmm_ssss_count_detail_20220125.csv",'/')[offset(2)],'_')[offset(3)]

This just gives me 'detail' which is not what I need, it should at least give me '20220125.csv' where I can then remove the .csv part and parsedate it to 'Date' in my main select query.

Help please.



Solution 1:[1]

Instead of a split, can you use a regex to find the date value? If so try the following:

select 
    parse_date("%Y%m%d",regexp_extract(path, r'_(\d+)\.csv'))
from sample_data

The regex string above is looking for a pattern where a set of digits (\d+) exists between an _ and .csv

With the sample string you provided it yields:

enter image description here

Solution 2:[2]

Consider below approach / option

select _filename, 
  parse_date('%Y%m%d', array_reverse(split(replace(_filename, '.', '_'), '_'))[offset(1)]) file_date
from your_table      

if applied to sample data in your question - output is

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Daniel Zagales
Solution 2 Mikhail Berlyant