'Select the depth in SQL path using regex expression
I am querying an S3 using Athena and I want to select the key until certain path level. Here are some key examples:
- project=proj1/ID=SS02769/input=client1/version=X3900/data/fold1/file1.csv
- project=proj1/ID=SS02770/input=client1/version=X4500/data/fold1/file2.csv
- project=proj1/ID=SS02775/input=client1/version=X9000/data/fold1/file3.csv
I want to query these rows and select all the string until 'data/' is there any regex expression to use or any sql expression? Actually I tried the next regex expression but does not work :
regex_match = fr'([^/]*[/]){{{depth}}}'
with depth=4 and with the next query :
f"SELECT REGEXP_EXTRACT(key,'{regex_match}') AS path FROM bucket_number_1)\
GROUP BY REGEXP_EXTRACT(key, '{regex_match}')"
but I dot empty dataframe
Any help is appreciated !
Solution 1:[1]
You can use
regex_match = r'^((?:[^/]*/){5})'
Details:
^- start of string((?:[^/]*/){5})- Capturing group 1:(?:[^/]*/){5}- five consecutive occurrences of[^/]*- any zero or more chars other than a/char/- a/char.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
