How to find if a file pattern is in the S3 bucket location
I need to check whether any file in an S3 folder has a name matching one of the values in the `file_pattern` column of a Snowflake table:

I have a table (`Config`) in Snowflake with a column called `file_pattern` that holds values like `.*file_name_pattern*.csv`. It works like a wildcard in SQL: before and after each `*` there can be any value. The extension is not always `.csv`; there are other file formats as well, such as `.txt` and `.xls`. I need to compare each `file_pattern` with the S3 bucket's file list and see whether the folder contains anything that matches it.
```sql
SELECT file_pattern FROM Config;
```
| file_pattern |
|---|
| .*file_name_pattern1*.csv |
| .*file_name_pattern2*.txt |
| .*file_name_pattern3*.png |
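One way to give these patterns a precise meaning is to translate each one into a regular expression. The sketch below assumes that both `*` and `.*` mean "any sequence of characters" (the SQL-wildcard reading described above), that every other character is literal, and that matching should be case-insensitive, since the sample key ends in `.CSV` while the pattern ends in `.csv`. The helper names `pattern_to_regex` and `matches_pattern` are hypothetical.

```python
import posixpath
import re


def pattern_to_regex(file_pattern: str) -> re.Pattern:
    """Translate a Config.file_pattern value into a compiled regex.

    Assumption: '*' and '.*' both mean "any sequence of characters";
    every other character (including the '.' of an extension) is literal.
    """
    # Split on '.*' or a bare '*'; the pieces in between are literal text.
    literal_parts = re.split(r"\.?\*", file_pattern)
    # Escape each literal piece and rejoin the pieces with a regex wildcard.
    regex = ".*".join(re.escape(part) for part in literal_parts)
    # IGNORECASE so '.csv' in the pattern also matches '.CSV' in the key.
    return re.compile(regex, re.IGNORECASE)


def matches_pattern(s3_key: str, file_pattern: str) -> bool:
    """Return True if the file-name part of an S3 key matches the pattern."""
    file_name = posixpath.basename(s3_key)  # drop the DIR1/DIR2/... prefix
    # fullmatch, so the extension at the end of the pattern must end the name.
    return pattern_to_regex(file_pattern).fullmatch(file_name) is not None
```

With these assumptions, `matches_pattern("DIR1/DIR2/DIR3/DIR4/file_name_pattern1_20190904.CSV", ".*file_name_pattern1*.csv")` is `True`, while the same key checked against a `.txt` pattern is `False`. One consequence of treating `.*` as a wildcard: a pattern such as `name.*` would also match `name_x`, so adjust the split if your patterns use a literal `.` right before `*`.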
Below is a sample S3 folder structure:
DIR1/DIR2/DIR3/DIR4/file_name_pattern1_20190904.CSV
### Question
How do I compare the wildcard values in the Snowflake column `file_pattern` against the S3 folder? The tricky part is that I also need to take the extension (e.g. `.csv`) into account, not just the `file_name_pattern` part.

I tried splitting a record such as `*file_name_pattern*.csv` on `*`, but comparing only the middle piece again ignores the `.csv` at the end.
```python
File_Pattern = ['.*file_name_pattern1*.csv', '.*file_name_pattern2*.pgp', '*File_name_pattern.*.txt']

# One entry from the S3 folder/file listing, obtained by connecting to S3 with boto3
item = {'Key': 'DIR1/DIR2/DIR3/DIR4/file_name_pattern_20190904.CSV'}
```
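The `item['Key']` values come from listing the bucket with boto3. A minimal sketch of that listing step, assuming a boto3 S3 client is passed in: `list_objects_v2` returns at most 1,000 keys per call, so a paginator is used to walk all pages. `iter_s3_keys` is a hypothetical helper name, and the bucket name and prefix in the usage comment are placeholders.

```python
from typing import Iterator


def iter_s3_keys(s3_client, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under `prefix` in `bucket`.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when no objects match the prefix.
        for item in page.get("Contents", []):
            yield item["Key"]


# Usage (placeholders, requires AWS credentials):
# import boto3
# s3 = boto3.client("s3")
# for key in iter_s3_keys(s3, "my-bucket", "DIR1/DIR2/DIR3/DIR4/"):
#     print(key)
```

Taking the client as a parameter keeps the function easy to test with a stand-in object and lets you reuse a client configured elsewhere.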
Below is my first attempt, which did not work because it checks whether the whole pattern string, asterisks included, literally appears in the key:

```python
for file in File_Pattern:
    if file in item['Key']:
        ...  # run a query
```
Try 2:

```python
for file in File_Pattern:
    file_1 = file.split('*')  # result was like ['.', 'file_name_pattern1', '.csv']
    if file_1[1] in item['Key']:
        ...  # run a query
```

This misses the `.csv`, because only `file_1[1]` is compared.

I am not sure how to handle this.
### Solution 1
I am not sure I understand the exact problem, but it sounds like the `*` in the file pattern is tripping you up.

I think you are on the right path. It might work to ignore the `*`, since the `in` operator will look for the segment between the `*`s anyway, and to focus on the end of the file name (i.e., the file type: `.csv`, `.txt`).

Taking your second try, you might, for example, tweak it a bit to add a second condition that accounts for the extension:
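```python
for file in File_Pattern:
    file_1 = file.split('*')  # result was like ['.', 'file_name_pattern1', '.csv']
    if file_1[1] in item['Key'] and file_1[-1] == item['Key'][-4:]:
        ...  # run a query
```

Here you are asking both the pattern and the extension to match.

`file_1[-1]` is the `.csv`, `.txt`, etc., i.e., the extension, which is the last element of the split result.

`item['Key'][-4:]` gives you the last 4 characters of the key. Note that `==` is case-sensitive, so `.csv` will not match the `.CSV` in the sample key, and `[-4:]` only fits four-character extensions such as `.csv`; `item['Key'].lower().endswith(file_1[-1].lower())` handles both cases.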
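The same idea can be extended to patterns with any number of `*`s and to extensions of any length (e.g. `.xlsx`): split the pattern on `*`, require the name to start with the first piece and end with the last piece, and require the middle pieces to appear in order in between. This is a sketch, not the code from the answer above; it assumes that a leading `.` directly before the first `*` (as in `.*name*.csv`) is a regex-style prefix that can be dropped, and it lowercases both sides so `.csv` matches `.CSV`. `matches_in_order` is a hypothetical name.

```python
import posixpath


def matches_in_order(s3_key: str, file_pattern: str) -> bool:
    """Check a '*' wildcard pattern against the file name of an S3 key.

    Assumptions: '*' means "any characters", matching is case-insensitive,
    and a leading '.' directly before the first '*' can be ignored.
    """
    name = posixpath.basename(s3_key).lower()
    pattern = file_pattern.lower()
    if pattern.startswith(".*"):
        pattern = pattern[1:]  # '.*name*.csv' -> '*name*.csv'
    pieces = pattern.split("*")  # e.g. ['', 'name', '.csv']
    if len(pieces) == 1:
        return name == pattern  # no wildcard at all: exact match
    first, last = pieces[0], pieces[-1]
    # The name must start with the text before the first '*'
    # and end with the text after the last '*' (usually the extension).
    if not name.startswith(first) or not name.endswith(last):
        return False
    # The middle pieces must appear in order between the prefix and suffix.
    pos = len(first)
    end = len(name) - len(last)
    for piece in pieces[1:-1]:
        idx = name.find(piece, pos, end)
        if idx == -1:
            return False
        pos = idx + len(piece)
    # Guard against prefix and suffix overlapping (e.g. 'ab*b' vs 'ab').
    return pos <= end
```

Unlike the `in` check, this anchors the extension to the end of the name, so a key such as `report.csv.bak` is not mistaken for a `.csv` file.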
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Adil Hindistan |
