Airflow GCSFileTransformOperator source object filename wildcard
I am working on a DAG that should read an XML file, apply some transformations to it, and land the result as a CSV. For this I am using GCSFileTransformOperator.
Example:
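```python
from airflow.providers.google.cloud.operators.gcs import GCSFileTransformOperator

# Download the XML from GCS, run the transform script on it, upload the CSV result.
xml_to_csv = GCSFileTransformOperator(
    task_id='xml_to_csv',
    source_bucket='source_bucket',
    source_object='raw/dt=2022-01-19/File_20220119_4302.xml',
    destination_bucket='destination_bucket',
    destination_object='csv_format/dt=2022-01-19/File_20220119_4302.csv',
    transform_script=['/path_to_script/transform_script.py'],
)
```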
My problem is that the filename ends with a 4-digit number (the `4302` in `File_20220119_4302`) that changes every day.
I can use templates for the execution date (`{{ ds }}`, `{{ ds_nodash }}`), but I am not sure what to do with the number.
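For reference, here is a minimal sketch of what the date templates resolve to, assuming Airflow's documented formats (`{{ ds }}` renders as `YYYY-MM-DD`, `{{ ds_nodash }}` as `YYYYMMDD`). The helper names `render_date_parts` and `source_prefix` are hypothetical; the point is that everything up to the trailing underscore can be derived from the date, and only the 4-digit suffix cannot.

```python
from datetime import date


def render_date_parts(logical_date: date) -> dict[str, str]:
    """Mimic what Airflow's {{ ds }} and {{ ds_nodash }} render to (hypothetical helper)."""
    return {
        "ds": logical_date.strftime("%Y-%m-%d"),       # e.g. 2022-01-19
        "ds_nodash": logical_date.strftime("%Y%m%d"),  # e.g. 20220119
    }


def source_prefix(logical_date: date) -> str:
    """Build the known part of the source object name; the 4-digit suffix is still unknown."""
    parts = render_date_parts(logical_date)
    return f"raw/dt={parts['ds']}/File_{parts['ds_nodash']}_"


print(source_prefix(date(2022, 1, 19)))  # raw/dt=2022-01-19/File_20220119_
```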
I have tried wildcards like File_20220119_*.xml, with no success.
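A wildcard most likely fails because the operator downloads a single object by its exact name, so `File_20220119_*.xml` is looked up literally rather than expanded. One possible workaround, sketched here under assumptions, is to list the objects under the known prefix first (for example with `GCSHook.list(bucket_name, prefix=...)` in an upstream task) and then pick the matching name client-side. `resolve_source_object` is a hypothetical helper that only performs that matching step on an already-listed set of names.

```python
import re


def resolve_source_object(object_names: list[str], ds: str, ds_nodash: str) -> str:
    """Return the one object named raw/dt=<ds>/File_<ds_nodash>_NNNN.xml.

    object_names is assumed to come from a bucket listing done elsewhere;
    this hypothetical helper does no GCS calls itself.
    """
    # Exactly four digits before ".xml"; the date parts are escaped literally.
    pattern = re.compile(
        rf"raw/dt={re.escape(ds)}/File_{re.escape(ds_nodash)}_\d{{4}}\.xml"
    )
    matches = [name for name in object_names if pattern.fullmatch(name)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}: {matches}")
    return matches[0]


names = [
    "raw/dt=2022-01-19/File_20220119_4302.xml",
    "raw/dt=2022-01-19/other.txt",
]
print(resolve_source_object(names, "2022-01-19", "20220119"))
```

If an upstream task returns the resolved name, it could then be fed into `source_object` (and a matching `destination_object`) through an XCom template, provided those fields are templated in the provider version you use.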
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
