Azure Data Factory: Unzipping many files into partitions based on filename
I have a large zip file that contains 900k JSON files. I need to process these with a data flow. I'd like to organize the files into folders using the last two digits in the file name so I can process them in chunks of 10k. My question is: how do I set up a pipeline that uses part of the file name of the files in the zip (the source) as part of the path in the sink?
current setup: zipfile.zip -> /json/XXXXXX.json
desired setup: zipfile.zip -> /json/XXX/XXXXXX.json
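To make the desired mapping concrete, here is a minimal Python sketch of the name-to-folder rule (bucketing on the last two digits of the file name, as the question describes; the helper name is mine, not part of ADF):

```python
def partitioned_path(filename: str) -> str:
    """Map XXXXXX.json -> /json/<last two digits>/XXXXXX.json.

    Illustrative only: this just mirrors the sink folder layout
    described above, it is not an ADF API.
    """
    stem = filename.rsplit(".", 1)[0]
    bucket = stem[-2:]  # last two digits -> 100 buckets of ~9k files each
    return f"/json/{bucket}/{filename}"

print(partitioned_path("123456.json"))  # -> /json/56/123456.json
```

With 900k files, two-digit buckets yield 100 folders of roughly 9k files each, which lines up with processing in chunks of about 10k.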
Solution 1:[1]
Please check if the references below can help.
In source transformation, you can read from a container, folder, or individual file in Azure Blob storage. Use the Source options tab to manage how the files are read. Using a wildcard pattern will instruct the service to loop through each matching folder and file in a single source transformation. This is an effective way to process multiple files within a single flow.
- `[]` — matches one or more characters in the brackets.
- `/data/sales/**/*.csv` — gets all .csv files under `/data/sales`.
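As an analogy only (ADF evaluates wildcard paths service-side), Python's `pathlib` uses the same `**` recursive convention, so the pattern can be tried locally:

```python
import tempfile
from pathlib import Path

# Build a tiny sample tree, then match it with data/sales/**/*.csv.
root = Path(tempfile.mkdtemp())
(root / "data/sales/2023").mkdir(parents=True)
(root / "data/sales/2023/jan.csv").write_text("a,b\n")
(root / "data/sales/notes.txt").write_text("not a csv\n")

matches = sorted(p.relative_to(root).as_posix()
                 for p in root.glob("data/sales/**/*.csv"))
print(matches)  # only the .csv files under data/sales
```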
And please go through Copy and transform data in Azure Blob storage - Azure Data Factory & Azure Synapse | Microsoft Docs for other patterns and to check all filtering possibilities in Azure Blob Storage.
In the sink transformation, you can write to either a container or a folder in Azure Blob storage. The File name option determines how the destination files are named in the destination folder.
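Outside ADF, the whole source-name-to-sink-path step can be sketched as a local script, assuming members are named like `XXXXXX.json` (the function name and output layout are mine, mirroring the setup described in the question, not an ADF API):

```python
import zipfile
from pathlib import Path

def extract_into_buckets(zip_path: str, out_root: str) -> None:
    """Unpack a zip and route each .json member into a subfolder
    named after the last two digits of its file name, producing
    the /json/<bucket>/<file>.json layout described above."""
    out = Path(out_root)
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            name = Path(member).name
            if not name.endswith(".json"):
                continue  # skip directory entries and non-JSON members
            bucket = name.rsplit(".", 1)[0][-2:]
            target_dir = out / "json" / bucket
            target_dir.mkdir(parents=True, exist_ok=True)
            (target_dir / name).write_bytes(zf.read(member))
```

For 900k entries this streams one member at a time rather than extracting everything flat first, so no intermediate /json folder with all files in one directory is needed.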
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | kavyasaraboju-MT |
