Azure Data Factory: Unzipping many files into partitions based on filename

I have a large zip file that has 900k json files in it. I need to process these with a data flow. I'd like to organize the files into folders using the last two digits in the file name so I can process them in chunks of 10k. My question is: how do I set up a pipeline to use part of the file name of the files in the zip file (the source) as part of the path in the sink?

current setup: zipfile.zip -> /json/XXXXXX.json

desired setup: zipfile.zip -> /json/XXX/XXXXXX.json
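The intended source-name-to-sink-path mapping can be sketched in plain Python (a local illustration only, not ADF itself; the two-digit bucket and the `json/` prefix are assumptions based on the question):

```python
from pathlib import Path

def partition_path(name: str, digits: int = 2) -> str:
    """Map a source file name to its partitioned sink path,
    e.g. 123456.json -> json/56/123456.json, using the last
    `digits` characters of the name as the folder."""
    bucket = Path(name).stem[-digits:]  # last N characters before ".json"
    return f"json/{bucket}/{name}"
```

With two digits there are at most 100 buckets, so 900k files land in folders of roughly 9k each, close to the 10k chunk size mentioned above.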



Solution 1:[1]

Please check if the references below can help.

In source transformation, you can read from a container, folder, or individual file in Azure Blob storage. Use the Source options tab to manage how the files are read. Using a wildcard pattern will instruct the service to loop through each matching folder and file in a single source transformation. This is an effective way to process multiple files within a single flow.

  • `[]` matches one or more characters in the brackets.
  • `/data/sales/**/*.csv` gets all .csv files under `/data/sales`.

Please also go through Copy and transform data in Azure Blob storage - Azure Data Factory & Azure Synapse | Microsoft Docs for other patterns and to check all filtering possibilities in Azure Blob storage.

  1. How to UnZip Multiple Files which are stored on Azure Blob Storage By using Azure Data Factory - Bing video

In the sink transformation, you can write to either a container or a folder in Azure Blob storage. The File name option determines how the destination files are named in the destination folder.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 kavyasaraboju-MT