'Handling partitioned data in Azure?

I have some containers in ADLS (gen2) and have multiple folders within that container. I would like to have a mechanism to scan those folders to infer their schema and detect partitions and update them in the data catalog. How do I achieve this functionality in Azure?

Sample:

- container1
---table1-folder
-----10-12-1970
-------files1.parquet
-------files2.parquet
-------files3.parquet
-----10-13-1970
-------files1.parquet
-------files2.parquet
-------files3.parquet
-----10-14-1970
-------files1.parquet
-------files2.parquet
----table2-folder
-----zipcode1
-------files1.parquet
-------files2.parquet
-------files3.parquet
-----zipcode2
-------files1.parquet
-------files2.parquet

...

So, what I expect is that in the catalog, it will create two tables (table1 & table2) where table1 will have date-based partitions (3 dates for this case) and have underline data within that table. Same for table2 which will have two partitions and their underline data.

In the AWS world, I can run a Glue crawler that can crawl these files, infers schemas and partitions, and populate Glue data catalogs, later I can query them through Athena. What's the Azure equivalent approach to achieve something similar?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source