Azure Databricks: Python parallel for loop
I am using Azure Databricks to analyze some data. I have the following folder structure in blob storage:
```
folder_1\
    n1 csv files
folder_2\
    n2 csv files
...
folder_k\
    nk csv files
```
I want to read these files, run some algorithm (relatively simple) and write out some log files and image files for each of the csv files in a similar folder structure at another blob storage location. Right now I have a simple loop structure to do this:
```python
for folder in folders:
    # set up some stuff
    for file in files:
        # do the work and write out results
```
In total, the blob storage contains about 150k files. Is there a way to parallelize this?
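One common approach, since each file is processed independently, is a thread pool: threads overlap the I/O waits of reading from and writing to blob storage. This is a minimal sketch, not the question's actual code — `process_file` is a hypothetical stand-in for the per-file algorithm, and the file list here is synthetic:

```python
from concurrent.futures import ThreadPoolExecutor

def process_file(path):
    # Hypothetical stand-in for "do the work and write out results":
    # read the csv at `path`, run the algorithm, write logs/images.
    return f"processed {path}"

# Synthetic file list standing in for the real blob-storage paths.
files = [f"folder_{k}/file_{i}.csv" for k in range(3) for i in range(4)]

# Tune max_workers to the driver's capacity; for I/O-bound work it can
# exceed the core count.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_file, files))
```

A thread pool only uses the driver node. To spread 150k files across the whole cluster, another option is to distribute the path list as an RDD, e.g. `spark.sparkContext.parallelize(files).foreach(process_file)`, so each executor processes a partition of the files.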
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
