Azure Databricks: Python parallel for loop
I am using Azure Databricks to analyze some data. I have the following folder structure in blob storage:
```
folder_1\
    n1 csv files
folder_2\
    n2 csv files
...
folder_k\
    nk csv files
```
I want to read these files, run some algorithm (relatively simple) and write out some log files and image files for each of the csv files in a similar folder structure at another blob storage location. Right now I have a simple loop structure to do this:
```python
for folder in folders:
    # set up some stuff
    for file in files:
        # do the work and write out results
```
In total, the blob storage contains about 150k files. Is there a way to parallelize this?
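One common approach, since each file is processed independently, is a thread pool: threads overlap the I/O waits of reading from and writing to blob storage. This is a minimal sketch, not the question's actual code — `process_file` is a hypothetical stand-in for the per-file algorithm, and the file list here is synthetic:

```python
from concurrent.futures import ThreadPoolExecutor

def process_file(path):
    # Hypothetical stand-in for "do the work and write out results":
    # read the csv at `path`, run the algorithm, write logs/images.
    return f"processed {path}"

# Synthetic file list standing in for the real blob-storage paths.
files = [f"folder_{k}/file_{i}.csv" for k in range(3) for i in range(4)]

# Tune max_workers to the driver's capacity; for I/O-bound work it can
# exceed the core count.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_file, files))
```

A thread pool only uses the driver node. To spread 150k files across the whole cluster, another option is to distribute the path list as an RDD, e.g. `spark.sparkContext.parallelize(files).foreach(process_file)`, so each executor processes a partition of the files.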
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
