Utilizing multiple processors to load a large number of files for pandas (Python)
I need to load hundreds or thousands of JSON files into one big pandas DataFrame. My current solution uses a for loop to iterate over the directory, which is slow and does not utilize the machine's multiple CPUs.
I already have a basic idea: put all the filenames into a list, split it into multiple chunks, and have each thread or process work on one chunk.
What's the best way of doing this while utilizing multiple CPUs? I thought of using fork, but it seems that a child process cannot return a data structure to the parent process.
Thank you
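One possible approach, sketched below under my own assumptions (directory layout, function names, and the `*.json` glob pattern are all hypothetical): `multiprocessing.Pool` runs the loader in worker processes and pickles each resulting DataFrame back to the parent, which sidesteps the fork/return problem mentioned above. `Pool.map` also handles splitting the filename list into chunks for you.

```python
import json
from multiprocessing import Pool
from pathlib import Path

import pandas as pd


def load_file(path):
    """Read one JSON file into a DataFrame (runs in a worker process)."""
    with open(path) as f:
        return pd.DataFrame(json.load(f))


def load_all(directory, workers=4):
    """Load every .json file in `directory` in parallel, then concatenate.

    Pool.map distributes the paths across `workers` processes; each
    worker's DataFrame is pickled and returned to the parent process.
    """
    paths = sorted(Path(directory).glob("*.json"))
    with Pool(workers) as pool:
        frames = pool.map(load_file, paths)
    return pd.concat(frames, ignore_index=True)
```

`concurrent.futures.ProcessPoolExecutor` offers an equivalent, slightly higher-level API. Note that for many small files the pickling overhead of sending DataFrames back can dominate, so it's worth benchmarking against the plain loop.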
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow