Utilizing multiple processors to load a large number of files for pandas (Python)
I need to load hundreds or thousands of JSON files into one big pandas DataFrame. My current solution uses a for loop to iterate over the directory, which is slow and does not utilize the machine's multiple CPUs.
I already have a basic idea: put all the filenames into a list, split it into multiple chunks, and have each thread or process work on one chunk.
What's the best way of doing this while utilizing multiple CPUs? I thought of using fork, but it seems that a child process cannot return a data structure to the parent process.
Thank you
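One possible approach, sketched below under my own assumptions (directory layout, function names, and the `*.json` glob pattern are all hypothetical): `multiprocessing.Pool` runs the loader in worker processes and pickles each resulting DataFrame back to the parent, which sidesteps the fork/return problem mentioned above. `Pool.map` also handles splitting the filename list into chunks for you.

```python
import json
from multiprocessing import Pool
from pathlib import Path

import pandas as pd


def load_file(path):
    """Read one JSON file into a DataFrame (runs in a worker process)."""
    with open(path) as f:
        return pd.DataFrame(json.load(f))


def load_all(directory, workers=4):
    """Load every .json file in `directory` in parallel, then concatenate.

    Pool.map distributes the paths across `workers` processes; each
    worker's DataFrame is pickled and returned to the parent process.
    """
    paths = sorted(Path(directory).glob("*.json"))
    with Pool(workers) as pool:
        frames = pool.map(load_file, paths)
    return pd.concat(frames, ignore_index=True)
```

`concurrent.futures.ProcessPoolExecutor` offers an equivalent, slightly higher-level API. Note that for many small files the pickling overhead of sending DataFrames back can dominate, so it's worth benchmarking against the plain loop.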
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow