Processing multiple files simultaneously and writing output to multiple files
I have a flow that processes files (a minimal sketch in code follows the list):
- Input: a list of N (N ≈ 10^5) input files: `input_file_list`.
- Processing: run a function that extracts data from each input file: `process_single_file(input_file_list[i])`. Each input file yields M_i records; you can think of a record as a dictionary.
- Output: each record is written to an output file chosen by the value of a certain field in that record.
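To make the flow concrete, here is a minimal serial sketch of what I'm doing (the JSON-lines parsing and the `key` routing field are simplified stand-ins for my real logic):

```python
import json
from pathlib import Path

def process_single_file(path):
    # Placeholder parser: my real function extracts M_i records
    # (dicts) from one input file; JSON lines is just an example.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def run_pipeline(input_file_list):
    out_dir = Path("out")
    out_dir.mkdir(exist_ok=True)
    for path in input_file_list:                   # iterate over N input files
        for record in process_single_file(path):  # extract M_i records
            # Route the record by one of its fields
            # ("key" is a stand-in for my real routing field).
            out_path = out_dir / f"{record['key']}.jsonl"
            with open(out_path, "a", encoding="utf-8") as out:
                out.write(json.dumps(record) + "\n")
```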
I want to reduce the processing time. I tried multithreading in Python, but I'm not sure I implemented it correctly; the processing time did not improve.
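Roughly, my threaded attempt looks like this (`max_workers=8` is arbitrary; `process_single_file` and `input_file_list` are as in the sketch above):

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def write_record(record):
    # Append the record to the output file chosen by its routing field.
    out_path = Path("out") / f"{record['key']}.jsonl"
    with open(out_path, "a", encoding="utf-8") as out:
        out.write(json.dumps(record) + "\n")

def process_and_write(path):
    # Extract all records from one input file and write each one out.
    for record in process_single_file(path):
        write_record(record)

with ThreadPoolExecutor(max_workers=8) as pool:
    # Threads pull input files from the list concurrently.
    list(pool.map(process_and_write, input_file_list))
```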
Please recommend some patterns or good strategies to speed this up. It would be great if you could also explain why each strategy works.
Another question: what would change if, after step 2, I had to query a database for each record to fetch more information for it?
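To make that concrete, the extra step would sit between extraction and writing, one query per record (sqlite3, the `details` table, and the `id` field are hypothetical stand-ins for my real database and schema):

```python
import sqlite3  # stand-in: my real database is a separate server

def enrich_record(record, conn):
    # Hypothetical per-record lookup: the table, column, and "id"
    # field below are examples, not my real schema.
    row = conn.execute(
        "SELECT extra_info FROM details WHERE id = ?",
        (record["id"],),
    ).fetchone()
    if row is not None:
        record["extra_info"] = row[0]
    return record
```

With N ≈ 10^5 files and M_i records per file, that is potentially millions of queries, so I'd like to know whether the same parallelization strategy still applies.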

