Reading and writing files using parallelism or concurrency
I have a task to create two functions:
1. Create n .txt files, each named after an ordinal number and containing the Fibonacci number at that position (e.g. 10.txt contains the integer 55).
2. Take the path to the directory containing the files generated by the first function and create a CSV file in which each row has the form 'ordinal,value' (e.g. 10,55).
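For reference, a minimal sequential sketch of the two functions described above. The names `fib`, `generate_files`, `generate_csv` and the default file name `fibo.csv` are hypothetical and not part of the original task statement; the sketch also assumes that every .txt file in the directory is named after an integer.

```python
import csv
import os


def fib(n):
    """Return the n-th Fibonacci number, counting fib(1) == fib(2) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def generate_files(directory, n):
    """Write 1.txt .. n.txt, each holding the matching Fibonacci number."""
    os.makedirs(directory, exist_ok=True)
    for i in range(1, n + 1):
        with open(os.path.join(directory, f'{i}.txt'), 'w', encoding='UTF-8') as f:
            f.write(str(fib(i)))


def generate_csv(directory, csv_name='fibo.csv'):
    """Collect every N.txt in `directory` into rows of the form ordinal,value."""
    names = [name for name in os.listdir(directory) if name.endswith('.txt')]
    # Assumption: file names are integers, so they can be sorted numerically.
    names.sort(key=lambda name: int(os.path.splitext(name)[0]))
    csv_path = os.path.join(directory, csv_name)
    with open(csv_path, 'w', newline='', encoding='UTF-8') as fw:
        writer = csv.writer(fw)
        for name in names:
            with open(os.path.join(directory, name), encoding='UTF-8') as f:
                writer.writerow([os.path.splitext(name)[0], f.read().strip()])
    return csv_path
```

Calling `generate_files('out', 10)` followed by `generate_csv('out')` would produce a CSV whose last row is `10,55`, matching the example in the task.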
Each of them is easy on its own, but the task is to use parallelism or concurrency to make the code more efficient, and I have no idea how to do that. I assumed this is an I/O-bound task, chose multithreading, and tried to solve it this way:

    import csv
    import os
    import threading
    import time

    def generate_csv_threads(path, lock):
        to_row = []
        files = [file for file in os.listdir(path) if 'txt' in file]
        for _ in range(len(files)):
            lock.acquire()
            with open(f'{path}/{files[0]}', 'r') as f:
                to_row.append([os.path.splitext(files[0])[0], f.read()])
            del files[0]
            lock.release()
        for _ in range(len(to_row)):
            with open(f'{path}/fibo_csv_threading.csv', 'a+', encoding='UTF-8') as fw:
                writer = csv.writer(fw)
                writer.writerow(to_row[0])
                del to_row[0]

    lock = threading.Lock()
    threads = [threading.Thread(target=generate_csv_threads, args=(path_to_file, lock))
               for _ in range(50)]
    start_th = time.time()
    [t.start() for t in threads]
    [t.join() for t in threads]
    print('Csv threading time = ', time.time() - start_th)

However, with 50 threads and 30 files this generates 1500 (50 × 30) rows in the CSV. I also tried this approach:

    def test(path, lock):
        to_row = []
        global files
        lock.acquire()
        task = files[0]
        with open(f'{path}/{task}', 'r') as f:
            to_row.append([os.path.splitext(task)[0]])
        del files[0]
        lock.release()
        with open(f'{path}/fibo_csv_threading.csv', 'a+', encoding='UTF-8') as fw:
            writer = csv.writer(fw)
            writer.writerow(to_row[0])
            del to_row[0]

This seems to be much slower, and the number of rows in the CSV equals the number of threads I started.
I would be grateful if someone could give me a hint. By the way, is it even possible to improve the execution time of this kind of task with concurrency?
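One possible direction, offered as a sketch rather than a definitive answer: the row counts above follow from how the work is divided. In the first attempt every thread builds its own `files` list, so each of the 50 threads processes all 30 files. In the second attempt each thread takes a single file and then exits, so the number of rows equals the number of threads. A thread pool from `concurrent.futures` hands each file to exactly one worker and lets a single writer produce the CSV afterwards. The names `read_row` and `generate_csv_pool` and the `max_workers=8` default are hypothetical, and the sketch assumes every .txt file in the directory is named after an integer.

```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor


def read_row(path):
    """Read one N.txt file and return its CSV row [ordinal, value]."""
    ordinal = os.path.splitext(os.path.basename(path))[0]
    with open(path, encoding='UTF-8') as f:
        return [ordinal, f.read().strip()]


def generate_csv_pool(directory, csv_name='fibo_csv_threading.csv', max_workers=8):
    """Read the .txt files in worker threads, then write the CSV once."""
    paths = [os.path.join(directory, name)
             for name in os.listdir(directory) if name.endswith('.txt')]
    # Each path is handed to the pool exactly once, so no lock or shared
    # list is needed to divide the work between threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        rows = list(pool.map(read_row, paths))
    # Assumption: ordinals are integers, so rows can be sorted numerically.
    rows.sort(key=lambda row: int(row[0]))
    csv_path = os.path.join(directory, csv_name)
    # A single writer in the main thread avoids concurrent appends to one file.
    with open(csv_path, 'w', newline='', encoding='UTF-8') as fw:
        csv.writer(fw).writerows(rows)
    return csv_path
```

Whether this is actually faster depends on the storage and the number of files. With 30 small local files, the cost of starting threads may outweigh any gain, so timing both the sequential and the pooled version on the real data is the only reliable check.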
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
