Can Python multithreading corrupt files?

I am using Python multithreading to process images in the following way:

def run_single_image(self, row):
    filename = row["filename"]
    image = cv2.imread(filename)
    new_image = self.image_processor(image)
    # new_image_path is derived from the input filename
    cv2.imwrite(new_image_path, new_image)

row is a row from a pandas DataFrame. To call this function I use the following:

def run_multi(self) -> None:
    ex_list = []
    print("running")
    with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
        for _, row in self.df.iterrows():
            ex_list.append(executor.submit(self.run_single_image, row))
        # The with-block already calls shutdown(wait=True) on exit,
        # so no explicit shutdown is needed.

However, when I read the processed images back, some of them are corrupted and cv2 raises an error. Can multithreading corrupt those files?



Solution 1:[1]

Well, after examining the dataframe again, I found rows that refer to the same filenames. Because new_image_path is composed of the old filename, in some cases two threads wrote to the same output file concurrently, producing a file that is half the result of processing one row and half the result of processing the other.
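A minimal sketch of two ways to avoid the collision (the helper names and the output-path scheme are assumptions, since the question does not show how new_image_path is built): either make each output path unique per row, or drop duplicate filenames before submitting work. Calling result() on each future also surfaces worker exceptions instead of letting them be silently swallowed:

```python
import os
import concurrent.futures

import pandas as pd


def unique_output_path(out_dir, filename, row_id):
    # Append the row index so two rows with the same source
    # filename can never collide on the same output file.
    base, ext = os.path.splitext(os.path.basename(filename))
    return os.path.join(out_dir, f"{base}_{row_id}{ext}")


def run_multi(df, process_row):
    # Alternative fix: drop duplicate filenames up front so each
    # source image is processed (and written) exactly once.
    df = df.drop_duplicates(subset="filename")
    with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
        futures = [
            executor.submit(process_row, row) for _, row in df.iterrows()
        ]
        # result() re-raises any exception raised inside a worker,
        # so failures show up here instead of as bad output files.
        for future in concurrent.futures.as_completed(futures):
            future.result()
```

With unique paths, two rows pointing at the same input file each get their own output file, so concurrent writes can no longer interleave into one half-and-half image.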

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 s.b