How to convert images into a NumPy array quickly?

To train an image classification model, I load the input data as NumPy arrays, and I deal with thousands of images. Currently, I loop through each image and convert it into a NumPy array as shown below.

import glob
from time import time

import cv2
import numpy as np

tem_arr_list = []
images_list = glob.glob(r'C:\Datasets\catvsdogs\cat\*.jpg')
start = time()  # start the clock once, before the loop
for image_path in images_list:
    # read each file only once; cv2.imread already returns a NumPy array,
    # so np.array() here only makes an extra copy
    temp_arr = np.array(cv2.imread(image_path))
    tem_arr_list.append(temp_arr)
print("Total time taken {}".format(time() - start))

Running this takes a lot of time when the dataset is large, so I tried a list comprehension instead:

tem_arr_list = [np.array(cv2.imread(image_path)) for image_path in images_list] 

This is slightly quicker than the loop, but still not fast enough.

I'm looking for any other way to reduce the time this operation takes. Any help or suggestions would be appreciated.
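A minimal sketch of the same sequential task, assuming OpenCV and NumPy and that every image has the same shape (resize them first otherwise). It reads each file once (cv2.imread already returns a NumPy array, so wrapping it in np.array() only adds a copy), skips files that cv2.imread could not decode (it returns None instead of raising), and stacks the result into one (N, H, W, C) array with np.stack. It does not parallelize anything; it only removes avoidable work. The name load_batch and its loader parameter are hypothetical, added so the helper can be exercised without real image files.

```python
import time

import numpy as np


def load_batch(paths, loader=None):
    """Read each image once and stack them into one (N, H, W, C) array.

    Returns (batch, skipped_paths, seconds). Assumes all readable images
    share the same shape. `loader` is a hypothetical hook for testing.
    """
    if loader is None:
        import cv2  # imported lazily so the helper can be tested without OpenCV
        loader = cv2.imread  # returns a numpy.ndarray, or None on failure
    images, skipped = [], []
    start = time.perf_counter()  # start the clock once, before the loop
    for path in paths:
        img = loader(path)  # one read per file, no extra np.array() copy
        if img is None:
            skipped.append(path)  # missing or undecodable file
        else:
            images.append(img)
    if not images:
        raise ValueError("no readable images")
    batch = np.stack(images, axis=0)  # raises ValueError if shapes differ
    return batch, skipped, time.perf_counter() - start
```

For real files, call it as `batch, skipped, secs = load_batch(glob.glob(r'C:\Datasets\catvsdogs\cat\*.jpg'))`.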



Solution 1:[1]

Use a multiprocessing pool to load the data in parallel. My PC has 16 CPUs. I tried loading 100 images; the times taken are shown below.

import glob
import multiprocessing
from time import time

import cv2

def load_image(image_path):
    # Runs in a worker process; the decoded array is pickled back to the parent
    return cv2.imread(image_path)

if __name__ == '__main__':
    image_path_list = glob.glob('*.png')

    try:
        cpus = multiprocessing.cpu_count()
    except NotImplementedError:
        cpus = 2   # arbitrary default

    # The with-block terminates the worker processes when it exits
    with multiprocessing.Pool(processes=cpus) as pool:
        start = time()
        images = pool.map(load_image, image_path_list)
        print("Total time taken using multiprocessing pool {} seconds".format(time() - start))

    start = time()
    images = []
    for image_path in image_path_list:
        images.append(load_image(image_path))
    print("Total time taken using for loop {} seconds".format(time() - start))

    start = time()
    images = [load_image(image_path) for image_path in image_path_list]
    print("Total time taken using list comprehension {} seconds".format(time() - start))

Output:

Total time taken using multiprocessing pool 0.2922379970550537 seconds
Total time taken using for loop 1.4935636520385742 seconds
Total time taken using list comprehension 1.4925990104675293 seconds
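In this run the pool took about 0.29 s against about 1.49 s for the loop, roughly a 5.1x speedup on 16 CPUs, so part of the time goes to overhead such as pickling every decoded array back to the parent process. A possible alternative, not part of the original answer, is a thread pool from concurrent.futures: threads share memory, so nothing is pickled. It only helps if the loader releases the GIL while it works; OpenCV's Python bindings are generally understood to do so during native calls, but treat that as an assumption and benchmark on your own data. The name load_with_threads and its loader parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def load_with_threads(paths, loader=None, max_workers=None):
    """Load images concurrently with threads; results keep the order of paths.

    `loader` is a hypothetical hook for testing; by default cv2.imread is used.
    """
    if loader is None:
        import cv2  # imported lazily so the helper can be tested without OpenCV
        loader = cv2.imread
    # max_workers=None lets ThreadPoolExecutor choose its own default
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in input order, like pool.map
        return list(executor.map(loader, paths))
```

Usage would be `images = load_with_threads(glob.glob('*.png'))`, timed the same way as the three variants above.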

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1