List image paths of multiple formats in a Kaggle Dataset [closed]

How can I build a list of image paths, covering multiple image formats, in a Kaggle dataset? Working in Kaggle, I want to collect the image paths into a list so that I can store them and perform operations on them, but I couldn't find a proper traversal algorithm that gives me the required list.

The directory tree for the images is:

|-data
   |-images
        |-ID0
          |--- img4tgh4r3.jpg
          |--- img324633.png
          |
          .
          .
        |-ID1
        .
        .

I tried using ls -a, but how do I convert this structure and save it into a data type so that I can reuse it?

import os

path = "/"
dir_list = os.listdir(path)

print("Files and directories in '", path, "' :")

# print the list
print(dir_list)

This only lists the top-level directories, not the image files of each format inside them.



Solution 1:[1]

This can be done using either the os or the glob module in Python. I would suggest glob, since it supports wildcard patterns on filenames and so handles multiple extensions and nested directories with little code.

SAMPLE CODE:


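import glob
from tqdm import tqdm

# The required file extensions
fetch_formats = ['png', 'jpg', 'jpeg']

# Declare an empty list for storing the file names
img_list = []

# Placeholder (assumption): the folder that contains "images/", taken from
# the tree in the question. Adjust it to your dataset's location.
working_dir = "data/"

# State the directory of interest
path = working_dir + "images/**/*."

# Fetch each type of file from the given directory.
# recursive=True lets "**" match sub-directories at any depth.
for ff in tqdm(fetch_formats, desc="Fetching the filenames"):
    img_list.extend(glob.glob(path + ff, recursive=True))

print(f"\nTotal number of images: {len(img_list)}")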

NOTE:

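  • tqdm is used only to show a progress bar and can be omitted.
  • The pattern *.png matches any filename ending with .png.
  • The pattern dir/**/*.png matches files ending with .png inside the sub-directories of dir. Pass recursive=True to glob.glob so that ** matches sub-directories at any depth; without it, ** behaves like * and matches a single directory level.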
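
To make the difference between the recursive and non-recursive forms of ** concrete, here is a minimal sketch that builds a throwaway directory tree and globs it both ways. The helper name find_images and the file names a.jpg and b.png are hypothetical and chosen only for illustration; they are not part of the dataset in the question.

```python
import glob
import os
import tempfile


def find_images(root, extensions=("png", "jpg", "jpeg")):
    """Return sorted paths under root whose names end in one of the extensions."""
    paths = []
    for ext in extensions:
        # With recursive=True, "**" matches zero or more directory levels.
        pattern = os.path.join(root, "**", f"*.{ext}")
        paths.extend(glob.glob(pattern, recursive=True))
    return sorted(paths)


# Build a temporary tree: <tmp>/ID0/a.jpg and <tmp>/ID0/extra/b.png
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "ID0", "extra")
    os.makedirs(nested)
    for name in (os.path.join(tmp, "ID0", "a.jpg"), os.path.join(nested, "b.png")):
        open(name, "w").close()

    # Recursive search finds both files, regardless of depth.
    found = [os.path.relpath(p, tmp) for p in find_images(tmp)]

    # Without recursive=True, "**" acts like "*" (one directory level),
    # so the PNG that sits two levels down is not matched.
    shallow = glob.glob(os.path.join(tmp, "**", "*.png"))

print(found)
print(shallow)
```

The returned list is an ordinary Python list of strings, so it can be stored, sliced, or passed to an image loader as needed.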

Check out the official documentation of the glob module for more information.
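
For the os-based route mentioned at the start of this answer, a rough equivalent can be written with os.walk. This is only a sketch: the function name list_images and the extension set are assumptions made for illustration, and the suffix comparison is lowercased so that names such as IMG.JPG are also picked up, which the glob patterns above may miss on case-sensitive file systems.

```python
import os

# Assumed set of extensions of interest; extend it as needed.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_images(root, extensions=IMAGE_EXTENSIONS):
    """Walk root and return sorted paths of files with a matching extension."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare lowercased suffixes so the match is case-insensitive.
            if os.path.splitext(name)[1].lower() in extensions:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling list_images on the images folder returns a flat list of paths; the folder name of each path (for example ID0) can be recovered afterwards with os.path.basename(os.path.dirname(p)).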

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Smaranjit Ghose