How to create a pairs.csv file given an image dataset

Given a dataset of images, I would like to create a pairs.csv file for both the train and test sets. The format of the CSV file is shown below.

Let's assume that, in the train set, folder A contains the following images:

1.jpg
2.jpg
3.jpg

Then my CSV file would look like this:

|ImgA|ImgB
|1.jpg|2.jpg
|1.jpg|3.jpg
|2.jpg|1.jpg
|2.jpg|3.jpg
|3.jpg|1.jpg
|3.jpg|2.jpg

Another example of the dataset and CSV file structure is shown below.

For the folder structure shown here

[Image: folder structure]

some of the CSV file entries are as follows:

[Image: expected CSV format, sample output]

I could do this manually if the number of images and permutations involved were not so large. For example, the screenshots were taken from a folder that has 31 subdirectories, each containing at most 5 or 6 images, similar to screenshot 2.



Solution 1:[1]

Use itertools.product to make the pairings. Here is an example of the logic:

import itertools as it

# to keep the example runnable, the data is a string that is split into a list
data = """1.jpg
2.jpg
3.jpg"""
data_list = data.split('\n')

# body of the program
header = '|ImgA|ImgB\n'
# make all pairings, skipping an image paired with itself
out = (f'{i}|{j}' for i, j in it.product(data_list, repeat=2) if i!=j)
out = header + '|{}'.format('\n|'.join(out))
print(out)

Output

|ImgA|ImgB
|1.jpg|2.jpg
|1.jpg|3.jpg
|2.jpg|1.jpg
|2.jpg|3.jpg
|3.jpg|1.jpg
|3.jpg|2.jpg
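
As a possible variation on the logic above, itertools.permutations(names, 2) yields the same ordered pairs without the i != j filter, as long as the file names are unique. The sketch below also assumes that a plain comma-separated file (without the leading | of the sample) is acceptable, and uses the standard csv module. The helper names pair_rows and write_pairs are made up for illustration.

```python
import csv
import io
import itertools as it


def pair_rows(names):
    """Return an iterator over every ordered pair of distinct entries in names."""
    return it.permutations(names, 2)


def write_pairs(names, fd):
    """Write the header and all ordered pairs to an open text file object."""
    writer = csv.writer(fd, lineterminator='\n')
    writer.writerow(['ImgA', 'ImgB'])
    writer.writerows(pair_rows(names))


# demo on an in-memory buffer so that nothing is written to disk
buf = io.StringIO()
write_pairs(['1.jpg', '2.jpg', '3.jpg'], buf)
print(buf.getvalue())
```

To write a real file, an open file handle can be passed instead of the buffer; the csv documentation recommends opening it with newline=''.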

EDIT

  • Get the image file names from the folder (path is the directory to scan)
import os
data_list = [d for d in os.listdir(path) if d.endswith('.jpg')]
  • Save the file in the current working directory
with open('pairs.csv', 'w') as fd:
    fd.write(out)
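
Putting the two steps together, a minimal end-to-end sketch could look like the following. The function names list_images and write_pairs_csv and the extensions parameter are assumptions made for this example, not part of the original answer. The pairs are written row by row in the |ImgA|ImgB layout from the question.

```python
import itertools as it
import os


def list_images(folder, extensions=('.jpg',)):
    """Return the sorted names of the image files directly inside folder."""
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(extensions)
        and os.path.isfile(os.path.join(folder, name))
    )


def write_pairs_csv(folder, out_path):
    """Write every ordered pair of distinct images in folder to out_path.

    Returns the number of data rows written (the header is not counted).
    """
    names = list_images(folder)
    rows = 0
    with open(out_path, 'w') as fd:
        fd.write('|ImgA|ImgB\n')
        for a, b in it.permutations(names, 2):
            fd.write(f'|{a}|{b}\n')
            rows += 1
    return rows
```

A call such as write_pairs_csv('train/A', 'pairs.csv') would then produce the file, where 'train/A' is only a placeholder for the real folder.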

Edit 2:

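From the comments: if the number of images is large (52712), the approach above seems to "crash", likely because join builds the whole output as a single string in memory. Here is (hopefully) another way, with an explicit for-loop that writes each row as it is generated: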

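with open('...', 'w') as fd:
    header = '|ImgA|ImgB\n'
    fd.write(header)
    try:
        # write each pair immediately instead of joining everything first
        for i, j in it.product(data_list, repeat=2):
            if i != j:
                fd.write(f'|{i}|{j}\n')
    except Exception as e:
        # report the failure; a different strategy is needed for this much data
        print(f"No way... too much data, a new strategy should be found!! ({e})")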
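
It may help to estimate the size of the result first: n images give n * (n - 1) ordered pairs, so 52712 images in a single list mean roughly 2.78 billion rows, however the pairs are generated. If the images should only be paired within their own subdirectory, as the folder layout in the question might suggest, the total stays small. The sketch below works under that assumption. The function names, and the choice to prefix each file name with its subdirectory name, are illustrative guesses rather than the format from the screenshots.

```python
import itertools as it
from pathlib import Path


def count_ordered_pairs(n):
    """Number of ordered pairs of distinct items among n items."""
    return n * (n - 1)


def write_pairs_per_folder(root, out_path, suffix='.jpg'):
    """Pair each image only with the other images in the same subdirectory.

    Returns the number of data rows written (the header is not counted).
    """
    rows = 0
    with open(out_path, 'w') as fd:
        fd.write('|ImgA|ImgB\n')
        for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
            names = sorted(
                f.name for f in sub.iterdir()
                if f.is_file() and f.suffix.lower() == suffix
            )
            for a, b in it.permutations(names, 2):
                # assumption: rows identify images as "<subdirectory>/<file>"
                fd.write(f'|{sub.name}/{a}|{sub.name}/{b}\n')
                rows += 1
    return rows


print(count_ordered_pairs(52712))    # 2778502232 rows for one flat list
print(31 * count_ordered_pairs(6))   # 930 rows for 31 folders of 6 images
```

With 31 subdirectories of at most 6 images each, this writes at most 930 rows.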

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1