How to create a pairs.csv file given an image dataset
Given a dataset of images, I would like to create a pairs.csv file for both the train and test sets. The format of the CSV file is shown below.
Let's assume that, in the train set, folder A contains the following images:
1.jpg
2.jpg
3.jpg
Then my CSV file will look like this:
|ImgA|ImgB
|1.jpg|2.jpg
|1.jpg|3.jpg
|2.jpg|1.jpg
|2.jpg|3.jpg
|3.jpg|1.jpg
|3.jpg|2.jpg
Another example of the dataset and csv file structure is shown below.
For the folder structure shown here
some of the csv file entries are as follows:
I could do it manually if the number of images and permutations involved were not that large. For example, the screenshots were taken from a folder that has 31 subdirectories, and each subdirectory contains at most 5 or 6 images, similar to screenshot 2.
Solution 1:[1]
Use itertools.product to make the pairings. Here is an example of the logic:
import itertools as it
# sample data as a string, split into a list, just to make the example runnable
data = """1.jpg
2.jpg
3.jpg"""
data_list = data.split('\n')
# body of the program
header = '|ImgA|ImgB\n'
# make all pairings
out = (f'{i}|{j}' for i, j in it.product(data_list, repeat=2) if i!=j)
out = header + '|{}'.format('\n|'.join(out))
print(out)
Output
|ImgA|ImgB
|1.jpg|2.jpg
|1.jpg|3.jpg
|2.jpg|1.jpg
|2.jpg|3.jpg
|3.jpg|1.jpg
|3.jpg|2.jpg
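As a side note, the same ordered pairs can be produced without the `i != j` filter by using `itertools.permutations` with a length of 2. This is a minimal sketch rather than part of the original answer, and it assumes the file names in the list are unique (duplicates would be paired with each other, which the filter above would skip):

```python
import itertools as it

data_list = ['1.jpg', '2.jpg', '3.jpg']

# permutations(..., 2) yields every ordered pair of items taken from
# two different positions in the list
pairs = list(it.permutations(data_list, 2))

# For unique names this matches the filtered Cartesian product
filtered = [(i, j) for i, j in it.product(data_list, repeat=2) if i != j]

print(pairs == filtered)
print(len(pairs))  # n * (n - 1) pairs for n unique names
```

For n images this gives n * (n - 1) rows, which is worth keeping in mind before running it on a large folder.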
EDIT
- Get the image file names from a folder (path is the folder containing the images)
import os
data_list = [d for d in os.listdir(path) if d.endswith('.jpg')]
- Save the file in the current working directory
with open('pairs.csv', 'w') as fd:
    fd.write(out)
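Putting the listing and saving steps together, a minimal end-to-end sketch might look like the following. The function name `write_pairs_csv` and its parameters are hypothetical, chosen for illustration only. It writes rows one at a time instead of building the whole string first, and it sorts the names because `os.listdir` returns them in arbitrary order:

```python
import itertools as it
import os


def write_pairs_csv(folder, out_path='pairs.csv'):
    """Write every ordered pair of distinct .jpg names in `folder` to `out_path`.

    Hypothetical helper; returns the number of pair rows written.
    """
    # sorted() gives a reproducible (lexicographic) order
    names = sorted(f for f in os.listdir(folder) if f.lower().endswith('.jpg'))
    rows = 0
    with open(out_path, 'w') as fd:
        fd.write('|ImgA|ImgB\n')
        for a, b in it.permutations(names, 2):
            fd.write(f'|{a}|{b}\n')
            rows += 1
    return rows
```

It could be called once per split, for example `write_pairs_csv('train', 'train_pairs.csv')`, where both names are placeholders. Note that lexicographic sorting places `10.jpg` before `2.jpg`.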
Edit 2:
From the comments: if the number of images is large (52712), itertools.product (or the generator) seems to "crash". Here is (hopefully) another way, using an explicit for-loop that writes each pair to the file as it is generated:
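with open('...', 'w') as fd:
    header = '|ImgA|ImgB\n'
    fd.write(header)
    try:
        for i, j in it.product(data_list, repeat=2):
            if i != j:
                # write one pair per line instead of building one big string
                fd.write(f'|{i}|{j}\n')
    except Exception as e:
        # catch the exception class; an instance is not valid in an except clause
        print(f"No way... too much data, a new strategy should be found!! ({e})")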
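For scale, 52712 images paired across a single flat list give 52712 * 52711, roughly 2.78 billion rows, so the output file itself becomes very large no matter how it is generated. If pairs are only needed within each subdirectory, the row count drops sharply (at most 6 * 5 = 30 rows per folder of 6 images). The source does not confirm this, since the screenshots are not available, so the sketch below is an assumption. The function name `write_pairs_per_subdir` is hypothetical, and writing paths relative to the root folder is also an assumption about the desired row format:

```python
import itertools as it
import os


def write_pairs_per_subdir(root, out_path):
    """Stream ordered pairs of .jpg files, pairing only within each folder.

    Hypothetical helper; returns the number of pair rows written.
    """
    rows = 0
    with open(out_path, 'w') as fd:
        fd.write('|ImgA|ImgB\n')
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subdirectories in a reproducible order
            images = sorted(f for f in filenames if f.lower().endswith('.jpg'))
            # Paths relative to root keep rows from different folders distinct
            rel = [os.path.relpath(os.path.join(dirpath, f), root) for f in images]
            for a, b in it.permutations(rel, 2):
                fd.write(f'|{a}|{b}\n')
                rows += 1
    return rows
```

Because each row is written as soon as it is produced, memory use stays small even when a folder holds many images.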
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |


