How can I subset image data?
I'm working on an image classification problem using the food-101 dataset. It contains 101 classes with 1000 images each. How can I feed fewer images to the model?
Here is the code I use to split the data into train and test sets:
import os
from collections import defaultdict
from shutil import copy

def prepare_data(filepath, src, dest):
    # Map each class name to the image file names listed in filepath
    classes_images = defaultdict(list)
    with open(filepath, 'r') as txt:
        paths = [read.strip() for read in txt.readlines()]
        for p in paths:
            food = p.split('/')
            classes_images[food[0]].append(food[1] + '.jpg')

    # Copy every listed image from src/<class>/ to dest/<class>/
    for food in classes_images.keys():
        print("\nCopying images into ", food)
        if not os.path.exists(os.path.join(dest, food)):
            os.makedirs(os.path.join(dest, food))
        for i in classes_images[food]:
            copy(os.path.join(src, food, i), os.path.join(dest, food, i))
    print("Copying Done!")
Then I prepare the training dataset by copying images from food-101/images to food-101/train, using the file train.txt:
print("Creating train data...")
prepare_data('food-101/meta/train.txt', 'food-101/images', 'food-101/train')
Here is the code that counts how many files are in the train folder:
train_files = sum([len(files) for i, j, files in os.walk("food-101/train/")])
print("Total number of samples in train folder")
print(train_files)
This prints 75750, which is 750 training images for each of the 101 classes (train.txt lists 750 of each class's 1000 images). How can I pick 100 images from each class instead?
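One possible approach, sketched here under assumptions and not taken from an accepted answer, is to cap the number of images copied per class. The function name `prepare_subset`, its `per_class` and `seed` parameters, and the destination folder name used afterwards are hypothetical; the file layout is assumed to match the one used by `prepare_data` above (one `class_name/image_id` entry per line in the meta file).

```python
import os
import random
from collections import defaultdict
from shutil import copy


def prepare_subset(filepath, src, dest, per_class=100, seed=0):
    """Copy at most `per_class` randomly chosen images per class from src to dest.

    Hypothetical helper: same inputs as prepare_data, plus a per-class cap.
    Returns a dict mapping each class name to the file names that were copied.
    """
    # Group the listed image file names by class, as prepare_data does
    classes_images = defaultdict(list)
    with open(filepath, 'r') as txt:
        for line in txt:
            line = line.strip()
            if not line:
                continue
            food, image_id = line.split('/', 1)
            classes_images[food].append(image_id + '.jpg')

    rng = random.Random(seed)  # fixed seed so the same subset is picked each run
    chosen_per_class = {}
    for food, images in classes_images.items():
        # Sample without replacement; take everything if the class is small
        chosen = rng.sample(images, min(per_class, len(images)))
        os.makedirs(os.path.join(dest, food), exist_ok=True)
        for name in chosen:
            copy(os.path.join(src, food, name), os.path.join(dest, food, name))
        chosen_per_class[food] = chosen
    return chosen_per_class
```

Calling `prepare_subset('food-101/meta/train.txt', 'food-101/images', 'food-101/train_small', per_class=100)` should then leave 100 images in each class folder under `food-101/train_small` (a hypothetical folder name), which can be checked with the same `os.walk` count used above. Random sampling is used instead of taking the first 100 entries so the subset does not depend on the order of the list file; if that does not matter, slicing with `images[:per_class]` would also work.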
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
