How can I subset image data?

I'm working on an image classification problem using the Food-101 dataset. It contains 101 classes with 1000 images each. How can I feed fewer images per class to the model?

Here is the function I use to split the data into train and test folders:

import os
from collections import defaultdict
from shutil import copy

def prepare_data(filepath, src, dest):
    # Map each class name to the image file names listed in filepath
    classes_images = defaultdict(list)
    with open(filepath, 'r') as txt:
        paths = [line.strip() for line in txt]
    for p in paths:
        food = p.split('/')
        classes_images[food[0]].append(food[1] + '.jpg')

    # Copy every listed image into dest/<class name>/
    for food in classes_images:
        print("\nCopying images into", food)
        os.makedirs(os.path.join(dest, food), exist_ok=True)
        for i in classes_images[food]:
            copy(os.path.join(src, food, i), os.path.join(dest, food, i))
    print("Copying Done!")

Then I prepare the training dataset by copying images from food-101/images to food-101/train, using the file list in train.txt:

print("Creating train data...")
prepare_data('food-101/meta/train.txt', 'food-101/images', 'food-101/train')


Here is the code that counts how many files are in the train folder:

train_files = sum([len(files) for i, j, files in os.walk("food-101/train/")])
print("Total number of samples in train folder")
print(train_files)

This prints 75750, which is 750 training images for each of the 101 classes (the other 250 of the 1000 images per class belong to the test split). How can I pick only 100 images from each class instead?
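
One possible way to do this, as a minimal sketch rather than a definitive answer, is to sample a fixed number of file names per class before copying. The names `select_subset`, `prepare_subset`, `per_class`, `seed`, and the destination folder `food-101/train_mini` are hypothetical and not part of the original code; the sketch assumes every line of the list file has the form `class_name/image_id`, as in the function above.

```python
import os
import random
from collections import defaultdict
from shutil import copy


def select_subset(lines, per_class=100, seed=0):
    """Group 'class_name/image_id' entries by class and keep at most per_class of each."""
    classes_images = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        food, image_id = line.split('/')
        classes_images[food].append(image_id + '.jpg')

    rng = random.Random(seed)  # fixed seed so the same subset is chosen on every run
    subset = {}
    for food, images in classes_images.items():
        k = min(per_class, len(images))  # classes with fewer images keep all of them
        subset[food] = sorted(rng.sample(images, k))
    return subset


def prepare_subset(filepath, src, dest, per_class=100, seed=0):
    """Copy at most per_class randomly chosen images of every class from src to dest."""
    with open(filepath, 'r') as txt:
        subset = select_subset(txt, per_class, seed)
    for food, images in subset.items():
        os.makedirs(os.path.join(dest, food), exist_ok=True)
        for name in images:
            copy(os.path.join(src, food, name), os.path.join(dest, food, name))
    return subset
```

Under these assumptions, calling prepare_subset('food-101/meta/train.txt', 'food-101/images', 'food-101/train_mini', per_class=100) would copy 100 images per class into a separate folder and leave food-101/train untouched. Random sampling is used here instead of taking the first 100 entries so the subset does not depend on the order of the list file.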



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
