Custom Dataset class for a hierarchical dataset using the “Local Classifier per Parent Node” technique

I’m working on an image classification model for a hierarchical dataset using a PyTorch implementation of EfficientNet. I am looking for help on how to edit my torchvision.datasets.ImageFolder setup (and any other part, if necessary) to implement the Local Classifier per Parent Node (LCPN) technique used for hierarchical datasets.

So far I have finished the basic model pipeline without any hierarchical technique by flattening all label classes into a single level:

├── seg_train
│   ├── golden_retriever_dog
│   ├── german_shepherd_dog
│   ├── persian_cat
│   ├── shorthair_cat
│   ├── golden_hamster
│   └── syrian_hamster

This is the current filetree for the training images: each directory is an animal breed (for dogs, cats, and hamsters) filled with .png images.

├── seg_train
│   ├── dogs
│   │   ├── golden_retriever
│   │   └── german_shepherd
│   ├── cats
│   │   ├── persian
│   │   └── shorthair
│   └── hamster
│       ├── golden
│       └── syrian

This is the filetree I am working on to take advantage of the parent node’s information.
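
To make use of this structure, my plan is to read the parent categories and their child classes straight from the directory names. Here is a rough sketch of what I mean (build_hierarchy is just a placeholder helper name, and the path matches my setup above):

from pathlib import Path

def build_hierarchy(root="../input/seg_train"):
    # map each parent folder (cats, dogs, hamster) to its breed subfolders
    hierarchy = {}
    for parent_dir in sorted(Path(root).iterdir()):
        if parent_dir.is_dir():
            hierarchy[parent_dir.name] = sorted(
                child.name for child in parent_dir.iterdir() if child.is_dir()
            )
    return hierarchy

# e.g. {'cats': ['persian', 'shorthair'], 'dogs': ['german_shepherd', 'golden_retriever'], ...}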

Here’s my short explanation of what I’m trying to accomplish (LCPN), taken from this blog:

  • Instead of one classifier classifying six categories, as in my original setup, I want to train a single classifier to sort images into the parent categories (dogs, cats, hamster) and three child classifiers, one for each parent category, to sort those images into subcategories (a rough sketch follows this list).
  • For example, say I have an image of a golden_retriever. The parent classifier will classify it into the “dogs” (parent) category, and the corresponding child classifier will classify it into the golden_retriever label.
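
To make the first bullet concrete, here is a rough sketch of how I picture the child classifiers’ datasets: one ImageFolder per parent directory, each reusing the train_transforms from my existing pipeline (child_datasets is just a placeholder name):

import torchvision

# one dataset per parent node; each child classifier only sees that parent's breeds
child_datasets = {
    parent: torchvision.datasets.ImageFolder(
        root=f"../input/seg_train/{parent}", transform=train_transforms
    )
    for parent in ["cats", "dogs", "hamster"]
}
# e.g. child_datasets["dogs"].classes == ["german_shepherd", "golden_retriever"]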

Currently, my load_dataset code looks like the following. I can create a different train_data loader for every parent node (dogs, cats), but I am having difficulty coding the loader for the parent classifier, where the “dogs” class should include all images from its subcategories (my current idea for that loader is sketched after the code).

import torchvision

def load_dataset():
    train_data = torchvision.datasets.ImageFolder(
        root="../input/seg_train", transform=train_transforms
    )
    test_data = torchvision.datasets.ImageFolder(
        root="../input/seg_test", transform=test_transforms
    )
    dataloaders = data_loader(
        train_data, test_data, valid_size=0.2, batch_size=BATCH_SIZE
    )
    # class labels taken from the folder names
    classes = train_data.classes
    # decoder: class name -> integer index
    decoder = {}
    for i in range(len(classes)):
        decoder[classes[i]] = i
    # encoder: integer index -> class name
    encoder = {}
    for i in range(len(classes)):
        encoder[i] = classes[i]

    return dataloaders, classes, encoder, decoder
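
For the parent-level loader, the best idea I have so far is to point ImageFolder at the reorganized hierarchical root: as far as I understand, the top-level folders (cats, dogs, hamster) then become the class labels while the nested breed folders are still searched for images. A rough sketch of that idea (again reusing train_transforms):

import torchvision

# parent-level dataset on the hierarchical tree: the top-level folders become
# the classes, and the breed subfolders are searched for images, so the
# "dogs" class should end up containing every dog picture
parent_train_data = torchvision.datasets.ImageFolder(
    root="../input/seg_train", transform=train_transforms
)
print(parent_train_data.classes)  # hoping for ['cats', 'dogs', 'hamster']

The only alternative I can think of is keeping the original flat layout and passing a target_transform to ImageFolder that collapses each breed index into its parent index, but that feels more fragile than reorganizing the folders, so I would like to know which approach makes more sense.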

I would really appreciate it if anyone could share their insights on my issue! I am not sure whether this is the optimal way to accomplish the LCPN technique, so I welcome any alternative solutions as well!


