Applying transformations to a dataset in PyTorch and adding them to the data

I want to load Fashion-MNIST (or any other dataset) using torchvision.datasets.FashionMNIST(data_dir, train=True, download=True), apply some image transformations such as cropping or adding noise, and finally add the transformed data to the original dataset. The only way I found is torchvision.transforms, but it transforms the original dataset and does not augment it. How can I augment the dataset?



Solution 1:[1]

As @Ivan already pointed out in the comments, PyTorch always loads the original version of an image from the dataset when you access it. The transform of your choice is then applied online, that is, on the fly at access time.
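The online behaviour can be checked without downloading anything. The following is a minimal sketch, not torchvision's actual implementation: TinyDataset and add_noise are hypothetical names, and blank tensors stand in for the FashionMNIST images. It only illustrates that a transform applied at access time leaves the stored data untouched and that a random transform gives a different result on each access.

```python
import torch
from torch.utils.data import Dataset


class TinyDataset(Dataset):
    """Hypothetical stand-in for FashionMNIST: stores images, transforms on access."""

    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]          # always read the stored original
        if self.transform is not None:
            img = self.transform(img)   # applied on the fly; storage is untouched
        return img, self.labels[idx]


def add_noise(img):
    # Random augmentation: returns a new tensor, does not modify img in place
    return img + 0.1 * torch.randn_like(img)


images = torch.zeros(4, 1, 28, 28)      # four blank 28x28 "images"
labels = torch.arange(4)
dset = TinyDataset(images, labels, transform=add_noise)

first, _ = dset[0]
second, _ = dset[0]
print(torch.equal(first, second))                        # False: each access draws new noise
print(torch.equal(images, torch.zeros(4, 1, 28, 28)))    # True: stored data is unchanged
```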

In general, setting a transform to augment the data without touching the original dataset is the common practice when training neural models.

That said, if you need to mix an augmented dataset with the original one you can, for example, stack two datasets with torch.utils.data.ConcatDataset, as follows:

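import torch
import torchvision

# dset1 yields the original samples; dset2 applies my_transform on each access
dset1 = torchvision.datasets.FashionMNIST(data_dir, train=True, download=True)
dset2 = torchvision.datasets.FashionMNIST(data_dir, train=True, transform=my_transform)

dset = torch.utils.data.ConcatDataset([dset1, dset2])  # len(dset) == 2 * len(dset1)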
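One practical point when the combined dataset is passed to a DataLoader: both parts have to return samples of the same type. FashionMNIST returns PIL images when no transform is set, and the default collate function cannot batch those, so in practice dset1 would usually get transforms.ToTensor() and my_transform would end with it as well. The sketch below illustrates the same concatenation pattern using only torch, with random tensors standing in for the images; Augmented and horizontal_flip are hypothetical helpers, not part of PyTorch.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset, TensorDataset


class Augmented(Dataset):
    """Hypothetical wrapper: applies `transform` to the images of a base dataset."""

    def __init__(self, base, transform):
        self.base = base
        self.transform = transform

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, target = self.base[idx]
        return self.transform(img), target


def horizontal_flip(img):
    # Flip the last (width) dimension of a C x H x W tensor
    return torch.flip(img, dims=[-1])


# Random tensors standing in for FashionMNIST images and labels
original = TensorDataset(torch.rand(8, 1, 28, 28), torch.randint(0, 10, (8,)))
combined = ConcatDataset([original, Augmented(original, horizontal_flip)])

loader = DataLoader(combined, batch_size=4, shuffle=True)
batch_imgs, batch_targets = next(iter(loader))
print(len(original), len(combined))   # the combined dataset is twice as long
print(batch_imgs.shape)               # batches mix original and flipped samples
```

Wrapping the base dataset, rather than instantiating it twice, keeps a single copy of the underlying data in memory; whether that matters depends on the dataset size.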

Have a look at this page for more alternatives.

Finally, if you need, you can also save your dataset for future use (thus freezing the random transform applied to your data):

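import os
os.makedirs("mydset", exist_ok=True)  # torch.save does not create directories
fname = "fmnist"  # file-name prefix; any string works
for idx, (img, target) in enumerate(dset):
    torch.save(img, f"mydset/{fname}_img_{idx}.pt")
    torch.save(target, f"mydset/{fname}_tgt_{idx}.pt")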
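To use the saved files later, they have to be read back through a dataset of their own. The following is one possible sketch, assuming the samples were saved as tensors (for example by including ToTensor in the transform) and follow the img/tgt naming scheme shown above. SavedDataset is a hypothetical class, and the demo writes to a temporary directory with the prefix "demo" instead of mydset.

```python
import os
import tempfile

import torch
from torch.utils.data import Dataset, TensorDataset


class SavedDataset(Dataset):
    """Hypothetical loader for files named {prefix}_img_{idx}.pt / {prefix}_tgt_{idx}.pt."""

    def __init__(self, root, prefix, length):
        self.root = root
        self.prefix = prefix
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if not 0 <= idx < self.length:
            raise IndexError(idx)
        img = torch.load(os.path.join(self.root, f"{self.prefix}_img_{idx}.pt"))
        target = torch.load(os.path.join(self.root, f"{self.prefix}_tgt_{idx}.pt"))
        return img, target


# Demo: save a small tensor dataset, then read it back
source = TensorDataset(torch.rand(3, 1, 28, 28), torch.arange(3))
root = tempfile.mkdtemp()
for idx, (img, target) in enumerate(source):
    # clone() so that only this sample is written, not the whole shared storage
    torch.save(img.clone(), os.path.join(root, f"demo_img_{idx}.pt"))
    torch.save(target.clone(), os.path.join(root, f"demo_tgt_{idx}.pt"))

restored = SavedDataset(root, "demo", len(source))
img0, tgt0 = restored[0]
print(torch.equal(img0, source[0][0]), int(tgt0))
```

Because the files hold the already-transformed samples, every epoch sees exactly the same augmented images, which is the freezing effect mentioned above.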

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 aretor