Handling non-image inputs of different shapes in a batch without zero-filling/zero-padding

I have (for the most part) gigapixel images that I have divided into 512x512 patches. I feed each 512x512 three-channel patch into a frozen ResNet18 network for feature extraction, which gives me a 1D tensor of length 512 per patch. I then stack these per-patch feature vectors, so each gigapixel image ends up as an Nx512 intermediate representation, where N is the number of patches in that image.

Since my original gigapixel images are not all the same size, these intermediate representations range from 17x512 to 6000x512. I am using the following strategy to feed them to my model. However, I would prefer a more standardized method within PyTorch (for ordinary 2D three-channel images we could perhaps simply use a torch transform, but that does not apply here).

import torch

# median_num_patches (median N over the dataset) and sample are assumed to be defined elsewhere
feature_path = 'features.pt'
features = torch.load(feature_path, map_location='cpu')  # shape: (N, 512)
num_patches = features.shape[0]
if num_patches <= median_num_patches:
    # zero-pad along the patch dimension up to median_num_patches rows
    padding = torch.zeros((median_num_patches - num_patches, 512), dtype=features.dtype)
    sample['image'] = torch.cat((features, padding), dim=0)
else:
    # randomly pick median_num_patches rows (torch.randint samples with replacement);
    # an image has at most 6000 patches
    random_indices = torch.randint(num_patches, (median_num_patches,))
    sample['image'] = features[random_indices, :]
            

As mentioned earlier, the 2D intermediate representation (Nx512) is created in an offline process and saved in features.pt files; the code above only loads it.

The above solution relies on the median size of the 2D intermediate representations, measured as the number of patches per gigapixel image. If the current representation has no more patches than the median, it is zero-filled up to the median size. If it has more patches than the median, a median number of patches is randomly sampled from it.

I am looking for a better solution than the current one. Perhaps something without sampling or zero-filling and without loss of data. Thanks for any possible lead.
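To make the kind of alternative being asked about concrete, here is a minimal sketch of a custom `collate_fn` that keeps every Nx512 representation at its own length, returning a Python list instead of one stacked tensor. The `BagDataset` class, the `'label'` key, and `collate_variable_length` are hypothetical names introduced for illustration; only the `'image'` key comes from the code above. It assumes the downstream model can consume one variable-length tensor at a time (for example by pooling over the patch dimension).

```python
import torch
from torch.utils.data import DataLoader, Dataset


class BagDataset(Dataset):
    """Hypothetical dataset: each item is an (N_i, 512) feature matrix plus a label."""

    def __init__(self, bags, labels):
        self.bags, self.labels = bags, labels

    def __len__(self):
        return len(self.bags)

    def __getitem__(self, idx):
        return {'image': self.bags[idx], 'label': self.labels[idx]}


def collate_variable_length(batch):
    """Keep each bag at its own length: no padding, no sampling."""
    images = [item['image'] for item in batch]                 # list of (N_i, 512) tensors
    labels = torch.tensor([item['label'] for item in batch])   # shape: (batch_size,)
    return {'image': images, 'label': labels}


# Toy data standing in for the saved features.pt files.
bags = [torch.rand(17, 512), torch.rand(40, 512), torch.rand(23, 512)]
dataset = BagDataset(bags, labels=[0, 1, 0])
loader = DataLoader(dataset, batch_size=2, collate_fn=collate_variable_length)
```

Each batch then carries `batch['image']` as a list, so the training loop would iterate over it and reduce each bag separately; whether that fits depends on the model architecture.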



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
