How to best balance a dataset with overlapping classes?

I have a pandas DataFrame created from a dataset of images. Each image can contain several animals: cats, dogs, ducks and fish. The DataFrame lists what percentage of each animal is present in a given image, e.g.:

Image  cat   dog   duck   fish
1      10    20     20     50
2      99     0      1      0
3       0    10     10     80
...   ...    ...    ...    ...

This dataset is quite unbalanced: it contains far too high a percentage of fish, so I would like to balance it out. Probably the best way is to remove some of the fish images, thereby lowering the amount of fish present in the data.

Now my question is, what would be the best way to lower the amount of fish (and possibly other over-represented classes) in the dataset, without lowering the percentages of other classes too much? How can I decide which images to remove from the dataset, in such a way that the loss of other classes is minimized but the dataset becomes balanced? Thanks!
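To make the question concrete, here is a minimal sketch of one possible approach, a greedy removal loop. It is only an illustration, not a known best method. It assumes that "balanced" means the class shares of the summed percentages are as equal as possible, measured by their standard deviation. The function name `greedy_balance`, the `min_images` parameter and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def greedy_balance(df, class_cols, min_images=1):
    """Drop one image at a time, always the one whose removal most
    reduces the spread (std) of the class shares. Stops when no single
    removal helps or only `min_images` images are left.

    Assumption: the kept rows always sum to a positive total.
    """
    values = df[class_cols].to_numpy(dtype=float)
    keep = np.ones(len(df), dtype=bool)
    totals = values.sum(axis=0)                 # summed percentage per class
    current = (totals / totals.sum()).std()     # current imbalance score

    while keep.sum() > min_images:
        idx = np.flatnonzero(keep)
        # Class totals that would remain after removing each kept image.
        candidates = totals - values[idx]
        shares = candidates / candidates.sum(axis=1, keepdims=True)
        scores = shares.std(axis=1)
        best = scores.argmin()
        if scores[best] >= current - 1e-12:     # no removal improves balance
            break
        keep[idx[best]] = False
        totals = candidates[best]
        current = scores[best]

    return df.loc[keep]


# Toy data (hypothetical), same layout as the table above.
df = pd.DataFrame(
    {
        "cat": [25, 0, 0, 30, 10],
        "dog": [25, 0, 0, 30, 20],
        "duck": [25, 0, 0, 30, 20],
        "fish": [25, 100, 100, 10, 50],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="Image"),
)
balanced = greedy_balance(df, ["cat", "dog", "duck", "fish"], min_images=3)
print(balanced)
```

Because the loop is greedy, it is not guaranteed to find the best subset, and without a floor such as `min_images` it may keep shrinking the dataset toward a handful of perfectly balanced images.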



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow