Is there a good rule out there to choose an appropriate batch size?

The common heuristic is to make the batch size as large as your accelerator's memory allows. However, when training on a small dataset with a large batch size, training can become inefficient: each epoch contains only a few steps, so the model receives too few weight updates per pass over the data for training to converge well.
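The trade-off above can be made concrete by counting gradient updates per epoch. The sketch below (dataset size and batch sizes are hypothetical values chosen for illustration) shows how a larger batch shrinks the number of updates in one pass over the data:

```python
import math

def steps_per_epoch(dataset_size: int, batch_size: int) -> int:
    """Number of gradient updates performed in one pass over the data."""
    return math.ceil(dataset_size / batch_size)

# Hypothetical small dataset of 2,000 examples:
for bs in (16, 128, 1024):
    print(f"batch_size={bs:4d} -> {steps_per_epoch(2000, bs)} updates per epoch")
# batch_size=  16 -> 125 updates per epoch
# batch_size= 128 -> 16 updates per epoch
# batch_size=1024 -> 2 updates per epoch
```

With only 2 updates per epoch at batch size 1024, you would need hundreds of epochs just to match the update count that batch size 16 delivers in a handful of epochs, which is one reason very large batches can be inefficient on small datasets.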



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow