Lowering the batch size eventually results in PyTorch increasing the amount of memory reserved

I'm trying to train a model meant for long text classification. I'm using this repo: https://github.com/franbvalero/BERT-long-sentence-classification/tree/optimizer_lstm But I modified it to work with a different dataset (i.e. I changed the inputs, not anything directly related to the training). I'm also using a different BERT model trained for another language.

The problem I'm facing is that lowering the batch size increases the amount of memory PyTorch reserves. Using a batch size of 64 results in the following error message:

RuntimeError: CUDA out of memory. Tried to allocate 20.95 GiB (GPU 0; 8.00 GiB total capacity; 700.89 MiB already allocated; 5.68 GiB free; 756.00 MiB reserved in total by PyTorch)

That is definitely too high for the capacity available to me, so I try 32 instead:

RuntimeError: CUDA out of memory. Tried to allocate 10.47 GiB (GPU 0; 8.00 GiB total capacity; 602.89 MiB already allocated; 5.77 GiB free; 658.00 MiB reserved in total by PyTorch)

Next, 16:

RuntimeError: CUDA out of memory. Tried to allocate 5.24 GiB (GPU 0; 8.00 GiB total capacity; 5.78 GiB already allocated; 596.27 MiB free; 5.83 GiB reserved in total by PyTorch)

At this point the total reserved by PyTorch has skyrocketed, and the trend continues all the way down to a batch size of 1, where it tries to allocate only a few hundred MiB while PyTorch reserves more than 6 GiB:

RuntimeError: CUDA out of memory. Tried to allocate 336.00 MiB (GPU 0; 8.00 GiB total capacity; 6.20 GiB already allocated; 138.27 MiB free; 6.26 GiB reserved in total by PyTorch)

I'm a little confused as to why this happens. Lowering the batch size does reduce the memory required in terms of the attempted allocation, but it also increases the amount that PyTorch reserves.
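As a sanity check (just arithmetic on the figures from the error messages above, no PyTorch involved), the attempted allocation does scale linearly with batch size, at roughly 0.33 GiB per sample:

```python
# Attempted allocations reported by the CUDA OOM errors above,
# keyed by batch size (in GiB; the batch-size-1 figure is 336 MiB).
attempted = {64: 20.95, 32: 10.47, 16: 5.24, 1: 336 / 1024}

# Per-sample cost for each run: every batch size comes out
# to roughly the same ~0.33 GiB per sample.
per_sample = {bs: gib / bs for bs, gib in attempted.items()}
for bs, cost in per_sample.items():
    print(f"batch size {bs:>2}: {cost:.3f} GiB per sample")
```

So the attempted allocation itself shrinks proportionally, as expected; it is only the "reserved in total by PyTorch" figure that grows between runs.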



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
