Training YOLOv5 on an RTX 3060 Ti GPU fails with "RuntimeError: Unable to find a valid cuDNN algorithm to run convolution"

I am training YOLOv5 with --img 1088 and a batch size of 16 on an RTX 3060 Ti GPU, using the following command:

python train.py --img 1088 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --device 0 --workers 0

This raises the exception "RuntimeError: Unable to find a valid cuDNN algorithm to run convolution". If I reduce the batch size to 8, the model trains without the error. The traceback is:

  File "train.py", line 611, in <module>
    main(opt)
  File "train.py", line 509, in main
    train(opt.hyp, opt, device)
  File "train.py", line 311, in train
    pred = model(imgs)  # forward
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\hamza.m\workspace\yolov5\models\yolo.py", line 123, in forward
    return self.forward_once(x, profile, visualize)  # single-scale inference, train
  File "C:\Users\hamza.m\workspace\yolov5\models\yolo.py", line 155, in forward_once
    x = m(x)  # run
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\hamza.m\workspace\yolov5\models\common.py", line 137, in forward
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\hamza.m\workspace\yolov5\models\common.py", line 45, in forward
    return self.act(self.bn(self.conv(x)))
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

P.S. Can anyone also explain how to evaluate which GPU is best for training my model?



Solution 1:[1]

The answer is in the error log:

RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 8.00 GiB total capacity; 5.48 GiB already allocated; 81.94 MiB free; 5.61 GiB reserved in total by PyTorch)

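PyTorch is trying to allocate more memory than is free on your GPU: it asked for 100.00 MiB when only 81.94 MiB was free. The "Unable to find a valid cuDNN algorithm" error is typically a symptom of the same shortage, which would explain why a smaller batch size works.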
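To make the numbers in that message easier to compare, here is a minimal sketch that pulls the sizes out of the out-of-memory text and computes the shortfall. The helper name `parse_oom_message` is hypothetical (it is not part of PyTorch), and the patterns assume the message wording quoted above, which may differ between PyTorch versions.

```python
import re

# Units as they appear in the message quoted above.
_UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Hypothetical helper: return the sizes quoted in the message, in MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:  # skip fields that are missing from the message
            sizes[name] = float(match.group(1)) * _UNIT_TO_MIB[match.group(2)]
    return sizes


message = (
    "RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB "
    "(GPU 0; 8.00 GiB total capacity; 5.48 GiB already allocated; "
    "81.94 MiB free; 5.61 GiB reserved in total by PyTorch)"
)
sizes = parse_oom_message(message)
shortfall = sizes["requested"] - sizes["free"]
print(f"requested {sizes['requested']:.2f} MiB, "
      f"free {sizes['free']:.2f} MiB, short by {shortfall:.2f} MiB")
```

The request is only about 18 MiB larger than what is free, so even a modest reduction in memory use (a smaller batch or a smaller --img value) may be enough.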

Solution 2:[2]

Try reducing the batch size. I had the same problem, and it went away once I reduced the batch size.
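If you would rather not guess the batch size by hand, the idea can be sketched as a loop that halves the batch size until one training step succeeds. This is only an illustration: `find_fitting_batch_size`, `run_step`, and `fake_step` are hypothetical names, and `fake_step` stands in for a real forward/backward pass by failing above 8, as reported in the question. With real PyTorch code you would probably also want to call torch.cuda.empty_cache() between attempts.

```python
def find_fitting_batch_size(run_step, start_batch_size, min_batch_size=1):
    """Halve the batch size until run_step(batch_size) no longer fails.

    run_step is a caller-supplied function (hypothetical here) that runs one
    training step and raises RuntimeError when the GPU runs out of memory.
    """
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as err:
            text = str(err)
            # Only retry on errors that look memory related; re-raise the rest.
            if "out of memory" not in text and "cuDNN" not in text:
                raise
            batch_size //= 2
    raise RuntimeError("no batch size down to the minimum fits in memory")


def fake_step(batch_size):
    """Stand-in for a real training step: pretend only 8 or fewer images fit."""
    if batch_size > 8:
        raise RuntimeError(
            "Unable to find a valid cuDNN algorithm to run convolution"
        )


best = find_fitting_batch_size(fake_step, start_batch_size=16)
print(best)  # 8
```

The value found this way can then be passed to train.py through --batch, as in the command from the question.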

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  Oscar Rangel
Solution 2  El Mehdi Tafik