'torch.manual_seed(seed) get RuntimeError: CUDA error: device-side assert triggered

I am using GOOGLE COLAB when I get this error. Here is my code, I didn't find anything wrong, these code were right few hour ago but suddenly went wrong, I don't know why

import torch
if torch.cuda.is_available():       
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('We will use the GPU:', torch.cuda.get_device_name(0))
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")
seed=1
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True 

The error is

There are 1 GPU(s) available.
We will use the GPU: Tesla P100-PCIE-16GB
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-121-436d9d8bb120> in <module>()
      9 seed=1
     10 np.random.seed(seed)
---> 11 torch.manual_seed(seed)
     12 torch.cuda.manual_seed_all(seed)
     13 torch.backends.cudnn.deterministic = True

3 frames
/usr/local/lib/python3.7/dist-packages/torch/cuda/random.py in cb()
    109         for i in range(device_count()):
    110             default_generator = torch.cuda.default_generators[i]
--> 111             default_generator.manual_seed(seed)
    112 
    113     _lazy_call(cb, seed_all=True)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Could anyone help me?



Solution 1:[1]

In my experience, this error may occur because of some kind of inconsistency between the number of labels in your targets and the number of classes in your model.

To solve it you can try to:

  1. Make sure that the label in your target data starts from 0. If you have n classes in your data, your target classes should be [0, 1, 2,..., n-1]
  2. Make sure that the model you are using is set to work with n classes

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Carlo Longhi