cudaErrorInvalidAddressSpace: operation not supported on global/shared address space
I'm trying to run EnyaHermite's PyTorch implementation of PicassoNet-II (https://github.com/EnyaHermite/Picasso) on a GPU cluster running Ubuntu 18.04.6 LTS, and I encounter the following error:
terminate called after throwing an instance of 'thrust::system::system_error'
  what(): CUDA free failed: cudaErrorInvalidAddressSpace: operation not supported on global/shared address space
The framework calls a few C++ functions from its main Python script, and one of them lives in this decimate_gpu.cu file (https://github.com/EnyaHermite/Picasso/blob/main/pytorch/picasso/mesh/modules/source/decimate_gpu.cu).
I cannot debug the file since I'm running it on a GPU cluster; I only know that the crash happens because of this file. I've only seen one similar issue (here: https://forums.developer.nvidia.com/t/invalidaddressspace-when-using-pointer-from-continuation-callable-parameters/184951/7). The issue in that post was an incorrect definition of a callable, and changing it from __global__ to __device__ made it work.
I'm not sure whether my error has the same cause, and I have no idea how to fix it.
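For context, the __global__ versus __device__ distinction from the linked forum post can be illustrated with a minimal sketch. It is not taken from Picasso or from the forum thread: the names scale and apply_scale are hypothetical, and it only shows a __device__ helper called from a __global__ kernel, followed by an explicit synchronize so that a kernel fault would be reported right after the launch rather than by a later call such as cudaFree.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper. __device__ means it can only be called from code
// that is already running on the GPU.
__device__ float scale(float x) { return 2.0f * x; }

// Hypothetical kernel. __global__ means it is launched from the host
// with the <<<grid, block>>> syntax.
__global__ void apply_scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scale(data[i]);
}

int main() {
    const int n = 4;
    float host[n] = {1.0f, 2.0f, 3.0f, 4.0f};

    float* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    apply_scale<<<1, n>>>(dev, n);

    // Kernel errors are reported asynchronously, so synchronize here and
    // check the result before any later call can surface the error instead.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i) std::printf("%.1f ", host[i]);
    std::printf("\n");
    return 0;
}
```

This can be compiled with nvcc (for example, nvcc sketch.cu -o sketch, where sketch.cu is a placeholder file name). Whether decimate_gpu.cu has a problem of this kind is something I have not been able to confirm.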
Best, Bjonze
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow