cudaErrorInvalidAddressSpace: operation not supported on global/shared address space

I'm trying to run EnyaHermite's PyTorch implementation of PicassoNet-II (https://github.com/EnyaHermite/Picasso) on an Ubuntu 18.04.6 LTS GPU cluster, and I get the following error:

terminate called after throwing an instance of 'thrust::system::system_error' what(): CUDA free failed: cudaErrorInvalidAddressSpace: operation not supported on global/shared address space

The framework calls a few compiled C++/CUDA functions from its main Python script, and one of them is defined in decimate_gpu.cu (https://github.com/EnyaHermite/Picasso/blob/main/pytorch/picasso/mesh/modules/source/decimate_gpu.cu).

I cannot debug the file interactively because I'm running it on a GPU cluster; I only know that the crash happens because of this file. I've seen only one similar issue (here: https://forums.developer.nvidia.com/t/invalidaddressspace-when-using-pointer-from-continuation-callable-parameters/184951/7). The problem in that post was an incorrectly defined callable, and changing its qualifier from __global__ to __device__ made it work.

I'm not sure whether my error has the same cause, and I have no idea how to fix it.
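Since an interactive debugger is not available on the cluster, one possible way to narrow this down is a batch run with synchronous kernel launches. CUDA kernel launches are asynchronous, so a fault inside a kernel can surface at a later, unrelated call such as the cudaFree in the message above. The sketch below is only an assumption about how such a run could be set up: the helper names and the script name train.py are hypothetical, and compute-sanitizer is used only if it happens to be installed on the node.

```python
import os
import shutil
import subprocess
import sys


def build_debug_env(base_env=None):
    """Return a copy of the environment with synchronous kernel launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_LAUNCH_BLOCKING=1 makes each kernel launch block until it finishes,
    # so a failure is reported at the launch that caused it.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def build_debug_command(script, script_args=(), use_sanitizer=True):
    """Build the command line, prefixing compute-sanitizer when it is on PATH."""
    cmd = [sys.executable, script, *script_args]
    if use_sanitizer and shutil.which("compute-sanitizer"):
        # The memcheck tool reports invalid device memory accesses.
        cmd = ["compute-sanitizer", "--tool", "memcheck", *cmd]
    return cmd


def run_debug(script, script_args=()):
    """Run the script non-interactively and return (exit code, combined output)."""
    result = subprocess.run(
        build_debug_command(script, script_args),
        env=build_debug_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout
```

In a cluster job script this could be called as run_debug("train.py"), with the returned log written to a file for later inspection. The run will be noticeably slower than normal, so a small input is advisable.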

Best, Bjonze



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
