How to run a CUDA cooperative template kernel

I am trying, unsuccessfully, to launch a template kernel as a cooperative kernel in CUDA C++. What am I doing wrong?

The error is:

Error: cannot determine which instance of function template "boolPrepareKernel" is intended

I try to invoke the kernel like this:

    ForBoolKernelArgs<int> fbArgs = ...;

    int device = 0;
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, device);
    cudaLaunchCooperativeKernel((void*)boolPrepareKernel, deviceProp.multiProcessorCount, fFArgs.threads, fbArgs);

The kernel is defined like this:

template <typename TYO>
__global__ void boolPrepareKernel(ForBoolKernelArgs<TYO> fbArgs) {
...
}

I tried to parameterize the launch (in this example with int) like this:

    cudaLaunchCooperativeKernel((void*)(<int>boolPrepareKernel), deviceProp.multiProcessorCount, fFArgs.threads, fbArgs);

but I get the error

no instance of overloaded function matches the argument list; argument types are: (<error-type>, int, dim3, ForBoolKernelArgs<int>)

For the suggested form

cudaLaunchCooperativeKernel((void*)(boolPrepareKernel<int>), deviceProp.multiProcessorCount, fFArgs.threads, fbArgs)

the error is

no instance of overloaded function matches the argument list; argument types are: (void *, int, dim3, ForBoolKernelArgs<int>)

This is probably something simple, but I am stuck. Thanks for any help!

For reference, an ordinary kernel launch like

boolPrepareKernel<<<fFArgs.blocks, fFArgs.threads>>>(fbArgs);

works, but of course grid synchronization is unavailable.
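
For comparison, here is a minimal sketch of a cooperative launch of the int instantiation. It relies on the runtime API signature cudaLaunchCooperativeKernel(func, gridDim, blockDim, void** args, sharedMem, stream), which expects the kernel parameters as an array of pointers rather than the argument object itself; that would be consistent with the "no instance of overloaded function" message above. The fields of ForBoolKernelArgs (data, n), the kernel body and the block size of 128 are hypothetical placeholders, not the original code.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical stand-in for the real argument struct (fields are assumptions).
template <typename TYO>
struct ForBoolKernelArgs {
    TYO* data;  // device buffer
    int n;      // number of elements
};

template <typename TYO>
__global__ void boolPrepareKernel(ForBoolKernelArgs<TYO> fbArgs) {
    cg::grid_group grid = cg::this_grid();
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    // Phase 1: every thread in the grid fills part of the buffer.
    for (int i = idx; i < fbArgs.n; i += stride) fbArgs.data[i] = TYO(1);

    // Grid-wide barrier; only valid when launched cooperatively.
    grid.sync();

    // Phase 2: one thread reads values written by other blocks.
    if (idx == 0) {
        TYO sum = TYO(0);
        for (int i = 0; i < fbArgs.n; ++i) sum += fbArgs.data[i];
        fbArgs.data[0] = sum;
    }
}

int main() {
    int device = 0;
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, device);
    if (!deviceProp.cooperativeLaunch) {
        std::printf("cooperative launch not supported on this device\n");
        return 1;
    }

    ForBoolKernelArgs<int> fbArgs{};
    fbArgs.n = 1024;
    cudaMalloc(&fbArgs.data, fbArgs.n * sizeof(int));

    // All blocks must be co-resident; one block per SM is a conservative choice.
    dim3 blocks(deviceProp.multiProcessorCount);
    dim3 threads(128);

    // Kernel parameters are passed as an array of pointers, one per parameter.
    void* kernelArgs[] = { (void*)&fbArgs };

    // Naming the instantiation explicitly (<int>) resolves the template ambiguity.
    cudaError_t err = cudaLaunchCooperativeKernel(
        (void*)boolPrepareKernel<int>, blocks, threads, kernelArgs, 0, nullptr);
    if (err != cudaSuccess) {
        std::printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaDeviceSynchronize();

    int result = 0;
    cudaMemcpy(&result, fbArgs.data, sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("sum after grid.sync(): %d\n", result);

    cudaFree(fbArgs.data);
    return 0;
}
```

In the sketch, the template argument goes after the kernel name (boolPrepareKernel<int>), and the address of fbArgs is wrapped in the kernelArgs array instead of being passed by value. Depending on the toolkit version and target architecture, grid synchronization may also need relocatable device code (-rdc=true) when compiling.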



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow