Are there performance/storage differences between uint2 and uint64_t in CUDA 10+?
I'm trying to optimize a piece of code for A100 GPUs (Ampere generation). Right now we use uint64_t, but I am seeing the uint2 datatype used instead in some CUDA code. Does uint2 offer any advantage for register usage? My understanding is that there is a limited number of 64-bit registers. Does uint2 split its x and y ints across 32-bit registers for better occupancy? I couldn't find any specific information about how these datatypes are stored in registers, so any links to documentation would be appreciated.
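For concreteness, here is a minimal sketch of the kind of comparison I have in mind, not code from our project. The kernel names `xor_u64` and `xor_u2` and the helpers `split64` and `join64` are hypothetical names made up for illustration. Both kernels do the same 64-bit XOR, once on uint64_t and once on uint2, so their resource usage can be compared side by side.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Both types occupy 8 bytes with 8-byte alignment, so memory storage is the same.
static_assert(sizeof(uint2) == sizeof(uint64_t), "uint2 and uint64_t have the same size");
static_assert(alignof(uint2) == 8, "uint2 is 8-byte aligned");

// Hypothetical helper: split a 64-bit value into low (x) and high (y) halves.
__host__ __device__ inline uint2 split64(uint64_t v) {
    return make_uint2(static_cast<unsigned int>(v),
                      static_cast<unsigned int>(v >> 32));
}

// Hypothetical helper: rebuild the 64-bit value from its two halves.
__host__ __device__ inline uint64_t join64(uint2 v) {
    return (static_cast<uint64_t>(v.y) << 32) | v.x;
}

// Element-wise XOR on native 64-bit integers.
__global__ void xor_u64(const uint64_t* a, const uint64_t* b, uint64_t* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] ^ b[i];
}

// The same operation written against uint2, one 32-bit half at a time.
__global__ void xor_u2(const uint2* a, const uint2* b, uint2* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        uint2 x = a[i];
        uint2 y = b[i];
        out[i] = make_uint2(x.x ^ y.x, x.y ^ y.y);
    }
}
```

Assuming the file is saved as `compare.cu` (a made-up name), compiling with `nvcc -arch=sm_80 -Xptxas -v -c compare.cu` prints the register count per kernel, and `cuobjdump -sass` on the resulting object shows the generated instructions for each variant.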
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
