Independent Thread Scheduling since Volta
NVIDIA introduced Independent Thread Scheduling for its GPGPUs starting with Volta. When CUDA threads within a warp diverge, the alternative code paths are no longer executed as whole blocks, one after the other, but can be interleaved instruction by instruction. Divergent paths still cannot execute at the same time, because the GPUs remain SIMT. This is the original article:
https://developer.nvidia.com/blog/inside-volta/ (scroll down to "Independent Thread Scheduling").
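To make the divergence described above concrete, here is a minimal sketch of a kernel in which even and odd lanes of each warp take different branches. The names `divergent_paths` and `run_divergence_demo`, the array size, and the arithmetic in each branch are all hypothetical and chosen only for illustration. As far as I understand the article, a pre-Volta GPU would run one branch to completion before starting the other, while Volta and newer may interleave instructions of the two branches; in either case the lanes on different paths are not active in the same cycle. This sketch only checks the results and is not a benchmark, so it says nothing about a speed-up.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: even lanes take path A, odd lanes take path B,
// so every warp diverges at the inner if/else.
__global__ void divergent_paths(const int* in, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int v = 0;
    if (i < n) {
        v = in[i];
        if (threadIdx.x % 2 == 0) {
            v = v * 2;   // path A, first instruction
            v = v + 1;   // path A, second instruction
        } else {
            v = v * 3;   // path B, first instruction
            v = v - 1;   // path B, second instruction
        }
    }
    // Explicit reconvergence point for the warp (available since CUDA 9).
    // It is placed outside the branches so that every lane reaches it.
    __syncwarp();
    if (i < n) {
        out[i] = v;
    }
}

// Hypothetical host helper: runs the kernel on 64 values and returns the
// number of wrong results (0 on success), or -1 if a CUDA call failed.
int run_divergence_demo()
{
    const int n = 64;  // assumption: a multiple of the warp size (32)
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) {
        h_in[i] = i;
        h_out[i] = 0;
    }

    int* d_in = nullptr;
    int* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    cudaError_t err = cudaMemcpy(d_in, h_in, n * sizeof(int),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        divergent_paths<<<n / 32, 32>>>(d_in, d_out, n);
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    // Verify both paths on the host: even indices took A, odd took B.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        int expected = (i % 2 == 0) ? (2 * i + 1) : (3 * i - 1);
        if (h_out[i] != expected) ++mismatches;
    }
    return mismatches;
}
```

Calling `run_divergence_demo()` from a `main` function and compiling with `nvcc` should be enough to try it. The results are the same on pre-Volta and Volta-class hardware; only the order in which the instructions of the two branches are issued may differ.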
I understand what this means. What I don't understand is how this new behavior speeds up code. Even the before/after diagrams in the article above do not show an overall speed-up.
My question: Which kinds of divergent algorithms will run faster on Volta (and newer) due to the described new scheduling?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
