What is the maximum block count possible in CUDA?

Theoretically, you can have 65535 blocks per grid dimension, i.e. up to 65535 × 65535 × 65535 blocks in total.

If you call a kernel like this:

kernel<<< BLOCKS,THREADS >>>()

(without dim3 objects), what is the maximum number available for BLOCKS?

In an application of mine, I set it to 192000 and it seemed to work fine. The problem is that the kernel modifies a huge array; although I spot-checked parts of the array and they looked fine, I can't be sure the kernel didn't behave strangely elsewhere.

For the record, I have a compute capability 2.1 GPU, a GTX 500 Ti.
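Rather than relying on a remembered figure, you can query the grid limits of your own device at runtime. A minimal sketch using the CUDA runtime API (device 0 is assumed):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // query device 0

    printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    printf("Max grid size: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    return 0;
}
```

`maxGridSize[0]` is the limit that applies to `BLOCKS` in a `kernel<<<BLOCKS, THREADS>>>()` launch, since an integer argument is treated as the x-dimension of a `dim3`.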



Solution 1:[1]

In case anybody lands here based on a Google search (as I just did):

Nvidia has changed the specification since this question was asked. From compute capability 3.0 onward, the x-dimension of a grid of thread blocks may be up to 2,147,483,647, i.e. 2^31 − 1.

See the current Technical Specifications table in the CUDA Programming Guide.

Solution 2:[2]

65535 in a single dimension. Here's the complete table.
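On hardware where the x-dimension is capped at 65535, a common way to process a huge array without exceeding the limit is a grid-stride loop: the grid stays small, and each thread walks through the array in steps of the total thread count. A sketch, with a hypothetical `scale` kernel standing in for the asker's array-modifying kernel:

```cuda
// Hypothetical kernel: scales every element of a large array.
// A grid-stride loop lets a fixed, limit-safe grid cover any n.
__global__ void scale(float *data, size_t n, float factor) {
    size_t stride = (size_t)gridDim.x * blockDim.x;  // total threads in grid
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += stride) {
        data[i] *= factor;
    }
}

// Launch with a block count clamped to the pre-3.0 limit, e.g.:
//   size_t blocksNeeded = (n + 255) / 256;
//   int blocks = (int)(blocksNeeded < 65535 ? blocksNeeded : 65535);
//   scale<<<blocks, 256>>>(d_data, n, 2.0f);
```

With this pattern, correctness no longer depends on the block count matching the array size, so the same kernel works whether the grid limit is 65535 or 2^31 − 1.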

Solution 3:[3]

I manually checked on my laptop (MX130): the program crashes when #blocks > 678*1024 + 651, with one thread per block. Adding even a single additional block gives a segfault. The kernel used a one-dimensional (linear) grid only.
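Crashes or silently wrong results from an oversized launch can be caught explicitly: an out-of-range grid dimension is reported as a launch error by `cudaGetLastError()`, and errors during execution surface via `cudaDeviceSynchronize()`. A minimal sketch with a trivial placeholder kernel:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy() {}  // trivial kernel for demonstration

int main() {
    // A grid dimension beyond the device limit would make this launch
    // fail with cudaErrorInvalidConfiguration instead of running.
    dummy<<<65535, 1>>>();

    cudaError_t err = cudaGetLastError();   // catches bad launch configurations
    if (err != cudaSuccess)
        fprintf(stderr, "Launch failed: %s\n", cudaGetErrorString(err));

    err = cudaDeviceSynchronize();          // catches errors during execution
    if (err != cudaSuccess)
        fprintf(stderr, "Kernel error: %s\n", cudaGetErrorString(err));
    return 0;
}
```

Checking both calls after every launch is the reliable way to know whether a kernel with a large block count actually ran, rather than spot-checking the output array.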

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Marco
Solution 2 jwdmsd
Solution 3 Varun