OpenCL FPGA: Two copies of the same kernel are not executed in parallel, and there is idle time between them

My goal is to compute two 4K-point FFTs together. To do that, I created two kernel objects from the same kernel and enqueued both tasks at once, i.e. without any buffer reads or writes or any callbacks in between. I found that they do not run in parallel. In addition, there is some idle time between the two executions. Can someone please explain?

[Image: AOCL report of the program]

I expected both of them to run in parallel because the FPGA still has spare area: only about 38 percent of it is used.



Solution 1:[1]

I found this question that kind of answers my doubts. It can be found here.

Solution 2:[2]

An OpenCL command queue is in-order by default, so kernels enqueued on the same queue execute one after the other. This ensures that if kernel 2 reads memory that kernel 1 has updated, there is no race condition, as there could be if they ran concurrently. There may also be some latency before each kernel starts executing, which may account for the idle time between them.

To run multiple kernels in parallel, you can try multiple queues.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Raghuttam Hombal
Solution 2