'How to correctly measure the running time of an opencl program
When I used opencl to test the performance of the program, I encountered some doubts. The original intention of our experiment was to test the calculation time of the program, but there were some problems in the actual process. In the following code, timex uses the method of binding events to calculate the results. However, in the actual results, we found that (end_time - start_time) < time1+time2+time3 . We don't know why, and we're also curious how exactly we should calculate the running time of the task.
start_time = clock()
compute(){
writeData(); // time1
clEnqueueNDRangeKernel(); // time2
readData(); //time3
do_other();
}
end_time = clock()
Solution 1:[1]
It depends on what exactly you want to measure. If it is just for profiling, you could be interested in time1, time2 and time3 separately. If it is for measuring the performance of the compute(), you want to measure its entire runtime.
To measure time with nanosecond precision, use this clock:
#include <chrono>
class Clock {
private:
typedef chrono::high_resolution_clock clock;
chrono::time_point<clock> t;
public:
Clock() { start(); }
void start() { t = clock::now(); }
double stop() const { return chrono::duration_cast<chrono::duration<double>>(clock::now()-t).count(); }
};
In your example, the issue with (end_time-start_time) < time1+time2+time3 may have two possible reasons:
- Your clock implementation is not accurate enough and it is just rounding error.
- You didn't use
clFinish(queue);. The OpenCL commands are only enqueued, but not executed right away. SoclEnqueueNDRangeKernel();for example immediately returns, and if you measure time before and time after, you get practically 0. To wait until the Kernel really has been executed, you need to callclFinishafterards.
Example:
Measure time1, time2 and time3 separately:
compute() {
//clFinish(queue); // if there could still be stuff in the queue, finish it first
Clock clock;
clock.start();
writeData(); // time1
clFinish(queue);
const double time1 = clock.stop(); // in seconds
clock.start();
clEnqueueNDRangeKernel(); // time2
clFinish(queue);
const double time2 = clock.stop(); // in seconds
clock.start();
readData(); //time3
clFinish(queue);
const double time3 = clock.stop(); // in seconds
do_other();
clFinish(queue); // don't forget to finish the queue!
}
To measure the entire execution time of compute(), you only need one clFinish at the end.
compute() {
//clFinish(queue); // if there could still be stuff in the queue, finish it first
Clock clock;
clock.start();
writeData(); // time1
clEnqueueNDRangeKernel(); // time2
readData(); //time3
do_other();
clFinish(queue); // don't forget to finish the queue!
const double time = clock.stop(); // in seconds
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ProjectPhysX |
