Effect of min_granularity_ns on performance
To find the effect of the kernel parameter min_granularity_ns, I ran a 16-thread OpenMP (OMP) implementation of a matrix multiplication code with a low and a high value of that parameter. The perf results are shown below:
```
# echo 1000000 > /sys/kernel/debug/sched/min_granularity_ns
# perf stat -a -e $EVENTS -- ./mm_omp 16
Using 16 threads
Total execution Time in seconds: 12.3690895601
MM execution Time in seconds: 12.2312941169
Performance counter stats for 'system wide':
911.97 Joules power/energy-pkg/
218,012,129,383 instructions # 0.26 insn per cycle
823,773,717,094 cycles
37,701 context-switches
131 cpu-migrations
51,012 page-faults
12.369310043 seconds time elapsed
```
```
# echo 1000000000 > /sys/kernel/debug/sched/min_granularity_ns
# perf stat -a -e $EVENTS -- ./mm_double_omp 16
Using 16 threads
Total execution Time in seconds: 12.3981724780
MM execution Time in seconds: 12.2612874920
Performance counter stats for 'system wide':
881.48 Joules power/energy-pkg/
218,063,319,724 instructions # 0.27 insn per cycle
822,622,830,036 cycles
37,959 context-switches
146 cpu-migrations
51,553 page-faults
12.400958939 seconds time elapsed
```
As you can see, the execution time is essentially unchanged (12.37 s vs. 12.40 s) despite the large change in the kernel parameter, from 1 ms to 1 s. Although there are other scheduler parameters in addition to min_granularity_ns, does this lack of difference make sense? Or is this simply not a suitable program for the test?
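Whether the "no-difference" result makes sense can be reasoned about with a simplified model of how the CFS scheduler turns min_granularity_ns into a time slice. The sketch is not the kernel's code: the function name `slice_ns` is hypothetical, it assumes equal task weights and a latency_ns of 6 ms, and it ignores the CPU-count scaling of the tunables and wakeup preemption. In this model min_granularity_ns only matters when several runnable tasks share one CPU; in CFS the tick-time preemption check only runs when a CPU's run queue holds more than one task, so if each of the 16 threads has a CPU to itself, the value has little to act on.

```bash
#!/usr/bin/env bash
# Simplified model (not the kernel's code) of a CFS time slice:
#   period = max(latency_ns, nr_running * min_granularity_ns)
#   slice  = period / nr_running   (all tasks have equal weight)
# Ignores tunable scaling by CPU count and wakeup preemption.

# slice_ns <nr_running> <latency_ns> <min_granularity_ns>
slice_ns() {
    local nr=$1 latency=$2 gran=$3
    local period=$latency
    # Once nr_running tasks no longer fit into one latency period,
    # the period stretches so that each task gets min_granularity_ns.
    if [ $(( nr * gran )) -gt "$latency" ]; then
        period=$(( nr * gran ))
    fi
    echo $(( period / nr ))
}

latency=6000000   # assumed latency_ns of 6 ms
for gran in 1000000 1000000000; do
    for nr in 2 4 8; do
        echo "min_granularity_ns=$gran nr_running=$nr slice_ns=$(slice_ns "$nr" "$latency" "$gran")"
    done
done
```

In this model, with the 1 ms setting the slices for 2 to 8 tasks on one CPU stay between 1 ms and 3 ms, while with the 1 s setting every task gets a 1 s slice; with only one task per CPU there is nothing to switch between in either case.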
UPDATE 1: I tested another implementation that uses CBLAS with 16 threads. As you can see, for a large matrix size (20k) the IPC is 1.77, which is acceptable. Again, varying min_granularity_ns makes no difference in execution time, although the number of context switches drops from 273,271 to 125,643 with the larger granularity.
```
# echo 1000000 > /sys/kernel/debug/sched/min_granularity_ns
# perf stat -a -e $EVENTS -- ./mm_blas
Total execution Time in seconds: 47.0226452020
MM execution Time in seconds: 37.1756865050
Performance counter stats for 'system wide':
3,106.80 Joules power/energy-pkg/
3,943,151,227,404 instructions # 1.77 insn per cycle
2,230,425,316,645 cycles
273,271 context-switches
383 cpu-migrations
2,360,017 page-faults
47.272118708 seconds time elapsed
```
```
# echo 1000000000 > /sys/kernel/debug/sched/min_granularity_ns
# perf stat -a -e $EVENTS -- ./mm_blas
Total execution Time in seconds: 46.8930790700
MM execution Time in seconds: 37.0639640210
Performance counter stats for 'system wide':
3,080.33 Joules power/energy-pkg/
3,924,979,103,204 instructions # 1.77 insn per cycle
2,223,571,579,672 cycles
125,643 context-switches
355 cpu-migrations
2,358,432 page-faults
47.148148344 seconds time elapsed
```
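The ratios quoted in the update can be recomputed from the raw counters of the two CBLAS runs. The helpers `ipc` and `pct_change` are hypothetical names; they use awk for floating-point division, and the counter values are copied by hand with the thousands separators removed.

```bash
#!/usr/bin/env bash
# Recompute the ratios discussed in UPDATE 1 from raw perf stat counters.

# ipc <instructions> <cycles>: instructions per cycle, 2 decimals
ipc() {
    awk -v i="$1" -v c="$2" 'BEGIN { printf "%.2f\n", i / c }'
}

# pct_change <before> <after>: relative change in percent, 1 decimal
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

# CBLAS runs: min_granularity_ns = 1 ms vs. 1 s
echo "IPC (1 ms):               $(ipc 3943151227404 2230425316645)"
echo "IPC (1 s):                $(ipc 3924979103204 2223571579672)"
echo "context-switch change:    $(pct_change 273271 125643)%"
echo "MM execution time change: $(pct_change 37.1756865050 37.0639640210)%"
```

With these numbers the context switches fall by about 54% while the MM execution time changes by only about 0.3%.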
Still, I wonder what effect this parameter has on performance.
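A single pair of runs cannot separate a small effect from run-to-run noise. A minimal sketch of a more systematic measurement is a harness that writes several values to the tunable and lets `perf stat -r` repeat each run, which reports the mean and its variation. The names `sweep`, `set_gran`, `run`, `GRAN_VALUES` and `DRY_RUN` are hypothetical; the sketch assumes root access and the debugfs file used above, and the default event list is an assumption because the question's `$EVENTS` is not shown.

```bash
#!/usr/bin/env bash
# Hypothetical sweep harness: save as a file, source it, then call `sweep`.
# Assumes root and the debugfs file used in the question; adjust
# GRAN_FILE if your kernel exposes the tunable elsewhere.

GRAN_FILE=${GRAN_FILE:-/sys/kernel/debug/sched/min_granularity_ns}
# Assumed event list; the question's $EVENTS is not shown.
EVENTS=${EVENTS:-cycles,instructions,context-switches,cpu-migrations,page-faults}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Write one value (in ns) to the tunable, or print the write when DRY_RUN=1.
set_gran() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'echo %s > %s\n' "$1" "$GRAN_FILE"
    else
        echo "$1" > "$GRAN_FILE"
    fi
}

# sweep <repeats> <command...>
# Tries every value in GRAN_VALUES (default: 1 ms and 1 s), running the
# command <repeats> times per value, then restores the original setting.
sweep() {
    local repeats=$1
    shift
    local original v
    original=$(cat "$GRAN_FILE" 2>/dev/null)
    for v in ${GRAN_VALUES:-1000000 1000000000}; do
        set_gran "$v"
        run perf stat -a -r "$repeats" -e "$EVENTS" -- "$@"
    done
    if [ -n "$original" ]; then
        set_gran "$original"
    fi
}
```

For example, `sweep 5 ./mm_omp 16` would run the OpenMP binary five times per value, and `DRY_RUN=1 sweep 5 ./mm_omp 16` only prints the commands it would execute.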
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow