Effect of min_granularity_ns on performance

To measure the effect of the kernel scheduler parameter min_granularity_ns, a 16-thread OpenMP implementation of matrix multiplication is launched with a low and then a high value of that parameter. The perf results are shown below:
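For reference, the source of mm_double_omp is not shown here; the kernel presumably looks something like the following minimal sketch, where the matrix dimension N and the initialization values are assumptions (compile with gcc -fopenmp):

/* Hypothetical sketch of mm_double_omp: a naive O(N^3) double-precision
   matrix multiply parallelized over rows with OpenMP. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 2048   /* assumed matrix dimension */

int main(int argc, char **argv)
{
    int nthreads = (argc > 1) ? atoi(argv[1]) : 1;
    omp_set_num_threads(nthreads);
    printf("Using %d threads\n", nthreads);

    double *a = malloc(sizeof(double) * (size_t)N * N);
    double *b = malloc(sizeof(double) * (size_t)N * N);
    double *c = malloc(sizeof(double) * (size_t)N * N);
    for (size_t i = 0; i < (size_t)N * N; i++) {
        a[i] = 1.0; b[i] = 2.0; c[i] = 0.0;
    }

    double t0 = omp_get_wtime();
    /* Each thread gets a contiguous block of rows and stays runnable
       for the whole computation, so there is little for the scheduler
       to preempt. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c[(size_t)i * N + j] += a[(size_t)i * N + k] * b[(size_t)k * N + j];
    printf("MM execution Time in seconds: %.10f\n", omp_get_wtime() - t0);

    free(a); free(b); free(c);
    return 0;
}

Note that in such a workload, if the machine has at least 16 cores, each thread runs CPU-bound on its own core with no competing runnable task, which is worth keeping in mind when interpreting the results below.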

# echo 1000000 > /sys/kernel/debug/sched/min_granularity_ns
# perf stat -a -e $EVENTS -- ./mm_double_omp 16
Using 16 threads
Total execution Time in seconds: 12.3690895601
MM execution Time in seconds: 12.2312941169

 Performance counter stats for 'system wide':

            911.97 Joules power/energy-pkg/
   218,012,129,383        instructions              #    0.26  insn per cycle
   823,773,717,094        cycles
            37,701        context-switches
               131        cpu-migrations
            51,012        page-faults

      12.369310043 seconds time elapsed

# echo 1000000000 > /sys/kernel/debug/sched/min_granularity_ns
# perf stat -a -e $EVENTS -- ./mm_double_omp 16
Using 16 threads
Total execution Time in seconds: 12.3981724780
MM execution Time in seconds: 12.2612874920

 Performance counter stats for 'system wide':

            881.48 Joules power/energy-pkg/
   218,063,319,724        instructions              #    0.27  insn per cycle
   822,622,830,036        cycles
            37,959        context-switches
               146        cpu-migrations
            51,553        page-faults

      12.400958939 seconds time elapsed

As you can see, there is no difference between the results despite the large change in the kernel parameter, from 1 ms to 1 s. Although there are other scheduler parameters in addition to min_granularity_ns, does that "no difference" make sense? Or is this perhaps not the right kind of program to test with?


UPDATE 1: I tested another implementation, which uses CBLAS and also runs with 16 threads. As shown below, for a large matrix size (20k x 20k) the IPC is 1.77, which is acceptable. Again, varying min_granularity_ns makes no difference in execution time, although the number of context switches decreases with the larger granularity.
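The mm_blas source is likewise not shown; presumably it boils down to a single cblas_dgemm call through a multithreaded BLAS such as OpenBLAS. A minimal sketch, in which the matrix size (20k, as stated above) and the initialization are assumptions:

/* Hypothetical sketch of mm_blas: C = A * B in double precision via CBLAS.
   Link against a multithreaded BLAS, e.g. gcc mm_blas.c -o mm_blas -lopenblas */
#include <stdio.h>
#include <stdlib.h>
#include <cblas.h>

#define N 20000   /* 20k x 20k, about 3.2 GB per matrix */

int main(void)
{
    double *a = malloc(sizeof(double) * (size_t)N * N);
    double *b = malloc(sizeof(double) * (size_t)N * N);
    double *c = malloc(sizeof(double) * (size_t)N * N);
    if (!a || !b || !c) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (size_t i = 0; i < (size_t)N * N; i++) {
        a[i] = 1.0; b[i] = 2.0; c[i] = 0.0;
    }

    /* C = 1.0 * A * B + 0.0 * C; the BLAS library spawns and manages
       its own worker threads internally. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                N, N, N, 1.0, a, N, b, N, 0.0, c, N);

    printf("c[0] = %f\n", c[0]);
    free(a); free(b); free(c);
    return 0;
}

With OpenBLAS, the 16 threads would typically be set through OPENBLAS_NUM_THREADS or OMP_NUM_THREADS rather than a command-line argument, which is consistent with ./mm_blas being invoked with no arguments below.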

# echo 1000000 > /sys/kernel/debug/sched/min_granularity_ns
# perf stat -a -e $EVENTS -- ./mm_blas
Total execution Time in seconds: 47.0226452020
MM execution Time in seconds: 37.1756865050

 Performance counter stats for 'system wide':

          3,106.80 Joules power/energy-pkg/
 3,943,151,227,404        instructions              #    1.77  insn per cycle
 2,230,425,316,645        cycles
           273,271        context-switches
               383        cpu-migrations
         2,360,017        page-faults

      47.272118708 seconds time elapsed

# echo 1000000000 > /sys/kernel/debug/sched/min_granularity_ns
# perf stat -a -e $EVENTS -- ./mm_blas
Total execution Time in seconds: 46.8930790700
MM execution Time in seconds: 37.0639640210

 Performance counter stats for 'system wide':

          3,080.33 Joules power/energy-pkg/
 3,924,979,103,204        instructions              #    1.77  insn per cycle
 2,223,571,579,672        cycles
           125,643        context-switches
               355        cpu-migrations
         2,358,432        page-faults

      47.148148344 seconds time elapsed

Still, I wonder what effect this parameter actually has on performance.


