Recommendation for useful nvfortran compiler flags

I have used gfortran for years but am quite new to nvfortran. Can anyone recommend useful nvfortran compiler flags for both debug and optimized builds?

What I know so far for debug mode is:

-C -g -Mbounds -traceback

and for build mode (with optimizations):

-O3 -Mconcur


Solution 1:[1]

We generally recommend using "-fast", "-O3", or "-fast -O3" for general performance. "-Mconcur" enables auto-parallelization, which may or may not help. In general, it's better to use explicit parallelization via OpenACC or OpenMP directives, or Fortran "DO CONCURRENT".
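As an illustration of the explicit route, here is a minimal sketch using DO CONCURRENT. The file name saxpy.f90, the test program itself, and the choice of "-stdpar=multicore" (CPU threads rather than a GPU) are assumptions for the example, not part of the question. The compile step only runs if nvfortran is on the PATH.

```shell
# Write a small DO CONCURRENT test program (saxpy.f90 is an arbitrary name).
cat > saxpy.f90 <<'EOF'
program saxpy
  implicit none
  integer, parameter :: n = 1000000
  real :: x(n), y(n)
  integer :: i
  x = 1.0
  y = 2.0
  ! Iterations are independent, so the compiler is free to run them in parallel.
  do concurrent (i = 1:n)
    y(i) = 2.0 * x(i) + y(i)
  end do
  print *, 'y(1) =', y(1)
end program saxpy
EOF

# Build and run only when the compiler is available.
if command -v nvfortran >/dev/null 2>&1; then
  nvfortran -fast -stdpar=multicore -Minfo saxpy.f90 -o saxpy && ./saxpy
else
  echo "nvfortran not found; skipping compile"
fi
```

Swapping "-stdpar=multicore" for "-stdpar=gpu" would target a GPU instead, assuming one is present.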

Other potentially useful optimization flags:

-Mnouniform - Allow non-uniform computation of SIMD and scalar code. Faster, but may reduce accuracy somewhat.

-Mstack_arrays - Allocate automatic arrays on the stack rather than the heap. Faster but uses more stack. You may need to increase the program's stack in your shell environment.

-Bstatic-nvidia - Link the compiler runtime libraries statically rather than dynamically.

-Mfprelaxed - Allow use of faster but reduced precision intrinsics and floating-point computations.

-mp[=gpu] - Enable OpenMP directives and optionally enable target offload to GPUs.

-acc[=multicore] - Enable OpenACC directives, defaults to offload to GPUs, use "multicore" to target multicore CPUs.

-stdpar[=gpu] - Enable parallelization of DO CONCURRENT to host or GPU.
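On the stack note for "-Mstack_arrays": in shells that support "ulimit -s" (bash, dash, ksh, zsh), the limit can be raised before running the program. The sketch below is one way to do it. The helper name raise_stack_limit is made up for illustration, and it raises the soft limit only as far as the hard limit, since going beyond that usually needs extra privileges.

```shell
# Hypothetical helper: raise the soft stack limit to the hard limit for this
# shell, then print the resulting soft limit (a size in KiB or "unlimited").
raise_stack_limit() {
  ulimit -S -s "$(ulimit -H -s)" && ulimit -S -s
}

raise_stack_limit
# ./my_program   # placeholder for a binary built with -Mstack_arrays
```

The new limit applies to the current shell and anything started from it, so call the helper in the same session (or job script) that launches the program.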

The debugging flags are fine, though "-C" and "-Mbounds" both enable bounds checking so only one is needed.
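Putting the two modes side by side, a small shell helper can keep the flag sets in one place. This is only a sketch: the function name nv_flags and the mode names "debug" and "release" are invented for the example, and the flag sets are just the ones discussed here, with "-C" dropped because "-Mbounds" already covers it.

```shell
# Hypothetical helper: print the nvfortran flags for a given build mode.
nv_flags() {
  case "$1" in
    debug)   printf '%s\n' "-g -Mbounds -traceback" ;;
    release) printf '%s\n' "-fast -O3" ;;
    *)       printf 'unknown build mode: %s\n' "$1" >&2; return 1 ;;
  esac
}

nv_flags debug
nv_flags release

# Usage (main.f90 is a placeholder). Leave $(...) unquoted so the flags
# are split into separate arguments:
#   nvfortran $(nv_flags debug) main.f90 -o main_debug
```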

Another useful flag during development is "-Minfo". The compiler will give feedback messages on which optimizations it is applying or is not able to apply. That can be a lot of messages, so you can use sub-options to limit the output to particular types, such as "-Minfo=vect" to see which loops are or are not getting vectorized. See "nvfortran -help -Minfo" for the full list of sub-options.
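For example, a quick way to try "-Minfo=vect" on a single loop might look like the sketch below. The file name loops.f90, the subroutine scale_array, and the log file vect.log are all placeholders chosen for the example. The compile step is skipped if nvfortran is not installed.

```shell
# Write a one-loop test file (loops.f90 and scale_array are placeholder names).
cat > loops.f90 <<'EOF'
subroutine scale_array(a, n, s)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: a(n)
  real, intent(in) :: s
  integer :: i
  ! A simple candidate loop for vectorization.
  do i = 1, n
    a(i) = s * a(i)
  end do
end subroutine scale_array
EOF

if command -v nvfortran >/dev/null 2>&1; then
  # Compile only (-c). 2>&1 keeps the feedback whichever stream it is written to.
  nvfortran -fast -Minfo=vect -c loops.f90 2>&1 | tee vect.log
else
  echo "nvfortran not found; skipping compile"
fi
```

Each feedback message is tagged with the routine name and source line, so the log can be searched for the loops you care about.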

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mat Colgrove