ARM Cortex-A9 NEON and VFP
I am using an ARM Cortex-A9 (Zynq-7000) and I want to enable NEON SIMD, but not use it for floating-point operations unless I explicitly ask for it.
When I compile with arm-none-eabi-gcc using each of the following FPU options separately:
1. -mfpu=vfpv3 -mfloat-abi=softfp
2. -mfpu=neon-vfpv3 -mfloat-abi=softfp
3. -mfpu=neon -mfloat-abi=softfp
binaries 1 and 2 differ, but binaries 2 and 3 are identical (vectorization is not enabled). I am using -Og for optimization, which does not enable the vectorization options.
How can I make sure that all floating-point operations are done in VFP, not NEON, when I use -mfpu=neon-vfpv3?
According to the ARM Architecture Reference Manual, NEON and VFP support similar instructions, which makes it difficult to tell them apart just by checking the disassembly.
I am also planning to use #pragma GCC ivdep for the loops and functions that I need to vectorize. What would be the appropriate compiler flags to achieve this?
Solution 1:[1]
The compiler will never use any NEON instructions unless auto-vectorization is enabled or NEON is enforced via intrinsics.
Even though NEON and VFP instructions look similar, each operates in a different mode.
A few instructions are shared by VFP and NEON on ARMv7 (mostly memory-related), but they shouldn't be of any concern.
Why don't you post the disassemblies?
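To illustrate the "enforced via intrinsics" case mentioned in this answer, here is a minimal sketch, not taken from the original post. The function name add_f32 is hypothetical. The NEON path is guarded by the __ARM_NEON macro, so the same file still builds with -mfpu=vfpv3 (or on a non-ARM host), where only the scalar loop is compiled.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>   /* NEON intrinsics, only available when NEON is enabled */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Explicit NEON: process four floats per iteration in Q registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif

    /* Scalar tail (or the whole loop when NEON is not enabled). */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

In a build with NEON enabled, the disassembly of the guarded loop should contain quad-register instructions such as vadd.f32 on q registers, while the scalar tail should use s registers.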
Solution 2:[2]
-mfpu=
In GCC (ARM), when -mcpu=cortex-a9 or -march=armv7-a is set, the options -mfpu=neon-vfpv3 and -mfpu=neon are identical.
-mfloat-abi=
- soft: VFP is not used; the ARM calling convention is used.
- softfp: VFP is used, but with the ARM calling convention (ARM R registers are used to pass parameters to functions).
- hard: VFP is used and the calling convention is specific to the hardware (along with the ARM R registers, the VFP/NEON S and D registers are used to pass parameters to functions; floating-point parameters passed by value go in S/D registers).
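One way to confirm which of these settings a translation unit was actually built with is to check the compiler's predefined macros. The sketch below is an illustration, not part of the original answer. The function names are hypothetical, and it assumes GCC's usual behaviour of defining __ARM_PCS_VFP for the hard-float ABI, __SOFTFP__ for software floating point, and __ARM_NEON when NEON is enabled.

```c
/* Hypothetical helper: names the float ABI this file was compiled for.
 * Assumes GCC's predefined macros behave as described in the text above. */
const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    return "hard";                   /* FP arguments passed in S/D registers */
#elif defined(__SOFTFP__)
    return "soft";                   /* no FP instructions generated */
#elif defined(__arm__)
    return "softfp";                 /* FP instructions used, R-register calling convention */
#else
    return "not a 32-bit ARM target";
#endif
}

/* Hypothetical helper: 1 if NEON was enabled by the -mfpu/-mcpu options, else 0. */
int neon_enabled(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}
```

The same macros can also be listed without writing any code by running the compiler with -dM -E on an empty input and searching the output for ARM_.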
Floating Point operations on NEON(SIMD)
- Unless the option -funsafe-math-optimizations is set in GCC, NEON is NOT used for floating-point operations. (NEON does not fully follow IEEE 754; for example, denormal values are flushed to zero.)
VFP and NEON instructions in disassembly:
In the case of vmov,
- VFP uses only vmov.f32 and vmov.f64
- NEON uses vmov.i64, vmov.i32, and so on.
Loop Vectorization
- For loop vectorization, -ftree-vectorize together with the -O2 or -O3 optimization option can be used.
- When -Og optimization is used, loops may not get vectorized automatically.
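As a rough sketch of how the pragma from the question fits with these flags (not taken from the original answer), the loop below is marked with #pragma GCC ivdep. The function name scale_add_i32 is hypothetical. The pragma only tells GCC to assume there are no loop-carried dependencies; it does not switch the vectorizer on, so the file still needs something like -O2 -ftree-vectorize (or -O3) together with -mfpu=neon-vfpv3. An integer loop is used here because, as noted above, floating-point loops are not vectorized for NEON without -funsafe-math-optimizations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] * k + b[i] for n 32-bit integers. */
void scale_add_i32(int32_t *dst, const int32_t *a, const int32_t *b,
                   int32_t k, size_t n)
{
    /* Promise GCC that iterations are independent (no aliasing between
     * dst and the inputs across iterations), so it is free to vectorize. */
#pragma GCC ivdep
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] * k + b[i];
    }
}
```

One possible arrangement is to keep such functions in their own source file and compile only that file with the vectorization flags, leaving the rest of the project at -Og. Adding -fopt-info-vec to that file's flags should make GCC report which loops it vectorized.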
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Salinda |
