ARM Cortex-A9 NEON and VFP

I am using an ARM Cortex-A9 (Zynq-7000) and I want to enable NEON SIMD, but not have it used for floating-point operations unless I explicitly ask for it.

When compiled with arm-none-eabi-gcc using each of the following FPU options (separately):

1. -mfpu=vfpv3 -mfloat-abi=softfp
2. -mfpu=neon-vfpv3 -mfloat-abi=softfp
3. -mfpu=neon -mfloat-abi=softfp

the binaries 1 and 2 are different, but 2 and 3 are identical (vectorization is not enabled). I am using -Og for optimization (-Og does not enable the vectorization options).

How can I make sure that all floating-point operations are done in VFP, not NEON, when I use the option -mfpu=neon-vfpv3?

According to the ARM Architecture Reference Manual, NEON and VFP support similar instructions, which makes it difficult to tell them apart just by checking the disassembly.

Moreover, I am planning to use #pragma GCC ivdep for the loops and functions that I need to vectorize. What would be the appropriate compiler flags to achieve this?

Solution 1:^[1]

The compiler will never use any NEON instruction unless auto-vectorization is enabled or NEON is enforced via intrinsics.
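As an illustration of the "enforced via intrinsics" case, here is a minimal sketch. The function name add_f32 is hypothetical, and the scalar fallback is my own addition so the same file still builds when NEON is not enabled. The intrinsics themselves (vld1q_f32, vaddq_f32, vst1q_f32) come from arm_neon.h.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* add_f32 is a hypothetical helper: out[i] = a[i] + b[i]. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Explicit NEON path: four floats per iteration. This is emitted
       regardless of whether auto-vectorization is enabled. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when NEON is not available. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Only functions written this way would contain NEON arithmetic. The rest of the file should stay on VFP as long as auto-vectorization is off.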

Even though NEON and VFP instructions look similar, each operates in its own mode.

There are a few instructions shared by VFP and NEON on ARMv7 (mostly memory related), but they shouldn't be of any concern.

Why don't you post the disassemblies?

Solution 2:^[2]

-mfpu=

In GCC (ARM), when -mcpu=cortex-a9 or -march=armv7-a is set, the options -mfpu=neon-vfpv3 and -mfpu=neon are identical.

See ‘+neon’ in https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

-mfloat-abi=

soft: VFP is not used; the ARM calling convention is used.
softfp: VFP is used, but with the ARM calling convention (ARM R registers are used to pass parameters to functions).
hard: VFP is used and the calling convention is specific to the hardware (along with the ARM R registers, the VFP/NEON S and D registers are used to pass parameters to functions; the S/D registers carry floating-point parameters passed by value).
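To check which of these three ABIs a file was actually built with, one option is to look at the compiler's predefined macros. The sketch below assumes GCC's usual behaviour: __ARM_PCS_VFP is defined for the hard-float calling convention and __SOFTFP__ for -mfloat-abi=soft. The function name float_abi_name is hypothetical.

```c
/* float_abi_name is a hypothetical helper that reports the float ABI this
   translation unit was compiled for, based on predefined macros.
   Assumption: macro behaviour as in GCC for 32-bit ARM targets. */
const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    return "hard";    /* FP arguments are passed in S/D registers */
#elif defined(__SOFTFP__)
    return "soft";    /* no FP instructions are generated */
#elif defined(__arm__)
    return "softfp";  /* FP instructions, arguments in core registers */
#else
    return "not a 32-bit ARM target";
#endif
}
```

The same macros can be listed without writing any code by running the compiler with -dM -E on an empty input and the flags of interest.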

Floating Point operations on NEON(SIMD)

Unless the option -funsafe-math-optimizations is set in GCC, NEON is NOT used for floating-point operations (NEON does not fully follow IEEE 754).
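A small float loop is a convenient way to observe this. The sketch below uses a hypothetical function, scale_f32. The flag combinations in the comment are the ones discussed in this thread, and the expected instruction selection is an assumption to verify in your own disassembly, not a guarantee.

```c
#include <stddef.h>

/* scale_f32 is a hypothetical example: x[i] = x[i] * k.
 *
 * Assumed behaviour with arm-none-eabi-gcc (check the disassembly):
 *   -O3 -mfpu=neon-vfpv3 -mfloat-abi=softfp
 *       the multiply stays scalar on VFP (vmul.f32 on S registers)
 *   the same flags plus -funsafe-math-optimizations
 *       the vectorizer may use NEON (vmul.f32 on Q registers)
 */
void scale_f32(float *x, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        x[i] *= k;
    }
}
```

Comparing the register names in the two disassemblies (S registers versus Q registers) is usually enough to tell which unit is doing the arithmetic.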

VFP and NEON instructions in disassembly:

In the case of vmov:

VFP uses only vmov.f32 and vmov.f64
NEON uses vmov.i64, vmov.i32, and so on.

Loop Vectorization

For loop vectorization, -ftree-vectorize can be used together with the -O2 or -O3 optimization level.

When -Og optimization is used, loops may not get vectorized automatically.
vectorization of loops with neon
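For the #pragma GCC ivdep part of the question, a minimal sketch might look like the following. The function name add_offset_i32 and the build line in the comment are illustrative assumptions. The pragma only tells GCC to assume there are no loop-carried dependencies, so a vectorizing option such as -ftree-vectorize is still needed.

```c
#include <stddef.h>
#include <stdint.h>

/* add_offset_i32 is a hypothetical example: dst[i] = src[i] + offset.
 *
 * Assumed build line (flags taken from this thread):
 *   arm-none-eabi-gcc -O2 -ftree-vectorize -mfpu=neon-vfpv3 -mfloat-abi=softfp
 *
 * The pragma must directly precede the loop. It is a promise from the
 * programmer that dst and src do not overlap in a way that creates a
 * dependency between iterations.
 */
void add_offset_i32(int32_t *dst, const int32_t *src, int32_t offset, size_t n)
{
#pragma GCC ivdep
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] + offset;
    }
}
```

Integer loops like this one do not depend on -funsafe-math-optimizations, so they are a simple first test that the vectorizer and NEON are active.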

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2	Salinda
