'Can neon registers be indexed?

Consider a neon register such as:

uint16x8_t foo;

To access an individual lane, one is supposed to use vgetq_lane_u16(foo, 3). However, one might be tempted to write foo[3] instead given the intuition that foo is an array of shorts. When doing so, the gcc (10) compiles without warnings, but it is not clear that it does what it was intended to do.

The gcc documentation does not specifically mention the indexed access, but it says that operations behave like C++ valarrays. Those do support indexed access in the intuitive way.

Regardless of what foo[3] evaluates to, doing so seems to be faster than vgetq_lane_u16(foo, 3), so probably they're different or we wouldn't need both.

So what exactly does foo[3] mean? Is its behavior defined at all? If not, why does gcc happily compile it?



Solution 1:[1]

The foo[3] form is the GCC Vector extension form, as you have found and linked documentation to; it behaves as so:

Vectors can be subscripted as if the vector were an array with the same number of elements and base type. Out of bound accesses invoke undefined behavior at run time. Warnings for out of bound accesses for vector subscription can be enabled with -Warray-bounds.

This can have surprising results when used on big-endian systems, so Arm’s Arm C Language Extensions recommend to use vget_lane if you are using other Neon intrinsics like vld1 in the same code path.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 James Greenhalgh