How to use multi-vector types in ARM64 inline assembly?

With ARM64 compilers that support GCC-style __asm__, how can I use multi-vector NEON types such as uint8x16x4_t as operands?

#include <arm_neon.h>

uint8x16x4_t Meow()
{
    uint8x16x4_t result;
    __asm__(
        "meow %0"
    :   "=w"(result));
    return result;
}

That results in the following assembly output:

    meow v0

Is there a way to get something like this instead?

    meow { v0.16b - v3.16b }

Or, even better, is there a way to refer to the individual registers?



Solution 1:[1]

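You have to spell out the register list manually, but the T, U and V operand modifiers make that possible: %0 prints the first register of the group, and %T0, %U0 and %V0 print the second, third and fourth. Arrangement suffixes such as .16b can simply be written literally after each operand. The following code: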

uint8x16x4_t Meow()
{
    uint8x16x4_t result;
    __asm__(
        "meow { %0.16b, %T0.16b, %U0.16b, %V0.16b }"
    :   "=w"(result));
    return result;
}

gives me:

Meow:
    meow { v4.16b, v5.16b, v6.16b, v7.16b }
    mov     v1.16b, v5.16b
    mov     v2.16b, v6.16b
    mov     v3.16b, v7.16b
    mov     v0.16b, v4.16b
    ret
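
To try the same operand syntax with a real instruction in place of the meow placeholder, here is a minimal sketch. It assumes the compiler accepts the T, U and V modifiers as shown above. The helper name copy64 is hypothetical, ld1 is only one instruction that takes a four-register list, and the memcpy branch is there purely so the file still builds on non-ARM64 targets.

```c
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the original answer): copies 64 bytes
 * from src to dst, going through a uint8x16x4_t on ARM64. */
static void copy64(uint8_t *dst, const uint8_t *src)
{
#if defined(__aarch64__)
    uint8x16x4_t regs;

    /* Four-register load; %0, %T0, %U0, %V0 name the four consecutive
     * vector registers that back `regs`. */
    __asm__(
        "ld1 { %0.16b, %T0.16b, %U0.16b, %V0.16b }, [%1]"
    :   "=w"(regs)
    :   "r"(src)
    :   "memory");

    /* Outside the asm, the individual parts are reachable via val[]. */
    vst1q_u8(dst +  0, regs.val[0]);
    vst1q_u8(dst + 16, regs.val[1]);
    vst1q_u8(dst + 32, regs.val[2]);
    vst1q_u8(dst + 48, regs.val[3]);
#else
    /* Assumption: portable fallback so the sketch compiles elsewhere. */
    memcpy(dst, src, 64);
#endif
}
```

Inside the asm string the parts are addressed with the modifiers, while in C code they remain ordinary uint8x16_t values in regs.val[0] through regs.val[3]. The "memory" clobber is a blunt way to tell the compiler that the asm reads through the pointer.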

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Siguza