Why does SSE set (_mm_set_ps) reverse the order of arguments?

I recently noticed that

__m128 m = _mm_set_ps(0,1,2,3);

puts the 4 floats into reverse order when cast to a float array:

float* p = (float*)(&m);
// p[0] == 3
// p[1] == 2
// p[2] == 1
// p[3] == 0

The same happens with a union { __m128 m; float a[4]; }.

Why do SSE operations use this ordering? It's not a big deal but slightly confusing.

And a follow-up question:

When accessing elements in the array by index, should one access them in the order 0..3 or the order 3..0?

c++ c simd sse intrinsics

Solution 1:^[1]

It's just a convention; they had to pick some order, and it really doesn't matter what the order is as long as everyone follows it. Intel happens to like little-endianness.

As far as accessing by index goes... the best thing is to try to avoid doing it. Nothing kills vector performance like element-wise accesses. If you must, try to set things up so that the indexing matches the hardware vector lanes; that's what most vector programmers (in my experience) will expect.
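One way to follow this advice, as a minimal sketch: keep the arithmetic in vector form and move the result to memory with a single store instead of indexing into the register. The helper name add_and_store is hypothetical and only serves the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper: adds two vectors lane by lane and writes all four
   results to out with one unaligned store. out[0] receives the lowest lane. */
static void add_and_store(__m128 a, __m128 b, float out[4])
{
    assert(out != NULL);
    __m128 sum = _mm_add_ps(a, b);  /* lane-wise add, no per-element access */
    _mm_storeu_ps(out, sum);        /* one store; out[i] holds hardware lane i */
}
```

Called with _mm_set_ps(0, 1, 2, 3) and _mm_set1_ps(10), this fills out with 13, 12, 11, 10 in that order, so array index 0 corresponds to the last argument of _mm_set_ps.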

Solution 2:^[2]

Depending on what you would like to do, you can use either _mm_set_ps or _mm_setr_ps.

__m128 _mm_setr_ps (float z, float y, float x, float w )
Sets the four SP FP values to the four inputs in reverse order.
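"Reverse" here means reversed relative to _mm_set_ps: the first argument of _mm_setr_ps goes into the lowest element, so the arguments read in memory order. A small sketch of the difference follows; the function names store_set and store_setr are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_set_ps, _mm_setr_ps, _mm_storeu_ps */

/* _mm_set_ps takes its arguments highest element first. */
static void store_set(float out[4])
{
    assert(out != NULL);
    _mm_storeu_ps(out, _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f));
}

/* _mm_setr_ps takes its arguments lowest element first (memory order). */
static void store_setr(float out[4])
{
    assert(out != NULL);
    _mm_storeu_ps(out, _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f));
}
```

Both functions leave out[0] == 0, out[1] == 1, out[2] == 2, out[3] == 3, so _mm_setr_ps is the one to pick when the argument list should match array indexing.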

Solution 3:^[3]

Isn't that consistent with the little-endian nature of x86 hardware? The way it stores the bytes of a long long.
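The analogy can be checked with a short sketch, assuming a little-endian x86 target. The function name bytes_of_u64 is hypothetical. A 64-bit literal is written most significant byte first, yet its least significant byte ends up at the lowest address, much as the last argument of _mm_set_ps ends up in element 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies the bytes of a 64-bit constant into out in memory order.
   Assumption: the target is little-endian, as x86 is. */
static void bytes_of_u64(unsigned char out[8])
{
    assert(out != NULL);
    uint64_t v = 0x0706050403020100ULL;  /* written most significant byte first */
    memcpy(out, &v, sizeof v);           /* on x86, out[0] is the lowest byte, 0x00 */
}
```

On such a target, out[i] == i for every index, which mirrors the way p[0] in the question holds the last value passed to _mm_set_ps.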

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Stephen Canon
Solution 2	phuclv
Solution 3	Bo Persson
