'Implementing SHLD/SHRD instructions in C
I'm trying to efficiently implement SHLD and SHRD instructions of x86 without using inline assembly.
uint32_t shld_UB_on_0(uint32_t a, uint32_t b, uint32_t c) {
return a << c | b >> 32 - c;
}
seems to work, but invokes undefined behaviour when c == 0 because the second shift's operand becomes 32. The actual SHLD instruction with third operand being 0 is well defined to do nothing. (https://www.felixcloutier.com/x86/shld)
uint32_t shld_broken_on_0(uint32_t a, uint32_t b, uint32_t c) {
return a << c | b >> (-c & 31);
}
doesn't invoke undefined behaviour, but when c == 0 the result is a | b instead of a.
uint32_t shld_safe(uint32_t a, uint32_t b, uint32_t c) {
if (c == 0) return a;
return a << c | b >> 32 - c;
}
does what's intended, but gcc now puts a je. clang on the other hand is smart enough to translate it to a single shld instruction.
Is there any way to implement it correctly and efficiently without inline assembly?
And why is gcc trying so much not to put shld? The shld_safe attempt is translated by gcc 11.2 -O3 as (Godbolt):
shld_safe:
mov eax, edi
test edx, edx
je .L1
mov ecx, 32
sub ecx, edx
shr esi, cl
mov ecx, edx
sal eax, cl
or eax, esi
.L1:
ret
while clang does,
shld_safe:
mov ecx, edx
mov eax, edi
shld eax, esi, cl
ret
Solution 1:[1]
As far as I have tested with gcc 9.3 (x86-64), it translates the following code to shldq and shrdq.
uint64_t shldq_x64(uint64_t low, uint64_t high, uint64_t count) {
return (uint64_t)(((((unsigned __int128)high << 64) | (unsigned __int128)low) << (count & 63)) >> 64);
}
uint64_t shrdq_x64(uint64_t low, uint64_t high, uint64_t count) {
return (uint64_t)((((unsigned __int128)high << 64) | (unsigned __int128)low) >> (count & 63));
}
Also, gcc -m32 -O3 translates the following code to shld and shrd. (I have not tested with gcc (i386), though.)
uint32_t shld_x86(uint32_t low, uint32_t high, uint32_t count) {
return (uint32_t)(((((uint64_t)high << 32) | (uint64_t)low) << (count & 31)) >> 32);
}
uint32_t shrd_x86(uint32_t low, uint32_t high, uint32_t count) {
return (uint32_t)((((uint64_t)high << 32) | (uint64_t)low) >> (count & 31));
}
(I have just read the gcc code and written the above functions, i.e. I'm not sure they are your expected ones.)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Hironori Bono |
