'What are these "abundant" assembly codes used for and why are they compiled out?
I'm not familiar with the implementation of C++ compilers and I'm writing a C++ snippet like this (for learning):
#include <vector>
void vector8_inc(std::vector<unsigned char>& v) {
for (std::size_t i = 0; i < v.size(); i++) {
v[i]++;
}
}
void vector32_inc(std::vector<unsigned int>& v) {
for (std::size_t i = 0; i < v.size(); i++) {
v[i]++;
}
}
int main() {
std::vector<unsigned char> my{ 10,11,2,3 };
vector8_inc(my);
std::vector<unsigned int> my2{ 10,11,2,3 };
vector32_inc(my2);
}
By generating its assembly codes and disassemble its binary, I found the segments for
vector8_inc and vector32_inc:
g++ -S test.cpp -o test.a -O2
_Z11vector8_incRSt6vectorIhSaIhEE:
.LFB853:
.cfi_startproc
endbr64
movq (%rdi), %rdx
cmpq 8(%rdi), %rdx
je .L1
xorl %eax, %eax
.p2align 4,,10
.p2align 3
.L3:
addb $1, (%rdx,%rax)
movq (%rdi), %rdx
addq $1, %rax
movq 8(%rdi), %rcx
subq %rdx, %rcx // line C: calculate size of the vector
cmpq %rcx, %rax
jb .L3
.L1:
ret
.cfi_endproc
......
_Z12vector32_incRSt6vectorIjSaIjEE:
.LFB854:
.cfi_startproc
endbr64
movq (%rdi), %rax // line A
movq 8(%rdi), %rdx
subq %rax, %rdx // line C': calculate size of the vector
movq %rdx, %rcx
shrq $2, %rcx // Line B: ← What is this used for?
je .L6
addq %rax, %rdx
.p2align 4,,10
.p2align 3
.L8:
addl $1, (%rax)
addq $4, %rax
cmpq %rax, %rdx
jne .L8
.L6:
ret
.cfi_endproc
......
but in fact, these two functions are "inlined" within main(), and the instructions shown above will never be executed but compiled and loaded into memory (I verified this by adding a breakpoint to line A and checking the memory):
main() contains duplicate logic of the above segment
So here are my questions:
- What are these two segments used for and why are they compiled out?
_Z12vector32_incRSt6vectorIjSaIjEE,_Z11vector8_incRSt6vectorIhSaIhEE - Does the instruction (Line B) in
_Z12vector32_incRSt6vectorIjSaIjEE(i.e.shrq $2, %rcx) means 'to shift size integer of the vector right by 2 bits'? If it does, what's this used for? - Why is
size()computed every turn of the loop invector8_inc, but invector32_incit's computed before executing the loop? (see Line C and Line C', under-O2optimization level ofg++)
Solution 1:[1]
First, function names are a lot more readable if you demangle them, with a tool such as c++filt:
_Z12vector32_incRSt6vectorIjSaIjEE
-> vector32_inc(std::vector<unsigned int, std::allocator<unsigned int> >&)
_Z11vector8_incRSt6vectorIhSaIhEE
-> vector8_inc(std::vector<unsigned char, std::allocator<unsigned char> >&)
Q1
You defined those functions as global (external linkage). The compiler compiles one module at a time. It doesn't know whether this file is your entire program, or if you will link it with other object files later. In the latter case, some function from another module might call vector8_inc or vector32_inc, and so the compiler needs to create an out-of-line version to be available for such a call.
A traditional linker links code at the level of modules, and isn't able to selectively exclude functions from individual object files. It has to include your test.o module in the link, so that means the binary contains all the code from that module, including the out-of-line vector_inc functions. You know they can never be called, but the compiler didn't know that and the linker doesn't check.
Here are three ways you can avoid having them included in your binary:
Declare those functions as
staticin your source code. This ensures they cannot be called from other modules, so if the compiler can inline all calls from within this module, it need not emit the out-of-line versions.Compile with
-fwhole-programwhich tells GCC to assume that functions cannot be called from anywhere it can't see.Link using
-fltoto enable link time optimization. This provides more data to the linker, so that it can be smart and do things like omit individual functions that are never called (and much more). When I used-flto, again I no longer saw the out-of-line code in the binary.
Q2
Shift right by 2 is division by 4. It looks like the std::vector class, in GCC's implementation, contains a start pointer (loaded into %rax) and a finish pointer (loaded into %rdx). Subtracting these pointers yields the number of bytes in the vector. For a vector of unsigned int, each element is 4 bytes in size, so dividing by 4 yields the number of elements.
This value is only used as a test of whether the vector is empty, by comparing it to zero. In this case the loop is skipped and the function does nothing. (shr will set the zero flag if the result is zero, and je jumps if the zero flag is set.) So the only way the shift matters, instead of just comparing the pointer difference to 0, is if the pointer difference were 1, 2 or 3. That should not happen, since pointers to unsigned int should be 4-byte aligned (unless the library is doing clever things like storing metadata in the low bits). Thus I think the shift is actually unnecessary, and this may be a minor missed optimization.
Q3
This is most likely to handle aliasing.
There is a general rule in C++ that, roughly speaking, writing through a pointer of one type may not modify an object of another type. So in vector32_inc, the compiler knows that writing to elements of the vector, through a pointer to unsigned int, cannot modify the vector's metadata (which apparently is objects of some other type, most likely pointers). Therefore it is safe to hoist the size() computation out of the loop, since it cannot change over the course of the loop's execution. Likewise, the pointer used to iterate over the object can be kept in a register throughout.
However, character types have a special exemption from this rule, partly so that things like memcpy can work. So when you do vec[i]++, writing through a pointer to unsigned char, the compiler has to account for the possibility that the pointer may actually point to some of the vector metadata. (This should not actually happen if vector is being used properly, but the compiler doesn't know that.) If it did, then vec[i]++ might modify that metadata in memory. The next loop iteration would then be supposed to use the newly modified metadata, so it has to be reloaded from memory just in case. Note that the vector's start pointer is also reloaded, and the new value used, instead of keeping it in a register throughout.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
