'Why might a C++ compiler duplicate a function exit basic block?

Consider the following snippet of code:

int* find_ptr(int* mem, int sz, int val) {
    for (int i = 0; i < sz; i++) {
        if (mem[i] == val) { 
            return &mem[i];
        }
    }
    return nullptr;
}

GCC on -O3 compiles this to:

find_ptr(int*, int, int):
        mov     rax, rdi
        test    esi, esi
        jle     .L4                  # why not .L8?
        lea     ecx, [rsi-1]
        lea     rcx, [rdi+4+rcx*4]
        jmp     .L3
.L9:
        add     rax, 4
        cmp     rax, rcx
        je      .L8
.L3:
        cmp     DWORD PTR [rax], edx
        jne     .L9
        ret
.L8:
        xor     eax, eax
        ret
.L4:
        xor     eax, eax
        ret

In this assembly, the blocks with labels .L4 and .L8 are identical. Would it not be better to rewrite jumps to .L4 to .L8 and drop .L4? I thought this might be a bug, but clang also duplicates the xor-ret sequence back to back. However, ICC and MSVC each take a pretty different approach.

Is this an optimization in this case and, if not, are there times when it would be? What is the rationale behind this behavior?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source