Why does this example of what compilers aren't allowed to do cause null pointer dereferencing using cmov?
C code:
int cread(int *xp) {
    return (xp ? *xp : 0);
}
Assembly code (a textbook example of what a compiler isn't allowed to do), using a conditional move instruction:
movl $0, %eax
testl %edx, %edx
cmovne (%edx), %eax
This is an example used in Computer Systems: A Programmer's Perspective (2nd edition) to show that code cannot be compiled using conditional data transfer if either branch of a condition results in an error. In this case, the error would be the null pointer dereferencing of xp.
I understand that xp is dereferenced, but I don't understand how xp becomes a null pointer. Wouldn't that depend on pointer being passed as a parameter into the function?
Solution 1:[1]
The assembly code is technically valid, but it would fault if the input was NULL and as such doesn't match the behavior of the C code. Given that the whole point of the thing is to return zero in that case and not fault, it's wrong. The C equivalent is:
int cread(int *xp) {
    int val = *xp;          /* unconditional load: faults if xp is NULL */
    return (xp ? val : 0);  /* the NULL check comes too late */
}
As you can see, it first dereferences xp and only then checks whether xp is NULL, so this clearly won't work for NULL input.
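One way to keep a conditional move while still matching the C semantics is to select the address first and load afterwards. The following is only a minimal sketch, not the textbook's or any particular compiler's output; the name `cread_alt` and the static `zero` variable are introduced here purely for illustration.

```c
#include <stddef.h>

/* Hypothetical variant: choose a safe address first, then do one load. */
int cread_alt(int *xp) {
    static const int zero = 0;        /* fallback object that is always valid */
    const int *p = xp ? xp : &zero;   /* selecting between two addresses reads no memory */
    return *p;                        /* never dereferences a NULL xp */
}
```

Because the conditional expression only chooses between two pointer values, a compiler is free to implement it with cmov on the pointer; the load that follows goes to `zero` whenever xp is NULL.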
Solution 2:[2]
If you make the call
cread(0);
the cmovne instruction will segfault because it evaluates *xp even though the value will never be used.
In the assembly, this is expressed by (%edx): the contents of memory at the address in %edx are loaded regardless of whether %edx is zero.
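By contrast, the C conditional operator evaluates only the operand that is selected. Here is a small sketch that makes this observable, assuming a hypothetical counter `loads` and helper `load_int` that are not part of the original question:

```c
#include <stddef.h>

/* Hypothetical counter: how many times memory was actually read. */
static int loads = 0;

/* Hypothetical helper: dereference p and record that a load happened. */
static int load_int(const int *p) {
    loads++;
    return *p;
}

/* Same shape as cread, but the dereference goes through load_int. */
int cread_counted(const int *xp) {
    return xp ? load_int(xp) : 0;   /* load_int runs only when xp is not NULL */
}
```

Calling `cread_counted(NULL)` returns 0 and leaves `loads` at 0. That is the behaviour the cmovne version fails to preserve, since its memory operand is read unconditionally.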
The usefulness of cmov in general has also been called into question; for example, Linus Torvalds is not a fan.
Solution 3:[3]
> I understand that xp is dereferenced, but I don't understand how xp becomes a null pointer. Wouldn't that depend on pointer being passed as a parameter into the function?
You're technically correct (and the textbook technically wrong - in theory, in some circumstances, the compiler could legally generate that code).
However, the circumstances where that code can be generated are:
a) the compiler (and/or linker) can prove that no caller ever passes NULL to the function. In this case the compiler also proves that the cmov is pointless and can be replaced with a normal load (a mov without any preceding test).
b) the compiler (and/or linker) knows that referencing NULL in assembly (which is not C and does not need to follow the rules of C) is fine. Typically NULL in C is the address 0x00000000 in assembly, and typically the area at address 0x00000000 is deliberately made inaccessible to help catch bugs; but there's no reason why an OS or a program can't make the area at 0x00000000 accessible (e.g. often it's trivial to do this just using a linker script).
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jester |
| Solution 2 | Peter Cordes |
| Solution 3 | Brendan |
