Memory management and overflow in C
I always wonder why C manages the memory the way it does.
Take a look at the following code:
#include <stdio.h>

int main(){
    int x = 10000000000;
    printf("%d", x);
}
Of course, overflow occurs and it returns the following number:
1410065408
Or:
#include <stdio.h>

int main(){
    int x = -10;
    printf("%u", x);
}
Here x is signed, but I am using the unsigned format specifier "%u".
Returns:
4294967286
Or take a look at this one:
#include <stdio.h>

int main(){
    char string_ = 'abc';
    printf("%d", string_);
}
This returns:
99
That being said, I mainly have two questions:
- Why does the program return these specific numbers for these specific inputs? I don't think it is simple malfunctioning, because it produces the same result for the same input, so there must be a deterministic way these numbers are calculated. What is going on under the hood when I pass these obviously invalid values?
- Most of these problems occur because C is not a memory-safe language. Wikipedia says:
In general, memory safety can be safely assured using tracing garbage collection and the insertion of runtime checks on every memory access
Then, besides historical reasons, why are some languages not memory-safe? Is there any advantage to not being memory-safe?
Solution 1:[1]
Of course, overflow occurs and it returns the following number:
There is no overflow in int x = 10000000000;. Overflow in the C standard occurs when the result of an operation is not representable in the type. However, in int x = 10000000000;, the value 10,000,000,000 is converted to type int, and this conversion is defined to produce an implementation-defined result or raise an implementation-defined signal (C 2018 6.3.1.3 3). So there is no result that is not representable in int.
You did not say which C implementation you are using (particularly the compiler), so we cannot be sure what the implementation defines for this conversion. For a 32-bit int, it is common that an implementation wraps the number modulo 2^32. The remainder of 10,000,000,000 divided by 2^32 is 1,410,065,408, which matches the result you observed.
4294967286
In this case, you passed an int where printf expected an unsigned int. The C standard does not define the behavior, but a common result is that the bits of the int are reinterpreted as an unsigned int. When two's complement is used for a 32-bit int with value -10, the bits are FFFFFFF6 in hexadecimal. When the bits of an unsigned int have that value, they represent 4,294,967,286, which matches the result you observed.
char string_ = 'abc';
'abc' is a character constant with more than one character. Its value is implementation-defined (C 2018 6.4.4.4 10). Again, since you did not tell us which implementation you are using, we cannot be sure what the definition is.
One common behavior is that 'abc' has the value ('a'*256 + 'b')*256 + 'c'. When ASCII is used, this is (97*256 + 98)*256 + 99 = 6,382,179. Then char string_ = 'abc'; converts this value to char. If char is unsigned and eight bits wide, the C standard defines this conversion to wrap modulo 2^8 = 256 (C 2018 6.3.1.3 2). If char is signed, the result is implementation-defined, and a common behavior is also to wrap modulo 256. With either of those two behaviors, the result is 99, since the remainder of 6,382,179 divided by 256 is 99, and this matches the result you observed.
Most of these problems occur because C is not a memory-safe language.
None of the above has anything to do with memory safety. None of the constants or the conversions access memory, so they are not affected by memory safety.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Eric Postpischil |