'Smallest integer not representable in single precision floating point
So I know that the smallest positive integer not representable by a single precision floating point is 2^(23+1) + 1 = 16,777,217.
How did we figure out that we use 2^(23+1) + 1. I understand that there is an implied 1, along with 23 being the number of bits represented in the mantissa but why does this work?
Solution 1:[1]
I think the trick here is to understand the basis of the floating point representation: Each number is represented as 1.fraction * 2^exponent. The key here is to know there are limits for both exponent (8 bits) and fraction (23 bits), but these limits do not necessarily match. For example, we can create 2^24 with the 8 bits exponent, while we cannot make 2^-24 with fraction (because it only has 23 bits). Thus, if you want to make number 16777216 = 2^24, you just set the fraction to 0 and set the exponent to represent 24. However, if you want to represent 16777217 = 2^24 + 1, the only thing you can do is to add a small fraction so when it multiplies with 2^24 it produce 1, and that small fraction should be 2^-24, which unfortunately cannot be produced by only 23 digits.
Solution 2:[2]
I think i got your question. Have a look on this, especially on how the structure/design of these variables are done. http://en.m.wikipedia.org/wiki/Single_precision
Float usually stands for floating point variable. This means you have (usually 3 bytes) where you store your number. Then you additionally have a (one byte) exponent, which says where to set the point within this number.
Now you can easily calculate the max and min numbers one can store in this value.
But there is a tricky part. As this is no fix point integer, it may have limited precision that can cause strange problems. As the number gets bigger, the absolute distance between the number is getting bigger. At some point, you will reach a number, where you can add 1, and it will remain the same number, cause one is outside the precision range you have available. As you will see on the wiki page above: 1 bit is used to flag negative numbers, 23 bits are used for your precision and 8 are used as an exponent. Now imagine, as an example, the exponent would be 40, now you will have a 23 bits number where the point is placed on position 40. All the rest is filled with 0. Adding 1 would not change the number, cause it is outside the significant scope and would not be stored.
Maybe you were asking why there is another +1 in the exponent. This is nicely explained here: Which is the first integer that an IEEE 754 float is incapable of representing exactly? It's cause an mantissa of the form abcdefg actually stands for 1.abcdefg.
Solution 3:[3]
How did we figure out that we use 2^(23+1) + 1
IEEE floating point represents normalised numbers in the following form.
(-1)s2(e – e0)(1+(m/2M))
Where:
- s is the sign bit, with a value of 0 or 1.
- e is the exponent field, with a value between 1 and 2E-2 where E is the number of exponent bits (values of 0 and 2E-1 are used for subnormals and infinites/NaNs).
- e0 is the exponent bias. It essentially sets the overall range of the floating point number.
- M is the number of mantissa bits.
- m is the mantissa with a value between 0 and 2M-1
zero can't be represented by this format, but it can be resented as a subnormal, so that is ok. All non-zero integers that can be represented are represented in the normalized format.
To convert a positive integer i to floating point we use.
e = floor(log2(i)) + e0
m = ((i/2(e – e0))-1) 2M
If i < 2M+1 then (e ? e0) ? M so m is an integer. Therefore i is representable.
If i = 2M+1 then (e ? e0) = M+1 and m = 0. Therefore i is representable.
If i = 2M+1 + 1 then (e ? e0) = M+1 and m = ½. Therefore i is not representable.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Peyman Mahdian |
| Solution 2 | Community |
| Solution 3 |
