C/C++: Are IEEE 754 float addition/multiplication/... and int-to-float conversion standardized?

Example:

#include <math.h>
#include <stdio.h>

int main()
{
    float f1 = 1;
    float f2 = 4.f * 3.f;
    float f3 = 1.f / 1024.f;
    float f4 = 3.f - 2.f;
    printf("%a\n",f1);
    printf("%a\n",f2);
    printf("%a\n",f3);
    printf("%a\n",f4);
    return 0;
}

Output on gcc/clang as expected:

0x1p+0
0x1.8p+3
0x1p-10
0x1p+0

As one can see, the results look "reasonable". However, a result that is merely very close would also look reasonable, so eyeballing the output does not prove that the bits are identical on every implementation.

Is it guaranteed in C and in C++ that IEEE 754 floating arithmetic like addition, multiplication and int-to-float conversion yield the same results, on all machines and with all compilers (i.e. that the resulting floats are all bit-wise equal)?



Solution 1:[1]

No, unless the macro __STDC_IEC_559__ is defined.

Basically the standard does not require IEEE 754-compatible floating point, so most compilers will use whatever floating-point support the hardware provides. If the hardware provides IEEE-compatible floating point, most compilers for that target will use it and predefine the __STDC_IEC_559__ macro.

If the macro is defined, then Annex F guarantees the bit representation (but not the byte order) of float and double as the 32-bit and 64-bit IEEE 754 binary formats. This in turn guarantees bit-exact results for double arithmetic (but note that the C standard allows float arithmetic to be evaluated at either 32-bit or 64-bit precision).

The C standard requires float-to-int conversion to truncate toward zero (like the trunc function) whenever the result is in range for the target type, but unfortunately IEEE 754 doesn't actually define the behavior of library functions, just of the basic arithmetic operations. The C standard also allows the compiler to reorder operations in violation of IEEE 754 (which might affect precision), but most compilers that support IEEE 754 will not do so without a command-line option.

Anecdotal evidence also suggests that some compilers do not define the macro even though they should, while other compilers define it when they should not (i.e. they do not strictly follow all the requirements of IEEE 754). These cases should probably be considered compiler bugs.

Solution 2:[2]

Is it guaranteed in C and in C++ that IEEE 754 floating arithmetic like addition, multiplication and int-to-float conversion yield the same results, on all machines and with all compilers (i.e. that the resulting floats are all bit-wise equal)?

No.


If the exceptional compiler defines __STDC_IEC_559__, then almost yes.

An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex.
C17dr Annex F (normative): IEC 60559 floating-point arithmetic

IEEE 754 floating arithmetic like addition, multiplication and int-to-float conversion yields like results when FLT_EVAL_METHOD == 0. When FLT_EVAL_METHOD > 0, wider floating-point math may be used for many operations, causing different results. Yet even with FLT_EVAL_METHOD == 0, I have doubts that all FP code will compute exactly the same result.

For highly portable FP code, a variation tolerance should be expected.


The OP is also asking about bit-wise equivalence. FP has endian issues too, so two implementations could meet all the IEEE 754 criteria yet still differ in the byte order of the stored representation.

Solution 3:[3]

Realize that both the C and C++ standards strive to be inclusive of unusual architectures. They would never mandate strict adherence to IEEE-754.

Also realize that the systems that do use IEEE-754 will rely on the processor architecture to implement it correctly. Your actual question then is how well the processors conform to the IEEE-754 rules, which is hard to answer with authority. The Intel Pentium famously had a bug (the FDIV bug) that produced wrong results for a tiny subset of division operations.

I don't know if the conversion of integers to float is as tightly specified as the other operations, but I suspect it is. A 32-bit IEEE-754 float has 24 bits of mantissa, and can therefore hold any 24-bit integer without loss of precision; that is the range from -16777216 to 16777216. I would be very disappointed in any implementation that couldn't perform that conversion 100% reliably.

Outside of that range there are integer values that can't be represented exactly, so rounding must be applied to determine the final value. For example, there are no valid floats between 2147483520 and 2147483648, so what should happen if you try to convert 2147483583 or 2147483585? I honestly don't know what the result will be, or whether that result would be correct.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution: Source
Solution 1:
Solution 2:
Solution 3: Mark Ransom