Wrong result of multiplication: Undefined behavior or compiler bug?
Background
While debugging a problem in a numerical library, I was able to pinpoint the first place where the numbers started to become incorrect. However, the C++ code itself seemed correct. So I looked at the assembly produced by Visual Studio's C++ compiler and started suspecting a compiler bug.
Code
I was able to reproduce the behavior in a strongly simplified, isolated version of the code:
sourceB.cpp:
double alwaysOneB(double a[3]) {
    return 1.0;
}
main.cpp:
#include <iostream>

__declspec(noinline)
bool alwaysTrue() {
    return true;
}

__declspec(noinline)
double alwaysOneA(const double a[3]) {
    return 1.0;
}

double alwaysOneB(double a[3]); // implemented in sourceB.cpp

int main() {
    double* result = new double[2];
    if (alwaysTrue()) {
        double v[3];
        v[0] = 0.0;
        v[1] = 0.0;
        v[2] = 0.0;
        alwaysOneB(v);
        double d = alwaysOneA(v); // d = 1
        std::cout << "d = " << d << std::endl; // output: "d = 1" (as expected)
        result[0] = d * v[2];
        result[1] = d * d; // should be: 1 * 1 => 1
    }
    if (alwaysTrue()) {
        // observed: a garbage value, e.g. "result[1] = 2.23943e-47" (expected: 1)
        std::cout << "result[1] = " << result[1] << std::endl;
    }
    delete[] result;
    return 0;
}
The code contains some bogus calls to other functions that are (unfortunately) necessary to reproduce the problem. However, the expected behavior should still be pretty clear. A value of 1.0 is assigned to the variable d, which is then multiplied by itself. That result should again be 1.0, which is written to an array and printed to the console. So the desired output is:
d = 1
result[1] = 1
However, the output I actually get is (the exact wrong value varies):
d = 1
result[1] = 3.77013e+214
Test Environment
The code was tested with the C++ compiler that comes with Visual Studio Community 2019 (latest update, VS 16.11.9, VC++ 00435-60000-00000-AA327). The problem only occurs with optimizations activated (/O2). Compiling with /Od produces a binary that prints the correct output.
In the reduced example (but not in the original problem, when compiling the full library) I also had to deactivate "Whole Program Optimization"; otherwise the compiler gets rid of my bogus function calls.
This reduced example only reproduces the problem when compiled for x86 (other examples reproduce the problem for x64).
The full compilation command line is as follows:
/permissive- /ifcOutput "Release\" /GS /analyze- /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /sdl /Fd"Release\vc142.pdb" /Zc:inline /fp:precise /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /Gd /Oy- /Oi /MD /FC /Fa"Release\" /EHsc /nologo /Fo"Release\" /Fp"Release\DecimateBug2.pch" /diagnostics:column
Full Visual Studio solution to download: https://drive.google.com/file/d/1EyoX0uXEkvfJ_Fh649k9XjJQPdDUMik7/view?usp=sharing
Both the GNU compiler and Clang produce binaries that print the desired result.
Question
Is there any undefined behavior in this code that I am unable to see and that justifies an incorrect result? Or should I report this as a compiler bug?
Assembly produced by the compiler
For the two multiplication lines
result[0] = d * v[2];
result[1] = d * d;
the compiler produces the following assembly code:
00CF1432  movsd    xmm1,mmword ptr [esp+18h]   // Load d into the lower half of xmm1
00CF1438  unpcklpd xmm1,xmm1                   // Duplicate d into the upper half of xmm1
00CF143C  movups   xmm0,xmmword ptr [esp+30h]  // Load both second operands into xmm0
00CF1441  mulpd    xmm0,xmm1                   // 2 multiplications at once
00CF1445  movups   xmmword ptr [esi],xmm0      // Store both results
Apparently the compiler tries to perform the two multiplications at once using mulpd. The first two instructions correctly load d into both halves of the xmm1 register (the first operands). But to load the two second operands (v[2] and d), it simply reads 128 bits starting at the address of v[2] (esp+30h). That is fine for the second operand of the first multiplication (v[2]), but not for the second multiplication, whose second operand should be d. The generated code seems to assume that d is located immediately after v in memory. However, it isn't: as far as I can tell from the first instruction, d is read from esp+18h, not from the 8 bytes that follow v[2].
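To make the mismatch concrete, here is a minimal, portable C++ sketch that models the two 64-bit lanes of the xmm registers. It is only an illustration of my reading of the assembly, not what the compiler actually runs. The names Lanes, mulpdModel, intended and emitted are made up for this sketch, and the 4-element array frame is an assumption that stands in for v[0..2] plus the 8 bytes that happen to follow v on the stack.

```cpp
#include <array>

// Hypothetical model of a 128-bit SSE register holding two doubles.
using Lanes = std::array<double, 2>;

// Models mulpd: multiplies the two lanes independently.
Lanes mulpdModel(const Lanes& a, const Lanes& b) {
    return {a[0] * b[0], a[1] * b[1]};
}

// What the C++ source asks for: { d * v[2], d * d }.
Lanes intended(const double* frame, double d) {
    Lanes first = {d, d};          // movsd + unpcklpd: d in both lanes
    Lanes second = {frame[2], d};  // second operands: v[2] and d
    return mulpdModel(first, second);
}

// What the emitted code appears to do: the second operands come from
// one 16-byte load starting at v[2], so the upper lane receives
// whatever follows v in memory (frame[3] here) instead of d.
Lanes emitted(const double* frame, double d) {
    Lanes first = {d, d};
    Lanes second = {frame[2], frame[3]};  // movups from the address of v[2]
    return mulpdModel(first, second);
}
```

With frame = {0, 0, 0, 7.5} and d = 1, intended(frame, 1.0) gives {0, 1}, while emitted(frame, 1.0) gives {0, 7.5}: the first result is still correct, and the second one depends entirely on the unrelated value behind v.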
This makes me strongly suspect a compiler bug. However, I wanted to confirm that I am not missing any undefined behavior that justifies the incorrect assembly.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
