'Efficient long multiplication with double word result

I am trying to optimize the multiplication code of a 31.32 fixed-point math library written in C#.

The (incorrect) pseudo code is: long result = (a * b) >> 32;

The problem of course is the potential overflow of a * b before being shifted down. Even if (a * b) >> 32 (the end result of the multiplication) is in the value range of long, the intermediate value a * b may not be.

The usual solution is to split a and b into a low and high part each and perform the shift operation on the high parts before the multiplication step. This avoids the overflow of the intermediate value, but makes the code rather more complicated:

var xl = x.m_rawValue;
var yl = y.m_rawValue;

var xlo = (ulong)(xl & 0x00000000FFFFFFFF);
var xhi = xl >> FRACTIONAL_PLACES;
var ylo = (ulong)(yl & 0x00000000FFFFFFFF);
var yhi = yl >> FRACTIONAL_PLACES;

var lolo = xlo * ylo;
var lohi = (long)xlo * yhi;
var hilo = xhi * (long)ylo;
var hihi = xhi * yhi;

var loResult = lolo >> FRACTIONAL_PLACES;
var midResult1 = lohi;
var midResult2 = hilo;
var hiResult = hihi << FRACTIONAL_PLACES;

var sum = (long)loResult + midResult1 + midResult2 + hiResult;

The resulting machine code is similarly complex.

The x86 imul instruction can return a double word result in two registers in a single instruction, but I have no idea how to write C# code that the compiler could optimize to use this.

Any ideas?

c++sharp .net

Solution 1:^[1]

Core 3.0 added support for that imul x64 CPU instruction you were looking for:

ulong lo64Bits;
ulong hi64Bits= System.Runtime.Intrinsics.X86.Bmi2.X64.MultiplyNoFlags(a, b, &lo);

(of course it is limited to x86 arch)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	SunsetQuest

'Efficient long multiplication with double word result

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]