'Optimization defeats robustness measures

I am currently investigating robust methods for the summation of arrays, and implemented the algorithm published by Shewchuk in "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates". While the implemented algorithm works as expected in gfortran, ifort optimizes the countermeasures away.

To give some context, here is my code:

module test_mod
contains
  function shewchukSum( array ) result(res)
    implicit none
    real,intent(in) :: array(:)
    real            :: res
    integer         :: xIdx, yIdx, i, nPartials
    real            :: partials(100), hi, lo, x, y

    nPartials = 0
    do xIdx=1,size(array)
      i = 0
      x = array(xIdx)

      ! Calculate the partial sums
      do yIdx=1,nPartials
        y = partials(yIdx)
        hi = x + y
        if ( abs(x) < abs(y) ) then
          lo = x - (hi - y)
        else
          lo = y - (hi - x)
        endif
        x = hi

        ! If a round-off error occured, store it. Exact comparison intended
        if ( lo == 0. ) cycle
        i = i + 1 ; partials(i) = lo
      enddo ! yIdx
      nPartials = i + 1 ; partials( nPartials ) = x
    enddo ! xIdx

    res = sum( partials(:nPartials) )
  end function
end module

And the calling test program is

program test
  use test_mod
  implicit none
  print *,        sum([1.e0, 1.e16, 1.e0, -1.e16])
  print *,shewchukSum([1.e0, 1.e16, 1.e0, -1.e16])
end program

Compilation using gfortran with produces the correct results for all optimization levels:

./a.out 
   0.00000000    
   2.00000000   

ifort, however, produces zeros for all optimizations above -O0:

./a.out 
   0.00000000
   0.00000000

I tried to debug the code and went down to the assembly level and figured out that ifort is optimizing away the calculation of lo and the operations after if ( lo == 0. ) cycle .

Is there a possibility to force ifort to perform the complete operation for all levels of optimization? This addition is a critical part of the calculations, and I want it to run as fast as possible. For comparison, gfortran at -O2 executes this code approximately eight to ten times faster than ifort at -O0 (measured for arrays of length >100k).



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source