'Cycles Per Element V.S. actual performance of Polynomial Evaluation
In the book "Computer Systems: A Programmer's Perspective (3rd edition)"'s chapter 5, exercise 5.5 and 5.6 talked about Polynomial Evaluation:
It also gives two implementation poly() and polyh(), and says poly()'s CPE(Cycles Per Element) is 5.0 and polyh()'s CPE is 8.0, thus concludes poly() run faster than polyh(). **But with clang-12 or clang-14 on my ubuntu20.04, polyh() is much faster, instead of what these exercises said. I'm confused. **
The Polynomial Evaluation implementations:
// the naive method
double poly(double a[], double x, long degree)
{
long i;
double result = a[0];
double xpwr = x;
for (i = 1; i <= degree; i++)
{
result += a[i] * xpwr;
xpwr = x * xpwr;
}
return result;
}
// the Horner's method
double polyh(double a[], double x, long degree)
{
long i;
double result = a[degree];
for (i = degree-1; i>=0; i--)
{
result = a[i] + x * result;
}
return result;
}
My compilation flags: -O1. Full implementation (including timer) is: https://godbolt.org/z/3eW8Wzr7z
My time cost result:
polyh: took 2.318 ms, loop=10, avg = 0.232 ms
poly: took 78.980 ms, loop=10, avg = 7.898 ms
Why polyh run faster with large CPE?
update: Based on the comments of @Passer By, I use the website quich-bench for time cost measurement, and with different array size, the benchmark result is different:
n = 1000, poly() is faster (https://quick-bench.com/q/EpDmf22VD_E0CvLN0-6TY_Ye8bU)
n = 10000 , polyh() is much faster (https://quick-bench.com/q/yuzoVzz_KhWv1gJ-_j9wlZtfWVM)
Solution 1:[1]
I think there is some confusion regarding the statements in the book. The link you have provided clearly shows polyh() to have less CPE than poly():
polyh(double*, double, long):
# skipping non-loop code...
mulsd xmm0, xmm1
addsd xmm0, qword ptr [rdi + 8*rsi - 16]
add rsi, -1
cmp rsi, 1
jg .LBB1_2
vs
poly(double*, double, long):
# skipping non-loop code...
movsd xmm3, qword ptr [rdi + 8*rax + 8]
mulsd xmm3, xmm2
addsd xmm0, xmm3
mulsd xmm2, xmm1
add rax, 1
cmp rsi, rax
jne .LBB0_2
Clearly polyh() is more precise code in comparission with poly().
Now lets talk about optimization. First of all -O0 is used to disable optimization. -01 is the minimum optimizations.
But even if you throw optimization out of the window the code in polyh() is optimized before even compilation. It has only 1 of each multiplication, addition and assigment while poly() has 2 multiplications and assigments.
Clearly polyh() is leaner and farter code.
UPDATE:
After updated question here is what I found. I tested with same quick-bench but used GCC instead of CLANG as I was using on my computer, and thee results are still same. polyh() wins even with 1000 iterations.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |




