'Debug printf performance when compiled with clang cfi
Setup
I have a simple helloworld program:
// content of main.c
#include <stdio.h>
#include <limits.h>
int main() {
for (int i = 0; i < INT_MAX; ++i) {
printf("simply helloworld!\n");
}
return 0;
}
I compile a baseline version with clang 13.0.0 using clang -flto=thin -fvisibility=hidden -fuse-ld=lld main.c
To experiment with CFI, I compile another version using clang -flto=thin -fsanitize=cfi -fsanitize-cfi-cross-dso -fno-sanitize-cfi-canonical-jump-tables -fsanitize-trap=cfi -fvisibility=hidden -fuse-ld=lld main.c
Expectation
I am expecting negligible performance overhead as I am only calling into a shared library that I expect will run the same code for both. The disassembly for main function for both binaries look the same.
Reality
The baseline version completes execution in ~27s while the cfi version completes execution in ~32s. Using perf stat -e instructions <binary> I can see that the cfi version runs ~100,000,000,000 more instructions. With perf record then perf diff, I can see that the difference is primarily in two functions _pthread_cleanup_push_defer and _pthread_cleanup_pop_restore that the cfi version runs. Using gdb, these functions are called as the call stack of printf gets deeper.
Question
How do I begin to explain the performance difference between these two binaries? What makes a simple call to printf call two different versions of itself for two different binaries?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
