How can I accurately time (for a specific processor) a very deeply nested loop?
The AMD Ryzen 5000 series processor (used in some laptops like the Lenovo IdeaPad 3) has hardware support for floating-point multiply and floating-point square root. The floating-point square root apparently can start one operation per processor clock cycle (at 2-4 GHz), with each result taking about 20 cycles to complete (latency).
I need to time how long a five-level nested loop takes to run to completion. I am thinking about using GCC and Code::Blocks.
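For the timing itself, a minimal sketch in C might look like the following. It assumes a POSIX-style `clock_gettime` with `CLOCK_MONOTONIC` is available; on Windows toolchains that lack it, `QueryPerformanceCounter` plays a similar role. The names `five_level_kernel` and `time_kernel` are hypothetical, and the loop body is only a placeholder for the real computation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* exposes clock_gettime under strict -std=c11 */
#endif
#include <time.h>

/* Hypothetical stand-in for the real five-level loop.
   n is the trip count of every level. */
double five_level_kernel(int n)
{
    double acc = 0.0;
    for (int a = 1; a <= n; a++)
        for (int b = 1; b <= n; b++)
            for (int c = 1; c <= n; c++)
                for (int d = 1; d <= n; d++)
                    for (int e = 1; e <= n; e++)
                        acc += (double)a * b * c * d * e;  /* placeholder body */
    return acc;
}

/* Runs the kernel once and returns the elapsed wall-clock time in seconds.
   The kernel's result is stored through *result so the work stays
   observable and is not discarded as dead code. */
double time_kernel(int n, double *result)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *result = five_level_kernel(n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

Calling `time_kernel(n, &result)` several times and keeping the smallest elapsed time tends to reduce noise from the operating system, and printing `result` afterwards keeps the computation live.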
I plan to run it as a console application, but I need to make sure the compiler is using the floating-point multiply and square root hardware.
My first question: How do I make sure the compiler is using the floating-point multiply and square root hardware?
I also need to make sure the compiler has not inserted debug, Windows, or other extraneous code into the nested loop.
My second question: How do I check that the compiler has not inserted debug, Windows, or other extraneous code into the nested loop?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
