Can I implement this kind of profiling code with a macro?
Not an expert on preprocessor macro tricks, so if the problem here is just that I'm not familiar with some common macro idiom I'd be happy with just a term to Google. X macros are about as far as I've got before and I'm pretty sure I can't do anything with them.
Right now I do some stuff like this in code:
std::size_t trial = 0;
std::array<std::array<uint64_t, 5>, MAX_TRIALS> results_f;
void f()
{
    unsigned int core;
    results_f[trial][0] = __rdtscp(&core);
    // do stuff
    results_f[trial][1] = __rdtscp(&core);
    // do some more stuff
    results_f[trial][2] = __rdtscp(&core);
    // do yet more stuff
    results_f[trial][3] = __rdtscp(&core);
    // do even more stuff
    results_f[trial][4] = __rdtscp(&core);
    if (++trial == MAX_TRIALS)
    {
        process_timestamps_f(results_f);
        trial = 0;
    }
}
(Here __rdtscp is an x86 intrinsic that gets a tick number from the CPU.)
I would instead like to be able to write something like so:
STOPWATCH_BOILERPLATE_PRE(f);
void f()
{
    STOPWATCH_BEGIN;
    // do stuff
    STOPWATCH_LAP;
    // do some more stuff
    STOPWATCH_LAP;
    // do yet more stuff
    STOPWATCH_LAP;
    // do even more stuff
    STOPWATCH_END;
}
STOPWATCH_BOILERPLATE_POST(f);
So basically, I need to be able to count the number of times that STOPWATCH_LAP appears inside of f and use it to set the size of an array which is visible inside of f.
Bonus: it would be nice if, inside of process_timestamps_f, I can write something like LINE_ID(f, 3) and get the results of the __LINE__ preprocessor macro at the third instance of STOPWATCH_LAP in f.
(I imagine the actual code probably can't look exactly like what I wrote above. Open to whatever modifications are required to make this work. The only real requirement is that I don't have to constantly count how many of these lap points I've put into a function and update the corresponding code to match.)
Solution 1:[1]
I am assuming you do not want to have any dynamic memory management involved? Because otherwise you could simply use a std::vector and do a push_back() for each result...
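For comparison, here is a minimal sketch of that std::vector variant. The names Lap, now_ticks, VEC_LAP and run_once are made up for illustration, and std::chrono::steady_clock stands in for __rdtscp only so that the sketch compiles on any platform.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record type: one timestamp plus the source line that took it.
struct Lap {
    std::uint64_t ticks;
    unsigned line;
};

// Portable stand-in for __rdtscp (an assumption, not part of the question).
inline std::uint64_t now_ticks() {
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
}

// Appends one lap; the vector grows as needed, so lap points never have to
// be counted by hand.
#define VEC_LAP(laps) (laps).push_back(Lap{now_ticks(), __LINE__})

// Example function with three lap points; returns the laps it recorded.
inline std::vector<Lap> run_once() {
    std::vector<Lap> laps;
    laps.reserve(8);  // optional: avoids a reallocation inside the timed region
    VEC_LAP(laps);
    // do stuff
    VEC_LAP(laps);
    // do some more stuff
    VEC_LAP(laps);
    return laps;
}
```

The cost is that push_back may allocate inside the region being measured, which is presumably why a fixed-size array is preferred in the question.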
Otherwise, I do not think this can be achieved easily using only standard language elements. But MSVC, clang and gcc support __COUNTER__, a non-standard predefined macro that expands to an integer which increases by one on each use, and that can be exploited here. If you store its value before the function and subtract that stored value at every "LAP", each lap gets a consecutive index, and one more use after the function gives you the total number of laps. Moreover, you do not need to know the first dimension of the result array before the function: declare it as a C-array via extern with the first dimension left empty, and define it after the function, once the number of laps is known.
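The counting idea can be tried in isolation. The following is a small sketch with made-up names (counter_start, NEXT_INDEX, num_uses); it assumes a compiler that provides __COUNTER__ and that nothing else expands __COUNTER__ between the first and last line.

```cpp
#include <cstddef>

// __COUNTER__ is a compiler extension (GCC, Clang, MSVC), not standard C++.
// Each expansion yields the next integer, so the difference between two
// expansions tells how many uses happened in between.
constexpr std::size_t counter_start = __COUNTER__;

// Hypothetical marker macro: every use expands to the next zero-based index.
#define NEXT_INDEX (__COUNTER__ - counter_start - 1)

constexpr std::size_t first  = NEXT_INDEX;
constexpr std::size_t second = NEXT_INDEX;
constexpr std::size_t third  = NEXT_INDEX;

// One more expansion closes the range; subtracting 1 excludes this closing
// expansion itself from the count.
constexpr std::size_t num_uses = __COUNTER__ - counter_start - 1;

static_assert(first == 0 && second == 1 && third == 2, "indices are consecutive");
static_assert(num_uses == 3, "three uses between start and end");
```

Because only differences are used, the absolute starting value of __COUNTER__ does not matter, but any unrelated use of __COUNTER__ in between (for example inside a third-party macro) would shift the result.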
You can also simply store the __LINE__ result at the same time when you store the __rdtscp() result.
See the following example. It is all quite fragile and assumes that the macros are used in that order, but depending on the actual code, it might be sufficient (https://godbolt.org/z/crrGY7n4P):
#include <array>
#include <cstdint>
#include <iostream>
#ifndef _MSC_VER
// https://code-examples.net/en/q/e19526
uint64_t __rdtscp( uint32_t * aux )
{
    uint64_t rax,rdx;
    asm volatile ( "rdtscp\n" : "=a" (rax), "=d" (rdx), "=c" (*aux) : : );
    return (rdx << 32) + rax;
}
#else
#include <intrin.h>
#endif
constexpr std::size_t MAX_TRIALS = 3;
struct ResultElem
{
    uint64_t timing;
    unsigned line;
};
void process_timestamps(ResultElem results[][MAX_TRIALS], std::size_t numResult, char const * const func)
{
    std::cout << func << ": Num = " << numResult << std::endl;
    for (std::size_t trial = 0; trial < MAX_TRIALS; ++trial) {
        std::cout << "\tTrial=" << trial << std::endl;
        for (std::size_t i = 0; i < numResult; ++i) {
            std::cout << "\t\tLine=" << results[i][trial].line << ", time=" << results[i][trial].timing << std::endl;
        }
    }
}
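// Declares the result array with an unknown first dimension and remembers
// the current __COUNTER__ value as the starting point for this function.
#define STOPWATCH_BOILERPLATE_PRE(f) \
    extern ResultElem results_ ## f[][MAX_TRIALS]; \
    constexpr std::size_t counterStart_ ## f = __COUNTER__; \
    std::size_t trial_ ## f = 0;
#define STOPWATCH_BEGIN(f) uint32_t core; STOPWATCH_LAP(f)
// Each use consumes one __COUNTER__ value, which yields consecutive indices 0, 1, 2, ...
#define STOPWATCH_LAP(f) results_ ## f[__COUNTER__ - counterStart_ ## f - 1][trial_ ## f] = {__rdtscp(&core), __LINE__}
// Records the final lap, then uses __COUNTER__ once more to compute the number of results.
#define STOPWATCH_END(f) \
    STOPWATCH_LAP(f); \
    if(++trial_ ## f == MAX_TRIALS) { \
        process_timestamps(results_ ## f, __COUNTER__ - counterStart_ ## f - 1, #f); \
        trial_ ## f = 0; \
    }
// Needs to be used directly after STOPWATCH_END() because we subtract 2 from __COUNTER__:
// one for the extra __COUNTER__ use inside STOPWATCH_END() and one for the use here.
#define STOPWATCH_BOILERPLATE_POST(f) \
    constexpr std::size_t numResult_ ## f = __COUNTER__ - counterStart_ ## f - 2; \
    ResultElem results_ ## f[numResult_ ## f][MAX_TRIALS];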
STOPWATCH_BOILERPLATE_PRE(f)
void f()
{
    STOPWATCH_BEGIN(f);
    // do stuff
    STOPWATCH_LAP(f);
    // do some more stuff
    STOPWATCH_LAP(f);
    // do even more stuff
    STOPWATCH_END(f);
}
STOPWATCH_BOILERPLATE_POST(f)
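The extern declaration in STOPWATCH_BOILERPLATE_PRE relies on the fact that an array may be declared with an unknown first dimension and completed by a later definition in the same translation unit. A stripped-down sketch of just that mechanism, with the hypothetical names samples, store and kRows, might look like this:

```cpp
#include <cstddef>

// Declaration only: the first dimension is unknown, so the type is incomplete.
extern int samples[][2];

// Code between the declaration and the definition may index the array,
// but cannot use sizeof(samples), because the bound is not known yet.
inline void store(std::size_t row, int a, int b) {
    samples[row][0] = a;
    samples[row][1] = b;
}

// Hypothetical bound; in the example above it is derived from __COUNTER__.
constexpr std::size_t kRows = 3;

// The definition supplies the missing bound and completes the type.
int samples[kRows][2];
```

Nothing checks that the indices used before the definition stay below the bound supplied later, which is one reason the macros above are fragile if used out of order.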
Alternatives I could think of:
Without dynamic allocations and staying in the standard, you might build something using BOOST_PP_COUNTER. The STOPWATCH_LAP would then probably turn into some form of #include statement.
I could also imagine that it might be possible to build something without macros using a weird loophole in C++14, but that gets terribly complicated.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sedenion |
