'unrolling for loops in a special case function
So I'm trying to optimize some code. I have a function with a variable sized loop. However for efficiency sake I want to make cases with 1, 2 and 3 sized loops special cases that are completely unrolled. My approach so far is to declare the loop size as a const parameter then define wrapper functions that call the main function handing it a literal for the const value. I've included a code snip it illustrating the kind of thing i have in mind.
inline void someFunction (const int a)
{
for (int i=0; i<a; i++)
{
// do something with i.
}
}
void specialCase()
{
someFunction (3);
}
void generalCase(int a)
{
someFunction (a);
}
So my question is is it reasonable for me to expect my compiler (GCC) to unroll the for loop inside of specialCase. I'm mean obviously I can copy - paste the contents of someFunction into specialCase and replace a with 3 but I'd rather only deal with one definition of someFunction in my code for clarity sake.
Solution 1:[1]
However for efficiency sake I want to make cases with 1, 2 and 3 sized loops special cases that are completely unrolled.
Have you measured that this is actually faster? I doubt it will be (or that the compiler won't unroll the loop automatically).
My approach so far is to declare the loop size as a const parameter then define wrapper functions that call the main function handing it a literal for the const value.
const doesn't mean anything here. It won't affect the compiler's ability to unroll the loop. It just means that a cannot be mutated inside the function body, but it's still a runtime argument.
If you want to ensure unrolling, then force it. It's quite easy with C++17.
template <typename F, std::size_t... Is>
void repeat_unrolled_impl(F&& f, std::index_sequence<Is...>)
{
(f(std::integral_constant<std::size_t, Is>{}), ...);
}
template <std::size_t Iterations, typename F>
void repeat_unrolled(F&& f)
{
repeat_unrolled_impl(std::forward<F>(f),
std::make_index_sequence<Iterations>{});
}
Solution 2:[2]
If you don't like templates and don't trust your compiler, there's always this method, which is inspired by the outdated method of manually unrolling loops called "duff's device":
void do_something(int i);
void do_something_n_times(int n)
{
int i = 0;
switch(n)
{
default:
while(n > 3) {
do_something(i++);
--n;
}
case 3: do_something(i++);
case 2: do_something(i++);
case 1: do_something(i++);
}
}
But I think it's worth saying that if you don't trust your compiler to do something so simple as loop unrolling for you, it's probably time to consider a new compiler.
Note that Duff's device was originally invented as a micro-optimisation strategy for programs compiled with compilers that did not automatically apply loop-unrolling optimisations.
It was invented by Tom Duff in 1983.
https://en.wikipedia.org/wiki/Duff%27s_device
Its use with modern compilers is questionable.
Solution 3:[3]
I'd rather go this way, if you're willing to use the force-inline (non-standard) feature of all popular compilers:
__attribute__((always_inline))
void bodyOfLoop(int i) {
// put code here
}
void specialCase() {
bodyOfLoop(0);
bodyOfLoop(1);
bodyOfLoop(2);
}
void generalCase(int a) {
for (int i=0; i<a; i++) {
bodyOfLoop(i);
}
}
Note: this is GCC/Clang solution. Use __forceinline for MSVC.
Solution 4:[4]
How about this C++20 unrolling-helpers:
#pragma once
#include <utility>
#include <concepts>
#include <iterator>
template<size_t N, typename Fn>
requires (N >= 1) && requires( Fn fn, size_t i ) { { fn( i ) } -> std::same_as<void>; }
inline
void unroll( Fn fn )
{
auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> )
{
(fn( Indices ), ...);
};
unroll_n( std::make_index_sequence<N>() );
}
template<size_t N, typename Fn>
requires (N >= 1) && requires( Fn fn ) { { fn() } -> std::same_as<void>; }
inline
void unroll( Fn fn )
{
auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> )
{
return ((Indices, fn()), ...);
};
unroll_n( std::make_index_sequence<N>() );
}
template<size_t N, typename Fn>
requires (N >= 1) && requires( Fn fn, size_t i ) { { fn( i ) } -> std::convertible_to<bool>; }
inline
bool unroll( Fn fn )
{
auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> ) -> bool
{
return (fn( Indices ) && ...);
};
return unroll_n( std::make_index_sequence<N>() );
}
template<size_t N, typename Fn>
requires (N >= 1) && requires( Fn fn ) { { fn() } -> std::convertible_to<bool>; }
inline
bool unroll( Fn fn )
{
auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> ) -> bool
{
return ((Indices, fn()) && ...);
};
return unroll_n( std::make_index_sequence<N>() );
}
template<std::size_t N, typename RandomIt, typename UnaryFunction>
requires std::random_access_iterator<RandomIt>
&& requires( UnaryFunction fn, typename std::iterator_traits<RandomIt>::value_type elem ) { { fn( elem ) }; }
inline
RandomIt unroll_for_each( RandomIt begin, RandomIt end, UnaryFunction fn )
{
RandomIt &it = begin;
if constexpr( N > 1 )
for( ; it + N <= end; it += N )
unroll<N>( [&]( size_t i ) { fn( it[i] ); } );
for( ; it < end; ++it )
fn( *begin );
return it;
}
But be aware that the unrolling-factor is crucial here. Unrolling is not always beneficial and sometimes unrolling beyond the optimal CPU-specific unrolling-factor drops to the performance without unrolling.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Vittorio Romeo |
| Solution 2 | |
| Solution 3 | geza |
| Solution 4 | Bonita Montero |
