Standard-compliant host to network endianness conversion
I am amazed at how many topics on StackOverflow deal with finding out the endianness of the system and converting endianness. I am even more amazed that there are hundreds of different answers to these two questions. All proposed solutions that I have seen so far are based on undefined behaviour, non-standard compiler extensions or OS-specific header files. In my opinion, this question is only a duplicate if an existing answer gives a solution that is standard-compliant, efficient (e.g., compiles down to x86 bswap) and usable at compile time.
Surely there must be a standard-compliant solution available that I am unable to find in the huge mess of old "hacky" ones. It is also somewhat strange that the standard library does not include such a function. Perhaps the attitude towards such issues is changing, since C++20 introduced a standard way to detect endianness (via std::endian), and C++23 will probably include std::byteswap, which reverses the byte order.
In any case, my questions are these:
Starting at what C++ standard is there a portable standard-compliant way of performing host to network byte order conversion?
I argue below that it's possible in C++20. Is my code correct and can it be improved?
Should such a pure-C++ solution be preferred to OS-specific functions such as POSIX htonl? (I think yes.)
I think I can give a C++23 solution that is OS-independent, efficient (no system call, compiles down to x86 bswap) and portable to little-endian and big-endian systems (but not to mixed-endian systems):
// requires C++23; see https://gcc.godbolt.org/z/6or1sEvKn
#include <bit>       // std::endian, std::byteswap
#include <concepts>  // std::integral
#include <type_traits>
#include <utility>

constexpr inline auto host_to_net(std::integral auto i) {
    static_assert(std::endian::native == std::endian::big ||
                  std::endian::native == std::endian::little);
    if constexpr (std::endian::native == std::endian::big) {
        return i;
    } else {
        return std::byteswap(i);
    }
}
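To illustrate how this would typically be used, here is a minimal sketch of my own (not part of the original question; the helper name serialize_be is made up): the converted value is copied into a byte buffer with std::memcpy, which avoids any aliasing issues, and the conversion itself can be checked in a constant expression.

// usage sketch for host_to_net (illustration only)
#include <array>
#include <cstdint>
#include <cstring>

// serialize a 32-bit integer into a big-endian ("network order") byte array
std::array<unsigned char, sizeof(std::uint32_t)> serialize_be(std::uint32_t host_value) {
    std::array<unsigned char, sizeof(std::uint32_t)> buf{};
    const std::uint32_t wire_value = host_to_net(host_value);
    std::memcpy(buf.data(), &wire_value, sizeof(wire_value)); // copy bytes, no aliasing UB
    return buf;
}

// the conversion is usable at compile time
static_assert(host_to_net(std::uint32_t{0x01020304}) ==
              (std::endian::native == std::endian::little ? 0x04030201u : 0x01020304u));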
Since std::endian is available in C++20, one can give a C++20 solution for host_to_net by implementing byteswap manually. A solution is described here, quote:
// requires C++17
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>

template<class T, std::size_t... N>
constexpr T bswap_impl(T i, std::index_sequence<N...>) {
    return ((((i >> (N * CHAR_BIT)) & (T)(unsigned char)(-1)) <<
             ((sizeof(T) - 1 - N) * CHAR_BIT)) | ...); // fold expression
}

template<class T, class U = typename std::make_unsigned<T>::type>
constexpr U bswap(T i) {
    return bswap_impl<U>(i, std::make_index_sequence<sizeof(T)>{});
}
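As a quick sanity check (my addition, not part of the quoted answer), the swap can be verified in a constant expression:

static_assert(bswap(std::uint32_t{0x01020304}) == 0x04030201u);
static_assert(bswap(std::uint16_t{0xA0B0}) == 0xB0A0);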
The linked answer also provides a C++11 byteswap, but that one seems to be less efficient (it does not compile down to x86 bswap). I think there should be an efficient C++11 way of doing this, too (using either less template nonsense or even more), but I don't care about older C++ and didn't really try.
Assuming I am correct, the remaining question is: can one determine the system endianness before C++20 at compile time in a standard-compliant and compiler-agnostic way? None of the answers here seem to achieve this. They use reinterpret_cast (not compile time), OS headers, union aliasing (which I believe is UB in C++), etc. Also, for some reason, they try to do it "at runtime", although a compiled executable will always run under the same endianness.
One could do the check outside of a constexpr context and hope it gets optimized away. Alternatively, one could use system-defined preprocessor definitions and account for all platforms, which seems to be the approach taken by Boost. Or maybe (although I would guess the other way is better?) use macros to pick platform-specific htonl-style functions from networking libraries (done, e.g., here (GitHub))?
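For reference, a minimal sketch of the preprocessor approach could look like the following (my addition, not from the original question; the macro name MY_LITTLE_ENDIAN is made up). It relies on the __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__ macros predefined by GCC and Clang, and assumes little-endian on MSVC, whose current targets are all little-endian:

// pre-C++20 endianness detection via compiler-specific predefined macros (sketch)
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && defined(__ORDER_BIG_ENDIAN__)
  // GCC and Clang
  #define MY_LITTLE_ENDIAN (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#elif defined(_MSC_VER)
  // MSVC currently only targets little-endian platforms (x86, x64, ARM, ARM64)
  #define MY_LITTLE_ENDIAN 1
#else
  #error "Unknown compiler: add an endianness detection case here"
#endif

constexpr bool is_little_endian = MY_LITTLE_ENDIAN != 0;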
Solution 1:[1]
I made a benchmark comparing my C++ solution from the question and the solution by eeroika from the accepted answer.
Looking at this is a complete waste of time, but now that I did it, I thought I might as well share it. The result is that (in the specific, not-quite-realistic use case I look at) they seem to be equivalent in terms of performance. This is despite my solution being compiled to an x86 bswap instruction, while the solution by eeroika does it with plain mov instructions.
The performance seems to differ a lot (!!) between compilers, and the main thing I learned from these benchmarks is, again, that I'm just wasting my time...
// Benchmark comparing two stand-alone C++20 host-to-big-endian conversions.
// Run at quick-bench.com! This is not a complete program. (https://quick-bench.com/q/2qnr4xYKemKLZupsicVFV_09rEk)
// To run locally, include Google benchmark header and a main method as required by the benchmarking library.
// Adapted from https://stackoverflow.com/a/71004000/9988487
#include <bit>
#include <climits>
#include <concepts>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <random>
#include <ranges>
#include <type_traits>
#include <utility>
#include <vector>
/////////////////////////////// Solution 1 ////////////////////////////////
template <typename T> struct scalar_t { T t{}; /* no begin/end */ };
static_assert(not std::ranges::range< scalar_t<int> >); // note: not used in the benchmarks below
template<class T, std::size_t... N>
constexpr T bswap_impl(T i, std::index_sequence<N...>) noexcept {
    constexpr auto bits_per_byte = 8u;
    static_assert(bits_per_byte == CHAR_BIT);
    return ((((i >> (N * bits_per_byte)) & (T)(unsigned char)(-1)) <<
             ((sizeof(T) - 1 - N) * bits_per_byte)) | ...); // fold expression
}

template<class T, class U = typename std::make_unsigned<T>::type>
constexpr U bswap(T i) noexcept {
    return bswap_impl<U>(i, std::make_index_sequence<sizeof(T)>{});
}
constexpr inline auto host_to_net(std::integral auto i) {
    static_assert(std::endian::native == std::endian::big ||
                  std::endian::native == std::endian::little);
    if constexpr (std::endian::native == std::endian::big) {
        return i;
    } else {
        return bswap(i); // replace by `std::byteswap` once it's available!
    }
}
/////////////////////////////// Solution 2 ////////////////////////////////
// helper to promote an integer type
template <class T>
using promote_t = std::decay_t<decltype(+std::declval<T>())>;
template <class T, std::size_t... I>
constexpr void host_to_big_impl(unsigned char* buf, T t,
                                [[maybe_unused]] std::index_sequence<I...>) noexcept {
    using U = std::make_unsigned_t<promote_t<T>>;
    constexpr U lastI = sizeof(T) - 1u;
    constexpr U bits = 8u;
    U u = t;
    ((buf[I] = u >> ((lastI - I) * bits)), ...);
}

template <class T>
constexpr void host_to_big(unsigned char* buf, T t) noexcept {
    using Indices = std::make_index_sequence<sizeof(T)>;
    host_to_big_impl<T>(buf, t, Indices{});
}
//////////////////////// Benchmarks ////////////////////////////////////
template<std::integral T>
std::vector<T> get_random_vector(std::size_t length, unsigned int seed) {
    // NOTE: it is very slow to recreate the RNG every time. Don't use in production code!
    std::mt19937_64 rng{seed};
    std::uniform_int_distribution<T> distribution(
        std::numeric_limits<T>::min(), std::numeric_limits<T>::max());
    std::vector<T> result(length);
    for (auto&& val : result) {
        val = distribution(rng);
    }
    return result;
}
template<>
std::vector<bool> get_random_vector<bool>(std::size_t length, unsigned int seed) {
    // NOTE: it is very slow to recreate the RNG every time. Only use for testing!
    std::mt19937_64 rng{seed};
    std::bernoulli_distribution distribution{0.5};
    std::vector<bool> vec(length);
    for (auto&& val : vec) {
        val = distribution(rng);
    }
    return vec;
}
constexpr std::size_t n_ints{1000};
static void solution1(benchmark::State& state) {
    std::vector<int> intvec = get_random_vector<int>(n_ints, 0);
    std::vector<std::uint8_t> buffer(sizeof(int) * intvec.size());
    for (auto _ : state) {
        for (std::size_t i{}; i < intvec.size(); ++i) {
            // Solution 1: byte-swap the whole integer, then copy its bytes into the buffer.
            const auto big = host_to_net(intvec[i]);
            std::memcpy(buffer.data() + sizeof(int) * i, &big, sizeof(big));
        }
        benchmark::DoNotOptimize(buffer);
        benchmark::ClobberMemory();
    }
}
BENCHMARK(solution1);

static void solution2(benchmark::State& state) {
    std::vector<int> intvec = get_random_vector<int>(n_ints, 0);
    std::vector<std::uint8_t> buffer(sizeof(int) * intvec.size());
    for (auto _ : state) {
        for (std::size_t i{}; i < intvec.size(); ++i) {
            // Solution 2: write the bytes directly into the buffer in big-endian order.
            host_to_big(buffer.data() + sizeof(int) * i, intvec[i]);
        }
        benchmark::DoNotOptimize(buffer);
        benchmark::ClobberMemory();
    }
}
BENCHMARK(solution2);
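To build the benchmark locally instead of on quick-bench.com, the missing pieces mentioned in the header comment are, assuming the Google Benchmark library is installed, the benchmark header at the top of the file and the library-provided main at the end:

// additions for a local build (link against Google Benchmark, e.g. -lbenchmark -lpthread)
#include <benchmark/benchmark.h>  // goes at the top of the file

BENCHMARK_MAIN();  // goes at the end of the file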
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
