How to copy a binary data vector to a float vector instead of saving it to a file with ofstream.write?

I'm new to C++, and I receive a memory buffer as input: const std::vector<uint8_t>& buffer.

The original code uses:

   std::ofstream os(path, std::ofstream::binary);
   for ( auto it = buffer.begin() + batchIndex * batchChunk; it != buffer.begin() + (batchIndex+1) * batchChunk; ++it )
   {
      uint8_t u = *it;
      os.write((char*)(&u), sizeof(uint8_t)); // write one raw byte to the file
   }

to save the data to a file, where batchIndex = 0 and batchChunk = 4000. (Oddly, the actual data should be 1000 floats, and that is what I get when I read the file with numpy in Python.)

I want to store this binary data in a vector instead of saving it to a file. Here is what I did:

  std::vector<float> vec;
  vec.resize(batchChunk);
  std::copy(buffer.begin(), buffer.begin() + (batchIndex+1) * batchChunk, std::back_inserter(vec));

Then I got a vector of length 4000, full of integer values (it should have length 1000 and contain floats).

Please leave your advice! Thank you so much.

c++


Solution 1:[1]

The first thing is that float is 4 bytes in your case, that is 4 times the size of uint8_t.

So you should not resize with batchChunk but with batchChunk/sizeof(float) instead (you may directly use buffer.size() instead of batchChunk).

That being said, since you use a std::back_inserter, you should not call resize() at all: std::back_inserter appends new elements after the ones resize() has already created, so you would end up with twice the intended size.
On the other hand, you may use reserve() instead in order to avoid unnecessary reallocations since you already know the final size of the container. This is not mandatory though.
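
To make the difference between resize() and reserve() concrete, here is a minimal sketch. The helper names copy_after_resize and copy_after_reserve are made up for this illustration and do not come from the question. Note that both versions still convert each byte to one float value, which is the second problem discussed below.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper: mimics the question's code (resize, then back_inserter).
std::vector<float> copy_after_resize(const std::vector<std::uint8_t>& bytes)
{
    std::vector<float> out;
    out.resize(bytes.size());   // out already holds bytes.size() zero-valued floats
    std::copy(bytes.begin(), bytes.end(), std::back_inserter(out)); // appends after them
    assert(out.size() == 2 * bytes.size());
    return out;
}

// Hypothetical helper: reserve() only changes the capacity, the size stays 0.
std::vector<float> copy_after_reserve(const std::vector<std::uint8_t>& bytes)
{
    std::vector<float> out;
    out.reserve(bytes.size());  // avoids reallocations, adds no elements
    std::copy(bytes.begin(), bytes.end(), std::back_inserter(out));
    assert(out.size() == bytes.size());
    return out;
}
```

With a 4-byte input, the first helper returns 8 elements (4 zeros followed by the 4 converted bytes), while the second returns 4.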

The second thing is that you actually construct each float value from a single uint8_t value. Hence the wrong value in output. You don't want to "cast" uint8_t to float, you want/need to reconstruct each float from the 4 corresponding uint8_t values.

The easiest way of doing it is to treat the std::vector<uint8_t> as if it was composed of float values by using reinterpret_cast.
You may also iterate over your container 4 by 4 and recreate each float one by one before pushing them back into the destination vector, but you lose the conciseness of using std::copy.
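
The "4 by 4" loop could look roughly like the sketch below. The function name bytes_to_floats is hypothetical, and the sketch assumes the bytes were produced on a machine with the same float representation and byte order as the one reading them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: rebuilds one float from every group of sizeof(float)
// bytes. Trailing bytes that do not form a complete float are ignored.
std::vector<float> bytes_to_floats(const std::vector<std::uint8_t>& bytes)
{
    std::vector<float> out;
    out.reserve(bytes.size() / sizeof(float));

    for (std::size_t i = 0; i + sizeof(float) <= bytes.size(); i += sizeof(float))
    {
        float value;
        std::memcpy(&value, bytes.data() + i, sizeof(float)); // copy the raw bytes of one float
        out.push_back(value);
    }

    assert(out.size() == bytes.size() / sizeof(float));
    return out;
}
```

Copying each group with std::memcpy rather than dereferencing a casted pointer keeps the loop free of aliasing and alignment concerns.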

Your example can be rewritten as follows:

std::vector<float> vec;
vec.reserve(buffer.size()/sizeof(float)); // Optional (only to avoid unnecessary reallocations)

const float * tmp = reinterpret_cast<const float*>(buffer.data()); // buffer is const, so the pointer must be const too; maybe illegal (see the edit below)
std::copy(tmp, tmp + buffer.size()/sizeof(float), std::back_inserter(vec));

Edit

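The above solution does break the strict aliasing rule: reading uint8_t objects through a float* is undefined behavior, even though the reverse (inspecting a float through an unsigned char*) is allowed. In addition, bytes taken at an arbitrary offset in the buffer may not be suitably aligned for float.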

A better and safer way to perform the deserialization would be to get rid of std::copy and use std::memcpy instead (arguments are taken as void*).

We could then rewrite the solution as:

std::vector<float> vec(buffer.size()/sizeof(float));
std::memcpy(vec.data(), buffer.data(), vec.size()*sizeof(float)); // requires <cstring>; never copies more bytes than vec can hold

It is even shorter and consequently clearer.
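
To extract only one batch, as the original file-writing loop did with batchIndex and batchChunk, the same idea can be applied at an offset. This is only a sketch: the function name read_batch is hypothetical, batchChunk is assumed to be a byte count that is a multiple of sizeof(float), and the bytes are assumed to use the machine's own float representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies batch number batchIndex (batchChunk bytes long)
// out of buffer and returns its contents as floats.
std::vector<float> read_batch(const std::vector<std::uint8_t>& buffer,
                              std::size_t batchIndex,
                              std::size_t batchChunk)
{
    const std::size_t offset = batchIndex * batchChunk;
    assert(offset + batchChunk <= buffer.size()); // the batch must lie inside the buffer
    assert(batchChunk % sizeof(float) == 0);      // the batch must hold whole floats

    std::vector<float> vec(batchChunk / sizeof(float));
    if (!vec.empty())
    {
        std::memcpy(vec.data(), buffer.data() + offset, vec.size() * sizeof(float));
    }
    return vec;
}
```

With the values from the question, read_batch(buffer, 0, 4000) would return a vector of 1000 floats.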

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1