How to represent existing data as std::vector

I have to pass existing data (an unsigned char memory area of known size) to a library function expecting const std::vector<std::byte>&. Is there any way to "fool" the library function to believe that it received a vector while operating on existing data?

  1. I have data from the old legacy as a pointer and size, not as a std::vector. Legacy C code allocates memory by malloc() and provides pointer and size. Please do not suggest touching the legacy code - by the end of the phrase I'll cease to be an employee of the company.

  2. I don't want to create a temporary vector and copy the data because the memory throughput is huge (> 5 GB/sec).

  3. Placement new creates a vector, but its first bytes are used for the vector object itself. I cannot use a few bytes before the memory area; the legacy code doesn't expect that (see above: the memory area is allocated by malloc()).

  4. Changing the third-party library is out of the question. It expects const std::vector<std::byte>&, not a span, iterators, etc.

It looks like I have no choice but to go with a temporary vector, but maybe there are other ideas... I wouldn't care, but this is intensive video processing and there will be a lot of data copied for nothing.



Solution 1:[1]

Is there any way to "fool" the library function to believe that it received a vector while operating on existing data?

No.

The potential options are:

  1. Put the data in a vector in the first place.
  2. Or change the function expecting a vector to not expect a vector.
  3. Or create a vector and copy the data.

If 1. and 2. are not valid options for you, that leaves you with 3. whether you want it or not.
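
If option 3 is all that is left, the cost can at least be limited to the copy itself. A minimal sketch, assuming the legacy code hands over a const unsigned char* plus a size; the helper name fill_scratch is hypothetical and not part of the question. It reuses one vector across calls, so once its capacity has grown to the largest frame, typical implementations do not allocate again.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the question): copies the legacy buffer into
// a caller-owned scratch vector that is reused from call to call.
inline void fill_scratch(std::vector<std::byte>& scratch,
                         const unsigned char* data, std::size_t size) {
    // std::byte may be used to access any object's bytes, so this cast is allowed.
    const auto* first = reinterpret_cast<const std::byte*>(data);
    // assign() replaces the contents; when the existing capacity is large
    // enough, only the byte copy itself remains on the hot path.
    scratch.assign(first, first + size);
}
```

A caller would keep scratch alive next to the processing loop, call fill_scratch for each buffer, and then pass scratch to the library function.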

Solution 2:[2]

As the top answer mentions, this is impossible to do in standard C++. And you should not try to do it.

If you can tolerate only using libstdc++ and getting potentially stuck with a specific standard library version, it looks like you can do it. Again, you should not do this. I'm only writing this answer as it seems to be possible without UB in this specific circumstance.

It appears that the current version of libstdc++ exposes their vectors' important members as protected: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_vector.h#L422

All you need to do is inherit from std::vector (it's not forbidden), write your own constructor for setting these protected members, and write a destructor to reset the members so that the actual vector destructor does not delete your memory.

#include <vector>
#include <cstddef>

template <class T>
struct dont_use_me_in_prod : std::vector<T>
{
    // Point the base vector's libstdc++-specific members at the external buffer.
    dont_use_me_in_prod(T* data, std::size_t n) {
        this->_M_impl._M_start = data;
        this->_M_impl._M_finish = data + n;
        this->_M_impl._M_end_of_storage = this->_M_impl._M_finish;
    }

    // Reset the members so the base destructor does not free memory it never allocated.
    ~dont_use_me_in_prod() {
        this->_M_impl._M_start = nullptr;
        this->_M_impl._M_finish = nullptr;
        this->_M_impl._M_end_of_storage = nullptr;
    }
};

void innocent_function(const std::vector<int>& v);

void please_dont_do_this_in_prod(int* vals, int n) {
    dont_use_me_in_prod evil_vector(vals, n);
    innocent_function(evil_vector);
}

Note that this is not compiler, but standard library dependent, meaning that it'll work with clang as well as long as you use libstdc++ with it. But this is not conforming, so you gotta fix innocent_function somehow soon: https://godbolt.org/z/Tfcn7rdKq

Solution 3:[3]

The problem is that std::vector is not a non-owning reference type like std::string_view or std::span. std::vector owns the memory it manages: it allocates that memory and releases it. It is not designed to adopt an external buffer or to give up ownership of its own.
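
The ownership difference can be sketched as follows. This is only an illustration with hypothetical function names; it uses std::string_view because that needs only C++17, and std::span (C++20) behaves the same way with respect to ownership.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// A view stores only a pointer and a length; it never allocates or frees.
inline bool view_aliases_buffer(const char* buf, std::size_t n) {
    std::string_view view(buf, n);
    return view.data() == buf;  // same memory, nothing copied
}

// A vector allocates its own storage and copies the elements into it.
inline bool vector_aliases_buffer(const char* buf, std::size_t n) {
    std::vector<char> owned(buf, buf + n);
    return owned.data() == buf;  // false for a non-empty buffer: the data was copied
}
```

For a non-empty buffer the first function returns true and the second returns false, which is exactly why a vector cannot simply be pointed at memory it did not allocate.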

What you can do is a very dirty hack. You can create a new structure with exactly the same layout as a std::vector, fill its fields from the pointer and size you get from the external library, and then pass this struct as a std::vector const& using reinterpret_cast. It can work as long as your library does not modify the vector (I assume they do not perform const_cast on std::vector const&).

The drawback is that the code is unmaintainable. The next standard library update can crash the application if the layout of std::vector changes.

A sketch follows:

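#include <cstddef>
#include <vector>

// Assumes the three-pointer layout (begin, end, end of storage) used by
// libstdc++, libc++ and MSVC release builds. The standard does not guarantee
// this layout, and the cast below is undefined behavior.
struct FakeVector
{
  std::byte* Begin;
  std::byte* End;
  std::byte* EndOfStorage;
};
static_assert(sizeof(FakeVector) == sizeof(std::vector<std::byte>));

void doSomething(const std::vector<std::byte>& data); // the library function

void onNewData(std::byte* ptr, std::size_t size)
{
  auto vectorRef = FakeVector{ptr, ptr + size, ptr + size};
  doSomething(*reinterpret_cast<const std::vector<std::byte>*>(&vectorRef));
}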

Solution 4:[4]

Well, I've found a way that works for me. I must admit that it is not fully standard-compliant, because casting the vector results in undefined behavior, but I wouldn't expect it to fail in the foreseeable future. The idea is to use my own allocator for the vector, one that accepts the buffer from the legacy code and works on it. The problem is that std::vector<std::byte> value-initializes its elements on resize(), which zeroes the buffer. If there were a way to disable that, it would be a perfect solution, but I have not found one. So here the ugly cast comes in: from std::vector<InnerType>, where InnerType is nothing but a std::byte whose default constructor does nothing, to the std::vector<std::byte> that the library expects. Working code is shown at https://godbolt.org/z/7jME79EE9 and also here:

#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <new>      // std::bad_array_new_length
#include <vector>

struct InnerType {
    std::byte value;
    InnerType() {}
    InnerType(std::byte v) : value(v) {}
};
static_assert(sizeof(InnerType) == sizeof(std::byte));

template <class T> class AllocatorExternalBufferT {
    T* const _buffer;
    const size_t _size;
public:
    typedef T value_type;

    constexpr AllocatorExternalBufferT() = delete;
    
    constexpr AllocatorExternalBufferT(T* buf, size_t size) : _buffer(buf), _size(size) {}

    [[nodiscard]] T* allocate(std::size_t n) {
        if (n > _size / sizeof(T)) {
            throw std::bad_array_new_length();
        }
        return _buffer;
    }

    void deallocate(T*, std::size_t) noexcept {}

};

template <class T, class U> bool operator==(const AllocatorExternalBufferT <T>&, const AllocatorExternalBufferT <U>&) { return true; }
template <class T, class U> bool operator!=(const AllocatorExternalBufferT <T>&, const AllocatorExternalBufferT <U>&) { return false; }

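typedef std::vector<InnerType, AllocatorExternalBufferT<InnerType>> BufferDataVector;
// Note: InterfaceVector still carries the custom allocator, so it is a different
// type from std::vector<std::byte> with the default allocator, and a stateful
// allocator may change the object layout.
typedef std::vector<std::byte, AllocatorExternalBufferT<std::byte>> InterfaceVector;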

static void report(const InterfaceVector& vec) {
    std::cout << "size=" << vec.size()  << " capacity=" << vec.capacity() << " ";
    for(const auto& el : vec) {
        std::cout << static_cast<int>(el) << " ";
    }
    std::cout << "\n";
}

int main() {
    InnerType buffer4allocator[16] ;
    BufferDataVector v((AllocatorExternalBufferT<InnerType>(buffer4allocator, sizeof(buffer4allocator)))); // double parenthesis here for "most vexing parse" nonsense
    v.resize(sizeof(buffer4allocator));
    std::cout << "memory area kept intact after resizing vector:\n";
    report(*reinterpret_cast<InterfaceVector*>(&v));    
}

Solution 5:[5]

Yes you can do this. Not in a nice safe way but it's certainly possible.

All you need to do is create a fake std::vector that has the same ABI (memory layout) as std::vector. Then set its internal pointers to point to your data and reinterpret_cast your fake vector back to a std::vector.

I wouldn't recommend it unless you really need to do it, because any time your standard library changes its std::vector ABI (basically the field layout) it will break. Though to be fair, that is very unlikely to happen these days.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 eerorika
Solution 2 Fatih BAKIR
Solution 3
Solution 4 Ilya M
Solution 5 Timmmm