Writing directly to std::string internal buffers

I was looking for a way to stuff some data into a string across a DLL boundary. Because we use different compilers, all our DLL interfaces are simple char*.

Is there a correct way to pass a pointer into the DLL function so that it can fill the string buffer directly?

string stringToFillIn(100, '\0');
FunctionInDLL( stringToFillIn.c_str(), stringToFillIn.size() );   // definitely WRONG!
FunctionInDLL( const_cast<char*>(stringToFillIn.data()), stringToFillIn.size() );    // WRONG?
FunctionInDLL( &stringToFillIn[0], stringToFillIn.size() );       // WRONG?
stringToFillIn.resize( strlen( stringToFillIn.c_str() ) );

The one that looks most promising is &stringToFillIn[0], but is that a correct way to do this, given that you'd expect string::data() == &string[0]? It seems inconsistent.

Or is it better to swallow an extra allocation and avoid the question:

vector<char> vectorToFillIn(100);
FunctionInDLL( &vectorToFillIn[0], vectorToFillIn.size() );
string dllGaveUs( &vectorToFillIn[0] );


Solution 1:[1]

I'm not sure the standard guarantees that the characters of a std::string are stored in a single contiguous char array. The most portable way I can think of is to use a std::vector, which is guaranteed to store its data in a contiguous chunk of memory:

std::vector<char> buffer(100);
FunctionInDLL(&buffer[0], buffer.size());
std::string stringToFillIn(&buffer[0]);

This of course means the data is copied twice (once into the vector, then again into the string), which is a bit inefficient.
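A minimal, self-contained sketch of this approach. `FunctionInDLL` below is a hypothetical stand-in for the real DLL export (its exact contract is not given in the question); it is assumed to write at most `size - 1` characters plus a terminating '\0'. Building the string from an explicit range bounded by the buffer, rather than from `&buffer[0]` alone, avoids reading past the end if the callee ever forgets the terminator.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the real DLL export: copies a fixed message
// into buf, truncating it to fit and always null-terminating (if size > 0).
extern "C" void FunctionInDLL(char* buf, std::size_t size)
{
    const char msg[] = "hello from the DLL";
    if (size == 0) return;
    std::size_t n = std::min(size - 1, sizeof msg - 1);
    std::memcpy(buf, msg, n);
    buf[n] = '\0';
}

std::string ReadViaVector(std::size_t capacity)
{
    std::vector<char> buffer(capacity);            // zero-filled, contiguous
    FunctionInDLL(buffer.data(), buffer.size());
    // Stop at the first '\0', but never look past the end of the buffer.
    auto end = std::find(buffer.begin(), buffer.end(), '\0');
    return std::string(buffer.begin(), end);
}
```

For example, `ReadViaVector(6)` yields "hello", because the stub truncates to five characters plus the terminator.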

Solution 2:[2]

Update (2021): C++11 cleared this up (std::string storage is now required to be contiguous), and the concerns expressed here are no longer relevant.

After a lot more reading and digging around, I've discovered that string::c_str and string::data could legitimately return a pointer to a buffer that has nothing to do with how the string itself is stored; the string might be stored in segments, for example. Writing to these buffers has an undefined effect on the contents of the string.

Additionally, string::operator[] should not be used to get a pointer to a sequence of characters; it should only be used to access single characters, because pointer/array equivalence does not hold for string.

What makes this especially dangerous is that it can work on some implementations and then suddenly break, for no apparent reason, at some future date.

Therefore, as others have said, the only safe way to do this is to avoid writing directly into the string buffer at all: use a vector, pass a pointer to its first element, and assign the string from the vector when the DLL function returns.

Solution 3:[3]

In C++98 you should not alter the buffers returned by string::c_str() and string::data(). Also, as explained in the other answers, you should not use string::operator[] to get a pointer to a sequence of characters; it should only be used for single characters.

Starting with C++11, std::string is required to store its characters contiguously, so you can use &string[0] to access the internal buffer, provided the string has first been sized to hold the data.
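Under the C++11 guarantee, the question's &stringToFillIn[0] variant works as long as the string is sized (not merely reserved) before the call. A sketch, again with a hypothetical `FunctionInDLL` stub that is assumed to null-terminate its output; afterwards the string is shrunk to the characters actually written, just like the question's resize(strlen(c_str())) line.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical DLL stand-in: writes a null-terminated message, truncated to fit.
void FunctionInDLL(char* buf, std::size_t size)
{
    const char* msg = "written in place";
    if (size == 0) return;
    std::strncpy(buf, msg, size - 1);   // copies at most size - 1 chars
    buf[size - 1] = '\0';               // strncpy does not terminate on truncation
}

std::string ReadInPlace(std::size_t capacity)
{
    std::string s(capacity, '\0');      // size(), not just capacity(), must cover the write
    if (!s.empty())                     // an empty string has no room to write into
        FunctionInDLL(&s[0], s.size()); // contiguous storage guaranteed since C++11
    s.resize(std::strlen(s.c_str()));   // trim at the first '\0' the callee wrote
    return s;
}
```

This avoids the extra vector allocation entirely; the only cost is that the string may briefly hold more capacity than the final result needs.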

Solution 4:[4]

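Since C++11 guarantees contiguous memory, this 'hacky' method is very popular in production code. Note that the non-const overload of data() used here was only added in C++17; with C++11/14, use &stringToFillIn[0] instead:

std::string stringToFillIn(100, '\0');
FunctionInDLL(stringToFillIn.data(), stringToFillIn.size());   // C++17: data() returns char*
stringToFillIn.resize(std::strlen(stringToFillIn.c_str()));    // trim at the terminator the DLL wrote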
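A slightly more general sketch, assuming C++17 for the non-const `data()`. `FillString` and `CopyGreeting` are hypothetical names for illustration: the helper sizes the string, lets any C-style filler write into it, and trims the result. Here the filler is assumed to return how many characters it wrote (a common alternative to relying on a terminator); the real `FunctionInDLL`'s return convention is not stated in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (C++17): fill is any callable taking (char*, size_t)
// and returning how many characters it wrote, never more than the size given.
template <typename Fill>
std::string FillString(std::size_t capacity, Fill fill)
{
    std::string s(capacity, '\0');
    std::size_t written = fill(s.data(), s.size());   // non-const data() is C++17
    s.resize(std::min(written, capacity));            // keep only what was written
    return s;
}

// Hypothetical filler standing in for a DLL call that reports its length.
std::size_t CopyGreeting(char* buf, std::size_t size)
{
    const char msg[] = "hi there";
    std::size_t n = std::min(size, sizeof msg - 1);
    std::memcpy(buf, msg, n);
    return n;
}
```

Usage: `FillString(32, CopyGreeting)` produces "hi there", and `FillString(2, CopyGreeting)` produces "hi".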

Solution 5:[5]

Considering Patrick's comment, I would say it's OK, and convenient and efficient, to write directly into a std::string. I would use &s.front() to get a char*, as in this MEX example:

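#include "mex.h"
#include <cstddef>
#include <string>

void mexFunction(
    int nlhs,
    mxArray *plhs[],
    int nrhs,
    const mxArray *prhs[]
)
{
    if (nrhs < 1 || !mxIsChar(prhs[0]))
        mexErrMsgTxt("Expected a char array argument.");
    std::size_t len = mxGetN(prhs[0]);
    // Size the string (reserve() is not enough): front() requires a non-empty
    // string, and every byte written must lie within size().
    std::string ret(len + 1, '\0');
    // mxGetString writes up to len characters plus a terminating '\0'.
    mxGetString(prhs[0], &ret.front(), len + 1);
    ret.resize(len);                // drop the terminator mxGetString wrote
    mexPrintf("%s", ret.c_str());   // never pass data as the format string
}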

Solution 6:[6]

I'd not construct a std::string and ship a pointer to its internal buffer across DLL boundaries. Instead, I would use a simple char buffer (statically or dynamically allocated), and after the call to the DLL returns, let a std::string take over the result. It just feels intuitively wrong to let callees write into a class's internal buffer.

Solution 7:[7]

You can use a char buffer allocated with std::unique_ptr instead of a vector:

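// allocate a zero-initialized buffer (std::make_unique requires C++14)
auto buf = std::make_unique<char[]>(len);
// read data
FunctionInDLL(buf.get(), len);
// initialize string (assumes FunctionInDLL null-terminates within len chars)
std::string res { buf.get() };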

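However, you cannot write into a string through &str[0] or str.data() unless the string has first been sized to hold the data. In the example below str starts out empty, so writing through either pointer writes past its end (undefined behavior); in addition, str.data() only returns a non-const char* since C++17:

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::string str;            // empty: size() == 0, no room to write into
    std::stringstream ss;
    ss << "test string";
    // ss.read(&str[0], 4);     // undefined behavior: writes past the end of an empty string
    // ss.read(str.data(), 4);  // ill-formed before C++17; still UB on an empty string
    str.resize(4);              // make room for 4 characters first
    ss.read(&str[0], 4);        // fine since C++11: the storage is contiguous
    std::cout << str << '\n';   // prints "test"
}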


Solution 8:[8]

You all have already addressed the contiguity issue (before C++11, it's not guaranteed to be contiguous), so I'll just mention the allocation/deallocation point. In the past I've had issues where I allocated memory in DLLs (i.e., had a DLL return a string), which caused errors when that memory was destroyed outside the DLL. To fix this, you must ensure that your allocator and memory pool are consistent across the DLL boundary. It'll save you some debugging time ;)
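One common way to sidestep the allocator mismatch is to never let ownership of memory cross the boundary: the caller allocates and frees the buffer, and the DLL only writes into it. A sketch of the two-call "query the size, then fill" convention used by many C APIs; `GetMessageFromDLL`, `FetchMessage`, and their return convention are hypothetical, not part of the question.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical DLL export. Returns the message length (excluding '\0').
// If buf is non-null and size > length, copies the message plus '\0' into buf.
// The DLL never allocates memory that the caller would have to free.
extern "C" std::size_t GetMessageFromDLL(char* buf, std::size_t size)
{
    const char msg[] = "owned by the caller";
    const std::size_t len = sizeof msg - 1;
    if (buf != nullptr && size > len)
        std::memcpy(buf, msg, len + 1);
    return len;
}

std::string FetchMessage()
{
    std::size_t len = GetMessageFromDLL(nullptr, 0);  // 1st call: ask for the size
    std::string s(len + 1, '\0');                     // caller allocates (room for '\0')
    GetMessageFromDLL(&s[0], s.size());               // 2nd call: DLL fills caller memory
    s.resize(len);                                    // drop the terminator
    return s;
}
```

Because the std::string is created and destroyed entirely on the caller's side, it does not matter which runtime or allocator the DLL was built with.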

Solution 9:[9]

The standardized part of std::string is its API and some of its behavior, not the memory layout of the implementation.

Therefore, if you're using different compilers, you can't assume their layouts are the same, so you'll need to transport the actual data. As others have said, transport the chars and push them into a new std::string.