Passing an unmanaged buffer from C++ to C# without copying into an AllocHGlobal buffer

I have inherited maintenance of a .NET Framework C#/unmanaged C++ project and was asked to try to improve its performance. After going after the low-hanging fruit and doing some profiling, the next easy win seemed to be reducing the cost of transferring large buffers across the C#/C++ interop boundary, since the program was spending about 30% of its time on this marshaling/copying. I figured this would be simple, since C# has good interop support and the docs seemed to suggest it was possible.

After trying and failing to do this with several approaches, I made a test solution to simplify things and to make sure I understood what was happening. With the abstractions from the original code base removed, the C# code for the example looks like this:

using System;
using System.Runtime.InteropServices;
using System.Security;
using System.Diagnostics;


namespace InteropExample
{
    internal class Program
    {
        static void Main()
        {
            UnsafeNativeMethods.InitCpp();
            IntPtr buf = Marshal.AllocHGlobal(4096);
            int size = 0;
            UnsafeNativeMethods.OnGetDataBuffer(buf, ref size);
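            // IntPtr ignores the "X" format specifier on .NET Framework, so format the raw value instead
            Debug.WriteLine("C# buffer in:\t{0:X}", buf.ToInt64());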

            Marshal.FreeHGlobal(buf);
        }
        // docs say we should see "substantial performance savings" with this attribute
        [SuppressUnmanagedCodeSecurity] 
        internal static class UnsafeNativeMethods
        {
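            // The C++ exports use the default cdecl convention; DllImport defaults to StdCall on x86
            [DllImport("CppSide.dll", CallingConvention = CallingConvention.Cdecl)]
            public static extern void InitCpp();

            [DllImport("CppSide.dll", CallingConvention = CallingConvention.Cdecl)]
            public static extern void OnGetDataBuffer(IntPtr buffer, ref int size);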

        }
    }
}

The C++ side, with my notes on various pieces from the original code base and the abstractions removed:

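#include "CppSide.h"
#include <cstring>    // memcpy
#include <malloc.h>   // _aligned_malloc, _aligned_free (MSVC)
#include <memory>
#include <sstream>    // std::ostringstream
#include <windows.h>  // OutputDebugStringA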

// Way more complicated in the original: the former owner implemented their own shared_ptr
// and memory pool system, with a ton of getter, setter, and helper functions for it
class Buffer
{
public:
    size_t _size;
    size_t _usedSize;
    char* _data;

    Buffer() :
        _size(4096),
        _data(NULL)
    {
        size_t charSize = _size / sizeof(char);
        // guessing this is to get around destructors running
        // and get better SSE2 vectorization
        _data = (char*)_aligned_malloc(_size, (size_t)256);
        
        // used size is set by the user after it has requested a buffer from the buffer pool
        // for now we set it to an arbitrary value
        _usedSize = 768;

    }
    ~Buffer() {
        _aligned_free(_data);
    }
};

std::shared_ptr<Buffer> out;

extern "C" __declspec (dllexport) void InitCpp()
{
    // In the real program this starts up various threads and pipelines
    // that produce and accept the buffers; for now we simulate this
    // just by creating one outgoing buffer

    out = std::make_shared<Buffer>();
}

extern "C" __declspec (dllexport) void OnGetDataBuffer(char* buffer, int* size) {
    memcpy(buffer, out->_data, out->_usedSize);
    // size is a pointer because the C# side declares `ref int size`
    *size = static_cast<int>(out->_usedSize);
    std::ostringstream ss;
    ss << "C++ out Buffer ptr:\t" << std::hex << static_cast<void*>(buffer) << std::endl;
    OutputDebugStringA(ss.str().c_str());
    out.reset();
    // The actual code doesn't remove out explicitly,
    // but the buffer is returned to the pool soon after,
    // when the pipeline replaces it
}

When I remove the memcpy and the AllocHGlobal call, pass the _data pointer straight to the C# side, and pin the shared pointer so the buffer isn't deleted, I get an IntPtr set to zero back on the C# side. I thought it might be an issue with how I was allocating, and since Marshal.AllocHGlobal is backed by the Win32 heap allocators (its documentation names LocalAlloc, the counterpart of GlobalAlloc), I also swapped the _aligned_malloc and _aligned_free calls in Buffer for GlobalAlloc(GPTR, charSize) and GlobalFree, but I still got the same result. This left me with the current non-working example:

using System;
using System.Runtime.InteropServices;
using System.Security;
using System.Diagnostics;

namespace InteropExample
{
    internal class Program
    {
        static void Main()
        {
            UnsafeNativeMethods.InitCpp();
            IntPtr buf = IntPtr.Zero;
            int size = 0;
            UnsafeNativeMethods.OnGetDataBuffer(ref buf, ref size);
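            // IntPtr ignores the "X" format specifier on .NET Framework, so format the raw value instead
            Debug.WriteLine("C# buffer in:\t{0:X}", buf.ToInt64());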
        }

        [SuppressUnmanagedCodeSecurity]
        internal static class UnsafeNativeMethods
        {
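            // The C++ exports use the default cdecl convention; DllImport defaults to StdCall on x86
            [DllImport("CppSide.dll", CallingConvention = CallingConvention.Cdecl)]
            public static extern void InitCpp();

            [DllImport("CppSide.dll", CallingConvention = CallingConvention.Cdecl)]
            public static extern void OnGetDataBuffer(ref IntPtr buffer, ref int size);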
        }
    }
}

#include "CppSide.h"
#include <memory>
#include <windows.h>
#include <iostream>
#include <sstream>

class Buffer
{
public:
    size_t _size;
    size_t _usedSize;
    char* _data;

    Buffer() :
        _size(4096),
        _data(NULL)
    {
        size_t charSize = _size / sizeof(char);
        _data = (char *)GlobalAlloc(GPTR, charSize);

        // used size is set by the user after it has requested a buffer from the buffer pool
        // for now we set it to an arbitrary value
        _usedSize = 768;

    }
    ~Buffer() {
        GlobalFree(_data);
    }
};

std::shared_ptr<Buffer> out;

extern "C" __declspec (dllexport) void InitCpp()
{
    out = std::make_shared<Buffer>();
}

extern "C" __declspec (dllexport) void OnGetDataBuffer(char* buffer, int size) {
    buffer = out->_data;
    size = out->_usedSize;
    std::ostringstream ss;
    ss << "C++ out Buffer ptr:\t" << std::hex << static_cast<void*>(buffer) << std::endl;
    OutputDebugStringA(ss.str().c_str());
}
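The idea of pinning the shared pointer, so the buffer outlives the call while C# reads it, can be made explicit by parking an extra shared_ptr reference in a table and dropping it when the managed side says it is done. This is only a sketch under stated assumptions: AcquireDataBuffer, ReleaseDataBuffer, PooledBuffer and g_lent are hypothetical names (not from the original project), the pointer is returned as the function result rather than through a parameter, std::vector stands in for the pool-allocated storage, and a mutex is included because the real program runs several threads. On Windows the exports would also carry __declspec(dllexport).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a pooled buffer (the original uses its own pool).
struct PooledBuffer {
    std::vector<char> data;
    int usedSize;
    PooledBuffer() : data(4096), usedSize(768) {}
};

// Buffers currently lent to the managed side, keyed by their data pointer.
// Holding a shared_ptr here is what keeps ("pins") each buffer alive.
static std::map<const char*, std::shared_ptr<PooledBuffer>> g_lent;
static std::mutex g_lentMutex;

// Hands out a raw pointer and keeps the owning shared_ptr alive until
// ReleaseDataBuffer is called with the same pointer.
extern "C" const char* AcquireDataBuffer(int* size) {
    auto buf = std::make_shared<PooledBuffer>();  // real code: take one from the pool
    const char* p = buf->data.data();
    if (size != nullptr) {
        *size = buf->usedSize;
    }
    std::lock_guard<std::mutex> lock(g_lentMutex);
    g_lent[p] = buf;  // extra reference survives after `buf` goes out of scope
    return p;
}

// Drops the extra reference; the buffer is destroyed once no other
// shared_ptr (for example the pipeline) still refers to it.
extern "C" void ReleaseDataBuffer(const char* p) {
    std::lock_guard<std::mutex> lock(g_lentMutex);
    g_lent.erase(p);
}
```

On the C# side this would pair with declarations along the lines of `static extern IntPtr AcquireDataBuffer(out int size);` and `static extern void ReleaseDataBuffer(IntPtr p);`, with the release call placed in a finally block once the data has been consumed.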

After reviewing the C# interop documentation again, I could not find an example of passing a pointer to unmanaged memory back to C#, so my questions are as follows:

  1. Is it possible to pass a pointer to unmanaged memory allocated in C++ back to C#, to avoid marshaling/copying large buffers?
  2. If a buffer must be allocated on the managed side and the contents of the unmanaged buffer copied into it, is there a more efficient way to do this, and what pooling strategies can I use? I've looked at MemoryPool, but I don't understand if or how it could be used to replace the Marshal.AllocHGlobal call in the original code.


Solution 1:[1]

It turns out I had not quite got my mind back into thinking in pointers. I had forgotten that I was passing the pointer by value, so assigning to it inside OnGetDataBuffer only changed the C++ function's local copy. Changing the interop interface to use a char** on the C++ side and rewriting the code around it fixed the issue. Testing seems to indicate that both allocation methods work as well.
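A minimal sketch of that fix, assuming the single global buffer from the test solution. DataBuffer and g_out are hypothetical stand-ins for the original Buffer class and out; std::vector replaces _aligned_malloc/GlobalAlloc so the sketch builds on any platform, and on Windows the exports would also need __declspec(dllexport). The C# declaration can stay `OnGetDataBuffer(ref IntPtr buffer, ref int size)` as in the non-working example, since `ref` already passes the address of the caller's variable.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for the question's Buffer class.
struct DataBuffer {
    std::vector<char> data;
    int usedSize;
    DataBuffer() : data(4096), usedSize(768) {}
};

// Global owner, as in the test solution; keeps the buffer alive after the call returns.
static std::shared_ptr<DataBuffer> g_out;

extern "C" void InitCpp() {
    g_out = std::make_shared<DataBuffer>();
}

// C# `ref IntPtr buffer` and `ref int size` arrive as pointers to the caller's
// variables, so the callee must write through them (char** and int*)
// instead of assigning to by-value copies.
extern "C" void OnGetDataBuffer(char** buffer, int* size) {
    if (buffer == nullptr || size == nullptr) {
        return;
    }
    if (!g_out) {
        *buffer = nullptr;
        *size = 0;
        return;
    }
    *buffer = g_out->data.data();
    *size = g_out->usedSize;
}
```

The pointer handed back still belongs to the C++ side, so C# must finish reading it (for example through an UnmanagedMemoryStream or unsafe pointer access) before the buffer is released or returned to the pool.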

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 AtAWitsEnd