C++17 UTF8 std::string to std::wstring UTF32 using unicode.org code or C++ standard functions?

I am looking for a working, stable, and tested solution to the classic UTF-8 to UTF-32 conversion.

I currently have the source of Unicode.org's reference C code:

https://android.googlesource.com/platform/external/id3lib/+/master/unicode.org/ConvertUTF.c
https://android.googlesource.com/platform/external/id3lib/+/master/unicode.org/ConvertUTF.h

License: https://android.googlesource.com/platform/external/id3lib/+/master/unicode.org/readme.txt

I am using the following C++ wrapper, which interfaces with the C library code above:

std::wstring Utf8_To_wstring(const std::string& utf8string)
{
    if (utf8string.length() == 0)
    {
        return std::wstring();
    }
    size_t widesize = utf8string.length();
    if (sizeof(wchar_t) == 2)
    {
        std::wstring resultstring;
        resultstring.resize(widesize, L'\0');
        const UTF8* sourcestart = reinterpret_cast<const UTF8*>(utf8string.c_str());
        const UTF8* sourceend = sourcestart + widesize;
        UTF16* targetstart = reinterpret_cast<UTF16*>(&resultstring[0]);
        UTF16* targetend = targetstart + widesize;
        ConversionResult res = ConvertUTF8toUTF16(&sourcestart, sourceend, &targetstart, targetend, strictConversion);
        if (res != conversionOK)
        {
            return std::wstring(utf8string.begin(), utf8string.end());
        }
        *targetstart = 0;
        return std::wstring(resultstring.c_str());
    }
    else if (sizeof(wchar_t) == 4)
    {
        std::wstring resultstring;
        resultstring.resize(widesize, L'\0');
        const UTF8* sourcestart = reinterpret_cast<const UTF8*>(utf8string.c_str());
        const UTF8* sourceend = sourcestart + widesize;
        UTF32* targetstart = reinterpret_cast<UTF32*>(&resultstring[0]);
        UTF32* targetend = targetstart + widesize;
        ConversionResult res = ConvertUTF8toUTF32(&sourcestart, sourceend, &targetstart, targetend, lenientConversion);
        if (res != conversionOK)
        {
            return std::wstring(utf8string.begin(), utf8string.end());
        }
        *targetstart = 0;
        return std::wstring(resultstring.c_str());
    }
    else
    {
        assert(false);
        return L"";
    }
    return L"";
}

This code initially works, but it crashes a few strings into the conversion, so I suspect a buffer overflow somewhere in the interfacing code. (The wrapper was adapted from open-source code found on GitHub, taken from a production project...)
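
In case the problem is the final *targetstart = 0; write (that is only a guess), here is a minimal sketch of how the UTF-32 branch could remember where the target buffer began and shrink the result to the number of code units actually written, instead of appending a terminator. It uses the same ConvertUTF.h interface shown above, assumes a 4-byte wchar_t, and the function name is only illustrative:

std::wstring Utf8_To_wstring_utf32(const std::string& utf8string)
{
    if (utf8string.empty())
    {
        return std::wstring();
    }
    // Worst case: one UTF-32 code unit per UTF-8 byte.
    std::wstring resultstring(utf8string.length(), L'\0');

    const UTF8* sourcestart = reinterpret_cast<const UTF8*>(utf8string.c_str());
    const UTF8* sourceend = sourcestart + utf8string.length();

    UTF32* targetbase = reinterpret_cast<UTF32*>(&resultstring[0]);
    UTF32* targetstart = targetbase;
    UTF32* targetend = targetbase + resultstring.size();

    ConversionResult res = ConvertUTF8toUTF32(&sourcestart, sourceend, &targetstart, targetend, lenientConversion);
    if (res != conversionOK)
    {
        // Same fallback as above: widen the raw bytes.
        return std::wstring(utf8string.begin(), utf8string.end());
    }
    // Shrink to the number of code units actually produced; no terminator write needed.
    resultstring.resize(targetstart - targetbase);
    return resultstring;
}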

Does anyone have a good replacement or example code for a simple C++11/C++17 solution to convert a std::string to a std::wstring with the UTF-32 Unicode values encoded?



Solution 1:[1]

I have a working solution for UTF-8 to UTF-16 using the C++17 locale/codecvt facilities.

This seems to do the job for me: it converts to the right level of Unicode, so character codes can be extracted as ints and the correct glyph codes loaded.

#include <locale>
#include <codecvt>
#include <string>

std::wstring Utf8_To_wstring(const std::string& utf8string)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    std::wstring utf16;
    try {
        utf16 = converter.from_bytes(utf8string);
    }
    catch (const std::range_error& e)
    {
        // log / handle the exception
    }
    return utf16;
}
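
A minimal caller, just to show the intended usage (a sketch assuming the function above; note that std::wstring_convert and std::codecvt_utf8_utf16 are deprecated as of C++17, although current standard libraries still ship them):

#include <cstdio>
#include <string>

std::wstring Utf8_To_wstring(const std::string& utf8string); // defined above

int main()
{
    // In C++17, u8"..." is still a char-based, UTF-8 encoded literal.
    std::string utf8 = u8"A\u00E9\u20AC"; // A, e-acute, euro sign
    std::wstring wide = Utf8_To_wstring(utf8);

    // Print each wchar_t code unit as a numeric value (these are the
    // character codes used for glyph lookup mentioned above).
    for (wchar_t c : wide)
    {
        std::printf("U+%04X\n", static_cast<unsigned int>(c));
    }
    return 0;
}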

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: Danoli3