Conversion from UTF-8 encoded string to bytes and vice versa in C++

In C#, we have the following functions to convert a string to a UTF-8 encoded sequence of bytes and vice versa:

  1. Encoding.UTF8.GetString(Byte[])
  2. Encoding.UTF8.GetBytes(Char[]) / Encoding.UTF8.GetBytes(String)

I am trying to achieve the same thing in C++, as follows:

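#include <cstdint>
#include <string>
#include <vector>

// Copies the bytes into a std::string unchanged (no charset conversion).
std::string GetStringFromBytes(const std::vector<uint8_t>& bytes){
    return std::string(bytes.begin(), bytes.end());
}

// Copies the string's bytes into a vector unchanged (no charset conversion).
std::vector<uint8_t> GetBytesFromString(const std::string& str){
    return std::vector<uint8_t>(str.begin(), str.end());
}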

Is this approach correct? I'm assuming that the string that I'm converting is already in UTF-8 format.



Solution 1:[1]

A C# string uses UTF-16 internally, and thus requires a charset conversion to/from UTF-8.
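To make the "charset conversion" concrete, here is a minimal sketch of what encoding UTF-16 as UTF-8 involves. Utf16ToUtf8 and DemoUtf16ToUtf8 are hypothetical names, not part of any library, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch. For real code a tested library is the safer choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes UTF-16 code units as UTF-8 bytes.
// Assumption of this sketch: unpaired surrogates become U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;  // the low surrogate has been consumed
            } else {
                cp = 0xFFFD;  // high surrogate without a partner
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high surrogate
        }

        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical demo: one UTF-16 code unit for U+00E9 becomes two UTF-8 bytes.
void DemoUtf16ToUtf8() {
    assert(Utf16ToUtf8(u"A") == "A");
    assert(Utf16ToUtf8(u"\u00E9") == "\xC3\xA9");
}
```

This is the work that Encoding.UTF8.GetBytes has to do in C#. A std::string that already holds UTF-8 needs none of it.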

A C++ std::string does not use UTF-16 (std::u16string does); it is simply a sequence of char values with no encoding attached. So, if you have a UTF-8 encoded std::string, you already have the raw bytes, and you can just copy them as-is. The code you have shown does exactly that, and is fine for UTF-8 strings. Otherwise, if you have or need a std::string encoded in some other charset, you will need a charset conversion to/from UTF-8. Third-party libraries such as libiconv and ICU can handle that.
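As a quick check of the "copy them as-is" point, the sketch below round-trips a UTF-8 string through a byte vector and confirms that nothing changes. RoundTripsUnchanged and DemoRoundTrip are hypothetical names used only for illustration, and the sample text is written with explicit hex escapes so the result does not depend on the source file's encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: copies a string to bytes and back, and reports
// whether the round trip preserved every byte.
bool RoundTripsUnchanged(const std::string& utf8) {
    const std::vector<std::uint8_t> bytes(utf8.begin(), utf8.end());
    const std::string back(bytes.begin(), bytes.end());
    return bytes.size() == utf8.size() && back == utf8;
}

// Hypothetical demo: "café" is 4 characters but 5 bytes in UTF-8,
// because é (U+00E9) is encoded as 0xC3 0xA9.
void DemoRoundTrip() {
    const std::string text = "caf\xC3\xA9";
    const std::vector<std::uint8_t> bytes(text.begin(), text.end());
    assert(bytes.size() == 5);
    assert(bytes[3] == 0xC3 && bytes[4] == 0xA9);
    assert(RoundTripsUnchanged(text));
}
```

Note that size() counts bytes (UTF-8 code units), not characters, so a std::string holding UTF-8 can be longer than the text looks.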

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Remy Lebeau