'PHP how to convert UTF-8 to MUTF-8?
in PHP, how can i convert UTF-8 to MUTF-8? i am hoping i can lazily just get away with
function utf8_to_mutf8(string $utf8):string{
return str_replace("\x00", "\xC0\x80", $utf8);
}
? given that all multi-byte characters in utf-8 have the high bit set, \x00 will never occur in any multi-byte character, and the following should be completely unnecessary?
function utf8_to_mutf8(string $utf8):string{
$old = mb_internal_encoding();
mb_internal_encoding("UTF-8");
$ret = mb_ereg_replace("\x00", "\xC0\x80",$utf8);
mb_internal_encoding($old);
return $ret;
}
Solution 1:[1]
- Yes,
"\x00"will only occur for codepoint U+0000 and never for any other codepoint. Only all ASCII characters have the highest bit not set (U+0000 to U+007F = bits 00000000 to 01111111). Encountering bytes that have not the highest bit set can also be used for sychronization in case it is unclear where the next codepoint/character begins. - Yes,
str_replace()is enough, because it is already binary safe, as said in the docs. Speak: it does neither care about the input's encoding, nor about global settings.
If your goal is to have a chain of bytes that will never ever have a "\x00" in it then you should achieve it this way.
Personally I think null terminations are outdated, and following the old Java way to work around that limitation just comes with the same disadvantages of not being able to use "\x00" in the first way. You just end up to unmodify your encoding again to let all UTF-8 handling properly deal with it.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | AmigoJack |
