Proper Charset to work with Vietnamese Characters (that isn't Unicode) in PHP [duplicate]

I've searched around for a while and haven't yet found something that'll work for me. I am using a PHP form to submit data into SAP using the SAP DI API. I need to figure out which character set will actually allow me to store and work with Vietnamese characters.

UTF-8 seems to work for a lot of the characters, but ô becomes Ã´. More importantly, there are character limits, and UTF-8 breaks them: if I have a string of 30 characters, the API is told that it's more than 50. The same is true for storing in MySQL: if there's a varchar character limit, UTF-8 causes the string to go above it.

Unfortunately, when I search, UTF-8 seems to be the only thing people suggest for Vietnamese characters. If I don't encode the characters at all, they get stored as their HTML character codes. I've also tried ISO-8859-1 and converting to UCS-2 or UCS-4, and I'm really at a loss. If anyone has experience working with Vietnamese characters, your help would be greatly appreciated.

UPDATE

It appears the issue may be with my WampServer setup on Windows. Here's a bit of code that is confusing me:

$str = 'VậTCôNG';
$str1 = utf8_encode($str);
if (mb_detect_encoding($str,"UTF-8",true) == true) {
    print_r('yes');
    if ($str1 == $str) {
        print_r('yes2');
    }
}
echo $str . $str1;

This prints "yes" but not "yes2", and $str . $str1 shows as "VậTCôNGVáºTCÃ´NG" in the browser.

I have my php.ini file with:

default_charset = "utf-8"

and my httpd.conf file with:

AddDefaultCharset UTF-8

and my php file I'm running has:

header("Content-type: text/html; charset=utf-8");

So I'm now wondering: if the original string was UTF-8, why wouldn't it equal a UTF-8 encoding of itself? And why is the UTF-8 encoding returning the wrong characters? Is something wrong in the WampServer configuration?

Solution 1:^[1]

Ã´ is the "Mojibake" for ô. That is, you do have UTF-8, but something in the code mangled it.
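One way to end up with these characters, and apparently what the code in the question runs into, is converting text that is already UTF-8 as if it were ISO-8859-1. That is the conversion utf8_encode() performs (the function is deprecated as of PHP 8.2). The sketch below reproduces the effect with mb_convert_encoding() and then reverses it. It assumes the mbstring extension is loaded, and it writes the sample string as raw bytes so the result does not depend on the encoding of the source file.

```php
<?php
// "VậTCôNG" written as raw UTF-8 bytes (hex 56 e1baad 54 43 c3b4 4e 47).
$original = "V\xE1\xBA\xADTC\xC3\xB4NG";

// Treat the UTF-8 bytes as ISO-8859-1 and encode them to UTF-8 again.
// This is the same conversion that utf8_encode() applies.
$mangled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Reverse the step: map each mangled character back to a single byte.
$restored = mb_convert_encoding($mangled, 'ISO-8859-1', 'UTF-8');

echo bin2hex($original), "\n"; // bytes of the clean string
echo bin2hex($mangled), "\n";  // every byte >= 0x80 has become two bytes
var_dump($restored === $original);
```

So a string that is already UTF-8 should be left alone; running it through utf8_encode() is what makes $str1 differ from $str.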

See "Trouble with utf8 characters; what I see is not what I stored" and search it for Mojibake. It says to check these:

The bytes to be stored need to be UTF-8-encoded. Fix this.
The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. Fix this.
The column needs to be declared CHARACTER SET utf8 (or utf8mb4). Fix this.
HTML should start with <meta charset=UTF-8>.
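For the connection item in the checklist above, a minimal sketch using PDO is shown here. The helper name buildUtf8mb4Dsn, the host, the database name, and the credentials are all placeholders made up for illustration; the only real piece is the charset=utf8mb4 part of the DSN, which asks the MySQL driver to use utf8mb4 on the connection.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that requests utf8mb4
// for the connection, so INSERTs and SELECTs exchange UTF-8 bytes.
function buildUtf8mb4Dsn(string $host, string $dbname): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbname);
}

// Placeholder values, not taken from the question.
$dsn = buildUtf8mb4Dsn('localhost', 'example_db');

// Connecting is left commented out because it needs real credentials:
// $pdo = new PDO($dsn, 'user', 'password', [
//     PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
// ]);

echo $dsn, "\n";
```

With mysqli, the equivalent step would be calling set_charset('utf8mb4') on the connection object right after connecting.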

It is possible to recover the data in the database, but it depends on details not yet provided.

http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases

Each accented Vietnamese character takes 2-3 bytes in UTF-8 (unaccented ASCII letters take 1). It is unclear whether the "hard 50" is really a character limit or a byte limit.
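The difference between the two kinds of limit can be checked directly in PHP: strlen() counts bytes, while mb_strlen() counts characters. The sketch below assumes the mbstring extension is loaded; fitsLimit is a hypothetical helper, and the limit of 8 is an arbitrary value picked to show the two counts disagreeing.

```php
<?php
// "VậTCôNG" written as raw UTF-8 bytes, independent of the file's encoding.
$name = "V\xE1\xBA\xADTC\xC3\xB4NG";

$bytes = strlen($name);             // counts bytes
$chars = mb_strlen($name, 'UTF-8'); // counts characters

// Hypothetical helper: checks a string against a limit that is measured
// either in bytes or in characters.
function fitsLimit(string $text, int $limit, bool $limitIsBytes): bool
{
    $length = $limitIsBytes ? strlen($text) : mb_strlen($text, 'UTF-8');
    return $length <= $limit;
}

printf("%d bytes, %d characters\n", $bytes, $chars);
var_dump(fitsLimit($name, 8, false)); // limit counted in characters
var_dump(fitsLimit($name, 8, true));  // limit counted in bytes
```

If the API's limit turns out to be in bytes, validating with strlen() before submitting would match what the API sees.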

If you happen to have Mojibake's sibling "double encoding", then a Vietnamese character will take 4-6 bytes and feel like 2-3 characters. See "Test the data" in the first link.
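A rough way to test a string for this in PHP is to undo one round of the latin1-as-UTF-8 conversion and see whether valid multi-byte UTF-8 is still left. The function looksDoubleEncoded below is a hypothetical heuristic written for this illustration, not a library function, and it can misjudge text that legitimately contains sequences such as "Ã´". It assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical heuristic: true if the text still contains valid multi-byte
// UTF-8 after one "UTF-8 -> ISO-8859-1" step has been undone.
function looksDoubleEncoded(string $text): bool
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        return false; // not UTF-8 at all, so not double-encoded UTF-8
    }
    // Characters outside ISO-8859-1 become '?' in this conversion.
    $candidate = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

    // A double-encoded string still holds non-ASCII bytes that form valid UTF-8.
    return preg_match('/[\x80-\xFF]/', $candidate) === 1
        && mb_check_encoding($candidate, 'UTF-8');
}

$clean   = "V\xE1\xBA\xADTC\xC3\xB4NG"; // "VậTCôNG" as raw UTF-8 bytes
$mangled = mb_convert_encoding($clean, 'UTF-8', 'ISO-8859-1');

var_dump(looksDoubleEncoded($clean));
var_dump(looksDoubleEncoded($mangled));
printf("%d bytes before, %d bytes after\n", strlen($clean), strlen($mangled));
```

In the mangled copy, ậ has grown from 3 bytes to 6 and ô from 2 bytes to 4, which matches the 4-6 byte range described above.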

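An example of how to 'undo' Mojibake in MySQL: CONVERT(BINARY(CONVERT('VáºTCÃ´NG' USING latin1)) USING utf8mb4) --> 'VậTCôNG' (this can come back as 'V?TCôNG' if the invisible soft hyphen, U+00AD, that follows º was lost when the string was copied)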

"Double encoding" is sort of like Mojibake twice. That is, one side treats the text as latin1 and the other as UTF-8, but the conversion happens twice.

VậTCôNG, as UTF-8, is hex 56e1baad5443c3b44e47. If that hex is treated as character set cp850 or keybcs2, the string is Vß?¡TC??NG.

Solution 2:^[2]

Change it to VISCII.

Input: ô 
Output: ô

You can test it at Charset converter.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Community
Solution 2	r0xette
