Special French characters in HTML

French characters in HTML still display incorrectly even with the utf-8 charset. I have a small sample page at ShopAndBind.com/Sample.asp with META HTTP-EQUIV='Content-Type' CONTENT='text/html;charset=utf-8' that still does not display "Véhicules Terrestres à Moteur" correctly, whether the text is in the page source or loaded from a MySQL database. It displays fine everywhere else. I'm using Visual InterDev 6.0 from Visual Studio 2008 for development. The text displays correctly in NotePad and Kedit. The hex values in the file are 'E9' and 'E0' for é and à, respectively.



Solution 1:[1]

The page http://shopandbind.com/Sample.asp is served with HTTP headers that do not specify a character encoding, and the data does not start with a BOM, but it contains a meta tag that specifies UTF-8 as the character encoding. However, the data contains bytes that are invalid in UTF-8. This explains the failure.

The data is in fact in ISO-8859-1 (or compatible) encoding, as you can see by manually selecting that encoding (often under the name “Western European”) in the View → Encoding menu of your browser. Bytes E0 and E9 denote à and é in ISO-8859-1, but definitely not in UTF-8, where a lone E0 or E9 byte followed by an ASCII letter is an invalid sequence.
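To check this outside a browser, a small Python sketch can decode the same bytes both ways. The byte string below is a hand-built stand-in for the page content (an assumption, not the actual file), and `try_decode` is a hypothetical helper introduced only for illustration.

```python
# Stand-in for the bytes in the file: E9 for é and E0 for à (ISO-8859-1).
raw = b"V\xe9hicules Terrestres \xe0 Moteur"


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short message if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"invalid {encoding}: {exc.reason}"


as_latin1 = try_decode(raw, "iso-8859-1")  # decodes cleanly
as_utf8 = try_decode(raw, "utf-8")         # fails: E9 is not followed by continuation bytes

print(as_latin1)
print(as_utf8)

# For comparison, the UTF-8 encodings of the same letters are two bytes each.
print("é".encode("utf-8").hex(), "à".encode("utf-8").hex())
```

In UTF-8, é and à are encoded as C3 A9 and C3 A0, so a file that contains single E9 and E0 bytes for these letters cannot be valid UTF-8.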

Thus, the minimal fix is to replace UTF-8 with ISO-8859-1 in the meta tag. A better fix might be to make the process that produces the HTML file generate UTF-8 encoded data.
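The second option amounts to re-encoding the existing bytes. A minimal sketch in Python, assuming the data really is ISO-8859-1 throughout; `latin1_to_utf8` is a hypothetical helper name, and in practice the editor or the ASP code producing the page would need to do the equivalent step.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret ISO-8859-1 bytes as text, then encode that text as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# Same stand-in bytes as in the question: E9 for é, E0 for à.
original = b"V\xe9hicules Terrestres \xe0 Moteur"
converted = latin1_to_utf8(original)

print(converted.hex(" "))
print(converted.decode("utf-8"))
```

After conversion, the charset=utf-8 declaration in the meta tag matches the bytes actually served. Running the conversion twice would corrupt the text, so it should be applied only to data known to still be in ISO-8859-1.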

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jukka K. Korpela