'Microsoft.ACE.OLEDB text csv file UTF-8 with BOM - Read problem

I have an application in which the user can query data in a text (e.g. .csv) file with ad hoc SQL SELECT statements. Using ODBC drivers or OLEDB providers (including Microsoft.ACE.OLEDB.16.0) to read the files, everything works great except reading UTF-8 with BOM encoded files. I specify CharacterSet=65001 in the Extended Properties. It does correctly read the encoded Unicode text from the file. But it doesn't recognize the 3 byte BOM (EFBBBF in hex) at the start of the file. It assumes it is just another Unicode character, which it is: U+FEFF (ZWNBS, an unusual character, correctly encoded as 3 bytes in the file - my system cannot display it so a ? is shown). (If reading file as ANSI/ISO-8859-1 and not UTF-8/65001 then the BOM is displayed as ). The column names in the .csv file are in the 1st row of the text file. So the BOM (U+FEFF) gets prepended to the 1st column name (i.e. 1st column gets renamed) - messing things up. My HACK solution is, if this UTF-8 file does have a BOM, then, when displaying data (e.g. showing results of select * from file1, including the column names), remove the BOM from the 1st column name. And then add it back when querying for the column. Is there a better solution?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source