List of encodings that Node.js supports
I need to read a file which is encoded with ISO-8859-1 (also called latin1), something like this:
var file_contents = fs.readFileSync("test_data.html", "latin1");
However, Node complains about "latin1" or "ISO-8859-1" not being a valid encoding ("Error: Unknown encoding").
What encodings does readFileSync accept?
Solution 1:[1]
The list of encodings that node supports natively is rather short:
- ascii
- base64
- base64url (Node v14+)
- hex
- ucs2/ucs-2/utf16le/utf-16le
- utf8/utf-8
- binary/latin1 (ISO-8859-1; the latin1 name is only accepted in Node 6.4.0+)
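To check what a particular Node installation accepts, Buffer.isEncoding() can be probed at runtime. The sketch below is only illustrative: the candidate list and the supportedEncodings helper name are assumptions, not part of the original answer.

```javascript
// Candidate names to probe; this list is illustrative, not exhaustive.
const candidates = [
  'ascii', 'base64', 'base64url', 'hex',
  'ucs2', 'ucs-2', 'utf16le', 'utf-16le',
  'utf8', 'utf-8', 'binary', 'latin1',
  'iso-8859-1', 'windows-1252',
];

// Hypothetical helper: keep only the names this Node version accepts.
// Buffer.isEncoding() returns true for supported encoding names.
function supportedEncodings(names) {
  return names.filter((name) => Buffer.isEncoding(name));
}

console.log(supportedEncodings(candidates));
```

Names such as 'iso-8859-1' are filtered out even on recent versions, which matches the "Unknown encoding" error in the question.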
If you are using a version older than 6.4.0, or don't want to deal with non-Unicode encodings yourself, you can recode the file contents:
Use iconv-lite to recode files:
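var iconvlite = require('iconv-lite');
var fs = require('fs');

function readFileSync_encoding(filename, encoding) {
    // No encoding argument, so readFileSync returns a raw Buffer.
    var content = fs.readFileSync(filename);
    // Decode the bytes from the given encoding into a JavaScript string.
    return iconvlite.decode(content, encoding);
}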
Alternatively, use iconv:
var Iconv = require('iconv').Iconv;
var fs = require('fs');

function readFileSync_encoding(filename, encoding) {
    var content = fs.readFileSync(filename);
    // Convert the raw bytes from the source encoding to UTF-8.
    var iconv = new Iconv(encoding, 'UTF-8');
    var buffer = iconv.convert(content);
    return buffer.toString('utf8');
}
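For ISO-8859-1 specifically, a library may not be needed at all, because every byte value maps to the Unicode code point with the same number. A minimal dependency-free sketch follows; decodeLatin1 is a hypothetical helper name, and it assumes the data really is ISO-8859-1 rather than Windows-1252.

```javascript
// Hypothetical helper: decode ISO-8859-1 bytes without any dependency.
// Each byte 0x00-0xFF becomes the code point U+0000-U+00FF of the same value.
function decodeLatin1(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}
```

It could be applied to the Buffer returned by fs.readFileSync(filename) when no encoding is passed.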
Solution 2:[2]
The encodings are spelled out in the buffer documentation.
Buffers and character encodings:
Character Encodings
- utf8: Multi-byte encoded Unicode characters. Many web pages and other document formats use UTF-8. This is the default character encoding.
- utf16le: Multi-byte encoded Unicode characters. Unlike utf8, each character in the string will be encoded using either 2 or 4 bytes.
- latin1: Latin-1 stands for ISO-8859-1. This character encoding only supports the Unicode characters from U+0000 to U+00FF.
Binary-to-Text Encodings
- base64: Base64 encoding. When creating a Buffer from a string, this encoding will also correctly accept "URL and Filename Safe Alphabet" as specified in RFC 4648, Section 5.
- base64url (Node v14+): base64url encoding as specified in RFC 4648, Section 5. When creating a Buffer from a string, this encoding will also correctly accept regular base64-encoded strings. When encoding a Buffer to a string, this encoding will omit padding.
- hex: Encode each byte as two hexadecimal characters.
Legacy Character Encodings
- ascii: For 7-bit ASCII data only. Generally, there should be no reason to use this encoding, as 'utf8' (or, if the data is known to always be ASCII-only, 'latin1') will be a better choice when encoding or decoding ASCII-only text.
- binary: Alias for 'latin1'.
- ucs2: Alias of 'utf16le'.
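A few of the properties described above can be observed directly on small inputs. This is only a sketch: the variable names are illustrative, and the base64url line assumes a Node version that includes that encoding.

```javascript
// utf16le: 2 bytes for a BMP character, 4 for one outside the BMP.
const bmpBytes = Buffer.from('a', 'utf16le').length;
const astralBytes = Buffer.from('\u{1F600}', 'utf16le').length;

// latin1: U+20AC (euro sign) is above U+00FF, so it cannot be represented;
// only the low byte of the code unit is kept.
const euro = Buffer.from('\u20ac', 'latin1');

// base64 pads with '=', base64url omits the padding.
const padded = Buffer.from('a').toString('base64');
const unpadded = Buffer.from('a').toString('base64url');

// hex: two hexadecimal characters per byte.
const hex = Buffer.from('hi').toString('hex');

console.log({ bmpBytes, astralBytes, euro, padded, unpadded, hex });
```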
Solution 3:[3]
Starting at v12, Node supports these encoding values:
- ascii
- base64
- hex
- latin1
- ucs2
- utf16le
- utf8
Node v14 and later add the base64url encoding.
Use this shell script to hackily extract the supported encodings from Node versions 12 through 17:
for v in {12..17}; do echo v$v; curl -L https://raw.githubusercontent.com/nodejs/node/v$v.x/lib/buffer.js --silent | grep -A 100000 'const encodingOps' | grep -B 100000 -E '^}' -m 1 | grep -Eo '^ [^ :}]+' | grep -Eo '[^ ]+' | sort; echo ---; done
Solution 4:[4]
If the above solution does not work for you, it may be possible to obtain the same result with pure Node.js code, as described after the log below. The above did not work for me: running 'npm install iconv' on OS X failed with a compilation error:
npm install iconv
npm WARN package.json [email protected] No README.md file found!
npm http GET https://registry.npmjs.org/iconv
npm http 200 https://registry.npmjs.org/iconv
npm http GET https://registry.npmjs.org/iconv/-/iconv-2.0.4.tgz
npm http 200 https://registry.npmjs.org/iconv/-/iconv-2.0.4.tgz
> [email protected] install /Users/markboyd/git/portal/app/node_modules/iconv
> node-gyp rebuild
gyp http GET http://nodejs.org/dist/v0.10.1/node-v0.10.1.tar.gz
gyp http 200 http://nodejs.org/dist/v0.10.1/node-v0.10.1.tar.gz
xcode-select: Error: No Xcode is selected. Use xcode-select -switch <path-to-xcode>, or see the xcode-select manpage (man xcode-select) for further information.
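fs.readFileSync() returns a Buffer if no encoding is specified, and Buffer has a toString() method that decodes the bytes as UTF-8 if no encoding is specified, giving you the file's contents as a string. Note that this only yields the right characters when the file's bytes are also valid UTF-8 (plain ASCII, for example). See the Node.js documentation. This worked for me.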
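toString() also accepts an encoding name, so on Node versions that know 'latin1' the Buffer can be decoded explicitly rather than as UTF-8. A minimal sketch, where readLatin1FileSync is a hypothetical helper name and the file is assumed to be ISO-8859-1:

```javascript
var fs = require('fs');

// Hypothetical helper: read a file whose bytes are ISO-8859-1.
function readLatin1FileSync(filename) {
  // No encoding argument, so a Buffer is returned.
  var buffer = fs.readFileSync(filename);
  // On Node older than 6.4.0, 'binary' is the accepted name instead.
  return buffer.toString('latin1');
}

// For comparison: byte 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8
// on its own, so the default decoding yields a replacement character.
var sample = Buffer.from([0xe9]);
console.log(sample.toString('latin1') === '\u00e9');
console.log(sample.toString() === '\u00e9');
```

For the file in the question, the call would be readLatin1FileSync("test_data.html").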
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Codebling |
| Solution 2 | Codebling |
| Solution 3 | |
| Solution 4 | markrboyd |
