QR Code encoding (ISO 8859-1 vs "JIS8" vs UTF-8; ISO 18004:2000/2015 compatibility; encoding of backslash)
I used several online QR Code generators to encode
"\ö/" (3 characters: U+005C, U+00F6, U+002F). I
verified the resulting QR codes with the Android app
"QR & Barcode Scanner" and with
"https://zxing.org/w/decode.jspx", and I inspected the
raw bytes reported by "https://zxing.org/w/decode.jspx".
The results, and the questions I have about each of them,
follow:
    0100  00000100  01011100  11000011  10110110  00101111  ...
    8bit  length 4  0x5C      0xC3      0xB6      0x2F      zeros and padding
                    \         UTF-8 for "ö"       /
- Why does this work (decode as U+005C, U+00F6, U+002F)?
- Is 0x5C mapped to the Yen symbol in ISO 18004:2000 (as in "JIS8")?
- Would mapping 0x5C to the Yen symbol not be incompatible with ISO 18004:2015 (using ISO 8859-1, mapping 0x5C to the backslash)?
- Why isn't 0xC3 interpreted with ISO 8859-1 (according to ISO 18004:2015) as "Ã" (U+00C3) and 0xB6 as "¶" (U+00B6)?
- Why aren't they interpreted with "JIS8" (according to ISO 18004:2000) as "ﾃ" (U+FF83) and "ｶ" (U+FF76)?
- Why does ISO 18004:2015 claim that "Symbols complying with the requirements for QR Code Model 2, as defined in ISO/IEC 18004:2000, are readable with equipment complying with this International Standard" and "QR Code Model 2 symbols are fully compatible with QR Code reading systems"?
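To make the three candidate interpretations in the questions above concrete, here is a minimal Python sketch that decodes the same four payload bytes as UTF-8, as ISO 8859-1, and as "JIS8". The `decode_jis8` helper is hand-written for illustration (it is not a library function) and assumes that "JIS8" means JIS X 0201, where 0x5C is the Yen sign, 0x7E is the overline, and 0xA1 to 0xDF are halfwidth katakana.

```python
def decode_jis8(data: bytes) -> str:
    """Hand-written JIS X 0201 ("JIS8") decoder, for illustration only."""
    out = []
    for b in data:
        if b == 0x5C:
            out.append("\u00a5")         # YEN SIGN replaces the backslash
        elif b == 0x7E:
            out.append("\u203e")         # OVERLINE replaces the tilde
        elif b < 0x80:
            out.append(chr(b))           # remaining positions match ASCII
        elif 0xA1 <= b <= 0xDF:
            out.append(chr(b + 0xFEC0))  # halfwidth katakana, U+FF61..U+FF9F
        else:
            out.append("\ufffd")         # unassigned in JIS X 0201
    return "".join(out)


# Payload bytes of the first symbol, as reported by the decoder.
raw = bytes([0x5C, 0xC3, 0xB6, 0x2F])

interpretations = {
    "utf-8": raw.decode("utf-8"),
    "iso-8859-1": raw.decode("iso-8859-1"),
    "jis8": decode_jis8(raw),
}

for name, text in interpretations.items():
    print(name, [f"U+{ord(ch):04X}" for ch in text])
```

Only the UTF-8 interpretation yields the three characters that were originally entered; the other two yield four characters each.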
    0111  00011010  0100  00000100  01011100  11000011  10110110  00101111  ...
    ECI   26:UTF-8  8bit  length 4  0x5C      0xC3      0xB6      0x2F      zeros and padding
- Why does this work (decode as U+005C, U+00F6, U+002F)?
- Why is the backslash (U+005C) not doubled?
- Don't ISO 18004:2015 and ISO 18004:2000 explicitly say: "Where 5C<sub>HEX</sub> appears as true data it shall be doubled in the data string before encoding in symbols to which the ECI protocol applies"?
- What does this mean in ISO 18004:2015: "When a single occurrence of 5C<sub>HEX</sub> is encountered in the input to the decoder, an ECI indicator is inserted followed by the ECI Designator. When a doubled 5C<sub>HEX</sub> is encountered, it is encoded as two 5C<sub>HEX</sub>"?
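The bit streams shown so far can be split into their fields mechanically. The following is a rough Python sketch, not a full QR decoder: `parse_stream` is a hypothetical helper that assumes an 8-bit character count (as used by the smaller symbol versions), a one-byte ECI designator, and at most one ECI header followed by a single 8-bit byte segment. Padding and error correction are ignored.

```python
def parse_stream(bits: str):
    """Split a QR data bit stream into (eci_designator, payload_bytes).

    Illustrative only: handles an optional one-byte ECI header followed by
    exactly one byte-mode segment with an 8-bit character count.
    """
    bits = bits.replace(" ", "")
    pos = 0

    def take(n: int) -> int:
        # Read the next n bits as an unsigned integer.
        nonlocal pos
        value = int(bits[pos:pos + n], 2)
        pos += n
        return value

    eci = None
    mode = take(4)
    if mode == 0b0111:       # ECI mode indicator
        eci = take(8)        # assumption: designator fits in one byte
        mode = take(4)
    if mode != 0b0100:       # 8-bit byte mode indicator
        raise ValueError("only byte mode is handled in this sketch")
    count = take(8)          # assumption: 8-bit character count
    payload = bytes(take(8) for _ in range(count))
    return eci, payload


plain = parse_stream("0100 00000100 01011100 11000011 10110110 00101111")
with_eci = parse_stream(
    "0111 00011010 0100 00000100 01011100 11000011 10110110 00101111"
)
print(plain, with_eci)
```

Both streams carry the same four payload bytes; the second one only adds the ECI designator 26 in front of the byte segment.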
    0111  00011010  0100  00000101  01011100  01011100  11000011  10110110  00101111  ...
    ECI   26:UTF-8  8bit  length 5  0x5C      0x5C      0xC3      0xB6      0x2F      zeros and padding
- Why does this not work (decodes as U+005C, U+005C, U+00F6, U+002F)?
- Shouldn't backslashes be doubled (see above)?
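For comparison, here is a sketch of one possible reading of the doubling rule quoted above, in which the doubling applies only to the data string handed to the encoder and not to the bytes stored in the symbol. This is an assumption for illustration, not a confirmed interpretation of the standard. The helper `split_eci_data_string` is hypothetical, and the convention that a backslash followed by six digits denotes an ECI designator is likewise assumed here rather than taken from the passages quoted in this question.

```python
import re


def split_eci_data_string(data: str):
    """Hypothetical helper: split an escaped data string into (eci, text) pairs.

    Assumed conventions (not confirmed by the quoted passages):
      - a backslash followed by six digits selects an ECI designator
      - a doubled backslash stands for one literal backslash
      - any other lone backslash is kept as a literal character
    """
    segments = []
    eci = None
    buf = []
    i = 0
    while i < len(data):
        if data.startswith("\\\\", i):
            buf.append("\\")                 # collapse the doubled backslash
            i += 2
        elif re.match(r"\\\d{6}", data[i:]):
            if buf:                          # close the segment in progress
                segments.append((eci, "".join(buf)))
                buf = []
            eci = int(data[i + 1:i + 7])     # e.g. "000026" -> 26
            i += 7
        else:
            buf.append(data[i])
            i += 1
    if buf:
        segments.append((eci, "".join(buf)))
    return segments


# Escaped data string: ECI 26, then a doubled backslash, "ö" and "/".
segments = split_eci_data_string(r"\000026\\ö/")
payload = segments[0][1].encode("utf-8")
print(segments, payload.hex(" "))
```

Under this reading the symbol would contain a single 0x5C byte, which matches the second bit stream rather than the third.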
To me the most important of the above questions: (How) Can a backslash be encoded in a way that conforms to the standard and that allows reliable decoding?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
