Unescaping UTF-8 characters from file (InputStream)
I am trying to unescape Unicode escape sequences like "\u00f6" into the characters they represent.
E.g. a file containing "Aalk\u00f6rben" should become "Aalkörben".
val tmp = text.toByteArray(Charsets.UTF_8)
val escaped = tmp.decodeToString()
// or val escaped = tmp.toString(Charsets.UTF_8)
When I set the string manually to "Aalk\u00f6rben", this works fine. However, when the string is read from the file, it is interpreted as "Aalk\\u00f6rben", with the backslash escaped (two backslashes), and the unescaping fails.
Is there any way to convince Kotlin to convert the special characters? I would rather not use external libraries such as those from Apache.
Solution 1:[1]
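I do not know how you read the file, but most probably ...\u00f6... is read as six separate characters (a backslash, u, 0, 0, f, 6), because Unicode escapes are only resolved by the compiler inside string literals. The debugger then shows the backslash in its escaped form. You could check this in the debugger.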
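To make the difference concrete, here is a minimal sketch. The helper name roundTripThroughFile and the temp file are made up for this example and are not from the question; the point is only to compare a compiled literal with the same text read back from disk.

```kotlin
import java.io.File

// Hypothetical helper: writes the given text to a temp file and reads it back.
fun roundTripThroughFile(content: String): String {
    val file = File.createTempFile("escaped", ".txt")
    try {
        file.writeText(content, Charsets.UTF_8)
        return file.readText(Charsets.UTF_8)
    } finally {
        file.delete()
    }
}

fun main() {
    // In source code the compiler resolves \u00f6 to the single character ö.
    val literal = "Aalk\u00f6rben"
    // "\\u00f6" in source is a backslash followed by u00f6, which is what the file holds.
    val fromFile = roundTripThroughFile("Aalk\\u00f6rben")

    println(literal.length)          // 9
    println(fromFile.length)         // 14
    println(fromFile.contains('\\')) // true
}
```

The string read from the file holds a single backslash; the doubled backslash only appears when the value is displayed or written as a Kotlin literal.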
So my assumption is that in memory you have "Aalk\\u00f6rben". Try this replace:
val result = text
    .replace("\\u00f6", "\u00f6") // a literal backslash followed by "u00f6" becomes ö
    .toByteArray(Charsets.UTF_8)
    .decodeToString()
Edit: this should replace all escape sequences of the form \uXXXX (four hex digits):
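import java.util.regex.Matcher
import java.util.regex.Pattern

// Replace each \uXXXX sequence with the character it encodes (the lambda overload needs Java 9+).
val text = Pattern
    .compile("\\\\u([0-9A-Fa-f]{4})")
    .matcher(file.readText())
    .replaceAll { m -> Matcher.quoteReplacement(m.group(1).toInt(16).toChar().toString()) }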
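The same idea can also be sketched with Kotlin's own Regex class, which avoids the java.util.regex types. The function name unescapeUnicode is an assumption for this example, not something from the question or the answer.

```kotlin
// Matches a backslash, the letter u, and exactly four hex digits.
private val UNICODE_ESCAPE = Regex("""\\u([0-9A-Fa-f]{4})""")

// Hypothetical helper: replaces every \uXXXX sequence with the UTF-16 code unit it encodes.
fun unescapeUnicode(text: String): String =
    UNICODE_ESCAPE.replace(text) { match ->
        match.groupValues[1].toInt(16).toChar().toString()
    }

fun main() {
    // "\\u00f6" in source is the six characters \u00f6, as they would arrive from a file.
    println(unescapeUnicode("Aalk\\u00f6rben")) // Aalkörben
}
```

Regex.replace with a lambda inserts the returned text literally, so a decoded backslash or dollar sign needs no extra quoting here. Characters outside the Basic Multilingual Plane are written as two consecutive \uXXXX escapes (a surrogate pair), and replacing each half separately still yields the correct string.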
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
