Java Transformer not writing Unicode symbol in CDATA section
My goal is to write a String containing XML data into an XML file.
But for some reason, certain Unicode characters are not written into the CDATA sections properly and end up outside of them instead.
For example:
When the received string contains ...<TAG><![CDATA[Save 💾 Test]]></TAG>..., the wanted content of the file is:
<TAG><![CDATA[Save 💾 Test]]></TAG>
But after transforming it turns out as:
<TAG><![CDATA[Save ]]>💾<![CDATA[ Test]]></TAG>
This causes problems when the file is read back in.
Here is the code for the function:
public static void saveFile(String fileName, String xmlData) throws Exception {
    OutputStream out = null;
    Writer writer = null;
    try {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        InputStream iStream = new ByteArrayInputStream(xmlData.getBytes(Charset.forName("UTF-8")));
        Document document = dBuilder.parse(iStream);
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, document.getXmlEncoding());
        out = new FileOutputStream(new File(fileName));
        writer = new OutputStreamWriter(out, document.getXmlEncoding());
        transformer.transform(new DOMSource(document), new StreamResult(writer));
    } catch (Exception e) {
        e.printStackTrace();
        throw new Exception("Could not export XML-File!", e);
    } finally {
        CommonUtil.close(writer, out);
    }
}
For reference, the corresponding node in the document object also contains the whole text:
[[#cdata-section: Speichern 💾 Test]]
Am I missing something or has anyone run into a similar problem before?
Solution 1:[1]
Thanks to everybody who commented on the question (@VGR and @Codo).
Sadly, setCoalescing didn't quite do the job for what I wanted to do.
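For anyone who wants to try the setCoalescing suggestion anyway, here is a minimal sketch of what it does at parse time, using only the JDK. The class and method names (CoalescingSketch, firstChildOf) are made up for illustration. With coalescing enabled, the parser hands back the CDATA content as an ordinary text node, so the CDATA markup itself is no longer part of the document, which is presumably why it was not enough when the section has to survive in the output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of the original question's code.
class CoalescingSketch {

    // Parses the XML string and returns the first child of the root element.
    static Node firstChildOf(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // true: CDATA sections are converted to text nodes by the parser
            factory.setCoalescing(coalescing);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getFirstChild();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // \uD83D\uDCBE is the surrogate pair for the floppy disk symbol
        String xml = "<TAG><![CDATA[Save \uD83D\uDCBE Test]]></TAG>";

        Node asCdata = firstChildOf(xml, false);
        Node asText = firstChildOf(xml, true);

        System.out.println(asCdata.getNodeType() == Node.CDATA_SECTION_NODE);
        System.out.println(asText.getNodeType() == Node.TEXT_NODE);
        System.out.println(asText.getNodeValue());
    }
}
```

The text content is the same in both cases; only the node type differs, so a serializer working on the coalesced document has no CDATA section left to write.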
Using dom4j helped in the end: I was able to use XMLWriter with an OutputFormat to write the String in the formatted way, with the text "staying" inside the CDATA section. Of course, this doesn't make the XML file valid, since the section contains surrogate blocks, as VGR mentioned, and it doesn't make the solution a good one, but it does what is needed.
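To make the "surrogate blocks" remark a bit more concrete, here is a small JDK-only sketch showing how the example text is stored in a Java String. The class and method names (SurrogateSketch, codePoints, hasSupplementary) are hypothetical; the only assumption is that the symbol in question is U+1F4BE, as in the example above.

```java
// Hypothetical helper class for illustration only.
class SurrogateSketch {

    // Number of Unicode code points, as opposed to UTF-16 chars, in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // True if s contains at least one character outside the Basic Multilingual Plane.
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        // \uD83D\uDCBE is how Java stores U+1F4BE: one high and one low surrogate
        String text = "Save \uD83D\uDCBE Test";

        System.out.println(text.length());                              // 12 chars
        System.out.println(codePoints(text));                           // 11 code points
        System.out.println(Character.isHighSurrogate(text.charAt(5)));  // true
        System.out.println(Character.isLowSurrogate(text.charAt(6)));   // true
        System.out.println(Integer.toHexString(text.codePointAt(5)));   // 1f4be
    }
}
```

A check like hasSupplementary could be used to spot the strings that are affected before they are written, for example to log which CDATA sections contain such characters.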
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sperli |
