For performance, many developers build XML content with a plain StringBuffer or StringBuilder, appending values between start and end tags. This uses less memory and less computation than generating the XML through an XML library API. But many developers are unaware of a side effect this approach can have.
For example, suppose you build the XML content by appending strings and write it to a file as follows:
StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
xml.append("<envelope>");
xml.append(String.format("<header>%s</header>", "Important"));
xml.append(String.format("<body>%s</body>", "Hello, World!"));
xml.append("</envelope>");
BufferedWriter bw = null;
try {
    // FileWriter encodes with the platform default character set,
    // not necessarily the UTF-8 declared in the XML header.
    bw = new BufferedWriter(new FileWriter("message.xml"));
    bw.write(xml.toString());
} catch(Exception e) {
    e.printStackTrace();
} finally {
    try {
        if(bw != null) {
            bw.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
The above code works only as long as every character in the XML content is encoded by the default character set of the operating system running the code to the same bytes as UTF-8, which in practice usually means plain ASCII. FileWriter, as used here, encodes with that default character set, while the XML declaration claims the file is UTF-8. If the default character set is not UTF-8, then as soon as the content contains something like "CAFÉ", the bytes in the file no longer match the declared encoding: the XML is corrupted, web browsers cannot open it, and XML parsing APIs cannot parse it.
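To make the mismatch concrete, here is a minimal sketch that encodes "CAFÉ" in two character sets and compares the bytes. ISO-8859-1 is only an assumed stand-in for a non-UTF-8 platform default, and the class name EncodingMismatchDemo and the helper hex are hypothetical names used for illustration.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "CAFÉ", written with an escape so the source file's own encoding does not matter.
        String text = "CAF\u00C9";

        // ISO-8859-1 stands in for a non-UTF-8 default character set (an assumption).
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1: " + hex(latin1)); // 43 41 46 C9
        System.out.println("UTF-8:      " + hex(utf8));   // 43 41 46 C3 89

        // A reader that trusts the encoding="UTF-8" declaration decodes the
        // ISO-8859-1 bytes as UTF-8 and does not get the original text back.
        String misread = new String(latin1, StandardCharsets.UTF_8);
        System.out.println(misread.equals(text)); // false
    }
}
```

The lone byte C9 is not a complete UTF-8 sequence, which is why a strict XML parser reports an error instead of guessing.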
To fix this, you can either make sure the content never contains characters that the default character set of the operating system encodes differently from UTF-8, or, if you need UTF-8 regardless of the platform, change the writing code as follows:
try {
    FileOutputStream fos = new FileOutputStream("message.xml");
    // Name the encoding explicitly instead of relying on the platform default.
    bw = new BufferedWriter(new OutputStreamWriter(fos, "UTF-8"));
    bw.write(xml.toString());
} catch(Exception e) {
    e.printStackTrace();
} finally {
    try {
        if(bw != null) {
            bw.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Here we explicitly state, through OutputStreamWriter, that the content must be written in the UTF-8 character set, regardless of the platform default.
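On Java 7 or later, the same idea can be written more compactly. The following is a sketch rather than the only way to do it: it assumes java.nio.file is available, and the class name Utf8XmlWriter and the method writeUtf8 are hypothetical names chosen for illustration. Files.newBufferedWriter takes the character set as an argument, and try-with-resources closes the writer even when writing fails.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8XmlWriter {
    // Writes the given XML text to the target file, always encoded as UTF-8.
    static void writeUtf8(Path target, String xml) throws IOException {
        try (BufferedWriter bw = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            bw.write(xml);
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        xml.append("<envelope>");
        xml.append(String.format("<header>%s</header>", "Important"));
        xml.append(String.format("<body>%s</body>", "CAF\u00C9")); // "CAFÉ"
        xml.append("</envelope>");

        Path target = Paths.get("message.xml");
        writeUtf8(target, xml.toString());

        // Read the file back as UTF-8 to confirm the text survived the round trip.
        String roundTrip = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(xml.toString())); // true
    }
}
```

Passing a StandardCharsets constant instead of an encoding name also avoids the checked UnsupportedEncodingException that the string-based OutputStreamWriter constructor can throw.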