Whitespace problem in XML

I'm having a problem with the IBM SAX parser (which is Xerces).

We have data coming from a client that looks something like this:
<tag>
A string of about 300 characters containing an embedded null
</tag>

When our Java servlet parses this, our characters() method does the following:
public void characters(char[] ch, int start, int length)
{
    String s = new String(ch, start, length);
    .
    .
    .
We get a SAX parser error at the column containing the null (Unicode 0x0).

What is the best way to handle this? I don't think ignorableWhitespace() solves the problem, since that seems to apply only to element content that is entirely whitespace. Any help is appreciated.

Your data is not well-formed XML, and the parser is correct to complain about it. See, e.g., the section "Characters and escaping" at wikipedia.org/wiki/XML:

"&#0;" is not permitted, however, as the null character is one of the control characters excluded from XML, even when using a numeric character reference.[10] An alternative encoding mechanism such as Base64 is needed to represent such characters.
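
As a rough illustration of the Base64 suggestion, here is a minimal sketch. It assumes Java 8 or later (for java.util.Base64) and that both the client and the servlet can be changed. The class and method names (XmlSafePayload, isXmlChar, isXmlSafe, encode, decode) are made up for this example and are not part of SAX or Xerces. The isXmlChar check follows the Char production of XML 1.0, which is why 0x0 is rejected.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of any parser API.
class XmlSafePayload {

    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True if every code point in the string may appear in an XML 1.0 document.
    static boolean isXmlSafe(String s) {
        return s.codePoints().allMatch(XmlSafePayload::isXmlChar);
    }

    // Client side: turn the raw text (which may contain 0x0) into Base64.
    static String encode(String raw) {
        return Base64.getEncoder()
                     .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Servlet side: decode the full text of the element.
    // trim() drops the line breaks that surround the element content.
    static String decode(String elementText) {
        byte[] bytes = Base64.getDecoder().decode(elementText.trim());
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "abc" + '\0' + "def";          // contains an embedded null
        String encoded = encode(raw);

        System.out.println(isXmlSafe(raw));         // raw text checked against XML 1.0
        System.out.println(isXmlSafe(encoded));     // Base64 text checked the same way
        System.out.println(decode("\n" + encoded + "\n").equals(raw));
    }
}
```

One caveat on the servlet side: SAX may deliver the text of a single element through several characters() calls, so it is safer to append each chunk to a StringBuilder and call decode only in endElement(), once the whole element text has been collected.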