
Accessing bug 9444 via XML RPC API crashes due to invalid byte sequence: "not well-formed (invalid token)"
Closed, Resolved (Public)


body: '<?xml version="1.0" encoding="UTF-8"?><methodResponse><params><param><value><struct><member><name>bugs</name><value><struct><member><name>9444</name><value><struct><member><name>comments</name><value><array><data><value><struct><member><name>is_private</name><value><boolean>0</boolean></value></member><member><name>count</name><value><int>0</int></value></member><member><name>creator</name><value><string></string></value></member><member><name>time</name><value><dateTime.iso8601>20070329T08:11:13</dateTime.iso8601></value></member><member><name>bug_id</name><value><int>9444</int></value></member><member><name>author</name><value><string></string></value></member><member><name>text</name><value><string>A database error has occurred Query: SELECT\nmath_outputhash,math_html_conservativeness,math_html,math_mathml FROM math WHERE\nmath_inputhash = \'\xef\xbf\xbd\xef\xbf\xbd\xd7\xbe\xef\xbf\xbd\x1f\x11\xef\xbf\xbd\xef\xbf\xbd\x12@\x01\xcb\xb5\' LIMIT 1 Function: MathRenderer::_recall Error: 1\nERROR: invalid byte sequence for encoding "UTF8": 0xebc3d'
Traceback (most recent call last):

File "", line 64, in <module>
File "", line 49, in fetch
  com = server.Bug.comments(kwargs)['bugs'][bugid]['comments']
File "/usr/lib/python2.7/", line 1224, in __call__
  return self.__send(self.__name, args)
File "/usr/lib/python2.7/", line 1578, in __request
File "/usr/lib/python2.7/", line 1264, in request
  return self.single_request(host, handler, request_body, verbose)
File "/usr/lib/python2.7/", line 1297, in single_request
  return self.parse_response(response)
File "/usr/lib/python2.7/", line 1467, in parse_response
File "/usr/lib/python2.7/", line 557, in feed
  self._parser.Parse(data, 0)

xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 22

Version: wmf-deployment
Severity: major



Event Timeline

bzimport raised the priority of this task to High. (Nov 22 2014, 3:43 AM)
bzimport set Reference to bz69747.

We should probably drop the offending characters, for example via

$string =~ tr/\xea-\xef/-/;

somewhere before

text       => $self->type('string', $comment->body_full),


I guess.
Late uneducated comment that might be blatantly wrong tomorrow morning.

[Mostly making comments here for myself.]

One problem here is that we have not conclusively identified which characters are actually offending; we are only guessing.
Another problem is that I cannot easily create a local testcase.
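
One way to get a local testcase without a Bugzilla instance is to hand the parser a hand-made payload. The following is a minimal sketch in Python 3 (the traceback above is from Python 2.7). It assumes the problem is the raw control characters visible in the dumped body (`\x1f`, `\x11`, `\x12`, `\x01`), which XML 1.0 does not allow in character data; that assumption is not confirmed in this task. `parses_cleanly` and both sample payloads are made up for illustration.

```python
import xml.parsers.expat


def parses_cleanly(payload: bytes) -> bool:
    """Return True if expat accepts the payload, False if it raises ExpatError."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(payload, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Hand-made fragments, not real Bugzilla output.
good = b'<?xml version="1.0" encoding="UTF-8"?><string>hash = abc</string>'
bad = b'<?xml version="1.0" encoding="UTF-8"?><string>hash = \x1f\x11</string>'

print(parses_cleanly(good))
print(parses_cleanly(bad))
```

Swapping other byte sequences into `bad` is a cheap way to narrow down which ones the parser actually rejects, instead of guessing.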

Workaround in : Use

$initial =~ s/([\x01-\x08\x0b\x0c\x0f-\x1f])/sprintf "\\x%02x", ord($1)/ge;

A similar example (also covering \x80-\xFF, for escaping non-ASCII entirely):

$qp_string =~ s/([=\x00-\x1F\x80-\xFF])/sprintf("=%02X",ord($1))/ge;
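
For anyone who wants to experiment with the first substitution outside of Bugzilla, here is a rough Python 3 equivalent. It is only a sketch: `escape_control_chars` is a hypothetical name, and the character class is copied as-is from the Perl one-liner rather than derived from the XML specification.

```python
import re

# Same range as the Perl substitution quoted above. Note that it does not
# cover \x00 or \x0e, which XML 1.0 also disallows.
CONTROL_CHARS = re.compile(r'[\x01-\x08\x0b\x0c\x0f-\x1f]')


def escape_control_chars(text: str) -> str:
    """Replace each matched control character with a literal \\xNN sequence."""
    return CONTROL_CHARS.sub(lambda m: '\\x%02x' % ord(m.group()), text)


# Made-up sample text loosely modelled on the comment in bug 9444.
sample = 'math_inputhash = \x1f\x11 LIMIT 1\n'
print(escape_control_chars(sample))
```

Tab, newline and carriage return are outside the range, so ordinary whitespace in comments is left alone, and so are non-ASCII letters.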

The above workaround is overkill, though, as it is not limited to comment text: if you replaced \x61 (letter: a) you'd end up with "Wrong/unsupported datatype 'boole\\x61n' specified" in the XMLRPC response. Hence I am slightly concerned about unwanted side effects, but the character range above is nothing that should be in use anyway.
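
The side effect comes from running the substitution over serialized output, markup included, rather than over individual string values. A small Python 3 sketch of that failure mode follows; `escape_matching` is a hypothetical helper, the fragment is hand-made, and the second character class is deliberately wrong to show the breakage.

```python
import re


def escape_matching(pattern: str, serialized: str) -> str:
    """Escape every character matching `pattern` as a literal \\xNN sequence."""
    return re.sub(pattern, lambda m: '\\x%02x' % ord(m.group()), serialized)


fragment = '<value><boolean>0</boolean></value>'

# Control-character range from the workaround: element names are untouched.
print(escape_matching(r'[\x01-\x08\x0b\x0c\x0f-\x1f]', fragment))

# A class that includes \x61 ("a") rewrites the element names as well.
print(escape_matching(r'[\x61]', fragment))
```

With the control-character range the markup survives because element names never contain those characters; any range that overlaps printable ASCII would not be safe to apply this way.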

So I tested the two-liner hack with the less commonly used letter \xc4\x8d (letter: č) in some comments, and the char replacement worked as expected in the XMLRPC response.

Helpful tables for conversion:

Change 155732 had a related patch set uploaded by Aklapper:
When exporting Bugzilla tickets via Chase's script we run into an API bug with specific Unicode letters for This is applying a hackish upstream workaround described in

Change 156100 had a related patch set uploaded by Aklapper:
Work around Bugzilla XML RPC bug with special Unicode characters

Change 155732 merged by Dzahn:
Create copy of upstream file (for followup custom change)

Change 156100 merged by Dzahn:
Work around Bugzilla XML RPC bug with special Unicode characters

A script querying the XML RPC API no longer aborts at ticket #9444, the XML still looks valid, and I have not run into any other explosions or incidents so far.

Closing as FIXED, crossing fingers it'll stay like that.

Note: As this workaround is applied to *any* output, it also damages binary attachment data. See