
Make LinearDoc.js signal an error if given ill-formed XML
Open, Low, Public

Description

At the moment, LinearDoc.js accepts ill-formed XML containing syntax like <!--- foo ---->, which it then misinterprets. (The XML specification does not allow the string "--" inside a comment.)

For example, the document '<p>foo<!-- bar -->baz</p>' correctly gives the following LinearDoc (XML dump):

<p>
<cxtextblock>
  <cxtextchunk>foo</cxtextchunk>
  <cxtextchunk>baz</cxtextchunk>
</cxtextblock>
</p>

However, the ill-formed document '<p>foo<!--- bar ---->baz</p>' gives the following LinearDoc (XML dump):

<p>

The expected behaviour is to signal an XML parser error on reading the ill-formed document.

This bug occurs because version 0.6.0 of the Node.js sax module silently accepts ill-formed XML, which violates section 5.1 of http://www.w3.org/TR/REC-xml/ ("Validating and non-validating processors alike must report violations of this specification's well-formedness constraints").
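
As an illustration of the expected behaviour, here is a minimal sketch of a comment well-formedness check that could run before the document is handed to the parser. The function names findIllFormedComment and assertWellFormedComments are hypothetical; they are not part of LinearDoc.js or sax. The sketch only covers the comment rule (no "--" inside a comment body) and assumes '<!--' does not appear inside a CDATA section, so it is not a full well-formedness check.

```javascript
// Hypothetical helper, not part of LinearDoc.js or the sax module.
// Returns null if every comment in `xml` is well-formed, otherwise the
// offset of the first offending '<!--'.
// Assumption: '<!--' never appears inside a CDATA section.
function findIllFormedComment(xml) {
  let pos = 0;
  while (true) {
    const open = xml.indexOf('<!--', pos);
    if (open === -1) {
      return null; // no further comments
    }
    // Inside a comment, the first '--' must be the start of the closing '-->'.
    const dashes = xml.indexOf('--', open + 4);
    if (dashes === -1 || xml[dashes + 2] !== '>') {
      return open; // unterminated comment, or '--' inside the comment body
    }
    pos = dashes + 3; // continue scanning after '-->'
  }
}

// Hypothetical wrapper that signals an error instead of returning an offset.
function assertWellFormedComments(xml) {
  const offset = findIllFormedComment(xml);
  if (offset !== null) {
    throw new Error('Ill-formed XML comment at offset ' + offset);
  }
}
```

Calling assertWellFormedComments on the two documents from the description would pass for the first and throw for the second. A test case could assert exactly that, independently of which parser version is in use.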


Version: unspecified
Severity: minor

Details

Reference
bz68149

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 3:39 AM
bzimport set Reference to bz68149.
bzimport added a subscriber: Unknown Object (MLST).

Can this actually happen and break anything with the way in which CX currently works? Or is it just for caution?

Arrbee lowered the priority of this task from Medium to Low. Jan 7 2019, 2:26 PM

Recommendation from @Nikerabbit is to add a test case for this. Not high priority at the moment.