- Currently, an HTML extract looks something like this:
<p>1\n</p>\n\n<p>When writing systems were created in ancient civilizations, a variety of objects, such\n<!-- Tidy found serious XHTML errors -->...
Notice how the second <p> is never closed, and how we're shipping extra debug information (the Tidy comment). Make sure the new config actually fixes the invalid markup and doesn't output any debug information.
These issues came up while we were working on T156467.
- Truncating by number of characters using string slicing can have unexpected consequences, and we may want to revisit how we do that. See T92628.
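
To make both points checkable, here is a rough Python sketch of a verification helper. It is not the extension's code: `ExtractChecker`, `check_extract`, and the sample strings are hypothetical names and data made up for illustration, and it only uses the standard library's `html.parser`. It reports tags that are opened but never closed, plus any HTML comments (such as the Tidy message), and shows how slicing by character count can leave tags open.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not count as "unclosed".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ExtractChecker(HTMLParser):
    """Hypothetical helper: collects unclosed tags and comments in an extract."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # tags opened but not yet closed
        self.comments = []    # text of every HTML comment seen

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag, if there is one.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_comment(self, data):
        self.comments.append(data)


def check_extract(html):
    """Return (unclosed_tags, comments) for an HTML extract."""
    parser = ExtractChecker()
    parser.feed(html)
    parser.close()
    return parser.open_tags, parser.comments


# Shortened stand-in for the extract quoted above.
sample = ("<p>1\n</p>\n\n<p>When writing systems were created, such\n"
          "<!-- Tidy found serious XHTML errors -->")
print(check_extract(sample))

# Slicing well-formed HTML at a fixed character count can cut off closing tags.
full = "<p>Writing systems were created in <b>ancient</b> civilizations.</p>"
print(check_extract(full[:40]))  # the slice ends inside the <b> element
```

A fixed extract should come back with two empty lists. The same check could be run against output produced with the new config to confirm that no unclosed tags or debug comments remain.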