Follow-up from the offsite:
Both the new and old page content APIs output JSON directly. While JSON output is the end goal, producing it directly has a few downsides:
- It does not separate the concerns of finding content within the HTML and formatting it into JSON.
- Because changes are not output as HTML, any cleanup performed by the API is not easily upstreamed to Parsoid if it is found to be general purpose.
To address this, a new API will be written that performs all the cleaning and formatting of the MCS API but outputs the result as HTML.
This includes, but is not limited to:
- Marking up sections, including the lead section (see also T114072).
- Marking up other content components that need special treatment or removal in mobile browsers or apps (e.g. infoboxes, navboxes, references, pronunciation help).
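As a rough illustration of the separation of concerns described above, here is a minimal Python sketch of a consumer that takes already marked-up HTML and only does the JSON formatting step. The `<section data-section-id="...">` markup, the sample HTML, and the names `SectionExtractor`, `extract_sections` and `sections_to_json` are all assumptions made for this example; the actual markup would be whatever the new API and the Parsoid spec settle on.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample of the HTML the new API might emit.
# The data-section-id attribute is an assumption for illustration only.
SAMPLE = (
    '<section data-section-id="0"><p>Lead paragraph.</p></section>'
    '<section data-section-id="1"><h2>History</h2><p>Details.</p></section>'
)


class SectionExtractor(HTMLParser):
    """Collects the text of each top-level <section> element."""

    def __init__(self):
        super().__init__()
        self.sections = []  # one {"id": int, "text": str} dict per section
        self._chunks = []   # text chunks of the section being read
        self._depth = 0     # nesting depth of <section> tags

    def handle_starttag(self, tag, attrs):
        if tag != "section":
            return
        if self._depth == 0:
            # A missing id is recorded as -1 rather than guessed.
            section_id = int(dict(attrs).get("data-section-id", -1))
            self.sections.append({"id": section_id, "text": ""})
            self._chunks = []
        self._depth += 1

    def handle_endtag(self, tag):
        if tag != "section" or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            self.sections[-1]["text"] = " ".join(self._chunks)

    def handle_data(self, data):
        text = data.strip()
        if self._depth > 0 and text:
            self._chunks.append(text)


def extract_sections(html):
    """Finding step: locate sections in the marked-up HTML."""
    parser = SectionExtractor()
    parser.feed(html)
    parser.close()
    return parser.sections


def sections_to_json(html):
    """Formatting step: serialize the located sections as JSON."""
    return json.dumps(extract_sections(html))


if __name__ == "__main__":
    print(sections_to_json(SAMPLE))
```

Because the finding step relies only on the markup, the same HTML could be consumed by other clients, and any cleanup that proves general purpose lives in the HTML rather than in the JSON formatter.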
In the longer term, this should result in an improvement of the Parsoid markup spec, with much better support for easily and efficiently accomplishing common selection and reformatting tasks on our content.