
Refactor Parsoid extension domToWikitext to be domToSource
Open, Medium, Public

Description

That is, the return value should be the *contents* of the extension tag and any *attributes* of the extension tag, not a wikitext string. (Right now we return the wikitext string starting with the open-extension tag and ending with the closed-extension-tag, leaving the extension to implement all escaping itself... often poorly or incompletely.)

The API should allow extensions to specify whether they prefer empty contents to be serialized as <tag></tag> or <tag/> or perhaps even <tag>\n</tag> (for "block" tags), and also whether they would like the tag to be dropped entirely (e.g., Cite's <ref> tag has this behavior).
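To make the proposed shape concrete, here is a minimal sketch of what a domToSource-style return value and the core-side serializer could look like. It is written in Python purely for illustration; Parsoid itself is PHP, and every name here (`ExtensionSource`, `EmptyStyle`, `serialize_extension_tag`) is hypothetical, not part of any existing Parsoid API. The attribute escaping uses Python's generic HTML escaping as a stand-in; the escaping rules real wikitext needs may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from html import escape


class EmptyStyle(Enum):
    """How an extension would like empty contents serialized (hypothetical)."""
    OPEN_CLOSE = "open_close"      # <tag></tag>
    SELF_CLOSING = "self_closing"  # <tag/>
    BLOCK = "block"                # <tag>\n</tag>
    DROP = "drop"                  # emit nothing at all


@dataclass
class ExtensionSource:
    """What the extension returns: tag contents and attributes, not wikitext."""
    contents: str
    attributes: dict = field(default_factory=dict)
    empty_style: EmptyStyle = EmptyStyle.OPEN_CLOSE


def serialize_extension_tag(name: str, src: ExtensionSource) -> str:
    """Build the final string centrally, so extensions never escape by hand."""
    # Stand-in escaping: html.escape is an assumption, not wikitext's real rules.
    attrs = "".join(
        f' {key}="{escape(value, quote=True)}"'
        for key, value in src.attributes.items()
    )
    if src.contents == "":
        if src.empty_style is EmptyStyle.DROP:
            return ""
        if src.empty_style is EmptyStyle.SELF_CLOSING:
            return f"<{name}{attrs}/>"
        if src.empty_style is EmptyStyle.BLOCK:
            return f"<{name}{attrs}>\n</{name}>"
        return f"<{name}{attrs}></{name}>"
    return f"<{name}{attrs}>{src.contents}</{name}>"


# An empty tag whose extension opted into being dropped yields no output.
print(repr(serialize_extension_tag("ref", ExtensionSource("", {}, EmptyStyle.DROP))))
```

The point of the design is that the open and close tags, attribute quoting, and empty-content handling live in one place rather than in each extension.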

This is a follow up to https://gerrit.wikimedia.org/r/584052

Event Timeline

ssastry moved this task from Needs Triage to Tech Debt / Big changes on the Parsoid board.

(Might be worth throwing an exception or performing a generic fixup if the returned extension tag contents contain the </extension-tag> string? Extensions should "know" that's the one substring they cannot return from html2wt. In my wikitext2.0 proposal I suggested <extTag#foo> would match with </extTag#foo> (and otherwise the contents of the hash would be ignored), which would be an 'easy' way to do this. If the contents contain </extTag>, then do: pick or increment an N; if the contents contain </extTag#N>, loop; otherwise serialize with <extTag#N> ... </extTag#N>.)
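The fixup loop described in that comment can be sketched as follows. This is an illustration only: the `<extTag#N>` syntax comes from the wikitext2.0 proposal and is not something current wikitext is known to accept, the function name `wrap_with_safe_suffix` is hypothetical, and the case-insensitive comparison is an assumption about how closing tags are matched. It also ignores variants such as whitespace inside the closing tag.

```python
def wrap_with_safe_suffix(name: str, contents: str) -> str:
    """Wrap contents in <name>...</name>, adding a #N suffix if needed.

    If the contents already contain the plain closing tag, pick the smallest
    N such that </name#N> does not occur in the contents and use that pair.
    """
    haystack = contents.lower()  # assumption: tag matching is case-insensitive
    plain_close = f"</{name}>"
    if plain_close.lower() not in haystack:
        return f"<{name}>{contents}{plain_close}"

    n = 1
    # Increment N while the suffixed closing tag would also collide.
    while f"</{name}#{n}>".lower() in haystack:
        n += 1
    return f"<{name}#{n}>{contents}</{name}#{n}>"


print(wrap_with_safe_suffix("nowiki", "a</nowiki>b"))
```

The loop always terminates because the contents are finite and can contain only finitely many distinct `</name#N>` strings.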

Change 747211 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] WIP: Extension interface domToWikitext -> domToSource

https://gerrit.wikimedia.org/r/747211