
Implement htmlPreprocess and maybe use it for mw:DisplaySpace and DSR conversion
Closed, Resolved · Public

Description

From the review of https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/591224

We've got some vague gestures in the Html2Wt code about *preprocessing* steps in Html2Wt handling, including an interface in the extension API which (AFAIK) isn't currently hooked up to anything.

It would be *really nice* if this code were a DOMProcessor, invoked by the same machinery that invokes extension DOMProcessors, in both the wt2html and html2wt directions. The current DisplayHack class would become DOMProcessor::wtPostprocess, and DOMProcessor::htmlPreprocess would convert mw:DisplaySpace back to ' ', so that we could completely ignore mw:DisplaySpace in the rest of the html2wt pass.
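To make the proposed htmlPreprocess step concrete, here is a minimal sketch in Python (Parsoid itself is PHP, so this only illustrates the idea and is not the project's code). It assumes the marker looks like `<span typeof="mw:DisplaySpace">` wrapping a non-breaking space; the class name `DisplaySpaceProcessor` and the `run_html_preprocessors` driver are hypothetical names made up for this example.

```python
from xml.dom import minidom


class DisplaySpaceProcessor:
    """Hypothetical processor: undoes the display-space markup before html2wt."""

    def html_preprocess(self, root):
        doc = root.ownerDocument
        # Assumed markup: <span typeof="mw:DisplaySpace">&#160;</span>
        for span in list(root.getElementsByTagName("span")):
            if span.getAttribute("typeof") == "mw:DisplaySpace":
                # Swap the whole wrapper for an ordinary space character.
                span.parentNode.replaceChild(doc.createTextNode(" "), span)
        # Merge the text nodes that are now adjacent.
        root.normalize()
        return root


def run_html_preprocessors(root, processors):
    """Hypothetical driver: core and extension processors go through one loop."""
    for processor in processors:
        root = processor.html_preprocess(root)
    return root


if __name__ == "__main__":
    html = '<p>foo<span typeof="mw:DisplaySpace">&#160;</span>: bar</p>'
    root = minidom.parseString(html).documentElement
    print(run_html_preprocessors(root, [DisplaySpaceProcessor()]).toxml())
```

Once a step like this has run, later serialization code sees only plain text and does not need to know about mw:DisplaySpace at all, which is the simplification the description is after.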

If there's some reason this *wouldn't* work (interferes with selser?) then we should think hard about it and make it work, because that's the architecture we currently expect to use for the various other extensions that implement DOMProcessor.

Event Timeline

Arlolra created this task. Jun 4 2020, 5:16 PM
Restricted Application added a subscriber: Aklapper. Jun 4 2020, 5:16 PM
cscott renamed this task from Implement htmlPreprocess and maybe use it for mw:DisplayHack and DSR conversion to Implement htmlPreprocess and maybe use it for mw:DisplaySpace and DSR conversion. Jun 4 2020, 5:18 PM

We used to have one,
https://github.com/wikimedia/parsoid/commit/02b09d887c9c5d8afa3d9c7dae69aede0182f94b
https://github.com/wikimedia/parsoid/blob/master/lib/FromHTML.js

But it was never ported.

Nothing ever calls Ext/DOMProcessor::htmlPreprocess()

However, we do have another candidate for a core preprocessor, dsr offset conversion,
https://github.com/wikimedia/parsoid/blob/master/src/Core/WikitextContentModelHandler.php#L66
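As a rough illustration of what a DSR offset conversion step involves, the sketch below maps a (start, end) range from UTF-8 byte offsets to code-point offsets. This is a Python sketch under assumptions: the thread does not say which offset units are involved, and the function names `byte_to_char_offset` and `convert_dsr` are hypothetical, not Parsoid's API.

```python
def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Map a UTF-8 byte offset into `text` to a code-point offset."""
    encoded = text.encode("utf-8")
    if not 0 <= byte_offset <= len(encoded):
        raise ValueError(f"offset {byte_offset} is outside the text")
    # Decoding the prefix raises UnicodeDecodeError (a ValueError subclass)
    # if the offset falls in the middle of a multi-byte character.
    return len(encoded[:byte_offset].decode("utf-8"))


def convert_dsr(text: str, dsr: tuple) -> tuple:
    """Convert a (start, end) source range from byte to code-point offsets."""
    start, end = dsr
    return (byte_to_char_offset(text, start), byte_to_char_offset(text, end))


if __name__ == "__main__":
    # "héllo" is 5 characters but 6 bytes in UTF-8.
    print(convert_dsr("héllo wörld", (0, 6)))
```

Because a conversion like this touches every annotated node and has to happen before the rest of the html2wt pass reads the offsets, it fits the same "preprocess the DOM first" slot as the DisplaySpace cleanup.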

ssastry triaged this task as Low priority. Jun 4 2020, 8:06 PM
ssastry moved this task from Needs Triage to Feature requests on the Parsoid board. Jun 4 2020, 8:13 PM
Arlolra assigned this task to ssastry. Sep 23 2020, 8:56 PM

Change 628908 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Introduce an edited-DOM-preprocessor in the HTML -> WT direction

https://gerrit.wikimedia.org/r/628908

Change 628910 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Add preprocessing step to html2wt serializers

https://gerrit.wikimedia.org/r/628910

From https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/628908/2/src/Core/WikitextContentModelHandler.php#b69

@ssastry writes,

looks like we have 3 kinds of preprocessing then

  1. pre-dom-diff preprocessing on edited dom
  2. pre-dom-diff preprocessing on both doms
  3. post-dom-diff preprocessing on edited dom

so, we need to come up with different names and clear semantics for these stages.
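One way to picture the three stages listed above is the Python sketch below. Everything in it is an assumption made for illustration: the stage names, the choice to run stage 1 before stage 2, and the use of a list of strings as a stand-in for a DOM. It is not how the merged patch is structured.

```python
from typing import Callable, List, Sequence, Tuple

Dom = List[str]               # stand-in for a real DOM tree
Step = Callable[[Dom], Dom]   # a preprocessing step returns a new "DOM"


def run_html2wt_preprocessing(
    original: Dom,
    edited: Dom,
    pre_diff_edited: Sequence[Step],
    pre_diff_both: Sequence[Step],
    post_diff_edited: Sequence[Step],
    diff: Callable[[Dom, Dom], Dom],
) -> Tuple[Dom, Dom]:
    # Stage 1: steps that only make sense on the edited DOM, before diffing.
    for step in pre_diff_edited:
        edited = step(edited)
    # Stage 2: steps applied to both DOMs so the diff compares like with like.
    for step in pre_diff_both:
        original = step(original)
        edited = step(edited)
    changes = diff(original, edited)
    # Stage 3: steps on the edited DOM that must not influence the diff.
    for step in post_diff_edited:
        edited = step(edited)
    return edited, changes


def strip_blank(dom: Dom) -> Dom:
    return [node for node in dom if node.strip()]


def lowercase(dom: Dom) -> Dom:
    return [node.lower() for node in dom]


def mark_done(dom: Dom) -> Dom:
    return dom + ["<!--preprocessed-->"]


def simple_diff(old: Dom, new: Dom) -> Dom:
    return [node for node in new if node not in old]


if __name__ == "__main__":
    print(run_html2wt_preprocessing(
        ["Foo", "Bar"], ["Foo", " ", "Baz"],
        [strip_blank], [lowercase], [mark_done], simple_diff,
    ))
```

Keeping the three lists separate makes the distinction explicit: a step registered in the wrong list either skews the diff or runs on a DOM it was never meant to see.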

Change 628910 abandoned by Subramanya Sastry:
[mediawiki/services/parsoid@master] Add preprocessing step to html2wt serializers

Reason:
Squashed into another patch.

https://gerrit.wikimedia.org/r/628910

Change 628908 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Introduce preprocessing in the HTML -> WT direction

https://gerrit.wikimedia.org/r/628908

ssastry closed this task as Resolved. Oct 1 2020, 12:13 AM

Change 635100 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a12

https://gerrit.wikimedia.org/r/635100

Change 635100 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a12

https://gerrit.wikimedia.org/r/635100