
Provide DOM versions of the OutputTransform pipeline transformations
Open, High, Public

Description

A significant performance issue in Parsoid is the back-and-forth conversion between text and DOM formats in the output pipeline (@ssastry has a trace that puts this at 9 seconds (!!) on enwiki:Barack_Obama; see the discussion on T348254).
To fix this issue, we want to provide a full DOM pipeline for the Parsoid output, which means implementing DOM versions of the current text transformations.
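A minimal sketch of why the round trips add up, under stated assumptions. The `Stage`, `Content`, `run_pipeline`, `text_mark` and `dom_mark` names are hypothetical and are not the MediaWiki or Parsoid API. Python's `xml.etree.ElementTree` stands in for a real HTML DOM, and each stage simply adds a class attribute. Content is converted lazily, only when a stage needs the other format, so the conversion count shows how the mix of text and DOM stages drives parse/serialize cost.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical model of an output pipeline (illustration only, not MediaWiki code).
@dataclass
class Stage:
    name: str
    kind: str  # "text" or "dom"
    fn: Callable

class Content:
    """Holds the document either as text or as a tree, counting conversions."""

    def __init__(self, html: str):
        self.text: Optional[str] = html
        self.dom: Optional[ET.Element] = None
        self.conversions = 0

    def as_dom(self) -> ET.Element:
        if self.dom is None:
            self.dom = ET.fromstring(self.text)  # text -> DOM (parse)
            self.text = None
            self.conversions += 1
        return self.dom

    def as_text(self) -> str:
        if self.text is None:
            self.text = ET.tostring(self.dom, encoding="unicode")  # DOM -> text
            self.dom = None
            self.conversions += 1
        return self.text

def run_pipeline(html: str, stages: List[Stage]) -> Tuple[str, int]:
    content = Content(html)
    for stage in stages:
        if stage.kind == "dom":
            stage.fn(content.as_dom())  # DOM stages mutate the tree in place
        else:
            content.text = stage.fn(content.as_text())  # text stages return new text
    return content.as_text(), content.conversions  # final output is always text

def text_mark(tag: str, cls: str) -> Stage:
    # Text version of the transformation: string replacement on serialized HTML.
    return Stage(f"text:{tag}", "text",
                 lambda html: html.replace(f"<{tag}>", f'<{tag} class="{cls}">'))

def dom_mark(tag: str, cls: str) -> Stage:
    # DOM version of the same transformation: set the attribute on matching nodes.
    def fn(root: ET.Element) -> None:
        for el in root.iter(tag):
            el.set("class", cls)
    return Stage(f"dom:{tag}", "dom", fn)

html = "<div><p>a</p><span>b</span><b>c</b><i>d</i></div>"
marks = [("p", "para"), ("span", "s"), ("b", "bold"), ("i", "it")]
mixed = [text_mark(*marks[0]), dom_mark(*marks[1]),
         text_mark(*marks[2]), dom_mark(*marks[3])]
all_dom = [dom_mark(tag, cls) for tag, cls in marks]

print(run_pipeline(html, mixed)[1])    # 4
print(run_pipeline(html, all_dom)[1])  # 2
```

Both pipelines produce the same HTML, but with every transformation available as a DOM pass the document is parsed once and serialized once, however many stages run.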

Event Timeline

ssastry triaged this task as High priority. May 22 2025, 5:52 PM

Change #1150712 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/core@master] Introduce ContentHolderTransformStage

https://gerrit.wikimedia.org/r/1150712

Change #1191726 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/core@master] Add a DOM version of the TOC markers pass

https://gerrit.wikimedia.org/r/1191726

We explicitly exclude HardenNFC from this task: it is a text-only pass that runs at the end of the pipeline, so converting it to DOM makes no sense.
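To illustrate why a terminal text-only pass gains nothing from a DOM port, here is a hypothetical helper (`count_conversions`, not part of MediaWiki). It counts format switches for a sequence of stage kinds, assuming the input arrives as text and the final output must be text. It models only stage placement, not what HardenNFC actually does.

```python
from typing import List

def count_conversions(kinds: List[str], start: str = "text", end: str = "text") -> int:
    """Count text<->DOM switches when stages run in order and output must be `end`."""
    conversions = 0
    current = start
    for kind in list(kinds) + [end]:
        if kind != current:
            conversions += 1  # one parse or one serialization
            current = kind
    return conversions

print(count_conversions(["dom", "dom", "dom"]))          # 2
print(count_conversions(["dom", "dom", "dom", "text"]))  # 2: terminal text pass is free
print(count_conversions(["dom", "text", "dom", "dom"]))  # 4: mid-pipeline text pass costs 2
```

The output has to be serialized once anyway, so a text pass placed after that serialization adds no round trip.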