A significant performance issue in Parsoid is the repeated conversion between text and DOM formats in the output pipeline. @ssastry has a trace that puts the cost at 9 seconds (!!) on enwiki:Barack_Obama; see the discussion on T348254.
To fix this issue, we want to provide a full DOM pipeline for the Parsoid output, which means implementing DOM versions of the current text transformations.
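To illustrate the difference, here is a minimal Python sketch, not MediaWiki code. The two transforms are hypothetical stand-ins loosely modelled on AddWrapperDivClass and ExpandToAbsoluteURLs, and the `https://example.org` base URL and the sample markup are assumptions. In the text pipeline every stage parses and serializes the document; in the DOM pipeline the document is parsed once and serialized once.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for two output transforms (not the real implementations).
def add_wrapper_class(root):
    # Tag the wrapper element with an illustrative class name.
    root.set("class", "mw-parser-output")

def absolutize_links(root):
    # Prefix site-relative links with an assumed base URL.
    for a in root.iter("a"):
        href = a.get("href", "")
        if href.startswith("/"):
            a.set("href", "https://example.org" + href)

def run_text_pipeline(html, transforms):
    # Text-based pipeline: each stage pays for a parse and a serialization.
    for transform in transforms:
        root = ET.fromstring(html)
        transform(root)
        html = ET.tostring(root, encoding="unicode")
    return html

def run_dom_pipeline(html, transforms):
    # DOM pipeline: parse once, apply every stage to the same tree, serialize once.
    root = ET.fromstring(html)
    for transform in transforms:
        transform(root)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    source = '<div><a href="/wiki/Foo">Foo</a></div>'
    stages = [add_wrapper_class, absolutize_links]
    print(run_text_pipeline(source, stages))
    print(run_dom_pipeline(source, stages))
```

Both functions produce the same markup; the second simply avoids the per-stage parse and serialize round trip, which is the cost this task aims to remove.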
Description
| Status | Assigned | Task |
|---|---|---|
| Resolved | ihurbain | T293512 ParserOutput::getText() should be removed from ParserOutput |
| Open | ihurbain | T394005 Provide DOM versions of the OutputTransform pipeline transformations |
| Open | None | T405927 Provide a DOM version of AddRedirectHeader OutputTranform |
| Open | None | T405928 Provide a DOM version of the AddWrapperDivClass OutputTransform |
| Resolved | ssastry | T405929 Provide a DOM version of the DeduplicateStyles OutputTransform |
| Open | None | T405930 Provide a DOM version of the ExecutePostCacheTransformHooks OutputTransform |
| Open | None | T405932 Provide a DOM version of the ExpandToAbsoluteURLs OutputTransform |
| Resolved | ssastry | T405933 Provide a DOM version of the ExtractBody OutputTransform |
| Resolved | ihurbain | T405935 Provide a DOM version of the HandleTOCMarkers OutputTransform |
| Open | None | T405936 Provide a DOM version of the HydrateHeaderPlaceholders OutputTransform |
| Open | None | T405937 Provide a DOM version of the RenderDebugInfo OutputTransform |
Event Timeline
Change #1150712 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):
[mediawiki/core@master] Introduce ContentHolderTransformStage
Change #1191726 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):
[mediawiki/core@master] Add a DOM version of the TOC markers pass
We explicitly exclude HardenNFC from this task, as it is a text-only pass that happens at the end of the pipeline and makes no sense to convert to DOM.