
Post-cache output transforms are expensive on large pages
Closed, Resolved (Public)

Description

@Krinkle in #mediawiki-core on 12 May 2025:

Looks like there's a major latency regression on mobile as of ~1 month ago, almost doubled.
started Feb 28
https://grafana.wikimedia.org/d/QLtC93rMz/backend-pageview-timing?viewPanel=panel-60
{F59940812 height=300}

OutputTransformPipeline runs various post-cache transformations on article HTML. On non-Parsoid wikis, this consists of deduplicating style nodes within the content, plus ToC handling for mobile output.
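For readers unfamiliar with the deduplication step, here is a minimal Python sketch of the idea. It is not the MediaWiki implementation (which is PHP): the function name `deduplicate_styles` is hypothetical, and the sketch assumes deduplicable styles are marked with a double-quoted `data-mw-deduplicate` attribute and that repeats are replaced by a `<link rel="mw-deduplicated-inline-style">` placeholder. It only inspects tags that could be style nodes and leaves the rest of the page untouched, which is one plausible way to keep the cost proportional to the number of style tags rather than the page size.

```python
import re

# Matches <style ... data-mw-deduplicate="KEY" ...>...</style>.
# Assumptions: the attribute value is double-quoted, no attribute value
# contains ">", and the style body never contains "</style>".
STYLE_RE = re.compile(
    r'<style\b[^>]*\bdata-mw-deduplicate="([^"]*)"[^>]*>.*?</style>',
    re.DOTALL | re.IGNORECASE,
)


def deduplicate_styles(html: str) -> str:
    """Keep the first <style> for each key; replace repeats with a <link>."""
    seen: set[str] = set()

    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key not in seen:
            # First occurrence: keep the full style block unchanged.
            seen.add(key)
            return match.group(0)
        # Repeat: emit a small placeholder that points back at the key.
        # The key is copied as-is, so it stays HTML-escaped as in the source.
        return f'<link rel="mw-deduplicated-inline-style" href="mw-data:{key}"/>'

    return STYLE_RE.sub(replace, html)
```

Style tags without the attribute are never matched, so they pass through unchanged along with all other markup.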

These transforms are expensive for larger pages. On mwdebug, the desktop transforms for the Barack Obama page take 350+ ms on a parser cache hit; mobile, which runs an additional step, takes twice as long.

As of May 12th, we spend ~10% of index.php wall time in these transforms (4.5% overall). T394005 will offer a long-term solution once Parsoid is the default everywhere, but in the meantime it may be worth improving the existing transforms to limit the impact.

Event Timeline

Change #1145252 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/core@master] DeduplicateStyles: Only transform possible style nodes

https://gerrit.wikimedia.org/r/1145252

Change #1145252 merged by jenkins-bot:

[mediawiki/core@master] DeduplicateStyles: Only transform possible style nodes

https://gerrit.wikimedia.org/r/1145252

Change #1146669 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/core@master] Make both DOM passes adjacent to each other in the pipeline

https://gerrit.wikimedia.org/r/1146669

Change #1146669 merged by jenkins-bot:

[mediawiki/core@master] Make both DOM passes adjacent to each other in the pipeline

https://gerrit.wikimedia.org/r/1146669

Change #1147743 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/core@master] parser: Optimize TOC placeholder replacement

https://gerrit.wikimedia.org/r/1147743

MSantos subscribed.

@mszabo I'm assigning this task to you based on the task history and the patches; feel free to re-assign in case I'm missing something. Thanks!

Change #1148315 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/core@wmf/1.45.0-wmf.1] DeduplicateStyles: Only transform possible style nodes

https://gerrit.wikimedia.org/r/1148315

Change #1148315 merged by jenkins-bot:

[mediawiki/core@wmf/1.45.0-wmf.1] DeduplicateStyles: Only transform possible style nodes

https://gerrit.wikimedia.org/r/1148315

Mentioned in SAL (#wikimedia-operations) [2025-05-20T12:17:44Z] <mszabo@deploy1003> Started scap sync-world: Backport for [[gerrit:1148315|DeduplicateStyles: Only transform possible style nodes (T394059)]]

Mentioned in SAL (#wikimedia-operations) [2025-05-20T12:24:02Z] <mszabo@deploy1003> mszabo: Backport for [[gerrit:1148315|DeduplicateStyles: Only transform possible style nodes (T394059)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-05-20T12:35:26Z] <mszabo@deploy1003> Finished scap sync-world: Backport for [[gerrit:1148315|DeduplicateStyles: Only transform possible style nodes (T394059)]] (duration: 17m 42s)

Change #1148323 had a related patch set uploaded (by Reedy; author: Máté Szabó):

[mediawiki/core@REL1_44] DeduplicateStyles: Only transform possible style nodes

https://gerrit.wikimedia.org/r/1148323

Change #1148324 had a related patch set uploaded (by Reedy; author: Máté Szabó):

[mediawiki/core@REL1_43] DeduplicateStyles: Only transform possible style nodes

https://gerrit.wikimedia.org/r/1148324

Change #1148323 merged by jenkins-bot:

[mediawiki/core@REL1_44] DeduplicateStyles: Only transform possible style nodes

https://gerrit.wikimedia.org/r/1148323

Change #1147743 merged by jenkins-bot:

[mediawiki/core@master] parser: Optimize TOC placeholder replacement

https://gerrit.wikimedia.org/r/1147743

Krinkle closed this task as Resolved. (Edited Jun 17 2025, 8:26 PM)

It has improved a lot, but I guess other (unrelated?) factors also regressed, and those are still elevated a fair bit (ref T255502: Goal: Save Timing median back under 1 second).

From Grafana / Backend Pageview Timing dashboard:

Screenshot 2025-06-17 at 13.17.50.png (2,819×1,447 px, 247 KB)

Change #1148324 merged by jenkins-bot:

[mediawiki/core@REL1_43] DeduplicateStyles: Only transform possible style nodes

https://gerrit.wikimedia.org/r/1148324