Page MenuHomePhabricator

Update the DiscussionTools post-cache HTML pass to use Remex instead of regexes
Closed, ResolvedPublic

Description

Ed and I have already sneaked in this work and worked out the details in Gerrit, filing a task for ease of reference, just in case we break something.

Motivation:

(…) However, two things have happened in the meantime:

  • We've had another XSS vulnerability caused by doing regexp replacements over HTML (T381617)
  • Because of that, we've started working again on updating the core post-cache HTML passes to use Remex instead of regexps (I4cb2f29cf8, I1ff9a7c942)

Thus I would very much like us to update the DiscussionTools post-cache HTML pass to use Remex instead of regexps too, and then redo this patch on top of it.

Event Timeline

Change #1148378 had a related patch set uploaded (by Bartosz Dziewoński; author: Esanders):

[mediawiki/extensions/DiscussionTools@master] Use custom elements for content placeholders, with HtmlHelper::modifyElement

https://gerrit.wikimedia.org/r/1148378

Change #1154332 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@master] Remove cache compat code for content placeholders

https://gerrit.wikimedia.org/r/1154332

This seems like something you should warn the team about. ;)

Change #1148378 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Use custom elements for content placeholders, with HtmlHelper::modifyElement

https://gerrit.wikimedia.org/r/1148378

Performance-wise, this is a bit slower, but not nearly as much as I worried. Tested by running composer phpunit:entrypoint -- extensions/DiscussionTools/tests/phpunit/CommentFormatterTest.php, average over 4 runs each: before – 5.4 s, after (I126203ab1) – 6.3 s, middle (this change) – 9.9 s. So there will be a performance degradation while we're running both versions of the code, but we should be able to handle that, and once we remove the compat code it's only slightly slower than the regexes.

QA: All talk page features should render identically to how they did before.

Change #1154332 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Remove cache compat code for content placeholders

https://gerrit.wikimedia.org/r/1154332

I suspect https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/1148378 is causing MediaWiki\Extension\DiscussionTools\Hooks\PageHooks::onOutputPageBeforeHTML to run on every normal article page. From the flame graph archive at https://performance.wikimedia.org/php-profiling/, this function is barely visible on 2025-06-13, but begins to call BatchModifyElements::apply and RemexHtml after 2025-06-14. The time wasted by this is considerable on long pages.

I suspect https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/1148378 is causing MediaWiki\Extension\DiscussionTools\Hooks\PageHooks::onOutputPageBeforeHTML to run on every normal article page. From the flame graph archive at https://performance.wikimedia.org/php-profiling/, this function is barely visible on 2025-06-13, but begins to call BatchModifyElements::apply and RemexHtml after 2025-06-14. The time wasted by this is considerable on long pages.

Interesting, thanks for letting us know. I started a new task for this, and replied there: T400115: Possible performance issue in DiscussionTools

Reedy added a parent task: Restricted Task.Sep 15 2025, 4:18 PM

Pardon me for asking another question here. I sometimes feel a bit annoyed by the time taken for the OutputPageBeforeHTML hook, as it can take more than 1s on some large talk pages (length alone might not be the problem, probably also related to the number of signatures), and the result is not cached. Is it inevitable by design, or already tracked somewhere in another ticket?

@Bewfip I don't think it can be cached, or at least not very effectively; the result depends on the current time (for the relative time display), and also on the user viewing it (for the state of topic subscriptions and presence of thanks buttons). A whole second is a lot slower than I would expect it to be though, even on large pages; I recorded a profile and there are some possible improvements. I started another new task for this: T405135: Optimize DiscussionTools OutputPageBeforeHTML hook handler (2025)

Change #1192211 had a related patch set uploaded (by Reedy; author: Esanders):

[mediawiki/extensions/DiscussionTools@REL1_44] Use custom elements for content placeholders, with HtmlHelper::modifyElement

https://gerrit.wikimedia.org/r/1192211

Change #1192214 had a related patch set uploaded (by Reedy; author: Esanders):

[mediawiki/extensions/DiscussionTools@REL1_43] Use custom elements for content placeholders, with HtmlHelper::modifyElement

https://gerrit.wikimedia.org/r/1192214

Change #1192211 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@REL1_44] Use custom elements for content placeholders, with HtmlHelper::modifyElement

https://gerrit.wikimedia.org/r/1192211

Change #1192221 had a related patch set uploaded (by Reedy; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@REL1_44] Remove cache compat code for content placeholders

https://gerrit.wikimedia.org/r/1192221

Change #1192221 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@REL1_44] Remove cache compat code for content placeholders

https://gerrit.wikimedia.org/r/1192221

Change #1192255 had a related patch set uploaded (by Reedy; author: Bartosz Dziewoński):

[mediawiki/extensions/DiscussionTools@REL1_43] Remove cache compat code for content placeholders

https://gerrit.wikimedia.org/r/1192255

Change #1192214 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@REL1_43] Use custom elements for content placeholders, with HtmlHelper::modifyElement

https://gerrit.wikimedia.org/r/1192214

Change #1192255 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@REL1_43] Remove cache compat code for content placeholders

https://gerrit.wikimedia.org/r/1192255