
Special:Contributions takes over a second to render 500 rows (2024)
Open, Needs Triage, Public

Description

Given:

  • You've set "500" in Special:Preferences for the preferred length of Special:RecentChanges when reviewing and patrolling on a wiki.
  • This sets the default limit for "View history" (action=history) and "My contributions" (Special:Contributions) as well.

Loading https://wikitech.wikimedia.org/wiki/Special:Contributions/Krinkle takes over a second to render.

This seems oddly high given that the revision table query for this is not slow, and no slow extensions (like Flow) are involved at render time. Almost everything else should be batchable and fairly trivial in compute.

@tstarling and I analysed a number of things yesterday on IRC, which I'll create subtasks for here, with the aim of getting it under 1 second, ideally to something un-noteworthy like 200-300ms (comparable to other backend read actions).

Related, as these are either the same or similarly modelled code paths (ContribsPager, ContributionsPager, HistoryPager, ChangesList, ChangesListSpecialPage), e.g. re-using the abstractions for PagerTools and ChangeTags::formatSummaryRow.

Event Timeline

Krinkle updated the task description.

From #mediawiki-core (Logs)

Krinkle: I wasn't expecting rclimit to apply outside recent changes.
TimStarling: since 2007: https://static-codereview.wikimedia.org/MediaWiki/22010.html
TimStarling: set in the base class so it would have been applied to more things as we migrated them to Pager

Parser::finalizeHeadings

<Krinkle> I wonder if something regressed in message parsing that it's so expensive for simple strings
…
<TimStarling> Message::format is 22%
<TimStarling> ok there are a couple of odd things in message parsing
<TimStarling> you don't need limit reports for messages (Parser::makeLimitReport)
<TimStarling> Parser::finalizeHeadings() is very expensive, that's a recent change, November 2023

Parser::finalizeHeadings regression started in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/975064 for T218330: Table of contents HTML may be unbalanced.

<TimStarling> […] and it's a very simple fix, the cost here is just due to the DOMUtils::parseHTML() call, but it's only needed in the loop body and $headlines is empty
<TimStarling> I mean the DOMUtils::parseHTML() call needs to be skipped when $headlines is empty
<TimStarling> this is 2.4% of your profile but at least it is easy
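
A minimal sketch of that guard, assuming a simplified shape for the method (this is not the actual Parser::finalizeHeadings() code; only DOMUtils::parseHTML() from wikimedia/parsoid is the real call being guarded):

```php
<?php
// Simplified sketch of the fast path; the real Parser::finalizeHeadings()
// signature and surrounding logic differ.
use Wikimedia\Parsoid\Utils\DOMUtils;

function finalizeHeadingsSketch( string $text, array $headlines ): string {
	if ( $headlines === [] ) {
		// No headings matched: the loop below never runs, so building a
		// full DOM document via DOMUtils::parseHTML() is pure overhead.
		// Interface messages almost always take this fast path.
		return $text;
	}
	// Expensive: parses the whole fragment into a DOM document.
	$doc = DOMUtils::parseHTML( $text );
	foreach ( $headlines as $headline ) {
		// ... number the heading and record TOCData against $doc (elided) ...
	}
	return $text;
}
```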

TOCData

Krinkle: we presumably don't expose TOCData on interface messages, but there may be some non-obvious cases, like special pages with interface messages and a TOC, where we add to OutputPage with metadata. Afaik we parse those normally, not as interface. I don't think TOC works there today, but it's worth checking whether that's exposed somewhere.

If we don't expose TOCData on messages parsed with interface=true (ParserOptions::getInterfaceMessage()), then we may be able to skip most of finalizeHeadings() and other steps entirely.
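
If that holds, the short-circuit could be as small as this sketch (the guard and its placement are assumptions, not an existing check in MediaWiki core; ParserOptions::getInterfaceMessage() is the real accessor):

```php
// Hypothetical guard at the top of Parser::finalizeHeadings().
if ( $this->mOptions->getInterfaceMessage() ) {
	// If interface messages never expose TOCData (to be verified),
	// the heading numbering and TOC bookkeeping below can be
	// skipped entirely.
	return $text;
}
```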

PagerTools: rollback link

<Krinkle> it's like 1-2s minimum for basic 50 edits for any given user. stands out: revDateLink(), generateRollback(), lots of repeat queries and message parsing.
…
<Krinkle> with limit 500 there's 217 db select queries. […]
…
<Krinkle> most of them are rollback count and page protection store
…
<TimStarling> generateRollback() could be fixed by just removing those links -- generate them on the client side if they are needed
…
<Krinkle> the revision count... it seems like that information ought to be in memory already. But it's structured to act on one rev at a time. It might miss a few for the last row but for most it'll have the revs between it and the next 10 (ShowRollbackEditCount).

Today we perform the revision table query for the user's contribs in one fast batch query. But when rendering the rollback links, we perform a secondary query for each and every revision separately, counting how many revisions lie between the current revision and any previous revision on that page by the same author. This number is used only for the link label, e.g. "Rollback 1 edit" vs "Rollback 3 edits". This seems like information we could batch in one go (a sketch follows below). For HistoryPager (action=history) it might even be in memory already from the main query (except for the last row, which could be mitigated by fetching limit + ShowRollbackEditCount rows); either way, doing it ad hoc for 1 is much better than for N.
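A hedged sketch of what batching could look like (fetchRollbackCounts() is illustrative, not an existing MediaWiki function; SelectQueryBuilder is the real query API): one query fetches recent revisions for every page in the pager result, then consecutive same-actor revisions are counted per page in memory.

```php
<?php
// Hypothetical batched lookup; not existing MediaWiki code.
use Wikimedia\Rdbms\IReadableDatabase;

function fetchRollbackCounts( IReadableDatabase $dbr, array $pageIds ): array {
	// One query for the recent revisions of every page in the result set,
	// instead of one counting query per rendered row. In practice each page
	// would be capped at $wgShowRollbackEditCount + 1 rows (elided here).
	$res = $dbr->newSelectQueryBuilder()
		->select( [ 'rev_page', 'rev_id', 'rev_actor' ] )
		->from( 'revision' )
		->where( [ 'rev_page' => $pageIds ] )
		->orderBy( [ 'rev_page', 'rev_timestamp DESC', 'rev_id DESC' ] )
		->caller( __METHOD__ )
		->fetchResultSet();

	// Count the consecutive top revisions per page that share one actor;
	// that is the number shown in the "rollback N edits" label.
	$counts = $topActor = $stopped = [];
	foreach ( $res as $row ) {
		$page = (int)$row->rev_page;
		if ( isset( $stopped[$page] ) ) {
			continue;
		}
		if ( !isset( $topActor[$page] ) ) {
			$topActor[$page] = $row->rev_actor;
			$counts[$page] = 1;
		} elseif ( $row->rev_actor === $topActor[$page] ) {
			$counts[$page]++;
		} else {
			// Actor changed: the rollback run for this page has ended.
			$stopped[$page] = true;
		}
	}
	return $counts;
}
```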

PagerTools: probablyCan('edit')

Krinkle:
with limit 500 there's 217 db select queries. […]

most of them are rollback count and page protection store

TimStarling:
RestrictionStore::getRestrictions() is 209ms in your profile, that could benefit from batching
WAN cache in front of a single fast query is questionable, especially when it is called 500 times
the existence of RestrictionStore with its in-process cache would make batching pretty straightforward, it's obvious how to implement that
Krinkle:
History pager has doBatchLookups() to place this
ContribPager.... [has] doBatchLookups in the parent, ContributionsPager […]
ref T341319, I recall reviewing that for RestrictionStore [because] you can't protect against rollback.
[so] that's for $authority->probablyCan( 'edit', $title );, right above $authority->probablyCan( 'rollback', $title ) also in PagerTools
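
As noted above, the in-process cache makes batching straightforward to slot into doBatchLookups(). A hedged sketch (loadRestrictionsFromRows() is a hypothetical name, not an existing RestrictionStore method; the page_restrictions columns are real schema):

```php
<?php
// Hypothetical batch warm-up for RestrictionStore, e.g. called from
// IndexPager::doBatchLookups().
use MediaWiki\Permissions\RestrictionStore;
use Wikimedia\Rdbms\IReadableDatabase;

function batchLoadRestrictions(
	IReadableDatabase $dbr, RestrictionStore $store, array $pageIds
): void {
	// One query for every page in the pager result, instead of one
	// page_restrictions lookup per rendered row (~200 on a 500-row page).
	$res = $dbr->newSelectQueryBuilder()
		->select( [ 'pr_page', 'pr_type', 'pr_level', 'pr_expiry' ] )
		->from( 'page_restrictions' )
		->where( [ 'pr_page' => $pageIds ] )
		->caller( __METHOD__ )
		->fetchResultSet();
	// Illustrative API, not real: seed the in-process cache so the later
	// per-row $authority->probablyCan( 'edit', $title ) calls are hits.
	$store->loadRestrictionsFromRows( $res );
}
```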

ConditionalDefaultsLookup

TimStarling:
119ms for ConvertibleTimestamp::setTimestamp(), that is a very hot few lines of code
the setTimestamp() calls mostly come from ConditionalDefaultsLookup, which is new (Nov 2023)
memoizing of ConditionalDefaultsLookup::getOptionDefaultForUser() looks straightforward and would save a substantial fraction of 96ms

That's from https://gerrit.wikimedia.org/r/c/mediawiki/core/+/978537 for T321527: Support conditional defaults for user properties.
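
Memoizing could be as simple as the following sketch (the class internals and key scheme are assumptions; only the method name getOptionDefaultForUser() comes from the discussion above):

```php
<?php
// Memoization sketch; not the actual ConditionalDefaultsLookup internals.
use MediaWiki\User\UserIdentity;

class ConditionalDefaultsLookupSketch {
	/** @var array<string,mixed> Cached defaults, keyed by option|user */
	private array $memo = [];

	public function getOptionDefaultForUser( string $option, UserIdentity $user ) {
		$key = $option . '|' . $user->getId();
		if ( !array_key_exists( $key, $this->memo ) ) {
			// Evaluating the conditions is what triggers the hot
			// ConvertibleTimestamp::setTimestamp() calls; do it once
			// per ( option, user ) per request.
			$this->memo[$key] = $this->computeDefault( $option, $user );
		}
		return $this->memo[$key];
	}

	private function computeDefault( string $option, UserIdentity $user ) {
		// ... evaluate conditional defaults (registration date etc.) ...
		return null;
	}
}
```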

Change #1090565 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] parser: Skip most of Parser::finalizeHeadings when no header found

https://gerrit.wikimedia.org/r/1090565

As for rollback links, there is also T122546: Cache rollback edit counts shown on recent changes, but I have had some doubts about whether it would have the desired effect.

Alternatively, we can abandon counting and just check whether the button can be shown. The query would simplify to: "Select the most recent revision not made by the last user. If it exists && !( rev_deleted & text bit ), then show [rollback]." This can also be cached and shared between patrollers and page views, reducing the average number of read queries per request.
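
A hedged sketch of that check (the function and variable names are illustrative; SelectQueryBuilder, the revision columns, and RevisionRecord::DELETED_TEXT are real):

```php
<?php
// Sketch of the existence check described above; $pageId and $lastActorId
// come from the row being rendered.
use MediaWiki\Revision\RevisionRecord;
use Wikimedia\Rdbms\IReadableDatabase;
use Wikimedia\Rdbms\SelectQueryBuilder;

function shouldShowRollback(
	IReadableDatabase $dbr, int $pageId, int $lastActorId
): bool {
	// Most recent revision on the page that is NOT by the last editor.
	$row = $dbr->newSelectQueryBuilder()
		->select( [ 'rev_id', 'rev_deleted' ] )
		->from( 'revision' )
		->where( [ 'rev_page' => $pageId ] )
		->andWhere( 'rev_actor != ' . $dbr->addQuotes( $lastActorId ) )
		->orderBy( 'rev_timestamp', SelectQueryBuilder::SORT_DESC )
		->limit( 1 )
		->caller( __METHOD__ )
		->fetchRow();
	// Show [rollback] only if such a revision exists and its text is
	// not revision-deleted.
	return $row && !( (int)$row->rev_deleted & RevisionRecord::DELETED_TEXT );
}
```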