Page MenuHomePhabricator

Watchlist formatter is doing a user edit count database query in every row of the result
Closed, ResolvedPublic

Description

Discovered while trying to see why watchlist has suddenly got so slow.

As an example, currently loading a watchlist makes 919 queries: https://logstash.wikimedia.org/goto/1610dd046bee4e8cbb0e9a51270b0ccb

Many of them are exact same queries. In one case I used XhGui, it's calling the user edit count for the same user 101 times in context of the one request: https://performance.wikimedia.org/xhgui/run/symbol?id=6668463540e83c26e289c1da&symbol=MediaWiki%5CUser%5CUserEditTracker%3A%3AgetUserEditCount

excimer says 12% of the time of the request is being spent doing the exact same query 100 times: https://performance.wikimedia.org/excimer/profile/0a96d7cc5babe7eb

Stack trace:

grafik.png (658×942 px, 148 KB)

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Ladsgroup triaged this task as Medium priority.Jun 11 2024, 1:06 PM
Ladsgroup moved this task from Triage to In progress on the DBA board.

Change #1041691 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] User: Cache User::getExperienceLevel() for an hour

https://gerrit.wikimedia.org/r/1041691

The edit count is cached in UserEditTracker, I would not expect the edit count for the same user is selected from the database for each row. It could be different user ids getting select one by one. This also happen for Linker::userToolLinks to mark red contribs link. There is no other cache. There is no way for a batch lookup to fill this cache, only UserEditTracker::setCachedUserEditCount via User::loadFromRow from a UserArray, but that looks not like a clean solution.

Why is experience level important when displaying watchlist?

(How is this question related to this ticket?) Anyway, I guess I'm less likely to review an edit to a page on my watchlist made by an experienced author...

( @Jeff_G FYI:
The rows of the watchlist have classes derived from the query — mw-changeslist-user-experienced, mw-changeslist-user-registered, mw-changeslist-user-unregistered etc.
I am not sure what currently makes use of these classes (I could not find anything), but they would be available for a skin or user script to add some sort of "confidence level" branding. )

It is clearly expensive to check the status of every different user mentioned in the watchlist, so if nothing actually uses the classes, perhaps the query should be removed altogether?

As an aside: The query is possibly incorrect – more so if cached.
The status of the user at the moment they made the edit is the correct detail, not the status of the user now.

A future solution might be to add the user status (at the moment of editing) as a tag attached to the edit.
The logic for choosing a user status could change over time, so a more "pure" solution would be to add the raw user edit count to the record for the edit.

The status of the user at the moment they made the edit is the correct detail, not the status of the user now.

https://commons.wikimedia.org/wiki/MediaWiki:Gadget-markAdmins.js appears to use the status of the user now to mark the appropriate usernames. I have that Gadget turned on in my preferences, should I try turning it off?

Turning off the gadget won't make a difference. The query adds the class to the HTML and it will continue to do so regardless.

Why is experience level important when displaying watchlist?

It's more important for recent changes, but (as I already said elsewhere) watchlist is just a specialized view of recent changes.

( @Jeff_G FYI:
The rows of the watchlist have classes derived from the query — mw-changeslist-user-experienced, mw-changeslist-user-registered, mw-changeslist-user-unregistered etc.
I am not sure what currently makes use of these classes (I could not find anything), but they would be available for a skin or user script to add some sort of "confidence level" branding. )

It is clearly expensive to check the status of every different user mentioned in the watchlist, so if nothing actually uses the classes, perhaps the query should be removed altogether?

They were introduced with the so-called "New filters for edit review," and have been in use since.

The filters are (if I recall correctly) fairly efficient and allow batch queries. Otherwise we could not filter the list of changes by it in production.

The classes added in Special:RecentChanges during rendering, though, seem either inefficiently implemented, or using a different but similar logic that is inherently less efficient. Caching would mask it, but I wonder if it can be avoided entirely by using the same approach as we would during the query? In other words, how come we can efficiently select changes that carry these labels, but then at rendering time don't have access to the same information?

Perhaps more can be included in the DB select, or alternatively, a batch can be used. ChangesListSpecialPage already does this for various aspect of the changes list. In ::execute, where we do other preloading and batching already, maybe we can add a call to UserArray there, so that it can batch and preload User::loadFromRow, which also naturally populates UserEditTracker::setCachedUserEditCount (since its stored in the same table).

Change #1047164 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] specialpage: Preload edit count on ChangesListSpecialPage

https://gerrit.wikimedia.org/r/1047164

Change #1041691 abandoned by Ladsgroup:

[mediawiki/core@master] User: Cache User::getExperienceLevel() for an hour

Reason:

In favor of I6d4945c3735ff61df31b21a08d9f74974d316d5d

https://gerrit.wikimedia.org/r/1041691

Change #1047164 merged by jenkins-bot:

[mediawiki/core@master] specialpage: Preload edit count on ChangesListSpecialPage

https://gerrit.wikimedia.org/r/1047164

Umherirrender claimed this task.