GlobalContributions: Perform queries using revision table if lookup does not require use of cu_changes table
Closed, Resolved, Public

Description

Summary

The Special:GlobalContributions page allows lookups for accounts. This query currently runs against the cu_changes table, which is unnecessary because the lookup needs no data from that table. Querying the revision table instead should be faster.

Background

  • Special:GlobalContributions allows lookups of global contributions from named accounts, temporary accounts, and all contributions by temporary accounts on a given IP address or range
    • All lookup types except the last are very similar to Special:Contributions, but global.
  • When looking up contributions for an account, we don't need any private data, such as the IP address, to assist in the lookup
  • Adding JOINs to the query which are not necessary will likely cause a slower query, which we want to avoid when performing this query on potentially 20 wikis at a time
  • Therefore, querying the revision table directly should help in avoiding these issues
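To illustrate the point above, here is a minimal sketch using an in-memory SQLite database. The schema is a heavily simplified assumption (the real revision and cu_changes tables have many more columns), and the function names are hypothetical. It only shows that an account lookup returns the same revisions with or without the JOIN.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_actor INTEGER, rev_timestamp TEXT);
    CREATE TABLE cu_changes (cuc_this_oldid INTEGER, cuc_actor INTEGER, cuc_ip TEXT);
    INSERT INTO revision VALUES
        (1, 10, '20250301000000'), (2, 10, '20250302000000'), (3, 11, '20250303000000');
    INSERT INTO cu_changes VALUES
        (1, 10, '192.0.2.1'), (2, 10, '192.0.2.1'), (3, 11, '192.0.2.7');
""")


def contributions_via_join(actor_id):
    """Account lookup that goes through cu_changes even though no IP data is read."""
    rows = conn.execute(
        "SELECT rev_id FROM cu_changes "
        "JOIN revision ON rev_id = cuc_this_oldid "
        "WHERE cuc_actor = ? ORDER BY rev_timestamp DESC",
        (actor_id,),
    ).fetchall()
    return [row[0] for row in rows]


def contributions_via_revision(actor_id):
    """Same lookup against the revision table alone, with no JOIN."""
    rows = conn.execute(
        "SELECT rev_id FROM revision WHERE rev_actor = ? ORDER BY rev_timestamp DESC",
        (actor_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Both queries return the same revision IDs for an account lookup.
print(contributions_via_join(10), contributions_via_revision(10))
```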

Acceptance criteria

  • Query performed by Special:GlobalContributions only uses cu_changes table if the lookup requires using that table, otherwise uses the revision table
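The criterion above could be expressed as a small selection rule. This is a sketch only: `LookupType` and `table_for_lookup` are hypothetical names and do not come from the CheckUser code base. It assumes that only the lookup by IP address or range needs private data from cu_changes.

```python
from enum import Enum


class LookupType(Enum):
    # Hypothetical names for the three lookup types described in the Background.
    NAMED_ACCOUNT = "named account"
    TEMPORARY_ACCOUNT = "temporary account"
    TEMP_ACCOUNTS_ON_IP = "temporary accounts on an IP or range"


def table_for_lookup(lookup: LookupType) -> str:
    """Return the table the contributions query should read from."""
    # Only the IP/range lookup needs the IP data stored in cu_changes.
    if lookup is LookupType.TEMP_ACCOUNTS_ON_IP:
        return "cu_changes"
    # Account lookups need no private data, so the revision table is enough.
    return "revision"
```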

Event Timeline

Dreamy_Jazz renamed this task from {component}: {use imperative mood to describe desired outcome} to GlobalContributions: Perform queries using revision table if lookup does not require use of cu_changes table. (Mar 12 2025, 1:32 PM)
mszabo changed the task status from Open to In Progress. (Mar 14 2025, 1:20 PM)
mszabo claimed this task.
mszabo triaged this task as Medium priority.

Change #1127896 had a related patch set uploaded (by Máté Szabó; author: Máté Szabó):

[mediawiki/extensions/CheckUser@master] GlobalContributions: Eschew JOIN on cu_changes unless necessary

https://gerrit.wikimedia.org/r/1127896

Change #1127896 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] GlobalContributions: Avoid JOIN on cu_changes unless necessary

https://gerrit.wikimedia.org/r/1127896

https://trace.wikimedia.org/trace/7351c34fdcc73fd47c12357b3a73de30?uiFind=ab3023597021cdb6 trace from a few seconds ago, where the train is on wmf.21

https://trace.wikimedia.org/trace/59ceb6fcf95da8e250f44656ae47e1d8?uiFind=91fbcd0474894bff trace from wmf.20, before the patch was deployed.

With the patch on wmf.21, the query for enwiki takes 11.5 ms. On wmf.20 without the patch, when the username lookup still used cu_changes, the query took 350 ms. 🎉
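For a sense of scale, the two timings above can be compared directly. This is just arithmetic on the two single-trace figures quoted in this comment, not a benchmark.

```python
# Timings quoted from the two traces (single samples, enwiki only).
before_ms = 350.0  # wmf.20, lookup went through cu_changes
after_ms = 11.5    # wmf.21, lookup uses the revision table

speedup = before_ms / after_ms
saved_ms = before_ms - after_ms

print(f"{speedup:.1f}x faster, {saved_ms:.1f} ms saved")  # 30.4x faster, 338.5 ms saved
```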

Following this change, should the checkuser-global-contributions-subtitle message be updated to reflect that - in circumstances where the query is now performed on the revision table - the results will no longer necessarily be limited to the last 90 days? (At least, I believe that this is the case!)

(edit: and probably also wikimedia-checkuser-global-contributions-registered-user-tools?)

Good point - there's no 90-day limit when the target is a user name.

I was wrong (thanks @kostajh for pointing this out) - the central index table entries are lost after 90 days for performance reasons, so the data only persists for 90 days: https://gerrit.wikimedia.org/g/mediawiki/extensions/CheckUser/+/7624c35149e6a9d39b3dd6637f3a054589af3258/src/Jobs/PruneCheckUserDataJob.php#61

Disclaimer: I haven't read all the code, so I'm not entirely sure what's happening behind the scenes, but something like https://meta.wikimedia.org/w/index.php?title=Special%3AGlobalContributions&target=A+smart+kitten&end=2024-01-01 currently displays edits.

Ctrl+F for cuci in https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CheckUser/+/de6d8c356b50bb67c730de779adbf862a8b07da3/src/GlobalContributions/GlobalContributionsPager.php shows the central index table being queried in the fetchWikisToQuery() function - at a guess, might Special:GlobalContributions now display edits made by accounts outside of the 90-day window, but only for wikis that you have edited within the past 90 days? & if so, would it be potentially less confusing to manually limit any edits being displayed to those that have been made within the last CUDMaxAge?

> Ctrl+F for cuci in https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CheckUser/+/de6d8c356b50bb67c730de779adbf862a8b07da3/src/GlobalContributions/GlobalContributionsPager.php shows the central index table being queried in the fetchWikisToQuery() function - at a guess, might Special:GlobalContributions now display edits made by accounts outside of the 90-day window, but only for wikis that you have edited within the past 90 days?

You are correct in your guess. However, one additional point is that it also includes any wiki where you have performed any action that CheckUser logs. For example, a user who logs in to enwiki but has made no edits there in the last 90 days will still have their edits displayed, because some logged action exists.

> & if so, would it be potentially less confusing to manually limit any edits being displayed to those that have been made within the last CUDMaxAge?

I would agree with this, because we still have this limit in place when querying temporary account contributions on an IP or range. Plus, as I've mentioned above, these extra contributions would not appear consistently.
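A minimal sketch of the proposed limit, assuming CUDMaxAge is a number of seconds equal to the 90 days discussed above. The function name and the revision dictionaries are hypothetical, and this is not the code from the CheckUser extension.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CUDMaxAge is expressed in seconds and equals 90 days.
CUD_MAX_AGE_SECONDS = 90 * 24 * 60 * 60


def visible_revisions(revisions, now, max_age_seconds=CUD_MAX_AGE_SECONDS):
    """Keep only revisions made within the last max_age_seconds."""
    cutoff = now - timedelta(seconds=max_age_seconds)
    return [rev for rev in revisions if rev["timestamp"] >= cutoff]


now = datetime(2025, 4, 1, tzinfo=timezone.utc)
revisions = [
    {"rev_id": 2, "timestamp": datetime(2025, 3, 1, tzinfo=timezone.utc)},    # inside the window
    {"rev_id": 1, "timestamp": datetime(2023, 12, 31, tzinfo=timezone.utc)},  # older than the cutoff
]

# Only the recent revision survives, regardless of which table it was read from.
print([rev["rev_id"] for rev in visible_revisions(revisions, now)])
```

Applying the same cutoff to every lookup type would keep account lookups consistent with the IP and range lookups, which are already limited this way.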

Change #1139566 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Hide revisions from more than CUDMaxAge seconds ago in Special:GC

https://gerrit.wikimedia.org/r/1139566

Change #1139566 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Hide revisions from more than CUDMaxAge seconds ago in Special:GC

https://gerrit.wikimedia.org/r/1139566

Change #1149694 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] phpunit: Ensure wgCentralIdLookupProvider defaults to "local"

https://gerrit.wikimedia.org/r/1149694

Change #1149694 merged by jenkins-bot:

[mediawiki/core@master] phpunit: Ensure wgCentralIdLookupProvider defaults to "local"

https://gerrit.wikimedia.org/r/1149694