
Unpurged renameuser rows in recentchanges causing isDenseTagFilter to return true inappropriately
Closed, Resolved (Public)

Description

In enwiki.recentchanges, there are 11 rows that predate the 1 month cutoff. The oldest is from 2024-08-10. They are all renameuser/renameuser log actions. This is causing a severe performance regression in Special:RecentChanges because isDenseTagFilter() is returning true for some filters that are not actually dense.

The formula is:

$isDense = $limit * $rcSize < $tagCount * $tagCount;

$rcSize is ~132M when it should be ~10M. For filter 656, $tagCount is ~136k when it should be ~13k. So for limit = 100, the inequality evaluates to 1.3e10 < 1.9e10 (true) when it should be 9.9e8 < 1.8e8 (false).
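The arithmetic above can be checked with a minimal Python sketch. The function name `is_dense` and its parameter names are illustrative, not the MediaWiki implementation; the inputs are the rounded figures quoted in this description.

```python
def is_dense(limit: int, rc_size: float, tag_count: float) -> bool:
    """Mirror of the quoted formula: limit * rcSize < tagCount * tagCount."""
    return limit * rc_size < tag_count * tag_count


limit = 100

# Inflated estimates reported in the description.
# 100 * 132e6 = 1.32e10, and 136e3 ** 2 is roughly 1.85e10.
observed = is_dense(limit, rc_size=132e6, tag_count=136e3)

# Estimates the description says were expected (rounded).
# 100 * 10e6 = 1.0e9, and 13e3 ** 2 = 1.69e8.
expected = is_dense(limit, rc_size=10e6, tag_count=13e3)

print(observed, expected)  # True False
```

With the inflated estimates the filter is classified as dense; with the expected estimates it is not.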

Experimentally, using JOIN instead of STRAIGHT_JOIN reduces the query time from 10.6s to 0.5s.

Event Timeline

RecentChangesUpdateJob uses RecentChange::getQueryInfo() which joins on the actor table. The actor rows are missing. RecentChangesUpdateJob does not construct a RecentChange object and has no need for the actor table join so there is no benefit to using RecentChange::getQueryInfo().

Actually, there is a hook, RecentChangesPurgeRows, which exists solely for ORES, and ORES does use one other field from the result set (though not one from the actor join).
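To illustrate why a missing actor row can keep an expired row from being selected, here is a minimal sketch using Python's built-in sqlite3 module. The three-column schema is a heavily simplified assumption, and the queries are illustrative rather than the ones RecentChangesUpdateJob actually issues; only the inner-join behaviour is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed schema: only the columns needed for the demo.
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, actor_name TEXT);
    CREATE TABLE recentchanges (
        rc_id INTEGER PRIMARY KEY,
        rc_timestamp TEXT,
        rc_actor INTEGER
    );
    INSERT INTO actor VALUES (1, 'Alice');
    -- rc_id 1: expired row, actor row present
    -- rc_id 2: expired row, actor row missing
    -- rc_id 3: recent row
    INSERT INTO recentchanges VALUES
        (1, '20240810000000', 1),
        (2, '20240810000001', 99),
        (3, '20250820000000', 1);
""")

cutoff = "20250725000000"

# Selecting expired rows through an inner join on actor skips rc_id 2.
with_join = [row[0] for row in conn.execute(
    "SELECT rc_id FROM recentchanges JOIN actor ON actor_id = rc_actor "
    "WHERE rc_timestamp < ? ORDER BY rc_id", (cutoff,))]

# Selecting from recentchanges alone finds both expired rows.
without_join = [row[0] for row in conn.execute(
    "SELECT rc_id FROM recentchanges WHERE rc_timestamp < ? ORDER BY rc_id",
    (cutoff,))]

print(with_join)     # [1]
print(without_join)  # [1, 2]
```

Any purge that deletes only the rows returned by the joined query would leave rc_id 2 in place indefinitely.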

> the 3 month cutoff.

Please correct me if I'm wrong, but the cutoff for RC is 30 days. It's CU that has a 3 month cutoff:

wmf-config/CommonSettings.php:$wgRCMaxAge = 30 * 86400;

I assume most of the ones older than 30 days are probably rename user entries too.
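For reference, a small sketch of how a 30 * 86400 second maximum age translates into a 14-digit cutoff timestamp of the kind used in rc_timestamp comparisons. The helper name `rc_cutoff` and the example "now" value are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

RC_MAX_AGE = 30 * 86400  # seconds, as in the quoted $wgRCMaxAge setting


def rc_cutoff(now: datetime, max_age: int = RC_MAX_AGE) -> str:
    """Return the cutoff as a 14-digit YYYYMMDDHHMMSS string."""
    return (now - timedelta(seconds=max_age)).strftime("%Y%m%d%H%M%S")


# Hypothetical current time, used only for this example.
now = datetime(2025, 8, 28, 0, 0, 0, tzinfo=timezone.utc)

print(RC_MAX_AGE)      # 2592000
print(rc_cutoff(now))  # 20250729000000
```

Rows with an rc_timestamp lexically smaller than this string are older than the configured maximum age.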

> RecentChangesUpdateJob uses RecentChange::getQueryInfo() which joins on the actor table. The actor rows are missing. RecentChangesUpdateJob does not construct a RecentChange object and has no need for the actor table join so there is no benefit to using RecentChange::getQueryInfo().

On the other hand, it should never happen that the actor row is missing. In this case it is fallout from T398177.

Okay if I simply drop the old rows (until it's properly fixed in code too)?

> the 3 month cutoff.

> Please correct me if I'm wrong, but the cutoff for RC is 30 days. It's CU that has a 3 month cutoff:

> wmf-config/CommonSettings.php:$wgRCMaxAge = 30 * 86400;

Yes, but the other numbers are still correct, because I didn't use that figure anywhere. The rc_id ranges I used were correct. I'll just edit the task description.

> On the other hand, it should never happen that the actor row is missing. In this case it is fallout from T398177.

Confirmed, but I'm still working on a patch to get rid of the join.

> Okay if I simply drop the old rows (until it's properly fixed in code too)?

Yes, since the root cause has been identified it's fine to drop them.

cumin2024@db1163.eqiad.wmnet[enwiki]> delete from recentchanges where rc_timestamp < '20250725000000';
Query OK, 11 rows affected (0.009 sec)

https://logstash.wikimedia.org/goto/4fdb0da10c6be0cc2c0d457ed46e9519

It didn't make a dent in slow queries :(

[Attached graph: grafik.png]

Maybe I'm doing it wrong. I'll check.

> cumin2024@db1163.eqiad.wmnet[enwiki]> delete from recentchanges where rc_timestamp < '20250725000000';
> Query OK, 11 rows affected (0.009 sec)

You should run this on other wikis too, as the issue in T398177 is not specific to enwiki. (It may not be causing slow queries on smaller wikis now, but it eventually might.)

My graph is only on enwiki. I will fix other wikis too tomorrow.

> It didn't make a dent in slow queries :(
> Maybe I'm doing it wrong. I'll check.

I can see with X-Wikimedia-Debug that the query for this URL takes 560ms now, whereas in the log yesterday it took 8.8 seconds. How much it impacts the stats overall depends on how often we get these kinds of queries and how many tags were affected. Let's just call it a small win for a small effort.

Change #1182683 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] recentchanges: Stop doing a join when purging RecentChanges rows

https://gerrit.wikimedia.org/r/1182683

Change #1182684 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/ORES@master] Use the new RecentChangesPurgeQuery hook

https://gerrit.wikimedia.org/r/1182684

Change #1182683 merged by jenkins-bot:

[mediawiki/core@master] recentchanges: Stop doing a join when purging RecentChanges rows

https://gerrit.wikimedia.org/r/1182683

Change #1182684 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] Use the new RecentChangesPurgeQuery hook

https://gerrit.wikimedia.org/r/1182684

Mentioned in SAL (#wikimedia-operations) [2025-08-28T11:52:31Z] <Amir1> delete from recentchanges where rc_timestamp < '20250725000000'; on all.dblist (T403002)

Ladsgroup assigned this task to tstarling.
Ladsgroup moved this task from Triage to Done on the DBA board.