Page MenuHomePhabricator

Move category changes from RecentChanges/Watchlist to separate Special page
Open, Needs TriagePublicFeature

Description

We need to make adjustments to the recentchanges table to ease the load on the database servers (esp on large wikis)

We need to make the recentchanges table smaller, so we want to move rows with rc_source='mw.categorize' to a separate table, and for users to use a different special page (probably Special:CategoryChanges (instead of Special:RecentChanges)) to observe these kinds of changes

Here are the steps:

  • create a new table to hold the category changes
  • writing category changes to the new table as well as to the current table
  • create the new Special page (with a user option "show only changes from my watchlist")
  • encourage users to migrate
  • remove the "Category changes" from the filters on Special:RecentChanges and Special:Watchlist
  • stop writing category changes to recentchanges

(note there's no need to actually migrate the data, we can just let the old data expire)

Event Timeline

Cparle updated the task description. (Show Details)
Cparle renamed this task from Separate mw.categorize changes from other recent changes to Move category changes from RecentChanges/Watchlist to separate Special page.Jul 14 2025, 2:42 PM

As an alternative, we can even split it to a new extension, since this proposed table will be empty by default in MediaWiki ($wgRCWatchCategoryMembership is false by default).

As an alternative, we can even split it to a new extension, since this proposed table will be empty by default in MediaWiki ($wgRCWatchCategoryMembership is false by default).

Doing this as a separate extension adds a large extra burden on the development that isn't necessary for medium-term performance-related work. It can be moved out to an extension later once the work is complete, if people really want, but given the feature is built into core already, having that argument now is not a good use of time.

After talking with design and product, it sounds like keeping category changes in Special:RecentChanges may be desirable over moving it out to a new page; we're talking through the idea of making category changes a mutually exclusive option; eg. selecting category changes deselects all other changes and queries the new table. Selecting changes besides category changes deselects category changes and queries the recentchanges table. The overall requirements for doing it this way would be very similar in terms of updating code for category change queries.

After talking with design and product, it sounds like keeping category changes in Special:RecentChanges may be desirable over moving it out to a new page; we're talking through the idea of making category changes a mutually exclusive option; eg. selecting category changes deselects all other changes and queries the new table. Selecting changes besides category changes deselects category changes and queries the recentchanges table. The overall requirements for doing it this way would be very similar in terms of updating code for category change queries.

And here is the design work for that approach T400523: Designs for making category changes an exclusive choice on all interfaces

After talking with design and product, it sounds like keeping category changes in Special:RecentChanges may be desirable over moving it out to a new page;

+1. Moving it them another special page would greatly reduce their visibility.

we're talking through the idea of making category changes a mutually exclusive option; eg. selecting category changes deselects all other changes and queries the new table.

Couldn’t this restriction be lifted? Since category changes are not visible by default, most requests wouldn’t want to query the new table anyway, so I’m hopeful that the few that do, would be okay in terms of response time. In terms of disk space usage, it wouldn’t matter; the new, more optimized¹ table would be the improvement, not the times we query it.

¹ Looking at the current schema, about half of the columns can be dropped:

  • rc_namespace – it’s always NS_CATEGORY anyway
  • rc_comment_id – can be replaced by a tinyint(2) column for storing the four possible comments (addition or removal, transcluded or not); the main win here is that the corresponding lines in the comment don’t need to exist
  • rc_minor – always zero
  • rc_new – being dropped from recentchanges as well (T36320)
  • rc_source and rc_type – the new table implicitly includes the type (the latter is being dropped from recentchanges as well, T74157)
  • rc_patrolled – category changes always count as patrolled
  • rc_old_len and rc_new_len – always NULL
  • rc_logid, rc_log_type, rc_log_action – not a log entry
  • rc_params – can be replaced by a tinyint(1) column storing whether the category is hidden; the added array key is already included in the above-mentioned tinyint(2) field. The tinyint(1) column would also be usable in the SELECT query – currently all rows are read and (depending on user preferences) those that correspond to hidden categories are skipped in PHP code; now they could be skipped at the SQL level.

Also, rc_title could be replaced with a foreign key to the linktarget table ­– with T299951: Normalize categorylinks table mostly done, all added and removed categories should have entries in that table (although we should be careful not to remove entries for removed categories prematurely).

And rc_bot, rc_ip and rc_last_oldid are derived from the original RC entry (available through rc_this_oldid), so maybe they can be dropped and joined on display? Of course, this is a tradeoff between disk usage and query complexity, so it may not be worth it. But dropping at least rc_ip is definitely worth it: it’s not used by Special:RecentChanges or Special:Watchlist (as far as I see, it’s only used for autoblocks in core, but autoblocks can just as well use the original RC entries; it’s also used by AbuseFilter and a maintenance script in CheckUser, but they can join it against recentchanges if they are interested in category changes at all), and dropping it would keep personally identifiable information out of the table.

After talking with design and product, it sounds like keeping category changes in Special:RecentChanges may be desirable over moving it out to a new page; we're talking through the idea of making category changes a mutually exclusive option; eg. selecting category changes deselects all other changes and queries the new table. Selecting changes besides category changes deselects category changes and queries the recentchanges table. The overall requirements for doing it this way would be very similar in terms of updating code for category change queries.

And here is the design work for that approach T400523: Designs for making category changes an exclusive choice on all interfaces

FWIW, this will add a lot of complexity to the implementation of the change and would make the rc code much more entangled and harder to maintain. If you really really want it and dev teams are willing to. Please go for it but I warn you, it's going to be a mess down the road.

Couldn’t this restriction be lifted? Since category changes are not visible by default, most requests wouldn’t want to query the new table anyway, so I’m hopeful that the few that do, would be okay in terms of response time. In terms of disk space usage, it wouldn’t matter; the new, more optimized¹ table would be the improvement, not the times we query it.

That's not the reason it's being split. RC/Watchlist special pages practically allow free form SQL and that forces MariaDB to scan many rows just to discard them right away and that's slowing down the whole page. So while it's hidden, just the fact that there are as many rows for cat changes in enwiki than all edits combined or 2.4 times in commons (24 million rows for cat changes, 10 million for edits) is slowing down practically all special page queries even when it's marked as hidden (in commons, it's basically tripling the load page of all of them). You might say just add index but I need to tell you that this table already has ten indexes (12 while we are migrating stuff) and it's not enough.

Also, rc_title could be replaced with a foreign key to the linktarget table ­– with T299951: Normalize categorylinks table mostly done, all added and removed categories should have entries in that table (although we should be careful not to remove entries for removed categories prematurely).

No. 1- RC tables are de-normalization tables and shouldn't be further normalized. IMO, actor and comment normalizations should actually be reverted. This beside adding one more join that could cause major issues (finding the right join order is a p-np complete problem and that's why finding a correct order after six seven table which rc query is already hitting is a big problem). adds a lot of complexity in terms on maintenance of data. Plus, that table is for links table and for example, it won't exist in core databases of s4 in several months (T398709)

And rc_bot, rc_ip and rc_last_oldid are derived from the original RC entry (available through rc_this_oldid), so maybe they can be dropped and joined on display? Of course, this is a tradeoff between disk usage and query complexity, so it may not be worth it.

As I said above, the table is the de-normalization/summary table. The approach to improving its performance is different than what we usually do and those changes won't give much benefit while actually making things worse.

But dropping at least rc_ip is definitely worth it: it’s not used by Special:RecentChanges or Special:Watchlist (as far as I see, it’s only used for autoblocks in core, but autoblocks can just as well use the original RC entries; it’s also used by AbuseFilter and a maintenance script in CheckUser, but they can join it against recentchanges if they are interested in category changes at all), and dropping it would keep personally identifiable information out of the table.

It's not populated in our production (last I checked) and dropping it won't make a difference. As I said, the approach to improve such tables should be on reducing rows not columns.

That's not the reason it's being split. RC/Watchlist special pages practically allow free form SQL and that forces MariaDB to scan many rows just to discard them right away and that's slowing down the whole page. So while it's hidden, just the fact that there are as many rows for cat changes in enwiki than all edits combined or 2.4 times in commons (24 million rows for cat changes, 10 million for edits) is slowing down practically all special page queries even when it's marked as hidden (in commons, it's basically tripling the load page of all of them). You might say just add index but I need to tell you that this table already has ten indexes (12 while we are migrating stuff) and it's not enough.

I mean “hidden” from a user perspective. With the new table, we can (and, of course, should) simply skip the query if the respective checkbox is not checked. If Special:RecentChanges does not query the table for a given request, how would that be less performant than another special page not querying it?

Also, rc_title could be replaced with a foreign key to the linktarget table ­– with T299951: Normalize categorylinks table mostly done, all added and removed categories should have entries in that table (although we should be careful not to remove entries for removed categories prematurely).

No. 1- RC tables are de-normalization tables and shouldn't be further normalized. IMO, actor and comment normalizations should actually be reverted. This beside adding one more join that could cause major issues (finding the right join order is a p-np complete problem and that's why finding a correct order after six seven table which rc query is already hitting is a big problem). adds a lot of complexity in terms on maintenance of data. Plus, that table is for links table and for example, it won't exist in core databases of s4 in several months (T398709)

Okay, I get your argument about de-normalization. (As I wrote, rc_comment_id can go away, practically reverting the comment normalization.)

It's not populated in our production (last I checked) and dropping it won't make a difference. As I said, the approach to improve such tables should be on reducing rows not columns.

Removing a column that is not used still improves performance, even if it’s not the main problem, doesn’t it? If you have shorter rows, you can load more rows into the memory. Furthermore, according to https://www.mediawiki.org/wiki/Manual:Recentchanges_table#Indexes, rc_ip has a one-column index, so not adding the column to the new table would also decrease the number of indices the new table starts with.

I believe we can split rc_ip to a new table (which will not exist in production), since (1) it is not related to the recentchanges feature (2) the only query pattern is (when CheckUser is not installed) querying the most recent IP for autoblock, so we only need one row per user instead of per edit.

As I said above, the table is the de-normalization/summary table. The approach to improving its performance is different than what we usually do and those changes won't give much benefit while actually making things worse.

How are rc_log_type, rc_log_action and rc_params used? There are no indices on these columns, and it seems no filters at Special:RecentChanges uses these columns.

If they are only used to generate displays for already selected recentchange entries, then we do not need columns on recentchanges table. We can just have an additional query on logging table (no join needed) with a set of rc_logid (there are no more than 500 such IDs in each request).

Similarly I also doubt the usefulness of rc_old_len and rc_new_len. In addition in order to show a diff of size, we only need one column instead of two even if we get everything from recentchanges instead of (either joining or querying in addition) revision table.

As I said above, the table is the de-normalization/summary table. The approach to improving its performance is different than what we usually do and those changes won't give much benefit while actually making things worse.

How are rc_log_type, rc_log_action and rc_params used? There are no indices on these columns, and it seems no filters at Special:RecentChanges uses these columns.

Query. For example, Nuke uses the first two to search for new uploads, API action=query&list=allusers uses rc_log_type to implement the activeusers parameter (yuck! that’s an unindexed query in an API GET request). Wikibase queries by rc_params – which don’t seem to be log parameters; contrary to the documentation, rc_params stores other kind of data for non-log RC entries.

That's not the reason it's being split. RC/Watchlist special pages practically allow free form SQL and that forces MariaDB to scan many rows just to discard them right away and that's slowing down the whole page. So while it's hidden, just the fact that there are as many rows for cat changes in enwiki than all edits combined or 2.4 times in commons (24 million rows for cat changes, 10 million for edits) is slowing down practically all special page queries even when it's marked as hidden (in commons, it's basically tripling the load page of all of them). You might say just add index but I need to tell you that this table already has ten indexes (12 while we are migrating stuff) and it's not enough.

I mean “hidden” from a user perspective. With the new table, we can (and, of course, should) simply skip the query if the respective checkbox is not checked. If Special:RecentChanges does not query the table for a given request, how would that be less performant than another special page not querying it?

The problem is when the user marks the checkbox and then mariadb needs to do a union which means it's going to scan a lot of rows. It's certainly better than status quo but if it's not exclusive (or separate special page), that's gonna be painful.

It's not populated in our production (last I checked) and dropping it won't make a difference. As I said, the approach to improve such tables should be on reducing rows not columns.

Removing a column that is not used still improves performance, even if it’s not the main problem, doesn’t it? If you have shorter rows, you can load more rows into the memory. Furthermore, according to https://www.mediawiki.org/wiki/Manual:Recentchanges_table#Indexes, rc_ip has a one-column index, so not adding the column to the new table would also decrease the number of indices the new table starts with.

Let me explain with numbers: The largest rc table in production is commons with 40M rows. Each rc_ip takes two bytes. In total it takes only 80MB . It's nothing. Even if you count the index, it's still nothing. If you multiply it by five, it's still nothing. As long as RC table makes queries that require scanning a lot of rows, none of these ideas could make a dent. It's nice design and all but the ROI is negative and we are better focused on doing something else. If you want to do it as volunteer, in your free time, sure. But this is not a good use of engineering time bought by donors money.

The hidecategorization and watchlisthidecategorization options default to true, which is the root cause of the table scanning. Only hidden rows contribute to excessive table scanning — shown rows are counted towards the display limit.

Reimplementing the feature as a union should be fine for performance because the union is only active when the rows are shown. Similarly adding a generated column rc_is_categorization defined as rc_source = 'mw.categorize' and indexing on that would work well enough: a condition like rc_is_categorization IN (0, 1) would be optimized to run as a union.

I investigated indexes on generated columns in the context of T389028, and I updated our minimum DBMS versions sufficiently so that we can actually do this in core if we want to. But a separate table is easier to deploy, and the new table would have smaller rows as @Tacsipacsi pointed out above.

Looking back at the notes for T9148, it is surprising how much time we spent on it and all the user acclaim we got from it for what turned out to be a non-default option.

WikiUsers with hidecategorization=0Users with watchlisthidecategorization=0
enwiki43464713
commonswiki11821478

Nobody ever asked for the Special:RecentChanges feature, that was an accident caused by the implementation. We can remove these events from Special:RecentChanges and every other consumer of the RecentChange abstraction without reopening T9148. But T9148 does require a union on Special:Watchlist.

Both an exclusive filter and a separate special page have poor usability. Nobody wants to check multiple pages habitually and doing so is similar to a union in terms of performance.

The simplest fix for this task is to set $wgRCWatchCategoryMembership = false and that would be defensible given its status as a non-default option.

Reimplementing the feature as a separate table and doing union in watchlist (and RC if there is a need) sounds like a good idea to me. Do you think it'd be easy for you to implement it? I'd be happy to do the reviews and such.

Reimplementing the feature as a separate table and doing union in watchlist (and RC if there is a need) sounds like a good idea to me. Do you think it'd be easy for you to implement it?

There's 9 WMF-deployed extensions that modify the ChangesListSpecialPage query, adding joins, fields and conditions, they may need some migration work. Doing the union on Special:Watchlist only doesn't help much because most of the complexity comes from ChangesListSpecialPage which will need to be aware of the union anyway.

There's some code smell, like

  • Global service container access in isRowApplicableCallable callbacks
  • GrowthExperiments conditionally adding a 0=1 condition
  • The long list of parameters to queryCallable
  • Extensions implementing filters outside of the filter framework, using the SpecialRecentChangesPanel and ChangesListSpecialPageQuery hooks.

It would benefit from having some sort of filter container.

Also a schema-aware union query builder which would be able to turn off a branch of the union if it saw a condition like rc_namespace=0.

Yeah, the state of the code of RC was the biggest reason behind me suggesting a dedicated special page. If you refactor it to be much cleaner, it can help in many other areas too (e.g. index hints based on conditions)

JWheeler-WMF subscribed.

@tstarling I assigned you to this task, given I think you're working on it. If you think it should be assigned to someone else, please re-assign!

I investigated the likely performance benefit of this change.

Using db2202, a depooled replica of s1, I created a copy of the recentchanges table, and I created two tables with subsets of that snapshot table: one with only categorizations and with no categorizations.

Row counts by source:

rc_sourcecountrelative count
mw.categorize353487436.4%
mw.edit472282448.6%
mw.log2589662.7%
mw.new2641502.7%
wb9332989.6%

I randomly chose two slow queries from the MediaWiki slow query logs and ran them against the snapshot and the reduced snapshot. One of the queries was severely filtered by a tag filter, and the other was severely filtered by an ORES filter. Both excluded categorization changes and so benefit from this change.

I used the slow query log to determine the actual number of examined rows. For the query time, I ran each query four times, and used the average of the last three runs.


  • URL: hideWikibase=1 & hidebots=1 & hidecategorization=1 & limit=100 & tagfilter=OAuth CID: 5481
  • Logged query time: 26.3
  • COUNT(*), limit removed: 59
  • COUNT(*), tag filter removed: 502584
MetricBaselineInterventionImprovement
Query time (s)10.69.113.8%
Total rows examined1804572153492514.9%

I also tested the union variant of this query, and its running time was 10.0s, so 5.5% faster than the baseline.


  • URL: damaging=likelybad;verylikelybad & goodfaith=likelybad;verylikelybad & hideWikibase=1 & hidebots=1 & hidecategorization=1 & hidepreviousrevisions=1
  • Logged query time: 6.9
  • COUNT(*), limit removed: 1272
  • COUNT(*), ORES filter removed: 355142
MetricBaselineInterventionImprovement
Query time (s)5.24.022.7%
Total rows examined106343083259421.7%

The intervention reduces the number of recentchanges rows scanned by about 36%, but both queries need to scan many rows in a joined table, so the overall impact is less than 36%.

Are these two queries representative? I examined a full day of the slow log for enwiki Special:RecentChanges. There were 88 requests in which the query took more than 5 seconds. 9 were essentially identical to the tagfilter query above. 39 were essentially identical to the ORES query above. The remainder mostly had either restrictive tag filters or restrictive ORES filters or both, so I think the analysis applies to most queries.

Consider, as an alternative for the ORES filter, an ORES summary table:

CREATE TABLE ores_summary_202508 (
  os_rc_id BIGINT PRIMARY KEY,
  os_damaging TINYINT, 
  os_goodfaith TINYINT, 
  INDEX (os_damaging, os_goodfaith), 
  INDEX (os_goodfaith, os_damaging)
);

INSERT INTO ores_summary_202508 SELECT 
  rc_id AS os_rc_id,
  CASE 
    WHEN d.oresc_probability<0.149 THEN 0 
    WHEN d.oresc_probability<0.301 THEN 1
    WHEN d.oresc_probability<0.629 THEN 2
    WHEN d.oresc_probability<0.944 THEN 3
    ELSE 4 
  END AS os_damaging,
  CASE 
    WHEN gf.oresc_probability<0.065 THEN 0 
    WHEN gf.oresc_probability<0.353 THEN 1 
    WHEN gf.oresc_probability<0.777 THEN 2 
    WHEN gf.oresc_probability<0.925 THEN 3
    ELSE 4 
  END AS os_goodfaith
  FROM recentchanges_202508
  JOIN ores_classification AS d ON d.oresc_model = 59 AND d.oresc_rev=rc_this_oldid
  JOIN ores_classification AS gf ON gf.oresc_model = 60 AND gf.oresc_rev=rc_this_oldid;

The thresholds are configured, so are reasonably stable.

The row counts, with the four regions selected by the previous example RC query in bold:

damaging = 01234
goodfaith = 0000438962
10085844405682
212313460172084576431
3584371920351240509860
46290450104563533430

The modified query still has the ores_classification joins, which are needed for display, but now it filters on the summary table.

SELECT  rc_id,rc_timestamp,rc_namespace,rc_title,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,rc_actor,recentchanges_actor.actor_user AS `rc_user`,recentchanges_actor.actor_name AS `rc_user_text`,recentchanges_comment.comment_text AS `rc_comment_text`,recentchanges_comment.comment_data AS `rc_comment_data`,recentchanges_comment.comment_id AS `rc_comment_id`,page_latest,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE (ct_rc_id=rc_id)  ) AS `ts_tags`,fp_stable,fp_pending_since,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`  FROM `recentchanges_202508` STRAIGHT_JOIN `actor` `recentchanges_actor` ON ((actor_id=rc_actor)) STRAIGHT_JOIN `comment` `recentchanges_comment` ON ((comment_id=rc_comment_id)) LEFT JOIN `page` ON ((rc_cur_id=page_id)) LEFT JOIN `flaggedpages` ON ((fp_page_id = rc_cur_id)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = 59 AND (ores_damaging_cls.oresc_rev=rc_this_oldid) AND ores_damaging_cls.oresc_class = 1) LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON (ores_goodfaith_cls.oresc_model = 60 AND (ores_goodfaith_cls.oresc_rev=rc_this_oldid) AND ores_goodfaith_cls.oresc_class = 1)  LEFT JOIN `ores_summary_202508` ON os_rc_id=rc_id  WHERE rc_bot = 0 AND ((rc_this_oldid = page_latest OR rc_type = 3)) AND (rc_type != 6) AND (rc_source != 'wb') AND os_damaging IN (3,4) AND (rc_type NOT IN (3,5)) AND os_goodfaith IN (0,1) AND (rc_type NOT IN (3,5)) AND (rc_timestamp >= '20250819210821')  ORDER BY rc_timestamp DESC LIMIT 500;
MetricBaseline (s)Intervention (s)Improvement
Query time5.20.4092.3%
Total rows examined106343011163389.5%

For the change tag queries, it turns out that the problem is that there are 10 old rows in enwiki.recentchanges, that have been stuck there for a year, so MIN(rc_id) is much smaller than it should be. So the row count estimate for filter 656 is 136636 when it should be 13432. So isDenseTagFilter() returns true, causing STRAIGHT_JOIN to be used. Running the query with a plain join results in a query time of 0.49 seconds.

Are these two queries representative? I examined a full day of the slow log for enwiki Special:RecentChanges. There were 88 requests in which the query took more than 5 seconds. 9 were essentially identical to the tagfilter query above. 39 were essentially identical to the ORES query above. The remainder mostly had either restrictive tag filters or restrictive ORES filters or both, so I think the analysis applies to most queries.

Yes and no :D In enwiki, I think this is good enough and I'm thankful for the investigation. In Commons, we have 24M rows from category changes (vs 10M for edits) so the balance is more skewed there and how big the impact of the change would be there. We do have a lot of slow queries from not just commons but also many other wikis too.

The problem with the existing ("free form SQL") framework is that it is based on exclusion, not inclusion. I have presented this in my analysis in T307328#10715066.

When you take a look at ChangesListSpecialPage::$filterGroupDefinitions, the conditions added to queries are in the form $conds[] = $dbr->expr( 'rc_type', '!=', RC_EDIT ); (will use rc_source when the T74157: [Story] Use rc_source and drop rc_type migration is done).
Hence, none of the queries could really be expected to use the available indices on rc_type/rc_source, they are effectively bound to waste time on scanning the whole table.

I wonder if this is something that could be addressed in the code. For example, we could have an interface that, given all change types/sources the user does not want to see, determines the change types/sources to filter for (i.e., what to put in rc_source IN (...)). In my opinion, this could be another easy win, especially with two indices on rc_source. Obviously, this needs a "give me all existing sources" oracle that considers extensions (Wikibase, Flow, etc.), but that could be even something like SELECT DISTINCT rc_source FROM recentchanges.

Looking back at the notes for T9148, it is surprising how much time we spent on it and all the user acclaim we got from it for what turned out to be a non-default option.

Nobody ever asked for the Special:RecentChanges feature, that was an accident caused by the implementation.

I really wish the implementation was more meaningful, and thus actually useful: https://meta.wikimedia.org/wiki/Community_Wishlist/Wishes/Improve_tracking_of_categorization_changes.

The problem with the existing ("free form SQL") framework is that it is based on exclusion, not inclusion. I have presented this in my analysis in T307328#10715066.

When you take a look at ChangesListSpecialPage::$filterGroupDefinitions, the conditions added to queries are in the form $conds[] = $dbr->expr( 'rc_type', '!=', RC_EDIT ); (will use rc_source when the T74157: [Story] Use rc_source and drop rc_type migration is done).
Hence, none of the queries could really be expected to use the available indices on rc_type/rc_source, they are effectively bound to waste time on scanning the whole table.

I wonder if this is something that could be addressed in the code. For example, we could have an interface that, given all change types/sources the user does not want to see, determines the change types/sources to filter for (i.e., what to put in rc_source IN (...)). In my opinion, this could be another easy win, especially with two indices on rc_source. Obviously, this needs a "give me all existing sources" oracle that considers extensions (Wikibase, Flow, etc.), but that could be even something like SELECT DISTINCT rc_source FROM recentchanges.

I filed T403142 for this.