Page MenuHomePhabricator

Periodically remove orphaned global_block_whitelist entries
Closed, ResolvedPublic1 Estimated Story Points

Description

Background

If an IP is locally exempted from a global block, a row is added to the global_block_whitelist table that references the ID of the block in the globalblocks table.

If the block is removed manually before the expiration date, the global_block_whitelist is not removed. Instead it may be removed if the expiry date is reached (via GlobalBlockingBlockPurger::purgeExpiredBlocks). However, the expiry may be infinite or far in the future.

Before https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GlobalBlocking/+/1012667 the row would also be removed when exempting the same IP from another block; however, this is no longer the case. (And shouldn't be relied on since the IP may never be re-expemtped.)

This hasn't been a problem so far (perhaps for reasons suggested in T360516#9645897) but may become more of a problem when global blocks for accounts are enabled.

What to do

Orphaned rows in global_block_whitelist should be removed periodically.

(The maintenance script fixGlobalBlockWhitelist.php script already exists for this, but it doesn't seem to be run periodically.)

Event Timeline

Change 1013026 had a related patch set uploaded (by Tchanders; author: Tchanders):

[mediawiki/extensions/GlobalBlocking@master] Remove global block exemption when the relevant global block is removed

https://gerrit.wikimedia.org/r/1013026

Change 1013026 abandoned by Tchanders:

[mediawiki/extensions/GlobalBlocking@master] Remove global block exemption when the relevant global block is removed

Reason:

Not needed

https://gerrit.wikimedia.org/r/1013026

Tchanders renamed this task from Remove global_block_whitelist entry when the relevant global block is removed to Add maintenance script to remove orphaned global_block_whitelist entries.Mar 20 2024, 12:00 PM
Tchanders removed a project: Patch-For-Review.
Tchanders updated the task description. (Show Details)

@Dreamy_Jazz I've updated the task following our discussion on the above patch, but I'm now not actually sure that running a maintenance script is better than deleting the old row as we add a new one.

Running the maintenance script periodically will run a bunch of selects to lookup, for all the rows in global_block_whitelist, whether a row exists in globalblocks.

Deleting as we add a new row could result in some rows never being deleted, but this might be less of a burden (and hasn't been flagged as an issue so far).

@Dreamy_Jazz I've updated the task following our discussion on the above patch, but I'm now not actually sure that running a maintenance script is better than deleting the old row as we add a new one.

On WMF wikis nearly all global blocks are made on metawiki. As such, with the implementation listed in the abandoned patch (even if it is fixed to use the correct database) will only ever delete the whitelist entries on that one wiki. Therefore, all other wikis would not have this purging method catch these blocks. Using Special:GlobalBlockWhitelist for this also doesn't seem to be a method that would catch most of the entries, because the target would have to unblocked and then globally blocked again before the special page would find these entries (which I would argue is not particularly likely).

Running the maintenance script periodically will run a bunch of selects to lookup, for all the rows in global_block_whitelist, whether a row exists in globalblocks.

The lookup could be improved by grouping the select queries to the globalblocks (i.e. looking for multiple IDs at once).

To work out the impact of a query on global_block_whitelist is, I checked the row counts for those tables on a handful of wikis. The largest row counts I found were 6 on enwiki and 9 rows on commonswiki. Therefore, IMO the lookup of all rows is not an expensive operation if we were to run this on all WMF wikis every week or so.

Deleting as we add a new row could result in some rows never being deleted, but this might be less of a burden (and hasn't been flagged as an issue so far).

I agree that this hasn't been an issue for now, because on WMF wikis it is extraordinarily rare for an IP address or range to be globally blocked with an indefinite expiry. Therefore, the whitelist entries will all eventually be purged by GlobalBlockingBlockPurger::purgeExpiredBlocks because the expiry will be met for the whitelist row. Furthermore, I would argue that a wiki overriding a global IP block is really unlikely because they are placed with a large amount of consideration for collateral and as such the global block is only made if most wikis would agree to having it.

For global account blocks, I would argue that we would see a much larger proportion of infinite expiry blocks (because account blocks do not have the same collateral or dynamic nature of who the block targets). Using "normal" blocks as an example, on enwiki only 255 rows exist where the target of the block does not have a user ID and the expiry is infinite. Around 1 million blocks exist where the expiry is infinite and target of the block has a user ID.

If the block is ever removed the whitelist entry will never be deleted because GlobalBlockingBlockPurger::purgeExpiredBlocks cannot delete whitelist entries that have a infinite expiry. With the likely large increase in infinite blocks, we are likely to see an increase in these orphaned rows that will never be purged.


In summary, I think orphaned global_block_whitelist rows will become larger over time once global account blocks are enabled and looking up all rows in the global_block_whitelist table is not a problem for the time being.

Change 1013130 had a related patch set uploaded (by Tchanders; author: Tchanders):

[operations/puppet@production] Schedule weekly purge of global_block_whitelist

https://gerrit.wikimedia.org/r/1013130

Tchanders renamed this task from Add maintenance script to remove orphaned global_block_whitelist entries to Periodically remove orphaned global_block_whitelist entries.Mar 20 2024, 7:35 PM

It would be good to be able to test this script, make it use the SelectQueryBuilders and maybe improve it to handle batching the selecting of rows from globalblocks. I'll file a task for that.

Dreamy_Jazz set the point value for this task to 1.Mon, Apr 1, 4:30 PM

Change #1013130 merged by RLazarus:

[operations/puppet@production] Schedule weekly purge of global_block_whitelist

https://gerrit.wikimedia.org/r/1013130

Mentioned in SAL (#wikimedia-operations) [2024-04-16T16:13:06Z] <rzl> rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_globalblocking-fixGlobalBlockWhitelist.service # T360516