Run `refreshGlobalimagelinks.php --pages=nonexisting` from the GlobalUsage extension
Closed, Resolved (Public)

Description

After the fixes for T183474 and T199398 are deployed, we should run the maintenance script extensions/GlobalUsage/maintenance/refreshGlobalimagelinks.php --pages=nonexisting to remove the incorrect entries caused by the bugs.

Based on some quick estimates (see comments on those tasks), this will delete at least 0.5% of the rows in the globalimagelinks table on commonswiki.

(And maybe more; it looks like the last time we did this was in 2015 for T65594, and there are some unresolved tasks like T60604 and T219964 where it's more difficult to estimate how many incorrect entries they generated and continue generating.)

Event Timeline

The changes are deployed and I have some time for this, so let's try. I'll do it for the beta cluster first, and note some details since I haven't done that before:

I start with ssh deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud per https://www.mediawiki.org/wiki/Beta_Cluster.

Checking the current data: running sql commonswiki and then select count(*) from globalimagelinks; shows 434313 entries.

I'll run foreachwiki extensions/GlobalUsage/maintenance/refreshGlobalimagelinks.php --pages=nonexisting | tee ~/T322588.log now.

Mentioned in SAL (#wikimedia-releng) [2023-01-04T19:42:43Z] <MatmaRex> Ran maintenance script refreshGlobalimagelinks.php for T322588

Script output for beta cluster: (nothing interesting)

Checking the data again: running sql commonswiki and then select count(*) from globalimagelinks; now shows 394569 entries, so the run removed around 9% of the rows!
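
For reference, the "around 9%" figure follows directly from the two counts reported for the beta cluster. A minimal sketch of the arithmetic (the variable names are only illustrative):

```python
# Row counts reported for globalimagelinks on the beta cluster,
# before and after running the maintenance script.
rows_before = 434313
rows_after = 394569

removed = rows_before - rows_after
share_removed = removed / rows_before

print(f"removed {removed} rows ({share_removed:.1%} of the table)")
```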

Mentioned in SAL (#wikimedia-operations) [2023-01-04T20:58:23Z] <Amir1> running refreshGlobalimagelinks.php on all wikis (T322588)

Verifying:

There are currently 2125234 rows with gil_wiki='commonswiki', which is about 0.33% of the globalimagelinks table; all of them are probably wrong due to this bug.

select count(*)
from globalimagelinks
where gil_wiki = 'commonswiki'

…gives 2074385 now, so that did not seem to work.

I was curious how many of the existing entries are affected by this bug:

This gives 1634414 rows, which is about 0.25% of the 641608450 rows in the globalimagelinks table. (There might be a small number of false positives in this query, see T218778#8355511.) They could be fixed with the refreshGlobalimagelinks.php maintenance script.

select count(*)
from globalimagelinks
inner join page on gil_page=page_id and gil_page_namespace_id=page_namespace and gil_page_title=page_title
where gil_wiki != 'commonswiki'

…gives 7438 now, so it looks like this part worked!
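
To put the two checks side by side, here is a rough sketch of the arithmetic, assuming the "before" counts quoted from the earlier comments and the "now" counts were produced by the same queries. The `summarize` helper and the labels are illustrative only.

```python
TOTAL_ROWS = 641_608_450  # total rows in globalimagelinks, as quoted earlier

# label -> (count quoted before the run, count reported by the query now)
checks = {
    "gil_wiki = 'commonswiki'": (2_125_234, 2_074_385),
    "gil_wiki != 'commonswiki', joined to page": (1_634_414, 7_438),
}


def summarize(before, after, total=TOTAL_ROWS):
    """Return (share of the whole table before, fraction of those rows now gone)."""
    return before / total, (before - after) / before


for label, (before, after) in checks.items():
    share, gone = summarize(before, after)
    print(f"{label}: {share:.2%} of the table before, {gone:.1%} removed")
```

Under that assumption, nearly all of the rows in the second category were removed, while the count of rows with gil_wiki = 'commonswiki' dropped only slightly.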

The rows with gil_wiki = 'commonswiki' were not removed because running the script with --pages=nonexisting only cleans up entries for file usage on pages that don't exist, not usage of files that don't exist.

For the latter, --pages=existing would do it, but it would also generate entries for all local file usage, which we don't want on Commons.
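
The distinction can be illustrated with a toy model. This is not the script's actual logic. It is a sketch that assumes each row can be described by whether the using page exists and whether the used file exists. The `UsageRow` class, its boolean fields, and the helper function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRow:
    gil_wiki: str      # wiki on which the file is used
    page_exists: bool  # hypothetical: does the using page still exist?
    file_exists: bool  # hypothetical: does the used file exist?


def removed_by_nonexisting_pages(rows):
    """Toy stand-in for --pages=nonexisting: it only considers page existence."""
    return [r for r in rows if not r.page_exists]


rows = [
    UsageRow("enwiki", page_exists=False, file_exists=True),       # stale: page is gone
    UsageRow("enwiki", page_exists=True, file_exists=True),        # valid usage
    UsageRow("commonswiki", page_exists=True, file_exists=False),  # the leftover kind
]

removed = removed_by_nonexisting_pages(rows)
left = [r for r in rows if r not in removed]
print([r.gil_wiki for r in left])
```

In this model the Commons row survives because its page exists, which matches the observation that the count for gil_wiki = 'commonswiki' barely changed.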

If anyone wants to run a delete from globalimagelinks where gil_wiki = 'commonswiki', or write a maintenance script that runs it, go for it. I don't particularly want to work on that, so I'm calling it out of scope and considering this done.

I can run it, in batches and gently. Question before I start: does it affect anything user-facing, or does the extension skip displaying entries when the target wiki is the same as the home wiki?

It is user-facing: the entries pointing to Commons are shown in global usage, and they should not be. T199398 has a test case.

Mentioned in SAL (#wikimedia-operations) [2023-01-09T20:36:33Z] <Amir1> deleting global usage coming from commons in commons (T322588)
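
The thread does not record how the batched delete was actually run. As a rough illustration of the "in batches and gently" idea only, here is a sketch against an in-memory SQLite table with a simplified, assumed schema. The `delete_in_batches` helper, the batch size, and the pause are all hypothetical, and a production run would also need to respect replication lag.

```python
import sqlite3
import time


def delete_in_batches(conn, wiki, batch_size=1000, pause=0.0):
    """Delete rows whose gil_wiki equals `wiki`, one small batch per transaction.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM globalimagelinks WHERE rowid IN ("
            " SELECT rowid FROM globalimagelinks WHERE gil_wiki = ? LIMIT ?)",
            (wiki, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        time.sleep(pause)  # give the database a break between batches
    return total


# Demo on a toy table; the real table has more columns and far more rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE globalimagelinks (gil_wiki TEXT, gil_page INTEGER, gil_to TEXT)"
)
conn.executemany(
    "INSERT INTO globalimagelinks VALUES (?, ?, ?)",
    [("commonswiki", i, f"File_{i}.jpg") for i in range(25)]
    + [("enwiki", i, f"File_{i}.jpg") for i in range(10)],
)
conn.commit()

deleted = delete_in_batches(conn, "commonswiki", batch_size=10)
remaining = conn.execute("SELECT COUNT(*) FROM globalimagelinks").fetchone()[0]
print(deleted, remaining)
```

Keeping each batch in its own short transaction limits lock time, and the pause between batches is the knob for how gently the delete proceeds.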