Import all existing images to the mediamoderation_scan table on WMF wikis
Closed, Resolved | Public | 2 Estimated Story Points

Description

Once T350863 (Create maintenance script to import all existing images to mediamoderation_scan table) and T350323 (Write an empty row to scan table on file upload) are complete, the existing images on WMF wikis will need to be imported into the mediamoderation_scan table. This task tracks that work.

Acceptance criteria
  • The maintenance script added in T350863 has been run on all WMF wikis

Related Objects

Status    Subtype     Assigned     Task
Open                  None
Open                  None
Resolved              kostajh
Resolved              None
Resolved              Tchanders
Resolved              Dreamy_Jazz
Resolved              Dreamy_Jazz
Resolved              Dreamy_Jazz
Resolved  BUG REPORT  Dreamy_Jazz
Resolved              Dreamy_Jazz
Open                  None
Resolved  BUG REPORT  Dreamy_Jazz
Resolved              Dreamy_Jazz
Resolved              kostajh
Resolved              Dreamy_Jazz
Resolved              Dreamy_Jazz
Resolved              Dreamy_Jazz
Resolved              Dreamy_Jazz
Resolved              Dreamy_Jazz
Resolved              Dreamy_Jazz

Event Timeline

Dreamy_Jazz set the point value for this task to 2. (Nov 20 2023, 4:39 PM)

This will need to wait until either:

  1. The three relevant changes (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/979167/, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/974687/, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/978155) are backported
  2. December 12-18 (depending on the specific wiki), by which point the regular train will have deployed the above changes

We ran this on testwiki today successfully.

@Dreamy_Jazz @kostajh Should we try to run this on all production wikis asap, assuming we intend to do that manually? (I know we're waiting a bit before we run the scanning script on the rest of production.)

Yes. The import on Commons will likely take a while, so we should run it as soon as possible, at least for Commons, so that we can start a scan there over the holidays.

ImportExistingFilesToScanTable was run on group0 wikis today (took around 10 minutes). Once group1 are on 1.42-wmf.10 we'll run it on those.

Import script was run on all wikis. The number of rows in mediamoderation_scan is sometimes lower than the sum of rows in image + oldimage + filearchive, as expected: some file types can't be scanned; some SHA-1s are duplicated.
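
To illustrate why the row counts differ, here is a minimal Python sketch of the counting logic. It is not the extension's actual code. The `FileRow` type, the `SCANNABLE_MIME_TYPES` allow-list, and the `expected_scan_sha1s` helper are hypothetical names introduced only for this example, and the real criteria for which file types can be scanned may differ.

```python
from typing import Iterable, NamedTuple, Set


class FileRow(NamedTuple):
    """One file row, as it might appear in image, oldimage or filearchive."""
    table: str  # source table name
    sha1: str   # SHA-1 of the file contents
    mime: str   # MIME type of the file


# Hypothetical allow-list for illustration only; the extension's real
# rules for scannable file types are not reproduced here.
SCANNABLE_MIME_TYPES = {"image/jpeg", "image/png", "image/gif"}


def expected_scan_sha1s(rows: Iterable[FileRow]) -> Set[str]:
    """Return the distinct SHA-1s of scannable files across all source rows."""
    return {row.sha1 for row in rows if row.mime in SCANNABLE_MIME_TYPES}


rows = [
    FileRow("image", "aaa", "image/jpeg"),
    FileRow("oldimage", "aaa", "image/jpeg"),    # same SHA-1 as the row above
    FileRow("image", "bbb", "application/pdf"),  # not scannable in this sketch
    FileRow("filearchive", "ccc", "image/png"),
]

scan_sha1s = expected_scan_sha1s(rows)
print(len(rows), len(scan_sha1s))
```

In this sketch four source rows produce only two scan entries: one row is dropped because its type is not in the allow-list, and two rows share a SHA-1.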

Leaving open in case there's any follow-up work to do.

I think we can close this and open new tasks as needed.