
Create maintenance script to import all existing images to mediamoderation_scan table
Closed, Resolved
Public
3 Estimated Story Points

Assigned To
Authored By
Dreamy_Jazz
Nov 9 2023, 12:27 PM
Referenced Files
F41551711: image.png
Dec 1 2023, 8:05 PM
F41551707: image.png
Dec 1 2023, 8:05 PM
F41551654: image.png
Dec 1 2023, 8:05 PM
F41551630: image.png
Dec 1 2023, 8:05 PM
F41551624: image.png
Dec 1 2023, 8:05 PM
F41551617: image.png
Dec 1 2023, 8:05 PM
F41551589: image.png
Dec 1 2023, 8:05 PM
F41551580: image.png
Dec 1 2023, 8:05 PM

Description

A maintenance script should be added that allows the importing of images from the image, oldimage and filearchive tables into the mediamoderation_scan table.

The table will contain the SHA-1 values of the images from these tables and null on all other row values.
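
As a rough illustration of the intended end state (not the extension's actual code), the import can be modelled with an in-memory SQLite database. The source columns img_sha1, oi_sha1 and fa_sha1 are the SHA-1 columns of the core tables; the mediamoderation_scan column names and the primary key on the SHA-1 are assumptions made for this sketch, and the real script would go through MediaWiki's database layer in batches rather than raw SQL.

```python
import sqlite3

# Toy schema: only the columns needed for the sketch.
# The mms_* column names are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (img_name TEXT, img_sha1 TEXT);
CREATE TABLE oldimage (oi_name TEXT, oi_sha1 TEXT);
CREATE TABLE filearchive (fa_name TEXT, fa_sha1 TEXT);
CREATE TABLE mediamoderation_scan (
    mms_sha1 TEXT PRIMARY KEY,
    mms_last_checked TEXT,
    mms_is_match INTEGER
);
INSERT INTO image VALUES ('A.png', 'sha-a'), ('B.png', 'sha-b');
INSERT INTO oldimage VALUES ('A.png', 'sha-a-old');
INSERT INTO filearchive VALUES ('C.png', 'sha-b');
""")

# Source table -> column holding the file's SHA-1.
SOURCES = {"image": "img_sha1", "oldimage": "oi_sha1", "filearchive": "fa_sha1"}


def import_existing_files(conn: sqlite3.Connection) -> None:
    """Copy every SHA-1 into the scan table, leaving other columns NULL."""
    for table, column in SOURCES.items():
        # OR IGNORE skips SHA-1 values that are already present.
        conn.execute(
            f"INSERT OR IGNORE INTO mediamoderation_scan (mms_sha1) "
            f"SELECT {column} FROM {table}"
        )
    conn.commit()


import_existing_files(conn)
rows = conn.execute(
    "SELECT mms_sha1, mms_last_checked, mms_is_match "
    "FROM mediamoderation_scan ORDER BY mms_sha1"
).fetchall()
print(rows)
```

In this toy data the same SHA-1 appears in both image and filearchive, so the scan table ends up with three rows rather than four, each with only the SHA-1 filled in.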

This maintenance script should be run when update.php is run. However, this should only be done once T350323: Write an empty row to scan table on file upload is complete so that files uploaded after the maintenance script is run are added to the table without needing to run this script again.

Acceptance criteria
  • Create this maintenance script
  • Adequately phpunit test this script to ensure reliability

Related Objects

Status     Subtype      Assigned      Task
Open                    None
Open                    None
Resolved                kostajh
Resolved                None
Resolved                Tchanders
Resolved                Dreamy_Jazz
Resolved                Dreamy_Jazz
Resolved                Dreamy_Jazz
Resolved   BUG REPORT   Dreamy_Jazz
Resolved                Dreamy_Jazz
Open                    None
Resolved   BUG REPORT   Dreamy_Jazz
Resolved                Dreamy_Jazz
Resolved                kostajh
Resolved                Dreamy_Jazz
Resolved                Dreamy_Jazz
Resolved                Dreamy_Jazz
Resolved                Dreamy_Jazz
Resolved                Dreamy_Jazz
Resolved                Dreamy_Jazz

Event Timeline

Change 974687 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/MediaModeration@master] [WIP] Add maintenance script to import existing files to scan table

https://gerrit.wikimedia.org/r/974687

Dreamy_Jazz renamed this task from Create maintenance script to import all existing images to mediamoderation_scan table to [M] Create maintenance script to import all existing images to mediamoderation_scan table. Nov 15 2023, 9:11 PM

Change 974686 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/MediaModeration@master] [WIP] Change methods to accept ArchivedFile object as well as File

https://gerrit.wikimedia.org/r/974686

Dreamy_Jazz renamed this task from [M] Create maintenance script to import all existing images to mediamoderation_scan table to Create maintenance script to import all existing images to mediamoderation_scan table. Nov 20 2023, 4:19 PM
Dreamy_Jazz set the point value for this task to 2.

Change 974686 merged by jenkins-bot:

[mediawiki/extensions/MediaModeration@master] Change methods to accept ArchivedFile object as well as File

https://gerrit.wikimedia.org/r/974686

Dreamy_Jazz changed the point value for this task from 2 to 3. Nov 22 2023, 5:53 PM

Change 977215 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/MediaModeration@master] Add importExistingFilesToScanTable.php to update.php

https://gerrit.wikimedia.org/r/977215

Change 974687 merged by jenkins-bot:

[mediawiki/extensions/MediaModeration@master] Add maintenance script to import existing files to scan table

https://gerrit.wikimedia.org/r/974687

I will write suggested QA steps for betawikis by tomorrow. QA should wait until QA on T350323 and T352234 is completed.

Suggested QA steps for betawikis:

  1. Make sure you have access to betawiki DBs, and if not get access
  2. Go to https://meta.wikimedia.beta.wmflabs.org/wiki/Special:SiteMatrix and choose a wikipedia from the list
  3. Go to that betawiki and load Special:ListFiles. Check that images appear in that list (ignore audio/video). If not, repeat step 2 to choose a different beta wikipedia.
  4. Connect to the betawiki over ssh (e.g. ssh deployment-deploy03.deployment-prep.eqiad1.wikimedia.cloud)
  5. Open the DB for your chosen wikipedia (e.g. sql dewiki)
  6. Run the following SQL and keep a note of the output:
     SELECT COUNT(*) FROM mediamoderation_scan;
  7. Run the maintenance script. This can be done via mwscript extensions/MediaModeration/maintenance/importExistingFilesToScanTable.php --wiki=dewiki, replacing dewiki with the name of the wiki you chose.
  8. Make sure the maintenance script runs without any errors and wait until it completes fully.
  9. Repeat steps 5 and 6.
  10. Verify that the count returned by the second query is larger than the first.

You can repeat the steps for the same wiki, but you would need to add --force to the end of the command in step 7, and the count will not increase when you run the query again.
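
The before/after check and the repeat-run behaviour can be pictured with a small toy model. This is only a sketch of the expected outcome: ToyWiki, its attributes and the "marked complete" flag are hypothetical stand-ins, and it assumes the script skips a repeat run unless --force is given, which is how the repeat-run note above reads.

```python
class ToyWiki:
    """Hypothetical stand-in for a wiki with files and a scan table."""

    def __init__(self, file_sha1s):
        self.file_sha1s = list(file_sha1s)  # SHA-1s of existing files
        self.scan_table = set()             # models mediamoderation_scan
        self.marked_complete = False        # models the "already run" marker

    def count_scan_rows(self) -> int:
        # Equivalent of: SELECT COUNT(*) FROM mediamoderation_scan;
        return len(self.scan_table)

    def run_import(self, force: bool = False) -> bool:
        """Return True if the import actually ran."""
        if self.marked_complete and not force:
            return False  # assumed: a repeat run is skipped without --force
        self.scan_table.update(self.file_sha1s)  # duplicates collapse
        self.marked_complete = True
        return True


wiki = ToyWiki(["sha-a", "sha-b", "sha-a"])

before = wiki.count_scan_rows()       # note the count before the script runs
ran_first = wiki.run_import()         # first run of the script
after = wiki.count_scan_rows()        # count after the first run

ran_again = wiki.run_import()                 # repeat without --force
ran_forced = wiki.run_import(force=True)      # repeat with --force
after_forced = wiki.count_scan_rows()

print(before, after, ran_again, ran_forced, after_forced)
```

The first run raises the count; the forced repeat runs but adds nothing, because every SHA-1 is already in the table.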

You can also specify the options --start-timestamp, --batch-size, and --table. You may wish to repeat the steps with these specified. In more detail:

  • --start-timestamp allows you to import images that were uploaded at or after a given timestamp. The timestamp is provided in the format YYYYMMDDHHMMSS. If this is specified, the script will not be marked as complete. You would add this to the end of the command in step 7. For example, it could be --start-timestamp 20230504030201
  • --table allows you to specify the database table(s) to import images from. This can be specified multiple times and the valid values are image, oldimage, and filearchive. You could add this to the end of the command in step 7. If this is specified without listing all three tables, the maintenance script will not be marked as complete. For example, this could be --table image --table oldimage
  • --batch-size allows you to control the number of files to import per batch. This takes an integer number and this would be added to the end of the command in step 7. For example, this could be --batch-size 40
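
To make the option combinations easier to try, here is a small helper that assembles the step 7 command and reports whether a run would be expected to be marked as complete. The function names are made up for this sketch, and the completion rule is one reading of the notes above (any --start-timestamp, or a --table selection that does not cover all three tables, leaves the script unmarked); it has not been checked against the script itself.

```python
import re

SCRIPT = "extensions/MediaModeration/maintenance/importExistingFilesToScanTable.php"
VALID_TABLES = ("image", "oldimage", "filearchive")


def build_command(wiki, start_timestamp=None, tables=(), batch_size=None, force=False):
    """Return the mwscript invocation as an argument list."""
    args = ["mwscript", SCRIPT, f"--wiki={wiki}"]
    if start_timestamp is not None:
        # The timestamp format is YYYYMMDDHHMMSS, i.e. exactly 14 digits.
        if not re.fullmatch(r"\d{14}", start_timestamp):
            raise ValueError("timestamp must be YYYYMMDDHHMMSS")
        args += ["--start-timestamp", start_timestamp]
    for table in tables:
        if table not in VALID_TABLES:
            raise ValueError(f"unknown table: {table}")
        args += ["--table", table]  # the option may be repeated
    if batch_size is not None:
        if batch_size < 1:
            raise ValueError("batch size must be a positive integer")
        args += ["--batch-size", str(batch_size)]
    if force:
        args.append("--force")
    return args


def will_be_marked_complete(start_timestamp=None, tables=()):
    """Assumed rule: only a full, unrestricted run is marked as complete."""
    if start_timestamp is not None:
        return False
    return not tables or set(tables) == set(VALID_TABLES)


cmd = build_command("dewiki", tables=["image", "oldimage"], batch_size=40)
print(" ".join(cmd))
print(will_be_marked_complete(tables=["image", "oldimage"]))
```

Passing the result to a shell as a single string is left out on purpose; on the deployment host the printed command can simply be copied and run.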

Change 979167 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/MediaModeration@master] On force make ImportExistingFilesToScanTable skip updatelog output

https://gerrit.wikimedia.org/r/979167

Change 979167 merged by jenkins-bot:

[mediawiki/extensions/MediaModeration@master] On force make ImportExistingFilesToScanTable skip updatelog output

https://gerrit.wikimedia.org/r/979167

Djackson-ctr subscribed.

I have verified that the new code has been implemented and is functioning and displaying as expected... Thank you for the QA Steps @Dreamy_Jazz


Count of images before running the maintenance script on zhwiki:

image.png (115×568 px, 5 KB)

Count of images after running the maintenance script on zhwiki:

image.png (104×610 px, 5 KB)


Count of images before running the maintenance script on viwiki:

image.png (104×564 px, 5 KB)

Count of images after running the maintenance script on viwiki using the --start-timestamp:

image.png (111×525 px, 5 KB)


Count of images before running the maintenance script on fawiki:

image.png (109×539 px, 5 KB)

Count of images after running the maintenance script on fawiki using the --table filearchive:

image.png (110×535 px, 5 KB)


Count of images before after running the maintenance script on fawiki using the --table archive:

image.png (102×544 px, 5 KB)

Count of images after running the maintenance script on fawiki using the --table image:

image.png (107×533 px, 5 KB)


Count of images before running the maintenance script on dewiki:

image.png (101×532 px, 5 KB)

Count of images after running the maintenance script on dewiki using the --table oldimage:

image.png (107×563 px, 5 KB)

Count of images after running the maintenance script on dewiki using the --table image:

image.png (102×530 px, 5 KB)

Count of images after running the maintenance script on dewiki using the --table filearchive:

image.png (101×540 px, 5 KB)


Count of images before running the maintenance script on enwiki:

image.png (107×580 px, 5 KB)

Count of images after running the maintenance script on enwiki:

image.png (98×539 px, 5 KB)

Dreamy_Jazz reopened this task as Open.

Change 977215 merged by jenkins-bot:

[mediawiki/extensions/MediaModeration@master] Add importExistingFilesToScanTable.php to update.php

https://gerrit.wikimedia.org/r/977215

The last change (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MediaModeration/+/977215/) shouldn't need QA, so this can be marked as resolved.

Change 979700 had a related patch set uploaded (by Kosta Harlan; author: Dreamy Jazz):

[mediawiki/extensions/MediaModeration@wmf/1.42.0-wmf.7] Add maintenance script to import existing files to scan table

https://gerrit.wikimedia.org/r/979700

Change 979700 merged by jenkins-bot:

[mediawiki/extensions/MediaModeration@wmf/1.42.0-wmf.7] Add maintenance script to import existing files to scan table

https://gerrit.wikimedia.org/r/979700

Mentioned in SAL (#wikimedia-operations) [2023-12-05T14:23:51Z] <urbanecm@deploy2002> Started scap: Backport for [[gerrit:979698|User impact: update quantizeViews to process small series of view data (T352349)]], [[gerrit:979700|Add maintenance script to import existing files to scan table (T350863)]], [[gerrit:979701|Only allow drawing and bitmap media types to be scanned (T352234)]]

Mentioned in SAL (#wikimedia-operations) [2023-12-05T14:25:08Z] <urbanecm@deploy2002> kharlan and urbanecm: Backport for [[gerrit:979698|User impact: update quantizeViews to process small series of view data (T352349)]], [[gerrit:979700|Add maintenance script to import existing files to scan table (T350863)]], [[gerrit:979701|Only allow drawing and bitmap media types to be scanned (T352234)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2023-12-05T14:32:47Z] <urbanecm@deploy2002> Finished scap: Backport for [[gerrit:979698|User impact: update quantizeViews to process small series of view data (T352349)]], [[gerrit:979700|Add maintenance script to import existing files to scan table (T350863)]], [[gerrit:979701|Only allow drawing and bitmap media types to be scanned (T352234)]] (duration: 08m 55s)