
[L] Create scripts to populate/update image prioritisation for CAT
Open, Needs Triage · Public

Description

Note: this can only be implemented once T250748 is done

First, make sure that all "assessed" images in the queue are prioritised so that they are processed first, by setting the priority field to some very large positive number.

Then create a script that runs daily that will fetch

  1. the most popular images from the last month (using https://wikimedia.org/api/rest_v1/#/Mediarequests%20data/get_metrics_mediarequests_top__referer___media_type___year___month___day_ )
  2. the number of requests made in the last month for all images that were uploaded on 2 random dates more than 6 months in the past (using https://wikimedia.org/api/rest_v1/#/Mediarequests%20data/get_metrics_mediarequests_per_file__referer___agent___file_path___granularity___start___end_)

... and then loop through the images with their request counts and, if and only if the image does not already have depicts tags:

  • send the image to Google for classification if it hasn't already been classified
  • set priority to a high value (max value is 128)

This should allow us to prioritise the most popular images for CAT, and to trawl through the backlog
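
A rough sketch of the selection logic described above, not the actual implementation. It makes no network or database calls. The `Image` and `Action` types, the helper names, the 183-day cutoff for "6 months", the `max_age_days` bound, and the `all-referers` / `image` / `all-days` path values are all assumptions made for illustration; only the endpoint family and the maximum priority of 128 come from this task.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta

MAX_PRIORITY = 128  # "max value is 128" per the task description
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/mediarequests"


def top_images_url(year: int, month: int) -> str:
    """URL for one month of top media requests.

    The path shape follows the "top" endpoint linked in the task; the
    referer, media type and day values used here are assumptions.
    """
    return f"{API_ROOT}/top/all-referers/image/{year}/{month:02d}/all-days"


def random_upload_dates(today, rng, count=2, min_age_days=183, max_age_days=3650):
    """Pick `count` distinct upload dates more than ~6 months in the past.

    min_age_days and max_age_days are hypothetical bounds, not from the task.
    """
    offsets = rng.sample(range(min_age_days + 1, max_age_days + 1), count)
    return [today - timedelta(days=offset) for offset in offsets]


@dataclass
class Image:  # hypothetical record combining request counts and state
    title: str
    requests: int
    has_depicts: bool
    classified: bool


@dataclass
class Action:  # hypothetical description of what the script would do
    title: str
    classify: bool
    priority: int


def plan_actions(images):
    """Decide what to do with each image, most requested first."""
    actions = []
    for img in sorted(images, key=lambda i: i.requests, reverse=True):
        if img.has_depicts:
            # Images that already have depicts tags are left alone.
            continue
        actions.append(
            Action(
                title=img.title,
                classify=not img.classified,  # only classify once
                priority=MAX_PRIORITY,
            )
        )
    return actions
```

A daily job could feed the request counts from both API calls into `plan_actions` and then pass the resulting titles and priorities to the maintenance script.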

Event Timeline

Cparle created this task. May 7 2020, 5:17 PM
Restricted Application added a subscriber: Aklapper. May 7 2020, 5:17 PM
Ramsey-WMF added a subscriber: Ramsey-WMF.

Blocked until the parent ticket is addressed.

Change 602702 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/MachineVision@master] Allow populating image priority

https://gerrit.wikimedia.org/r/602702

matthiasmullie added a subscriber: matthiasmullie. Edited Jun 5 2020, 3:14 PM

The above patch adds a priority option to the existing maintenance script. This does not yet complete this task: we'll still need to generate those file lists and run the script automatically on a regular schedule.

Change 602702 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Allow populating image priority

https://gerrit.wikimedia.org/r/602702

CBogen renamed this task from Create scripts to populate/update image prioritisation for CAT to [L] Create scripts to populate/update image prioritisation for CAT. Jul 15 2020, 4:18 PM
AnneT added a subscriber: AnneT. Jul 15 2020, 11:32 PM

@Ramsey-WMF For images that haven't previously been annotated, will a notification be sent to the user? Currently, only images that are annotated via the fetch annotations job, i.e. new uploads, will generate notifications.

Cparle updated the task description. Sep 16 2020, 4:13 PM