
[L] Allow prioritisation of images shown in "popular" tab in Special:SuggestedTags
Closed, Resolved · Public

Description

ATM for CAT we show only "assessed" (featured/valued/quality) images in the "popular" tab. We do this by selecting images entered into the database prior to Dec. 1, 2019 (when the assessed images were imported).

To allow us to prioritise images in future, we need to add an INT priority column (default zero, probably with an index) to the machine_vision_image table, and order by that (in descending order) when retrieving images for review.

Also, images from before Dec 1 2019 should have their priority set to a large positive number.

(Note: the new query to retrieve the images, and the new database field, should be reviewed by a DBA.)

Acceptance criteria

  • images in popular tab ordered by priority (descending)
  • all images in popular tab on production should still be from before Dec 1 2019

COVID-19 Deployment Criteria

  • Can you roll back this change without lasting impact?
    1. A recovery plan is required as this will help identify our capacity for recovering from the failure
    2. THIS IS A KEY QUESTION, if you can’t answer it, you shouldn’t deploy
  • Is specialized knowledge required to support this change in production? If so, are there multiple people with this knowledge?
  • Is there a way to increase confidence about the correctness of this change?
    1. Reviews (Design, Code, etc)
    2. Testing coverage (unit tests, integration tests)
    3. Manual testing (e.g. Beta, vagrant, docker)

Event Timeline

Restricted Application added a subscriber: Aklapper.
Cparle renamed this task from Design a way of designating which images should appear in which views/tabs in CAT to Prioritise images shown in "popular" tab in Special:SuggestedTags. · May 6 2020, 11:02 AM
Cparle updated the task description.
Cparle renamed this task from Prioritise images shown in "popular" tab in Special:SuggestedTags to Allow prioritisation images shown in "popular" tab in Special:SuggestedTags. · May 6 2020, 11:53 AM
Cparle updated the task description.
Cparle renamed this task from Allow prioritisation images shown in "popular" tab in Special:SuggestedTags to Allow prioritisation of images shown in "popular" tab in Special:SuggestedTags. · May 7 2020, 5:06 PM
Ramsey-WMF renamed this task from Allow prioritisation of images shown in "popular" tab in Special:SuggestedTags to [L] Allow prioritisation of images shown in "popular" tab in Special:SuggestedTags. · May 13 2020, 4:28 PM

Change 602699 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/MachineVision@master] Allow prioritisation of images shown in "popular" tab

https://gerrit.wikimedia.org/r/602699

Change 602700 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/MachineVision@master] Start using priority column when selecting images to review

https://gerrit.wikimedia.org/r/602700

Change 602701 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/MachineVision@master] Drop unused index

https://gerrit.wikimedia.org/r/602701

These are the schema changes for DBA approval, once we've approved the series of patches internally (cc @Cparle) -- this is done


  1. These queries would need to run on production (commonswiki) to set up the new schema:
ALTER TABLE /*_*/machine_vision_image ADD COLUMN mvi_priority tinyint(3) DEFAULT 0;
CREATE INDEX /*i*/mvi_priority ON /*_*/machine_vision_image (mvi_priority);
  2. This one to update the existing data:
UPDATE machine_vision_image
INNER JOIN machine_vision_label ON mvi_id = mvl_mvi_id
INNER JOIN machine_vision_suggestion ON mvs_mvl_id = mvl_id
SET mvi_priority = 127
WHERE mvs_timestamp < 20191201000000;

Note: can we do that one directly, or should we write a script to do it in batches?
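
For reference, a batched version of that update could look roughly like the sketch below. It is only an illustration: it runs against an in-memory SQLite stand-in with a reduced set of columns, `backfill_priority` and its batch size are hypothetical, and SQLite has no multi-table UPDATE, so the join moves into a SELECT that collects ids first. On production this would presumably be a maintenance script rather than Python.

```python
import sqlite3

CUTOFF = 20191201000000  # timestamp cutoff taken from the UPDATE above
HIGH_PRIORITY = 127      # priority value taken from the UPDATE above


def make_db():
    """Build an in-memory stand-in for the three tables (reduced columns)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE machine_vision_image (
            mvi_id INTEGER PRIMARY KEY,
            mvi_priority INTEGER DEFAULT 0
        );
        CREATE TABLE machine_vision_label (
            mvl_id INTEGER PRIMARY KEY,
            mvl_mvi_id INTEGER
        );
        CREATE TABLE machine_vision_suggestion (
            mvs_mvl_id INTEGER,
            mvs_timestamp INTEGER
        );
    """)
    return db


def backfill_priority(db, batch_size=1000):
    """Hypothetical helper: raise priority on old images, one batch at a time.

    Assumes mvi_priority is never NULL (existing rows get the default 0).
    Returns the number of images updated.
    """
    total = 0
    while True:
        # Find the next batch of old images that still lack the high priority.
        ids = [row[0] for row in db.execute(
            """
            SELECT DISTINCT mvi_id
            FROM machine_vision_image
            INNER JOIN machine_vision_label ON mvi_id = mvl_mvi_id
            INNER JOIN machine_vision_suggestion ON mvs_mvl_id = mvl_id
            WHERE mvs_timestamp < ? AND mvi_priority <> ?
            ORDER BY mvi_id
            LIMIT ?
            """,
            (CUTOFF, HIGH_PRIORITY, batch_size),
        )]
        if not ids:
            return total
        placeholders = ",".join("?" * len(ids))
        db.execute(
            f"UPDATE machine_vision_image SET mvi_priority = ? "
            f"WHERE mvi_id IN ({placeholders})",
            [HIGH_PRIORITY, *ids],
        )
        db.commit()  # commit per batch so each transaction stays small
        total += len(ids)
```

Because each pass only selects rows that are not yet at 127, the loop can be interrupted and re-run without redoing finished batches.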


This new query will be executed every time new images are requested, to fetch a cutoff priority:

SELECT mvi_priority
FROM machine_vision_image
INNER JOIN machine_vision_label ON mvi_id = mvl_mvi_id
WHERE mvl_review = 0
GROUP BY mvi_priority HAVING COUNT( DISTINCT mvi_id ) >= 200
ORDER BY mvi_priority DESC
LIMIT 1
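
To make the behaviour of that cutoff query concrete, here is a small sketch that runs it against an in-memory SQLite stand-in. The table layout is reduced to the columns the query touches, `make_db` and `cutoff_priority` are hypothetical helpers, and the threshold of 200 is turned into a parameter so a tiny data set can exercise it.

```python
import sqlite3

# Same query as above, with the hard-coded 200 replaced by a placeholder.
CUTOFF_SQL = """
    SELECT mvi_priority
    FROM machine_vision_image
    INNER JOIN machine_vision_label ON mvi_id = mvl_mvi_id
    WHERE mvl_review = 0
    GROUP BY mvi_priority HAVING COUNT( DISTINCT mvi_id ) >= ?
    ORDER BY mvi_priority DESC
    LIMIT 1
"""


def make_db(priorities):
    """Stand-in data: one image per priority given, two unreviewed labels each."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE machine_vision_image (
            mvi_id INTEGER PRIMARY KEY,
            mvi_priority INTEGER DEFAULT 0
        );
        CREATE TABLE machine_vision_label (
            mvl_mvi_id INTEGER,
            mvl_review INTEGER
        );
    """)
    for mvi_id, priority in enumerate(priorities, start=1):
        db.execute("INSERT INTO machine_vision_image VALUES (?, ?)", (mvi_id, priority))
        # Two labels per image, so COUNT(DISTINCT mvi_id) differs from COUNT(*).
        db.executemany(
            "INSERT INTO machine_vision_label VALUES (?, 0)",
            [(mvi_id,), (mvi_id,)],
        )
    return db


def cutoff_priority(db, min_images=200):
    """Return the highest priority held by at least min_images unreviewed images."""
    row = db.execute(CUTOFF_SQL, (min_images,)).fetchone()
    return None if row is None else row[0]


# Two images at 127, three at 5, four at 0.
db = make_db([127, 127, 5, 5, 5, 0, 0, 0, 0])
print(cutoff_priority(db, min_images=3))  # 5: the 127 bucket has only 2 images
```

Note that the count is per priority value, not cumulative, and that the query returns no row at all when no single priority has enough unreviewed images; how the extension handles that case is not covered in this comment.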

And then the existing subquery for actually fetching those images changes from this:

SELECT mvi_sha1
FROM machine_vision_image
INNER JOIN machine_vision_label ON mvi_id = mvl_mvi_id
INNER JOIN machine_vision_suggestion ON mvs_mvl_id = mvl_id
WHERE mvl_review = 0 AND mvs_timestamp < 20191201000000
ORDER BY mvi_rand DESC
LIMIT 10

to this:

SELECT mvi_sha1
FROM machine_vision_image
INNER JOIN machine_vision_label ON mvi_id = mvl_mvi_id
WHERE mvl_review = 0 AND mvi_priority >= $priority
ORDER BY mvi_rand DESC
LIMIT 10
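
As a rough illustration of how the rewritten query selects images, the sketch below runs it against an in-memory SQLite stand-in. The column types, `make_db` and `fetch_images` are assumptions for the example only; `$priority` becomes a bound parameter, and the LIMIT is parameterised so a small data set shows the effect.

```python
import sqlite3

# Same query as above, with $priority and the limit as bound parameters.
FETCH_SQL = """
    SELECT mvi_sha1
    FROM machine_vision_image
    INNER JOIN machine_vision_label ON mvi_id = mvl_mvi_id
    WHERE mvl_review = 0 AND mvi_priority >= ?
    ORDER BY mvi_rand DESC
    LIMIT ?
"""


def make_db(images):
    """Stand-in data from (sha1, priority, rand, review) tuples, one label per image."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE machine_vision_image (
            mvi_id INTEGER PRIMARY KEY,
            mvi_sha1 TEXT,
            mvi_priority INTEGER DEFAULT 0,
            mvi_rand REAL
        );
        CREATE TABLE machine_vision_label (
            mvl_mvi_id INTEGER,
            mvl_review INTEGER
        );
    """)
    for mvi_id, (sha1, priority, rand, review) in enumerate(images, start=1):
        db.execute(
            "INSERT INTO machine_vision_image VALUES (?, ?, ?, ?)",
            (mvi_id, sha1, priority, rand),
        )
        db.execute("INSERT INTO machine_vision_label VALUES (?, ?)", (mvi_id, review))
    return db


def fetch_images(db, priority, limit=10):
    """Return sha1s of unreviewed images at or above the given cutoff priority."""
    return [row[0] for row in db.execute(FETCH_SQL, (priority, limit))]


db = make_db([
    ("a", 127, 0.90, 0),
    ("b", 5, 0.80, 0),
    ("c", 0, 0.95, 0),   # unreviewed, but below a cutoff of 5
    ("d", 5, 0.10, 0),
    ("e", 127, 0.99, 1), # high priority, but already reviewed
])
print(fetch_images(db, priority=5, limit=2))  # ['a', 'b']
```

Since the filter is `>=`, images above the cutoff priority stay eligible alongside those exactly at it; ordering within the eligible set is still driven by mvi_rand alone.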

Change 602701 abandoned by Matthias Mullie:
Drop unused index

Reason:
Combined mvi_priority_rand will not exist and we'll still use mvi_rand

https://gerrit.wikimedia.org/r/602701

The proposed changes have been reviewed and agreed upon internally & the patch has been merged.
I've created a subticket for DBA to apply the schema changes.

Change 602699 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Allow prioritisation of images shown in "popular" tab

https://gerrit.wikimedia.org/r/602699

@matthiasmullie is this ready for QA? If so, can you move it to the QA column and tag Elena and let her know? Thanks!

This ticket requires multiple steps to be completed in sequence. The schema change has been merged & deployed to the cluster. The code to start using the new priority column has not yet been merged, so this is still in CR.
This is also not something that we'll be able to QA - it's only an internal change without any external/testable effects.

Oh! I thought I had merged that ... (goes and looks)

Change 602700 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Start using priority column when selecting images to review

https://gerrit.wikimedia.org/r/602700