
Update CAT/SuggestedTags logic to prioritize assessed (featured/quality/valued) images explicitly
Closed, Resolved | Public | BUG REPORT

Description

We have this:

The Popular tab queue is currently a (pseudo-)random selection from all images with labels awaiting review. It does not prioritize assessed images (featured/quality/valued) as originally intended.

We want this:

Explicitly prioritize the assessed images. The short-term way to do this is to sort the queue by "suggestion timestamp" (mvs_timestamp in the db), since the assessed images were run through the Machine Vision algorithm first.

In the medium-to-long term, the data model should be updated to include the concept of which pool ("popular" or "uploads") a set of suggestions belongs to.

Acceptance Criteria:

  • The Popular tab shows only Assessed images until that list has been exhausted (>200k files)

COVID-19 Deployment Criteria (see responses below)

  • Can you roll back this change without lasting impact?
    1. A recovery plan is required as this will help identify our capacity for recovering from the failure
    2. THIS IS A KEY QUESTION, if you can’t answer it, you shouldn’t deploy
  • Is specialized knowledge required to support this change in production? If so, are there multiple people with this knowledge?
  • Is there a way to increase confidence about the correctness of this change?
    1. Reviews (Design, Code, etc)
    2. Testing coverage (unit tests, integration tests)
    3. Manual testing (e.g. Beta, vagrant, docker)

Event Timeline

Ramsey-WMF added a subscriber: Cparle.

Adding Cormac so he knows what's up (note I've actually set this as high priority because it's kind of a big deal)

Implementation notes:

The query to derive images to serve is in Repository::getTitlesWithUnreviewedLabels. If this is a request for "popular" images and not personal uploads (i.e., if $userId is null), add a join to the existing query for machine_vision_suggestion on mvs_mvl_id = mvl_id and select WHERE mvs_timestamp < 20191201000000. (Label suggestions for assessed images were entered in mid-November 2019 but new upload labeling was not enabled until December or later.)

EDIT: I tested the query for performance with the additional join and it's still fast.
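
For illustration only, here is a rough sketch of the join-and-cutoff idea described above, written against an in-memory SQLite database rather than the real Commons schema. Only mvl_id, mvs_mvl_id, mvs_timestamp and the 20191201000000 cutoff come from this ticket. The table layout, the mvl_title, mvl_review and mvl_uploader_id columns, and the function names are simplified assumptions, not the extension's actual code. The real queue is (pseudo-)random; the sketch sorts by title only so the result is reproducible.

```python
import sqlite3

# Cutoff from the ticket: suggestions for assessed images were entered
# before this point. 14-digit timestamps compare correctly as strings.
ASSESSED_CUTOFF = "20191201000000"


def build_demo_db():
    """Create a tiny in-memory database (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE machine_vision_label (
            mvl_id INTEGER PRIMARY KEY,
            mvl_title TEXT NOT NULL,        -- assumed column
            mvl_review INTEGER NOT NULL,    -- assumed: 0 = awaiting review
            mvl_uploader_id INTEGER NOT NULL -- assumed column
        );
        CREATE TABLE machine_vision_suggestion (
            mvs_mvl_id INTEGER NOT NULL,
            mvs_timestamp TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO machine_vision_label VALUES (?, ?, ?, ?)",
        [
            (1, "Assessed_A.jpg", 0, 7),
            (2, "Assessed_B.jpg", 1, 7),   # already reviewed
            (3, "Upload_C.jpg", 0, 42),    # labeled after the cutoff
            (4, "Assessed_D.jpg", 0, 42),
        ],
    )
    conn.executemany(
        "INSERT INTO machine_vision_suggestion VALUES (?, ?)",
        [
            (1, "20191115000000"),
            (2, "20191116000000"),
            (3, "20200105000000"),
            (4, "20191120000000"),
        ],
    )
    return conn


def titles_with_unreviewed_labels(conn, user_id=None, limit=10):
    """Return titles with unreviewed labels.

    With no user ID ("popular" request), join the suggestion table and keep
    only suggestions made before the cutoff. With a user ID (personal
    uploads), skip the join and filter by uploader instead.
    """
    if user_id is None:
        sql = (
            "SELECT DISTINCT mvl_title FROM machine_vision_label "
            "JOIN machine_vision_suggestion ON mvs_mvl_id = mvl_id "
            "WHERE mvl_review = 0 AND mvs_timestamp < ? "
            "ORDER BY mvl_title LIMIT ?"
        )
        params = (ASSESSED_CUTOFF, limit)
    else:
        sql = (
            "SELECT DISTINCT mvl_title FROM machine_vision_label "
            "WHERE mvl_review = 0 AND mvl_uploader_id = ? "
            "ORDER BY mvl_title LIMIT ?"
        )
        params = (user_id, limit)
    return [row[0] for row in conn.execute(sql, params)]
```

Against the demo rows, a "popular" request returns only the unreviewed images whose suggestions predate the cutoff, while a request with a user ID is unaffected by the cutoff and still returns that user's later uploads.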

I'm willing to take a stab at the "simple" implementation here, but will probably need some code review assistance again. If this ticket needs to be done ASAP then someone more comfortable writing DB queries should probably pick this up. I'll assign to myself tomorrow if no one has claimed it by then.

Change 585833 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/extensions/MachineVision@master] CAT: Only return "assessed" images if no user ID is provided

https://gerrit.wikimedia.org/r/585833

COVID-19 Deployment Criteria

Can you roll back this change without lasting impact?
Yes. Recovery plan: revert the change and the feature will revert to the current behavior.

Is specialized knowledge required to support this change in production? If so, are there multiple people with this knowledge?
Yes. I've discussed the change with Anne and Eric.

Is there a way to increase confidence about the correctness of this change?
The patch will be reviewed and tested, and can be verified on Beta Commons and Test Commons before hitting prod Commons.

Change 585833 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] CAT: Return only "assessed" images if no user ID is provided

https://gerrit.wikimedia.org/r/585833

Done and looking great.