Define the lifecycle for MV-generated depicts suggestion data
Closed, Resolved · Public


Machine vision-generated labels for Commons images will be requested from one or more MV providers. We need to answer the following lifecycle questions about this data:

  • How long should we retain candidates? Forever, or can they be dropped after a certain time?
    • This will depend in part on the requirements for promotion to SDC and model feedback.
  • How does this affect how candidates should be stored?
  • Should previously fetched data be refreshed, and if so, when?

Event Timeline

Mholloway created this task. · Jul 5 2019, 5:27 PM
Restricted Application added a subscriber: Aklapper. · Jul 5 2019, 5:27 PM
Mholloway renamed this task from "Define the lifecycle for MV-generated depicts suggestions" to "Define the lifecycle for MV-generated depicts suggestion data". · Jul 5 2019, 5:28 PM
Mholloway moved this task from Backlog to Depicts suggestions on the MachineVision board.

From Adam over email:

Are there use cases for holding onto the labels for some interval after the first human verification? My gut intuition is that we probably want to stash the latest retrieved label data indefinitely, even if the original data from the paid services is only internet-exposed until a human verification (plus some lag, maybe, to deal with the risk of vandalism?), but that is in addition to needing to deal with up-front storage planning.

I'll just add to this one possible future use case we've discussed before: when/if WMF eventually has our own homegrown system for this, it could be useful to have the old labels from V1 to compare/contrast against and judge the quality of our new model(s).

LGoto triaged this task as Medium priority. · Jul 17 2019, 3:42 PM

Resolved that the labels should be held indefinitely?

I'm in support of permanent stashing.

It's a subtler matter, but the question of refresh frequency is orthogonal.

Mholloway updated the task description. · Jul 25 2019, 1:48 PM
Mholloway closed this task as Resolved. · Aug 23 2019, 5:37 PM
Mholloway claimed this task.

See T229314 regarding where to store historical suggestions data.