
Semi-supervised, supervised learning for 2nd edit quality campaigns
Open, Low, Public

Description

Currently, we follow this process:

  1. Randomly sample ~20k revisions
  2. Filter out edits by trusted users to get down to ~5k revisions
  3. Users label the edits
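
The first two steps above can be sketched as follows. This is a minimal stand-in, not the actual autolabel code: the revision feed, the trusted-user set, and the sample sizes are all placeholders.

```python
import random

# Hypothetical stand-ins: in practice revisions would come from the wiki's
# revision dump/API and the trusted-user set from autolabel's criteria.
TRUSTED_USERS = {"Admin1", "Bot2"}

def sample_revisions(revisions, n=20000, seed=42):
    """Step 1: randomly sample up to n revisions."""
    rng = random.Random(seed)
    return rng.sample(revisions, min(n, len(revisions)))

def filter_trusted(revisions):
    """Step 2: drop edits made by trusted users; the rest go to labeling."""
    return [r for r in revisions if r["user"] not in TRUSTED_USERS]

# Toy data: every 4th edit is by a trusted user.
revisions = [{"rev_id": i, "user": "Admin1" if i % 4 == 0 else f"user{i}"}
             for i in range(100)]
candidates = filter_trusted(sample_revisions(revisions, n=50))
```

After filtering, `candidates` contains only edits by non-trusted users, ready for step 3.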

When doing a second campaign, we should use the model trained on the first campaign's data to filter the edits to an even smaller set.

  1. Randomly sample ~20k revisions
  2. Filter out edits by trusted users to get down to ~5k revisions
  3. Use the old model to filter out edits that are obviously not damaging and saved in good faith (down to ~1–2k?)
  4. Users label the edits
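
The new step 3 might look something like the sketch below. The score dictionaries stand in for calls to the previous campaign's damaging/goodfaith models, and the thresholds are illustrative assumptions, not tuned values.

```python
# Assumed confidence thresholds; real values would need tuning against
# the first campaign's labels.
DAMAGING_MIN = 0.1   # below this, the old model says "clearly not damaging"
GOODFAITH_MAX = 0.9  # above this, the old model says "clearly goodfaith"

def needs_review(scores):
    """Keep an edit for human labeling unless the old model is confident
    it is both not damaging and made in good faith."""
    obviously_fine = (scores["damaging"] < DAMAGING_MIN
                      and scores["goodfaith"] > GOODFAITH_MAX)
    return not obviously_fine

# Stand-in scores; in practice these come from the old model's predictions.
scored = [
    {"rev_id": 1, "damaging": 0.02, "goodfaith": 0.98},  # obviously fine
    {"rev_id": 2, "damaging": 0.60, "goodfaith": 0.40},  # needs labeling
]
to_label = [r for r in scored if needs_review(r)]
print([r["rev_id"] for r in to_label])  # → [2]
```

Only the edits the old model is unsure about survive to step 4, which is what shrinks the labeling set from ~5k toward ~1–2k.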

This will probably belong in editquality autolabel.