Licensing of labeled data
This came up in a 1:1 with @Halfak, as well as in a review with Legal of data sources created or maintained by Research.

We're collecting labeled data via interfaces such as WikiLabels or Mechanical Turk to train models for #revision-scoring-as-a-service and Discussion-modeling (of Toxicity). To release the labeled data publicly, we need to clarify its licensing and any privacy implications.

@Slaporte: can I pick your brain one of these days and go over this?

  • Clear licensing status of Detox labeled data
  • Clear licensing status of WikiLabels (ORES) labeled data
  • Language to be used going forward on labeling interfaces --> T156052

I'd be happy to discuss. Want to set up time?

@Slaporte and I met and sent out a note with next steps.

@Slaporte @APalmer_WMF: we closed the Detox subitem, but it looks like @Halfak hasn't heard back on the handling of ORES labeled data. Do you guys have a status update/ETA?

@Slaporte: we're moving forward assuming CC0 for the data collected so far, and we have a new task, T156052, to track suggested language for collecting labeled data going forward.