This came up in a 1:1 with @Halfak and in a review, with Legal, of the data sources created or maintained by Research.
We're collecting labeled data via interfaces such as WikiLabels or Mechanical Turk to train models in the context of #revision-scoring-as-a-service and Discussion-modeling (of Toxicity). Before we can publicly release this labeled data, we need to clarify its licensing and any privacy implications.
@Slaporte: can I pick your brain one of these days and go over this?
- Clear licensing status of Detox labeled data
- Clear licensing status of Wikilabels (ORES) labeled data
- Language to be used going forward on labeling interfaces --> T156052