
Develop 1 model to identify Misinformation
Closed, Resolved, Public

Description

In this task we track our efforts on building a model to identify potential signals of mis/disinformation in Wikimedia projects.

  • Q1: Build an initial list of templates used by editors to signal bad content.
  • Q2: Create the final list of templates used by editors to signal bad content.
  • Q3: Create one machine-readable dataset around the most important templates found in the previous step. This dataset will be used to train ML models that can support editors in identifying bad content early.

Event Timeline

diego triaged this task as High priority. Aug 17 2020, 3:23 PM

Update

  • We are extending the list to other languages: es, pt, ca.
  • We are reviewing Outreachy applications; the selected intern will help create the machine-readable dataset.

Update

  • We have selected one Outreachy intern, who will start in December. The intern will help develop the machine-readable dataset.

Update

  • Kay (Outreachy intern) has started her work based on the templates listed in this WikiProject.
  • We are exploring techniques to get negative examples (cases where the problem has been solved) for these templates.

Updates

  • We have analyzed the impact of reverts on negative examples (cases where the reliability issue has been solved).
  • We have created a heuristic to find negative examples.
  • We have created an initial dataset covering 80 templates.
  • We are currently identifying relevant metadata (i.e., pre-computed features) to add to the dataset.
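
For illustration only, a heuristic of this kind could be sketched as follows. This is not the heuristic used for the dataset, which is not described in this task. The sketch assumes that a positive example is the first revision carrying a template, that a negative example is the first later revision without it, and that reverted edits are skipped. The `Revision` record, its field names, and both helper functions are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Revision:
    """Hypothetical minimal revision record; the field names are assumptions."""
    rev_id: int
    wikitext: str
    reverted: bool = False  # True if this edit was later undone


def has_template(wikitext: str, template: str) -> bool:
    """Return True if {{template}} or {{template|...}} occurs (case-insensitive)."""
    pattern = r"\{\{\s*" + re.escape(template) + r"\s*(\||\}\})"
    return re.search(pattern, wikitext, flags=re.IGNORECASE) is not None


def find_pair(revisions: List[Revision], template: str) -> Optional[Tuple[int, int]]:
    """Return (positive_rev_id, negative_rev_id), or None if no pair exists.

    positive: first non-reverted revision that contains the template.
    negative: first later non-reverted revision that no longer contains it.
    Revisions are assumed to be in chronological order.
    """
    positive = None
    for rev in revisions:
        if rev.reverted:
            continue  # skip edits that were undone (assumed to be noise)
        present = has_template(rev.wikitext, template)
        if positive is None and present:
            positive = rev.rev_id
        elif positive is not None and not present:
            return positive, rev.rev_id
    return None


history = [
    Revision(1, "Text."),
    Revision(2, "{{Unreliable sources|date=May 2020}} Text."),
    Revision(3, "Text.", reverted=True),  # removal that was undone
    Revision(4, "{{Unreliable sources|date=May 2020}} Text, more."),
    Revision(5, "Text with citations."),
]
print(find_pair(history, "Unreliable sources"))  # (2, 5)
```

Skipping reverted revisions keeps a template removal that was immediately undone from being counted as a solved problem; whether the real heuristic treats reverts this way is an assumption.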

Updates

  • New metadata has been added to the dataset: we now differentiate templates at the article, section, and inline levels.

Updates

  • We have announced the datasets in a presentation to the NLP group at the University of Cambridge.
  • Currently working on documenting the datasets.

Updates

  • The datasets will be published next week.

Updates

  • Information updated on Betterworks.

Updates

  • We are preparing the camera-ready version for SIGIR.

Updates

  • The camera-ready (CR) version was submitted.
  • We have uploaded the paper to arXiv; it should be available next week.

Updates