Develop 1 model to identify Misinformation
Closed, Resolved · Public

Authored By

	diego
	Aug 17 2020, 3:23 PM

Description

In this task we track our efforts on building a model to identify potential signals of mis/disinformation in Wikimedia projects.

Q1: Build an initial list of templates used by editors to signal bad content.
Q2: Create the final list of templates used by editors to signal bad content.
Q3: Create a machine-readable dataset around the most important templates found in the previous step. This dataset will be used to train ML models that can help editors identify bad content early.

Related Objects

Mentioned In: T263860: Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia.
T243256: Measuring the consistency of information between Wikipedia articles and Wikidata items.

Event Timeline

diego created this task. Aug 17 2020, 3:23 PM

Restricted Application added a subscriber: Aklapper. Aug 17 2020, 3:23 PM

diego triaged this task as High priority. Aug 17 2020, 3:23 PM

diego mentioned this in T243256: Measuring the consistency of information between Wikipedia articles and Wikidata items. Aug 31 2020, 1:52 PM

leila updated the task description. Oct 16 2020, 8:33 PM

leila moved this task from FY2020-21-Research-July-September to FY2020-21-Research-October-December on the Research board.

leila edited projects, added Research (FY2020-21-Research-October-December); removed Research (FY2020-21-Research-July-September).

Update

We are extending the list to other languages: es, pt, ca.
We are reviewing Outreachy applications from candidates who will help create the machine-readable dataset.

Update

We have selected one Outreachy intern, who will start in December. The intern will help develop the machine-readable dataset.

Update

Kay (Outreachy intern) has started her work based on the templates listed in this WikiProject.
We are exploring techniques to get negative examples (cases where the problem has been solved) for these templates.

diego updated the task description. Dec 11 2020, 2:46 PM

leila moved this task from FY2020-21-Research-October-December to FY2020-21-Research-January-March on the Research board. Jan 12 2021, 6:57 PM

leila edited projects, added Research (FY2020-21-Research-January-March); removed Research (FY2020-21-Research-October-December).

Updates

We have analyzed the impact of reverts on negative examples (cases where the reliability issue has been solved).
We have created a heuristic to find negative examples.
We have created an initial dataset covering 80 templates.
We are currently identifying relevant metadata (i.e., pre-computed features) to add to the dataset.
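
The task does not describe the heuristic itself. As a rough sketch only, assuming that a positive example is the revision where a template first appears and a negative example is the next revision where it is gone, the pairing could look like the code below. The function names `has_template` and `find_label_pairs`, the input shape (an ordered list of wikitext strings for one page), and the regex-based template check are all illustrative assumptions, not the project's actual code.

```python
import re
from typing import List, Tuple


def has_template(wikitext: str, template: str) -> bool:
    """Return True if {{template}} or {{template|...}} occurs in the wikitext.

    Hypothetical helper: a real pipeline would likely use a wikitext parser
    and also resolve template redirects.
    """
    pattern = r"\{\{\s*" + re.escape(template) + r"\s*(\||\}\})"
    return re.search(pattern, wikitext, flags=re.IGNORECASE) is not None


def find_label_pairs(revisions: List[str], template: str) -> List[Tuple[int, int]]:
    """Return (positive_index, negative_index) pairs for one page history.

    positive_index: revision where the template is added
    negative_index: next revision where the template is no longer present
    """
    pairs = []
    added_at = None   # index of the revision that added the template
    previous = False  # was the template present in the previous revision?
    for i, text in enumerate(revisions):
        current = has_template(text, template)
        if current and not previous:
            added_at = i
        elif previous and not current and added_at is not None:
            pairs.append((added_at, i))
            added_at = None
        previous = current
    return pairs


# Toy page history (assumed data, for illustration only).
history = [
    "Some text.",
    "{{POV}} Some text.",
    "{{POV|date=May 2020}} Some more text.",
    "Some more text, now balanced.",
]
pairs = find_label_pairs(history, "POV")
```

This sketch does not filter out reverted edits. Since the update above notes that reverts affect negative examples, a real pipeline would presumably need an extra step to discard removals that are later undone.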

Updates

New metadata has been added to the dataset: We are differentiating templates at article, section, and inline level.
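
As an illustration of what such a level field might look like, here is a minimal sketch. The `Scope` enum, the `TEMPLATE_SCOPE` lookup table, and the three template-to-level assignments are assumptions made for this example; the task does not say how the dataset encodes the level or which templates fall into each group.

```python
from enum import Enum
from typing import Optional


class Scope(Enum):
    ARTICLE = "article"
    SECTION = "section"
    INLINE = "inline"


# Hypothetical lookup table: the assignments below are illustrative guesses,
# not taken from the published dataset.
TEMPLATE_SCOPE = {
    "pov": Scope.ARTICLE,
    "unreferenced section": Scope.SECTION,
    "citation needed": Scope.INLINE,
}


def template_scope(name: str) -> Optional[Scope]:
    """Look up the level of a template, ignoring case and underscores."""
    key = name.strip().replace("_", " ").lower()
    return TEMPLATE_SCOPE.get(key)
```

Returning `None` for unknown templates keeps the lookup explicit about gaps instead of silently defaulting to one level.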

diego added a subscriber: Miriam. Feb 11 2021, 4:57 PM

Updates

We have announced the datasets in a presentation to the NLP group at the University of Cambridge.
We are currently documenting the datasets.

Updates

The datasets will be published next week.

Updates

The dataset, Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia, has been published on Figshare.
The documentation about this dataset can be found here.
We also wrote a paper and submitted it to a peer-reviewed venue.
The Outreachy internship has been successfully completed.

diego updated the task description. Mar 8 2021, 2:58 PM

Updates

Information updated on Betterworks.

Updates

No updates.

Updates

No updates.

diego mentioned this in T263860: Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia. Apr 3 2021, 1:08 AM

leila moved this task from FY2020-21-Research-January-March to FY2020-21-Research-April-June on the Research board. Apr 29 2021, 1:01 AM

leila edited projects, added Research (FY2020-21-Research-April-June); removed Research (FY2020-21-Research-January-March).

Updates

We are preparing the camera-ready version for SIGIR.

Updates

The camera-ready (CR) version was submitted.
We have uploaded the paper to arXiv; it should be available next week.

Updates

The paper is published at SIGIR and available here: https://arxiv.org/pdf/2105.04117.pdf

diego closed this task as Resolved. May 14 2021, 3:04 PM
