Define a pipeline to implement and test a machine-assisted framework for identifying unsourced statements that need a citation.
Ideal steps:
- Data Collection, possible sources:
  - Manual annotation: WikiLabel (@Halfak)
  - Manual annotation: Hypothes.is
  - Automatic extraction of statements with citation overkill (more references than needed)
- Feature Extraction
- Online Evaluation
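
As a rough illustration of the automatic extraction source listed under Data Collection, the sketch below scans raw wikitext for statements followed by an unusually long run of `<ref>` tags. The function name `find_overcited`, the threshold `min_refs=3`, and the naive rule that a statement is the text since the previous period are all assumptions made for this example, not part of the project plan.

```python
import re

# Matches a self-closing <ref ... /> tag or a <ref ...>...</ref> pair.
REF = re.compile(r'<ref\b[^>]*?/>|<ref\b[^>]*>.*?</ref>', re.DOTALL)
MARK = '\x00'  # placeholder that stands in for one reference


def find_overcited(wikitext, min_refs=3):
    """Return (statement, ref_count) pairs for statements followed by
    at least `min_refs` consecutive references (hypothetical threshold)."""
    # Step 1: collapse every reference into a single marker character.
    marked = REF.sub(MARK, wikitext)
    # Step 2: a statement is the text since the previous period or newline,
    # directly followed by a run of min_refs or more markers.
    pattern = re.compile(
        r'([^\x00\n.]+\.?)\s*((?:\x00\s*){' + str(min_refs) + r',})'
    )
    results = []
    for match in pattern.finditer(marked):
        statement = match.group(1).strip()
        ref_count = match.group(2).count(MARK)
        results.append((statement, ref_count))
    return results


sample = (
    'The sky is blue.<ref>A</ref><ref>B</ref><ref name="c" /> '
    'Water is wet.<ref>D</ref> Grass is green.'
)
print(find_overcited(sample))
```

Statements found this way could serve as positive examples ("citation was clearly needed"), but the period-based sentence split will cut abbreviations such as "U.S." short, so a real pipeline would likely need a proper wikitext parser and sentence tokenizer.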
RESOURCES
- Meta-Wiki project: https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements
- Etherpad Discussion: https://etherpad.wikimedia.org/p/Research-WikiLibrary
- Guidelines for Citation Needed: https://docs.google.com/a/wikimedia.org/spreadsheets/d/1nUc8WmtU8F97vcmNv9LnqNSmOiK2UU9AOAl2JBdRFBs/edit?usp=sharing