Define a pipeline to implement and test a machine-assisted framework for identifying unsourced statements that need a citation.
Ideal steps:
1) Data Collection, possible sources:
- Manual annotation: [[ http://labels.wmflabs.org/ui/ | Wiki labels ]] (@Halfak)
- Manual annotation: [[ https://web.hypothes.is/blog/working-with-wikipedia-articles/ | Hypothes.is ]]
- Automatic extraction of statements affected by [[ https://en.wikipedia.org/wiki/Wikipedia:Citation_overkill | citation overkill ]]
2) Feature Extraction
3) Online Evaluation
- [[ https://meta.wikimedia.org/wiki/The_Wikipedia_Library/1Lib1Ref | The Wikipedia Library (1Lib1Ref) ]] (@Sadads @Ocaasi_WMF)
- [[ https://tools.wmflabs.org/citationhunt/en?id=13229c57&cat=all | CitationHunt ]]
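As a starting point for the automatic-extraction option in step 1, here is a minimal sketch of how heavily cited statements could be pulled out of raw wikitext. It assumes that a sentence followed by several `<ref>` tags is a reasonable positive example of a statement that needs a citation. The function name `overcited_statements`, the `min_refs` parameter, and its default of 3 are illustrative choices, not part of any existing tool. The sentence splitter is deliberately naive; a real pipeline would probably rely on a proper wikitext parser.

```python
import re

# Matches self-closing refs (<ref name="x" />) and paired refs (<ref>...</ref>).
# The \b keeps the pattern from matching the <references/> tag.
REF_RE = re.compile(
    r"<ref\b[^>/]*/>|<ref\b[^>]*>.*?</ref>",
    re.DOTALL | re.IGNORECASE,
)

# Placeholder character that stands in for one reference after substitution.
MARK = "\x00"

# A sentence (text ending in . ! or ?) followed by zero or more placeholders.
SENTENCE_RE = re.compile(r"([^\x00.!?]+[.!?])(\x00*)")


def overcited_statements(wikitext, min_refs=3):
    """Return (sentence, ref_count) pairs for sentences that are
    immediately followed by at least `min_refs` references.

    min_refs is a hypothetical threshold chosen for illustration.
    """
    # Replace every reference with a placeholder so that punctuation
    # inside a reference does not confuse the sentence splitter.
    marked = REF_RE.sub(MARK, wikitext)
    results = []
    for match in SENTENCE_RE.finditer(marked):
        sentence = match.group(1).strip()
        ref_count = len(match.group(2))
        if ref_count >= min_refs:
            results.append((sentence, ref_count))
    return results


sample = (
    "Paris is the capital of France."
    '<ref>A</ref><ref name="b" /><ref>C. D.</ref>'
    " It lies on the Seine.<ref>E</ref>"
    " The sky is blue."
)
heavily_cited = overcited_statements(sample)
```

Lowering `min_refs` to 1 would return every cited sentence, which may be useful for building a broader set of positive examples to compare against uncited sentences.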
RESOURCES
- Meta-Wiki project: https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements
- Etherpad Discussion: https://etherpad.wikimedia.org/p/Research-WikiLibrary