Together with Besnik from L3S, create a machine learning model based on the data collected in T186279 able to score statements according to the probability that they need a citation.
- Compile a summary of the 'citation needed' rules
- Derive multilingual NLP features from the rules above
- Implement features
- Train/test on positive/negative statements automatically extracted from articles
- Train/test on larger scale data of different quality
- Design an end-to-end framework based on deep learning on automatically collected data