In this task, we want to practice simple parsing of a Wikipedia article and classification of some of its sentences.
Please write a program or script in your preferred language that:
1- Receives as input the title of an English Wikipedia article.
2- Retrieves the text of that article from the MediaWiki API. If using Python, consider using python-mwapi for this.
3- Identifies individual sentences within that text, along with the corresponding section titles. If using Python, mwparserfromhell can help you work with wiki markup.
4- Runs those sentences through the model to classify them.
5- Outputs the sentences, one per line, sorted by the score given by the model.
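As a rough illustration of how steps 1 to 5 could fit together, here is a minimal Python sketch that uses only the standard library. Several things in it are assumptions and not part of the task: the function names, the "LEAD" label for text before the first heading, the regex-based heading and sentence splitting (a stand-in for what mwparserfromhell would do properly), the descending sort order, and above all placeholder_score, which only stands in for the real model. The fetch step calls the MediaWiki action=parse endpoint directly with urllib instead of python-mwapi.

```python
import json
import re
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

# Matches wikitext headings such as "== History ==" or "=== Early life ===".
HEADING_RE = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")
# Naive sentence boundary: end punctuation, whitespace, then a capital letter.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def fetch_wikitext(title):
    """Fetch the raw wikitext of an article (requires network access)."""
    params = urllib.parse.urlencode({
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",
    })
    request = urllib.request.Request(
        f"{API_URL}?{params}",
        # Hypothetical User-Agent string; replace with your own contact info.
        headers={"User-Agent": "citation-need-sketch/0.1 (example script)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["parse"]["wikitext"]


def split_sentences(wikitext, lead_section="LEAD"):
    """Yield (section_title, sentence) pairs from wikitext.

    Assumption: templates, links and references are not stripped here;
    a real solution would clean the markup first.
    """
    section = lead_section
    for line in wikitext.splitlines():
        line = line.strip()
        heading = HEADING_RE.match(line)
        if heading:
            section = heading.group(2)
            continue
        if not line:
            continue
        for sentence in SENTENCE_END_RE.split(line):
            yield section, sentence.strip()


def placeholder_score(sentence, section):
    """Stand-in for the model: longer sentences simply score higher."""
    return float(len(sentence))


def rank_sentences(pairs, score_fn=placeholder_score):
    """Return (score, section, sentence) tuples, highest score first."""
    scored = [(score_fn(sentence, section), section, sentence)
              for section, sentence in pairs]
    return sorted(scored, key=lambda item: item[0], reverse=True)


def format_lines(ranked):
    """Render one tab-separated output line per sentence."""
    return [f"{score:.3f}\t{section}\t{sentence}"
            for score, section, sentence in ranked]
```

With network access, something like print("\n".join(format_lines(rank_sentences(split_sentences(fetch_wikitext("Some title")))))) would print one sentence per line. Swapping placeholder_score for a function that calls the actual model is the part that matters for this task.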
This is similar to the run_citation_need_model.py script in the model repository, but that one loads its input statements from an already-structured file, whereas you have to extract that information directly from a Wikipedia article.
Please create a GitHub (or similar, like Bitbucket) repository with your code and send us a link to it in a comment on this Phabricator entry.
Deadline: This task has no deadline of its own, other than the November 5th deadline for contributions in Outreachy. The sooner the better, though, as we would like to look at your code and maybe file an issue and/or discuss design decisions before the actual deadline.
Feel free to ping @Miriam or myself if you have questions.