Name : Vanilla Thulisile Sibanda
Github : https://github.com/thulieblack/
Email : sibanda.thulie@gmail.com
Location : Cape Town, South Africa
Time Zone : (UTC +02:00) Central Africa Time
Typical working hours : 9pm - 5am Central Africa Time
Summary
The project consists of researching, gathering, and processing Wikipedia-related data about the reliability of article content, and of detecting the crowd-generated tags or labels that Wikipedia editors and developers currently use to signal content integrity problems to other editors. Wikipedia templates and tools are already used to label potentially bad content, but they are usually not machine friendly. This project aims to group these templates and labels, select the most relevant ones, and create machine-readable datasets that will allow ML systems to detect problematic content, potentially automatically. We will also test those datasets by running different ML algorithms, whose results will serve as baselines for future researchers. The project aims to:
*Explore the space of templates related to content integrity using semi-automated methods
*Download and process articles and sections with the found templates
*Analyze the data to summarize the main statistics
*Potentially produce visualizations of the statistics and data properties
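As a rough illustration of the processing and analysis steps above, the sketch below counts content-integrity templates in a piece of wikitext. It is only a minimal example under stated assumptions: the template watchlist, the `count_integrity_templates` helper, and the sample text are all hypothetical, and a real pipeline would more likely work from Wikipedia dumps with a proper wikitext parser than from a regular expression.

```python
import re
from collections import Counter

# Hypothetical watchlist of content-integrity templates; the real list would
# come out of the template exploration step.
INTEGRITY_TEMPLATES = {"citation needed", "pov", "unreferenced", "disputed"}

# Captures the name of a template call such as {{Citation needed|date=...}}.
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def count_integrity_templates(wikitext):
    """Count occurrences of watchlisted templates in one article's wikitext."""
    counts = Counter()
    for match in TEMPLATE_NAME.finditer(wikitext):
        # Normalize case and underscores so name variants are counted together.
        name = match.group(1).strip().lower().replace("_", " ")
        if name in INTEGRITY_TEMPLATES:
            counts[name] += 1
    return counts


# Made-up sample wikitext, used only to exercise the function.
sample = (
    "{{POV|date=May 2020}}\n"
    "The town was founded in 1850.{{Citation needed|date=June 2020}} "
    "It has a river.{{citation needed}} {{Infobox settlement|name=X}}"
)
counts = count_integrity_templates(sample)
print(counts)
```

Per-article counts like these could then be aggregated across many articles to produce the summary statistics and visualizations listed above.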
Mentor
Isaac Johnson @
Project Timeline
| Weeks | Outcomes
|November 5 to November 30 | During this period I would dedicate time to learn more and contribute