Name : Vanilla Thulisile Sibanda
Github : https://github.com/thulieblack/
Email : sibanda.thulie@gmail.com
Location : Cape Town, South Africa
Time Zone : Central Africa Time (UTC+02:00)
Typical working hours : 9 pm - 5 am Central Africa Time
Summary
The project consists of researching, gathering, and processing Wikipedia data about the reliability of article content. It also involves detecting the crowd-generated tags or labels that Wikipedia editors and developers currently use to signal content-integrity problems to other editors. Wikipedia templates and tools are already used to label potentially problematic content, but they are usually not machine-friendly. This project aims to group this content, select the most relevant templates, and create machine-readable datasets that will allow ML systems to detect problematic content, potentially automatically. We will also test those datasets by running different ML algorithms, which can serve as baselines for future researchers. The project aims to:
- Explore the space of templates related to content integrity using semi-automated methods
- Download and process articles and sections with the found templates
- Analyze the data to summarize the main statistics
- Potentially produce visualizations of the statistics and data properties
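As an illustration of the first two goals, here is a minimal sketch of how content-integrity templates could be counted per section. It assumes the article wikitext has already been downloaded. The template names in INTEGRITY_TEMPLATES and the helper functions are hypothetical placeholders, and the real list would come out of the exploration step. The regular expressions are a simplification: they only split on level-2 headings, skip nested templates, do not resolve template redirects or aliases, and lowercase whole names even though MediaWiki only ignores case on the first letter.

```python
import re
from collections import Counter

# Hypothetical sample of content-integrity template names (lowercased);
# the real list would come from the template-exploration step.
INTEGRITY_TEMPLATES = {"citation needed", "unreferenced", "pov", "original research"}

# Captures the name of a non-nested template call, e.g. {{Citation needed|date=May 2020}}.
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\|[^{}]*)?\}\}")

# Captures the title of a level-2 heading such as "== History ==" (not "=== Sub ===").
HEADING_RE = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)


def split_sections(wikitext):
    """Split wikitext into (title, body) pairs; text before the first heading is 'Lead'."""
    sections = []
    title, start = "Lead", 0
    for match in HEADING_RE.finditer(wikitext):
        sections.append((title, wikitext[start:match.start()]))
        title, start = match.group(1), match.end()
    sections.append((title, wikitext[start:]))
    return sections


def count_integrity_templates(wikitext):
    """Count integrity templates, keyed by (section title, lowercased template name)."""
    counts = Counter()
    for title, body in split_sections(wikitext):
        for name in TEMPLATE_RE.findall(body):
            normalized = name.strip().lower()
            if normalized in INTEGRITY_TEMPLATES:
                counts[(title, normalized)] += 1
    return counts


sample = (
    "Intro text.{{Citation needed|date=May 2020}}\n"
    "== History ==\n"
    "Some claim.{{citation needed}} More.{{POV}}\n"
    "== Sources ==\n"
    "{{Reflist}}\n"
)
print(count_integrity_templates(sample))
```

Keeping the section title in the key means the resulting counts can feed both article-level and section-level datasets.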
Mentor
Isaac Johnson (@Isaac)
Project Timeline
Dates | Outcomes |
November 5th to November 30th | During this period I will dedicate myself to learning and contributing to MediaWiki tasks while improving my data analysis/data science skills and learning machine learning. |
December 1st to December 28th | Weeks 1 to 4: Explore in depth the space of templates related to content integrity, using machine learning algorithms and training data to make predictions. |
December 29th to January 25th | Weeks 5 to 8: Download and process articles and sections with the found templates; recreate files and make amendments and changes. |
January 26th to February 8th | Weeks 9 to 10: Analyze the data, summarize the main statistics using Python libraries, and create statistical data reports. |
February 9th to February 16th | Produce visualizations of the analyzed statistics and implement machine learning systems. |
February 18th to February 23rd | Implement any feedback and changes arising from the reviews. |
February 24th to March 1st | Finalize, review, organize, and document the necessary changes. |
Participation
I will continue to communicate with Isaac Johnson via the public chat.
I will ask for help on the project's designated communication channel.
About Me
I started my tech journey in March 2020 after I lost my job in the hospitality industry. Studying tech has been a dream come true, as it has always been my passion. I took advantage of the lockdown to complete a diploma course in Python programming at Alison. I then developed my data analysis skills further and completed some micro projects at freeCodeCamp; here are some of the repositories of the projects I did there. I also took part in a one-month virtual internship at Hash Analytic, where I learned to create visualizations, apply machine learning models, and give presentations. The training was a turning point for me, and I learned a lot that I believe will be uniquely beneficial to this project.
How did you hear about this program?
After completing my internship, I came across a post on Twitter.
Will you have other time commitments during the program?
I don't have any commitments that will interfere with the program.
What does this project mean to you?
It will be an honor and a privilege to participate in this project, as it will help me enhance my skills and experience while working with expert researchers in the area of machine learning. I also value this opportunity because Wikimedia provides free educational content through its projects, as well as a support structure for continued skills development.
This would also be a major milestone for me, as it is my first time contributing to open source.
Contributions
I recently joined the Wikimedia community during this Outreachy round. I have been active on the public chat, collaborated with other Outreachy applicants, and communicated with my mentor.
Contributions to MediaWiki
- https://phabricator.wikimedia.org/T263874 (in progress)