Since Wikimedia Commons introduced structured data for the files it hosts, users have been encouraged to add structured data to files. However, quantity is sometimes prioritized over quality. This project aims to create a tool, possibly named “Image Data Verification Tool”, that lets users verify the structured data on files hosted on Wikimedia Commons, helping to ensure that the data is correct. The name simply describes what the tool does and leaves room to extend it beyond depicts statements.
The plan is to start with verifying depicts statements. If there is enough time, the tool will also have sections for other image data.
At the end of the program, we should have a tool that:
- Requires users to log in using OAuth
- Shows users the descriptions of images (during the program, this should cover depicts statements)
- Lets users choose to retrieve images from recent changes, a category or a tag (e.g. ISA)
- Lets users mark whether a description is true or false (and probably undo their selection)
- Has a user page showing statistics of a user's contributions (maybe also a history of the user's activity on the site)
- Has cleanly written code
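As a rough illustration of the "retrieve depicts statements" step, here is a minimal Python sketch. It assumes the tool reads a file's structured data through the `wbgetentities` module of the Commons API and that the file's MediaInfo entity ID is `M` followed by its page ID. The function names (`mediainfo_id`, `build_entity_url`, `depicted_items`) are hypothetical and only for illustration; the actual design is still to be discussed with mentors.

```python
from typing import List
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
DEPICTS = "P180"  # Wikidata property "depicts"


def mediainfo_id(page_id: int) -> str:
    """Return the MediaInfo entity ID ('M' + page ID) for a Commons file."""
    return f"M{page_id}"


def build_entity_url(page_id: int) -> str:
    """Build a wbgetentities request URL for one file's structured data."""
    params = {
        "action": "wbgetentities",
        "ids": mediainfo_id(page_id),
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def depicted_items(response: dict, page_id: int) -> List[str]:
    """Extract the Wikidata item IDs listed in the file's depicts statements."""
    entity = response.get("entities", {}).get(mediainfo_id(page_id), {})
    statements = entity.get("statements")
    # Assumption: a file without structured data may return a non-dict
    # (e.g. an empty list) here, so anything else is treated as "no statements".
    if not isinstance(statements, dict):
        return []
    items = []
    for statement in statements.get(DEPICTS, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "unknown value" / "no value" snaks
        items.append(snak["datavalue"]["value"]["id"])
    return items
```

The parsing is kept separate from the HTTP call so it can be unit-tested against saved API responses without touching the network.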
Nice to have:
- Lets users create campaigns (similar to ISA, particularly useful during special occasions)
- Extends beyond depicts statements to cover other media data and machine-suggested image labels
- Implements a method to ensure that edits made through the tool are legitimate (the best strategy is to be chosen carefully with mentors; options listed below)
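To make the idea of checking edits more concrete, here is a sketch of one possible strategy. It is a hypothetical illustration, not an option taken from this proposal or agreed with mentors. A statement is only treated as verified or rejected once enough users have reviewed it and a large enough share of them agree. The function name `consensus` and the thresholds `min_votes` and `min_agreement` are assumptions.

```python
from collections import Counter
from typing import Iterable, Optional


def consensus(
    votes: Iterable[bool],
    min_votes: int = 3,
    min_agreement: float = 0.75,
) -> Optional[bool]:
    """Return the agreed verdict, or None if agreement is not yet reached.

    Each vote is True ("the statement is correct") or False ("incorrect").
    Both thresholds are illustrative defaults, not project decisions.
    """
    tally = Counter(votes)
    total = tally[True] + tally[False]
    if total < min_votes:
        return None  # not enough reviewers yet
    verdict, count = tally.most_common(1)[0]
    if count / total >= min_agreement:
        return verdict
    return None  # reviewers disagree too much
```

With these defaults, `consensus([True, True, True])` returns `True`, while `consensus([True, True])` returns `None` because fewer than three users have voted.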
Overview of how the tool would work
Possible database structure
The following timeline sets the deadlines. However, it is highly likely that we will achieve more than what's listed below.
Note that I have left plenty of flexibility in the schedule given the current situation.
4 May - 31 May (Community bonding period)
- Create tool on Toolforge.
- Create repo on Gerrit.
- Create project on Phabricator.
- Discuss implementation details with mentors.
1 June - 28 June (4 weeks)
- Set up development environment.
- Create the core part of the tool with OAuth login and the ability to get user details.
- Add ability to retrieve statements from Commons and show to the user.
- Ability to save changes.
Phase 1 evaluation
29 June - 26 July (4 weeks)
- Documentation and bug fixes.
- User statistics page.
- Write tests.
Phase 2 evaluation
27 July - 23 August (4 weeks)
- Writing documentation.
- Additional features.
24 August - 30 August
Code submission and student final evaluation.
31 August - 7 September
Mentors submit final evaluation.
- Work on and upload code to the repository every weekday, sometimes weekends too.
- Be online on IRC during my working hours (I am usually very responsive as long as I'm awake). We could use another medium of communication depending on the mentors' preference.
- Use Phabricator to track tasks and progress.
Why Me (About Me and Past Experience)
- I am a student from Hong Kong, currently studying Computer Science at Lancaster University, United Kingdom.
- I am the maintainer of gabrielchihonglee-bot, which runs on Toolforge using pywikibot. It mainly performs edits on Commons (80k+ edits), is an adminbot on Chinese Wikivoyage, and sometimes runs on other wikis.
- I am an admin on Chinese Wikivoyage, so I understand how Wikimedia projects work.
- I am comfortable coding in C, Java and Python. I have a little experience with Flask. I am also familiar with Git.
- I aim to write clean, readable code, as shown in the patches below.
- I've set up and am maintaining several websites.
- Why this task: it sits at the sweet spot between too hard and too easy for me, which allows me to learn while using my existing knowledge.
- I will continue to maintain the tool after the GSoC program. I point this out because I have heard that many mentees abandon their projects after the program. I am a trusted, long-term contributor to Wikimedia projects, so the likelihood of that happening is low.
Most of them are ISA-related. I initially started working on ISA just for this application, but I found it interesting (and a bit addictive), so I think I will continue contributing to ISA or maybe other tools in the future. :)