Proposal: Media Data Verification Tool - GSoC 2020
Closed, Resolved (Public)

Assigned To
Authored By
Gabrielchl
Mar 12 2020, 10:08 PM
Referenced Files
F31718178: Screenshot from 2020-03-31 18-58-05.png
Mar 31 2020, 5:58 PM
F31716734: use-cases.jpg
Mar 31 2020, 6:54 AM
F31716732: ensure-data-correct-methods.jpg
Mar 31 2020, 6:54 AM
F31716729: prototype-screenshot.jpg
Mar 31 2020, 6:54 AM
F31678462: gsoc2020proposal1.jpg
Mar 12 2020, 10:08 PM

Description

Profile

Name: Gabriel Lee
IRC nick: gabrielchl
Web profile: GitHub | Meta wiki profile
Location: United Kingdom / Hong Kong
Typical working hours: 10am - 11pm UTC+1 / 8am - 10pm UTC+8

Synopsis

Since Wikimedia Commons introduced structured data for the files hosted on the site, users have been encouraged to add structured data to those files. However, quantity is sometimes prioritized over quality. This project aims to create a tool, possibly named “Image Data Verification Tool”, that lets users verify the structured data on files hosted on Wikimedia Commons, helping to ensure that the data is correct. The name simply describes what the tool does and leaves room to extend it beyond depict statements.

The plan is to start with verifying depict statements. If there is enough time, the tool would also have sections for other image data.

Mentors: @Eugene233, @NavinoEvans
(expressed interest to participate in this task)

Prototype

Media Data Verification Tool on GitHub

prototype-screenshot.jpg (2×3 px, 458 KB)

Deliverables

At the end of the program, we should have a tool that:
Must have:

  • Requires users to log in using OAuth
  • Shows users descriptions of images (during the program, this should cover depict statements)
  • Lets users choose whether to retrieve images from recent changes, a category, or a tag (e.g. ISA)
  • Lets users mark each description as true or false (and probably undo their selection)
  • Has a user page showing statistics on a user’s contributions (and maybe a history of the user's activity on the site)
  • Has cleanly written code
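
As a rough illustration of the true/false selection, undo, and per-user statistics items above, here is a minimal in-memory sketch. It is not the tool's actual design: the `VerdictLedger` class, its method names, and the statement IDs are all hypothetical, and a real implementation would persist verdicts in a database rather than a dictionary.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VerdictLedger:
    """Hypothetical in-memory record of verdicts, keyed by (user, statement id)."""

    verdicts: dict = field(default_factory=dict)

    def record(self, user: str, statement_id: str, is_correct: bool) -> None:
        # A later verdict by the same user on the same statement replaces the earlier one.
        self.verdicts[(user, statement_id)] = is_correct

    def undo(self, user: str, statement_id: str) -> bool:
        # Returns True if there was a verdict to remove, False otherwise.
        return self.verdicts.pop((user, statement_id), None) is not None

    def stats(self, user: str) -> dict:
        # Count this user's verdicts by outcome.
        counts = Counter(v for (u, _), v in self.verdicts.items() if u == user)
        return {"confirmed": counts[True], "rejected": counts[False]}


ledger = VerdictLedger()
ledger.record("ExampleUser", "M1$aaa", True)   # made-up statement IDs
ledger.record("ExampleUser", "M2$bbb", False)
ledger.undo("ExampleUser", "M2$bbb")
print(ledger.stats("ExampleUser"))
```

Keying on the (user, statement) pair keeps undo trivial and prevents one user from counting twice on the same statement.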

Nice to have:

  • Lets users create campaigns (Similar to ISA, particularly useful during special occasions)
  • Extend to more than depict statements, to cover other media data, and machine-suggested image labels
  • Implement a method to ensure that edits made through the tool are legitimate (the strategy is to be chosen carefully with the mentors; options are listed below)

ensure-data-correct-methods.jpg (2×2 px, 403 KB)

Overview of how the tool would work

use-cases.jpg (2×2 px, 540 KB)

Possible database structure

Screenshot from 2020-03-31 18-58-05.png (686×814 px, 62 KB)

Timeline

The following timeline sets the deadlines. However, it is highly likely that we will achieve more than what's listed below.

Note that I’ve listed them with huge flexibility given the current situation.

4 May - 31 May (Community bonding period)

  • Create tool on Toolforge.
  • Create repo on Gerrit.
  • Create project on Phabricator.
  • Discuss implementation details with mentors.

1 June - 28 June (4 weeks)

  • Set up development environment.
  • Create the core part of the tool with OAuth login and the ability to get user details.
  • Add ability to retrieve statements from Commons and show to the user.
  • Ability to save changes.
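
To make the "retrieve statements from Commons" step concrete, the sketch below extracts depicts values from one entity as returned by the Wikibase `wbgetentities` API. It assumes the usual MediaInfo JSON layout (a `statements` map keyed by property ID, with `P180` as "depicts"). The function name and the sample entity are hypothetical, and no network request is made here.

```python
DEPICTS = "P180"  # Wikidata property ID for "depicts"


def depicted_items(entity: dict) -> list[str]:
    """Return the Q-ids of all depicts statements on one MediaInfo entity."""
    # An entity without statements may serialize them as [] rather than {},
    # so fall back to an empty dict for any falsy value.
    statements = entity.get("statements") or {}
    qids = []
    for statement in statements.get(DEPICTS, []):
        mainsnak = statement.get("mainsnak", {})
        # "somevalue"/"novalue" snaks carry no datavalue, so skip them.
        if mainsnak.get("snaktype") != "value":
            continue
        qids.append(mainsnak["datavalue"]["value"]["id"])
    return qids


# Hand-written sample in the assumed response shape (the M-id is made up).
sample_entity = {
    "id": "M12345",
    "statements": {
        "P180": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P180",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": "Q146"},
                    },
                },
                "rank": "normal",
            }
        ]
    },
}

print(depicted_items(sample_entity))  # ['Q146']
```

The returned Q-ids would still need labels looked up on Wikidata before being shown to the user.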

Phase 1 evaluation

29 June - 26 July (4 weeks)

  • Documentation and bug fixes.
  • User statistics page.
  • Write tests.

Phase 2 evaluation

27 July - 23 August (4 weeks)

  • Internationalization.
  • Writing documentation.
  • Additional features.

24 August - 30 August
Code submission and student final evaluation.

31 August - 7 September
Mentors submit final evaluations.

8 September
Results announced.

Participation

  • Work on and upload code to the repository every weekday, and sometimes on weekends too.
  • Be online on IRC during my working hours (I am usually very responsive as long as I'm up). We could use another medium of communication depending on the mentors' preference.
  • Use Phabricator to track tasks and progress.

Why Me (About Me and Past Experience)

  • I am a student from Hong Kong, currently studying Computer Science at Lancaster University, United Kingdom.
  • I am the maintainer of gabrielchihonglee-bot, which runs on Toolforge using pywikibot. It mainly performs edits on Commons (80k+ edits), is an adminbot on Chinese Wikivoyage, and sometimes runs on other wikis.
  • I am an admin on Chinese Wikivoyage, so I understand how Wikimedia projects work.
  • I am comfortable coding in C, Java and Python. I have a little experience with Flask. I am also familiar with git.
  • I aim to write clean code, as shown in the patches listed below.
  • I've set up and am maintaining several websites.
  • Why this task: it sits at the sweet spot between too hard and too easy for me, allowing me to learn while using my existing knowledge.
  • I will continue to maintain the tool after the GSoC program. I point this out because I have heard that many mentees abandon their projects after the program; as a trusted, long-term contributor to Wikimedia projects, I am unlikely to do so.

Related Tasks

Most of them are ISA-related. I initially started working on ISA just for this application, but I found it interesting (and a bit addictive), so I think I will continue contributing to ISA, and maybe to other tools, in the future. :)

Recently completed
T230942, T245759, T226306, T246657, T246652, T231831, T231193, T231466, T234526, T246651, T232434
(along with a few non task-dependent patches)

Waiting for code review
T225817, T234860, T226316, T228512

Event Timeline

@Gabrielchl Hi! Thanks for your proposal. I hope that you are already in touch with potential mentors of this project as it would be nice if you could get early feedback from them. If for some reason you are not able to reach out to them, please let me know!

Hi there, yes I am already in touch with them, thanks a lot!

Gabrielchl renamed this task from "Proposal: Image Data Verification Tool - GSoC 2020" to "Proposal: Media Data Verification Tool - GSoC 2020". Mar 16 2020, 10:07 PM
Gabrielchl updated the task description.

Hi @Gabrielchl, thanks so much for the proposal. It all looks very well thought out, so there is definitely nothing that needs adding or changing :)

I've added a couple of minor comments for future discussion in the Google docs version. @Eugene233, do you have anything further to add?

@Gabrielchl My suggestion would be to think of a descriptive name for the tool and explain why you think it fits. It would also be good to explain the nice-to-have features in more detail.

Will do! Thank you both for all the comments :)

Hello. This looks to me like a "middle ground" between ISA and the Wikidata games approach. Am I correct?

@Anthere: Hard to say without links explaining what exactly "ISA" and "the Wikidata games approach" are.

ISA is mentioned in the project description: https://tools.wmflabs.org/isa/
The two mentors were involved in ISA and, unless I am mistaken, Gabriel Lee has already contributed to its code.

Wikidata games : https://www.wikidata.org/wiki/Q17595556
And by "approach", I meant "delivery of already pre-triaged info and yes/no validation by a human".

But my question was more addressed to Gabriel Lee. I probably should have noted that. I am just trying to identify what are the shared elements.

Hi @Anthere, this tool was proposed WITHOUT ISA / Wikidata games in mind. However, we currently do have plans to allow ISA and this tool to integrate. For example, there might be campaigns in this tool to verify claims created in ISA campaigns. Prior to last week, I had never heard of Wikidata games. It does seem that this project overlaps somewhat with some of Wikidata games' functionality. However, here, we aim to create an easy-to-use platform.

The proposed tool does use the Wikidata games approach: it shows existing or machine-suggested depict claims to the user and has the user validate them.

A note to anyone who would like to learn more about the development progress:

Our team is using ClickUp, another project management tool, for progress tracking. We understand that others might want to follow the development progress, so we've set up a page on Commons, https://commons.wikimedia.org/wiki/Commons:Media_Data_Verification_Tool, where we will post updates at least bi-weekly.

Google-Summer-of-Code (2020) is over! I believe you have already documented your project here https://www.mediawiki.org/wiki/Google_Summer_of_Code/Past_projects#2020. If not, I would encourage you to do so. Also, is there anything else remaining in this task to address? If not, please consider closing this task as resolved.