PROPOSAL: To Design and Develop a tool to correct False depicts claims manually on Wikimedia Commons
Closed, Declined (Public)

Description

Profile Information:

Name: Ayush Shrivastav
Email: shrivastavayush32@gmail.com
IRC Nick: ayush_12 (freenode)
Github: AyushShri
Location: India
Typical Working Hours: 11 am - 6:30 pm IST (UTC+5:30) on weekdays. I will also work on weekends and overtime as required.

Synopsis:

Wikimedia Commons lets the community contribute by uploading freely usable media files. It has recently introduced a tool to add structured data to its media files. However, it has been observed that the structured data added is sometimes incorrect and needs to be verified. Hence this project focuses on building such a tool, named “WikiCommons Image Verification Tool”, to help ensure that the structured data provided on the Commons website is correct.
Currently, the idea is to verify the categories and image descriptions (depicts statements).
Mentors: @Eugene233 & @NavinoEvans

Deliverables:

Must have

  • Users sign in using OAuth.
  • Users will be able to search Wikimedia Commons categories and fetch the images in a category together with their respective depicts statements.
  • The results can be sorted in two ways: Recent and Random.
  • Let users review with “YES” or “NO” whether the image description is true or false.
  • Attach a rating to every review depending on the user's account level (example: Google Maps Local Guide levels).
  • Moderator sign-up.
  • Review by a moderator if an anomaly is encountered: Retain or Reject.
  • A page to see a user's past contributions.
  • During an anomaly, users whose vote matches the moderator's action (Retain or Reject) are awarded points, which in turn increases their account level.
  • Robust design of the tool.
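
As a rough illustration of the category search and the two sort orders, the request could be built against the MediaWiki Action API's `list=categorymembers` module. This is a minimal sketch, not the tool's actual code: the function names `category_files_url` and `order_files` are hypothetical, "Recent" is assumed to mean the order in which files were added to the category, and "Random" is assumed to be a client-side shuffle.

```python
import random
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def category_files_url(category, limit=50):
    """Build an Action API URL listing the files in a Commons category.

    Hypothetical helper: results are requested newest-first, i.e. by the
    time each file was added to the category.
    """
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": "file",        # only files, no subcategories or pages
        "cmsort": "timestamp",   # sort by time added to the category
        "cmdir": "desc",         # newest first
        "cmlimit": limit,
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def order_files(titles, mode="recent", rng=None):
    """Return titles as fetched ("recent") or shuffled ("random")."""
    ordered = list(titles)
    if mode == "random":
        # Assumption: random order is produced client-side by shuffling.
        (rng or random).shuffle(ordered)
    return ordered
```

The URL can then be fetched with any HTTP client; the file titles are found under `query.categorymembers` in the JSON response.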

Will Extend, if time Permits

  • Extend to additional image information.
  • Include hashtags for searching along with categories.
  • Integrate with the ISA tool to keep competitions fair during campaigns.
  • Use a Vision API to get machine suggestions for the image data.

Timeline:

The dates specified below are deadlines, but it is very likely that we will achieve more.

4 May- 31 May
Community Bonding Period: Create the repository on Gerrit and tasks on Phabricator, create the tool on Toolforge, learn about the various APIs, discuss the implementation with the mentors and get the environment ready.

1 June- 14 June ( 2 weeks )
Setup user sign-in using OAuth, fetch user details and create a past Contributions history page.

15 June- 4 July ( 3 weeks )
Build a user dashboard with the ability to retrieve Wikimedia Commons images and their descriptions from a category and to vote “YES” / “NO”. Add the ability to retain or remove claims automatically based on the average vote % and update the DB accordingly.

5 July- 18 July ( 2 weeks )
Build a moderator dashboard with the ability to give the final review in case of an anomaly and retain or remove the claim. Also award points to the user on a successful review by the moderator.

19 July- 25 July ( 1 week )
Bug fixes if any bugs are encountered, otherwise UI improvements.

26 July- 1 Aug ( 1 week )
Implement an administration dashboard to select and approve moderator sign-ups.

2 Aug- 8 Aug ( 1 week )
Overall Bug Fixes and other improvements.

9 Aug- 22 Aug ( 2 weeks )
Writing Documentation and test cases.

23 Aug- 1 Sept. ( 2 weeks )
Try to implement Extended Features.

2nd Sept.
Get Final Mentor review & Final submission.

Note: I will also have my semester exams in mid-June, during which I will be able to work less, but I assure you I will compensate for that time beforehand so that the project's progress is not hampered.
Apart from GSoC, I do not have any other commitments or internships scheduled during this period.

Participation:

  • I will maintain my repo and update and upload code every weekday.
  • I will be online on IRC and Zulip during my working hours to collaborate with the mentors.
  • I will use Phabricator for managing bugs and subtasks.
  • I will be available by email (Gmail) to be contacted when needed outside working hours.
  • I will write weekly reports.

Idea Specification and Implementation:

STEP 1- We can make a web interface where users can search the Wikimedia Commons categories.
STEP 2- We will use a rating system in which a rating is assigned to the respective image. This rating will depend on the user's level/experience, similar to the levels of Google Local Guides.
STEP 3- All the user has to do is click "Yes" or "No" when asked about the things (all structured data depicts, Wikidata items) they see in the images, and a rating will be attached to the image. For instance, if the uploader has set “trees are swaying” as the description while uploading, then we will ask the tool's users whether the image belongs to the correct category and has the correct description and structured data attached, and each answer will be recorded as a YES/NO.
STEP 4- If a user votes YES, the Yes stack flag will be updated with the rating, while if a user votes NO, the No stack flag will be updated with the rating. An average will be calculated for both votes, and a timestamp will be attached to it.
STEP 5- Once the vote count reaches the threshold value, we will update the database automatically according to the average vote %.
STEP 6- If the average YES and NO vote percentages are equal or within ± 2 % of each other and the number of votes ≥ (say) 50k, then we will flag those images and a cron job will trigger an alert to the moderator.
STEP 7- The moderator can then log on and RETAIN or REMOVE only those claims which were flagged for the moderator, and the database will be updated as per their decision.
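
Steps 4 to 6 can be sketched as a small weighted tally. This is only one possible reading of the steps, under stated assumptions: each vote is weighted by the voter's rating, "equal or ± 2 %" is taken to mean that the YES and NO percentages differ by at most 2 percentage points, and the names `Tally` and `decide` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Rating-weighted YES/NO totals for one claim (hypothetical structure)."""
    yes_weight: float = 0.0
    no_weight: float = 0.0
    votes: int = 0

    def add(self, is_yes, rating):
        # Each vote contributes the voter's rating to the matching side.
        if is_yes:
            self.yes_weight += rating
        else:
            self.no_weight += rating
        self.votes += 1


def decide(tally, threshold, margin=2.0):
    """Return 'pending', 'flag_for_moderator', 'retain' or 'remove'."""
    total = tally.yes_weight + tally.no_weight
    if tally.votes < threshold or total == 0:
        return "pending"                  # not enough votes yet
    yes_pct = 100.0 * tally.yes_weight / total
    no_pct = 100.0 - yes_pct
    if abs(yes_pct - no_pct) <= margin:
        return "flag_for_moderator"       # anomaly: votes too close to call
    return "retain" if yes_pct > no_pct else "remove"
```

For example, with a threshold of 2 votes, one YES from a level-3 user and one NO from a level-1 user gives 75 % YES, so the claim would be retained without moderator involvement.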

This process will be reliable and fast, as the moderator only needs to review the flagged images and every other claim is resolved by the users' average votes.

Q) What is the REQUIREMENT OF A MODERATOR ?

Ans- If there is an anomaly, in which the average YES and NO vote percentages are equal or within ± 2 % of each other and the number of votes ≥ threshold, then we will flag those images and the moderator will review only these images. Hence we will achieve more accurate results with less moderator effort.
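
The point award that follows a moderator decision could look roughly like the sketch below. The point value, the points needed per level and the function names `award_points` and `account_level` are all assumptions made for illustration; the proposal does not fix any of them.

```python
def award_points(votes, moderator_decision, points=10):
    """Return {user: points} for users whose vote matches the moderator.

    votes maps a user name to "YES" or "NO"; moderator_decision is
    "RETAIN" (claim is correct) or "REJECT" (claim is wrong).
    """
    if moderator_decision not in ("RETAIN", "REJECT"):
        raise ValueError("moderator_decision must be 'RETAIN' or 'REJECT'")
    # A YES vote agrees with RETAIN, a NO vote agrees with REJECT.
    matching_vote = "YES" if moderator_decision == "RETAIN" else "NO"
    return {user: points for user, vote in votes.items() if vote == matching_vote}


def account_level(total_points, points_per_level=100):
    """Map accumulated points to an account level, starting at level 1."""
    return 1 + total_points // points_per_level
```

A higher account level would then feed back into the rating attached to that user's future reviews.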

Q) If a user has already rated an image, WILL THAT IMAGE REPEAT?

Ans- No, every image will be shown to a user only once. I have also created a pseudocode snippet so that a user would not be shown the same image more than once (if he has already voted on that image).
Pseudo code is attached below-
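
The attached snippet is not reproduced in this text. As a rough illustration only, and not the original pseudocode, one way to avoid repeats is to keep a per-user set of images already voted on and skip those when choosing the next image. The names `next_image_for` and `record_vote` and the in-memory dictionary are hypothetical; a real tool would keep this in its database.

```python
def next_image_for(user, candidates, voted):
    """Return the first candidate image the user has not voted on, or None.

    voted maps a user name to the set of image titles already reviewed.
    """
    seen = voted.get(user, set())
    for image in candidates:
        if image not in seen:
            return image
    return None  # the user has reviewed every candidate


def record_vote(user, image, voted):
    """Remember that the user has voted on this image."""
    voted.setdefault(user, set()).add(image)
```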

Workflow of the Tool:

Use Case Diagrams:

1- User:

2- Moderator:

Wireframe Screens:

1- User Screen

2- Moderator’s Screen-
(If Anomaly arises)

The Working Graphical Prototype for the user Dashboard can also be found by clicking here.

About me:

I am currently in the 2nd year of my Bachelor of Technology degree at Pranveer Singh Institute of Technology, Kanpur, India, majoring in Computer Science Engineering.
I have been tinkering with code since my high school days. I love working on web apps and tools, and I have a keen interest in backend development. I am a patient learner and like to work in collaboration.
This is my first participation in GSoC. During the summer, GSoC and this project will be my first priority since I won't have any other commitments during this period.

I have been looking at and trying to understand Wikimedia for some time now, and have tried to contribute to some open issues. I have been thrilled to see the working methods of Wikimedia and its products at such an amazing level, and I think contributing to Wikimedia through this project would impact society in a positive manner. I am also excited and look forward to working with some amazing people from whom I can learn a lot.

Past Experiences:

  • I have experience working with Git, C/C++, HTML, CSS, JavaScript and its frameworks.
  • I also have experience with Node.js and use MySQL for databases.
  • I am also comfortable with Python and have a basic knowledge of Flask (which I shall improve during this internship).
  • I have also set up 2 major and 2 minor personal projects related to web development and am excited to learn more.
  • I am also an open source evangelist.
  • I have also organised a college-level workshop to introduce freshman students to Git and version control systems.

MicroTasks and Current Progress:

  • With Wikimedia, I have been trying to solve issues T232038 and T105637 and have also made a PR for it.
  • I also raised a PR in Gerrit, https://gerrit.wikimedia.org/r/#/c/labs/tools/Isa/+/588656/, updating the README instructions for setting up the ISA tool. I am also currently trying to solve another ISA issue: removing hidden Commons categories from showing in the ISA tool.
  • As a part of the project, I have also started to create the OAuth login, which is in progress.
  • Repo link- Github; the same repository will be updated soon with new commits which I have locally.

Why me ?

  • I have studied the required APIs and other prerequisites thoroughly and realised that this project lies within my doable range (not too easy and not too tough). I have also realised that I will learn a lot during this project.
  • Most importantly, I would be happy to make a tool which will affect the community in a positive way and will be used by millions of users.
  • I would also keep contributing to and working on this tool even after this GSoC program as a responsible maintainer and contributor, taking responsibility for further bugs and trying to implement new features.

Looking forward to an awesome learning experience.

Event Timeline

Ayushshri121 added a comment. Edited Mar 26 2020, 1:33 PM

Hello @Eugene233 & @NavinoEvans. Here is my proposal for GSoC 2020 for T245758. It would be of great help to me if you could provide your valuable feedback on it. I have also shared my Google Docs draft proposal. Please review it and provide your feedback.
Thank You
Ayush Shrivastav

Pavithraes updated the task description. Mar 28 2020, 7:25 PM

@Ayushshri121 Your proposal looks great! I especially like all the diagrams, wireframes and the prototype, it shows that you have a clear mental model. :) Just one suggestion, I believe the tool will be hosted on Toolforge, maybe you could include that in your proposal and/or timeline.

@Ayushshri121 I think your proposal has a detailed analysis of the proposed tool. The diagrams are self-explanatory. Maybe you could clearly state out the deliverables in a section...

Ayushshri121 added a comment. Edited Mar 28 2020, 8:12 PM

Thank you @Pavithraes for your feedback :) . I'll surely add a note about creating the tool on Toolforge and include it in my timeline as well.

Thank You @Eugene233 for your valuable feedback :) . I have already stated all my intended work in the Deliverables section. You can check the same above :)

Hence this Project focuses on building such a tool named “WikiCommons Image Verification Tool”

We don't call Commons "WikiCommons"

Thanks for the remark @zhuyifei1999. I had used that name as a short name for the said tool. Still, if you prefer, we can change the name of the tool.

Ayushshri121 updated the task description. Mar 30 2020, 9:52 AM

Hi @Ayushshri121, really nice work on the proposal :D The mock screenshots and diagrams are excellent, and really help to understand how everything works.
One thing that's missing is mention of the actual "depict statements" that refer to Wikidata items. For example, on https://commons.wikimedia.org/wiki/File:Playing_in_the_Nuba_mountains.jpg you can see the structured data tab shows it depicts "airplane (Q197)", "tree (Q10884)", "sand (Q34673)" etc. However, this doesn't affect any of the user flows you have described and can easily fit into the same UI design - it just amounts to some extra content that can be rated for keep/remove with questions like "Does this image depict a tree?".
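
To make the depicts statements concrete, here is a hedged sketch of pulling the depicted Wikidata item IDs out of a file's structured data and turning them into review questions. It assumes the entity JSON has already been fetched (for example via `wbgetentities`) and that its statements sit under a `statements` key grouped by property, with depicts being P180; the function names `depicted_items` and `review_question` are hypothetical.

```python
DEPICTS = "P180"  # Wikidata property "depicts"


def depicted_items(entity):
    """Return the item IDs (e.g. "Q10884") a file is stated to depict.

    Assumption: `entity` is the structured-data JSON for one file, with
    statements grouped by property ID under "statements" (or "claims").
    """
    statements = entity.get("statements") or entity.get("claims") or {}
    items = []
    for statement in statements.get(DEPICTS, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "no value" / "unknown value" statements
        items.append(snak["datavalue"]["value"]["id"])
    return items


def review_question(label):
    """Build the YES/NO question shown to the reviewer for one item label."""
    return f"Does this image depict: {label}?"
```

Each returned ID would still need its label looked up on Wikidata before being shown to a reviewer.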

Ayushshri121 added a comment. Edited Mar 30 2020, 7:31 PM

Thanks a lot @NavinoEvans for your valuable feedback :) . Yes, we can surely rate the said extra structured data for keep/remove by asking questions. Also, I think it will not be a problem to fetch the extra structured data attached to the image, and it can be easily implemented :) :D

Ayushshri121 updated the task description. Mar 31 2020, 7:37 AM
Ayushshri121 updated the task description. Mar 31 2020, 7:48 AM
Ayushshri121 updated the task description. Apr 28 2020, 9:27 AM
Pavithraes closed this task as Declined. May 5 2020, 6:57 PM

@Ayushshri121 We are sorry to say that we could not allocate a slot for you this time. Please do not consider the rejection to be an assessment of your proposal. We received over 100 quality applications, and we could only accept 14 students. We were not able to give all applicants a slot that would have deserved one, and these were some very tough decisions to make. Please know that you are still a valued member of our community and we by no means want to exclude you. Many students who we did not accept in 2019 have become Wikimedia maintainers, contractors and even GSoC students and mentors this year!

If you would like a de-brief on why your proposal was not accepted, please let me know as a reply to this comment or on the ‘Feeback on Proposals’ topic of the Zulip stream #gsoc20-outreachy20. I will respond to you within a week or so. :)

Your ideas and contributions to our projects are still welcome! As a next step, you could consider finishing up any pending pull requests or inform us that someone has to take them over. Here is the recommended place for you to get started as a newcomer: https://www.mediawiki.org/wiki/New_Developers.

If you would still be eligible for GSoC next year, we look forward to your participation!

This comment was removed by Ayushshri121.

@Pavithraes Thank you for reviewing my proposal. I would definitely like to learn more from this experience.
Also, as you said, I would be really thankful if you could give me a de-brief on my proposal, as it would help me learn and understand things more precisely. It would also help me improve and try again, while I continue contributing to WMF.
I have learned some really interesting things with WMF and would like to learn more.
Thank you so much.

@Ayushshri121 You've created a good application and thank you for your contributions! I see that you have also received feedback from your mentors, which is nice. :)

The student selected for this project (see proposal) seems to have made ~15 contributions and has been involved in the Wikimedia movement for some time. These proved to be an advantage for them. I'd suggest that for the next round, you focus on making a larger number of quality contributions during the application phase to improve your chances of selection. I'd also recommend that you continue making contributions to FOSS projects between now and the next season. This will not only count towards your past experience, but will also help you quickly understand new projects and make fast progress on them.

Looking forward to your participation in the next round!