
Outreachy/GSOC'17 proposal for Single Image Batch Upload
Closed, Declined (Public)

Description

If you are accepted, fields that are marked "(public)" will be displayed on a public acceptances page. If you prefer different information to be displayed publicly than you want to provide in this form, please provide both a private version visible only to the coordinators and mentors of Outreachy, and a public version to be displayed publicly.

Name (public):
Kamakshi Suri

Do you meet the eligibility requirements outlined at https://wiki.gnome.org/Outreachy#Eligibility (if no, explain why not)?
Yes

Preferred pronoun (e.g. she, he, they):
She

E-mail address:
kamsuri5@gmail.com

IRC nick (public):
kamsuri

Internet presence (e.g. web page, blog, portfolio, GitHub, Twitter, LinkedIn links) (blog will be displayed publicly):
Github: https://github.com/kamsuri
Linkedin: https://www.linkedin.com/in/kamakshi-suri/

Location (city, state/province, and country) (public):
City: New Delhi
State: Delhi
Country: India

Education completed or in progress (include university, major/concentration, degree level, and graduation year):
University: Netaji Subhas Institute Of Technology, New Delhi
Education in progress: Junior Undergraduate
Major: Computer Engineering
Degree Level: B.E
Graduation Year: 2018

How did you hear about this program?
A friend of mine participated in Outreachy (Round 13). Also, many senior students at my university have performed well in GSOC.

Are you applying for Google Summer of Code and, if so, with what organization(s)?
Yes, I am applying for Google Summer of Code with the Wikimedia organization for the Single Image Batch Upload project.

Please describe your experience with the organization's product as a user and as a contributor (include the information, as well as a link or an attachment, for the required contribution you made to the project you are interested in here):

As a user, I find the idea of the project fascinating, as it will ease the work required for users to upload images from GLAMs to Wikimedia Commons.
It offers several advantages:

  1. No manual uploading is required: an image is uploaded with a single button press.
  2. Flexibility to upload multiple images at the same time.
  3. No technical knowledge is required.
  4. No need to deal with permissions.
  5. Everything is automated, from tag fetching to title generation; there is no need to enter any details.
  6. It removes the hassle (downloading, updating metadata, etc.) of uploading an image from an external source.

As a contributor, I find this project both challenging and exciting. It is challenging because it will bring out the best in me, and exciting because it will allow me to work on something new under talented mentors. The project has no existing codebase, so I am free to explore the options and select the best of them. I love automation because it eases one's work, and this project lets me contribute to exactly that. The main problems this project will solve are: enabling batch uploads, preventing duplicate images from being uploaded to Commons under different names, removing the need to write a separate upload script for each GLAM, and making the upload process modular.

Please describe your experience with any other FOSS projects as a user and as a contributor:
I have contributed to the FirstAide-Web app by the Systers organization, a handy tool for people in emergency situations.

Please describe any relevant projects that you have worked on previously and what knowledge you gained from working on them (include links):

Online-Treasure-Hunt: I developed a ready-to-use portal for hosting an online treasure hunt. I built it for a competition hosted by the technical society of my university. It is written in native PHP and is hardened against various security flaws. From this project I learnt how to deal with problems encountered in live, running code. I also gained knowledge of various security issues and improved my coding practices to make my code efficient and flexible.
Link: https://github.com/kamsuri/Treasure_hunt

Survive: A web portal for a disaster management Android application. It is designed for volunteers, who can register and receive messages. I also designed an API through which information is exchanged between the web portal and the Android application. It is built in PHP. It was the project of my first hackathon; through it I learnt how to deliver a finished product within a stipulated time and realized the importance of modular and efficient coding.
Link: https://github.com/kamsuri/Survive

Face-Recognition Tool: This is a simple Python script which detects faces in an image. This project was a good learning experience for me; I mainly learnt about OpenCV and Python through it.
Link: https://github.com/kamsuri/Face-Recognition

Hospital-Hacks: An interface built to reduce the gap between hospitals and patients. It aims to automate the thought process that goes into looking for a hospital near one's home that suits one's needs. It gives the user access to the live situation at the hospital. The doctor and reception interfaces are built in PHP, and there is an Android application for users.
Link: https://github.com/kamsuri/Hospital-Hacks

Moksha-Website: This is a portal for the inter-college festival of my university. I coded a few APIs for event registration and contestant registration for this website. Through this project I learnt to code modularly and gained experience working with a large team.
Link: https://github.com/kamsuri/MokshaWebsite

Content-Management-System: I also designed and coded a content management system for the computer society of my university. It was developed to streamline content control for the website and social media platforms. This project helped me learn how to integrate various APIs into my code efficiently. Through it I also learnt to deal with the permission issues involved in accessing external sources.
Link: https://github.com/kamsuri/Content-Management-System

What project(s) are you interested in (these can be in the same or different organizations)?
I am interested in the Single Image Batch Upload project, which is under the Wikimedia organization.

Who is a possible mentor for the project you are most interested in?
Mentor: Basvb
Co-mentor(s) : Tom29739, zhuyifei1999

Please describe the details and the timeline of the work you plan to accomplish on the project you are most interested in (discuss these first with the mentor of the project):

Synopsis
The main objective of this project is to ease the workflow required to upload images released by GLAMs (an acronym for Galleries, Libraries, Archives, and Museums) to Wikimedia Commons. It also aims to overcome a problem with batch uploads: many irrelevant images get uploaded because users do not want to upload each image manually. The project will build a Flask platform from which users can upload a single image or a small set of images at a time to Wikimedia Commons through a well-structured process and in the minimum possible time. It will also deliver a robust set of APIs to support integration on the GLAM's end.

Project Goals

  • Precise Flask design for the backend and frontend of the framework.
  • Setting up the frontend for the platform.
  • Coding the upload script, which includes the following:
    • Getting access to the GLAM's API
    • License verification of uploaded files
    • OAuth permissions for authentication of users
    • Fetching the image from a URL
    • Building the final upload template from metadata mappings
  • Setting up a structure on Wikimedia Commons, which will include the following:
    • Making a landing page for the product
    • Adding categories for GLAMs
  • Designing metadata mappings for the GLAMs' APIs.
  • Working on date parsing, feedback, a file title generator and a few more default functionalities.
  • Connecting with communities to pitch our product to them.
  • Technical documentation and user documentation.
  • Deployment of the framework on Toollabs.
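
The "metadata mappings" goal can be pictured as a small lookup table per GLAM that says which field of the GLAM's API record feeds which field of the Commons upload template. The sketch below is only an illustration under assumptions: the record layout, the field names, `EXAMPLE_MAPPING`, `get_path` and `apply_mapping` are all hypothetical and are not taken from any real GLAM API or from the project's code.

```python
from typing import Any, Dict

# Hypothetical mapping for one GLAM:
# upload-template field -> dotted path inside that GLAM's API record.
EXAMPLE_MAPPING: Dict[str, str] = {
    "description": "title",
    "author": "creator.name",
    "date": "created",
    "source": "links.web",
}


def get_path(record: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'creator.name'; return None if a step is missing."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def apply_mapping(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, str]:
    """Build the template fields for one record; missing values become empty strings."""
    fields: Dict[str, str] = {}
    for template_field, path in mapping.items():
        value = get_path(record, path)
        fields[template_field] = "" if value is None else str(value)
    return fields
```

Keeping the mapping as plain data rather than code is one way a new GLAM could be supported without writing a new upload script, which is the modularity the proposal aims for.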

Proposed Flow Of Framework

[Figures: Upload.jpg and Process.jpg, diagrams of the proposed flow of the framework]

Timeline

May 4, 2017 – May 30, 2017

This phase will include community bonding along with investigation and design planning of the tool.

  • Study the requirements of the project in more depth to get a clearer picture, and discuss them with the mentors.
  • Study existing tools for uploading images to Wikimedia Commons, to compare different techniques and methods for achieving the same goal.
  • Get Toollabs access and try it out to understand how it works, which will help with deployment.
  • Learn more about metadata fields and the various techniques for fetching them.
  • Investigate the needs of GLAM users further, to design the tool accordingly.
  • Compile a list of GLAMs.
  • Prepare Phabricator tasks and divide the project into multiple phases.
  • Design a proper workflow and finalize a precise design for the frontend and backend.

Minimum Viable Product (MVP)
The minimum viable product aims to complete the following tasks before the end of June:

  • Designing a simple UI for the platform.
  • A tool to generate the upload template for a file using the metadata mapping of the selected GLAM.
  • A working upload script that can upload files to Wikimedia Commons using OAuth permissions, along with license verification of the uploaded files.

Week 1 (June 1 – June 7)
In the initial week, I will focus on getting the following functionality working:

  • Set up a quick and simple UI
  • Identify the selected GLAM from the hash table
  • Call the GLAM's API and get an API access key
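
The "hash table" lookup in Week 1 could be as simple as a dictionary keyed by a short GLAM identifier chosen in the UI. The following is a minimal sketch under assumptions: `GlamConfig`, `GLAM_REGISTRY`, `lookup_glam`, the identifier, the URL and the category name are all hypothetical placeholders, not real endpoints or existing Commons categories.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GlamConfig:
    name: str
    api_base_url: str  # placeholder URL, not a real endpoint
    category: str      # Commons category the uploads would land in (hypothetical)


# Hypothetical registry keyed by the identifier the user selects in the UI.
GLAM_REGISTRY: Dict[str, GlamConfig] = {
    "example-museum": GlamConfig(
        name="Example Museum",
        api_base_url="https://api.example.org/v1",
        category="Images from Example Museum",
    ),
}


def lookup_glam(glam_id: str) -> GlamConfig:
    """Return the configuration for a GLAM, or raise a clear error for unknown ids."""
    try:
        return GLAM_REGISTRY[glam_id.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown GLAM identifier: {glam_id!r}") from None
```

Raising a `ValueError` for unknown identifiers lets the web layer turn the failure into a readable message instead of a server error.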

Week 2 (June 8 - June 14)
This week will deal with the following aspects:

  • Selection of image(s)
  • Dealing with Licenses
  • Writing user and technical documentation and testing of the code.

Week 3 (June 15 – June 21)
This week will focus on the following:

  • Getting URL of image
  • Download image and check if it already exists
  • Collecting tags and generating filename for the image.
  • Build final template for upload file using metadata mappings
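
Two of the Week 3 steps can be sketched without any network access. For the duplicate check, one option is to hash the downloaded bytes with SHA-1 and ask the MediaWiki API for files with the same hash via `list=allimages` and its `aisha1` parameter; the code only builds the query parameters and does not send the request. For the filename, the character blacklist and the "title - GLAM - identifier" pattern are assumptions made for illustration, not a naming rule agreed with the mentors. The helper names are hypothetical.

```python
import hashlib
import re
from typing import Dict

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of the raw file bytes."""
    return hashlib.sha1(data).hexdigest()


def duplicate_query_params(data: bytes) -> Dict[str, str]:
    """Parameters for a GET request to COMMONS_API listing files with the same hash."""
    return {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex(data),
        "format": "json",
    }


def make_filename(title: str, glam_name: str, identifier: str, extension: str = "jpg") -> str:
    """Build a file title; the pattern and the character blacklist are assumptions."""
    # Replace characters that are problematic in page titles with spaces.
    cleaned = re.sub(r"[#<>\[\]|{}:/\\]", " ", title)
    # Collapse runs of whitespace left behind by the replacement.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return f"{cleaned} - {glam_name} - {identifier}.{extension}"
```

Including the GLAM's own identifier in the name is one way to keep titles unique when two works share the same title.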

Week 4 (June 22 – June 28) - Week 5 (June 29 – July 4)
This will be the final phase of coding the upload script.

  • Upload file using OAuth permissions.
  • Testing and documenting of upload script.
  • Deployment of MVP on Toollabs
  • Writing user documentation and testing the MVP.

By the end of June, the Minimum Viable Product will be complete and tested.

Week 6 (July 5 – July 11) - Week 7 (July 12 – July 18)
From July onwards, I will start working on the core extensions.

  • Setting up a structure on Wikimedia Commons, creating some categories and sending proposals to a few communities.
  • Designing a few mappings along with default functionalities such as date parsing, a title generator, connecting to Wikidata elements, etc.
  • Testing those mappings with the upload script.
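
As an illustration of the date-parsing default functionality, the sketch below tries a short list of formats in order and normalises the result to an ISO-style string. The list of formats and the `parse_date` helper are assumptions made for this example; real GLAM metadata will contain many more variants (ranges, "circa" dates and so on), which this sketch simply reports as unparseable.

```python
from datetime import datetime
from typing import Optional

# Formats tried in order; an illustrative assumption, not a complete list.
KNOWN_FORMATS = ("%Y-%m-%d", "%d-%m-%Y", "%d/%m/%Y", "%Y")


def parse_date(raw: str) -> Optional[str]:
    """Return 'YYYY-MM-DD' (or 'YYYY' for year-only input), or None if unparseable."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        # Keep year-only precision instead of inventing a month and day.
        return parsed.strftime("%Y") if fmt == "%Y" else parsed.strftime("%Y-%m-%d")
    return None
```

Returning `None` rather than guessing leaves room for the tool to ask the user to fill in the date by hand.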

Week 8 (July 19 – July 25)

  • Connecting with communities that seem interested.
  • Working on feedback to extend the default functionalities.

Week 9 (July 26 – Aug 1) - Week 10 (Aug 2 – Aug 8)

  • Designing a few more mappings along with default functionalities.
  • Testing those mappings with our upload script.
  • Working on improving the front end.

Week 11 (Aug 9 – Aug 15)

  • Completing user and technical documentation.

Week 12 (Aug 16 – Aug 22) - Week 13 (Aug 23 – Aug 30)

  • This phase will involve rigorous testing, making the required changes, and fixing any bugs encountered.
  • At the end of the internship program, the final report will be submitted for evaluation.

| Tasks to be completed | Timeline |
| --- | --- |
| Community Bonding Period: get a better understanding of the project, explore existing tools, get familiarized with Toollabs and get access to it, prepare the workflow and a precise design layout for the frontend and backend. | 04/05/17 - 30/05/17 |
| Setting up a simple UI and coding the upload script into a proper working condition | 01/06/17 - 25/06/17 |
| Testing the MVP and deploying it on Toollabs (Phase 1 evaluation) | 26/06/17 - 04/07/17 |
| Setting up the structure on Commons, designing metadata mappings along with default functionalities | 05/07/17 - 18/07/17 |
| Connecting with communities and working on feedback | 19/07/17 - 23/07/17 |
| Designing a few more mappings and improving the front end (Phase 2 evaluation) | 24/07/17 - 08/08/17 |
| Completing user documentation | 09/08/17 - 15/08/17 |
| Testing and updating the deployment | 16/08/17 - 22/08/17 |
| Code freeze and final report submission (Final evaluation) | 23/08/17 - 30/08/17 |

Micro-Tasks Carried Out :

Participation
As MediaWiki uses Phabricator, a set of powerful tools for managing bugs and tasks, I also intend to use it for tracking bugs and features. It is also helpful for getting feedback from people who are part of the organisation. All bugs and features will have tasks linked to the project, which will allow easy tracking and monitoring.
The codebase will also use Git for review and workflow management. I'm comfortable using Git for project development and management.
I can be contacted by email or IRC. I also intend to use my new blog to share my experience of working on the project, and I will try to update it weekly with new posts.
I believe IRC and mailing lists are great places to seek help. It would also be great to get in touch directly with my mentors via email if possible.
If I get selected, GSOC will be my main focus during the summer.

About Me

  • I'm a 19-year-old junior undergraduate at Netaji Subhas Institute Of Technology, pursuing a Bachelor of Engineering in Computer Sciences.
  • I enjoy working on open source technologies and love to contribute.
  • I like to work with new technologies and explore new areas. I have good experience of working in teams.
  • It is always a learning experience to understand practical development at a large scale. I also enjoy working on competitive programming problems and learning algorithms.

Experience

  1. Working as Director of Web Operations at CSI NSIT.
  2. Worked as a front-end designer at Ecounsellors.in for 6 months.
  3. Academic courses: Object-oriented programming, Algorithm design, Operating systems, Web technologies

Skills
- Programming Languages: C++, PHP, JavaScript, HTML, Python, CSS
- Frameworks and Libraries: jQuery, Bootstrap
- Operating Systems: Linux, Windows
- Tools: Git, Adobe Photoshop
- Database: MySQL

Will you have any other time commitments, such as school work, exams, research, another job, planned vacation, etc., between May 30, 2017 and August 30, 2017? Please provide exact dates for these commitments and the number of hours a week these commitments take.
I have my college vacation from May 30 to July 25. I am enrolled in a college internship programme, mandatory for all students, from May 29 to July 21, but GSOC will be my main focus if I get selected. I will be available for a minimum of 5 hours (7:00 pm to 12:00 am Indian Standard Time) on weekdays and a minimum of 8 hours (11:00 am to 7:00 pm Indian Standard Time) on weekends. I will work on this project for a minimum of 40 hours a week as per the above schedule. My internship programme and classes will take a maximum of 23 hours a week.

If a student, please list the courses you will be taking between May 30, 2017 and August 30, 2017, how many credits you will be taking, and how many credits a full-time student normally takes at your school. Please provide a link or upload your program's suggested curriculum by semester, which includes the suggested number of credits in each semester. Please provide a link or upload your school's academic calendar.

Curriculum: http://nsitonline.in/student-resources/academics/coe/syllabus-coe?download=195:COE
A full-time student normally takes 27 credits per semester, but a student enrolled in a programme is exempt from the project credits, which amount to about 12 credits (3 credits × 4 projects) per semester.
Academic Calendar: https://drive.google.com/open?id=0B0KJtMR93IZaZjhpeG1Ic1hNaXc

Event Timeline

@Basvb, @tom29739, @zhuyifei1999, please review my Outreachy proposal and give your feedback.

Kamsuri5 updated the task description.

Hi Kamsuri5, thanks for the nice and thorough proposal. I'll be giving some feedback as requested; hopefully you can use it to make an even better proposal.

  1. First of all maybe it is good to provide some more structure in the proposal here (see some of the other proposals and the style template). Different levels of headings and start with the most important things (projects goals/personal goals, project overview/timeline) and end with your background (past experience)
  2. Small remark: MVP means minimum viable product instead of "Minimum Viable Project".
  3. You could upload your design sketches to Phabricator and link (some of) them to support your story. May 4 – May 30 is the time for community bonding; this is a good time to get to know the project and the people behind it better. It is also a good time to request access to Tool Labs and to work together on a precise design of what has to happen (including making Phabricator tasks and describing the different steps). This will allow you to start directly on the 1st of June without having to wait on anything or investigate a lot after the start of coding.
  4. Try to be as explicit as possible in what you want to do. An example: Instead of writing: "I’ll do some reading and get to know the exact skill required to start with this project and work on it" you can better write: I'll be reading X and Y and will work on skill Z and A. The same holds for "some microtasks".
  5. Week 1: I don't think it is a good idea to start with the front end, starting with a very simple front end for the MVP in month 1 and focussing on the back end seems better to me. Then in month 2 there can be more focus on the front end.
  6. I think it is a good idea to quickly try to get things working on toollabs, this will be something that will likely cause some difficulties so trying a bit in May could be a good idea.
  7. Please ensure that you test and document your code throughout the whole project, and not just at the end. Doing testing and documentation at the end will cost more time because you are less familiar with what you did, and if something takes longer than planned there won't be any time left for this.
  8. Week 6 and 7. I'm not sure whether it is a good idea to focus on adding GLAMs for two whole weeks. Maybe it is a better idea to combine this and week 9 and 10. For example week 6 and 7 you work on 2 metadata mappings and date parsing + connecting to wikidata (or any other combination) and in week 9 and 10 you again do a few mappings and some default functionality.
  9. Week 8: For a button on a GLAMs website we need to get a GLAM involved who wants to do this. I think this is not that easy to achieve and it is very dependent on external factors.
  10. Community engagement: maybe a good idea would be to use week 8 for setting up a structure on Wikimedia Commons, we need to create some categories for images to land in. But we also need a location where people can request new metadata templates and changes to templates. If we have a MVP ready we can show this to some communities and allow them to test it.
  11. Contact with GLAM. I can try to set up a meeting (Skype?) with some GLAM employees (likely Dutch as that is where I know the GLAMs) and we can ask them to look at MVP + ask them how they think about a button on their website to use the tool with.
  12. Time constraints: How many hours per week will the internship be? And do you mean that you will do the internship during the week? And from 25 July to 30 August will you have classes again? How many hours in the week will you have to work on your classes? Can you please list the courses, their credits and the full number of credits for a study year here (the link doesn't work for me). Please realise that you should be able to commit 40 hours a week from 1 June to 30 August to this project; if you have more than 20 hours of other commitments this will be very difficult and I'd like to see a very good plan on how you plan these things together.
  13. Communication: Please write a small part on how you plan to communicate with us (how often, where do you receive feedback). From the template: "We don't just want to know what you plan to accomplish; we want to know how. Briefly describe your work style: how you plan to communicate progress, where you plan to publish your source code while you're working, how and where you plan to ask for help. (We will tend to favor applicants that demonstrate a clear vision for what it means to be an active participant in our development community.)"
  14. Can you list the microtasks you completed (including a link to your Commons account for the image upload).

Quite a few pointers altogether, hopefully they can help you in improving your proposal even further. I like the ambitious MVP in the first month and hope that from there we can extend it into a really nice tool in month 2 and 3. After the Outreachy proposal deadline you can still change the proposal here/for GSOC so I think it is a good idea to focus on the most important things first. I'm looking forward to seeing your final proposal!

@Basvb thank you so much for such detailed and thorough feedback. I have made the required changes. Please review my updated proposal; I hope it fulfils all the requirements. I hope to have your feedback on this as well.

@Kamsuri5: I see you created another proposal: T161782. Do you want to have two separate proposals for GSOC and outreachy? Or is T161782: GSOC'17 Proposal for Single Image Batch Upload the new proposal and do I no longer have to look at this proposal? If that is the case please close this (T161649) task.

@Basvb Don't I have to submit different proposals for the two programs? The project schedule is the same in both proposals; they differ only in structure, as Outreachy has its own template. If one proposal will serve for both programs, I will close one of the tasks.

We can leave it like this. Yes, you have to submit a proposal to both Outreachy and GSoC. I think you are correct that for both of those you have to send in a different proposal. However, we also have a proposal here on Phabricator which we use from the Wikimedia side. Personally I think it is better to have 1 proposal here if they are almost the same. It gives a bit more clarity and saves all of the people who have to review the proposal from reading two proposals.

Alright, I will close the GSOC'17 proposal, as this is the more descriptive one.

Mind if I bold the questions? It's kind of hard to read with everything mixed together.

@zhuyifei1999 I have made the questions bold; I hope it will now be easier for you to review the proposal.

@Basvb and @zhuyifei1999 I have studied the Commons license terms and conditions. I uploaded images via the upload script of pywikibot and, as far as I have configured it, it does not deal with copyright; that is why my earlier images were deleted. The upload wizard of Commons asks for copyright permission, so the image I uploaded with it was not deleted. I have researched a bit and found a few copyright tags (https://commons.wikimedia.org/wiki/Commons:Copyright_tags) which I think we can use to deal with this problem.

@Kamsuri5: Please make sure that you have also submitted your proposal to GSOC on their website (https://summerofcode.withgoogle.com); I currently do not see your proposal there. Make sure to also set your proposal as final. I'll try to give some more feedback on the proposals (and image uploads) but this might be after the GSOC/Outreachy deadlines.

@Kamsuri5: Yes if you do an upload with any of the upload scripts in pywikibot you have to create the full wikitext for the image description yourself. This includes an information template and a license template (template meaning wikitemplate).
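
To make the point about the description wikitext concrete, here is a minimal sketch of how a script might assemble it from an information template and a license template. The `build_wikitext` helper and the sample values are hypothetical; `{{Information}}` and the `{{int:filedesc}}` / `{{int:license-header}}` headings follow common Commons practice, and the license template name is passed in by the caller because the right license depends on the individual file.

```python
def build_wikitext(description: str, date: str, source: str, author: str,
                   license_template: str) -> str:
    """Assemble a file description page: an Information template plus a license section."""
    lines = [
        "== {{int:filedesc}} ==",
        "{{Information",
        "|description=" + description,
        "|date=" + date,
        "|source=" + source,
        "|author=" + author,
        "}}",
        "",
        "== {{int:license-header}} ==",
        # The caller supplies the license template name, e.g. "cc-by-sa-4.0".
        "{{" + license_template + "}}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample values are placeholders for illustration only.
    print(build_wikitext("Windmill", "1901", "https://example.org/item/1",
                         "Jane Doe", "cc-by-sa-4.0"))
```

The resulting string is what would be passed as the page text when uploading, for example through one of the pywikibot upload scripts mentioned above.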

Some questions on your available time still from my side:

  • After the 25th of July will you have classes and is your internship finished by then?
  • You state "due to my internship programme which i will cover on weekends" but you say you have 6 hours on weekdays and 10 hours on weekends for GSoC. Is your internship during weekdays or weekends?
  • You do not have to dedicate 50 hours a week to GSOC, 40 hours is what is asked for.

In the other aspects the proposal looks good and clear.

Kamsuri5 renamed this task from Outreachy proposal for Single Image Batch Upload to Outreachy/GSOC'17 proposal for Single Image Batch Upload. Apr 2 2017, 10:15 PM
Kamsuri5 updated the task description.

@Basvb but the upload can be done without a wikitemplate as well; the description is the only required field, and it is fetched from the title itself. I will try an upload with a wikitemplate as well.

  1. After the 25th, my internship programme will be finished and my classes will resume.
  2. My internship will be on weekdays only and will take approximately 22-23 hours a week.
  3. According to my daily schedule, I will be able to contribute 50 hours a week. But if you want me to specify the schedule for only 40 hours, I will make the required changes.
Mvolz subscribed.

Just a reminder that this is the final day to submit your final proposal on summerofcode.withgoogle.com

@Mvolz thanks for the reminder; I have submitted the proposal.

Thank you very much for your proposal. We had to pick between multiple suitable candidates for 1 place and ended up selecting another candidate. I sent you an email with more information and some good points and potential points for improvement. I hope to see you around, either at Wikimedia coding or in one of the next rounds.

Aklapper changed the task status from Resolved to Declined. May 7 2017, 4:05 PM

@Basvb: That makes this task declined though. :)