IRC nickname on Freenode: megha213
Github Profile: https://github.com/MeghaSharma21
Location (country or state): Chandigarh (UT), India
Typical working hours (include your timezone): 12:00 - 02:00 (UTC+5:30)
Short summary describing your project and how it will benefit Wikimedia projects
As we all know, Wikipedia is a community-powered project whose quality and quantity largely depend on its contributors. But sometimes people aren’t able to find articles that both need work and match their interests. Also, since Wikipedia is so big and vast, newcomers usually get lost in it.
Hence, I’ve taken up this project of ‘Building a worklist tool for campaigns and in-person editing events’.
With the help of this tool, we’d be able to create, share and modify worklists, which will facilitate collaboration on articles that need work. This tool will also enable people to work on articles that fall in their areas of interest. All in all, we’ll be able to encourage more contributions by providing an intermediate platform.
The Wikipedia community will benefit from this project in the following ways:
- By using this tool, people will be able to collaboratively work on articles that need contributions.
- The worklists on the tool can be used for campaigns, in-person editing events and other similar activities.
- Through this tool, people can look for articles that fall in their area of interest and contribute to them.
- This tool can also provide a good starting point for newcomers.
Have you contacted your mentors already?
Describe the timeline of your work with deadlines and milestones, broken down week by week. Make sure to include time you are planning to allocate for investigation, coding, deploying, testing and documentation
During the internship, I’ll be following the Agile model of the SDLC. The internship will be broken down into sprints of 2 weeks each. Each sprint will consist of one full SDLC cycle unless it’s an epic. Hence, continuous testing and documentation will be done.
For every task, there will be Acceptance Criteria (AC). A task will be considered complete only after its AC is satisfied.
- For coding tasks, the AC will be: proper UTs are in place, and code review and beta testing are done.
- For design tasks, the AC will be: the design document adheres to the requirements and is approved by the mentors.
- Usability testing will be performed with representative users. Only after that will a milestone be considered complete.
The solution that I’ve designed has been explained in detail below:
The database schema for the tool has been framed as a microtask. Link: https://github.com/MeghaSharma21/WorklistTool-GSoC-2018/pull/1
I’ll be using Django for the backend and ReactJS for the front end.
The tool will have 5 web pages -
- Main page : On this page, the user will be presented with these options - 1. Create Worklist, 2. Open My Worklists and 3. Search Worklists. (1 and 2 will be available only to logged-in users.)
- Create Worklist page : This page will contain a form for creating a worklist. The user needs to provide the following details - 1. Name, 2. Theme, 3. Description, 4. Option to add articles to the list (same as "Add articles to existing worklists", except that it will succeed only if the worklist itself is created successfully) and 5. Option to add the articles associated with a PetScan query to this worklist, by providing the PetScan query ID. All of this information, along with the user's username and creation_date, will be stored in the database against the unique name and ID of the newly created worklist. Since a worklist's name has to be unique, we won't allow the user to enter a name that already exists in our database. This check will run while we validate the information entered by the user, and it won't be limited to exact string matching; it will also catch cases where - 1. the new list's name differs from an existing one only by case, or 2. the new list's name is a plural/singular form of an existing one (these are the cases I've thought of so far; I'll extend the list as more ideas come up).
I’ve thought of having some pre-populated themes under which different lists can be categorized. This will make searching for lists easier. These themes can be decided in the requirements phase with the help of users.
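The name-uniqueness check described above can be sketched as below. This is a minimal illustration, not the final implementation: the helper names are my own, and in the real tool the comparison would run against the Worklist table rather than an in-memory list. The plural handling here only strips a trailing "s"; a proper inflection library could be substituted later.

```python
# Sketch of the worklist-name validation: reject names that collide with an
# existing list by case or by a simple plural/singular difference.
def normalize_name(name):
    """Lower-case the name and strip a trailing plural 's' so that
    'Science Articles' collides with 'science article'."""
    n = name.strip().lower()
    if n.endswith("s"):
        n = n[:-1]
    return n

def is_name_taken(candidate, existing_names):
    """Return True if `candidate` matches any existing list name after
    normalization (case-insensitive, plural-insensitive)."""
    target = normalize_name(candidate)
    return any(normalize_name(n) == target for n in existing_names)
```

In the Django form's clean method, `existing_names` would come from a query over the Worklist table.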
- Search lists page : Here, the lists will be displayed in a tabular format where each row contains the list’s name, the number of articles in it and the number of editors working on it. On this page, functionality to sort by name, date_created, date_updated, number of articles and number of editors will be provided. The user will also be able to filter the lists by name and theme.
- Open My Lists page : It'll be similar to the 'Search Lists' page except that only the user’s own lists will be retrieved from the DB. The user needs to be logged in to see this page. The user will see the following -
- Worklists created by the user : Since we've created an index on the created_by attribute of the Worklist table, we'll be able to fetch all the worklists created by the user without scanning the whole table.
- Tasks created by the user : Since we've created an index on the created_by attribute of the Task table, we'll be able to fetch all the tasks created by the user without scanning the whole table.
- Tasks claimed by the user : Since we've created an index on the claimed_by attribute of the Task table, we'll be able to fetch all the tasks claimed by the user without scanning the whole table.
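The indexed lookups behind the three sections above can be sketched with SQLite (the production tool would use Django's ORM, which issues equivalent queries). Table and column names mirror the proposed schema but are assumptions for illustration.

```python
import sqlite3

# In-memory stand-in for the proposed Task table, with the indexes the
# "Open My Lists" page relies on.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        worklist_id INTEGER,
        created_by TEXT,
        claimed_by TEXT
    );
    CREATE INDEX idx_task_created_by ON task(created_by);
    CREATE INDEX idx_task_claimed_by ON task(claimed_by);
""")
conn.executemany(
    "INSERT INTO task (worklist_id, created_by, claimed_by) VALUES (?, ?, ?)",
    [(1, "megha", None), (1, "alice", "megha"), (2, "megha", "alice")],
)

# Both lookups are satisfied by an index instead of a full table scan.
created = conn.execute(
    "SELECT id FROM task WHERE created_by = ?", ("megha",)
).fetchall()
claimed = conn.execute(
    "SELECT id FROM task WHERE claimed_by = ?", ("megha",)
).fetchall()
```

In Django, the same lookups would be `Task.objects.filter(created_by=user)` and `Task.objects.filter(claimed_by=user)`, with `db_index=True` on those fields.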
- Lists page : We'll fetch all the tasks belonging to a worklist by querying the tasks with worklistId = <worklist-page-id>. This will be fast and won't require scanning the whole table because <worklistId, articleId> is the primary key. The tasks will be shown in tabular form, and every task will be a hyperlink leading to its task page. Only attractive/repulsive metrics related to a task will be shown on this page. These can be discussed, but for the first iteration I'm thinking of showing the task's status (so that the user only investigates open tasks), the effort involved (so that the user only investigates tasks of appropriate difficulty) and the article's average page views. Status can take 3 values - Open (needs to be worked upon), Claimed (is being worked upon) and Closed (has been worked upon). Effort can be Low, Medium or High, so that people can judge how much effort they need to put in to complete the task. On this page, functionality to sort by page views and date_updated will be provided, and the user will be able to filter tasks by name, status and effort. On this page, the user will be able to do the following:
Add articles to existing worklists
An existing worklist will already have its entry in the database. The user can only add articles to the worklist, not remove them (though a task can be closed after completion). There'll be an "Add article to worklist" option on the worklist page, uniquely identified by https://tools.wmflabs.org/worklist-tool/<worklist-name> . Upon clicking the "Add article to worklist" button, the user will be required to provide the following details - 1. Article name (we'll automatically find the corresponding Wikipedia article), 2. Description (of the problem they want solved) and 3. Effort (easy, medium or hard - depending on the effort they think is required to solve the problem). All of this, along with the user's username (created_by), the worklist_id (the ID of the worklist the article is being added to), status (Open for a new task), progress (0 for a new task), claimed_by (None for a new task) and date_created, will be used to create a new task. Whenever a task is created, the corresponding entry in the Article table will be created or updated (in case the article is already involved in some other task). An article's page views will be calculated using the pageviews API, its projects and grade using the pageassessments API, and its size from the page table.
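The page-views lookup mentioned above can be sketched against the Wikimedia Pageviews REST API's documented per-article route. The URL builder below follows that route's shape; the averaging helper operates on the JSON the API returns. This is a sketch, not the final fetch layer, and the function names are my own.

```python
# Sketch of fetching average daily page views for an article via the
# Wikimedia Pageviews REST API (per-article endpoint).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def pageviews_url(article, start, end, project="en.wikipedia.org"):
    """Build the per-article daily pageviews URL for [start, end],
    both formatted as YYYYMMDD."""
    return f"{BASE}/{project}/all-access/user/{article}/daily/{start}/{end}"

def average_views(api_json):
    """Average the daily view counts from a Pageviews API response.
    The response carries one entry per day under the 'items' key."""
    items = api_json.get("items", [])
    if not items:
        return 0
    return sum(item["views"] for item in items) / len(items)
```

In the tool, `pageviews_url(...)` would be fetched with `requests.get` and the result passed through `average_views` before being stored on the Article row.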
Update petscan query articles to existing worklist
Only one PetScan query can be associated with a worklist. The curator of the worklist can update the PetScan query ID associated with it by clicking the "Update Petscan Query ID associated with this worklist" button on the worklist page.
Sharing of worklist
Whenever a worklist is created, it will be allotted an ID when it is stored in the database. A worklist can also be uniquely identified by its name. Therefore, any worklist can be identified by the URL https://tools.wmflabs.org/worklist-tool/<worklist-name> . To share a worklist, one can share either its name or the above URL.
In a worklist, an article once added can only be closed, not removed (so that even after completion, articles remain part of the worklist; this will help newcomers look at previously completed tasks and also help us add metrics & reporting features later on). The description and theme of a worklist will be editable, but not its name (since the name is used for sharing worklists and as a primary key in the DB). The option to edit the description and theme will be provided on the worklist page itself.
Only the curator of the worklist will be able to edit the theme and the description.
- Tasks page : We'll show the task's description (as entered by the creator of the task or by those who claimed it), which will be editable, along with its progress, effort and status [all fetched from the Task table]. The progress will be marked by the user who has claimed the task. If the task hasn't been claimed, progress will be either zero or equal to the value assigned by the last user who claimed it. We'll also show the properties of the article involved in this task: its name, page views, the projects it is a part of, its size in bytes and its grade (like FA, A, GA etc). We'll also show which other tasks (and hence worklists) this article is involved in, by querying all the tasks that have articleId = <article-page-id>. This will be fast and won't require scanning the whole table because we've created an index on articleId. On this page, the user will be able to do the following:
Claim an article of a worklist which is unclaimed
The user can claim a previously unclaimed article; their username will be entered in that task's claimed_by field in the database and the status will be changed to Claimed. Because of the auto-refresh functionality, this information will be propagated to all clients in real time.
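Since two users could try to claim the same task at nearly the same moment, the claim should be a single conditional update rather than a read-then-write. The sketch below shows the idea with SQLite; column names follow the proposed schema but are assumptions, and in Django this would be expressed with a filtered `update()` inside a transaction.

```python
import sqlite3

# Stand-in for the Task table, with a single open task.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL DEFAULT 'open',
        claimed_by TEXT
    )
""")
conn.execute("INSERT INTO task (id) VALUES (1)")

def claim_task(conn, task_id, username):
    """Claim the task only if it is still open. The WHERE clause makes the
    check-and-set atomic, so only one of two racing claims can succeed."""
    cur = conn.execute(
        "UPDATE task SET status = 'claimed', claimed_by = ? "
        "WHERE id = ? AND status = 'open'",
        (username, task_id),
    )
    return cur.rowcount == 1  # True if this call won the claim

first = claim_task(conn, 1, "alice")   # task was open, so this succeeds
second = claim_task(conn, 1, "bob")    # task is already claimed, so this fails
```

The failed claimer gets a clean "already claimed" response, and the auto-refresh then shows the winner's name to everyone.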
Update progress of a task -- only by claimed_by user
Only the claimed_by user can update progress for a task
Claimed_by user can change the status from
- claimed to closed -- when the user has completed the task. In that case, the task's progress will be changed to 100 and its status to Closed.
- claimed to open -- when the user no longer wants to work on the task. At this point, the user can update the task's progress, effort and description so that these stay up to date and the next user who claims the task does not have to start from scratch.
Update effort & description of a task
Only the claimed_by user can update the effort for a task, in the case where they have changed its status from claimed to open. The initial effort estimate and description are set by the user who added the article to the list.
Note : Whenever a worklist is created or its PetScan query ID is updated, we'll fetch and store the results (article names) of the PetScan query in the PSIDArticles table. This improves the user experience: otherwise, users would have to wait for results to be fetched from PetScan every time they open the worklist page. But these stored results will be stale. So, to make the most recent results available as soon as possible, once the page loads with the results stored in the database, we'll send an AJAX call to the server to update the results in the database and show the updated results to the user when the call returns.
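The serve-stale-then-refresh flow above can be sketched as follows. A dict stands in for the PSIDArticles table, and `fetch_from_petscan` is a hypothetical placeholder for the real network call; the function and variable names are my own.

```python
# Sketch of the "serve cached results, then refresh via AJAX" pattern.
cache = {}  # stands in for the PSIDArticles table

def fetch_from_petscan(psid):
    # Placeholder for the real (slow) network call to PetScan.
    return [f"Article_{psid}_1", f"Article_{psid}_2"]

def get_articles(psid):
    """Page-load handler: return cached results immediately if present,
    plus a flag telling the client to issue a follow-up AJAX refresh."""
    if psid in cache:
        return cache[psid], True       # possibly stale; client should refresh
    results = fetch_from_petscan(psid)
    cache[psid] = results
    return results, False              # just fetched; no refresh needed

def refresh(psid):
    """Handler for the follow-up AJAX call: re-fetch, update the cache,
    and return the fresh results for the client to render."""
    cache[psid] = fetch_from_petscan(psid)
    return cache[psid]
```

The page thus renders instantly from the database, and the AJAX round trip quietly swaps in the latest PetScan results.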
Also, as mentioned above, we'll fetch and store various properties of articles, like average page views, size, projects and grade. We'll update these values through a cron job that runs at a specific interval, the length of which can be discussed.
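The periodic refresh could be organised as below. The fetcher arguments are hypothetical stand-ins for the pageviews and pageassessments calls sketched earlier, and the cron schedule shown in the comment is only an example interval.

```python
# Sketch of the metrics-refresh job run from cron, e.g. on Toolforge:
#   0 */6 * * *  python3 update_article_metrics.py
# (interval is illustrative; the real one is to be decided with mentors)
def update_article_metrics(articles, fetch_views, fetch_assessment):
    """Refresh the cached page views and grade for every tracked article.
    `fetch_views` and `fetch_assessment` are injected so the job can be
    tested without network access."""
    updated = {}
    for title in articles:
        updated[title] = {
            "avg_views": fetch_views(title),
            "grade": fetch_assessment(title),
        }
    return updated
```

In production, the returned values would be written back to the Article table instead of a dict.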
Apr 24 - May 1
- Community Bonding Period
- Setting up environment on Toolforge
- Setting up CI on Travis for Git repository
- Understanding Wikimedia’s current tools of a similar type. This will help me identify areas of improvement.
- Studying similar existing tools that don’t belong to Wikimedia.
Note: As I’m already familiar with the MediaWiki API, Toolforge, Git and Wikimedia’s database structure, I’d get a head start.
May 7 - May 14
- Though this will be part of the community bonding period, I will start on the implementation so that the MVP can be completed by the end of May. (Before this, I'll have end-semester exams, so I'll only be able to start from 7th May.)
- Decide upon the database to be used and then finalize the database schema.
- Write the DB calls
- Write module for creating worklists (It includes the validation checks mentioned above in the detailed solution)
May 14 - May 28 (Week 1, 2)
- The major focus during this sprint will be to finish the basic backend functionality with minimal UI, so that MVP release can be done on 30th May.
- Integrate OAuth with the tool.
- Write module for finding worklists.
- Implement the feature of sharing of worklists.
- Implement auto-refresh functionality, which will make real-time updates of data possible.
- Write module for importing Petscan queries.
May 30 : Do MVP Release and gather feedback from users
May 30 - June 10 (Week 3, 4)
- Write module for describing worklists created and being worked upon by the user
- Write module for calculating page views of the task
- Write module for calculating grading and projects (to which it belongs) of the task
- Write module for fetching the size of the task's article from the database
June 10 - June 24 (Week 5,6)
- Adding autocomplete feature to finding worklists module
- Implement filtering and sorting mechanism for lists and articles page
- Preparing UI mockup
Note: This task will help us to gain clarity about the functionality that we need to implement.
(Acceptance Criteria: Design document should adhere to the requirements and should be approved by the mentors.)
June 24 - July 8 (Week 7,8)
- In this sprint I’ll be working upon the stretch goals
- Write module to draw lists of articles from other sources like CSV and .txt files
- Write module for exporting information about the list for use in other tools
July 8 - July 22 (Week 9,10)
- Implementation of complete UI part
- In this sprint, optimizations like lazy load will also be included
July 22 - Aug 12 (Week 11,12)
- Write UTs for the tool
- Finish up pending reviews and documentation, if any.
Even after the internship, I plan to look out for new and challenging opportunities offered by the Wikimedia community. I will continue to maintain and contribute to the development of this tool.
- Gained familiarity with existing tools and formats
- Setup environment and CI pipeline for the tool
- Finalized database schema
- Finished back-end implementation of modules for creating, searching, importing petscan queries and sharing worklists
- Integrated OAuth with the tool
- Added auto-refresh functionality
- Did the MVP release
- Added modules for finding user’s lists and filtering and searching worklists
- Prepared design mockups
- Added functionality for fetching page views, projects, grading and size of the task's article
- Added autocomplete feature in finding worklists
- Written code for 2 stretch goals
- July - August:
- Finished UI for the tool
- Done with final release of the product
Describe how you plan to communicate progress and ask for help, where you plan to publish your source code, etc
- I’ll be maintaining the code base using Git, and PRs will be used to gather feedback from the mentors on the code.
- For sharing status and discussing issues, I'll be using Phabricator and GitHub issues respectively. Other than this, IRC and email will act as additional communication channels. I’ll be active on IRC during my working hours.
- I’ll also be participating in scrum meetings and sprint planning. These meetings will help me get to know the community better. To discuss tasks and issues, I’ll have a one-on-one with my mentors once every two weeks.
- I plan to use my Meta-Wiki user page for quick weekly updates.
- I also plan to write a blog about my experiences and the challenging work I’ll be doing as part of this project, once every two weeks.
Tell us about a few:
Your education (completed or in progress)
I am a final-year computer science undergraduate at PEC University of Technology, Chandigarh, India. I am a computer science enthusiast because it gives me the power to reach billions and transform their lives. I fell in love with mathematics and problem-solving in primary school, and that love drives my programming. I enjoy coding and building new things.
How did you hear about this program?
I got to know about this from my college technical society.
Will you have any other time commitments, such as school work, another job, planned vacation, etc, during the duration of the program?
I will be on holiday until August 1, so I will be completely free. After that, I will be joining Amazon after graduating.
We advise all candidates eligible for Google Summer of Code and Outreachy to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?
What does making this project happen mean to you?
Wikipedia has always been the ultimate destination when it comes to doing college assignments, reading about a topic or finding answers to questions.
What Google is to search, Wikipedia is to knowledge. And Wikipedia is nothing without its community. If I get an opportunity to make a difference to the community through my tool, the happiness I’ll get out of it will be immeasurable.
Whatever I've done so far in my computer science career has only reached a maximum of 500 people. I want to build something which will impact millions, including my friend sitting next to me. And this project gives me that opportunity, and at the same time, challenges me to learn and grow. Hence, I would love to solve this hard and complex problem.
Describe any relevant projects that you've worked on previously and what knowledge you gained from working on them. Describe any open source projects you have contributed to as a user and contributor (include links). If you have already written a feature or bugfix for a Wikimedia technology such as MediaWiki, link to it here; we will give strong preference to candidates who have done so
Software Engineering Intern , Amazon, Bengaluru : Jan 2017 - July 2017
- I worked with the tech team that supported the retail business of Amazon.
- I built a visualization platform that quantified a team’s performance with the help of metrics. The visualizations provided an insight into the data which eventually helped in increasing a team’s productivity.
- I used a plethora of AWS technologies like QuickSight, Redshift, DynamoDB and Lambda to build this tool. Open source tools like Kibana were used to create the visualizations.
- Multiple reviews were held with senior leaders and clients and the platform improved with multiple iterations, which in turn improved my coding, testing and leadership skills.
- I got a pre-placement offer from Amazon for my exemplary work during the internship.
Researchshala - www.researchshala.com
- Researchshala is an online platform that connects professors with research interns who help them with their research projects.
- I built this website in Django and hosted it on AWS for a budding startup at my college.
Research-Intern at IIT Delhi
- I was among the top 0.33% of students across India selected for the Student Research Fellowship Program at IIT Delhi.
- At IITD, I was a Research Fellow under Asst. Prof. Maya Ramanath.
- I built a technical knowledge base to establish relationships between technical concepts by using Wikipedia as the input corpus and then organised relations into a hierarchical structure.
- For this, I set up Hadoop to use PATTY, an MPI resource, and maintained the database using PostgreSQL and MongoDB.
Health Monitoring Application using Brain Computer Interface
- I built an Android app to analyse and improve the user’s concentration and meditation levels.
- The state of the user’s brain was monitored using Muse, an EEG device.
- Binaural beats were used to improve the current levels of concentration and meditation.
Accessing and Extracting Information from the Hidden-Web
- Objective was to build a database of publications and research papers of foreign academicians of Indian origin.
- For this, I built a crawler that explored the Deep Web in a targeted manner after receiving a list of names of academicians as input.
- Libraries like BeautifulSoup, Requests and JSoup were used for fetching and structuring the results.
- This Project was funded by NSTMIS, a division of Department of Science and Technology, India.
Intelligent Subtraction Tutor
- I built a chatbot capable of teaching subtraction to a child by first judging the child's current knowledge level and then teaching accordingly.
- The Facebook Messenger API was used and the bot was hosted on Modulus, with Facebook Messenger as the interface.
- Wit.ai was used to train the bot.
Password Based Door Lock
- I built a password-based door-lock system, integrated with an IR-based motion sensor, using an 8051 microcontroller.
- The microcontroller was programmed in C.
Open Source contributions:
- Developed a user contribution summary tool named WikiCV for Wikipedia under Outreachy Round 15. This tool summarizes user contributions in an automated manner and presents them in a CV-like format. It was developed with the intent of 1. drawing new editors to the project, 2. allowing existing editors to spend more time on it and 3. giving editors a professional edge by showing off their contributions. Link for the tool: https://tools.wmflabs.org/outreachy-wikicv/wiki-cv/Tgr/ , Link for the repository: https://github.com/MeghaSharma21/WikiCV , Link for the sprint board: https://phabricator.wikimedia.org/tag/wikicv/
- Developed a Sublime Package for C++ code snippets, which contained templates for algorithms commonly used in competitive programming. This was developed with the intent to help programmers solve questions with greater speed and accuracy. Link: https://github.com/MeghaSharma21/CPP_Competitive_Programming_Sublime_Snippets
- Developed an Atom package for competitive programmers along similar lines as the aforementioned Sublime package. Link: https://github.com/MeghaSharma21/CPP_Competitive_Programming_Atom_Snippets
Note : Both the packages are available with their respective package managers, having 500+ installations.
- Created a tool on Toolforge that shows a user's rank based on their number of edits on Hindi Wikipedia. The user's rank and percentile are presented along with a graph showing the distribution of all users by number of edits, which adds clarity as to where the user stands. For this, I developed a Django app with the User table as the data source. Link to code: https://github.com/MeghaSharma21/outreachy-wikimedia-user-rank-tool/pull/1 , Link to tool: https://tools.wmflabs.org/outreachy-user-ranking-tool
- Created a tool on Toolforge that shows a user's contributions in terms of the number of pages edited and created in the current year. The contributions are shown in the form of a timeline similar to GitHub’s contribution timeline. Link to tool: https://tools.wmflabs.org/outreachy-user-contribution-tool/ , Link to code: https://github.com/MeghaSharma21/outreachy-wikimedia-user-contribution-tool/pull/1
Designed the database schema for the tool. It is shown in the form of a class diagram since, in Django, the database schema is declared as a collection of classes in the app's models.py file.
Developed a tool on Toolforge that takes a PetScan query ID as input, fetches the results from PetScan, parses them and presents them in the form of cards.
Link for the code: https://github.com/MeghaSharma21/WorklistTool-GSoC-2018/pull/2
Link for the tool: https://tools.wmflabs.org/gsoc-petscan-query-articles/
Developed a tool on Toolforge that is mainly a prototype/demo for creating and showing worklists with minimal UI. Through this, I've achieved basic functionality and end-to-end connectivity. All of it will be reused for the main tool.
Link for the code: https://github.com/MeghaSharma21/WorklistTool-GSoC-2018/pull/3
Link for the tool: https://tools.wmflabs.org/worklist-tool
Any Other Info
Add any other relevant information such as UI mockups, references to related projects, a link to your proof of concept code, etc
Through my Outreachy internship with Wikimedia, I've become quite familiar with the Wikimedia community and its tools. Also, the technology stack used in WikiCV is much the same as what we'd want to use for this tool. All this will give me a head start.